Python - Machine Learning - Inductive Learning - Rote Learning - Learning Text Data Based on n-gram Frequency - n-gram Generation Algorithm

Development environment

macOS Mojave - Apple (OS)
Emacs (Text Editor)
Windows 10 Pro (OS)
Visual Studio Code (Text Editor)
Python 3.7 (programming language)

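This post works through the n-gram generation algorithm from はじめての機械学習 (Tomohiro Odaka, Ohmsha), Chapter 3 (Inductive Learning), Section 3.1 (Rote Learning), 3.1.2 (Learning Text Data Based on n-gram Frequency), written in Python instead of C.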

Code

Python 3

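#!/usr/bin/env python3
"""Read text from standard input and print its character n-grams."""
import sys

if len(sys.argv) != 2:
    print('Usage: ngram <N>', file=sys.stderr)
    sys.exit(1)

n = sys.argv[1]
try:
    n = int(n)
    if n < 1:
        raise ValueError('n must be a positive integer.')
except ValueError as err:
    print(err, file=sys.stderr)
    sys.exit(1)

# The first n-gram is the first n characters of the input.
ngram = sys.stdin.read(n)
print(ngram)
# Slide the window one character at a time. At end of input read(1)
# returns '', so the window shrinks until it is empty and the loop ends.
while ngram != '':
    c = sys.stdin.read(1)
    ngram = ngram[1:] + c
    print(ngram)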
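
The script above keeps shifting the window after the input is exhausted, so it also prints progressively shorter fragments at the end. If only full-length n-grams are wanted, one possible variant is sketched below. The function name `iter_ngrams` is a hypothetical helper for illustration and does not come from the book.

```python
from collections import deque
from typing import Iterable, Iterator


def iter_ngrams(chars: Iterable[str], n: int) -> Iterator[str]:
    """Yield every full-length character n-gram from chars, in order."""
    if n < 1:
        raise ValueError('n must be a positive integer')
    # A deque with maxlen=n drops the oldest character automatically.
    window = deque(maxlen=n)
    for c in chars:
        window.append(c)
        if len(window) == n:
            yield ''.join(window)


print(list(iter_ngrams("Alice's", 5)))
# ['Alice', "lice'", "ice's"]
```

Because the generator stops as soon as the characters run out, nothing shorter than n characters is ever produced, and an input shorter than n yields no n-grams at all.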

Input and output (Bash, cmd.exe (Command Prompt), Terminal, Jupyter (IPython))

$ head alice.txt 
Alice's Adventures in Wonderland

[Also known as "Alice in Wonderland"]

by Lewis Carroll

May, 1997  [Etext #928]
[Date last updated: April 15, 2005]


$ ./ngram.py 5 < alice.txt > alice5gram.txt
$ head alice5gram.txt 
Alice
lice'
ice's
ce's 
e's A
's Ad
s Adv
 Adve
Adven
dvent
$ tail alice5gram.txt 

and

nd

d




$
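
Since the topic of this section is learning from n-gram frequencies, a natural next step is to tally how often each n-gram occurs. The following is only a minimal sketch under that assumption: `count_ngrams` is a hypothetical name, and it counts directly from a string rather than from the output file, because n-grams that contain a newline span two lines in that file.

```python
from collections import Counter


def count_ngrams(text: str, n: int) -> Counter:
    """Count every full-length character n-gram in text."""
    if n < 1:
        raise ValueError('n must be a positive integer')
    # Slice a window of n characters at each starting position.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


counts = count_ngrams('abababc', 2)
print(counts.most_common(2))
# [('ab', 3), ('ba', 2)]
```

For a real file, the text could be read with `open('alice.txt').read()` and passed in the same way.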

Kamimura's blog

Monday, July 29, 2019
