Python - Machine Learning - Inductive Learning - Rote Learning - Learning from Text Data Based on n-gram Frequency - Analyzing n-gram Frequency (the rank program)

Development environment

macOS Mojave - Apple (OS)
Emacs (Text Editor)
Windows 10 Pro (OS)
Visual Studio Code (Text Editor)
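Python 3.7 (programming language)

This post works through (3) Analyzing n-gram frequency (the rank program) from Section 3.1.2 (Learning from Text Data Based on n-gram Frequency) of Section 3.1 (Rote Learning) in Chapter 3 (Inductive Learning) of はじめての機械学習 ("Machine Learning for Beginners", by Tomohiro Odaka, Ohmsha), using Python instead of C.

Code

Python 3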

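#!/usr/bin/env python3
import sys
from collections import Counter

# Read one n-gram per line from standard input and count identical lines.
# strip() removes all leading and trailing whitespace, so ' the ' and 'the'
# are counted as the same key, and whitespace-only lines become ''.
counter = Counter(s.strip() for s in sys.stdin)

# Print n-grams from most to least frequent, count right-aligned in 4 columns.
for ngram, count in counter.most_common():
    print(f'{count:4} {ngram}')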
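
The counting step can also be tried on a small in-memory sample instead of standard input. The following is only a sketch: the function name `rank`, the `keep_spaces` parameter, and the sample lines are made up for illustration and are not part of the original script. It shows how `strip()` merges n-grams that differ only in surrounding spaces, which is probably why the output contains entries shorter than 5 characters and an empty entry.

```python
from collections import Counter


def rank(lines, keep_spaces=False):
    """Count identical lines; return (ngram, count) pairs, most frequent first.

    `rank` and `keep_spaces` are hypothetical names used only in this sketch.
    """
    if keep_spaces:
        # Drop only the trailing newline so ' the ' and 'the' stay distinct.
        keys = (line.rstrip('\n') for line in lines)
    else:
        # Same normalization as the script above: strip all surrounding whitespace.
        keys = (line.strip() for line in lines)
    return Counter(keys).most_common()


# Made-up sample lines, one n-gram per line as they would arrive on stdin.
sample = [' the \n', 'the\n', 'd the\n', '     \n', ' the \n']

for ngram, count in rank(sample):
    print(f'{count:4} {ngram}')
```

With `keep_spaces=True` the same sample keeps ' the ' and 'the' as separate keys, which may be closer to a strict count of 5-character n-grams. Which behavior is preferable depends on what the ranking is meant to show.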

Input and output (Bash, cmd.exe (Command Prompt), Terminal, Jupyter (IPython))

$ ./ngram.py 5 < alice.txt | ./rank.py > rank.txt
$ head rank.txt 
7646 
1570 the
 888 '
 877 said
 758 and
 523 that
 485 she
 468 d the
 458 , and
 429 .
$ tail rank.txt 
   1 ld-li
   1 d-lif
   1 -life
   1 ppy s
   1 py su
   1 y sum
   1 ays.
   1 ys.
   1 En
   1 End o
$ head -20 rank.txt 
7646 
1570 the
 888 '
 877 said
 758 and
 523 that
 485 she
 468 d the
 458 , and
 429 .
 399 Alice
 392 Alic
 383 with
 333 was
 329 ' sai
 307 you
 291 very
 257 aid t
 253 this
 240 they
$
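
The pipeline above feeds rank.py from ngram.py, which is not shown in this post. As a rough idea of what such a step might do, here is a minimal sketch that splits text into overlapping character n-grams. The function name `char_ngrams` is hypothetical, and the actual ngram.py may differ, for example in how it treats newlines or multibyte characters.

```python
def char_ngrams(text, n):
    """Return every run of n consecutive characters in text, in order.

    Hypothetical helper for illustration; not the author's ngram.py.
    """
    # Assumption of this sketch: newlines are replaced by spaces so that
    # each n-gram fits on a single output line.
    text = text.replace('\n', ' ')
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Print one n-gram per line, the format rank.py reads from standard input.
for gram in char_ngrams('said the', 5):
    print(gram)
```

For the input 'said the' this yields 'said ', 'aid t', 'id th' and 'd the', the same kind of fragments that appear in rank.txt.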

Kamimura's blog

Tuesday, July 30, 2019
