Friday, July 6, 2018

Development environment

Working through exercise 3 of section 2.8 (Exercises) in chapter 2 (Accessing Text Corpora and Lexical Resources) of 入門 自然言語処理 (by Steven Bird, Ewan Klein, and Edward Loper; translated by 萩原 正人, 中山 敬広, and 水野 貴明; O'Reilly Japan).

Code (Emacs)

Python 3

#!/usr/bin/env python3
import nltk

print('3.')

# Brown corpus: show the opening words of every genre (category).
print('brown')
brown = nltk.corpus.brown
for category in brown.categories():
    print(category)
    print(brown.words(categories=[category]))

# Web text corpus: show the first and last ten words.
print('webtext')
words = nltk.corpus.webtext.words()

for t in [words[:10], words[-10:]]:
    print(t)

Input and output (Terminal, Jupyter (IPython))

$ ./sample3.py
3.
brown
adventure
['Dan', 'Morgan', 'told', 'himself', 'he', 'would', ...]
belles_lettres
['Northern', 'liberals', 'are', 'the', 'chief', ...]
editorial
['Assembly', 'session', 'brought', 'much', 'good', ...]
fiction
['Thirty-three', 'Scotty', 'did', 'not', 'go', 'back', ...]
government
['The', 'Office', 'of', 'Business', 'Economics', '(', ...]
hobbies
['Too', 'often', 'a', 'beginning', 'bodybuilder', ...]
humor
['It', 'was', 'among', 'these', 'that', 'Hinkle', ...]
learned
['1', '.', 'Introduction', 'It', 'has', 'recently', ...]
lore
['In', 'American', 'romance', ',', 'almost', 'nothing', ...]
mystery
['There', 'were', 'thirty-eight', 'patients', 'on', ...]
news
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
religion
['As', 'a', 'result', ',', 'although', 'we', 'still', ...]
reviews
['It', 'is', 'not', 'news', 'that', 'Nathan', ...]
romance
['They', 'neither', 'liked', 'nor', 'disliked', 'the', ...]
science_fiction
['Now', 'that', 'he', 'knew', 'himself', 'to', 'be', ...]
webtext
['Cookie', 'Manager', ':', '"', 'Don', "'", 't', 'allow', 'sites', 'that']
['but', 'it', 'is', 'very', 'good', '.', '***(*)', '</', 'ul', '>']
$
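
The exercise asks for sample text from two different genres, so a natural next step is to compare two genres directly instead of only printing their opening words. The following is a minimal sketch, not part of the original script: `genre_summary` and `compare_brown_genres` are hypothetical helper names introduced here, and the second function assumes NLTK and the Brown corpus data are installed.

```python
def genre_summary(words, head=10):
    """Return token count, case-folded vocabulary size, and the first `head` tokens."""
    tokens = list(words)  # materialize the (possibly lazy) corpus view once
    vocab = {w.lower() for w in tokens}
    return {
        "tokens": len(tokens),
        "types": len(vocab),
        "head": tokens[:head],
    }


def compare_brown_genres(genre_a, genre_b):
    """Summarize two Brown corpus genres side by side (requires NLTK data)."""
    # Imported lazily so genre_summary stays usable without NLTK installed.
    from nltk.corpus import brown
    return {g: genre_summary(brown.words(categories=g)) for g in (genre_a, genre_b)}
```

For example, `compare_brown_genres('news', 'romance')` would return one summary dictionary per genre. `genre_summary` accepts any iterable of strings, so it can also be tried on `nltk.corpus.webtext.words()` or on a plain list.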
