## Monday, May 14, 2018

### Python - NLTK - Language Processing and Python (sets, the sorted function, the lower method, avoiding duplicates, caveats)

Code (Emacs)

Python 3

```
#!/usr/bin/env python3

print('19.')
from nltk.book import *

l1 = sorted(set([w.lower() for w in text1]))
l2 = sorted([w.lower() for w in set(text1)])

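# l2 can be longer than l1 (it is never shorter).
# l1 lowercases first, so 'abc' and 'ABC' collapse into a single 'abc' in the set.
# l2 builds the set first, so 'abc' and 'ABC' both survive and become two 'abc' entries.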

print(len(l1))
print(len(l2))

words = ['abc', 'ABC']
print(sorted(set([w.lower() for w in words])))
print(sorted([w.lower() for w in set(words)]))
```

```
$ ./sample13.py
19.
*** Introductory Examples for the NLTK Book ***
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
17231
19317
['abc']
['abc', 'abc']
$
```
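
The gap between the two counts (19317 - 17231 = 2086) should come from tokens that differ only in case. As a rough sketch of that reasoning without NLTK, the snippet below uses a made-up token list (`tokens`) and two helper names of my own, `lower_then_dedupe` and `dedupe_then_lower` (neither is part of NLTK). It checks that the length difference equals the number of surplus case variants.

```python
from collections import Counter


def lower_then_dedupe(tokens):
    """Lowercase every token, then remove duplicates (same idea as l1)."""
    return sorted({w.lower() for w in tokens})


def dedupe_then_lower(tokens):
    """Remove duplicates first, then lowercase (same idea as l2)."""
    return sorted(w.lower() for w in set(tokens))


# Made-up sample data: 'the' appears in three casings, 'whale' in two.
tokens = ['The', 'the', 'THE', 'whale', 'Whale', 'sea']

a = lower_then_dedupe(tokens)
b = dedupe_then_lower(tokens)

# Words that still appear more than once after dedupe-then-lower.
dupes = {w: n for w, n in Counter(b).items() if n > 1}

# Each word with n case variants contributes n - 1 surplus entries.
surplus = sum(n - 1 for n in dupes.values())

print(a)                 # ['sea', 'the', 'whale']
print(b)                 # ['sea', 'the', 'the', 'the', 'whale', 'whale']
print(dupes)             # {'the': 3, 'whale': 2}
print(len(b) - len(a))   # 3
print(surplus)           # 3
```

If the goal is a case-insensitive vocabulary, lowercasing before building the set is the order to use; the other order leaves duplicates in the result.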