In [1]: from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

In [2]: text5.collocations()
wanna chat; PART JOIN; MODE #14-19teens; JOIN PART; PART PART;
cute.-ass MP3; MP3 player; JOIN JOIN; times .. .; ACTION watches; guys
wanna; song lasts; last night; ACTION sits; -...)...- S.M.R.; Lime
Player; Player 12%; dont know; lez gurls; long time

In [3]: len(bigrms(text5))
NameError                                 Traceback (most recent call last)
<ipython-input-3-b482f4408ef8> in <module>()
----> 1 len(bigrms(text5))

NameError: name 'bigrms' is not defined

In [4]: bigrams
NameError                                 Traceback (most recent call last)
<ipython-input-4-c91f40429cac> in <module>()
----> 1 bigrams

NameError: name 'bigrams' is not defined

In [5]: from nltk import bigrams

In [6]: len(bigrams(text5))
TypeError                                 Traceback (most recent call last)
<ipython-input-6-d9ecdddfa8ef> in <module>()
----> 1 len(bigrams(text5))

TypeError: object of type 'generator' has no len()

In [7]: len(list(bigrams(text5)))
Out[7]: 45009
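
The TypeError in In [6] arises because a generator produces its items lazily and so has no length. Wrapping it in list() works but stores every pair in memory. A minimal sketch of counting the pairs without building the list follows. It assumes a hypothetical stand-in, toy_bigrams, in place of nltk.bigrams, and a made-up token list, so that it runs without the NLTK data.

```python
def toy_bigrams(tokens):
    """Yield adjacent token pairs lazily (hypothetical stand-in for nltk.bigrams)."""
    tokens = list(tokens)
    for left, right in zip(tokens, tokens[1:]):
        yield (left, right)

# Made-up sample tokens, not taken from text5.
tokens = ["hello", "everyone", "in", "the", "room"]

# Consume the generator one pair at a time and count as we go.
bigram_count = sum(1 for _ in toy_bigrams(tokens))
print(bigram_count)  # 4
```

A sequence of n tokens has n - 1 adjacent pairs, so for a text that supports len(), len(text) - 1 should give the same count without iterating at all.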

In [8]: bigrams(text5)[:5]
TypeError                                 Traceback (most recent call last)
<ipython-input-8-5e112c95e39a> in <module>()
----> 1 bigrams(text5)[:5]

TypeError: 'generator' object is not subscriptable

In [9]: list(bigrams(text5))[:5]
[('now', 'im'),
 ('im', 'left'),
 ('left', 'with'),
 ('with', 'this'),
 ('this', 'gay')]
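
Slicing fails in In [8] for the same reason len() did: a generator cannot be indexed. Converting to a list first works, but it builds all the pairs just to keep five. A sketch of taking only the first few items with itertools.islice follows, again assuming the hypothetical toy_bigrams stand-in and made-up tokens rather than the real text5.

```python
from itertools import islice

def toy_bigrams(tokens):
    """Yield adjacent token pairs lazily (hypothetical stand-in for nltk.bigrams)."""
    tokens = list(tokens)
    for left, right in zip(tokens, tokens[1:]):
        yield (left, right)

# Made-up sample tokens, not taken from text5.
tokens = ["hello", "everyone", "in", "the", "room", "today"]

# islice stops after three items, so the remaining pairs are never produced.
first_three = list(islice(toy_bigrams(tokens), 3))
print(first_three)
```

The same call should work with the generator returned by nltk.bigrams, for example list(islice(bigrams(text5), 5)).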

In [10]: set(text4)  # the set of distinct words (tokens) in text4

In [11]: len(_)  # number of distinct words; _ holds the previous result
Out[11]: 9754
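
Note that set() on a token sequence counts every distinct token, so "The" and "the" are counted separately and punctuation marks are included. A sketch of the difference on a made-up token list follows. The lowercasing and isalpha() filter are one possible normalization chosen for illustration, not something the session above applies.

```python
# Made-up sample tokens, not taken from text4.
tokens = ["The", "people", "of", "the", "United", "States", ",", "the", "people", "."]

# Raw count: case-sensitive, punctuation included (what set(text4) measures).
distinct_tokens = set(tokens)

# Normalized count: lowercase everything and keep alphabetic tokens only.
distinct_words = {t.lower() for t in tokens if t.isalpha()}

print(len(distinct_tokens))  # 8
print(len(distinct_words))   # 5
```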

In [12]: quit()

