Development environment
- macOS High Sierra - Apple
- Emacs (Text Editor)
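- Python 3.6 (programming language)
I work through exercises 7 and 8 of section 1.8 (Exercises) in chapter 1 (Language Processing and Python) of 入門 自然言語処理 (Steven Bird, Ewan Klein, and Edward Loper; translated by 萩原 正人, 中山 敬広, and 水野 貴明; O'Reilly Japan).
Input and output (Terminal, Jupyter (IPython))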
$ ipython
Python 3.6.4 (default, Dec 21 2017, 20:33:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
In [2]: text5.collocations()
wanna chat; PART JOIN; MODE #14-19teens; JOIN PART; PART PART;
cute.-ass MP3; MP3 player; JOIN JOIN; times .. .; ACTION watches; guys
wanna; song lasts; last night; ACTION sits; -...)...- S.M.R.; Lime
Player; Player 12%; dont know; lez gurls; long time
In [3]: len(bigrms(text5))
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-3-b482f4408ef8> in <module>()
----> 1 len(bigrms(text5))
NameError: name 'bigrms' is not defined
In [4]: bigrams
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-4-c91f40429cac> in <module>()
----> 1 bigrams
NameError: name 'bigrams' is not defined
In [5]: from nltk import bigrams
In [6]: len(bigrams(text5))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-d9ecdddfa8ef> in <module>()
----> 1 len(bigrams(text5))
TypeError: object of type 'generator' has no len()
In [7]: len(list(bigrams(text5)))
Out[7]: 45009
In [8]: bigrams(text5)[:5]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-5e112c95e39a> in <module>()
----> 1 bigrams(text5)[:5]
TypeError: 'generator' object is not subscriptable
In [9]: list(bigrams(text5))[:5]
Out[9]:
[('now', 'im'),
('im', 'left'),
('left', 'with'),
('with', 'this'),
('this', 'gay')]
In [10]: set(text4) # the set of distinct tokens in text4
Out[10]:
{'dispose',
'schoolchildren',
'legitimately',
'learned',
'calmly',
'usury',
'organization',
'deprive',
'lightning',
'incalculable',
'avoided',
'definite',
'suggesting',
'protects',
'grim',
'instances',
'tariffs',
'window',
'accruing',
'extinction',
'performance',
'removable',
'FAILURE',
'lightening',
'preconceived',
'indignant',
'enjoins',
'Italy',
'matches',
'heated',
'shadow',
'old',
'prudence',
'risen',
'fairer',
'institutions',
'harm',
'loyally',
'promptness',
'temptation',
'tempt',
'humanize',
'send',
'resting',
'big',
'supplying',
'¡¦',
'compose',
'perish',
'nursery',
'chairs',
'engraven',
'determination',
'Surely',
'national',
'endeavor',
'directing',
'antifederal',
'redress',
'economical',
'magnificent',
'From',
'fed',
'firmer',
'alteration',
'abuse',
'company',
'prevention',
'extend',
'raising',
'keeps',
'compliment',
'scales',
'shadows',
'AIDS',
'corrupted',
'abreast',
'declaration',
'prodigal',
'modem',
'resolves',
'oppressive',
'shudder',
'barter',
'moment',
'responsibilities',
'accept',
'majorities',
'Commerce',
'smoothly',
'motive',
'chargeable',
'discharged',
'lifted',
'aloof',
'sometimes',
'confronting',
'Xviolence',
'fine',
'...',
'Amidst',
'constitutional',
'telegraph',
'fare',
'K',
'specify',
'prudent',
'dust',
'self',
'Budapest',
'imposing',
'disloyal',
'ushering',
'avoidance',
'surmount',
'ores',
'shelter',
'planting',
'Stars',
'disorders',
'sinister',
'recommendations',
'navigable',
'aggravation',
'concentrating',
'reputation',
'counter',
'Congressman',
'possession',
'college',
'judicious',
'measures',
'maritime',
'excellent',
'commonly',
'smuggled',
'rescind',
'decayed',
'buildup',
'frequency',
'interpreters',
'unreasonable',
'spoke',
'ants',
'between',
'clauses',
'proposition',
'thousands',
'respected',
'five',
'recovered',
'simmer',
'authors',
'vow',
'functionaries',
'victorious',
'sorrowful',
'fallacy',
'ends',
'Treasury',
'efficiently',
'delivered',
'avert',
'unhampered',
'CONGRESS',
'hopefulness',
'mark',
'possible',
'Has',
'responsibility',
'unbounded',
'victim',
'substantial',
'retarded',
'hardheartedness',
'qualification',
'Nebraska',
'degeneration',
'circulation',
'Yes',
'passes',
'rightfully',
'juncture',
'Rome',
'express',
'temperate',
'restrain',
'face',
'wherever',
'dishonor',
'heroes',
'deserted',
'stanch',
'stated',
'amidst',
'2',
'seemed',
'Encountering',
'Magna',
'Senate',
'released',
'tensely',
'8',
'child',
'contingency',
'quite',
'irrevocable',
'Eve',
'indulged',
'notification',
'attends',
'reaches',
'reversion',
'vindictive',
'vigor',
'excitement',
'possibilities',
'inconvenient',
'row',
'confers',
'inculcating',
'Julia',
'cleaner',
'outlays',
'intermission',
'14th',
'replace',
'January',
'averted',
'Normandy',
'Monday',
'silk',
'console',
'moreover',
'judgment',
'uncounted',
'safe',
'enjoyed',
'intuitions',
'discriminate',
'rid',
'embittered',
'Forge',
'stretching',
'propriety',
'direct',
'....',
'investigate',
'ponders',
'predicted',
'fight',
'transform',
'conflict',
'assigns',
'reasonably',
'inexcusable',
'particulars',
'reclamation',
'obtaining',
'fell',
'unifying',
'inconsistencies',
'annual',
'Egypt',
'deliberate',
'despaired',
'intentioned',
'respectfully',
'governmental',
'priorities',
'doing',
'1',
'dress',
'Kindly',
'civility',
'Bush',
'probing',
'forthwith',
'scrutiny',
'abound',
'them',
'cosmos',
'excursions',
'1817',
'franchise',
'obliteration',
'pitilessly',
'subjects',
'regulated',
'applicable',
'exhibited',
'articles',
'field',
'navies',
'precise',
'seized',
'Panama',
'hardier',
'hospitality',
'Territorial',
'amended',
'involvement',
'discouragement',
'unitedly',
'profit',
'tactic',
'revocation',
'imprudent',
'detachment',
'promptitude',
'casts',
'accumulated',
'whole',
'effort',
'Terrific',
'accumulation',
'badge',
'oftener',
'promoting',
'righteous',
'beneficence',
'struggled',
'blazed',
'pledged',
'served',
'Genius',
'unkept',
'rejecting',
'entitled',
'unleash',
'detect',
'sincerity',
'covenants',
'appeasement',
'ballot',
'enterprising',
'further',
'reform',
'Before',
'golden',
'negotiated',
'manufacturer',
'condemned',
'force',
'centers',
'neutrality',
'contraction',
'respite',
'intervening',
'emigrating',
'unhappy',
'breathing',
'beauty',
'sacrifices',
'deepening',
'rewards',
'wherein',
'This',
'patent',
'lighted',
'undue',
'legible',
'reflect',
'unfaithful',
'increased',
'fuel',
'benefited',
'spiritually',
'icy',
'warrant',
'heavens',
'erect',
'illumined',
'Orient',
'din',
'distinction',
'omitting',
'exacted',
'York',
'perfecting',
'settler',
'crushes',
'adopted',
'bestowal',
'acquired',
'19th',
'briefly',
'usurper',
'friendly',
'healed',
'fearfully',
'pile',
'dreamed',
'evenly',
'inefficiently',
'report',
'lakes',
'artifice',
'interstate',
'spreading',
'succeed',
'makeup',
'seize',
'figures',
'achieve',
'promotions',
'delineated',
'concerted',
'early',
'They',
'concepts',
'devised',
'allows',
'Cincinnati',
'Persistent',
'occupying',
'participate',
'alienate',
'ability',
'mostly',
'frauds',
'impoverished',
'missiles',
'privileged',
'weapon',
'convinced',
'obstructed',
'session',
'expensive',
'canvass',
'Or',
'saying',
'travelled',
'midst',
'immigration',
'opening',
'ensign',
'depths',
'chords',
'reduce',
'None',
'agitated',
'chattel',
'mightiest',
'overrule',
'twilight',
'4th',
'consequential',
'voluntarily',
'foes',
'Only',
'jurisprudence',
'defines',
'exchanges',
'Athens',
'rightly',
'dignified',
'cabbies',
'august',
'ax',
'blinded',
'added',
'victory',
'founding',
'text',
'committed',
'followed',
'fields',
'qualifications',
'instead',
'helping',
'pronounce',
'management',
'apprehension',
'except',
'night',
'neighbors',
'repeal',
'earlier',
'hour',
'gloomy',
'uncontrolled',
'uncomplaining',
'occasional',
'pool',
'weighty',
'administration',
'traces',
'illegal',
'staple',
'evacuation',
'bred',
'sheet',
'hoping',
'trappings',
'shrinking',
'maketh',
'variance',
'Comfort',
'pretensions',
'persistence',
'likewise',
'nuclear',
'knees',
'collected',
'summons',
'naturally',
'approached',
'constitution',
'maturing',
'infirmity',
'usages',
'availed',
'lifting',
'alter',
'weigh',
'hopes',
'arsenal',
'enlarging',
'assured',
'parties',
'here',
'discussion',
'Roman',
'dictatorship',
'thrown',
'resume',
'paces',
'transcending',
'warmth',
'undiminished',
'pre',
'Freedom',
'coast',
'Thy',
'basic',
'task',
'Putting',
'estranged',
'convenience',
'treasury',
'studying',
'Missouri',
'benefits',
'propagation',
'extraneous',
'coal',
'sparing',
'spasmodic',
'plunge',
'have',
'lawlessness',
'plans',
'convulsed',
'data',
'pleasures',
'ensued',
'durability',
'waited',
'Believing',
'calculation',
'Social',
'happiness',
't',
'entangling',
'Old',
'exploded',
'wonted',
'rush',
'diffidence',
'relationship',
'warfare',
'launched',
'foreigners',
'diseases',
'addresses',
'percentage',
'ignorant',
'inhabitant',
'governing',
'Commissioners',
'generosity',
'Thirty',
'-',
'decoding',
'violated',
'radiance',
'forbearance',
'sages',
'aright',
'hadn',
'YOUNG',
'devices',
'Labor',
'prepare',
'steady',
'equals',
'disappeared',
'services',
'pursuance',
'Act',
'herself',
'industrialists',
'ingenuity',
'generation',
'draw',
'Mississippi',
'maximum',
'yearn',
'restoration',
'1890',
'neck',
'Action',
'scrutinize',
'nameless',
'pleasantness',
'coercion',
'Conscious',
'care',
'house',
'immigrant',
'fiscally',
'unfulfilled',
'touchstone',
'overtake',
'inspection',
'nourishes',
'monopolies',
'affords',
'inescapably',
'when',
'message',
'attainment',
'gave',
'mutation',
'studies',
'inexorable',
'discrimination',
'territorial',
'roll',
'antiphilosophists',
'Indeed',
'operatives',
'Although',
'urging',
'fervently',
'stamping',
'hasten',
'sprang',
'maturity',
'stricken',
'unnecessary',
'uncharitableness',
'disunion',
'expense',
'Asia',
'ethnic',
'gentlemen',
'Cabinet',
'materially',
'purchasing',
'accrue',
'England',
'unbiased',
'positively',
'auspices',
'offensive',
'metallic',
'clarification',
'saved',
'intuitive',
'recital',
'wonders',
'transfer',
'retrenchment',
'marker',
'Florida',
'sanctioning',
'ancient',
'freedom',
'shaken',
'bastion',
'radical',
'Commons',
'prescription',
'spring',
'interfere',
'ably',
'checked',
'overruled',
'valued',
'selflessness',
'sympathize',
'boldest',
'majority',
'conference',
'Price',
'foreclosure',
'delusions',
'easily',
'dependable',
'described',
'remind',
'research',
'removing',
'subterfuge',
'sore',
'model',
'recognitions',
'circumstance',
'allies',
'morbid',
'insatiable',
'suffers',
'perception',
'enlargement',
'guardian',
'Pacific',
'baptism',
'clad',
'Fourth',
'subversion',
'Luther',
'artists',
'yielding',
'vessels',
'encroaches',
'wish',
'held',
'crises',
'Xthey',
'cutting',
'forums',
'remnant',
'disappointed',
'incoming',
'gratefully',
'shattered',
'waging',
'debts',
'color',
'ethics',
'specialized',
'amending',
'disappearing',
'Iowa',
'troubled',
'stead',
'disturbed',
'really',
'Texas',
'checking',
'averting',
'desires',
'newly',
'legislatures',
'expenditure',
'errant',
'Experiencing',
'bitterness',
'forces',
'dare',
'compress',
'unselfish',
'leadership',
'sovereignty',
'messages',
'warm',
'searching',
'unceasing',
'4',
'he',
'roaming',
'kings',
'strife',
'felicity',
'cars',
'diamonds',
'things',
'according',
'1917',
'incapable',
'Atlantic',
'prevailing',
'consul',
'act',
'advice',
'SYSTEM',
'snow',
'Middle',
'contending',
'possessing',
'lurks',
'navy',
'parents',
'rose',
'choices',
'mind',
'collisions',
'activity',
'retrospect',
'debasement',
'trust',
'delegation',
'depend',
'perfectly',
'deserts',
'finding',
'exultation',
'Subordinate',
'loveliness',
'refreshed',
'keeping',
'dying',
'appointees',
'Honoring',
'Everyone',
'wise',
'absurd',
'correspondent',
'incongruity',
'unrepealed',
'evils',
'perils',
'totalitarian',
'universe',
'pall',
'rising',
'executing',
'disgraceful',
'afloat',
'blind',
'Beach',
'lie',
'incautiously',
'rise',
'inaugurate',
'Amid',
'region',
'plainest',
'accommodations',
'Fathers',
'tentative',
'downfall',
'mental',
'fortifications',
'perfected',
'thirteenth',
'ratifications',
'ascertain',
'generate',
'religion',
'collective',
'abnormal',
'an',
'departure',
'frowning',
'imitation',
'ways',
'collapse',
'compromise',
'efforts',
'present',
'appearing',
'planted',
'reflecting',
'obvious',
'directly',
'lawless',
'defenseless',
'represent',
'throng',
'danger',
'bona',
'Considering',
'organizations',
'given',
'carries',
'Economic',
'ninth',
'republics',
'FROM',
'fundamental',
'hundred',
'unsettled',
'decide',
'Barbary',
'deferred',
'newspaper',
'animating',
'patriot',
'blessings',
'prompt',
'conform',
'awakened',
'financing',
'inexhaustible',
'mall',
'touch',
'Mr',
'superintend',
'Governments',
'repression',
'controlling',
'sets',
'Massachusetts',
'disparage',
'contend',
'heard',
'International',
'violate',
'remedy',
'plighted',
'Self',
'objections',
'hanging',
'surpassed',
'provision',
'imperfection',
'excrescence',
'consoling',
'Sermon',
'draining',
'inaction',
'unfurl',
'drift',
'considering',
'bodily',
'arbitrary',
'honored',
'permitting',
'Conceived',
'zealously',
'revealed',
'discriminating',
'know',
'Experience',
'$',
'beseeching',
'challenged',
'suffrage',
'ardor',
'acquainted',
'line',
'Eventually',
'unremittingly',
'teacher',
'yields',
'sow',
'cost',
'finally',
'loftiest',
'withheld',
'mediocrity',
'luxuries',
'remark',
'Xand',
'Indulging',
'steps',
'recoiled',
'steamship',
'frustrated',
'actions',
'died',
'uttered',
'requisite',
'restlessness',
'Service',
'await',
'backs',
'propose',
'foremost',
'essential',
'whereof',
'Presidential',
'asked',
'clerk',
'missions',
'expounded',
'Unlike',
'financial',
'receipts',
'belonging',
'slogans',
'domestic',
'computation',
'requires',
'Christmas',
'strengths',
'watching',
'addressed',
'disturbances',
'Bell',
'idealistic',
'Relying',
'adore',
'across',
'swayed',
'tribute',
'admonish',
...}
In [11]: len(_) # number of distinct tokens (_ is the previous output)
Out[11]: 9754
In [12]: quit()
$
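The two TypeErrors in the session come from bigrams returning a generator, which supports neither len() nor slicing. The sketch below reproduces that behavior without NLTK: pairwise_bigrams is a hypothetical stand-in for nltk.bigrams, and the token list is made up for illustration rather than taken from text5 or text4. It also spells out the two steps behind len(set(...)).

```python
from itertools import islice


def pairwise_bigrams(tokens):
    """Yield adjacent token pairs lazily (hypothetical stand-in for nltk.bigrams)."""
    iterator = iter(tokens)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # an empty input yields no pairs
    for current in iterator:
        yield (previous, current)
        previous = current


# Made-up sample tokens; the session above uses text5 and text4 instead.
tokens = ["now", "im", "left", "with", "this", "now", "im"]

# A generator has no len(), so count the pairs by consuming it.
bigram_count = sum(1 for _ in pairwise_bigrams(tokens))

# A generator cannot be sliced, so take a prefix with islice
# instead of building the full list first.
first_three = list(islice(pairwise_bigrams(tokens), 3))

# Step 1: set() drops duplicate tokens. Step 2: len() counts what remains.
vocabulary_size = len(set(tokens))

print(bigram_count, first_three, vocabulary_size)
```

Wrapping the generator in list(), as in the session, works just as well for a corpus of this size. islice only matters when materializing every pair would be wasteful.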