Last updated: 2021-07-02 23:12:37
Cover
Copyright
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
Corpus and WordNet
Introduction
Accessing in-built corpora
How to do it...
Download an external corpus, load it, and access it
How it works...
Counting all the wh words in three different genres in the Brown corpus
Explore frequency distribution operations on one of the web and chat text corpus files
Take an ambiguous word and explore all its senses using WordNet
Pick two distinct synsets and explore the concepts of hyponyms and hypernyms using WordNet
Compute the average polysemy of nouns, verbs, adjectives, and adverbs according to WordNet
Raw Text Sourcing and Normalization
The importance of string operations
Getting ready…
Getting deeper with string operations
Reading a PDF file in Python
Reading Word documents in Python
Taking PDF, DOCX, and plain text files and creating a user-defined corpus from them
Read contents from an RSS feed
HTML parsing using BeautifulSoup
Pre-Processing
Tokenization – learning to use the inbuilt tokenizers of NLTK
Stemming – learning to use the inbuilt stemmers of NLTK
Lemmatization – learning to use the WordNetLemmatizer of NLTK
Stopwords – learning to use the stopwords corpus and seeing the difference it can make