
Shun Yamada


How to extract high-frequency words in NLTK

While reading the official NLTK (Natural Language Toolkit) documentation, I tried extracting the words that appear most frequently in a sample text. In this post, I display the three most frequent words.

Development environment

  • Python
  • NLTK

Install NLTK

$ pip install nltk

Extract High-frequency words

Let's start coding. First, download punkt and averaged_perceptron_tagger, which are needed for word tokenization and part-of-speech tagging. Next, read a sample text and split it into word tokens. Then remove everything that is not a noun from the result. Finally, get the most frequent words.

Download

import nltk

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

Import nltk, then download punkt and averaged_perceptron_tagger. Once they are downloaded in your environment, you don't have to download them again.

Tokenize the text into words

# Read the sample text; the with block closes the file afterwards
with open('sample.txt') as f:
    raw = f.read()
tokens = nltk.word_tokenize(raw)
text = nltk.Text(tokens)

tokens_l = [w.lower() for w in tokens]

Prepare an essay or another long text as sample.txt. After reading the file, tokenize it into words. Then convert the tokens to lowercase so that capitalized and lowercase forms of a word are counted as the same word.
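
To see why the lowercasing step matters, here is a small stand-alone sketch. It uses a plain str.split() as a simplified stand-in for nltk.word_tokenize so that it runs without any downloads, and the sample sentence is made up for illustration.

```python
from collections import Counter

# Hypothetical sample sentence (not the article's sample.txt)
raw = "The cat saw the dog and the DOG saw The cat"

# Simplified stand-in for nltk.word_tokenize: split on whitespace
tokens = raw.split()
tokens_l = [w.lower() for w in tokens]

# Without lowercasing, "The"/"the" and "dog"/"DOG" are counted separately
counts_raw = Counter(tokens)
counts_lower = Counter(tokens_l)

print(counts_raw["the"], counts_lower["the"])
print(counts_raw["dog"], counts_lower["dog"])
```

The lowercased counts merge the variants, which is what the later frequency step relies on.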

Extract only nouns

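# Tag each token with its part of speech, then keep only the words tagged NN
pos = nltk.pos_tag(tokens_l)
only_nn = [x for (x, y) in pos if y == 'NN']

freq = nltk.FreqDist(only_nn)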

Keep only the words tagged NN (singular common nouns) and drop everything else. Then count how often each remaining word occurs.
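
The filtering step works on (word, tag) pairs. As a rough sketch, the list below is written by hand to stand in for tagger output, using Penn Treebank tags (NN singular noun, NNS plural noun, NNP proper noun, VBD past-tense verb, DT determiner). The looser startswith variant is my own assumption about what you might want, not part of the original code.

```python
# Hand-written (word, tag) pairs standing in for nltk.pos_tag output;
# the tags are Penn Treebank labels chosen by hand for illustration
pos = [
    ("the", "DT"), ("dog", "NN"), ("chased", "VBD"),
    ("cats", "NNS"), ("the", "DT"), ("dog", "NN"),
    ("tokyo", "NNP"),
]

# Exact match: singular common nouns only
only_nn = [word for (word, tag) in pos if tag == "NN"]

# Looser variant (an assumption, not the article's method):
# any tag starting with "NN" also keeps plural and proper nouns
all_nouns = [word for (word, tag) in pos if tag.startswith("NN")]

print(only_nn)
print(all_nouns)
```

Matching on NN alone skips plurals such as "cats", so the stricter filter can undercount nouns that mostly appear in plural form.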

Get the most frequent three words

print(freq.most_common(3))

After counting the words, you can get the top three with most_common(), which returns a list of (word, count) pairs.
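
nltk.FreqDist is a subclass of Python's collections.Counter, so most_common() behaves the same way on both. The sketch below uses Counter with a made-up list of nouns, so you can try the counting step without NLTK installed.

```python
from collections import Counter

# Made-up list of nouns standing in for only_nn
only_nn = [
    "data", "model", "data", "text", "model",
    "data", "word", "text", "model", "data",
]

# Count occurrences, then take the three most frequent entries
freq = Counter(only_nn)
top_three = freq.most_common(3)

print(top_three)
```

Each entry in the result is a (word, count) tuple, sorted from the most frequent word down.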
