Ever stared at a mountain of text and thought, “Where do I even begin?” Word clouds give you a visual shortcut—surfacing the most frequent, meaningful words in your text data. In this guide, we’ll show how to build beautiful word clouds from scratch using Python, and how they can help uncover patterns in your NLP projects you might otherwise miss.
What is a Word Cloud?
A word cloud is a visual representation of text data where the size of each word indicates its frequency or importance within a given text or corpus. The more frequently a word appears, the larger and often bolder it is displayed in the cloud.
Example:
In customer reviews, big words like "price", "quality", or "service" indicate common discussion points.
Note: Word clouds are not analytical models. They are visual aids that complement, not replace, deeper NLP tasks like classification, sentiment analysis, or topic modelling.
Install the Python library for word clouds:
pip install wordcloud
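To make the link between frequency and size concrete, here is a minimal sketch that counts words and scales each count to a font size. The function name `relative_sizes`, the size range, and the sample review text are made up for illustration. The linear scaling is an assumption and not necessarily how any particular word cloud library sizes words.

```python
from collections import Counter
import re


def relative_sizes(text, min_size=10, max_size=60):
    """Map each word to a font size that grows with its frequency."""
    # Lowercase the text and pull out runs of letters/apostrophes as words
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # Linear scaling (an assumption): the most frequent word gets max_size
    return {
        word: round(min_size + (max_size - min_size) * count / top)
        for word, count in counts.items()
    }


reviews = "great price great quality slow service great price"
sizes = relative_sizes(reviews)
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word:<8} {size}")
```

Here "great" appears most often, so it gets the largest size, followed by "price", while the words that appear once share the smallest size.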
Code
- Basic
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Sample text to visualize
text = (
    "India, officially the Republic of India Hindi: Bhārat Gaṇarājya, is a country in South Asia. "
    "It is the seventh-largest country by area, the second-most populous country, and the most populous democracy in the world. "
    "Bounded by the Indian Ocean on the south, the Arabian Sea on the southwest, and the Bay of Bengal on the southeast, "
    "it shares land borders with Pakistan to the west; China, Nepal, and Bhutan to the north; and Bangladesh and Myanmar to the east. "
    "In the Indian Ocean, India is in the vicinity of Sri Lanka and the Maldives; "
    "its Andaman and Nicobar Islands share a maritime border with Thailand, Myanmar, and Indonesia."
)

# Build the word cloud with default settings and display it
wc = WordCloud().generate(text)
plt.imshow(wc)
plt.axis('off')
plt.show()
Output:
- WordCloud().generate(text): builds the word cloud from the text passed to it and returns the WordCloud object.
- plt.imshow(wc): plt is the pyplot module from matplotlib; imshow() renders the word cloud wc as a 2D image.
- plt.axis('off'): hides the x-axis and y-axis, including ticks and labels.
- plt.show(): a function from the matplotlib.pyplot module that displays all currently active figures.
- Word Cloud without Stop Words
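import nltk
from nltk.corpus import stopwords

# One-time download of the stop word corpus (skipped if already present)
nltk.download('stopwords')

# Reuses text, plt, and WordCloud from the basic example
stopword = stopwords.words('english')
wc = WordCloud(width=1000, height=720, margin=2, max_words=100, background_color='white', stopwords=stopword)
plt.imshow(wc.generate(text))
plt.axis('off')
plt.show()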
Output:
- from nltk.corpus import stopwords: imports NLTK's stop word corpus.
- stopword=stopwords.words('english'): loads the list of English stop words, that is, very common words such as "the", "is", and "and".
- wc=WordCloud(width=1000,height=720,margin=2,max_words=100,background_color='white',stopwords=stopword): the arguments passed to WordCloud() are:
  - width=1000: width of the generated image, in pixels.
  - height=720: height of the generated image, in pixels.
  - margin=2: spacing, in pixels, kept around words in the layout.
  - max_words=100: show at most the 100 most frequent words from the text.
  - background_color='white': background color of the image (the default is black).
  - stopwords=stopword: words to exclude from the cloud.
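The combined effect of stopwords and max_words can be sketched in plain Python. This is an illustrative approximation, not the wordcloud library's actual code: the helper `top_words` and the tiny `DEMO_STOPWORDS` set are made up for this example, and NLTK's English list is much longer.

```python
from collections import Counter
import re

# Tiny hand-picked stop word set for illustration only (an assumption);
# in practice you would use a full list such as NLTK's English stop words.
DEMO_STOPWORDS = {"the", "is", "a", "in", "of", "and", "it", "by", "on", "to"}


def top_words(text, stopwords, max_words):
    """Return up to max_words (word, count) pairs, ignoring stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in stopwords]
    # most_common() sorts by count, highest first, and truncates to max_words
    return Counter(kept).most_common(max_words)


sample = "India is a country in South Asia. It is the seventh-largest country by area."
top = top_words(sample, DEMO_STOPWORDS, max_words=3)
print(top)
```

With the stop words filtered out, "country" ranks first because it is the only remaining word that appears twice, and the result never holds more than three entries.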
Learn about Part of Speech (POS)
Learn about Named Entity Recognition