Fahmi Nurfikri

Posted on • Originally published at Medium

Create Word Cloud into any Shape you want using Python

Data visualization (charts, graphs, infographics, and so on) gives businesses a valuable way to communicate important information, but what if your data is text-based? If you want a striking visualization that highlights the important points in textual data, a word cloud is a good option.

If you are not familiar with word clouds, a word cloud is a picture made up of a group of words in which the size of each word represents its frequency or importance. The bigger and bolder a word appears, the more often it is mentioned in the given text, and the more important it is assumed to be. Word clouds are easy to read and simple to understand. The key words stand out to the reader and are visually appealing to the audience.
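
To make the idea of size representing frequency concrete, here is a minimal sketch that counts words with Python's standard library. The sample sentence, the tiny stopword set, and the function name top_words are made up for illustration. The wordcloud library does its own tokenizing and scaling internally, so this only approximates the idea.

```python
import re
from collections import Counter

def top_words(text, stopwords=frozenset(), n=2):
    """Return the n most frequent words in text, ignoring stopwords."""
    # Lowercase the text and keep runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

sample = "Ice cream is sweet. Ice cream is cold. Cream melts."
print(top_words(sample, stopwords={"is"}))  # [('cream', 3), ('ice', 2)]
```

In a word cloud built from this sample, "cream" would be drawn largest, followed by "ice".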

However, you might be bored of seeing the same simple word cloud shape. What if I told you that a word cloud can also be customized to our liking? In this article, we explore how to generate a word cloud in Python in any shape you desire. So, let’s get started.

If you want to see the full code for this article, please visit my GitHub. The first step is to install the package we will use, namely wordcloud. Open your terminal (Linux / macOS) or command prompt (Windows) and type:

$ pip install wordcloud

We will start by scraping an article from the internet. If you are not familiar with web scraping, I suggest you read my previous article, Web Scraping News with 4 lines using Python.

In this post, I will scrape the Wikipedia article entitled ‘Ice Cream’ using the newspaper package (installed with pip install newspaper3k).

from newspaper import Article
article = Article('https://en.wikipedia.org/wiki/Ice_cream')
article.download()
article.parse()

We only need the text of the article, which is available as:

article.text

Simple Word Cloud

We will start by making a simple word cloud. The first step to take is to import dependencies that we will use.

from wordcloud import WordCloud
import matplotlib.pyplot as plt

Here we use the wordcloud library and matplotlib. The wordcloud library generates the word cloud, while matplotlib displays the result. After that, we create a WordCloud object, generate the cloud from the article text, and display it.

wc = WordCloud()
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()

And this is the result of the simple word cloud that we created.

Simple word cloud

In addition, the WordCloud constructor accepts parameters, including:

  • background_color = Color of the background
  • max_words = The maximum number of unique words used
  • stopwords = List of stopwords to exclude
  • max_font_size = Maximum font size
  • random_state = Seed that ensures random numbers are generated in the same order, so the results will be the same even if generated several times
  • width = Width of the output
  • height = Height of the output
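
The effect of random_state can be illustrated with Python's own random module. This is a minimal sketch, assuming (as I understand the library) that wordcloud turns an integer random_state into a seeded random.Random instance. The helper name layout_choices is hypothetical and only mimics the kind of random decisions made while laying out words.

```python
from random import Random

def layout_choices(seed, n=5):
    """Mimic n random layout decisions (e.g. horizontal vs. vertical)."""
    rng = Random(seed)  # the same seed always yields the same sequence
    return [rng.choice(["horizontal", "vertical"]) for _ in range(n)]

# Two runs with the same seed agree, which is why the cloud looks
# identical every time the same random_state is used.
print(layout_choices(42) == layout_choices(42))  # True
```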

Let’s try using the parameters above. First, let’s import the stopword list provided by the wordcloud library:

from wordcloud import STOPWORDS

Then we enter the following code:

wc = WordCloud(background_color="white", max_words=2000,
               stopwords=STOPWORDS, max_font_size=256,
               random_state=42, width=500, height=500)
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()

And this is the result.

Simple word cloud with parameter tuning

Add Custom Font

We can also change the font used. You can download fonts for personal use from the site dafont. Next, pass the path to the font file (an OTF or TTF file) through the font_path parameter.

font_path = 'path/to/font'
wc = WordCloud(stopwords=STOPWORDS, font_path=font_path, 
               background_color="white", max_words=2000,
               max_font_size=256, random_state=42,
               width=500, height=500)
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()

And this is the result

Custom font

Add Custom Mask

Next we will add a mask for the word cloud. Keep in mind that the background of the mask image must be white; otherwise, the library will treat the background as part of the shape to fill. In addition, the background cannot be transparent, because transparent pixels will be treated as black. I will use the following image as a mask.

Image mask
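
Why does a transparent background cause trouble? Here is a minimal sketch of the idea, using plain tuples instead of a real image: a fully transparent pixel usually still stores some RGB value (often black), and once the alpha channel is ignored only that value remains. Compositing each pixel over white first avoids this. The function name flatten_on_white is hypothetical and is not part of wordcloud or Pillow.

```python
def flatten_on_white(pixel):
    """Composite one RGBA pixel (channels 0-255) over a white background."""
    r, g, b, a = pixel
    alpha = a / 255
    # Standard "over" blend: foreground weighted by alpha, white by the rest.
    return tuple(round(c * alpha + 255 * (1 - alpha)) for c in (r, g, b))

transparent_black = (0, 0, 0, 0)   # invisible, but its RGB part is black
opaque_black = (0, 0, 0, 255)      # part of the shape to fill

print(flatten_on_white(transparent_black))  # (255, 255, 255)
print(flatten_on_white(opaque_black))       # (0, 0, 0)
```

With Pillow, a similar result for a whole image can be had by compositing it over a white RGBA image with Image.alpha_composite before converting it to an array.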

We need to add some dependencies to load the image.

from PIL import Image
import numpy as np

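Next, load the image as an array and pass it to the mask parameter. When a mask is given, wordcloud takes the output size from the mask, so the width and height arguments here are only kept for consistency.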

mask = np.array(Image.open('path/to/image'))
wc = WordCloud(stopwords=STOPWORDS, font_path=font_path,
               mask=mask, background_color="white",
               max_words=2000, max_font_size=256,
               random_state=42, width=mask.shape[1],
               height=mask.shape[0])
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()

And this is the result.

Masked word cloud

Adjust Colors

We can also adjust the colors used in the word cloud. We are free to choose any colors we like, but in this article I will cover a few commonly used approaches. First, we will use just one color. To do that, we must define a function to use as the color function.

def one_color_func(word=None, font_size=None,
                   position=None, orientation=None,
                   font_path=None, random_state=None):
    h = 160  # hue, 0 - 360
    s = 100  # saturation, 0 - 100
    l = 50  # lightness, 0 - 100
    return "hsl({}, {}%, {}%)".format(h, s, l)

The color format used is HSL (hue, saturation, lightness). For more details, visit HSL Color Picker to explore the colors. Then, to build the word cloud, all we have to do is pass the function we created to the color_func parameter.

wc = WordCloud(stopwords=STOPWORDS, font_path=font_path,
               mask=mask, background_color="white",
               max_words=2000, max_font_size=256,
               random_state=42, width=mask.shape[1],
               height=mask.shape[0], color_func=one_color_func)
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()

And the image will appear like this.

One color
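
To see which RGB color an HSL triple corresponds to without opening a color picker, the standard library's colorsys module can do the conversion. This is a small sketch, and the helper name hsl_to_hex is my own. Note that colorsys works with fractions between 0 and 1 and orders its arguments as hue, lightness, saturation.

```python
import colorsys

def hsl_to_hex(h, s, l):
    """Convert HSL (h in 0-360, s and l in 0-100) to an #rrggbb string."""
    # colorsys expects values in 0-1, in the order hue, lightness, saturation.
    r, g, b = colorsys.hls_to_rgb(h / 360, l / 100, s / 100)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255))

print(hsl_to_hex(160, 100, 50))  # #00ffaa
```

So the single color used above is a bright green with a hint of blue.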

Apart from that, we can also produce similar colors by randomizing within a certain range. I will randomize the lightness to vary the brightness of the colors.

def similar_color_func(word=None, font_size=None,
                       position=None, orientation=None,
                       font_path=None, random_state=None):
    h = 40  # hue, 0 - 360
    s = 100  # saturation, 0 - 100
    l = random_state.randint(30, 70)  # lightness, random within 30 - 70
    return "hsl({}, {}%, {}%)".format(h, s, l)

Then, the same as before, pass the function to the color_func parameter.

wc = WordCloud(stopwords=STOPWORDS, font_path=font_path,
               mask=mask, background_color="white",
               max_words=2000, max_font_size=256,
               random_state=42, width=mask.shape[1],
               height=mask.shape[0], color_func=similar_color_func)
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()

And the result will be like this.

Similar colors
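
A color function like this can also be checked on its own, without rendering a cloud. The sketch below assumes that wordcloud passes a random.Random instance as random_state, so a seeded Random is used here as a stand-in. The word list is arbitrary.

```python
from random import Random

def similar_color_func(word=None, font_size=None,
                       position=None, orientation=None,
                       font_path=None, random_state=None):
    h = 40   # hue, fixed
    s = 100  # saturation, fixed
    l = random_state.randint(30, 70)  # randint includes both 30 and 70
    return "hsl({}, {}%, {}%)".format(h, s, l)

rng = Random(42)  # stand-in for the generator wordcloud would pass in
for word in ["ice", "cream", "vanilla"]:
    # Each word gets the same hue with a different lightness.
    print(word, similar_color_func(word, random_state=rng))
```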

In addition, we can define a palette of several colors to pick from. For example:

def multi_color_func(word=None, font_size=None,
                     position=None, orientation=None,
                     font_path=None, random_state=None):
    # Palette of [hue, saturation, lightness] triples
    colors = [[4, 77, 82],
              [25, 74, 85],
              [82, 43, 84],
              [158, 48, 79]]
    # Pick a random index into the palette
    rand = random_state.randint(0, len(colors) - 1)
    return "hsl({}, {}%, {}%)".format(colors[rand][0], colors[rand][1], colors[rand][2])

Then pass the function to the color_func parameter.

wc = WordCloud(stopwords=STOPWORDS, font_path=font_path,
               mask=mask, background_color="white",
               max_words=2000, max_font_size=256,
               random_state=42, width=mask.shape[1],
               height=mask.shape[0], color_func=multi_color_func)
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()

And the result will be like this.

Multi colors

Last but not least, we can generate colors based on the mask image. For this we need the ImageColorGenerator class provided by the wordcloud library.

from wordcloud import ImageColorGenerator

Then create the color generator from the mask and pass it to the color_func parameter.

mask_colors = ImageColorGenerator(mask)
wc = WordCloud(stopwords=STOPWORDS, font_path=font_path,
               mask=mask, background_color="white",
               max_words=2000, max_font_size=256,
               random_state=42, width=mask.shape[1],
               height=mask.shape[0], color_func=mask_colors)
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()

This is the final result.

Generated Colors

As we can see, the color of the word cloud follows the color of the original image.
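
As I understand it, ImageColorGenerator colors each word with the average color of the image region that the word covers. The following is a simplified sketch of that averaging on a nested list of RGB tuples. The function name average_color and the tiny 2x2 image are made up for illustration; the real class works on NumPy arrays and on the word's actual bounding box.

```python
def average_color(image, top, left, height, width):
    """Average the RGB values of a rectangular patch of a nested-list image."""
    patch = [image[y][x]
             for y in range(top, top + height)
             for x in range(left, left + width)]
    n = len(patch)
    # Average each channel separately and round to whole 0-255 values.
    return tuple(round(sum(p[c] for p in patch) / n) for c in range(3))

# A 2x2 image: a red and a blue pixel on the top row, white on the bottom row.
image = [[(200, 0, 0), (0, 0, 100)],
         [(255, 255, 255), (255, 255, 255)]]

print(average_color(image, top=0, left=0, height=1, width=2))  # (100, 0, 50)
```

A word placed over the top row would therefore come out as a dark purple, a blend of the two pixels beneath it.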


That is all for this post. I hope you learned something new from it. If you have other opinions, please share them in the comments. In the future, I will look at using word clouds for text analysis.
