Word Analysis using Word Cloud in Python

Akash Deep
Published in Analytics Vidhya
4 min read, Aug 30, 2020

Word Cloud

In this article we will look at a very handy tool known as the word cloud and implement it in Python. This will be a very short article.

A word cloud can be used to analyse the words present in a corpus. Suppose you have a document of 2000–3000 words and want to find its most common or most repeated words. In that scenario a word cloud is a very handy tool. People generally use it to get a quick understanding of what a document is about. Similarly, if you have 2000–3000 tweets and quickly want a first impression of whether they are mostly positive or negative, the most frequent words give a useful hint. Below are the steps we will follow to implement a word cloud from scratch, starting with the installation.
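Before the steps, here is a quick illustration of what "most common words" means in practice. This is only a plain-Python sketch of the counting idea that a word cloud visualizes, not how the wordcloud package works internally. The function name most_common_words and the sample text are made up for this example.

```python
import re
from collections import Counter


def most_common_words(text, n=5):
    """Return the n most frequent lowercase words in text with their counts."""
    # Lowercase the text and pull out runs of letters (and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Hypothetical sample text, not taken from the article.
sample = "good movie good plot bad ending good cast bad"
print(most_common_words(sample, 2))  # [('good', 3), ('bad', 2)]
```

A word cloud shows the same information visually: the more often a word occurs, the larger it is drawn.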

Step 1: First we will install the wordcloud package by running the pip command below in the terminal.

pip install wordcloud

Step 2: With wordcloud installed, we will now open a Python script or notebook and import the necessary packages. Along with wordcloud, we will import matplotlib to visualize the words.

Word Cloud packages

We will try to embed the generated word cloud in a laptop image. First download the laptop image to your local machine and keep it in your project folder. You are free to choose any image whose shape you want your word cloud to take, and if you don’t want to use an image, that is also fine.

Laptop image
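Loading the image so that it can be used as a mask could be sketched as follows. The helper name load_mask and the file name laptop.jpg are placeholders for this example, not names from the original code. In the wordcloud package, pure white pixels in the mask are treated as masked out and words are drawn on the remaining area, so an image with a white background works best.

```python
import numpy as np
from PIL import Image


def load_mask(image_path):
    """Load an image file as a NumPy array that WordCloud can use as a mask."""
    # Image.open reads the file; np.array converts the pixels to an array.
    return np.array(Image.open(image_path))


# Example call, assuming a file named laptop.jpg sits in the project folder:
# mask = load_mask("laptop.jpg")
```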

Step 3: We will choose some text on which to apply the word cloud. I have chosen some sentences about our country, India, from the internet. You are free to choose any text from the internet.

Sentences on which I will apply the word cloud.
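The text used in the original screenshot is not reproduced here, so the snippet below uses a short stand-in text about India purely as a placeholder. The variable name text is also just a choice for this example. Replace the string with whatever text you want to analyse.

```python
# Stand-in text (not the author's original); replace it with your own.
text = (
    "India is a country in South Asia. "
    "New Delhi is the capital of India. "
    "India is known for its many languages, festivals and cuisines."
)

# Quick sanity check on the size of the input.
print(len(text.split()), "words")
```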

Step 4: Now we will create a word cloud function in Python and pass it all the text we downloaded from the internet. We will also remove stop words, because stop words do not add any value to our analysis. If you want to know more about text pre-processing, you can visit my article linked at the end of this article. We will put the plotting code inside the function as well, but you are free to keep the plotting code outside the word cloud function. Below is the code for the word cloud function.

Wordcloud and plotting function

If we execute the word cloud function defined above, we get the word cloud below.

Word cloud

Congratulations, we have implemented a word cloud with simple Python code. As I mentioned above, this is a very handy tool for a data scientist to apply analytics to a collection of millions or billions of words. In a future article, when we discuss sentiment analysis, we will use this word cloud concept to understand the most common words in positive tweets, and the same for negative tweets.

Feel free to share your own ideas on how you could use this simple and handy Python word cloud tool in your work. If you have any doubts about this article or any confusion about the Python code, feel free to comment below; I will be very happy to address them. In the next tutorial I will come up with more handy Python tools and functions that seem simple but are very useful in day-to-day work. If you want to know more about word embedding and tokenization using Keras, or about deep learning, I request you to visit the article below, written by me. I am attaching the code used in this tutorial.

Complete code for word cloud
