Vishnu Chilamakuru

How to Analyze Customer Reviews

Lately, I have been exploring how to analyse customer reviews. Almost every e-commerce or booking website carries a set of customer reviews for each entity it lists: products, hotels, movies, courses, and so on.

Normally I have to scroll through at least 5–7 reviews to get a feel for what customers think, and only then do I form an opinion on the product. That opinion may still be biased, because I read only a few of the reviews rather than all of them.

So I wanted a way to get a quick overview of a product's customer reviews without reading tens or hundreds of them. Specifically, I want to know the following:

  1. How many reviews are positive and how many are negative?
  2. What are the most discussed topics among negative reviews?
  3. What are the most discussed topics among positive reviews?

The obvious way to answer these questions is to apply Natural Language Processing (NLP) techniques to the customer reviews.

 Let’s jump into real work :)

For this exercise I used the Coursera reviews dataset from Kaggle, which has a total of 140317 reviews for 1835 courses. Below is the format of the reviews data.

  • CourseId — Coursera course identifier
  • Review — Customer Review Text
  • Label — Customer Rating between 0 and 5
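A minimal sketch of loading data in this shape with pandas. The column names CourseId, Review and Label come from the description above; the inline sample rows and the file name mentioned in the comment are made up for illustration and are not taken from the dataset.

```python
import io

import pandas as pd

# Made-up rows in the CourseId / Review / Label layout described above.
SAMPLE_CSV = """CourseId,Review,Label
machine-learning,Great course and a great instructor,5
machine-learning,The audio quality is poor,2
python-basics,Easy to follow,4
"""


def load_reviews(source):
    """Read reviews from a CSV path or file-like object, dropping rows with no review text."""
    reviews = pd.read_csv(source)
    return reviews.dropna(subset=["Review"]).reset_index(drop=True)


reviews = load_reviews(io.StringIO(SAMPLE_CSV))
# With the real data this would be a file path instead,
# e.g. load_reviews("reviews.csv") (hypothetical file name).
print(reviews.shape)
```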

Courses with the maximum number of reviews

The Machine Learning course has the highest number of reviews (8570), so let's keep only the reviews for that course and analyse them.
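One way this counting and filtering step could look in pandas is sketched below. The helper names and the tiny sample frame are illustrative assumptions, not the code from the original notebook.

```python
import pandas as pd


def top_courses(reviews, n=5):
    """Return the n courses with the most reviews (course id -> review count)."""
    return reviews["CourseId"].value_counts().head(n)


def reviews_for_course(reviews, course_id):
    """Keep only the rows that belong to a single course."""
    return reviews[reviews["CourseId"] == course_id].reset_index(drop=True)


# Made-up sample: three reviews for one course, two for another, one for a third.
sample = pd.DataFrame({
    "CourseId": ["machine-learning"] * 3 + ["python-basics"] * 2 + ["data-science"],
    "Review": ["r1", "r2", "r3", "r4", "r5", "r6"],
})

counts = top_courses(sample, n=2)
most_reviewed = counts.index[0]  # value_counts sorts by count, largest first
course_reviews = reviews_for_course(sample, most_reviewed)
print(most_reviewed, len(course_reviews))
```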

  • After dropping reviews that contain characters outside the English alphanumeric set, we are left with 8220 reviews.
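The post does not show the exact filter, so the following is only one plausible reading: keep a review if all of its characters are ASCII and it contains at least one letter. This is a rough proxy for "English". It drops English reviews that contain emoji or curly quotes, and it keeps reviews in other languages that happen to be written with plain ASCII letters.

```python
def looks_english(text):
    """Rough check: every character is ASCII and at least one is a letter."""
    return text.isascii() and any(ch.isalpha() for ch in text)


def keep_english_like(texts):
    """Return only the reviews that pass the rough ASCII check."""
    return [text for text in texts if looks_english(text)]


sample = ["Great course!", "很好的课程", "Très bien", "12345"]
print(keep_english_like(sample))
```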

Adding Sentiment Score For Review

  • We run sentiment analysis on each review using TextBlob.
  • TextBlob gives each review a sentiment (polarity) score ranging from -1 (negative sentiment) to 1 (positive sentiment), with 0 being neutral.
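The scoring step can be sketched as follows, assuming the textblob package is installed. `TextBlob(text).sentiment.polarity` is TextBlob's polarity score in the range -1 to 1; the helper names `textblob_polarity` and `score_reviews` are made up for this example, and the scorer is passed in as a parameter so that another scoring function could be swapped in.

```python
def textblob_polarity(text):
    """Polarity score in [-1.0, 1.0] from TextBlob's default analyzer."""
    # Imported here so score_reviews can also be used with a different scorer
    # when textblob is not installed.
    from textblob import TextBlob

    return TextBlob(text).sentiment.polarity


def score_reviews(texts, scorer=textblob_polarity):
    """Pair every review text with its sentiment score."""
    return [(text, scorer(text)) for text in texts]
```

Calling `score_reviews(["This course is awesome", "The audio is terrible"])` would return each review together with its TextBlob polarity.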

Let's look at some sample reviews with a positive sentiment score.

Most of these reviews say the course is awesome and also speak positively about Andrew Ng (the instructor of the machine learning course).

Sample Reviews With Negative Sentiment Score.

Here the reviewers say that the difficulty level of the course is high and that the certificate is expensive.

Sentiment Score Distribution.

Most of the reviews for the machine learning course have neutral to positive sentiment; only a small share are negative.
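To turn the scores into the "how many positive and negative reviews" count from the start of the post, they can be bucketed by sign. The sketch below treats a score of exactly 0 as neutral; the optional `threshold` parameter, which widens the neutral band, is an assumption of this example and not something the post specifies.

```python
from collections import Counter


def sentiment_label(score, threshold=0.0):
    """Map a polarity score to 'positive', 'negative' or 'neutral'."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def sentiment_counts(scores, threshold=0.0):
    """Count how many scores fall into each sentiment bucket."""
    return Counter(sentiment_label(score, threshold) for score in scores)


print(sentiment_counts([0.8, 0.5, 0.0, -0.3]))
```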

Review Rating Distribution.

Most of the ratings are 3.5 and above, so the rating distribution is consistent with the sentiment scores calculated in the previous step.

Analysing Top Words

  • Let's analyse the top words mentioned in the positive and negative reviews.
  • We use CountVectorizer from scikit-learn to calculate the top n words for the review dataset.
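A minimal sketch of counting the top n-grams with CountVectorizer. Setting `ngram_range=(n, n)` selects words (n=1), bigrams (n=2) or trigrams (n=3). Removing English stop words is an assumption of this example, as are the helper name `top_ngrams` and the sample sentences.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer


def top_ngrams(texts, n=1, top_k=20, stop_words="english"):
    """Return the top_k most frequent n-grams as (term, count) pairs."""
    vectorizer = CountVectorizer(ngram_range=(n, n), stop_words=stop_words)
    counts = vectorizer.fit_transform(texts)         # sparse matrix: documents x n-grams
    totals = np.asarray(counts.sum(axis=0)).ravel()  # total occurrences of each n-gram
    ranked = sorted(
        ((term, int(totals[col])) for term, col in vectorizer.vocabulary_.items()),
        key=lambda pair: (-pair[1], pair[0]),        # highest count first, ties alphabetical
    )
    return ranked[:top_k]


sample = [
    "audio quality poor",
    "audio quality low",
    "video quality",
]
print(top_ngrams(sample, n=1, top_k=2))
print(top_ngrams(sample, n=2, top_k=1))
```

Running the same function separately on the positive and the negative reviews gives the two sets of top words, bigrams and trigrams.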

Analysing Positive Reviews

  • Top 20 words
  • Top 20 bigrams
  • Top 20 trigrams

So, the positive reviews mostly talk about the following:

* Easily understandable course (from bigrams)
* Great/good/awesome course (from bigrams)
* Prof. Andrew Ng (from trigrams)
* Good introduction to machine learning (from trigrams)
* Good explanation of machine learning techniques (from trigrams)
Analysing Negative Reviews

  • Top 20 words
  • Top 20 bigrams
  • Top 20 trigrams

So, the negative reviews mostly talk about the following:

- Complex concepts/subject
- Audio quality
- Video quality
- The course is a little bit difficult
- Need to pay to get the certification for the course
- Complex computations applied
- The course needs a linear algebra background

Further Improvements

  • The results contain a few terms, such as Machine Learning and Andrew Ng, that repeat in both the positive and the negative reviews. The analysis needs to be improved to filter out cases like this; that is possible future work for the next blog post.
  • There are also other techniques for deriving topics from text, such as the LDA model.
  • Sentiment analysis could be applied to language-specific reviews (here we excluded non-English reviews).
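As a rough idea of how the first improvement might work, the sketch below drops any term that appears in both top lists, so only the terms distinctive to each side remain. This is an assumption about one possible approach, not the author's planned solution, and the sample counts are invented.

```python
def distinctive_terms(positive_top, negative_top):
    """Remove terms that occur in both lists of (term, count) pairs."""
    shared = {term for term, _ in positive_top} & {term for term, _ in negative_top}
    only_positive = [(term, count) for term, count in positive_top if term not in shared]
    only_negative = [(term, count) for term, count in negative_top if term not in shared]
    return only_positive, only_negative


# Invented counts, for illustration only.
positive_top = [("machine learning", 50), ("great course", 40), ("andrew ng", 30)]
negative_top = [("machine learning", 12), ("audio quality", 8), ("andrew ng", 5)]

only_positive, only_negative = distinctive_terms(positive_top, negative_top)
print(only_positive, only_negative)
```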

Git Repository

  • Here is the GitHub link for the Coursera reviews data and the code.
