Of late, I have been exploring how to analyse customer reviews. Almost every e-commerce or booking website carries a set of customer reviews for each entity it lists: products, hotels, movies, courses, and so on.
Normally I have to scroll through at least 5–7 reviews to get a feel for what customers think, and only then do I form an opinion on the product. That opinion may still be biased, because I have read only a few of the reviews rather than all of them.
So I wanted a way to get a quick overview of the customer reviews of any product without reading tens or hundreds of them. Specifically, I want a quick look at the following:
- How many reviews are positive and how many are negative?
- What are the most discussed topics among negative reviews?
- What are the most discussed topics among positive reviews?
The obvious way to answer these questions is to apply Natural Language Processing techniques to the customer reviews.
Let’s jump into real work :)
For this exercise I took the Coursera reviews dataset from Kaggle, which has a total of 140317 reviews for 1835 courses. Below is the format of the review data.
- CourseId — Coursera course identifier
- Review — Customer Review Text
- Label — Customer Rating between 0 and 5
Courses with the maximum number of reviews
The Machine Learning course has the highest number of reviews (8570), so let's keep only the Machine Learning course reviews and analyse those.
- After ignoring reviews that contain characters outside the English alphanumeric set, we are left with 8220 reviews.
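The two filtering steps above can be sketched with pandas. This is a minimal sketch, not the exact code behind the numbers in this post: the tiny DataFrame and its course ids are made up for illustration, and treating "English only" as "every character is ASCII" is my assumption about a reasonable filter.

```python
import pandas as pd


def most_reviewed_course(df):
    """Return the CourseId that has the largest number of reviews."""
    return df["CourseId"].value_counts().idxmax()


def english_only(df):
    """Keep reviews whose text is pure ASCII (an assumed stand-in for 'English only')."""
    mask = df["Review"].astype(str).map(str.isascii)
    return df[mask].reset_index(drop=True)


# Toy data in the CourseId / Review / Label format (values are made up).
reviews = pd.DataFrame({
    "CourseId": ["machine-learning", "machine-learning", "machine-learning", "python"],
    "Review": ["Great course!", "Très bon cours", "Too hard for me", "Nice"],
    "Label": [5, 5, 2, 4],
})

top_course = most_reviewed_course(reviews)
ml_reviews = english_only(reviews[reviews["CourseId"] == top_course])
print(top_course, len(ml_reviews))
```

A stricter or looser language filter (for example a language-detection library) would change how many reviews survive this step.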
Adding Sentiment Score For Review
- We will run sentiment analysis on each review using TextBlob.
- TextBlob gives each review a sentiment score ranging from -1 (negative sentiment) to 1 (positive sentiment), with 0 being neutral.
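Scoring a review with TextBlob comes down to reading `TextBlob(text).sentiment.polarity`. Here is a rough sketch of that step plus a simple positive/negative count. The cut-off at exactly 0 and the helper names (`polarity`, `sentiment_label`, `count_labels`) are assumptions for illustration, and the example scores are made up.

```python
from collections import Counter


def polarity(text):
    """Return TextBlob's polarity score for one review, between -1.0 and 1.0."""
    # Imported here so the helpers below can be used without TextBlob installed.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def sentiment_label(score):
    """Map a polarity score to a label (assumed rule: the sign of the score)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def count_labels(scores):
    """Count how many scores fall under each sentiment label."""
    return Counter(sentiment_label(s) for s in scores)


# In practice: scores = [polarity(review) for review in review_texts]
example_scores = [0.8, 0.0, -0.3, 0.5]  # made-up scores
summary = count_labels(example_scores)
print(summary)
```

Counting labels this way gives the "how many positive and how many negative" number directly from the scores.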
Let’s see some sample Reviews With Positive Sentiment Score.
Here we see that most of the reviews say the course is awesome and also talk positively about Andrew Ng (the instructor of the Machine Learning course).
Sample Reviews With Negative Sentiment Score.
Here we see users saying that the difficulty level of the course is high and that the certificate is expensive.
Sentiment Score Distribution.
Here we see that most reviews of the Machine Learning course have neutral to positive sentiment, with only a small share of negative sentiment.
Review Rating Distribution.
Here we see that most of the ratings are 3.5 and above, so the ratings are consistent with the sentiment scores we calculated in the previous step.
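One way to check this agreement a bit more directly is to look at the average sentiment score per rating. This is only a sketch: the `polarity` column name is hypothetical (it assumes the TextBlob score was stored next to each review), and the numbers below are made up.

```python
import pandas as pd


def mean_polarity_by_rating(df):
    """Average sentiment score for each rating value, ordered by rating."""
    return df.groupby("Label")["polarity"].mean().sort_index()


# Made-up ratings and scores, only to show the shape of the check.
scored = pd.DataFrame({
    "Label": [5, 5, 4, 2, 1],
    "polarity": [0.8, 0.6, 0.5, -0.1, -0.4],
})

summary = mean_polarity_by_rating(scored)
print(summary)
```

If the ratings and the sentiment scores really agree, the average score should rise as the rating rises.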
Analysing Top Words
- Let's analyse the top words mentioned in the positive and negative reviews.
- We use CountVectorizer from scikit-learn to calculate the top n words for the review dataset.
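A minimal sketch of the top-n idea with CountVectorizer follows. The helper name `top_ngrams`, the use of scikit-learn's built-in English stop-word list, and the three toy reviews are my assumptions. Setting `n=1`, `n=2` or `n=3` gives words, bigrams or trigrams.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer


def top_ngrams(texts, n=1, k=20, stop_words="english"):
    """Return the k most frequent n-grams in texts as (term, count) pairs."""
    vectorizer = CountVectorizer(ngram_range=(n, n), stop_words=stop_words)
    counts = vectorizer.fit_transform(texts)          # sparse matrix: documents x terms
    totals = np.asarray(counts.sum(axis=0)).ravel()   # total count of each term
    pairs = [(term, int(totals[col])) for term, col in vectorizer.vocabulary_.items()]
    # Highest count first; ties are broken alphabetically so the order is stable.
    pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    return pairs[:k]


# Toy reviews (made up) to show the call.
sample = ["audio quality bad", "audio quality terrible", "video quality bad"]
top_words = top_ngrams(sample, n=1, k=3)
top_bigrams = top_ngrams(sample, n=2, k=2)
print(top_words)
print(top_bigrams)
```

Running the same function separately on the positive and the negative reviews gives one ranked list per group.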
Analysing Positive Reviews
- Top 20 words
- Top 20 Bi-grams
- Top 20 Tri-grams
So, in the positive reviews, people mostly talk about the following:
* Easily understandable course (from bigrams)
* Great/Good/Awesome course (from bigrams)
* Prof. Andrew Ng (from trigrams)
* Good introduction to machine learning course (from trigrams)
* Good explanation of machine learning techniques (from trigrams)
Analysing Negative Reviews
- Top 20 words
- Top 20 Bigrams
- Top 20 Trigrams
So, in the negative reviews, people mostly talk about the following:
- Complex Concepts/ Subject
- Audio Quality
- Video Quality
- The course is a little bit difficult
- Need to pay in order to get certification for course
- Complex Computations applied
- This course needs linear algebra background.
Further Improvements
- A few terms, such as Machine Learning and Andrew Ng, show up in both the positive and the negative reviews. This needs to be improved further to filter out such cases, which is possible future work for the next blog post.
- There are also other techniques for deriving topics from text, such as the LDA model.
- Applying sentiment analysis to reviews in other languages (here we excluded non-English reviews).
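For the first point, one simple option would be to drop every term that appears in both top lists and keep only what is distinctive for each side. This is just a sketch of that idea, not something this post has implemented. The helper name and the toy lists are made up.

```python
def distinctive_terms(positive_terms, negative_terms):
    """Drop terms that appear in both lists, keeping the original order."""
    shared = set(positive_terms) & set(negative_terms)
    only_positive = [t for t in positive_terms if t not in shared]
    only_negative = [t for t in negative_terms if t not in shared]
    return only_positive, only_negative


# Toy top-term lists (made up) with two shared terms.
positive_top = ["machine learning", "andrew ng", "easy understand"]
negative_top = ["machine learning", "andrew ng", "audio quality"]

only_positive, only_negative = distinctive_terms(positive_top, negative_top)
print(only_positive, only_negative)
```

A softer variant could compare how often a term occurs on each side instead of removing it outright.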
Git Repository
- Here is the GitHub link for the Coursera reviews data and the code.