Sentiment analysis allows us to quantify the subjectivity and polarity of a piece of text, such as a review, a comment and the like.
Subjectivity scores a phrase on a scale from fact to opinion, while polarity scores it from negative to positive.
In this article I'll show you how to use it to score TripAdvisor reviews, highlighting the real business value it can bring.
To run such an analysis on English text (and on French or German text with plugins) we can use the textblob package. It's as simple as:
from textblob import TextBlob

TextBlob("This is very good!").sentiment
TextBlob("This is very bad!").sentiment
TextBlob("This isn't that bad!").sentiment
The TextBlob "sentiment" property returns a named tuple containing polarity and subjectivity. Polarity is a number in the [-1.0, 1.0] range, where 1 is positive and -1 is negative. Subjectivity is a number in the [0.0, 1.0] range, where 0 is fact and 1 is opinion.
>>> TextBlob("This is very good!").sentiment
The polarity is 1, so the phrase was rated as very positive, and with its high subjectivity it was treated as more of an opinion than a fact.
>>> TextBlob("This is very bad!").sentiment
In this case we got a polarity of -1, so very bad indeed.
>>> TextBlob("This isn't that bad!").sentiment
With no context TextBlob still rated it negatively. What does "isn't that bad" actually mean? Better than expected? Worse than expected? It's hard to tell, and to get a more useful quantification we need sentences with enough context.
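To work with the two scores as labels rather than raw numbers, something like the following sketch could be used. The `describe` and `analyze` helpers and the 0.1 / 0.5 thresholds are my own assumptions for illustration, not part of textblob; only `TextBlob(...).sentiment` and its `polarity` and `subjectivity` fields come from the library.

```python
def describe(polarity, subjectivity):
    """Turn the two scores into rough labels.

    The cut-off values below are arbitrary assumptions; tune them to your data.
    """
    if polarity > 0.1:
        tone = "positive"
    elif polarity < -0.1:
        tone = "negative"
    else:
        tone = "neutral"
    kind = "opinion" if subjectivity >= 0.5 else "fact"
    return f"{tone} {kind}"


def analyze(text):
    """Score a text with TextBlob and return a label from describe()."""
    # Imported here so describe() stays usable without textblob installed.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment  # named tuple: (polarity, subjectivity)
    return describe(sentiment.polarity, sentiment.subjectivity)
```

`analyze` can be called on any of the phrases above, for example `analyze("This isn't that bad!")`.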
A review is a good piece of text to try TextBlob on. Beforehand I fetched a set of reviews for a few venues on TripAdvisor using third-party scrapers. This gave me a list of review texts I could feed into the library to see what results I would get. To visualize polarity I assigned red/green colors to -1/1 and varied the intensity for in-between values:
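One possible way to do such a color mapping is sketched below. It is not necessarily how the visualization in this article was produced: the function names and the choice to fade toward white for neutral values are assumptions made for the example.

```python
def polarity_to_rgb(polarity):
    """Map a polarity in [-1.0, 1.0] to an (r, g, b) tuple.

    -1.0 gives full red, 1.0 gives full green and 0.0 gives white;
    values in between fade linearly toward white (an assumed design).
    """
    polarity = max(-1.0, min(1.0, polarity))  # clamp out-of-range input
    fade = round(255 * (1 - abs(polarity)))   # 0 at the extremes, 255 at neutral
    if polarity < 0:
        return (255, fade, fade)
    return (fade, 255, fade)


def rgb_to_hex(rgb):
    """Format an (r, g, b) tuple as a CSS-style hex color."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


print(rgb_to_hex(polarity_to_rgb(-1.0)))  # prints "#ff0000"
print(rgb_to_hex(polarity_to_rgb(1.0)))   # prints "#00ff00"
```

The resulting hex string can then be used as a background color for each review row in whatever table or HTML view displays the results.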
The first marked review is negative and the user left a 1-star rating. The last one is also 1-star, but its polarity is a bit higher. Even though the customer wasn't happy, he used less negative language in his review. What's interesting is the middle marked review. It's a 5-star review, yet it has a low polarity: the customer was happy overall, but found some things he wasn't happy about. Such remarks could be valuable feedback for the venue, yet they could easily be lost among positive reviews (with a 4.5 average you look closely at the negative reviews, while a positive one may get lost among the other positive ones).
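Picking out such high-star, low-polarity reviews can be automated once each review has a star rating and a polarity score. The sketch below assumes a hypothetical list of dicts with "stars", "polarity" and "text" keys; the thresholds and the sample data are made up for illustration and are not taken from the scraped reviews.

```python
def find_mismatches(reviews, min_stars=4, max_polarity=0.1):
    """Return reviews with a high star rating but a low polarity.

    `reviews` is an iterable of dicts with "stars" and "polarity" keys
    (an assumed shape; adapt it to whatever your scraper produces).
    """
    return [
        review
        for review in reviews
        if review["stars"] >= min_stars and review["polarity"] <= max_polarity
    ]


# Made-up sample data, for illustration only.
sample = [
    {"stars": 1, "polarity": -0.6, "text": "Terrible service."},
    {"stars": 5, "polarity": 0.05, "text": "Loved it, but the room was cold."},
    {"stars": 5, "polarity": 0.8, "text": "Excellent in every way."},
]

flagged = find_mismatches(sample)
for review in flagged:
    print(review["stars"], review["polarity"], review["text"])
```

With the sample above only the second review is flagged, which is the kind of happy-but-critical feedback that is easy to miss.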
The code is on GitHub:
Full article with code commentary: