DEV Community

Jadieljade

Getting started with Sentiment Analysis.

Hello and welcome back. Today's article, as the title suggests, is about sentiment analysis. Sentiment analysis is a broad and interesting topic, so I am going to break it down into two articles. This one is more of a documentation of my understanding of the technique; in the next one we can have fun with a dataset, designing and training a model. Without further ado, let's jump in.

Sentiment analysis (or opinion mining) is a natural language processing (NLP) technique used to determine whether data is positive, negative, or neutral. It relies on NLP and machine learning algorithms to automatically determine the emotional tone behind online conversations.

Sentiment analysis is often performed on textual data to help businesses detect sentiment in social data, gauge brand reputation, and understand customers. It is the computational treatment of opinions, sentiment, and subjectivity of text.

Sentiment analysis focuses on the polarity of a text (positive, negative, neutral) but it also goes beyond polarity to detect specific feelings and emotions (angry, happy, sad, etc.), urgency (urgent, not urgent), and even intentions (interested v. not interested). Depending on how you want to interpret customer feedback and queries, you can define and tailor your categories to meet your sentiment analysis needs.

Vendors that offer sentiment analysis platforms include Brandwatch, Critical Mention, Hootsuite, Lexalytics, Meltwater, MonkeyLearn, NetBase Quid, Sprout Social, Talkwalker and Zoho.

Types of Sentiment Analysis

Graded Sentiment Analysis
If polarity precision is important to the business, one might consider expanding the polarity categories to include different levels of positive and negative, e.g. very positive, positive, neutral, negative, and very negative.
This is usually referred to as graded or fine-grained sentiment analysis.
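
As a minimal sketch of the idea, suppose some model has already produced a polarity score between -1.0 and 1.0 for a text. The function name and the thresholds below are assumptions made up for illustration, not a standard; a real system would tune them on labeled data.

```python
def grade_sentiment(score: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to one of five graded labels."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    # Thresholds are illustrative assumptions, not an established convention.
    if score <= -0.6:
        return "very negative"
    if score <= -0.2:
        return "negative"
    if score < 0.2:
        return "neutral"
    if score < 0.6:
        return "positive"
    return "very positive"


print(grade_sentiment(0.9))   # very positive
print(grade_sentiment(-0.3))  # negative
```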

Emotion detection
Emotion detection sentiment analysis allows you to go beyond polarity to detect emotions, like happiness, frustration, anger, and sadness.
Many emotion detection systems use lexicons (i.e. lists of words and the emotions they convey) or complex machine learning algorithms.
One of the downsides of using lexicons is that people express emotions in different ways. Some words that typically express anger, like bad or kill (e.g. your product is so bad or your customer support is killing me) might also express happiness (e.g. this is badass or you are killing it).
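
The lexicon approach and its weakness can be sketched in a few lines. The lexicon entries and the function name here are hypothetical and only serve the example; real emotion lexicons contain thousands of entries.

```python
import re

# Toy emotion lexicon (hypothetical entries, for illustration only).
EMOTION_LEXICON = {
    "bad": "anger",
    "killing": "anger",
    "love": "happiness",
    "great": "happiness",
    "sad": "sadness",
}


def detect_emotions(text: str) -> list[str]:
    """Return the emotion of every lexicon word found in the text, in order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON]


print(detect_emotions("Your customer support is killing me"))  # ['anger']
print(detect_emotions("You are killing it"))  # ['anger'], although the speaker is pleased
```

Both sentences get the same label because the lookup only sees the word, not how it is used.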

Aspect-based Sentiment Analysis
Usually, when analyzing sentiments of texts you’ll want to know which particular aspects or features people are mentioning in a positive, neutral, or negative way.
That's where aspect-based sentiment analysis can help. For example, in the product review "The battery life of this camera is too short", an aspect-based classifier would be able to determine that the sentence expresses a negative opinion about the battery life of the product in question.
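
Here is a very rough sketch of the idea, assuming a fixed list of aspects and a small opinion lexicon (both made up for this example). It simply attaches the polarity of a sentence to any aspect named in that sentence; real aspect-based systems are usually trained models that are far more careful about which opinion belongs to which aspect.

```python
import re

# Hypothetical aspect terms and opinion words, chosen for this example only.
ASPECTS = ["battery life", "lens", "price"]
OPINION_WORDS = {"short": -1, "poor": -1, "expensive": -1, "sharp": 1, "great": 1}


def aspect_sentiments(review: str) -> dict[str, str]:
    """Label each aspect mentioned in a review using opinion words in the same sentence."""
    results = {}
    for sentence in re.split(r"[.!?]+", review.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        score = sum(OPINION_WORDS.get(t, 0) for t in tokens)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for aspect in ASPECTS:
            if aspect in sentence:  # plain substring match, good enough for a sketch
                results[aspect] = label
    return results


print(aspect_sentiments("The battery life of this camera is too short. The lens is sharp."))
# {'battery life': 'negative', 'lens': 'positive'}
```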

Multilingual sentiment analysis
Multilingual sentiment analysis can be difficult. It involves a lot of pre-processing and resources. Most of these resources are available online (e.g. sentiment lexicons), while others need to be created (e.g. translated corpora or noise detection algorithms), but you’ll need to know how to code to use them.
Alternatively, you could detect the language in texts automatically with a language classifier, then train a custom sentiment analysis model to classify texts in the language of your choice.

Why Sentiment Analysis is important

Since humans express their thoughts and feelings more openly than ever before, sentiment analysis is fast becoming an essential tool to monitor and understand sentiment in all types of data.
Automatically analyzing customer feedback, such as opinions in survey responses and social media conversations, allows brands to learn what makes customers happy or frustrated so that they can tailor products and services to meet their customers’ needs.

The overall benefits of sentiment analysis include:

  • Sorting Data at Scale. Can you imagine manually sorting through thousands of tweets, customer support conversations, or surveys? There’s just too much business data to process manually. Sentiment analysis helps businesses process huge amounts of unstructured data in an efficient and cost-effective way.
  • Real-Time Analysis. Sentiment analysis can identify critical issues in real time. For example, is a PR crisis on social media escalating? Is an angry customer about to churn? Sentiment analysis models can help you immediately identify these kinds of situations, so you can take action right away.
  • Consistent criteria. It’s estimated that people only agree around 60-65% of the time when determining the sentiment of a particular text. Tagging text by sentiment is highly subjective, influenced by personal experiences, thoughts, and beliefs.

By using a centralized sentiment analysis system, companies can apply the same criteria to all of their data, helping them improve accuracy and gain better insights.

How does sentiment analysis work?

Sentiment analysis uses machine learning models to perform text analysis of human language. The metrics used are designed to detect whether the overall sentiment of a piece of text is positive, negative or neutral.
Sentiment analysis generally follows these steps:

  1. Collect data- The text to be analyzed is identified and collected, often with a web scraping bot or a scraping application programming interface (API).
  2. Clean the data- The data is processed and cleaned to remove noise and parts of speech that carry no meaning relevant to the sentiment of the text. This typically means expanding contractions such as I'm, removing words that carry little information (such as is, or articles such as the), stripping punctuation, URLs and special characters, and lowercasing capital letters. This is referred to as standardizing.
  3. Extract features- A machine learning algorithm automatically extracts text features to identify negative or positive sentiment. ML approaches used include the bag-of-words technique that tracks the occurrence of words in a text and the more nuanced word-embedding technique that uses neural networks to analyze words with similar meanings.
  4. Pick an ML model- A sentiment analysis tool scores the text using a rule-based, automatic or hybrid ML model. Rule-based systems perform sentiment analysis based on predefined, lexicon-based rules and are often used in domains such as law and medicine where a high degree of precision and human control is needed. Automatic systems use ML and deep learning techniques to learn from data sets. A hybrid model combines both approaches and is generally thought to be the most accurate model. These models offer different approaches to assigning sentiment scores to pieces of text.
  5. Sentiment classification- Once a model is picked and used to analyze a piece of text, it assigns the text a sentiment score or label, such as positive, negative or neutral. Organizations can also decide to view the results of their analysis at different levels, including document level, which pertains mostly to professional reviews and coverage; sentence level for comments and customer reviews; and sub-sentence level, which identifies phrases or clauses within sentences.
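
Steps 2 and 3 above can be sketched with nothing but the standard library. The stop-word list and the contraction table are tiny, made-up samples, and the function names are my own; libraries such as NLTK or spaCy ship much more complete versions.

```python
import re
from collections import Counter

# Tiny illustrative lists; real stop-word and contraction lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "this", "it", "i", "am", "of", "to"}
CONTRACTIONS = {"i'm": "i am", "don't": "do not", "it's": "it is"}


def clean(text: str) -> list[str]:
    """Lowercase, strip URLs, expand contractions, drop punctuation and stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    tokens = re.findall(r"[a-z]+", text)  # keeps letters only, so punctuation disappears
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(tokens: list[str]) -> Counter:
    """Bag-of-words features: how often each word occurs, ignoring word order."""
    return Counter(tokens)


tokens = clean("I'm SO happy with this phone, it's so good! https://example.com")
print(tokens)  # ['so', 'happy', 'with', 'phone', 'so', 'good']
print(bag_of_words(tokens)["so"])  # 2
```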

Sentiment Analysis Algorithms
Sentiment analysis algorithms fall into one of three buckets:

1. Rule-based Approaches
These systems automatically perform sentiment analysis based on a set of manually crafted rules. Usually, a rule-based system uses a set of human-crafted rules to help identify subjectivity, polarity, or the subject of an opinion. These rules may include various NLP techniques developed in computational linguistics, such as:

  • Stemming, tokenization, part-of-speech tagging and parsing.
  • Lexicons (i.e. lists of words and expressions).

Here’s a basic example of how a rule-based system works:

  • Defines two lists of polarized words (e.g. negative words such as bad, worst, ugly, etc and positive words such as good, best, beautiful, etc).
  • Counts the number of positive and negative words that appear in a given text.
  • If the number of positive word appearances is greater than the number of negative word appearances, the system returns a positive sentiment, and vice versa. If the numbers are even, the system will return a neutral sentiment.
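
The three steps above translate almost directly into code. This is a minimal sketch using the example words from the list; the function name is my own.

```python
import re

# Step 1: two lists of polarized words (taken from the example above).
NEGATIVE_WORDS = {"bad", "worst", "ugly"}
POSITIVE_WORDS = {"good", "best", "beautiful"}


def rule_based_sentiment(text: str) -> str:
    """Classify a text by comparing counts of positive and negative words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Step 2: count positive and negative word appearances.
    positives = sum(t in POSITIVE_WORDS for t in tokens)
    negatives = sum(t in NEGATIVE_WORDS for t in tokens)
    # Step 3: the larger count wins; a tie is neutral.
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


print(rule_based_sentiment("The best camera, beautiful photos"))  # positive
print(rule_based_sentiment("This is not good"))  # positive, because word order is ignored
```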

Rule-based systems are very naive since they don't take into account how words are combined in a sequence. Of course, more advanced processing techniques can be used, and new rules added to support new expressions and vocabulary. However, adding new rules may affect previous results, and the whole system can get very complex. Since rule-based systems often require fine-tuning and maintenance, they’ll also need regular investments.

2. Automatic Approaches
Automatic methods, contrary to rule-based systems, don't rely on manually crafted rules, but on machine learning techniques to learn from data. A sentiment analysis task is usually modeled as a classification problem, whereby a classifier is fed a text and returns a category, e.g. positive, negative, or neutral.

Here’s how a machine learning classifier can be implemented:

  • The Training and Prediction Processes. In the training process, our model learns to associate a particular input (i.e. a text) with the corresponding output based on the labeled samples used for training. The feature extractor transforms the text input into a feature vector. Pairs of feature vectors and tags (e.g. positive, negative, or neutral) are fed into the machine learning algorithm to generate a model. In the prediction process, the feature extractor is used to transform unseen text inputs into feature vectors. These feature vectors are then fed into the model, which generates predicted tags (again, positive, negative, or neutral).
  • Feature Extraction from Text. The first step in a machine learning text classifier is to transform the text into a numerical representation, usually a vector. This is known as feature extraction or text vectorization, and the classical approach has been bag-of-words or bag-of-ngrams with their frequency. More recently, new feature extraction techniques have been applied based on word embeddings (also known as word vectors). This kind of representation makes it possible for words with similar meaning to have a similar representation, which can improve the performance of classifiers.
  • Classification Algorithms. The classification step usually involves a statistical model like Naïve Bayes, Logistic Regression, Support Vector Machines, or Neural Networks:
    Naïve Bayes: a family of probabilistic algorithms that uses Bayes’s Theorem to predict the category of a text.
    Logistic Regression: a very well-known algorithm in statistics used to predict the probability of a category (Y) given a set of features (X).
    Support Vector Machines: a non-probabilistic model which uses a representation of text examples as points in a multidimensional space. Examples of different categories (sentiments) are mapped to distinct regions within that space. Then, new texts are assigned a category based on similarities with existing texts and the regions they’re mapped to.
    Deep Learning: a diverse set of algorithms that attempt to mimic the human brain, by employing artificial neural networks to process data.
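
To make the training and prediction processes concrete, here is a small from-scratch sketch: a bag-of-words feature extractor feeding a multinomial Naïve Bayes classifier with add-one smoothing. The class name, the four training sentences and their tags are invented for the example. In practice you would more likely reach for a library such as scikit-learn and a much larger dataset.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text: str) -> Counter:
    """Bag-of-words feature extractor: maps each word to its count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, tags):
        # Training: count documents per tag and words per tag.
        self.tag_counts = Counter(tags)
        self.word_counts = defaultdict(Counter)
        for text, tag in zip(texts, tags):
            self.word_counts[tag].update(extract_features(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Prediction: pick the tag with the highest log-probability.
        features = extract_features(text)
        total_docs = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, doc_count in self.tag_counts.items():
            counts = self.word_counts[tag]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for word, n in features.items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                likelihood = (counts[word] + 1) / (total_words + len(self.vocab))
                score += n * math.log(likelihood)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# Made-up training data, far too small for real use.
train_texts = [
    "I love this phone",
    "great battery and great screen",
    "I hate this phone",
    "terrible battery and awful screen",
]
train_tags = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_tags)
print(model.predict("great phone, love it"))  # positive
print(model.predict("awful, I hate it"))      # negative
```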

3. Hybrid Approaches
Hybrid systems combine the desirable elements of rule-based and automatic techniques into one system. One huge benefit of these systems is that results are often more accurate.
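
There are many ways to combine the two. One possible arrangement, sketched below under my own assumptions, trusts the lexicon rule when it finds enough evidence and otherwise defers to a trained model. The `min_hits` parameter and the `stub_model` stand-in are hypothetical names for this example; in a real system the model would be a trained classifier.

```python
import re
from typing import Callable

# Small polarity lexicon (illustrative entries only).
LEXICON = {"good": 1, "best": 1, "beautiful": 1, "bad": -1, "worst": -1, "ugly": -1}


def hybrid_sentiment(text: str, model: Callable[[str], str], min_hits: int = 2) -> str:
    """Use the lexicon rule when it has enough evidence, otherwise defer to the model."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    score = sum(hits)
    if len(hits) >= min_hits and score != 0:
        return "positive" if score > 0 else "negative"
    return model(text)  # too little or conflicting evidence: ask the learned model


def stub_model(text: str) -> str:
    """Stand-in for a trained classifier; always answers 'neutral'."""
    return "neutral"


print(hybrid_sentiment("The best and most beautiful screen", stub_model))  # positive
print(hybrid_sentiment("It arrived on Tuesday", stub_model))               # neutral
```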

Sentiment Analysis Challenges
Sentiment analysis is one of the hardest tasks in natural language processing because even humans struggle to analyze sentiments accurately.

Data scientists are getting better at creating more accurate sentiment classifiers, but there’s still a long way to go. Let’s take a closer look at some of the main challenges of machine-based sentiment analysis:

  • Subjectivity and Tone
  • Context and Polarity

Every utterance is made at some point in time, in some place, by and to some people. In other words, all utterances are uttered in context, and analyzing sentiment without context gets pretty difficult. However, machines cannot learn about context unless it is mentioned explicitly. One of the problems that arises from context is changes in polarity.

  • Irony and Sarcasm. When it comes to irony and sarcasm, people express their negative sentiments using positive words, which can be difficult for machines to detect without having a thorough understanding of the context of the situation in which a feeling was expressed.
  • Comparisons. How to treat comparisons in sentiment analysis is another challenge worth tackling. Look at the texts below:
    This product is second to none.
    This is better than older tools.
    This is better than nothing.

    The first comparison doesn’t need any contextual clues to be classified correctly. It’s clear that it’s positive.
    The second and third texts are a little more difficult to classify, though. Would you classify them as neutral, positive, or even negative? Once again, context can make a difference. For example, if the ‘older tools’ in the second text were considered useless, then the second text is pretty similar to the third text.

  • Emojis. According to Guibon et al., there are two types of emojis. Western emojis (e.g. :D) are encoded in only one or two characters, whereas Eastern emojis (e.g. ¯\_(ツ)_/¯) are a longer combination of characters of a vertical nature. Emojis play an important role in the sentiment of texts, particularly in tweets. You’ll need to pay special attention to the character level, as well as the word level, when performing sentiment analysis on tweets. A lot of preprocessing might also be needed. For example, you might want to preprocess social media content and transform both Western and Eastern emojis into tokens and whitelist them (i.e. always take them as a feature for classification purposes) in order to help improve sentiment analysis performance.
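
The preprocessing idea for emojis could look roughly like this. The emoticon table and the token names (`EMO_LAUGH` and so on) are invented for the example; the point is only that each emoticon becomes a single token that later steps can keep as a feature.

```python
# Hypothetical emoticon-to-token map; extend it for your own data.
EMOTICON_TOKENS = {
    ":D": "EMO_LAUGH",
    ":)": "EMO_SMILE",
    ":(": "EMO_SAD",
    "¯\\_(ツ)_/¯": "EMO_SHRUG",
}


def tokenize_with_emoticons(text: str) -> list[str]:
    """Replace known emoticons with placeholder tokens, then split on whitespace."""
    # Replace longer emoticons first so a short one cannot break up a long one.
    for emoticon in sorted(EMOTICON_TOKENS, key=len, reverse=True):
        text = text.replace(emoticon, f" {EMOTICON_TOKENS[emoticon]} ")
    return text.split()


print(tokenize_with_emoticons("Great phone :D but the price ¯\\_(ツ)_/¯"))
# ['Great', 'phone', 'EMO_LAUGH', 'but', 'the', 'price', 'EMO_SHRUG']
```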

  • Defining Neutral. Defining what we mean by neutral is another challenge to tackle in order to perform accurate sentiment analysis. As in all classification problems, defining your categories -and, in this case, the neutral tag- is one of the most important parts of the problem. What you mean by neutral, positive, or negative does matter when you train sentiment analysis models. Since tagging data requires that tagging criteria be consistent, a good definition of the problem is a must. Here are some ideas to help you identify and define neutral texts:

  1. Objective texts. So-called objective texts do not contain explicit sentiments, so you should include those texts in the neutral category.
  2. Irrelevant information. If you haven’t preprocessed your data to filter out irrelevant information, you can tag it neutral. However, be careful! Only do this if you know how this could affect overall performance. Sometimes, you will be adding noise to your classifier and performance could get worse.
  3. Texts containing wishes. Some wishes, like "I wish the product had more integrations", are generally neutral. However, those including comparisons, like "I wish the product were better", are pretty difficult to categorize.

  • Human Annotator Accuracy. Sentiment analysis is a tremendously difficult task even for humans. On average, inter-annotator agreement (a measure of how well two (or more) human labelers can make the same annotation decision) is pretty low when it comes to sentiment analysis. And since machines learn from labeled data, sentiment analysis classifiers might not be as precise as other types of classifiers.

Still, sentiment analysis is worth the effort, even if your sentiment analysis predictions are wrong from time to time. By using MonkeyLearn’s sentiment analysis model, you can expect correct predictions about 70-80% of the time you submit your texts for classification.

If you are new to sentiment analysis, then you’ll quickly notice improvements. For typical use cases, such as ticket routing, brand monitoring, and VoC analysis, you’ll save a lot of time and money on tedious manual tasks.
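
Coming back to inter-annotator agreement for a moment: one common way to put a number on it is Cohen's kappa, which corrects the raw agreement rate for the agreement two annotators would reach by chance. The article above does not name a specific measure, so treat this as one option among several. The two label lists are made up for the example.

```python
from collections import Counter


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labeled at random with their own label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up annotations of six texts by two people.
annotator_1 = ["pos", "pos", "neg", "neu", "neg", "pos"]
annotator_2 = ["pos", "neu", "neg", "neu", "pos", "pos"]

print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.478
```

Here the annotators agree on 4 of 6 texts (about 67%), but once chance agreement is removed the kappa is only about 0.48.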

Sentiment Analysis Use Cases & Applications

Sentiment analysis has applications in almost any industry, from finance and retail to hospitality and technology.

  • Social media monitoring- a key strategy that tracks customer sentiments across social media platforms, such as Facebook, Instagram and Twitter.
  • Monitoring brand awareness, reputation and popularity at a specific moment or over time.
  • Analyzing consumer reception of new products or features to identify possible product improvements.
  • Evaluating the success of a marketing campaign.
  • Pinpointing a target audience or demographic.
  • Conducting market research, such as emerging trends and competitive insights.
  • Categorizing customer service requests and automating customer service.
  • Customer support analysis to assess the effectiveness of customer support and monitor trending issues.
