DEV Community

Aashish Chaubey 💥⚡️


Text Analytics - Use Cases and a typical pipeline!

Hello people, 

Welcome to another article in our series on the fundamentals of Text Analytics. In the last article, we looked at the definition of Text Analytics, some important related concepts, and what it is used for.

In this post, we will build on that by looking at some of its applications and then, most importantly, at what a typical text analytics pipeline looks like. "Pipeline" is one of the most overused pieces of jargon in the AI and linguistics community.


Example business use cases for text analytics 

1. Customer 360

Analyzing customer email, surveys, call center logs, and social media streams such as blogs, tweets, forum posts, and newsfeeds to understand customers better. 

2. Product or service reviews

Analysis of customer reviews of products or services helps enterprises understand user sentiment or common issues customers are talking about. 

3. Recruitment

Keyword analysis (comparing profiles with job descriptions) helps in short-listing suitable candidates. 
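
To make the idea of comparing profiles with job descriptions concrete, here is a minimal sketch of keyword overlap. The stop-word list, the function names, and the sample texts are all made up for illustration; a real short-listing system would be considerably more careful.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "with", "in", "of", "for", "to",
              "we", "need", "i", "have"}

def keywords(text):
    """Lower-case the text and return its set of non-stop-word tokens."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}

def match_score(profile, job_description):
    """Fraction of job-description keywords that also appear in the profile."""
    wanted = keywords(job_description)
    if not wanted:
        return 0.0
    return len(wanted & keywords(profile)) / len(wanted)

job = "We need a Python developer with SQL and NLP experience"
profile = "I have Python and SQL experience in analytics"
score = match_score(profile, job)
print(f"match score: {score:.2f}")
```

Here three of the five job keywords (python, sql, experience) appear in the profile, so the score is 0.60. Candidates could then be ranked by this score.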

4. Sentiment Analysis

Contextual mining of text that identifies and extracts subjective information from the source material, helping a business understand the social sentiment around its brand, product, or service while monitoring online conversations. 
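
One very simple way to approximate sentiment is to count positive and negative words. The sketch below assumes two small hand-picked word lists (hypothetical, not from any standard lexicon) and ignores context such as negation ("not good"), so treat it as an illustration of the idea rather than a usable classifier.

```python
import re

# Hypothetical word lists; real systems use far larger lexicons or trained models.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def sentiment(text):
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for review in ["I love this phone, the camera is great",
               "Terrible battery and slow delivery",
               "The box arrived on Monday"]:
    print(sentiment(review), "<-", review)
```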

5. Social media monitoring

Identifying and determining what is being said about a brand, individual, or product through different social and online channels. 


What is in a Text Analytics Pipeline? 

There are many ways text analytics can be implemented depending on the business needs, data types, and data sources. All share four key steps. 

1. Data Acquisition

Text analytics begins with collecting the text to be analyzed -- defining, selecting, acquiring, and storing raw data. This data can include text documents and web pages (blogs, news, etc.), among many other sources. 
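
As a small illustration of acquisition, the sketch below reads plain-text files from a local folder into a list of records. The folder layout, the `load_documents` name, and the record fields are assumptions for this example; in practice the source could just as well be a web crawler, a database, or an API.

```python
from pathlib import Path

def load_documents(folder):
    """Read every .txt file directly inside `folder` into a list of records."""
    documents = []
    for path in sorted(Path(folder).glob("*.txt")):
        documents.append({
            "source": path.name,                       # where the text came from
            "text": path.read_text(encoding="utf-8"),  # the raw, unprocessed text
        })
    return documents
```

Keeping the source name next to the raw text makes it easier to trace a result back to the document it came from later in the pipeline.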

2. Data Preparation

Once data is acquired, the enterprise must prepare it for analysis. The data must be in the proper form to work with machine learning models that will be used for data analysis. There are four stages in data preparation: 

  • Text cleansing
    Removes any unnecessary or unwanted information. The text is also restructured so that it is read the same way across the system, which improves data integrity (this step is also known as "text normalization"). 

  • Tokenization
    Breaks up a string (a sequence of characters) into pieces (such as words, keywords, phrases, symbols, and other elements) called tokens. Semantically meaningful tokens (such as words) are then used for analysis.

  • Part-of-speech tagging
    Assigns a grammatical category to the identified tokens. Familiar grammatical categories include nouns, verbs, adjectives, and adverbs. Part-of-speech is often abbreviated as "PoS". 

  • Parsing
    Creates syntactic structures from the text based on the tokens and PoS models. Parsing algorithms consider the text's grammar for syntactic structuring. Sentences with the same meaning but different grammatical structures will result in different syntactic structures. 
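
The first three preparation stages can be sketched with nothing but the Python standard library. This is a toy version under stated assumptions: the cleansing rules are just one possible choice, and `TOY_POS` is a hypothetical four-word lookup table standing in for a real tagger. Part-of-speech tagging and parsing are normally done with an NLP library such as NLTK or spaCy, so parsing is left out here.

```python
import html
import re

def cleanse(text):
    """Strip HTML tags, decode entities, collapse whitespace, and lower-case."""
    text = re.sub(r"<[^>]+>", " ", text)   # drop HTML tags
    text = html.unescape(text)             # e.g. "&amp;" becomes "&"
    text = re.sub(r"\s+", " ", text)       # collapse runs of whitespace
    return text.strip().lower()

def tokenize(text):
    """Split text into word tokens and single punctuation/symbol tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

# Toy lexicon (an assumption for this sketch); real taggers are trained models.
TOY_POS = {"the": "DET", "delivery": "NOUN", "was": "VERB", "fast": "ADJ"}

def tag(tokens):
    """Pair each token with its category, or "UNK" if the toy lexicon lacks it."""
    return [(t, TOY_POS.get(t, "UNK")) for t in tokens]

raw = "<p>The delivery   was FAST &amp; cheap!</p>"
clean = cleanse(raw)       # "the delivery was fast & cheap!"
tokens = tokenize(clean)   # ["the", "delivery", "was", "fast", "&", "cheap", "!"]
tagged = tag(tokens)
print(tagged)
```

Notice that each stage consumes the output of the previous one, which is exactly what makes this a pipeline.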

3. Data Analysis 

This is the process of analyzing the prepared text data. Machine learning models can be used to analyze huge volumes of data, and the results are typically delivered through an API as JSON or exported as a CSV/Excel file. There are many ways data can be analyzed; two popular approaches are text extraction and text tagging. 
Simply stated, text extraction is the process of identifying structured information from unstructured text. Text tagging is the process of assigning tags to text data based on its content and relevance. 
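
Here is a minimal, rule-based sketch of both approaches, with the result serialized as JSON. The e-mail pattern is intentionally simplified, and the `TAG_KEYWORDS` rules and the sample message are hypothetical; production systems usually rely on trained models rather than hand-written rules.

```python
import json
import re

# Simplified e-mail pattern (an assumption; it does not cover every valid address).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Hypothetical tagging rules: a tag applies if any of its keywords occurs.
TAG_KEYWORDS = {
    "billing": {"invoice", "refund", "charge"},
    "shipping": {"delivery", "package", "courier"},
}

def extract_emails(text):
    """Text extraction: pull structured values (e-mail addresses) out of free text."""
    return EMAIL_RE.findall(text)

def tag_text(text):
    """Text tagging: assign every tag whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(tag for tag, kws in TAG_KEYWORDS.items() if words & kws)

def analyze(text):
    return {"emails": extract_emails(text), "tags": tag_text(text)}

message = "My package is late, please send the refund to jane.doe@example.com."
result = analyze(message)
print(json.dumps(result, indent=2))
```

For the sample message, the extraction step finds one e-mail address and the tagging step assigns both the "billing" and "shipping" tags.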

4. Data Visualization

The process of transforming the analysis into actionable insights, representing the data in graphs, tables, and other easy-to-understand representations. 
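
As a bare-bones illustration, the sketch below turns tag counts into a text bar chart. The tag list is made-up sample data and `bar_chart` is a hypothetical helper; a real dashboard would typically use a plotting library or a BI tool instead.

```python
from collections import Counter

def bar_chart(counts, width=20):
    """Render counts as horizontal bars, largest first, scaled to `width` characters."""
    if not counts:
        return ""
    top = max(counts.values())
    lines = []
    # Sort by count (descending), then by label so that ties are stable.
    for label, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        bar = "#" * max(1, round(n / top * width)) if n else ""
        lines.append(f"{label:<10} {bar} {n}")
    return "\n".join(lines)

# Made-up output of a tagging step, one tag per analyzed message.
tags = ["shipping", "billing", "shipping", "praise", "shipping", "billing"]
chart = bar_chart(Counter(tags))
print(chart)
```

Even a chart this simple makes it obvious at a glance that shipping is the most frequent topic in the sample.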


In the subsequent articles, we shall go through some of the important and frequently used steps in the pipeline and see how exactly the data flows within a typical Text Analytics application.
