<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: NEUROTECH AFRICA</title>
    <description>The latest articles on DEV Community by NEUROTECH AFRICA (@neurotech_africa).</description>
    <link>https://dev.to/neurotech_africa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F5628%2F67456138-0eec-4808-baf8-540aa5250776.jpg</url>
      <title>DEV Community: NEUROTECH AFRICA</title>
      <link>https://dev.to/neurotech_africa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/neurotech_africa"/>
    <language>en</language>
    <item>
      <title>NLP Communities for Data Professionals to Join</title>
      <dc:creator>Anthony Mipawa</dc:creator>
      <pubDate>Wed, 30 Nov 2022 13:03:25 +0000</pubDate>
      <link>https://dev.to/neurotech_africa/nlp-communities-for-data-professionals-to-join-18do</link>
      <guid>https://dev.to/neurotech_africa/nlp-communities-for-data-professionals-to-join-18do</guid>
      <description>&lt;p&gt;Are you a data professional, engineer, or aspiring person to grow in NLP fields?&lt;/p&gt;

&lt;p&gt;Yes, this is for you.&lt;/p&gt;

&lt;p&gt;One of the best methods to stay current with all the newest technologies and tools connected to NLP in the tech industry is to join NLP communities.&lt;/p&gt;

&lt;p&gt;Tech communities keep enthusiasts updated and motivated, whether they are growing an idea, building impactful tools, or driving a project to success. Even if you already work in the industry, having a channel to meet other folks in the same field helps you improve your expertise and, over time, exposes you to new tools. I bring this up because it is one of the strategies I have been using for more than four years.&lt;/p&gt;

&lt;p&gt;From that experience, I came across many people asking about NLP communities where they could engage and grow their expertise, so I decided to put every piece together and share it with folks out there.&lt;/p&gt;

&lt;p&gt;I hope this will help a lot of folks looking for these communities. Stay with me as we explore them.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Masakhane:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.masakhane.io/"&gt;Masakhane&lt;/a&gt; pushing to build datasets and tools to facilitate Natural Language Processing in African languages and pose new research problems to enrich the NLP  research landscape. A research effort originally for &lt;a href="http://translate.masakhane.io/"&gt;Machine translation&lt;/a&gt; focused on African languages that are open-source, continent-wide, and distributed online. It aimed to build a community of Natural Language Processing researchers, connect and grow it, spurring and sharing further research to enable language preservation, tool building, and increasing its global visibility and relevance.&lt;/p&gt;

&lt;p&gt;You can join Masakhane slack community workspace through  👉  &lt;strong&gt;&lt;a href="https://masakhane-nlp.slack.com/join/shared_invite/enQtODM3ODA3ODE0ODIwLTAyYzg3M2E3Nzg4Y2I3NzgxNDg4MmNlZDE4OTBjMzBjMjg4NTcxMWZlYTg3ZDljMTU4M2FjOTk3MDVjOWM2NGM#/shared-invite/email"&gt;Join here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can join the Masakhane mail list group through  👉  &lt;strong&gt;&lt;a href="https://groupc/"&gt;Join here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;NeuralSpace Community:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A group of NLP enthusiasts led by the NeuralSpace company, with the mission to create a platform that helps bridge the massive language gap that is prevalent around the world and prevents many people from accessing vital services or education.&lt;/p&gt;

&lt;p&gt;They use Slack as a channel for exchanging information and for organizing NLP events in collaboration with experts from Meta AI, NeuralSpace, LoResMT, and Masakhane.&lt;/p&gt;

&lt;p&gt;You can join the NeuralSpace Slack community workspace through  👉  &lt;strong&gt;&lt;a href="https://neuralspacecommunity.slack.com/ssb/redirect"&gt;Join here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Hugging Face Community:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A place where a broad community of data scientists, researchers, and ML engineers can come together to share ideas, get support, and contribute to open-source projects.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Hugging Face is a community and data science platform that provides tools that enable users to build, train and deploy ML models based on open-source code and technologies.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is one of the most awesome communities I have ever encountered in the NLP space; each day people share cutting-edge tools that are essential to the NLP ecosystem. Everyone can exchange and examine models and datasets at the Hugging Face central hub. In order to democratize AI for everyone, they aspire to become the location with the largest collection of models and datasets.&lt;/p&gt;

&lt;p&gt;You can join the Hugging Face community through  👉  &lt;strong&gt;&lt;a href="https://huggingface.co/"&gt;Join here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Spark NLP:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A Slack group for developers and Spark NLP users that helps newcomers get started solving common NLP use cases and exchange ideas on best NLP practices. This community was built on the grounds of knowledge and communication management.&lt;/p&gt;

&lt;p&gt;You can join the Spark NLP community slack workspace through  👉  &lt;strong&gt;&lt;a href="https://app.slack.com/client/T9BRVC9AT/setup-people"&gt;Join here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Lanfrica Community:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Lanfrica aims to mitigate the difficulty of discovering African language resources by creating a centralized hub. They organize a series of talks to highlight and showcase language technology efforts (research, projects, software, applications, datasets, models, initiatives, etc.) geared towards under-represented languages around the world.&lt;/p&gt;

&lt;p&gt;Lanfrica is equally interested in efforts targeting (or that can be transferred to) low-resource languages (these are languages with not much data, societal/research efforts or technologies, and recognition) and endangered languages.&lt;/p&gt;

&lt;p&gt;You can join the Lanfrica community mailing list through  👉  &lt;strong&gt;&lt;a href="https://lanfrica.com/mailing-list/subscribe"&gt;Join here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can join the Lanfrica community slack workspace through  👉  &lt;strong&gt;&lt;a href="https://lanfrica.slack.com/join/shared_invite/zt-12x0oo6i8-tZ182NK~aUXroVE5tgRNaw#/shared-invite/email"&gt;Join here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Other DS &amp;amp; ML Communities:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Kaggle:&lt;/strong&gt; a well-known data science competition platform. It boasts a community of over 5 million users, where you can compete and share datasets and projects. Inside Kaggle you’ll find all the code and data you need to do your data science work. Use over 50,000 public &lt;a href="https://www.kaggle.com/datasets"&gt;datasets&lt;/a&gt; and 400,000 public &lt;a href="https://www.kaggle.com/kernels"&gt;notebooks&lt;/a&gt; to conquer any analysis in no time. The thing I like best about Kaggle is its &lt;a href="https://www.kaggle.com/learn"&gt;well-structured and interactive learning&lt;/a&gt; environment, which lets even beginners start their journey in data science and machine learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zindi Africa:&lt;/strong&gt; this platform played an essential role in my career. I am not saying it dumped everything into my head, but I worked through a lot of its challenges to improve my data science understanding.&lt;/p&gt;

&lt;p&gt;Zindi hosts the largest community of African data scientists, working to solve the world’s most pressing challenges using machine learning and Artificial Intelligence.&lt;/p&gt;

&lt;p&gt;You can join the Zindi community through  👉  &lt;strong&gt;&lt;a href="https://zindi.africa/?referralCode%3D4WtlJO"&gt;Join here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Driven Data:&lt;/strong&gt; works on projects at the intersection of data science and social impact, in areas like international development, health, education, research and conservation, and public services. They focus on giving more organizations access to the capabilities of data science and on engaging more data scientists with social challenges where their skills can make a difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DataTalks:&lt;/strong&gt; another awesome community whose &lt;a href="https://datatalks.club/events.html"&gt;events&lt;/a&gt; and training programs I like to join. &lt;a href="https://datatalks.club/"&gt;DataTalks&lt;/a&gt; is the place to talk about data, a global online community of data enthusiasts. They also post their events on YouTube through their &lt;a href="https://www.youtube.com/@DataTalksClub"&gt;channel&lt;/a&gt;, which is a very resourceful platform for data professional growth.&lt;/p&gt;

&lt;p&gt;You can join the DataTalks community slack workspace through  👉  &lt;strong&gt;&lt;a href="https://datatalks.club/slack.html"&gt;Join here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MLOps Community:&lt;/strong&gt; a great community for learning how to take machine learning models into production. It fills the swiftly growing need to share real-world Machine Learning Operations best practices from engineers in the field.&lt;/p&gt;

&lt;p&gt;The MLOps community hosts weekly talks and fireside chats about everything to do with the new space emerging around DevOps for machine learning, also known as MLOps or Machine Learning Operations.&lt;/p&gt;

&lt;p&gt;Curious to dig more about this awesome community?&lt;/p&gt;

&lt;p&gt;You can join the MLOps community slack workspace through  👉 &lt;strong&gt;&lt;a href="https://home.mlops.community/"&gt;Join here&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Final Thoughts:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When it comes to the advancement of AI, the open-source community is becoming more and more significant. Sharing information and resources in order to advance together is where the future is headed, because no firm, not even the tech giants, will be able to "solve AI" on their own!&lt;/p&gt;

&lt;p&gt;I hope this article sparked new thoughts about the machine learning space. Please spread the love by sharing it with others on social media.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--M1JuqBZS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0wznc5nyxlai97tlaug7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--M1JuqBZS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0wznc5nyxlai97tlaug7.jpg" alt="Image description" width="390" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>community</category>
    </item>
    <item>
      <title>Understanding How to Evaluate Textual Problems</title>
      <dc:creator>Anthony Mipawa</dc:creator>
      <pubDate>Tue, 13 Sep 2022 09:47:52 +0000</pubDate>
      <link>https://dev.to/neurotech_africa/understanding-how-to-evaluate-textual-problems-32md</link>
      <guid>https://dev.to/neurotech_africa/understanding-how-to-evaluate-textual-problems-32md</guid>
      <description>&lt;p&gt;As a data professional, building models is a common topic what differs is just what that model is for? models, should solve certain challenges? then after we consider measuring the quality and performance of these models using &lt;a href="https://deepai.org/machine-learning-glossary-and-terms/evaluation-metrics"&gt;evaluation metrics&lt;/a&gt; and these are essential to confirm something concerning built models.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Evaluation metrics are used to measure the quality of the statistical or machine learning model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This article was originally published on the &lt;a href="https://blog.neurotech.africa/evaluation-metrics-for-textual-problems/"&gt;Neurotech Africa&lt;/a&gt; blog.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Need for evaluation?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The aim of building AI solutions is to apply them to real-world challenges. Mind you, our real world is complicated, so how do we decide which model to use and when? That is where evaluation metrics come into play.&lt;/p&gt;

&lt;p&gt;A failure to justify why you are choosing a certain model over others, or why a certain model is good or not, indicates you are not aware of what you are solving or of the model you built.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"When you can measure what you are speaking of and express it in numbers, you know that on which you are discussing. But when you cannot measure it and express it in numbers, your knowledge is of a very meager and unsatisfactory kind." ~ Lord Kelvin&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Today let's get a sense of the metrics used in Natural Language Processing challenges.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Textual Evaluation Metrics&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In the Natural Language Processing (NLP) field, it is difficult to measure the performance of models across different tasks. Challenges with fixed labels are easier to evaluate, but in many NLP tasks the ground truth, and hence the result, can vary.&lt;/p&gt;

&lt;p&gt;We have lots of downstream tasks such as text or sentiment analysis, language generation, question answering, text summarization, text recognition, and translation.&lt;/p&gt;

&lt;p&gt;It is possible for biases to creep into models through the dataset or the evaluation criteria. It is therefore necessary to establish standard performance benchmarks for NLP tasks. These performance metrics give us an indication of which model is better for which task.&lt;/p&gt;

&lt;p&gt;Let's jump right in to discuss some of the textual evaluation metrics  😊&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accuracy:&lt;/strong&gt; a common metric in &lt;a href="https://en.wikipedia.org/wiki/Sentiment_analysis"&gt;sentiment analysis&lt;/a&gt; and &lt;a href="https://blog.neurotech.africa/swahili-text-classification-using-transformers/"&gt;classification&lt;/a&gt;. It is not always the best choice, but it denotes the fraction of predictions the model gets right out of the total predictions it makes. It is best used when the output variable is categorical or discrete, for example, how often a sentiment classification algorithm is correct.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confusion Matrix:&lt;/strong&gt; also used in &lt;a href="https://blog.neurotech.africa/swahili-text-classification-using-transformers/"&gt;classification&lt;/a&gt; challenges. It provides a clear report on the model's predictions across categories, and from this visualization the following questions can be answered:&lt;/p&gt;

&lt;p&gt;What percentage of the positive class is actually positive? (Precision)&lt;/p&gt;

&lt;p&gt;What percentage of the positive class gets captured by the model? (Recall)&lt;/p&gt;

&lt;p&gt;What percentage of predictions are correct? (Accuracy)&lt;/p&gt;

&lt;p&gt;Also, we can consider Precision and Recall are complementary metrics that have an inverse relationship. If both are of interest to us then we’d use the F1 score to combine precision and recall into a single metric.&lt;/p&gt;
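&lt;p&gt;As a minimal sketch of how these questions translate into numbers, here is a Python example; the &lt;em&gt;tp&lt;/em&gt;, &lt;em&gt;fp&lt;/em&gt;, &lt;em&gt;fn&lt;/em&gt;, and &lt;em&gt;tn&lt;/em&gt; confusion-matrix counts below are made-up values for illustration:&lt;/p&gt;

```python
# Hypothetical counts from a binary sentiment classifier's confusion
# matrix (tp, fp, fn, tn are made-up numbers for illustration).
tp, fp, fn, tn = 40, 10, 5, 45

precision = tp / (tp + fp)   # what share of predicted positives is actually positive?
recall = tp / (tp + fn)      # what share of actual positives gets captured?
accuracy = (tp + tn) / (tp + fp + fn + tn)   # what share of predictions is correct?
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print(round(precision, 3), round(recall, 3), round(accuracy, 3), round(f1, 3))
# 0.8 0.889 0.85 0.842
```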

&lt;p&gt;&lt;strong&gt;Perplexity:&lt;/strong&gt; a probabilistic measure used to evaluate exactly how confused our model is. It’s typically used to evaluate &lt;a href="https://www.techtarget.com/searchenterpriseai/definition/language-modeling"&gt;language models&lt;/a&gt;, but it can also be used in dialog generation tasks.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A language model scores how similar machine-generated text is to text a human would write: given the previous w tokens, it assigns a probability to the (w+1)-th token. The lower the perplexity, the better the model.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Find this article about the perplexity evaluation metric, and take your time to explore &lt;em&gt;&lt;a href="https://towardsdatascience.com/perplexity-in-language-models-87a196019a94"&gt;Perplexity in Language Models&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;
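&lt;p&gt;As a minimal sketch of the calculation (the per-token probabilities below are made up for illustration), perplexity is the exponential of the average negative log-probability the model assigns to each token:&lt;/p&gt;

```python
import math

# Hypothetical probabilities a language model assigned to each token
# of a 5-token sentence (made-up numbers for illustration).
token_probs = [0.2, 0.1, 0.5, 0.25, 0.05]

# Cross-entropy: average negative log-probability per token (in nats)
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity is the exponential of the cross-entropy; lower is better
perplexity = math.exp(cross_entropy)
print(round(perplexity, 2))  # 6.03
```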

&lt;p&gt;&lt;strong&gt;Bits-per-character (BPC) and bits-per-word:&lt;/strong&gt; other metrics often used for language-model evaluation. BPC measures exactly the quantity it is named after: the average number of bits needed to encode one character.&lt;/p&gt;

&lt;p&gt;“&lt;em&gt;if the language is translated into binary digits (0 or 1) in the most efficient way, the entropy is the average number of binary digits required per letter of the original language.&lt;/em&gt;" ~ Shannon&lt;/p&gt;

&lt;p&gt;Entropy is the average number of bits needed per character. The reason some language models report both the cross-entropy loss and BPC is purely technical.&lt;/p&gt;

&lt;p&gt;In practice, if everyone uses a different base, it is hard to compare results across models. For the sake of consistency, when we report entropy or cross-entropy, we report the values in bits.&lt;/p&gt;
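&lt;p&gt;As a tiny illustration of the unit conversion (the loss value below is an assumption): a cross-entropy reported in nats, as deep learning frameworks usually do, becomes bits-per-character when divided by the natural log of 2:&lt;/p&gt;

```python
import math

# A hypothetical character-level cross-entropy loss reported in nats;
# the value is an assumption for illustration.
loss_nats_per_char = 1.2

# Expressing the same loss in bits gives bits-per-character (BPC):
bpc = loss_nats_per_char / math.log(2)
print(round(bpc, 3))  # 1.731
```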

&lt;p&gt;Mind you, BPC is specific to character-level language models. When we have word-level language models, the quantity is called bits-per-word (BPW): the average number of bits required to encode a word.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;General Language Understanding Evaluation (GLUE):&lt;/strong&gt; this is a multi-task benchmark based on different types of tasks rather than evaluating a single task. As language models are increasingly being used for the purposes of transfer learning to other NLP tasks, the intrinsic evaluation of a language model is less important than its performance on downstream tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Super General Language Understanding Evaluation (SuperGLUE):&lt;/strong&gt; methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. SuperGLUE is an improved version of the GLUE benchmark with a new set of more difficult language understanding tasks and improved resources, introduced after performance on GLUE came close to the level of non-expert humans.&lt;/p&gt;

&lt;p&gt;It comprises new ways to test creative approaches on a range of difficult NLP tasks, including sample-efficient, transfer, multitask, and self-supervised learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BiLingual Evaluation Understudy (BLEU):&lt;/strong&gt; commonly used in &lt;a href="https://blog.neurotech.africa/understanding-the-concept-of-machine-translation/"&gt;machine translation&lt;/a&gt; and &lt;a href="https://machinelearningmastery.com/develop-a-deep-learning-caption-generation-model-in-python/"&gt;caption generation&lt;/a&gt;. Since manual labeling by professional translators is very expensive, this metric compares a candidate translation (&lt;em&gt;by machine&lt;/em&gt;) to one or more reference translations (&lt;em&gt;by a human being&lt;/em&gt;). The output lies in the range 0-1, where a score closer to 1 indicates a good-quality translation.&lt;/p&gt;

&lt;p&gt;The calculation of BLEU involves the concept of n-gram precision and sentence brevity penalty.&lt;/p&gt;

&lt;p&gt;This metric has some drawbacks: it doesn’t consider meaning, it doesn’t directly consider sentence structure, and it doesn’t handle morphologically rich languages well.&lt;/p&gt;

&lt;p&gt;Rachael Tatman wrote an amazing article about BLEU just take your time to read it &lt;a href="https://towardsdatascience.com/evaluating-text-output-in-nlp-bleu-at-your-own-risk-e8609665a213"&gt;here&lt;/a&gt;.&lt;/p&gt;
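&lt;p&gt;To make the n-gram precision and brevity penalty concrete, here is a toy sentence-level BLEU sketch. It is a simplified illustration, not a full implementation: real BLEU implementations (for example in NLTK) use up to 4-grams and add smoothing for zero matches, while this version stops at bigrams:&lt;/p&gt;

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Toy sentence-level BLEU: clipped n-gram precisions combined by a
    geometric mean, times a brevity penalty. Real implementations add
    smoothing and default to 4-grams."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Brevity penalty: penalize candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return bp * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat is on the mat"), 3))  # 0.707
```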

&lt;p&gt;&lt;strong&gt;Self-BLEU:&lt;/strong&gt; a smart use of the traditional BLEU metric for capturing and quantifying diversity in the generated text.&lt;/p&gt;

&lt;p&gt;The lower the value of the self-bleu score, the higher the diversity in the generated text. Long text generation tasks like story generation, news generation, etc could be a good fit to keep an eye on such metrics, helping evaluate the redundancy and monotonicity in the model. This metric can be complemented with other text generation evaluation metrics that account for the goodness and relevance of the generated text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metric for Evaluation of Translation with Explicit ORdering(METEOR):&lt;/strong&gt; Precision-based metric to measure the quality of the generated text. Sort of a more robust BLEU. Allows synonyms and stemmed words to be matched with the reference word. Mainly used in machine translation.&lt;/p&gt;

&lt;p&gt;METEOR addresses two of BLEU's drawbacks: not taking recall into account, and only allowing exact 𝑛-gram matching. METEOR first performs exact word matching, followed by stemmed-word matching, and finally synonym and paraphrase matching; it then computes the F-score using this relaxed matching strategy.&lt;/p&gt;

&lt;p&gt;Because METEOR only considers unigram matches, as opposed to 𝑛-gram matches, it seeks to reward longer contiguous matches using a penalty term known as the &lt;strong&gt;fragmentation penalty&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BERTScore:&lt;/strong&gt; this is an automatic evaluation metric used for testing the goodness of text generation systems. Unlike existing popular methods that compute token-level syntactical similarity, BERTScore focuses on computing semantic similarity between tokens of reference and hypothesis.&lt;/p&gt;

&lt;p&gt;Using contextualized embeddings from Bidirectional Encoder Representations from Transformers (BERT), BERTScore computes the cosine similarity of each hypothesis token 𝑗 with each token 𝑖 in the reference sentence. It uses a greedy matching approach instead of a time-consuming best-case matching approach and then computes the F1 measure.&lt;/p&gt;

&lt;p&gt;BERTScore correlates better with human judgments and provides stronger model selection performance than existing metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Character Error Rate (CER):&lt;/strong&gt;  this is a common metric of the performance of an automatic speech recognition system. This value indicates the percentage of characters that were incorrectly predicted. The lower the value, the better the performance of the ASR system with a CER of 0 being a perfect score.&lt;/p&gt;

&lt;p&gt;Tasks where CER can be applied to measure performance include speech recognition, optical character recognition (OCR), and handwriting recognition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Word Error Rate (WER):&lt;/strong&gt; this is a common performance metric mainly used for speech recognition, optical character recognition (OCR), and handwriting recognition.&lt;/p&gt;

&lt;p&gt;When recognizing speech and transcribing it into text, some words may be left out or misinterpreted. WER compares the predicted output and the reference transcript word by word to figure out the number of differences between them.&lt;/p&gt;

&lt;p&gt;There are three types of errors considered when computing WER:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Insertions:&lt;/em&gt; when the predicted output contains additional words that are not present in the transcript (for example, &lt;em&gt;SAT&lt;/em&gt; becomes &lt;em&gt;essay tea&lt;/em&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Substitutions:&lt;/em&gt; when the predicted output contains some misinterpreted words that replace words in the transcript (for example, &lt;em&gt;noose&lt;/em&gt; is transcribed as &lt;em&gt;moose&lt;/em&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Deletions:&lt;/em&gt; when the predicted output doesn’t contain words that are present in the transcript (for example, &lt;em&gt;turn it around&lt;/em&gt; becomes &lt;em&gt;turn around&lt;/em&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For understanding let's consider the following reference transcript and predicted output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reference transcript: “&lt;em&gt;Understanding textual evaluation metrics is awesome for a data professional&lt;/em&gt;”.&lt;/li&gt;
&lt;li&gt;Predicted output: “&lt;em&gt;Understanding textual metrics is great for a data professional&lt;/em&gt;”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this case, the predicted output has one deletion (the word “&lt;em&gt;evaluation&lt;/em&gt;” disappears) and one substitution (“&lt;em&gt;awesome&lt;/em&gt;” becomes “&lt;em&gt;great&lt;/em&gt;”).&lt;/p&gt;

&lt;p&gt;So, what is the Word Error Rate of this transcription? Basically, WER is the number of errors divided by the number of words in the reference transcript.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;WER = (num inserted + num deleted + num substituted) / num words in the reference&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Thus, in our example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;WER = (0 + 1 + 1) / 10 = 0.2&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Lower WER often indicates that the Automated Speech Recognition (ASR) software is more accurate in recognizing speech. A higher WER, then, often indicates lower ASR accuracy.&lt;/p&gt;

&lt;p&gt;The drawback is that it assumes the impact of different errors is the same, while in practice an insertion error may have a bigger impact than a deletion. Another limitation is that this metric cannot distinguish a substitution error from a combined deletion and insertion error.&lt;/p&gt;
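&lt;p&gt;The WER formula above can be sketched as a word-level edit-distance (Levenshtein) computation; on the example sentences earlier it reproduces the value 0.2:&lt;/p&gt;

```python
def wer(reference, hypothesis):
    """Word Error Rate = word-level edit distance (substitutions,
    deletions, insertions) divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

reference = "Understanding textual evaluation metrics is awesome for a data professional"
predicted = "Understanding textual metrics is great for a data professional"
print(wer(reference, predicted))  # 0.2
```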

&lt;p&gt;&lt;strong&gt;Recall-Oriented Understudy for Gisting Evaluation (ROUGE):&lt;/strong&gt; recall-based, unlike BLEU, which is precision-based. The ROUGE metric includes a set of variants: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S. ROUGE-N is similar to BLEU-N in counting the 𝑛-gram matches between the hypothesis and the reference.&lt;/p&gt;

&lt;p&gt;This is a set of metrics used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against a reference (or a set of references), typically a human-produced summary or translation.&lt;/p&gt;

&lt;p&gt;Mind you, ROUGE is especially relevant in summarization tasks, where it’s important to evaluate how many of the reference words a model can recall (recall = true positives as a share of true positives plus false negatives).&lt;/p&gt;

&lt;p&gt;Feel free to check out the python package &lt;em&gt;&lt;a href="https://pypi.org/project/rouge/"&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
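&lt;p&gt;As a minimal sketch of the recall-oriented idea, here is a simplified ROUGE-1 over unigrams (not the full metric family that the package above implements):&lt;/p&gt;

```python
from collections import Counter

def rouge1(hypothesis, reference):
    """Simplified ROUGE-1: unigram overlap scored as recall,
    precision, and F1. The real metric family also covers
    ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S."""
    hyp = Counter(hypothesis.split())
    ref = Counter(reference.split())
    overlap = sum(min(hyp[w], ref[w]) for w in ref)
    recall = overlap / sum(ref.values())      # share of reference words recalled
    precision = overlap / sum(hyp.values())
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return recall, precision, f1

print(rouge1("the cat sat", "the cat sat on the mat"))  # recall 0.5, precision 1.0
```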

&lt;h3&gt;
  
  
  &lt;strong&gt;Final Thoughts:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Understanding which performance measure to use, and which is best for the problem at hand, helps validate that the solution meets the needs of the particular challenge.&lt;/p&gt;

&lt;p&gt;The challenge with NLP solutions is measuring their performance across various tasks. For other machine learning tasks it is easier to measure performance because the cost function or evaluation criteria are well defined, giving a clear picture of what is to be evaluated.&lt;/p&gt;

&lt;p&gt;One more reason for this is that labels are well defined in other tasks, but in NLP tasks the ground truth can vary a lot. Coming up with the best model depends on various factors, but the evaluation metric is an essential one to consider, depending on the nature of the task you are solving.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;References:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thegradient.pub/understanding-evaluation-metrics-for-language-models/"&gt;Evaluation Metrics for Language Modeling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/evaluating-text-output-in-nlp-bleu-at-your-own-risk-e8609665a213"&gt;Evaluating Text Output in NLP: BLEU at your own risk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2006.14799.pdf"&gt;Evaluation of Text Generation: A Survey&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aclanthology.org/2021.triton-1.6.pdf"&gt;Evaluation of Metrics Performance in Assessing Critical Translation Errors in Sentiment-oriented Text&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/1904.09675"&gt;Evaluating Text Generation with BERT&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.paperspace.com/automated-metrics-for-evaluating-generated-text/"&gt;Automated metrics for evaluating the quality of text generation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Future of Customer Service: What You Need to Know About Conversational AI</title>
      <dc:creator>Anthony Mipawa</dc:creator>
      <pubDate>Fri, 09 Sep 2022 13:27:13 +0000</pubDate>
      <link>https://dev.to/neurotech_africa/the-future-of-customer-service-what-you-need-to-know-about-conversational-ai-33nb</link>
      <guid>https://dev.to/neurotech_africa/the-future-of-customer-service-what-you-need-to-know-about-conversational-ai-33nb</guid>
      <description>&lt;p&gt;This article was originally published on the &lt;a href="https://blog.neurotech.africa/the-future-of-customer-service-what-you-need-to-know-about-conversational-ai/"&gt;Neurotech Africa blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Today’s consumers are more informed, connected, and demanding than ever before. As a result, brands that fail to meet their high standards face an uphill battle. 74% of consumers will not recommend a brand again after a negative experience. Moreover, 90% of customers expect to be able to communicate directly with a company through chat or messaging as if they were friends. Conversational AI has the potential to revolutionize the customer service experience by making it more personal and accessible for end users. This blog post breaks down the whys and hows of conversational AI in customer service, so keep reading to learn more.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is Conversational AI?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Conversational AI, built on natural language processing, is the ability of machines to understand human language and respond accordingly. Natural language processing is key to implementing conversational interfaces: interfaces that allow people to communicate with computers through spoken language and written text as if they were having a conversation with another person. A conversational interface has two main parts: an automated system (e.g. an IVR or an SMS-based solution) that detects and responds to user inputs, and a natural language processing (NLP) component that analyses and understands the user input. The NLP component transforms the input into a machine-readable format and then triggers an appropriate response from the system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--raweIVdk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/700/0%2A7E76P5SryLng4orn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--raweIVdk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/700/0%2A7E76P5SryLng4orn.jpg" alt="https://miro.medium.com/max/700/0*7E76P5SryLng4orn.jpg" width="700" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why is Customer Service Important?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Customer service is a key aspect of any customer-facing business. It can be the difference between capturing a new customer and losing an existing one. It’s no wonder that the customer experience is the top priority for brands. According to a recent study, 69% of customers would pay more for a better experience. That’s why so many companies are turning to customer service AI. Customer service AI brings conversational interfaces, a technology that’s been around since the 1960s. More recently, it’s become increasingly important in the fields of commerce, health care, transportation, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How will Conversational AI Change Customer Service?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The rise of human-machine communication will transform customer service in the following ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Increased accessibility&lt;/strong&gt;: Human customer service will become more accessible to everyone thanks to the rise of AI-powered virtual assistants. Meanwhile, AI customer service agents will be able to handle more requests from more people simultaneously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better quality of service&lt;/strong&gt;: High-quality, personalized service delivered by AI agents will boost customer satisfaction and retention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better customer satisfaction&lt;/strong&gt;: Satisfied customers generate more revenue for businesses than unhappy customers. AI customer service agents can increase customer satisfaction across the board.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved customer retention&lt;/strong&gt;: Businesses can retain customers by providing an exceptional customer service experience. AI can help businesses do just that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved AI-human collaboration:&lt;/strong&gt; Businesses will unlock new levels of productivity by bringing AI and human agents together.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better customer retention through personalized messaging&lt;/strong&gt;: AI agents will be able to deliver highly personalized messages to customers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Limitations of Conversational AI in Customer Service&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;While conversational AI is poised to revolutionize the customer service experience, there are some limitations that we must account for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Customer expectations:&lt;/strong&gt; Customers have high standards and will be disappointed if AI falls short of their expectations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data privacy and security:&lt;/strong&gt; Businesses must protect the privacy of their customer’s data. AI poses a particular concern in this regard, as hackers can use AI to take over machines and systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cultural shifts in customer service:&lt;/strong&gt; AI may not be a good fit for every culture, and businesses may have to adjust their strategies accordingly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human resources:&lt;/strong&gt; The implementation of AI may mean fewer human agents, which may pose problems for businesses that depend on human customer service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical limitations:&lt;/strong&gt; While the promise of AI is great, the technology is not yet advanced enough to meet all of our expectations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shifting customer service strategies:&lt;/strong&gt; Customer service strategies may shift in the coming years, rendering today’s AI technologies obsolete.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to Achieve Success with Conversational AI in Customer Service&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Success with conversational AI in customer service starts with a strategic plan for implementation. Companies should consider the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LbDIrz6m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/700/0%2AfBGttCdBm2R3xSn_.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LbDIrz6m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/700/0%2AfBGttCdBm2R3xSn_.jpg" alt="https://miro.medium.com/max/700/0*fBGttCdBm2R3xSn_.jpg" width="700" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Defining the customer experience:&lt;/strong&gt; Companies must define their customer experience strategy, including how AI agents fit into that strategy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building a strategy for AI:&lt;/strong&gt; Companies should decide what type of AI to implement and how that AI will work within their strategy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hiring the right talent:&lt;/strong&gt; Companies must hire the right people to implement their AI strategy. This includes both AI agents and human agents that will collaborate with them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Investing in the right technology:&lt;/strong&gt; Companies must choose the right technology that supports their AI strategy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testing and training:&lt;/strong&gt; Companies must ensure that AI works as intended before launching it to customers. They must also train their human and AI agents to work together.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Final Thoughts&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The future of customer service is bright, but businesses must act soon to take advantage of the benefits of conversational AI. Companies must prepare by defining their strategy, investing in the right technology, and hiring the right talent. They must also consider the limitations of AI and have a plan for overcoming them. Finally, businesses must act quickly before the benefits of conversational AI are claimed by others.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HDZ3Uxm8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tuybc5sjf3hmy2nk2dyz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HDZ3Uxm8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tuybc5sjf3hmy2nk2dyz.jpg" alt="Image description" width="390" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Redefining Customer Engagement as Digital Bank</title>
      <dc:creator>Anthony Mipawa</dc:creator>
      <pubDate>Fri, 09 Sep 2022 13:14:53 +0000</pubDate>
      <link>https://dev.to/neurotech_africa/redefining-customer-engagement-as-digital-bank-2d72</link>
      <guid>https://dev.to/neurotech_africa/redefining-customer-engagement-as-digital-bank-2d72</guid>
      <description>&lt;p&gt;This article means a lot to digital banks on how they can use conversational Artificial intelligence to acquire, engage and retain customers.&lt;/p&gt;

&lt;p&gt;This article was originally published on the &lt;a href="https://blog.neurotech.africa/redefining-customer-engagement-as-digital-bank/"&gt;Neurotech Africa&lt;/a&gt; blog.&lt;/p&gt;

&lt;p&gt;Wow! So the title of this article brought you here. Great to hear that.&lt;/p&gt;

&lt;p&gt;I will be sharing with you my understanding of how deeply people interact with digital banks powered by artificial intelligence technology. Without further ado, let me outline the topic in simple words.&lt;/p&gt;

&lt;p&gt;But you already have something in mind, right?&lt;/p&gt;

&lt;p&gt;Customer engagement is the means by which a company creates a relationship with its customer base to foster brand loyalty and awareness. This can be accomplished via marketing campaigns, new content created for and posted to websites, and outreach via social media and mobile and wearable devices, among other methods.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Customer engagement is the ongoing interactions between company and customer, offered by the company, chosen by the customer.” by Paul Greenberg&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Wonderful, now we are clear on the topic. When you let your customers choose how they’d like to engage with you, you’ll be more likely to uncover the types of interactions they find valuable. By making it easier for customers to engage in ways they find valuable, you’ll strengthen their emotional investment in your digital bank.&lt;/p&gt;

&lt;p&gt;Disruptive innovation in financial services is growing rapidly. The challenges facing this industry keep professionals brainstorming the right way to address them with existing technologies. In modern banking, artificial intelligence has taken on an important and distinguished series of roles, from security automation and loan automation to customer engagement processes.&lt;/p&gt;

&lt;p&gt;Companies with well-defined data strategies have realized the great role played by this technology to bring value to their products. The journey of handling customers differs from one organization to another depending on culture, strategies, goals, and so on.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why should digital banks care about redefining customer engagement?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In digital banking, customers are king: the interaction between the bank and people is the business, and the customers one bank needs are the same customers its competitors need. Redefining the interaction between your customers and the bank is important for providing the good customer service a successful business depends on. With the advent of digital, the scope of good customer service has extended from providing timely and high-quality products and/or services to providing an experience that delivers value outside the original sale.&lt;/p&gt;

&lt;p&gt;As the banking world has become more crowded, there’s been an overwhelming focus on clicks, conversions, and acquisition costs.&lt;/p&gt;

&lt;p&gt;However, these acquisition strategies alone won’t be enough to grow your business sustainably. Finding ways to engage with your customers in between purchases strengthens their emotional connection to your brand, helping you retain the customers you already have while sustainably growing your business.&lt;/p&gt;

&lt;p&gt;In fact, 95 percent of the revenue banks generate relies on effective customer engagement, through interest on loans and fees associated with their services.&lt;/p&gt;

&lt;p&gt;According to &lt;a href="https://www.constellationr.com/blog-news/research-summary-why-live-engagement-marketing-supercharges-event-marketing"&gt;Constellation Research&lt;/a&gt; on customer engagement, companies that have improved engagement increase cross-sell by 22 percent, drive up-sell revenue from 13 percent to 51 percent, and also increase order sizes from 5 percent to 85 percent.&lt;/p&gt;

&lt;p&gt;The statistics show the impact of engaging your customers and how significantly revenue can increase.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;About conversational Artificial intelligence&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Conversational AI involves three concepts: artificial intelligence, human language, and automation. We can define it as the type of artificial intelligence that enables consumers to interact with computer applications the way they would with other humans.&lt;/p&gt;

&lt;p&gt;The best conversational AI solutions provide remarkable support for businesses. Think about the last time you communicated with a business online and received the answer to your question within seconds, all with little effort. This is conversational AI doing powerful work seamlessly and efficiently. The bonus? A conversational AI solution knows when to notify and transfer the customer to a live agent, all within the same conversation stream, when the situation warrants it.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conversational Artificial Intelligence for Customer Engagement in Digital Banking&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The process of acquiring, engaging, and retaining customers can be boosted with technologies like conversational artificial intelligence. The technology alone does not cover the whole process, and other factors must be considered alongside it; here are the main cases for digital banks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Increasing customer attraction through social platforms:&lt;/strong&gt; making digital banks’ services easier to access on social platforms like WhatsApp and Telegram boosts engagement and goes a long way in keeping customers engaged over time. Conversational AI makes this kind of engagement easier to handle by holding natural conversations with your customers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managing payments and transactions:&lt;/strong&gt; people regularly have to clear bills, pay businesses, shop online, or perform other online transactions, and conversational AI can help the user make and track these payments. Clearing payments is often urgent and time-bound, and switching platforms to complete a transaction is inconvenient. With an omnichannel conversational AI, your customers can make payments right where they are and avoid any delays!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommending new services:&lt;/strong&gt; with conversational AI, digital banks can simplify the process of selecting the right services or products for specific customers based on their day-to-day interactions. Meeting user expectations is a great win, and this can improve engagement with your bank.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Addressing frequently asked questions (FAQs):&lt;/strong&gt; with conversational AI, handling repetitive questions becomes easier. Instead of calling an agent or scrolling through a long website page, customers can type or speak and get an answer to a query instantly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lead generation:&lt;/strong&gt; conversational AI solutions have no match when interaction comes into play. They can interact with customers for the first time and understand their needs and the sentiment behind the conversation. This very human interaction can help digital banks acquire new customers and collect their details, which are then passed to the sales team to take the conversation forward.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Driving referral campaigns with existing customers:&lt;/strong&gt; with conversational AI, driving engagement doesn’t have to be solely between your customers and your brand; it can also happen between customers. Empowering your best customers to easily share your brand with their friends and family not only helps you acquire new customers but also engages the customers you have.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How does &lt;a href="https://www.neurotech.africa/#contact"&gt;Neurotech’s&lt;/a&gt; conversational AI solution redefine customer engagement for digital banks?&lt;/p&gt;

&lt;p&gt;We offer customer support solutions that let businesses engage customers with a personalized experience at every touchpoint, across any digital channel, through our internal engine called &lt;a href="https://sarufi.io/"&gt;Sarufi&lt;/a&gt;. We care about the memorable experiences that happen when customers are free to speak naturally. Our conversational solutions (chatbots) understand customers and provide seamless customer support across multiple platforms, enabling you to offer a more personalized, contextual service, reduce call center overload, and ensure reliable customer support 24/7; you can explore more from &lt;a href="https://blog.neurotech.africa/how-can-neurotech-transform-your-business-with-conversational-ai/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ILOrD12C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/700/0%2AOr7TTBGbnJci0NPN.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ILOrD12C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/700/0%2AOr7TTBGbnJci0NPN.jpg" alt="https://miro.medium.com/max/700/0*Or7TTBGbnJci0NPN.jpg" width="700" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can reach out for a demo of our banking conversational AI solution &lt;a href="https://www.neurotech.africa/#contact"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Final Thoughts&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Don’t confuse technology with business strategy. Rely on your own strategies, which can then be boosted with technology like artificial intelligence.&lt;/p&gt;

&lt;p&gt;Great customer experiences across every channel are an imperative that digital banks cannot ignore. While the availability of digital footprints has made it possible to deliver pronounced mobile and digital experiences, digital banks need to ensure that the customer at the physical branch is not deprived of the same seamless, immersive experience that the digital-native or millennial customer is accustomed to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---YafQApo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0hb7cuw37taf5cl3nn4a.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---YafQApo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0hb7cuw37taf5cl3nn4a.jpg" alt="Image description" width="390" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>datascience</category>
      <category>ai</category>
    </item>
    <item>
      <title>Filter Swahili SMS by categories using machine learning.</title>
      <dc:creator>Elia</dc:creator>
      <pubDate>Wed, 10 Aug 2022 09:22:42 +0000</pubDate>
      <link>https://dev.to/neurotech_africa/filter-swahili-sms-by-categories-using-machine-learning-2ca6</link>
      <guid>https://dev.to/neurotech_africa/filter-swahili-sms-by-categories-using-machine-learning-2ca6</guid>
      <description>&lt;p&gt;When you hear "&lt;strong&gt;ding&lt;/strong&gt;" you almost fall over running to your phone in the hopes of seeing the long-awaited SMS and then sadly discover it's a promotional message from an &lt;strong&gt;&lt;em&gt;XYZ&lt;/em&gt;&lt;/strong&gt; brand. This can really be annoying, many of these promotional and spam SMS continue to clog up our inboxes and get worse with time, stealing our precious time and attention.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What can we learn from Gmail?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The problem is not very new; it also exists with email, and one approach that providers like Gmail adopted, which has worked very well, is grouping emails into categories depending on their intention, which can be &lt;em&gt;promotional,&lt;/em&gt; &lt;em&gt;social,&lt;/em&gt; or &lt;em&gt;primary&lt;/em&gt;, while also filtering out fraudulent emails (spam).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Can we replicate the Gmail approach for SMS? If yes, how?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The meat of this article is centered around answering that question: we are going to learn how to classify SMS messages into categories according to their intention. Now you might be asking yourself, &lt;em&gt;how does one determine and classify the intention of an SMS&lt;/em&gt;? We are going to train a machine learning model that learns the similarities within each category and then uses its generalized learned model to group new SMS into categories.&lt;/p&gt;
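
&lt;p&gt;As a rough intuition for what such a model learns, here is a toy sketch in Python: it counts which words appear in each category's training messages and assigns a new SMS to the category whose vocabulary it overlaps most. The two categories and example messages are made up for illustration; the real model in this article is trained on the full annotated dataset described below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter

def train(examples):
    """examples is a list of (text, label) pairs; build per-label word counts."""
    model = {}
    for text, label in examples:
        model.setdefault(label, Counter()).update(text.lower().split())
    return model

def predict(model, text):
    """Score each label by how often the message's words appear in its counts."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words) for label, counts in model.items()}
    return max(scores, key=scores.get)

# Hypothetical two-category training set for illustration
model = train([
    ("umeshinda bonasi ya promo leo", "PROMOTIONAL"),
    ("ofa maalum ya promo kwa wateja wapya", "PROMOTIONAL"),
    ("umepokea malipo kiasi cha shilingi 5000", "TRANSACTION"),
    ("muamala wako wa malipo umekamilika", "TRANSACTION"),
])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;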

&lt;h3&gt;
  
  
  &lt;strong&gt;Data Collection and Annotations&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The first step was sourcing, collecting, and annotating the SMS data that would be used to train our machine learning model. Data collection was done using the &lt;a href="https://play.google.com/store/apps/details?id=com.jerryzigo.smsbackup"&gt;SMS backup&lt;/a&gt; application with multiple individual contributors, and the app's output was well-organized JSON data of SMS and their details, as shown in the example snippet below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"7126"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"address"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"TIGO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Tigo inakutakia maadhimisho mema ya siku ya Muungano."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1619430394016"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"errorCode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"locked"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"messageDirection"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INCOMING"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"messageType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SMS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"protocol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"read"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"replyPathPresent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"seen"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"serviceCenter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"+2557********"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Tigo inakutakia maadhimisho mema ya siku ya Muungano."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"threadId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"492"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then annotated our data into distinct categories based on the context and intention of the text messages; these are the categories we came up with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Promotional&lt;/li&gt;
&lt;li&gt;Notification&lt;/li&gt;
&lt;li&gt;Transaction&lt;/li&gt;
&lt;li&gt;Sports Betting&lt;/li&gt;
&lt;li&gt;Michezo ya Bahati Nasibu (General gambling SMS)&lt;/li&gt;
&lt;li&gt;Survey&lt;/li&gt;
&lt;li&gt;Verification&lt;/li&gt;
&lt;li&gt;Informational&lt;/li&gt;
&lt;li&gt;Personal&lt;/li&gt;
&lt;li&gt;SPAM&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We then exported our data into CSV format, ready for crunching. &lt;em&gt;Where is the data?&lt;/em&gt; We won't be able to share it for now because some of the SMS contain identifiable personal information; we are currently working on cleaning the data and ensuring it is of good quality, and we will then share it through our &lt;a href="https://github.com/neurotech-HQ"&gt;Github repository&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Here we go&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now that you have a bit of background about the data we are going to use to train our model, let's get our hands dirty. We'll break the task down into three steps: &lt;em&gt;data preprocessing&lt;/em&gt;, &lt;em&gt;training the machine learning model,&lt;/em&gt; and &lt;em&gt;model evaluation&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Data Preprocessing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Data preprocessing&lt;/em&gt; is a way of converting raw data into a format that can be easily parsed by a machine learning model. We need to preprocess our datasets to easily train our model. But first, let's read and view the structure of our datasets with the help of the &lt;a href="https://pandas.pydata.org/"&gt;Pandas library&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'./raw sms data/data.csv'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xZEhYU7l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/dataset-view1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xZEhYU7l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/dataset-view1.png" alt="https://blog.neurotech.africa/content/images/2022/08/dataset-view1.png" width="880" height="138"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see, we have several columns in our dataset. Let's start by exploring the &lt;em&gt;messageDirection&lt;/em&gt; values our data has:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"messageDirection"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Output: INCOMING    3384  "There are 3384 incoming messages"
#         OUTGOING      62
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we know the collected data consists of both &lt;strong&gt;&lt;em&gt;OUTGOING&lt;/em&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;em&gt;INCOMING&lt;/em&gt;&lt;/strong&gt; SMS, but from the very nature of our task, our primary interest lies in the incoming messages only; therefore we need to keep only the rows whose messageDirection is &lt;em&gt;INCOMING&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;incoming_sms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"messageDirection"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"INCOMING"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;interested_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;incoming_sms&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s"&gt;'address'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'label'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Examining Label distribution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Examining the distribution of labels is crucial because it can reveal how well your model is likely to perform on each label.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hwgTAI8B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/label-distribution-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hwgTAI8B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/label-distribution-1.png" alt="https://blog.neurotech.africa/content/images/2022/08/label-distribution-1.png" width="880" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, most of our messages are labeled "NOTIFICATION", while "SPAM" messages are the fewest, which means our dataset is imbalanced.&lt;/p&gt;
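
&lt;p&gt;A quick way to quantify that imbalance is to compare the largest and smallest label counts. The miniature frame below is made up for illustration; on the real data you would run the same two lines on &lt;em&gt;interested_data&lt;/em&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd

# Made-up miniature frame standing in for interested_data
sample = pd.DataFrame({
    "label": ["NOTIFICATION"] * 5 + ["PROMOTIONAL"] * 3 + ["SPAM"] * 1,
})

counts = sample["label"].value_counts()
imbalance_ratio = counts.max() / counts.min()  # 5.0 here: heavily skewed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;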

&lt;p&gt;Let's also remove duplicate messages from our dataset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;interested_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;interested_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;dirty_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;cleaned_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="n"&gt;used_texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dirty_dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;used_texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="n"&gt;cleaned_dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="n"&gt;used_texts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cleaned_dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Only Filter out interested_data whose id is in ids
&lt;/span&gt;&lt;span class="n"&gt;cleaned_incoming_sms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;interested_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;interested_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cleaned_incoming_sms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Output: 1920
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our data has been reduced from &lt;strong&gt;3384&lt;/strong&gt; to &lt;strong&gt;1920&lt;/strong&gt; rows, which means almost &lt;strong&gt;43%&lt;/strong&gt; of our dataset consisted of duplicates. Still, this is an acceptable amount of data to train our model.&lt;/p&gt;
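&lt;p&gt;As a side note, the deduplication loop above can also be expressed with pandas' built-in &lt;code&gt;drop_duplicates&lt;/code&gt;. A minimal sketch on a hypothetical three-row frame:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical frame with one duplicated text
interested_data = pd.DataFrame({
    "address": ["A", "B", "C"],
    "text": ["hello", "hello", "offer"],
    "label": ["NOTIFICATION", "NOTIFICATION", "SPAM"],
})

# Keep the first occurrence of each text, mirroring the loop above
cleaned_incoming_sms = interested_data.drop_duplicates(subset="text", keep="first")
print(len(cleaned_incoming_sms))  # 2
```

&lt;p&gt;&lt;code&gt;subset="text"&lt;/code&gt; deduplicates on the message body only, and &lt;code&gt;keep="first"&lt;/code&gt; matches the loop's behavior of keeping the earliest row.&lt;/p&gt;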

&lt;p&gt;Now let's get a good look at our data by visualizing it with a &lt;a href="https://pypi.org/project/wordcloud/"&gt;wordcloud&lt;/a&gt;. But before that, we need to remove a few stopwords. Then we can see how often certain words appear in the texts for each category.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Removing stop words
&lt;/span&gt;&lt;span class="n"&gt;stopwords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"na"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ya"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"wa"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"kwa"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"pia"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"kisha"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"au"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;cleaned_incoming_sms&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cleaned_incoming_sms&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stopwords&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
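&lt;p&gt;Before rendering the wordcloud, a plain word-frequency count already hints at which words will dominate each cloud. A standard-library-only sketch on a few hypothetical cleaned messages:&lt;/p&gt;

```python
from collections import Counter

# Hypothetical cleaned messages, stand-ins for cleaned_incoming_sms['text']
messages = [
    "salio yako imepokelewa",
    "ofa maalum leo tu",
    "salio yako limeisha",
]

# Count word frequencies across all messages
word_counts = Counter(word for text in messages for word in text.split())
print(word_counts.most_common(3))
```

&lt;p&gt;Feeding these frequencies (grouped by label) into the wordcloud library is what produces the per-category clouds shown below.&lt;/p&gt;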



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WpMGTxIH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/Classes-export.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WpMGTxIH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/Classes-export.png" alt="https://blog.neurotech.africa/content/images/2022/08/Classes-export.png" width="880" height="629"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;cleaned_incoming_sms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--F94xGwM0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/dataset-view2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--F94xGwM0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/dataset-view2.png" alt="https://blog.neurotech.africa/content/images/2022/08/dataset-view2.png" width="576" height="161"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above result of our &lt;em&gt;cleaned-incoming-sms&lt;/em&gt; is not particularly clean. We need to put in some extra effort.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make all of them lowercase.&lt;/li&gt;
&lt;li&gt;Remove all non-alphanumeric characters such as ",", "+", "%", "!", ":".&lt;/li&gt;
&lt;li&gt;Remove all numbers in the text messages.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;re&lt;/span&gt;
&lt;span class="c1"&gt;# Clean the texts
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# remove all non-alphanumeric characters
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;#convert text to lower-case
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'[‘’“”…,]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'[()]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'[^a-zA-Z]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;' +'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;

&lt;span class="n"&gt;cleaned_incoming_sms&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cleaned_incoming_sms&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All of our texts are clean now, so we can start training our model.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Training Machine Learning Model&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We are going to use the &lt;a href="https://scikit-learn.org/stable/index.html"&gt;Scikit-learn library&lt;/a&gt; to provide us all useful tools to train our model. Let's import our required tools and train our model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_pipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.feature_extraction.text&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TfidfVectorizer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cleaned_incoming_sms&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;cleaned_incoming_sms&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'label'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;TfidfVectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lowercase&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_features&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stop_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stopwords&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since our dataset is not very large, the model will finish training in a short time. Once training finishes, we can check its score.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Output: 0.9380530973451328
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, our model scores about &lt;strong&gt;94%&lt;/strong&gt; on the test data, which is quite good.&lt;/p&gt;
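&lt;p&gt;Keep in mind that on an imbalanced dataset like ours, overall accuracy can hide poor performance on rare labels. A per-class recall check makes this visible; the sketch below uses hypothetical toy labels rather than our actual predictions:&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical true labels and predictions, for illustration only
y_true = ["SPAM", "SPAM", "NOTIFICATION", "NOTIFICATION", "NOTIFICATION", "PROMOTIONAL"]
y_pred = ["NOTIFICATION", "SPAM", "NOTIFICATION", "NOTIFICATION", "NOTIFICATION", "NOTIFICATION"]

# Recall per class: of all true members of a class, how many did we find?
totals, hits = defaultdict(int), defaultdict(int)
for true, pred in zip(y_true, y_pred):
    totals[true] += 1
    if true == pred:
        hits[true] += 1

recall = {label: hits[label] / totals[label] for label in totals}
print(recall)  # {'SPAM': 0.5, 'NOTIFICATION': 1.0, 'PROMOTIONAL': 0.0}
```

&lt;p&gt;Here overall accuracy would be 67%, yet recall on "PROMOTIONAL" is zero, which is exactly the kind of gap a single accuracy score can mask.&lt;/p&gt;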

&lt;h3&gt;
  
  
  &lt;strong&gt;Testing our model&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let's save our model for later use; we will then load it in another file and test it with some new messages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;joblib&lt;/span&gt;
&lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'./pipeline.pkl'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; Before we test our model with new messages, we have to remember to pass them through the &lt;strong&gt;&lt;code&gt;clean_text&lt;/code&gt;&lt;/strong&gt; function to preprocess the text (remove non-alphanumeric characters, numbers, etc.) before feeding it to our model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'./pipeline.pkl'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'test_data.txt'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;test_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;readlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;test_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"Text: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; Prediction: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)])[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MJl93TRM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/prediction.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MJl93TRM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/prediction.png" alt="https://blog.neurotech.africa/content/images/2022/08/prediction.png" width="880" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We tested our model with 14 messages it had never seen before. As you can see from the result above, most of the messages in the test data were "SPAM". But the model missed most of them, since there were only a few spam messages available for training.&lt;/p&gt;

&lt;p&gt;The model also didn't perform well on the "PROMOTIONAL" label, because removing duplicated messages from our dataset changed the label distribution considerably.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LqQnkkfI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/label-distribution-modified.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LqQnkkfI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/label-distribution-modified.png" alt="https://blog.neurotech.africa/content/images/2022/08/label-distribution-modified.png" width="880" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Label distribution after removing duplicates&lt;/em&gt;&lt;/p&gt;
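&lt;p&gt;One possible mitigation, shown here as a sketch on a hypothetical toy corpus rather than our actual training set, is to pass &lt;code&gt;class_weight="balanced"&lt;/code&gt; to the classifier so that rare labels are weighted more heavily during training:&lt;/p&gt;

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy corpus; real training would reuse x_train / y_train
texts = ["win free prize now", "claim your free cash", "meeting at noon",
         "see you tomorrow", "free jackpot winner", "lunch next week"]
labels = ["SPAM", "SPAM", "NOTIFICATION", "NOTIFICATION", "SPAM", "NOTIFICATION"]

# class_weight='balanced' reweights each class inversely to its frequency
weighted_pipeline = make_pipeline(
    TfidfVectorizer(lowercase=True),
    RandomForestClassifier(n_estimators=10, class_weight="balanced", random_state=42),
)
weighted_pipeline.fit(texts, labels)
print(weighted_pipeline.predict(["free prize cash"])[0])
```

&lt;p&gt;This doesn't add any data, but it pushes the model to pay more attention to under-represented labels such as "SPAM" and "PROMOTIONAL".&lt;/p&gt;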

&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Any model's performance is strongly influenced by the quantity and quality of its training data. We couldn't access large datasets, but by spending more time thoroughly cleaning our training data, we can attempt to improve the accuracy of our model. Furthermore, we can tune some parameters before training, or experiment with alternative machine learning classifiers such as Decision Tree or SVM, to achieve the best results and improve the performance of our model.&lt;/p&gt;
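&lt;p&gt;For instance, alternative classifiers can be compared with cross-validation before committing to one. The sketch below uses a hypothetical toy corpus; a real comparison would reuse our cleaned dataset:&lt;/p&gt;

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Hypothetical toy corpus, balanced so 2-fold stratified CV works
texts = ["win free prize", "free cash offer", "claim jackpot now", "free winner prize",
         "meeting at noon", "see you tomorrow", "lunch next week", "call me later"]
labels = ["SPAM"] * 4 + ["NOTIFICATION"] * 4

# Score each candidate classifier with the same TF-IDF features
for clf in (DecisionTreeClassifier(random_state=42), LinearSVC()):
    model = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(model, texts, labels, cv=2)
    print(type(clf).__name__, scores.mean())
```

&lt;p&gt;Because the vectorizer sits inside the pipeline, each fold fits TF-IDF only on its own training split, avoiding leakage into the validation split.&lt;/p&gt;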

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Thank you.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>4 Popular Natural Language Processing Techniques</title>
      <dc:creator>Elia</dc:creator>
      <pubDate>Wed, 10 Aug 2022 08:55:11 +0000</pubDate>
      <link>https://dev.to/neurotech_africa/4-popular-natural-language-processing-techniques-5g63</link>
      <guid>https://dev.to/neurotech_africa/4-popular-natural-language-processing-techniques-5g63</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. Source: Wikipedia&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is most likely that you have used NLP in one way or another. If you have ever messaged a business and received an &lt;em&gt;immediate reply&lt;/em&gt;, it was probably NLP at work. Or perhaps you have just gotten home from work, filled your cup with coffee, and asked &lt;a href="https://www.apple.com/siri/"&gt;Siri&lt;/a&gt; to play some relaxing seaside sounds. Without a doubt, you use NLP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human language&lt;/strong&gt; is very complex, filled with &lt;em&gt;sarcasm&lt;/em&gt;, &lt;em&gt;idioms&lt;/em&gt;, &lt;em&gt;metaphors&lt;/em&gt;, and &lt;em&gt;grammar&lt;/em&gt; to mention a few. All of these make it difficult for computers to easily grasp the intended meaning of a certain sentence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Take an example of a sarcastic conversation:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;John is sewing clothes with his eyes closed.&lt;br&gt;
&lt;strong&gt;Martin:&lt;/strong&gt; John, what are you doing? You're going to hurt yourself.&lt;br&gt;
&lt;strong&gt;John:&lt;/strong&gt; No, I won't. (After a few moments, John accidentally pricks himself with the needle.)&lt;br&gt;
&lt;strong&gt;Martin:&lt;/strong&gt; Well, what a surprise.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With Natural Language Processing (NLP) techniques we can break human texts and sentences down and process them so that computers can understand what's happening. In this article, we are going to learn, with examples, about the most common techniques and how they're applied. We will look at:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sentiment Analysis&lt;/li&gt;
&lt;li&gt;Text Classification&lt;/li&gt;
&lt;li&gt;Text Summarization&lt;/li&gt;
&lt;li&gt;Named Entity Recognition&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Sentiment Analysis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most businesses want to know what their customers' feedback is concerning their services or products. But you might be facing millions of pieces of customer feedback, and analyzing all of it is painful and tedious, even if you are offered a large sum of money to accomplish it. Sentiment analysis can be useful in this situation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Sentiment Analysis is a natural language processing technique used to determine whether textual data carries a positive, negative, or neutral sentiment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Businesses even use sentiment analysis to determine whether a customer's comment indicates any interest in the product or service. Sentiment analysis can also be extended to examine the mood of the text data (sad, furious, or excited).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8tPbu6o6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/sentiment-analysis.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8tPbu6o6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/sentiment-analysis.jpg" alt="Sentiment analysis image" width="689" height="517"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: revechat.com&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Use case&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To accomplish this, let's use the Hugging Face transformers library. We are going to use a pre-trained model from the &lt;a href="https://huggingface.co/models"&gt;Hugging Face model hub&lt;/a&gt; called &lt;a href="https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english"&gt;"distilbert-base-uncased-finetuned-sst-2-english"&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# let's first install transformers library&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;transformers

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the library is installed, completing the task is quite simple.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;
&lt;span class="n"&gt;analyser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sentiment-analysis"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above code will import the library and use a default pre-trained model to perform sentiment analysis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;user_comment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"The product is very useful. It have helped me alot."&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_comment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Output: [{'label': 'POSITIVE', 'score': 0.9997726082801819}]
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output shows that the sentiment of the user's comment is &lt;strong&gt;POSITIVE&lt;/strong&gt; and the model is &lt;strong&gt;99.9772%&lt;/strong&gt; sure.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Text Classification&lt;/strong&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Text classification also known as text categorization is a natural language processing technique which analyses textual data and assigns them to a predefined category.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://www.cisco.com/c/en_in/products/security/email-security/what-is-spam.html"&gt;Spam emails&lt;/a&gt; occasionally arrive in your mailbox. When you click on one of these links, your computer may become infected with malware. Therefore, practically all email service providers employ this NLP technique to classify or categorize the email as either spam or not.&lt;/p&gt;

&lt;p&gt;To effectively categorize your incoming emails, text classifiers are trained using a lot of spam and non-spam email data.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Use case&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let's try to create a simple text classifier to classify whether the text we input is spam. We are going to use &lt;a href="https://textblob.readthedocs.io/en/dev/"&gt;the TextBlob library&lt;/a&gt; to achieve this.&lt;/p&gt;

&lt;p&gt;Let's create some training data to train our own classifier.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Congratulation you won a your prize'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'spam'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'URGENT You have won a 1 week FREE membership in our 100000 Prize Jackpot'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'spam'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'SIX chances to win CASH From 100 to 20000 pounds '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'spam'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'WINNER As a valued network customer you have been selected to receive 900 prize reward'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'spam'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Free entry in 2 a weekly competition to win FA Cup final tickets 21st May 2005. Text FA to 87121 to receive"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'spam'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'I do not like this restaurant'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'no-spam'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'I am tired of this stuff.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'no-spam'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I can't deal with this"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'no-spam'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'he is my sworn enemy!'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'no-spam'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'my boss is horrible.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'no-spam'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'This job is bad'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'no-spam'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let's import our classifier from the TextBlob library and train it on the training data we created. We are going to use the &lt;a href="https://en.wikipedia.org/wiki/Naive_Bayes_classifier"&gt;NaiveBayesClassifier&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;textblob.classifiers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NaiveBayesClassifier&lt;/span&gt;

&lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NaiveBayesClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After training completes (which should take only a few seconds, depending on your machine), we can feed in a new message to see how it works.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Congratulation you won a free prize of 20000 dollars and Iphone 13"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Output: 'spam'
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our simple model correctly identified our message as "spam," which it is.&lt;/p&gt;
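&lt;p&gt;Under the hood, a Naive Bayes classifier scores each class by combining a class prior with smoothed per-word likelihoods and picks the highest-scoring class. Here is a minimal, self-contained sketch of that idea in plain Python; the toy data and add-one smoothing are illustrative assumptions, not TextBlob's actual implementation:&lt;/p&gt;

```python
import math
from collections import Counter

# Toy training data echoing the example above (abbreviated, hypothetical)
train = [
    ("congratulation you won a prize", "spam"),
    ("winner you have been selected to receive a prize reward", "spam"),
    ("free entry to win cash", "spam"),
    ("i do not like this restaurant", "no-spam"),
    ("i am tired of this stuff", "no-spam"),
    ("my boss is horrible", "no-spam"),
]

def train_nb(data):
    """Collect per-class word counts, document counts, and the vocabulary."""
    word_counts = {}          # class -> Counter of word frequencies
    doc_counts = Counter()    # class -> number of training documents
    vocab = set()
    for text, label in data:
        doc_counts[label] += 1
        counter = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counter[word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab

def classify_nb(text, word_counts, doc_counts, vocab):
    """Return the class with the highest posterior log-probability."""
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counter in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)   # class prior
        total_words = sum(counter.values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words don't zero out the score
            score += math.log((counter[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, doc_counts, vocab = train_nb(train)
print(classify_nb("you won a free cash prize", word_counts, doc_counts, vocab))
# → spam
```

&lt;p&gt;Libraries like TextBlob wrap exactly this kind of counting behind a friendlier API, along with tokenization and feature extraction.&lt;/p&gt;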

&lt;h2&gt;
  
  
  &lt;strong&gt;Text Summarization&lt;/strong&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Text summarization is a natural language processing technique for producing a shorter version of a long piece of text.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Imagine you are half asleep when your boss sends you a message telling you to read a specific document. When you check, the document is ten pages long. For you, text summarization might be a ground-breaking concept.&lt;/p&gt;

&lt;p&gt;Most text summarization models are extractive: they pull the most crucial sentences out of a document and stitch them into the final text. Some models, however, are abstractive: they try to restate the meaning of the lengthy text in their own words.&lt;/p&gt;
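&lt;p&gt;The extractive approach can be sketched with a simple frequency heuristic: score each sentence by the total frequency of its words across the document and keep the top-scoring sentences in their original order. This is only an illustration of the idea, not what the transformers pipeline used below actually does:&lt;/p&gt;

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    """Score sentences by summed word frequency; keep the top n in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)
    scores = [
        (sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())), i, s)
        for i, s in enumerate(sentences)
    ]
    # Highest score first, then restore document order for readability
    top = sorted(sorted(scores, reverse=True)[:n_sentences], key=lambda t: t[1])
    return " ".join(s for _, _, s in top)

doc = ("The Solar System is the gravitationally bound system of the Sun. "
       "Most of its mass is in the Sun. "
       "Comets are small icy bodies.")
print(extractive_summary(doc, n_sentences=1))
```

&lt;p&gt;Modern abstractive models replace this word counting with a learned sequence-to-sequence transformer, which is what the pipeline below downloads for you.&lt;/p&gt;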

&lt;h3&gt;
  
  
  &lt;strong&gt;Use case&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For this, we'll also make use of the transformers library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we are going to use the &lt;strong&gt;"summarization"&lt;/strong&gt; pipeline to summarize our long text.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;summarizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"summarization"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# If the you don't have the summarization model in your machine, It will be downloaded from the internet.
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lengthy text can then be copied and pasted from anywhere for summarization.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;long_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"""
    The Solar System is the gravitationally bound system of the Sun and the objects that orbit it. It formed 4.6 billion years ago from the gravitational collapse of a giant interstellar molecular cloud. The vast majority (99.86%) of the system's mass is in the Sun, with most of the remaining mass contained in the planet Jupiter. The four inner system planets—Mercury, Venus, Earth and Mars—are terrestrial planets, being composed primarily of rock and metal. The four giant planets of the outer system are substantially larger and more massive than the terrestrials. The two largest, Jupiter and Saturn, are gas giants, being composed mainly of hydrogen and helium; the next two, Uranus and Neptune, are ice giants, being composed mostly of volatile substances with relatively high melting points compared with hydrogen and helium, such as water, ammonia, and methane. All eight planets have nearly circular orbits that lie near the plane of Earth's orbit, called the ecliptic.
"""&lt;/span&gt;

&lt;span class="c1"&gt;# You can set an optional parameter of max_length to maximum number of words you want to be outputted
&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;summarizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;long_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Output: [{'summary_text': " The Solar System formed 4.6 billion years ago from the gravitational collapse of a giant interstellar molecular cloud . The vast majority (99.86%) of the system's mass is in the Sun, with most of the remaining mass contained in the planet Jupiter . The four inner system planets are terrestrial planets, being composed primarily of rock and metal ."}]
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Named Entity Recognition (NER)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Have you ever heard fanciful tales about how a particular firm listens in on all calls, chats, and online interactions to see what people are saying about it? Well, &lt;strong&gt;"if"&lt;/strong&gt; this is true, one of their strategies might be named entity recognition: an NLP technique that identifies and classifies named entities in text data.&lt;/p&gt;

&lt;p&gt;Named entities are simply real-world objects such as people, organizations, locations, and products. A NER model might identify ‘Dar-es-salaam’ as a location or ‘Michael’ as a person's name.&lt;/p&gt;
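&lt;p&gt;The simplest possible illustration of NER is a gazetteer lookup, where a dictionary maps known names to entity types. Real NER models, like the spaCy model used below, learn to use sentence context instead of fixed lists; the word list here is a made-up example:&lt;/p&gt;

```python
# Hypothetical gazetteer; real NER systems learn entities from context
GAZETTEER = {
    "dar-es-salaam": "LOC",
    "michael": "PERSON",
    "neurotech": "ORG",
}

def toy_ner(text):
    """Return (token, label) pairs for tokens found in the gazetteer."""
    entities = []
    for token in text.replace(",", " ").split():
        label = GAZETTEER.get(token.lower())
        if label:
            entities.append((token, label))
    return entities

print(toy_ner("Michael flew to Dar-es-salaam"))
# → [('Michael', 'PERSON'), ('Dar-es-salaam', 'LOC')]
```

&lt;p&gt;A lookup like this fails on unseen names and ambiguous words ("Amazon" the company vs. the river), which is exactly why statistical models that read the surrounding context are the standard approach.&lt;/p&gt;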

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WWnjvPgl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/ner.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WWnjvPgl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/ner.jpg" alt="https://blog.neurotech.africa/content/images/2022/07/ner.jpg" width="880" height="528"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Source: shaip.com&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Use case&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We will use the spaCy library for this task. We need to install it and download a pre-trained English model to help us achieve our task faster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; spacy

&lt;span class="c"&gt;# Then downloading the model&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; spacy download en_core_web_sm

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are going to import the spacy library and then load the model we downloaded so we can perform our task.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;spacy&lt;/span&gt;

&lt;span class="n"&gt;nlp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"en_core_web_sm"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After loading our model, we can simply input our text and spaCy will give us the named entities present in it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nlp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"The ISIS has claimed responsibility for a suicide bomb blast in the Tunisian capital earlier this week."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;spacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;displacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"ent"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Output: ISIS ORG
#         Tunisian NORP
#         earlier this week DATE
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output shows different entities detected by spaCy with their respective labels.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; If you don't understand the meaning of a label abbreviation in spaCy, you can use &lt;strong&gt;spacy.explain()&lt;/strong&gt; to look it up.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Let's say you didn't understand the meaning of an abbreviation "ORG"
&lt;/span&gt;
&lt;span class="n"&gt;spacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;explain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ORG"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Output: 'Companies, agencies, institutions, etc.'
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The good news is that it's simple to get started with these techniques nowadays. Large language models like &lt;strong&gt;Google's LaMDA&lt;/strong&gt; and &lt;strong&gt;&lt;a href="https://beta.openai.com/examples"&gt;GPT-3&lt;/a&gt;&lt;/strong&gt; are available to aid in NLP tasks. You can easily build helpful natural language processing projects with tools like &lt;strong&gt;&lt;a href="https://spacy.io/"&gt;spaCy&lt;/a&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;a href="https://huggingface.co/"&gt;Hugging Face&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thanks.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How is conversational AI impacting the finance industry?</title>
      <dc:creator>Anthony Mipawa</dc:creator>
      <pubDate>Tue, 09 Aug 2022 07:21:18 +0000</pubDate>
      <link>https://dev.to/neurotech_africa/how-is-conversational-ai-impacting-the-finance-industry-agh</link>
      <guid>https://dev.to/neurotech_africa/how-is-conversational-ai-impacting-the-finance-industry-agh</guid>
      <description>&lt;p&gt;This article was originally published in the &lt;a href="https://blog.neurotech.africa/how-is-conversational-ai-imapacting-the-finance-industry/"&gt;neurotech Africa&lt;/a&gt; blog.&lt;/p&gt;

&lt;p&gt;The evolution of technology continues to spread across multiple industries, and the finance industry can't be left behind: as one of the most important segments of the economy, it is experiencing immense transformation.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;About Finance industry&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The finance sector is wide, constituting at least 20% of the global economy, and its impact on economic growth is significant.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;According to the finance and development department of the International Monetary Fund, financial services are the processes by which consumers or businesses acquire financial goods. For example, a payment system provider offers a financial service when it accepts and transfers funds between payers and recipients. This includes accounts settled through credit and debit cards, checks, and electronic funds transfers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In developing countries, fin-tech firms are gaining prominence, aided by the rise of digital public goods and currencies. Migrating to online and mobile services will remain a priority for financial firms on the way to a cashless economy, and financial services companies such as banks, tax and accounting services, and insurers will need to compete with emergent financial firms. Building services in-house can be the best option for large companies, but not for every company or every solution; for small to medium microfinance institutions, the best way to migrate is by outsourcing to companies with the talent to build solutions that meet the demands of the digital economy.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Conversational AI use cases in the Finance industry:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Conversational AI is one of the essential boosts in the finance sector, from sales and marketing to customer service. Conversational AI solutions allow customer service to be managed both quickly and efficiently; the key advantage of this technology is that it acts as a listening channel and gives you a better understanding of your customers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KQVg9R6N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/benefit01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KQVg9R6N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/benefit01.png" alt="https://blog.neurotech.africa/content/images/2022/08/benefit01.png" width="828" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is a collective way of understanding which products are performing better and how customers view your services: what they like, what they don't, their feedback, and their suggestions. All of these pieces of information have the potential to improve your business by personalizing services, and recommendations of new services or products will help customers get better service based on what they already use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IQDY0WGH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/benefit0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IQDY0WGH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/benefit0.png" alt="https://blog.neurotech.africa/content/images/2022/08/benefit0.png" width="853" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In fact, conversational AI solutions help businesses to reduce operational costs by improving the efficiency of their service, minimizing human error, and resolving customer queries quicker.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2VwkrEI6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/benefit02.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2VwkrEI6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/08/benefit02.png" alt="https://blog.neurotech.africa/content/images/2022/08/benefit02.png" width="877" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits of conversational AI solution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are the use cases of conversational AI in the finance industry?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manage payments and transactions:- On a regular basis, people have to clear bills, pay businesses, shop online, or perform other kinds of online transactions. A conversational AI can help the user make and track these payments. Clearing payments can often be urgent and time-bound, and in such cases switching platforms to complete transactions can be inconvenient. But with an omnichannel conversational AI, your customers can make payments right where they are and avoid any delays!&lt;/li&gt;
&lt;li&gt;Lead generation:- Conversational AI solutions have no match when it comes to interaction. They can interact with customers for the first time and understand their needs and the sentiment behind the conversation. This very human interaction can help banks acquire new customers and collect their details, which are then handed to the sales team to take the conversation forward.&lt;/li&gt;
&lt;li&gt;Resolve common and repetitive inquiries:- Some repetitive activities are really boring; there are questions that most of your users ask frequently, such as "How do I restore an unsuccessful transaction?", "What are the steps to follow to get a loan?", or "What is the status of my loan application?". Instead of customers going through a long list of frequently asked questions, a conversational AI solution can handle these with an instant reply.&lt;/li&gt;
&lt;li&gt;Easy document collection and sharing:- Assume your customer wants to apply for a new loan but keeps getting sent back from the bank each time because of new inconsistencies in verification. Very annoying, right? Nobody is happy in this situation, neither your customer nor you, yet it is a pretty common scene in a bank, mostly because of a lack of knowledge and awareness on the customer's side. Form filling, document collection, and verification are therefore common conversational AI use cases in banking and insurance.&lt;/li&gt;
&lt;li&gt;Locate the nearest service providers:- This may include ATMs, agents, and branches. Assume you're new in a city and need to find a certain bank branch or an ATM: instead of asking multiple people, a conversational AI solution with the geolocation of all your outlets can make it easier for your customers to navigate to the nearest service provider.&lt;/li&gt;
&lt;li&gt;Feedback collection:- Customers are happy to give feedback and reviews when their hard-earned money and other services are taken care of by the bank or insurance company. Instead of using long survey forms, banks can now integrate chatbots on their websites and apps to collect this feedback and these reviews.&lt;/li&gt;
&lt;li&gt;Handling suspicious activities:- Security and data privacy concern every business, but for banks and financial organizations their reputation relies on them. Conversational AI solutions can effectively monitor and recognize the warning signs of fraudulent activity and issue alerts directly to the customer and the bank.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Conversational AI solution by Neurotech&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.neurotech.africa/"&gt;Neurotech&lt;/a&gt; we are an AI company that builds &lt;a href="https://www.neurotech.africa/#services"&gt;solutions&lt;/a&gt; for businesses currently we do develop conversational AI for business needs which are controlled by our internal engine goes by the name &lt;a href="https://sarufi.io/#_"&gt;Sarufi&lt;/a&gt;. We offer custom solutions to fit various business needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is it useful?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our conversational AI solutions can provide seamless customer support across multiple platforms, enabling you to offer a more personalized, contextual service to customers; you can explore more &lt;a href="https://blog.neurotech.africa/how-can-neurotech-transform-your-business-with-conversational-ai/"&gt;here&lt;/a&gt;&lt;strong&gt;.&lt;/strong&gt; Our solutions are developed to understand the contextual meaning of an interaction or conversation with the targeted audience, and our custom chatbots can be deployed on social media platforms like WhatsApp, Facebook, Instagram, and Telegram, depending on what our customers need.&lt;/p&gt;

&lt;p&gt;Currently, our solutions work in two languages only, Swahili and English. They can help your business with customer support, locating nearby service providers, and saving on labor costs: you can pay a smaller support team fair wages without being stretched to support a large staff, while increasing revenue and building opportunities with every customer interaction 😊.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Final thoughts&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Businesses should focus on business needs, not on technology; the aim, after all, is to earn more and make sure things are moving in the right direction. Technology is advancing at a rapid pace, which can be confusing. Executives and business professionals may find that their decisions lag behind rapidly growing technologies like conversational AI, blockchain, and data analysis. This misunderstanding may lead to consuming non-actionable insights into your business operations, which can be stressful and overwhelming for your customers.&lt;/p&gt;

&lt;p&gt;To avoid that mistake, and to avoid overloading customers with unnecessary information, executives should get close to technology experts to understand clearly what can be solved based on their needs, using only actionable insights. This will help avoid unnecessary costs and false expectations about something that can't work for your business.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.neurotech.africa/#contact"&gt;Get in touch&lt;/a&gt; with Neurotech’s team to discover how you can benefit from our conversational solutions to boost your business, we do consultations on best practices for using data insights to address your business needs.&lt;/p&gt;

&lt;p&gt;Identify the needs of your business and build solutions for them. Don't implement something simply because ABC company has implemented it; do something with real potential for your business's growth.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kA9NRsI3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r3rilv2ydy6hsgnk0rn6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kA9NRsI3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r3rilv2ydy6hsgnk0rn6.jpg" alt="Image description" width="390" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>The cause of a decision in Swahili social media sentiments</title>
      <dc:creator>Anthony Mipawa</dc:creator>
      <pubDate>Tue, 09 Aug 2022 07:07:00 +0000</pubDate>
      <link>https://dev.to/neurotech_africa/the-cause-of-a-decision-in-swahili-social-media-sentiments-2jhp</link>
      <guid>https://dev.to/neurotech_africa/the-cause-of-a-decision-in-swahili-social-media-sentiments-2jhp</guid>
      <description>&lt;p&gt;This article was originally published in the &lt;a href="https://blog.neurotech.africa/the-cause-of-the-decision-in-swahili-social-media-sentiment/"&gt;neurotech Africa&lt;/a&gt; blog.&lt;/p&gt;

&lt;p&gt;As a data professional one of the best practices is to be accountable for the solutions at hand, by understanding how the model you have built is performing and predicting the results. I came across Swahili social media sentiments and since I'm a Swahili speaker I was curious to understand the cause of decisions in Swahili sentiment analysis using machine learning algorithms.&lt;/p&gt;

&lt;p&gt;In today's article, I will walk you through building a machine learning model for Swahili social media sentiment classification, with the interpretability of each prediction of our final model provided by &lt;a href="https://github.com/marcotcr/lime"&gt;Local Interpretable Model-Agnostic Explanations&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Why Should I Trust You?” Explaining the Predictions of Any Classifier ~ Marco Tulio Ribeiro, Sameer Singh and Carlos Guestrin.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Kiswahili is a lingua franca spoken by up to 150 million people across East Africa. It is an official language in Tanzania, DRC, Kenya, and Uganda. On social media, Swahili speakers tend to express themselves in their own local dialect.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Building Swahili social media sentiment classifier&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Sentiment analysis relies on multiple word senses and cultural knowledge and can be influenced by age, gender, and socio-economic status. In today's task, I will be using datasets from Twitter originally hosted at &lt;a href="https://zindi.africa/competitions/swahili-social-media-sentiment-analysis-challenge"&gt;Google Natural language processing hack series&lt;/a&gt; by zindi Africa, with the aim of classifying whether a Swahili sentence is of positive, negative, or neutral sentiment.&lt;/p&gt;

&lt;p&gt;The dataset contains three columns which are &lt;code&gt;id&lt;/code&gt; as the unique ID of a unique Swahili tweet, &lt;code&gt;tweets&lt;/code&gt; containing the actual text of the Swahili tweet, and &lt;code&gt;labels&lt;/code&gt; the label of the Swahili tweet, either negative(-1), neutral(0), positive(1) with 2263 observations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WtXDEqFl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/sw-head.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WtXDEqFl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/sw-head.png" alt="https://blog.neurotech.africa/content/images/2022/07/sw-head.png" width="671" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How about label distribution?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RzGttkrK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/class-dist.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RzGttkrK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/class-dist.png" alt="https://blog.neurotech.africa/content/images/2022/07/class-dist.png" width="720" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most of the tweets collected are neutral, which shows that our labels are imbalanced.&lt;/p&gt;
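&lt;p&gt;A quick way to quantify that imbalance is to count the labels. The counts below are illustrative, not the actual dataset's; on the real data you would pass &lt;code&gt;df['labels']&lt;/code&gt; to the counter:&lt;/p&gt;

```python
from collections import Counter

# Hypothetical label values; with the real dataset this would be df['labels']
labels = [0, 0, 0, 1, -1, 0, 1, 0, -1, 0]

counts = Counter(labels)
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"label {label:>2}: {n} tweets ({n / total:.0%})")
```

&lt;p&gt;Knowing the exact class proportions matters later: with a skewed label distribution, accuracy alone can be misleading, so per-class metrics such as precision and recall are a safer guide.&lt;/p&gt;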

&lt;p&gt;Let's work on preprocessing the dataset to make everything ready for building our final machine learning model. This will involve a range of text-cleaning steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;removing non-alphanumeric text.&lt;/li&gt;
&lt;li&gt;removing stopwords.&lt;/li&gt;
&lt;li&gt;converting all tweets into lowercase.&lt;/li&gt;
&lt;li&gt;removing punctuation, links, emojis, and white spaces.&lt;/li&gt;
&lt;li&gt;tokenizing the text into individual words.&lt;/li&gt;
&lt;li&gt;finally, appending all clean tweets to a new column named &lt;code&gt;clean_tweets&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Point to note: &lt;code&gt;nltk&lt;/code&gt; doesn't include Swahili stopwords, so you have to create your own list and apply it to the tweets. I created a &lt;a href="https://github.com/Neurotech-HQ/Cause-of-decision-in-Swahili-sentiments/blob/main/data/Common%20Swahili%20Stop-words.csv"&gt;CSV&lt;/a&gt; file with a couple of Swahili stopwords like na, kwa, kama, lakini, ya, yake, etc., which I will apply here.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Stop words are a set of commonly used words in any language. For example, in English, “the”, “is” and “and”, would easily qualify as stop words. In NLP and text mining applications, stop words are used to eliminate unimportant words, allowing applications to focus on the important words instead.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To make things smooth let's just use one function to perform all of the tasks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean_tweets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;'''
        function to clean tweet column, make it ready for transformation and modeling
    '''&lt;/span&gt;
    &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;#convert text to lower-case
&lt;/span&gt;    &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'[‘’“”…,]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# remove punctuation
&lt;/span&gt;    &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'[()]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# remove parenthesis
&lt;/span&gt;    &lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"[^a-zA-Z]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#remove numbers and keep text/alphabet only
&lt;/span&gt;    &lt;span class="n"&gt;tweet_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nltk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;word_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;clean_tweets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tweet_list&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;swstopwords&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# remove stop words
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clean_tweets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'clean_tweets'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Tweets'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clean_tweets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;function to clean the tweet column and make it ready for transformation and modeling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now the tweets are clean and ready for further processing.&lt;/p&gt;
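&lt;p&gt;For intuition, here is a minimal, self-contained sketch of the same cleaning steps. It uses a plain &lt;code&gt;str.split&lt;/code&gt; in place of &lt;code&gt;nltk.word_tokenize&lt;/code&gt;, and the tiny stop-word list and sample tweet are hypothetical stand-ins for the article's &lt;code&gt;swstopwords&lt;/code&gt; and data:&lt;/p&gt;

```python
import re

# Hypothetical Swahili stop words; the article uses a larger list (swstopwords)
SW_STOPWORDS = {"na", "ya", "wa", "kwa", "ni"}

def clean_tweet_sketch(tweet):
    """Mirror the article's cleaning steps, with a plain split as tokenizer."""
    tweet = tweet.lower()                      # convert text to lower-case
    tweet = re.sub(r"[‘’“”…,]", "", tweet)     # remove punctuation
    tweet = re.sub(r"[()]", "", tweet)         # remove parentheses
    tweet = re.sub(r"[^a-zA-Z]", " ", tweet)   # keep alphabetic characters only
    tokens = [w for w in tweet.split() if w not in SW_STOPWORDS]
    return " ".join(tokens)

print(clean_tweet_sketch("Habari ya leo, (2022) na karibu!"))  # habari leo karibu
```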

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jLFHrwEu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/sw-clean-tweets.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jLFHrwEu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/sw-clean-tweets.png" alt="https://blog.neurotech.africa/content/images/2022/07/sw-clean-tweets.png" width="880" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;datasets after applying the clean_tweet function&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Time to analyze the Swahili tweets by looking at polarity and subjectivity. But wait! What do polarity and subjectivity mean?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Polarity is the expression that determines the sentimental aspect of an opinion. In textual data, the result of sentiment analysis can be determined for each entity in a sentence, paragraph, or document. The sentiment polarity can be positive, negative, or neutral, and is usually defined as a float that ranges from -1 (entirely negative) to 1 (entirely positive).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Sentiment polarity for an element defines the orientation of the expressed sentiment, i.e., it determines if the text expresses the positive, negative or neutral sentiment of the user about the entity in consideration.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Subjectivity is the measure of how factual the text is, ranging from 0 (pure fact) to 1 (pure opinion).&lt;/p&gt;
&lt;/blockquote&gt;
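&lt;p&gt;As a quick sketch of how these polarity scores are usually read back into labels, here is a small hypothetical helper; treating a score of exactly 0.0 as neutral is an illustrative choice, not something the article prescribes:&lt;/p&gt;

```python
def polarity_label(score):
    """Map a polarity float in [-1, 1] to a sentiment label.

    Treating exactly 0.0 as neutral is an illustrative assumption;
    real pipelines often use a small band around zero instead.
    """
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity_label(0.8), polarity_label(-0.3), polarity_label(0.0))
# positive negative neutral
```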

&lt;p&gt;I will be using &lt;code&gt;textblob&lt;/code&gt; to analyze the tweets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;polarity_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="s"&gt;'''
        This function takes in a text data and returns the polarity of the text
        Polarity is float which lies in the range of [-1,0,1] where 1 means positive statement, 0 means positive statement
        and -1 means a negative statement
    '''&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TextBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;polarity&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;subjectivity_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="s"&gt;'''
      This function takes in text data and returns the subjectivity of the text.
      Subjective sentences generally refer to personal opinion,
      emotion or judgment whereas objective refers to factual information.
      Subjectivity is also a float which lies in the range of [0,1].
  '''&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TextBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subjectivity&lt;/span&gt;

  &lt;span class="c1"&gt;#apply above functions to the data
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'polarity_score'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'clean_tweets'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;polarity_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'subjectivity_score'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'clean_tweets'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subjectivity_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;polarity score&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now let's aggregate the overall polarity and subjectivity of the entire dataset.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The overall polarity of the tweet data is 0.01&lt;/p&gt;

&lt;p&gt;The overall subjectivity of the tweet data is 0.03&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An overall polarity close to zero indicates that the tweets are fairly neutral, and the very low subjectivity suggests they are largely factual.&lt;/p&gt;
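&lt;p&gt;The overall figures are simply the means of the two score columns. A sketch with a hypothetical toy DataFrame standing in for the real tweet data:&lt;/p&gt;

```python
import pandas as pd

# Toy scores standing in for the article's polarity/subjectivity columns;
# the real values come from applying the TextBlob helpers to clean_tweets
df = pd.DataFrame({
    "polarity_score": [0.1, -0.05, 0.0, -0.02],
    "subjectivity_score": [0.0, 0.1, 0.02, 0.0],
})

overall_polarity = df["polarity_score"].mean()
overall_subjectivity = df["subjectivity_score"].mean()
print(f"The overall polarity of the tweet data is {overall_polarity:.2f}")
print(f"The overall subjectivity of the tweet data is {overall_subjectivity:.2f}")
```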

&lt;p&gt;Let's visualize the polarity and subjectivity distributions of each class independently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# visualization
&lt;/span&gt;&lt;span class="n"&gt;fig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cols&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;subplot_titles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Polarity Score Distribution-Negative"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Subjectivity Score Distribution-Negative"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                   &lt;span class="s"&gt;"Polarity Score Distribution-Neutral"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Subjectivity Score Distribution-Neutral"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                   &lt;span class="s"&gt;'Polarity Score Distribution-Positive'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;'Subjectivity Score Distribution-Positive'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;x_title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Score"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'Frequency'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Labels'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'polarity_score'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Labels'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'subjectivity_score'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Labels'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'polarity_score'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Labels'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'subjectivity_score'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Labels'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'polarity_score'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Labels'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'subjectivity_score'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;renderer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"colab"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now here we go,&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wEAJUAFT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/newplot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wEAJUAFT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/newplot.png" alt="https://blog.neurotech.africa/content/images/2022/07/newplot.png" width="880" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;distribution of each class on polarity and subjectivity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In terms of subjectivity, all three classes look similar and no significant difference can be stated, but the polarity of the negative class differs from the positive and neutral classes in terms of skewness.&lt;/p&gt;
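&lt;p&gt;The skewness observation can also be checked numerically with pandas rather than by eye. A sketch with hypothetical per-class scores, using the article's label encoding (-1 negative, 0 neutral, 1 positive):&lt;/p&gt;

```python
import pandas as pd

# Toy polarity scores per class; Labels follows the article's encoding
# (-1 negative, 0 neutral, 1 positive)
df = pd.DataFrame({
    "Labels":         [-1, -1, -1, 0, 0, 0, 1, 1, 1],
    "polarity_score": [-0.9, -0.1, -0.1, 0.0, 0.1, -0.1, 0.1, 0.1, 0.9],
})

# Sample skewness of the polarity distribution within each class
skew_by_class = df.groupby("Labels")["polarity_score"].skew()
print(skew_by_class)
```

A negative value for the -1 class and a positive value for the 1 class would confirm the asymmetry seen in the histograms.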

&lt;p&gt;Let's try to understand the content by visualizing the most frequently used words across all classes, and then look at each class independently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;word_freq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'clean_tweets'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expand&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;word_freq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_freq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'index'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'Word'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'Count'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;px&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_freq&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Word'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_freq&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Count'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update_layout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xaxis_title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Word"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;yaxis_title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Count"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Top 20 most Frequent words in across entire tweet data"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;renderer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"colab"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kSKlqh3U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/newplot--1-.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kSKlqh3U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/newplot--1-.png" alt="https://blog.neurotech.africa/content/images/2022/07/newplot--1-.png" width="880" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;habari&lt;/code&gt;, &lt;code&gt;leo&lt;/code&gt;, &lt;code&gt;siku&lt;/code&gt;, and &lt;code&gt;namba&lt;/code&gt; are the most frequent words in the overall tweet content.&lt;br&gt;
&lt;/p&gt;
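&lt;p&gt;As an aside, the &lt;code&gt;str.split(expand=True).stack().value_counts()&lt;/code&gt; chain used above is equivalent to a standard-library &lt;code&gt;collections.Counter&lt;/code&gt;. A sketch with hypothetical cleaned tweets standing in for &lt;code&gt;df['clean_tweets']&lt;/code&gt;:&lt;/p&gt;

```python
from collections import Counter

# Hypothetical cleaned tweets standing in for df['clean_tweets']
clean_tweets = [
    "habari leo habari",
    "siku njema leo",
    "namba habari",
]

# Equivalent of the str.split(expand=True).stack().value_counts() chain
word_freq = Counter(word for tweet in clean_tweets for word in tweet.split())
print(word_freq.most_common(2))  # [('habari', 3), ('leo', 2)]
```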

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Negative Tweets Word Frequency
&lt;/span&gt;&lt;span class="n"&gt;word_freq_neg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Labels'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'clean_tweets'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expand&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;word_freq_neg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_freq_neg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'index'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'Word'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'Count'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Neutral Tweets Word Frequency
&lt;/span&gt;&lt;span class="n"&gt;word_freq_neut&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Labels'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'clean_tweets'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expand&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;word_freq_neut&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_freq_neut&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'index'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'Word'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'Count'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Positive Tweets Word Frequency
&lt;/span&gt;&lt;span class="n"&gt;word_freq_pos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Labels'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'clean_tweets'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expand&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="n"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;word_freq_pos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_freq_pos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'index'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'Word'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'Count'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;fig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cols&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;subplot_titles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Top 20 most frequent words-Negative"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Top 20 most frequent words-Neutral"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Top 20 most frequent words-Positive"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;x_title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Word"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'Count'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_freq_neg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Word'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_freq_neg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Count'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_freq_neut&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Word'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_freq_neut&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Count'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_freq_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Word'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;word_freq_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Count'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;renderer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"colab"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--svlRm5r6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/newplot--2-.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--svlRm5r6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/newplot--2-.png" alt="https://blog.neurotech.africa/content/images/2022/07/newplot--2-.png" width="880" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Across the negative-class tweets, the most frequently used words are &lt;code&gt;watu&lt;/code&gt;, &lt;code&gt;leo&lt;/code&gt;, and &lt;code&gt;siku&lt;/code&gt;; across the neutral-class tweets, they are &lt;code&gt;habari&lt;/code&gt;, &lt;code&gt;kazi&lt;/code&gt;, and &lt;code&gt;mtu&lt;/code&gt;; and across the positive-class tweets, they are &lt;code&gt;habari&lt;/code&gt;, &lt;code&gt;leo&lt;/code&gt;, and &lt;code&gt;asante&lt;/code&gt;.&lt;/p&gt;
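&lt;p&gt;The &lt;code&gt;word_freq_neg&lt;/code&gt;, &lt;code&gt;word_freq_neut&lt;/code&gt;, and &lt;code&gt;word_freq_pos&lt;/code&gt; frames plotted above can be produced with a simple token count. Here is a minimal sketch; the &lt;code&gt;word_freq&lt;/code&gt; helper and the sample tweets are illustrative rather than the article's exact code, and only the &lt;code&gt;Word&lt;/code&gt;/&lt;code&gt;Count&lt;/code&gt; column names are taken from the plotting code:&lt;/p&gt;

```python
from collections import Counter

import pandas as pd

def word_freq(tweets):
    """Count word occurrences across cleaned tweets, most frequent first."""
    counts = Counter(word for tweet in tweets for word in tweet.split())
    return pd.DataFrame(counts.most_common(), columns=["Word", "Count"])

# toy tweets standing in for the negative-class subset
word_freq_neg = word_freq(["watu leo siku", "watu leo", "watu"])
print(word_freq_neg)
```

&lt;p&gt;Sorting by count up front is what lets the plotting code simply slice &lt;code&gt;.iloc[0:20]&lt;/code&gt; to get the top-20 words per class.&lt;/p&gt;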

&lt;p&gt;Let's prepare our final dataset for modeling by splitting it into two groups, train and test.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# data split
&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"clean_tweets"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Labels"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;seed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;

&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;stratify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, I make our data pipeline ready for training Swahili sentiment models by defining &lt;code&gt;TfidfVectorizer&lt;/code&gt; as the vectorizer and &lt;code&gt;LogisticRegression&lt;/code&gt; as the algorithm for building our model. Using the initialized pipeline, I train the classifier on the training set of tweets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# instantiating model pipeline
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;TfidfVectorizer&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# training model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Great! We have trained our classifier for Swahili social media sentiments, and now it's time to evaluate the model's performance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Classification Report"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"_"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;classification_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;target_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Negative"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Neutral"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;"Positive"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;According to the classification report, the performance is not very good: our model has about 60% accuracy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qZY5TYTc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/classification-report.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qZY5TYTc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/classification-report.png" alt="https://blog.neurotech.africa/content/images/2022/07/classification-report.png" width="701" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Results Interpretability&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It's time to understand what drives our classifier's decisions. We bring in LIME to help interpret each prediction of our model; for clarity, let's filter out three kinds of predictions (negative, neutral, and positive).&lt;/p&gt;

&lt;p&gt;The higher the interpretability of a machine learning model, the easier it is for someone to comprehend why certain decisions or predictions have been made. A model is better interpretable than another model if its decisions are easier for a human to comprehend than decisions from the other model.&lt;/p&gt;

&lt;p&gt;I predict probabilities with the LogisticRegression classifier instead of hard 0 or 1 labels, simply because LIME requires a model that produces a probability score for each class in order to explain the cause of a decision.&lt;/p&gt;
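&lt;p&gt;As a quick sketch of what that means in code, a fitted pipeline like ours exposes &lt;code&gt;predict_proba&lt;/code&gt;, which returns one probability per class; these are the scores LIME perturbs to build its explanation. The toy corpus and labels below are made up for illustration:&lt;/p&gt;

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# tiny stand-in corpus; the real model is trained on the tweet splits above
texts = ["habari njema asante", "polisi watu mbaya", "walimu kazi shule",
         "asante sana habari", "mbaya sana polisi", "kazi walimu leo"]
labels = [2, 0, 1, 2, 0, 1]  # 0 = Negative, 1 = Neutral, 2 = Positive

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# LIME consumes these per-class probabilities, not hard 0/1 predictions
probs = model.predict_proba(["asante habari"])
print(probs.shape)  # (1, 3): one row, one column per class
```

&lt;p&gt;With a setup like this, LIME's &lt;code&gt;LimeTextExplainer.explain_instance&lt;/code&gt; is handed &lt;code&gt;model.predict_proba&lt;/code&gt; as its classifier function.&lt;/p&gt;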

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JQHuQhFc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/temp-lime-00.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JQHuQhFc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/temp-lime-00.png" alt="https://blog.neurotech.africa/content/images/2022/07/temp-lime-00.png" width="880" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here we go: the observation above shows that the probability of the positive class is higher (0.47) than the other classes, and the decision is driven by the words &lt;code&gt;serikali&lt;/code&gt;, &lt;code&gt;mwisho&lt;/code&gt;, and &lt;code&gt;vyema&lt;/code&gt;, which echoes our earlier visualization of the most frequent words in the positive class.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--a08D6MGg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/temp-lime-01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--a08D6MGg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/temp-lime-01.png" alt="https://blog.neurotech.africa/content/images/2022/07/temp-lime-01.png" width="880" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above observation shows that the probability of the neutral class is higher (0.72) than the other two classes, and the decision is driven by the words &lt;code&gt;walimu&lt;/code&gt;, &lt;code&gt;walikuwa&lt;/code&gt;, and &lt;code&gt;mwanzoni&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Dpx2p333--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/temp-lime02_.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Dpx2p333--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/temp-lime02_.png" alt="https://blog.neurotech.africa/content/images/2022/07/temp-lime02_.png" width="880" height="224"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above observation shows that all three classes carry comparable weight, but due to the high weighting of the word &lt;code&gt;polisi&lt;/code&gt; the tweet was predicted as negative.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How can companies benefit from customer sentiment analysis?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Sentiment analysis helps businesses understand their customers and get an overview of what's good and what's lacking. This can improve marketing and operations strategy based on customer sentiments.&lt;/p&gt;

&lt;p&gt;Deep insights from sentiment can capture what specifically people don’t like about a service, product, or policy; after the business has taken steps to fix the issue or improve a process, it can also track how that has improved customer satisfaction. Insights from customer sentiments can also differentiate between feedback that is merely frequent and feedback that actually influences satisfaction scores.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Final thoughts&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Understanding the cause of the decision of individual predictions from classifiers is important for data professionals. Having explanations lets you make an informed decision about how much you trust the prediction or the model as a whole, and provides insights that can be used to improve the model.&lt;/p&gt;

&lt;p&gt;The complete code used in this article can be found on the GitHub &lt;a href="https://github.com/Neurotech-HQ/Cause-of-decision-in-Swahili-sentiments"&gt;repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yjfnx6Ed--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wml0uak86qdh133yfty2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yjfnx6Ed--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wml0uak86qdh133yfty2.jpg" alt="Image description" width="390" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>How conversational AI is transforming the Insurance industry</title>
      <dc:creator>Anthony Mipawa</dc:creator>
      <pubDate>Tue, 26 Jul 2022 05:51:00 +0000</pubDate>
      <link>https://dev.to/neurotech_africa/how-conversational-ai-is-transforming-the-insurance-industry-22kd</link>
      <guid>https://dev.to/neurotech_africa/how-conversational-ai-is-transforming-the-insurance-industry-22kd</guid>
<description>&lt;p&gt;This article was originally published on the &lt;a href="https://blog.neurotech.africa/how-can-conversational-ai-impact-the-insurance-industry/" rel="noopener noreferrer"&gt;Neurotech Africa blog&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Technology continues to evolve every day from different angles. In this blog post I will explain how leveraging the power of conversational AI can make a difference in the insurance industry.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;About the Insurance sector:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The insurance sector is made up of companies that offer risk management in the form of insurance contracts. The basic concept of insurance is that one party, the insurer, will guarantee payment for an uncertain future event. Meanwhile, another party, the insured or the policyholder, pays a smaller premium to the insurer in exchange for that protection on that uncertain future occurrence.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;According to the 2020 Tanzania insurance report, “the Tanzania insurance sector is growing steadily, with 30 insurance companies and 112 insurance brokers currently active in the market (2014 TIRA data)”.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These statistics show that the contribution of insurance to the national gross domestic product remains very limited, leaving plenty of room for further growth.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Digital transformation in the Insurance industry&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Digital_transformation" rel="noopener noreferrer"&gt;Digital transformation&lt;/a&gt; varies across multiple industries, but the worth truth is that 70% of digital transformation fails in the sense that they don’t meet their objectives, this is based on studies from International Data Group. The fun fact is that a company or an industry can’t be fully digital transformed at once but better be staged. May begin with system operational to employees to be aware of what transformation is capable of and how their contribution can improve the whole process of adopting digital transformation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.neurotech.africa%2Fcontent%2Fimages%2F2022%2F07%2Fdigital-insurance.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.neurotech.africa%2Fcontent%2Fimages%2F2022%2F07%2Fdigital-insurance.svg" alt="https://blog.neurotech.africa/content/images/2022/07/digital-insurance.svg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Source: tibco.com&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The insurance industry is among the oldest financial businesses in the world. The industry tends to stay traditional and is slow to change; however, new technology trends have been impacting the insurance marketplace, creating extreme competition. The most telling experience was during Covid-19, when insurance companies found themselves in the middle of the storm: operations had to run remotely while insurers were fielding calls about changing coverage, answering questions about business interruption policies, and continuing to pay claims for life, health, and disability insurance.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The need for accelerating digital transformation in the insurance industry&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Digital transformation will help the insurance industry solve some of its challenges and improve its business strategies. Let me highlight some of the potential benefits of accelerating digital transformation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer experience: spend enough time getting to know the customer and figuring out what they want and how to respond to it. This part of the process should occur throughout the customer lifecycle, from prospecting until the moment of withdrawal.&lt;/li&gt;
&lt;li&gt;Value generation through data: data-driven decision-making is essential for insurance companies. Understanding how you want to use data to create value is important, and this understanding should run from executives to frontline employees. By doing that, it becomes possible to determine the various uses of data.&lt;/li&gt;
&lt;li&gt;Ecosystem development: redesigning insurance strategies involves tasks like measuring, controlling, and assessing risks, all of which are being transformed by the digital environment, and the leading insurance market players are aware of this. Understanding the ecosystem means knowing how strategies can be adapted to regions or branches depending on the scenario, rather than reusing them just because they perform well in the city or elsewhere.&lt;/li&gt;
&lt;li&gt;Margin management: the digital transformation of the insurance business can do one of two things: either reduce costs or increase them. Either way, it hinges on making the right decisions and then adopting new technologies to create business models based on those decisions.&lt;/li&gt;
&lt;li&gt;Multichannel strategy: using several channels means that your brand utilizes two or more marketing methods to share your content and messaging across several platforms. In simple terms, a multichannel strategy makes it easier for consumers to complete their transactions and interact with your brand through the most suitable platforms.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Conversational AI use-case in the insurance industry&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Manage internal operations: by automating and speeding up repetitive tasks, employees can focus on more complicated work and on developing their skills to improve operations.&lt;/li&gt;
&lt;li&gt;Customer awareness and education: conversational AI can bring customers closer by educating them on how the process works, its benefits, and available offers, and can compare and suggest the optimal policy, from multiple carriers, based on the customer’s profile and inputs. It also drives engagement and interaction with customers, whether through websites or social media platforms.&lt;/li&gt;
&lt;li&gt;Risk evaluation: leveraging conversational AI can improve how overwhelming volumes of data are handled to assess risks with high accuracy, yield better insights, customize plans, and support better decisions.&lt;/li&gt;
&lt;li&gt;Claims management: this involves claim processing and payment assistance. Conversational AI can be trained to address your customers’ insurance claims and follow up with them on existing ones, and it can also automate payment processes according to customer preferences.&lt;/li&gt;
&lt;li&gt;Customer feedback and reviews: most customers tend to share feedback immediately after service, and rarely afterwards. Most studies suggest that customers are more likely to respond over live chat than email, and that they feel more confident contacting a business through messages rather than calls.&lt;/li&gt;
&lt;li&gt;Fraud prevention: insurance firms must take care of customer data privacy and security. Conversational AI is efficient at monitoring and detecting warning signs of fraudulent activity and can alert both the insurer and the customer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;About Neurotech’s conversational AI solutions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.neurotech.africa/" rel="noopener noreferrer"&gt;Neurotech&lt;/a&gt; we are an AI company that builds &lt;a href="https://www.neurotech.africa/#services" rel="noopener noreferrer"&gt;solutions&lt;/a&gt; for businesses currently we do develop conversational AI for business needs which are controlled by our internal engine goes by the name &lt;a href="https://sarufi.io/#_" rel="noopener noreferrer"&gt;Sarufi&lt;/a&gt;. We offer custom solutions to fit various business needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is it useful?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our conversational AI solutions can provide seamless customer support across multiple platforms, enabling you to offer a more personalized, contextual service to customers; you can explore more &lt;a href="https://blog.neurotech.africa/how-can-neurotech-transform-your-business-with-conversational-ai/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Our solutions are developed to understand the contextual meaning of the interaction or conversation with targeted audiences, and our custom chatbots can be deployed on social media platforms like WhatsApp, Facebook, Instagram, and Telegram, depending on what our customers need.&lt;/p&gt;

&lt;p&gt;Currently, our solutions work in two languages only, Swahili and English. They can help your business with customer support, save on labor costs by paying a smaller support team fair wages rather than stretching to maintain a large staff, increase revenue, and build opportunities with every customer interaction 😊.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Final thoughts&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Is there a need to customize the customer experience in the insurance industry? Absolutely. Innovation in the insurance industry with conversational AI, transforming the entire cycle of processes such as claims, can help improve awareness and educate a large population at lower cost and effort. Conversational AI will ensure faster settlements and optimized customer experiences, leading to improved risk evaluation with technologies like machine learning and artificial intelligence in making appropriate decisions, ensuring personalized and customized customer service and experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.neurotech.africa%2Fcontent%2Fimages%2F2022%2F07%2Fthankyou-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.neurotech.africa%2Fcontent%2Fimages%2F2022%2F07%2Fthankyou-1.jpg" alt="https://blog.neurotech.africa/content/images/2022/07/thankyou-1.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How can neurotech Africa transform your business with Conversational AI</title>
      <dc:creator>Anthony Mipawa</dc:creator>
      <pubDate>Sun, 17 Jul 2022 20:09:33 +0000</pubDate>
      <link>https://dev.to/neurotech_africa/how-can-neurotech-africa-transform-your-business-with-conversational-ai-13p4</link>
      <guid>https://dev.to/neurotech_africa/how-can-neurotech-africa-transform-your-business-with-conversational-ai-13p4</guid>
      <description>&lt;p&gt;This article was originally published on the &lt;a href="https://blog.neurotech.africa/how-can-neurotech-transform-your-business-with-conversational-ai/"&gt;neurotech&lt;/a&gt; blog post&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How does Neurotech use Conversational AI?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We build custom conversational solutions to help businesses improve their customer experiences and services with our internal tool, which goes by the name &lt;a href="https://sarufi.io/"&gt;Sarufi&lt;/a&gt;. The best thing about our solution is that we use Natural Language Processing to provide a more conversational approach to customer service and a deeper understanding of the context of what people say, depending on the business's industry.&lt;/p&gt;

&lt;p&gt;Our approaches differ per use case, depending on the customer's specifications. With our conversational AI solutions, you can get incredibly intelligent control of your business's market without needing to invest the time, money, and resources to build the solutions with an internal team.&lt;/p&gt;

&lt;p&gt;Our solutions can be deployed across a range of platforms, starting with your website, if you have one, and social platforms like WhatsApp, Telegram, Instagram, and Facebook Messenger; this depends on where the client prefers to host their business. At &lt;a href="https://www.neurotech.africa/#"&gt;Neurotech&lt;/a&gt;, we offer full support for our solutions from our talented team to make sure that our clients' businesses benefit from what we offer. This helps ensure you’re getting the most value out of a conversational AI solution for your business.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How can Neurotech transform your business?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Our experts and Sarufi engine provide fast and easy deployment of solutions. With our solution, we transform everything into a custom experience that will help your business to save costs and increase revenue, understand what is missing from your product's service, and keep in touch with your customers.&lt;/p&gt;

&lt;p&gt;Through user interaction with your business, you will get a better sense of what works and what is not working, without extreme effort.&lt;/p&gt;

&lt;p&gt;This is a more comfortable transformation simply because the service is available 24/7 without paying any additional costs to employees, and customers are able to start a conversation in their natural language. This can be achieved through a couple of steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Our team of experts works with the client to determine the requirements and the most efficient way the conversational experience will be integrated into the business.&lt;/li&gt;
&lt;li&gt;Then, we build the solution, training models to act upon the inputs provided by consumers, with continuous review of the results.&lt;/li&gt;
&lt;li&gt;Finally, we deploy the solution and offer support and consulting services to our clients.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What are the benefits of conversational AI solutions?&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Personalize customer experience:-&lt;/strong&gt; Businesses can provide a more personalized experience to both existing customers and potential clients by using conversational AI (such as chatbots) to create a deeper level of interactivity and familiarity with the brand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improving marketing experience:-&lt;/strong&gt; Conversational AI helps improve marketing by creating a better experience for each customer based on their needs and desires. A more convenient mode of communication, combining various functionalities, makes it easy for customers to engage across multiple channels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-effective:-&lt;/strong&gt; Depending on their learnings and training techniques, they reduce the requirement of human resources to answer customer queries. They are also proficient in handling multiple chats simultaneously with accuracy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhance Operations beyond borders:-&lt;/strong&gt; Expand business outreach to the potential population.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-evolving platforms from experience:-&lt;/strong&gt; Conversational AI learns from experience. The more it interacts with human beings, the more quickly its intelligence improves. It also learns from any existing data, such as customer databases and previous customer interactions. Clever conversational interfaces learn from their mistakes just as human beings do: they take note of what questions customers ask and what kinds of responses seem informative, and they try new approaches until they find one that is both effective and efficient.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insights driven:-&lt;/strong&gt; Conversational solutions make effective use of analytics, which essentially helps in gleaning data and information from outside the organization. A mix of both internal and external data can be a great advantage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Round-the-clock support:-&lt;/strong&gt; Conversational AI can provide real-time customer assistance. This means that businesses can address customer queries and complaints as they occur, significantly improving customer satisfaction. Provide 24/7 client support, so existing and potential customers can try and solve their problems after work hours and on weekends.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast-paced communication:-&lt;/strong&gt; Conversational AI can help businesses provide quicker and more efficient customer service, because chatbots can handle a large number of customer inquiries simultaneously. They can also route customers to the right agent, which reduces wait times, and they work 24/7/365, a huge advantage for businesses. Properly programmed chatbots are always polite, and their behavior does not depend on mood.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Conversational AI solutions are not perceived as a replacement for humans but as human augmentation, making the business easier to reach both internally and externally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.neurotech.africa/#contact"&gt;Get in touch&lt;/a&gt; with Neurotech’s team to discover how you can benefit from our conversational solutions to boost your business, the time is now to leverage benefits from Artificial intelligence Technology.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6LodlsLa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/thankyou.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6LodlsLa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/07/thankyou.jpg" alt="https://blog.neurotech.africa/content/images/2022/07/thankyou.jpg" width="390" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Potentials of conversational AI for businesses</title>
      <dc:creator>Anthony Mipawa</dc:creator>
      <pubDate>Thu, 14 Jul 2022 20:15:30 +0000</pubDate>
      <link>https://dev.to/neurotech_africa/potentials-of-conversational-ai-for-businesses-5c79</link>
      <guid>https://dev.to/neurotech_africa/potentials-of-conversational-ai-for-businesses-5c79</guid>
      <description>&lt;p&gt;This article was originally published on the &lt;a href="https://blog.neurotech.africa/potentials-about-conversational-ai-for-businesses/"&gt;Neurotech&lt;/a&gt; blog post.&lt;/p&gt;

&lt;p&gt;When speaking about the evolution of technology, you can't skip artificial intelligence, simply because we interact with it in our day-to-day activities, often without even knowing it. If you own a smartphone, laptop, smartwatch, or desktop, you already interact with artificial intelligence or use it to accomplish tasks: Google Search, your camera, meeting platforms like Zoom, Google Sheets, Microsoft Cortana, Apple Siri, Google Assistant, Google Maps, Apple Maps, Google Lens, social media feeds, and so on. The scope of artificial intelligence has expanded and evolved over time, so it is time to think about how you can leverage this technology to improve your business's revenue. In this article I will highlight the potential of conversational artificial intelligence for businesses.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;About conversational AI&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Conversational AI involves three concepts: artificial intelligence, human language, and automation. We can define it as the type of artificial intelligence that enables consumers to interact with computer applications the way they would with other humans. Conversational AI has primarily taken the form of advanced chatbots that, in contrast with conventional chatbots, combine natural language processing with traditional software, voice assistants, or an interactive voice recognition system to help customers through a spoken or typed conversation interface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--h6tfXXjE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.appmaster.io/api/_files/ooRtJGmcZqEaSfTL468d8U/download/" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--h6tfXXjE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.appmaster.io/api/_files/ooRtJGmcZqEaSfTL468d8U/download/" alt="conversational chatbot" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How does conversational AI work?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Conversational AI involves three main components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Natural language processing&lt;/li&gt;
&lt;li&gt;Algorithm Training and Machine Learning&lt;/li&gt;
&lt;li&gt;Sentiment Analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Through the conversational interface, a user provides input either by voice or by text. Text-based input requires &lt;a href="https://en.wikipedia.org/wiki/Natural-language_understanding"&gt;NLU&lt;/a&gt; to understand the contextual meaning of the input, while speech-based input requires &lt;a href="https://usabilitygeek.com/automatic-speech-recognition-asr-software-an-introduction/"&gt;ASR&lt;/a&gt; to parse the audio into language tokens that can be analyzed. The system then returns the best response to the user, depending on how it has been trained and programmed to perform its tasks.&lt;/p&gt;
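&lt;p&gt;The flow above can be sketched as a toy, keyword-based pipeline. This is purely illustrative (the intents, keywords, and responses here are invented for the example); a real conversational AI relies on trained NLU models rather than keyword rules:&lt;/p&gt;

```python
# Toy conversational pipeline: user input -> crude "understanding" -> response.
# Illustrative only; production NLU uses trained language models.

INTENTS = {
    "balance": ["balance", "account"],
    "hours": ["open", "hours", "close"],
}

RESPONSES = {
    "balance": "Your balance is available in the app.",
    "hours": "We are open 9am-5pm, Monday to Friday.",
    "fallback": "Sorry, I did not understand. Could you rephrase?",
}

def understand(text):
    """Stand-in for NLU: map keywords in the text to an intent."""
    tokens = [t.strip("?!.,") for t in text.lower().split()]
    for intent, keywords in INTENTS.items():
        if any(k in tokens for k in keywords):
            return intent
    return "fallback"

def respond(text):
    return RESPONSES[understand(text)]

print(respond("What is my account balance?"))
```

&lt;p&gt;A speech front end would simply add an ASR step that turns audio into the text fed to the same pipeline.&lt;/p&gt;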

&lt;h3&gt;
  
  
  &lt;strong&gt;Use cases of conversational AI:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Customer service:&lt;/strong&gt; conversational AI has made a strong impact in this industry by automating customer support activities to improve access and reduce costs: travel booking, FAQs, helping customers pay bills, and handling complaints. Conversational AI is also well suited to running surveys with your customers, to understand how they feel about what you provide or about a new product.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retail industry:&lt;/strong&gt; from lead generation, lead qualification, and lead nurturing to 24/7 concierge service, faster order fulfillment, and amplified marketing messages, much of this can be done with conversational AI. In retail, things can go further with product recommendations and multichannel integrations that follow your customers to the platforms they love to use, like WhatsApp, Facebook, Instagram, and TikTok. Above all, it means being able to serve your customers any time they want service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finance and Banking industry:&lt;/strong&gt; Conversational AI has greatly helped banking and financial services reduce operating costs, automate functions, and improve the overall customer experience. It can access and analyze users’ spending patterns or bank accounts to help them decide how to spend their money, resolve customer queries by automating repetitive processes that typically take a human agent much longer, and, through an AI bot, help with checking balances, detecting fraudulent transactions, and more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health industry:&lt;/strong&gt; Conversational AI is being used across the health industry to automate the scheduling of hospital appointments, helping patients manage their appointments and paperwork. In Cognitive Behavioral Therapy, conversational AI creates an immersive way to manage anxiety and other mental health issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sales and Marketing industry:&lt;/strong&gt; most consumers prefer self-service technology for shopping experiences over human sales agents. Conversational AI generates and nurtures leads, optimizes the sales cycle, and gets and updates data instantly while maintaining accuracy through conversational automation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are more use cases of conversational AI than the few I just mentioned; you can explore more of them &lt;a href="https://www.chatcompose.com/conversationalai.html"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What are the impacts of conversational AI on your business?&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Customer retention&lt;/li&gt;
&lt;li&gt;Customer personalization&lt;/li&gt;
&lt;li&gt;Get customer feedback in a seamless manner&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this helps boost revenue and reduce costs through more accurate and timely marketing efforts, while ensuring a seamless and pleasant experience for your customers. It is not enough to have chatbots on your website as a customer support solution; businesses need intelligent chatbots with natural language processing and understanding for the best customer support experience.&lt;/p&gt;

&lt;h3&gt;
  
  
&lt;strong&gt;Why Neurotech’s conversational AI solutions are best for your business&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.neurotech.africa/"&gt;Neurotech&lt;/a&gt; is an AI company that builds &lt;a href="https://www.neurotech.africa/#services"&gt;solutions&lt;/a&gt; for businesses. We currently develop conversational AI for business needs, powered by our internal engine, which goes by the name &lt;a href="https://sarufi.io/#_"&gt;Sarufi&lt;/a&gt;. We offer custom solutions to fit various business needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is it useful?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our conversational AI solutions are built to understand the contextual meaning of a conversation with the targeted audience. Our custom chatbots can be deployed on social media platforms like WhatsApp, Facebook, Instagram, and Telegram, depending on what our customers need. Currently, our solutions work in two languages, Swahili and English. They can help your business with customer support, increase revenue, and build opportunities with every customer interaction 😊.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Final Thoughts&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now think of your business: it is not too late to get closer to your customers using conversational AI, by automating FAQs and the repetitive tasks your staff has to chase. The truth is that conversational AI continues to evolve, making itself indispensable to industries such as finance, online marketing, healthcare, real estate, customer support, retail, and more. Don't worry, we have &lt;a href="https://sarufi.io/#_"&gt;Sarufi&lt;/a&gt; for your business needs. If you are interested in a discussion with &lt;a href="https://www.neurotech.africa/#contact"&gt;Neurotech&lt;/a&gt;, don't hesitate to reach out; we consult on what would be best for your business challenges.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n8WK959P--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://t3.ftcdn.net/jpg/04/48/13/40/240_F_448134055_3ygLHIrGKhm176wZnoRvDaY1iqljzVdZ.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n8WK959P--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://t3.ftcdn.net/jpg/04/48/13/40/240_F_448134055_3ygLHIrGKhm176wZnoRvDaY1iqljzVdZ.jpg" alt="thank you" width="390" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nlp</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>GET STARTED WITH TOPIC MODELLING USING GENSIM IN NLP</title>
      <dc:creator>Anthony Mipawa</dc:creator>
      <pubDate>Wed, 25 May 2022 03:49:49 +0000</pubDate>
      <link>https://dev.to/neurotech_africa/get-started-with-topic-modelling-using-gensim-in-nlp-1b4g</link>
      <guid>https://dev.to/neurotech_africa/get-started-with-topic-modelling-using-gensim-in-nlp-1b4g</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;INTRODUCTION&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;As one application of NLP, &lt;strong&gt;topic modeling&lt;/strong&gt; is used in many business areas to easily scan a series of documents, find groups of words (topics) within them, and automatically &lt;strong&gt;cluster&lt;/strong&gt; word groupings, saving time and reducing costs.&lt;/p&gt;

&lt;p&gt;In this article, you're going to learn how to implement topic modeling with &lt;strong&gt;Gensim&lt;/strong&gt;. I hope you will enjoy it; let's get started.&lt;/p&gt;

&lt;p&gt;Have you ever wondered how hard it is to process 100,000 documents that contain 1,000 words each? That means 100,000 * 1,000 = 100,000,000 word occurrences to work through across all documents. This would be hard, time-consuming, and memory-consuming if done manually. That's where &lt;strong&gt;topic modeling&lt;/strong&gt; comes into play, as it allows you to achieve all of that programmatically, and that's what you're going to learn in this article.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;WHAT IS TOPIC MODELLING?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Topic Modelling&lt;/strong&gt; can be defined as an unsupervised statistical classification method that applies techniques such as the Latent Dirichlet Allocation (LDA) topic model to discover the topics present in a set of documents and to recognize the words that make up those topics. This saves time and provides an efficient way to understand documents based on their topics.&lt;/p&gt;

&lt;p&gt;Topic modeling has many &lt;strong&gt;applications&lt;/strong&gt;, ranging from sentiment analysis to recommendation systems. Consider the diagram below for other applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lpGgpa3H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/topic_modelling.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lpGgpa3H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/topic_modelling.png" alt="https://blog.neurotech.africa/content/images/2022/02/topic_modelling.png" width="773" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;applications of topic modeling -&lt;a href="https://medium.com/@fatmafatma/industrial-applications-of-topic-model-100e48a15ce4"&gt;source&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now that you have a clear understanding of what topic modeling means, let's see how to achieve it with Gensim. But wait, someone asked: what is &lt;strong&gt;Gensim&lt;/strong&gt;?&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;WHAT IS GENSIM?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Well, Gensim is short for &lt;strong&gt;generate similar&lt;/strong&gt;, that is, &lt;strong&gt;Gen&lt;/strong&gt; from &lt;em&gt;generate&lt;/em&gt; and &lt;strong&gt;sim&lt;/strong&gt; from &lt;em&gt;similar&lt;/em&gt;. It is an open-source, fully specialized Python library written by &lt;strong&gt;Radim Rehurek&lt;/strong&gt; to represent document vectors as efficiently (computer-wise) and painlessly (human-wise) as possible.&lt;/p&gt;

&lt;p&gt;Gensim is designed for topic modeling tasks, extracting semantic topics from documents. Gensim is your tool when you want to process large chunks of textual data; internally it uses algorithms like &lt;em&gt;Word2Vec&lt;/em&gt;, &lt;em&gt;FastText&lt;/em&gt;, &lt;em&gt;Latent Semantic Indexing&lt;/em&gt; (LSI, LSA, LsiModel), and &lt;strong&gt;Latent Dirichlet Allocation&lt;/strong&gt; (LDA, LdaModel).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JX3sIa4w--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/gensim_history-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JX3sIa4w--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/gensim_history-1.png" alt="https://blog.neurotech.africa/content/images/2022/02/gensim_history-1.png" width="785" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gensim history - source &lt;a href="https://radimrehurek.com/"&gt;Radim Rehurek&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;WHY GENSIM?&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;It has efficient implementations of various vector space algorithms, as mentioned above.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It also provides similarity queries for documents in their semantic representation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It provides I/O wrappers and converters around several popular data formats.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gensim is very fast, thanks to the design of its data access and its implementation of numerical processing.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;HOW TO USE GENSIM FOR TOPIC MODELLING IN NLP.&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We have come to the meat of our article, so grab a cup of coffee and a fun playlist, with a Jupyter Notebook open and ready for hands-on work. Let's start.&lt;/p&gt;

&lt;p&gt;In this section, we'll see a practical implementation of &lt;strong&gt;Gensim&lt;/strong&gt; for topic modelling using the &lt;strong&gt;Latent Dirichlet Allocation&lt;/strong&gt; (LDA) topic model.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Installation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;First we have to install &lt;a href="https://radimrehurek.com/gensim/"&gt;the gensim library&lt;/a&gt; in a Jupyter notebook to be able to use it in our project; consider the code below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;! pip install --upgrade gensim
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Loading the datasets and importing important libraries&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We are going to use an open-source dataset containing millions of news headlines sourced from the reputable Australian news agency &lt;a href="http://www.abc.net.au/"&gt;ABC&lt;/a&gt; (Australian Broadcasting Corporation).&lt;/p&gt;

&lt;p&gt;The dataset contains two columns, publish_date and headline_text, with millions of headlines.&lt;/p&gt;

&lt;p&gt;Consider the below code for importing the required libraries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#importing library
import pandas as pd #loading dataframe
import numpy as np  #for mathematical calculations

import matplotlib.pyplot as plt #visualization
import seaborn as sns #visualization
import zipfile #for extracting the zip file datasets

import gensim #library for topic modelling
from gensim.models import LdaMulticore
from gensim.utils import simple_preprocess
from gensim.parsing.preprocessing import STOPWORDS

import nltk   #natural language toolkit for preprocessing the text data

from nltk.stem import WordNetLemmatizer   #lemmatizes using WordNet's built-in morphy function; returns the input word unchanged if it cannot be found in WordNet

from nltk.stem import SnowballStemmer #used for stemming in NLP
from nltk.stem.porter import * #porter stemming

from wordcloud import WordCloud #visualization techniques for #frequently repeated texts

nltk.download('wordnet')  #database of words in more than 200 #languages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IvcCub_b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/capture1-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IvcCub_b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/capture1-1.png" alt="https://blog.neurotech.africa/content/images/2022/02/capture1-1.png" width="631" height="89"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we have managed to install &lt;strong&gt;Gensim&lt;/strong&gt; and import the supporting libraries into our working environment. Consider the code below for installing the other libraries, if they are not yet installed in your Jupyter notebook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;! pip install nltk       #installing nltk library
! pip install wordcloud  #installing wordcloud library
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After successfully importing the above libraries, let's now extract the zipped dataset into a folder named data_for_Topic_modelling, as shown in the code below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Extracting the Datasets
with zipfile.ZipFile("./abcnews-date-text.csv.zip") as file_zip:
    file_zip.extractall("./data_for_Topic_modelling")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nice, we have successfully unzipped the data using the zipfile library that we imported above, remember? Now let's load the data into a variable called &lt;em&gt;data&lt;/em&gt;. Since the dataset has millions of headlines, for this tutorial we are going to take 500,000 rows of the ABC headline news using Python slicing.&lt;/p&gt;

&lt;p&gt;consider the code below for doing that;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#loading the data
#Here we have taken 500,000 rows of our dataset for implementation

data=pd.read_csv("./data_for_Topic_modelling/abcnews-date-text.csv")
data=data[:500000] #500000 rows taken
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;EDA and processing the data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Nice. Now that the data is in our variable named data, as shown above, we have to check what it looks like. EDA means exploratory data analysis, and we will also do some processing to make sure the dataset is ready for the algorithm to be trained on.&lt;/p&gt;

&lt;p&gt;In the code below, we use the &lt;em&gt;.head()&lt;/em&gt; function, which prints the first five rows of the dataset. This helps us see the structure of the data and confirms that it consists of texts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Checking the first columns
data.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RXIYaOeS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/capture2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RXIYaOeS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/capture2.png" alt="https://blog.neurotech.africa/content/images/2022/02/capture2.png" width="582" height="180"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here we check the &lt;em&gt;shape&lt;/em&gt; of the dataset and confirm that we have the number of rows we selected when loading the data, so we are ready to go.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#checking the shape
#as you see there are 500000 the headline news as the rows we selected above.

data.shape
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8pfvOxBX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/capture3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8pfvOxBX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/capture3.png" alt="https://blog.neurotech.africa/content/images/2022/02/capture3.png" width="224" height="56"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, we have to delete the &lt;strong&gt;publish_date&lt;/strong&gt; column from the dataset using the keyword &lt;strong&gt;del&lt;/strong&gt;, as shown in the code below. &lt;strong&gt;Why?&lt;/strong&gt; Because we don't need it: our main focus is to model topics from the many headlines in the document, so we keep only the headline_text column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Deleting the publish data column since we want only headline_text #columns.

del data['publish_date']

#confirm deletion
data.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--k0SC3czz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/capture5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k0SC3czz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/capture5.png" alt="https://blog.neurotech.africa/content/images/2022/02/capture5.png" width="442" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we are left with our important column, headline_text, as seen above. Here we use a &lt;strong&gt;word cloud&lt;/strong&gt; to look at the most frequently appearing words in the headline_text column, which builds more understanding of the dataset. Consider the code below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#word cloud visualization for the headline_text
wc = WordCloud(
    background_color='black',
    max_words = 100,
    random_state = 42,
    max_font_size=110
    )
wc.generate(' '.join(data['headline_text']))
plt.figure(figsize=(50,7))
plt.imshow(wc)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KaZqOS0c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/c6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KaZqOS0c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/c6.png" alt="https://blog.neurotech.africa/content/images/2022/02/c6.png" width="761" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After visualizing the data, we process it, starting with &lt;strong&gt;stemming&lt;/strong&gt;, which is simply the process of reducing a word to its word &lt;strong&gt;stem&lt;/strong&gt; by removing affixes (suffixes and prefixes), or to the root form of the word known as the &lt;strong&gt;lemma&lt;/strong&gt; (for example, &lt;strong&gt;cared&lt;/strong&gt; to &lt;strong&gt;care&lt;/strong&gt;). Here we are using the &lt;em&gt;SnowballStemmer&lt;/em&gt; algorithm that we imported from &lt;strong&gt;&lt;a href="https://www.nltk.org/"&gt;nltk&lt;/a&gt;&lt;/strong&gt;, remember?&lt;/p&gt;

&lt;p&gt;Consider the function below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#function to perform the pre processing steps on the  dataset
#stemming

stemmer = SnowballStemmer("english")
def lemmatize_stemming(text):
    return stemmer.stem(WordNetLemmatizer().lemmatize(text, pos='v'))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we continue with &lt;strong&gt;tokenizing&lt;/strong&gt; and &lt;strong&gt;lemmatizing:&lt;/strong&gt; we split the large texts in headline_text into lists of smaller words (tokenization), and append each word lemmatized by the &lt;strong&gt;lemmatize_stemming&lt;/strong&gt; function above to the result list, as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Tokenize and lemmatize

def preprocess(text):
    result=[]
    for token in gensim.utils.simple_preprocess(text) :
        if token not in gensim.parsing.preprocessing.STOPWORDS and len(token) &amp;gt; 3:
            #Apply lemmatize_stemming on the token, then add to the results list
            result.append(lemmatize_stemming(token))
    return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the above steps, we simply apply the &lt;strong&gt;preprocess()&lt;/strong&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#calling the preprocess function above
processed_docs = data['headline_text'].map(preprocess)
processed_docs[:10]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uX3zMw4d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/image.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uX3zMw4d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/image.png" alt="https://blog.neurotech.africa/content/images/2022/02/image.png" width="507" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, create a dictionary from 'processed_docs' using gensim.corpora, which records the number of times each word appears in the training set, and name it dictionary; consider the code below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; dictionary = gensim.corpora.Dictionary(processed_docs)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, with the dictionary from the code above, we implement the &lt;strong&gt;bag-of-words model&lt;/strong&gt; (BoW). &lt;strong&gt;BoW&lt;/strong&gt; is simply a representation of text that records the occurrence of words within the specified documents; it keeps only the word counts and discards everything else, such as the order or structure of the document. We will also pick a sample document, called document_num, and assign it the value 4310.&lt;/p&gt;

&lt;p&gt;Note: you can pick any sample document of your own.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Create the Bag-of-words(BoW) model for each document
document_num = 4310
bow_corpus = [dictionary.doc2bow(doc) for doc in processed_docs]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Checking the bag-of-words corpus for our sample document, which is a list of (token_id, token_count) pairs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bow_corpus[document_num]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yjH73YRP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/image-2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yjH73YRP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/image-2.png" alt="https://blog.neurotech.africa/content/images/2022/02/image-2.png" width="662" height="39"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Modeling using LDA (Latent Dirichlet Allocation) from bags of words above&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We have come to the final part: using LDA, specifically Gensim's LdaMulticore, which trains faster by spreading the work across multiple worker processes, to create our first topic model and save it&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Modelling part
lda_model = gensim.models.LdaMulticore(bow_corpus,
                                       num_topics=10,
                                       id2word = dictionary,
                                       passes = 2,
                                       workers=2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each topic, we will explore the words occurring in that topic and their relative weight&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Here it should give you a ten topics as example shown below image
for idx, topic in lda_model.print_topics(-1):
    print("Topic: {} \nWords: {}".format(idx, topic))
    print("\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--S8rq5VhD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/image-6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--S8rq5VhD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/image-6.png" alt="https://blog.neurotech.africa/content/images/2022/02/image-6.png" width="743" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's finish with a performance check: using the LDA bag-of-words model, we find which topics the test document we selected earlier belongs to. Consider the code below&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Our test document is document number 4310
for index, score in sorted(lda_model[bow_corpus[document_num]], key=lambda tup: -1*tup[1]):
    print("\nScore: {}\t \nTopic: {}".format(score, lda_model.print_topic(index, 10)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--06K8aiSZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/image-5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--06K8aiSZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.neurotech.africa/content/images/2022/02/image-5.png" alt="https://blog.neurotech.africa/content/images/2022/02/image-5.png" width="880" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Congratulations if you have made it to the end of this article! As shown above, we have implemented a working topic model with LDA from the Gensim library, using bags of words to model the topics present in a dataset of 500,000 news headlines. The full code and dataset can be found &lt;strong&gt;&lt;a href="https://github.com/sarufi-io/Topic-Modelling-With-Gensim"&gt;here&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Relationship Between Neurotech and Natural Language Processing(NLP)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Natural Language Processing is a powerful tool for solving business challenges and supporting the digital transformation of companies and startups. &lt;a href="https://sarufi.io/"&gt;Sarufi&lt;/a&gt; and &lt;a href="https://www.neurotech.africa/#services"&gt;Neurotech&lt;/a&gt; offer high-standard conversational AI (chatbot) solutions. Improve your business experience today with NLP &lt;a href="https://sarufi.io/solutions"&gt;solutions&lt;/a&gt; built by experienced technical experts.&lt;/p&gt;

&lt;p&gt;Hope you found this article useful; sharing is caring.&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>gensim</category>
      <category>topicmodeling</category>
      <category>getstarted</category>
    </item>
  </channel>
</rss>
