<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Purity-Nyagweth</title>
    <description>The latest articles on DEV Community by Purity-Nyagweth (@puritye).</description>
    <link>https://dev.to/puritye</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F803716%2Fabd99e45-7da8-40a9-a166-e8b607bca2fc.jpeg</url>
      <title>DEV Community: Purity-Nyagweth</title>
      <link>https://dev.to/puritye</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/puritye"/>
    <language>en</language>
    <item>
      <title>Stemming vs Lemmatization - What is the difference?</title>
      <dc:creator>Purity-Nyagweth</dc:creator>
      <pubDate>Thu, 10 Nov 2022 07:50:49 +0000</pubDate>
      <link>https://dev.to/puritye/stemming-vs-lemmatization-what-is-the-difference-213j</link>
      <guid>https://dev.to/puritye/stemming-vs-lemmatization-what-is-the-difference-213j</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Stemming and lemmatization are techniques used in text processing. In Natural Language Processing (NLP), text processing is needed to normalize the text. The aim of text normalization is to reduce the amount of information that a machine has to handle, thus improving the efficiency of the machine learning process. &lt;br&gt;
Both stemming and lemmatization involve reducing the inflected forms of words to their root forms. Inflected forms of words are words that are derived from the root or base form of a word. For example, the words jumped, jumping and jumps are inflected forms of the root word jump. Likewise, creating, created and creates are inflected forms of the root word create, and so on.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Basic knowledge of python programming&lt;/li&gt;
&lt;li&gt;Python installed&lt;/li&gt;
&lt;li&gt;Natural Language Toolkit(nltk) package installed&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  What is the difference between stemming and lemmatization?
&lt;/h2&gt;

&lt;p&gt;The main difference between stemming and lemmatization is that stemming chops the suffixes off a word to reduce it to its root form, while lemmatization first takes the context of the word into consideration and uses that context to convert the word to its meaningful base form, known as the lemma.&lt;/p&gt;

&lt;p&gt;Below are examples of words that stemming and lemmatization have been performed on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stemming Examples&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Word&lt;/strong&gt;      ---     &lt;strong&gt;Porter Stemmer&lt;/strong&gt;      &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;jumped          ---       jump
&lt;/li&gt;
&lt;li&gt;friends         ---      friend
&lt;/li&gt;
&lt;li&gt;football       ---       footbal
&lt;/li&gt;
&lt;li&gt;mysteries       ---      mysteri
&lt;/li&gt;
&lt;li&gt;created        ---       creat
&lt;/li&gt;
&lt;li&gt;took            ---      took &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lemmatization Examples&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Word&lt;/strong&gt;        ---        &lt;strong&gt;Lemmatized word&lt;/strong&gt;          &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;jumped          ---   jump
&lt;/li&gt;
&lt;li&gt;friends         ---   friend
&lt;/li&gt;
&lt;li&gt;football     ---     football
&lt;/li&gt;
&lt;li&gt;mysteries    ---     mystery
&lt;/li&gt;
&lt;li&gt;created      ---      create
&lt;/li&gt;
&lt;li&gt;took         ---       take&lt;/li&gt;
&lt;/ul&gt;
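
&lt;p&gt;The contrast above can be sketched in a few lines of plain Python. This is a toy illustration only (not the real Porter stemmer or a real lemmatizer): the "stemmer" blindly chops a known suffix with no notion of context, while the "lemmatizer" looks words up in a small hand-made vocabulary, standing in for the dictionary and context a real lemmatizer uses.&lt;/p&gt;

```python
# Toy illustration only -- not the actual Porter stemmer or WordNet lemmatizer.
SUFFIXES = ("ies", "ed", "ing", "s")
LEMMAS = {"jumped": "jump", "took": "take", "mysteries": "mystery"}

def toy_stem(word):
    # Chop the first matching suffix, with no regard for meaning.
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word):
    # Look the word up in a vocabulary; fall back to the word itself.
    return LEMMAS.get(word, word)

print(toy_stem("jumped"), toy_lemmatize("jumped"))  # jump jump
print(toy_stem("took"), toy_lemmatize("took"))      # took take
```

&lt;p&gt;Notice that the suffix-chopper cannot do anything with the irregular form "took", while the vocabulary lookup maps it to "take"; this is exactly the gap that real lemmatization fills.&lt;/p&gt;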
&lt;h2&gt;
  
  
  How to carry out stemming
&lt;/h2&gt;

&lt;p&gt;The Natural Language Toolkit (nltk) package provides two well-known stemmers for the English language: PorterStemmer and LancasterStemmer. &lt;br&gt;
We are going to use PorterStemmer to carry out stemming.&lt;/p&gt;

&lt;p&gt;First let's import PorterStemmer&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from nltk.stem import PorterStemmer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's now create a list of words that we want to stem&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;word_list = ["jumped", "friendship", "friends", "swimming","creation","stability","writing",
             "realize","mystery","football", "mysteries", "created", "took"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will now stem every word in the list and then print the word with its stemmed version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stemmer = PorterStemmer()

for word in word_list:
    print((word,stemmer.stem(word)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Output&lt;/em&gt;&lt;br&gt;
('jumped', 'jump')&lt;br&gt;
('friendship', 'friendship')&lt;br&gt;
('friends', 'friend')&lt;br&gt;
('swimming', 'swim')&lt;br&gt;
('creation', 'creation')&lt;br&gt;
('stability', 'stabil')&lt;br&gt;
('writing', 'write')&lt;br&gt;
('realize', 'realiz')&lt;br&gt;
('mystery', 'mysteri')&lt;br&gt;
('football', 'footbal')&lt;br&gt;
('mysteries', 'mysteri')&lt;br&gt;
('created', 'creat')&lt;br&gt;
('took', 'took')&lt;/p&gt;
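
&lt;p&gt;PorterStemmer is not the only option: as mentioned above, nltk also provides LancasterStemmer, which applies a harsher set of suffix rules and generally produces shorter stems. Here is a quick side-by-side on a few words from the list (exact stems may vary slightly between nltk versions):&lt;/p&gt;

```python
from nltk.stem import PorterStemmer, LancasterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

# Compare the two stemmers on a few of the words from the list above.
for word in ["stability", "friendship", "created", "mysteries"]:
    print(word, "->", porter.stem(word), "|", lancaster.stem(word))
```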
&lt;h2&gt;
  
  
  How to carry out lemmatization
&lt;/h2&gt;

&lt;p&gt;As mentioned earlier, lemmatization, just like stemming, reduces a word to its root form. For lemmatization, however, we first need to tag the words with their part-of-speech (POS) tags before carrying out the lemmatization. For example, every word that is a verb will be given the verb (v) tag, words that are nouns will be given the noun (n) tag, and so on.&lt;/p&gt;

&lt;p&gt;Let's first import the libraries that we will be using&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
import nltk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As a start, let's create a function for tagging the words. We will use nltk's pos_tag for the tagging, and later convert its tags to WordNet tags. (If the tagger data is missing, run nltk.download('averaged_perceptron_tagger') first.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def tag(doc):
    #POS tagging
    tagged_tokens = nltk.pos_tag(doc)
    return tagged_tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, let's create a function for converting the parts of speech(pos) tags.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# function for converting tags
def pos_tag_wordnet(tagged_tokens):
    tag_map = {'j': wordnet.ADJ, 'v': wordnet.VERB, 'n': wordnet.NOUN, 'r': wordnet.ADV}
    new_tagged_tokens = [(word, tag_map.get(tag[0].lower(), wordnet.NOUN))
                            for word, tag in tagged_tokens]
    return new_tagged_tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's now tag the words in the word list from before, then convert the tags and print the output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# tag the words
tagged_tokens = tag(word_list)
# convert the tags
wordnet_tokens = pos_tag_wordnet(tagged_tokens)
print(wordnet_tokens)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Output&lt;/em&gt;&lt;br&gt;
[('jumped', 'v'), ('friendship', 'n'), ('friends', 'n'), ('swimming', 'v'), ('creation', 'n'), ('stability', 'n'), ('writing', 'v'), ('realize', 'v'), ('mystery', 'n'), ('football', 'n'), ('mysteries', 'n'), ('created', 'v'), ('took', 'v')]&lt;br&gt;
From the output, we can see we've got verbs(v) and nouns(n).&lt;/p&gt;

&lt;p&gt;Let's now lemmatize the tagged words.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wnl = WordNetLemmatizer()

for word, tag in wordnet_tokens:
    print((word, wnl.lemmatize(word, tag)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Output&lt;/em&gt;&lt;br&gt;
('jumped', 'jump')&lt;br&gt;
('friendship', 'friendship')&lt;br&gt;
('friends', 'friend')&lt;br&gt;
('swimming', 'swim')&lt;br&gt;
('creation', 'creation')&lt;br&gt;
('stability', 'stability')&lt;br&gt;
('writing', 'write')&lt;br&gt;
('realize', 'realize')&lt;br&gt;
('mystery', 'mystery')&lt;br&gt;
('football', 'football')&lt;br&gt;
('mysteries', 'mystery')&lt;br&gt;
('created', 'create')&lt;br&gt;
('took', 'take')&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we've learned about stemming and lemmatization, what they are and their differences. Both stemming and lemmatization are good techniques for text processing and they each have pros and cons. &lt;/p&gt;

&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/dphi-official/nlp_essentials/blob/master/notebooks/01_Text_Wrangling_Examples.ipynb"&gt;dphi NLP-Essentials(Text wrangling)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.datacamp.com/tutorial/stemming-lemmatization-python"&gt;Datacamp(stemming-lemmatization-python)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html"&gt;nlp.stanford.edu(stemming-and-lemmatization)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Audio Transcription with Python</title>
      <dc:creator>Purity-Nyagweth</dc:creator>
      <pubDate>Wed, 31 Aug 2022 18:20:33 +0000</pubDate>
      <link>https://dev.to/puritye/audio-transcription-with-python-3jod</link>
      <guid>https://dev.to/puritye/audio-transcription-with-python-3jod</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Audio transcription is the process of converting speech in an audio or video file into text. Having a transcription for a video or an audio recording has benefits. Below are some of the benefits of audio transcription:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expanding the target audience. When a transcript gets translated into several languages, it opens up the content to a wider audience.&lt;/li&gt;
&lt;li&gt;Making the content more accessible. With a transcript, the content of an audio can be readily and accurately accessed, more so in cases where the audio quality has been compromised due to background distractions, low volume, regional accents and so on.&lt;/li&gt;
&lt;li&gt;Boosting the SEO. With transcription, the keywords used in the audio will now be in written form hence they can be recognized by search engines.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article we are going to learn how to transcribe audio using Python. &lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisite
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Basic knowledge of python programming&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Assembly AI account&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting API token
&lt;/h2&gt;

&lt;p&gt;The first thing we will do is to get an API token from &lt;a href="https://www.assemblyai.com/"&gt;Assembly AI&lt;/a&gt;. &lt;br&gt;
Let's go to &lt;a href="https://www.assemblyai.com/"&gt;Assembly AI&lt;/a&gt; and create a free account.&lt;br&gt;
Once we have an account, we will sign in and then copy the API Key. &lt;br&gt;
The API Key is located at the right of the home page.&lt;/p&gt;
&lt;h2&gt;
  
  
  Creating config file for storing the key
&lt;/h2&gt;

&lt;p&gt;Now that we have an API Key, let's create a config file for storing the key.&lt;br&gt;
We will create a python file and name it 'api_key.py' (you can give it any name). Then create a variable and assign the API Key to the variable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API_KEY = 'API Key from Assembly AI'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After creating the config file, we will now create a main file (main.py) where we will write the code for transcribing the audio.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; 'api_key.py' and 'main.py' should be in the same directory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Importing requests and API Key
&lt;/h2&gt;

&lt;p&gt;The first thing that we will do in the 'main.py' is to import requests and the API Key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
from api_key import API_KEY 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Uploading Audio to Assembly AI
&lt;/h2&gt;

&lt;p&gt;Next, let's create a variable 'filename', then get the path of the audio that we want to transcribe and assign this path to 'filename'.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;filename = 'audio path'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's now create another variable 'upload_endpoint'.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;upload_endpoint = 'https://api.assemblyai.com/v2/upload'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's also create a variable 'headers' which will be used for authentication. We will use the API Key for authentication.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;headers = {'authorization': API_KEY}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, let's create a function for reading the audio file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def read_file(filename, chunk_size=5242880):
    with open(filename, 'rb') as _file:
        while True:
            data = _file.read(chunk_size)
            if not data:
                break
            yield data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
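
&lt;p&gt;The 5242880-byte chunk size is 5 MB, so large audio files are streamed to the API in 5 MB pieces instead of being loaded into memory at once. The chunking behaviour is easy to check on a tiny file with a tiny chunk size (the file contents and the 4-byte chunks below are just for illustration):&lt;/p&gt;

```python
import os
import tempfile

def read_file(filename, chunk_size=5242880):
    # Yield the file contents chunk by chunk (5242880 bytes = 5 MB by default).
    with open(filename, 'rb') as _file:
        while True:
            data = _file.read(chunk_size)
            if not data:
                break
            yield data

# Illustration: a 10-byte file read in 4-byte chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
chunks = list(read_file(tmp.name, chunk_size=4))
os.unlink(tmp.name)
print(chunks)  # [b'0123', b'4567', b'89']
```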



&lt;p&gt;Let's now do a post request to upload the file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;upload_response = requests.post(upload_endpoint,
                        headers=headers,
                        data=read_file(filename))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can print the response to see what kind of response we get.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(upload_response.json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;The output is a JSON object containing an upload url, which points to where the audio file is stored after being uploaded.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Transcribing Audio
&lt;/h2&gt;

&lt;p&gt;Our next step now is to transcribe the uploaded audio.&lt;br&gt;
Let's create a variable 'transcript_endpoint' and assign the transcription endpoint to it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;transcript_endpoint = "https://api.assemblyai.com/v2/transcript"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The transcript endpoint is the same as the upload endpoint except that it ends with 'transcript' while the upload endpoint ends with 'upload'.&lt;/p&gt;

&lt;p&gt;Next, let's extract the audio url from the response we got from uploading the audio.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;audio_url = upload_response.json()['upload_url']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's now create a JSON payload (a dictionary) that contains the audio url.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;json = { "audio_url": audio_url}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then submit the audio for transcription&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;transcript_response = requests.post(transcript_endpoint, json=json, headers=headers)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's print the response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(transcript_response())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below is the response, truncated to just a few of its fields.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;{&lt;strong&gt;'id': 'ongvqhtbo7-ad52-4272-b695-d7c624b7c2b5'&lt;/strong&gt;, 'language_model': 'assemblyai_default', 'acoustic_model': 'assemblyai_default', 'language_code': 'en_us', 'status': 'queued'}&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The response we get is not the transcript itself. This is because, depending on the length of the audio, it may take a minute or two for the transcript to be ready. Instead, the response contains a bunch of information about the transcription job. &lt;/p&gt;

&lt;p&gt;Our main interest in the response is the 'id'; we will use it to ask AssemblyAI whether the transcription job is ready or not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Polling
&lt;/h2&gt;

&lt;p&gt;Put simply, polling refers to repeatedly checking a resource to see what state it is in.&lt;/p&gt;

&lt;p&gt;We will now write the code for polling AssemblyAI. We will use this code to continuously check the status of the transcription job so as to know whether the transcription is ready or not.&lt;/p&gt;

&lt;p&gt;The first thing we will do is to get the 'id' from the response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;job_id = transcript_response.json()['id']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After getting the job id, let's create a polling endpoint and then send a get request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;polling_endpoint = transcript_endpoint + '/' + job_id # creating a polling endpoint
polling_response = requests.get(polling_endpoint, headers=headers) # get request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's print the polling response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(polling_response.json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Just like the transcript response, this response also gives a bunch of information. &lt;br&gt;
Below is the response, truncated to just a few of its fields.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;{'id': 'onse9tjyxv-4164-439c-b19c-ee92ae95a7c1', 'language_model': 'assemblyai_default', 'acoustic_model': 'assemblyai_default', 'language_code': 'en_us', &lt;strong&gt;'status': 'processing'&lt;/strong&gt;}&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For this response, we are interested in the status. If the transcription is not ready, the status will indicate 'processing', otherwise it will indicate 'completed'. &lt;br&gt;
From the response above, we can see that the status indicates 'processing' meaning that the transcription is not yet ready.&lt;/p&gt;

&lt;p&gt;Let's now create a while loop that will keep on polling AssemblyAI until the status indicates completed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;while True:
    polling_response = requests.get(polling_endpoint, headers=headers)
    if polling_response.json()['status'] == 'processing':
        print('Still processing')
    elif polling_response.json()['status'] == 'error':
        print('error')
    elif polling_response.json()['status'] == 'completed':
        print('completed')
        break
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the output from the while loop. As we can see, with the while loop we keep on polling until the status indicates 'completed'.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Still processing&lt;br&gt;
Still processing&lt;br&gt;
Still processing&lt;br&gt;
Still processing&lt;br&gt;
completed&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With the status indicating 'completed' let's now print the polling response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(polling_response.json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below is the response, truncated to a bit of the information.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;{'id': 'onsx3yoyc6-aaa9-472e-b8fa-e3cc10e0432f', 'language_model': 'assemblyai_default', 'acoustic_model': 'assemblyai_default', 'language_code': 'en_us', 'status': 'completed', &lt;strong&gt;'text': 'How is your processing with Python?'&lt;/strong&gt;, 'words': [{'text': 'How', 'start': 730, 'end': 822, 'confidence': 0.38754, 'speaker': None},&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Our main focus from the response is 'text', this is because it contains the transcript. From the response we can see that our transcript is 'How is your processing with Python?'&lt;/p&gt;

&lt;p&gt;Now that we have the transcript, let's save it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Saving the transcript
&lt;/h2&gt;

&lt;p&gt;We will now write the transcript into a text file and thereafter print 'File successfully saved!' to confirm that the file has been saved.&lt;br&gt;
The text file will be saved into the working directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = polling_response.json()
with open('transcript.txt', 'w') as f:
    f.write(response['text'])
print('File successfully saved!')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
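
&lt;p&gt;For reuse, the polling steps above can be wrapped into a small helper. This is just a sketch built on the same endpoint we used earlier; the function names are our own, and a short sleep between polls avoids hammering the API:&lt;/p&gt;

```python
import time

import requests

TRANSCRIPT_ENDPOINT = 'https://api.assemblyai.com/v2/transcript'

def polling_endpoint_for(job_id):
    # The polling URL is simply the transcript endpoint plus the job id.
    return TRANSCRIPT_ENDPOINT + '/' + job_id

def wait_for_transcript(job_id, headers, interval=3):
    # Poll AssemblyAI until the job reaches a terminal state
    # ('completed' or 'error'), sleeping between requests.
    while True:
        response = requests.get(polling_endpoint_for(job_id), headers=headers).json()
        if response['status'] in ('completed', 'error'):
            return response
        time.sleep(interval)
```

&lt;p&gt;With this in place, the whole polling loop becomes a single call: &lt;code&gt;response = wait_for_transcript(job_id, headers)&lt;/code&gt;.&lt;/p&gt;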



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article we've learnt how to transcribe an audio file and looked at all the steps to follow when transcribing audio.&lt;br&gt;
Apart from &lt;a href="https://www.assemblyai.com/"&gt;AssemblyAI&lt;/a&gt;, there are also other platforms, such as &lt;a href="https://deepgram.com/why-deepgram/"&gt;Deepgram&lt;/a&gt;, that can be used for converting speech to text. &lt;/p&gt;

&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=rTWM5WuPhlQ&amp;amp;list=PLcWfeUsAys2nb0i79L_LqYVfwOWEYA4eD&amp;amp;index=3&amp;amp;t=1143s"&gt;How to Transcribe Audio files with python(AssemblyAI)&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>nlp</category>
      <category>beginners</category>
      <category>ai</category>
    </item>
    <item>
      <title>How to Plot an audio file using Matplotlib</title>
      <dc:creator>Purity-Nyagweth</dc:creator>
      <pubDate>Wed, 06 Jul 2022 05:24:50 +0000</pubDate>
      <link>https://dev.to/puritye/how-to-plot-an-audio-file-using-matplotlib-pbb</link>
      <guid>https://dev.to/puritye/how-to-plot-an-audio-file-using-matplotlib-pbb</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Plotting and visualizing an audio file is one of the most important processes in audio analysis. Audio analysis is the process of transforming, exploring, and interpreting audio signals recorded by digital devices so as to extract insights from the audio data.&lt;/p&gt;

&lt;p&gt;In this article, we are going to plot a waveform of an audio file with matplotlib.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python installed&lt;/li&gt;
&lt;li&gt;Numpy installed&lt;/li&gt;
&lt;li&gt;Matplotlib installed&lt;/li&gt;
&lt;li&gt;Background in data analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Importing module and libraries
&lt;/h2&gt;

&lt;p&gt;As a first step, let's import the modules and libraries that we will need.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

import wave
import matplotlib.pyplot as plt 
import numpy as np


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We will use the wave module and numpy to preprocess the audio, and matplotlib to plot it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Loading the Audio file
&lt;/h2&gt;

&lt;p&gt;The audio file that we will use is a wave file.&lt;br&gt;
Let's load the wave file that we want to plot&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

obj = wave.open('audio_file.wav', 'rb')


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Getting Audio parameters
&lt;/h2&gt;

&lt;p&gt;Let's print out the audio parameters such as number of channels, sample width, etc.&lt;br&gt;
We will use &lt;code&gt;.getparams()&lt;/code&gt; method of the wave module&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

print('Parameters:', obj.getparams())


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; &lt;em&gt;Parameters: _wave_params(nchannels=1, sampwidth=2, framerate=22050, nframes=81585, comptype='NONE', compname='not compressed')&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now let's get the parameters that we will need for plotting the audio.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sample frequency, this is the number of samples per second&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

sample_freq = obj.getframerate()


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Number of samples, this is the total number of samples or frames in the audio file&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

n_samples = obj.getnframes()


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Signal wave, these are the raw audio frames (bytes) whose values encode the wave amplitude, which corresponds to the sound intensity.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

signal_wave = obj.readframes(-1)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Audio length, this is the duration of the audio.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

duration = n_samples/sample_freq


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
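
&lt;p&gt;Using the parameters printed earlier (nframes=81585, framerate=22050), we can sanity-check the duration by hand:&lt;/p&gt;

```python
n_samples = 81585    # nframes from getparams() above
sample_freq = 22050  # framerate from getparams() above

duration = n_samples / sample_freq
print(duration)  # 3.7 seconds of audio
```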
&lt;h2&gt;
  
  
  Creating numpy objects
&lt;/h2&gt;

&lt;p&gt;Let's create a numpy array from the signal_wave. Since the sample width is 2 bytes, we read the buffer as 16-bit integers. This will be plotted on the y-axis.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

signal_array = np.frombuffer(signal_wave, dtype=np.int16)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Let's create a numpy array for the time axis from the duration. This will be plotted on the x-axis.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

time = np.linspace(0, duration, num=n_samples)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Creating an audio plot
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

plt.figure(figsize=(15, 5))
plt.plot(time, signal_array)
plt.title('Audio Plot')
plt.ylabel(' signal wave')
plt.xlabel('time (s)')
plt.xlim(0, time) #limiting the x axis to the audio time
plt.show()


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frqieuv0iw5017dve5ixg.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frqieuv0iw5017dve5ixg.PNG" alt="Output of the plot"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we've learnt how to plot the waveform of an audio file. Apart from the waveform, another plot we can get from an audio file is the frequency spectrum, which shows the frequency content of the signal rather than its amplitude over time. To learn how to plot a frequency spectrum, check out this &lt;a href="https://learnpython.com/blog/plot-waveform-in-python/" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=n2FKsPt83_A&amp;amp;list=PLcWfeUsAys2nb0i79L_LqYVfwOWEYA4eD&amp;amp;index=2" rel="noopener noreferrer"&gt;Audio Processing Basics(Assembly AI)&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.altexsoft.com/blog/audio-analysis/" rel="noopener noreferrer"&gt;Audio Analysis(altexsoft)&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>audioanalysis</category>
      <category>beginners</category>
      <category>nlp</category>
    </item>
    <item>
      <title>Named Entity Recognition with Spacy</title>
      <dc:creator>Purity-Nyagweth</dc:creator>
      <pubDate>Thu, 30 Jun 2022 18:50:01 +0000</pubDate>
      <link>https://dev.to/puritye/named-entity-recognition-with-spacy-5adn</link>
      <guid>https://dev.to/puritye/named-entity-recognition-with-spacy-5adn</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Named Entity Recognition, also known as NER, is a Natural Language Processing (NLP) task that identifies and classifies named entities in a text. Named entities are real-world objects assigned a name. They include people's names, location names, works of art, organizations, days and dates, among many others. &lt;/p&gt;

&lt;p&gt;Named Entity Recognition is usually used for extracting key information to understand a text while performing tasks such as topic identification. It can also be used on its own simply to extract important information from a text.&lt;/p&gt;

&lt;p&gt;In this article, I am going to explain how to perform Named Entity Recognition using Spacy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisite
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Spacy installed&lt;/li&gt;
&lt;li&gt;Python installed&lt;/li&gt;
&lt;li&gt;Basic knowledge of python programming&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is Spacy?
&lt;/h2&gt;

&lt;p&gt;Spacy is an open-source library used for performing various NLP tasks.&lt;br&gt;
It has a built-in mechanism for identifying and classifying named entities.&lt;/p&gt;
&lt;h2&gt;
  
  
  NER using Spacy
&lt;/h2&gt;

&lt;p&gt;First, let's import the Spacy library&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import spacy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then load the "en_core_web_sm" model and assign it to a variable named nlp&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nlp = spacy.load("en_core_web_sm")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's create a sample text which we will extract named entities from&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sample_text = "Over 200 youth from Kisumu County in Kenya, have today gotten a chance to take  part in a Golf programme by Safaricom held at Lolwe Grounds."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create a Spacy document by passing the sample text into nlp()&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;doc = nlp(sample_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To extract the named entities from the document we will use '.ents'&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(doc.ents)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; &lt;em&gt;(200, Kisumu County, Kenya, today, Safaricom, Lolwe Grounds)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let's now print all the entities together with the category (label) they have been classified into.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for ent in doc.ents:
    print(ent, ent.label_)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;200 CARDINAL&lt;br&gt;
Kisumu County GPE&lt;br&gt;
Kenya GPE&lt;br&gt;
today DATE&lt;br&gt;
Safaricom ORG&lt;br&gt;
Lolwe Grounds FAC&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The explain() method
&lt;/h2&gt;

&lt;p&gt;Spacy has an 'explain()' method that takes a label/category and returns an explanation of it.&lt;br&gt;
It is a handy way to get a quick definition of a label.&lt;/p&gt;

&lt;p&gt;Let's try it out with the labels we got&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spacy.explain("CARDINAL")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; &lt;em&gt;Numerals that do not fall under another type&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spacy.explain("GPE")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; &lt;em&gt;Countries, cities, states&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spacy.explain("DATE")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; &lt;em&gt;Absolute or relative dates or periods&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spacy.explain("FAC")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; &lt;em&gt;Buildings, airports, highways, bridges, etc.&lt;/em&gt;&lt;/p&gt;
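&lt;p&gt;Since 'explain()' only looks up Spacy's built-in glossary, it works without loading a model. A small sketch that prints a quick reference for the labels we saw:&lt;br&gt;
&lt;/p&gt;

```python
import spacy

# explain() reads Spacy's built-in glossary; no model needs to be loaded
for label in ["CARDINAL", "GPE", "DATE", "FAC", "ORG"]:
    print(label, "->", spacy.explain(label))
```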

&lt;h2&gt;
  
  
  Visualizing Named Entities using Displacy
&lt;/h2&gt;

&lt;p&gt;Displacy is Spacy's built-in visualizer. &lt;br&gt;
With the 'ent' style, it highlights the named entities directly in the text.&lt;/p&gt;

&lt;p&gt;Let's import Displacy&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from spacy import displacy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we will create the visual&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;displacy.render(doc,style="ent",jupyter=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XYVtz_L2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tw3qakdckbnucw9yxcmg.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XYVtz_L2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tw3qakdckbnucw9yxcmg.PNG" alt="Displacy output" width="880" height="87"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Named Entity Recognition is one of the methods that can be used to gain insights from a text while carrying out NLP tasks. Named Entity Recognition has several use cases, such as recommendation systems, efficient search algorithms, and customer support.&lt;/p&gt;

&lt;p&gt;In this article, we looked at Named Entity Recognition using Spacy. But Spacy is not the only library that can be used for NER. Other open-source libraries that you can use are &lt;a href="https://www.nltk.org/"&gt;NLTK&lt;/a&gt; and &lt;a href="https://nlp.stanford.edu/software/CRF-NER.shtml"&gt;Stanford NER&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.analyticsvidhya.com/blog/2021/06/part-10-step-by-step-guide-to-master-nlp-named-entity-recognition/#:~:text=Named%20Entity%20Recognition%20is%20one,classify%20them%20into%20predefined%20categories"&gt;NER (Analytics Vidhya)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://spacy.io/"&gt;Spacy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.analyticsvidhya.com/blog/2021/06/nlp-application-named-entity-recognition-ner-in-python-with-spacy/#:~:text=NER%20using%20Spacy%3A,fast%20statistical%20entity%20recognition%20system."&gt;Spacy NER (Analytics Vidhya)&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nlp</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>BOXPLOTS IN A BRIEF</title>
      <dc:creator>Purity-Nyagweth</dc:creator>
      <pubDate>Tue, 05 Apr 2022 09:52:07 +0000</pubDate>
      <link>https://dev.to/puritye/boxplots-in-a-brief-4h8c</link>
      <guid>https://dev.to/puritye/boxplots-in-a-brief-4h8c</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A boxplot is a graph that is usually used for the descriptive analysis of numerical data. It provides information about the structure and distribution of the data based on measures of central tendency and measures of dispersion, including the median, quartiles, maximum, minimum, and symmetry. Boxplots can also be used to identify outliers in a dataset.&lt;/p&gt;

&lt;p&gt;Boxplots consist of a box and whiskers, and can be drawn either vertically or horizontally. When vertical, the bottom end of the box is the lower quartile and the top end is the upper quartile. The height of the box, which extends from the lower to the upper quartile, is the interquartile range; it contains the middle 50% of the data. The median is the horizontal line inside the box. Whiskers appear at the ends of the plot and mark the minimum and maximum observations of the data, excluding outliers (commonly, points more than 1.5 times the interquartile range beyond the box). The data observations that appear beyond the whiskers are the outliers.&lt;/p&gt;
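&lt;p&gt;The quantities a boxplot is built from can be computed directly. A quick sketch with NumPy on a made-up sample (the data values are illustrative only):&lt;br&gt;
&lt;/p&gt;

```python
import numpy as np

# Illustrative sample; 40 is an obvious outlier
data = np.array([5, 7, 8, 9, 10, 11, 12, 13, 14, 40])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # height of the box

# A common convention: whiskers reach the furthest points within
# 1.5 * IQR of the box; anything beyond is plotted as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(median, iqr, outliers)  # 10.5 4.5 [40]
```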

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jgxXzi31--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j4qz0kqr19qralp1s9mm.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jgxXzi31--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j4qz0kqr19qralp1s9mm.PNG" alt="Image of boxplot" width="481" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the median is in the middle of the box, the distribution of the data is symmetric otherwise it is skewed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4RlQRDpX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cl6tdsc78kldgck3ufwm.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4RlQRDpX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cl6tdsc78kldgck3ufwm.PNG" alt="Image shows symmetric and skewed distribution" width="561" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the median is closer to the lower quartile and the lower whisker is shorter, the distribution is right-skewed (positive skewness); when the median is closer to the upper quartile and the upper whisker is shorter, the distribution is left-skewed (negative skewness).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Gvp1qPPA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qmbyxvhclbbgl00fjkux.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Gvp1qPPA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qmbyxvhclbbgl00fjkux.PNG" alt="Image shows right and left skewed distribution" width="454" height="257"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to create a Boxplot
&lt;/h2&gt;

&lt;p&gt;To create our boxplot, we are going to use the &lt;a href="https://www.kaggle.com/hellbuoy/car-price-prediction"&gt;CarPrice_Assignment.csv&lt;/a&gt; Dataset from Kaggle.&lt;/p&gt;

&lt;p&gt;As a first step, we are going to import libraries that we will be using.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;import pandas as pd&lt;br&gt;
import seaborn as sns&lt;br&gt;
import matplotlib.pyplot as plt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In the next step, we are going to read the dataset and view the first 5 rows&lt;/p&gt;

&lt;p&gt;&lt;code&gt;df = pd.read_csv('../input/car-price-prediction/CarPrice_Assignment.csv')&lt;br&gt;
df.head()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mSj5LcQV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rm5jdd4ls3krihf165i1.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mSj5LcQV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rm5jdd4ls3krihf165i1.PNG" alt="Image shows loaded dataset" width="880" height="120"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we are going to create a boxplot for the feature ‘price’ to get a descriptive analysis of it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sns.set_style('whitegrid')&lt;br&gt;
sns.boxplot(y='price', data = df)&lt;br&gt;
plt.ylabel('price') #setting text for y axis&lt;br&gt;
plt.show()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uULJuCZf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2brqqry7grkk9c4qcxjx.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uULJuCZf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2brqqry7grkk9c4qcxjx.PNG" alt="Image shows boxplot of price column" width="454" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the boxplot, we can see that the median price of a car is around 10,000. The minimum price of a car is about 5,000 and the maximum (excluding outliers) is around 28,000 to 29,000. We can also note the outliers in the dataset and that the distribution is right-skewed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Boxplots as a way to compare several datasets
&lt;/h2&gt;

&lt;p&gt;Although boxplots are usually used for descriptive analysis, they can also be used to visualize correlations and associations between variables. They show the descriptive analysis of a dependent numerical feature against each unique value of an independent categorical feature.&lt;/p&gt;

&lt;p&gt;As an example, we are going to use the dataset from before. We are going to create a boxplot to show the descriptive analysis of the feature ‘price’ against each unique value of the feature ‘fuel type’.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sns.set_style("whitegrid")&lt;br&gt;
sns.boxplot(x='fueltype', y='price', data =df)&lt;br&gt;
plt.xlabel('fueltype') #Set text for the x axis&lt;br&gt;
plt.ylabel('price') #Set text for y axis&lt;br&gt;
plt.show()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CPGIFoe3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8ntfewlxfxjuxz2r1ldt.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CPGIFoe3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8ntfewlxfxjuxz2r1ldt.PNG" alt="Image shows boxplot of gas and diesel" width="475" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As per the boxplots above, we can see that the prices for cars with fuel type diesel and fuel type gas differ, and that on average, the cars with fuel type diesel have a higher price compared to the cars with fuel type gas.&lt;/p&gt;
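&lt;p&gt;The same group comparison can be checked numerically with a groupby. A small sketch on made-up data (the column names mirror the dataset, but the values here are illustrative only):&lt;br&gt;
&lt;/p&gt;

```python
import pandas as pd

# Illustrative values only; the real dataset has many more rows
df = pd.DataFrame({
    "fueltype": ["gas", "gas", "gas", "diesel", "diesel", "diesel"],
    "price": [8000, 10000, 12000, 13000, 15000, 17000],
})

# Median price per fuel type: the line inside each box of the boxplot
medians = df.groupby("fueltype")["price"].median()
print(medians)
```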

&lt;h2&gt;
  
  
  Creating a boxplot with nested grouping
&lt;/h2&gt;

&lt;p&gt;Still using the same dataset, we are now going to group both the cars with fuel type gas and fuel type diesel by the feature ‘door number’ and plot them against the price to see how their prices vary.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sns.set_style("whitegrid")&lt;br&gt;
sns.boxplot(x ='fueltype', y='price', hue='doornumber', data=df)&lt;br&gt;
plt.xlabel('fueltype') #Set text for the x axis&lt;br&gt;
plt.ylabel('price') #Set text for y axis&lt;br&gt;
plt.show()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--r3PbHoGG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9j0m9cyfdyd0ha38du60.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--r3PbHoGG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9j0m9cyfdyd0ha38du60.PNG" alt="Image shows boxplot with nested grouping" width="466" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that the cars that use diesel, whether with two doors or four doors, have a higher price compared to cars that use gas. The average price of cars that use diesel and have two doors is the lowest as compared to all the others. With the cars that use gas, there is a slight variation in the average price between the cars with two doors and those with four doors, while for the cars that use diesel, there is a big variation in the average price between cars with two doors and those with four doors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a boxplot with nested grouping with some bins being empty
&lt;/h2&gt;

&lt;p&gt;For this, we are going to group both cars with fuel type gas and diesel by the feature ‘drivewheel’ and plot them against price to see how their prices vary.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sns.set_style("whitegrid")&lt;br&gt;
sns.boxplot(x ='fueltype', y='price', hue='drivewheel', data=df)&lt;br&gt;
plt.xlabel('fueltype') # Set text for the x axis&lt;br&gt;
plt.ylabel('price')# Set text for y axis&lt;br&gt;
plt.show()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--e5_yQdNV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/by36slakdrvmcajqwxq3.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--e5_yQdNV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/by36slakdrvmcajqwxq3.PNG" alt="Image shows nested boxplots with some bins being empty" width="477" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that among the cars that use diesel, none is a four-wheel drive. For both the cars that use gas and diesel, the rear-wheel-drive cars have the highest prices and the front-wheel-drive cars have the lowest prices. Generally, the cars that use diesel have higher prices compared to those that use gas: the rear-wheel-drive cars that use diesel are priced higher than the rear-wheel-drive cars that use gas, and the same is seen with the front-wheel-drive cars.&lt;/p&gt;

&lt;h2&gt;
  
  
  In conclusion
&lt;/h2&gt;

&lt;p&gt;This was a brief explanation of boxplots: what they are, how to create them, how to interpret them, and how to use them to get insights from data. Hopefully, those who read this article will find it useful.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Aspect-based sentiment analysis of video reviews</title>
      <dc:creator>Purity-Nyagweth</dc:creator>
      <pubDate>Thu, 31 Mar 2022 12:20:19 +0000</pubDate>
      <link>https://dev.to/puritye/aspect-based-sentiment-analysis-of-video-reviews-4cap</link>
      <guid>https://dev.to/puritye/aspect-based-sentiment-analysis-of-video-reviews-4cap</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;I've always loved participating in hackathons, mainly because they are a great way to showcase my work, improve my skills and network. I came across the Deepgram Hackathon on DEV a little late, a week after it was launched, and with the deadline approaching, the Innovative Ideas category was the best option for me to submit in time and not miss out. Another reason I chose the Innovative Ideas category is that I am not yet advanced in programming, so it would have been quite hard for me to come up with code that would successfully complete the 'Build' challenge. Also, it's my first time participating in a hackathon where I am to come up with an innovative idea, which is quite interesting and something I really wanted to try out.&lt;br&gt;
This is my first time encountering Deepgram, though I do have basic knowledge of speech recognition technology and have come across it on quite a number of occasions.&lt;/p&gt;

&lt;h3&gt;
  
  
  My Deepgram Use-Case
&lt;/h3&gt;

&lt;p&gt;My idea is to create a model that will be able to perform aspect-based sentiment analysis of video reviews. The model will extract the sentiment and classify the video review as either positive, negative or neutral. It will also extract the aspect and classify the video review into the category, feature or topic being talked about in the video. &lt;br&gt;
Customer reviews have been found to make a huge impact on whether a customer will purchase from a company or not, actually much more than the marketing and advertising the company does for its brand. Good, positive customer reviews are a great way for a company or business to gain customers' trust in its products and services. Negative reviews, on the other hand, are a good way for a company or business to understand its customers better and improve its products or services. Customer reviews can also be used by investors for finance and stock monitoring: investors can choose a company to invest in by looking at the sentiment around the company's products.&lt;br&gt;
Customer reviews have mostly been written text, but lately video reviews are becoming more popular with customers. As it has always been said, 'People are more likely to believe what they see than what they hear.' Most video reviews are done on products, where someone shares his or her experience using a product and gives a real-time demonstration of it.&lt;br&gt;
Because sentiment analysis is usually done on text, for this project Deepgram will help with their speech-to-text technology by transcribing the speech from the videos into text, after which aspect-based sentiment analysis will be done on the transcribed text.&lt;/p&gt;
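&lt;p&gt;To make the second stage of the pipeline concrete, here is a deliberately simplified, hypothetical sketch of aspect-based sentiment analysis on an already-transcribed review. The lexicons and function are made up for illustration; a real system would use a trained model rather than word lists:&lt;br&gt;
&lt;/p&gt;

```python
# Toy lexicons; purely illustrative, not a production approach
ASPECTS = {"battery": "battery life", "screen": "display", "camera": "camera"}
POSITIVE = {"great", "amazing", "good", "excellent"}
NEGATIVE = {"poor", "bad", "terrible", "weak"}

def aspect_sentiment(transcript):
    """Pair each aspect keyword with opinion words in the same sentence."""
    results = {}
    for sentence in transcript.lower().split("."):
        words = set(sentence.split())
        for keyword, aspect in ASPECTS.items():
            if keyword in words:
                if words & POSITIVE:
                    results[aspect] = "positive"
                elif words & NEGATIVE:
                    results[aspect] = "negative"
                else:
                    results[aspect] = "neutral"
    return results

print(aspect_sentiment("The camera is amazing. The battery is weak."))
# {'camera': 'positive', 'battery life': 'negative'}
```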

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffti6jg73wu1s26xcr78f.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffti6jg73wu1s26xcr78f.PNG" alt="Image shows the process of the project"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Dive into Details
&lt;/h3&gt;

&lt;p&gt;The main challenge that will be solved by Deepgram is to get the speech or audio from the videos into text format so that aspect-based sentiment analysis can be done on the text.&lt;br&gt;
The main people who will benefit from this innovation are companies, businesses and investors. The main idea here is to automate the process of the sentiment analysis of video reviews. The benefits that the companies and investors will get from this innovation are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Saving time. Manually analyzing a large number of video reviews could take a lot of time; automation can do it in a fraction of that time.&lt;/li&gt;
&lt;li&gt;Having a more trustworthy analysis. When performing analysis, humans tend to rely on their own experiences and unconscious biases to derive meaning. Automated analysis can remove human bias by being consistent.&lt;/li&gt;
&lt;li&gt;Having a more powerful analysis. They will be able to perform analysis without limits on the size of the data to be analyzed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This innovation will make use of Deepgram's model feature, with the option set to video, because the audio will be sourced from videos. Since this innovation is all about text analysis, Deepgram's keywords, utterances and utterance split will be very vital. With the keyword feature enabled, the model will be able to intensify or suppress a keyword, thus enabling it to understand the context in the text. This is a huge plus, since the analysis will require a clear and well-understood context. &lt;/p&gt;

&lt;p&gt;What led me to this particular idea is that while searching for inspiration on the &lt;a href="https://developers.deepgram.com/events/dev-to-hackathon-2022/" rel="noopener noreferrer"&gt;Deepgram support page&lt;/a&gt;, I came across &lt;a href="https://developers.deepgram.com/blog/2022/01/live-transcription-badge-video/" rel="noopener noreferrer"&gt;this project&lt;/a&gt; by Kevin Lewis: a wearable screen that live-captions your voice to help people understand you while you are wearing a mask. Suddenly, it rang in my mind that there's actually a lot we can get from reading text: the sentiments being relayed, and the topic and context being discussed. And that is how I ended up with the idea of aspect-based sentiment analysis of video reviews.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;By participating in this challenge, I have learned a lot about Deepgram, its features and how they can be used. It has also been really nice to go through the other participants' posts and learn about the many ways in which speech-to-text technology can be used.&lt;/p&gt;

</description>
      <category>hackwithdg</category>
      <category>nlp</category>
      <category>beginners</category>
    </item>
    <item>
      <title>K-MEANS CLUSTERING</title>
      <dc:creator>Purity-Nyagweth</dc:creator>
      <pubDate>Fri, 28 Jan 2022 09:29:12 +0000</pubDate>
      <link>https://dev.to/puritye/k-means-clustering-429i</link>
      <guid>https://dev.to/puritye/k-means-clustering-429i</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;What is K-means Clustering?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;K-means clustering is an unsupervised learning algorithm. It falls under the class of clustering algorithms, which find similarities in the data in order to group it into clusters.&lt;/p&gt;

&lt;p&gt;The K in K-means represents the number of clusters that the data points are to be grouped into, while the 'means' comes from the fact that after creating the clusters, K-means gets the mean of each cluster and uses it as the new centroid (the center of the cluster). &lt;/p&gt;

&lt;p&gt;The number of clusters (K) is usually predetermined. K-means clustering creates a predetermined number of clusters from an unlabeled multidimensional data. &lt;/p&gt;

&lt;p&gt;The following two assumptions are the basis of the K-means model:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cluster center is the arithmetic mean of all points belonging to the cluster.&lt;/li&gt;
&lt;li&gt;Each point is closer to its own cluster center than to other cluster centers.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Steps for K-means clustering&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guess random cluster centers&lt;/li&gt;
&lt;li&gt;Assign points to the nearest cluster center&lt;/li&gt;
&lt;li&gt;Get the mean of the clusters and take them as the new cluster centers &lt;/li&gt;
&lt;li&gt;Repeat the previous two steps until convergence (the same points are assigned to the same clusters in consecutive iterations), until the cluster centers no longer change, or until the maximum number of iterations is reached.&lt;/li&gt;
&lt;/ul&gt;
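&lt;p&gt;The steps above can be sketched from scratch with NumPy. This is a minimal illustration on made-up blob data; in practice one would use scikit-learn's KMeans:&lt;br&gt;
&lt;/p&gt;

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: guess cluster centers by picking k random data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: new centers are the means of the clusters
        new_centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
        # Convergence: the centers stopped moving
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs make the result easy to check
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(10, 0.5, (50, 2))])
centers, labels = kmeans(X, k=2)
```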

&lt;p&gt;&lt;strong&gt;How to choose the right K (number of clusters)&lt;/strong&gt;&lt;br&gt;
There are quite a number of methods used to choose the number of clusters for K-means, including the elbow method, the silhouette method, and the sum of squares method, among others. We are going to discuss the elbow method in detail.&lt;/p&gt;

&lt;p&gt;A major property of clusters is that the data points within a cluster should be similar, meaning that clustering algorithms should form clusters such that the intra-cluster variation (WCSS) is minimized. WCSS, which stands for within-cluster sum of squares, is the sum of the squared distances between each member of a cluster and its centroid.&lt;/p&gt;
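&lt;p&gt;As a tiny worked example of WCSS (the points, centroids, and assignments below are made up for illustration):&lt;br&gt;
&lt;/p&gt;

```python
import numpy as np

# Four points in two clusters, with their centroids
points = np.array([[1, 1], [2, 2], [9, 9], [10, 10]])
centers = np.array([[1.5, 1.5], [9.5, 9.5]])
labels = np.array([0, 0, 1, 1])

# WCSS: sum of squared distances from each point to its cluster centroid
wcss = sum(np.sum((points[labels == c] - centers[c]) ** 2) for c in range(2))
print(wcss)  # each point is 0.5 off in each axis: 4 * (0.25 + 0.25) = 2.0
```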

&lt;p&gt;In the elbow method, the WCSS at each number of clusters is calculated and graphed. One should choose the number of clusters such that adding another cluster does not reduce the WCSS much further. This will be the point where the slope changes from steep to shallow (an elbow).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The following are steps for performing the elbow method;&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run K-means clustering for different values of k. For example values of K ranging from 1 to 12&lt;/li&gt;
&lt;li&gt;For each value of k, calculate the WCSS&lt;/li&gt;
&lt;li&gt;Plot a graph of WCSS against the number of clusters k&lt;/li&gt;
&lt;li&gt;Spot the point where there is a change of slope from steep to shallow (an elbow). This will be the optimal number of clusters.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Implementing k-means with python&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We are going to use the &lt;a href="https://www.kaggle.com/vjchoudhary7/customer-segmentation-tutorial-in-python"&gt;mall_customers&lt;/a&gt; dataset from Kaggle.&lt;/p&gt;

&lt;p&gt;Snippet for loading the dataset and viewing the  first 5 rows of the data&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# loading dataset
mall_customers = pd.read_csv("data path")
mall_customers.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--R2tZx4Ry--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l36xszc8zfdd4tos3ka6.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--R2tZx4Ry--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l36xszc8zfdd4tos3ka6.PNG" alt="First 5 rows of the data" width="509" height="179"&gt;&lt;/a&gt;&lt;br&gt;
The dataset has 5 columns. We are going to use the columns 'age' and 'spending score' to make the clusters. We want to group the ages according to their spending score.&lt;/p&gt;

&lt;p&gt;We will get the part of dataset that is needed&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X = mall_customers[['Age', 'Spending Score (1-100)']]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first, we will determine the number of clusters needed by using the elbow method. We will run K-means for different values of K in the range of 1 to 10, calculate each of their WCSS, and plot a graph.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wcss = []
for i in range(1,11):
    kmeans = KMeans(n_clusters=i, random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1,11),wcss)
plt.title('Plot of WCSS against number of clusters')
plt.ylabel('WCSS')
plt.xlabel('Number of clusters')
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kTQzHZR0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kjr7hdcdvyd0f20j7lck.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kTQzHZR0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kjr7hdcdvyd0f20j7lck.PNG" alt="Graph of WCSS against cluster number" width="641" height="416"&gt;&lt;/a&gt;&lt;br&gt;
From the graph above, 4 is our optimal number of clusters. &lt;/p&gt;

&lt;p&gt;We will now proceed to clustering the data into 4 clusters and visualize them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#defining a k=4 kmeans cluster model
kmeans_4 = KMeans(n_clusters=4, random_state=0)

#fitting data into the model
assignments = kmeans_4.fit_predict(X)

#creating dictionary to assign cluster numbers to colours for visualization
col_dic = {0:'blue',1:'green',2:'orange',3:'magenta'}

#mapping cluster numbers to colours
assign_colour = [col_dic[x] for x in assignments]

#visualization
plt.scatter(X['Age'], X['Spending Score (1-100)'], color=assign_colour)
plt.ylabel('Spending score')
plt.xlabel('Age')
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rigiQKcM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ftsv5tphufyc7pms6wde.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rigiQKcM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ftsv5tphufyc7pms6wde.PNG" alt="Clusters visualization" width="585" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
In this article, we have discussed the K-means clustering algorithm: what it is, the steps it follows, and how to implement it in Python. Hope you enjoyed reading through and found the article helpful.&lt;/p&gt;

&lt;p&gt;Feel free to give feedback or comment if you have got any so that we all keep learning.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
