<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Antaripa Saha</title>
    <description>The latest articles on DEV Community by Antaripa Saha (@doesdatmaksense).</description>
    <link>https://dev.to/doesdatmaksense</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F581476%2F8961dcae-8a9b-4e91-8f2f-17eb86e37b0b.png</url>
      <title>DEV Community: Antaripa Saha</title>
      <link>https://dev.to/doesdatmaksense</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/doesdatmaksense"/>
    <language>en</language>
    <item>
      <title>Handle contractions in text preprocessing - NLP</title>
      <dc:creator>Antaripa Saha</dc:creator>
      <pubDate>Thu, 18 Feb 2021 09:19:13 +0000</pubDate>
      <link>https://dev.to/edualgo/handle-contractions-in-text-preprocessing-nlp-21p</link>
      <guid>https://dev.to/edualgo/handle-contractions-in-text-preprocessing-nlp-21p</guid>
      <description>&lt;p&gt;Text preprocessing is a crucial step in NLP. It means cleaning raw text and converting it into a consistent form that can be analyzed and used for the task at hand. This article discusses contractions and how to handle them in text.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are contractions?
&lt;/h3&gt;

&lt;p&gt;Contractions are words or combinations of words that are shortened by dropping letters and replacing them with an apostrophe.&lt;/p&gt;

&lt;p&gt;Now that so much communication has moved online, we talk to others mostly through text messages and posts on social media such as Facebook, Instagram, WhatsApp, Twitter, and LinkedIn. With so many people to talk to, we rely on abbreviations and shortened forms of words when texting.&lt;/p&gt;

&lt;p&gt;For example,&lt;br&gt;
&lt;br&gt;
 &lt;code&gt;I’ll be there within 5 min. Are u not gng there? Am I mssng out on smthng? I’d like to see u near d park.&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;In informal writing, we often drop letters, typically vowels, from a word to shorten it. Expanding contractions contributes to text standardization and is useful when working with Twitter data or product reviews, where individual words play an important role in sentiment analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to expand contractions?
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Using the contractions library
&lt;/h4&gt;

&lt;p&gt;First, install the library. Google Colab is a convenient place to try it, since installation there is straightforward.&lt;/p&gt;

&lt;p&gt;Using pip in a Colab cell:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;!pip install contractions&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;In a Jupyter notebook:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;import sys  &lt;br&gt;
!{sys.executable} -m pip install contractions&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Code 1: Expanding contractions using the contractions library&lt;/em&gt;&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# import library 
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;contractions&lt;/span&gt; 
&lt;span class="c1"&gt;# contracted text 
&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll be there within 5 min. Shouldn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t you be there too?  
          I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;d love to see u there my dear. It&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s awesome to meet new friends. 
          We&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ve been waiting for this day for so long.&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;

&lt;span class="c1"&gt;# creating an empty list 
&lt;/span&gt;&lt;span class="n"&gt;expanded_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;     
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; 
  &lt;span class="c1"&gt;# using contractions.fix to expand the shotened words 
&lt;/span&gt;  &lt;span class="n"&gt;expanded_words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contractions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;    

&lt;span class="n"&gt;expanded_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expanded_words&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Original text: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Expanded_text: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;expanded_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Original text: I'll be there within 5 min. Shouldn't you be there too? &lt;br&gt;
          I'd love to see u there my dear. It's awesome to meet new friends.&lt;br&gt;
          We've been waiting for this day for so long.&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;



&lt;p&gt;&lt;code&gt;Expanded_text: I will be there within 5 min. should not you be there too? &lt;br&gt;
          I would love to see you there my dear. it is awesome to meet new friends. &lt;br&gt;
          we have been waiting for this day for so long.&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Expanding contractions before forming word vectors helps reduce dimensionality, because a contraction and its expanded form no longer count as separate tokens.&lt;/p&gt;
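
&lt;p&gt;To make the dimensionality point concrete, here is a minimal sketch that counts distinct tokens with and without expansion. It uses only the standard library; the &lt;code&gt;EXPANSIONS&lt;/code&gt; mapping, the &lt;code&gt;vocabulary&lt;/code&gt; helper, and the sample sentences are assumptions made up for this illustration and are not part of the contractions library.&lt;/p&gt;

```python
# Hypothetical, tiny mapping used only for this illustration.
EXPANSIONS = {"don't": "do not", "i'll": "i will", "it's": "it is"}


def vocabulary(sentences, expand=False):
    """Return the set of distinct lowercase tokens in the sentences."""
    vocab = set()
    for sentence in sentences:
        for token in sentence.lower().split():
            if expand:
                # A contraction may expand into several tokens.
                vocab.update(EXPANSIONS.get(token, token).split())
            else:
                vocab.add(token)
    return vocab


sentences = ["I don't know", "I do not know", "it's fine", "it is fine"]
raw_vocab = vocabulary(sentences)
expanded_vocab = vocabulary(sentences, expand=True)

# Prints the vocabulary size before and after expansion.
print(len(raw_vocab), len(expanded_vocab))
```

&lt;p&gt;In this toy corpus the contracted and expanded sentences end up sharing the same tokens after expansion, so the vocabulary, and with it the width of a bag-of-words vector, shrinks.&lt;/p&gt;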

&lt;p&gt;&lt;em&gt;Code 2: Calling contractions.fix directly on the whole text&lt;/em&gt;&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;She&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;d like to know how I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;d done that!  
          She&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s going to the park and I don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t think I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll be home for dinner. 
          They&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re going to the zoo and she&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll be home for dinner.&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;

&lt;span class="n"&gt;contractions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

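&lt;p&gt;&lt;code&gt;'she would like to know how I would done that! &lt;br&gt;
 she is going to the park and I do not think I will be home for dinner.&lt;br&gt;
 they are going to the zoo and she will be home for dinner.'&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Note that the library appears to map each contraction to a single fixed expansion, so an ambiguous form such as &lt;code&gt;I'd&lt;/code&gt; in "I'd done" is expanded to "I would" even where "I had" is meant.&lt;/p&gt;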

&lt;p&gt;Contractions can also be handled with other techniques, such as dictionary mapping, or with the pycontractions library.&lt;/p&gt;

&lt;p&gt;Refer to the pycontractions documentation to learn more: &lt;a href="https://pypi.org/project/pycontractions/" rel="noopener noreferrer"&gt;https://pypi.org/project/pycontractions/&lt;/a&gt;&lt;/p&gt;
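
&lt;p&gt;The dictionary-mapping approach mentioned above can be sketched with the standard library alone. This is only an illustration under stated assumptions: &lt;code&gt;CONTRACTION_MAP&lt;/code&gt; is a deliberately small, hypothetical mapping, and &lt;code&gt;expand_contractions&lt;/code&gt; is a name chosen for this example, not a function from either library discussed here.&lt;/p&gt;

```python
import re

# Hypothetical, deliberately small mapping; a real one would list many more entries.
CONTRACTION_MAP = {
    "i'll": "I will",
    "i'd": "I would",
    "don't": "do not",
    "shouldn't": "should not",
    "it's": "it is",
    "we've": "we have",
    "they're": "they are",
}

# One alternation over all keys, longest first, matched case-insensitively
# on word boundaries.
_KEYS = sorted(CONTRACTION_MAP, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in _KEYS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text):
    """Expand contractions found in CONTRACTION_MAP; leave everything else untouched."""
    # Normalize the typographic apostrophe so that both apostrophe styles match.
    text = text.replace("\u2019", "'")

    def _replace(match):
        found = match.group(0)
        expanded = CONTRACTION_MAP[found.lower()]
        # Keep a leading capital, e.g. "Shouldn't" becomes "Should not".
        if found[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(_replace, text)


print(expand_contractions("I’ll be there. Shouldn't you be there too? It's fine, we've met."))
```

&lt;p&gt;A plain mapping like this is easy to audit and extend, but it cannot resolve ambiguous forms such as &lt;code&gt;I'd&lt;/code&gt; ("I would" or "I had"); it simply applies whichever expansion the dictionary lists.&lt;/p&gt;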

&lt;p&gt;&lt;a href="https://www.geeksforgeeks.org/nlp-expand-contractions-in-text-processing/#:~:text=Contractions%20are%20words%20or%20combinations,%2C%20Twitter%2C%20LinkedIn%2C%20etc." rel="noopener noreferrer"&gt;Original article&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
