<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: El Marie 💜</title>
    <description>The latest articles on DEV Community by El Marie 💜 (@lornamariak).</description>
    <link>https://dev.to/lornamariak</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F48924%2Ffae879e3-6c72-4c7b-9214-0ec1694f332f.jpg</url>
      <title>DEV Community: El Marie 💜</title>
      <link>https://dev.to/lornamariak</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lornamariak"/>
    <language>en</language>
    <item>
      <title>Answering questions about the UDACITY Artificial Intelligence Product Manager Nanodegree Certification  👩🏾‍🎓🥳</title>
      <dc:creator>El Marie 💜</dc:creator>
      <pubDate>Thu, 25 Jun 2020 13:10:04 +0000</pubDate>
      <link>https://dev.to/lornamariak/answering-questions-about-the-udacity-artificial-intelligence-product-manager-nanodegree-certification-52b6</link>
      <guid>https://dev.to/lornamariak/answering-questions-about-the-udacity-artificial-intelligence-product-manager-nanodegree-certification-52b6</guid>
<description>&lt;p&gt;I originally posted this on my &lt;a href="https://www.youtube.com/lornamaria"&gt;YouTube channel&lt;/a&gt;, hence all the opening theatrics.&lt;/p&gt;

</description>
      <category>udacity</category>
      <category>aiproductmanager</category>
      <category>nanodegree</category>
    </item>
    <item>
      <title>Building a personal data science project</title>
      <dc:creator>El Marie 💜</dc:creator>
      <pubDate>Thu, 11 Jun 2020 14:14:26 +0000</pubDate>
      <link>https://dev.to/lornamariak/building-a-personal-data-science-project-3c5b</link>
      <guid>https://dev.to/lornamariak/building-a-personal-data-science-project-3c5b</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;The key to learning anything is practice, practice, and more practice. The higher you go on the learning curve, the more complex your practice exercises get. Do you remember the leap from your first coding exercise to building side projects over the weekend? It all builds on incremental practice.&lt;/p&gt;

&lt;p&gt;One of my biggest challenges as a self-taught analyst (for the most part) was building my data science portfolio, so I began by building little data science projects.&lt;/p&gt;

&lt;p&gt;Let us talk about how you too can plan and build your first little project, which could be your gateway to a new job or an addition to your portfolio.&lt;/p&gt;

&lt;h3&gt;
  
  
  Identify a case/problem
&lt;/h3&gt;

&lt;p&gt;The case you choose could solve a problem, bring a new perspective to an existing phenomenon, or explore an unknown one. Take your time to research scenarios you are interested in and what you can potentially do with them, and whatever you choose, aim to do a comprehensive data analysis on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tip: Always pick a problem or scenario that relates to something you are passionate about; that way you will stay motivated to work through the entire project.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Finding a data set
&lt;/h3&gt;

&lt;p&gt;Now that you have a case, you need to find data that relates to it. There are many free data sets across the internet; however, if you want to take it a notch higher, you can try collecting your own data and learn what goes into designing a form that captures the right details for your case. Otherwise, pick a sample data set from one of the open data forums on the internet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tip: Kaggle has a large collection of datasets and lets you see what other people have done with them, a great start for your little project!&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Asking questions &amp;amp; Telling stories
&lt;/h3&gt;

&lt;p&gt;Data science centres on asking questions, but not just any questions: you need to ask the right ones. In this step, you formulate several questions that you will answer using the data set at hand, then present them in a story flow to make sense of the data. Storytelling is another common facet of data science, and using the answers to your questions you can write a compelling story that justifies the case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tip: Write a script to prepare for this step; it will guide you in writing a great story and keep your analysis straight to the point.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Examining Trends and interesting facts
&lt;/h3&gt;

&lt;p&gt;While working with data, it is important to think outside the box and explore parameters from different categories to seek out unknown correlations; this step is also crucial for addressing bias and stereotypes in the data. At this point, list all the variants of test cases that can be applied to the data at hand and test each one while tweaking your ideas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tip: This is a discovery process, so be open-minded.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Presentation
&lt;/h3&gt;

&lt;p&gt;Now it is time to visualize and communicate your findings. First, understand the audience you intend to communicate them to; as much as we love graphs, I recommend layering graphs with other graphics to captivate your audience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tip: Explore visualizations out of the traditional graphs and consider layering different graph styles to create captivating graphs.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Once you have created your little data science project, share it with the world and add it to your portfolio. It could help someone in the industry, and it contributes to the wider knowledge the data science community is building around the world.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Exploring Sentiment Analysis as an application of text mining.</title>
      <dc:creator>El Marie 💜</dc:creator>
      <pubDate>Tue, 30 Jan 2018 20:13:47 +0000</pubDate>
      <link>https://dev.to/lornamariak/exploring-sentiment-analysis-o6j</link>
      <guid>https://dev.to/lornamariak/exploring-sentiment-analysis-o6j</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fh9dvvoo3rgbg7z4xjylf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fh9dvvoo3rgbg7z4xjylf.jpg" alt="Photo Credit: Pixabay"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This article is part 2 of Understanding Text Mining. If you just landed here, Part 1 is available &lt;a href="https://dev.to/lornamariak/understanding-and-writing-your-first-text-mining-script-withr-345k"&gt;here&lt;/a&gt;.&lt;br&gt;
One of the applications of text mining is sentiment analysis. Before we can carry out a sentiment analysis of our mined text, we are required to clean and prepare our data set as we saw in Part 1.&lt;/p&gt;
&lt;h2&gt;
  
  
  Understanding Sentiment Analysis
&lt;/h2&gt;

&lt;p&gt;Sentiment Analysis: the study of extracted information to identify reactions, attitudes, context, and emotions. As one of the applications of text mining, sentiment analysis exposes the attitudes in the mined text.&lt;/p&gt;

&lt;p&gt;It is based on word polarities: positive and negative words are taken into account, while neutral words are dismissed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Flxz42i73p3jml05atk9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Flxz42i73p3jml05atk9t.png" alt="Table showing word polarity examples"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sentiment analysis is done using lexicons. A lexicon, in simpler terms, is a vocabulary, say the English lexicon. In this context, a lexicon is a selection of words tagged with the two polarities that can be used as a metric in sentiment analysis.&lt;/p&gt;

&lt;p&gt;There are many different lexicons that can be used depending on the context of the data you are working with. It is also possible to create a custom lexicon, depending on how much customization you would like to apply to your data.&lt;/p&gt;

&lt;p&gt;In this article, we shall make use of the syuzhet package. While there are a number of packages for sentiment analysis on CRAN, the syuzhet package is great to learn with because it bundles the most common lexicons, like nrc, bing, and afinn.&lt;br&gt;
We also make use of ggplot2 to visualize the results of the sentiment analysis.&lt;/p&gt;
&lt;h2&gt;
  
  
  How does Sentiment analysis work?
&lt;/h2&gt;

&lt;p&gt;In simple terms, sentiment analysis is performed as an intersection of a term-document (built from the mined text) and a lexicon of choice.&lt;/p&gt;

&lt;p&gt;The first step is to have a term-document and a lexicon of your choice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fd3t0evjy423rk00u2z8f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fd3t0evjy423rk00u2z8f.png" alt="The first step is to have a term-document and a lexicon of your choice."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then form an intersection between the two sets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F83i67718zz9dwcoqsgns.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F83i67718zz9dwcoqsgns.png" alt="Then form an intersection between the two sets."&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Hands-on with Sentiment analysis
&lt;/h2&gt;

&lt;h4&gt;Example one&lt;/h4&gt; 

&lt;p&gt;This is a simple example where we extract emotions from a sentence. We load the sentence, split it into words with the strsplit() function to form a character vector, and use the get_nrc_sentiment() function from the syuzhet library. This function takes in new_sentence and compares it with the nrc emotion lexicon to return the scores shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="n"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;syuzhet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"i love cats such a bundle of joy."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;new_sentence&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;as.character&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strsplit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;get_nrc_sentiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#This is the output&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;anger&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;anticipation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;disgust&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fear&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;joy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sadness&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;surprise&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trust&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;negative&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;positive&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;Example two&lt;/h4&gt;
 

&lt;p&gt;This second example makes use of a TED Talks data set downloaded from Kaggle under the name transcript.csv. It was cleaned using the tm package, following the steps in Part 1 of this article, and carried forward for sentiment analysis here in Part 2.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="c1"&gt;#load the libraries&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;syuzhet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ggplot2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#mydataCopy is a term document,generated from cleaning #transcripts.csv &lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;mydataCopy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#carryout sentiment mining using the get_nrc_sentiment()function #log the findings under a variable result&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;get_nrc_sentiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;as.character&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mydataCopy&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#change result from a list to a data frame and transpose it &lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;result1&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;data.frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#rowSums computes column sums across rows for each level of a #grouping variable.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;new_result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;data.frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rowSums&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#name rows and columns of the dataframe&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nf"&gt;names&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_result&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"count"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;new_result&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cbind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"sentiment"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rownames&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;new_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;rownames&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;NULL&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#plot the first 8 rows,the distinct emotions&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;qplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;new_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bar"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;ggtitle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"TedTalk Sentiments"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c1"&gt;#plot the last 2 rows ,positive and negative&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;qplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;new_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;9&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bar"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;ggtitle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"TedTalk Sentiments"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Farjlve7c28nv4vgm0l5z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Farjlve7c28nv4vgm0l5z.png" alt="Plot 1: Shows distinct emotions"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fys5yzu537d4qin7rromf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fys5yzu537d4qin7rromf.png" alt="Plot 2: Shows the combination of emotions under two polarities."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Plot 1: Shows distinct emotions&lt;br&gt;
Plot 2: Shows the combination of emotions under two polarities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We have applied sentiment analysis to mined text to arrive at a clear description of the emotions attached to it.&lt;/p&gt;

&lt;p&gt;This could grow into a whole project that helps you gain insight into how and when to talk to your audience, what they feel about a certain topic, product, or service, and how you can better interact with them.&lt;/p&gt;

&lt;p&gt;Now, go ahead and choose an article, dataset, or campaign that you want to try sentiment analysis on, and follow the steps.&lt;/p&gt;

&lt;p&gt;Happy Coding, I am always here to help &amp;lt;- &lt;a class="mentioned-user" href="https://dev.to/lornamariak"&gt;@lornamariak&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>beginners</category>
      <category>textmining</category>
      <category>sentimentanalysis</category>
    </item>
    <item>
      <title>Understanding and Writing your first Text Mining Script with R.</title>
      <dc:creator>El Marie 💜</dc:creator>
      <pubDate>Thu, 11 Jan 2018 13:09:32 +0000</pubDate>
      <link>https://dev.to/lornamariak/understanding-and-writing-your-first-text-mining-script-withr-345k</link>
      <guid>https://dev.to/lornamariak/understanding-and-writing-your-first-text-mining-script-withr-345k</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Ftj2hzs1pjc9grhiteh4f.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Ftj2hzs1pjc9grhiteh4f.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;One of the reasons data science has become popular is its ability to reveal so much information about large data sets in a split second, or with just a query.&lt;/p&gt;

&lt;p&gt;Think about it deeply: on a daily basis, how much information in the form of text do we give out? All of this information contains our sentiments, our opinions, our plans, pieces of advice, and our favorite phrases, among other things.&lt;/p&gt;

&lt;p&gt;However, revealing any of this can seem like finding a needle in a haystack, until we use techniques like text mining and analysis.&lt;br&gt;
Text mining takes into account information retrieval, the analysis and study of word frequencies, and pattern recognition to aid visualization and predictive analytics.&lt;/p&gt;

&lt;p&gt;In this article, we go through the major steps a dataset undergoes to get ready for further analysis. We shall write our script in R, with the code written in RStudio.&lt;/p&gt;

&lt;p&gt;To achieve our goal, we shall use an R package called “tm”. This package supports all the text mining functions we need, like loading data, cleaning data, and building a term matrix. It is available on CRAN.&lt;/p&gt;

&lt;h5&gt;Let’s install and load the package in our workspace to begin with.&lt;/h5&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="c1"&gt;#downloading and installing the package from CRAN&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;install.packages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"tm"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#loading tm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h1&gt;
  
  
  Loading Data
&lt;/h1&gt;

&lt;p&gt;Text to be mined can be loaded into R from different source formats. It can come from text files (.txt), PDFs (.pdf), CSV files (.csv), etc., but no matter the source format, to be used in the tm package it must be turned into a “corpus”.&lt;/p&gt;

&lt;p&gt;A corpus is defined as “a collection of written texts, especially the entire works of a particular author or a body of writing on a particular subject”.&lt;br&gt;
The tm package uses the Corpus() function to create a corpus.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="c1"&gt;#loading a text file from local computer&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;newdata&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;readlines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#Load data as corpus&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#VectorSource() creates character vectors&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Corpus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;VectorSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;newdata&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Refer to this guide to learn more about importing files into R.&lt;/p&gt;

&lt;h1&gt;
  
  
  Cleaning Data.
&lt;/h1&gt;

&lt;p&gt;Once we have successfully loaded the data into the workspace, it is time to clean it. Our goal at this step is to create independent terms (words) from the data file before we start counting how frequently they appear.&lt;/p&gt;

&lt;p&gt;Since R is case sensitive, we shall first convert the entire text to lowercase to avoid treating the same word, like “write” and “Write”, as two different terms.&lt;/p&gt;

&lt;p&gt;We shall remove URLs, emojis, non-English words, punctuation, numbers, whitespace, and stop words.&lt;br&gt;
Stop words: the commonly used English words like “a”, “is”, and “the” are referred to in the tm package as stop words. These words have to be eliminated to make the results more accurate. It is also possible to create your own custom stop words.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="c1"&gt;# convert to lower case&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tm_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;content_transformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tolower&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#remove ������ what would be emojis&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;tm_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;content_transformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"\\W"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;# remove URLs&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;removeURL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"http[^[:space:]]*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tm_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;content_transformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;removeURL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;# remove anything other than English letters or space&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;removeNumPunct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"[^[:alpha:][:space:]]*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tm_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;content_transformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;removeNumPunct&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;# remove stopwords&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tm_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;removeWords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;stopwords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"english"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#u can create custom stop words using the code below.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#myStopwords &amp;lt;- c(setdiff(stopwords('english'), c("r", "big")),"use", "see", "used", "via", "amp")&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;#mydata &amp;lt;- tm_map(mydata, removeWords, myStopwords)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;# remove extra whitespace&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tm_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;stripWhitespace&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;# Remove numbers&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tm_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;removeNumbers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c1"&gt;# Remove punctuations&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tm_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;removePunctuation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Stemming
&lt;/h1&gt;

&lt;p&gt;Stemming is the process of reducing words of similar origin to a single base form, for example “communication”, “communicates”, and “communicate”. Stemming helps us increase accuracy in our mined text by removing suffixes and reducing words to their basic forms. We shall use the SnowballC library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="n"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SnowballC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tm_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;stemDocument&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Building a term Matrix and Revealing word frequencies
&lt;/h1&gt;

&lt;p&gt;After the cleaning process, we are left with the independent terms that occur throughout the document. These are stored in a matrix that records each term’s occurrences. Because it logs the number of times each term appears in our clean data set, it is called a term matrix.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="c1"&gt;#create a term matrix and store it as dtm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;dtm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TermDocumentMatrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mydata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Word frequencies are the number of times each word appears in the data set. Using the counts compiled in the term matrix, word frequencies rank the terms from the most frequently used in the data set to the least used.&lt;/p&gt;
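As a minimal sketch of the idea in base R (the docs vector below is hypothetical example data standing in for our corpus), word frequencies can be computed by splitting text into words and tabulating them:

```r
# Toy cleaned text standing in for the corpus (hypothetical example data)
docs = c("data science is fun", "text mining is part of data science")

# Split each document into words and count how often each term occurs
words = unlist(strsplit(docs, " "))
freqs = sort(table(words), decreasing = TRUE)
```

With the tm term matrix built above, the equivalent is sorting the row sums of as.matrix(dtm) in decreasing order.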

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;We have just written a basic text mining script; however, it is just the beginning of text mining. The ability to take text in its raw format and clean it to this point gives us a path to things like building a word cloud, sentiment analysis, and building models.&lt;br&gt;
Hold on to this script because it will come in handy when we start doing sentiment analysis.&lt;br&gt;
Feel free to reach out to me with any questions &amp;gt; &lt;a class="mentioned-user" href="https://dev.to/lornamariak"&gt;@lornamariak&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>beginners</category>
      <category>textmining</category>
    </item>
    <item>
      <title>"R libraries to aid you to learn data science in 2018" </title>
      <dc:creator>El Marie 💜</dc:creator>
      <pubDate>Wed, 03 Jan 2018 13:16:14 +0000</pubDate>
      <link>https://dev.to/lornamariak/r-libraries-to-aid-you-to-learn-data-science-in-2018-53oc</link>
      <guid>https://dev.to/lornamariak/r-libraries-to-aid-you-to-learn-data-science-in-2018-53oc</guid>
      <description>

&lt;p&gt;2018 is already here! What a year 2017 has been!&lt;br&gt;For someone who started learning data science late in the year, it feels like the year has been short. The R learning curve may seem steep; however, continuous exposure to different tools and libraries/packages can make your experience simpler.&lt;br&gt;
In this article, I share with you R packages, under different branches of data science, that have made my learning journey worthwhile so far.&lt;/p&gt;

&lt;h3&gt;Data Visualization&lt;/h3&gt; 
This is a very instrumental part of data science. For a data science newbie, the ability to create great visualizations gives you hope that you are on the right track. With great data visualizations comes a sense of appreciation for your work, especially from non-data scientists. The following packages will come in handy while visualizing in R.

&lt;h4&gt;1.ggplot2 &lt;/h4&gt;
This is an R package that makes all that visualization work much easier. It implements the grammar of graphics, takes care of plotting details, offers many graphical options, and does great graph layering.&lt;br&gt;
It is available on CRAN.
Here is a great ggplot2 cheat sheet to get you started: &lt;a href="https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf"&gt;ggplot2 cheat sheet&lt;/a&gt;
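As a minimal sketch (assuming ggplot2 is installed, and using the built-in mtcars data set), a basic layered plot looks like this:

```r
library(ggplot2)

# Scatter plot of car weight against fuel efficiency, built as layers
p = ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# In an interactive session, print(p) draws the plot
```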

&lt;h4&gt;2.shiny &lt;/h4&gt; 
This is an R package that gives users the power to build dashboards and web apps. Shiny helps a lot with data collection and manipulation in real time, as it handles reactivity in a great way. Shiny apps can make use of HTML widgets, CSS themes, and JavaScript actions to interface with R scripts. It is an awesome library for someone interested in data storytelling on their website.&lt;br&gt;
shiny is available on CRAN.
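As a toy sketch of that reactivity (assuming shiny is installed; the inputs here are only illustrative, not a real dashboard), a slider can drive a text output:

```r
library(shiny)

# UI: a slider input and a text output
ui = fluidPage(
  sliderInput("n", "Sample size", min = 1, max = 100, value = 10),
  textOutput("msg")
)

# Server: the output re-renders whenever the slider moves
server = function(input, output) {
  output$msg = renderText(paste("You chose", input$n, "points"))
}

app = shinyApp(ui, server)
# runApp(app) starts the app in a browser
```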

&lt;h3&gt;Data Wrangling&lt;/h3&gt;  
One of the goals of every data scientist should be maximizing data analysis time. To achieve this, one needs to ensure the data they are working with is as clean as possible and can be manipulated easily. Data wrangling is the process of cleaning up data, removing redundancy, and organizing it in a way that makes analysis much easier. The following packages are great and simple data wrangling tools.

&lt;h4&gt;1.tidyr&lt;/h4&gt;
From the tidyr &lt;a href="http://tidyr.tidyverse.org/"&gt;website&lt;/a&gt;, tidy data is defined as data where:&lt;br&gt;
&lt;ul&gt;
&lt;li&gt;Each variable is in a column.&lt;/li&gt;
&lt;li&gt;Each observation is a row.&lt;/li&gt;
&lt;li&gt;Each value is a cell.&lt;/li&gt;
&lt;/ul&gt;
tidyr makes use of simple verbs as R functions, like gather(), to carry out quick data tidying operations on large datasets.
tidyr is available on CRAN.
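As a small sketch of those verbs (the scores data frame below is hypothetical example data), gather() turns wide year columns into tidy key/value rows:

```r
library(tidyr)

# A "wide" data frame: one column per year (hypothetical example data)
scores = data.frame(student = c("A", "B"),
                    y2016 = c(70, 80),
                    y2017 = c(75, 85))

# gather() moves the year columns into year/score pairs, one observation per row
tidy_scores = gather(scores, key = "year", value = "score", y2016, y2017)
```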

&lt;h4&gt;2.dplyr&lt;/h4&gt;
While dealing with data, there are common manipulations that have to be carried out, and dplyr helps by providing verb functions for these manipulations. It helps you filter your data and carry out operations that group the data for deeper meaning.
dplyr is available on CRAN.
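As a brief sketch using the built-in mtcars data set (assuming dplyr is installed), the verb functions chain together with the %&gt;% pipe:

```r
library(dplyr)

# Keep heavier cars, group them by cylinder count, and average their mpg
result = mtcars %>%
  filter(wt > 2) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg))
```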


&lt;h3&gt;Data Mining &lt;/h3&gt; 
This is one of the biggest challenges for data science newbies. Although very many websites are full of free, open data sets, it is also an accomplishing feeling for a data science newbie to learn how to extract a data set from the numerous sources of information on and off the web.
The following libraries will do the magic:

&lt;h4&gt;1.httr &lt;/h4&gt; 
This package will enable you to access data via modern web APIs. It makes use of HTTP verb functions, requests return JSON data that can be parsed into R objects, and it supports OAuth. This makes it easy for a newbie working with APIs in R.
This package is available on CRAN.
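As a hedged sketch (the endpoint here is GitHub's public API root, picked only for illustration; it needs a network connection):

```r
library(httr)

# Send a GET request; the response object carries status, headers, and body
resp = GET("https://api.github.com")

status_code(resp)                      # e.g. 200 on success
parsed = content(resp, as = "parsed")  # JSON body parsed into an R list
```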


&lt;h4&gt;2.rvest&lt;/h4&gt; 
An R package for web scraping. It reads HTML documents from URLs, selects parts of the document using CSS selectors, and parses HTML tables into data frames in R.
This package is available on CRAN.
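As a hedged sketch (the Wikipedia URL is only an illustration; any page containing a table works, and a network connection is required):

```r
library(rvest)

# Read a page and parse its HTML tables into a list of data frames
page = read_html("https://en.wikipedia.org/wiki/R_(programming_language)")
tables = html_table(html_nodes(page, "table"), fill = TRUE)
```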


&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;
The first days of data science can be a bit confusing; however, focusing on each of these branches can help you understand data science step by step.
I wish you a great learning experience in 2018. Don’t stop learning. &lt;br&gt;
Feel free to reach out to me via Twitter &lt;a href="https://twitter.com/lornamariak"&gt;@lornamariak&lt;/a&gt;. I am happy to help and give some hype/support. Happy coding!
&lt;/p&gt;




</description>
      <category>beginners</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
