<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Israwi</title>
    <description>The latest articles on DEV Community by David Israwi (@davidisrawi).</description>
    <link>https://dev.to/davidisrawi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F30196%2Fb0d95271-3ba7-469f-abc9-2356ab3badf5.jpg</url>
      <title>DEV Community: David Israwi</title>
      <link>https://dev.to/davidisrawi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/davidisrawi"/>
    <language>en</language>
    <item>
      <title>Securing your website in 4 minutes - What, Why and How of HTTPS</title>
      <dc:creator>David Israwi</dc:creator>
      <pubDate>Tue, 21 Aug 2018 02:07:02 +0000</pubDate>
      <link>https://dev.to/davidisrawi/securing-your-website-in-4-minutes---what-why-and-how-of-https-3kcm</link>
      <guid>https://dev.to/davidisrawi/securing-your-website-in-4-minutes---what-why-and-how-of-https-3kcm</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fu5l7rzt4uog7lyvgsi32.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fu5l7rzt4uog7lyvgsi32.png" alt="Donzo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Today I changed my website's protocol from HTTP to HTTPS - it was quick and easy. After finishing, I wasn't sure what I had really accomplished, so I did some research into what it really meant to create a secure connection between you and a website. &lt;/p&gt;

&lt;h2&gt;
  
  
  Here is a quick summary.
&lt;/h2&gt;

&lt;p&gt;When you submit a body of text to a website (e.g. log-in info, chat message, search query), the information is sent to a server that may return information back to you. This exchange of information happens using the &lt;em&gt;HyperText Transfer Protocol&lt;/em&gt;. The issue is that plain HTTP sends this information unencrypted: anyone intercepting the network traffic can read your message. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Ff9d35vdx87b0x6ow1gg2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Ff9d35vdx87b0x6ow1gg2.png" title="No bueno" alt="Catch from Wireshark"&gt;&lt;/a&gt;&lt;br&gt;
Image: this is a sample packet sent from my computer to my site before changing the protocol, captured using Wireshark.&lt;/p&gt;

&lt;p&gt;This vulnerability is the reason why HTTPS (HTTP + &lt;em&gt;Secure&lt;/em&gt;) is strongly encouraged. &lt;/p&gt;

&lt;p&gt;This protocol encrypts the messages exchanged between you and the site. The server presents an SSL/TLS certificate containing its public key, which the browser uses to verify the server's identity (by checking the certificate's signature) and to set up the encrypted connection (thanks to Vin in the comments for clarification). &lt;/p&gt;

&lt;h3&gt;
  
  
  What if I don't send/receive sensitive data from my website?
&lt;/h3&gt;

&lt;p&gt;HTTPS has benefits beyond securing the exchange of information: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ward off intruders from identifying your users by analyzing your information exchange.&lt;/li&gt;
&lt;li&gt;Reduce the risk of anyone exploiting the resources of your website to their benefit.&lt;/li&gt;
&lt;li&gt;As Progressive Web Apps grow in popularity, Service Workers (used for push notifications) require the use of HTTPS.&lt;/li&gt;
&lt;li&gt;Other benefits of Service Workers include offline behavior and caching.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Changing your website to use HTTPS
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Faqllhzrum9j2otepsel6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Faqllhzrum9j2otepsel6.png" alt="Site before changing protocol to HTTPS"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is a 5-minute video made by &lt;a href="https://httpsiseasy.com/" rel="noopener noreferrer"&gt;httpsiseasy&lt;/a&gt; explaining how to do this. Here is the step-by-step tutorial I followed using Cloudflare. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;a href="https://www.cloudflare.com/" rel="noopener noreferrer"&gt;Cloudflare&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Sign up&lt;/li&gt;
&lt;li&gt;Enter your website's domain, select the free plan, and continue through the prompts.&lt;/li&gt;
&lt;li&gt;The service will give you two DNS nameservers along with instructions to add them to your website.&lt;/li&gt;
&lt;li&gt;Hit Crypto on the toolbar, change "Always Use HTTPS" to On&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do this and you're donzo. The change may take from several minutes up to 48 hours, but nothing else is needed from you. &lt;/p&gt;
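
&lt;p&gt;Once the change has propagated, one way to confirm that plain HTTP requests are being redirected is a small script like the sketch below. It is only an illustration: the function names &lt;code&gt;redirects_to_https&lt;/code&gt; and &lt;code&gt;check_domain&lt;/code&gt; are made up for this example, and it assumes the redirect is done with a standard 3xx status and a &lt;code&gt;Location&lt;/code&gt; header.&lt;/p&gt;

```python
import http.client

# Status codes commonly used for redirects.
REDIRECT_CODES = {301, 302, 307, 308}


def redirects_to_https(status, location):
    """Return True when a status/Location pair sends the client to HTTPS."""
    return status in REDIRECT_CODES and (location or "").lower().startswith("https://")


def check_domain(domain, timeout=10):
    """Request http://domain/ (redirects are not followed) and inspect the reply."""
    conn = http.client.HTTPConnection(domain, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return redirects_to_https(response.status, response.getheader("Location"))
    finally:
        conn.close()
```

&lt;p&gt;Calling &lt;code&gt;check_domain("example.com")&lt;/code&gt; with your own domain in place of the placeholder should return &lt;code&gt;True&lt;/code&gt; once "Always Use HTTPS" is active.&lt;/p&gt;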

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fsz1rml131nh66yzyxqlp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fsz1rml131nh66yzyxqlp.png" alt="Site after changing protocol to HTTPS"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After doing this, I was chatting with my brother (&lt;a href="https://dev.to/sammyisa"&gt;@sammyisra&lt;/a&gt;) and told him I used Cloudflare to do this; he told me he had used Netlify. I'm curious what most people have used, so please leave a comment below sharing what service you used and why.&lt;/p&gt;

&lt;p&gt;Thank you!&lt;/p&gt;

&lt;h3&gt;
  
  
  Other useful resources:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developers.google.com/web/fundamentals/security/encrypt-in-transit/why-https" rel="noopener noreferrer"&gt;Google - Why HTTPS Matters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.instantssl.com/https-tutorials/what-is-https.html" rel="noopener noreferrer"&gt;Instant SSL - What is HTTPS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>https</category>
      <category>dns</category>
      <category>web</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Build a quick Summarizer with Python and NLTK</title>
      <dc:creator>David Israwi</dc:creator>
      <pubDate>Thu, 17 Aug 2017 04:43:08 +0000</pubDate>
      <link>https://dev.to/davidisrawi/build-a-quick-summarizer-with-python-and-nltk</link>
      <guid>https://dev.to/davidisrawi/build-a-quick-summarizer-with-python-and-nltk</guid>
      <description>

&lt;p&gt;If you're interested in Data Analytics, you will find learning about Natural Language Processing very useful. A good project to start learning about NLP is to write a summarizer - an algorithm that shortens a body of text while keeping its original meaning, or at least giving good insight into the original text.&lt;/p&gt;

&lt;p&gt;There are many libraries for NLP. For this project, we will be using NLTK - the Natural Language Toolkit. &lt;/p&gt;

&lt;p&gt;Let's start by writing down the steps necessary to build our project.&lt;/p&gt;

&lt;h2&gt;
  
  
  4 steps to build a Summarizer
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Remove stop words (defined below) for the analysis&lt;/li&gt;
&lt;li&gt;Create frequency table of words - how many times each word appears in the text&lt;/li&gt;
&lt;li&gt;Assign score to each sentence depending on the words it contains and the frequency table&lt;/li&gt;
&lt;li&gt;Build summary by adding every sentence above a certain score threshold &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it! And the Python implementation is also short and straightforward. &lt;/p&gt;

&lt;h3&gt;
  
  
  What are stop words?
&lt;/h3&gt;

&lt;p&gt;A stop word is any word that does not add much to the meaning of a sentence. For example, let's say we have the sentence&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A group of people run every day from a bank in Alafaya to the nearest Chipotle&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By removing the sentence's stop words, we can reduce the number of words and preserve the meaning:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Group of people run every day from bank Alafaya to nearest Chipotle&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We usually remove stop words from the analyzed text because knowing their frequency doesn't give any insight into the body of text. In this example, we removed the instances of the words &lt;em&gt;a&lt;/em&gt;, &lt;em&gt;in&lt;/em&gt;, and &lt;em&gt;the&lt;/em&gt;.&lt;/p&gt;
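
&lt;p&gt;The removal above can be sketched in a few lines of plain Python. This is only an illustration: &lt;code&gt;SAMPLE_STOP_WORDS&lt;/code&gt; is a tiny hand-picked set assumed for this example (NLTK's English list is much longer), and &lt;code&gt;remove_stop_words&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
# Hypothetical, hand-picked stop words for illustration only;
# NLTK's English stop-word list is much longer.
SAMPLE_STOP_WORDS = {"a", "in", "the"}


def remove_stop_words(sentence, stop_words=SAMPLE_STOP_WORDS):
    """Return the sentence with every stop word dropped (case-insensitive)."""
    kept = [word for word in sentence.split() if word.lower() not in stop_words]
    return " ".join(kept)


sentence = "A group of people run every day from a bank in Alafaya to the nearest Chipotle"
print(remove_stop_words(sentence))
```

&lt;p&gt;Comparing words in lowercase matters here: without it, the capitalized "A" at the start of the sentence would not be recognized as a stop word.&lt;/p&gt;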

&lt;h2&gt;
  
  
  Now, let's start!
&lt;/h2&gt;

&lt;p&gt;We will need two NLTK modules to build the summarizer. &lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;nltk.corpus&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stopwords&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;nltk.tokenize&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;word_tokenize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sent_tokenize&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note:&lt;/em&gt; There are more libraries that can make our summarizer better; one example is discussed at the end of this article. &lt;/p&gt;

&lt;h3&gt;
  
  
  Corpus
&lt;/h3&gt;

&lt;p&gt;A corpus is a collection of texts. It could be a data set of poems by a certain poet, the body of work of a certain author, etc. In this case, we are going to use a data set of pre-determined stop words.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tokenizers
&lt;/h3&gt;

&lt;p&gt;A tokenizer divides a text into a series of tokens. There are three main tokenizers - word, sentence, and regex tokenizer. For this specific project, we will only use the word and sentence tokenizers.&lt;/p&gt;
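
&lt;p&gt;To get a rough feel for what word and sentence tokenizers produce, here is a simplified stand-in written with Python's &lt;code&gt;re&lt;/code&gt; module. It is not how NLTK works internally, and the function names &lt;code&gt;simple_sent_tokenize&lt;/code&gt; and &lt;code&gt;simple_word_tokenize&lt;/code&gt; are made up for this sketch; NLTK's tokenizers handle abbreviations, contractions, and other cases that these naive patterns get wrong.&lt;/p&gt;

```python
import re


def simple_sent_tokenize(text):
    """Naively split text into sentences at runs of '.', '!' or '?'."""
    pieces = re.findall(r"[^.!?]+[.!?]*", text)
    return [piece.strip() for piece in pieces if piece.strip()]


def simple_word_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


text = "Hello, world. How are you? Fine!"
print(simple_sent_tokenize(text))
print(simple_word_tokenize("Hello, world."))
```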

&lt;h2&gt;
  
  
  Removing stop words and making frequency table
&lt;/h2&gt;

&lt;p&gt;First, we create two collections - a set of stop words, and a list of every word in the body of text.&lt;/p&gt;

&lt;p&gt;Let's use &lt;code&gt;text&lt;/code&gt; as the original body of text.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stopWords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stopwords&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"english"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;word_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Second, we create a dictionary for the word frequency table. For this, we should only use the words that are not part of the stopWords set.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;freqTable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stopWords&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;freqTable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;freqTable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;freqTable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now, we can use the &lt;code&gt;freqTable&lt;/code&gt; dictionary over every sentence to know which sentences have the most relevant insight to the overall purpose of the text.&lt;/p&gt;
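
&lt;p&gt;As a side note, the same kind of frequency table can be built more compactly with &lt;code&gt;collections.Counter&lt;/code&gt; from the standard library. The sketch below is an alternative, not the code used in the rest of this article; the helper name &lt;code&gt;build_freq_table&lt;/code&gt;, the sample word list, and the small stop-word set are assumptions made for illustration.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical stop words and input, for illustration only.
SAMPLE_STOP_WORDS = {"the", "a", "is", "on"}


def build_freq_table(words, stop_words):
    """Count every lowercased word that is not a stop word."""
    lowered = (word.lower() for word in words)
    return Counter(word for word in lowered if word not in stop_words)


words = ["The", "cat", "sat", "on", "the", "mat", "cat"]
freqTable = build_freq_table(words, SAMPLE_STOP_WORDS)
print(freqTable.most_common(1))
```

&lt;p&gt;A &lt;code&gt;Counter&lt;/code&gt; behaves like a dictionary, so it can be used anywhere a plain &lt;code&gt;dict&lt;/code&gt; frequency table is expected.&lt;/p&gt;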

&lt;h2&gt;
  
  
  Assigning a score to every sentence
&lt;/h2&gt;

&lt;p&gt;We already imported a sentence tokenizer, so we just need to call the &lt;code&gt;sent_tokenize()&lt;/code&gt; function to create the list of sentences. We will also need a dictionary to keep the score of each sentence, so that we can later go through it to generate the summary.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;sentences&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sent_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sentenceValue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now it's time to go through every sentence and give it a score depending on the words it has. There are many algorithms to do this - basically, any consistent way to score a sentence by its words will work. I went for a basic algorithm: adding the frequency of every non-stop word in a sentence.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# freqTable.items() yields (word, frequency) pairs&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;freq&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;freqTable&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentenceValue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;sentenceValue&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;freq&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;sentenceValue&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;freq&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note:&lt;/em&gt; Iterating over &lt;code&gt;freqTable.items()&lt;/code&gt; gives us each word together with its number of instances. Iterating over the dictionary directly would only give us the words.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;sentence[:12]&lt;/code&gt; caught your eye, nice catch. This is just a simple way to build a short dictionary key for each sentence. Keep in mind that two sentences starting with the same 12 characters will share one entry.&lt;/p&gt;

&lt;p&gt;Notice that a potential issue with our score algorithm is that long sentences will have an advantage over short sentences. To solve this, divide every sentence score by the number of words in the sentence.&lt;/p&gt;
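
&lt;p&gt;One possible way to apply that normalization is sketched below. It makes a few simplifying assumptions that differ from the code above: sentences are split into words with &lt;code&gt;str.split()&lt;/code&gt; rather than NLTK's tokenizer, only whole words are matched, and scores are keyed by sentence index instead of the first 12 characters. The function name &lt;code&gt;score_sentences&lt;/code&gt; and the sample data are made up for this example.&lt;/p&gt;

```python
def score_sentences(sentences, freq_table):
    """Map each sentence index to its average word frequency."""
    scores = {}
    for index, sentence in enumerate(sentences):
        words = sentence.lower().split()
        if not words:
            continue  # skip empty sentences to avoid dividing by zero
        # Strip common punctuation so "sat." still matches "sat".
        total = sum(freq_table.get(word.strip(".,!?;:"), 0) for word in words)
        # Dividing by the word count removes the advantage of long sentences.
        scores[index] = total / len(words)
    return scores


# Hypothetical frequency table and sentences, for illustration only.
freq_table = {"cat": 2, "mat": 1}
sentences = ["The cat sat.", "The cat sat on the mat today."]
scores = score_sentences(sentences, freq_table)
print(scores)
```

&lt;p&gt;In this sample, the longer sentence has the higher raw total, yet the shorter one ends up with the higher normalized score.&lt;/p&gt;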

&lt;h3&gt;
  
  
  So, what value can we use to compare our scores to?
&lt;/h3&gt;

&lt;p&gt;A simple approach to this question is to find the average score of a sentence. From there, finding a threshold will be easy peasy lemon squeezy.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;sumValues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentenceValue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sumValues&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;sentenceValue&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Average value of a sentence from original text
&lt;/span&gt;&lt;span class="n"&gt;average&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sumValues&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentenceValue&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;So, what's a good threshold? The wrong value could give a summary that is too small/big. &lt;/p&gt;

&lt;p&gt;The average itself can be a good threshold. For my project, I decided to go for a shorter summary, so the threshold I use for it is one-and-a-half times the average. &lt;/p&gt;

&lt;p&gt;Now, let's apply our threshold and store our sentences &lt;em&gt;in order&lt;/em&gt; into our summary.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;''&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentenceValue&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;sentenceValue&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;average&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s"&gt;" "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You made it!! You can now &lt;code&gt;print(summary)&lt;/code&gt; and you'll see how good our summary is. &lt;/p&gt;

&lt;h2&gt;
  
  
  Optional enhancement: Make smarter word frequency tables
&lt;/h2&gt;

&lt;p&gt;Sometimes, we want closely related forms of a word to count toward the same entry, e.g., run, runs, and running. For this, we use a Stemmer - an algorithm that reduces words to their root form. &lt;/p&gt;

&lt;p&gt;To implement a Stemmer, we can use NLTK's &lt;code&gt;nltk.stem&lt;/code&gt; module. You'll notice there are many stemmers; each one uses a different algorithm to find the root word, and one algorithm may be better than another for specific scenarios. &lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;nltk.stem&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PorterStemmer&lt;/span&gt;
&lt;span class="n"&gt;ps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PorterStemmer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Then, pass every word through the stemmer before adding it to our &lt;code&gt;freqTable&lt;/code&gt;. It is equally important to stem every word in each sentence before looking up its score, so that the sentence words match the stemmed keys.&lt;/p&gt;
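
&lt;p&gt;A minimal sketch of a stemmed frequency table might look like the following. The helper name &lt;code&gt;build_stemmed_freq_table&lt;/code&gt;, the sample words, and the one-word stop set are assumptions for illustration; only &lt;code&gt;PorterStemmer&lt;/code&gt; and its &lt;code&gt;stem()&lt;/code&gt; method come from NLTK.&lt;/p&gt;

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()


def build_stemmed_freq_table(words, stop_words):
    """Count words by their stem, skipping stop words."""
    freq_table = {}
    for word in words:
        word = word.lower()
        if word in stop_words:
            continue
        stem = ps.stem(word)  # e.g. "running" and "runs" both become "run"
        freq_table[stem] = freq_table.get(stem, 0) + 1
    return freq_table


# Hypothetical input, for illustration only.
table = build_stemmed_freq_table(["Run", "runs", "running", "the", "bank"], {"the"})
print(table)
```

&lt;p&gt;Here the three forms of "run" are merged into a single entry, which is the effect we want when scoring sentences.&lt;/p&gt;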

&lt;h2&gt;
  
  
  And we're done!
&lt;/h2&gt;

&lt;p&gt;Congratulations! Let me know if you have any other questions or enhancements to this summarizer. &lt;/p&gt;




&lt;p&gt;&lt;em&gt;Thanks for reading my first article! Good vibes&lt;/em&gt;&lt;/p&gt;


</description>
      <category>python</category>
      <category>nlp</category>
      <category>dataanalytics</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
