Securing your website in 4 minutes - What, Why and How of HTTPS

David Israwi — Tue, 21 Aug 2018 02:07:02 +0000

Today I changed my website's protocol from HTTP to HTTPS - it was quick and easy. After finishing, I wasn't sure what I had really accomplished, so I did some research into what it really meant to create a secure connection between you and a website.

Here is a quick summary.

When you submit text to a website (e.g. log-in info, a chat message, a search query), the information is sent to a server, which may send information back to you. This exchange happens over the HyperText Transfer Protocol (HTTP). The problem is that HTTP sends this information as plain text: anyone intercepting the network traffic can read your message, which is not good for your data.

Image: this is a sample packet sent from my computer to my site before changing the protocol, captured using Wireshark.

This vulnerability is the reason why HTTPS (HTTP + Secure) is strongly encouraged.

With HTTPS, the server presents an SSL/TLS certificate that contains its public key. Public keys are used to verify certificate signatures and to set up the keys that encrypt the traffic between your browser and the server (thanks to Vin in the comments for clarification).
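As a rough illustration of the client side, Python's standard ssl module builds a connection context that, by default, refuses to talk to a server unless it presents a trusted certificate matching the requested hostname. This is only a sketch of those default client settings; it is not a description of what a browser or Cloudflare does internally.

```python
import ssl

# Default client-side settings: the server must present a certificate
# that chains to a trusted authority and matches the requested hostname.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)  # is a certificate mandatory?
print(context.check_hostname)                    # is the hostname checked?
```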

What if I don't send/receive sensitive data from my website?

HTTPS has benefits beyond securing the exchange of information:

Keep intruders from identifying your users by analyzing your information exchange.
Reduce the risk of anyone exploiting the resources of your website to their benefit.
As Progressive Web Apps grow in popularity, Service Workers (used for push notifications) require the use of HTTPS.
Other benefits of Service Workers include offline behavior and caching.

Changing your website to use HTTPS

There is a 5-minute video made by httpsiseasy explaining how to do this. Here is their step-by-step tutorial, which I followed using Cloudflare.

Go to Cloudflare
Sign up
Enter your website's domain. Enter, free, continue, enter
The service will give you two DNS nameservers along with instructions to add them to your website.
Hit Crypto on the toolbar, change "Always Use HTTPS" to On

Do this and you're donzo. The change may take anywhere from several minutes to 48 hours, but nothing else is needed from you.

After doing this, I was chatting with my brother (@sammyisra) and told him I had used Cloudflare; he told me he had used Netlify. I'm curious what most people have used, so please leave a comment below sharing what service you used and why.

Thank you!


Build a quick Summarizer with Python and NLTK

David Israwi — Thu, 17 Aug 2017 04:43:08 +0000

If you're interested in Data Analytics, you will find learning about Natural Language Processing very useful. A good project to start learning about NLP is to write a summarizer - an algorithm that shortens a body of text while keeping its original meaning, or at least giving good insight into the original text.

There are many libraries for NLP. For this project, we will be using NLTK - the Natural Language Toolkit.

Let's start by writing down the steps necessary to build our project.

4 steps to build a Summarizer

Remove stop words (defined below) for the analysis
Create frequency table of words - how many times each word appears in the text
Assign score to each sentence depending on the words it contains and the frequency table
Build summary by adding every sentence above a certain score threshold

That's it! And the Python implementation is also short and straightforward.

What are stop words?

A stop word is any word that adds little to the meaning of a sentence. For example, let's say we have the sentence

A group of people run every day from a bank in Alafaya to the nearest Chipotle

By removing the sentence's stop words, we can narrow the number of words and preserve the meaning:

Group of people run every day from bank Alafaya to nearest Chipotle

We usually remove stop words from the analyzed text because knowing their frequency gives no insight into the body of text. In this example, we removed every instance of the words a, in, and the.
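The removal above can be sketched in a few lines of plain Python. The three-word stop list here is a hypothetical stand-in chosen to match the example; NLTK's English list is much longer and would also drop words such as of, from, and to.

```python
# Hypothetical three-word stop list, matching the words removed above.
STOP_WORDS = {"a", "in", "the"}

def remove_stop_words(sentence, stop_words=STOP_WORDS):
    """Drop every word whose lowercase form is in stop_words."""
    kept = [w for w in sentence.split() if w.lower() not in stop_words]
    return " ".join(kept)

sentence = "A group of people run every day from a bank in Alafaya to the nearest Chipotle"
shortened = remove_stop_words(sentence)
print(shortened)
```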

Now, let's start!

There are two NLTK modules that will be necessary for building an efficient summarizer.

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

Note: There are more modules that can make our summarizer better; one example is discussed at the end of this article.

Corpus

Corpus means a collection of text. It could be data sets of poems by a certain poet, bodies of work by a certain author, etc. In this case, we are going to use a data set of pre-determined stop words.

Tokenizers

A tokenizer divides a text into a series of tokens. There are three main tokenizers - word, sentence, and regex tokenizer. For this specific project, we will only use the word and sentence tokenizers.
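To get a feel for what word and sentence tokens look like, here is a rough approximation built on Python's re module. The two helper functions are hypothetical stand-ins, not NLTK's algorithms; word_tokenize and sent_tokenize handle abbreviations, contractions, and other edge cases that these patterns do not.

```python
import re

def rough_word_tokens(text):
    """Split into words and single punctuation marks (approximation)."""
    return re.findall(r"\w+|[^\w\s]", text)

def rough_sentence_tokens(text):
    """Split after '.', '!' or '?' followed by whitespace (approximation)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

sample = "NLTK is handy. It ships several tokenizers!"
word_tokens = rough_word_tokens(sample)
sentence_tokens = rough_sentence_tokens(sample)
print(word_tokens)
print(sentence_tokens)
```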

Removing stop words and making frequency table

First, we create two collections - a set of stop words, and a list of every word in the body of text.

Let's use text as the original body of text.

stopWords = set(stopwords.words("english"))
words = word_tokenize(text)

Second, we create a dictionary for the word frequency table. For this, we should only use the words that are not part of the stopWords set.

freqTable = dict()
for word in words:
    word = word.lower()
    if word in stopWords:
        continue
    if word in freqTable:
        freqTable[word] += 1
    else:
        freqTable[word] = 1
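The same counting can be written more compactly with collections.Counter from the standard library. This is an alternative sketch, not the article's original code; the function name and the sample word list are made up for illustration. Note that punctuation tokens are counted too unless they are filtered out.

```python
from collections import Counter

def build_freq_table(words, stop_words):
    """Count lowercase words, skipping anything in stop_words."""
    return Counter(w.lower() for w in words if w.lower() not in stop_words)

# Hypothetical sample input, shaped like the output of a word tokenizer.
words = ["The", "cat", "saw", "the", "other", "cat", "."]
stop_words = {"the", "other"}
freq_table = build_freq_table(words, stop_words)
print(freq_table)
```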

Now, we can use the freqTable dictionary over every sentence to know which sentences have the most relevant insight to the overall purpose of the text.

Assigning a score to every sentence

We already have a sentence tokenizer, so we just need to run the sent_tokenize() method to create the array of sentences. Secondly, we will need a dictionary to keep the score of each sentence, so that we can later go through the dictionary to generate the summary.

sentences = sent_tokenize(text)
sentenceValue = dict()

Now it's time to go through every sentence and give it a score depending on the words it has. There are many algorithms to do this - basically, any consistent way to score a sentence by its words will work. I went for a basic algorithm: adding the frequency of every non-stop word in a sentence.

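for sentence in sentences:
    # .items() yields (word, count) pairs, so wordValue[0] is the word
    # and wordValue[1] is its frequency
    for wordValue in freqTable.items():
        if wordValue[0] in sentence.lower():
            if sentence[:12] in sentenceValue:
                sentenceValue[sentence[:12]] += wordValue[1]
            else:
                sentenceValue[sentence[:12]] = wordValue[1]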

Note: Index 0 of wordValue will return the word itself. Index 1 the number of instances.

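If sentence[:12] caught your eye, nice catch. This is just a simple way to key each sentence in the dictionary by its first 12 characters. Keep in mind that two sentences starting with the same 12 characters will share one entry.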

Notice that a potential issue with our score algorithm is that long sentences will have an advantage over short sentences. To solve this, divide every sentence score by the number of words in the sentence.
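One possible way to apply this length normalization is sketched below. It makes a few assumptions that differ from the loop above: scores are keyed by sentence index instead of the first 12 characters, words are matched whole rather than as substrings, and a plain split() stands in for a real tokenizer. The function name is hypothetical.

```python
def score_sentences(sentences, freq_table):
    """Return {sentence index: average word frequency per word}."""
    scores = {}
    for index, sentence in enumerate(sentences):
        words = sentence.lower().split()
        if not words:
            continue
        # Strip trailing punctuation so "sat." matches the entry "sat".
        total = sum(freq_table.get(word.strip(".,!?;:"), 0) for word in words)
        scores[index] = total / len(words)  # normalize by sentence length
    return scores

# Hypothetical sample data.
freq_table = {"cat": 2, "sat": 1, "mat": 1}
sentences = ["The cat sat.", "The cat is on the mat and nowhere else today."]
scores = score_sentences(sentences, freq_table)
print(scores)
```

Without the division, both sentences would score 3; with it, the short, dense sentence ranks higher than the long one.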

So, what value can we use to compare our scores to?

A simple approach to this question is to find the average score of a sentence. From there, finding a threshold will be easy peasy lemon squeezy.

sumValues = 0
for sentence in sentenceValue:
    sumValues += sentenceValue[sentence]

# Average value of a sentence from original text
average = int(sumValues / len(sentenceValue))

So, what's a good threshold? The wrong value could give a summary that is too short or too long.

The average itself can be a good threshold. For my project, I decided to go for a shorter summary, so the threshold I use for it is one-and-a-half times the average.

Now, let's apply our threshold and store our sentences in order into our summary.

summary = ''
for sentence in sentences:
    if sentence[:12] in sentenceValue and sentenceValue[sentence[:12]] > (1.5 * average):
        summary += " " + sentence

You made it!! You can now print(summary) and you'll see how good our summary is.
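If you want to experiment with different thresholds, the selection step can be wrapped in a small function. This is a sketch under a few assumptions: the function name and the factor parameter are hypothetical, scores are keyed by sentence index, and the average is kept as a float rather than truncated to an integer.

```python
def build_summary(sentences, scores, factor=1.5):
    """Join, in original order, the sentences scoring above factor * mean."""
    if not scores:
        return ""
    average = sum(scores.values()) / len(scores)
    chosen = [s for i, s in enumerate(sentences)
              if scores.get(i, 0) > factor * average]
    return " ".join(chosen)

# Hypothetical sample data: one sentence clearly outscores the others.
sentences = ["First point.", "Key finding here.", "Minor aside."]
scores = {0: 2.0, 1: 8.0, 2: 2.0}
summary = build_summary(sentences, scores)
print(summary)
```

Lowering factor toward 1.0 keeps more sentences; raising it makes the summary shorter.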

Optional enhancement: Make smarter word frequency tables

Sometimes, we want closely related forms of a word to count toward the same entry, e.g., run, runs, and running. For this, we use a stemmer - an algorithm that reduces words to their root form.

To implement a stemmer, we can use NLTK's stem module. You'll notice there are many stemmers; each one uses a different algorithm to find the root word, and one algorithm may be better than another for specific scenarios.

from nltk.stem import PorterStemmer
ps = PorterStemmer()

Then, pass every word through the stemmer before adding it to our freqTable. It is just as important to stem every word in each sentence before looking up its score, so that the stems match the keys of the table.
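A minimal sketch of a stem-aware frequency table follows. The function takes the stemming function as a parameter, so in the real project you would pass ps.stem from the PorterStemmer created above. The toy_stem helper is a hypothetical stand-in that only strips a trailing "s", used here so the example runs without NLTK; it is not a real stemming algorithm.

```python
def build_stemmed_freq_table(words, stop_words, stem):
    """Count word stems, skipping stop words."""
    table = {}
    for word in words:
        word = word.lower()
        if word in stop_words:
            continue
        root = stem(word)  # e.g. ps.stem(word) with NLTK's PorterStemmer
        table[root] = table.get(root, 0) + 1
    return table

def toy_stem(word):
    """Hypothetical stand-in: strip one trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

table = build_stemmed_freq_table(["Dogs", "dog", "the", "cats"], {"the"}, toy_stem)
print(table)
```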

And we're done!

Congratulations! Let me know if you have any other questions or enhancements to this summarizer.

Thanks for reading my first article! Good vibes

DEV Community: David Israwi