Gerald Nash ⚡️

A Quick Introduction: Hashing

What is it?

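Hashing is a way to check whether two chunks of data are the same. A cryptographic hash function turns any input into a fixed-length string (the hash, or digest). Identical inputs always produce the same hash, and it is practically infeasible, though not strictly impossible, to find two different inputs that share one. The input can be a file, a string, a stream, or anything else that can be represented in binary format.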

You've probably seen a hash string on the downloads page of some of your favorite tools, packages, or libraries. For example, Kali Linux has one for each of its releases. But why is that?

This is to ensure that the original file on their server is the same as the one that you've downloaded. For example, the SHA-256 hash of the Kali ISO is below.

Kali ISO Hash

If you download the file, you should hash your local copy. If the resulting hash matches the one found on their website, you can rest assured that the file was not corrupted or tampered with during the download and that you have the same, correct file.

Wait...but how do you hash stuff?

Excellent question. Let's get technical! I'm assuming you have Python 2 installed, by the way.

1- Let's import the library we need.

import hashlib as hash

2- Now let's choose our hashing algorithm. For more information on their differences, check this out.

sha = hash.sha256()
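To get a rough feel for how the algorithms differ, here's a small sketch that hashes the same bytes with a few of them and reports how long each digest is. It's written for Python 3 and uses only the standard hashlib module. The helper name digest_bits and the sample message are made up for illustration.

```python
import hashlib

def digest_bits(name, data):
    """Return the digest size, in bits, of the named hashlib algorithm."""
    # hashlib.new() looks an algorithm up by name; digest_size is in bytes
    return hashlib.new(name, data).digest_size * 8

# Sample input; any bytes would do
message = b'Hello World!'

for name in ('sha1', 'sha256', 'sha512'):
    print(name, digest_bits(name, message), 'bits')
```

The digest length depends only on the algorithm, never on how much data you feed in.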

3- We're basically set up, so now we'll go ahead and test the function on a string.

# Insert the bytes we want to hash (the b prefix keeps this working on Python 3 too)
sha.update(b'Hello World!')
# Print the hexadecimal format of the binary hash we just created
print(sha.hexdigest())
""" 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069 """

Awesome, there's a SHA-256 hash of the string "Hello World!". Now we'll prove that the hash is different for similar data.

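# Start from a fresh hash object; update() appends to whatever was fed in before
sha = hash.sha256()
# Note the missing '!'
sha.update(b'Hello World')
print(sha.hexdigest())
""" a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e """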

It's totally different.
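If you want to put a number on "totally different", here's a quick sketch (Python 3, standard library only) that counts how many of the 256 output bits change between the two hashes. The function name differing_bits is just something I made up for this example. With a good hash function you'd expect roughly half of the bits to flip, even for a one-character change.

```python
import hashlib

def differing_bits(a, b):
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = int(hashlib.sha256(a).hexdigest(), 16)
    db = int(hashlib.sha256(b).hexdigest(), 16)
    # XOR leaves a 1 wherever the two digests disagree
    return bin(da ^ db).count('1')

print(differing_bits(b'Hello World!', b'Hello World'), 'of 256 bits differ')
```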

4- Now that we know that our function works, let's try it on a file.

# WARNING: Do NOT do this with large files; read() loads the whole file into memory.
# For large files, see the snippet here -> https://gist.github.com/aunyks/042c2798383f016939c40aa1be4f4aaf
# Start with a fresh hash object so the earlier strings aren't included
sha = hash.sha256()
with open('kali.iso', 'rb') as kali_file:
  file_buffer = kali_file.read()
  sha.update(file_buffer)
  print(sha.hexdigest())
""" 1d90432e6d5c6f40dfe9589d9d0450a53b0add9a55f71371d601a5d454fa0431 """
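For bigger files, one common approach is to feed the hash object a chunk at a time instead of reading everything at once. Here's a minimal sketch of that idea for Python 3. It isn't necessarily what the linked gist does, and the function names, the 64 KiB chunk size, and the comparison helper are all my own assumptions.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    sha = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            sha.update(chunk)
    return sha.hexdigest()

def matches_published_hash(path, published_hex):
    """Compare a local file's hash with one copied from a download page."""
    # Normalize whitespace and case, since hex digests are case-insensitive
    return sha256_of_file(path) == published_hex.strip().lower()
```

You'd call it like matches_published_hash('kali.iso', '<hash from the website>') and get True only if your local copy hashes to the same value.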

There we go. You've got some pretty good knowledge of hashing now. So, go. Go on! Secure the integrity of your data and hash all the things!

Also, follow me on Twitter and Github, please.

Latest comments (8)

bruno jennrich

What about salting?

Tom • Edited
Hashing is a method of determining the equivalence of two chunks of data. 
A cryptographic hash function generates a unique string for any set of data. 

The first two lines mislead more than they explain and, as Ashley Sheridan pointed out, they are not completely correct.

Aswath KNM

Thought it was an article about Data Structures. Nice one. But try to write something more.

Nageswara Rao Teppala

Sweet and Simple.

Ashley Sheridan

Just a quick point, but it's important. As the hash can't be 100% guaranteed to be unique (it's just highly likely to be unique), it can only be used to determine if something is different, not to see if two things are the same (although the typical misuse is to compare for similarities). Given the hash space of SHA-1, it's fairly unlikely there will be a hash collision, but not impossible (just look at the recent issues in WebKit's SVN repository caused by hash collisions). Like I said, it's a small point, but an important one nonetheless.

dean

Hashes are also used for passwords, which are the epitome of "hey, make sure that the hash is unique to only one password!"

It's important to know that there are special hash algorithms for passwords that are specifically made for shorter strings (rather than files), and take a relatively long time to compute (so that it's harder to brute-force them).

Wesley Ameling

I would not recommend shadowing the built-in hash function, as it may cause problems. I'd name the import crypt_hash, just to distinguish between the two. Other than that, this is a good article! Simple and to the point, as I like it :)

Taron Foxworth

This is a great, simple article!