DEV Community

Elias Baez


Creating a simple Python3 script to hash a file.

First, create a Python file and open it in your editor (here, VS Code):

touch sock.py && code sock.py

Next, let’s create a sample text file to hash. Use nano, a terminal text editor, to create the file. Type whatever content you’d like, then save with Ctrl+O and exit with Ctrl+X.

nano sample.txt
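If you’d rather skip the editor, a short Python sketch can write the sample file instead. The text it writes is an arbitrary placeholder, not anything from the original post. Any content works, but your hashes will differ from the ones shown later if your file’s bytes differ.

```python
from pathlib import Path

# Placeholder content (an assumption); any text or binary data can be hashed.
Path("sample.txt").write_text("Hello, hashing!\n", encoding="utf-8")

# Confirm the file exists and report how many bytes were written to disk.
print(Path("sample.txt").stat().st_size, "bytes written")
```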

We’ll start in the sock.py file. The code below is largely adapted from this Stack Overflow entry, with my own explanatory comments added: https://stackoverflow.com/questions/22058048/hashing-a-file-in-python.

import sys
import hashlib  # Standard-library module that provides well-tested hash functions.

buffer_size = 65536  # Read the file in 64 KiB chunks so large files never need to fit in memory.

md5 = hashlib.md5()  # 128-bit digest; collisions are easy to produce, so not recommended for security.
sha1 = hashlib.sha1()  # 160-bit digest designed by the NSA; also deprecated after a practical collision in 2017.

with open(sys.argv[1], 'rb') as f:  # Open the file named on the command line; 'rb' = read in binary mode.
    while True:
        data = f.read(buffer_size)  # Read up to 64 KiB into 'data'.
        if not data:  # An empty bytes object means we reached the end of the file.
            break
        md5.update(data)  # Feed this chunk into the MD5 hash object.
        sha1.update(data)  # Feed the same chunk into the SHA-1 hash object.

print("MD5: {0}".format(md5.hexdigest()))  # hexdigest() returns the digest as a hexadecimal string.
print("SHA1: {0}".format(sha1.hexdigest()))
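If you want to hash several files or reuse this logic elsewhere, you can wrap the same loop in a function. This is a sketch, not the author’s code. The function name `hash_file` is hypothetical. It uses `hashlib.new()` so the list of algorithms is configurable, and it adds SHA-256, which the original script does not compute. The `iter()` call with a `b""` sentinel replaces the `while True` loop but reads the file the same way, one 64 KiB chunk at a time.

```python
import hashlib
import sys

BUFFER_SIZE = 65536  # 64 KiB per read, same as the script above


def hash_file(path, algorithms=("md5", "sha1", "sha256")):
    """Return {algorithm name: hex digest} for the file at `path` (hypothetical helper).

    The file is read once; every chunk is fed to every hash object.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # iter() keeps calling f.read() until it returns the sentinel b"" (end of file).
        for chunk in iter(lambda: f.read(BUFFER_SIZE), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for name, digest in hash_file(sys.argv[1]).items():
            print(f"{name.upper()}: {digest}")
    else:
        print("usage: python sock.py FILE", file=sys.stderr)
```

Run it as before with `python sock.py sample.txt`. On Python 3.11 and later, `hashlib.file_digest(f, "sha256")` performs the chunked read for a single algorithm, given a file opened in binary mode.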

If you run sock.py with sample.txt as its argument, you should see output in your terminal like the following (the exact hashes depend on what you typed into the file):

input:

python sock.py sample.txt

output:

MD5: 6c5dec6d2deb0f0c1c5fe7d58fdf02c8
SHA1: aedc80ce7e42b12a0ffe0d363043ed22f143c74f
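Your hashes will match the ones above only if sample.txt contains exactly the same bytes. To sanity-check the hashing itself, you can hash a short input whose digests are published test vectors. For the three bytes `abc`, MD5 is 900150983cd24fb0d6963f7d28e17f72 (RFC 1321) and SHA-1 is a9993e364706816aba3e25717850c26c9cd0d89d (FIPS 180). The sketch below also shows that adding a single byte, here a trailing newline, produces a completely different digest.

```python
import hashlib

original = b"abc"    # published test-vector input
edited = b"abc\n"    # the same text plus one trailing newline byte

for label, data in (("abc", original), ("abc + newline", edited)):
    print(label)
    print("  MD5: ", hashlib.md5(data).hexdigest())
    print("  SHA1:", hashlib.sha1(data).hexdigest())
```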

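Hashing is useful because it maps inputs of any size to a fixed-size digest that is easy to store, compare, and exchange. Part of why SHA-1 is considered stronger than MD5 is that its digest is longer: 160 bits versus 128. For an ideal hash, each added bit doubles the number of guesses a brute-force search needs, so the work grows exponentially with digest length.

Length is not the whole story, though, because both algorithms have structural weaknesses. MD5 is badly broken. Attackers can quickly generate ‘hash collisions’: two different files with the same MD5 hash. This means a file carrying potentially hazardous content can pass a check that only compares hashes. SHA-1 is much harder to attack, but it is no longer collision-resistant either. A practical SHA-1 collision was published in 2017, so SHA-256 is the usual replacement for security-sensitive checks.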
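To put numbers on the length argument, the sketch below asks hashlib for each algorithm’s digest size and prints the idealized brute-force costs. Matching a given n-bit digest takes about 2^n guesses, and finding any collision takes about 2^(n/2) (the birthday bound). SHA-256 is included only for comparison, since the script above does not use it. These are costs for an ideal hash. The known attacks on MD5 and SHA-1 are far cheaper than 2^64 and 2^80, which is why digest length alone does not make a hash secure.

```python
import hashlib

# Digest length in bits for each algorithm, taken from hashlib itself.
sizes = {name: hashlib.new(name).digest_size * 8 for name in ("md5", "sha1", "sha256")}

for name, bits in sizes.items():
    # Ideal n-bit hash: ~2**n guesses to hit a chosen digest (preimage),
    # ~2**(n/2) guesses to find any two inputs that collide (birthday bound).
    print(f"{name:>6}: {bits} bits, preimage ~2^{bits}, collision ~2^{bits // 2}")
```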
