What is Hashing?

#computerscience #security

Hashing is a way to convert readable information into a fixed-length, non-readable value. You can use hashing to protect passwords or any other information you don't want hackers to be able to read. Hashing can also be used to check the integrity of data and/or its sender.

There are many different hashing algorithms, and each one gives a different result for the same input. For example, MD5 returns a 128-bit hash, usually written as a 32 character long hexadecimal string, and SHA3-512 returns a 512-bit hash, written as a 128 character long hexadecimal string. Below is an example of hashing the string "password" and the corresponding result in both of these algorithms.

Input = password
MD5 -> 5f4dcc3b5aa765d61d8327deb882cf99
SHA3-512 -> e9a75486736a550af4fea861e2378305c4a555a05094dee1dca2f68afea49cc3a50e8de6ea131ea521311f4d6fb054a146e8282f8e35ff2e6368c1a62e909716
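As a minimal sketch, the values above can be reproduced with Python's standard `hashlib` module. The variable names are only illustrative, and the input is assumed to be encoded as UTF-8 before hashing.

```python
import hashlib

text = "password"
data = text.encode("utf-8")  # hash functions operate on bytes, not on str

md5_hex = hashlib.md5(data).hexdigest()
sha3_hex = hashlib.sha3_512(data).hexdigest()

print("MD5      ->", md5_hex, f"({len(md5_hex)} hex characters)")
print("SHA3-512 ->", sha3_hex, f"({len(sha3_hex)} hex characters)")
```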

As you can see, the output from these algorithms looks nothing like the input, so it is difficult to work out the password or secret information from it. One thing to keep in mind with hashing is that any slight change to the input will completely change the output. For example, if we change our input from "password" to "Password" by capitalising the first letter, we get a completely unrelated result.

Input = Password
MD5 -> dc647eb65e6711e155375218212b3964
SHA3-512 -> 01bcfd81213def9ea2369b1f8d668bc44f9d66ecc01b2f95d09fc3594c74bf12f0b80cbe5c8abff328b0c68c165b1404c078ff77fb063e43b01b5404259881b4

You can see how these hashes look completely different from our first set of hashes, even though we only changed the capitalisation of one letter within the input string. You'll also notice that each hash is written as a hexadecimal value: it contains only the digits 0 through 9 and the letters a through f. Each hexadecimal character represents 4 bits, so a 32 character MD5 hash is 128 bits long and has 16^32 possible values.
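To put a number on how different the two hashes are, here is a small sketch that counts how many of the 128 bits differ between the MD5 hashes of "password" and "Password". The helper names `md5_bits` and `differing_bits` are made up for this example.

```python
import hashlib

def md5_bits(text: str) -> int:
    """Return the MD5 digest of text as a 128-bit integer."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def differing_bits(a: str, b: str) -> int:
    """Count the bit positions where the two MD5 digests differ."""
    return bin(md5_bits(a) ^ md5_bits(b)).count("1")

changed = differing_bits("password", "Password")
print(f"{changed} of 128 bits differ")
```

For a well-behaved hashing algorithm you would expect roughly half of the bits to flip, even for a one-letter change.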

We can use hashing algorithms on any type of digital content; it doesn't have to be a password or a string. Every piece of digital content can be boiled down to a sequence of bits, zeros and ones, which is why we are able to apply hashing algorithms to any digital content, such as images, binary files, passwords, etc.
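As a rough sketch of hashing arbitrary content, the function below reads any binary stream in chunks and hashes it. The name `hash_stream`, the chunk size, and the fake "image" bytes are assumptions made for the example; a real file opened with `open(path, "rb")` would work the same way.

```python
import hashlib
import io

def hash_stream(stream, algorithm="sha3_512", chunk_size=65536):
    """Hash a binary stream piece by piece so large files never sit fully in memory."""
    hasher = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()

# Stand-in for the bytes of an image or other binary file.
fake_image = bytes(range(256)) * 1000
digest = hash_stream(io.BytesIO(fake_image))
print(digest)
```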

Hashing algorithms are used for passwords because hashing is a one-way process. You cannot work out the input string from the resulting hash. Imagine taking a raw potato and cooking it into a baked potato. You can turn a raw potato into a baked potato, but you cannot turn a baked potato back into a raw potato. The same is true for hashing: you can turn any input string into a hash, but you cannot turn a hash back into its input string. This means that you can take a user's password and store its hashed value in a database. Then, when the user signs in to their account, you can hash the password they enter and check it against the hash value you saved earlier, as the same input string will always return the same hash.
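The store-then-compare flow described above could look something like this. It is only a sketch: the `users` dictionary stands in for a real database, the function names are hypothetical, and the password is hashed without any extra protection, exactly as described at this point in the article.

```python
import hashlib
import hmac

users = {}  # hypothetical in-memory "database": username -> stored hash

def hash_password(password: str) -> str:
    return hashlib.sha3_512(password.encode("utf-8")).hexdigest()

def register(username: str, password: str) -> None:
    # Only the hash is stored, never the password itself.
    users[username] = hash_password(password)

def login(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    # compare_digest compares in constant time to avoid leaking timing information.
    return hmac.compare_digest(stored, hash_password(password))

register("alice", "password")
print(login("alice", "password"))  # True
print(login("alice", "Password"))  # False
```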

Hashing algorithms can also be used to check the integrity of a message or the sender. When you send a message, you could also send a hashed value of the message. When the receiver gets the message, they could hash the message and check it against the hash you sent; if they match, they know the message has not been tampered with or changed. Imagine you need to send a document concerning a certain employee. An interceptor of the message could change the employee's name in the document. However, if you also sent the hashed version of the document, then the integrity of the document can be checked on the receiving side, ensuring that the document has not been changed in transit. This only works if the interceptor cannot also replace the hash, so the hash needs to reach the receiver in a way the interceptor cannot alter.
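Here is a minimal sketch of that check. The document contents and the `fingerprint` helper are invented for the example, and it assumes the hash itself reaches the receiver unmodified.

```python
import hashlib

def fingerprint(document: bytes) -> str:
    return hashlib.sha3_512(document).hexdigest()

# Sender side: hash the document and send both.
original = b"Employee: Alice Smith. Status: promoted."
sent_hash = fingerprint(original)

# An interceptor changes the employee's name in transit.
tampered = b"Employee: Bob Jones. Status: promoted."

# Receiver side: re-hash whatever arrived and compare with the hash that was sent.
print(fingerprint(original) == sent_hash)  # True
print(fingerprint(tampered) == sent_hash)  # False
```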

Simply hashing a password or a document can still be insecure, though. Because hashing algorithms always return the same output for the same input, a simple hashing algorithm can be susceptible to a rainbow table attack. Imagine you create a list of passwords that are commonly used, such as "password", "password123", "Password1!", etc., and work out all the MD5 hashes for those passwords. If you then gained access to a database of users' MD5 hashed passwords, and a user had chosen a password from your list, you could work out what their password was before it was hashed. This is the essence of a rainbow table attack: the attacker has a list of passwords and their hashed values, which allows them to work out users' passwords from their hashes if they use a password from the attacker's list. (Strictly speaking, a rainbow table is a space-saving form of this kind of precomputed list, but the idea is the same.) Lists of hashed values can be worked out by anyone, and the internet contains lists of already worked out password hashes for different algorithms.
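The precomputed list described above can be sketched as a plain dictionary from hash to password. The usernames and the "leaked" database are made up for the example, and this is the simple lookup form of the attack rather than a full rainbow table.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Attacker's precomputed list: hash -> password.
common_passwords = ["password", "password123", "Password1!"]
lookup = {md5_hex(p): p for p in common_passwords}

# Hypothetical leaked database of unsalted MD5 hashes.
leaked = {
    "alice": md5_hex("password123"),
    "bob": md5_hex("xK9#vQ2m"),
}

# Any hash that appears in the precomputed list is cracked instantly.
recovered = {user: lookup.get(h) for user, h in leaked.items()}
print(recovered)  # {'alice': 'password123', 'bob': None}
```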

You could also take every possible value for a password and work out its corresponding hashed value. For example, working out the hash for 'a', then 'b', and so on, then 'aa', 'ab', etc. This is called brute-forcing, as you are trying every single combination until you reach the correct result. Imagine trying to figure out someone's phone PIN: instead of trying commonly used values such as their birthday, you start from '0000', then try '0001', and keep trying every possible combination until you get the correct PIN. Brute force attacks can be improved by first trying permutations of information about the target. For example, if the target's dog's name is 'Fred', maybe we try 'fred', 'Fred', 'Fr3d', etc. before starting a brute force attack.
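A brute force attack on a hash can be sketched in a few lines. The function name, the lowercase-only alphabet, and the length limit are assumptions chosen to keep the example fast; a real attack would use a much larger search space.

```python
import hashlib
import itertools
import string

def brute_force_md5(target_hex, alphabet=string.ascii_lowercase, max_length=4):
    """Try every combination of the alphabet, shortest first, until one hashes to target_hex."""
    for length in range(1, max_length + 1):
        for candidate in itertools.product(alphabet, repeat=length):
            guess = "".join(candidate)
            if hashlib.md5(guess.encode("utf-8")).hexdigest() == target_hex:
                return guess
    return None  # not found within the search space

target = hashlib.md5(b"dog").hexdigest()
print(brute_force_md5(target))  # dog
```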

Adding salt to a hash can help improve its security and reduce the ability of a hacker to work out its real value. You salt a hash by adding a randomly generated string to the start or the end of the input value. For example, you take the password "password" and add "45Ght" to the start of the string to make it "45Ghtpassword", then you hash this salted value. You could either use the same salt for every password in your database or generate a new salt for each user. If you use the same salt for every password, then as soon as the hacker works out the salt and whether you put it at the start or the end of the password, they can still work out the input passwords, although they have to build a new table that includes your salt rather than reuse an existing one. If you generate a new salt for each user, you have to store the salt in your database so that you can verify the password when the user logs in. This makes it harder for the hacker, as they would have to create a separate rainbow table for each password, because each salt is different.
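A per-user salt might be sketched like this. The record layout, the function names, and the 16-byte salt length are assumptions for the example; the salt is placed at the start of the password, as in the "45Ghtpassword" example.

```python
import hashlib
import hmac
import secrets

def hash_with_salt(password: str, salt: str) -> str:
    return hashlib.sha3_512((salt + password).encode("utf-8")).hexdigest()

def create_record(password: str) -> dict:
    salt = secrets.token_hex(16)  # new random salt per user, stored next to the hash
    return {"salt": salt, "hash": hash_with_salt(password, salt)}

def verify(password: str, record: dict) -> bool:
    expected = hash_with_salt(password, record["salt"])
    return hmac.compare_digest(record["hash"], expected)

alice = create_record("password")
bob = create_record("password")

print(alice["hash"] == bob["hash"])  # False: same password, different salts
print(verify("password", alice))     # True
```

Because the two users end up with different hashes for the same password, one precomputed list no longer covers both of them.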

Pepper is another thing you can add to a hash to secure it. Unlike salt, you do not have to store pepper. Imagine adding a random number between 0 and 300 to the start of each user's password before you hash it; you would end up with a different result than if you just hashed the password. Because you do not store the pepper, you have to check each possible value when verifying a login. For example, when a user attempts to log in, you have to run through the hashes for every number from 0 through 300, as you didn't store the number you used in the beginning. This won't take a computer very long to do, but it increases the amount of effort the hacker has to go through to figure out a user's password, as they also have to go through the hashes for all numbers 0 through 300.
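The pepper scheme described above could be sketched as follows. The function names are hypothetical, and zero-padding the number to three digits is an assumption added here so that the boundary between pepper and password is unambiguous.

```python
import hashlib
import hmac
import secrets

PEPPER_VALUES = 301  # the numbers 0 through 300

def hash_with_pepper(password: str, pepper: int) -> str:
    # Zero-padded to a fixed width (an assumption of this sketch).
    return hashlib.sha3_512(f"{pepper:03d}{password}".encode("utf-8")).hexdigest()

def create_hash(password: str) -> str:
    pepper = secrets.randbelow(PEPPER_VALUES)  # chosen at random and then thrown away
    return hash_with_pepper(password, pepper)

def verify(password: str, stored_hash: str) -> bool:
    # The pepper was never stored, so every possible value has to be tried.
    return any(
        hmac.compare_digest(stored_hash, hash_with_pepper(password, pepper))
        for pepper in range(PEPPER_VALUES)
    )

stored = create_hash("password")
print(verify("password", stored))  # True
print(verify("Password", stored))  # False
```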

There is another problem with hashing passwords: collisions. A collision occurs when two different input values produce the same hash value. If a hacker is able to find a value that collides with a user's password, they would be able to log in as that user by using the colliding value rather than the user's actual password. Because a hash condenses any input into a fixed length, such as a 32 or 128 character long string, there is only a finite number of possible outputs, so there are bound to be collisions. Even though accidental collisions are highly improbable, you should not discount them as a potential weakness. To reduce the probability of a collision, you can use a hashing algorithm that creates a longer hash, such as SHA3-512 rather than MD5, as it has more possible outputs.
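Collisions in a full-length hash are far too rare to stumble on in a demo, so this sketch deliberately shortens MD5 to its first 3 hexadecimal characters (only 4,096 possible outputs) to make the effect visible. The `tiny_hash` and `find_collision` helpers and the "input-N" strings are invented for the example.

```python
import hashlib

def tiny_hash(text: str, hex_chars: int = 3) -> str:
    """A deliberately weak hash: only the first few hex characters of MD5."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:hex_chars]

def find_collision(hex_chars: int = 3):
    """Hash 'input-0', 'input-1', ... until two different inputs share a tiny hash."""
    seen = {}
    n = 0
    while True:
        candidate = f"input-{n}"
        h = tiny_hash(candidate, hex_chars)
        if h in seen:
            return seen[h], candidate, h
        seen[h] = candidate
        n += 1

first, second, shared = find_collision()
print(f"{first!r} and {second!r} both hash to {shared!r}")
```

With only 4,096 possible outputs, a collision is guaranteed within 4,097 inputs. Every extra character in the output makes the search dramatically longer, which is why longer hashes reduce the risk.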

Hashing is not completely unhackable, regardless of what you do to it. However, you can add salt, pepper, or your own mechanism to make it harder for a hacker to work out the original value. The job of hashing isn't to make it impossible for a hacker to work out a user's password; its job is to make it so difficult that they either don't bother or don't have enough time to work it out. For example, a password that is 8 characters long and contains upper case, lower case, and numbers has 62^8 possible values, so it would take about 218 trillion (218 followed by 12 zeros) attempts at most to brute force. By comparison, guessing an MD5 hash value directly, without knowing anything about the password behind it, would take 16^32 attempts at most, or about 340 undecillion (340 followed by 36 zeros). Hashing does not increase the number of passwords a hacker has to try, but every guess has to be hashed before it can be compared.
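The two figures in the paragraph above can be checked with a couple of lines of arithmetic. The variable names are just for illustration.

```python
# 26 lower case + 26 upper case + 10 digits = 62 possible characters per position.
password_space = 62 ** 8

# 32 hexadecimal characters, 16 possible values each (the same as 2^128).
md5_space = 16 ** 32

print(f"{password_space:,}")  # 218,340,105,584,896
print(md5_space == 2 ** 128)  # True
print(f"{md5_space:.3e}")     # 3.403e+38
```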

It's important to remember that websites should have a failed login limit, so if you get your password wrong 3 times it locks you out. This would prevent a hacker from running a brute force on the front-end of your website. If you store passwords as plain text, not hashed, in your database then all the hacker has to do is gain access to your database. If you store hashed values of passwords then the hacker not only has to gain access to your database but they also have to run a brute force attack on those hashes.
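A failed login limit could be sketched like this. The counter dictionary, the function name, and the `password_ok` flag (standing in for a real hash comparison) are all assumptions; a real site would also need a way to unlock accounts again.

```python
MAX_FAILED_ATTEMPTS = 3
failed_attempts = {}  # hypothetical store: username -> consecutive failed logins

def attempt_login(username: str, password_ok: bool) -> str:
    """password_ok stands in for the result of checking the password hash."""
    if failed_attempts.get(username, 0) >= MAX_FAILED_ATTEMPTS:
        return "locked"  # refuse even a correct password once the limit is hit
    if password_ok:
        failed_attempts[username] = 0
        return "ok"
    failed_attempts[username] = failed_attempts.get(username, 0) + 1
    return "failed"

results = [attempt_login("alice", False) for _ in range(3)]
results.append(attempt_login("alice", True))
print(results)  # ['failed', 'failed', 'failed', 'locked']
```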

This article was originally published on https://acroynon.com

Top comments (4)

hedy • Jan 12 '20

I've seen websites where a hashed output can be reversed. Wonder how that works...

Adam Roynon • Jan 12 '20

The website you linked to uses the MD5 hashing algorithm, which is no longer considered secure, as it can be reversed through rainbow tables or just by brute forcing, since there is always a finite number of possibilities (an MD5 hash is always 128 bits long).

hedy • Jan 12 '20

Got it, thank you.

Adam Roynon • Jan 12 '20

Happy to help,
Thank you for the comment as it may help others with similar questions