## DEV Community

Pedro Matias de Araujo

Posted on • Originally published at Medium on

# An overview about hash functions: Theory and Security

A more common definition for hash functions would be “a function that maps a string of bits of arbitrary length to a fixed-length bit string”.

Hash functions are unidirectional, a surjective function that does not allow reverse, so it is easy to calculate the hash value from any input of arbitrary length.

In contrast, it is harder to return to the previous value of the message from a given hash value.

### Hash Properties

For a cryptographic application, there are three desirable properties:

#### Collision Resistance

It is computationally impracticable to find any pair (x, y) such that H(x)=H(y)

#### Pre-image resistance

For any value h given, it is computationally impracticable to find x such that H(x) = h.

#### Second pre-image resistance

For any block of data x, it is computationally impracticable to find y different from x, such that H(y) = H(x).

The ideal situation is that every hashed value generated is really unique for each input value, but this is not possible. In the hash functions the possible hash values to be generated are finite, since they have a fixed output size and because the input size is theoretically infinite, by the Pigeonhole principle, there will be collisions with two distinct entries, resulting in the same hash .

However, these functions have been constructed to minimize valid collisions, ie, it is very unlikely that two messages that have meaning in context will result in the same hash.

### Hash usage examples

Hash functions have a wide variety of security uses, such as:

#### Message Authentication

Mechanism or service used to verify the integrity of a message; ensures that the information received is the same as the message sent (without modification, insertion, deletion or replay);

#### Digital signatures

The summary value of a message is encrypted with the sender’s private key, any user who has the public key can verify the integrity of the message that is associated with the digital signature;

A summary of a password is stored in a file of the operating system instead of storing the password, in case the password file is violated, the attacker will only be able to obtain the hash of the password;

#### Detection of intruders and viruses

For each F file of the system, the hash H(F) is also stored, if there is any change in F, it will be perceived;

#### Pseudorandom function family (PRF) or Pseudorandom number generator (PRNG)

For symmetric keys generation.

#### Blockchain

The data in the blockchain are “hashes” in each block. If the block changes, ie someone tried to change how many bitcoins, for example, they had or how much they should send, the hash value would be different and everyone could detect that something has changed.

The hash value of the previous block is used to calculate the hash value of the current block, creating a link between the blocks.

Examples of algorithms with a link to their hash implementations are:

Keeping your passwords in the database using hash is a good practice (at the moment we can say that it is mandatory) in case when your database is exposed by a cracker, he can not have direct access to the password of yours customers.

So just create a hash for my password and problem solved? Not totally.

### How hashes are broken

By the properties of hash, we have seen that given a hash H(x) should not be computationally feasible to discover the input x, so it is not possible to “decrypt” the hash, after all it is an operation that does not allow return.

But we know that given an input always the same hash is generated, and the crackers know that too. Then there are some known hash attacks for passwords:

#### Dictionary attacks and brute force

The simplest way to try to find a hash.

A dictionary attack uses a file containing words, phrases, common passwords, and so on to calculate the hash of each and check to see if it hits any of the list or database.

The brute-force attack tests all word possibilities with a defined number of characters and verifies that the hash calculation is equal to some of the list.

#### Lookup Tables

Lookup table is an extremely effective method to break multiple hashes quickly. The general idea is to pre-compute the password hashes of a dictionary and save this value with its corresponding password. A good implementation of a lookup table can process hundreds of hashes per second, and contain billions of hashes.

#### Reverse Lookup Tables

This attack allows an attacker to apply a dictionary or brute-force attack to multiple hashes at the same time.

First, the attacker creates a lookup table that maps each password hash of the database contracts to a list of users who have that hash.

The attacker then makes a dictionary attack or brute force, and upon discovering a password he already has a list of users who have it. This attack is very effective because it is common for many users to have the same password.

#### Rainbow tables

One rainbow table is a pre-calculated hash lookup table. It is a practical example of the Time/memory/data tradeoff attack.

It looks like the lookup table, except that it sacrifices the speed of breaking the hash to make the tables smaller. Because they are smaller, solutions for more hashes can be stored with the same space, being more effective in terms of storage.

Then, exemplifying

We have a client that has the password 123456 and that is stored using the SHA-1 algorithm, so in the bank will have the hash 7c4a8d09ca3762af61e59520943dc26494f8941b

When the cracker is in possession of this hash, he can just use a website or some specific program, which will test if it has this hash in the database and returns the value that generated it. As for example the http://md5decrypt.net/en/Sha1/#answer

The list of 100 most used passwords is probably already in all possible hash banks, so there are campaigns to use increasingly difficult passwords to make these hashed banks difficult. But even with these issues, users do not want to have the problem of using more complex passwords, so as a developer, knowing these attacks, what is the best solution to this problem? Put more “salt” in your hash.

### Salt and Security Deployments

The concept of salt, is to put some additional information in the password to add complexity when calculating the hash, for example:

Placing additional information in the password makes it more difficult to create a lookup table.

### What to avoid when using salt

#### Reuse the salt

Even being a hardcoded salt in the program, or once generated randomly. It is ineffective because if two users have the same password, they will have the same hash. With reverse lookup table attack, a brute force attack could be applied by simply testing the possibilities of salt and once found the salt, all your passwords are vulnerable.

#### Using short salt

If your salt is too short, an attacker can build a lookup table for all possible salts. For example, if you use three ASCII characters, then you have only 857,375 possible salts. And that for computational terms is not much, if each lookup table contains only 1MB of the most common passwords, generating them would be an 837GB, and already have 1TB HD costing 50 dollars today, and it is a very cheap investment compared to the damage in the system.

#### Use username as salt

Although they should be unique in the system, they are usually used by other accounts in other services. An attacker could make the user-password relationship and create lookup tables and break hash that has username as a salt.

To be computationally impossible to create a lookup table for each possible salt, the salt should be long. A good rule is to use a salt that has the same size of the the hash function output, the SHA-256 output is 256 bits (32 bytes), so the salt should have at least 32 random numbers.

Realize that the salt does not have to be a secret, it should only help avoid the attacks that I listed. I particularly like to use the timestamp hash of the time that the password has been changed, so I leave all the properties satisfied and I use the date in the format that I defined, making the task of generating a lookup table even more difficult.