Explain Hashing + salting Like I'm Five

Why are passwords saved this way, and how is it more reliable than simple encryption?

You can easily guess that I have used them while making backend projects but I still don't understand them.

DISCUSSION (20)

Multiple values can lead to one hash, so if you hash a password, no one could get the password back from that hash. It prevents you as the owner of a system from knowing which password a user has chosen.

Adding some string to the password, called a salt, before hashing it, makes the hash different again, but this time different from all the other sites where the user used the same password.

So if some malicious person acquired a bunch of password hashes from you, they couldn't use them on other sites.

If you use a different salt for every password, you would make the hashes different for every user that used the same password, even on your own site.

A basic example: someone signs up to your service with a password, then somehow steals your hashes. They look up the hash for their own account and check whether anyone else has the same hash. If so, they know that person uses the same password, and they can log in to that other person's account.

If all passwords were salted with their own salt, even the same passwords would have a different hash.
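A minimal sketch of that last point in Python. SHA-256, the 16-byte salt, and the helper name `salted_hash` are assumptions picked for illustration; the comment above doesn't name a specific function.

```python
import hashlib
import os

def salted_hash(password: str, salt: bytes) -> str:
    """Hash the salt followed by the password (illustrative only)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

# Two users pick the same password but each gets their own random salt.
salt_alice = os.urandom(16)
salt_bob = os.urandom(16)

hash_alice = salted_hash("hunter2", salt_alice)
hash_bob = salted_hash("hunter2", salt_bob)

print(hash_alice == hash_bob)  # False: same password, different stored hashes

# The same salt and password always reproduce the same hash.
print(salted_hash("hunter2", salt_alice) == hash_alice)  # True
```

Someone comparing the two stored hashes learns nothing about whether the passwords match.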

Got a few things to note here:

Multiple values can lead to one hash

This shouldn't happen in a secure hashing algo

so if you hash a password, no one could get the password back from that hash

The reasoning for this actually has to do with the modular arithmetic used in the hashing process. In short, the hashed version of the password has less "data" inherently in the number, so you have to spend much more time essentially performing guesswork on what that lost data was if you want to get the original value from the hash.

Doesn't your second point negate the first?

Strictly in theory, yes, but with the scope of the numbers used, it should be computationally infeasible to generate a collision.

I got a followup question to that if that's okay:

If passwords are always hashed (and salted differently each time), how are you able to produce the same hash to authenticate a user when they're logging into the system?

Is there something that keeps a track of these things?

Normally you store the hash and salt together in your db.

HashFunc(string pass):
    salt = randomGenerator()
    hashed = hash.SHA256(pass+salt)
    storeInDB(hashed+":"+salt)

This way each password has a different salt, and both are stored in your db.
For checking,

checkPass(string id, string pass):
    stored = retrieveFromDB(id).split(":")
    hashed = stored[0]
    salt = stored[1]
    isCorrectPass = hash.SHA256(pass+salt) == hashed
    return isCorrectPass
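For anyone who wants to run it, here is a rough Python version of the pseudocode above. The in-memory `fake_db` dict and the function names are hypothetical stand-ins for a real database layer, and SHA-256 is kept only to mirror the pseudocode; other comments in this thread argue for a deliberately slow function instead.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-in for the database: user id -> "hash:salt"
fake_db = {}

def store_password(user_id: str, password: str) -> None:
    salt = secrets.token_hex(16)  # fresh random salt for every password
    hashed = hashlib.sha256((password + salt).encode("utf-8")).hexdigest()
    fake_db[user_id] = hashed + ":" + salt

def check_password(user_id: str, password: str) -> bool:
    stored_hash, salt = fake_db[user_id].split(":")
    candidate = hashlib.sha256((password + salt).encode("utf-8")).hexdigest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(candidate, stored_hash)

store_password("alice", "correct horse")
print(check_password("alice", "correct horse"))  # True
print(check_password("alice", "wrong guess"))    # False
```

The salt is read back from the stored record, which is how the same hash can be reproduced at login even though every user has a different salt.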

Remember, the main reason for having a salt is to prevent the hacker from using rainbow tables. So it's totally fine to store salts as plain text in the db, because the hacker would have to generate a whole new rainbow table for each password to be able to check against them, which is near impossible with current CPU capabilities.

Thank you for the detailed response, I think I understand the gist of it a lot better now :)

As far as I know you can simply store the hash and salt together in the account record in the DB.

Here's an example: you're the owner of the data and I'm the nasty hacker.

Someone signs up for your site and uses the password 'password'.
When you save it, you MD5 hash it and get '5f4dcc3b5aa765d61d8327deb882cf99'.
You now have no idea what the password is, because a hash can't be reversed.

Let's say I get a dump of all your users and their hashed passwords.
I make a script to test every common password, and this includes 'password'.
Anyone can do a straight conversion to an MD5 hash, so I have '5f4dcc3b5aa765d61d8327deb882cf99' and can spot every account that uses it.

But you're smarter than I gave you credit for.
When the user created their account you took:

  • the date
  • their forename
  • your website name

and appended them to the password before you hashed it.

The hash has now been salted.
'password_20180816_andrew_dev.to' is the string that now gets hashed.
'9db61ea3e3b86adb63b507cb2a1b2951' is the output.

As I'm scanning through your files looking for '5f4dcc3b5aa765d61d8327deb882cf99' I go right past '9db61ea3e3b86adb63b507cb2a1b2951' and have no idea that's what I was looking for.

Of course, you need to remember the salt in order to convert their password into the hash for checking later.
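The whole story above fits in a few lines of Python. The list of common passwords and the helper `md5_hex` are made up for this sketch; MD5 is used only because the example above uses it.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# The attacker's precomputed table: hash -> common password.
common_passwords = ["123456", "qwerty", "password", "letmein"]
lookup = {md5_hex(p): p for p in common_passwords}

unsalted = md5_hex("password")
salted = md5_hex("password_20180816_andrew_dev.to")

print(unsalted)              # 5f4dcc3b5aa765d61d8327deb882cf99
print(lookup.get(unsalted))  # password  (found straight away)
print(lookup.get(salted))    # None      (the table has no entry for it)
```

The attacker's table only helps when the stored hash was computed from the bare password.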

Gotta drop this in there: Don't use MD5 for password hashing. It's becoming increasingly less secure over time.

There is a general misconception about what the password hashing function should be.

It must be the slowest function you can reasonably afford in your context. That's why you pick an algorithm that runs a hashing function thousands of times in a row, consuming each output as its next input.

Pick a fast hashing function (one that is simple, or can be accelerated by a CPU instruction set, a GPU, or even a programmed FPGA or ASIC) and your passwords are hardly safer than plaintext. It's just a matter of time.

You aim for security. If a user's login takes 1 second to complete because it has to calculate a computationally intensive hash function, that's fine: logins happen only occasionally, and at something like 1 kH/s nobody will be able to precompute rainbow tables for your hash algorithm in the next 10 years.
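One way to get that "hash it thousands of times" behaviour from the Python standard library is PBKDF2. This is only a sketch: the iteration count below is an assumed work factor rather than a recommendation, and `slow_hash` is a name invented for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune it to your own hardware

def slow_hash(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    # PBKDF2 runs HMAC-SHA256 `iterations` times, feeding each round's
    # output into the next round.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

salt = os.urandom(16)
stored = slow_hash("correct horse", salt)

print(hmac.compare_digest(stored, slow_hash("correct horse", salt)))  # True
print(hmac.compare_digest(stored, slow_hash("wrong guess", salt)))    # False
```

Raising the iteration count makes every login a little slower, and every guess by an attacker slower by the same factor.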

Edit:
CRC, MD5, SHA: those are all hashing functions aiming at speed, built to calculate a unique hash for a chunk of data. They are often used as integrity hashes. You receive some data (a file, a network packet, etc.) that has a hash included. You can easily and quickly calculate the hash yourself with these functions and compare it to the included hash to verify the data was not tampered with or corrupted during transit.
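That integrity-check use looks roughly like this in Python. The payload and variable names are invented for the example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

payload = b"contents of some downloaded file"
published_checksum = sha256_hex(payload)  # what the sender ships with the data

received_ok = payload
received_corrupt = payload + b"!"  # a single extra byte added in transit

print(sha256_hex(received_ok) == published_checksum)       # True
print(sha256_hex(received_corrupt) == published_checksum)  # False
```

Speed is an advantage here, because the goal is to check large files quickly rather than to slow down someone guessing inputs.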

Agree, most implementations use bcrypt or scrypt these days
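For reference, scrypt ships with Python's `hashlib` (when Python is built against a recent enough OpenSSL), so a sketch needs no extra packages. The cost parameters `n`, `r`, `p` and the helper name `scrypt_hash` are assumptions for illustration, not tuned values.

```python
import hashlib
import hmac
import os

def scrypt_hash(password: str, salt: bytes) -> bytes:
    # n, r and p are assumed cost parameters; a larger n makes each guess
    # cost more time and memory.
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14,
        r=8,
        p=1,
        maxmem=64 * 1024 * 1024,
        dklen=32,
    )

salt = os.urandom(16)
stored = scrypt_hash("correct horse", salt)

print(hmac.compare_digest(stored, scrypt_hash("correct horse", salt)))  # True
print(hmac.compare_digest(stored, scrypt_hash("wrong guess", salt)))    # False
```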

Points:

A hash itself can be reversed by using precomputed hash tables (aka rainbow tables). It may be easier than you think. There are ways to compute and save hash tables with considerable space savings thanks to packing.

The correct sentence should be: A hash function cannot be reversed.

Using a date/time in a salt is a stupid idea. The salt must be some constant known to the algorithm so you can re-use it. If you use a changing variable (like a datetime, unless it's immutable, like your birthdate), then you have to store it somewhere alongside the hash so you can actually compute that hash again to compare it with the password the user provides. It's like using the user's first name as part of the salt: it is too obvious, and it's right next to the hash in the users table in the database.

The best way is to keep the salt solely in your obfuscated code in memory, and compressed and encrypted on the disk. Stealing the database then does not give away much information for guessing the salt.

I don't understand rainbow tables fully, can you explain how a good salt doesn't make them pointless?

Let's imagine I am able to dump the Users table of your application using an undiscovered SQL injection error.

I will register as a user for your application and use password 'ABCD1234'.

Secretly, your application appends '_S3cr3t!' to the plaintext password as a salt and calculates a hash.

I will dump your database, find the hash of my password, and feed it to JohnTheRipper with a mask of 'ABCD1234?????????' and, if that doesn't work, '????????ABCD1234'.
It's just a matter of time (and money, if I want a fast hashrate accelerated by GPUs) until I find that the hash of 'ABCD1234_S3cr3t!' matches.

Then I build rainbow tables of all hashes of '[A-Z][a-z][0-9][special_chars1]{1-10}_S3cr3t!' to crack all the hashes in your application.
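A toy version of that attack in Python, under loud assumptions: the secret suffix is shortened to three characters so the search finishes instantly, the hash is plain MD5, and `recover_suffix` is a made-up helper rather than anything JohnTheRipper provides.

```python
import hashlib
import itertools
import string

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# What the application does (unknown to the attacker): append one fixed suffix.
SECRET_SUFFIX = "_s3"  # hypothetical, kept short so the demo runs quickly

# The attacker registered with a known password and found this hash in the dump.
known_password = "ABCD1234"
leaked_hash = md5_hex(known_password + SECRET_SUFFIX)

def recover_suffix(known: str, target_hash: str, max_len: int = 3):
    """Try every suffix up to max_len characters until the hash matches."""
    alphabet = string.ascii_lowercase + string.digits + "_"
    for length in range(1, max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            suffix = "".join(combo)
            if md5_hex(known + suffix) == target_hash:
                return suffix
    return None

print(recover_suffix(known_password, leaked_hash))  # _s3
```

Once the shared suffix is known, it can be appended to every candidate password, so one precomputed table covers all users again. A random salt per user removes that shortcut.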

Both terms are inspired by cooking.

A hash, like a hash brown, means something that has been chopped (into bits, heh) and mixed, whereas salt is an ingredient that we add beforehand to make the hash taste better, that is, to make it unique.

Source: my blog :D

Encryption is like a jigsaw puzzle of an image: we cannot guess the image while it's in pieces, but it's meant to be put back together. So with only the pieces you can find the image (more or less) easily. In the password case this is bad, because you can easily recover an original password from an encrypted one.

Hashing is one-way, but deterministic: hash the same value twice and you get the same output twice. So in the password case it is hard to find the original string, because there is no logic built for that; you need to guess the real password and hash it to check whether it's the one in your database.

Salting is adding a personal touch to every hash. For example, in the password case, if two users use the same password, hashing it produces the same output (a given input always has the same output). But if, instead of only the password, you hash a string made of password+login, the output will be different even if two users use the same password.

So in summary:

Encryption => easy to crack; once an attacker finds the encryption type + secret key, all passwords in your database are exposed.

Hashing => harder to crack; the attacker needs to guess a password and compare outputs to find it. So it's one-by-one work.

Hashing + Salting => makes every hash unique, even harder to crack; the attacker needs to separate the password from the salt. Still one-by-one work, even once one is cracked.

If you are using hashing+salt already and still don't know why, I would suggest reading this article to get a better understanding of password security. It will also give you more information on why using a salt is sometimes not enough unless you are using a strong hashing function such as bcrypt, scrypt or Argon2:
patrickmn.com/security/storing-pas...

Have you read the OWASP password pages? With minimal tech knowledge they're pretty easy to understand.
