Under the Hood: How BCrypt Functions

#rails #beginners #webdev #cybersecurity

Something that I often experience as a student software engineer is taking a feature of a web application that I’ve used hundreds (if not thousands) of times and then learning how it is actually implemented. It’s become one of my favorite aspects of learning the profession thus far! One such concept that I’ve recently encountered is how passwords are securely stored and checked. Of course, to even get to that point, one must first learn how a user enters password data into a form and how that data is transferred to a database. After that, it’s tempting to assume that every time you log in, you’re just posting the input from the password form to a database, a function checks whether that plaintext password matches the one stored for the username, and you’re good to go, right? Well, if we were deploying a website whose sole intention was to compromise our users’ information, then absolutely! But that’s not what we’re about!

One of the primary tools used in securing passwords (or anything regarded as sensitive information) is hashing. As alluded to above, storing plaintext passwords puts users at risk. People (myself previously very much included) have a strong tendency to use the same or similar passwords for multiple websites, and it’s easy to see why. It’s hard to put a number on how many applications a single person has to remember login credentials for, but even if the number were only five, that’s a lot of effort to generate a secure, unique password for each platform! Thus, a single data breach has the potential to expose a user to multiple other breaches (among other risks). Hashing combats this by passing a newly-minted password through a one-way function that generates a fixed-length output, and only that output is stored. The seemingly unintelligible text that the hash function generates makes recovering the original password difficult, but not impossible.
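To make the "fixed-length output" idea concrete, here is a minimal sketch using SHA-256 from Ruby's standard library. SHA-256 is only a stand-in for "a hash function" here (it is not BCrypt), and the helper name `fingerprint` is made up for this example.

```ruby
require "digest"

# SHA-256 stands in for "a hash function" here; it is not BCrypt.
def fingerprint(password)
  Digest::SHA256.hexdigest(password)
end

short = fingerprint("cat")
long  = fingerprint("a much longer passphrase, with punctuation!")

puts short.length                  # 64 hex characters
puts long.length                   # also 64, regardless of input length
puts fingerprint("cat") == short   # true: same input, same output
```

Notice that the output is also repeatable: the same password always produces the same digest.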

Given that a hash function always produces the same output for the same input, one could take many of the breached password lists available on the internet, precompute the hash of every known password, and cross-reference that table against the database in question. From there, it’s just a matching game. This is where salting comes in! A salt is “simply” a random string, unique to each password, that is added to the beginning of the password before it is hashed. Because two users with the same password now end up with different hashes, a precomputed table no longer works, which adds another layer of complexity to obtaining a database’s sensitive information. If we could also make the hashing function very computationally expensive, such that it requires a lot of time/space to hash each guess, then we have a great recipe for making the compromise of a database way more effort than it’s worth. One broadly used cryptographic function that hashes and salts passwords (and takes a deliberately long time to do it) is BCrypt.
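The matching game, and how a salt spoils it, can be sketched in a few lines. This is an illustration only: it uses SHA-256 (far too fast for real password storage), and the helper names and the tiny password list are made up for the example.

```ruby
require "digest"
require "securerandom"

# Illustration only: SHA-256 is far too fast for real password storage.
def unsalted_hash(password)
  Digest::SHA256.hexdigest(password)
end

def salted_hash(password, salt = SecureRandom.hex(16))
  # The salt is kept next to the digest so it can be reused at login.
  { salt: salt, digest: Digest::SHA256.hexdigest(salt + password) }
end

# An attacker's precomputed table built from a list of common passwords.
lookup = %w[123456 password qwerty].to_h { |pw| [unsalted_hash(pw), pw] }

p lookup[unsalted_hash("password")]         # the unsalted hash is found in the table
p lookup[salted_hash("password")[:digest]]  # the salted digest is not

a = salted_hash("password")
b = salted_hash("password")
p a[:digest] == b[:digest]  # same password, different salts, different digests
```

The salt is not a secret. Its job is to force an attacker to redo the work for every single user instead of reusing one table.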

As one would surmise, there are many widely available cryptographic hash functions, two examples being the SHA-2 and SHA-3 families. Although these perform similar tasks to BCrypt, they are designed to be computationally fast, which makes brute-force attacks easier. In addition to being deliberately slow, BCrypt has the advantage of scalability: it takes a configurable cost factor, so as hardware becomes faster at computing hashes, the cost can be raised to keep each hash just as slow to compute.
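BCrypt's cost parameter is a base-2 exponent: the expensive key setup is repeated 2^cost times, so each increment of the cost doubles the work. A quick bit of arithmetic shows the effect (the helper name `bcrypt_rounds` is made up for this sketch):

```ruby
# Number of key-expansion rounds BCrypt performs for a given cost.
def bcrypt_rounds(cost)
  2**cost
end

[10, 11, 12, 14].each do |cost|
  puts "cost #{cost}: #{bcrypt_rounds(cost)} rounds"
end
```

Going from a cost of 10 to a cost of 12 therefore makes every guess roughly four times as expensive, for the legitimate server and the attacker alike.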

So we’ve gone over why BCrypt is used, but how exactly does it work? The function can be broken down into two phases.

BCrypt Algorithm — https://auth0.com/blog/hashing-in-action-understanding-bcrypt/

Phase 1: A function named “EksBlowfishSetup” is called, taking in the desired cost, the salt, and the password. This is where the bulk of BCrypt’s time is spent: generating a set of subkeys from the primary key (i.e. the password).

Phase 2: From here, the 192-bit text “OrpheanBeholderScryDoubt” is encrypted 64 times using EksBlowfish in Electronic Code Book (ECB) mode with the state from the previous phase. The final output is the cost and the 128-bit salt value concatenated with the result of the encryption loop.
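As a small sanity check on the "192-bit" figure: the text is 24 ASCII characters, which works out to exactly three 64-bit blocks (64 bits being Blowfish's block size). The sketch below only slices the string; it performs no encryption, and the names `BCRYPT_MAGIC` and `blowfish_blocks` are made up for the example.

```ruby
BCRYPT_MAGIC = "OrpheanBeholderScryDoubt"

# Split a string into 8-byte (64-bit) chunks, the block size Blowfish works on.
def blowfish_blocks(text)
  text.chars.each_slice(8).map(&:join)
end

puts BCRYPT_MAGIC.bytesize * 8        # 192
p blowfish_blocks(BCRYPT_MAGIC)       # ["OrpheanB", "eholderS", "cryDoubt"]
```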

The resulting hash will have some consistent features:

Finished BCrypt Hash — https://blog.boot.dev/cryptography/bcrypt-step-by-step/

It’ll be prefixed by either $2a$, $2y$, or $2b$ (i.e. the algorithm identifier)
It’ll delineate the cost (10 in this case)
Then the remaining 53 characters contain the 22-character salt followed by the 31-character password hash
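Because the layout is fixed, a finished hash can be pulled apart with a regular expression. The following is a rough sketch, not the BCrypt gem's own parser: the helper name is made up, and the sample string is a placeholder in the right shape (22 salt characters followed by 31 hash characters), not the hash of any real password.

```ruby
# $<version>$<cost>$<22-char salt><31-char hash>
BCRYPT_FORMAT = %r{\A\$(2[abxy]?)\$(\d{2})\$([./A-Za-z0-9]{22})([./A-Za-z0-9]{31})\z}

def parse_bcrypt_hash(hash)
  match = BCRYPT_FORMAT.match(hash)
  return nil unless match

  { version: match[1], cost: match[2].to_i, salt: match[3], checksum: match[4] }
end

# Placeholder value in the right shape, not the hash of any real password.
sample = '$2b$10$' + ('S' * 22) + ('H' * 31)

p sample.length              # 60
p parse_bcrypt_hash(sample)  # version "2b", cost 10, then the salt and hash parts
```

Everything needed to re-run the computation later (version, cost, and salt) travels inside the stored string itself.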

Simplified: One algorithm (EksBlowfish) performs a lengthy key setup based on the password, salt, and cost, and the resulting state is then used to repeatedly encrypt a fixed string (OrpheanBeholderScryDoubt) to create a fixed-length hash.

Once a hash is generated, we need to be able to store it to be compared for authentication purposes. The source code of the BCrypt Ruby Gem actually offers great insight into how this is performed in Rails:

First and foremost, keep in mind that the Password object is responsible for knowing the password’s salt.
The given password is run through the same function as the stored one (with the same salt)
Then the results are compared to the stored hash
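Those three steps can be sketched with nothing but Ruby's standard library. To be clear about the assumptions: this is not the BCrypt gem's code. PBKDF2 (via OpenSSL) is swapped in as a stand-in slow hash, and the `salt$digest` storage format, the iteration count, and the method names are all made up for the illustration.

```ruby
require "openssl"
require "securerandom"

PBKDF2_ITERATIONS = 10_000 # hypothetical work factor for this sketch

# Stand-in for BCrypt: a deliberately slow, salted hash (PBKDF2-HMAC-SHA256).
def slow_hash(password, salt)
  OpenSSL::KDF.pbkdf2_hmac(password, salt: salt, iterations: PBKDF2_ITERATIONS,
                           length: 32, hash: "sha256").unpack1("H*")
end

# Store the salt next to the digest, much as a BCrypt string embeds its salt.
def create_record(password)
  salt = SecureRandom.hex(16)
  "#{salt}$#{slow_hash(password, salt)}"
end

def verify(record, attempt)
  salt, stored_digest = record.split("$", 2) # 1. recover the salt
  candidate = slow_hash(attempt, salt)       # 2. same function, same salt
  candidate == stored_digest                 # 3. compare with the stored value
end

record = create_record("correct horse")
p verify(record, "correct horse") # true
p verify(record, "wrong guess")   # false
```

Production code should compare the two digests in constant time rather than with a plain `==`.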

Voilà! The password can be verified in one line of code: