Eden

Posted on Apr 17 • Originally published at eande171.hashnode.dev

How HaveIBeenPwned Checks Your Password Without Ever Seeing It

#security #privacy #webdev #beginners

Maybe you've seen "check if your password has been breached" features scattered across the web. Maybe you've used haveibeenpwned.com yourself.

But there's an uncomfortable question sitting under all of it... how does a breach-checking service verify your password against a database of billions of leaked credentials without you just... handing them your password?

The naive implementation would just be awful. You send your password to a server, the server checks it against a list, the server tells you the result. Congratulations! You've just handed a third party your plaintext password to "solve" a security problem.

I'm sure we both know the issue with that (at least I hope I do...).

The way HIBP does it is actually rather clever, and worth understanding, both as a user and as a developer building anything that touches authentication.

Overview of K-Anonymity

K-anonymity is a privacy concept that's been around since the late nineties. A piece of data satisfies k-anonymity if it's indistinguishable from at least k-1 other pieces of data. In other words, you can't be singled out from a crowd of at least k people.

The classic example is medical records. If you release a dataset where every patient shares their age, postcode, and gender with at least k-1 other patients in the set, no individual record can be uniquely identified. The data is still useful, but no individual can be singled out from it.
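To make the definition concrete, here is a minimal Python sketch that checks whether a list of records is k-anonymous over a chosen set of quasi-identifiers (the age, postcode, and gender columns from the example above). The function name is_k_anonymous and the patient records are invented for illustration; real anonymisation tools also handle generalisation and suppression, which this sketch skips.

```python
from collections import Counter


def is_k_anonymous(records: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """True if every combination of quasi-identifier values appears in at least k records."""
    # Count how many records share each (age, postcode, gender)-style tuple.
    group_sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in group_sizes.values())


# Invented toy records; ages and postcodes are already generalised into bands.
patients = [
    {"age": "30-39", "postcode": "SW1*", "gender": "F", "condition": "asthma"},
    {"age": "30-39", "postcode": "SW1*", "gender": "F", "condition": "flu"},
    {"age": "40-49", "postcode": "N1*", "gender": "M", "condition": "flu"},
    {"age": "40-49", "postcode": "N1*", "gender": "M", "condition": "flu"},
]
qi = ["age", "postcode", "gender"]
print(is_k_anonymous(patients, qi, k=2))  # True
print(is_k_anonymous(patients, qi, k=3))  # False
```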

Unfortunately, k-anonymity is susceptible to homogeneity attacks (where all k people in a group share the same sensitive value, so simply knowing someone is in that group reveals it) and background knowledge attacks (where an attacker uses additional information to narrow down the possibilities).
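The homogeneity attack is easy to spot in the same kind of toy data. This hypothetical helper, homogeneous_groups, groups records by their quasi-identifiers and reports any group where every record shares a single sensitive value. The invented records below are 2-anonymous, yet anyone known to be a man in his forties in N1 is revealed to have the flu.

```python
from collections import defaultdict


def homogeneous_groups(records: list[dict], quasi_identifiers: list[str], sensitive: str) -> dict:
    """Return {group: value} for groups whose records all share one sensitive value."""
    values_per_group = defaultdict(set)
    for r in records:
        values_per_group[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    # A group with exactly one distinct sensitive value leaks that value.
    return {group: next(iter(vals)) for group, vals in values_per_group.items() if len(vals) == 1}


# Invented toy records (2-anonymous on age, postcode, gender).
patients = [
    {"age": "30-39", "postcode": "SW1*", "gender": "F", "condition": "asthma"},
    {"age": "30-39", "postcode": "SW1*", "gender": "F", "condition": "flu"},
    {"age": "40-49", "postcode": "N1*", "gender": "M", "condition": "flu"},
    {"age": "40-49", "postcode": "N1*", "gender": "M", "condition": "flu"},
]
qi = ["age", "postcode", "gender"]
print(homogeneous_groups(patients, qi, "condition"))  # {('40-49', 'N1*', 'M'): 'flu'}
```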

HIBP applies this same idea to password checking, and their implementation is surprisingly straightforward. Luckily for us, our passwords on HIBP aren't nearly as susceptible to these weaknesses.

Before we get into how, it's worth understanding what HIBP is. Troy Hunt (creator of HIBP) has spent years collecting password data from data breaches (like RockYou, LinkedIn or Adobe). All of these entries have been hashed, indexed and made queryable. The issue then becomes: "How do you check this data without exposing your own password?"

When you want to check if a password has appeared in a breach, you don't send the password. You don't even send a full hash of the password. You send the first five characters of its SHA-1 hash.

Say your password hashes to:

5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8

(In this case the password is "password"... don't judge, I needed a good example.)
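Producing the prefix and suffix locally takes only the standard library. A minimal Python sketch, assuming the password is encoded as UTF-8 before hashing (split_sha1 is a hypothetical helper name, not part of any HIBP client):

```python
import hashlib


def split_sha1(password: str) -> tuple[str, str]:
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest of a password."""
    # Assumption: the password is UTF-8 encoded before hashing.
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    # The first 5 hex characters (20 bits) get sent; the other 35 never leave.
    return digest[:5], digest[5:]


prefix, suffix = split_sha1("password")
print(prefix)  # 5BAA6
print(suffix)  # 1E4C9B93F3F0682250B6CF8331B7EE68FD8
```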

You send 5BAA6 to the HIBP range API. HIBP responds with every hash suffix in their database that starts with those five characters, along with a count of how many times each one appeared in breach data.
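The range API is a plain HTTPS GET to api.pwnedpasswords.com/range/ followed by the prefix. Below is a minimal sketch using Python's urllib; the helper names and the User-Agent value are assumptions for illustration, and handling timeouts or non-200 responses is left to the caller.

```python
import urllib.request

RANGE_API = "https://api.pwnedpasswords.com/range/"
HEX_DIGITS = set("0123456789ABCDEF")


def range_url(prefix: str) -> str:
    """Build the range API URL for a 5-character SHA-1 prefix."""
    prefix = prefix.upper()
    if len(prefix) != 5 or not set(prefix) <= HEX_DIGITS:
        raise ValueError("prefix must be exactly 5 hex characters")
    return RANGE_API + prefix


def fetch_range(prefix: str, timeout: float = 10.0) -> str:
    """Fetch the raw SUFFIX:COUNT text for a prefix (this performs a network call)."""
    req = urllib.request.Request(
        range_url(prefix),
        # Hypothetical User-Agent string; identify your own application here.
        headers={"User-Agent": "breach-check-example"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Only the five-character prefix ever appears in the request; the suffix comparison happens after the response comes back.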

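You get back a plain-text list of hundreds of entries, one per line, in the form SUFFIX:COUNT, where SUFFIX is the remaining 35 characters of a hash and COUNT is how many times it has appeared.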

You then check whether the suffix of your own hash (1E4C9B93F3F0682250B6CF8331B7EE68FD8) appears in that list. If it does, and the count next to it is greater than zero, that password has been in a breach, and the count tells you how many times it has been seen.
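Parsing the response is just a matter of splitting lines on the colon. This sketch (parse_range_response and breach_count are hypothetical names) treats a missing suffix as a count of 0, matching the "greater than zero" rule above. The sample body is invented, not real HIBP output.

```python
def parse_range_response(body: str) -> dict[str, int]:
    """Parse SUFFIX:COUNT lines from a range API response into a dict."""
    counts = {}
    for line in body.splitlines():  # handles both \n and \r\n line endings
        line = line.strip()
        if not line:
            continue
        suffix, _, count = line.partition(":")
        counts[suffix.upper()] = int(count)
    return counts


def breach_count(suffix: str, body: str) -> int:
    """How many times the suffix appears in breach data (0 if it is absent)."""
    return parse_range_response(body).get(suffix.upper(), 0)


# Invented sample body for illustration only; the counts are made up.
sample = "00000000000000000000000000000000000:3\r\n1E4C9B93F3F0682250B6CF8331B7EE68FD8:42\r\n"
print(breach_count("1E4C9B93F3F0682250B6CF8331B7EE68FD8", sample))  # 42
```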

The privacy guarantee here is k-anonymity. Any given 5-character SHA-1 prefix matches hundreds to thousands of different hashes in the HIBP dataset. HIBP's server receives only your prefix and has no way of knowing which of those hashes you actually care about. Neither your password nor its full hash ever leaves your computer.

It's important to be honest here though... this method isn't perfect. You still end up sending the first 20 bits of your hash. There are 16^5 (~1 million) possible prefixes, so your prefix narrows your hash down to the few hundred breached hashes that share it. That's a reasonable trade-off, but it's k-anonymity, not zero-knowledge.
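The numbers behind that trade-off are simple arithmetic: each hex character carries 4 bits, so five of them reveal 20 bits and split the hash space into 16^5 = 2^20 = 1,048,576 buckets. A rough sketch of the average bucket size, assuming SHA-1 spreads hashes evenly and using a hypothetical corpus size (not HIBP's actual count):

```python
PREFIX_HEX_CHARS = 5
BITS_PER_HEX_CHAR = 4


def prefix_bits() -> int:
    """Bits of the hash revealed by the prefix."""
    return PREFIX_HEX_CHARS * BITS_PER_HEX_CHAR


def prefix_count() -> int:
    """Number of distinct prefixes (buckets)."""
    return 16 ** PREFIX_HEX_CHARS


def average_bucket_size(total_hashes: int) -> float:
    """Average number of breached hashes per prefix, assuming an even spread."""
    return total_hashes / prefix_count()


# Hypothetical corpus of one billion hashes, for illustration only.
print(prefix_bits())                                  # 20
print(prefix_count())                                 # 1048576
print(round(average_bucket_size(1_000_000_000)))      # 954
```

The bigger the corpus, the bigger each bucket, so the "k" in this k-anonymity grows as more breaches are added.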

Isn't SHA-1 Broken?

SHA-1 is indeed a "broken" hashing algorithm (practical collision attacks against it have been demonstrated). For digital signatures, certificates and authentication, this is a serious problem.

In this use case, though, that weakness doesn't really matter. You're not trying to store or protect this password on their servers. SHA-1 provides exactly what is required: a consistent, one-way transformation to a fixed-length string. K-anonymity is what's actually doing the privacy work; nobody is trying to reverse the hash or forge a hash collision.

That being said, never use SHA-1 (or any fast or broken hash) to store passwords (I'm looking at you). That's what bcrypt, scrypt, and Argon2 are for. The HIBP use case and the password storage use case are different problems with different requirements.
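For the storage side, Python's standard library exposes scrypt through hashlib.scrypt (available when Python is built against OpenSSL 1.1 or later). A minimal sketch, where the cost parameters n=2**14, r=8, p=1 and the 16-byte salt are common starting points rather than tuned recommendations, and hash_password and verify_password are hypothetical names. For bcrypt or Argon2 you would reach for a third-party library instead.

```python
import hashlib
import hmac
import os

# Assumed cost parameters (roughly 16 MiB of memory); tune them for your hardware.
SCRYPT_N, SCRYPT_R, SCRYPT_P, KEY_LEN = 2**14, 8, 1, 32


def _derive(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode("utf-8"), salt=salt,
                          n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=KEY_LEN)


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, key) to store; a fresh random salt is generated per password."""
    salt = os.urandom(16)
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), key)
```

Unlike SHA-1, scrypt is deliberately slow and memory-hard, which is exactly what you want when an attacker gets hold of the stored hashes.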

In Practice

Checking whether passwords have been exposed in a breach is an increasingly common requirement, and organisations like NIST (the National Institute of Standards and Technology) recommend checking new passwords against these breach lists.

Shameless self-promotion: this is the mechanism behind the breach detection in Bastion, the API I built. When you send a password to the endpoint, it is hashed on the worker, and only the first 5 characters of the SHA-1 hash are sent to HIBP.

Checking out the free demo or supporting me on Ko-fi would mean the world to me :D Thank you for reading this far!!

If you're building anything that touches user credentials, breach checking via the range API (or my own COUGH COUGH) is low-effort, high-value. There's no excuse not to.
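To give a sense of how little code that is, here is a hedged end-to-end sketch of a signup check that rejects breached passwords. It is not Bastion's implementation; pwned_count, validate_new_password, and the User-Agent string are hypothetical, and the fetch function is injectable so the logic can be tested without touching the network.

```python
import hashlib
import urllib.request
from typing import Callable

RANGE_API = "https://api.pwnedpasswords.com/range/"


def fetch_range(prefix: str) -> str:
    """GET the SUFFIX:COUNT list for a prefix (hypothetical User-Agent value)."""
    req = urllib.request.Request(RANGE_API + prefix,
                                 headers={"User-Agent": "breach-check-example"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def pwned_count(password: str, fetch: Callable[[str], str] = fetch_range) -> int:
    """Hash locally, send only the 5-char prefix, and match the suffix locally."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    for line in fetch(prefix).splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def validate_new_password(password: str, fetch: Callable[[str], str] = fetch_range) -> None:
    """Raise ValueError if the password has appeared in breach data."""
    if pwned_count(password, fetch) > 0:
        raise ValueError("This password has appeared in a data breach; please choose another.")
```

Whether to block or allow a signup when the API is unreachable (fail closed vs fail open) is a policy decision; as written, a network failure surfaces as an exception (such as urllib.error.URLError) for the caller to handle.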

DEV Community
