Ilija Eftimov

Posted on Dec 23, 2018 • Updated on Dec 25, 2018

Validate your passwords using Elixir and haveibeenpwned.com's API

#elixir #haveibeenpwned #api #passwords

This post was originally published on my blog, on December 23, 2018. You can see it here.

Unless you've been living under a rock for the last couple of years, you probably know what two-factor authentication (2FA) is. It's quite a neat trick actually - you have a password that you have to (obviously) enter correctly (first factor), but you also have to receive a second (random) code through a different medium, sometimes on a different device, that you have to enter to log in (second factor).

Now, obviously this adds quite a bit of overhead to logging in, but it adds a disproportionate value when it comes to security. If you work in any sort of organisation it was probably no surprise when you were asked to turn on 2FA for all your accounts. If you haven't been asked to (or haven't done it), it's time to act ;)

But what about factor No. 1, the password? Did we give up on it?

Not really. But, for sure we had to become more vigilant and smarter when setting our passwords. Why? Allow me to explain.

Reader, meet haveibeenpwned.com

Let me introduce you to haveibeenpwned.com. It's a free resource for anyone to quickly assess if they may have been put at risk due to an online account of theirs having been compromised or "pwned" in a data breach. As you can imagine, to fulfil its purpose, this service also contains quite a long list of pwned passwords (about 500 million of them to be more precise), which are open for querying through a REST API.

If you want to learn more about the project, or its author, I suggest checking out the About page of the project.

Using the pwned passwords API

This API allows us to check if any password is present in the haveibeenpwned database. This means that if you send an already pwned password, it will tell you that this password has been pwned and that it's suggested to choose another one.

Imagine you have a website where people can set their passwords. Once the user has finished typing their new password, you can ping this service and check if the password they chose has been pwned before.

Now, if you are thinking along the lines of "are you telling me to send a plain-text password across the wire to some random API?" then you're a step ahead, well done!

Sorry to disappoint, but no, actually I am not saying that. Instead of sending the whole password in plain-text, this API only requires the first 5 characters of the SHA-1 hash of the actual password.

In Elixir terms, that would look like:

:sha
|> :crypto.hash("password")
|> Base.encode16
|> String.slice(0..4)

Interestingly, what it sends back is the remainder of every hash in its database that starts with the 5 characters you sent. Basically, this means if we take a SHA-1 of "password":

iex(1)> :crypto.hash(:sha, "password") |> Base.encode16
"5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8"

We will send only 5BAA6 to the API, while in the response body we will receive a big list of strings, each of which is the rest of a SHA-1 hash that starts with 5BAA6. In our example, the one we are looking for is 1E4C9B93F3F0682250B6CF8331B7EE68FD8.

Troy Hunt, who's the author of haveibeenpwned, has written quite an extensive explanation on how this works - you can read it here.

Pinging the API

For the purpose of this exercise, we will create a small Mix package that will encapsulate all of the behaviours. If you're not familiar with how to create new packages using Mix, I suggest reading my article Write and publish your first Elixir library.

We will call our package Pwnex because somehow my brain always thinks that I have to mix up the main word (pwned) with Elixir to come up with a name. Anyway, let's create it:

› mix new pwnex
* creating README.md
* creating .formatter.exs
* creating .gitignore
* creating mix.exs
* creating config
* creating config/config.exs
* creating lib
* creating lib/pwnex.ex
* creating test
* creating test/test_helper.exs
* creating test/pwnex_test.exs

Your Mix project was created successfully.
You can use "mix" to compile it, test it, and more:

    cd pwnex
    mix test

Run "mix help" for more commands.

Now that we have the package bootstrapped locally, let's open lib/pwnex.ex and
add some documentation:

defmodule Pwnex do
  @moduledoc """
  Consults haveibeenpwned.com's API for pwned passwords.
  """

  @doc """
  Checks if a given password is already pwned.

  ## Examples

      iex> Pwnex.pwned?("password")
      {:pwned, 3_000_000}

      iex> Pwnex.pwned?("m4Z2fJJ]r3fxQ*o27")
      {:ok, 0}

  """
  def pwned?(password) do
  end
end

The moduledoc briefly explains the purpose of the package, while the doc explains the purpose of the pwned?/1 function and has two examples that we
could use in the doctests.

Our little algorithm

Let's see what would be the steps to implement the Pwnex.pwned?/1 function:

def pwned?(password) do
  {hash_head, hash_tail} =
    password
    |> sanitize
    |> hash
    |> split_password

  hash_head
  |> fetch_pwns
  |> handle_response
  |> find_pwns(hash_tail)
  |> return_result
end

Once more - the pipeline operator in Elixir makes this function so clear and procedure-like that explaining feels a tad redundant. Still, here it is:

sanitize - we want to remove all leading and trailing whitespace from the password
hash - we want to convert the password to a hex-encoded SHA-1 hash
split_password - we want to split the hash into the head (the first 5 characters, which we will send to the API) and the tail (the rest of the SHA-1 hash)
fetch_pwns - we will send an API request to haveibeenpwned to get all (if any) pwns of the password
handle_response - depending on the response we will either get the body, or the reason for failure returned
find_pwns - we will take the response body, and because haveibeenpwned uses a k-Anonymity model we will need to find the actual match ourselves (if present)
return_result - will return the tuple which will contain a result atom and a pwns count

Let's take a step by step approach and implement these functions.

Manipulating the password

Let's start easy. In sanitize, we want to trim leading and trailing whitespace, while in hash we want to turn the password into a hex-encoded SHA-1 hash. Note that hash returns the full hash; taking its first five characters is the job of split_password.

def sanitize(password), do: String.trim(password)

There isn't much to explain here really. Instead of using sanitize we can use String.trim/1, but I prefer to have a separate function that we could extend and test for any edge cases.

defp hash(password) do
  :crypto.hash(:sha, password)
  |> Base.encode16
end

:crypto is an Erlang module that provides a set of cryptographic functions. Interestingly, it's not part of Erlang's standard library application (stdlib), but it ships with the Erlang/OTP distribution. One of the functions, as you can see in the code above, is hash/2, which takes the hashing algorithm as the first argument and the data to be hashed as the second argument. It returns the hash as a raw binary, which we can convert to hex using Base.encode16.
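The pipeline also calls split_password, which is not shown in this post. A minimal sketch, assuming it simply cuts the hex digest after the fifth character with String.split_at/2 (the module name HashSketch is made up for illustration and is not the package's actual code):

```elixir
defmodule HashSketch do
  # Hypothetical module; the functions mirror the pipeline steps named in this post.

  # Full 40-character, upper-case hex SHA-1 digest of the password.
  def hash(password) do
    :sha
    |> :crypto.hash(password)
    |> Base.encode16()
  end

  # Splits the digest into the 5-character head (sent to the API)
  # and the 35-character tail (compared locally against the response).
  def split_password(hash), do: String.split_at(hash, 5)
end

{head, tail} = "password" |> HashSketch.hash() |> HashSketch.split_password()
IO.inspect({head, tail})
# {"5BAA6", "1E4C9B93F3F0682250B6CF8331B7EE68FD8"}
```

Returning a two-element tuple is what lets pwned?/1 destructure the result into hash_head and hash_tail in one match.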

Sending request to an API

I bet you're thinking HTTPoison. Aren't you? While I was writing this article I was also wondering whether we have to include a whole package just to do a simple GET request. You guessed it - we do not.

Although Elixir does not ship an HTTP client, Erlang does. And just like with :crypto, you can use Erlang's HTTP client from Elixir using :httpc. This module provides the API to an HTTP/1.1 compatible client. I suggest giving its documentation a quick scan before we move on.

Let's open up IEx and give :httpc a spin:

iex(1)> :httpc.request('https://api.pwnedpasswords.com/range/21FCB')
{:ok,
 {{'HTTP/1.1', 200, 'OK'},
  [
    {'cache-control', 'public, max-age=2678400'},
    {'connection', 'keep-alive'},
    {'date', 'Sat, 22 Dec 2018 11:09:46 GMT'},
    {'server', 'cloudflare'},
    {'vary', 'Accept-Encoding'},
    {'content-length', '19951'},
    {'content-type', 'text/plain'},
    {'expires', 'Tue, 22 Jan 2019 11:09:46 GMT'},
    {'last-modified', 'Thu, 12 Jul 2018 01:32:06 GMT'},
    {'set-cookie',
     '__cfduid=d51115381191fd7bd0a003d466916efc41545476986; expires=Sun, 22-Dec-19 11:09:46 GMT; path=/; domain=.pwnedpasswords.com; HttpOnly; Secure'},
    {'cf-cache-status', 'HIT'},
    {'access-control-allow-origin', '*'},
    {'arr-disable-session-affinity', 'True'},
    {'cf-ray', '48d2235cbe93bf5c-AMS'},
    {'expect-ct',
     'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"'},
    {'strict-transport-security',
     'max-age=31536000; includeSubDomains; preload'},
    {'x-content-type-options', 'nosniff'},
    {'x-powered-by', 'ASP.NET'}
  ],
  'THEBODYISHERE\r\nOMGSOMUCHSTUFFHEREWHATISTHISEVEN' ++ ...}}

As you can see, the output is pretty verbose, but if you have ever sent an HTTP request via cURL or opened your browser's debugging tools, there shouldn't be any surprises here. The request/1 function sends a request to the pwnedpasswords API and returns a set of nested tuples containing the status line, the response headers and the raw body of the response.

For our purpose, we can keep it simple. We are only interested in whether the function returns the :ok or the :error atom as the first item in the tuple. We can use pattern matching to do this:

iex(1)> {:ok, {_status, _headers, body }} =
  :httpc.request('https://api.pwnedpasswords.com/range/21FCB')

So, let's get back to our Pwnex.fetch_pwns/1 function. The function will receive the first 5 characters of the hashed password, send them to the API and return the response tuple as-is, leaving the unpacking of the body to the next step:

def fetch_pwns(head) do
  :httpc.request('https://api.pwnedpasswords.com/range/#{head}')
end
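One thing worth keeping in mind: :httpc is part of Erlang's :inets application, and HTTPS requests also rely on :ssl. The sketch below is a hedged variant of fetch_pwns that starts both applications explicitly before making the request. The module name HttpSketch and the range_url/1 helper are assumptions made for illustration; they do not come from the original package.

```elixir
defmodule HttpSketch do
  # Hypothetical helper module, not part of the original package.

  @base_url "https://api.pwnedpasswords.com/range/"

  # :httpc expects the URL as a charlist, so convert the binary explicitly.
  def range_url(head), do: String.to_charlist(@base_url <> head)

  def fetch_pwns(head) do
    # Make sure the applications :httpc depends on are running
    # before the first request is sent.
    {:ok, _} = Application.ensure_all_started(:inets)
    {:ok, _} = Application.ensure_all_started(:ssl)

    # Returns {:ok, {status_line, headers, body}} or {:error, reason}.
    :httpc.request(range_url(head))
  end
end
```

In a Mix project, an alternative to starting them by hand is to list :inets and :ssl under extra_applications in mix.exs, so that they are started together with the package.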

Handling the response

The handle_response will actually be one function with three bodies:

  def handle_response({:ok, {_status, _headers, body}}), do: body
  def handle_response({:error, {reason, _meta}}), do: reason
  def handle_response(_), do: nil

By using pattern matching we can have three function bodies. The first one will be invoked when the request succeeds (it ignores the status code, so any HTTP response ends up here, not only 200 OK), the second one when :httpc returns an error and the third one for any other case.

As you can imagine, we could have used conditional logic here, but having the power of pattern matching allows us to have three tiny functions that are very easy to read, understand and test.
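If you want to tell a 200 OK apart from other responses, pattern matching can reach into the status line as well. The following is a sketch of one possible variant, not the version used in this post: it returns tagged tuples instead of a bare body, so the rest of the pipeline would need to be adapted to it. The module name ResponseSketch and the :unexpected_status atom are made up for illustration.

```elixir
defmodule ResponseSketch do
  # Hypothetical variant of handle_response/1 that also inspects the status code.

  # The request succeeded and the API answered 200: hand back the body (a charlist).
  def handle_response({:ok, {{_version, 200, _phrase}, _headers, body}}),
    do: {:ok, body}

  # The request went through, but the API answered with another status code.
  def handle_response({:ok, {{_version, status, _phrase}, _headers, _body}}),
    do: {:error, {:unexpected_status, status}}

  # :httpc could not complete the request at all.
  def handle_response({:error, reason}), do: {:error, reason}
end

ok = {:ok, {{~c"HTTP/1.1", 200, ~c"OK"}, [], ~c"BODY"}}
IO.inspect(ResponseSketch.handle_response(ok))
```

The order of the clauses matters here: the clause matching the literal 200 has to come before the one that binds any status.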

Parsing the response

Here's the body of the response:

003D68EB55068C33ACE09247EE4C639306B:3\r\n012C192B2F16F82EA0EB9EF18D9D539B0DD:1\r\n01330C689E5D64F660D6947A93AD634EF8F:1\r\n0198748F3315F40B1A102BF18EEA0194CD9:1\r\n01F9033B3C00C65DBFD6D1DC4D22918F5E9:2\r\n0424DB98C7A0846D2C6C75E697092A0CC3E:5\r\n047F229A81EE2747253F9897DA38946E241:1\r\n04A37A676E312CC7C4D236C93FBD992AA3C:5\r\n04AE045B134BDC43043B216AEF66100EE00:2\r\n0502EA98ED7A1000D932B10F7707D37FFB4:5\r\n0539F86F519AACC7030B728CD47803E5B22:5\r\n054A0BD53E2BC83A87EFDC236E2D0498C08:3\r\n05AA835DC9423327DAEC1CBD38FA99B8834:1\r\n05E0182DEAE22D02F6ED35280BCAC370179:4

If you look carefully you'll notice that it's actually a list of partial SHA-1 hashes separated by \r\n. With a closer inspection of the first one:

003D68EB55068C33ACE09247EE4C639306B:3

you notice that it's actually a part of the hash, a colon : and a number (3 in the example above). This is actually the hash without its first 5 characters and the number of times that particular password has been pwned.

This means that the password whose SHA-1 hash is
5BAA6003D68EB55068C33ACE09247EE4C639306B has been pwned 3 times, according to haveibeenpwned.com.

What we need to do with this response body is split it into a list,
which we will iterate over to find our matching hash. Let's do that:

def find_pwns(response, hash_tail) do
  response
  |> to_string
  |> String.split
  |> Enum.find(&(String.starts_with?(&1, hash_tail)))
end

Although find_pwns/2 might look a bit loaded, let me assure you it's not. Let's see what each of the lines do here:

to_string will take the character list that :httpc gave us as the response body and convert it to a string so we can parse it in the next steps
String.split will split the string on whitespace, which includes the \r\n line endings, and will create a list of strings, looking like: ["003D68EB55068C33ACE09247EE4C639306B:3", ...]
Enum.find takes the list and a function as arguments. The list is the parsed list of hash tails and their pwns count, while the function is String.starts_with?/2, which will return true when a line starts with the value of hash_tail.

That's all. At the end, the find_pwns/2 function will return either the line
that contains the matched hash tail or it will return nil.
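To make the steps concrete, here is a small worked example that runs the same three calls on two entries copied from the sample response above. The second tail used for the miss is an arbitrary made-up value.

```elixir
# Two entries from the sample response, as the charlist :httpc would return.
body = ~c"003D68EB55068C33ACE09247EE4C639306B:3\r\n012C192B2F16F82EA0EB9EF18D9D539B0DD:1\r\n"

# Charlist -> string -> list of "TAIL:COUNT" lines.
lines = body |> to_string() |> String.split()

# A tail that is present in the list, and a made-up one that is not.
match = Enum.find(lines, &String.starts_with?(&1, "012C192B2F16F82EA0EB9EF18D9D539B0DD"))
miss = Enum.find(lines, &String.starts_with?(&1, "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"))

IO.inspect(match) # "012C192B2F16F82EA0EB9EF18D9D539B0DD:1"
IO.inspect(miss)  # nil
```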

Returning a meaningful result

Now that we have found the count of pwns for the hash (or just a nil), we need to handle that and return a meaningful tuple to the user of the module.

When the find_pwns function does find a count, we want to return a tuple
like {:pwned, count}. Otherwise, when find_pwns does not find a count it will return nil, which we handle in the second definition of the return_result function:

def return_result(line) when is_binary(line) do
  [_, count] = String.split(line, ":")
  {:pwned, String.to_integer(count)}
end
def return_result(_), do: {:ok, 0}

In the first function body we will take the line, which should be a binary
(string), split it at the : character, convert the count to an integer and then return the tuple with the count.
In the second function body we take any argument (which in our case is nil) and
return a tuple with 0 as the count.
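Finding the line and extracting the count can also be done in a single pass. The sketch below is one possible variant, assuming the tail has to match the part before the colon exactly; the module name PwnCountSketch and the count/2 function are hypothetical and not part of the package.

```elixir
defmodule PwnCountSketch do
  # Hypothetical single-pass variant of find_pwns/2 plus return_result/1.
  def count(body, hash_tail) do
    body
    |> to_string()
    |> String.split()
    # Enum.find_value/3 stops at the first line for which the function
    # returns a truthy value and returns that value; 0 is the default
    # when no line matches.
    |> Enum.find_value(0, fn line ->
      case String.split(line, ":") do
        # The pin (^) requires the tail to equal hash_tail exactly.
        [^hash_tail, count] -> String.to_integer(count)
        _ -> nil
      end
    end)
  end
end

body = ~c"003D68EB55068C33ACE09247EE4C639306B:3\r\n012C192B2F16F82EA0EB9EF18D9D539B0DD:1\r\n"
IO.inspect(PwnCountSketch.count(body, "003D68EB55068C33ACE09247EE4C639306B"))
# 3
```

A caller could then map a count of 0 to {:ok, 0} and anything else to {:pwned, count}.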

Using Pwnex

Now, let's load Pwnex in IEx and give it a spin. To load it in IEx, you need to go to the root of the project and run iex -S mix. This will open an IEx session and run mix inside it, which will compile the project, load it and make the module available for invocation directly from IEx:

› iex -S mix
Erlang/OTP 21 [erts-10.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe] [dtrace]

Interactive Elixir (1.7.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> h Pwnex

                          Pwnex

Consults haveibeenpwned.com's API for pwned passwords.
iex(2)> Pwnex.pwned?("password")
{:pwned, 3533661}
iex(3)> Pwnex.pwned?("123!@#asd*&(*123SAkjhda")
{:ok, 0}

As you can see, password has been pwned about 3.5 million times, while 123!@#asd*&(*123SAkjhda never.
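As for the website scenario from the beginning of the post, a caller could turn the two result tuples into a validation outcome. This is only a sketch under assumptions: the module name PasswordPolicySketch, the validate/2 function and the error message are made up for illustration, and the checker argument exists so the function can be exercised without hitting the network.

```elixir
defmodule PasswordPolicySketch do
  # Hypothetical caller. `checker` is any one-argument function with the same
  # return shape as Pwnex.pwned?/1; it defaults to the real thing.
  def validate(password, checker \\ &Pwnex.pwned?/1) do
    case checker.(password) do
      {:pwned, count} ->
        {:error, "this password has been seen #{count} times in data breaches"}

      {:ok, 0} ->
        :ok
    end
  end
end

# Stubbed checkers stand in for the API call in this example.
IO.inspect(PasswordPolicySketch.validate("password", fn _ -> {:pwned, 3533661} end))
IO.inspect(PasswordPolicySketch.validate("something-long-and-unique", fn _ -> {:ok, 0} end))
```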

That's basically it. We have Pwnex working - it takes our input as a function argument, talks to an API through an Erlang HTTP client, parses its response body, splits it into a list of hash tails and finds any pwns for the given password.

As you can see, this whole package does quite a bit in 65 lines of code.

In this article, we saw how we can create a new package and use it to communicate with an API over HTTP. We learned why you don't always need HTTPoison, how you can parse a response body and how you can mingle with some data.

If you would like to see the actual code that we wrote in this article, head over to its repo on GitHub.

Top comments (3)

Erebos Manannán • Dec 24 '18

A couple of quick comments:

You keep claiming the "hash" function returns only the first 5 characters of the hash, which it does not, and should not or the thing wouldn't work.

It seems quite pointless to go through a transformation of a clear list into an inefficient format. It's something I see pretty regularly and it confuses me why not do the matching when looping through the result the first time, so e.g.

Split into lines with \r\n like you already do
Filter into lines whose start matches the hash tail
Split the results (at most one) with the :
Return the number after the : or nil

Ilija Eftimov • Dec 24 '18 • Edited

That's actually a very solid point. It's probably a trap of some sort where the author wants to make the article a bit more interesting for the reader which backfires. Both of your points are correct - the wording should be fixed and the general algorithm should be simplified, which I will do ASAP. Thanks for reading & the feedback!

Erebos Manannán • Dec 24 '18

Actually there's a small problem with the solution I suggested as well - technically two different passwords may end up with the same SHA1 hash result, so in the last step you should return a sum of them :)
