DEV Community

Cover image for Hashmaps like you've never seen them
Carlos Roso
Carlos Roso

Posted on • Edited on

Hashmaps like you've never seen them

Enter hashmaps. You're using them every day, yet you might not be aware of how it works or why you're preferring them. Why do I say it's the best? not only it is your silver bullet in any coding interview, but it also gives you free performance gains in your day-to-day code.

Hashmaps disguised as objects or sets

Open your last project and you're almost certainly using hashmaps all over the place. If you're writing JavaScript, they are just plain objects { name: 'carlos' } (not strictly, see the final section). Java uses the HashMap object, Python calls them dictionaries, and Go has maps. Let's see a TypeScript example, just for the sake of the illustration.

const ages: Record<string, number> = {
  carlos: 28,
  leena: 34,
  luigy: 14,
  maria: 54
};

So, you're familiar with the concept - you know how it looks like, but, what is it?

A ridiculously efficient way to read and write

A hashmap is a data structure (i.e. a piece of code to manage data) which is absurdly efficient to store and retrieve data to/from memory. Think of it as a table with pairs. A 'key' is how you identify a 'value'. In the previous section, the keys are 'carlos', 'leena, 'luigy', and 'maria'; the values are 28, 34, 14, and 54.

Hash. As for 'hash' function

A hashmap is an array on steroids. The difference is that you don't access the array with numeric indices, but instead with objects. How come so? How do you point to something in the array if you don't have its index?

Simple: if you don't have the array index, you need a way to get it from the object.

Hash illustration

The fire is the magic. That magic is called the Hash Function. It's a way to turn something (string, number, object) into a position within an array. Every programming language has its own esoteric way to write this function. A hashmap can be terribly inefficient or blazing fast depending on its hashing function.

What does a terrible hashing function look like, then?


I sent this post weeks ago to +700 devs on my email list. Join here to get my tips and thoughts on algorithms and career growth. If email is not your thing, follow me on Twitter to get sneak peeks of what I'm working on.


Counting chars

Let's implement a terrible hashing function that only works for keys as strings. It'll map keys to an array of length 10. Here's how our hash function works:

Our hashing function h(x): Convert every character of string x to its numeric position in the alphabet. Sum all these numbers. Divide it by 10 and take its reminder (aka modulus %).

Sounds complex, better to show an example:

  • h('abc') = (1+2+3) % 10 = 6 % 10 = 6
  • h('carlos') = (3+1+18+12+15+19) % 10 = 68 % 10 = 8

This means that the map { carlos: 28 } will store the value 28 in the position 8 of its underlying array. Here's a more visual representation of the hashing function concept.

Hash Function

Free perf gains

By this point, you're sold on the concept. You understand why it's so fast and how it works internally. But, why should you care? How can you apply this concept in real life?

Let's work on an algorithm you might easily come across in your day-to-day job: Find the most common item in a list of words. For example, in the list [4,1,3,4] the most common element is '4'. There are many ways to solve this but we'll focus on 2 of them here:

  1. For every element, find all its duplicates. Return the element that has most duplicates.
  2. Count the occurrences of each element. Pick the element with most appearances.

The first approach requires a nested loop (for every element i, loop over all other elements j to search this item); the second approach will only need a hashmap. Let's see how they compare in terms of operations.

Comparisson

If you don't understand why 21 and 9 operations, think about this:

  1. In the first approach, you visit the first item 'carlos' and then visit the rest of the items to find something similar. That is 6 visits. Then you visit the 2nd item 'edgar' and go through the remaining 4 items. That is 5 visits. Keep doing it until you're left with just 1 element, sum all the visits you pay to the numbers, and you're left with 21 operations.

  2. In the second approach, you visit the first item 'carlos' and save it in a hashmap with value 1. Every time you find another 'carlos', you'll increment its value in the hashmap. That is, you need to go over each element just once, making it 6 visits. The hashtable ends up with 3 items, so you'll need to go over each one of these to find out which one appeared the most (max number). That sums up to 9 operations.

The larger the input, the better the perf

You might think: "that's a dumb example just to make your point, 21 vs 9 is nothing for my CPU". Let's see what happens when you have an array of 10, 20, 30, ..., 100 items.

BigO Comparisson

As you can see, our algorithm performs much better when using the hashmap than when using the nested loop. Algorithms should be judged according to their performance when the input approaches infinity. In fact, the first approach runs in quadratic time O(n^2) whereas the one with the hashmap is linear O(n). That's BigO notation.

Knowledge is power. You have now an additional piece of power to make your code a bit more efficient with Hashmaps!

Keep curious

This was just an introduction to a data structure I personally find very useful and interesting (literally the best one you can ever learn). That being said, this was just the tip of the iceberg of a huge topic. There's a lot of complexity involved when dealing with hashmaps. If you feel intrigued, I invite you to keep researching the following topics:


Let me know on Twitter if you got some value from this post and learned something new. Be more mindful about the code you write, think about performance and the stuff underneath. Hashmaps will also help you ace a lot of interviews in your career, I've been there.


I wrote an e-book to help you

My mission is to help devs grow by posting tips to write CVs, teaching algorithms, or talking about my experience working on top remote workplaces. I wrote an ebook to help you land a job in Toptal. If that's something you're interested in, sign up here and get it in my next email.

Alt Text

Top comments (9)

Collapse
 
szhabolcs profile image
Nagy Szabolcs

Really interesting and educational article! I really liked it! I had to deal with hashmaps before but didn't quite understood them, but now I feel like everything is more clear now. πŸ˜„
And also I was wondering did you make the illustrations yourself or did you use some kind of tool? They are looking really good 😊

Collapse
 
caroso1222 profile image
Carlos Roso

Thanks for the kind words mate! I made the illustrations myself using Sketch. I'm kind of a frustrated designer so I use these posts to keep my creative juice flowing haha.

Collapse
 
wouldmake profile image
Marcelo Faria

I liked the reading, but in the topic "Counting chars", you wrote:
"This means that the map { carlos: 28 } will store the value 28 in the position 3 of its underlying array."
The correct thing would be position 8, right?

Collapse
 
caroso1222 profile image
Carlos Roso

Good catch. I'll edit right away. Thank you Marcelo!

Collapse
 
patricktingen profile image
Patrick Tingen

Thanks for this great article, I really like the tone of your writing, it is fun to read

Collapse
 
caroso1222 profile image
Carlos Roso

Thanks Patrick! I do it to help and I'm glad you found it useful.

Collapse
 
antoniogamiz profile image
Antonio

Nice article and pretty good graphs!! What tool have you used to generate them? :o

Collapse
 
caroso1222 profile image
Carlos Roso

Thanks Antonio! I used Sketch, built them from scratch.

Collapse
 
olsard profile image
olsard

Great and easy to read. Thanks for sharing!