Carlos Roso

Posted on Jul 9, 2020 • Originally published at carlosroso.com

Are these lists equal?

#beginners #webdev #tutorial #computerscience

This is an entry-level algorithm to get you warmed up for tough interviews like those you get in Amazon, Google, Toptal, and top remote work sites. It'll get you exposed to a real-life problem with time complexity analysis and data structure exploration.

The problem

Given 2 unsorted lists, determine if both lists contain the exact same items. You can assume that there's no duplicate item in any of the lists.

1. Brute force makes it work

Rule #1 to solve algorithms: Come up with a brute force solution first. The best solution is simply the worst one after some rework. How do you normally check if an item exists in two lists? you traverse both lists and try to find the same item in both: "For every element in list 1, loop through list 2 and try to find it".

The figure above shows the comparisons between the lists. Compare 1 to all the items in list 2; then compare 6, and so on. But, what's that n2 weird curve at the right? Let's talk about time complexity.

BigO: n^2 for the nested loop

Make sure to read this quick recap on BigO complexity if you feel lost about this topic. For every item in list 1, you need to loop over all the items in list 2. That is, take item 1 and visit all 5 items in list 2. Rinse and repeat for the remaining items 6, 2, 9, and 3. That results in 25 comparisons (5+5+5+5+5). A list with 10 items would result in 100 operations. That's why we say out algorithm resolves in quadratic time O(n^2).

I sent this post weeks ago to +950 devs on my email list. Join here to send you my tips and thoughts on algorithms and career growth. If email is not your thing, follow me on Twitter to get sneak peeks of what I'm working on.

2. Sorting makes it better

Brute force proved to be effective but not efficient (too many comparisons). Let's get clever. What if we sorted both lists and then compare items by their indices? After all, if both lists are sorted, the first items should be identical (i=0), same for the second items (i=1), and so on.

As you can tell from the illustration, we first sort the lists, and then we make only one comparison per item. We stop when we find a mismatch on 2 items with the same index.

BigO: nlogn for the sorting algorithm

As previously explained, sorting arrays bring benefits but it also comes with a cost. The sorting cost is normally understood as O(nlogn). This complexity is better than our brute force solution, though, so we made our algorithm a bit more efficient.

3. Counting makes it perfect

O(nlogn) is not "as cool" as we'd like our algorithm to be. Can we do better without sorting, and without a lot of rework? Let's think about this.

If two lists contain the same elements, then I can find the same amount of 1s in list 1 than in list 2. If I tell you "I have two 1s and three 2s in my list" you can then count the occurrences of the numbers in your lists and verify we both have the same frequencies. Can you spot the pattern now?

We're changing the problem from "are these list equal?" to "verify both lists have the same frequencies for all items". We now have a 'counting' problem: count the frequency of all items in list 1 and compare it with the frequency of all items in list 2. This is what hashmaps are useful for.

In the image above, we loop over every item and register in our "frequency" hashmap (i. count items). We build a hashmap for every list. For example, we visit item 2 in list 1 and we add a new record in our hashmap: <2, 1>. This means "item 2 appears 1 times in list 1".

Finally, we loop over the keys in the hashmap #1 and compare the values with the same key in hashmap #2 (ii. compare counts). If the key is not found in hashmap #2, then the lists are not equal.

It seems like this is a lot more work, so, is this really more performant?

BigO: n for just one pass at the lists

Let's count the operations:

Loop over list 1 only once. That is 5 items.
Loop over list 2 only once. That is 5 items.
Loop over the keys in the hashmap. That is 5 items (worst case).

This results in 15 operations. If we had 10 items, this would be 30 operations (10+10+10). Hopefully, you can see a linear correlation here: for lists of length n, we do 3n operations. In computer science, we often simplify O(3n) to O(n).

We finally arrived at an algorithm with linear execution time. When you understand that O(n^2) is worst than O(nlong) which is worst than O(n), then you realize we have arrived at our most performant solution.

Note: This algorithm will also work for lists with duplicate items. Can you understand why? Explain us in the comments!

Show me the code

Let's see how our most performant solution looks like in code. I'm doing it in JavaScript just for familiarity.

function areListsEqual(list1, list2) {
  if (list1.length !== list2.length) return false;

  // count items from list 1
  const map1 = {}
  for (let item of list1) {
    map1[item] = (map1[item] || 0) + 1
  }

  // count items from list 2
  const map2 = {}
  for (let item of list2) {
    map2[item] = (map2[item] || 0) + 1
  }

  // compare the counts, return false if counts dont match
  for (let item in map1) {
    if (map1[item] !== map2[item]) return false
  }

  return true
}

areListsEqual([1,6,2,9,3], [3,1,9,2,8])

Here's a blitz with the working code. Can you adjust it to work with 3 lists? Send your blitz in the comments!

Key takeaways

If you can only remember one thing from this post, let it be one of these:

Find brute force solutions first.
Sorting is nice, but it comes at a cost.
Hashmaps will land you the job.

I wrote an eBook to help you

My mission is to help devs grow by posting tips to write CVs, teaching algorithms, or talking about my experience working on top remote workplaces. I wrote an eBook to help you land a job in Toptal. You can sign up here and get it right away.

🚀 pgai Vectorizer: SQLAlchemy and LiteLLM Make Vector Search Simple

We built pgai Vectorizer to simplify embedding management for AI applications—without needing a separate database or complex infrastructure. Since launch, developers have created over 3,000 vectorizers on Timescale Cloud, with many more self-hosted.

Top comments (20)

Julian R Figueroa • Jul 10 '20 • Edited

I like a lot these kinds of exercises, very nice diagrams and a way to explain it!

Did you know it can be solved with just one frequency map by adding/removing the key? Same O(n), but two-thirds of the operations and half the memory :)

Check (and run) this snippet: play.golang.org/p/ZR5QHwZPAn8

This is only valid because of:

You can assume that there's no duplicate item in any of the lists.

Very cool articles man! 🚀

Carlos Roso • Jul 10 '20

Thanks mate! Your support is always much appreciated 🙏🏼

You got a point, yeah! Much more performant in time and space indeed.

Rob Waller • Jul 10 '20

This is an interesting post and a great explanation of the problem.

But I must ask what is the point of an interview question where someone can basically Google the answer? This doesn't really tell you a lot about the developer.

It saddens me we see this a lot in the tech industry.

Carlos Roso • Jul 10 '20

Good point. There's no absolute answer to this question. This has been and will be a heated discussed topic.

It doesn't sadden me, though. I wouldn't apply to work at such a company if it doesn't align to my values, which would be your case. And that's fine. It's just another type of interviewing.

Truth is, if your goal is work at a FANG company or a top remote company, chances are you'll need to do algorithms. No need to be sad about it, though, just avoid such companies and find those you feel comfortable with.

Rob Waller • Jul 10 '20

Admittedly algorithms aren't the worst example of "obscure tech question bingo" which a lot of organisations base their interviews around.

It saddens me because I don't think it's a great way to find good developers and will result in plenty of false negatives. So I believe good developers will needlessly miss out on good opportunities. Assessing the way a developer works rather than what they know at a given point in time is a better approach for finding good developers IMHO because the former is harder to teach than the latter.

But as you say this is for FANGs and I have never worked at a FANG and probably never will. And I imagine most of the developers they interview are of a high quality regardless.

Andrey Goncharov • Jul 10 '20

A great example of how one can sacrifice memory for CPU time! However, what is the point of counting if by the original condition "You can assume that there's no duplicate item in any of the lists". We might as well use sets here for simplicity.

Carlos Roso • Jul 10 '20

Yeah, a set would work perfectly well. That's the kind of realization I wanted from the post. I'd rather write about the general use case to have readers drill down to more optimal solutions.

Hong Jiang • Jul 10 '20

You could build a map with list 1, and then iterate over list 2 to compare each element with its count in the map. The asymptotic complexity is the same, but the code would be more concise and efficient (2 loops instead of 3).

Carlos Roso • Jul 10 '20

That's right. That's the kind of response I was expecting from this post. My O(n) solution is clear enough to lay a foundation for non-asymptotic optimizations like yours or that of the first comment. Thanks for contributing!