TL;DR? Check at least the sections "Time for the comparisons!" and "But wait! There's more" ;)
Table of Contents
1. The Problem
2. How Do We Tackle That?
3. My First Thoughts
4. Improve on The First Try
5. Comparison Between .forEach(), .map() and for
6. Time For The Comparisons!
7. But Wait! There's More
8. Conclusion
I was recently asked to complete a coding exercise before a job interview. Out of the 3 questions, one felt particularly easy: I coded the answer in 5 minutes. But then... Then a problem occurred.
The Problem
My code passed 13 out of 15 tests. For the last two, it would not give an output in due time. It was not optimized enough (it wasn't optimized at all, if you ask me). Unfortunately, I only had 1 min left, and I couldn't find a proper solution in time. But then I thought some more, imagined a few possible optimized solutions, and ran a few tests of my own to verify them.
Here is the exercise: from a given array filled with numbers, you should return all the possible combinations whose sum matches a given target (and return each possible combination only once).
For instance, from arr = [1, 5, 2, 4, 4] with target = 6, you should return [1, 5] and [2, 4] (or '4,2', '5,1'; the presentation and the order don't matter). Easy enough.
Of course, said array could be much bigger (say 20000 elements). This is when optimization kicks in.
How Do We Tackle That?
What do we have to solve that kind of problem?
- Array.map() and Array.flatMap(), which take a callback and create a new array from the existing one. (.flatMap() returns, as its name suggests, a flattened array.)
- Array.forEach(), which also takes a callback, loops through the array, and runs whatever you ask it to run (say, .push() the result into another empty array).
- Or a plain, basic for loop. In the end, for what we are after, it works a bit like a .forEach(): you first create an empty array, then push whatever the loop returns.
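To make the difference concrete, here's a tiny sketch (my own toy example, not part of the exercise) of the same kind of transformation written with each tool:

// Doubling numbers, four ways
const nums = [1, 2, 3]

const withMap = nums.map(n => n * 2)              // [2, 4, 6]
const withFlatMap = nums.flatMap(n => [n, n * 2]) // [1, 2, 2, 4, 3, 6], flattened one level

const withForEach = []
nums.forEach(n => withForEach.push(n * 2))        // [2, 4, 6]

const withFor = []
for (let i = 0; i < nums.length; i++) {
  withFor.push(nums[i] * 2)                       // [2, 4, 6]
}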
From here, we don't have so many options. We have one array, where each element must be associated with every other element exactly once if the sum is correct. What this means is:
- We need a new array
- We need to loop over the array, to get each element
- And we need to loop a second time, to get the second element
A loop inside a loop, that's never ideal. But at this point I couldn't find a better solution.
There is actually a way to stay in linear time. Thanks to Vincent A. Cicirello in the comments who pointed this out to me. If you:
- store y = target - value, for each value of the array, in another array or Set B,
- check, for each value of the array, whether it exists in B (if it does, [value, target - value] is a valid pair),
- and make sure target / 2 appears at least twice (because if your target is 6, [3, 3] is a valid answer; but if you only have one 3 in the array, you should remove [3, 3] from the pairs),
then your function takes way less time. That is better and smarter in every aspect than what I'm developing here below.
I won't linger on this in the article, though, because I want to compare .forEach()
, .map()
and for
in a similar environment, which is here a quadratic time one.
My First Thoughts
Ok, we know where we're heading.
As we are dealing with an array, my first thought was to use a .forEach
. (Don't ask me why I didn't start with a .flatMap
, which seemed called for since we are basically returning a new array. My excuse is that I was stressed.)
Here's my code, commented for the beginners among you:
function combine(arr, target) {
// I initiate an empty array
const pairs = []
// I loop once over arr
// I keep in memory the element (e) and its index (i)
arr.forEach((e, i) => {
// I loop a second time over arr,
// and similarly keep in memory the element `el` and its index `j`
arr.forEach((el, j) => {
// I compare `i` and `j` to make sure I'm not
// combining the same element with itself,
// and I check that the sum meets the target
if (i !== j && e + el === target) {
// By using a ternary operator to assign a value,
// I make sure each pair is presented
// with the smallest number first
const pair =
e <= el
? [e, el]
: [el, e]
// Two similar arrays [1, 2] are two distinct objects.
// To compare them, I use a trick with `JSON.stringify()`,
// because... Well, because strings are comparable.
pairs.push(JSON.stringify(pair))
}
})
})
// Finally, I return a set from the array,
// to make sure I have each combination only once
return new Set(pairs)
}
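Just to check it behaves as expected on the sample array from earlier, here's what a quick call would return (this check isn't in the original exercise; output shown as a Node console would print it):

combine([1, 5, 2, 4, 4], 6)
// → Set(2) { '[1,5]', '[2,4]' }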
Improve on The First Try
You might see a few problems with my function already.
For instance, why stringify the combined array? We could just build a string directly. Remember: the form doesn't matter. That's a good guess: removing JSON.stringify() means removing a function call, so less computing time. This is not the crucial point, though.
The true problem here is that I loop over elements which have already been used up. Take [1, 2, 3]
: once I returned [1, 2]
, I don't need to return [2, 1]
. Instead of looping over the whole array the second time, I could start at i + 1
, the index of the current element + 1. Let's use .slice(i + 1)
to do it.
One could even argue that the Set
is now useless. That's true if all the elements in the array are different, but not if you have duplicates. So let's keep it.
Here's the better version:
function combine(arr, target) {
const pairs = []
arr.forEach((e, i) => {
arr.slice(i + 1).forEach((el, j) => {
if (e + el === target) {
const pair =
e <= el
? `${e}, ${el}`
: `${el}, ${e}`
pairs.push(pair)
}
})
})
return new Set(pairs)
}
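A quick call on the same sample (my own check) also shows why the Set still earns its keep: the duplicated 4 makes '2, 4' get pushed twice before deduplication.

combine([1, 5, 2, 4, 4], 6)
// pairs contains [ '1, 5', '2, 4', '2, 4' ] before deduplication
// → Set(2) { '1, 5', '2, 4' }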
Comparison Between .forEach(), .map() and for
That's better, but it wasn't good enough to pass the last 2 tests. I didn't have time to do better, but after the timer stopped, I ran some checks.
So what do you think? Which one of .forEach()
, .map()
or for
is the fastest, and to what extent?
Let's try. We will run test1()
, test2()
and test3()
to see the difference.
First, we initiate our array:
// An array of 10000 elements, each corresponding to its index
const arr = Array.from({length: 10000}, (e,i) => i)
// Let's double the array for fun (and to create duplicates)
const pirate = [...arr, ...arr]
const target = 10000
The first test will check the time taken for .forEach()
.
function test1(arr, target) {
let startTime = new Date()
function combine(arr, target) {
const pairs = []
arr.forEach((e, i) => {
arr.slice(i + 1).forEach((el, j) => {
if (e + el === target) {
const pair =
e <= el
? `${e}, ${el}`
: `${el}, ${e}`
pairs.push(pair)
}
})
})
return new Set(pairs)
}
combine(arr, target)
return new Date() - startTime
}
Then, we'll check the time taken for a .map().
function test2(arr, target) {
let startTime = new Date()
function combine(arr, target) {
const pairs = arr.flatMap((e, i) =>
arr
.slice(i + 1)
.map(el => {
if (e + el === target) {
return e <= el
? `${e}, ${el}`
: `${el}, ${e}`
}
})
.filter(e => e)
)
return new Set(pairs)
}
combine(arr, target)
return new Date() - startTime
}
Finally, we'll test the simple for
loop.
function test3(arr, target) {
let startTime = new Date()
function combine(arr, target) {
let pairs = []
for (let i = 0; i < arr.length; i++) {
for (let j = i + 1; j < arr.length; j++) {
if (arr[i] + arr[j] === target) {
const pair = arr[i] <= arr[j]
? `${arr[i]}, ${arr[j]}`
: `${arr[j]}, ${arr[i]}`
pairs.push(pair)
}
}
}
return new Set(pairs)
}
combine(arr, target)
return new Date() - startTime
}
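For the record, here is roughly how the measurements can be collected. This small runner isn't shown in the original; the runTimes helper and the use of the duplicated pirate array are assumptions on my part:

// Run a test function n times and average the measured durations (in ms)
function runTimes(testFn, arr, target, n = 10) {
  let total = 0
  for (let i = 0; i < n; i++) {
    total += testFn(arr, target)
  }
  return total / n
}

console.log(runTimes(test1, pirate, target)) // average for .forEach()
console.log(runTimes(test2, pirate, target)) // average for .map()
console.log(runTimes(test3, pirate, target)) // average for the for loop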
Time For The Comparisons!
It's time for the results. But first: what are you expecting? What would your podium be?
To avoid outliers skewing the results, I run each test 10 times and calculate the average delay (in ms). Which gives us:
forEach() | map() | for |
---|---|---|
3212.9 | 16618.1 | 787.2 |
So it seems we have a winner. A really, really clear winner.
Are you as astonished as I was?
I mean, OK, for a seasoned developer, the victory of the for
loop won't be much of a surprise. It is well documented after all. But to this extent? Boy, was I not expecting that!
It turns out the for
loop is more than 4 times faster than .forEach()
and 21 times faster than the .map()
!
In fact, the margin is so big that even if I test the for loop without slicing the array at the current element for the second loop, it remains faster than the other options (the "without .slice()" variants are sketched right after the list). In the end, from fastest to slowest, the solutions are:
- for (with j = i + 1)
- for (with j = 0)
- .forEach() with .slice()
- .forEach() without .slice()
- .map() with .slice()
- .map() without .slice()
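To be clear about what "without .slice()" / "with j = 0" means, here is a sketch of that variant for the for loop (the combineNoSlice name is mine):

function combineNoSlice(arr, target) {
  const pairs = []
  for (let i = 0; i < arr.length; i++) {
    // Start from 0 instead of i + 1, so every pair is visited twice
    for (let j = 0; j < arr.length; j++) {
      if (i !== j && arr[i] + arr[j] === target) {
        const pair = arr[i] <= arr[j]
          ? `${arr[i]}, ${arr[j]}`
          : `${arr[j]}, ${arr[i]}`
        pairs.push(pair)
      }
    }
  }
  return new Set(pairs)
}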
Update on the linear time solution
It doesn't make much sense to compare this to the previous results, but just for fun, how long does a truly optimized function take?
Let's code the method suggested by Vincent A. Cicirello in the comments:
function combine(arr, target) {
  // Condense the array to its unique values: a Set lookup is constant time
  const set = new Set(arr)
  let pairs = new Set()
  let potentialPairings = new Set()
  // First linear pass: store every complement (target - value)
  set.forEach(e => potentialPairings.add(target - e))
  // Second linear pass: a complement that exists in the original Set forms a valid pair
  potentialPairings.forEach(pairing => {
    if (set.has(pairing)) {
      const pair = pairing > (target - pairing)
        ? `${target - pairing}, ${pairing}`
        : `${pairing}, ${target - pairing}`
      pairs.add(pair)
    }
  })
  // Edge case: [target/2, target/2] is only valid if that value appears at least twice
  if (arr.filter(n => n === (target / 2)).length < 2) pairs.delete(`${target / 2}, ${target / 2}`)
  return pairs
}
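A quick check on small inputs (these example calls and their outputs are my own, printed as a Node console would):

combine([1, 5, 2, 4, 4], 6)
// → Set(2) { '1, 5', '2, 4' }

combine([3, 1, 5], 6)
// '3, 3' is first added, then removed because 3 only appears once
// → Set(1) { '1, 5' }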
With 10000 items duplicated, this takes a whopping 5 ms to compute. And what do we learn from this, kids? That you should really, reaaaally put some effort into keeping your function linear in time. Don't follow my lead.
But Wait! There's More
What could there be left to be said? We have a podium, and that article is already too long!
Well, as I was writing this, I found out that the results are very, very different if you remove the target. Basically, instead of returning every pair that matches a target, you return every single possible pair. That obviously takes a lot more time to compute.
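Concretely, the for version would look something like this (a sketch with a name of my choosing; the article's exact test code for this case isn't shown):

function combineAll(arr) {
  const pairs = []
  for (let i = 0; i < arr.length; i++) {
    for (let j = i + 1; j < arr.length; j++) {
      // No target check: every possible pair gets pushed
      const pair = arr[i] <= arr[j]
        ? `${arr[i]}, ${arr[j]}`
        : `${arr[j]}, ${arr[i]}`
      pairs.push(pair)
    }
  }
  return new Set(pairs)
}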
Because I like my laptop, I tested with an array of 3000 instead of 10000. Here are the results:
forEach() | map() | for |
---|---|---|
9147.1 | 9097.4 | 8892.2 |
The results are much tighter. for
remains the winner, but in this case all three options are fairly similar - and .map()
is consistently, although slightly, better than .forEach()
.
Conclusion
Two things are certain:
- If you can find a way, remove those nested loops.
- for is your go-to if you need an optimization boost and you cannot reduce the number of nested loops. Additionally, the fewer matches it finds, the faster for gets, and the bigger the gap between for and the rest. So if you have a very specific condition at the start of the second loop, it will trim the computing time (see the sketch below).
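For instance, assuming the array only contains non-negative numbers (an assumption of mine, not something the exercise guarantees), you could skip whole inner loops; the combineNonNegative name is also mine:

function combineNonNegative(arr, target) {
  const pairs = []
  for (let i = 0; i < arr.length; i++) {
    // With only non-negative numbers, an element larger than the target
    // can never be part of a valid pair, so skip the inner loop entirely
    if (arr[i] > target) continue
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[i] + arr[j] === target) {
        pairs.push(arr[i] <= arr[j] ? `${arr[i]}, ${arr[j]}` : `${arr[j]}, ${arr[i]}`)
      }
    }
  }
  return new Set(pairs)
}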
For the rest, to be honest, I'm still trying to figure out where the huge differences in results come from - and I'm talking about the difference between the 3 options in the first tests, as well as the difference between the first tests and the second ones. If you have an answer or an assumption, please leave a comment, I'll take it! :)
Top comments (3)
There's another potential optimization that may have a meaningful impact. If there can be a large number of duplicates, then your pairs may grow large, creating unnecessary overhead. Instead of forming the Set at the end, try initializing an empty Set before your loops, and then, instead of adding to pairs, add directly to the set.

Yet another optimization, even more substantial, would be to generate a Set of the numbers in the original array so that you condense it to the unique integers that it contains. Then iterate over the unique integers x in that Set, compute y = target - x, and check if the Set contains y. A lookup in a Set is constant time. In this way, you'd need 1 linear pass over the original array, followed by 1 linear pass over the unique integers. You'd end up with a linear time solution instead of your current quadratic time solution.
Oh thanks!
Your first suggestion is painfully obvious, I'm a bit ashamed I did not see it. ^^' Although after a closer look, with an array of 20000 (10000 duplicated), it's actually sliiiightly slower (~ +50 ms, whatever the method). I also tried with an array of 10 replicated 2000 times, with the same result. Maybe it starts having effects well beyond those measures? [Edit: nope, doesn't seem to be the case. With 200,000 items and a for loop, it's several seconds slower to initiate pairs as a Set.]
Anyway, I really didn't think about your second suggestion, although I tried to find a way to avoid looping inside a loop. For the simple reason that you might have one number twice whose combination is valid ([3, 3] to get 6), so a Set didn't seem to be the solution. But that was silly of me, as there's a simple way to check for this case. With your idea I made the function shown in the "Update on the linear time solution" above and tested it.
And this takes a whopping 5 ms to run (with 10000 items duplicated).
I think I'm going to edit my article, thanks. :)
You're welcome. I'm guessing that if your original solution passed all but a couple of their tests due to time, they probably had at least one test large enough that you needed a linear time solution like this one. They probably weren't concerned with the for vs forEach distinction.