<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mustafa Haddara</title>
    <description>The latest articles on DEV Community by Mustafa Haddara (@mustafahaddara).</description>
    <link>https://dev.to/mustafahaddara</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F119147%2F82f4ebf2-5377-4034-9acb-d94dddc48628.jpeg</url>
      <title>DEV Community: Mustafa Haddara</title>
      <link>https://dev.to/mustafahaddara</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mustafahaddara"/>
    <language>en</language>
    <item>
      <title>Javascript Objects Considered Unsafe</title>
      <dc:creator>Mustafa Haddara</dc:creator>
      <pubDate>Tue, 23 Feb 2021 17:07:27 +0000</pubDate>
      <link>https://dev.to/mustafahaddara/javascript-objects-considered-unsafe-34ob</link>
      <guid>https://dev.to/mustafahaddara/javascript-objects-considered-unsafe-34ob</guid>
      <description>&lt;p&gt;Every year, I like to teach myself a new language by working through the &lt;a href="https://adventofcode.com/"&gt;Advent of Code&lt;/a&gt; problems. This year, I wanted to dive into typescript. I've written a fair amount of JS before, and I like having explicit typechecking in my code, so TS felt like a good fit. &lt;/p&gt;

&lt;p&gt;Everything was going great until I hit &lt;a href="https://adventofcode.com/2020/day/15"&gt;Day 15&lt;/a&gt;, and my solution worked well for the small case but was painfully slow in the large case.&lt;/p&gt;

&lt;p&gt;You should definitely click through and read the problem statement, but to summarize:&lt;/p&gt;

&lt;p&gt;You start off with a comma-separated list of numbers. This is your problem input. The challenge is to continue the sequence of numbers. Each number in the sequence (after the starting set) depends on the number immediately before it. If that previous number has never appeared earlier in the sequence, our new number is &lt;code&gt;0&lt;/code&gt;. If it &lt;em&gt;has&lt;/em&gt; appeared earlier in the sequence, our new number is the number of steps between its last appearance and now.&lt;/p&gt;

&lt;p&gt;For example, let's say we start with &lt;code&gt;1,2,3&lt;/code&gt; as our input. The next numbers would be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To determine the fourth number, we'd look at the third number (&lt;code&gt;3&lt;/code&gt;). &lt;code&gt;3&lt;/code&gt; has never appeared in the sequence before, so the fourth number is &lt;code&gt;0&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;To determine the fifth number, we'd look at the fourth number (&lt;code&gt;0&lt;/code&gt;). &lt;code&gt;0&lt;/code&gt; has never appeared in the sequence before, so the fifth number is &lt;code&gt;0&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;To determine the sixth number, we look at the fifth number (&lt;code&gt;0&lt;/code&gt;). &lt;code&gt;0&lt;/code&gt; has appeared in the sequence before, as the fourth number, so we take the difference (position 5 - position 4 = 1) and that difference is our sixth number: &lt;code&gt;1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;To determine the seventh number, we look at the sixth number (&lt;code&gt;1&lt;/code&gt;). &lt;code&gt;1&lt;/code&gt; has appeared in the sequence before, as the first number, so we subtract (position 6 - position 1 = 5) and get &lt;code&gt;5&lt;/code&gt;. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can continue this pattern forever.&lt;/p&gt;

&lt;p&gt;(Numberphile has a &lt;a href="https://www.youtube.com/watch?v=etMJxB-igrc"&gt;wonderful overview&lt;/a&gt; of this sequence that goes into a bit more detail.)&lt;/p&gt;
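&lt;p&gt;As a quick illustration of the rule (this is just a sketch, not the solution I actually used): to extend the sequence, scan backwards for the previous occurrence of the last number.&lt;/p&gt;

```typescript
// Minimal sketch of the rule: find the gap back to the previous
// occurrence of the last number, or 0 if it has never appeared before.
const nextNumber = (seq: number[]): number => {
  const last = seq[seq.length - 1];
  for (let j = seq.length - 2; j >= 0; j--) {
    if (seq[j] === last) {
      return seq.length - 1 - j; // gap between the two most recent occurrences
    }
  }
  return 0; // `last` has never appeared before
};

const seq = [1, 2, 3];
while (seq.length !== 7) {
  seq.push(nextNumber(seq));
}
console.log(seq.join(',')); // 1,2,3,0,0,1,5
```

&lt;p&gt;Running this on &lt;code&gt;1,2,3&lt;/code&gt; reproduces the walkthrough above. (The backwards scan makes this O(n&amp;#178;) overall, which is exactly why the real solution below keeps a lookup table instead.)&lt;/p&gt;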

&lt;p&gt;Part 1 of the challenge is relatively simple: determine the 2020th number in the sequence. I decided to brute force it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;solve&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nums&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;spoken&lt;/code&gt; object is a &lt;code&gt;Record&amp;lt;number, number&amp;gt;&lt;/code&gt; where the keys are numbers and the values are the last index where that number appeared in the sequence. (Why "spoken"? See the Advent of Code description). On each iteration, we check if our &lt;em&gt;current&lt;/em&gt; number has been spoken before, and set our &lt;code&gt;next&lt;/code&gt; number accordingly. On the next iteration of the loop, &lt;code&gt;current&lt;/code&gt; becomes &lt;code&gt;next&lt;/code&gt;, and so we repeat this process for as many iterations as we need.&lt;/p&gt;

&lt;p&gt;So far, so good, but let's talk efficiency, because part 2 of the challenge is to calculate the 30 &lt;em&gt;millionth&lt;/em&gt; number in the sequence. Performance-wise, I'd claim that this runs in O(n) time: the run time scales linearly with the number of values we want to compute, i.e. computing 2000 numbers takes twice as long as computing 1000 numbers. This is because we use that &lt;code&gt;spoken&lt;/code&gt; object to do fast lookups of the last index at which a number appeared.&lt;/p&gt;

&lt;p&gt;After all, we generally treat insertion and lookups in an object as "cheap", i.e. constant time. &lt;/p&gt;

&lt;p&gt;The drawback is that we're using tons of memory. We'd describe it as using O(n) memory; the amount of space we use grows linearly with the number of values we want to compute. (I have a proof for this &lt;a href="https://en.wikipedia.org/wiki/Fermat%27s_Last_Theorem"&gt;but it's too large to fit in this margin&lt;/a&gt;...just kidding. As per the Numberphile video I referenced earlier, this sequence seems to grow linearly, but no one has proven that yet.)&lt;/p&gt;

&lt;p&gt;Since we're brute forcing the result, at the very least, we need to do one loop iteration per number we compute, so this O(n) runtime is the fastest we can get. If we had a magic math formula that could spit out the number at an arbitrary index, that would be much faster, but I don't have one (and as per that Numberphile video, neither do the mathematicians!).&lt;/p&gt;

&lt;p&gt;So, it sounds like we're very close to our theoretical peak performance. Let's try and compute the 30 millionth number then. &lt;/p&gt;

&lt;p&gt;And...when I do, it hangs. For upwards of 5 &lt;em&gt;minutes&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Sounds like something is wrong. Very, very wrong. I'm not sure where, though, so let's do some profiling.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;start_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1,2,3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20000000&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;solve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;start_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`took &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; ms to compute &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Basically, we're timing how long this method takes for successively larger counts of numbers to generate. If our solution were O(n), as I claimed above, we should see times that grow linearly: if 2000 takes 1ms, then 20,000 should take 10ms and 200,000 should take 100ms. &lt;/p&gt;

&lt;p&gt;Instead, these are the results I measured:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;took 0 ms to compute 2000
took 6 ms to compute 20000
took 90 ms to compute 200000
took 8552 ms to compute 2000000
took 351866 ms to compute 20000000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🤯🤯&lt;/p&gt;

&lt;p&gt;The jump from 200,000 to 2,000,000 is particularly interesting: it took us 95 times longer to compute 10x the numbers!  And going from 2,000,000 to 20,000,000 is also pretty bad: 10x the work took 41 times longer. &lt;/p&gt;
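&lt;p&gt;Those multipliers fall straight out of the timings above:&lt;/p&gt;

```typescript
// The measured timings from above, in ms, keyed by how many numbers we computed.
const times: { [n: number]: number } = {
  2000: 0,
  20000: 6,
  200000: 90,
  2000000: 8552,
  20000000: 351866,
};

// 10x the numbers took ~95x and ~41x longer, respectively.
console.log(Math.round(times[2000000] / times[200000]));   // 8552 / 90
console.log(Math.round(times[20000000] / times[2000000])); // 351866 / 8552
```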

&lt;p&gt;(Before I go any further, a quick note about performance metrics: all of these numbers were collected on my 2016 MacBook Pro running macOS 11.2, with a 3.1GHz i5 in it. I'm using node version &lt;code&gt;14.15.1&lt;/code&gt; and tsc &lt;code&gt;4.1.2&lt;/code&gt;. The absolute values of these numbers aren't important; what matters is the proportions.)&lt;/p&gt;

&lt;p&gt;So, what's going on with these numbers? When I first wrote this code and saw these results, I was extremely surprised. We already said it looks O(n), so why are we seeing huge non-linear spikes? Spoiler alert: my code wasn't O(n).&lt;/p&gt;

&lt;p&gt;Yeah, that's right. It's not O(n). Why not, you ask?&lt;/p&gt;

&lt;p&gt;Well, let's find out. Time to do some more profiling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;solve&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nums&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;read&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;write&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;last_spoken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="nx"&gt;read&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;last_spoken&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;last_spoken&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;write&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`reading took &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;read&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; ms`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`writing took &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;write&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; ms`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We do two things with our &lt;code&gt;spoken&lt;/code&gt; lookup object: we read a value out and write a value in. Here I've added code to time how long each part takes, in total, across the entire operation. The results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;reading took 0 ms
writing took 1 ms
took 11 ms to compute 2000

reading took 5 ms
writing took 7 ms
took 31 ms to compute 20000

reading took 55 ms
writing took 130 ms
took 289 ms to compute 200000

reading took 547 ms
writing took 9463 ms
took 10935 ms to compute 2000000

reading took 4480 ms
writing took 346699 ms
took 365872 ms to compute 20000000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Logs can be hard to read, so let's dump this into a table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;2,000&lt;/th&gt;
&lt;th&gt;20,000&lt;/th&gt;
&lt;th&gt;200,000&lt;/th&gt;
&lt;th&gt;2,000,000&lt;/th&gt;
&lt;th&gt;20,000,000&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;read&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;55&lt;/td&gt;
&lt;td&gt;547&lt;/td&gt;
&lt;td&gt;4480&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;write&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;130&lt;/td&gt;
&lt;td&gt;9463&lt;/td&gt;
&lt;td&gt;346699&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;As you can see, read performance is fine: roughly linear. Write performance, though, gets drastically worse as our object grows.&lt;/p&gt;

&lt;p&gt;There isn't much more data we can gather on our own. We're observing something in the behavior of the language and runtime we're using, so now it's time to see if anyone else has observed similar behavior, or if we've found a bug in node.&lt;/p&gt;

&lt;p&gt;Armed with this information, off to the interwebs we go. Searching for "js object vs map performance" turns up a number of results of people &lt;em&gt;profiling&lt;/em&gt; objects and maps, but I didn't find anything along the lines of what I was seeing. Then, I came across GitHub user &lt;code&gt;jngbng&lt;/code&gt;, who did &lt;a href="https://github.com/jngbng/set-vs-object"&gt;similar experiments&lt;/a&gt; that suggest objects perform &lt;em&gt;faster&lt;/em&gt; than Sets when there is a high number of common elements in the data, but much slower than Sets when most of the elements in the data are unique.&lt;/p&gt;

&lt;p&gt;That led me to a more subtle insight about our code. Object performance is demonstrably worse when there is high variance in the keys...and as we know, the sequence we're computing keeps producing numbers it has never produced before, which means we're constantly adding brand-new keys to our object.&lt;/p&gt;

&lt;p&gt;This insight led me to an article by &lt;a href="https://twitter.com/camillobruni"&gt;Camillo Bruni&lt;/a&gt;, an engineer working on the V8 JavaScript engine at Google, on &lt;a href="https://v8.dev/blog/fast-properties"&gt;how V8 handles objects gaining new properties&lt;/a&gt;. He also linked to an article by &lt;a href="https://twitter.com/mraleph"&gt;Vyacheslav Egorov&lt;/a&gt;, a compiler engineer at Google, on &lt;a href="https://mrale.ph/blog/2015/01/11/whats-up-with-monomorphism.html"&gt;how V8 handles property lookups on objects&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I will confess, a lot of the details in those two posts went over my head, but the summary is this: the V8 runtime is optimized for cases where you have many objects with the same sets of properties on them. Constantly adding new keys to our object breaks V8's property caches and then forces it to rebuild them each time we add a property to the object, which makes inserting new keys really slow. This is exactly what &lt;code&gt;jngbng&lt;/code&gt; found: objects with a small number of keys (or rarely-changing sets of keys) perform faster than Sets with the same keys.&lt;/p&gt;
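&lt;p&gt;To make that concrete, here's a rough micro-benchmark sketch (the size and the &lt;code&gt;key&lt;/code&gt; helper are my own illustration, not from any of those posts): insert many unique, spread-out numeric keys into a plain object and into a &lt;code&gt;Map&lt;/code&gt;, and compare.&lt;/p&gt;

```typescript
// Micro-benchmark sketch: many unique numeric keys, object vs. Map.
const N = 200000;

// Spread the keys across the 32-bit range so the object can't use a dense
// array-style backing store; the multiplier is odd, so the keys stay unique.
const key = (i: number): number => (i * 2654435761) % 2 ** 32;

let start = Date.now();
const obj: { [k: number]: number } = {};
for (let i = 0; i !== N; i++) {
  obj[key(i)] = i; // every key is brand new
}
console.log(`object inserts: ${Date.now() - start} ms`);

start = Date.now();
const map = new Map();
for (let i = 0; i !== N; i++) {
  map.set(key(i), i); // Map is designed for arbitrary, growing key sets
}
console.log(`map inserts: ${Date.now() - start} ms`);
```

&lt;p&gt;The exact numbers will vary by machine and node version; the interesting part is the relative gap between the two sides as the key count grows.&lt;/p&gt;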

&lt;p&gt;In our scenario, we add new keys (the numbers we compute) to our object very frequently, which means we very quickly start defeating V8's object property lookup caches!&lt;/p&gt;

&lt;p&gt;We can actually confirm this with some more profiling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;solve&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nums&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;read&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;insert&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;num_inserts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;update&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;num_updates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;last_spoken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="nx"&gt;read&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;last_spoken&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="nx"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;insert&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;num_inserts&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;last_spoken&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="nx"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;update&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;getTime&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;num_updates&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`reading took &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;read&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; ms`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`inserted &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;num_inserts&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; times for a total of &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; ms and &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;insert&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;num_inserts&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; ms on avg`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`updated &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;num_updates&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; times for a total of &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;update&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; ms and &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;update&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;num_updates&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; ms on avg`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, instead of timing all of the write operations as one block, we measure the insert operations (i.e. the cases where we write to new keys) separately from the updates. We also need to track the &lt;em&gt;number&lt;/em&gt; of inserts and updates we do, to make sure that if inserts are taking longer overall, it isn't simply because we did more of them.&lt;/p&gt;

&lt;p&gt;Sure enough, our numbers confirm our theory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;reading took 0 ms
inserted 380 times for a total of 0 ms and 0 ms on avg
updated 1619 times for a total of 1 ms and 0.0006176652254478073 ms on avg
took 34 ms to compute 2000

reading took 13 ms
inserted 3285 times for a total of 2 ms and 0.0006088280060882801 ms on avg
updated 16714 times for a total of 13 ms and 0.0007777910733516812 ms on avg
took 58 ms to compute 20000

reading took 41 ms
inserted 29247 times for a total of 54 ms and 0.001846343214688686 ms on avg
updated 170752 times for a total of 32 ms and 0.0001874062968515742 ms on avg
took 223 ms to compute 200000

reading took 389 ms
inserted 265514 times for a total of 7768 ms and 0.029256461052901164 ms on avg
updated 1734485 times for a total of 302 ms and 0.000174115083151483 ms on avg
took 9216 ms to compute 2000000

reading took 4429 ms
inserted 2441404 times for a total of 328123 ms and 0.1343993046623992 ms on avg
updated 17558595 times for a total of 4353 ms and 0.0002479127743421384 ms on avg
took 348712 ms to compute 20000000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And as a table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;2,000&lt;/th&gt;
&lt;th&gt;20,000&lt;/th&gt;
&lt;th&gt;200,000&lt;/th&gt;
&lt;th&gt;2,000,000&lt;/th&gt;
&lt;th&gt;20,000,000&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;average insert&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0006&lt;/td&gt;
&lt;td&gt;0.0018&lt;/td&gt;
&lt;td&gt;0.0293&lt;/td&gt;
&lt;td&gt;0.1344&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;average update&lt;/td&gt;
&lt;td&gt;0.0006&lt;/td&gt;
&lt;td&gt;0.0008&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;(numbers have been rounded to 4 decimal places)&lt;/p&gt;

&lt;p&gt;We can actually watch the average insertion time grow steadily as our object grows. Interestingly, although we consistently update existing values far more often than we insert new ones (roughly 4x to 7x), by the time we generate 200,000 values, we're spending more total time on inserts than on updates! &lt;/p&gt;

&lt;p&gt;And notice how the average update time stays relatively constant? That's V8's property caching optimization in action. As long as the object doesn't change "shape" (i.e. gain new keys), reads and writes of existing keys are effectively constant time.&lt;/p&gt;
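&lt;p&gt;To make the "shape" idea concrete, here's a small standalone sketch (illustrative only, not code from the solution; the function names are made up). The first function creates many objects that all share one shape, which V8 can optimize; the second keeps adding brand-new keys to a single object, changing its shape on every write:&lt;/p&gt;

```typescript
// Many objects, one shape: every `point` has exactly one key, `value`,
// so property accesses stay monomorphic and cacheable.
const manyObjectsSameShape = (n: number): number => {
  let total = 0;
  for (let i = 0; i < n; i++) {
    const point = { value: i };
    total += point.value;
  }
  return total;
};

// One object, many shapes: each iteration adds a key the object has
// never seen before, so the object's shape changes on every write.
const oneObjectManyKeys = (n: number): number => {
  const bag: Record<string, number> = {};
  for (let i = 0; i < n; i++) {
    bag['key_' + i] = i;
  }
  return Object.keys(bag).length;
};
```

&lt;p&gt;Both functions do a similar amount of work per iteration, but only the second one forces the engine to keep tracking new shapes.&lt;/p&gt;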

&lt;p&gt;So what's our fix? Simple: swap the &lt;code&gt;{}&lt;/code&gt; for a &lt;code&gt;Map&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;solve&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nums&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;spoken&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This implementation performed much, much better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;took 0 ms to compute 2000
took 15 ms to compute 20000
took 38 ms to compute 200000
took 262 ms to compute 2000000
took 4170 ms to compute 20000000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the 10x growth from 200,000 to 2,000,000 only took about 7x longer, and the 10x growth from 2,000,000 to 20,000,000 took about 16x longer. Both are much more reasonable multipliers than the roughly 40x jumps we were seeing before.&lt;/p&gt;

&lt;p&gt;If we make a similar change to time the individual sections of the code, we get logs like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;reading took 2 ms
inserted 380 times for a total of 0 ms and 0 ms on avg
updated 1619 times for a total of 0 ms and 0 ms on avg
took 3 ms to compute 2000

reading took 6 ms
inserted 3285 times for a total of 1 ms and 0.00030441400304414006 ms on avg
updated 16714 times for a total of 4 ms and 0.00023932033026205577 ms on avg
took 28 ms to compute 20000

reading took 39 ms
inserted 29247 times for a total of 6 ms and 0.00020514924607652066 ms on avg
updated 170752 times for a total of 31 ms and 0.00018154985007496252 ms on avg
took 170 ms to compute 200000

reading took 523 ms
inserted 265514 times for a total of 65 ms and 0.0002448081833726282 ms on avg
updated 1734485 times for a total of 278 ms and 0.00016027812290103402 ms on avg
took 1383 ms to compute 2000000

reading took 6507 ms
inserted 2441404 times for a total of 775 ms and 0.0003174402925529736 ms on avg
updated 17558595 times for a total of 3271 ms and 0.0001862905317879933 ms on avg
took 16278 ms to compute 20000000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add these new numbers to our table from before:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;2,000&lt;/th&gt;
&lt;th&gt;20,000&lt;/th&gt;
&lt;th&gt;200,000&lt;/th&gt;
&lt;th&gt;2,000,000&lt;/th&gt;
&lt;th&gt;20,000,000&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;td&gt;average insert&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0006&lt;/td&gt;
&lt;td&gt;0.0018&lt;/td&gt;
&lt;td&gt;0.0293&lt;/td&gt;
&lt;td&gt;0.1344&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;average update&lt;/td&gt;
&lt;td&gt;0.0006&lt;/td&gt;
&lt;td&gt;0.0008&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;map&lt;/td&gt;
&lt;td&gt;average insert&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0003&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;td&gt;0.0003&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;average update&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;td&gt;0.0002&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;(again, numbers have been rounded to 4 decimal places)&lt;/p&gt;

&lt;p&gt;That performance on a &lt;code&gt;Map&lt;/code&gt; is much, much better, and much more even. Notice how the average insertion time and the average update time barely differ on the &lt;code&gt;Map&lt;/code&gt;, no matter how large it gets? That's exactly the O(1) behavior we were hoping for.&lt;/p&gt;

&lt;p&gt;We can even go further with this-- we could swap in a pre-sized array instead of a &lt;code&gt;Map&lt;/code&gt; and get even faster performance. Leave a comment below if you know what this would look like :)&lt;/p&gt;
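&lt;p&gt;For the curious, here's one possible shape for that array-based version (a sketch, with the caveat that it assumes every spoken number, including the starting numbers, is smaller than &lt;code&gt;end&lt;/code&gt;). Since &lt;code&gt;Int32Array&lt;/code&gt; is zero-initialized and turn 0 is a valid value, we store &lt;code&gt;i + 1&lt;/code&gt; and treat 0 as "never spoken":&lt;/p&gt;

```typescript
const solveWithArray = (input: string, end: number): number => {
  const nums = input.split(',').map((l) => parseInt(l));

  // Every number spoken after the starting set is a gap between two turns,
  // so it's always less than `end`. That lets us pre-size a flat typed
  // array indexed by the spoken number itself.
  const spoken = new Int32Array(end); // zero-filled: 0 means "never spoken"
  let i = 0;
  let next = -1;
  while (i < end - 1) {
    const current = i < nums.length ? nums[i] : next;

    const lastSpoken = spoken[current];
    next = lastSpoken === 0 ? 0 : i - (lastSpoken - 1);
    spoken[current] = i + 1; // store i+1 so turn 0 isn't confused with "never"

    i++;
  }
  return next;
};
```

&lt;p&gt;On the sample input &lt;code&gt;0,3,6&lt;/code&gt; from the problem statement, this produces the same answers as the &lt;code&gt;Map&lt;/code&gt; version.&lt;/p&gt;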

&lt;p&gt;Anyways... what's our takeaway here? What did we learn? &lt;/p&gt;

&lt;p&gt;Number 1 is the most obvious: for data sets with large numbers of keys, prefer &lt;code&gt;Map&lt;/code&gt; over &lt;code&gt;{}&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;But it's not that simple: it's important to verify our assumptions. When I first wrote my function, a quick performance analysis suggested that we had written Performant™️ code, but when we ran it, it choked. And even though the mental model I had of objects is, I think, pretty common among JS programmers, the authors of the JS implementation you're using might make different tradeoffs that break that mental model.&lt;/p&gt;

&lt;p&gt;In our case, the V8 runtime is optimized for the "many objects with the same keys" case instead of the "single object with many keys" case, and that tradeoff made our code much slower. So our second takeaway here is this: our mental models of the world aren't always accurate, and it's important to verify those models. &lt;/p&gt;

&lt;p&gt;Third, &lt;a href="https://blog.codinghorror.com/everything-is-fast-for-small-n/"&gt;everything is fast for small n&lt;/a&gt;. Our object-based algorithm ran just fine on the small case (computing ~2000 values) but fell apart on the larger case (~30 million values). And this was in a case where we were actively thinking about performance!&lt;/p&gt;

&lt;p&gt;Have you run into cases like this before? Found any other accidentally n^2 algorithms? Let me know on &lt;a href="https://twitter.com/MustafaHaddara"&gt;twitter&lt;/a&gt; or in the comments below.&lt;/p&gt;

</description>
      <category>adventofcode</category>
      <category>profiling</category>
      <category>performance</category>
      <category>typescript</category>
    </item>
    <item>
      <title>How To: Building a Debouncer…in Java</title>
      <dc:creator>Mustafa Haddara</dc:creator>
      <pubDate>Tue, 19 May 2020 16:54:48 +0000</pubDate>
      <link>https://dev.to/mustafahaddara/how-to-building-a-debouncer-in-java-18g3</link>
      <guid>https://dev.to/mustafahaddara/how-to-building-a-debouncer-in-java-18g3</guid>
      <description>&lt;p&gt;A short while ago, we were building a distributed backend service where multiple instances of the service would be sending requests to each other to coordinate work splitting and availability. We were concerned about a number of potential roadblocks, including overwhelming the receivers and congesting our network traffic. We also realized that, in this case, immediate consistency wasn't a top priority-- we could tolerate a few seconds of delay, and as long as we were eventually consistent, the system would behave as we expected.&lt;/p&gt;

&lt;p&gt;We took inspiration for this problem from debounce mechanisms in web development. On the web, this technique is used for features like autocomplete or suggestions in search bars. Generally you don't actually want these components to make requests on every keystroke, especially if the user knows what they're typing and types a lot, so the solution is to debounce the requests. &lt;/p&gt;

&lt;h3&gt;
  
  
  What is Debouncing?
&lt;/h3&gt;

&lt;p&gt;The name "debounce" actually comes from hardware designers who needed an accurate way to tell if someone pressed a button, because the physical button will have a little bounce to it, so it looks like they pushed the button, released it, and pushed it again really quickly...but I digress.&lt;/p&gt;

&lt;p&gt;Fundamentally, we're rate-limiting the number of requests being made. But there are a lot of variations on what "rate-limiting" can mean: does the first request get made right away? Do subsequent requests get queued, or simply thrown away? If there is a request currently queued and another one comes in, does it get dropped? Also queued? Does the timeout get extended?&lt;/p&gt;

&lt;p&gt;If you search for the meaning of "debouncing", you'll typically find some combination of these. &lt;/p&gt;

&lt;p&gt;For our purposes, we settled on the following behavior:&lt;/p&gt;

&lt;p&gt;When we make a request:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If it hasn't been made within the past &lt;code&gt;delayMillis&lt;/code&gt; milliseconds (and there isn't a call queued), then the request gets made immediately&lt;/li&gt;
&lt;li&gt;If there IS a recent call, then we check to see if one is queued

&lt;ul&gt;
&lt;li&gt;If there is a call queued, we drop the current call&lt;/li&gt;
&lt;li&gt;If there is not, we queue up the current call to be called after &lt;code&gt;delayMillis&lt;/code&gt; milliseconds pass&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you imagine requests to be coming in on a timeline, where our delay is 10 ms, it would look a little like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--00pDQOek--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/fvoec6eshek64jq7gt67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--00pDQOek--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/fvoec6eshek64jq7gt67.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This has the nice effect that the &lt;em&gt;first&lt;/em&gt; request always gets made immediately, and we still remain within our rate limits.&lt;/p&gt;

&lt;h3&gt;
  
  
  Requests in Java
&lt;/h3&gt;

&lt;p&gt;At Vena, our backend services are all written in Java 8, which adds an extra layer of complexity. Normally, making a request in Java is a simple method call. And, if we were implementing this in JavaScript, that wouldn't be a problem-- JavaScript has first class functions, and we could just pass around function references, no problem. In Java, we're a little more limited.&lt;/p&gt;

&lt;p&gt;In Java, everything is an object, even function calls. We knew that the method we were calling to send our API request wasn't going to accept any parameters, which made things a little easier; it meant we could use the &lt;code&gt;Runnable&lt;/code&gt; interface. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Runnable&lt;/code&gt; interface was originally designed for multithreaded computations: you'd create an object with a &lt;code&gt;run()&lt;/code&gt; method on it, and you could give that object to a separate thread and have the &lt;code&gt;run()&lt;/code&gt; method get called in that second thread. For our purposes, though, it's an effective interface to represent a function call that accepts no arguments, does some stuff (ie. has a side effect) and returns no values.&lt;/p&gt;
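&lt;p&gt;As a quick illustration (a made-up snippet, not from our codebase), a &lt;code&gt;Runnable&lt;/code&gt; built from a Java 8 lambda is just a zero-argument, void "function value" that you can store, pass around, and invoke later:&lt;/p&gt;

```java
// A Runnable is effectively a zero-argument void function you can hold in a
// variable. Nothing happens until someone calls run().
class RunnableDemo {
    static int runTwice() {
        final int[] calls = {0};
        Runnable operation = () -> calls[0]++; // side effect only: no args, no return value

        operation.run();
        operation.run();

        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println("operation ran " + runTwice() + " times"); // operation ran 2 times
    }
}
```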

&lt;p&gt;So we're going to build a &lt;code&gt;DebouncedRunnable&lt;/code&gt;. It accepts a &lt;code&gt;Runnable&lt;/code&gt; and also implements the &lt;code&gt;Runnable&lt;/code&gt; interface itself, meaning it can be a drop-in replacement for the &lt;code&gt;Runnable&lt;/code&gt; we give it. &lt;/p&gt;

&lt;p&gt;Enough theory, let's show some code. The full class can be found at &lt;a href="https://gist.github.com/MustafaHaddara/6da83cd54d6df393520558f77e1efe24#file-debouncedrunnable-java"&gt;this gist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It might look like a lot, but there's a big comment in the middle taking up a ton of room, haha. Let's go through it chunk by chunk.&lt;/p&gt;
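&lt;p&gt;In case the gist embed doesn't load, here's a condensed sketch of the whole class, reconstructed from the walkthrough below. Treat the details (like scheduling exactly &lt;code&gt;delayMillis&lt;/code&gt; from "now" rather than from the last run, and the omitted logging) as assumptions rather than the real code:&lt;/p&gt;

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the DebouncedRunnable described in this post; the real class
// (with its logging and JavaDoc) lives in the linked gist.
class DebouncedRunnable implements Runnable {
    // Daemon thread so a forgotten debouncer can't keep the JVM alive.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    private final Runnable operation;
    private final String name; // used for logging in the real class
    private final long delayMillis;

    private long lastRunTime = 0;
    private boolean isQueued = false;

    DebouncedRunnable(Runnable operation, String name, long delayMillis) {
        this.operation = operation;
        this.name = name;
        this.delayMillis = delayMillis;
    }

    @Override
    public synchronized void run() {
        if (isQueued) {
            return; // a call is already queued; drop this one
        }
        if (shouldRunNow()) {
            lastRunTime = getCurrentTimeMillis();
            operation.run(); // no recent call: run immediately
        } else {
            isQueued = true; // recent call: queue one run for later
            schedule(this::scheduledRun, delayMillis);
        }
    }

    private synchronized void scheduledRun() {
        isQueued = false;
        lastRunTime = getCurrentTimeMillis();
        operation.run();
    }

    private boolean shouldRunNow() {
        return getCurrentTimeMillis() - lastRunTime >= delayMillis;
    }

    // Single-line wrappers so tests can intercept scheduling and time.
    void schedule(Runnable task, long delay) {
        scheduler.schedule(task, delay, TimeUnit.MILLISECONDS);
    }

    long getCurrentTimeMillis() {
        return System.currentTimeMillis();
    }
}
```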

&lt;p&gt;At the top, we've got some instance and class fields:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;The top line defines a logger, so we can control log levels and redirect logging on a class-by-class basis. &lt;/p&gt;

&lt;p&gt;The next few lines just define some constants. We've got a &lt;code&gt;ScheduledExecutorService&lt;/code&gt;, which comes from the &lt;code&gt;java.util.concurrent&lt;/code&gt; package and lets us schedule a specific &lt;code&gt;Runnable&lt;/code&gt; to be run after a certain amount of time. Then we've got our &lt;code&gt;operation&lt;/code&gt; to run, a &lt;code&gt;name&lt;/code&gt; for the operation so we can keep track of it for logging, and &lt;code&gt;delayMillis&lt;/code&gt;, which controls the amount of time we need to wait between calls. These are &lt;code&gt;final&lt;/code&gt; because, once we get them in the constructor, they'll never change.&lt;/p&gt;

&lt;p&gt;Then we've got a couple other fields that are mutable. We'll use them for tracking our scheduling.&lt;/p&gt;

&lt;p&gt;We've also got the constructor, which sets those fields and has a nice juicy JavaDoc describing the behavior of this code. Always comment your code, kids! &lt;/p&gt;

&lt;p&gt;The actual meat of this class is in the &lt;code&gt;run()&lt;/code&gt; method:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;This method is &lt;code&gt;synchronized&lt;/code&gt;, meaning that only one thread can call it at a time. This helps us avoid race conditions if multiple threads call &lt;code&gt;run()&lt;/code&gt; for the very first time &lt;em&gt;at the same time&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The behavior is what we described above: if we've got a call queued up, we ignore the current invocation and do nothing. Otherwise, we check whether we should run it now or schedule it to be run later (on a background thread), and then we do exactly that, keeping track of &lt;code&gt;lastRunTime&lt;/code&gt; and &lt;code&gt;isQueued&lt;/code&gt; respectively.&lt;/p&gt;

&lt;p&gt;Another thing to note is the funky syntax in the call to the scheduler. If you haven't seen Java 8 before, the &lt;code&gt;this::scheduledRun&lt;/code&gt; syntax will look odd to you. It's a method reference, and it basically means "the current object's &lt;code&gt;scheduledRun&lt;/code&gt; method". Let's take a look at that method:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;This is very similar to the case above, where we just called &lt;code&gt;run()&lt;/code&gt; directly, but the log message differs slightly and we're clearing the &lt;code&gt;isQueued&lt;/code&gt; flag as well.&lt;/p&gt;

&lt;p&gt;The remaining methods in the class are all small:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;The first is our check to see if we're allowed to run at the current time, and the other two are simple wrappers around our scheduler and the system clock so that we can properly unit test this class. &lt;/p&gt;

&lt;p&gt;Before we talk about testing, though, let's talk about how we actually &lt;em&gt;used&lt;/em&gt; this &lt;code&gt;DebouncedRunnable&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Our calling code had an operation called &lt;code&gt;volunteer()&lt;/code&gt;. The details here aren't important-- it boiled down to making an API request, and like I mentioned at the beginning, we were ok with it not firing off immediately 100% of the time and instead being rate limited. &lt;/p&gt;

&lt;p&gt;Previously, this class would have looked like:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;We had to inflate this service class a little, and change it to look a little like this:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;So, instead of just sticking a method on a class, we need to take that method, wrap it in &lt;code&gt;DebouncedRunnable&lt;/code&gt;, and then keep a reference to it so we can call it later. We also hide the actual API call in a deprecated method with an intentionally ugly name &lt;code&gt;volunteer_yesIKnowWhatImDoing&lt;/code&gt; so that it jumps out in code reviews and is immediately obvious that it's not just any other method. Then our &lt;code&gt;volunteer()&lt;/code&gt; method calls &lt;code&gt;run()&lt;/code&gt; on the runnable, which does its debounced thing: calls &lt;code&gt;volunteer_yesIKnowWhatImDoing&lt;/code&gt; directly, schedules a call to it for a short time in the future, or ignores us entirely.&lt;/p&gt;
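&lt;p&gt;Sketched out, the wiring looks roughly like this (the service class name and delay here are made up, and a pass-through stand-in replaces the real &lt;code&gt;DebouncedRunnable&lt;/code&gt; so the snippet compiles on its own):&lt;/p&gt;

```java
// Stand-in so this snippet compiles by itself; the real DebouncedRunnable
// (from the gist) actually debounces instead of calling straight through.
class DebouncedRunnable implements Runnable {
    private final Runnable operation;

    DebouncedRunnable(Runnable operation, String name, long delayMillis) {
        this.operation = operation;
    }

    @Override
    public void run() {
        operation.run();
    }
}

// Hypothetical sketch of the service described above; the method names follow
// the post, but the original class's exact shape is assumed.
class ClusterMember {
    int apiCalls = 0; // test hook standing in for the real API request

    private final DebouncedRunnable debouncedVolunteer = new DebouncedRunnable(
            this::volunteer_yesIKnowWhatImDoing, "volunteer", 5000);

    // What the rest of the codebase calls.
    public void volunteer() {
        debouncedVolunteer.run(); // runs now, gets queued, or is dropped
    }

    // Deprecated + ugly name so direct calls jump out in code review.
    @Deprecated
    void volunteer_yesIKnowWhatImDoing() {
        apiCalls++; // ...make the actual API request...
    }
}
```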

&lt;h4&gt;
  
  
  Testing Time and Space
&lt;/h4&gt;

&lt;p&gt;So, now that we've seen what our calling code looks like, we need to answer one more question: how do we test this? Code that interfaces with time is notoriously difficult to test, since unit tests aren't guaranteed to take the exact same amount of time from run to run. On top of that, code that interfaces with separate threads is typically harder to test as well.&lt;/p&gt;

&lt;p&gt;And here we've got both.&lt;/p&gt;

&lt;p&gt;To start, let's take a look at the instance variables at the top.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;We're using Java's &lt;code&gt;AtomicInteger&lt;/code&gt; as a simple counter object. It wraps an integer and defines an &lt;code&gt;incrementAndGet()&lt;/code&gt; method that atomically increments that integer, so its value corresponds to exactly the number of times the method was called.&lt;/p&gt;
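&lt;p&gt;As a toy example (not from the actual test class):&lt;/p&gt;

```java
import java.util.concurrent.atomic.AtomicInteger;

// AtomicInteger as a call counter: each incrementAndGet() bumps the value,
// so the final value equals the number of calls.
class CounterDemo {
    static int countCalls() {
        AtomicInteger counter = new AtomicInteger(); // starts at 0
        Runnable operation = counter::incrementAndGet;

        operation.run();
        operation.run();
        operation.run();

        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countCalls()); // prints 3
    }
}
```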

&lt;p&gt;We also have our &lt;code&gt;DebouncedRunnable&lt;/code&gt;, wrapped around the &lt;code&gt;incrementAndGet()&lt;/code&gt; method. It's a normal &lt;code&gt;DebouncedRunnable&lt;/code&gt;, but we're also going to use &lt;code&gt;Mockito&lt;/code&gt; to spy on it, meaning we can intercept method calls as we wish.&lt;/p&gt;

&lt;p&gt;And lastly, a &lt;code&gt;List&amp;lt;Runnable&amp;gt;&lt;/code&gt; to store all of our queued operations.&lt;/p&gt;

&lt;p&gt;Our class setup (annotated with &lt;code&gt;@Before&lt;/code&gt; in JUnit) is also interesting. Keen-eyed readers will have noticed that I defined two single-line wrapper methods above and said they would help us unit test, and this is how:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;&lt;code&gt;doAnswer()&lt;/code&gt; is a method from &lt;code&gt;Mockito&lt;/code&gt; that takes a lambda to run in place of the real call; chaining it with &lt;code&gt;when()&lt;/code&gt; on our &lt;code&gt;Spy&lt;/code&gt; is how we specify that we want to intercept the &lt;code&gt;schedule()&lt;/code&gt; method. Instead of calling the real &lt;code&gt;schedule()&lt;/code&gt; (a one-liner that sticks things on a separate thread), we're tracking the calls and storing them in our list.&lt;/p&gt;
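&lt;p&gt;The same interception can be sketched without &lt;code&gt;Mockito&lt;/code&gt; by overriding &lt;code&gt;schedule()&lt;/code&gt; in a subclass; the class and field names below are hypothetical stand-ins, not the article's actual code:&lt;/p&gt;

```java
// Hedged sketch: SchedulingRunnable stands in for the article's DebouncedRunnable.
// Its schedule() normally hands work to another thread; a test-only subclass
// captures that work instead, so the test controls exactly when it runs.
class SchedulingRunnable {
    protected void schedule(Runnable r) {
        new Thread(r).start();  // production path: run on a separate thread
    }
}

public class CaptureScheduleDemo {
    public static void main(String[] args) {
        Runnable[] queued = new Runnable[8];  // arrays keep this sketch dependency-free
        int[] count = {0};

        SchedulingRunnable stubbed = new SchedulingRunnable() {
            @Override
            protected void schedule(Runnable r) {
                queued[count[0]++] = r;  // capture instead of spawning a thread
            }
        };

        stubbed.schedule(() -> System.out.println("ran later"));
        if (count[0] != 1) throw new AssertionError("expected one captured call");
        queued[0].run();  // the test decides exactly when queued work runs
    }
}
```

&lt;p&gt;Mockito's &lt;code&gt;doAnswer()&lt;/code&gt; achieves the same effect without needing a hand-written subclass.&lt;/p&gt;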

&lt;p&gt;There's one last helper method that makes our tests a lot more readable:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;We're using Mockito's &lt;code&gt;doReturn()&lt;/code&gt; here, in the same spirit as the &lt;code&gt;doAnswer()&lt;/code&gt; above, but operating on the &lt;code&gt;getCurrentTimeMillis()&lt;/code&gt; method. Ordinarily, this call goes straight to the system clock, but we're intercepting it and specifying our own return value, effectively letting us freeze and manipulate time as we wish.&lt;/p&gt;
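&lt;p&gt;The underlying trick is to route every read of the clock through an overridable method. A minimal stdlib-only sketch (the &lt;code&gt;TimeSource&lt;/code&gt; name here is hypothetical):&lt;/p&gt;

```java
// Hedged sketch: pull "now" through an overridable hook so a test can freeze it.
class TimeSource {
    protected long getCurrentTimeMillis() {
        return System.currentTimeMillis();  // production path: the real system clock
    }
}

public class FrozenClockDemo {
    public static void main(String[] args) {
        // A test can pin the clock to any instant it likes by overriding the hook.
        TimeSource frozen = new TimeSource() {
            @Override
            protected long getCurrentTimeMillis() {
                return 1_000_000L;
            }
        };

        if (frozen.getCurrentTimeMillis() != 1_000_000L) {
            throw new AssertionError("clock should be frozen");
        }
        System.out.println("frozen at t=" + frozen.getCurrentTimeMillis());
    }
}
```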

&lt;p&gt;Now that we've got all of that, we can write tests like this:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;or like this, where we verify that even if there's a scheduling delay and enough time has passed between our last run and now, the fact that a call is already queued is enough for us to drop the new call:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
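&lt;p&gt;To make that scenario concrete, here's a self-contained sketch: a hypothetical mini-debouncer (not the real &lt;code&gt;DebouncedRunnable&lt;/code&gt;) with a hand-cranked clock, acting out the drop-when-queued behaviour:&lt;/p&gt;

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical mini-debouncer, just enough to act out the scenario:
// when a call is already queued, a new run() is dropped outright, even
// if plenty of time has passed since the last execution.
class MiniDebouncer {
    private final Runnable task;
    private final long delayMs;
    private long lastRunAt = -1_000_000L;  // "long ago", so the first run() fires
    private Runnable pending = null;
    Runnable[] queued = new Runnable[8];   // captured instead of handed to a thread
    int queuedCount = 0;

    MiniDebouncer(Runnable task, long delayMs) {
        this.task = task;
        this.delayMs = delayMs;
    }

    protected long getCurrentTimeMillis() {
        return System.currentTimeMillis();
    }

    void run() {
        if (pending != null) {
            return;  // a call is already queued: drop this one entirely
        }
        long now = getCurrentTimeMillis();
        if (now - lastRunAt >= delayMs) {
            lastRunAt = now;
            task.run();
        } else {
            pending = () -> {
                lastRunAt = getCurrentTimeMillis();
                pending = null;
                task.run();
            };
            queued[queuedCount++] = pending;
        }
    }
}

public class DropQueuedCallDemo {
    public static void main(String[] args) {
        AtomicInteger runs = new AtomicInteger();
        long[] clock = {0L};  // the frozen, hand-cranked clock

        MiniDebouncer d = new MiniDebouncer(runs::incrementAndGet, 100) {
            @Override
            protected long getCurrentTimeMillis() {
                return clock[0];
            }
        };

        d.run();                  // t=0: runs immediately
        clock[0] = 50;  d.run();  // t=50: too soon, so the call is queued
        clock[0] = 500; d.run();  // t=500: enough time passed, but a call is queued: dropped

        if (runs.get() != 1) throw new AssertionError("expected exactly one immediate run");
        if (d.queuedCount != 1) throw new AssertionError("expected exactly one queued call");

        d.queued[0].run();        // the test drains the queue deterministically
        if (runs.get() != 2) throw new AssertionError("queued call should run the task");
        System.out.println("runs: " + runs.get());  // runs: 2
    }
}
```

&lt;p&gt;The real tests express the same idea with Mockito spies rather than subclassing, but the deterministic clock and drained queue are the same.&lt;/p&gt;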


&lt;p&gt;The full JUnit test, with a few more test cases I didn't go over, is available in a &lt;a href="https://gist.github.com/MustafaHaddara/6da83cd54d6df393520558f77e1efe24#file-debouncedrunnabletest-java"&gt;gist here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Writing descriptive unit tests like this is really satisfying. Once you've found the right level of abstraction, and can describe the required setup, the actions you want to take, and the outcomes you're expecting, it becomes much easier to write tests and build confidence that your code is doing what you want it to be doing.&lt;/p&gt;

&lt;p&gt;Have you come up with interesting testing strategies? Had to solve similar problems? &lt;a href="https://twitter.com/MustafaHaddara"&gt;Tweet at me&lt;/a&gt; or leave a comment below.&lt;/p&gt;

</description>
      <category>java</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
