If you asked a kid what FizzBuzz is, they'd probably tell you it's a game they played at school. The teacher would corral them into a circle, and they'd start counting numbers out loud in a clockwise fashion. For every number divisible by 3, they'd say "Fizz." For every number divisible by 5, they'd say "Buzz." And for every number divisible by both, they'd say "FizzBuzz."[1] The purpose of the game is to teach kids about divisibility in a fun, interactive way. Ask the same question to a college senior, and they'd probably explain the rules differently, pointing out how you should take a shot if you get a number wrong.
Ask a programmer, though, and they'll tell you it's a silly coding question from job interviews. They'd describe it something like this:
Given a positive integer N, return an array of strings with all the integers from 1 to N.
But for multiples of 3 the array should have "Fizz" instead of the number.
For the multiples of 5, the array should have "Buzz" instead of the number.
For numbers which are multiples of both 3 and 5, the array should have "FizzBuzz" instead of the number.[2]
The purpose of this task is to test elementary knowledge of conditional statements, loops, and basic data structures. Interviewees aren't expected to write an optimized algorithm, just to demonstrate fundamental programming knowledge. Given how simple this problem is, I thought it would be a great way to explore Node.JS internals, specifically the V8 engine[3], and see how far we can push this algorithm's performance.
The Naive Approach
const fizzBuzzNaive = (N) => {
  const arr = [];
  for (let i = 1; i <= N; i++) {
    let output = '';
    if (i % 3 === 0) output += 'Fizz';
    if (i % 5 === 0) output += 'Buzz';
    if (output === '') arr.push(i);
    else arr.push(output);
  }
  return arr;
};
As you can see, the solution is simple and doesn't require any special knowledge to implement. But can we solve it more efficiently?
Before we start writing algorithm variations and comparing their speed, we need a solid benchmarking tool. Simply using the console.time() won't cut it, we need something that can accurately and predictably measure performance across different implementations. Enter Mitata, a benchmarking framework[4] that handles:
- JIT[5] Warmup: Runs the code enough times to ensure V8 has compiled it to machine code before measuring. This ensures the measured code runs through TurboFan[7] rather than the Ignition[6] interpreter, meaning it has been fully optimized by V8 before the timer starts.
- Outliers: Removes statistical anomalies, like a sudden background process spiking on your OS.
- Visualization: Draws a CLI graph showing exactly how much faster or slower functions are compared to a baseline.
Is this tool overkill for benchmarking FizzBuzz? Yes. Am I still going to use it? Absolutely.
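For context, wiring a function into Mitata takes only a few lines. Here's a simplified sketch; the import path is illustrative, and the real benchmark script behind the tables below sets a few more options:
// bench.mjs: a minimal Mitata setup (simplified sketch, not the exact script
// that produced the result tables; the './fizzbuzz.mjs' path is illustrative).
import { bench, run } from 'mitata';
import { fizzBuzzNaive } from './fizzbuzz.mjs';

const N = 10_000;

// Register the function under test; Mitata handles warmup and statistics.
bench('Naive', () => fizzBuzzNaive(N));

// Execute all registered benchmarks and print the results.
await run();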
Running the benchmark against fizzBuzzNaive produces the following results:
| Name | Min | Max | Avg | P75 | P99 | Min mem | Max mem | Avg mem |
|---|---|---|---|---|---|---|---|---|
| Naive | 34.25 µs | 161.17 µs | 38.20 µs | 35.92 µs | 67.29 µs | 66.05 kb | 451.09 kb | 255.49 kb |
Let's break down what we're seeing here:
- Min/Max/Avg shows the minimum, maximum, and average execution time per iteration. Each iteration is a single execution of fizzBuzzNaive(10_000). The large variance between minimum and maximum times suggests that the garbage collector[8] or some other Node.JS process was running concurrently with the benchmark.
- P75/P99 shows percentile timings. P75 means 75% of runs finished faster than the time shown; P99 means 99% did. A large jump between the two would indicate garbage collector pressure or branch misprediction[9].
- Min mem/Max mem/Avg mem shows memory allocation per iteration. The large variance between minimum and maximum is most likely due to dynamic array allocation.
Now we're ready to start testing optimizations and measure whether they actually do what we think they do.
Optimizing Conditional Checks
Let's start with the low-hanging fruit: reducing how many conditional checks we perform in each loop iteration. In the naive implementation, every number requires 3 conditional checks, plus up to two string concatenations for the Fizz/Buzz cases. It turns out we can leverage the Least Common Multiple[10] to rewrite the code with fewer checks and no concatenations at all.
const fizzBuzzLCM = (N) => {
  const arr = [];
  for (let i = 1; i <= N; i++) {
    if (i % 15 === 0) arr.push("FizzBuzz");
    else if (i % 3 === 0) arr.push("Fizz");
    else if (i % 5 === 0) arr.push("Buzz");
    else arr.push(i);
  }
  return arr;
};
Why is this code faster than the previous implementation? To answer that, let's compute a distribution of numbers divisible by 3, 5, and 15, along with those that don't fall into any of those buckets.
From the distribution graph, we can see that the optimal check order would be divisibility by 3 first, then 5, and finally 15. This would minimize the total number of conditional checks. Unfortunately, all numbers divisible by 15 are also divisible by 3 and 5, so we can't order the modulus checks optimally.
In the naive implementation, we perform 3 conditional checks for every number. For N=50, that means 150 checks total. In the new implementation, numbers divisible by 15 require only 1 check, numbers divisible by 3 (but not 15) require 2, and the rest require 3. For N=50, that works out to 131 checks. At larger values like N=100M, the difference becomes even more pronounced: 300M checks versus 260M. In theory, it should be faster. Let's sanity-check the counting first, then verify with a benchmark.
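A quick, purely illustrative helper (not part of the benchmark suite) confirms the 131 figure:
// Count how many conditional checks fizzBuzzLCM performs for a given N.
const countLCMChecks = (N) => {
  let checks = 0;
  for (let i = 1; i <= N; i++) {
    if (i % 15 === 0) checks += 1;      // matched on the first check
    else if (i % 3 === 0) checks += 2;  // needed two checks
    else checks += 3;                   // divisible by 5 only, or by neither
  }
  return checks;
};

console.log(countLCMChecks(50)); // 131
With the arithmetic confirmed, here are the benchmark results: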
| Name | Min | Max | Avg | P75 | P99 | Min mem | Max mem | Avg mem |
|---|---|---|---|---|---|---|---|---|
| Naive | 34.17 µs | 251.46 µs | 40.43 µs | 39.04 µs | 86.63 µs | 9.96 kb | 582.75 kb | 255.65 kb |
| LCM | 28.25 µs | 213.58 µs | 32.54 µs | 32.13 µs | 62.13 µs | 7.48 kb | 506.38 kb | 255.55 kb |
Success! By reducing the number of comparisons and eliminating string concatenations, we cut the execution time by 28% on the p99 metric. You may have noticed that the numbers for fizzBuzzNaive differ from the previous benchmarking run. This is expected. Mitata does its best to keep comparisons predictable, but due to the nature of Node.JS and other background processes on the machine, execution times will vary between runs.
Let's keep the momentum going. Is there a way to guarantee a maximum of 2 checks per iteration? For N=100M, that would mean only 200M checks, which should, in theory, be faster. After some experimentation with control flow, here's what we can do:
const fizzBuzzModulo = (N) => {
  const arr = [];
  for (let i = 1; i <= N; i++) {
    if (i % 3 == 0) {
      if (i % 5 == 0) arr.push("FizzBuzz");
      else arr.push("Fizz");
    } else if (i % 5 == 0) arr.push("Buzz");
    else arr.push(i);
  }
  return arr;
};
With this approach, every code path makes exactly 2 conditional checks. Since we're reducing the number of modulo operations (the second most expensive operation in the algorithm, after the array append), this should be faster in theory. Let's verify.
| Name | Min | Max | Avg | P75 | P99 | Min mem | Max mem | Avg mem |
|---|---|---|---|---|---|---|---|---|
| Naive | 34.63 µs | 180.17 µs | 39.21 µs | 37.33 µs | 78.79 µs | 77.62 kb | 490.72 kb | 255.64 kb |
| LCM | 28.13 µs | 262.58 µs | 32.18 µs | 31.79 µs | 64.08 µs | 7.48 kb | 447.28 kb | 255.56 kb |
| Modulo | 28.13 µs | 132.50 µs | 34.29 µs | 34.29 µs | 43.21 µs | 64.46 kb | 474.13 kb | 255.56 kb |
Using this approach, we reduced the execution time by 32% compared to LCM, or a whopping 45% compared to the naive approach on p99 timings. We're on a roll.
But there's an anomaly in the results. The average execution time of the modulo optimization is actually slower than the LCM variant. What's going on?
The answer lies in the distribution graph above. The majority of numbers won't be divisible by 3, 5, or 15. For those numbers, the modulo optimization performs only two conditional checks, compared to three in the LCM variant. Fewer checks means a faster p99 metric.
So why is the average slower? It comes down to how the CPU works. Modern processors use branch prediction[9] to pre-fetch instructions. The LCM implementation's linear if ... else if ... else if structure is highly predictable, and the CPU can pre-fetch effectively. The nested if statements in the modulo optimization, however, cause the CPU to occasionally mispredict branches. When that happens, it has to discard pre-fetched instructions and load new ones. This penalty results in a worse average execution time, even though the p99 is better.
Optimizing The Array
We've optimized conditional checks as far as we can. As I mentioned earlier, there's one operation even more expensive than modulo, the array append.
Why is array append so costly? Under the hood, V8 dynamically resizes the array. When we define const arr = [];, V8 preallocates space for 4 elements. Once that capacity is exceeded, it allocates a larger buffer and copies the old contents into it. The growth factor is aggressive for small arrays and stabilizes around 1.5x as the array gets larger.
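To get a feel for the cost, here's a rough, back-of-the-envelope illustration using the numbers above (a starting capacity of 4 and ~1.5x growth). This isn't V8's exact resize formula, just the shape of the problem:
// Approximate how many reallocations a push-based build of 10,000 elements
// triggers, assuming a starting capacity of 4 and ~1.5x growth
// (an approximation, not V8's real formula).
let capacity = 4;
let resizes = 0;
while (capacity < 10_000) {
  capacity = Math.ceil(capacity * 1.5);
  resizes++;
}
console.log(resizes); // ~20 reallocations, each one copying the existing elements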
The fix is straightforward: preallocate the entire array so V8 never needs to resize. Let's try it.
const fizzBuzzPreallocation = (N) => {
  const arr = new Array(N);
  for (let i = 0; i < N; i++) {
    const num = i + 1;
    if (num % 3 == 0) {
      if (num % 5 == 0) arr[i] = "FizzBuzz";
      else arr[i] = "Fizz";
    } else if (num % 5 == 0) arr[i] = "Buzz";
    else arr[i] = num;
  }
  return arr;
};
Preallocation works here because we know the array's final size ahead of time. As a bonus, we're also directly assigning values to array indices instead of using push. Let's see if this improves execution speed.
| Name | Min | Max | Avg | P75 | P99 | Min mem | Max mem | Avg mem |
|---|---|---|---|---|---|---|---|---|
| Naive | 34.67 µs | 143.50 µs | 38.79 µs | 37.25 µs | 76.38 µs | 85.30 kb | 534.25 kb | 255.64 kb |
| LCM | 28.17 µs | 146.96 µs | 31.33 µs | 30.92 µs | 53.04 µs | 87.48 kb | 458.84 kb | 255.53 kb |
| Modulo | 27.21 µs | 187.08 µs | 34.27 µs | 34.42 µs | 42.63 µs | 71.13 kb | 383.62 kb | 255.56 kb |
| Pre-allocated | 18.13 µs | 120.67 µs | 18.98 µs | 18.88 µs | 22.17 µs | 6.57 kb | 400.91 kb | 85.02 kb |
That's a 48% speedup over the previous optimization, and roughly 71% faster than the original naive approach. Also notable is how much less memory the algorithm allocates on average, a direct consequence of eliminating dynamic array resizing.
Optimizing The For Loop
We've optimized conditional checks and array assignment. The next target is the for loop itself.
You might think .map() would be a good fit here. In modern JavaScript, functional methods like .map(), .forEach(), and friends are everywhere; they make code readable and concise. In practice, though, they'd be slower than a simple for loop. The overhead of function calls and context binding adds complexity, and while V8 is good at function inlining[11], it's not perfect.
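For reference, a functional variant built on Array.from's mapping callback might look like the sketch below. It's here only for comparison and isn't one of the benchmarked implementations:
// A map-style sketch: concise, but every element pays for a callback invocation.
const fizzBuzzMap = (N) =>
  Array.from({ length: N }, (_, idx) => {
    const i = idx + 1;
    if (i % 15 === 0) return "FizzBuzz";
    if (i % 3 === 0) return "Fizz";
    if (i % 5 === 0) return "Buzz";
    return i;
  });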
So what else can we try? Loop unrolling[12]. This is a classic optimization technique, and it happens that FizzBuzz has a pattern we can exploit. Look closely at the output; this sequence repeats exactly every 15 numbers: 1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, FizzBuzz. Instead of checking numbers one by one, we can process them in chunks of 15.
export const fizzBuzzUnrolled = (N) => {
  const arr = new Array(N);
  let i = 1;
  // Process full blocks of 15 numbers without any divisibility checks.
  while (i <= N - 15) {
    arr[i-1] = i; arr[i] = i+1; arr[i+1] = "Fizz";
    arr[i+2] = i+3; arr[i+3] = "Buzz"; arr[i+4] = "Fizz";
    arr[i+5] = i+6; arr[i+6] = i+7; arr[i+7] = "Fizz";
    arr[i+8] = "Buzz"; arr[i+9] = i+10; arr[i+10] = "Fizz";
    arr[i+11] = i+12; arr[i+12] = i+13;
    arr[i+13] = "FizzBuzz";
    i += 15;
  }
  // Handle the remaining tail, since N isn't necessarily a multiple of 15.
  while (i <= N) {
    if (i % 15 === 0) arr[i-1] = "FizzBuzz";
    else if (i % 3 === 0) arr[i-1] = "Fizz";
    else if (i % 5 === 0) arr[i-1] = "Buzz";
    else arr[i-1] = i;
    i++;
  }
  return arr;
};
Yes, it looks ugly, but bear with me. It's actually quite straightforward. The first while loop processes array values in chunks of 15 at a time. Since N isn't necessarily divisible by 15, the second while loop handles the remainder.
Let's see how it compares to previous variants of the algorithm.
| Name | Min | Max | Avg | P75 | P99 | Min mem | Max mem | Avg mem |
|---|---|---|---|---|---|---|---|---|
| Naive | 34.63 µs | 154.42 µs | 38.07 µs | 35.96 µs | 74.50 µs | 67.18 kb | 538.41 kb | 255.62 kb |
| LCM | 28.38 µs | 176.17 µs | 31.47 µs | 30.96 µs | 54.63 µs | 39.48 kb | 470.82 kb | 255.54 kb |
| Modulo | 26.96 µs | 152.25 µs | 33.18 µs | 33.17 µs | 52.50 µs | 45.62 kb | 383.62 kb | 255.56 kb |
| Pre-allocated | 17.75 µs | 170.25 µs | 18.94 µs | 18.88 µs | 23.25 µs | 6.57 kb | 225.74 kb | 85.03 kb |
| Unrolled | 7.33 µs | 152.92 µs | 7.90 µs | 7.83 µs | 13.04 µs | 2.41 kb | 197.38 kb | 85.32 kb |
That's a further 44% improvement over the pre-allocated variant. Compared to the naive approach, this algorithm is 82% faster on p99 timings, our most representative comparison metric. At this point, I'm out of ideas for further optimizations. Let's peek behind the curtain and see if profiling reveals any clues.
Profiling The Code
There are excellent tools for profiling JavaScript; Clinic.js[13] and 0x[14] are great examples. For an algorithm this simple, though, they're unlikely to be as useful as they would be for a more complex implementation with multiple function calls and I/O operations.
Instead, we'll use Node's built-in optimization and deoptimization logs to understand how V8 is handling our code. We can access these by running our script with the node --trace-opt --trace-deopt flags. Of course, a single execution of fizzBuzzNaive or fizzBuzzUnrolled won't tell us much; we need to warm up the JIT first. So I built a simple script that executes a given function 100,000 times with N=10_000 per invocation.
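A minimal sketch of that harness looks like this (the file name and module path are illustrative, not the exact script):
// warmup.js: execute the target function enough times for V8 to tier it up.
// Point the require at whichever variant you want to trace.
const { fizzBuzzNaive } = require('./fizzbuzz');

const N = 10_000;
const ITERATIONS = 100_000;

console.log(`Executing fizzBuzzNaive with N=${N} for ${ITERATIONS} iterations...`);
for (let i = 0; i < ITERATIONS; i++) {
  fizzBuzzNaive(N);
}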
Let's run the script with fizzBuzzNaive and see what we can learn from the optimization log and compare it to the log we get when running fizzBuzzUnrolled. I won't paste full logs, only the interesting parts.
...
Executing fizzBuzzNaive with N=10000 for 100000 iterations...
[marking 0x22a7daff6551 <JSFunction fizzBuzzNaive (sfi = 0x1f7fc6be5591)> for optimization to MAGLEV, ConcurrencyMode::kConcurrent, reason: hot and stable]
[compiling method 0x22a7daff6551 <JSFunction fizzBuzzNaive (sfi = 0x1f7fc6be5591)> (target MAGLEV) OSR, mode: ConcurrencyMode::kConcurrent]
...
[compiling method 0x22a7daff6551 <JSFunction fizzBuzzNaive (sfi = 0x1f7fc6be5591)> (target TURBOFAN_JS) OSR, mode: ConcurrencyMode::kConcurrent]
[bailout (kind: deopt-eager, reason: wrong map): begin. deoptimizing 0x22a7daff6551 <JSFunction fizzBuzzNaive (sfi = 0x1f7fc6be5591)>, 0x13b778783a11 <Code MAGLEV>, opt id 2, bytecode offset 65, deopt exit 2, FP to SP delta 96, caller SP 0x00016b458e50, pc 0x00012c649ce4]
...
[completed compiling 0x16dd18777759 <JSFunction fizzBuzzNaive (sfi = 0x1f7fc6be5591)> (target TURBOFAN_JS) - took 0.000, 0.875, 0.000 ms]
...
[bailout (kind: deopt-eager, reason: prepare for on stack replacement (OSR)): begin. deoptimizing 0x1f7fc6be5c49 <JSFunction (sfi = 0x1f7fc6bdf4a9)>, 0x2fbc45ee0081 <Code MAGLEV>, opt id 6, bytecode offset 436, deopt exit 35, FP to SP delta 208, caller SP 0x00016b458f38, pc 0x00012c683b58]
We can see that V8 first optimized the code with Maglev[15], its mid-tier optimizing compiler. Maglev kicks in once a function is hot, meaning it's run often enough to be worth compiling, and it produces reasonably fast code quickly, before V8 commits to the far more expensive top-tier compilation. It also performed OSR (on-stack replacement)[16], which happens when a currently executing function is swapped out mid-execution for an optimized version.
Next, V8 attempted to promote the code to TurboFan[7], its highest-tier compiler for hot, stable code. However, the optimization failed due to a "wrong map" deoptimization. This means the optimizer's assumptions about the shape of objects in memory were incorrect, forcing a bailout back to Maglev.
Finally, we see another deoptimization for "prepare for on-stack replacement." This one is benign; it's V8 saying: "I'm about to swap in optimized code and need to deoptimize briefly to do it safely."
Now let's look at the log for fizzBuzzUnrolled and see if we can find any differences.
...
[marking 0x39a194c76631 <JSFunction fizzBuzzUnrolled (sfi = 0x1bc7199256b1)> for optimization to MAGLEV, ConcurrencyMode::kConcurrent, reason: hot and stable]
[compiling method 0x39a194c76631 <JSFunction fizzBuzzUnrolled (sfi = 0x1bc7199256b1)> (target MAGLEV) OSR, mode: ConcurrencyMode::kConcurrent]
[compiling method 0x39a194c76631 <JSFunction fizzBuzzUnrolled (sfi = 0x1bc7199256b1)> (target MAGLEV), mode: ConcurrencyMode::kConcurrent]
[completed compiling 0x39a194c76631 <JSFunction fizzBuzzUnrolled (sfi = 0x1bc7199256b1)> (target MAGLEV) OSR - took 0.000, 0.417, 0.000 ms]
[completed compiling 0x39a194c76631 <JSFunction fizzBuzzUnrolled (sfi = 0x1bc7199256b1)> (target MAGLEV) - took 0.000, 0.375, 0.000 ms]
[marking 0x3f0b1a5b7b39 <JSFunction fizzBuzzUnrolled (sfi = 0x1bc7199256b1)> for optimization to TURBOFAN_JS, ConcurrencyMode::kConcurrent, reason: hot and stable]
[compiling method 0x3f0b1a5b7b39 <JSFunction fizzBuzzUnrolled (sfi = 0x1bc7199256b1)> (target TURBOFAN_JS), mode: ConcurrencyMode::kConcurrent]
[completed compiling 0x3f0b1a5b7b39 <JSFunction fizzBuzzUnrolled (sfi = 0x1bc7199256b1)> (target TURBOFAN_JS) - took 0.042, 2.083, 0.042 ms]
[completed optimizing 0x3f0b1a5b7b39 <JSFunction fizzBuzzUnrolled (sfi = 0x1bc7199256b1)> (target TURBOFAN_JS)]
...
[bailout (kind: deopt-eager, reason: prepare for on stack replacement (OSR)): begin. deoptimizing 0x1bc719925c49 <JSFunction (sfi = 0x1bc71991f4a9)>, 0x2129a66dea89 <Code MAGLEV>, opt id 5, bytecode offset 436, deopt exit 6, FP to SP delta 208, caller SP 0x00016bd6cf38, pc 0x000127e02084]
Right away, we can see that V8 optimized the code to Maglev and Maglev OSR, compiling both the current execution and generating an optimized version for future calls. It then promoted the code to TurboFan, and unlike fizzBuzzNaive, there are no deoptimizations. The only deoptimization is the benign OSR one, which we can safely ignore.
The takeaway from these logs is clear: the unrolled variant is a much better fit for V8's optimizer, which explains its superior performance. I don't think there's much more we can squeeze out at this point, at least, not without inspecting the bytecode to find micro-optimizations. To be honest, I have no interest in descending into that particular madness. But I'm still having fun, so let's implement a few more variants and see how they compare.
Let The Madness Begin
Now that we're drunk on benchmarking power, let's get experimental. What about tail recursion, which is commonly fast and well-optimized in languages like Erlang?
Unfortunately, JavaScript as run by V8 doesn't have Tail Call Optimization[17]. In languages like Erlang or Haskell, TCO allows a recursive function to call itself without adding a new stack frame, effectively turning recursion into a loop. Implicit Proper Tail Calls did make it into the ES2015 spec, but V8 controversially chose not to ship them, and the explicit-syntax proposal[19] has stalled in TC39[18]. This means every recursive step adds a frame to the stack[20]. Since the stack size in Node is limited, recursion is a poor choice for large iterations in JavaScript.
That said, we can work around this limitation using the Trampoline Pattern[21].
const trampoline = (fn) => (...args) => {
  let result = fn(...args);
  while (typeof result === 'function') {
    result = result();
  }
  return result;
}

const _fizzBuzzInternal = (n, current = 1, arr = new Array(n)) => {
  if (current > n) return arr;
  if (current % 3 == 0) {
    if (current % 5 == 0) arr[current - 1] = "FizzBuzz";
    else arr[current - 1] = "Fizz";
  } else if (current % 5 == 0) arr[current - 1] = "Buzz";
  else arr[current - 1] = current;
  return () => _fizzBuzzInternal(n, current + 1, arr);
}

export const fizzBuzzRecursive = trampoline(_fizzBuzzInternal);
This implementation is essentially the pre-allocated variant, but using recursion instead of a for loop. The trampoline is a higher-order function that takes a recursive function and returns a version that won't blow the call stack.
Why does this work? The key insight is that _fizzBuzzInternal doesn't return a result; it returns a function. This sidesteps the classic recursion problem where the parent call can't be removed from the stack until its child completes, which in turn can't be removed until its child completes, and so on until the stack overflows. The trampoline acts as a manager; it executes each returned function one at a time, so every frame is popped before the next one is pushed and the stack stays shallow.
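Calling the trampolined export behaves exactly like the loop-based variants:
// Usage: the trampolined version produces the same array as the iterative ones.
const out = fizzBuzzRecursive(15);
console.log(out[2]);  // "Fizz"
console.log(out[14]); // "FizzBuzz"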
Here's what a "standard" recursive implementation would look like without the trampoline:
const fizzBuzzRecursive = (n, current = 1, arr = new Array(n)) => {
  if (current > n) return arr;
  if (current % 3 == 0) {
    if (current % 5 == 0) arr[current - 1] = "FizzBuzz";
    else arr[current - 1] = "Fizz";
  } else if (current % 5 == 0) arr[current - 1] = "Buzz";
  else arr[current - 1] = current;
  return fizzBuzzRecursive(n, current + 1, arr);
}
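Without the trampoline, a large enough N simply exhausts the call stack. The exact threshold depends on the engine and the configured stack size, but conceptually:
// Each recursive step adds a stack frame, so a sufficiently large N
// throws before ever returning a result.
fizzBuzzRecursive(10_000_000);
// RangeError: Maximum call stack size exceeded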
What else could we try? While exploring functional programming concepts, I came across an interesting approach by Mr. Pirog[22], who implemented a Domain-Specific Language in Haskell to solve FizzBuzz. I wanted to port the concept to JavaScript, partly because it's gloriously excessive, and partly because it reminds me of university days learning about DSLs.
const identity = (n, v) => v;

const rule = (d, s) => (cont) => (n, v) => {
  if (n % d === 0) return s + cont(n, "");
  return cont(n, v);
};

const program = rule(3, "Fizz")(
  rule(5, "Buzz")(
    identity
  )
);

export const fizzBuzzDSL = (N) => {
  const arr = new Array(N);
  for (let i = 1; i <= N; i++) {
    arr[i-1] = program(i, i);
  }
  return arr;
}
Here we're defining a small DSL for FizzBuzz. The identity and rule functions are its building blocks. The identity function simply returns whatever value is passed to it. The rule function is a higher-order function that takes a divisor and a string, then returns a function that accepts a continuation.
The program is a composition of rules: first check divisibility by 3, then by 5, with identity at the end of the chain. Think of it as an elaborate function callback setup. The identity and rule act like programming keywords (similar to if or for), and we compose them into a program, much like we'd compose traditional control flow statements. The difference is that our keywords are functions, and the final composition is also a function.
The fizzBuzzDSL function simply iterates from 1 to N and applies the program to each number.
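One nice property of this encoding is that extending the language is just more composition. A hypothetical "Bazz" rule for multiples of 7 (purely illustrative, not part of the benchmarks) slots straight in:
// Hypothetical extension: compose an extra rule for multiples of 7.
const programWithBazz = rule(3, "Fizz")(
  rule(5, "Buzz")(
    rule(7, "Bazz")(
      identity
    )
  )
);

console.log(programWithBazz(21, 21)); // "FizzBazz"
console.log(programWithBazz(7, 7));   // "Bazz"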
Now that we've had our fun with creative implementations, let's see if we can build a variant that's actually faster. Node Addons[23] allow us to write code in C++, Rust, or other languages and call it directly from JavaScript. We'll use napi-rs[24], a Rust binding for the Node Addon API.
use napi::{Env, JsObject, Result};
use napi_derive::napi;

#[derive(Debug, Clone, PartialEq)]
enum FizzBuzzValue {
    Number(i32),
    Fizz,
    Buzz,
    FizzBuzz,
}

#[inline]
fn classify(n: i32) -> FizzBuzzValue {
    if n % 15 == 0 {
        FizzBuzzValue::FizzBuzz
    } else if n % 3 == 0 {
        FizzBuzzValue::Fizz
    } else if n % 5 == 0 {
        FizzBuzzValue::Buzz
    } else {
        FizzBuzzValue::Number(n)
    }
}

fn fizz_buzz_core(n: i32) -> Vec<FizzBuzzValue> {
    (1..=n).map(classify).collect()
}

#[napi]
pub fn fizz_buzz_rust(env: Env, n: i32) -> Result<JsObject> {
    let n_usize = n as usize;
    let mut arr = env.create_array_with_length(n_usize)?;
    for (i, val) in fizz_buzz_core(n).into_iter().enumerate() {
        match val {
            FizzBuzzValue::FizzBuzz => {
                arr.set_element(i as u32, env.create_string("FizzBuzz")?)?
            }
            FizzBuzzValue::Fizz => arr.set_element(i as u32, env.create_string("Fizz")?)?,
            FizzBuzzValue::Buzz => arr.set_element(i as u32, env.create_string("Buzz")?)?,
            FizzBuzzValue::Number(num) => {
                arr.set_element(i as u32, env.create_int32(num)?)?
            }
        }
    }
    Ok(arr)
}
This is a straightforward FizzBuzz implementation in Rust using napi-rs. The FizzBuzzValue, classify, and fizz_buzz_core abstractions keep the code idiomatic. The logic is similar to the fizzBuzzLCM variant, and once compiled, it can be easily imported into JavaScript.
const { fizzBuzzRust } = require('./fizzbuzz_napi.node');
const N = 10_000;
const result = fizzBuzzRust(N);
console.log(result);
Let's benchmark all these implementations to see how they compare to previous variants.
| Name | Min | Max | Avg | P75 | P99 | Min mem | Max mem | Avg mem |
|---|---|---|---|---|---|---|---|---|
| Naive | 34.58 µs | 139.79 µs | 38.03 µs | 35.96 µs | 74.42 µs | 14.59 kb | 650.48 kb | 255.65 kb |
| Unrolled | 7.29 µs | 142.38 µs | 7.82 µs | 7.79 µs | 12.50 µs | 6.22 kb | 198.74 kb | 85.32 kb |
| Recursive | 90.46 µs | 269.08 µs | 95.58 µs | 94.46 µs | 190.08 µs | 399.04 kb | 1.38 mb | 1.16 mb |
| DSL | 86.25 µs | 345.75 µs | 94.82 µs | 93.92 µs | 129.83 µs | 25.83 kb | 1.04 mb | 128.48 kb |
| Rust | 645.17 µs | 994.29 µs | 652.35 µs | 653.71 µs | 675.42 µs | 183.83 kb | 328.66 kb | 255.76 kb |
Predictably, the Recursive and DSL variants are much slower than Unrolled, even slower than Naive. Without Tail Call Optimization[17], the Recursive variant pays a heavy price in stack allocations and function calls. The DSL variant has similar overhead. Both show elevated memory consumption compared to the other variants.
Interestingly, the DSL variant's average memory consumption is much lower than the Recursive variant's. This is surprising, since the DSL makes 3 to 4 function calls per iteration of N, while the Recursive variant makes only one. The difference comes down to heap[20] allocations. In the Recursive variant, every return () => _fizzBuzzInternal(n, current + 1, arr); creates a new function object on the heap. In the DSL variant, identity, rule, and program are created once and reused.
Perhaps the most surprising result is Rust; it's dramatically slower than every other approach. You'd expect Rust to outperform JavaScript, but the overhead of crossing the FFI[25] barrier completely overshadows any performance gains. This is a well-known issue with native addons; while native code can be faster, the cost of calling into it can make it slower for small tasks. For larger workloads, the benefits typically outweigh the overhead, but for something as simple as FizzBuzz, it's not worth it.
Before we wrap up, can we optimize the FFI calls in the Rust implementation? One counterintuitive idea is to return a JSON[26] string that JavaScript can parse efficiently, instead of returning a JavaScript-compatible array.
use std::fmt::Write;

#[napi]
pub fn fizz_buzz_rust_json(n: i32) -> String {
    let values = fizz_buzz_core(n);
    let mut result = String::with_capacity(n as usize * 6);
    result.push('[');
    for (i, val) in values.into_iter().enumerate() {
        if i > 0 {
            result.push(',');
        }
        match val {
            FizzBuzzValue::FizzBuzz => result.push_str("\"FizzBuzz\""),
            FizzBuzzValue::Fizz => result.push_str("\"Fizz\""),
            FizzBuzzValue::Buzz => result.push_str("\"Buzz\""),
            FizzBuzzValue::Number(num) => {
                let _ = write!(result, "{}", num);
            }
        }
    }
    result.push(']');
    result
}
This implementation reuses the FizzBuzzValue, classify, and fizz_buzz_core abstractions, but returns a string buffer instead of a JsObject. On the JavaScript side, it looks like this:
const { fizzBuzzRustJson } = require('./fizzbuzz_napi.node');
const N = 10_000;
const result = JSON.parse(fizzBuzzRustJson(N));
console.log(result);
Let's see if this is actually faster than the previous Rust variant.
| Name | Min | Max | Avg | P75 | P99 | Min mem | Max mem | Avg mem |
|---|---|---|---|---|---|---|---|---|
| Naive | 34.63 µs | 133.50 µs | 38.24 µs | 35.92 µs | 75.58 µs | 9.28 kb | 557.04 kb | 255.62 kb |
| Unrolled | 7.33 µs | 159.54 µs | 7.89 µs | 7.83 µs | 12.79 µs | 2.57 kb | 412.26 kb | 85.32 kb |
| Rust | 647.88 µs | 973.75 µs | 656.97 µs | 656.58 µs | 725.63 µs | 183.83 kb | 328.66 kb | 255.67 kb |
| Rust (JSON) | 256.79 µs | 482.71 µs | 261.20 µs | 260.42 µs | 303.50 µs | 58.55 kb | 465.45 kb | 171.12 kb |
The JSON variant is significantly faster than the previous Rust implementation, though it's still slower than any of the pure JavaScript variants.
Why is it faster? It only crosses the FFI barrier once when the final result string is returned. The previous Rust variant crossed the barrier twice per iteration of N. Every call to env.create_string, env.create_int32, and arr.set_element causes V8 to allocate objects on the heap[20] and pass pointers back to Rust.
The other factor is that JSON.parse is such a common operation that V8 has dedicated optimizations[27][28] for executing it as efficiently as possible.
I think we've exhausted the madness. We've implemented 9 different variants of FizzBuzz across JavaScript and Rust, and benchmarked them all. The unrolled variant is the fastest in pure JavaScript, while the JSON variant is the fastest Rust approach. And we've confirmed that FFI overhead can make native code slower for small tasks, even when that native code is Rust.
But wait, there's more!
The Free Lunch
There's one more optimization we can try, and it requires zero code changes. Until now, we've been running benchmarks on my local machine using Node.JS v24. But Node is constantly improving its performance[29], and we now have Bun[30] and Deno[31] as compelling alternatives.
I set up a GitHub Action on my fizzbuzz-js repository that runs the same benchmark against Node v20, v22, v24, v25, as well as the latest Bun and Deno versions. I won't include all the results here; you can check them out on GitHub if you're curious.
Here's what stood out:
- Simply switching to Bun gives us another 65% speed improvement on fizzBuzzUnrolled. This is a massive gain and a testament to the work the Bun team has put into runtime performance.
- Across runtimes, Bun is the clear winner in both raw speed and memory consumption. Deno and Node are largely neck and neck, with each edging ahead in different scenarios.
- Across Node versions, there's a steady performance improvement with each release. Interestingly, Node 25 shows a slight regression compared to Node 24, a common occurrence in new major versions before patch-level optimizations smooth things out.
Conclusion
In this article, we took the humble FizzBuzz problem and pushed it to its limits across JavaScript and Rust. Starting from a naive implementation, we optimized step by step, leveraging fewer conditional checks, array preallocation, loop unrolling, and even a Domain-Specific Language. We explored native addons to see if Rust could outpace V8, and finally compared runtimes for a zero-effort performance boost.
The key takeaways:
- There are surprisingly many ways to optimize even a trivial algorithm,
- The choice of runtime matters more than you might expect,
- And native code isn't always faster when FFI overhead is in play.
If you're interested in the code or want to contribute your own variant to the benchmark, check out the GitHub repository: fizzbuzz-js.
1. Wikipedia. Fizz buzz. https://en.wikipedia.org/wiki/Fizz_buzz
2. InterviewBit. FizzBuzz. https://www.interviewbit.com/problems/fizzbuzz/
3. Node.JS. The V8 JavaScript Engine. https://nodejs.org/en/learn/getting-started/the-v8-javascript-engine
4. GitHub. mitata. https://github.com/evanwashere/mitata
5. Wikipedia. Just-in-time compilation. https://en.wikipedia.org/wiki/Just-in-time_compilation
6. V8.dev. Ignition. https://v8.dev/docs/ignition
7. V8.dev. TurboFan. https://v8.dev/docs/turbofan
8. Wikipedia. Garbage collection (computer science). https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
9. Wikipedia. Branch predictor. https://en.wikipedia.org/wiki/Branch_predictor
10. Wikipedia. Least common multiple. https://en.wikipedia.org/wiki/Least_common_multiple
11. Wikipedia. Inline expansion. https://en.wikipedia.org/wiki/Inline_expansion
12. Wikipedia. Loop unrolling. https://en.wikipedia.org/wiki/Loop_unrolling
13. Clinic.js. Tools to help diagnose and pinpoint Node.js performance issues. https://clinicjs.org/
14. 0x. single-command flamegraph profiling. https://github.com/davidmarkclements/0x
15. V8.dev. Maglev - V8’s Fastest Optimizing JIT. https://v8.dev/blog/maglev
16. V8.dev. Sparkplug — a non-optimizing JavaScript compiler. https://v8.dev/blog/sparkplug
17. Wikipedia. Tail call. https://en.wikipedia.org/wiki/Tail_call
18. GitHub. Ecma TC39. https://github.com/tc39
19. GitHub. Is this proposal dead? https://github.com/tc39/proposal-ptc-syntax/issues/22
20. CS 225. Stack and Heap Memory. https://courses.grainger.illinois.edu/cs225/sp2022/resources/stack-heap/
21. Java Design Patterns. Trampoline Pattern in Java: Mastering Recursion Without Stack Overflow. https://java-design-patterns.com/patterns/trampoline/
22. The Monad Reader. FizzBuzz in Haskell by Embedding a Domain-Specific Language. https://themonadreader.wordpress.com/wp-content/uploads/2014/04/fizzbuzz.pdf
23. Node.js v24.13.0 documentation. C++ addons. https://nodejs.org/docs/latest-v24.x/api/addons.html
24. NAPI-RS. Building pre-compiled Node.JS addons in Rust. https://napi.rs
25. Wikipedia. Foreign function interface. https://en.wikipedia.org/wiki/Foreign_function_interface
26. Wikipedia. JSON. https://en.wikipedia.org/wiki/JSON
27. V8.dev. Blazingly fast parsing, part 1: optimizing the scanner. https://v8.dev/blog/scanner
28. V8.dev. Blazingly fast parsing, part 2: lazy parsing. https://v8.dev/blog/preparser
29. RepoFlow. Node.js 16 to 25 Performance Benchmarks. https://www.repoflow.io/blog/node-js-16-to-25-benchmarks-how-performance-evolved-over-time
30. Bun. The fast all-in-one JavaScript runtime. https://bun.com
31. Deno. The next-generation JavaScript runtime. https://deno.com
