Most JSON string data is boring.
It is just ASCII text that does not contain " or \, does not dip below 0x20, and does not force any special-case escaping at all.
The annoying part is that a serializer still has to prove that.
The naive loop is obvious:
for (let i = 0; i < src.length; i++) {
  const code = src.charCodeAt(i);
  if (code == 34 || code == 92 || code < 32) {
    // escape
  }
}
That is correct, but it is also one branch per code unit in the hottest part of the serializer.
For json-as, I wanted the fast path to ask a cheaper question:
Does this whole chunk contain anything interesting at all?
That is where SWAR fits nicely.
For this post, the companion code is here:
What needs detecting?
When serializing a JSON string, these are the lanes that matter:
- " because it must become \"
- \ because it must become \\
- control characters < 0x20
- non-ASCII UTF-16 code units, because they cannot stay on the pure ASCII copy path
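As a ground truth to keep in mind, the per-code-unit version of this classification can be sketched like this (a hypothetical reference helper for illustration, not part of json-as):

```typescript
// Hypothetical scalar reference: classify one UTF-16 code unit.
// This is the per-character question the SWAR detector batches up.
function needsAttention(code: number): boolean {
  return code === 0x22   // '"' must become \"
      || code === 0x5c   // '\' must become \\
      || code < 0x20     // control characters
      || code > 0x7f;    // non-ASCII: off the pure ASCII copy path
}
```

The SWAR detectors below answer this same question for four code units at a time.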
If I can classify four UTF-16 lanes at once inside one u64, then the common case becomes:
- load 8 bytes
- compute a mask
- copy directly if the mask is zero
That is the whole job of this helper, written in AssemblyScript:
@inline function detect_escapable_u64_swar_safe(block: u64): u64 {
  const lo = block & 0x00ff_00ff_00ff_00ff;
  const ascii_mask =
    ((lo - 0x0020_0020_0020_0020)
      | ((lo ^ 0x0022_0022_0022_0022) - 0x0001_0001_0001_0001)
      | ((lo ^ 0x005c_005c_005c_005c) - 0x0001_0001_0001_0001))
    & (0x0080_0080_0080_0080 & ~lo);
  const hi_mask =
    ((block - 0x0100_0100_0100_0100) & ~block & 0x8000_8000_8000_8000)
    ^ 0x8000_8000_8000_8000;
  return (ascii_mask & (~hi_mask >> 8)) | hi_mask;
}
It looks cryptic, but it is just two independent predicates packed into one result.
Why UTF-16 makes this convenient
AssemblyScript strings are UTF-16, and that matters a lot here.
For plain ASCII text:
- the low byte of each 16-bit lane contains the character
- the high byte is zero
That gives a very convenient layout. I can inspect the low bytes for JSON escape cases, and separately inspect the high bytes to notice when the block is no longer plain ASCII.
So this is not a universal string trick. It is a particularly good fit for UTF-16 data.
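To make the lane layout concrete, here is a small sketch (TypeScript, with BigInt standing in for u64) that packs four UTF-16 code units into little-endian 16-bit lanes, the way an 8-byte load over AssemblyScript string data would see them:

```typescript
// Sketch: pack the first four UTF-16 code units of a string into a
// 64-bit value with little-endian 16-bit lanes, mirroring what an
// 8-byte load sees over AssemblyScript string data.
function packBlock(s: string): bigint {
  let block = 0n;
  for (let i = 0; i < 4; i++) {
    block |= BigInt(s.charCodeAt(i)) << BigInt(16 * i);
  }
  return block;
}

// For plain ASCII, every high byte of every lane is zero:
// packBlock("abcd") === 0x0064_0063_0062_0061n
```

Masking with 0xff00_ff00_ff00_ff00 then tells you at a glance whether any lane carries a non-zero high byte.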
The low-byte test
First, isolate the low byte of each UTF-16 lane:
const lo = block & 0x00ff_00ff_00ff_00ff;
Now I only care about three low-byte predicates:
- < 0x20
- == 0x22
- == 0x5c
The detector builds those without branches:
const ascii_mask =
  ((lo - 0x0020_0020_0020_0020)
    | ((lo ^ 0x0022_0022_0022_0022) - 0x0001_0001_0001_0001)
    | ((lo ^ 0x005c_005c_005c_005c) - 0x0001_0001_0001_0001))
  & (0x0080_0080_0080_0080 & ~lo);
This is standard SWAR arithmetic:
- subtraction to manufacture per-lane underflow or equality signals
- masking to collapse those signals into lane-local high bits
- no per-character branch in the hot path
If one of the four low bytes looks special, its lane gets marked in ascii_mask.
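To see the equality trick in isolation, here is the same arithmetic on a single lane (a sketch; `lane` is assumed to be a low byte in the range 0x00-0xFF):

```typescript
// One lane of the SWAR equality test: XOR zeroes the lane exactly when
// it equals the target, subtracting 1 then underflows into bit 7, and
// the ~lane guard suppresses bytes >= 0x80 that would otherwise alias.
function laneEquals(lane: number, target: number): boolean {
  return (((lane ^ target) - 1) & 0x80 & ~lane) !== 0;
}
```

Doing this for all four lanes at once is exactly what the `- 0x0001_0001_0001_0001` and `& ~lo` terms above achieve without any branches.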
The high-byte test
That still leaves non-ASCII code units.
The ASCII trick only works when every high byte is zero, so the detector also asks:
Is any UTF-16 high byte non-zero?
That is what this part does:
const hi_mask =
  ((block - 0x0100_0100_0100_0100) & ~block & 0x8000_8000_8000_8000)
  ^ 0x8000_8000_8000_8000;
The result is a mask over the high bytes of the four lanes. If a lane is not plain ASCII, that lane is marked.
Then the final merge is:
return (ascii_mask & (~hi_mask >> 8)) | hi_mask;
That means:
- keep the ASCII escape hits for truly ASCII lanes
- also mark any lane whose high byte is non-zero
The serializer now has exactly the answer it needs:
Which lanes force me off the pure ASCII copy path?
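As a cross-check, the safe detector ports directly to TypeScript with BigInt standing in for u64. This is a sketch for experimentation, not the shipped code; `packBlock` is a hypothetical helper that mirrors an 8-byte little-endian load of UTF-16 data:

```typescript
// BigInt port of detect_escapable_u64_swar_safe for experimentation.
// U64 wraps intermediate results to 64 bits the way u64 arithmetic does.
const U64 = (x: bigint): bigint => BigInt.asUintN(64, x);

function detectEscapableSafe(block: bigint): bigint {
  const lo = block & 0x00ff_00ff_00ff_00ffn;
  const asciiMask =
    (U64(lo - 0x0020_0020_0020_0020n)
      | U64((lo ^ 0x0022_0022_0022_0022n) - 0x0001_0001_0001_0001n)
      | U64((lo ^ 0x005c_005c_005c_005cn) - 0x0001_0001_0001_0001n))
    & (0x0080_0080_0080_0080n & U64(~lo));
  const hiMask =
    (U64(block - 0x0100_0100_0100_0100n) & U64(~block) & 0x8000_8000_8000_8000n)
    ^ 0x8000_8000_8000_8000n;
  return (asciiMask & (U64(~hiMask) >> 8n)) | hiMask;
}

// Hypothetical helper: pack four UTF-16 code units, little-endian lanes.
function packBlock(s: string): bigint {
  let block = 0n;
  for (let i = 0; i < 4; i++) {
    block |= BigInt(s.charCodeAt(i)) << BigInt(16 * i);
  }
  return block;
}
```

A clean ASCII block yields zero, a quote marks only its lane's low-byte bit, and a non-ASCII code unit marks its lane's high-byte bit.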
The unsafe variant
There is also a cheaper detector:
@inline export function detect_escapable_u64_swar_unsafe(block: u64): u64 {
  const lo = block & 0x00ff_00ff_00ff_00ff;
  const ascii_mask =
    ((lo - 0x0020_0020_0020_0020)
      | ((lo ^ 0x0022_0022_0022_0022) - 0x0001_0001_0001_0001)
      | ((lo ^ 0x005c_005c_005c_005c) - 0x0001_0001_0001_0001))
    & (0x0080_0080_0080_0080 & ~lo);
  return ascii_mask | (block & 0xff00_ff00_ff00_ff00);
}
This one does not prove that a non-ASCII lane is really dangerous. It just marks any lane with a non-zero high byte and lets the later slow path sort it out.
That means more false positives, but a smaller and faster detector.
So the tradeoff is simple:
- safe: do more proof up front
- unsafe: bail out faster and prove less on the hot path
If your strings are mostly ASCII, that is often a good deal.
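The extra work the unsafe variant defers is easy to observe in a BigInt sketch (again with `packBlock` as a hypothetical helper mirroring an 8-byte little-endian load). A code unit like U+0122 has low byte 0x22, so the unsafe detector flags its lane's low byte as a "quote" hit as well as the high byte; the slow path has to re-check it:

```typescript
// BigInt sketch of detect_escapable_u64_swar_unsafe for experimentation.
const U64 = (x: bigint): bigint => BigInt.asUintN(64, x);

function detectEscapableUnsafe(block: bigint): bigint {
  const lo = block & 0x00ff_00ff_00ff_00ffn;
  const asciiMask =
    (U64(lo - 0x0020_0020_0020_0020n)
      | U64((lo ^ 0x0022_0022_0022_0022n) - 0x0001_0001_0001_0001n)
      | U64((lo ^ 0x005c_005c_005c_005cn) - 0x0001_0001_0001_0001n))
    & (0x0080_0080_0080_0080n & U64(~lo));
  // No high-byte proof/merge step: just pass through non-zero high bytes.
  return asciiMask | (block & 0xff00_ff00_ff00_ff00n);
}

// Hypothetical helper: pack four UTF-16 code units, little-endian lanes.
function packBlock(s: string): bigint {
  let block = 0n;
  for (let i = 0; i < 4; i++) {
    block |= BigInt(s.charCodeAt(i)) << BigInt(16 * i);
  }
  return block;
}
```

For "a\u0122cd", lane 1 ends up with both its low-byte bit and a high-byte bit set, where the safe detector would suppress the spurious low-byte hit.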
What the serializer gets from this
Once you have the mask, the fast path becomes tiny:
while (offset < byteLength) {
  const block = load<u64>(srcStart + offset);
  let mask = detect_escapable_u64_swar_unsafe(block);
  while (mask != 0) {
    const laneIdx = ctz(mask) >> 3;
    // clear both bytes of the 16-bit lane containing this hit
    mask &= ~((<u64>0xffff) << ((laneIdx >> 1) << 4));
    // even (0 2 4 6) -> confirmed ascii escape
    // odd (1 3 5 7) -> possibly a unicode code unit or surrogate
    if ((laneIdx & 1) == 0) {
      // handle non-surrogates (fast path)
      continue;
    }
    // and handle surrogates here (unlikely to be hit)
  }
  offset += 8;
}
That is the payoff.
For ordinary ASCII blocks with no escapes, the serializer just copies 8 bytes and moves on.
Only when the mask is non-zero does it need to identify the exact lane and expand the escape sequence.
That shifts the cost model in the right direction. Instead of repeatedly asking “does this character need escaping?”, the hot path asks “is there anything interesting in this whole block?”
Benchmarking safe vs unsafe
I reran the detector-only benchmark with matching harness shapes in both implementations:
- AssemblyScript compiled to Wasm and run with wasmer --llvm --enable-pass-params-opt
- native C compiled with aggressive host-tuned flags and LTO
Both versions use the same payload size, warmup policy, operation count, and logging structure. The benchmark code is here:
The two payload shapes are:
- plain ASCII text with no escapes
- escape-heavy text with quotes, backslashes, and control characters
Here are the current numbers from that harness:
| Case | Wasm (wasmer --llvm --enable-pass-params-opt) | Native C |
|---|---|---|
| safe / plain | 5030 MB/s | 6824 MB/s |
| unsafe / plain | 6355 MB/s | 8562 MB/s |
| safe / escaped | 4908 MB/s | 6295 MB/s |
| unsafe / escaped | 7093 MB/s | 8331 MB/s |
That is exactly what I wanted to know. The unsafe detector is not just theoretically smaller. It buys measurable throughput in the actual runtime I care about. When integrating it into an actual JSON serializer, it's marginally faster too.
If you want to rerun it, the example folder is here:
And the commands are just:
make install
make build
make run
Looking at the generated code
The throughput numbers tell you the tradeoff is real, but the code shape explains why.
The unsafe detector compiles to a smaller body because it skips the extra high-byte proof/merge logic from the safe version. That is easiest to see in the inspection example:
There is nothing magical there. The unsafe version is just doing less work.
If you want to see the real implementation in context, these are the relevant json-as sources: