Most JSON string data is boring.
It is just ASCII text that does not contain " or \, does not dip below 0x20, and does not force any special-case escaping at all.
The annoying part is that a serializer still has to prove that.
The naive loop is obvious:
for (let i = 0; i < src.length; i++) {
  const code = src.charCodeAt(i);
  if (code == 34 || code == 92 || code < 32) {
    // escape
  }
}
That is correct, but it is also one branch per code unit in the hottest part of the serializer.
For json-as, I wanted the fast path to ask a cheaper question:
Does this whole chunk contain anything interesting at all?
That is where SWAR fits nicely.
For this post, the companion code is here:
What needs detecting?
When serializing a JSON string, these are the lanes that matter:
- " because it must become \"
- \ because it must become \\
- control characters < 0x20
- non-ASCII UTF-16 code units, because they cannot stay on the pure ASCII copy path
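As a ground truth to keep in mind, the per-code-unit version of this classification can be sketched like this (a hypothetical reference helper for illustration, not part of json-as):

```typescript
// Hypothetical scalar reference: classify one UTF-16 code unit.
// This is the per-character question the SWAR detector batches up.
function needsAttention(code: number): boolean {
  return code === 0x22   // '"' must become \"
      || code === 0x5c   // '\' must become \\
      || code < 0x20     // control characters
      || code > 0x7f;    // non-ASCII: off the pure ASCII copy path
}
```

The SWAR detectors below answer this same question for four code units at a time.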
If I can classify four UTF-16 lanes at once inside one u64, then the common case becomes:
- load 8 bytes
- compute a mask
- copy directly if the mask is zero
That is the whole job of this helper, written in AssemblyScript:
@inline function detect_escapable_u64_swar_safe(block: u64): u64 {
  const lo = block & 0x00ff_00ff_00ff_00ff;
  const ascii_mask =
    ((lo - 0x0020_0020_0020_0020)
      | ((lo ^ 0x0022_0022_0022_0022) - 0x0001_0001_0001_0001)
      | ((lo ^ 0x005c_005c_005c_005c) - 0x0001_0001_0001_0001))
    & (0x0080_0080_0080_0080 & ~lo);
  const hi_mask =
    ((block - 0x0100_0100_0100_0100) & ~block & 0x8000_8000_8000_8000)
    ^ 0x8000_8000_8000_8000;
  return (ascii_mask & (~hi_mask >> 8)) | hi_mask;
}
It looks cryptic, but it is just two independent predicates packed into one result.
Why UTF-16 makes this convenient
AssemblyScript strings are UTF-16, and that matters a lot here.
For plain ASCII text:
- the low byte of each 16-bit lane contains the character
- the high byte is zero
That gives a very convenient layout. I can inspect the low bytes for JSON escape cases, and separately inspect the high bytes to notice when the block is no longer plain ASCII.
So this is not a universal string trick. It is a particularly good fit for UTF-16 data.
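To make the lane layout concrete, here is a small sketch (TypeScript, with BigInt standing in for u64) that packs four UTF-16 code units into little-endian 16-bit lanes, the way an 8-byte load over AssemblyScript string data would see them:

```typescript
// Sketch: pack the first four UTF-16 code units of a string into a
// 64-bit value with little-endian 16-bit lanes, mirroring what an
// 8-byte load sees over AssemblyScript string data.
function packBlock(s: string): bigint {
  let block = 0n;
  for (let i = 0; i < 4; i++) {
    block |= BigInt(s.charCodeAt(i)) << BigInt(16 * i);
  }
  return block;
}

// For plain ASCII, every high byte of every lane is zero:
// packBlock("abcd") === 0x0064_0063_0062_0061n
```

Masking with 0xff00_ff00_ff00_ff00 then tells you at a glance whether any lane carries a non-zero high byte.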
The low-byte test
First, isolate the low byte of each UTF-16 lane:
const lo = block & 0x00ff_00ff_00ff_00ff;
Now I only care about three low-byte predicates:
- < 0x20
- == 0x22
- == 0x5c
The detector builds those without branches:
const ascii_mask =
  ((lo - 0x0020_0020_0020_0020)
    | ((lo ^ 0x0022_0022_0022_0022) - 0x0001_0001_0001_0001)
    | ((lo ^ 0x005c_005c_005c_005c) - 0x0001_0001_0001_0001))
  & (0x0080_0080_0080_0080 & ~lo);
This is standard SWAR arithmetic:
- subtraction to manufacture per-lane underflow or equality signals
- masking to collapse those signals into lane-local high bits
- no per-character branch in the hot path
If one of the four low bytes looks special, its lane gets marked in ascii_mask.
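To see the equality trick in isolation, here is the same arithmetic on a single lane (a sketch; `lane` is assumed to be a low byte in the range 0x00-0xFF):

```typescript
// One lane of the SWAR equality test: XOR zeroes the lane exactly when
// it equals the target, subtracting 1 then underflows into bit 7, and
// the ~lane guard suppresses bytes >= 0x80 that would otherwise alias.
function laneEquals(lane: number, target: number): boolean {
  return (((lane ^ target) - 1) & 0x80 & ~lane) !== 0;
}
```

Doing this for all four lanes at once is exactly what the `- 0x0001_0001_0001_0001` and `& ~lo` terms above achieve without any branches.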
The high-byte test
That still leaves non-ASCII code units.
The ASCII trick only works when every high byte is zero, so the detector also asks:
Is any UTF-16 high byte non-zero?
That is what this part does:
const hi_mask =
  ((block - 0x0100_0100_0100_0100) & ~block & 0x8000_8000_8000_8000)
  ^ 0x8000_8000_8000_8000;
The result is a mask over the high bytes of the four lanes. If a lane is not plain ASCII, that lane is marked.
Then the final merge is:
return (ascii_mask & (~hi_mask >> 8)) | hi_mask;
That means:
- keep the ASCII escape hits for truly ASCII lanes
- also mark any lane whose high byte is non-zero
The serializer now has exactly the answer it needs:
Which lanes force me off the pure ASCII copy path?
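As a cross-check, the safe detector ports directly to TypeScript with BigInt standing in for u64. This is a sketch for experimentation, not the shipped code; `packBlock` is a hypothetical helper that mirrors an 8-byte little-endian load of UTF-16 data:

```typescript
// BigInt port of detect_escapable_u64_swar_safe for experimentation.
// U64 wraps intermediate results to 64 bits the way u64 arithmetic does.
const U64 = (x: bigint): bigint => BigInt.asUintN(64, x);

function detectEscapableSafe(block: bigint): bigint {
  const lo = block & 0x00ff_00ff_00ff_00ffn;
  const asciiMask =
    (U64(lo - 0x0020_0020_0020_0020n)
      | U64((lo ^ 0x0022_0022_0022_0022n) - 0x0001_0001_0001_0001n)
      | U64((lo ^ 0x005c_005c_005c_005cn) - 0x0001_0001_0001_0001n))
    & (0x0080_0080_0080_0080n & U64(~lo));
  const hiMask =
    (U64(block - 0x0100_0100_0100_0100n) & U64(~block) & 0x8000_8000_8000_8000n)
    ^ 0x8000_8000_8000_8000n;
  return (asciiMask & (U64(~hiMask) >> 8n)) | hiMask;
}

// Hypothetical helper: pack four UTF-16 code units, little-endian lanes.
function packBlock(s: string): bigint {
  let block = 0n;
  for (let i = 0; i < 4; i++) {
    block |= BigInt(s.charCodeAt(i)) << BigInt(16 * i);
  }
  return block;
}
```

A clean ASCII block yields zero, a quote marks only its lane's low-byte bit, and a non-ASCII code unit marks its lane's high-byte bit.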
The unsafe variant
There is also a cheaper detector:
@inline export function detect_escapable_u64_swar_unsafe(block: u64): u64 {
  const lo = block & 0x00ff_00ff_00ff_00ff;
  const ascii_mask =
    ((lo - 0x0020_0020_0020_0020)
      | ((lo ^ 0x0022_0022_0022_0022) - 0x0001_0001_0001_0001)
      | ((lo ^ 0x005c_005c_005c_005c) - 0x0001_0001_0001_0001))
    & (0x0080_0080_0080_0080 & ~lo);
  return ascii_mask | (block & 0xff00_ff00_ff00_ff00);
}
This one does not prove that a non-ASCII lane is really dangerous. It just marks any lane with a non-zero high byte and lets the later slow path sort it out.
That means more false positives, but a smaller and faster detector.
So the tradeoff is simple:
- safe: do more proof up front
- unsafe: bail out faster and prove less on the hot path
If your strings are mostly ASCII, that is often a good deal.
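The extra work the unsafe variant defers is easy to observe in a BigInt sketch (again with `packBlock` as a hypothetical helper mirroring an 8-byte little-endian load). A code unit like U+0122 has low byte 0x22, so the unsafe detector flags its lane's low byte as a "quote" hit as well as the high byte; the slow path has to re-check it:

```typescript
// BigInt sketch of detect_escapable_u64_swar_unsafe for experimentation.
const U64 = (x: bigint): bigint => BigInt.asUintN(64, x);

function detectEscapableUnsafe(block: bigint): bigint {
  const lo = block & 0x00ff_00ff_00ff_00ffn;
  const asciiMask =
    (U64(lo - 0x0020_0020_0020_0020n)
      | U64((lo ^ 0x0022_0022_0022_0022n) - 0x0001_0001_0001_0001n)
      | U64((lo ^ 0x005c_005c_005c_005cn) - 0x0001_0001_0001_0001n))
    & (0x0080_0080_0080_0080n & U64(~lo));
  // No high-byte proof/merge step: just pass through non-zero high bytes.
  return asciiMask | (block & 0xff00_ff00_ff00_ff00n);
}

// Hypothetical helper: pack four UTF-16 code units, little-endian lanes.
function packBlock(s: string): bigint {
  let block = 0n;
  for (let i = 0; i < 4; i++) {
    block |= BigInt(s.charCodeAt(i)) << BigInt(16 * i);
  }
  return block;
}
```

For "a\u0122cd", lane 1 ends up with both its low-byte bit and a high-byte bit set, where the safe detector would suppress the spurious low-byte hit.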
What the serializer gets from this
Once you have the mask, the fast path becomes tiny:
while (offset < byteLength) {
  const block = load<u64>(srcStart + offset);
  let mask = detect_escapable_u64_swar_unsafe(block);
  while (mask != 0) {
    const laneIdx = ctz(mask) >> 3;
    // clear both bytes of the 16-bit lane containing this hit
    mask &= ~((<u64>0xffff) << ((laneIdx >> 1) << 4));
    // even (0 2 4 6) -> confirmed ascii escape
    // odd (1 3 5 7) -> possibly a unicode code unit or surrogate
    if ((laneIdx & 1) == 0) {
      // handle non-surrogates (fast path)
      continue;
    }
    // and handle surrogates here (unlikely to be hit)
  }
  offset += 8;
}
That is the payoff.
For ordinary ASCII blocks with no escapes, the serializer just copies 8 bytes and moves on.
Only when the mask is non-zero does it need to identify the exact lane and expand the escape sequence.
That shifts the cost model in the right direction. Instead of repeatedly asking “does this character need escaping?”, the hot path asks “is there anything interesting in this whole block?”
Benchmarking safe vs unsafe
I reran the detector-only benchmark with matching harness shapes in both implementations:
- AssemblyScript compiled to Wasm and run with wasmer --llvm --enable-pass-params-opt
- native C compiled with aggressive host-tuned flags and LTO
Both versions use the same payload size, warmup policy, operation count, and logging structure. The benchmark code is here:
The two payload shapes are:
- plain ASCII text with no escapes
- escape-heavy text with quotes, backslashes, and control characters
Here are the current numbers from that harness:
| Case | Wasm (wasmer --llvm --enable-pass-params-opt) | Native C |
|---|---|---|
| safe / plain | 5030 MB/s | 6824 MB/s |
| unsafe / plain | 6355 MB/s | 8562 MB/s |
| safe / escaped | 4908 MB/s | 6295 MB/s |
| unsafe / escaped | 7093 MB/s | 8331 MB/s |
That is exactly what I wanted to know. The unsafe detector is not just theoretically smaller. It buys measurable throughput in the actual runtime I care about. When integrating it into an actual JSON serializer, it's marginally faster too.
If you want to rerun it, the example folder is here:
And the commands are just:
make install
make build
make run
Looking at the generated code
The throughput numbers tell you the tradeoff is real, but the code shape explains why.
The unsafe detector compiles to a smaller body because it skips the extra high-byte proof/merge logic from the safe version. That is easiest to see in the inspection example:
There is nothing magical there. The unsafe version is just doing less work.
If you want to see the real implementation in context, these are the relevant json-as sources: