How a Go serializer picks the smallest encoding for every column — and never guesses wrong

#go #performance #serialization #compression

There is no single best way to encode a batch of records. A column of HTTP
status codes wants run-length encoding. A column of monotonically increasing
timestamps wants delta coding. A column of trace IDs wants substring
compression. A column of embeddings wants something else entirely. Pick one
codec for the whole message and you leave most of the win on the floor.

qdf — a schemaless binary serializer for Go — takes the opposite approach: it
transposes a []struct into columns and then chooses a codec per column,
measuring the candidates instead of guessing. It does so under a rule that
makes the choice safe to turn on blindly: a column's encoded form is never
larger than its plain encoding. This post is about how that works and why the
never-larger rule is the part that actually matters.

Step 1: transpose to columns

Given a []struct, row-major encoding writes field-by-field, record after
record. That's what json, msgpack, and protobuf do — and it's why a repeated
"region":"eu-west-1" costs its full length in every row.

qdf, under its Dense/columnar path, pivots the batch: all the Status values
together, all the Timestamp values together, all the TraceID values
together. Now each column is a homogeneous array — and homogeneous arrays are
exactly what specialized codecs are good at.
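
As a rough illustration of the pivot, here is a hand-written transpose for one fixed struct. The Record and Columns types and the transpose function are hypothetical names invented for this sketch; qdf is schemaless, so its own pivot presumably does not rely on a hand-coded type like this.

```go
package main

import "fmt"

// Record is a hypothetical row type used only for illustration.
type Record struct {
	Status    uint16
	Timestamp int64
	TraceID   string
}

// Columns holds one homogeneous slice per field of Record.
type Columns struct {
	Status    []uint16
	Timestamp []int64
	TraceID   []string
}

// transpose pivots a row-major batch into column-major slices.
func transpose(batch []Record) Columns {
	cols := Columns{
		Status:    make([]uint16, 0, len(batch)),
		Timestamp: make([]int64, 0, len(batch)),
		TraceID:   make([]string, 0, len(batch)),
	}
	for _, r := range batch {
		cols.Status = append(cols.Status, r.Status)
		cols.Timestamp = append(cols.Timestamp, r.Timestamp)
		cols.TraceID = append(cols.TraceID, r.TraceID)
	}
	return cols
}

func main() {
	batch := []Record{
		{200, 1700000000, "a1"},
		{200, 1700000001, "a2"},
		{404, 1700000002, "a3"},
	}
	cols := transpose(batch)
	fmt.Println(cols.Status)
	fmt.Println(cols.Timestamp)
	fmt.Println(cols.TraceID)
}
```

After the pivot, each slice can be handed to a codec that only has to understand one element type.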

Step 2: a menu of codecs per column type

Once you're looking at one column, the codec space opens up:

Integers / durations / counts

FOR (frame of reference) — subtract the column minimum, bit-pack the residuals. Great for bounded ranges (ports, status codes, small counters).
Delta + FOR — encode the first value plus bit-packed deltas against a running predictor. This is the one for monotonic sequences (timestamps, ids).
RLE — one (value, run-length) pair per run. Wins hard on enum-like columns where the same value repeats (log levels, booleans, sparse counters).
Dictionary — a table of distinct values plus a bit-packed index per row.
Patched FOR (PFOR) — FOR with an exception list for the few outliers that would otherwise blow up the bit width.
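
To make the integer menu concrete, the sketch below computes the quantities that drive three of these codecs: the bit width FOR needs, the bit width FOR needs once the column is delta-coded, and the number of runs RLE would emit. The function names are made up for this post and the code only measures; it does not bit-pack, and it is not taken from qdf.

```go
package main

import (
	"fmt"
	"math/bits"
)

// forBits returns the bits needed per value after subtracting the column
// minimum (frame of reference). The subtraction is done in uint64 so a very
// wide range cannot overflow.
func forBits(vals []int64) int {
	if len(vals) == 0 {
		return 0
	}
	lo, hi := vals[0], vals[0]
	for _, v := range vals[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	return bits.Len64(uint64(hi) - uint64(lo))
}

// deltas returns the successive differences vals[i] - vals[i-1].
func deltas(vals []int64) []int64 {
	if len(vals) < 2 {
		return nil
	}
	d := make([]int64, len(vals)-1)
	for i := 1; i < len(vals); i++ {
		d[i-1] = vals[i] - vals[i-1]
	}
	return d
}

// runs counts the (value, run-length) pairs RLE would emit.
func runs(vals []int64) int {
	n := 0
	for i, v := range vals {
		if i == 0 || v != vals[i-1] {
			n++
		}
	}
	return n
}

func main() {
	ts := []int64{1700000000, 1700000010, 1700000020, 1700000031}
	fmt.Println("FOR bits per value:      ", forBits(ts))
	fmt.Println("Delta+FOR bits per delta:", forBits(deltas(ts)))

	levels := []int64{2, 2, 2, 2, 3, 3, 2, 2}
	fmt.Println("RLE runs:", runs(levels), "for", len(levels), "values")
}
```

On the timestamp column the raw range is 31, so FOR needs 5 bits per value, while the deltas (10, 10, 11) span a range of 1 and need a single bit each. That gap is why monotonic sequences favor delta coding.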

Floats

Gorilla XOR — XOR each sample against the previous one and store only the differing bits. Built for smooth time-series (sensor readings, gauges).
ALP — for decimal-ish []float64/[]float32 that are secretly fixed-point (prices, quantized values), store the integer mantissa.

Strings

Dictionary and front-coding for low-cardinality or shared-prefix columns (SIDs, DNs, paths, URLs).
Alphabet packing for high-cardinality values drawn from a small alphabet (hex / base32 / base64 IDs — store each char in ceil(log2|A|) bits).
FSST — a learned table of up to 255 substrings for high-cardinality free text (log lines, URLs), compressing at the byte level.
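
Two of the string measurements are easy to show directly: the ceil(log2|A|) width used by alphabet packing, and the shared-prefix length that front-coding exploits. This is a sketch with invented function names; the size it reports covers only the packed characters, not string lengths or the alphabet table, and it says nothing about qdf's actual layout.

```go
package main

import (
	"fmt"
	"math/bits"
)

// alphabetBits returns ceil(log2|A|), where A is the set of distinct bytes
// that appear anywhere in ids.
func alphabetBits(ids []string) int {
	var seen [256]bool
	n := 0
	for _, id := range ids {
		for i := 0; i < len(id); i++ {
			if !seen[id[i]] {
				seen[id[i]] = true
				n++
			}
		}
	}
	if n <= 1 {
		return 0
	}
	return bits.Len(uint(n - 1))
}

// packedBytes is the payload size when every character takes width bits.
func packedBytes(ids []string, width int) int {
	chars := 0
	for _, id := range ids {
		chars += len(id)
	}
	return (chars*width + 7) / 8
}

// sharedPrefix returns how many leading bytes a and b have in common, which is
// what front-coding stores instead of repeating the prefix.
func sharedPrefix(a, b string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	ids := []string{"0123456789abcdef", "fedcba9876543210"}
	w := alphabetBits(ids)
	fmt.Println("bits per char:", w, "packed bytes:", packedBytes(ids, w))

	fmt.Println("shared prefix:", sharedPrefix("/api/v1/users", "/api/v1/orders"))
}
```

For hex IDs the alphabet has 16 symbols, so each character costs 4 bits and the payload is half its ASCII size.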

Whole body

rANS — a final static order-0 entropy pass that squeezes the residual byte-entropy the structural codecs leave behind.
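
A quick way to see how much an order-0 entropy pass could still remove is to compute the Shannon entropy of the byte histogram. The sketch below does that; it is an estimate of what an ideal order-0 coder approaches, not an rANS implementation, and both the function names and the tableBytes overhead parameter are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// order0Bytes estimates the size in bytes an ideal order-0 entropy coder would
// need for body, ignoring the cost of storing the frequency table.
func order0Bytes(body []byte) float64 {
	if len(body) == 0 {
		return 0
	}
	var hist [256]int
	for _, b := range body {
		hist[b]++
	}
	n := float64(len(body))
	totalBits := 0.0
	for _, c := range hist {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		totalBits -= float64(c) * math.Log2(p)
	}
	return totalBits / 8
}

// worthTrying reports whether the estimate plus a hypothetical table overhead
// is smaller than the body itself.
func worthTrying(body []byte, tableBytes int) bool {
	return math.Ceil(order0Bytes(body))+float64(tableBytes) < float64(len(body))
}

func main() {
	skewed := []byte("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab")
	fmt.Printf("skewed: %d bytes, order-0 estimate %.2f bytes\n",
		len(skewed), order0Bytes(skewed))
	fmt.Println("worth trying with a 4-byte table:", worthTrying(skewed, 4))
}
```

An estimate like this can only say whether the pass is worth attempting; the decision to keep the result still has to compare real encoded sizes.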

That's a lot of choices. The interesting question is not "which codecs exist"
— it's "how do you pick, per column, without a config file and without getting
it wrong."

Step 3: probe, then pick the smallest

For each column, qdf runs a cheap bounded probe that predicts the encoded size
of the viable candidates, then emits the smallest. The probe is designed to be
much cheaper than actually encoding every candidate: it estimates from column
statistics (min/max, run structure, distinct count) rather than running every
encoder in full.
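
A probe of this kind might look like the sketch below: one pass gathers the statistics, a few arithmetic formulas turn them into size estimates, and the smallest estimate wins. Everything here is an assumption made for illustration, including the stats struct, the header costs baked into the formulas, and the unbounded distinct-value map (a real bounded probe would cap or sample it). It is not qdf's cost model.

```go
package main

import (
	"fmt"
	"math/bits"
)

// stats is a hypothetical summary gathered in one pass over an integer column.
type stats struct {
	n, runs, distinct int
	min, max          int64
}

func collect(vals []int64) stats {
	s := stats{n: len(vals)}
	seen := map[int64]struct{}{}
	for i, v := range vals {
		if i == 0 || v < s.min {
			s.min = v
		}
		if i == 0 || v > s.max {
			s.max = v
		}
		if i == 0 || v != vals[i-1] {
			s.runs++
		}
		seen[v] = struct{}{}
	}
	s.distinct = len(seen)
	return s
}

// estimate predicts encoded sizes in bits. Header costs are illustrative.
func estimate(s stats) map[string]int {
	width := bits.Len64(uint64(s.max) - uint64(s.min))
	idx := 0
	if s.distinct > 1 {
		idx = bits.Len(uint(s.distinct - 1))
	}
	return map[string]int{
		"plain": s.n * 64,
		"for":   64 + 8 + s.n*width,      // minimum + width byte + residuals
		"rle":   s.runs * (64 + 32),      // value + run length per run
		"dict":  s.distinct*64 + s.n*idx, // table + one index per row
	}
}

// pick returns the candidate with the smallest estimate and keeps plain on ties.
func pick(est map[string]int) string {
	best := "plain"
	for _, name := range []string{"for", "rle", "dict"} {
		if est[name] < est[best] {
			best = name
		}
	}
	return best
}

func main() {
	status := []int64{200, 200, 200, 200, 404, 200, 200, 200, 500, 200, 200, 200}
	est := estimate(collect(status))
	fmt.Println(est, "->", pick(est))
}
```

The estimates cost a handful of integer operations per candidate, which is the point: the expensive work of encoding happens once, for the winner only.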

The expensive tiers (Gorilla, FSST, rANS) are gated behind opt-in flags
(OptCompression), because they trade encode CPU for bytes and you don't always
want that trade. The cheap structural codecs (FOR, Delta, RLE, dictionary) run
on the default OptBalanced tier.

Step 4: the never-larger guarantee

Here's the rule that ties it together: for every codec, qdf compares the
candidate encoding against the plain one and emits the compressed form only when
it is strictly smaller. If a "compression" codec would make a column bigger —
which absolutely happens on adversarial or already-incompressible data — qdf
emits the plain encoding instead.
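
The rule fits in a few lines. The sketch below uses a toy byte-level RLE as the candidate codec and a one-byte codec tag so the decoder knows which form it received; the tag values, the function names, and the assumption that the tag is present in both forms are all inventions of this example rather than qdf's wire format.

```go
package main

import "fmt"

// Hypothetical codec tags for this sketch.
const (
	tagPlain byte = 0
	tagRLE   byte = 1
)

// rleBytes is a toy byte-level RLE that emits (count, value) pairs with
// count <= 255. On data without runs it doubles the size.
func rleBytes(in []byte) []byte {
	var out []byte
	for i := 0; i < len(in); {
		j := i
		for j < len(in) && in[j] == in[i] && j-i < 255 {
			j++
		}
		out = append(out, byte(j-i), in[i])
		i = j
	}
	return out
}

// encodeNeverLarger returns the compressed form only when it is strictly
// smaller than the plain bytes. Otherwise it returns the plain bytes.
func encodeNeverLarger(plain []byte) (tag byte, payload []byte) {
	if c := rleBytes(plain); len(c) < len(plain) {
		return tagRLE, c
	}
	return tagPlain, plain
}

func main() {
	for _, col := range []string{"aaaaaaaa", "abcdefgh"} {
		tag, payload := encodeNeverLarger([]byte(col))
		fmt.Printf("%q: tag=%d, %d -> %d bytes\n", col, tag, len(col), len(payload))
	}
}
```

The second column is the adversarial case: RLE would expand it from 8 to 16 bytes, so the guard keeps the plain form and the output stays at 8.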

The consequence is the useful part: turning compression on can never inflate
your output. You don't have to reason about whether your data is a good fit.
You don't have to benchmark before flipping the flag. The worst case is "no
better than plain," never "worse than plain." That property is what lets qdf
auto-select aggressively instead of shipping a pile of knobs.

It also holds for the whole message: the final rANS pass is applied only
when it shrinks the body, so OptCompression output is never larger than
OptBalanced output, which in turn is never larger than the plain encoding.

What it buys you, measured

On real telemetry batches (GitHub Actions ubuntu-latest, Go 1.26), wire size
versus protobuf:

batch	qdf balanced	qdf compression
OTLP traces	−75%	−77%
logs	−72%	−72%
RTB bids	−25%	−39%
events	−39%	−39%
IoT floats	−24%	−29%

The wins track the data: OTLP and logs are string-heavy and repetitive, so
interning + columnar string codecs dominate; RTB and IoT are less repetitive, so
the numeric codecs do the work and the margins are smaller. That's the honest
shape of it — the codec picker is only as good as the redundancy in your data.

The discipline behind the menu

The codec list above is the survivors. The measure-first process that picked
per-column codecs also killed a lot of ideas that looked good on paper:
GPU-offloaded rANS (only wins on multi-MB single bodies, and qdf messages are KB),
SIMD-gathered rANS (5× slower than scalar interleaved pre-AVX512),
multicore columnar encode (memory-bandwidth-bound, ~1.0×), and a learned
ScaNN-style vector quantizer (measured under 1 percentage point of recall gain). Every codec in
the menu earned its slot on a benchmark, and none of them can make your output
bigger. That's the whole design in one sentence.

Try it

go get github.com/alex60217101990/qdf

data, _ := qdf.Marshal(batch, qdf.OptBalanced) // or qdf.OptCompression
var back []Record
_ = qdf.Unmarshal(data, &back)

Runnable examples (telemetry, query-the-bytes, embeddings, streaming,
zero-alloc decode) are in examples/, and the per-codec details live in the
repo.

If you find a payload where a codec loses when it shouldn't, there's an issue
template for exactly that. Measured beats anecdotal.

Repo: https://github.com/alex60217101990/qdf