Store embeddings at a fidelity budget, not a bit-count — a lossy vector codec for Go

#go #ai #embeddings #compression

An embedding is a []float32. A store of a million of them is 512 MB of
float32 that you will spend most of its life scanning for nearest neighbours.
But here's the thing about that 512 MB: almost none of it is load-bearing.
Nearest-neighbour search only cares about the geometry of the vectors —
their relative angles — not the exact bits. Store them bit-exact and you're
paying full price for precision your retrieval never uses.

qdf — a schemaless binary serializer for Go — has an opt-in lossy vector codec
built on that observation. Instead of asking you how many bits to keep, it asks
what fidelity you need — "keep cosine similarity ≥ 0.99" — and then spends the
fewest bytes that clears that bar. This post is about the knob, the numbers, and
when you should (and shouldn't) turn it on.

The knob is a fidelity budget, not a bit-width

Every other quantizer I've used makes you pick a representation: int8, 4-bit,
this many centroids. That's backwards — you don't care about bits, you care
about whether retrieval still works. qdf inverts it. You set a budget on the
output quality and the codec picks the bits:

enc := qdf.NewEncoderWith(qdf.OptBalanced | qdf.OptLossyVec)
enc.SetVectorBudget(qdf.MinCosine(0.99)) // or MaxRelError / TargetSNR

Three ways to state the budget:

MinCosine(0.99) — bound the minimum cosine similarity between the original and reconstructed vector. This is the one for embeddings: cosine is what your ANN index compares.
MaxRelError(1e-3) — bound the per-vector relative L2 error. For when the magnitude matters, not just the direction.
TargetSNR(40) — target a signal-to-noise ratio in dB, for signal-ish float columns.

The codec only touches []float32 / []float64 columns of a []struct, and
only slices long enough to amortize its header (short vectors fall back to the
lossless path automatically). Everything else in the struct — the IDs, the
metadata — encodes losslessly as usual.

How it spends the bytes

Under the hood the pipeline is: rotate → quantize → entropy-code, with a
never-larger fallback.

A Hadamard rotation spreads each vector's energy evenly across its dimensions. This is the quiet workhorse — it turns a few large-magnitude components into many similar ones, which makes the quantization error isotropic and, crucially, makes the codec behave the same on "nice" smooth vectors and on adversarial ones (more on that below).
A scalar or E8-lattice quantizer maps the rotated components to a grid sized to hit your budget. The lattice option packs the same fidelity into fewer bits by exploiting that rotated components cluster near a sphere.
A static rANS entropy pass squeezes the residual redundancy out of the quantized symbols.
Never-larger fallback: if the whole lossy body ever comes out bigger than the plain lossless encoding — which can happen on tiny or already-compact columns — qdf ships the lossless bytes instead. Turning the codec on can never inflate your output. (This is the same discipline the rest of the format runs on; see the codec-selection writeup.)

The numbers — reproduce them yourself

Here's a harness you can paste and run. It builds a corpus, sweeps the cosine
budget, and measures the worst cosine actually achieved across every vector —
not the average, the worst, because a floor is only a floor if nothing falls
through it.

enc := qdf.NewEncoderWith(qdf.OptBalanced | qdf.OptLossyVec)
enc.SetVectorBudget(qdf.MinCosine(target))
_ = enc.EncodeValue(docs)
lossy := enc.Bytes()

var back []Doc
_ = qdf.Unmarshal(lossy, &back)
// then: worst = min over i of cosine(docs[i].Emb, back[i].Emb)

Run on 2000 vectors, float32, on ubuntu-latest. Two corpora: a smooth
one (sinusoids — the flattering case every quantizer paper uses) and a
random-unit one (Gaussian, L2-normalized — the honest worst case, essentially
incompressible and much closer to what a real embedding model emits).

128 dimensions (lossless baseline: 520 B/vec):

budget	random-unit B/vec	worst cosine	vs lossless
`cos≥0.99`	72.3	0.9955	−86%
`cos≥0.995`	80.3	0.9977	−85%
`cos≥0.999`	98.8	0.9995	−81%

768 dimensions (lossless baseline: 3080 B/vec):

budget	random-unit B/vec	worst cosine	vs lossless
`cos≥0.99`	407	0.9914	−87%
`cos≥0.995`	470	0.9957	−85%
`cos≥0.999`	618	0.9992	−80%

Two things worth calling out, because they're the honest part:

The budget holds. At cos≥0.99 the worst vector in 2000 lands at 0.9914– 0.9955 — above the floor, every time. The knob means what it says.
Random ≈ smooth. On the same run, the smooth corpus compressed to 72.1 B/vec and random-unit to 72.3 — a 0.3% difference. Most quantizers fall apart on incompressible input; the Hadamard rotation is why this one doesn't. If a vector codec only publishes numbers on smooth synthetic data, be suspicious. These are within noise of each other.

At cos≥0.99, 128-dim vectors land around 4.5 bits/dimension versus float32's
32 — a 7× shrink for a cosine hit you'd struggle to measure in recall@10 on most
indexes.

When to turn it on — and when not to

Situation	Verdict
Storing/shipping embeddings for ANN retrieval	Yes — cosine is exactly the metric the budget defends.
You can tolerate ~0.99 cosine (most RAG / semantic search)	Yes — start at `MinCosine(0.99)`, tighten if recall dips.
Vectors are the bulk of your payload	Yes — this is where −80…−87% actually moves your bill.
You need bit-exact reconstruction (dedup by hash, checksums, reproducible training)	No — use `OptBalanced`; lossy is opt-in for a reason.
Short vectors (< 32 elems) or vectors are a rounding error in your payload	Skip — the codec falls back to lossless anyway; no harm, no gain.
Downstream compares exact float equality	No — quantization changes the bits by design.

The rule of thumb: if your pipeline already treats embeddings as approximate —
and ANN search does — then storing them bit-exact is precision you're paying for
and throwing away. Pick the cosine floor your retrieval can live with and let
the codec find the bits.

Try it

go get github.com/alex60217101990/qdf
go run github.com/alex60217101990/qdf/examples/embeddings

The runnable
examples/embeddings
is the harness above, trimmed. Swap in your own vectors, sweep the budget, and
watch the worst-case cosine — if you find a corpus where the floor doesn't hold,
that's a bug and there's an issue template for it. Measured beats anecdotal.

Repo: https://github.com/alex60217101990/qdf