Building a t-digest CLI in Go — Stream p50/p99/p99.9 in a Few KB, a Scale Function That Keeps Tails Sharp, Exact min/max, and Mergeable Digests

#go #algorithms #datastructures #cli

A t-digest as a Go CLI. It estimates quantiles (p50, p99, p99.9…) of a stream in a few KB instead of sorting the whole thing. The hinges: (1) a scale function k(q) controls centroid size so the median is coarse and the tails stay sharp — p99.9 ends up more precise than the median, by design; (2) q=0/q=1 return the exact min/max while the middle interpolates between centroid means. Plus digests merge. Part four of a probabilistic-data-structure series after #274–276 (membership / cardinality / frequency) — this one is quantiles.

📦 GitHub: https://github.com/sen-ltd/t-digest-cli

What a t-digest is

Exact quantiles need every value in memory, sorted — O(N) space. A t-digest
keeps instead a few hundred centroids, each a (mean, weight) pair
summarizing a cluster of nearby values, and answers any quantile by walking
them. For 500k points that's ~100 centroids and a few KB, regardless of N.

The trap with naive bucketing is the tails: lump values into equal-sized buckets
and your p99.9 is as coarse as your median. A t-digest fixes exactly that — it's
the whole reason the structure exists.

Here's the CLI, real output:

$ td summary --from latencies-ms.txt --exact -c 200
count        500000
min / max    0.3 / 1655.8
centroids    122  (δ=200, 1952 bytes)
p50        20.1183        exact 20.1     (+0.09%)
p90        63.736         exact 63.7     (+0.06%)
p99        162.803        exact 161.9    (+0.56%)
p99.9      322.033        exact 312.6    (+3.02%)

Half a million latency samples summarized in under 2 KB, p99 off by half a
percent.

Hinge 1: a scale function that keeps the tails sharp

Centroids aren't allowed to grow uniformly. A scale function k(q) maps the
quantile q into a "k-space" where each centroid may span at most 1 unit.
With the standard k1 scale,

k(q)    = δ/(2π) · asin(2q − 1)
k⁻¹(k)  = (sin(2π k/δ) + 1) / 2

k is steep near q=0 and q=1, flat near q=0.5. So a centroid covering one
k-unit spans a wide band of quantiles around the median (where precision is
cheap) and a razor-thin band at the tails (where p99.9 lives). The compression
δ is the single knob: more δ → more centroids → tighter quantiles.

q := q0 + (cur.Weight+x.Weight)/total
if q <= qLimit {              // still within 1 k-unit → absorb
    cur = mergeInto(cur, x)
} else {                      // would overflow → seal cur, start a new centroid
    out = append(out, cur)
    q0 += cur.Weight / total
    qLimit = kInv(k(q0) + 1)
    cur = x
}

A test on a uniform stream pins the payoff: p50 is accurate to ~1% while p999 is
held to ~0.2% — tighter at the tail (TestUniformQuantiles).

Hinge 2: exact min/max, interpolated middle

Quantiles are read by walking centroids, accumulating half-weights, and linearly
interpolating between adjacent centroid means. The two ends are special: q=0
and q=1 return the exact min/max (tracked separately), and the tails
interpolate between those extremes and the outermost centroids:

func (t *TDigest) Quantile(q float64) float64 {
    if q <= 0 { return t.min }
    if q >= 1 { return t.max }
    index := q * t.total
    // ... walk centroids, interpolate between means ...
}

TestExactExtremes checks the ends are exact; TestMonotonic checks the whole
curve is non-decreasing.

The signature property: merge

t-digests combine: dump one digest's centroids into another and recompress. The
merged digest answers quantiles as if it had seen both streams — so per-shard or
per-minute digests roll up into a global one with no re-scan:

func (t *TDigest) Merge(other *TDigest) { /* absorb centroids, recompress */ }

$ td merge -o all.tdigest a.tdigest b.tdigest
merged 2 digests → all.tdigest  count=500000  centroids=112

TestMergeApproximatesCombined pins that a merge of two disjoint-range digests
matches the digest of the combined stream across p10…p99 — the same composability
as HyperLogLog's merge-by-max (#275) and Count-Min's merge-by-sum (#276).

Persistence

A digest serializes to a compact blob — "TDG1" | δ | min | max | total | n | (mean, weight)*n — a handful of bytes per centroid. Hand-rolled, no codec.

tdigest/tdigest.go — digest: k1 scale, compress, quantile, merge, (de)serialize (11 tests)
main.go            — CLI: summary / add / quantile / merge / info, file or stdin

Takeaway

The t-digest is the go-to for "measure p99 latency without keeping every sample."
The keys are a scale function that keeps the tails sharp, exact min/max,
and mergeability. With it the probabilistic series now spans membership
(#274), cardinality (#275), frequency (#276), and quantiles (#277) — all the same
instinct of approximating in fixed memory, here expressed as pushing precision
out to the tail where it matters.

📦 GitHub: https://github.com/sen-ltd/t-digest-cli