TL;DR
- A metrics engine had two query paths — a SQL push-down for big datasets, an in-memory aggregator for small ones. They drifted.
- The push-down path bound a
metricparameter but never added it to theWHERE. With several metric series in one dataset, every query summed across all of them. - A KPI that should read 819 read 163,667,603,769.
- Fix: put the
metric_keypredicate in the shared baseWHEREso every compile path inherits it, and regression-test both paths assert it.
The setup: two paths, one contract
A lot of analytics layers compute the same number two ways. For a big dataset you push the aggregation down to the database. For a small one — a preview, a draft dashboard — you pull the rows and aggregate in memory. Faster path, correct path. Both are supposed to return the same value. That's the contract.
The dataset stores rows keyed by a metric_key, because one dataset can hold several series at once — say a plain row count and a count-distinct. Each series lives in the same table, told apart only by its key.
The bug: a bound param is not a filter
The in-memory aggregator filtered by metric_key correctly. The SQL compiler bound a metric parameter into the query... and never referenced it in the WHERE.
With a single series in the dataset, it worked by accident — there was nothing else to sum. Add a second series and the math quietly breaks: the query sums across every series. In this case the second series stored hashed values around 1.9 billion each, so the KPI ballooned from 819 to 163 billion.
| Before | After | |
|---|---|---|
metric value |
bound, unused | bound |
WHERE predicate |
(none on metric) | metric_key = {metric:String} |
| 1 series in dataset | correct (by luck) | correct |
| N series in dataset | sums across all | isolated |
The lesson is small and easy to miss: binding a parameter only makes the value available — it does nothing until a predicate references it. When one path already returns sane-looking numbers, nobody goes looking.
The real fix is parity, not a patch
You could bolt the predicate onto the one broken query and move on. Better: put shared predicates in one place so every compile path inherits them by construction.
┌─────────────────────┐
query ──▶ │ basePredicates() │ ◀── metric_key = {metric}
│ (shared WHERE) │ tenant_id = {tenant}
└──────────┬──────────┘
│
┌──────────────┴──────────────┐
▼ ▼
push-down SQL in-memory aggregator
(big datasets) (preview / small)
└──────────────┬──────────────┘
▼
same number
When two code paths must agree, pin the agreement in a test — don't just test each path alone.
it('isolates every metric query by metric_key', function () {
foreach ([CompilePath::PushDown, CompilePath::InMemory] as $path) {
$sql = (new MetricCompiler($path))->compile(
metricKey: 'record_count',
tenantId: 'acme',
);
expect($sql->wherePredicates())
->toContain('metric_key = {metric:String}');
}
});
Takeaway
Any time you have a fast path and a correct path for the same value, write the test that asserts they produce the same filter — and the same answer. The gap between "works on my one-series sample" and "sums a billion rows in production" is exactly one missing predicate. Drift between two paths is where the absurd-number bugs live.
Top comments (0)