DEV Community

Cover image for When a KPI reads 163 billion instead of 819
Nasrul Hazim Bin Mohamad
Nasrul Hazim Bin Mohamad

Posted on

When a KPI reads 163 billion instead of 819

TL;DR

  • A metrics engine had two query paths — a SQL push-down for big datasets, an in-memory aggregator for small ones. They drifted.
  • The push-down path bound a metric parameter but never added it to the WHERE. With several metric series in one dataset, every query summed across all of them.
  • A KPI that should read 819 read 163,667,603,769.
  • Fix: put the metric_key predicate in the shared base WHERE so every compile path inherits it, and regression-test both paths assert it.

The setup: two paths, one contract

A lot of analytics layers compute the same number two ways. For a big dataset you push the aggregation down to the database. For a small one — a preview, a draft dashboard — you pull the rows and aggregate in memory. Faster path, correct path. Both are supposed to return the same value. That's the contract.

The dataset stores rows keyed by a metric_key, because one dataset can hold several series at once — say a plain row count and a count-distinct. Each series lives in the same table, told apart only by its key.

The bug: a bound param is not a filter

The in-memory aggregator filtered by metric_key correctly. The SQL compiler bound a metric parameter into the query... and never referenced it in the WHERE.

With a single series in the dataset, it worked by accident — there was nothing else to sum. Add a second series and the math quietly breaks: the query sums across every series. In this case the second series stored hashed values around 1.9 billion each, so the KPI ballooned from 819 to 163 billion.

Before After
metric value bound, unused bound
WHERE predicate (none on metric) metric_key = {metric:String}
1 series in dataset correct (by luck) correct
N series in dataset sums across all isolated

The lesson is small and easy to miss: binding a parameter only makes the value available — it does nothing until a predicate references it. When one path already returns sane-looking numbers, nobody goes looking.

The real fix is parity, not a patch

You could bolt the predicate onto the one broken query and move on. Better: put shared predicates in one place so every compile path inherits them by construction.

            ┌─────────────────────┐
 query  ──▶ │  basePredicates()   │  ◀── metric_key = {metric}
            │  (shared WHERE)     │      tenant_id  = {tenant}
            └──────────┬──────────┘
                       │
        ┌──────────────┴──────────────┐
        ▼                             ▼
  push-down SQL                 in-memory aggregator
  (big datasets)                (preview / small)
        └──────────────┬──────────────┘
                       ▼
                same number
Enter fullscreen mode Exit fullscreen mode

When two code paths must agree, pin the agreement in a test — don't just test each path alone.

it('isolates every metric query by metric_key', function () {
    foreach ([CompilePath::PushDown, CompilePath::InMemory] as $path) {
        $sql = (new MetricCompiler($path))->compile(
            metricKey: 'record_count',
            tenantId: 'acme',
        );

        expect($sql->wherePredicates())
            ->toContain('metric_key = {metric:String}');
    }
});
Enter fullscreen mode Exit fullscreen mode

Takeaway

Any time you have a fast path and a correct path for the same value, write the test that asserts they produce the same filter — and the same answer. The gap between "works on my one-series sample" and "sums a billion rows in production" is exactly one missing predicate. Drift between two paths is where the absurd-number bugs live.

Top comments (0)