High-cardinality metrics are one of those ideas that sound obviously right—until you try to use them in production.
In theory, they promise precision. Instead of averages and rollups, you get specificity: per-request, per-user ID, per-container, per-feature insights. The kind of detail engineers instinctively want when something is on fire.
And then things start breaking.
Not immediately. Not loudly. But quietly—often in ways that feel like mysterious bugs until you realize the system itself was never designed for this shape of data.
What makes high-cardinality failures especially painful is that nothing crashes. Dashboards still load. Alerts still fire. Deploys continue as usual. The only early signal is often an unexplainable cost spike or queries that suddenly feel sluggish during incidents.
Under the hood, the reason is mechanical. Every unique label combination creates a new time series. Each series needs storage, index entries, memory during ingestion, and ongoing compaction work. As cardinality grows, cost and query complexity don’t scale linearly—they multiply.
At query time, the problem gets worse. Filters that once narrowed the search space stop being selective. Queries fan out across hundreds of thousands—or millions—of sparse, short-lived series. The query engine isn’t broken; it’s doing exactly what it was asked to do, across far more data than anyone realized they’d created.
The most dangerous failure mode isn’t cost or performance—it’s trust. Charts flicker. Series appear and disappear. Queries return inconsistent shapes. Engineers stop believing what they see and quietly fall back to logs, not because logs are better, but because they’re predictable.
The takeaway isn’t that high cardinality is bad. It’s that unbounded, accidental cardinality shows up later as cost surprises, slow queries, and trust erosion unless systems are explicitly designed for it.
If this sounds familiar, the full post walks through:
- why these failures are hard to detect early
- the systems-level mechanics behind them and
- the patterns teams use to make high-cardinality metrics survivable in practice
Complete article here
https://last9.io/blog/why-high-cardinality-metrics-break/
Top comments (0)