Patrick Londa for Bronto

Originally published at bronto.io

Why ClickHouse Fails as a General-Purpose Logging Solution

Authored by Mike Neville-O'Neill

Let's talk about ClickHouse. It's fast, it's efficient, it's open source. But it's designed to be a general-purpose analytical database, which makes it challenging — in terms of time and resources — to use for logging at scale.

I've spent over a decade working on logging systems and watched countless organizations struggle with the same pattern: a team frustrated with expensive legacy log management turns to ClickHouse, excited by early performance wins on well-structured data. They rebuild critical logging infrastructure around it, roll it out, and everything seems great... until it doesn't.


Why Teams Try ClickHouse for Logs

The appeal is undeniable. When you first point ClickHouse at a clean, structured dataset, the performance is genuinely impressive:

  • Speed — Built for analytical queries, can scan billions of rows in seconds on modest hardware
  • Cost — Open source, so licensing costs are zero
  • Familiarity — If your team works with SQL, the learning curve is minimal
  • Flexibility — Complete control over your data, schema, and retention policies

Early benchmarks usually reinforce this perception. Testing on controlled, clean data with predictable patterns, ClickHouse shines. You might start with a specific subset of logs — API access logs or well-structured application events — and see impressive query speeds.

The problem isn't that ClickHouse doesn't work for logs. The problem is that it doesn't work for all logs, all the time, at scale, without significant engineering investment. And that's where organizations get trapped.


The Architectural Mismatch

Logs in the wild are messy, inconsistent, and unpredictable — the direct opposite of what ClickHouse was designed to handle.

The Columnar Conundrum

ClickHouse's columnar storage engine is optimized for queries that scan large portions of specific columns — brilliant for analytics workloads with well-defined dimensions. But logs are fundamentally different:

  • Unpredictable schemas — Log formats change; new fields appear and disappear without warning
  • Nested structures — Modern logs, especially from cloud environments, contain deeply nested JSON with inconsistent structures
  • High cardinality — Fields like session IDs and request IDs have enormous numbers of unique values, which presents challenges for ClickHouse
  • Mixed query patterns — Log analysis involves aggregation, filtering, full-text search, and pattern matching — not just the analytical queries ClickHouse excels at

Forcing logs into ClickHouse is like racing a Formula 1 car on an off-road trail — impressive engineering, wrong application.

Schema Rigidity vs. Log Reality

In the real world, logs evolve constantly:

  • Developers add new fields to debug specific issues
  • Third-party systems change their log formats without notice
  • Cloud providers modify their event structures
  • Microservices introduce different logging conventions across teams

With ClickHouse, schema changes require careful planning to avoid additional overhead. A purpose-built logging system accommodates this reality by adapting to your logs — not the other way around.
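
To make the schema drift problem concrete, here is a minimal Python sketch of what an ingestion layer has to do before every insert into a fixed-schema table: flatten each nested log line and compare its fields to the columns the table already has. The column names, the sample log line, and the helpers `flatten` and `new_fields` are all hypothetical, chosen only for illustration.

```python
import json

def flatten(record, prefix=""):
    """Flatten nested dicts into dotted key paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def new_fields(known_columns, raw_line):
    """Return the field paths in a JSON log line that the table has no column for."""
    return set(flatten(json.loads(raw_line))) - set(known_columns)

# Hypothetical table columns and a log line that has grown a nested "http" object.
known = {"timestamp", "level", "message"}
line = (
    '{"timestamp": "2024-01-01T00:00:00Z", "level": "info", "message": "ok", '
    '"http": {"status": 200, "route": "/cart"}}'
)
drift = new_fields(known, line)
print(sorted(drift))  # ['http.route', 'http.status']
```

Every path that shows up in `drift` forces a decision: add a column, push the field into a catch-all column, or drop it. That decision is the planning overhead described above.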

Indexing Limitations

ClickHouse's primary indexing mechanism, the sparse primary index, works by storing one index mark every N rows (8,192 by default). This excels for batch analytical queries but creates significant compromises for log search:

  • Limited selectivity — Works best when scanning large portions of the dataset, not when looking for specific events
  • Suboptimal for high-cardinality fields — User IDs, session IDs, trace IDs all struggle with this approach
  • Text search limitations — Full-text search is functional but not optimized for the complex pattern matching common in log analysis
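
The trade-off can be illustrated with a toy model of a sparse index. This is a simplified Python sketch, not ClickHouse's actual implementation: the granule size of 4, the `rows` data, and the function names are assumptions made for the example. Rows are sorted by timestamp (the primary key), and the index keeps only the first key of each granule.

```python
from bisect import bisect_right

GRANULE = 4  # rows per index mark; tiny here so the effect is visible

# Rows sorted by the primary key (timestamp). trace_id is not part of the key.
rows = [(ts, f"trace-{(ts * 7) % 16}") for ts in range(16)]

# Sparse index: the key of the first row in each granule.
marks = [rows[i][0] for i in range(0, len(rows), GRANULE)]  # [0, 4, 8, 12]

def granules_for_key_range(lo, hi):
    """Granules that may hold rows with lo <= timestamp <= hi, found via the marks."""
    first = max(bisect_right(marks, lo) - 1, 0)
    last = bisect_right(marks, hi) - 1
    return list(range(first, last + 1))

def scan_for_trace(trace_id):
    """trace_id is not in the index, so every granule is read and filtered."""
    scanned = list(range(len(marks)))
    hits = [
        row
        for g in scanned
        for row in rows[g * GRANULE:(g + 1) * GRANULE]
        if row[1] == trace_id
    ]
    return scanned, hits

print(granules_for_key_range(5, 6))  # [1]: one granule read out of four
print(scan_for_trace("trace-5"))     # ([0, 1, 2, 3], [(3, 'trace-5')]): all four read
```

A time-range query touches one granule, while finding a single trace ID reads all of them to return one row. At real granule sizes and log volumes, that gap is the needle-in-a-haystack cost the list above describes.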

Real-World Scaling Shatters the Illusion

The true challenges emerge as your ClickHouse logging solution scales. What worked smoothly in development with gigabytes becomes increasingly problematic at terabyte or petabyte scale.

Query Performance Degrades

Initially, queries return in seconds or less. As data grows, the same searches can take minutes. Without careful tuning and constant optimization, query performance becomes increasingly unpredictable — your team ends up spending more time optimizing queries than using them to solve problems.

Operational Overhead Becomes Crushing

Managing a ClickHouse cluster at scale is a specialized skill. You'll need to handle:

  • Rebalancing — Manually triggering operations as data distribution becomes uneven
  • Storage management — Careful tuning of merge settings, partitioning schemes, and TTL rules
  • Resource allocation — Preventing resource-intensive queries from impacting overall system performance
  • Schema evolution — Adding, removing, or modifying columns requires careful planning
  • Multi-tenancy — ClickHouse has limited built-in multi-tenancy, creating additional complexity

One large e-commerce team shared with us that they had six full-time engineers dedicated just to keeping their ClickHouse logging infrastructure operational — an enormous hidden cost that rarely factors into initial calculations.

Reliability Issues Emerge

  • Ingestion pipeline failures — As log volumes spike, ingestion often falls behind because ingestion and search aren't separated
  • Complex failure modes — Troubleshooting ClickHouse problems requires deep database expertise most DevOps teams don't have
  • Recovery complexity — Restoring from failures or data corruption is time-consuming and error-prone

The DIY Trap

When engineers suggest "just use ClickHouse for logging," they're often underestimating what that actually means. ClickHouse isn't a logging platform — it's a powerful but complex analytical database that requires substantial expertise to turn into a production-ready observability solution.

The ClickHouse Learning Curve

ClickHouse has unique concepts that take months to master:

  • MergeTree engines — Understanding when to use ReplicatedMergeTree vs. ReplacingMergeTree vs. dozens of other variants
  • Partitioning and primary key design — Critical decisions that affect query performance and can't easily be changed later
  • Merge behavior tuning — Configuring background processes that determine how your data gets organized and compressed

Building the Missing Pieces

Even with ClickHouse expertise, you still need to build an entire logging platform around it:

  • Ingestion pipelines that handle backpressure, parse multiple log formats, and deal with schema evolution
  • Query interfaces — APIs and UIs that let users actually search and analyze their logs
  • Access control — multi-tenancy and permissions
  • Alerting and monitoring systems
  • Operational tooling — backup/recovery, cluster health monitoring, schema migrations
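
As one small example of the plumbing involved, here is a minimal Python sketch of backpressure in an ingestion pipeline: a bounded buffer that refuses new log lines when the writer falls behind, so the caller can slow down or retry instead of exhausting memory. The buffer size, `offer`, and `drain` are hypothetical names and values for illustration; a production pipeline would also need retries, persistence, and batching tuned to the database.

```python
import queue

# Bounded buffer between log shippers and the database writer.
buffer = queue.Queue(maxsize=3)

def offer(line):
    """Try to enqueue a log line; return False so the sender can back off."""
    try:
        buffer.put_nowait(line)
        return True
    except queue.Full:
        return False

def drain(batch_size):
    """Pull up to batch_size lines for one bulk insert."""
    batch = []
    while len(batch) < batch_size and not buffer.empty():
        batch.append(buffer.get_nowait())
    return batch

accepted = [offer(f"line {i}") for i in range(5)]
print(accepted)  # [True, True, True, False, False]

batch = drain(2)
print(batch)     # ['line 0', 'line 1']
```

Once the writer drains a batch, `offer` succeeds again. Deciding what senders do while it returns False (block, drop, or spill to disk) is one of the design choices a DIY pipeline has to make.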

The Hidden Costs

What starts as "let's save money by using open source" often becomes:

  • Months of engineering time just to get basic functionality working
  • Ongoing operational overhead that scales with data volume
  • Opportunity cost of engineers focusing on infrastructure instead of product features
  • Risk of outages due to misconfiguration or lack of operational expertise

For most teams, the total cost of ownership of a DIY ClickHouse solution exceeds that of a purpose-built logging platform.


Built for Logs, Not Analytics

Instead of adapting an analytics database for logging, the right approach is to build a system specifically for the messy, unpredictable nature of log data:

  • Separation of compute and storage — Decoupling storage from compute so each scales independently, keeping searches fast regardless of data volume
  • Indexless architecture with Bloom filters — Sub-second search performance without the overhead and complexity of traditional indexing, particularly effective for high-cardinality fields
  • Schema-agnostic ingestion — New fields, changing structures, and varying formats handled automatically without schema migrations or pipeline adjustments
  • Optimized for search patterns — Full-text search, pattern matching, filtering, and aggregation all handled natively
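
To show how Bloom filters help with high-cardinality lookups, here is a minimal Python sketch. It illustrates the general technique only and is not a description of any particular product's implementation: the filter size, hash count, block contents, and the names `BloomFilter` and `candidate_blocks` are assumptions for the example. Each block of log data gets a small filter, and a search reads only the blocks whose filter says the value might be present.

```python
import hashlib

class BloomFilter:
    """A small Bloom filter: may return false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

# Hypothetical blocks of log data, each holding a few request IDs.
blocks = [
    ["req-001", "req-002", "req-003"],
    ["req-101", "req-102", "req-103"],
    ["req-201", "req-202", "req-203"],
]

filters = []
for block in blocks:
    bloom = BloomFilter()
    for request_id in block:
        bloom.add(request_id)
    filters.append(bloom)

def candidate_blocks(request_id):
    """Indexes of blocks that might contain request_id; all others are skipped."""
    return [i for i, bloom in enumerate(filters) if bloom.might_contain(request_id)]

print(candidate_blocks("req-102"))
```

The block that really holds the ID is always in the result. Other blocks appear only on a false positive, which gets rarer as the filter grows relative to the number of values stored in it.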

The result:

  • Sub-second search on terabytes of data
  • Seconds on petabytes — no rehydration from cold storage
  • 12-month retention by default
  • Zero operational overhead
  • Transparent, predictable pricing

Conclusion

ClickHouse is an impressive technology for what it was designed to do: structured data analytics at scale. But turning it into a general-purpose logging solution requires enormous engineering investment and still results in fundamental compromises.

When you can instantly search across years of data, logs shift from a compliance checkbox or last-resort troubleshooting tool to an active part of your operational toolkit.

Instead of spending years rebuilding what already exists, consider whether your team's time is better spent solving your core business problems. The logging layer should be a foundation that just works — not a perpetual engineering project.

