Patrick Londa for Bronto

Posted on May 20 • Originally published at bronto.io

Why ClickHouse Fails as a General-Purpose Logging Solution

#logging #devops #observability #database

Authored by Mike Neville-O'Neill

Let's talk about ClickHouse. It's fast, it's efficient, it's open source. But it's designed to be a general-purpose analytical database, which makes it challenging — in terms of time and resources — to use for logging at scale.

I've spent over a decade working on logging systems and watched countless organizations struggle with the same pattern: a team frustrated with expensive legacy log management turns to ClickHouse, excited by early performance wins on well-structured data. They rebuild critical logging infrastructure around it, roll it out, and everything seems great... until it doesn't.

Why Teams Try ClickHouse for Logs

The appeal is undeniable. When you first point ClickHouse at a clean, structured dataset, the performance is genuinely impressive:

Speed — Built for analytical queries, can scan billions of rows in seconds on modest hardware
Cost — Open source, so licensing costs are zero
Familiarity — If your team works with SQL, the learning curve is minimal
Flexibility — Complete control over your data, schema, and retention policies

Early benchmarks usually reinforce this perception. Testing on controlled, clean data with predictable patterns, ClickHouse shines. You might start with a specific subset of logs — API access logs or well-structured application events — and see impressive query speeds.

The problem isn't that ClickHouse doesn't work for logs. The problem is that it doesn't work for all logs, all the time, at scale, without significant engineering investment. And that's where organizations get trapped.

The Architectural Mismatch

Logs in the wild are messy, inconsistent, and unpredictable — the direct opposite of what ClickHouse was designed to handle.

The Columnar Conundrum

ClickHouse's columnar storage engine is optimized for queries that scan large portions of specific columns — brilliant for analytics workloads with well-defined dimensions. But logs are fundamentally different:

Unpredictable schemas — Log formats change; new fields appear and disappear without warning
Nested structures — Modern logs, especially from cloud environments, contain deeply nested JSON with inconsistent structures
High cardinality — Fields like session IDs and request IDs have enormous numbers of unique values, which compress poorly in columnar storage and are slow to look up unless they lead the table's sort key
Mixed query patterns — Log analysis involves aggregation, filtering, full-text search, and pattern matching — not just the analytical queries ClickHouse excels at

Forcing logs into ClickHouse is like racing a Formula 1 car on an off-road trail — impressive engineering, wrong application.

Schema Rigidity vs. Log Reality

In the real world, logs evolve constantly:

Developers add new fields to debug specific issues
Third-party systems change their log formats without notice
Cloud providers modify their event structures
Microservices introduce different logging conventions across teams

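With ClickHouse, every new or changed field typically forces a decision: alter the table, route the field into a catch-all map or string column, or drop it. Each option carries overhead, whether in migrations, query complexity, or lost data. A purpose-built logging system accommodates this reality by adapting to your logs, not the other way around.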
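To make the drift concrete, here is a minimal Python sketch that flattens nested log records into dotted column names and reports which columns a fixed schema would not have seen yet. The helper names (`flatten`, `new_columns`) and the sample records are hypothetical, chosen only for illustration; this is not how ClickHouse or any particular logging platform ingests data.

```python
def flatten(record, prefix=""):
    """Turn nested dicts into dotted names, e.g. {"user": {"id": 1}} -> {"user.id": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def new_columns(known, record):
    """Columns present in this record that the known schema does not contain."""
    return sorted(set(flatten(record)) - known)


# Hypothetical log stream: each record adds fields the previous ones lacked.
logs = [
    {"ts": 1, "msg": "login", "user": {"id": "u1"}},
    {"ts": 2, "msg": "login", "user": {"id": "u2", "plan": "pro"}},
    {"ts": 3, "msg": "checkout", "cart": {"items": 3}, "debug_retry": True},
]

known_columns = set()
drift = []
for log in logs:
    added = new_columns(known_columns, log)
    drift.append(added)          # what a fixed schema would need to absorb
    known_columns.update(added)

for step, added in enumerate(drift):
    print(step, added)
```

In this toy stream, three records are enough to require two schema changes after the initial table definition. Real log sources drift the same way, only continuously and across many teams.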

Indexing Limitations

ClickHouse's primary indexing mechanism, the sparse primary index, stores one index mark per granule of rows (8,192 rows by default) instead of one entry per row. This excels for batch analytical queries but creates significant compromises for log search:

Limited selectivity — Works best when scanning large portions of the dataset, not when looking for specific events
Suboptimal for high-cardinality fields — User IDs, session IDs, trace IDs all struggle with this approach
Text search limitations — Full-text search is functional but not optimized for the complex pattern matching common in log analysis
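The selectivity point above can be illustrated with a toy model of a sparse index in Python. This is a simplified sketch, not ClickHouse's implementation: it assumes a single sort key (`ts`), a tiny granule size, and no secondary skip indexes, and every function name is made up for the example.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # rows per index mark; ClickHouse defaults to 8192, kept tiny here


def build_marks(rows, key):
    """Sparse index: the sort-key value of the first row in each granule."""
    return [rows[i][key] for i in range(0, len(rows), GRANULE)]


def candidate_granules(marks, value):
    """Granule numbers that could hold `value`, via binary search on the marks."""
    last = bisect_right(marks, value) - 1  # last granule starting at or before value
    if last < 0:
        return []
    # The value may also sit at the tail of the granule before the first matching mark.
    first = max(bisect_left(marks, value) - 1, 0)
    return list(range(first, last + 1))


def lookup_by_sort_key(rows, marks, key, value):
    hits, scanned = [], 0
    for g in candidate_granules(marks, value):
        chunk = rows[g * GRANULE:(g + 1) * GRANULE]
        scanned += len(chunk)
        hits += [r for r in chunk if r[key] == value]
    return hits, scanned


def lookup_by_other_column(rows, column, value):
    # The sparse index says nothing about this column, so every granule is read.
    return [r for r in rows if r[column] == value], len(rows)


# 16 rows sorted by timestamp; trace IDs are scattered relative to that order.
rows = [{"ts": t, "trace_id": f"trace-{(t * 7) % 16}"} for t in range(16)]
marks = build_marks(rows, "ts")

by_ts = lookup_by_sort_key(rows, marks, "ts", 9)
by_trace = lookup_by_other_column(rows, "trace_id", "trace-15")
print("rows scanned by ts:", by_ts[1])
print("rows scanned by trace_id:", by_trace[1])
```

Both lookups return the same single row, but the timestamp lookup reads one granule while the trace ID lookup reads the whole table. At the default granule size and real log volumes, that gap is what makes needle-in-a-haystack searches on fields outside the sort key expensive.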

Real-World Scaling Shatters the Illusion

The true challenges emerge as your ClickHouse logging solution scales. What worked smoothly in development with gigabytes becomes increasingly problematic at terabyte or petabyte scale.

Query Performance Degrades

Initially, queries return in seconds or less. As data grows, the same searches can take minutes. Without careful tuning and constant optimization, query performance becomes increasingly unpredictable — your team ends up spending more time optimizing queries than using them to solve problems.

Operational Overhead Becomes Crushing

Managing a ClickHouse cluster at scale is a specialized skill. You'll need to handle:

Rebalancing — Manually triggering rebalancing operations as data distribution across shards becomes uneven
Storage management — Careful tuning of merge settings, partitioning schemes, and TTL rules
Resource allocation — Preventing resource-intensive queries from impacting overall system performance
Schema evolution — Adding, removing, or modifying columns requires careful planning
Multi-tenancy — ClickHouse has limited built-in multi-tenancy, creating additional complexity

One large e-commerce team shared with us that they had six full-time engineers dedicated just to keeping their ClickHouse logging infrastructure operational — an enormous hidden cost that rarely factors into initial calculations.

Reliability Issues Emerge

Ingestion pipeline failures — As log volumes spike, ingestion often falls behind because ingestion and search aren't separated
Complex failure modes — Troubleshooting ClickHouse problems requires deep database expertise most DevOps teams don't have
Recovery complexity — Restoring from failures or data corruption is time-consuming and error-prone

The DIY Trap

When engineers suggest "just use ClickHouse for logging," they're often underestimating what that actually means. ClickHouse isn't a logging platform — it's a powerful but complex analytical database that requires substantial expertise to turn into a production-ready observability solution.

The ClickHouse Learning Curve

ClickHouse has unique concepts that take months to develop expertise in:

MergeTree engines — Understanding when to use ReplicatedMergeTree vs. ReplacingMergeTree vs. the rest of the family (SummingMergeTree, AggregatingMergeTree, CollapsingMergeTree, and others)
Partitioning and primary key design — Critical decisions that affect query performance and can't easily be changed later
Merge behavior tuning — Configuring background processes that determine how your data gets organized and compressed

Building the Missing Pieces

Even with ClickHouse expertise, you still need to build an entire logging platform around it:

Ingestion pipelines that handle backpressure, parse multiple log formats, and deal with schema evolution
Query interfaces — APIs and UIs that let users actually search and analyze their logs
Access control — multi-tenancy and permissions
Alerting and monitoring systems
Operational tooling — backup/recovery, cluster health monitoring, schema migrations
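As one small example of what "handling backpressure" in the list above involves, here is a minimal Python sketch of a bounded ingest buffer. The `BoundedIngest` class and its method names are hypothetical; a production pipeline would add retries, persistence, and concurrency, none of which are shown here.

```python
import queue


class BoundedIngest:
    """Fixed-capacity buffer that refuses new lines instead of growing without limit."""

    def __init__(self, capacity):
        self.buffer = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def offer(self, line):
        """Accept a line if there is room; otherwise signal backpressure to the sender."""
        try:
            self.buffer.put_nowait(line)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def drain(self, batch_size):
        """Pull up to batch_size lines, e.g. for one bulk insert into storage."""
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(self.buffer.get_nowait())
            except queue.Empty:
                break
        return batch


ingest = BoundedIngest(capacity=3)
accepted = [ingest.offer(f"line {i}") for i in range(5)]  # a spike of 5 lines
batch = ingest.drain(batch_size=2)                        # storage catches up a little
accepted_after_drain = ingest.offer("line 5")

print(accepted, batch, accepted_after_drain, ingest.rejected)
```

The design choice to surface a rejection rather than block or buffer indefinitely pushes the decision (retry, spill to disk, drop) back to the sender, which is one of several policies a team building its own pipeline has to pick and operate.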

The Hidden Costs

What starts as "let's save money by using open source" often becomes:

Months of engineering time just to get basic functionality working
Ongoing operational overhead that scales with data volume
Opportunity cost of engineers focusing on infrastructure instead of product features
Risk of outages due to misconfiguration or lack of operational expertise

For most teams, the total cost of ownership of a DIY ClickHouse solution exceeds that of a purpose-built logging platform.

Built for Logs, Not Analytics

Instead of adapting an analytics database for logging, the right approach is to build a system specifically for the messy, unpredictable nature of log data:

Separation of compute and storage — Decoupling storage from compute so each scales independently, keeping searches fast regardless of data volume
Indexless architecture with Bloom filters — Sub-second search performance without the overhead and complexity of traditional indexing, particularly effective for high-cardinality fields
Schema-agnostic ingestion — New fields, changing structures, and varying formats handled automatically without schema migrations or pipeline adjustments
Optimized for search patterns — Full-text search, pattern matching, filtering, and aggregation all handled natively
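The Bloom filter idea in the list above can be sketched in a few lines of Python. This is a generic illustration of block skipping with Bloom filters, not Bronto's implementation: the filter size, hash scheme, block size, and all names are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Compact set summary: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, token):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, token):
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def might_contain(self, token):
        return all((self.bits >> pos) & 1 for pos in self._positions(token))


def build_blocks(lines, block_size=2):
    """Group lines into blocks and summarize each block's tokens in a filter."""
    blocks = []
    for i in range(0, len(lines), block_size):
        chunk = lines[i:i + block_size]
        bloom = BloomFilter()
        for line in chunk:
            for token in line.split():
                bloom.add(token)
        blocks.append((bloom, chunk))
    return blocks


def search(blocks, token):
    """Scan only blocks whose filter says the token might be present."""
    hits, scanned = [], 0
    for bloom, chunk in blocks:
        if not bloom.might_contain(token):
            continue  # token is definitely absent from this block
        scanned += 1
        hits += [line for line in chunk if token in line.split()]
    return hits, scanned


lines = [
    "req=abc123 status=200 path=/cart",
    "req=def456 status=200 path=/cart",
    "req=abc123 status=500 path=/pay",
    "req=ghi789 status=200 path=/home",
]
blocks = build_blocks(lines)
hits, scanned = search(blocks, "status=500")
print(hits, f"scanned {scanned} of {len(blocks)} blocks")
```

Because the filter only summarizes which tokens a block contains, it needs no ordering of the data and no per-value index entries, which is why the approach suits high-cardinality fields such as request IDs. The trade-off is an occasional false positive that causes an unnecessary block scan.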

The result:

Sub-second search on terabytes of data
Seconds on petabytes — no rehydration from cold storage
12-month retention by default
Zero operational overhead
Transparent, predictable pricing

Conclusion

ClickHouse is an impressive technology for what it was designed to do: structured data analytics at scale. But turning it into a general-purpose logging solution requires enormous engineering investment and still results in fundamental compromises.

When you can instantly search across years of data, logs shift from a compliance checkbox or last-resort troubleshooting tool to an active part of your operational toolkit.

Instead of spending years rebuilding what already exists, consider whether your team's time is better spent solving your core business problems. The logging layer should be a foundation that just works — not a perpetual engineering project.

