Authored by Mike Neville-O'Neill
Let's talk about ClickHouse. It's fast, it's efficient, it's open source. But it's designed to be a general-purpose analytical database, which makes it challenging — in terms of time and resources — to use for logging at scale.
I've spent over a decade working on logging systems and watched countless organizations struggle with the same pattern: a team frustrated with expensive legacy log management turns to ClickHouse, excited by early performance wins on well-structured data. They rebuild critical logging infrastructure around it, roll it out, and everything seems great... until it doesn't.
Why Teams Try ClickHouse for Logs
The appeal is undeniable. When you first point ClickHouse at a clean, structured dataset, the performance is genuinely impressive:
- Speed — Built for analytical queries, it can scan billions of rows in seconds on modest hardware
- Cost — Open source, so licensing costs are zero
- Familiarity — If your team works with SQL, the learning curve is minimal
- Flexibility — Complete control over your data, schema, and retention policies
Early benchmarks usually reinforce this perception. On controlled, clean data with predictable patterns, ClickHouse shines. You might start with a specific subset of logs, such as API access logs or well-structured application events, and see impressive query speeds.
The problem isn't that ClickHouse doesn't work for logs. The problem is that it doesn't work for all logs, all the time, at scale, without significant engineering investment. And that's where organizations get trapped.
The Architectural Mismatch
Logs in the wild are messy, inconsistent, and unpredictable — the direct opposite of what ClickHouse was designed to handle.
The Columnar Conundrum
ClickHouse's columnar storage engine is optimized for queries that scan large portions of specific columns — brilliant for analytics workloads with well-defined dimensions. But logs are fundamentally different:
- Unpredictable schemas — Log formats change; new fields appear and disappear without warning
- Nested structures — Modern logs, especially from cloud environments, contain deeply nested JSON with inconsistent structures
- High cardinality — Fields like session IDs and request IDs have enormous numbers of unique values, which compress poorly and can't be pruned by ClickHouse's sparse primary index unless they lead the sort key
- Mixed query patterns — Log analysis involves aggregation, filtering, full-text search, and pattern matching — not just the analytical queries ClickHouse excels at
Forcing logs into ClickHouse is like racing a Formula 1 car on an off-road trail — impressive engineering, wrong application.
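To make the nested-structure and unpredictable-schema problems concrete, here's a minimal Python sketch of what happens when JSON log lines are flattened into dotted column paths before landing in fixed columns. The two events and their field names are made up for illustration: the same service emits a different set of columns from one release to the next, and `http.status` arrives as an integer in one event and a string in the other.

```python
import json
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested objects into dotted key paths: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path.
            flat.update(flatten(value, prefix=path + "."))
        else:
            # Scalars and lists become leaf "columns" as-is.
            flat[path] = value
    return flat


# Two events from the same (hypothetical) service, one release apart.
raw_events = [
    '{"ts": "2024-01-01T00:00:00Z", "http": {"status": 200, "path": "/cart"}}',
    '{"ts": "2024-01-02T00:00:00Z", "http": {"status": "503", "upstream": {"host": "db-1"}}}',
]
flattened = [flatten(json.loads(line)) for line in raw_events]

# Union of all column names, plus every Python type observed per column.
columns = sorted(set().union(*(f.keys() for f in flattened)))
column_types: dict[str, set[str]] = {}
for event in flattened:
    for name, value in event.items():
        column_types.setdefault(name, set()).add(type(value).__name__)

print(columns)  # ['http.path', 'http.status', 'http.upstream.host', 'ts']
print(sorted(column_types["http.status"]))  # ['int', 'str']
```

A columnar table has to decide up front whether `http.status` is a number or a string; the log source never made that promise.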
Schema Rigidity vs. Log Reality
In the real world, logs evolve constantly:
- Developers add new fields to debug specific issues
- Third-party systems change their log formats without notice
- Cloud providers modify their event structures
- Microservices introduce different logging conventions across teams
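With ClickHouse, every new field means either an explicit ALTER TABLE or routing it into a generic catch-all column (such as a Map or JSON column), so schema changes require careful planning to avoid additional overhead. A purpose-built logging system accommodates this reality by adapting to your logs, not the other way around.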
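Here's roughly what that planning looks like for a fixed-column table. This sketch assumes a hypothetical `logs` table with four known columns, detects fields it has never seen, and prints the `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` statements someone would have to review and run. Typing every new column as `String` is a deliberate simplification; a real pipeline also has to choose types, resolve conflicts, and decide whether a field deserves its own column at all.

```python
from typing import Any

# Hypothetical columns of an existing ClickHouse table named `logs`.
KNOWN_COLUMNS = {"ts", "level", "service", "message"}


def find_new_fields(records: list[dict[str, Any]], known: set[str]) -> list[str]:
    """Return field names seen in incoming records but absent from the table schema."""
    seen: set[str] = set()
    for record in records:
        seen.update(record.keys())
    return sorted(seen - known)


def migration_statements(table: str, new_fields: list[str]) -> list[str]:
    """Build one ALTER TABLE per new field (everything typed as String, for simplicity)."""
    return [
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS `{name}` String"
        for name in new_fields
    ]


batch = [
    {"ts": "2024-01-01T00:00:00Z", "level": "info", "service": "cart", "message": "ok"},
    # A developer added debug fields to chase a specific issue.
    {"ts": "2024-01-01T00:00:01Z", "level": "debug", "service": "cart",
     "message": "retry", "retry_count": 3, "feature_flag": "new-checkout"},
]

new_fields = find_new_fields(batch, KNOWN_COLUMNS)
for stmt in migration_statements("logs", new_fields):
    print(stmt)
```

Every debugging session that adds a field produces another statement like these, and someone has to own running them.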
Indexing Limitations
ClickHouse's primary indexing mechanism, the sparse primary index, works by storing one index mark per granule of rows (8,192 rows by default, controlled by the index_granularity setting). This excels for batch analytical queries but creates significant compromises for log search:
- Limited selectivity — Works best when scanning large portions of the dataset, not when looking for specific events
- Suboptimal for high-cardinality fields — Lookups on user IDs, session IDs, and trace IDs all struggle with this approach
- Text search limitations — Full-text search is functional but not optimized for the complex pattern matching common in log analysis
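The sparse index trade-off is easy to model. This toy Python sketch (not ClickHouse's actual code) sorts a million rows by a hypothetical `service_id` sort key, keeps one mark per 8,192-row granule, and binary-searches the marks to find which granules could hold a given key. A filter on a column outside the sort key, such as a trace ID, gets no help from these marks. The model ignores ClickHouse's optional data-skipping indexes, which exist to narrow exactly that gap.

```python
import bisect
import random

GRANULE = 8192  # ClickHouse's default index_granularity


def build_sparse_index(sorted_keys: list[int], granule: int = GRANULE) -> list[int]:
    """Keep one mark (the first key) per granule of rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule)]


def granules_for_key(marks: list[int], key: int) -> range:
    """Granules that may contain `key`, found by binary search over the marks.

    Granule i spans keys in [marks[i], marks[i + 1]], so a candidate needs
    marks[i] <= key and (i is last or marks[i + 1] >= key).
    """
    lo = max(bisect.bisect_left(marks, key) - 1, 0)
    hi = bisect.bisect_right(marks, key)
    return range(lo, hi)


N_ROWS = 1_000_000
rng = random.Random(42)

# Table ORDER BY service_id: rows are physically sorted on it.
service_ids = sorted(rng.randrange(1_000) for _ in range(N_ROWS))
marks = build_sparse_index(service_ids)
total_granules = len(marks)

hit = granules_for_key(marks, 500)
print(f"service_id = 500 -> read {len(hit)} of {total_granules} granules")

# trace_id is not part of the sort key, so these marks cannot prune anything:
# every granule has to be read and filtered.
print(f"trace_id lookup -> read {total_granules} of {total_granules} granules")
```

The sort-key lookup touches one or two granules; the trace ID lookup touches all of them, and that gap widens as the table grows.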
Real-World Scaling Shatters the Illusion
The true challenges emerge as your ClickHouse logging solution scales. What worked smoothly in development with gigabytes becomes increasingly problematic at terabyte or petabyte scale.
Query Performance Degrades
Initially, queries return in seconds or less. As data grows, the same searches can take minutes. Without careful tuning and constant optimization, query performance becomes increasingly unpredictable — your team ends up spending more time optimizing queries than using them to solve problems.
Operational Overhead Becomes Crushing
Managing a ClickHouse cluster at scale is a specialized skill. You'll need to handle:
- Rebalancing — Manually triggering operations as data distribution becomes uneven
- Storage management — Careful tuning of merge settings, partitioning schemes, and TTL rules
- Resource allocation — Preventing resource-intensive queries from impacting overall system performance
- Schema evolution — Adding, removing, or modifying columns requires careful planning
- Multi-tenancy — ClickHouse has limited built-in multi-tenancy, creating additional complexity
One large e-commerce team shared with us that they had six full-time engineers dedicated just to keeping their ClickHouse logging infrastructure operational — an enormous hidden cost that rarely factors into initial calculations.
Reliability Issues Emerge
- Ingestion pipeline failures — Because ingestion and search aren't separated, ingestion often falls behind when log volumes spike
- Complex failure modes — Troubleshooting ClickHouse problems requires deep database expertise most DevOps teams don't have
- Recovery complexity — Restoring from failures or data corruption is time-consuming and error-prone
The DIY Trap
When engineers suggest "just use ClickHouse for logging," they're often underestimating what that actually means. ClickHouse isn't a logging platform — it's a powerful but complex analytical database that requires substantial expertise to turn into a production-ready observability solution.
The ClickHouse Learning Curve
ClickHouse has unique concepts that take months to master:
- MergeTree engines — Understanding when to use ReplicatedMergeTree vs. ReplacingMergeTree vs. dozens of other variants
- Partitioning and primary key design — Critical decisions that affect query performance and can't easily be changed later
- Merge behavior tuning — Configuring background processes that determine how your data gets organized and compressed
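Merge behavior matters because some engines only do their real work at merge time. The sketch below loosely imitates a ReplacingMergeTree-style merge in plain Python: parts are sorted runs of `(sort_key, version, payload)` rows, and merging keeps the highest version per key. The row layout and the `merge_parts` helper are illustrative assumptions, not ClickHouse internals. Until a merge actually runs, both versions of `req-1` exist side by side, which is why queries against such tables can return duplicates.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Each "part" is a list of (sort_key, version, payload) rows, sorted by sort_key.
Row = tuple[str, int, str]


def merge_parts(parts: list[list[Row]]) -> list[Row]:
    """Merge sorted parts into one sorted part, keeping the highest version per key."""
    # heapq.merge streams the parts in sort-key order without concatenating them.
    merged = heapq.merge(*parts, key=itemgetter(0))
    result: list[Row] = []
    for _, rows in groupby(merged, key=itemgetter(0)):
        # Among rows sharing a sort key, keep the one with the highest version.
        result.append(max(rows, key=itemgetter(1)))
    return result


part_1 = [("req-1", 1, "pending"), ("req-3", 1, "pending")]
part_2 = [("req-1", 2, "done"), ("req-2", 1, "pending")]

# Before a merge, a query sees every row from every part, duplicates included.
before = [row for part in (part_1, part_2) for row in part]
after = merge_parts([part_1, part_2])

print(len(before))  # 4
print(after)  # [('req-1', 2, 'done'), ('req-2', 1, 'pending'), ('req-3', 1, 'pending')]
```

When merges run, how large parts may grow, and how much I/O they consume are exactly the knobs that end up needing tuning.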
Building the Missing Pieces
Even with ClickHouse expertise, you still need to build an entire logging platform around it:
- Ingestion pipelines that handle backpressure, parse multiple log formats, and deal with schema evolution
- Query interfaces — APIs and UIs that let users actually search and analyze their logs
- Access control — Multi-tenancy and permissions
- Alerting and monitoring systems
- Operational tooling — backup/recovery, cluster health monitoring, schema migrations
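The "ingestion pipelines that handle backpressure" item alone is a real project. Here is a minimal single-process sketch, assuming a hypothetical `flush` callback that stands in for a bulk insert: lines go into a bounded buffer, producers are told to back off when it's full, and a drain step flushes batches by size or by time, since ClickHouse favors fewer, larger inserts over many small ones. The class name and parameters are illustrative.

```python
import queue
import time
from typing import Callable


class BatchingIngester:
    """Buffer log lines in a bounded queue and flush them in batches.

    `flush` is a hypothetical callback standing in for a bulk INSERT.
    """

    def __init__(self, flush: Callable[[list[str]], None],
                 max_batch: int = 1000, max_wait_s: float = 1.0, capacity: int = 10_000):
        self._q: "queue.Queue[str]" = queue.Queue(maxsize=capacity)
        self._flush = flush
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s

    def offer(self, line: str, timeout_s: float = 0.1) -> bool:
        """Try to enqueue; return False (backpressure) if the buffer stays full."""
        try:
            self._q.put(line, timeout=timeout_s)
            return True
        except queue.Full:
            return False

    def drain_once(self) -> int:
        """Collect up to max_batch lines, waiting at most max_wait_s, then flush."""
        batch: list[str] = []
        deadline = time.monotonic() + self._max_wait_s
        while len(batch) < self._max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self._q.get(timeout=remaining))
            except queue.Empty:
                break
        if batch:
            self._flush(batch)
        return len(batch)


flushed: list[list[str]] = []
ingester = BatchingIngester(flushed.append, max_batch=3, max_wait_s=0.2, capacity=5)

# The buffer holds 5 lines; the last 2 offers are rejected so the producer can back off.
accepted = [ingester.offer(f"line {i}", timeout_s=0.01) for i in range(7)]
ingester.drain_once()  # flushes a full batch of 3
ingester.drain_once()  # flushes the remaining 2 after the wait expires
```

In production, `drain_once` would run in a loop on a background thread, and you'd still need retries, dead-letter handling, and format parsing on top.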
The Hidden Costs
What starts as "let's save money by using open source" often becomes:
- Months of engineering time just to get basic functionality working
- Ongoing operational overhead that scales with data volume
- Opportunity cost of engineers focusing on infrastructure instead of product features
- Risk of outages due to misconfiguration or lack of operational expertise
For most teams, the total cost of ownership of a DIY ClickHouse solution exceeds that of a purpose-built logging platform.
Built for Logs, Not Analytics
Instead of adapting an analytics database for logging, the right approach is to build a system specifically for the messy, unpredictable nature of log data:
- Separation of compute and storage — Decoupling storage from compute so each scales independently, keeping searches fast regardless of data volume
- Indexless architecture with Bloom filters — Sub-second search performance without the overhead and complexity of traditional indexing, particularly effective for high-cardinality fields
- Schema-agnostic ingestion — New fields, changing structures, and varying formats handled automatically without schema migrations or pipeline adjustments
- Optimized for search patterns — Full-text search, pattern matching, filtering, and aggregation all handled natively
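The Bloom filter idea in the list above can be sketched generically. This is a textbook Bloom filter in Python, not any vendor's implementation, whose internals aren't described here: each hypothetical storage block keeps a small filter of the trace IDs it holds, and a search only scans blocks whose filter says "maybe". Bloom filters never produce false negatives, so the block holding the target is always scanned; at the 1% false-positive rate chosen here, an occasional extra block sneaks in.

```python
import hashlib
import math


class BloomFilter:
    """A minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.m = max(8, int(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical storage blocks, each holding 1,000 events with unique trace IDs.
blocks = [[f"trace-{b}-{i}" for i in range(1_000)] for b in range(50)]
filters = []
for block in blocks:
    bf = BloomFilter(expected_items=len(block))
    for trace_id in block:
        bf.add(trace_id)
    filters.append(bf)

target = "trace-37-512"
candidates = [i for i, bf in enumerate(filters) if bf.might_contain(target)]
print(f"scan {len(candidates)} of {len(blocks)} blocks: {candidates}")
```

Each filter here is about 9,600 bits for 1,000 IDs, which is why this style of pruning stays cheap even for fields where nearly every value is unique.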
The result:
- Sub-second search on terabytes of data
- Seconds on petabytes — no rehydration from cold storage
- 12-month retention by default
- Zero operational overhead
- Transparent, predictable pricing
Conclusion
ClickHouse is an impressive technology for what it was designed to do: structured data analytics at scale. But turning it into a general-purpose logging solution requires enormous engineering investment and still results in fundamental compromises.
When you can instantly search across years of data, logs shift from a compliance checkbox or last-resort troubleshooting tool to an active part of your operational toolkit.
Instead of spending years rebuilding what already exists, consider whether your team's time is better spent solving your core business problems. The logging layer should be a foundation that just works — not a perpetual engineering project.