I once watched a query run for 47 minutes on a cluster that cost $12,000 per month. The team had thrown hardware at the problem. More nodes. More RAM. Faster SSDs. Nothing worked.
The fix? A single ORDER BY change. Query time dropped to 2.8 seconds.
That moment changed how I approach ClickHouse performance tuning. Most people think tuning means adding resources. They're wrong. ClickHouse performance tuning is about understanding how data is stored, ordered, and queried at the columnar level.
What is ClickHouse performance tuning? It's the systematic process of optimizing query execution, storage efficiency, and cluster configuration to maximize throughput while minimizing latency. Done right, it delivers 10x-100x improvements without buying a single new server.
Here's what I learned the hard way from tuning systems processing 200K events per second.
Every ClickHouse performance problem traces back to one root cause: data skipping isn't working. ClickHouse's superpower is its ability to skip large chunks of data during reads. When your queries scan everything, performance tanks.
According to the official ClickHouse docs (Performance and Optimizations | ClickHouse Docs), the most critical factor is understanding the MergeTree engine. Data is stored in parts, each part sorted by the ORDER BY key. The engine creates a sparse primary index plus min/max statistics per granule (typically 8,192 rows).
Here's the hard truth: Your primary key choice determines everything. Unlike traditional databases, ClickHouse's primary key isn't unique. It's a sorting key that controls how data is physically organized on disk.
I've found that teams make three fundamental mistakes:
- Too many columns in ORDER BY: Every additional column adds overhead during merges and reduces data skipping efficiency
- High-cardinality columns first: Placing high-cardinality columns early in the key destroys the selectivity of every column after them; the general guidance is to order key columns by ascending cardinality (see the sketch after this list)
- Ignoring query patterns: The ORDER BY must match how queries filter data, not how developers think about the data
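To make the second mistake concrete, here's a minimal sketch with hypothetical table names and cardinalities. Ordering key columns by ascending cardinality keeps each one useful for pruning; the next section covers the exception, where a column your queries filter on by equality earns an early slot even at high cardinality:

-- Hypothetical: tenant_id has ~100 distinct values, event_date ~365 per year
-- BAD: ORDER BY (event_date, tenant_id)
--   the key fragments by date first, so tenant_id barely narrows the scan
-- GOOD: ascending cardinality keeps both columns useful for skipping
CREATE TABLE tenant_events (
    tenant_id UInt16,
    event_date Date,
    payload String
) ENGINE = MergeTree
ORDER BY (tenant_id, event_date);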
The LaunchDarkly team documented exactly this problem in their production systems. According to Optimizing ClickHouse: The Tactics That Worked for Us, they had to completely restructure their schema design because initial choices made queries scan 10x more data than necessary.
Everyone talks about query optimization. Most advice is generic garbage. Here's what actually moves the needle.
The 2026 definitive guide from ClickHouse engineering provides concrete patterns. According to The definitive guide to ClickHouse query optimization (2026), the most impactful pattern is matching your ORDER BY to WHERE clause columns.
-- BAD: the key only prunes by the one-day time range,
-- then scans every user's rows inside it
SELECT event_type, count(*)
FROM events
WHERE timestamp > now() - INTERVAL 1 DAY
  AND user_id = 'abc123'
GROUP BY event_type
-- ORDER BY: (timestamp, user_id)
-- Problem: the selective filter is user_id, but timestamp comes first in the key

-- GOOD: Reorder to match the query's most selective filter
-- ORDER BY: (user_id, timestamp)
-- Now ClickHouse skips directly to user_id's granules, then prunes by time
I've seen this single change deliver 18x improvements. One engineer documented a real case: according to ClickHouse Performance: How I Got 18x Faster Queries ..., reordering the primary key based on actual query patterns slashed query times from 30 seconds to under 2 seconds.
Most people set index_granularity to the default 8192 and never touch it. This is a mistake.
-- Fine-grained index for time-series with range queries
CREATE TABLE metrics (
timestamp DateTime,
metric_name String,
value Float64
) ENGINE = MergeTree
ORDER BY (metric_name, timestamp)
SETTINGS index_granularity = 4096; -- Smaller for better precision
Smaller granularity means more precise skipping but larger indexes. I've found that 4096 works well for most analytics workloads. Anything below 1024 creates index bloat that hurts merge performance.
Sometimes your ORDER BY can't fix everything. That's where data skipping indexes shine.
-- Bloom filter index for high-cardinality string matching
ALTER TABLE events ADD INDEX idx_user_id
(user_id)
TYPE bloom_filter(0.05)
GRANULARITY 1;
According to the ClickHouse Performance Tuning Checklist published in March 2026, bloom filter indexes provide the best performance improvement for WHERE clauses on high-cardinality columns that can't be in the primary key.
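One caveat the snippet above glosses over: ADD INDEX only applies to parts written after the ALTER. To build the bloom filter over data already in the table, you have to materialize it explicitly (this rewrites index files for existing parts, so expect background I/O):

-- Build the new skipping index for existing parts
ALTER TABLE events MATERIALIZE INDEX idx_user_id;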
The schema is where performance lives or dies. You cannot optimize your way out of a bad schema.
Every byte counts in columnar storage. ClickHouse stores data column-by-column, so smaller data types mean less data to scan.
-- BAD: Overly large types
CREATE TABLE events_bad (
    event_id String,    -- 36 bytes for a UUID in text form
    timestamp DateTime, -- 4 bytes, second precision
    user_id String,     -- variable, avg 20+ bytes
    status String       -- 'active', 'inactive', etc.
) ENGINE = MergeTree
ORDER BY timestamp;

-- GOOD: Minimal types
CREATE TABLE events_good (
    event_id UUID,           -- 16 bytes, fixed
    timestamp DateTime64(6), -- 8 bytes, but microsecond precision
    user_id FixedString(8),  -- if IDs are numeric or short
    status Enum8('active' = 1, 'inactive' = 2) -- 1 byte
) ENGINE = MergeTree
ORDER BY timestamp;
I've seen teams reduce storage by 40-60% just by using proper types. The Reddit data engineering community discusses this constantly. A thread on Looking for tuning advice for ClickHouse : r/dataengineering highlights that most performance issues trace back to schema choices made during initial development.
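A quick way to verify the savings, assuming both tables live in the default database: system.columns exposes per-column compressed and uncompressed sizes, so you can compare the two schemas directly:

SELECT
    table,
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = 'default'
  AND table IN ('events_bad', 'events_good')
ORDER BY data_compressed_bytes DESC;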
Partitioning serves one purpose: data lifecycle management. It does not speed up individual queries. Here's where people get confused.
-- Good partitioning: matches data retention
CREATE TABLE events (
timestamp DateTime,
user_id String,
event_data String
) ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp) -- Monthly partitions for 90-day retention
ORDER BY (user_id, timestamp)
Too many partitions hurt performance. Each partition creates separate index files, so hundreds of tiny partitions mean dozens of extra file opens and seeks per query. I keep partitions large enough that queries hit 3-5 partitions maximum.
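To check whether you're in tiny-partition territory, system.parts shows how many active parts each partition holds (swap in your own database and table names):

SELECT
    partition,
    count() AS active_parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = 'default'
  AND table = 'events'
  AND active
GROUP BY partition
ORDER BY partition;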
Hardware tuning matters, but only after you've fixed the schema. Throwing RAM at a bad query plan is like putting premium gas in a broken engine.
ClickHouse loves memory. But it also respects limits. The ClickStack - performance tuning guide from ClickHouse's own observability stack provides concrete numbers for production settings.
<!-- users.xml profile settings - memory limits that actually work -->
<max_memory_usage>100000000000</max_memory_usage> <!-- 100 GB per query -->
<max_memory_usage_for_all_queries>200000000000</max_memory_usage_for_all_queries> <!-- 200 GB -->
<max_bytes_before_external_group_by>50000000000</max_bytes_before_external_group_by> <!-- 50 GB -->
<max_bytes_before_external_sort>50000000000</max_bytes_before_external_sort> <!-- 50 GB -->
The key insight: set max_bytes_before_external_group_by to about half of max_memory_usage (note the 50 GB vs. 100 GB above). This forces ClickHouse to spill GROUP BY state to disk before hitting the memory limit and getting killed. I've saved many production clusters with this single setting.
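These limits can also be tightened per query, which I find safer for one-off heavy aggregations than loosening the server-wide profile. A minimal sketch with arbitrary values:

-- Allow this query 20 GB, spilling GROUP BY state to disk at 10 GB
SELECT user_id, uniqExact(event_type) AS types
FROM events
GROUP BY user_id
SETTINGS
    max_memory_usage = 20000000000,
    max_bytes_before_external_group_by = 10000000000;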
ClickHouse parallelizes aggressively. Too many threads causes context switching overhead. Too few leaves resources idle.
-- Per-query thread control for heavy aggregations
SELECT
    user_id,
    count(*)
FROM events
WHERE timestamp > now() - INTERVAL 1 DAY
GROUP BY user_id
SETTINGS max_threads = 8; -- Match CPU cores
According to How to Tune ClickHouse for Maximum Query Performance, published February 2026, setting max_threads between 4-8 for typical analytics queries provides the best throughput without overwhelming the scheduler.
This is where ClickHouse performance tuning becomes an art. Materialized views and projections let you pre-compute results without application complexity.
Projections are like materialized views maintained automatically inside the same table: ClickHouse stores a redundant, differently ordered or pre-aggregated copy of the data in each part. They're magic for speed.
CREATE TABLE events (
    timestamp DateTime,
    user_id String,
    event_type String,
    value UInt64,
    -- Projection for daily aggregation, declared inside the table definition
    PROJECTION daily_agg (
        SELECT
            toDate(timestamp) AS day,
            event_type,
            sum(value)
        GROUP BY day, event_type
    )
) ENGINE = MergeTree
ORDER BY (timestamp)
The query optimizer automatically routes matching queries to the projection. No application changes needed. According to ClickHouse Performance Tuning and Optimization, projections can reduce query latency by 10-50x for aggregation-heavy workloads.
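For a table that already exists, the same projection can be bolted on after the fact. As with skipping indexes, the ALTER only covers newly written parts until you materialize it:

ALTER TABLE events ADD PROJECTION daily_agg (
    SELECT
        toDate(timestamp) AS day,
        event_type,
        sum(value)
    GROUP BY day, event_type
);
-- Rewrite existing parts to include the projection
ALTER TABLE events MATERIALIZE PROJECTION daily_agg;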
For complex transformations, materialized views provide more flexibility:
CREATE MATERIALIZED VIEW events_hourly_mv
ENGINE = SummingMergeTree
ORDER BY (hour, event_type)
POPULATE
AS SELECT
toStartOfHour(timestamp) as hour,
event_type,
count() as event_count,
sum(value) as total_value
FROM events
GROUP BY hour, event_type
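One caveat with this pattern: SummingMergeTree collapses duplicate keys only at merge time, and merges are eventual. A query against the view must therefore still aggregate to get correct totals:

-- Merges are eventual, so aggregate on read
SELECT
    hour,
    event_type,
    sum(event_count) AS events,
    sum(total_value) AS total
FROM events_hourly_mv
GROUP BY hour, event_type
ORDER BY hour;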
I prefer projections for simple cases and materialized views for complex transformations. The trade-off: projections are automatic but limited in expression power. Materialized views give full SQL freedom but require manual management.
After tuning dozens of ClickHouse clusters, I've seen the same mistakes repeated.
SELECT * destroys everything. ClickHouse shines when it reads few columns, and SELECT * reads ALL columns.
-- BAD: Reads 50+ columns
SELECT * FROM events WHERE timestamp > now() - INTERVAL 1 DAY
-- GOOD: Reads 2 columns (event_type, plus timestamp for the filter)
SELECT event_type, count(*) FROM events WHERE timestamp > now() - INTERVAL 1 DAY GROUP BY event_type
According to A simple guide to ClickHouse query optimization: part 1, column pruning alone can reduce I/O by 10x for wide tables.
I inherited a system with hourly partitions. Queries were slow because each query opened hundreds of file descriptors. The fix: switch to monthly partitions. Query times dropped 60%.
After heavy inserts, ClickHouse merges parts in the background. During merges, query performance degrades. The ClickStack - performance tuning guide recommends monitoring MergedRows and MergedUncompressedBytes metrics to catch merge storms.
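To watch for merge storms, I check in-flight merges in system.merges alongside the cumulative counters the guide mentions (the counters are server-lifetime totals, so trend them over time):

-- In-flight merges right now
SELECT database, table, elapsed, progress, num_parts
FROM system.merges;

-- Cumulative merge counters since server start
SELECT event, value
FROM system.events
WHERE event IN ('MergedRows', 'MergedUncompressedBytes');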
What is the most impactful ClickHouse performance tuning change?
Order your primary key to match your most common WHERE clause patterns. This alone provides 5-50x improvements by enabling data skipping.
How many columns should I include in ORDER BY?
Maximum 3-4 columns. Each additional column increases merge overhead and reduces index efficiency. Focus on filter columns only.
Does ClickHouse need SSDs?
Yes, for most workloads. ClickHouse favors large sequential reads, but background merges and concurrent queries generate enough random I/O that SSDs still improve latency by 10-100x compared to HDDs.
What index granularity should I use?
Start with 8192. Reduce to 4096 for time-series with precise range queries. Below 1024 creates index bloat that hurts merge performance.
How do projections differ from materialized views?
Projections are automatic and query-optimizer aware. Materialized views are more flexible but require manual population and routing logic.
Why is my query slow despite proper indexing?
Check for SELECT * patterns, missing WHERE clause filters, or too many partitions. Also verify your server's max_threads configuration matches CPU cores.
Can ClickHouse use parallel query execution?
Yes. ClickHouse automatically parallelizes across CPU cores. Set max_threads to 4-8 for optimal throughput without scheduler overhead.
ClickHouse performance tuning isn't magic. It's systematic optimization of five areas: schema design, primary key ordering, data skipping indexes, memory configuration, and query patterns.
Start with your slowest query. Check the number of granules it reads. If it's reading all granules, your ORDER BY isn't matching query filters. Fix that first. Then optimize data types. Then add skipping indexes for edge cases.
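You don't have to guess at granule counts. EXPLAIN with indexes = 1 reports how many parts and granules survive primary-key and skipping-index analysis for a given query (shown here against the earlier events table):

EXPLAIN indexes = 1
SELECT event_type, count(*)
FROM events
WHERE user_id = 'abc123'
GROUP BY event_type;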
I've seen teams achieve 18x improvements with zero hardware changes. The principles are simple. The execution requires discipline.
Author: Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec. Connect on LinkedIn
- Performance and Optimizations | ClickHouse Docs
- The definitive guide to ClickHouse query optimization (2026)
- A simple guide to ClickHouse query optimization: part 1
- How to Tune ClickHouse for Maximum Query Performance
- ClickHouse Performance: How I Got 18x Faster Queries ...
- Optimizing ClickHouse: The Tactics That Worked for Us
- Looking for tuning advice for ClickHouse : r/dataengineering
- ClickStack - performance tuning
- ClickHouse Performance Tuning Checklist
- ClickHouse Performance Tuning and Optimization
Originally published at https://sivaro.in/articles/clickhouse-performance-tuning-lessons-from-200k-events-sec.