Kanishga Subramani

Posted on Jul 5

Day 60: ClickHouse® Query Profiling – Finding Performance Bottlenecks

#clickhouse #analytics #database #devops

When a query becomes slow, the first instinct is often to add more CPU or increase memory. In reality, the problem may have nothing to do with hardware.

A query can be slow because it scans too much data, performs expensive calculations, uses excessive memory, or transfers large amounts of data across the network. Without profiling, these bottlenecks are difficult to distinguish because they often produce similar symptoms: high query latency.

ClickHouse® provides built-in profiling tools that expose detailed execution statistics, making it possible to understand why a query is slow before attempting optimization.

In this article, we'll explore how to profile queries in ClickHouse®, interpret the most important metrics, and identify common performance bottlenecks.

Why Query Profiling Matters

A slow query does not necessarily mean a slow database.

Common causes include:

Reading far more rows than necessary
Poor primary key filtering
Large aggregations or sorts
Memory-intensive joins
Network overhead in distributed queries
CPU-heavy expressions
Ineffective partition pruning

Profiling provides visibility into:

Execution time
Rows and bytes processed
Memory consumption
CPU utilization
Disk reads
Thread-level behavior
Execution plans
Low-level performance counters

The goal is simple: identify the real bottleneck before optimizing.

A Practical Profiling Workflow

A typical ClickHouse® profiling workflow looks like this:

Find slow queries in system.query_log.
Inspect the execution plan with EXPLAIN.
Review rows, bytes, and memory usage.
Analyze thread-level behavior if needed.
Examine ProfileEvents.
Apply a targeted optimization.
Measure the results again.

This evidence-driven approach is far more effective than making multiple changes simultaneously.

Step 1: Find Slow Queries

The system.query_log table stores execution statistics for completed queries.

Key columns:

Column	Meaning
query_duration_ms	Total execution time
read_rows	Rows scanned
read_bytes	Data read from storage
memory_usage	Peak memory consumption
normalized_query_hash	Groups similar queries together

normalized_query_hash is especially useful because it reveals repeated expensive query patterns, not just isolated slow queries.

Step 2: Inspect the Execution Plan

Execution time alone does not explain why a query is slow.

Use EXPLAIN to see how ClickHouse® plans to execute it.

The output shows:

Index usage
Selected parts
Granules scanned
Predicate pushdown
Read operations

If most granules are scanned, the primary key is not pruning data effectively.

Step 3: Analyze Resource Usage

Rows Read

If a query returns 100 rows but scans hundreds of millions, filtering is likely inefficient.

Bytes Read

Large values usually indicate:

Full table scans
Reading unnecessary columns
Poor partition pruning

Peak Memory Usage

High memory consumption is common with:

JOIN operations
GROUP BY
ORDER BY
DISTINCT

Large intermediate datasets can eventually spill to disk depending on server settings.

Step 4: Use system.query_thread_log

Some queries are not uniformly slow across all threads.

system.query_thread_log records statistics for individual execution threads, including:

CPU time
Memory usage
Read statistics
Thread duration

This is useful for diagnosing parallel execution imbalance and identifying threads doing disproportionate work.

Step 5: Examine ProfileEvents

One of the most valuable profiling features is the ProfileEvents map available in system.query_log.

Common events:

Event	Meaning
SelectedRows	Rows processed
SelectedBytes	Bytes processed
NetworkSendBytes	Network traffic generated
ReadCompressedBytes	Compressed bytes read from storage
UserTimeMicroseconds	CPU time in user space
SystemTimeMicroseconds	CPU time in kernel space

These counters often reveal whether the dominant cost is CPU, storage I/O, or network communication.

Identifying Common Bottlenecks

1. Excessive Data Scanning

Symptoms

High read_rows
High read_bytes
Long execution time

Typical causes

Missing filters
Poor primary key design
Ineffective partition pruning

2. CPU Bottlenecks

Symptoms

High CPU-related ProfileEvents
Moderate data reads
Complex expressions

Common causes

Expensive functions
Large aggregations
Complex JOIN conditions

3. Memory Bottlenecks

Symptoms

High memory_usage
Slow aggregations
Slow joins

Common causes

Large hash tables
Wide intermediate datasets
Massive GROUP BY operations

4. Disk I/O Bottlenecks

Symptoms

High compressed bytes read
Large storage reads
Long execution time despite moderate CPU usage

Typical causes

Reading unnecessary columns
Large table scans
Poor data locality

5. Network Bottlenecks

Symptoms

High NetworkSendBytes
Large intermediate result exchange
Slow distributed aggregations

Reducing data movement between nodes can significantly improve distributed query performance.

Practical Example

Suppose a query takes 20 seconds to complete.

Metric

Value

Duration

20 s

Rows Read

450 million

Result Rows

150

Memory

180 MB

CPU

Moderate

Read Bytes

Very High

This profile suggests that CPU is not the limiting factor.

The query is spending most of its time scanning data, so the best optimization is likely:

Better filtering
Improved primary key design
More effective partition pruning

Adding more CPU would not solve the root cause.

Best Practices

Enable query logging in production.
Use EXPLAIN before modifying queries.
Review ProfileEvents alongside execution time.
Track recurring slow queries with normalized_query_hash.
Measure before and after every optimization.
Test with realistic datasets.
Avoid optimizing based on a single metric.

It is also useful to compare cold-cache and warm-cache performance when evaluating storage-related changes.

Conclusion

Query optimization is fundamentally about identifying bottlenecks, not applying random performance tweaks.

ClickHouse® provides powerful profiling capabilities through system.query_log, system.query_thread_log, EXPLAIN, and ProfileEvents. Together, these tools reveal how a query executes and which resources it consumes.

By combining execution plans with runtime metrics, engineers can determine whether a query is limited by CPU, memory, storage, or network activity and apply targeted optimizations that improve performance without unnecessary infrastructure changes.

Day 60 takeaway: Measure first, optimize second. The fastest way to improve a slow ClickHouse® query is to understand exactly where the time is being spent.

DEV Community

Day 60: ClickHouse® Query Profiling – Finding Performance Bottlenecks

Why Query Profiling Matters

A Practical Profiling Workflow

Step 1: Find Slow Queries

Step 2: Inspect the Execution Plan

Step 3: Analyze Resource Usage

Rows Read

Bytes Read

Peak Memory Usage

Step 4: Use system.query_thread_log

Step 5: Examine ProfileEvents

Identifying Common Bottlenecks

1. Excessive Data Scanning

2. CPU Bottlenecks

3. Memory Bottlenecks

4. Disk I/O Bottlenecks

5. Network Bottlenecks

Practical Example

Best Practices

Conclusion

Top comments (0)