DEV Community

Cover image for Day 60: ClickHouse® Query Profiling – Finding Performance Bottlenecks
Kanishga Subramani
Kanishga Subramani

Posted on

Day 60: ClickHouse® Query Profiling – Finding Performance Bottlenecks

When a query becomes slow, the first instinct is often to add more CPU or increase memory. In reality, the problem may have nothing to do with hardware.

A query can be slow because it scans too much data, performs expensive calculations, uses excessive memory, or transfers large amounts of data across the network. Without profiling, these bottlenecks are difficult to distinguish because they often produce similar symptoms: high query latency.

ClickHouse® provides built-in profiling tools that expose detailed execution statistics, making it possible to understand why a query is slow before attempting optimization.

In this article, we'll explore how to profile queries in ClickHouse®, interpret the most important metrics, and identify common performance bottlenecks.

Why Query Profiling Matters

A slow query does not necessarily mean a slow database.

Common causes include:

  • Reading far more rows than necessary

  • Poor primary key filtering

  • Large aggregations or sorts

  • Memory-intensive joins

  • Network overhead in distributed queries

  • CPU-heavy expressions

  • Ineffective partition pruning

Profiling provides visibility into:

  • Execution time

  • Rows and bytes processed

  • Memory consumption

  • CPU utilization

  • Disk reads

  • Thread-level behavior

  • Execution plans

  • Low-level performance counters

The goal is simple: identify the real bottleneck before optimizing.

A Practical Profiling Workflow

A typical ClickHouse® profiling workflow looks like this:

  • Find slow queries in system.query_log.

  • Inspect the execution plan with EXPLAIN.

  • Review rows, bytes, and memory usage.

  • Analyze thread-level behavior if needed.

  • Examine ProfileEvents.

  • Apply a targeted optimization.

  • Measure the results again.

This evidence-driven approach is far more effective than making multiple changes simultaneously.

Step 1: Find Slow Queries

The system.query_log table stores execution statistics for completed queries.

Key columns:

Column Meaning
query_duration_ms Total execution time
read_rows Rows scanned
read_bytes Data read from storage
memory_usage Peak memory consumption
normalized_query_hash Groups similar queries together

normalized_query_hash is especially useful because it reveals repeated expensive query patterns, not just isolated slow queries.

Step 2: Inspect the Execution Plan

Execution time alone does not explain why a query is slow.

Use EXPLAIN to see how ClickHouse® plans to execute it.

The output shows:

  • Index usage

  • Selected parts

  • Granules scanned

  • Predicate pushdown

  • Read operations

If most granules are scanned, the primary key is not pruning data effectively.

Step 3: Analyze Resource Usage

Rows Read

If a query returns 100 rows but scans hundreds of millions, filtering is likely inefficient.

Bytes Read

Large values usually indicate:

  • Full table scans

  • Reading unnecessary columns

  • Poor partition pruning

Peak Memory Usage

High memory consumption is common with:

  • JOIN operations

  • GROUP BY

  • ORDER BY

  • DISTINCT

Large intermediate datasets can eventually spill to disk depending on server settings.

Step 4: Use system.query_thread_log

Some queries are not uniformly slow across all threads.

system.query_thread_log records statistics for individual execution threads, including:

  • CPU time

  • Memory usage

  • Read statistics

  • Thread duration

This is useful for diagnosing parallel execution imbalance and identifying threads doing disproportionate work.

Step 5: Examine ProfileEvents

One of the most valuable profiling features is the ProfileEvents map available in system.query_log.

Common events:

Event Meaning
SelectedRows Rows processed
SelectedBytes Bytes processed
NetworkSendBytes Network traffic generated
ReadCompressedBytes Compressed bytes read from storage
UserTimeMicroseconds CPU time in user space
SystemTimeMicroseconds CPU time in kernel space

These counters often reveal whether the dominant cost is CPU, storage I/O, or network communication.

Identifying Common Bottlenecks

1. Excessive Data Scanning

Symptoms

  • High read_rows

  • High read_bytes

  • Long execution time

Typical causes

  • Missing filters

  • Poor primary key design

  • Ineffective partition pruning

2. CPU Bottlenecks

Symptoms

  • High CPU-related ProfileEvents

  • Moderate data reads

  • Complex expressions

Common causes

  • Expensive functions

  • Large aggregations

  • Complex JOIN conditions

3. Memory Bottlenecks

Symptoms

  • High memory_usage

  • Slow aggregations

  • Slow joins

Common causes

  • Large hash tables

  • Wide intermediate datasets

  • Massive GROUP BY operations

4. Disk I/O Bottlenecks

Symptoms

  • High compressed bytes read

  • Large storage reads

  • Long execution time despite moderate CPU usage

Typical causes

  • Reading unnecessary columns

  • Large table scans

  • Poor data locality

5. Network Bottlenecks

Symptoms

  • High NetworkSendBytes

  • Large intermediate result exchange

  • Slow distributed aggregations

Reducing data movement between nodes can significantly improve distributed query performance.

Practical Example

Suppose a query takes 20 seconds to complete.

Metric

Value

Duration

20 s

Rows Read

450 million

Result Rows

150

Memory

180 MB

CPU

Moderate

Read Bytes

Very High

This profile suggests that CPU is not the limiting factor.

The query is spending most of its time scanning data, so the best optimization is likely:

  • Better filtering

  • Improved primary key design

  • More effective partition pruning

Adding more CPU would not solve the root cause.

Best Practices

  • Enable query logging in production.

  • Use EXPLAIN before modifying queries.

  • Review ProfileEvents alongside execution time.

  • Track recurring slow queries with normalized_query_hash.

  • Measure before and after every optimization.

  • Test with realistic datasets.

  • Avoid optimizing based on a single metric.

It is also useful to compare cold-cache and warm-cache performance when evaluating storage-related changes.

Conclusion

Query optimization is fundamentally about identifying bottlenecks, not applying random performance tweaks.

ClickHouse® provides powerful profiling capabilities through system.query_log, system.query_thread_log, EXPLAIN, and ProfileEvents. Together, these tools reveal how a query executes and which resources it consumes.

By combining execution plans with runtime metrics, engineers can determine whether a query is limited by CPU, memory, storage, or network activity and apply targeted optimizations that improve performance without unnecessary infrastructure changes.

Day 60 takeaway: Measure first, optimize second. The fastest way to improve a slow ClickHouse® query is to understand exactly where the time is being spent.

Read more... https://www.quantrail-data.com/clickhouse-query-profiling-identifying-bottlenecks

Top comments (0)