When a query becomes slow, the first instinct is often to add more CPU or increase memory. In reality, the problem may have nothing to do with hardware.
A query can be slow because it scans too much data, performs expensive calculations, uses excessive memory, or transfers large amounts of data across the network. Without profiling, these bottlenecks are difficult to distinguish because they often produce similar symptoms: high query latency.
ClickHouse® provides built-in profiling tools that expose detailed execution statistics, making it possible to understand why a query is slow before attempting optimization.
In this article, we'll explore how to profile queries in ClickHouse®, interpret the most important metrics, and identify common performance bottlenecks.
Why Query Profiling Matters
A slow query does not necessarily mean a slow database.
Common causes include:
Reading far more rows than necessary
Poor primary key filtering
Large aggregations or sorts
Memory-intensive joins
Network overhead in distributed queries
CPU-heavy expressions
Ineffective partition pruning
Profiling provides visibility into:
Execution time
Rows and bytes processed
Memory consumption
CPU utilization
Disk reads
Thread-level behavior
Execution plans
Low-level performance counters
The goal is simple: identify the real bottleneck before optimizing.
A Practical Profiling Workflow
A typical ClickHouse® profiling workflow looks like this:
Find slow queries in
system.query_log.Inspect the execution plan with
EXPLAIN.Review rows, bytes, and memory usage.
Analyze thread-level behavior if needed.
Examine
ProfileEvents.Apply a targeted optimization.
Measure the results again.
This evidence-driven approach is far more effective than making multiple changes simultaneously.
Step 1: Find Slow Queries
The system.query_log table stores execution statistics for completed queries.
Key columns:
| Column | Meaning |
|---|---|
| query_duration_ms | Total execution time |
| read_rows | Rows scanned |
| read_bytes | Data read from storage |
| memory_usage | Peak memory consumption |
| normalized_query_hash | Groups similar queries together |
normalized_query_hash is especially useful because it reveals repeated expensive query patterns, not just isolated slow queries.
Step 2: Inspect the Execution Plan
Execution time alone does not explain why a query is slow.
Use EXPLAIN to see how ClickHouse® plans to execute it.
The output shows:
Index usage
Selected parts
Granules scanned
Predicate pushdown
Read operations
If most granules are scanned, the primary key is not pruning data effectively.
Step 3: Analyze Resource Usage
Rows Read
If a query returns 100 rows but scans hundreds of millions, filtering is likely inefficient.
Bytes Read
Large values usually indicate:
Full table scans
Reading unnecessary columns
Poor partition pruning
Peak Memory Usage
High memory consumption is common with:
JOINoperationsGROUP BYORDER BYDISTINCT
Large intermediate datasets can eventually spill to disk depending on server settings.
Step 4: Use system.query_thread_log
Some queries are not uniformly slow across all threads.
system.query_thread_log records statistics for individual execution threads, including:
CPU time
Memory usage
Read statistics
Thread duration
This is useful for diagnosing parallel execution imbalance and identifying threads doing disproportionate work.
Step 5: Examine ProfileEvents
One of the most valuable profiling features is the ProfileEvents map available in system.query_log.
Common events:
| Event | Meaning |
|---|---|
| SelectedRows | Rows processed |
| SelectedBytes | Bytes processed |
| NetworkSendBytes | Network traffic generated |
| ReadCompressedBytes | Compressed bytes read from storage |
| UserTimeMicroseconds | CPU time in user space |
| SystemTimeMicroseconds | CPU time in kernel space |
These counters often reveal whether the dominant cost is CPU, storage I/O, or network communication.
Identifying Common Bottlenecks
1. Excessive Data Scanning
Symptoms
High
read_rowsHigh
read_bytesLong execution time
Typical causes
Missing filters
Poor primary key design
Ineffective partition pruning
2. CPU Bottlenecks
Symptoms
High CPU-related
ProfileEventsModerate data reads
Complex expressions
Common causes
Expensive functions
Large aggregations
Complex
JOINconditions
3. Memory Bottlenecks
Symptoms
High
memory_usageSlow aggregations
Slow joins
Common causes
Large hash tables
Wide intermediate datasets
Massive
GROUP BYoperations
4. Disk I/O Bottlenecks
Symptoms
High compressed bytes read
Large storage reads
Long execution time despite moderate CPU usage
Typical causes
Reading unnecessary columns
Large table scans
Poor data locality
5. Network Bottlenecks
Symptoms
High
NetworkSendBytesLarge intermediate result exchange
Slow distributed aggregations
Reducing data movement between nodes can significantly improve distributed query performance.
Practical Example
Suppose a query takes 20 seconds to complete.
Metric
Value
Duration
20 s
Rows Read
450 million
Result Rows
150
Memory
180 MB
CPU
Moderate
Read Bytes
Very High
This profile suggests that CPU is not the limiting factor.
The query is spending most of its time scanning data, so the best optimization is likely:
Better filtering
Improved primary key design
More effective partition pruning
Adding more CPU would not solve the root cause.
Best Practices
Enable query logging in production.
Use
EXPLAINbefore modifying queries.Review
ProfileEventsalongside execution time.Track recurring slow queries with
normalized_query_hash.Measure before and after every optimization.
Test with realistic datasets.
Avoid optimizing based on a single metric.
It is also useful to compare cold-cache and warm-cache performance when evaluating storage-related changes.
Conclusion
Query optimization is fundamentally about identifying bottlenecks, not applying random performance tweaks.
ClickHouse® provides powerful profiling capabilities through system.query_log, system.query_thread_log, EXPLAIN, and ProfileEvents. Together, these tools reveal how a query executes and which resources it consumes.
By combining execution plans with runtime metrics, engineers can determine whether a query is limited by CPU, memory, storage, or network activity and apply targeted optimizations that improve performance without unnecessary infrastructure changes.
Day 60 takeaway: Measure first, optimize second. The fastest way to improve a slow ClickHouse® query is to understand exactly where the time is being spent.
Read more... https://www.quantrail-data.com/clickhouse-query-profiling-identifying-bottlenecks
Top comments (0)