DEV Community

Cover image for Day 58 - Optimizing ClickHouse® for High-Throughput Inserts
Kanishga Subramani
Kanishga Subramani

Posted on

Day 58 - Optimizing ClickHouse® for High-Throughput Inserts

Introduction

One of the defining strengths of ClickHouse® is its ability to ingest massive volumes of data while continuing to deliver exceptionally fast analytical queries. Whether you're processing application logs, IoT telemetry, financial transactions, clickstream events, or business metrics, efficient data ingestion is critical for maintaining a responsive analytics platform.

However, high insert performance doesn't happen automatically. Factors such as table design, insert batch size, partitioning strategy, compression, and background merge operations all influence how efficiently ClickHouse handles incoming data.

By following proven ingestion best practices, you can reduce write latency, minimize merge overhead, improve resource utilization, and ensure that fresh data becomes available for analysis as quickly as possible.

In this article, we'll explore the techniques that help maximize insert throughput in ClickHouse and build scalable data ingestion pipelines.


Why Insert Performance Matters

Analytical systems often receive continuous streams of data from multiple sources. As ingestion rates increase, inefficient insert patterns can quickly become a bottleneck.

Poor insert strategies may result in:

  • Increased ingestion latency
  • Higher CPU utilization
  • Excessive disk I/O
  • Large numbers of small data parts
  • Frequent background merges
  • Slower analytical queries

Optimizing insert performance helps maintain a healthy MergeTree storage engine while ensuring that queries remain fast even under heavy workloads.


How ClickHouse Processes Inserts

Unlike traditional row-oriented databases, ClickHouse is designed for batch-oriented ingestion.

When data is inserted into a MergeTree table, ClickHouse performs several operations:

  1. Creates a new immutable data part.
  2. Sorts the incoming rows using the table's ORDER BY key.
  3. Compresses column data.
  4. Writes the new data part to disk.
  5. Schedules background merges that combine smaller parts into larger ones over time.

This architecture allows ClickHouse to achieve extremely high write throughput while maintaining efficient storage and query performance.


1. Insert Data in Large Batches

Batch inserts are one of the most effective ways to improve ingestion performance.

Each INSERT creates a new data part. Performing thousands of tiny inserts generates unnecessary metadata and increases background merge activity.

Less Efficient

INSERT INTO events VALUES (...);
Enter fullscreen mode Exit fullscreen mode

Executing this statement repeatedly for individual rows creates significant overhead.

Recommended

INSERT INTO events FORMAT CSV
Enter fullscreen mode Exit fullscreen mode

Load thousands—or even millions—of rows in a single insert whenever possible.

Recommended Batch Sizes

Rows per Insert Performance
1–100 Poor
1,000–10,000 Good
10,000–100,000+ Excellent

Larger batches reduce the number of generated data parts and significantly improve overall throughput.


2. Choose an Efficient Primary Key

The ORDER BY clause determines how data is physically sorted inside MergeTree tables.

A good primary key should:

  • Match common query patterns
  • Support efficient filtering
  • Minimize sorting overhead
  • Avoid unnecessarily high-cardinality columns

Example:

ORDER BY (event_date, user_id)
Enter fullscreen mode Exit fullscreen mode

Selecting an appropriate sorting key improves both insert efficiency and query performance.


3. Partition Data Wisely

Partitioning groups related data together, making inserts, maintenance, and queries more efficient.

Example:

PARTITION BY toYYYYMM(event_date)
Enter fullscreen mode Exit fullscreen mode

Benefits include:

  • Faster inserts
  • Efficient partition pruning
  • Simplified retention policies
  • Easier maintenance operations

Avoid excessive partition counts, as too many partitions increase metadata and merge overhead.


4. Reduce Small Data Parts

Every insert creates a new data part.

Frequent small inserts eventually produce thousands of tiny parts, increasing system overhead.

Potential consequences include:

  • High CPU utilization
  • Increased disk activity
  • Longer merge times
  • Slower query execution

Monitor active parts using:

SELECT
    database,
    table,
    count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table;
Enter fullscreen mode Exit fullscreen mode

If active part counts continue to grow, consider increasing insert batch sizes.


5. Use High-Performance Input Formats

ClickHouse supports several ingestion formats, including:

  • Native
  • Parquet
  • CSV
  • JSONEachRow

Among these, the Native format typically delivers the highest performance because it minimizes parsing overhead and is optimized specifically for ClickHouse.

Whenever possible, prefer the Native format for large-scale ingestion pipelines.


6. Optimize Compression Codecs

Compression reduces storage requirements while improving disk efficiency.

Common codecs include:

  • LZ4 (default)
  • ZSTD

Example:

value String CODEC(ZSTD)
Enter fullscreen mode Exit fullscreen mode

LZ4 offers excellent compression speed with fast decompression.

ZSTD achieves higher compression ratios, reducing storage usage at the cost of additional CPU.

Choose the codec that best fits your workload.


7. Monitor Background Merges

Background merges consolidate smaller parts into larger ones.

Although merges occur automatically, excessive merge activity may indicate inefficient ingestion patterns.

Monitor running merges with:

SELECT *
FROM system.merges;
Enter fullscreen mode Exit fullscreen mode

Frequent merge operations often suggest:

  • Small insert batches
  • High concurrent insert rates
  • Excessive part creation

Increasing batch size usually reduces merge pressure.


8. Track Insert Performance

Monitoring insert performance helps identify bottlenecks before they impact production workloads.

The system.query_log table provides valuable information about completed insert operations.

SELECT
    query,
    query_duration_ms,
    written_rows,
    written_bytes
FROM system.query_log
WHERE type='QueryFinish'
ORDER BY event_time DESC
LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

Useful metrics include:

  • Query duration
  • Rows written
  • Bytes written

Regular monitoring allows continuous optimization of ingestion workloads.


9. Enable Asynchronous Inserts

Applications generating frequent small writes can benefit from asynchronous inserts.

Example:

SET async_insert = 1;

SET wait_for_async_insert = 1;
Enter fullscreen mode Exit fullscreen mode

Advantages include:

  • Automatic batching of small inserts
  • Fewer generated data parts
  • Lower merge overhead
  • Higher ingestion throughput

Asynchronous inserts are particularly useful for streaming applications, telemetry platforms, and event-driven architectures.


10. Tune Insert Settings

ClickHouse provides several settings that can improve insert performance depending on workload characteristics.

Some commonly used options include:

async_insert

Buffers small inserts before writing them to disk.

wait_for_async_insert

Waits until buffered inserts have been successfully written before returning.

max_insert_block_size

Controls the maximum number of rows processed within each insert block.

max_partitions_per_insert_block

Prevents inserts from generating excessive numbers of partitions.

Carefully tuning these settings can significantly improve throughput while reducing storage overhead.


Best Practices Checklist

Use the following checklist to optimize insert performance:

  • Batch inserts whenever possible.
  • Use an efficient ORDER BY key.
  • Partition data appropriately.
  • Minimize the creation of small data parts.
  • Prefer the Native input format.
  • Choose suitable compression codecs.
  • Monitor system.parts.
  • Monitor system.merges.
  • Review insert activity using system.query_log.
  • Enable asynchronous inserts for frequent small writes.
  • Balance concurrent ingestion workloads.

Common Mistakes to Avoid

Many ingestion issues stem from a few common mistakes:

  • Inserting one row at a time
  • Creating excessive numbers of partitions
  • Ignoring growing part counts
  • Using inefficient primary keys
  • Choosing inappropriate compression codecs
  • Overloading the server with excessive concurrent inserts

Avoiding these issues leads to healthier MergeTree tables and more consistent performance.


Summary of Optimization Techniques

Optimization Benefit
Batch inserts Reduces data parts and improves throughput
Efficient primary key Faster sorting and queries
Proper partitioning Simplifies maintenance and pruning
Native format Minimizes parsing overhead
Compression codecs Improves storage efficiency
Background merge monitoring Prevents merge bottlenecks
Async inserts Improves streaming ingestion
Insert setting tuning Maximizes throughput

Conclusion

ClickHouse® is capable of ingesting millions of rows per second, but achieving consistently high throughput depends on more than powerful hardware. Efficient table design, intelligent batching strategies, proper partitioning, optimized compression, and healthy background merge operations all contribute to outstanding ingestion performance.

By inserting data in large batches, selecting appropriate sorting keys, reducing the creation of small data parts, using efficient data formats, and monitoring system tables such as system.parts, system.merges, and system.query_log, you can build ingestion pipelines that scale reliably as data volumes grow.

Whether you're powering observability platforms, real-time dashboards, IoT systems, financial analytics, or customer behavior analysis, these best practices will help you maximize ClickHouse®'s ingestion capabilities while maintaining the fast analytical performance it's known for.

#100DaysOfClickHouse #Day58 #ClickHouse #DataEngineering #MergeTree #DatabasePerformance #BigData #Analytics #OLAP #DatabaseOptimization

Link -> https://www.quantrail-data.com/optimizing-clickhouse-for-high-throughput-inserts

Top comments (0)