Kanishga Subramani

Posted on Jul 4

Day 58 - Optimizing ClickHouse® for High-Throughput Inserts

#clickhouse #analytics #database #devops

Introduction

One of the defining strengths of ClickHouse® is its ability to ingest massive volumes of data while continuing to deliver exceptionally fast analytical queries. Whether you're processing application logs, IoT telemetry, financial transactions, clickstream events, or business metrics, efficient data ingestion is critical for maintaining a responsive analytics platform.

However, high insert performance doesn't happen automatically. Factors such as table design, insert batch size, partitioning strategy, compression, and background merge operations all influence how efficiently ClickHouse handles incoming data.

By following proven ingestion best practices, you can reduce write latency, minimize merge overhead, improve resource utilization, and ensure that fresh data becomes available for analysis as quickly as possible.

In this article, we'll explore the techniques that help maximize insert throughput in ClickHouse and build scalable data ingestion pipelines.

Why Insert Performance Matters

Analytical systems often receive continuous streams of data from multiple sources. As ingestion rates increase, inefficient insert patterns can quickly become a bottleneck.

Poor insert strategies may result in:

Increased ingestion latency
Higher CPU utilization
Excessive disk I/O
Large numbers of small data parts
Frequent background merges
Slower analytical queries

Optimizing insert performance helps maintain a healthy MergeTree storage engine while ensuring that queries remain fast even under heavy workloads.

How ClickHouse Processes Inserts

Unlike traditional row-oriented databases, ClickHouse is designed for batch-oriented ingestion.

When data is inserted into a MergeTree table, ClickHouse performs several operations:

Creates a new immutable data part.
Sorts the incoming rows using the table's ORDER BY key.
Compresses column data.
Writes the new data part to disk.
Schedules background merges that combine smaller parts into larger ones over time.

This architecture allows ClickHouse to achieve extremely high write throughput while maintaining efficient storage and query performance.

1. Insert Data in Large Batches

Batch inserts are one of the most effective ways to improve ingestion performance.

Each INSERT creates a new data part. Performing thousands of tiny inserts generates unnecessary metadata and increases background merge activity.

Less Efficient

INSERT INTO events VALUES (...);

Executing this statement repeatedly for individual rows creates significant overhead.

Recommended Batch Sizes

Rows per Insert	Performance
1–100	Poor
1,000–10,000	Good
10,000–100,000+	Excellent

Larger batches reduce the number of generated data parts and significantly improve overall throughput.

2. Choose an Efficient Primary Key

The ORDER BY clause determines how data is physically sorted inside MergeTree tables.

A good primary key should:

Match common query patterns
Support efficient filtering
Minimize sorting overhead
Avoid unnecessarily high-cardinality columns

Example:

ORDER BY (event_date, user_id)

Selecting an appropriate sorting key improves both insert efficiency and query performance.

3. Partition Data Wisely

Partitioning groups related data together, making inserts, maintenance, and queries more efficient.

Example:

PARTITION BY toYYYYMM(event_date)

Benefits include:

Faster inserts
Efficient partition pruning
Simplified retention policies
Easier maintenance operations

Avoid excessive partition counts, as too many partitions increase metadata and merge overhead.

4. Reduce Small Data Parts

Every insert creates a new data part.

Frequent small inserts eventually produce thousands of tiny parts, increasing system overhead.

Potential consequences include:

High CPU utilization
Increased disk activity
Longer merge times
Slower query execution

Monitor active parts using:

SELECT
    database,
    table,
    count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table;

If active part counts continue to grow, consider increasing insert batch sizes.

5. Use High-Performance Input Formats

ClickHouse supports several ingestion formats, including:

Native
Parquet
CSV
JSONEachRow

Among these, the Native format typically delivers the highest performance because it minimizes parsing overhead and is optimized specifically for ClickHouse.

Whenever possible, prefer the Native format for large-scale ingestion pipelines.

6. Optimize Compression Codecs

Compression reduces storage requirements while improving disk efficiency.

Common codecs include:

LZ4 (default)
ZSTD

Example:

value String CODEC(ZSTD)

LZ4 offers excellent compression speed with fast decompression.

ZSTD achieves higher compression ratios, reducing storage usage at the cost of additional CPU.

Choose the codec that best fits your workload.

7. Monitor Background Merges

Background merges consolidate smaller parts into larger ones.

Although merges occur automatically, excessive merge activity may indicate inefficient ingestion patterns.

Monitor running merges with:

SELECT *
FROM system.merges;

Frequent merge operations often suggest:

Small insert batches
High concurrent insert rates
Excessive part creation

Increasing batch size usually reduces merge pressure.

8. Track Insert Performance

Monitoring insert performance helps identify bottlenecks before they impact production workloads.

The system.query_log table provides valuable information about completed insert operations.

SELECT
    query,
    query_duration_ms,
    written_rows,
    written_bytes
FROM system.query_log
WHERE type='QueryFinish'
ORDER BY event_time DESC
LIMIT 10;

Useful metrics include:

Query duration
Rows written
Bytes written

Regular monitoring allows continuous optimization of ingestion workloads.

9. Enable Asynchronous Inserts

Applications generating frequent small writes can benefit from asynchronous inserts.

Example:

SET async_insert = 1;

SET wait_for_async_insert = 1;

Advantages include:

Automatic batching of small inserts
Fewer generated data parts
Lower merge overhead
Higher ingestion throughput

Asynchronous inserts are particularly useful for streaming applications, telemetry platforms, and event-driven architectures.

10. Tune Insert Settings

ClickHouse provides several settings that can improve insert performance depending on workload characteristics.

Some commonly used options include:

async_insert

Buffers small inserts before writing them to disk.

wait_for_async_insert

Waits until buffered inserts have been successfully written before returning.

max_insert_block_size

Controls the maximum number of rows processed within each insert block.

max_partitions_per_insert_block

Prevents inserts from generating excessive numbers of partitions.

Carefully tuning these settings can significantly improve throughput while reducing storage overhead.

Best Practices Checklist

Use the following checklist to optimize insert performance:

Batch inserts whenever possible.
Use an efficient ORDER BY key.
Partition data appropriately.
Minimize the creation of small data parts.
Prefer the Native input format.
Choose suitable compression codecs.
Monitor system.parts.
Monitor system.merges.
Review insert activity using system.query_log.
Enable asynchronous inserts for frequent small writes.
Balance concurrent ingestion workloads.

Common Mistakes to Avoid

Many ingestion issues stem from a few common mistakes:

Inserting one row at a time
Creating excessive numbers of partitions
Ignoring growing part counts
Using inefficient primary keys
Choosing inappropriate compression codecs
Overloading the server with excessive concurrent inserts

Avoiding these issues leads to healthier MergeTree tables and more consistent performance.

Summary of Optimization Techniques

Optimization	Benefit
Batch inserts	Reduces data parts and improves throughput
Efficient primary key	Faster sorting and queries
Proper partitioning	Simplifies maintenance and pruning
Native format	Minimizes parsing overhead
Compression codecs	Improves storage efficiency
Background merge monitoring	Prevents merge bottlenecks
Async inserts	Improves streaming ingestion
Insert setting tuning	Maximizes throughput

Conclusion

ClickHouse® is capable of ingesting millions of rows per second, but achieving consistently high throughput depends on more than powerful hardware. Efficient table design, intelligent batching strategies, proper partitioning, optimized compression, and healthy background merge operations all contribute to outstanding ingestion performance.

By inserting data in large batches, selecting appropriate sorting keys, reducing the creation of small data parts, using efficient data formats, and monitoring system tables such as system.parts, system.merges, and system.query_log, you can build ingestion pipelines that scale reliably as data volumes grow.

Whether you're powering observability platforms, real-time dashboards, IoT systems, financial analytics, or customer behavior analysis, these best practices will help you maximize ClickHouse®'s ingestion capabilities while maintaining the fast analytical performance it's known for.

#100DaysOfClickHouse #Day58 #ClickHouse #DataEngineering #MergeTree #DatabasePerformance #BigData #Analytics #OLAP #DatabaseOptimization

Link -> https://www.quantrail-data.com/optimizing-clickhouse-for-high-throughput-inserts

DEV Community

Day 58 - Optimizing ClickHouse® for High-Throughput Inserts

Introduction

Why Insert Performance Matters

How ClickHouse Processes Inserts

1. Insert Data in Large Batches

Less Efficient

Recommended

Recommended Batch Sizes

2. Choose an Efficient Primary Key

3. Partition Data Wisely

4. Reduce Small Data Parts

5. Use High-Performance Input Formats

6. Optimize Compression Codecs

7. Monitor Background Merges

8. Track Insert Performance

9. Enable Asynchronous Inserts

10. Tune Insert Settings

async_insert

wait_for_async_insert

max_insert_block_size

max_partitions_per_insert_block

Best Practices Checklist

Common Mistakes to Avoid

Summary of Optimization Techniques

Conclusion

Top comments (0)