Kanishga Subramani

Posted on Jul 4

Day 57: Internals of ClickHouse® Data Parts and Merges – A Complete Guide

#clickhouse #analytics #devops #dataengineering

Introduction

One of the reasons ClickHouse® can handle massive analytical workloads with exceptional speed is its storage architecture. Unlike traditional databases that frequently modify existing files, ClickHouse stores incoming data as immutable units called data parts. These parts are later combined through a background process known as merging.

This design allows ClickHouse to achieve both high ingestion throughput and fast query execution. Every insert operation creates a new part, while background merges continuously optimize storage and improve query efficiency without interrupting users.

Understanding how data parts and merges work is essential for anyone managing ClickHouse in production. It helps explain why batch inserts are recommended, why excessive small inserts can cause performance issues, and how ClickHouse maintains efficiency as datasets grow.

In this article, we'll explore how data parts are created, how background merges operate, how to monitor them, and the best practices for keeping MergeTree tables healthy.

What Are Data Parts?

A data part is the smallest physical storage unit within a MergeTree table.

Whenever data is inserted into a MergeTree-based table, ClickHouse creates a brand-new immutable data part. Instead of modifying existing files, new data is appended as a separate part.

Each data part contains everything required to serve queries efficiently, including:

Compressed column files
Primary indexes
Mark files
Checksums
Table metadata

Because data parts are immutable, ClickHouse can perform concurrent reads and writes efficiently without locking existing data.

Why ClickHouse Uses Data Parts

The immutable-part architecture provides several advantages over traditional update-in-place storage models.

Key benefits include:

Fast insert performance
Lock-free write operations
Efficient compression
Simplified storage management
Background optimization through merges
Better scalability for analytical workloads

Rather than spending resources constantly reorganizing data during inserts, ClickHouse defers optimization work to background merge processes.

Creating a MergeTree Table

Let's create a simple table for storing order information:

CREATE TABLE default.orders
(
    order_id UInt32,
    country LowCardinality(String),
    status LowCardinality(String),
    amount Float64,
    order_date Date
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(order_date)
ORDER BY (country, status, order_date);

This table is partitioned by month and sorted by country, status, and order date.

How Data Parts Are Created

Every INSERT operation generates a new data part.

First Insert

INSERT INTO default.orders VALUES
(1,'IN','completed',1200,'2024-01-05'),
(2,'US','pending',450,'2024-01-06');

This creates the first data part.

Second Insert

INSERT INTO default.orders VALUES
(3,'GB','completed',890,'2024-01-07'),
(4,'DE','cancelled',300,'2024-01-08');

This creates another independent data part.

Although users see a single logical table, ClickHouse internally stores these inserts as separate physical parts until they are merged later.

After multiple inserts, a table may contain dozens or even hundreds of active parts.

The Problem with Too Many Parts

While creating parts is inexpensive, having too many small parts can negatively impact performance.

When thousands of tiny parts accumulate, ClickHouse must manage additional files and metadata, leading to:

Slower query execution
Increased memory usage
Higher file descriptor consumption
Additional merge workload
Reduced overall efficiency

This is why ClickHouse strongly recommends batching inserts instead of inserting one row at a time.

For example, inserting 50,000 rows in a single batch is significantly more efficient than executing 50,000 individual insert statements.

What Happens During a Merge?

A merge is a background operation that combines multiple smaller parts into a larger consolidated part.

Consider the following example:

Before Merge

Part A → Rows 1–2
Part B → Rows 3–4
Part C → Rows 5–6

After Merge

Part D → Rows 1–6

Once the new merged part is successfully created:

The merged part becomes active.
Older parts are marked inactive.
Inactive parts are eventually removed from disk.

This process occurs automatically without requiring user intervention.

How Merges Work Internally

Background merges follow several stages.

1. Selecting Parts

ClickHouse continuously evaluates which parts should be merged.

Factors considered include:

Part size
Number of parts in a partition
Age of parts
Available disk resources
Merge scheduling priorities

2. Reading Existing Parts

The selected parts are read from disk.

Since parts are immutable, ongoing queries can continue accessing them while the merge is in progress.

3. Sorting Data

Rows from all participating parts are combined and sorted according to the table's ORDER BY key.

This ensures that the newly generated part preserves the storage order expected by MergeTree.

4. Writing a New Part

ClickHouse writes a completely new data part containing:

Compressed column files
Primary indexes
Mark files
Metadata
Checksums

5. Activating the New Part

After the new part has been fully written and validated, ClickHouse atomically activates it.

The old parts are immediately marked as inactive.

6. Cleaning Up Old Parts

Inactive parts remain on disk temporarily and are later removed automatically according to configured retention settings.

MergeTree Storage Architecture

A MergeTree table consists of multiple independent data parts.

Table
│
├── Part A
│   ├── Column Files (.bin)
│   ├── Mark Files (.mrk2)
│   ├── Primary Index
│   ├── Checksums
│   └── Metadata
│
├── Part B
│
└── Part C

Each part contains all information necessary for query execution.

This architecture enables ClickHouse to scale efficiently while maintaining high read and write performance.

Understanding Part Names

ClickHouse assigns structured names to data parts using the format:

{partition}_{min_block}_{max_block}_{level}

Example:

202401_1_3_1

Meaning:

Component	Description
202401	January 2024 partition
1	Minimum block number
3	Maximum block number
1	Merge level

The merge level indicates how many merge operations contributed to the creation of the part.

Higher levels generally indicate larger and more consolidated parts.

Why Background Merges Matter

Background merges provide several important benefits:

Reduce the total number of parts
Improve query performance
Increase compression efficiency
Lower disk seek overhead
Simplify storage organization
Reduce metadata management costs

Without merges, tables would accumulate thousands of tiny parts, eventually affecting performance.

Monitoring Data Parts

ClickHouse provides the system.parts table for inspecting active and inactive parts.

View Active Parts

SELECT
    partition,
    name,
    rows,
    bytes_on_disk,
    active
FROM system.parts
WHERE database = 'default'
  AND table = 'orders'
  AND active = 1;

Example output:

partition	name	rows	bytes_on_disk	active
202401	202401_1_1_0	2	1024	1
202401	202401_2_2_0	2	1024	1

This query helps monitor how many active parts exist in a table.

Monitoring Running Merges

You can inspect ongoing merge operations using system.merges.

SELECT
    database,
    table,
    elapsed,
    progress,
    num_parts,
    result_part_name
FROM system.merges
WHERE database = 'default'
  AND table = 'orders';

Typical output:

database	table	elapsed	progress	num_parts	result_part_name
default	orders	2.3	0.65	4	202401_1_4_1

Important fields include:

elapsed – duration of the merge
progress – completion percentage
num_parts – number of parts involved
result_part_name – name of the resulting merged part

Monitoring this table can help identify merge bottlenecks.

Triggering Manual Merges

In most situations, ClickHouse handles merges automatically.

However, manual merges can be triggered when necessary.

Merge Table Parts

OPTIMIZE TABLE default.orders;

Merge a Specific Partition

OPTIMIZE TABLE default.orders
PARTITION '202401';

Force a Complete Merge

OPTIMIZE TABLE default.orders FINAL;

Important Warning

OPTIMIZE TABLE ... FINAL forces all parts within a partition to be merged into a single part.

On large tables, this operation can consume significant CPU, memory, and disk resources. It should only be used when absolutely necessary and preferably during low-traffic periods.

Best Practices

To maintain healthy MergeTree tables:

Use Large Batch Inserts

Aim for:

Minimum: 1,000 rows per insert
Recommended: 10,000–100,000 rows per insert

This minimizes the number of generated parts.

Monitor Part Counts

Regularly check system.parts.

If a partition contains hundreds of active parts, investigate ingestion patterns.

Avoid Excessive FINAL Operations

Repeatedly running OPTIMIZE FINAL can consume significant resources and reduce overall system efficiency.

Choose Partitioning Carefully

For most analytical workloads, monthly partitions using toYYYYMM() are preferable to daily partitions.

Allow Background Merges to Work

ClickHouse includes sophisticated merge scheduling mechanisms.

Avoid interrupting or disabling background merge processes unless absolutely necessary.

Monitor Merge Activity

Use system.merges to ensure merges are keeping up with ingestion workloads.

Common Mistakes

Many ClickHouse performance problems originate from a few recurring mistakes:

Inserting one row at a time
Creating excessive numbers of partitions
Ignoring growing part counts
Running OPTIMIZE FINAL repeatedly
Disabling background merges

Avoiding these issues can significantly improve cluster performance and stability.

Data Parts vs Background Merges

Data Parts	Background Merges
Created during inserts	Run automatically
Immutable storage units	Combine multiple parts
Store table data	Optimize storage layout
Queried directly	Execute asynchronously
Increase with ingestion	Reduce overall part count

Conclusion

Data parts and background merges are fundamental to how ClickHouse achieves high-performance analytics at scale. Every insert creates an immutable data part, allowing ingestion to remain fast and efficient, while background merges continuously optimize storage by combining smaller parts into larger ones.

This architecture enables ClickHouse to deliver excellent write throughput, strong compression, and fast query execution without requiring constant manual maintenance.

By understanding how parts are created, how merges operate, and how to monitor storage behavior through system tables such as system.parts and system.merges, you can build more efficient ingestion pipelines and maintain healthy MergeTree deployments.

The most important rule remains simple: batch your inserts, monitor part counts, and let ClickHouse handle storage optimization through its merge engine.

#100DaysOfClickHouse #ClickHouse #DatabaseInternals #MergeTree #DataEngineering #BigData #Analytics #OLAP #DatabasePerformance #Day57