DEV Community

Cover image for Day 57: Internals of ClickHouse® Data Parts and Merges – A Complete Guide
Kanishga Subramani
Kanishga Subramani

Posted on

Day 57: Internals of ClickHouse® Data Parts and Merges – A Complete Guide

Introduction

One of the reasons ClickHouse® can handle massive analytical workloads with exceptional speed is its storage architecture. Unlike traditional databases that frequently modify existing files, ClickHouse stores incoming data as immutable units called data parts. These parts are later combined through a background process known as merging.

This design allows ClickHouse to achieve both high ingestion throughput and fast query execution. Every insert operation creates a new part, while background merges continuously optimize storage and improve query efficiency without interrupting users.

Understanding how data parts and merges work is essential for anyone managing ClickHouse in production. It helps explain why batch inserts are recommended, why excessive small inserts can cause performance issues, and how ClickHouse maintains efficiency as datasets grow.

In this article, we'll explore how data parts are created, how background merges operate, how to monitor them, and the best practices for keeping MergeTree tables healthy.


What Are Data Parts?

A data part is the smallest physical storage unit within a MergeTree table.

Whenever data is inserted into a MergeTree-based table, ClickHouse creates a brand-new immutable data part. Instead of modifying existing files, new data is appended as a separate part.

Each data part contains everything required to serve queries efficiently, including:

  • Compressed column files
  • Primary indexes
  • Mark files
  • Checksums
  • Table metadata

Because data parts are immutable, ClickHouse can perform concurrent reads and writes efficiently without locking existing data.


Why ClickHouse Uses Data Parts

The immutable-part architecture provides several advantages over traditional update-in-place storage models.

Key benefits include:

  • Fast insert performance
  • Lock-free write operations
  • Efficient compression
  • Simplified storage management
  • Background optimization through merges
  • Better scalability for analytical workloads

Rather than spending resources constantly reorganizing data during inserts, ClickHouse defers optimization work to background merge processes.


Creating a MergeTree Table

Let's create a simple table for storing order information:

CREATE TABLE default.orders
(
    order_id UInt32,
    country LowCardinality(String),
    status LowCardinality(String),
    amount Float64,
    order_date Date
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(order_date)
ORDER BY (country, status, order_date);
Enter fullscreen mode Exit fullscreen mode

This table is partitioned by month and sorted by country, status, and order date.


How Data Parts Are Created

Every INSERT operation generates a new data part.

First Insert

INSERT INTO default.orders VALUES
(1,'IN','completed',1200,'2024-01-05'),
(2,'US','pending',450,'2024-01-06');
Enter fullscreen mode Exit fullscreen mode

This creates the first data part.

Second Insert

INSERT INTO default.orders VALUES
(3,'GB','completed',890,'2024-01-07'),
(4,'DE','cancelled',300,'2024-01-08');
Enter fullscreen mode Exit fullscreen mode

This creates another independent data part.

Although users see a single logical table, ClickHouse internally stores these inserts as separate physical parts until they are merged later.

After multiple inserts, a table may contain dozens or even hundreds of active parts.


The Problem with Too Many Parts

While creating parts is inexpensive, having too many small parts can negatively impact performance.

When thousands of tiny parts accumulate, ClickHouse must manage additional files and metadata, leading to:

  • Slower query execution
  • Increased memory usage
  • Higher file descriptor consumption
  • Additional merge workload
  • Reduced overall efficiency

This is why ClickHouse strongly recommends batching inserts instead of inserting one row at a time.

For example, inserting 50,000 rows in a single batch is significantly more efficient than executing 50,000 individual insert statements.


What Happens During a Merge?

A merge is a background operation that combines multiple smaller parts into a larger consolidated part.

Consider the following example:

Before Merge

Part A → Rows 1–2
Part B → Rows 3–4
Part C → Rows 5–6
Enter fullscreen mode Exit fullscreen mode

After Merge

Part D → Rows 1–6
Enter fullscreen mode Exit fullscreen mode

Once the new merged part is successfully created:

  • The merged part becomes active.
  • Older parts are marked inactive.
  • Inactive parts are eventually removed from disk.

This process occurs automatically without requiring user intervention.


How Merges Work Internally

Background merges follow several stages.

1. Selecting Parts

ClickHouse continuously evaluates which parts should be merged.

Factors considered include:

  • Part size
  • Number of parts in a partition
  • Age of parts
  • Available disk resources
  • Merge scheduling priorities

2. Reading Existing Parts

The selected parts are read from disk.

Since parts are immutable, ongoing queries can continue accessing them while the merge is in progress.


3. Sorting Data

Rows from all participating parts are combined and sorted according to the table's ORDER BY key.

This ensures that the newly generated part preserves the storage order expected by MergeTree.


4. Writing a New Part

ClickHouse writes a completely new data part containing:

  • Compressed column files
  • Primary indexes
  • Mark files
  • Metadata
  • Checksums

5. Activating the New Part

After the new part has been fully written and validated, ClickHouse atomically activates it.

The old parts are immediately marked as inactive.


6. Cleaning Up Old Parts

Inactive parts remain on disk temporarily and are later removed automatically according to configured retention settings.


MergeTree Storage Architecture

A MergeTree table consists of multiple independent data parts.

Table
│
├── Part A
│   ├── Column Files (.bin)
│   ├── Mark Files (.mrk2)
│   ├── Primary Index
│   ├── Checksums
│   └── Metadata
│
├── Part B
│
└── Part C
Enter fullscreen mode Exit fullscreen mode

Each part contains all information necessary for query execution.

This architecture enables ClickHouse to scale efficiently while maintaining high read and write performance.


Understanding Part Names

ClickHouse assigns structured names to data parts using the format:

{partition}_{min_block}_{max_block}_{level}
Enter fullscreen mode Exit fullscreen mode

Example:

202401_1_3_1
Enter fullscreen mode Exit fullscreen mode

Meaning:

Component Description
202401 January 2024 partition
1 Minimum block number
3 Maximum block number
1 Merge level

The merge level indicates how many merge operations contributed to the creation of the part.

Higher levels generally indicate larger and more consolidated parts.


Why Background Merges Matter

Background merges provide several important benefits:

  • Reduce the total number of parts
  • Improve query performance
  • Increase compression efficiency
  • Lower disk seek overhead
  • Simplify storage organization
  • Reduce metadata management costs

Without merges, tables would accumulate thousands of tiny parts, eventually affecting performance.


Monitoring Data Parts

ClickHouse provides the system.parts table for inspecting active and inactive parts.

View Active Parts

SELECT
    partition,
    name,
    rows,
    bytes_on_disk,
    active
FROM system.parts
WHERE database = 'default'
  AND table = 'orders'
  AND active = 1;
Enter fullscreen mode Exit fullscreen mode

Example output:

partition name rows bytes_on_disk active
202401 202401_1_1_0 2 1024 1
202401 202401_2_2_0 2 1024 1

This query helps monitor how many active parts exist in a table.


Monitoring Running Merges

You can inspect ongoing merge operations using system.merges.

SELECT
    database,
    table,
    elapsed,
    progress,
    num_parts,
    result_part_name
FROM system.merges
WHERE database = 'default'
  AND table = 'orders';
Enter fullscreen mode Exit fullscreen mode

Typical output:

database table elapsed progress num_parts result_part_name
default orders 2.3 0.65 4 202401_1_4_1

Important fields include:

  • elapsed – duration of the merge
  • progress – completion percentage
  • num_parts – number of parts involved
  • result_part_name – name of the resulting merged part

Monitoring this table can help identify merge bottlenecks.


Triggering Manual Merges

In most situations, ClickHouse handles merges automatically.

However, manual merges can be triggered when necessary.

Merge Table Parts

OPTIMIZE TABLE default.orders;
Enter fullscreen mode Exit fullscreen mode

Merge a Specific Partition

OPTIMIZE TABLE default.orders
PARTITION '202401';
Enter fullscreen mode Exit fullscreen mode

Force a Complete Merge

OPTIMIZE TABLE default.orders FINAL;
Enter fullscreen mode Exit fullscreen mode

Important Warning

OPTIMIZE TABLE ... FINAL forces all parts within a partition to be merged into a single part.

On large tables, this operation can consume significant CPU, memory, and disk resources. It should only be used when absolutely necessary and preferably during low-traffic periods.


Best Practices

To maintain healthy MergeTree tables:

Use Large Batch Inserts

Aim for:

  • Minimum: 1,000 rows per insert
  • Recommended: 10,000–100,000 rows per insert

This minimizes the number of generated parts.

Monitor Part Counts

Regularly check system.parts.

If a partition contains hundreds of active parts, investigate ingestion patterns.

Avoid Excessive FINAL Operations

Repeatedly running OPTIMIZE FINAL can consume significant resources and reduce overall system efficiency.

Choose Partitioning Carefully

For most analytical workloads, monthly partitions using toYYYYMM() are preferable to daily partitions.

Allow Background Merges to Work

ClickHouse includes sophisticated merge scheduling mechanisms.

Avoid interrupting or disabling background merge processes unless absolutely necessary.

Monitor Merge Activity

Use system.merges to ensure merges are keeping up with ingestion workloads.


Common Mistakes

Many ClickHouse performance problems originate from a few recurring mistakes:

  • Inserting one row at a time
  • Creating excessive numbers of partitions
  • Ignoring growing part counts
  • Running OPTIMIZE FINAL repeatedly
  • Disabling background merges

Avoiding these issues can significantly improve cluster performance and stability.


Data Parts vs Background Merges

Data Parts Background Merges
Created during inserts Run automatically
Immutable storage units Combine multiple parts
Store table data Optimize storage layout
Queried directly Execute asynchronously
Increase with ingestion Reduce overall part count

Conclusion

Data parts and background merges are fundamental to how ClickHouse achieves high-performance analytics at scale. Every insert creates an immutable data part, allowing ingestion to remain fast and efficient, while background merges continuously optimize storage by combining smaller parts into larger ones.

This architecture enables ClickHouse to deliver excellent write throughput, strong compression, and fast query execution without requiring constant manual maintenance.

By understanding how parts are created, how merges operate, and how to monitor storage behavior through system tables such as system.parts and system.merges, you can build more efficient ingestion pipelines and maintain healthy MergeTree deployments.

The most important rule remains simple: batch your inserts, monitor part counts, and let ClickHouse handle storage optimization through its merge engine.

#100DaysOfClickHouse #ClickHouse #DatabaseInternals #MergeTree #DataEngineering #BigData #Analytics #OLAP #DatabasePerformance #Day57

Top comments (0)