DEV Community: Kanishga Subramani

Day 100 of #100DaysOfClickHouse: Optimizing Data Lake Queries with ClickHouse® 26.3 LTS

Kanishga Subramani — Sat, 25 Jul 2026 10:14:07 +0000

Introduction

Data lakes have become the backbone of modern analytics, allowing organizations to store petabytes of structured and unstructured data in cloud object storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage. Open table formats like Apache Iceberg, Delta Lake, and Apache Hudi, together with the Parquet file format, have further accelerated this trend by enabling multiple analytics engines to access the same datasets without vendor lock-in or unnecessary data duplication.

While querying data directly from cloud object storage offers tremendous flexibility, it also introduces performance challenges. Network latency, metadata lookups, and large Parquet file scans can significantly slow query execution compared to locally stored data.

ClickHouse® 26.3 LTS addresses these challenges with several enhancements designed specifically for data lake workloads. These include improved parallel object storage reads, a built-in Parquet metadata cache, and asynchronous Iceberg metadata prefetching. Together, these optimizations reduce query latency and make interactive analytics over cloud-hosted datasets significantly faster.

In this article, we'll explore how ClickHouse® queries data lakes, the challenges involved, the new optimizations introduced in version 26.3 LTS, and practical examples of how these improvements benefit real-world analytical workloads.

Why Data Lake Query Optimization Matters

Modern analytics architectures increasingly separate storage from compute.

Instead of copying data into multiple databases, organizations store analytical datasets in cloud object storage and allow various query engines to access the same files directly.

This architecture provides several benefits:

Lower storage costs
Vendor-neutral open formats
Simplified data management
Multiple analytics engines sharing the same datasets

However, it also introduces several performance challenges:

Remote storage latency
Metadata lookups
Large file scans
Network overhead

ClickHouse® 26.3 introduces several optimizations that specifically target these bottlenecks.

Understanding Data Lake Queries

A data lake is a centralized repository that stores both raw and processed data using open formats, allowing multiple analytics engines to query the same datasets without duplication.

Unlike traditional data warehouses, where data is imported into proprietary storage, data lakes keep information inside cloud object storage.

Popular storage platforms include:

Amazon S3
Azure Blob Storage
Google Cloud Storage

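Common open table and file formats include:

Apache Iceberg
Delta Lake
Apache Hudi
Parquet (a columnar file format, commonly used as the storage layer beneath the table formats above)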

Instead of importing data into MergeTree tables, ClickHouse® can query these datasets directly.

During query execution, ClickHouse:

Reads metadata
Identifies the required files
Reads only the required columns
Applies predicate pushdown where possible
Returns results without copying data into native storage
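
The steps above can be pictured with a small Python toy model. It is only a sketch of the general idea (column projection plus pruning by per-row-group min/max statistics), not ClickHouse® internals; the `RowGroup` class, the `scan` function, and the sample data are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass

@dataclass
class RowGroup:
    """One Parquet-like row group: per-column values plus min/max stats."""
    columns: dict   # column name -> list of values
    stats: dict     # column name -> (min, max)

def make_row_group(columns):
    stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
    return RowGroup(columns, stats)

def scan(row_groups, select, filter_col, min_value):
    """Return `select` values for rows where filter_col >= min_value.

    Row groups whose max(filter_col) < min_value are skipped using only
    their statistics; no column data is touched for them.
    """
    out, groups_read = [], 0
    for rg in row_groups:
        if rg.stats[filter_col][1] < min_value:
            continue  # pruned by metadata alone
        groups_read += 1
        # Only the selected column and the filter column are read.
        for value, key in zip(rg.columns[select], rg.columns[filter_col]):
            if key >= min_value:
                out.append(value)
    return out, groups_read

groups = [
    make_row_group({"day": [1, 2, 3], "country": ["IN", "US", "DE"], "payload": ["a", "b", "c"]}),
    make_row_group({"day": [8, 9, 10], "country": ["FR", "IN", "US"], "payload": ["d", "e", "f"]}),
]
result, groups_read = scan(groups, select="country", filter_col="day", min_value=9)
print(result, groups_read)  # ['IN', 'US'] 1
```

The first row group is rejected from its statistics alone, and the `payload` column is never touched, which is the combined effect of file pruning and column projection described in the list.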

Cloud Object Storage
        │
        ▼
Parquet / Iceberg / Delta / Hudi
        │
        ▼
     ClickHouse
        │
        ▼
   SQL Analytics

This architecture eliminates duplicate storage while allowing ClickHouse® to serve as a high-performance analytical engine over existing data lakes.

Challenges of Querying Data Lakes

Although cloud object storage is highly scalable and cost-effective, querying remote datasets introduces several challenges.

Remote Storage Latency

Unlike reads from a local disk, every read from object storage must cross the network, which increases response times.

Metadata Overhead

Open table formats maintain metadata describing snapshots, manifests, partitions, and data files.

Before reading any actual data, ClickHouse® must first retrieve and process this metadata.

Large File Scans

Poor partitioning or inefficient pruning may require scanning significantly more data than necessary.

Repeated Metadata Reads

Interactive dashboards often execute the same queries repeatedly.

Without caching, ClickHouse® must repeatedly download identical metadata from remote storage.

How ClickHouse® Queries Data Lakes

ClickHouse® supports querying data directly from cloud object storage without requiring ingestion into MergeTree tables.

Supported technologies include:

Apache Iceberg
Delta Lake
Apache Hudi
Parquet files
Amazon S3 and compatible object storage

During query execution, ClickHouse® reads only the required files and columns, minimizing unnecessary I/O and enabling efficient analytics over remote datasets.

Data Lake Enhancements in ClickHouse® 26.3 LTS

Version 26.3 introduces several improvements that significantly reduce latency when querying remote object storage.

1. Faster Parallel Reading

One of the biggest improvements is enhanced parallel reading of remote data.

When a query accesses a relatively small number of large files, ClickHouse® now distributes work more efficiently across available CPU cores.

This optimization applies to:

Apache Iceberg
Delta Lake
Apache Hudi
Object storage reads

Benefits include:

Better CPU utilization
Faster remote file processing
Lower overall query execution time

The size of the gain depends on the workload; queries that read a small number of large files on multi-core machines have the most to gain.
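
As a rough illustration of why a few large files can still keep many cores busy, the sketch below splits one "file" into byte ranges and reads them concurrently. It is a toy under stated assumptions: `read_range` stands in for a ranged GET request, the chunk size is arbitrary, and none of this reflects how ClickHouse® actually schedules its reads.

```python
from concurrent.futures import ThreadPoolExecutor

def split_ranges(size, chunk):
    """Split [0, size) into half-open byte ranges of at most `chunk` bytes."""
    return [(start, min(start + chunk, size)) for start in range(0, size, chunk)]

def read_range(blob, byte_range):
    """Stand-in for a ranged GET against object storage."""
    start, end = byte_range
    return blob[start:end]

def parallel_read(blob, chunk, workers=4):
    ranges = split_ranges(len(blob), chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so parts can be concatenated directly
        parts = list(pool.map(lambda r: read_range(blob, r), ranges))
    return b"".join(parts), len(ranges)

blob = bytes(range(256)) * 4          # one 1024-byte "large file"
data, tasks = parallel_read(blob, chunk=300)
print(tasks, data == blob)            # 4 True
```

One file becomes four independent tasks, so the work is no longer limited to one reader per file.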

2. Built-in Parquet Metadata Cache

Reading Parquet files requires accessing the file footer to obtain schema and metadata information.

Before ClickHouse® 26.3, repeated queries frequently reread this metadata from remote storage.

The new Parquet Metadata Cache stores footer information in memory.

Benefits include:

Reduced metadata reads
Lower remote I/O
Faster repeated queries
Improved dashboard responsiveness

The cache is enabled by default and automatically tracks object changes using file ETags to maintain consistency.
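
The ETag idea can be sketched in a few lines of Python. This is a minimal model of a cache keyed by object path and ETag, not the ClickHouse® implementation; `FooterCache`, `footer_for`, and the in-memory `store` are hypothetical names used only for this example.

```python
class FooterCache:
    """In-memory cache of parsed footers, keyed by (path, etag)."""

    def __init__(self, fetch_footer):
        self._fetch_footer = fetch_footer   # callable: path -> footer
        self._entries = {}
        self.remote_reads = 0

    def get(self, path, etag):
        key = (path, etag)
        if key not in self._entries:
            self.remote_reads += 1
            self._entries[key] = self._fetch_footer(path)
        return self._entries[key]

# Fake object store: each object has an ETag and a footer.
store = {"events/a.parquet": {"etag": "v1", "footer": {"rows": 100}}}
cache = FooterCache(lambda path: dict(store[path]["footer"]))

def footer_for(path):
    # In this sketch the ETag comes from a cheap listing or HEAD call.
    return cache.get(path, store[path]["etag"])

footer_for("events/a.parquet")
footer_for("events/a.parquet")            # served from memory
store["events/a.parquet"] = {"etag": "v2", "footer": {"rows": 250}}
latest = footer_for("events/a.parquet")   # ETag changed, so the footer is re-read
print(cache.remote_reads, latest)         # 2 {'rows': 250}
```

Because the ETag is part of the key, an overwritten object can never be served from a stale entry; the old entry simply stops being looked up.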

3. Asynchronous Iceberg Metadata Prefetching

Apache Iceberg maintains metadata describing snapshots, manifests, partitions, and data files.

Earlier versions often fetched this metadata during query execution, increasing planning time.

ClickHouse® 26.3 introduces asynchronous metadata prefetching.

Instead of waiting during query execution, ClickHouse® refreshes Iceberg metadata in the background and serves queries from the cache whenever possible.
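
A minimal Python sketch of the pattern, assuming a hypothetical `MetadataCache` class and a dictionary standing in for the Iceberg catalog: queries read whatever snapshot is cached, while a separate thread pays for the slow reload. It illustrates the general technique only and says nothing about how ClickHouse® schedules its refreshes.

```python
import threading

class MetadataCache:
    """Holds the latest table metadata; refreshed off the query path."""

    def __init__(self, load_metadata):
        self._load = load_metadata
        self._lock = threading.Lock()
        self._snapshot = load_metadata()     # initial synchronous load

    def refresh(self):
        fresh = self._load()                 # slow remote call, outside the lock
        with self._lock:
            self._snapshot = fresh

    def current(self):
        with self._lock:                     # queries only read the cached copy
            return self._snapshot

catalog = {"snapshot_id": 1, "files": ["f1.parquet"]}
cache = MetadataCache(lambda: dict(catalog))

before = cache.current()["snapshot_id"]
catalog.update(snapshot_id=2, files=["f1.parquet", "f2.parquet"])

worker = threading.Thread(target=cache.refresh)  # background prefetch
worker.start()
worker.join()          # joined here only to make the demo deterministic
after = cache.current()["snapshot_id"]
print(before, after)   # 1 2
```

The trade-off of this design is freshness: a query may briefly see the previous snapshot until the next background refresh completes.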

Benefits include:

Reduced planning latency
Faster repeated queries
Improved dashboard responsiveness

Architecture Overview

              Data Lake
                  │
                  ▼
        Cloud Object Storage
 (Amazon S3 • Azure Blob • GCS)
                  │
                  ▼
 Apache Iceberg • Delta Lake
 Apache Hudi • Parquet
                  │
                  ▼
      ClickHouse® 26.3 LTS

   • Parallel File Reads
   • Parquet Metadata Cache
   • Iceberg Metadata Prefetch

                  │
                  ▼
 Fast SQL Analytics & Dashboards

Example 1: Querying a Parquet Dataset

SELECT
    country,
    count() AS total_events
FROM s3(
    'https://my-bucket.s3.amazonaws.com/events/*.parquet'
)
WHERE event_date >= today() - 7
GROUP BY country;

In this query, ClickHouse® reads Parquet files directly from cloud storage.

Because only the required columns are read and filtering is applied early, significantly less data must be processed.

Example 2: Querying an Iceberg Table

SELECT
    product_category,
    SUM(revenue) AS total_sales
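FROM iceberg('https://my-bucket.s3.amazonaws.com/warehouse/analytics/sales/')  -- placeholder table location; iceberg() takes a storage path, not a catalog name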
WHERE order_date >= today() - 30
GROUP BY product_category
ORDER BY total_sales DESC;

Here, ClickHouse® uses Iceberg metadata to identify the necessary files.

The metadata cache and asynchronous prefetching reduce planning overhead while enabling efficient execution.

Real-World Example

Imagine an e-commerce company storing several years of clickstream data in Amazon S3 using Apache Iceberg.

Instead of copying terabytes of historical data into ClickHouse®, analysts query the Iceberg tables directly.

With ClickHouse® 26.3:

Enhanced parallel reading speeds up remote file processing.
The Parquet metadata cache avoids repeatedly reading footer metadata.
Iceberg metadata prefetching reduces planning latency.

As a result, dashboards become significantly faster without requiring data duplication or complex ETL pipelines.

Performance Benefits

Feature	Benefit	Why It Matters
Enhanced Parallel Reading	Faster query execution	Better CPU utilization when reading remote files
Parquet Metadata Cache	Lower query latency	Eliminates repeated metadata reads
Iceberg Metadata Prefetching	Faster query planning	Metadata is available before execution

Together, these optimizations improve:

Interactive dashboards
BI workloads
Exploratory analytics
Recurring analytical queries

Best Practices

To maximize performance when querying data lakes with ClickHouse®:

Store analytical datasets in Parquet format.
Partition datasets using commonly filtered columns.
Select only the columns required by your queries.
Apply filters as early as possible.
Take advantage of the built-in Parquet metadata cache.
Enable asynchronous Iceberg metadata prefetching for frequently queried datasets.
Keep ClickHouse® updated to benefit from ongoing data lake optimizations.

When Should You Use These Features?

ClickHouse® 26.3 is particularly valuable for organizations that:

Store analytical datasets in cloud object storage
Query Apache Iceberg tables
Query Delta Lake datasets
Analyze Apache Hudi data
Perform ad hoc SQL analysis over Parquet files
Build interactive BI dashboards
Operate large-scale cloud-native analytics platforms

Key Takeaways

Query cloud-hosted datasets without duplicating data.
Parallel reading accelerates remote file processing.
The Parquet metadata cache reduces repeated metadata access.
Iceberg metadata prefetching lowers query planning latency.
ClickHouse® 26.3 makes interactive analytics over modern data lakes significantly faster and more efficient.

Conclusion

As organizations continue adopting cloud-native architectures and open table formats, efficiently querying data lakes has become essential for modern analytics. ClickHouse® already enables users to query Apache Iceberg, Delta Lake, Apache Hudi, and Parquet datasets directly from cloud object storage without duplicating data or building complex ETL pipelines.

ClickHouse® 26.3 LTS further strengthens these capabilities with enhanced parallel object storage reads, a built-in Parquet metadata cache, and asynchronous Iceberg metadata prefetching. Together, these improvements reduce metadata overhead, minimize remote I/O, improve CPU utilization, and significantly accelerate query execution.

Whether you're building business intelligence dashboards, exploring petabyte-scale datasets, or developing cloud-native analytics platforms, these enhancements make ClickHouse® an even more compelling engine for high-performance analytics directly on modern data lakes.

Day 99 - Efficient Random Sampling with system.numbers_mt: Parallel Number Generation in ClickHouse® 26.3

Kanishga Subramani — Sat, 25 Jul 2026 08:16:28 +0000

Introduction

Every major ClickHouse® release introduces new features and performance improvements, but occasionally older experimental features are removed to simplify maintenance and improve long-term stability.

One such change in ClickHouse® 26.3 is the removal of the experimental Hypothesis Skip Index (TYPE hypothesis).

If you experimented with this index type in earlier versions, you'll need to update your schema before upgrading to ClickHouse® 26.3. Otherwise, table creation or schema restoration involving this index type will fail.

In this article, we'll explore what Hypothesis Skip Indexes were, why they were removed, how to identify affected tables, and the recommended migration path.

Understanding Data Skipping Indexes

Before discussing the deprecation, it's useful to understand how skip indexes work.

Unlike traditional relational databases that rely on B-tree secondary indexes, ClickHouse® is a column-oriented database optimized for analytical workloads. Rather than indexing individual rows, it organizes each data part into granules (blocks of rows, 8,192 by default) and keeps a sparse index over them.

A data skipping index stores summary metadata for each block of granules (the GRANULARITY clause sets how many granules one index entry covers), allowing the query engine to determine whether that whole block can be skipped during query execution.

When a query contains filtering conditions, ClickHouse® evaluates the skip index before reading data. If a granule cannot possibly satisfy the filter, it is skipped entirely, reducing disk I/O and improving query performance.

Depending on the workload, skip indexes can significantly reduce the amount of data scanned.
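
To make the mechanism concrete, here is a small Python toy model of a minmax-style skip index. The granule size of 4 and the function names are assumptions chosen for readability; real granules hold thousands of rows and the index lives on disk next to the data.

```python
def build_minmax_index(values, granule_size):
    """Return one (min, max) pair per granule of `granule_size` rows."""
    return [
        (min(values[i:i + granule_size]), max(values[i:i + granule_size]))
        for i in range(0, len(values), granule_size)
    ]

def granules_to_read(index, lo, hi):
    """Granule numbers that may contain a value in [lo, hi]."""
    return [n for n, (gmin, gmax) in enumerate(index) if gmax >= lo and gmin <= hi]

amounts = [10, 20, 30, 40,   500, 900, 1200, 1500,   5, 15, 25, 35]
index = build_minmax_index(amounts, granule_size=4)
print(index)                                  # [(10, 40), (500, 1500), (5, 35)]
print(granules_to_read(index, 1000, 10**9))   # [1]
```

For a filter such as amount >= 1000, two of the three granules are ruled out from the index alone. Note that the index can only say "definitely not here" or "maybe here"; the surviving granule still has to be scanned.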

What Was the Hypothesis Skip Index?

The Hypothesis Skip Index (TYPE hypothesis) was an experimental skip index designed to precompute whether a particular boolean expression could evaluate to true within each granule.

Instead of storing values themselves, it stored one of three states for every granule:

Stored Value	Meaning
0	Expression is definitely false for all rows (granule can be skipped)
1	Expression may be true (granule must be scanned)
Unknown	Insufficient information

For queries using the same expression, ClickHouse® could immediately eliminate granules where the condition was guaranteed to be false.

Example

CREATE TABLE default.orders
(
    order_id UInt32,
    amount Float64,
    is_large UInt8 MATERIALIZED (amount > 1000),
    order_date Date,

    INDEX idx_large is_large TYPE hypothesis GRANULARITY 4
)
ENGINE = MergeTree()
ORDER BY order_id;

In earlier releases, ClickHouse® would precompute whether is_large could ever be true within each granule.

During execution of:

SELECT *
FROM orders
WHERE is_large = 1;

granules known to contain only is_large = 0 could be skipped.

Why Was It Removed?

Although technically interesting, the feature never matured beyond experimental status.

Some of its limitations included:

Limited production adoption
Known issues with certain data types such as FixedString
Experimental behavior without long-term compatibility guarantees
Similar optimization could be achieved using supported skip indexes together with materialized columns
Additional maintenance burden for the ClickHouse® developers

For these reasons, the feature has been removed in ClickHouse® 26.3.

What Changed in ClickHouse® 26.3?

Starting with ClickHouse® 26.3:

INDEX ... TYPE hypothesis is no longer recognized.
Creating new tables using this index type fails.
Schemas containing this index must be updated before upgrading.
Existing metadata referencing the deprecated index should be cleaned up.

What Happens After Upgrading?

Attempting to create a table with the removed index now results in an error similar to:

Unknown skip index type: hypothesis

Similarly, restoring backups or executing old DDL statements containing TYPE hypothesis will fail.

Finding Affected Tables

Before upgrading, review your table definitions.

SHOW CREATE TABLE default.orders;

If the output contains:

TYPE hypothesis

that table requires modification before upgrading.

For larger environments, searching exported DDL files or schema repositories for TYPE hypothesis is also recommended.
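
For a schema repository, that search can be scripted. The Python sketch below assumes each INDEX clause sits on its own line (as in the example table in this article) and that index names are plain identifiers without backticks; `find_hypothesis_indexes` and the sample DDL are hypothetical.

```python
import re

# Matches "INDEX <name> <expr> TYPE hypothesis" on a single line, case-insensitively.
HYPOTHESIS_INDEX = re.compile(
    r"\bINDEX\s+(?P<name>\w+)\s+.*?\bTYPE\s+hypothesis\b", re.IGNORECASE
)

def find_hypothesis_indexes(ddl_by_table):
    """Map table name -> list of index names declared with TYPE hypothesis."""
    hits = {}
    for table, ddl in ddl_by_table.items():
        names = []
        for line in ddl.splitlines():
            match = HYPOTHESIS_INDEX.search(line)
            if match:
                names.append(match.group("name"))
        if names:
            hits[table] = names
    return hits

ddl = {
    "default.orders": """CREATE TABLE default.orders
(
    order_id UInt32,
    is_large UInt8 MATERIALIZED amount > 1000,
    INDEX idx_large is_large TYPE hypothesis GRANULARITY 4
)
ENGINE = MergeTree ORDER BY order_id""",
    "default.customers": """CREATE TABLE default.customers
(
    customer_id UInt32,
    INDEX idx_id customer_id TYPE minmax GRANULARITY 4
)
ENGINE = MergeTree ORDER BY customer_id""",
}
print(find_hypothesis_indexes(ddl))  # {'default.orders': ['idx_large']}
```

The returned index names are exactly what the DROP INDEX statements in the next section need.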

Removing the Deprecated Index

If the index is no longer required:

ALTER TABLE default.orders
DROP INDEX idx_large;

This removes the deprecated index definition without affecting the table's data.

Recommended Replacement Indexes

Depending on your workload, ClickHouse® offers several supported skip indexes.

Skip Index	Best Use Case
`minmax`	Numeric and date range filtering
`set`	Low-cardinality equality filters
`bloom_filter`	String equality and `IN` predicates
`ngrambf_v1`	Substring search
`tokenbf_v1`	Token-based full-text search

The replacement should be selected based on actual query patterns rather than simply replacing TYPE hypothesis with another index.

Example Migration

Old definition:

INDEX idx_large is_large TYPE hypothesis GRANULARITY 4

Possible replacement:

INDEX idx_large is_large TYPE minmax GRANULARITY 4

INDEX idx_status status TYPE set(100) GRANULARITY 4

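These clauses belong inside CREATE TABLE. On an existing table, drop the old index first, add the replacement with ALTER TABLE ... ADD INDEX, and then materialize it so it is also built for data that is already stored:

ALTER TABLE default.orders
ADD INDEX idx_large is_large TYPE minmax GRANULARITY 4;

ALTER TABLE default.orders
MATERIALIZE INDEX idx_large;

Repeat both statements for idx_status on tables that have a status column; the sample orders table does not.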

Materialization builds the new skip index for all previously stored parts.

Upgrade Checklist

Before moving to ClickHouse® 26.3:

Step	Action
1	Search schemas for `TYPE hypothesis`
2	Remove deprecated indexes
3	Replace with supported skip indexes where appropriate
4	Materialize new indexes
5	Validate changes in a staging environment
6	Proceed with the production upgrade

Best Practices

To avoid similar upgrade surprises in the future:

Avoid using experimental features in production systems.
Review release notes before every major upgrade.
Choose skip indexes based on observed query workloads.
Benchmark performance after index changes.
Validate schema migrations in a staging environment before production deployment.

Important Clarification

One common point of confusion is the similarity between two different features.

The deprecated feature discussed in this article is:

TYPE hypothesis

This is not the same as the newer Hypothetical Indexes feature introduced through:

CREATE HYPOTHETICAL INDEX

These are entirely different features with different purposes.

This article focuses only on the removal of the experimental TYPE hypothesis skip index in ClickHouse® 26.3.

Conclusion

The removal of the experimental Hypothesis Skip Index is a relatively small but important breaking change in ClickHouse® 26.3. Organizations upgrading from earlier releases should review their schemas for any remaining TYPE hypothesis definitions before upgrading.

Fortunately, modern skip indexes such as minmax, set, and Bloom filter variants provide reliable, production-ready alternatives for most workloads. By auditing existing tables, replacing deprecated indexes where necessary, and validating the changes in a staging environment, you can ensure a smooth upgrade with no unexpected schema failures.

As ClickHouse® continues to evolve, keeping schemas aligned with supported features is one of the simplest ways to maintain long-term performance, stability, and compatibility.

Day 98 of #100DaysOfClickHouse: ClickHouse® 26.3 JOIN Optimization – A Practical Guide to JOIN Types

Kanishga Subramani — Sat, 25 Jul 2026 05:33:13 +0000

Introduction

Joins are among the most performance-sensitive operations in any analytical database. Whether you're combining fact and dimension tables, filtering records based on related datasets, or performing data quality checks, the efficiency of your JOIN operations directly impacts query execution time and memory consumption.

ClickHouse® has continuously improved its JOIN execution engine over the years, making complex analytical queries faster and more resource-efficient. One of the notable enhancements in ClickHouse® 26.3 is the expansion of automatic JOIN reordering to additional JOIN types, including SEMI, ANTI, and FULL OUTER JOIN.

Prior to version 26.3, the query optimizer could automatically reorder only INNER and LEFT/RIGHT JOINs. With this release, the optimizer can now evaluate table statistics and automatically choose a more efficient join order for a wider range of JOIN types, reducing memory usage and improving query performance without requiring manual query rewrites.

In this article, we'll review the major JOIN types available in ClickHouse®, understand how automatic JOIN reordering works, explore what's new in ClickHouse® 26.3, and discuss practical best practices for writing efficient JOIN queries.

Understanding JOIN Types

Before exploring the optimizer improvements, it's important to understand what each JOIN type actually returns.

JOIN Type	Returns
INNER JOIN	Only matching rows from both tables
LEFT OUTER JOIN	All rows from the left table plus matching rows from the right
RIGHT OUTER JOIN	All rows from the right table plus matching rows from the left
FULL OUTER JOIN	All rows from both tables
LEFT SEMI JOIN	Left rows that have a matching row (without returning right-side columns)
LEFT ANTI JOIN	Left rows that have no matching row

Each JOIN serves a different purpose, and selecting the appropriate one can improve both correctness and performance.

Sample Tables

Throughout this article, we'll use two simple tables.

CREATE TABLE default.customers
(
    customer_id UInt32,
    name String
)
ENGINE = MergeTree
ORDER BY customer_id;

INSERT INTO default.customers VALUES
(1,'Alice'),
(2,'Bob'),
(3,'Charlie');

CREATE TABLE default.orders
(
    order_id UInt32,
    customer_id UInt32,
    amount Float64
)
ENGINE = MergeTree
ORDER BY order_id;

INSERT INTO default.orders VALUES
(101,1,1200),
(102,2,450);

Notice that Charlie has not placed any orders, making it easy to see how different JOIN types behave.

INNER JOIN

An INNER JOIN returns only rows that exist in both tables.

SELECT
    c.customer_id,
    c.name,
    o.order_id,
    o.amount
FROM customers AS c
INNER JOIN orders AS o
ON c.customer_id = o.customer_id;

Result:

customer_id	name	order_id	amount
1	Alice	101	1200
2	Bob	102	450

Charlie is excluded because no matching order exists.

Use when: You only need records that exist in both tables.

LEFT OUTER JOIN

A LEFT JOIN returns every row from the left table, regardless of whether a matching row exists on the right.

SELECT
    c.customer_id,
    c.name,
    o.order_id,
    o.amount
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id;

Result:

customer_id	name	order_id	amount
1	Alice	101	1200
2	Bob	102	450
3	Charlie	NULL	NULL

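Charlie appears because every customer is preserved. By default, ClickHouse® fills unmatched cells with the column type's default value (0 for these numeric columns); the NULLs shown here require SET join_use_nulls = 1.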

Use when: You need all rows from the left table.

RIGHT OUTER JOIN

A RIGHT JOIN returns every row from the right table.

SELECT
    c.customer_id,
    c.name,
    o.order_id,
    o.amount
FROM customers c
RIGHT JOIN orders o
ON c.customer_id = o.customer_id;

Since every order belongs to a customer, the result contains only Alice and Bob.

Use when: You need all rows from the right table.

FULL OUTER JOIN

A FULL OUTER JOIN returns every row from both tables.

SELECT
    c.customer_id,
    c.name,
    o.order_id,
    o.amount
FROM customers c
FULL OUTER JOIN orders o
ON c.customer_id = o.customer_id;

Result:

customer_id	name	order_id	amount
1	Alice	101	1200
2	Bob	102	450
3	Charlie	NULL	NULL

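Rows without matches are filled with NULL values when join_use_nulls = 1 is set; with the default setting, ClickHouse® fills them with the column type's default value (0 for these numeric columns).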

Use when: You need a complete combined view of both datasets.

LEFT SEMI JOIN

A SEMI JOIN checks whether a match exists but returns only columns from the left table.

SELECT
    c.customer_id,
    c.name
FROM customers c
LEFT SEMI JOIN orders o
ON c.customer_id = o.customer_id;

Result:

customer_id	name
1	Alice
2	Bob

Notice that no columns from the orders table are returned.

This is generally more efficient than using an INNER JOIN when you only need to verify that a related row exists.

Use when: Checking existence without retrieving data from the right table.

LEFT ANTI JOIN

ANTI JOIN is the opposite of SEMI JOIN.

It returns rows that do not have a matching record.

SELECT
    c.customer_id,
    c.name
FROM customers c
LEFT ANTI JOIN orders o
ON c.customer_id = o.customer_id;

Result:

customer_id	name
3	Charlie

Only Charlie is returned because he has never placed an order.

Use when:

Finding orphaned records
Data validation
Missing relationships
Customers who have never purchased

JOIN Comparison

JOIN Type	Alice	Bob	Charlie
INNER JOIN	✅	✅	❌
LEFT JOIN	✅	✅	✅ (NULL)
RIGHT JOIN	✅	✅	❌
FULL OUTER JOIN	✅	✅	✅ (NULL)
LEFT SEMI JOIN	✅	✅	❌
LEFT ANTI JOIN	❌	❌	✅
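
The comparison table can be reproduced with a small Python model of the two sample tables. It is only a reference implementation of the join semantics, with None standing in for NULL (that is, the behaviour with join_use_nulls = 1); the function names are made up for this sketch.

```python
customers = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
orders = [(101, 1, 1200.0), (102, 2, 450.0)]   # (order_id, customer_id, amount)

# Index orders by customer_id, the join key.
orders_by_customer = {}
for order_id, customer_id, amount in orders:
    orders_by_customer.setdefault(customer_id, []).append((order_id, amount))

def inner_join():
    return [(cid, name, oid, amt)
            for cid, name in customers
            for oid, amt in orders_by_customer.get(cid, [])]

def left_join():
    rows = []
    for cid, name in customers:
        matches = orders_by_customer.get(cid) or [(None, None)]
        rows.extend((cid, name, oid, amt) for oid, amt in matches)
    return rows

def left_semi_join():
    # At most one output row per customer, and no order columns.
    return [(cid, name) for cid, name in customers if cid in orders_by_customer]

def left_anti_join():
    return [(cid, name) for cid, name in customers if cid not in orders_by_customer]

print(inner_join())      # [(1, 'Alice', 101, 1200.0), (2, 'Bob', 102, 450.0)]
print(left_join()[-1])   # (3, 'Charlie', None, None)
print(left_semi_join())  # [(1, 'Alice'), (2, 'Bob')]
print(left_anti_join())  # [(3, 'Charlie')]
```

The semi and anti variants only test membership of the key, which is why they never need the order columns.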

What's New in ClickHouse® 26.3?

ClickHouse® executes a hash join by building an in-memory hash table from one side of the JOIN, by default the right-hand table.

If the larger table becomes the hash table, memory consumption increases significantly.

Before ClickHouse® 26.3, the optimizer could automatically swap JOIN order only for:

INNER JOIN
LEFT JOIN
RIGHT JOIN

For SEMI, ANTI, and FULL OUTER JOIN, developers often needed to manually arrange tables in the most efficient order.

ClickHouse® 26.3 removes much of this manual work.

The optimizer now evaluates table statistics and can automatically reorder:

SEMI JOIN
ANTI JOIN
FULL OUTER JOIN

to build smaller hash tables whenever possible.

For example:

SELECT
    c.customer_id,
    c.name
FROM customers c
LEFT ANTI JOIN orders o
ON c.customer_id = o.customer_id;

Even if the query isn't written in the optimal order, ClickHouse® 26.3 can internally rearrange the join plan to improve efficiency.
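
Why the build side matters can be seen in a toy hash join. The sketch below uses an inner join for simplicity, because swapping sides there does not change the result; for SEMI, ANTI, and FULL OUTER joins an optimizer must also keep track of which side is preserved. The functions and table sizes are hypothetical, and row counts are used as a stand-in for memory.

```python
def hash_join(probe_rows, build_rows, key):
    """Inner hash join: build a hash table on `build_rows`, stream `probe_rows`."""
    table = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
    joined = [
        {**build_row, **probe_row}
        for probe_row in probe_rows
        for build_row in table.get(probe_row[key], [])
    ]
    buffered_rows = sum(len(bucket) for bucket in table.values())
    return joined, buffered_rows

def join_with_smaller_build_side(left, right, key):
    """Pick the smaller input as the build side, as a reordering optimizer would aim to."""
    build, probe = (left, right) if len(left) <= len(right) else (right, left)
    return hash_join(probe, build, key)

customers = [{"customer_id": i, "name": f"c{i}"} for i in range(3)]
orders = [{"customer_id": i % 3, "order_id": 100 + i} for i in range(1000)]

# Query written with the large table on the build side.
_, naive = hash_join(probe_rows=customers, build_rows=orders, key="customer_id")
rows, swapped = join_with_smaller_build_side(customers, orders, "customer_id")
print(naive, swapped, len(rows))  # 1000 3 1000
```

Both plans return the same 1,000 joined rows, but one buffers 1,000 rows in the hash table and the other only 3.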

Collecting Statistics for Better Optimization

Automatic JOIN reordering relies on accurate table statistics.

It is recommended to collect statistics on:

JOIN key columns
Frequently filtered columns

Useful statistics include:

TDigest

Provides data distribution and quantile estimates.

Useful for estimating filter selectivity.

ALTER TABLE orders
ADD STATISTICS customer_id TYPE tdigest;

ALTER TABLE orders
MATERIALIZE STATISTICS customer_id;

Uniq

Estimates column cardinality.

Useful for predicting JOIN selectivity.

CountMinSketch

Useful when filtering frequently on exact values.

Provides approximate frequency estimates with minimal memory.
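
To see how such statistics feed a cost decision, the following Python sketch applies two textbook estimates that rely on distinct-value counts. These are generic formulas from query optimization literature, shown under a uniform-distribution assumption; they are not claimed to be the formulas ClickHouse® uses, and the table sizes are invented.

```python
def estimate_join_rows(left_rows, right_rows, left_ndv, right_ndv):
    """Textbook equi-join size estimate: |L| * |R| / max(ndv_L, ndv_R)."""
    return left_rows * right_rows / max(left_ndv, right_ndv)

def estimate_equality_filter_rows(total_rows, ndv):
    """Rows expected from `col = constant`, assuming uniformly distributed values."""
    return total_rows / ndv

# Hypothetical sizes: 1M customers, 50M orders, 800k distinct buyers.
print(estimate_join_rows(1_000_000, 50_000_000, 1_000_000, 800_000))  # 50000000.0
print(estimate_equality_filter_rows(50_000_000, 800_000))             # 62.5
```

With estimates like these for each candidate order, an optimizer can compare plans before reading any data, which is why stale or missing statistics lead to poorer choices.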

Performance Best Practices

1. Prefer SEMI JOIN over INNER JOIN

Instead of:

SELECT customers.customer_id
FROM customers
INNER JOIN orders
ON customers.customer_id = orders.customer_id;

Use:

SELECT customers.customer_id
FROM customers
LEFT SEMI JOIN orders
ON customers.customer_id = orders.customer_id;

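The two queries are not identical: the INNER JOIN returns a customer once per matching order, while the SEMI JOIN returns each customer at most once and does not need to carry right-side columns through the join.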

2. Prefer ANTI JOIN over NOT IN

Instead of:

SELECT customer_id
FROM customers
WHERE customer_id NOT IN
(
SELECT customer_id
FROM orders
);

Use:

SELECT customer_id
FROM customers
LEFT ANTI JOIN orders
USING(customer_id);

ANTI JOIN is typically faster and more memory-efficient on large datasets.

3. Let the Optimizer Help

Older ClickHouse versions often required manually placing the smaller table on the right.

With ClickHouse® 26.3, automatic JOIN reordering reduces the need for manual optimization across many JOIN types.

4. Verify Query Plans

Always inspect execution plans.

EXPLAIN

SELECT
    c.customer_id,
    c.name
FROM customers c
LEFT ANTI JOIN orders o
ON c.customer_id = o.customer_id;

EXPLAIN helps verify that the optimizer is selecting the expected execution strategy.

When Will This Improvement Matter?

You'll benefit the most if your workloads include:

Large analytical datasets
Multi-table joins
Frequent SEMI or ANTI JOIN queries
FULL OUTER JOIN operations
Memory-sensitive workloads
Data warehouse environments with complex reporting

For smaller datasets, the improvement may not be immediately noticeable, but at scale it helps reduce memory usage and improves overall query execution.

Conclusion

Choosing the correct JOIN type is one of the simplest ways to improve query performance in ClickHouse®.

While INNER, LEFT, and FULL OUTER JOIN cover most common scenarios, SEMI JOIN and ANTI JOIN are powerful alternatives that are often overlooked. They can reduce unnecessary data processing and improve memory efficiency when you only need to check whether matching rows exist.

ClickHouse® 26.3 builds on these capabilities by extending automatic JOIN reordering to SEMI, ANTI, and FULL OUTER JOINs. By leveraging table statistics, the optimizer can automatically choose a more efficient execution plan, reducing memory consumption and eliminating much of the manual tuning previously required.

It's another step toward making ClickHouse not only one of the fastest analytical databases available, but also one that increasingly optimizes itself behind the scenes—allowing developers to focus more on writing queries and less on fine-tuning execution strategies.

Day 97 of #100DaysOfClickHouse: Analyzing Memory Efficiency with Vertical Merges in ClickHouse® 26.3

Kanishga Subramani — Sat, 25 Jul 2026 05:15:13 +0000

Background merge operations are one of the most important components of ClickHouse®. Every insert creates immutable data parts, and the storage engine continuously merges these parts to keep query performance high and storage efficient. These merges also perform critical maintenance tasks such as applying TTL rules, removing expired records, recompressing data, and consolidating files.

For most workloads, this process happens quietly in the background. However, for organizations storing massive analytical datasets with hundreds of columns, background merges can become one of the largest consumers of memory.

ClickHouse® already offers Vertical Merge, an optimization that reduces memory usage during merges by processing columns independently instead of entire rows. In ClickHouse 26.3, this optimization has been extended to TTL DELETE merges, significantly lowering memory consumption when expired rows are removed automatically.

This article explains why this enhancement matters, how Vertical Merge works internally, and why it improves the efficiency of large production deployments.

Why Merge Memory Matters

Unlike traditional databases that update data in place, ClickHouse stores data as immutable parts.

Every INSERT creates a new part, and background merge threads continuously combine smaller parts into larger ones.

During these merges, ClickHouse performs several maintenance operations simultaneously:

Merging multiple parts into one
Recompressing column files
Applying TTL expressions
Removing expired rows
Rewriting data into optimized storage layouts

For small datasets, merge memory consumption is usually insignificant.

Production analytical systems, however, often store:

Hundreds of columns
Large String columns
Nested structures
Arrays
Maps
JSON data

During a merge, ClickHouse may need to read, decompress, merge, filter, and rewrite large amounts of column data.

As tables become wider, the peak memory required during merges grows substantially.

While query optimization often receives the most attention, reducing merge memory is equally important because merges execute continuously in the background.

Horizontal Merge vs Vertical Merge

Historically, ClickHouse performed merges using a horizontal merge algorithm.

In a horizontal merge:

Entire rows are processed together.
All required columns remain active throughout much of the merge.
Memory usage increases as the number of columns grows.

This approach works well for narrow schemas but becomes increasingly expensive for wide analytical tables.

Vertical Merge takes a different approach.

Instead of processing complete rows, ClickHouse separates the merge into stages.

First, it determines the correct row ordering using the primary key. After that, each remaining column is processed independently.

The workflow looks like this:

Merge primary key columns.
Build the merged row mapping.
Process one non-key column at a time.
Write the merged output.
Release memory before processing the next column.

Since only a small subset of columns is active at any moment, peak memory usage becomes dramatically lower.

Diagram 1: Horizontal Merge vs Vertical Merge

Horizontal Merge

Part A          Part B
 |                |
 | Read ALL Columns
 |________________|
         |
   Merge Entire Rows
         |
   Write New Part


Vertical Merge

Part A          Part B
 |                |
 | Merge Primary Keys
 |_________________|
         |
   Build Row Mapping
         |
Column 1 -> Write
Column 2 -> Write
Column 3 -> Write
...
Column N -> Write
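
The two workflows in the diagram can be compared with a Python toy model. Parts are plain dictionaries of column lists sorted by `id`; `merge_keys`, `vertical_merge`, and `horizontal_merge` are hypothetical helpers written for this sketch and do not mirror ClickHouse® source code.

```python
import heapq

def merge_keys(parts, key):
    """Merge sorted key columns; return merged keys and a row-source mapping.

    Each mapping entry is (part_index, row_index) for one output row.
    """
    streams = [
        [(k, p, r) for r, k in enumerate(part[key])]
        for p, part in enumerate(parts)
    ]
    merged = list(heapq.merge(*streams))
    return [k for k, _, _ in merged], [(p, r) for _, p, r in merged]

def vertical_merge(parts, key):
    keys, mapping = merge_keys(parts, key)
    out = {key: keys}
    for column in parts[0]:
        if column != key:
            # Gather one column at a time using the mapping.
            out[column] = [parts[p][column][r] for p, r in mapping]
    return out, mapping

def horizontal_merge(parts, key):
    """Row-wise merge: every column of every row is handled together."""
    columns = list(parts[0])
    rows = [
        tuple(part[c][r] for c in columns)
        for part in parts
        for r in range(len(part[key]))
    ]
    rows.sort(key=lambda row: row[columns.index(key)])
    return {c: [row[i] for row in rows] for i, c in enumerate(columns)}

part_a = {"id": [1, 4], "msg": ["a", "d"], "code": [10, 40]}
part_b = {"id": [2, 3], "msg": ["b", "c"], "code": [20, 30]}

merged, mapping = vertical_merge([part_a, part_b], "id")
print(mapping)  # [(0, 0), (1, 0), (1, 1), (0, 1)]
print(merged == horizontal_merge([part_a, part_b], "id"))  # True
```

Both functions produce the same part. The difference is that the vertical version only ever holds the mapping plus one column's values while it works, whereas the horizontal version materializes full rows.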

TTL DELETE Before ClickHouse 26.3

One of the most common maintenance tasks in ClickHouse is automatic data retention using TTL.

For example:

CREATE TABLE logs
(
    id UInt64,
    event_time DateTime,
    message String
)
ENGINE = MergeTree
ORDER BY id
TTL event_time + INTERVAL 30 DAY DELETE;

Rows older than 30 days are automatically removed during background merges.

Before ClickHouse 26.3, TTL DELETE operations primarily relied on the horizontal merge algorithm.

Although this approach correctly removed expired records, it also meant that all participating columns needed to be processed together.

For tables with hundreds of columns, peak memory usage during TTL cleanup could become quite large.

This was rarely noticeable for small tables but became increasingly important for enterprise deployments storing billions of rows.

What's New in ClickHouse 26.3?

ClickHouse 26.3 extends Vertical Merge support to TTL DELETE merges.

Instead of evaluating TTL rules while processing every column simultaneously, ClickHouse now performs TTL cleanup using the same memory-efficient vertical workflow already used for standard merges.

The process now works like this:

Read and merge primary key columns.
Evaluate the TTL expression.
Identify rows that should be deleted.
Build the row mapping.
Process each remaining column independently.
Write only the surviving rows.

Because non-key columns are handled individually, ClickHouse no longer needs to keep every column in memory simultaneously.

The result is significantly lower peak memory usage during automatic data cleanup.
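As a rough illustration of the steps above, here is a toy Python model. It assumes the column used by the TTL expression is read together with the key columns in the first stage, and that a row survives while event_time + TTL is still in the future. The function name ttl_vertical_merge and the dict-based part layout are hypothetical, not ClickHouse internals.

```python
from datetime import datetime, timedelta

def ttl_vertical_merge(parts, key, ttl_column, ttl, now):
    """Merge parts by key, drop expired rows, then copy columns one by one."""
    # Stage 1: order all rows by key and remember where each row lives.
    candidates = sorted(
        (part[key][row], p, row)
        for p, part in enumerate(parts)
        for row in range(len(part[key]))
    )
    # Evaluate the TTL once per row; the mapping keeps only survivors.
    mapping = [
        (p, row)
        for _, p, row in candidates
        if parts[p][ttl_column][row] + ttl > now
    ]
    # Stage 2: each column is gathered independently through the mapping.
    merged = {}
    for name in parts[0]:
        merged[name] = [parts[p][name][row] for p, row in mapping]
    return merged

part_a = {
    "id": [1, 3],
    "event_time": [datetime(2026, 1, 1), datetime(2026, 3, 10)],
    "message": ["m1", "m3"],
}
part_b = {
    "id": [2, 4],
    "event_time": [datetime(2026, 3, 1), datetime(2026, 2, 1)],
    "message": ["m2", "m4"],
}

result = ttl_vertical_merge(
    [part_a, part_b],
    key="id",
    ttl_column="event_time",
    ttl=timedelta(days=30),
    now=datetime(2026, 3, 18),
)
print(result["id"])       # [2, 3]
print(result["message"])  # ['m2', 'm3']
```

Because expired rows are already excluded from the mapping, the per-column stage never touches them again.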

Before vs After ClickHouse 26.3

Feature	Before 26.3	ClickHouse 26.3
TTL DELETE Merge	Horizontal Merge	Vertical Merge
Peak Memory Usage	Higher	Lower
Wide Table Performance	Memory intensive	More memory efficient
Background Cleanup	Higher resource consumption	Reduced memory pressure
Scalability	Limited by merge memory	Better scalability

Why Vertical Merge Uses Less Memory

The biggest improvement is that memory usage becomes far less dependent on the number of columns.

Imagine a table containing 400 columns.

With a horizontal merge, many of those columns may be loaded and processed together.

With Vertical Merge, ClickHouse repeats a short cycle:

Process one column.
Write the filtered data.
Release memory.
Move to the next column.

Only a tiny portion of the table is active at any given time.

The total amount of data processed remains identical.

What changes is the maximum memory required at any instant, which can be significantly smaller.

For servers running multiple concurrent merges, this reduction can substantially improve overall system stability.

Practical Example

Suppose an event logging platform stores billions of events and automatically deletes records older than 90 days.

CREATE TABLE events
(
    id UInt64,
    timestamp DateTime,
    user_id UInt64,
    event_type String,
    payload String
)
ENGINE = MergeTree
ORDER BY id
TTL timestamp + INTERVAL 90 DAY DELETE;

Every day, background merges remove expired records.

The five-column schema above is only illustrative. On previous versions, cleaning up a very wide table could temporarily consume a large amount of memory because every column participated in the merge simultaneously.

With ClickHouse 26.3, the same cleanup benefits from Vertical Merge processing.

The result is smoother background maintenance with lower peak memory usage and less pressure on the operating system.

Where You'll Notice the Biggest Improvements

This optimization is especially valuable when:

Tables contain hundreds of columns.
Large MergeTree tables are merged frequently.
TTL DELETE rules remove old data continuously.
Memory resources are limited.
Multiple background merges execute concurrently.
Analytical workloads generate large numbers of data parts.

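Smaller or narrower tables may not show dramatic improvements. Their merge memory requirements are already relatively low, and ClickHouse only selects the vertical algorithm once a merge crosses thresholds such as the MergeTree settings vertical_merge_algorithm_min_rows_to_activate and vertical_merge_algorithm_min_columns_to_activate.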

Benefits of Vertical Merge for TTL DELETE

Extending Vertical Merge to TTL DELETE operations provides several practical advantages:

Lower peak memory consumption during background merges
Better scalability for very wide schemas
More stable merge execution
Reduced memory pressure across the server
Improved efficiency of automatic data retention
Better utilization of available system resources
More predictable performance under heavy workloads

Although users may never interact with this feature directly, its impact can be significant for production environments processing terabytes or petabytes of data.

Conclusion

Background merges are essential to how ClickHouse maintains fast analytical performance. As datasets become larger and schemas become wider, however, merge operations can consume substantial amounts of memory.

ClickHouse 26.3 addresses this challenge by extending the Vertical Merge algorithm to TTL DELETE operations. Instead of processing every column simultaneously, the storage engine now handles non-key columns independently, dramatically reducing peak memory requirements while preserving the correctness of TTL processing.

This enhancement operates entirely behind the scenes, requiring no application changes or configuration updates. Yet for production workloads that rely on automated data retention and large MergeTree tables, it can lead to more stable background merges, improved resource utilization, and better overall scalability.

It is another example of how ClickHouse continues refining its storage engine—not only to execute queries faster, but also to make the underlying maintenance operations increasingly efficient for modern analytical workloads.

Day 96/100 – Standard SQL Time Functions in ClickHouse® 26.3: Better ANSI SQL Compatibility

Kanishga Subramani — Thu, 23 Jul 2026 14:01:58 +0000

Introduction

One of the key goals of recent ClickHouse® releases has been improving compatibility with the ANSI SQL standard. While ClickHouse has always offered a rich set of date and time functions, some syntax differences compared to traditional relational databases required developers to modify existing SQL queries during migrations.

A common example was retrieving the current date or timestamp. Earlier versions of ClickHouse relied on function calls such as today() and now(), whereas many popular databases use SQL-standard keywords like CURRENT_DATE and CURRENT_TIMESTAMP.

ClickHouse® 26.3 bridges this gap by introducing support for SQL-standard time functions without parentheses. Although this may seem like a small enhancement, it significantly improves portability for applications, BI tools, reporting platforms, and ORMs that generate ANSI SQL.

In this article, we'll explore what's new, how these functions work, and why this seemingly simple feature makes migrating workloads to ClickHouse much easier.

Why SQL Compatibility Matters

Organizations rarely build analytics platforms from scratch.

Many migrate workloads from databases such as:

PostgreSQL
MySQL
SQL Server
Oracle
Snowflake

These systems generally follow ANSI SQL syntax for obtaining the current date and timestamp.

For example:

SELECT CURRENT_DATE;

SELECT CURRENT_TIMESTAMP;

Prior to ClickHouse® 26.3, these queries required modification before they could run successfully.

Even small syntax differences become significant when migrating:

Thousands of SQL queries
BI dashboards
Stored reports
ORM-generated SQL
Data transformation pipelines

Reducing these incompatibilities simplifies migrations and improves developer productivity.

Before ClickHouse® 26.3

Traditionally, ClickHouse used dedicated functions.

SELECT today();

SELECT now();

These functions remain fully supported and continue to be the recommended native ClickHouse approach.

However, applications written for ANSI SQL databases often expected keyword-based syntax instead.

What's New in ClickHouse® 26.3?

ClickHouse® 26.3 adds parenthesis-free support for the SQL-standard keywords CURRENT_DATE and CURRENT_TIMESTAMP, plus a bare NOW form.

Supported syntax includes:

SELECT CURRENT_DATE;

SELECT CURRENT_TIMESTAMP;

SELECT NOW;

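These expressions behave exactly like their ClickHouse counterparts. CURRENT_DATE and CURRENT_TIMESTAMP follow the SQL standard used by many other database systems; NOW without parentheses is a convenience form rather than part of the standard.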

Function Comparison

Keyword Syntax	Traditional ClickHouse®	Supported in 26.3
`CURRENT_DATE`	`today()`	✅ Yes
`CURRENT_TIMESTAMP`	`now()`	✅ Yes
`NOW`	`now()`	✅ Yes (without parentheses)

These additions provide alternative syntax—they do not replace existing ClickHouse functions.

Examples

Current Date

SELECT CURRENT_DATE;

Equivalent native ClickHouse syntax:

SELECT today();

Example output:

2026-03-18

Current Timestamp

SELECT CURRENT_TIMESTAMP;

Equivalent ClickHouse function:

SELECT now();

Example output:

2026-03-18 14:35:12

NOW Without Parentheses

Before ClickHouse® 26.3:

SELECT now();

Now you can simply write:

SELECT NOW;

Both statements return the same current timestamp.

Why This Matters for Migrations

Many migration projects involve moving hundreds or thousands of SQL statements from existing analytical databases into ClickHouse.

Consider a PostgreSQL query:

SELECT
    CURRENT_DATE,
    CURRENT_TIMESTAMP;

Earlier versions required rewriting this as:

SELECT
    today(),
    now();

With ClickHouse® 26.3, the original SQL can often run unchanged.

This reduces migration effort while improving compatibility with third-party SQL generators.
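For servers that cannot be upgraded yet, the rewrite described above can be scripted. The following Python sketch is a deliberately simple, hypothetical helper (to_native is not part of any ClickHouse tooling): it swaps the two bare keywords for their native functions and leaves single-quoted string literals alone. It does not understand comments or quoted identifiers, so treat it as a starting point only.

```python
import re

# Assumed keyword-to-function mapping, taken from the equivalents shown above.
REPLACEMENTS = {"CURRENT_DATE": "today()", "CURRENT_TIMESTAMP": "now()"}

# Match either a single-quoted literal (kept as-is) or a bare keyword that is
# not already followed by an opening parenthesis.
TOKEN = re.compile(
    r"'(?:[^'\\]|\\.|'')*'"
    r"|\b(CURRENT_DATE|CURRENT_TIMESTAMP)\b(?!\s*\()",
    re.IGNORECASE,
)

def to_native(sql):
    """Rewrite bare CURRENT_DATE / CURRENT_TIMESTAMP to today() / now()."""
    def swap(match):
        keyword = match.group(1)
        if keyword is None:          # a string literal: leave untouched
            return match.group(0)
        return REPLACEMENTS[keyword.upper()]
    return TOKEN.sub(swap, sql)

print(to_native("SELECT CURRENT_DATE, CURRENT_TIMESTAMP"))
# SELECT today(), now()
print(to_native("SELECT 'CURRENT_DATE' AS label, current_date"))
# SELECT 'CURRENT_DATE' AS label, today()
```

On 26.3 and later this step should be unnecessary, which is the point of the new syntax.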

Better Support for BI Tools

Many reporting platforms automatically generate ANSI SQL.

Examples include:

Apache Superset
Metabase
Tableau
Power BI
Looker
JDBC-based reporting tools

These tools frequently generate CURRENT_DATE or CURRENT_TIMESTAMP. ClickHouse 26.3 accepts these expressions directly, so the generated queries no longer need modification.

This helps reduce compatibility issues during deployment.

Existing Date and Time Functions Remain Available

It's important to note that ClickHouse® 26.3 does not introduce a new date-time engine.

All existing functions remain available, including:

today()
now()
toDate()
toDateTime()
dateDiff()
dateAdd()
toStartOfMonth()
toStartOfDay()
toStartOfWeek()
toStartOfHour()

The new feature simply adds SQL-standard alternatives for retrieving the current date and timestamp.

Benefits

Feature	Benefit
`CURRENT_DATE`	ANSI SQL compatibility
`CURRENT_TIMESTAMP`	Easier query portability
`NOW` without parentheses	Familiar syntax for SQL users
Reduced query rewrites	Faster database migrations
Better ORM compatibility	Less application code modification
Improved BI tool support	Greater interoperability

Best Practices

To get the most from this enhancement:

Use SQL-standard syntax when writing portable SQL that may run across multiple database systems.
Continue using native ClickHouse functions if your environment is already optimized around them.
When migrating applications, test existing SQL before rewriting—it may now work without changes.
Validate ORM-generated SQL after upgrading to ClickHouse® 26.3, as many generated queries may become compatible automatically.
Keep using ClickHouse's rich date and time functions for advanced analytical workloads, as these remain the primary tools for date manipulation.

What Didn't Change

Although the syntax is new, the underlying behavior is not.

These additions:

Do not change how ClickHouse calculates dates or timestamps.
Do not replace existing functions such as today() or now().
Do not introduce new time zones or formatting options.
Do not affect performance compared to their native equivalents.

The primary goal is better SQL compatibility, making ClickHouse easier to adopt for teams migrating from other database platforms.

Final Thoughts

Not every new feature needs to be a major performance optimization or a groundbreaking capability. Sometimes, small improvements can have a significant impact on developer experience.

The addition of CURRENT_DATE, CURRENT_TIMESTAMP, and NOW without parentheses in ClickHouse® 26.3 is one such enhancement. By embracing ANSI SQL syntax, ClickHouse reduces friction for organizations migrating from traditional relational databases and improves compatibility with BI tools, ORMs, and SQL-generating applications.

Existing ClickHouse functions like today() and now() remain fully supported, giving developers the flexibility to choose the syntax that best fits their workflows. For teams building portable SQL or modernizing existing analytics platforms, this update makes the transition to ClickHouse just a little smoother—and that's a meaningful improvement in itself.

References

ClickHouse® 26.3 Release Notes
ClickHouse® Documentation – Date and Time Functions
ANSI SQL Standard – Date and Time Expressions

Day 95/100 – ClickHouse® 26.3 S3 Object Storage Read Enhancements: Faster Data Lake Queries

Kanishga Subramani — Thu, 23 Jul 2026 13:50:08 +0000

Introduction

Cloud object storage has become the foundation of modern data lake architectures. Services such as Amazon S3 and S3-compatible object stores provide virtually unlimited, cost-effective storage for massive analytical datasets, making them the preferred choice for storing historical logs, clickstream data, IoT events, machine learning datasets, and business intelligence workloads.

Rather than copying every dataset into local storage, many organizations now query data directly from object storage using formats such as Parquet, Iceberg, Delta Lake, and Apache Hudi. This approach reduces storage costs, simplifies data sharing, and enables multiple analytics engines to work on the same data lake.

ClickHouse® has long supported querying data directly from S3 using table functions and native storage engines. However, network latency and object storage access overhead have traditionally made remote reads slower than querying local MergeTree tables.

ClickHouse® 26.3 significantly improves this experience with major enhancements to the S3 object storage read path. The release introduces faster parallel reads, smarter metadata caching, asynchronous Iceberg metadata prefetching, more efficient S3Queue ingestion, and reduced memory usage for semi-structured data.

The result is dramatically faster queries against object storage without requiring schema changes, query rewrites, or application modifications.

In this article, we'll explore these improvements and understand how they benefit modern lakehouse architectures.

Why Object Storage Matters

Today's analytical workloads increasingly separate storage from compute.

Instead of keeping all datasets on local disks, organizations store data in cloud object storage while scaling compute independently.

Common use cases include:

Data lakes
Data lakehouses
Historical event storage
Log analytics
Clickstream analysis
Machine learning datasets
Business intelligence platforms

Popular file and table formats include:

Apache Parquet
Apache Iceberg
Delta Lake
Apache Hudi

ClickHouse can query these formats directly without first importing data into MergeTree tables.

The Challenge Before ClickHouse® 26.3

Although object storage offers excellent scalability and lower storage costs, remote reads naturally introduce additional overhead.

Each query may require:

Opening remote objects
Reading metadata
Fetching Parquet footers
Downloading row groups
Waiting on network latency

For workloads scanning only a handful of large files, CPUs frequently became underutilized while waiting for remote I/O.

As a result, applications experienced higher query latency compared to local storage.

Faster Parallel Reads

The most significant improvement in ClickHouse® 26.3 is a redesigned object storage read path.

Instead of waiting for individual remote read operations to complete sequentially, ClickHouse now parallelizes object storage reads across multiple CPU cores.

This improvement benefits:

Amazon S3
S3-compatible storage
Apache Iceberg
Delta Lake
Apache Hudi
Parquet files
CSV files queried through the s3() table function

The biggest performance gains occur when:

Queries read a relatively small number of files
Files are large
Multiple CPU cores are available

Instead of leaving CPU cores idle while waiting on network operations, ClickHouse keeps multiple cores busy simultaneously.

According to the ClickHouse® 26.3 release notes, these optimizations can make object storage reads tens of times faster on multi-core systems for suitable workloads.
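The intuition behind overlapping remote reads can be shown with a small Python simulation. Nothing here talks to S3: fetch_range is a stand-in that sleeps to imitate network latency, and the thread pool only illustrates why issuing several ranged reads at once hides that latency. It is not how ClickHouse implements its read path.

```python
import time
from concurrent.futures import ThreadPoolExecutor

OBJECT = bytes(range(256)) * 64  # stand-in for a 16 KiB remote object

def fetch_range(start, end, latency=0.01):
    """Pretend to issue one ranged GET; the sleep simulates network latency."""
    time.sleep(latency)
    return OBJECT[start:end]

def split_ranges(size, chunk):
    """Cut [0, size) into consecutive (start, end) byte ranges."""
    return [(off, min(off + chunk, size)) for off in range(0, size, chunk)]

def read_sequential(size, chunk):
    # Each request waits for the previous one to finish.
    return b"".join(fetch_range(s, e) for s, e in split_ranges(size, chunk))

def read_parallel(size, chunk, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the chunks join correctly.
        chunks = pool.map(lambda r: fetch_range(*r), split_ranges(size, chunk))
        return b"".join(chunks)

start = time.perf_counter()
data_seq = read_sequential(len(OBJECT), 1024)
seq_time = time.perf_counter() - start

start = time.perf_counter()
data_par = read_parallel(len(OBJECT), 1024)
par_time = time.perf_counter() - start

print(data_seq == data_par == OBJECT)
print(f"sequential {seq_time:.3f}s, parallel {par_time:.3f}s")
```

With 16 ranges and 8 workers, the parallel version waits for roughly two latency periods instead of sixteen. Actual speedups depend on the network and the storage service.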

Smarter Parquet Metadata Caching

Every Parquet file contains metadata stored in its footer.

Before processing data, ClickHouse reads this footer to understand:

Schema
Row groups
Column statistics
File layout

Repeatedly downloading this metadata for frequently queried files adds unnecessary latency.

ClickHouse® 26.3 introduces a new SLRU (Segmented Least Recently Used) cache for Parquet metadata.

Benefits include:

Enabled by default
Up to 2× fewer metadata reads
Faster repeated queries
Reduced network requests

To ensure correctness, cached metadata is validated using each file's ETag, preventing stale metadata from being used.
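For readers unfamiliar with SLRU, here is a minimal Python sketch of a segmented LRU cache with ETag validation. The class SLRUMetadataCache, its two segment sizes, and the promotion rules are a textbook-style illustration under my own assumptions; ClickHouse's actual cache sizes and policies may differ.

```python
from collections import OrderedDict

class SLRUMetadataCache:
    """Two LRU segments: new entries start on probation, re-used ones are protected."""

    def __init__(self, probation_size, protected_size):
        self.probation = OrderedDict()
        self.protected = OrderedDict()
        self.probation_size = probation_size
        self.protected_size = protected_size

    def get(self, path, etag):
        for segment in (self.protected, self.probation):
            entry = segment.get(path)
            if entry is None:
                continue
            del segment[path]
            cached_etag, metadata = entry
            if cached_etag != etag:
                return None            # object changed: stale entry is dropped
            self._promote(path, entry)
            return metadata
        return None

    def put(self, path, etag, metadata):
        self.protected.pop(path, None)
        self.probation.pop(path, None)
        self.probation[path] = (etag, metadata)
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)   # evict least recently used

    def _promote(self, path, entry):
        self.protected[path] = entry
        if len(self.protected) > self.protected_size:
            # Demote the coldest protected entry instead of dropping it.
            old_path, old_entry = self.protected.popitem(last=False)
            self.probation[old_path] = old_entry
            if len(self.probation) > self.probation_size:
                self.probation.popitem(last=False)

cache = SLRUMetadataCache(probation_size=2, protected_size=1)
cache.put("a.parquet", "etag-1", "footer-a")
cache.put("b.parquet", "etag-1", "footer-b")
print(cache.get("a.parquet", "etag-1"))  # footer-a (now protected)
cache.put("c.parquet", "etag-1", "footer-c")
cache.put("d.parquet", "etag-1", "footer-d")   # pushes b out of probation
print(cache.get("a.parquet", "etag-1"))  # footer-a (survived the scan)
print(cache.get("a.parquet", "etag-2"))  # None (ETag changed)
```

The segmented design means a one-off scan over many files only churns the probation segment, while footers that are read repeatedly stay cached.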

Faster Iceberg Metadata Access

Iceberg users receive another major optimization.

Traditionally, each query needed to communicate with the Iceberg catalog before execution.

Although necessary for consistency, repeated catalog lookups increase query latency.

ClickHouse® 26.3 introduces asynchronous metadata prefetching.

Instead of retrieving metadata during every query, ClickHouse periodically refreshes metadata in the background.

Example:

CREATE TABLE my_iceberg (...)
ENGINE = IcebergS3(...)
SETTINGS
iceberg_metadata_async_prefetch_period_ms = 60000;

Queries can specify acceptable metadata freshness.

SELECT *
FROM my_iceberg
SETTINGS
iceberg_metadata_staleness_ms = 30000;

If cached metadata is sufficiently recent, ClickHouse avoids contacting the Iceberg catalog entirely during query execution.

This removes catalog communication from the critical query path.
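The interaction between the background refresh and the per-query staleness limit can be modelled in a few lines of Python. The class PrefetchedMetadata is hypothetical, and the fallback shown here (refetching synchronously when the cached copy is too old) is an assumption about the behaviour rather than a documented guarantee.

```python
import time

class PrefetchedMetadata:
    """Toy model of a metadata snapshot that is refreshed in the background."""

    def __init__(self, load_from_catalog, clock=time.monotonic):
        self._load = load_from_catalog
        self._clock = clock
        self._snapshot = None
        self._fetched_at = None
        self.catalog_calls = 0

    def refresh(self):
        """What a background task would run once per prefetch period."""
        self._snapshot = self._load()
        self._fetched_at = self._clock()
        self.catalog_calls += 1

    def get(self, max_staleness_ms):
        """Serve the cached snapshot if it is fresh enough, else refetch."""
        if self._fetched_at is not None:
            age_ms = (self._clock() - self._fetched_at) * 1000
            if age_ms <= max_staleness_ms:
                return self._snapshot
        self.refresh()
        return self._snapshot

# A fake clock (in seconds) keeps the example deterministic.
now = [0.0]
meta = PrefetchedMetadata(lambda: "snapshot", clock=lambda: now[0])
meta.refresh()                      # background prefetch at t = 0 s

now[0] = 20.0                       # query at t = 20 s tolerates 30 s
meta.get(max_staleness_ms=30_000)
print(meta.catalog_calls)           # 1 (served from cache)

now[0] = 45.0                       # query at t = 45 s, cache is too old
meta.get(max_staleness_ms=30_000)
print(meta.catalog_calls)           # 2 (catalog contacted again)
```

Choosing a prefetch period shorter than the staleness most queries accept keeps the catalog off the query path almost all the time.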

Better S3Queue Performance

S3Queue continuously monitors object storage for newly uploaded files.

Before ClickHouse® 26.3, queues frequently scanned the complete object prefix history to identify new files.

For buckets containing millions of historical objects, repeated listing operations became increasingly expensive.

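ClickHouse® 26.3 improves ordered-mode S3Queue by using the StartAfter parameter of the S3 ListObjectsV2 API, so each listing can resume after the last processed key instead of starting from the beginning of the prefix.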

Benefits include:

Avoids scanning entire bucket history
Reduces ListObjects API requests
Faster detection of new files
Lower cloud API costs

This is particularly valuable for long-running ingestion pipelines.
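The effect of StartAfter is easy to see in a small simulation. The Python sketch below mimics an S3 listing over an in-memory list of keys. list_keys and the key layout are invented for illustration, and the only S3 behaviour relied on is that listings return keys in lexicographic order, strictly after the given key.

```python
from bisect import bisect_right

def list_keys(bucket_keys, start_after="", max_keys=1000):
    """Return up to max_keys keys that sort strictly after start_after.

    bucket_keys must already be sorted, as an S3 listing would be.
    """
    first = bisect_right(bucket_keys, start_after)
    return bucket_keys[first:first + max_keys]

# Hypothetical bucket: 18 days of history, 100 files per day.
bucket = sorted(
    f"logs/2026-03-{day:02d}/part-{n:04d}.parquet"
    for day in range(1, 19)
    for n in range(100)
)

last_processed = "logs/2026-03-17/part-0099.parquet"
new_files = list_keys(bucket, start_after=last_processed)

print(len(bucket))      # 1800 keys in total
print(len(new_files))   # 100 keys returned
print(new_files[0])     # logs/2026-03-18/part-0000.parquet
```

This only works when new keys sort after old ones, which is why the optimization applies to ordered mode.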

Lower Memory Usage for JSON Data

Many organizations store event data as JSON inside object storage.

Queries often read only a small subset of JSON attributes.

Earlier versions sometimes overestimated memory requirements for these partial reads.

ClickHouse® 26.3 introduces more accurate memory estimation for JSON subcolumns.

Benefits include:

Up to 8× lower memory usage
Better resource utilization
Improved query stability
More efficient semi-structured analytics

This enhancement is especially useful for:

Event logs
Clickstream data
Application telemetry
Observability platforms

What This Means for Your Architecture

One of the best aspects of these improvements is that they require virtually no application changes.

If you're already using:

s3() table functions
IcebergS3
DeltaLakeS3
Hudi
S3Queue

most improvements become available simply by upgrading to ClickHouse® 26.3.

Organizations benefit from:

Lower query latency
Reduced network overhead
Fewer API calls
Better CPU utilization
Improved scalability

without changing existing SQL queries.

Getting Started

Most improvements are enabled automatically.

For the best results, consider the following recommendations.

Verify Parquet Metadata Caching

Ensure metadata caching remains enabled for frequently queried datasets.

use_parquet_metadata_cache = 1

Configure Iceberg Metadata Prefetch

For busy Iceberg tables:

iceberg_metadata_async_prefetch_period_ms = 60000

Enable Ordered S3Queue

Long-running ingestion pipelines benefit from ordered mode (the S3Queue setting mode = 'ordered') together with the StartAfter optimization.

Best Practices

To maximize performance:

Store analytical data using Parquet whenever possible.
Partition data appropriately to minimize unnecessary scans.
Enable metadata caching for repeated queries.
Configure asynchronous Iceberg metadata refresh on heavily queried tables.
Use ordered-mode S3Queue for continuous ingestion.
Monitor object storage latency alongside ClickHouse performance metrics.
Keep ClickHouse updated to benefit from ongoing object storage optimizations.

Final Thoughts

Object storage has become the backbone of modern analytics, offering scalable and cost-effective storage for massive datasets. However, remote reads have traditionally introduced a performance gap compared to querying locally stored data.

ClickHouse® 26.3 significantly narrows that gap by redesigning the object storage read path, improving parallelism, introducing intelligent Parquet metadata caching, optimizing Iceberg metadata access, enhancing S3Queue ingestion, and reducing memory consumption for JSON workloads.

For organizations building lakehouse architectures on Amazon S3 or compatible object storage, these enhancements translate into faster queries, lower API costs, better resource utilization, and improved scalability—all without changing schemas, rewriting SQL, or redesigning ingestion pipelines.

As more organizations adopt cloud-native analytics, these S3 improvements make ClickHouse an even stronger choice for high-performance querying directly against data lakes.

References

ClickHouse® 26.3 Release Notes
ClickHouse® 26.3 Announcement
ClickHouse® Documentation – S3 Table Function
ClickHouse® Documentation – Iceberg Table Engine
ClickHouse® Documentation – S3Queue

Day 94/100 – Automated Insert Batching in ClickHouse® 26.3: Higher Throughput with Asynchronous Inserts

Kanishga Subramani — Wed, 22 Jul 2026 16:13:58 +0000

Introduction

Modern data platforms are expected to ingest massive volumes of data in real time. Whether it's application logs, IoT sensor readings, monitoring metrics, clickstream events, or messages from streaming platforms like Kafka, many workloads generate thousands—or even millions—of small INSERT operations every second.

ClickHouse® is designed for high-performance analytical workloads and can ingest data at remarkable speeds. However, one common performance bottleneck remains: a large number of very small inserts.

Each individual INSERT requires ClickHouse to parse the query, validate the data, compress blocks, update metadata, and create new data parts. While these operations are efficient individually, repeating them thousands of times per second creates unnecessary overhead and increases the work performed by background merge processes.

ClickHouse® 26.3 improves this scenario by enhancing the batching behavior of asynchronous inserts. Instead of writing every small insert immediately, ClickHouse temporarily buffers incoming asynchronous insert requests and combines multiple small inserts into larger writes before flushing them to storage.

The result is:

Higher ingestion throughput
Fewer data parts
Reduced merge overhead
Better compression efficiency
Lower CPU and disk utilization

In this article, we'll explore how automated insert batching works, why it improves performance, and when you should consider enabling asynchronous inserts in your ClickHouse deployments.

Note: "Automated Insert Batching" is not an official ClickHouse feature name. Throughout this article, the term refers to the enhanced batching behavior of asynchronous inserts introduced in ClickHouse® 26.3, where multiple small insert requests are automatically grouped into larger writes before being persisted.

Why Small INSERT Statements Hurt Performance

Imagine an application receiving telemetry events every second.

Instead of accumulating events into larger batches, it continuously executes tiny insert operations.

INSERT (10 rows)
INSERT (15 rows)
INSERT (8 rows)
INSERT (12 rows)
INSERT (20 rows)
...

Although each insert contains only a handful of rows, ClickHouse still performs the complete insert workflow for every request.

Each insert requires:

SQL parsing
Query validation
Block creation
Data compression
Metadata updates
New data part creation
Scheduling future merge operations

When applications generate thousands of tiny inserts, these fixed costs are repeated continuously, reducing overall ingestion efficiency.

Traditional INSERT Workflow

Without batching, every insert is processed independently.

Application

│
├── INSERT
├── INSERT
├── INSERT
├── INSERT
│
▼

ClickHouse

│
├── Part 1
├── Part 2
├── Part 3
├── Part 4
│
▼

MergeTree Storage

Every insert creates its own data part.

Although MergeTree is optimized for immutable data parts, creating thousands of tiny parts introduces unnecessary overhead throughout the storage engine.

Why Too Many Small Parts Are a Problem

A large number of tiny data parts negatively impacts multiple areas of ClickHouse performance.

Some common consequences include:

More background merge operations
Increased CPU utilization
Higher disk I/O
Larger metadata structures
Slower query planning
Increased memory consumption
Additional storage fragmentation

Background merges become especially busy trying to combine many small parts into larger ones.

This is why ClickHouse documentation consistently recommends:

Avoid sending many tiny INSERT statements whenever possible.

Automated Insert Batching

ClickHouse® 26.3 improves this workflow by batching asynchronous inserts automatically.

Instead of writing every request immediately, ClickHouse buffers incoming asynchronous insert requests for a short period.

Multiple insert requests are then combined into one larger write.

Application

INSERT
INSERT
INSERT
INSERT
INSERT

        │
        ▼

 Asynchronous Buffer

        │

Combine Requests

        ▼

Large INSERT

        ▼

ClickHouse Storage

From the application's perspective, nothing changes.

The application continues sending small inserts.

Internally, however, ClickHouse performs fewer, larger writes.

Why This Improves Throughput

Every insert has a fixed processing cost regardless of whether it contains:

5 rows
20 rows
100 rows

When ClickHouse combines many small inserts into a single larger write, those fixed costs occur only once.

Instead of repeating expensive operations thousands of times, ClickHouse performs them for the combined batch.

Benefits include:

Fewer metadata updates
Fewer compression operations
Fewer data parts
Reduced write amplification
Lower CPU overhead
Less merge activity

The overall result is significantly higher ingestion throughput.

Example

Suppose a monitoring platform generates:

2,000 INSERT statements
20 rows per INSERT

Without batching:

2,000 INSERTS

↓

2,000 Data Parts

With automated batching:

2,000 INSERTS

↓

40 Large Batches

↓

40 Data Parts

The total number of inserted rows remains exactly the same.

However, ClickHouse creates only a small fraction of the data parts.

This dramatically reduces merge operations and improves overall system efficiency.
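The numbers above follow from simple arithmetic. The Python snippet below assumes each flush combines 50 inserts (the figure implied by 40 batches) and that every flush produces exactly one part, which holds only when all rows land in a single partition.

```python
import math

def parts_created(inserts, inserts_per_flush=1):
    """One part per flush, assuming all rows fall into a single partition."""
    return math.ceil(inserts / inserts_per_flush)

inserts = 2_000
rows_per_insert = 20
total_rows = inserts * rows_per_insert

unbatched_parts = parts_created(inserts)
batched_parts = parts_created(inserts, inserts_per_flush=50)

print(total_rows)                      # 40000
print(unbatched_parts)                 # 2000
print(batched_parts)                   # 40
print(total_rows // batched_parts)     # 1000 rows per part
```

The same 40,000 rows end up in parts that are 50 times larger, which is what reduces the work left for background merges.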

How Asynchronous Inserts Work

Automatic batching is powered by asynchronous inserts.

Instead of immediately writing each insert to disk, ClickHouse temporarily stores incoming insert requests in memory.

The buffered data is flushed when configurable thresholds are reached.

Typical controls include:

Maximum buffer size (async_insert_max_data_size)
Flush timeout (async_insert_busy_timeout_ms)
Queue size limits (async_insert_max_query_number)

These settings allow administrators to balance:

Insert latency
Memory usage
Throughput

Depending on the workload, ClickHouse can automatically optimize write efficiency without requiring application changes.
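To make the flush conditions concrete, here is a toy buffer in Python. AsyncInsertBuffer and its parameters are hypothetical names that mirror the three controls listed above. The real server keeps separate buffers per query shape and settings, which this sketch ignores.

```python
class AsyncInsertBuffer:
    """Collect small inserts and flush them as one combined write."""

    def __init__(self, max_bytes, max_queries, timeout_ms, clock):
        self.max_bytes = max_bytes
        self.max_queries = max_queries
        self.timeout_ms = timeout_ms
        self.clock = clock              # returns the current time in seconds
        self.pending = []               # one list of rows per buffered insert
        self.pending_bytes = 0
        self.first_at = None
        self.flushed = []               # each element is one combined write

    def insert(self, rows):
        if not self.pending:
            self.first_at = self.clock()
        self.pending.append(rows)
        self.pending_bytes += sum(len(row) for row in rows)
        self._maybe_flush()

    def tick(self):
        """Called periodically so the timeout can fire without new inserts."""
        self._maybe_flush()

    def _maybe_flush(self):
        if not self.pending:
            return
        age_ms = (self.clock() - self.first_at) * 1000
        if (self.pending_bytes >= self.max_bytes
                or len(self.pending) >= self.max_queries
                or age_ms >= self.timeout_ms):
            combined = [row for rows in self.pending for row in rows]
            self.flushed.append(combined)
            self.pending = []
            self.pending_bytes = 0

now = [0.0]
buf = AsyncInsertBuffer(max_bytes=1_000, max_queries=3, timeout_ms=200,
                        clock=lambda: now[0])
buf.insert(["a", "b"])
buf.insert(["c"])
buf.insert(["d"])          # third query reaches max_queries: flush
buf.insert(["e"])
now[0] = 0.25              # 250 ms later the timeout flushes the rest
buf.tick()
print(buf.flushed)         # [['a', 'b', 'c', 'd'], ['e']]
```

Raising the thresholds produces fewer, larger writes at the cost of rows waiting longer in memory, which is the latency and throughput balance described above.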

Enabling Asynchronous Inserts

Asynchronous inserts can be enabled at the session level.

SET async_insert = 1;

SET wait_for_async_insert = 1;

They can also be set for an individual INSERT statement.

INSERT INTO events
SETTINGS
    async_insert = 1,
    wait_for_async_insert = 1
VALUES (...);

What do these settings mean?

async_insert = 1

Enables asynchronous inserts, allowing ClickHouse to buffer incoming requests before writing them to storage.

wait_for_async_insert = 1

Waits until the buffered data has been successfully written before acknowledging the insert to the client.

These settings can also be applied by default through a settings profile (for example in users.xml) or per user, making them suitable for production deployments with consistent ingestion patterns.

When Does Automatic Batching Help?

Automatic batching is particularly valuable for workloads that continuously generate many small inserts.

Common examples include:

IoT Platforms

Thousands of sensors continuously publish measurements.

Instead of immediately writing every reading, ClickHouse batches them into larger writes.

Application Logging

Modern applications generate logs every few milliseconds.

Batching dramatically reduces write overhead.

Monitoring Systems

Monitoring agents continuously send metrics.

Automatic batching helps reduce part creation while maintaining near real-time visibility.

Event Streaming

Applications consuming events from Kafka, RabbitMQ, or Pulsar often insert relatively small batches.

ClickHouse combines them automatically for higher throughput.

Clickstream Analytics

User interactions arrive continuously throughout the day.

Batching improves scalability without requiring changes to event producers.

Benefits

Feature	Benefit
Larger insert batches	Higher throughput
Fewer data parts	Lower merge overhead
Better compression	Reduced storage usage
Lower CPU utilization	More efficient ingestion
Lower disk I/O	Faster writes
Better scalability	Handles higher event rates

Before vs After

Without Automatic Batching	With Automatic Batching
Many tiny writes	Writes combined automatically
Large number of data parts	Significantly fewer parts
Frequent merges	Reduced merge activity
Higher CPU usage	Lower CPU usage
Lower throughput	Higher throughput

Monitoring Insert Performance

Several ClickHouse system tables help monitor insert behavior.

System Table	Purpose
`system.parts`	Active data parts
`system.part_log`	Part creation history
`system.merges`	Background merge activity
`system.metrics`	Insert-related metrics

Example:

SELECT
    database,
    table,
    count() AS active_parts
FROM system.parts
WHERE active = 1
GROUP BY database, table
ORDER BY active_parts DESC;

If a table contains an unusually large number of active parts, it may indicate that inserts are arriving in very small batches and causing excessive merge activity.

Monitoring these tables regularly can help identify ingestion bottlenecks before they affect query performance.

Best Practices

To maximize ingestion throughput:

Enable asynchronous inserts for high-frequency workloads.
Prefer larger batches whenever your application allows.
Avoid single-row inserts whenever possible.
Monitor active data parts using system.parts.
Watch background merges using system.merges.
Tune asynchronous insert thresholds based on workload characteristics.
Measure throughput before and after enabling batching to quantify improvements.

When Automatic Batching Provides Limited Benefit

Although batching improves many workloads, it isn't beneficial in every situation.

Performance gains may be limited when:

Your application already sends large batch inserts.
Insert operations occur infrequently.
Immediate data visibility is more important than throughput.
Your workload already produces relatively few data parts.

In these cases, batching provides little additional optimization because the application has already minimized insert overhead.

Things to Consider

Automatic batching introduces a small trade-off.

Because ClickHouse briefly buffers insert requests before writing them to disk, data may become visible slightly later than with synchronous inserts.

For most analytical workloads, this delay is negligible.

The improvements in throughput, storage efficiency, CPU utilization, and merge performance typically outweigh the small increase in insert latency.

The optimal configuration depends on your workload's balance between:

Freshness requirements
Write throughput
Resource utilization

Final Thoughts

ClickHouse® 26.3 continues to improve one of its greatest strengths—high-speed data ingestion.

By enhancing the batching behavior of asynchronous inserts, ClickHouse automatically combines multiple small insert requests into larger writes, reducing the overhead associated with part creation, compression, metadata updates, and background merges.

For workloads involving logs, metrics, IoT telemetry, event streaming, clickstream analytics, or real-time monitoring, enabling asynchronous inserts can significantly improve scalability while reducing CPU usage, disk I/O, and storage fragmentation.

If your applications generate thousands of small inserts every second, automated insert batching is a simple optimization that can deliver substantial performance improvements with minimal changes to your existing ingestion pipeline.

As ClickHouse continues to evolve, features like this demonstrate how thoughtful engineering can improve both performance and operational efficiency, making it even better suited for modern, data-intensive applications.

References

ClickHouse® 26.3 Release Notes
ClickHouse® Documentation – Asynchronous Inserts
ClickHouse® Documentation – Bulk Inserts Best Practices

Day 93/100 – New Unicode String Functions in ClickHouse® 26.3: Better Text Search and Normalization

Kanishga Subramani — Wed, 22 Jul 2026 16:03:13 +0000

Introduction

Modern data platforms process enormous volumes of text originating from users all over the world. Names, addresses, product descriptions, customer reviews, search queries, and application logs often contain accented characters, Unicode symbols, ligatures, emojis, and language-specific casing rules. While these characters make text accurate and meaningful for users, they also introduce challenges when performing searches, comparisons, joins, deduplication, or analytics.

Consider a few common examples:

résumé and resume
Straße and STRASSE
crème brûlée and creme brulee
Different Unicode representations of visually identical characters

Although these strings may look identical, or nearly identical, to people, they are stored differently internally. Comparisons that rely on simple case conversion with lower() or upper() often fail to treat them as equal, leading to missed search results, duplicate records, inconsistent grouping, and unexpected join failures.

To simplify Unicode-aware text processing, ClickHouse® 26.3 introduces three powerful new string functions:

caseFoldUTF8()
removeDiacriticsUTF8()
normalizeUTF8NFKCCasefold()

Together, these functions make it much easier to build multilingual applications that perform reliable searches, comparisons, and text normalization while following Unicode standards.

In this article, we'll explore each function in detail, understand when to use it, and walk through practical SQL examples.

Why Unicode Normalization Matters

Working with Unicode text is more complicated than it first appears.

Many characters have multiple valid Unicode representations. Two strings can look identical on screen while having completely different binary representations.

For example:

The letter é can exist as a single Unicode character.
It can also be represented as the character e followed by a combining accent.

Similarly:

Straße
STRASSE

represent the same German word, but lowercase conversion treats them differently: lowerUTF8() leaves ß unchanged, so Straße becomes straße while STRASSE becomes strasse.

Accent marks introduce similar problems:

résumé
resume

Both words may represent the same search intent, but standard string comparisons treat them as different values.

Without Unicode normalization:

Searches miss valid matches.
Duplicate detection becomes unreliable.
GROUP BY operations produce unexpected results.
JOIN conditions fail.
User-entered data becomes inconsistent across applications.

Unicode-aware normalization solves these problems by transforming text into standardized forms before comparison.
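
The two representations of é described above can be inspected directly. This sketch uses Python's standard `unicodedata` module rather than ClickHouse, purely to show the underlying Unicode behavior; the variable names are illustrative.

```python
import unicodedata

composed = "\u00e9"      # é as a single code point (U+00E9)
decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

# The strings render the same but differ in length and in their UTF-8 bytes.
print(len(composed), len(decomposed))
print(composed.encode("utf-8"), decomposed.encode("utf-8"))
print(composed == decomposed)

# Normalizing both sides to the same form (NFC here) makes them compare equal.
nfc_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)
print(nfc_equal)
```

A plain equality check, GROUP BY, or JOIN sees two different byte sequences here, which is why both sides need to be normalized to the same form before they are compared.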

Unicode Support Before ClickHouse® 26.3

ClickHouse® already provided several Unicode normalization and case-conversion functions before version 26.3.

Function	Description
`normalizeUTF8NFC()`	NFC normalization
`normalizeUTF8NFD()`	NFD normalization
`normalizeUTF8NFKC()`	Compatibility composition
`normalizeUTF8NFKD()`	Compatibility decomposition
`upperUTF8()`	UTF-8 uppercase conversion
`lowerUTF8()`	UTF-8 lowercase conversion

These functions remain useful for Unicode normalization, but ClickHouse® 26.3 expands the toolkit with three new functions specifically designed for modern multilingual text processing.

New Functions Introduced in ClickHouse® 26.3

1. caseFoldUTF8()

What does it do?

caseFoldUTF8() performs Unicode case folding.

Unlike lowerUTF8(), case folding is specifically designed for case-insensitive comparison according to the Unicode standard.

Instead of simply converting uppercase letters into lowercase letters, case folding also handles special Unicode characters that ordinary lowercase conversion cannot.

Example

SELECT
    caseFoldUTF8('Straße') AS value1,
    caseFoldUTF8('STRASSE') AS value2;

Output

value1	value2
strasse	strasse

Now compare them:

SELECT
    caseFoldUTF8('Straße') =
    caseFoldUTF8('STRASSE') AS is_equal;

Output

is_equal
1

Notice the difference.

Using lowerUTF8():

Straße → straße
STRASSE → strasse

These values are still different.

Unicode case folding correctly transforms:

ß → ss

making both strings identical.
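
The same distinction exists in other Unicode-aware languages. As a rough analogy only, Python's `str.lower()` and `str.casefold()` behave like the lowercase and case-folding operations described here. The sketch says nothing about how ClickHouse implements its functions.

```python
# Lowercasing keeps ß, so the two spellings still differ.
lowered = ("Straße".lower(), "STRASSE".lower())
print(lowered)

# Full Unicode case folding maps ß to ss, so both spellings converge.
folded = ("Straße".casefold(), "STRASSE".casefold())
print(folded)

# A case-insensitive comparison should therefore fold both sides.
is_equal = "Straße".casefold() == "STRASSE".casefold()
print(is_equal)
```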

Practical Example – Case-Insensitive Name Search

SELECT
    user_id,
    name,
    country
FROM default.users
WHERE caseFoldUTF8(name) =
      caseFoldUTF8('müller');

This query successfully matches:

Müller
MÜLLER
MüLLeR
müller

regardless of how users entered the name.

More Examples

SELECT
    caseFoldUTF8('Hello World') AS english,
    caseFoldUTF8('HÉLLO WÖRLD') AS accented,
    caseFoldUTF8('Ω') AS greek;

Output

english	accented	greek
hello world	héllo wörld	ω

2. removeDiacriticsUTF8()

What does it do?

Many European languages use accent marks known as diacritics.

For example, é in résumé, ï in naïve, ã in São Paulo, and ü in Zürich.

Although these marks matter for spelling and pronunciation, many search systems ignore them so users can find results without typing accents.

removeDiacriticsUTF8() removes these accent marks while preserving the base letters.

Example

SELECT
    removeDiacriticsUTF8('crème brûlée') AS result;

Output

result
creme brulee

More Examples

SELECT
    removeDiacriticsUTF8('résumé') AS example1,
    removeDiacriticsUTF8('naïve') AS example2,
    removeDiacriticsUTF8('São Paulo') AS example3,
    removeDiacriticsUTF8('Zürich') AS example4;

Output

example1	example2	example3	example4
resume	naive	Sao Paulo	Zurich

Practical Example – Accent-Insensitive Search

SELECT
    city,
    country,
    population
FROM default.cities
WHERE removeDiacriticsUTF8(lowerUTF8(city))
    = removeDiacriticsUTF8(lowerUTF8('sao paulo'));

This matches:

São Paulo
Sao Paulo
SAO PAULO
são paulo

without requiring multiple conditions.
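
One common way to get this behavior outside the database is to decompose the text and drop the combining marks. The Python sketch below illustrates that general technique; it is an assumption that ClickHouse works similarly, and the helper names `strip_diacritics` and `search_key` are hypothetical. It also uses case folding rather than lowercasing for the search key.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose to NFD, drop combining marks (category Mn), recompose to NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)

def search_key(text: str) -> str:
    """Case-insensitive and accent-insensitive key for equality matching."""
    return strip_diacritics(text.casefold())

print(strip_diacritics("crème brûlée"))
print(strip_diacritics("São Paulo"))

# All four spellings from the example collapse to the same key.
keys = {search_key(s) for s in ["São Paulo", "Sao Paulo", "SAO PAULO", "são paulo"]}
print(keys)
```

Letters that are not a base letter plus a combining mark, such as ø or ß, pass through this approach unchanged, so it is worth testing against the characters that actually occur in your data.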

3. normalizeUTF8NFKCCasefold()

What does it do?

normalizeUTF8NFKCCasefold() combines two powerful Unicode operations into one function:

Unicode Compatibility Normalization (NFKC)
Unicode Case Folding

It is the most comprehensive Unicode normalization function introduced in ClickHouse® 26.3.

Unlike removeDiacriticsUTF8(), this function does not remove accents. Instead, it standardizes compatibility characters and applies Unicode-aware case folding.

What is NFKC Normalization?

Unicode includes many compatibility characters that look different but represent the same logical character.

Examples include:

Character	Normalized Result
ﬁ	fi
Ａ	A
²	2

NFKC converts these compatibility characters into their standard equivalents.

Example

SELECT
    normalizeUTF8NFKCCasefold('ﬁle') AS normalized;

Output

normalized
file

The ligature ﬁ becomes the letters fi, and case folding is also applied.

More Examples

SELECT
    normalizeUTF8NFKCCasefold('ﬁle') AS ligature,
    normalizeUTF8NFKCCasefold('Ａ Ｂ Ｃ') AS fullwidth,
    normalizeUTF8NFKCCasefold('²²') AS superscript,
    normalizeUTF8NFKCCasefold('Straße') AS german;

Output

ligature	fullwidth	superscript	german
file	a b c	22	strasse
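
To see the two steps separately, the combination can be approximated with Python's standard library. This is a sketch, not the exact Unicode NFKC_Casefold operation: the full definition also removes default-ignorable code points, which this version skips. The function name `nfkc_casefold` is hypothetical.

```python
import unicodedata

def nfkc_casefold(text: str) -> str:
    """Approximate NFKC_Casefold: NFKC, then case folding, then NFKC again."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    # Case folding can produce sequences that need renormalizing.
    return unicodedata.normalize("NFKC", folded)

ligature = nfkc_casefold("\ufb01le")               # ﬁ ligature followed by "le"
fullwidth = nfkc_casefold("\uff21 \uff22 \uff23")  # full-width A, B, C
superscript = nfkc_casefold("\u00b2\u00b2")        # two superscript twos
german = nfkc_casefold("Stra\u00dfe")              # Straße
accented = nfkc_casefold("CAF\u00c9")              # CAFÉ keeps its accent

print(ligature, fullwidth, superscript, german, accented)
```

The accent on café survives, which is the key difference from diacritic removal: compatibility characters and letter case are unified, but accented and unaccented spellings stay distinct.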

Practical Example – Normalizing User Search Queries

SELECT
    user_id,
    normalizeUTF8NFKCCasefold(search_query) AS normalized_query
FROM default.search_logs
WHERE normalizeUTF8NFKCCasefold(search_query)
      =
      normalizeUTF8NFKCCasefold('café')
LIMIT 10;

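This query matches search terms consistently even when users type different Unicode compatibility characters or different letter cases. Because NFKC also composes characters, café matches whether é was stored as a single code point or as e followed by a combining accent.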

If you also want accent-insensitive matching, combine it with removeDiacriticsUTF8().

Which Function Should You Use?

Function	Primary Purpose	Typical Use Case
`caseFoldUTF8()`	Unicode-aware case comparison	Case-insensitive matching
`removeDiacriticsUTF8()`	Removes accents	Accent-insensitive search
`normalizeUTF8NFKCCasefold()`	Compatibility normalization + case folding	Deduplication and equality checks
`removeDiacriticsUTF8(caseFoldUTF8())`	Case + accent normalization	Flexible multilingual search

Also New in ClickHouse® 26.3: naturalSortKey()

Although not a Unicode normalization function, ClickHouse® 26.3 also introduces naturalSortKey(), which provides human-friendly sorting for strings containing numbers.

Example:

SELECT name
FROM files
ORDER BY naturalSortKey(name);

Output

name
file1.txt
file2.txt
file10.txt
file20.txt
file100.txt

Without naturalSortKey(), standard lexicographical sorting compares strings character by character, so file10.txt would sort before file2.txt because '1' comes before '2'.

Natural sorting recognizes numeric values inside strings and orders them the way people naturally expect.
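
The idea behind a natural sort key can be sketched in a few lines of Python. This shows the general technique of splitting a string into text and number chunks, and is not a description of how naturalSortKey() is implemented; the helper name `natural_key` is hypothetical.

```python
import re

def natural_key(name: str) -> list:
    """Split into alternating text and digit runs; compare digit runs as integers."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd positions are always the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ["file10.txt", "file2.txt", "file100.txt", "file1.txt", "file20.txt"]

lexicographic = sorted(names)
natural = sorted(names, key=natural_key)

print(lexicographic)
print(natural)
```

Plain sorting puts file10.txt and file100.txt ahead of file2.txt, while the key-based sort compares 2, 10, 20, and 100 as numbers.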

Quick Reference

Function	Example Input	Output
`caseFoldUTF8()`	Straße	strasse
`caseFoldUTF8()`	MÜLLER	müller
`removeDiacriticsUTF8()`	crème brûlée	creme brulee
`removeDiacriticsUTF8()`	São Paulo	Sao Paulo
`normalizeUTF8NFKCCasefold()`	ﬁle	file
`normalizeUTF8NFKCCasefold()`	Straße	strasse
`naturalSortKey()`	file10.txt	Sorts after file2.txt

Best Practices

Use caseFoldUTF8() instead of lowerUTF8() whenever you need reliable Unicode-aware case-insensitive comparisons.
Use removeDiacriticsUTF8() for search functionality where accents should not affect matching, such as names, cities, or product catalogs.
Combine removeDiacriticsUTF8(caseFoldUTF8(column)) to support both case-insensitive and accent-insensitive searches.
Use normalizeUTF8NFKCCasefold() when importing data from external systems that may contain compatibility characters, ligatures, or full-width Unicode characters.
Store normalized values as MATERIALIZED columns whenever possible so normalization occurs only once during inserts rather than on every query.
Choose the least aggressive normalization function that satisfies your use case to avoid altering text more than necessary.

Final Thoughts

As applications become increasingly global, handling multilingual text correctly is no longer optional. Reliable search, accurate deduplication, and consistent data processing all depend on understanding Unicode beyond simple uppercase and lowercase conversions.

The new Unicode string functions introduced in ClickHouse® 26.3—caseFoldUTF8(), removeDiacriticsUTF8(), and normalizeUTF8NFKCCasefold()—provide a modern toolkit for working with international text. Whether you're processing customer names, addresses, search queries, product catalogs, or user-generated content, these functions help ensure your comparisons behave consistently across languages and writing systems.

Combined with existing Unicode normalization functions and new utilities like naturalSortKey(), ClickHouse® continues to strengthen its support for real-world text processing while maintaining the high performance expected from a modern analytical database.

As multilingual datasets continue to grow, incorporating these Unicode-aware functions into your data pipelines and queries will help improve search accuracy, reduce duplicate records, and deliver a better experience for users around the world.

Day 92 of #100DaysOfClickHouse - Read-Only Tables in ClickHouse® 26.3

Kanishga Subramani — Tue, 21 Jul 2026 16:27:23 +0000

Read-Only Tables in ClickHouse® 26.3: How `table_readonly` Works and When to Use It

ClickHouse® 26.3 introduces a useful new MergeTree table setting called table_readonly, designed to help administrators protect finalized datasets from accidental modification while keeping them fully available for analytical queries. Although it is a relatively small feature compared to some of the larger additions in recent releases, it addresses a common operational challenge faced by many organizations that manage historical, archived, or compliance-sensitive data.

When the table_readonly setting is enabled, ClickHouse® rejects INSERT operations and other write requests that would modify the contents of the table, while continuing to execute SELECT queries normally. Existing applications, dashboards, reports, and analytical workloads can continue accessing the data without interruption, but any attempt to change the table is immediately blocked. This allows organizations to preserve the integrity of finalized datasets without relying solely on application logic or administrative procedures.

The setting can be configured either during table creation or applied later using an ALTER TABLE statement, making it easy to convert an existing MergeTree table into a read-only dataset without changing its schema or migrating any data. Because the configuration takes effect immediately, administrators can quickly protect a table once it has been validated, archived, or published for long-term reporting.

One of the primary motivations behind introducing table_readonly is the growing need to manage immutable datasets. In many analytical environments, data eventually reaches a stage where it should never change again. Historical event logs, archived customer activity, yearly sales summaries, completed ETL outputs, regulatory datasets, audit records, and approved financial reports are all examples of information that should remain unchanged after validation. Before ClickHouse® 26.3, preventing accidental writes typically required carefully managing user permissions, adding safeguards within applications, or relying on operational discipline. The new setting moves this protection directly into the database engine, ensuring that the table itself refuses any modification attempts regardless of where they originate.

Another significant advantage is its impact on MergeTree's background processing. Under normal circumstances, MergeTree tables continuously execute background tasks such as merging data parts and performing maintenance operations to optimize storage. Since a table marked as read-only can no longer receive new data, these background merge threads become unnecessary. ClickHouse® therefore stops running these maintenance operations for read-only tables, reducing resource consumption and making the feature particularly beneficial for archived datasets that are expected to remain unchanged indefinitely.

Although the feature provides strong protection against accidental writes, it should not be confused with a security mechanism. The table_readonly setting does not authenticate users, enforce Role-Based Access Control (RBAC), manage user permissions, encrypt stored data, or determine who can query the table. Those responsibilities remain part of ClickHouse®'s existing security model. Instead, table_readonly serves as an additional operational safeguard that complements authentication and authorization by ensuring that even authorized users or applications cannot accidentally modify a table that has been intentionally locked.

The feature has numerous practical applications in production environments. Organizations managing data warehouses can mark yearly archive tables as read-only once reporting periods have closed. Financial institutions can protect approved monthly and quarterly reports from accidental updates after publication. Compliance and audit datasets can be preserved once collection periods end, helping maintain regulatory consistency. Static reference datasets, such as archived product catalogs, country code mappings, or historical configuration snapshots, can also benefit from being marked as immutable. In addition, table_readonly provides an extra layer of defense against common operational mistakes, such as ETL jobs targeting the wrong table, deployment scripts inserting data into archived datasets, or manual SQL operations modifying production history.

The article demonstrates these capabilities through a complete hands-on example. A MergeTree table is created and populated with sample sales data before being converted into read-only mode using ALTER TABLE ... MODIFY SETTING table_readonly = 1. Once enabled, analytical queries continue to execute exactly as before, allowing dashboards and reports to function normally. However, any attempt to insert additional rows immediately fails because the database now rejects write operations against the protected table. If business requirements change and updates become necessary later, administrators can simply disable the setting by changing it back to 0, instantly restoring normal write behavior without affecting the stored data.

Like any feature, table_readonly has important limitations. It is available only for MergeTree-family tables and should not be considered a replacement for comprehensive access-control mechanisms. Organizations should continue using RBAC, user permissions, network security, and encryption where appropriate, while treating table_readonly as an additional layer of operational protection. Furthermore, it is unsuitable for tables that receive continuous data ingestion, since enabling the setting would intentionally block normal write operations required by those workloads.

Overall, table_readonly is a practical quality-of-life improvement introduced in ClickHouse® 26.3 that simplifies the management of immutable datasets. By allowing MergeTree tables to reject write operations while remaining fully queryable, and by eliminating unnecessary background maintenance for archived tables, it provides administrators with a lightweight yet highly effective mechanism for protecting historical data. Whether used for financial reporting, compliance, long-term analytics, audit records, or archived operational data, table_readonly helps ensure data integrity while reducing the risk of accidental modifications in production environments.

Day 91 of #100DaysOfClickHouse - Partition Pruning in ClickHouse®

Kanishga Subramani — Tue, 21 Jul 2026 16:04:05 +0000

Partition Pruning in ClickHouse®: How It Really Works

Introduction

Partition pruning is one of the reasons a well-designed ClickHouse® table can answer queries like "Show me last Tuesday's events" without scanning years of unrelated data.

At first glance, the concept seems simple: skip the data you don't need. However, partition pruning is also one of the most misunderstood optimization techniques in ClickHouse®. It's frequently confused with primary key indexing, can be silently disabled by an unsuitable WHERE clause, and is often implemented on the wrong column altogether.

In this article, we'll explore what partition pruning actually does, how ClickHouse® decides whether it can prune partitions, how to verify that it's working, and the situations where it quietly stops helping.

What Is a Partition?

A partition is a logical grouping of data parts on disk.

When creating a MergeTree table, you define a partitioning expression using PARTITION BY. Every inserted row is assigned to a partition based on that expression, and every data part belongs to exactly one partition.

Because parts never mix rows from different partitions, ClickHouse® can completely skip every part of a partition that cannot contain matching data.

CREATE TABLE visits
(
    VisitDate Date,
    Hour UInt8,
    ClientID UUID
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(VisitDate)
ORDER BY Hour;

In this example:

Every month's data is stored in its own partition.
June 2026 data is stored separately from July 2026.
August 2026 data is stored separately again.

Now consider this query:

SELECT count()
FROM visits
WHERE VisitDate >= '2026-06-01'
  AND VisitDate < '2026-07-01';

Since only June data can satisfy the filter, ClickHouse® reads only the parts in the June partition and skips the parts of every other month.

This is partition pruning.

Partition Pruning vs Primary Key Pruning

Many users assume partition pruning is ClickHouse®'s primary indexing mechanism.

It isn't.

The primary key (ORDER BY) remains the most important optimization for query performance because it determines how rows are ordered inside each data part.

Partitioning serves different purposes:

Eliminating entire partitions before reading data
Making retention operations extremely fast
Simplifying storage tiering
Reducing unnecessary disk access

Think of it this way:

Partition pruning decides which folders to open.
Primary key pruning decides which files inside those folders to read.

Both optimizations work together, but they solve different problems.

Seeing Partition Pruning in Action

The easiest way to verify partition pruning is with EXPLAIN.

EXPLAIN indexes = 1
SELECT count()
FROM visits
WHERE VisitDate >= '2026-06-01'
  AND VisitDate < '2026-07-01';

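The Indexes section of the plan lists each pruning stage that was applied, typically MinMax, Partition, and PrimaryKey. For every stage it shows:

The key columns used
The condition derived from the WHERE clause
The parts and granules selected out of the total

If the partition stages keep only the parts of one month while the table holds dozens of months, pruning is working correctly.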

Inspecting Partitions

You can also inspect your table's partitions directly.

SELECT
    partition,
    count() AS parts,
    sum(rows) AS row_count
FROM system.parts
WHERE table = 'visits'
  AND active = 1
GROUP BY partition
ORDER BY partition;

This query shows:

Existing partitions
Number of active data parts
Rows stored inside each partition

Suppose EXPLAIN reports that only one partition is scanned, but that partition contains hundreds of small parts and millions of rows.

In that case, partition pruning is working correctly.

The issue is likely excessive fragmentation or an overly coarse partition key—not pruning itself.

When Partition Pruning Works Automatically

Partition pruning succeeds when ClickHouse® can determine which partitions satisfy the filter.

One important reason this works is that ClickHouse® understands certain monotonic functions.

A monotonic function preserves ordering: if a ≤ b, then f(a) ≤ f(b) (or the reverse, for a decreasing function).

Examples include:

toDate()
toYYYYMM()
toYear()

Consider this table:

PARTITION BY toYYYYMM(VisitDate)

Even though the query filters on VisitDate rather than toYYYYMM(VisitDate), ClickHouse® can infer which monthly partitions are relevant.

WHERE VisitDate >= '2026-06-01'
  AND VisitDate < '2026-07-01'

The optimizer understands that only the June partition can contain matching rows.
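
The inference can be sketched in Python. This is a simplified model of the idea, not ClickHouse's actual analysis: because the partition function is monotonic, applying it to the two ends of the date range yields the range of partition ids that can contain matching rows. The names `to_yyyymm` and `prune` and the partition list are hypothetical.

```python
from datetime import date, timedelta

def to_yyyymm(d: date) -> int:
    """Python stand-in for the partition expression toYYYYMM(VisitDate)."""
    return d.year * 100 + d.month

def prune(partitions, lo: date, hi_exclusive: date):
    """Partition ids that can hold rows with lo <= VisitDate < hi_exclusive."""
    last = hi_exclusive - timedelta(days=1)   # last date inside the range
    lo_id, hi_id = to_yyyymm(lo), to_yyyymm(last)
    # Monotonicity guarantees every date in the range maps into [lo_id, hi_id].
    return [p for p in partitions if lo_id <= p <= hi_id]

partitions = [202604, 202605, 202606, 202607, 202608]

selected = prune(partitions, date(2026, 6, 1), date(2026, 7, 1))
print(selected)

wider = prune(partitions, date(2026, 5, 15), date(2026, 7, 2))
print(wider)
```

Only the endpoints of the filter are evaluated, and everything outside the resulting id range is skipped without looking at any rows.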

When Partition Pruning Fails

Partition pruning quietly stops working when ClickHouse® cannot infer the relationship between the filter and the partition expression.

Consider this partition key:

PARTITION BY cityHash64(user_id) % 16

Now execute:

SELECT *
FROM events
WHERE user_id = 42;

Although the filter clearly identifies one user, ClickHouse® cannot determine which hash bucket contains that user without evaluating the hash.

As a result:

Every partition must be considered.
No pruning occurs.
The query becomes more expensive.

Pruning would only work if the query filtered using the exact same expression:

WHERE cityHash64(user_id) % 16 = 5

Hash functions, modulo operations, and many other derived expressions are non-monotonic, preventing ClickHouse® from working backwards to eliminate partitions.

This behavior is expected and documented—it is not a bug.
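
The loss of ordering is easy to see outside the database. The Python sketch below uses BLAKE2 from the standard library as a stand-in for cityHash64, which Python does not ship, so the bucket numbers will not match ClickHouse. It only illustrates that a contiguous range of ids scatters across all buckets, leaving no endpoints to reason from. The helper name `bucket` is hypothetical.

```python
import hashlib

def bucket(user_id: int, buckets: int = 16) -> int:
    """Hash an id into one of `buckets` buckets (stand-in for cityHash64(user_id) % 16)."""
    digest = hashlib.blake2b(str(user_id).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % buckets

# Neighbouring ids land in unrelated buckets: ordering is not preserved.
print([bucket(uid) for uid in range(40, 46)])

# A contiguous range of 1,000 ids spreads over the buckets instead of
# mapping to a contiguous sub-range of them.
touched = {bucket(uid) for uid in range(1, 1001)}
print(len(touched))
```

With a date-based key, the two ends of a range bound the partitions in between. With a hash-based key, the only way to learn where an id lives is to compute the hash for that id.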

Designing a Partition Key That Actually Helps

A good partition key should reflect how users actually query the data.

Some practical guidelines include:

Partition by your most common filter

If nearly every query filters by date, partition by date.

Monthly partitions are usually the best balance between pruning effectiveness and operational overhead.

Daily partitions are generally appropriate only for extremely high-volume workloads such as observability or logging systems.

Avoid high-cardinality partition keys

Never partition by columns such as:

user_id
session_id
order_id

Doing so creates enormous numbers of tiny partitions, increasing merge overhead and reducing overall performance.

These columns usually belong at the beginning of the ORDER BY clause instead.

Keep the partition count reasonable

Thousands of partitions are usually a warning sign.

Each partition introduces additional metadata and background merge work.

Fewer, larger partitions generally perform better than many tiny ones.

Align partitioning with TTL policies

Partitioning also affects data lifecycle management.

If your TTL removes old data by month, partitioning by month means all rows in a part expire together, so ClickHouse® can simply drop whole parts.

Without matching partition boundaries, parts mix expired and live rows, and TTL has to rewrite those parts to remove the expired rows, which is slower.

Final Thoughts

Partition pruning is one of the simplest yet most effective optimizations in ClickHouse®.

Its job is straightforward:

Skip entire partitions that cannot satisfy the query.

However, getting the full benefit requires thoughtful schema design.

Choose a partition key that matches your most common filtering patterns, verify pruning with EXPLAIN indexes = 1, and inspect system.parts whenever performance doesn't match expectations.

When partitioning is designed around real query patterns instead of assumptions, ClickHouse® can eliminate massive amounts of unnecessary I/O before reading a single data granule.

For more ClickHouse® optimization guides and operational best practices, explore the CHOps feature page:

https://www.ch-ops.io/features

Day 90 - Data Loss Warning: Why You Cannot Downgrade from ClickHouse® 26.3

Kanishga Subramani — Mon, 20 Jul 2026 18:14:29 +0000

Introduction

Upgrading ClickHouse® is typically a straightforward process that brings new features, performance improvements, bug fixes, and better SQL compatibility. However, every major release can also introduce changes that affect operational workflows. One of the most important changes introduced in ClickHouse® 26.3 is related to its on-disk storage format.

Starting with ClickHouse® 26.3, once the server begins creating data using the new storage format introduced in this release, downgrading to an earlier ClickHouse® version is no longer supported. Older versions cannot understand the new data layout, and attempting to downgrade after new-format data parts have been written may result in startup failures, unreadable tables, or even data loss.

If your organization depends on rollback procedures as part of its upgrade strategy, this is a critical change to understand before deploying ClickHouse® 26.3 in production.

In this article, we'll explain why downgrades are no longer supported, how the new storage format works, the risks involved, and the best practices for safely upgrading production clusters.

Why Can't You Downgrade?

ClickHouse® stores data on disk using highly optimized internal storage formats. These formats evolve over time to support new capabilities, improve performance, reduce storage consumption, and enable new features.

ClickHouse® 26.3 introduces updates to some of these internal storage structures.

The upgrade process looks like this:

ClickHouse® 26.2
        │
        ▼
Upgrade to 26.3
        │
        ▼
New Data Parts Created
        │
        ▼
❌ Older Versions Cannot Read Them

Unlike SQL syntax or configuration files, the physical storage format is tightly coupled to the ClickHouse® engine itself. Once data has been written using a newer storage format, previous ClickHouse® binaries no longer understand those structures.

What Happens During an Upgrade?

Immediately after upgrading, your existing data remains in its original format.

However, ClickHouse® keeps writing new data parts during normal operation.

The operations that create new parts include:

Background merges
INSERT operations
Mutations
OPTIMIZE commands
TTL merges

As these operations execute, new data parts begin using the updated storage format.

The process looks like this:

Upgrade
   │
   ▼
Old Data Parts
   │
   ▼
Background Merge
   │
   ▼
New Format Data Parts

Initially, only a small portion of the data uses the new format. Over time, more and more data parts are rewritten until a significant portion of the database depends on the new storage structures.

At that point, reverting to an older ClickHouse® version becomes unsafe.

Why Introduce a New Storage Format?

Database storage engines continuously evolve.

New storage formats often provide:

Faster query execution
Better compression ratios
Improved metadata handling
Reduced storage overhead
Support for new database features
More efficient background processing

These improvements are one of the reasons ClickHouse® continues to deliver excellent analytical performance.

The downside is that older versions cannot always understand the new on-disk representation.

What Can Happen If You Downgrade?

Attempting to downgrade after new-format data parts have been created can cause several problems.

Possible outcomes include:

ClickHouse® failing to start
Tables becoming inaccessible
Missing data parts
Metadata inconsistencies
Data corruption
Permanent data loss

Even if the older server starts successfully, it may be unable to read tables containing newer storage structures.

Example Upgrade Timeline

Let's walk through a typical scenario.

Day 1

The server runs ClickHouse® 26.2.

26.2

├── Part A
├── Part B
└── Part C

Everything is stored using the older format.

Day 2

The administrator upgrades to ClickHouse® 26.3.

At this point, existing data parts are still in the old format because they have not yet been rewritten.

Day 3

Background merges execute automatically.

26.3

├── Part D (New Format)
├── Part E (New Format)
└── Part F (New Format)

The new storage format is now actively being used.

Day 4

The administrator decides to reinstall ClickHouse® 26.2.

Unfortunately:

Part D cannot be read.
Part E cannot be read.
Part F cannot be read.

Result:

❌ Downgrade Failed
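
The timeline can be expressed as a small toy model in Python. Everything here is hypothetical and only mirrors the scenario above: the `format_version` field is an illustrative marker (1 for the old layout, 2 for the new one), not something ClickHouse exposes in this form.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Part:
    name: str
    format_version: int  # hypothetical marker: 1 = old layout, 2 = new layout

def unreadable_parts(parts, max_supported_format: int):
    """Names of parts written in a format newer than the binary understands."""
    return [p.name for p in parts if p.format_version > max_supported_format]

day2 = [Part("A", 1), Part("B", 1), Part("C", 1)]   # just upgraded, nothing rewritten
day3 = [Part("D", 2), Part("E", 2), Part("F", 2)]   # after background merges

OLD_BINARY_FORMAT = 1   # what the older release can read, in this model

print(unreadable_parts(day2, OLD_BINARY_FORMAT))
print(unreadable_parts(day3, OLD_BINARY_FORMAT))
```

On day 2 the older binary could still read every part, but by day 3 all remaining parts are in the newer format, which is why the downgrade on day 4 fails.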

What Does This Mean for Production?

Organizations should treat upgrades to ClickHouse® 26.3 as one-way upgrades.

Instead of depending on binary downgrades, build your deployment process around careful validation and reliable backups.

A recommended production workflow is:

Backup
   │
   ▼
Upgrade Staging
   │
   ▼
Validate Applications
   │
   ▼
Upgrade Production
   │
   ▼
Monitor Cluster

This approach minimizes operational risk while providing a safe recovery path if problems occur.

Best Practices Before Upgrading

1. Create a Full Backup

Always create a verified backup before upgrading.

Possible backup methods include:

ClickHouse® BACKUP command
Filesystem snapshots
Object storage backups
Cloud provider snapshots

A reliable backup is the safest rollback strategy.

2. Test in a Staging Environment

Before touching production, upgrade a staging cluster.

Validate:

SQL queries
Dashboards
Materialized Views
Replication
Scheduled jobs
Applications
Monitoring tools

Testing beforehand significantly reduces upgrade risk.

3. Read the Release Notes

Every major ClickHouse® release includes important information about:

Breaking changes
New features
Configuration updates
Compatibility notes

Never skip the release notes.

4. Schedule a Maintenance Window

Although rolling upgrades reduce downtime, production upgrades should still be scheduled during planned maintenance windows.

This allows enough time for validation and troubleshooting if necessary.

5. Monitor the Cluster After Upgrading

Once the upgrade is complete, monitor:

Background merges
Replication queues
Query latency
Disk usage
CPU utilization
Memory usage
Server logs
Application error rates

Early monitoring helps detect issues before users notice them.

What Is the Recommended Rollback Strategy?

Since binary downgrades are no longer reliable after new-format parts are created, the recommended recovery process is:

Problem Detected
        │
        ▼
Stop ClickHouse®
        │
        ▼
Restore Verified Backup
        │
        ▼
Install Previous Version
        │
        ▼
Start Server

Rather than attempting to reuse upgraded storage files, restore the database from a backup created before the upgrade.

This is the only reliable rollback strategy.

Upgrade Checklist

Before upgrading to ClickHouse® 26.3, verify the following:

Step	Recommendation
✅	Create verified backups
✅	Test upgrades in staging
✅	Review release notes
✅	Verify application compatibility
✅	Schedule a maintenance window
✅	Monitor after deployment
✅	Understand that downgrades are not supported once new-format data parts exist

Frequently Asked Questions

Can I downgrade immediately after upgrading?

Possibly—but only if ClickHouse® has not yet created any new-format data parts.

Once background merges, INSERT operations, mutations, or other maintenance tasks generate new-format parts, downgrading is no longer supported.

Does this affect replicated clusters?

Yes.

If one replica begins generating or exchanging new-format parts, replicas running older ClickHouse® versions will not be able to read them.

Upgrade planning becomes even more important in replicated environments.

Is this unique to ClickHouse®?

No.

Many database systems evolve their storage formats over time.

As new optimizations and capabilities are introduced, older releases eventually cannot read data written in formats that did not exist when they shipped.

Key Takeaways

Recommendation	Why It Matters
Create backups before upgrading	Provides a safe recovery point
Test upgrades in staging	Reduces production risk
Review release notes	Identifies breaking changes early
Don't rely on binary downgrades	Older versions cannot read the new storage format
Monitor after upgrading	Detects issues before they become outages

Common Upgrade Mistakes

Some of the most common mistakes include:

Skipping backups because "the upgrade should be safe."
Upgrading production before testing staging.
Assuming rollback simply means reinstalling an older package.
Ignoring release notes.
Not monitoring background merges after upgrading.
Forgetting that storage formats evolve alongside database engines.

Avoiding these mistakes greatly improves upgrade reliability.

Final Thoughts

ClickHouse® 26.3 delivers valuable improvements in performance, SQL compatibility, and overall database capabilities. Alongside those improvements comes an important operational consideration: once ClickHouse® begins writing data using the new storage format, downgrading to an earlier version is no longer supported.

Rather than viewing this as a limitation, it's an opportunity to adopt stronger upgrade practices. Reliable backups, staged testing, careful production rollouts, and post-upgrade monitoring provide a much safer recovery strategy than relying on binary downgrades.

If you're planning to upgrade to ClickHouse® 26.3, treat it as a one-way transition. Validate thoroughly before deployment, maintain verified backups, and monitor your environment closely after the upgrade. With the right preparation, you can take advantage of the latest features while minimizing operational risk and protecting your data.

Day 89 - Leveraging WebAssembly (WASM) UDFs in ClickHouse® 26.3

Kanishga Subramani — Mon, 20 Jul 2026 18:04:37 +0000

Introduction

ClickHouse® is well known for its extensive collection of built-in SQL functions. From string manipulation and mathematical operations to JSON processing, geospatial analytics, machine learning helpers, and date-time functions, the database provides everything needed for most analytical workloads.

However, every organization eventually encounters business logic that cannot be expressed using built-in SQL functions alone. Financial institutions may require proprietary risk calculations, retailers may have custom loyalty score algorithms, and manufacturing companies may need domain-specific validation rules.

Prior to ClickHouse® 26.3, implementing these custom operations often meant exporting data from ClickHouse®, processing it in an external application written in Python, Go, Rust, or Java, and then loading the results back into the database. Although functional, this approach introduces unnecessary data movement, increases latency, complicates data pipelines, and creates additional operational overhead.

ClickHouse® 26.3 introduces an exciting new capability: WebAssembly (WASM) User-Defined Functions (UDFs).

Instead of moving data outside the database, developers can now execute custom logic directly inside ClickHouse® through a secure WebAssembly runtime. Developers can write functions in languages such as Rust, C, C++, TinyGo, or AssemblyScript, compile them into a .wasm module, register them with ClickHouse®, and invoke them just like built-in SQL functions.

In this article, we'll explore what WASM UDFs are, how they work, where they are useful, their benefits and limitations, and best practices for using them effectively.

What Is a User-Defined Function (UDF)?

A User-Defined Function (UDF) is a custom function created by users to perform operations that are not available as built-in database functions.

For example, suppose your organization calculates customer loyalty using a proprietary algorithm based on purchase frequency, lifetime value, product categories, and customer tenure.

Without a UDF, this logic might be repeated across dozens of SQL queries or implemented in an external application.

With a UDF, the entire calculation becomes reusable.

Built-in functions:

SUM()
AVG()
COUNT()
LENGTH()

Custom function:

loyalty_score()

Once defined, it behaves like any other SQL function.

What Is WebAssembly (WASM)?

WebAssembly (WASM) is a portable binary instruction format designed for secure, high-performance execution.

Unlike native plugins, which run as unrestricted machine code inside the host process, WebAssembly modules run inside an isolated sandbox.

The typical workflow looks like this:

Rust / C / C++ / TinyGo / AssemblyScript
                │
                ▼
      Compile to .wasm
                │
                ▼
     ClickHouse® WASM Runtime
                │
                ▼
        SQL Query Execution

Because WebAssembly is language-independent, developers are free to choose whichever supported language best fits their project.

Why WASM UDFs Matter

Before ClickHouse® 26.3, implementing custom business logic usually required:

Exporting data from ClickHouse®.
Processing it in another application.
Writing the processed results back into ClickHouse®.

This introduces several disadvantages:

Additional infrastructure
More network traffic
Higher latency
Extra storage
More operational complexity
Increased maintenance effort

With WASM UDFs:

Processing stays inside ClickHouse®.
Data never leaves the database.
SQL becomes more expressive.
Business logic becomes reusable.
Pipelines become simpler.

How WASM UDFs Work

The development workflow is straightforward.

Write Function
      │
      ▼
Compile to WebAssembly (.wasm)
      │
      ▼
Register Module
      │
      ▼
Create SQL Function
      │
      ▼
Execute Inside SQL Queries

Once registered, the custom function behaves just like any built-in ClickHouse® function.

Supported Programming Languages

Any language capable of compiling to WebAssembly can be used.

Common choices include:

Rust
C
C++
TinyGo
AssemblyScript

Among these, Rust is a common choice because of its memory safety, solid tooling, and mature WebAssembly ecosystem.

Enabling WASM UDF Support

In ClickHouse® 26.3, WASM UDF support is still experimental.

It must be enabled explicitly.

<enable_wasm_udf>1</enable_wasm_udf>

After enabling the setting, restart ClickHouse®.

The server can then load WebAssembly modules.

Note: Because the feature is experimental, it should be thoroughly tested before enabling it in production environments.

Registering a WASM Module

After compiling your code into a .wasm file, register the module with ClickHouse®.

Conceptually:

custom_logic.wasm
        │
        ▼
ClickHouse®
        │
        ▼
Registered WASM Module

Each module can expose one or more exported functions.

Creating a WASM Function

After the module has been registered, create a SQL function that references one of the module's exported functions.

Calling the function is no different from calling any built-in SQL function.

SELECT
    customer_id,
    wasm_loyalty_score(total_orders, total_spent)
FROM customers;

The custom logic executes internally while ClickHouse® processes each row.
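
To make the call above more concrete, here is a minimal sketch of what the Rust side of a function like wasm_loyalty_score might look like. The name loyalty_score, the weights, and the caps are all hypothetical and chosen only for illustration; the article does not describe the interface ClickHouse® expects from a module, so treat this as the business logic alone rather than a ready-to-register UDF.

```rust
// Hypothetical scoring logic behind a function like wasm_loyalty_score.
// The weights and caps are invented for illustration only.
//
// To produce a .wasm file, a crate like this would typically set
// crate-type = ["cdylib"] and be built with:
//   cargo build --release --target wasm32-unknown-unknown
// A stable export name also requires a no_mangle attribute on the function.
// The exact interface ClickHouse expects is an open assumption here.

/// Returns a loyalty score in the range 0.0..=100.0.
pub extern "C" fn loyalty_score(total_orders: u32, total_spent: f64) -> f64 {
    // Treat negative, NaN, or infinite spend as zero so the result stays bounded.
    let spent = if total_spent.is_finite() && total_spent > 0.0 {
        total_spent
    } else {
        0.0
    };

    // 2 points per order, capped at 50 points.
    let order_points = (total_orders as f64 * 2.0).min(50.0);
    // 1 point per 100 currency units spent, capped at 50 points.
    let spend_points = (spent / 100.0).min(50.0);

    order_points + spend_points
}
```

The function takes only plain numeric arguments and has no side effects, which keeps it deterministic and easy to test outside the database before compiling it to WebAssembly.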

Example Use Case

Imagine an insurance company that calculates premiums using hundreds of proprietary business rules written in Rust.

Traditional workflow

ClickHouse®
     │
     ▼
Export Data
     │
     ▼
Rust Application
     │
     ▼
Import Results

Using WASM UDFs

ClickHouse®
      │
      ▼
WASM Premium Calculator
      │
      ▼
Query Results

Everything happens inside ClickHouse®, eliminating unnecessary data movement.

Real-World Use Cases

1. Financial Calculations

Banks, insurance providers, and fintech companies frequently implement proprietary algorithms.

Examples include:

Credit scoring
Loan eligibility
Risk calculations
Interest computations
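
As a rough illustration of an interest computation that could live in a UDF, the sketch below calculates compound interest in Rust. The function name and parameter layout are assumptions made for this example, and a real financial system would likely use fixed-point or decimal arithmetic rather than f64.

```rust
/// Interest earned on `principal` at `annual_rate` (0.05 means 5%),
/// compounded `periods_per_year` times per year for `years` years.
/// Hypothetical helper for illustration; uses f64 for simplicity.
pub fn compound_interest(
    principal: f64,
    annual_rate: f64,
    periods_per_year: u32,
    years: u32,
) -> f64 {
    // Without compounding periods there is nothing to accrue.
    if periods_per_year == 0 {
        return 0.0;
    }

    let n = periods_per_year as f64;
    // Saturate instead of overflowing on extreme inputs, then fit into i32 for powi.
    let total_periods = periods_per_year
        .saturating_mul(years)
        .min(i32::MAX as u32) as i32;

    // Future value minus the original principal.
    principal * (1.0 + annual_rate / n).powi(total_periods) - principal
}
```

For example, 1000 units at 5% compounded once a year for two years yields roughly 102.5 units of interest.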

2. Data Validation

Organizations often need custom validation rules.

Examples:

Customer ID validation
Tax number verification
Product code validation
Checksum calculations
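
A checksum rule is a good fit for this kind of function because it is small, pure, and easy to verify. The sketch below implements the well-known Luhn checksum in Rust; the function name and the decision to ignore spaces and hyphens are assumptions for this example, not something prescribed by ClickHouse®.

```rust
/// Returns true if `code` passes the Luhn checksum.
/// Spaces and hyphens are ignored; any other non-digit character fails validation.
pub fn luhn_is_valid(code: &str) -> bool {
    let mut sum: u32 = 0;
    let mut digits: u32 = 0;

    // Walk from the rightmost character; every second digit is doubled.
    for c in code.chars().rev() {
        if c == ' ' || c == '-' {
            continue;
        }
        let d = match c.to_digit(10) {
            Some(d) => d,
            None => return false,
        };
        let value = if digits % 2 == 1 {
            let doubled = d * 2;
            // Doubled values above 9 contribute the sum of their digits.
            if doubled > 9 { doubled - 9 } else { doubled }
        } else {
            d
        };
        sum += value;
        digits += 1;
    }

    // Require at least two digits so empty or single-digit input is rejected.
    digits > 1 && sum % 10 == 0
}
```

Rejecting unexpected characters outright, rather than skipping them, keeps the validation strict; whether that is the right choice depends on how the identifiers are stored.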

3. Text Processing

Many companies require organization-specific text operations.

Examples:

Product normalization
Keyword extraction
Custom tokenization
Domain-specific parsing
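
Product normalization is one example that is easy to picture. The following Rust sketch assumes a hypothetical rule set: surrounding whitespace is trimmed, common separators are dropped, and ASCII letters are uppercased. Real normalization rules would come from your own catalog conventions.

```rust
/// Normalizes a product code under an assumed rule set:
/// trim whitespace, drop separators (space, '-', '_', '.'), uppercase ASCII letters.
pub fn normalize_product_code(raw: &str) -> String {
    raw.trim()
        .chars()
        // Remove separator characters that vary between source systems.
        .filter(|&c| !matches!(c, ' ' | '-' | '_' | '.'))
        // Uppercase ASCII letters only; other characters pass through unchanged.
        .map(|c| c.to_ascii_uppercase())
        .collect()
}
```

With these rules, inputs such as "ab-12_cd.3" and "AB 12 CD 3" both normalize to "AB12CD3", so they can be grouped or joined as the same product.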

4. Scientific Computing

Research and engineering teams can implement mathematical algorithms unavailable in standard SQL.

5. Machine Learning Preprocessing

Prepare features before passing data into downstream ML pipelines.

Examples include:

Feature scaling
Normalization
Feature encoding
Custom transformations
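
Feature scaling is simple enough to sketch in a few lines. The Rust function below performs min-max scaling into the range 0.0 to 1.0; the name, the clamping of out-of-range values, and the handling of a degenerate range are assumptions for this example.

```rust
/// Scales `value` into 0.0..=1.0 using the known `min` and `max` of the feature.
/// Values outside the range are clamped. A degenerate range (max <= min, or NaN
/// bounds) returns 0.0. A NaN `value` propagates as NaN.
pub fn min_max_scale(value: f64, min: f64, max: f64) -> f64 {
    // `!(max > min)` is also true when either bound is NaN.
    if !(max > min) {
        return 0.0;
    }
    ((value - min) / (max - min)).clamp(0.0, 1.0)
}
```

The minimum and maximum are passed in as arguments, so the function stays stateless and returns the same output for the same inputs.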

Benefits of WASM UDFs

Feature	Benefit
Sandboxed execution	Improved security
High performance	Near-native execution speed
Multi-language support	Rust, C, C++, TinyGo, AssemblyScript, and more
Reusable modules	Write once, reuse everywhere
SQL integration	Functions behave like native SQL
Less infrastructure	Eliminates external processing pipelines
Reduced latency	Processing stays close to the data

WASM UDFs vs External Processing

WASM UDFs	External Processing
Runs inside ClickHouse®	Runs outside the database
Lower latency	Higher latency
No data movement	Export and import required
Integrated with SQL	Separate application
Easier maintenance	More infrastructure

Runtime Security

One of WebAssembly's biggest advantages is isolation.

The runtime:

Executes modules inside a secure sandbox.
Prevents direct operating system access.
Restricts unsafe operations.
Protects the ClickHouse® server from untrusted module code.

Compared with traditional native plugins, this provides a much safer extension mechanism.

Best Practices

When developing WASM UDFs:

Keep functions deterministic.
Avoid unnecessary memory allocations.
Focus each function on a single responsibility.
Benchmark performance before deployment.
Reuse compiled modules whenever possible.
Prefer built-in ClickHouse® functions when they already solve the problem efficiently.
Thoroughly test modules before production deployment.

Current Limitations

Although powerful, WASM UDFs are still evolving.

Current limitations include:

Experimental in ClickHouse® 26.3
Small runtime overhead compared to built-in functions
Not intended to replace native SQL functionality
Requires additional testing before production use

As the feature matures, some of these limitations are expected to ease.

When Should You Use WASM UDFs?

WASM UDFs are an excellent choice when implementing:

Proprietary business rules
Financial calculations
Data validation
Feature engineering
Specialized mathematical algorithms
Domain-specific text processing

However, if an equivalent built-in ClickHouse® function already exists, it will generally remain the preferred option for simplicity and performance.

Architecture Overview

              Application
                    │
                    ▼
               SQL Query
                    │
                    ▼
          ClickHouse® 26.3
                    │
      ┌─────────────┴─────────────┐
      │                           │
      ▼                           ▼
Built-in SQL Functions      WASM Runtime
      │                           │
      └─────────────┬─────────────┘
                    ▼
              Query Results

Best Practices for Production Adoption

If you're planning to experiment with WASM UDFs, consider the following recommendations:

Start with non-critical workloads.
Validate correctness against existing implementations.
Benchmark execution times with realistic datasets.
Monitor CPU and memory usage during execution.
Version your WebAssembly modules for easier deployment and rollback.
Keep your business logic modular so individual functions remain easy to maintain.

Following these practices will help you safely evaluate the feature while it continues to mature.
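
One way to approach the "validate correctness against existing implementations" recommendation is a small comparison harness that runs before a module is deployed. The Rust sketch below is a hypothetical helper, not part of any ClickHouse® tooling: it counts how many sample inputs produce different results from a reference implementation and a candidate implementation.

```rust
/// Counts inputs for which `candidate` differs from `reference` by more than
/// `tolerance`. The (u32, f64) input shape is an assumption matching a
/// two-argument scoring function; adapt it to the function under test.
pub fn count_mismatches<F, G>(
    inputs: &[(u32, f64)],
    reference: F,
    candidate: G,
    tolerance: f64,
) -> usize
where
    F: Fn(u32, f64) -> f64,
    G: Fn(u32, f64) -> f64,
{
    inputs
        .iter()
        .filter(|&&(orders, spent)| {
            let diff = (reference(orders, spent) - candidate(orders, spent)).abs();
            // A NaN difference fails `diff <= tolerance`, so it counts as a mismatch.
            !(diff <= tolerance)
        })
        .count()
}
```

Running a harness like this over a realistic sample of production inputs gives a quick signal on whether the new implementation can replace the old one.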

Final Thoughts

WebAssembly User-Defined Functions represent one of the most exciting extensibility features introduced in ClickHouse® 26.3.

By allowing developers to execute custom business logic directly inside the database, WASM UDFs reduce data movement, simplify analytics pipelines, and enable reusable domain-specific functionality without modifying the ClickHouse® source code.

Although the feature is currently experimental, it opens the door to a wide range of possibilities—from financial calculations and data validation to machine learning preprocessing and scientific computing—all while benefiting from WebAssembly's secure sandboxed execution model.

As ClickHouse® continues to evolve, WASM UDFs have the potential to become a powerful extension mechanism for organizations that need flexible, high-performance analytics with custom business logic executed exactly where the data lives.