DEV Community

Manveer Chawla
Manveer Chawla

Posted on

Can ClickHouse DELETE Data? A 2026 PR-by-PR Analysis

TL;DR

ClickHouse has supported DELETE operations since 2018. As of 2026, it ships four production-grade deletion paths: heavyweight ALTER TABLE DELETE, lightweight DELETE FROM (default since v23.3), patch-part DELETEs (v25.7), and ALTER TABLE DROP PARTITION for bulk operations. The "ClickHouse is immutable / append-only" narrative is outdated by eight years and 80+ merged PRs spanning five architectural eras, and the evidence is in the commit history.

  • We analyzed 80+ GitHub pull requests, official ClickHouse changelogs, and release blogs to trace the full evolution of DELETE support from 2018 through early 2026.
  • In 2018, ClickHouse shipped ALTER TABLE … DELETE as a heavyweight asynchronous mutation that rewrote affected data parts. The criticism that "deletes require heavy mutations" was fair — for that era. It was also the only delete path for four years.
  • By early 2026, ClickHouse ships standard SQL DELETE FROM (lightweight by default since v23.3), ALTER TABLE DELETE for guaranteed physical removal, ALTER TABLE DROP PARTITION for bulk deletion, patch-part-based lightweight updates and deletes, on-the-fly mutation visibility at SELECT time, and engine-level deletion patterns through ReplacingMergeTree(version, is_deleted) with optimized FINAL. None of these require experimental flags.
  • The single highest-impact change is the lightweight DELETE introduction (PR #37893), which redefined DELETE on MergeTree from "rewrite all affected parts" to "rewrite only _row_exists, hardlink the rest, filter on read." Benchmarks in the PR show 15 single-row deletes on a 100M-row, 12-column table dropping from ~8 seconds to ~200 ms — roughly 40× faster on the initial mask write.
  • Patch parts (PR #82004), shipped in v25.7, eliminated the part rewrite entirely. A DELETE becomes a tiny insert that sets _row_exists = 0, applied on read until a background merge consolidates it. ClickHouse's own benchmark blog series claims up to ~1,000× speedup for small/selective changes versus classic mutations (vendor benchmark — treat as an upper bound, not a guarantee).
  • On-the-fly mutations (PR #74877) and on-the-fly LWD (PR #79281) eliminated the "DELETE was issued but rows still appear" surprise. Queued deletes that haven't materialized are now applied at SELECT time.
  • The allow_experimental_lightweight_delete setting hasn't been needed since 23.3. It was aliased to enable_lightweight_delete in PR #50044 (commit 7189481, June 2023) for backward compatibility, then promoted to default-enabled.
  • Verdict: the "ClickHouse is immutable" advice made sense in 2017. Repeating it in 2026 is misinformation. ClickHouse offers four production-grade deletion paths covering compliance, bulk, selective, and high-frequency operational workloads, each with explicit trade-offs and observability.

Why People Still Say "ClickHouse Can't Delete"

If you've evaluated ClickHouse in the last few years, you've heard the warnings:

  • "ClickHouse can't delete data"
  • "ClickHouse is immutable / append-only"
  • "Deletes require heavy mutations"
  • "ReplacingMergeTree can't handle deletes"
  • "You need allow_experimental_lightweight_delete"
  • "FINAL is too slow for production queries"

Some of these started as legitimate ClickHouse guidance in the 2018–2020 era. The original ALTER TABLE … DELETE was deliberately syntactically heavyweight: ClickHouse was founded on the principle that "logs are immutable," and the ALTER syntax (rather than DELETE FROM) was an explicit signal that this was an administrative operation, not OLTP-style row modification. The cost model was honest: for a 100-column table, deleting a single row required reading and rewriting all 100 column files for the affected part.

In 2018, the criticism was largely fair. There was one delete path (ALTER TABLE DELETE), it was a full part rewrite, it was asynchronous, and you tracked it through system.mutations. If you needed selective row-level deletion at scale, you were going to feel it.

Then ClickHouse's engineering team spent eight years dismantling every one of those limitations. Over 80 significant pull requests merged. They added DELETE FROM syntax with a hidden-mask implementation (PR #37893), promoted it to GA (v23.3), made it synchronous by default (PR #44718), added IN PARTITION scoping (PR #67805), made it observable (apply_deleted_mask, has_lightweight_delete, parts_postpone_reasons), made it correct in the presence of projections and skip indexes (PRs #52517, #52530, #62364, #65594), and finally re-implemented it as a tiny patch-part write (PR #82004) so the part rewrite is gone entirely.

This article traces that evolution with PR-level evidence. No marketing claims. No benchmarks on toy datasets. Just the commit history.


Methodology: How We Analyzed ClickHouse's DELETE Commit History

We went through ClickHouse's GitHub commit history, pull requests, changelogs, and release blogs from 2018 through early 2026. The scope covered every PR that touched the DELETE subsystem: mutation engine changes, the lightweight-delete read path, patch parts, projection and skip-index correctness, replication and ON CLUSTER propagation, observability, and default configuration changes.

Each PR was classified by category (mutation engine, read-path filtering, storage interaction, correctness, settings, observability), impact severity, and whether it changed default behavior. We cross-referenced PR descriptions against changelog entries and release blog benchmarks to verify the claimed improvements. Where multiple PRs addressed the same subsystem (for example, the long tail of LWD-vs-projection bugs), we traced the dependency chain to understand how the incremental fixes compounded.

The result is a ranked set of PRs by impact, organized into five chronological eras, with full provenance. Every claim in this article maps to a specific merged PR or commit SHA that you can verify yourself on GitHub. Where two reputable sources cite different SHAs for the same PR (a known issue with squash-merges in older PR threads), we surface the conflict rather than picking one.

This isn't a benchmarking exercise. Benchmarks measure peak performance on controlled workloads. This analysis measures the engineering trajectory: what was built, why, and what it means for teams deciding whether ClickHouse can handle their delete patterns today.


ClickHouse DELETE Features in 2026: What Ships by Default

The current state, as of early 2026:

  • Standard SQL DELETE FROM: Lightweight by default since v23.3. Hidden _row_exists mask column, PREWHERE-injected at read time, hardlinks unaffected column files in wide parts. Synchronous by default with explicit lightweight_deletes_sync control.
  • ALTER TABLE … DELETE (heavyweight mutation): Retained for use cases that require guaranteed physical removal at completion (compliance, GDPR right-to-erasure, audit-bound workloads). Tracked in system.mutations, cancellable via KILL MUTATION.
  • ALTER TABLE DELETE … IN PARTITION: Both for heavyweight mutations (PR #13403, 2020) and lightweight DELETE (PR #67805, 2024). Prunes partitions before the delete plan even starts.
  • ALTER TABLE DROP PARTITION: Bulk deletion via durable empty parts (PR #41145), atomic and non-blocking for concurrent reads. The most efficient path for time-bounded data lifecycle operations.
  • Patch-part lightweight updates and DELETEs: Available since v25.7 via lightweight_delete_mode = 'lightweight_update'. A DELETE becomes a tiny insert of a patch part instead of a part rewrite.
  • On-the-fly mutation visibility: apply_mutations_on_fly = 1 makes queued deletes visible at SELECT time before background materialization completes.
  • ReplacingMergeTree(version, is_deleted): Engine-native upsert and tombstone semantics with OPTIMIZE TABLE … FINAL CLEANUP for forced physical removal. Combined with the optimized FINAL keyword for immediate consistency at query time.
  • Explicit physical-removal control: ALTER TABLE … APPLY DELETED MASK (PR #57433) forces materialization without waiting for background merges. min_age_to_force_merge_seconds and exclude_deleted_rows_for_part_size_in_merge give the merge selector the right inputs.
  • Observability: system.parts.has_lightweight_delete, removal_state, last_removal_attempt_time, and system.mutations.parts_postpone_reasons give operators machine-readable state for every delete in flight.

These aren't experimental features hidden behind flags. They're defaults that ship with every ClickHouse installation.


ClickHouse DELETE Myths vs. Reality: A 2026 Checklist

# The FUD Score Evidence Volume Reality (2026)
1 "ClickHouse can't delete data" 🟢 False since 2018 80+ PRs across 5 eras Four production-grade delete paths: heavyweight mutation, lightweight DELETE, patch-part DELETE, partition-scoped DELETE.
2 "ClickHouse is immutable / append-only" 🟢 Outdated Mutations since 2018 (release 1.1.54388); LWD since 22.8 Standard SQL DELETE FROM has been default-enabled since v23.3 (April 2023).
3 "Deletes require heavy mutations" 🟢 False since 22.8 PR #37893, #82004 LWD is ~40× faster than heavy mutations on the initial mask write. Patch-part DELETEs target up to ~1,000× speedup for small/selective changes per ClickHouse benchmarks.
4 "ReplacingMergeTree can't handle deletes" 🟢 False is_deleted column parameter ReplacingMergeTree(version, is_deleted) natively supports tombstones. OPTIMIZE TABLE … FINAL CLEANUP forces physical removal.
5 "You need allow_experimental_lightweight_delete" 🟢 Obsolete PR #50044 (commit 7189481, June 2023) The setting was renamed to enable_lightweight_delete and default-enabled in v23.3. The old name remains as a backward-compatibility alias.
6 "FINAL is too slow for production queries" 🟡 Outdated Multiple optimization PRs through v25.x FINAL was significantly optimized for production. It's the recommended path for immediate consistency on ReplacingMergeTree regardless of background merge state.
7 "DELETE crashes the mutation queue" 🟢 False since 2023 PR #48522, #44718 Memory usage reduced for large mutation queues. LWD synchronous by default to bound queue growth.
8 "Deleted rows linger forever in storage" 🟢 False PR #58223, #57433 Merge selector counts existing rows, not physical rows. APPLY DELETED MASK forces immediate physical cleanup on demand.
9 "Can't delete in a specific partition without scanning everything" 🟢 False since 2024 PR #67805, #13403 DELETE FROM … IN PARTITION and ALTER … DELETE … IN PARTITION prune partitions at plan time.
10 "DELETEs are invisible until merges finish" 🟢 False since 2025 PR #74877, #79281 apply_mutations_on_fly makes queued deletes visible at SELECT time. LWDs apply on the fly via the same mechanism.
11 "DELETE breaks projections and skip indexes" 🟢 False since 2024 PR #52517, #52530, #62364, #65594 Skip indexes and projections recalculate correctly during delete-driven merges. lightweight_mutation_projection_mode gives explicit policy options.
12 "No way to observe in-flight deletes" 🟢 False system.mutations, system.parts.has_lightweight_delete, parts_postpone_reasons Full lifecycle observability: queue state, postpone reasons, masked-part flagging, removal attempt timing.

Phase 1 (2018): The Original Mutation-Based DELETE

The FUD: "ClickHouse can't delete data"

This one was true for the first few years of ClickHouse's life. ClickHouse was designed around the principle that analytical data, once written, should not be modified. Compression and read throughput took priority over update flexibility. The first delete capability landed in mid-2018 as a deliberate compromise: support deletion, but signal architecturally that this was a heavyweight administrative operation.

ALTER TABLE … DELETE Lands as a Mutation (June–July 2018)

ClickHouse release 1.1.54388 (2018-06-28) added replicated ALTER TABLE t DELETE WHERE support together with the system.mutations table. Release 18.1.0 (2018-07-23) extended this to non-replicated MergeTree via PR #2634.

The mechanism was straightforward and expensive. When ALTER TABLE DELETE was issued, the server recorded the mutation with a unique ID (mutation_1.txt), returned immediately, and a background process scanned for parts containing rows matching the filter. For each affected part, a new version was created by reading the original data, applying the filter in memory, and writing only the surviving rows into a new part directory. The old part remained active until the new part was fully written and verified.

The write amplification was severe. For a 100-column table, deleting a single row required reading and rewriting all 100 column files for the affected part. The cost scaled with column count, not with the number of deleted rows. This is the math behind the "deletes are expensive" guidance from this era — and it was true.

Skip Unaffected Parts in DELETE Mutations (PR #2694, Late 2018)

PR #2694 added the first explicit rewrite-amplification optimization for DELETE: ALTER TABLE t DELETE WHERE no longer rewrote data parts that the predicate didn't touch. Before this PR, the mutation engine was conservative; after it, parts with no matching rows were skipped entirely. This established the pattern that would define the next eight years of DELETE evolution: do less work, lazily, and only on parts that actually need it.

DELETE Correctness Fixes (2020–2021)

The mutation-based DELETE generated a steady stream of correctness fixes:

  • PR #9048 (alesapin, 2020) — fixed primary.idx corruption after a delete mutation, the most severe class of DELETE bug.
  • PR #12153 (alexey-milovidov, 2020) — fixed over-deletion when the predicate evaluated to NULL on a row.
  • PR #21477 (alesapin, 2021) — fixed a deadlock for non-replicated MergeTree when ALTER DELETE WHERE referenced the same table.
  • PR #13403 (Vladimir Chebotarev, 2020) — added ALTER TABLE … DELETE … IN PARTITION for partition pruning, addressing metadata and ZooKeeper bloat in tables with thousands of partitions.

By the end of this era, ClickHouse had a working DELETE. It was honest about the cost. The "ClickHouse is immutable" critique was already inaccurate by 2018, but it was understandable.


Phase 2 (2022): How Lightweight DELETE Reframed Everything

The FUD: "Deletes require heavy mutations"

PR #37893, authored by Jianmei Zhang (zhangjmruc) and reviewed by davenger and alesapin, is the single highest-impact change in the entire DELETE history. Merged into v22.8 in mid-2022, it introduced standard SQL DELETE FROM <table> WHERE … and re-implemented it as a special mutation: ALTER TABLE <table> UPDATE _row_exists = 0 WHERE ….

The architectural shift was complete. DELETE went from "rewrite all affected parts" to "rewrite only the _row_exists mask, hardlink the rest, filter on read."

The _row_exists Mask Column

Each MergeTree part gained a virtual system column called _row_exists. When a row was deleted, its bit in this column was flipped from 1 to 0. The data itself remained on disk — only the mask was updated.

For wide-format MergeTree parts (the default for parts above a threshold size), where each column is stored in its own .bin and .mrk files, the optimization is dramatic. ClickHouse only writes a new _row_exists.bin; all other column files are hardlinked from the old part to the new one. For compact-format parts, where all columns are interleaved in one file, the gain is smaller because the single file still has to be rewritten.

PREWHERE Injection on Read

Reading from a table with deleted rows is where the design pays off. A query like:

SELECT count() FROM users WHERE age > 25
Enter fullscreen mode Exit fullscreen mode

is internally transformed into:

SELECT count() FROM users PREWHERE _row_exists WHERE age > 25
Enter fullscreen mode Exit fullscreen mode

PREWHERE runs before the main column reads. If _row_exists indicates an entire granule is deleted, ClickHouse skips reading any other column data for that granule. The mask is tiny (one bit per row, highly compressible), so the read overhead is negligible compared to the savings on filtered-out granules.

Hardlink Optimization for Wide Parts

The PR description includes a benchmark on a 100-million-row, 12-column table. Fifteen single-row lightweight DELETEs took roughly 200 ms. The same operation as a heavyweight mutation took roughly 8 seconds. That's a ~40× speedup on the initial mask write — and the gap widens as column count increases.

IStorage::supportsDelete(): Architectural Formalization (December 2022)

Commit 938aac9 (2022-12-29, davenger) added IStorage::supportsDelete(), formalizing the architectural separation between engines that support DELETE FROM and those that don't. This wasn't a feature in itself, but it was the contract that the rest of the system would build on.

Lightweight DELETE Correctness Fixes (Late 2022)

Lightweight DELETE introduced a new class of correctness bugs that had to be hunted down:

  • PR #40559 (davenger, August 2022) — fixed vertical merge of parts with lightweight-deleted rows. First post-introduction LWD bugfix; backported to 22.8.
  • PR #42126 (davenger, October 2022) — fixed "Invalid number of rows in Chunk" errors. Required a substantive PREWHERE refactor to support multiple PREWHERE steps so the _row_exists filter could be the first step in the chain.

By the end of 2022, the criticism had shifted. "Deletes require heavy mutations" had a documented expiration date in the changelog.


Phase 3 (2023): Lightweight DELETE Goes GA

The FUD: "You need allow_experimental_lightweight_delete"

Lightweight DELETE was promoted to GA / on-by-default in v23.3, announced in the v23.3 release blog and the April 2023 newsletter. The GA work is credited to Jianmei Zhang and Alexander Gololobov.

Synchronous by Default (PR #44718, December 2022 / January 2023)

PR #44718 made DELETE FROM synchronous by default so the command would not return until the rows were masked and invisible to subsequent queries. This bounded the mutation queue and prevented accumulating piles of pending LWD mutations. The original async behavior was later partially restored as a setting (lightweight_deletes_sync) for users on remote storage where the per-LWD coordination cost is high.

Memory and Concurrency Hardening (PR #48522, April 2023)

PR #48522 (KochetovNicolai) directly targeted "Reduce memory usage for multiple ALTER DELETE mutations" — the canonical reference for fixing OOM scenarios on large mutation queues. Related issue #57411 documented servers being killed by OOM while loading large numbers of mutation_*.txt files on startup. This PR shrunk the per-mutation memory footprint enough that the queue stopped being an operational hazard.

Lightweight DELETE on JSON and Object Columns (PR #49737, May 2023)

PR #49737 (davenger) stopped the MutatePlainMergeTreeTask log loop "There is no physical column _row_exists in table" when the table had an Object or JSON column. Fixed issues #49509 and #55076.

From allow_experimental_lightweight_delete to enable_lightweight_delete (PR #50044, June 2023)

PR #50044 (Azat Khuzhin, commit 7189481, 2023-06-04) aliased allow_experimental_lightweight_delete to enable_lightweight_delete. This was the bridge for users coming from older releases: their existing settings continued to work, but the canonical name reflected the feature's promotion out of experimental status.

This is the precise moment where "you need allow_experimental_lightweight_delete" became misinformation. The setting wasn't removed (backward compatibility matters), but it stopped being required, and the canonical documentation moved to the new name.

Projection Compatibility (PRs #52517 and #52530, July–August 2023)

PR #52517 (Anton Popov / CurtizJ, July 2023) — fixed lightweight DELETE failing after a projection was dropped. Even after the projection was gone, stale metadata could poison later LWD execution. Backported to 22.8 and 23.3.

PR #52530 (CurtizJ, August 2023, commit b6ce725) — fixed recalculation of skip indexes (bloom_filter, minmax, ngrambf, etc.) and projections in ALTER DELETE queries. Both needed to be recalculated, not stale-copied. Backported to 22.8, 23.3, 23.5, 23.7.

apply_deleted_mask and APPLY DELETED MASK: Operator Levers (PRs #55952 and #57433, late 2023)

PR #55952 (davenger, October 2023, commit 40062ca) added the apply_deleted_mask setting. With apply_deleted_mask = 0, SELECTs return rows that LWD has masked, which is essential for forensics, audits, and compliance verification — confirming that a delete actually happened, that the right rows were marked, and that the data is still recoverable until physical cleanup.

PR #57433 (CurtizJ, December 2023, commit 87d0cec) added ALTER TABLE … APPLY DELETED MASK [IN PARTITION …]. This is the explicit "stop waiting for merges, physically remove these rows now" lever. Implemented as an ordinary mutation command, it's the clean answer to compliance workloads that need guaranteed cleanup on demand.

By the end of 2023, lightweight DELETE was production-ready. The umbrella tracking issue (#56728) for production-readiness work was being closed. Operators had observability, force-cleanup levers, and synchronous semantics by default.


Phase 4 (2023–2024): Storage-Aware DELETE Optimizations

The FUD: "Deleted rows linger forever in storage"

The early lightweight DELETE design had a known operational gap. Because LWD only writes the mask, the underlying part still contains the deleted rows. They're filtered out at read time, but they consume disk space until the next merge. And the merge selector — which decides which parts to merge next — was counting physical rows, not existing rows. That meant a large part dominated by lightweight-deleted rows could sit at near max_bytes_to_merge_at_max_space_in_pool indefinitely, never picked up for merging, never cleaned.

Phase 4 fixed this and added the merge-selector and physical-cleanup machinery that makes LWD usable at scale.

exclude_deleted_rows_for_part_size_in_merge: Merge Selection That Counts Existing Rows (PR #58223, early 2024)

PR #58223 (jewelzqiu) added existing_rows_count to data parts and taught the merge selector to use it. The MergeTree settings exclude_deleted_rows_for_part_size_in_merge and load_existing_rows_count_for_old_parts give operators control over the trade-off.

The operational result: a 50 GB part where 90% of the rows are lightweight-deleted is now treated as a 5 GB part for merge selection. It gets picked up, merged, and the deleted rows physically disappear.

lightweight_deletes_sync (PR #62195, April 2024)

PR #62195 (CurtizJ, 2024-04-03) introduced the dedicated lightweight_deletes_sync setting (default value 2: "wait all replicas synchronously"), separating LWD synchronicity from the generic mutations_sync. This gave users on S3-backed deployments — where the per-LWD coordination cost is high — a way to lower the wait without weakening async semantics for heavy mutations. References to commit SHAs vary across sources (534905f and ed448ea both appear in PR #62195's commit set).

Projection Rebuild on Row-Reducing Merges (PR #62364, Q2 2024)

PR #62364 (cangyin / ardenwick) added projection rebuild for merges that reduce row count. Some merging modes (Replacing, Collapsing, deletes) genuinely reduce rows, and projections need to be rebuilt to avoid silently retaining "deleted" rows in projection data — a subtle correctness bug that could cause queries hitting the projection to return different results than queries hitting the base table.

lightweight_mutation_projection_mode: Lightweight DELETE on Tables with Projections (PR #65594, July 2024)

PR #65594 (jsc0218 / ShiChao Jin, 2024-07-04, commit 556c7de) added the lightweight_mutation_projection_mode table-level setting with three values: throw (default), drop, and rebuild. Without this setting, lightweight DELETE on a table with a projection unconditionally errored. Now you have an explicit policy: throw safely, drop the projection, or rebuild it as part of the merge.

DELETE FROM … IN PARTITION for Lightweight DELETE (PR #67805, August 2024)

PR #67805 (sunny19930321, 2024-08-30, commit 950ca28) added DELETE FROM … IN PARTITION 'xy' WHERE …. This means the planner doesn't have to scan all partitions when you know the delete is partition-bounded. Resolves issues #59409 and #60218. The ON CLUSTER form is also supported: DELETE FROM [db.]table [ON CLUSTER cluster] [IN PARTITION partition_expr] WHERE expr;.

system.parts.has_lightweight_delete and system.mutations Schema Additions (2024)

system.parts gained columns to flag and track lightweight-deleted parts:

  • has_lightweight_delete — true if the part has any rows masked by LWD
  • removal_state — current state of the part's removal lifecycle
  • last_removal_attempt_time — timestamp of the most recent attempt to remove the part

Combined with system.mutations.parts_postpone_reasons (commit 8903fd1) and parts_in_progress_names, operators have machine-readable answers for "why is this delete stuck" without grepping logs.

By the end of 2024, the storage layer understood lightweight DELETE end-to-end. Deleted rows didn't linger; the merge selector picked the right parts; projections and skip indexes were consistent; and operators had the levers and observability to manage it all.


Phase 5 (2025–2026): Patch Parts and On-the-Fly Mutations

The FUD: "DELETEs are invisible until merges finish"

The lightweight DELETE design from PR #37893 still required something to be written to disk for each delete — at minimum, a new version of _row_exists.bin for the affected part. For workloads with many small, frequent deletes, that per-DELETE write was the dominant cost. Phase 5 attacked it.

apply_mutations_on_fly: On-the-Fly Mutations (PR #74877, Q1 2025)

PR #74877 (CurtizJ) introduced apply_mutations_on_fly. Queued ALTER UPDATE and ALTER DELETE mutations that have not yet materialized are now applied at SELECT time, so users immediately see updated and deleted state. This closed the long-standing "DELETE was issued but rows still appear when mutations_sync = 0" surprise. Heavy mutations still materialize asynchronously in the background, but they're no longer invisible to queries in the meantime.

The mechanism has limits: only scalar subqueries up to mutations_max_literal_size_to_replace, only constant non-deterministic functions (controlled by mutations_execute_nondeterministic_on_initiator and mutations_execute_subqueries_on_initiator). Within those limits, the read path applies the mutation transform on the fly.

On-the-Fly Lightweight DELETE (PR #79281, April 2025)

PR #79281 (CurtizJ, commit dc9f636) extended on-the-fly mutations to lightweight DELETE specifically. Now DELETE FROM … SETTINGS lightweight_deletes_sync = 0 becomes visible immediately when apply_mutations_on_fly = 1. Resolves issue #75180.

Patch Parts: DELETEs Without Part Rewrites (PR #82004, July 2025)

PR #82004 (CurtizJ / Anton Popov), merged into v25.7, is the second-most-important PR in this entire history. It added standard SQL UPDATE syntax via patch parts and re-implemented lightweight DELETE on top of the same mechanism when lightweight_delete_mode = 'lightweight_update'.

The new shape: a DELETE creates a tiny patch part that sets _row_exists = 0 for affected rows. The patch is applied on read and physically merged in the next background merge. There's no rewrite of the source part. There's no hardlinking ceremony. The DELETE is, essentially, an insert.

Mechanically:

  • Patch parts are sorted by _part, _part_offset.
  • Partition ID is patch-<hash of column names>-<original_partition_id>.
  • Merging of patch parts uses a ReplacingMergeTree-style algorithm with _data_version as version.
  • Two read-time application modes: merge by sorted system columns when the source part is unchanged, join when the source part has been re-merged.
  • update_sequential_consistency and update_parallel_mode control behavior under concurrent updates.

The benchmarks in ClickHouse's own three-part series claim up to ~1,000× faster for small/selective changes versus classic mutations. Vendor benchmarks; treat as upper bounds, not guarantees, but the mechanism explains the size of the gap. For larger deletes (>~10% of a table), classic mutation is still preferred — the patch-on-read overhead grows with patch size.

2026 Correctness and Operability Hardening

The first half of 2026 has been a wave of LWD-related correctness and operability fixes:

  • PR #101212 (Anton Popov / CurtizJ, 2026-04-21, commit 509d35a) — "Fix several optimizations after lightweight deletes [2]." Critical fix: query optimizations like trivial COUNT(*) and minmax_count_projection were permanently disabled after a lightweight DELETE, even after all masked parts had been merged away. Replaced a sticky global flag with a per-snapshot computation: mutations_snapshot->hasLightweightDeletedMask(). This prevents permanent performance degradation on tables that ever ran an LWD.
  • PR #97589 (Alexey Milovidov, 2026-02-28, commit 53a75e8) — Fix KILL QUERY for ALTER DELETE with mutations_sync=1 on ReplicatedMergeTree. Synchronous replicated ALTER DELETE could become effectively unkillable. The fix is in mutation execution and control flow rather than DELETE semantics, but it materially improves operability under stalls.
  • PR #99281 (Yash, 2026) — Fix ALTER TABLE UPDATE/DELETE failing with "Missing columns" when a MATERIALIZED column depends on an EPHEMERAL column. Computed-column dependency analysis was wrong; the statement could fail before mutation execution.
  • Commit 9c4dda6 (2026-04-06) — Fix usage of text index with lightweight deletes. High-severity correctness fix for incorrect query results when both features were used together.
  • Commit 1acc6f3 — Fix for stuck mutations caused by phantom entries (race condition causing DELETE mutations to become stuck indefinitely).
  • PR #101792 — Broad LWD stateless test coverage. Not a feature landing, but a signal: LWD's hidden-row lifecycle and read-path semantics are now important enough to encode in dedicated stateless test suites.

This long correctness tail isn't a bad sign — it's how all mature deletion paths look once they're in production at scale. Compare to PostgreSQL's history of vacuum/freeze edge cases, or InnoDB's purge interactions. The volume of LWD fixes in 2026 reflects the volume of LWD usage.


Bulk Deletion: DROP PARTITION and Empty-Part Tombstones

The FUD: "Bulk deletion in ClickHouse is unsafe / non-atomic"

For bulk data lifecycle operations — removing a day's worth of data, dropping all rows for a deleted customer, reloading after a bad ETL run — the right answer is rarely DELETE FROM. It's ALTER TABLE … DROP PARTITION.

PR #41145 (Sema Checherinda, 2022) made these destructive partition operations durable. Before this PR, TRUNCATE TABLE, ALTER TABLE DROP PART, and ALTER TABLE DROP PARTITION worked by removing the part metadata and unlinking the files on disk. In a distributed system coordinated by ZooKeeper, a replica that was offline during the deletion or a crash between unlink and ZooKeeper update could leave the system in a state where the replica later attempted to "recover" the deleted part from another node. The result: "resurrected parts" — data that was supposedly deleted reappearing after a server restart or replica re-initialization.

The fix is elegant. Instead of immediate removal, these queries now create empty parts that explicitly cover the range of the old parts. Empty parts act as tombstones within the part set, so even if an old part is found on disk or on another replica, the system knows it has been superseded.

The achievements:

  1. Durability: if the request succeeds, the empty part is committed to disk and ZooKeeper, preventing resurrection.
  2. Atomicity: the substitution of old parts with empty ones is a single atomic operation within MergeTree's transaction scope.
  3. Non-blocking reads: the operation no longer requires a follow-up exclusive lock to clean up filesystem entries, so concurrent reads aren't blocked.

For workloads where data has a clean partition boundary (date, customer, region), DROP PARTITION is the most efficient deletion path in ClickHouse, full stop. It's effectively constant-time in the data volume.


ReplacingMergeTree, is_deleted, and the Optimized FINAL

The FUD: "ReplacingMergeTree can't handle deletes" / "FINAL is too slow"

For workloads that look more like upserts — frequent updates to the same primary key, with occasional deletions — engine-level deletion through ReplacingMergeTree is often the right architecture. ClickHouse supports an is_deleted column parameter on ReplacingMergeTree, which is the canonical "tombstone" pattern.

The mechanics:

  • ReplacingMergeTree keeps only the most recent version of a row with a given primary key (using a version column).
  • Adding is_deleted as a parameter tells the engine to treat rows where is_deleted = 1 as tombstones during merges.
  • To delete a row, insert a new record with the same primary key, the latest version, and is_deleted = 1.
  • During a merge, ClickHouse keeps the record with the highest version; if that record has is_deleted = 1, the row is dropped entirely.

The allow_experimental_cleanup_merges setting allows OPTIMIZE TABLE … FINAL CLEANUP, which forces the engine to physically remove rows where the latest version is marked as deleted. This is a declarative, high-throughput way to manage deletions without going through the mutation engine at all.

For query-time consistency, SELECT … FINAL ensures deleted rows are excluded regardless of background merge state. The historical critique of FINAL — that it was prohibitively slow for production queries — has been largely addressed through extensive optimization work. FINAL now runs efficiently enough to be the recommended path for immediate consistency on ReplacingMergeTree tables, especially when paired with appropriate primary key design.

The pattern in production: write inserts and tombstones at full ingest speed, run analytical queries with FINAL for consistency, and let merges (or OPTIMIZE … FINAL CLEANUP on demand) handle physical cleanup in the background.


ClickHouse DELETE Internals: Low-Level Optimizations and Correctness Hardening

Beyond the headline features, the DELETE subsystem received systematic low-level optimization that compounds across every delete operation:

  • PREWHERE multi-step refactor (PR #42126): MergeTree reader now supports multiple PREWHERE steps so the _row_exists filter can be the first step in the chain, both for correctness and for performance — read the tiny mask first, then large columns only for surviving rows.
  • I/O pool and asynchronous reads (PR #43260): max_streams_for_merge_tree_reading and allow_asynchronous_read_from_io_pool_for_merge_tree allow a dedicated I/O pool for reading MergeTree parts during queries and mutations. Up to 100× speedup for mutation reads on high-latency storage like Amazon S3, per the 2022 changelog.
  • MemoryTracker for background tasks (PR #48787, novikd): Memory tracking and soft limits for background tasks, including DELETE mutations.
  • mutations_execute_subqueries_on_initiator / mutations_execute_nondeterministic_on_initiator (settings, 23.x): Address one of the historically thorniest classes of replicated-DELETE bugs — divergent results across replicas when the predicate contains now(), scalar subqueries, or IN (subquery) (issues #18118, #19315, #23734, #16532).
  • _row_exists user-collision fix (PR #41763): Early correctness proof point — handles the case where a user defines a column named _row_exists themselves, preventing segfaults and wrong results.
  • min_age_to_force_merge_seconds: MergeTree setting that lets operators force older parts to merge regardless of size, reclaiming space from lightweight-deleted rows on a predictable schedule.
  • Lock contention reduction in BackgroundSchedulePool: On high-core-count CPUs (240+ threads), internal mutex contention in the background schedule pool was a latency source. CPU cycles spent in native_queued_spin_lock_slowpath were reduced significantly through critical-section shrinking and thread-local timer_id storage. This improves the scalability of the mutation engine on massive servers, ensuring background deletions don't interfere with interactive query performance.

None of these individually make a press release. Together, they compound into a materially faster and more reliable DELETE engine at every level of the stack.


ClickHouse DELETE Limitations and Trade-offs in 2026

Fairness matters. A few things still require awareness:

  • MutationsInterpreter still routes through the old analyzer. Issue #61563 tracks the migration of MutationsInterpreter to the new query analyzer; PR #61528 is the partial work. Most user-facing DELETE behavior is unaffected, but some advanced predicate forms behave differently than equivalent SELECTs. This is the only feature area where the answer is "partial / not done."
  • Patch-on-read overhead grows with patch size. Patch-part DELETEs (PR #82004) are dramatically faster than classic mutations for small, selective changes. For deletes affecting more than ~10% of a table, classic mutation is still preferred. The optimizer doesn't auto-select between them; the user picks via lightweight_delete_mode.
  • Compact parts gain less from LWD's hardlink optimization. In compact parts (the format used for small parts), all columns are interleaved in a single file. The "rewrite only _row_exists, hardlink the rest" optimization can't apply, so LWD on compact parts still rewrites the file. The wider your parts and the more columns, the bigger the LWD win.
  • Lightweight DELETE on remote storage has surprising latency. Real-world reports (issues #58281, #59225, #67048) document that LWD on S3-backed deployments can be slow even when no rows match the predicate, due to the synchronous coordination cost. lightweight_deletes_sync and patch parts (#82004) are the architectural answers; users on older versions should expect older behavior.
  • DROP PARTITION is constant-time only if your partitioning key is right. If your data isn't partitioned along the dimension you want to delete by, DROP PARTITION doesn't help. Partition design is a one-shot decision that defines what bulk deletion looks like.
  • Correctness fixes are ongoing. The 2026 wave (PR #101212, #97589, #99281, 9c4dda6, 1acc6f3) shows that LWD's interactions with read-path optimizations, replication, and other indexes still produce edge cases. ClickHouse's engineering team has been rigorous about correctness, but running the latest stable release matters.

These are real engineering trade-offs, and understanding them is part of making an informed decision.


ClickHouse DELETE Improvements Timeline (2018–2026)

Year What Changed Key PRs Impact
2018 ALTER TABLE … DELETE lands as mutation. Replicated and non-replicated MergeTree. Skip-unaffected-parts optimization. Release 1.1.54388; #2634; #2694 DELETE is supported. Heavyweight by design. system.mutations, KILL MUTATION, mutations_sync established.
2020–2021 Correctness hardening on the mutation path. IN PARTITION for mutations. #9048, #12153, #21477, #13403 Primary index corruption fixed, NULL-predicate over-deletion fixed, deadlocks fixed. Partition pruning for heavy mutations.
2022 Lightweight DELETE introduction. _row_exists mask, PREWHERE injection, hardlink unaffected columns. Empty-part tombstones for DROP PARTITION. #37893, #41145, #40559, #42126 Standard SQL DELETE FROM. ~40× faster than heavy mutation on initial mask write. Durable bulk-deletion semantics.
2023 LWD goes GA in v23.3. Synchronous by default. Memory hardening. Projection compatibility. apply_deleted_mask. APPLY DELETED MASK. #44718, #48522, #52517, #52530, #55952, #57433, #50044 LWD production-ready. allow_experimental_lightweight_delete deprecated. Operators get visibility and force-cleanup levers.
2024 Storage-aware merge selection. lightweight_deletes_sync. Projection policy. DELETE FROM … IN PARTITION. Row-reducing-merge projection rebuild. #58223, #62195, #65594, #67805, #62364 Deleted rows physically reclaimed by merges. Replicated LWD has independent sync control. Partition-scoped LWD.
2025 On-the-fly mutations and on-the-fly LWD. Patch parts: DELETE as a tiny insert. #74877, #79281, #82004 Queued deletes visible at SELECT time. Patch-part path targets up to ~1,000× faster than classic mutations on small/selective changes.
2026 Read-path optimization fixes after LWD. Killable replicated synchronous DELETEs. Text-index correctness. Stuck-mutation race fixes. #101212, #97589, #99281, 9c4dda6, 1acc6f3 COUNT(*) and projection optimizations no longer permanently disabled after LWD. KILL QUERY works for synchronous replicated ALTER DELETE.

When Should You Use Each DELETE Method in ClickHouse?

Workload Verdict Reasoning
Bulk historical cleanup along a partition boundary ALTER TABLE … DROP PARTITION Constant-time, atomic, durable via empty-part tombstones (PR #41145). The most efficient bulk path.
Reloading data after a bad ETL run DROP PARTITION then re-insert Same logic — partition-bounded operations are essentially free.
Selective row-level deletion (small set of rows) DELETE FROM … WHERE … (lightweight) Default since v23.3. ~40× faster than heavy mutation.
High-frequency operational deletes (many small) ✅ Patch-part DELETE (lightweight_delete_mode = 'lightweight_update') Each DELETE is a tiny insert, no part rewrite. Up to ~1,000× faster on small/selective changes per ClickHouse benchmarks.
Compliance-grade deletion (GDPR right-to-erasure) ALTER TABLE … DELETE (heavyweight) or APPLY DELETED MASK after LWD When you need "the bytes are physically gone" on completion, mutation is the path. APPLY DELETED MASK (PR #57433) forces materialization of LWD-marked rows.
Frequent updates with occasional deletions (upserts) ReplacingMergeTree(version, is_deleted) + FINAL Engine-native pattern. Tombstones at ingest speed, query-time consistency via optimized FINAL.
Streaming data with explicit cancellation pairs CollapsingMergeTree with Sign +1 / -1 pair "collapses" on merge. Highly efficient for streams that can produce cancellation events.
Deletes scoped to a known partition DELETE FROM … IN PARTITION Planner skips unaffected partitions (PR #67805).
Need to verify a delete worked / forensic audit apply_deleted_mask = 0 Returns rows that LWD has masked but not physically removed (PR #55952).
Need queued deletes visible immediately to queries apply_mutations_on_fly = 1 On-the-fly mutation visibility (PR #74877, #79281).
Deleting more than ~10% of a table 🟡 ALTER TABLE DELETE (heavyweight) or DROP PARTITION Patch-part overhead grows with patch size; classic mutation is more efficient at this scale.
Sub-10ms p99 latency on read-heavy workload with frequent deletes 🟡 Conditional LWD is fast but adds a PREWHERE step. ReplacingMergeTree + FINAL may be faster depending on read-pattern. Benchmark on your workload.

How to Respond to "ClickHouse Can't Delete"

Run the PR numbers.

When someone tells you ClickHouse can't delete data in 2026, ask them what release they're benchmarking against. Specifically:

  • If they're citing "experimental lightweight delete," they're on something pre-v23.3. The setting was renamed and default-enabled in April 2023 (PR #50044).
  • If they're citing "heavyweight mutations only," they're on something pre-v22.8. Standard SQL DELETE FROM has been available for nearly four years.
  • If they're citing "deleted rows linger forever," they're on something pre-v24.x. The merge selector has counted existing rows since PR #58223.
  • If they're citing "DELETEs are invisible until merges finish," they're on something pre-v25.x. On-the-fly mutations (PR #74877) and on-the-fly LWD (PR #79281) closed that gap in early 2025.
  • If they're citing "DELETE breaks projections," they're on something pre-v24.7. lightweight_mutation_projection_mode (PR #65594) gives explicit policy options.
  • If they're citing "patch parts don't exist," they're on something pre-v25.7. PR #82004 shipped them in July 2025.

If they're benchmarking against ClickHouse 21.x or earlier, or repeating 2018-era documentation, they aren't evaluating ClickHouse. They're evaluating a system that no longer exists.

The commit history doesn't lie. 80+ pull requests. Five architectural eras. Four production-grade delete paths. Engine-level deletion patterns through ReplacingMergeTree. Storage-aware merge selection. Patch parts. On-the-fly visibility. Full observability through system.mutations and system.parts.has_lightweight_delete.

ClickHouse's DELETE subsystem in 2026 bears no resemblance to the one that earned the early "immutable" warnings. The engineers built a modern deletion engine, and the evidence is in the PRs.

Test it on your workload. That's the only benchmark that matters.


ClickHouse DELETE FAQ

Can ClickHouse delete data?

Yes. ClickHouse has supported ALTER TABLE … DELETE since 2018 (release 1.1.54388) and standard SQL DELETE FROM since v22.8 (lightweight, default-enabled since v23.3). It also supports bulk deletion via ALTER TABLE … DROP PARTITION, engine-level deletion patterns via ReplacingMergeTree(version, is_deleted), and patch-part-based DELETEs since v25.7. The "ClickHouse is append-only" claim is outdated by eight years.

What's the fastest way to delete in ClickHouse?

The fastest delete in ClickHouse is ALTER TABLE … DROP PARTITION for bulk deletion along a partition boundary — it's essentially constant-time in data volume. For selective row-level deletion, lightweight DELETE FROM (default since v23.3) is roughly 40× faster than heavyweight mutations on the initial mask write (PR #37893 benchmark). For high-frequency small deletes, patch-part DELETEs (v25.7, PR #82004) eliminate the part rewrite entirely.

Do I still need allow_experimental_lightweight_delete?

No. The allow_experimental_lightweight_delete setting was renamed to enable_lightweight_delete and default-enabled in ClickHouse v23.3 (PR #50044, commit 7189481, June 2023). The old name is preserved as a backward-compatibility alias but is no longer required. Anyone telling you to set this flag is benchmarking a release older than three years.

How does lightweight DELETE work?

Lightweight DELETE in ClickHouse implements standard SQL DELETE FROM <table> WHERE … as a hidden ALTER TABLE <table> UPDATE _row_exists = 0 WHERE … mutation. Each MergeTree part has a virtual column _row_exists; setting bits to 0 marks rows as deleted. Reads automatically inject PREWHERE _row_exists so deleted rows are filtered out before the main column scan. For wide-format parts, only the mask file is rewritten — all other column files are hardlinked from the old part. Physical removal happens during the next background merge, or on demand via ALTER TABLE … APPLY DELETED MASK.

What's the difference between lightweight DELETE and heavyweight ALTER DELETE?

In ClickHouse, lightweight DELETE writes a hidden mask and filters deleted rows out at read time; physical rows survive until the next background merge. It returns fast but defers physical cleanup. Heavyweight ALTER TABLE DELETE rewrites all affected parts to physically remove rows; it's slower but guarantees the bytes are gone when the mutation completes. For compliance-grade workloads (GDPR right-to-erasure), heavyweight ALTER DELETE or APPLY DELETED MASK after a lightweight DELETE is the right path.

Is FINAL slow in ClickHouse?

Not anymore. FINAL in ClickHouse was historically slow on ReplacingMergeTree and similar engines, which fueled the critique. It has been significantly optimized in recent versions and is now the recommended path for immediate consistency at query time, regardless of background merge state. For workloads using ReplacingMergeTree(version, is_deleted), SELECT … FINAL is the canonical pattern for ensuring deleted rows are excluded.

Can ReplacingMergeTree handle deletes?

Yes. ClickHouse's ReplacingMergeTree engine supports an is_deleted column parameter for tombstone-style deletion. To delete a row, insert a new record with the same primary key, the latest version, and is_deleted = 1. During a merge, rows where the latest version has is_deleted = 1 are dropped entirely. OPTIMIZE TABLE … FINAL CLEANUP (gated by allow_experimental_cleanup_merges) forces physical removal on demand.

How do I bulk-delete a lot of data efficiently?

In ClickHouse, the most efficient bulk-delete is ALTER TABLE … DROP PARTITION. Since PR #41145, partition drops use durable empty-part tombstones, making them atomic, non-blocking for concurrent reads, and resilient to replica crashes. The operation is essentially constant-time in the data volume, far cheaper than any row-level DELETE for the same scope. The trade-off: you have to design your partitioning key around the dimensions you'll bulk-delete by.

Are queued deletes visible to queries before they finish?

Yes, in ClickHouse v25.x and later. Set apply_mutations_on_fly = 1 (PR #74877, Q1 2025) and queued ALTER UPDATE / ALTER DELETE mutations are applied at SELECT time before background materialization. PR #79281 (April 2025) extended this to lightweight DELETE specifically. The "I deleted a row but a SELECT still returns it" surprise is solved.

What if my table has projections or skip indexes?

If a ClickHouse table with projections or skip indexes needs lightweight DELETE, use the lightweight_mutation_projection_mode table-level setting (PR #65594, v24.7): throw (default), drop, or rebuild. Projections and skip indexes are recalculated correctly during delete-driven merges as of PR #52530 (backported to 22.8 and later). Row-reducing merges trigger projection rebuild (PR #62364) so projections stay consistent with the base table.

How do I monitor in-flight deletes?

In ClickHouse, monitor in-flight deletes via the system.mutations table, which shows queued and running mutations with parts_to_do, is_done, latest_fail_reason, parts_postpone_reasons, and parts_in_progress_names. system.parts includes has_lightweight_delete, removal_state, and last_removal_attempt_time for masked parts. KILL MUTATION cancels a stuck or unwanted mutation. apply_deleted_mask = 0 lets you query rows that LWD has masked but not physically removed, useful for forensics and audits.

What are patch parts?

Patch parts in ClickHouse (PR #82004, v25.7) are a new on-disk architecture for lightweight UPDATEs and DELETEs. Instead of rewriting _row_exists in the source part, a DELETE creates a tiny patch part that records the rows to mark deleted. The patch is applied on read (via merge by sorted system columns or a join, depending on whether the source has been re-merged) and physically merged into the source during the next background merge. For small/selective changes, this eliminates the per-DELETE write cost almost entirely. ClickHouse's benchmarks claim up to ~1,000× speedup on small changes, though that's a vendor benchmark on a favorable workload — treat it as an upper bound. Set lightweight_delete_mode = 'lightweight_update' to enable.


Analysis based on 80+ GitHub pull requests, official ClickHouse changelogs, and release blogs covering the period 2018–2026. Every claim maps to a specific merged PR or commit SHA. Verify the evidence yourself — the commit history is public.

Top comments (0)