<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aman Puri</title>
    <description>The latest articles on DEV Community by Aman Puri (@aman_puri_115f49d5ad3612d).</description>
    <link>https://dev.to/aman_puri_115f49d5ad3612d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3740291%2F0f5e33c4-0b9c-4799-84dc-fc2ac636b55c.png</url>
      <title>DEV Community: Aman Puri</title>
      <link>https://dev.to/aman_puri_115f49d5ad3612d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aman_puri_115f49d5ad3612d"/>
    <language>en</language>
    <item>
      <title>ClickHouse JOINs Aren't Slow Anymore (You're Reading 2020's Docs)</title>
      <dc:creator>Aman Puri</dc:creator>
      <pubDate>Tue, 12 May 2026 18:41:27 +0000</pubDate>
      <link>https://dev.to/aman_puri_115f49d5ad3612d/clickhouse-joins-arent-slow-anymore-youre-reading-2020s-docs-15fe</link>
      <guid>https://dev.to/aman_puri_115f49d5ad3612d/clickhouse-joins-arent-slow-anymore-youre-reading-2020s-docs-15fe</guid>
      <description>&lt;p&gt;ClickHouse JOIN performance was rebuilt between 2022 and 2026. The "avoid JOINs in ClickHouse" advice from 2020 is still circulating:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"ClickHouse can't do JOINs."&lt;/p&gt;

&lt;p&gt;"Denormalize everything."&lt;/p&gt;

&lt;p&gt;"Only hash join is supported."&lt;/p&gt;

&lt;p&gt;"JOINs OOM on anything bigger than RAM."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;All four were accurate in 2020. None of them are accurate today.&lt;/p&gt;

&lt;p&gt;In 2020, ClickHouse had one join algorithm, no disk spilling, no cost-based optimizer, and join order followed query syntax. If the right table did not fit in memory, the query crashed.&lt;/p&gt;

&lt;p&gt;Between 2022 and early 2026, the join subsystem was rebuilt. Six algorithms ship by default. The optimizer reorders joins automatically using column statistics. Runtime bloom filters prune fact tables at the storage scan. Grace hash spills to disk instead of OOMing. None of it is behind feature flags.&lt;/p&gt;

&lt;p&gt;Here is what shipped.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Many JOIN Algorithms Does ClickHouse Support?
&lt;/h2&gt;

&lt;p&gt;By the end of 2022 ClickHouse had five join algorithms. Today it has six, and the engine selects automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hash join.&lt;/strong&gt; The classic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel hash join.&lt;/strong&gt; Default since v24.12 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/70788" rel="noopener noreferrer"&gt;PR #70788&lt;/a&gt;). Near-linear scaling across CPU cores. Zero configuration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grace hash join.&lt;/strong&gt; Disk-spilling for joins that exceed memory (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/38191" rel="noopener noreferrer"&gt;PR #38191&lt;/a&gt;). GA in v24.3. Closes the most upvoted join issue in ClickHouse history, open since June 2020.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full sorting merge.&lt;/strong&gt; Memory-bounded sort-merge (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/35796" rel="noopener noreferrer"&gt;PR #35796&lt;/a&gt;). Faster than hash on pre-sorted data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct join.&lt;/strong&gt; O(1) memory lookups against dictionaries and EmbeddedRocksDB. No hash table construction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paste join.&lt;/strong&gt; Positional joins.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;"ClickHouse only has hash join" has been false since November 2022.&lt;/p&gt;

&lt;h2&gt;
  
  
  ClickHouse JOIN OOM Crashes: Grace Hash and Disk Spilling
&lt;/h2&gt;

&lt;p&gt;Grace hash join partitions both inputs into buckets via a secondary hash, processes one bucket pair at a time, and spills inactive buckets to disk. Full sorting merge bounds memory by the sort buffer rather than right-side cardinality.&lt;/p&gt;

&lt;p&gt;Before &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/38191" rel="noopener noreferrer"&gt;PR #38191&lt;/a&gt;, a join where the right table exceeded memory crashed with OOM. After it, the join completes. It just takes longer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does ClickHouse Have a Query Optimizer for JOINs?
&lt;/h2&gt;

&lt;p&gt;"ClickHouse has no query optimizer" was true. It has not been since 2024.&lt;/p&gt;

&lt;h3&gt;
  
  
  Equivalence-Set Predicate Pushdown (180× Speedup)
&lt;/h3&gt;

&lt;p&gt;Equivalence-set predicate pushdown (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/61216" rel="noopener noreferrer"&gt;PR #61216&lt;/a&gt;) is the highest-impact join optimization of the four-year arc. When tables join on &lt;code&gt;t1.id = t2.id&lt;/code&gt;, a filter &lt;code&gt;WHERE t1.id = 5&lt;/code&gt; is logically equivalent to &lt;code&gt;t2.id = 5&lt;/code&gt;. The optimizer recognizes this and pushes the filter to both sides before the join executes.&lt;/p&gt;

&lt;p&gt;Benchmarks: &lt;strong&gt;up to 180× speedup&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Extended in 2026 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96596" rel="noopener noreferrer"&gt;PR #96596&lt;/a&gt;) to propagate across chains of INNER JOINs using a Disjoint Set Union to track transitive equalities. Filter on &lt;code&gt;t1.id&lt;/code&gt;, get it pushed to &lt;code&gt;t2&lt;/code&gt;, &lt;code&gt;t3&lt;/code&gt;, and beyond.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automatic OUTER to INNER Conversion
&lt;/h3&gt;

&lt;p&gt;Automatic OUTER to INNER conversion (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/62907" rel="noopener noreferrer"&gt;PR #62907&lt;/a&gt;) recognizes when post-join filters make outer semantics redundant. A &lt;code&gt;LEFT JOIN ... WHERE right_col IS NOT NULL&lt;/code&gt; becomes an INNER JOIN. One benchmarked query went from 32s to 0.006s.&lt;/p&gt;

&lt;p&gt;Converting to INNER first unlocks predicate pushdown and reordering that are structurally impossible for OUTER JOINs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Right-Side and OR-Condition Pushdown
&lt;/h3&gt;

&lt;p&gt;Two further pushdown extensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Right-side predicate pushdown (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/50532" rel="noopener noreferrer"&gt;PR #50532&lt;/a&gt;) with 27× improvement on applicable workloads.&lt;/li&gt;
&lt;li&gt;OR-condition pushdown (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/84735" rel="noopener noreferrer"&gt;PR #84735&lt;/a&gt;). Previously only AND conditions could be pushed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No query hints required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost-Based JOIN Reordering in ClickHouse (v25.9+)
&lt;/h2&gt;

&lt;p&gt;Automatic join order optimization landed in v25.9.&lt;/p&gt;

&lt;h3&gt;
  
  
  Greedy JOIN Reordering with Column Statistics
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/86822" rel="noopener noreferrer"&gt;PR #86822&lt;/a&gt; introduced global join reordering using a greedy algorithm with column statistics. For queries joining three or more tables, the optimizer evaluates estimated cardinalities and picks the order that minimizes intermediate result sizes.&lt;/p&gt;

&lt;p&gt;TPC-H SF100: &lt;strong&gt;1,450× speedup, 25× memory reduction&lt;/strong&gt; versus syntax-order execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  DPsize Dynamic Programming JOIN Reordering
&lt;/h3&gt;

&lt;p&gt;v25.12 added a DPsize dynamic programming algorithm (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/91002" rel="noopener noreferrer"&gt;PR #91002&lt;/a&gt;) for exhaustive search on complex queries, with greedy fallback above the complexity threshold. PostgreSQL and Oracle planners work the same way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automatic Statistics Collection
&lt;/h3&gt;

&lt;p&gt;Statistics collection went automatic in v25.10 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/89332" rel="noopener noreferrer"&gt;PR #89332&lt;/a&gt;). Runtime hash table sizes from previous executions feed back into future plans (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/93912" rel="noopener noreferrer"&gt;PR #93912&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Hand-tuned join order is obsolete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Runtime Bloom Filters for ClickHouse Star Schema JOINs
&lt;/h2&gt;

&lt;p&gt;Star and snowflake schemas were historically where ClickHouse joins hurt. The fact table is large. The dimensions are small. The fact table got scanned in full, then most rows were discarded.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/89314" rel="noopener noreferrer"&gt;PR #89314&lt;/a&gt;, default since v25.10, builds bloom filters from build-side join keys during hash table construction and pushes them to the probe-side scan. Non-matching rows are discarded at the storage layer before reaching the join.&lt;/p&gt;

&lt;p&gt;v25.10 release blog: &lt;strong&gt;2.1× overall query speedup, 7× memory reduction&lt;/strong&gt; on star-schema workloads. Filters can be pushed into PREWHERE (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/95838" rel="noopener noreferrer"&gt;PR #95838&lt;/a&gt;) for further efficiency. An adaptive mechanism disables them at runtime when they stop filtering enough rows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does ClickHouse Support Correlated Subqueries?
&lt;/h2&gt;

&lt;p&gt;True until April 2025. &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/76078" rel="noopener noreferrer"&gt;PR #76078&lt;/a&gt; shipped correlated subquery decorrelation. EXISTS, scalar, and projection-list subqueries auto-convert into joins.&lt;/p&gt;

&lt;p&gt;Beta with default enablement landed in August 2025 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85107" rel="noopener noreferrer"&gt;PR #85107&lt;/a&gt;), closing &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/6697" rel="noopener noreferrer"&gt;issue #6697&lt;/a&gt;, open since 2019.&lt;/p&gt;

&lt;p&gt;Migration from PostgreSQL or Snowflake no longer requires manual rewriting of correlated subqueries.&lt;/p&gt;

&lt;h2&gt;
  
  
  ClickHouse JOIN Limitations and Trade-offs in 2026
&lt;/h2&gt;

&lt;p&gt;Four caveats remain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flat tables are still faster for sub-10ms p99 dashboards.&lt;/strong&gt; Denormalization wins on extreme latency targets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The optimizer depends on statistics.&lt;/strong&gt; Stale or missing stats degrade plan quality. Monitor &lt;code&gt;system.statistics&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correlated subqueries are beta.&lt;/strong&gt; Common patterns work. Edge cases need explicit join rewrites.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grace hash trades speed for completion.&lt;/strong&gt; Constant spilling means the data model needs adjustment, not the join algorithm.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Should You Use JOINs in ClickHouse?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Star/snowflake analytics&lt;/td&gt;
&lt;td&gt;Yes. Runtime bloom filters target this exact pattern.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-table reporting&lt;/td&gt;
&lt;td&gt;Yes. The optimizer reorders.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Joins larger than RAM&lt;/td&gt;
&lt;td&gt;Yes. Grace hash and full sorting merge guarantee completion.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time dimension enrichment&lt;/td&gt;
&lt;td&gt;Yes. Direct join is O(1) memory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ASOF time-series joins&lt;/td&gt;
&lt;td&gt;Yes. Full sorting merge variant is 2× faster, 2× less memory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sub-10ms p99 on complex joins&lt;/td&gt;
&lt;td&gt;Conditional. Denormalize if the storage budget allows.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10+ table joins with no stats&lt;/td&gt;
&lt;td&gt;Enable &lt;code&gt;allow_statistics_optimize&lt;/code&gt; (default in v25.10+).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Responding to "avoid JOINs in ClickHouse" in 2026
&lt;/h2&gt;

&lt;p&gt;Check the version.&lt;/p&gt;

&lt;p&gt;Anyone benchmarking against ClickHouse 23.x, or quoting blog posts written before parallel hash join became default, is not evaluating ClickHouse 2026. They are evaluating a system that no longer exists.&lt;/p&gt;

&lt;p&gt;The commit history is public. 50+ pull requests. Six algorithms. Cost-based optimization. Runtime filtering. Automatic algorithm selection, build-side selection, join reordering, predicate pushdown. None behind flags. All default.&lt;/p&gt;

&lt;p&gt;Test it on the workload. That is the only benchmark that matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  ClickHouse JOIN FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Are ClickHouse JOINs production-ready in 2026?
&lt;/h3&gt;

&lt;p&gt;Yes. Six algorithms ship by default, the optimizer reorders joins using column statistics, runtime bloom filters prune fact tables at the storage scan, and grace hash spills to disk instead of OOMing. None of it requires configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  How many JOIN algorithms does ClickHouse support?
&lt;/h3&gt;

&lt;p&gt;Six: hash, parallel hash (default), grace hash (disk-spilling), full sorting merge, direct (O(1) key-value), and paste. The engine selects automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does ClickHouse still crash with OOM on large JOINs?
&lt;/h3&gt;

&lt;p&gt;No. Grace hash join (GA in v24.3) partitions inputs and spills to disk. Full sorting merge bounds memory by the sort buffer. Both algorithms guarantee completion regardless of right-side size.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I denormalize tables in ClickHouse instead of using JOINs?
&lt;/h3&gt;

&lt;p&gt;Conditional. For sub-10ms p99 dashboard queries, flat tables remain faster. For analytical workloads where query readability and storage efficiency matter, normalized star and snowflake schemas with JOINs perform well after the 2024 to 2026 optimizer work.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>database</category>
      <category>performance</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
