<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mohamed Hussain S</title>
    <description>The latest articles on DEV Community by Mohamed Hussain S (@mohhddhassan).</description>
    <link>https://dev.to/mohhddhassan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3228500%2Fc2e87a6e-70e3-4023-a3d3-5adddef072f2.jpeg</url>
      <title>DEV Community: Mohamed Hussain S</title>
      <link>https://dev.to/mohhddhassan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mohhddhassan"/>
    <language>en</language>
    <item>
      <title>When Logs Aren't Enough: Using tcpdump to Debug Real Network Problems</title>
      <dc:creator>Mohamed Hussain S</dc:creator>
      <pubDate>Mon, 22 Jun 2026 16:13:56 +0000</pubDate>
      <link>https://dev.to/mohhddhassan/when-logs-arent-enough-using-tcpdump-to-debug-real-network-problems-2d39</link>
      <guid>https://dev.to/mohhddhassan/when-logs-arent-enough-using-tcpdump-to-debug-real-network-problems-2d39</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/mohhddhassan/the-hidden-linux-routing-issue-that-broke-my-deployment-5813"&gt;previous post&lt;/a&gt;, I wrote about a Linux routing issue that broke a deployment and caused repeated validation failures.&lt;/p&gt;

&lt;p&gt;What ultimately led me to the root cause wasn't a configuration change or a log entry.&lt;/p&gt;

&lt;p&gt;It was a packet capture.&lt;/p&gt;

&lt;p&gt;This article isn't about a routing issue. It's about the tool that helped uncover it and the lesson I took away from the entire investigation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Everything Looked Healthy
&lt;/h2&gt;

&lt;p&gt;The first step was verifying the basics.&lt;/p&gt;

&lt;p&gt;I checked whether the service was listening on the expected ports:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;ss &lt;span class="nt"&gt;-tulpn&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;':80|:443'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything looked normal.&lt;/p&gt;

&lt;p&gt;Next, I verified the application itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The application responded immediately.&lt;/p&gt;

&lt;p&gt;I also verified DNS resolution and confirmed the domain was pointing to the correct public IP.&lt;/p&gt;

&lt;p&gt;At this point, nothing looked obviously wrong. The application was healthy, the reverse proxy was healthy, and the network configuration appeared healthy. Yet validation attempts continued to fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Logs Weren't Helping
&lt;/h2&gt;

&lt;p&gt;The logs consistently showed variations of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;authorization failed
timeout during connect
likely firewall problem
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem was that the logs only described the symptom.&lt;/p&gt;

&lt;p&gt;They didn't explain why it was happening.&lt;/p&gt;

&lt;p&gt;So I started investigating the usual suspects: DNS, firewall rules, reverse proxy configuration, and listening ports.&lt;/p&gt;

&lt;p&gt;Everything continued to look fine.&lt;/p&gt;

&lt;p&gt;The more I investigated, the less sense the issue made.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Beyond The Logs
&lt;/h2&gt;

&lt;p&gt;At some point I realized I was only looking at what the software was reporting.&lt;/p&gt;

&lt;p&gt;I wasn't looking at what the network was actually doing.&lt;/p&gt;

&lt;p&gt;So I decided to capture the traffic directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;tcpdump &lt;span class="nt"&gt;-ni&lt;/span&gt; ens3 tcp port 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I triggered another validation attempt and watched the packets arrive.&lt;/p&gt;

&lt;p&gt;Almost immediately, I saw something interesting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IP 124.x.x.x &amp;gt; 51.x.x.x.80: Flags [S]
IP 124.x.x.x &amp;gt; 51.x.x.x.80: Flags [S]
IP 124.x.x.x &amp;gt; 51.x.x.x.80: Flags [S]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The requests were reaching the server.&lt;/p&gt;

&lt;p&gt;That single observation completely changed the direction of the investigation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Real Clue
&lt;/h2&gt;

&lt;p&gt;Up until that moment, I had been operating under the assumption that external systems couldn't reach the server.&lt;/p&gt;

&lt;p&gt;The packet capture proved otherwise.&lt;/p&gt;

&lt;p&gt;Traffic was arriving.&lt;/p&gt;

&lt;p&gt;The server was receiving connection attempts.&lt;/p&gt;

&lt;p&gt;The problem wasn't inbound connectivity.&lt;/p&gt;

&lt;p&gt;The problem was somewhere after that.&lt;/p&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why can't external systems reach my server?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I started asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If traffic is reaching the server, why isn't the connection completing?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That shift in thinking changed the entire investigation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What tcpdump Revealed
&lt;/h2&gt;

&lt;p&gt;To understand why the packet capture was so important, it helps to understand what a normal TCP connection looks like.&lt;/p&gt;

&lt;p&gt;A healthy connection follows a three-way handshake:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client  -&amp;gt; SYN      -&amp;gt; Server
Client &amp;lt;- SYN-ACK   &amp;lt;- Server
Client  -&amp;gt; ACK      -&amp;gt; Server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What I was actually seeing looked more like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client  -&amp;gt; SYN      -&amp;gt; Server
Client  -&amp;gt; SYN      -&amp;gt; Server (Retry)
Client  -&amp;gt; SYN      -&amp;gt; Server (Retry)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The incoming connection attempts were reaching the server, but the connection was never being established successfully.&lt;/p&gt;

&lt;p&gt;That immediately ruled out several possibilities.&lt;/p&gt;

&lt;p&gt;DNS wasn't the problem because requests were arriving.&lt;/p&gt;

&lt;p&gt;The reverse proxy wasn't the problem because it was listening correctly.&lt;/p&gt;

&lt;p&gt;The application wasn't the problem because it responded locally.&lt;/p&gt;

&lt;p&gt;Within a few minutes, the packet capture had eliminated entire categories of potential root causes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Following The Evidence
&lt;/h2&gt;

&lt;p&gt;Once I knew inbound traffic was reaching the server, I shifted my attention toward the network path itself.&lt;/p&gt;

&lt;p&gt;I started examining interfaces, routes, and outbound traffic behaviour using commands like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ip route
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ip route get 8.8.8.8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those commands eventually exposed the real issue.&lt;/p&gt;

&lt;p&gt;The server had multiple network interfaces, and outbound traffic was being routed through an unexpected path. That routing behaviour was causing validation attempts to fail even though the service itself was perfectly healthy.&lt;/p&gt;

&lt;p&gt;The actual root cause turned out to be a Linux routing issue.&lt;/p&gt;

&lt;p&gt;But I might never have found it if I hadn't first verified what was happening on the wire.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why tcpdump Was The Turning Point
&lt;/h2&gt;

&lt;p&gt;Before running tcpdump, I was relying entirely on logs and assumptions.&lt;/p&gt;

&lt;p&gt;The logs suggested a firewall issue.&lt;/p&gt;

&lt;p&gt;The packet capture showed requests reaching the server.&lt;/p&gt;

&lt;p&gt;Those two observations pointed in completely different directions.&lt;/p&gt;

&lt;p&gt;Without the packet capture, I probably would have continued tweaking firewall rules, reverse proxy settings, and application configuration.&lt;/p&gt;

&lt;p&gt;Instead, the investigation moved toward routing almost immediately.&lt;/p&gt;

&lt;p&gt;That's what made tcpdump so valuable.&lt;/p&gt;

&lt;p&gt;It wasn't the tool that solved the problem.&lt;/p&gt;

&lt;p&gt;It was the tool that revealed reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Logs Don't Tell The Entire Story
&lt;/h3&gt;

&lt;p&gt;Logs are useful, but they're generated by software. They only describe what the application believes is happening.&lt;/p&gt;

&lt;p&gt;Sometimes that's not enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Verify Assumptions Early
&lt;/h3&gt;

&lt;p&gt;I spent time investigating DNS, firewall rules, and reverse proxy configuration because they seemed like the most likely causes.&lt;/p&gt;

&lt;p&gt;The packet capture disproved those assumptions within minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Packet Captures Can Change Everything
&lt;/h3&gt;

&lt;p&gt;You don't need advanced networking knowledge to get value from tcpdump.&lt;/p&gt;

&lt;p&gt;Even a simple capture can tell you whether traffic is arriving, leaving, or disappearing somewhere in between.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Eliminate Entire Categories Of Problems
&lt;/h3&gt;

&lt;p&gt;One of the biggest advantages of packet captures is that they quickly rule things out.&lt;/p&gt;

&lt;p&gt;Sometimes that's more valuable than finding the answer immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This incident reminded me that assumptions can be surprisingly expensive.&lt;/p&gt;

&lt;p&gt;The logs pointed toward a firewall issue. The services looked healthy. Everything seemed to suggest a particular problem.&lt;/p&gt;

&lt;p&gt;But the moment I looked at the packets, the entire investigation changed direction.&lt;/p&gt;

&lt;p&gt;Since then, whenever a network issue doesn't make sense, I try to reach for &lt;code&gt;tcpdump&lt;/code&gt; much earlier.&lt;/p&gt;

&lt;p&gt;Because logs tell you what software thinks happened.&lt;/p&gt;

&lt;p&gt;Packets show you what actually happened.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>networking</category>
      <category>devops</category>
      <category>tcpdump</category>
    </item>
    <item>
      <title>The Hidden Linux Routing Issue That Broke My Deployment</title>
      <dc:creator>Mohamed Hussain S</dc:creator>
      <pubDate>Wed, 17 Jun 2026 03:41:37 +0000</pubDate>
      <link>https://dev.to/mohhddhassan/the-hidden-linux-routing-issue-that-broke-my-deployment-5813</link>
      <guid>https://dev.to/mohhddhassan/the-hidden-linux-routing-issue-that-broke-my-deployment-5813</guid>
      <description>&lt;p&gt;The deployment should have taken a few minutes.&lt;/p&gt;

&lt;p&gt;The application was running, DNS was configured correctly, and the domain was already pointing to the server's public IP. Caddy was configured as a reverse proxy and was listening on ports 80 and 443. Every item on my deployment checklist appeared healthy.&lt;/p&gt;

&lt;p&gt;Yet every Let's Encrypt validation attempt kept failing.&lt;/p&gt;

&lt;p&gt;The error looked simple enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;authorization failed
timeout during connect
likely firewall problem
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first, I believed it.&lt;/p&gt;

&lt;p&gt;I checked DNS resolution, verified firewall rules, confirmed that Caddy was listening on the expected ports, and made sure the application itself was reachable. Every check came back clean.&lt;/p&gt;

&lt;p&gt;That was the first clue that the problem might not be where the logs were pointing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Obvious Things
&lt;/h2&gt;

&lt;p&gt;The first assumption was DNS.&lt;/p&gt;

&lt;p&gt;I verified that the domain resolved to the correct public IP.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dig +short my-domain.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything looked correct.&lt;/p&gt;

&lt;p&gt;Next came the firewall.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;ufw status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ports 80 and 443 were open. There were no unexpected deny rules, and nothing suggested inbound traffic was being blocked.&lt;/p&gt;

&lt;p&gt;Then I checked whether Caddy was actually listening.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;ss &lt;span class="nt"&gt;-tulpn&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;':80|:443'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, everything looked normal.&lt;/p&gt;

&lt;p&gt;The application itself was healthy too.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:3001
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;returned a valid response.&lt;/p&gt;

&lt;p&gt;At this point I had checked most of the things engineers typically check when certificate validation fails. DNS looked good, the firewall looked good, the reverse proxy was healthy, and the application was running.&lt;/p&gt;

&lt;p&gt;Yet the validation errors continued.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Part That Sent Me In The Wrong Direction
&lt;/h2&gt;

&lt;p&gt;The error messages kept mentioning connectivity problems and possible firewall issues.&lt;/p&gt;

&lt;p&gt;That wording influenced my thinking more than it should have.&lt;/p&gt;

&lt;p&gt;I spent time investigating firewall rules, reverse proxy configuration, TLS settings, and domain configuration. Every new hypothesis felt reasonable, but none of them explained why local tests consistently succeeded while external validation continued to fail.&lt;/p&gt;

&lt;p&gt;The contradiction kept bothering me.&lt;/p&gt;

&lt;p&gt;If the service was truly unreachable, why did everything work from inside the server?&lt;/p&gt;

&lt;h2&gt;
  
  
  Then I Hit The Rate Limit
&lt;/h2&gt;

&lt;p&gt;This was the point where I realized I was no longer troubleshooting.&lt;/p&gt;

&lt;p&gt;I was guessing.&lt;/p&gt;

&lt;p&gt;After several failed validation attempts, Let's Encrypt stopped accepting new authorization requests and returned a rate-limit error.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;too many failed authorizations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I had burned through multiple validation attempts without actually understanding the root cause.&lt;/p&gt;

&lt;p&gt;Looking back, this was probably the most useful lesson from the entire incident.&lt;/p&gt;

&lt;p&gt;Repeatedly retrying a failing system is not the same thing as debugging it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking At The Network Instead Of The Logs
&lt;/h2&gt;

&lt;p&gt;At this point I stopped changing configurations and started gathering evidence.&lt;/p&gt;

&lt;p&gt;The first useful clue came from tcpdump.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;tcpdump &lt;span class="nt"&gt;-ni&lt;/span&gt; ens3 tcp port 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While monitoring traffic, I triggered requests from outside the server.&lt;/p&gt;

&lt;p&gt;The packet capture immediately showed incoming connection attempts reaching the machine.&lt;/p&gt;

&lt;p&gt;That was important.&lt;/p&gt;

&lt;p&gt;It meant DNS was working.&lt;/p&gt;

&lt;p&gt;It meant external traffic was reaching the public interface.&lt;/p&gt;

&lt;p&gt;It meant the firewall was not silently dropping inbound requests.&lt;/p&gt;

&lt;p&gt;The requests were arriving exactly where they were supposed to.&lt;/p&gt;

&lt;p&gt;So why was validation timing out?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Routing Table Finally Revealed The Problem
&lt;/h2&gt;

&lt;p&gt;The next step was checking the routing table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ip route
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output looked roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;default via 10.2.0.1 dev ens4 metric 100
default via 51.x.x.x dev ens3 metric 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server had two network interfaces.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ens3 connected to the public network&lt;/li&gt;
&lt;li&gt;ens4 connected to a private network&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Initially, I didn't think much of it. Multi-interface servers are fairly common.&lt;/p&gt;

&lt;p&gt;Then I started checking where outbound traffic was actually leaving.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ip route get 8.8.8.8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result surprised me.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;8.8.8.8 via 10.2.0.1 dev ens4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I tested several additional destinations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ip route get 1.1.1.1
ip route get 8.8.4.4
ip route get &amp;lt;validator-ip&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every single lookup showed outbound traffic leaving through the private interface.&lt;/p&gt;

&lt;p&gt;That was the breakthrough.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding What Was Actually Happening
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A Quick Note About Asymmetric Routing
&lt;/h3&gt;

&lt;p&gt;The issue I was dealing with has a name: asymmetric routing.&lt;/p&gt;

&lt;p&gt;Traffic was entering the server through the public interface (&lt;code&gt;ens3&lt;/code&gt;), but Linux was attempting to send replies through the private interface (&lt;code&gt;ens4&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;From the application's perspective everything looked healthy.&lt;/p&gt;

&lt;p&gt;From Let's Encrypt's perspective the connection never completed successfully.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Can Cause Timeouts
&lt;/h3&gt;

&lt;p&gt;While investigating the issue, I came across Linux's Reverse Path Filtering (&lt;code&gt;rp_filter&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;When a packet arrives on one interface but Linux believes the reply should leave through another, the kernel may treat the traffic as suspicious and drop it.&lt;/p&gt;

&lt;p&gt;Whether the packet was being dropped by &lt;code&gt;rp_filter&lt;/code&gt;, upstream networking, or another layer wasn't something I conclusively proved.&lt;/p&gt;

&lt;p&gt;But understanding this interaction finally explained why inbound requests were visible while validation attempts still timed out.&lt;/p&gt;

&lt;p&gt;Let's Encrypt validators were connecting to my public IP.&lt;/p&gt;

&lt;p&gt;Those packets arrived through the public interface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Let's Encrypt
      |
      v
Public Interface (ens3)
      |
      v
    Server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So far, everything was fine.&lt;/p&gt;

&lt;p&gt;The problem appeared when Linux generated a response.&lt;/p&gt;

&lt;p&gt;Instead of sending the response back through the same public interface, the routing table was selecting the private interface as the preferred outbound path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Let's Encrypt
      |
      v
Public Interface (ens3)
      |
      v
    Server
      |
      v
Private Interface (ens4)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a classic networking issue known as &lt;strong&gt;asymmetric routing&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Traffic enters through one interface and attempts to leave through another.&lt;/p&gt;

&lt;p&gt;From the application's perspective, everything appears healthy.&lt;/p&gt;

&lt;p&gt;From the remote system's perspective, the connection never completes correctly.&lt;/p&gt;

&lt;p&gt;The result is timeouts.&lt;/p&gt;

&lt;p&gt;Exactly what Let's Encrypt was reporting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Was So Difficult To Find
&lt;/h2&gt;

&lt;p&gt;The issue hid behind several misleading signals.&lt;/p&gt;

&lt;p&gt;The application was healthy.&lt;/p&gt;

&lt;p&gt;The reverse proxy was healthy.&lt;/p&gt;

&lt;p&gt;DNS was correct.&lt;/p&gt;

&lt;p&gt;Ports were open.&lt;/p&gt;

&lt;p&gt;The firewall was configured properly.&lt;/p&gt;

&lt;p&gt;Every layer looked healthy when viewed independently.&lt;/p&gt;

&lt;p&gt;The actual failure existed underneath all of them.&lt;/p&gt;

&lt;p&gt;Most deployment troubleshooting guides focus on application configuration, reverse proxies, certificates, and firewall rules. Very few immediately point you toward route selection.&lt;/p&gt;

&lt;p&gt;Especially when the server appears to be functioning normally.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix
&lt;/h2&gt;

&lt;p&gt;Once the routing issue was identified, the fix itself was straightforward.&lt;/p&gt;

&lt;p&gt;The server needed to use the public interface for internet-bound traffic instead of attempting to route those responses through the private network.&lt;/p&gt;

&lt;p&gt;After correcting the routing configuration, I verified the result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ip route get 8.8.8.8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output now showed traffic leaving through the public interface.&lt;/p&gt;

&lt;p&gt;Exactly what I wanted.&lt;/p&gt;

&lt;p&gt;I restarted Caddy and triggered another validation attempt.&lt;/p&gt;

&lt;p&gt;This time the validators connected successfully, the challenge completed, and the certificate was issued within seconds.&lt;/p&gt;

&lt;p&gt;Hours of troubleshooting ultimately came down to a routing decision that Linux was making automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;A few takeaways from this incident stood out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Error messages often describe symptoms, not causes
&lt;/h3&gt;

&lt;p&gt;The logs repeatedly suggested firewall issues.&lt;/p&gt;

&lt;p&gt;The firewall was never the problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stop retrying and start investigating
&lt;/h3&gt;

&lt;p&gt;I hit Let's Encrypt's authorization limits because I kept retrying before understanding the failure.&lt;/p&gt;

&lt;p&gt;That was entirely avoidable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Packet captures reveal reality
&lt;/h3&gt;

&lt;p&gt;When logs become confusing, tcpdump often provides a much clearer picture of what is actually happening on the network.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-interface servers deserve extra scrutiny
&lt;/h3&gt;

&lt;p&gt;If a server has both public and private interfaces, route selection should be one of the first things you verify.&lt;/p&gt;

&lt;h3&gt;
  
  
  Two commands can save hours
&lt;/h3&gt;

&lt;p&gt;If you're debugging unexplained connectivity issues, run these early:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ip route

ip route get 8.8.8.8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those two commands exposed the real problem faster than everything else I tried.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;I started this investigation convinced I had a TLS problem.&lt;/p&gt;

&lt;p&gt;Then I thought it was DNS.&lt;/p&gt;

&lt;p&gt;Then I suspected the firewall.&lt;/p&gt;

&lt;p&gt;Then I questioned my reverse proxy configuration.&lt;/p&gt;

&lt;p&gt;In the end, none of those were responsible.&lt;/p&gt;

&lt;p&gt;The real issue was a routing decision happening at the operating system level long before the request ever reached my application.&lt;/p&gt;

&lt;p&gt;And like most memorable debugging sessions, the hardest part wasn't fixing the problem.&lt;/p&gt;

&lt;p&gt;It was figuring out where the problem actually lived.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>devops</category>
      <category>networking</category>
      <category>debugging</category>
    </item>
    <item>
      <title>ClickHouse Duplicates: Clean Your Results vs. Clean Your Storage</title>
      <dc:creator>Mohamed Hussain S</dc:creator>
      <pubDate>Sat, 13 Jun 2026 12:36:42 +0000</pubDate>
      <link>https://dev.to/mohhddhassan/select-final-and-optimize-final-are-not-the-same-thing-7ak</link>
      <guid>https://dev.to/mohhddhassan/select-final-and-optimize-final-are-not-the-same-thing-7ak</guid>
      <description>&lt;p&gt;The word &lt;code&gt;FINAL&lt;/code&gt; appears in multiple places in ClickHouse.&lt;/p&gt;

&lt;p&gt;Two of the most commonly confused examples are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;OPTIMIZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first glance, they sound like they should do roughly the same thing.&lt;/p&gt;

&lt;p&gt;After all, both contain the word &lt;code&gt;FINAL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But they actually solve two completely different problems.&lt;/p&gt;

&lt;p&gt;One affects query results.&lt;/p&gt;

&lt;p&gt;The other affects how data is physically stored.&lt;/p&gt;

&lt;p&gt;Understanding this distinction can save a lot of confusion when working with MergeTree tables.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why This Confusion Happens
&lt;/h1&gt;

&lt;p&gt;Most people encounter &lt;code&gt;FINAL&lt;/code&gt; while working with engines like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ReplacingMergeTree&lt;/li&gt;
&lt;li&gt;SummingMergeTree&lt;/li&gt;
&lt;li&gt;AggregatingMergeTree&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sooner or later they notice something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;returns duplicate versions of rows.&lt;/p&gt;

&lt;p&gt;Then they discover:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and suddenly the results look correct.&lt;/p&gt;

&lt;p&gt;Naturally, many people assume:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;FINAL merges the table.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But that's not exactly what is happening.&lt;/p&gt;




&lt;h1&gt;
  
  
  What SELECT FINAL Actually Does
&lt;/h1&gt;

&lt;p&gt;When you run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ClickHouse applies merge logic during query execution.&lt;/p&gt;

&lt;p&gt;Think of it as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Show me what the table would look like if all relevant merges had already happened."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The important part:&lt;/p&gt;

&lt;p&gt;It only affects the query result.&lt;/p&gt;

&lt;p&gt;After the query finishes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;parts remain unchanged&lt;/li&gt;
&lt;li&gt;storage remains unchanged&lt;/li&gt;
&lt;li&gt;nothing is rewritten on disk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The merge logic happens temporarily while the query is running.&lt;/p&gt;

&lt;p&gt;Once the query completes, the table is exactly as it was before.&lt;/p&gt;




&lt;h1&gt;
  
  
  What OPTIMIZE FINAL Actually Does
&lt;/h1&gt;

&lt;p&gt;Now let's look at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;OPTIMIZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a completely different operation.&lt;/p&gt;

&lt;p&gt;Instead of modifying query results, ClickHouse physically merges parts on disk.&lt;/p&gt;

&lt;p&gt;The operation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rewrites data&lt;/li&gt;
&lt;li&gt;merges eligible parts&lt;/li&gt;
&lt;li&gt;removes obsolete versions&lt;/li&gt;
&lt;li&gt;creates larger merged parts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike &lt;code&gt;SELECT FINAL&lt;/code&gt;, the effects remain after the command completes.&lt;/p&gt;

&lt;p&gt;This is a storage operation, not a query operation.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Simplest Way to Remember It
&lt;/h1&gt;

&lt;p&gt;Whenever I think about these commands, I use a very simple mental model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SELECT FINAL&lt;/td&gt;
&lt;td&gt;Clean the result&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OPTIMIZE FINAL&lt;/td&gt;
&lt;td&gt;Clean the storage&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's really the core difference.&lt;/p&gt;

&lt;p&gt;One affects what you see.&lt;/p&gt;

&lt;p&gt;The other affects how the data is stored.&lt;/p&gt;




&lt;h1&gt;
  
  
  Does OPTIMIZE FINAL Create One Giant Part?
&lt;/h1&gt;

&lt;p&gt;This is another common misconception.&lt;/p&gt;

&lt;p&gt;Suppose your table is partitioned like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toYYYYMM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and contains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025-01
2025-02
2025-03
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Many people assume:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;OPTIMIZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;will merge the entire table into one huge part.&lt;/p&gt;

&lt;p&gt;It won't.&lt;/p&gt;

&lt;p&gt;Merge operations do not cross partition boundaries.&lt;/p&gt;

&lt;p&gt;What you are more likely to end up with is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025-01 -&amp;gt; one large part
2025-02 -&amp;gt; one large part
2025-03 -&amp;gt; one large part
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each partition is optimized independently.&lt;/p&gt;

&lt;p&gt;This distinction becomes important when working with large datasets.&lt;/p&gt;




&lt;h1&gt;
  
  
  Should You Use SELECT FINAL Everywhere?
&lt;/h1&gt;

&lt;p&gt;Not really.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;FINAL&lt;/code&gt; is incredibly useful when correctness matters.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;may be exactly what you need when querying a ReplacingMergeTree table and you want the latest state immediately.&lt;/p&gt;

&lt;p&gt;But it still introduces additional work during query execution.&lt;/p&gt;

&lt;p&gt;So while modern ClickHouse versions have significantly improved FINAL performance, it shouldn't automatically become your default query pattern.&lt;/p&gt;

&lt;p&gt;Use it when you need the merge logic.&lt;/p&gt;

&lt;p&gt;Not because it's available.&lt;/p&gt;




&lt;h1&gt;
  
  
  Should You Run OPTIMIZE FINAL Regularly?
&lt;/h1&gt;

&lt;p&gt;Also no.&lt;/p&gt;

&lt;p&gt;This is another mistake people sometimes make.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;OPTIMIZE FINAL&lt;/code&gt; is a heavy operation.&lt;/p&gt;

&lt;p&gt;It forces merges that ClickHouse would normally schedule on its own.&lt;/p&gt;

&lt;p&gt;In many cases, background merges already do a good job of maintaining healthy storage.&lt;/p&gt;

&lt;p&gt;Running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;OPTIMIZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;every time you insert data is usually unnecessary.&lt;/p&gt;

&lt;p&gt;Think of it as an operational tool.&lt;/p&gt;

&lt;p&gt;Not a routine query optimization technique.&lt;/p&gt;




&lt;h1&gt;
  
  
  When Would You Use Each?
&lt;/h1&gt;

&lt;h3&gt;
  
  
  SELECT FINAL
&lt;/h3&gt;

&lt;p&gt;Useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;querying ReplacingMergeTree tables&lt;/li&gt;
&lt;li&gt;validating latest state&lt;/li&gt;
&lt;li&gt;merge results are needed immediately&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  OPTIMIZE FINAL
&lt;/h3&gt;

&lt;p&gt;Useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;forcing merges intentionally&lt;/li&gt;
&lt;li&gt;maintenance operations&lt;/li&gt;
&lt;li&gt;testing storage behavior&lt;/li&gt;
&lt;li&gt;special operational situations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both have valid use cases.&lt;/p&gt;

&lt;p&gt;They simply solve different problems.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;The word &lt;code&gt;FINAL&lt;/code&gt; appears in both commands, which makes them easy to confuse.&lt;/p&gt;

&lt;p&gt;But once you understand the difference, many ClickHouse behaviors start making a lot more sense.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT FINAL&lt;/code&gt; does not physically merge your table.&lt;/p&gt;

&lt;p&gt;It only applies merge logic while reading data.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;OPTIMIZE FINAL&lt;/code&gt; actually rewrites and merges parts on disk.&lt;/p&gt;

&lt;p&gt;Or put another way:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;SELECT FINAL cleans what you see.&lt;/p&gt;

&lt;p&gt;OPTIMIZE FINAL cleans how the data is stored.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that's a distinction every ClickHouse engineer should understand.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>database</category>
      <category>dataengineering</category>
      <category>sql</category>
    </item>
    <item>
      <title>Why ClickHouse Loves Append-Heavy Workloads</title>
      <dc:creator>Mohamed Hussain S</dc:creator>
      <pubDate>Wed, 27 May 2026 09:24:45 +0000</pubDate>
      <link>https://dev.to/mohhddhassan/why-clickhouse-loves-append-heavy-workloads-3m4h</link>
      <guid>https://dev.to/mohhddhassan/why-clickhouse-loves-append-heavy-workloads-3m4h</guid>
      <description>&lt;p&gt;One thing that makes ClickHouse feel very different from traditional OLTP databases is how much it prefers append-heavy workloads.&lt;/p&gt;

&lt;p&gt;And once you understand why, many ClickHouse behaviors suddenly start making sense:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;immutable parts&lt;/li&gt;
&lt;li&gt;background merges&lt;/li&gt;
&lt;li&gt;ingestion batching&lt;/li&gt;
&lt;li&gt;merge pressure&lt;/li&gt;
&lt;li&gt;even why &lt;code&gt;FINAL&lt;/code&gt; exists&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first, this can feel strange if you are coming from databases like PostgreSQL or MySQL where updates and row modifications are extremely normal.&lt;/p&gt;

&lt;p&gt;But analytical databases think very differently internally.&lt;/p&gt;




&lt;h1&gt;
  
  
  Traditional OLTP Systems Think in Terms of Updates
&lt;/h1&gt;

&lt;p&gt;In most transactional databases, modifying rows constantly is completely normal.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;stock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stock&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;last_login&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These systems are heavily optimized for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;transactional correctness&lt;/li&gt;
&lt;li&gt;row-level updates&lt;/li&gt;
&lt;li&gt;operational consistency&lt;/li&gt;
&lt;li&gt;frequent modifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because that is exactly what OLTP workloads need.&lt;/p&gt;

&lt;p&gt;And honestly, PostgreSQL is incredibly good at this.&lt;/p&gt;




&lt;h1&gt;
  
  
  ClickHouse Thinks Very Differently
&lt;/h1&gt;

&lt;p&gt;ClickHouse is not primarily designed around transactional row updates.&lt;/p&gt;

&lt;p&gt;It is designed around analytical workloads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;metrics&lt;/li&gt;
&lt;li&gt;logs&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;event streams&lt;/li&gt;
&lt;li&gt;historical analytics&lt;/li&gt;
&lt;li&gt;large aggregations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And these workloads are naturally append-heavy.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;New events continuously arrive.&lt;/p&gt;

&lt;p&gt;Old data is rarely modified frequently.&lt;/p&gt;

&lt;p&gt;That changes the entire storage philosophy underneath.&lt;/p&gt;




&lt;h1&gt;
  
  
  ClickHouse Stores Data as Immutable Parts
&lt;/h1&gt;

&lt;p&gt;This is one of the most important concepts to understand.&lt;/p&gt;

&lt;p&gt;In MergeTree engines, ClickHouse stores inserts as immutable parts on disk.&lt;/p&gt;

&lt;p&gt;Meaning:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;inserts create new parts instead of constantly rewriting existing rows directly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And honestly, this is one of the biggest reasons ClickHouse scales analytical ingestion so well.&lt;/p&gt;

&lt;p&gt;Because append-heavy writes are operationally much cheaper than constantly rewriting data in place.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Immutable Storage Works So Well
&lt;/h1&gt;

&lt;p&gt;Immutable storage gives ClickHouse several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;efficient sequential writes&lt;/li&gt;
&lt;li&gt;better compression&lt;/li&gt;
&lt;li&gt;reduced locking pressure&lt;/li&gt;
&lt;li&gt;faster analytical scans&lt;/li&gt;
&lt;li&gt;simpler background merging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of constantly modifying rows directly, ClickHouse can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;append data quickly&lt;/li&gt;
&lt;li&gt;merge parts later&lt;/li&gt;
&lt;li&gt;optimize storage asynchronously&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model fits analytical systems extremely well.&lt;/p&gt;

&lt;p&gt;Especially when ingesting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;logs&lt;/li&gt;
&lt;li&gt;metrics&lt;/li&gt;
&lt;li&gt;telemetry&lt;/li&gt;
&lt;li&gt;clickstream data&lt;/li&gt;
&lt;li&gt;observability events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;at very large scale.&lt;/p&gt;




&lt;h1&gt;
  
  
  Columnar Storage Makes This Even More Powerful
&lt;/h1&gt;

&lt;p&gt;Another reason append-heavy storage works so well in ClickHouse is because data is stored by column instead of by row.&lt;/p&gt;

&lt;p&gt;This matters a lot for analytical workloads.&lt;/p&gt;

&lt;p&gt;Because queries often need only a few columns from massive datasets.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_time_ms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;does not need to read:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user_agent&lt;/li&gt;
&lt;li&gt;request_headers&lt;/li&gt;
&lt;li&gt;payload columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;at all.&lt;/p&gt;

&lt;p&gt;And because parts are immutable, ClickHouse can compress these columns extremely efficiently using specialized compression algorithms.&lt;/p&gt;

&lt;p&gt;This is one of the reasons analytical scans in ClickHouse can remain surprisingly fast even at very large scale.&lt;/p&gt;




&lt;h1&gt;
  
  
  This Is Why Background Merges Exist
&lt;/h1&gt;

&lt;p&gt;One thing that confused me initially was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;why ClickHouse relies so heavily on merges.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But once you understand immutable parts, merges make perfect sense.&lt;/p&gt;

&lt;p&gt;Because inserts continuously create smaller parts.&lt;/p&gt;

&lt;p&gt;And background merges later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;combine them&lt;/li&gt;
&lt;li&gt;reduce fragmentation&lt;/li&gt;
&lt;li&gt;improve compression&lt;/li&gt;
&lt;li&gt;optimize query performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tiny inserts become dangerous&lt;/li&gt;
&lt;li&gt;too many parts create pressure&lt;/li&gt;
&lt;li&gt;unhealthy fragmentation slows systems down&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many ClickHouse operational behaviors trace back to this append-heavy storage philosophy underneath.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Updates Feel Different in ClickHouse
&lt;/h1&gt;

&lt;p&gt;This does &lt;em&gt;not&lt;/em&gt; mean ClickHouse cannot handle updates.&lt;/p&gt;

&lt;p&gt;It absolutely can.&lt;/p&gt;

&lt;p&gt;But updates behave differently because the storage engine is optimized differently.&lt;/p&gt;

&lt;p&gt;In many cases, updates are internally handled through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mutations&lt;/li&gt;
&lt;li&gt;part rewrites&lt;/li&gt;
&lt;li&gt;asynchronous merge operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;instead of lightweight in-place row modifications like traditional OLTP systems.&lt;/p&gt;

&lt;p&gt;And this is why large-scale frequent updates can feel operationally heavier in ClickHouse.&lt;/p&gt;

&lt;p&gt;Because the system is optimizing for analytical scale first.&lt;/p&gt;

&lt;p&gt;Not transactional mutation-heavy workloads.&lt;/p&gt;




&lt;h1&gt;
  
  
  Many Systems Handle Updates as New Inserts Instead
&lt;/h1&gt;

&lt;p&gt;One thing I found interesting is that many ClickHouse workloads avoid frequent in-place updates entirely.&lt;/p&gt;

&lt;p&gt;Instead, systems often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;insert newer versions of rows&lt;/li&gt;
&lt;li&gt;append updated events&lt;/li&gt;
&lt;li&gt;track timestamps or versions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and later use things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ReplacingMergeTree&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;argMax()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;merge logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;to retrieve the latest state.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;argMax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_status&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This fits naturally into ClickHouse’s append-heavy design philosophy.&lt;/p&gt;

&lt;p&gt;Instead of constantly rewriting rows directly, systems continuously append newer versions while merges and analytical queries reconcile state later.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Event Streams Fit ClickHouse So Naturally
&lt;/h1&gt;

&lt;p&gt;This is honestly where ClickHouse feels extremely powerful.&lt;/p&gt;

&lt;p&gt;Modern systems continuously generate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;logs&lt;/li&gt;
&lt;li&gt;metrics&lt;/li&gt;
&lt;li&gt;traces&lt;/li&gt;
&lt;li&gt;user events&lt;/li&gt;
&lt;li&gt;telemetry streams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And these workloads naturally behave like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;append-heavy event streams.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;New records continuously arrive.&lt;/p&gt;

&lt;p&gt;Historical records mostly remain unchanged.&lt;/p&gt;

&lt;p&gt;That is exactly the kind of workload ClickHouse loves.&lt;/p&gt;

&lt;p&gt;Which is why architectures like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Applications
      ↓
Kafka / Streaming
      ↓
ClickHouse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;feel so natural operationally.&lt;/p&gt;

&lt;p&gt;The storage model aligns perfectly with the workload behavior.&lt;/p&gt;




&lt;h1&gt;
  
  
  This Also Explains Why FINAL Exists
&lt;/h1&gt;

&lt;p&gt;A lot of ClickHouse behavior becomes easier to understand once you think in terms of append-heavy storage.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;code&gt;ReplacingMergeTree&lt;/code&gt; may temporarily contain multiple versions of rows until merges eventually reconcile them.&lt;/p&gt;

&lt;p&gt;That is why queries sometimes use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to apply merge logic during query execution.&lt;/p&gt;

&lt;p&gt;Again:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;immutable parts&lt;/li&gt;
&lt;li&gt;append-heavy ingestion&lt;/li&gt;
&lt;li&gt;asynchronous merging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;all connect back together underneath.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Important Lesson
&lt;/h1&gt;

&lt;p&gt;One thing I’ve started realizing with ClickHouse is that many operational behaviors make much more sense once you stop thinking in terms of:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"traditional transactional databases."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;ClickHouse is optimizing for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;analytical ingestion&lt;/li&gt;
&lt;li&gt;historical querying&lt;/li&gt;
&lt;li&gt;append-heavy workloads&lt;/li&gt;
&lt;li&gt;large-scale scans &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And once you understand that design philosophy, many of its storage behaviors stop feeling strange.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final Thought
&lt;/h1&gt;

&lt;p&gt;ClickHouse is not trying to behave like a traditional OLTP database.&lt;/p&gt;

&lt;p&gt;It is optimizing for analytical scale.&lt;/p&gt;

&lt;p&gt;And append-heavy design is one of the biggest reasons it performs so well for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;metrics&lt;/li&gt;
&lt;li&gt;event streams&lt;/li&gt;
&lt;li&gt;analytical systems&lt;/li&gt;
&lt;li&gt;real-time analytics workloads&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>clickhouse</category>
      <category>olap</category>
      <category>databasearchitecture</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Why Too Many Parts Hurt ClickHouse Performance</title>
      <dc:creator>Mohamed Hussain S</dc:creator>
      <pubDate>Mon, 25 May 2026 14:00:25 +0000</pubDate>
      <link>https://dev.to/mohhddhassan/why-too-many-parts-hurt-clickhouse-performance-4c7n</link>
      <guid>https://dev.to/mohhddhassan/why-too-many-parts-hurt-clickhouse-performance-4c7n</guid>
      <description>&lt;p&gt;A lot of people initially think ClickHouse performance problems come from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;large queries&lt;/li&gt;
&lt;li&gt;bad joins&lt;/li&gt;
&lt;li&gt;massive datasets&lt;/li&gt;
&lt;li&gt;missing indexes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And honestly, those things &lt;em&gt;can&lt;/em&gt; matter.&lt;/p&gt;

&lt;p&gt;But one of the most common operational problems in ClickHouse often starts much earlier:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;too many tiny parts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is one of those issues that usually stays invisible at first.&lt;/p&gt;

&lt;p&gt;Then suddenly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;merges fall behind&lt;/li&gt;
&lt;li&gt;queries slow down&lt;/li&gt;
&lt;li&gt;memory usage increases&lt;/li&gt;
&lt;li&gt;inserts become unstable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the cluster starts behaving strangely.&lt;/p&gt;




&lt;h1&gt;
  
  
  Every Insert Creates Parts
&lt;/h1&gt;

&lt;p&gt;This is the first thing that’s important to understand.&lt;/p&gt;

&lt;p&gt;In MergeTree-based engines, ClickHouse stores data as immutable parts.&lt;/p&gt;

&lt;p&gt;Something as simple as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;creates new parts on disk.&lt;/p&gt;

&lt;p&gt;And this is completely normal.&lt;/p&gt;

&lt;p&gt;ClickHouse is designed around this storage model.&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;parts themselves are not the problem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The real issue starts when parts begin accumulating faster than merges can stabilize them.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Tiny Inserts Become Dangerous
&lt;/h1&gt;

&lt;p&gt;At smaller scale, tiny inserts may seem harmless.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inserting row-by-row&lt;/li&gt;
&lt;li&gt;extremely frequent micro-batches&lt;/li&gt;
&lt;li&gt;tiny streaming flush intervals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Initially:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;everything still works.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But over time, the number of parts starts growing aggressively.&lt;/p&gt;

&lt;p&gt;Now ClickHouse has to manage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more metadata&lt;/li&gt;
&lt;li&gt;more merges&lt;/li&gt;
&lt;li&gt;more scheduling&lt;/li&gt;
&lt;li&gt;more file operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates operational overhead.&lt;/p&gt;

&lt;p&gt;Meaning:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;the system starts spending increasing resources managing fragmentation itself.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h1&gt;
  
  
  Why Merges Matter So Much
&lt;/h1&gt;

&lt;p&gt;ClickHouse relies heavily on background merges.&lt;/p&gt;

&lt;p&gt;These merges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;combine smaller parts&lt;/li&gt;
&lt;li&gt;reduce fragmentation&lt;/li&gt;
&lt;li&gt;improve compression&lt;/li&gt;
&lt;li&gt;optimize query performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Under healthy ingestion patterns, merges naturally keep the system stable over time.&lt;/p&gt;

&lt;p&gt;That is the ideal state.&lt;/p&gt;

&lt;p&gt;But problems start when:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;parts created per second
        &amp;gt;
parts merged per second
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now fragmented parts begin accumulating faster than ClickHouse can compact them.&lt;/p&gt;

&lt;p&gt;And this is usually where instability slowly starts building.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Dangerous Part Is That It Builds Slowly
&lt;/h1&gt;

&lt;p&gt;This is what makes the issue tricky operationally.&lt;/p&gt;

&lt;p&gt;You usually do not notice the problem immediately.&lt;/p&gt;

&lt;p&gt;The cluster may look perfectly healthy initially.&lt;/p&gt;

&lt;p&gt;Then gradually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;insert latency increases&lt;/li&gt;
&lt;li&gt;merges lag behind&lt;/li&gt;
&lt;li&gt;CPU usage becomes unstable&lt;/li&gt;
&lt;li&gt;queries become heavier&lt;/li&gt;
&lt;li&gt;replication slows down&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And eventually ClickHouse may start throwing errors like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Too many parts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At that point, the merge system is already under serious pressure.&lt;/p&gt;




&lt;h1&gt;
  
  
  Queries Also Become More Expensive
&lt;/h1&gt;

&lt;p&gt;A lot of people think parts only affect inserts.&lt;/p&gt;

&lt;p&gt;But queries suffer too.&lt;/p&gt;

&lt;p&gt;Because queries now need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;open more parts&lt;/li&gt;
&lt;li&gt;scan more metadata&lt;/li&gt;
&lt;li&gt;coordinate more files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even when the actual dataset itself is not massive.&lt;/p&gt;

&lt;p&gt;So sometimes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;performance degradation comes more from fragmentation than raw data volume.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a very important operational insight.&lt;/p&gt;




&lt;h1&gt;
  
  
  FINAL Does Not Really Solve This
&lt;/h1&gt;

&lt;p&gt;One thing that’s important to understand:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;FINAL&lt;/code&gt; is not really a solution for too many parts.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;FINAL&lt;/code&gt; applies merge logic during query execution.&lt;/p&gt;

&lt;p&gt;But the fragmented parts still physically exist underneath.&lt;/p&gt;

&lt;p&gt;So if the system already has excessive fragmentation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;queries still scan many parts&lt;/li&gt;
&lt;li&gt;merge pressure still exists&lt;/li&gt;
&lt;li&gt;query execution can become heavier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;FINAL can actually become more expensive when fragmentation becomes unhealthy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The real fix is usually improving ingestion and merge behavior itself.&lt;/p&gt;




&lt;h1&gt;
  
  
  Over-Partitioning Can Quietly Make This Worse
&lt;/h1&gt;

&lt;p&gt;Another thing that often accelerates part explosion is overly granular partitioning.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toYYYYMMDDhh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;instead of something broader like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;toYYYYMM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now even small inserts may create parts across many partitions simultaneously.&lt;/p&gt;

&lt;p&gt;Which means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;a single insert can end up creating multiple fragmented parts underneath.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And over time, merge pressure increases much faster than expected.&lt;/p&gt;




&lt;h1&gt;
  
  
  ClickHouse Also Has Ways to Help
&lt;/h1&gt;

&lt;p&gt;Modern ClickHouse versions also support features like async inserts to help reduce excessive tiny-part creation.&lt;/p&gt;

&lt;p&gt;Instead of immediately flushing every small insert into separate parts, ClickHouse can buffer inserts internally before writing larger parts to disk.&lt;/p&gt;

&lt;p&gt;This helps reduce fragmentation and merge pressure in workloads that naturally produce smaller inserts.&lt;/p&gt;

&lt;p&gt;But async inserts are not a replacement for healthy ingestion patterns themselves.&lt;/p&gt;

&lt;p&gt;Stable batching still matters a lot.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Batch Size Matters So Much
&lt;/h1&gt;

&lt;p&gt;ClickHouse generally performs much better with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;larger batches&lt;/li&gt;
&lt;li&gt;fewer inserts&lt;/li&gt;
&lt;li&gt;healthier merge behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because fewer parts means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fewer merges&lt;/li&gt;
&lt;li&gt;lower metadata overhead&lt;/li&gt;
&lt;li&gt;better compression&lt;/li&gt;
&lt;li&gt;more efficient scans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is one of the reasons ClickHouse ingestion patterns often look very different from traditional OLTP systems.&lt;/p&gt;




&lt;h1&gt;
  
  
  Too Many Parts Also Affects Startup and Recovery
&lt;/h1&gt;

&lt;p&gt;Another thing people often discover late:&lt;/p&gt;

&lt;p&gt;Large numbers of parts also affect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;startup time&lt;/li&gt;
&lt;li&gt;replication recovery&lt;/li&gt;
&lt;li&gt;metadata loading&lt;/li&gt;
&lt;li&gt;server restarts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because ClickHouse now has to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scan part metadata&lt;/li&gt;
&lt;li&gt;validate parts&lt;/li&gt;
&lt;li&gt;rebuild internal state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;before the server becomes fully operational again.&lt;/p&gt;

&lt;p&gt;So the issue is not just query performance.&lt;/p&gt;

&lt;p&gt;It becomes an overall operational stability problem.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Important Lesson
&lt;/h1&gt;

&lt;p&gt;One thing I’ve noticed with ClickHouse is that many performance problems are actually merge-management problems underneath.&lt;/p&gt;

&lt;p&gt;And too many parts is one of the clearest examples of that.&lt;/p&gt;

&lt;p&gt;Because the issue usually is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“ClickHouse cannot handle large data.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The issue is more often:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;fragmentation and merge pressure slowly became unhealthy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a very different operational problem.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final Thought
&lt;/h1&gt;

&lt;p&gt;ClickHouse is extremely good at handling massive analytical workloads.&lt;/p&gt;

&lt;p&gt;But it performs best when the storage engine is allowed to merge parts efficiently.&lt;/p&gt;

&lt;p&gt;And sometimes the biggest performance problem is not the query itself.&lt;/p&gt;

&lt;p&gt;It is the thousands of tiny fragmented parts quietly building underneath the system over time.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>database</category>
      <category>dataengineering</category>
      <category>backend</category>
    </item>
    <item>
      <title>Why Real-Time Analytics Eventually Changes Your Database Architecture</title>
      <dc:creator>Mohamed Hussain S</dc:creator>
      <pubDate>Tue, 19 May 2026 16:36:19 +0000</pubDate>
      <link>https://dev.to/mohhddhassan/why-real-time-analytics-eventually-changes-your-database-architecture-7gf</link>
      <guid>https://dev.to/mohhddhassan/why-real-time-analytics-eventually-changes-your-database-architecture-7gf</guid>
      <description>&lt;p&gt;A lot of systems begin with a single database.&lt;/p&gt;

&lt;p&gt;Usually PostgreSQL.&lt;/p&gt;

&lt;p&gt;And honestly, in the beginning, that works perfectly fine.&lt;/p&gt;

&lt;p&gt;The application stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;users&lt;/li&gt;
&lt;li&gt;payments&lt;/li&gt;
&lt;li&gt;inventory&lt;/li&gt;
&lt;li&gt;authentication&lt;/li&gt;
&lt;li&gt;operational state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dashboards query the same database.&lt;/p&gt;

&lt;p&gt;Analytics queries also run directly on PostgreSQL.&lt;/p&gt;

&lt;p&gt;Everything feels simple.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Problem Usually Starts Slowly
&lt;/h1&gt;

&lt;p&gt;At first, analytical queries are small.&lt;/p&gt;

&lt;p&gt;Maybe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;daily reports&lt;/li&gt;
&lt;li&gt;lightweight aggregations&lt;/li&gt;
&lt;li&gt;small dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing too serious.&lt;/p&gt;

&lt;p&gt;But over time, systems start generating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more events&lt;/li&gt;
&lt;li&gt;more metrics&lt;/li&gt;
&lt;li&gt;more logs&lt;/li&gt;
&lt;li&gt;more historical records&lt;/li&gt;
&lt;li&gt;more observability data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And analytical workloads start behaving very differently from transactional workloads.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_time_ms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a very different kind of workload from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;stock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stock&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One is trying to preserve operational correctness.&lt;/p&gt;

&lt;p&gt;The other is trying to analyze huge amounts of historical data.&lt;/p&gt;

&lt;p&gt;And eventually those workloads start colliding.&lt;/p&gt;




&lt;h1&gt;
  
  
  PostgreSQL Slowly Becomes Responsible for Everything
&lt;/h1&gt;

&lt;p&gt;This is where things usually start getting interesting.&lt;/p&gt;

&lt;p&gt;A lot of systems unintentionally turn PostgreSQL into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the transactional database&lt;/li&gt;
&lt;li&gt;the reporting database&lt;/li&gt;
&lt;li&gt;the analytics database&lt;/li&gt;
&lt;li&gt;the observability database&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;all at the same time.&lt;/p&gt;

&lt;p&gt;And honestly, modern PostgreSQL is capable enough that this can work surprisingly well for a while.&lt;/p&gt;

&lt;p&gt;Until:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dashboards become heavier&lt;/li&gt;
&lt;li&gt;retention windows grow&lt;/li&gt;
&lt;li&gt;analytical scans become larger&lt;/li&gt;
&lt;li&gt;observability traffic increases&lt;/li&gt;
&lt;li&gt;aggregations become expensive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now suddenly the same database handling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;payments&lt;/li&gt;
&lt;li&gt;authentication&lt;/li&gt;
&lt;li&gt;users&lt;/li&gt;
&lt;li&gt;inventory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;is also handling large analytical workloads.&lt;/p&gt;

&lt;p&gt;And this is usually where architectural pressure starts building.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Real Problem Is Workload Isolation
&lt;/h1&gt;

&lt;p&gt;This is honestly the biggest lesson.&lt;/p&gt;

&lt;p&gt;The issue is usually not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“PostgreSQL is slow.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The issue is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;transactional workloads and analytical workloads optimize for completely different things.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Transactional systems care heavily about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;consistency&lt;/li&gt;
&lt;li&gt;operational latency&lt;/li&gt;
&lt;li&gt;updates&lt;/li&gt;
&lt;li&gt;row-level modifications&lt;/li&gt;
&lt;li&gt;business correctness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Analytical systems care heavily about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;large scans&lt;/li&gt;
&lt;li&gt;aggregations&lt;/li&gt;
&lt;li&gt;compression&lt;/li&gt;
&lt;li&gt;historical analytics&lt;/li&gt;
&lt;li&gt;query throughput&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are fundamentally different workload patterns.&lt;/p&gt;

&lt;p&gt;And eventually trying to optimize one database perfectly for both becomes painful.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Observability Changes Everything So Quickly
&lt;/h1&gt;

&lt;p&gt;One thing I find interesting is how fast observability workloads expose architectural limitations.&lt;/p&gt;

&lt;p&gt;Because observability systems continuously generate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;logs&lt;/li&gt;
&lt;li&gt;metrics&lt;/li&gt;
&lt;li&gt;traces&lt;/li&gt;
&lt;li&gt;events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And these workloads grow aggressively over time.&lt;/p&gt;

&lt;p&gt;Now imagine running:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;large aggregations&lt;/li&gt;
&lt;li&gt;historical scans&lt;/li&gt;
&lt;li&gt;high-cardinality queries&lt;/li&gt;
&lt;li&gt;real-time dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;on the same database handling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;authentication&lt;/li&gt;
&lt;li&gt;inventory&lt;/li&gt;
&lt;li&gt;operational business logic&lt;/li&gt;
&lt;li&gt;transactional traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At smaller scale this may still work.&lt;/p&gt;

&lt;p&gt;At larger scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;query contention increases&lt;/li&gt;
&lt;li&gt;operational latency becomes sensitive&lt;/li&gt;
&lt;li&gt;workload isolation becomes harder&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And eventually systems start evolving toward separation.&lt;/p&gt;




&lt;h1&gt;
  
  
  This Is Usually When Analytical Databases Start Appearing
&lt;/h1&gt;

&lt;p&gt;At some point, many systems evolve toward something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application
    ↓
PostgreSQL
    ↓
CDC / Kafka / Airbyte
    ↓
ClickHouse / OLAP DB
    ↓
Analytics / Dashboards / Observability
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern has become extremely common in modern analytical systems.&lt;/p&gt;

&lt;p&gt;And honestly, the reason is pretty simple:&lt;/p&gt;

&lt;p&gt;PostgreSQL remains responsible for operational correctness.&lt;/p&gt;

&lt;p&gt;ClickHouse becomes responsible for analytical scale.&lt;/p&gt;

&lt;p&gt;Each system handles the workload it was actually designed for.&lt;/p&gt;




&lt;h1&gt;
  
  
  Not All Analytical Data Needs PostgreSQL First
&lt;/h1&gt;

&lt;p&gt;One important thing though:&lt;/p&gt;

&lt;p&gt;Not all analytical data even originates from PostgreSQL.&lt;/p&gt;

&lt;p&gt;A lot of observability workloads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;logs&lt;/li&gt;
&lt;li&gt;metrics&lt;/li&gt;
&lt;li&gt;traces&lt;/li&gt;
&lt;li&gt;telemetry events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;often flow directly into ClickHouse/OLAP DB through streaming pipelines.&lt;/p&gt;

&lt;p&gt;Something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Applications / Services
        ↓
Kafka / Streaming Pipelines
        ↓
ClickHouse / OLAP DB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In many systems, PostgreSQL stores the business data while ClickHouse directly handles logs, metrics, events, and analytical workloads.&lt;/p&gt;

&lt;p&gt;And honestly, this makes a lot of sense.&lt;/p&gt;

&lt;p&gt;Because analytical systems are usually optimized for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;append-heavy ingestion&lt;/li&gt;
&lt;li&gt;historical querying&lt;/li&gt;
&lt;li&gt;event-style workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;not transactional business operations.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Not Just Use ClickHouse for Everything?
&lt;/h1&gt;

&lt;p&gt;This is another common misunderstanding.&lt;/p&gt;

&lt;p&gt;ClickHouse is incredible for analytical workloads.&lt;/p&gt;

&lt;p&gt;But transactional systems still require things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;frequent updates&lt;/li&gt;
&lt;li&gt;operational consistency&lt;/li&gt;
&lt;li&gt;transactional guarantees&lt;/li&gt;
&lt;li&gt;row-level modifications&lt;/li&gt;
&lt;li&gt;business-critical correctness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are not the primary design goals of analytical databases.&lt;/p&gt;

&lt;p&gt;You generally do not want your:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;authentication system&lt;/li&gt;
&lt;li&gt;payment workflows&lt;/li&gt;
&lt;li&gt;inventory updates&lt;/li&gt;
&lt;li&gt;operational application state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;depending entirely on analytical database behavior.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why CDC Pipelines Become So Important
&lt;/h1&gt;

&lt;p&gt;One reason this architecture became so practical is CDC (Change Data Capture).&lt;/p&gt;

&lt;p&gt;Instead of repeatedly exporting data manually, systems continuously stream changes from PostgreSQL into analytical systems using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kafka&lt;/li&gt;
&lt;li&gt;Debezium&lt;/li&gt;
&lt;li&gt;Airbyte&lt;/li&gt;
&lt;li&gt;streaming pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;operational systems continue working normally&lt;/li&gt;
&lt;li&gt;analytical systems receive near real-time data&lt;/li&gt;
&lt;li&gt;workloads stay separated cleanly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And analytical queries no longer compete directly against transactional traffic.&lt;/p&gt;




&lt;h1&gt;
  
  
  Don’t Rush Into Multi-Database Architectures
&lt;/h1&gt;

&lt;p&gt;One important thing though:&lt;/p&gt;

&lt;p&gt;Most systems do not need Kafka + ClickHouse pipelines on Day 1.&lt;/p&gt;

&lt;p&gt;Honestly, many applications can scale surprisingly far with PostgreSQL alone using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;proper indexing&lt;/li&gt;
&lt;li&gt;query optimization&lt;/li&gt;
&lt;li&gt;read replicas&lt;/li&gt;
&lt;li&gt;partitioning&lt;/li&gt;
&lt;li&gt;extensions like &lt;a href="https://github.com/citusdata/citus" rel="noopener noreferrer"&gt;Citus&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to introduce more infrastructure as early as possible.&lt;/p&gt;

&lt;p&gt;The real signal usually appears when analytical workloads start affecting operational user experience.&lt;/p&gt;

&lt;p&gt;That is often when workload separation starts becoming worth the additional architectural complexity.&lt;/p&gt;

&lt;p&gt;Because systems like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CDC pipelines&lt;/li&gt;
&lt;li&gt;Kafka&lt;/li&gt;
&lt;li&gt;analytical databases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;also introduce operational overhead of their own.&lt;/p&gt;

&lt;p&gt;And good architecture is usually about introducing complexity only when the workload actually demands it.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Bigger Engineering Lesson
&lt;/h1&gt;

&lt;p&gt;Most systems do not start with multiple databases.&lt;/p&gt;

&lt;p&gt;They evolve into them as workloads grow.&lt;/p&gt;

&lt;p&gt;Transactional workloads and analytical workloads behave very differently at scale.&lt;/p&gt;

&lt;p&gt;And eventually systems start separating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;operational correctness&lt;/li&gt;
&lt;li&gt;analytical querying&lt;/li&gt;
&lt;li&gt;observability workloads&lt;/li&gt;
&lt;li&gt;historical analytics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;into infrastructure optimized for each workload.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final Thought
&lt;/h1&gt;

&lt;p&gt;A lot of modern systems do not start with multiple databases.&lt;/p&gt;

&lt;p&gt;They evolve into them.&lt;/p&gt;

&lt;p&gt;Because transactional workloads and analytical workloads eventually want very different things from the same infrastructure.&lt;/p&gt;

&lt;p&gt;And real-time analytics is often the thing that forces that architectural separation to happen.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>postgres</category>
      <category>dataengineering</category>
      <category>architecture</category>
    </item>
    <item>
      <title>FINAL in ClickHouse Isn’t as Expensive as It Used to Be</title>
      <dc:creator>Mohamed Hussain S</dc:creator>
      <pubDate>Thu, 14 May 2026 16:04:00 +0000</pubDate>
      <link>https://dev.to/mohhddhassan/final-in-clickhouse-isnt-as-expensive-as-it-used-to-be-1ne9</link>
      <guid>https://dev.to/mohhddhassan/final-in-clickhouse-isnt-as-expensive-as-it-used-to-be-1ne9</guid>
      <description>&lt;p&gt;For a long time, the advice around &lt;code&gt;FINAL&lt;/code&gt; in ClickHouse was pretty straightforward:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Avoid it whenever possible.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And honestly, that advice existed for good reasons.&lt;/p&gt;

&lt;p&gt;Older versions of ClickHouse could make &lt;code&gt;FINAL&lt;/code&gt; extremely expensive depending on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;table size&lt;/li&gt;
&lt;li&gt;partitioning&lt;/li&gt;
&lt;li&gt;number of parts&lt;/li&gt;
&lt;li&gt;merge state&lt;/li&gt;
&lt;li&gt;query patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So people started treating &lt;code&gt;FINAL&lt;/code&gt; almost like a red flag.&lt;/p&gt;

&lt;p&gt;But modern ClickHouse has changed a lot.&lt;/p&gt;

&lt;p&gt;And I think the conversation around &lt;code&gt;FINAL&lt;/code&gt; deserves a bit more nuance now.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why FINAL Existed in the First Place
&lt;/h1&gt;

&lt;p&gt;To understand why &lt;code&gt;FINAL&lt;/code&gt; was historically considered expensive, you first need to understand what it actually does.&lt;/p&gt;

&lt;p&gt;In engines like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ReplacingMergeTree&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CollapsingMergeTree&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;VersionedCollapsingMergeTree&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ClickHouse does not immediately rewrite rows in place.&lt;/p&gt;

&lt;p&gt;Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inserts create new parts&lt;/li&gt;
&lt;li&gt;background merges reconcile rows later&lt;/li&gt;
&lt;li&gt;deduplication happens asynchronously&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means queries can temporarily see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;duplicate versions&lt;/li&gt;
&lt;li&gt;old versions&lt;/li&gt;
&lt;li&gt;intermediate states&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
&lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;FINAL&lt;/code&gt; forces ClickHouse to apply merge logic during query execution itself.&lt;/p&gt;

&lt;p&gt;That means the query may:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read more data&lt;/li&gt;
&lt;li&gt;perform additional deduplication work&lt;/li&gt;
&lt;li&gt;consume more CPU and memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why older advice strongly discouraged using it everywhere.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Old FINAL Problem
&lt;/h1&gt;

&lt;p&gt;Historically, &lt;code&gt;FINAL&lt;/code&gt; could become painful on large datasets.&lt;/p&gt;

&lt;p&gt;Especially when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;partitions were large&lt;/li&gt;
&lt;li&gt;too many parts existed&lt;/li&gt;
&lt;li&gt;merges lagged behind&lt;/li&gt;
&lt;li&gt;queries scanned massive ranges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;People would add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;FINAL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to "fix" duplicate rows without understanding why duplicates existed in the first place.&lt;/p&gt;

&lt;p&gt;The result was often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;slower queries&lt;/li&gt;
&lt;li&gt;higher memory usage&lt;/li&gt;
&lt;li&gt;unnecessary query overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the community advice became:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Design your schema properly and avoid FINAL whenever possible.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And honestly?&lt;/p&gt;

&lt;p&gt;That advice still matters.&lt;/p&gt;

&lt;p&gt;But the implementation of &lt;code&gt;FINAL&lt;/code&gt; itself has improved significantly over time.&lt;/p&gt;




&lt;h1&gt;
  
  
  Modern ClickHouse Has Improved FINAL a Lot
&lt;/h1&gt;

&lt;p&gt;Recent ClickHouse versions introduced multiple improvements around &lt;code&gt;FINAL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;parallel execution&lt;/li&gt;
&lt;li&gt;partition-aware optimizations&lt;/li&gt;
&lt;li&gt;improved memory behavior&lt;/li&gt;
&lt;li&gt;smarter merge execution&lt;/li&gt;
&lt;li&gt;reduced unnecessary reads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;FINAL&lt;/code&gt; is no longer the monster it used to be.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And this is important because newer ClickHouse guidance has also become more practical about using it when necessary.&lt;/p&gt;

&lt;p&gt;Even in some recent discussions and office hours from the ClickHouse ecosystem, using &lt;code&gt;FINAL&lt;/code&gt; for latest-state queries is no longer treated as automatically wrong.&lt;/p&gt;

&lt;p&gt;That would have sounded controversial a few years ago.&lt;/p&gt;




&lt;h1&gt;
  
  
  FINAL vs argMax Isn’t Always a Simple Comparison
&lt;/h1&gt;

&lt;p&gt;For a long time, many ClickHouse users avoided &lt;code&gt;FINAL&lt;/code&gt; by using patterns like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;argMax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;version&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And honestly, for older ClickHouse versions and large workloads, that often made sense.&lt;/p&gt;

&lt;p&gt;But modern ClickHouse has improved &lt;code&gt;FINAL&lt;/code&gt; significantly enough that the tradeoff is no longer as one-sided as it used to be.&lt;/p&gt;

&lt;p&gt;In some latest-state query scenarios, using &lt;code&gt;FINAL&lt;/code&gt; can now be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simpler&lt;/li&gt;
&lt;li&gt;easier to maintain&lt;/li&gt;
&lt;li&gt;and completely reasonable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;depending on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;table size&lt;/li&gt;
&lt;li&gt;partitioning&lt;/li&gt;
&lt;li&gt;query filters&lt;/li&gt;
&lt;li&gt;merge behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important part is understanding the workload instead of blindly following older rules.&lt;/p&gt;




&lt;h1&gt;
  
  
  So… Is FINAL Safe to Use Now?
&lt;/h1&gt;

&lt;p&gt;This is where nuance matters.&lt;/p&gt;

&lt;p&gt;The answer is not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"FINAL bad"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and also not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"FINAL free now"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real answer is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;FINAL is much more practical in modern ClickHouse, but workload design still matters.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That distinction is important.&lt;/p&gt;




&lt;h1&gt;
  
  
  Where FINAL Makes Sense
&lt;/h1&gt;

&lt;p&gt;There are legitimate cases where &lt;code&gt;FINAL&lt;/code&gt; is completely reasonable now.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;latest-state queries&lt;/li&gt;
&lt;li&gt;smaller partitions&lt;/li&gt;
&lt;li&gt;low-latency analytical workloads&lt;/li&gt;
&lt;li&gt;deduplicated views over mutable datasets&lt;/li&gt;
&lt;li&gt;operational analytics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Especially when using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;proper partitioning&lt;/li&gt;
&lt;li&gt;controlled part counts&lt;/li&gt;
&lt;li&gt;optimized schemas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these cases, modern ClickHouse handles &lt;code&gt;FINAL&lt;/code&gt; much better than older versions did.&lt;/p&gt;




&lt;h1&gt;
  
  
  Where FINAL Can Still Hurt
&lt;/h1&gt;

&lt;p&gt;Even with improvements, &lt;code&gt;FINAL&lt;/code&gt; is not magically free.&lt;/p&gt;

&lt;p&gt;It can still become expensive when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scanning huge datasets&lt;/li&gt;
&lt;li&gt;querying many partitions&lt;/li&gt;
&lt;li&gt;merges are heavily delayed&lt;/li&gt;
&lt;li&gt;part counts explode&lt;/li&gt;
&lt;li&gt;schema design is poor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;massive_events_table&lt;/span&gt;
&lt;span class="k"&gt;FINAL&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On very large analytical datasets, this can still force substantial extra work.&lt;/p&gt;

&lt;p&gt;So blindly adding &lt;code&gt;FINAL&lt;/code&gt; everywhere is still not a great idea.&lt;/p&gt;




&lt;h1&gt;
  
  
  SELECT ... FINAL vs OPTIMIZE TABLE ... FINAL
&lt;/h1&gt;

&lt;p&gt;One important distinction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;OPTIMIZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;FINAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;are completely different operations.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT ... FINAL&lt;/code&gt; applies merge logic during query execution.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;OPTIMIZE TABLE ... FINAL&lt;/code&gt; forces a heavy merge operation on storage parts themselves.&lt;/p&gt;

&lt;p&gt;The first is a query-time behavior.&lt;/p&gt;

&lt;p&gt;The second is a storage-level operation that can become extremely expensive on large datasets.&lt;/p&gt;

&lt;p&gt;People often mix these two together when discussing FINAL performance, but they solve very different problems.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Bigger Lesson Is Understanding Why You Need FINAL
&lt;/h1&gt;

&lt;p&gt;This is honestly the most important part.&lt;/p&gt;

&lt;p&gt;A lot of people use &lt;code&gt;FINAL&lt;/code&gt; reactively.&lt;/p&gt;

&lt;p&gt;They see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;duplicate rows&lt;/li&gt;
&lt;li&gt;outdated versions&lt;/li&gt;
&lt;li&gt;inconsistent query results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and immediately add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;FINAL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;without understanding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;merge behavior&lt;/li&gt;
&lt;li&gt;part lifecycle&lt;/li&gt;
&lt;li&gt;asynchronous deduplication&lt;/li&gt;
&lt;li&gt;storage engine behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That usually creates larger problems later.&lt;/p&gt;

&lt;p&gt;The better approach is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Understand why the table requires FINAL in the first place.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because sometimes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the schema can improve&lt;/li&gt;
&lt;li&gt;partitioning can improve&lt;/li&gt;
&lt;li&gt;merges can stabilize naturally&lt;/li&gt;
&lt;li&gt;query design can change&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And sometimes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;using FINAL is actually perfectly acceptable.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  ClickHouse Advice Evolves Too
&lt;/h1&gt;

&lt;p&gt;One thing I find interesting about ClickHouse is how quickly operational advice evolves as the engine improves.&lt;/p&gt;

&lt;p&gt;Advice that was absolutely correct for older versions can become incomplete later.&lt;/p&gt;

&lt;p&gt;And I think &lt;code&gt;FINAL&lt;/code&gt; is one of the best examples of that.&lt;/p&gt;

&lt;p&gt;Older guidance:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;avoid FINAL aggressively&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Modern reality:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;understand FINAL properly before deciding whether to avoid it&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a much more useful mental model now.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final Thought
&lt;/h1&gt;

&lt;p&gt;I still would not recommend blindly adding &lt;code&gt;FINAL&lt;/code&gt; everywhere.&lt;/p&gt;

&lt;p&gt;But I also do not think modern ClickHouse users should automatically treat it like a disaster anymore.&lt;/p&gt;

&lt;p&gt;The real question is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Is FINAL bad?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The real question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Why does this query need FINAL, and is that tradeoff acceptable for this workload?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That mindset leads to much better ClickHouse designs than simply following old rules blindly.&lt;/p&gt;




&lt;h1&gt;
  
  
  References
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://clickhouse.com/docs/sql-reference/statements/select/from#final-modifier" rel="noopener noreferrer"&gt;ClickHouse Docs - FINAL Modifier&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kb.altinity.com/altinity-kb-queries-and-syntax/altinity-kb-final-clause-speed/" rel="noopener noreferrer"&gt;Altinity KB - FINAL Clause Speed&lt;/a&gt;&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>dataengineering</category>
      <category>databaseopimization</category>
      <category>replacingmergetree</category>
    </item>
    <item>
      <title>Why PostgreSQL and ClickHouse Work So Well Together</title>
      <dc:creator>Mohamed Hussain S</dc:creator>
      <pubDate>Mon, 11 May 2026 09:26:10 +0000</pubDate>
      <link>https://dev.to/mohhddhassan/why-postgresql-and-clickhouse-work-so-well-together-5128</link>
      <guid>https://dev.to/mohhddhassan/why-postgresql-and-clickhouse-work-so-well-together-5128</guid>
      <description>&lt;p&gt;A lot of people compare PostgreSQL and ClickHouse like they are competing databases.&lt;/p&gt;

&lt;p&gt;They really are not.&lt;/p&gt;

&lt;p&gt;In fact, modern data systems often use both together.&lt;/p&gt;

&lt;p&gt;And once you understand what each database is optimized for, the reason becomes pretty obvious.&lt;/p&gt;




&lt;h1&gt;
  
  
  PostgreSQL and ClickHouse Solve Different Problems
&lt;/h1&gt;

&lt;p&gt;The biggest mistake people make is expecting both databases to behave similarly.&lt;/p&gt;

&lt;p&gt;They are built for entirely different workloads.&lt;/p&gt;

&lt;p&gt;PostgreSQL is primarily an OLTP database.&lt;/p&gt;

&lt;p&gt;ClickHouse is primarily an OLAP database.&lt;/p&gt;

&lt;p&gt;That single difference changes almost everything about how they think internally.&lt;/p&gt;




&lt;h1&gt;
  
  
  PostgreSQL Thinks About Transactions First
&lt;/h1&gt;

&lt;p&gt;PostgreSQL is extremely good at handling transactional workloads.&lt;/p&gt;

&lt;p&gt;Things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user data&lt;/li&gt;
&lt;li&gt;payments&lt;/li&gt;
&lt;li&gt;inventory&lt;/li&gt;
&lt;li&gt;banking records&lt;/li&gt;
&lt;li&gt;order systems&lt;/li&gt;
&lt;li&gt;application state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are systems where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;consistency matters&lt;/li&gt;
&lt;li&gt;updates happen frequently&lt;/li&gt;
&lt;li&gt;rows are modified constantly&lt;/li&gt;
&lt;li&gt;transactions must be reliable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;stock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stock&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This kind of workload is where PostgreSQL shines.&lt;/p&gt;

&lt;p&gt;You want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ACID guarantees&lt;/li&gt;
&lt;li&gt;reliable transactions&lt;/li&gt;
&lt;li&gt;row-level updates&lt;/li&gt;
&lt;li&gt;strong consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PostgreSQL is designed around exactly that.&lt;/p&gt;




&lt;h1&gt;
  
  
  ClickHouse Thinks About Analytics First
&lt;/h1&gt;

&lt;p&gt;ClickHouse approaches data very differently.&lt;/p&gt;

&lt;p&gt;Instead of optimizing for frequent row updates, it optimizes for analytical queries across massive datasets.&lt;/p&gt;

&lt;p&gt;Things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;metrics&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;logs&lt;/li&gt;
&lt;li&gt;event streams&lt;/li&gt;
&lt;li&gt;analytical dashboards&lt;/li&gt;
&lt;li&gt;time-series workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_time_ms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a completely different style of workload.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;modifying small numbers of rows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ClickHouse is optimized for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scanning huge amounts of data efficiently&lt;/li&gt;
&lt;li&gt;aggregating billions of records&lt;/li&gt;
&lt;li&gt;compressing analytical datasets&lt;/li&gt;
&lt;li&gt;fast columnar reads&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  PostgreSQL Stores the Business. ClickHouse Explains It.
&lt;/h1&gt;

&lt;p&gt;This is honestly the simplest way I think about it now.&lt;/p&gt;

&lt;p&gt;PostgreSQL usually stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;current application state&lt;/li&gt;
&lt;li&gt;transactional business data&lt;/li&gt;
&lt;li&gt;operational records&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ClickHouse usually stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;analytical history&lt;/li&gt;
&lt;li&gt;events&lt;/li&gt;
&lt;li&gt;metrics&lt;/li&gt;
&lt;li&gt;large-scale queryable telemetry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One powers the application.&lt;/p&gt;

&lt;p&gt;The other explains what the application is doing.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why They Commonly Exist Together
&lt;/h1&gt;

&lt;p&gt;This is where things get interesting.&lt;/p&gt;

&lt;p&gt;In many modern architectures, PostgreSQL becomes the operational source of truth.&lt;/p&gt;

&lt;p&gt;Then data flows into ClickHouse for analytics.&lt;/p&gt;

&lt;p&gt;Something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application
    ↓
PostgreSQL
    ↓
CDC / Airbyte / Kafka
    ↓
ClickHouse
    ↓
Dashboards / Analytics / Observability
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern is far more common than many people realize.&lt;/p&gt;

&lt;p&gt;Because each database is doing what it is best at.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Not Just Use PostgreSQL for Analytics?
&lt;/h1&gt;

&lt;p&gt;PostgreSQL &lt;em&gt;can&lt;/em&gt; do analytical queries.&lt;/p&gt;

&lt;p&gt;But analytical workloads behave very differently from transactional workloads.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scanning billions of rows&lt;/li&gt;
&lt;li&gt;large aggregations&lt;/li&gt;
&lt;li&gt;observability queries&lt;/li&gt;
&lt;li&gt;real-time analytics&lt;/li&gt;
&lt;li&gt;historical trend analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These workloads stress databases differently.&lt;/p&gt;

&lt;p&gt;ClickHouse is optimized around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;columnar storage&lt;/li&gt;
&lt;li&gt;vectorized execution&lt;/li&gt;
&lt;li&gt;aggressive compression&lt;/li&gt;
&lt;li&gt;analytical query execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why queries over huge datasets often feel dramatically faster in ClickHouse.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Not Just Use ClickHouse for Everything?
&lt;/h1&gt;

&lt;p&gt;This is another common misunderstanding.&lt;/p&gt;

&lt;p&gt;ClickHouse is incredible for analytics.&lt;/p&gt;

&lt;p&gt;But transactional systems require things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;frequent updates&lt;/li&gt;
&lt;li&gt;transactional consistency&lt;/li&gt;
&lt;li&gt;row-level modifications&lt;/li&gt;
&lt;li&gt;operational application state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not the primary design goal of ClickHouse.&lt;/p&gt;

&lt;p&gt;You generally do not want your:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user authentication system&lt;/li&gt;
&lt;li&gt;banking transactions&lt;/li&gt;
&lt;li&gt;inventory updates&lt;/li&gt;
&lt;li&gt;operational business logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;to depend entirely on analytical database behavior.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Interesting Part Is the Separation of Responsibilities
&lt;/h1&gt;

&lt;p&gt;What I personally find interesting is how these systems complement each other instead of replacing each other.&lt;/p&gt;

&lt;p&gt;PostgreSQL handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;operational correctness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ClickHouse handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;analytical scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That separation creates much cleaner architectures.&lt;/p&gt;

&lt;p&gt;Instead of forcing one database to solve every problem, each system handles the workload it was designed for.&lt;/p&gt;




&lt;h1&gt;
  
  
  CDC Is What Connects Them
&lt;/h1&gt;

&lt;p&gt;One thing that makes this architecture powerful is CDC (Change Data Capture).&lt;/p&gt;

&lt;p&gt;Instead of manually exporting data repeatedly, systems can stream changes from PostgreSQL into ClickHouse continuously.&lt;/p&gt;

&lt;p&gt;Tools like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Debezium&lt;/li&gt;
&lt;li&gt;Airbyte&lt;/li&gt;
&lt;li&gt;Kafka pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;make this pattern extremely practical now.&lt;/p&gt;

&lt;p&gt;The operational system continues running normally while analytical systems receive data almost in real time.&lt;/p&gt;




&lt;h1&gt;
  
  
  They Even Think Differently Internally
&lt;/h1&gt;

&lt;p&gt;The differences go deeper than just "transactions vs analytics".&lt;/p&gt;

&lt;p&gt;PostgreSQL thinks heavily about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rows&lt;/li&gt;
&lt;li&gt;transactional consistency&lt;/li&gt;
&lt;li&gt;updates&lt;/li&gt;
&lt;li&gt;locking&lt;/li&gt;
&lt;li&gt;relational integrity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ClickHouse thinks heavily about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;columns&lt;/li&gt;
&lt;li&gt;compression&lt;/li&gt;
&lt;li&gt;merges&lt;/li&gt;
&lt;li&gt;partitions&lt;/li&gt;
&lt;li&gt;analytical scans&lt;/li&gt;
&lt;li&gt;aggregation efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even their storage engines reflect completely different priorities.&lt;/p&gt;




&lt;h1&gt;
  
  
  This Is Why Modern Data Stacks Often Use Both
&lt;/h1&gt;

&lt;p&gt;Once you stop viewing databases as competitors and instead view them as workload-specific systems, the architecture starts making much more sense.&lt;/p&gt;

&lt;p&gt;PostgreSQL handles the operational side.&lt;/p&gt;

&lt;p&gt;ClickHouse handles the analytical side.&lt;/p&gt;

&lt;p&gt;Together, they create systems that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;process transactions reliably&lt;/li&gt;
&lt;li&gt;scale analytical workloads efficiently&lt;/li&gt;
&lt;li&gt;support observability&lt;/li&gt;
&lt;li&gt;power dashboards&lt;/li&gt;
&lt;li&gt;retain huge historical datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;without forcing a single database to do everything.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final Thought
&lt;/h1&gt;

&lt;p&gt;The more I learn about databases, the more I realize that most modern architectures are really about separation of responsibilities.&lt;/p&gt;

&lt;p&gt;PostgreSQL and ClickHouse work well together because they optimize for fundamentally different problems.&lt;/p&gt;

&lt;p&gt;One is built to preserve business state reliably.&lt;/p&gt;

&lt;p&gt;The other is built to analyze massive amounts of history efficiently.&lt;/p&gt;

&lt;p&gt;And when combined properly, they complement each other extremely well.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>postgres</category>
      <category>clickhouse</category>
      <category>systemarchitecture</category>
    </item>
    <item>
      <title>PostgreSQL Restore Failures: It Wasn’t pgBackRest, It Was My Recovery Logic</title>
      <dc:creator>Mohamed Hussain S</dc:creator>
      <pubDate>Wed, 06 May 2026 12:24:28 +0000</pubDate>
      <link>https://dev.to/mohhddhassan/postgresql-restore-failures-it-wasnt-pgbackrest-it-was-my-recovery-logic-527d</link>
      <guid>https://dev.to/mohhddhassan/postgresql-restore-failures-it-wasnt-pgbackrest-it-was-my-recovery-logic-527d</guid>
      <description>&lt;p&gt;I was building and testing a PostgreSQL backup and restore workflow using pgBackRest.&lt;/p&gt;

&lt;p&gt;The idea was simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;take backups&lt;/li&gt;
&lt;li&gt;restore them automatically&lt;/li&gt;
&lt;li&gt;validate the database&lt;/li&gt;
&lt;li&gt;make recovery predictable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead, I ended up repeatedly breaking PostgreSQL recovery itself.&lt;/p&gt;

&lt;p&gt;At one point, PostgreSQL refused to start entirely, the application depending on it failed to start, and I started seeing errors like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;invalid checkpoint record
could not locate a valid checkpoint record at 0/DEAD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Later, I also hit timeline mismatch errors like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR: [058]: target timeline 3 forked from backup timeline 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first, I thought:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;pgBackRest restores were corrupting PostgreSQL.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That assumption turned out to be completely wrong.&lt;/p&gt;

&lt;p&gt;The real problem was the way I was handling recovery.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Was Building
&lt;/h2&gt;

&lt;p&gt;I was testing a PostgreSQL backup/restore flow locally after repeated restore failures elsewhere.&lt;/p&gt;

&lt;p&gt;To isolate the issue properly, I moved PostgreSQL onto my local machine and started testing the restore logic independently through API-triggered workflows.&lt;/p&gt;

&lt;p&gt;The restore flow looked roughly like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download backup repo&lt;/li&gt;
&lt;li&gt;Stop PostgreSQL&lt;/li&gt;
&lt;li&gt;Restore backup&lt;/li&gt;
&lt;li&gt;Start PostgreSQL&lt;/li&gt;
&lt;li&gt;Validate database&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sounds straightforward.&lt;/p&gt;

&lt;p&gt;It wasn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  The First Major Failure
&lt;/h2&gt;

&lt;p&gt;After a restore attempt, PostgreSQL refused to start.&lt;/p&gt;

&lt;p&gt;The logs looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOG: database system was interrupted
LOG: invalid checkpoint record
PANIC: could not locate a valid checkpoint record at 0/DEAD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At that point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL was down&lt;/li&gt;
&lt;li&gt;the application couldn't start&lt;/li&gt;
&lt;li&gt;authentication-related functionality stopped working&lt;/li&gt;
&lt;li&gt;and repeated restore attempts made things even worse&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What confused me initially was this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The restore itself appeared to complete.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But PostgreSQL would immediately enter recovery problems afterward.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Wrong Assumption
&lt;/h2&gt;

&lt;p&gt;This was the real issue.&lt;/p&gt;

&lt;p&gt;Every time recovery failed, I kept seeing files like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;backup_label&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;recovery.signal&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;standby.signal&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I assumed they were leftover artifacts from failed restores.&lt;/p&gt;

&lt;p&gt;My restore automation started aggressively cleaning them up.&lt;/p&gt;

&lt;p&gt;Something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; recovery.signal standby.signal backup_label
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I genuinely believed this was helping PostgreSQL start cleanly.&lt;/p&gt;

&lt;p&gt;In reality:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I was deleting the exact recovery metadata PostgreSQL needed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That misunderstanding caused almost every major issue afterward.&lt;/p&gt;




&lt;h2&gt;
  
  
  What PostgreSQL Was Actually Trying To Do
&lt;/h2&gt;

&lt;p&gt;This was the turning point.&lt;/p&gt;

&lt;p&gt;pgBackRest wasn't randomly writing junk files into the data directory.&lt;/p&gt;

&lt;p&gt;Those files exist for a reason.&lt;/p&gt;

&lt;p&gt;During restore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;backup_label&lt;/code&gt; tells PostgreSQL where recovery should begin&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;recovery.signal&lt;/code&gt; tells PostgreSQL to enter recovery mode&lt;/li&gt;
&lt;li&gt;WAL replay reconstructs a consistent database state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PostgreSQL was actually trying to perform a valid recovery process.&lt;/p&gt;

&lt;p&gt;My automation kept interrupting or invalidating it.&lt;/p&gt;

&lt;p&gt;Once I understood that, the entire problem started making sense.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Recovery Loop Problem
&lt;/h2&gt;

&lt;p&gt;Because my cleanup logic removed recovery metadata prematurely, PostgreSQL ended up in inconsistent states repeatedly.&lt;/p&gt;

&lt;p&gt;Sometimes it would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;enter recovery mode&lt;/li&gt;
&lt;li&gt;fail WAL replay&lt;/li&gt;
&lt;li&gt;lose checkpoint continuity&lt;/li&gt;
&lt;li&gt;refuse startup entirely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other times it would partially start, but remain stuck in recovery mode.&lt;/p&gt;

&lt;p&gt;That led to additional logic being added just to stabilize startup behavior.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pg_is_in_recovery&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and when required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pg_promote&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal wasn't to "force PostgreSQL to work".&lt;/p&gt;

&lt;p&gt;The goal was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;let PostgreSQL finish recovery properly, then promote only when necessary.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That distinction mattered a lot.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Timeline Mismatch Error
&lt;/h2&gt;

&lt;p&gt;At one stage, I also hit this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR: [058]: target timeline 3 forked from backup timeline 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one was especially confusing at first.&lt;/p&gt;

&lt;p&gt;The issue was not just corrupted startup state anymore.&lt;/p&gt;

&lt;p&gt;Now PostgreSQL was rejecting WAL history itself.&lt;/p&gt;

&lt;p&gt;This happened because earlier restore attempts had already created inconsistent recovery timelines.&lt;/p&gt;

&lt;p&gt;I had essentially created multiple broken recovery histories while repeatedly testing and modifying the restore process.&lt;/p&gt;

&lt;p&gt;That was another important lesson:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;PostgreSQL backups are not just data files.&lt;br&gt;
They are tightly connected to WAL history and recovery timelines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At this point, I realized I was no longer debugging a simple restore failure. I was debugging recovery history itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem In My Restore Flow
&lt;/h2&gt;

&lt;p&gt;Initially, my restore logic tried to "fix" PostgreSQL after restore.&lt;/p&gt;

&lt;p&gt;That approach was fundamentally flawed.&lt;/p&gt;

&lt;p&gt;The older flow looked roughly like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Old Approach&lt;/th&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Delta restore&lt;/td&gt;
&lt;td&gt;Mixed old/new recovery state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delete &lt;code&gt;backup_label&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Broke recovery metadata&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delete &lt;code&gt;recovery.signal&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Interrupted recovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Force archive changes&lt;/td&gt;
&lt;td&gt;Caused WAL continuity issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hope PostgreSQL starts&lt;/td&gt;
&lt;td&gt;No validation or recovery awareness&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I was treating recovery artifacts like corruption.&lt;/p&gt;

&lt;p&gt;They weren't corruption.&lt;/p&gt;

&lt;p&gt;They were part of PostgreSQL recovery itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Change That Finally Fixed It
&lt;/h2&gt;

&lt;p&gt;The biggest realization was this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Stop fighting PostgreSQL recovery.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead of trying to manually "clean up" PostgreSQL after restore, I changed the restore flow completely.&lt;/p&gt;

&lt;p&gt;The corrected restore flow became:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Stop PostgreSQL cleanly&lt;/li&gt;
&lt;li&gt;Completely empty the data directory&lt;/li&gt;
&lt;li&gt;Run pgBackRest restore properly&lt;/li&gt;
&lt;li&gt;Let PostgreSQL recover normally&lt;/li&gt;
&lt;li&gt;Wait for readiness&lt;/li&gt;
&lt;li&gt;Promote only if recovery mode persists&lt;/li&gt;
&lt;li&gt;Validate using pgBackRest check&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The critical change was this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_run_pgbackrest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--type=immediate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And equally important:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_empty_directory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pg_data_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of attempting partial or delta-style recovery cleanup, the restore process now starts from a completely clean data directory.&lt;/p&gt;

&lt;p&gt;That eliminated a huge amount of inconsistent state.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why &lt;code&gt;--type=immediate&lt;/code&gt; Helped
&lt;/h2&gt;

&lt;p&gt;This turned out to be extremely important.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--type=immediate&lt;/code&gt; tells pgBackRest:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;restore to the latest immediately consistent point available.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL could perform proper WAL-based recovery&lt;/li&gt;
&lt;li&gt;recovery metadata stayed intact&lt;/li&gt;
&lt;li&gt;WAL replay remained valid&lt;/li&gt;
&lt;li&gt;timeline handling became predictable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;PostgreSQL itself was finally allowed to control recovery correctly.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Mistake That Increased the Blast Radius
&lt;/h2&gt;

&lt;p&gt;One thing I learned the hard way:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Never test restore automation against a database actively used by an application.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Even though this was a testing workflow, the PostgreSQL instance was still tied to application startup behavior.&lt;/p&gt;

&lt;p&gt;So whenever PostgreSQL failed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;application startup failed too&lt;/li&gt;
&lt;li&gt;user-related functionality broke&lt;/li&gt;
&lt;li&gt;debugging became much harder under pressure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After repeated failures, I moved the restore testing flow entirely onto my local machine and isolated PostgreSQL from the rest of the application stack.&lt;/p&gt;

&lt;p&gt;That made debugging significantly easier.&lt;/p&gt;




&lt;h2&gt;
  
  
  Another Subtle Issue: Backup Failures After Restore
&lt;/h2&gt;

&lt;p&gt;I also ran into another confusing problem after some restore attempts.&lt;/p&gt;

&lt;p&gt;In certain cases, subsequent backups started failing unexpectedly after a restore.&lt;/p&gt;

&lt;p&gt;Part of the issue came from mixing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;restore operations&lt;/li&gt;
&lt;li&gt;delta-style restore assumptions&lt;/li&gt;
&lt;li&gt;and archive/WAL state inconsistencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At one stage, I was also toggling archive-related behavior incorrectly during recovery experiments, which further complicated WAL continuity.&lt;/p&gt;

&lt;p&gt;This reinforced another important realization:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;PostgreSQL backups are tightly coupled with WAL history and recovery timelines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Even when the database appears to start correctly, inconsistent recovery state can break future backup behavior in subtle ways.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned From This
&lt;/h2&gt;

&lt;p&gt;This experience completely changed how I think about PostgreSQL recovery.&lt;/p&gt;

&lt;p&gt;Some major lessons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;backup_label&lt;/code&gt; and &lt;code&gt;recovery.signal&lt;/code&gt; are not garbage files&lt;/li&gt;
&lt;li&gt;PostgreSQL recovery is heavily WAL-dependent&lt;/li&gt;
&lt;li&gt;Timelines matter more than most people realize&lt;/li&gt;
&lt;li&gt;Partial cleanup creates inconsistent recovery states&lt;/li&gt;
&lt;li&gt;A clean restore is often safer than trying to "repair" recovery manually&lt;/li&gt;
&lt;li&gt;pgBackRest already knows how to orchestrate PostgreSQL recovery properly&lt;/li&gt;
&lt;li&gt;Restore validation matters as much as backup creation&lt;/li&gt;
&lt;li&gt;Backup testing should happen in isolated environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;PostgreSQL recovery is not something you should "fight".&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once I stopped trying to override recovery behavior manually and instead allowed PostgreSQL + pgBackRest to handle recovery the way they were designed to, the restore flow finally became stable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Final Restore Flow That Actually Worked
&lt;/h2&gt;

&lt;p&gt;After multiple failed recovery attempts, timeline mismatches, and broken startup states, I stopped trying to manually "fix" PostgreSQL recovery and instead simplified the restore process completely.&lt;/p&gt;

&lt;p&gt;The final stable flow looked roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# simplified restore flow
&lt;/span&gt;
&lt;span class="nf"&gt;stop_postgres&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;empty_data_directory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;pgbackrest_restore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--type=immediate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;start_postgres&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;wait_for_connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;postgres_is_in_recovery&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;promote_postgres&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;pgbackrest_check&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part here is not the code itself.&lt;/p&gt;

&lt;p&gt;It's the recovery philosophy behind it.&lt;/p&gt;

&lt;p&gt;The earlier versions of my restore logic tried to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;partially clean recovery state&lt;/li&gt;
&lt;li&gt;remove recovery metadata&lt;/li&gt;
&lt;li&gt;force PostgreSQL out of recovery&lt;/li&gt;
&lt;li&gt;preserve old data directory state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That approach kept creating inconsistent recovery conditions.&lt;/p&gt;

&lt;p&gt;The corrected flow instead does three important things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;starts from a completely clean data directory&lt;/li&gt;
&lt;li&gt;lets pgBackRest manage recovery metadata properly&lt;/li&gt;
&lt;li&gt;allows PostgreSQL to perform WAL recovery the way it was designed to&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The biggest change was no longer treating files like &lt;code&gt;backup_label&lt;/code&gt; or &lt;code&gt;recovery.signal&lt;/code&gt; as corruption artifacts.&lt;/p&gt;

&lt;p&gt;They were part of the recovery process itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;At the beginning, I thought PostgreSQL restores were failing because the database was corrupted.&lt;/p&gt;

&lt;p&gt;In reality, the corruption was coming from my own recovery assumptions.&lt;/p&gt;

&lt;p&gt;The system wasn't broken.&lt;/p&gt;

&lt;p&gt;My mental model of PostgreSQL recovery was.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>arrayJoin in ClickHouse: Why Your Rows Are Duplicating (and How to Control It)</title>
      <dc:creator>Mohamed Hussain S</dc:creator>
      <pubDate>Tue, 28 Apr 2026 10:20:11 +0000</pubDate>
      <link>https://dev.to/mohhddhassan/arrayjoin-in-clickhouse-why-your-rows-are-duplicating-and-how-to-control-it-5862</link>
      <guid>https://dev.to/mohhddhassan/arrayjoin-in-clickhouse-why-your-rows-are-duplicating-and-how-to-control-it-5862</guid>
      <description>&lt;p&gt;When working with arrays in ClickHouse, &lt;code&gt;arrayJoin&lt;/code&gt; feels straightforward.&lt;/p&gt;

&lt;p&gt;Until your query suddenly returns far more rows than expected.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Use Case
&lt;/h2&gt;

&lt;p&gt;Let’s say you have a table like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actions&lt;/span&gt; &lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example row:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;user_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;click'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;scroll'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;purchase'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you want each action as a separate row.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Tool: &lt;code&gt;arrayJoin&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arrayJoin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1   click
1   scroll
1   purchase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So far, everything looks correct.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Things Go Wrong
&lt;/h2&gt;

&lt;p&gt;Now let’s say you write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;arrayJoin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;arrayJoin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;action2&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You might expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 rows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But you actually get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;9 rows&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why This Happens
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;arrayJoin&lt;/code&gt; doesn’t just flatten arrays.&lt;/p&gt;

&lt;p&gt;It expands rows.&lt;/p&gt;

&lt;p&gt;Each element in the array creates a new row.&lt;/p&gt;

&lt;p&gt;So when you use it multiple times:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First &lt;code&gt;arrayJoin&lt;/code&gt; → expands rows&lt;/li&gt;
&lt;li&gt;Second &lt;code&gt;arrayJoin&lt;/code&gt; → expands again&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;p&gt;3 elements → 3 × 3 = 9 rows&lt;/p&gt;

&lt;p&gt;This is effectively a cartesian multiplication of rows.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hidden Impact
&lt;/h2&gt;

&lt;p&gt;This becomes a real problem when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Arrays are large&lt;/li&gt;
&lt;li&gt;Multiple &lt;code&gt;arrayJoin&lt;/code&gt;s are used&lt;/li&gt;
&lt;li&gt;You don’t expect row multiplication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Incorrect output&lt;/li&gt;
&lt;li&gt;Sudden increase in row count&lt;/li&gt;
&lt;li&gt;Slower queries&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Better Approach
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Use a single &lt;code&gt;arrayJoin&lt;/code&gt; when possible
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;arrayJoin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  2. Use &lt;code&gt;ARRAY JOIN&lt;/code&gt; syntax (cleaner and explicit)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="n"&gt;ARRAY&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;actions&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  3. Use &lt;code&gt;arrayZip&lt;/code&gt; to avoid unintended multiplication
&lt;/h3&gt;

&lt;p&gt;If you’re working with multiple arrays:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;arrayJoin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arrayZip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;zipped&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures elements are paired instead of multiplied.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;arrayJoin&lt;/code&gt; is powerful-but easy to misuse.&lt;/p&gt;

&lt;p&gt;If used without understanding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Row count can explode&lt;/li&gt;
&lt;li&gt;Queries become expensive&lt;/li&gt;
&lt;li&gt;Results can be misleading&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Event tracking pipelines&lt;/li&gt;
&lt;li&gt;Flattening nested JSON&lt;/li&gt;
&lt;li&gt;Working with semi-structured logs&lt;/li&gt;
&lt;li&gt;Exploding arrays into rows for analysis&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  One Important Gotcha
&lt;/h2&gt;

&lt;p&gt;Every &lt;code&gt;arrayJoin&lt;/code&gt; multiplies rows.&lt;/p&gt;

&lt;p&gt;If your result size looks unexpectedly large, this is one of the first things to check.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;arrayJoin&lt;/code&gt; is one of the most useful tools in ClickHouse.&lt;/p&gt;

&lt;p&gt;But its behavior is not always intuitive.&lt;/p&gt;

&lt;p&gt;In many cases, the issue is not the data itself-but how the query expands it.&lt;/p&gt;

&lt;p&gt;Understanding this early can save a lot of debugging time.&lt;/p&gt;

</description>
      <category>clickhouse</category>
      <category>sql</category>
      <category>database</category>
      <category>performance</category>
    </item>
    <item>
      <title>greatCircleDistance in ClickHouse: Avoiding Full Table Scans</title>
      <dc:creator>Mohamed Hussain S</dc:creator>
      <pubDate>Mon, 20 Apr 2026 16:18:23 +0000</pubDate>
      <link>https://dev.to/mohhddhassan/greatcircledistance-in-clickhouse-avoiding-full-table-scans-375p</link>
      <guid>https://dev.to/mohhddhassan/greatcircledistance-in-clickhouse-avoiding-full-table-scans-375p</guid>
      <description>&lt;p&gt;When working with location data, one problem shows up almost immediately:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do I calculate the distance between two coordinates stored in my database?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At first, it seems like something you’d have to handle outside the database.&lt;/p&gt;

&lt;p&gt;But if you're using ClickHouse, there’s a built-in function for this.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Right Tool: &lt;code&gt;greatCircleDistance&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;greatCircleDistance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lon1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lat2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lon2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It calculates the shortest distance between two points on Earth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;greatCircleDistance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0827&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2707&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;9716&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;77&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5946&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance_meters&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you the distance between Chennai and Bangalore - in &lt;strong&gt;meters&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Looks Simple… But There’s a Catch
&lt;/h2&gt;

&lt;p&gt;Now let’s say you write a query like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;locations&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;greatCircleDistance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lon&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0827&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2707&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first glance, this looks perfectly fine.&lt;/p&gt;

&lt;p&gt;But this can quietly turn into a full table scan - especially on large datasets.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Happens
&lt;/h2&gt;

&lt;p&gt;In ClickHouse, indexes don’t work like traditional B-tree indexes.&lt;/p&gt;

&lt;p&gt;They are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sparse&lt;/li&gt;
&lt;li&gt;Designed for &lt;strong&gt;range pruning&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They work well for queries like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But not for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;greatCircleDistance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lon&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The function is applied on the columns, so ClickHouse cannot use the index to skip data efficiently.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Better Approach (What You Should Actually Do)
&lt;/h2&gt;

&lt;p&gt;Instead of directly applying the function, &lt;strong&gt;reduce the dataset first&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bounding Box Filter
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;locations&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0827&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0827&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;lon&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2707&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2707&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;greatCircleDistance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lon&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0827&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2707&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;(The bounding box is an approximation to reduce the search space before exact filtering.)&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;lat BETWEEN&lt;/code&gt; → uses index&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lon BETWEEN&lt;/code&gt; → reduces rows further&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;greatCircleDistance&lt;/code&gt; → applied only on filtered data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of scanning the entire table:&lt;br&gt;
 =&amp;gt; You narrow it down first, then compute accurately&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;This pattern is useful in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delivery radius filtering&lt;/li&gt;
&lt;li&gt;Finding nearby users&lt;/li&gt;
&lt;li&gt;Geo-based analytics&lt;/li&gt;
&lt;li&gt;Ride-sharing systems&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  One Important Gotcha
&lt;/h2&gt;

&lt;p&gt;Make sure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coordinates are in &lt;strong&gt;degrees (not radians)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Order is always &lt;code&gt;(lat, lon)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Swapping them will give incorrect results.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;greatCircleDistance&lt;/code&gt; is powerful - but if used blindly, it can hurt performance.&lt;/p&gt;

&lt;p&gt;In ClickHouse, performance often depends more on how you query than what you query.&lt;/p&gt;

&lt;p&gt;Sometimes, the right approach isn’t just using a function - but knowing &lt;strong&gt;when and how to use it efficiently&lt;/strong&gt;.&lt;/p&gt;




</description>
      <category>clickhouse</category>
      <category>sql</category>
      <category>dataengineering</category>
      <category>performance</category>
    </item>
    <item>
      <title>Why My S3 Backup Setup Broke: Buckets, “Folders”, and Scheduling Misconceptions</title>
      <dc:creator>Mohamed Hussain S</dc:creator>
      <pubDate>Thu, 16 Apr 2026 07:04:20 +0000</pubDate>
      <link>https://dev.to/mohhddhassan/why-my-s3-backup-setup-broke-buckets-folders-and-scheduling-misconceptions-315e</link>
      <guid>https://dev.to/mohhddhassan/why-my-s3-backup-setup-broke-buckets-folders-and-scheduling-misconceptions-315e</guid>
      <description>&lt;p&gt;&lt;em&gt;Another lesson in building reliable systems - not just configuring them.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I thought I had everything set up correctly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backups configured&lt;/li&gt;
&lt;li&gt;S3-compatible storage connected&lt;/li&gt;
&lt;li&gt;Backup triggered via cron jobs during testing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And yet nothing showed up where I expected.&lt;/p&gt;

&lt;p&gt;What looked like a simple configuration issue turned out to be a wrong mental model of how S3 actually works.&lt;/p&gt;

&lt;p&gt;This post breaks down what went wrong and what fixed it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I was working with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An S3-compatible object storage (not AWS directly)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A system that allows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configuring a bucket&lt;/li&gt;
&lt;li&gt;Setting a backup path&lt;/li&gt;
&lt;li&gt;Defining backup frequency&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Everything seemed straightforward.&lt;/p&gt;

&lt;p&gt;But the problem started with one assumption:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Buckets can behave like folders.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The First Mistake: Treating Buckets Like Folders
&lt;/h2&gt;

&lt;p&gt;In a traditional file system, you think like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;backups/
  app1/
    db.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So it felt natural to assume:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a “folder” in object storage&lt;/li&gt;
&lt;li&gt;Then create buckets inside it for different use cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8owl9w7awnhyhx94611.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8owl9w7awnhyhx94611.png" alt=" " width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my case, I had something like a folder already created in the object storage UI, and I assumed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;That is my base, and I can create buckets under it&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I tried:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connecting to that “folder” as a bucket&lt;/li&gt;
&lt;li&gt;Then creating another bucket inside it (for vector DB backups)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This kept failing.&lt;/p&gt;

&lt;p&gt;At first, I thought:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maybe it is a permission issue&lt;/li&gt;
&lt;li&gt;Maybe my user does not have enough access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But that was not the real problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Was Actually Going Wrong
&lt;/h2&gt;

&lt;p&gt;I was effectively trying to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Treat a bucket like a parent directory&lt;/li&gt;
&lt;li&gt;And create another bucket inside it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not how S3 works.&lt;/p&gt;

&lt;p&gt;In S3:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Buckets are top-level containers&lt;/li&gt;
&lt;li&gt;You cannot nest buckets inside other buckets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So when I tried to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connect to an existing bucket&lt;/li&gt;
&lt;li&gt;And then create another bucket under it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It failed because the concept itself is invalid.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Correct Mental Model
&lt;/h2&gt;

&lt;p&gt;This is how S3 actually works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;bucket&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backups&lt;/span&gt;
&lt;span class="na"&gt;object key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app1/2026-04-15/db.sql&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are only two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bucket (top-level)&lt;/li&gt;
&lt;li&gt;Object key (full path as a string)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is no real folder hierarchy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Organizing Data the Right Way
&lt;/h2&gt;

&lt;p&gt;The fix was not about creating folders.&lt;/p&gt;

&lt;p&gt;It was about changing how I name objects.&lt;/p&gt;

&lt;p&gt;Instead of trying to structure things at the bucket level, I moved that structure into the object key.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;object_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qdrant/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;snapshot_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives a structure like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;bucket&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backups&lt;/span&gt;

&lt;span class="s"&gt;qdrant/&lt;/span&gt;
  &lt;span class="s"&gt;collection_1/&lt;/span&gt;
    &lt;span class="s"&gt;snapshot_001&lt;/span&gt;
  &lt;span class="s"&gt;collection_2/&lt;/span&gt;
    &lt;span class="s"&gt;snapshot_002&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even though S3 is flat internally, most UIs render this as a folder-like structure.&lt;/p&gt;

&lt;p&gt;This is the correct way to organize data.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Second Mistake: Mixing Bucket and Path
&lt;/h2&gt;

&lt;p&gt;Another issue was passing paths as part of the bucket name.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;backups/qdrant&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is invalid.&lt;/p&gt;

&lt;p&gt;Correct approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bucket = backups&lt;/li&gt;
&lt;li&gt;object key = qdrant/collection_name/snapshot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;S3 APIs expect a valid bucket name, not a path.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Finally Clicked
&lt;/h2&gt;

&lt;p&gt;The breakthrough was realizing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I was not dealing with folders at all&lt;/li&gt;
&lt;li&gt;I was dealing with string prefixes inside object keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once I stopped trying to create hierarchy at the bucket level and moved everything into object naming, the entire setup started working as expected.&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting It All Together
&lt;/h2&gt;

&lt;p&gt;Correct configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bucket:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  backups
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Object naming:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;  &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qdrant/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;snapshot_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This alone was enough to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Organize backups cleanly&lt;/li&gt;
&lt;li&gt;Avoid bucket-related errors&lt;/li&gt;
&lt;li&gt;Make the storage layout intuitive in the UI&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Buckets are not folders&lt;/li&gt;
&lt;li&gt;You cannot create a bucket inside another bucket&lt;/li&gt;
&lt;li&gt;S3 is a flat object store&lt;/li&gt;
&lt;li&gt;Folder-like structures come from object key prefixes&lt;/li&gt;
&lt;li&gt;Always keep bucket and path separate&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;The issue was not with permissions or configuration.&lt;/p&gt;

&lt;p&gt;It was a mismatch between how I expected storage to behave and how it actually works.&lt;/p&gt;

&lt;p&gt;Once the mental model changed, the implementation became simple.&lt;/p&gt;

&lt;p&gt;If something feels unnecessarily complicated in S3, it is often a sign that the model being used is incorrect.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>s3</category>
      <category>cloudstorage</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
