<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Kjerrumgaard</title>
    <description>The latest articles on DEV Community by David Kjerrumgaard (@david_kjerrumgaard_d31d7e).</description>
    <link>https://dev.to/david_kjerrumgaard_d31d7e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3493436%2Fc0c6e849-cc96-42e8-80b3-87bf0633ca70.png</url>
      <title>DEV Community: David Kjerrumgaard</title>
      <link>https://dev.to/david_kjerrumgaard_d31d7e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/david_kjerrumgaard_d31d7e"/>
    <language>en</language>
    <item>
      <title>Is Software Dead? It Depends on What You’re Building</title>
      <dc:creator>David Kjerrumgaard</dc:creator>
      <pubDate>Tue, 03 Feb 2026 20:36:29 +0000</pubDate>
      <link>https://dev.to/david_kjerrumgaard_d31d7e/is-software-dead-it-depends-on-what-youre-building-4n7d</link>
      <guid>https://dev.to/david_kjerrumgaard_d31d7e/is-software-dead-it-depends-on-what-youre-building-4n7d</guid>
      <description>&lt;p&gt;I've spent over a decade building, selling, and scaling SaaS and infrastructure products — from early-stage startups to enterprise platforms. I've watched this industry survive the "cloud is a fad" era, the ZIRP hangover, and at least two rounds of "software is dead" narratives. It's never been dead. But it has always been evolving, and the companies that refuse to see the shifts clearly are the ones that don't make it. What's happening right now is real, it's significant, and it deserves a more honest conversation than either the doomsayers or the cheerleaders are offering. No, the sky is not falling — but if you're building or investing in software and you're not paying close attention, you're going to get caught off guard.&lt;/p&gt;

&lt;p&gt;If you don't follow the private credit or stock markets closely, you might not have noticed that software stocks are getting crushed. Hedge funds are dumping SaaS positions at a pace we haven't seen since 2008. Private credit firms with $100 billion in software exposure are watching their balance sheets deteriorate in real time. Traders on Wall Street are calling it a "&lt;a href="https://cloudedjudgement.substack.com/p/clouded-judgement-13026-software" rel="noopener noreferrer"&gt;SaaSpocalypse&lt;/a&gt;."&lt;/p&gt;

&lt;p&gt;Meanwhile, on the other side of the hype cycle, AI evangelists promise that every industry will be transformed within 18 months and that trillion-dollar markets are being created overnight.&lt;/p&gt;

&lt;p&gt;Both camps sound confident. I think both are partially right and meaningfully wrong.&lt;/p&gt;

&lt;p&gt;As someone who has spent years building in the data infrastructure space, I've been watching these narratives collide — and honestly, the lack of nuance in either direction is doing real damage to how companies are being evaluated, funded, and built.&lt;/p&gt;

&lt;p&gt;Here's my honest take.&lt;/p&gt;

&lt;h2&gt;
  
  
  Confidence in the SaaS model has shattered
&lt;/h2&gt;

&lt;p&gt;Let's start with the numbers. As Jamin Ball recently noted in Clouded Judgement, the median next-twelve-months revenue multiple for cloud software has dropped to 4.1x — the lowest in a decade. The median free cash flow multiple sits at 18.9x, roughly 30% below the previous 10-year low. These aren't normal fluctuations. This is a structural repricing.&lt;/p&gt;

&lt;p&gt;The reason goes deeper than a bad earnings season. SaaS businesses have long been valued as predictable cash flow machines — spend aggressively early, flip to profitability, then compound. The math behind those valuations rests on two foundational assumptions: that retention rates remain high and stable, and that the business has meaningful terminal value. AI is now calling both assumptions into question simultaneously.&lt;/p&gt;

&lt;p&gt;If customers leave legacy SaaS vendors for AI-native alternatives, retention craters and the cash flow model breaks. If entire software categories get commoditized, the terminal value for some of these companies may genuinely be zero. Even if you disagree with the most bearish case, the &lt;em&gt;probability&lt;/em&gt; of those outcomes is higher today than it was a year ago — and that alone justifies lower multiples.&lt;/p&gt;

&lt;h2&gt;
  
  
  The disruption is real, but it's not uniform
&lt;/h2&gt;

&lt;p&gt;The core anxiety driving the selloff is straightforward: AI can now do things that used to require purpose-built software. Legal review tools, customer service platforms, content management systems, basic analytics — the list of categories where AI is a credible alternative grows every week. That's not hype. That's happening.&lt;/p&gt;

&lt;p&gt;But the leap from "AI can replace some software categories" to "sell everything with a SaaS business model" is exactly the kind of overcorrection markets are prone to. "Software" is not a single thing. Treating all software companies as equally exposed to AI disruption is like saying every business that uses electricity is equally vulnerable to a grid failure. The exposure varies enormously depending on where you sit in the stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three very different realities under one label
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Software that AI can replace.&lt;/strong&gt; These are application-layer products whose core value proposition is a workflow that AI can now perform directly. Document review, templated content generation, basic data entry automation, simple customer routing. If your product is essentially a codified process wrapped in a UI, and that process can now be handled by a foundation model with a good prompt, the threat is real and immediate. This isn't a valuation problem — it's an existential one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Software that needs to evolve.&lt;/strong&gt; This is the largest and most interesting group. Most horizontal SaaS platforms aren't going to disappear overnight, but they face intensifying pricing pressure and feature commoditization. As Ball points out, the deeper issue isn't that someone will "vibe code" a replacement for Salesforce — it's that the marginal cost of creating software has collapsed, which will flood every category with competition and commoditize markets faster than incumbents can respond.&lt;/p&gt;

&lt;p&gt;The stock market is already sorting this group in real time. HubSpot, a strong company by any traditional SaaS metric, saw its stock drop roughly 50% in 2025 as investors questioned whether SMB CRM and marketing automation can defend its pricing against AI-native alternatives. Adobe fell around 35% despite genuinely impressive AI capabilities in Firefly — the market's concern isn't that Adobe isn't innovating, it's that standalone AI tools can now deliver "good enough" creative output for the majority of use cases at a fraction of the cost. Atlassian and Monday.com saw similar declines as investors recalibrated what project management and collaboration software is worth in a world where AI agents can coordinate work autonomously.&lt;/p&gt;

&lt;p&gt;These are not failing companies. They are strong businesses facing a fundamental question: can they integrate AI deeply enough to become &lt;em&gt;more&lt;/em&gt; valuable, not less? The market is right to ask hard questions here. It's wrong to assume the answers are universally negative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Software that AI depends on.&lt;/strong&gt; Infrastructure — the systems that move data, manage compute, handle security, orchestrate distributed workloads — doesn't get replaced by AI. It gets consumed by it. Every AI workload needs to ingest data at scale, process events in real time, route outputs to downstream systems, and do all of this reliably across global environments. The rise of AI is arguably the single biggest demand driver infrastructure software has ever seen.&lt;/p&gt;

&lt;p&gt;The companies that have figured this out are thriving while the rest of the sector burns. Cloudflare's stock rose over 80% in 2025 — not because it added AI features, but because its edge computing infrastructure is where AI inference actually runs. As one analyst noted, Cloudflare isn't selling AI features; it's selling the pipes that AI runs on. Datadog saw its AI-native customer revenue grow from 4% to 11% of total revenue in a single year, with over a dozen AI-native companies each spending more than $1 million annually on its observability platform. More AI workloads means more complexity to monitor, more logs to analyze, more security threats to detect. Snowflake's growth re-accelerated to nearly 30% as enterprises recognized that data infrastructure is the foundation AI needs before it can do anything useful. CrowdStrike climbed over 50% because AI doesn't reduce cybersecurity threats — it creates entirely new attack surfaces that need defending.&lt;/p&gt;

&lt;p&gt;Even ServiceNow, which straddles the line between application and platform, generated over $600 million from its AI assistant products alone and grew subscription revenue 21% by positioning itself as an "AI Control Tower" for enterprise workflows — not competing with AI, but becoming the orchestration layer that AI agents operate within. Notably, ServiceNow's retention rates haven't taken a hit yet, which may be an early signal that well-positioned platforms can weather this storm.&lt;/p&gt;

&lt;p&gt;The pattern is clear: the companies winning aren't the ones bolting AI features onto existing products. They're the ones whose core infrastructure becomes &lt;em&gt;more essential&lt;/em&gt; as AI adoption scales.&lt;/p&gt;

&lt;p&gt;Yet many of these infrastructure companies are still being sold off alongside the ones AI is actually displacing, simply because they carry the "software" label.&lt;/p&gt;

&lt;h2&gt;
  
  
  The financial feedback loop is making things worse
&lt;/h2&gt;

&lt;p&gt;What makes this moment especially treacherous isn't just the technology thesis — it's the credit cycle layered on top of it. Business development companies have roughly $100 billion in exposure to software companies. As software valuations decline, BDC balance sheets deteriorate. As BDCs tighten credit, software companies lose access to growth capital. As growth slows, valuations fall further.&lt;/p&gt;

&lt;p&gt;This dynamic doesn't discriminate. A company with strong fundamentals and growing revenue can get caught in the same credit squeeze as one that's genuinely being disrupted, simply because both carry the "software" label. Default rates in private credit could reach 13% if AI disruption plays out aggressively, according to UBS — a projection that makes lenders cautious across the board, not just with the most exposed borrowers.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the bears are getting right
&lt;/h2&gt;

&lt;p&gt;The fundamental insight — that the marginal cost of creating software has collapsed — is correct and profound. This is not a temporary dislocation. When anyone can build a functional application in hours instead of months, the structural economics of the industry change permanently.&lt;/p&gt;

&lt;p&gt;The value shifts away from the application itself and toward the underlying data, the integrations, the operational complexity, and the reliability of the systems that power it. Software that survives long-term will be software that's hard to replicate — not because of its UI, but because of the engineering depth and infrastructure moats it embodies.&lt;/p&gt;

&lt;p&gt;That's a healthy and overdue reckoning for parts of the industry, even if the process of getting there is painful.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the bears are getting wrong
&lt;/h2&gt;

&lt;p&gt;The timeline is being compressed unrealistically. Yes, AI can generate a basic application from a prompt. No, that does not mean enterprise software disappears next quarter. Adoption curves, procurement cycles, compliance requirements, integration complexity, and organizational inertia all mean that even genuinely disrupted categories will take years to fully turn over. Markets are pricing in a revolution that will actually unfold as an evolution.&lt;/p&gt;

&lt;p&gt;The all-or-nothing framing is also creating mispricing in both directions. Some companies will see their growth &lt;em&gt;accelerate&lt;/em&gt; because of AI adoption. Others will see specific product lines threatened while their core platform becomes more essential. Painting every software company with the same brush guarantees you'll be wrong about most of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the AI optimists are getting wrong
&lt;/h2&gt;

&lt;p&gt;On the flip side, the unbounded enthusiasm deserves its own reality check. Not every AI demo translates to an enterprise deployment. Not every proof of concept survives contact with production data, regulatory requirements, and organizational change management. The gap between "this is technically possible" and "this is deployed at scale in a Fortune 500" is still measured in years for most use cases.&lt;/p&gt;

&lt;p&gt;We've seen this pattern before. Cloud computing was genuinely transformative, but the timeline from early hype to mainstream enterprise adoption was roughly a decade. Mobile, same story. AI will be faster because the infrastructure is better, but "faster than previous platform shifts" is not the same as "instantaneous."&lt;/p&gt;

&lt;p&gt;Companies making long-term bets based on the assumption that every AI promise will be fulfilled on schedule are just as exposed as the ones ignoring the threat entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cooler heads will prevail
&lt;/h2&gt;

&lt;p&gt;I expect the next 12-18 months to be painful but ultimately clarifying. Ball makes a smart observation: what will change the market's mind is several quarters of stable retention rates from established software companies in the face of AI challengers. ServiceNow's early Q4 results suggest that's possible for well-positioned platforms. If more companies demonstrate that retention is holding, the panic-driven repricing will start to correct.&lt;/p&gt;

&lt;p&gt;The market will develop more precision in how it evaluates software companies, distinguishing between those that are genuinely in AI's path and those that are being caught up in category-level panic. The early data is already here: the 2025 stock performance gap between infrastructure winners and application-layer losers was stark, and that divergence will only sharpen as earnings continue to separate reality from narrative.&lt;/p&gt;

&lt;p&gt;Infrastructure companies will eventually get re-rated as the market recognizes that AI workloads don't reduce demand for data movement, real-time processing, and distributed systems — they dramatically increase it. Application-layer companies will bifurcate sharply between those that integrate AI successfully and those that don't.&lt;/p&gt;

&lt;p&gt;And the credit cycle will unwind on its own timeline, unfortunately causing collateral damage to strong companies that happen to carry the wrong label.&lt;/p&gt;

&lt;p&gt;My advice to anyone building or investing in this space: resist the urge to react to the loudest narrative, whether that's doom or unbounded optimism. Focus on fundamentals — retention, efficiency, genuine technical differentiation, and whether your product becomes more or less essential as AI adoption grows.&lt;/p&gt;

&lt;p&gt;The companies that panic-rebrand as "AI-native" overnight will look desperate in hindsight. The ones that quietly build indispensable technology will look prescient. And the investors who maintain discipline while others oscillate between euphoria and panic will be the ones who capture the real value being created right now.&lt;/p&gt;

&lt;p&gt;The sky isn't falling. But it is changing shape. The winners will be the ones who study the new landscape carefully rather than running for cover or chasing mirages.&lt;/p&gt;

</description>
      <category>saas</category>
      <category>softwaredevelopment</category>
      <category>software</category>
      <category>ai</category>
    </item>
    <item>
      <title>Latency Numbers Every Data Streaming Engineer Should Know</title>
      <dc:creator>David Kjerrumgaard</dc:creator>
      <pubDate>Sun, 14 Sep 2025 21:18:58 +0000</pubDate>
      <link>https://dev.to/david_kjerrumgaard_d31d7e/latency-numbers-every-data-streaming-engineer-should-know-h91</link>
      <guid>https://dev.to/david_kjerrumgaard_d31d7e/latency-numbers-every-data-streaming-engineer-should-know-h91</guid>
      <description>&lt;h1&gt;
  
  
  Latency Numbers Every Data Streaming Engineer Should Know
&lt;/h1&gt;

&lt;p&gt;Jeff Dean's &lt;a href="https://gist.github.com/jboner/2841832" rel="noopener noreferrer"&gt;"Latency Numbers Every Programmer Should Know"&lt;/a&gt; became essential reading because it grounded abstract performance discussions in concrete reality. For data streaming engineers, we need an equivalent framework that translates those fundamental hardware latencies into the specific challenges of real-time data pipelines.&lt;/p&gt;

&lt;p&gt;Just as Dean showed that a disk seek (10ms) costs as much as roughly 20 million L1 cache references, streaming engineers must understand that a cross-region sync replication (100ms+) costs the same as processing 10,000 in-memory events. These aren't just numbers—they're the physics that govern what's possible in your streaming architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR: Your Latency Budget Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Latency Class&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;End-to-End Target&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Key Constraints&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ultra-low&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt; 10ms&lt;/td&gt;
&lt;td&gt;HFT, real-time control, gaming&lt;/td&gt;
&lt;td&gt;Single AZ only, no disk fsync per record, specialized hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Low&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10-200ms&lt;/td&gt;
&lt;td&gt;Interactive dashboards, alerts, online ML features&lt;/td&gt;
&lt;td&gt;Streaming processing, minimal batching, same region&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency-relaxed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;200ms - minutes&lt;/td&gt;
&lt;td&gt;Near-real-time analytics, ETL, reporting&lt;/td&gt;
&lt;td&gt;Enables aggressive batching, cross-region, cost optimization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Critical Hardware &amp;amp; Network Floors
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Streaming Impact&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HDD seek/fsync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5-20ms&lt;/td&gt;
&lt;td&gt;Consumes entire ultra-low budget&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSD fsync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.05-1ms&lt;/td&gt;
&lt;td&gt;Manageable for low latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Same AZ network (RTT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.2-1ms&lt;/td&gt;
&lt;td&gt;Base cost for any distributed system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-AZ (RTT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-4ms&lt;/td&gt;
&lt;td&gt;Minimum for AZ-redundant streams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-region (RTT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;30-200ms+&lt;/td&gt;
&lt;td&gt;Makes &amp;lt;100ms E2E impossible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema registry lookup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-5ms (cached), 10-50ms (miss)&lt;/td&gt;
&lt;td&gt;Often overlooked latency source&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Streaming Platform Specifics
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Typical Latency&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Configuration Notes&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kafka publish (acks=1, same-AZ)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-5ms&lt;/td&gt;
&lt;td&gt;No replica wait&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kafka publish (acks=all, same-AZ)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3-15ms&lt;/td&gt;
&lt;td&gt;Adds replica sync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-AZ sync replication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+1-5ms&lt;/td&gt;
&lt;td&gt;Per additional AZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Producer batching (linger.ms)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+5-50ms&lt;/td&gt;
&lt;td&gt;Intentional latency for throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consumer poll interval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0-500ms+&lt;/td&gt;
&lt;td&gt;Misconfiguration can dominate E2E&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Iceberg commit visibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5s-10min&lt;/td&gt;
&lt;td&gt;Depends on commit interval&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What "Real-Time" Actually Means
&lt;/h2&gt;

&lt;p&gt;In data streaming, "real-time" has become as overloaded as "big data" once was. Let's establish clear definitions based on both technical constraints and human perception thresholds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcshux0p341mevxcswtz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcshux0p341mevxcswtz.png" alt="Figure 1" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 1: Streaming Latency Spectrum showing the logarithmic scale from nanoseconds to minutes, with technology examples and use cases for each latency category.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Ultra-Low Latency (&amp;lt; 10ms End-to-End)
&lt;/h3&gt;

&lt;p&gt;This is the realm of hard real-time systems where every microsecond counts. Applications requiring sub-10ms latency include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-frequency trading&lt;/strong&gt; (where 1ms advantage = millions in profit)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time control systems&lt;/strong&gt; (industrial automation, autonomous vehicles)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Competitive gaming&lt;/strong&gt; (where 16ms = one frame at 60fps)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low-latency market data&lt;/strong&gt; (every trader needs the same speed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Everything in one availability zone (cross-AZ RTT alone is 1-4ms)&lt;/li&gt;
&lt;li&gt;No per-record disk fsync (HDD seek = 10ms, breaking your entire budget)&lt;/li&gt;
&lt;li&gt;Kernel bypass networking (DPDK, RDMA)&lt;/li&gt;
&lt;li&gt;Custom serialization (Protocol Buffers, Avro, or binary)&lt;/li&gt;
&lt;li&gt;Memory-mapped storage or pure in-memory processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example stack:&lt;/strong&gt; Apache Pulsar with BookKeeper on NVMe, or heavily tuned Kafka with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Ultra-low latency Kafka producer config
linger.ms=0
batch.size=1024
acks=1
compression.type=none
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reality check:&lt;/strong&gt; For perspective, 100ms is the threshold where UI interactions feel instantaneous to humans. Ultra-low latency is an order of magnitude faster than human perception—you're optimizing for machines, not users.&lt;/p&gt;

&lt;h3&gt;
  
  
  Low Latency (10-200ms End-to-End)
&lt;/h3&gt;

&lt;p&gt;This covers the sweet spot for most interactive real-time applications. Users perceive anything under 200ms as "instant" response, making this the target for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live dashboards and monitoring&lt;/strong&gt; (business metrics, system health)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time alerting&lt;/strong&gt; (fraud detection, anomaly detection)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Online machine learning features&lt;/strong&gt; (recommendation engines, personalization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live chat and notifications&lt;/strong&gt; (social platforms, collaboration tools)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time analytics&lt;/strong&gt; (A/B test results, user behavior tracking)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event-at-a-time processing (not micro-batches)&lt;/li&gt;
&lt;li&gt;Cross-AZ replication acceptable (adds ~2-5ms)&lt;/li&gt;
&lt;li&gt;Moderate batching for efficiency (5-50ms linger times)&lt;/li&gt;
&lt;li&gt;SSD storage with occasional fsync&lt;/li&gt;
&lt;li&gt;Standard streaming platforms work well&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example stack:&lt;/strong&gt; Apache Kafka + Apache Flink with event-time processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Balanced Kafka configuration
linger.ms=5
batch.size=16384
acks=all
max.in.flight.requests.per.connection=5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost implications:&lt;/strong&gt; This range allows reasonable optimization without exotic hardware. A well-tuned Kafka cluster can achieve 10-50ms P50 latency with hundreds of thousands of events per second.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9we53xjiav718f9x6dur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9we53xjiav718f9x6dur.png" alt="latency-throughput-tradeoff.svg" width="600" height="500"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 2: The classic trade-off between latency and throughput in streaming systems. Lower latency typically means higher cost and lower throughput, while batch processing achieves high throughput at the cost of latency.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Latency-Relaxed (200ms - Minutes)
&lt;/h3&gt;

&lt;p&gt;When latency requirements relax beyond a few hundred milliseconds, you enter the realm of cost optimization and massive throughput. This category includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Near-real-time ETL&lt;/strong&gt; (data lake ingestion, warehouse loading)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business intelligence dashboards&lt;/strong&gt; (updating every 30 seconds to 5 minutes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch-oriented analytics&lt;/strong&gt; (hourly/daily reports with "fresh" data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data lake table formats&lt;/strong&gt; (Iceberg, Delta Lake with 1-10 minute commits)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-region data replication&lt;/strong&gt; (disaster recovery, global distribution)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggressive batching (seconds to minutes)&lt;/li&gt;
&lt;li&gt;Cross-region replication feasible&lt;/li&gt;
&lt;li&gt;Cheaper storage tiers (object storage vs. hot SSDs)&lt;/li&gt;
&lt;li&gt;Higher compression ratios&lt;/li&gt;
&lt;li&gt;Simpler error handling and retry logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Netflix's architecture keeps only hours of hot data in Kafka (expensive) and tiers the rest to Apache Iceberg on S3 (38x cheaper)&lt;sup id="fnref1"&gt;1&lt;/sup&gt;. For most analytics, 1-5 minute latency is perfectly acceptable and dramatically reduces infrastructure costs.&lt;/p&gt;
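
&lt;p&gt;&lt;strong&gt;Example producer settings:&lt;/strong&gt; as a starting point for this tier, a throughput-and-cost-oriented Kafka producer configuration might look like the sketch below; treat the exact values as illustrative, not prescriptive.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Latency-relaxed Kafka producer config (favor throughput and cost over latency)
linger.ms=50
batch.size=262144
acks=all
compression.type=lz4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
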
&lt;h2&gt;
  
  
  The Physics of Streaming Latency
&lt;/h2&gt;

&lt;p&gt;Understanding hardware and network fundamentals isn't academic—these are the unavoidable floors that constrain every streaming system.&lt;/p&gt;
&lt;h3&gt;
  
  
  Storage: The Latency Hierarchy
&lt;/h3&gt;

&lt;p&gt;Every streaming platform must persist data for durability, but storage choices have massive latency implications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Memory access:        ~100 nanoseconds
SSD random read:      ~150 microseconds (1,500x slower than memory)
NVMe fsync:          ~0.05-1 milliseconds  
SATA SSD fsync:      ~0.5-5 milliseconds
HDD seek/fsync:      ~5-20 milliseconds (200,000x slower than memory!)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world example:&lt;/strong&gt; Intel Optane NVMe can sync writes in ~43 microseconds average, while a traditional HDD takes ~18ms—that's 400x faster. For a streaming broker writing 10,000 events/second:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;With HDD:&lt;/strong&gt; Maximum ~50-100 synced writes/second/disk (disk-bound)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With NVMe:&lt;/strong&gt; Thousands of synced writes/second (CPU/network bound)&lt;/li&gt;
&lt;/ul&gt;
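
&lt;p&gt;The arithmetic behind those bounds is simple: if every record forces a synchronous flush, throughput is capped at roughly one write per fsync interval.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HDD fsync  ~10-20ms  →  1 / 0.010-0.020s  =  ~50-100 synced writes/sec per disk
NVMe fsync ~0.05ms   →  1 / 0.00005s      =  ~20,000 synced writes/sec (CPU/network becomes the limit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;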

&lt;p&gt;&lt;strong&gt;Kafka-specific insight:&lt;/strong&gt; Kafka's sequential write pattern helps with HDDs, but modern deployments use SSDs for predictable low latency. The difference between "usually fast" and "always fast" matters for P99 latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network: Distance Costs Time
&lt;/h3&gt;

&lt;p&gt;Network latency follows the speed of light in fiber (roughly 5 microseconds per kilometer), plus routing overhead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Same host (loopback):     &amp;lt; 0.1ms
Same rack/AZ:            0.1-0.5ms one-way  
Cross-AZ, same region:   0.5-2ms one-way
Cross-region (continent): 15-40ms one-way
Intercontinental:        80-200ms one-way (varies by route)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;AWS measurements:&lt;/strong&gt; Cross-AZ pings typically show 1-2ms RTT, while us-east-1 to eu-west-1 is ~80-90ms RTT.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming implications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous cross-region replication:&lt;/strong&gt; Automatically adds ≥80ms to every write&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leader election during failures:&lt;/strong&gt; Cross-AZ coordination adds several milliseconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumer rebalancing:&lt;/strong&gt; Group coordination latency scales with member distribution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnl7movj3bbn3ih50xyib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnl7movj3bbn3ih50xyib.png" alt="network-topology-map.svg" width="800" height="600"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 3: Global network latency map showing realistic RTT times between major cloud regions. These physical constraints set hard floors for any distributed streaming system.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Common Failure Scenarios
&lt;/h3&gt;

&lt;p&gt;Streaming systems must handle failures gracefully, but each failure mode has latency implications:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Failure Type&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Latency Impact&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Mitigation&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Broker failover&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+50-200ms during leader election&lt;/td&gt;
&lt;td&gt;Faster election timeouts, more brokers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GC pause&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+100-500ms to P99 latencies&lt;/td&gt;
&lt;td&gt;G1GC tuning, smaller heaps, off-heap storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network partition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+timeout duration (often 30s default)&lt;/td&gt;
&lt;td&gt;Shorter timeouts, circuit breakers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema registry miss&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+10-50ms per lookup&lt;/td&gt;
&lt;td&gt;Larger caches, schema pre-loading&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consumer rebalance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+5-30s processing halt&lt;/td&gt;
&lt;td&gt;Incremental rebalancing, sticky assignment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
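
&lt;p&gt;On the GC pause row specifically, a common starting point for JVM-based brokers is a pause-target-oriented collector with a fixed, moderately sized heap. The flags below are an illustrative sketch to adapt, not a tuned recommendation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative JVM options for a latency-sensitive broker
-XX:+UseG1GC
-XX:MaxGCPauseMillis=20
-XX:InitiatingHeapOccupancyPercent=35
-Xms6g -Xmx6g    # fixed heap; leave the rest of RAM to the OS page cache
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
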
&lt;h2&gt;
  
  
  Streaming Platform Latency Breakdown
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Publish Latency (Producer → Broker)
&lt;/h3&gt;

&lt;p&gt;This is where your event first enters the streaming platform. Key factors:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network transit:&lt;/strong&gt; Usually negligible within a data center (&amp;lt;1ms), but can dominate for remote producers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Broker processing:&lt;/strong&gt; Includes parsing, validation, and local storage. Modern brokers can handle this in microseconds for simple events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication strategy:&lt;/strong&gt; The big variable. Kafka's &lt;code&gt;acks&lt;/code&gt; setting illustrates the trade-off:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;acks=0: Fire-and-forget (~1-2ms, risk data loss)
acks=1: Wait for leader only (~2-5ms, balanced)  
acks=all: Wait for all replicas (~5-15ms same-AZ, much higher cross-region)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Producer batching:&lt;/strong&gt; Intentionally trading latency for throughput:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linger.ms=0:  Send immediately (lowest latency)
linger.ms=5:  Wait up to 5ms to batch (better throughput)
linger.ms=50: Wait up to 50ms to batch (much better throughput)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real example:&lt;/strong&gt; A well-tuned Kafka cluster with acks=all, same-AZ replication typically shows 3-8ms publish latency at P50, 10-25ms at P99.&lt;/p&gt;
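
&lt;p&gt;To see where your own cluster sits in that range, you can measure ack latency directly from the producer's send callback and feed it into a histogram. A minimal sketch follows; the broker address, topic name, and serializer choices are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PublishLatencyProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");
        props.put("linger.ms", "5");

        try (KafkaProducer&amp;lt;String, String&amp;gt; producer = new KafkaProducer&amp;lt;&amp;gt;(props)) {
            long start = System.nanoTime();
            producer.send(new ProducerRecord&amp;lt;&amp;gt;("events", "key", "value"), (metadata, exception) -&amp;gt; {
                // Time from send() to broker acknowledgment; collect into a histogram for P50/P99
                long ackMicros = (System.nanoTime() - start) / 1_000;
                System.out.println("publish latency: " + ackMicros + " µs");
            });
            producer.flush();
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;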

&lt;h3&gt;
  
  
  Consume Latency (Broker → Consumer)
&lt;/h3&gt;

&lt;p&gt;Once data is available on the broker, how quickly can consumers access it?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Push vs. Pull:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Push systems&lt;/strong&gt; (like some message queues) can deliver in sub-millisecond&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pull systems&lt;/strong&gt; (like Kafka) depend on poll frequency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Polling configuration mistakes:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Bad: Creates 0-500ms artificial delay
max.poll.interval.ms=500

# Good: Near-real-time consumption
max.poll.interval.ms=10
fetch.min.bytes=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Processing overhead:&lt;/strong&gt; In-memory transformations are typically &amp;lt;1ms per event, but external calls (database lookups, API calls) can dominate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Simple in-memory filter:     ~0.001ms per event
JSON parsing/validation:     ~0.01-0.1ms per event  
Database lookup (cached):    ~1-5ms per event
Database lookup (cache miss): ~10-50ms per event
External API call:          ~50-200ms per event
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  End-to-End Latency Monitoring
&lt;/h3&gt;

&lt;p&gt;What users actually experience: &lt;code&gt;E2E = Publish + Network + Consume + Processing&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key percentiles to track:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;P50 (median):&lt;/strong&gt; Your typical performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P95:&lt;/strong&gt; What 95% of users experience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P99:&lt;/strong&gt; Catches tail latencies from GC, network hiccups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P99.9:&lt;/strong&gt; Exposes rare but severe problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example real-world numbers:&lt;/strong&gt;&lt;br&gt;
A well-tuned, single-region Kafka pipeline typically achieves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P50: 10-30ms end-to-end&lt;/li&gt;
&lt;li&gt;P95: 25-75ms end-to-end&lt;/li&gt;
&lt;li&gt;P99: 50-200ms end-to-end (watch for GC pauses, network bursts)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cross-region reality:&lt;/strong&gt;&lt;br&gt;
With synchronous cross-region replication, add ≥80ms minimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P50: 90-120ms end-to-end&lt;/li&gt;
&lt;li&gt;P99: 150-400ms end-to-end&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiynxpdvleco6krk3c899.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiynxpdvleco6krk3c899.png" alt="pipeline-latency-breakdown.svg" width="800" height="355"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 4: Detailed breakdown of where latency accumulates in a streaming pipeline, from producer to final storage. Shows how each component contributes to total end-to-end latency.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Data Lake Integration: The Visibility Latency Challenge
&lt;/h2&gt;

&lt;p&gt;Modern streaming architectures often flow into analytical storage (Apache Iceberg, Delta Lake) for cost-effective long-term analytics. However, these systems operate on a fundamentally different latency model.&lt;/p&gt;
&lt;h3&gt;
  
  
  Commit Interval Governs Freshness
&lt;/h3&gt;

&lt;p&gt;Unlike streaming brokers that make data available immediately, table formats batch writes into atomic commits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Commit every 5 seconds:   ~2.5s average visibility latency, ~5s max
Commit every 1 minute:    ~30s average visibility latency, ~60s max  
Commit every 10 minutes:  ~5min average visibility latency, ~10min max
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why the delay?&lt;/strong&gt; Table formats like Iceberg prioritize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Atomic visibility:&lt;/strong&gt; Readers see complete batches or nothing (no partial data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient storage:&lt;/strong&gt; Larger files are cheaper and faster to read from object storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata efficiency:&lt;/strong&gt; Fewer commits = less metadata overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Real-World Commit Strategy Examples
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Commit Interval&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Trade-offs&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-time dashboard&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5-30 seconds&lt;/td&gt;
&lt;td&gt;Higher cost, more small files, near-real-time visibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hourly reporting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-5 minutes&lt;/td&gt;
&lt;td&gt;Balanced cost and freshness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Daily analytics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10-60 minutes&lt;/td&gt;
&lt;td&gt;Lowest cost, highest efficiency, delayed visibility&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Experimental data:&lt;/strong&gt; In testing with Flink → Iceberg:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10-second commits: ~10s median latency, ~20s P99&lt;/li&gt;
&lt;li&gt;1-minute commits: ~30s median latency, ~60s P99&lt;/li&gt;
&lt;li&gt;Latency closely tracks commit interval plus small processing overhead&lt;/li&gt;
&lt;/ul&gt;
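
&lt;p&gt;In that setup the commit cadence is driven by Flink's checkpointing, because the Iceberg sink commits data files when a checkpoint completes. The checkpoint interval is therefore the main visibility-latency knob; a hedged flink-conf.yaml sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# flink-conf.yaml: checkpoint interval drives Iceberg commit (visibility) latency
execution.checkpointing.interval: 60s     # ~30s average / ~60s max visibility, matching the numbers above
execution.checkpointing.min-pause: 10s    # breathing room between checkpoints
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;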

&lt;h3&gt;
  
  
  Cost Impact of Commit Frequency
&lt;/h3&gt;

&lt;p&gt;Netflix's analysis showed that keeping data in Kafka costs 38x more than Iceberg storage&lt;sup id="fnref1"&gt;1&lt;/sup&gt;. The commit interval directly affects this trade-off:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example calculation&lt;/strong&gt; (1TB/day workload):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kafka retention (24 hours):&lt;/strong&gt; ~$500/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iceberg (frequent 30s commits):&lt;/strong&gt; ~$25/month + processing costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iceberg (relaxed 10min commits):&lt;/strong&gt; ~$13/month + processing costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More frequent commits mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher compute costs (more Flink/Spark jobs)&lt;/li&gt;
&lt;li&gt;More small files (worse query performance)&lt;/li&gt;
&lt;li&gt;Higher metadata overhead&lt;/li&gt;
&lt;li&gt;But lower visibility latency&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Synchronous vs. Asynchronous: The Fundamental Trade-off
&lt;/h2&gt;

&lt;p&gt;Every distributed streaming system faces choices about when to wait for confirmation versus proceeding optimistically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replication Strategies
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Synchronous replication:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer → Broker → Wait for replicas → ACK to producer
Latency: Base + (RTT to slowest replica)
Durability: High (data on multiple nodes before ACK)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Asynchronous replication:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer → Broker → Immediate ACK → Background replication
Latency: Base + local write time only
Durability: Lower (brief window where data exists on only one node)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Same AZ sync replication:&lt;/strong&gt; +1-3ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-AZ sync replication:&lt;/strong&gt; +2-8ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-region sync replication:&lt;/strong&gt; +80-200ms (often impractical)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Processing Patterns
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Synchronous processing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event → Process Step 1 → Wait → Process Step 2 → Wait → Response
Latency: Sum of all steps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Asynchronous processing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event → Trigger Step 1 → Trigger Step 2 → Collect results → Response
Latency: Max of parallel steps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; External enrichment workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous:&lt;/strong&gt; Event → DB lookup (20ms) → API call (50ms) → Process (5ms) = 75ms total&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous:&lt;/strong&gt; Event → [DB lookup || API call] → Process (5ms) = ~55ms total (lookups run in parallel)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The async approach requires more complex code (handling out-of-order responses, partial failures) but can significantly reduce latency.&lt;/p&gt;
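
&lt;p&gt;As a concrete illustration of that parallel pattern, here is a minimal Java sketch using &lt;code&gt;CompletableFuture&lt;/code&gt;. The &lt;code&gt;lookupUser&lt;/code&gt; and &lt;code&gt;callScoringApi&lt;/code&gt; methods are stand-ins for your own blocking enrichment calls:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncEnrichment {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);

        // Start both external calls in parallel instead of one after the other
        CompletableFuture&amp;lt;String&amp;gt; dbLookup = CompletableFuture.supplyAsync(() -&amp;gt; lookupUser("user-42"), pool);
        CompletableFuture&amp;lt;Double&amp;gt; apiCall  = CompletableFuture.supplyAsync(() -&amp;gt; callScoringApi("user-42"), pool);

        // Total wait is roughly max(lookup, api) plus local processing, not their sum
        String enriched = dbLookup.thenCombine(apiCall, (profile, score) -&amp;gt; profile + " score=" + score).join();
        System.out.println(enriched);

        pool.shutdown();
    }

    // Placeholders for the ~20ms DB lookup and ~50ms API call from the example above
    static String lookupUser(String id)     { return "profile-for-" + id; }
    static Double callScoringApi(String id) { return 0.87; }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;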

&lt;h2&gt;
  
  
  Troubleshooting: When Latency Goes Wrong
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Your Latency Budget Checklist
&lt;/h3&gt;

&lt;p&gt;Before designing any streaming system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Identified true latency requirement&lt;/strong&gt; (ultra-low/low/relaxed)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Mapped data path&lt;/strong&gt; (same AZ/cross-AZ/cross-region)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Chosen durability level&lt;/strong&gt; (async/sync replication)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Configured monitoring&lt;/strong&gt; for P50/P95/P99, not just averages&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Load tested&lt;/strong&gt; at peak throughput (latency often degrades under load)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common Latency Culprits
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Symptom&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Likely Cause&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Investigation&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consistent &amp;gt;100ms in same region&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Network saturation or misconfigured routing&lt;/td&gt;
&lt;td&gt;Check network utilization, traceroute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;P99 &amp;gt;&amp;gt; P50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GC pauses or batching effects&lt;/td&gt;
&lt;td&gt;JVM GC logs, batch size analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sudden latency spikes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Broker failover or rebalancing&lt;/td&gt;
&lt;td&gt;Broker logs, consumer group stability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;High variance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Resource contention or queueing&lt;/td&gt;
&lt;td&gt;CPU/memory/disk utilization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gradual degradation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Growing consumer lag&lt;/td&gt;
&lt;td&gt;Partition count, consumer scaling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Metrics to Monitor
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Producer side:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;request-latency-avg/max&lt;/code&gt;: How long broker requests take&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;batch-size-avg&lt;/code&gt;: Batching efficiency&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;buffer-available-bytes&lt;/code&gt;: Memory pressure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Broker side:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;request-handler-idle-ratio&lt;/code&gt;: CPU saturation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;log-flush-time&lt;/code&gt;: Disk performance&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;leader-election-rate&lt;/code&gt;: Stability issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consumer side:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;lag-max&lt;/code&gt;: How far behind consumers are&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;poll-time-avg&lt;/code&gt;: Processing efficiency&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;commit-latency-avg&lt;/code&gt;: Offset management overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;End-to-end:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application-level latency tracking with correlation IDs&lt;/li&gt;
&lt;li&gt;P50/P95/P99 latency distributions over time&lt;/li&gt;
&lt;li&gt;Latency broken down by pipeline stage&lt;/li&gt;
&lt;/ul&gt;
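
&lt;p&gt;One simple way to get those end-to-end numbers is to compare the consumer's wall-clock time against each record's timestamp and record the difference in a histogram. The sketch below assumes the producer (or your correlation-ID scheme) sets the record timestamp at event creation, and uses HdrHistogram for the percentiles; the broker address and topic are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.HdrHistogram.Histogram;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EndToEndLatencyTracker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder address
        props.put("group.id", "latency-tracker");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Histogram histogram = new Histogram(3);           // values tracked to 3 significant digits
        try (KafkaConsumer&amp;lt;String, String&amp;gt; consumer = new KafkaConsumer&amp;lt;&amp;gt;(props)) {
            consumer.subscribe(List.of("events"));        // placeholder topic
            while (true) {
                ConsumerRecords&amp;lt;String, String&amp;gt; records = consumer.poll(Duration.ofMillis(10));
                for (ConsumerRecord&amp;lt;String, String&amp;gt; record : records) {
                    // E2E latency = consume time minus the record's producer/event timestamp
                    long e2eMs = Math.max(0L, System.currentTimeMillis() - record.timestamp());
                    histogram.recordValue(e2eMs);
                }
                // Report each loop for simplicity; in practice, report on a timer
                System.out.printf("P50=%dms P95=%dms P99=%dms%n",
                        histogram.getValueAtPercentile(50.0),
                        histogram.getValueAtPercentile(95.0),
                        histogram.getValueAtPercentile(99.0));
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;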

&lt;h2&gt;
  
  
  Technology-Specific Configurations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Apache Kafka for Low Latency
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Producer configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Minimize batching
&lt;/span&gt;&lt;span class="py"&gt;linger.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;
&lt;span class="py"&gt;batch.size&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1024&lt;/span&gt;

&lt;span class="c"&gt;# Reduce network overhead  
&lt;/span&gt;&lt;span class="py"&gt;acks&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;max.in.flight.requests.per.connection&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;

&lt;span class="c"&gt;# Disable compression for lowest latency
&lt;/span&gt;&lt;span class="py"&gt;compression.type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Broker configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Fast leader election
&lt;/span&gt;&lt;span class="py"&gt;replica.lag.time.max.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;500&lt;/span&gt;
&lt;span class="py"&gt;replica.socket.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1000&lt;/span&gt;

&lt;span class="c"&gt;# Frequent flushes (if durability required)
&lt;/span&gt;&lt;span class="py"&gt;log.flush.interval.messages&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;log.flush.interval.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Consumer configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Minimal poll interval
&lt;/span&gt;&lt;span class="py"&gt;fetch.min.bytes&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;fetch.max.wait.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;

&lt;span class="c"&gt;# Reduce rebalance overhead
&lt;/span&gt;&lt;span class="py"&gt;session.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;6000&lt;/span&gt;
&lt;span class="py"&gt;heartbeat.interval.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;2000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Apache Pulsar for Ultra-Low Latency
&lt;/h3&gt;

&lt;p&gt;Pulsar's architecture allows some optimizations Kafka cannot match:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Memory-mapped journal for minimal write latency&lt;/span&gt;
&lt;span class="n"&gt;dbStorage_writeCacheMaxSizeMb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;
&lt;span class="n"&gt;dbStorage_readAheadCacheMaxSizeMb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;

&lt;span class="c1"&gt;// Disable fsync for maximum speed (if durability allows)&lt;/span&gt;
&lt;span class="n"&gt;journalSyncData&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Apache Flink for Stream Processing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Minimize checkpoint overhead&lt;/span&gt;
&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;checkpoints&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nl"&gt;memory:&lt;/span&gt;&lt;span class="c1"&gt;//&lt;/span&gt;
&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;rocksdb&lt;/span&gt;

&lt;span class="c1"&gt;// Reduce buffering&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;watermark&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tracking&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion: Engineering Time as a Feature
&lt;/h2&gt;

&lt;p&gt;Latency in streaming systems isn't just a performance metric—it's a feature you must consciously design, budget, and engineer for. Just as Jeff Dean's numbers taught programmers to respect the reality of time in computing hardware, these streaming latency numbers should guide every architectural decision you make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insights:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Physics sets hard floors.&lt;/strong&gt; You cannot stream across continents in under 80ms, period. You cannot do synchronous disk writes faster than your storage allows. Design within reality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency is expensive.&lt;/strong&gt; Ultra-low latency often costs 10x-100x more than "good enough" latency. Netflix's 38x cost difference between Kafka and Iceberg isn't unique—it's typical.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Percentiles matter more than averages.&lt;/strong&gt; Your users experience P95 and P99 latencies, not medians. A system with 50ms average and 2-second P99 is not a real-time system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Every millisecond is a trade-off.&lt;/strong&gt; Choosing synchronous replication adds latency but prevents data loss. Choosing small batches reduces latency but limits throughput. These aren't bugs—they're fundamental engineering decisions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitor what matters.&lt;/strong&gt; End-to-end latency with business-relevant percentiles. Break down by pipeline stage. Alert on degradation, not just failures.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal isn't to build the fastest possible system—it's to build the right system for your latency budget. Sometimes that's a 5ms ultra-low latency platform costing hundreds of thousands per month. Sometimes it's a 5-minute batch process costing hundreds per month. Both can be "real-time" in their proper context.&lt;/p&gt;

&lt;p&gt;Armed with these numbers, you can confidently navigate the trade-offs between speed, cost, and complexity. You'll know when a requirement is physically impossible, when it's technically feasible but economically questionable, and when it's the right fit for your streaming architecture.&lt;/p&gt;

&lt;p&gt;Most importantly, you'll stop debating whether something is "real-time" and start designing systems that deliver data when and where it's needed—in real real-time.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;As an Apache Pulsar committer, I'm always interested in hearing about your experiences with streaming data technologies. Feel free to reach out with questions or share your own insights!&lt;/em&gt;&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Netflix cost analysis based on industry presentations and blog posts discussing their data lake architecture. Specific 38x figure commonly cited in streaming architecture discussions, though exact source documentation may vary. For current Netflix data architecture details, see their technology blog and conference presentations on data platform evolution. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>kafka</category>
      <category>performance</category>
      <category>pulsar</category>
      <category>programming</category>
    </item>
    <item>
      <title>Latency Numbers Every Data Streaming Engineer Should Know</title>
      <dc:creator>David Kjerrumgaard</dc:creator>
      <pubDate>Fri, 12 Sep 2025 22:38:08 +0000</pubDate>
      <link>https://dev.to/david_kjerrumgaard_d31d7e/latency-numbers-every-data-streaming-engineer-should-know-13lp</link>
      <guid>https://dev.to/david_kjerrumgaard_d31d7e/latency-numbers-every-data-streaming-engineer-should-know-13lp</guid>
      <description>&lt;h1&gt;
  
  
  Latency Numbers Every Data Streaming Engineer Should Know
&lt;/h1&gt;

&lt;p&gt;Jeff Dean's &lt;a href="https://gist.github.com/jboner/2841832" rel="noopener noreferrer"&gt;"Latency Numbers Every Programmer Should Know"&lt;/a&gt; became essential reading because it grounded abstract performance discussions in concrete reality. For data streaming engineers, we need an equivalent framework that translates those fundamental hardware latencies into the specific challenges of real-time data pipelines.&lt;/p&gt;

&lt;p&gt;Just as Dean showed that a disk seek (10ms) costs the same as 40,000 L1 cache references, streaming engineers must understand that a cross-region sync replication (100ms+) costs the same as processing 10,000 in-memory events. These aren't just numbers—they're the physics that govern what's possible in your streaming architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR: Your Latency Budget Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Latency Class&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;End-to-End Target&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Key Constraints&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ultra-low&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt; 10ms&lt;/td&gt;
&lt;td&gt;HFT, real-time control, gaming&lt;/td&gt;
&lt;td&gt;Single AZ only, no disk fsync per record, specialized hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Low&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10-200ms&lt;/td&gt;
&lt;td&gt;Interactive dashboards, alerts, online ML features&lt;/td&gt;
&lt;td&gt;Streaming processing, minimal batching, same region&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency-relaxed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;200ms - minutes&lt;/td&gt;
&lt;td&gt;Near-real-time analytics, ETL, reporting&lt;/td&gt;
&lt;td&gt;Enables aggressive batching, cross-region, cost optimization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Critical Hardware &amp;amp; Network Floors
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Streaming Impact&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HDD seek/fsync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5-20ms&lt;/td&gt;
&lt;td&gt;Consumes entire ultra-low budget&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSD fsync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.05-1ms&lt;/td&gt;
&lt;td&gt;Manageable for low latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Same AZ network (RTT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.2-1ms&lt;/td&gt;
&lt;td&gt;Base cost for any distributed system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-AZ (RTT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-4ms&lt;/td&gt;
&lt;td&gt;Minimum for AZ-redundant streams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-region (RTT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;30-200ms+&lt;/td&gt;
&lt;td&gt;Makes &amp;lt;100ms E2E impossible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema registry lookup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-5ms (cached), 10-50ms (miss)&lt;/td&gt;
&lt;td&gt;Often overlooked latency source&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Streaming Platform Specifics
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Typical Latency&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Configuration Notes&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kafka publish (acks=1, same-AZ)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-5ms&lt;/td&gt;
&lt;td&gt;No replica wait&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kafka publish (acks=all, same-AZ)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3-15ms&lt;/td&gt;
&lt;td&gt;Adds replica sync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-AZ sync replication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+1-5ms&lt;/td&gt;
&lt;td&gt;Per additional AZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Producer batching (linger.ms)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+5-50ms&lt;/td&gt;
&lt;td&gt;Intentional latency for throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consumer poll interval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0-500ms+&lt;/td&gt;
&lt;td&gt;Misconfiguration can dominate E2E&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Iceberg commit visibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5s-10min&lt;/td&gt;
&lt;td&gt;Depends on commit interval&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What "Real-Time" Actually Means
&lt;/h2&gt;

&lt;p&gt;In data streaming, "real-time" has become as overloaded as "big data" once was. Let's establish clear definitions based on both technical constraints and human perception thresholds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Flatency-spectrum-diagram.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Flatency-spectrum-diagram.svg" alt="latency-spectrum-diagram.svg" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 1: Streaming Latency Spectrum showing the logarithmic scale from nanoseconds to minutes, with technology examples and use cases for each latency category.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Ultra-Low Latency (&amp;lt; 10ms End-to-End)
&lt;/h3&gt;

&lt;p&gt;This is the realm of hard real-time systems where every microsecond counts. Applications requiring sub-10ms latency include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-frequency trading&lt;/strong&gt; (where 1ms advantage = millions in profit)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time control systems&lt;/strong&gt; (industrial automation, autonomous vehicles)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Competitive gaming&lt;/strong&gt; (where 16ms = one frame at 60fps)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low-latency market data&lt;/strong&gt; (every trader needs the same speed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Everything in one availability zone (cross-AZ RTT alone is 1-4ms)&lt;/li&gt;
&lt;li&gt;No per-record disk fsync (HDD seek = 10ms, breaking your entire budget)&lt;/li&gt;
&lt;li&gt;Kernel bypass networking (DPDK, RDMA)&lt;/li&gt;
&lt;li&gt;Compact binary serialization (Protocol Buffers, Avro, or a custom format)&lt;/li&gt;
&lt;li&gt;Memory-mapped storage or pure in-memory processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example stack:&lt;/strong&gt; Apache Pulsar with BookKeeper on NVMe, or heavily tuned Kafka with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Ultra-low latency Kafka producer config
linger.ms=0
batch.size=1024
acks=1
compression.type=none
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reality check:&lt;/strong&gt; For perspective, 100ms is the threshold where UI interactions feel instantaneous to humans. Ultra-low latency is an order of magnitude faster than human perception—you're optimizing for machines, not users.&lt;/p&gt;

&lt;h3&gt;
  
  
  Low Latency (10-200ms End-to-End)
&lt;/h3&gt;

&lt;p&gt;This covers the sweet spot for most interactive real-time applications. Users perceive anything under 200ms as "instant" response, making this the target for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live dashboards and monitoring&lt;/strong&gt; (business metrics, system health)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time alerting&lt;/strong&gt; (fraud detection, anomaly detection)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Online machine learning features&lt;/strong&gt; (recommendation engines, personalization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live chat and notifications&lt;/strong&gt; (social platforms, collaboration tools)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time analytics&lt;/strong&gt; (A/B test results, user behavior tracking)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event-at-a-time processing (not micro-batches)&lt;/li&gt;
&lt;li&gt;Cross-AZ replication acceptable (adds ~2-5ms)&lt;/li&gt;
&lt;li&gt;Moderate batching for efficiency (5-50ms linger times)&lt;/li&gt;
&lt;li&gt;SSD storage with occasional fsync&lt;/li&gt;
&lt;li&gt;Standard streaming platforms work well&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example stack:&lt;/strong&gt; Apache Kafka + Apache Flink with event-time processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Balanced Kafka configuration
linger.ms=5
batch.size=16384
acks=all
max.in.flight.requests.per.connection=5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost implications:&lt;/strong&gt; This range allows reasonable optimization without exotic hardware. A well-tuned Kafka cluster can achieve 10-50ms P50 latency with hundreds of thousands of events per second.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Flatency-throughput-tradeoff.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Flatency-throughput-tradeoff.svg" alt="latency-throughput-tradeoff.svg" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 2: The classic trade-off between latency and throughput in streaming systems. Lower latency typically means higher cost and lower throughput, while batch processing achieves high throughput at the cost of latency.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Latency-Relaxed (200ms - Minutes)
&lt;/h3&gt;

&lt;p&gt;When latency requirements relax beyond a few hundred milliseconds, you enter the realm of cost optimization and massive throughput. This category includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Near-real-time ETL&lt;/strong&gt; (data lake ingestion, warehouse loading)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business intelligence dashboards&lt;/strong&gt; (updating every 30 seconds to 5 minutes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch-oriented analytics&lt;/strong&gt; (hourly/daily reports with "fresh" data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data lake table formats&lt;/strong&gt; (Iceberg, Delta Lake with 1-10 minute commits)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-region data replication&lt;/strong&gt; (disaster recovery, global distribution)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggressive batching (seconds to minutes)&lt;/li&gt;
&lt;li&gt;Cross-region replication feasible&lt;/li&gt;
&lt;li&gt;Cheaper storage tiers (object storage vs. hot SSDs)&lt;/li&gt;
&lt;li&gt;Higher compression ratios&lt;/li&gt;
&lt;li&gt;Simpler error handling and retry logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Netflix's architecture keeps only hours of hot data in Kafka (expensive) and tiers the rest to Apache Iceberg on S3 (38x cheaper)&lt;sup id="fnref1"&gt;1&lt;/sup&gt;. For most analytics, 1-5 minute latency is perfectly acceptable and dramatically reduces infrastructure costs.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Physics of Streaming Latency
&lt;/h2&gt;

&lt;p&gt;Understanding hardware and network fundamentals isn't academic—these are the unavoidable floors that constrain every streaming system.&lt;/p&gt;
&lt;h3&gt;
  
  
  Storage: The Latency Hierarchy
&lt;/h3&gt;

&lt;p&gt;Every streaming platform must persist data for durability, but storage choices have massive latency implications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Memory access:        ~100 nanoseconds
SSD random read:      ~150 microseconds (1,500x slower than memory)
NVMe fsync:          ~0.05-1 milliseconds  
SATA SSD fsync:      ~0.5-5 milliseconds
HDD seek/fsync:      ~5-20 milliseconds (200,000x slower than memory!)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world example:&lt;/strong&gt; Intel Optane NVMe can sync writes in ~43 microseconds average, while a traditional HDD takes ~18ms—that's 400x faster. For a streaming broker writing 10,000 events/second:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;With HDD:&lt;/strong&gt; Maximum ~50-100 synced writes/second/disk (disk-bound)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With NVMe:&lt;/strong&gt; Thousands of synced writes/second (CPU/network bound)&lt;/li&gt;
&lt;/ul&gt;
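&lt;p&gt;&lt;strong&gt;Try it yourself:&lt;/strong&gt; a rough micro-benchmark of synced writes takes only a few lines of plain Java. This is just a sketch (the file path, record size, and iteration count here are arbitrary), but it mimics the durability step, a write plus fsync, that a broker performs for every synchronous write.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Rough probe of synced-write latency: write a small record, then force it
// to stable storage, the same step a broker takes for a durable write.
public class FsyncProbe {
    public static void main(String[] args) throws IOException {
        Path path = Path.of("fsync-probe.dat");   // place this on the volume you care about
        int iterations = 1_000;
        ByteBuffer record = ByteBuffer.allocate(512);

        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long start = System.nanoTime();
            for (int i = 0; i &lt; iterations; i++) {
                record.rewind();
                channel.write(record);
                channel.force(true);               // fsync: flush data and metadata to disk
            }
            long avgMicros = (System.nanoTime() - start) / iterations / 1_000;
            System.out.println("Average synced write latency: " + avgMicros + " microseconds");
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;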

&lt;p&gt;&lt;strong&gt;Kafka-specific insight:&lt;/strong&gt; Kafka's sequential write pattern helps with HDDs, but modern deployments use SSDs for predictable low latency. The difference between "usually fast" and "always fast" matters for P99 latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network: Distance Costs Time
&lt;/h3&gt;

&lt;p&gt;Network latency follows the speed of light in fiber (roughly 5 microseconds per kilometer), plus routing overhead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Same host (loopback):     &amp;lt; 0.1ms
Same rack/AZ:            0.1-0.5ms one-way  
Cross-AZ, same region:   0.5-2ms one-way
Cross-region (continent): 15-40ms one-way
Intercontinental:        80-200ms one-way (varies by route)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;AWS measurements:&lt;/strong&gt; Cross-AZ pings typically show 1-2ms RTT, while us-east-1 to eu-west-1 is ~80-90ms RTT.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming implications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous cross-region replication:&lt;/strong&gt; Automatically adds ≥80ms to every write&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leader election during failures:&lt;/strong&gt; Cross-AZ coordination adds several milliseconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumer rebalancing:&lt;/strong&gt; Group coordination latency scales with member distribution&lt;/li&gt;
&lt;/ul&gt;
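&lt;p&gt;A quick way to sanity-check any cross-site latency target is to multiply the distance by the ~5 microseconds-per-kilometer figure above. The small sketch below does exactly that; the distances are rough great-circle numbers and real fiber routes are always longer, so treat the results as optimistic floors rather than estimates.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Lower bound on round-trip time from fiber distance alone (~5 us per km each way),
// ignoring routing hops, switching, and serialization overhead.
public class FiberFloor {
    static double minRttMillis(double distanceKm) {
        double oneWayMicros = distanceKm * 5.0;   // ~5 microseconds per km in fiber
        return 2 * oneWayMicros / 1_000.0;        // round trip, converted to milliseconds
    }

    public static void main(String[] args) {
        // Approximate straight-line distances; actual fiber paths add 30-100% or more.
        System.out.printf("Same metro (~50 km):           %.1f ms RTT floor%n", minRttMillis(50));
        System.out.printf("US East to EU West (~6000 km): %.1f ms RTT floor%n", minRttMillis(6_000));
        System.out.printf("US West to Sydney (~12000 km): %.1f ms RTT floor%n", minRttMillis(12_000));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;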

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Fnetwork-topology-map.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Fnetwork-topology-map.svg" alt="network-topology-map.svg" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 3: Global network latency map showing realistic RTT times between major cloud regions. These physical constraints set hard floors for any distributed streaming system.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Common Failure Scenarios
&lt;/h3&gt;

&lt;p&gt;Streaming systems must handle failures gracefully, but each failure mode has latency implications:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Failure Type&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Latency Impact&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Mitigation&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Broker failover&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+50-200ms during leader election&lt;/td&gt;
&lt;td&gt;Faster election timeouts, more brokers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GC pause&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+100-500ms to P99 latencies&lt;/td&gt;
&lt;td&gt;G1GC tuning, smaller heaps, off-heap storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network partition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+timeout duration (often 30s default)&lt;/td&gt;
&lt;td&gt;Shorter timeouts, circuit breakers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema registry miss&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+10-50ms per lookup&lt;/td&gt;
&lt;td&gt;Larger caches, schema pre-loading&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consumer rebalance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+5-30s processing halt&lt;/td&gt;
&lt;td&gt;Incremental rebalancing, sticky assignment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Streaming Platform Latency Breakdown
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Publish Latency (Producer → Broker)
&lt;/h3&gt;

&lt;p&gt;This is where your event first enters the streaming platform. Key factors:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network transit:&lt;/strong&gt; Usually negligible within a data center (&amp;lt;1ms), but can dominate for remote producers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Broker processing:&lt;/strong&gt; Includes parsing, validation, and local storage. Modern brokers can handle this in microseconds for simple events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication strategy:&lt;/strong&gt; The big variable. Kafka's &lt;code&gt;acks&lt;/code&gt; setting illustrates the trade-off:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;acks=0: Fire-and-forget (~1-2ms, risk data loss)
acks=1: Wait for leader only (~2-5ms, balanced)  
acks=all: Wait for all replicas (~5-15ms same-AZ, much higher cross-region)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Producer batching:&lt;/strong&gt; Intentionally trading latency for throughput:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linger.ms=0:  Send immediately (lowest latency)
linger.ms=5:  Wait up to 5ms to batch (better throughput)
linger.ms=50: Wait up to 50ms to batch (much better throughput)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real example:&lt;/strong&gt; A well-tuned Kafka cluster with acks=all, same-AZ replication typically shows 3-8ms publish latency at P50, 10-25ms at P99.&lt;/p&gt;
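&lt;p&gt;To verify numbers like these against your own cluster, the producer callback gives you publish latency directly (from send until the acks requirement is satisfied). Here is a minimal sketch using the standard Kafka Java client; the bootstrap address and topic name are placeholders, and a real probe would record every sample for percentile analysis rather than just the worst case.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PublishLatencyProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.ACKS_CONFIG, "all");    // wait for all in-sync replicas
        props.put(ProducerConfig.LINGER_MS_CONFIG, "5"); // small, deliberate batching delay

        try (KafkaProducer&lt;String, String&gt; producer = new KafkaProducer&lt;&gt;(props)) {
            AtomicLong worstMillis = new AtomicLong();
            for (int i = 0; i &lt; 10_000; i++) {
                long sendNanos = System.nanoTime();
                producer.send(new ProducerRecord&lt;&gt;("latency-test", "event-" + i),
                        (metadata, exception) -&gt; {
                            long millis = (System.nanoTime() - sendNanos) / 1_000_000;
                            worstMillis.accumulateAndGet(millis, Math::max);
                        });
            }
            producer.flush();
            System.out.println("Worst observed publish latency: " + worstMillis.get() + " ms");
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;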

&lt;h3&gt;
  
  
  Consume Latency (Broker → Consumer)
&lt;/h3&gt;

&lt;p&gt;Once data is available on the broker, how quickly can consumers access it?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Push vs. Pull:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Push systems&lt;/strong&gt; (like some message queues) can deliver with sub-millisecond latency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pull systems&lt;/strong&gt; (like Kafka) depend on poll frequency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Polling configuration mistakes:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Bad: Creates 0-500ms artificial delay
max.poll.interval.ms=500

# Good: Near-real-time consumption
max.poll.interval.ms=10
fetch.min.bytes=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Processing overhead:&lt;/strong&gt; In-memory transformations are typically &amp;lt;1ms per event, but external calls (database lookups, API calls) can dominate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Simple in-memory filter:     ~0.001ms per event
JSON parsing/validation:     ~0.01-0.1ms per event  
Database lookup (cached):    ~1-5ms per event
Database lookup (cache miss): ~10-50ms per event
External API call:          ~50-200ms per event
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  End-to-End Latency Monitoring
&lt;/h3&gt;

&lt;p&gt;What users actually experience: &lt;code&gt;E2E = Publish + Network + Consume + Processing&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key percentiles to track:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;P50 (median):&lt;/strong&gt; Your typical performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P95:&lt;/strong&gt; What 95% of users experience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P99:&lt;/strong&gt; Catches tail latencies from GC, network hiccups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P99.9:&lt;/strong&gt; Exposes rare but severe problems&lt;/li&gt;
&lt;/ul&gt;
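&lt;p&gt;Computing these from a window of recorded end-to-end latencies is straightforward; the toy sketch below uses a simple sort, while production systems typically keep a histogram (for example HdrHistogram) so the full window never has to be stored. Note how a single GC-style outlier leaves the average looking healthy while P95 and P99 tell the real story.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.util.Arrays;

// Report the percentiles users actually feel, not just the average,
// for one window of observed end-to-end latencies (in milliseconds).
public class LatencyPercentiles {
    static long percentile(long[] sortedMillis, double p) {
        int index = (int) Math.ceil(p / 100.0 * sortedMillis.length) - 1;
        return sortedMillis[Math.max(0, Math.min(index, sortedMillis.length - 1))];
    }

    public static void main(String[] args) {
        long[] samples = {12, 15, 14, 18, 22, 13, 16, 250, 17, 19}; // one GC pause hiding in here
        Arrays.sort(samples);
        System.out.println("avg: " + Arrays.stream(samples).average().orElse(0) + " ms");
        System.out.println("P50: " + percentile(samples, 50) + " ms");
        System.out.println("P95: " + percentile(samples, 95) + " ms");
        System.out.println("P99: " + percentile(samples, 99) + " ms");
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;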

&lt;p&gt;&lt;strong&gt;Example real-world numbers:&lt;/strong&gt;&lt;br&gt;
A well-tuned, single-region Kafka pipeline typically achieves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P50: 10-30ms end-to-end&lt;/li&gt;
&lt;li&gt;P95: 25-75ms end-to-end&lt;/li&gt;
&lt;li&gt;P99: 50-200ms end-to-end (watch for GC pauses, network bursts)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cross-region reality:&lt;/strong&gt;&lt;br&gt;
With synchronous cross-region replication, add ≥80ms minimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P50: 90-120ms end-to-end&lt;/li&gt;
&lt;li&gt;P99: 150-400ms end-to-end&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Fpipeline-latency-breakdown.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Fpipeline-latency-breakdown.svg" alt="pipeline-latency-breakdown.svg" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 4: Detailed breakdown of where latency accumulates in a streaming pipeline, from producer to final storage. Shows how each component contributes to total end-to-end latency.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Data Lake Integration: The Visibility Latency Challenge
&lt;/h2&gt;

&lt;p&gt;Modern streaming architectures often flow into analytical storage (Apache Iceberg, Delta Lake) for cost-effective long-term analytics. However, these systems operate on a fundamentally different latency model.&lt;/p&gt;
&lt;h3&gt;
  
  
  Commit Interval Governs Freshness
&lt;/h3&gt;

&lt;p&gt;Unlike streaming brokers that make data available immediately, table formats batch writes into atomic commits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Commit every 5 seconds:   ~2.5s average visibility latency, ~5s max
Commit every 1 minute:    ~30s average visibility latency, ~60s max  
Commit every 10 minutes:  ~5min average visibility latency, ~10min max
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why the delay?&lt;/strong&gt; Table formats like Iceberg prioritize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Atomic visibility:&lt;/strong&gt; Readers see complete batches or nothing (no partial data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient storage:&lt;/strong&gt; Larger files are cheaper and faster to read from object storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata efficiency:&lt;/strong&gt; Fewer commits = less metadata overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Real-World Commit Strategy Examples
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Commit Interval&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Trade-offs&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-time dashboard&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5-30 seconds&lt;/td&gt;
&lt;td&gt;Higher cost, more small files, near-real-time visibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hourly reporting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-5 minutes&lt;/td&gt;
&lt;td&gt;Balanced cost and freshness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Daily analytics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10-60 minutes&lt;/td&gt;
&lt;td&gt;Lowest cost, highest efficiency, delayed visibility&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Experimental data:&lt;/strong&gt; In testing with Flink → Iceberg:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10-second commits: ~10s median latency, ~20s P99&lt;/li&gt;
&lt;li&gt;1-minute commits: ~30s median latency, ~60s P99&lt;/li&gt;
&lt;li&gt;Latency closely tracks commit interval plus small processing overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cost Impact of Commit Frequency
&lt;/h3&gt;

&lt;p&gt;Netflix's analysis showed that keeping data in Kafka costs 38x more than Iceberg storage&lt;sup id="fnref1"&gt;1&lt;/sup&gt;. The commit interval directly affects this trade-off:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example calculation&lt;/strong&gt; (1TB/day workload):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kafka retention (24 hours):&lt;/strong&gt; ~$500/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iceberg (frequent 30s commits):&lt;/strong&gt; ~$25/month + processing costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iceberg (relaxed 10min commits):&lt;/strong&gt; ~$13/month + processing costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More frequent commits mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher compute costs (more Flink/Spark jobs)&lt;/li&gt;
&lt;li&gt;More small files (worse query performance)&lt;/li&gt;
&lt;li&gt;Higher metadata overhead&lt;/li&gt;
&lt;li&gt;But lower visibility latency&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Synchronous vs. Asynchronous: The Fundamental Trade-off
&lt;/h2&gt;

&lt;p&gt;Every distributed streaming system faces choices about when to wait for confirmation versus proceeding optimistically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replication Strategies
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Synchronous replication:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer → Broker → Wait for replicas → ACK to producer
Latency: Base + (RTT to slowest replica)
Durability: High (data on multiple nodes before ACK)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Asynchronous replication:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer → Broker → Immediate ACK → Background replication
Latency: Base + local write time only
Durability: Lower (brief window where data exists on only one node)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Same AZ sync replication:&lt;/strong&gt; +1-3ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-AZ sync replication:&lt;/strong&gt; +2-8ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-region sync replication:&lt;/strong&gt; +80-200ms (often impractical)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Processing Patterns
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Synchronous processing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event → Process Step 1 → Wait → Process Step 2 → Wait → Response
Latency: Sum of all steps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Asynchronous processing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event → Trigger Step 1 → Trigger Step 2 → Collect results → Response
Latency: Max of parallel steps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; External enrichment workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous:&lt;/strong&gt; Event → DB lookup (20ms) → API call (50ms) → Process (5ms) = 75ms total&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous:&lt;/strong&gt; Event → [DB lookup || API call in parallel] → Process (5ms) = ~55ms total (the slower of the two parallel calls plus processing)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The async approach requires more complex code (handling out-of-order responses, partial failures) but can significantly reduce latency.&lt;/p&gt;
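&lt;p&gt;A minimal sketch of the parallel variant using &lt;code&gt;CompletableFuture&lt;/code&gt;: the two enrichment calls run concurrently and the pipeline waits only for the slower of them. The &lt;code&gt;lookupFromDb&lt;/code&gt; and &lt;code&gt;callEnrichmentApi&lt;/code&gt; methods are stand-ins for whatever blocking clients you actually use, and the pool size is purely illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelEnrichment {
    // Stand-ins for real blocking calls (roughly the 20ms DB lookup and 50ms API call above).
    static String lookupFromDb(String eventId) { return "db:" + eventId; }
    static String callEnrichmentApi(String eventId) { return "api:" + eventId; }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8); // sized for illustration only

        String eventId = "evt-42";
        CompletableFuture&lt;String&gt; db =
                CompletableFuture.supplyAsync(() -&gt; lookupFromDb(eventId), pool);
        CompletableFuture&lt;String&gt; api =
                CompletableFuture.supplyAsync(() -&gt; callEnrichmentApi(eventId), pool);

        // Total wait is max(db, api) plus local processing, not their sum.
        String enriched = db.thenCombine(api, (d, a) -&gt; eventId + " [" + d + ", " + a + "]").join();
        System.out.println(enriched);

        pool.shutdown();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;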

&lt;h2&gt;
  
  
  Troubleshooting: When Latency Goes Wrong
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Your Latency Budget Checklist
&lt;/h3&gt;

&lt;p&gt;Before designing any streaming system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Identified true latency requirement&lt;/strong&gt; (ultra-low/low/relaxed)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Mapped data path&lt;/strong&gt; (same AZ/cross-AZ/cross-region)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Chosen durability level&lt;/strong&gt; (async/sync replication)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Configured monitoring&lt;/strong&gt; for P50/P95/P99, not just averages&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Load tested&lt;/strong&gt; at peak throughput (latency often degrades under load)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common Latency Culprits
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Symptom&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Likely Cause&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Investigation&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consistent &amp;gt;100ms in same region&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Network saturation or misconfigured routing&lt;/td&gt;
&lt;td&gt;Check network utilization, traceroute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;P99 &amp;gt;&amp;gt; P50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GC pauses or batching effects&lt;/td&gt;
&lt;td&gt;JVM GC logs, batch size analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sudden latency spikes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Broker failover or rebalancing&lt;/td&gt;
&lt;td&gt;Broker logs, consumer group stability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;High variance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Resource contention or queueing&lt;/td&gt;
&lt;td&gt;CPU/memory/disk utilization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gradual degradation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Growing consumer lag&lt;/td&gt;
&lt;td&gt;Partition count, consumer scaling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Metrics to Monitor
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Producer side:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;request-latency-avg/max&lt;/code&gt;: How long broker requests take&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;batch-size-avg&lt;/code&gt;: Batching efficiency&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;buffer-available-bytes&lt;/code&gt;: Memory pressure&lt;/li&gt;
&lt;/ul&gt;
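&lt;p&gt;These are exposed over JMX, but the Java client also lets you read them in process, which makes it easy to forward them to whatever metrics system you already run. A small sketch (it assumes an already-configured producer instance):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class ProducerMetricsDump {
    // Print the latency and batching metrics for an already-configured producer.
    static void dumpLatencyMetrics(KafkaProducer&lt;?, ?&gt; producer) {
        Map&lt;MetricName, ? extends Metric&gt; metrics = producer.metrics();
        for (Map.Entry&lt;MetricName, ? extends Metric&gt; entry : metrics.entrySet()) {
            String name = entry.getKey().name();
            if (name.equals("request-latency-avg") || name.equals("request-latency-max")
                    || name.equals("batch-size-avg") || name.equals("buffer-available-bytes")) {
                System.out.println(entry.getKey().group() + " / " + name
                        + " = " + entry.getValue().metricValue());
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;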

&lt;p&gt;&lt;strong&gt;Broker side:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;request-handler-idle-ratio&lt;/code&gt;: CPU saturation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;log-flush-time&lt;/code&gt;: Disk performance&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;leader-election-rate&lt;/code&gt;: Stability issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consumer side:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;lag-max&lt;/code&gt;: How far behind consumers are&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;poll-time-avg&lt;/code&gt;: Processing efficiency&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;commit-latency-avg&lt;/code&gt;: Offset management overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;End-to-end:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application-level latency tracking with correlation IDs (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;P50/P95/P99 latency distributions over time&lt;/li&gt;
&lt;li&gt;Latency broken down by pipeline stage&lt;/li&gt;
&lt;/ul&gt;
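&lt;p&gt;One way to implement the correlation-ID approach above: stamp a correlation ID and the produce time into record headers at the edge of the pipeline, then let the final consumer compute end-to-end latency from that header. The sketch below shows the idea with the Kafka Java client; the header names and method names are assumptions of this example, and it presumes producer and consumer clocks are reasonably synchronized (for example via NTP).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class EndToEndLatency {
    // Producer side: attach a correlation ID and the produce timestamp as headers.
    static ProducerRecord&lt;String, String&gt; stamped(String topic, String key, String value,
                                                   String correlationId) {
        ProducerRecord&lt;String, String&gt; record = new ProducerRecord&lt;&gt;(topic, key, value);
        record.headers().add("correlation-id", correlationId.getBytes(StandardCharsets.UTF_8));
        record.headers().add("produced-at-ms",
                Long.toString(System.currentTimeMillis()).getBytes(StandardCharsets.UTF_8));
        return record;
    }

    // Final consumer stage: end-to-end latency = now minus the producer's timestamp.
    static long endToEndMillis(ConsumerRecord&lt;String, String&gt; record) {
        Header producedAt = record.headers().lastHeader("produced-at-ms");
        if (producedAt == null) {
            return -1; // record was produced without instrumentation
        }
        long producedAtMs = Long.parseLong(new String(producedAt.value(), StandardCharsets.UTF_8));
        // Feed this into your P50/P95/P99 tracking, broken down by pipeline stage.
        return System.currentTimeMillis() - producedAtMs;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;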

&lt;h2&gt;
  
  
  Technology-Specific Configurations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Apache Kafka for Low Latency
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Producer configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Minimize batching
&lt;/span&gt;&lt;span class="py"&gt;linger.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;
&lt;span class="py"&gt;batch.size&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1024&lt;/span&gt;

&lt;span class="c"&gt;# Reduce network overhead  
&lt;/span&gt;&lt;span class="py"&gt;acks&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;max.in.flight.requests.per.connection&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;

&lt;span class="c"&gt;# Disable compression for lowest latency
&lt;/span&gt;&lt;span class="py"&gt;compression.type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Broker configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Fast leader election
&lt;/span&gt;&lt;span class="py"&gt;replica.lag.time.max.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;500&lt;/span&gt;
&lt;span class="py"&gt;replica.socket.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1000&lt;/span&gt;

&lt;span class="c"&gt;# Frequent flushes (if durability required)
&lt;/span&gt;&lt;span class="py"&gt;log.flush.interval.messages&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;log.flush.interval.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Consumer configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Minimal poll interval
&lt;/span&gt;&lt;span class="py"&gt;fetch.min.bytes&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;fetch.max.wait.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;

&lt;span class="c"&gt;# Reduce rebalance overhead
&lt;/span&gt;&lt;span class="py"&gt;session.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;6000&lt;/span&gt;
&lt;span class="py"&gt;heartbeat.interval.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;2000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Apache Pulsar for Ultra-Low Latency
&lt;/h3&gt;

&lt;p&gt;Pulsar's architecture allows some optimizations Kafka cannot match:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Memory-mapped journal for minimal write latency&lt;/span&gt;
&lt;span class="n"&gt;dbStorage_writeCacheMaxSizeMb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;
&lt;span class="n"&gt;dbStorage_readAheadCacheMaxSizeMb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;

&lt;span class="c1"&gt;// Disable fsync for maximum speed (if durability allows)&lt;/span&gt;
&lt;span class="n"&gt;journalSyncData&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Apache Flink for Stream Processing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Minimize checkpoint overhead&lt;/span&gt;
&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;checkpoints&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nl"&gt;memory:&lt;/span&gt;&lt;span class="c1"&gt;//&lt;/span&gt;
&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;rocksdb&lt;/span&gt;

&lt;span class="c1"&gt;// Reduce buffering&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;watermark&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tracking&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion: Engineering Time as a Feature
&lt;/h2&gt;

&lt;p&gt;Latency in streaming systems isn't just a performance metric—it's a feature you must consciously design, budget, and engineer for. Just as Jeff Dean's numbers taught programmers to respect the reality of time in computing hardware, these streaming latency numbers should guide every architectural decision you make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insights:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Physics sets hard floors.&lt;/strong&gt; You cannot stream across continents in under 80ms, period. You cannot do synchronous disk writes faster than your storage allows. Design within reality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency is expensive.&lt;/strong&gt; Ultra-low latency often costs 10x-100x more than "good enough" latency. Netflix's 38x cost difference between Kafka and Iceberg isn't unique—it's typical.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Percentiles matter more than averages.&lt;/strong&gt; Your users experience P95 and P99 latencies, not medians. A system with 50ms average and 2-second P99 is not a real-time system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Every millisecond is a trade-off.&lt;/strong&gt; Choosing synchronous replication adds latency but prevents data loss. Choosing small batches reduces latency but limits throughput. These aren't bugs—they're fundamental engineering decisions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitor what matters.&lt;/strong&gt; End-to-end latency with business-relevant percentiles. Break down by pipeline stage. Alert on degradation, not just failures.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal isn't to build the fastest possible system—it's to build the right system for your latency budget. Sometimes that's a 5ms ultra-low latency platform costing hundreds of thousands per month. Sometimes it's a 5-minute batch process costing hundreds per month. Both can be "real-time" in their proper context.&lt;/p&gt;

&lt;p&gt;Armed with these numbers, you can confidently navigate the trade-offs between speed, cost, and complexity. You'll know when a requirement is physically impossible, when it's technically feasible but economically questionable, and when it's the right fit for your streaming architecture.&lt;/p&gt;

&lt;p&gt;Most importantly, you'll stop debating whether something is "real-time" and start designing systems that deliver data when and where it's needed—in real real-time.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;As an Apache Pulsar committer, I'm always interested in hearing about your experiences with streaming data technologies. Feel free to reach out with questions or share your own insights!&lt;/em&gt;&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Netflix cost analysis based on industry presentations and blog posts discussing their data lake architecture. Specific 38x figure commonly cited in streaming architecture discussions, though exact source documentation may vary. For current Netflix data architecture details, see their technology blog and conference presentations on data platform evolution. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>Nifi Bundle Release Announcement</title>
      <dc:creator>David Kjerrumgaard</dc:creator>
      <pubDate>Fri, 12 Sep 2025 22:32:49 +0000</pubDate>
      <link>https://dev.to/david_kjerrumgaard_d31d7e/nifi-bundle-release-announcement-563i</link>
      <guid>https://dev.to/david_kjerrumgaard_d31d7e/nifi-bundle-release-announcement-563i</guid>
      <description>&lt;h1&gt;
  
  
  New Release: Enhanced Apache NiFi Connector for Pulsar v2.1.0
&lt;/h1&gt;

&lt;p&gt;I'm excited to announce the availability of an updated version of the Apache NiFi connector for Pulsar! This week, we dedicated time to implementing much-needed improvements that will enhance your data streaming experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community-Driven Improvements
&lt;/h2&gt;

&lt;p&gt;First and foremost, I want to extend our heartfelt gratitude to the community members who took the time to report issues and provide valuable feedback. Your contributions are essential to making this connector more robust and reliable for everyone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Changes in This Release
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Added support for OAuth2 credentials&lt;/strong&gt; - Enhanced authentication flexibility by supporting clientId/clientSecret instead of requiring private key files (see &lt;a href="https://github.com/david-streamlio/pulsar-nifi-bundle/issues/85" rel="noopener noreferrer"&gt;issue #85&lt;/a&gt; for details)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Added Pulsar MessageID and message properties to the outbound FlowFiles&lt;/strong&gt; - Pulsar MessageID and message properties are properly captured and forwarded to outbound FlowFiles (addresses &lt;a href="https://github.com/david-streamlio/pulsar-nifi-bundle/issues/67" rel="noopener noreferrer"&gt;issue #67&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized publisher resource management&lt;/strong&gt; - Improved performance by reusing PublisherLease objects instead of creating new publishers for each FlowFile&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updated to Apache NiFi 2.1.0&lt;/strong&gt; - Latest NiFi version support with newest features and security updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updated to Apache Pulsar 3.3.7&lt;/strong&gt; - Latest stable Pulsar release integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's New
&lt;/h2&gt;

&lt;p&gt;This release includes several key improvements and updates that significantly enhance the connector's functionality and performance:&lt;/p&gt;

&lt;h3&gt;
  
  
  New Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Added Pulsar MessageID and message properties to the outbound FlowFiles&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Added support for OAuth2 authentication using clientId, clientSecret&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Enhanced OAuth2 Authentication Support
&lt;/h3&gt;

&lt;p&gt;We've expanded authentication options by adding support for OAuth2 authentication using clientId and clientSecret credentials. This provides a more flexible alternative to private key file-based authentication, making it easier to integrate with modern cloud-based Pulsar deployments and enterprise authentication systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Added Pulsar MessageID and Message Properties to Outbound FlowFiles
&lt;/h3&gt;

&lt;p&gt;The connector now properly captures and forwards Pulsar MessageID and message properties to outbound FlowFiles. This enhancement ensures that important message metadata is preserved throughout your data processing pipeline, enabling better message tracking, debugging, and downstream processing decisions based on message properties.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimized Publisher Resource Management
&lt;/h3&gt;

&lt;p&gt;We've implemented a smarter approach to managing Pulsar publishers by reusing existing PublisherLease objects when possible, rather than storing publishers in a cache. This architectural improvement simplifies the design while preventing the unnecessary creation of new Pulsar Publisher instances for every FlowFile, resulting in better resource utilization and improved performance under high-throughput scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Platform Updates
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Updated to Apache NiFi 2.1.0&lt;/strong&gt;: Ensuring compatibility with the newest features and security updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updated to Apache Pulsar 3.3.7&lt;/strong&gt;: Latest stable release providing improved performance and reliability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;The updated connector maintains backward compatibility while providing enhanced functionality. You can download the latest version from &lt;a href="https://central.sonatype.com/artifact/io.streamnative.connectors/nifi-pulsar-bundle/versions" rel="noopener noreferrer"&gt;Maven Central&lt;/a&gt; and find installation instructions in the project repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Forward
&lt;/h2&gt;

&lt;p&gt;I remain committed to maintaining and improving this connector based on community feedback. If you encounter any issues or have suggestions for future enhancements, please don't hesitate to open an issue in our GitHub repository.&lt;/p&gt;

&lt;p&gt;Thank you again to our community for your continued support and contributions. Together, we're building better tools for real-time data processing and streaming. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;For technical support or questions about the Apache NiFi connector for Pulsar, visit my &lt;a href="https://github.com/david-streamlio/pulsar-nifi-bundle" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; or reach out to the community.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;As an Apache Pulsar committer, I'm always interested in hearing about your experiences with streaming data technologies. Feel free to reach out with questions or share your own insights!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>tooling</category>
      <category>dataengineering</category>
      <category>opensource</category>
      <category>news</category>
    </item>
  </channel>
</rss>
