<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Kjerrumgaard</title>
    <description>The latest articles on DEV Community by David Kjerrumgaard (@david_kjerrumgaard_d31d7e).</description>
    <link>https://dev.to/david_kjerrumgaard_d31d7e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3493436%2Fc0c6e849-cc96-42e8-80b3-87bf0633ca70.png</url>
      <title>DEV Community: David Kjerrumgaard</title>
      <link>https://dev.to/david_kjerrumgaard_d31d7e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/david_kjerrumgaard_d31d7e"/>
    <language>en</language>
    <item>
      <title>Is Software Dead? It Depends on What You’re Building</title>
      <dc:creator>David Kjerrumgaard</dc:creator>
      <pubDate>Tue, 03 Feb 2026 20:36:29 +0000</pubDate>
      <link>https://dev.to/david_kjerrumgaard_d31d7e/is-software-dead-it-depends-on-what-youre-building-4n7d</link>
      <guid>https://dev.to/david_kjerrumgaard_d31d7e/is-software-dead-it-depends-on-what-youre-building-4n7d</guid>
      <description>&lt;p&gt;I've spent over a decade building, selling, and scaling SaaS and infrastructure products — from early-stage startups to enterprise platforms. I've watched this industry survive the "cloud is a fad" era, the ZIRP hangover, and at least two rounds of "software is dead" narratives. It's never been dead. But it has always been evolving, and the companies that refuse to see the shifts clearly are the ones that don't make it. What's happening right now is real, it's significant, and it deserves a more honest conversation than either the doomsayers or the cheerleaders are offering. No, the sky is not falling — but if you're building or investing in software and you're not paying close attention, you're going to get caught off guard.&lt;/p&gt;

&lt;p&gt;If you don't follow the private credit or stock markets closely, you might not have noticed that software stocks are getting crushed. Hedge funds are dumping SaaS positions at a pace we haven't seen since 2008. Private credit firms with $100 billion in software exposure are watching their balance sheets deteriorate in real time. Traders on Wall Street are calling it a "&lt;a href="https://cloudedjudgement.substack.com/p/clouded-judgement-13026-software" rel="noopener noreferrer"&gt;SaaSpocalypse&lt;/a&gt;."&lt;/p&gt;

&lt;p&gt;Meanwhile, on the other side of the hype cycle, AI evangelists promise that every industry will be transformed within 18 months and that trillion-dollar markets are being created overnight.&lt;/p&gt;

&lt;p&gt;Both camps sound confident. I think both are partially right and meaningfully wrong.&lt;/p&gt;

&lt;p&gt;As someone who has spent years building in the data infrastructure space, I've been watching these narratives collide — and honestly, the lack of nuance in either direction is doing real damage to how companies are being evaluated, funded, and built.&lt;/p&gt;

&lt;p&gt;Here's my honest take.&lt;/p&gt;

&lt;h2&gt;
  
  
  Confidence in the SaaS model has shattered
&lt;/h2&gt;

&lt;p&gt;Let's start with the numbers. As Jamin Ball recently noted in Clouded Judgement, the median next-twelve-months revenue multiple for cloud software has dropped to 4.1x — the lowest in a decade. The median free cash flow multiple sits at 18.9x, roughly 30% below the previous 10-year low. These aren't normal fluctuations. This is a structural repricing.&lt;/p&gt;

&lt;p&gt;The reason goes deeper than a bad earnings season. SaaS businesses have long been valued as predictable cash flow machines — spend aggressively early, flip to profitability, then compound. The math behind those valuations rests on two foundational assumptions: that retention rates remain high and stable, and that the business has meaningful terminal value. AI is now calling both assumptions into question simultaneously.&lt;/p&gt;

&lt;p&gt;If customers leave legacy SaaS vendors for AI-native alternatives, retention craters and the cash flow model breaks. If entire software categories get commoditized, the terminal value for some of these companies may genuinely be zero. Even if you disagree with the most bearish case, the &lt;em&gt;probability&lt;/em&gt; of those outcomes is higher today than it was a year ago — and that alone justifies lower multiples.&lt;/p&gt;

&lt;h2&gt;
  
  
  The disruption is real, but it's not uniform
&lt;/h2&gt;

&lt;p&gt;The core anxiety driving the selloff is straightforward: AI can now do things that used to require purpose-built software. Legal review tools, customer service platforms, content management systems, basic analytics — the list of categories where AI is a credible alternative grows every week. That's not hype. That's happening.&lt;/p&gt;

&lt;p&gt;But the leap from "AI can replace some software categories" to "sell everything with a SaaS business model" is exactly the kind of overcorrection markets are prone to. "Software" is not a single thing. Treating all software companies as equally exposed to AI disruption is like saying every business that uses electricity is equally vulnerable to a grid failure. The exposure varies enormously depending on where you sit in the stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three very different realities under one label
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Software that AI can replace.&lt;/strong&gt; These are application-layer products whose core value proposition is a workflow that AI can now perform directly. Document review, templated content generation, basic data entry automation, simple customer routing. If your product is essentially a codified process wrapped in a UI, and that process can now be handled by a foundation model with a good prompt, the threat is real and immediate. This isn't a valuation problem — it's an existential one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Software that needs to evolve.&lt;/strong&gt; This is the largest and most interesting group. Most horizontal SaaS platforms aren't going to disappear overnight, but they face intensifying pricing pressure and feature commoditization. As Ball points out, the deeper issue isn't that someone will "vibe code" a replacement for Salesforce — it's that the marginal cost of creating software has collapsed, which will flood every category with competition and commoditize markets faster than incumbents can respond.&lt;/p&gt;

&lt;p&gt;The stock market is already sorting this group in real time. HubSpot, a strong company by any traditional SaaS metric, saw its stock drop roughly 50% in 2025 as investors questioned whether SMB CRM and marketing automation can defend its pricing against AI-native alternatives. Adobe fell around 35% despite genuinely impressive AI capabilities in Firefly — the market's concern isn't that Adobe isn't innovating, it's that standalone AI tools can now deliver "good enough" creative output for the majority of use cases at a fraction of the cost. Atlassian and Monday.com saw similar declines as investors recalibrated what project management and collaboration software is worth in a world where AI agents can coordinate work autonomously.&lt;/p&gt;

&lt;p&gt;These are not failing companies. They are strong businesses facing a fundamental question: can they integrate AI deeply enough to become &lt;em&gt;more&lt;/em&gt; valuable, not less? The market is right to ask hard questions here. It's wrong to assume the answers are universally negative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Software that AI depends on.&lt;/strong&gt; Infrastructure — the systems that move data, manage compute, handle security, orchestrate distributed workloads — doesn't get replaced by AI. It gets consumed by it. Every AI workload needs to ingest data at scale, process events in real time, route outputs to downstream systems, and do all of this reliably across global environments. The rise of AI is arguably the single biggest demand driver infrastructure software has ever seen.&lt;/p&gt;

&lt;p&gt;The companies that have figured this out are thriving while the rest of the sector burns. Cloudflare's stock rose over 80% in 2025 — not because it added AI features, but because its edge computing infrastructure is where AI inference actually runs. As one analyst noted, Cloudflare isn't selling AI features; it's selling the pipes that AI runs on. Datadog saw its AI-native customer revenue grow from 4% to 11% of total revenue in a single year, with over a dozen AI-native companies each spending more than $1 million annually on its observability platform. More AI workloads means more complexity to monitor, more logs to analyze, more security threats to detect. Snowflake's growth re-accelerated to nearly 30% as enterprises recognized that data infrastructure is the foundation AI needs before it can do anything useful. CrowdStrike climbed over 50% because AI doesn't reduce cybersecurity threats — it creates entirely new attack surfaces that need defending.&lt;/p&gt;

&lt;p&gt;Even ServiceNow, which straddles the line between application and platform, generated over $600 million from its AI assistant products alone and grew subscription revenue 21% by positioning itself as an "AI Control Tower" for enterprise workflows — not competing with AI, but becoming the orchestration layer that AI agents operate within. Notably, ServiceNow's retention rates haven't taken a hit yet, which may be an early signal that well-positioned platforms can weather this storm.&lt;/p&gt;

&lt;p&gt;The pattern is clear: the companies winning aren't the ones bolting AI features onto existing products. They're the ones whose core infrastructure becomes &lt;em&gt;more essential&lt;/em&gt; as AI adoption scales.&lt;/p&gt;

&lt;p&gt;Yet many of these infrastructure companies are still being sold off alongside the ones AI is actually displacing, simply because they carry the "software" label.&lt;/p&gt;

&lt;h2&gt;
  
  
  The financial feedback loop is making things worse
&lt;/h2&gt;

&lt;p&gt;What makes this moment especially treacherous isn't just the technology thesis — it's the credit cycle layered on top of it. Business development companies have roughly $100 billion in exposure to software companies. As software valuations decline, BDC balance sheets deteriorate. As BDCs tighten credit, software companies lose access to growth capital. As growth slows, valuations fall further.&lt;/p&gt;

&lt;p&gt;This dynamic doesn't discriminate. A company with strong fundamentals and growing revenue can get caught in the same credit squeeze as one that's genuinely being disrupted, simply because both carry the "software" label. Default rates in private credit could reach 13% if AI disruption plays out aggressively, according to UBS — a projection that makes lenders cautious across the board, not just with the most exposed borrowers.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the bears are getting right
&lt;/h2&gt;

&lt;p&gt;The fundamental insight — that the marginal cost of creating software has collapsed — is correct and profound. This is not a temporary dislocation. When anyone can build a functional application in hours instead of months, the structural economics of the industry change permanently.&lt;/p&gt;

&lt;p&gt;The value shifts away from the application itself and toward the underlying data, the integrations, the operational complexity, and the reliability of the systems that power it. Software that survives long-term will be software that's hard to replicate — not because of its UI, but because of the engineering depth and infrastructure moats it embodies.&lt;/p&gt;

&lt;p&gt;That's a healthy and overdue reckoning for parts of the industry, even if the process of getting there is painful.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the bears are getting wrong
&lt;/h2&gt;

&lt;p&gt;The timeline is being compressed unrealistically. Yes, AI can generate a basic application from a prompt. No, that does not mean enterprise software disappears next quarter. Adoption curves, procurement cycles, compliance requirements, integration complexity, and organizational inertia all mean that even genuinely disrupted categories will take years to fully turn over. Markets are pricing in a revolution that will actually unfold as an evolution.&lt;/p&gt;

&lt;p&gt;The all-or-nothing framing is also creating mispricing in both directions. Some companies will see their growth &lt;em&gt;accelerate&lt;/em&gt; because of AI adoption. Others will see specific product lines threatened while their core platform becomes more essential. Painting every software company with the same brush guarantees you'll be wrong about most of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the AI optimists are getting wrong
&lt;/h2&gt;

&lt;p&gt;On the flip side, the unbounded enthusiasm deserves its own reality check. Not every AI demo translates to an enterprise deployment. Not every proof of concept survives contact with production data, regulatory requirements, and organizational change management. The gap between "this is technically possible" and "this is deployed at scale in a Fortune 500" is still measured in years for most use cases.&lt;/p&gt;

&lt;p&gt;We've seen this pattern before. Cloud computing was genuinely transformative, but the timeline from early hype to mainstream enterprise adoption was roughly a decade. Mobile, same story. AI will be faster because the infrastructure is better, but "faster than previous platform shifts" is not the same as "instantaneous."&lt;/p&gt;

&lt;p&gt;Companies making long-term bets based on the assumption that every AI promise will be fulfilled on schedule are just as exposed as the ones ignoring the threat entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cooler heads will prevail
&lt;/h2&gt;

&lt;p&gt;I expect the next 12-18 months to be painful but ultimately clarifying. Ball makes a smart observation: what will change the market's mind is several quarters of stable retention rates from established software companies in the face of AI challengers. ServiceNow's early Q4 results suggest that's possible for well-positioned platforms. If more companies demonstrate that retention is holding, the panic-driven repricing will start to correct.&lt;/p&gt;

&lt;p&gt;The market will develop more precision in how it evaluates software companies, distinguishing between those that are genuinely in AI's path and those that are being caught up in category-level panic. The early data is already here: the 2025 stock performance gap between infrastructure winners and application-layer losers was stark, and that divergence will only sharpen as earnings continue to separate reality from narrative.&lt;/p&gt;

&lt;p&gt;Infrastructure companies will eventually get re-rated as the market recognizes that AI workloads don't reduce demand for data movement, real-time processing, and distributed systems — they dramatically increase it. Application-layer companies will bifurcate sharply between those that integrate AI successfully and those that don't.&lt;/p&gt;

&lt;p&gt;And the credit cycle will unwind on its own timeline, unfortunately causing collateral damage to strong companies that happen to carry the wrong label.&lt;/p&gt;

&lt;p&gt;My advice to anyone building or investing in this space: resist the urge to react to the loudest narrative, whether that's doom or unbounded optimism. Focus on fundamentals — retention, efficiency, genuine technical differentiation, and whether your product becomes more or less essential as AI adoption grows.&lt;/p&gt;

&lt;p&gt;The companies that panic-rebrand as "AI-native" overnight will look desperate in hindsight. The ones that quietly build indispensable technology will look prescient. And the investors who maintain discipline while others oscillate between euphoria and panic will be the ones who capture the real value being created right now.&lt;/p&gt;

&lt;p&gt;The sky isn't falling. But it is changing shape. The winners will be the ones who study the new landscape carefully rather than running for cover or chasing mirages.&lt;/p&gt;

</description>
      <category>saas</category>
      <category>softwaredevelopment</category>
      <category>software</category>
      <category>ai</category>
    </item>
    <item>
      <title>Latency Numbers Every Data Streaming Engineer Should Know</title>
      <dc:creator>David Kjerrumgaard</dc:creator>
      <pubDate>Sun, 14 Sep 2025 21:18:58 +0000</pubDate>
      <link>https://dev.to/david_kjerrumgaard_d31d7e/latency-numbers-every-data-streaming-engineer-should-know-h91</link>
      <guid>https://dev.to/david_kjerrumgaard_d31d7e/latency-numbers-every-data-streaming-engineer-should-know-h91</guid>
      <description>&lt;h1&gt;
  
  
  Latency Numbers Every Data Streaming Engineer Should Know
&lt;/h1&gt;

&lt;p&gt;Jeff Dean's &lt;a href="https://gist.github.com/jboner/2841832" rel="noopener noreferrer"&gt;"Latency Numbers Every Programmer Should Know"&lt;/a&gt; became essential reading because it grounded abstract performance discussions in concrete reality. For data streaming engineers, we need an equivalent framework that translates those fundamental hardware latencies into the specific challenges of real-time data pipelines.&lt;/p&gt;

&lt;p&gt;Just as Dean showed that a disk seek (10ms) costs as much as roughly 20 million L1 cache references, streaming engineers must understand that a cross-region sync replication (100ms+) costs the same as processing 10,000 in-memory events. These aren't just numbers—they're the physics that govern what's possible in your streaming architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR: Your Latency Budget Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Latency Class&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;End-to-End Target&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Key Constraints&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ultra-low&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt; 10ms&lt;/td&gt;
&lt;td&gt;HFT, real-time control, gaming&lt;/td&gt;
&lt;td&gt;Single AZ only, no disk fsync per record, specialized hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Low&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10-200ms&lt;/td&gt;
&lt;td&gt;Interactive dashboards, alerts, online ML features&lt;/td&gt;
&lt;td&gt;Streaming processing, minimal batching, same region&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency-relaxed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;200ms - minutes&lt;/td&gt;
&lt;td&gt;Near-real-time analytics, ETL, reporting&lt;/td&gt;
&lt;td&gt;Enables aggressive batching, cross-region, cost optimization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Critical Hardware &amp;amp; Network Floors
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Streaming Impact&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HDD seek/fsync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5-20ms&lt;/td&gt;
&lt;td&gt;Consumes entire ultra-low budget&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSD fsync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.05-1ms&lt;/td&gt;
&lt;td&gt;Manageable for low latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Same AZ network (RTT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.2-1ms&lt;/td&gt;
&lt;td&gt;Base cost for any distributed system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-AZ (RTT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-4ms&lt;/td&gt;
&lt;td&gt;Minimum for AZ-redundant streams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-region (RTT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;30-200ms+&lt;/td&gt;
&lt;td&gt;Makes &amp;lt;100ms E2E impossible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema registry lookup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-5ms (cached), 10-50ms (miss)&lt;/td&gt;
&lt;td&gt;Often overlooked latency source&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Streaming Platform Specifics
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Typical Latency&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Configuration Notes&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kafka publish (acks=1, same-AZ)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-5ms&lt;/td&gt;
&lt;td&gt;No replica wait&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kafka publish (acks=all, same-AZ)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3-15ms&lt;/td&gt;
&lt;td&gt;Adds replica sync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-AZ sync replication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+1-5ms&lt;/td&gt;
&lt;td&gt;Per additional AZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Producer batching (linger.ms)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+5-50ms&lt;/td&gt;
&lt;td&gt;Intentional latency for throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consumer poll interval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0-500ms+&lt;/td&gt;
&lt;td&gt;Misconfiguration can dominate E2E&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Iceberg commit visibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5s-10min&lt;/td&gt;
&lt;td&gt;Depends on commit interval&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What "Real-Time" Actually Means
&lt;/h2&gt;

&lt;p&gt;In data streaming, "real-time" has become as overloaded as "big data" once was. Let's establish clear definitions based on both technical constraints and human perception thresholds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcshux0p341mevxcswtz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcshux0p341mevxcswtz.png" alt="Figure 1" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 1: Streaming Latency Spectrum showing the logarithmic scale from nanoseconds to minutes, with technology examples and use cases for each latency category.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Ultra-Low Latency (&amp;lt; 10ms End-to-End)
&lt;/h3&gt;

&lt;p&gt;This is the realm of hard real-time systems where every microsecond counts. Applications requiring sub-10ms latency include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-frequency trading&lt;/strong&gt; (where 1ms advantage = millions in profit)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time control systems&lt;/strong&gt; (industrial automation, autonomous vehicles)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Competitive gaming&lt;/strong&gt; (where 16ms = one frame at 60fps)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low-latency market data&lt;/strong&gt; (every trader needs the same speed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Everything in one availability zone (cross-AZ RTT alone is 1-4ms)&lt;/li&gt;
&lt;li&gt;No per-record disk fsync (HDD seek = 10ms, breaking your entire budget)&lt;/li&gt;
&lt;li&gt;Kernel bypass networking (DPDK, RDMA)&lt;/li&gt;
&lt;li&gt;Custom serialization (Protocol Buffers, Avro, or binary)&lt;/li&gt;
&lt;li&gt;Memory-mapped storage or pure in-memory processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example stack:&lt;/strong&gt; Apache Pulsar with BookKeeper on NVMe, or heavily tuned Kafka with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Ultra-low latency Kafka producer config
linger.ms=0
batch.size=1024
acks=1
compression.type=none
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reality check:&lt;/strong&gt; For perspective, 100ms is the threshold where UI interactions feel instantaneous to humans. Ultra-low latency is an order of magnitude faster than human perception—you're optimizing for machines, not users.&lt;/p&gt;

&lt;h3&gt;
  
  
  Low Latency (10-200ms End-to-End)
&lt;/h3&gt;

&lt;p&gt;This covers the sweet spot for most interactive real-time applications. Users perceive anything under 200ms as "instant" response, making this the target for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live dashboards and monitoring&lt;/strong&gt; (business metrics, system health)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time alerting&lt;/strong&gt; (fraud detection, anomaly detection)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Online machine learning features&lt;/strong&gt; (recommendation engines, personalization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live chat and notifications&lt;/strong&gt; (social platforms, collaboration tools)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time analytics&lt;/strong&gt; (A/B test results, user behavior tracking)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event-at-a-time processing (not micro-batches)&lt;/li&gt;
&lt;li&gt;Cross-AZ replication acceptable (adds ~2-5ms)&lt;/li&gt;
&lt;li&gt;Moderate batching for efficiency (5-50ms linger times)&lt;/li&gt;
&lt;li&gt;SSD storage with occasional fsync&lt;/li&gt;
&lt;li&gt;Standard streaming platforms work well&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example stack:&lt;/strong&gt; Apache Kafka + Apache Flink with event-time processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Balanced Kafka configuration
linger.ms=5
batch.size=16384
acks=all
max.in.flight.requests.per.connection=5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost implications:&lt;/strong&gt; This range allows reasonable optimization without exotic hardware. A well-tuned Kafka cluster can achieve 10-50ms P50 latency with hundreds of thousands of events per second.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9we53xjiav718f9x6dur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9we53xjiav718f9x6dur.png" alt="latency-throughput-tradeoff.svg" width="600" height="500"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 2: The classic trade-off between latency and throughput in streaming systems. Lower latency typically means higher cost and lower throughput, while batch processing achieves high throughput at the cost of latency.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Latency-Relaxed (200ms - Minutes)
&lt;/h3&gt;

&lt;p&gt;When latency requirements relax beyond a few hundred milliseconds, you enter the realm of cost optimization and massive throughput. This category includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Near-real-time ETL&lt;/strong&gt; (data lake ingestion, warehouse loading)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business intelligence dashboards&lt;/strong&gt; (updating every 30 seconds to 5 minutes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch-oriented analytics&lt;/strong&gt; (hourly/daily reports with "fresh" data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data lake table formats&lt;/strong&gt; (Iceberg, Delta Lake with 1-10 minute commits)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-region data replication&lt;/strong&gt; (disaster recovery, global distribution)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggressive batching (seconds to minutes)&lt;/li&gt;
&lt;li&gt;Cross-region replication feasible&lt;/li&gt;
&lt;li&gt;Cheaper storage tiers (object storage vs. hot SSDs)&lt;/li&gt;
&lt;li&gt;Higher compression ratios&lt;/li&gt;
&lt;li&gt;Simpler error handling and retry logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Netflix's architecture keeps only hours of hot data in Kafka (expensive) and tiers the rest to Apache Iceberg on S3 (38x cheaper)&lt;sup id="fnref1"&gt;1&lt;/sup&gt;. For most analytics, 1-5 minute latency is perfectly acceptable and dramatically reduces infrastructure costs.&lt;/p&gt;
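
&lt;p&gt;&lt;strong&gt;Example producer settings:&lt;/strong&gt; as a starting point for this tier, a throughput-and-cost-oriented Kafka producer configuration might look like the sketch below; treat the exact values as illustrative, not prescriptive.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Latency-relaxed Kafka producer config (favor throughput and cost over latency)
linger.ms=50
batch.size=262144
acks=all
compression.type=lz4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
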
&lt;h2&gt;
  
  
  The Physics of Streaming Latency
&lt;/h2&gt;

&lt;p&gt;Understanding hardware and network fundamentals isn't academic—these are the unavoidable floors that constrain every streaming system.&lt;/p&gt;
&lt;h3&gt;
  
  
  Storage: The Latency Hierarchy
&lt;/h3&gt;

&lt;p&gt;Every streaming platform must persist data for durability, but storage choices have massive latency implications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Memory access:        ~100 nanoseconds
SSD random read:      ~150 microseconds (1,500x slower than memory)
NVMe fsync:          ~0.05-1 milliseconds  
SATA SSD fsync:      ~0.5-5 milliseconds
HDD seek/fsync:      ~5-20 milliseconds (200,000x slower than memory!)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world example:&lt;/strong&gt; Intel Optane NVMe can sync writes in ~43 microseconds average, while a traditional HDD takes ~18ms—that's 400x faster. For a streaming broker writing 10,000 events/second:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;With HDD:&lt;/strong&gt; Maximum ~50-100 synced writes/second/disk (disk-bound)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With NVMe:&lt;/strong&gt; Thousands of synced writes/second (CPU/network bound)&lt;/li&gt;
&lt;/ul&gt;
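
&lt;p&gt;The arithmetic behind those bounds is simple: if every record forces a synchronous flush, throughput is capped at roughly one write per fsync interval.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HDD fsync  ~10-20ms  →  1 / 0.010-0.020s  =  ~50-100 synced writes/sec per disk
NVMe fsync ~0.05ms   →  1 / 0.00005s      =  ~20,000 synced writes/sec (CPU/network becomes the limit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;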

&lt;p&gt;&lt;strong&gt;Kafka-specific insight:&lt;/strong&gt; Kafka's sequential write pattern helps with HDDs, but modern deployments use SSDs for predictable low latency. The difference between "usually fast" and "always fast" matters for P99 latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network: Distance Costs Time
&lt;/h3&gt;

&lt;p&gt;Network latency follows the speed of light in fiber (roughly 5 microseconds per kilometer), plus routing overhead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Same host (loopback):     &amp;lt; 0.1ms
Same rack/AZ:            0.1-0.5ms one-way  
Cross-AZ, same region:   0.5-2ms one-way
Cross-region (continent): 15-40ms one-way
Intercontinental:        80-200ms one-way (varies by route)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;AWS measurements:&lt;/strong&gt; Cross-AZ pings typically show 1-2ms RTT, while us-east-1 to eu-west-1 is ~80-90ms RTT.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming implications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous cross-region replication:&lt;/strong&gt; Automatically adds ≥80ms to every write&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leader election during failures:&lt;/strong&gt; Cross-AZ coordination adds several milliseconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumer rebalancing:&lt;/strong&gt; Group coordination latency scales with member distribution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnl7movj3bbn3ih50xyib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnl7movj3bbn3ih50xyib.png" alt="network-topology-map.svg" width="800" height="600"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 3: Global network latency map showing realistic RTT times between major cloud regions. These physical constraints set hard floors for any distributed streaming system.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Common Failure Scenarios
&lt;/h3&gt;

&lt;p&gt;Streaming systems must handle failures gracefully, but each failure mode has latency implications:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Failure Type&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Latency Impact&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Mitigation&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Broker failover&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+50-200ms during leader election&lt;/td&gt;
&lt;td&gt;Faster election timeouts, more brokers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GC pause&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+100-500ms to P99 latencies&lt;/td&gt;
&lt;td&gt;G1GC tuning, smaller heaps, off-heap storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network partition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+timeout duration (often 30s default)&lt;/td&gt;
&lt;td&gt;Shorter timeouts, circuit breakers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema registry miss&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+10-50ms per lookup&lt;/td&gt;
&lt;td&gt;Larger caches, schema pre-loading&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consumer rebalance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+5-30s processing halt&lt;/td&gt;
&lt;td&gt;Incremental rebalancing, sticky assignment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
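
&lt;p&gt;On the GC pause row specifically, a common starting point for JVM-based brokers is a pause-target-oriented collector with a fixed, moderately sized heap. The flags below are an illustrative sketch to adapt, not a tuned recommendation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative JVM options for a latency-sensitive broker
-XX:+UseG1GC
-XX:MaxGCPauseMillis=20
-XX:InitiatingHeapOccupancyPercent=35
-Xms6g -Xmx6g    # fixed heap; leave the rest of RAM to the OS page cache
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
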
&lt;h2&gt;
  
  
  Streaming Platform Latency Breakdown
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Publish Latency (Producer → Broker)
&lt;/h3&gt;

&lt;p&gt;This is where your event first enters the streaming platform. Key factors:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network transit:&lt;/strong&gt; Usually negligible within a data center (&amp;lt;1ms), but can dominate for remote producers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Broker processing:&lt;/strong&gt; Includes parsing, validation, and local storage. Modern brokers can handle this in microseconds for simple events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication strategy:&lt;/strong&gt; The big variable. Kafka's &lt;code&gt;acks&lt;/code&gt; setting illustrates the trade-off:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;acks=0: Fire-and-forget (~1-2ms, risk data loss)
acks=1: Wait for leader only (~2-5ms, balanced)  
acks=all: Wait for all replicas (~5-15ms same-AZ, much higher cross-region)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Producer batching:&lt;/strong&gt; Intentionally trading latency for throughput:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linger.ms=0:  Send immediately (lowest latency)
linger.ms=5:  Wait up to 5ms to batch (better throughput)
linger.ms=50: Wait up to 50ms to batch (much better throughput)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real example:&lt;/strong&gt; A well-tuned Kafka cluster with acks=all, same-AZ replication typically shows 3-8ms publish latency at P50, 10-25ms at P99.&lt;/p&gt;
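
&lt;p&gt;To see where your own cluster sits in that range, you can measure ack latency directly from the producer's send callback and feed it into a histogram. A minimal sketch follows; the broker address, topic name, and serializer choices are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PublishLatencyProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");
        props.put("linger.ms", "5");

        try (KafkaProducer&amp;lt;String, String&amp;gt; producer = new KafkaProducer&amp;lt;&amp;gt;(props)) {
            long start = System.nanoTime();
            producer.send(new ProducerRecord&amp;lt;&amp;gt;("events", "key", "value"), (metadata, exception) -&amp;gt; {
                // Time from send() to broker acknowledgment; collect into a histogram for P50/P99
                long ackMicros = (System.nanoTime() - start) / 1_000;
                System.out.println("publish latency: " + ackMicros + " µs");
            });
            producer.flush();
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;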

&lt;h3&gt;
  
  
  Consume Latency (Broker → Consumer)
&lt;/h3&gt;

&lt;p&gt;Once data is available on the broker, how quickly can consumers access it?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Push vs. Pull:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Push systems&lt;/strong&gt; (like some message queues) can deliver in sub-millisecond&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pull systems&lt;/strong&gt; (like Kafka) depend on poll frequency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Polling configuration mistakes:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Bad: Creates 0-500ms artificial delay
max.poll.interval.ms=500

# Good: Near-real-time consumption
max.poll.interval.ms=10
fetch.min.bytes=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Processing overhead:&lt;/strong&gt; In-memory transformations are typically &amp;lt;1ms per event, but external calls (database lookups, API calls) can dominate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Simple in-memory filter:     ~0.001ms per event
JSON parsing/validation:     ~0.01-0.1ms per event  
Database lookup (cached):    ~1-5ms per event
Database lookup (cache miss): ~10-50ms per event
External API call:          ~50-200ms per event
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  End-to-End Latency Monitoring
&lt;/h3&gt;

&lt;p&gt;What users actually experience: &lt;code&gt;E2E = Publish + Network + Consume + Processing&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key percentiles to track:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;P50 (median):&lt;/strong&gt; Your typical performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P95:&lt;/strong&gt; What 95% of users experience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P99:&lt;/strong&gt; Catches tail latencies from GC, network hiccups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P99.9:&lt;/strong&gt; Exposes rare but severe problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example real-world numbers:&lt;/strong&gt;&lt;br&gt;
A well-tuned, single-region Kafka pipeline typically achieves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P50: 10-30ms end-to-end&lt;/li&gt;
&lt;li&gt;P95: 25-75ms end-to-end&lt;/li&gt;
&lt;li&gt;P99: 50-200ms end-to-end (watch for GC pauses, network bursts)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cross-region reality:&lt;/strong&gt;&lt;br&gt;
With synchronous cross-region replication, add ≥80ms minimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P50: 90-120ms end-to-end&lt;/li&gt;
&lt;li&gt;P99: 150-400ms end-to-end&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiynxpdvleco6krk3c899.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiynxpdvleco6krk3c899.png" alt="pipeline-latency-breakdown.svg" width="800" height="355"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 4: Detailed breakdown of where latency accumulates in a streaming pipeline, from producer to final storage. Shows how each component contributes to total end-to-end latency.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Data Lake Integration: The Visibility Latency Challenge
&lt;/h2&gt;

&lt;p&gt;Modern streaming architectures often flow into analytical storage (Apache Iceberg, Delta Lake) for cost-effective long-term analytics. However, these systems operate on a fundamentally different latency model.&lt;/p&gt;
&lt;h3&gt;
  
  
  Commit Interval Governs Freshness
&lt;/h3&gt;

&lt;p&gt;Unlike streaming brokers that make data available immediately, table formats batch writes into atomic commits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Commit every 5 seconds:   ~2.5s average visibility latency, ~5s max
Commit every 1 minute:    ~30s average visibility latency, ~60s max  
Commit every 10 minutes:  ~5min average visibility latency, ~10min max
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why the delay?&lt;/strong&gt; Table formats like Iceberg prioritize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Atomic visibility:&lt;/strong&gt; Readers see complete batches or nothing (no partial data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient storage:&lt;/strong&gt; Larger files are cheaper and faster to read from object storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata efficiency:&lt;/strong&gt; Fewer commits = less metadata overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Real-World Commit Strategy Examples
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Commit Interval&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Trade-offs&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-time dashboard&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5-30 seconds&lt;/td&gt;
&lt;td&gt;Higher cost, more small files, near-real-time visibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hourly reporting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-5 minutes&lt;/td&gt;
&lt;td&gt;Balanced cost and freshness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Daily analytics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10-60 minutes&lt;/td&gt;
&lt;td&gt;Lowest cost, highest efficiency, delayed visibility&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Experimental data:&lt;/strong&gt; In testing with Flink → Iceberg:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10-second commits: ~10s median latency, ~20s P99&lt;/li&gt;
&lt;li&gt;1-minute commits: ~30s median latency, ~60s P99&lt;/li&gt;
&lt;li&gt;Latency closely tracks commit interval plus small processing overhead&lt;/li&gt;
&lt;/ul&gt;
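
&lt;p&gt;In that setup the commit cadence is driven by Flink's checkpointing, because the Iceberg sink commits data files when a checkpoint completes. The checkpoint interval is therefore the main visibility-latency knob; a hedged flink-conf.yaml sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# flink-conf.yaml: checkpoint interval drives Iceberg commit (visibility) latency
execution.checkpointing.interval: 60s     # ~30s average / ~60s max visibility, matching the numbers above
execution.checkpointing.min-pause: 10s    # breathing room between checkpoints
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;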

&lt;h3&gt;
  
  
  Cost Impact of Commit Frequency
&lt;/h3&gt;

&lt;p&gt;Netflix's analysis showed that keeping data in Kafka costs 38x more than Iceberg storage&lt;sup id="fnref1"&gt;1&lt;/sup&gt;. The commit interval directly affects this trade-off:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example calculation&lt;/strong&gt; (1TB/day workload):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kafka retention (24 hours):&lt;/strong&gt; ~$500/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iceberg (frequent 30s commits):&lt;/strong&gt; ~$25/month + processing costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iceberg (relaxed 10min commits):&lt;/strong&gt; ~$13/month + processing costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More frequent commits mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher compute costs (more Flink/Spark jobs)&lt;/li&gt;
&lt;li&gt;More small files (worse query performance)&lt;/li&gt;
&lt;li&gt;Higher metadata overhead&lt;/li&gt;
&lt;li&gt;But lower visibility latency&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Synchronous vs. Asynchronous: The Fundamental Trade-off
&lt;/h2&gt;

&lt;p&gt;Every distributed streaming system faces choices about when to wait for confirmation versus proceeding optimistically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replication Strategies
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Synchronous replication:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer → Broker → Wait for replicas → ACK to producer
Latency: Base + (RTT to slowest replica)
Durability: High (data on multiple nodes before ACK)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Asynchronous replication:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer → Broker → Immediate ACK → Background replication
Latency: Base + local write time only
Durability: Lower (brief window where data exists on only one node)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Same AZ sync replication:&lt;/strong&gt; +1-3ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-AZ sync replication:&lt;/strong&gt; +2-8ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-region sync replication:&lt;/strong&gt; +80-200ms (often impractical)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Processing Patterns
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Synchronous processing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event → Process Step 1 → Wait → Process Step 2 → Wait → Response
Latency: Sum of all steps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Asynchronous processing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event → Trigger Step 1 → Trigger Step 2 → Collect results → Response
Latency: Max of parallel steps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; External enrichment workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous:&lt;/strong&gt; Event → DB lookup (20ms) → API call (50ms) → Process (5ms) = 75ms total&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous:&lt;/strong&gt; Event → [DB lookup || API call] → Process (5ms) = ~55ms total (lookups run in parallel)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The async approach requires more complex code (handling out-of-order responses, partial failures) but can significantly reduce latency.&lt;/p&gt;
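
&lt;p&gt;As a concrete illustration of that parallel pattern, here is a minimal Java sketch using &lt;code&gt;CompletableFuture&lt;/code&gt;. The &lt;code&gt;lookupUser&lt;/code&gt; and &lt;code&gt;callScoringApi&lt;/code&gt; methods are stand-ins for your own blocking enrichment calls:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncEnrichment {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);

        // Start both external calls in parallel instead of one after the other
        CompletableFuture&amp;lt;String&amp;gt; dbLookup = CompletableFuture.supplyAsync(() -&amp;gt; lookupUser("user-42"), pool);
        CompletableFuture&amp;lt;Double&amp;gt; apiCall  = CompletableFuture.supplyAsync(() -&amp;gt; callScoringApi("user-42"), pool);

        // Total wait is roughly max(lookup, api) plus local processing, not their sum
        String enriched = dbLookup.thenCombine(apiCall, (profile, score) -&amp;gt; profile + " score=" + score).join();
        System.out.println(enriched);

        pool.shutdown();
    }

    // Placeholders for the ~20ms DB lookup and ~50ms API call from the example above
    static String lookupUser(String id)     { return "profile-for-" + id; }
    static Double callScoringApi(String id) { return 0.87; }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;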

&lt;h2&gt;
  
  
  Troubleshooting: When Latency Goes Wrong
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Your Latency Budget Checklist
&lt;/h3&gt;

&lt;p&gt;Before designing any streaming system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Identified true latency requirement&lt;/strong&gt; (ultra-low/low/relaxed)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Mapped data path&lt;/strong&gt; (same AZ/cross-AZ/cross-region)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Chosen durability level&lt;/strong&gt; (async/sync replication)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Configured monitoring&lt;/strong&gt; for P50/P95/P99, not just averages&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Load tested&lt;/strong&gt; at peak throughput (latency often degrades under load)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common Latency Culprits
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Symptom&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Likely Cause&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Investigation&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consistent &amp;gt;100ms in same region&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Network saturation or misconfigured routing&lt;/td&gt;
&lt;td&gt;Check network utilization, traceroute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;P99 &amp;gt;&amp;gt; P50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GC pauses or batching effects&lt;/td&gt;
&lt;td&gt;JVM GC logs, batch size analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sudden latency spikes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Broker failover or rebalancing&lt;/td&gt;
&lt;td&gt;Broker logs, consumer group stability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;High variance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Resource contention or queueing&lt;/td&gt;
&lt;td&gt;CPU/memory/disk utilization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gradual degradation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Growing consumer lag&lt;/td&gt;
&lt;td&gt;Partition count, consumer scaling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Metrics to Monitor
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Producer side:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;request-latency-avg/max&lt;/code&gt;: How long broker requests take&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;batch-size-avg&lt;/code&gt;: Batching efficiency&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;buffer-available-bytes&lt;/code&gt;: Memory pressure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Broker side:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;request-handler-idle-ratio&lt;/code&gt;: CPU saturation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;log-flush-time&lt;/code&gt;: Disk performance&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;leader-election-rate&lt;/code&gt;: Stability issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consumer side:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;lag-max&lt;/code&gt;: How far behind consumers are&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;poll-time-avg&lt;/code&gt;: Processing efficiency&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;commit-latency-avg&lt;/code&gt;: Offset management overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;End-to-end:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application-level latency tracking with correlation IDs&lt;/li&gt;
&lt;li&gt;P50/P95/P99 latency distributions over time&lt;/li&gt;
&lt;li&gt;Latency broken down by pipeline stage&lt;/li&gt;
&lt;/ul&gt;
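
&lt;p&gt;One simple way to get those end-to-end numbers is to compare the consumer's wall-clock time against each record's timestamp and record the difference in a histogram. The sketch below assumes the producer (or your correlation-ID scheme) sets the record timestamp at event creation, and uses HdrHistogram for the percentiles; the broker address and topic are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.HdrHistogram.Histogram;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EndToEndLatencyTracker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder address
        props.put("group.id", "latency-tracker");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Histogram histogram = new Histogram(3);           // values tracked to 3 significant digits
        try (KafkaConsumer&amp;lt;String, String&amp;gt; consumer = new KafkaConsumer&amp;lt;&amp;gt;(props)) {
            consumer.subscribe(List.of("events"));        // placeholder topic
            while (true) {
                ConsumerRecords&amp;lt;String, String&amp;gt; records = consumer.poll(Duration.ofMillis(10));
                for (ConsumerRecord&amp;lt;String, String&amp;gt; record : records) {
                    // E2E latency = consume time minus the record's producer/event timestamp
                    long e2eMs = Math.max(0L, System.currentTimeMillis() - record.timestamp());
                    histogram.recordValue(e2eMs);
                }
                // Report each loop for simplicity; in practice, report on a timer
                System.out.printf("P50=%dms P95=%dms P99=%dms%n",
                        histogram.getValueAtPercentile(50.0),
                        histogram.getValueAtPercentile(95.0),
                        histogram.getValueAtPercentile(99.0));
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;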

&lt;h2&gt;
  
  
  Technology-Specific Configurations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Apache Kafka for Low Latency
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Producer configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Minimize batching
&lt;/span&gt;&lt;span class="py"&gt;linger.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;
&lt;span class="py"&gt;batch.size&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1024&lt;/span&gt;

&lt;span class="c"&gt;# Reduce network overhead  
&lt;/span&gt;&lt;span class="py"&gt;acks&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;max.in.flight.requests.per.connection&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;

&lt;span class="c"&gt;# Disable compression for lowest latency
&lt;/span&gt;&lt;span class="py"&gt;compression.type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Broker configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Fast leader election
&lt;/span&gt;&lt;span class="py"&gt;replica.lag.time.max.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;500&lt;/span&gt;
&lt;span class="py"&gt;replica.socket.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1000&lt;/span&gt;

&lt;span class="c"&gt;# Frequent flushes (if durability required)
&lt;/span&gt;&lt;span class="py"&gt;log.flush.interval.messages&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;log.flush.interval.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Consumer configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Minimal poll interval
&lt;/span&gt;&lt;span class="py"&gt;fetch.min.bytes&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;fetch.max.wait.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;

&lt;span class="c"&gt;# Reduce rebalance overhead
&lt;/span&gt;&lt;span class="py"&gt;session.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;6000&lt;/span&gt;
&lt;span class="py"&gt;heartbeat.interval.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;2000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Apache Pulsar for Ultra-Low Latency
&lt;/h3&gt;

&lt;p&gt;Pulsar's architecture allows some optimizations Kafka cannot match:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Memory-mapped journal for minimal write latency&lt;/span&gt;
&lt;span class="n"&gt;dbStorage_writeCacheMaxSizeMb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;
&lt;span class="n"&gt;dbStorage_readAheadCacheMaxSizeMb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;

&lt;span class="c1"&gt;// Disable fsync for maximum speed (if durability allows)&lt;/span&gt;
&lt;span class="n"&gt;journalSyncData&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Apache Flink for Stream Processing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Minimize checkpoint overhead&lt;/span&gt;
&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;checkpoints&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nl"&gt;memory:&lt;/span&gt;&lt;span class="c1"&gt;//&lt;/span&gt;
&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;rocksdb&lt;/span&gt;

&lt;span class="c1"&gt;// Reduce buffering&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;watermark&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tracking&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion: Engineering Time as a Feature
&lt;/h2&gt;

&lt;p&gt;Latency in streaming systems isn't just a performance metric—it's a feature you must consciously design, budget, and engineer for. Just as Jeff Dean's numbers taught programmers to respect the reality of time in computing hardware, these streaming latency numbers should guide every architectural decision you make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insights:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Physics sets hard floors.&lt;/strong&gt; You cannot stream across continents in under 80ms, period. You cannot do synchronous disk writes faster than your storage allows. Design within reality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency is expensive.&lt;/strong&gt; Ultra-low latency often costs 10x-100x more than "good enough" latency. Netflix's 38x cost difference between Kafka and Iceberg isn't unique—it's typical.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Percentiles matter more than averages.&lt;/strong&gt; Your users experience P95 and P99 latencies, not medians. A system with 50ms average and 2-second P99 is not a real-time system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Every millisecond is a trade-off.&lt;/strong&gt; Choosing synchronous replication adds latency but prevents data loss. Choosing small batches reduces latency but limits throughput. These aren't bugs—they're fundamental engineering decisions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitor what matters.&lt;/strong&gt; End-to-end latency with business-relevant percentiles. Break down by pipeline stage. Alert on degradation, not just failures.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal isn't to build the fastest possible system—it's to build the right system for your latency budget. Sometimes that's a 5ms ultra-low latency platform costing hundreds of thousands per month. Sometimes it's a 5-minute batch process costing hundreds per month. Both can be "real-time" in their proper context.&lt;/p&gt;

&lt;p&gt;Armed with these numbers, you can confidently navigate the trade-offs between speed, cost, and complexity. You'll know when a requirement is physically impossible, when it's technically feasible but economically questionable, and when it's the right fit for your streaming architecture.&lt;/p&gt;

&lt;p&gt;Most importantly, you'll stop debating whether something is "real-time" and start designing systems that deliver data when and where it's needed—in real real-time.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;As an Apache Pulsar committer, I'm always interested in hearing about your experiences with streaming data technologies. Feel free to reach out with questions or share your own insights!&lt;/em&gt;&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Netflix cost analysis based on industry presentations and blog posts discussing their data lake architecture. Specific 38x figure commonly cited in streaming architecture discussions, though exact source documentation may vary. For current Netflix data architecture details, see their technology blog and conference presentations on data platform evolution. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>kafka</category>
      <category>performance</category>
      <category>pulsar</category>
      <category>programming</category>
    </item>
    <item>
      <title>Latency Numbers Every Data Streaming Engineer Should Know</title>
      <dc:creator>David Kjerrumgaard</dc:creator>
      <pubDate>Fri, 12 Sep 2025 22:38:08 +0000</pubDate>
      <link>https://dev.to/david_kjerrumgaard_d31d7e/latency-numbers-every-data-streaming-engineer-should-know-13lp</link>
      <guid>https://dev.to/david_kjerrumgaard_d31d7e/latency-numbers-every-data-streaming-engineer-should-know-13lp</guid>
      <description>&lt;h1&gt;
  
  
  Latency Numbers Every Data Streaming Engineer Should Know
&lt;/h1&gt;

&lt;p&gt;Jeff Dean's &lt;a href="https://gist.github.com/jboner/2841832" rel="noopener noreferrer"&gt;"Latency Numbers Every Programmer Should Know"&lt;/a&gt; became essential reading because it grounded abstract performance discussions in concrete reality. For data streaming engineers, we need an equivalent framework that translates those fundamental hardware latencies into the specific challenges of real-time data pipelines.&lt;/p&gt;

&lt;p&gt;Just as Dean showed that a disk seek (10ms) costs the same as 40,000 L1 cache references, streaming engineers must understand that a cross-region sync replication (100ms+) costs the same as processing 10,000 in-memory events. These aren't just numbers—they're the physics that govern what's possible in your streaming architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR: Your Latency Budget Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Latency Class&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;End-to-End Target&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Key Constraints&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ultra-low&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt; 10ms&lt;/td&gt;
&lt;td&gt;HFT, real-time control, gaming&lt;/td&gt;
&lt;td&gt;Single AZ only, no disk fsync per record, specialized hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Low&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10-200ms&lt;/td&gt;
&lt;td&gt;Interactive dashboards, alerts, online ML features&lt;/td&gt;
&lt;td&gt;Streaming processing, minimal batching, same region&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency-relaxed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;200ms - minutes&lt;/td&gt;
&lt;td&gt;Near-real-time analytics, ETL, reporting&lt;/td&gt;
&lt;td&gt;Enables aggressive batching, cross-region, cost optimization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Critical Hardware &amp;amp; Network Floors
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Streaming Impact&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HDD seek/fsync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5-20ms&lt;/td&gt;
&lt;td&gt;Consumes entire ultra-low budget&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSD fsync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.05-1ms&lt;/td&gt;
&lt;td&gt;Manageable for low latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Same AZ network (RTT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.2-1ms&lt;/td&gt;
&lt;td&gt;Base cost for any distributed system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-AZ (RTT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-4ms&lt;/td&gt;
&lt;td&gt;Minimum for AZ-redundant streams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-region (RTT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;30-200ms+&lt;/td&gt;
&lt;td&gt;Makes &amp;lt;100ms E2E impossible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema registry lookup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-5ms (cached), 10-50ms (miss)&lt;/td&gt;
&lt;td&gt;Often overlooked latency source&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Streaming Platform Specifics
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Typical Latency&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Configuration Notes&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kafka publish (acks=1, same-AZ)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-5ms&lt;/td&gt;
&lt;td&gt;No replica wait&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kafka publish (acks=all, same-AZ)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3-15ms&lt;/td&gt;
&lt;td&gt;Adds replica sync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-AZ sync replication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+1-5ms&lt;/td&gt;
&lt;td&gt;Per additional AZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Producer batching (linger.ms)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+5-50ms&lt;/td&gt;
&lt;td&gt;Intentional latency for throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consumer poll interval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0-500ms+&lt;/td&gt;
&lt;td&gt;Misconfiguration can dominate E2E&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Iceberg commit visibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5s-10min&lt;/td&gt;
&lt;td&gt;Depends on commit interval&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What "Real-Time" Actually Means
&lt;/h2&gt;

&lt;p&gt;In data streaming, "real-time" has become as overloaded as "big data" once was. Let's establish clear definitions based on both technical constraints and human perception thresholds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Flatency-spectrum-diagram.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Flatency-spectrum-diagram.svg" alt="latency-spectrum-diagram.svg" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 1: Streaming Latency Spectrum showing the logarithmic scale from nanoseconds to minutes, with technology examples and use cases for each latency category.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Ultra-Low Latency (&amp;lt; 10ms End-to-End)
&lt;/h3&gt;

&lt;p&gt;This is the realm of hard real-time systems where every microsecond counts. Applications requiring sub-10ms latency include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-frequency trading&lt;/strong&gt; (where 1ms advantage = millions in profit)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time control systems&lt;/strong&gt; (industrial automation, autonomous vehicles)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Competitive gaming&lt;/strong&gt; (where 16ms = one frame at 60fps)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low-latency market data&lt;/strong&gt; (every trader needs the same speed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Everything in one availability zone (cross-AZ RTT alone is 1-4ms)&lt;/li&gt;
&lt;li&gt;No per-record disk fsync (HDD seek = 10ms, breaking your entire budget)&lt;/li&gt;
&lt;li&gt;Kernel bypass networking (DPDK, RDMA)&lt;/li&gt;
&lt;li&gt;Compact binary serialization (Protocol Buffers, Avro, or a custom format)&lt;/li&gt;
&lt;li&gt;Memory-mapped storage or pure in-memory processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example stack:&lt;/strong&gt; Apache Pulsar with BookKeeper on NVMe, or heavily tuned Kafka with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Ultra-low latency Kafka producer config
linger.ms=0
batch.size=1024
acks=1
compression.type=none
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reality check:&lt;/strong&gt; For perspective, 100ms is the threshold where UI interactions feel instantaneous to humans. Ultra-low latency is an order of magnitude faster than human perception—you're optimizing for machines, not users.&lt;/p&gt;

&lt;h3&gt;
  
  
  Low Latency (10-200ms End-to-End)
&lt;/h3&gt;

&lt;p&gt;This covers the sweet spot for most interactive real-time applications. Users perceive anything under 200ms as "instant" response, making this the target for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live dashboards and monitoring&lt;/strong&gt; (business metrics, system health)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time alerting&lt;/strong&gt; (fraud detection, anomaly detection)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Online machine learning features&lt;/strong&gt; (recommendation engines, personalization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live chat and notifications&lt;/strong&gt; (social platforms, collaboration tools)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time analytics&lt;/strong&gt; (A/B test results, user behavior tracking)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event-at-a-time processing (not micro-batches)&lt;/li&gt;
&lt;li&gt;Cross-AZ replication acceptable (adds ~2-5ms)&lt;/li&gt;
&lt;li&gt;Moderate batching for efficiency (5-50ms linger times)&lt;/li&gt;
&lt;li&gt;SSD storage with occasional fsync&lt;/li&gt;
&lt;li&gt;Standard streaming platforms work well&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example stack:&lt;/strong&gt; Apache Kafka + Apache Flink with event-time processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Balanced Kafka configuration
linger.ms=5
batch.size=16384
acks=all
max.in.flight.requests.per.connection=5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cost implications:&lt;/strong&gt; This range allows reasonable optimization without exotic hardware. A well-tuned Kafka cluster can achieve 10-50ms P50 latency with hundreds of thousands of events per second.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Flatency-throughput-tradeoff.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Flatency-throughput-tradeoff.svg" alt="latency-throughput-tradeoff.svg" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 2: The classic trade-off between latency and throughput in streaming systems. Lower latency typically means higher cost and lower throughput, while batch processing achieves high throughput at the cost of latency.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Latency-Relaxed (200ms - Minutes)
&lt;/h3&gt;

&lt;p&gt;When latency requirements relax beyond a few hundred milliseconds, you enter the realm of cost optimization and massive throughput. This category includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Near-real-time ETL&lt;/strong&gt; (data lake ingestion, warehouse loading)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business intelligence dashboards&lt;/strong&gt; (updating every 30 seconds to 5 minutes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch-oriented analytics&lt;/strong&gt; (hourly/daily reports with "fresh" data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data lake table formats&lt;/strong&gt; (Iceberg, Delta Lake with 1-10 minute commits)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-region data replication&lt;/strong&gt; (disaster recovery, global distribution)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggressive batching (seconds to minutes)&lt;/li&gt;
&lt;li&gt;Cross-region replication feasible&lt;/li&gt;
&lt;li&gt;Cheaper storage tiers (object storage vs. hot SSDs)&lt;/li&gt;
&lt;li&gt;Higher compression ratios&lt;/li&gt;
&lt;li&gt;Simpler error handling and retry logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Netflix's architecture keeps only hours of hot data in Kafka (expensive) and tiers the rest to Apache Iceberg on S3 (38x cheaper)&lt;sup id="fnref1"&gt;1&lt;/sup&gt;. For most analytics, 1-5 minute latency is perfectly acceptable and dramatically reduces infrastructure costs.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Physics of Streaming Latency
&lt;/h2&gt;

&lt;p&gt;Understanding hardware and network fundamentals isn't academic—these are the unavoidable floors that constrain every streaming system.&lt;/p&gt;
&lt;h3&gt;
  
  
  Storage: The Latency Hierarchy
&lt;/h3&gt;

&lt;p&gt;Every streaming platform must persist data for durability, but storage choices have massive latency implications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Memory access:        ~100 nanoseconds
SSD random read:      ~150 microseconds (1,500x slower than memory)
NVMe fsync:          ~0.05-1 milliseconds  
SATA SSD fsync:      ~0.5-5 milliseconds
HDD seek/fsync:      ~5-20 milliseconds (200,000x slower than memory!)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world example:&lt;/strong&gt; Intel Optane NVMe can sync writes in ~43 microseconds average, while a traditional HDD takes ~18ms—that's 400x faster. For a streaming broker writing 10,000 events/second:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;With HDD:&lt;/strong&gt; Maximum ~50-100 synced writes/second/disk (disk-bound)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With NVMe:&lt;/strong&gt; Thousands of synced writes/second (CPU/network bound)&lt;/li&gt;
&lt;/ul&gt;
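&lt;p&gt;&lt;strong&gt;Try it yourself:&lt;/strong&gt; a rough micro-benchmark of synced writes takes only a few lines of plain Java. This is just a sketch (the file path, record size, and iteration count here are arbitrary), but it mimics the durability step, a write plus fsync, that a broker performs for every synchronous write.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Rough probe of synced-write latency: write a small record, then force it
// to stable storage, the same step a broker takes for a durable write.
public class FsyncProbe {
    public static void main(String[] args) throws IOException {
        Path path = Path.of("fsync-probe.dat");   // place this on the volume you care about
        int iterations = 1_000;
        ByteBuffer record = ByteBuffer.allocate(512);

        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long start = System.nanoTime();
            for (int i = 0; i &lt; iterations; i++) {
                record.rewind();
                channel.write(record);
                channel.force(true);               // fsync: flush data and metadata to disk
            }
            long avgMicros = (System.nanoTime() - start) / iterations / 1_000;
            System.out.println("Average synced write latency: " + avgMicros + " microseconds");
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;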

&lt;p&gt;&lt;strong&gt;Kafka-specific insight:&lt;/strong&gt; Kafka's sequential write pattern helps with HDDs, but modern deployments use SSDs for predictable low latency. The difference between "usually fast" and "always fast" matters for P99 latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network: Distance Costs Time
&lt;/h3&gt;

&lt;p&gt;Network latency follows the speed of light in fiber (roughly 5 microseconds per kilometer), plus routing overhead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Same host (loopback):     &amp;lt; 0.1ms
Same rack/AZ:            0.1-0.5ms one-way  
Cross-AZ, same region:   0.5-2ms one-way
Cross-region (continent): 15-40ms one-way
Intercontinental:        80-200ms one-way (varies by route)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;AWS measurements:&lt;/strong&gt; Cross-AZ pings typically show 1-2ms RTT, while us-east-1 to eu-west-1 is ~80-90ms RTT.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming implications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous cross-region replication:&lt;/strong&gt; Automatically adds ≥80ms to every write&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leader election during failures:&lt;/strong&gt; Cross-AZ coordination adds several milliseconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumer rebalancing:&lt;/strong&gt; Group coordination latency scales with member distribution&lt;/li&gt;
&lt;/ul&gt;
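&lt;p&gt;A quick way to sanity-check any cross-site latency target is to multiply the distance by the ~5 microseconds-per-kilometer figure above. The small sketch below does exactly that; the distances are rough great-circle numbers and real fiber routes are always longer, so treat the results as optimistic floors rather than estimates.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Lower bound on round-trip time from fiber distance alone (~5 us per km each way),
// ignoring routing hops, switching, and serialization overhead.
public class FiberFloor {
    static double minRttMillis(double distanceKm) {
        double oneWayMicros = distanceKm * 5.0;   // ~5 microseconds per km in fiber
        return 2 * oneWayMicros / 1_000.0;        // round trip, converted to milliseconds
    }

    public static void main(String[] args) {
        // Approximate straight-line distances; actual fiber paths add 30-100% or more.
        System.out.printf("Same metro (~50 km):           %.1f ms RTT floor%n", minRttMillis(50));
        System.out.printf("US East to EU West (~6000 km): %.1f ms RTT floor%n", minRttMillis(6_000));
        System.out.printf("US West to Sydney (~12000 km): %.1f ms RTT floor%n", minRttMillis(12_000));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;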

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Fnetwork-topology-map.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Fnetwork-topology-map.svg" alt="network-topology-map.svg" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 3: Global network latency map showing realistic RTT times between major cloud regions. These physical constraints set hard floors for any distributed streaming system.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Common Failure Scenarios
&lt;/h3&gt;

&lt;p&gt;Streaming systems must handle failures gracefully, but each failure mode has latency implications:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Failure Type&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Latency Impact&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Mitigation&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Broker failover&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+50-200ms during leader election&lt;/td&gt;
&lt;td&gt;Faster election timeouts, more brokers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GC pause&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+100-500ms to P99 latencies&lt;/td&gt;
&lt;td&gt;G1GC tuning, smaller heaps, off-heap storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network partition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+timeout duration (often 30s default)&lt;/td&gt;
&lt;td&gt;Shorter timeouts, circuit breakers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema registry miss&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+10-50ms per lookup&lt;/td&gt;
&lt;td&gt;Larger caches, schema pre-loading&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consumer rebalance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;+5-30s processing halt&lt;/td&gt;
&lt;td&gt;Incremental rebalancing, sticky assignment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Streaming Platform Latency Breakdown
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Publish Latency (Producer → Broker)
&lt;/h3&gt;

&lt;p&gt;This is where your event first enters the streaming platform. Key factors:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network transit:&lt;/strong&gt; Usually negligible within a data center (&amp;lt;1ms), but can dominate for remote producers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Broker processing:&lt;/strong&gt; Includes parsing, validation, and local storage. Modern brokers can handle this in microseconds for simple events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication strategy:&lt;/strong&gt; The big variable. Kafka's &lt;code&gt;acks&lt;/code&gt; setting illustrates the trade-off:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;acks=0: Fire-and-forget (~1-2ms, risk data loss)
acks=1: Wait for leader only (~2-5ms, balanced)  
acks=all: Wait for all replicas (~5-15ms same-AZ, much higher cross-region)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Producer batching:&lt;/strong&gt; Intentionally trading latency for throughput:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linger.ms=0:  Send immediately (lowest latency)
linger.ms=5:  Wait up to 5ms to batch (better throughput)
linger.ms=50: Wait up to 50ms to batch (much better throughput)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real example:&lt;/strong&gt; A well-tuned Kafka cluster with acks=all, same-AZ replication typically shows 3-8ms publish latency at P50, 10-25ms at P99.&lt;/p&gt;
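&lt;p&gt;To verify numbers like these against your own cluster, the producer callback gives you publish latency directly (from send until the acks requirement is satisfied). Here is a minimal sketch using the standard Kafka Java client; the bootstrap address and topic name are placeholders, and a real probe would record every sample for percentile analysis rather than just the worst case.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PublishLatencyProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.ACKS_CONFIG, "all");    // wait for all in-sync replicas
        props.put(ProducerConfig.LINGER_MS_CONFIG, "5"); // small, deliberate batching delay

        try (KafkaProducer&lt;String, String&gt; producer = new KafkaProducer&lt;&gt;(props)) {
            AtomicLong worstMillis = new AtomicLong();
            for (int i = 0; i &lt; 10_000; i++) {
                long sendNanos = System.nanoTime();
                producer.send(new ProducerRecord&lt;&gt;("latency-test", "event-" + i),
                        (metadata, exception) -&gt; {
                            long millis = (System.nanoTime() - sendNanos) / 1_000_000;
                            worstMillis.accumulateAndGet(millis, Math::max);
                        });
            }
            producer.flush();
            System.out.println("Worst observed publish latency: " + worstMillis.get() + " ms");
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;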

&lt;h3&gt;
  
  
  Consume Latency (Broker → Consumer)
&lt;/h3&gt;

&lt;p&gt;Once data is available on the broker, how quickly can consumers access it?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Push vs. Pull:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Push systems&lt;/strong&gt; (like some message queues) can deliver with sub-millisecond latency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pull systems&lt;/strong&gt; (like Kafka) depend on poll frequency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Polling configuration mistakes:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Bad: Creates 0-500ms artificial delay
max.poll.interval.ms=500

# Good: Near-real-time consumption
max.poll.interval.ms=10
fetch.min.bytes=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Processing overhead:&lt;/strong&gt; In-memory transformations are typically &amp;lt;1ms per event, but external calls (database lookups, API calls) can dominate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Simple in-memory filter:     ~0.001ms per event
JSON parsing/validation:     ~0.01-0.1ms per event  
Database lookup (cached):    ~1-5ms per event
Database lookup (cache miss): ~10-50ms per event
External API call:          ~50-200ms per event
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  End-to-End Latency Monitoring
&lt;/h3&gt;

&lt;p&gt;What users actually experience: &lt;code&gt;E2E = Publish + Network + Consume + Processing&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key percentiles to track:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;P50 (median):&lt;/strong&gt; Your typical performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P95:&lt;/strong&gt; What 95% of users experience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P99:&lt;/strong&gt; Catches tail latencies from GC, network hiccups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P99.9:&lt;/strong&gt; Exposes rare but severe problems&lt;/li&gt;
&lt;/ul&gt;
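&lt;p&gt;Computing these from a window of recorded end-to-end latencies is straightforward; the toy sketch below uses a simple sort, while production systems typically keep a histogram (for example HdrHistogram) so the full window never has to be stored. Note how a single GC-style outlier leaves the average looking healthy while P95 and P99 tell the real story.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.util.Arrays;

// Report the percentiles users actually feel, not just the average,
// for one window of observed end-to-end latencies (in milliseconds).
public class LatencyPercentiles {
    static long percentile(long[] sortedMillis, double p) {
        int index = (int) Math.ceil(p / 100.0 * sortedMillis.length) - 1;
        return sortedMillis[Math.max(0, Math.min(index, sortedMillis.length - 1))];
    }

    public static void main(String[] args) {
        long[] samples = {12, 15, 14, 18, 22, 13, 16, 250, 17, 19}; // one GC pause hiding in here
        Arrays.sort(samples);
        System.out.println("avg: " + Arrays.stream(samples).average().orElse(0) + " ms");
        System.out.println("P50: " + percentile(samples, 50) + " ms");
        System.out.println("P95: " + percentile(samples, 95) + " ms");
        System.out.println("P99: " + percentile(samples, 99) + " ms");
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;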

&lt;p&gt;&lt;strong&gt;Example real-world numbers:&lt;/strong&gt;&lt;br&gt;
A well-tuned, single-region Kafka pipeline typically achieves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P50: 10-30ms end-to-end&lt;/li&gt;
&lt;li&gt;P95: 25-75ms end-to-end&lt;/li&gt;
&lt;li&gt;P99: 50-200ms end-to-end (watch for GC pauses, network bursts)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cross-region reality:&lt;/strong&gt;&lt;br&gt;
With synchronous cross-region replication, add ≥80ms minimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;P50: 90-120ms end-to-end&lt;/li&gt;
&lt;li&gt;P99: 150-400ms end-to-end&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Fpipeline-latency-breakdown.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/images%2Fpipeline-latency-breakdown.svg" alt="pipeline-latency-breakdown.svg" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 4: Detailed breakdown of where latency accumulates in a streaming pipeline, from producer to final storage. Shows how each component contributes to total end-to-end latency.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Data Lake Integration: The Visibility Latency Challenge
&lt;/h2&gt;

&lt;p&gt;Modern streaming architectures often flow into analytical storage (Apache Iceberg, Delta Lake) for cost-effective long-term analytics. However, these systems operate on a fundamentally different latency model.&lt;/p&gt;
&lt;h3&gt;
  
  
  Commit Interval Governs Freshness
&lt;/h3&gt;

&lt;p&gt;Unlike streaming brokers that make data available immediately, table formats batch writes into atomic commits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Commit every 5 seconds:   ~2.5s average visibility latency, ~5s max
Commit every 1 minute:    ~30s average visibility latency, ~60s max  
Commit every 10 minutes:  ~5min average visibility latency, ~10min max
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why the delay?&lt;/strong&gt; Table formats like Iceberg prioritize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Atomic visibility:&lt;/strong&gt; Readers see complete batches or nothing (no partial data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient storage:&lt;/strong&gt; Larger files are cheaper and faster to read from object storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata efficiency:&lt;/strong&gt; Fewer commits = less metadata overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Real-World Commit Strategy Examples
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Commit Interval&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Trade-offs&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-time dashboard&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5-30 seconds&lt;/td&gt;
&lt;td&gt;Higher cost, more small files, near-real-time visibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hourly reporting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1-5 minutes&lt;/td&gt;
&lt;td&gt;Balanced cost and freshness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Daily analytics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10-60 minutes&lt;/td&gt;
&lt;td&gt;Lowest cost, highest efficiency, delayed visibility&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Experimental data:&lt;/strong&gt; In testing with Flink → Iceberg:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10-second commits: ~10s median latency, ~20s P99&lt;/li&gt;
&lt;li&gt;1-minute commits: ~30s median latency, ~60s P99&lt;/li&gt;
&lt;li&gt;Latency closely tracks commit interval plus small processing overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cost Impact of Commit Frequency
&lt;/h3&gt;

&lt;p&gt;Netflix's analysis showed that keeping data in Kafka costs 38x more than Iceberg storage&lt;sup id="fnref1"&gt;1&lt;/sup&gt;. The commit interval directly affects this trade-off:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example calculation&lt;/strong&gt; (1TB/day workload):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kafka retention (24 hours):&lt;/strong&gt; ~$500/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iceberg (frequent 30s commits):&lt;/strong&gt; ~$25/month + processing costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iceberg (relaxed 10min commits):&lt;/strong&gt; ~$13/month + processing costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More frequent commits mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher compute costs (more Flink/Spark jobs)&lt;/li&gt;
&lt;li&gt;More small files (worse query performance)&lt;/li&gt;
&lt;li&gt;Higher metadata overhead&lt;/li&gt;
&lt;li&gt;But lower visibility latency&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Synchronous vs. Asynchronous: The Fundamental Trade-off
&lt;/h2&gt;

&lt;p&gt;Every distributed streaming system faces choices about when to wait for confirmation versus proceeding optimistically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replication Strategies
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Synchronous replication:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer → Broker → Wait for replicas → ACK to producer
Latency: Base + (RTT to slowest replica)
Durability: High (data on multiple nodes before ACK)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Asynchronous replication:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer → Broker → Immediate ACK → Background replication
Latency: Base + local write time only
Durability: Lower (brief window where data exists on only one node)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Same AZ sync replication:&lt;/strong&gt; +1-3ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-AZ sync replication:&lt;/strong&gt; +2-8ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-region sync replication:&lt;/strong&gt; +80-200ms (often impractical)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Processing Patterns
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Synchronous processing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event → Process Step 1 → Wait → Process Step 2 → Wait → Response
Latency: Sum of all steps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Asynchronous processing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event → Trigger Step 1 → Trigger Step 2 → Collect results → Response
Latency: Max of parallel steps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; External enrichment workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous:&lt;/strong&gt; Event → DB lookup (20ms) → API call (50ms) → Process (5ms) = 75ms total&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous:&lt;/strong&gt; Event → [DB lookup || API call in parallel] → Process (5ms) = ~55ms total (the slower of the two parallel calls plus processing)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The async approach requires more complex code (handling out-of-order responses, partial failures) but can significantly reduce latency.&lt;/p&gt;
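&lt;p&gt;A minimal sketch of the parallel variant using &lt;code&gt;CompletableFuture&lt;/code&gt;: the two enrichment calls run concurrently and the pipeline waits only for the slower of them. The &lt;code&gt;lookupFromDb&lt;/code&gt; and &lt;code&gt;callEnrichmentApi&lt;/code&gt; methods are stand-ins for whatever blocking clients you actually use, and the pool size is purely illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelEnrichment {
    // Stand-ins for real blocking calls (roughly the 20ms DB lookup and 50ms API call above).
    static String lookupFromDb(String eventId) { return "db:" + eventId; }
    static String callEnrichmentApi(String eventId) { return "api:" + eventId; }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8); // sized for illustration only

        String eventId = "evt-42";
        CompletableFuture&lt;String&gt; db =
                CompletableFuture.supplyAsync(() -&gt; lookupFromDb(eventId), pool);
        CompletableFuture&lt;String&gt; api =
                CompletableFuture.supplyAsync(() -&gt; callEnrichmentApi(eventId), pool);

        // Total wait is max(db, api) plus local processing, not their sum.
        String enriched = db.thenCombine(api, (d, a) -&gt; eventId + " [" + d + ", " + a + "]").join();
        System.out.println(enriched);

        pool.shutdown();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;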

&lt;h2&gt;
  
  
  Troubleshooting: When Latency Goes Wrong
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Your Latency Budget Checklist
&lt;/h3&gt;

&lt;p&gt;Before designing any streaming system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Identified true latency requirement&lt;/strong&gt; (ultra-low/low/relaxed)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Mapped data path&lt;/strong&gt; (same AZ/cross-AZ/cross-region)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Chosen durability level&lt;/strong&gt; (async/sync replication)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Configured monitoring&lt;/strong&gt; for P50/P95/P99, not just averages&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Load tested&lt;/strong&gt; at peak throughput (latency often degrades under load)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common Latency Culprits
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Symptom&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Likely Cause&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Investigation&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consistent &amp;gt;100ms in same region&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Network saturation or misconfigured routing&lt;/td&gt;
&lt;td&gt;Check network utilization, traceroute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;P99 &amp;gt;&amp;gt; P50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GC pauses or batching effects&lt;/td&gt;
&lt;td&gt;JVM GC logs, batch size analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sudden latency spikes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Broker failover or rebalancing&lt;/td&gt;
&lt;td&gt;Broker logs, consumer group stability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;High variance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Resource contention or queueing&lt;/td&gt;
&lt;td&gt;CPU/memory/disk utilization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gradual degradation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Growing consumer lag&lt;/td&gt;
&lt;td&gt;Partition count, consumer scaling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Metrics to Monitor
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Producer side:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;request-latency-avg/max&lt;/code&gt;: How long broker requests take&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;batch-size-avg&lt;/code&gt;: Batching efficiency&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;buffer-available-bytes&lt;/code&gt;: Memory pressure&lt;/li&gt;
&lt;/ul&gt;
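&lt;p&gt;These are exposed over JMX, but the Java client also lets you read them in process, which makes it easy to forward them to whatever metrics system you already run. A small sketch (it assumes an already-configured producer instance):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class ProducerMetricsDump {
    // Print the latency and batching metrics for an already-configured producer.
    static void dumpLatencyMetrics(KafkaProducer&lt;?, ?&gt; producer) {
        Map&lt;MetricName, ? extends Metric&gt; metrics = producer.metrics();
        for (Map.Entry&lt;MetricName, ? extends Metric&gt; entry : metrics.entrySet()) {
            String name = entry.getKey().name();
            if (name.equals("request-latency-avg") || name.equals("request-latency-max")
                    || name.equals("batch-size-avg") || name.equals("buffer-available-bytes")) {
                System.out.println(entry.getKey().group() + " / " + name
                        + " = " + entry.getValue().metricValue());
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;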

&lt;p&gt;&lt;strong&gt;Broker side:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;request-handler-idle-ratio&lt;/code&gt;: CPU saturation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;log-flush-time&lt;/code&gt;: Disk performance&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;leader-election-rate&lt;/code&gt;: Stability issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consumer side:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;lag-max&lt;/code&gt;: How far behind consumers are&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;poll-time-avg&lt;/code&gt;: Processing efficiency&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;commit-latency-avg&lt;/code&gt;: Offset management overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;End-to-end:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application-level latency tracking with correlation IDs (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;P50/P95/P99 latency distributions over time&lt;/li&gt;
&lt;li&gt;Latency broken down by pipeline stage&lt;/li&gt;
&lt;/ul&gt;
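&lt;p&gt;One way to implement the correlation-ID approach above: stamp a correlation ID and the produce time into record headers at the edge of the pipeline, then let the final consumer compute end-to-end latency from that header. The sketch below shows the idea with the Kafka Java client; the header names and method names are assumptions of this example, and it presumes producer and consumer clocks are reasonably synchronized (for example via NTP).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class EndToEndLatency {
    // Producer side: attach a correlation ID and the produce timestamp as headers.
    static ProducerRecord&lt;String, String&gt; stamped(String topic, String key, String value,
                                                   String correlationId) {
        ProducerRecord&lt;String, String&gt; record = new ProducerRecord&lt;&gt;(topic, key, value);
        record.headers().add("correlation-id", correlationId.getBytes(StandardCharsets.UTF_8));
        record.headers().add("produced-at-ms",
                Long.toString(System.currentTimeMillis()).getBytes(StandardCharsets.UTF_8));
        return record;
    }

    // Final consumer stage: end-to-end latency = now minus the producer's timestamp.
    static long endToEndMillis(ConsumerRecord&lt;String, String&gt; record) {
        Header producedAt = record.headers().lastHeader("produced-at-ms");
        if (producedAt == null) {
            return -1; // record was produced without instrumentation
        }
        long producedAtMs = Long.parseLong(new String(producedAt.value(), StandardCharsets.UTF_8));
        // Feed this into your P50/P95/P99 tracking, broken down by pipeline stage.
        return System.currentTimeMillis() - producedAtMs;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;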

&lt;h2&gt;
  
  
  Technology-Specific Configurations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Apache Kafka for Low Latency
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Producer configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Minimize batching
&lt;/span&gt;&lt;span class="py"&gt;linger.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;
&lt;span class="py"&gt;batch.size&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1024&lt;/span&gt;

&lt;span class="c"&gt;# Reduce network overhead  
&lt;/span&gt;&lt;span class="py"&gt;acks&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;max.in.flight.requests.per.connection&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;

&lt;span class="c"&gt;# Disable compression for lowest latency
&lt;/span&gt;&lt;span class="py"&gt;compression.type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Broker configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Fast leader election
&lt;/span&gt;&lt;span class="py"&gt;replica.lag.time.max.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;500&lt;/span&gt;
&lt;span class="py"&gt;replica.socket.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1000&lt;/span&gt;

&lt;span class="c"&gt;# Frequent flushes (if durability required)
&lt;/span&gt;&lt;span class="py"&gt;log.flush.interval.messages&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;log.flush.interval.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Consumer configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Minimal poll interval
&lt;/span&gt;&lt;span class="py"&gt;fetch.min.bytes&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;fetch.max.wait.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;

&lt;span class="c"&gt;# Reduce rebalance overhead
&lt;/span&gt;&lt;span class="py"&gt;session.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;6000&lt;/span&gt;
&lt;span class="py"&gt;heartbeat.interval.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;2000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Apache Pulsar for Ultra-Low Latency
&lt;/h3&gt;

&lt;p&gt;Pulsar's architecture allows some optimizations Kafka cannot match:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Memory-mapped journal for minimal write latency&lt;/span&gt;
&lt;span class="n"&gt;dbStorage_writeCacheMaxSizeMb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;
&lt;span class="n"&gt;dbStorage_readAheadCacheMaxSizeMb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;

&lt;span class="c1"&gt;// Disable fsync for maximum speed (if durability allows)&lt;/span&gt;
&lt;span class="n"&gt;journalSyncData&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Apache Flink for Stream Processing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Minimize checkpoint overhead&lt;/span&gt;
&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;checkpoints&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nl"&gt;memory:&lt;/span&gt;&lt;span class="c1"&gt;//&lt;/span&gt;
&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;rocksdb&lt;/span&gt;

&lt;span class="c1"&gt;// Reduce buffering&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;watermark&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;latency&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tracking&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion: Engineering Time as a Feature
&lt;/h2&gt;

&lt;p&gt;Latency in streaming systems isn't just a performance metric—it's a feature you must consciously design, budget, and engineer for. Just as Jeff Dean's numbers taught programmers to respect the reality of time in computing hardware, these streaming latency numbers should guide every architectural decision you make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insights:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Physics sets hard floors.&lt;/strong&gt; You cannot stream across continents in under 80ms, period. You cannot do synchronous disk writes faster than your storage allows. Design within reality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency is expensive.&lt;/strong&gt; Ultra-low latency often costs 10x-100x more than "good enough" latency. Netflix's 38x cost difference between Kafka and Iceberg isn't unique—it's typical.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Percentiles matter more than averages.&lt;/strong&gt; Your users experience P95 and P99 latencies, not medians. A system with 50ms average and 2-second P99 is not a real-time system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Every millisecond is a trade-off.&lt;/strong&gt; Choosing synchronous replication adds latency but prevents data loss. Choosing small batches reduces latency but limits throughput. These aren't bugs—they're fundamental engineering decisions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitor what matters.&lt;/strong&gt; End-to-end latency with business-relevant percentiles. Break down by pipeline stage. Alert on degradation, not just failures.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal isn't to build the fastest possible system—it's to build the right system for your latency budget. Sometimes that's a 5ms ultra-low latency platform costing hundreds of thousands per month. Sometimes it's a 5-minute batch process costing hundreds per month. Both can be "real-time" in their proper context.&lt;/p&gt;

&lt;p&gt;Armed with these numbers, you can confidently navigate the trade-offs between speed, cost, and complexity. You'll know when a requirement is physically impossible, when it's technically feasible but economically questionable, and when it's the right fit for your streaming architecture.&lt;/p&gt;

&lt;p&gt;Most importantly, you'll stop debating whether something is "real-time" and start designing systems that deliver data when and where it's needed—in real real-time.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;As an Apache Pulsar committer, I'm always interested in hearing about your experiences with streaming data technologies. Feel free to reach out with questions or share your own insights!&lt;/em&gt;&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Netflix cost analysis based on industry presentations and blog posts discussing their data lake architecture. Specific 38x figure commonly cited in streaming architecture discussions, though exact source documentation may vary. For current Netflix data architecture details, see their technology blog and conference presentations on data platform evolution. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>Nifi Bundle Release Announcement</title>
      <dc:creator>David Kjerrumgaard</dc:creator>
      <pubDate>Fri, 12 Sep 2025 22:32:49 +0000</pubDate>
      <link>https://dev.to/david_kjerrumgaard_d31d7e/nifi-bundle-release-announcement-563i</link>
      <guid>https://dev.to/david_kjerrumgaard_d31d7e/nifi-bundle-release-announcement-563i</guid>
      <description>&lt;h1&gt;
  
  
  New Release: Enhanced Apache NiFi Connector for Pulsar v2.1.0
&lt;/h1&gt;

&lt;p&gt;I'm excited to announce the availability of an updated version of the Apache NiFi connector for Pulsar! This week, we dedicated time to implementing much-needed improvements that will enhance your data streaming experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community-Driven Improvements
&lt;/h2&gt;

&lt;p&gt;First and foremost, I want to extend our heartfelt gratitude to the community members who took the time to report issues and provide valuable feedback. Your contributions are essential to making this connector more robust and reliable for everyone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Changes in This Release
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Added support for OAuth2 credentials&lt;/strong&gt; - Enhanced authentication flexibility by supporting clientId/clientSecret instead of requiring private key files (see &lt;a href="https://github.com/david-streamlio/pulsar-nifi-bundle/issues/85" rel="noopener noreferrer"&gt;issue #85&lt;/a&gt; for details)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Added Pulsar MessageID and message properties to the outbound FlowFiles&lt;/strong&gt; - Pulsar MessageID and message properties are properly captured and forwarded to outbound FlowFiles (addresses &lt;a href="https://github.com/david-streamlio/pulsar-nifi-bundle/issues/67" rel="noopener noreferrer"&gt;issue #67&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized publisher resource management&lt;/strong&gt; - Improved performance by reusing PublisherLease objects instead of creating new publishers for each FlowFile&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updated to Apache NiFi 2.1.0&lt;/strong&gt; - Latest NiFi version support with newest features and security updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updated to Apache Pulsar 3.3.7&lt;/strong&gt; - Latest stable Pulsar release integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's New
&lt;/h2&gt;

&lt;p&gt;This release includes several key improvements and updates that significantly enhance the connector's functionality and performance:&lt;/p&gt;

&lt;h3&gt;
  
  
  New Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Added Pulsar MessageID and message properties to the outbound FlowFiles&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Added support for OAuth2 authentication using clientId, clientSecret&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Enhanced OAuth2 Authentication Support
&lt;/h3&gt;

&lt;p&gt;We've expanded authentication options by adding support for OAuth2 authentication using clientId and clientSecret credentials. This provides a more flexible alternative to private key file-based authentication, making it easier to integrate with modern cloud-based Pulsar deployments and enterprise authentication systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Added Pulsar MessageID and Message Properties to Outbound FlowFiles
&lt;/h3&gt;

&lt;p&gt;The connector now properly captures and forwards Pulsar MessageID and message properties to outbound FlowFiles. This enhancement ensures that important message metadata is preserved throughout your data processing pipeline, enabling better message tracking, debugging, and downstream processing decisions based on message properties.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimized Publisher Resource Management
&lt;/h3&gt;

&lt;p&gt;We've implemented a smarter approach to managing Pulsar publishers by reusing existing PublisherLease objects when possible, rather than storing publishers in a cache. This architectural improvement simplifies the design while preventing the unnecessary creation of new Pulsar Publisher instances for every FlowFile, resulting in better resource utilization and improved performance under high-throughput scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Platform Updates
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Updated to Apache NiFi 2.1.0&lt;/strong&gt;: Ensuring compatibility with the newest features and security updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updated to Apache Pulsar 3.3.7&lt;/strong&gt;: Latest stable release providing improved performance and reliability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;The updated connector maintains backward compatibility while providing enhanced functionality. You can download the latest version from &lt;a href="https://central.sonatype.com/artifact/io.streamnative.connectors/nifi-pulsar-bundle/versions" rel="noopener noreferrer"&gt;Maven Central&lt;/a&gt; and find installation instructions in the project repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Forward
&lt;/h2&gt;

&lt;p&gt;I remain committed to maintaining and improving this connector based on community feedback. If you encounter any issues or have suggestions for future enhancements, please don't hesitate to open an issue in our GitHub repository.&lt;/p&gt;

&lt;p&gt;Thank you again to our community for your continued support and contributions. Together, we're building better tools for real-time data processing and streaming. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;For technical support or questions about the Apache NiFi connector for Pulsar, visit my &lt;a href="https://github.com/david-streamlio/pulsar-nifi-bundle" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; or reach out to the community.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;As an Apache Pulsar committer, I'm always interested in hearing about your experiences with streaming data technologies. Feel free to reach out with questions or share your own insights!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>tooling</category>
      <category>dataengineering</category>
      <category>opensource</category>
      <category>news</category>
    </item>
  </channel>
</rss>
