<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: LeetDezine</title>
    <description>The latest articles on DEV Community by LeetDezine (@leetdezine).</description>
    <link>https://dev.to/leetdezine</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3886899%2Fb08b199f-7074-42c4-a66a-5cb82c87672a.png</url>
      <title>DEV Community: LeetDezine</title>
      <link>https://dev.to/leetdezine</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/leetdezine"/>
    <language>en</language>
    <item>
      <title>Latency vs Throughput</title>
      <dc:creator>LeetDezine</dc:creator>
      <pubDate>Thu, 07 May 2026 03:17:41 +0000</pubDate>
      <link>https://dev.to/leetdezine/latency-vs-throughput-20g</link>
      <guid>https://dev.to/leetdezine/latency-vs-throughput-20g</guid>
      <description>&lt;p&gt;&lt;a href="https://leetdezine.com/performance-metrics/latency-vs-throughput/?utm_source=devto" rel="noopener noreferrer"&gt;LeetDezine&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first time I heard "optimize for latency," I thought it meant "make it fast." So I turned off batching, flushed writes immediately, set Kafka's &lt;code&gt;linger.ms&lt;/code&gt; to 0.&lt;/p&gt;

&lt;p&gt;The system responded faster. And handled way less load.&lt;/p&gt;

&lt;p&gt;Latency and throughput pull in opposite directions. Making your system faster for individual requests usually means it handles fewer of them per second. Handling more per second usually means individual requests wait longer. This tradeoff shows up everywhere, and misidentifying which axis to optimize is one of the most common architecture mistakes.&lt;/p&gt;

&lt;h2&gt;
  
  
  DB Batch Writes
&lt;/h2&gt;

&lt;p&gt;Your database is getting hammered with writes. Each write goes to disk immediately.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Latency per write: 10ms&lt;/li&gt;
&lt;li&gt;One thread handles: 100 writes per second&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Change the approach: collect 100 records, flush in one batch. One disk operation instead of 100.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Throughput&lt;/strong&gt; went up — the disk does the same work with 100× fewer operations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency&lt;/strong&gt; went up — the first record now waits for 99 more before anything gets written.&lt;/p&gt;

&lt;p&gt;You traded fast individual responses for higher total capacity.&lt;/p&gt;
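&lt;p&gt;A minimal sketch of that tradeoff in code, using SQLite; the table schema and batch size are illustrative assumptions. Buffering rows and flushing them with one &lt;code&gt;executemany&lt;/code&gt; turns 100 round trips into one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3

BATCH_SIZE = 100  # illustrative

conn = sqlite3.connect("events.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER, payload TEXT)")

buffer = []

def write(record):
    buffer.append(record)
    if len(buffer) &gt;= BATCH_SIZE:   # the first record waits for 99 more
        flush()

def flush():
    conn.executemany("INSERT INTO events VALUES (?, ?)", buffer)
    conn.commit()                   # one disk flush for the whole batch
    buffer.clear()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;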

&lt;h2&gt;
  
  
  Netflix's Loading Spinner
&lt;/h2&gt;

&lt;p&gt;Netflix needs to send you a video. Two options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A:&lt;/strong&gt; stream one tiny chunk the moment it's ready. Low latency, you start watching fast. But thousands of tiny chunks = thousands of network trips per second = inefficient. Fewer users served.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option B:&lt;/strong&gt; buffer and send larger chunks. Fewer network trips, more users served per second. But you wait a few seconds before playback starts.&lt;/p&gt;

&lt;p&gt;That loading spinner isn't a bug. Netflix deliberately accepts higher startup latency to serve more users efficiently. Same tradeoff, product-level decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kafka's linger.ms
&lt;/h2&gt;

&lt;p&gt;This one makes the tradeoff explicit. Kafka producers have a config called &lt;code&gt;linger.ms&lt;/code&gt; — how long the producer waits before flushing a batch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linger.ms = 0:
  Every event fires immediately — one network call per event
  At 100K events/sec = 100K network calls/sec
  Low latency per event, terrible network efficiency

linger.ms = 5:
  Producer accumulates events for 5ms, flushes together
  At 100K events/sec ≈ 20K batch calls/sec
  Slightly higher latency per event, 5× fewer network calls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;linger.ms&lt;/code&gt; is a literal dial between the two extremes. Kafka doesn't choose for you — it expects you to understand the tradeoff and set it intentionally.&lt;/p&gt;
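&lt;p&gt;With the kafka-python client, that dial is a single constructor argument (the broker address and batch size below are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from kafka import KafkaProducer

# linger_ms=5: hold events up to 5ms so batches can fill.
# linger_ms=0 would flush every event immediately instead.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # illustrative broker address
    linger_ms=5,
    batch_size=16384,  # bytes per batch before an early flush
)

producer.send("events", b"payload")
producer.flush()  # drain any outstanding batches before exiting
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;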

&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;Every example is the same thing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Make individuals wait → serve more of them overall.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;What you traded&lt;/th&gt;
&lt;th&gt;What you got&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DB batch writes&lt;/td&gt;
&lt;td&gt;Per-record speed&lt;/td&gt;
&lt;td&gt;Total capacity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Netflix buffering&lt;/td&gt;
&lt;td&gt;Startup speed&lt;/td&gt;
&lt;td&gt;User capacity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka linger.ms &amp;gt; 0&lt;/td&gt;
&lt;td&gt;Message delay&lt;/td&gt;
&lt;td&gt;Network efficiency&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The inverse is also true: processing immediately = low latency, lower throughput. You're always on this spectrum.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Decide
&lt;/h2&gt;

&lt;p&gt;Ask: &lt;strong&gt;what does a bad experience look like for this system?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat message takes 5 seconds to send → users leave. &lt;strong&gt;Optimize for latency.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Analytics pipeline runs overnight → nobody cares if a log arrived 2 seconds late. &lt;strong&gt;Optimize for throughput.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Payment confirmation → user is staring at a loading screen. &lt;strong&gt;Optimize for latency.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Log processing → you're crunching billions of events in bulk. &lt;strong&gt;Optimize for throughput.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>architecture</category>
      <category>backend</category>
      <category>performance</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>SQS Standard vs SQS FIFO vs Kafka</title>
      <dc:creator>LeetDezine</dc:creator>
      <pubDate>Sat, 02 May 2026 03:43:22 +0000</pubDate>
      <link>https://dev.to/leetdezine/sqs-standard-vs-sqs-fifo-vs-kafka-5f91</link>
      <guid>https://dev.to/leetdezine/sqs-standard-vs-sqs-fifo-vs-kafka-5f91</guid>
      <description>&lt;p&gt;&lt;a href="https://leetdezine.com/notification-system/?utm_source=devto" rel="noopener noreferrer"&gt;LeetDezine&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The SQS FIFO vs Standard decision sounds simple: need ordering? Use FIFO. Don't care? Use Standard.&lt;/p&gt;

&lt;p&gt;That framing leads to the wrong answer more often than not. The real question isn't "do I need ordering?" — it's "what actually breaks if messages arrive out of order or more than once?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Difference
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SQS Standard:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Throughput: nearly unlimited (no practical per-queue ceiling)&lt;/li&gt;
&lt;li&gt;Delivery: at-least-once — the same message can appear more than once&lt;/li&gt;
&lt;li&gt;Ordering: best-effort — messages might arrive out of order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SQS FIFO:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Throughput: 3,000 messages/sec per queue with batching (300 API calls/sec without it)&lt;/li&gt;
&lt;li&gt;Delivery: exactly-once — deduplication built in via &lt;code&gt;MessageDeduplicationId&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Ordering: strict — messages within a group are delivered in send order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;FIFO looks better on every axis except throughput. That exception is the one that kills you at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Throughput Wall
&lt;/h2&gt;

&lt;p&gt;A notification system handling 5M messages/sec — Instagram-scale, celebrity post, 10M followers needing push notifications — runs into a hard wall with FIFO.&lt;/p&gt;

&lt;p&gt;At 3K/sec per queue, you need 1,667 queues to absorb the peak. Now you need routing logic: which notification goes to which queue? You need to partition users, maintain queue mappings, handle rebalancing. You've built Kafka, badly.&lt;/p&gt;

&lt;p&gt;Even at moderate scale — 50K/sec — that's 17 FIFO queues with custom routing on top. The operational complexity eats the simplicity SQS was supposed to give you.&lt;/p&gt;

&lt;p&gt;The ceiling is architectural, not configurable. FIFO queues don't scale horizontally the way Kafka partitions do.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Ordering Trap
&lt;/h2&gt;

&lt;p&gt;Most engineers reach for FIFO because ordering &lt;em&gt;sounds&lt;/em&gt; important. It's usually the wrong diagnosis.&lt;/p&gt;

&lt;p&gt;Ordering matters when out-of-order processing corrupts state. A bank ledger where debit must follow credit. An event sourcing system where aggregate state is reconstructed from events in sequence. A stock trading system where order execution has legal ordering requirements.&lt;/p&gt;

&lt;p&gt;For a notification system? Notification B arriving before notification A doesn't break anything. The user sees both.&lt;/p&gt;

&lt;p&gt;What engineers actually want when they say "ordering" is usually &lt;strong&gt;idempotency&lt;/strong&gt; — the ability to process a message twice without sending duplicate notifications. That's not a queue property, that's a consumer design property.&lt;/p&gt;

&lt;p&gt;An idempotent consumer checks: "have I already sent this notification?" If yes, skip. Now you get SQS Standard's throughput with the same safety guarantee — and you're not fighting a 3K/sec ceiling.&lt;/p&gt;
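&lt;p&gt;A minimal sketch of that consumer-side check, using a Redis set-if-absent guard; the key naming and TTL are assumptions for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import redis

r = redis.Redis()

def handle(message):
    # set(nx=True) writes only if the key doesn't exist; atomic in Redis.
    # Returns None if this message id was already claimed by any worker.
    first_time = r.set(f"sent:{message['id']}", 1, nx=True, ex=86400)
    if not first_time:
        return  # duplicate delivery: already sent, skip
    send_notification(message)  # assumed to exist elsewhere
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;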

&lt;p&gt;The question to ask before picking FIFO: &lt;em&gt;"What breaks if this message arrives twice or slightly out of order?"&lt;/em&gt; If the answer is "nothing, as long as the consumer handles it" — you don't need FIFO.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Neither Gives You: Replay
&lt;/h2&gt;

&lt;p&gt;Both FIFO and Standard share one critical limitation nobody talks about enough.&lt;/p&gt;

&lt;p&gt;Messages are deleted after consumption.&lt;/p&gt;

&lt;p&gt;You deploy a bug that silently sends wrong notifications for 2 hours. By the time you notice, SQS has deleted every message from that window — consumed and acknowledged, gone. You cannot rewind. You cannot re-process the affected window with fixed code. The data doesn't exist in the queue anymore.&lt;/p&gt;

&lt;p&gt;This is not a theoretical edge case. Silent bugs in notification systems happen. The right response is to replay the message stream with fixed code — SQS makes that impossible.&lt;/p&gt;

&lt;p&gt;Kafka retains messages on disk for a configurable window (7 days by default). Rewind the consumer offset to 2 hours ago, re-process the entire window. Every message that went out wrong gets corrected.&lt;/p&gt;
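&lt;p&gt;A sketch of that rewind with the kafka-python client (topic, partition, and consumer group are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",  # illustrative
    group_id="notifications",
    enable_auto_commit=False,
)

tp = TopicPartition("notifications", 0)
consumer.assign([tp])

# Find the offset closest to "2 hours ago" and rewind to it.
two_hours_ago_ms = int((time.time() - 2 * 3600) * 1000)
offsets = consumer.offsets_for_times({tp: two_hours_ago_ms})
consumer.seek(tp, offsets[tp].offset)

for message in consumer:   # re-process the window with fixed code
    reprocess(message)     # assumed to exist elsewhere
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;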

&lt;h2&gt;
  
  
  The Decision Framework
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;SQS Standard&lt;/th&gt;
&lt;th&gt;SQS FIFO&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;3K/sec/queue&lt;/td&gt;
&lt;td&gt;500K–1M/sec/broker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ordering&lt;/td&gt;
&lt;td&gt;Best-effort&lt;/td&gt;
&lt;td&gt;Strict per group&lt;/td&gt;
&lt;td&gt;Strict per partition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delivery&lt;/td&gt;
&lt;td&gt;At-least-once&lt;/td&gt;
&lt;td&gt;Exactly-once&lt;/td&gt;
&lt;td&gt;At-least-once&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replay&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fan-out&lt;/td&gt;
&lt;td&gt;Manual (multiple queues)&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Consumer groups (free)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Right for&lt;/td&gt;
&lt;td&gt;Task queues, moderate volume&lt;/td&gt;
&lt;td&gt;Ledgers, ordered state machines&lt;/td&gt;
&lt;td&gt;High-throughput, fan-out, replay&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Use Standard when:&lt;/strong&gt; task distribution, idempotent consumers, moderate throughput, operational simplicity is the priority.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use FIFO when:&lt;/strong&gt; you genuinely need strict ordering and exactly-once semantics — financial ledgers, event sourcing — and your peak throughput fits within 3K messages/sec per queue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Kafka when:&lt;/strong&gt; you need fan-out to multiple independent consumers, replay capability for failure recovery, or throughput that SQS can't handle without building complexity yourself.&lt;/p&gt;

&lt;p&gt;The wrong answer at scale isn't using SQS Standard when you should have used FIFO. It's using either when your actual requirements — throughput, fan-out, replay — are the ones Kafka was built for.&lt;/p&gt;

&lt;p&gt;Full notification system case study → &lt;a href="https://leetdezine.com/notification-system/?utm_source=devto" rel="noopener noreferrer"&gt;LeetDezine&lt;/a&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>distributedsystems</category>
      <category>backend</category>
      <category>aws</category>
    </item>
    <item>
      <title>What Actually Breaks in a URL Shortener Design at Scale?</title>
      <dc:creator>LeetDezine</dc:creator>
      <pubDate>Wed, 29 Apr 2026 05:01:50 +0000</pubDate>
      <link>https://dev.to/leetdezine/what-actually-breaks-in-a-url-shortener-design-at-scale-58n8</link>
      <guid>https://dev.to/leetdezine/what-actually-breaks-in-a-url-shortener-design-at-scale-58n8</guid>
      <description>&lt;p&gt;&lt;a href="https://leetdezine.com/?utm_source=devto" rel="noopener noreferrer"&gt;LeetDezine&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everyone can describe a URL shortener. Write a row to the DB, generate a short code, cache it on reads. The base design fits on a napkin.&lt;/p&gt;

&lt;p&gt;The interesting part is what happens when you push on any one of those steps. Where does it break? Why? And what's the fix that actually holds at scale?&lt;/p&gt;

&lt;p&gt;Here are four traps I've seen candidates walk straight into — each one looks correct on the surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Truncation Trap
&lt;/h2&gt;

&lt;p&gt;A Snowflake ID is 64 bits. You only need about 36 bits to cover 50 billion URLs (2^36 ≈ 68.7 billion), and six base62 characters hold nearly that many codes (62^6 ≈ 56.8 billion). Truncate to 36 bits, encode in base62, and you get a clean 6-character short code with no collision check needed.&lt;/p&gt;

&lt;p&gt;The natural move: drop the rightmost 28 bits, encode what's left. You're keeping the timestamp, which feels like the important part.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ timestamp — 41 bits ][ machine ID — 10 bits ][ sequence — 12 bits ]
 ←────── keep 36 bits ─────────────────────────→ ←── drop 28 bits ──→
                                                  (machine + sequence)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what breaks: the rightmost 28 bits you dropped contain the &lt;strong&gt;sequence number&lt;/strong&gt; — the counter that differentiates two Snowflake IDs generated on the &lt;strong&gt;same server in the same millisecond&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request A — server 3, t=1700000001ms, seq=1
keep 36 bits → "x7k2p9"

Request B — server 3, t=1700000001ms, seq=2
keep 36 bits → "x7k2p9"  ✗  collision
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only difference between A and B was &lt;code&gt;seq=1&lt;/code&gt; vs &lt;code&gt;seq=2&lt;/code&gt;. You dropped that. At 1000 creations/sec, two requests landing in the same millisecond is not an edge case — it's constant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; you cannot get both uniqueness and shortness by truncating a Snowflake. All three sections — timestamp, machine ID, sequence — contribute to the guarantee. Drop any of them and you break it.&lt;/p&gt;

&lt;p&gt;The options that actually work: accept 11-char codes from the full Snowflake, use random 6-char base62 with a collision check, or pre-generate random keys offline so uniqueness is guaranteed at generation time rather than checked per request.&lt;/p&gt;
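&lt;p&gt;For example, a minimal sketch of the random-code option. The alphabet is standard base62, and exists() stands in for whatever unique-index lookup your DB provides:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import secrets

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def new_short_code(exists, length=6, max_retries=5):
    """Generate a random base62 code; retry on the rare collision."""
    for _ in range(max_retries):
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if not exists(code):   # exists() = a unique-index check in your DB
            return code
    raise RuntimeError("too many collisions; key space filling up?")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;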




&lt;h2&gt;
  
  
  2. Redis INCR Looks Perfect. It Has One Fatal Flaw.
&lt;/h2&gt;

&lt;p&gt;For collision-free short code generation, Redis INCR is elegant. Atomic counter increments — every call returns a unique integer. Base62-encode it. Done. No collision checks, no retries, no background service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INCR url_counter  →  1000000
Base62(1000000)   →  "004C92"
INSERT short_code = "004C92"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem has nothing to do with code generation. It's about what sequential codes leak.&lt;/p&gt;

&lt;p&gt;If a user receives &lt;code&gt;yoursite.com/004C92&lt;/code&gt;, they know the previous URL was &lt;code&gt;yoursite.com/004C91&lt;/code&gt; and the next will be &lt;code&gt;yoursite.com/004C93&lt;/code&gt;. They can walk the entire sequence and enumerate every URL in your system.&lt;/p&gt;

&lt;p&gt;For an internal tool, this is fine. For a public shortener — where someone might shorten a pre-announcement, an internal doc, or a private file — it's a privacy violation.&lt;/p&gt;

&lt;p&gt;The second problem: Redis INCR makes Redis a hard dependency on every creation request. Redis down → creation fails immediately, zero fallback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix: KGS + pre-generated key pool.&lt;/strong&gt; A Key Generation Service pre-generates random base62 codes offline and loads them into a Redis list. App servers pop keys with LPOP. Codes are in random order — no enumerability. App servers pre-fetch a local batch of 100 keys, so Redis going down doesn't immediately stop creation.&lt;/p&gt;

&lt;p&gt;Redis INCR is right for internal tools. KGS + pool is right for public shorteners.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. What Actually Happens When Redis Dies
&lt;/h2&gt;

&lt;p&gt;Most failure mode answers in interviews stop at "Redis is replicated." That's a configuration, not a plan.&lt;/p&gt;

&lt;p&gt;Redis absorbs ~80% of all redirect reads. At 1M reads/sec, the DB only sees 200k/sec. Redis dies. 1M reads/sec hits DB nodes sized for 200k/sec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without a circuit breaker&lt;/strong&gt;, every request tries Redis, waits 500ms for the connection timeout, then falls back to DB. At 1M requests/sec, every thread in the pool is stalled for 500ms at a time. No request completes. The DB never gets a clean chance to respond. Total cascade.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Redis down, no circuit breaker:
Request → try Redis → wait 500ms → timeout → fallback to DB
× 1,000,000 requests/sec
= thread pool exhausted, DB never reached
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Circuit breaker&lt;/strong&gt; fixes the timeout overhead. After N failures in T seconds, the circuit opens — requests skip Redis entirely and go straight to DB. Latency rises but the system stays alive.&lt;/p&gt;
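&lt;p&gt;A minimal sketch of such a breaker; the thresholds are illustrative, and a production system would reach for a library such as pybreaker rather than hand-rolling one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time

class CircuitBreaker:
    """Open after `threshold` failures; skip Redis while open."""
    def __init__(self, threshold=5, reset_after=30):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def is_open(self):
        if self.opened_at and time.time() - self.opened_at &lt; self.reset_after:
            return True
        self.opened_at = None  # half-open: allow a trial call again
        return False

    def record_failure(self):
        self.failures += 1
        if self.failures &gt;= self.threshold:
            self.opened_at = time.time()
            self.failures = 0

def get_url(code, cache, db, breaker):
    if not breaker.is_open():
        try:
            return cache.get(code)    # normal path: Redis absorbs ~80%
        except ConnectionError:
            breaker.record_failure()  # count the timeout, stop repeating it
    return db.get(code)               # straight to DB while circuit is open
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;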

&lt;p&gt;Still not enough. DB peak capacity is ~800k reads/sec. The API Gateway throttles the overflow, returning 503 to some percentage of redirect requests. The system degrades — but doesn't collapse.&lt;/p&gt;

&lt;p&gt;The trap that kills most answers: &lt;strong&gt;"auto-scaling handles this."&lt;/strong&gt; A new Postgres replica takes minutes to provision and catch up on WAL replication. Your traffic surge is immediate. Auto-scaling is for gradual growth, not cache failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Partial availability beats total cascade.&lt;/strong&gt; Always.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. LPOP Is Atomic by Architecture, Not by Lock
&lt;/h2&gt;

&lt;p&gt;If you build a pre-generated key pool in Redis, a natural interview question is: "how do you prevent two app servers from popping the same key simultaneously?"&lt;/p&gt;

&lt;p&gt;You don't have to. Redis handles it.&lt;/p&gt;

&lt;p&gt;Redis is single-threaded. Every command executes one at a time. If 20 app servers call LPOP at the exact same millisecond, Redis processes them sequentially:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;App Server 1  →  LPOP  →  "x7k2p9"  (removed)
App Server 2  →  LPOP  →  "k2m8q1"  (removed)
App Server 3  →  LPOP  →  "p9n3r7"  (removed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Physically impossible for two calls to return the same value. Not because of a lock — because of the execution model.&lt;/p&gt;

&lt;p&gt;Compare to a Postgres key pool. You'd need &lt;code&gt;SELECT FOR UPDATE SKIP LOCKED&lt;/code&gt; — row-level locking on every creation request. Expensive, complex, and a bottleneck under high concurrency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The batch pre-fetch makes it even better.&lt;/strong&gt; Each app server grabs 100 keys at startup and refills when empty. At 1k creations/sec across 20 servers, Redis traffic drops from 1000 LPOP calls/sec to ~10 batch refills/sec. 100x reduction. Same correctness guarantee.&lt;/p&gt;

&lt;p&gt;A crashed server loses its local batch — at most 100 keys out of a 100M key pool. That's 0.0001%. Don't bother recovering them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pattern Across All Four
&lt;/h2&gt;

&lt;p&gt;Each trap comes from a decision that's locally correct but breaks a property you assumed was safe: uniqueness, privacy, fault tolerance, atomicity.&lt;/p&gt;

&lt;p&gt;The way to catch these in an interview isn't to memorize solutions. It's to ask "what does this break?" for every component you add.&lt;/p&gt;

&lt;p&gt;The full URL shortener case study walks through every deep dive in this sequence — requirements, estimation, base architecture, then each failure mode in detail:&lt;/p&gt;

&lt;p&gt;For an in-depth analysis, see → &lt;a href="https://leetdezine.com/?utm_source=devto" rel="noopener noreferrer"&gt;LeetDezine&lt;/a&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>distributedsystems</category>
      <category>backend</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why Is Redis INCR a Bad Fit for a Public URL Shortener?</title>
      <dc:creator>LeetDezine</dc:creator>
      <pubDate>Thu, 23 Apr 2026 17:18:06 +0000</pubDate>
      <link>https://dev.to/leetdezine/url-shortener-traps-that-look-correct-until-they-break-2o8g</link>
      <guid>https://dev.to/leetdezine/url-shortener-traps-that-look-correct-until-they-break-2o8g</guid>
      <description>&lt;p&gt;&lt;a href="https://leetdezine.com/?utm_source=devto" rel="noopener noreferrer"&gt;LeetDezine&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3plm3uket9rftps6qq9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3plm3uket9rftps6qq9.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Redis INCR is one of those solutions that looks perfect the first time you see it. Atomic counter increments. Every call returns a unique integer. Base62-encode it and you have a short code — zero collision checks, zero retries, no background service.&lt;/p&gt;

&lt;p&gt;It's cleaner than anything else on the board. So why does every serious URL shortener reject it?&lt;/p&gt;

&lt;p&gt;The answer has nothing to do with code generation.&lt;/p&gt;


&lt;h2&gt;
  
  
  How Redis INCR Works (And Why It's Technically Correct)
&lt;/h2&gt;

&lt;p&gt;The mechanics are clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Creation request arrives
→ Redis: INCR url_counter → returns 1000000
→ Base62 encode 1000000:

  Divide repeatedly, collect remainders, stop when quotient = 0:

  1000000 ÷ 62 = 16129  remainder 2  → '2'  (quotient != 0, keep going)
  16129   ÷ 62 = 260    remainder 9  → '9'  (quotient != 0, keep going)
  260     ÷ 62 = 4      remainder 12 → 'C'  (quotient != 0, keep going)
  4       ÷ 62 = 0      remainder 4  → '4'  (quotient = 0, stop)

  Read remainders bottom to top: "4C92" → pad to 6 chars → "004C92"

→ INSERT short_code = "004C92"
→ Done.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
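&lt;p&gt;The same algorithm as a runnable Python function, with the alphabet ordered 0–9, A–Z, a–z to match the walkthrough above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def base62_encode(n, width=6):
    """Divide repeatedly, collect remainders, read them back in reverse."""
    digits = []
    while True:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
        if n == 0:
            break
    return "".join(reversed(digits)).rjust(width, "0")

print(base62_encode(1000000))  # "004C92"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;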



&lt;p&gt;Redis is single-threaded. &lt;code&gt;INCR&lt;/code&gt; is atomic — it increments and returns in a single operation. Two simultaneous calls always get different values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;App server 1: INCR → 1000000
App server 2: INCR → 1000001  ← different, guaranteed
App server 3: INCR → 1000002  ← different, guaranteed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No race condition. No collision. No retry loop. Encoding a unique number always produces a unique code. The math is correct.&lt;/p&gt;

&lt;p&gt;So what's the problem?&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 1 — Sequential Codes Are a Privacy Violation
&lt;/h2&gt;

&lt;p&gt;Counter values are sequential. If your user receives &lt;code&gt;yoursite.com/004C92&lt;/code&gt;, they immediately know:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;yoursite.com/004C9L  ← previous URL, someone else's
yoursite.com/004C9N  ← next URL, someone else's
yoursite.com/004C9K  ← keep going...
yoursite.com/004C9J  ← and going...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;They can walk the entire database. Every URL in your system is discoverable by incrementing one character.&lt;/p&gt;

&lt;p&gt;For an internal tool where all users are trusted, this might be fine. For a public shortener — where someone might shorten a pre-announcement link, an internal doc, a private file, a personal photo album — it's a real privacy violation. Your users have a reasonable expectation that their short link isn't guessable.&lt;/p&gt;

&lt;p&gt;Sequential codes make that expectation impossible to satisfy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 2 — Redis Becomes a Hard Dependency on Every Creation
&lt;/h2&gt;

&lt;p&gt;With INCR, the hot path looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request → INCR Redis → encode → INSERT DB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Redis is in the critical path of every single URL creation. If Redis goes down:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Redis down
→ INCR fails
→ No counter value
→ Creation fails immediately
→ Zero fallback
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's no graceful degradation. No buffer. No local state to drain. The moment Redis is unreachable, your creation endpoint returns errors.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fix: KGS + Pre-Generated Key Pool
&lt;/h2&gt;

&lt;p&gt;The Key Generation Service approach flips the model. Instead of generating a key at request time, keys are generated in advance and stored in a Redis pool. When a request arrives, the app server just pops one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before any request arrives:
→ KGS generates random base62 codes offline
→ Loads them into Redis list (RPUSH)

When creation request arrives:
→ App server pops key from local batch
→ INSERT into DB
→ Done — zero Redis call on hot path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why LPOP is atomic:&lt;/strong&gt; Redis is single-threaded. Even if 20 app servers call LPOP at the same millisecond, Redis processes them one at a time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;App server 1: LPOP → "x7k2p9" (removed)
App server 2: LPOP → "k2m8q1" (removed)
App server 3: LPOP → "p9n3r7" (removed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Physically impossible for two LPOP calls to return the same key. No locks needed. No &lt;code&gt;SELECT FOR UPDATE&lt;/code&gt;. Atomicity comes from the architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The batch pre-fetch:&lt;/strong&gt; Each app server grabs 100 keys at startup and keeps them in local memory. At 1k creations/sec across 20 servers, Redis traffic drops from 1000 LPOP/sec to ~10 batch refills/sec. 100x reduction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;App server starts:
→ LPOP 100 keys → store in local queue

Creation request:
→ Pop from local queue (zero network call)
→ Queue empty → refill from Redis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
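&lt;p&gt;A sketch of the app-server side. Redis ≥ 6.2 accepts a count argument on LPOP, so one network call refills the whole batch (names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import redis

r = redis.Redis()
BATCH = 100

local_keys = []

def next_key():
    global local_keys
    if not local_keys:
        # One call refills the local batch. Single-threaded Redis means
        # no two servers can ever receive the same keys.
        local_keys = r.lpop("key_pool", BATCH) or []
    return local_keys.pop()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;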



&lt;p&gt;&lt;strong&gt;What this fixes for Redis failure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Redis down
→ App servers drain local batch (100 keys × 20 servers = 2000 keys)
→ At 1k creations/sec → ~2 seconds of local runway
→ Circuit breaker engages, Redis recovers
→ Graceful degradation instead of hard failure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Side by Side
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Redis INCR&lt;/th&gt;
&lt;th&gt;KGS + Pool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Collision checks&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code predictability&lt;/td&gt;
&lt;td&gt;Sequential — enumerable&lt;/td&gt;
&lt;td&gt;Random — private&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redis failure&lt;/td&gt;
&lt;td&gt;Creation fails instantly&lt;/td&gt;
&lt;td&gt;Local batch buys time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational cost&lt;/td&gt;
&lt;td&gt;Very simple&lt;/td&gt;
&lt;td&gt;Small background worker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Right for&lt;/td&gt;
&lt;td&gt;Internal tools&lt;/td&gt;
&lt;td&gt;Public URL shortener&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;Redis INCR fails not because of what it does, but because of what it leaks. Sequential uniqueness and privacy are in direct conflict. You can't have both with a counter.&lt;/p&gt;

&lt;p&gt;The KGS + pool approach keeps the "no collision checks, no retries" guarantee while adding randomness and resilience. The operational cost is a 50-line background worker and one metric to monitor. The privacy and fault tolerance gains are worth it for any public-facing system.&lt;/p&gt;

&lt;p&gt;The full URL shortener case study — including requirements, DB design, caching, peak traffic, and every failure mode — is at:&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://leetdezine.com/?utm_source=devto" rel="noopener noreferrer"&gt;https://leetdezine.com/?utm_source=devto&lt;/a&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>distributedsystems</category>
      <category>backend</category>
      <category>redis</category>
    </item>
    <item>
      <title>Why Random UUIDs are Killing Your Database Performance</title>
      <dc:creator>LeetDezine</dc:creator>
      <pubDate>Mon, 20 Apr 2026 10:15:57 +0000</pubDate>
      <link>https://dev.to/leetdezine/why-random-uuids-are-killing-your-database-performance-h59</link>
      <guid>https://dev.to/leetdezine/why-random-uuids-are-killing-your-database-performance-h59</guid>
      <description>&lt;p&gt;Every developer starts with a UUID. It’s the industry standard for a reason: zero coordination, zero DB checks, and zero single point of failure. Any machine can generate one and be 100% sure it’s unique.&lt;/p&gt;

&lt;p&gt;But as your system scales, that "standard" choice starts to hurt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: UUIDs vs. Databases
&lt;/h2&gt;

&lt;p&gt;If you're using &lt;strong&gt;UUID v4&lt;/strong&gt; (completely random), you're essentially handing your database a grenade. &lt;/p&gt;

&lt;p&gt;Because the IDs are random, every new insert lands in a random spot in your B-Tree index. This causes &lt;strong&gt;page splits&lt;/strong&gt;, fragments your storage, and slows down your writes as the table grows. Plus, at 128 bits (16 bytes), they're twice as large as a standard &lt;code&gt;BIGINT&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Evolution of ID Generation
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Single Server Counter:&lt;/strong&gt; Simple, but if the server dies, your ID generation stops (SPOF).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;UUID v4:&lt;/strong&gt; Globally unique, but random and huge. No time-sortability.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;UUID v7:&lt;/strong&gt; The modern middle ground. It's still 16 bytes, but it's &lt;strong&gt;time-sortable&lt;/strong&gt;, which fixes the database page-split problem.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Ticket Server (Redis):&lt;/strong&gt; Centralized counter. Fast, but now your ID generation depends on Redis availability.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Snowflake IDs:&lt;/strong&gt; The "Big Tech" solution (used by Twitter, Discord, and Instagram).&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Why Snowflake Wins
&lt;/h3&gt;

&lt;p&gt;Snowflake IDs pack everything you need into just &lt;strong&gt;64 bits (8 bytes)&lt;/strong&gt;. They fit perfectly into a standard &lt;code&gt;BIGINT&lt;/code&gt;, making them fast to index and easy to store.&lt;/p&gt;

&lt;p&gt;Here is the breakdown of how those 64 bits are structured; a short code sketch follows the list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;1 bit (Sign):&lt;/strong&gt; Always 0 (keeps the number positive).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;41 bits (Timestamp):&lt;/strong&gt; Milliseconds since a custom epoch. This gives you ~69 years of IDs and makes them &lt;strong&gt;natively time-sortable&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;10 bits (Machine ID):&lt;/strong&gt; Allows up to 1,024 independent nodes to generate IDs simultaneously without talking to each other.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;12 bits (Sequence):&lt;/strong&gt; A counter for IDs generated in the same millisecond on the same machine (up to 4,096 IDs/ms).&lt;/li&gt;
&lt;/ul&gt;
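&lt;p&gt;A minimal sketch of that composition in Python; the custom epoch and the locking strategy are illustrative assumptions, not any particular library's implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import threading
import time

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01T00:00:00Z

class SnowflakeGenerator:
    def __init__(self, machine_id):
        assert 0 &lt;= machine_id &lt; 1024   # must fit in 10 bits
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now_ms = int(time.time() * 1000)
            if now_ms == self.last_ms:
                self.sequence = (self.sequence + 1) &amp; 0xFFF  # 12-bit counter
                if self.sequence == 0:          # 4,096 IDs this ms: spin to next ms
                    while now_ms &lt;= self.last_ms:
                        now_ms = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now_ms
            # [41 bits timestamp][10 bits machine ID][12 bits sequence]
            return ((now_ms - EPOCH_MS) &lt;&lt; 22) | (self.machine_id &lt;&lt; 12) | self.sequence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;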

&lt;h3&gt;
  
  
  The Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;UUID v4&lt;/th&gt;
&lt;th&gt;UUID v7&lt;/th&gt;
&lt;th&gt;Snowflake&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;128-bit&lt;/td&gt;
&lt;td&gt;128-bit&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;64-bit&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sortable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Coordination&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ None&lt;/td&gt;
&lt;td&gt;✅ None&lt;/td&gt;
&lt;td&gt;✅ None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DB Friendly&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;✅ &lt;strong&gt;Best&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Which one should you choose?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Quick Prototypes:&lt;/strong&gt; Stick with &lt;strong&gt;UUID v4&lt;/strong&gt;. It’s easy and requires zero setup.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Modern Web Apps:&lt;/strong&gt; Move to &lt;strong&gt;UUID v7&lt;/strong&gt;. You get the simplicity of UUIDs with the performance of time-sortable IDs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;High-Scale Systems:&lt;/strong&gt; Go with &lt;strong&gt;Snowflake&lt;/strong&gt;. When every byte and every millisecond of database latency matters, 64-bit sortable IDs are the only way to go.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Golden Rule:&lt;/strong&gt; You can't just "trim" a UUID to make it shorter. Trimming 128 bits down to 6 characters for a "short link" throws away 92 bits of entropy, turning a global guarantee into a collision nightmare.&lt;/p&gt;

&lt;p&gt;For a full deep dive into the math and architecture behind distributed IDs, check out the case study at &lt;a href="https://leetdezine.com/03-Case-Studies/01-Foundation/01-Unique-ID-Generator/" rel="noopener noreferrer"&gt;LeetDezine&lt;/a&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>snowflake</category>
      <category>distributedsystems</category>
      <category>backend</category>
    </item>
  </channel>
</rss>
