
Aliasgar

Originally published at aliasgarmkantawala.hashnode.dev

Why Daily Transaction Volume is the Wrong Number to Quote in System Design Interviews

The Mistake Most Candidates Make
Picture this: You are in a system design interview. The interviewer asks you to design a ledger system or a payment settlement pipeline. Eager to show you can handle scale, you confidently state, "We'll design this to handle 100K transactions a day."

The interviewer nods. But that number just told them very little about whether you can actually engineer the system.

Leading with total daily volume is one of the most common traps candidates fall into. It sounds like a big, impressive number, but quoting a daily aggregate suggests you are looking at the system through a marketing lens, not an engineering one.

What Actually Matters: The Shape of the Workload
Systems do not experience load as a perfectly smooth, evenly distributed stream over 24 hours. They experience it in waves, spikes, and violent bursts. For designing a resilient architecture, the raw daily number on its own is close to useless. You need three specific dimensions:

Peak TPS (Transactions Per Second): The absolute highest throughput the system must sustain during peak traffic.

Workload Shape: How traffic distributes over time, such as business-hours concentration, sudden marketing-driven spikes, or scheduled batch windows.

Burst Duration: How long the peak load is sustained. Is it a 5-second spike or a 45-minute sustained mountain?
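To make these dimensions concrete, here is a minimal Python sketch that derives peak TPS and burst duration from raw event timestamps. It assumes timestamps are non-negative epoch seconds and buckets them per second; the function name `workload_shape` and the `threshold_tps` cutoff for what counts as a "burst" second are illustrative choices, not part of any particular tool. The per-second counts it builds are the workload shape itself; rolling them up per minute or per hour is what reveals business-hours concentration or batch windows.

```python
from collections import Counter


def workload_shape(timestamps, threshold_tps):
    """Return peak TPS and the longest burst for a list of event timestamps.

    timestamps: non-negative epoch seconds (floats or ints).
    threshold_tps: a second counts as part of a burst if it has at least
    this many events (an illustrative cutoff; tune it per system).
    """
    # Bucket events into 1-second windows: this per-second series is the workload shape.
    per_second = Counter(int(ts) for ts in timestamps)
    if not per_second:
        return {"peak_tps": 0, "longest_burst_s": 0}

    peak_tps = max(per_second.values())

    # Longest run of consecutive seconds at or above the threshold.
    longest = current = 0
    for second in range(min(per_second), max(per_second) + 1):
        if per_second.get(second, 0) >= threshold_tps:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a quieter second ends the current burst
    return {"peak_tps": peak_tps, "longest_burst_s": longest}
```

In practice you would feed it a day of production timestamps and compare the result against the daily total, rather than reasoning from the total alone.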

Let's do the math that most engineers skip in an interview.

If you take 100K transactions and spread them evenly across 24 hours, you get roughly 1 TPS — one transaction per second. A modest, single-core database instance running on a laptop can handle that in its sleep. You don't need a distributed system for 1 TPS; you barely need a framework.
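Spelled out, the naive average is:

```latex
\text{average TPS} = \frac{100{,}000\ \text{transactions}}{24 \times 3600\ \text{s}} = \frac{100{,}000}{86{,}400\ \text{s}} \approx 1.16\ \text{TPS}
```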

But real-world systems don't work that way.

A Real-World Case Study: The Deceptive Average
I ran into this exact problem while building an end-of-day (EOD) reconciliation service for a major instant payment rail, one that processes on the order of a hundred thousand financial transactions daily and must verify every single one against an external banking ledger within a strict compliance window.

On paper, the system processed around 100K transactions a day. Reasonable. Manageable. Nothing alarming.

But the shape of the workload told a completely different story.

Daytime traffic trickled in slowly. Then, as merchants settled their books, payroll batches triggered, and consumers made last-minute transfers, load climbed sharply toward the end of the day. Finally came the reconciliation window itself, where the external bank flushed the final ledger data in a concentrated burst and everything had to be processed before the next business day opened.

When we actually measured the workload shape during that final window, the naive average and the engineering reality had almost nothing in common:

Naive average: ~1 TPS, spread across 24 hours

Actual peak: 22 TPS, concentrated in a 45-minute EOD window

That is roughly a 20x gap between the number you'd quote in a planning meeting and the number your system actually has to survive.
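A small sketch of the same comparison, using the figures above. One assumption to flag: it treats 22 TPS as if it held for the entire 45-minute window, so the window volume it reports is an upper bound rather than a measurement. Against the unrounded daily average of about 1.16 TPS, the peak-to-average ratio comes out at about 19.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def burst_profile(daily_volume, peak_tps, window_minutes):
    """Compare a naive daily average against a sustained peak window."""
    average_tps = daily_volume / SECONDS_PER_DAY
    # Upper bound: assumes the peak rate holds for the whole window.
    window_volume = peak_tps * window_minutes * 60
    return {
        "average_tps": average_tps,
        "peak_to_average": peak_tps / average_tps,
        "window_volume": window_volume,
        "window_share_of_day": window_volume / daily_volume,
    }


profile = burst_profile(daily_volume=100_000, peak_tps=22, window_minutes=45)
print(profile["window_volume"])  # 59400
```

Under that assumption, the 45-minute window alone could carry up to 59,400 transactions, roughly 59% of the day's volume, squeezed into about 3% of the day's seconds.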

Why This Changes Every Engineering Decision
If you design for a 1 TPS average, you might reasonably ask: do we even need a message queue here? A simple cron job running a sequential loop would do.

But when you design for a 22 TPS burst with heavy downstream dependencies, the entire architectural conversation shifts.

Thread pool sizing — your reconciliation worker can no longer be a single-threaded job. You have to reason about concurrency, resource contention, and memory footprints before writing a single line of code.
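One common way to turn a throughput target into a concurrency number is Little's Law: the average number of requests in flight equals the arrival rate times the time each request spends in the system. The sketch below assumes a downstream round-trip of 250 ms, a figure chosen purely for illustration since no latency is given here; at that latency a single sequential worker tops out at about 4 TPS, well short of 22. The `headroom` multiplier is likewise an assumed safety margin for latency variance and retries.

```python
import math


def workers_needed(target_tps, downstream_latency_s, headroom=1.5):
    """Estimate worker concurrency with Little's Law (L = lambda * W).

    target_tps: required throughput (lambda), e.g. the 22 TPS peak.
    downstream_latency_s: average time one call occupies a worker (W); assumed.
    headroom: multiplier for latency variance and retries; assumed.
    """
    in_flight = target_tps * downstream_latency_s  # average concurrent calls
    return math.ceil(in_flight * headroom)


# Illustrative numbers only: 250 ms is an assumption, not a measured latency.
print(workers_needed(22, 0.25, headroom=1.0))  # 6
print(workers_needed(22, 0.25))                # 9
```

The result could then size something like `concurrent.futures.ThreadPoolExecutor(max_workers=...)`, but it is only as good as the latency figure you feed it, which is why measuring downstream latency during the burst window matters.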

Idempotency is non-negotiable — at burst load, network hiccups happen and retries become routine. If your database upsert logic isn't strictly idempotent, you will be reconciling duplicates at 3 AM, wondering why the ledgers don't balance.
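A minimal sketch of an idempotent write, using Python's built-in `sqlite3` so it runs without a server. It assumes each transaction carries a stable unique ID from the upstream rail; the `reconciliation` table, its columns, and `record_reconciled` are hypothetical names. The `INSERT ... ON CONFLICT(txn_id) DO NOTHING` form requires SQLite 3.24 or later and mirrors PostgreSQL's syntax.

```python
import sqlite3


def record_reconciled(conn, txn_id, amount_minor, status):
    """Write one reconciliation result keyed on txn_id.

    Replaying the same txn_id (e.g. after a retry) changes nothing,
    so retries cannot create duplicate rows. Returns True only for
    the first successful write of that txn_id.
    """
    with conn:  # commit on success, roll back on exception
        cur = conn.execute(
            "INSERT INTO reconciliation (txn_id, amount_minor, status) "
            "VALUES (?, ?, ?) ON CONFLICT(txn_id) DO NOTHING",
            (txn_id, amount_minor, status),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reconciliation ("
    "txn_id TEXT PRIMARY KEY, "
    "amount_minor INTEGER NOT NULL, "  # minor units (e.g. cents) avoid float rounding
    "status TEXT NOT NULL)"
)

print(record_reconciled(conn, "txn-001", 2500, "MATCHED"))  # True
print(record_reconciled(conn, "txn-001", 2500, "MATCHED"))  # False: the retry is a no-op
```

`DO NOTHING` assumes a replayed write carries the same payload as the original. In a real ledger you would likely compare the stored row against the incoming one and flag any mismatch instead of silently discarding it.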

Downstream capacity limits — the external banking APIs you are calling have their own strict rate limits. Your 22 TPS ambition is irrelevant if the counterparty throttles you at 5 TPS. This forces you to design token-bucket rate limiters and robust client-side queuing — and to invest in building an in-house downstream simulator that covers all failure paths, not just the happy path.
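A minimal client-side token-bucket sketch for that kind of limit. The 5 TPS cap is the hypothetical figure from the paragraph above; `TokenBucket`, its non-blocking `try_acquire` method, and `drain_seconds` are illustrative names, and a production client would pair the limiter with a queue and retry/backoff. The `drain_seconds` helper shows why the cap matters: if the 45-minute window carried its upper bound of about 59,400 transactions (22 TPS sustained for 45 minutes), pushing them through a 5 TPS limit would take 11,880 seconds, roughly 3.3 hours.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so tests and simulators can control time
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self):
        """Take one token if available; return False instead of blocking."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def drain_seconds(backlog, rate_limit_tps):
    """Minimum time to push `backlog` calls through a downstream capped at rate_limit_tps."""
    return backlog / rate_limit_tps


# Demo with a fake clock so the output is deterministic.
now = [0.0]
bucket = TokenBucket(rate=5, capacity=5, clock=lambda: now[0])
print([bucket.try_acquire() for _ in range(6)])  # five True, then False
print(drain_seconds(59_400, 5) / 3600)           # 3.3 (hours)
```

Injecting the clock keeps the limiter deterministic in tests, and the same hook is one way to drive it from a downstream simulator.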

Pragmatic architecture over greenfield hype — when faced with a compressed burst window and complex recovery paths, a trendy event-driven microservices approach might introduce massive operational complexity. If the external bank operates on a secure, file-based exchange model, a robust file-chunking batch architecture might actually be the more defensible, resilient choice for Phase 1. It's boring, but it's correct.
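For the file-based path, a minimal sketch of chunked processing with a resumable checkpoint. It assumes the settlement file can be read as an iterable of records and that a chunk is the unit of retry; `iter_chunks`, `process_with_checkpoint`, and the `next_chunk` key are hypothetical names, and the checkpoint here is an in-memory dict where a real job would persist it durably. Because a failed chunk is reprocessed from its start, the per-record writes inside it still need to be idempotent.

```python
from itertools import islice


def iter_chunks(records, chunk_size, start_chunk=0):
    """Yield (chunk_index, records) pairs, skipping chunks before start_chunk."""
    it = iter(records)
    index = 0
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        if index >= start_chunk:
            yield index, chunk
        index += 1


def process_with_checkpoint(records, chunk_size, checkpoint, handle_chunk):
    """Process chunks in order, advancing the checkpoint only after a chunk succeeds.

    If handle_chunk raises, the checkpoint still points at the failed chunk,
    so rerunning the job resumes there instead of starting over.
    """
    start = checkpoint.get("next_chunk", 0)
    for index, chunk in iter_chunks(records, chunk_size, start):
        handle_chunk(chunk)
        checkpoint["next_chunk"] = index + 1  # persist durably in a real job
```

The design choice is the boring one the paragraph argues for: progress is a single integer you can inspect, and recovery is "rerun the job".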

The Questions an Interviewer Actually Wants You to Ask
Whenever an interviewer hands you a vague requirement like "Design a system that handles X hundred thousand requests a day," don't immediately start drawing boxes on the whiteboard.

Stop. And ask the single question that separates senior practitioners from theorists:

"What does the peak traffic look like, and over what specific window does it occur?"

From there, you want to understand whether specific business events (market close, midnight clearing, payroll batches) concentrate the load into a predictable window, and what latency and rate limits your downstream dependencies impose during that window.

When you ask these questions, the interviewer knows you have actually operated systems at scale. You are no longer guessing; you are gathering the constraints required to build a real system.

Closing Thought
Daily transaction volume is a marketing number.

Peak TPS over a compressed burst window is an engineering number.

The next time you are in the hot seat, ignore the aggregate daily volume. Figure out the shape of the peak, calculate the true strain on the system, and design for that instead.

If this kind of thinking is what you want to sharpen before your next system design round, I work with senior engineers on exactly this — translating real production experience into interview-ready depth. You can find me at topmate.io/aliasgar_kantawala.
