Sophia Bennett
I Built a Notification System That Handles 10M Messages a Day: Here's What I Learned About System Design Interviews

Last week I sat down to design a notification system from scratch. Push notifications, SMS, email, the whole stack, supporting 10 million daily alerts with sub-200ms latency and zero tolerance for data loss. Sounds like a typical Tuesday at a big tech interview, right?

Except this time I wasn't whiteboarding alone in front of a panel hoping I remembered to mention idempotency. I was actually wiring up the architecture, watching costs tick up per hour, and getting real feedback on whether my design would survive contact with reality.

Here's the thing about system design interview prep that nobody tells you early enough: reading "Designing Data-Intensive Applications" and watching YouTube videos about CAP theorem will teach you concepts, but it won't teach you the actual skill being tested, which is decision making under constraints.

The Problem With Most System Design Practice

Most people prepare for system design interviews in one of two ways. They either memorize a handful of "famous" designs (design Twitter, design Uber, design a URL shortener) and hope the interviewer asks something close enough, or they pair up with a friend for a mock interview and spend half the session arguing about whether the friend's feedback is even correct.

Neither approach builds the muscle you actually need: thinking through tradeoffs in real time, justifying component choices, and reacting when requirements shift.

When I was prepping for interviews, the gap I kept hitting wasn't conceptual. I knew what a message queue was. I knew what a dead letter queue was for. What I didn't have was reps. Actual practice building these systems end to end, with constraints that forced real decisions instead of textbook answers.

What Actually Changed My Prep

I started treating system design practice the way I'd treat LeetCode grinding, except instead of solving the same two-pointer problem for the fortieth time, I was assembling architectures piece by piece and seeing immediate consequences.

For the notification system challenge, the requirements were specific and unforgiving:

  • Support push, SMS, and email delivery
  • Handle retries on failure
  • Process 10 million daily alerts
  • Keep latency under 200ms
  • Guarantee zero data loss
  • Everything has to be asynchronous
  • Stay under a fixed hourly budget

That budget constraint changes everything. Suddenly "just add more servers" isn't a free answer. You start asking yourself the questions interviewers actually want to hear. Do I need three separate notification services, one per channel, or can one service handle routing logic and delegate to channel-specific workers? Where does the queue sit so a slow SMS provider doesn't block email delivery? What happens to a notification that fails three times: does it disappear, or does it land somewhere you can inspect and replay?

I ended up landing on an API Gateway routing to dedicated notification services per channel, an MSK-backed (managed Kafka) queue decoupling ingestion from delivery, separate delivery workers per channel so failures don't cascade, and a dead letter queue catching anything that exhausts its retries. Monitoring and a logging layer sat underneath the whole thing because "zero data loss" is a claim you need to be able to prove, not just assert.
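To make that shape concrete, here's a minimal in-memory sketch of per-channel queues with bounded retries and a dead letter queue. It is an illustration under assumptions, not the system described above: `Notification`, `ChannelPipeline`, `route`, and the `send` callbacks are hypothetical names, plain Python containers stand in for the real queue, and the retry limit of three is borrowed from the "fails three times" question.

```python
from collections import deque
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # assumed retry limit before a message is dead-lettered


@dataclass
class Notification:
    id: str
    channel: str  # "push", "sms", or "email"
    payload: str
    attempts: int = 0


class ChannelPipeline:
    """One queue and one dead letter queue per channel (in-memory stand-ins)."""

    def __init__(self, send):
        self.send = send          # hypothetical provider call; returns True on success
        self.queue = deque()
        self.delivered = []
        self.dead_letters = []    # kept for inspection and replay, never dropped

    def drain(self):
        """Process the queue until every message is delivered or dead-lettered."""
        while self.queue:
            note = self.queue.popleft()
            note.attempts += 1
            if self.send(note):
                self.delivered.append(note)
            elif note.attempts >= MAX_ATTEMPTS:
                self.dead_letters.append(note)
            else:
                self.queue.append(note)  # requeue for another attempt


def route(pipelines, note):
    """Routing step: each channel has its own queue, so failures stay isolated."""
    pipelines[note.channel].queue.append(note)


def always_ok(note):
    return True


def always_fail(note):
    return False  # simulates a provider outage


pipelines = {
    "push": ChannelPipeline(always_ok),
    "sms": ChannelPipeline(always_fail),
    "email": ChannelPipeline(always_ok),
}
for i, channel in enumerate(["push", "sms", "email", "sms"]):
    route(pipelines, Notification(id=f"n{i}", channel=channel, payload="hello"))
for pipeline in pipelines.values():
    pipeline.drain()

for name, pipeline in pipelines.items():
    print(name, "delivered:", len(pipeline.delivered), "dead-lettered:", len(pipeline.dead_letters))
```

With the SMS sender hard-wired to fail, both SMS notifications end up in the SMS dead letter list after three attempts each, while push and email deliver normally. A real worker would also wait between attempts (for example with exponential backoff) instead of retrying immediately.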

None of that is exotic. But getting there by actually building it, watching the cost meter, and getting nudged when something looked off was a completely different learning experience than reading a blog post about the same architecture.

Why Building Beats Watching

There's a specific moment in system design prep where things click, and it's not when you finish a course or a book. It's when you make a wrong call, see why it's wrong, and fix it yourself.

Say you design a notification system with one shared queue for both push and SMS. SMS providers are notoriously slow and flaky, so your push notifications end up bottlenecked behind retry storms from SMS failures. You can read that warning in an article. Or you can build it, watch your design choke, and never forget the lesson again.
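A toy model is enough to see that bottleneck. The sketch below assumes a single worker that processes jobs strictly in arrival order, and the per-message costs (`PUSH_MS`, `SMS_MS`) are made-up numbers chosen only to show the effect, not measurements from any provider.

```python
PUSH_MS = 10   # assumed time to send one push notification
SMS_MS = 500   # assumed time for one slow SMS provider call


def completion_times(jobs):
    """Process jobs in FIFO order on one worker; return (channel, finish time in ms)."""
    clock = 0
    finished = []
    for channel, cost_ms in jobs:
        clock += cost_ms
        finished.append((channel, clock))
    return finished


def worst_push_latency(finished):
    """Finish time of the slowest push notification."""
    return max(t for channel, t in finished if channel == "push")


# Ten SMS jobs arrive just ahead of ten push jobs.
sms_jobs = [("sms", SMS_MS)] * 10
push_jobs = [("push", PUSH_MS)] * 10

# Shared queue: push jobs wait behind every SMS job.
shared = worst_push_latency(completion_times(sms_jobs + push_jobs))
# Per-channel queues: the push worker only ever sees push jobs.
isolated = worst_push_latency(completion_times(push_jobs))

print(f"shared queue: slowest push finishes at {shared} ms")
print(f"per-channel queues: slowest push finishes at {isolated} ms")
```

Under these assumptions the slowest push notification finishes at 5100 ms on the shared queue and at 100 ms with its own queue. Retries on the SMS side would only widen that gap.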

That's the difference between system design courses that hand you diagrams to memorize and a system design lab where you're the one placing components, connecting them, and dealing with the fallout of bad choices.

A Practical Way to Practice

If you're prepping for interviews right now, here's what I'd actually recommend instead of just rereading the same five blog posts about designing a chat app:

Treat constraints as the real test. Latency targets, budget caps, and data guarantees aren't flavor text. They're the actual interview question. Anyone can draw boxes and arrows. The skill is justifying why this box and not that one, given the numbers you were handed.
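Turning the numbers you were handed into rates is usually the first step. Here's a quick back-of-envelope sketch for the notification challenge: the 10 million daily messages and the 200ms target come from the requirements, while the 5x peak factor and the `capacity_estimate` helper are assumptions for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def capacity_estimate(daily_messages, peak_factor, latency_s):
    """Return (average msg/s, peak msg/s, messages in flight at peak)."""
    average_rps = daily_messages / SECONDS_PER_DAY
    peak_rps = average_rps * peak_factor
    # Little's law: items in the system = arrival rate * time spent in the system.
    in_flight = peak_rps * latency_s
    return average_rps, peak_rps, in_flight


# 10 million per day and 200 ms come from the requirements;
# the 5x peak factor is an assumption for illustration.
average_rps, peak_rps, in_flight = capacity_estimate(
    10_000_000, peak_factor=5, latency_s=0.2
)

print(f"average: {average_rps:.0f} msg/s")
print(f"assumed peak: {peak_rps:.0f} msg/s")
print(f"in flight at peak: {in_flight:.0f} messages")
```

Ten million a day sounds huge, but spread evenly it is only about 116 messages per second. The interesting design pressure comes from the peaks and the latency target, which is why the peak factor is worth stating out loud as an assumption.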

Practice failure scenarios, not just happy paths. What happens when your queue backs up? What happens when one delivery channel goes down? What happens when you hit 10x traffic? If your practice sessions never break, you're not really practicing.
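The 10x traffic question can be rehearsed with simple arithmetic before you build anything. This is a rough sketch that treats arrival and delivery as constant rates. The worker capacity, the one-minute spike, and both helper functions are hypothetical.

```python
def backlog_after(arrival_rps, service_rps, seconds, start_backlog=0.0):
    """Messages waiting in the queue after `seconds` at the given rates."""
    growth = (arrival_rps - service_rps) * seconds
    return max(0.0, start_backlog + growth)


def drain_seconds(backlog, arrival_rps, service_rps):
    """Time to clear a backlog, or None if the workers never catch up."""
    spare = service_rps - arrival_rps
    if spare <= 0:
        return None
    return backlog / spare


NORMAL_RPS = 116              # roughly 10 million messages spread evenly over a day
WORKER_RPS = 300              # assumed total delivery capacity
SPIKE_RPS = NORMAL_RPS * 10   # the "10x traffic" scenario

backlog = backlog_after(SPIKE_RPS, WORKER_RPS, seconds=60)
recovery = drain_seconds(backlog, NORMAL_RPS, WORKER_RPS)

print(f"backlog after a 60 s spike: {backlog:.0f} messages")
print(f"time to drain at normal load: {recovery:.0f} s")
```

With these made-up numbers, a one-minute spike leaves 51,600 messages queued and takes several minutes to clear. During that time nothing near the front of the spike is meeting a 200ms target, which is exactly the kind of consequence worth being able to talk through.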

Get feedback that's specific to your design, not generic. A mock interview with a friend who's also prepping is fine, but it's not the same as something that flags "hey, your retry logic here introduces a single point of failure" while you're actually building.

Repeat with variation. Don't design the same notification system five times. Design a URL shortener, then a rate limiter, then a chat system, then come back to messaging systems with a different constraint set. The patterns start showing up, and so does your confidence.

I've been using Scaledojo for exactly this kind of structured practice. It's basically a sandbox where you assemble real architectures against specific design challenges, see live cost and latency tradeoffs as you build, and get nudged when a connection or component choice doesn't hold up. It scratches the itch of "I want to actually build this, not just talk about building it," which for me was the missing piece between knowing system design concepts and being able to perform under interview pressure.

The Real Takeaway

System design interviews aren't testing whether you've memorized the architecture for Twitter. They're testing whether you can take ambiguous requirements, apply real constraints, and make defensible engineering decisions out loud, in real time, while someone watches.

The only way to get good at that is to do it, over and over, with feedback loops tight enough that you actually learn from each rep instead of just feeling busy.

So if your prep so far has mostly been reading and watching, try flipping it around. Pick a system, set real constraints on yourself, build it, break it, and fix it. Your future interviewer will be able to tell the difference between someone who memorized a diagram and someone who's actually done the work.

What's the hardest system design question you've run into in an interview? I'd genuinely like to know, so drop it in the comments.

Top comments (1)

Mamoor Ahmad

Impressive 👍😍👍