"Your cache just died at 30,000 RPS. Walk me through what happens."
Most candidates pause, then describe it in the abstract. The cache goes down, requests hit the database, the database gets overwhelmed. Technically correct. Completely unconvincing.
The candidate who has actually watched a cache stampede answers differently:
"p99 goes from 48ms to around 2,400ms in under a second. Connection pool exhaustion on the database is what causes it, not the database being slow, but the number of concurrent connections queued waiting for a slot. At 30K RPS with a 0% cache hit rate and a database connection pool sized for 10% of that traffic, the pool saturates in roughly 400 milliseconds. Error rate climbs from 0.1% to 34% before any circuit breaker fires."
That answer comes from having seen it happen, not from having read about it. Chaos engineering is how you see it happen before an interview, or before production does it for you.
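The 400 millisecond figure in that answer can be sanity-checked with simple arithmetic. The sketch below is a rough model, not a measurement: it assumes requests arrive at a constant rate, the database completes queries at a fixed maximum rate, and the pool fills at the difference between the two. The function name and the 22,500 queries-per-second drain rate are hypothetical values picked for illustration; the 3,000-connection pool is the figure used in the narration later in this post.

```python
def seconds_until_pool_full(arrival_rps: float, drain_rps: float,
                            pool_size: int, in_use: int = 0) -> float:
    """Time for a connection pool to fill when arrivals outpace completions.

    Assumes a constant arrival rate and a constant completion (drain) rate,
    which real systems only approximate.
    """
    if arrival_rps <= drain_rps:
        return float("inf")  # under this model the pool never fills
    free_slots = pool_size - in_use
    return free_slots / (arrival_rps - drain_rps)


# Hypothetical numbers: 30K RPS reaching origin, a 3,000-connection pool,
# and a database that can complete 22,500 queries per second.
t = seconds_until_pool_full(arrival_rps=30_000, drain_rps=22_500, pool_size=3_000)
print(f"pool saturates after ~{t * 1000:.0f} ms")  # pool saturates after ~400 ms
```

Under these assumptions, a slower database (a lower drain rate) fills the pool even sooner, which is why the saturation window is measured in hundreds of milliseconds rather than seconds.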
Why system design interviews now test failure reasoning
The shift is real and it has happened over the last two to three years. Drawing the architecture is the entry bar, not the differentiator. Interviewers at companies running large distributed systems are explicitly looking for failure reasoning: what your design does when a component behaves badly, how failure propagates, and what you designed to contain it.
The question "what happens when your cache fails?" is not a curveball. It is a standard follow-up to any design that includes a cache.
So is:
- "What happens when your primary database goes down?"
- "What does your system do when the payment processor is slow but not timing out?"
- "What happens if all your clients reconnect at the same moment after a 30-second outage?"
Each of those maps to a named chaos engineering scenario. Knowing the scenario means knowing what metric changes first, what cascade follows, and what design decision prevents or contains it.
Four categories, eight scenarios, eight interview questions
Network chaos
Latency injection
What the interviewer is testing with: "Your third-party payment processor starts responding slowly. Your checkout service calls it synchronously. Walk me through the impact on your cart API."
What latency injection shows: In a synchronous call chain, injected latency is passed to the caller in full. A 200ms injection on one downstream service adds 200ms to every request that touches it. If your checkout service makes five synchronous calls and one of them is slow, end-to-end p99 climbs by the full injection amount, not a fraction of it.
The design decision being tested is whether you have identified which dependencies should be async and which genuinely need to be synchronous.
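To make the arithmetic concrete, here is a minimal sketch of how latency adds up along a synchronous request path and what moving one call off that path changes. The service names and millisecond values are made up for illustration, and the model assumes the calls run one after another.

```python
def sync_chain_latency_ms(call_latencies_ms: dict[str, float]) -> float:
    """Sequential synchronous calls: end-to-end latency is the sum."""
    return sum(call_latencies_ms.values())


def latency_with_async(call_latencies_ms: dict[str, float],
                       async_calls: set[str]) -> float:
    """Calls moved off the request path (for example, onto a queue) no longer
    contribute to the latency the user sees."""
    return sum(ms for name, ms in call_latencies_ms.items()
               if name not in async_calls)


# Hypothetical checkout path with five downstream calls.
baseline = {"cart": 10, "inventory": 15, "pricing": 10, "payment": 40, "email": 20}
injected = {**baseline, "payment": baseline["payment"] + 200}  # +200 ms injection

print(sync_chain_latency_ms(baseline))           # 95
print(sync_chain_latency_ms(injected))           # 295
print(latency_with_async(injected, {"email"}))   # 275
```

The payment call stays on the request path here because checkout presumably needs its result. The confirmation email does not, so it is the kind of dependency that can be made async.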
Network partition
What the interviewer is testing with: "Your cache and your application servers can't communicate. What does your system do?"
What network partition shows: Whichever side of the partition keeps running will exhaust its connection pool trying to reach the other side. Without circuit breakers, callers queue requests that will never complete.
A partition is where the CAP theorem stops being theoretical. You are making a real choice between consistency and availability, and your design either has an explicit answer or it does not.
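A circuit breaker is the usual way to stop callers from queueing requests that will never complete. The sketch below is a deliberately small, single-threaded version written for this post, not the API of any particular library. The class name, the thresholds, and the injectable `clock` parameter are all assumptions for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_after_s=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("failing fast, dependency presumed unreachable")
            # Cooldown elapsed: fall through and allow a trial call (half-open).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers get an immediate error they can turn into a fallback (serve stale data, skip the cache, degrade the feature) instead of holding a connection until a timeout. A production version would also need locking and a limit on concurrent trial calls.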
Infrastructure chaos
Node failure
What the interviewer is testing with: "One of your three application servers goes down. What happens to traffic?"
What node failure shows: Single points of failure that were not obvious on the diagram become obvious when you kill one component and watch traffic stack on the survivors. If your load balancer is not health-checking aggressively enough, it keeps routing to the dead node. If your remaining nodes were sized assuming N capacity rather than N-1, they saturate immediately.
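The N versus N-1 sizing point is easy to check with a few lines. This is a rough sketch that assumes the load balancer spreads traffic evenly across surviving nodes; the fleet size, traffic, and per-node capacity are hypothetical.

```python
def utilization_after_failures(total_rps: float, nodes: int,
                               per_node_capacity_rps: float,
                               failed: int = 1) -> float:
    """Fraction of capacity used on each survivor, assuming even load balancing."""
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    return (total_rps / survivors) / per_node_capacity_rps


# Hypothetical fleet: 3 nodes, 9,000 RPS total, each node good for 4,000 RPS.
print(utilization_after_failures(9_000, 3, 4_000, failed=0))  # 0.75
print(utilization_after_failures(9_000, 3, 4_000, failed=1))  # 1.125
```

A fleet that looks comfortable at 75% utilization is over capacity the moment one node of three disappears.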
Cache stampede
What the interviewer is testing with: "You do a rolling deploy and your cache gets cleared. It's 2pm on a Tuesday. Walk me through the next 60 seconds."
What cache stampede shows: Forcing the cache hit rate to near zero at meaningful RPS reveals whether your origin can handle the full load. Usually it cannot, because the origin was sized assuming the cache absorbs 80 to 95% of reads.
The database connection pool saturates, queue depth climbs, and p99 spikes.
The design decisions it tests:
- Probabilistic early expiration, so the cache never goes fully cold.
- Request coalescing, so a thousand simultaneous cache misses for the same key only produce one origin request.
- Write-through caching, so writes repopulate the cache and it does not stay empty after a deploy.
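Request coalescing is the least intuitive of the three, so here is a minimal in-process sketch of the idea. The `SingleFlight` class and its `do` method are names chosen for this example, not a standard library API, and the sketch only coalesces callers inside one process.

```python
import threading


class SingleFlight:
    """Coalesce concurrent loads of the same key into one origin call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> (done_event, result_holder)

    def do(self, key, load):
        with self._lock:
            entry = self._in_flight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._in_flight[key] = entry
        done, holder = entry
        if leader:
            # Only the first caller for this key goes to the origin.
            try:
                holder["value"] = load()
            except Exception as exc:
                holder["error"] = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                done.set()
        else:
            # Everyone else waits for the leader's result.
            done.wait()
        if "error" in holder:
            raise holder["error"]
        return holder["value"]
```

A cache-miss handler would call something like `flights.do(key, lambda: query_database(key))`, where `query_database` is whatever origin lookup you already have. Callers that arrive while a load for that key is in flight share its result instead of opening their own database connection.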
Data layer chaos
Replication lag
What the interviewer is testing with: "Your read replicas are falling 8 seconds behind the primary. Which users notice, and what do they experience?"
What replication lag shows: Write-then-read flows break first. A user updates their profile photo and immediately refreshes. The replica serves the old photo.
The design decision it tests is whether you have identified which flows require reading from the primary versus which can tolerate eventual consistency. Most designs treat all reads as replica-safe. Replication lag reveals which ones are not.
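One common way to protect write-then-read flows is to pin a user's reads to the primary for a short window after they write. The sketch below shows the routing decision only. The `ReadRouter` class, the 10-second window, and the in-memory dictionary are assumptions for illustration; a real deployment would need shared state (a cookie or session flag, for example) and a window chosen from measured replication lag.

```python
import time


class ReadRouter:
    """Send a user's reads to the primary for a short window after they write."""

    def __init__(self, pin_seconds=10.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # should exceed the worst lag you expect
        self.clock = clock
        self._last_write = {}  # user_id -> time of most recent write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id):
        wrote_at = self._last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            return "primary"
        return "replica"
```

With replicas 8 seconds behind, the user who just changed their profile photo reads from the primary and sees the new one, while everyone else keeps reading from replicas.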
Connection pool exhaustion
What the interviewer is testing with: "Your database is running at 30% CPU but your application is throwing connection timeout errors. What's happening?"
What connection pool exhaustion shows: A database can be entirely healthy while the application layer is unable to use it, because the connection pool is full of connections that are slow to complete.
This scenario teaches that pool utilization is a better leading indicator than database CPU. The design decision it tests is how you size pools and whether your pool utilization alerts fire before exhaustion, not after.
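Little's Law explains why a healthy database can still starve the application: the average number of connections in use equals the request rate multiplied by how long each request holds a connection. The sketch below applies that to a pool and adds a simple alert threshold. The function names, the 80% warning level, and the traffic numbers are hypothetical.

```python
def connections_in_use(requests_per_second: float, hold_seconds: float) -> float:
    """Little's Law: average in-flight = arrival rate x time each one is held."""
    return requests_per_second * hold_seconds


def pool_alert(demand: float, pool_size: int, warn_at: float = 0.8) -> str:
    """Classify connection demand against pool size."""
    utilization = demand / pool_size
    if utilization >= 1.0:
        return "exhausted"
    return "warn" if utilization >= warn_at else "ok"


# Hypothetical: 2,000 queries/s through a 100-connection pool.
for hold_ms in (10, 45, 60):
    demand = connections_in_use(2_000, hold_ms / 1000)
    print(hold_ms, "ms ->", round(demand), "connections,", pool_alert(demand, 100))
```

The request rate never changes in this example. Holding each connection for 60ms instead of 10ms is enough to move the pool from comfortable to exhausted, and database CPU would show nothing unusual.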
Traffic chaos
Request spike
What the interviewer is testing with: "Your service goes viral. Traffic goes from 5,000 to 50,000 RPS in four minutes. What breaks first?"
What a request spike shows: The first component to saturate is always the one with the smallest capacity headroom, which is often not the component you expected. At 10x traffic, the cache usually holds because hit rate stays high. The database write path saturates first if writes are not queued.
Auto-scaling helps, but it has a lag. The design decision it tests is what happens between the spike starting and the new capacity coming online.
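A quick way to reason about that gap is to estimate how many requests arrive above capacity before the new instances are serving. This sketch simplifies the spike to a step change and assumes no extra capacity appears until the lag ends; the 20K RPS capacity and 90-second scale-out time are made-up inputs.

```python
def backlog_during_scale_lag(arrival_rps: float, capacity_rps: float,
                             lag_seconds: float) -> float:
    """Requests that must be queued or shed before new capacity arrives.

    Assumes a step increase in traffic and no extra capacity until the lag ends.
    """
    excess_rps = max(0.0, float(arrival_rps) - capacity_rps)
    return excess_rps * lag_seconds


# Hypothetical: traffic at 50K RPS, fleet good for 20K RPS, 90 s to scale out.
print(backlog_during_scale_lag(50_000, 20_000, 90))  # 2700000.0
```

Whatever that number turns out to be for your system, the design needs a stated plan for it: queue it, shed it, or serve a degraded response.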
Thundering herd
What the interviewer is testing with: "Your service goes down for 30 seconds and comes back up. All your clients try to reconnect at the same moment. What happens?"
What thundering herd shows: Coordinated load is more damaging than the same volume of load spread over time. A million clients reconnecting in the same two-second window produce a load spike that no steady-state capacity planning accounts for.
The design decisions it tests are jittered reconnection backoff on the client side and request coalescing or rate limiting on the server side.
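On the client side, jittered backoff can be as small as the sketch below. It uses the "full jitter" variant, where each client waits a random time between zero and an exponentially growing ceiling. The function name, base delay, and cap are assumptions for illustration.

```python
import random
from typing import Optional


def reconnect_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0,
                    rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random  # fall back to the module-level generator
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Five hypothetical clients on their third retry pick different delays,
# so their reconnects spread across the window instead of landing together.
for client in range(5):
    print(f"client {client}: wait {reconnect_delay(3):.2f}s")
```

Without the random component, every client that disconnected at the same moment retries at the same moment, and the herd simply reforms on each attempt.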
How to narrate a chaos scenario in an interview
The structure that works:
- State the steady state first.
- Describe the injection.
- Walk through the cascade in order.
- Give specific numbers.
- State your design fix.
Steady state:
"At 30K RPS with a warm cache, p99 is 48ms and error rate is 0.1%."
Injection:
"The cache restarts, so hit rate drops to near zero."
Cascade:
"Origin requests spike by a factor of 10. The database connection pool is sized for 3,000 concurrent connections. At 30K RPS hitting origin, that pool exhausts in under 500 milliseconds."
Numbers:
"p99 climbs to 2,400ms. Error rate hits 34%."
Fix:
"Request coalescing at the cache layer means a thousand simultaneous misses for the same key produce one origin request. Probabilistic early expiration means the cache never goes fully cold during a deploy."
The numbers are what make the answer credible. Abstract descriptions of cascade failure sound like everyone else's answer.
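If the interviewer pushes on the fix, it helps to be able to sketch probabilistic early expiration as well. One published form of the idea, often called XFetch, lets each reader decide at random to refresh a key shortly before it expires, with the odds rising as expiry approaches. The version below is a minimal sketch of that decision only; the function name, parameters, and defaults are assumptions for illustration.

```python
import math
import random
import time
from typing import Callable, Optional


def should_refresh_early(expires_at: float, recompute_seconds: float,
                         beta: float = 1.0, now: Optional[float] = None,
                         rng: Callable[[], float] = random.random) -> bool:
    """Decide whether this reader should refresh the key before it expires.

    recompute_seconds is roughly how long the origin takes to rebuild the value.
    beta > 1 refreshes earlier, beta < 1 refreshes later.
    """
    now = time.time() if now is None else now
    # 1.0 - rng() lies in (0, 1], so the logarithm is always defined.
    early_by = -recompute_seconds * beta * math.log(1.0 - rng())
    return now + early_by >= expires_at
```

Most readers see `False` and serve the cached value. Occasionally one reader sees `True` a little before expiry and refreshes the key, so the value is replaced before the whole fleet misses at once.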
Where to actually practice this
Reading about cache stampede and watching it happen on a live metrics graph are different experiences.
I built a free browser-based chaos engineering simulator with all 28 scenarios: load any blueprint, run traffic at your target RPS, inject a chaos scenario, and watch p99, error rate, and throughput change in real time. No infrastructure, no signup, runs entirely in the browser.
The cache stampede on the Twitter / X Clone blueprint is the most instructive one to start with.
The difference is familiarity
The candidates who answer chaos questions well are not smarter. They have seen the failure mode before.
They know that connection pool exhaustion, not database CPU, is what takes the system down during a cache stampede. They know that injected latency passes in full through synchronous call chains. They know that thundering herd hits hardest in the two seconds after recovery, not during the outage.
That familiarity is the difference between a hand-wave and a convincing answer, and it is entirely learnable before the interview.


