31/60 Days System Design Questions!

#beginners #backend #database #systemdesign

Your e-commerce platform just crossed 2M daily active users. 70% are in the US, 30% in Europe.

Latency complaints are piling up from European users — 380ms average round-trip to your US-East region. Support tickets are up 40%. Black Friday is in 6 weeks.

Your infrastructure: single AWS us-east-1 region, RDS PostgreSQL (primary), Redis cache, 12 microservices behind an API Gateway.

You need to get European latency under 80ms. The engineering team is debating four approaches.

Here's your constraint: you cannot afford a full database rewrite, and you need this shipped before Black Friday.

A) Active-Active multi-region — deploy the full stack in eu-west-1, use a distributed database (CockroachDB or Aurora Global), route users to the nearest region. Writes go to both regions simultaneously.

B) Active-Passive with read replicas — keep us-east-1 as primary, spin up eu-west-1 as a hot standby with read replicas. European reads go local, writes still go to US. Failover in minutes if US goes down.

C) CDN + Edge caching — keep the single region, push static assets and cacheable API responses to CloudFront edge nodes in Europe. No database changes.

D) Active-Active with eventual consistency — deploy full stack in both regions, allow each region to own its writes, sync asynchronously. Accept that a European user might see a US write 200ms late.

Three of these are real patterns production teams use. Only one actually solves the problem you have — under your constraints, before Black Friday.

Pick one — A, B, C, or D — and tell me why. Full breakdown in the comments.

If your team has been in this exact debate, share this with them. The tradeoffs are what matter.

Drop your answer 👇

#30DaysOfSystemDesign #SystemDesign #DistributedSystems #SoftwareArchitecture

Top comments (5)

Joud Awad • Jun 6

Why B wins:
The constraint is the answer. You can't rewrite the database, and you have 6 weeks.

European users are complaining about read latency: browsing products, checking order status, loading their profile. Those are reads. An RDS read replica in eu-west-1 costs you 1–2 days of infra work and immediately moves European reads to ~15ms instead of 380ms. Writes still go to us-east-1 (acceptable, since users tolerate slightly higher write latency for checkout). Replication lag is typically under 100ms, so a read that lands on the replica right after a write can briefly return stale data.

Active-passive also gives you a real failover story: US goes down → promote EU replica in minutes.
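To make the read/write split concrete, here is a minimal sketch of the routing decision the EU stack would make per query. The endpoint names, the `RegionRouter` class, and the one-second pin window are all assumptions for illustration, not part of the original setup. The pin window is one common way to paper over replication lag: after a user writes, their reads go to the primary for a short time so they see their own change.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RegionRouter:
    """Pick a database endpoint for each query issued from the EU stack."""
    primary: str               # writer in us-east-1 (hypothetical endpoint name)
    replica: str               # read replica in eu-west-1 (hypothetical endpoint name)
    pin_seconds: float = 1.0   # assumed safety margin above typical replication lag
    _last_write: Dict[str, float] = field(default_factory=dict)

    def endpoint_for(self, user_id: str, is_write: bool,
                     now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        if is_write:
            # All writes cross the Atlantic to the single primary.
            self._last_write[user_id] = now
            return self.primary
        wrote_at = self._last_write.get(user_id)
        if wrote_at is not None and now - wrote_at < self.pin_seconds:
            # Recent writer: read from the primary so they see their own write.
            return self.primary
        # Everyone else reads locally.
        return self.replica


router = RegionRouter(primary="us-east-1-writer", replica="eu-west-1-replica")
print(router.endpoint_for("u1", is_write=False, now=0.0))   # local replica
print(router.endpoint_for("u1", is_write=True, now=10.0))   # primary
print(router.endpoint_for("u1", is_write=False, now=10.5))  # still pinned to primary
```

In a real deployment the last-write timestamp would need to live somewhere shared (a cookie or the session store), since an in-memory dict only works for a single process.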

Joud Awad • Jun 6

Why A is the trap (Active-Active + distributed DB):
This is the "correct at scale" answer that kills teams before Black Friday. Migrating from RDS PostgreSQL to CockroachDB or Aurora Global is a multi-quarter project, not a 6-week one: distributed transactions, clock skew across regions, new failure modes to test. It's the right architecture for the long term and the wrong answer for the problem as stated.

Joud Awad • Jun 6

Why C is wrong (CDN + Edge caching):
CDN solves static asset latency — images, JS, CSS. It does nothing for dynamic reads (user cart, order history, account data) and zero for write latency. It's a layer you add on top of a real solution, not instead of one. Partial credit, not the answer.
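A rough way to see where the CDN stops helping is to write down which responses an edge node is even allowed to keep. The sketch below is an assumption-heavy illustration: the paths, the `cache_control_for` helper, and the TTL values are hypothetical, and the only real parts are the standard `Cache-Control` directives. Static assets and anonymous catalog reads can live at the edge. Anything tied to a user still makes the full trip to us-east-1.

```python
STATIC_SUFFIXES = (".js", ".css", ".png", ".jpg", ".svg", ".woff2")


def cache_control_for(path: str, has_session: bool) -> str:
    """Return a Cache-Control header value for a response (illustrative policy)."""
    if path.endswith(STATIC_SUFFIXES):
        # Fingerprinted static assets: safe to cache at the edge for a year.
        return "public, max-age=31536000, immutable"
    if path.startswith("/api/products") and not has_session:
        # Anonymous catalog reads: a short shared-cache TTL is usually tolerable.
        return "public, s-maxage=60"
    # Cart, order history, account data: per-user, so the edge cannot serve it.
    return "private, no-store"


for path, session in [("/static/app.js", False),
                      ("/api/products/42", False),
                      ("/api/cart", True)]:
    print(path, "->", cache_control_for(path, session))
```

The last branch is the point: the requests users are complaining about mostly fall through to `private, no-store`, so the origin latency is unchanged.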

Joud Awad • Jun 6

Why D is dangerous (Active-Active + eventual consistency):
This is the most honest version of multi-region, and the most dangerous to deploy without prep. Eventual consistency across regions means a European user can place an order while your US region still sees the old inventory count for 200ms. With limited stock, that window means overselling. At checkout, it can mean duplicate charges. Conflict resolution is a product problem disguised as an infra problem.
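The oversell case is easy to reproduce in a toy model. This is a sketch only: the `Region` class and `sync` function are hypothetical stand-ins for two regional databases and an asynchronous replication step, with one unit of stock and one order arriving in each region before the sync runs.

```python
class Region:
    """A region that accepts orders against its own local copy of the stock."""

    def __init__(self, name: str, stock: int):
        self.name = name
        self.stock = stock
        self.outbox = []  # local decrements not yet replicated to the peer

    def place_order(self, qty: int = 1) -> bool:
        # The check only sees local state, which may be stale.
        if self.stock >= qty:
            self.stock -= qty
            self.outbox.append(qty)
            return True
        return False


def sync(a: Region, b: Region) -> None:
    """Apply each region's pending decrements to the other (async replication)."""
    a_pending, b_pending = a.outbox, b.outbox
    a.outbox, b.outbox = [], []
    b.stock -= sum(a_pending)
    a.stock -= sum(b_pending)


us = Region("us-east-1", stock=1)
eu = Region("eu-west-1", stock=1)

us_ok = us.place_order()  # accepted: US sees 1 in stock
eu_ok = eu.place_order()  # accepted: EU has not seen the US order yet
sync(us, eu)

print(us_ok, eu_ok, us.stock, eu.stock)  # True True -1 -1
```

Both orders succeed and both regions converge on a stock of -1: the system is consistent, and one customer is getting an apology email. Deciding who that is (first writer wins, reservations, per-region stock allocation) is the product decision the comment is pointing at.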
