Nguyen Phuc Hai

Posted on May 21

How I Practice System Design with AI (URL Shortener Walkthrough)

#systemdesign #architecture #ai #interview

I've been doing system design interviews for years - both as a candidate and as an interviewer. The hardest part to practice alone is not the knowledge. It's the process: starting from requirements, running the numbers, evaluating multiple options before committing, and making explicit trade-off arguments at the end.

Reading about it helps. Actually working through it is different.

A few months ago I built a structured multi-step AI plan that walks through a complete system design session end-to-end. I've been using it to practice against different systems. This post shows exactly what it produces, using a URL shortener as the example.

The problem with single-prompt system design

The obvious move when practicing system design with AI is something like:

"You are a senior engineer. Design a URL shortener. Cover requirements, architecture, database schema, caching, and scalability."

You get back something that looks thorough. Redis for caching. PostgreSQL for storage. Load balancers. CDN. The answer ticks the boxes.

But try pushing on it. Why Redis specifically and not Memcached? What QPS number drove that decision? Why 302 and not 301? Why that partitioning key?

The AI cannot give a grounded answer because it did not derive these choices from anything. It pattern-matched from the thousands of URL shortener articles it has seen, and the output sounds right because the training data sounds right.

The other problem: a single prompt collapses the entire design process into one shot. Real system design is sequential. You cannot choose an architecture before running the numbers. You cannot make a trade-off argument before evaluating the alternatives. Skipping the steps does not necessarily make the output wrong, but it makes the reasoning invisible.

A different approach: chain the steps

System design interviews have a well-known format for good reason. You are expected to work through specific phases in order: clarify requirements, estimate scale, propose and compare architecture options, draw the high-level design, deep-dive into the critical components, and close with trade-offs. Skipping phases or doing them out of order is a signal that you do not have a structured approach.

The insight is that this format maps directly onto a multi-step AI workflow. Instead of one big prompt, you instruct the AI to follow the same interview structure, one phase at a time, where each phase builds on the output of the previous one.

I structured the workflow as 7 sequential steps that mirror the formal system design interview format:

| Step | Interview phase | What it does |
| --- | --- | --- |
| 1 - Requirements | Clarify requirements | Clarifies and completes the requirements, fills in missing NFR defaults, states assumptions |
| 2 - Back-of-envelope | Estimate scale | Derives traffic, storage, bandwidth, and cache estimates with arithmetic |
| 3 - Architecture options | Propose options | Proposes 2-3 options with pros/cons, recommends one, produces a comparison diagram |
| 4 - High-level design | High-level design | Component overview, data flow, full architecture diagram |
| 5 - Deep-dive | Deep-dive | Data model, API design, scalability strategies, failure modes table |
| 6 - Trade-offs | Trade-offs | Decision table, known limitations, future improvements |
| 7 - Final doc | - | Assembles everything into a single coherent document |

Each step prompt references prior outputs using {{step_id}} placeholders. By the time the failure modes table is written, the AI knows the exact QPS numbers, the dominant bottleneck, and which architecture was chosen - and why. Nothing is invented in isolation.

I built this using Askimo Plans, which lets you define multi-step AI sessions in YAML. The structure itself is what matters, though: you could implement the same chain in any tool that passes context between steps.
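To make the chaining idea concrete, here is a minimal tool-agnostic sketch in Python. It is not how Askimo implements plans. The `render` and `run_chain` helpers, the `fake_model` stand-in, and the two prompt templates are all hypothetical names invented for illustration; in a real setup `call_model` would wrap whatever AI provider you use.

```python
import re
from typing import Callable, Dict, List, Tuple

# Matches placeholders such as {{requirements}} or {{back-of-envelope}}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w-]+)\s*\}\}")


def render(template: str, outputs: Dict[str, str]) -> str:
    """Replace each {{step_id}} with the output that step already produced."""

    def substitute(match: "re.Match[str]") -> str:
        step_id = match.group(1)
        if step_id not in outputs:
            raise KeyError(f"step '{step_id}' has not run yet")
        return outputs[step_id]

    return PLACEHOLDER.sub(substitute, template)


def run_chain(
    steps: List[Tuple[str, str]],
    call_model: Callable[[str], str],
) -> Dict[str, str]:
    """Run steps in order, feeding earlier outputs into later prompts."""
    outputs: Dict[str, str] = {}
    for step_id, template in steps:
        outputs[step_id] = call_model(render(template, outputs))
    return outputs


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; it just echoes the prompt."""
    return f"<answer to: {prompt}>"


steps = [
    ("requirements", "Clarify the requirements for a URL shortener."),
    ("back-of-envelope", "Estimate scale from: {{requirements}}"),
]
outputs = run_chain(steps, fake_model)
print(outputs["back-of-envelope"])
```

The second prompt contains the full output of the first, which is the whole point: a later step cannot run without the earlier conclusions in front of it.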

Walking through the URL shortener

I provided these inputs:

System: URL Shortener
Functional requirements: shorten URL, redirect to original, custom aliases, click analytics
Non-functional: 99.99% availability, redirect latency < 10ms p99
Scale hints: 500M users, 50M DAU. 200M new URLs/day (2,300 writes/sec avg)

The requirements step does not design anything. It restates what you gave it precisely, fills in missing non-functional defaults, and marks explicit assumptions. One thing worth noting from this run: my input just said "redirect." The AI flagged that a 301 is permanent: browsers cache it, so repeat visits from the same browser go straight to the destination without hitting your service, and those clicks never reach analytics. It surfaced 302 as the correct choice for a shortener that needs click analytics. Small detail, real architectural consequence, and the right place to catch it.
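The 301 versus 302 point can be illustrated with a toy model. The sketch below is a deliberate simplification: `ShortenerService`, `ToyBrowser`, and the `abc123` short code are hypothetical, and a real browser's caching also depends on headers such as Cache-Control, which this model ignores.

```python
from http import HTTPStatus


class ShortenerService:
    """Stand-in for the redirect service; counts every request it sees."""

    def __init__(self, urls):
        self.urls = urls
        self.clicks_seen = 0

    def handle(self, code, status):
        self.clicks_seen += 1
        return status, self.urls[code]


class ToyBrowser:
    """Simplified browser: remembers 301 targets, re-requests everything else."""

    def __init__(self, service, status):
        self.service = service
        self.status = status
        self.cache = {}

    def visit(self, code):
        if code in self.cache:
            # Permanent redirect already cached: the service is never contacted.
            return self.cache[code]
        status, location = self.service.handle(code, self.status)
        if status == HTTPStatus.MOVED_PERMANENTLY:
            self.cache[code] = location
        return location


def clicks_recorded(status, visits=3):
    """Return how many of the visits the service actually observed."""
    service = ShortenerService({"abc123": "https://example.com/some/long/path"})
    browser = ToyBrowser(service, status)
    for _ in range(visits):
        browser.visit("abc123")
    return service.clicks_seen


print("301:", clicks_recorded(HTTPStatus.MOVED_PERMANENTLY))  # 1 of 3 visits seen
print("302:", clicks_recorded(HTTPStatus.FOUND))              # 3 of 3 visits seen
```

In this model the user ends up at the same destination either way; only the service's view of the traffic differs, which is why the choice belongs with the analytics requirement.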

The back-of-envelope step is the one most people skip in practice and get destroyed on in interviews. For this system: 35K peak redirect QPS, 915 GB for URL records, 365 TB for analytics over 5 years, 183 GB cache for hot URLs. That 365 TB number immediately rules out storing analytics in the same database as URLs. The 1000:1 read/write ratio flags this as read-heavy where almost everything depends on cache hit rate. These derived numbers drive every architecture decision that follows.
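The post does not reproduce the step's arithmetic, so the sketch below does not try to re-derive the estimates. It only checks how the quoted figures relate to each other, assuming decimal units (1 TB = 1000 GB) and 365-day years. The function names are made up for illustration.

```python
# Figures quoted in the walkthrough above.
URL_RECORDS_GB = 915
ANALYTICS_TB = 365
CACHE_GB = 183
RETENTION_YEARS = 5

GB_PER_TB = 1000  # assumption: decimal units


def analytics_to_url_ratio(analytics_tb: float, url_records_gb: float) -> float:
    """How many times larger the analytics store is than the URL records."""
    return analytics_tb * GB_PER_TB / url_records_gb


def analytics_growth_gb_per_day(analytics_tb: float, years: int) -> float:
    """Average daily analytics growth implied by the retention total."""
    return analytics_tb * GB_PER_TB / (years * 365)


def cache_fraction(cache_gb: float, url_records_gb: float) -> float:
    """Share of the URL records that the hot-URL cache would hold."""
    return cache_gb / url_records_gb


ratio = analytics_to_url_ratio(ANALYTICS_TB, URL_RECORDS_GB)
growth = analytics_growth_gb_per_day(ANALYTICS_TB, RETENTION_YEARS)
fraction = cache_fraction(CACHE_GB, URL_RECORDS_GB)

print(f"analytics is ~{ratio:.0f}x the URL records")
print(f"analytics grows ~{growth:.0f} GB/day")
print(f"cache holds {fraction:.0%} of the URL records")
```

Under those assumptions the analytics data is roughly 400 times the size of the URL records and grows by about 200 GB a day, which is the gap behind the decision to keep it out of the URL database. The cache figure works out to 20% of the URL records.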

The architecture options step takes those numbers and proposes 2-3 distinct designs with pros/cons and a Mermaid comparison diagram. For this system it landed on a read/write split with async analytics: stateless redirect service backed by Redis and read replicas, separate write service, analytics into Kafka then ClickHouse. The recommendation is justified by the numbers, not by what sounds architecturally fashionable.
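As a rough picture of the recommended read path, the sketch below uses in-memory stand-ins: a dict for Redis, a dict for a read replica, and `queue.Queue` for Kafka. The `RedirectService` class and its method names are hypothetical, and nothing here talks to a real Redis, PostgreSQL, or Kafka client.

```python
import queue
from typing import Dict, Optional


class RedirectService:
    """Stateless redirect path: cache first, replica on miss, async click event."""

    def __init__(self, cache: Dict[str, str], database: Dict[str, str],
                 events: "queue.Queue[dict]") -> None:
        self.cache = cache        # stand-in for Redis
        self.database = database  # stand-in for a read replica
        self.events = events      # stand-in for a Kafka topic

    def resolve(self, code: str) -> Optional[str]:
        url = self.cache.get(code)
        if url is None:
            url = self.database.get(code)
            if url is None:
                return None  # unknown short code, nothing to record
            self.cache[code] = url  # fill the cache for the next reader
        # Enqueue the click and return immediately; a separate consumer
        # would drain the queue into the analytics store.
        self.events.put_nowait({"code": code})
        return url


service = RedirectService(
    cache={},
    database={"abc123": "https://example.com/some/long/path"},
    events=queue.Queue(),
)
print(service.resolve("abc123"))  # first call reads the replica and fills the cache
print(service.resolve("abc123"))  # second call is served from the cache
print(service.events.qsize())     # two click events waiting for the consumer
```

The redirect never waits on the analytics write, which is the property the design relies on to keep analytics out of the redirect latency budget.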

The high-level design, deep-dive, and trade-offs steps each build on what came before. Every sizing decision in the architecture diagram traces back to the estimates. The failure modes table references the bottlenecks identified two steps earlier. The trade-off table - the step most candidates skip - has every row traceable to a prior decision: 302 vs 301 from requirements, Redis over Memcached from the 183 GB cache estimate, Kafka over direct ClickHouse write from the < 10ms p99 redirect SLA.

The design is a starting point, not a dead end

Once the plan finishes, Askimo keeps the full context of the entire run in memory. You can keep the conversation going with follow-up questions and the AI already knows everything it produced.

For example, the high-level design above is deliberately cloud-agnostic. If you want to deploy it on AWS, you can ask:

"Map this architecture to AWS services"

And because the AI has the full context - the 35K peak QPS, the 183 GB Redis cluster, the Kafka pipeline, the ClickHouse analytics store - the response is not generic. It translates each specific component: ElastiCache for Redis, MSK for Kafka, ALB + ECS Fargate for the redirect service, RDS Aurora with read replicas for PostgreSQL, and suggests whether ClickHouse on EC2 or an alternative like Redshift makes more sense given the 365 TB analytics volume.

Same idea for other follow-ups:

"How would I run this on GCP?" - gets you Cloud Memorystore, Pub/Sub, Cloud Run, Cloud SQL
"What would a Kubernetes deployment look like for the redirect service?" - gets you HPA config, resource limits, liveness probes tuned to the latency SLA
"Estimate the monthly AWS cost for this design at 35K QPS" - gets you a rough cost breakdown per service

The context the plan built is what makes these answers useful. Without the prior chain, you get generic cloud mapping. With it, you get something specific to the system you actually designed.

Why the chained context matters

The trade-off table above is not invented. Every row traces back to a decision made in a previous step:

302 vs 301 came from the requirements step, where the AI flagged that click analytics requires 302. A 301 causes browsers to cache the redirect and stop hitting your service, so analytics data stops flowing.
NoSQL over RDBMS follows from the back-of-envelope step. 50 TB of URL mappings and 500K peak QPS rule out a manually sharded PostgreSQL cluster. The NoSQL choice is the direct consequence of those numbers, not a default assumption.
ALB over API Gateway for the read path follows from the back-of-envelope step. At 10B redirects per day, API Gateway's per-request billing becomes cost-prohibitive. The estimate made that visible before any architecture was drawn.
Async analytics via Kafka follows from the < 10ms p99 redirect SLA in requirements. Writing 115K click events per second synchronously on the redirect hot path would blow the latency budget. Kafka decouples the write so the redirect service returns immediately.
DynamoDB Atomic Counters for ID generation follows from the collision guarantee in requirements, combined with the NoSQL-only architecture chosen in the previous step. No ZooKeeper cluster required.
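The per-second rates quoted in this post are daily volumes averaged over the 86,400 seconds in a day. A short check of that conversion, with `per_second` as an illustrative helper name:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Average rate per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


new_urls = per_second(200_000_000)       # 200M new URLs/day from the scale hints
redirects = per_second(10_000_000_000)   # 10B redirects/day from the list above

print(f"{new_urls:,.0f} writes/sec")      # about 2,315, quoted as 2,300
print(f"{redirects:,.0f} clicks/sec")     # about 115,741, quoted as 115K
```

These are averages. Peak rates are higher by whatever peak factor the estimate assumes.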

A single prompt cannot produce this because there is no prior context to reference. The AI just picks whatever sounds reasonable for a URL shortener. The multi-step approach forces the conclusions to be derived, not guessed.

The plan structure (for the curious)

Behind this session is a YAML file with 7 steps. Each step has a single clear goal and receives the outputs of all prior steps as context via {{step_id}} references.

| Step | Goal | Key inputs |
| --- | --- | --- |
| `requirements` | Clarify, complete, and de-ambiguate the inputs | Your functional/NFR/scale hints |
| `back-of-envelope` | Derive traffic, storage, bandwidth, and cache estimates with arithmetic | `{{requirements}}` |
| `architecture-options` | Propose 2-3 distinct options with pros/cons and a Mermaid comparison diagram | `{{requirements}}`, `{{back-of-envelope}}` |
| `high-level` | Component overview, data flow, full architecture diagram | `{{requirements}}`, `{{back-of-envelope}}`, `{{architecture-options}}` |
| `deep-dive` | Data model, API design, failure modes table, sequence diagrams | `{{high-level}}` |
| `tradeoffs` | Decision table, known limitations, future improvements | `{{requirements}}`, `{{high-level}}`, `{{deep-dive}}` |
| `final` | Assemble all outputs into a single shareable document | All prior steps |

The {{step_id}} references are what make the chain work. The back-of-envelope step does not re-read your inputs - it reads the clarified requirements from step 1. The architecture options step does not guess - it sees the numbers from step 2 and the requirements from step 1. Each step does one thing.
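One property of the table is easy to check mechanically: every step only reads from steps that come before it. The sketch below encodes the step ids and inputs from the table as plain Python data and verifies that. It is not the Askimo YAML schema, and `forward_references` is a hypothetical helper.

```python
from typing import List, Tuple

Plan = List[Tuple[str, List[str]]]

# Step ids and their inputs, transcribed from the table above.
PLAN: Plan = [
    ("requirements", []),
    ("back-of-envelope", ["requirements"]),
    ("architecture-options", ["requirements", "back-of-envelope"]),
    ("high-level", ["requirements", "back-of-envelope", "architecture-options"]),
    ("deep-dive", ["high-level"]),
    ("tradeoffs", ["requirements", "high-level", "deep-dive"]),
    ("final", ["requirements", "back-of-envelope", "architecture-options",
               "high-level", "deep-dive", "tradeoffs"]),
]


def forward_references(plan: Plan) -> List[Tuple[str, str]]:
    """Return (step, input) pairs where a step reads a step that has not run yet."""
    seen = set()
    problems = []
    for step_id, inputs in plan:
        for ref in inputs:
            if ref not in seen:
                problems.append((step_id, ref))
        seen.add(step_id)
    return problems


print(forward_references(PLAN))  # an empty list means the order is valid
```

A check like this is useful when you add or reorder steps, since a reference to a step that has not run yet would leave a placeholder with nothing to fill it.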

The full YAML is on GitHub. You can copy it, load it into any Askimo instance, or use it as a template for your own plan variants.

Adapting it to other systems

I've run this against a few different systems now:

Real-time chat - the architecture options step centres on WebSocket vs SSE vs long-polling trade-offs; message ordering and delivery guarantees dominate the deep-dive
Ride-sharing platform - matching latency becomes the dominant constraint; the back-of-envelope gets interesting when you factor in geospatial indexing
Video streaming - storage and CDN costs dominate everything; the trade-offs around pre-encoding vs adaptive bitrate streaming are where the depth is

The plan structure stays the same. The conclusions change based on what the numbers show.

Trying it yourself

The plan is built into Askimo if you want to run it as-is. You fill in system name, requirements, and scale hints, then step through the session with whatever AI provider you use (OpenAI, Claude, Gemini, or a local model via Ollama).

You can also adapt the YAML to add steps - a cost estimation step after back-of-envelope, a security review step, a migration plan step. The plan editor has an AI generator that writes the YAML from a plain-English description if you do not want to write it by hand.

The full source for the system design plan is available on GitHub. Contributions welcome.
