System Design Interview: Autocomplete / Type-ahead System (Final Part)
In Part 1, we covered requirements and capacity planning.
In Part 2, we explored tries, top-K optimization, and system architecture.
In this final part, we’ll focus on scaling the trie, efficient updates, deployment strategies, and partitioning trade-offs — the exact topics that distinguish a good system design answer from an excellent one.
📌 Table of Contents (Final Part)
- Splitting the Trie Across Servers
- Updating the Trie Efficiently
- Deployment Strategies
- Persisting the Trie
- Data Partitioning Strategies
- Trade-Off Summary
- Low-Level Design
- Final Thoughts
Splitting the Trie Across Servers
Once range-based partitioning is selected, the trie must be physically split across machines in a way that keeps:
- Routing simple
- Lookups fast
- Latency predictable
Alphabet-Based Sharding
The most straightforward strategy is alphabet-based sharding.
Example
Shard 1 → a–k
Shard 2 → l–z
Each shard:
- Stores its own trie fragment
- Has multiple replicas
- Independently serves autocomplete requests
Why This Works
- Deterministic routing
- Single-shard reads
- No result aggregation
- Stable latency
This is ideal for prefix-based systems like autocomplete.
Finer-Grained Sharding
As traffic and data grow, shards can be incrementally split.
Examples
Shard 1 → a
Shard 2 → b
Shard 3 → c–d
Or even:
Shard A1 → ap
Shard A2 → am
Shard A3 → an
This approach allows scaling without redesigning the system.
Why Prefix-Based Sharding Works So Well
- Each request hits exactly one shard
- No fan-out queries
- No cross-shard aggregation
- Latency remains low at global scale
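As a concrete illustration, routing can be a longest-prefix match over a shard map, as sketched below. The shard names and map entries are hypothetical; in practice the mapping would be served by a coordination layer such as ZooKeeper, covered later.

```typescript
// Hypothetical shard map: finer-grained prefix entries override coarser
// alphabet ranges. In production this mapping would come from a
// coordination service rather than being hard-coded.
const shardByPrefix: Record<string, string> = {
  ap: "shard-a1",
  am: "shard-a2",
  an: "shard-a3",
};

const shardByRange: Array<{ from: string; to: string; shard: string }> = [
  { from: "a", to: "k", shard: "shard-1" },
  { from: "l", to: "z", shard: "shard-2" },
];

// Longest-prefix match first, then fall back to the alphabet range.
function routeToShard(prefix: string): string {
  const key = prefix.toLowerCase();
  for (let len = Math.min(key.length, 2); len >= 1; len--) {
    const shard = shardByPrefix[key.slice(0, len)];
    if (shard) return shard;
  }
  const first = key[0];
  const range = shardByRange.find((r) => first >= r.from && first <= r.to);
  if (!range) throw new Error(`No shard configured for prefix "${prefix}"`);
  return range.shard;
}

// routeToShard("apple") → "shard-a1"; routeToShard("mango") → "shard-2"
```

Because the match is deterministic, any app server can route any request without consulting the shards themselves.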
Updating the Trie Efficiently
Updating the trie synchronously for every search is impractical.
Autocomplete systems must decouple reads from writes.
Logging and Aggregation Pipeline
- User searches are logged asynchronously
- Logs are written to a durable store (Kafka / Cassandra)
- Periodic batch jobs aggregate frequencies
This avoids blocking the serving path.
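A minimal sketch of the aggregation step, assuming log records shaped like the hypothetical SearchLog below; a real pipeline would consume these from Kafka and run as a scheduled batch job:

```typescript
interface SearchLog {
  query: string;
  timestamp: number; // epoch millis
}

// Aggregate raw search logs into per-query frequencies for one batch window.
function aggregateFrequencies(
  logs: SearchLog[],
  windowStart: number,
): Map<string, number> {
  const frequencies = new Map<string, number>();
  for (const log of logs) {
    if (log.timestamp < windowStart) continue; // skip records outside the window
    const query = log.query.trim().toLowerCase();
    frequencies.set(query, (frequencies.get(query) ?? 0) + 1);
  }
  return frequencies;
}
```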
Pruning During Aggregation
To control growth:
- Very old queries are discarded
- Queries below a frequency threshold are removed
- Only meaningful prefixes survive
This keeps the trie compact and relevant.
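A pruning pass over the aggregated counts might look like this (the thresholds are illustrative, not prescriptive):

```typescript
// Drop queries that are too old or too rare before rebuilding the trie.
function pruneQueries(
  frequencies: Map<string, number>,
  lastSeen: Map<string, number>, // query → last timestamp it was searched
  minFrequency: number,          // e.g. keep only queries seen at least N times
  maxAgeMs: number,              // e.g. discard queries not seen in 90 days
  now: number,
): Map<string, number> {
  const kept = new Map<string, number>();
  for (const [query, count] of frequencies) {
    const seen = lastSeen.get(query) ?? 0;
    if (count >= minFrequency && now - seen <= maxAgeMs) {
      kept.set(query, count);
    }
  }
  return kept;
}
```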
Applying Updates Safely
- A new trie is built offline
- Top-K lists are recomputed
- Once validated, the new trie atomically replaces the old one
👉 Read traffic is never blocked during updates.
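The swap can be modelled as a single reference replacement behind a holder object, as in this sketch; the validation hook is a placeholder for checks such as spot-querying known prefixes:

```typescript
// The serving path always reads through `current`. The swap is one reference
// assignment, so in-flight reads finish on the old trie and new reads see the
// new one — no locking on the read path.
class TrieHolder<T> {
  private activeTrie: T;

  constructor(initial: T) {
    this.activeTrie = initial;
  }

  get current(): T {
    return this.activeTrie;
  }

  swap(newTrie: T, validate: (t: T) => boolean): boolean {
    if (!validate(newTrie)) return false; // reject a bad build before exposure
    this.activeTrie = newTrie;
    return true;
  }
}
```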
Deployment Strategies
Trie updates must be deployed without downtime.
Blue-Green Deployment
- New trie version built in parallel
- Validated with shadow traffic
- Traffic switched atomically
- Old version discarded
This is the safest approach.
Master-Slave Strategy
- Master serves live traffic
- Slave is rebuilt and updated
- Roles are swapped
Both strategies ensure zero downtime and fast rollback.
Persisting the Trie
Rebuilding a trie from raw logs can take hours.
To enable fast recovery:
- Trie snapshots are periodically persisted
- Nodes are serialized in a pre-order traversal, each node followed by its child count
Example Serialization
C2,A2,R1,T,P,O1,D
On restart:
- Trie is reconstructed from snapshots
- Top-K lists are recomputed
- Service resumes quickly
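A sketch of that scheme: the example C2,A2,R1,T,P,O1,D encodes the paths C→A→R→T, C→A→P, and C→O→D, with each node written as its character followed by its child count (omitted when zero). Terminal flags and frequencies are left out here, matching the example; top-K lists are recomputed on load anyway.

```typescript
interface SnapshotNode {
  char: string;
  children: SnapshotNode[];
}

// Pre-order serialization, e.g. CART / CAP / COD → "C2,A2,R1,T,P,O1,D".
function serialize(root: SnapshotNode): string {
  const parts: string[] = [];
  const walk = (n: SnapshotNode) => {
    parts.push(n.children.length > 0 ? `${n.char}${n.children.length}` : n.char);
    n.children.forEach(walk);
  };
  walk(root);
  return parts.join(",");
}

function deserialize(snapshot: string): SnapshotNode {
  const tokens = snapshot.split(",");
  let index = 0;
  const build = (): SnapshotNode => {
    const token = tokens[index++];
    const match = token.match(/^(\D+)(\d*)$/)!;          // character + optional child count
    const node: SnapshotNode = { char: match[1], children: [] };
    const childCount = match[2] ? parseInt(match[2], 10) : 0;
    for (let i = 0; i < childCount; i++) node.children.push(build());
    return node;
  };
  return build();
}
```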
Data Partitioning Strategies
When the dataset grows beyond a single machine, partitioning strategy becomes critical.
Range-Based Partitioning
Data is split using natural prefix ranges.
Example
Shard 1 → a–c
Shard 2 → d–f
Shard 3 → g–z
Why It Fits Autocomplete
- Prefix routing is trivial
- Only one shard per request
- Trie traversal stays localized
Limitations
- Popular prefixes (a, s, th) can create hot shards
- Requires finer splits over time
Capacity-Based Partitioning
Servers are filled until capacity limits are reached.
Example
Server 1 → a → apple → apply (full)
Server 2 → aws → banana → ball
Advantages
- Efficient hardware utilization
Challenges
- Dynamic prefix-to-server mapping
- Routing depends on metadata
- More operational complexity
Often requires ZooKeeper-like coordination.
Hash-Based Partitioning
Data is distributed using a hash function.
Example
hash("apple") → Server 2
hash("apply") → Server 7
Why It Fails for Autocomplete
- Prefixes are scattered
- One request hits multiple shards
- Requires result aggregation
- Increases latency
🚫 Hash-based partitioning is usually avoided for prefix search systems.
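A quick illustration of why hashing scatters a prefix (the hash function here is arbitrary; any uniform hash has the same effect):

```typescript
// With hash partitioning, queries sharing a prefix can land on unrelated
// shards, so a single prefix lookup would have to fan out to all of them.
function shardFor(query: string, shardCount: number): number {
  let hash = 0;
  for (const ch of query) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % shardCount;
}

["apple", "apply", "app store"].forEach((q) =>
  console.log(q, "→ shard", shardFor(q, 8)),
);
// The "app…" queries are spread across shards, unlike range-based
// partitioning where they all live on the same one.
```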
Partitioning Strategy Summary
👉 Range-based partitioning aligns naturally with prefix queries and is the foundation for scalable autocomplete systems.
Distributed Trie Storage (Expanded)
At scale, the trie is:
- Logically one structure
- Physically distributed
Each machine owns a subset of prefixes but behaves as part of a unified system.
Trie Partitioning Example
Shard 1 → a–c
Shard 2 → d–f
Shard 3 → g–z
Hot prefixes are split further:
Shard A1 → a
Shard A2 → b
Shard A3 → c
This prevents hotspots and balances load.
Replication Strategy
Each shard is replicated:
Shard 1 → Replica 1, Replica 2, Replica 3
Benefits
- High Availability
- Fault Tolerance
- Read Scalability
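Reads can be spread across a shard's replicas; a minimal round-robin sketch with hypothetical host names:

```typescript
// Round-robin read balancing across one shard's replicas.
class ReplicaSet {
  private next = 0;
  constructor(private readonly replicas: string[]) {}

  pickForRead(): string {
    const replica = this.replicas[this.next];
    this.next = (this.next + 1) % this.replicas.length;
    return replica;
  }
}

const shard1 = new ReplicaSet([
  "trie-1a.internal", // hypothetical hosts
  "trie-1b.internal",
  "trie-1c.internal",
]);
// shard1.pickForRead() rotates across the three replicas
```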
Storage Technologies
Cassandra (Durable Layer)
Best when:
- Trie partitions are large
- QPS is extremely high
- Cross-region replication is required
Usage:
- Trie stored as serialized blocks
- Loaded into memory on startup
- Cassandra acts as source of truth
ZooKeeper (Coordination)
ZooKeeper manages:
- Prefix-to-shard mappings
- Replica health
- Leader election
- Atomic trie version switching
It guarantees consistent routing decisions.
Trade-Off Summary
| Aspect | Decision | Reasoning |
|---|---|---|
| Data Structure | Trie | Efficient prefix matching |
| Ranking | Pre-computed Top-K | Predictable low latency |
| Cache | Redis | Absorb hot prefixes |
| Updates | Offline batching | Protect read path |
| Partitioning | Range-based | Single-shard prefix routing |
| Storage | Distributed replicas | Scalability & availability |
Low-Level Design
LLD Goal
Explain how data is structured, how lookups work internally, and why latency is predictable.
🔹 LLD – Trie Node Structure
```typescript
interface TrieNode {
  children: Map<string, TrieNode>; // next characters
  isTerminal: boolean;             // marks the end of a complete query
  frequencyScore: number;          // popularity (frequency + recency)
  topKSuggestions: string[];       // pre-computed best results, sorted by score, size ≤ K
                                   // (a bounded min-heap works equally well)
}
```
What Each Field Means
- `children` → next characters
- `isTerminal` → end of a valid query
- `frequencyScore` → popularity (freq + recency)
- `topKSuggestions` → pre-computed best results for this prefix
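During the offline build, each prefix node's `topKSuggestions` list is kept bounded; a simple sketch (a bounded min-heap achieves the same with better asymptotics):

```typescript
// Offline helper: merge a candidate query into a prefix node's top-K list.
// Scores come from the aggregated frequency table built by the batch job.
function addSuggestion(
  topK: string[],
  candidate: string,
  scoreOf: (query: string) => number,
  k: number,
): void {
  if (!topK.includes(candidate)) topK.push(candidate);
  topK.sort((a, b) => scoreOf(b) - scoreOf(a)); // highest score first
  if (topK.length > k) topK.length = k;         // keep only the best K
}
```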
🔹 LLD – Trie Example
Queries:
"app" (50)
"apple" (100)
"app store" (200)
Trie view:
a
└── p
└── p
├── app* [top-K: app store, apple, app]
├── apple* [freq=100]
└── app store* [freq=200]
Each prefix node already knows its best suggestions.
🔹 LLD – Prefix Lookup Flow
Input Prefix: "app"
1. Traverse trie: a → p → p
2. Reach prefix node
3. Read pre-computed top-K
4. Return immediately
⏱ Time Complexity: O(length of prefix)
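A sketch of that lookup against the TrieNode structure above:

```typescript
// O(prefix length): walk one node per character, then return the
// pre-computed list. No subtree traversal, no sorting at query time.
function lookup(root: TrieNode, prefix: string, k: number): string[] {
  let node: TrieNode = root;
  for (const ch of prefix.toLowerCase()) {
    const next = node.children.get(ch);
    if (!next) return []; // unknown prefix → no suggestions
    node = next;
  }
  return node.topKSuggestions.slice(0, k);
}
```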
🔹 LLD – Cache Key Design
Key: autocomplete:{locale}:{prefix}
Example:
autocomplete:en-IN:how
Value:
[
"how to cook rice",
"how to lose weight",
"how to make tea"
]
🔹 LLD – Cache + Trie Interaction
App Server
|
|---> Redis (Hit?) ----> Return result
|
|---> Trie Storage ----> Fetch top-K
|
+--> Store in Redis (TTL)
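The same interaction as a read-through sketch, assuming a Redis client such as ioredis and a hypothetical fetchTopKFromTrieShard call for the trie-storage hop; the key format and TTL behavior follow the design above:

```typescript
import Redis from "ioredis"; // assumed client; any Redis client with get/set works

const redis = new Redis();
const CACHE_TTL_SECONDS = 300; // illustrative TTL for hot prefixes

async function getSuggestions(locale: string, prefix: string): Promise<string[]> {
  const key = `autocomplete:${locale}:${prefix.toLowerCase()}`;

  // 1. Cache hit → return immediately.
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  // 2. Cache miss → fetch the pre-computed top-K from the trie shard.
  const suggestions = await fetchTopKFromTrieShard(locale, prefix);

  // 3. Populate the cache with a TTL so hot prefixes stay warm.
  await redis.set(key, JSON.stringify(suggestions), "EX", CACHE_TTL_SECONDS);
  return suggestions;
}

// Placeholder for the trie-shard call described earlier.
declare function fetchTopKFromTrieShard(locale: string, prefix: string): Promise<string[]>;
```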
🔹 LLD – Distributed Trie Partitioning
Shard 1 → prefixes a–c
Shard 2 → prefixes d–f
Shard 3 → prefixes g–z
For hot prefixes:
Shard A1 → ap
Shard A2 → am
Shard A3 → an
✔ Single-shard lookup
✔ No aggregation
✔ Predictable latency
🔹 LLD – Write / Update Path (Offline)
User Searches
|
v
+-------------+
| Log Stream | (Kafka)
+-------------+
|
v
+-------------+
| Aggregation | (Batch Job)
+-------------+
|
v
+-------------+
| Build New |
| Trie |
+-------------+
|
v
+-------------+
| Atomic Swap |
+-------------+
Writes never block reads.
🔹 LLD – Failure Handling
| Failure | Behavior |
|---|---|
| Redis down | Read directly from trie |
| Trie shard down | Route to replica |
| App server down | LB removes it |
| Region down | Route to nearest region |
🎯 LLD Interview Explanation (1–2 lines)
“Each trie node stores pre-computed top-K suggestions, so autocomplete never traverses subtrees at runtime.
This guarantees predictable latency even at very high QPS.”
HLD vs LLD – Interview Cheat Sheet
| Aspect | HLD | LLD |
|---|---|---|
| Focus | Components & flow | Data structures & internals |
| Audience | Product / System | Engineers |
| Latency | Cache → Trie | O(prefix length) |
| Updates | Offline | Atomic swap |
| Scaling | Shards + replicas | Prefix routing |
Final Thoughts
Autocomplete looks simple — but at scale, it’s a carefully engineered distributed system.
The winning design principles are:
- Tries for structure
- Pre-computation for speed
- Caching for hot paths
- Batch updates for safety
- Replication for resilience
If you can explain this flow clearly in an interview, you demonstrate not just knowledge — but systems thinking.
More Details:
Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli
Git: https://github.com/ZeeshanAli-0704/front-end-system-design