System Design Interview: Autocomplete / Type-ahead System (Final Part)
In Part 1, we covered requirements and capacity planning.
In Part 2, we explored tries, top-K optimization, and system architecture.
In this final part, we'll focus on scaling the trie, efficient updates, deployment strategies, and partitioning trade-offs: the exact topics that distinguish a good system design answer from an excellent one.
Table of Contents (Final Part)
- Splitting the Trie Across Servers
- Updating the Trie Efficiently
- Deployment Strategies
- Persisting the Trie
- Data Partitioning Strategies
- Trade-Off Summary
- Low-Level Design
- Final Thoughts
Splitting the Trie Across Servers
Once range-based partitioning is selected, the trie must be physically split across machines in a way that keeps:
- Routing simple
- Lookups fast
- Latency predictable
Alphabet-Based Sharding
The most straightforward strategy is alphabet-based sharding.
Example
Shard 1 → a–k
Shard 2 → l–z
Each shard:
- Stores its own trie fragment
- Has multiple replicas
- Independently serves autocomplete requests
Why This Works
- Deterministic routing
- Single-shard reads
- No result aggregation
- Stable latency
This is ideal for prefix-based systems like autocomplete.
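To make the routing concrete, here is a minimal TypeScript sketch of deterministic prefix-to-shard routing. The shard ranges and the `routePrefix` helper are illustrative assumptions, not part of a specific implementation.

```typescript
// Hypothetical static shard map for alphabet-based sharding (a–k, l–z).
type Shard = { name: string; from: string; to: string };

const SHARDS: Shard[] = [
  { name: "shard-1", from: "a", to: "k" },
  { name: "shard-2", from: "l", to: "z" },
];

// Deterministic routing: the first character of the prefix selects exactly one shard.
function routePrefix(prefix: string): Shard {
  if (!prefix) throw new Error("Empty prefix");
  const first = prefix[0].toLowerCase();
  const shard = SHARDS.find((s) => first >= s.from && first <= s.to);
  if (!shard) throw new Error(`No shard owns prefix "${prefix}"`);
  return shard;
}

console.log(routePrefix("apple").name); // shard-1
console.log(routePrefix("mango").name); // shard-2
```

Because the mapping is static and derived only from the prefix, the app server never needs to consult another service to decide where to send a request.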
Finer-Grained Sharding
As traffic and data grow, shards can be incrementally split.
Examples
Shard 1 → a
Shard 2 → b
Shard 3 → c–d
Or even:
Shard A1 → ap
Shard A2 → am
Shard A3 → an
This approach allows scaling without redesigning the system.
Why Prefix-Based Sharding Works So Well
- Each request hits exactly one shard
- No fan-out queries
- No cross-shard aggregation
- Latency remains low at global scale
Updating the Trie Efficiently
Updating the trie synchronously for every search is impractical.
Autocomplete systems must decouple reads from writes.
Logging and Aggregation Pipeline
- User searches are logged asynchronously
- Logs are written to a durable store (Kafka / Cassandra)
- Periodic batch jobs aggregate frequencies
This avoids blocking the serving path.
Pruning During Aggregation
To control growth:
- Very old queries are discarded
- Queries below a frequency threshold are removed
- Only meaningful prefixes survive
This keeps the trie compact and relevant.
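A minimal sketch of the batch aggregation and pruning step, assuming log records with a query string and a timestamp. The `LogEntry` shape and the age/frequency cutoffs below are placeholder assumptions.

```typescript
// Hypothetical log record; in practice these would be read from Kafka/Cassandra.
interface LogEntry {
  query: string;
  timestampMs: number;
}

const FREQUENCY_THRESHOLD = 5;               // assumed cutoff: rare queries are dropped
const MAX_AGE_MS = 30 * 24 * 60 * 60 * 1000; // assumed cutoff: roughly 30 days

// Batch job: count frequencies, then prune by age and by frequency.
function aggregateAndPrune(logs: LogEntry[], nowMs: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (const { query, timestampMs } of logs) {
    if (nowMs - timestampMs > MAX_AGE_MS) continue;       // discard very old queries
    counts.set(query, (counts.get(query) ?? 0) + 1);
  }
  for (const [query, count] of counts) {
    if (count < FREQUENCY_THRESHOLD) counts.delete(query); // drop rare queries
  }
  return counts; // feeds the offline trie build
}
```

The output of this job is what the offline trie builder consumes; nothing here touches the serving path.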
Applying Updates Safely
- A new trie is built offline
- Top-K lists are recomputed
- Once validated, the new trie atomically replaces the old one
Read traffic is never blocked during updates.
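As a sketch, the atomic replacement can be as simple as swapping a single reference that the read path goes through. The `Trie` interface and the `publishNewTrie` helper below are assumptions for illustration.

```typescript
// Placeholder for the real trie implementation.
interface Trie {
  topK(prefix: string): string[];
}

// The serving path always reads through this single reference.
let activeTrie: Trie = { topK: () => [] };

// Reads are never blocked: they use whatever trie version the reference points to.
function suggest(prefix: string): string[] {
  return activeTrie.topK(prefix);
}

// Offline path: validate the candidate built offline, then swap the reference.
// A single reference assignment is the "atomic replace" from the steps above.
function publishNewTrie(candidate: Trie, validate: (t: Trie) => boolean): void {
  if (!validate(candidate)) {
    throw new Error("Candidate trie failed validation; keeping the current version");
  }
  activeTrie = candidate;
}
```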
Deployment Strategies
Trie updates must be deployed without downtime.
Blue-Green Deployment
- New trie version built in parallel
- Validated with shadow traffic
- Traffic switched atomically
- Old version discarded
This is the safest approach.
Master-Slave Strategy
- Master serves live traffic
- Slave is rebuilt and updated
- Roles are swapped
Both strategies ensure zero downtime and fast rollback.
Persisting the Trie
Rebuilding a trie from raw logs can take hours.
To enable fast recovery:
- Trie snapshots are periodically persisted
- Nodes are serialized level-by-level
Example Serialization
C2,A2,R1,T,P,O1,D
On restart:
- Trie is reconstructed from snapshots
- Top-K lists are recomputed
- Service resumes quickly
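Here is a hedged sketch of one way to implement the "character + child count" format shown above, serializing level by level. The `SerNode` shape and helper names are illustrative, and real snapshots would also carry terminal flags and frequency data.

```typescript
// Minimal node shape for the serialization sketch; terminal flags and top-K
// lists would be persisted alongside or rebuilt after loading.
interface SerNode {
  ch: string;
  children: SerNode[];
}

// Level-by-level (BFS) serialization: each token is a node's character followed by
// its child count, with the count omitted for leaves (as in "C2,A2,R1,T,P,O1,D").
function serialize(root: SerNode): string {
  const tokens: string[] = [];
  const queue: SerNode[] = [root];
  while (queue.length > 0) {
    const node = queue.shift()!;
    const count = node.children.length;
    tokens.push(count > 0 ? node.ch + String(count) : node.ch);
    queue.push(...node.children);
  }
  return tokens.join(",");
}

// Rebuild the trie by reading tokens back in the same level order.
function deserialize(data: string): SerNode {
  const tokens = data.split(",");
  const parse = (t: string) => ({
    node: { ch: t[0], children: [] as SerNode[] },
    count: t.length > 1 ? parseInt(t.slice(1), 10) : 0,
  });
  const rootEntry = parse(tokens[0]);
  const queue = [rootEntry];
  let i = 1;
  while (queue.length > 0) {
    const { node, count } = queue.shift()!;
    for (let c = 0; c < count; c++) {
      const childEntry = parse(tokens[i++]);
      node.children.push(childEntry.node);
      queue.push(childEntry);
    }
  }
  return rootEntry.node;
}
```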
Data Partitioning Strategies
When the dataset grows beyond a single machine, partitioning strategy becomes critical.
Range-Based Partitioning
Data is split using natural prefix ranges.
Example
Shard 1 → a–c
Shard 2 → d–f
Shard 3 → g–z
Why It Fits Autocomplete
- Prefix routing is trivial
- Only one shard per request
- Trie traversal stays localized
Limitation
- Popular prefixes (a, s, th) can create hot shards
- Requires finer splits over time
Capacity-Based Partitioning
Servers are filled until capacity limits are reached.
Example
Server 1 → a → apple → apply (full)
Server 2 → aws → banana → ball
Advantages
- Efficient hardware utilization
Challenges
- Dynamic prefix-to-server mapping
- Routing depends on metadata
- More operational complexity
Often requires ZooKeeper-like coordination.
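A small sketch of what that metadata-driven routing looks like. The in-memory `prefixToServer` map stands in for coordination-service metadata (e.g. held in ZooKeeper), and the longest-prefix lookup is just one possible routing rule.

```typescript
// Stand-in for the coordination metadata: each prefix is explicitly assigned
// to whichever server had capacity when it was ingested.
const prefixToServer = new Map<string, string>([
  ["a", "server-1"],
  ["apple", "server-1"],
  ["apply", "server-1"],
  ["aws", "server-2"],
  ["banana", "server-2"],
  ["ball", "server-2"],
]);

// Routing now requires a metadata lookup: find the longest known prefix of the term.
function routeByCapacityMetadata(term: string): string {
  for (let len = term.length; len > 0; len--) {
    const server = prefixToServer.get(term.slice(0, len));
    if (server) return server;
  }
  throw new Error(`No server assignment covers "${term}"`);
}

console.log(routeByCapacityMetadata("applesauce")); // server-1 (via "apple")
```

Contrast this with alphabet-based sharding, where the mapping is implicit in the prefix itself and needs no lookup table.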
Hash-Based Partitioning
Data is distributed using a hash function.
Example
hash("apple") β Server 2
hash("apply") β Server 7
Why It Fails for Autocomplete
- Prefixes are scattered
- One request hits multiple shards
- Requires result aggregation
- Increases latency
🚫 Hash-based partitioning is usually avoided for prefix search systems.
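To see why, here is a sketch of the scatter-gather a single prefix query would need under hash partitioning; the `HashShard` interface is hypothetical.

```typescript
// Hypothetical shard interface: each shard can only answer for the terms hashed to it.
interface HashShard {
  suggest(prefix: string): Promise<Array<{ term: string; score: number }>>;
}

// Under hash partitioning, terms sharing a prefix live on different shards,
// so one autocomplete request becomes a fan-out across ALL shards.
async function suggestWithHashPartitioning(
  shards: HashShard[],
  prefix: string,
  k: number
): Promise<string[]> {
  const perShard = await Promise.all(shards.map((s) => s.suggest(prefix))); // fan-out
  return perShard
    .flat()                            // cross-shard aggregation
    .sort((a, b) => b.score - a.score) // re-rank globally
    .slice(0, k)
    .map((r) => r.term);
  // Latency is now bounded by the slowest shard, which is why this is avoided.
}
```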
Partitioning Strategy Summary
Range-based partitioning aligns naturally with prefix queries and is the foundation for scalable autocomplete systems.
Distributed Trie Storage (Expanded)
At scale, the trie is:
- Logically one structure
- Physically distributed
Each machine owns a subset of prefixes but behaves as part of a unified system.
Trie Partitioning Example
Shard 1 → a–c
Shard 2 → d–f
Shard 3 → g–z
Hot prefixes are split further:
Shard A1 → a
Shard A2 → b
Shard A3 → c
This prevents hotspots and balances load.
Replication Strategy
Each shard is replicated:
Shard 1 → Replica 1, Replica 2, Replica 3
Benefits
- High Availability
- Fault Tolerance
- Read Scalability
Storage Technologies
Cassandra (Durable Layer)
Best when:
- Trie partitions are large
- QPS is extremely high
- Cross-region replication is required
Usage:
- Trie stored as serialized blocks
- Loaded into memory on startup
- Cassandra acts as source of truth
ZooKeeper (Coordination)
ZooKeeper manages:
- Prefix-to-shard mappings
- Replica health
- Leader election
- Atomic trie version switching
It guarantees consistent routing decisions.
Trade-Off Summary
| Aspect | Decision | Reasoning |
|---|---|---|
| Data Structure | Trie | Efficient prefix matching |
| Ranking | Pre-computed Top-K | Predictable low latency |
| Cache | Redis | Absorb hot prefixes |
| Updates | Offline batching | Protect read path |
| Partitioning | Range-based | Single-shard prefix routing |
| Storage | Distributed replicas | Scalability & availability |
Low-Level Design
LLD Goal
Explain how data is structured, how lookups work internally, and why latency is predictable.
🔹 LLD: Trie Node Structure
TrieNode {
children: Map<Character, TrieNode>
isTerminal: boolean
frequencyScore: number
topKSuggestions: MinHeap / Sorted List (size K)
}
What Each Field Means
- children → next characters
- isTerminal → end of a valid query
- frequencyScore → popularity (freq + recency)
- topKSuggestions → pre-computed best results for this prefix
🔹 LLD: Trie Example
Queries:
"app" (50)
"apple" (100)
"app store" (200)
Trie view:
a
└── p
    └── p
        ├── app* [top-K: app store, apple, app]
        ├── apple* [freq=100]
        └── app store* [freq=200]
Each prefix node already knows its best suggestions.
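Here is a sketch of how those lists can be filled during the offline build: walk the trie once and push each complete query's score into every prefix on its path. The `BuildNode` shape mirrors the TrieNode above; `K` and the helper names are assumptions.

```typescript
interface BuildNode {
  children: Map<string, BuildNode>;
  isTerminal: boolean;
  frequencyScore: number;
  topKSuggestions: Array<{ term: string; score: number }>;
}

const K = 10; // assumed suggestion list size

// Offline pass: for every complete query, add it to the top-K list of each prefix
// node on its path, keeping the lists sorted and truncated to K.
function fillTopK(root: BuildNode): void {
  const walk = (node: BuildNode, path: BuildNode[], term: string): void => {
    if (node.isTerminal) {
      for (const prefixNode of [...path, node]) {
        prefixNode.topKSuggestions.push({ term, score: node.frequencyScore });
        prefixNode.topKSuggestions.sort((a, b) => b.score - a.score);
        prefixNode.topKSuggestions.length = Math.min(prefixNode.topKSuggestions.length, K);
      }
    }
    for (const [ch, child] of node.children) {
      walk(child, [...path, node], term + ch);
    }
  };
  walk(root, [], "");
}
```

Run once per rebuild, this is exactly how the "app" node above ends up holding [app store, apple, app] ordered by frequency.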
🔹 LLD: Prefix Lookup Flow
Input Prefix: "app"
1. Traverse trie: a → p → p
2. Reach prefix node
3. Read pre-computed top-K
4. Return immediately
⏱ Time Complexity: O(length of prefix)
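The read path in code, with a minimal node shape redeclared so the sketch stays self-contained: one hop per character, then a constant-time read of the stored list.

```typescript
interface LookupNode {
  children: Map<string, LookupNode>;
  topKSuggestions: Array<{ term: string; score: number }>;
}

function lookup(root: LookupNode, prefix: string): string[] {
  let node = root;
  for (const ch of prefix.toLowerCase()) {   // one hop per character
    const next = node.children.get(ch);
    if (!next) return [];                    // unknown prefix: nothing to suggest
    node = next;
  }
  return node.topKSuggestions.map((s) => s.term); // pre-computed list, no subtree walk
}
```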
🔹 LLD: Cache Key Design
Key: autocomplete:{locale}:{prefix}
Example:
autocomplete:en-IN:how
Value:
[
"how to cook rice",
"how to lose weight",
"how to make tea"
]
🔹 LLD: Cache + Trie Interaction
App Server
|
|---> Redis (Hit?) ----> Return result
|
|---> Trie Storage ----> Fetch top-K
|
+--> Store in Redis (TTL)
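A sketch of this read-through flow using the cache key format above. `CacheClient` and the trie fetcher are placeholder interfaces (a Redis client would sit behind `CacheClient`), and the TTL is an assumed value.

```typescript
// Placeholder interfaces; a Redis client and the trie shard client would sit behind these.
interface CacheClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}
type TrieFetcher = (prefix: string) => Promise<string[]>;

const cacheKey = (locale: string, prefix: string) => `autocomplete:${locale}:${prefix}`;

// Read-through: a cache hit returns immediately; on a miss, fetch from the trie
// shard and populate the cache with a TTL so hot prefixes are absorbed.
async function getSuggestions(
  cache: CacheClient,
  fetchTopKFromTrie: TrieFetcher,
  locale: string,
  prefix: string,
  ttlSeconds = 300 // assumed TTL
): Promise<string[]> {
  const key = cacheKey(locale, prefix);
  const cached = await cache.get(key);
  if (cached) return JSON.parse(cached);

  const suggestions = await fetchTopKFromTrie(prefix);
  await cache.set(key, JSON.stringify(suggestions), ttlSeconds);
  return suggestions;
}
```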
🔹 LLD: Distributed Trie Partitioning
Shard 1 → prefixes a–c
Shard 2 → prefixes d–f
Shard 3 → prefixes g–z
For hot prefixes:
Shard A1 → ap
Shard A2 → am
Shard A3 → an
✅ Single-shard lookup
✅ No aggregation
✅ Predictable latency
🔹 LLD: Write / Update Path (Offline)
User Searches
|
v
+-------------+
| Log Stream | (Kafka)
+-------------+
|
v
+-------------+
| Aggregation | (Batch Job)
+-------------+
|
v
+-------------+
| Build New |
| Trie |
+-------------+
|
v
+-------------+
| Atomic Swap |
+-------------+
Writes never block reads.
🔹 LLD: Failure Handling
| Failure | Behavior |
|---|---|
| Redis down | Read directly from trie |
| Trie shard down | Route to replica |
| App server down | LB removes it |
| Region down | Route to nearest region |
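A sketch of the shard-level fallback from the table: try replicas in order and fail only when every one is unreachable. The `ReplicaClient` interface is hypothetical.

```typescript
// Hypothetical replica client: any replica of a shard can serve reads.
interface ReplicaClient {
  topK(prefix: string): Promise<string[]>;
}

// If the primary replica is down, fall through the remaining replicas in order;
// the request fails only if the entire shard (all replicas) is unreachable.
async function readWithFailover(replicas: ReplicaClient[], prefix: string): Promise<string[]> {
  let lastError: unknown;
  for (const replica of replicas) {
    try {
      return await replica.topK(prefix);
    } catch (err) {
      lastError = err; // try the next replica
    }
  }
  throw new Error(`All replicas failed for prefix "${prefix}": ${String(lastError)}`);
}
```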
🎯 LLD Interview Explanation (1–2 lines)
"Each trie node stores pre-computed top-K suggestions, so autocomplete never traverses subtrees at runtime.
This guarantees predictable latency even at very high QPS."
3️⃣ HLD vs LLD: Interview Cheat Sheet
| Aspect | HLD | LLD |
|---|---|---|
| Focus | Components & flow | Data structures & internals |
| Audience | Product / System | Engineers |
| Latency | Cache → Trie | O(prefix length) |
| Updates | Offline | Atomic swap |
| Scaling | Shards + replicas | Prefix routing |
Final Thoughts
Autocomplete looks simple, but at scale it is a carefully engineered distributed system.
The winning design principles are:
- Tries for structure
- Pre-computation for speed
- Caching for hot paths
- Batch updates for safety
- Replication for resilience
If you can explain this flow clearly in an interview, you demonstrate not just knowledge but systems thinking.
More Details:
Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli
Git: https://github.com/ZeeshanAli-0704/front-end-system-design