System Design Interview: Autocomplete / Type-ahead System (Part 2)
In Part 1, we covered the problem statement, requirements, and capacity estimation.
In this part, we'll deep-dive into the core data structures, ranking optimizations, and high-level system architecture that make autocomplete systems fast and scalable at Google-level traffic.
📑 Table of Contents (Part 2)
- Choosing the Core Data Structure
- Optimizing the Trie Structure
- How Top K Suggestions Are Retrieved
- Building and Maintaining Top K Lists
- High Level System Architecture
- Storage, Replication, and Availability
Choosing the Core Data Structure
Autocomplete is fundamentally a prefix-matching problem.
Given a prefix, return the most relevant strings that start with it, fast.
This immediately eliminates naive solutions like scanning a list or database table. We need a structure that is prefix-aware by design.
Why a Trie Works Well
A Trie (Prefix Tree) stores strings character by character.
- Each node represents a prefix
- All descendants share that prefix
Key Advantages
- Lookup time depends only on prefix length
- No scanning of unrelated strings
- Naturally groups similar queries
- Perfect fit for prefix search
Example
Stored queries:
be, bee, bell, bent, belt
Trie traversal for prefix "be" leads to a subtree containing:
bee, bell, bent, belt
This is exactly what autocomplete needs: fast prefix isolation.
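Here is a minimal Python sketch of that idea (names and structure are illustrative, not tied to any library): one node per character, and a prefix lookup that walks down to the prefix node and then collects its subtree.

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # char -> TrieNode
        self.is_terminal = False  # True if a full query ends here

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.is_terminal = True

    def suggestions(self, prefix):
        # Walk down to the prefix node: O(len(prefix))
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect every query stored in the subtree under the prefix node
        results = []
        def dfs(n, path):
            if n.is_terminal:
                results.append(prefix + path)
            for ch, child in n.children.items():
                dfs(child, path + ch)
        dfs(node, "")
        return results

trie = Trie()
for q in ["be", "bee", "bell", "bent", "belt"]:
    trie.insert(q)
print(trie.suggestions("be"))  # ['be', 'bee', 'bell', 'belt', 'bent']
```

The walk to the prefix node costs only O(prefix length); collecting the whole subtree is the expensive part, which the rest of this article optimizes away.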
Optimizing the Trie Structure
A naive trie becomes extremely large when storing billions of queries.
Problem with a Naive Trie
- Too many nodes
- Deep chains with single children
- High memory overhead
- Excessive pointer chasing
Compressed Trie (Radix Tree)
In many cases, nodes have only one child.
These chains can be merged into a single edge storing multiple characters.
Benefits
- Reduced memory usage
- Fewer pointer traversals
- Faster cache-friendly lookups
This optimized trie is often called a Radix Tree or Compact Trie.
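A rough sketch of the compression step, assuming the plain `TrieNode` shape from the sketch above (chains of single, non-terminal children are absorbed into one edge label):

```python
class RadixNode:
    def __init__(self, label=""):
        self.label = label     # one or more characters on the incoming edge
        self.children = {}     # first char of a child's label -> RadixNode
        self.is_terminal = False

def compress(trie_node, edge_char=""):
    """Convert a plain trie node (.children dict, .is_terminal flag)
    into a radix node, merging single-child non-terminal chains."""
    label = edge_char
    node = trie_node
    # Absorb the chain as long as there is exactly one child and no query ends here
    while len(node.children) == 1 and not node.is_terminal:
        ch, only_child = next(iter(node.children.items()))
        label += ch
        node = only_child
    radix = RadixNode(label)
    radix.is_terminal = node.is_terminal
    for ch, child in node.children.items():
        radix.children[ch] = compress(child, ch)
    return radix
```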
Storing Popularity Information
How popular is a full search query?
Each complete query ends at a terminal node in the trie.
At that node, we store:
- Search frequency
- Or a weighted score combining frequency + recency
Example
User searches:
"apple" β 100 searches
"application" β 60 searches
"app store" β 200 searches
Trie view:
app
├── apple (freq=100)
├── application (freq=60)
└── app store (freq=200)
At this stage:
- We know popularity of full queries
- We cannot yet answer prefix queries fast
This layer answers:
"How popular is each query?"
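A small sketch of that data layer, with an illustrative `record_search` helper that the offline aggregation job would call per query:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_terminal = False
        self.frequency = 0   # how many times this exact query was searched

def record_search(root, query, count=1):
    """Walk to the terminal node for a full query and bump its frequency.
    In production this runs offline, aggregated from search logs."""
    node = root
    for ch in query:
        node = node.children.setdefault(ch, TrieNode())
    node.is_terminal = True
    node.frequency += count

root = TrieNode()
record_search(root, "apple", 100)
record_search(root, "application", 60)
record_search(root, "app store", 200)
```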
How Top K Suggestions Are Retrieved
Given a prefix, how do we return the best suggestions instantly?
Naive Approach ❌
For prefix "be":
- Traverse to prefix node
- Traverse entire subtree
- Collect all terminal queries
- Sort by frequency
- Return top K
🚫 This is far too slow for real-time systems.
Pre-Computed Top K Optimization ✅
To guarantee fast responses:
📌 Each trie node stores the top K suggestions for that prefix.
So at query time:
Traverse to prefix node → return stored top-K
No subtree traversal.
No sorting.
No computation on request path.
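The serving-side read then looks roughly like this (a sketch; the `top_k` field is assumed to be filled offline, as described in the next section):

```python
def lookup(root, prefix, k=5):
    """Return the pre-computed top-K suggestions for a prefix.
    O(len(prefix)) node hops, no subtree traversal, no sorting."""
    node = root
    for ch in prefix:
        child = node.children.get(ch)
        if child is None:
            return []          # unseen prefix, nothing to suggest
        node = child
    # node.top_k holds (query, frequency) pairs, already sorted offline
    return node.top_k[:k]
```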
Storage Optimization
To reduce memory overhead:
- Store references to terminal nodes
- Store frequency alongside references
This is a classic space-versus-latency trade-off, and autocomplete is extremely read-heavy, so it's worth it.
Building and Maintaining Top K Lists
Top-K lists are built bottom-up.
Build Process
- Leaf nodes know their own frequency
- Parent nodes merge childrenβs top-K lists
- Only top K entries are retained
- Process propagates upward to root
This ensures:
- Every prefix node has pre-sorted suggestions
- Lookup time is O(prefix length)
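A sketch of that bottom-up pass, assuming terminal nodes carry a frequency as in the data-layer sketch above (`build_top_k` is an illustrative helper, not a library function):

```python
import heapq

def build_top_k(node, path="", k=5):
    """Post-order pass: fill node.top_k for every prefix node by merging
    the node's own query (if it is terminal) with its children's lists."""
    candidates = []
    if node.is_terminal:
        candidates.append((path, node.frequency))
    for ch, child in node.children.items():
        build_top_k(child, path + ch, k)
        candidates.extend(child.top_k)
    # Keep only the K highest-frequency candidates, pre-sorted for serving
    node.top_k = heapq.nlargest(k, candidates, key=lambda item: item[1])
```

Running `build_top_k(root)` once after each offline aggregation cycle leaves every prefix node with a ready-to-serve, pre-sorted list, which is exactly what the lookup sketch above reads.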
Two-Layer Mental Model
Think of autocomplete as two separate layers:
Layer 1: Popularity Tracking (Data Layer)
- Tracks how often full queries are searched
- Stored at terminal nodes
- Updated offline via logs and aggregation
Layer 2: Fast Retrieval (Serving Layer)
- Uses popularity data to compute rankings
- Stores only top-K per prefix
- Optimized purely for read latency
High Level System Architecture
Autocomplete is a read-heavy, latency-critical pipeline.
Serve from memory first → fall back to structured storage → never compute on the request path
Autocomplete is not a compute problem; it's a data-access problem.
HLD Goal
Show how traffic flows end-to-end and where latency is optimized.
At HLD level, interviewers want to see:
- Clear separation of responsibilities
- Read-heavy optimization
- Scalability and availability
🔹 High-Level Architecture Diagram
+-------------------+
| User (Browser) |
+-------------------+
|
v
+-------------------+
| CDN / Edge |
| (Optional Cache) |
+-------------------+
|
v
+-------------------+
| Global Load |
| Balancer |
+-------------------+
|
v
+-------------------+
| Stateless App |
| Servers |
+-------------------+
|
v
+-------------------+
| In-Memory Cache |
| (Redis) |
+-------------------+
|
Cache Miss
v
+-------------------+
| Distributed Trie |
| Storage |
+-------------------+
🔹 HLD Component Responsibilities
1. Client (Browser / Mobile)
- Sends only prefix, not full query
- Debounces keystrokes (e.g., 100–150 ms)
- Reduces unnecessary backend calls
2. CDN / Edge Cache (Optional but Powerful)
- Caches extremely hot prefixes ("a", "how", "the")
- Reduces round-trip latency
- Shields backend during traffic spikes
3. Global Load Balancer
- Routes user to nearest region
- Distributes traffic evenly
- Removes unhealthy instances
4. Stateless Application Servers
- Input normalization ("How " → "how")
- Cache lookup orchestration
- Fallback to trie storage
- No state → easy horizontal scaling
5. Redis (Hot Prefix Cache)
- Stores top-K suggestions per prefix
- Sub-millisecond response
- TTL auto-expires stale results
6. Distributed Trie Storage
- Stores pre-computed top-K per prefix
- Read-only on serving path
- Partitioned + replicated
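Putting components 4–6 together, the request path on an app server looks roughly like this (`redis_client` is a standard Redis client; `trie_storage.top_k` is a hypothetical client call into the distributed trie tier):

```python
import json

CACHE_TTL_SECONDS = 300  # hot prefixes expire and get refreshed naturally

def get_suggestions(prefix, redis_client, trie_storage, k=5):
    """Serve from Redis first; on a miss, read the pre-computed top-K
    from distributed trie storage and backfill the cache."""
    prefix = prefix.strip().lower()          # input normalization
    cache_key = f"ac:{prefix}"

    cached = redis_client.get(cache_key)     # sub-millisecond hot path
    if cached is not None:
        return json.loads(cached)

    # Cache miss: trie storage already holds pre-computed top-K per prefix,
    # so this is a single read, not a computation.
    suggestions = trie_storage.top_k(prefix, k)   # hypothetical client call
    redis_client.set(cache_key, json.dumps(suggestions), ex=CACHE_TTL_SECONDS)
    return suggestions
```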
🎯 HLD Interview Explanation (1–2 lines)
"Autocomplete is a read-heavy system, so we always try to serve from memory first.
Only on a cache miss do we hit distributed trie storage, which already holds pre-computed top-K results."
Storage, Replication, and Availability
A full autocomplete trie cannot live on a single machine.
Distributed Trie Storage
The trie is:
- Logically one structure
- Physically distributed across many servers
Partitioning Strategy
Common approach: prefix-based partitioning
Examples:
- a–c → Shard 1
- d–f → Shard 2
- Hot prefixes (a, s, th) further subdivided
This keeps shards balanced and scalable.
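A sketch of how an app server could route a prefix to its shard, assuming a small static routing table (in practice this mapping would live in configuration or ZooKeeper, covered below):

```python
# Illustrative routing rules: dedicated shards for hot prefixes,
# first-letter ranges for everything else.
PREFIX_SHARDS = {
    "a": "shard-a",
    "s": "shard-s",
    "th": "shard-th",
}
RANGE_SHARDS = [("a", "c", "shard-1"), ("d", "f", "shard-2"), ("g", "z", "shard-3")]

def shard_for(prefix):
    prefix = prefix.lower()
    # 1. Exact hot-prefix rules, longest match first
    for rule in sorted(PREFIX_SHARDS, key=len, reverse=True):
        if prefix.startswith(rule):
            return PREFIX_SHARDS[rule]
    # 2. Fall back to first-letter ranges
    first = prefix[:1]
    for lo, hi, shard in RANGE_SHARDS:
        if lo <= first <= hi:
            return shard
    return "shard-default"

print(shard_for("the weather"))  # shard-th
print(shard_for("best laptop"))  # shard-1
```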
Replication Strategy
Each trie partition is replicated across multiple servers.
Why Replication Matters
- High Availability: node failures don't impact service
- Fault Tolerance: handles hardware failures and deployments
- Read Scalability: replicas share the massive QPS load
Storage Technology Choices
Cassandra (Durable Storage)
Best fit when:
- Large trie partitions
- Very high read throughput
- Cross-region replication needed
Usage pattern:
- Trie partitions stored as serialized blocks
- Loaded into memory on startup
- Cassandra acts as durable backing store
Strengths:
- Horizontal scalability
- High availability
- Tunable consistency
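A sketch of the startup load path using the DataStax Python driver, assuming trie partitions are stored as serialized blobs keyed by partition id (keyspace, table, and column names are illustrative):

```python
import pickle
from cassandra.cluster import Cluster  # pip install cassandra-driver

def load_partition(partition_id):
    """Fetch one serialized trie partition from Cassandra and
    deserialize it into memory for serving."""
    cluster = Cluster(["cassandra-node-1", "cassandra-node-2"])
    session = cluster.connect("autocomplete")   # keyspace (illustrative)
    row = session.execute(
        "SELECT trie_blob FROM trie_partitions WHERE partition_id = %s",
        (partition_id,),
    ).one()
    cluster.shutdown()
    return pickle.loads(row.trie_blob) if row else None
```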
ZooKeeper (Metadata & Coordination)
ZooKeeper is not used to store trie data.
It is used for:
- Partition metadata
- Prefix-to-server mappings
- Leader election (if required)
- Atomic version switching during trie rebuilds
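A sketch with the kazoo client, reading the prefix-to-shard map and watching an "active version" znode so serving nodes can switch atomically after a rebuild (paths and payload format are illustrative):

```python
import json
from kazoo.client import KazooClient  # pip install kazoo

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

# Prefix-to-shard mapping kept as small JSON metadata, not trie data
mapping_bytes, _stat = zk.get("/autocomplete/partition-map")
partition_map = json.loads(mapping_bytes)

# Watch the active version; when the offline rebuild flips it,
# every serving node reloads the new trie snapshot.
@zk.DataWatch("/autocomplete/active-version")
def on_version_change(data, stat):
    if data is not None:
        print("switching to trie version:", data.decode())
```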
Replication Guarantees
Replication ensures:
- ✅ High availability
- ✅ Fault tolerance
- ✅ Read scalability
🎯 Final Takeaway (Part 2)
A real-world autocomplete system succeeds because it:
- Uses tries for prefix isolation
- Pre-computes top-K suggestions
- Serves requests from memory first
- Pushes all heavy computation offline
- Scales via partitioning and replication
This is the level of clarity interviewers look for.
More Details:
Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli
Git: https://github.com/ZeeshanAli-0704/front-end-system-design