System Design Interview: Autocomplete / Type-ahead System (Final Part)
In Part 1, we covered requirements and capacity planning.
In Part 2, we explored tries, top-K optimization, and system architecture.
In this final part, we’ll focus on scaling the trie, efficient updates, deployment strategies, and partitioning trade-offs — the exact topics that distinguish a good system design answer from an excellent one.
📌 Table of Contents (Final Part)
- Splitting the Trie Across Servers
- Updating the Trie Efficiently
- Deployment Strategies
- Persisting the Trie
- Data Partitioning Strategies
- Trade-Off Summary
- Low-Level Design
- Final Thoughts
Splitting the Trie Across Servers
Once range-based partitioning is selected, the trie must be physically split across machines in a way that keeps:
- Routing simple
- Lookups fast
- Latency predictable
Alphabet-Based Sharding
The most straightforward strategy is alphabet-based sharding.
Example
Shard 1 → a–k
Shard 2 → l–z
Each shard:
- Stores its own trie fragment
- Has multiple replicas
- Independently serves autocomplete requests
Why This Works
- Deterministic routing
- Single-shard reads
- No result aggregation
- Stable latency
This is ideal for prefix-based systems like autocomplete.
Finer-Grained Sharding
As traffic and data grow, shards can be incrementally split.
Examples
Shard 1 → a
Shard 2 → b
Shard 3 → c–d
Or even:
Shard A1 → ap
Shard A2 → am
Shard A3 → an
This approach allows scaling without redesigning the system.
Why Prefix-Based Sharding Works So Well
- Each request hits exactly one shard
- No fan-out queries
- No cross-shard aggregation
- Latency remains low at global scale
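As a concrete illustration, routing can be a longest-prefix match over a shard map, as sketched below. The shard names and map entries are hypothetical; in practice the mapping would be served by a coordination layer such as ZooKeeper, covered later.

```typescript
// Hypothetical shard map: finer-grained prefix entries override coarser
// alphabet ranges. In production this mapping would come from a
// coordination service rather than being hard-coded.
const shardByPrefix: Record<string, string> = {
  ap: "shard-a1",
  am: "shard-a2",
  an: "shard-a3",
};

const shardByRange: Array<{ from: string; to: string; shard: string }> = [
  { from: "a", to: "k", shard: "shard-1" },
  { from: "l", to: "z", shard: "shard-2" },
];

// Longest-prefix match first, then fall back to the alphabet range.
function routeToShard(prefix: string): string {
  const key = prefix.toLowerCase();
  for (let len = Math.min(key.length, 2); len >= 1; len--) {
    const shard = shardByPrefix[key.slice(0, len)];
    if (shard) return shard;
  }
  const first = key[0];
  const range = shardByRange.find((r) => first >= r.from && first <= r.to);
  if (!range) throw new Error(`No shard configured for prefix "${prefix}"`);
  return range.shard;
}

// routeToShard("apple") → "shard-a1"; routeToShard("mango") → "shard-2"
```

Because the match is deterministic, any app server can route any request without consulting the shards themselves.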
Updating the Trie Efficiently
Updating the trie synchronously for every search is impractical.
Autocomplete systems must decouple reads from writes.
Logging and Aggregation Pipeline
- User searches are logged asynchronously
- Logs are written to a durable store (Kafka / Cassandra)
- Periodic batch jobs aggregate frequencies
This avoids blocking the serving path.
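A minimal sketch of the aggregation step, assuming log records shaped like the hypothetical SearchLog below; a real pipeline would consume these from Kafka and run as a scheduled batch job:

```typescript
interface SearchLog {
  query: string;
  timestamp: number; // epoch millis
}

// Aggregate raw search logs into per-query frequencies for one batch window.
function aggregateFrequencies(
  logs: SearchLog[],
  windowStart: number,
): Map<string, number> {
  const frequencies = new Map<string, number>();
  for (const log of logs) {
    if (log.timestamp < windowStart) continue; // skip records outside the window
    const query = log.query.trim().toLowerCase();
    frequencies.set(query, (frequencies.get(query) ?? 0) + 1);
  }
  return frequencies;
}
```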
Pruning During Aggregation
To control growth:
- Very old queries are discarded
- Queries below a frequency threshold are removed
- Only meaningful prefixes survive
This keeps the trie compact and relevant.
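A pruning pass over the aggregated counts might look like this (the thresholds are illustrative, not prescriptive):

```typescript
// Drop queries that are too old or too rare before rebuilding the trie.
function pruneQueries(
  frequencies: Map<string, number>,
  lastSeen: Map<string, number>, // query → last timestamp it was searched
  minFrequency: number,          // e.g. keep only queries seen at least N times
  maxAgeMs: number,              // e.g. discard queries not seen in 90 days
  now: number,
): Map<string, number> {
  const kept = new Map<string, number>();
  for (const [query, count] of frequencies) {
    const seen = lastSeen.get(query) ?? 0;
    if (count >= minFrequency && now - seen <= maxAgeMs) {
      kept.set(query, count);
    }
  }
  return kept;
}
```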
Applying Updates Safely
- A new trie is built offline
- Top-K lists are recomputed
- Once validated, the new trie atomically replaces the old one
👉 Read traffic is never blocked during updates.
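The swap can be modelled as a single reference replacement behind a holder object, as in this sketch; the validation hook is a placeholder for checks such as spot-querying known prefixes:

```typescript
// The serving path always reads through `current`. The swap is one reference
// assignment, so in-flight reads finish on the old trie and new reads see the
// new one — no locking on the read path.
class TrieHolder<T> {
  private activeTrie: T;

  constructor(initial: T) {
    this.activeTrie = initial;
  }

  get current(): T {
    return this.activeTrie;
  }

  swap(newTrie: T, validate: (t: T) => boolean): boolean {
    if (!validate(newTrie)) return false; // reject a bad build before exposure
    this.activeTrie = newTrie;
    return true;
  }
}
```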
Deployment Strategies
Trie updates must be deployed without downtime.
Blue-Green Deployment
- New trie version built in parallel
- Validated with shadow traffic
- Traffic switched atomically
- Old version discarded
This is the safest approach.
Master-Slave Strategy
- Master serves live traffic
- Slave is rebuilt and updated
- Roles are swapped
Both strategies ensure zero downtime and fast rollback.
Persisting the Trie
Rebuilding a trie from raw logs can take hours.
To enable fast recovery:
- Trie snapshots are periodically persisted
- Nodes are serialized in a pre-order traversal, each node followed by its child count
Example Serialization
C2,A2,R1,T,P,O1,D
On restart:
- Trie is reconstructed from snapshots
- Top-K lists are recomputed
- Service resumes quickly
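A sketch of that scheme: the example C2,A2,R1,T,P,O1,D encodes the paths C→A→R→T, C→A→P, and C→O→D, with each node written as its character followed by its child count (omitted when zero). Terminal flags and frequencies are left out here, matching the example; top-K lists are recomputed on load anyway.

```typescript
interface SnapshotNode {
  char: string;
  children: SnapshotNode[];
}

// Pre-order serialization, e.g. CART / CAP / COD → "C2,A2,R1,T,P,O1,D".
function serialize(root: SnapshotNode): string {
  const parts: string[] = [];
  const walk = (n: SnapshotNode) => {
    parts.push(n.children.length > 0 ? `${n.char}${n.children.length}` : n.char);
    n.children.forEach(walk);
  };
  walk(root);
  return parts.join(",");
}

function deserialize(snapshot: string): SnapshotNode {
  const tokens = snapshot.split(",");
  let index = 0;
  const build = (): SnapshotNode => {
    const token = tokens[index++];
    const match = token.match(/^(\D+)(\d*)$/)!;          // character + optional child count
    const node: SnapshotNode = { char: match[1], children: [] };
    const childCount = match[2] ? parseInt(match[2], 10) : 0;
    for (let i = 0; i < childCount; i++) node.children.push(build());
    return node;
  };
  return build();
}
```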
Data Partitioning Strategies
When the dataset grows beyond a single machine, partitioning strategy becomes critical.
Range-Based Partitioning
Data is split using natural prefix ranges.
Example
Shard 1 → a–c
Shard 2 → d–f
Shard 3 → g–z
Why It Fits Autocomplete
- Prefix routing is trivial
- Only one shard per request
- Trie traversal stays localized
Limitations
- Popular prefixes (a, s, th) can create hot shards
- Requires finer splits over time
Capacity-Based Partitioning
Servers are filled until capacity limits are reached.
Example
Server 1 → a → apple → apply (full)
Server 2 → aws → banana → ball
Advantages
- Efficient hardware utilization
Challenges
- Dynamic prefix-to-server mapping
- Routing depends on metadata
- More operational complexity
Often requires ZooKeeper-like coordination.
Hash-Based Partitioning
Data is distributed using a hash function.
Example
hash("apple") → Server 2
hash("apply") → Server 7
Why It Fails for Autocomplete
- Prefixes are scattered
- One request hits multiple shards
- Requires result aggregation
- Increases latency
🚫 Hash-based partitioning is usually avoided for prefix search systems.
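A quick illustration of why hashing scatters a prefix (the hash function here is arbitrary; any uniform hash has the same effect):

```typescript
// With hash partitioning, queries sharing a prefix can land on unrelated
// shards, so a single prefix lookup would have to fan out to all of them.
function shardFor(query: string, shardCount: number): number {
  let hash = 0;
  for (const ch of query) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % shardCount;
}

["apple", "apply", "app store"].forEach((q) =>
  console.log(q, "→ shard", shardFor(q, 8)),
);
// The "app…" queries are spread across shards, unlike range-based
// partitioning where they all live on the same one.
```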
Partitioning Strategy Summary
👉 Range-based partitioning aligns naturally with prefix queries and is the foundation for scalable autocomplete systems.
Distributed Trie Storage (Expanded)
At scale, the trie is:
- Logically one structure
- Physically distributed
Each machine owns a subset of prefixes but behaves as part of a unified system.
Trie Partitioning Example
Shard 1 → a–c
Shard 2 → d–f
Shard 3 → g–z
Hot prefixes are split further:
Shard A1 → a
Shard A2 → b
Shard A3 → c
This prevents hotspots and balances load.
Replication Strategy
Each shard is replicated:
Shard 1 → Replica 1, Replica 2, Replica 3
Benefits
- High Availability
- Fault Tolerance
- Read Scalability
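Reads can be spread across a shard's replicas; a minimal round-robin sketch with hypothetical host names:

```typescript
// Round-robin read balancing across one shard's replicas.
class ReplicaSet {
  private next = 0;
  constructor(private readonly replicas: string[]) {}

  pickForRead(): string {
    const replica = this.replicas[this.next];
    this.next = (this.next + 1) % this.replicas.length;
    return replica;
  }
}

const shard1 = new ReplicaSet([
  "trie-1a.internal", // hypothetical hosts
  "trie-1b.internal",
  "trie-1c.internal",
]);
// shard1.pickForRead() rotates across the three replicas
```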
Storage Technologies
Cassandra (Durable Layer)
Best when:
- Trie partitions are large
- QPS is extremely high
- Cross-region replication is required
Usage:
- Trie stored as serialized blocks
- Loaded into memory on startup
- Cassandra acts as source of truth
ZooKeeper (Coordination)
ZooKeeper manages:
- Prefix-to-shard mappings
- Replica health
- Leader election
- Atomic trie version switching
It guarantees consistent routing decisions.
Trade-Off Summary
| Aspect | Decision | Reasoning |
|---|---|---|
| Data Structure | Trie | Efficient prefix matching |
| Ranking | Pre-computed Top-K | Predictable low latency |
| Cache | Redis | Absorb hot prefixes |
| Updates | Offline batching | Protect read path |
| Partitioning | Range-based | Single-shard prefix routing |
| Storage | Distributed replicas | Scalability & availability |
Low-Level Design
LLD Goal
Explain how data is structured, how lookups work internally, and why latency is predictable.
🔹 LLD – Trie Node Structure
```typescript
interface TrieNode {
  children: Map<string, TrieNode>; // next characters
  isTerminal: boolean;             // marks the end of a complete query
  frequencyScore: number;          // popularity (frequency + recency)
  topKSuggestions: string[];       // pre-computed best results, sorted by score, size ≤ K
                                   // (a bounded min-heap works equally well)
}
```
What Each Field Means
- `children` → next characters
- `isTerminal` → end of a valid query
- `frequencyScore` → popularity (freq + recency)
- `topKSuggestions` → pre-computed best results for this prefix
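During the offline build, each prefix node's `topKSuggestions` list is kept bounded; a simple sketch (a bounded min-heap achieves the same with better asymptotics):

```typescript
// Offline helper: merge a candidate query into a prefix node's top-K list.
// Scores come from the aggregated frequency table built by the batch job.
function addSuggestion(
  topK: string[],
  candidate: string,
  scoreOf: (query: string) => number,
  k: number,
): void {
  if (!topK.includes(candidate)) topK.push(candidate);
  topK.sort((a, b) => scoreOf(b) - scoreOf(a)); // highest score first
  if (topK.length > k) topK.length = k;         // keep only the best K
}
```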
🔹 LLD – Trie Example
Queries:
"app" (50)
"apple" (100)
"app store" (200)
Trie view:
a
└── p
└── p
├── app* [top-K: app store, apple, app]
├── apple* [freq=100]
└── app store* [freq=200]
Each prefix node already knows its best suggestions.
🔹 LLD – Prefix Lookup Flow
Input Prefix: "app"
1. Traverse trie: a → p → p
2. Reach prefix node
3. Read pre-computed top-K
4. Return immediately
⏱ Time Complexity: O(length of prefix)
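A sketch of that lookup against the TrieNode structure above:

```typescript
// O(prefix length): walk one node per character, then return the
// pre-computed list. No subtree traversal, no sorting at query time.
function lookup(root: TrieNode, prefix: string, k: number): string[] {
  let node: TrieNode = root;
  for (const ch of prefix.toLowerCase()) {
    const next = node.children.get(ch);
    if (!next) return []; // unknown prefix → no suggestions
    node = next;
  }
  return node.topKSuggestions.slice(0, k);
}
```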
🔹 LLD – Cache Key Design
Key: autocomplete:{locale}:{prefix}
Example:
autocomplete:en-IN:how
Value:
[
"how to cook rice",
"how to lose weight",
"how to make tea"
]
🔹 LLD – Cache + Trie Interaction
App Server
|
|---> Redis (Hit?) ----> Return result
|
|---> Trie Storage ----> Fetch top-K
|
+--> Store in Redis (TTL)
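The same interaction as a read-through sketch, assuming a Redis client such as ioredis and a hypothetical fetchTopKFromTrieShard call for the trie-storage hop; the key format and TTL behavior follow the design above:

```typescript
import Redis from "ioredis"; // assumed client; any Redis client with get/set works

const redis = new Redis();
const CACHE_TTL_SECONDS = 300; // illustrative TTL for hot prefixes

async function getSuggestions(locale: string, prefix: string): Promise<string[]> {
  const key = `autocomplete:${locale}:${prefix.toLowerCase()}`;

  // 1. Cache hit → return immediately.
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  // 2. Cache miss → fetch the pre-computed top-K from the trie shard.
  const suggestions = await fetchTopKFromTrieShard(locale, prefix);

  // 3. Populate the cache with a TTL so hot prefixes stay warm.
  await redis.set(key, JSON.stringify(suggestions), "EX", CACHE_TTL_SECONDS);
  return suggestions;
}

// Placeholder for the trie-shard call described earlier.
declare function fetchTopKFromTrieShard(locale: string, prefix: string): Promise<string[]>;
```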
🔹 LLD – Distributed Trie Partitioning
Shard 1 → prefixes a–c
Shard 2 → prefixes d–f
Shard 3 → prefixes g–z
For hot prefixes:
Shard A1 → ap
Shard A2 → am
Shard A3 → an
✔ Single-shard lookup
✔ No aggregation
✔ Predictable latency
🔹 LLD – Write / Update Path (Offline)
User Searches
|
v
+-------------+
| Log Stream | (Kafka)
+-------------+
|
v
+-------------+
| Aggregation | (Batch Job)
+-------------+
|
v
+-------------+
| Build New |
| Trie |
+-------------+
|
v
+-------------+
| Atomic Swap |
+-------------+
Writes never block reads.
🔹 LLD – Failure Handling
| Failure | Behavior |
|---|---|
| Redis down | Read directly from trie |
| Trie shard down | Route to replica |
| App server down | LB removes it |
| Region down | Route to nearest region |
🎯 LLD Interview Explanation (1–2 lines)
“Each trie node stores pre-computed top-K suggestions, so autocomplete never traverses subtrees at runtime.
This guarantees predictable latency even at very high QPS.”
HLD vs LLD – Interview Cheat Sheet
| Aspect | HLD | LLD |
|---|---|---|
| Focus | Components & flow | Data structures & internals |
| Audience | Product / System | Engineers |
| Latency | Cache → Trie | O(prefix length) |
| Updates | Offline | Atomic swap |
| Scaling | Shards + replicas | Prefix routing |
Final Thoughts
Autocomplete looks simple — but at scale, it’s a carefully engineered distributed system.
The winning design principles are:
- Tries for structure
- Pre-computation for speed
- Caching for hot paths
- Batch updates for safety
- Replication for resilience
If you can explain this flow clearly in an interview, you demonstrate not just knowledge — but systems thinking.
More Details:
Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli
Git: https://github.com/ZeeshanAli-0704/front-end-system-design