System Design Interview: Autocomplete / Type-ahead System (Part 2)

In Part 1, we covered the problem statement, requirements, and capacity estimation.
In this part, we'll deep-dive into the core data structures, ranking optimizations, and high-level system architecture that make autocomplete systems fast and scalable at Google-level traffic.


📌 Table of Contents (Part 2)

  1. Choosing the Core Data Structure
  2. Optimizing the Trie Structure
  3. How Top K Suggestions Are Retrieved
  4. Building and Maintaining Top K Lists
  5. High Level System Architecture
  6. Storage, Replication, and Availability

Choosing the Core Data Structure

Autocomplete is fundamentally a prefix-matching problem.

Given a prefix, return the most relevant strings that start with it, fast.

This immediately eliminates naive solutions like scanning a list or database table. We need a structure that is prefix-aware by design.


Why a Trie Works Well

A Trie (Prefix Tree) stores strings character by character.

  • Each node represents a prefix
  • All descendants share that prefix

Key Advantages

  • Lookup time depends only on prefix length
  • No scanning of unrelated strings
  • Naturally groups similar queries
  • Perfect fit for prefix search

Example

Stored queries:

```
be, bee, bell, bent, belt
```

Trie traversal for prefix "be" leads to a subtree containing:

```
bee, bell, bent, belt
```

This is exactly what autocomplete needs: fast prefix isolation.
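
To make this concrete, here is a minimal TypeScript sketch of prefix isolation. The class and method names are illustrative, not taken from any particular implementation:

```typescript
// Minimal trie sketch: insert full queries, then isolate the subtree for a prefix.
class TrieNode {
  children = new Map<string, TrieNode>();
  isTerminal = false; // marks the end of a stored query
}

class Trie {
  private root = new TrieNode();

  insert(query: string): void {
    let node = this.root;
    for (const ch of query) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch)!;
    }
    node.isTerminal = true;
  }

  // Walk down the prefix; every query below this node starts with the prefix.
  findPrefixNode(prefix: string): TrieNode | null {
    let node = this.root;
    for (const ch of prefix) {
      const next = node.children.get(ch);
      if (!next) return null;
      node = next;
    }
    return node;
  }
}

// Usage: the subtree under "be" contains exactly be, bee, bell, bent, belt.
const trie = new Trie();
["be", "bee", "bell", "bent", "belt"].forEach(q => trie.insert(q));
const beNode = trie.findPrefixNode("be"); // non-null, so the prefix exists
```

Finding the prefix node costs one step per character of the prefix, independent of how many total queries are stored.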


Optimizing the Trie Structure

A naive trie becomes extremely large when storing billions of queries.

Problem with a Naive Trie

  • Too many nodes
  • Deep chains with single children
  • High memory overhead
  • Excessive pointer chasing

Compressed Trie (Radix Tree)

In many cases, nodes have only one child.
These chains can be merged into a single edge storing multiple characters.

Benefits

  • Reduced memory usage
  • Fewer pointer traversals
  • Faster cache-friendly lookups

This optimized trie is often called a Radix Tree or Compact Trie.
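
A rough sketch of the compression step, building on the TrieNode shape from the sketch above (illustrative only; a real radix tree would also change how lookups match multi-character edge labels):

```typescript
// Sketch: collapse single-child, non-terminal chains into multi-character edges.
// Works on the TrieNode shape defined earlier; applied from the root downward.
function compress(node: TrieNode): Map<string, TrieNode> {
  const merged = new Map<string, TrieNode>();
  for (const [ch, child] of node.children) {
    let label = ch;
    let current = child;
    // Absorb the chain while it has exactly one child and is not a terminal node.
    while (current.children.size === 1 && !current.isTerminal) {
      const [nextCh, nextNode] = current.children.entries().next().value as [string, TrieNode];
      label += nextCh;
      current = nextNode;
    }
    current.children = compress(current); // recurse into the remaining subtree
    merged.set(label, current);           // one edge now carries the whole chain
  }
  return merged;
}
```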


Storing Popularity Information

How popular is a full search query?

Each complete query ends at a terminal node in the trie.

At that node, we store:

  • Search frequency
  • Or a weighted score combining frequency + recency

Example

User searches:

"apple" β†’ 100 searches
"application" β†’ 60 searches
"app store" β†’ 200 searches
Enter fullscreen mode Exit fullscreen mode

Trie view:

```
app
 ├── apple (freq=100)
 ├── application (freq=60)
 └── app store (freq=200)
```

At this stage:

  • We know popularity of full queries
  • We cannot yet answer prefix queries fast

This layer answers:

“How popular is each query?”
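
A small sketch of this data layer, assuming frequencies arrive from offline log aggregation rather than per-request writes (names are illustrative):

```typescript
// Sketch: track popularity on terminal nodes. Counts come from a batch
// aggregation job, not from the live request path.
class ScoredTrieNode {
  children = new Map<string, ScoredTrieNode>();
  isTerminal = false;
  frequency = 0; // raw search count, or a weighted frequency + recency score
}

function recordQuery(root: ScoredTrieNode, query: string, count: number): void {
  let node = root;
  for (const ch of query) {
    if (!node.children.has(ch)) node.children.set(ch, new ScoredTrieNode());
    node = node.children.get(ch)!;
  }
  node.isTerminal = true;
  node.frequency += count; // aggregated counts land on the terminal node
}

const root = new ScoredTrieNode();
recordQuery(root, "apple", 100);
recordQuery(root, "application", 60);
recordQuery(root, "app store", 200);
```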


How Top K Suggestions Are Retrieved

Given a prefix, how do we return the best suggestions instantly?


Naive Approach ❌

For prefix "be":

  1. Traverse to prefix node
  2. Traverse entire subtree
  3. Collect all terminal queries
  4. Sort by frequency
  5. Return top K

🚫 With large subtrees, this per-keystroke traversal and sort is far too slow for a real-time request path.
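
To see why, here is the naive approach as code, reusing the ScoredTrieNode shape from the sketch above; the full subtree walk and sort run on every keystroke:

```typescript
// Naive lookup sketch (what NOT to do on the request path): walk the whole
// subtree under the prefix node, collect every terminal query, then sort.
function naiveTopK(prefixNode: ScoredTrieNode, prefix: string, k: number): string[] {
  const results: { query: string; freq: number }[] = [];
  const walk = (node: ScoredTrieNode, path: string): void => {
    if (node.isTerminal) results.push({ query: path, freq: node.frequency });
    for (const [ch, child] of node.children) walk(child, path + ch);
  };
  walk(prefixNode, prefix);
  // Sorting the entire candidate set per keystroke is what makes this too slow.
  return results.sort((a, b) => b.freq - a.freq).slice(0, k).map(r => r.query);
}
```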


Pre-Computed Top-K Optimization ✅

To guarantee fast responses:

👉 Each trie node stores the top-K suggestions for that prefix.

So at query time:

```
Traverse to prefix node → return stored top-K
```

No subtree traversal.
No sorting.
No computation on request path.
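
The request path then reduces to a prefix walk plus a read of the stored list. A sketch, assuming each node carries a pre-sorted topK field filled by the offline build described in the next section:

```typescript
// Lookup sketch when each node already stores its top-K list.
interface RankedNode {
  children: Map<string, RankedNode>;
  topK: { query: string; freq: number }[]; // pre-sorted, length <= K
}

function suggest(root: RankedNode, prefix: string): string[] {
  let node = root;
  for (const ch of prefix) {
    const next = node.children.get(ch);
    if (!next) return []; // unknown prefix: nothing to suggest
    node = next;
  }
  return node.topK.map(entry => entry.query); // no traversal, no sorting
}
```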


Storage Optimization

To reduce memory overhead:

  • Store references to terminal nodes
  • Store frequency alongside references

This is a classic space-versus-latency trade-off, and because autocomplete is extremely read-heavy, it's worth it.


Building and Maintaining Top K Lists

Top-K lists are built bottom-up.

Build Process

  1. Leaf nodes know their own frequency
  2. Parent nodes merge their children's top-K lists
  3. Only top K entries are retained
  4. Process propagates upward to the root (see the build sketch below)

This ensures:

  • Every prefix node has pre-sorted suggestions
  • Lookup time is O(prefix length)
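
A sketch of that bottom-up build, using its own self-contained node shape (field names are illustrative):

```typescript
// Offline build sketch: compute topK bottom-up. Each node merges its children's
// already-built lists, plus its own terminal query if it has one, and keeps K entries.
interface BuildNode {
  children: Map<string, BuildNode>;
  isTerminal: boolean;
  frequency: number;
  query?: string; // the full query string, set on terminal nodes
  topK: { query: string; freq: number }[];
}

function buildTopK(node: BuildNode, k: number): void {
  const candidates: { query: string; freq: number }[] = [];
  if (node.isTerminal && node.query) {
    candidates.push({ query: node.query, freq: node.frequency });
  }
  for (const child of node.children.values()) {
    buildTopK(child, k);            // build children first (post-order)
    candidates.push(...child.topK); // then merge their already-trimmed lists
  }
  node.topK = candidates
    .sort((a, b) => b.freq - a.freq)
    .slice(0, k); // keep only the K best entries at this prefix
}
```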

Two-Layer Mental Model

Think of autocomplete as two separate layers:

Layer 1: Popularity Tracking (Data Layer)

  • Tracks how often full queries are searched
  • Stored at terminal nodes
  • Updated offline via logs and aggregation

Layer 2: Fast Retrieval (Serving Layer)

  • Uses popularity data to compute rankings
  • Stores only top-K per prefix
  • Optimized purely for read latency

High Level System Architecture

Autocomplete is a read-heavy, latency-critical pipeline.

Serve from memory first → fall back to structured storage → never compute on the request path

Autocomplete is not a compute problem; it's a data-access problem.


HLD Goal

Show how traffic flows end-to-end and where latency is optimized.

At HLD level, interviewers want to see:

  • Clear separation of responsibilities
  • Read-heavy optimization
  • Scalability and availability

🔹 High-Level Architecture Diagram

```
+-------------------+
|   User (Browser)  |
+-------------------+
          |
          v
+-------------------+
|   CDN / Edge      |
| (Optional Cache)  |
+-------------------+
          |
          v
+-------------------+
| Global Load       |
| Balancer          |
+-------------------+
          |
          v
+-------------------+
| Stateless App     |
| Servers           |
+-------------------+
          |
          v
+-------------------+
| In-Memory Cache   |
| (Redis)           |
+-------------------+
          |
     Cache Miss
          v
+-------------------+
| Distributed Trie  |
| Storage           |
+-------------------+
```

🔹 HLD Component Responsibilities

1. Client (Browser / Mobile)

  • Sends only prefix, not full query
  • Debounces keystrokes (e.g., 100–150 ms; see the sketch below)
  • Reduces unnecessary backend calls
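
A minimal client-side debounce sketch; the 120 ms window and the /suggest endpoint are illustrative assumptions, not a prescribed API:

```typescript
// Debounce keystrokes so only the "settled" prefix reaches the backend.
function debounce<T extends unknown[]>(fn: (...args: T) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

const fetchSuggestions = debounce((prefix: string) => {
  // Only the prefix is sent; the hypothetical /suggest endpoint returns top-K results.
  fetch(`/suggest?q=${encodeURIComponent(prefix)}`)
    .then(res => res.json())
    .then(suggestions => console.log(suggestions));
}, 120);

// Wired to the search box, every keystroke calls fetchSuggestions(input.value),
// but at most one request fires per 120 ms pause in typing.
```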

2. CDN / Edge Cache (Optional but Powerful)

  • Caches extremely hot prefixes ("a", "how", "the")
  • Reduces round-trip latency
  • Shields backend during traffic spikes

3. Global Load Balancer

  • Routes user to nearest region
  • Distributes traffic evenly
  • Removes unhealthy instances

4. Stateless Application Servers

  • Input normalization ("How " → "how")
  • Cache lookup orchestration
  • Fallback to trie storage (see the read-path sketch after this component list)
  • No state → easy horizontal scaling

5. Redis (Hot Prefix Cache)

  • Stores top-K suggestions per prefix
  • Sub-millisecond response
  • TTL auto-expires stale results

6. Distributed Trie Storage

  • Stores pre-computed top-K per prefix
  • Read-only on serving path
  • Partitioned + replicated
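
Putting components 4–6 together, the read path on the app server might look like the sketch below; CacheClient and TrieStore are hypothetical interfaces standing in for Redis and the distributed trie storage:

```typescript
// Read-path sketch: normalize, check the cache, fall back to trie storage,
// then repopulate the cache. No computation happens on this path.
interface CacheClient {
  get(key: string): Promise<string[] | null>;
  set(key: string, value: string[], ttlSeconds: number): Promise<void>;
}
interface TrieStore {
  topK(prefix: string): Promise<string[]>; // pre-computed top-K, read-only
}

async function getSuggestions(
  raw: string,
  cache: CacheClient,
  store: TrieStore
): Promise<string[]> {
  const prefix = raw.trim().toLowerCase(); // input normalization ("How " → "how")
  if (!prefix) return [];

  const cached = await cache.get(`suggest:${prefix}`);
  if (cached) return cached; // hot path: sub-millisecond cache hit

  const suggestions = await store.topK(prefix); // cache miss: read pre-computed top-K
  await cache.set(`suggest:${prefix}`, suggestions, 300); // e.g. 5-minute TTL
  return suggestions;
}
```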

🎯 HLD Interview Explanation (1–2 lines)

“Autocomplete is a read-heavy system, so we always try to serve from memory first.
Only on cache miss do we hit distributed trie storage, which already has pre-computed top-K results.”


Storage, Replication, and Availability

A full autocomplete trie cannot live on a single machine.


Distributed Trie Storage

The trie is:

  • Logically one structure
  • Physically distributed across many servers

Partitioning Strategy

Common approach: prefix-based partitioning

Examples:

  • a–c → Shard 1
  • d–f → Shard 2
  • Hot prefixes (a, s, th) further subdivided

This keeps shards balanced and scalable.
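
A routing sketch under these assumptions (ranges and shard names are illustrative):

```typescript
// Static prefix-range routing. In practice, hot first letters ("a", "s", "t")
// would be subdivided further, e.g. "aa–am" and "an–az" on separate shards.
function shardFor(prefix: string): string {
  const first = prefix[0] ?? "";
  if (first >= "a" && first <= "c") return "shard-1"; // a–c
  if (first >= "d" && first <= "f") return "shard-2"; // d–f
  if (first >= "g" && first <= "m") return "shard-3";
  if (first >= "n" && first <= "s") return "shard-4";
  return "shard-5"; // t–z, digits, and everything else
}
```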


Replication Strategy

Each trie partition is replicated across multiple servers.

Why Replication Matters

  • High Availability
    Node failures don't impact service

  • Fault Tolerance
    Handles hardware failures and deployments

  • Read Scalability
    Replicas share massive QPS load


Storage Technology Choices

Cassandra (Durable Storage)

Best fit when:

  • Large trie partitions
  • Very high read throughput
  • Cross-region replication needed

Usage pattern:

  • Trie partitions stored as serialized blocks
  • Loaded into memory on startup
  • Cassandra acts as durable backing store
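
A sketch of this usage pattern; BlobStore is a hypothetical interface standing in for a Cassandra-backed table, and no actual driver API is shown:

```typescript
// Each trie partition is stored as an opaque serialized block keyed by its
// prefix range, then loaded into memory when a serving node starts up.
interface BlobStore {
  put(key: string, value: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array | null>;
}

// The in-memory partition type is whatever the serving trie uses.
type TriePartition = unknown;

// Deserialization is left abstract here on purpose.
declare function deserializeTrie(bytes: Uint8Array): TriePartition;

async function loadPartition(store: BlobStore, partitionKey: string): Promise<TriePartition | null> {
  const blob = await store.get(`trie-partition:${partitionKey}`); // e.g. key "a-c:v42"
  return blob ? deserializeTrie(blob) : null;
}
```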

Strengths:

  • Horizontal scalability
  • High availability
  • Tunable consistency

ZooKeeper (Metadata & Coordination)

ZooKeeper is not used to store trie data.

It is used for:

  • Partition metadata
  • Prefix-to-server mappings
  • Leader election (if required)
  • Atomic version switching during trie rebuilds (see the metadata sketch below)
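
A sketch of the kind of partition metadata a serving node might read from ZooKeeper; the znode paths and field names are illustrative assumptions, not a real layout:

```typescript
// Coordination metadata: which prefix range lives where, and which build
// version is currently active (flipped atomically after a rebuild).
interface PartitionMetadata {
  prefixRange: string;   // e.g. "a-c"
  activeVersion: string; // e.g. "v42"
  replicas: string[];    // host:port of servers holding this partition
}

// Hypothetical znode layout:
//   /autocomplete/partitions/a-c  -> JSON-encoded PartitionMetadata
//   /autocomplete/partitions/d-f  -> ...
const example: PartitionMetadata = {
  prefixRange: "a-c",
  activeVersion: "v42",
  replicas: ["trie-01:9000", "trie-02:9000", "trie-03:9000"],
};
```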

Replication Guarantees

Replication ensures:

  • ✅ High availability
  • ✅ Fault tolerance
  • ✅ Read scalability

🎯 Final Takeaway (Part 2)

A real-world autocomplete system succeeds because it:

  • Uses tries for prefix isolation
  • Pre-computes top-K suggestions
  • Serves requests from memory first
  • Pushes all heavy computation offline
  • Scales via partitioning and replication

This is the level of clarity interviewers look for.

More Details:

Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli

systemdesignwithzeeshanali

Git: https://github.com/ZeeshanAli-0704/front-end-system-design
