Pallavi Singh
Google Typeahead System Design

Search Bar → As the user types a few characters, the system should return the top 10 suggestions for the typed characters.
The suggestions returned should be the ones with the maximum frequency in your table.

Example: the user types the word "CA" in the search bar.
CA →

CAT → 20K <- frequency count
CATTLE → 25K <- frequency count
CATASTROPHIC → 20001 <- frequency count

When a user searches for a particular word in the search bar: if the word is in the dictionary, increase the frequency count of the word; if it is not found in the dictionary, add the new word to the dictionary.

Generally, a high-level design question consists of 4 parts:
1. Requirements.
2. Estimations.
3. Detailed Design.
4. Monitoring, Alerting and Backups.
1.Requirements (5-7 mins):
Functional →
Should be able to return the top 10 suggestions.
Update the frequency of the word which has been searched.
Here we are considering only words and not sentences. (Assumption)
Wait for another 500ms before you return the next results. [Suppose the user types CA, pauses, and then types CAME — return the top 10 results for the new prefix.]
Spell check is not required. (Out of Scope) [The user might have made a spelling mistake, but you don't need to handle it.]
Customized suggestions. (Out of Scope) [Based on the user's previous searches.]
Region-specific searches. (Out of Scope)
Only digits and alphabets have to be considered.
Non-Functional → Requirements which you, as an engineer, should decide; not considering non-functional requirements can lead to a bad design and hamper the user experience.
Latency → Low, approx 10 ms. [The user types something and gets results within 10ms.]
Availability → High availability. [Results should come back as soon as possible, and the system is always available to the user.]
Consistency → Not required immediately, but the system should be eventually consistent.
The system should be reliable in the sense that it doesn't capture user-specific information.
Best suggestions are returned from the system.
Security layers are in place. [Authentication/Authorization]
Rate limiter.

2.Estimations (5-7 mins):
When you start with estimations you have to make assumptions, or you ask your interviewer.
Assumptions:
Approx 100M word searches per day.
Roughly 0.05% of the words being searched (≈50K per day) are new to the system.

QPS (Queries Per Second): It consists of two parts:
Read QPS
Write QPS
Read QPS [Return the top 10 suggestions from the system]
100 * 10^6 (per day) / 86,400 (seconds per day)
≈ 10^8 / 10^5
≈ 10^3 = 1000 read QPS * 2 (peak multiplier) = 2000
Write QPS [Recording a searched keyword into the system]
The read-to-write ratio is 3:1. [Typing a word like CAT triggers a read for each prefix — C, CA, CAT (each after the 500ms wait) — but only one write when the final word's frequency is updated, so reads outnumber writes roughly 3 to 1.]
Write QPS → 2000 / 3 ≈ 600-700 writes per second.
Capacity [How much RAM or disk space the resources need]
100M searches per day.
For an entry:
500M words (current dictionary) [Consider only the English dictionary]
500M * (7 chars on avg per word) * (1 byte per char) * (100x overhead per entry) → words already present. [Each word takes ≈700 bytes of space.]
Every day → ~50K new words. → 50K * 7 * 100 = 35M bytes ≈ 35 MB of new words per day.
For the next 1 year:
500M * 7 * 100 + 35M * 365 bytes
= 350 GB + ~13 GB
≈ 360 GB of data per year.

3.Detailed Design (30-35 mins):
APIs [List down all the APIs]

Read API → The user is waiting for the result.
GET request
getSuggestions(char[] prefix);
Return → Top 10 suggestions.
The system is read-heavy and should be optimized for returning the results ASAP.
Update API → The user is not waiting for the result.
POST request
updateFreqForWord(char[] word);
Return → Nothing, or Success/Failure.

The system is read-heavy and should be optimized for returning the results ASAP.
Writes can be slower, because users really don't care about the increase in the frequency. → Preference is given to reads over writes.
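A minimal in-memory sketch of the two endpoints. `getSuggestions` and `updateFreqForWord` are the names from the API list; the map-backed store and the alphabetical top-10 are placeholder simplifications of my own, not the real frequency-ranked design:

```cpp
#include <map>
#include <string>
#include <vector>

// In-memory stand-in for the suggestion service; a real deployment would
// back these two calls with the Suggestion and WordFreq tables.
class TypeaheadService {
    std::map<std::string, int> wordFreq;  // word -> frequency
public:
    // Read API (GET): user is waiting, must be fast - return up to 10 matches.
    // Naive: returns the first 10 matches alphabetically, not by frequency.
    std::vector<std::string> getSuggestions(const std::string& prefix) {
        std::vector<std::string> result;
        // keys sharing the prefix are contiguous in an ordered map
        for (auto it = wordFreq.lower_bound(prefix);
             it != wordFreq.end()
             && it->first.compare(0, prefix.size(), prefix) == 0
             && result.size() < 10; ++it)
            result.push_back(it->first);
        return result;
    }
    // Update API (POST): user is not waiting - can be applied asynchronously.
    bool updateFreqForWord(const std::string& word) {
        ++wordFreq[word];  // new words start at frequency 1, existing ones increment
        return true;       // success/failure
    }
};
```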
System Design Modelling [Create the design and its components]
There are some components that are always present in the system:
Authentication/Authorization layer [Load balancer / microservice architecture / orchestration layer]
Web servers [serving the read/update APIs]
Load balancers [in front of the app servers that perform the business logic]
App servers [with a cache; the suggestion service stores a sorted data structure]
Replication of data

-> Return the top 10 suggestions.
-> Return/update the frequency of the searched word.
-> Instead of storing only word → frequency key-values, we can also store prefix → (word, frequency) pairs.
-> Reads become faster.
-> Store word → frequency key-values in database-1 (for updates), and prefix → (word, frequency) pairs in database-2 (for reads).
-> Initialize a threshold value; when a word's frequency crosses it, update database-2 via an offline job [comparing frequencies and updating the stored frequency].
-> One table is a frequency table and the other is a prefix table, so reads do not affect the write database.
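The threshold-driven offline job can be sketched as follows; `SYNC_THRESHOLD`, its value, and the two in-memory maps standing in for the write and read databases are assumptions for illustration:

```cpp
#include <map>
#include <string>

// Frequencies as seen by the write DB vs. the values last pushed to the read DB.
std::map<std::string, int> writeDbFreq, readDbFreq;
const int SYNC_THRESHOLD = 1000;  // assumed value

// Offline job: compare frequencies and push only words that have drifted far
// enough, so the read database is untouched by the steady stream of small
// frequency increments. Returns how many words were synced.
int syncReadDatabase() {
    int updated = 0;
    for (const auto& [word, freq] : writeDbFreq) {
        if (freq - readDbFreq[word] >= SYNC_THRESHOLD) {
            readDbFreq[word] = freq;  // in production: rebuild the prefix entries too
            ++updated;
        }
    }
    return updated;
}
```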

Refer to diagram 1.1.

Tables and Indexes [List down all the tables]
NoSQL DB
Table 1 (Suggestion Table)
String prefix ← Key
List<String> top 10 suggestions ← Value
Table 2 (WordFreq Table)
String word ← Key (PK)
Int64 freq ← Value
Scaling:
Horizontal scaling [adding more machines instead of increasing the size of each machine]
Prefer a mix of horizontal and vertical scaling.
Sharding and partitioning of the data:
Range-based sharding → Suggestion Table
A, B, C, …, Z → 26 instances
2nd layer, if the A shard is heavily loaded:
AA-AP, AQ-AZ
Date-based sharding → WordFreq Table
For every day you create a separate shard of (word, freq) entries. [You have a rolling window.]
2nd layer → range-based
Replication:
In general, write to the master and read from the slaves.
(Suggestion Table) → Master-slave, up to 5 replicas.
(WordFreq Table) → Master-master, 2 replicas. [You can write to two servers at the same time.]
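The two-layer range-based routing can be sketched as a function mapping a prefix to a shard. The shard numbering here (b-z on the first layer, the split "A" ranges on a second layer) is my own assumption for illustration:

```cpp
#include <string>

// Route a lowercase, non-empty prefix to a shard:
// first letter picks one of the 26 first-level shards, but the overloaded
// 'a' shard is split again into two second-level shards (aa-ap, aq-az).
int shardForPrefix(const std::string& prefix) {
    char first = prefix[0];
    if (first != 'a')
        return first - 'a';  // shards 1..25 for b..z (shard 0 unused: 'a' is split)
    // second layer: aa-ap -> shard 26, aq-az -> shard 27 (bare "a" goes to 26)
    char second = prefix.size() > 1 ? prefix[1] : 'a';
    return second <= 'p' ? 26 : 27;
}
```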
Caching:
The cache can itself be a database.

80-20 rule:
80% of the requests on the system will query 20% of the data, and the remaining 20% of the requests will query the other 80% of the data.
Cache those hot 20% of prefixes.
Trie → a cache data structure optimized for reads and writes.
Have a strategy to evict a prefix from the cache.
Load Balancing:
Helps in better distribution of load across layers.
Round-robin strategy for balancing the load between the servers/machines.
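A minimal round-robin picker, assuming a static server list (the server names are placeholders):

```cpp
#include <string>
#include <vector>

// Round-robin: each request goes to the next server in order, wrapping
// around at the end, so load spreads evenly across the fleet.
class RoundRobinBalancer {
    std::vector<std::string> servers;
    std::size_t next = 0;  // index of the server to hand out next
public:
    explicit RoundRobinBalancer(std::vector<std::string> s)
        : servers(std::move(s)) {}

    const std::string& pick() {
        const std::string& chosen = servers[next];
        next = (next + 1) % servers.size();  // wrap around
        return chosen;
    }
};
```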
Auths (Authentication and Authorization):
Not required for this design at the user level.

4.Monitoring, Alerting and Backups (2-3 mins):

Helps monitor the latency of each of the APIs.
Monitor the success and failure rates of the APIs.
QPS for each of the APIs.
Monitoring on caches.
→ Alert on failure if requests fail x times within a window of time (1 hour, 10 mins, 20 mins).
→ Alert on latency if x% of the total requests within a window take more than t ms to return the results.
Backups allow retrieval of data that could be lost because of some bug.
Product metrics on backups.
Create metrics for experiments on backups.
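The fail-x-times-in-a-window alert rule above can be sketched like this; the window length and failure threshold are illustrative values:

```cpp
#include <deque>

// Fire an alert when requests fail more than `maxFailures` times within a
// window of `windowSeconds` (e.g. 10 min, 20 min, 1 hour).
class FailureAlert {
    std::deque<long> failureTimes;  // timestamps (seconds) of recent failures
    long windowSeconds;
    int maxFailures;
public:
    FailureAlert(long window, int maxFail)
        : windowSeconds(window), maxFailures(maxFail) {}

    // Record a failure at time `now`; returns true if this failure
    // pushes the in-window count over the threshold.
    bool recordFailure(long now) {
        failureTimes.push_back(now);
        // drop failures that have fallen out of the window
        while (!failureTimes.empty() && now - failureTimes.front() > windowSeconds)
            failureTimes.pop_front();
        return static_cast<int>(failureTimes.size()) > maxFailures;
    }
};
```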

Diagram: 1.1

Cache Design

We need to store words, and we should know the top 10 suggestions for every prefix.

The latency of returning the response should be low.

CAR → 20K
CAMEL → 30K
CAPE → 100

Problem [using a hash map in Java or a dictionary in Python]:
We store the word as the key and the frequency as the value. To answer a query we would have to iterate through all the words present in the dictionary/hash map, check which ones start with "CA", and return the top 10 suggestions. This costs time on every query and extra memory for collecting every word that matches "CA".
To overcome this problem we use another famous data structure built for storing a lot of words.
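To make the problem concrete, here is the naive hash-map version just described — every query scans the whole dictionary and sorts the matches, which is exactly the cost a trie avoids. Function and variable names are my own:

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Naive approach: word -> frequency in a hash map. Every keystroke walks the
// entire map, filters by prefix, then sorts by frequency: O(N log N) per
// query plus extra memory for the matches.
std::vector<std::string> topSuggestions(
        const std::unordered_map<std::string, int>& dict,
        const std::string& prefix, std::size_t limit = 10) {
    std::vector<std::pair<int, std::string>> matches;
    for (const auto& [word, freq] : dict)
        if (word.compare(0, prefix.size(), prefix) == 0)  // starts with prefix?
            matches.push_back({freq, word});
    std::sort(matches.rbegin(), matches.rend());  // highest frequency first
    std::vector<std::string> result;
    for (std::size_t i = 0; i < matches.size() && i < limit; ++i)
        result.push_back(matches[i].second);
    return result;
}
```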

Data structure: Trie

Convert the tree into a trie structure:

#include <queue>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// initialize constant: number of suggestions kept per node
static const int HEAP_SIZE = 10;

// Min-heap of (frequency, word): the lowest-frequency suggestion sits on
// top, so it can be evicted once the heap grows past HEAP_SIZE.
typedef priority_queue<pair<int, string>,
                       vector<pair<int, string>>,
                       greater<pair<int, string>>> SuggestionHeap;

// structure of TrieNode
struct TrieNode {
    int freq = 0;                        // frequency of the word ending here
    TrieNode* children[26] = {nullptr};  // one child per letter a-z
    SuggestionHeap suggestions;          // top words below this node, with frequency
};

// Insert a word (e.g. "cat") and propagate its total frequency back up,
// so every prefix node on the path can refresh its top suggestions.
pair<TrieNode*, int> insertWordInTrie(TrieNode* trie, const string& word,
                                      int index, int freqCount) {
    // node is null: allocate it
    if (trie == nullptr)
        trie = new TrieNode();

    // when we reach the end of the word
    if ((int)word.size() == index) {
        trie->freq += freqCount;
        return make_pair(trie, trie->freq);  // return the node with its total frequency
    }

    // node is not null: recurse into the child for the next character
    pair<TrieNode*, int> childWordFreq =
        insertWordInTrie(trie->children[word[index] - 'a'],
                         word, index + 1, freqCount);
    trie->children[word[index] - 'a'] = childWordFreq.first;

    int wordTotalFreq = childWordFreq.second;

    // Note: this sketch does not remove a stale entry when the same word is
    // inserted again; a production version would de-duplicate first.
    trie->suggestions.push(make_pair(wordTotalFreq, word));

    // if the heap size is greater than HEAP_SIZE, evict the lowest-frequency word
    if ((int)trie->suggestions.size() > HEAP_SIZE)
        trie->suggestions.pop();

    return make_pair(trie, wordTotalFreq);
}

// Search a word in the trie
bool searchWordInTrie(TrieNode* trie, const string& word, int index) {
    if (trie == nullptr)
        return false;
    if ((int)word.size() == index)
        return trie->freq != 0;  // a word ends here only if its frequency is non-zero
    return searchWordInTrie(trie->children[word[index] - 'a'], word, index + 1);
}

// Return the top suggestions cached at the node for the given prefix
SuggestionHeap getSuggestions(TrieNode* trie, const string& prefix, int index) {
    // if the trie is null, the prefix is unknown: no suggestions
    if (trie == nullptr)
        return SuggestionHeap();
    // return the top suggestions stored at the prefix node
    if (index == (int)prefix.size())
        return trie->suggestions;
    return getSuggestions(trie->children[prefix[index] - 'a'], prefix, index + 1);
}

// Remove all the words below the node for this prefix
TrieNode* invalidateCache(const string& prefix) {
    // Practice
    return nullptr;
}

// Driver code
int main() {
    TrieNode* trie = nullptr;
    trie = insertWordInTrie(trie, "cat", 0 /* string index */, 1).first;
    return 0;
}




Suggestion service


Data Replication

Word with frequency
