<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pallavi Singh</title>
    <description>The latest articles on DEV Community by Pallavi Singh (@erpallavisingh12).</description>
    <link>https://dev.to/erpallavisingh12</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F568299%2F65afd700-f71a-4630-83a7-34f16bb073d4.jpg</url>
      <title>DEV Community: Pallavi Singh</title>
      <link>https://dev.to/erpallavisingh12</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/erpallavisingh12"/>
    <language>en</language>
    <item>
      <title>Google Typeahead System Design</title>
      <dc:creator>Pallavi Singh</dc:creator>
      <pubDate>Wed, 27 Jan 2021 10:51:12 +0000</pubDate>
      <link>https://dev.to/erpallavisingh12/google-typeahead-system-design-4j78</link>
      <guid>https://dev.to/erpallavisingh12/google-typeahead-system-design-4j78</guid>
      <description>&lt;p&gt;Search Bar → User is typing a few characters, you should be able to return the top 10 suggestions to the user for the typed char.&lt;br&gt;
Suggestion to be returned should have the maximum frequency in your table. &lt;/p&gt;

&lt;p&gt;type world in search "CA"&lt;br&gt;
CA → &lt;/p&gt;

&lt;p&gt;CAT → 20K &amp;lt;- frequency count&lt;br&gt;
CATTLE → 25K  &amp;lt;- frequency count&lt;/p&gt;

&lt;p&gt;CAMEL&lt;br&gt;
CAR&lt;br&gt;
CABS&lt;/p&gt;

&lt;p&gt;CATASTROPHIC → 20001   &amp;lt;- frequency count&lt;/p&gt;

&lt;p&gt;CAME&lt;br&gt;
CAMEL&lt;br&gt;
CAMELS&lt;/p&gt;

&lt;p&gt;If a user searches a particular  word in the search bar if the world is in the dictionary so increase the frequency count of the world and if not found in the dictionary so added the new world into the dictionary&lt;/p&gt;

&lt;p&gt;Generally, the high-level design question consists of 4 parts&lt;br&gt;
Requirements.&lt;br&gt;
Estimations.&lt;br&gt;
Detail Design.&lt;br&gt;
Monitoring and Alerting and Backups.&lt;/p&gt;

&lt;p&gt;1.Requirements (5-7 mins):&lt;br&gt;
Functional&lt;br&gt;
Should be able to return the top 10 suggestions.&lt;br&gt;
Update the frequency of the word which has been searched. &lt;br&gt;
Here we are considering only words and not sentences.  (Assumption)&lt;br&gt;
Wait for another 500ms before you return the next results.[Suppose user type CA and wait a second and this time user type CAME then return top 10 results&lt;br&gt;
Spells checks are not required (Out of Scope)[it might be possible used made spell check so you don't need to consider]&lt;br&gt;
Customized Suggestions (Out of Scope)[based on user previous searches]&lt;br&gt;
Region-specific searches (Out of Scope)&lt;br&gt;
Only digits and alphabets have to be considered.&lt;br&gt;
Non-Functional → Req. which as an Engineer should be decided and not considering non-functional requirements might lead to bad design and hamper the user experience. &lt;br&gt;
Latency → Low approx 10 ms[user type something and give a result within 10ms] &lt;br&gt;
Availability → High Availability[Result should become as soon as possible and System is available for user]&lt;br&gt;
Consistency → Not required immediately but the system should be eventually consistent.&lt;br&gt;
Reliability&lt;br&gt;
The system should be reliable in the sense that it doesn’t capture user-specific information.&lt;br&gt;
Best suggestions are returned from the &lt;a href="https://www.learnbay.io/system-design-and-design-pattern-2/"&gt;System design online course&lt;/a&gt;. &lt;br&gt;
Security layers are in-place. [Authentication or Authorization]&lt;br&gt;
Rate Limiter&lt;/p&gt;

&lt;p&gt;2.Estimations (5-7 mins):&lt;br&gt;
When you start with estimation you have to make assumptions or you ask for your interviewer. &lt;br&gt;
It consists of two parts&lt;br&gt;
QPS&lt;br&gt;
Capacity&lt;br&gt;
Assumptions:&lt;br&gt;
Approx 100M word search per day.&lt;br&gt;
0.001% of the words being searched are new into the system.&lt;/p&gt;

&lt;p&gt;QPS (Query Per Second): It consists of two parts&lt;br&gt;
   Read QPS&lt;br&gt;
  Write QPS&lt;br&gt;
Read QPS[Return top 10 suggestion from system]&lt;br&gt;
100 * 10^6 (Per day) / 24 * 60 * 60 (86400)&lt;br&gt;
10^8 /  10^5 &lt;br&gt;
10^3 = 1000 read QPS * 2 (Multiplier) = 2000 &lt;br&gt;
Write QPS[Searching a particular keyword into the system]&lt;br&gt;
Read to write ratio is 3:1 [Because suppose we write something 'CA' gives top 10 results and wait time is 500ms and again type 'CAT' return the top 10 results before we write into what we read from the system. so user read request is 3 and write is 1]&lt;br&gt;
Write QPS → 600 - 700 writes per second.&lt;br&gt;
Capacity[How much capacity in ram or disk or space need to resources]&lt;br&gt;
100M search per day&lt;br&gt;
For an entry &lt;br&gt;
Word &lt;br&gt;
Freq&lt;br&gt;
500M (Current Dict.)[Consider only English dictionary]&lt;br&gt;
500M * (7 sizes on avg of each word) * (1 char is 1 byte) * 100 bytes → Words already present[Each word takes space.]&lt;br&gt;
Every Day → 500 M * 0.001 → 50K new words every day.  → 50K * 7 * 100 → 35K * 10^3 = 35M per day new words&lt;br&gt;
For next 1 year&lt;br&gt;
500M*7*100 + 35M*365 bytes &lt;br&gt;
= 35M * 10^4 +  ~(1M * 10^4)&lt;br&gt;
= 36 * 10^6 * 10 ^ 4 / (1024 * 1024 * 1024)&lt;br&gt;
= 360 GB of data per year. &lt;/p&gt;

&lt;p&gt;3.Details Design (30-35 mins):&lt;br&gt;
APIs[List down all apis]&lt;/p&gt;

&lt;p&gt;Read API → User is waiting for the result&lt;br&gt;
Get Request &lt;br&gt;
get suggestions(char[] prefix);&lt;br&gt;
Return → Nothing, Success/Failure &lt;br&gt;
The system is a read-heavy system and should optimize for returning the results asap. Return Top 10 Suggestions&lt;br&gt;
Update API →  User is not waiting for the result&lt;br&gt;
Post Request&lt;br&gt;
updateFreqForWord(char[] word)&lt;br&gt;
      Return → Nothing, Success/Failure &lt;/p&gt;

&lt;p&gt;The system is a read-heavy system and should optimize for returning the result asap&lt;br&gt;
Write can be slower as well because users really don’t care about the increase in the frequency.  → Preference to be given to read before the write.&lt;br&gt;
System Design Modelling[ Create design and components]&lt;br&gt;
        There are some components that are always present in the system.&lt;br&gt;
        Client &lt;br&gt;
        Authentication or Authorization  [Load balancer or Microsystem architecture or Orchestration layer] &lt;br&gt;
        Web server[read/update api]&lt;br&gt;
        2 load balancer performs business logic&lt;br&gt;
        App server  [Cache] [suggestion service store sorted &lt;a href="https://www.learnbay.io/data-structure-algorithm-in-python-sdet-and-data-scientist/"&gt;Data structure training in Python&lt;/a&gt;]&lt;br&gt;
        Database&lt;br&gt;
       Replication of Data&lt;/p&gt;

&lt;p&gt;-&amp;gt; Return top 10 suggestion&lt;br&gt;
 -&amp;gt; Return frequency  searched word&lt;br&gt;
-&amp;gt;Instead of store key-value, we can store prefix and pair of word and frequency&lt;br&gt;
-&amp;gt;Read is faster &lt;br&gt;
-&amp;gt;store key-value [data base-1 for update ] , we can store prefix and pair of word and frequency[database-2 for read purpose&lt;br&gt;
-&amp;gt;Initialize a threshold value when it crosses so update database -2 via offline job[comparing frequency and update frequency]&lt;br&gt;
-&amp;gt;one table is a frequency table and another table is a prefix table so it does not affect the write database.&lt;/p&gt;

&lt;p&gt;Refer diagram:1.1&lt;/p&gt;

&lt;p&gt;Tables and Indexes[List down all tables]&lt;br&gt;
SQL DB&lt;br&gt;
NoSql DB&lt;br&gt;
Table 1 (Suggestion Table) &lt;br&gt;
String prefix ← Key&lt;br&gt;
List&amp;gt; top 10 Suggestions&lt;br&gt;
Table 2 (WordFreq Table)&lt;br&gt;
String word ← Key (Pk)&lt;br&gt;
Int64 Freq ← Value&lt;br&gt;
Scaling &lt;br&gt;
Horizontal [adding more resources instead of changing the size of the resources]&lt;br&gt;
Vertical&lt;br&gt;
Prefer a mix of horizontal and vertical scaling&lt;br&gt;
Sharding and partitioning of  the data : &lt;br&gt;
Range Based sharding → Suggestion Table&lt;br&gt;
A, B, C…. Z → 26 Instances&lt;br&gt;
2nd layer A is heavily loaded &lt;br&gt;
AA - AP, AQ - AZ&lt;br&gt;
Date Based shading&lt;br&gt;
For every day you create a separate shard. [you have rolling window]&lt;br&gt;
Word, Freq &lt;br&gt;
2nd Layer → Range Based &lt;br&gt;
A-G H-P Q-Z&lt;br&gt;
Replications&lt;br&gt;
       In general, read from the master and write from the slave.&lt;br&gt;
(Suggestion Table) → up to 5 replications&lt;a href="https://dev.toall%20servers%20at%20the%20same%20level"&gt;Master -slave&lt;/a&gt;&lt;br&gt;
(WordFreq Table) → (Master-Master) → 2 replications.[You can write two servers at the same time]&lt;br&gt;
Caching  :&lt;br&gt;
        The cache can be a database&lt;/p&gt;

&lt;p&gt;80-20 rule&lt;br&gt;
80% of the requests on the system will query for 20% of the data and the remaining 20% of the requests will be querying on 80% of the data. &lt;br&gt;
Cache those 20% prefixes. &lt;br&gt;
Trie → Cache Optimize on reading and write &lt;br&gt;
Invalidations&lt;br&gt;
Strategy to remove a prefix from the cache. &lt;br&gt;
Load Balancing&lt;br&gt;
Helps in better distribution of load across layers. &lt;br&gt;
Round Robin Strategy for balancing the load between the servers/machine.&lt;br&gt;
Auths (Authentications and Authorizations)&lt;br&gt;
Not required for this design at the user level. &lt;/p&gt;

&lt;p&gt;4.Monitoring and Alerting and Backups(2-3 mins):&lt;/p&gt;

&lt;p&gt;Helps monitor the latency of each of the API.&lt;br&gt;
Monitor success and a failure rate of the API&lt;br&gt;
QPS for each of the API&lt;br&gt;
Monitoring on caches. &lt;br&gt;
Alerting &lt;br&gt;
→ Alerts on failure if requests fail x times in a window of a time (1 hour, 10 mins, 20 mins)&lt;br&gt;
Alerting on the latency → If x% of requests from total requests within a window is taking more than t ms to return the results. &lt;br&gt;
BackUps&lt;br&gt;
Retrieval of data that could be lost because of some bug. &lt;br&gt;
Product Metrics on backups. &lt;br&gt;
Create metrics for experiments on backups. &lt;/p&gt;

&lt;p&gt;Digram: 1.1&lt;/p&gt;

&lt;p&gt;Cache Design&lt;/p&gt;

&lt;p&gt;We need to store words and we should know the top 10 suggestions for every prefix.&lt;/p&gt;

&lt;p&gt;The latency of returning the response should be below.&lt;/p&gt;

&lt;p&gt;CAR → 20k&lt;br&gt;
CAMEL → 30k&lt;br&gt;
CATTLE → 1K&lt;br&gt;
CABS → 4K&lt;br&gt;
CAP&lt;br&gt;
CARTOON&lt;br&gt;
CAGE&lt;br&gt;
CAPE → 100&lt;/p&gt;

&lt;p&gt;Problem [ java use hashmap or python use dictionary ]:&lt;br&gt;
We store keys as the word and frequency as values. We have to iterate through all the world present in a dictionary or hash map and see how many start with "CA" return the top 10 suggestions. We need time to save data and need extra memory for appearing in the word  "CA ".&lt;br&gt;
To overcome this problem we use another famous data structure that stores a lot of the words.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.learnbay.io/data-structures-algorithms/"&gt;Data Structures online courses&lt;/a&gt;: Tree&lt;/p&gt;

&lt;p&gt;Convert tree into trie structure&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

//initialize constant
private static int HEAP_SIZE = 10;
//structure of TrieNode 
Struct TrieNode {
    int freq;
    int arr[26] children;
    priority_queue&amp;lt;pair&amp;lt;int, string&amp;gt;&amp;gt; suggestions; //for storing string with frequency
};

// cat
Pair&amp;lt;Struct TrieNode*, int&amp;gt; insertWordInTrie(struct Trienode * trie, string word, int index, int freqCount){
//node is null
if(trie == null) {
    Struct TrieNode* trieNode = 
        (Struct TrieNode*) malloc(sizeof TrieNode);
    trie = trieNode;
    trie -&amp;gt; freq = 0;
}
//when we reach at end of the world
if(word.size() == index) {
    trie -&amp;gt; freq += freqCount;
    return make_pair(trie, trie -&amp;gt; freq); //return of pair word with frequency
 }
//node is not null

    Pair&amp;lt;Struct TrieNode*, int&amp;gt; childrenWordFreq = 
        insertWordInTrie(
            trie -&amp;gt; children[word[index] - ‘a’], 
            word, 
            index+1);
    trie -&amp;gt; children[word[index] - ‘a’] = childrenWordFreq.first;

    int wordTotalFreq = childrenWordFreq.second;

    (trie-&amp;gt;suggestions).insert(make_pair(wordTotalFreq, word));


           //if tree size is a greater than HEAP_SIZE
if((trie-&amp;gt;suggestions).size() &amp;gt; HEAP_SIZE) {
    (trie-&amp;gt;suggestions).pop();
}
    return make_pair(trie, wordTotalFreq);
}
//Search word in trie
boolean searchWordInTrie(struct Trienode * trie, string word, int index) {
    if(trie == null) {
    return false;
}

if(word.size() == index) {
    return trie -&amp;gt; freq!=0 ? true: false;
}

return searchWordInTrie(trie-&amp;gt;children[word[index] - ‘a’], word, index);
}



priority_queue&amp;lt;pair&amp;lt;int, string&amp;gt;&amp;gt; getSuggestions(
            struct Trienode * trie, string prefix, int index) {
     //if trie null
    if(trie == null)
        return null; 
         //return top suggestion
    if(index == prefix.size()) {
        return trie -&amp;gt; suggestions; 
}

    return getSuggestions(trie-&amp;gt;children[prefix[index]-’a’], prefix, index+1);

}


//remove all the words below the tree
Struct TrieNode* invalidateCache(string prefix) {
    // Practice
}


//Driver code
int main() {
    TrieNode trie = null;
    trie = insertWordInTrie(trie, “cat”, 0 // string index, 1);

}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
``&lt;/p&gt;

&lt;p&gt;Suggestion service &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.learnbay.io/competitive-programming-training-online/"&gt;Algorithms Online courses&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Data Replication&lt;/p&gt;

&lt;p&gt;Word with frequency&lt;/p&gt;

</description>
      <category>database</category>
      <category>c</category>
      <category>algorithms</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
