Search optimized database (full-text search) for system design interview

#systemdesign #elasticsearch #database #interview

Traditional databases run a table scan to find a search term in the database. This is slow and efficient if a table stores a large dataset (1000+ rows). To improve, "search optimized database" can be used.

It uses indexing, tokenization and stemming to make search queries fast and efficient (by building "inverted indexes"):

Tokenization is a process of reducing words to their root form. For example, "running" and "runs" can be reduced to "run".
Stemming is a process of breaking a piece of task into individual words. It helps mapping words to documents containing those words in the inverted indexes.

Something to note is the underlying data structure of search mechanisms of search optimized database - Inverted Indexes.

It is a data structure that maps words to the documents that contain them. For example:

{
  "word1": [doc1,doc2,doc3],
  "word2": [doc2,doc3,doc6]
}

Most search optimized database also support "Fuzzy Search" out of box as a configuration. Fuzzy Search works by leveraging "edit distance calculation" technique, which measures how many letters to be changed/added/removed to transform one word into another. Thus, results with minor misspellings or discrepancy relative to the search term can be returned efficiently in case of human errors.

One of the popular search optimized database is "ElasticSearch".

Reference

https://www.hellointerview.com/learn/system-design/in-a-hurry/key-technologies

DEV Community

Search optimized database (full-text search) for system design interview

Reference

Top comments (0)

Read next

Demystifying hashCode() and equals(): The Backbone of Java Hash Collections

Real-Life Kubernetes Interview Questions and Answers with Explanations

How to Negotiate with Recruiters in Your Phone Screen Interview

Redis