ElasticSearch Tutorial

Elasticsearch is a distributed, full-text search engine based on Lucene with JSON schema. It is an open source and implemented by Java.

Introduction
Elasticsearch uses a structure called an inverted index. It is designed for the fastest solution of full-text searches. An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.
For example:
– “The quick brown fox jumped over the lazy dog”
– ” Quick brown foxes leap over lazy dogs in summer”
Problem
There are a few problems with our current inverted index:
– “Quick” and “quick” appear as separate terms, while the user probably thinks of them as the same word.
– “fox” and “foxes” are pretty similar, so are “dog” and “dogs”. They share the same root word.
– “jumped” and “leap”, while not from the same root word, are similar in meaning. They are synonyms.
Solution
– “Quick” can be lower-cased to become “quick”.
– “foxes” can be stemmed for reduced to its root form to become “fox”. Similarly, “dogs” could be stemmed to “dog”.
– “jumped” and “leap” are synonyms and can be indexed as just the single term “jump”.

DEV Community