JavaSampleApproach

ElasticSearch Tutorial

https://grokonez.com/elasticsearch

Elasticsearch is a distributed full-text search engine built on Apache Lucene that stores and queries data as JSON documents. It is open source and implemented in Java.

  1. Introduction
    Elasticsearch uses a structure called an inverted index, which is designed to make full-text searches very fast. An inverted index consists of a list of all the unique words that appear in any document and, for each word, a list of the documents in which it appears.
    For example, suppose we have two documents with the following content:
    – “The quick brown fox jumped over the lazy dog”
    – “Quick brown foxes leap over lazy dogs in summer”
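
To make the definition concrete, here is a minimal Python sketch of an inverted index built from the two example sentences. It is only an illustration, not how Elasticsearch or Lucene store the index internally: the document IDs 1 and 2, the function name build_inverted_index, and the whitespace-only tokenization are all assumptions made for this example.

```python
from collections import defaultdict

# The two example documents; the numeric IDs are an assumption for this sketch.
docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

def build_inverted_index(docs):
    """Map each unique term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization: split on whitespace, no normalization at all.
        for term in text.split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}

index = build_inverted_index(docs)
for term, ids in index.items():
    print(f"{term:<8} {ids}")
```

Because nothing is normalized here, “Quick” and “quick” end up as two separate entries, each pointing at only one document.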

  2. Problem
    An inverted index built directly from the words of these two documents has a few problems:
    – “Quick” and “quick” appear as separate terms, while the user probably thinks of them as the same word.
    – “fox” and “foxes” are pretty similar, as are “dog” and “dogs”. They share the same root word.
    – “jumped” and “leap”, while not from the same root word, are similar in meaning. They are synonyms.

  3. Solution
    – “Quick” can be lower-cased to become “quick”.
    – “foxes” can be stemmed, that is, reduced to its root form, to become “fox”. Similarly, “dogs” could be stemmed to “dog”.
    – “jumped” and “leap” are synonyms and can be indexed as just the single term “jump”.
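
The three normalization steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the stem function is a crude suffix-stripping rule written for this example (not a real stemming algorithm), and the SYNONYMS table and the normalize helper are hypothetical names. In Elasticsearch itself this kind of work is done by analyzers at index time, for example with lowercase, stemmer, and synonym token filters.

```python
from collections import defaultdict

docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

# Hypothetical synonym table: both words are indexed as the single term "jump".
SYNONYMS = {"jumped": "jump", "leap": "jump"}

def stem(term):
    """Toy stemmer (an assumption for this sketch): strip a plural suffix."""
    if term.endswith("es") and len(term) > 4:
        return term[:-2]          # "foxes" becomes "fox"
    if term.endswith("s") and len(term) > 3:
        return term[:-1]          # "dogs" becomes "dog"
    return term

def normalize(term):
    """Lower-case, map synonyms, then stem."""
    term = term.lower()
    term = SYNONYMS.get(term, term)
    return stem(term)

def build_inverted_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[normalize(term)].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}

index = build_inverted_index(docs)
for term, ids in index.items():
    print(f"{term:<8} {ids}")
```

With these rules, “quick”, “fox”, “dog”, and “jump” each point at both documents. A search term has to go through the same normalize step before it is looked up, otherwise a query for “Quick” would not find the indexed term “quick”.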

Tutorial at: https://grokonez.com/elasticsearch

https://grokonez.com/elasticsearch/elasticsearch-character-filters
https://grokonez.com/elasticsearch/elasticsearch-tokenizers-structured-text-tokenizers