https://grokonez.com/elasticsearch
Elasticsearch is a distributed, full-text search engine based on Lucene that stores schema-free JSON documents. It is open source and implemented in Java.
Introduction
Elasticsearch uses a structure called an inverted index, which is designed for very fast full-text searches. An inverted index consists of a list of all the unique words that appear in any document and, for each word, a list of the documents in which it appears.
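To make the idea concrete, here is a minimal Python sketch that builds an inverted index for the two example documents used in this post. It assumes plain whitespace tokenization with no normalization at all; the names `DOCS` and `build_inverted_index` are illustrative only and are not part of any Elasticsearch API.

```python
from collections import defaultdict

# The two example documents from this post, keyed by a hypothetical doc id.
DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}


def build_inverted_index(docs):
    """Map each unique term to the sorted list of doc ids it appears in.

    Tokens are split on whitespace only; no lower-casing, stemming or
    synonym handling is applied.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}


if __name__ == "__main__":
    for term, doc_ids in build_inverted_index(DOCS).items():
        print(f"{term:<8} {doc_ids}")
```

In the printed index, “Quick” and “quick” (and “The” and “the”) are separate entries, and “fox”/“foxes” and “dog”/“dogs” are unrelated terms, which is the kind of problem discussed below.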
For example, take these two documents:
– “The quick brown fox jumped over the lazy dog”
– “Quick brown foxes leap over lazy dogs in summer”
Problem
There are a few problems with an inverted index built from these documents as-is:
– “Quick” and “quick” appear as separate terms, while the user probably thinks of them as the same word.
– “fox” and “foxes” are very similar, as are “dog” and “dogs”: they share the same root word.
– “jumped” and “leap”, while not from the same root word, are similar in meaning. They are synonyms.
Solution
– “Quick” can be lower-cased to become “quick”.
– “foxes” can be stemmed, or reduced to its root form, to become “fox”. Similarly, “dogs” can be stemmed to “dog”.
– “jumped” and “leap” are synonyms and can be indexed as just the single term “jump”.
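In Elasticsearch these normalizations are performed by an analyzer (for example with the lowercase, stemmer and synonym token filters). The following sketch only imitates the three steps above in plain Python so that their effect on the inverted index is visible. The `SYNONYMS` table and the `toy_stem` rules are hypothetical and cover just this example; `toy_stem` is a naive plural stripper, not a real stemming algorithm such as Porter.

```python
from collections import defaultdict

# The two example documents from this post, keyed by a hypothetical doc id.
DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

# Hypothetical synonym table covering only this example.
SYNONYMS = {"jumped": "jump", "leap": "jump"}


def toy_stem(token):
    """Naive plural stripper, NOT a real stemmer such as Porter.

    'foxes' -> 'fox', 'dogs' -> 'dog'; other words pass through unchanged.
    """
    if token.endswith("xes"):
        return token[:-2]
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text):
    """Lower-case each whitespace-separated token, map synonyms, then stem."""
    terms = []
    for token in text.split():
        token = token.lower()
        token = SYNONYMS.get(token, token)
        terms.append(toy_stem(token))
    return terms


def build_inverted_index(docs):
    """Map each normalized term to the sorted list of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}


def search(index, query):
    """Return ids of documents that contain every analyzed query term."""
    postings = [set(index.get(term, [])) for term in analyze(query)]
    return sorted(set.intersection(*postings)) if postings else []


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    for term, doc_ids in index.items():
        print(f"{term:<8} {doc_ids}")
    print(search(index, "Quick foxes"))  # -> [1, 2]
```

After this analysis, “Quick” and “quick” collapse into `quick`, “foxes” into `fox`, and “jumped” and “leap” into `jump`, so the query "Quick foxes" matches both documents. The key design point is that the same `analyze` function runs on both documents and queries; Elasticsearch similarly applies an analyzer to full-text queries so that query terms line up with indexed terms.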
Tutorial at: https://grokonez.com/elasticsearch
https://grokonez.com/elasticsearch/elasticsearch-character-filters
https://grokonez.com/elasticsearch/elasticsearch-tokenizers-structured-text-tokenizers