DEV Community

loizenai

Elasticsearch Tokenizers – Structured Text Tokenizers

https://ozenero.com/elasticsearch/elasticsearch-tokenizers-structured-text-tokenizers


In this tutorial, we look at structured text tokenizers, which are typically used with structured text such as identifiers, email addresses, zip codes, and paths.

I. Keyword Tokenizer

The keyword tokenizer is the simplest tokenizer: it accepts whatever text it is given and outputs the exact same text as a single term.

For example:


POST _analyze
{
  "tokenizer": "keyword",
  "text": "Java Sample Approach"
}

Term:


[ Java Sample Approach ]
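
Because the whole input becomes one term, the keyword tokenizer is often paired with token filters to normalize values such as email addresses. A minimal sketch, assuming the built-in lowercase token filter and a made-up address:

```console
# The email address is a made-up sample value.
POST _analyze
{
  "tokenizer": "keyword",
  "filter": [ "lowercase" ],
  "text": "John.SMITH@example.COM"
}
```

This should return the single term [ john.smith@example.com ].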

II. Pattern Tokenizer

The pattern tokenizer uses a regular expression either to split text into terms wherever it matches a word separator, or to capture matching text as terms.

The default pattern is \W+, which splits text whenever it encounters non-word characters.

For example:


POST _analyze
{
  "tokenizer": "pattern",
  "text": "Java_Sample_Approach's tutorials are helpful."
}

Terms:


[ "Java_Sample_Approach", "s", "tutorials", "are", "helpful" ]

Configuration

  • pattern: Java regular expression, defaults to \W+.
  • flags: Java regular expression flags, pipe-separated (for example: "CASE_INSENSITIVE|COMMENTS"). The full list of flags is in the java.util.regex.Pattern documentation.
  • group: capture group to extract as tokens. Defaults to -1 (split).
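
To illustrate how flags and group work together, here is a hedged sketch of capture mode. It defines the tokenizer inline in the _analyze request. The ticket-style pattern and the sample text are made up for illustration. With group set to 0, each full regex match becomes a term instead of acting as a separator:

```console
# The pattern and the text are illustrative assumptions, not from the original article.
POST _analyze
{
  "tokenizer": {
    "type": "pattern",
    "pattern": "[a-z]+-[0-9]+",
    "flags": "CASE_INSENSITIVE",
    "group": 0
  },
  "text": "Fixed in JIRA-123 and jira-456."
}
```

This should produce the terms [ JIRA-123, jira-456 ]. The CASE_INSENSITIVE flag lets [a-z] match the uppercase letters, and the tokenizer itself does not change the case of the output.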

For example, to break text into tokens wherever it encounters a comma, set pattern to ",".
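
A minimal sketch of that comma-splitting setup as an index-level analyzer. The names my-index, my_analyzer, and my_tokenizer are placeholders chosen for this example, and the sample text is made up:

```console
# my-index, my_analyzer and my_tokenizer are hypothetical names.
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": ","
        }
      }
    }
  }
}

# Run the custom analyzer against a sample string.
POST my-index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "comma,separated,values"
}
```

With group left at its default of -1, the comma acts as a separator, so the request should return the terms [ comma, separated, values ].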

More at:

https://ozenero.com/elasticsearch/elasticsearch-tokenizers-structured-text-tokenizers
