How to find out how an Elasticsearch analyzer tokenizes your text

#elasticsearch

Sometimes you may be curious how a given analyzer tokenizes the queries in the code you're working on.

For example, let's say your index's mapping looks like this:

{
  "mappings": {
    "user": {
      "properties": {
        "fullname": {
          "type": "text"
        },
        "email": {
          "type": "text",
          "analyzer": "keyword_ngram_analyzer"
        }
      }
    }
  }
}

Let's check one by one.

First, the fullname property uses the default analyzer (standard) because no analyzer is specified for it. See the Elasticsearch documentation for more details.

So let's debug it by running the Analyze API with the standard analyzer on the query Scott Leblanc.

POST /_analyze
{
  "analyzer": "standard",
  "text": "Scott Leblanc"
}

The output will be:

{
  "tokens" : [
    {
      "token" : "scott",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "leblanc",
      "start_offset" : 6,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
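
To make the result concrete, here is a rough Python approximation of what the standard analyzer does to plain ASCII text like this: split on non-alphanumeric characters, lowercase each piece, and record offsets and positions. It is only an illustrative sketch. The real analyzer uses Unicode text segmentation and handles many cases that this regular expression does not, and the function name `approx_standard_analyze` is made up for this example.

```python
import re

def approx_standard_analyze(text):
    """Rough, ASCII-only imitation of the standard analyzer (illustration only)."""
    tokens = []
    # Each run of letters/digits becomes one token; everything else is a separator.
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": match.group().lower(),  # mimics the lowercase token filter
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

for tok in approx_standard_analyze("Scott Leblanc"):
    print(tok)
```

For this input, the sketch yields the same tokens, offsets, and positions as the Analyze API response above.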

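Second, what about the keyword_ngram_analyzer assigned to email? It is a custom analyzer defined in the index settings, so the request has to go to that index's _analyze endpoint (replace <index> with your index name):

POST /<index>/_analyze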
{
  "analyzer": "keyword_ngram_analyzer",
  "text": "joh@m.co"
}

The output will be (offsets, type, and position omitted for brevity):

{
  "tokens" : [
    { "token" : "joh" },
    { "token" : "joh@" },
    { "token" : "joh@m" },
    { "token" : "joh@m." },
    { "token" : "joh@m.c" },
    { "token" : "joh@m.co" },
    { "token" : "oh@" },
    { "token" : "oh@m" },
    { "token" : "oh@m." },
    { "token" : "oh@m.c" },
    { "token" : "oh@m.co" },
    { "token" : "h@m" },
    { "token" : "h@m." },
    { "token" : "h@m.c" },
    { "token" : "h@m.co" },
    { "token" : "@m." },
    { "token" : "@m.c" },
    { "token" : "@m.co" },
    { "token" : "m.c" },
    { "token" : "m.co" },
    { "token" : ".co" }
  ]
}

As the sample above shows, you can query or filter on any of the resulting tokens, such as joh@ or @m.co.
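
The definition of keyword_ngram_analyzer is not shown in this post, but the tokens above are consistent with treating the whole value as a single token and emitting every substring of 3 to 8 characters. The Python sketch below reproduces the list under that assumption (`min_gram=3` and `max_gram=8` are inferred from the output, not taken from the actual settings) and checks whether an exact, unanalyzed term would hit one of the indexed tokens. The helper names `ngrams` and `term_matches` are hypothetical.

```python
def ngrams(text, min_gram=3, max_gram=8):
    """All substrings of length min_gram..max_gram, ordered by start, then length."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(text):
                break  # longer grams would run past the end of the text
            grams.append(text[start:start + size])
    return grams

def term_matches(indexed_value, term):
    """True if `term` is exactly one of the tokens produced for `indexed_value`."""
    return term in set(ngrams(indexed_value))

tokens = ngrams("joh@m.co")
print(len(tokens), tokens[:3], tokens[-1])
print(term_matches("joh@m.co", "joh@"))  # one of the generated tokens
print(term_matches("joh@m.co", "jo"))    # shorter than min_gram, so never generated
```

This also shows why a two-character input such as jo finds nothing under these assumed settings: no token that short is ever produced.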

Conclusion

If you want to know how an Elasticsearch analyzer tokenizes your queries, you can use the Analyze API to find out.
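
If you would rather run the same check from a script than from a console, something like the following Python sketch could work. It uses only the standard library and assumes an unauthenticated cluster reachable over plain HTTP. The helper names and the base URL in the comment are assumptions for illustration, not part of any Elasticsearch client.

```python
import json
import urllib.request

def build_analyze_request(analyzer, text, index=None):
    """Return the (path, body) pair for an Analyze API call."""
    # Custom analyzers live in an index's settings, so they need the index-scoped path.
    path = f"/{index}/_analyze" if index else "/_analyze"
    return path, {"analyzer": analyzer, "text": text}

def analyze(base_url, analyzer, text, index=None):
    """POST to the Analyze API and return the list of token strings."""
    path, body = build_analyze_request(analyzer, text, index)
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [t["token"] for t in payload["tokens"]]

# Example call (needs a running cluster; the URL is an assumption):
# analyze("http://localhost:9200", "standard", "Scott Leblanc")
```

Keeping the request construction in its own function makes it easy to check the path and body without touching a cluster.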

See the Analyze API documentation for more details.
