DEV Community

Nuttee
How to see how an Elasticsearch analyzer tokenizes your text

Sometimes you may be curious how a given analyzer tokenizes the queries sent by the code you're working on.

For example, let's say your index's mapping looks like this:

{
  "mappings": {
    "user": {
      "properties": {
        "fullname": {
          "type": "text"
        },
        "email": {
          "type": "text",
          "analyzer": "keyword_ngram_analyzer"
        }
      }
    }
  }
}

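The mapping refers to keyword_ngram_analyzer, which is not a built-in analyzer, so it has to be defined in the index settings. The post does not show that definition. The sketch below builds one hypothetical settings body that would be consistent with the tokens shown later: a keyword tokenizer followed by an n-gram filter. The filter name ngram_3_8 and the min_gram/max_gram values are assumptions inferred from the sample output, not the author's actual configuration, and index.max_ngram_diff only applies to versions that enforce that limit.

```python
import json

# Hypothetical definition: the post never shows how keyword_ngram_analyzer
# is configured. min_gram=3 and max_gram=8 are guesses based on the sample
# tokens, and "ngram_3_8" is a made-up filter name.
MIN_GRAM = 3
MAX_GRAM = 8

index_body = {
    "settings": {
        # Versions that limit the n-gram size range need this raised;
        # leave it out if your version does not recognize the setting.
        "index": {"max_ngram_diff": MAX_GRAM - MIN_GRAM},
        "analysis": {
            "filter": {
                "ngram_3_8": {
                    "type": "ngram",
                    "min_gram": MIN_GRAM,
                    "max_gram": MAX_GRAM,
                }
            },
            "analyzer": {
                "keyword_ngram_analyzer": {
                    "type": "custom",
                    # Keep the whole input as a single token, then slice it.
                    "tokenizer": "keyword",
                    "filter": ["ngram_3_8"],
                }
            },
        },
    }
}

# Print the JSON body that would accompany an index-creation request.
print(json.dumps(index_body, indent=2))
```

Your own index may define the analyzer differently (for example with an n-gram tokenizer instead of a filter), so check its settings before relying on this shape.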

Let's check one by one.

First, fullname uses the default analyzer (standard) because no analyzer is specified for it. See the Elasticsearch documentation for more details.

So we can debug it by calling the Analyze API with the standard analyzer and the text Scott Leblanc.

POST /_analyze
{
  "analyzer": "standard",
  "text": "Scott Leblanc"
}

The output will be:

{
  "tokens" : [
    {
      "token" : "scott",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "leblanc",
      "start_offset" : 6,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

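If you would rather run the same check from code than from a console, the request is easy to build with the standard library. This is a minimal sketch, not an official client: the helper names build_analyze_request and token_strings are made up for illustration, and http://localhost:9200 is an assumed local cluster address.

```python
import json
import urllib.request


def build_analyze_request(base_url, analyzer, text, index=None):
    """Build (but do not send) a POST request for the Analyze API."""
    # The index-scoped path is for analyzers defined in an index's settings.
    path = f"/{index}/_analyze" if index else "/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_strings(response_body):
    """Return only the token strings from an Analyze API JSON response."""
    return [t["token"] for t in json.loads(response_body)["tokens"]]


# Assumed address of a local cluster; change it to match your setup.
req = build_analyze_request("http://localhost:9200", "standard", "Scott Leblanc")
print(req.get_method(), req.full_url)

# To actually send it against a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(token_strings(resp.read()))
```

Building the request separately from sending it keeps the sketch runnable without a cluster. Uncomment the last lines once one is reachable.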

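Second, what about keyword_ngram_analyzer, which is assigned to email? It is a custom analyzer defined in the index settings, so the request has to go to that index rather than to the global endpoint (replace <your-index> with the name of your index).

POST /<your-index>/_analyze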
{
  "analyzer": "keyword_ngram_analyzer",
  "text": "joh@m.co"
}

The output will be (only the token field of each entry is shown here):

{
  "tokens" : [
    { "token" : "joh" },
    { "token" : "joh@" },
    { "token" : "joh@m" },
    { "token" : "joh@m." },
    { "token" : "joh@m.c" },
    { "token" : "joh@m.co" },
    { "token" : "oh@" },
    { "token" : "oh@m" },
    { "token" : "oh@m." },
    { "token" : "oh@m.c" },
    { "token" : "oh@m.co" },
    { "token" : "h@m" },
    { "token" : "h@m." },
    { "token" : "h@m.c" },
    { "token" : "h@m.co" },
    { "token" : "@m." },
    { "token" : "@m.c" },
    { "token" : "@m.co" },
    { "token" : "m.c" },
    { "token" : "m.co" },
    { "token" : ".co" }
  ]
}

Based on the sample above, you can query or filter the email field with any of the resulting tokens, such as joh@ or @m.co.
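To see why exactly those tokens come out, the behavior can be imitated in a few lines of Python. This is only an emulation under assumptions: the function name keyword_ngrams is made up, and the minimum and maximum gram lengths of 3 and 8 are inferred from the sample output (the shortest token has 3 characters and the full 8-character input appears as a token), not taken from the real analyzer settings.

```python
def keyword_ngrams(text, min_gram=3, max_gram=8):
    """Emulate a keyword tokenizer followed by an n-gram filter.

    The whole input is treated as one token, then every substring whose
    length is between min_gram and max_gram is emitted, ordered by start
    position and then by length.
    """
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break  # longer grams would run past the end of the input
            tokens.append(text[start:end])
    return tokens


tokens = keyword_ngrams("joh@m.co")
print(len(tokens))        # 21
print("joh@" in tokens)   # True
print("jo" in tokens)     # False: shorter than the assumed min_gram
```

A search term matches only if it is one of these substrings, which is why joh@ works but a two-character term such as jo would not under these assumed settings.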

Conclusion

If you want to know how an Elasticsearch analyzer tokenizes your queries, you can use the Analyze API to check.

See the Analyze API documentation for more details.
