DEV Community

Nuttee

How to see how an Elasticsearch analyzer tokenizes your text

Sometimes you may be curious how a given analyzer tokenizes the text in the queries you're working on.

For example, let's say your index's mapping looks like this:

{
  "mappings": {
    "user": {
      "properties": {
        "fullname": {
          "type": "text"
        },
        "email": {
          "type": "text",
          "analyzer": "keyword_ngram_analyzer"
        }
      }
    }
  }
}


Let's check the fields one by one.

First, fullname uses the default analyzer (the standard analyzer) because no analyzer is specified for it. (See the Elasticsearch documentation for more details.)

So, to debug it, we call the Analyze API with the standard analyzer and the query text Scott Leblanc:

POST /_analyze
{
  "analyzer": "standard",
  "text": "Scott Leblanc"
}

The output will be:

{
  "tokens" : [
    {
      "token" : "scott",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "leblanc",
      "start_offset" : 6,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
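
To make the result above easier to reason about, here is a minimal Python sketch that approximates what the standard analyzer does to simple Latin text: split on word boundaries, lowercase each piece, and record offsets and positions. It is only an approximation (the real analyzer follows Unicode text segmentation rules), and the function name `approx_standard_analyze` is made up for this example.

```python
import re


def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer on simple Latin text.

    Assumption: splitting on runs of word characters and lowercasing is
    close enough for plain ASCII input; it is NOT the real algorithm.
    """
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


for tok in approx_standard_analyze("Scott Leblanc"):
    print(tok)
```

For Scott Leblanc this yields the same two tokens, offsets, and positions as the Analyze API response shown above.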


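Second, what about keyword_ngram_analyzer, which is assigned to email? It is a custom analyzer defined in the index settings, so the request has to go to that index's _analyze endpoint (replace <index> with your index name):

POST /<index>/_analyze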
{
  "analyzer": "keyword_ngram_analyzer",
  "text": "joh@m.co"
}

The output (trimmed here to the token values) will be:

{
  "tokens" : [
    {
      "token" : "joh"
    },
    {
      "token" : "joh@"
    },
    {
      "token" : "joh@m"
    },
    {
      "token" : "joh@m."
    },
    {
      "token" : "joh@m.c"
    },
    {
      "token" : "joh@m.co"
    },
    {
      "token" : "oh@"
    },
    {
      "token" : "oh@m"
    },
    {
      "token" : "oh@m."
    },
    {
      "token" : "oh@m.c"
    },
    {
      "token" : "oh@m.co"
    },
    {
      "token" : "h@m"
    },
    {
      "token" : "h@m."
    },
    {
      "token" : "h@m.c"
    },
    {
      "token" : "h@m.co"
    },
    {
      "token" : "@m."
    },
    {
      "token" : "@m.c"
    },
    {
      "token" : "@m.co"
    },
    {
      "token" : "m.c"
    },
    {
      "token" : "m.co"
    },
    {
      "token" : ".co"
    }
  ]
}
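
The token list above looks like every substring of the input between 3 and 8 characters long. The definition of keyword_ngram_analyzer is not shown in this post, so the following Python sketch is an assumption inferred from that output: the whole input is treated as one token and then cut into n-grams with a minimum length of 3 and a maximum of 8. The name `keyword_ngrams` and its parameters are hypothetical.

```python
def keyword_ngrams(text, min_gram=3, max_gram=8):
    """Emit every substring of `text` with min_gram..max_gram characters.

    Assumption: min_gram=3 and max_gram=8 are guessed from the sample
    output; the real analyzer settings may differ.
    """
    tokens = []
    for start in range(len(text)):
        for length in range(min_gram, max_gram + 1):
            end = start + length
            if end > len(text):
                break  # longer grams from this start would run past the end
            tokens.append(text[start:end])
    return tokens


tokens = keyword_ngrams("joh@m.co")
print(len(tokens))        # 21
print("joh@" in tokens)   # True
print("@m.co" in tokens)  # True
print("jo" in tokens)     # False, shorter than min_gram
```

Under these assumptions the sketch reproduces the 21 tokens in the same order as the response above, which also shows why a two-character input such as jo produces no token at all.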

As the sample above shows, you can query or filter on the email field with any of the resulting tokens, such as joh@ or @m.co.

Conclusion

If you want to know how an Elasticsearch analyzer tokenizes your queries, you can use the Analyze API to check it.
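
If you would rather run the same check from a script, a minimal sketch using only the Python standard library could look like this. The base URL `http://localhost:9200`, the index name `users`, and the helper names `build_analyze_request` and `analyze` are assumptions for illustration; authentication and error handling are left out.

```python
import json
import urllib.request


def build_analyze_request(base_url, analyzer, text, index=None):
    """Build a POST request for /_analyze or /<index>/_analyze."""
    path = f"/{index}/_analyze" if index else "/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(base_url, analyzer, text, index=None):
    """Send the request and return only the token strings."""
    request = build_analyze_request(base_url, analyzer, text, index)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [token["token"] for token in payload["tokens"]]


# Building the request needs no running cluster; the values are hypothetical.
request = build_analyze_request(
    "http://localhost:9200", "keyword_ngram_analyzer", "joh@m.co", index="users"
)
print(request.get_method(), request.full_url)
```

Calling `analyze(...)` with the same arguments would send the request, so it needs a reachable cluster. Passing `index` matters for custom analyzers, since they are only known to the index whose settings define them.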

See the Analyze API documentation for more details.
