Sometimes you may be curious how a given analyzer tokenizes the queries in the code you're working on.
For example, let's say your index's mapping looks like this:
{
  "mappings": {
    "user": {
      "properties": {
        "fullname": {
          "type": "text"
        },
        "email": {
          "type": "text",
          "analyzer": "keyword_ngram_analyzer"
        }
      }
    }
  }
}
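The mapping references keyword_ngram_analyzer but does not show how it is defined. The sketch below builds one hypothetical index body that would be consistent with the email tokens shown later in this post: a keyword tokenizer followed by a lowercase filter and an ngram token filter with min_gram 3 and max_gram 8. The filter name email_ngram and the gram sizes are assumptions, not the author's actual settings. Because the gap between max_gram and min_gram is larger than 1, the sketch also raises index.max_ngram_diff.

```python
import json

# ASSUMPTION: hypothetical definition of keyword_ngram_analyzer; the original
# post only references it. keyword tokenizer + ngram filter (3..8) is one
# configuration that reproduces the email tokens shown in the post.
def build_index_body(min_gram: int = 3, max_gram: int = 8) -> dict:
    return {
        "settings": {
            # ngram filters are limited by max_ngram_diff (default 1)
            "index": {"max_ngram_diff": max_gram - min_gram},
            "analysis": {
                "filter": {
                    "email_ngram": {  # hypothetical filter name
                        "type": "ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                    }
                },
                "analyzer": {
                    "keyword_ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "keyword",  # keeps the whole input as one token
                        "filter": ["lowercase", "email_ngram"],
                    }
                },
            },
        },
        # Same mapping as above (uses the "user" mapping type, as the post does)
        "mappings": {
            "user": {
                "properties": {
                    "fullname": {"type": "text"},
                    "email": {"type": "text", "analyzer": "keyword_ngram_analyzer"},
                }
            }
        },
    }

if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

Printing the body gives JSON you could send when creating the index; adjust the gram sizes to whatever your real analyzer uses.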
Let's check them one by one.
First, fullname uses the default analyzer (standard) because no analyzer is specified for it. See the Elasticsearch documentation for more details.
So let's debug it with the Analyze API, using Scott Leblanc as the query text:
POST /_analyze
{
  "analyzer": "standard",
  "text": "Scott Leblanc"
}
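To send the same request from a script, a minimal sketch using only Python's standard library could look like this. It assumes a cluster reachable at http://localhost:9200 without authentication; build_analyze_request and analyze are hypothetical helper names. The optional field parameter asks Elasticsearch to use the analyzer mapped to that field, which only works when the request is sent to an index.

```python
import json
import urllib.request
from typing import Optional

def build_analyze_request(base_url: str, text: str,
                          analyzer: Optional[str] = None,
                          field: Optional[str] = None,
                          index: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Analyze API."""
    # Custom analyzers and field-based analysis need an index-scoped URL
    path = f"/{index}/_analyze" if index else "/_analyze"
    body = {"text": text}
    if analyzer:
        body["analyzer"] = analyzer
    if field:
        body["field"] = field
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def analyze(req: urllib.request.Request) -> list:
    """Send the request and return only the token strings."""
    with urllib.request.urlopen(req) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]

# Usage (needs a running cluster at the ASSUMED address):
# analyze(build_analyze_request("http://localhost:9200", "Scott Leblanc", analyzer="standard"))
```

Keeping request construction separate from sending makes it easy to inspect the exact URL and body before hitting the cluster.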
The output is:
{
  "tokens" : [
    {
      "token" : "scott",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "leblanc",
      "start_offset" : 6,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
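To see where those offsets and positions come from, here is a rough approximation of the standard analyzer for plain ASCII text: split on runs of letters and digits, lowercase each run, and record its character offsets and position. This is a simplification for illustration only; the real standard tokenizer follows Unicode word-boundary rules (UAX #29) and handles cases such as decimal numbers differently.

```python
import re

def standard_like_tokens(text: str) -> list:
    """Approximate the standard analyzer on simple ASCII text (not exact)."""
    tokens = []
    for position, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        word = m.group()
        tokens.append({
            "token": word.lower(),        # standard analyzer lowercases
            "start_offset": m.start(),    # offsets point into the original text
            "end_offset": m.end(),
            "type": "<NUM>" if word.isdigit() else "<ALPHANUM>",
            "position": position,         # 0-based token position
        })
    return tokens

print(standard_like_tokens("Scott Leblanc"))
```

For "Scott Leblanc" this yields the same two tokens, offsets, and positions as the Analyze API response above.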
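Second, what about keyword_ngram_analyzer, which is defined on email? Because it is a custom analyzer defined in the index settings rather than a built-in one, the request has to be sent to the index itself (your_index below is a placeholder for your index name):
POST /your_index/_analyze
{
  "analyzer": "keyword_ngram_analyzer",
  "text": "joh@m.co"
}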
The output is (start_offset, end_offset, type and position omitted for brevity):
{
  "tokens" : [
    { "token" : "joh" },
    { "token" : "joh@" },
    { "token" : "joh@m" },
    { "token" : "joh@m." },
    { "token" : "joh@m.c" },
    { "token" : "joh@m.co" },
    { "token" : "oh@" },
    { "token" : "oh@m" },
    { "token" : "oh@m." },
    { "token" : "oh@m.c" },
    { "token" : "oh@m.co" },
    { "token" : "h@m" },
    { "token" : "h@m." },
    { "token" : "h@m.c" },
    { "token" : "h@m.co" },
    { "token" : "@m." },
    { "token" : "@m.c" },
    { "token" : "@m.co" },
    { "token" : "m.c" },
    { "token" : "m.co" },
    { "token" : ".co" }
  ]
}
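The token list follows a simple pattern: starting at each character, every substring between 3 and 8 characters long is emitted. The sketch below reproduces that pattern, assuming the analyzer is a keyword tokenizer (whole input as one token) plus a lowercase filter and an ngram filter with min_gram 3 and max_gram 8. Those sizes are inferred from the output, not taken from the author's settings.

```python
def keyword_ngram_tokens(text: str, min_gram: int = 3, max_gram: int = 8) -> list:
    """Emulate an ASSUMED keyword tokenizer + lowercase + ngram(3..8) chain."""
    text = text.lower()          # keyword tokenizer keeps one token; lowercase it
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(text):
                break            # longer grams would run past the end
            grams.append(text[start:start + size])
    return grams

tokens = keyword_ngram_tokens("joh@m.co")
print(len(tokens), tokens[:3], tokens[-1])
```

An 8-character email yields 6 + 5 + 4 + 3 + 2 + 1 = 21 grams, which matches the length of the response above. Longer emails or a wider gram range grow the index quickly, which is worth keeping in mind when choosing max_gram.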
In the sample above, you can query or filter using any of these tokens, such as joh@ or @m.co.
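Whether a search finds the document depends on whether the search term is one of the indexed tokens. A term query is not analyzed, so it matches only when the exact term was indexed. A match query, by contrast, runs the query text through the search analyzer, which defaults to the field's analyzer, so the query itself would be ngrammed. The sketch below models only the term-query case, again assuming the ngram(3..8) configuration inferred above; term_query_matches is a hypothetical helper.

```python
def keyword_ngram_tokens(text: str, min_gram: int = 3, max_gram: int = 8) -> list:
    # ASSUMED analyzer: keyword tokenizer + lowercase + ngram(3..8)
    text = text.lower()
    return [text[i:i + n]
            for i in range(len(text))
            for n in range(min_gram, max_gram + 1)
            if i + n <= len(text)]

def term_query_matches(indexed_value: str, term: str) -> bool:
    # A term query is not analyzed: it matches only if this exact token was indexed
    return term in set(keyword_ngram_tokens(indexed_value))

for term in ["joh@", "@m.co", "jo", "joh@m.com"]:
    print(term, term_query_matches("joh@m.co", term))
```

Here "jo" fails because it is shorter than min_gram, and "joh@m.com" fails because it never appears in the stored email.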
Conclusion
If you want to know how an Elasticsearch analyzer tokenizes your queries, use the Analyze API.
See the Analyze API documentation for more details.