
Lucas Rivelles


Elasticsearch: The keyword data type

We already saw how Elasticsearch analyzes text values with its standard analyzer. But what if we want to store an entire phrase as a single term in our inverted index? For this, we use the keyword data type.
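As a minimal sketch of what such a mapping might look like on Elasticsearch 7 or later, the request below creates an index with a single keyword field. The index name articles and the field name title are hypothetical, chosen only for illustration.

```
# Hypothetical names: index "articles", field "title".
PUT /articles
{
  "mappings": {
    "properties": {
      "title": { "type": "keyword" }
    }
  }
}
```

The # comment line is accepted by the Kibana Dev Tools console; remove it if you send the request with another client.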

Fields mapped as keyword are handled by a different analyzer, the keyword analyzer. It is a so-called no-op analyzer, because it makes no changes to the value it receives. Let's see an example:

If we analyze the text "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. :)" using the standard analyzer (the default when the request names no analyzer), we'll see this result:


POST /_analyze
{
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. :)"
}


{
"tokens" : [
{
"token" : "the",
"start_offset" : 0,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "2",
"start_offset" : 4,
"end_offset" : 5,
"type" : "<NUM>",
"position" : 1
},
{
"token" : "quick",
"start_offset" : 6,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "brown",
"start_offset" : 12,
"end_offset" : 17,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "foxes",
"start_offset" : 18,
"end_offset" : 23,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "jumped",
"start_offset" : 24,
"end_offset" : 30,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "over",
"start_offset" : 31,
"end_offset" : 35,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "the",
"start_offset" : 36,
"end_offset" : 39,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "lazy",
"start_offset" : 40,
"end_offset" : 44,
"type" : "<ALPHANUM>",
"position" : 8
},
{
"token" : "dog's",
"start_offset" : 45,
"end_offset" : 50,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "bone",
"start_offset" : 51,
"end_offset" : 55,
"type" : "<ALPHANUM>",
"position" : 10
}
]
}

The standard analyzer changed the text considerably: it split the sentence into individual terms, lowercased them, and dropped the punctuation and the ":)". When a text field is indexed, these terms are what gets stored in the inverted index. Now, let's modify our request by specifying that we want to use the keyword analyzer.


POST /_analyze
{
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. :)",
"analyzer": "keyword"
}


{
"tokens" : [
{
"token" : "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. :)",
"start_offset" : 0,
"end_offset" : 59,
"type" : "word",
"position" : 0
}
]
}

This time, the only token generated is the text itself.
What are the effects of storing a field as a keyword? If we map it as text, Elasticsearch's full-text queries can match on the individual words of the value. If we map it as keyword, that is not possible: the only term stored is exactly the string we sent in the request, so a query matches only when it supplies the complete value, character for character.
A keyword field can also be used for aggregations, which we'll see in the next articles.
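To make the exact-match behaviour concrete, here is a rough sketch of two searches. It assumes a hypothetical index named articles whose title field is explicitly mapped as keyword and already holds a document with the example sentence as its title; neither name comes from a real setup.

```
# Assumption: "articles" exists and "title" is mapped as keyword.
# The term query supplies the whole stored value, so it should find the document.
GET /articles/_search
{
  "query": {
    "term": {
      "title": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. :)"
    }
  }
}

# A single word is not the whole stored value, so this should return no hits.
GET /articles/_search
{
  "query": {
    "match": {
      "title": "quick"
    }
  }
}
```

If title were mapped as text instead, the second request would be expected to match, because "quick" is one of the terms the standard analyzer produces.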
