DEV Community

Lucas Rivelles

Elasticsearch: The keyword data type

We already saw how Elasticsearch stores text values using its standard analyzer. However, what if we want to store an entire phrase in our inverted index as a single term? For this, we need to use the keyword data type.

Fields mapped as keyword are not broken into words. Their values are handled the way the keyword analyzer handles them: it is a so-called no-op analyzer, because it makes no changes to the value and emits it as a single token. Let's see an example:

If we analyze the text below using the standard analyzer (the default when the request does not name one), we'll see this result:


POST /_analyze
{
  "text": "The 2 QUICK     Brown-Foxes jumped over the lazy dog's bone. :)"
}


{
  "tokens" : [
    {
      "token" : "the",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "2",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<NUM>",
      "position" : 1
    },
    {
      "token" : "quick",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "brown",
      "start_offset" : 16,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "foxes",
      "start_offset" : 22,
      "end_offset" : 27,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "jumped",
      "start_offset" : 28,
      "end_offset" : 34,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "over",
      "start_offset" : 35,
      "end_offset" : 39,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "the",
      "start_offset" : 40,
      "end_offset" : 43,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "lazy",
      "start_offset" : 44,
      "end_offset" : 48,
      "type" : "<ALPHANUM>",
      "position" : 8
    },
    {
      "token" : "dog's",
      "start_offset" : 49,
      "end_offset" : 54,
      "type" : "<ALPHANUM>",
      "position" : 9
    },
    {
      "token" : "bone",
      "start_offset" : 55,
      "end_offset" : 59,
      "type" : "<ALPHANUM>",
      "position" : 10
    }
  ]
}

The standard analyzer changed the text considerably: it split it into individual words, lowercased them, and discarded the hyphen, the period, and the smiley. Each resulting token is a term that would be stored in the inverted index. Now, let's modify our request by specifying that we want to use the keyword analyzer.


POST /_analyze
{
  "text": "The 2 QUICK     Brown-Foxes jumped over the lazy dog's bone. :)",
  "analyzer": "keyword"
}


{
  "tokens" : [
    {
      "token" : "The 2 QUICK     Brown-Foxes jumped over the lazy dog's bone. :)",
      "start_offset" : 0,
      "end_offset" : 63,
      "type" : "word",
      "position" : 0
    }
  ]
}

This time, the only token generated is the text itself, unchanged: nothing was lowercased, nothing was split, and the punctuation was kept.
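
To make the contrast concrete, here is a minimal Python sketch of the two behaviors. It is not how Elasticsearch implements analysis: the regular expression is only a rough stand-in for the standard analyzer that happens to agree with it on this sentence, and the names standard_like and keyword_like are made up for illustration.

```python
import re

TEXT = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. :)"

# Assumption: a crude approximation of the standard analyzer.
# It keeps runs of ASCII letters/digits (optionally joined by an
# apostrophe) and lowercases them. The real analyzer follows Unicode
# text segmentation rules and handles far more cases.
WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def standard_like(text):
    """Split the text into lowercased word tokens."""
    return [match.group().lower() for match in WORD.finditer(text)]


def keyword_like(text):
    """No-op analysis: the whole input becomes one unchanged token."""
    return [text]


print(standard_like(TEXT))
# ['the', '2', 'quick', 'brown', 'foxes', 'jumped', 'over', 'the', 'lazy', "dog's", 'bone']
print(keyword_like(TEXT))
# ["The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. :)"]
```

Eleven terms come out of the first function and exactly one out of the second, which mirrors the two responses above.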
What are the effects of storing a field as a keyword? If we store it as text, we can use Elasticsearch's full-text queries and find the document by any of its words. If we store it as keyword, that is not possible: the stored term is exactly the string we sent in the request, so a search matches only when it supplies that exact value, and searching for a single word such as quick finds nothing.
A keyword field can also be used for aggregations, which we'll see in upcoming articles.
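
The search and aggregation consequences can be sketched with a toy inverted index in Python. Everything here is a simplified illustration under stated assumptions, not Elasticsearch's internals: build_index, as_text, and as_keyword are hypothetical helpers, the sample documents are invented, and as_text only lowercases and splits on whitespace.

```python
from collections import Counter, defaultdict


def build_index(docs, analyze):
    """Map each term produced by `analyze` to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyze(value):
            index[term].add(doc_id)
    return index


def as_text(value):
    # Assumption: crude text analysis (lowercase, split on whitespace).
    return value.lower().split()


def as_keyword(value):
    # Keyword behavior: the whole value is the single stored term.
    return [value]


# Invented sample documents: id -> field value.
docs = {1: "Quick Brown Fox", 2: "Lazy Dog", 3: "Quick Brown Fox"}

text_index = build_index(docs, as_text)
keyword_index = build_index(docs, as_keyword)

# Looking up a single word works only against the text-style index.
print(sorted(text_index.get("quick", set())))                # [1, 3]
print(sorted(keyword_index.get("quick", set())))             # []

# The keyword-style index answers only for the exact original value.
print(sorted(keyword_index.get("Quick Brown Fox", set())))   # [1, 3]

# Because each keyword term is a whole value, counting documents per term
# yields one bucket per distinct value, which is the idea behind aggregating.
buckets = Counter({term: len(ids) for term, ids in keyword_index.items()})
print(buckets.most_common())  # [('Quick Brown Fox', 2), ('Lazy Dog', 1)]
```

Counting per term on the text-style index would instead produce buckets for quick, brown, fox, and so on, which is rarely what we want when grouping documents by a field's value.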
