Lucas Rivelles

Elasticsearch: Object and Nested Data Types

A mapping defines the structure of the documents in an index and configures how their values are indexed in Elasticsearch.

Elasticsearch doesn't require us to create a mapping for our indices, because it supports dynamic mapping: it infers each field's data type from the values in the documents we index.

Mapping is flexible, so we can also combine explicit mapping with dynamic mapping. We can create an index with explicit data types and, when later documents contain new fields, Elasticsearch adds them to the mapping with the types it infers.
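
To get a feel for what "inferring types" means, here is a rough Python sketch of the default rules: booleans become boolean, whole numbers become long, decimals become float, JSON objects become object, and strings become date or text. It is only an approximation and not Elasticsearch code. The function name infer_field_type and the fields age and active are made up for this example, and the date check only recognizes the yyyy-MM-dd format, while real date detection accepts more formats.

```python
import re

# Simplified stand-in for date detection: only matches yyyy-MM-dd.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_field_type(value):
    """Approximate the type that dynamic mapping would pick for a JSON value."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        first = next((v for v in value if v is not None), None)
        return infer_field_type(first) if first is not None else None
    if isinstance(value, str):
        # Strings that are not dates become text (with a keyword sub-field).
        return "date" if ISO_DATE.match(value) else "text"
    return None  # null values do not add a field to the mapping


# "age" and "active" are hypothetical fields, added only for illustration.
doc = {"name": "Lucas", "birthday": "1990-09-28", "age": 32, "active": True}
inferred = {field: infer_field_type(value) for field, value in doc.items()}
print(inferred)
```

In a real cluster, we can check what was actually inferred by requesting the index mapping after indexing a document.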

Data Types

Elasticsearch provides the usual data types that we also find in many programming languages, such as short, integer, long, float, double, boolean, and date.

The object data type

The object data type represents how Elasticsearch handles JSON objects. Every document is an object, and an object can contain other objects. Let's see an example:

PUT /users
{
  "mappings": {
    "properties": {
      "name": {"type": "text"},
      "birthday": {"type": "date"},
      "address": {
        "properties": {
          "country": {"type": "text"},
          "zipCode": {"type": "text"}
        }
      }
    }
  }
}

This creates an index called users with a text field, a date field, and an object field called address, which contains two text sub-fields.
To index a document into it, we can make the following request:

POST /users/_doc/1
{
  "name": "Lucas",
  "birthday": "1990-09-28",
  "address": {
    "country": "Brazil",
    "zipCode": "04896060"
  }
}

Since Elasticsearch runs on top of Apache Lucene, we should be aware that these objects are not really stored as JSON inside it. When we index an inner object, like the address field from the last example, Elasticsearch flattens it into dotted field names, like this:

{
  "name": "Lucas",
  "birthday": "1990-09-28",
  "address.country": "Brazil",
  "address.zipCode": "04896060"
}
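
The flattening step can be imitated with a few lines of Python. This is only an illustration of the idea, not how Elasticsearch or Lucene implement it, and flatten is a name made up for this sketch.

```python
def flatten(doc, prefix=""):
    """Turn nested dicts into a single level of dotted field names."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the inner object, carrying the parent path along.
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


user = {
    "name": "Lucas",
    "birthday": "1990-09-28",
    "address": {"country": "Brazil", "zipCode": "04896060"},
}

flat_user = flatten(user)
print(flat_user)
# {'name': 'Lucas', 'birthday': '1990-09-28',
#  'address.country': 'Brazil', 'address.zipCode': '04896060'}
```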

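What if we had an array of objects instead? In this case, Elasticsearch stores each sub-field as an array of the values collected across all the objects. For example, an article with an array of reviews ends up with an array of names and an array of ratings, so this: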

POST /articles/_doc
{
  "name": "Elasticsearch article",
  "reviews": [
    {
      "name": "Lucas",
      "rating": 5
    },
    {
      "name": "Eduardo",
      "rating": 3
    }
  ]
}

will be stored like this:

{
  "name": "Elasticsearch article",
  "reviews.name": ["Lucas", "Eduardo"],
  "reviews.rating": [5, 3]
}
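
The same idea for arrays of objects can be sketched in Python. Again, this is only a mental model and not Elasticsearch code, and flatten_object_array is a name made up for this example. Notice that after flattening nothing records which name belonged to which rating.

```python
from collections import defaultdict


def flatten_object_array(field, objects):
    """Collect the values of each sub-field across all objects into one list."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


reviews = [
    {"name": "Lucas", "rating": 5},
    {"name": "Eduardo", "rating": 3},
]

flat_reviews = flatten_object_array("reviews", reviews)
print(flat_reviews)
# {'reviews.name': ['Lucas', 'Eduardo'], 'reviews.rating': [5, 3]}
```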

The nested data type

In the previous example, we indexed a document representing an article that contains two reviews from two different users. Let's run the following query to try to get articles reviewed by Eduardo with a rating greater than 4:

GET /articles/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": { "reviews.name": "Eduardo" }},
        {"range": { "reviews.rating": {"gt": 4} }}
      ]
    }
  }
}

We'll get the following result:

...
"hits" : [
      {
        "_index" : "articles",
        "_id" : "sboYZIEBFSkh39rxJql6",
        "_score" : 1.287682,
        "_source" : {
          "name" : "Elasticsearch article",
          "reviews" : [
            {
              "name" : "Lucas",
              "rating" : 5
            },
            {
              "name" : "Eduardo",
              "rating" : 3
            }
          ]
        }
      }
    ]
...

This is not what we are looking for: Eduardo's rating is 3, yet the article matches. Why is that? Since Elasticsearch flattened our reviews, the link between each name and its rating was lost, so each clause of the query can be satisfied by a value from a different review. But what if we need to keep this relationship? That's where the nested data type comes in.
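
To make the false positive concrete, here is a small Python sketch that imitates how the two clauses are checked against the flattened fields. It is not Elasticsearch code, and it uses plain equality instead of a real analyzed match query, which is a simplification.

```python
# The flattened form of the article we indexed.
flat_article = {
    "name": "Elasticsearch article",
    "reviews.name": ["Lucas", "Eduardo"],
    "reviews.rating": [5, 3],
}

# Each clause only asks: does ANY value of this field satisfy the condition?
name_matches = "Eduardo" in flat_article["reviews.name"]  # satisfied by Eduardo
rating_matches = any(r > 4 for r in flat_article["reviews.rating"])  # satisfied by Lucas's 5

# "must" requires both clauses, but nothing requires them to hit the same review.
flattened_hit = name_matches and rating_matches
print(flattened_hit)  # True
```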

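This data type tells Elasticsearch to index each object in the array as a separate hidden document, so the fields of one object stay related to each other and can be queried independently of the other objects. Let's see the same example, but with the reviews field defined as nested.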

PUT /articles_v2 
{
  "mappings": {
    "properties": {
      "name": {"type": "text"},
      "reviews": {"type": "nested"}
    }
  }
}

Let's index the same document as before:

POST /articles_v2/_doc
{
  "name": "Elasticsearch article",
  "reviews": [
    {
      "name": "Lucas",
      "rating": 5
    },
    {
      "name": "Eduardo",
      "rating": 3
    }
  ]
}

Let's now create a nested query to search for articles reviewed by Eduardo with a rating greater than 4.

GET /articles_v2/_search
{
  "query": {
    "nested": {
      "path": "reviews",
      "query": {
        "bool": {
          "must": [
            { "match": {"reviews.name": "Eduardo"} },
            { "range": {"reviews.rating": {"gt": 4}} }
          ]
        }
      }
    }
  }
}

And the response shows us that no articles match these conditions, which is correct!

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
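
As a rough mental model, a nested query checks all of its conditions against one review at a time. The Python sketch below imitates that behavior. It is not Elasticsearch code, review_matches is a hypothetical helper, and plain equality stands in for a real match query.

```python
reviews = [
    {"name": "Lucas", "rating": 5},
    {"name": "Eduardo", "rating": 3},
]


def review_matches(review):
    """Both conditions must hold for the SAME review."""
    return review["name"] == "Eduardo" and review["rating"] > 4


# The article is a hit only if at least one single review satisfies everything.
nested_hit = any(review_matches(review) for review in reviews)
print(nested_hit)  # False: Eduardo's own rating is 3
```

If Eduardo had given a rating of 5, his review alone would satisfy both conditions and the article would be returned.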
