Using the Nested datatype, Multi-fields, ignore_malformed
, and adding a pre-processing step to flatten nested JSON into key-value pairs, it's possible to index arbitrary JSON (even with type conflicts) into Elasticsearch.
Index definition:
{
"settings": {
"mapper.dynamic": false
},
"mappings": {
"sometype": {
"dynamic": "strict",
"_all" : { "enabled" : true },
"properties": {
"kv_pairs": {
"type": "nested",
"properties": {
"key": { "type": "string", "index": "not_analyzed" },
"value": {
"type": "string",
"fields": {
"raw_string": { "type": "string", "index": "not_analyzed" },
"analyzed_string": { "type": "string", "index": "analyzed" },
"date": { "type": "date", "ignore_malformed": true },
"long": { "type": "long", "ignore_malformed": true },
"double": { "type": "double", "ignore_malformed": true }
}
}
}
}
}
}
}
}
Pre-processing step example:
$ echo '{"integer":3,"nested":{"inner":"object"}}' | \
jq '[leaf_paths as $path | {"key": $path | map(tostring) | join("."), "value": getpath($path)}] | {kv_pairs: .}'
{
"kv_pairs": [
{
"key": "integer",
"value": 3
},
{
"key": "nested.inner",
"value": "object"
}
]
}
Example query:
{
"query": {
"nested": {
"path": "kv_pairs",
"query": {
"bool": {
"must": [
{
"match": {
"kv_pairs.key.raw_string": "nested.inner"
}
},
{
"match": {
"kv_pairs.value.raw_string": "object"
}
}
]
}
}
}
}
}
- Flattening the JSON into key-value pairs is necessary to avoid a mapping explosion
- The nested type is necessary to associate key-value pairs so that
"bool":{"must":[...]}
clauses operate on individual pairs -
ignore_malformed
suppresses Elasticsearch mapping conflicts
Top comments (0)