If you've ever worked with ElasticSearch, you've probably had to alter your original mapping and reindex your cluster's data, unless you're super-awesome at predicting the search functionality you need and got your mapping perfect the first time. If that's you, there's no need to finish reading this post.
Tinkering with data in production isn't the most fun job in the world and I've never really found step-by-step instructions on how to properly/safely perform a reindex. So that's what this post is all about. Hopefully it'll help someone!
As always, test this out on your local, staging or any non-production ElasticSearch cluster before assuming it works. How many records you're reindexing will be one of the determining factors in how long it takes to process.
Create your new index + mapping
The first step is to create an entirely new index with your new mapping.
Let's say our ElasticSearch is running on http://127.0.0.1:9201, our old index is called testv1, and our new index is called testv2.
For brevity, the mapping is empty.
curl -XPUT --header 'Content-Type: application/json' http://127.0.0.1:9201/testv2 -d '
{
"settings": {},
"mappings": {
"_doc": {
"properties": {}
}
}
}
'
Reindex your data
Here's where we actually begin copying the data from your old index into your new one. One thing to note is the wait_for_completion flag. When this is set to false, a task will be created, allowing you to issue subsequent requests to get progress updates. Without it, the command will just hang until the reindex finishes, and on a large index you'll likely hit a curl timeout and have no way of knowing when it's done reindexing.
curl -XPOST --header 'Content-Type: application/json' 'http://127.0.0.1:9201/_reindex?wait_for_completion=false' -d '
{
"source" : {
"index" : "testv1"
},
"dest" : {
"index" : "testv2",
"version_type" : "external"
}
}
'
This call will return a task id that can be used to get status updates, like this:
curl -XGET --header 'Content-Type: application/json' 'http://127.0.0.1:9201/_tasks/<TASK_ID>?pretty=true'
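If you'd rather not re-run that status call by hand, here's a minimal bash sketch that polls the task until the response reports "completed": true. The function names (task_completed, wait_for_task), the ES_URL variable and the 5-second default interval are my own assumptions for illustration, not anything ElasticSearch ships with.

```shell
# Assumed value from this post; adjust for your cluster.
ES_URL="http://127.0.0.1:9201"

# Succeeds if a _tasks/<id> response body reports "completed": true.
# $1 = raw JSON body returned by the tasks API
task_completed() {
  printf '%s' "$1" | grep -Eq '"completed"[[:space:]]*:[[:space:]]*true'
}

# Polls the task until it completes.
# $1 = task id returned by the _reindex call
# $2 = seconds to sleep between polls (hypothetical default: 5)
wait_for_task() {
  local task_id="$1" interval="${2:-5}" body
  while true; do
    # Bail out if curl itself fails (e.g. the cluster is unreachable).
    body="$(curl -s -XGET "$ES_URL/_tasks/$task_id")" || return 1
    if task_completed "$body"; then
      echo "reindex task $task_id completed"
      return 0
    fi
    sleep "$interval"
  done
}
```

You'd call it as wait_for_task '<TASK_ID>' 10, using the task id from the reindex response. It only tells you the task finished, so it's still worth reading the final task response for failures.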
Once your reindexing task is complete, you can verify the document counts in your old index match the new one.
curl -XGET http://127.0.0.1:9201/testv1/_count
curl -XGET http://127.0.0.1:9201/testv2/_count
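To avoid eyeballing the two numbers, the comparison can be scripted. This is a rough bash sketch under a couple of assumptions: the helper names (extract_count, compare_counts) and ES_URL are made up for this example, and the count is pulled out with sed rather than a proper JSON parser, which is fine for the small _count response but not for arbitrary JSON.

```shell
# Assumed value from this post; adjust for your cluster.
ES_URL="http://127.0.0.1:9201"

# Extracts the numeric "count" field from a _count response body.
# $1 = raw JSON body, e.g. {"count":42,"_shards":{...}}
extract_count() {
  printf '%s' "$1" | sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Prints both counts and succeeds only if they are present and equal.
# $1 = old index name, $2 = new index name
compare_counts() {
  local old_count new_count
  old_count="$(extract_count "$(curl -s -XGET "$ES_URL/$1/_count")")"
  new_count="$(extract_count "$(curl -s -XGET "$ES_URL/$2/_count")")"
  echo "$1: ${old_count:-unknown}, $2: ${new_count:-unknown}"
  [ -n "$old_count" ] && [ "$old_count" = "$new_count" ]
}
```

Usage would be something like compare_counts testv1 testv2 && echo "counts match". Keep in mind that _count only sees documents that have been refreshed, so if refreshes are disabled on the new index the number can lag until a refresh happens.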
When you're satisfied that everything is looking good, move on to the last step: creating an alias and telling people how awesome you are at ElasticSearch.
Create an alias pointing to your new index and remove the old one
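You likely don't want to have to alter all your queries to use the new index name. Fortunately, ElasticSearch allows you to create aliases. The idea is that we create an alias with the same name as the original index, but point it to the new one. This way there are no changes to be made to our queries. Because an alias can't share a name with an existing index, the same request also removes the old index, and both actions are applied atomically. Note that remove_index deletes testv1 and its data, so only run this once you've verified the new index.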
curl -XPOST --header 'Content-Type: application/json' http://127.0.0.1:9201/_aliases -d '
{
"actions" : [
{
"add" : {
"index" : "testv2",
"alias" : "testv1"
}
},
{
"remove_index" : {
"index": "testv1"
}
}
]
}
'
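If you want to double-check the swap, you can ask ElasticSearch which index the alias resolves to. Here's a small bash sketch, assuming the alias lookup response has the shape {"testv2":{"aliases":{"testv1":{}}}}. The helper names (alias_points_to, check_alias) and ES_URL are hypothetical and only for illustration.

```shell
# Assumed value from this post; adjust for your cluster.
ES_URL="http://127.0.0.1:9201"

# Succeeds if the alias lookup response lists the given index with an "aliases" entry.
# $1 = raw JSON body from GET /_alias/<alias>, $2 = index name expected behind the alias
alias_points_to() {
  printf '%s' "$1" | grep -Eq "\"$2\"[[:space:]]*:[[:space:]]*\{[[:space:]]*\"aliases\""
}

# Fetches the alias and checks that it resolves to the expected index.
# $1 = alias name, $2 = expected index name
check_alias() {
  local body
  body="$(curl -s -XGET "$ES_URL/_alias/$1")" || return 1
  alias_points_to "$body" "$2"
}
```

For the example in this post that would be check_alias testv1 testv2 && echo "alias is in place".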
Voilà! You're done.
As an aside, there are things you can do to improve reindexing performance, like disabling the refresh_interval and lowering the number of replicas. Whether or not this is something you need will largely depend on the size of your cluster and the underlying hardware.
Here's a sample of how to do this, should you need to. Make sure you run this prior to starting the reindex command.
curl -XPUT --header 'Content-Type: application/json' http://127.0.0.1:9201/testv2/_settings -d '
{
"index" : {
"refresh_interval" : "-1",
"number_of_replicas": "0"
}
}
'
Make sure to set these values back to whatever you feel is best once the new index is fully populated.
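For completeness, here's a hedged bash sketch of putting the settings back afterwards. The function names (settings_body, apply_settings) and ES_URL are assumptions of mine, and the values in the usage line (a "1s" refresh interval and 1 replica) are just ElasticSearch's defaults used as placeholders, so swap in whatever your index had before.

```shell
# Assumed value from this post; adjust for your cluster.
ES_URL="http://127.0.0.1:9201"

# Builds the JSON body for the index settings call.
# $1 = refresh interval (e.g. 1s), $2 = number of replicas (e.g. 1)
settings_body() {
  printf '{"index":{"refresh_interval":"%s","number_of_replicas":%s}}' "$1" "$2"
}

# Applies the settings to an index.
# $1 = index name, $2 = refresh interval, $3 = number of replicas
apply_settings() {
  curl -s -XPUT --header 'Content-Type: application/json' \
    "$ES_URL/$1/_settings" -d "$(settings_body "$2" "$3")"
}
```

Once the reindex task has finished, you'd run something like apply_settings testv2 1s 1.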
Thanks!