DEV Community

Update your mapping & reindex ElasticSearch easily...well, pretty easily

Dan Stanhope
・3 min read

If you've ever worked with ElasticSearch, you've probably had to alter your original mapping and reindex your cluster's data, unless you're super-awesome at predicting the search functionality you need from ElasticSearch and got your mapping perfect the first time. If that's you, there's no need to finish reading this post.

Tinkering with data in production isn't the most fun job in the world and I've never really found step-by-step instructions on how to properly/safely perform a reindex. So that's what this post is all about. Hopefully it'll help someone!

As always, test this out on your local, staging or any non-production ElasticSearch cluster before assuming it works. How many records you're reindexing will be one of the determining factors in how long it takes to process.

Create your new index + mapping

The first step is to create an entirely new index with your new mapping.

Let's say our ElasticSearch is running on http://127.0.0.1:9201, our old index is called testv1 and our new index is called testv2.

For brevity, the mapping is empty.

curl -XPUT --header 'Content-Type: application/json' http://127.0.0.1:9201/testv2 -d '
{
  "settings": {},
  "mappings": {
    "_doc": {
      "properties": {}
    }
  }
}
'
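
For illustration, here is a rough sketch of the same request with a non-empty mapping, wrapped in a shell function. The function name create_index and the fields title and created_at are made up for this example, and the _doc level assumes the same ElasticSearch version as the request above (newer releases that removed mapping types may reject it as written).

```shell
# Sketch: create an index with a small example mapping.
# "title" and "created_at" are hypothetical fields, for illustration only.
create_index() {
  host="$1"   # e.g. http://127.0.0.1:9201
  index="$2"  # e.g. testv2
  curl -s -XPUT --header 'Content-Type: application/json' "$host/$index" -d '
{
  "settings": {},
  "mappings": {
    "_doc": {
      "properties": {
        "title": { "type": "text" },
        "created_at": { "type": "date" }
      }
    }
  }
}'
}
```

Calling create_index http://127.0.0.1:9201 testv2 sends the request. Afterwards, curl -XGET 'http://127.0.0.1:9201/testv2/_mapping?pretty=true' should show what ElasticSearch actually stored.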

Reindex your data

Here's where we actually begin running the data from your old index into your new one. One thing to note is the wait_for_completion flag. When this is set to false, a task is created, which allows you to issue subsequent requests to get progress updates. Without it, the command will just hang until the reindex finishes, and on a large index you'll likely hit a curl timeout and have no way of knowing when it's done reindexing.

curl -XPOST --header 'Content-Type: application/json' 'http://127.0.0.1:9201/_reindex?wait_for_completion=false' -d '
{
  "source" : {
    "index" : "testv1"
  },
  "dest" : {
    "index" : "testv2",
    "version_type" : "external"
  }
}
'

This call will return a task ID that can be used to get status updates, like this:

curl -XGET --header 'Content-Type: application/json' 'http://127.0.0.1:9201/_tasks/<TASK_ID>?pretty=true'
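
If you'd rather not re-run that status call by hand, a small polling loop can do it for you. This is only a sketch: the function name wait_for_task, the 5 second interval and the 120 try limit are arbitrary choices of mine, and it relies on the task response containing "completed": true once the task has finished.

```shell
# Sketch: poll the task API until the reindex task reports completion.
wait_for_task() {
  host="$1"             # e.g. http://127.0.0.1:9201
  task_id="$2"          # the "task" value returned by the _reindex call
  interval="${3:-5}"    # seconds between checks (arbitrary default)
  max_tries="${4:-120}" # stop after this many checks (arbitrary default)
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    # The task response contains "completed": true once the task is done.
    if curl -s -XGET "$host/_tasks/$task_id" | grep -q '"completed" *: *true'; then
      echo "task $task_id completed"
      return 0
    fi
    tries=$((tries + 1))
    sleep "$interval"
  done
  echo "gave up waiting for task $task_id" >&2
  return 1
}
```

Use it as wait_for_task http://127.0.0.1:9201 '<TASK_ID>'. Note that it only checks for completion, not success, so it's still worth reading the final task response for failures.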

Once your reindexing task is complete, you can verify the document counts in your old index match the new one.

curl -XGET http://127.0.0.1:9201/testv1/_count
curl -XGET http://127.0.0.1:9201/testv2/_count
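
Comparing the two numbers by eye works, but the check is easy to script. Here's a minimal sketch; the function names doc_count and compare_counts are my own, and the sed expression assumes the _count response contains a single "count" field with a plain integer value.

```shell
# Sketch: compare document counts between the old and the new index.
doc_count() {
  # Prints the "count" value from the _count response for host $1, index $2.
  curl -s -XGET "$1/$2/_count" | sed -n 's/.*"count" *: *\([0-9][0-9]*\).*/\1/p'
}

compare_counts() {
  host="$1"; old="$2"; new="$3"
  old_count="$(doc_count "$host" "$old")"
  new_count="$(doc_count "$host" "$new")"
  if [ -n "$old_count" ] && [ "$old_count" = "$new_count" ]; then
    echo "OK: both indices contain $old_count documents"
  else
    echo "MISMATCH: $old=$old_count $new=$new_count" >&2
    return 1
  fi
}
```

Run it as compare_counts http://127.0.0.1:9201 testv1 testv2. Keep in mind that _count only sees documents that have been refreshed, so if refresh is disabled on the new index you may need to call testv2/_refresh first.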

When you're satisfied that everything is looking good, move on to the last step...creating an alias and telling people how awesome you are at ElasticSearch.

Create an alias pointing to your new index and remove the old one

You likely don't want to have to alter all your queries to use the new index name. Fortunately, ElasticSearch allows you to create aliases. The idea is that we create an alias with the same name as the original index, but point it to the new one. This way there are no changes to be made to our queries.

curl -XPOST --header 'Content-Type: application/json' http://127.0.0.1:9201/_aliases -d  '
{
  "actions" : [
    {
      "add" : {
        "index" : "testv2",
        "alias" : "testv1"
      }
    },
    {
      "remove_index" : {
        "index" : "testv1"
      }
    }
  ]
}
'
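
To double-check the swap, you can ask ElasticSearch which index the alias now resolves to. A small sketch, assuming the _alias response is keyed by index name (something like {"testv2":{"aliases":{"testv1":{}}}}); the function name alias_points_to is hypothetical.

```shell
# Sketch: succeed only if the alias resolves to the expected index.
alias_points_to() {
  host="$1"; alias_name="$2"; index="$3"
  # The _alias response is keyed by the index the alias points to.
  curl -s -XGET "$host/_alias/$alias_name" | grep -q "\"$index\" *: *{"
}
```

For example: alias_points_to http://127.0.0.1:9201 testv1 testv2 && echo "alias is in place".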

Voilà! You're done.

As an aside, there are things you can do to improve reindexing performance, like disabling the refresh_interval and reducing the number of replicas. Whether or not this is something you need will largely depend on the size of your cluster and the underlying hardware.

Here's a sample of how to do this, should you need to. Make sure you run this prior to starting the reindex command.

curl -XPUT --header 'Content-Type: application/json' http://127.0.0.1:9201/testv2/_settings -d  '
{
  "index" : {
    "refresh_interval" : "-1",
    "number_of_replicas": "0"
  }
}
'

Make sure to set these values back to whatever you feel is best once the new index is fully populated.
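
As a sketch of what setting them back could look like: the function below sends refresh_interval as null, which tells ElasticSearch to revert that setting to its default, and sets the replica count to a value you pass in. The function name restore_index_settings and the fallback of 1 replica are assumptions on my part, so use whatever your cluster had before.

```shell
# Sketch: undo the reindex-time tuning on the new index.
restore_index_settings() {
  host="$1"          # e.g. http://127.0.0.1:9201
  index="$2"         # e.g. testv2
  replicas="${3:-1}" # assumed fallback; pass your real replica count
  # A null value reverts refresh_interval to the ElasticSearch default.
  curl -s -XPUT --header 'Content-Type: application/json' "$host/$index/_settings" -d "
{
  \"index\" : {
    \"refresh_interval\" : null,
    \"number_of_replicas\" : $replicas
  }
}"
}
```

For example: restore_index_settings http://127.0.0.1:9201 testv2 2.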

Thanks!
