
AWS Elasticsearch - Reindexing With Zero Downtime Programmatically

Yogesh Manware ・2 min read

Technology is changing faster than ever. There may be other ways to do this, and the approach may evolve in the future. What follows is my opinion, and others may disagree, so take it with a grain of salt.

Scenario

Elasticsearch (ES) is used to store extremely high volumes of data for a limited duration. A greenfield project generally has quite a few moving parts and relentless requirement changes, and changes to the ES schema or field mappings are among them. Elasticsearch allows adding new fields, but it does not allow changing a field's data type or renaming fields without reindexing. When the data is huge, reindexing can take minutes and hence cause downtime. Downtime is not acceptable for highly available applications, especially on the read side.

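With index aliases, the switch from the old index to the new one is atomic and practically instantaneous, so readers see no downtime even though the reindex itself may take minutes.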

High Level Design

[Figure: High Level Design]

The Data Retriever must always be up and running and must return consistent data for a given index at any point in time.

Initial Setup

Create two aliases on day one:

  • write_order_agg pointing to order_agg_v1
  • read_order_agg pointing to order_agg_v1
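The day-one setup can be sketched as a single index-creation request that attaches both aliases, so they exist as soon as the index does. This is a minimal sketch, assuming an @elastic/elasticsearch 7.x style client passed in by the caller and a typeless mappings object of your own; the function name initialSetup is hypothetical.

```javascript
// Day-one setup: create the first versioned index with both aliases attached.
// Assumes an @elastic/elasticsearch 7.x style client passed in by the caller and
// a typeless mappings object such as { properties: { ... } }.
async function initialSetup(esClient, mappings) {
  const indexName = 'order_agg_v1';
  await esClient.indices.create({
    index: indexName,
    body: {
      mappings,
      // Both aliases start out pointing at the same physical index.
      aliases: {
        write_order_agg: {},
        read_order_agg: {}
      }
    }
  });
  return indexName;
}
```

Creating the aliases in the same request avoids a window where the index exists but the applications cannot find it through their aliases.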

The key is that neither the Data Processor nor the Data Retriever knows the real index; each only knows an alias to it.
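To illustrate the idea, here is a sketch of hypothetical Data Processor and Data Retriever helpers that only ever name the aliases. It assumes an @elastic/elasticsearch 7.x style client (responses wrapped in { body }); saveOrderAgg and findOrderAggs are illustrative names, not part of any library.

```javascript
// Hypothetical helpers: neither side ever names a physical index such as order_agg_v1.
const WRITE_ALIAS = 'write_order_agg';
const READ_ALIAS = 'read_order_agg';

// Data Processor side: write an aggregated order through the write alias.
// An alias that points to exactly one index accepts writes like the index itself.
async function saveOrderAgg(esClient, id, doc) {
  await esClient.index({ index: WRITE_ALIAS, id, body: doc });
}

// Data Retriever side: query through the read alias and return the documents.
async function findOrderAggs(esClient, query) {
  const { body } = await esClient.search({ index: READ_ALIAS, body: { query } });
  return body.hits.hits.map((hit) => hit._source);
}
```

Because the index names live only in the alias definitions, moving an alias redirects these calls without redeploying either application.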

Here are the steps for reindexing:
  1. Stop the Data Processor
    • This step is optional; it is required only if the processing logic changes
  2. Create a new index, order_agg_v2, with the new mapping
  3. Update the write_order_agg alias to point to this index and remove its link to order_agg_v1
  4. Deploy and start the updated Data Processor (optional)
  5. Copy (reindex) documents from order_agg_v1 to order_agg_v2
  6. Update the read_order_agg alias to point to order_agg_v2 and remove its link to order_agg_v1
  7. Delete order_agg_v1 (it is recommended to execute this step manually after verifying that the new index is good)
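Steps 2, 3, 5 and 6 can be chained in one function. This is a sketch under stated assumptions, not the author's implementation: it assumes an @elastic/elasticsearch 7.x style client and a typeless mapping, leaves steps 1, 4 and 7 to the operator, and adds two things of its own. The reindex uses op_type "create" with conflicts "proceed" so that documents the updated Data Processor already wrote to the new index are not overwritten by older copies, and a document-count check guards the read-alias switch. The names reindexWithAliases and moveAlias are hypothetical.

```javascript
// Builds an updateAliases request that moves one alias; the remove and add
// actions in a single request are applied atomically.
function moveAlias(alias, fromIndex, toIndex) {
  return {
    body: {
      actions: [
        { remove: { index: fromIndex, alias } },
        { add: { index: toIndex, alias } }
      ]
    }
  };
}

async function reindexWithAliases(esClient, { oldIndex, newIndex, mappings }) {
  // Step 2: create the new index with the new mapping.
  await esClient.indices.create({ index: newIndex, body: { mappings } });

  // Step 3: from now on, writes through write_order_agg land in the new index.
  await esClient.indices.updateAliases(moveAlias('write_order_agg', oldIndex, newIndex));

  // Step 5: copy existing documents; skip ids that already exist in the new index.
  await esClient.reindex({
    wait_for_completion: true,
    refresh: true, // make the copied documents visible to the count below
    body: {
      conflicts: 'proceed',
      source: { index: oldIndex },
      dest: { index: newIndex, op_type: 'create' }
    }
  });

  // Sanity check before switching readers: the new index should hold at least
  // as many documents as the old one (it may also hold new writes).
  const [{ body: oldCount }, { body: newCount }] = await Promise.all([
    esClient.count({ index: oldIndex }),
    esClient.count({ index: newIndex })
  ]);
  if (newCount.count < oldCount.count) {
    throw new Error(`reindex incomplete: ${newCount.count} < ${oldCount.count} documents`);
  }

  // Step 6: readers switch to the new index in one atomic alias update.
  await esClient.indices.updateAliases(moveAlias('read_order_agg', oldIndex, newIndex));
}
```

If the count check fails, the read alias still points at order_agg_v1, so the Data Retriever keeps serving the old data while the problem is investigated.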

Following are a few code snippets that can be used to automate the above steps using the Elasticsearch client (JavaScript):

Create Client
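const { Client } = require('@elastic/elasticsearch');

const esClient = new Client({ node: esHost }); // esHost: your AWS Elasticsearch endpoint URL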
Create New Index With Mapping
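// Client calls return promises; run these snippets inside an async function and await each step.
// include_type_name: true means the mapping body contains a type name (ES 6.x/7.x only).
await esClient.indices.create({ index: indexName, body: mapping, include_type_name: true });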
Add and Remove an Alias Atomically (in one request)
await esClient.indices.updateAliases({ body: actions });

where actions is:
const actions = {
  actions: [
    // each action is a separate object in the array
    { remove: { index: 'order_agg_v1', alias: 'write_order_agg' } },
    { add: { index: 'order_agg_v2', alias: 'write_order_agg' } }
  ]
};
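The same request shape generalizes to several aliases at once, because every action in one updateAliases call is applied atomically. A small helper, moveAliasesBody (a hypothetical name), can build it; one possible use, assumed here rather than taken from the article, is rolling both aliases back to order_agg_v1 if a problem shows up before step 7.

```javascript
// Builds an updateAliases body that moves every alias in `aliases` from one index
// to another; Elasticsearch applies all actions of one request atomically.
function moveAliasesBody(aliases, fromIndex, toIndex) {
  return {
    actions: aliases.flatMap((alias) => [
      { remove: { index: fromIndex, alias } },
      { add: { index: toIndex, alias } }
    ])
  };
}

// Example (inside an async function): roll both aliases back in one request.
// await esClient.indices.updateAliases({
//   body: moveAliasesBody(['write_order_agg', 'read_order_agg'], 'order_agg_v2', 'order_agg_v1')
// });
```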
Reindex (Copy Documents)
await esClient.reindex({
  waitForCompletion: true, // make sure you wait until it completes
  refresh: false,
  body: {
    source: {
      index: 'order_agg_v1'
    },
    dest: {
      index: 'order_agg_v2',
      type: 'doc' // mapping type; only for indices that still use types (ES 6.x/7.x)
    }
  }
});
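For very large indices, a single request that waits for the whole reindex can outlive the client's request timeout. One alternative, sketched here as an assumption rather than the author's approach, is to start the reindex with wait_for_completion: false, which returns a task id, and poll the tasks API until the task reports completed. It assumes an @elastic/elasticsearch 7.x style client; reindexAsTask and pollMs are hypothetical names.

```javascript
// Start the reindex as a background task, then poll the tasks API until it is done.
// Assumes an @elastic/elasticsearch 7.x style client (responses wrapped in { body }).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function reindexAsTask(esClient, sourceIndex, destIndex, pollMs = 5000) {
  const { body: started } = await esClient.reindex({
    wait_for_completion: false, // respond immediately with a task id
    body: { source: { index: sourceIndex }, dest: { index: destIndex } }
  });
  const taskId = started.task; // formatted as "<nodeId>:<taskNumber>"

  for (;;) {
    const { body: status } = await esClient.tasks.get({ task_id: taskId });
    if (status.completed) {
      // A completed task carries the usual reindex summary in status.response.
      const failures = (status.response && status.response.failures) || [];
      if (status.error || failures.length > 0) {
        throw new Error(`reindex task ${taskId} failed`);
      }
      return status.response;
    }
    await sleep(pollMs);
  }
}
```

Polling keeps each HTTP call short, and the returned summary (documents created, updated, and so on) can be logged before moving the read alias.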

Automating these steps comes in handy when there is a large number of indexes.
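When many index families follow the same pattern, the script first needs to know which physical index each alias points to and what the next version should be called. A sketch, assuming the <base>_v<N> naming used in this article (order_agg_v1 to order_agg_v2) and an @elastic/elasticsearch 7.x style client; nextVersionName, resolveAlias and planReindex are hypothetical helpers.

```javascript
// Derives the next versioned index name, e.g. order_agg_v1 -> order_agg_v2.
function nextVersionName(indexName) {
  const match = /^(.*)_v(\d+)$/.exec(indexName);
  if (!match) throw new Error(`not a versioned index name: ${indexName}`);
  return `${match[1]}_v${Number(match[2]) + 1}`;
}

// Returns the single physical index an alias currently points to.
// The getAlias response body is keyed by index name.
async function resolveAlias(esClient, alias) {
  const { body } = await esClient.indices.getAlias({ name: alias });
  const indices = Object.keys(body);
  if (indices.length !== 1) throw new Error(`${alias} points to ${indices.length} indices`);
  return indices[0];
}

// Builds a plan (current and next index) for every read alias to migrate.
async function planReindex(esClient, readAliases) {
  const plans = [];
  for (const alias of readAliases) {
    const current = await resolveAlias(esClient, alias);
    plans.push({ alias, current, next: nextVersionName(current) });
  }
  return plans;
}
```

Each plan entry can then be fed to the steps above, which keeps the per-index logic identical no matter how many indexes there are.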

More information on the Elasticsearch API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html
https://www.npmjs.com/package/elasticsearch

Inspired by: https://engineering.carsguide.com.au/elasticsearch-zero-downtime-reindexing-e3a53000f0ac
