Irtiza Ali

Posted on Jun 3, 2020

Data Migration to New AWS Elasticsearch Service Domain

#awselasticsearch #datamigration #backuprestore

This story provides guidelines for migrating data to a new AWS Elasticsearch Service domain.

Overview

Data migration to the new AWS Elasticsearch Service domain consists of two steps:

Create a manual snapshot of the existing Elasticsearch Service domain's data in an S3 bucket.
Restore the snapshot from S3 into the new Elasticsearch Service domain.

Assumption

I am assuming that you already know how to create an AWS Elasticsearch Service domain.

Manual Snapshot/Backup

Create an S3 bucket in the same region as the Elasticsearch domain.
Copy the bucket ARN.
Create an IAM role that allows Elasticsearch to use S3. Initially, create the role with the EC2 use case and no policy attached; the trusted service and the policy are set in the following steps.
Add the following inline JSON policy to the role, substituting the bucket ARN copied in step 2:

{
   "Version": "2012-10-17",
   "Statement": [{
       "Action": [
         "s3:ListBucket"
       ],
       "Effect": "Allow",
       "Resource": [
         "arn:aws:s3:::<bucket-name>"
       ]
     },
     {
       "Action": [
         "s3:GetObject",
         "s3:PutObject",
         "s3:DeleteObject"
       ],
       "Effect": "Allow",
       "Resource": [
         "arn:aws:s3:::<bucket-name>/*"
       ]
     }
   ]
 }

Edit the role's trust relationship, replacing the EC2 service principal, so that Elasticsearch can assume this role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "es.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
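
The role, inline policy, and trust relationship from the steps above can also be created programmatically. The following is a minimal sketch using boto3, not the method used in this post. The function names, the role name `es-snapshot-role`, and the policy name `es-snapshot-s3-access` are assumptions chosen for illustration.

```python
import json


def build_s3_policy(bucket_name):
    """Return the inline policy that lets the role list, read, and write the snapshot bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["s3:ListBucket"],
                "Effect": "Allow",
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Effect": "Allow",
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


def build_trust_policy():
    """Return the trust relationship that lets the Elasticsearch service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "es.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_snapshot_role(iam_client, bucket_name,
                         role_name="es-snapshot-role",          # hypothetical name
                         policy_name="es-snapshot-s3-access"):  # hypothetical name
    """Create the role, attach the inline policy, and return the role ARN.

    iam_client is expected to be boto3.client("iam").
    """
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_trust_policy()),
    )
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_s3_policy(bucket_name)),
    )
    return role["Role"]["Arn"]
```

Creating the role this way sets the trust relationship at creation time, so there is no EC2 use case to replace afterwards.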

Create an IAM user with programmatic access so that it can be used from the AWS CLI. This user will register the manual snapshot repository. Attach the inline JSON policy given below:

{
   "Version": "2012-10-17",
   "Statement": [
     {
       "Effect": "Allow",
       "Action": "iam:PassRole",
       "Resource": "<role-arn-created-in-step-3>"
     },
     {
       "Effect": "Allow",
       "Action": "es:ESHttpPut",
       "Resource": "<elasticsearch-arn>"
     }
   ]
 }

Configure the AWS CLI for the user created in step 6, using its access key ID and secret access key:

aws configure

Enter data for each prompt.

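Install pip and the required Python packages (the first command assumes a Debian/Ubuntu system; use your distribution's package manager otherwise):

sudo apt-get install python-pip
sudo pip install boto3 requests requests-aws4auth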

Create a Python file and paste the script given below:

import boto3
import requests
from requests_aws4auth import AWS4Auth

host = '<existing elasticsearch service domain url>' # include https:// and a trailing /
region = '<elasticsearch service domain region>'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# Register repository
path = '_snapshot/<snapshot-repository-name>' # the Elasticsearch API endpoint
url = host + path

payload = {
  "type": "s3",
  "settings": {
    "bucket": "<enter bucket name created in step-1>",
    "region": "<bucket region>",
    "role_arn": "<arn of role created in step-3>"
  }
}

headers = {"Content-Type": "application/json"}

r = requests.put(url, auth=awsauth, json=payload, headers=headers)

print(r.status_code)

Run the script. It prints the HTTP status code of the request; 200 means the repository was registered.

Note
If you ran aws configure with sudo, run the Python script with sudo as well, because the credentials were saved for the root user. Otherwise, there is no need to use sudo.
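
The registration script above prints only the status code, which says little when something goes wrong. The helper below is a minimal sketch that surfaces the response body on failure. It assumes a successful registration returns HTTP 200 with a JSON body of `{"acknowledged": true}`, and the function name `check_registration` is hypothetical.

```python
import json


def check_registration(status_code, body_text):
    """Return True if the repository was registered; raise with details otherwise."""
    if status_code != 200:
        # The body usually explains the failure, e.g. a missing iam:PassRole permission.
        raise RuntimeError(
            f"Repository registration failed ({status_code}): {body_text}"
        )
    return json.loads(body_text).get("acknowledged") is True
```

In the script, this would be called as `check_registration(r.status_code, r.text)` in place of the final `print`.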

Take the manual snapshot using either the Elasticsearch API or the Kibana Dev Tools console:

PUT _snapshot/<snapshot-repository-name>/<date/snapshot-name>
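
If you prefer to stay in Python rather than use the Kibana console, the same request can be sent with the signed `requests` call used in the registration script. This is a minimal sketch: `snapshot_url` and `take_snapshot` are hypothetical helper names, and `awsauth` is the `AWS4Auth` object built in the registration script.

```python
def snapshot_url(host, repository, snapshot_name):
    """Build the snapshot endpoint URL, tolerating a missing trailing slash on host."""
    return f"{host.rstrip('/')}/_snapshot/{repository}/{snapshot_name}"


def take_snapshot(host, repository, snapshot_name, awsauth):
    """Send PUT _snapshot/<repository>/<snapshot_name> and return the response."""
    # Imported here so that the URL helper has no third-party dependency.
    import requests

    return requests.put(snapshot_url(host, repository, snapshot_name), auth=awsauth)
```

Without a request body, the snapshot includes all indices, which matches the console command above.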

To check that the snapshot was created successfully and to see which indices it contains:

GET _snapshot/<snapshot-repository-name>/_all?pretty
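
The response to this request lists each snapshot along with its state and indices. The sketch below pulls out those fields. It assumes the standard response shape, a top-level `snapshots` list whose entries carry `snapshot`, `state`, and `indices` keys; the function names are hypothetical.

```python
def summarize_snapshots(response_json):
    """Map each snapshot name to its state and the sorted list of indices it contains."""
    return {
        snap["snapshot"]: {
            "state": snap.get("state"),
            "indices": sorted(snap.get("indices", [])),
        }
        for snap in response_json.get("snapshots", [])
    }


def all_succeeded(response_json):
    """Return True only if there is at least one snapshot and every one has state SUCCESS."""
    summary = summarize_snapshots(response_json)
    return bool(summary) and all(s["state"] == "SUCCESS" for s in summary.values())
```

A snapshot that is still running reports the state `IN_PROGRESS`, so `all_succeeded` can be polled until it returns True before moving on to the restore.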

Check the S3 bucket to confirm that the snapshot data has been written.

Restore Snapshot

Create a new Elasticsearch Service Domain.
The new Elasticsearch Service domain needs a role that allows it to access the S3 bucket where the snapshots are stored. There is no need to create a new one: the role created in step 3 of the manual snapshot process can be reused.
Create an IAM user with programmatic access so that it can be used from the AWS CLI. This user will register the manual snapshot repository with the new Elasticsearch Service domain. Attach the inline JSON policy:

{
   "Version": "2012-10-17",
   "Statement": [
     {
       "Effect": "Allow",
       "Action": "iam:PassRole",
       "Resource": "<role-arn-referred-in-step-2>"
     },
     {
       "Effect": "Allow",
       "Action": "es:ESHttp*",
       "Resource": "<new-elasticsearch-arn>"
     }
   ]
 }

Configure the user on a system:

aws configure

Install pip and the required Python packages, unless they are already installed (the first command assumes a Debian/Ubuntu system):

sudo apt-get install python-pip
sudo pip install boto3 requests requests-aws4auth

Create a file and paste the Python script given below:

import boto3
import requests
from requests_aws4auth import AWS4Auth

host = '<new elasticsearch service domain url>' # include https:// and a trailing /
region = '<new elasticsearch service domain region>'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# Register repository
path = '_snapshot/<snapshot-repository-name-used-in-manual-snapshot>' # the Elasticsearch API endpoint
url = host + path

payload = {
  "type": "s3",
  "settings": {
    "bucket": "<enter bucket name created in the manual snapshot process>",
    "region": "<bucket region>",
    "role_arn": "<arn of role referred in step-2>"
  }
}

headers = {"Content-Type": "application/json"}

r = requests.put(url, auth=awsauth, json=payload, headers=headers)

print(r.status_code)

Run the Python script.

Note
If you ran aws configure with sudo, run the Python script with sudo as well, because the credentials were saved for the root user. Otherwise, there is no need to use sudo.

To check that the snapshot repository is configured, list the existing snapshots using either the Elasticsearch API or the Kibana Dev Tools console:

GET _snapshot/<snapshot-repository-name>/_all?pretty

The response should list the snapshot that was created in the manual snapshot process.
To check existing indices:

GET _aliases?pretty=true

Restore the snapshot using either the Elasticsearch API or the Kibana Dev Tools console:

POST _snapshot/<snapshot-repository-name>/<date/snapshot-name>/_restore
{
  "indices": "<index-name>",
  "ignore_unavailable": false,
  "include_global_state": false
}
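
The same restore call can be issued from Python. This is a minimal sketch that reuses the `awsauth` object from the registration script; `build_restore_body` and `restore_snapshot` are hypothetical names.

```python
def build_restore_body(indices, ignore_unavailable=False, include_global_state=False):
    """Build the JSON body for the _restore call.

    indices may be a single index name or a list of names; a list is joined
    into the comma-separated string form that Elasticsearch accepts.
    """
    if not isinstance(indices, str):
        indices = ",".join(indices)
    return {
        "indices": indices,
        "ignore_unavailable": ignore_unavailable,
        "include_global_state": include_global_state,
    }


def restore_snapshot(host, repository, snapshot_name, indices, awsauth):
    """Send POST _snapshot/<repository>/<snapshot_name>/_restore and return the response."""
    # Imported here so that the body helper has no third-party dependency.
    import requests

    url = f"{host.rstrip('/')}/_snapshot/{repository}/{snapshot_name}/_restore"
    return requests.post(url, auth=awsauth, json=build_restore_body(indices))
```

The defaults mirror the request body shown above, so only the index names need to be supplied.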

Verify that the index has been restored:

GET _aliases?pretty=true

Verify the data of the index:

GET /<index-name>/_search/
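
To go a step beyond reading search results by eye, document counts on the old and new domains can be compared. This sketch is an addition to the steps in this post. It assumes you have collected an `{index: count}` dictionary from each domain (for example from `GET /<index-name>/_count`), and the function name is hypothetical.

```python
def find_count_mismatches(source_counts, target_counts):
    """Return {index: (source_count, target_count)} for indices whose counts differ.

    An index missing from the target is reported with a target count of None.
    """
    mismatches = {}
    for index, expected in source_counts.items():
        actual = target_counts.get(index)
        if actual != expected:
            mismatches[index] = (expected, actual)
    return mismatches
```

An empty result means every index from the old domain has the same number of documents on the new one.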

Final Thoughts

I hope you have liked this tutorial. Do give me feedback about anything that can be improved. Thank you.
