AWS Elasticsearch Service Data Rotation

Irtiza Ali (aliartiza75)

Overview

Problem

Elasticsearch is commonly used to store and monitor application logs. Logs should be retained for a defined period, depending on your needs, and then discarded to free up disk space.

Elasticsearch provides a feature that can be used to delete old data, but using it is not recommended due to this problem.

Solution

The recommended way to clean up data is by using Elasticsearch Curator.

So in this story, we will create a Lambda function that runs Curator and trigger it with a CloudWatch Events rule at a defined interval. Each time the Lambda is triggered, it selects old indices using a set of filters and deletes them.

Each of the above steps is discussed in detail later in this story.

Pre-Requisites

It helps to be familiar with these services:

  1. AWS Elasticsearch Service

  2. Elasticsearch Curator

  3. AWS Lambda

  4. AWS CloudWatch Events

Assumption

I am assuming that you have a running AWS Elasticsearch cluster and that application logs are being shipped to it.

Details

In this section, I will explain each step of the solution in detail:

1. Curator’s Lambda

  1. Create an IAM role and attach this inline policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "es:ESHttpGet",
                "es:ESHttpDelete"
            ],
            "Resource": "<es-cluster-arn>"
        }
    ]
}

This policy allows the Lambda to send HTTP GET and DELETE requests to the Elasticsearch cluster.
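
If you would rather script this step than use the console, the same policy can be built and attached from Python. This is a minimal sketch, not code from the repository: the function names and the role and policy names you pass in are hypothetical, and `attach_policy` assumes boto3 is installed with credentials configured. Note that AWS's own examples for `es:ESHttp*` actions usually write the resource as the domain ARN followed by `/*` so that the statement covers index paths, so check which form your `<es-cluster-arn>` value takes.

```python
import json


def build_curator_policy(es_resource_arn):
    """Return the inline policy shown above as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # GET to list/inspect indices, DELETE to remove them.
                "Action": ["es:ESHttpGet", "es:ESHttpDelete"],
                "Resource": es_resource_arn,
            }
        ],
    }


def attach_policy(role_name, policy_name, es_resource_arn):
    """Attach the policy to an existing role as an inline policy."""
    import boto3  # imported lazily so the builder above works without it

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_curator_policy(es_resource_arn)),
    )
```

Keeping the builder separate from the boto3 call makes it easy to print the JSON and review it before anything is attached.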

  2. Create a Lambda function and assign the above role to it.

  3. Once the Lambda is created, we need to package and publish its code. In this GitHub repository, you will find:

  • the Lambda’s code.

  • guidelines on how to use filters (they select the Elasticsearch indices that need to be deleted).

  • how to use the different environment variables.

  • guidelines on packaging and publishing the Lambda’s code.
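
To make the idea of filters concrete, here is a minimal standalone sketch of the kind of selection they perform: keep only indices with a given name prefix, parse the date from the rest of the name, and pick those older than a retention window. It is not the repository's code and does not call Curator. The function name, the `logs-` prefix, and the `%Y.%m.%d` date suffix are assumptions made for illustration.

```python
from datetime import date, datetime, timedelta


def indices_to_delete(index_names, prefix, timestring, retention_days, today):
    """Return the indices whose name-embedded date is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        # Prefix filter: ignore indices that do not belong to this log stream.
        if not name.startswith(prefix):
            continue
        # Age filter: read the date from the index name; skip names without one.
        try:
            index_day = datetime.strptime(name[len(prefix):], timestring).date()
        except ValueError:
            continue
        if index_day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["logs-2020.01.01", "logs-2020.01.25", "metrics-2019.12.01", ".kibana"]
    print(indices_to_delete(names, "logs-", "%Y.%m.%d", 7, date(2020, 1, 31)))
```

In the real Lambda, the index names would come from the cluster and the selected indices would be deleted through Curator. Passing `today` in explicitly keeps the selection logic easy to test before it is pointed at production data.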

2. CloudWatch Events Trigger

Once the Lambda is published, we need cron-like functionality to trigger it at regular intervals. We will use AWS CloudWatch Events for this. Follow the steps below:

  1. Create an Event Rule.

  2. Choose the Schedule event source.

  3. Create and assign a cron expression based on your needs. AWS cron expressions differ slightly from the standard cron format. Details can be found on this link.

  4. In the target, select the Lambda created above.
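
The steps above can also be scripted. The sketch below builds a daily schedule in the AWS format, which has six fields (minutes, hours, day-of-month, month, day-of-week, year) and requires `?` in one of the two day fields. The function names, the rule name, and the target ID are hypothetical, and `create_schedule` assumes boto3 is installed with credentials configured.

```python
def daily_schedule(hour_utc, minute=0):
    """Build an AWS schedule expression that fires once a day at the given UTC time."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    # Fields: minutes hours day-of-month month day-of-week year
    return f"cron({minute} {hour_utc} * * ? *)"


def create_schedule(rule_name, lambda_arn, hour_utc):
    """Create the rule and point it at the Lambda (steps 1 to 4 above)."""
    import boto3  # imported lazily so daily_schedule works without it

    events = boto3.client("events")
    events.put_rule(Name=rule_name, ScheduleExpression=daily_schedule(hour_utc))
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "curator-lambda", "Arn": lambda_arn}],
    )
    # Outside the console, the Lambda also needs a permission that lets
    # events.amazonaws.com invoke it; that step is not shown here.


if __name__ == "__main__":
    print(daily_schedule(1))
```

Schedule expressions are evaluated in UTC, so convert from your local time when choosing the hour.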

Final Thoughts

I hope you liked this story. Please give feedback about anything that can be improved or that I have missed. Thank you :)
