Ruby Curator for AWS ElasticSearch Service

Custom Ink Technology

by: Katherine Cisneros

A Ruby alternative to the Python Curator for ElasticSearch.

If you use Amazon Web Services' ElasticSearch Service, there's a good chance you will need to manage your indices so they do not overwhelm your ElasticSearch cluster. As of this writing, the AWS documentation page lists four options, and the best fit for your use case depends on which version of the ElasticSearch engine you are running. Three of the options appear to be native to AWS, and the fourth, Curator, is an outside resource written in Python. This blog post is not intended to convince you to use one option over another; it simply provides a Ruby alternative to Curator.

At Custom Ink, we primarily support Ruby on Rails apps, and we felt this was a good opportunity to build something that is more widely understood and supportable by our engineers. After searching online for a Ruby alternative to the Curator lambda, I could not find one, so we went to work on writing our own.

I used Lamby as the framework for this Lambda, which made a lot of the initial setup very easy! The first step in the process is to create an IAM user for the Lambda function.

Create an IAM user for your lambda function

You should first create an IAM user that the lambda will use to invalidate/delete indices. This is what the policy for this user should look like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                ...
            ],
            "Resource": "arn:aws:es:REGION:ACCOUNT:domain/elasticsearch-cluster/*"
        }
    ]
}

If the resource in the above policy is too broad, you can narrow access further to just the endpoint that controls the indices. Our Lambda was in the VPC and only affected our one cluster, so we felt okay granting access to more than that endpoint.

Under the Security Credentials tab, generate access keys and save them in SSM. You can store the access key ID and secret access key in the same secret or in separate secrets; just be mindful that you may be charged per API call for SSM. Below is the example I have for this function:

{
  "access_key_id": "SUPERSECRETACCESSKEYID",
  "secret_access_key": "SUPERSECRETACCESSKEY"
}
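
Since both keys live in one JSON document, it can help to fail fast when one of them is missing rather than send an unsigned request later. The following is a minimal sketch under that assumption; `parse_keys` is a hypothetical helper name and is not part of the original lambda.

```ruby
require 'json'

# Hypothetical helper: parse the JSON stored in the parameter and raise
# early when either expected key is missing or empty.
def parse_keys(raw_json)
  keys = JSON.parse(raw_json)
  %w[access_key_id secret_access_key].each do |name|
    raise KeyError, "missing #{name} in stored credentials" if keys[name].to_s.empty?
  end
  keys
end
```

In the handler, this could stand in for the bare `JSON.parse` call around the SSM lookup.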

Create your serverless function & code

Below is the core of the work: the curator.rb file, which lives in the lib directory. As you can see, it's fairly straightforward. You simply have to add the full URL and port of your ElasticSearch cluster, your region, and your account number. So that the lambda can sign its requests, it reads the access keys saved in SSM; the template.yaml below grants it access to that parameter. As written, it deletes indices older than 7 days, but that is completely customizable depending on your needs.

# Add your gem requires here:
require 'aws-sdk'
require 'faraday_middleware/aws_sigv4'
require 'elasticsearch'
# to convert json from ssm
require 'json'
# for Date.parse and Date.today
require 'date'
require_relative 'curator/ssm'

def handler(event:, context:)
  full_url_and_port = 'https://YOUR-ES-ENDPOINT:443' # placeholder: full URL & port of your cluster
  region = 'us-east-1' # e.g. us-west-1
  account = '1234567890'

  keys = JSON.parse(SSM.get_parameter('/path/to-elasticsearch-keys'))

  # sign every request to the cluster with the IAM user's keys
  client = Elasticsearch::Client.new(url: full_url_and_port) do |f|
    f.request :aws_sigv4,
      service: 'es',
      region: region,
      access_key_id: keys['access_key_id'],
      secret_access_key: keys['secret_access_key']
  end

  response = client.perform_request 'GET', '_cat/indices?format=JSON'
  p response.body
  # gather indexes here
  indices = response.body.map { |a| a["index"] }
  indices.each do |index|
    # match returns a MatchData object, or nil if the name has no date
    date = index.match(/\d{4}-\d{2}-\d{2}/)
    if date
      date = Date.parse date.to_s
      if date < Date.today - 7
        puts client.indices.delete index: index
        p "Deleted index: #{index}"
      end
    end
  end
end
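
The date check is the part most worth testing without a live cluster. One way to do that, sketched here as an assumption rather than as part of our lambda, is to pull the selection logic into a pure function. The name `stale_indices` and its keyword arguments are hypothetical.

```ruby
require 'date'

# Hypothetical helper: return the index names whose embedded YYYY-MM-DD date
# is more than `retention_days` days before `today`.
def stale_indices(index_names, today: Date.today, retention_days: 7)
  cutoff = today - retention_days
  index_names.select do |name|
    match = name.match(/\d{4}-\d{2}-\d{2}/)
    next false unless match # no date in the name, so never delete it

    begin
      Date.strptime(match[0], '%Y-%m-%d') < cutoff
    rescue ArgumentError # digits that look like a date but are not one
      false
    end
  end
end

names = ['logs-2020-01-01', 'logs-2020-01-09', '.kibana_1']
p stale_indices(names, today: Date.new(2020, 1, 10))
# => ["logs-2020-01-01"]
```

Passing `today:` explicitly keeps the function deterministic; the handler would then only need to loop over the result and call `client.indices.delete`.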

In the above code, note that you can pass in the account ID and region as ENV values, along with the ES domain, if you want to make the lambda more generic or deploy it across different development environments or AWS accounts.
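
As a rough sketch of that idea, the settings could be read in one place. The variable names `ES_URL`, `ES_REGION`, `ES_ACCOUNT_ID`, and `RETENTION_DAYS` are assumptions chosen for illustration; they would also need to be added to the function's environment in the template.

```ruby
# Hypothetical helper: read the cluster settings from environment variables.
# All variable names here are assumptions, not part of the original lambda.
def curator_config(env = ENV)
  {
    url: env.fetch('ES_URL'),                     # required; raises KeyError if unset
    region: env.fetch('ES_REGION', 'us-east-1'),  # falls back to the value used above
    account: env['ES_ACCOUNT_ID'],                # optional; nil if unset
    retention_days: Integer(env.fetch('RETENTION_DAYS', '7'))
  }
end
```

Accepting `env` as an argument means a plain Hash can be passed in tests, while the lambda itself would call `curator_config` with no arguments.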

In the lib/curator folder, we have another Ruby file, ssm.rb, which defines an SSM class that fetches the SSM parameters from AWS. It is shared below:

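# uses AWS SDK SSM gem
require 'aws-sdk-ssm'

class SSM
  def self.get_parameter(name)
    ssm = Aws::SSM::Client.new
    ssm_response = ssm.get_parameter({
      name: name,
      with_decryption: true,
    })
    # return the decrypted value as a string
    ssm_response.parameter.value
  end
end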
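
If the number of SSM calls matters to you, one option is to keep the fetched value in memory so that warm invocations reuse it. This is only a sketch of that idea; `CachedParameters` is a hypothetical class, and it takes the lookup as a block so it does not depend on the AWS SDK itself.

```ruby
# Hypothetical wrapper: remember parameter values for the lifetime of the
# Lambda execution environment so repeated lookups skip the API call.
class CachedParameters
  def initialize(&fetcher)
    @fetcher = fetcher
    @values = {}
  end

  # Return the cached value, calling the fetcher only on the first request.
  def get(name)
    @values.fetch(name) { @values[name] = @fetcher.call(name) }
  end
end
```

It could be created once outside the handler, for example `PARAMETERS = CachedParameters.new { |name| SSM.get_parameter(name) }`. The trade-off is that rotated keys are not picked up until the execution environment is recycled.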

Be sure to add these two gems to your Gemfile as well:

gem 'elasticsearch', "~> 6.7" # change depending on your version
gem 'faraday_middleware-aws-sigv4'

Write your template.yaml CloudFormation

Lastly, we have a template.yaml for CloudFormation, which generates these resources. You can see it below:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Curator

Parameters:

  StageEnv:
    Type: String
    Default: development
    AllowedValues:
      - test
      - development
      - staging
      - prod

Resources:

  CuratorLambda:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: .
      Handler: lib/curator.handler
      Runtime: ruby2.7
      Timeout: 60
      MemorySize: 512
      FunctionName: !Sub curator-${StageEnv}
      Environment:
        Variables:
          STAGE_ENV: !Ref StageEnv
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - ssm:GetParametersByPath
                - ssm:GetParameters
                - ssm:GetParameterHistory
                - ssm:GetParameter
                - kms:Decrypt
              Resource:
                - arn:aws:ssm:*:1234567890:parameter/path/to-elasticsearch-keys
                - arn:aws:kms:*:1234567890:key/key-string-12345
      Events:
        CuratorSchedule: # any logical ID works here
          Type: Schedule
          Properties:
            Schedule: 'cron(0 8 ? * * *)' # UTC so 3AM EST

Outputs:

  CuratorLambdaArn: # any logical ID works here
    Description: Lambda Function Arn
    Value: !GetAtt CuratorLambda.Arn
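
A note on the schedule: as far as I know, the cron expression is evaluated in UTC and does not shift with daylight saving time, so the 3AM figure holds only while Eastern time is on EST. The short check below illustrates this; the two dates are arbitrary examples.

```ruby
# 08:00 UTC on a winter date (EST, UTC-5) and on a summer date (EDT, UTC-4).
winter = Time.utc(2021, 1, 15, 8, 0).getlocal('-05:00')
summer = Time.utc(2021, 7, 15, 8, 0).getlocal('-04:00')

puts winter.strftime('%H:%M') # prints 03:00
puts summer.strftime('%H:%M') # prints 04:00
```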

That's it! Now we have a serverless resource that can help you manage your indices in AWS ElasticSearch. If you have any questions, please feel free to leave a comment! I hope this is helpful to others out there who have had a hard time finding a Ruby version of an ElasticSearch lambda curator. This work was done with William Spencer, the WebOps manager at Custom Ink.
