Oksana Horlock for AWS Community Builders

Clean 'em! Getting rid of unused AMIs using Python Lambda and Terraform

In today's AWS world, immutable infrastructure and deployments are generally preferable. Immutable deployments, however, tend to produce many Amazon Machine Images (AMIs). To reduce storage costs, we might want to delete (or deregister, in AWS speak) the unused AMIs and their associated EBS snapshots.

In this blog post I will describe how to set up an AMI cleaner for unused images.

The main part is a Lambda function. It finds unused images, deregisters them and deletes the accompanying EBS snapshots. The function is written in Python and uses Boto3, the AWS SDK for Python. It also relies on JMESPath, the query language that the AWS CLI uses for querying JSON (more on it here). The function takes the following in the "event" argument:

  • regions (list of strings): the regions in which you'd like to run the cleaner
  • max_ami_age_to_prevent_deletion (number): the age threshold in days; an AMI older than this value can safely be deleted
  • ami_tags_to_check (a map of strings, where each entry has a tag key and a tag value): an image that has the specified tags is a candidate for deletion
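
The handler shown later reads these keys straight from the event, so a malformed payload only surfaces as an exception part-way through a run. Below is a minimal sketch of an upfront check. It assumes the key names used in the Terraform event target later in the post (ami_tags_to_check for the tag map); validate_event is a hypothetical helper, not part of the original function.

```python
def validate_event(event):
    """Return (regions, max_age_days, tags) or raise ValueError.

    Hypothetical helper; key names are assumed to match the event payload
    passed in by the CloudWatch event target.
    """
    regions = event.get("regions")
    if (not isinstance(regions, list) or not regions
            or not all(isinstance(r, str) for r in regions)):
        raise ValueError("'regions' must be a non-empty list of strings")

    max_age = event.get("max_ami_age_to_prevent_deletion")
    # bool is a subclass of int, so it is excluded explicitly
    if (isinstance(max_age, bool) or not isinstance(max_age, (int, float))
            or max_age < 0):
        raise ValueError("'max_ami_age_to_prevent_deletion' must be a non-negative number")

    tags = event.get("ami_tags_to_check")
    # An empty tag map would match every self-owned AMI, so it is rejected
    if not isinstance(tags, dict) or not tags:
        raise ValueError("'ami_tags_to_check' must be a non-empty map of tag key to tag value")

    return regions, max_age, {str(k): str(v) for k, v in tags.items()}
```

Rejecting an empty tag map is a deliberate choice in this sketch: without any tag filter, every available AMI owned by the account would become a deletion candidate.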

Let's have a look at the helper methods that are used in the Lambda:

1) A method to find AMIs used in autoscaling groups:

import re
import boto3

def imagesInASGs(region):
  amis = []
  autoscaling = boto3.client('autoscaling', region_name=region)
  ec2 = boto3.client('ec2', region_name=region)
  print(f'Checking autoscaling groups in region {region}...')
  paginator = autoscaling.get_paginator('describe_auto_scaling_groups')

  page_iterator = paginator.paginate(
    PaginationConfig = {'PageSize': 10}
  )
  # One result per autoscaling group: instance ID, launch template ID and
  # template version of every instance that is in service
  filtered_asgs = page_iterator.search("AutoScalingGroups[*].[Instances[?LifecycleState == 'InService'].[InstanceId, LaunchTemplate.LaunchTemplateId,LaunchTemplate.Version]]")

  for key_data in filtered_asgs:
    # Pull every quoted value out of the stringified result
    matches = re.findall(r"'(.+?)'", str(key_data))
    if not matches:
      # The group has no in-service instances, so there is nothing to check
      continue
    if len(matches) < 3:
      # An instance without a launch template yields None instead of quoted values.
      # Alert, then abort so that nothing is deleted on incomplete information
      message = f"Failed to find launch template that was used for instance {matches[0]}"
      send_alert("AMI cleaner failure", message)
      raise RuntimeError(message)
    instance_id, template, version = matches[:3]
    print(f"Template found: {template} version {version}")

    launch_template_versions = ec2.describe_launch_template_versions(
      LaunchTemplateId=template,
      Versions=[version]
    )
    used_ami_id = launch_template_versions["LaunchTemplateVersions"][0]["LaunchTemplateData"].get("ImageId")
    if not used_ami_id:
      message = f"Failed to find AMI for launch template {template} version {version}"
      send_alert("AMI cleaner failure", message)
      raise RuntimeError(message)
    amis.append(used_ami_id)
  return amis

Here we use Boto3 to paginate through the autoscaling groups in a region. We then use the equivalent of an AWS CLI query to pick out the details of the autoscaling groups that interest us most:
filtered_asgs = page_iterator.search("AutoScalingGroups[*].[Instances[?LifecycleState == 'InService'].[InstanceId, LaunchTemplate.LaunchTemplateId,LaunchTemplate.Version]]")

Each result is a nested list, which we convert to a string. The regex "'(.+?)'" then breaks that string down into separate variables.

After that we use the Boto3 EC2 client to look up the AMI ID behind each launch template version used in the autoscaling groups, and save this value into a list.
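
To make the parsing step concrete, here is a small sketch of what the regex sees. The sample data is made up (the IDs are placeholders), and it assumes a group with a single in-service instance; real results can hold several instances per group. The last line shows a possible alternative that reads the same values directly from the nested list.

```python
import re

# Made-up example of one item yielded by page_iterator.search(...):
# one entry per autoscaling group, holding
# [instance ID, launch template ID, template version] per in-service instance.
key_data = [[['i-0123456789abcdef0', 'lt-0abc1234567890def', '3']]]

# str() turns the nested list into text, and the regex picks out every
# single-quoted value in order of appearance.
matches = re.findall(r"'(.+?)'", str(key_data))
instance_id, template, version = matches[:3]
print(instance_id, template, version)

# The same values can be read without the string round-trip:
direct = [tuple(item) for item in key_data[0]]
print(direct[0])
```

Note that a missing launch template shows up as None, which str() renders without quotes, so the regex silently skips it. Unpacking the list directly keeps the None visible and is easier to guard against.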

2) The next function gets the IDs of the AMIs used by running EC2 instances, including those that were not launched by autoscaling:

def imagesUsedInEC2s(region):
  print(f'Checking instances that are not in ASGs in region {region}...')
  amis = []
  ec2_resource = boto3.resource('ec2', region_name = region)
  instances = ec2_resource.instances.filter(
    Filters=
    [
      {
        'Name': 'instance-state-name',
        'Values': [ 'running' ]
      }
    ])
  for instance in list(instances):
      amis.append(instance.image_id)

  return amis

3) A method that creates AMI filters in the correct format. We pass in the values as a map(string) in Terraform, and we need to convert them into the filter format that Boto3 expects, which is the following:

{
   'Name': 'tag:CatName',
   'Values': [ 'Boris' ]
}

The method itself looks like this:

def makeAmiFilters(ami_tags):
  filters = [
    {
      'Name': 'state',
      'Values': ['available']
    }
  ]
  # ami_tags maps tag keys to tag values, e.g. {'CatName': 'Boris'}
  for key, value in ami_tags.items():
    filters.append({'Name': f'tag:{key}', 'Values': [f'{value}']})
  return filters
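
As a quick illustration of the conversion, the sketch below builds the filter list for the two example tags used in the Terraform event target later in the post. It is a self-contained variant written for this example (build_tag_filters is a hypothetical name), not the function from the Lambda itself.

```python
def build_tag_filters(ami_tags):
    """Turn {'TagKey': 'TagValue'} into a list of EC2 describe_images filters."""
    filters = [{'Name': 'state', 'Values': ['available']}]
    filters += [
        {'Name': f'tag:{key}', 'Values': [str(value)]}
        for key, value in ami_tags.items()
    ]
    return filters

filters = build_tag_filters({'Environment': 'UAT', 'Application': 'MyApp'})
for f in filters:
    print(f)
# {'Name': 'state', 'Values': ['available']}
# {'Name': 'tag:Environment', 'Values': ['UAT']}
# {'Name': 'tag:Application', 'Values': ['MyApp']}
```

EC2 combines separate filters with AND, so with this list an image has to be available and carry both tags to be returned.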

4) A function that sends a message to an SNS topic:

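import os
import boto3

sns = boto3.client('sns')  # assumed module-level client; not shown in the original snippet

def send_alert(subject, message):
  sns.publish(
    TargetArn=os.environ['sns_topic_arn'],
    Subject=subject,
    Message=str(message))  # Message must be a string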

5) The main function, or the handler:

from datetime import datetime

def lambda_handler(event, context):
  amis_in_use = []
  total_amis_deleted = 0
  total_snapshots_deleted = 0
  try:
    regions = event['regions']
    max_ami_age_to_prevent_deletion = event['max_ami_age_to_prevent_deletion']

    # The key matches the input passed in by the CloudWatch event target
    filters = makeAmiFilters(event['ami_tags_to_check'])

    for region in regions:
      amis_in_use = list(set(imagesInASGs(region) + imagesUsedInEC2s(region)))
      ec2 = boto3.client('ec2', region_name = region)
      amis = ec2.describe_images(
        Owners = ['self'],
        Filters = filters
      ).get('Images')
      for ami in amis:
        now = datetime.now()
        ami_id = ami['ImageId']
        img_creation_datetime = datetime.strptime(ami['CreationDate'], '%Y-%m-%dT%H:%M:%S.%fZ')
        days_since_creation = (now - img_creation_datetime).days

        if ami_id not in amis_in_use and days_since_creation > max_ami_age_to_prevent_deletion:
          ec2.deregister_image(ImageId = ami_id)
          total_amis_deleted += 1

          for ebs in ami['BlockDeviceMappings']:
            if 'Ebs' in ebs:
              snapshot_id = ebs['Ebs']['SnapshotId']              
              ec2.delete_snapshot(SnapshotId=snapshot_id)
              total_snapshots_deleted += 1

    print(f"Deleted {total_amis_deleted} AMIs and {total_snapshots_deleted} EBS snapshots")

  except Exception as e:
    # SNS expects the message to be a string, not an exception object
    send_alert("AMI cleaner failure", str(e))
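
The deletion condition in the handler can also be isolated into a pure function, which makes it testable without AWS access. This is a sketch under a few assumptions: select_amis_for_deletion and parse_creation_date are hypothetical helpers, the images are dicts shaped like the entries returned by describe_images, and timestamps are treated as UTC. The AMI IDs in the usage example are made up.

```python
from datetime import datetime, timezone

def parse_creation_date(value):
    """Parse an AMI CreationDate such as '2024-01-15T10:30:00.000Z' as UTC."""
    parsed = datetime.strptime(value, '%Y-%m-%dT%H:%M:%S.%fZ')
    return parsed.replace(tzinfo=timezone.utc)

def select_amis_for_deletion(images, amis_in_use, max_age_days, now=None):
    """Return IDs of images that are unused and older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    in_use = set(amis_in_use)
    selected = []
    for image in images:
        age_days = (now - parse_creation_date(image['CreationDate'])).days
        # Same rule as the handler: not in use and strictly older than the limit
        if image['ImageId'] not in in_use and age_days > max_age_days:
            selected.append(image['ImageId'])
    return selected

# Usage with made-up data and a fixed "now" so the result is reproducible
images = [
    {'ImageId': 'ami-old', 'CreationDate': '2024-01-01T00:00:00.000Z'},
    {'ImageId': 'ami-new', 'CreationDate': '2024-01-30T00:00:00.000Z'},
    {'ImageId': 'ami-used', 'CreationDate': '2023-12-01T00:00:00.000Z'},
]
fixed_now = datetime(2024, 2, 1, tzinfo=timezone.utc)
print(select_amis_for_deletion(images, ['ami-used'], 7, now=fixed_now))
# ['ami-old']
```

Passing now as a parameter keeps the function deterministic in tests, and the comparison stays strict, so an image that is exactly max_age_days old is kept.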

Infrastructure
A CloudWatch Events rule that triggers on a schedule has the above Lambda function as its target. In this example, the function will run on the first day of every month:

resource "aws_cloudwatch_event_rule" "trigger" {
  name = "${var.name_prefix}-ami-cleaner-lambda-trigger"
  description = "Trigger that fires the Lambda function"
  schedule_expression = "cron(0 0 1 * ? *)"
  tags = var.tags
}

The event target specifies an input to pass into the Lambda function, among other parameters (the values here are purely for example purposes):

resource "aws_cloudwatch_event_target" "clean_amis" {
  rule = aws_cloudwatch_event_rule.trigger.name
  arn = aws_lambda_function.ami_cleaner.arn
  input = jsonencode({
    ami_tags_to_check= {
     "Environment"="UAT"
     "Application"="MyApp"
    }
    regions = ["us-east-2", "eu-west-1"]
    max_ami_age_to_prevent_deletion = 7
  })
}

If you'd like to create a test event for this Lambda function, you'll need to enter the following into the test event field:

{
  "regions": ["us-east-2", "eu-west-1"],
  "max_ami_age_to_prevent_deletion": 7,
  "ami_tags_to_check": {
    "Environment": "UAT",
    "Application": "MyApp"
  }
}

The function itself needs to have the following Terraform resources defined:

resource "aws_lambda_function" "ami_cleaner" {
  filename = "${path.module}/lambda.zip"
  function_name = "ami-cleaner-lambda"
  role = aws_iam_role.iam_for_lambda.arn
  handler = "lambda_function.lambda_handler"
  runtime = "python3.8"
  source_code_hash = data.archive_file.lambda_zip.output_base64sha256
  tags = var.tags

  environment {
    variables = {
      sns_topic_arn = var.sns_topic_arn
    }
  }
}
resource "aws_lambda_permission" "allow_cloudwatch_to_call_ami_cleaner" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.ami_cleaner.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.trigger.arn
}
data "archive_file" "lambda_zip" {
  type        = "zip"
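  # The file name must match the module part of the handler ("lambda_function.lambda_handler")
  source_file = "${path.module}/lambda_function.py"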
  output_path = "${path.module}/lambda.zip"
}

Using the archive_file data source in Terraform is convenient because you don't need to re-create the zip manually every time you update the function.

Lambda IAM Policy
For the Lambda function to perform the described operations on resources, the following IAM actions need to be allowed in the policy:

"ec2:DescribeImages", 
"ec2:DescribeInstances",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions",

"ec2:DeregisterImage",
"ec2:DeleteSnapshot",
"autoscaling:DescribeAutoScalingGroups",
"sns:Publish"   

To stop the function from deleting arbitrary AMIs and snapshots, we can generate the Terraform policy statements dynamically, so that the policy allows removal of a resource only if it carries a certain tag key and value:

data "aws_iam_policy_document" "ami_cleaner_policy_doc" {
...
  dynamic "statement" {
    for_each = var.ami_tags_to_check
      content {
        actions = [
        "ec2:DeregisterImage",
        "ec2:DeleteSnapshot"
        ]
        resources = ["*"]
        condition {
          test     = "StringLike"
          variable = "aws:ResourceTag/${statement.key}"
          values = [statement.value]
        }        
        effect = "Allow"      
    }
  }   
}
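
To see what the dynamic block expands to, the following sketch builds the equivalent statements in Python for the example tags. It only illustrates the shape of the rendered policy; tag_scoped_statements is a hypothetical helper, and the exact JSON that Terraform emits (key order, whether single values are rendered as strings or lists) may differ.

```python
import json

def tag_scoped_statements(tags):
    """Build one Allow statement per tag, mirroring the dynamic block."""
    return [
        {
            "Effect": "Allow",
            "Action": ["ec2:DeregisterImage", "ec2:DeleteSnapshot"],
            "Resource": "*",
            "Condition": {"StringLike": {f"aws:ResourceTag/{key}": [value]}},
        }
        for key, value in tags.items()
    ]

statements = tag_scoped_statements({"Environment": "UAT", "Application": "MyApp"})
print(json.dumps({"Version": "2012-10-17", "Statement": statements}, indent=2))
```

Because each tag produces its own Allow statement, a resource that matches any one of the tags is permitted by the policy. The describe_images filters in the Lambda are stricter, since they require all of the tags at once.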

Of course, a lot of the values in Terraform can be set as variables. In this case, we can pass the following values as variables to the AMI cleaner module:

  • tags
  • regions
  • sns_topic_arn
  • ami_tags_to_check
  • max_ami_age_to_prevent_deletion
  • schedule_expression

Summary
Hopefully, this post shows how to clean up AMIs based on tags across multiple AWS regions. I have learnt a lot from this piece of work, and I hope someone will learn something new about AWS or Terraform too.
