DEV Community

Siddhant Khare


AWS Cost Optimization: Periodic Deletion of ECR Container Images

tl;dr

Automated periodic deletion of ECR container images is a straightforward and effective way to optimize AWS costs. By combining Lambda functions and Step Functions, you can implement retention rules that ECR lifecycle policies cannot express, such as deleting only images that are both older than X days and outside the latest N, so that only the images you actually need are retained.


Introduction

Managing AWS costs can be challenging, especially with the increasing use of Elastic Container Registry (ECR) for storing container images. I've found that one effective way to cut costs is by periodically deleting unnecessary ECR container images. In this guide, I'll walk you through the steps to set up an automated cleanup process using Go.

Why Optimize ECR Storage?

ECR is a great tool for storing Docker container images, but as your CI/CD pipelines push more images, storage costs can quickly add up. Without regular cleanup, these costs can become significant. By implementing a strategy to automatically delete old or unused images, you can save money and keep your storage lean.

Using ECR Lifecycle Policies

ECR lifecycle policies are a built-in way to manage image cleanup. They let you define rules that automatically expire images based on criteria such as the time since an image was pushed, the number of images in the repository, and tag status. However, lifecycle policies have limitations, especially when you need to combine multiple conditions.
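For reference, here is roughly what single-condition lifecycle policies look like. This is a minimal sketch that builds the policy JSON with Go's standard library; the Go type and function names are hypothetical, while the JSON fields (rulePriority, selection, tagStatus, countType, countUnit, countNumber, action) follow the ECR lifecycle policy format. The 30-day and 10-image values are assumptions. The resulting string is what you would pass as a repository's policy text, for example as LifecyclePolicyText in the SDK's PutLifecyclePolicy call.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical Go types mirroring the ECR lifecycle policy JSON document.
type Selection struct {
	TagStatus   string `json:"tagStatus"`           // "any", "tagged", or "untagged"
	CountType   string `json:"countType"`           // "sinceImagePushed" or "imageCountMoreThan"
	CountUnit   string `json:"countUnit,omitempty"` // "days", used with sinceImagePushed
	CountNumber int    `json:"countNumber"`
}

type Action struct {
	Type string `json:"type"` // "expire" removes the matching images
}

type Rule struct {
	RulePriority int       `json:"rulePriority"`
	Description  string    `json:"description"`
	Selection    Selection `json:"selection"`
	Action       Action    `json:"action"`
}

type Policy struct {
	Rules []Rule `json:"rules"`
}

// singleRulePolicy wraps one selection in a policy document containing one rule.
func singleRulePolicy(description string, sel Selection) (string, error) {
	p := Policy{Rules: []Rule{{
		RulePriority: 1,
		Description:  description,
		Selection:    sel,
		Action:       Action{Type: "expire"},
	}}}
	b, err := json.Marshal(p)
	return string(b), err
}

// expireOlderThan expires every image pushed more than `days` days ago.
func expireOlderThan(days int) (string, error) {
	return singleRulePolicy(fmt.Sprintf("Expire images older than %d days", days),
		Selection{TagStatus: "any", CountType: "sinceImagePushed", CountUnit: "days", CountNumber: days})
}

// keepLatest expires everything except the `n` most recently pushed images.
func keepLatest(n int) (string, error) {
	return singleRulePolicy(fmt.Sprintf("Keep only the %d most recent images", n),
		Selection{TagStatus: "any", CountType: "imageCountMoreThan", CountNumber: n})
}

func main() {
	byAge, err := expireOlderThan(30)
	if err != nil {
		panic(err)
	}
	byCount, err := keepLatest(10)
	if err != nil {
		panic(err)
	}
	fmt.Println(byAge)
	fmt.Println(byCount)
}
```

Each document holds one rule with one condition, which is exactly the case lifecycle policies handle well.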

Challenges with ECR Lifecycle Policies

While ECR lifecycle policies provide a good starting point, they have limitations:

  1. Single Condition Policies: ECR lifecycle policies are designed to handle single-condition rules easily. For example, you can delete images older than a specific number of days or keep only the most recent N images. However, they struggle when you need to combine multiple conditions, such as "delete images older than X days and not among the latest N images."

  2. AND Conditions: Lifecycle policies have no way to require that two conditions hold at the same time. For example, if you want to delete only images that are older than 30 days and not among the latest 10 images, no lifecycle policy can express that rule, so you need a custom solution.

  3. Granular Control: Lifecycle policies provide limited control over the exact criteria used for image deletion. If your requirements are specific, such as retaining images based on custom tags or metadata, lifecycle policies may not suffice.

  4. Global vs. Repository-Specific Rules: Defining rules that apply globally to all repositories can be challenging. Lifecycle policies need to be set up for each repository individually, which can become cumbersome in environments with many repositories.
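To make points 1 and 2 concrete, here are the two behaviors written as boolean predicates. Roughly speaking, combining an age rule and a count rule through lifecycle policies behaves like OR (an image is removed if either condition applies), whereas the cleanup we want needs AND. The function names are hypothetical; the loop prints all four combinations so you can compare the two outcomes side by side.

```go
package main

import "fmt"

// lifecycleExpires models what separate age and count rules give you:
// an image is expired if EITHER condition holds.
func lifecycleExpires(olderThanX, inLatestN bool) bool {
	return olderThanX || !inLatestN
}

// desiredExpires is the rule we actually want: delete only images that are
// BOTH older than X days AND outside the latest N.
func desiredExpires(olderThanX, inLatestN bool) bool {
	return olderThanX && !inLatestN
}

func main() {
	action := map[bool]string{true: "Delete", false: "Keep"}
	for _, older := range []bool{true, false} {
		for _, inLatest := range []bool{false, true} {
			fmt.Printf("olderThanX=%-5v inLatestN=%-5v lifecycle=%-6s desired=%s\n",
				older, inLatest,
				action[lifecycleExpires(older, inLatest)],
				action[desiredExpires(older, inLatest)])
		}
	}
}
```

The two predicates disagree in exactly two cases: an old image that is still among the latest N, and a recent image that has fallen outside the latest N. Both are deleted under OR but kept under AND.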

Custom Cleanup Solution

To overcome the limitations of lifecycle policies, we can use AWS Lambda functions and Step Functions to create a custom cleanup process. This approach offers more flexibility and control over which images get deleted.

Workflow Overview

Our custom solution involves the following steps:

  1. GetContainerRepositories Lambda Function: Retrieves all ECR repositories in your AWS account (in the current region), along with each repository's images.
  2. DeleteExpiredContainerImages-Map State: Processes each repository's image list.
  3. DeleteExpiredContainerImages Lambda Function: Evaluates and deletes images based on specified criteria.

Here's a visual representation of the workflow:

[Figure: SFN State Machine]
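If you prefer to see the workflow as a definition, here is a minimal sketch of an Amazon States Language document for it, built in Go with only the standard library. It assumes that GetContainerRepositories returns an object with a repositories array whose elements carry repositoryName and images; the Lambda ARNs (REGION, ACCOUNT_ID) and the MaxConcurrency value are placeholders. The Map state uses ItemsPath to select that array and runs DeleteExpiredContainerImages once per element.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Placeholder ARNs: replace REGION, ACCOUNT_ID, and the function names with your own.
const (
	getRepositoriesARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:GetContainerRepositories"
	deleteImagesARN    = "arn:aws:lambda:REGION:ACCOUNT_ID:function:DeleteExpiredContainerImages"
)

// stateMachineDefinition returns the workflow as an Amazon States Language
// document: a Task that lists repositories, then a Map state that invokes the
// delete function once for each element of $.repositories.
func stateMachineDefinition() map[string]any {
	return map[string]any{
		"Comment": "Periodic deletion of expired ECR container images",
		"StartAt": "GetContainerRepositories",
		"States": map[string]any{
			"GetContainerRepositories": map[string]any{
				"Type":     "Task",
				"Resource": getRepositoriesARN,
				"Next":     "DeleteExpiredContainerImages-Map",
			},
			"DeleteExpiredContainerImages-Map": map[string]any{
				"Type":           "Map",
				"ItemsPath":      "$.repositories",
				"MaxConcurrency": 5, // assumption: limit parallel ECR API calls
				"ItemProcessor": map[string]any{
					"ProcessorConfig": map[string]any{"Mode": "INLINE"},
					"StartAt":         "DeleteExpiredContainerImages",
					"States": map[string]any{
						"DeleteExpiredContainerImages": map[string]any{
							"Type":     "Task",
							"Resource": deleteImagesARN,
							"End":      true,
						},
					},
				},
				"End": true,
			},
		},
	}
}

func main() {
	definition, err := json.MarshalIndent(stateMachineDefinition(), "", "  ")
	if err != nil {
		panic(err)
	}
	// Paste the output into the Step Functions console or your IaC tool.
	fmt.Println(string(definition))
}
```

Capping MaxConcurrency keeps the number of parallel ECR API calls modest; leaving it out lets Step Functions run the iterations as concurrently as it can.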

Implementation Details

Let's dive into the implementation of each step using Go.

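  1. GetContainerRepositories: This Lambda function lists every ECR repository in the account, collects each repository's images, and returns them as JSON for the Map state to iterate over.
package main

import (
    "context"
    "time"

    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/ecr"
)

// ImageDetail uses time.Time so that it serializes as an RFC 3339 timestamp,
// which is the format DeleteExpiredContainerImages can decode.
type ImageDetail struct {
    ImageDigest   string    `json:"imageDigest"`
    ImagePushedAt time.Time `json:"imagePushedAt"`
}

// Repository is one element of the array that the Map state iterates over.
type Repository struct {
    RepositoryName string        `json:"repositoryName"`
    Images         []ImageDetail `json:"images"`
}

type Response struct {
    Repositories []Repository `json:"repositories"`
}

// Create the ECR client once per Lambda container rather than on every call.
var svc = ecr.New(session.Must(session.NewSession()))

// getRepositoryNames lists every repository in the account (current region), following pagination.
func getRepositoryNames(ctx context.Context) ([]string, error) {
    var names []string
    err := svc.DescribeRepositoriesPagesWithContext(ctx, &ecr.DescribeRepositoriesInput{},
        func(page *ecr.DescribeRepositoriesOutput, lastPage bool) bool {
            for _, repo := range page.Repositories {
                names = append(names, aws.StringValue(repo.RepositoryName))
            }
            return !lastPage
        })
    return names, err
}

// getImages returns the digest and push time of every image in one repository.
func getImages(ctx context.Context, repositoryName string) ([]ImageDetail, error) {
    var images []ImageDetail
    input := &ecr.DescribeImagesInput{
        RepositoryName: aws.String(repositoryName),
    }

    err := svc.DescribeImagesPagesWithContext(ctx, input, func(page *ecr.DescribeImagesOutput, lastPage bool) bool {
        for _, image := range page.ImageDetails {
            images = append(images, ImageDetail{
                ImageDigest:   aws.StringValue(image.ImageDigest),
                ImagePushedAt: aws.TimeValue(image.ImagePushedAt),
            })
        }
        return !lastPage
    })
    return images, err
}

// handleRequest returns every repository together with its images.
// Step Functions caps state payloads at 256 KiB, so accounts with very many
// images may need to pass only repository names and list images per iteration.
func handleRequest(ctx context.Context) (Response, error) {
    names, err := getRepositoryNames(ctx)
    if err != nil {
        return Response{}, err
    }

    // Start with an empty slice so the JSON is [] rather than null when there are no repositories.
    resp := Response{Repositories: []Repository{}}
    for _, name := range names {
        images, err := getImages(ctx, name)
        if err != nil {
            return Response{}, err
        }
        resp.Repositories = append(resp.Repositories, Repository{RepositoryName: name, Images: images})
    }
    return resp, nil
}

func main() {
    lambda.Start(handleRequest)
}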
  2. DeleteExpiredContainerImages-Map: This Map state iterates through each repository and invokes the DeleteExpiredContainerImages Lambda function.

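  3. DeleteExpiredContainerImages: This Lambda function decides which of a repository's images to delete. It retains the latest N images as well as any image pushed within the last X days, and deletes only the images that fall outside both criteria.

package main

import (
    "context"
    "fmt"
    "sort"
    "time"

    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/ecr"
)

const (
    retainImageCount           = 10  // always keep the N most recently pushed images
    retainSinceImagePushedDays = 30  // always keep images pushed within the last X days
    batchDeleteLimit           = 100 // BatchDeleteImage accepts at most 100 image IDs per call
)

type ImageDetail struct {
    ImageDigest   string    `json:"imageDigest"`
    ImagePushedAt time.Time `json:"imagePushedAt"` // RFC 3339 timestamp in the input JSON
}

type Request struct {
    RepositoryName string        `json:"repositoryName"`
    Images         []ImageDetail `json:"images"`
}

// filterExpiredImages returns the images that are BOTH older than
// retainSinceImagePushedDays AND not among the latest retainImageCount images.
func filterExpiredImages(images []ImageDetail, now time.Time) []ImageDetail {
    retainLimit := now.AddDate(0, 0, -retainSinceImagePushedDays)

    // DescribeImages does not guarantee any order, so sort a copy newest first.
    sorted := append([]ImageDetail(nil), images...)
    sort.Slice(sorted, func(i, j int) bool {
        return sorted[i].ImagePushedAt.After(sorted[j].ImagePushedAt)
    })

    var toDelete []ImageDetail
    for i, image := range sorted {
        // Skip the latest N; of the rest, delete only those older than the limit.
        if i >= retainImageCount && image.ImagePushedAt.Before(retainLimit) {
            toDelete = append(toDelete, image)
        }
    }
    return toDelete
}

// deleteImages removes the given digests in batches of at most batchDeleteLimit.
func deleteImages(ctx context.Context, svc *ecr.ECR, repositoryName string, digests []string) error {
    for start := 0; start < len(digests); start += batchDeleteLimit {
        end := start + batchDeleteLimit
        if end > len(digests) {
            end = len(digests)
        }
        input := &ecr.BatchDeleteImageInput{
            RepositoryName: aws.String(repositoryName),
            ImageIds:       make([]*ecr.ImageIdentifier, 0, end-start),
        }
        for _, digest := range digests[start:end] {
            input.ImageIds = append(input.ImageIds, &ecr.ImageIdentifier{ImageDigest: aws.String(digest)})
        }

        out, err := svc.BatchDeleteImageWithContext(ctx, input)
        if err != nil {
            return err
        }
        // Per-image problems are reported in Failures rather than as an error.
        if len(out.Failures) > 0 {
            return fmt.Errorf("%d image(s) in %s could not be deleted: %s",
                len(out.Failures), repositoryName, aws.StringValue(out.Failures[0].FailureReason))
        }
    }
    return nil
}

func handleRequest(ctx context.Context, request Request) (string, error) {
    svc := ecr.New(session.Must(session.NewSession()))
    toDelete := filterExpiredImages(request.Images, time.Now())

    digests := make([]string, 0, len(toDelete))
    for _, image := range toDelete {
        digests = append(digests, image.ImageDigest)
    }
    // With an empty list, deleteImages makes no API call (BatchDeleteImage rejects empty ImageIds).
    if err := deleteImages(ctx, svc, request.RepositoryName, digests); err != nil {
        return "", err
    }
    return fmt.Sprintf("Deleted %d image(s) from %s", len(digests), request.RepositoryName), nil
}

func main() {
    lambda.Start(handleRequest)
}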
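To try DeleteExpiredContainerImages locally, it helps to know what one Map iteration passes in: a single repository object with repositoryName and images. The sketch below decodes a made-up sample event (the repository name, digests, and dates are placeholders) into the same Request and ImageDetail types, roughly the way aws-lambda-go decodes an event with encoding/json before calling the handler. Because ImagePushedAt is a time.Time, the timestamps have to be RFC 3339 strings; a value in time.Time.String() format is rejected by the decoder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Same shapes as in DeleteExpiredContainerImages.
type ImageDetail struct {
	ImageDigest   string    `json:"imageDigest"`
	ImagePushedAt time.Time `json:"imagePushedAt"`
}

type Request struct {
	RepositoryName string        `json:"repositoryName"`
	Images         []ImageDetail `json:"images"`
}

// sampleEvent is a made-up Map iteration input; the digests are shortened placeholders.
const sampleEvent = `{
  "repositoryName": "my-repository",
  "images": [
    {"imageDigest": "sha256:aaa", "imagePushedAt": "2024-01-05T10:00:00Z"},
    {"imageDigest": "sha256:bbb", "imagePushedAt": "2024-03-01T08:30:00Z"}
  ]
}`

// parseRequest decodes a JSON event into the handler's input type.
func parseRequest(payload string) (Request, error) {
	var req Request
	err := json.Unmarshal([]byte(payload), &req)
	return req, err
}

func main() {
	req, err := parseRequest(sampleEvent)
	if err != nil {
		panic(err)
	}
	for _, image := range req.Images {
		fmt.Println(req.RepositoryName, image.ImageDigest, image.ImagePushedAt.Format(time.RFC3339))
	}

	// A timestamp produced by time.Time.String() is not RFC 3339, so decoding fails.
	_, err = parseRequest(`{"images":[{"imagePushedAt":"2024-01-05 10:00:00 +0000 UTC"}]}`)
	fmt.Println("String() format rejected:", err != nil)
}
```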

Periodic Triggers

To automate this process, create an EventBridge rule with a schedule expression and set the Step Functions state machine as its target. For instance, you can set it to run weekly on Friday nights.

Example Policies

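Here are example policies showing which combinations of the two conditions can and cannot be implemented with ECR lifecycle policies:

Implementation Possible (delete if the image is older than X days or not among the latest N)

Older than X days since push | Included in latest N images? | Action
Yes | No | Delete
Yes | Yes | Delete
No | No | Delete
No | Yes | Keep

Implementation Not Possible (delete only if the image is older than X days and not among the latest N; this is what the custom solution implements)

Older than X days since push | Included in latest N images? | Action
Yes | No | Delete
Yes | Yes | Keep
No | No | Keep
No | Yes | Keep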

Results

By implementing this periodic deletion strategy, you can significantly reduce your ECR storage costs. In my experience, this approach led to substantial savings, cutting unnecessary expenses and optimizing our AWS usage.

Thank you for reading, and happy optimizing!


For more tips and insights on security and log analysis, follow me on Twitter @Siddhant_K_code and stay updated with the latest & detailed tech content like this.
