TL;DR
- Lifecycle policies that prune untagged images older than 14 days typically cut ECR storage by 50-80%.
- Multi-stage Docker builds and distroless base images shrink final images by 60-90%, reducing storage and transfer cost.
- Use pull-through cache and replication rules to avoid duplicating images across eu-west-1 and eu-central-1.
- Replace ad-hoc tagging with immutable digests for GDPR-grade audit trails on production images.
ECR storage cost optimization rarely shows up on a CTO's radar until the Amazon ECR line item crosses a few thousand euros a month. By then the registry holds tens of thousands of stale images, each pinned by a build pipeline that nobody remembers writing. European EKS teams running eu-west-1 and eu-central-1 often duplicate images across both regions for high availability, doubling the bill without adding resilience.

According to AWS ECR pricing documentation, private repository storage is billed at $0.10 per GB-month, so 1.5 TB costs roughly $150/month for storage alone, with data transfer charges layered on top for cross-region pulls.
| Build Activity | Image Size | Net Weekly Growth | Net Yearly Growth |
|---|---|---|---|
| 200 images/day | 500 MB each | ~30 GB of new layers (shared layers deduplicated) | ~1.5 TB |
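A quick sanity check on the storage line, using the table's figures (1.5 TB ≈ 1536 GB):

```bash
# Storage-only cost at $0.10 per GB-month; transfer and scanning are extra.
echo "scale=2; 1536 * 0.10" | bc   # => 153.60 USD/month, i.e. ~$150
```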
This article shows how to cut that footprint with lifecycle policies, image-layer hygiene, and regional caching strategies built for EU data-residency constraints.
Technical Overview
ECR costs come from three sources:
| Cost Source | Description | Billing |
|---|---|---|
| Storage | Total GB of image layers retained | $0.10 per GB-month |
| Data transfer | Images leaving their region (e.g., a Frankfurt cluster pulling from a Dublin registry) | Per GB transferred, on top of storage |
| Scanning | Basic vs. enhanced vulnerability scanning | Basic free; enhanced billed per image push |
According to the AWS ECR user guide, lifecycle policies evaluate repositories every 24 hours and delete images that match rules. Rules can target untagged images, tag prefixes, or `sinceImagePushed` age. Pull-through cache lets EKS nodes pull from a local ECR repository that transparently fetches upstream images from Docker Hub or Quay, caching each layer once per region instead of pulling on every node.
A well-tuned setup combines aggressive lifecycle policies on untagged development builds, conservative retention on production tags, pull-through cache for third-party images, and replication only between the regions that actually host running workloads. The result is a registry that grows with the product, not with the build count.
Step-by-Step Implementation
Start by auditing storage per repository. The AWS CLI pipeline below prints the total image size, in bytes, for each repository:
```bash
# Sum image sizes (bytes) for every repository in eu-west-1; output is one
# number per repository, in the same order as the repository list.
# NB: shared layers are counted once per image, so this over-estimates billed storage.
aws ecr describe-repositories --region eu-west-1 \
  --query 'repositories[].repositoryName' --output text | \
  xargs -n1 -I{} aws ecr describe-images --repository-name {} \
    --region eu-west-1 \
    --query 'sum(imageDetails[].imageSizeInBytes)' --output text
```
Next, apply a lifecycle policy. A balanced policy for an EKS build pipeline looks like this:
```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Retain the last 10 production tags",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["prod-", "v"],
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Expire untagged images older than 14 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 14
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 3,
      "description": "Expire dev and pr tags after 30 days",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["dev-", "pr-"],
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }
  ]
}
```
Apply the policy with `aws ecr put-lifecycle-policy --repository-name my-service --lifecycle-policy-text file://policy.json`.
Then slim the images themselves. A multi-stage Dockerfile for a Go service drops from 900 MB to 25 MB:
```dockerfile
# Build stage: compile a static binary with the full Go toolchain.
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /out/app ./cmd/api

# Runtime stage: distroless static base, only a few MB, no shell or package manager.
FROM gcr.io/distroless/static:nonroot
COPY --from=build /out/app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]
```
According to Google's distroless project documentation, distroless base images reduce both attack surface and storage footprint because they ship only the runtime dependencies of the application.
Finally, set up pull-through cache for Docker Hub upstream images. In the ECR console or Terraform, create a pull-through cache rule mapping the docker-hub prefix to public.ecr.aws/docker/library. EKS nodes then reference images as `<account-id>.dkr.ecr.eu-west-1.amazonaws.com/docker-hub/library/nginx:1.27`, and ECR fetches and caches the layers on first pull. This avoids Docker Hub rate limits and keeps image traffic within eu-west-1.
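A minimal CLI sketch of that rule, using this article's example prefix (the equivalent Terraform resource is `aws_ecr_pull_through_cache_rule`). The exact repository path under the prefix mirrors the upstream path, so verify it against your upstream before rolling out:

```bash
# Map the "docker-hub" prefix to the ECR Public mirror of Docker official images.
aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix docker-hub \
  --upstream-registry-url public.ecr.aws \
  --region eu-west-1
```

The first pull through the prefix creates the cached repository and stores its layers in eu-west-1; subsequent pulls never leave the region.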
Optimization Best Practices
Adopt image digests (`sha256:...`) in production Kubernetes manifests instead of mutable tags. Image digest benefits (a pinning sketch follows the list):
- Reproducible rollouts – the exact binary is identified by its `sha256:...` digest
- GDPR audit traceability – know exactly which binary ran on a given date
- Prevents silent updates – according to Kubernetes image documentation, mutable tags can introduce unreviewed dependencies
- Native support – Argo CD and Flux work with digests
- Migration effort – mostly a manifest-rendering change
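A hedged sketch of the pinning step, assuming a repository named my-service and a deployment/container named api (both hypothetical, as is the prod-1.4.2 tag):

```bash
# Resolve a production tag to its immutable digest...
DIGEST=$(aws ecr describe-images \
  --repository-name my-service \
  --image-ids imageTag=prod-1.4.2 \
  --region eu-west-1 \
  --query 'imageDetails[0].imageDigest' --output text)

# ...and pin the workload to the digest instead of the mutable tag.
kubectl set image deployment/my-service \
  api="<account-id>.dkr.ecr.eu-west-1.amazonaws.com/my-service@${DIGEST}"
```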
Enable enhanced scanning only on production repositories. Enhanced scanning is billed per image, so scanning every dev push inflates the registry bill without adding value. Keep basic scanning on dev repos and promote to enhanced for prod- tags. A simple EventBridge rule can mirror a prod- push from the dev registry to a dedicated prod registry, applying enhanced scanning only to the promoted image; a sketch of the rule follows.
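This sketch covers only the EventBridge match; the promotion itself is assumed to be handled by a separate target such as a Lambda function (not shown). The pattern follows the documented shape of ECR's "ECR Image Action" push event:

```bash
# Match successful pushes of prod- tags; attach a promotion target afterwards
# with aws events put-targets.
aws events put-rule \
  --name ecr-prod-promotion \
  --region eu-west-1 \
  --event-pattern '{
    "source": ["aws.ecr"],
    "detail-type": ["ECR Image Action"],
    "detail": {
      "action-type": ["PUSH"],
      "result": ["SUCCESS"],
      "image-tag": [{ "prefix": "prod-" }]
    }
  }'
```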
Consolidate replication. Many teams replicate every repository into every region out of habit, which doubles storage. Replicate only the images your EU production clusters actually run. According to AWS ECR replication documentation, repository filters let you replicate by repository-name prefix, so repositories prefixed prod- flow to eu-central-1 while dev repositories stay in eu-west-1 (sketch below).
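A sketch of that filter, assuming production repositories share the prod- name prefix and `<account-id>` stands in for your own account:

```bash
# Replicate only repositories whose names start with "prod-" to eu-central-1.
aws ecr put-replication-configuration --region eu-west-1 \
  --replication-configuration '{
    "rules": [{
      "destinations": [
        { "region": "eu-central-1", "registryId": "<account-id>" }
      ],
      "repositoryFilters": [
        { "filter": "prod-", "filterType": "PREFIX_MATCH" }
      ]
    }]
  }'
```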
Keep layer caches hot in CI. BuildKit's remote cache backed by S3 in eu-west-1 lets GitHub Actions and GitLab runners reuse base-image layers across builds, which reduces both build time and ECR storage churn (a build invocation sketch follows the table):
| Aspect | Without Remote Cache | With Remote Cache |
|---|---|---|
| Layers pushed per release | 900 MB (full image) | ~200 MB (changed layers only) |
| Build time | Longer (layers rebuilt) | Shorter (layers reused) |
| ECR storage churn | Higher | Lower |
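A sketch of the invocation, assuming an existing S3 bucket (the hypothetical my-buildkit-cache) and a buildx builder whose driver supports the S3 cache backend:

```bash
# Pull cached layers from S3, build, push the image, and refresh the cache.
docker buildx build \
  --cache-from type=s3,region=eu-west-1,bucket=my-buildkit-cache,name=my-service \
  --cache-to type=s3,region=eu-west-1,bucket=my-buildkit-cache,name=my-service,mode=max \
  --tag <account-id>.dkr.ecr.eu-west-1.amazonaws.com/my-service:prod-1.4.2 \
  --push .
```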
Lifecycle policies, pull-through cache, and distroless builds – we implement all three.
The best practices above work. But implementing them consistently across your ECR repositories requires expertise.
Our cloud cost optimization experts help you:
- Configure lifecycle policies – Preserve production tags, expire dev/untagged images
- Implement pull-through cache – Avoid Docker Hub rate limits, keep traffic within region
- Migrate to distroless images – Slash image sizes from 900 MB to 25 MB
- Set up BuildKit remote cache – Reduce ECR storage churn by 5x
Monitoring and Troubleshooting
Track ECR storage with CloudWatch metrics:
| Metric | Purpose | Alert Condition |
|---|---|---|
| RepositoryPullCount | Track image pulls | Monitor trends |
| RepositoryStorageUtilization | Track storage growth | >20% week-over-week growth without a change in deployment frequency (signals a lifecycle-policy failure or tag leak) |
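If a per-repository storage metric is not available in your account, a scheduled job can publish one from describe-images. A hedged sketch; the Custom/ECR namespace and Repository dimension are illustrative names:

```bash
# Publish each repository's summed image size as a custom CloudWatch metric.
for repo in $(aws ecr describe-repositories --region eu-west-1 \
    --query 'repositories[].repositoryName' --output text); do
  bytes=$(aws ecr describe-images --repository-name "$repo" --region eu-west-1 \
    --query 'sum(imageDetails[].imageSizeInBytes)' --output text)
  aws cloudwatch put-metric-data --region eu-west-1 \
    --namespace Custom/ECR \
    --metric-name RepositorySizeBytes \
    --dimensions Repository="$repo" \
    --value "$bytes" --unit Bytes
done
```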
If lifecycle rules delete more than expected, check the rule priority order:
- ECR evaluates rules top down and stops at the first rule that matches a given image
- A broad `tagStatus: any` rule above a narrow `tagPrefixList` rule will override it
- Use the lifecycle policy preview API to dry-run rules before applying them (sketch below)
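A dry-run sketch with the preview API, reusing this article's example repository and policy file:

```bash
# Evaluate the policy without deleting anything.
aws ecr start-lifecycle-policy-preview \
  --repository-name my-service \
  --lifecycle-policy-text file://policy.json \
  --region eu-west-1

# Poll for results; previewResults lists the images the policy WOULD expire.
aws ecr get-lifecycle-policy-preview \
  --repository-name my-service \
  --region eu-west-1 \
  --query '{status: status, wouldExpire: previewResults[].imageTags}'
```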
Conclusion
ECR storage cost optimization is a quick win that pays back within a week. Lifecycle policies, distroless multi-stage builds, immutable digests, and regional pull-through cache together trim ECR bills by 50-80% while tightening audit posture for GDPR-regulated workloads in eu-west-1 and eu-central-1.
EaseCloud designs registry-hygiene automation for European EKS teams, from Terraform-managed lifecycle policies to distroless build pipelines. Talk to EaseCloud to baseline your ECR spend and plan a cleanup roadmap.
Frequently Asked Questions
How do lifecycle policies handle images referenced by running pods?
Lifecycle policies delete based on age and tag rules, not on whether an image is in use. Protect production tags with a high `countNumber` retention and pin deployments to digests so running pods survive registry cleanup.
Should we use ECR Public for open-source images?
ECR Public is ideal for images you distribute externally. For internal use, keep images in private ECR and apply lifecycle policies; ECR Public has different pricing and retention semantics.
Does pull-through cache work with gated images?
Pull-through cache supports authenticated upstreams such as Docker Hub paid accounts and Quay. Configure the upstream credentials in AWS Secrets Manager and reference them in the cache rule.
