DEV Community

POTHURAJU JAYAKRISHNA YADAV

Automating S3 Data Retention by Prefix Using Python (DevOps Automation Story)

In many production systems, object storage slowly becomes a dumping ground.

Logs, reports, media files, exports — everything lands in the same S3 bucket, but not all data needs to live forever. The real challenge is not deleting data, but deleting the right data at the right time, without relying on manual processes.

This post explains how I automated folder-level retention policies in Amazon S3 using Python, turning a repetitive console task into a clean, auditable automation.

The scenario (fictional but realistic)

Imagine a shared S3 bucket used by multiple internal teams:

company-shared-storage/
├── analytics/
├── audit-logs/
├── temp-exports/
├── media-processing/
└── legal-archive/

Each folder has a different data lifecycle requirement:

| Prefix | Retention |
| --- | --- |
| analytics/ | 3 months |
| audit-logs/ | 6 months |
| temp-exports/ | 30 days |
| media-processing/ | 90 days |
| legal-archive/ | retain forever |

Managing this manually in the AWS Console does not scale and is easy to misconfigure.

Why automation was necessary

Manually configuring lifecycle rules:

  • Requires repetitive console work
  • Is difficult to review and audit
  • Does not fit infrastructure-as-code practices
  • Is error-prone during changes

So instead of treating retention as a one-time setup, I treated it as automation logic.

Important constraints to understand first

Before writing any automation, it’s critical to understand how S3 behaves:

  • S3 lifecycle expiration is based on object creation time
  • It does not track last access or last modification
  • S3 has no real folders — only prefixes
  • Lifecycle rules delete objects, not folders
  • Buckets without versioning only need “expire current versions”

Designing automation without knowing this leads to surprises later.
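The "prefixes, not folders" point is worth internalizing: a key like `analytics/2024/report.csv` is one flat string, and the console's folder view is just a grouping by prefix. A small illustration with a hypothetical `group_by_prefix` helper (no AWS calls involved):

```python
# Illustration: S3 "folders" are just shared key prefixes.
# Grouping keys by their first path segment reproduces the
# folder view the console shows.
def group_by_prefix(keys):
    groups = {}
    for key in keys:
        prefix = key.split("/", 1)[0] + "/" if "/" in key else ""
        groups.setdefault(prefix, []).append(key)
    return groups

keys = [
    "analytics/2024/report.csv",
    "analytics/2024/summary.csv",
    "temp-exports/dump.json",
]
print(group_by_prefix(keys))
# → {'analytics/': [...], 'temp-exports/': [...]}
```

This is also why a lifecycle rule with `"Filter": {"Prefix": "analytics/"}` matches every object whose key starts with that string — there is no folder object to delete or preserve.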

Automation design approach

The goal was simple:

  • Define retention rules as data
  • Convert retention into lifecycle policies programmatically
  • Skip prefixes that should never be touched
  • Make the process repeatable and reviewable

Retention rules were expressed in a simple mapping structure, not hardcoded logic.

Python automation example

import boto3

bucket_name = "company-shared-storage"
profile = "prod-profile"

# Retention rules expressed as data: days to keep per prefix.
# None means "retain forever" — no lifecycle rule is created.
retention_map = {
    "analytics/": 90,
    "audit-logs/": 180,
    "temp-exports/": 30,
    "media-processing/": 90,
    "legal-archive/": None
}

session = boto3.Session(profile_name=profile)
s3 = session.client("s3")

rules = []

for prefix, days in retention_map.items():
    # Explicitly skip prefixes that must never expire.
    if days is None:
        continue

    rules.append({
        "ID": f"expire-{prefix.strip('/')}-{days}-days",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Expiration": {"Days": days}
    })

if rules:
    # Note: this call replaces the bucket's entire lifecycle
    # configuration with the generated rules.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration={"Rules": rules}
    )

What this automation achieves

  • Objects under analytics/ are deleted after 90 days
  • Objects under audit-logs/ are deleted after 180 days
  • Objects under legal-archive/ are never deleted
  • Overwriting an object resets its expiration timer
  • Lifecycle execution is handled asynchronously by S3

No human intervention is required after setup.
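Since the generated rules are plain data, they are easy to sanity-check before applying. A minimal sketch with a hypothetical `validate_rules` helper (the rule shape matches what `put_bucket_lifecycle_configuration` expects above):

```python
# Sanity-check generated lifecycle rules before applying them.
# Returns a list of human-readable problems; empty means OK.
def validate_rules(rules):
    errors = []
    ids = [r.get("ID") for r in rules]
    if len(ids) != len(set(ids)):
        errors.append("duplicate rule IDs")
    for r in rules:
        days = r.get("Expiration", {}).get("Days")
        if not isinstance(days, int) or days < 1:
            errors.append(f"rule {r.get('ID')}: Days must be a positive integer")
        if r.get("Status") not in ("Enabled", "Disabled"):
            errors.append(f"rule {r.get('ID')}: invalid Status")
    return errors
```

Running this (and printing the rules) before the `put_bucket_lifecycle_configuration` call gives a cheap dry-run step, which is useful when the script runs from CI.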

Why this is real automation

This approach:

  • Removes manual AWS Console dependency
  • Makes retention policy code-driven
  • Allows easy updates through version control
  • Reduces the risk of accidental data loss
  • Aligns with Infrastructure-as-Code principles

Retention stops being “someone’s responsibility” and becomes system behavior.

Key lessons learned

  • Lifecycle policies are based on creation time, not inactivity
  • Prefix-based retention works well when automation-driven
  • Empty retention should explicitly mean “no rule”
  • Lifecycle rules overwrite existing configuration — automation must be intentional
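That last lesson deserves code: because `put_bucket_lifecycle_configuration` replaces the whole configuration, any rules managed outside this script would be silently dropped. One way to stay safe is a merge step — a sketch with a hypothetical `merge_rules` helper that keeps rules whose IDs this automation does not own:

```python
# put_bucket_lifecycle_configuration replaces the bucket's entire
# lifecycle configuration, so rules managed elsewhere must be
# preserved. Keep any existing rule whose ID does not carry this
# automation's prefix, then append the freshly generated rules.
def merge_rules(existing, generated, managed_prefix="expire-"):
    kept = [r for r in existing if not r.get("ID", "").startswith(managed_prefix)]
    return kept + generated
```

The existing rules can be fetched with `get_bucket_lifecycle_configuration` (wrapped in error handling, since the call fails with `NoSuchLifecycleConfiguration` when the bucket has no rules yet) and passed in as `existing`.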

Where this can be extended

This automation can easily evolve into:

  • Reading retention from a CSV or database
  • A scheduled compliance check
  • Validation against existing lifecycle rules
  • Integration into CI/CD pipelines
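The CSV extension is a small step. A sketch of a hypothetical `load_retention_map` — the column names (`prefix`, `retention_days`) are assumptions, and an empty `retention_days` keeps the "retain forever" convention from the mapping above:

```python
import csv
import io

# Sketch: load the retention map from CSV instead of hardcoding it.
# An empty retention_days cell means "retain forever" (no rule).
def load_retention_map(csv_text):
    retention = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        days = row["retention_days"].strip()
        retention[row["prefix"]] = int(days) if days else None
    return retention

sample = """prefix,retention_days
analytics/,90
legal-archive/,
"""
print(load_retention_map(sample))
# → {'analytics/': 90, 'legal-archive/': None}
```

With this in place, retention changes become a reviewed CSV diff in version control rather than a code change.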

Final takeaway

S3 lifecycle policies are powerful, but their real value comes when they are automated, not manually configured.

Small automations like this reduce operational overhead and prevent long-term storage sprawl — which is exactly what good DevOps is about.