DEV Community

POTHURAJU JAYAKRISHNA YADAV

Automating S3 Data Retention by Prefix Using Python (DevOps Automation Story)

In many production systems, object storage slowly becomes a dumping ground.

Logs, reports, media files, exports — everything lands in the same S3 bucket, but not all data needs to live forever. The real challenge is not deleting data, but deleting the right data at the right time, without relying on manual processes.

This post explains how I automated folder-level retention policies in Amazon S3 using Python, turning a repetitive console task into a clean, auditable automation.

The scenario (fictional but realistic)

Imagine a shared S3 bucket used by multiple internal teams:

company-shared-storage/
├── analytics/
├── audit-logs/
├── temp-exports/
├── media-processing/
└── legal-archive/

Each folder has a different data lifecycle requirement:

| Prefix | Retention |
| --- | --- |
| analytics/ | 3 months |
| audit-logs/ | 6 months |
| temp-exports/ | 30 days |
| media-processing/ | 90 days |
| legal-archive/ | retain forever |

Managing this manually in the AWS Console does not scale and is easy to misconfigure.

Why automation was necessary

Manually configuring lifecycle rules:

  • Requires repetitive console work
  • Is difficult to review and audit
  • Does not fit infrastructure-as-code practices
  • Is error-prone during changes

So instead of treating retention as a one-time setup, I treated it as automation logic.

Important constraints to understand first

Before writing any automation, it’s critical to understand how S3 behaves:

  • S3 lifecycle expiration is based on object creation time
  • It does not track last access or last modification
  • S3 has no real folders — only prefixes
  • Lifecycle rules delete objects, not folders
  • Buckets without versioning only need “expire current versions”

Designing automation without knowing this leads to surprises later.
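The "prefixes, not folders" point is worth internalizing: a key like `analytics/2024/report.csv` is one flat string, and the console's folder view is just a grouping by prefix. A small illustration with a hypothetical `group_by_prefix` helper (no AWS calls involved):

```python
# Illustration: S3 "folders" are just shared key prefixes.
# Grouping keys by their first path segment reproduces the
# folder view the console shows.
def group_by_prefix(keys):
    groups = {}
    for key in keys:
        prefix = key.split("/", 1)[0] + "/" if "/" in key else ""
        groups.setdefault(prefix, []).append(key)
    return groups

keys = [
    "analytics/2024/report.csv",
    "analytics/2024/summary.csv",
    "temp-exports/dump.json",
]
print(group_by_prefix(keys))
# → {'analytics/': [...], 'temp-exports/': [...]}
```

This is also why a lifecycle rule with `"Filter": {"Prefix": "analytics/"}` matches every object whose key starts with that string — there is no folder object to delete or preserve.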

Automation design approach

The goal was simple:

  • Define retention rules as data
  • Convert retention into lifecycle policies programmatically
  • Skip prefixes that should never be touched
  • Make the process repeatable and reviewable

Retention rules were expressed in a simple mapping structure, not hardcoded logic.

Python automation example

import boto3

bucket_name = "company-shared-storage"
profile = "prod-profile"

# Retention rules expressed as data: days to keep per prefix.
# None means "retain forever" — no lifecycle rule is created.
retention_map = {
    "analytics/": 90,
    "audit-logs/": 180,
    "temp-exports/": 30,
    "media-processing/": 90,
    "legal-archive/": None
}

session = boto3.Session(profile_name=profile)
s3 = session.client("s3")

rules = []

for prefix, days in retention_map.items():
    # Explicitly skip prefixes that must never expire.
    if days is None:
        continue

    rules.append({
        "ID": f"expire-{prefix.strip('/')}-{days}-days",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Expiration": {"Days": days}
    })

if rules:
    # Note: this call replaces the bucket's entire lifecycle
    # configuration with the generated rules.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration={"Rules": rules}
    )

What this automation achieves

  • Objects under analytics/ are deleted after 90 days
  • Objects under audit-logs/ are deleted after 180 days
  • Objects under legal-archive/ are never deleted
  • Overwriting an object resets its expiration timer
  • Lifecycle execution is handled asynchronously by S3

No human intervention is required after setup.
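Since the generated rules are plain data, they are easy to sanity-check before applying. A minimal sketch with a hypothetical `validate_rules` helper (the rule shape matches what `put_bucket_lifecycle_configuration` expects above):

```python
# Sanity-check generated lifecycle rules before applying them.
# Returns a list of human-readable problems; empty means OK.
def validate_rules(rules):
    errors = []
    ids = [r.get("ID") for r in rules]
    if len(ids) != len(set(ids)):
        errors.append("duplicate rule IDs")
    for r in rules:
        days = r.get("Expiration", {}).get("Days")
        if not isinstance(days, int) or days < 1:
            errors.append(f"rule {r.get('ID')}: Days must be a positive integer")
        if r.get("Status") not in ("Enabled", "Disabled"):
            errors.append(f"rule {r.get('ID')}: invalid Status")
    return errors
```

Running this (and printing the rules) before the `put_bucket_lifecycle_configuration` call gives a cheap dry-run step, which is useful when the script runs from CI.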

Why this is real automation

This approach:

  • Removes manual AWS Console dependency
  • Makes retention policy code-driven
  • Allows easy updates through version control
  • Reduces the risk of accidental data loss
  • Aligns with Infrastructure-as-Code principles

Retention stops being “someone’s responsibility” and becomes system behavior.

Key lessons learned

  • Lifecycle policies are based on creation time, not inactivity
  • Prefix-based retention works well when automation-driven
  • Empty retention should explicitly mean “no rule”
  • Lifecycle rules overwrite existing configuration — automation must be intentional
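That last lesson deserves code: because `put_bucket_lifecycle_configuration` replaces the whole configuration, any rules managed outside this script would be silently dropped. One way to stay safe is a merge step — a sketch with a hypothetical `merge_rules` helper that keeps rules whose IDs this automation does not own:

```python
# put_bucket_lifecycle_configuration replaces the bucket's entire
# lifecycle configuration, so rules managed elsewhere must be
# preserved. Keep any existing rule whose ID does not carry this
# automation's prefix, then append the freshly generated rules.
def merge_rules(existing, generated, managed_prefix="expire-"):
    kept = [r for r in existing if not r.get("ID", "").startswith(managed_prefix)]
    return kept + generated
```

The existing rules can be fetched with `get_bucket_lifecycle_configuration` (wrapped in error handling, since the call fails with `NoSuchLifecycleConfiguration` when the bucket has no rules yet) and passed in as `existing`.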

Where this can be extended

This automation can easily evolve into:

  • Reading retention from a CSV or database
  • A scheduled compliance check
  • Validation against existing lifecycle rules
  • Integration into CI/CD pipelines
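The CSV extension is a small step. A sketch of a hypothetical `load_retention_map` — the column names (`prefix`, `retention_days`) are assumptions, and an empty `retention_days` keeps the "retain forever" convention from the mapping above:

```python
import csv
import io

# Sketch: load the retention map from CSV instead of hardcoding it.
# An empty retention_days cell means "retain forever" (no rule).
def load_retention_map(csv_text):
    retention = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        days = row["retention_days"].strip()
        retention[row["prefix"]] = int(days) if days else None
    return retention

sample = """prefix,retention_days
analytics/,90
legal-archive/,
"""
print(load_retention_map(sample))
# → {'analytics/': 90, 'legal-archive/': None}
```

With this in place, retention changes become a reviewed CSV diff in version control rather than a code change.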

Final takeaway

S3 lifecycle policies are powerful, but their real value comes when they are automated, not manually configured.

Small automations like this reduce operational overhead and prevent long-term storage sprawl — which is exactly what good DevOps is about.