In many production systems, object storage slowly becomes a dumping ground.
Logs, reports, media files, exports — everything lands in the same S3 bucket, but not all data needs to live forever. The real challenge is not deleting data, but deleting the right data at the right time, without relying on manual processes.
This post explains how I automated folder-level retention policies in Amazon S3 using Python, turning a repetitive console task into a clean, auditable automation.
The scenario (fictional but realistic)
Imagine a shared S3 bucket used by multiple internal teams:
company-shared-storage/
├── analytics/
├── audit-logs/
├── temp-exports/
├── media-processing/
└── legal-archive/
Each folder has a different data lifecycle requirement:
| Prefix | Retention |
| --- | --- |
| analytics/ | 3 months |
| audit-logs/ | 6 months |
| temp-exports/ | 30 days |
| media-processing/ | 90 days |
| legal-archive/ | retain forever |
Managing this manually in the AWS Console does not scale and is easy to misconfigure.
Why automation was necessary
Manually configuring lifecycle rules:
- Requires repetitive console work
- Is difficult to review and audit
- Does not fit infrastructure-as-code practices
- Is error-prone during changes
So instead of treating retention as a one-time setup, I treated it as automation logic.
Important constraints to understand first
Before writing any automation, it’s critical to understand how S3 behaves:
- S3 lifecycle expiration is based on object creation time
- It does not track last access or last modification
- S3 has no real folders — only prefixes
- Lifecycle rules delete objects, not folders
- Buckets without versioning only need “expire current versions”
Designing automation without knowing this leads to surprises later.
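As a quick sanity check before relying on expiration-only rules, you can confirm the bucket's versioning status with boto3. This is a minimal sketch, assuming the same bucket and a prod-profile credential profile as in the scenario above:

import boto3

session = boto3.Session(profile_name="prod-profile")
s3 = session.client("s3")

# "Status" is absent if versioning was never enabled on the bucket.
resp = s3.get_bucket_versioning(Bucket="company-shared-storage")
status = resp.get("Status", "Disabled")  # "Enabled", "Suspended", or "Disabled"
print(f"Versioning status: {status}")

If the bucket turns out to be versioned, expiring current versions alone is not enough; noncurrent versions need their own expiration rules.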
Automation design approach
The goal was simple:
- Define retention rules as data
- Convert retention into lifecycle policies programmatically
- Skip prefixes that should never be touched
- Make the process repeatable and reviewable
Retention rules were expressed in a simple mapping structure, not hardcoded logic.
Python automation example
import boto3

bucket_name = "company-shared-storage"
profile = "prod-profile"

# Retention in days per prefix; None means "never expire".
retention_map = {
    "analytics/": 90,
    "audit-logs/": 180,
    "temp-exports/": 30,
    "media-processing/": 90,
    "legal-archive/": None,
}

session = boto3.Session(profile_name=profile)
s3 = session.client("s3")

rules = []
for prefix, days in retention_map.items():
    if days is None:
        # Protected prefixes get no lifecycle rule at all.
        continue
    rules.append({
        "ID": f"expire-{prefix.strip('/')}-{days}-days",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Expiration": {"Days": days},
    })

if rules:
    # Note: this replaces the bucket's entire lifecycle configuration.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration={"Rules": rules},
    )
What this automation achieves
- Objects under analytics/ are deleted after 90 days
- Objects under audit-logs/ are deleted after 180 days
- Objects under legal-archive/ are never deleted
- Overwriting an object resets its expiration timer
- Lifecycle execution is handled asynchronously by S3

No human intervention required after setup.
Why this is real automation
This approach:
- Removes manual AWS Console dependency
- Makes retention policy code-driven
- Allows easy updates through version control
- Reduces the risk of accidental data loss
- Aligns with Infrastructure-as-Code principles
Retention stops being “someone’s responsibility” and becomes system behavior.
Key lessons learned
- Lifecycle policies are based on creation time, not inactivity
- Prefix-based retention works well when automation-driven
- Empty retention should explicitly mean “no rule”
- Lifecycle rules replace the bucket's entire existing configuration, so automation must be intentional (see the merge sketch below)
Where this can be extended
This automation can easily evolve into:
- Reading retention from a CSV or database (see the sketch after this list)
- A scheduled compliance check
- Validation against existing lifecycle rules
- Integration into CI/CD pipelines
Final takeaway
S3 lifecycle policies are powerful, but their real value comes when they are automated, not manually configured.
Small automations like this reduce operational overhead and prevent long-term storage sprawl — which is exactly what good DevOps is about.