Maksim Skutin for AWS Community Builders

Posted on • Originally published at log.skut.in

Efficiently Deleting Millions of Objects in Amazon S3 Using Lifecycle Policy

Picture this: You have millions of objects sitting in Amazon S3, and you need to delete the bulk of them without blowing up your budget or crippling your applications. It's a scenario that's all too common, yet S3 continues to surprise us with quirks and unexpected behaviors. In a podcast, Corey Quinn and Daniel Grzelak (Chief Innovation Officer at Plerion) took a deep dive into what they call the "wild and wonderful world of S3": from "Schrodinger's Objects" created by incomplete uploads, to the head-scratching differences between the S3 CLI subcommands and the S3 API, to a few "historical oddities" that S3 has carried around since its inception.

Security misconfigurations remain the biggest threat to data stored in S3, whether it's giving overly broad permissions, mishandling encryption settings, or misunderstanding how compliance locks and IAM policies actually work. The takeaway? Many of us rely on S3 and think we know how it behaves until something weird and costly happens.

That's why automating your deletions with Lifecycle Policies isn't just a convenience, but a safeguard against everything from runaway billing due to incomplete multipart uploads to lingering (and accidentally exposed) old data. In this post, we'll walk through creating and customizing Lifecycle Policies to clean up your data at scale:

  • Addressing "Schrodinger's Objects"
    • Automate cleanups of incomplete multipart uploads to avoid invisible storage costs.
  • Enforcing Permission Boundaries
    • Ensure that your lifecycle rules align with the correct IAM and encryption settings; don't rely on default assumptions.
  • Navigating Oddities & Legacy Features
    • Be aware of S3's historical quirks (like ACLs) and how they might affect your lifecycle rules.
  • Managing Compliance Locks & Expired Object Markers
    • Set your retention periods before legal or regulatory deadlines creep up.

Ready to dive in? Let's explore how to set up a robust Lifecycle Policy to keep your S3 buckets in check so you can stay focused on growth instead of babysitting stale or incomplete data.


Understanding S3 Lifecycle Policies

S3 Lifecycle Policies are rules you apply to a bucket (or specific objects within a bucket) to automatically manage objects throughout their lifecycle. This includes transitioning objects to different storage classes and expiring (deleting) objects based on age or other criteria.

Key Capabilities

  1. Delete Objects After a Specified Time: For instance, remove logs older than 30 days.
  2. Transition Objects Between Storage Classes: Optimize costs by moving infrequently accessed data to cheaper storage like S3 Glacier.
  3. Clean Up Incomplete Multipart Uploads: Prevent "invisible" storage charges from partial uploads.
  4. Remove Expired Object Delete Markers: Maintain clarity in versioned buckets by purging old delete markers.

Why It Matters

  • Scalability: Automated rules can handle vast numbers of objects without overloading your applications or budgets.
  • Cost-Efficiency: Lifecycle expirations carry no request charges, and they spare you the LIST requests, compute, and scripting that a manual DeleteObject sweep needs to enumerate millions of keys.
  • Risk Reduction: Proper lifecycle rules prevent accidental data hoarding (e.g., old logs, incomplete uploads), keeping operational costs in check.

Implementing a Purge Policy

Below is an example lifecycle policy that targets mass deletion and cleans up versions and multipart uploads:

{
  "Rules": [
    {
      "Expiration": {
        "Days": 1
      },
      "ID": "FullDelete",
      "Filter": {
        "Prefix": ""
      },
      "Status": "Enabled",
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 1
      },
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 1
      }
    },
    {
      "Expiration": {
        "ExpiredObjectDeleteMarker": true
      },
      "ID": "DeleteMarkers",
      "Filter": {
        "Prefix": ""
      },
      "Status": "Enabled"
    }
  ]
}

Let's break down the key components:

Rule 1: FullDelete

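  • Expires current objects after 1 day ("Days": 1); in a versioned bucket this adds a delete marker and makes the object noncurrent rather than removing it outright
  • Permanently removes noncurrent versions 1 day after they become noncurrent ("NoncurrentDays": 1)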
  • Cleans up incomplete multipart uploads after 1 day
  • Applies to all objects (empty prefix)

Rule 2: DeleteMarkers

  • Removes expired object delete markers, i.e. delete markers with no remaining versions behind them
  • Helps clean up versioned buckets
  • Prevents accumulation of unnecessary markers

By combining these two rules, you ensure you're regularly purging data, preventing extra charges from orphaned uploads, and avoiding clutter from versioned deletes.
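Before applying a policy like this, it can help to list which actions each rule actually declares. The following is a minimal sketch in Python; the helper name `summarize_rules` and the `ACTION_KEYS` tuple are illustrative names introduced here, not part of any AWS SDK.

```python
import json

# Top-level keys of a lifecycle rule that describe an action (not metadata).
ACTION_KEYS = (
    "Expiration",
    "Transitions",
    "NoncurrentVersionExpiration",
    "NoncurrentVersionTransitions",
    "AbortIncompleteMultipartUpload",
)

# The purge policy from above, embedded so the sketch is self-contained.
PURGE_POLICY = json.loads("""
{
  "Rules": [
    {
      "Expiration": {"Days": 1},
      "ID": "FullDelete",
      "Filter": {"Prefix": ""},
      "Status": "Enabled",
      "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1}
    },
    {
      "Expiration": {"ExpiredObjectDeleteMarker": true},
      "ID": "DeleteMarkers",
      "Filter": {"Prefix": ""},
      "Status": "Enabled"
    }
  ]
}
""")


def summarize_rules(policy):
    """Map each enabled rule ID to the list of actions that rule declares."""
    summary = {}
    for rule in policy.get("Rules", []):
        if rule.get("Status") != "Enabled":
            continue  # disabled rules are stored but never executed
        summary[rule["ID"]] = [key for key in ACTION_KEYS if key in rule]
    return summary


for rule_id, actions in summarize_rules(PURGE_POLICY).items():
    print(rule_id, "->", ", ".join(actions))
```

A quick summary like this makes it easier to spot a rule that silently lacks the action you expected, such as a missing multipart cleanup.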

Benefits of Using Lifecycle Policies for Mass Deletion

  • Cost-Effective – Lifecycle expirations carry no request charges and need no LIST calls to find the objects
  • Scalable – Handles millions of objects without overwhelming your applications
  • Automated – Set-and-forget approach
  • Version-Aware – Properly handles versioned buckets
  • Comprehensive – Cleans up incomplete multipart uploads

Implementation Steps

  • Save your lifecycle policy JSON in a file (e.g., purge-bucket-policy.json).
  • Apply it to the bucket. This call replaces any lifecycle configuration already on the bucket, so include existing rules you want to keep:
aws s3api put-bucket-lifecycle-configuration \
        --bucket YOUR-BUCKET-NAME \
        --lifecycle-configuration file://purge-bucket-policy.json
  • Monitor Progress
    • S3 Inventory Reports: Track object counts and see if stale data is decreasing over time
    • CloudWatch Metrics: Observe S3 metrics like NumberOfObjects and BucketSizeBytes
  • Refine & Iterate
    • Tweak the policy if deletions happen too soon or if objects need to be retained longer for compliance
    • Consider partial transitions to cheaper storage classes before final deletion to optimize costs.
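The apply step can also be scripted with a read-back check. The sketch below assumes a boto3-style S3 client is passed in (its `put_bucket_lifecycle_configuration` and `get_bucket_lifecycle_configuration` methods are the ones called); `apply_lifecycle` and the in-memory `FakeS3` stand-in are hypothetical names added here so the example runs without AWS credentials.

```python
def apply_lifecycle(s3_client, bucket, policy):
    """Apply a lifecycle policy, then read it back and return the rule IDs in force.

    Note: the put call replaces the bucket's entire lifecycle configuration.
    """
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=policy,
    )
    current = s3_client.get_bucket_lifecycle_configuration(Bucket=bucket)
    return [rule.get("ID") for rule in current.get("Rules", [])]


class FakeS3:
    """In-memory stand-in (an assumption for this sketch) mimicking the two calls used."""

    def __init__(self):
        self.configs = {}

    def put_bucket_lifecycle_configuration(self, Bucket, LifecycleConfiguration):
        self.configs[Bucket] = LifecycleConfiguration

    def get_bucket_lifecycle_configuration(self, Bucket):
        return self.configs[Bucket]


policy = {
    "Rules": [
        {
            "ID": "FullDelete",
            "Filter": {"Prefix": ""},
            "Status": "Enabled",
            "Expiration": {"Days": 1},
        }
    ]
}

# Dry run against the stand-in; with real credentials you would pass
# boto3.client("s3") instead of FakeS3().
print(apply_lifecycle(FakeS3(), "YOUR-BUCKET-NAME", policy))
```

Passing the client in as a parameter keeps the function testable; in production the same function would receive a real S3 client.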

Early Deletion Fees

Mass deletion with policies is easy, but keep minimum storage durations in mind, especially once you add the storage class transitions shown in the next section. S3 Glacier Deep Archive has a minimum storage duration of 180 days. You can still delete or transition an object earlier, but Amazon then bills a pro-rated early deletion fee equal to the storage cost for the days remaining. In effect, whether you delete an object after 1 day or 179 days, you pay for the full 180 days of storage. This protects AWS's pricing model and keeps archive classes cost-effective for truly long-term storage.
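To make the early deletion arithmetic concrete, here is a small worked sketch. The price constant is a placeholder assumption for illustration only (check current AWS pricing for your region), and the 30-day month is a simplification; the function names are hypothetical.

```python
DAYS_PER_MONTH = 30               # simplification for the illustration
ASSUMED_PRICE_PER_GB_MONTH = 0.001  # placeholder, NOT an official AWS price


def early_deletion_fee(size_gb, days_stored, min_days, price_per_gb_month):
    """Pro-rated charge for the days remaining in the minimum storage duration."""
    remaining_days = max(min_days - days_stored, 0)
    return size_gb * price_per_gb_month * remaining_days / DAYS_PER_MONTH


def total_cost(size_gb, days_stored, min_days, price_per_gb_month):
    """Storage actually used plus any early deletion fee."""
    storage = size_gb * price_per_gb_month * days_stored / DAYS_PER_MONTH
    fee = early_deletion_fee(size_gb, days_stored, min_days, price_per_gb_month)
    return storage + fee


# 1 TB (1000 GB) in a class with a 180-day minimum, deleted at different ages.
for days in (1, 179, 180, 365):
    cost = total_cost(1000, days, 180, ASSUMED_PRICE_PER_GB_MONTH)
    print(f"deleted after {days:>3} days: total {cost:.2f} USD")
```

Under these assumptions the total is identical for any deletion before day 180 and only starts growing once the minimum duration has passed.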

Additional S3 Lifecycle Policy Features

Below is an enhanced snippet that shows how you can apply more granular filters and transitions:

{
  "Rules": [
    {
      "ID": "FullDelete",
      "Filter": {
        "And": {
          "Prefix": "temp/",
          "Tags": [
            {
              "Key": "temporary",
              "Value": "true"
            }
          ],
          "ObjectSizeGreaterThan": 5242880
        }
      },
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 60,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 90
      },
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 30,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 60
      }
    },
    {
      "ID": "DeleteMarkers",
      "Filter": {
        "Prefix": ""
      },
      "Status": "Enabled",
      "Expiration": {
        "ExpiredObjectDeleteMarker": true
      },
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 1
      }
    }
  ]
}


Key Additions

  • Complex Filtering
    • And operator to combine multiple conditions
    • Tag-Based: Target objects tagged temporary=true
    • Prefix: Only apply to objects in the temp/ folder
    • Object Size Filtering: only handle files larger than 5 MB
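  • Storage Class Transitions
    • Move data from Standard to STANDARD_IA after 30 days, then to GLACIER after 60 days.
    • Can drastically reduce storage costs if data is rarely accessed
    • Watch the minimum storage durations: with expiration at day 90, objects spend only 30 days in GLACIER, which has a 90-day minimum, so the remaining days are billed as an early deletion fee.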
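  • Versioning Management
    • Transition noncurrent versions to STANDARD_IA before expiring them. S3 only accepts a transition to STANDARD_IA for versions that have been noncurrent for at least 30 days.
"NoncurrentVersionTransitions": [
    {
      "NoncurrentDays": 30,
      "StorageClass": "STANDARD_IA"
    }
]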
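The And filter described above can be reasoned about with a small local simulation. This is a sketch of the matching semantics as described in this post (all conditions must hold, size comparison is strict), not S3's actual implementation; `matches_filter` is a hypothetical helper.

```python
def matches_filter(lifecycle_filter, key, tags, size_bytes):
    """Return True if an object would be selected by a lifecycle Filter.

    lifecycle_filter: the "Filter" dict from a rule (with or without "And").
    key: object key; tags: dict of the object's tags; size_bytes: object size.
    """
    # With "And", conditions live one level down; otherwise at the top level.
    conditions = lifecycle_filter.get("And", lifecycle_filter)

    if not key.startswith(conditions.get("Prefix", "")):
        return False

    wanted_tags = list(conditions.get("Tags", []))
    if "Tag" in conditions:  # single-tag form used outside "And"
        wanted_tags.append(conditions["Tag"])
    for tag in wanted_tags:
        if tags.get(tag["Key"]) != tag["Value"]:
            return False

    min_size = conditions.get("ObjectSizeGreaterThan")
    if min_size is not None and not size_bytes > min_size:
        return False
    max_size = conditions.get("ObjectSizeLessThan")
    if max_size is not None and not size_bytes < max_size:
        return False
    return True


TEMP_FILTER = {
    "And": {
        "Prefix": "temp/",
        "Tags": [{"Key": "temporary", "Value": "true"}],
        "ObjectSizeGreaterThan": 5242880,
    }
}

print(matches_filter(TEMP_FILTER, "temp/big.bin", {"temporary": "true"}, 6 * 1024 * 1024))
print(matches_filter(TEMP_FILTER, "logs/big.bin", {"temporary": "true"}, 6 * 1024 * 1024))
```

Running a sample of real keys, tags, and sizes through a check like this before enabling a rule is a cheap way to confirm the filter selects what you intend.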

Additional Policy Options

  • Minimum Object Size Threshold (5242880 bytes = 5 MB; JSON has no comment syntax, so keep unit notes outside the policy file):
"Filter": {
    "ObjectSizeGreaterThan": 5242880
}
  • Date-Based Expiration (an ISO 8601 timestamp at midnight UTC; once the date passes, every object the rule matches is expired, including ones uploaded later):
"Expiration": {
    "Date": "2024-12-31T00:00:00.000Z"
}
  • Multiple Tag Conditions:
"Filter": {
    "And": {
      "Tags": [
        {
          "Key": "environment",
          "Value": "test"
        },
        {
          "Key": "temporary",
          "Value": "true"
        }
      ]
    }
}
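For date-based expiration, the timestamp format is easy to get wrong by hand. A minimal sketch of a helper that produces the same shape as the example above; `expiration_date` is a hypothetical name, and the output format simply mirrors the string used in this post.

```python
from datetime import date, datetime, timezone


def expiration_date(day):
    """Format a calendar date as a midnight-UTC timestamp for "Expiration.Date"."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return midnight.strftime("%Y-%m-%dT%H:%M:%S.000Z")


# Build the Expiration fragment for a chosen cut-off date.
expiration = {"Date": expiration_date(date(2024, 12, 31))}
print(expiration)
```

Generating the value from a `date` object avoids accidentally supplying a non-midnight time or a local timezone offset.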

Advanced Use Cases

  • Regulatory Compliance: move to GLACIER after a year and expire after 2555 days (about 7 years)
{
  "ID": "ComplianceRule",
  "Filter": {
    "And": {
      "Prefix": "compliance/",
      "Tags": [
        {
          "Key": "confidential",
          "Value": "true"
        }
      ]
    }
  },
  "Status": "Enabled",
  "Transitions": [
    {
      "Days": 365,
      "StorageClass": "GLACIER"
    }
  ],
  "Expiration": {
    "Days": 2555
  }
}
  • Cost Optimization: move log objects larger than 104857600 bytes (100 MB) to INTELLIGENT_TIERING after 30 days
{
  "ID": "CostOptimization",
  "Filter": {
    "And": {
      "Prefix": "logs/",
      "ObjectSizeGreaterThan": 104857600
    }
  },
  "Status": "Enabled",
  "Transitions": [
    {
      "Days": 30,
      "StorageClass": "INTELLIGENT_TIERING"
    }
  ]
}
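Because the policy file itself cannot carry unit annotations, one option is to generate rules from code where the units are explicit. A minimal sketch; the constants and the `years_to_days` helper are illustrative names, and a year is taken as 365 days to match the 2555-day figure above.

```python
import json

MIB = 1024 * 1024      # bytes in one mebibyte
DAYS_PER_YEAR = 365    # matches the 7 years = 2555 days figure used above


def years_to_days(years):
    """Convert whole years to lifecycle days, ignoring leap days."""
    return years * DAYS_PER_YEAR


cost_rule = {
    "ID": "CostOptimization",
    "Filter": {
        "And": {
            "Prefix": "logs/",
            "ObjectSizeGreaterThan": 100 * MIB,  # 100 MB threshold
        }
    },
    "Status": "Enabled",
    "Transitions": [{"Days": 30, "StorageClass": "INTELLIGENT_TIERING"}],
}

compliance_expiration = {"Days": years_to_days(7)}  # roughly 7 years

# The serialized output is plain JSON with the computed numbers filled in.
print(json.dumps({"Rules": [cost_rule]}, indent=2))
print(json.dumps(compliance_expiration))
```

The generated JSON is what gets applied to the bucket; the Python source is where the reasoning behind each number stays readable.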

Best Practices for Enhanced Policies

  1. Use Meaningful Rule IDs
    • Descriptive IDs make it easier to troubleshoot or revise rules later.
  2. Layer Your Rules
    • Combine broad rules (e.g., global bucket cleanup) with specific exceptions (prefix- or tag-based).
    • This allows flexible retention for certain teams, environments, or project files.
  3. Monitor Transitions
    • CloudWatch Metrics: Keep an eye on storage usage across classes.
    • Cost Explorer: Ensure you're actually saving money after transitions.
  4. Document and Version Policies
    • Keep a changelog of your policy updates.
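    • JSON has no comment syntax, and a policy file containing comments will not parse, so record the rationale for each rule in the rule ID, a README, or the commit message next to the policy file.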
  5. Test in a Sandbox
    • Apply new rules to a test bucket first to confirm they behave as intended.
    • Verify logs, billing, or S3 Inventory data before rolling changes into production.
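Part of sandbox testing can happen before anything touches a bucket. The sketch below is a small local lint pass; the checks encode constraints as I understand them from the S3 documentation (unique rule IDs, valid Status values, no multipart-abort action in a tag-filtered rule, at least 30 noncurrent days before moving to an infrequent-access class), so treat it as an assumption-laden pre-check rather than a substitute for applying the policy to a test bucket. `lint_policy` and `uses_tags` are hypothetical helpers.

```python
IA_CLASSES = ("STANDARD_IA", "ONEZONE_IA")


def uses_tags(rule):
    """True if the rule's filter selects objects by tag."""
    rule_filter = rule.get("Filter", {})
    conditions = rule_filter.get("And", rule_filter)
    return "Tag" in conditions or bool(conditions.get("Tags"))


def lint_policy(policy):
    """Return a list of human-readable problems found in a lifecycle policy."""
    problems = []
    seen_ids = set()
    for rule in policy.get("Rules", []):
        rule_id = rule.get("ID", "<no ID>")
        if rule_id in seen_ids:
            problems.append(f"{rule_id}: duplicate rule ID")
        seen_ids.add(rule_id)
        if rule.get("Status") not in ("Enabled", "Disabled"):
            problems.append(f"{rule_id}: Status must be Enabled or Disabled")
        if uses_tags(rule) and "AbortIncompleteMultipartUpload" in rule:
            problems.append(
                f"{rule_id}: AbortIncompleteMultipartUpload with a tag filter"
            )
        for transition in rule.get("NoncurrentVersionTransitions", []):
            too_early = transition.get("NoncurrentDays", 0) < 30
            if transition.get("StorageClass") in IA_CLASSES and too_early:
                problems.append(
                    f"{rule_id}: noncurrent transition to "
                    f"{transition['StorageClass']} before 30 days"
                )
    return problems


suspect = {
    "Rules": [
        {
            "ID": "TempCleanup",
            "Filter": {"And": {"Prefix": "temp/",
                               "Tags": [{"Key": "temporary", "Value": "true"}]}},
            "Status": "Enabled",
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 7, "StorageClass": "STANDARD_IA"}
            ],
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
        }
    ]
}

for problem in lint_policy(suspect):
    print(problem)
```

A check like this fits naturally into a pre-commit hook or CI job for the repository that holds your policy files.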

General Best Practices

  • Version Control Impact
    • If your bucket is versioned, plan carefully so you don't accidentally purge important previous versions.
    • Use NoncurrentVersionExpiration to remove unnecessary versions while retaining critical data.
  • Compliance & Retention
    • Check legal or regulatory requirements. Ensure date-based expiration or compliance locks align with policies for data retention.
    • Some organizations layer on S3 Object Lock for WORM (Write Once, Read Many) compliance. If you use Object Lock, test thoroughly to avoid unintended data retention.
  • High-Level Monitoring
    • Tools like S3 Storage Lens can provide insights into bucket composition and track how many objects move between storage classes or get deleted.
  • Limitations
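    • Lifecycle rules are evaluated about once a day and removal is asynchronous, so it can take 24–48 hours (or longer on very large buckets) before objects actually disappear. They're not for instant deletions.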
    • Large-scale deletions can momentarily affect performance or listing operations.

Conclusion

S3 lifecycle policies provide a robust, cost-effective solution for managing large-scale object deletion. By implementing these policies, you can automate the cleanup process while maintaining control over your storage costs and data lifecycle.

  • Plan transitions to optimize costs while ensuring important data remains accessible
  • Leverage expiration to remove stale or unnecessary objects automatically
  • Keep an eye on incomplete uploads so you don't end up with hidden storage costs
  • Document and monitor your lifecycle strategies to ensure they remain aligned with business requirements

Want more behind-the-scenes S3 insights? Check out Daniel's recent posts on:

And if you're curious about the podcast itself, you can find the show notes, and transcript here.
