DEV Community

Gururajan Padmanaban

#AWS - Set up S3 Lifecycle for data rotation

Requirement: Delete all or specified objects after n days.

S3 Lifecycle rules:

An S3 Lifecycle configuration is a set of rules that define actions that Amazon S3 applies to a group of objects.

There are two types of actions:

  • Transition actions: These actions define when objects transition to another storage class.
  • Expiration actions: These actions define when objects expire. Amazon S3 deletes expired objects on your behalf.

Rules:

  1. Move current versions of objects between storage classes
  2. Move noncurrent versions of objects between storage classes
  3. Expire the current version of the object
  4. Permanently delete the noncurrent versions of objects
  5. Delete expired object delete markers or incomplete multipart uploads (this does not work if the rule is scoped with tags)

For our process, we are going to focus on the data rotation rules (3 to 5). The other rules belong to the data retention policy, where we move objects from one S3 storage class to another according to our requirements, such as how frequently we need to access them.

Scope:

We can apply these rules to all objects in the bucket or limit the rule's scope using one or more filters.

Filters:

  • Prefix: Only objects whose key begins with the given prefix are affected by the rule.
  • File size: We can configure a filter so that the rule applies only to objects above a minimum size or below a maximum size.

Tags (Paid service):

S3 lets us tag objects according to our requirements. Tags are key:value pairs, e.g. Expire:True. By using these tags we can set up a filter so that only the tagged objects are deleted.

Tags are charged monthly, so this is a recurring cost.

The API calls that set a tag on an object are charged as well.
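To make the scope options concrete, here is a minimal sketch of how prefix, size and tag conditions could be expressed as the Filter element of a lifecycle rule. The key names (Prefix, Tag, ObjectSizeGreaterThan, ObjectSizeLessThan, And) follow the S3 lifecycle configuration format; the helper build_filter and its parameters are hypothetical and only for illustration.

```python
def build_filter(prefix=None, tags=None, min_size=None, max_size=None):
    """Return a lifecycle rule Filter dict (hypothetical helper).

    tags is a dict such as {"Expire": "True"}; sizes are in bytes.
    """
    conditions = {}
    if prefix is not None:
        conditions["Prefix"] = prefix
    if min_size is not None:
        conditions["ObjectSizeGreaterThan"] = min_size
    if max_size is not None:
        conditions["ObjectSizeLessThan"] = max_size
    tag_list = [{"Key": k, "Value": v} for k, v in (tags or {}).items()]

    total = len(conditions) + len(tag_list)
    if total == 0:
        # An empty prefix matches every object in the bucket.
        return {"Prefix": ""}
    if total == 1:
        # A single condition is written directly inside Filter.
        return {"Tag": tag_list[0]} if tag_list else conditions
    # Several conditions must be wrapped in an "And" element.
    if tag_list:
        conditions["Tags"] = tag_list
    return {"And": conditions}


print(build_filter(prefix="logs/"))
print(build_filter(prefix="logs/", tags={"Expire": "True"}))
```

A single condition goes directly into the filter, while a combination of conditions has to be wrapped in And, which is why the helper counts the conditions first.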

Versioning:

To protect objects from being overwritten or deleted, S3 provides versioning so that we can restore previous versions if required.

  • Unversioned: This is the default state for any bucket. If versioning is not enabled, an expired object is deleted permanently. It is not necessary to set up any other rule to delete the expired object.

  • Versioning enabled: Expiring the current version only adds a delete marker, and the old versions are kept as noncurrent versions. We need to explicitly set up a rule to delete them.

  • Rule: “Permanently delete the noncurrent versions of objects”
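The difference between the two bucket states can be pictured with a toy model. This is not how S3 is implemented. It is only a sketch of the visible effect, and the function names and the list representation (newest version first) are assumptions made for the example.

```python
DELETE_MARKER = "<delete marker>"


def expire_current_version(versions, versioning_enabled):
    """Toy model of the 'expire current version' action for one key.

    versions lists the stored versions, newest first.
    """
    if not versioning_enabled:
        # Unversioned bucket: the object is simply gone.
        return []
    # Versioned bucket: a delete marker becomes the current version and
    # every existing version is kept as a noncurrent version.
    return [DELETE_MARKER] + list(versions)


def delete_noncurrent_versions(versions):
    """Toy model of 'permanently delete noncurrent versions'."""
    return list(versions[:1])  # only the current entry survives


after_expiry = expire_current_version(["v2", "v1"], versioning_enabled=True)
print(after_expiry)
print(delete_noncurrent_versions(after_expiry))
```

In this model the delete marker is the only thing left after the noncurrent versions are removed. That leftover is the "expired object delete marker" that rule 5 cleans up.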

Multipart uploads:

Multipart upload means uploading a large file in chunks. S3 supports multipart uploads out of the box, and AWS recommends them for objects larger than 100 MB.

The parts are handled as independent blocks, i.e.:

  • We can upload the parts/chunks in any order.
  • If any part fails during upload, we can upload that part only without affecting other parts.

If an upload fails, its incomplete parts stay in the bucket and accumulate over time. To avoid this we can set up a rule to delete those incomplete parts from S3 by using the rule “Delete expired object delete markers or incomplete multipart uploads”.
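To illustrate why parts can be uploaded in any order and retried one by one, here is a small sketch that splits an object into numbered parts. The helper plan_parts and the 8 MiB default are assumptions for the example. The 5 MiB minimum part size (for every part except the last) and the 10,000 part maximum are S3 limits.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_parts(object_size, part_size=8 * 1024 * 1024):
    """Return (part_number, offset, length) tuples covering the object."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    parts = []
    offset = 0
    number = 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts, choose a larger part_size")
    return parts


MIB = 1024 * 1024
for number, offset, length in plan_parts(20 * MIB):
    print(number, offset, length)
```

Each tuple describes a byte range that does not depend on the others, so a failed part can be sent again by its part number without touching the rest.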

Expiration of an object:

  • Lifecycle rules are evaluated once a day, at midnight UTC (00:00).
  • All rules are evaluated in the same run.
  • An object is acted on only if it already satisfies the rule when the run takes place.
  • E.g.: Suppose you set up a rule to expire objects after 1 day while thinking in IST. If the object is not yet at least 1 day old when the rule runs at 00:00 UTC, it will not be marked as expired in that run.
  • Always keep the timezone in mind when setting up a rule. S3 works in UTC.

Example expiration process workflow:

  • Object uploaded: 6 July 09:18 UTC
  • The lifecycle rule that expires objects after 1 day ran on: 7 July 00:00 UTC
  • At that time the object had not yet completed 24 hours in the S3 bucket
  • The object completed 24 hours in the S3 bucket on 7 July at 09:18 UTC
  • The next lifecycle run was scheduled for: 8 July 00:00 UTC
  • At that point, the lifecycle rule marked the object for expiration with the expiration date 8 July 00:00 UTC,
    i.e. 8 July 5:30 AM in Indian Standard Time.

  • Lifecycle rules run at midnight UTC and mark the eligible objects for expiration. S3 rounds the expiration time up to the next midnight UTC, which explains why the object was marked for expiration on 8 July instead of 7 July.
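The workflow above can be reproduced with a few lines of date arithmetic. This is a sketch of the rounding rule as described here, not S3's actual code. The function name expiration_time is hypothetical, the year 2022 is an arbitrary choice because the example gives none, and keeping a time that already falls exactly on midnight unchanged is an assumption.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))


def expiration_time(created, days):
    """Add the rule's days to the creation time, then round up to midnight UTC."""
    eligible = created.astimezone(timezone.utc) + timedelta(days=days)
    midnight = eligible.replace(hour=0, minute=0, second=0, microsecond=0)
    if eligible == midnight:
        return midnight  # assumption: already on a midnight boundary
    return midnight + timedelta(days=1)


uploaded = datetime(2022, 7, 6, 9, 18, tzinfo=timezone.utc)
expires = expiration_time(uploaded, days=1)
print(expires.isoformat())                  # 2022-07-08T00:00:00+00:00
print(expires.astimezone(IST).isoformat())  # 2022-07-08T05:30:00+05:30
```

An upload time given in IST works the same way, because the function converts it to UTC before adding the days.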

Deleting the object:

  • Expired objects are not deleted immediately; S3 removes them from the bucket asynchronously in the background.
  • This can take some time, because S3 performs the operation while keeping the service available.
  • However, once the object is marked for expiration we are no longer charged for its storage, even though the object may still be visible in the bucket.
  • Expired objects can still be accessed until S3 actually removes them.
  • A folder disappears once no files are left in it. S3 is an object store, so although we browse it like a normal file system (folders and files), everything in it is an object, and every object that satisfies the rule will expire.

Overlapping rules:

  • If one rule overlaps another, S3 generally goes with the least expensive outcome, i.e. the one that saves the most cost.

Example:

  • Rule one is set up to migrate the objects from the S3 Standard class to S3 Standard-IA (Infrequent Access) after 90 days.
  • Then the second rule is set up to expire the objects after 90 days.

  • AWS will always go with the second rule because there is no point in migrating the objects if we are going to delete them anyway.
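The example can be written down as a tiny decision function. This is a deliberately simplified sketch of the "expiration beats transition" outcome described above, not the full conflict resolution that S3 applies. The rule dictionaries and the name action_for are made up for the illustration.

```python
def action_for(rules, age_in_days):
    """Return the action applied to an object of the given age, or None."""
    due = [rule for rule in rules if age_in_days >= rule["days"]]
    if any(rule["action"] == "expire" for rule in due):
        # Deleting is cheaper than moving and then deleting.
        return "expire"
    if due:
        return "transition"
    return None


rules = [
    {"action": "transition", "days": 90, "storage_class": "STANDARD_IA"},
    {"action": "expire", "days": 90},
]
print(action_for(rules, age_in_days=90))
```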

Limitations:

  • Suffixes are not supported, i.e. if we want to delete only a specific file type such as .csv, we can't use a filter like *.csv.
  • The entire folder will be deleted if there are no files left. There’s no way around it.
  • To delete a specific file we need to use tags.
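One way to work around the missing suffix filter is to tag the matching objects ourselves and scope the rule to that tag. The sketch below only prepares the tagging requests. The helper name, the example keys and the Expire:True tag are assumptions. With boto3, each request could then be sent with put_object_tagging, and every such call is a paid API request as noted in the Tags section.

```python
def tagging_requests(keys, suffix, tag_key="Expire", tag_value="True"):
    """Build one tagging request per object key that ends with suffix."""
    tagging = {"TagSet": [{"Key": tag_key, "Value": tag_value}]}
    return [{"Key": key, "Tagging": tagging} for key in keys if key.endswith(suffix)]


# Hypothetical object keys, for illustration only.
keys = ["reports/2022/a.csv", "reports/2022/a.json", "exports/b.csv"]
for request in tagging_requests(keys, ".csv"):
    # With boto3 (assumption: client created via boto3.client("s3")):
    #   s3.put_object_tagging(Bucket="my-bucket", **request)
    # Note that this call replaces any tags already set on the object.
    print(request["Key"])
```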

Steps to create a lifecycle rule to expire objects:

  1. Sign in to the AWS Management Console and open the Amazon S3 console
  2. In the Buckets list, choose the name of the bucket for which you want to create a lifecycle rule.
  3. Choose the Management tab, and choose Create lifecycle rule.
  4. In Lifecycle rule name, enter a name for your rule.
  5. Choose the scope of the lifecycle rule:
    • To limit the scope by prefix, in Prefix, enter the prefix
    • To limit the scope by tag, enter the object tag key and value
    • To limit the scope by object size, specify the minimum or maximum object size
  6. Under Lifecycle rule actions, choose the actions that you want your lifecycle rule to perform:
    • Expire current versions of objects
    • Permanently delete previous versions of objects (if that fits your use case)
    • Delete expired delete markers or incomplete multipart uploads (if that fits your use case)
    Depending on the actions that you choose, different options appear.
  7. To expire current versions of objects, under Expire current versions of objects, in Number of days after object creation, enter the number of days (for example, 45).
  8. To permanently delete previous versions of objects, under Permanently delete previous versions of objects, in Number of days after objects become previous versions, enter the number of days.
  9. Under Delete expired delete markers or incomplete multipart uploads, choose Delete expired object delete markers and Delete incomplete multipart uploads. Then, enter the number of days after the multipart upload initiation after which you want to end and clean up incomplete multipart uploads.
  10. Choose Create rule. If the rule does not contain any errors, Amazon S3 enables it, and you can see it on the Management tab under Lifecycle rules.
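The same rule can also be described as a lifecycle configuration document instead of console clicks. The following is a minimal sketch under a few assumptions: the function name, the rule name, the prefix and the day counts are placeholders, and the boto3 call in the comment is shown but not executed. The expired delete marker option is left out, because it cannot be combined with Days inside the same Expiration element.

```python
import json


def build_lifecycle_configuration(rule_name, prefix, expire_days,
                                  noncurrent_days, multipart_days):
    """Build a lifecycle configuration with one expiration rule."""
    rule = {
        "ID": rule_name,                      # step 4: rule name
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},         # step 5: scope
        "Expiration": {"Days": expire_days},  # step 7: expire current versions
        "NoncurrentVersionExpiration": {      # step 8: delete previous versions
            "NoncurrentDays": noncurrent_days,
        },
        "AbortIncompleteMultipartUpload": {   # step 9: clean up failed uploads
            "DaysAfterInitiation": multipart_days,
        },
    }
    return {"Rules": [rule]}


config = build_lifecycle_configuration("expire-logs", "logs/", 45, 30, 7)
print(json.dumps(config, indent=2))

# With boto3 (assumption: credentials and bucket name are set up elsewhere):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=config)
# The call replaces the bucket's existing lifecycle configuration.
```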
