How to reduce AWS S3 Costs

In this article, you will learn all the strategies to reduce Amazon S3 costs. This article is part of our AWS Cost Reduction series.

First, let's review the factors that affect Amazon S3 monthly costs. You will pay in terms of:

  • The size of data stored each month (GB).
  • The number of access operations completed (e.g. PUT, COPY, POST, LIST, GET, SELECT, or other request types).
  • The number of lifecycle transitions between storage classes.
  • The amount of data retrieved and the number of retrieval requests.
  • Data transfer fees (bandwidth out from Amazon S3).

One of the most important cost factors is the storage class. Make sure you understand the different classes available and their use cases. Let's quickly review and compare them.

Amazon S3 offers 6 different storage classes.

  • S3 Standard
  • S3 Intelligent-Tiering
  • S3 Standard-IA
  • S3 One Zone-IA
  • S3 Glacier
  • S3 Glacier Deep Archive

Keep in mind that every S3 object can be assigned a specific storage class. Thus a bucket might have objects with different classes simultaneously.

Here is a table that summarizes the main cost parameters for each storage class (for N. Virginia region).

| S3 Storage Class | Storage ($/GB) | GET ($/1,000 requests) | Lifecycle Transitions ($/1,000 requests) | Data Retrieval ($/1,000 requests) | Data Retrieval ($/GB) | Minimum Storage Duration (days) |
|---|---|---|---|---|---|---|
| S3 Standard | 0.023 | 0.0004 | 0 | 0 | 0 | 0 |
| S3 Intelligent-Tiering | 0.023 | 0.0004 | 0.01 | 0 | 0 | 30 |
| S3 Standard-IA | 0.0125 | 0.001 | 0.01 | 0 | 0.01 | 30 |
| S3 One Zone-IA | 0.010 | 0.001 | 0.01 | 0 | 0.01 | 30 |
| S3 Glacier | 0.004 | 0.0004 | 0.05 | 0.025 | 0.0025 | 90 |
| S3 Glacier Deep Archive | 0.00099 | 0.0004 | 0.05 | 0.025 | 0.0025 | 180 |

To keep things simple, this table doesn't reflect every element charged by AWS S3. For example, it doesn't consider PUT (or other) requests, non-bulk data retrievals, or data transfer costs. But it describes the cost drivers that differ between storage classes.

You will notice that the storage price per GB decreases from top to bottom of the table, while the access cost increases. Therefore the right class to use depends on how frequently you access each S3 object.

S3 Standard is typically used for frequently accessed data. Although the cost per GB is the highest, request costs are low and there are no retrieval or transition fees. This storage class is therefore best suited for objects read or written several times each month.

The third class in the table is S3 Standard-IA. It has a lower storage price, but a higher access cost. According to AWS, it should be used for long-lived but infrequently accessed data that still needs instant access when requested.

As a rule of thumb, S3 Standard-IA should be used if the object is accessed on average less than once a month. Why one month? Because that's the access frequency at which S3 Standard and S3 Standard-IA have roughly the same overall cost. It's also the minimum recommended amount of time to keep objects in the S3 Standard-IA class: if an object is deleted or transitioned before 30 days, you are still billed for the remainder of that minimum period.
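
As a rough illustration with the prices above (ignoring the small per-request fees): storing 1 GB in S3 Standard costs about $0.023 per month, while storing it in S3 Standard-IA and retrieving it once costs about $0.0125 + $0.01 = $0.0225. Access the object more often than once a month and S3 Standard wins; access it less often and S3 Standard-IA wins.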

Usually, it's difficult to know in advance how often an object will be accessed. AWS created S3 Intelligent-Tiering to address this issue. This class automatically moves data between a frequent-access and an infrequent-access tier, which minimizes the S3 cost for the object. If you keep an object for more than 30 days, S3 Intelligent-Tiering will generally be cheaper than S3 Standard and S3 Standard-IA, so it should be your first option in those cases.

S3 One Zone-IA class is similar to S3 Standard-IA. But, instead of storing data in 3 (or more) AZs, data is stored in only one AZ. For this reason, data could be unavailable if the AZ fails. You should use S3 One Zone-IA only if you can tolerate this risk.

The last 2 classes in the table are S3 Glacier and S3 Glacier Deep Archive. They have the lowest cost per GB. But the access cost is high. Therefore they are used for archiving purposes. They replace the tape libraries used on-premises.

You should keep in mind that Glacier objects aren't immediately available. If you want to access the contents of an object in any Glacier class, you have to request a retrieval and wait until it completes. For Bulk retrieval mode, this takes between 5 and 12 hours; other retrieval modes are faster but more expensive. For this reason, Glacier should only be used for objects that are rarely accessed, for example backups, archives, and other long-term, infrequently accessed data.

The difference between S3 Glacier and S3 Glacier Deep Archive is that the latter is for even less frequently accessed objects. For example, it's recommended for objects accessed every 6 (or more) months. S3 Glacier Deep Archive storage costs are lower, but the object needs to be stored for at least 180 days in that class. Otherwise, that minimum period will still be charged.

In summary, to decide which S3 class to use, take the following table as a rule of thumb. But don't forget that accessing Glacier objects can take some hours.

| Access Frequency | Recommended S3 Class |
|---|---|
| Every 30 (or fewer) days | S3 Standard |
| Every 30 to 90 days | S3 Intelligent-Tiering |
| Every 90 to 180 days | S3 Glacier |
| Every 180 (or more) days | S3 Glacier Deep Archive |

We have just described the main characteristics of each S3 class, and the suggested use cases. So now we can start optimizing them.

Below are the main strategies to reduce AWS S3 costs:

1. Set the right S3 class for new objects before the creation

Your first step is to analyze the access patterns for your data. Start thinking about the intended usage of each new object to be created in S3. Each object has a specific access pattern, and therefore there is an S3 class that works best for it.
The right class should be applied to all new objects in Amazon S3. It's not possible to define a default class per bucket in S3, but you can assign it per object.

Start by defining the best class for each new object, and set this class in the operation that uploads the object to Amazon S3. This can be done using the AWS CLI, the AWS Console, or an AWS SDK. As a consequence, each new object will have the right class from day one. This could be the best money-saving strategy in the long term, and probably the most time-efficient one.
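
For example, with the AWS SDK for Python (boto3), the class is set through the StorageClass parameter at upload time. A minimal sketch (the bucket, key, and file names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Upload a new object straight into S3 Standard-IA instead of the default S3 Standard.
# "my-bucket" and the key are placeholder names.
with open("2023-01.csv", "rb") as data:
    s3.put_object(
        Bucket="my-bucket",
        Key="reports/2023-01.csv",
        Body=data,
        StorageClass="STANDARD_IA",  # or INTELLIGENT_TIERING, ONEZONE_IA, GLACIER, DEEP_ARCHIVE
    )
```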

2. Adjust the S3 class for existing objects

Now that you have already set the right class for new (to be created) objects, you can focus on the already-created objects. The process is similar to the one described in the previous point. Start analyzing data access patterns for every existing object in your S3 account. Then decide the best class for each one. And finally, assign that class in the object configuration. This will allow you to optimize every S3 bucket, and thus reduce your S3 costs.
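
One way to change an existing object's class with boto3 is to copy the object onto itself with a new StorageClass. A sketch, with placeholder bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Change the storage class of an existing object by copying it onto itself
# with a new StorageClass. Bucket and key names are placeholders.
s3.copy_object(
    Bucket="my-bucket",
    Key="logs/2022/archive.log",
    CopySource={"Bucket": "my-bucket", "Key": "logs/2022/archive.log"},
    StorageClass="GLACIER",
    MetadataDirective="COPY",  # keep the original metadata
)
```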

How to check if this worked? You can use AWS Cost Explorer to check your daily S3 cost. You will also notice the cost reduction in next month's bill. AWS bills show the consumption for each service, including Amazon S3.

Consider that it can be time-consuming to update every object's class after it's created. That's why it's so important to set classes before objects are created (as previously described).
Note also that this process consists of an object-by-object (or bucket-by-bucket) revision. Depending on the number of objects that you have, it could take a considerable amount of time. It's probably better to focus on large (or very frequently accessed) objects and update their storage classes first.

You might also use S3 Storage Class Analysis, a tool to analyze the access patterns of S3 objects. It monitors the objects within a bucket and shows the amount of data stored, the amount of data retrieved, and how frequently data is accessed (based on object age). Note that there is a small charge for this tool, but it allows you to understand whether objects are accessed often. After you understand the access pattern, you can update the S3 storage class accordingly. For example, if you find out that most objects in a bucket are accessed only once per year (and you don't need immediate access), then you should move them to S3 Glacier Deep Archive.

3. Remove unused S3 objects

You have probably noticed that you pay for the amount of data stored on S3. So if you remove unused objects, you will also reduce S3 costs.

How to check the contents of your S3 buckets? There are several ways.

For example, you can list the objects in each bucket. This shows the object names (or keys) and sizes without downloading the objects' contents. It can be done using the AWS Console, the AWS CLI, or an SDK.
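
For instance, a small boto3 sketch that lists a bucket's objects and prints the largest ones first, so you can spot candidates for removal (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

# Collect every object in the bucket, then print the 20 largest.
paginator = s3.get_paginator("list_objects_v2")
objects = []
for page in paginator.paginate(Bucket="my-bucket"):
    objects.extend(page.get("Contents", []))

for obj in sorted(objects, key=lambda o: o["Size"], reverse=True)[:20]:
    print(f'{obj["Size"]:>12}  {obj["Key"]}')
```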

Another option to check the contents of S3 buckets is CloudWatch Metrics. Use the BucketSizeBytes metric to get the total size of a bucket, or the NumberOfObjects metric to get the number of objects stored in it. These are per-bucket metrics, and they show you how big each bucket is. Then you can start removing unused objects from the biggest buckets.
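
A boto3 sketch that reads the daily BucketSizeBytes metric for one bucket (the bucket name is a placeholder, and StandardStorage is just one of the per-class StorageType dimension values):

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Read the daily BucketSizeBytes datapoints for "my-bucket" over the last two days.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": "my-bucket"},
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    StartTime=datetime.utcnow() - timedelta(days=2),
    EndTime=datetime.utcnow(),
    Period=86400,
    Statistics=["Average"],
)
for point in response["Datapoints"]:
    print(point["Timestamp"], int(point["Average"]), "bytes")
```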

You can also activate S3 Inventory on a bucket. This tool prepares a CSV (or Apache ORC) file listing all objects in the bucket, delivered to another S3 bucket on a daily or weekly basis. This is a good approach when you have thousands of objects and want to quickly find some of their properties (like size, encryption status, or last modified time). Note that S3 Inventory has a small cost while active.

4. Use S3 Lifecycle management

Amazon S3 offers a tool to automatically change the storage class of any object. For example, you can transition objects from the S3 Standard class to S3 Glacier a given number of days after creation. This lets you move each object to the most suitable storage class over time, which translates into a cost reduction.

How does S3 Lifecycle management work? You set rules for each bucket. Each rule has a transition period, counted in days since the object was created (or, for versioned objects, since the version became noncurrent), and the storage class to transition into after that period. Note that you can always transition objects to a longer-term (colder) storage class, but you can't transition back to a shorter-term one.

You can apply a lifecycle rule to a whole bucket, or only to objects matching a prefix, so you don't need to transition your objects one by one. S3 Lifecycle Management is one of the most useful tools to save costs on S3, and you should always consider using it.
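
As an illustration, a boto3 sketch of a lifecycle rule that moves everything under a hypothetical logs/ prefix to S3 Glacier 90 days after creation (the bucket name and prefix are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under "logs/" to S3 Glacier 90 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```

Keep in mind that this call replaces the bucket's entire lifecycle configuration, so include any existing rules in the same request.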

5. Expire S3 objects

This is another strategy to remove unused objects. Amazon S3 Lifecycle Management can also define an expiration policy, which lets you expire an object a given number of days after creation. Every expired object is automatically removed by AWS.

If you keep log files (or any other temporary data) as S3 objects, you should set an expiration for them. For example, you can set log objects to expire 30 days after creation. And they will be removed automatically.
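
A boto3 sketch of such an expiration rule (the bucket name and tmp-logs/ prefix are placeholders; as before, the call replaces any existing lifecycle rules):

```python
import boto3

s3 = boto3.client("s3")

# Automatically delete objects under "tmp-logs/" 30 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-temporary-logs",
                "Filter": {"Prefix": "tmp-logs/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```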

6. Expire incomplete multipart uploads

Amazon S3 uploads big objects using multipart upload: a big file is divided into smaller parts, each part is uploaded independently to S3, and S3 then joins the uploaded parts into the final object. AWS recommends using multipart upload for objects larger than 100 MB, and it's required for objects larger than 5 GB (the maximum object size is 5 TB).

It can take some time to upload big objects, and the upload process might be interrupted. As a consequence, the bucket keeps the already-uploaded parts, which are billed but unusable. To remove them, you can add a lifecycle rule: policies have a "clean up incomplete multipart uploads" setting to expire these partial uploads after a number of days. Removing these parts frees space in S3 and reduces costs.
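
A boto3 sketch of such a rule, using the AbortIncompleteMultipartUpload element (placeholder bucket name; again, the call replaces any existing lifecycle rules):

```python
import boto3

s3 = boto3.client("s3")

# Abort multipart uploads that have not completed within 7 days,
# so the orphaned parts stop accruing storage charges.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-stale-multipart-uploads",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Status": "Enabled",
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```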

7. Compress S3 objects

You can also compress objects before uploading them to S3. You simply create a compressed file (e.g. ZIP, GZIP, or equivalent), which will be smaller than the original, and upload that compressed file to S3. The amount of data stored in S3 will be lower, so your Amazon S3 costs will be reduced.

Note that, to get the original file back, you will have to download and decompress it. But you can save a lot of space in S3 (especially with text files).
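
As a simple illustration, a Python sketch that gzips a local file and uploads the compressed copy (file, bucket, and key names are placeholders):

```python
import gzip
import shutil

import boto3

s3 = boto3.client("s3")

# Compress the local file with gzip, then upload the compressed copy instead of the original.
with open("report.csv", "rb") as src, gzip.open("report.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

s3.upload_file("report.csv.gz", "my-bucket", "reports/report.csv.gz")
```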

8. Pack S3 objects

Remember that you also pay for the number of operations done in Amazon S3. If you have to download many S3 objects simultaneously, it might be a good idea to pack them into one big object (e.g. TAR, ZIP, or equivalent).

Some storage classes have minimum capacity charges per object. For example, the minimum capacity charge per object is 128 KB for S3 Standard-IA and S3 One Zone-IA, and 40 KB for S3 Glacier and S3 Glacier Deep Archive.

For this reason, a small 1 KB object stored in S3 Standard-IA is billed as if it were 128 KB. Packing many small objects into one larger object avoids paying that minimum for each of them, so packing small objects together will also reduce your S3 costs.
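
For example, a Python sketch that packs several hypothetical log files into a single compressed archive and uploads it once (all names are placeholders):

```python
import tarfile

import boto3

s3 = boto3.client("s3")

# Pack many small files into one archive and store a single S3 object
# instead of many small ones.
with tarfile.open("daily-logs.tar.gz", "w:gz") as archive:
    for name in ["app-1.log", "app-2.log", "app-3.log"]:
        archive.add(name)

s3.upload_file("daily-logs.tar.gz", "my-bucket", "logs/daily-logs.tar.gz")
```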

9. Limit object versions

S3 object versioning is a very useful feature: every time you change the contents of an object, AWS keeps the previous version of it. But if you have a 1 MB object with 100 versions, you are paying for 100 MB of storage.

You can use lifecycle policies to automatically delete previous versions after some time. For example, you can set a policy to delete versions 30 days after they become noncurrent. This limits the number of versions stored and lowers the storage used, which is another way to increase your savings.
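
A boto3 sketch of such a policy (placeholder bucket name; the call replaces any existing lifecycle rules):

```python
import boto3

s3 = boto3.client("s3")

# Delete previous object versions 30 days after they become noncurrent.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-versions",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```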

10. Use Bulk retrieval mode for S3 Glacier

Amazon S3 Glacier has 3 retrieval types:

| Retrieval Mode | Retrieval Time |
|---|---|
| Expedited | 1-5 minutes |
| Standard | 3-5 hours |
| Bulk | 5-12 hours |

And Amazon S3 Glacier Deep Archive has 2 retrieval types:

| Retrieval Mode | Retrieval Time |
|---|---|
| Standard | < 12 hours |
| Bulk | < 48 hours |

The retrieval time indicates how long Amazon S3 takes to make the object's contents available. Note that the faster you retrieve the objects, the more expensive the operation is. If you can wait a few hours for the objects, you can save money, so use Bulk retrieval mode whenever possible. You choose the retrieval mode when you request the retrieval.
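
With boto3, the retrieval tier is chosen in the restore request. A sketch with placeholder bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Request a Bulk (cheapest) restore of a Glacier object, keeping the
# restored copy available for 7 days.
s3.restore_object(
    Bucket="my-bucket",
    Key="backups/2020-full-backup.tar.gz",
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Bulk"},  # or "Standard" / "Expedited"
    },
)
```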

11. Use Query in Place functionality

Some applications store tables as Amazon S3 objects, in formats like JSON, CSV, or Apache Parquet. Traditionally, to query the data you have to download the whole file and then scan the entire table locally to find the rows you need.

But there are more efficient ways to get the contents. AWS offers tools like Amazon Athena, S3 Select, and Amazon Redshift Spectrum, which run queries directly in the cloud: they process the data using SQL and return only the data you need.

These tools offer many advantages. As the queries run in the cloud, you need less processing power locally. Another benefit is that you download less data from Amazon S3, which makes the process faster and cheaper. Remember that you are charged for the amount of data transferred out of S3, so if less data is requested, you save money on bandwidth.
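
As an example, an S3 Select sketch in boto3 that queries a hypothetical CSV object with a header row and returns only the matching rows (bucket, key, and column names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Run a SQL query directly against a CSV object with S3 Select,
# so only the matching rows are transferred.
response = s3.select_object_content(
    Bucket="my-bucket",
    Key="tables/orders.csv",
    ExpressionType="SQL",
    Expression="SELECT s.order_id, s.total FROM s3object s WHERE CAST(s.total AS FLOAT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The result comes back as an event stream; print the record payloads.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))
```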

Note that these queries have a small additional cost, so you should evaluate whether the savings outweigh it.

12. Change region

Some regions are much more expensive than others, and this applies to Amazon S3 prices too. So it's worth considering moving your S3 buckets to a region with lower prices.

Another factor to consider is the data transfer cost between AWS regions. Data sent from a bucket to a VPC in the same region is free, but sending data to a VPC in another region has a cost per GB. So it's a good idea to keep each bucket in the same region as the resources that consume its data.

You can also check the Choosing an AWS Region article, which describes the factors to consider when deciding which AWS region to use.

Summary

In this article, you learned the most common strategies to reduce Amazon S3 costs. Now it's time to take action. Pick the strategies that work best for your workload in AWS. And start implementing them.

Feel free to leave your comments below, or check other articles in the AWS Cost Reduction series.
