Harris Geo 👨🏻‍💻

Posted on Apr 5, 2021 • Originally published at harrisgeo.me

AWS Learn In Public Week 6, Advanced S3, Glacier And Athena

#aws #learninpublic #s3 #glacier

Week 6 of my AWS learning journey. This week is diving deeper into what S3 can provide us. We're going to do a quick overview of a couple new services Glacier and Athena. Let's go into the details.

S3 Replication

Let's talk about AWS S3 Replication

To do that we must enable versioning in both source and destination buckets
We have Cross Region Replication (CRR) which is ideal for compliance, lower latency access and also when you want to replicate across accounts

Then we have same region replication (SRR) which can be for log aggregation or live replication between production and test accounts

In either case, buckets can be in different accounts
Copy is asynchronous and we must give proper IAM permissions

After enabling S3 Replication, only new objects are replicated and not everything.

When deleting. Deletes without version ID add a delete marker which is not replicated.
Deleting with a version ID, it deletes in the source and is not replicated

We cannot do "chaining" of replication.

That means that if Bucket A has a replication into Bucket B which then has a replication into Bucket C and we add an object into bucket A, it won't make it all the way to Bucket C

S3 pre-signed URLs

Let's talk about AWS S3 pre-signed URLs

We can generate links that have the same permissions on the file as when we open it via the AWS console.
Such links can be generated using the CLI (downloads) or SDK (uploads)
By default they are valid for 3600 seconds but we can change the timeout with the --expires-in x seconds argument
We can use them for cases like
- Share link to content only with logged in users
- Allow temporary actions for users

Here's the CLI command for it

https://pbs.twimg.com/media/Exuf1ucXAAM8HgC?format=jpg&name=medium

S3 Storage classes and Glacier

We're going to do a quick overview of AWS S3 Storage classes and Glacier!

1. S3 standard which is a general purpose with high durability across multiple AZs.

99.99% availability throughout the year.
S3 standard is great for Data analytics, mobile & gaming applications and more

2. S3 Standard-Infrequent Access (IA)

For data that are less frequently accessed but require almost instant access.
It is also high durability across multiple AZs
It has 99.9% availability
IA is good for Disaster recovery, backups etc

3. S3 One Zone-Infrequent Access

The same principle but only in a single AZ but 20% cheaper
It has 99.5% availability
Supports SSL for data in transit and encryption at rest
Good for secondary backups and other data you can recreate

4. S3 Intelligent tiering

Similar to S3 standard but automatically moves objects between tiers based on access patterns
It has a small monthly monitoring and auto-tiering fee

5. Glacier

Low cost and meant for archiving or backups
Good for long term retention like 10s of years
Archives are stored in vaults
There is a cost to retrieve which gets more expensive for faster retrievals (1 minute to 12 hours)
The minimum storage duration is 90 days.

6. Glacier Deep Archive

For looong term storage and really cheaper than all the other options
However the fastest way to retrieve vaults is 12 hours
The minimum storage duration is 180 days (half a year)

To give you an idea of the costs, for S3 standard it is $0.023 per GB. For deep Glacier it is $0.00099 per GB

https://aws.amazon.com/s3/pricing/

S3 lifecycle rules

Let's talk about AWS S3 lifecycle rules

We can move between storage classes based on how often we access our objects.
For infrequently accessed object, move them to standard IA
For archives and object we don't need instantly we can move them to Glacier or Deep Archive
We can even automate that with transition actions which are definitions when objects are transitioned to another storage class
- Example: Move objects to Standard IA class 60 days after creation. Then archive them to Glacier after 6 months
We can also set expiration actions where we configure object to be deleted after a set amount of time
- Example: Access log files can be set to delete after 365 days. Such files can be old versions or incomplete multi-part uploads
We can create these rules based on prefixes like s3://somebucket/archives/*
We can also add rules based on certain objects tags like Department: Sales

AWS Athena

Today let's talk about AWS Athena

It is a Serverless service to perform analytics directly against S3 files and uses a SQL language to query these files
It supports CSV, JSON and more.
Athena is quite common for cases like BI, analytics, reporting, ELB and CloudTrail logs and more

Summary

I have to admit, most of these features are not really that common out there. In my last 2-3 jobs I have only seen a very basic way of using S3 comparing to what we discuss here. However, based on the examples these features must definitely be used by companies especially when there are so many different storing tiers.

Next week we are going to talk about the service that caused me to want to learn more about how AWS works. That is ECS where we are going to talk about managing Docker containers, ECR and Fargate.

DEV Community