What is AWS S3 and 5Ws for using it?

AWS S3 is an object-based, serverless storage service from Amazon Web Services, a far more scalable alternative to hard-drive file systems and block storage approaches for saving data. Serverless means the storage is hosted in the cloud and you do not have to configure a server with a fixed storage limit; capacity expands dynamically with usage.

What is AWS S3 and why use it?

In this article, we will discuss what AWS S3 is and the 5Ws of using it.

What is an AWS S3 bucket?

An AWS S3 bucket is a public cloud storage unit on S3 (Simple Storage Service). A user account can hold multiple S3 buckets for storing folders and data in the form of objects, but bucket names must be unique across all AWS accounts, just like domain names. S3 bucket names must also be DNS compliant, which means they cannot include special characters (only lowercase letters, numbers, hyphens, and periods).
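
As a quick illustration, here is a minimal sketch of creating a bucket with the boto3 SDK; the bucket name and region below are placeholders, and the call fails if any AWS account already owns that name:

```python
import boto3

# Bucket names are globally unique and DNS compliant:
# 3-63 characters, lowercase letters, numbers, hyphens, periods.
s3 = boto3.client("s3", region_name="eu-west-1")

s3.create_bucket(
    Bucket="my-unique-bucket-name-2020",  # placeholder, must be globally unique
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)
```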

Why use AWS S3? What are the top features of AWS S3?

Here we will discuss the Top 10 features of AWS S3.

1. Security

A. Security on Server Side

For security on the server side, server-side encryption (SSE) is used, and it comes in the following three options:

SSE-AES

With this option (also called SSE-S3), S3 encrypts the data with the AES-256 algorithm and manages the encryption keys itself.

SSE-KMS

With this option, S3 still encrypts the data with AES-256, but the envelope keys are handled by the AWS Key Management Service (KMS), which lets you control and audit the keys yourself.

SSE-C

With this option, S3 encrypts the data with AES-256 using encryption keys that the customer provides with each request; you manage the keys yourself.
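
The three options map directly onto upload parameters in the AWS SDKs. A hedged sketch with boto3; the bucket, keys, and KMS key alias are placeholders:

```python
import os
import boto3

s3 = boto3.client("s3")

# SSE-S3 (SSE-AES): S3 encrypts with AES-256 and manages the key itself.
s3.put_object(Bucket="my-bucket", Key="sse-s3.txt", Body=b"data",
              ServerSideEncryption="AES256")

# SSE-KMS: S3 encrypts the object, KMS manages the envelope key.
s3.put_object(Bucket="my-bucket", Key="sse-kms.txt", Body=b"data",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-key")  # placeholder key alias

# SSE-C: you supply (and must safeguard) the 256-bit key on every request;
# boto3 base64-encodes the raw key and computes its MD5 for you.
customer_key = os.urandom(32)
s3.put_object(Bucket="my-bucket", Key="sse-c.txt", Body=b"data",
              SSECustomerAlgorithm="AES256",
              SSECustomerKey=customer_key)
```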

B. Security in transit

Data in transit is protected with SSL/TLS encryption; the AWS SDKs and console make all requests over HTTPS by default.

C. Security on Client Side

The data is first encrypted on the client side and then uploaded to AWS S3, so S3 never sees the plaintext.
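
One way to do this (a sketch using the cryptography package, not AWS's official Encryption SDK; bucket and key names are placeholders):

```python
import boto3
from cryptography.fernet import Fernet

key = Fernet.generate_key()                    # keep this key safe; S3 never sees it
ciphertext = Fernet(key).encrypt(b"sensitive data")

boto3.client("s3").put_object(
    Bucket="my-bucket", Key="client-encrypted.bin", Body=ciphertext
)
# S3 stores only ciphertext; after download, decrypt with Fernet(key).decrypt(...)
```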

2. Lifecycle management

Lifecycle management automatically manages objects once they have lived for a predetermined period. The rules written in a lifecycle configuration can automatically delete the targeted objects or move them to a different storage class after the specified time.
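
For example, a rule that moves objects under a prefix to a cheaper class after 30 days and deletes them after a year could look like this with boto3 (bucket name, rule ID, and prefix are placeholders):

```python
import boto3

boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},      # only applies to logs/
            "Status": "Enabled",
            # After 30 days, move to the cheaper Standard-IA class.
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            # After a year, delete the objects entirely.
            "Expiration": {"Days": 365},
        }]
    },
)
```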

3. Versioning

Versioning maintains multiple versions of objects and records the changes users make to them. It is disabled by default; the bucket owner can enable it. Once enabled, versioning can only be suspended, never fully disabled, which means the versions already created are not deleted.
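
Enabling it is a one-call operation; a minimal boto3 sketch (bucket name is a placeholder):

```python
import boto3

boto3.client("s3").put_bucket_versioning(
    Bucket="my-bucket",
    # Once "Enabled", the only other valid status later is "Suspended".
    VersioningConfiguration={"Status": "Enabled"},
)
```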

4. MFA

To stop other members of a development team from deleting data in an S3 bucket, you can enable MFA Delete; this requires versioning to be turned on first. With MFA Delete enabled, only the root user can permanently delete object versions, and only after presenting a valid MFA token.
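
MFA Delete is enabled through the same versioning API, using root-account credentials plus the MFA device serial and current code; a hedged sketch with placeholder values:

```python
import boto3

# Must be called with the root account's credentials.
s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="my-bucket",
    # Format: "<mfa-device-serial> <current-code>" (placeholders below).
    MFA="arn:aws:iam::123456789012:mfa/root-device 123456",
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
)
```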

5. ACL

An ACL (Access Control List) is a simple permission template, now a legacy method, for managing permissions on objects and S3 buckets.
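
For instance, a canned ACL can make a single object publicly readable (bucket and key are placeholders):

```python
import boto3

boto3.client("s3").put_object_acl(
    Bucket="my-bucket", Key="report.pdf", ACL="public-read"  # canned ACL
)
```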

6. Bucket Policies

Bucket policies are JSON documents that allow developers to write fine-grained access control rules for a bucket and the objects in it.
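
As an illustration, the policy below (applied via boto3; bucket name is a placeholder) denies any request that does not arrive over HTTPS, which also enforces the in-transit encryption described earlier:

```python
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
        # Reject any request made over plain HTTP.
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}

boto3.client("s3").put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))
```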

7. Cross-Region Replication

Cross-Region Replication copies the data held in one data centre to another data centre in a different geographical location. Data can be replicated across AWS accounts as well as between S3 buckets. Its main benefits are listed below, with a configuration sketch after the list.

*Disaster Recovery

In case of a natural calamity, the software solution will not shut down; it will start fetching data from the data centre located in a different region.

*Meet compliance requirements

Although AWS S3 stores your data across multiple geographically distant Availability Zones by default, compliance requirements might dictate that you store data at even greater distances. Cross-Region Replication allows you to replicate data between distant AWS Regions to meet those requirements.

*Minimize latency

If your customers are in two geographic locations, you can minimize latency in accessing objects by maintaining object copies in AWS Regions that are geographically closer to your users.

*Increase operational efficiency

If you have compute clusters in two different AWS Regions that analyse the same set of objects, you might choose to maintain object copies in both Regions.
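
A minimal replication sketch with boto3, assuming versioning is already enabled on both buckets and that an IAM role with replication permissions exists (the role ARN and bucket names are placeholders):

```python
import boto3

boto3.client("s3").put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        # Role S3 assumes to copy objects; placeholder ARN.
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [{
            "ID": "replicate-everything",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {},                                  # empty filter = all objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
        }]
    },
)
```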

8. Transfer Acceleration

AWS S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client machine and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront's globally distributed edge locations. As the data arrives at an edge location, it is routed to Amazon S3 over an optimized network path. Additional data transfer charges may apply when using Transfer Acceleration. Only the bucket owner can enable it, and it is most useful for leveraging the full bandwidth of your internet connection when regularly uploading gigabytes to terabytes of data.
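
A sketch of enabling acceleration and then uploading through the accelerated endpoint (bucket and file names are placeholders):

```python
import boto3
from botocore.config import Config

# One-time setup: the bucket owner enables acceleration on the bucket.
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket="my-bucket", AccelerateConfiguration={"Status": "Enabled"}
)

# Clients then route uploads through the CloudFront edge network.
accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
accel.upload_file("big-dataset.tar.gz", "my-bucket", "big-dataset.tar.gz")
```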

9. Pre-signed URLs

Every object uploaded to an AWS S3 bucket gets a unique URL, which is accessible according to the object's access-level permissions (public, private, or limited access). When an AWS user wants to give someone read or write access to an object for a limited time, they can create a pre-signed URL, which is signed with the user's credentials and grants access for a predetermined period.
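
Generating one is a single SDK call; a sketch with boto3 (bucket and key are placeholders):

```python
import boto3

url = boto3.client("s3").generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "private-report.pdf"},
    ExpiresIn=3600,  # the URL stops working after one hour
)
print(url)  # share this link; the recipient needs no AWS credentials
```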

10. Storage Classes

AWS S3 offers the following six storage classes; broadly, the faster and more available a class is, the more it costs. A short upload sketch follows the list.

*Standard

The Standard storage class is the fastest and most expensive, and data in it is replicated across at least three Availability Zones. This class is best for data that is accessed all the time, because first-byte latency is in the low milliseconds.

*Standard-IA

Standard-IA (Infrequent Access) matches the Standard class in performance, but storage is cheaper because each retrieval carries a per-GB fee, so it suits data that is read rarely.

*One Zone-IA

In One Zone-IA, objects are stored in only one Availability Zone to reduce the price, so the data is less resilient than in the Standard class: if that zone is lost, so is the data. Objects that are accessed infrequently, say once a month, are a good fit for this class.

*Glacier

The data that’s older than a month and which is hardly accessed by anyone should be moved to Glacier for reducing the storage cost to a fraction.

*Glacier Deep Archive

Glacier Deep Archive is used for data that must be kept for a year or more, typically enterprise operations data or data retained for legal compliance. It is the cheapest of all the storage classes, and data retrieval takes hours.

*Intelligent-Tiering

Intelligent-Tiering automatically monitors access patterns and moves each object to the most cost-efficient storage tier, so the least-accessed objects drift towards the Glacier-style archive tiers.
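
The storage class is chosen per object at upload time (or changed later by a lifecycle rule); a boto3 sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")

# Hot data: the default STANDARD class.
s3.put_object(Bucket="my-bucket", Key="hot.csv", Body=b"...")

# Rarely read data: pay less for storage, more per retrieval.
s3.put_object(Bucket="my-bucket", Key="monthly-report.csv", Body=b"...",
              StorageClass="STANDARD_IA")

# Unknown access pattern: let S3 move the object between tiers itself.
s3.put_object(Bucket="my-bucket", Key="maybe-used.csv", Body=b"...",
              StorageClass="INTELLIGENT_TIERING")
```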

For more insights, you can refer to the performance chart by AWS.

Who should use AWS S3?

A solutions architect incorporates AWS S3 into the solution architecture and, on deployment, directs the DevOps team to use it for storing the data.

When to use AWS S3?

Use AWS S3 when your project has a large amount of data that is growing at an unpredictable rate.

Where to use AWS S3?

A project that generates and accesses large amounts of sensitive data should use AWS S3 to manage access to the data reliably and protect it. These projects are usually enterprise-scale and cannot afford downtime.

Top comments (2)

Paweł Kowalski

Very good summary :)

Probably some cost analysis would help for those who want to go wild with Intelligent Tiering, as the pricing for it is dynamic, depending on objects observed.

One hard-learned lesson for me: if you have a bunch of files and change your lifecycle rules to delete them, S3 will delete them, and without a backup you have a problem. It works retroactively :)

webapp007

Thank You So Much For Your Feedback