Object Storage in the Cloud
What is AWS S3?
S3 is an AWS service that provides secure, durable, and highly scalable object storage, allowing you to store and retrieve any amount of data from anywhere on the web at a very low cost. It is also very simple to use.
It manages data as objects, rather than as files in a file system or as data blocks.
You can upload any file type to S3, but you can't run a database or an operating system from it.
There is no limit on the number of objects you can store.
S3 is also free for the first 5 GB of storage under the AWS Free Tier.
The maximum size of a single object is 5 terabytes. S3 bucket names must be globally unique.
The URL to your S3 bucket will be something like https://your-bucket-name.s3.us-east-1.amazonaws.com.
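That URL pattern (the virtual-hosted-style addressing S3 uses) can be built with a small helper. This is a sketch; the bucket and key names are placeholders:

```python
def bucket_url(bucket: str, key: str = "", region: str = "us-east-1") -> str:
    """Build the virtual-hosted-style URL for an S3 bucket or object.

    bucket/key/region here are illustrative; substitute your own values.
    """
    base = f"https://{bucket}.s3.{region}.amazonaws.com"
    return f"{base}/{key}" if key else base

# e.g. bucket_url("my-bucket", "photos/cat.jpg")
# → "https://my-bucket.s3.us-east-1.amazonaws.com/photos/cat.jpg"
```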
S3 is designed to be highly available and highly durable. Availability ranges from 99.5% to 99.99%, depending on the S3 tier.
When it comes to durability, S3 is designed for eleven 9's: 99.999999999%.
S3 has tiered storage, and you can set lifecycle management rules to move objects between tiers. You can also store multiple versions of the same object, which means you can roll back to a previous version of your file in case something goes south. There is also encryption, and access control lists (ACLs) to define who can access the S3 bucket.
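A lifecycle rule is just a small configuration document. Below is a minimal sketch in the shape that boto3's put_bucket_lifecycle_configuration call accepts; the rule ID, prefix, and day counts are illustrative choices, not AWS defaults:

```python
# Transition objects under "logs/" to Glacier after 90 days,
# then delete them after a year.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-then-expire",      # illustrative rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},    # only keys under logs/
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}
```

You would attach this with `s3.put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=lifecycle_config)`.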
Inside S3, there are different storage classes for different use cases:
S3 Standard
S3 Standard-Infrequent Access
S3 One Zone-Infrequent Access
S3 Glacier
S3 Glacier Deep Archive
S3 Intelligent-Tiering
S3 Standard: high availability and durability; objects are always stored redundantly across multiple devices in multiple Availability Zones (three or more).
S3 Standard-Infrequent Access: for data that is accessed less frequently but still requires fast access when it is requested. You pay a per-GB retrieval fee to access the data. Eleven 9's of durability and 99.9% availability. Great for long-term storage, backups, and disaster recovery files.
S3 One Zone-Infrequent Access: the same as Standard-Infrequent Access, but data is stored within only a single Availability Zone, so it costs about 20% less. 99.5% availability.
S3 Glacier: very cheap storage optimized for data that is very infrequently accessed; you pay per retrieval. It is used for archiving data and provides long-term archiving with retrieval times ranging from minutes to 12 hours, mostly for historical data that will only be accessed a few times per year.
S3 Glacier Deep Archive: like S3 Glacier, but with a default retrieval time of 12 hours. Perfect for data accessed once or twice per year.
S3 Intelligent-Tiering: for when you don't know whether you will access the data frequently or not. It automatically moves your data to the most cost-effective tier based on access frequency.
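When uploading, you pick a class by passing a StorageClass value to put_object or upload_file. The string constants below are the real API values; the chooser heuristic wrapped around them is a hypothetical sketch mirroring the use cases above, not an AWS recommendation:

```python
# Map the use cases described above to the StorageClass values
# accepted by boto3's put_object / upload_file.
STORAGE_CLASSES = {
    "frequent": "STANDARD",
    "infrequent": "STANDARD_IA",
    "infrequent-single-az": "ONEZONE_IA",
    "archive": "GLACIER",
    "deep-archive": "DEEP_ARCHIVE",
    "unknown": "INTELLIGENT_TIERING",
}

def storage_class_for(access_pattern: str) -> str:
    """Hypothetical chooser: fall back to STANDARD for anything unrecognized."""
    return STORAGE_CLASSES.get(access_pattern, "STANDARD")
```

For example, `s3.put_object(Bucket="my-bucket", Key="old-report.pdf", Body=data, StorageClass=storage_class_for("archive"))` would land the object directly in Glacier.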
Performance across the S3 storage classes comes down to a trade-off: all classes are designed for the same eleven 9's of durability, but cost per GB falls and retrieval latency rises as you move from Standard toward the archive tiers, while availability varies from 99.5% to 99.99%.
Securing S3 buckets
S3 buckets are private by default: only the bucket owner can upload, read, or delete files, public access is turned off, and there is no anonymous access. If you do want to allow access, for example read access for anonymous users, you can enable that at the bucket level. You can also use bucket policies to define permissions for access to your bucket, and create ACLs at the object level, so that different objects inside the same bucket can have different levels of access. Finally, you can keep access logs.
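A bucket policy is a JSON document. Here is a minimal sketch of the well-known public-read policy (the Sid is an illustrative label), which you would attach with boto3's put_bucket_policy:

```python
import json

def public_read_policy(bucket: str) -> str:
    """Bucket policy granting anonymous read access to every object."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",   # illustrative label
                "Effect": "Allow",
                "Principal": "*",               # anyone, including anonymous users
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy)
```

Usage: `s3.put_bucket_policy(Bucket="my-bucket", Policy=public_read_policy("my-bucket"))`. Note that the bucket's Block Public Access settings must also permit this, since they override bucket policies.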
S3 and Encryption
You can encrypt your data in S3 in a few different ways. Data can be encrypted in transit via SSL/TLS, meaning HTTPS.
You can also select server-side encryption when you upload a file, or enable it on existing files. The default option, SSE-S3, uses AES-256 encryption with keys managed by S3. Alternatively, you can use SSE-KMS, where objects are encrypted with keys provided by the AWS Key Management Service; this option also lets you see who used a key and when.
If you want to use customer-provided keys, there is also that option, which is called SSE-C.
You can also encrypt the files yourself before uploading them to the S3 bucket (client-side encryption).
Finally, you can also enforce encryption via bucket policy.
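Enforcing encryption via bucket policy is typically done with a Deny statement that rejects any PutObject request arriving without a server-side-encryption header. A minimal sketch (the Sid is an illustrative label):

```python
import json

def require_sse_policy(bucket: str) -> str:
    """Deny any upload that does not set the s3:x-amz-server-side-encryption header."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",   # illustrative label
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # "Null": true matches requests where the header is absent.
                "Condition": {
                    "Null": {"s3:x-amz-server-side-encryption": "true"}
                },
            }
        ],
    }
    return json.dumps(policy)
```

Because Deny always wins over Allow in IAM evaluation, this blocks unencrypted uploads regardless of what other permissions a caller has.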
S3 is probably the easiest AWS service to get started with, in my opinion. Go ahead, create a few buckets, and experiment with it and its options!