AWS S3: The Basics

#aws #s3

Disclaimer: This is the second revision of an earlier post of mine on this same topic. I felt I could improve it, and this time I'm committing to expanding it into a series.

Storing and downloading files and data is a common activity for most applications. Where do you centralize this information? Amazon Simple Storage Service (S3) is a great option!

In this post we'll learn the following:

What Amazon S3 is.
How it works at a high level.
Why you might need it.

Let's get to it!

What is S3?

S3 is the object storage service of Amazon Web Services (AWS), built to reliably store and retrieve data. Some concrete examples of things you can store include:

Your entire static website:
- HTML, CSS, JS
Assets for your apps for later retrieval through a Content Delivery Network:
- Images, videos
Build files to keep a history and implement rollbacks:
- .zip, .war
Data exports for your customers:
- CSV, XLS, JSON
Logs for later analytics processing:
- .log, .txt

S3 is also one of the most used services of AWS since it provides integrations with many of its other offerings.

How is data stored?

S3 stores our data as objects inside buckets. You can think of a bucket as a top-level directory on your drive. Each object inside it is identified by a key, which acts as the path to that object.
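To make the bucket/key split concrete, here is a minimal sketch that takes an address written in the common `s3://bucket/key` form and separates it into its two parts. The function name `split_s3_uri` and the bucket and key values are made up for illustration; nothing here talks to AWS.

```python
def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key address into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


bucket, key = split_s3_uri("s3://my-bucket/images/2024/photo.png")
print(bucket)  # my-bucket
print(key)     # images/2024/photo.png
```

Note that the "folders" in the key are only part of its name: the key is a single string, and the slashes are a naming convention.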

Once we have objects in our buckets, we can retrieve them in many different ways. The following are the most common:

From the AWS console.
With the AWS SDKs.
Through the REST API.
With a signed URL.

We'll talk about each one in a different post.
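As a small preview of the REST API option, the sketch below builds the virtual-hosted-style URL that addresses an object. The helper name `object_url`, the bucket, the key, and the region are assumptions for illustration. A plain GET on such a URL only succeeds if the object is publicly readable; otherwise the request has to be signed.

```python
from urllib.parse import quote


def object_url(bucket: str, key: str, region: str) -> str:
    """Build the virtual-hosted-style URL for an object (no request is sent)."""
    # Percent-encode the key, but keep "/" so the path stays readable.
    encoded_key = quote(key, safe="/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{encoded_key}"


print(object_url("my-bucket", "photo.png", "us-east-1"))
# https://my-bucket.s3.us-east-1.amazonaws.com/photo.png
```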

Now, in terms of security, you probably don't want all your files to be publicly available. Data leaks are common, and many of them happen because customers leave their S3 buckets public.

S3 allows us to create policies and rules that explicitly define who has access to each bucket and/or object in our account. The available options to restrict or allow access are overwhelming and often confusing, even to experienced people. That's a big part of why you see so many data leaks.

I will not dive into the permissions options in this post. Right now, our takeaway is that you can, somehow, restrict access to your data.
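Just to give a feel for what "restricting access" looks like, here is a minimal sketch of a bucket policy document built as a Python dictionary. It grants read access on every object in one bucket to a single IAM user. The function name, the `Sid` value, the bucket name, the account ID, and the user name are all placeholders, and this is one possible shape of a policy, not a recommended configuration.

```python
import json


def read_only_policy(bucket: str, principal_arn: str) -> dict:
    """Return a bucket policy allowing one principal to read objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadForOnePrincipal",  # free-form label (placeholder)
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                # "/*" targets the objects inside the bucket, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


policy = read_only_policy(
    "my-bucket",
    "arn:aws:iam::123456789012:user/example-user",  # placeholder account and user
)
print(json.dumps(policy, indent=2))
```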

S3 Consistency model

A consistency model is a set of rules that, if followed, guarantee consistent results of reading, writing and/or updating your data.

S3's consistency model is called Read-after-Write consistency. For example, if you issue a PUT request to create an object in a bucket, the next GET request will ALWAYS return the desired object.

However, if you try to issue requests in the following order, immediately one after the other:

GET, when the object doesn't exist
PUT
GET

You MIGHT not find the desired object. When requests are issued in this order, the consistency model you get is called eventual consistency. In other words:

Read-after-Write consistency:

PUT /my-bucket/photo.png -> 200 ok
GET /my-bucket/photo.png -> 200 ok

Eventual consistency:

GET /my-bucket/photo.png -> 404 not found
PUT /my-bucket/photo.png -> 200 ok
GET /my-bucket/photo.png -> 404 not found

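Some changes made to your S3 buckets need time to propagate and replicate through AWS servers. For example, when you delete an object you also get eventual consistency; you MIGHT see an object listed in your bucket even though you deleted it a couple of seconds ago. Keep in mind that this describes S3's original behavior: since December 2020, S3 provides strong read-after-write consistency for PUT and DELETE requests on objects, as well as for list operations, so these stale reads should no longer show up on current buckets.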
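On any storage system where a read can lag behind a write, as in the eventual-consistency sequence above, a common client-side pattern is to retry the read a few times with a growing pause. The sketch below is a minimal, hypothetical version of that idea: `NotFound`, `get_with_retry`, and `flaky_fetch` are invented names, and the "storage" is faked in memory so the example runs without AWS.

```python
import time


class NotFound(Exception):
    """Hypothetical error meaning 'the object is not visible yet'."""


def get_with_retry(fetch, attempts=5, delay=0.1):
    """Call fetch() until it succeeds, doubling the pause after each miss."""
    for attempt in range(attempts):
        try:
            return fetch()
        except NotFound:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay * 2 ** attempt)


# Fake reader that misses twice before the object "appears".
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise NotFound("photo.png")
    return b"image bytes"


data = get_with_retry(flaky_fetch, delay=0.01)
print(data, calls["count"])  # b'image bytes' 3
```

The attempt limit matters: without it, a key that truly doesn't exist would keep the client retrying forever.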

Why do I need S3?

If you need to store files or data that don't fit in a database, and you need them to be available from the internet, you probably need a solution like S3.

Why S3 specifically? Here are some reasons:

Virtually unlimited storage space.
Stored objects are designed for 99.99% availability by default (we'll talk more about this in another post).
Tight access restrictions on objects, depending on your security requirements.
Pay for what you use: space occupied and requests.
Integration with most of the other AWS services.
Low entry barrier.
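To put the 99.99% availability figure from the list above in perspective, here is a quick back-of-the-envelope calculation of how much unavailability that percentage allows over a year. The function name is made up, and the result is plain arithmetic on the percentage, not a statement about how S3 actually behaves.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year that fall outside the given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


print(round(max_downtime_minutes(99.99), 2))  # 52.56
```

In other words, 99.99% leaves room for a bit under an hour of unavailability per year.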

S3 has many more interesting features. The ones mentioned before are the ones I consider most important.

Wrap up

Whenever you are evaluating a storage solution, S3 will probably be in the top three. It's a highly secure, available, and performant service that can solve most of your storage problems.

In the next post we'll create our first buckets and objects with S3. Also, we'll cover how to properly secure them 🔐. Stay tuned!

Thanks for reading 💜.
