David Ojeda

AWS S3: The Basics

Disclaimer: This is the second revision of a post I wrote earlier on this same topic. I felt I could improve it, and I'm now committing to expanding it into a series.


Storing and downloading files and data is a common activity for most applications. Where do you centralize this information? Amazon Simple Storage Service (S3) is a great option! 

In this post we'll learn the following:

  • What Amazon S3 is.
  • How it works at a high level.
  • Why you might need it.

Let's get to it!

What is S3?

S3 is the object storage solution of Amazon Web Services (AWS) to reliably store and retrieve data. Some concrete examples of things you can store include:

  • Your entire static web site:
    • HTML, CSS, JS
  • Assets for your apps for later retrieval through a Content Delivery Network:
    • Images, videos
  • Build files to keep a history and implement rollbacks:
    • .zip, .war
  • Data exports for your customers:
    • CSV, XLS, JSON
  • Logs for later analytics processing:
    • .log, .txt

S3 is also one of the most used services of AWS since it provides integrations with many of its other offerings. 

How is data stored?

S3 stores our data as objects inside buckets. You can think of a bucket as a top-level directory on your drive. Each object inside it is identified by a key, which works like the path to the object.

Figure: S3 directory structure
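
To make the bucket/key split concrete, here is a minimal Python sketch that takes an address in the `s3://bucket/key` form and separates the two parts. The bucket name, the key, and the helper name `split_s3_uri` are made up for illustration. Note that the key is really one flat string: the slashes in it only look like folders.

```python
from typing import NamedTuple


class ObjectLocation(NamedTuple):
    bucket: str  # the top-level container
    key: str     # the full "path" of the object inside the bucket


def split_s3_uri(uri: str) -> ObjectLocation:
    """Split an s3://bucket/key URI into its bucket and key parts."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"URI needs both a bucket and a key: {uri!r}")
    return ObjectLocation(bucket, key)


# Hypothetical bucket and key, for illustration only.
location = split_s3_uri("s3://my-bucket/images/2020/photo.png")
print(location.bucket)  # my-bucket
print(location.key)     # images/2020/photo.png
```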

Once we have objects in our buckets, we can retrieve them in many different ways. The following are the most common: 

  • From the AWS console.
  • With the AWS SDKs.
  • Through the REST API.
  • With a signed URL.

We'll talk about each one in a different post. 
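
As a small preview of the REST API option, this sketch builds the HTTPS address of an object from its bucket, region, and key, using S3's virtual-hosted-style URL format. The bucket name, region, and the helper name `object_url` are assumptions for the example. A plain GET on such a URL only works if the object is readable by the caller; otherwise the request has to be signed.

```python
from urllib.parse import quote


def object_url(bucket: str, key: str, region: str) -> str:
    """Build the virtual-hosted-style HTTPS URL of an S3 object."""
    # quote() percent-encodes unsafe characters but leaves "/" alone by
    # default, so the "folder" separators in the key survive.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


# Hypothetical bucket, key, and region, for illustration only.
print(object_url("my-bucket", "photo.png", "us-east-1"))
# https://my-bucket.s3.us-east-1.amazonaws.com/photo.png
```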

Now, in terms of security, you probably don't want all your files to be publicly available. Data leaks are common, and they often happen because customers leave their S3 buckets public.

S3 allows us to create policies and rules that explicitly define who has access to each bucket and/or object in our account. The available options to restrict or allow access are overwhelming and often confusing, even to experienced people. That's one reason you see so many data leaks.

I will not dive into the permissions options in this post. Right now, our takeaway is that you can, somehow, restrict access to your data.
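
Without going into the options, it may still help to see what a "policy" looks like. The sketch below builds a bucket policy document, which is plain JSON, that lets a single AWS identity read objects from one bucket. The bucket name, the account ID `123456789012`, the user name, and the helper name `read_only_policy` are all placeholders, and a real setup would likely need more than this one statement.

```python
import json


def read_only_policy(bucket: str, principal_arn: str) -> str:
    """Return a bucket policy (as JSON text) granting read access to one principal."""
    policy = {
        "Version": "2012-10-17",  # version of the policy language, not a date you pick
        "Statement": [
            {
                "Sid": "AllowReadForOnePrincipal",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",                # read objects only
                "Resource": f"arn:aws:s3:::{bucket}/*",  # every object in the bucket
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder bucket and identity, for illustration only.
print(read_only_policy("my-bucket", "arn:aws:iam::123456789012:user/example-user"))
```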

S3 Consistency model

A consistency model is a set of rules that, if followed, guarantee consistent results of reading, writing and/or updating your data.

S3's consistency model is called strong read-after-write consistency. For example, if you issue a PUT request to create an object in a bucket, the next GET request will ALWAYS return the desired object. Since December 2020, this also holds when you overwrite or delete an object, and when you list the objects in a bucket.

Before that change, S3 made an exception. If you issued requests in the following order, immediately one after the other:

  1. GET, when the object doesn't exist
  2. PUT
  3. GET

You MIGHT not have found the desired object. The consistency model for requests issued in this order was called eventual consistency. In other words:

Read-after-Write consistency:

  • PUT /my-bucket/photo.png -> 200 ok
  • GET /my-bucket/photo.png -> 200 ok

Eventual consistency (S3 before December 2020):

  • GET /my-bucket/photo.png -> 404 not found
  • PUT /my-bucket/photo.png -> 200 ok
  • GET /my-bucket/photo.png -> 404 not found

Back then, deletes were eventually consistent too; you MIGHT have seen an object listed in your bucket even though you had deleted it a couple of seconds earlier. Today, the changes that still need time to propagate through AWS servers are changes to the bucket configuration itself. For example, if you delete a bucket and immediately list all your buckets, the deleted bucket MIGHT still appear in the list.
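
The two request sequences above can be replayed against a toy in-memory model. This is only an illustration of how a remembered "not found" answer can outlive a PUT; it is not how S3 is implemented, and the class and method names are made up.

```python
class ToyStore:
    """Toy model of a store that briefly remembers 'not found' answers.

    Illustration only; this is not how S3 is implemented.
    """

    def __init__(self) -> None:
        self._objects = {}          # key -> body
        self._stale_misses = set()  # keys recently reported as missing

    def get(self, key: str) -> int:
        if key in self._stale_misses:
            return 404  # stale answer, served before the PUT has propagated
        if key not in self._objects:
            self._stale_misses.add(key)  # remember that we answered "not found"
            return 404
        return 200

    def put(self, key: str, body: bytes) -> int:
        self._objects[key] = body
        return 200

    def propagate(self) -> None:
        """Simulate time passing: remembered 'not found' answers expire."""
        self._stale_misses.clear()


key = "my-bucket/photo.png"

# PUT, then GET: the object is found right away.
fresh = ToyStore()
print(fresh.put(key, b"..."), fresh.get(key))  # 200 200

# GET, PUT, GET: the second GET still sees the remembered 404.
probed = ToyStore()
print(probed.get(key), probed.put(key, b"..."), probed.get(key))  # 404 200 404
probed.propagate()
print(probed.get(key))  # 200
```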

Why do I need S3?

If you need to store files or data that don't fit in a database, and you need them to be available from the internet, you probably need a solution like S3.

Why S3 specifically? Here are some reasons:

  • Virtually unlimited storage space.
  • S3 Standard, the default storage class, is designed for 99.99% availability (We'll talk more about this in another post).
  • Tight access restrictions on objects, depending on your security requirements.
  • Pay for what you use: space occupied and requests.
  • Integration with most of the other AWS services.
  • Low entry barrier.

S3 has many more interesting features. The ones mentioned before are the ones I consider most important.
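
To put the 99.99% availability figure from the list in perspective, here is a quick back-of-the-envelope calculation of how much unavailability that percentage leaves room for over a year. It treats the figure as a simple fraction of time, which is an assumption made for illustration and not a statement about how AWS measures it.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_percent: float, days: int = 365) -> float:
    """Minutes of unavailability that fit within an availability percentage."""
    unavailable_fraction = (100 - availability_percent) / 100
    return days * MINUTES_PER_DAY * unavailable_fraction


print(f"{allowed_downtime_minutes(99.99):.1f} minutes per year")
# 52.6 minutes per year
```

In other words, 99.99% leaves room for a bit under an hour of unavailability per year.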

Wrap up

Whenever you are evaluating a storage solution, S3 will probably be in the top three. It's a highly secure, available, and performant service that can solve most of your storage problems.

In the next post we'll create our first buckets and objects with S3. Also, we'll cover how to properly secure them 🔐. Stay tuned!

Thanks for reading 💜.
