15 Things that you must know about AWS S3 (Simple Storage Service)

aziz.amghar

1. S3 is a secure and scalable storage service

You can securely store your files (called objects) in S3; a single object can be up to 5 TB.

2. Object attributes:
S3 objects can have:

  • Key (name of the object)
  • Value (data)
  • Version ID.
  • Metadata (data about data you are storing)
  • Subresources: Access control list, torrents.

3. S3 naming convention:
There are some rules that you must respect when naming your S3 buckets:

  • No uppercase letters or underscores
  • 3-63 characters long
  • Not formatted as an IP address, and it must start with a lowercase letter or a number
  • S3 is a universal namespace, so each bucket name must be globally unique.
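
The rules above can be checked locally before trying to create a bucket. The following is a minimal sketch, not the official validation logic: the function name `is_valid_bucket_name` is hypothetical, and it only covers the rules listed here (S3 enforces a few more, and global uniqueness can only be verified by S3 itself).

```python
import ipaddress
import re

# Starts with a lowercase letter or digit, 3-63 characters in total,
# and contains only lowercase letters, digits, dots and hyphens.
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{2,62}$")


def is_valid_bucket_name(name: str) -> bool:
    """Check a candidate bucket name against the rules listed above."""
    if not _BUCKET_NAME.match(name):
        return False
    try:
        # A name that parses as an IPv4 address is rejected.
        ipaddress.IPv4Address(name)
        return False
    except ValueError:
        return True


print(is_valid_bucket_name("my-bucket-2020"))
print(is_valid_bucket_name("My_Bucket"))
```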

4. S3 has the following features:
  • Tiered storage
  • Lifecycle management
  • Versioning
  • Encryption
  • MFA Delete (multi-factor authentication): can only be configured in CLI mode.
  • Data secured using ACLs (Access Control Lists) and bucket policies.
  • Signed URLs: URLs that are valid only for a limited time (ex: premium video service for logged-in users)

5. S3 storage classes:

  • S3 Standard: 99.99% availability, 99.999999999% (11 nines) durability; it is the default storage class.
  • S3 Standard-IA (Infrequent Access)
  • S3 One Zone-IA
  • S3 Intelligent-Tiering
  • S3 Glacier (for data archiving, 99.999999999% durability of archives)
  • S3 Glacier Deep Archive (data retrieval within 12 hours)
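
Lifecycle management (mentioned in point 4) is what moves objects between these storage classes over time. Below is a rough sketch of a lifecycle configuration in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, the `logs/` prefix and the day thresholds are made-up values, and `storage_class_at` is a hypothetical helper that only simulates the rule locally.

```python
# Hypothetical rule: objects under "logs/" move to cheaper classes as they age.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-logs",  # assumed name
            "Filter": {"Prefix": "logs/"},  # assumed prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_at(age_days: int, rule: dict) -> str:
    """Return the storage class an object of the given age would have."""
    current = "STANDARD"  # default storage class
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 120, 400):
    print(age, storage_class_at(age, rule))
```

With boto3 installed and credentials configured, the dictionary would be passed as the `LifecycleConfiguration` argument.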

S3 Pricing Tiers:
You pay per:

  • Storage
  • Requests and data retrieval
  • Data transfer

By storage price, S3 Standard is the most expensive, followed by:

  • S3 Standard-IA
  • then S3 Intelligent-Tiering (the actual price depends on the access tier each object lands in)
  • then S3 One Zone-IA
  • then S3 Glacier
  • and finally S3 Glacier Deep Archive.

6. S3 Encryption:

Two types of encryption:

  • Encryption in transit: SSL/TLS
  • Encryption at rest, either server side or client side. There are three types of server-side encryption:
    • SSE-S3: keys managed by S3
    • SSE-KMS: keys managed in AWS Key Management Service
    • SSE-C: server-side encryption with customer-provided keys
  • With client-side encryption, you encrypt the data yourself before uploading it
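
To make the three server-side options more concrete, here is a small sketch that builds the extra arguments one would pass to boto3's `put_object` for each mode. The helper name `sse_upload_args` and the mode labels are assumptions for illustration; the parameter names (`ServerSideEncryption`, `SSEKMSKeyId`, `SSECustomerAlgorithm`, `SSECustomerKey`) are the ones boto3 uses.

```python
from typing import Optional


def sse_upload_args(mode: str,
                    kms_key_id: Optional[str] = None,
                    customer_key: Optional[bytes] = None) -> dict:
    """Build the encryption-related arguments for an upload (hypothetical helper)."""
    if mode == "SSE-S3":
        # S3 manages the keys and encrypts with AES-256.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        # The caller supplies the key with every request; S3 does not store it.
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown mode: {mode}")


print(sse_upload_args("SSE-S3"))
print(sse_upload_args("SSE-KMS", kms_key_id="alias/my-key"))  # assumed key alias
```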

8. S3 Security:

  • User based: IAM policies.
  • Resource based, which can be managed in three ways:
    • Bucket policies, used to:
      • Grant public access to the bucket
      • Force objects to be encrypted at upload
      • Grant access to another account (cross-account)
    • Object ACLs
    • Bucket ACLs
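
As an illustration of the "force encryption at upload" use case, the sketch below builds a bucket policy that denies any `s3:PutObject` request that does not carry the server-side encryption header. The bucket name and the function name are placeholders; the resulting JSON string is what one would pass to a call such as boto3's `put_bucket_policy`.

```python
import json


def deny_unencrypted_uploads_policy(bucket: str) -> str:
    """Return a bucket policy (as JSON) that rejects unencrypted uploads."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # "Null": "true" matches requests where the header is absent.
                "Condition": {
                    "Null": {"s3:x-amz-server-side-encryption": "true"}
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(deny_unencrypted_uploads_policy("my-example-bucket"))  # assumed bucket name
```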

9. S3 CORS:

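  • If a web page served from one origin (for example, a static website hosted in one bucket) requests data from another S3 bucket, you need to enable CORS on the bucket being requested.
  • Cross-Origin Resource Sharing lets you restrict which websites can request your files from a browser, which can help limit your costs.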
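
A CORS configuration is a list of rules. The sketch below shows one in the dictionary shape accepted by boto3's `put_bucket_cors`, together with a deliberately simplified local check (`origin_allowed`, a hypothetical helper) of how a rule matches an incoming origin and method. The origin `https://www.example.com` is a placeholder, and real S3 matching also supports a wildcard inside an origin, which this sketch ignores.

```python
cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://www.example.com"],  # assumed website
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }
    ]
}


def origin_allowed(config: dict, origin: str, method: str) -> bool:
    """Simplified check: exact origin match (or "*") and an allowed method."""
    for rule in config["CORSRules"]:
        origins = rule["AllowedOrigins"]
        if method in rule["AllowedMethods"] and ("*" in origins or origin in origins):
            return True
    return False


print(origin_allowed(cors_configuration, "https://www.example.com", "GET"))
print(origin_allowed(cors_configuration, "https://other.example.org", "GET"))
```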

10. Consistency Model

  • Since December 2020, S3 provides strong read-after-write consistency for all PUTS and DELETES:
    • As soon as an object is written, we can retrieve it (ex: PUT 200 -> GET 200)
    • A read after an update or a delete reflects the latest state (ex: PUT 200 -> PUT 200 -> GET 200 returns the new version; DELETE 204 -> GET 404)
  • Before that change, the model was weaker, which is worth knowing when reading older material:
    • PUTS of new objects were read-after-write consistent, except if we did a GET before to see if the object existed (ex: GET 404 -> PUT 200 -> GET 404)
    • DELETES and PUTS of existing objects were eventually consistent: after an update we might get the older version, and after a delete we might still retrieve the object for a short time

11. S3 Access Logs:

  • For audit purposes
  • Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
  • That data can be analyzed using data analysis tools like Athena.

12. S3 pre-signed URLs:

  • You can generate pre-signed URLs using the SDK or the CLI:
    • For downloads (easy, can use the CLI)
    • For uploads (harder, must use the SDK)
  • Valid for a default of 3600 seconds; you can change the timeout with the --expires-in [TIME_BY_SECONDS] argument
  • Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT.
  • Examples:
    • Allow only logged-in users to download a premium video from your S3 bucket
    • Allow an ever-changing list of users to download files by generating URLs dynamically
    • Temporarily allow a user to upload a file to a precise location in your bucket
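
In practice you would let the SDK or the CLI (`aws s3 presign`) build these URLs. To show what a pre-signed URL actually contains, here is a standard-library sketch of AWS Signature Version 4 query-string signing for a GET. The function name, the global `s3.amazonaws.com` virtual-hosted endpoint and the credentials in the usage line are assumptions; the credentials are placeholders, not real keys.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(bucket, key, access_key, secret_key,
                    region="us-east-1", expires_in=3600, now=None):
    """Build a time-limited GET URL signed with the caller's credentials."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.amazonaws.com"  # assumed endpoint style
    scope = f"{datestamp}/{region}/s3/aws4_request"
    canonical_uri = "/" + quote(key, safe="/")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_in),  # validity in seconds
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        "GET", canonical_uri, canonical_query,
        f"host:{host}\n",    # canonical headers, each ends with a newline
        "host",              # signed headers
        "UNSIGNED-PAYLOAD",  # the body is not part of the signature
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = _sign(("AWS4" + secret_key).encode("utf-8"), datestamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _sign(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"


print(presign_get_url("my-example-bucket", "videos/premium.mp4",
                      "AKIAEXAMPLE", "not-a-real-secret", expires_in=900))
```

Because the signature is derived from the generator's secret key, whoever holds the URL acts with that identity's permissions until `X-Amz-Expires` runs out.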

13. S3 Performance:

  • Baseline Performance:
    • S3 scales automatically to high request rates, with a latency of 100-200 ms
    • Your app can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket.
  • KMS Limitation:
    • If you use SSE-KMS, you may be impacted by the KMS limits
    • When you upload, S3 calls the GenerateDataKey KMS API
    • When you download, it calls the Decrypt KMS API
    • Both calls count towards the KMS quota per second (5,500, 10,000 or 30,000 req/s depending on the region)
    • You can request a quota increase through the Service Quotas console
  • Multi Part upload:
    • Recommended for files > 100 MB, required for files > 5 GB
    • Can help parallelize uploads (the file is divided into parts, which speeds up transfers)
  • S3 Transfer Acceleration (upload only)
  • S3 byte-range fetches:
    • Parallelize GETs by requesting specific byte ranges
    • Better resilience in case of failures
    • Can be used to speed up downloads
    • Can be used to retrieve only partial data (for example the head of a file)
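
Both multipart uploads and byte-range fetches rely on the same idea: cut the object into fixed-size chunks that can be transferred in parallel. The sketch below only computes the `Range` header values for such a split; `byte_ranges` is a hypothetical helper, and the sizes in the usage line are arbitrary.

```python
from typing import List


def byte_ranges(total_size: int, chunk_size: int) -> List[str]:
    """Split an object of total_size bytes into HTTP Range header values."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    ranges = []
    for start in range(0, total_size, chunk_size):
        # Range end offsets are inclusive, hence the "- 1".
        end = min(start + chunk_size, total_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


# A 20 MiB object fetched in 8 MiB chunks gives three ranges.
for header in byte_ranges(20 * 1024 * 1024, 8 * 1024 * 1024):
    print(header)
```

Each value can be sent as the `Range` header of a separate GET, and a failed chunk can be retried on its own. Requesting only the first range is how you would read just the head of a file.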

14. S3 Select & Glacier Select:

  • Retrieve less data using SQL by performing server-side filtering
  • Can filter by rows & columns (simple SQL statements)
  • Less network transfer, less CPU cost client side.
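
As a rough illustration of "filter by rows & columns", the sketch below builds the arguments for an S3 Select query over a CSV object, in the shape boto3's `select_object_content` expects. The bucket, key and column names (`name`, `age`) are invented, and `select_request` is a hypothetical helper.

```python
def select_request(bucket: str, key: str, min_age: int) -> dict:
    """Build the arguments for an S3 Select query over a CSV with a header row."""
    expression = (
        "SELECT s.name, s.age FROM S3Object s "   # keep only two columns
        f"WHERE CAST(s.age AS INT) > {int(min_age)}"  # keep only matching rows
    )
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Use the first CSV line as column names.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


request = select_request("my-example-bucket", "users.csv", 30)  # assumed names
print(request["Expression"])
```

Only the rows and columns matched by the expression leave S3, which is where the savings in network transfer and client-side CPU come from.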

15. Object & Glacier Vault Lock:

Do you know any other S3 functionality that I didn't mention? Please feel free to post it in the comments.
