1) What is Amazon S3?
Amazon S3 (Simple Storage Service) is an AWS service used to store files like images, videos, logs, backups, datasets, and reports as objects inside buckets.
- Bucket = main container (like a top-level folder)
- Object = the actual file (data + metadata)
S3 is widely used for:
- Data lakes
- Backups and disaster recovery
- Application logs
- Static website files
- Analytics and machine learning datasets
- Long-term archiving and compliance
2) Why does S3 have multiple storage classes?
Not all data is used in the same way:
- Some data is used daily (hot data)
- Some data is used sometimes (cold data)
- Some data is almost never used (archive data)
So AWS provides different S3 storage classes to help you balance:
- Cost – how much you pay for storage
- Speed – how fast you can read data
- Availability – how often data is accessible
- Risk – multi-AZ vs single-AZ
- Retrieval fee – extra cost when you download data in some classes
3) Key Terms
| Term | Simple Meaning | Easy Example |
|---|---|---|
| Durability | How safe your data is from being lost | Even if disks fail, AWS still keeps your file safe |
| 11 nines durability (99.999999999%) | Extremely high safety | “Almost never lost” |
| Availability | How often data is accessible | 99.99% means very little downtime |
| Latency | How fast you can access data | Milliseconds = very fast |
| Throughput | How much data can be read/written per second | Important for big analytics jobs |
| Retrieval fee | Extra cost when you download data | Some classes charge when you read |
| Availability Zone (AZ) | One data center inside a region | Multi-AZ is safer than single AZ |
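The 11-nines durability figure in the table above becomes easier to picture with a small back-of-envelope calculation (a sketch of the probability math, not an AWS guarantee; the ten-million-object count is a made-up example):

```python
# Back-of-envelope: what 99.999999999% (11 nines) durability means.
durability = 0.99999999999
annual_loss_probability = 1 - durability  # about 1e-11 per object per year

objects_stored = 10_000_000  # hypothetical: ten million objects in a bucket
expected_losses_per_year = objects_stored * annual_loss_probability

print(f"Expected object losses per year: {expected_losses_per_year:.4f}")
print(f"Years per single expected loss: {1 / expected_losses_per_year:,.0f}")
```

In other words, with ten million objects you would expect to lose roughly one object every 10,000 years, which is why the table summarizes it as "almost never lost".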
4) The 8 S3 Storage Classes
4.1 S3 Standard – Hot Data
- Used for frequently accessed and business-critical data.
Key Features:
- Very fast access (milliseconds): Suitable for real-time applications and user-facing systems.
- High availability: Designed to be available almost all the time for applications.
- Multi-AZ durability: Data is safely stored across multiple data centers.
- No retrieval fee: You don’t pay extra when reading or downloading data.
Use Cases:
- Website images and videos served to users
- Daily application logs used by engineers
- Active analytics datasets queried many times per day
- Frequently used ML training and inference data
Example: Today’s sales data used every hour → S3 Standard
Remember: Standard = Hot + Fast
4.2 S3 Intelligent-Tiering – AWS Decides Automatically
- For data where you don’t know how often it will be accessed.
Key Features:
- Automatic movement between tiers: AWS moves objects to cheaper tiers when access reduces.
- No performance impact: Applications access data the same way.
- Small monitoring fee: Charged for AWS to track access patterns.
Use Cases
- Data lakes where new data is hot and old data becomes cold
- ML datasets where some features are used more than others
- Analytics history that changes in access frequency
Example: Some months of logs are queried often, others not → Intelligent-Tiering
Remember: Intelligent = “I don’t know access pattern”
4.3 S3 Standard-IA – Cold but Fast
- For data accessed rarely, but must be accessed immediately when needed.
Key Features:
- Lower storage cost than Standard: Helps save money for infrequently used data.
- Fast access: Still milliseconds when you retrieve data.
- Retrieval fee applies: Extra cost when you download data.
- Multi-AZ durability: Safe across multiple data centers.
Use Cases:
- Backups used only during failures
- Disaster recovery data
- Old reports accessed occasionally
Example: Weekly backups restored only during failure → Standard-IA
Remember: IA = Rare, but fast
4.4 S3 One Zone-IA – Cheaper but Risky
- Same as Standard-IA, but stored in one Availability Zone only.
Key Features:
- Cheaper than Standard-IA: Cost saving for non-critical data.
- Single AZ risk: If that AZ goes down, data can be unavailable.
- Fast access: Still millisecond latency.
- Retrieval fee applies.
Use Cases:
- Re-creatable ETL outputs
- Temporary pipeline files
- Secondary backups
Example: Temporary pipeline files → One Zone-IA
Remember: One Zone = Cheap + Risk
4.5 S3 Glacier Instant Retrieval – Archive + Fast
- S3 Glacier Instant Retrieval is a storage class for archived data that is rarely accessed, but when you need it, you can open it immediately. It is mainly used for long-term storage where data is kept for compliance or record-keeping, but still needs instant access sometimes.
Key Features
- Very low storage cost
- Instant (milliseconds) access
- Retrieval fee applies
- Multi-AZ durability
Use Cases:
- Compliance documents that must open quickly
- Audit logs needed during investigations
Example: Legal docs opened only during audits → Glacier Instant
Remember: Glacier Instant = Archive + Fast
4.6 S3 Glacier Flexible Retrieval – Archive + Wait
- S3 Glacier Flexible Retrieval is used for archived data that is almost never accessed, and where you can wait minutes to hours before getting the data back. This class is mainly for long-term backups and historical data.
Key Features:
- Very low cost for long-term storage
- Multiple retrieval speeds: expedited (1–5 minutes), standard (3–5 hours), bulk (5–12 hours)
- Suitable for large archive restores
Use Cases:
- Old backups
- Historical logs
Remember: Flexible = Waiting is okay
4.7 S3 Glacier Deep Archive – Cheapest + Slowest
- S3 Glacier Deep Archive is the lowest-cost storage class in Amazon S3. It is used for data that must be kept for many years and is almost never accessed. This is mainly for legal, regulatory, and compliance requirements.
Key Features:
- Cheapest storage class
- Retrieval time 12–48 hours
- Best for compliance and legal retention
Use Cases:
- Financial records
- Government data
Remember: Deep Archive = Coldest + Slowest + Cheapest
4.8 S3 Express One Zone – Extra Fast, Single AZ
S3 Express One Zone is a storage class designed for very high-performance workloads. It is used when applications need very low latency and very high request rates for reading and writing data. Data is stored in only one Availability Zone, so it is faster but less resilient compared to multi-AZ classes.
Key Features:
- Ultra-fast performance for request-heavy workloads
- High throughput for many small reads/writes
- Stored in one AZ only (less resilient)
Use Cases:
- Real-time analytics
- ML feature stores
- Hot ETL intermediate data
Example: Pipeline reading millions of small files → Express One Zone
Remember: Express = Extra fast, One Zone = Single AZ
5) Comparison Table for All 8 S3 Storage Classes
| Storage Class | Access Pattern | Retrieval Speed | Storage Cost | Extra Cost | Availability / Risk | Best For |
|---|---|---|---|---|---|---|
| S3 Standard | Frequently accessed | Milliseconds | High | No | Multi-AZ, very safe | Hot data, websites, active logs |
| S3 Intelligent-Tiering | Unknown / changing | Milliseconds | Medium | Monitoring fee | Multi-AZ | Unpredictable workloads |
| S3 Standard-IA | Infrequent but fast needed | Milliseconds | Lower | Retrieval fee | Multi-AZ | Backups, DR |
| S3 One Zone-IA | Infrequent, non-critical | Milliseconds | Cheaper | Retrieval fee | Single AZ risk | Re-creatable data |
| S3 Glacier Instant Retrieval | Rare but instant needed | Milliseconds | Very low | Retrieval fee | Multi-AZ | Compliance archives |
| S3 Glacier Flexible Retrieval | Very rare access | Minutes → Hours | Very low | Retrieval fee | Multi-AZ | Old backups, logs |
| S3 Glacier Deep Archive | Almost never accessed | 12–48 hours | Lowest | Retrieval fee | Multi-AZ | Legal & long-term records |
| S3 Express One Zone | Very frequent, high-performance | Ultra-fast | Higher | Request-based pricing | Single AZ | High-performance analytics, ML |
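When you upload an object with the AWS SDK or CLI, each class in the table above is selected with a `StorageClass` value. A minimal reference mapping from the names used in this article to the values the S3 API accepts:

```python
# The 8 storage classes and the StorageClass values the S3 API accepts
# (e.g. boto3's put_object StorageClass parameter).
STORAGE_CLASS_API_VALUES = {
    "S3 Standard": "STANDARD",
    "S3 Intelligent-Tiering": "INTELLIGENT_TIERING",
    "S3 Standard-IA": "STANDARD_IA",
    "S3 One Zone-IA": "ONEZONE_IA",
    "S3 Glacier Instant Retrieval": "GLACIER_IR",
    "S3 Glacier Flexible Retrieval": "GLACIER",  # keeps the legacy name "GLACIER"
    "S3 Glacier Deep Archive": "DEEP_ARCHIVE",
    "S3 Express One Zone": "EXPRESS_ONEZONE",
}

for name, api_value in STORAGE_CLASS_API_VALUES.items():
    print(f"{name:32} -> {api_value}")
```

For example, uploading a backup directly into Standard-IA with boto3 would look like `s3.put_object(Bucket="my-backups", Key="db.dump", Body=data, StorageClass="STANDARD_IA")` (the bucket and key here are hypothetical).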
6) How to Choose Quickly
Ask yourself these 3 simple questions:
i) How often will the data be accessed?
- Daily or many times a day → S3 Standard
- Not sure / changes over time → S3 Intelligent-Tiering
- Rarely → Use IA or Glacier classes
ii) When needed, how fast must I get the data?
- Instant (milliseconds) → Standard, Standard-IA, Glacier Instant
- Can wait minutes or hours → Glacier Flexible
- Can wait 1–2 days → Glacier Deep Archive
iii) Is the data critical or can it be recreated?
- Critical data → Choose multi-AZ classes
- Non-critical or re-creatable data → Choose single-AZ classes
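The three questions above can be sketched as a small decision helper. This is a simplification (it leaves out Glacier Instant Retrieval, Express One Zone, minimum storage durations, and retrieval fees), and the function name and inputs are illustrative, not an AWS API:

```python
def choose_storage_class(access: str, max_wait: str, recreatable: bool) -> str:
    """Pick an S3 storage class from the three questions in the text.

    access:      "frequent", "unknown", or "rare"
    max_wait:    "instant", "hours", or "days" (acceptable retrieval delay)
    recreatable: True if the data can be rebuilt if lost
    """
    if access == "frequent":
        return "S3 Standard"
    if access == "unknown":
        return "S3 Intelligent-Tiering"
    # Rarely accessed data: decide by acceptable retrieval delay.
    if max_wait == "instant":
        if recreatable:
            return "S3 One Zone-IA"  # single-AZ risk is fine for rebuildable data
        return "S3 Standard-IA"
    if max_wait == "hours":
        return "S3 Glacier Flexible Retrieval"
    return "S3 Glacier Deep Archive"  # can wait 12-48 hours

print(choose_storage_class("rare", "instant", recreatable=False))  # S3 Standard-IA
print(choose_storage_class("rare", "days", recreatable=True))      # S3 Glacier Deep Archive
```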
Quick Mapping Table
| Scenario | Best Choice |
|---|---|
| App serving images every second | S3 Standard |
| Logs with changing access patterns | Intelligent-Tiering |
| Weekly backups | Standard-IA |
| Temporary ETL output | One Zone-IA |
| Compliance docs needing instant access | Glacier Instant |
| Large archive restores | Glacier Flexible |
| 10-year legal retention | Glacier Deep Archive |
| High-performance ML feature reads | S3 Express One Zone |
7) How to Remember
- Hot → Standard
- Unknown → Intelligent
- Cold → IA
- Very Cold → Glacier
- Coldest → Deep Archive
- Ultra-fast hot data → Express One Zone
8) What is Amazon S3 and What is a Bucket?
Amazon S3 (Simple Storage Service) is a cloud storage service provided by AWS. It is used to store files and data such as images, videos, logs, backups, datasets, and documents.
An Amazon S3 bucket is the main container where all your files (objects) are stored. You cannot upload a file directly to S3 without a bucket. Every file must be inside a bucket.
- Bucket is like a main folder
- Object is like a file inside the folder
Example: You create a bucket named company-data-bucket.
Inside this bucket, you store:
- logs/app-logs-2026.json
- reports/sales-jan.csv
- images/profile.png
Here, company-data-bucket is the bucket, and each file is an object.
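The folder-like paths in the example above are really just object keys: S3 has no real directories, and the "/" is only a naming convention that the console renders as folders. A small sketch that groups the example keys by their top-level prefix:

```python
# S3 keys look like file paths, but they are flat strings; "/" is just
# a convention that tools render as folders.
keys = [
    "logs/app-logs-2026.json",
    "reports/sales-jan.csv",
    "images/profile.png",
]

def group_by_prefix(object_keys):
    """Group object keys by their first path segment (the 'folder')."""
    groups = {}
    for key in object_keys:
        prefix = key.split("/", 1)[0] if "/" in key else ""
        groups.setdefault(prefix, []).append(key)
    return groups

print(group_by_prefix(keys))
# {'logs': ['logs/app-logs-2026.json'], 'reports': [...], 'images': [...]}
```

This prefix grouping is exactly what the S3 `ListObjectsV2` API does when you pass it a `Prefix` and `Delimiter="/"`.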
9) Basic Structure of Amazon S3
| Term | Meaning in Simple Words | Example |
|---|---|---|
| Bucket | The top-level container | company-analytics-bucket |
| Object | The actual file stored | 2026/jan/sales.csv |
| Key | The full path of the file inside the bucket | 2026/jan/sales.csv |
| Region | The AWS location where the bucket lives | us-east-1, ap-south-1 |
Important points:
- Each bucket belongs to one AWS region
- Your data is physically stored in that region
- You can access the bucket from anywhere if permissions allow it
10) Why Do We Need Amazon S3 Buckets?
Amazon S3 buckets are used to store and manage almost all types of data in the cloud.
Common real-world use cases:
- Data lakes: store raw data, logs, CSV, JSON, and Parquet files
- Backups: store database backups, server backups, and application backups
- Application files: store images, videos, and documents used by web and mobile apps
- Analytics and big data: store data for Athena, Glue, EMR, and Redshift Spectrum
- Static website hosting: store HTML, CSS, and JavaScript files for static websites
In short, Amazon S3 buckets are the foundation of data storage in AWS.
11) Amazon S3 Bucket Naming Rules
S3 bucket names follow strict global rules. These rules exist because bucket names are used in URLs and must work with the internet DNS system.
Rule 1: Globally Unique Name
Every bucket name must be globally unique across all AWS accounts and regions. If someone else has already created a bucket with that name, you cannot use it.
Example:
- mybucket may already be taken
- mycompany-analytics-2026 is more likely to be available
Rule 2: Length Rules
Bucket name length must be between 3 and 63 characters.
Rule 3: Allowed Characters
You can use only: lowercase letters (a–z), numbers (0–9), hyphens (-), and dots (.)
You cannot use: uppercase letters, underscores, spaces, or other special characters
Valid examples: my-data-bucket, company.logs.backup, analytics2026
Invalid examples:
- MyBucket
- my_bucket
- my bucket
Rule 4: Start and End with Letter or Number
Bucket names must start and end with a letter or number. They must not start or end with a hyphen or dot.
Rule 5: No IP Address Format
Bucket names cannot look like an IP address such as 192.168.1.1. This is because bucket names are used in URLs.
12) Why These Rules Exist
Amazon S3 buckets are accessed using web URLs like:
https://my-data-bucket.s3.amazonaws.com/file.csv
To make sure these URLs work correctly with internet routing, the DNS system, and SSL certificates, AWS enforces strict bucket naming rules.
13) Important Features of Amazon S3 Buckets
Region
When you create a bucket, you select a region. Your data stays in that region. This helps with low latency, cost control, and legal compliance.
Access Control
By default, buckets are private. You control access using:
- IAM users and roles
- Bucket policies
Public access is usually used only for public website content.
Versioning
Versioning keeps multiple versions of the same file. If someone overwrites or deletes a file, older versions are still stored. This helps with data recovery and mistake protection.
Encryption
Amazon S3 supports encryption to protect your data. Data can be encrypted at rest and in transit.
Encryption is important for security and compliance requirements.
Lifecycle Rules
Lifecycle rules help you automate storage management. You can move old data to cheaper storage classes or delete data after a fixed time. This helps reduce storage cost automatically.
14) Real-Life Example from Data Engineering
In a real data engineering project:
- New logs come every day
- Old logs are accessed rarely
- Compliance rules require keeping data for many years
You may create different buckets:
- company-raw-logs for daily logs
- company-processed-data for transformed data
- company-archive-data for long-term storage
Lifecycle rules can move old files automatically to cheaper storage classes.
15) How to Remember Amazon S3 Bucket Rules
Use the word BUCKET as a memory trick:
- B means Bucket is the main container
- U means Unique globally
- C means Characters allowed are lowercase letters, numbers, hyphens, and dots
- K means Keep name length between 3 and 63
- E means End with a letter or number
- T means Tied to one AWS region