Amazon S3 Service Cheat Sheet

S3 Fundamentals

1. Amazon S3 (Simple Storage Service) is an object storage service offering industry-leading scalability, data availability, security, and performance.

2. S3 Storage Classes:

| Storage Class | Designed For | Availability | Min Storage Duration | Retrieval Fee | Use Cases |
|---|---|---|---|---|---|
| S3 Standard | Frequently accessed data | 99.99% | None | None | Big data analytics, content distribution |
| S3 Intelligent-Tiering | Data with unknown or changing access patterns | 99.9% | None | None | Long-lived data with unpredictable access patterns |
| S3 Standard-IA | Infrequently accessed data | 99.9% | 30 days | Per GB | Backups, disaster recovery |
| S3 One Zone-IA | Infrequently accessed, non-critical data | 99.5% (single AZ) | 30 days | Per GB | Secondary backups, easily recreatable data |
| S3 Glacier Instant Retrieval | Archive data needing immediate access | 99.9% | 90 days | Per GB | Media archives, healthcare records |
| S3 Glacier Flexible Retrieval | Archive data that rarely needs access | 99.99% | 90 days | Per GB + retrieval | Digital preservation, compliance archives |
| S3 Glacier Deep Archive | Long-term archive | 99.99% | 180 days | Per GB + retrieval | Financial records, healthcare data |
| S3 Outposts | On-premises S3 storage | Varies | None | None | Local data processing with S3 compatibility |

3. S3 Bucket Naming Rules: Globally unique across all AWS accounts; 3-63 characters; lowercase letters, numbers, hyphens, and dots; must begin and end with a letter or number; must not be formatted as an IP address.

4. S3 Object Properties: Key (name), Value (data), Version ID, Metadata, Subresources (ACLs; the Torrent subresource is legacy and no longer supported).

5. S3 Object Size Limits: Individual objects range from 0 bytes to 5 TB; a single PUT is capped at 5 GB, so larger objects must use multipart upload.

S3 Performance and Optimization

6. S3 Performance: S3 automatically scales to high request rates, with first-byte latency typically in the 100-200 ms range.

7. S3 Request Rates:

  • 3,500 PUT/COPY/POST/DELETE requests per second per prefix
  • 5,500 GET/HEAD requests per second per prefix

8. S3 Performance Optimization Techniques:

| Technique | Description | Best For |
|---|---|---|
| Prefix Parallelization | Use multiple prefixes to increase throughput | High-throughput applications |
| Multipart Upload | Split large objects into parts for parallel upload | Objects > 100 MB |
| S3 Transfer Acceleration | Fast transfer over long distances using CloudFront | Global data transfers |
| Byte-Range Fetches | Parallel downloads of specific byte ranges | Large file partial access |
| S3 Select | Server-side filtering to reduce data transfer | Analytics on subset of data |
| S3 Inventory | Scheduled flat-file output of objects and metadata | Large bucket management |
| Partitioning Strategy | Randomized prefixes to distribute load | Very high throughput needs |

9. Multipart Upload Calculation Example:

  • A 5 GB file with 100 MB parts = 50 parts (taking 1 GB = 1,000 MB) uploaded in parallel
  • On a 500 Mbps connection, parallel parts can keep the link saturated (5 GB = 40,000 Mb, so ~80 seconds); a single-stream upload that achieves only a fraction of the bandwidth might take ~400 seconds
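
A minimal boto3 sketch of a managed multipart upload (file name, bucket, and tuning values are illustrative assumptions):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Hypothetical tuning: 100 MB parts, up to 10 parts in flight at once.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # switch to multipart above 100 MB
    multipart_chunksize=100 * 1024 * 1024,  # 100 MB per part
    max_concurrency=10,                     # parallel part uploads
)

# upload_file transparently splits, uploads, and completes the parts.
s3.upload_file("backup.tar", "my-bucket", "backups/backup.tar", Config=config)
```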

10. S3 Transfer Acceleration: Uses CloudFront's globally distributed edge locations to accelerate uploads to S3 by up to 500% for long-distance transfers.

S3 Data Management

11. S3 Lifecycle Policies automate transitioning objects between storage classes or expiring objects based on age.

12. S3 Lifecycle Transitions:

| From | To | Minimum Days |
|---|---|---|
| Standard | Standard-IA | 30 days |
| Standard | Intelligent-Tiering | None |
| Standard | One Zone-IA | 30 days |
| Standard | Glacier Instant Retrieval | 30 days |
| Standard | Glacier Flexible Retrieval | 30 days |
| Standard | Glacier Deep Archive | 90 days |
| Standard-IA | Glacier Instant Retrieval | 30 days |
| Standard-IA | Glacier Flexible Retrieval | 30 days |
| Standard-IA | Glacier Deep Archive | 90 days |
| Intelligent-Tiering | Glacier Instant Retrieval | 90 days |
| Intelligent-Tiering | Glacier Flexible Retrieval | 90 days |
| Intelligent-Tiering | Glacier Deep Archive | 180 days |

13. S3 Versioning keeps multiple variants of objects in the same bucket, allowing recovery from accidental deletions or overwrites.

14. S3 Replication:

  • Cross-Region Replication (CRR): Replicate objects across regions for compliance, lower latency, or disaster recovery
  • Same-Region Replication (SRR): Replicate objects within the same region for log aggregation or production/test sync
  • Replication requires versioning enabled on both source and destination buckets
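
A sketch of enabling replication with boto3 (bucket names and role ARN are placeholders; versioning must already be enabled on both buckets):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical source bucket, destination bucket ARN, and IAM role.
s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = replicate all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
            }
        ],
    },
)
```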

15. S3 Batch Operations perform bulk operations on existing S3 objects with a single request, such as copying objects, setting ACLs, or restoring from Glacier.

S3 Security and Access Control

16. S3 Security Features:

| Feature | Description | Use Case |
|---|---|---|
| IAM Policies | Identity-based policies | User/role access control |
| Bucket Policies | Resource-based policies | Cross-account access |
| ACLs | Legacy access control | Simple permission grants |
| Presigned URLs | Temporary access to objects | Temporary download/upload |
| VPC Endpoints | Private connection from VPC | No internet access needed |
| Access Points | Named network endpoints | Simplified access management |
| Object Lock | WORM (Write Once Read Many) | Compliance requirements |
| S3 Block Public Access | Prevent public access | Data protection |
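
For example, a time-limited presigned download URL with boto3 (bucket and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# URL grants temporary GET access for one hour, signed with the
# caller's credentials.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "reports/q1.pdf"},
    ExpiresIn=3600,
)
print(url)
```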

17. S3 Encryption Options:

| Encryption Type | Description | Key Management |
|---|---|---|
| SSE-S3 | Server-side encryption with S3-managed keys | AWS manages keys |
| SSE-KMS | Server-side encryption with KMS keys | Customer controls via KMS |
| SSE-C | Server-side encryption with customer-provided keys | Customer provides keys |
| Client-side encryption | Encryption before uploading to S3 | Customer manages keys |

18. S3 Default Encryption is enabled automatically for all new buckets with SSE-S3 (AES-256).

19. S3 Object Lock provides WORM (Write Once Read Many) model with two retention modes:

  • Governance mode: Users with special permissions can override
  • Compliance mode: No one can override during retention period, including AWS account root user
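
A sketch of applying governance-mode retention to a single object with boto3 (bucket, key, and date are assumptions; the bucket must have been created with Object Lock enabled):

```python
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

# Governance mode: before this date the version can only be deleted by
# principals holding s3:BypassGovernanceRetention.
s3.put_object_retention(
    Bucket="my-locked-bucket",
    Key="records/ledger-2024.csv",
    Retention={
        "Mode": "GOVERNANCE",
        "RetainUntilDate": datetime(2026, 1, 1, tzinfo=timezone.utc),
    },
)
```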

S3 Data Processing and Analytics

20. S3 Select and Glacier Select allow you to use SQL expressions to retrieve only a subset of data from an object, reducing data transfer and improving query performance by up to 400%.

21. S3 Event Notifications can trigger workflows when objects are created, deleted, or restored:

  • Destinations: SNS, SQS, Lambda
  • Event types: ObjectCreated, ObjectRemoved, ObjectRestore, Replication, LifecycleExpiration
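
A minimal Lambda handler for these notifications might look like this (the print is a stand-in for real processing):

```python
import urllib.parse

def handler(event, context):
    """Process S3 event notifications delivered to Lambda."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (e.g. spaces as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        print(f"{record['eventName']}: s3://{bucket}/{key}")
```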

22. S3 Inventory provides scheduled reports of objects and metadata, useful for business, compliance, and regulatory needs.

23. S3 Analytics helps analyze storage access patterns to decide when to transition objects to appropriate storage class.

24. S3 Storage Lens provides organization-wide visibility into object storage usage and activity with customizable dashboards.

S3 Data Transfer and Integration

25. AWS DataSync provides high-speed data transfer between on-premises storage and S3 with automatic encryption and data validation.

26. AWS Transfer Family provides SFTP, FTPS, and FTP access to S3, enabling file transfers over these protocols directly to and from S3 buckets.

27. S3 Integration with AWS Services:

| AWS Service | Integration with S3 |
|---|---|
| AWS Glue | Catalog and ETL for data in S3 |
| Amazon Athena | SQL queries directly on S3 data |
| Amazon EMR | Big data processing on S3 data |
| AWS Lambda | Process S3 events |
| Amazon QuickSight | Visualize data stored in S3 |
| AWS Lake Formation | Build, secure, and manage data lakes |
| Amazon Redshift | Query data in S3 with Redshift Spectrum |
| Amazon SageMaker | ML model training with S3 data |
| AWS Backup | Centralized backup of S3 data |

28. S3 Data Ingestion Patterns:

| Pattern | Description | Best For |
|---|---|---|
| Direct API | Applications use S3 API directly | Simple workflows |
| S3 Transfer Acceleration | Fast long-distance uploads | Global data sources |
| Kinesis Data Firehose | Streaming data delivery to S3 | Real-time data capture |
| AWS DataSync | Scheduled transfers from on-premises | Large dataset migration |
| AWS Snowball/Snowmobile | Physical data transfer | Petabyte-scale transfers |
| AWS DMS | Database migration to S3 | Database archiving |

S3 Cost Management

29. S3 Pricing Components:

  • Storage pricing (per GB-month)
  • Request pricing (per 1,000 requests)
  • Data transfer pricing (per GB)
  • Management features and analytics
  • Retrieval fees (for IA and Glacier classes)

30. S3 Cost Optimization Strategies:

| Strategy | Description | Savings Potential |
|---|---|---|
| Storage Class Analysis | Identify optimal storage class | 20-50% |
| Lifecycle Policies | Automate transitions and expirations | 30-70% |
| S3 Intelligent-Tiering | Automatic tiering based on access patterns | 15-40% |
| S3 Storage Lens | Identify cost optimization opportunities | Varies |
| S3 Inventory | Identify objects for cleanup | Varies |
| S3 Batch Operations | Bulk delete unused objects | Varies |
| S3 Same-Region Replication | Replicate only necessary data | Varies |

31. S3 Storage Cost Calculation Example:

  • 100 TB in S3 Standard: ~$2,300/month
  • Same data with 80% in S3 Standard-IA: ~$1,500/month (35% savings)
  • With lifecycle policy moving 50% to Glacier after 90 days: ~$1,000/month (56% savings)
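
A quick sanity check of the first two figures in Python, using the approximate prices listed later in this sheet:

```python
STANDARD = 0.023      # $/GB-month, approximate
STANDARD_IA = 0.0125  # $/GB-month, approximate

gb = 100 * 1024  # 100 TB expressed in GB

print(f"All Standard:       ${gb * STANDARD:,.0f}/month")  # ~ $2,355
print(f"80% in Standard-IA: "
      f"${gb * (0.2 * STANDARD + 0.8 * STANDARD_IA):,.0f}/month")  # ~ $1,495
```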

S3 Limits and Quotas

32. S3 Service Limits:

| Limit | Value | Can be increased? |
|---|---|---|
| Maximum buckets per account | 100 | Yes (service quota) |
| Maximum object size | 5 TB | No |
| Maximum object size (console upload) | 160 GB | No |
| Maximum parts in multipart upload | 10,000 | No |
| Minimum part size (except last part) | 5 MB | No |
| Maximum part size | 5 GB | No |
| Maximum bucket policy size | 20 KB | No |
| Maximum access points per Region | 10,000 | Yes |
| Maximum lifecycle rules per bucket | 1,000 | No |
| Maximum tags per object | 10 | No |

33. S3 Rate Limits and Throttling:

  • Default limits can handle extremely high request rates
  • S3 automatically scales to accommodate sustained request rates
  • For workloads approaching the per-prefix limits (3,500 writes or 5,500 reads per second), partition keys across multiple prefixes

34. Overcoming S3 Rate Limits:

  • Implement exponential backoff for 503 errors
  • Distribute load across multiple prefixes
  • Use randomized prefixes for high-throughput workloads
  • Consider S3 Transfer Acceleration for uploads
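
With boto3, the backoff can be delegated to the SDK's built-in retry modes (a sketch; the attempt count is an arbitrary choice):

```python
import boto3
from botocore.config import Config

# 'adaptive' retry mode adds client-side rate limiting on top of
# exponential backoff with jitter; 'standard' gives backoff only.
s3 = boto3.client(
    "s3",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)
```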

S3 Data Consistency Model

35. S3 Data Consistency: S3 provides strong read-after-write consistency for all operations as of December 2020.

36. S3 Consistency Guarantees:

  • New objects: Immediate visibility after successful write
  • Overwrite PUTS and DELETES: Immediate consistency for reads
  • LIST operations: Consistent view of all objects

S3 Glacier Features

37. S3 Glacier Retrieval Options:

| Retrieval Type | Retrieval Time | Cost |
|---|---|---|
| Expedited | 1-5 minutes | Highest |
| Standard | 3-5 hours | Medium |
| Bulk | 5-12 hours | Lowest |

(Times shown are for S3 Glacier Flexible Retrieval; Deep Archive offers Standard in ~12 hours and Bulk in ~48 hours, with no Expedited option.)

38. S3 Glacier Vault Lock provides WORM (Write Once Read Many) protection with compliance controls that even the root user cannot modify.

39. S3 Glacier Restore Calculation Example:

  • 1 TB data with Standard retrieval: ~$10 retrieval fee + ~$90 for 30-day restored copy in S3
  • Same data with Bulk retrieval: ~$3 retrieval fee + ~$90 for 30-day restored copy in S3
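
A sketch of kicking off the Bulk restore above with boto3 (bucket and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Asynchronously restores a temporary copy for 30 days at the Bulk tier.
s3.restore_object(
    Bucket="archive-bucket",
    Key="exports/2020-full-dump.tar",
    RestoreRequest={
        "Days": 30,
        "GlacierJobParameters": {"Tier": "Bulk"},
    },
)
```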

S3 Data Protection

40. S3 Versioning keeps multiple variants of objects, allowing recovery from accidental deletions or overwrites.

41. S3 MFA Delete requires additional authentication for permanently deleting object versions or suspending versioning.

42. S3 Cross-Region Replication (CRR) automatically replicates data across regions for compliance, lower latency, or disaster recovery.

43. S3 Same-Region Replication (SRR) replicates data within the same region for log aggregation or production/test sync.

44. S3 Object Lock prevents objects from being deleted or overwritten for a fixed time or indefinitely.

45. S3 Replication Time Control (RTC) replicates 99.99% of objects within 15 minutes with SLA backing.

S3 Performance Monitoring

46. Key CloudWatch Metrics for S3:

| Metric | Description | Threshold Recommendation |
|---|---|---|
| BucketSizeBytes | Total bucket size | Set alerts based on expected growth |
| NumberOfObjects | Total object count | Monitor for unexpected changes |
| AllRequests | Total request count | Baseline + 20% for alerts |
| 4xxErrors | Client errors | <1% of total requests |
| 5xxErrors | Server errors | <0.01% of total requests |
| FirstByteLatency | Time to first byte | P90 < 200 ms |
| TotalRequestLatency | Total request time | P90 < 300 ms |
| BytesDownloaded | Data downloaded | Monitor for cost management |
| BytesUploaded | Data uploaded | Monitor for cost management |
| ReplicationLatency | Time for replication | <15 minutes (with RTC) |

47. S3 Request Metrics can be enabled for specific prefixes, objects, or entire buckets to track request counts, latencies, and errors.

48. S3 Replication Metrics track pending operations, latency, and bytes pending replication.

S3 Data Lake Integration

49. S3 as a Data Lake Foundation:

  • Unlimited scalability for any data type
  • Cost-effective with storage classes
  • Centralized access control
  • Integration with analytics services

50. S3 Data Lake Architecture Components:

| Component | AWS Service | Purpose |
|---|---|---|
| Storage | S3 | Raw data storage |
| Catalog | AWS Glue | Metadata management |
| Security | Lake Formation | Fine-grained access control |
| Processing | EMR, Athena, Redshift Spectrum | Data processing |
| Orchestration | Step Functions, Airflow | Workflow management |
| Ingestion | Kinesis, DataSync, Transfer Family | Data acquisition |

51. S3 Data Partitioning Strategies:

| Strategy | Format | Best For |
|---|---|---|
| Time-based | `s3://bucket/data/year=2023/month=05/day=01/` | Time-series data |
| Category-based | `s3://bucket/data/region=us-east-1/product=widget/` | Dimensional data |
| Hive-style | `s3://bucket/table_name/key1=val1/key2=val2/` | Compatibility with Hive |
| Nested | `s3://bucket/data/year=2023/month=05/region=us-east/` | Multi-dimensional analysis |
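
A small helper for generating Hive-style time-partitioned keys (names are illustrative):

```python
from datetime import datetime, timezone

def partitioned_key(prefix: str, filename: str, ts: datetime) -> str:
    """Build a Hive-style year=/month=/day= partitioned S3 key."""
    return (
        f"{prefix}/year={ts.year}/month={ts.month:02d}/"
        f"day={ts.day:02d}/{filename}"
    )

# e.g. data/year=2024/month=05/day=01/events.parquet
print(partitioned_key("data", "events.parquet", datetime.now(timezone.utc)))
```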

52. S3 Data Formats for Analytics:

| Format | Compression | Splittable | Schema Evolution | Best For |
|---|---|---|---|---|
| Parquet | Yes | Yes | Yes | Column-oriented analytics |
| ORC | Yes | Yes | Yes | Hive workloads |
| Avro | Yes | Yes | Yes | Schema evolution |
| JSON | Yes | No | Yes | Flexibility, human-readable |
| CSV | Yes | No | Limited | Simple data, compatibility |

S3 Data Ingestion Patterns

53. S3 Batch Operations perform bulk operations on existing S3 objects with a single request.

54. S3 Batch Operations Job Properties:

  • Operation type (copy, invoke Lambda, restore, etc.)
  • Manifest (list of objects to process)
  • Priority (numeric value for job ordering)
  • RoleArn (IAM role with permissions)
  • Report configuration (completion report details)
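
A hedged sketch of creating a copy job with the boto3 s3control client (account ID, ARNs, and the manifest ETag are placeholders):

```python
import uuid

import boto3

s3control = boto3.client("s3control")

s3control.create_job(
    AccountId="123456789012",
    ConfirmationRequired=False,
    ClientRequestToken=str(uuid.uuid4()),  # idempotency token
    Priority=10,
    RoleArn="arn:aws:iam::123456789012:role/batch-ops-role",
    Operation={
        "S3PutObjectCopy": {
            "TargetResource": "arn:aws:s3:::destination-bucket",
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::manifest-bucket/manifest.csv",
            "ETag": "example-etag",
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::report-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-reports",
        "ReportScope": "AllTasks",
    },
)
```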

55. S3 Event Notifications can trigger workflows when objects are created, deleted, or restored.

56. S3 Event Notification Filtering supports prefix and suffix matching to process only relevant objects.

57. S3 Event Notification Destinations:

  • SNS Topics: Fan-out to multiple subscribers
  • SQS Queues: Reliable message processing
  • Lambda Functions: Custom code execution
  • EventBridge: Advanced filtering and routing

58. Kinesis Data Firehose to S3 provides real-time streaming data delivery with:

  • Automatic batching for efficiency
  • Format conversion (JSON to Parquet/ORC)
  • Data transformation via Lambda
  • Error handling with backup bucket

59. S3 Data Ingestion Pipeline Replayability Strategies:

| Strategy | Implementation | Pros | Cons |
|---|---|---|---|
| Source-based replay | Reread from source system | Complete fidelity | Source system dependency |
| S3 versioning | Maintain object versions | Simple implementation | Storage costs |
| Backup copy | Duplicate data to another bucket | Isolation from production | Storage costs |
| Event-driven | Store events in SQS/Kinesis | Decoupled processing | Additional complexity |
| Manifest-based | Track processed files | Precise control | Requires manifest management |

60. Throttling Implementation for S3 Data Ingestion:

  • Client-side rate limiting
  • SQS as a buffer with controlled processing
  • Lambda concurrency limits
  • API Gateway throttling for web-based uploads
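
A minimal client-side token-bucket sketch (the rate is an assumption; call acquire() before each S3 request):

```python
import time

class TokenBucket:
    """Allow at most `rate` requests per second, smoothing bursts."""

    def __init__(self, rate: float):
        self.rate = rate
        self.allowance = rate
        self.last = time.monotonic()

    def acquire(self) -> None:
        now = time.monotonic()
        # Accrue tokens for the time elapsed, capped at one second's worth.
        self.allowance = min(
            self.rate, self.allowance + (now - self.last) * self.rate
        )
        self.last = now
        if self.allowance < 1.0:
            # Sleep just long enough for one full token to accumulate.
            time.sleep((1.0 - self.allowance) / self.rate)
            self.allowance = 0.0
        else:
            self.allowance -= 1.0

limiter = TokenBucket(rate=3000)  # stay under the 3,500 PUT/s per-prefix limit
```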

S3 Advanced Features

61. S3 Access Points simplify managing access to shared datasets with dedicated access policies.

62. S3 Object Lambda transforms data retrieved from S3 before returning to the application, enabling:

  • Redacting PII
  • Converting formats
  • Enriching data
  • Filtering rows/columns

63. S3 Requester Pays buckets require the requester to pay for data transfer and request costs instead of the bucket owner.

64. S3 Inventory provides scheduled flat-file output listing objects and metadata.

65. S3 Batch Operations with Inventory enables bulk operations on objects identified in inventory reports.

66. S3 Storage Class Analysis helps identify when to transition objects to lower-cost storage classes.

67. S3 Select Query Examples:

  • CSV: `SELECT s._1, s._2 FROM S3Object s WHERE CAST(s._3 AS INT) > 100` (CSV fields are read as strings, so numeric comparisons need a CAST)
  • JSON: `SELECT s.name, s.age FROM S3Object s WHERE s.age > 25`
  • Parquet: `SELECT * FROM S3Object WHERE age > 30 LIMIT 100`
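
The CSV query could be issued with boto3 roughly like this (bucket, key, and serialization settings are assumptions):

```python
import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="my-bucket",
    Key="data/records.csv",
    ExpressionType="SQL",
    Expression="SELECT s._1, s._2 FROM S3Object s WHERE CAST(s._3 AS INT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; Records events carry the matching rows.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```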

S3 Performance Best Practices

68. S3 Performance Best Practices:

| Best Practice | Implementation | Benefit |
|---|---|---|
| Prefix Parallelization | Use multiple prefixes | Higher throughput |
| Range GETs | Parallel byte-range fetches | Faster large object access |
| Transfer Acceleration | Enable for bucket | Faster long-distance transfers |
| Multipart Upload | Split large files | Parallel upload, resiliency |
| S3 Select | Server-side filtering | Reduced network transfer |
| Compression | Compress objects | Lower storage costs, faster transfer |
| Caching | CloudFront or ElastiCache | Lower latency, reduced load |

69. S3 Multipart Upload Thresholds:

  • Recommended for objects > 100 MB
  • Required for objects > 5 GB
  • Optimal part size: 25-100 MB for typical networks
  • Maximum of 10,000 parts per upload

70. S3 Performance Testing Methodology:

  • Establish baseline with single-threaded transfers
  • Test with multiple threads/connections
  • Experiment with different part sizes
  • Measure with different prefix strategies
  • Compare with/without Transfer Acceleration

S3 Data Processing Patterns

71. S3 Event-Driven Processing Patterns:

| Pattern | Implementation | Use Case |
|---|---|---|
| Direct Lambda | S3 event → Lambda | Simple transformations |
| Queue-based | S3 event → SQS → Lambda | Rate limiting, retry handling |
| Fan-out | S3 event → SNS → multiple endpoints | Multiple consumers |
| Orchestrated | S3 event → Step Functions | Complex workflows |
| Stream processing | S3 event → Kinesis → processors | Real-time analytics |

72. S3 Batch Processing Patterns:

| Pattern | Implementation | Use Case |
|---|---|---|
| EMR | Spark/Hadoop on EMR reading from S3 | Big data processing |
| Glue ETL | AWS Glue jobs reading from S3 | Serverless ETL |
| Batch Operations | S3 Batch Operations with Lambda | Object-level operations |
| Athena | SQL queries directly on S3 | Interactive analysis |
| Redshift Spectrum | Redshift external tables on S3 | Data warehousing |

73. S3 Data Lake Processing Layers:

| Layer | Description | S3 Implementation |
|---|---|---|
| Raw/Bronze | Original unmodified data | S3 Standard with lifecycle to IA/Glacier |
| Processed/Silver | Cleansed, validated data | S3 Standard with partitioning |
| Curated/Gold | Business-ready datasets | S3 Standard with optimized formats |
| Application | Purpose-built data products | S3 Standard with CloudFront |

S3 Security Best Practices

74. S3 Security Best Practices:

| Best Practice | Implementation | Benefit |
|---|---|---|
| Block Public Access | Enable at account level | Prevent accidental exposure |
| Default Encryption | Enable SSE-S3 or SSE-KMS | Data protection at rest |
| VPC Endpoints | Create Gateway Endpoint for S3 | Private network access |
| Access Logging | Enable S3 access logging | Audit and compliance |
| IAM Policies | Least privilege principle | Controlled access |
| Bucket Policies | Explicit allow/deny | Resource-level control |
| Presigned URLs | Time-limited access | Temporary permissions |
| Object Lock | Enable for critical data | Immutability |

75. S3 Security Monitoring:

  • CloudTrail for API activity
  • S3 Access Logs for object-level access
  • CloudWatch Metrics for operation counts
  • AWS Config for configuration compliance
  • Macie for sensitive data detection

S3 Data Migration and Transfer

76. S3 Data Migration Options:

| Option | Transfer Speed | Data Size Range | Use Case |
|---|---|---|---|
| Direct Upload | Depends on bandwidth | MB to GB | Small files, good connectivity |
| AWS CLI | Depends on bandwidth | GB to TB | Command-line automation |
| S3 Transfer Acceleration | Up to 500% faster | GB to TB | Long-distance transfers |
| AWS DataSync | Up to 10 Gbps | TB to PB | Scheduled migrations |
| AWS Transfer Family | Depends on bandwidth | GB to TB | FTP/SFTP compatibility |
| AWS Storage Gateway | Depends on bandwidth | TB to PB | Hybrid cloud integration |
| AWS Snowcone | Offline | Up to 8 TB | Edge locations |
| AWS Snowball | Offline | Up to 80 TB | Large datasets |
| AWS Snowmobile | Offline | Up to 100 PB | Massive data centers |

77. S3 Transfer Acceleration Performance Comparison:

  • 1 TB transfer from US to Australia:
    • Standard transfer: ~12 hours
    • With Transfer Acceleration: ~2.5 hours

78. AWS DataSync Performance:

  • Up to 10 Gbps throughput
  • Parallel processing of files
  • Automatic retry mechanism
  • Built-in validation

S3 Compliance and Governance

79. S3 Compliance Features:

| Feature | Implementation | Compliance Need |
|---|---|---|
| Object Lock | WORM protection | SEC Rule 17a-4, FINRA, CFTC |
| Glacier Vault Lock | Immutable vault policy | Long-term records retention |
| Access Logging | Detailed access logs | Audit requirements |
| Inventory Reports | Scheduled metadata reports | Asset management |
| Replication | Cross-region or same-region | Data residency, DR |
| Versioning | Object version history | Change tracking |
| Lifecycle Policies | Automated retention | Records management |
| Macie Integration | Sensitive data discovery | PII protection |

80. S3 Object Lock Modes:

  • Governance mode: Special permissions can override
  • Compliance mode: No overrides during retention period
  • Legal hold: Indefinite retention independent of retention period

S3 Integration with Data Engineering Services

81. S3 Integration with AWS Glue:

  • Glue crawlers scan S3 data to populate the Glue Data Catalog
  • Glue ETL jobs read from and write to S3
  • Glue Data Catalog provides metadata for S3 objects
  • Glue schema registry manages schemas for S3 data

82. S3 Integration with Amazon Athena:

  • Serverless SQL queries directly on S3 data
  • Supports CSV, JSON, ORC, Avro, Parquet
  • Pay only for data scanned
  • Federated queries to other data sources

83. S3 Integration with Amazon EMR:

  • EMRFS provides S3 access from EMR clusters
  • S3DistCp for efficient data copying
  • EMRFS consistent view (legacy; unnecessary since S3 became strongly consistent in December 2020)
  • EMR supports direct processing of S3 data

84. S3 Integration with Amazon Redshift:

  • COPY command loads data from S3 to Redshift
  • Redshift Spectrum queries data directly in S3
  • UNLOAD command exports query results from Redshift to S3
  • Automatic compression encoding

85. S3 Integration with AWS Lake Formation:

  • Centralized permissions for S3 data lakes
  • Column-level, row-level, and cell-level security
  • Tag-based access control
  • Data location registration

S3 Throughput and Latency Characteristics

86. S3 Throughput Characteristics:

  • Unlimited total throughput (scales with request rate)
  • 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix
  • No bandwidth limits for a single bucket
  • Multipart uploads recommended for objects > 100 MB

87. S3 Latency Characteristics:

  • First-byte latency: typically 100-200 ms
  • Varies by region and request type
  • GET operations faster than PUT operations
  • S3 Transfer Acceleration improves latency for distant clients

88. S3 Performance Comparison:

| Operation | Average Latency | Throughput Limit | Optimization |
|---|---|---|---|
| GET | 100-200 ms | 5,500 req/s per prefix | CloudFront caching |
| PUT | 200-300 ms | 3,500 req/s per prefix | Multipart upload |
| LIST | Varies with objects | Rate limited | Prefix organization |
| DELETE | 200-300 ms | 3,500 req/s per prefix | Batch operations |

89. S3 Performance Monitoring Metrics:

  • FirstByteLatency: Time to first byte
  • TotalRequestLatency: End-to-end latency
  • BytesDownloaded/BytesUploaded: Data transfer volume
  • 4xxErrors/5xxErrors: Error counts
  • ReplicationLatency: Time for replication

S3 Cost Optimization

90. S3 Intelligent-Tiering automatically moves objects between access tiers based on usage patterns, with no retrieval charges and no operational overhead; it charges a small monthly per-object monitoring and automation fee.

91. S3 Lifecycle Configuration Example:

```json
{
  "Rules": [
    {
      "ID": "Move to IA after 30 days, Glacier after 90, expire after 365",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"}
      ],
      "Expiration": {"Days": 365}
    }
  ]
}
```
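
Applying the same rule programmatically with boto3 might look like this (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-log-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "logs-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```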

92. S3 Storage Cost Comparison (prices approximate):

| Storage Class | Price per GB-month | Retrieval Cost | Min Duration | Min Size |
|---|---|---|---|---|
| Standard | $0.023 | None | None | None |
| Intelligent-Tiering | $0.023 + monitoring | None | None | None |
| Standard-IA | $0.0125 | $0.01/GB | 30 days | 128 KB |
| One Zone-IA | $0.01 | $0.01/GB | 30 days | 128 KB |
| Glacier Instant Retrieval | $0.004 | $0.03/GB | 90 days | 128 KB |
| Glacier Flexible Retrieval | $0.0036 | $0.01/GB (standard tier) | 90 days | 40 KB |
| Glacier Deep Archive | $0.00099 | $0.02/GB (standard tier) | 180 days | 40 KB |

93. S3 Request Pricing (prices approximate):

  • PUT/COPY/POST/LIST: $0.005 per 1,000 requests
  • GET: $0.0004 per 1,000 requests
  • Lifecycle transitions: $0.01 per 1,000 requests
  • Data retrieval: Varies by storage class

S3 Mind Map

94. AWS S3 Service Mind Map:

```
Amazon S3
├── Storage Classes
│   ├── Standard
│   ├── Intelligent-Tiering
│   ├── Standard-IA
│   ├── One Zone-IA
│   ├── Glacier Instant Retrieval
│   ├── Glacier Flexible Retrieval
│   └── Glacier Deep Archive
├── Data Management
│   ├── Lifecycle Policies
│   ├── Versioning
│   ├── Replication (CRR/SRR)
│   ├── Storage Lens
│   ├── Inventory
│   └── Batch Operations
├── Security
│   ├── IAM Policies
│   ├── Bucket Policies
│   ├── ACLs
│   ├── Encryption (SSE-S3, SSE-KMS, SSE-C)
│   ├── Object Lock
│   ├── Access Points
│   └── VPC Endpoints
├── Performance
│   ├── Transfer Acceleration
│   ├── Multipart Upload
│   ├── Byte-Range Fetches
│   ├── S3 Select
│   └── Prefix Optimization
├── Analytics & Monitoring
│   ├── S3 Analytics
│   ├── CloudWatch Metrics
│   ├── CloudTrail
│   ├── Access Logs
│   └── Event Notifications
└── Integration
    ├── Data Lake Services (Athena, Glue)
    ├── Processing (Lambda, EMR)
    ├── Streaming (Kinesis)
    ├── Migration (DataSync, Transfer Family)
    └── Content Delivery (CloudFront)
```

S3 Additional Features

95. S3 Requester Pays buckets require the requester to pay for data transfer and request costs instead of the bucket owner.

96. S3 Website Hosting provides static website hosting with custom domain support.

97. S3 Directory Buckets (introduced 2023) back the S3 Express One Zone storage class, delivering single-digit-millisecond latency and higher request rates for performance-critical workloads.

98. S3 Access Points simplify managing access to shared datasets with dedicated access policies.

99. S3 Object Lambda transforms data retrieved from S3 before returning to the application.

100. S3 Replayability for Data Ingestion Pipelines:

  • Use S3 versioning to maintain historical versions
  • Implement SQS dead-letter queues for failed processing
  • Store processing metadata with objects using S3 object tags
  • Use manifest files to track processing state
  • Implement idempotent processors that can safely reprocess data
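
One way to sketch the object-tag approach with boto3 (the tag name is an assumption; tag reads and writes are separate, non-atomic requests):

```python
import boto3

s3 = boto3.client("s3")

def already_processed(bucket: str, key: str) -> bool:
    """Check the 'processed' tag before reprocessing an object."""
    tags = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
    return any(t["Key"] == "processed" and t["Value"] == "true" for t in tags)

def mark_processed(bucket: str, key: str) -> None:
    """Record completion on the object itself (replaces existing tags)."""
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": [{"Key": "processed", "Value": "true"}]},
    )
```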

Try Neon for Free →