Amazon S3 Service Cheat Sheet

S3 Fundamentals

1. Amazon S3 (Simple Storage Service) is an object storage service offering industry-leading scalability, data availability, security, and performance.

2. S3 Storage Classes:

| Storage Class | Designed For | Availability | Min Storage Duration | Retrieval Fee | Use Cases |
|---|---|---|---|---|---|
| S3 Standard | Frequently accessed data | 99.99% | None | None | Big data analytics, content distribution |
| S3 Intelligent-Tiering | Data with unknown or changing access patterns | 99.9% | None | None | Long-lived data with unpredictable access patterns |
| S3 Standard-IA | Infrequently accessed data | 99.9% | 30 days | Per GB | Backups, disaster recovery |
| S3 One Zone-IA | Infrequently accessed, non-critical data | 99.5% (single AZ) | 30 days | Per GB | Secondary backups, easily recreatable data |
| S3 Glacier Instant Retrieval | Archive data needing immediate access | 99.9% | 90 days | Per GB | Media archives, healthcare records |
| S3 Glacier Flexible Retrieval | Archive data that rarely needs access | 99.99% | 90 days | Per GB + retrieval | Digital preservation, compliance archives |
| S3 Glacier Deep Archive | Long-term archive | 99.99% | 180 days | Per GB + retrieval | Financial records, healthcare data |
| S3 Outposts | On-premises S3 storage | Varies | None | None | Local data processing with S3 compatibility |

3. S3 Bucket Naming Rules: Globally unique across all AWS accounts; 3-63 characters; lowercase letters, numbers, hyphens, and dots; must begin and end with a letter or number; must not be formatted as an IP address.

4. S3 Object Properties: Key (name), Value (data), Version ID, Metadata, Subresources (ACLs; the Torrent subresource is legacy and no longer supported).

5. S3 Object Size Limits: Individual objects range from 0 bytes to 5 TB; a single PUT is capped at 5 GB, so larger objects must use multipart upload.

S3 Performance and Optimization

6. S3 Performance: S3 automatically scales to high request rates, with first-byte latency typically in the 100-200 ms range.

7. S3 Request Rates:

  • 3,500 PUT/COPY/POST/DELETE requests per second per prefix
  • 5,500 GET/HEAD requests per second per prefix

8. S3 Performance Optimization Techniques:

| Technique | Description | Best For |
|---|---|---|
| Prefix Parallelization | Use multiple prefixes to increase throughput | High-throughput applications |
| Multipart Upload | Split large objects into parts for parallel upload | Objects > 100 MB |
| S3 Transfer Acceleration | Fast transfer over long distances using CloudFront | Global data transfers |
| Byte-Range Fetches | Parallel downloads of specific byte ranges | Large file partial access |
| S3 Select | Server-side filtering to reduce data transfer | Analytics on subset of data |
| S3 Inventory | Scheduled flat-file output of objects and metadata | Large bucket management |
| Partitioning Strategy | Randomized prefixes to distribute load | Very high throughput needs |

9. Multipart Upload Calculation Example:

  • A 5 GB file with 100 MB parts = 50 parts (taking 1 GB = 1,000 MB) uploaded in parallel
  • On a 500 Mbps connection, parallel parts can keep the link saturated (5 GB = 40,000 Mb, so ~80 seconds); a single-stream upload that achieves only a fraction of the bandwidth might take ~400 seconds
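
A minimal boto3 sketch of a managed multipart upload (file name, bucket, and tuning values are illustrative assumptions):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Hypothetical tuning: 100 MB parts, up to 10 parts in flight at once.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # switch to multipart above 100 MB
    multipart_chunksize=100 * 1024 * 1024,  # 100 MB per part
    max_concurrency=10,                     # parallel part uploads
)

# upload_file transparently splits, uploads, and completes the parts.
s3.upload_file("backup.tar", "my-bucket", "backups/backup.tar", Config=config)
```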

10. S3 Transfer Acceleration: Uses CloudFront's globally distributed edge locations to accelerate uploads to S3 by up to 500% for long-distance transfers.

S3 Data Management

11. S3 Lifecycle Policies automate transitioning objects between storage classes or expiring objects based on age.

12. S3 Lifecycle Transitions:

| From | To | Minimum Days |
|---|---|---|
| Standard | Standard-IA | 30 days |
| Standard | Intelligent-Tiering | None |
| Standard | One Zone-IA | 30 days |
| Standard | Glacier Instant Retrieval | 30 days |
| Standard | Glacier Flexible Retrieval | 30 days |
| Standard | Glacier Deep Archive | 90 days |
| Standard-IA | Glacier Instant Retrieval | 30 days |
| Standard-IA | Glacier Flexible Retrieval | 30 days |
| Standard-IA | Glacier Deep Archive | 90 days |
| Intelligent-Tiering | Glacier Instant Retrieval | 90 days |
| Intelligent-Tiering | Glacier Flexible Retrieval | 90 days |
| Intelligent-Tiering | Glacier Deep Archive | 180 days |

13. S3 Versioning keeps multiple variants of objects in the same bucket, allowing recovery from accidental deletions or overwrites.

14. S3 Replication:

  • Cross-Region Replication (CRR): Replicate objects across regions for compliance, lower latency, or disaster recovery
  • Same-Region Replication (SRR): Replicate objects within the same region for log aggregation or production/test sync
  • Replication requires versioning enabled on both source and destination buckets
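
A sketch of enabling replication with boto3 (bucket names and role ARN are placeholders; versioning must already be enabled on both buckets):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical source bucket, destination bucket ARN, and IAM role.
s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = replicate all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
            }
        ],
    },
)
```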

15. S3 Batch Operations perform bulk operations on existing S3 objects with a single request, such as copying objects, setting ACLs, or restoring from Glacier.

S3 Security and Access Control

16. S3 Security Features:

| Feature | Description | Use Case |
|---|---|---|
| IAM Policies | Identity-based policies | User/role access control |
| Bucket Policies | Resource-based policies | Cross-account access |
| ACLs | Legacy access control | Simple permission grants |
| Presigned URLs | Temporary access to objects | Temporary download/upload |
| VPC Endpoints | Private connection from VPC | No internet access needed |
| Access Points | Named network endpoints | Simplified access management |
| Object Lock | WORM (Write Once Read Many) | Compliance requirements |
| S3 Block Public Access | Prevent public access | Data protection |
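
For example, a time-limited presigned download URL with boto3 (bucket and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# URL grants temporary GET access for one hour, signed with the
# caller's credentials.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "reports/q1.pdf"},
    ExpiresIn=3600,
)
print(url)
```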

17. S3 Encryption Options:

| Encryption Type | Description | Key Management |
|---|---|---|
| SSE-S3 | Server-side encryption with S3-managed keys | AWS manages keys |
| SSE-KMS | Server-side encryption with KMS keys | Customer controls via KMS |
| SSE-C | Server-side encryption with customer-provided keys | Customer provides keys |
| Client-side encryption | Encryption before uploading to S3 | Customer manages keys |

18. S3 Default Encryption is enabled automatically for all new buckets with SSE-S3 (AES-256).

19. S3 Object Lock provides WORM (Write Once Read Many) model with two retention modes:

  • Governance mode: Users with special permissions can override
  • Compliance mode: No one can override during retention period, including AWS account root user
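
A sketch of applying governance-mode retention to a single object with boto3 (bucket, key, and date are assumptions; the bucket must have been created with Object Lock enabled):

```python
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

# Governance mode: before this date the version can only be deleted by
# principals holding s3:BypassGovernanceRetention.
s3.put_object_retention(
    Bucket="my-locked-bucket",
    Key="records/ledger-2024.csv",
    Retention={
        "Mode": "GOVERNANCE",
        "RetainUntilDate": datetime(2026, 1, 1, tzinfo=timezone.utc),
    },
)
```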

S3 Data Processing and Analytics

20. S3 Select and Glacier Select allow you to use SQL expressions to retrieve only a subset of data from an object, reducing data transfer and improving query performance by up to 400%.

21. S3 Event Notifications can trigger workflows when objects are created, deleted, or restored:

  • Destinations: SNS, SQS, Lambda
  • Event types: ObjectCreated, ObjectRemoved, ObjectRestore, Replication, LifecycleExpiration
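
A minimal Lambda handler for these notifications might look like this (the print is a stand-in for real processing):

```python
import urllib.parse

def handler(event, context):
    """Process S3 event notifications delivered to Lambda."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (e.g. spaces as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        print(f"{record['eventName']}: s3://{bucket}/{key}")
```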

22. S3 Inventory provides scheduled reports of objects and metadata, useful for business, compliance, and regulatory needs.

23. S3 Analytics helps analyze storage access patterns to decide when to transition objects to appropriate storage class.

24. S3 Storage Lens provides organization-wide visibility into object storage usage and activity with customizable dashboards.

S3 Data Transfer and Integration

25. AWS DataSync provides high-speed data transfer between on-premises storage and S3 with automatic encryption and data validation.

26. AWS Transfer Family provides SFTP, FTPS, and FTP access to S3, enabling file transfers over these protocols directly to and from S3 buckets.

27. S3 Integration with AWS Services:

| AWS Service | Integration with S3 |
|---|---|
| AWS Glue | Catalog and ETL for data in S3 |
| Amazon Athena | SQL queries directly on S3 data |
| Amazon EMR | Big data processing on S3 data |
| AWS Lambda | Process S3 events |
| Amazon QuickSight | Visualize data stored in S3 |
| AWS Lake Formation | Build, secure, and manage data lakes |
| Amazon Redshift | Query data in S3 with Redshift Spectrum |
| Amazon SageMaker | ML model training with S3 data |
| AWS Backup | Centralized backup of S3 data |

28. S3 Data Ingestion Patterns:

| Pattern | Description | Best For |
|---|---|---|
| Direct API | Applications use S3 API directly | Simple workflows |
| S3 Transfer Acceleration | Fast long-distance uploads | Global data sources |
| Kinesis Data Firehose | Streaming data delivery to S3 | Real-time data capture |
| AWS DataSync | Scheduled transfers from on-premises | Large dataset migration |
| AWS Snowball/Snowmobile | Physical data transfer | Petabyte-scale transfers |
| AWS DMS | Database migration to S3 | Database archiving |

S3 Cost Management

29. S3 Pricing Components:

  • Storage pricing (per GB-month)
  • Request pricing (per 1,000 requests)
  • Data transfer pricing (per GB)
  • Management features and analytics
  • Retrieval fees (for IA and Glacier classes)

30. S3 Cost Optimization Strategies:

| Strategy | Description | Savings Potential |
|---|---|---|
| Storage Class Analysis | Identify optimal storage class | 20-50% |
| Lifecycle Policies | Automate transitions and expirations | 30-70% |
| S3 Intelligent-Tiering | Automatic tiering based on access patterns | 15-40% |
| S3 Storage Lens | Identify cost optimization opportunities | Varies |
| S3 Inventory | Identify objects for cleanup | Varies |
| S3 Batch Operations | Bulk delete unused objects | Varies |
| S3 Same-Region Replication | Replicate only necessary data | Varies |

31. S3 Storage Cost Calculation Example:

  • 100 TB in S3 Standard: ~$2,300/month
  • Same data with 80% in S3 Standard-IA: ~$1,500/month (35% savings)
  • With lifecycle policy moving 50% to Glacier after 90 days: ~$1,000/month (56% savings)
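
A quick sanity check of the first two figures in Python, using the approximate prices listed later in this sheet:

```python
STANDARD = 0.023      # $/GB-month, approximate
STANDARD_IA = 0.0125  # $/GB-month, approximate

gb = 100 * 1024  # 100 TB expressed in GB

print(f"All Standard:       ${gb * STANDARD:,.0f}/month")  # ~ $2,355
print(f"80% in Standard-IA: "
      f"${gb * (0.2 * STANDARD + 0.8 * STANDARD_IA):,.0f}/month")  # ~ $1,495
```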

S3 Limits and Quotas

32. S3 Service Limits:

| Limit | Value | Can be increased? |
|---|---|---|
| Maximum buckets per account | 100 | Yes (service quota) |
| Maximum object size | 5 TB | No |
| Maximum object size (console upload) | 160 GB | No |
| Maximum parts in multipart upload | 10,000 | No |
| Minimum part size (except last part) | 5 MB | No |
| Maximum part size | 5 GB | No |
| Maximum bucket policy size | 20 KB | No |
| Maximum access points per Region | 10,000 | Yes |
| Maximum lifecycle rules per bucket | 1,000 | No |
| Maximum tags per object | 10 | No |

33. S3 Rate Limits and Throttling:

  • Default limits can handle extremely high request rates
  • S3 automatically scales to accommodate sustained request rates
  • For workloads approaching the per-prefix limits (3,500 writes or 5,500 reads per second), partition keys across multiple prefixes

34. Overcoming S3 Rate Limits:

  • Implement exponential backoff for 503 errors
  • Distribute load across multiple prefixes
  • Use randomized prefixes for high-throughput workloads
  • Consider S3 Transfer Acceleration for uploads
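
With boto3, the backoff can be delegated to the SDK's built-in retry modes (a sketch; the attempt count is an arbitrary choice):

```python
import boto3
from botocore.config import Config

# 'adaptive' retry mode adds client-side rate limiting on top of
# exponential backoff with jitter; 'standard' gives backoff only.
s3 = boto3.client(
    "s3",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)
```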

S3 Data Consistency Model

35. S3 Data Consistency: S3 provides strong read-after-write consistency for all operations as of December 2020.

36. S3 Consistency Guarantees:

  • New objects: Immediate visibility after successful write
  • Overwrite PUTS and DELETES: Immediate consistency for reads
  • LIST operations: Consistent view of all objects

S3 Glacier Features

37. S3 Glacier Retrieval Options:

| Retrieval Type | Retrieval Time | Cost |
|---|---|---|
| Expedited | 1-5 minutes | Highest |
| Standard | 3-5 hours | Medium |
| Bulk | 5-12 hours | Lowest |

(Times shown are for S3 Glacier Flexible Retrieval; Deep Archive offers Standard in ~12 hours and Bulk in ~48 hours, with no Expedited option.)

38. S3 Glacier Vault Lock provides WORM (Write Once Read Many) protection with compliance controls that even the root user cannot modify.

39. S3 Glacier Restore Calculation Example:

  • 1 TB data with Standard retrieval: ~$10 retrieval fee + ~$90 for 30-day restored copy in S3
  • Same data with Bulk retrieval: ~$3 retrieval fee + ~$90 for 30-day restored copy in S3
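
A sketch of kicking off the Bulk restore above with boto3 (bucket and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Asynchronously restores a temporary copy for 30 days at the Bulk tier.
s3.restore_object(
    Bucket="archive-bucket",
    Key="exports/2020-full-dump.tar",
    RestoreRequest={
        "Days": 30,
        "GlacierJobParameters": {"Tier": "Bulk"},
    },
)
```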

S3 Data Protection

40. S3 Versioning keeps multiple variants of objects, allowing recovery from accidental deletions or overwrites.

41. S3 MFA Delete requires additional authentication for permanently deleting object versions or suspending versioning.

42. S3 Cross-Region Replication (CRR) automatically replicates data across regions for compliance, lower latency, or disaster recovery.

43. S3 Same-Region Replication (SRR) replicates data within the same region for log aggregation or production/test sync.

44. S3 Object Lock prevents objects from being deleted or overwritten for a fixed time or indefinitely.

45. S3 Replication Time Control (RTC) replicates 99.99% of objects within 15 minutes with SLA backing.

S3 Performance Monitoring

46. Key CloudWatch Metrics for S3:

| Metric | Description | Threshold Recommendation |
|---|---|---|
| BucketSizeBytes | Total bucket size | Set alerts based on expected growth |
| NumberOfObjects | Total object count | Monitor for unexpected changes |
| AllRequests | Total request count | Baseline + 20% for alerts |
| 4xxErrors | Client errors | <1% of total requests |
| 5xxErrors | Server errors | <0.01% of total requests |
| FirstByteLatency | Time to first byte | P90 < 200 ms |
| TotalRequestLatency | Total request time | P90 < 300 ms |
| BytesDownloaded | Data downloaded | Monitor for cost management |
| BytesUploaded | Data uploaded | Monitor for cost management |
| ReplicationLatency | Time for replication | <15 minutes (with RTC) |

47. S3 Request Metrics can be enabled for specific prefixes, objects, or entire buckets to track request counts, latencies, and errors.

48. S3 Replication Metrics track pending operations, latency, and bytes pending replication.

S3 Data Lake Integration

49. S3 as a Data Lake Foundation:

  • Unlimited scalability for any data type
  • Cost-effective with storage classes
  • Centralized access control
  • Integration with analytics services

50. S3 Data Lake Architecture Components:

| Component | AWS Service | Purpose |
|---|---|---|
| Storage | S3 | Raw data storage |
| Catalog | AWS Glue | Metadata management |
| Security | Lake Formation | Fine-grained access control |
| Processing | EMR, Athena, Redshift Spectrum | Data processing |
| Orchestration | Step Functions, Airflow | Workflow management |
| Ingestion | Kinesis, DataSync, Transfer Family | Data acquisition |

51. S3 Data Partitioning Strategies:

| Strategy | Format | Best For |
|---|---|---|
| Time-based | `s3://bucket/data/year=2023/month=05/day=01/` | Time-series data |
| Category-based | `s3://bucket/data/region=us-east-1/product=widget/` | Dimensional data |
| Hive-style | `s3://bucket/table_name/key1=val1/key2=val2/` | Compatibility with Hive |
| Nested | `s3://bucket/data/year=2023/month=05/region=us-east/` | Multi-dimensional analysis |
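
A small helper for generating Hive-style time-partitioned keys (names are illustrative):

```python
from datetime import datetime, timezone

def partitioned_key(prefix: str, filename: str, ts: datetime) -> str:
    """Build a Hive-style year=/month=/day= partitioned S3 key."""
    return (
        f"{prefix}/year={ts.year}/month={ts.month:02d}/"
        f"day={ts.day:02d}/{filename}"
    )

# e.g. data/year=2024/month=05/day=01/events.parquet
print(partitioned_key("data", "events.parquet", datetime.now(timezone.utc)))
```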

52. S3 Data Formats for Analytics:

| Format | Compression | Splittable | Schema Evolution | Best For |
|---|---|---|---|---|
| Parquet | Yes | Yes | Yes | Column-oriented analytics |
| ORC | Yes | Yes | Yes | Hive workloads |
| Avro | Yes | Yes | Yes | Schema evolution |
| JSON | Yes | No | Yes | Flexibility, human-readable |
| CSV | Yes | No | Limited | Simple data, compatibility |

S3 Data Ingestion Patterns

53. S3 Batch Operations perform bulk operations on existing S3 objects with a single request.

54. S3 Batch Operations Job Properties:

  • Operation type (copy, invoke Lambda, restore, etc.)
  • Manifest (list of objects to process)
  • Priority (numeric value for job ordering)
  • RoleArn (IAM role with permissions)
  • Report configuration (completion report details)
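
A hedged sketch of creating a copy job with the boto3 s3control client (account ID, ARNs, and the manifest ETag are placeholders):

```python
import uuid

import boto3

s3control = boto3.client("s3control")

s3control.create_job(
    AccountId="123456789012",
    ConfirmationRequired=False,
    ClientRequestToken=str(uuid.uuid4()),  # idempotency token
    Priority=10,
    RoleArn="arn:aws:iam::123456789012:role/batch-ops-role",
    Operation={
        "S3PutObjectCopy": {
            "TargetResource": "arn:aws:s3:::destination-bucket",
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::manifest-bucket/manifest.csv",
            "ETag": "example-etag",
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::report-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-reports",
        "ReportScope": "AllTasks",
    },
)
```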

55. S3 Event Notifications can trigger workflows when objects are created, deleted, or restored.

56. S3 Event Notification Filtering supports prefix and suffix matching to process only relevant objects.

57. S3 Event Notification Destinations:

  • SNS Topics: Fan-out to multiple subscribers
  • SQS Queues: Reliable message processing
  • Lambda Functions: Custom code execution
  • EventBridge: Advanced filtering and routing

58. Kinesis Data Firehose to S3 provides real-time streaming data delivery with:

  • Automatic batching for efficiency
  • Format conversion (JSON to Parquet/ORC)
  • Data transformation via Lambda
  • Error handling with backup bucket

59. S3 Data Ingestion Pipeline Replayability Strategies:

| Strategy | Implementation | Pros | Cons |
|---|---|---|---|
| Source-based replay | Reread from source system | Complete fidelity | Source system dependency |
| S3 versioning | Maintain object versions | Simple implementation | Storage costs |
| Backup copy | Duplicate data to another bucket | Isolation from production | Storage costs |
| Event-driven | Store events in SQS/Kinesis | Decoupled processing | Additional complexity |
| Manifest-based | Track processed files | Precise control | Requires manifest management |

60. Throttling Implementation for S3 Data Ingestion:

  • Client-side rate limiting
  • SQS as a buffer with controlled processing
  • Lambda concurrency limits
  • API Gateway throttling for web-based uploads
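
A minimal client-side token-bucket sketch (the rate is an assumption; call acquire() before each S3 request):

```python
import time

class TokenBucket:
    """Allow at most `rate` requests per second, smoothing bursts."""

    def __init__(self, rate: float):
        self.rate = rate
        self.allowance = rate
        self.last = time.monotonic()

    def acquire(self) -> None:
        now = time.monotonic()
        # Accrue tokens for the time elapsed, capped at one second's worth.
        self.allowance = min(
            self.rate, self.allowance + (now - self.last) * self.rate
        )
        self.last = now
        if self.allowance < 1.0:
            # Sleep just long enough for one full token to accumulate.
            time.sleep((1.0 - self.allowance) / self.rate)
            self.allowance = 0.0
        else:
            self.allowance -= 1.0

limiter = TokenBucket(rate=3000)  # stay under the 3,500 PUT/s per-prefix limit
```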

S3 Advanced Features

61. S3 Access Points simplify managing access to shared datasets with dedicated access policies.

62. S3 Object Lambda transforms data retrieved from S3 before returning to the application, enabling:

  • Redacting PII
  • Converting formats
  • Enriching data
  • Filtering rows/columns

63. S3 Requester Pays buckets require the requester to pay for data transfer and request costs instead of the bucket owner.

64. S3 Inventory provides scheduled flat-file output listing objects and metadata.

65. S3 Batch Operations with Inventory enables bulk operations on objects identified in inventory reports.

66. S3 Storage Class Analysis helps identify when to transition objects to lower-cost storage classes.

67. S3 Select Query Examples:

  • CSV: `SELECT s._1, s._2 FROM S3Object s WHERE CAST(s._3 AS INT) > 100` (CSV fields are read as strings, so numeric comparisons need a CAST)
  • JSON: `SELECT s.name, s.age FROM S3Object s WHERE s.age > 25`
  • Parquet: `SELECT * FROM S3Object WHERE age > 30 LIMIT 100`
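
The CSV query could be issued with boto3 roughly like this (bucket, key, and serialization settings are assumptions):

```python
import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="my-bucket",
    Key="data/records.csv",
    ExpressionType="SQL",
    Expression="SELECT s._1, s._2 FROM S3Object s WHERE CAST(s._3 AS INT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; Records events carry the matching rows.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```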

S3 Performance Best Practices

68. S3 Performance Best Practices:

| Best Practice | Implementation | Benefit |
|---|---|---|
| Prefix Parallelization | Use multiple prefixes | Higher throughput |
| Range GETs | Parallel byte-range fetches | Faster large object access |
| Transfer Acceleration | Enable for bucket | Faster long-distance transfers |
| Multipart Upload | Split large files | Parallel upload, resiliency |
| S3 Select | Server-side filtering | Reduced network transfer |
| Compression | Compress objects | Lower storage costs, faster transfer |
| Caching | CloudFront or ElastiCache | Lower latency, reduced load |

69. S3 Multipart Upload Thresholds:

  • Recommended for objects > 100 MB
  • Required for objects > 5 GB
  • Optimal part size: 25-100 MB for typical networks
  • Maximum of 10,000 parts per upload

70. S3 Performance Testing Methodology:

  • Establish baseline with single-threaded transfers
  • Test with multiple threads/connections
  • Experiment with different part sizes
  • Measure with different prefix strategies
  • Compare with/without Transfer Acceleration

S3 Data Processing Patterns

71. S3 Event-Driven Processing Patterns:

| Pattern | Implementation | Use Case |
|---|---|---|
| Direct Lambda | S3 event → Lambda | Simple transformations |
| Queue-based | S3 event → SQS → Lambda | Rate limiting, retry handling |
| Fan-out | S3 event → SNS → multiple endpoints | Multiple consumers |
| Orchestrated | S3 event → Step Functions | Complex workflows |
| Stream processing | S3 event → Kinesis → processors | Real-time analytics |

72. S3 Batch Processing Patterns:

| Pattern | Implementation | Use Case |
|---|---|---|
| EMR | Spark/Hadoop on EMR reading from S3 | Big data processing |
| Glue ETL | AWS Glue jobs reading from S3 | Serverless ETL |
| Batch Operations | S3 Batch Operations with Lambda | Object-level operations |
| Athena | SQL queries directly on S3 | Interactive analysis |
| Redshift Spectrum | Redshift external tables on S3 | Data warehousing |

73. S3 Data Lake Processing Layers:

| Layer | Description | S3 Implementation |
|---|---|---|
| Raw/Bronze | Original unmodified data | S3 Standard with lifecycle to IA/Glacier |
| Processed/Silver | Cleansed, validated data | S3 Standard with partitioning |
| Curated/Gold | Business-ready datasets | S3 Standard with optimized formats |
| Application | Purpose-built data products | S3 Standard with CloudFront |

S3 Security Best Practices

74. S3 Security Best Practices:

| Best Practice | Implementation | Benefit |
|---|---|---|
| Block Public Access | Enable at account level | Prevent accidental exposure |
| Default Encryption | Enable SSE-S3 or SSE-KMS | Data protection at rest |
| VPC Endpoints | Create Gateway Endpoint for S3 | Private network access |
| Access Logging | Enable S3 access logging | Audit and compliance |
| IAM Policies | Least privilege principle | Controlled access |
| Bucket Policies | Explicit allow/deny | Resource-level control |
| Presigned URLs | Time-limited access | Temporary permissions |
| Object Lock | Enable for critical data | Immutability |

75. S3 Security Monitoring:

  • CloudTrail for API activity
  • S3 Access Logs for object-level access
  • CloudWatch Metrics for operation counts
  • AWS Config for configuration compliance
  • Macie for sensitive data detection

S3 Data Migration and Transfer

76. S3 Data Migration Options:

| Option | Transfer Speed | Data Size Range | Use Case |
|---|---|---|---|
| Direct Upload | Depends on bandwidth | MB to GB | Small files, good connectivity |
| AWS CLI | Depends on bandwidth | GB to TB | Command-line automation |
| S3 Transfer Acceleration | Up to 500% faster | GB to TB | Long-distance transfers |
| AWS DataSync | Up to 10 Gbps | TB to PB | Scheduled migrations |
| AWS Transfer Family | Depends on bandwidth | GB to TB | FTP/SFTP compatibility |
| AWS Storage Gateway | Depends on bandwidth | TB to PB | Hybrid cloud integration |
| AWS Snowcone | Offline | Up to 8 TB | Edge locations |
| AWS Snowball | Offline | Up to 80 TB | Large datasets |
| AWS Snowmobile | Offline | Up to 100 PB | Massive data centers |

77. S3 Transfer Acceleration Performance Comparison:

  • 1 TB transfer from US to Australia:
    • Standard transfer: ~12 hours
    • With Transfer Acceleration: ~2.5 hours

78. AWS DataSync Performance:

  • Up to 10 Gbps throughput
  • Parallel processing of files
  • Automatic retry mechanism
  • Built-in validation

S3 Compliance and Governance

79. S3 Compliance Features:

| Feature | Implementation | Compliance Need |
|---|---|---|
| Object Lock | WORM protection | SEC Rule 17a-4, FINRA, CFTC |
| Glacier Vault Lock | Immutable vault policy | Long-term records retention |
| Access Logging | Detailed access logs | Audit requirements |
| Inventory Reports | Scheduled metadata reports | Asset management |
| Replication | Cross-region or same-region | Data residency, DR |
| Versioning | Object version history | Change tracking |
| Lifecycle Policies | Automated retention | Records management |
| Macie Integration | Sensitive data discovery | PII protection |

80. S3 Object Lock Modes:

  • Governance mode: Special permissions can override
  • Compliance mode: No overrides during retention period
  • Legal hold: Indefinite retention independent of retention period

S3 Integration with Data Engineering Services

81. S3 Integration with AWS Glue:

  • Glue crawlers scan S3 data to populate the Glue Data Catalog
  • Glue ETL jobs read from and write to S3
  • Glue Data Catalog provides metadata for S3 objects
  • Glue schema registry manages schemas for S3 data

82. S3 Integration with Amazon Athena:

  • Serverless SQL queries directly on S3 data
  • Supports CSV, JSON, ORC, Avro, Parquet
  • Pay only for data scanned
  • Federated queries to other data sources

83. S3 Integration with Amazon EMR:

  • EMRFS provides S3 access from EMR clusters
  • S3DistCp for efficient data copying
  • EMRFS consistent view (legacy; unnecessary since S3 became strongly consistent in December 2020)
  • EMR supports direct processing of S3 data

84. S3 Integration with Amazon Redshift:

  • COPY command loads data from S3 to Redshift
  • Redshift Spectrum queries data directly in S3
  • UNLOAD command exports query results from Redshift to S3
  • Automatic compression encoding

85. S3 Integration with AWS Lake Formation:

  • Centralized permissions for S3 data lakes
  • Column-level, row-level, and cell-level security
  • Tag-based access control
  • Data location registration

S3 Throughput and Latency Characteristics

86. S3 Throughput Characteristics:

  • Unlimited total throughput (scales with request rate)
  • 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix
  • No bandwidth limits for a single bucket
  • Multipart uploads recommended for objects > 100 MB

87. S3 Latency Characteristics:

  • First-byte latency: typically 100-200 ms
  • Varies by region and request type
  • GET operations faster than PUT operations
  • S3 Transfer Acceleration improves latency for distant clients

88. S3 Performance Comparison:

| Operation | Average Latency | Throughput Limit | Optimization |
|---|---|---|---|
| GET | 100-200 ms | 5,500 req/s per prefix | CloudFront caching |
| PUT | 200-300 ms | 3,500 req/s per prefix | Multipart upload |
| LIST | Varies with objects | Rate limited | Prefix organization |
| DELETE | 200-300 ms | 3,500 req/s per prefix | Batch operations |

89. S3 Performance Monitoring Metrics:

  • FirstByteLatency: Time to first byte
  • TotalRequestLatency: End-to-end latency
  • BytesDownloaded/BytesUploaded: Data transfer volume
  • 4xxErrors/5xxErrors: Error counts
  • ReplicationLatency: Time for replication

S3 Cost Optimization

90. S3 Intelligent-Tiering automatically moves objects between access tiers based on usage patterns, with no retrieval charges and no operational overhead; it charges a small monthly per-object monitoring and automation fee.

91. S3 Lifecycle Configuration Example:

```json
{
  "Rules": [
    {
      "ID": "Move to IA after 30 days, Glacier after 90, expire after 365",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"}
      ],
      "Expiration": {"Days": 365}
    }
  ]
}
```
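
Applying the same rule programmatically with boto3 might look like this (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-log-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "logs-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```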

92. S3 Storage Cost Comparison (prices approximate):

| Storage Class | Price per GB-month | Retrieval Cost | Min Duration | Min Size |
|---|---|---|---|---|
| Standard | $0.023 | None | None | None |
| Intelligent-Tiering | $0.023 + monitoring | None | None | None |
| Standard-IA | $0.0125 | $0.01/GB | 30 days | 128 KB |
| One Zone-IA | $0.01 | $0.01/GB | 30 days | 128 KB |
| Glacier Instant Retrieval | $0.004 | $0.03/GB | 90 days | 128 KB |
| Glacier Flexible Retrieval | $0.0036 | $0.01/GB (standard tier) | 90 days | 40 KB |
| Glacier Deep Archive | $0.00099 | $0.02/GB (standard tier) | 180 days | 40 KB |

93. S3 Request Pricing (prices approximate):

  • PUT/COPY/POST/LIST: $0.005 per 1,000 requests
  • GET: $0.0004 per 1,000 requests
  • Lifecycle transitions: $0.01 per 1,000 requests
  • Data retrieval: Varies by storage class

S3 Mind Map

94. AWS S3 Service Mind Map:

```
Amazon S3
├── Storage Classes
│   ├── Standard
│   ├── Intelligent-Tiering
│   ├── Standard-IA
│   ├── One Zone-IA
│   ├── Glacier Instant Retrieval
│   ├── Glacier Flexible Retrieval
│   └── Glacier Deep Archive
├── Data Management
│   ├── Lifecycle Policies
│   ├── Versioning
│   ├── Replication (CRR/SRR)
│   ├── Storage Lens
│   ├── Inventory
│   └── Batch Operations
├── Security
│   ├── IAM Policies
│   ├── Bucket Policies
│   ├── ACLs
│   ├── Encryption (SSE-S3, SSE-KMS, SSE-C)
│   ├── Object Lock
│   ├── Access Points
│   └── VPC Endpoints
├── Performance
│   ├── Transfer Acceleration
│   ├── Multipart Upload
│   ├── Byte-Range Fetches
│   ├── S3 Select
│   └── Prefix Optimization
├── Analytics & Monitoring
│   ├── S3 Analytics
│   ├── CloudWatch Metrics
│   ├── CloudTrail
│   ├── Access Logs
│   └── Event Notifications
└── Integration
    ├── Data Lake Services (Athena, Glue)
    ├── Processing (Lambda, EMR)
    ├── Streaming (Kinesis)
    ├── Migration (DataSync, Transfer Family)
    └── Content Delivery (CloudFront)
```

S3 Additional Features

95. S3 Requester Pays buckets require the requester to pay for data transfer and request costs instead of the bucket owner.

96. S3 Website Hosting provides static website hosting with custom domain support.

97. S3 Directory Buckets (introduced 2023) back the S3 Express One Zone storage class, delivering single-digit-millisecond latency and higher request rates for performance-critical workloads.

98. S3 Access Points simplify managing access to shared datasets with dedicated access policies.

99. S3 Object Lambda transforms data retrieved from S3 before returning to the application.

100. S3 Replayability for Data Ingestion Pipelines:

  • Use S3 versioning to maintain historical versions
  • Implement SQS dead-letter queues for failed processing
  • Store processing metadata with objects using S3 object tags
  • Use manifest files to track processing state
  • Implement idempotent processors that can safely reprocess data
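
One way to sketch the object-tag approach with boto3 (the tag name is an assumption; tag reads and writes are separate, non-atomic requests):

```python
import boto3

s3 = boto3.client("s3")

def already_processed(bucket: str, key: str) -> bool:
    """Check the 'processed' tag before reprocessing an object."""
    tags = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
    return any(t["Key"] == "processed" and t["Value"] == "true" for t in tags)

def mark_processed(bucket: str, key: str) -> None:
    """Record completion on the object itself (replaces existing tags)."""
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": [{"Key": "processed", "Value": "true"}]},
    )
```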

Try Neon for Free →