A Conversation Between a Full Stack Developer and an RDS Expert
Table of Contents
Part 1: The Confusion Begins
Part 2: How Do They Actually Work?
- Automated Backup Process (Incremental)
- Manual Snapshot Process (EBS-Based)
Part 3: Understanding Incremental Backups
Part 4: The Transaction Log Mystery
Part 5: What You See in the AWS Console
Part 6: Real-World Scenarios with Timestamps
- Scenario 1: Accidental Data Deletion
- Scenario 2: Schema Migration Gone Wrong
- Scenario 3: Disaster Recovery / Region Failure
Part 7: Decision Flow Diagram
Part 8: Cost and Performance Implications
Part 9: Best Practices & Recommendations
Part 10: Common Mistakes to Avoid
Part 11: Quick Reference Guide
Part 12: Monitoring and Alerts
Part 13: Deletion Management
Part 14: Exporting Snapshots
- Exporting Snapshots to Amazon S3
- What is Snapshot Export to S3?
- When and Why to Export to S3
- How to Export Snapshot to S3
- Using Exported Data
Conclusion
Appendix
Part 1: The Confusion Begins
Developer: Hey! I'm working on our production database on AWS RDS, and I'm completely confused. I see "Automated Backups" and "Snapshots" in the console. Aren't they the same thing? Why do we need both?
RDS Expert: Great question! This confuses many people. Let me break it down for you. They're actually quite different in purpose and functionality.
Think of it this way:
- Automated Backups = Your continuous safety net (like a time machine)
- Manual Snapshots = Your bookmarks/checkpoints (like saving your game progress)
Let me show you how they differ:
┌─────────────────────────────────────────────────────────────┐
│ AUTOMATED BACKUPS │
├─────────────────────────────────────────────────────────────┤
│ • Continuous, automatic │
│ • Point-in-time recovery (PITR) │
│ • Includes transaction logs │
│ • Retention: 0-35 days │
│ • Deleted when RDS instance is deleted │
│ • Taken during backup window │
│ • INCREMENTAL after first full backup ✨ │
│ • Storage-efficient │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ MANUAL SNAPSHOTS │
├─────────────────────────────────────────────────────────────┤
│ • Manual, on-demand │
│ • No point-in-time recovery │
│ • No transaction logs │
│ • Retention: Forever (until you delete) │
│ • Persist even after RDS deletion │
│ • Taken anytime you want │
│ • FULL COPY (first time), then incremental ✨ │
│ • Uses EBS snapshot technology │
└─────────────────────────────────────────────────────────────┘
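In code, the split is just as clean. Here's a minimal boto3 sketch of how each one is configured; the instance identifier, snapshot name, and backup window are placeholders, not values from a real system:

# Minimal sketch: configuring both mechanisms with boto3 (identifiers are placeholders)
import boto3

rds = boto3.client('rds')

# Automated backups: set retention (1-35 days; 0 disables) and a backup window
rds.modify_db_instance(
    DBInstanceIdentifier='my-prod-db',
    BackupRetentionPeriod=7,
    PreferredBackupWindow='03:00-04:00',  # UTC; must not overlap maintenance
    ApplyImmediately=True,
)

# Manual snapshot: taken on demand, kept until you delete it
rds.create_db_snapshot(
    DBInstanceIdentifier='my-prod-db',
    DBSnapshotIdentifier='my-prod-db-checkpoint-2024-01-15',
)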
Part 2: How Do They Actually Work?
Developer: Okay, that helps. But HOW do they work under the hood? What's actually happening? And what does "incremental" mean?
RDS Expert: Excellent! This is where it gets really interesting. Let me explain the crucial difference between how automated backups and snapshots handle data.
Automated Backup Process
┌──────────────────────────────────────────────────────────────────┐
│ AUTOMATED BACKUP FLOW (INCREMENTAL) │
└──────────────────────────────────────────────────────────────────┘
Day 1 - Monday 03:00 AM (Backup Window Starts)
│
├─► FULL BACKUP taken (First time only)
│ ├─► Stored in S3 (encrypted)
│ ├─► Size: 100 GB (complete database)
│ └─► This becomes the baseline
│
├─► Throughout the day...
│ │
│ ├─► 03:15 AM - Transaction Log #1 captured
│ ├─► 03:20 AM - Transaction Log #2 captured
│ ├─► 03:25 AM - Transaction Log #3 captured
│ └─► ... continues every 5 minutes
│
Day 2 - Tuesday 03:00 AM (Next Backup Window)
│
├─► INCREMENTAL BACKUP taken ✨
│ ├─► Only changed blocks since Monday's full backup
│ ├─► Size: Only 3 GB (just the delta/changes)
│ └─► Much faster than full backup
│
├─► Example of changes captured:
│ ├─► Block 1234: Updated customer record
│ ├─► Block 5678: New order inserted
│ ├─► Block 9012: Product price updated
│ └─► Only these changed blocks are backed up
│
Day 3 - Wednesday 03:00 AM
│
├─► INCREMENTAL BACKUP taken ✨
│ ├─► Only changed blocks since Tuesday
│ ├─► Size: 2.5 GB (just today's changes)
│ └─► Continues building on previous backups
│
Day 4-7 - Similar incremental backups
│ └─► Each backup only stores what changed
│
Day 8 - Next Monday 03:00 AM
│
└─► New FULL BACKUP cycle may start (RDS decides)
└─► Or continue incremental chain
Manual Snapshot Process
┌──────────────────────────────────────────────────────────────────┐
│ MANUAL SNAPSHOT FLOW (EBS-BASED) │
└──────────────────────────────────────────────────────────────────┘
First Snapshot - Monday 2:00 PM
│
├─► You click "Take Snapshot"
│ │
│ ├─► RDS uses EBS snapshot technology
│ ├─► Creates FULL copy of all EBS blocks
│ ├─► Size: 100 GB (complete database state)
│ └─► Takes 10-15 minutes
│
Second Snapshot - Monday 6:00 PM (same day)
│
├─► You click "Take Snapshot" again
│ │
│ ├─► EBS compares with previous snapshot
│ ├─► Only copies CHANGED blocks ✨
│ ├─► Size: Only 2 GB (incremental)
│ ├─► Takes 3-5 minutes
│ └─► References unchanged blocks from first snapshot
│
Third Snapshot - Tuesday 2:00 PM
│
└─► Another snapshot taken
├─► Again, only changed blocks since last snapshot
├─► Size: 5 GB (one day of changes)
└─► Chain continues
How EBS Snapshots Work:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot 1 (Full): [AAAAAAAAAA] 100 GB
↓
Snapshot 2 (Incr): [AAAAAAAAAA] → [BB] 2 GB
(references) (new blocks)
↓
Snapshot 3 (Incr): [AAAAAAAAAA] → [BB] → [CCC] 5 GB
(references) (ref) (new)
Total Storage Used: 100 + 2 + 5 = 107 GB
(Not 100 + 100 + 100 = 300 GB!) ✨
Part 3: Understanding Incremental Backups
Developer: Wait! So both use incremental techniques? What's the actual advantage?
RDS Expert: Exactly! Both use incremental approaches after the first backup, but they work differently. Let me explain the massive advantages:
What Are Incremental Backups?
┌──────────────────────────────────────────────────────────────────┐
│ FULL BACKUP vs INCREMENTAL BACKUP │
└──────────────────────────────────────────────────────────────────┘
WITHOUT INCREMENTAL (Traditional Way):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Day 1: Full Backup → 100 GB stored → Takes 30 min
Day 2: Full Backup → 100 GB stored → Takes 30 min
Day 3: Full Backup → 100 GB stored → Takes 30 min
Day 4: Full Backup → 100 GB stored → Takes 30 min
Day 5: Full Backup → 100 GB stored → Takes 30 min
Day 6: Full Backup → 100 GB stored → Takes 30 min
Day 7: Full Backup → 100 GB stored → Takes 30 min
Total Storage: 700 GB for 7 days! 💰💰💰
Total Time: 3.5 hours of backup time per week
WITH INCREMENTAL (RDS Way):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Day 1: Full Backup → 100 GB stored → Takes 30 min
Day 2: Incremental → 3 GB stored → Takes 3 min
Day 3: Incremental → 2.5 GB stored → Takes 2 min
Day 4: Incremental → 4 GB stored → Takes 4 min
Day 5: Incremental → 3.5 GB stored → Takes 3 min
Day 6: Incremental → 2 GB stored → Takes 2 min
Day 7: Incremental → 3 GB stored → Takes 3 min
Total Storage: 118 GB for 7 days! ✨
Total Time: 47 minutes of backup time per week
SAVINGS: 83% storage, 77% time! 🎉
Storage Calculations Explained
┌──────────────────────────────────────────────────────────────────┐
│ REAL STORAGE CALCULATION │
└──────────────────────────────────────────────────────────────────┘
Scenario: 100 GB Database, 7-day retention
CORRECT CALCULATION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Day 1: Full backup = 100 GB
Day 2: +3 GB changes = 103 GB total
Day 3: +2.5 GB changes = 105.5 GB total
Day 4: +4 GB changes = 109.5 GB total
Day 5: +3.5 GB changes = 113 GB total
Day 6: +2 GB changes = 115 GB total
Day 7: +3 GB changes = 118 GB total
Average daily change rate: ~3 GB (3% of database size)
Total storage for 7 days: ~118 GB
NOT: 7 days × 100 GB = 700 GB ❌
FACTORS AFFECTING STORAGE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Change Rate (Data Volatility):
├─ Read-heavy workload: 1-2% daily change → Less storage
├─ Balanced workload: 3-5% daily change → Moderate storage
└─ Write-heavy workload: 10-15% daily change → More storage
Example (100 GB database, 7 days):
├─ Low change (2%/day): 100 + (6 × 2) = 112 GB
├─ Medium change (5%/day): 100 + (6 × 5) = 130 GB
└─ High change (10%/day): 100 + (6 × 10) = 160 GB
2. Retention Period:
├─ 7 days retention: Base + (6 × daily_change)
├─ 14 days retention: Base + (13 × daily_change)
└─ 35 days retention: Base + (34 × daily_change)
3. Backup Window Optimization:
└─ RDS periodically consolidates old incremental backups
to optimize storage
REAL EXAMPLE CALCULATION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
100 GB database with 3% daily change rate:
7-day retention:
├─ Storage needed: 100 GB + (6 days × 3 GB) = 118 GB
├─ Free tier: 100 GB (matches database size)
├─ Chargeable: 118 - 100 = 18 GB
└─ Cost: 18 GB × $0.095 = $1.71/month ✅
14-day retention:
├─ Storage needed: 100 GB + (13 days × 3 GB) = 139 GB
├─ Free tier: 100 GB
├─ Chargeable: 139 - 100 = 39 GB
└─ Cost: 39 GB × $0.095 = $3.71/month ✅
35-day retention (maximum):
├─ Storage needed: 100 GB + (34 days × 3 GB) = 202 GB
├─ Free tier: 100 GB
├─ Chargeable: 202 - 100 = 102 GB
└─ Cost: 102 GB × $0.095 = $9.69/month
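All of these retention figures follow one formula: the full base plus one day's worth of changed blocks per additional retention day, with the first 100% of the database size free. A quick sketch of that arithmetic (the 3 GB/day change rate and $0.095/GB price are the assumptions used above):

# Sketch of the storage/cost arithmetic used above (all inputs are assumptions)
def backup_storage_gb(db_size_gb, daily_change_gb, retention_days):
    # One full base plus one incremental per additional retention day
    return db_size_gb + (retention_days - 1) * daily_change_gb

def backup_cost_usd(db_size_gb, daily_change_gb, retention_days, price=0.095):
    total = backup_storage_gb(db_size_gb, daily_change_gb, retention_days)
    chargeable = max(0, total - db_size_gb)  # free tier = 100% of DB size
    return chargeable * price

for days in (7, 14, 35):
    print(f"{days:>2} days: {backup_storage_gb(100, 3, days):.0f} GB, "
          f"${backup_cost_usd(100, 3, days):.2f}/month")
# 7 days -> 118 GB, ~$1.71/month (matches the table above)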
Advantages of Incremental Backups
┌──────────────────────────────────────────────────────────────────┐
│ WHY INCREMENTAL BACKUPS ARE AWESOME │
└──────────────────────────────────────────────────────────────────┘
1. MASSIVE STORAGE SAVINGS 💰
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Traditional full backups: 700 GB for 7 days
Incremental backups: 118 GB for 7 days
Savings: 83% reduction in storage costs!
Real cost example:
├─ Full backups: (700 - 100 free) × $0.095 = $57/month
└─ Incremental: (118 - 100 free) × $0.095 = $1.71/month
Saves: $55.29/month! 🎉
2. FASTER BACKUP WINDOWS ⚡
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Full backup: 30 minutes (copies all 100 GB)
Incremental backup: 2-5 minutes (copies only 2-5 GB changes)
Benefits:
├─ Shorter maintenance windows
├─ Less I/O impact on production
├─ More frequent backup opportunities
└─ Faster to complete within backup window
3. REDUCED I/O OVERHEAD 📊
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Full backup: Reads entire 100 GB database
├─ Millions of I/O operations
├─ Significant CPU usage
└─ Can impact application performance
Incremental backup: Reads only changed blocks
├─ Thousands of I/O operations (not millions)
├─ Minimal CPU usage
└─ Negligible application impact
4. NETWORK BANDWIDTH SAVINGS 🌐
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Data transferred to S3:
├─ Full backup: 100 GB daily = 700 GB/week
└─ Incremental: 100 GB + 18 GB = 118 GB/week
Saves: 582 GB bandwidth per week!
5. EFFICIENT RESTORE OPERATIONS 🔄
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
RDS optimizes restore by:
├─ Using the base full backup
├─ Applying only necessary incremental changes
└─ Parallel processing of incremental blocks
Result: Restore is almost as fast as from a full backup!
6. BETTER RETENTION OPTIONS 📅
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Because incrementals use less space:
├─ Can afford longer retention periods
├─ 35 days retention becomes economical
└─ More restore points available
COMPARISON TABLE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Metric Full Backup Incremental Improvement
─────────────────────────────────────────────────────────────────
Storage (7 days) 700 GB 118 GB 83% less
Backup time 30 min 2-5 min 90% faster
I/O operations Millions Thousands 99% less
Network transfer 700 GB/week 118 GB/week 83% less
Performance impact High Minimal 95% better
Cost (7 days) $57/month $1.71/month 97% cheaper
Developer: Wow! So incremental backups are a game-changer. But how does RDS know which blocks changed?
RDS Expert: Great question! Here's how:
┌──────────────────────────────────────────────────────────────────┐
│ HOW INCREMENTAL DETECTION WORKS │
└──────────────────────────────────────────────────────────────────┘
BLOCK-LEVEL CHANGE TRACKING:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Database storage is divided into blocks (typically 8KB each)
Example: 100 GB database = ~13 million blocks
RDS tracks each block's state:
│
├─► Block #1234567: Last modified: Jan 15 03:15:22
├─► Block #1234568: Last modified: Jan 14 22:30:45
├─► Block #1234569: Last modified: Jan 15 14:22:10
└─► ... (13 million blocks tracked)
During backup at 03:00 AM:
│
├─► Compare each block's timestamp with last backup
├─► If modified_time > last_backup_time:
│ └─► Include this block in incremental backup
└─► If modified_time <= last_backup_time:
└─► Skip this block (already backed up)
EXAMPLE SCENARIO:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Last backup: Jan 15, 03:00 AM
Current backup: Jan 16, 03:00 AM
Changes in last 24 hours:
├─ 10,000 INSERT operations → 10,000 new blocks
├─ 5,000 UPDATE operations → 5,000 modified blocks
├─ 2,000 DELETE operations → 2,000 freed blocks
└─ Total: 17,000 blocks changed out of 13 million (0.13%)
Backup captures:
├─ Only these 17,000 changed blocks
├─ Size: 17,000 blocks × 8 KB = 136 MB
└─ Not the entire 100 GB!
TECHNOLOGY BEHIND IT:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
RDS uses:
├─ MySQL: Binary log positions + InnoDB change buffer
├─ PostgreSQL: WAL (Write-Ahead Log) + LSN tracking
├─ EBS: Block-level change tracking in storage layer
└─ All coordinated to identify changed blocks efficiently
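To make the timestamp comparison concrete, here's a toy model of the decision in Python. It's purely illustrative; the real tracking lives inside the EBS storage layer, and the block IDs and times below are made up to match the example:

# Toy illustration of block-level change detection (sample data is invented)
from datetime import datetime

block_mtimes = {                      # block id -> last modification time
    1234567: datetime(2024, 1, 15, 3, 15, 22),
    1234568: datetime(2024, 1, 14, 22, 30, 45),
    1234569: datetime(2024, 1, 15, 14, 22, 10),
}
last_backup_time = datetime(2024, 1, 15, 3, 0, 0)

# Incremental backup = only blocks modified since the last backup
changed = [b for b, t in block_mtimes.items() if t > last_backup_time]
print(f"{len(changed)} of {len(block_mtimes)} blocks included "
      f"({len(changed) * 8} KB at 8 KB per block)")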
Part 4: The Transaction Log Mystery
Developer: You keep mentioning "transaction logs." What ARE they, and why do they matter? How do they relate to incremental backups?
RDS Expert: Great question! Transaction logs are the SECRET SAUCE that makes Point-in-Time Recovery possible. They work alongside incremental backups. Let me explain:
What Transaction Logs Capture
┌──────────────────────────────────────────────────────────────────┐
│ TRANSACTION LOG EXAMPLE │
└──────────────────────────────────────────────────────────────────┘
Monday 03:00 AM - Full Backup Taken
├─── Database State: 1000 users, 5000 orders
│
│ [Backup captures entire database: 100 GB]
│
│
Monday 03:05 AM - Transaction Log #1
├─── INSERT INTO users VALUES (1001, 'John Doe')
├─── UPDATE orders SET status='shipped' WHERE id=4523
├─── DELETE FROM temp_data WHERE created < '2024-01-01'
│
│
Monday 10:15 AM - Transaction Log #2
├─── INSERT INTO orders VALUES (5001, 1001, 'Product X')
├─── UPDATE users SET last_login=NOW() WHERE id=1001
│
│
Monday 02:30 PM - Transaction Log #3
├─── INSERT INTO orders VALUES (5002, 950, 'Product Y')
│
│
Monday 02:35 PM - 💥 DISASTER! Accidental DELETE
├─── DELETE FROM orders WHERE status='pending' 😱
(Oops! Deleted 500 orders by mistake!)
Tuesday 03:00 AM - Incremental Backup
├─── Captures all block changes since Monday 03:00 AM
├─── Size: 3 GB (only changed data blocks)
└─── Does NOT include individual transaction details
THE KEY DIFFERENCE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Incremental Backups:
├─ Capture: Changed BLOCKS (at backup time)
├─ Frequency: Daily (during backup window)
├─ Granularity: Per-block level
└─ Purpose: Efficient storage of database state
Transaction Logs:
├─ Capture: Individual TRANSACTIONS (continuous)
├─ Frequency: Every 5 minutes, 24/7
├─ Granularity: Per-statement level
└─ Purpose: Enable point-in-time recovery to any second
TOGETHER THEY PROVIDE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Full Backup (Day 1) → Baseline state
+
Incremental Backups → Daily changed blocks
+
Transaction Logs → Second-by-second changes
= Point-in-time recovery to ANY second! ✨
How Recovery Uses Transaction Logs
┌──────────────────────────────────────────────────────────────────┐
│ POINT-IN-TIME RECOVERY PROCESS │
└──────────────────────────────────────────────────────────────────┘
You want to restore to Monday 02:34 PM (1 minute before disaster)
Step 1: Start with Full Backup from Monday 03:00 AM
└─── Restore: 1000 users, 5000 orders [100 GB]
Step 2: Apply Incremental Backup (if available)
└─── Note: Next incremental is Tuesday 03:00 AM
└─── So we skip this (not needed yet)
Step 3: Apply Transaction Logs from 03:00 AM to 02:34 PM
├─── Log 03:05 AM: + John Doe user
├─── Log 03:05 AM: + Update order 4523
├─── Log 03:05 AM: + Delete old temp data
├─── Log 10:15 AM: + Order 5001
├─── Log 10:15 AM: + Update John's login
└─── Log 02:30 PM: + Order 5002
Step 4: STOP at 02:34 PM ⏰
└─── DON'T apply the DELETE that happened at 02:35 PM
Result: Database restored to 02:34 PM - before the disaster! ✅
HOW IT WORKS VISUALLY:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Timeline:
│
│ Mon 03:00 AM ──────────────────────────────── Tue 03:00 AM
│ │ │
│ │ │
│ [Full Backup] [Incremental Backup]
│ 100 GB 3 GB changes
│ │ │
│ ├──► Transaction Logs ────────────────────────┤
│ │ (captured every 5 min, 24/7) │
│ │ │
│ │ 02:34 PM 02:35 PM │
│ │ ↑ ↑ │
│ │ (Restore) (Disaster) │
│ │ here │
│ │ │
│ └──────────────────┬───────────────────────────┘
│ │
│ Recovery combines:
│ 1. Full backup (03:00 AM)
│ 2. Transaction logs (03:00 AM-02:34 PM)
│ 3. Stops before disaster!
WITHOUT TRANSACTION LOGS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Could only restore to:
├─ Monday 03:00 AM (full backup) ❌ 11.5 hours of data loss
└─ Tuesday 03:00 AM (next backup) ❌ Includes the disaster!
WITH TRANSACTION LOGS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Can restore to ANY second between backups! ✅
├─ Monday 03:00:01 AM ✅
├─ Monday 10:15:30 AM ✅
├─ Monday 02:34:59 PM ✅ (1 second before disaster)
└─ Data loss: Less than 1 minute!
Developer: Wow! So transaction logs are like a detailed journal of every change?
RDS Expert: Exactly! They're captured and shipped to backup storage every 5 minutes, around the clock, and they let you restore to ANY point within your backup retention period, down to the second. The combination of incremental backups and transaction logs gives you the best of both worlds: storage efficiency AND precise recovery!
Part 5: What You See in the AWS Console
Developer: Okay, so what will I actually see in the AWS Console?
RDS Expert: Let me show you what each section displays:
Console View - Automated Backups
┌──────────────────────────────────────────────────────────────────┐
│ AWS RDS Console → Database → Maintenance & backups │
└──────────────────────────────────────────────────────────────────┘
Automated Backups Section:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ Enabled
Backup retention period: 7 days
Backup window: 03:00 - 04:00 UTC
Latest Restore Time: 2024-01-15 14:35:22 UTC
Earliest Restore Time: 2024-01-08 14:35:22 UTC
⏰ You can restore to ANY point between these times!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Backup Storage Information:
Total backup storage: 118 GB
├─ Free tier (matches DB): 100 GB ✅
└─ Chargeable storage: 18 GB 💰
Breakdown:
├─ Full backup (7 days ago): 100 GB
├─ Incremental backups (6 days): 15 GB
└─ Transaction logs: 3 GB
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
System Backups (Not directly visible, managed by AWS):
├─ 2024-01-15 03:00 - rds:prod-db-2024-01-15-03-00 (Incremental)
├─ 2024-01-14 03:00 - rds:prod-db-2024-01-14-03-00 (Incremental)
├─ 2024-01-13 03:00 - rds:prod-db-2024-01-13-03-00 (Incremental)
├─ 2024-01-12 03:00 - rds:prod-db-2024-01-12-03-00 (Incremental)
├─ 2024-01-11 03:00 - rds:prod-db-2024-01-11-03-00 (Incremental)
├─ 2024-01-10 03:00 - rds:prod-db-2024-01-10-03-00 (Incremental)
└─ 2024-01-09 03:00 - rds:prod-db-2024-01-09-03-00 (Full backup)
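The same restore window is available programmatically. A small boto3 sketch (the instance name is a placeholder):

# Sketch: reading the PITR window with boto3 (instance name is a placeholder)
import boto3

rds = boto3.client('rds')

db = rds.describe_db_instances(DBInstanceIdentifier='prod-db')['DBInstances'][0]
print("Latest restorable time:", db['LatestRestorableTime'])

# The earliest/latest pair is exposed on the automated backup record
backup = rds.describe_db_instance_automated_backups(
    DBInstanceIdentifier='prod-db')['DBInstanceAutomatedBackups'][0]
print("Restore window:", backup['RestoreWindow'])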
Console View - Manual Snapshots
┌──────────────────────────────────────────────────────────────────┐
│ AWS RDS Console → Snapshots │
└──────────────────────────────────────────────────────────────────┘
Manual Snapshots:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Name | Status | Created | Size | Type
───────────────────────────────────────────────────────────────────────────────
before-schema-migration | Available | 2024-01-15 09:00 UTC | 105 GB | Manual
pre-production-release-v2.1 | Available | 2024-01-10 14:30 UTC | 103 GB | Manual
before-data-cleanup | Available | 2024-01-05 11:15 UTC | 100 GB | Manual
Note: The Size column shows each snapshot's allocated storage; the space actually consumed (and billed) is incremental:
├─ First snapshot: 100 GB (full copy)
├─ Second snapshot: +3 GB (incremental)
├─ Third snapshot: +5 GB (incremental)
└─ Total: 108 GB (not 300 GB!)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Actions Available:
• Restore snapshot (creates new instance from this point)
• Copy snapshot (duplicate to same/different region)
• Share snapshot (with other AWS accounts)
• Delete snapshot (free up storage)
• Migrate snapshot (e.g. migrate a MySQL snapshot to Aurora)
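To script what that Snapshots page shows, a boto3 sketch (again with a placeholder instance name):

# Sketch: listing manual snapshots with boto3 (instance name is a placeholder)
import boto3

rds = boto3.client('rds')

snapshots = rds.describe_db_snapshots(
    DBInstanceIdentifier='prod-db',
    SnapshotType='manual',
)['DBSnapshots']

for snap in snapshots:
    print(snap['DBSnapshotIdentifier'],
          snap['Status'],
          snap.get('SnapshotCreateTime'),    # absent while still creating
          f"{snap['AllocatedStorage']} GB")  # provisioned size, as in the console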
Part 6: Real-World Scenarios with Timestamps
Developer: This is making sense! Can you walk me through some real scenarios with exact timestamps?
RDS Expert: Absolutely! Let's go through three common scenarios:
Scenario 1: Accidental Data Deletion (Use Automated Backup)
┌──────────────────────────────────────────────────────────────────┐
│ SCENARIO: ACCIDENTAL DATA DELETION │
└──────────────────────────────────────────────────────────────────┘
TIMELINE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Monday, Jan 15, 2024
03:00 AM → Automated backup runs (daily backup window)
├─ Incremental backup taken (only changed blocks)
├─ Database size: 100 GB
├─ Changed blocks: 3 GB
└─ Backup completes: 03:05 AM
09:00 AM → Normal business operations
├─ 150 new orders placed
├─ 45 users registered
├─ Transaction logs capturing everything every 5 min
└─ All changes recorded for PITR
02:30 PM → Developer runs UPDATE query
├─ UPDATE products SET price = price * 0.9
├─ WHERE category = 'Electronics'
└─ ✅ This is correct - 10% discount on electronics
02:35 PM → 💥 DISASTER! Developer runs wrong query
├─ UPDATE products SET price = price * 0.9
├─ ❌ Forgot the WHERE clause!
└─ All 10,000 products now have wrong prices!
02:37 PM → Panic! Discovery of the mistake
└─ Need to restore database
RECOVERY PLAN:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Option 1: Point-in-Time Restore to 02:34 PM ✅ RECOMMENDED
Why 02:34 PM?
├─ After the correct discount (02:30 PM) ✅
└─ Before the mistake (02:35 PM) ✅
Process:
1. Select "Restore to point in time"
2. Choose: January 15, 2024, 02:34:00 PM
3. AWS will:
├─ Take the 03:00 AM incremental backup
├─ Apply all transaction logs from 03:00 AM to 02:34 PM
└─ Create new RDS instance: "prod-db-restored"
4. Behind the scenes:
├─ Start with yesterday's full backup (100 GB)
├─ Apply today's incremental (3 GB)
├─ Apply transaction logs (covering 11.5 hours)
└─ Total restore time: 15-20 minutes
5. Result:
├─ 150 orders from morning: ✅ Preserved
├─ 45 new users: ✅ Preserved
├─ Correct electronics discount: ✅ Applied
└─ Wrong price update: ❌ Never happened
Time to restore: 15-20 minutes
Data loss: Only 3 minutes (02:34 PM to 02:37 PM)
Cost: Creating new instance during validation
Option 2: Restore from Manual Snapshot ❌ NOT IDEAL
If you had a snapshot from 09:00 AM:
├─ Snapshot is a full point-in-time copy
├─ No transaction logs attached
└─ Would lose:
├─ 150 orders (09:00 AM to 02:37 PM) ❌
├─ 45 users ❌
├─ Correct discount ❌
└─ 5.5 hours of data lost
LESSON LEARNED:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Automated backups with PITR saved the day:
✅ Minimal data loss (3 minutes)
✅ All legitimate transactions preserved
✅ Quick recovery (15-20 minutes)
✅ Surgical precision (restore to exact second)
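In boto3 terms, the restore in this scenario is a single call. A minimal sketch; the identifiers are from the scenario, and the restore time must be UTC, so treating 02:34 PM as UTC here is an assumption:

# Sketch: the PITR call for this scenario (names and time are illustrative)
import boto3
from datetime import datetime, timezone

rds = boto3.client('rds')

rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier='prod-db',
    TargetDBInstanceIdentifier='prod-db-restored',
    RestoreTime=datetime(2024, 1, 15, 14, 34, 0, tzinfo=timezone.utc),
    # UseLatestRestorableTime=True,  # alternative: newest possible point
)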
Scenario 2: Schema Migration Gone Wrong (Use Manual Snapshot)
┌──────────────────────────────────────────────────────────────────┐
│ SCENARIO: FAILED SCHEMA MIGRATION │
└──────────────────────────────────────────────────────────────────┘
TIMELINE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Saturday, Jan 20, 2024 (Maintenance Window)
01:00 AM → Create manual snapshot before migration ✅
├─ Name: "before-v3-schema-migration"
├─ Database size: 105 GB
├─ Type: Full snapshot (first of the day)
├─ Status: Creating... (uses EBS snapshot)
└─ Status: Available (01:10 AM) - takes ~10 minutes
01:15 AM → Begin schema migration
├─ ALTER TABLE orders ADD COLUMN tracking_info JSON
├─ CREATE INDEX idx_tracking ON orders(tracking_info)
├─ Migrate 5 million rows...
└─ Transaction logs recording all DDL changes
02:45 AM → 💥 DISASTER! Migration script fails
├─ Error: Foreign key constraint violation
├─ Database in inconsistent state
├─ Some rows migrated, some not
├─ Schema partially changed
└─ Application throwing errors
02:50 AM → Decision: Rollback needed
RECOVERY PLAN:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Use Manual Snapshot ✅ RECOMMENDED
Why NOT Point-in-Time Restore?
├─ Schema changes are DDL (Data Definition Language)
├─ Partial migration created inconsistent state
├─ Transaction logs include all the failed operations
├─ PITR would replay the problematic migration
└─ Result: Restored DB would still be corrupted! ❌
Why Manual Snapshot?
├─ Clean state before ANY migration started
├─ Known good schema (verified before snapshot)
├─ No DDL operations in this snapshot
└─ Guaranteed consistency ✅
Process:
1. Select snapshot "before-v3-schema-migration"
2. Restore to new instance: "prod-db-rollback"
3. AWS creates exact copy from 01:00 AM state
├─ Restores all EBS blocks as they were
├─ Time: 10-15 minutes
└─ No need to apply transaction logs
4. Verify restored database:
├─ Check schema version ✅
├─ Test application connectivity ✅
├─ Verify data integrity ✅
└─ Run read-only tests ✅
5. Cutover:
├─ Update application connection string
├─ Switch DNS/Route53 to new instance
├─ Monitor for 30 minutes
└─ Delete failed instance
Time to restore: 10-15 minutes
Data loss: Everything from 01:00 AM to 02:50 AM (1 hour 50 min)
└─ Acceptable because:
├─ Maintenance window (low traffic)
├─ Application was down anyway
└─ Clean state more important than recent data
WHAT AUTOMATED BACKUP WOULD HAVE DONE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If you tried PITR to Saturday 02:49 AM:
1. Restore the backup chain up to Friday 03:00 AM
└─ Last full backup + incrementals (pre-migration state)
2. Apply transaction logs from Friday 03:00 AM to Saturday 02:49 AM
└─ This includes ALL the failed migration steps from 01:15 AM! ❌
3. Result: Database restored with partial/corrupt schema ❌
4. You'd STILL need the manual snapshot ⚠️
LESSON LEARNED:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Before major schema changes:
✅ ALWAYS create manual snapshot
✅ Manual snapshot = "Save game" before boss fight
✅ Automated backup would replay the mistakes
✅ Clean rollback point is critical
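The rollback itself is one restore call plus a wait. A sketch using the identifiers from this scenario:

# Sketch: rolling back from the manual snapshot (identifiers from the scenario)
import boto3

rds = boto3.client('rds')

rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier='prod-db-rollback',
    DBSnapshotIdentifier='before-v3-schema-migration',
    MultiAZ=True,
)

# Block until the rollback instance is ready, then cut the application over
rds.get_waiter('db_instance_available').wait(
    DBInstanceIdentifier='prod-db-rollback')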
Scenario 3: Disaster Recovery / Region Failure
┌──────────────────────────────────────────────────────────────────┐
│ SCENARIO: REGION FAILURE / DISASTER RECOVERY │
└──────────────────────────────────────────────────────────────────┘
SETUP:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Primary Region: us-east-1
Backup Strategy:
├─ Automated backups: 7 days retention ✅
│ └─ Storage: 118 GB (incremental)
│
└─ Daily manual snapshots copied to us-west-2 ✅
└─ Storage: 100 GB for the first full copy, then a few GB per incremental copy
Every Day at 04:00 AM (automated Lambda):
1. Automated backup runs in us-east-1 (incremental)
2. Lambda function triggers at 04:30 AM
3. Creates manual snapshot (shares incremental blocks with auto backup)
4. Copies snapshot to us-west-2
├─ First copy: 100 GB transfer
├─ Subsequent copies: 3-5 GB transfer (incremental)
└─ Cost: $0.02/GB for transfer
DISASTER TIMELINE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Tuesday, Jan 23, 2024
03:00 AM → Automated incremental backup in us-east-1 ✅
└─ 3 GB of changes captured
04:00 AM → Lambda creates manual snapshot
├─ Snapshot: prod-db-2024-01-23
└─ Incremental from yesterday's snapshot
04:30 AM → Snapshot copied to us-west-2 ✅
├─ Transfer size: 3.2 GB (only incremental blocks)
├─ Cost: 3.2 GB × $0.02 = $0.064
└─ Copy completes: 04:45 AM
[Normal operations throughout the day...]
02:00 PM → 💥 DISASTER! AWS us-east-1 region outage
├─ RDS instance unreachable
├─ Automated backups unreachable (same region)
├─ Application down
├─ Cannot access PITR (transaction logs in us-east-1)
└─ Need to failover to us-west-2
RECOVERY PLAN:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Use Copied Manual Snapshot ✅ ONLY OPTION
Why Can't Use Automated Backup?
├─ Automated backups are region-locked
├─ Transaction logs stored in us-east-1
├─ Cannot access PITR from a different region
└─ Cross-region replication of automated backups wasn't configured ❌
Why Manual Snapshot Works?
├─ Snapshot copied to us-west-2
├─ Complete point-in-time state at 04:00 AM
├─ Self-contained (doesn't need transaction logs)
└─ Can restore independently ✅
Process:
1. Go to us-west-2 console
2. Navigate to Snapshots
3. Find: prod-db-2024-01-23
├─ Size: 105 GB (full database state)
├─ Created: Jan 23, 04:00 AM (us-east-1 time)
└─ Copied: Jan 23, 04:45 AM (us-west-2 time)
4. Restore to new RDS instance
├─ Instance name: prod-db-dr
├─ Instance type: db.r6g.2xlarge (same as primary)
├─ Multi-AZ: Yes
└─ Restore time: 25-35 minutes
5. Update application:
├─ Change connection string to us-west-2 endpoint
├─ Update Route53 to point to new region
├─ Verify application functionality
└─ Monitor performance
6. Data loss calculation:
├─ Last snapshot: 04:00 AM
├─ Outage time: 02:00 PM
└─ Data loss: 10 hours of transactions ⚠️
Time to restore: 30-45 minutes
Data loss: 10 hours (04:00 AM to 02:00 PM)
RTO (Recovery Time Objective): 45 minutes
RPO (Recovery Point Objective): 10 hours
IMPROVED STRATEGY (Reduce Data Loss):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
For critical applications, use:
1. Read Replica in us-west-2 (Better approach)
├─ Continuous replication (lag: 1-5 seconds)
├─ Promote to master on primary region failure
├─ RPO: 1-5 seconds (vs 10 hours)
├─ RTO: 2-5 minutes (vs 45 minutes)
└─ Cost: Additional instance cost, but worth it
2. More frequent snapshot copies
├─ Copy every 4 hours instead of daily
├─ 04:00 AM, 08:00 AM, 12:00 PM, 04:00 PM, etc.
├─ Incremental copies are fast (2-3 GB each)
├─ Reduces RPO to 4 hours
└─ Additional cost: $0.02/GB × 3 GB × 6 copies/day = $0.36/day
COST COMPARISON:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Daily snapshot copy (current):
├─ Storage in us-west-2: 105 GB × $0.095 = $9.98/month
├─ Transfer: 3 GB × $0.02 × 30 days = $1.80/month
└─ Total: $11.78/month
RPO: 24 hours (last daily snapshot)
4-hour snapshot copies (improved):
├─ Storage in us-west-2: 125 GB × $0.095 = $11.88/month
│ (5 extra incrementals of 4 GB each)
├─ Transfer: 3 GB × $0.02 × 6 × 30 = $10.80/month
└─ Total: $22.68/month
RPO: 4 hours (recent snapshot)
Additional cost: $10.90/month
Read Replica (best):
├─ Instance cost: ~$300-500/month (same as primary)
├─ Cross-region transfer: ~$50-100/month (replication data)
└─ Total: ~$350-600/month additional
RPO: 1-5 seconds
RTO: 2-5 minutes
Best for production systems
LESSON LEARNED:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
For disaster recovery:
✅ Cross-region snapshots are minimum requirement
✅ Incremental copies make frequent backups affordable
✅ Read replicas provide best RPO/RTO
✅ Test your DR plan quarterly
✅ Automated backups cannot help in region failure
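For the read-replica option, the key detail is that cross-region replicas are created from the destination region's client, referencing the source by ARN. A sketch (the account ID and instance names are placeholders):

# Sketch: creating the cross-region read replica (ARN/account are placeholders)
import boto3

# Call from the DESTINATION region, referencing the source instance by ARN
rds_dr = boto3.client('rds', region_name='us-west-2')

rds_dr.create_db_instance_read_replica(
    DBInstanceIdentifier='prod-db-replica-west',
    SourceDBInstanceIdentifier='arn:aws:rds:us-east-1:123456789012:db:prod-db',
    SourceRegion='us-east-1',
)

# On a primary-region failure, promote the replica to a standalone primary:
# rds_dr.promote_read_replica(DBInstanceIdentifier='prod-db-replica-west')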
Part 7: Decision Flow Diagram
Developer: This is super helpful! Can you give me a flowchart to help me decide which to use?
RDS Expert: Absolutely! Here's your decision tree:
┌──────────────────────────────────────────────────────────────────┐
│ BACKUP vs SNAPSHOT DECISION TREE │
└──────────────────────────────────────────────────────────────────┘
[Need to Recover Data?]
│
├─ YES
│
┌───────────────┴────────────────┐
│ │
[What happened?] [Need cross-region?]
│ │
┌───────┴────────┐ YES
│ │ │
Accidental Schema/App [Use Manual
DELETE/ Change Snapshot Copy]
UPDATE Failed │
│ │ ├─► Only option for
│ │ │ cross-region DR
├─ Need recent ├─ Need clean │
recovery state before └─► Restore in
│ changes new region
│ │
↓ ↓
[Point-in-Time [Manual Snapshot
Restore] Restore]
│ │
├─ Uses: ├─ Uses:
│ • Full │ • Full snapshot
│ backup │ only
│ • Incremental│ • No transaction
│ backups │ logs
│ • Transaction│
│ logs │
│ │
├─ Can restore ├─ Restore to
to ANY specific
second checkpoint
│ │
├─ Data loss: ├─ Data loss:
Minimal Everything after
(seconds) snapshot time
│ │
└─ Best for: └─ Best for:
• Accidents • Pre-deployment
• User errors • Before migrations
• Recent • Known good states
corruption • Cross-region DR
┌─────────────────────────────┐
│ WHEN TO CREATE MANUAL │
│ SNAPSHOTS │
├─────────────────────────────┤
│ │
│ ✓ Before deployments │
│ ✓ Before schema changes │
│ ✓ Before major updates │
│ ✓ End of month/quarter │
│ ✓ Before bulk data ops │
│ ✓ Production milestones │
│ ✓ Compliance requirements │
│ ✓ Cross-region DR │
│ ✓ Long-term retention │
│ │
└─────────────────────────────┘
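If you script those checkpoints, a small helper keeps the naming consistent and guarantees the snapshot is usable before the risky change starts. A sketch; the naming scheme is just a suggestion (note that dots aren't allowed in snapshot identifiers):

# Sketch: a pre-change snapshot helper (naming convention is a suggestion)
import boto3
from datetime import datetime

rds = boto3.client('rds')

def snapshot_before_change(db_instance, reason):
    """Take a named manual snapshot and wait until it's a usable rollback point."""
    snapshot_id = f"{reason}-{datetime.now():%Y-%m-%d}"
    rds.create_db_snapshot(
        DBInstanceIdentifier=db_instance,
        DBSnapshotIdentifier=snapshot_id,
    )
    rds.get_waiter('db_snapshot_available').wait(
        DBSnapshotIdentifier=snapshot_id)
    return snapshot_id

# e.g. right before a deployment:
snapshot_before_change('prod-db', 'pre-deploy-v2-3-1')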
Part 8: Cost and Performance Implications
Developer: What about costs? Are there differences?
RDS Expert: Great question! Let's break down the costs with correct calculations:
Cost Comparison
┌──────────────────────────────────────────────────────────────────┐
│ COST BREAKDOWN (CORRECTED) │
└──────────────────────────────────────────────────────────────────┘
AUTOMATED BACKUPS (INCREMENTAL):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Storage Cost:
├─ FREE up to 100% of your database storage
│ Example: 100 GB database = 100 GB free backup storage
│
├─ HOW STORAGE ACCUMULATES (Incremental):
│
│ 100 GB Database, 3% daily change rate, 7-day retention:
│ ├─ Day 1: Full backup = 100 GB
│ ├─ Day 2: Incremental = 100 + 3 = 103 GB total
│ ├─ Day 3: Incremental = 103 + 3 = 106 GB total
│ ├─ Day 4: Incremental = 106 + 3 = 109 GB total
│ ├─ Day 5: Incremental = 109 + 3 = 112 GB total
│ ├─ Day 6: Incremental = 112 + 3 = 115 GB total
│ └─ Day 7: Incremental = 115 + 3 = 118 GB total
│
│ Total storage: 118 GB (NOT 700 GB!) ✅
│ Free tier: 100 GB
│ Chargeable: 18 GB
│
├─ Cost calculation:
│ 18 GB × $0.095/GB/month = $1.71/month
│
└─ For different retention periods:
├─ 7 days: ~118 GB → $1.71/month
├─ 14 days: ~139 GB → $3.71/month
└─ 35 days: ~202 GB → $9.69/month
COMPARISON: FULL vs INCREMENTAL BACKUPS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If RDS used FULL backups (it doesn't):
├─ Day 1: 100 GB
├─ Day 2: 100 GB (full copy again)
├─ Day 3: 100 GB (full copy again)
├─ Day 4: 100 GB (full copy again)
├─ Day 5: 100 GB (full copy again)
├─ Day 6: 100 GB (full copy again)
├─ Day 7: 100 GB (full copy again)
├─ Total: 700 GB
├─ Free tier: 100 GB
├─ Chargeable: 600 GB
└─ Cost: 600 GB × $0.095 = $57/month ❌
Actual RDS incremental backups:
├─ Total: 118 GB
├─ Free tier: 100 GB
├─ Chargeable: 18 GB
└─ Cost: 18 GB × $0.095 = $1.71/month ✅
SAVINGS: $55.29/month (97% reduction)! 🎉
MANUAL SNAPSHOTS (EBS-BASED INCREMENTAL):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Storage Cost:
├─ $0.095/GB/month for ALL snapshot storage
│ No free tier
│
├─ HOW SNAPSHOTS ACCUMULATE (EBS Incremental):
│
│ Example: 5 weekly snapshots of 100 GB database
│
│ Week 1: First snapshot (full) = 100 GB
│ Week 2: Snapshot (incremental) = 100 + 3.5 GB (week's changes) = 103.5 GB
│ Week 3: Snapshot (incremental) = 103.5 + 3.5 GB = 107 GB
│ Week 4: Snapshot (incremental) = 107 + 3.5 GB = 110.5 GB
│ Week 5: Snapshot (incremental) = 110.5 + 3.5 GB = 114 GB
│
│ Total storage: 114 GB (NOT 500 GB!)
│ Cost: 114 GB × $0.095 = $10.83/month ✅
│
└─ Cost continues even after database deletion
If snapshots were NOT incremental:
├─ 5 snapshots × 100 GB = 500 GB
└─ Cost: 500 GB × $0.095 = $47.50/month ❌
With EBS incremental snapshots:
├─ Total: 114 GB
└─ Cost: 114 GB × $0.095 = $10.83/month ✅
SAVINGS: $36.67/month (77% reduction)! 🎉
REALISTIC PRODUCTION EXAMPLE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
100 GB Production Database:
Automated Backups (7-day retention):
├─ Storage: 118 GB (incremental)
├─ Free tier: 100 GB
├─ Chargeable: 18 GB
└─ Cost: $1.71/month
Manual Snapshots (weekly for 3 months):
├─ 12 weekly snapshots
├─ Storage with incremental: ~142 GB
│ (100 GB base + 12 weeks × 3.5 GB avg)
├─ Cost: 142 GB × $0.095 = $13.49/month
Manual Snapshots (monthly for 1 year):
├─ 12 monthly snapshots
├─ Storage with incremental: ~175 GB
│ (100 GB base + 12 months × 6.25 GB avg)
├─ Cost: 175 GB × $0.095 = $16.63/month
Total Backup Cost:
├─ Automated: $1.71/month
├─ Weekly snapshots: $13.49/month
├─ Monthly snapshots: $16.63/month
└─ TOTAL: $31.83/month
vs Non-incremental (theoretical):
├─ Automated (7 full): $57/month
├─ Weekly (12 full): $114/month
├─ Monthly (12 full): $114/month
└─ TOTAL: $285/month ❌
INCREMENTAL BACKUPS SAVE: $253.17/month (89%) 🎉🎉🎉
FACTORS AFFECTING YOUR ACTUAL COSTS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Database Size:
└─ Larger databases = more storage, but percentages same
2. Change Rate (Most Important):
├─ Read-heavy (1-2% daily): Lower costs
├─ Balanced (3-5% daily): Moderate costs
└─ Write-heavy (10-15% daily): Higher costs
Example 100 GB DB, 7-day retention:
├─ 2% daily: 100 + (6×2) = 112 GB → $1.14/month
├─ 5% daily: 100 + (6×5) = 130 GB → $2.85/month
└─ 10% daily: 100 + (6×10) = 160 GB → $5.70/month
3. Retention Period:
└─ Longer retention = more incremental backups accumulate
4. Number of Manual Snapshots:
└─ Each incremental snapshot adds ~3-5% of database size
COST OPTIMIZATION TIPS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Leverage automated backup free tier (100% of DB size)
2. Use lifecycle policies to delete old manual snapshots
3. For dev/test environments, reduce retention to 1-3 days
4. Cross-region copies: Only copy critical snapshots
5. Monitor your change rate to predict costs
6. Delete snapshots of deleted databases
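For tip 5, the day-over-day growth of the BackupRetentionPeriodStorageUsed metric approximates your incremental size. A sketch, assuming that metric is published for your instance (values are in bytes; the instance name is a placeholder):

# Sketch: estimating your daily change rate from backup storage growth
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client('cloudwatch')

stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/RDS',
    MetricName='BackupRetentionPeriodStorageUsed',
    Dimensions=[{'Name': 'DBInstanceIdentifier', 'Value': 'prod-db'}],
    StartTime=datetime.now(timezone.utc) - timedelta(days=7),
    EndTime=datetime.now(timezone.utc),
    Period=86400,                # one datapoint per day
    Statistics=['Average'],
)

points = sorted(stats['Datapoints'], key=lambda p: p['Timestamp'])
for prev, cur in zip(points, points[1:]):
    delta_gb = (cur['Average'] - prev['Average']) / 1024**3
    print(f"{cur['Timestamp']:%Y-%m-%d}: ~{delta_gb:.1f} GB of new backup data")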
Performance Impact
┌──────────────────────────────────────────────────────────────────┐
│ PERFORMANCE IMPACT │
└──────────────────────────────────────────────────────────────────┘
AUTOMATED BACKUPS (INCREMENTAL):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
During Backup Window:
Single-AZ deployment:
├─ I/O suspension: Few seconds
├─ First full backup: 20-30 minutes (one-time)
│ └─ Reads entire database
├─ Subsequent incremental backups: 2-5 minutes
│ └─ Only reads changed blocks (90% faster)
├─ Elevated latency: Minimal due to incremental
└─ Schedule during low-traffic period
Multi-AZ deployment: ✅ RECOMMENDED
├─ Backup from standby: Zero impact on primary
├─ Incremental backups: Even faster
├─ No performance degradation
└─ Safe during business hours
Transaction Log Capture:
├─ Frequency: Every 5 minutes, 24/7
├─ Overhead: <1% CPU, minimal I/O
├─ No noticeable impact on production
└─ Required for point-in-time recovery
Performance Benefit of Incremental:
├─ Full backup: 100 GB read, ~30 min, high I/O
└─ Incremental: 3 GB read, ~3 min, low I/O
IMPROVEMENT: 90% faster, 97% less I/O! ✨
MANUAL SNAPSHOTS (EBS INCREMENTAL):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
During Snapshot Creation:
Single-AZ deployment:
├─ First snapshot:
│ ├─ I/O freeze: 1-2 minutes (longer)
│ ├─ Snapshot duration: 20-30 minutes
│ └─ User queries may queue ⚠️
│
├─ Subsequent snapshots (incremental):
│ ├─ I/O freeze: Few seconds (much shorter)
│ ├─ Snapshot duration: 3-5 minutes
│ └─ Minimal impact ✅
│
└─ Take during maintenance window
Multi-AZ deployment: ✅ RECOMMENDED
├─ Snapshot from standby
├─ Incremental snapshots: Very fast
├─ Minimal impact on primary
└─ Safe during business hours
Snapshot Speed Comparison:
├─ First full snapshot: 20-30 min
└─ Incremental snapshots: 3-5 min
IMPROVEMENT: 80% faster! ✨
RESTORE PERFORMANCE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Automated Backup (PITR):
├─ Process:
│ ├─ Restore base backup (full or recent incremental)
│ ├─ Apply subsequent incremental backups
│ ├─ Apply transaction logs to target time
│ └─ All optimized by AWS
│
├─ Performance:
│ ├─ 100 GB database: 15-30 minutes
│ ├─ Incremental application: Parallelized
│ └─ Almost as fast as full restore ✨
│
└─ Benefit of incremental:
Less data to transfer from S3 = faster restore
Manual Snapshot:
├─ Process:
│ ├─ Restore EBS volumes from snapshot
│ ├─ Hydrate blocks on-demand (lazy loading)
│ └─ Full performance gradually available
│
├─ Performance:
│ ├─ 100 GB database: 10-20 minutes
│ ├─ Incremental snapshots: Same speed
│ └─ Instance available faster, but I/O slower initially
│
└─ Benefit of incremental:
Less snapshot storage = lower S3 costs
DATABASE SIZE IMPACT:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
10 GB Database:
├─ Full backup: 5-8 min
├─ Incremental: 1-2 min (80% faster)
└─ Restore: 5-10 min
100 GB Database:
├─ Full backup: 25-35 min
├─ Incremental: 3-5 min (90% faster)
└─ Restore: 15-30 min
1 TB Database:
├─ Full backup: 3-4 hours
├─ Incremental: 20-30 min (95% faster)
└─ Restore: 1-2 hours
The larger the database, the MORE beneficial incremental becomes! ✨
Part 9: Best Practices & Recommendations
Developer: Okay, I'm getting it now. What should our backup strategy look like?
RDS Expert: Here's a comprehensive strategy that leverages incremental backups:
Production Database Backup Strategy
┌──────────────────────────────────────────────────────────────────┐
│ RECOMMENDED BACKUP STRATEGY (INCREMENTAL-AWARE) │
└──────────────────────────────────────────────────────────────────┘
FOR PRODUCTION DATABASES:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. AUTOMATED BACKUPS: ✅ Always Enabled (Incremental)
Configuration:
├─ Retention: 7-14 days (affordable due to incremental)
│ └─ 7 days: ~118 GB → $1.71/month
│ └─ 14 days: ~139 GB → $3.71/month
│ └─ 35 days: ~202 GB → $9.69/month (still affordable!)
│
├─ Backup window: Low-traffic period (e.g., 2-4 AM)
│ └─ Incremental backups complete in 2-5 minutes
│
├─ Multi-AZ: Strongly recommended
│ └─ Backup from standby = zero production impact
│
└─ Encryption: Always enable
Purpose:
└─ Day-to-day protection against accidents
(leverages incremental efficiency)
2. MANUAL SNAPSHOTS: ✅ Event-Based (EBS Incremental)
Create snapshots BEFORE:
├─ Every deployment
│ ├─ Name: "pre-deploy-v2.3.1-2024-01-15"
│ ├─ First: 100 GB (10 min)
│ └─ Subsequent: +3 GB incremental (3 min)
│
├─ Schema migrations
│ └─ Name: "before-schema-migration-2024-01-15"
│
├─ Bulk data operations
│ └─ Name: "before-data-cleanup-2024-01-15"
│
├─ Weekly (Sunday midnight)
│ ├─ Name: "weekly-2024-W03"
│ └─ Incremental from last week's snapshot
│
└─ Monthly (last day of month)
├─ Name: "monthly-2024-01"
└─ Incremental chain continues
Retention (with incremental storage):
├─ Pre-deployment: Keep for 7 days
│ └─ ~7 snapshots × 3 GB avg = 121 GB total
│
├─ Weekly: Keep for 3 months
│ └─ 12 snapshots × 3.5 GB avg = 142 GB total
│
├─ Monthly: Keep for 1 year
│ └─ 12 snapshots × 6 GB avg = 172 GB total
│
└─ Total manual snapshot storage: ~250 GB
Cost: 250 GB × $0.095 = $23.75/month
(vs 3,100 GB if all 31 snapshots were full copies = $294.50/month) ✅
3. CROSS-REGION COPIES: ✅ For Disaster Recovery
Strategy (Incremental-optimized):
├─ Copy daily snapshot to DR region
│ ├─ First copy: 100 GB transfer = $2.00
│ └─ Daily copies: 3-5 GB transfer = $0.06-0.10 each
│
├─ Monthly transfer cost: ~$3-4 (due to incremental)
│ vs $60 if full backups every time ❌
│
└─ Test restore quarterly
Example configuration:
├─ Primary: us-east-1
├─ DR: us-west-2
├─ Daily automated backup: 3 AM (incremental)
├─ Daily manual snapshot: 4 AM (for copying)
├─ Copy to us-west-2: 4:30 AM (only changed blocks)
└─ Retention in DR: 7 days rolling
4. TESTING: ✅ Verify Your Backups
Monthly:
├─ Restore automated backup to test instance
│ └─ Uses incremental chain (fast restore)
├─ Verify data integrity
├─ Test application connectivity
└─ Document restore time
Quarterly:
├─ Full DR drill using cross-region snapshot
├─ Restore from incremental snapshot chain
└─ Measure RTO (Recovery Time Objective)
TOTAL COST BREAKDOWN:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
100 GB Production Database:
Automated Backups (14-day retention):
└─ $3.71/month (incremental)
Manual Snapshots:
├─ Weekly (12 weeks): $10.83/month
└─ Monthly (12 months): $12.92/month
└─ Subtotal: $23.75/month
Cross-Region Copies (us-west-2):
├─ Storage: 107 GB × $0.095 = $10.17/month
└─ Transfer: ~$3/month (incremental)
└─ Subtotal: $13.17/month
TOTAL BACKUP COST: $40.63/month ✅
If NOT incremental (theoretical):
├─ Automated (7 full): $57/month
├─ Manual (24 full copies): $228/month
├─ Cross-region: $60/month
└─ TOTAL: $345/month ❌
INCREMENTAL SAVINGS: $304.37/month (88%) 🎉
Automation Script Example
# Lambda function for incremental-aware snapshot management
import boto3
from datetime import datetime

rds = boto3.client('rds', region_name='us-east-1')
rds_dr = boto3.client('rds', region_name='us-west-2')  # DR-region client
cloudwatch = boto3.client('cloudwatch')

def lambda_handler(event, context):
    db_instance = 'production-db'

    # 1. Create weekly snapshot (incremental from previous)
    if datetime.now().weekday() == 6:  # Sunday
        snapshot_id = f"weekly-{db_instance}-{datetime.now().strftime('%Y-W%W')}"
        print(f"Creating weekly snapshot: {snapshot_id}")
        print("Note: This will be incremental from last week's snapshot")
        rds.create_db_snapshot(
            DBSnapshotIdentifier=snapshot_id,
            DBInstanceIdentifier=db_instance,
            Tags=[
                {'Key': 'Type', 'Value': 'Weekly'},
                {'Key': 'Retention', 'Value': '90days'},
                {'Key': 'Incremental', 'Value': 'true'}
            ]
        )

        # Wait for the snapshot to become available
        waiter = rds.get_waiter('db_snapshot_available')
        waiter.wait(DBSnapshotIdentifier=snapshot_id)

        # Copy to DR region (only incremental changes transferred).
        # Cross-region copies must be issued from the DESTINATION region's
        # client, referencing the source snapshot by ARN.
        print("Copying snapshot to DR region (incremental transfer)")
        rds_dr.copy_db_snapshot(
            SourceDBSnapshotIdentifier=(
                f"arn:aws:rds:us-east-1:123456789012:snapshot:{snapshot_id}"),
            TargetDBSnapshotIdentifier=snapshot_id,
            SourceRegion='us-east-1',
            CopyTags=True
        )

        # Monitor snapshot size (for cost tracking).
        # Note: AllocatedStorage is the provisioned volume size, not the
        # incremental bytes actually billed - treat these metrics as rough.
        snapshot_info = rds.describe_db_snapshots(
            DBSnapshotIdentifier=snapshot_id
        )['DBSnapshots'][0]
        allocated_storage = snapshot_info.get('AllocatedStorage', 0)

        cloudwatch.put_metric_data(
            Namespace='Custom/RDS/Backups',
            MetricData=[
                {
                    'MetricName': 'SnapshotSize',
                    'Value': allocated_storage,
                    'Unit': 'Gigabytes',
                    'Dimensions': [
                        {'Name': 'SnapshotType', 'Value': 'Weekly'},
                        {'Name': 'DBInstance', 'Value': db_instance}
                    ]
                },
                {
                    'MetricName': 'SnapshotIncremental',
                    'Value': 1,  # This is an incremental snapshot
                    'Unit': 'Count',
                    'Dimensions': [
                        {'Name': 'SnapshotType', 'Value': 'Weekly'}
                    ]
                }
            ]
        )

    # 2. Cleanup old snapshots based on retention
    snapshots = rds.describe_db_snapshots(
        DBInstanceIdentifier=db_instance,
        SnapshotType='manual'
    )

    # Calculate total storage used (incremental accumulation)
    total_storage = 0
    snapshot_chain = []
    completed = [s for s in snapshots['DBSnapshots']
                 if 'SnapshotCreateTime' in s]  # skip still-creating snapshots

    for snapshot in sorted(completed, key=lambda x: x['SnapshotCreateTime']):
        created = snapshot['SnapshotCreateTime'].replace(tzinfo=None)
        age_days = (datetime.now() - created).days
        size_gb = snapshot.get('AllocatedStorage', 0)
        snapshot_chain.append({
            'id': snapshot['DBSnapshotIdentifier'],
            'age': age_days,
            'size': size_gb,
            'created': created
        })
        total_storage += size_gb

        # Delete based on retention policy
        tags = {tag['Key']: tag['Value']
                for tag in snapshot.get('TagList', [])}
        should_delete = False
        if tags.get('Type') == 'PreDeploy' and age_days > 7:
            should_delete = True
            print(f"Deleting pre-deploy snapshot (>7 days): {snapshot['DBSnapshotIdentifier']}")
        elif tags.get('Type') == 'Weekly' and age_days > 90:
            should_delete = True
            print(f"Deleting weekly snapshot (>90 days): {snapshot['DBSnapshotIdentifier']}")
        elif tags.get('Type') == 'Monthly' and age_days > 365:
            should_delete = True
            print(f"Deleting monthly snapshot (>365 days): {snapshot['DBSnapshotIdentifier']}")

        if should_delete:
            rds.delete_db_snapshot(
                DBSnapshotIdentifier=snapshot['DBSnapshotIdentifier']
            )

    # 3. Report storage efficiency
    db_size = rds.describe_db_instances(
        DBInstanceIdentifier=db_instance
    )['DBInstances'][0]['AllocatedStorage']

    num_snapshots = max(len(snapshot_chain), 1)  # avoid division by zero
    avg_incremental_size = (total_storage - db_size) / max(num_snapshots - 1, 1)

    print("=== Backup Storage Report (Incremental) ===")
    print(f"Database size: {db_size} GB")
    print(f"Number of snapshots: {num_snapshots}")
    print(f"Total storage: {total_storage} GB")
    print(f"Average incremental size: {avg_incremental_size:.2f} GB")
    print(f"Storage efficiency: {(1 - total_storage/(num_snapshots * db_size)) * 100:.1f}%")
    print(f"Theoretical full backups: {num_snapshots * db_size} GB")
    print(f"Savings from incremental: {(num_snapshots * db_size) - total_storage:.2f} GB")

    # Publish metrics
    cloudwatch.put_metric_data(
        Namespace='Custom/RDS/Backups',
        MetricData=[
            {
                'MetricName': 'TotalBackupStorage',
                'Value': total_storage,
                'Unit': 'Gigabytes'
            },
            {
                'MetricName': 'AverageIncrementalSize',
                'Value': avg_incremental_size,
                'Unit': 'Gigabytes'
            },
            {
                'MetricName': 'StorageEfficiency',
                'Value': (1 - total_storage/(num_snapshots * db_size)) * 100,
                'Unit': 'Percent'
            }
        ]
    )

    return {
        'statusCode': 200,
        'body': f'Snapshot management completed. Total storage: {total_storage} GB'
    }
Part 10: Common Mistakes to Avoid
Developer: What mistakes should I watch out for?
RDS Expert: Great question! Here are the common pitfalls:
┌──────────────────────────────────────────────────────────────────┐
│ COMMON MISTAKES │
└──────────────────────────────────────────────────────────────────┘
❌ MISTAKE #1: Disabling Automated Backups
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What happens:
├─ Setting retention period to 0 disables automated backups
├─ No transaction logs captured
├─ No point-in-time recovery possible
├─ Automated backups deleted immediately
└─ Lose incremental backup efficiency
Real scenario:
Developer: "Let's disable backups to save costs on dev DB"
[3 days later...]
Developer: "We need to recover yesterday's data!"
Result: Impossible. No backups exist. 😱
Reality check:
├─ 7-day retention cost: ~$1.71/month (with incremental)
└─ Data loss from no backup: Priceless ⚠️
Fix:
└─ Keep minimum 1 day retention even for dev/test
❌ MISTAKE #2: Relying Only on Manual Snapshots
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What happens:
├─ Snapshots taken once a day
├─ Accidental deletion at 3 PM
└─ Can only restore to midnight (15 hours of data loss)
Real scenario:
Company relies on daily midnight snapshots only
Ransomware attack at 2 PM
Result: Lose entire day's transactions
Fix:
└─ Always enable automated backups + selective manual snapshots
(Incremental backups make this affordable!)
❌ MISTAKE #3: Not Understanding Incremental Nature
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What happens:
├─ Developer thinks: "7 days = 7 × 100 GB = 700 GB cost"
├─ Actually: 7 days = ~118 GB (incremental)
├─ Disables backups unnecessarily
└─ Or panics about costs that don't exist
Real scenario:
"We can't afford 35-day retention, it'll cost $300/month!"
Reality: 35-day retention = ~$10/month with incremental ✅
Fix:
└─ Understand incremental nature = affordable long retention
❌ MISTAKE #4: Not Testing Restores
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What happens:
├─ Backups created but never tested
├─ Disaster strikes
└─ Discover backup is corrupted or incomplete
Real scenario:
"We have backups for 2 years"
[During disaster recovery...]
"The incremental chain is broken!"
Result: Extended downtime, data loss
Fix:
└─ Monthly restore tests
(Incremental restores are fast, so test often!)
❌ MISTAKE #5: Forgetting About Cross-Region
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What happens:
├─ All backups in same region as production
├─ Region-wide outage occurs
└─ Cannot access ANY backups
Real scenario:
All resources in us-east-1
AWS us-east-1 major outage
Result: No access to database or backups for hours
With incremental:
├─ Cross-region copy: Only 3-5 GB/day transfer
├─ Cost: ~$3-4/month (affordable!)
└─ No excuse not to have DR
Fix:
└─ Copy snapshots to secondary region
(Incremental makes daily copies affordable)
❌ MISTAKE #6: Deleting Snapshots in Chain
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What happens (EBS snapshots):
├─ Snap1 (full): 100 GB
├─ Snap2 (incremental): +3 GB references Snap1
├─ Snap3 (incremental): +3 GB references Snap2
├─ Delete Snap2
└─ AWS automatically merges Snap2 into Snap3 ✅
Note: AWS handles this gracefully, but:
├─ Deletion takes longer
├─ May incur temporary extra storage
└─ Better to delete oldest or newest, not middle
Fix:
└─ Use lifecycle policies, delete in order
❌ MISTAKE #7: Not Monitoring Change Rate
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What happens:
├─ Assume 3% daily change rate
├─ Actually experiencing 15% change rate
├─ Storage costs 5x higher than expected
└─ Bill shock at end of month
Real scenario:
Database change rate increases due to new feature
Backup storage grows from 118 GB to 200 GB
Cost increases from $1.71 to $9.50/month
Fix:
└─ Monitor backup storage growth
Track incremental size trends
Set CloudWatch alarms
❌ MISTAKE #8: Using Snapshot When PITR Needed
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What happens:
├─ Accidental DELETE at 2:35 PM
├─ Have snapshot from 2:00 PM
├─ Restore snapshot
└─ Lose 35 minutes of legitimate transactions ❌
Should have used:
└─ PITR to 2:34 PM (1 minute before mistake)
Only loses 1 minute, not 35 minutes
Fix:
└─ Use PITR for recent accidents
Use snapshots for major changes
COMPLETE FIX CHECKLIST:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
□ Automated backups enabled (7-14 days)
□ Understand incremental nature for cost planning
□ Manual snapshots before major changes
□ Monthly restore tests
□ Cross-region copies for DR
□ Monitor backup storage growth
□ Monitor daily change rate
□ Use PITR for accidents, snapshots for milestones
□ Don't delete middle snapshots in chain
□ Document backup/restore procedures
Part 11: Quick Reference Guide
Developer: Can you give me a cheat sheet I can refer to quickly?
RDS Expert: Absolutely! Here's your quick reference:
┌──────────────────────────────────────────────────────────────────┐
│ QUICK REFERENCE GUIDE │
└──────────────────────────────────────────────────────────────────┘
WHEN TO USE WHAT:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Situation → Use This
─────────────────────────────────────────────────────────────────
Accidental DELETE/UPDATE → Automated Backup (PITR)
User error (last few hours) → Automated Backup (PITR)
Data corruption (recent) → Automated Backup (PITR)
Before deployment → Manual Snapshot
Before schema migration → Manual Snapshot
Before bulk data operation → Manual Snapshot
Monthly archival → Manual Snapshot
Cross-region disaster recovery → Manual Snapshot (copied)
Need to restore to exact second → Automated Backup (PITR)
Need clean state before change → Manual Snapshot
Database deleted accidentally → Manual Snapshot (if exists)
Compliance/audit requirements → Manual Snapshot
KEY DIFFERENCES:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Feature Automated Backups Manual Snapshots
──────────────────────────────────────────────────────────────────
Incremental ✅ Yes (after 1st) ✅ Yes (EBS-based)
Transaction Logs ✅ Yes ❌ No
PITR ✅ Yes ❌ No
Cross-Region ❌ No ✅ Yes
Retention 0-35 days Forever
Auto-deleted ✅ With DB ❌ Manual delete
Free tier ✅ 100% of DB size ❌ No free tier
Cost (7 days) ~$1.71/mo $9.50/mo (weekly)
STORAGE CALCULATIONS (100 GB DATABASE):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Assuming 3% daily change rate:
Retention Storage Needed Chargeable Cost/Month
─────────────────────────────────────────────────────────
1 day 100 GB 0 GB $0.00
7 days 118 GB 18 GB $1.71
14 days 139 GB 39 GB $3.71
35 days 202 GB 102 GB $9.69
Manual Snapshots (incremental):
5 weekly 114 GB 114 GB $10.83
12 monthly 175 GB 175 GB $16.63
RESTORE TIME ESTIMATE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Database Size PITR Restore Snapshot Restore
────────────────────────────────────────────────
10 GB 5-10 min 5-8 min
50 GB 10-20 min 8-15 min
100 GB 15-30 min 10-20 min
500 GB 30-60 min 20-40 min
1 TB+ 1-2 hours 40-90 min
Note: Incremental nature doesn't significantly impact restore time
AWS optimizes restore process automatically
BACKUP WINDOW DURATION (100 GB DB):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
First backup (full): 25-35 minutes
Daily backup (incr): 2-5 minutes ✨ 90% faster!
Snapshot (first): 20-30 minutes
Snapshot (incremental): 3-5 minutes ✨ 85% faster!
CHANGE RATE IMPACT (100 GB DB, 7 days):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Daily Change Storage Cost/Month
─────────────────────────────────────
1% (1 GB) 106 GB $0.57
3% (3 GB) 118 GB $1.71
5% (5 GB) 130 GB $2.85
10% (10 GB) 160 GB $5.70
15% (15 GB) 190 GB $8.55
RETENTION RECOMMENDATIONS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Environment Automated Manual Snapshots
─────────────────────────────────────────────────────────
Production 7-14 days Weekly (3mo) + Monthly (1yr)
Staging 3-7 days Pre-deployment only
Development 1-3 days Optional
Test 1 day Not needed
COST COMPARISON: INCREMENTAL vs FULL
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
100 GB Database, 7-day retention:
With Incremental (RDS actual):
├─ Storage: 118 GB
├─ Free tier: 100 GB
└─ Cost: $1.71/month ✅
Without Incremental (theoretical):
├─ Storage: 700 GB (7 × 100 GB)
├─ Free tier: 100 GB
└─ Cost: $57/month ❌
SAVINGS: 97% reduction! 🎉
CHECKLIST FOR PRODUCTION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
□ Automated backups enabled (7-14 days)
□ Multi-AZ enabled (zero-impact incremental backups)
□ Backup window during low traffic
□ Encryption enabled
□ Manual snapshot before each deployment
□ Weekly automated snapshots
□ Monthly long-term snapshots
□ Cross-region copies configured (incremental transfer)
□ Snapshot lifecycle policy in place
□ Monthly restore tests scheduled
□ Monitor backup storage growth
□ Monitor daily change rate
□ Backup/restore runbook documented
□ Team trained on PITR vs snapshot usage
AWS CLI Commands
# Create manual snapshot (will be incremental if previous exists)
aws rds create-db-snapshot \
--db-instance-identifier mydb \
--db-snapshot-identifier mydb-snapshot-2024-01-15
# Restore from PITR (uses incremental backups + transaction logs)
aws rds restore-db-instance-to-point-in-time \
--source-db-instance-identifier mydb \
--target-db-instance-identifier mydb-restored \
--restore-time 2024-01-15T14:30:00Z
# Restore from snapshot (uses incremental snapshot chain)
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier mydb-restored \
--db-snapshot-identifier mydb-snapshot-2024-01-15
# List available restore times
aws rds describe-db-instances \
--db-instance-identifier mydb \
--query 'DBInstances[0].[LatestRestorableTime,EarliestRestorableTime]'
# Check backup storage (shows incremental accumulation)
# Note: BackupRetentionPeriodStorageUsed isn't part of the describe-db-instances
# output - RDS publishes it as a CloudWatch metric (AWS/RDS namespace, in bytes)
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name BackupRetentionPeriodStorageUsed \
--dimensions Name=DBInstanceIdentifier,Value=mydb \
--start-time 2024-01-14T00:00:00Z \
--end-time 2024-01-15T00:00:00Z \
--period 86400 \
--statistics Average
# Copy snapshot to another region (only changed blocks transferred)
aws rds copy-db-snapshot \
--source-db-snapshot-identifier arn:aws:rds:us-east-1:123456789012:snapshot:mydb-snapshot \
--target-db-snapshot-identifier mydb-snapshot-dr \
--source-region us-east-1 \
--region us-west-2
# Delete old snapshots (AWS handles incremental chain automatically)
aws rds delete-db-snapshot \
--db-snapshot-identifier mydb-snapshot-old
# Describe snapshot details (note: AllocatedStorage reflects the source DB size,
# not the incremental bytes actually stored)
aws rds describe-db-snapshots \
--db-snapshot-identifier mydb-snapshot-2024-01-15 \
--query 'DBSnapshots[0].[AllocatedStorage,SnapshotType,PercentProgress]'
Part 12: Monitoring and Alerts
Developer: How do I monitor my backups and get alerted if something goes wrong?
RDS Expert: Excellent question! Monitoring is critical, especially with incremental backups. Here's your setup:
┌──────────────────────────────────────────────────────────────────┐
│ MONITORING & ALERTS SETUP │
└──────────────────────────────────────────────────────────────────┘
CLOUDWATCH METRICS TO MONITOR:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. BackupRetentionPeriodStorageUsed
├─ Shows total automated backup storage (incremental)
├─ Expected: DB_size + (retention_days × daily_change)
├─ Alert if growing faster than expected
└─ Alert if approaching limits
2. DailyIncrementalSize (Custom metric)
├─ Track daily incremental backup size
├─ Should be 2-5% of database size typically
├─ Alert if suddenly much larger (indicates issues)
└─ Helps predict future costs
3. TotalSnapshotStorage (Custom metric)
├─ Total manual snapshot storage
├─ Should grow incrementally
└─ Alert if unexpected spikes
4. DailyChangeRate (Custom metric)
├─ Percentage of database changed daily
├─ Affects incremental backup size
└─ Alert if rate doubles unexpectedly
CLOUDWATCH ALARMS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# CloudFormation for backup monitoring (incremental-aware)
BackupStorageAlarm:
Type: AWS::CloudWatch::Alarm
Properties:
AlarmName: RDS-Backup-Storage-High
AlarmDescription: Alert when backup storage growing unexpectedly
MetricName: BackupRetentionPeriodStorageUsed
Namespace: AWS/RDS
Dimensions:
- Name: DBInstanceIdentifier
Value: !Ref DBInstance
Statistic: Average
Period: 3600
EvaluationPeriods: 2
      Threshold: 161061273600  # 150 GB in bytes (the metric is reported in bytes)
ComparisonOperator: GreaterThanThreshold
AlarmActions:
- !Ref SNSTopic
TreatMissingData: notBreaching
IncrementalSizeAlarm:
Type: AWS::CloudWatch::Alarm
Properties:
AlarmName: RDS-Large-Incremental-Backup
AlarmDescription: Alert when daily incremental unusually large
MetricName: DailyIncrementalSize
      Namespace: Custom/RDS/Backups  # must match the namespace the Lambda publishes to
Dimensions:
- Name: DBInstanceIdentifier
Value: !Ref DBInstance
Statistic: Average
Period: 86400
EvaluationPeriods: 1
Threshold: 10 # 10 GB (adjust based on your DB)
ComparisonOperator: GreaterThanThreshold
AlarmActions:
- !Ref SNSTopic
ChangeRateAlarm:
Type: AWS::CloudWatch::Alarm
Properties:
AlarmName: RDS-High-Change-Rate
AlarmDescription: Alert when change rate unusually high
MetricName: DailyChangeRate
      Namespace: Custom/RDS/Backups  # must match the namespace the Lambda publishes to
Dimensions:
- Name: DBInstanceIdentifier
Value: !Ref DBInstance
Statistic: Average
Period: 86400
EvaluationPeriods: 2
Threshold: 15 # 15% daily change
ComparisonOperator: GreaterThanThreshold
AlarmActions:
- !Ref SNSTopic
# Enhanced Lambda for incremental backup monitoring
import boto3
from datetime import datetime, timedelta
import json
def lambda_handler(event, context):
rds = boto3.client('rds')
cloudwatch = boto3.client('cloudwatch')
db_instance = 'production-db'
# Get database info
response = rds.describe_db_instances(DBInstanceIdentifier=db_instance)
db = response['DBInstances'][0]
db_size = db['AllocatedStorage']
    # BackupRetentionPeriodStorageUsed isn't in the describe_db_instances
    # response - RDS publishes it as a CloudWatch metric (see helper below)
    backup_storage = get_backup_storage_gb(cloudwatch, db_instance)
retention_days = db['BackupRetentionPeriod']
# Check if backups are disabled
if retention_days == 0:
send_alert("🚨 Automated backups are DISABLED!", db_instance)
return {'statusCode': 500, 'body': 'Backups disabled!'}
# Calculate expected storage (incremental model)
# Assume 3% daily change rate
expected_storage = db_size + (retention_days - 1) * (db_size * 0.03)
# Calculate actual vs expected
storage_ratio = backup_storage / expected_storage if expected_storage > 0 else 0
# Calculate average daily incremental size
avg_incremental = (backup_storage - db_size) / max(retention_days - 1, 1)
daily_change_rate = (avg_incremental / db_size) * 100
# Publish metrics
metrics = [
{
'MetricName': 'BackupStorageUsed',
'Value': backup_storage,
'Unit': 'Gigabytes',
'Timestamp': datetime.utcnow()
},
{
'MetricName': 'BackupStorageRatio',
'Value': storage_ratio,
'Unit': 'None',
'Timestamp': datetime.utcnow()
},
{
'MetricName': 'DailyIncrementalSize',
'Value': avg_incremental,
'Unit': 'Gigabytes',
'Timestamp': datetime.utcnow()
},
{
'MetricName': 'DailyChangeRate',
'Value': daily_change_rate,
'Unit': 'Percent',
'Timestamp': datetime.utcnow()
},
{
'MetricName': 'ExpectedStorage',
'Value': expected_storage,
'Unit': 'Gigabytes',
'Timestamp': datetime.utcnow()
}
]
for metric in metrics:
metric['Dimensions'] = [
{'Name': 'DBInstanceIdentifier', 'Value': db_instance}
]
cloudwatch.put_metric_data(
Namespace='Custom/RDS/Backups',
MetricData=metrics
)
# Alert conditions
if storage_ratio > 1.5:
send_alert(
f"⚠️ Backup storage {storage_ratio:.1f}x higher than expected!\n"
f"Current: {backup_storage} GB\n"
f"Expected: {expected_storage:.1f} GB\n"
f"Possible high change rate or retention issues.",
db_instance
)
if daily_change_rate > 10:
send_alert(
f"⚠️ High daily change rate detected: {daily_change_rate:.1f}%\n"
f"Average incremental: {avg_incremental:.2f} GB/day\n"
f"This will increase backup costs.",
db_instance
)
# Check manual snapshots
snapshots = rds.describe_db_snapshots(
DBInstanceIdentifier=db_instance,
SnapshotType='manual',
MaxRecords=50
)
    # Defaults so the report below never hits a NameError when no snapshots exist
    total_snapshot_storage = 0
    num_snapshots = 0
    efficiency = 100.0
    if snapshots['DBSnapshots']:
        # Approximate total snapshot storage; the API exposes each snapshot's
        # AllocatedStorage (the source DB size), not the incremental bytes stored
        total_snapshot_storage = sum(
            snap.get('AllocatedStorage', 0)
            for snap in snapshots['DBSnapshots']
        )
# Analyze snapshot chain
snapshot_list = sorted(
snapshots['DBSnapshots'],
key=lambda x: x['SnapshotCreateTime']
)
# Calculate incremental efficiency
num_snapshots = len(snapshot_list)
if num_snapshots > 1:
theoretical_full = num_snapshots * db_size
actual_storage = total_snapshot_storage
efficiency = (1 - actual_storage / theoretical_full) * 100
cloudwatch.put_metric_data(
Namespace='Custom/RDS/Snapshots',
MetricData=[
{
'MetricName': 'TotalSnapshotStorage',
'Value': total_snapshot_storage,
'Unit': 'Gigabytes',
'Dimensions': [
{'Name': 'DBInstanceIdentifier', 'Value': db_instance}
]
},
{
'MetricName': 'SnapshotEfficiency',
'Value': efficiency,
'Unit': 'Percent',
'Dimensions': [
{'Name': 'DBInstanceIdentifier', 'Value': db_instance}
]
},
{
'MetricName': 'NumberOfSnapshots',
'Value': num_snapshots,
'Unit': 'Count',
'Dimensions': [
{'Name': 'DBInstanceIdentifier', 'Value': db_instance}
]
}
]
)
print(f"=== Snapshot Efficiency Report ===")
print(f"Number of snapshots: {num_snapshots}")
print(f"Database size: {db_size} GB")
print(f"Total storage used: {total_snapshot_storage} GB")
print(f"Theoretical (full backups): {theoretical_full} GB")
print(f"Storage efficiency: {efficiency:.1f}%")
print(f"Savings from incremental: {theoretical_full - actual_storage:.1f} GB")
# Check for old snapshots
latest = snapshot_list[-1]
age_hours = (datetime.now(latest['SnapshotCreateTime'].tzinfo) -
latest['SnapshotCreateTime']).total_seconds() / 3600
if age_hours > 168: # 7 days
send_alert(
f"⚠️ No recent manual snapshot!\n"
f"Latest snapshot: {latest['DBSnapshotIdentifier']}\n"
f"Age: {age_hours/24:.1f} days",
db_instance
)
# Generate report
report = {
'database_size_gb': db_size,
'backup_storage_gb': backup_storage,
'retention_days': retention_days,
'expected_storage_gb': round(expected_storage, 2),
'storage_ratio': round(storage_ratio, 2),
'avg_incremental_gb': round(avg_incremental, 2),
'daily_change_rate_pct': round(daily_change_rate, 2),
        'total_snapshot_storage_gb': total_snapshot_storage,
        'num_snapshots': num_snapshots,
        'efficiency_pct': round(efficiency, 2),
'timestamp': datetime.utcnow().isoformat()
}
return {
'statusCode': 200,
'body': json.dumps(report, indent=2)
}
def get_backup_storage_gb(cloudwatch, db_instance):
    # BackupRetentionPeriodStorageUsed is published to CloudWatch in bytes;
    # read the most recent daily datapoint and convert to GB
    # (dimension usage may vary by engine; returns 0 if no datapoint is found)
    resp = cloudwatch.get_metric_statistics(
        Namespace='AWS/RDS',
        MetricName='BackupRetentionPeriodStorageUsed',
        Dimensions=[{'Name': 'DBInstanceIdentifier', 'Value': db_instance}],
        StartTime=datetime.utcnow() - timedelta(days=2),
        EndTime=datetime.utcnow(),
        Period=86400,
        Statistics=['Average']
    )
    points = sorted(resp['Datapoints'], key=lambda d: d['Timestamp'])
    return points[-1]['Average'] / (1024 ** 3) if points else 0
def send_alert(message, db_instance):
sns = boto3.client('sns')
sns.publish(
TopicArn='arn:aws:sns:us-east-1:123456789012:rds-alerts',
Subject=f'RDS Backup Alert: {db_instance}',
Message=message
)
Part 13: Understanding Snapshot and Backup Deletion
Developer: I have a bunch of old snapshots piling up. Can I just delete them? Will it break anything? And what happens to automated backups?
RDS Expert: Great question! Deletion is often misunderstood and can be risky if done incorrectly. Let me explain how it all works:
Automated Backup Deletion
┌──────────────────────────────────────────────────────────────────┐
│ AUTOMATED BACKUP DELETION │
└──────────────────────────────────────────────────────────────────┘
HOW AUTOMATED BACKUPS ARE DELETED:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Automated backups are AUTOMATICALLY deleted in these scenarios:
1. ROLLING RETENTION WINDOW (Normal Operation):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Configuration: 7-day retention
Timeline:
Day 1 (Jan 1): Full backup created ✅
Day 2 (Jan 2): Incremental backup ✅
Day 3 (Jan 3): Incremental backup ✅
Day 4 (Jan 4): Incremental backup ✅
Day 5 (Jan 5): Incremental backup ✅
Day 6 (Jan 6): Incremental backup ✅
Day 7 (Jan 7): Incremental backup ✅
└─ 7 backups exist, storage: ~118 GB
Day 8 (Jan 8): New incremental backup ✅
└─ Jan 1 backup AUTOMATICALLY DELETED ♻️
└─ Storage: Still ~118 GB (rolling window)
This happens AUTOMATICALLY every day:
├─ New backup created
├─ Oldest backup deleted
├─ Maintains constant storage usage
└─ No manual intervention needed ✅
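A quick sketch of the rolling-window arithmetic (7-day retention, 3 GB daily change) makes it obvious why storage plateaus once the window fills:

retention_days, full_gb, daily_change_gb = 7, 100, 3

for day in range(1, 15):
    backups_held = min(day, retention_days)  # oldest backup drops off after day 7
    total_gb = full_gb + (backups_held - 1) * daily_change_gb
    print(f"Day {day:>2}: {backups_held} backups held, ~{total_gb} GB")
# From day 7 onward every line prints ~118 GB - the window rolls, storage doesn't grow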
2. CHANGING RETENTION PERIOD:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scenario A: REDUCE retention (14 days → 7 days)
Before change:
├─ 14 days of backups exist
└─ Storage: ~139 GB
After change:
├─ Backups older than 7 days IMMEDIATELY DELETED 🗑️
├─ Only last 7 days retained
├─ Storage: ~118 GB
└─ CANNOT be undone! ⚠️
Example:
10:00 AM → Change retention from 14 to 7 days
10:01 AM → Backups from Jan 1-7 deleted immediately
10:02 AM → Only Jan 8-14 backups remain
Result: Lost 7 days of restore points! ⚠️
Scenario B: INCREASE retention (7 days → 14 days)
Before change:
├─ 7 days of backups exist
└─ Storage: ~118 GB
After change:
├─ Existing backups NOT deleted
├─ New backups start accumulating to 14 days
├─ After 14 days: Storage grows to ~139 GB
└─ No data lost ✅
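Changing retention is a single modify call. A hedged boto3 sketch (the instance identifier is a placeholder), with a guard against accidentally setting retention to 0 - the next section shows why that guard matters:

import boto3

rds = boto3.client('rds')

def set_backup_retention(db_instance_id, days):
    if days < 1:
        # Retention 0 deletes ALL automated backups immediately (see below)
        raise ValueError("Refusing to disable automated backups")
    rds.modify_db_instance(
        DBInstanceIdentifier=db_instance_id,
        BackupRetentionPeriod=days,
        ApplyImmediately=True  # reducing retention deletes old backups right away
    )

set_backup_retention('production-db', 14)  # increasing: nothing gets deleted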
3. DISABLING AUTOMATED BACKUPS (DANGEROUS!):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Action: Set retention period to 0
What happens IMMEDIATELY:
├─ All automated backups DELETED 🗑️
├─ All transaction logs DELETED 🗑️
├─ Point-in-time recovery DISABLED ❌
└─ CANNOT be undone! ⚠️
Timeline:
10:00 AM → Retention period = 7 days
└─ Can restore to any point in last 7 days ✅
10:05 AM → Change retention to 0 (disable backups)
└─ Confirmation required in console
10:06 AM → ALL BACKUPS DELETED IMMEDIATELY 🗑️
└─ Cannot restore to ANY point ❌
10:10 AM → Oh no! Need yesterday's data!
└─ IMPOSSIBLE - backups gone forever 💀
Real-world disaster:
Developer: "Let me disable backups to save $2/month on dev DB"
[3 days later...]
Developer: "Need to restore yesterday's code testing data"
Result: Data lost forever, 8 hours to rebuild 😱
4. DELETING THE RDS INSTANCE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
When you delete an RDS instance:
Option A: WITHOUT final snapshot
┌─────────────────────────────────────────────┐
│ Delete DB Instance │
├─────────────────────────────────────────────┤
│ ⚠️ This will PERMANENTLY delete: │
│ • The database instance │
│ • ALL automated backups 🗑️ │
│ • ALL transaction logs 🗑️ │
│ │
│ □ Create final snapshot before deletion │
│ [Leave unchecked = NO SNAPSHOT] │
│ │
│ Type "delete me" to confirm: │
│ [____________] │
│ │
│ [Cancel] [Delete] │
└─────────────────────────────────────────────┘
Result:
├─ Instance: DELETED ✅
├─ Automated backups: DELETED 🗑️
├─ Manual snapshots: PRESERVED ✅
└─ Recovery: ONLY from manual snapshots
Option B: WITH final snapshot (RECOMMENDED)
┌─────────────────────────────────────────────┐
│ Delete DB Instance │
├─────────────────────────────────────────────┤
│ ☑️ Create final snapshot │
│ │
│ Snapshot name: │
│ [prod-db-final-2024-01-15] ✅ │
│ │
│ This snapshot will be retained even │
│ after instance deletion │
│ │
│ [Cancel] [Delete with Snapshot] │
└─────────────────────────────────────────────┘
Result:
├─ Instance: DELETED ✅
├─ Automated backups: DELETED 🗑️
├─ Final snapshot: CREATED ✅
├─ Manual snapshots: PRESERVED ✅
└─ Recovery: From final snapshot or old manual snapshots
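The same choice exists in the API. A boto3 sketch of the recommended path (Option B), using placeholder identifiers:

import boto3

rds = boto3.client('rds')

# Option B: delete WITH a final snapshot (recommended)
rds.delete_db_instance(
    DBInstanceIdentifier='prod-db',
    SkipFinalSnapshot=False,  # take a final snapshot before deletion
    FinalDBSnapshotIdentifier='prod-db-final-2024-01-15'
)
# Option A would be SkipFinalSnapshot=True: no snapshot, no safety net.
# Newer API versions also accept DeleteAutomatedBackups=False to retain
# the automated backups for their remaining retention period.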
WHAT GETS DELETED vs PRESERVED:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Action Automated Backups Manual Snapshots
──────────────────────────────────────────────────────────────────
Rolling retention Oldest deleted Preserved
Reduce retention period Old ones deleted Preserved
Disable backups ALL deleted 🗑️ Preserved ✅
Delete instance (no snap) ALL deleted 🗑️ Preserved ✅
Delete instance (w/ snap) ALL deleted 🗑️ Preserved ✅
Key takeaway:
└─ Manual snapshots are NEVER automatically deleted ✅
You must delete them manually
Manual Snapshot Deletion
┌──────────────────────────────────────────────────────────────────┐
│ MANUAL SNAPSHOT DELETION │
└──────────────────────────────────────────────────────────────────┘
MANUAL SNAPSHOTS NEVER AUTO-DELETE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Manual snapshots persist forever until YOU delete them:
Scenario:
├─ Create snapshot: Jan 1, 2023
├─ Delete RDS instance: Jan 15, 2023
├─ One year later: Jan 1, 2024
└─ Snapshot STILL EXISTS (and costs $10/month!) 💰
You MUST manually delete:
├─ Via AWS Console
├─ Via AWS CLI
├─ Via API/SDK
└─ Or set up automated cleanup (Lambda)
HOW TO DELETE A MANUAL SNAPSHOT:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AWS Console Method:
1. Go to RDS → Snapshots
2. Select snapshot(s) to delete
3. Actions → Delete snapshot
4. Confirm deletion
⚠️ WARNING: Deletion begins as soon as you confirm!
There is no undo and no recycle bin - the snapshot is gone for good.
CLI Method:
aws rds delete-db-snapshot \
--db-snapshot-identifier my-snapshot-name
Response:
{
"DBSnapshot": {
"DBSnapshotIdentifier": "my-snapshot-name",
"Status": "deleting",
...
}
}
Status progression:
├─ "available" → "deleting" → deleted
└─ Takes a few seconds to minutes
SNAPSHOT DELETION ISSUES & DEPENDENCIES:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ISSUE #1: Shared Snapshots
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
You shared a snapshot with another AWS account:
Account A (Owner):
├─ Created snapshot: prod-db-snapshot-2024-01-15
├─ Shared with Account B
└─ Tries to delete...
Error:
❌ Cannot delete snapshot while shared with other accounts
Solution:
1. First, un-share from all accounts
├─ Go to snapshot → Modify snapshot
├─ Remove all shared accounts
└─ Save changes
2. Then delete the snapshot ✅
AWS CLI:
# Un-share
aws rds modify-db-snapshot-attribute \
--db-snapshot-identifier prod-db-snapshot-2024-01-15 \
--attribute-name restore \
--values-to-remove 123456789012
# Then delete
aws rds delete-db-snapshot \
--db-snapshot-identifier prod-db-snapshot-2024-01-15
ISSUE #2: Encrypted Snapshots with KMS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Encrypted snapshot uses customer-managed KMS key:
Scenario:
├─ Snapshot encrypted with KMS key: my-rds-key
├─ Delete KMS key (schedule deletion)
└─ Can still delete snapshot? YES ✅
However:
├─ If KMS key deleted AND snapshot deleted
└─ CANNOT restore from snapshot copies ⚠️
Best practice:
1. Delete all snapshots using that KMS key
2. Ensure no copies exist in other regions
3. Then schedule KMS key deletion (7-30 day wait)
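That order of operations can be scripted. A boto3 sketch (the key ARN is a placeholder) that looks for snapshots still encrypted with the key before scheduling its deletion:

import boto3

rds = boto3.client('rds')
kms = boto3.client('kms')

KEY_ARN = 'arn:aws:kms:us-east-1:123456789012:key/my-rds-key-id'  # placeholder

# 1. Find manual snapshots still encrypted with this key
snapshots = rds.describe_db_snapshots(SnapshotType='manual')['DBSnapshots']
dependent = [s['DBSnapshotIdentifier'] for s in snapshots
             if s.get('KmsKeyId') == KEY_ARN]

if dependent:
    print(f"Delete (or verify copies of) these snapshots first: {dependent}")
else:
    # 2. Only then schedule key deletion (7-30 day waiting period)
    kms.schedule_key_deletion(KeyId=KEY_ARN, PendingWindowInDays=30)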
ISSUE #3: Snapshots in Incremental Chain
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Remember: EBS snapshots are incremental!
Snapshot Chain:
├─ Snapshot 1 (Jan 1): 100 GB (full)
├─ Snapshot 2 (Jan 8): +3 GB (incremental, references Snap 1)
├─ Snapshot 3 (Jan 15): +3 GB (incremental, references Snap 2)
└─ Total storage: 106 GB
What happens if you delete Snapshot 2?
AWS Automatically:
1. Merges Snapshot 2 data into Snapshot 3
2. Snapshot 3 now references Snapshot 1 directly
3. No data lost ✅
4. Deletion takes longer (merge operation)
Before deletion:
Snap 1 (100 GB) ← Snap 2 (3 GB) ← Snap 3 (3 GB)
After deleting Snap 2:
Snap 1 (100 GB) ← Snap 3 (6 GB, merged)
Result:
├─ Total storage: Still 106 GB
├─ Can still restore Snap 1: ✅
├─ Can still restore Snap 3: ✅
└─ Snap 3 restore includes Snap 2 data ✅
Time to delete:
├─ Simple snapshot: Seconds
├─ Snapshot in chain: Minutes (depends on size)
ISSUE #4: Cross-Region Snapshot Copies
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Original snapshot copied to multiple regions:
Setup:
├─ Original: us-east-1 → snapshot-prod-2024-01-15
├─ Copy 1: us-west-2 → snapshot-prod-2024-01-15
└─ Copy 2: eu-west-1 → snapshot-prod-2024-01-15
Deleting original (us-east-1):
├─ Does NOT delete copies ✅
├─ Copies remain independent
└─ Must delete each copy separately
Cost impact:
├─ Original deleted: Save $10/month in us-east-1
├─ 2 copies remain: Still pay $20/month total
└─ Don't forget to delete copies! 💰
SAFE DELETION PROCESS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Before deleting any snapshot:
✅ CHECKLIST:
□ Is this snapshot shared? → Un-share first
□ Are there copies in other regions? → Delete those too
□ Is this the only backup of important data? → Create new one first
□ Is instance still running? → Verify recent backups exist
□ Is this production? → Get approval/document
□ Have you verified snapshot name? → Double-check!
Safe deletion script:
#!/bin/bash
SNAPSHOT_ID="my-snapshot-to-delete"
# 1. Check snapshot details
echo "Checking snapshot: $SNAPSHOT_ID"
aws rds describe-db-snapshots \
--db-snapshot-identifier $SNAPSHOT_ID \
--query 'DBSnapshots[0].[DBSnapshotIdentifier,SnapshotCreateTime,AllocatedStorage,Status]'
# 2. Check if shared
echo "Checking if snapshot is shared..."
SHARED=$(aws rds describe-db-snapshot-attributes \
--db-snapshot-identifier $SNAPSHOT_ID \
--query 'DBSnapshotAttributesResult.DBSnapshotAttributes[?AttributeName==`restore`].AttributeValues' \
--output text)
if [ ! -z "$SHARED" ]; then
echo "⚠️ Snapshot is shared with: $SHARED"
echo "Un-share before deletion"
exit 1
fi
# 3. List cross-region copies
echo "Checking for cross-region copies..."
for region in us-east-1 us-west-2 eu-west-1; do
echo "Checking region: $region"
aws rds describe-db-snapshots \
--db-snapshot-identifier $SNAPSHOT_ID \
--region $region \
--query 'DBSnapshots[0].DBSnapshotIdentifier' \
--output text 2>/dev/null
done
# 4. Confirm deletion
read -p "Are you sure you want to delete $SNAPSHOT_ID? (yes/no): " confirm
if [ "$confirm" != "yes" ]; then
echo "Deletion cancelled"
exit 0
fi
# 5. Delete snapshot
echo "Deleting snapshot..."
aws rds delete-db-snapshot --db-snapshot-identifier $SNAPSHOT_ID
echo "Snapshot deletion initiated"
AUTOMATED CLEANUP STRATEGY:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Lambda function for lifecycle management:
import boto3
from datetime import datetime, timedelta
def lambda_handler(event, context):
rds = boto3.client('rds')
    # Get all manual snapshots (paginate - a single call returns at most 100)
    paginator = rds.get_paginator('describe_db_snapshots')
    snapshots = []
    for page in paginator.paginate(SnapshotType='manual'):
        snapshots.extend(page['DBSnapshots'])
deleted_count = 0
for snapshot in snapshots:
snapshot_id = snapshot['DBSnapshotIdentifier']
created_time = snapshot['SnapshotCreateTime'].replace(tzinfo=None)
age_days = (datetime.now() - created_time).days
# Parse snapshot tags to determine retention
tags = {tag['Key']: tag['Value']
for tag in snapshot.get('TagList', [])}
retention_policy = tags.get('RetentionPolicy', 'default')
# Apply retention policies
should_delete = False
if retention_policy == 'weekly' and age_days > 90:
should_delete = True
reason = "Weekly snapshot older than 90 days"
elif retention_policy == 'monthly' and age_days > 365:
should_delete = True
reason = "Monthly snapshot older than 365 days"
elif retention_policy == 'temp' and age_days > 7:
should_delete = True
reason = "Temporary snapshot older than 7 days"
elif retention_policy == 'default' and age_days > 30:
should_delete = True
reason = "Default retention exceeded (30 days)"
if should_delete:
try:
print(f"Deleting {snapshot_id}: {reason}")
# Check if shared
attrs = rds.describe_db_snapshot_attributes(
DBSnapshotIdentifier=snapshot_id
)
restore_attrs = attrs['DBSnapshotAttributesResult']['DBSnapshotAttributes']
is_shared = any(attr['AttributeName'] == 'restore' and attr['AttributeValues']
for attr in restore_attrs)
if is_shared:
print(f"Skipping {snapshot_id}: snapshot is shared")
continue
# Delete snapshot
rds.delete_db_snapshot(
DBSnapshotIdentifier=snapshot_id
)
deleted_count += 1
except Exception as e:
print(f"Error deleting {snapshot_id}: {str(e)}")
continue
return {
'statusCode': 200,
'body': f'Deleted {deleted_count} snapshots'
}
Schedule: Run daily via EventBridge
Cost savings: Can save hundreds per month on old snapshots! 💰
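For the lifecycle Lambda above to classify a snapshot, the snapshot must carry a RetentionPolicy tag. A sketch of creating one that way (identifiers are placeholders):

import boto3
from datetime import datetime

rds = boto3.client('rds')

# Tag at creation time so the cleanup Lambda above can classify it
rds.create_db_snapshot(
    DBInstanceIdentifier='production-db',
    DBSnapshotIdentifier=f"pre-deploy-{datetime.utcnow():%Y-%m-%d-%H%M}",
    Tags=[{'Key': 'RetentionPolicy', 'Value': 'temp'}]  # auto-deleted after 7 days
)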
DELETION COST & TIMING:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Deletion cost: FREE ✅
├─ No charge to delete snapshots
└─ Stop paying storage cost immediately
Timing:
├─ Simple snapshot: 10-30 seconds
├─ Snapshot in incremental chain: 2-10 minutes
├─ Large snapshot (>500 GB): 5-15 minutes
└─ You're billed until deletion completes
Storage billing:
├─ Snapshot created: Jan 1 at 10:00 AM
├─ Snapshot deleted: Jan 15 at 3:00 PM
├─ Billed for: 14.2 days of storage
└─ Prorated to the hour ✅
Example:
├─ 100 GB snapshot
├─ Existed for 14.2 days
├─ Cost: 100 GB × $0.095/GB/month × (14.2/30) = $4.50
└─ After deletion: $0/month ✅
Part 14: Exporting Snapshots to Amazon S3
Developer: I see an option to "Export to S3" in the console. What's that for? Why would I export a snapshot to S3?
RDS Expert: Excellent question! This is a powerful but often overlooked feature. Let me explain:
What is Snapshot Export to S3?
┌──────────────────────────────────────────────────────────────────┐
│ SNAPSHOT EXPORT TO S3 EXPLAINED │
└──────────────────────────────────────────────────────────────────┘
WHAT IT DOES:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Export to S3 converts RDS snapshot → Apache Parquet files in S3
Normal RDS Snapshot:
├─ Format: RDS proprietary binary format
├─ Can only be restored to: RDS instance
├─ Cannot directly query data
└─ Cannot use with analytics tools
Exported to S3:
├─ Format: Apache Parquet (columnar, compressed)
├─ Can query with: Athena, Redshift Spectrum, EMR, Glue
├─ Can analyze with: QuickSight, Tableau, Python/Pandas
├─ Can archive long-term: S3 Glacier
└─ Can share: Just share S3 objects (not whole snapshot)
VISUAL REPRESENTATION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Without Export:
┌─────────────┐
│ RDS Snapshot│ ─────────────┐
│ (Binary) │ │
└─────────────┘ │
├──> Only option: Restore to RDS
│ (takes 15-30 minutes)
│ (costs $200+/month for instance)
│
└──> To analyze data:
1. Restore to RDS ❌
2. Connect and query ❌
3. Export results ❌
Time: Hours, Cost: High
With Export to S3:
┌─────────────┐
│ RDS Snapshot│
│ (Binary) │
└──────┬──────┘
│
├──> Export to S3
│
▼
┌──────────────────────────────────────┐
│ Amazon S3 Bucket │
│ │
│ s3://my-bucket/exports/ │
│ ├── database_name/ │
│ │ ├── table1/ │
│ │ │ ├── part-00001.parquet │
│ │ │ └── part-00002.parquet │
│ │ └── table2/ │
│ │ └── part-00001.parquet │
│ └── metadata.json │
└───────────────┬───────────────────────┘
│
├──> Athena: Query with SQL ✅
├──> Redshift Spectrum: Analyze ✅
├──> Glue/EMR: Process ✅
├──> QuickSight: Visualize ✅
└──> Archive to Glacier: $1/TB/month ✅
OUTPUT FORMAT:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Example S3 structure after export:
s3://my-exports-bucket/prod-db-export-2024-01-15/
│
├── mydb/ (database name)
│ │
│ ├── users/ (table name)
│ │ ├── part-00000-*.parquet (data files)
│ │ ├── part-00001-*.parquet
│ │ └── part-00002-*.parquet
│ │
│ ├── orders/
│ │ ├── part-00000-*.parquet
│ │ └── part-00001-*.parquet
│ │
│ └── products/
│ └── part-00000-*.parquet
│
└── export_info.json (metadata)
Each Parquet file contains:
├─ Table schema
├─ Column data (compressed)
├─ Optimized for analytics
└─ Can be read by many tools
COMPRESSION EFFICIENCY:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Original database: 100 GB
Export to S3 (Parquet): ~30-40 GB ✨
Why smaller?
├─ Parquet columnar compression
├─ No indexes (table data only)
├─ No transaction logs
└─ Optimized storage format
Cost comparison:
├─ Snapshot storage: 100 GB × $0.095 = $9.50/month
├─ S3 Standard: 35 GB × $0.023 = $0.81/month ✅
└─ S3 Glacier Deep: 35 GB × $0.00099 = $0.03/month ✅
When and Why to Export to S3
┌──────────────────────────────────────────────────────────────────┐
│ USE CASES FOR EXPORT TO S3 │
└──────────────────────────────────────────────────────────────────┘
✅ USE CASE #1: Analytics & Business Intelligence
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scenario:
├─ Production RDS database
├─ Business team wants historical data analysis
├─ Don't want to impact production with heavy queries
└─ Need data from multiple points in time
Traditional approach (BAD):
1. Create read replica ($200/month)
2. Run analytics queries (impacts replica performance)
3. Can only query current data
4. Costs ongoing $200+/month
Export to S3 approach (GOOD):
1. Export weekly snapshot to S3 (30 minutes)
2. Query with Athena (pay per query)
3. Access historical data (multiple exports)
4. Cost: ~$1-5/month storage + minimal query costs ✅
Example Athena query:
-- Query exported data directly in S3
CREATE EXTERNAL TABLE IF NOT EXISTS exported_orders (
order_id INT,
user_id INT,
order_date DATE,
total_amount DECIMAL(10,2)
)
STORED AS PARQUET
LOCATION 's3://my-exports-bucket/prod-db-export-2024-01-15/mydb/orders/';
-- Now query historical data
SELECT
DATE_TRUNC('month', order_date) as month,
COUNT(*) as total_orders,
SUM(total_amount) as revenue
FROM exported_orders
WHERE order_date >= DATE '2023-01-01'
GROUP BY 1
ORDER BY 1;
-- Cost: $5 per TB scanned (typical query: <$0.10)
✅ USE CASE #2: Long-Term Archival & Compliance
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scenario:
├─ Must retain data for 7 years (compliance)
├─ RDS snapshots expensive for long-term storage
├─ Rarely need to access old data
└─ Need cost-effective solution
Snapshot retention (EXPENSIVE):
├─ 7 years of snapshots: 84 monthly snapshots
├─ Each snapshot: 100 GB
├─ With incremental: ~500 GB total
├─ Cost: 500 GB × $0.095 = $47.50/month
└─ 7 years: $47.50 × 84 = $3,990 total ❌
Export + Glacier Deep Archive (CHEAP):
├─ Export 84 monthly snapshots: 84 × 35 GB = 2,940 GB
├─ Store in Glacier Deep Archive: 2,940 GB × $0.00099
└─ Cost: $2.91/month × 84 = $244 total ✅
SAVINGS: $3,746 over 7 years (94% reduction)! 🎉
Lifecycle policy:
S3 Bucket Lifecycle Rule:
- Name: "Archive-Old-Exports"
Filter:
Prefix: "rds-exports/"
Transitions:
- Days: 30
StorageClass: GLACIER
- Days: 180
StorageClass: DEEP_ARCHIVE
Status: Enabled
Cost progression per export:
├─ Day 0-30: S3 Standard ($0.81/month)
├─ Day 31-180: Glacier ($0.14/month)
└─ Day 180+: Glacier Deep Archive ($0.03/month) ✅
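The same lifecycle rule expressed as a boto3 call, if you prefer to manage it from code (the bucket name and prefix are placeholders):

import boto3

s3 = boto3.client('s3')

s3.put_bucket_lifecycle_configuration(
    Bucket='my-exports-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'Archive-Old-Exports',
            'Filter': {'Prefix': 'rds-exports/'},
            'Status': 'Enabled',
            'Transitions': [
                {'Days': 30, 'StorageClass': 'GLACIER'},
                {'Days': 180, 'StorageClass': 'DEEP_ARCHIVE'}
            ]
        }]
    }
)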
✅ USE CASE #3: Data Migration & ETL
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scenario:
├─ Migrating from RDS MySQL to Redshift
├─ Need to transform data during migration
├─ Want to validate data before cutting over
└─ Can't afford production downtime
Process:
1. Export RDS snapshot to S3 (30 min)
2. Transform with AWS Glue or EMR
3. Load into Redshift
4. Validate and compare
5. Cutover when ready
Benefits:
├─ No impact on production RDS
├─ Can retry/iterate transformations
├─ Parallel processing of multiple tables
└─ Validate before go-live
Glue ETL example:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext

# Initialize the Spark and Glue contexts (required before reading any data)
sc = SparkContext()
glueContext = GlueContext(sc)

# Read from exported Parquet
rds_export = glueContext.create_dynamic_frame.from_options(
format_options={},
connection_type="s3",
format="parquet",
connection_options={
"paths": ["s3://my-exports-bucket/prod-db-export/mydb/orders/"]
}
)
# Transform data
transformed = rds_export.apply_mapping([
("order_id", "int", "order_id", "bigint"),
("order_date", "string", "order_date", "date"),
("total_amount", "decimal", "total_amount", "decimal(18,2)")
])
# Write to Redshift
glueContext.write_dynamic_frame.from_options(
frame=transformed,
connection_type="redshift",
connection_options={
"url": "jdbc:redshift://...",
"dbtable": "orders",
"redshiftTmpDir": "s3://temp-bucket/"
}
)
✅ USE CASE #4: Cross-Account/Cross-Organization Sharing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scenario:
├─ Need to share data with partner organization
├─ Don't want to share entire database
├─ Only need specific tables
└─ Security/compliance concerns
Snapshot sharing (COMPLICATED):
├─ Share entire snapshot (all tables) ❌
├─ Includes sensitive data ❌
├─ Partner must restore to RDS ($$$) ❌
└─ Hard to control access
Export to S3 (BETTER):
├─ Export only needed tables ✅
├─ Grant S3 bucket access to partner account ✅
├─ Partner queries with Athena (no RDS needed) ✅
├─ Fine-grained access control ✅
└─ Revoke access anytime ✅
S3 bucket policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPartnerAccess",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::PARTNER-ACCOUNT:root"
},
"Action": [
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::my-exports-bucket/shared-data/*",
"arn:aws:s3:::my-exports-bucket"
]
}
]
}
Partner queries your data:
-- Partner's Athena query (no RDS needed!)
CREATE EXTERNAL TABLE partner_orders (
order_id INT,
order_date DATE,
amount DECIMAL(10,2)
)
STORED AS PARQUET
LOCATION 's3://your-bucket/shared-data/orders/';
SELECT * FROM partner_orders LIMIT 10;
✅ USE CASE #5: Disaster Recovery Testing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scenario:
├─ Need to verify backup integrity
├─ Don't want to spin up full RDS instance
├─ Want quick validation of data
└─ Regular DR drills
Traditional DR test:
1. Restore snapshot to RDS (20 min)
2. Connect and run validation queries (1 hour)
3. Delete test instance
4. Cost: 2 hours × $0.29 = $0.58 per test
5. Time: 2-3 hours
Export to S3 validation:
1. Export snapshot to S3 (30 min, one-time)
2. Query with Athena (instant)
3. Run validation queries (10 min)
4. Cost: Query costs ~$0.01
5. Time: 10 minutes ✅
Validation queries:
-- Check record counts
SELECT 'users' as table_name, COUNT(*) as count FROM exported_users
UNION ALL
SELECT 'orders', COUNT(*) FROM exported_orders
UNION ALL
SELECT 'products', COUNT(*) FROM exported_products;
-- Check data integrity
SELECT
COUNT(*) as total_orders,
COUNT(DISTINCT user_id) as unique_users,
SUM(total_amount) as total_revenue,
MIN(order_date) as earliest_order,
MAX(order_date) as latest_order
FROM exported_orders;
-- Verify foreign key relationships
SELECT
COUNT(*) as orphaned_orders
FROM exported_orders o
LEFT JOIN exported_users u ON o.user_id = u.user_id
WHERE u.user_id IS NULL;
❌ DON'T USE EXPORT FOR:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Real-time Data Access
├─ Export takes 30+ minutes
├─ Data is point-in-time
└─ Use: Read replica or DMS instead
2. Restoring to RDS
├─ Exported data cannot be restored to RDS
├─ It's analytics format, not RDS format
└─ Use: Regular snapshot restore
3. Transactional Queries
├─ S3/Athena not optimized for OLTP
├─ Higher latency than RDS
└─ Use: RDS or Aurora for transactions
4. Frequently Updated Data
├─ Must re-export for each update
├─ Not cost-effective
└─ Use: Aurora with Athena federation
How to Export Snapshot to S3
┌──────────────────────────────────────────────────────────────────┐
│ EXPORT PROCESS STEP-BY-STEP │
└──────────────────────────────────────────────────────────────────┘
PREREQUISITES:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. S3 Bucket (must exist)
2. IAM Role with permissions
3. KMS Key (required - exported Parquet files are always encrypted)
4. Snapshot to export
Start Export (AWS Console)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Go to RDS → Snapshots
2. Select snapshot to export
3. Actions → Export to Amazon S3
4. Fill in details:
┌─────────────────────────────────────────────┐
│ Export DB snapshot to Amazon S3 │
├─────────────────────────────────────────────┤
│ Export identifier: │
│ [prod-db-export-2024-01-15] │
│ │
│ Data to be exported: │
│ ○ All (entire snapshot) │
│ ● Partial (select tables/schemas) │
│ │
│ Select identifiers: [✓] users │
│ [✓] orders │
│ [ ] logs │
│ │
│ S3 bucket: [my-rds-exports-bucket] │
│ S3 prefix: [exports/2024-01-15/] │
│ │
│ IAM role: [RDSExportRole] │
│ │
│ KMS key: [aws/rds] (always required)       │
│ │
│ [Cancel] [Export to S3] │
└─────────────────────────────────────────────┘
5. Click "Export to S3"
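The export can also be started from code. A boto3 sketch with placeholder ARNs - note that a KMS key is always required, since the exported Parquet files are encrypted:

import boto3

rds = boto3.client('rds')

rds.start_export_task(
    ExportTaskIdentifier='prod-db-export-2024-01-15',
    SourceArn='arn:aws:rds:us-east-1:123456789012:snapshot:prod-db-snapshot',
    S3BucketName='my-rds-exports-bucket',
    S3Prefix='exports/2024-01-15/',
    IamRoleArn='arn:aws:iam::123456789012:role/RDSExportRole',
    KmsKeyId='arn:aws:kms:us-east-1:123456789012:key/my-key-id',  # required
    ExportOnly=['mydb.users', 'mydb.orders']  # omit to export everything
)

# Poll progress while the export runs
task = rds.describe_export_tasks(
    ExportTaskIdentifier='prod-db-export-2024-01-15'
)['ExportTasks'][0]
print(task['Status'])  # STARTING -> IN_PROGRESS -> COMPLETE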
EXPORT TIMING & COSTS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Export time estimates:
├─ 10 GB snapshot: 10-15 minutes
├─ 100 GB snapshot: 30-60 minutes
├─ 500 GB snapshot: 2-4 hours
└─ 1 TB snapshot: 4-8 hours
Export costs:
├─ Export process: FREE ✅
├─ S3 storage: $0.023/GB/month (Standard)
├─ S3 PUT requests: $0.005 per 1,000 (negligible)
└─ Data transfer: FREE (same region) ✅
Example cost (100 GB database):
├─ Export: $0 (free)
├─ S3 storage: 35 GB × $0.023 = $0.81/month
├─ Athena queries: ~$0.01-0.10 per query
└─ Total: ~$1-2/month ✅
vs keeping snapshot:
└─ Snapshot: 100 GB × $0.095 = $9.50/month ❌
Export saves 91%! 🎉
Using Exported Data
┌──────────────────────────────────────────────────────────────────┐
│ QUERYING EXPORTED DATA │
└──────────────────────────────────────────────────────────────────┘
Amazon Athena (Serverless SQL)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Create Glue Database
CREATE DATABASE rds_exports;
Step 2: Create External Tables
-- Define the table schema (or point a Glue crawler at the export to detect it automatically)
CREATE EXTERNAL TABLE rds_exports.users (
user_id INT,
username STRING,
email STRING,
created_at TIMESTAMP,
last_login TIMESTAMP
)
STORED AS PARQUET
LOCATION 's3://my-rds-exports-bucket/exports/2024-01-15/mydb/users/';
CREATE EXTERNAL TABLE rds_exports.orders (
order_id INT,
user_id INT,
order_date DATE,
total_amount DECIMAL(10,2),
status STRING
)
STORED AS PARQUET
LOCATION 's3://my-rds-exports-bucket/exports/2024-01-15/mydb/orders/';
Step 3: Query the Data
-- Simple queries
SELECT * FROM rds_exports.users LIMIT 10;
SELECT COUNT(*) FROM rds_exports.orders;
-- Analytics queries
SELECT
DATE_TRUNC('month', order_date) as month,
COUNT(*) as order_count,
SUM(total_amount) as revenue,
AVG(total_amount) as avg_order_value
FROM rds_exports.orders
WHERE order_date >= DATE '2023-01-01'
GROUP BY 1
ORDER BY 1 DESC;
-- Join queries
SELECT
u.username,
COUNT(o.order_id) as total_orders,
SUM(o.total_amount) as lifetime_value
FROM rds_exports.users u
LEFT JOIN rds_exports.orders o ON u.user_id = o.user_id
GROUP BY u.username
HAVING COUNT(o.order_id) > 10
ORDER BY lifetime_value DESC
LIMIT 100;
Cost:
└─ $5 per TB scanned
Full scan of 100 GB: ~$0.50 (column-pruned queries typically scan far less)
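These queries can also be run programmatically. A minimal boto3 sketch (the database and table names follow the examples above; the results bucket is a placeholder):

import boto3
import time

athena = boto3.client('athena')

run = athena.start_query_execution(
    QueryString='SELECT COUNT(*) FROM rds_exports.orders',
    QueryExecutionContext={'Database': 'rds_exports'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results-bucket/'}
)
query_id = run['QueryExecutionId']

# Athena is asynchronous: poll until the query finishes
while True:
    state = athena.get_query_execution(
        QueryExecutionId=query_id
    )['QueryExecution']['Status']['State']
    if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    time.sleep(2)

if state == 'SUCCEEDED':
    rows = athena.get_query_results(QueryExecutionId=query_id)['ResultSet']['Rows']
    print(rows[1])  # rows[0] is the header row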
BEST PRACTICES:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ DO:
├─ Export only needed tables (saves time and cost)
├─ Use S3 lifecycle policies for cost optimization
├─ Partition exports by date (year/month/day structure)
├─ Tag exports with metadata
├─ Monitor export progress
├─ Test restore/query procedures
└─ Document S3 bucket structure
❌ DON'T:
├─ Export unnecessarily large/sensitive tables
├─ Keep exports forever without lifecycle policy
├─ Export real-time changing data repeatedly
├─ Use for operational queries (use read replica)
└─ Forget to clean up old exports
Final Summary
Developer: Wow, this has been incredibly helpful! Can you give me one final summary?
RDS Expert: Of course! Here's everything in a nutshell:
┌──────────────────────────────────────────────────────────────────┐
│ FINAL SUMMARY │
└──────────────────────────────────────────────────────────────────┘
THE GOLDEN RULES:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. AUTOMATED BACKUPS = Your Insurance Policy (INCREMENTAL)
├─ Always enabled in production
├─ For recovering from recent accidents
├─ Point-in-time recovery is your superpower
└─ Incremental nature makes long retention affordable!
2. MANUAL SNAPSHOTS = Your Bookmarks (INCREMENTAL)
├─ Before major changes
├─ For long-term retention
├─ For disaster recovery across regions
└─ EBS incremental tech = efficient storage!
3. TRANSACTION LOGS = Your Time Machine
├─ Enable precise recovery
├─ Captured automatically every 5 minutes
└─ Only available with automated backups
4. INCREMENTAL BACKUPS = Your Cost Saver ✨
├─ 83-97% storage reduction vs full backups
├─ 80-90% faster backup windows
├─ Makes long retention affordable
└─ Both automated backups AND snapshots use it!
5. TEST YOUR RESTORES
└─ Backups are worthless if you can't restore
KEY INSIGHTS ABOUT INCREMENTAL:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Automated backups are incremental after first full backup
✅ Manual snapshots use EBS incremental technology
✅ Only changed blocks are stored, not entire database
✅ 7-day retention ≈ 118 GB, NOT 700 GB
✅ Storage cost: ~$1.71/month, NOT $57/month
✅ Backup windows: 2-5 minutes, NOT 30 minutes
✅ Cross-region transfers: 3-5 GB, NOT 100 GB
✅ Makes frequent backups economically viable
WHEN TO USE WHAT:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Recent Accidents (last few hours):
└─ Use: Automated Backup (PITR)
Why: Precise second-by-second recovery
How: Uses incremental backups + transaction logs
Major Changes (deployments, migrations):
└─ Use: Manual Snapshot
Why: Clean rollback point
How: Full snapshot first time, then incremental
Disaster Recovery (region failure):
└─ Use: Cross-region Manual Snapshot
Why: Only option that works across regions
How: Incremental copies keep cost down
COST REALITY (100 GB Database):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
WITHOUT incremental (theoretical):
├─ 7-day auto backups: $57/month
├─ 12 weekly snapshots: $114/month
├─ 12 monthly snapshots: $114/month
└─ TOTAL: $285/month ❌
WITH incremental (actual):
├─ 7-day auto backups: $1.71/month ✅
├─ 12 weekly snapshots: $10.83/month ✅
├─ 12 monthly snapshots: $16.63/month ✅
└─ TOTAL: $29.17/month ✅
SAVINGS: $255.83/month (90% reduction)! 🎉
YOUR ACTION ITEMS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
□ Enable automated backups (7-14 days) - it's affordable!
□ Understand storage = DB_size + (retention × daily_change)
□ Set up snapshot automation for deployments
□ Configure cross-region copies for DR (incremental transfer)
□ Monitor daily change rate for cost prediction
□ Create restore runbook (document PITR vs snapshot usage)
□ Set up monitoring and alerts
□ Schedule monthly restore tests
□ Implement snapshot lifecycle policy
□ Train team on incremental nature and when to use what
Appendix: Additional Resources
Best Practice Guides
- AWS Well-Architected Framework - Reliability Pillar
- AWS Disaster Recovery Whitepaper
- RDS Best Practices Guide
- EBS Snapshot Best Practices