Data Tech Bridge

Posted on Mar 6

Amazon RDS - Cheat Sheet

Overview

Amazon Relational Database Service (RDS) is a managed relational database service that makes it easier to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching, and backups.

Core Components and Concepts

Database Instance: The basic building block of RDS; an isolated database environment in the cloud
Database Engine: The specific database software that runs on your instance (MySQL, PostgreSQL, Oracle, SQL Server, MariaDB, or Amazon Aurora)
DB Instance Class: Determines the compute and memory capacity of an RDS instance
Storage Types: Different storage options for different workloads (General Purpose SSD, Provisioned IOPS SSD, Magnetic)
Multi-AZ Deployment: Provides high availability and failover support for DB instances
Read Replicas: Read-only copies of your database to offload read traffic from the primary DB instance

RDS Database Engines

Engine	Description	Use Cases	Open Source
MySQL	Popular open-source database	Web applications, e-commerce	Yes
PostgreSQL	Advanced open-source database	Geographic applications, complex data types	Yes
MariaDB	MySQL fork with enhanced features	Web applications, replacing MySQL	Yes
Oracle	Enterprise-grade commercial database	Enterprise applications, legacy systems	No
SQL Server	Microsoft's relational database	Windows-integrated applications	No
Amazon Aurora	MySQL/PostgreSQL-compatible database with enhanced performance	High-performance applications, mission-critical workloads	No (proprietary implementation)

Instance Types and Performance

Instance Families:
- Standard (db.m classes): Balanced compute and memory
- Memory Optimized (db.r classes): For memory-intensive workloads
- Burstable (db.t classes): For development/test environments with variable workloads
Storage Performance:

Storage Type	Performance	Use Case	IOPS
General Purpose SSD (gp2)	Baseline of 3 IOPS/GB, burst to 3,000 IOPS	Development, test, small-to-medium workloads	3 IOPS/GB (min 100, max 16,000)
General Purpose SSD (gp3)	Baseline of 3,000 IOPS, 125 MiB/s	Customizable performance	3,000-16,000 IOPS, 125-1,000 MiB/s
Provisioned IOPS SSD (io1)	User-provisioned IOPS	I/O-intensive workloads	Up to 256,000 IOPS
Magnetic (standard)	Not recommended for new deployments	Legacy applications	N/A

Storage Calculation Example:
- For a 1,000 GB gp2 volume: 1,000 GB × 3 IOPS/GB = 3,000 IOPS baseline
- For a gp3 volume requiring 5,000 IOPS and 500 MiB/s: provision 2,000 IOPS and 375 MiB/s above the included baseline of 3,000 IOPS and 125 MiB/s; that excess is what adds cost
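
The two calculations above can be sketched in a few lines of Python. This is a minimal illustration that uses only the figures from the storage table (3 IOPS/GB with a 100 to 16,000 range for gp2, and a 3,000 IOPS / 125 MiB/s baseline for gp3). The function names are hypothetical and no pricing is modeled.

```python
from typing import Tuple

# Figures taken from the storage table above.
GP2_IOPS_PER_GB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP3_BASELINE_IOPS = 3_000
GP3_BASELINE_MIBPS = 125


def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 IOPS/GB, clamped to [100, 16,000]."""
    return max(GP2_MIN_IOPS, min(size_gb * GP2_IOPS_PER_GB, GP2_MAX_IOPS))


def gp3_extra_provisioning(iops: int, throughput_mibps: int) -> Tuple[int, int]:
    """IOPS and MiB/s that must be provisioned above the gp3 baseline."""
    extra_iops = max(0, iops - GP3_BASELINE_IOPS)
    extra_throughput = max(0, throughput_mibps - GP3_BASELINE_MIBPS)
    return extra_iops, extra_throughput


print(gp2_baseline_iops(1_000))            # 3000
print(gp3_extra_provisioning(5_000, 500))  # (2000, 375)
```

The clamping in `gp2_baseline_iops` matters at the extremes: a 10 GB volume still gets 100 IOPS, and very large volumes stop at the table's 16,000 IOPS ceiling.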

High Availability and Disaster Recovery

Multi-AZ Deployment:
- Synchronous replication to standby in different AZ
- Automatic failover during planned/unplanned outages
- No manual intervention required
- Same endpoint used after failover
Read Replicas:
- Asynchronous replication
- Up to 15 read replicas per DB instance (5 for SQL Server)
- Can be promoted to standalone DB instance
- Can be created in different regions (cross-region read replicas)
Backup and Recovery:
- Automated backups: Point-in-time recovery within the retention period (configurable up to 35 days)
- Manual snapshots: Retained until explicitly deleted
- Backup window: Configurable time when backups occur
- Restore operation creates a new DB instance

Security Features

Network Security:
- VPC integration for network isolation
- Security groups to control access
- No direct host access (SSH/RDP disabled)
Encryption:
- At-rest encryption using AWS KMS
- In-transit encryption using SSL/TLS
- Transparent Data Encryption (TDE) for Oracle and SQL Server
Authentication:
- Password authentication
- IAM database authentication (MySQL and PostgreSQL)
- Kerberos authentication

Monitoring and Performance

CloudWatch Metrics:

Metric	Description	Threshold Recommendation
CPUUtilization	Percentage of CPU utilization	<80%
DatabaseConnections	Number of client connections	Depends on workload, monitor for unusual spikes
FreeableMemory	Amount of available RAM	>20% of total memory
ReadIOPS/WriteIOPS	Average I/O operations per second	Depends on instance type and storage
ReadLatency/WriteLatency	Average time for I/O operations	<20ms for most workloads
DiskQueueDepth	Number of I/O requests waiting	<1 per volume
FreeStorageSpace	Available storage space	>20% of allocated storage
ReplicaLag	How far behind read replica is from primary	<30 seconds
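
As a rough illustration of how the thresholds in this table might be applied, the sketch below checks one sample of metric values against them. It assumes the values arrive in CloudWatch's units (percent for CPUUtilization, bytes for FreeableMemory and FreeStorageSpace, seconds for latency and ReplicaLag). The function name and the sample numbers are hypothetical, and real alerting would normally be done with CloudWatch alarms rather than application code.

```python
from typing import Dict, List

GIB = 1024 ** 3


def find_breaches(sample: Dict[str, float],
                  total_memory_bytes: float,
                  allocated_storage_bytes: float) -> List[str]:
    """Return the names of metrics that violate the table's suggested thresholds."""
    # Each check returns True when the value is outside the recommended range.
    checks = {
        "CPUUtilization": lambda v: v >= 80,                       # percent
        "FreeableMemory": lambda v: v <= 0.20 * total_memory_bytes,
        "ReadLatency": lambda v: v >= 0.020,                       # seconds
        "WriteLatency": lambda v: v >= 0.020,                      # seconds
        "DiskQueueDepth": lambda v: v >= 1,
        "FreeStorageSpace": lambda v: v <= 0.20 * allocated_storage_bytes,
        "ReplicaLag": lambda v: v >= 30,                           # seconds
    }
    # Metrics missing from the sample (e.g. ReplicaLag on a primary) are skipped.
    return [name for name, breached in checks.items()
            if name in sample and breached(sample[name])]


# Made-up sample for an instance with 16 GiB RAM and 100 GiB of storage.
sample = {
    "CPUUtilization": 91.5,
    "FreeableMemory": 6 * GIB,
    "ReadLatency": 0.004,
    "WriteLatency": 0.035,
    "DiskQueueDepth": 0.3,
    "FreeStorageSpace": 15 * GIB,
}
print(find_breaches(sample, total_memory_bytes=16 * GIB,
                    allocated_storage_bytes=100 * GIB))
# ['CPUUtilization', 'WriteLatency', 'FreeStorageSpace']
```

DatabaseConnections and ReadIOPS/WriteIOPS are left out on purpose, since the table gives no fixed threshold for them.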

Enhanced Monitoring:
- OS-level metrics (CPU, memory, file system, disk I/O)
- 1-second to 60-second intervals
- Stored in CloudWatch Logs
Performance Insights:
- Database performance analysis tool
- Visualizes database load
- Identifies performance bottlenecks
- Retention: 7 days (free tier), up to 24 months (paid)

Scaling Options

Vertical Scaling:
- Change instance class (CPU/memory)
- Increase allocated storage
- Modify Provisioned IOPS
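- Changing the instance class usually involves downtime; increasing storage or IOPS generally does not, and Aurora Serverless adjusts capacity without an instance class change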
Horizontal Scaling:
- Add read replicas to distribute read workloads
- Shard data across multiple instances (application-level)
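
Application-level sharding means the application decides which DB instance holds a given row. One common way to do that is to hash a shard key and take the result modulo the number of shards. The sketch below shows that approach under stated assumptions: the endpoint names are hypothetical placeholders, and RDS itself provides no such routing function.

```python
import hashlib
from typing import List


def shard_for(key: str, shard_endpoints: List[str]) -> str:
    """Pick the endpoint responsible for a shard key via a stable hash."""
    if not shard_endpoints:
        raise ValueError("at least one shard endpoint is required")
    # sha256 gives the same result in every process, unlike Python's built-in
    # hash(), which is randomized per interpreter run for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shard_endpoints)
    return shard_endpoints[index]


# Hypothetical endpoints, one per DB instance.
endpoints = [
    "shard-0.example.internal",
    "shard-1.example.internal",
    "shard-2.example.internal",
]
print(shard_for("customer-42", endpoints))
```

Modulo hashing is simple but remaps most keys whenever the number of shards changes, so adding an instance later implies a data migration. Consistent hashing or a lookup table are the usual alternatives when resharding is expected.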

Service Limits and Quotas

Key Limits:
- Maximum storage: 64 TiB (MySQL, MariaDB, PostgreSQL, Oracle), 16 TiB (SQL Server)
- Maximum Provisioned IOPS: 256,000 (io1)
- Maximum DB instances per account: 40 per Region by default (can be increased)
- Maximum read replicas: 15 per primary (5 for SQL Server)
- Maximum backup retention: 35 days

Data Migration and ETL

AWS Database Migration Service (DMS):
- Migrate databases to RDS with minimal downtime
- Supports homogeneous and heterogeneous migrations
- Continuous replication for CDC (Change Data Capture)
Import/Export Options:
- Native database tools (mysqldump, pg_dump)
- AWS Data Pipeline for scheduled data transfers
- S3 integration for import/export (MySQL, PostgreSQL, MariaDB)
Replayability of Data Ingestion:
- Binary logs for MySQL/MariaDB (set binlog_format=ROW in the DB parameter group; automated backups must be enabled for binary logging)
- Write-ahead logs for PostgreSQL
- Archived redo logs for Oracle
- Transaction logs for SQL Server

Cost Optimization

Reserved Instances:
- Up to 72% discount compared to On-Demand
- 1 or 3-year terms
- Payment options: No upfront, partial upfront, all upfront
Storage Autoscaling:
- Automatically scales storage when approaching limit
- Set maximum storage limit to control costs
Instance Scheduling:
- Stop/start instances during non-business hours
- Use AWS Instance Scheduler for automation
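
The decision behind instance scheduling is simple enough to sketch. The function below answers "should this instance be running right now?" for a weekday business-hours window. The function name, default hours, and weekday rule are assumptions for illustration. A real scheduler would act on the answer by calling the RDS stop and start operations, and would need to handle time zones.

```python
from datetime import datetime


def should_be_running(now: datetime,
                      start_hour: int = 8,
                      end_hour: int = 18) -> bool:
    """True on weekdays from start_hour (inclusive) to end_hour (exclusive)."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and start_hour <= now.hour < end_hour


print(should_be_running(datetime(2024, 3, 6, 10, 0)))  # Wednesday 10:00 -> True
print(should_be_running(datetime(2024, 3, 6, 22, 0)))  # Wednesday 22:00 -> False
print(should_be_running(datetime(2024, 3, 9, 10, 0)))  # Saturday 10:00 -> False
```

Keeping the schedule logic in a pure function like this makes it easy to test separately from the code that actually stops or starts instances.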

Aurora Specific Features

Aurora Architecture:
- Storage layer separated from compute layer
- 6 copies of data across 3 AZs
- Self-healing storage
- Storage grows automatically in 10 GB increments, up to 128 TiB
Aurora Serverless:
- Auto-scaling based on workload
- Pay only for resources consumed
- Ideal for variable or unpredictable workloads
Aurora Global Database:
- Spans multiple AWS regions
- Low-latency global reads
- Disaster recovery from region-wide outages
- Typical replication lag < 1 second

Implementing Throttling and Overcoming Rate Limits

Connection Pooling:
- Use connection pooling (e.g., PgBouncer, ProxySQL)
- Prevents database connection exhaustion
- Reduces connection overhead
Rate Limiting Strategies:
- Implement application-level throttling
- Use Amazon RDS Proxy to manage connections
- Configure max_connections parameter appropriately
Handling Burst Workloads:
- Use RDS Proxy to smooth connection spikes
- Implement exponential backoff for retries
- Consider Aurora Serverless for variable workloads
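
Exponential backoff for retries can be sketched as follows. This is a minimal version using the "full jitter" variant, where each wait is a random time up to a ceiling that doubles per attempt. It assumes the database driver surfaces transient failures as `ConnectionError`; real drivers raise their own exception types, so the `except` clause would need adapting. The `flaky_query` function is a hypothetical stand-in for a database call.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T],
                       max_attempts: int = 5,
                       base_delay: float = 0.1,
                       max_delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation(), retrying on ConnectionError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time in [0, ceiling] so that many
            # clients retrying at once do not reconnect in lockstep.
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")


attempts = {"count": 0}


def flaky_query() -> str:
    # Hypothetical stand-in for a database call that fails twice, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("too many connections")
    return "ok"


# The sleep function is replaced here so the demo finishes instantly.
print(retry_with_backoff(flaky_query, sleep=lambda seconds: None))  # ok
```

The injectable `sleep` parameter is a design choice for testability. In production the default `time.sleep` applies, and the cap on attempts keeps a persistent outage from turning into an endless retry loop.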

Throughput and Latency Characteristics

Network Throughput:
- Varies by instance size (larger instances = more bandwidth)
- Enhanced Networking provides higher PPS (packets per second)
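- Placing EC2 clients in the same Availability Zone as the DB instance can reduce latency between EC2 and RDS (placement groups apply to EC2 instances, not to RDS DB instances)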
I/O Throughput:
- gp3: Baseline 125 MiB/s, up to 1,000 MiB/s
- io1: Depends on provisioned IOPS and instance capability
- Example calculation: 5,000 IOPS × 16 KiB per I/O = 80,000 KiB/s ≈ 78 MiB/s (about 80 MB/s in decimal units)
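
The relationship between IOPS, I/O size, and throughput can be checked with a short Python sketch. It assumes binary units throughout (1 MiB = 1,024 KiB), in which 5,000 IOPS at 16 KiB per I/O comes to 78.125 MiB/s; the round figure of 80 corresponds to decimal units (80 MB/s). The function names are hypothetical.

```python
import math


def throughput_mib_per_s(iops: int, io_size_kib: int) -> float:
    """Throughput in MiB/s produced by `iops` operations of `io_size_kib` each."""
    return iops * io_size_kib / 1024


def iops_for_throughput(target_mib_per_s: float, io_size_kib: int) -> int:
    """Smallest whole number of IOPS needed to reach a target throughput."""
    return math.ceil(target_mib_per_s * 1024 / io_size_kib)


print(throughput_mib_per_s(5_000, 16))  # 78.125
print(iops_for_throughput(125, 16))     # 8000
```

The second function runs the calculation in reverse: at 16 KiB per I/O, reaching the 125 MiB/s gp3 baseline takes 8,000 IOPS, so with small I/O sizes the IOPS limit is reached before the throughput limit.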

Mind Map: AWS RDS Components

AWS RDS
├── Database Engines
│   ├── MySQL
│   ├── PostgreSQL
│   ├── MariaDB
│   ├── Oracle
│   ├── SQL Server
│   └── Aurora
├── Instance Management
│   ├── Instance Classes
│   │   ├── Standard (db.m)
│   │   ├── Memory Optimized (db.r)
│   │   └── Burstable (db.t)
│   ├── Storage Options
│   │   ├── General Purpose SSD (gp2/gp3)
│   │   ├── Provisioned IOPS SSD (io1)
│   │   └── Magnetic (standard)
│   └── Scaling
│       ├── Vertical (instance size)
│       └── Horizontal (read replicas)
├── High Availability
│   ├── Multi-AZ Deployment
│   ├── Read Replicas
│   └── Automated Backups
├── Security
│   ├── Network Security
│   ├── Encryption
│   └── Authentication
└── Monitoring
    ├── CloudWatch Metrics
    ├── Enhanced Monitoring
    └── Performance Insights

Open Source Components in RDS

MySQL in RDS vs. Self-Managed:
- RDS provides automated patching, backups, and monitoring
- Some MySQL features restricted (SUPER privileges, file system access)
- Compatible with most MySQL tools and applications
PostgreSQL in RDS vs. Self-Managed:
- RDS supports most PostgreSQL extensions
- Limited superuser access (rds_superuser role instead)
- Automated minor version upgrades
MariaDB in RDS vs. Self-Managed:
- Similar feature set to MySQL
- Better performance for some workloads
- Some MariaDB-specific features may be restricted

Feature	Self-Managed Open Source	RDS Managed Service
Administrative Overhead	High (manual setup, patching, backups)	Low (automated management)
Control	Full control over configuration	Limited to parameters in parameter groups
Access	Full system access	No host access, limited privileges
Cost	Infrastructure costs only	Service premium + infrastructure
Scaling	Manual process	Simplified scaling operations
High Availability	Manual setup required	Built-in Multi-AZ, read replicas

DEV Community