Last updated: 2025-10-11
I recently completed the course “Fundamentals of Amazon RDS for PostgreSQL.” While studying, I took thorough notes. I’m publishing those notes here so others can benefit from a single, practical reference when designing, deploying, and operating PostgreSQL on Amazon RDS.
Why this post (and who it’s for)
If you’re choosing between Single‑AZ and Multi‑AZ, gp3 and io2, or wondering how parameter groups differ from `postgresql.conf`, this guide is for you. It keeps everything from my notes and organizes it into a checklist‑style post you can revisit during architecture reviews, migrations, and production ops.
0) Big Picture — What RDS for PostgreSQL Gives You
Keep in mind
- Fully managed PostgreSQL: AWS handles provisioning, patching, backups, monitoring primitives, and managed HA options. You focus on schema, queries, and performance tuning.
- Same PostgreSQL engine & ecosystem: apps/tools/extensions generally work with minor adjustments.
Details to know
- Engines offered on RDS (open‑source): PostgreSQL, MySQL, MariaDB (others on RDS include SQL Server, Oracle, DB2).
- RDS for PostgreSQL versions referenced in the course: 12 → 17.
- Extension examples: PostGIS (spatial), pg_stat_statements (query statistics), pgvector (vector search/AI), pg_hint_plan, and more.
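Extensions are enabled per database with plain SQL once any prerequisites (engine version, required privileges) are met. Below is a minimal sketch using psycopg2; the endpoint, database name, and user are placeholder assumptions, and pgvector in particular needs a recent engine version.

```python
# Minimal sketch: enabling extensions on an RDS for PostgreSQL database.
# Endpoint, database, and role below are placeholders.
import os
import psycopg2

conn = psycopg2.connect(
    host="mydb.abc123xyz.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    port=5432,
    dbname="appdb",                                     # placeholder database
    user="app_admin",                                   # placeholder role with CREATE privilege
    password=os.environ["PGPASSWORD"],                  # prefer IAM auth / Secrets Manager in practice
    sslmode="require",
)
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_stat_statements;")
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")  # pgvector
    cur.execute("SELECT extname, extversion FROM pg_extension ORDER BY extname;")
    for name, version in cur.fetchall():
        print(name, version)
conn.close()
```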
1) Self‑Managed vs RDS‑Managed PostgreSQL
Keep in mind
- Self‑managed = full control, full responsibility.
- RDS‑managed = operational heavy lifting is offloaded to AWS.
Details to know
- Pain points self‑managed: OS/DB patching, HA setup, DR planning, capacity planning, security/compliance, performance triage.
- RDS benefits: One‑click scale, built‑in HA patterns (Multi‑AZ), automated backups & PITR, event notifications, default monitoring, encryption in transit & at rest, IAM/AD integration.
2) Instance Classes (Compute) — Choosing & Scaling
Keep in mind
- Pick class type for workload pattern; watch memory/CPU/network as potential bottlenecks.
- You can scale up or down (minimal/no downtime in many cases). Reserved Instances can cut cost for 1–3 years.
Details to know
- Families mentioned:
  - Burstable / general purpose: good for small/dev workloads.
  - M6 (general purpose, CPU‑leaning).
  - R6 (memory‑optimized): for high‑memory queries/throughput.
  - Graviton (Arm‑based): best price/performance for many workloads.
- Upper envelope (course): up to 128 vCPU and 4096 GiB RAM.
- Right‑sizing tips: aim for a baseline of CPU below ~60%, a high buffer‑cache hit rate, and a healthy I/O queue; scale up for sustained saturation or memory pressure; scale down to save cost when underutilized.
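To make the scaling mechanics concrete, here is a minimal boto3 sketch that changes the instance class; the instance identifier and target class are illustrative assumptions, not values from the course.

```python
# Minimal sketch: scaling an RDS instance class with boto3.
# The instance identifier and target class are illustrative.
import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="prod-postgres",   # placeholder
    DBInstanceClass="db.r6g.xlarge",        # Graviton, memory-optimized
    ApplyImmediately=True,                  # otherwise applied in the next maintenance window
)

# Watch the modification progress.
status = rds.describe_db_instances(DBInstanceIdentifier="prod-postgres")
print(status["DBInstances"][0]["DBInstanceStatus"])
```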
3) Storage Types & Scaling (EBS under the hood)
Keep in mind
- If you’re I/O bound, look at volume type & IOPS, not the instance.
- Enable Storage Auto Scaling to avoid outages from full disks.
Details to know
- Volume types:
  - gp2 (legacy general purpose): performance scales with size; up to ~1,000 MB/s throughput.
  - gp3 (recommended general purpose): baseline 3,000 IOPS, higher throughput than gp2; IOPS/throughput configurable independently of size.
  - io1 / io2 Block Express (Provisioned IOPS): for I/O‑intensive workloads; higher, predictable IOPS & throughput.
- Size & limits: up to 64 TiB per DB volume.
- IOPS: transcript cites up to ~256,000 when increasing provisioned IOPS.
- Modify storage & IOPS online: no downtime, but there is a 6‑hour cooldown between successive storage modifications (or until “storage optimization” completes).
- Auto Scaling triggers: kicks in when free storage ≤ 10% for ≥ 5 minutes; increases by the greater of 10 GiB, 10% of current size, or predicted 7‑hour growth (based on FreeStorageSpace metric).
- Design tip: Prefer gp3 for general workloads; move to io1/io2 when sustained high IOPS/latency sensitivity is observed.
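Here is a hedged boto3 sketch of the storage side, combining a move to gp3, provisioned IOPS/throughput, and a Storage Auto Scaling ceiling; all identifiers and numbers are placeholders, and gp3 only lets you tune IOPS/throughput above certain volume sizes.

```python
# Minimal sketch: moving to gp3, provisioning IOPS/throughput, and capping
# Storage Auto Scaling. Identifiers and numbers are illustrative.
import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="prod-postgres",   # placeholder
    StorageType="gp3",
    AllocatedStorage=400,                   # GiB; larger gp3 volumes allow custom IOPS/throughput
    Iops=12000,
    StorageThroughput=500,                  # MB/s
    MaxAllocatedStorage=1000,               # GiB ceiling for Storage Auto Scaling
    ApplyImmediately=True,
)
# Remember the cooldown: another storage modification is blocked for ~6 hours
# or until storage optimization completes.
```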
4) Configuration via Parameter Groups (and What’s Not Editable)
Keep in mind
- You don’t edit `postgresql.conf` or `pg_hba.conf` directly on RDS.
- Use DB parameter groups (engine settings) and option groups (where applicable) to tune; network‑level access is controlled by Security Groups.
Details to know
- Static parameters require a reboot to take effect (e.g., `shared_buffers`).
- Dynamic parameters can change at runtime or per user/session (e.g., `work_mem`, `maintenance_work_mem`).
- The console shows whether a parameter is static or dynamic and its apply type.
- pg_hba equivalents: inbound connectivity is gated by VPC Security Groups; database‑level roles/privileges still govern object access.
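As an illustration of the static-vs-dynamic distinction, here is a minimal boto3 sketch against a custom parameter group; the group name and values are assumptions, not tuning recommendations.

```python
# Minimal sketch: tuning a custom DB parameter group with boto3.
# Group name and values are illustrative; note the differing ApplyMethod.
import boto3

rds = boto3.client("rds")

rds.modify_db_parameter_group(
    DBParameterGroupName="pg17-custom",      # placeholder custom group
    Parameters=[
        {   # dynamic: takes effect without a reboot
            "ParameterName": "work_mem",
            "ParameterValue": "65536",       # kB
            "ApplyMethod": "immediate",
        },
        {   # static: applied only after the instance is rebooted
            "ParameterName": "shared_buffers",
            "ParameterValue": "1048576",     # 8 kB pages
            "ApplyMethod": "pending-reboot",
        },
    ],
)
```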
5) Connecting & Authenticating
Keep in mind
- Think in two layers: network allow‑list (Security Groups) and DB auth (roles/credentials).
Details to know
- Network: Allow specific client IPs/app subnets via Security Groups.
- Auth methods: classic username/password; Kerberos/AD; IAM auth (token‑based).
- Use RDS Proxy to pool connections and speed up reconnection during failovers.
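A minimal sketch of the IAM‑auth path, assuming IAM database authentication is enabled on the instance and the database user has been granted the rds_iam role; the endpoint, Region, and names are placeholders.

```python
# Minimal sketch: connecting with IAM database authentication instead of a
# static password. The short-lived token is passed as the password over TLS.
import boto3
import psycopg2

HOST = "mydb.abc123xyz.us-east-1.rds.amazonaws.com"  # placeholder (instance or RDS Proxy endpoint)
USER = "app_user"                                    # placeholder user granted rds_iam

rds = boto3.client("rds", region_name="us-east-1")
token = rds.generate_db_auth_token(DBHostname=HOST, Port=5432, DBUsername=USER)

conn = psycopg2.connect(
    host=HOST, port=5432, dbname="appdb",
    user=USER, password=token,   # token expires after a short window
    sslmode="require",           # TLS is mandatory for IAM auth
)
```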
6) Monitoring & Observability
Keep in mind
- Start simple with CloudWatch; turn on deeper tooling when diagnosing hotspots.
Details to know
- Amazon CloudWatch metrics: CPU, RAM (freeable memory), storage, IOPS, connections, replica lag; dashboards & alarms.
- Enhanced Monitoring: OS/process‑level metrics at 1‑second granularity (emits to CloudWatch Logs).
- Performance Insights (PI): visualize DB load as average active sessions (AAS), top waits, and top SQL; identify bottlenecks and tuning opportunities.
- CloudWatch Database Insights: opinionated prebuilt dashboards across a fleet to spot unhealthy instances quickly.
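As a starting point for the “start simple with CloudWatch” advice, here is a minimal boto3 sketch that alarms on low free storage; the instance identifier, threshold, and SNS topic ARN are placeholders.

```python
# Minimal sketch: a CloudWatch alarm on FreeStorageSpace for one instance.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="prod-postgres-low-storage",
    Namespace="AWS/RDS",
    MetricName="FreeStorageSpace",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "prod-postgres"}],  # placeholder
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=20 * 1024**3,                  # alarm below 20 GiB free
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:db-alerts"],  # placeholder topic
)
```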
7) Security & Compliance
Keep in mind
- Defense‑in‑depth: VPC isolation, SGs, encryption, least‑privilege IAM, directory integration, DB roles.
Details to know
- Network isolation with VPC + Security Groups.
- Encryption: KMS at rest; SSL/TLS in transit.
- Identity: IAM for API/admin; Active Directory/Kerberos for centralized DB auth (where enabled).
- Roles & privileges: PostgreSQL roles ultimately control table/schema access.
- Compliance: finance/healthcare/gov frameworks supported; understand shared responsibility boundaries.
- TLE (Trusted Language Extensions): build/run high‑performance extensions safely on RDS using trusted languages without AWS certifying your code.
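For compliance spot checks, a small read‑only sketch that audits encryption‑at‑rest and IAM‑auth settings across instances; it only uses fields returned by DescribeDBInstances.

```python
# Minimal sketch: auditing encryption and IAM-auth settings across instances.
import boto3

rds = boto3.client("rds")

for db in rds.describe_db_instances()["DBInstances"]:
    print(
        db["DBInstanceIdentifier"],
        "encrypted_at_rest:", db["StorageEncrypted"],
        "kms_key:", db.get("KmsKeyId", "-"),
        "iam_auth:", db.get("IAMDatabaseAuthenticationEnabled", False),
    )
```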
8) High Availability (HA) — Multi‑AZ Options
Keep in mind
- Choose Multi‑AZ for production. Understand failover mechanics, latency, and endpoints.
Details to know
- Single‑AZ: one instance only; if it fails, RDS must recreate a host → longer recovery.
- Multi‑AZ with One Standby (classic):
  - Synchronous replication of storage to a standby in another AZ.
  - You never read from the standby.
  - One endpoint; DNS flips on failover; you may need to reconnect (RDS Proxy helps).
  - Backups/maintenance performed on the standby to reduce impact.
- Multi‑AZ DB Cluster (Two Readable Standbys):
  - Semi‑synchronous: commit after either standby acknowledges → lower commit latency.
  - Reader endpoint (load‑balanced reads) + cluster endpoint (read/write to the current primary).
  - Typical failover target mentioned: < ~35 s.
  - NVMe SSD local storage on the nodes for low latency.
- Triggers for failover (examples): primary AZ loss, network loss to primary, compute/storage failure on primary. Not triggered for long queries/deadlocks/DB corruption by default.
- SLA: Multi‑AZ helps achieve 99.95% monthly uptime (as cited in the session).
- Best practices: test failover; set connection retry logic; keep clients using endpoints, not static hostnames.
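To practice the “test failover” advice, here is a minimal sketch that forces a failover on a classic Multi‑AZ instance during a test window; the identifier is a placeholder, and Multi‑AZ DB clusters use the separate FailoverDBCluster API instead.

```python
# Minimal sketch: exercising Multi-AZ failover in a test window.
# reboot_db_instance(ForceFailover=True) applies to a Multi-AZ instance with
# one standby; Multi-AZ DB clusters use failover_db_cluster instead.
import boto3

rds = boto3.client("rds")

rds.reboot_db_instance(
    DBInstanceIdentifier="prod-postgres",  # placeholder
    ForceFailover=True,                    # promote the standby, flip DNS
)
# Clients should reconnect via the instance endpoint (or RDS Proxy),
# never via a cached IP address.
```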
9) Disaster Recovery (DR) — Snapshots, PITR, Replicas
Keep in mind
- HA ≠ DR. Multi‑AZ is synchronous availability within a Region; DR often uses asynchronous replicas and off‑Region backups.
Details to know
- Snapshots (EBS‑backed):
  - Automated: daily within the backup window + WAL/txn logs every ~5 minutes to S3 → Point‑in‑Time Restore (PITR) within retention (max 35 days).
  - Manual: on‑demand, no auto‑retention; kept until deleted; cannot replay WAL → no PITR; manage lifecycle via AWS Backup.
  - Cross‑Region/account copy supported (good for DR/compliance).
  - On Single‑AZ, a brief I/O pause may occur during a snapshot; avoided in Multi‑AZ because snapshots are taken from the standby.
- Cross‑Region Automated Backups (Backup Replication): enable on the instance; choose target Region and retention.
- Read Replicas (async):
  - For read scaling and DR; in‑Region and cross‑Region supported.
  - Can be promoted to standalone instances during DR.
  - Size replicas equal to or larger than the primary to avoid replay lag; smaller replicas can fall behind.
- External replicas via logical replication: stream changes to self‑managed PostgreSQL (check feature limitations/conflict handling).
- Restore behavior: Restoring always creates a new DB instance (for both snapshot and PITR). Plan cutover steps.
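Two hedged sketches of the restore and replica mechanics: a point‑in‑time restore (which always creates a new instance) and a cross‑Region read replica. Identifiers, ARNs, timestamps, and the instance class are placeholders.

```python
# Minimal sketch: point-in-time restore and a cross-Region read replica.
# All identifiers, ARNs, timestamps, and classes are placeholders.
import boto3
from datetime import datetime, timezone

rds = boto3.client("rds")  # client in the source Region for the PITR call

# PITR always creates a NEW instance; plan the application cutover.
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="prod-postgres",
    TargetDBInstanceIdentifier="prod-postgres-restored",
    RestoreTime=datetime(2025, 10, 10, 14, 30, tzinfo=timezone.utc),
    # or: UseLatestRestorableTime=True (instead of RestoreTime)
)

# Cross-Region read replica for DR: call this with a client in the
# *destination* Region; encrypted sources also need KmsKeyId (+ SourceRegion).
rds_west = boto3.client("rds", region_name="us-west-2")
rds_west.create_db_instance_read_replica(
    DBInstanceIdentifier="prod-postgres-replica-west",
    SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:123456789012:db:prod-postgres",
    DBInstanceClass="db.r6g.xlarge",  # size >= primary to avoid replay lag
)
# During a DR drill, promote it:
# rds_west.promote_read_replica(DBInstanceIdentifier="prod-postgres-replica-west")
```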
10) Eventing & Operations Quality of Life
Keep in mind
- Use events & notifications to automate ops and alerting.
Details to know
- DB Event notifications: subscribe to Multi‑AZ failover, maintenance, storage thresholds, etc.
- Maintenance: minor/major engine & OS patching through defined windows; often applied to standby first in Multi‑AZ, then failover, then patch former primary.
- Cost controls: scale down in off‑peak, use Graviton where possible, reserved instances for steady usage.
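A minimal sketch of the event‑subscription bullet above, covering failover, maintenance, and low‑storage events; the subscription name, SNS topic ARN, and source IDs are placeholders.

```python
# Minimal sketch: subscribing to failover / maintenance / low-storage events.
import boto3

rds = boto3.client("rds")

rds.create_event_subscription(
    SubscriptionName="prod-postgres-events",
    SnsTopicArn="arn:aws:sns:us-east-1:123456789012:db-alerts",  # placeholder
    SourceType="db-instance",
    EventCategories=["failover", "maintenance", "low storage"],
    SourceIds=["prod-postgres"],                                 # placeholder
    Enabled=True,
)
```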
11) Migration & Flexibility
Keep in mind
- Homogeneous migrations are straightforward; heterogeneous require conversion.
Details to know
- AWS DMS for data migration & CDC; AWS SCT for schema/code conversion from other engines.
- Open‑source posture means you can move to/from RDS with less lock‑in compared to proprietary engines.
12) Architecture Patterns — Putting It Together
Keep in mind
- Separate HA (Multi‑AZ) from DR (replicas & cross‑Region backups).
- Right‑size compute, choose storage class wisely, and set autoscaling + alarms.
Example blueprint
- Prod: Multi‑AZ DB cluster (2 readable standbys) for fast failover and read scaling; gp3 or io1/io2 depending on I/O profile; PI enabled; alarms on CPU, FreeStorageSpace, ReplicaLag, CommitLatency.
- DR: Cross‑Region automated backups enabled + one cross‑Region read replica; run DR drills (promote & cut back).
- Security: VPC‑only access, tight SG rules, TLS enforced, KMS CMK, IAM least privilege, DB roles by schema.
- Ops: RDS Proxy for connection pooling; maintenance window off‑peak; event subscriptions.
13) Quick Reference (numbers as presented)
- Compute scale: up to 128 vCPU / 4096 GiB RAM.
- Storage scale: up to 64 TiB per volume.
- Provisioned IOPS up to ~256,000 (per transcript).
- Auto storage scaling: triggers at ≤10% free for ≥5 min; increase by 10 GiB, 10%, or predicted 7‑hour growth.
- Multi‑AZ DB cluster failover: typically < ~35s.
- Automated backup retention: up to 35 days; logs to S3 every ~5 min → PITR.
Note: These reflect statements from the course. Always verify current AWS docs for exact limits/regions/pricing before production decisions.
Conclusion
I created this blog after completing the course so I—and you—have one practical reference for Amazon RDS for PostgreSQL. The core ideas: choose Multi‑AZ for production, keep HA distinct from DR, prefer gp3 unless your workload demands io1/io2, enable Performance Insights early, and practice failovers and restores before you need them.
If this helped, share it with your team and save it for design reviews and DR runbooks.