I recently upgraded a production PostgreSQL database from version 13.20 to 17.7 on AWS RDS. This post documents the entire journey - including the mistakes I made, the approaches I considered, and how I achieved a switchover with minimal downtime.
The Problem
My application runs on AWS with PostgreSQL 13.20 on RDS. PostgreSQL 13 reaches end of standard support in November 2025, and I wanted to get ahead of forced upgrades. More importantly, PostgreSQL 17 brings significant performance improvements and features I wanted to leverage.
The challenge: my database serves a production application with active users. Traditional major version upgrades on RDS can take 10-30 minutes of downtime depending on database size - unacceptable for my use case.
I needed to answer one question: How do I upgrade PostgreSQL by 4 major versions with minimal disruption?
Understanding the Upgrade Options
Before diving into implementation, I evaluated three approaches:
Option 1: In-Place Upgrade (Native RDS)
The simplest approach - modify the engine_version in Terraform and apply.
resource "aws_db_instance" "postgres_rds" {
  engine_version = "17.7" # Changed from 13.20
}
Problem: RDS performs a full pg_upgrade which:
- Takes the database offline during the entire upgrade
- Duration depends on database size (10-30+ minutes typically)
- No rollback if something goes wrong post-upgrade
I ruled this out immediately.
Option 2: Manual Blue-Green with Read Replica
Create a read replica, upgrade it, then promote and switch DNS.
Problem:
- PostgreSQL read replicas cannot be upgraded independently on RDS
- The replica must match the primary's major version
- This approach works for MySQL on RDS, not PostgreSQL
Option 3: AWS Blue-Green Deployments (My Choice)
AWS introduced Blue-Green Deployments for RDS in late 2022. This creates a synchronized staging environment (Green) that mirrors your production database (Blue), performs the upgrade on Green, then switches over with minimal downtime.
Why I chose this:
- AWS handles replication automatically using logical replication
- Switchover is fast (typically under 1 minute)
- Built-in rollback capability before switchover
- Supported for PostgreSQL major version upgrades
The Mistake: RDS Proxy Doesn't Mix with Blue-Green
My initial plan included RDS Proxy to minimize connection disruption during switchover. You might be wondering - why would I need a proxy for a database upgrade? I'll explain the reasoning later, but for now, just know that I expected it to help during the critical switchover moment.
I added RDS Proxy to my Terraform configuration:
resource "aws_db_proxy" "rds_proxy" {
  name                   = "staging-proxy"
  engine_family          = "POSTGRESQL"
  require_tls            = true
  vpc_security_group_ids = [aws_security_group.proxy_sg.id]
  vpc_subnet_ids         = data.aws_subnets.private.ids

  auth {
    auth_scheme = "SECRETS"
    secret_arn  = aws_secretsmanager_secret.proxy_credentials.arn
    iam_auth    = "DISABLED"
  }
}
resource "aws_db_proxy_default_target_group" "proxy_target" {
  db_proxy_name = aws_db_proxy.rds_proxy.name
}
resource "aws_db_proxy_target" "proxy_target" {
  db_proxy_name          = aws_db_proxy.rds_proxy.name
  target_group_name      = aws_db_proxy_default_target_group.proxy_target.name
  db_instance_identifier = aws_db_instance.postgres_rds.identifier
}
When I attempted to create the Blue-Green deployment, AWS returned this error:
Databases using RDS Proxy are not currently supported for Blue Green Deployments
This is a hard limitation. You cannot create a Blue-Green deployment for any RDS instance that has an RDS Proxy attached to it.
Lesson learned: Always verify service compatibility before architecting a solution. I assumed these two features would work together because both aim to reduce downtime. They don't.
Prerequisites for Blue-Green Deployments
Before you can create a Blue-Green deployment for PostgreSQL, your source database needs specific configuration:
1. Automated Backups Enabled
Blue-Green uses logical replication, which requires point-in-time recovery capability - in practice, automated backups with a retention period greater than zero.
resource "aws_db_instance" "postgres_rds" {
  backup_retention_period = 7 # Must be > 0
  backup_window           = "03:00-04:00"
}
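If you're not sure whether backups are already enabled, a quick check from the CLI (using the same instance identifier as the rest of this post) looks something like this:
# Confirm automated backups are on - BackupRetentionPeriod must be greater than 0
aws rds describe-db-instances \
  --db-instance-identifier dev-postgres-rds \
  --query 'DBInstances[0].BackupRetentionPeriod' \
  --region ap-southeast-2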
2. Logical Replication Enabled
The source database needs a parameter group with logical replication enabled:
resource "aws_db_parameter_group" "postgres13_blue_green" {
  family = "postgres13"
  name   = "postgres13-blue-green"

  parameter {
    name         = "rds.logical_replication"
    value        = "1"
    apply_method = "pending-reboot"
  }

  parameter {
    name         = "max_replication_slots"
    value        = "10"
    apply_method = "pending-reboot"
  }

  parameter {
    name         = "max_wal_senders"
    value        = "10"
    apply_method = "pending-reboot"
  }
}
Important: Enabling rds.logical_replication requires a database reboot. Plan for this before your upgrade window.
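For reference, here's a rough sketch of how you might attach that parameter group and trigger the reboot from the CLI, then confirm the setting took effect. The endpoint, user, and database names are placeholders:
# Attach the logical replication parameter group to the source (Blue) instance
aws rds modify-db-instance \
  --db-instance-identifier dev-postgres-rds \
  --db-parameter-group-name postgres13-blue-green \
  --apply-immediately \
  --region ap-southeast-2

# Reboot so the pending-reboot parameters take effect
aws rds reboot-db-instance \
  --db-instance-identifier dev-postgres-rds \
  --region ap-southeast-2

# After the reboot, wal_level should report "logical"
psql -h <blue-endpoint> -U <user> -d <database> -c "SHOW wal_level;"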
3. Valid Upgrade Path
Not all version combinations are valid. Check available upgrade paths:
aws rds describe-db-engine-versions \
--engine postgres \
--engine-version 13.20 \
--query 'DBEngineVersions[0].ValidUpgradeTarget[?MajorEngineVersion==`17`]' \
--region ap-southeast-2
I initially targeted 17.2 and got this error:
Cannot find upgrade path from 13.20 to 17.2
The valid target for me was 17.7.
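If you'd rather see every target AWS will accept instead of guessing at a minor version, you can filter on the major-upgrade flag rather than a hard-coded version - a variation of the same command:
# List every valid major-version upgrade target for 13.20
aws rds describe-db-engine-versions \
  --engine postgres \
  --engine-version 13.20 \
  --query 'DBEngineVersions[0].ValidUpgradeTarget[?IsMajorVersionUpgrade==`true`].EngineVersion' \
  --region ap-southeast-2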
The Upgrade Process
Here's the step-by-step process I followed:
Step 1: Detach RDS Proxy (if attached)
Since I had already deployed RDS Proxy, I had to detach it first:
# Remove the proxy target
aws rds deregister-db-proxy-targets \
--db-proxy-name staging-proxy \
--db-instance-identifiers dev-postgres-rds \
--region ap-southeast-2
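It's worth confirming the target is actually gone before moving on. A quick check against the same proxy:
# The Targets list should come back empty once deregistration completes
aws rds describe-db-proxy-targets \
  --db-proxy-name staging-proxy \
  --region ap-southeast-2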
Step 2: Create the Blue-Green Deployment
aws rds create-blue-green-deployment \
--blue-green-deployment-name "pg13-to-pg17-upgrade" \
--source "arn:aws:rds:ap-southeast-2:123456789:db:dev-postgres-rds" \
--target-engine-version "17.7" \
--target-db-parameter-group-name "default.postgres17" \
--region ap-southeast-2
This kicks off the following process:
- AWS creates a new RDS instance (Green) with PostgreSQL 17.7
- Takes a snapshot of your Blue database
- Restores it to the Green instance
- Sets up logical replication from Blue to Green
- Syncs all changes
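If you're curious about the plumbing, the logical replication slot AWS creates should be visible on the Blue instance through the standard PostgreSQL catalog. A quick look (connection details are placeholders):
# Run against the Blue (source) instance - the blue/green replication slot should appear here
psql -h <blue-endpoint> -U <user> -d <database> \
  -c "SELECT slot_name, plugin, slot_type, active FROM pg_replication_slots;"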
Step 3: Wait for Provisioning
The Green environment takes time to provision. Monitor the status:
aws rds describe-blue-green-deployments \
--blue-green-deployment-identifier "pg13-to-pg17-upgrade" \
--query 'BlueGreenDeployments[0].Status' \
--region ap-southeast-2
Status progression:
- PROVISIONING - Creating the Green environment
- AVAILABLE - Ready for switchover
For my 5GB database, this took approximately 35 minutes.
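Rather than re-running that command by hand, a small polling loop around the same describe call saves some tedium (60 seconds is an arbitrary interval):
# Poll until the deployment reports AVAILABLE
while true; do
  STATUS=$(aws rds describe-blue-green-deployments \
    --blue-green-deployment-identifier "pg13-to-pg17-upgrade" \
    --query 'BlueGreenDeployments[0].Status' \
    --output text \
    --region ap-southeast-2)
  echo "$(date -u +%H:%M:%S) status: $STATUS"
  [ "$STATUS" = "AVAILABLE" ] && break
  sleep 60
done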
Step 4: Verify Green Environment
Before switching, verify the Green instance:
# Check the Green instance details
aws rds describe-blue-green-deployments \
--blue-green-deployment-identifier "pg13-to-pg17-upgrade" \
--query 'BlueGreenDeployments[0].SwitchoverDetails' \
--region ap-southeast-2
Confirm:
- Engine version shows 17.7
- Replication lag is minimal
- Status is "AVAILABLE"
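Beyond the deployment status, I'd also connect to the Green instance directly and sanity-check the engine. The Green instance gets its own AWS-generated identifier and endpoint, so treat the lookup below as a sketch rather than exact commands:
# Find the Green instance ARN from the deployment's switchover details
aws rds describe-blue-green-deployments \
  --blue-green-deployment-identifier "pg13-to-pg17-upgrade" \
  --query 'BlueGreenDeployments[0].SwitchoverDetails[0].TargetMember' \
  --output text \
  --region ap-southeast-2

# Confirm the engine version and grab the endpoint of that instance
aws rds describe-db-instances \
  --db-instance-identifier <green-instance-identifier> \
  --query 'DBInstances[0].[EngineVersion,Endpoint.Address]' \
  --region ap-southeast-2

# Connect and check what PostgreSQL itself reports
psql -h <green-endpoint> -U <user> -d <database> -c "SELECT version();"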
Step 5: Execute Switchover
This is the critical moment. The switchover:
- Stops writes to the Blue database
- Waits for replication to catch up
- Promotes Green to primary
- Renames instances (Blue gets an -old1 suffix, Green takes the original name)
- Updates the endpoint
aws rds switchover-blue-green-deployment \
--blue-green-deployment-identifier "pg13-to-pg17-upgrade" \
--switchover-timeout 300 \
--region ap-southeast-2
My measured downtime: 20 seconds
The switchover started at 16:26:42 and completed at 16:27:02 (UTC).
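If you'd rather confirm completion programmatically than eyeball timestamps, the deployment status is the thing to watch - after a successful switchover it moves to SWITCHOVER_COMPLETED:
# Status should read SWITCHOVER_COMPLETED once the switchover finishes
aws rds describe-blue-green-deployments \
  --blue-green-deployment-identifier "pg13-to-pg17-upgrade" \
  --query 'BlueGreenDeployments[0].Status' \
  --output text \
  --region ap-southeast-2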
What Happens During Those 20 Seconds?
This is the part I promised to explain earlier - why I wanted RDS Proxy in the first place.
During the switchover window, your database is essentially unreachable. Any write operation attempted during this time will fail - not queue, not wait, just fail. Your application will receive errors like connection refused or connection reset.
This is "minimal downtime," not "zero downtime." The difference matters.
Why I Wanted RDS Proxy
RDS Proxy sits between your application and database, maintaining connection pools. The theory was:
- Proxy holds active connections during the switchover
- Buffers requests briefly while the endpoint changes
- Redirects connections to the new (Green) instance seamlessly
- Writes would wait instead of fail outright
This would have turned that 20-second failure window into a 20-second "slow response" window - much more graceful.
The Unfortunate Reality
AWS doesn't allow this. Blue-Green deployments and RDS Proxy are mutually exclusive. You must choose:
Blue-Green (no proxy)
- Downtime: ~20 seconds
- Write behavior: writes fail during switchover, retry logic required
RDS Proxy (no Blue-Green)
- Downtime: 10–30 minutes (in-place upgrade)
- Write behavior: connections handled gracefully during upgrade
I chose Blue-Green because 20 seconds of failed writes is better than 30 minutes of total unavailability. But your application needs to handle those transient failures - implement retry logic or return appropriate errors to users.
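To make that concrete, here's a minimal retry sketch. It's deliberately generic - DATABASE_URL and the health_check table are placeholders, not something from my setup - but it shows the shape of it: retry writes with backoff during the switchover window instead of surfacing the first failure to users.
# Hypothetical example: retry a write up to 5 times with exponential backoff
for attempt in 1 2 3 4 5; do
  if psql "$DATABASE_URL" -c "INSERT INTO health_check (checked_at) VALUES (now());"; then
    echo "write succeeded on attempt $attempt"
    break
  fi
  echo "write failed (attempt $attempt), retrying..."
  sleep $((2 ** attempt))
done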
Step 6: Cleanup
After verifying the upgrade:
# Delete the old instance
aws rds delete-db-instance \
--db-instance-identifier dev-postgres-rds-old1 \
--skip-final-snapshot \
--region ap-southeast-2
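If you'd rather keep a safety net than use --skip-final-snapshot, the same command can take a final snapshot instead (the snapshot name here is just an example):
# Alternative: keep a final snapshot of the old Blue instance before deleting it
aws rds delete-db-instance \
  --db-instance-identifier dev-postgres-rds-old1 \
  --final-db-snapshot-identifier dev-postgres-rds-pg13-final \
  --region ap-southeast-2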
# Delete the Blue-Green deployment
aws rds delete-blue-green-deployment \
--blue-green-deployment-identifier "pg13-to-pg17-upgrade" \
--region ap-southeast-2
What About Terraform State?
If you manage your RDS instance with Terraform, the Blue-Green switchover creates a state mismatch. The instance identifier remains the same, but AWS has essentially replaced the instance.
After switchover, run:
terraform plan
You'll likely see Terraform wanting to modify parameters to match your configuration. Review the plan carefully - most changes should be no-ops or minor parameter adjustments.
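One way to reconcile state without risking an accidental change is a refresh-only pass first, then inspecting what Terraform now thinks the instance looks like (assumes Terraform 1.x):
# Pull the post-switchover reality into state without changing any infrastructure
terraform apply -refresh-only

# Inspect the refreshed attributes, e.g. confirm engine_version is now 17.7
terraform state show aws_db_instance.postgres_rds | grep engine_version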
I also cleaned up my Terraform config post-upgrade:
- Removed the postgres13 parameter group (no longer needed)
- Updated engine_version to "17.7"
- Removed Blue-Green specific configuration
resource "aws_db_instance" "postgres_rds" {
  identifier        = "dev-postgres-rds"
  engine            = "postgres"
  engine_version    = "17.7"
  instance_class    = "db.t3.micro"
  allocated_storage = 5

  # Backups - good practice to keep enabled
  backup_retention_period = 7
  backup_window           = "03:00-04:00"

  # ... rest of config
}
Trade-offs and Limitations
What Blue-Green Deployments Handle Well
- Major version upgrades with minimal downtime
- Safe rollback option (before switchover)
- Automatic data synchronization
What They Don't Handle
- Databases with RDS Proxy attached
- Multi-AZ DB clusters (as of early 2024)
- Instances that contain more than 100 databases
- Cross-region scenarios
Hidden Costs
- You pay for two RDS instances during the Blue-Green provisioning period
- My upgrade window cost approximately 35 minutes of double billing
- For larger databases, this could be significant
Lessons Learned
- Verify service compatibility before architecting - RDS Proxy and Blue-Green Deployments don't work together. I wasted time setting up proxy infrastructure I had to tear down.
- Check valid upgrade paths early - Not every version combination is valid. aws rds describe-db-engine-versions is your friend.
- Logical replication requires a reboot - Enabling rds.logical_replication needs a database restart. Factor this into your planning.
- Minimal downtime is achievable - For a 5GB database, the actual switchover was remarkably fast. Your application should handle brief connection interruptions gracefully.
- Blue-Green works via CLI better than Terraform - The Terraform blue_green_update block exists, but using the AWS CLI gives you more control over timing and verification steps.
- Clean up immediately - The old instance keeps running and billing you. Delete it promptly after verification.
Key Takeaways
- AWS Blue-Green Deployments are the best option for PostgreSQL major version upgrades with minimal downtime
- RDS Proxy is incompatible with Blue-Green Deployments - choose one approach
- Expect ~30-45 minutes of provisioning time, but only seconds of actual downtime
- Always verify upgrade paths and prerequisites before starting
- Have a rollback plan, even though I didn't need mine
References I Followed
- AWS RDS Blue-Green Deployments Overview
- Switching Over a Blue-Green Deployment
- Blue-Green Deployment Limitations
- RDS Proxy Documentation
- Upgrading PostgreSQL DB Engine Versions
This post documents a real production upgrade performed in January 2026. Your mileage may vary based on database size, AWS region, and specific configuration.
If you're planning a similar upgrade, feel free to ask questions in the comments.