AWS Cloud Migration Checklist: Moving a Legacy App to the Cloud

#aws #architecture #devops #cloud

Cloud migration projects fail for predictable reasons. The technology is not the hard part: AWS is well-documented and the tooling is mature. Projects fail because teams skip the audit phase, underestimate data migration complexity, and try to lift-and-shift architectures that were never designed for the cloud.

I've led cloud migrations for US businesses across FinTech, healthcare, and SaaS, from single-server monoliths to multi-region systems. This checklist captures what actually needs to happen before, during, and after migration.

Phase 1: Pre-Migration Audit (Do Not Skip This)

Before touching AWS, you need a complete picture of what you're moving.

Application inventory:

[ ] List every service, process, and scheduled job running on your current infrastructure
[ ] Map all external dependencies: third-party APIs, payment processors, email providers
[ ] Document every port, protocol, and network path between services
[ ] Identify stateful vs stateless components: They migrate differently

Data inventory:

[ ] Catalog every database: type, size, read/write patterns, peak load times
[ ] Identify data with compliance constraints (PII, PHI, PCI): This affects your AWS region choices and service selections
[ ] Define your RTO (Recovery Time Objective) and RPO (Recovery Point Objective) requirements: These drive your backup and replication strategy
[ ] Document any data that cannot be in certain geographic regions (US-only requirements are common for government and healthcare clients)
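
One way to make the residency items above enforceable is to record them in the inventory itself and check them against the target region. This is a minimal sketch, assuming a hypothetical inventory format: the `DataStore` fields, the `residency_violations` helper, and the sample entries are illustrative, not taken from a real system.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """One entry in the data inventory (hypothetical format)."""
    name: str
    engine: str
    size_gb: float
    compliance_tags: set = field(default_factory=set)  # e.g. {"PII", "PHI", "PCI"}
    allowed_regions: set = field(default_factory=set)  # empty set = no residency constraint


def residency_violations(inventory, target_region):
    """Return the names of data stores that may not live in target_region."""
    return [
        store.name
        for store in inventory
        if store.allowed_regions and target_region not in store.allowed_regions
    ]


# Illustrative inventory: one US-only PHI database, one unconstrained cache
inventory = [
    DataStore("patients", "postgres", 120.0, {"PHI"}, {"us-east-1", "us-west-2"}),
    DataStore("sessions", "redis", 2.0),
]
```

Calling `residency_violations(inventory, "eu-west-1")` flags the `patients` store, while a US region returns an empty list. Running a check like this before choosing regions is cheaper than discovering the constraint during cutover.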

Traffic baseline:

[ ] Capture 30 days of traffic patterns: requests/second, peak times, geographic distribution
[ ] Profile database query patterns: identify slow queries, N+1 problems, missing indexes
[ ] Measure current response times as your benchmark: You need to beat these after migration
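
The baseline is easier to compare later if it is reduced to a few numbers. The sketch below assumes you can export request logs as `(unix_timestamp, latency_ms)` pairs; that input format and the function names are assumptions for illustration, and it uses a simple nearest-rank percentile rather than any particular monitoring tool's method.

```python
import math
from collections import Counter


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def traffic_baseline(requests):
    """Summarise an iterable of (unix_timestamp_seconds, latency_ms) pairs."""
    requests = list(requests)
    latencies = [latency for _, latency in requests]
    # Bucket requests by whole second to find the busiest second
    per_second = Counter(int(ts) for ts, _ in requests)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "peak_rps": max(per_second.values()),
    }
```

Store the resulting dictionary alongside the date range it covers, so the post-migration comparison uses the same definitions.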

Phase 2: Architecture Decisions

Choose your migration strategy:

Strategy	When to use	Risk
Lift and Shift (Rehost)	Time-constrained, prove cloud works first	Misses cloud-native optimisations
Replatform	Swap DB for RDS, storage for S3 with minimal code changes	Moderate
Refactor / Re-architect	When the current architecture is a long-term bottleneck	High effort, high reward

For most US SaaS companies I work with, replatform is the right starting point. You get the reliability and scaling benefits of managed services without a full rewrite.

Key AWS service decisions:

Compute:
  Stateless web servers → ECS (Fargate) or EC2 Auto Scaling Groups
  Scheduled jobs       → ECS Scheduled Tasks or Lambda (if under 15 min)
  Long-running workers → ECS service polling SQS (scale on queue depth)

Database:
  PostgreSQL / MySQL   → RDS with Multi-AZ enabled
  Redis cache          → ElastiCache (Redis)
  File storage         → S3 with CloudFront CDN
  Search               → OpenSearch Service

Networking:
  Load balancing       → Application Load Balancer (ALB)
  DNS                  → Route 53
  CDN                  → CloudFront
  Secrets              → AWS Secrets Manager (never hardcode credentials)
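
The 15-minute rule for scheduled jobs can be written down as a small decision helper. This is a sketch under assumptions: the function name and the `safety_factor` headroom are mine, not an AWS rule; only the 15-minute cap on a single Lambda invocation comes from AWS.

```python
LAMBDA_MAX_TIMEOUT_SECONDS = 15 * 60  # Lambda's cap on a single invocation


def scheduled_job_target(expected_runtime_seconds, safety_factor=2.0):
    """Pick a compute target for a scheduled job from its expected runtime.

    safety_factor is an assumed headroom multiplier so that a job which
    usually fits does not time out on a slow day.
    """
    if expected_runtime_seconds * safety_factor <= LAMBDA_MAX_TIMEOUT_SECONDS:
        return "lambda"
    return "ecs-scheduled-task"
```

With the default headroom, a 60-second job maps to Lambda and a 10-minute job maps to an ECS Scheduled Task, because doubling its runtime would exceed the cap.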

Phase 3: Infrastructure as Code (Non-Negotiable)

If you're clicking through the AWS console to set up production infrastructure, you're building technical debt with every click. Use Terraform or AWS CDK from day one.

# terraform/main.tf: example: ECS cluster + RDS
resource "aws_ecs_cluster" "app" {
  name = "${var.app_name}-${var.environment}"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

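resource "aws_db_instance" "postgres" {
  identifier        = "${var.app_name}-${var.environment}"
  engine            = "postgres"
  engine_version    = "15.3"
  instance_class    = var.db_instance_class
  allocated_storage = var.db_storage_gb

  # Master credentials: var.db_username is an assumed variable; the password
  # is generated by RDS and stored in Secrets Manager
  username                    = var.db_username
  manage_master_user_password = true

  multi_az                = var.environment == "production"
  deletion_protection     = var.environment == "production"
  backup_retention_period = 7

  # Keep a final snapshot when a production instance is destroyed
  skip_final_snapshot       = var.environment != "production"
  final_snapshot_identifier = "${var.app_name}-${var.environment}-final"

  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]
}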

Infrastructure as code means your entire environment is reproducible, reviewable in PRs, and recoverable after a disaster.

Phase 4: The Actual Migration

Database migration approach:

For minimal downtime, use a replication-based phased cutover:

1. Set up the RDS instance and run a full dump/restore to establish the baseline
2. Enable continuous replication from the old DB to RDS (AWS DMS handles this)
3. Run both databases in parallel and validate data consistency
4. Switch application read traffic to RDS, keep writes going to the old DB (reads can lag slightly behind replication)
5. Switch write traffic to RDS
6. Monitor for 48 hours
7. Decommission the old database
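
For the "validate data consistency" step, a simple approach is to compare a row count and a checksum per table on both sides. The sketch below is an assumption-heavy illustration: it works with any Python DB-API connection, uses in-memory SQLite as a stand-in for the old database and RDS, and scans whole tables, which is only practical for small or sampled tables. The function names and the `users` table are hypothetical.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256 hex digest) over rows ordered by key_column.

    table and key_column are interpolated into SQL, so they must come from a
    trusted allowlist, never from user input.
    """
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM {table} ORDER BY {key_column}")
    digest = hashlib.sha256()
    count = 0
    for row in cursor:
        digest.update(repr(tuple(row)).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def mismatched_tables(source_conn, target_conn, tables):
    """tables maps table name -> key column. Returns the names that differ."""
    return [
        table
        for table, key in tables.items()
        if table_fingerprint(source_conn, table, key)
        != table_fingerprint(target_conn, table, key)
    ]


def _demo_db(rows):
    """Build an in-memory SQLite database standing in for a real one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return conn


old_db = _demo_db([(1, "a@example.com"), (2, "b@example.com")])
new_db = _demo_db([(1, "a@example.com")])  # replication has not caught up yet
print(mismatched_tables(old_db, new_db, {"users": "id"}))  # ['users']
```

Run the comparison while writes are paused or against a consistent snapshot; otherwise in-flight replication shows up as false mismatches.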

Zero-downtime deployment setup:

# ECS task definition snippet
{
  "family": "app-production",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "your-account.dkr.ecr.us-east-1.amazonaws.com/app:latest",
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ],
  "requiresCompatibilities": ["FARGATE"]
}

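The health check is critical. ECS treats a new task as healthy only once this check passes, and during a rolling deployment it keeps the old tasks running until their replacements are healthy. A bad deployment therefore stalls instead of taking down production, and with the ECS deployment circuit breaker enabled (with rollback), it is rolled back automatically.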
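
The task definition assumes the container answers `GET /health` on port 3000. A minimal sketch of such an endpoint, using only the Python standard library, might look like this. The `dependencies_ok` placeholder and the JSON body are assumptions; a real application would plug in its own framework and checks.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def dependencies_ok():
    """Placeholder: a real check might ping the database or cache."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        healthy = dependencies_ok()
        body = json.dumps({"status": "ok" if healthy else "unhealthy"}).encode("utf-8")
        # curl -f exits non-zero on HTTP status >= 400, so 503 fails the check
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep health-check polling out of the access log


def make_server(port=3000):
    """Build (but do not start) the server; call serve_forever() to run it."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

To run it, call `make_server().serve_forever()`. Keep the endpoint cheap: it is polled every 30 seconds per task, and a slow dependency check can exceed the 5-second timeout and mark healthy tasks as failed.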

Phase 5: Post-Migration Validation

[ ] Response times at or below pre-migration baseline
[ ] Error rates within normal range for 72 hours
[ ] Database query performance profiled on RDS: Slow query log enabled
[ ] CloudWatch alarms configured for: CPU, memory, database connections, error rates, 5xx responses
[ ] Cost Explorer reviewed: Confirm you're not running over-provisioned instances
[ ] Security: all services in private subnets, no public RDS endpoints, WAF in front of ALB
[ ] Backup restore tested: Not just that backups run, but that you can actually restore from them
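
The first two items can be automated against the numbers captured in Phase 1. This is a sketch with assumed inputs: the dictionary keys and the `error_rate_tolerance` definition of "normal range" are illustrative choices, not a standard.

```python
def validation_failures(baseline, current, error_rate_tolerance=1.2):
    """Return human-readable failures; an empty list means the checks pass.

    baseline and current are dicts with "p95_ms" and "error_rate" keys (a
    hypothetical format). error_rate_tolerance is an assumed definition of
    "within normal range": up to 20% above the pre-migration error rate.
    """
    failures = []
    if current["p95_ms"] > baseline["p95_ms"]:
        failures.append(
            f"p95 latency regressed: {current['p95_ms']} ms "
            f"vs {baseline['p95_ms']} ms baseline"
        )
    allowed_error_rate = baseline["error_rate"] * error_rate_tolerance
    if current["error_rate"] > allowed_error_rate:
        failures.append(
            f"error rate {current['error_rate']:.4f} "
            f"exceeds allowed {allowed_error_rate:.4f}"
        )
    return failures
```

Running this on a schedule during the 72-hour window turns the checklist into something that fails loudly instead of relying on someone remembering to look.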

The Cost Trap

AWS costs can surprise teams coming from fixed-price hosting. Two common traps:

Data transfer costs: Moving data out of AWS is expensive. If your app serves large files to US users, price your CloudFront distribution vs direct S3 transfer costs before you launch.
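
Pricing the two options is simple arithmetic once you know your monthly egress. The rates and free allowance below are placeholders I am assuming for illustration, not quoted AWS prices; look up the current figures for your region before relying on the result.

```python
def monthly_transfer_cost(gb_per_month, price_per_gb, free_gb=0.0):
    """Estimated monthly egress cost in dollars, after any free allowance."""
    billable_gb = max(0.0, gb_per_month - free_gb)
    return round(billable_gb * price_per_gb, 2)


# Placeholder inputs (assumptions): replace with current regional pricing
EGRESS_GB_PER_MONTH = 5000
S3_DIRECT_PRICE_PER_GB = 0.09
CDN_PRICE_PER_GB = 0.085
CDN_FREE_GB = 1000

s3_direct = monthly_transfer_cost(EGRESS_GB_PER_MONTH, S3_DIRECT_PRICE_PER_GB)
via_cdn = monthly_transfer_cost(EGRESS_GB_PER_MONTH, CDN_PRICE_PER_GB, CDN_FREE_GB)
print(f"S3 direct: ${s3_direct}/month, via CDN: ${via_cdn}/month")
```

With these placeholder numbers the estimate is $450.00 for direct S3 transfer and $340.00 through the CDN. The point is less the exact figures than running the calculation with your real traffic baseline before launch.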

RDS instance sizing: Teams often start with a db.r6g.4xlarge "to be safe" and pay $800+/month for a database handling 10 req/sec. Start smaller, enable Performance Insights, and scale based on actual metrics, not fear.

Cloud migration done right leaves you with infrastructure that's more reliable, more scalable, and often cheaper than what you started with. Done wrong, it's an expensive, slower version of what you had before.

If you're planning a cloud migration for a US business application, this is core work I do, from architecture planning through live cutover. More at waqarhabib.com/services/cloud-migration.

Originally published at waqarhabib.com