Thesius Code

Posted on Mar 23 • Originally published at datanest-stores.pages.dev

Cloud Migration Playbook

#cloud #aws #terraform #architecture

A structured, phase-based framework for migrating workloads to the cloud with confidence. This playbook covers the complete migration lifecycle — from initial portfolio assessment and dependency mapping through wave planning, execution, validation, and rollback procedures. Each phase includes decision templates, checklists, Terraform scaffolding for landing zones, and scripts for cutover automation. Built for teams who need a repeatable process, not just a high-level slide deck.

Key Features

6R Classification Tool — Decision matrix for categorizing each workload as Rehost, Replatform, Refactor, Repurchase, Retire, or Retain
Dependency Mapper — Scripts and templates for discovering application dependencies, network flows, and data relationships
Wave Planning Templates — Spreadsheet-based wave organizer grouping workloads by dependency, risk, and business criticality
Landing Zone Scaffolding — Terraform modules for provisioning target accounts, VPCs, and security baselines pre-migration
Cutover Runbooks — Step-by-step procedures with parallel tracks for database migration, DNS cutover, and application switchover
Rollback Procedures — Pre-tested rollback scripts with decision criteria for when to abort a migration wave
Validation Framework — Automated smoke tests, performance benchmarks, and data integrity checks post-migration
Stakeholder Templates — Communication plans, status reports, and go/no-go decision documents

Quick Start

# Step 1: Run the portfolio assessment
python3 src/assessment/portfolio_scanner.py \
  --inventory inventory.csv \
  --output-dir reports/assessment/

# Step 2: Generate wave plan
python3 src/planning/wave_planner.py \
  --assessment reports/assessment/classified.json \
  --max-wave-size 10 \
  --output reports/wave-plan.json

# Step 3: Deploy landing zone for wave 1
cd src/terraform/landing-zone
terraform init
terraform apply -var="wave=1"
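
The internals of wave_planner.py are not shown here, so the following is only a minimal sketch of how dependency-aware wave grouping could work. It assumes each workload should move after the workloads it depends on, which is one possible policy; tightly coupled systems are often moved together instead. The function name plan_waves and the sample workloads are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_waves(dependencies: dict[str, set[str]], max_wave_size: int) -> list[list[str]]:
    """Group workloads into waves so each one follows its dependencies.

    `dependencies` maps a workload to the set of workloads it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: list[list[str]] = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies already planned
        ready = sorted(sorter.get_ready())
        # Respect the wave size cap by splitting a large batch into chunks
        for start in range(0, len(ready), max_wave_size):
            waves.append(ready[start:start + max_wave_size])
        sorter.done(*ready)
    return waves


# Hypothetical inventory: web needs api, api needs db and cache, reports needs db
deps = {
    "web": {"api"},
    "api": {"db", "cache"},
    "db": set(),
    "cache": set(),
    "reports": {"db"},
}
for number, wave in enumerate(plan_waves(deps, max_wave_size=2), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

With this input the databases and cache land in the first wave, the services that read from them in the second, and the web tier last. A real planner would also weigh risk and business criticality, which this sketch ignores.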

Architecture

┌──────────────────────────────────────────────────────────┐
│                  Migration Lifecycle                     │
│                                                          │
│  Phase 1        Phase 2       Phase 3      Phase 4       │
│  ASSESS         PLAN          EXECUTE      VALIDATE      │
│  ┌─────────┐    ┌─────────┐   ┌─────────┐  ┌─────────┐   │
│  │Discover │    │Wave     │   │Provision│  │Smoke    │   │
│  │Classify │───►│Planning │──►│Migrate  │─►│Test     │   │
│  │Baseline │    │Runbooks │   │Cutover  │  │Perf.    │   │
│  │Deps Map │    │Rollback │   │DNS Swap │  │Data OK  │   │
│  └─────────┘    └─────────┘   └────┬────┘  └─────────┘   │
│                                    │                     │
│                              ┌─────▼─────┐               │
│                              │ Rollback? │               │
│                              │  YES: Run │               │
│                              │  rollback │               │
│                              │  script   │               │
│                              └───────────┘               │
└──────────────────────────────────────────────────────────┘

Usage Examples

6R Classification Logic

# src/assessment/classifier.py
from dataclasses import dataclass
from enum import Enum

class MigrationStrategy(Enum):
    REHOST = "rehost"           # Lift-and-shift to cloud VMs
    REPLATFORM = "replatform"   # Minor optimization (e.g., managed DB)
    REFACTOR = "refactor"       # Re-architect for cloud-native
    REPURCHASE = "repurchase"   # Replace with SaaS
    RETIRE = "retire"           # Decommission
    RETAIN = "retain"           # Keep on-premises

@dataclass
class WorkloadAssessment:
    name: str
    strategy: MigrationStrategy
    complexity: str             # low, medium, high
    business_criticality: str   # low, medium, high, critical
    estimated_effort_days: int
    dependencies: list[str]

def classify_workload(
    age_years: int,
    has_saas_alternative: bool,
    monthly_users: int,
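    technical_debt_score: float,  # 0.0 to 1.0
) -> MigrationStrategy:
    """Recommend a migration strategy from workload characteristics.

    Rules are checked in order and the first match wins. No rule
    returns RETAIN, so that strategy is never recommended here.
    """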
    if monthly_users == 0:
        return MigrationStrategy.RETIRE
    if has_saas_alternative and technical_debt_score > 0.7:
        return MigrationStrategy.REPURCHASE
    if technical_debt_score > 0.8:
        return MigrationStrategy.REFACTOR
    if age_years > 10:
        return MigrationStrategy.REPLATFORM
    return MigrationStrategy.REHOST
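
Because the rules run top to bottom, the order decides the outcome when several conditions hold at once. The sketch below repeats the classifier in compact form so it runs on its own, then applies it to a few made-up workloads. The workload names and numbers are hypothetical.

```python
from enum import Enum


class MigrationStrategy(Enum):
    REHOST = "rehost"
    REPLATFORM = "replatform"
    REFACTOR = "refactor"
    REPURCHASE = "repurchase"
    RETIRE = "retire"
    RETAIN = "retain"


def classify_workload(age_years, has_saas_alternative, monthly_users, technical_debt_score):
    """Same rule order as the classifier above, copied so this example is standalone."""
    if monthly_users == 0:
        return MigrationStrategy.RETIRE
    if has_saas_alternative and technical_debt_score > 0.7:
        return MigrationStrategy.REPURCHASE
    if technical_debt_score > 0.8:
        return MigrationStrategy.REFACTOR
    if age_years > 10:
        return MigrationStrategy.REPLATFORM
    return MigrationStrategy.REHOST


# Hypothetical workloads: (age_years, has_saas_alternative, monthly_users, technical_debt_score)
samples = {
    "legacy-crm": (12, True, 400, 0.9),      # high debt, SaaS option exists
    "billing-core": (8, False, 2500, 0.9),   # high debt, no SaaS option
    "intranet-wiki": (15, False, 0, 0.3),    # nobody uses it
    "hr-portal": (11, False, 120, 0.4),      # old but healthy
    "status-page": (2, False, 900, 0.1),     # young and healthy
}
results = {name: classify_workload(*args).value for name, args in samples.items()}
for name, strategy in results.items():
    print(f"{name}: {strategy}")
```

Note that legacy-crm and billing-core carry the same debt score but get different recommendations: the SaaS check runs before the refactor check, so a workload with a SaaS alternative is steered toward replacement first.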

Landing Zone Terraform Module

# src/terraform/landing-zone/main.tf
module "vpc" {
  source  = "./modules/vpc"
  cidr    = var.vpc_cidr
  azs     = var.availability_zones
  project = var.project_name
  env     = "migration-wave-${var.wave}"
}

module "security_baseline" {
  source              = "./modules/security"
  vpc_id              = module.vpc.vpc_id
  enable_flow_logs    = true
  enable_guardduty    = true
  log_retention_days  = 90
}

module "migration_staging" {
  source            = "./modules/staging"
  vpc_id            = module.vpc.vpc_id
  subnet_ids        = module.vpc.private_subnet_ids
  # Temporary resources for migration — destroyed after validation
  instance_type     = "m5.xlarge"
  replication_agent = true
}

Cutover Checklist as Code

# configs/cutover-checklist.yaml
pre_cutover:
  - name: Freeze code deployments
    owner: dev-team
    verification: "No open PRs targeting main branch"
  - name: Final data sync
    owner: dba-team
    verification: "Replication lag < 1 second"
  - name: Notify stakeholders
    owner: project-lead
    verification: "Slack message sent to #migration channel"

cutover:
  - name: Stop writes to source database
    command: "python3 scripts/freeze_source_db.py"
    rollback: "python3 scripts/unfreeze_source_db.py"
  - name: Final replication catch-up
    command: "python3 scripts/wait_for_sync.py --max-wait 300"
    timeout_seconds: 600
  - name: Switch DNS to cloud endpoint
    command: "python3 scripts/dns_cutover.py --target cloud"
    rollback: "python3 scripts/dns_cutover.py --target onprem"

post_cutover:
  - name: Run smoke tests
    command: "pytest tests/smoke/ -v --timeout=120"
    required: true
  - name: Monitor error rates for 30 minutes
    command: "python3 scripts/monitor_errors.py --duration 1800"
    required: true
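
The checklist pairs some steps with a rollback command, which suggests a runner that undoes completed steps in reverse order when a later step fails. The playbook's actual runner is not shown, so this is a minimal sketch under that assumption. The names run_cutover and shell_runner are hypothetical, the steps are passed as already-parsed dictionaries rather than read from the YAML file, and timeout_seconds is not enforced.

```python
import shlex
import subprocess
from typing import Callable


def shell_runner(command: str) -> bool:
    """Run a command without a shell; True means exit code 0."""
    return subprocess.run(shlex.split(command)).returncode == 0


def run_cutover(steps: list[dict], run: Callable[[str], bool]) -> tuple[bool, list[str]]:
    """Run steps in order; on failure, roll back completed steps newest first.

    Returns (success, commands executed including any rollback commands).
    """
    executed: list[str] = []
    completed: list[dict] = []
    for step in steps:
        executed.append(step["command"])
        if run(step["command"]):
            completed.append(step)
            continue
        # A step failed: undo every completed step that declares a rollback
        for done in reversed(completed):
            rollback = done.get("rollback")
            if rollback:
                executed.append(rollback)
                run(rollback)
        return False, executed
    return True, executed


# The cutover section of the checklist, written out as Python data
cutover_steps = [
    {"name": "Stop writes to source database",
     "command": "python3 scripts/freeze_source_db.py",
     "rollback": "python3 scripts/unfreeze_source_db.py"},
    {"name": "Final replication catch-up",
     "command": "python3 scripts/wait_for_sync.py --max-wait 300"},
    {"name": "Switch DNS to cloud endpoint",
     "command": "python3 scripts/dns_cutover.py --target cloud",
     "rollback": "python3 scripts/dns_cutover.py --target onprem"},
]


def fake_runner(command: str) -> bool:
    """Dry-run stand-in: pretend the replication catch-up never finishes."""
    return "wait_for_sync" not in command


success, log = run_cutover(cutover_steps, fake_runner)
print("success:", success)
for line in log:
    print(" ", line)
```

Passing the runner as a parameter keeps the ordering logic testable without touching real systems. Swapping fake_runner for shell_runner would execute the commands for real.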

Configuration

# configs/migration-config.yaml
project_name: acme-cloud-migration
target_cloud: aws                  # aws, azure, or gcp
target_region: us-east-1

waves:
  max_workloads_per_wave: 10       # Keep waves small and manageable
  min_days_between_waves: 7        # Buffer for issue resolution
  blackout_dates:                  # No migrations during these periods
    - "2026-03-25/2026-03-31"      # End of quarter
    - "2026-12-20/2027-01-05"      # Holiday freeze

rollback:
  auto_rollback_on_smoke_failure: true
  max_rollback_window_hours: 4     # After this, rollback becomes risky
  keep_source_running_days: 14     # Source stays active post-cutover

notifications:
  slack_webhook: YOUR_SLACK_WEBHOOK_HERE
  email: migration-team@example.com
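
As an illustration of how the waves settings could be applied, the sketch below checks a proposed wave date against the blackout windows and the minimum gap between waves. It assumes each blackout entry is an ISO start/end pair with both ends inclusive, which the config format suggests but does not state. The function names are hypothetical.

```python
from datetime import date
from typing import Optional


def parse_blackout(window: str) -> tuple[date, date]:
    """Split 'YYYY-MM-DD/YYYY-MM-DD' into (start, end) dates."""
    start, end = window.split("/")
    return date.fromisoformat(start), date.fromisoformat(end)


def is_date_allowed(
    candidate: date,
    previous_wave: Optional[date],
    blackout_dates: list[str],
    min_days_between_waves: int,
) -> bool:
    """Return True if a wave may start on `candidate`."""
    for window in blackout_dates:
        start, end = parse_blackout(window)
        if start <= candidate <= end:  # assumption: both ends are inclusive
            return False
    if previous_wave is not None:
        if (candidate - previous_wave).days < min_days_between_waves:
            return False
    return True


# Values taken from the sample config above
blackouts = ["2026-03-25/2026-03-31", "2026-12-20/2027-01-05"]

print(is_date_allowed(date(2026, 3, 27), None, blackouts, 7))               # inside end-of-quarter freeze
print(is_date_allowed(date(2026, 4, 1), date(2026, 3, 24), blackouts, 7))   # 8 days after previous wave
print(is_date_allowed(date(2026, 3, 24), date(2026, 3, 20), blackouts, 7))  # only 4 days after previous wave
```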

Best Practices

Assess everything before migrating anything — Full portfolio discovery prevents surprises in later waves
Start with low-risk workloads — Build team confidence and refine processes before touching critical systems
Keep wave sizes small — 5-10 workloads per wave; larger waves increase blast radius if something goes wrong
Test rollback before you need it — Run a practice rollback during wave 1 even if the migration succeeds
Maintain source in parallel — Keep on-prem systems running for at least 2 weeks post-cutover as a safety net
Track dependencies in a graph — Spreadsheets fail at showing transitive dependencies; use a proper dependency map
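
To make the point about transitive dependencies concrete, here is a small sketch that walks a dependency graph and returns everything a workload relies on, directly or indirectly. The adjacency-list format and the workload names are hypothetical, not the playbook's own dependency map format.

```python
def transitive_dependencies(graph: dict[str, list[str]], workload: str) -> set[str]:
    """Return every workload reachable from `workload` by following dependencies."""
    seen: set[str] = set()
    stack = list(graph.get(workload, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


# Hypothetical map: each key depends on the workloads in its list
graph = {
    "web": ["api"],
    "api": ["db", "auth"],
    "auth": ["ldap"],
    "db": ["storage"],
    "ldap": [],
    "storage": [],
}

print(sorted(transitive_dependencies(graph, "web")))
```

A spreadsheet row for web would list only api. The traversal shows that moving web also involves auth, db, ldap, and storage.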

Troubleshooting

Issue	Cause	Fix
Replication agent can't connect to target	Security group is missing an ingress rule for the replication ports	Add inbound rules for TCP 443 and TCP 1500 from the source IP range
Data integrity check fails post-migration	Writes occurred during cutover window	Re-run final sync; ensure source was frozen before cutover
DNS propagation delayed	Long TTL on existing DNS records	Lower TTL to 60s at least 24h before cutover
Application errors after cutover	Hardcoded IP addresses in application config	Search configs for source IPs; replace with cloud endpoints or DNS names
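
For the hardcoded IP case, a scan along the following lines could help locate offending config entries. It is a rough sketch: the function name, the sample config text, and the source CIDR ranges are all hypothetical, and it only looks for IPv4 literals.

```python
import ipaddress
import re

# Loose IPv4 matcher; candidates are validated with ipaddress below
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_source_ips(text: str, source_cidrs: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, ip) for each IPv4 literal inside the source ranges."""
    networks = [ipaddress.ip_network(cidr) for cidr in source_cidrs]
    hits: list[tuple[int, str]] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for candidate in IPV4_PATTERN.findall(line):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # looked like an IP but is not valid, e.g. 999.1.1.1
            if any(address in network for network in networks):
                hits.append((line_number, candidate))
    return hits


# Hypothetical application config and on-prem address ranges
config_text = """db_host: 10.20.1.15
api_url: https://api.example.com
cache: 192.168.5.9:6379
version: 1.2.3.4
"""
on_prem_ranges = ["10.20.0.0/16", "192.168.5.0/24"]

for line_number, ip in find_source_ips(config_text, on_prem_ranges):
    print(f"line {line_number}: {ip}")
```

Filtering by the source ranges avoids flagging version strings and unrelated addresses that happen to look like IPs.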

This is 1 of 11 resources in the Cloud Architecture Pro toolkit. Get the complete Cloud Migration Playbook with all files, templates, and documentation for $49.

Or grab the entire Cloud Architecture Pro bundle (11 products) for $149 — save 30%.
