Cloud Migration Playbook
A structured, phase-based framework for migrating workloads to the cloud with confidence. This playbook covers the complete migration lifecycle — from initial portfolio assessment and dependency mapping through wave planning, execution, validation, and rollback procedures. Each phase includes decision templates, checklists, Terraform scaffolding for landing zones, and scripts for cutover automation. Built for teams who need a repeatable process, not just a high-level slide deck.
Key Features
- 6R Classification Tool — Decision matrix for categorizing each workload as Rehost, Replatform, Refactor, Repurchase, Retire, or Retain
- Dependency Mapper — Scripts and templates for discovering application dependencies, network flows, and data relationships
- Wave Planning Templates — Spreadsheet-based wave organizer grouping workloads by dependency, risk, and business criticality
- Landing Zone Scaffolding — Terraform modules for provisioning target accounts, VPCs, and security baselines pre-migration
- Cutover Runbooks — Step-by-step procedures with parallel tracks for database migration, DNS cutover, and application switchover
- Rollback Procedures — Pre-tested rollback scripts with decision criteria for when to abort a migration wave
- Validation Framework — Automated smoke tests, performance benchmarks, and data integrity checks post-migration
- Stakeholder Templates — Communication plans, status reports, and go/no-go decision documents
Quick Start
# Step 1: Run the portfolio assessment
python3 src/assessment/portfolio_scanner.py \
  --inventory inventory.csv \
  --output-dir reports/assessment/

# Step 2: Generate wave plan
python3 src/planning/wave_planner.py \
  --assessment reports/assessment/classified.json \
  --max-wave-size 10 \
  --output reports/wave-plan.json

# Step 3: Deploy landing zone for wave 1
cd src/terraform/landing-zone
terraform init
terraform apply -var="wave=1"
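The internals of `wave_planner.py` are not shown here. As a rough sketch of what Step 2 could do, assuming a hypothetical mapping from each workload to the workloads it depends on, one option is to schedule workloads in layers so that dependencies always land in an earlier wave than their dependents, then split any oversized layer by `max_wave_size`. The function name `plan_waves` and the sample data are illustrative, not part of the playbook.

```python
def plan_waves(deps: dict[str, list[str]], max_wave_size: int) -> list[list[str]]:
    """Group workloads into waves so dependencies migrate before dependents.

    Hypothetical helper: `deps` maps each workload to the workloads it depends on.
    """
    # Include workloads that only ever appear as someone else's dependency.
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {n: set(deps.get(n, [])) for n in nodes}
    waves: list[list[str]] = []
    while remaining:
        # Workloads whose dependencies have all been scheduled already.
        ready = sorted(n for n, ds in remaining.items() if not ds)
        if not ready:
            raise ValueError(f"Dependency cycle among: {sorted(remaining)}")
        # Split the layer so that no wave exceeds max_wave_size.
        for i in range(0, len(ready), max_wave_size):
            waves.append(ready[i:i + max_wave_size])
        for n in ready:
            del remaining[n]
        for ds in remaining.values():
            ds.difference_update(ready)
    return waves


# Illustrative inventory (assumed data, not from a real assessment).
SAMPLE_DEPS = {
    "web": ["api"],
    "api": ["db", "cache"],
    "batch": ["db"],
    "db": [],
    "cache": [],
}
```

With `max_wave_size=2`, `plan_waves(SAMPLE_DEPS, 2)` puts `cache` and `db` in the first wave, `api` and `batch` in the second, and `web` in the third. A real planner would also weigh risk and business criticality, which this sketch ignores.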
Architecture
┌──────────────────────────────────────────────────────────┐
│                   Migration Lifecycle                    │
│                                                          │
│  Phase 1        Phase 2       Phase 3      Phase 4       │
│  ASSESS         PLAN          EXECUTE      VALIDATE      │
│  ┌─────────┐    ┌─────────┐   ┌─────────┐  ┌─────────┐   │
│  │Discover │    │Wave     │   │Provision│  │Smoke    │   │
│  │Classify │───►│Planning │──►│Migrate  │─►│Test     │   │
│  │Baseline │    │Runbooks │   │Cutover  │  │Perf.    │   │
│  │Deps Map │    │Rollback │   │DNS Swap │  │Data OK  │   │
│  └─────────┘    └─────────┘   └────┬────┘  └─────────┘   │
│                                    │                     │
│                              ┌─────▼─────┐               │
│                              │ Rollback? │               │
│                              │ YES: Run  │               │
│                              │ rollback  │               │
│                              │ script    │               │
│                              └───────────┘               │
└──────────────────────────────────────────────────────────┘
Usage Examples
6R Classification Logic
# src/assessment/classifier.py
from dataclasses import dataclass
from enum import Enum


class MigrationStrategy(Enum):
    REHOST = "rehost"          # Lift-and-shift to cloud VMs
    REPLATFORM = "replatform"  # Minor optimization (e.g., managed DB)
    REFACTOR = "refactor"      # Re-architect for cloud-native
    REPURCHASE = "repurchase"  # Replace with SaaS
    RETIRE = "retire"          # Decommission
    RETAIN = "retain"          # Keep on-premises


@dataclass
class WorkloadAssessment:
    name: str
    strategy: MigrationStrategy
    complexity: str            # low, medium, high
    business_criticality: str  # low, medium, high, critical
    estimated_effort_days: int
    dependencies: list[str]


def classify_workload(
    age_years: int,
    has_saas_alternative: bool,
    monthly_users: int,
    technical_debt_score: float,  # 0.0 to 1.0
) -> MigrationStrategy:
    """Recommend migration strategy based on workload characteristics."""
    if monthly_users == 0:
        return MigrationStrategy.RETIRE
    if has_saas_alternative and technical_debt_score > 0.7:
        return MigrationStrategy.REPURCHASE
    if technical_debt_score > 0.8:
        return MigrationStrategy.REFACTOR
    if age_years > 10:
        return MigrationStrategy.REPLATFORM
    return MigrationStrategy.REHOST
Landing Zone Terraform Module
# src/terraform/landing-zone/main.tf
module "vpc" {
  source  = "./modules/vpc"
  cidr    = var.vpc_cidr
  azs     = var.availability_zones
  project = var.project_name
  env     = "migration-wave-${var.wave}"
}

module "security_baseline" {
  source             = "./modules/security"
  vpc_id             = module.vpc.vpc_id
  enable_flow_logs   = true
  enable_guardduty   = true
  log_retention_days = 90
}

module "migration_staging" {
  source     = "./modules/staging"
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids

  # Temporary resources for migration; destroyed after validation
  instance_type     = "m5.xlarge"
  replication_agent = true
}
Cutover Checklist as Code
# configs/cutover-checklist.yaml
pre_cutover:
  - name: Freeze code deployments
    owner: dev-team
    verification: "No open PRs targeting main branch"
  - name: Final data sync
    owner: dba-team
    verification: "Replication lag < 1 second"
  - name: Notify stakeholders
    owner: project-lead
    verification: "Slack message sent to #migration channel"

cutover:
  - name: Stop writes to source database
    command: "python3 scripts/freeze_source_db.py"
    rollback: "python3 scripts/unfreeze_source_db.py"
  - name: Final replication catch-up
    command: "python3 scripts/wait_for_sync.py --max-wait 300"
    timeout_seconds: 600
  - name: Switch DNS to cloud endpoint
    command: "python3 scripts/dns_cutover.py --target cloud"
    rollback: "python3 scripts/dns_cutover.py --target onprem"

post_cutover:
  - name: Run smoke tests
    command: "pytest tests/smoke/ -v --timeout=120"
    required: true
  - name: Monitor error rates for 30 minutes
    command: "python3 scripts/monitor_errors.py --duration 1800"
    required: true
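One way a checklist like this could be executed is sketched below. It assumes the `cutover` steps have already been loaded into a list of dicts, runs each `command` in order, and on the first failure runs the available `rollback` commands in reverse order. The names `run_cutover` and `shell_runner` are hypothetical, and rolling back the failed step as well as the completed ones is a design assumption (the failed step may have been partially applied), not something the checklist format specifies.

```python
import shlex
import subprocess
from typing import Callable


def shell_runner(command: str) -> bool:
    """Run one command string and report whether it exited with status 0."""
    return subprocess.run(shlex.split(command), check=False).returncode == 0


def run_cutover(steps: list[dict], run: Callable[[str], bool] = shell_runner) -> bool:
    """Run steps in order; on the first failure, roll back in reverse order.

    `run` is injected so the flow can be dry-run or tested without real commands.
    """
    completed: list[dict] = []
    for step in steps:
        if run(step["command"]):
            completed.append(step)
            continue
        print(f"FAILED: {step['name']}; rolling back")
        # Assumption: the failed step may be partially applied, so undo it too.
        for done in reversed(completed + [step]):
            if "rollback" in done:
                run(done["rollback"])
        return False
    return True


# The cutover section of the checklist above, as already-parsed data.
CUTOVER_STEPS = [
    {"name": "Stop writes to source database",
     "command": "python3 scripts/freeze_source_db.py",
     "rollback": "python3 scripts/unfreeze_source_db.py"},
    {"name": "Final replication catch-up",
     "command": "python3 scripts/wait_for_sync.py --max-wait 300"},
    {"name": "Switch DNS to cloud endpoint",
     "command": "python3 scripts/dns_cutover.py --target cloud",
     "rollback": "python3 scripts/dns_cutover.py --target onprem"},
]
```

Passing a fake `run` function that only records the command strings gives a cheap dry run of the ordering before any real cutover window.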
Configuration
# configs/migration-config.yaml
project_name: acme-cloud-migration
target_cloud: aws                      # aws, azure, or gcp
target_region: us-east-1

waves:
  max_workloads_per_wave: 10           # Keep waves small and manageable
  min_days_between_waves: 7            # Buffer for issue resolution
  blackout_dates:                      # No migrations during these periods
    - "2026-03-25/2026-03-31"          # End of quarter
    - "2026-12-20/2027-01-05"          # Holiday freeze

rollback:
  auto_rollback_on_smoke_failure: true
  max_rollback_window_hours: 4         # After this, rollback becomes risky
  keep_source_running_days: 14         # Source stays active post-cutover

notifications:
  slack_webhook: YOUR_SLACK_WEBHOOK_HERE
  email: migration-team@example.com
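To illustrate how `min_days_between_waves` and `blackout_dates` might interact, here is a minimal sketch that finds the earliest allowed date for the next wave. It assumes each blackout entry is a `start/end` pair of ISO dates and that both ends are inclusive; the playbook does not state this, and the helper names are hypothetical.

```python
from datetime import date, timedelta


def parse_blackout(window: str) -> tuple[date, date]:
    """Parse a 'YYYY-MM-DD/YYYY-MM-DD' window into (start, end) dates."""
    start, end = window.split("/")
    return date.fromisoformat(start), date.fromisoformat(end)


def is_blackout(day: date, windows: list[str]) -> bool:
    """True if `day` falls inside any window (both ends assumed inclusive)."""
    return any(start <= day <= end for start, end in map(parse_blackout, windows))


def next_wave_date(previous: date, min_gap_days: int, windows: list[str]) -> date:
    """Earliest date at least `min_gap_days` after `previous`, outside all blackouts."""
    candidate = previous + timedelta(days=min_gap_days)
    while is_blackout(candidate, windows):
        candidate += timedelta(days=1)
    return candidate


# Values copied from the configuration above.
BLACKOUTS = ["2026-03-25/2026-03-31", "2026-12-20/2027-01-05"]
```

For example, a wave that ran on 2026-03-20 would normally be followed on 2026-03-27, but that date sits in the end-of-quarter freeze, so `next_wave_date(date(2026, 3, 20), 7, BLACKOUTS)` moves to 2026-04-01.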
Best Practices
- Assess everything before migrating anything — Full portfolio discovery prevents surprises in later waves
- Start with low-risk workloads — Build team confidence and refine processes before touching critical systems
- Keep wave sizes small — 5-10 workloads per wave; larger waves increase blast radius if something goes wrong
- Test rollback before you need it — Run a practice rollback during wave 1 even if the migration succeeds
- Maintain source in parallel — Keep on-prem systems running for at least 2 weeks post-cutover as a safety net
- Track dependencies in a graph — Spreadsheets fail at showing transitive dependencies; use a proper dependency map
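The point about transitive dependencies can be made concrete with a small sketch. It assumes the dependency map is held as a plain dict from workload to direct dependencies (a hypothetical shape, not necessarily what the Dependency Mapper emits) and walks it to collect everything a workload ultimately relies on.

```python
def transitive_deps(graph: dict[str, list[str]], workload: str) -> set[str]:
    """Return every workload reachable from `workload` via dependency edges."""
    seen: set[str] = set()
    stack = list(graph.get(workload, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # Already visited; also keeps cycles from looping forever.
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen


# Illustrative graph (assumed data): web lists only api as a direct dependency.
SAMPLE_GRAPH = {
    "web": ["api"],
    "api": ["auth", "db"],
    "auth": ["ldap"],
}
```

A spreadsheet row for `web` would list only `api`, whereas `transitive_deps(SAMPLE_GRAPH, "web")` also surfaces `auth`, `db`, and `ldap`, all of which constrain when `web` can move. If the graph contains a cycle, the workload itself appears in its own result, which is a useful signal that the group has to migrate together.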
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Replication agent can't connect to target | Security group missing ingress rule for replication port | Allow inbound TCP 443 and TCP 1500 from the source IP range |
| Data integrity check fails post-migration | Writes occurred during cutover window | Re-run final sync; ensure source was frozen before cutover |
| DNS propagation delayed | Long TTL on existing DNS records | Lower TTL to 60s at least 24h before cutover, and no later than one full old-TTL period ahead, so cached records expire in time |
| Application errors after cutover | Hardcoded IP addresses in application config | Search configs for source IPs; replace with cloud endpoints or DNS names |
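For the hardcoded-IP case, a config scan along these lines may help. It is a sketch that assumes the source environment lives in a known CIDR range and that configs are text files matching a glob pattern; `find_source_ips`, `scan_configs`, and the example range are hypothetical.

```python
import ipaddress
import re
from pathlib import Path

# Loose IPv4 pattern; candidates are validated with ipaddress below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_source_ips(text: str, source_cidr: str) -> list[tuple[int, str]]:
    """Return (line_number, ip) pairs for IPv4 literals inside `source_cidr`."""
    network = ipaddress.ip_network(source_cidr)
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in IPV4_CANDIDATE.findall(line):
            try:
                addr = ipaddress.ip_address(match)
            except ValueError:
                continue  # Looked like an IP but is not one (e.g. 999.1.1.1).
            if addr in network:
                hits.append((lineno, match))
    return hits


def scan_configs(root: str, source_cidr: str, pattern: str = "*.yaml") -> dict:
    """Scan files under `root` and map each path to its list of hits."""
    results = {}
    for path in Path(root).rglob(pattern):
        hits = find_source_ips(path.read_text(errors="ignore"), source_cidr)
        if hits:
            results[str(path)] = hits
    return results
```

Running `scan_configs("configs", "10.20.0.0/16")` before cutover, with the range replaced by the real on-prem network, gives a list of files and line numbers to review. Addresses built at runtime or stored in databases will not show up in a scan like this.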
This is 1 of 11 resources in the Cloud Architecture Pro toolkit. Get the complete [Cloud Migration Playbook] with all files, templates, and documentation for $49.
Or grab the entire Cloud Architecture Pro bundle (11 products) for $149 — save 30%.