
Cyril Sebastian

Posted on • Originally published at tech.cyrilsebastian.com

GCP to AWS Migration – Part 2: Real Cutover, Issues & Recovery

🚀 Start of Part 2: The Real Cutover & Beyond

While Part 1 laid the architectural and data groundwork, Part 2 is where the real-world complexity kicked in.

We faced:

  • Database promotions that didn't go as rehearsed,

  • Lazy-loaded Solr indexes fighting with EBS latency,

  • Hardcoded GCP configs in the dark corners of our stack,

  • And the high-stakes pressure of a real-time production cutover.

If Part 1 was planning and theory, Part 2 was execution and improvisation.

Let's dive into the live switch, the challenges we didn't see coming, and how we turned them into lessons and long-term wins.


โš™๏ธ Phase 4: Application & Infrastructure Layer Adaptation

As part of the migration, significant adjustments were required in both the application configuration and infrastructure setup to align with AWS's architecture and security practices.

Key Changes & Adaptations

  • Private Networking & Bastion Access

  • CORS & S3 Policy Updates (a sample policy follows this list)

  • Application Configuration Updates

  • Internal DNS Transition

  • Static Asset Delivery via CloudFront

  • Security Hardening with WAF
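
To make the CORS & S3 policy item concrete, here is a minimal sketch of applying a CORS configuration with the AWS CLI. The bucket name and allowed origin are hypothetical placeholders, not our actual values.

```bash
# Hypothetical bucket and origin -- substitute your own values
aws s3api put-bucket-cors \
  --bucket example-app-assets \
  --cors-configuration '{
    "CORSRules": [{
      "AllowedOrigins": ["https://app.example.com"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }]
  }'
```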

Firewall Rules: Securing AWS MySQL from Legacy Sources

To ensure controlled access to the new MySQL server in AWS, we hardened the instance using explicit iptables rules (a concrete sketch follows the diagram below). These rules:

  • Blocked direct MySQL access from legacy or untrusted subnets (e.g., GCP App subnets)

  • Allowed SSH access only from trusted bastion/admin IPs during the migration window

FIREWALL RULES FLOW:

[Blocked Sources] ──❌──┐
                        │
10.AAA.0.0/22     ──────┤
10.BBB.248.0/21   ──────┤
                        ├─── DROP:3306 ───┐
                        │                 │
[Allowed Sources] ──✅──┤                 ▼
                        │         ┌─────────────────┐
10.AAA.0.4/32     ──────┤         │ 10.BBB.CCC.223  │
10.BBB.248.158/32 ──────┤         │  MySQL Server   │
10.BBB.251.107/32 ──────┼─ACCEPT──│     (AWS)       │
10.BBB.253.9/32   ──────┤  :22    │                 │
                        │         └─────────────────┘


Legend:

  • 10.AAA.x.x = Source network (GCP)

  • 10.BBB.CCC.223 = Target MySQL server in AWS

  • IPs like 10.BBB.248.158 = Bastion or trusted admin IPs allowed for SSH

This rule-based approach gave us an extra layer of protection beyond AWS Security Groups during the critical migration phase.
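
Translated into commands, the flow above corresponds to iptables rules along these lines. This is a simplified sketch using the placeholder addresses from the diagram (10.AAA / 10.BBB are not literal IPs), not our full production ruleset.

```bash
# Allow SSH (port 22) only from the trusted bastion/admin hosts
iptables -A INPUT -p tcp -s 10.AAA.0.4/32     --dport 22 -j ACCEPT
iptables -A INPUT -p tcp -s 10.BBB.248.158/32 --dport 22 -j ACCEPT
iptables -A INPUT -p tcp -s 10.BBB.251.107/32 --dport 22 -j ACCEPT
iptables -A INPUT -p tcp -s 10.BBB.253.9/32   --dport 22 -j ACCEPT

# Block direct MySQL access from the legacy GCP subnets
iptables -A INPUT -p tcp -s 10.AAA.0.0/22   --dport 3306 -j DROP
iptables -A INPUT -p tcp -s 10.BBB.248.0/21 --dport 3306 -j DROP
```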

๐ŸŒ Load Balancer Differences: GCP vs AWS

During the migration, we encountered significant differences in how load balancing is handled between GCP and AWS. This required architectural adjustments and deeper planning around routing, SSL, and compute scaling.

📊 Comparison Overview

| Feature | GCP HTTPS Load Balancer | AWS Application Load Balancer (ALB) |
| --- | --- | --- |
| Scope | Global by default | Regional |
| TLS/SSL | Wildcard SSL was uploaded to GCP | Managed manually via AWS Certificate Manager (ACM) |
| Routing Logic | URL Maps | Target Groups with Listener Rules |
| IP Type | Static Public IP | CNAME with DNS-based routing |
| Backend Integration | Global Load Balancer → MIG (Managed Instance Groups) | ALB → Target Group → ASG (Auto Scaling Group) |

🧩 Key Migration Notes

  • Static IP vs DNS Routing

  • Routing Mechanism Differences (see the listener-rule sketch after this list)

  • SSL/TLS Certificates

  • Application-Specific Custom Rules
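
For example, a URL Map path rule in GCP roughly maps onto an ALB listener rule like the one below. The ARNs, priority, and path pattern here are hypothetical placeholders, not our actual configuration.

```bash
# Route /api/* to a dedicated target group (all identifiers are placeholders)
aws elbv2 create-rule \
  --listener-arn "$LISTENER_ARN" \
  --priority 10 \
  --conditions Field=path-pattern,Values='/api/*' \
  --actions Type=forward,TargetGroupArn="$API_TARGET_GROUP_ARN"
```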

📮 Special Case: Postfix & Port 25 Restrictions

To migrate our Postfix mail servers that use port 25 for SMTP, we had to:

  • Submit an explicit request to AWS Support for port 25 to be opened (outbound) on our AWS account in the specific region.

  • This was a prerequisite for creating a Network Load Balancer (NLB) that could pass traffic directly to the Postfix instances.

Note: AWS restricts outbound SMTP traffic on port 25 by default to prevent abuse. This is not the case in GCP, so be sure to factor this into your cutover timeline if you're migrating mail servers.
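
Once AWS lifted the port 25 restriction, the NLB setup itself was standard. A hedged sketch of the kind of commands involved (names, subnets, VPC, and ARNs are placeholders):

```bash
# Internal NLB in front of the Postfix instances (all identifiers are placeholders)
aws elbv2 create-load-balancer \
  --name postfix-nlb \
  --type network \
  --scheme internal \
  --subnets subnet-aaaa1111 subnet-bbbb2222

# TCP target group on port 25 for the Postfix EC2 nodes
aws elbv2 create-target-group \
  --name postfix-smtp-tg \
  --protocol TCP \
  --port 25 \
  --vpc-id vpc-0example

# Listener that forwards SMTP traffic straight through to the target group
aws elbv2 create-listener \
  --load-balancer-arn "$NLB_ARN" \
  --protocol TCP \
  --port 25 \
  --default-actions Type=forward,TargetGroupArn="$TG_ARN"
```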


๐Ÿ” Phase 5: Apache Solr Migration

Apache Solr powered our platform's search functionality with complex indexing and fast response times. Migrating it to AWS introduced both architectural and operational complexities.

🛠 Migration Strategy

  • AMI Creation Was Non-Trivial:

    We created custom AMIs for Solr nodes with large EBS volumes. However, this surfaced two key challenges:

  • No AWS FSR:

    AWS Fast Snapshot Restore (FSR) could have helped, but it was ruled out due to budget constraints. Without FSR, we observed delayed volume readiness post-launch.

  • Index Rebuild from Source DB:

    Post-migration, we rebuilt Solr indexes from source data stored in MongoDB and MySQL, ensuring consistency and avoiding partial data issues (a sketch follows this list).

  • Master-Slave Architecture:

    We finalized a standalone Solr master-slave setup on EC2 after a dedicated PoC. This provided better control compared to GCP's managed instance groups.
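
As an illustration of the rebuild step: if a core is wired to its source databases through Solr's DataImportHandler (our exact rebuild tooling is out of scope here), a full re-import can be triggered and monitored like this. The host and core name are placeholders.

```bash
# Kick off a full rebuild of a core from its configured data sources
curl "http://solr-master:8983/solr/products/dataimport?command=full-import&clean=true"

# Poll progress until the import handler reports "idle"
curl "http://solr-master:8983/solr/products/dataimport?command=status"
```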


๐Ÿ— GCP vs AWS Deployment Model

| Feature | GCP MIGs | AWS EC2 Standalone |
| --- | --- | --- |
| Deployment | Solr slaves ran in Managed Instance Groups | Solr nodes deployed on standalone EC2s |
| Volume Attachment | Persistent volumes mounted with the boot disk | EBS volumes suffered from lazy loading, slowing boot |
| Autoscaling | Fully autoscaled Solr slaves based on demand | Autoscaling impractical due to volume readiness delays |
| Cost Management | On-demand scaling saved costs | Used EC2 scheduling (shutdown/startup) to control spend |

⚡ Operational Decision: No Autoscaling for Solr in AWS

In GCP, autoscaling Solr slaves was seamless: new instances booted with attached volumes and joined the cluster dynamically.

However, in AWS:

  • Lazy loading of EBS volumes made autoscaling unreliable for time-sensitive indexing.

Instead, we:

  • Kept EC2 nodes in a fixed topology
  • Used scheduled start/stop scripts (via cron) to manage uptime during peak/off-peak hours
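
A minimal sketch of that schedule, assuming the Solr nodes carry a role=solr tag; the tag, script path, and times are illustrative, not our actual values.

```bash
#!/usr/bin/env bash
# /opt/scripts/solr-fleet.sh -- start or stop all EC2 nodes tagged role=solr
set -euo pipefail
ACTION="$1"   # "start" or "stop"

# Collect the instance IDs for the fleet (word-splitting of $IDS is intentional)
IDS=$(aws ec2 describe-instances \
  --filters "Name=tag:role,Values=solr" \
  --query "Reservations[].Instances[].InstanceId" \
  --output text)

if [ "$ACTION" = "stop" ]; then
  aws ec2 stop-instances --instance-ids $IDS
else
  aws ec2 start-instances --instance-ids $IDS
fi
```

Driven by two crontab entries, for example `0 6 * * 1-5 /opt/scripts/solr-fleet.sh start` and `0 22 * * 1-5 /opt/scripts/solr-fleet.sh stop`.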

Lessons Learned

Solr migrations need deep consideration of disk behavior in AWS. If you're not using FSR, do not assume volume availability equals data availability. Factor in rebuild times, cost impact, and whether autoscaling truly benefits your workload.


🛑 The Cutover Weekend

We declared a deployment freeze 7 days before the migration to maintain stability and reduce last-minute surprises.

Pre-Cutover Checklist

  • TTL reduced to 60 seconds to allow quick DNS propagation (see the Route 53 sketch after this checklist).

  • Final S3 and database sync performed.

  • Checksums validated for critical data.

  • Route 53 routing policies configured to mimic GCP's internal DNS.

  • CloudWatch, Nagios, and Grafana set up for monitoring.

  • Final fallback snapshot captured.

  • A comprehensive cutover runbook was prepared with clear task ownership and escalation paths.
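
As an example of the TTL change from the checklist, lowering a record's TTL ahead of the switch looks roughly like this. The hosted zone ID, record name, and IP are placeholders; for an ALB target the record would be an alias or CNAME rather than an A record.

```bash
# Drop the TTL to 60s well before cutover so the later switch propagates quickly
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0EXAMPLE12345 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [{"Value": "203.0.113.10"}]
      }
    }]
  }'
```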


🕒 Cutover Timeline

| Time Slot | Task |
| --- | --- |
| Hour 1 | Final S3 + DB sync |
| Hour 2–3 | DB failover and validation |
| Hour 4 | DNS switch from GCP to Route 53 |
| Hour 5–6 | Traffic validation + rollback readiness |
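
For the validation window (Hour 5–6), simple external checks go a long way. A sketch of the kind of smoke tests we mean; the domain and paths are placeholders.

```bash
# Confirm the public record now resolves to the AWS side
dig +short app.example.com

# Spot-check key endpoints through the new entry point
for path in /healthz /login /api/status; do
  curl -s -o /dev/null -w "%{http_code}  ${path}\n" "https://app.example.com${path}"
done
```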

😮 Unexpected Issues (and What We Did)

| Problem | Solution |
| --- | --- |
| MySQL master switch had lag | Improved replica promotion playbook |
| Hardcoded GCP configs found | Emergency patching of ENV & redeploy |
| Solr slow to boot under load | Temporarily pre-warmed EC2 nodes |
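
"Pre-warming" here means forcing every block of a snapshot-backed EBS volume to be read once, so Solr does not pay the lazy-load penalty under live traffic. A sketch of the two usual options; the device name, availability zone, and snapshot ID are placeholders.

```bash
# Option 1: initialize the restored volume by reading every block once
# (check lsblk for the real device name on your instance; requires fio)
sudo fio --filename=/dev/nvme1n1 --rw=read --bs=1M --iodepth=32 \
  --ioengine=libaio --direct=1 --name=volume-initialize

# Option 2 (costs extra): enable Fast Snapshot Restore on the source snapshot
aws ec2 enable-fast-snapshot-restores \
  --availability-zones us-east-1a \
  --source-snapshot-ids snap-0123456789abcdef0
```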

🚀 Post-Migration Optimizations

  • Rightsized EC2 instances using historical metrics

  • Committed to Savings Plans for reserved workloads

  • Enabled and tuned S3 lifecycle policies

  • Set up automated AMI rotations and DB snapshots
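
For instance, a lifecycle rule of the kind we enabled might look like this; the bucket, prefix, and day counts are placeholders.

```bash
# Move older objects under logs/ to Infrequent Access, expire them after a year
aws s3api put-bucket-lifecycle-configuration \
  --bucket example-app-assets \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-old-logs",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
      "Expiration": {"Days": 365}
    }]
  }'
```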


🧠 End of Part 2: Final Thoughts & What's Next

This journey from GCP to AWS wasn't just about swapping clouds; it was a masterclass in operational resilience, cross-team coordination, and cloud-native rethinking.

We learned that:

  • No plan survives contact without flexibility.

  • Owning your infrastructure also means owning your edge cases.

  • Migration is more than lift-and-shift; it's evolve or expire.

Smooth seas never made a skilled sailor.

Franklin D. Roosevelt

This migration tested our nerves and processes, but ultimately, it left us with better observability, tighter security, and an infrastructure we could proudly call production-grade.


🔗 If this helped or resonated with you, connect with me on LinkedIn. Let's learn and grow together.

👉 Stay tuned for more behind-the-scenes write-ups and system design breakdowns.

