Rizwan Saleem

Posted on May 30

How to migrate legacy systems incrementally without breaking everything



Legacy System Migration: A Complete Tutorial on the Strangler Fig Pattern

Why Big-Bang Rewrites Fail

Big-bang rewrites fail for predictable reasons: the legacy system keeps evolving during the rewrite (a moving target), undocumented edge cases hidden in production code surface too late, teams lose motivation after months with nothing shipped, and the period of running two systems side by side becomes permanent. When someone proposes a full rewrite of a production system, the default answer should be "no": the burden of proof is on showing why incremental migration won't work.

The Strangler Fig Pattern Explained

The strangler fig pattern incrementally migrates a legacy system by gradually replacing specific functionality with new applications while keeping the old system running. Named after the tree that grows around a host tree and eventually replaces it, this approach lets you modernize piece by piece rather than attempting a risky big-bang rewrite.

How It Works

Phase	What Happens	Duration
Setup	Place a routing layer (proxy/façade) in front of the legacy system; route 100% of traffic to legacy	2 weeks
Migrate Incrementally	Move one feature at a time to the new system, starting with the lowest-risk endpoint	2-6 months
Decommission	When all traffic flows through the new system, keep legacy running as a rollback target for a month, then turn it off	2 weeks

The proxy layer intercepts requests and routes each one to either the legacy system or a new service. From the client's perspective nothing changes: same URLs, same request shapes, same authentication.

Step-by-Step Implementation

1. Map Your Legacy System

Identify the major capabilities and the boundaries between them, and find the points where you could intercept requests. Then pick your first candidate: a bounded capability that is valuable to migrate and low-risk if something goes wrong.

2. Build the Routing Layer First

Before migrating anything, set up the infrastructure to route requests between systems. For web applications, use a reverse proxy or load balancer. Initially forward 100% of requests to the legacy system and verify the routing layer doesn't break anything.
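In practice the routing layer is usually a reverse proxy or load-balancer rule set rather than application code, but the decision it makes is simple enough to sketch. The TypeScript below is a minimal illustration under assumed names: the `Route` shape, the `resolveUpstream` function, and the `/api/...` paths are all hypothetical. With an empty route table, every request falls through to legacy, which is exactly the "100% to legacy" starting point.

```typescript
// Which system serves a request.
type Upstream = "legacy" | "modern";

// Hypothetical route entry: a path prefix and the system that owns it.
interface Route {
  prefix: string;
  upstream: Upstream;
}

// True if `path` is `prefix` itself or lies below it, matching whole path
// segments so "/api/products" does not also capture "/api/productsearch".
function matchesPrefix(path: string, prefix: string): boolean {
  const withSlash = prefix.endsWith("/") ? prefix : prefix + "/";
  return path === prefix || path.startsWith(withSlash);
}

// Longest-prefix match; anything not explicitly migrated stays on legacy.
function resolveUpstream(path: string, routes: Route[]): Upstream {
  let best: Route | undefined;
  for (const route of routes) {
    if (!matchesPrefix(path, route.prefix)) continue;
    if (best === undefined || route.prefix.length > best.prefix.length) {
      best = route;
    }
  }
  return best ? best.upstream : "legacy";
}

// Day one: an empty table, so 100% of traffic goes to the legacy system.
const initialRoutes: Route[] = [];

// Later (hypothetical paths): the product catalog has moved, but its admin
// endpoints have not, so the more specific rule keeps them on legacy.
const laterRoutes: Route[] = [
  { prefix: "/api/products", upstream: "modern" },
  { prefix: "/api/products/admin", upstream: "legacy" },
];
```

Longest-prefix matching is one reasonable choice here: it lets you carve a sub-path back out to legacy when only part of a capability has moved.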

3. Develop New Components in Isolation

For each slice, develop a new component replicating the old functionality with modern technologies. Start with a relatively isolated piece of functionality.

4. Route Traffic Gradually

Once a new component is ready, change the indirection layer so it sends that component's traffic to the new implementation instead of the old one. Increase traffic in stages: 1%, 10%, 50%, 100%.
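One common way to implement staged percentages is deterministic bucketing: hash a stable identifier so the same user always lands on the same side of the split. The sketch below assumes each request carries a stable user ID; FNV-1a and the names `bucketOf` and `routesToNew` are illustrative choices, not part of any particular proxy or flag service.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units: fast, stable across
// processes and restarts, and not cryptographic (it doesn't need to be).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Stable bucket in [0, 100) for a user. Salting with the endpoint name keeps
// separate rollouts from always picking the same users.
function bucketOf(userId: string, endpoint: string): number {
  return fnv1a(endpoint + ":" + userId) % 100;
}

// Route to the new component when the user's bucket is below the rollout
// percentage. Because buckets never change, anyone included at 1% stays
// included at 10%, 50%, and 100%.
function routesToNew(userId: string, endpoint: string, percent: number): boolean {
  return bucketOf(userId, endpoint) < percent;
}
```

The monotonic property is the point of the design: raising the percentage only ever adds users to the new path, so nobody flips back and forth between systems between stages.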

5. Iterate and Retire Old Components

Repeat the cycle for each additional function. As new components take over, old components become redundant and can be retired.

Feature Flags for Migration Control

Feature flags are essential for managing the transition and limiting migration risk. Don't flip an endpoint from legacy to modern all at once; use feature flags to shift traffic gradually:

Week	Traffic to Modern System	What You're Watching
1	Internal team only	Error rates, response times, data consistency
2	5% of users (smallest accounts)	Edge cases, query performance at small scale
3	25% of users	Database load, cache hit rates
4	50% of users	Aggregate metrics vs. legacy baseline
5	90% of users	Long-tail edge cases
6	100%	Monitoring period, legacy route removed

At any point, if you see errors or performance degradation, flip the flag back instantly. Start with your internal team, then move to the smallest and newest users: they have the least data, the fewest edge cases, and the highest tolerance for issues. Migrate the largest, most complex accounts last.
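The rollout order above (kill switch first, then the internal team, then the smallest accounts) can be expressed as a small flag check. This is a sketch only: it assumes account size is approximated by a record count and that flag state is held in memory, whereas a real deployment would read it from a feature-flag service. `Account`, `MigrationFlag`, `selectCohort`, and `useModernSystem` are hypothetical names.

```typescript
// Hypothetical account summary; recordCount approximates size and complexity.
interface Account {
  id: string;
  recordCount: number;
}

// Hypothetical per-endpoint flag state.
interface MigrationFlag {
  enabled: boolean;         // kill switch: false sends everyone back to legacy
  internalIds: Set<string>; // week 1: the internal team only
  cohort: Set<string>;      // later weeks: accounts chosen by selectCohort
}

// Pick the smallest `percent`% of accounts (fewest records first), so the
// largest and most complex accounts are migrated last.
function selectCohort(accounts: Account[], percent: number): Set<string> {
  const sorted = [...accounts].sort((a, b) => a.recordCount - b.recordCount);
  const count = Math.floor((sorted.length * percent) / 100);
  return new Set(sorted.slice(0, count).map((a) => a.id));
}

// Decide per request whether this account is served by the modern system.
function useModernSystem(flag: MigrationFlag, accountId: string): boolean {
  if (!flag.enabled) return false; // instant rollback path
  return flag.internalIds.has(accountId) || flag.cohort.has(accountId);
}
```

Setting `enabled` to `false` is the "flip the flag back" step: it overrides every cohort at once without touching the routing configuration.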

Data Migration Strategies

Code migration is straightforward compared to data migration. Code is stateless; data is stateful. Once you've written to a new schema, rolling back means migrating data backwards.

Dual-Write with Reconciliation (Proven Approach)

The new system writes to the new database (source of truth going forward)
It also writes to the legacy database (keeps legacy system consistent during transition)
A background job migrates historical data in batches
A reconciliation job compares data between old and new, flagging discrepancies
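The write path in the first two steps can be sketched as follows. This is a simplified, single-process illustration: `Store` is an in-memory stand-in for either database, and `dualWrite` and `repairQueue` are hypothetical names. The design choice it shows is that the new database is the source of truth, so a failure there fails the request, while a failed legacy write is queued for the reconciliation job instead of being surfaced to the user.

```typescript
// Hypothetical row shape shared by both systems in this sketch.
interface Row {
  id: string;
  amountCents: number;
}

// In-memory stand-in for a database table.
class Store {
  private rows = new Map<string, Row>();
  failNextWrite = false; // test hook that simulates a single outage

  write(row: Row): void {
    if (this.failNextWrite) {
      this.failNextWrite = false;
      throw new Error("write failed");
    }
    this.rows.set(row.id, { ...row });
  }

  read(id: string): Row | undefined {
    return this.rows.get(id);
  }
}

// Write to the new store first (source of truth); if that throws, the error
// propagates and nothing is written anywhere. Then mirror to legacy, queueing
// the id for repair if the legacy write fails.
function dualWrite(newDb: Store, legacyDb: Store, repairQueue: string[], row: Row): void {
  newDb.write(row);
  try {
    legacyDb.write(row);
  } catch {
    repairQueue.push(row.id);
  }
}
```

If the process crashes between the two writes, the legacy copy is silently missed. That gap is one reason the reconciliation job exists; a transactional outbox or change data capture are common ways to close it.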

The reconciliation job is where you'll discover every undocumented assumption in the legacy system: timezone conversions, currency rounding differences, nullable fields handled inconsistently. Budget twice as long for data migration as you think you need. In one project, code migration took 3 months while data migration and reconciliation took 4.
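A reconciliation pass is essentially a keyed diff with deliberate normalization rules. The sketch below is illustrative only: the `Row` shape and the `reconcile` function are hypothetical, and each normalization (money compared in integer cents, timestamps compared as instants, legacy empty strings treated as equal to `null`) is an assumption you would replace with rules your own legacy system actually justifies. Anything that survives normalization is reported as a mismatch.

```typescript
// Hypothetical row shape as read from either system.
interface Row {
  id: string;
  amount: number;     // decimal currency amount
  createdAt: string;  // ISO-8601 timestamp, offset included
  note: string | null;
}

interface Mismatch {
  id: string;
  field: string; // column name, or "<missing>" / "<extra>" for absent rows
}

function reconcile(legacy: Row[], modern: Row[]): Mismatch[] {
  const modernById = new Map(modern.map((r): [string, Row] => [r.id, r]));
  const mismatches: Mismatch[] = [];

  for (const l of legacy) {
    const m = modernById.get(l.id);
    if (m === undefined) {
      mismatches.push({ id: l.id, field: "<missing>" }); // not yet migrated
      continue;
    }
    modernById.delete(l.id);
    // Compare money in integer cents so float noise is ignored but genuine
    // rounding differences are still reported.
    if (Math.round(l.amount * 100) !== Math.round(m.amount * 100)) {
      mismatches.push({ id: l.id, field: "amount" });
    }
    // Compare instants, not strings: "+02:00" vs "Z" is not a real difference.
    if (Date.parse(l.createdAt) !== Date.parse(m.createdAt)) {
      mismatches.push({ id: l.id, field: "createdAt" });
    }
    // Assumption: legacy stores "" where the new schema stores null.
    if ((l.note ?? "") !== (m.note ?? "")) {
      mismatches.push({ id: l.id, field: "note" });
    }
  }

  // Rows that exist only in the new system.
  modernById.forEach((_row, id) => mismatches.push({ id, field: "<extra>" }));
  return mismatches;
}
```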

Data Migration Strategy Options

Strategy	Description	Downtime	Risk
Big Bang	All data transferred in one go during planned downtime	High	High
Trickle Migration	Data moved gradually in phases; source and target run in parallel	Minimal	Lower
Incremental Transfer	Transfer data in batches instead of all at once	Minimal	Moderate
Live Migration	Database replication, active-active failover, zero-downtime deployment	None	Lowest
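Trickle migration and incremental transfer both come down to a resumable batch copy, like the background job for historical data in the dual-write approach. This sketch assumes an integer primary key and uses keyset pagination (read the next `batchSize` rows with `id` greater than the last one copied), so the job can be stopped and resumed from a checkpoint. `fetchBatch` and `upsert` are hypothetical callbacks you would back with real queries; the upsert should be idempotent so a retried batch does no harm.

```typescript
// Hypothetical row shape with an integer primary key.
interface Row {
  id: number;
  payload: string;
}

// Reads up to `limit` rows with id > afterId, ordered by id. Against SQL this
// would be roughly: SELECT ... WHERE id > $1 ORDER BY id LIMIT $2
type FetchBatch = (afterId: number, limit: number) => Row[];

// Idempotent write (insert-or-update) into the new store.
type Upsert = (batch: Row[]) => void;

// Copy everything after `startAfterId`, one batch at a time. Returns the last
// id copied (persist it as a checkpoint) and the number of rows copied.
function backfill(
  fetchBatch: FetchBatch,
  upsert: Upsert,
  batchSize: number,
  startAfterId = 0,
): { lastId: number; copied: number } {
  let lastId = startAfterId;
  let copied = 0;
  for (;;) {
    const batch = fetchBatch(lastId, batchSize);
    if (batch.length === 0) return { lastId, copied };
    upsert(batch);
    copied += batch.length;
    lastId = batch[batch.length - 1].id;
  }
}
```

Keyset pagination is preferred over `OFFSET` here because each batch query stays cheap no matter how far into the table the job has progressed.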

Best Practices for Data Integrity

Use checksums and record counts to confirm data integrity
Keep all migration scripts under version control so every change is reviewable and reproducible
Document everything, and keep complete logs for root-cause analysis
Start reconciliation early (in the first month) to discover mapping bugs sooner
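One way to apply the first practice is to reduce each table to a record count plus an order-independent checksum, computed the same way on both sides. This is an assumption-laden illustration: the `Row` shape and `summarize` are hypothetical, the canonical serialization is JSON in a fixed field order, and 32-bit FNV-1a is non-cryptographic, so a production job might prefer SHA-256 or the database's own hashing functions. Summing per-row hashes modulo 2^32 makes the result independent of read order without letting duplicate rows cancel out, as XOR would.

```typescript
// Hypothetical row shape after mapping both schemas to common fields.
interface Row {
  id: number;
  email: string | null;
  balanceCents: number;
}

// 32-bit FNV-1a over UTF-16 code units (non-cryptographic).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Serialize with a fixed field order so identical data hashes identically
// regardless of column order or driver formatting.
function canonical(r: Row): string {
  return JSON.stringify([r.id, r.email, r.balanceCents]);
}

// Record count plus order-independent checksum (sum of row hashes mod 2^32).
function summarize(rows: Row[]): { count: number; checksum: number } {
  let checksum = 0;
  for (const r of rows) checksum = (checksum + fnv1a(canonical(r))) >>> 0;
  return { count: rows.length, checksum };
}
```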

Minimizing and Handling Downtime

Pre-Migration Planning

Develop a comprehensive migration plan with detailed schedules and resource allocations
Perform migrations during off-peak hours or planned maintenance windows
Establish clear fallback and rollback procedures, and test them thoroughly before migration begins

Live Migration Techniques

Use database replication, active-active failover configurations, and zero-downtime deployment strategies so operations continue without interruption. For example, AWS Database Migration Service can replicate ongoing changes from the source database to the target during the transition.

Communication and Coordination

Maintain open lines of communication with all stakeholders: IT staff, end users, and business leaders.

Rollback Plans

Instant Rollback with Feature Flags

The psychological advantage of the strangler fig pattern is that every step is reversible: if you see errors, flip the feature flag back instantly.

Legacy System as Rollback Target

Keep the legacy system running in production for a month or more after you think you're "done", serving zero traffic but available as a rollback target. Decommission it only when you're confident the new system handles every edge case.

Database Rollback

Each changeset (the unit of change in migration tools such as Liquibase) should define its own rollback logic in the migration script, so that if a migration fails, your deployment pipeline can run those rollback commands and restore the database schema to its previous stable state. Define checkpoints in advance where the migration can be paused or rolled back if necessary.
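Pairing every change with its own rollback can be sketched independently of any particular migration tool. In this illustration, the `ChangeSet` interface and `migrate` runner are hypothetical, the schema is modeled as a set of column names, and each `up` step is assumed to be atomic (as it would be with transactional DDL). If a step fails, the runner undoes the already-applied steps newest first and rethrows; on success, the returned list of applied ids can double as a checkpoint record.

```typescript
// Hypothetical schema model: a set of "table.column" names.
type Schema = Set<string>;

// A unit of change with its rollback logic defined alongside it.
interface ChangeSet {
  id: string;
  up: (schema: Schema) => void;
  down: (schema: Schema) => void;
}

// Apply change sets in order. On failure, roll back the ones already applied
// in reverse order and rethrow; on success, return the applied ids.
function migrate(schema: Schema, changeSets: ChangeSet[]): string[] {
  const applied: ChangeSet[] = [];
  try {
    for (const cs of changeSets) {
      cs.up(schema); // assumed atomic: it either fully applies or throws
      applied.push(cs);
    }
  } catch (err) {
    for (const cs of [...applied].reverse()) cs.down(schema);
    throw err;
  }
  return applied.map((cs) => cs.id);
}
```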

Testing Migrations

Contract Testing is Critical

89% of successful strangler projects used contract testing; only 14% of failed projects did. Contract testing validates API agreements between consumer and provider services.
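During a strangler migration, the useful property of a consumer contract is that the same contract can be checked against both the legacy endpoint and its replacement. Dedicated tools such as Pact handle this at scale; the sketch below only shows the core check, with hypothetical names (`Contract`, `verifyContract`) and a deliberately simple contract format of field names mapped to JSON types. Extra fields in a response are allowed, because a consumer should pin only what it actually uses.

```typescript
type JsonType = "string" | "number" | "boolean" | "object" | "array" | "null";

// Consumer-side contract: each field the consumer reads, with its JSON type.
type Contract = Record<string, JsonType>;

// JSON type of a parsed value, or "missing" if absent or not a JSON value.
function jsonTypeOf(value: unknown): JsonType | "missing" {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  const t = typeof value;
  if (t === "string" || t === "number" || t === "boolean" || t === "object") return t;
  return "missing";
}

// List every contract violation in a provider response (empty list = pass).
function verifyContract(contract: Contract, response: Record<string, unknown>): string[] {
  const violations: string[] = [];
  for (const [field, expected] of Object.entries(contract)) {
    const actual = jsonTypeOf(response[field]);
    if (actual !== expected) {
      violations.push(`${field}: expected ${expected}, got ${actual}`);
    }
  }
  return violations;
}
```

Running `verifyContract` against a recorded legacy response and the new service's response in the same test suite catches drift, such as a price serialized as a string, before any traffic is shifted.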

Test Distribution Shift During Migration

Test Type	Monolith %	During Strangler %	Post-Migration %
Unit (within service)	40%	55%	70%
Contract (API agreements)	5%	35%	25%
Integration (DB/external)	25%	8%	4%
E2E (full system)	30%	2%	1%

Testing Checklist

All migration scripts must be versioned
Monitor performance for bottlenecks, and be prepared to change approach
Run automated reconciliation continuously (8,064 runs over 14 months in one case study)
Set kill criteria: 4 consecutive weeks with 0 mismatches before decommissioning
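The kill criterion in the last item is easy to make mechanical. A minimal sketch, assuming reconciliation results have already been aggregated into mismatch counts per week (oldest first); `zeroMismatchStreak` and `readyToDecommission` are hypothetical names, and the default of 4 weeks mirrors the criterion above.

```typescript
// Number of consecutive most-recent weeks with zero mismatches.
function zeroMismatchStreak(weeklyMismatches: number[]): number {
  let streak = 0;
  for (let i = weeklyMismatches.length - 1; i >= 0 && weeklyMismatches[i] === 0; i--) {
    streak++;
  }
  return streak;
}

// Legacy can be retired once the streak reaches the required number of weeks.
function readyToDecommission(weeklyMismatches: number[], requiredWeeks = 4): boolean {
  return zeroMismatchStreak(weeklyMismatches) >= requiredWeeks;
}
```

Note that a single mismatch resets the streak, so a week with one discrepancy after three clean weeks starts the four-week count over.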

Real Migration Case Studies

Case Study 1: Insurance SaaS Pricing Engine Migration

Project Details:

Codebase: 380,418 LOC VB6 + 127 SQL Server stored procedures
Timeline: 14 months (Feb 2024 - Apr 2025)
Team: 5 engineers (3 backend, 1 DevOps, 1 QA)
Cost: $1.24M

Before and After Metrics:

Metric	Before (VB6)	After (.NET 8)	Improvement
Median latency	1,247ms	32ms	-97.4%
P99 latency	4,820ms	78ms	-98.4%
Error rate	0.18%	0.004%	-97.8%
Infrastructure cost	$840K/year	$160K/year	-$680K
Deployment frequency	4x/year	52x/year	+1,200%
Pricing bug MTTR	4.2 days	1.8 hours	-98%

Reconciliation Loop That Saved $4.2M:

8,064 automated reconciliation runs over 14 months
12.4M pricing calculations processed
847 mismatches detected (0.007% of events)
Mismatches: rounding differences (49%), tax calculation edge cases (26%), undocumented business rules (15%), timezone handling (8%)

Without automated reconciliation, an estimated $4.2M in incorrect quotes and invoices would have been issued over the 14 months.

Case Study 2: OTT Streaming Platform Microservices Migration

Successfully migrated legacy systems to microservices using UUID mapping strategy for data syncing
Used SNS, SQS, and Dead Letter Queues for data synchronization and versioning
The new system supports many more features, better content management, and streamlined publishing

Case Study 3: Global EHS Platform to ServiceNow

12-month migration of a global EHS platform from an outdated SaaS product to ServiceNow
Saved 20% while completing migration
95% of workflows worked as expected; the remaining 5% required post-deployment adjustments

5 Lessons That Made the Difference:

Stakeholder mapping first: defined decision-making authority
Vendor verification: validated workflows in controlled test environments
Embedding the business in DevOps: closed the gap between requirements and delivery
Smart scope management: tailored adoption by audience
Deliverable-based success metrics: measurable outcomes instead of abstract goals

Why Most Strangler Attempts Fail (Research Data)

Analysis of 41 enterprise strangler projects (2022-2025) found 68% stalled before 90 days, never replacing their first monolith component.

Failure Analysis: 28 Stalled Projects

Failure Mode	% of Projects	Median Time to Stall	Primary Symptom
Started at UI layer	39%	68 days	New UI tightly coupled to legacy backend
No stable semantic boundary	32%	72 days	Endless scope creep
Treated legacy DB as immutable	18%	104 days	Dual-write hell, data sync failures
Underestimated observability	7%	45 days	Can't debug distributed system
Cultural resistance (no DevOps)	4%	38 days	Teams refuse to own new services

Key Finding: Projects that extracted less than 5% of monolith functionality in the first 90 days had a 92% failure rate.

Anti-Patterns to Avoid

Anti-Pattern 1: Trying to Strangle at UI Layer First

Built a new React admin panel to replace the legacy JSP UI
The new UI made 47 API calls to the legacy monolith for a single page load
Result: the "modern" UI inherited all of the legacy system's performance and reliability issues
Sunk cost: $680K

Anti-Pattern 2: No Stable Semantic Boundary → Scope Creep

A "Customer Service" extraction pulled in the Order, Fulfillment, and Returns domains over 8 weeks
The service ended up depending on 6 domains, and the team had to start over

Anti-Pattern 3: Read-Only Modernization (Duplicate Writes Hell)

Dual-Write Strategy	Success Rate	Median Incidents/Month
No reconciliation	12%	38
Manual reconciliation	34%	12
Automated reconciliation	87%	0.4

9 Non-Negotiable Prerequisites for Success

Prerequisite	% of Successful Projects	% of Failed Projects
Comprehensive test coverage for legacy	100%	18%
Stable semantic boundary (DDD)	100%	25%
Stakeholder buy-in for dual operations	92%	11%
Single accountable owner	92%	14%
Robust monitoring already in place	85%	7%
Data ownership plan documented	100%	21%
Interception point identified	100%	32%
Reconciliation strategy designed	92%	4%
Kill-switch (instant rollback)	85%	11%

Critical Insight: Projects missing more than 2 prerequisites had a 94% failure rate.

When to Use Strangler vs. Big Bang

Factor	Strangler Fig (successes/projects)	Big Bang (successes/projects)
Modular business logic	22/22 success	2/8 success
Big ball of mud (no boundaries)	1/9 success	4/6 success
Deep team knowledge of legacy	18/19 success	3/7 success
Black box (no docs, devs gone)	2/8 success	5/9 success
Business-critical (no downtime)	20/21 success	0/4 success

Overall Outcomes from 41 Projects (2022-2025):

Approach	Projects	Success Rate	Median Timeline	Median Cost
Strangler Fig	29	76% (22/29)	16.2 months	$1.8M
Big Bang Rewrite	12	50% (6/12)	22.8 months	$3.4M

Key Lessons Learned

Start with the lowest-risk, highest-signal endpoint, such as authentication or a simple read-only API
Ship something to the modern system within the first two weeks: demonstrating that the approach works builds confidence faster than any slide deck
Budget twice as long for data migration as for code migration, because reconciliation reveals undocumented assumptions
The hardest part is political, not technical: frame the work as "incrementally modernizing" while shipping new features, not as a "rewrite"
Early velocity is the strongest predictor of success: extract more than 5% of functionality in the first 90 days
Deploy comprehensive monitoring before starting; you can't debug what you can't observe
Use contract testing: 89% of successful projects used it
Keep the legacy system running for a month after migration completes, as a rollback safety net
Avoid starting at the UI layer: the UI is the tip of the iceberg, and strangling it without owning the backend logic just creates a distributed frontend on top of the monolith
Be patient: the strangler fig wraps around slowly. The best migration is one where the biggest compliment is "wait, that's done already?"

The strangler fig pattern provides a controlled, phased approach to modernization that allows the existing application to continue functioning during the migration effort. With proper planning, feature flags, automated reconciliation, and stakeholder buy-in, you can modernize legacy systems incrementally with lower risk and disruption.

Rizwan Saleem — https://rizwansaleem.co