How to migrate legacy systems incrementally without breaking everything
Legacy System Migration: The Strangler Fig Pattern Complete Tutorial
Why Big-Bang Rewrites Fail
Big-bang rewrites fail for predictable reasons: the legacy system keeps evolving during the rewrite (a moving target), undocumented edge cases hidden in production code surface too late, teams lose motivation after months with nothing shipped, and maintaining two systems becomes permanent. When someone proposes a full rewrite of a production system, the default answer should be "no". The burden of proof is on demonstrating why incremental migration won't work.
The Strangler Fig Pattern Explained
The strangler fig pattern incrementally migrates a legacy system by gradually replacing specific functionality with new applications while keeping the old system running. Named after the tree that grows around a host tree and eventually replaces it, this approach lets you modernize piece by piece rather than attempting a risky big-bang rewrite.
How It Works
| Phase | What Happens | Duration |
|---|---|---|
| Setup | Place routing layer (proxy/façade) in front of legacy system; route 100% traffic to legacy | 2 weeks |
| Migrate Incrementally | Move one feature at a time to new system, starting with lowest-risk endpoint | 2-6 months |
| Decommission | When all traffic flows through the new system, keep legacy running as a rollback target for a month, then turn it off | 2 weeks of teardown, after the rollback window |
The proxy layer intercepts requests and routes them to either the legacy system or new services. From the client's perspective, nothing changes: same URLs, same request shapes, same authentication.
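The routing decision itself can be very small. The sketch below is only an illustration of the idea, not a production proxy: the upstream URLs, the `StranglerRouter` class, and the path prefixes are all hypothetical, and a real deployment would usually express the same table in a reverse proxy or load balancer configuration.

```python
from dataclasses import dataclass, field

# Hypothetical upstream addresses, used only for illustration.
LEGACY = "http://legacy.internal"
MODERN = "http://modern.internal"


@dataclass
class StranglerRouter:
    """Chooses an upstream per request path; clients keep using the same URLs."""

    migrated_prefixes: set[str] = field(default_factory=set)

    def migrate(self, prefix: str) -> None:
        # Start sending this path prefix to the new system.
        self.migrated_prefixes.add(prefix)

    def rollback(self, prefix: str) -> None:
        # Send this path prefix back to the legacy system.
        self.migrated_prefixes.discard(prefix)

    def upstream_for(self, path: str) -> str:
        for prefix in self.migrated_prefixes:
            # Match the prefix itself or anything below it, but not
            # look-alike paths such as "/invoices-archive".
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return MODERN
        return LEGACY  # default: everything not yet migrated stays on legacy


router = StranglerRouter()
router.migrate("/invoices")
print(router.upstream_for("/invoices/42"))  # routed to MODERN
print(router.upstream_for("/customers/7"))  # still routed to LEGACY
```

Defaulting to the legacy upstream means a missing or mistyped route falls back to the existing behaviour rather than to an unfinished service.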
Step-by-Step Implementation
1. Map Your Legacy System
Identify major capabilities and boundaries between them. Find where you could intercept requests. Apply criteria for your first candidate: bounded capability, valuable, and low-risk.
2. Build the Routing Layer First
Before migrating anything, set up the infrastructure to route requests between systems. For web applications, use a reverse proxy or load balancer. Initially forward 100% of requests to the legacy system and verify the routing layer doesn't break anything.
3. Develop New Components in Isolation
For each slice, develop a new component replicating the old functionality with modern technologies. Start with a relatively isolated piece of functionality.
4. Route Traffic Gradually
Once a new component is ready, implement routing through the indirection layer instead of the old component. Increase traffic in stages: 1%, 10%, 50%, 100%.
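One common way to implement staged percentages is to hash each user into a stable bucket, so the same user keeps getting the same answer and raising the percentage only ever adds users. The following is a minimal sketch under that assumption; the function names and the `feature` key are illustrative, not taken from any particular feature-flag product.

```python
import hashlib


def bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def use_new_component(user_id: str, feature: str, rollout_percent: int) -> bool:
    """True if this user should be routed to the new component."""
    return bucket(user_id, feature) < rollout_percent


# Walk through the stages from the text: 1%, 10%, 50%, 100%.
users = [f"user-{i}" for i in range(10_000)]
for percent in (1, 10, 50, 100):
    routed = sum(use_new_component(u, "pricing-endpoint", percent) for u in users)
    print(f"{percent:>3}% target: {routed} of {len(users)} users on the new component")
```

Including the feature name in the hash keeps the cohorts for different endpoints independent, so the same small group of users is not the first to try every migrated slice.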
5. Iterate and Retire Old Components
Repeat the cycle for each additional function. As new components take over, old components become redundant and can be retired.
Feature Flags for Migration Control
Feature flags are essential for managing transitions and mitigating migration risk. Don't flip an endpoint from legacy to modern all at once. Use feature flags to shift traffic gradually:
| Week | Traffic to Modern System | What You're Watching |
|---|---|---|
| 1 | Internal team only | Error rates, response times, data consistency |
| 2 | 5% of users (smallest accounts) | Edge cases, query performance at small scale |
| 3 | 25% of users | Database load, cache hit rates |
| 4 | 50% of users | Aggregate metrics vs. legacy baseline |
| 5 | 90% of users | Long-tail edge cases |
| 6 | 100% | Monitoring period, legacy route removed |
At any point, if you see errors or performance degradation, flip the flag back instantly. Start with your internal team, then move to your smallest and newest users: they have the least data, the fewest edge cases, and the highest tolerance for issues. Migrate the largest, most complex accounts last.
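The "flip it back" rule can be automated so that nobody has to be watching a dashboard. The sketch below assumes a hypothetical `MigrationFlag` object; the stage values mirror the table above (after the internal-team week), while the error-rate and latency thresholds are placeholder assumptions you would tune against your own legacy baseline.

```python
from dataclasses import dataclass

# Percent-of-users stages from the rollout table (weeks 2 to 6).
STAGES = (5, 25, 50, 90, 100)


@dataclass
class MigrationFlag:
    rollout_percent: int = 0
    max_error_rate: float = 0.01    # assumed threshold: 1% of requests
    max_latency_ratio: float = 1.5  # assumed: modern p99 may be at most 1.5x legacy p99

    def advance(self) -> int:
        """Move to the next stage, if there is one, and return the new percentage."""
        for stage in STAGES:
            if stage > self.rollout_percent:
                self.rollout_percent = stage
                break
        return self.rollout_percent

    def check_health(self, error_rate: float, modern_p99_ms: float,
                     legacy_p99_ms: float) -> bool:
        """Return True if healthy; otherwise send all traffic back to legacy."""
        healthy = (
            error_rate <= self.max_error_rate
            and modern_p99_ms <= legacy_p99_ms * self.max_latency_ratio
        )
        if not healthy:
            self.rollout_percent = 0  # kill switch: flip the flag back
        return healthy


flag = MigrationFlag()
flag.advance()  # 5% of users
flag.advance()  # 25% of users
flag.check_health(error_rate=0.05, modern_p99_ms=80, legacy_p99_ms=100)
print(flag.rollout_percent)  # 0, because the error rate exceeded the threshold
```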
Data Migration Strategies
Code migration is straightforward compared to data migration. Code is stateless and data is stateful: once you've written to a new schema, rolling back means migrating data backwards.
Dual-Write with Reconciliation (Proven Approach)
- The new system writes to the new database (source of truth going forward)
- It also writes to the legacy database (keeps legacy system consistent during transition)
- A background job migrates historical data in batches
- A reconciliation job compares data between old and new, flagging discrepancies
The reconciliation job is where you'll discover every undocumented assumption in the legacy system: timezone conversions, currency rounding differences, nullable fields handled inconsistently. Budget twice as long as you think for data migration. In one project, code migration took 3 months while data migration and reconciliation took 4.
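The four moving parts listed above fit in a few functions. This is a deliberately simplified sketch: plain dictionaries stand in for the two databases, all function names are hypothetical, and it ignores the hard parts of real dual writes (partial failures, ordering, retries).

```python
# In-memory stand-ins for the two databases (assumption for illustration).
legacy_db: dict[str, dict] = {}
new_db: dict[str, dict] = {}


def dual_write(record_id: str, record: dict) -> None:
    """Write to the new store (source of truth) and mirror to legacy."""
    new_db[record_id] = dict(record)
    legacy_db[record_id] = dict(record)


def backfill(batch_ids: list[str]) -> None:
    """Copy one batch of historical rows without overwriting newer writes."""
    for record_id in batch_ids:
        if record_id in legacy_db:
            new_db.setdefault(record_id, dict(legacy_db[record_id]))


def reconcile() -> list[tuple]:
    """Return (id, field, legacy_value, new_value) for every discrepancy."""
    mismatches = []
    for record_id in sorted(legacy_db.keys() | new_db.keys()):
        old, new = legacy_db.get(record_id), new_db.get(record_id)
        if old is None or new is None:
            mismatches.append((record_id, "<missing>", old, new))
            continue
        for key in sorted(old.keys() | new.keys()):
            if old.get(key) != new.get(key):
                mismatches.append((record_id, key, old.get(key), new.get(key)))
    return mismatches


legacy_db["q-1"] = {"total": "10.005"}   # historical row, not yet migrated
dual_write("q-2", {"total": "3.00"})     # new write lands in both stores
print(reconcile())                       # reports q-1 as missing from the new store
backfill(["q-1"])
print(reconcile())                       # empty list once the backfill has run
```

Running the reconciliation before the backfill has finished, as here, is intentional: it shows that the job reports gaps as well as value differences.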
Data Migration Strategy Options
| Strategy | Description | Downtime | Risk |
|---|---|---|---|
| Big Bang | All data transferred in one go during planned downtime | High | High |
| Trickle Migration | Data moved gradually in phases; source and target run in parallel | Minimal | Lower |
| Incremental Transfer | Transfer data in batches instead of all at once | Minimal | Moderate |
| Live Migration | Database replication, active-active failover, zero-downtime deployment | None | Lowest |
Best Practices for Data Integrity
- Use checksums and record counts to confirm data integrity
- Preserve data integrity with version control for all migration scripts
- Document everything and keep complete logs for root cause analysis
- Start reconciliation early, in the first month, to discover mapping bugs sooner
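As an illustration of the first point, record counts and an order-independent checksum can be computed the same way on both sides and compared. The sketch below assumes rows arrive as dictionaries with a unique key column; `table_fingerprint` is a hypothetical helper, and a real implementation would normalise types (dates, decimals, nulls) before hashing.

```python
import hashlib
from collections.abc import Iterable


def table_fingerprint(rows: Iterable[dict], key: str) -> tuple[int, str]:
    """Return (row_count, checksum) for a table, independent of row order."""
    digest = hashlib.sha256()
    count = 0
    for row in sorted(rows, key=lambda r: r[key]):
        # Canonical form: columns in name order, values via repr().
        canonical = "|".join(f"{col}={row[col]!r}" for col in sorted(row))
        digest.update(canonical.encode("utf-8"))
        digest.update(b"\n")
        count += 1
    return count, digest.hexdigest()


source = [{"id": 1, "total": "10.00"}, {"id": 2, "total": "7.50"}]
target = [{"id": 2, "total": "7.50"}, {"id": 1, "total": "10.00"}]  # same data, other order
print(table_fingerprint(source, "id") == table_fingerprint(target, "id"))  # True
```

The count catches dropped or duplicated rows cheaply; the checksum catches value drift when the counts agree.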
Minimizing and Handling Downtime
Pre-Migration Planning
- Develop a comprehensive migration plan with detailed schedules and resource allocations
- Perform migrations during off-peak hours or planned maintenance windows
- Establish clear fallback and rollback procedures thoroughly tested before migration begins
Live Migration Techniques
Employ database replication, active-active failover configurations, and zero-downtime deployment strategies to keep operations running without interruption. AWS Database Migration Service, for example, supports ongoing replication (change data capture) so the source and target stay in sync during the transition.
Communication and Coordination
Maintain open lines of communication with all stakeholders: IT staff, end-users, and business leaders.
Rollback Plans
Instant Rollback with Feature Flags
The psychological advantage of the strangler fig pattern is that every step is reversible: if you see errors, flip the feature flag back instantly.
Legacy System as Rollback Target
Keep the legacy system running in production for a month or more after you think you're "done", serving zero traffic but available as a rollback target. Decommission it when you're confident the new system handles every edge case.
Database Rollback
Each ChangeSet (the unit of schema change in migration tools such as Liquibase) should include rollback logic defined in the migration script. If a migration fails, the tooling can then invoke those rollback commands and restore the database schema to its previous stable state. Pre-defined checkpoints should be established where migration can be paused or rolled back if necessary.
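To make the idea concrete, here is a small sketch of changesets that carry their own rollback statements, run against an in-memory SQLite database. It only illustrates the principle and does not reproduce the API of Liquibase or any other migration tool; the `ChangeSet` class, the `migrate` function, and the table names are assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSet:
    id: str
    apply_sql: str
    rollback_sql: str  # how to undo apply_sql


def migrate(conn: sqlite3.Connection, changesets: list[ChangeSet]) -> list[str]:
    """Apply changesets in order; on failure, undo the applied ones in reverse."""
    applied: list[ChangeSet] = []
    try:
        for changeset in changesets:
            conn.execute(changeset.apply_sql)
            applied.append(changeset)
    except sqlite3.Error:
        for changeset in reversed(applied):
            conn.execute(changeset.rollback_sql)
        return []  # nothing remains applied
    return [changeset.id for changeset in applied]


def table_names(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(name for (name,) in rows)


conn = sqlite3.connect(":memory:")
good = ChangeSet("001", "CREATE TABLE policy (id INTEGER PRIMARY KEY)", "DROP TABLE policy")
bad = ChangeSet("002", "ALTER TABLE missing_table ADD COLUMN x TEXT", "SELECT 1")

print(migrate(conn, [good, bad]))  # [] because changeset 002 fails
print(table_names(conn))           # [] because changeset 001 was rolled back
```

Schema rollback of this kind restores structure, not data; rows written under the new schema still need the reconciliation and backfill paths described earlier.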
Testing Migrations
Contract Testing is Critical
89% of successful strangler projects used contract testing; only 14% of failed projects did. Contract testing validates API agreements between consumer and provider services.
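In a strangler migration the practical value is that the same contract can be run against both the legacy endpoint and its replacement. The sketch below is a hand-rolled illustration with a hypothetical `CONTRACT` for a quote endpoint; dedicated tools such as Pact cover far more (request matching, provider states, contract sharing between teams).

```python
# Hypothetical consumer expectation: required fields and their types.
CONTRACT = {"quote_id": str, "premium": float, "currency": str}


def contract_violations(response: dict, contract: dict[str, type]) -> list[str]:
    """List every way a provider response breaks the consumer's contract."""
    problems = []
    for name, expected in contract.items():
        if name not in response:
            problems.append(f"missing field: {name}")
        elif not isinstance(response[name], expected):
            actual = type(response[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems  # extra fields are allowed; consumers ignore them


legacy_response = {"quote_id": "Q-17", "premium": 412.5, "currency": "USD"}
modern_response = {"quote_id": "Q-17", "premium": "412.50", "currency": "USD", "version": 2}

print(contract_violations(legacy_response, CONTRACT))  # no violations
print(contract_violations(modern_response, CONTRACT))  # premium has the wrong type
```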
Test Distribution Shift During Migration
| Test Type | Monolith % | During Strangler % | Post-Migration % |
|---|---|---|---|
| Unit (within service) | 40% | 55% | 70% |
| Contract (API agreements) | 5% | 35% | 25% |
| Integration (DB/external) | 25% | 8% | 4% |
| E2E (full system) | 30% | 2% | 1% |
Testing Checklist
- All migration scripts must be versioned
- Monitor performance for bottlenecks and be prepared to change approach
- Perform automated reconciliation runs continuously (8,064 runs over 14 months in one case study)
- Set kill criteria: 4 consecutive weeks with 0 mismatches before decommissioning
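The kill criterion in the last item is easy to encode so the decommission decision is mechanical rather than a judgment call. A minimal sketch, assuming reconciliation results are already aggregated into one mismatch count per week (the function name is hypothetical):

```python
def ready_to_decommission(weekly_mismatches: list[int],
                          required_clean_weeks: int = 4) -> bool:
    """True only if the most recent N weeks all had zero mismatches."""
    if len(weekly_mismatches) < required_clean_weeks:
        return False  # not enough history yet
    return all(count == 0 for count in weekly_mismatches[-required_clean_weeks:])


print(ready_to_decommission([12, 3, 0, 0, 0, 0]))  # True: four clean weeks in a row
print(ready_to_decommission([0, 0, 0, 1, 0, 0]))   # False: the streak was broken
```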
Real Migration Case Studies
Case Study 1: Insurance SaaS Pricing Engine Migration
Project Details:
- Codebase: 380,418 LOC VB6 + 127 SQL Server stored procedures
- Timeline: 14 months (Feb 2024 - Apr 2025)
- Team: 5 engineers (3 backend, 1 DevOps, 1 QA)
- Cost: $1.24M
Before and After Migration Metrics:
| Metric | Before (VB6) | After (.NET 8) | Improvement |
|---|---|---|---|
| Median latency | 1,247ms | 32ms | -97.4% |
| P99 latency | 4,820ms | 78ms | -98.4% |
| Error rate | 0.18% | 0.004% | -97.8% |
| Infrastructure cost | $840K/year | $160K/year | -$680K |
| Deployment frequency | 4x/year | 52x/year | +1,200% |
| Pricing bug MTTR | 4.2 days | 1.8 hours | -98.2% |
Reconciliation Loop That Saved $4.2M:
- 8,064 automated reconciliation runs over 14 months
- 12.4M pricing calculations processed
- 847 mismatches detected (0.007% of events)
- Mismatches: rounding differences (49%), tax calculation edge cases (26%), undocumented business rules (15%), timezone handling (8%)
Without automated reconciliation, a projected $4.2M in incorrect quotes and invoices would have gone out over the 14 months.
Case Study 2: OTT Streaming Platform Microservices Migration
- Successfully migrated legacy systems to microservices using UUID mapping strategy for data syncing
- Used SNS, SQS, and Dead Letter Queues for data synchronization and versioning
- New system supports myriad features, better content management, streamlined publishing mechanics
Case Study 3: Global EHS Platform to ServiceNow
- 12-month migration of global EHS platform off outdated SaaS to ServiceNow
- Saved 20% while completing migration
- 95% of workflows worked as expected, remaining 5% required post-deployment adjustments
5 Lessons That Made the Difference:
- Stakeholder mapping first: defined decision-making authority
- Vendor verification: validated workflows in controlled test environments
- Embedding business in DevOps: closed the gap between requirements and delivery
- Smart scope management: tailored adoption by audience
- Deliverable-based success metrics: measurable outcomes instead of abstract goals
Why Most Strangler Attempts Fail (Research Data)
Analysis of 41 enterprise strangler projects (2022-2025) found 68% stalled before 90 days, never replacing their first monolith component.
Failure Analysis: 28 Stalled Projects
| Failure Mode | % of Projects | Median Time to Stall | Primary Symptom |
|---|---|---|---|
| Started at UI layer | 39% | 68 days | New UI tightly coupled to legacy backend |
| No stable semantic boundary | 32% | 72 days | Endless scope creep |
| Treated legacy DB as immutable | 18% | 104 days | Dual-write hell, data sync failures |
| Underestimated observability | 7% | 45 days | Can't debug distributed system |
| Cultural resistance (no DevOps) | 4% | 38 days | Teams refuse to own new services |
Key Finding: Projects that extracted <5% of monolith functionality in the first 90 days had a 92% failure rate.
Anti-Patterns to Avoid
Anti-Pattern 1: Trying to Strangle at UI Layer First
- Built new React admin panel to replace legacy JSP UI
- New UI made 47 API calls to legacy monolith for single page load
- Result: "Modern" UI inherited all legacy performance/reliability issues
- Cost: $680K sunk
Anti-Pattern 2: No Stable Semantic Boundary → Scope Creep
- "Customer Service" extraction pulled in Order, Fulfillment, Returns domains over 8 weeks
- Service now depends on 6 domains, had to start over
Anti-Pattern 3: Read-Only Modernization (Duplicate Writes Hell)
| Dual-Write Strategy | Success Rate | Median Incidents/Month |
|---|---|---|
| No reconciliation | 12% | 38 |
| Manual reconciliation | 34% | 12 |
| Automated reconciliation | 87% | 0.4 |
9 Non-Negotiable Prerequisites for Success
| Prerequisite | % of Successful Projects | % of Failed Projects |
|---|---|---|
| Comprehensive test coverage for legacy | 100% | 18% |
| Stable semantic boundary (DDD) | 100% | 25% |
| Stakeholder buy-in for dual operations | 92% | 11% |
| Single accountable owner | 92% | 14% |
| Robust monitoring already in place | 85% | 7% |
| Data ownership plan documented | 100% | 21% |
| Interception point identified | 100% | 32% |
| Reconciliation strategy designed | 92% | 4% |
| Kill-switch (instant rollback) | 85% | 11% |
Critical Insight: Projects missing >2 prerequisites had a 94% failure rate.
When to Use Strangler vs. Big Bang
| Factor | Strangler Fig (successes/attempts) | Big Bang (successes/attempts) |
|---|---|---|
| Modular business logic | 22/22 success | 2/8 success |
| Big ball of mud (no boundaries) | 1/9 success | 4/6 success |
| Deep team knowledge of legacy | 18/19 success | 3/7 success |
| Black box (no docs, devs gone) | 2/8 success | 5/9 success |
| Business-critical (no downtime) | 20/21 success | 0/4 success |
Overall Outcomes from 41 Projects (2022-2025):
| Approach | Projects | Success Rate | Median Timeline | Median Cost |
|---|---|---|---|---|
| Strangler Fig | 29 | 76% (22/29) | 16.2 months | $1.8M |
| Big Bang Rewrite | 12 | 50% (6/12) | 22.8 months | $3.4M |
Key Lessons Learned
- Start with the lowest-risk, highest-signal endpoint, like authentication or a simple read-only API
- Ship something to the modern system within the first two weeks. Demonstrating that the approach works builds confidence faster than any slide deck
- Budget twice as long for data migration as code migration, because reconciliation reveals undocumented assumptions
- The hardest part is political, not technical: frame the work as "incrementally modernizing" while shipping new features, not as a "rewrite"
- Early velocity is the strongest predictor of success: extract >5% of functionality in the first 90 days
- Deploy comprehensive monitoring before starting, because you can't debug what you can't observe
- Use contract testing: 89% of successful projects used it
- Keep the legacy system running for a month after migration completes, as a rollback safety net
- Avoid starting at the UI layer. The UI is the tip of the iceberg, and strangling it without owning the backend logic creates a distributed frontend that is still coupled to the legacy backend
- Be patient: the strangler fig wraps around slowly. The best migration is one where the biggest compliment is "wait, that's done already?"
The strangler fig pattern provides a controlled, phased approach to modernization that allows the existing application to continue functioning during the migration effort. With proper planning, feature flags, automated reconciliation, and stakeholder buy-in, you can modernize legacy systems incrementally with lower risk and disruption.
Rizwan Saleem — https://rizwansaleem.co