Rizwan Saleem

How to migrate legacy systems incrementally without breaking everything

Legacy System Migration: The Strangler Fig Pattern Complete Tutorial

Why Big-Bang Rewrites Fail

Big-bang rewrites fail for predictable reasons: the legacy system keeps evolving during the rewrite, so the target keeps moving; undocumented edge cases hidden in production code surface too late; teams lose motivation after months with nothing shipped; and maintaining two systems in parallel becomes permanent. When someone proposes a full rewrite of a production system, the default answer should be "no": the burden of proof is on demonstrating why incremental migration won't work.

The Strangler Fig Pattern Explained

The strangler fig pattern incrementally migrates a legacy system by gradually replacing specific functionality with new applications while keeping the old system running. Named after the tree that grows around a host tree and eventually replaces it, this approach lets you modernize piece by piece rather than attempting a risky big-bang rewrite.

How It Works

Phase | What Happens | Duration
Setup | Place routing layer (proxy/façade) in front of legacy system; route 100% of traffic to legacy | 2 weeks
Migrate Incrementally | Move one feature at a time to new system, starting with lowest-risk endpoint | 2-6 months
Decommission | When all traffic flows through new system, keep legacy running as rollback target for a month, then turn off | 2 weeks

The proxy layer intercepts requests and routes them to either the legacy system or new services. From the client's perspective, nothing changes: same URLs, same request shapes, same authentication.
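
The routing decision described above can be sketched in a few lines. This is a minimal illustration, not a production proxy: the upstream URLs, the prefix list, and the `choose_upstream` function are all hypothetical names, and in practice this logic usually lives in reverse-proxy or load-balancer configuration.

```python
# Hypothetical upstream addresses; substitute your own.
LEGACY_UPSTREAM = "http://legacy.internal:8080"
MODERN_UPSTREAM = "http://modern.internal:9090"

# Path prefixes that have already been migrated (illustrative names).
MIGRATED_PREFIXES = ["/api/reports", "/api/health"]


def choose_upstream(path: str, migrated_prefixes=MIGRATED_PREFIXES) -> str:
    """Return the upstream that should serve this request path.

    Anything not explicitly migrated falls through to the legacy system,
    so an empty prefix list means 100% of traffic goes to legacy.
    """
    for prefix in migrated_prefixes:
        # Match the prefix itself or anything beneath it, but not
        # look-alike paths such as /api/reports-old.
        if path == prefix or path.startswith(prefix + "/"):
            return MODERN_UPSTREAM
    return LEGACY_UPSTREAM
```

Defaulting to legacy is the important design choice here: a path nobody has thought about yet keeps its old behaviour.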

Step-by-Step Implementation

1. Map Your Legacy System

Identify major capabilities and boundaries between them. Find where you could intercept requests. Apply criteria for your first candidate: bounded capability, valuable, and low-risk.

2. Build the Routing Layer First

Before migrating anything, set up the infrastructure to route requests between systems. For web applications, use a reverse proxy or load balancer. Initially forward 100% of requests to the legacy system and verify the routing layer doesn't break anything.

3. Develop New Components in Isolation

For each slice, develop a new component replicating the old functionality with modern technologies. Start with a relatively isolated piece of functionality.

4. Route Traffic Gradually

Once a new component is ready, implement routing through the indirection layer instead of the old component. Increase traffic in stages: 1%, 10%, 50%, 100%.
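
One common way to implement staged percentages is to hash a stable identifier into a bucket, so that a given user always gets the same answer and stays on the new component as the percentage grows. The sketch below assumes requests carry a user ID; `bucket_for` and `routes_to_new` are illustrative names, not part of any particular proxy or flag product.

```python
import hashlib


def bucket_for(user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def routes_to_new(user_id: str, rollout_percent: int) -> bool:
    """True if this user falls inside the current rollout percentage."""
    return bucket_for(user_id) < rollout_percent
```

Because the bucket is fixed per user, everyone included at 1% is still included at 10% and 50%, which avoids users bouncing between the old and new implementations.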

5. Iterate and Retire Old Components

Repeat the cycle for each additional function. As new components take over, old components become redundant and can be retired.

Feature Flags for Migration Control

Feature flags are essential for managing transitions and mitigating migration risk. Don't flip an endpoint from legacy to modern all at once; use feature flags to shift traffic gradually:

Week | Traffic to Modern System | What You're Watching
1 | Internal team only | Error rates, response times, data consistency
2 | 5% of users (smallest accounts) | Edge cases, query performance at small scale
3 | 25% of users | Database load, cache hit rates
4 | 50% of users | Aggregate metrics vs. legacy baseline
5 | 90% of users | Long-tail edge cases
6 | 100% | Monitoring period, legacy route removed

At any point, if you see errors or performance degradation, flip the flag back instantly. Start with your internal team, then move to your smallest and newest users: they have the least data, the fewest edge cases, and the highest tolerance for issues. Migrate the largest, most complex accounts last.
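
As a rough sketch of that rollout order, the flag below checks a kill switch first, then internal users, then an allow-list of accounts chosen smallest-first. `MigrationFlag` and `smallest_accounts` are hypothetical, in-memory stand-ins for whatever flag service you actually use.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationFlag:
    """In-memory stand-in for a feature-flag service (hypothetical)."""
    enabled: bool = True  # kill switch: False sends everyone to legacy
    internal_users: set = field(default_factory=set)
    allowed_accounts: set = field(default_factory=set)

    def use_modern(self, user_id: str, account_id: str) -> bool:
        if not self.enabled:
            return False  # flag flipped back: instant rollback
        if user_id in self.internal_users:
            return True  # week 1: internal team only
        return account_id in self.allowed_accounts


def smallest_accounts(sizes: dict, fraction: float) -> set:
    """Pick the smallest accounts that make up `fraction` of all accounts."""
    ordered = sorted(sizes, key=lambda account: (sizes[account], account))
    count = int(len(ordered) * fraction)
    return set(ordered[:count])
```

Widening the rollout then means recomputing `allowed_accounts` with a larger fraction, and rolling back means setting `enabled` to `False`.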

Data Migration Strategies

Code migration is straightforward compared to data migration. Code is stateless; data is stateful. Once you've written to a new schema, rolling back means migrating data backwards.

Dual-Write with Reconciliation (Proven Approach)

  1. The new system writes to the new database (source of truth going forward)
  2. It also writes to the legacy database (keeps legacy system consistent during transition)
  3. A background job migrates historical data in batches
  4. A reconciliation job compares data between old and new, flagging discrepancies

The reconciliation job is where you'll discover every undocumented assumption in the legacy system: timezone conversions, currency rounding differences, nullable fields handled inconsistently. Budget twice as long as you think you need for data migration. In one project, code migration took 3 months while data migration and reconciliation took 4.
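
The dual-write and reconciliation steps can be illustrated with two dictionaries standing in for the two databases. This is a simplified sketch under that assumption: `dual_write` and `reconcile` are illustrative names, and a real implementation also has to decide what happens when the second write fails.

```python
def dual_write(record_id, record, new_db: dict, legacy_db: dict) -> None:
    """Write to the new store first (source of truth), then mirror to legacy."""
    new_db[record_id] = dict(record)
    legacy_db[record_id] = dict(record)


def reconcile(new_db: dict, legacy_db: dict) -> list:
    """Return (record_id, reason) pairs for every discrepancy found."""
    mismatches = []
    for record_id in sorted(set(new_db) | set(legacy_db)):
        if record_id not in new_db:
            mismatches.append((record_id, "missing in new"))
        elif record_id not in legacy_db:
            mismatches.append((record_id, "missing in legacy"))
        elif new_db[record_id] != legacy_db[record_id]:
            mismatches.append((record_id, "field mismatch"))
    return mismatches
```

In practice the comparison step usually needs normalisation rules (for example for rounding or timezones), and writing those rules down is how the undocumented assumptions get documented.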

Data Migration Strategy Options

Strategy | Description | Downtime | Risk
Big Bang | All data transferred in one go during planned downtime | High | High
Trickle Migration | Data moved gradually in phases; source and target run in parallel | Minimal | Lower
Incremental Transfer | Transfer data in batches instead of all at once | Minimal | Moderate
Live Migration | Database replication, active-active failover, zero-downtime deployment | None | Lowest
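
For the batch-based options, the core loop is small. The sketch below assumes the source can be read as an iterable of rows and that the caller supplies a `write_batch` callable for the target; both are placeholders, and the batch size of 500 is arbitrary.

```python
def migrate_in_batches(source_rows, write_batch, batch_size=500):
    """Copy rows to the target in fixed-size batches; return rows written."""
    batch, total = [], 0
    for row in source_rows:
        batch.append(row)
        if len(batch) == batch_size:
            write_batch(batch)
            total += len(batch)
            batch = []  # start a fresh list; the written batch is untouched
    if batch:
        # Flush the final, partially filled batch.
        write_batch(batch)
        total += len(batch)
    return total
```

A production job would typically also record the last migrated key after each batch so it can resume after a failure instead of starting over.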

Best Practices for Data Integrity

  • Use checksums and record counts to confirm data integrity
  • Preserve data integrity with version control for all migration scripts
  • Document everything and keep complete logs for root cause analysis
  • Start reconciliation early, in the first month, to discover mapping bugs sooner
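
A record count plus a content checksum is one simple way to act on the first point in this list. The sketch below assumes rows can be loaded as dictionaries and fit in memory; `table_fingerprint` is an illustrative name, and for large tables you would more likely compute this per key range or inside the database.

```python
import hashlib
import json


def table_fingerprint(rows):
    """Return (record_count, checksum) for an iterable of dict rows.

    Rows are serialized with sorted keys and then sorted themselves, so the
    result does not depend on column order or row order.
    """
    serialized = sorted(json.dumps(row, sort_keys=True, default=str) for row in rows)
    digest = hashlib.sha256()
    for line in serialized:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return len(serialized), digest.hexdigest()
```

If the fingerprints of the source and target tables match, the data agrees; if only the counts match, some row content differs and a row-level reconciliation is needed to find it.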

Minimizing and Handling Downtime

Pre-Migration Planning

  • Develop a comprehensive migration plan with detailed schedules and resource allocations
  • Perform migrations during off-peak hours or planned maintenance windows
  • Establish clear fallback and rollback procedures thoroughly tested before migration begins

Live Migration Techniques

Employ database replication, active-active failover configurations, and zero-downtime deployment strategies so operations continue without interruption. AWS Database Migration Service, for example, supports ongoing replication through change data capture, which keeps the target in sync during the transition.

Communication and Coordination

Maintain open lines of communication with all stakeholders: IT staff, end users, and business leaders.

Rollback Plans

Instant Rollback with Feature Flags

The psychological advantage of the strangler fig pattern is that every step is reversible: if you see errors, flip the feature flag back instantly.

Legacy System as Rollback Target

Keep the legacy system running in production for a month or more after you think you're "done", serving zero traffic but available as a rollback target. Decommission it when you're confident the new system handles every edge case.

Database Rollback

Each changeset (the unit of change in schema migration tools such as Liquibase) should include rollback logic defined alongside it in the migration script. If a migration fails, the migration tooling can then run the rollback commands and restore the database schema to its previous stable state. Pre-defined checkpoints should be established where migration can be paused or rolled back if necessary.
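
The idea of pairing every forward change with its rollback can be shown without a migration tool. The sketch below uses SQLite only because it ships with Python; the table names and the `apply_migrations` helper are illustrative and are not how any specific tool implements rollback.

```python
import sqlite3

# Each migration pairs forward SQL with the SQL that undoes it (illustrative).
MIGRATIONS = [
    ("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
     "DROP TABLE customers"),
    ("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)",
     "DROP TABLE orders"),
]


def apply_migrations(conn, migrations):
    """Apply migrations in order; on failure, undo the applied ones in reverse."""
    applied = []
    try:
        for forward_sql, rollback_sql in migrations:
            conn.execute(forward_sql)
            applied.append(rollback_sql)
        conn.commit()
        return True
    except sqlite3.Error:
        # Undo only what succeeded, newest first.
        for rollback_sql in reversed(applied):
            conn.execute(rollback_sql)
        conn.commit()
        return False
```

Schema rollbacks like these restore structure, not data; rows written under the new schema still need their own backward migration, which is why data rollback is the harder problem.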

Testing Migrations

Contract Testing is Critical

89% of successful strangler projects used contract testing; only 14% of failed projects did. Contract testing validates API agreements between consumer and provider services.
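
At its simplest, a contract is a list of fields and types that a consumer relies on, checked against both the legacy and the new provider. The sketch below is a hand-rolled illustration with a hypothetical `QUOTE_CONTRACT`; dedicated tools such as Pact cover far more (request matching, versioning, provider verification).

```python
# A hypothetical consumer contract: field name -> expected Python type.
QUOTE_CONTRACT = {"quote_id": str, "premium": float, "currency": str}


def contract_violations(response: dict, contract: dict) -> list:
    """List every way the response breaks the contract (empty if it conforms)."""
    problems = []
    for field_name, expected_type in contract.items():
        if field_name not in response:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(response[field_name], expected_type):
            actual = type(response[field_name]).__name__
            problems.append(
                f"{field_name}: expected {expected_type.__name__}, got {actual}"
            )
    return problems
```

Extra fields in the response are deliberately ignored, so the new service can add data without breaking existing consumers.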

Test Distribution Shift During Migration

Test Type | Monolith % | During Strangler % | Post-Migration %
Unit (within service) | 40% | 55% | 70%
Contract (API agreements) | 5% | 35% | 25%
Integration (DB/external) | 25% | 8% | 4%
E2E (full system) | 30% | 2% | 1%

Testing Checklist

  • All migration scripts must be versioned
  • Monitor performance for bottlenecks, and be prepared to change approach
  • Perform automated reconciliation runs continuously (8,064 runs over 14 months in one case study)
  • Set kill criteria: 4 consecutive weeks with 0 mismatches before decommissioning
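
The decommissioning criterion in the last item is easy to encode so that it is checked mechanically rather than by feel. The sketch assumes you keep a list of mismatch counts per week, oldest first; `ready_to_decommission` is an illustrative name.

```python
def ready_to_decommission(weekly_mismatches, required_clean_weeks=4):
    """True if the most recent `required_clean_weeks` weeks all had 0 mismatches."""
    if len(weekly_mismatches) < required_clean_weeks:
        return False  # not enough history yet
    recent = weekly_mismatches[-required_clean_weeks:]
    return all(count == 0 for count in recent)
```

For example, `ready_to_decommission([3, 0, 0, 0, 0])` is `True`, while a single mismatch in the latest week resets the clock.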

Real Migration Case Studies

Case Study 1: Insurance SaaS Pricing Engine Migration

Project Details:

  • Codebase: 380,418 LOC VB6 + 127 SQL Server stored procedures
  • Timeline: 14 months (Feb 2024 - Apr 2025)
  • Team: 5 engineers (3 backend, 1 DevOps, 1 QA)
  • Cost: $1.24M

Before and After Migration Metrics:

Metric | Before (VB6) | After (.NET 8) | Improvement
Median latency | 1,247ms | 32ms | -97.4%
P99 latency | 4,820ms | 78ms | -98.4%
Error rate | 0.18% | 0.004% | -97.8%
Infrastructure cost | $840K/year | $160K/year | -$680K/year
Deployment frequency | 4x/year | 52x/year | +1,200%
Pricing bug MTTR | 4.2 days | 1.8 hours | -96%

Reconciliation Loop That Saved $4.2M:

  • 8,064 automated reconciliation runs over 14 months
  • 12.4M pricing calculations processed
  • 847 mismatches detected (0.007% of events)
  • Mismatches: rounding differences (49%), tax calculation edge cases (26%), undocumented business rules (15%), timezone handling (8%)

Without automated reconciliation, a projected $4.2M in incorrect quotes and invoices would have gone out over the 14 months.

Case Study 2: OTT Streaming Platform Microservices Migration

  • Successfully migrated legacy systems to microservices using a UUID mapping strategy for data syncing
  • Used SNS, SQS, and Dead Letter Queues for data synchronization and versioning
  • New system supports more features, better content management, and streamlined publishing mechanics

Case Study 3: Global EHS Platform to ServiceNow

  • 12-month migration of global EHS platform off outdated SaaS to ServiceNow
  • Saved 20% while completing migration
  • 95% of workflows worked as expected; the remaining 5% required post-deployment adjustments

5 Lessons That Made the Difference:

  1. Stakeholder mapping first: defined decision-making authority
  2. Vendor verification: validated workflows in controlled test environments
  3. Embedding business in DevOps: closed the gap between requirements and delivery
  4. Smart scope management: tailored adoption by audience
  5. Deliverable-based success metrics: measurable outcomes instead of abstract goals

Why Most Strangler Attempts Fail (Research Data)

Analysis of 41 enterprise strangler projects (2022-2025) found 68% stalled before 90 days, never replacing their first monolith component.

Failure Analysis: 28 Stalled Projects

Failure Mode | % of Projects | Median Time to Stall | Primary Symptom
Started at UI layer | 39% | 68 days | New UI tightly coupled to legacy backend
No stable semantic boundary | 32% | 72 days | Endless scope creep
Treated legacy DB as immutable | 18% | 104 days | Dual-write hell, data sync failures
Underestimated observability | 7% | 45 days | Can't debug distributed system
Cultural resistance (no DevOps) | 4% | 38 days | Teams refuse to own new services

Key Finding: Projects that extracted <5% of monolith functionality in the first 90 days had a 92% failure rate.

Anti-Patterns to Avoid

Anti-Pattern 1: Trying to Strangle at UI Layer First

  • Built new React admin panel to replace legacy JSP UI
  • New UI made 47 API calls to legacy monolith for single page load
  • Result: "Modern" UI inherited all legacy performance/reliability issues
  • Cost: $680K sunk

Anti-Pattern 2: No Stable Semantic Boundary → Scope Creep

  • "Customer Service" extraction pulled in Order, Fulfillment, Returns domains over 8 weeks
  • Service now depends on 6 domains, had to start over

Anti-Pattern 3: Read-Only Modernization (Duplicate Writes Hell)

Dual-Write Strategy | Success Rate | Median Incidents/Month
No reconciliation | 12% | 38
Manual reconciliation | 34% | 12
Automated reconciliation | 87% | 0.4

9 Non-Negotiable Prerequisites for Success

Prerequisite | % of Successful Projects | % of Failed Projects
Comprehensive test coverage for legacy | 100% | 18%
Stable semantic boundary (DDD) | 100% | 25%
Stakeholder buy-in for dual operations | 92% | 11%
Single accountable owner | 92% | 14%
Robust monitoring already in place | 85% | 7%
Data ownership plan documented | 100% | 21%
Interception point identified | 100% | 32%
Reconciliation strategy designed | 92% | 4%
Kill-switch (instant rollback) | 85% | 11%

Critical Insight: Projects missing >2 prerequisites had a 94% failure rate.

When to Use Strangler vs. Big Bang

Factor | Strangler Fig | Big Bang
Modular business logic | 22/22 success | 2/8 success
Big ball of mud (no boundaries) | 1/9 success | 4/6 success
Deep team knowledge of legacy | 18/19 success | 3/7 success
Black box (no docs, devs gone) | 2/8 success | 5/9 success
Business-critical (no downtime) | 20/21 success | 0/4 success

Overall Outcomes from 41 Projects (2022-2025):

Approach | Projects | Success Rate | Median Timeline | Median Cost
Strangler Fig | 29 | 76% (22/29) | 16.2 months | $1.8M
Big Bang Rewrite | 12 | 50% (6/12) | 22.8 months | $3.4M

Key Lessons Learned

  1. Start with the lowest-risk, highest-signal endpoint, such as authentication or a simple read-only API
  2. Ship something to the modern system within the first two weeks; demonstrating that the approach works builds confidence faster than any slide deck
  3. Budget twice as long for data migration as for code migration, because reconciliation reveals undocumented assumptions
  4. The hardest part is political, not technical: frame the work as "incrementally modernizing" while shipping new features, not as a "rewrite"
  5. Early velocity is the strongest predictor of success: extract >5% of functionality in the first 90 days
  6. Deploy comprehensive monitoring before starting, because you can't debug what you can't observe
  7. Use contract testing; 89% of successful projects used it
  8. Keep the legacy system running for a month after migration completes as a rollback safety net
  9. Avoid starting at the UI layer. The UI is the tip of the iceberg; strangling it without owning the backend logic just creates a distributed frontend on top of the legacy system
  10. Be patient; the strangler fig wraps around slowly. The best migration is one where the biggest compliment is "wait, that's done already?"

The strangler fig pattern provides a controlled, phased approach to modernization that allows the existing application to continue functioning during the migration effort. With proper planning, feature flags, automated reconciliation, and stakeholder buy-in, you can modernize legacy systems incrementally with lower risk and disruption.


Rizwan Saleem — https://rizwansaleem.co
