Strangler Fig vs. Big Bang: 3 Reasons for Migrating to Modular

#architecture #microservices #systemdesign #migration

When you decide to fundamentally change your software architecture, two main paths emerge: tearing everything down and rebuilding (Big Bang), or gradually transforming the old structure into a new one (Strangler Fig). After 20 years of field experience, I can confidently say that trying to change everything at once usually leads to frustration. When migrating large systems, maintaining operational continuity, ensuring database consistency, and managing teams are critically important.

In this post, drawing from my experiences working on a production ERP and migrating large-scale infrastructures, I will explain why you should prefer the Strangler Fig pattern over the Big Bang approach through 3 fundamental technical reasons. Let's get straight to the point, with code, configurations, and real-world scenarios.

1. Risk Management and Live Deployment Safety in Strangler Fig vs. Big Bang

In the Big Bang approach, you aim to deploy the entire system to a new version in a single day (usually at 03:00 AM on a sleepless Saturday night). This requires all components of the system to work flawlessly simultaneously. In reality, this is almost impossible. The Strangler Fig pattern, on the other hand, aims to gradually build new microservices or modules around the old monolithic structure, eventually "strangling" and eliminating the old system.

The most critical component here is the routing layer that manages traffic. Using Nginx or an API Gateway, we can gradually redirect specific endpoints of the monolithic application to our new service. For example, while redirecting the /api/v1/orders path to our newly written modular service, we can leave all other traffic to the old monolith.

The following Nginx configuration demonstrates how we can safely implement this gradual transition in a production environment:

# nginx.conf - Strangler Fig Traffic Routing Example
upstream old_monolith {
    server 10.0.1.50:8080; # Old monolith server
}

upstream new_orders_service {
    server 10.0.1.60:3000; # New modular orders service
}

server {
    listen 80;
    server_name api.sirket.com;

    # Endpoint migrated to the new service
    location /api/v1/orders {
        proxy_pass http://new_orders_service;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Keep timeout values low so that bottlenecks in the new service don't affect the monolith
        proxy_connect_timeout 2s;
        proxy_read_timeout 5s;
    }

    # All other endpoints not yet migrated, running on the old monolith
    location / {
        proxy_pass http://old_monolith;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
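
The routing rule that this configuration expresses can also be sketched in a few lines of Python. This is only an illustration of the idea, not how Nginx is implemented: the names ROUTES and pick_upstream are hypothetical, and the sketch assumes plain prefix locations like the two above, where the longest matching prefix wins.

```python
# Hypothetical routing table mirroring the two Nginx locations above.
ROUTES = {
    "/api/v1/orders": "new_orders_service",  # endpoint already migrated
    "/": "old_monolith",                     # everything else
}


def pick_upstream(path: str, routes: dict = ROUTES) -> str:
    """Return the upstream whose route prefix is the longest match for path."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        raise ValueError(f"no route matches {path!r}")
    return routes[max(matches, key=len)]


if __name__ == "__main__":
    print(pick_upstream("/api/v1/orders/42"))
    print(pick_upstream("/api/v1/invoices"))
```

Each newly migrated endpoint becomes one more entry in the table, which is the whole point of the pattern: the monolith keeps the catch-all route until nothing is left behind it.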

The biggest advantage of this method is that if a memory leak or an unexpected crash occurs in the new service, only the relevant endpoint is affected. The entire ERP or accounting system does not go down. Moreover, in an emergency, rolling back means restoring the previous configuration, validating it with nginx -t, and running nginx -s reload, which applies the change gracefully within seconds and lets in-flight requests finish. In terms of risk management, this flexibility is far superior to the "3.2 TB database backup that is very difficult to revert" nightmare offered by Big Bang.

⚠️ Pay Attention to Timeout Settings

When directing traffic to a new service, always keep the proxy timeout values short. If the new service responds slowly, Nginx holds those connections open, and because the same Nginx instance also fronts the old monolith, exhausted worker connections can stall traffic to the entire system.

2. The Shared Database and Data Consistency Dilemma

The biggest mistake in migrating to a modular architecture is separating only the code layer and leaving the database common. The "Shared Database" anti-pattern prevents microservices from being deployed independently. However, if you try to change the entire database schema at once with Big Bang, the risk of data loss or corruption is extremely high.

In the Strangler Fig pattern, database transformation is also done gradually. The new service will have its own database (e.g., PostgreSQL 14+). For data synchronization between the old monolith and the new service, the transactional outbox pattern or PostgreSQL logical replication can be used.

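In my earlier [related: postgresql-index-optimize] post, I mentioned reducing database load. Here, our goal is to manage dual-write risks. Instead of dual-writing at the application layer, writing the business row and an outbox row in the same database transaction guarantees that an order is never saved without its event, or the other way around. Delivery to the monolith then happens asynchronously, on an at-least-once basis.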

The following Python and FastAPI code example demonstrates the method of recording an event in an outbox table within the same transaction when processing data arriving at the new order service:

# main.py - Transactional Outbox Pattern Example
import json

from fastapi import Depends, FastAPI
from sqlalchemy import create_engine, text
from sqlalchemy.engine import Connection

app = FastAPI()
DATABASE_URL = "postgresql://user:password@10.0.1.60:5432/orders_db"
engine = create_engine(DATABASE_URL)

def get_db():
    # engine.begin() opens one transaction per request: it commits when the
    # block exits normally and rolls back if an exception propagates into it.
    with engine.begin() as conn:
        yield conn

@app.post("/api/v1/orders")
def create_order(order_data: dict, db: Connection = Depends(get_db)):
    # 1. Save the order to the main table
    insert_order_query = text("""
        INSERT INTO orders (id, customer_id, total_amount, status) 
        VALUES (:id, :customer_id, :total_amount, :status)
    """)
    db.execute(insert_order_query, order_data)

    # 2. Write the event to feed the old monolith to the Outbox table (within the same Transaction!)
    outbox_event = {
        "event_type": "ORDER_CREATED",
        "payload": json.dumps(order_data)
    }
    insert_outbox_query = text("""
        INSERT INTO outbox (event_type, payload, processed) 
        VALUES (:event_type, :payload, false)
    """)
    db.execute(insert_outbox_query, outbox_event)

    return {"status": "success", "order_id": order_data["id"]}

An independent background process running under systemd then ships these events to the old monolith's database. It can be a simple Python script that polls the outbox table for unprocessed records, or a CDC (Change Data Capture) tool that reads the inserts from the database's write-ahead log. Either way, eventual consistency is achieved between the two databases, and the new service starts with its own isolated database while the old monolith continues to run.
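
The polling variant of that background process can be sketched as follows. Treat it as a minimal illustration under stated assumptions rather than a production relay: relay_outbox and deliver are hypothetical names, the outbox table is assumed to have an auto-incrementing id column (the INSERT above does not show one), and the demo uses an in-memory SQLite database with processed stored as 0/1 so it runs without a PostgreSQL server.

```python
import json
import sqlite3
from typing import Callable


def relay_outbox(conn, deliver: Callable[[str, dict], None], batch_size: int = 100) -> int:
    """Deliver unprocessed outbox rows in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE processed = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for row_id, event_type, payload in rows:
        # If deliver() raises, the row stays unprocessed and is retried on the next run.
        deliver(event_type, json.loads(payload))
        conn.execute("UPDATE outbox SET processed = 1 WHERE id = ?", (row_id,))
        conn.commit()  # mark as processed only after a successful delivery
        sent += 1
    return sent


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "event_type TEXT, payload TEXT, processed INTEGER DEFAULT 0)"
    )
    conn.execute(
        "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
        ("ORDER_CREATED", json.dumps({"id": 1, "status": "new"})),
    )
    conn.commit()
    # Stand-in for writing to the monolith's database.
    print(relay_outbox(conn, lambda event_type, payload: print(event_type, payload)))
```

Because a row is marked only after delivery succeeds, a crash between the two steps leads to the same event being delivered again, so the monolith side has to tolerate duplicates. On PostgreSQL, several relay workers could share the table by selecting rows with FOR UPDATE SKIP LOCKED.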

3. Development Speed and Team Organization (Conway's Law)

Conway's Law, paraphrased, says that organizations design systems that mirror their own communication structure. When you attempt to change a monolithic structure with Big Bang, you condemn the entire software team to a deadlock that lasts for months, halting new feature development. The moment you declare a "feature freeze," business units start pressuring you, and you end up with a new monolith that is rushed to production, untested, and still harbors old bugs.

In the Strangler Fig model, you can divide teams according to modular boundaries. For example, you can focus a team of 5 people solely on the "Supply Chain and Inventory Module." This team can work completely independently with their own CI/CD processes, without touching the old monolith's codebase.

Metric	Big Bang Approach	Strangler Fig Pattern
Development Interruption (Feature Freeze)	6 - 18 Months (High Risk)	Zero Interruption (Continuous Delivery)
Blast Radius	Entire System	Only the Relevant Module / Service
Deploy Frequency	Once a Month or Once a Year	Multiple Times a Day (Staged)
Rollback Cost	Very High (Risk of data loss)	Very Low (Only routing change)
Time to Value	End of Project (Months later)	First Week (First endpoint live)

We personally applied this strategy while modularizing a production ERP. To separate the shipping module, we gave the team a three-week deadline. Instead of stopping the entire system, we only separated the screen where shipping slips were generated and the background services. If we had failed, the only loss would have been the effort for a single module over three weeks, not the operation of the entire company.

4. Performance and Resource Consumption (CPU/Memory and Autoscaling)

One of the biggest problems with monolithic applications is the necessity of vertical scaling. A single heavy reporting query in the system can exhaust the memory (RAM) of the entire application, causing the whole process to be killed by the kernel's OOM (Out Of Memory) killer. With Strangler Fig, the modules we separate run with their own resource limits (cgroups).

While keeping the old monolith on bare-metal servers with 16 GB RAM, we can run our newly written modules in Docker Compose or lightweight container structures, clearly defining resource limits. This way, the failure of one module does not prevent other critical services from running.

The following systemd unit file shows how we can limit the resources of our new modular order service at the operating system level:

# /etc/systemd/system/orders-service.service
[Unit]
Description=New Modular Orders Service
After=network.target postgresql.service

[Service]
Type=simple
User=www-data
WorkingDirectory=/var/www/orders-service
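# Bind to the address that the Nginx upstream points at (10.0.1.60:3000);
# binding to 127.0.0.1 would refuse connections arriving on that address.
ExecStart=/var/www/orders-service/venv/bin/uvicorn main:app --host 10.0.1.60 --port 3000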
Restart=always
RestartSec=3

# Resource Limits (cgroups) - To prevent this service from starving the monolith of resources
MemoryHigh=256M
MemoryMax=512M
CPUQuota=50%

# Logging Settings
StandardOutput=journal
StandardError=journal
SyslogIdentifier=orders-service

[Install]
WantedBy=multi-user.target

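With this systemd configuration, the orders service is capped at 50% of a single CPU core's time (CPUQuota is relative to one CPU, not to the whole machine). Above 256 MB (MemoryHigh) its memory is throttled and aggressively reclaimed, and if it reaches the 512 MB MemoryMax limit the kernel OOM-kills only the processes in this unit, after which Restart=always brings the service back without taking the rest of the system down. These memory settings require cgroup v2. Achieving this level of isolation inside a single monolithic process is nearly impossible.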
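
Since CPUQuota is expressed relative to a single CPU, it helps to convert it into a share of the whole machine when sizing limits. The helper below is a small worked example; cpu_quota_share is a hypothetical name, and the 8-core machine is an assumption made only for the arithmetic.

```python
def cpu_quota_share(quota_percent: float, cpu_count: int) -> float:
    """Fraction of the machine's total CPU capacity allowed by a CPUQuota value.

    systemd defines CPUQuota relative to one CPU, so 50% means half of one
    core and 200% means two full cores.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return (quota_percent / 100.0) / cpu_count


if __name__ == "__main__":
    # CPUQuota=50% on an assumed 8-core server
    print(f"{cpu_quota_share(50, 8):.2%} of total CPU capacity")
```

On that assumed 8-core server, the unit above can use at most 6.25% of the machine's CPU, which leaves plenty of headroom for a monolith running on the same host.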

5. Rollback Strategies and Fault Tolerance

When planning an architectural migration, the place where you should spend the most effort is the moment when everything goes wrong. In Big Bang migrations, the "revert everything" scenario is a complete disaster. A silent data corruption discovered 4 hours after going live forces you to manually fix historical data.

In the Strangler Fig pattern, error management is dynamic. With a small split_clients test (canary deployment) implemented on Nginx, we can observe errors by directing only 10% of the traffic to the new module we've written.

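# nginx.conf - Canary Deployment and Percentage-Based Traffic Distribution
# Canary-phase upstreams; these replace the definitions from the first example.
upstream old_monolith {
    server 10.0.1.50:8080;
}

upstream new_orders_service {
    server 10.0.1.60:3000;        # New modular orders service
    server 10.0.1.50:8080 backup; # Old monolith, tried only if the new service fails
}

# Hashes the client address so that each client consistently lands on the same side
split_clients "${remote_addr}AAA" $destination_upstream {
    10%     new_orders_service;
    *       old_monolith;
}

server {
    listen 80;
    server_name api.sirket.com;

    location /api/v1/orders {
        proxy_pass http://$destination_upstream;
        proxy_set_header Host $host;

        # Fail fast, then retry on the backup server (the old monolith)
        proxy_connect_timeout 1s;
        proxy_read_timeout 2s;
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_timeout 5s;
        proxy_next_upstream_tries 2;
    }
}

With this configuration, if the new orders service refuses the connection, times out, or returns a 502 or 503 error, Nginx retries the request on the backup server, which is the old monolith, without the user noticing. Note that proxy_next_upstream only moves between servers of the same upstream group, which is why the monolith is listed there as a backup. One caveat: by default Nginx does not re-send non-idempotent requests such as POST once they have already been sent to an upstream, so a failed order creation is replayed only if you add non_idempotent to proxy_next_upstream, and you should do that only once creating an order twice is harmless. Combined with the gradual traffic split, this gets you very close to a zero-downtime and safe architectural migration.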
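
To get a feel for how a split_clients-style percentage split behaves, here is a small Python simulation. It is a sketch under assumptions: canary_upstream is a hypothetical name, and SHA-256 stands in for the MurmurHash2 function that Nginx actually uses, so individual clients will not land in the same bucket as they would in Nginx. The property that matters is the same, though: the decision depends only on the client address, so a given client always sees the same backend.

```python
import hashlib


def canary_upstream(client_addr: str, canary_percent: float, salt: str = "AAA") -> str:
    """Pick an upstream from a hash of the client address (sticky per client)."""
    digest = hashlib.sha256(f"{client_addr}{salt}".encode()).digest()
    # Map the first 4 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    if bucket < canary_percent / 100:
        return "new_orders_service"
    return "old_monolith"


if __name__ == "__main__":
    addrs = [f"10.{i // 256}.{i % 256}.1" for i in range(10_000)]
    canary = sum(canary_upstream(a, 10) == "new_orders_service" for a in addrs)
    print(f"{canary / len(addrs):.1%} of simulated clients hit the new service")
```

Because the split is per client rather than per request, a customer who hits a bug in the new service will keep hitting it until you change the percentage or the salt, which makes problems easier to reproduce but also means complaints tend to come from the same users.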

6. Strangler Fig Implementation Guide: Step-by-Step Modular Transformation

Let's clarify the roadmap you should follow to put theory into practice. Don't rush when planning the transformation process and follow the steps sequentially.

1. Set Up the Routing Layer: Place a reverse proxy such as Nginx or HAProxy in front of the existing monolith. All traffic should first come here.
2. Choose the Easiest and Most Independent Module: Start with the module that has the fewest dependencies (e.g., notifications or a simple reporting service), not with the most complex one (e.g., accounting or inventory).
3. Define Database Boundaries: Identify the tables for the selected module. Ensure that only the new service has direct access to these tables. The monolith should no longer query these tables directly; it should communicate via API if necessary.
4. Set Up Outbox and CDC Mechanism: To ensure data consistency, set up the transactional outbox structure mentioned above.
5. Migrate Traffic with Canary Deployments: Gradually direct traffic to the new service, starting with 5%, then 25%, 50%, and finally 100%.
6. Clean Up Old Code: After traffic has been fully redirected to the new service, delete the relevant code blocks within the old monolith and drop the database tables.
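
The canary step in this roadmap can be turned into an explicit rule so that nobody has to decide at 03:00 AM whether to continue. The sketch below uses the 5%, 25%, 50%, 100% steps from the list; the function name next_traffic_percent and the 1% error-rate threshold are hypothetical and should be replaced with whatever your own metrics justify.

```python
RAMP_STEPS = [5, 25, 50, 100]  # traffic percentages from the roadmap above


def next_traffic_percent(current: int, error_rate: float, max_error_rate: float = 0.01) -> int:
    """Return the next canary percentage, or 0 to send all traffic back to the monolith.

    error_rate is the fraction of failed requests observed on the new service
    at the current step; max_error_rate is an assumed threshold, not a standard.
    """
    if error_rate > max_error_rate:
        return 0  # roll back completely and investigate
    for step in RAMP_STEPS:
        if step > current:
            return step
    return 100  # already fully migrated


if __name__ == "__main__":
    print(next_traffic_percent(5, error_rate=0.002))
    print(next_traffic_percent(25, error_rate=0.05))
```

Rolling all the way back to 0 instead of stepping down one level is a deliberately conservative choice here: with the routing layer in place a full rollback is cheap, and it stops exposing users to a problem you do not yet understand.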

In my previously written [related: zero-trust-network-mimari] post, I discussed secure communication between systems. You must also secure the communication between these newly created modules, either with encrypted (mTLS) connections or secure API keys.

💡 Don't Be Afraid to Delete Old Code

After the migration is complete, don't postpone deleting old code from the monolith. Every line of old code left behind with the thought "it might be needed later" will surface as technical debt in the future. Your version control system (Git) already remembers that code.

In conclusion, in the Strangler Fig vs. Big Bang struggle, the pragmatic, controlled, and incremental Strangler Fig approach always wins. Transforming large systems is not a sprint, it's a marathon. Proceed step by step, monitor the system's metrics at each step, and maintain the flexibility to quickly revert if you make a mistake.

In the next post, we will examine the "idempotency" problem encountered when establishing event-driven communication between microservices and its solutions.