In March 2025 I was hired by Anish Chopra at komodoai.dev to do one job: take their B2B lead-gen SaaS (FunnelBoost, codenamed Intently) from single-tenant to multi-tenant. On a live PostgreSQL database. Without downtime. Without data loss.
Five weeks later we shipped 37 Alembic revisions, 114 commits, 909 files changed, +40,527 / -11,073 lines of code. Team of 6 developers.
This article is the pattern we followed. If you ever need to do a tenancy migration on a live Postgres database, this is the playbook.
The Problem
FunnelBoost is a B2B lead-generation platform. It monitors Reddit, LinkedIn, and X via Apify scrapers, AI-scores leads with OpenAI, manages multi-step notification pipelines, Slack integrations, and Stripe billing. Stack: Python 3.11, FastAPI, SQLAlchemy async, Alembic, Pydantic 2, Celery + Redis, PostgreSQL.
The original schema keyed everything off user_id. Leads, notifications, signals, smart columns, Slack connections, intent signal configs, data sources, personas, notification settings, lead status configs — all belonged to a user.
Growth required organizations with multiple projects per org, guest access, and per-project data isolation. That meant every one of those tables needed to move from user_id to project_id as the ownership key.
You cannot do this in one migration. If you try, you get a locked table, a failed backfill, and a 3am incident.
The Four-Phase Pattern
We applied the same four-phase pattern across ~10 tables. Here it is.
Phase 1: Add nullable foreign key
# add_nullable_project_id_to_leads.py
import sqlalchemy as sa
from alembic import op


def upgrade():
    op.add_column(
        "leads",
        sa.Column(
            "project_id",
            sa.Integer(),
            sa.ForeignKey("projects.id", ondelete="CASCADE"),
            nullable=True,
        ),
    )
    op.create_index(
        "ix_leads_project_id",
        "leads",
        ["project_id"],
    )
This is cheap DDL on modern Postgres. The column is nullable, so existing rows are untouched; the ALTER TABLE takes a brief ACCESS EXCLUSIVE lock but finishes near-instantly because there's no default value to backfill. One caveat: a plain CREATE INDEX blocks writes for the duration of the build. That was fine at our table sizes, but on a busy table you'd build the index with CONCURRENTLY instead.
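For a high-write table, the same phase can be written lock-consciously. A minimal sketch of the equivalent raw SQL (statement text only, not what we shipped; note that CREATE INDEX CONCURRENTLY cannot run inside a transaction):

```python
# Raw-SQL equivalent of Phase 1 for a busy table (sketch). The inline
# REFERENCES is cheap here because every existing row is NULL, so
# constraint validation has nothing to check.
ADD_COLUMN_SQL = (
    "ALTER TABLE leads "
    "ADD COLUMN project_id integer "
    "REFERENCES projects(id) ON DELETE CASCADE"
)

# CONCURRENTLY builds the index without blocking writes, at the cost of
# a slower build and the requirement to run outside a transaction.
CREATE_INDEX_SQL = (
    "CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_leads_project_id "
    "ON leads (project_id)"
)
```

In Alembic, this roughly maps to passing postgresql_concurrently=True to create_index inside an autocommit block (op.get_context().autocommit_block() in recent Alembic versions).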
Phase 2: Backfill in a separate revision
# populate_leads_by_project_id.py
from alembic import op


def upgrade():
    op.execute("""
        UPDATE leads
        SET project_id = (
            SELECT p.id
            FROM projects p
            JOIN organizations o ON o.id = p.organization_id
            JOIN organization_members om ON om.organization_id = o.id
            WHERE om.user_id = leads.user_id
            LIMIT 1
        )
        WHERE project_id IS NULL
    """)
This is deliberately a separate Alembic revision. Why? Because ops can run Phase 1 during a deploy, then run Phase 2 at a controlled time — maybe in batches if the table is large. Schema changes and data movements have completely different risk profiles. Mixing them means you can't retry one without the other.
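The IS NULL guard is also what makes the backfill retryable: if the update dies halfway, you run it again and it only touches rows that haven't been filled yet. A toy in-memory simulation of that idempotency (illustrative only, not code from the project):

```python
def backfill(leads, user_to_project):
    """One pass of the backfill over in-memory rows: fill only NULLs."""
    updated = 0
    for lead in leads:
        if lead["project_id"] is None:
            lead["project_id"] = user_to_project[lead["user_id"]]
            updated += 1
    return updated

rows = [
    {"user_id": 1, "project_id": None},  # not yet backfilled
    {"user_id": 2, "project_id": 7},     # already done
]
mapping = {1: 10, 2: 7}
first = backfill(rows, mapping)   # touches only the NULL row
second = backfill(rows, mapping)  # a no-op on retry
```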
We did this for every table: add_project_id_to_smart_column and migrate_data_for_smart_column as two distinct revisions.
Phase 3: Swap uniqueness constraints and indexes
# leads_change_uniq_index_from_user_id_to_project_id.py
import sqlalchemy as sa
from alembic import op


def upgrade():
    # Drop old constraint
    op.drop_constraint(
        "uq_leads_user_id_source_url",
        "leads",
        type_="unique",
    )
    # Create new constraint on project_id
    op.create_unique_constraint(
        "uq_leads_project_id_source_url",
        "leads",
        ["project_id", "source_url"],
    )
    # Make project_id non-nullable now that all rows are backfilled
    op.alter_column(
        "leads",
        "project_id",
        existing_type=sa.Integer(),
        nullable=False,
    )
This is the hardest phase. You're changing the uniqueness semantics of the table. If your application code still writes with the old constraint assumption, inserts will fail. That means your application code needs to be updated before this migration runs, but also needs to work with the old schema.
Our approach: deploy application code that writes both user_id and project_id, then run Phase 3, then deploy code that only writes project_id. Two deploys bracketing the migration.
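A sketch of what the bridge-deploy write path looks like (names are hypothetical, not FunnelBoost's actual code): during the window between Phase 1 and Phase 3, every insert carries both ownership keys, so it satisfies whichever unique constraint is live.

```python
def build_lead_row(user_id: int, project_id: int, source_url: str) -> dict:
    """Insert payload that works against the old AND the new schema."""
    return {
        "user_id": user_id,        # old ownership key, dropped in Phase 4
        "project_id": project_id,  # new key, NOT NULL after Phase 3
        "source_url": source_url,
    }

row = build_lead_row(user_id=42, project_id=7,
                     source_url="https://example.com/post/1")
```

The second deploy then deletes the user_id key from this payload, and Phase 4 can drop the column safely.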
Phase 4: Drop old column
# drop_user_id_from_leads.py
from alembic import op


def upgrade():
    op.drop_index("ix_leads_user_id", table_name="leads")
    op.drop_column("leads", "user_id")
By this point the column is dead weight. Application code no longer references it. Drop it.
Tables We Migrated
We applied this exact four-phase pattern to: leads, notifications, signals, data_sources, smart_column, notification_settings, slack_connection, intent_signal_configs, lead_status_configs, and personas.
That's roughly 4 revisions per table, give or take variations where some phases could be combined. It accounts for the bulk of the 37 total revisions.
The Trigger Case
FunnelBoost had a database-level trigger on the users table — an AFTER INSERT trigger that auto-created default configuration rows for new users. When we moved to project-based tenancy, the trigger had to fire on project creation instead.
# drop_trigger_to_user_table_and_replace_by_project.py
from alembic import op


def upgrade():
    op.execute("DROP TRIGGER IF EXISTS create_defaults_for_user ON users")
    op.execute("DROP FUNCTION IF EXISTS create_defaults_for_user()")
    op.execute("""
        CREATE OR REPLACE FUNCTION create_defaults_for_project()
        RETURNS TRIGGER AS $$
        BEGIN
            INSERT INTO notification_settings (project_id, enabled)
            VALUES (NEW.id, true);
            INSERT INTO lead_status_configs (project_id, statuses)
            VALUES (NEW.id, '["new","contacted","qualified","lost"]');
            RETURN NEW;
        END;
        $$ LANGUAGE plpgsql;
    """)
    op.execute("""
        CREATE TRIGGER create_defaults_for_project
        AFTER INSERT ON projects
        FOR EACH ROW
        EXECUTE FUNCTION create_defaults_for_project();
    """)
This is a separate migration because it changes runtime behavior, not just schema. If it breaks, you want to roll back just this revision without touching the column migrations.
The Table Rename
Midway through, we had a project_access table that held guest access records. As the model solidified, its name became misleading — it wasn't about project access in general, it was specifically about guest access. We shipped a rename:
rename_project_access_table_to_guest_project_access
A small thing, but semantic clarity in your schema prevents a whole class of bugs downstream. We renamed it while the migration train was running anyway. Cost: one Alembic revision.
Organization and Project Models
Before any of the per-table migrations could run, we needed the target models. Two revisions:
- add_organization_and_project — created the organizations, projects, and organization_members tables with the full constraint set.
- populate_organization_and_project — for every existing user, created an org, a default project within it, and linked the user as owner.
This gave us project_id values to backfill into all the other tables. Without this, Phase 2 has nowhere to point.
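The populate step's logic, sketched as plain Python (helper and field names are hypothetical; the real revision did this in SQL): one org, one default project, and one owner membership per existing user.

```python
def defaults_for_user(user_id: int, org_id: int, project_id: int):
    """Rows the populate revision creates for one pre-existing user."""
    org = {"id": org_id, "name": f"user-{user_id}-org"}
    project = {"id": project_id, "organization_id": org_id,
               "name": "Default"}
    member = {"organization_id": org_id, "user_id": user_id,
              "role": "owner"}
    return org, project, member

org, project, member = defaults_for_user(user_id=1, org_id=100,
                                         project_id=200)
```

Every later Phase 2 backfill then resolves user_id → organization_members → organization → project, exactly as in the leads UPDATE above.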
Rebase Conflicts
With 6 developers working on the same codebase and migration chain, we hit Alembic head conflicts. The standard advice is "fix the down_revision pointer." We did that. But instead of silently editing an existing revision (which would confuse anyone who'd already applied it locally), we shipped it as an explicit revision:
fix migrations order after rebasing
The revision file just corrected the down_revision chain. This is the honest approach — it shows up in alembic history, everyone can see what happened, and nobody's local database is in a mystery state.
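Structurally, a head conflict is simple: a head is any revision that nothing else points at via down_revision. A small sketch (hypothetical revision ids) of how a rebase produces two heads and how repointing one down_revision fixes it:

```python
def find_heads(revisions):
    """revisions: dict mapping revision_id -> down_revision_id (None = base)."""
    referenced = set(revisions.values())
    return sorted(r for r in revisions if r not in referenced)

# Two branches landed on the same parent after a rebase -> two heads.
conflicted = {"aaa": None, "bbb": "aaa", "ccc": "aaa"}

# The fix revision repoints one branch at the other -> single head again.
fixed = {"aaa": None, "bbb": "aaa", "ccc": "bbb"}
```

This is the same check alembic heads performs; the explicit fix revision just makes the repointing visible in history.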
What I'd Do Differently
Batch the backfills. Our UPDATE ... WHERE project_id IS NULL queries worked because the tables were small enough (tens of thousands of rows). On tables with millions of rows, you'd want to batch in chunks of 10,000-50,000 with LIMIT and a loop, or use a background worker. We got away with it, but the pattern doesn't scale without batching.
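A batched version might look like this, a sketch assuming a DB-API/psycopg2-style connection and an integer id primary key on leads (neither is confirmed by the code above):

```python
# Batched Phase 2 backfill: each statement updates at most batch_size rows,
# so locks are short-lived and a failure loses only the current batch.
BATCH_SQL = """
UPDATE leads
SET project_id = (
    SELECT p.id
    FROM projects p
    JOIN organizations o ON o.id = p.organization_id
    JOIN organization_members om ON om.organization_id = o.id
    WHERE om.user_id = leads.user_id
    LIMIT 1
)
WHERE id IN (
    SELECT id FROM leads
    WHERE project_id IS NULL
    LIMIT %(batch_size)s
)
"""

def backfill_in_batches(conn, batch_size=10_000):
    total = 0
    while True:
        cur = conn.cursor()
        cur.execute(BATCH_SQL, {"batch_size": batch_size})
        conn.commit()  # release locks between batches
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

Committing between batches also lets autovacuum and replication keep up instead of chasing one giant transaction.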
Automate the pattern. After doing the same four phases for the third table, I should have written a code generator — give it a table name and a join path, and it spits out four Alembic revision files. By table six I was copy-pasting and search-replacing, which is fine until you miss one.
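The generator wouldn't need to be fancy. A deliberately minimal sketch, templating just the operation bodies (everything here is hypothetical; real revision files also need revision ids and down_revision pointers):

```python
# One template per phase; the backfill join path is elided ("...") because
# it varies per table.
PHASES = {
    "add": 'op.add_column("{t}", sa.Column("project_id", sa.Integer(), '
           'sa.ForeignKey("projects.id", ondelete="CASCADE"), nullable=True))',
    "backfill": 'op.execute("UPDATE {t} SET project_id = ... '
                'WHERE project_id IS NULL")',
    "constraints": 'op.alter_column("{t}", "project_id", nullable=False)',
    "drop": 'op.drop_column("{t}", "user_id")',
}

def generate_revisions(table: str) -> dict:
    """Return the four phase bodies for one table, ready to paste."""
    return {phase: body.format(t=table) for phase, body in PHASES.items()}

revs = generate_revisions("smart_column")
```

Even this much would have prevented the search-and-replace misses.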
Test the constraint swap more aggressively. Phase 3 is where things break. We caught issues in staging, but a dedicated integration test that runs the full four-phase sequence against a seeded database would have caught them earlier.
The Numbers
- 37 Alembic revisions authored
- 114 commits over 5 weeks (Mar 28 - May 2, 2025)
- 909 files changed
- +40,527 / -11,073 lines of code
- Team of 6 developers
- 10+ tables migrated with the four-phase pattern
- Zero downtime, zero data loss
The frontend moved in parallel: OpenAPI client codegen (which I proposed and wired — auto-generated React client replacing hand-maintained API methods), project switch UX, mapping layers for intent signals and lead statuses. 284 files in frontend/src/client alone from the codegen output.
The Takeaway
Multi-tenancy migrations on live databases are not magic. They're tedious. The four-phase pattern — add nullable, backfill separately, swap constraints, drop old column — is mechanical and predictable. That's the point. You want boring migrations.
The hard parts are operational: coordinating deploys around Phase 3, handling rebase conflicts in a team, and resisting the temptation to combine phases "because it's the same table anyway."
If you're facing a similar migration, steal this pattern. It works.
This migration shipped over 5 weeks as part of a team of 6 for a B2B SaaS. I'm now working on a much smaller side project, FlipperHelper — an iOS app for resellers to track purchases and profit. Different scale, same obsession with getting the right architecture before scaling.