DEV Community

Cover image for The Expand and Contract Pattern for Zero-Downtime Migrations
JP Fontenele
JP Fontenele

Posted on • Originally published at jpfontenele.makes-code.com

The Expand and Contract Pattern for Zero-Downtime Migrations

You're doing a rolling deployment. Half your instances are running v1, half are
running v2. Then v2 runs a migration that renames a column. Suddenly v1
instances start panicking, your phone starts blowing up with PD alarms for the
service you just deployed and downstream services are affected, because the
column that v1 expects exists is gone.

The expand and contract pattern prevents this.

The Problem

Traditional migrations assume all application instances update simultaneously.
That's fine for recreate deployments with downtime, but breaks badly with
rolling updates where both versions run concurrently.

-- This breaks rolling deployments
ALTER TABLE users RENAME COLUMN name TO full_name;
Enter fullscreen mode Exit fullscreen mode

The moment this runs, every v1 box querying name fails.

The Pattern

Expand and contract splits breaking changes into backward-compatible steps:

  1. Expand: Add the new structure alongside the old one
  2. Migrate: Dual-write in application code, backfill existing data
  3. Contract: Remove the old structure once nothing uses it

Each step is a separate deployment. Each deployment is safe to run while both
application versions are live.

Example: Renaming a Column

Let's rename users.name to users.full_name.

Step 1: Expand

Add the new column. That's it.

ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
Enter fullscreen mode Exit fullscreen mode

This is backward-compatible. v1 keeps using name, completely unaware of
full_name.

Step 2: Migrate

Deploy application code that writes to both columns:

func (r *UserRepo) Create(ctx context.Context, user *User) error {
    _, err := r.db.ExecContext(ctx, `
        INSERT INTO users (name, full_name, email)
        VALUES ($1, $1, $2)
    `, user.FullName, user.Email)
    return err
}

func (r *UserRepo) UpdateName(ctx context.Context, id int, name string) error {
    _, err := r.db.ExecContext(ctx, `
        UPDATE users SET name = $1, full_name = $1 WHERE id = $2
    `, name, id)
    return err
}
Enter fullscreen mode Exit fullscreen mode

Read from the new column, falling back to the old one:

func (r *UserRepo) GetByID(ctx context.Context, id int) (*User, error) {
    row := r.db.QueryRowContext(ctx, `
        SELECT id, COALESCE(full_name, name), email FROM users WHERE id = $1
    `, id)

    var user User
    err := row.Scan(&user.ID, &user.FullName, &user.Email)
    return &user, err
}
Enter fullscreen mode Exit fullscreen mode

Backfill existing rows:

UPDATE users SET full_name = name WHERE full_name IS NULL;
Enter fullscreen mode Exit fullscreen mode

At this point both columns stay in sync through your application code. v1
instances still work because name is always populated.

Step 3: Contract

Once all instances are on the new code and you're confident no rollback is
needed, clean up:

ALTER TABLE users DROP COLUMN name;
Enter fullscreen mode Exit fullscreen mode

Update your application code to remove the dual-write logic:

func (r *UserRepo) Create(ctx context.Context, user *User) error {
    _, err := r.db.ExecContext(ctx, `
        INSERT INTO users (full_name, email) VALUES ($1, $2)
    `, user.FullName, user.Email)
    return err
}
Enter fullscreen mode Exit fullscreen mode

Example: Splitting a Table

You need to extract users.address_* columns into a separate addresses table.

Step 1: Expand

CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id) UNIQUE,
    street VARCHAR(255),
    city VARCHAR(100),
    postal_code VARCHAR(20)
);
Enter fullscreen mode Exit fullscreen mode

Step 2: Migrate

Dual-write to both locations:

func (r *UserRepo) UpdateAddress(ctx context.Context, userID int, addr Address) error {
    tx, err := r.db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // Write to old columns (for v1 instances)
    _, err = tx.ExecContext(ctx, `
        UPDATE users
        SET address_street = $1, address_city = $2, address_postal_code = $3
        WHERE id = $4
    `, addr.Street, addr.City, addr.PostalCode, userID)
    if err != nil {
        return err
    }

    // Write to new table (for v2 instances)
    _, err = tx.ExecContext(ctx, `
        INSERT INTO addresses (user_id, street, city, postal_code)
        VALUES ($1, $2, $3, $4)
        ON CONFLICT (user_id) DO UPDATE SET
            street = EXCLUDED.street,
            city = EXCLUDED.city,
            postal_code = EXCLUDED.postal_code
    `, userID, addr.Street, addr.City, addr.PostalCode)
    if err != nil {
        return err
    }

    return tx.Commit()
}
Enter fullscreen mode Exit fullscreen mode

Read from the new table with fallback:

func (r *UserRepo) GetAddress(ctx context.Context, userID int) (*Address, error) {
    // Try new table first
    row := r.db.QueryRowContext(ctx, `
        SELECT street, city, postal_code FROM addresses WHERE user_id = $1
    `, userID)

    var addr Address
    err := row.Scan(&addr.Street, &addr.City, &addr.PostalCode)
    if err == nil {
        return &addr, nil
    }
    if err != sql.ErrNoRows {
        return nil, err
    }

    // Fall back to old columns
    row = r.db.QueryRowContext(ctx, `
        SELECT address_street, address_city, address_postal_code
        FROM users WHERE id = $1
    `, userID)
    err = row.Scan(&addr.Street, &addr.City, &addr.PostalCode)
    return &addr, err
}
Enter fullscreen mode Exit fullscreen mode

Backfill:

INSERT INTO addresses (user_id, street, city, postal_code)
SELECT id, address_street, address_city, address_postal_code
FROM users
WHERE address_street IS NOT NULL
ON CONFLICT (user_id) DO NOTHING;
Enter fullscreen mode Exit fullscreen mode

Step 3: Contract

ALTER TABLE users
    DROP COLUMN address_street,
    DROP COLUMN address_city,
    DROP COLUMN address_postal_code;
Enter fullscreen mode Exit fullscreen mode

Remove the dual-write code and the fallback logic.

Example: Changing a Column Type

Changing price from INTEGER (cents) to DECIMAL (dollars).

Step 1: Expand

ALTER TABLE products ADD COLUMN price_decimal DECIMAL(10,2);
Enter fullscreen mode Exit fullscreen mode

Step 2: Migrate

Dual-write with conversion:

func (r *ProductRepo) UpdatePrice(ctx context.Context, id int, cents int) error {
    dollars := float64(cents) / 100.0
    _, err := r.db.ExecContext(ctx, `
        UPDATE products SET price = $1, price_decimal = $2 WHERE id = $3
    `, cents, dollars, id)
    return err
}
Enter fullscreen mode Exit fullscreen mode

Backfill:

UPDATE products SET price_decimal = price / 100.0 WHERE price_decimal IS NULL;
Enter fullscreen mode Exit fullscreen mode

Step 3: Contract

ALTER TABLE products DROP COLUMN price;
ALTER TABLE products RENAME COLUMN price_decimal TO price;
Enter fullscreen mode Exit fullscreen mode

When to Use This Pattern

This pattern is a requirement if you need a deployment strategy other than
big-bang, because you'll need to run both application versions at the same time
while the new version is being rolled out.

If your deployment pipeline allows for maintenance windows, or you're making non
breaking-changes, then skip this pattern.

Triggers vs Application Code

The examples above use application level dual-writes, but you can also use
database triggers to keep columns in sync. Triggers simplify application code
and guarantee consistency even for direct database writes or scripts.

Beware though, triggers are harder to test, version, and maintain alongside
your application, and add vendor specific logic to your schema. If all writes
go through your application anyway, keeping the sync logic in code is usually
cleaner.

This pattern pairs well with zero-downtime deployment strategies. Check out Deployment Strategies Visualized for a visual guide to rolling, blue-green, canary, and progressive rollouts.

Top comments (0)