DEV Community

Tosh
Tosh

Posted on

Zero-Downtime Schema Changes (You Can Do This)

You need to add a column to a table with 100 million rows.

Old approach: Lock the table, add column, wait 30 minutes, production is down.

New approach: Add column, backfill data, no downtime.

The Pattern

Step 1: Add column (new, nullable)

ALTER TABLE users ADD COLUMN new_field VARCHAR(255) NULL;
Enter fullscreen mode Exit fullscreen mode

Takes 2 seconds. Table briefly locked. No big deal.

Step 2: Backfill in batches

UPDATE users SET new_field = computed_value WHERE id >= 0 AND id < 10000;
UPDATE users SET new_field = computed_value WHERE id >= 10000 AND id < 20000;
-- ... repeat in batches
Enter fullscreen mode Exit fullscreen mode

Takes time but doesn't lock the whole table.

Step 3: Add constraint (if needed)

ALTER TABLE users MODIFY new_field VARCHAR(255) NOT NULL;
Enter fullscreen mode Exit fullscreen mode

After backfill is done.

Step 4: Update application
Start writing to new_field.

Step 5: Remove old code
Once confident, stop reading old_field.

Step 6: Drop old column
After a few weeks, delete old_field.

Zero downtime throughout.

Real Timeline

Day 1 - Deploy:

  • Add new_field (NULL)
  • App still reads old_field
  • App writes to old_field only
  • Backfill job starts (runs in background)

Day 2-3:

  • Backfill job continues
  • No table locks
  • No downtime
  • Monitor progress

Day 4 - Deploy:

  • Backfill complete
  • App reads new_field (with fallback to old if NULL)
  • App writes to both fields

Day 7 - Deploy:

  • App reads new_field only
  • Old field no longer written
  • Keep old field in case of rollback

Week 3 - Deploy:

  • Drop old column
  • Confident no rollback needed

Why This Works

Each step is reversible:

  • Step 1: Just added a column, no risk
  • Step 2: Backfill failed? Restart it
  • Step 3: App update wrong? Rollback
  • Step 4: Reading new field broken? Revert
  • Step 5: Drop old column? Wait a week first

No single point of failure. No downtime.

Constraint Example

You want to add a UNIQUE constraint:

Wrong (with downtime):

ALTER TABLE users ADD UNIQUE(email);
Enter fullscreen mode Exit fullscreen mode

Right (zero downtime):

Step 1: Add index (doesn't enforce uniqueness yet)

CREATE UNIQUE INDEX idx_email_unique ON users(email);
Enter fullscreen mode Exit fullscreen mode

Step 2: Deploy code to prevent duplicates
(Validate in app before saving)

Step 3: Once confident, add constraint

ALTER TABLE users ADD CONSTRAINT unique_email UNIQUE(email);
Enter fullscreen mode Exit fullscreen mode

Gotchas

1. Large tables take time
Backfilling 100M rows takes hours. Plan for it.

2. Locks still happen briefly
Each ALTER TABLE locks the table for seconds. Plan for off-peak times.

3. Disk space
Backfill might double your data temporarily (old + new columns). Ensure you have space.

Checklist

Before schema change:

  • [ ] Can I make it nullable? (adds safety)
  • [ ] Do I need to backfill? (if yes, batch it)
  • [ ] What's the app change? (deploy separately)
  • [ ] How long is backfill? (plan timing)
  • [ ] Can I rollback? (at each step)

If your schema changes cause downtime, you're doing it wrong.

Zero-downtime is achievable for most schema changes.

Plan the steps. Execute carefully. Sleep through the night.

Top comments (0)