You need to add a column to a table with 100 million rows.
Old approach: Lock the table, add column, wait 30 minutes, production is down.
New approach: Add column, backfill data, no downtime.
The Pattern
Step 1: Add column (new, nullable)
ALTER TABLE users ADD COLUMN new_field VARCHAR(255) NULL;
Takes 2 seconds. Table briefly locked. No big deal.
Step 2: Backfill in batches
UPDATE users SET new_field = computed_value WHERE id >= 0 AND id < 10000;
UPDATE users SET new_field = computed_value WHERE id >= 10000 AND id < 20000;
-- ... repeat in batches
Takes time but doesn't lock the whole table.
Step 3: Add constraint (if needed)
ALTER TABLE users MODIFY new_field VARCHAR(255) NOT NULL;
After backfill is done.
Step 4: Update application
Start writing to new_field.
Step 5: Remove old code
Once confident, stop reading old_field.
Step 6: Drop old column
After a few weeks, delete old_field.
Zero downtime throughout.
Real Timeline
Day 1 - Deploy:
- Add new_field (NULL)
- App still reads old_field
- App writes to old_field only
- Backfill job starts (runs in background)
Day 2-3:
- Backfill job continues
- No table locks
- No downtime
- Monitor progress
Day 4 - Deploy:
- Backfill complete
- App reads new_field (with fallback to old if NULL)
- App writes to both fields
Day 7 - Deploy:
- App reads new_field only
- Old field no longer written
- Keep old field in case of rollback
Week 3 - Deploy:
- Drop old column
- Confident no rollback needed
Why This Works
Each step is reversible:
- Step 1: Just added a column, no risk
- Step 2: Backfill failed? Restart it
- Step 3: App update wrong? Rollback
- Step 4: Reading new field broken? Revert
- Step 5: Drop old column? Wait a week first
No single point of failure. No downtime.
Constraint Example
You want to add a UNIQUE constraint:
Wrong (with downtime):
ALTER TABLE users ADD UNIQUE(email);
Right (zero downtime):
Step 1: Add index (doesn't enforce uniqueness yet)
CREATE UNIQUE INDEX idx_email_unique ON users(email);
Step 2: Deploy code to prevent duplicates
(Validate in app before saving)
Step 3: Once confident, add constraint
ALTER TABLE users ADD CONSTRAINT unique_email UNIQUE(email);
Gotchas
1. Large tables take time
Backfilling 100M rows takes hours. Plan for it.
2. Locks still happen briefly
Each ALTER TABLE locks the table for seconds. Plan for off-peak times.
3. Disk space
Backfill might double your data temporarily (old + new columns). Ensure you have space.
Checklist
Before schema change:
- [ ] Can I make it nullable? (adds safety)
- [ ] Do I need to backfill? (if yes, batch it)
- [ ] What's the app change? (deploy separately)
- [ ] How long is backfill? (plan timing)
- [ ] Can I rollback? (at each step)
If your schema changes cause downtime, you're doing it wrong.
Zero-downtime is achievable for most schema changes.
Plan the steps. Execute carefully. Sleep through the night.
Top comments (0)