How I use Claude Code for zero-downtime database migrations — a complete workflow

brian austin

Database migrations are the highest-risk moments in backend development. One bad migration can take down production for hours. Here's the exact workflow I use with Claude Code to make migrations safe, reversible, and fast — without taking the site offline.

Why database migrations are dangerous

The classic failure pattern:

  1. Write a migration that locks a table
  2. Deploy during peak traffic
  3. 30-second lock → 500 errors → incident

Or worse:

  1. Add a NOT NULL column without a default
  2. Old code still running during rolling deploy
  3. INSERT fails → data loss

Claude Code doesn't prevent these mistakes automatically — but it dramatically reduces them when you use the right workflow.

The workflow: research first, write second

Step 1: Audit current schema

Before writing a single line of SQL, I ask Claude Code to understand the full context:

Read the current schema in db/schema.sql and all existing migrations in db/migrations/.
Identify:
1. What tables and indexes are affected by the changes I'm about to make
2. Estimated row counts (check seed files or any hints in codebase)
3. Any existing constraints that might conflict
4. What the current deploy process looks like (check Makefile, package.json scripts, CI config)

This prevents Claude from writing a migration that looks correct but doesn't account for existing constraints or table sizes.
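When seed files don't reveal row counts, Postgres's own statistics will. A quick query I often include in the audit (the numbers are estimates from the statistics collector, which is all migration planning needs):

```sql
-- Estimated live row counts per table, largest first.
-- n_live_tup is an approximation, not an exact count.
SELECT relname AS table_name, n_live_tup AS estimated_rows
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
```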

Step 2: Write the expand migration

Zero-downtime migrations follow the expand/contract pattern. The expand migration adds new structure while keeping old structure:

Write a zero-downtime migration to add a `verified_at` timestamp to the users table.

Requirements:
- Old code must still work during rolling deploy (no breaking changes)
- New column must be nullable initially
- Must include rollback
- Must work on Postgres 14+
- Table has ~2M rows, so avoid full table locks

Claude Code typically outputs:

-- Up migration: expand phase
-- Safe for tables with millions of rows
-- Does NOT lock the table

ALTER TABLE users ADD COLUMN IF NOT EXISTS verified_at TIMESTAMPTZ;

-- Add index concurrently (no table lock).
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
-- so the migration runner must not wrap this statement in one.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_verified_at
  ON users(verified_at)
  WHERE verified_at IS NOT NULL;

-- Down migration
-- ALTER TABLE users DROP COLUMN IF EXISTS verified_at;

Step 3: Write the application code

With the migration in place, I ask Claude to update the application code:

Update the User model and any queries that touch the users table to:
1. Write `verified_at` when users complete email verification
2. Read `verified_at` to determine if a user is verified
3. Keep backward compatibility: don't break if the column is NULL

Files to check: models/user.js, services/auth.js, controllers/users.js
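The resulting code looks roughly like this sketch (the function names here are hypothetical, not from the actual models/user.js):

```javascript
// Hypothetical sketch of the backward-compatible paths in models/user.js.
// Rows written by old code during the rolling deploy have no verified_at
// value, so the read path must treat NULL as "not verified".

function isVerified(user) {
  // `!= null` catches both null (from the DB) and undefined (column absent)
  return user.verified_at != null;
}

function markVerified(user, now = new Date()) {
  // Only new code writes the column; old code simply never sets it
  return { ...user, verified_at: now.toISOString() };
}

const legacyRow = { id: 1, email: "a@example.com" }; // written by old code
const freshRow = markVerified({ id: 2, email: "b@example.com" });

console.log(isVerified(legacyRow)); // false
console.log(isVerified(freshRow));  // true
```

The key property: both functions are correct against the expanded schema whether the row was written by old or new code.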

Step 4: Write the contract migration

After the rolling deploy is complete and all instances are running new code:

Write the contract migration to:
1. Add NOT NULL constraint to verified_at with a DEFAULT for existing rows
2. Remove any old columns/indexes that are no longer needed
3. This runs AFTER 100% of instances are on new code

-- Contract phase: runs after full deploy
-- Safe because all code is now writing verified_at

-- Backfill existing rows (on large tables, run this in batches; see the
-- batched pattern later in this post)
UPDATE users SET verified_at = created_at WHERE verified_at IS NULL;

-- Now safe to add constraint
ALTER TABLE users ALTER COLUMN verified_at SET NOT NULL;

Step 5: Test the full sequence

Write a test that simulates the migration sequence:
1. Start with old schema
2. Run expand migration
3. Run old code against new schema (should work)
4. Run new code against new schema (should work)
5. Run contract migration
6. Run rollback and verify clean state
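The logic of that sequence can be sanity-checked without a database at all. This toy simulation (entirely hypothetical, no real migration runner) models the schema as a plain object and checks that old-style inserts survive the expand phase but are correctly rejected after contract:

```javascript
// Toy model: schema maps column name -> constraint flags; insertRow
// enforces NOT NULL the way the database would.
const schema = { id: {}, created_at: {} };
const rows = [];

function insertRow(row) {
  for (const col of Object.keys(schema)) {
    if (schema[col].notNull && row[col] == null) {
      throw new Error(`NOT NULL violation on ${col}`);
    }
  }
  rows.push(row);
}

// Expand phase: add the column as nullable.
schema.verified_at = { notNull: false };
insertRow({ id: 1, created_at: "t1" });                    // old code: still works
insertRow({ id: 2, created_at: "t2", verified_at: "t2" }); // new code

// Contract phase: backfill, then enforce NOT NULL.
for (const r of rows) {
  if (r.verified_at == null) r.verified_at = r.created_at;
}
schema.verified_at.notNull = true;

// An old-style insert must now fail, which is acceptable: by this
// point 100% of instances are running new code.
let rejected = false;
try {
  insertRow({ id: 3, created_at: "t3" });
} catch {
  rejected = true;
}
console.log(rejected); // true
```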

The rate limit problem

This workflow involves Claude Code reading:

  • Schema files
  • All existing migrations (often 50-100 files)
  • Models, services, controllers
  • Tests
  • CI config

On a large Rails or Django project, that's easily 50,000+ tokens of context — before you write a single line. Add the back-and-forth for testing, fixing, and verifying, and a single database migration session can consume your entire Claude Pro daily limit.

I use SimplyLouie as my API proxy — it routes through Anthropic's API at ~$2/month (vs $20/month for Claude Pro) with no rate limits that interrupt my workflow mid-migration. The last thing you want is a "usage limit reached" message when you're in the middle of a production database change.

Key patterns Claude Code gets right

Batched updates for large tables:

-- Instead of: UPDATE users SET col = val WHERE col IS NULL
-- Claude writes a batched version. Postgres has no UPDATE ... LIMIT,
-- so each batch targets ids selected with LIMIT:
DO $$
DECLARE
  batch_size INT := 10000;
  rows_updated INT;
BEGIN
  LOOP
    UPDATE users SET verified_at = created_at
    WHERE id IN (
      SELECT id FROM users
      WHERE verified_at IS NULL
      LIMIT batch_size
    );
    GET DIAGNOSTICS rows_updated = ROW_COUNT;
    EXIT WHEN rows_updated = 0;
    COMMIT;  -- requires Postgres 11+, run outside an explicit transaction
    PERFORM pg_sleep(0.1);  -- brief pause to ease pressure on other writers
  END LOOP;
END $$;

Concurrent index creation:
Claude knows to use CREATE INDEX CONCURRENTLY without being told — but only if you've established the context that the table is large and production traffic matters.

Rollback that actually works:
Most rollbacks are written as an afterthought and don't actually work. Claude Code generates rollbacks that match the exact operations performed, in reverse order.
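For the expand migration in step 2, a rollback that actually works reverses each operation in the opposite order it was applied. A sketch:

```sql
-- Reverse order of the expand migration: index first, then column.
-- DROP INDEX CONCURRENTLY avoids locking, like its CREATE counterpart,
-- and likewise cannot run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS idx_users_verified_at;
ALTER TABLE users DROP COLUMN IF EXISTS verified_at;
```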

What this workflow prevents

| Risk | Prevention |
| --- | --- |
| Table lock on large table | `CONCURRENTLY` for index, `ADD COLUMN IF NOT EXISTS` |
| Breaking old code during deploy | Expand phase is always backward-compatible |
| Data loss on NOT NULL | Contract phase only runs after all rows are populated |
| Failed rollback | Claude writes and tests rollback in same session |
| Missing index | Schema audit in step 1 catches missing indexes |

The checklist I run before every migration

Ask Claude Code to verify:

Review this migration and answer:
1. Does any operation lock the table? (ALTER TABLE, CREATE INDEX without CONCURRENTLY)
2. Is the expand phase fully backward-compatible with old code?
3. Does the rollback actually undo all changes cleanly?
4. Are there any foreign key implications?
5. What's the estimated execution time on a table with 2M rows?

This takes 30 seconds and has caught multiple production issues before they happened.


Database migrations were the one task I was still uncomfortable delegating to AI assistance — the cost of a mistake is simply too high. This workflow changed that. The combination of explicit context-loading, expand/contract structure, and pre-migration review makes Claude Code a genuinely useful migration partner.

If you want to try this workflow without blowing your Claude Pro limit, SimplyLouie gives you API access at ~$2/month — no rate limits, no interruptions in the middle of a migration session.
