We've all been there. You're working on a feature that requires a schema migration, you run it against your dev database, something goes wrong, and now your carefully seeded test data is toast. Or worse — you accidentally ran it against staging.
The traditional solution is some combination of database dumps, Docker containers, and a prayer. But there's a better pattern emerging in the Postgres ecosystem: copy-on-write database branching. And with open-source tools like Xata bringing this to self-hostable Postgres platforms, it's worth understanding how this actually works and how to set it up.
The Root Cause: Shared Mutable State
The fundamental problem is that databases are shared mutable state — the thing every CS textbook warns you about. Here's what typically goes wrong:
- One dev database for the team — migrations collide, test data gets overwritten
- Local database per developer — data gets stale, fixtures drift from reality
- Snapshot/restore workflows — slow, eat disk space, and nobody remembers to update them
Each approach has tradeoffs, but they all share a common failure mode: getting a clean, realistic copy of your database for testing is either slow, expensive, or both.
-- The classic "oh no" workflow
ALTER TABLE users ADD COLUMN org_id INTEGER;
-- Wait, I need a NOT NULL constraint...
ALTER TABLE users ALTER COLUMN org_id SET NOT NULL;
-- ERROR: column "org_id" of relation "users" contains null values
-- Now you're writing backfill scripts at 4pm on a Friday
What Copy-on-Write Branching Actually Is
If you've used Git, the mental model is straightforward. Copy-on-write (CoW) branching creates a logical fork of your database that shares the underlying data pages with the parent. You only pay storage costs for the data that actually changes on the branch.
This isn't a new concept at the filesystem level — ZFS and Btrfs have done this for years. The innovation is applying it at the Postgres layer, where you get branch-aware connection strings and can treat each branch as its own isolated database.
Here's the key insight: a traditional pg_dump | pg_restore of a 50GB database might take 20 minutes. A CoW branch? Usually seconds, regardless of database size. The data isn't copied — it's referenced.
Parent database (50GB)
├── Branch: feature/add-orgs (only stores changed pages, ~50MB)
├── Branch: feature/new-billing (only stores changed pages, ~120MB)
└── Branch: hotfix/user-emails (only stores changed pages, ~2MB)
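You can try the filesystem-level version of this yourself. On Btrfs or XFS with reflink support, `cp --reflink` creates a copy that shares data extents with the original, which is essentially the same trick one layer down. This sketch uses `--reflink=auto` so it falls back to a plain copy on filesystems without CoW support:

```shell
# Create a 10 MB "database" file, then make a CoW copy of it.
# --reflink=auto shares extents where the filesystem supports it
# (Btrfs, XFS) and falls back to a regular copy elsewhere (ext4).
dd if=/dev/zero of=parent.db bs=1M count=10 status=none
cp --reflink=auto parent.db branch.db

# Both files are now independent; on a CoW filesystem they share
# all their data blocks until one of them is written to.
ls -l parent.db branch.db
```

On a CoW filesystem, the second command completes in milliseconds no matter how large the source file is, because no data blocks move.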
Setting Up Branch-Based Workflows
Xata is an open-source, cloud-native Postgres platform that implements this pattern. According to the project's GitHub repo, it provides copy-on-write branching along with scale-to-zero capabilities. Here's how a branch-based workflow generally looks with tools that support this pattern:
Step 1: Create a Branch for Your Feature
Most tools that support Postgres branching expose this through a CLI or API. The general pattern looks like:
# Create a branch from your main database
# (exact syntax varies by tool)
xata branch create feature/add-org-support --from main
# You get a connection string scoped to this branch
# postgresql://branch-feature-add-org-support:5432/mydb
The branch is instant. No waiting for a dump to finish, no disk space explosion.
Step 2: Run Your Migration Against the Branch
Now you can safely test destructive operations:
-- Connected to: feature/add-org-support branch
-- This only affects the branch, not main
BEGIN;
CREATE TABLE organizations (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  slug TEXT UNIQUE NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);
ALTER TABLE users ADD COLUMN org_id INTEGER REFERENCES organizations(id);
-- Backfill existing users into a default org
INSERT INTO organizations (name, slug) VALUES ('Default', 'default');
UPDATE users SET org_id = (SELECT id FROM organizations WHERE slug = 'default');
-- Now safe to add the constraint
ALTER TABLE users ALTER COLUMN org_id SET NOT NULL;
COMMIT;
If something blows up? Delete the branch. Your main data is untouched. No rollback scripts, no restoring from backups.
Step 3: Validate and Merge
Once your migration works correctly on the branch, you have a few options:
- Run the migration against main — treat the branch as a dry run
- Promote the branch — if the tool supports it, swap the branch in as the new main
- Reset and re-branch — start fresh if you need to iterate
Why Scale-to-Zero Matters Here
Here's the thing about dev/preview databases: most of them sit idle 90% of the time. That feature branch you created on Monday? It's been idle since Tuesday afternoon.
Scale-to-zero means those idle branches aren't consuming compute resources. The storage (which is minimal thanks to CoW) persists, but the Postgres process itself shuts down when there are no active connections. When someone connects again, it spins back up.
This is what makes branch-per-PR workflows actually viable economically. Without scale-to-zero, ten branches means ten running Postgres instances. With it, you're only paying for what's actually being queried.
Wiring This Into CI/CD
The real power is automating this. Here's a simplified GitHub Actions workflow that creates a branch per PR:
# .github/workflows/preview-db.yml
name: Preview Database
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  create-preview-db:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Create database branch
        run: |
          # Create a branch named after the PR
          BRANCH_NAME="pr-${{ github.event.pull_request.number }}"
          # Use your branching tool's CLI here
          xata branch create "$BRANCH_NAME" --from main
      - name: Run migrations
        run: |
          # Point your migration tool at the branch, then migrate.
          # Set and use the variable in the same step: an `env:` block
          # can't see variables assigned inside a `run` script.
          export DATABASE_URL=$(xata branch connection-string "pr-${{ github.event.pull_request.number }}")
          npx prisma migrate deploy
      - name: Run integration tests
        run: npm test -- --integration
When the PR is merged or closed, a cleanup job deletes the branch. Clean, automated, and nobody accidentally tests against production.
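The teardown side can be its own small workflow. Here's a sketch of that cleanup job (the `xata branch delete` subcommand is illustrative, like the create command above; substitute your tool's actual CLI):

```yaml
# .github/workflows/cleanup-preview-db.yml
name: Cleanup Preview Database
on:
  pull_request:
    types: [closed]  # fires on both merge and close
jobs:
  delete-preview-db:
    runs-on: ubuntu-latest
    steps:
      - name: Delete database branch
        run: |
          # Must match the branch name used in preview-db.yml
          # ("branch delete" is assumed; check your tool's CLI)
          xata branch delete "pr-${{ github.event.pull_request.number }}"
```

Because the branches are CoW, leaving a few around by accident is cheap, but deleting on close keeps the branch list readable.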
Prevention Tips: Stop the Pain Before It Starts
Even without fancy branching tools, you can adopt patterns that reduce database pain:
- Always use transactions in migrations — if step 3 of 5 fails, you don't end up in a half-migrated state
- Test migrations with BEGIN; ... ROLLBACK; — validate the SQL without committing
- Use IF NOT EXISTS guards — makes migrations idempotent and re-runnable
- Keep a seed.sql in version control — deterministic test data that any developer can load
- Name your constraints — ALTER TABLE ... DROP CONSTRAINT is a lot easier when you know the name
-- Idempotent migration pattern
DO $$
BEGIN
  IF NOT EXISTS (
    SELECT 1 FROM information_schema.columns
    WHERE table_name = 'users' AND column_name = 'org_id'
  ) THEN
    ALTER TABLE users ADD COLUMN org_id INTEGER;
  END IF;
END $$;
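On the last tip, naming constraints explicitly pays off the first time you have to undo one. Using the organizations example from earlier (the constraint name here is just a convention, not anything Postgres requires):

```sql
-- Explicitly named foreign key: easy to find, easy to drop
ALTER TABLE users
  ADD CONSTRAINT users_org_id_fkey
  FOREIGN KEY (org_id) REFERENCES organizations (id);

-- Rolling it back later is a one-liner, no catalog queries needed
ALTER TABLE users DROP CONSTRAINT users_org_id_fkey;
```

Without the explicit name you're stuck querying `information_schema` or `pg_constraint` to discover whatever name Postgres auto-generated.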
When to Reach for Database Branching
Database branching isn't always necessary. If you're working solo on a small project with a simple schema, pg_dump and a good seed.sql are probably fine.
But it starts to shine when:
- Multiple developers are working on competing schema changes
- You need preview environments with realistic data
- Your database is large enough that dump/restore is painfully slow
- You're running integration tests in CI that need isolated database state
The Postgres ecosystem is evolving fast, and copy-on-write branching is one of the more practical innovations I've seen. Projects like Xata are worth keeping an eye on if this workflow appeals to you: as an open-source platform designed for cloud-native deployments, it fits into the broader trend of making Postgres operations feel as smooth as Git operations.
The bottom line: your database workflow shouldn't be the bottleneck in your development process. Whether you adopt full branching or just tighten up your migration hygiene, the goal is the same — stop being afraid to touch the schema.