Seed scripts are technical debt that nobody tracks.
They start as a convenience — a few INSERT statements so the app boots locally — and they end up as a 400-line file that touches 30 tables, breaks on every third migration, and produces data that looks nothing like production. Everyone knows the seed script is bad. Nobody wants to fix it, because fixing it means rewriting it, and it will just rot again.
The usual response is to invest in a better seed script — more tables, better relationships, more realistic values. But the underlying issue is not the quality of the script. It is that hand-crafting test data stops scaling as schema complexity grows.
**TL;DR:** Define what data you need in a YAML config, extract a subset from production with PII anonymized, and restore it anywhere. No INSERT statements to maintain — the snapshot stays current automatically.
## Why database seed scripts break as projects grow
As a database seeding approach, seed scripts have real advantages early on: they are version-controlled, deterministic, and easy to understand. But those advantages erode as the schema grows, and three problems start compounding.
### Schema drift
Every migration is a chance for the seed script to break. A new NOT NULL column, a renamed FK, a dropped table — each one needs a corresponding update to the seed file. Those updates happen late or not at all. The person writing the migration is thinking about the migration, not about whether seed.sql still runs.
The script does not fail loudly. It just produces increasingly stale data.
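A minimal sketch of the failure mode, using Python's stdlib `sqlite3` for portability (the Postgres behavior is the same in spirit; the `users` table and `locale` column are hypothetical): a migration adds a NOT NULL column, and a seed INSERT written before that migration now violates the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema after a migration added a NOT NULL `locale` column.
conn.execute("""
    CREATE TABLE users (
        id     INTEGER PRIMARY KEY,
        email  TEXT NOT NULL,
        locale TEXT NOT NULL
    )
""")

# The seed script, written before that migration, never mentions `locale`.
try:
    conn.execute("INSERT INTO users (id, email) VALUES (1, 'test@example.com')")
except sqlite3.IntegrityError as exc:
    error = str(exc)
    print(f"seed failed: {error}")
# -> seed failed: NOT NULL constraint failed: users.locale
```

The seed file was correct when it was written; the migration made it wrong without touching it.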
### Manual referential integrity
In production, an order belongs to a user, references line items, connects to shipments and payments. In a seed script, you maintain all of those relationships by hand. Every ID, every FK, every cross-table reference. Miss one and you get constraint violations or, worse, data that loads fine but makes the app behave in ways it never would in production.
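To make that concrete, here is a sketch using `sqlite3` (hypothetical `users`/`orders` tables): a seed row whose hand-typed `user_id` points at a user that no longer exists fails the moment foreign keys are enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id)
    )
""")

conn.execute("INSERT INTO users (id) VALUES (1)")

# A hand-maintained seed row: someone renumbered the users above
# but forgot to update this cross-table reference.
try:
    conn.execute("INSERT INTO orders (id, user_id) VALUES (10, 2)")
except sqlite3.IntegrityError as exc:
    error = str(exc)
    print(f"constraint violation: {error}")
```

With enforcement off, the same row loads without complaint and you get the worse outcome from above: data that inserts cleanly but describes a relationship that could never exist.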
### Flat data
Test User 1 with test@example.com and two orders is structurally valid. It is not useful. Real users have Unicode in their names. Real accounts have nullable fields that are actually null. Real customers have 47 orders accumulated over two years, with edge cases nobody thought to fabricate.
The bugs that reach production are usually triggered by data shapes that did not exist in the seed script, because nobody anticipated them. We covered this in more detail in Why Fake PostgreSQL Test Data Misses Real Bugs.
## Production database snapshots: the seed script alternative
Instead of fabricating data, extract it.
Not with pg_dump — that copies everything, including all the PII you should not have in dev and all the volume you do not need. The Thoughtworks Technology Radar explicitly recommends against using raw production data in test environments for exactly this reason — the privacy and security risks outweigh the convenience. Instead, extract a subset: a small, connected slice of production data with sensitive fields anonymized during extraction.
What you get is a snapshot that reflects the real schema, has valid relationships because they were followed rather than hand-coded, and contains real data shapes because they came from production. It also requires no maintenance, because the next snapshot picks up schema changes automatically.
This is the workflow Basecut was built for. Define what to extract, run one command, restore to any database.
## How database subsetting works in practice
### 1. Define what to extract
Instead of INSERT statements, you describe the shape of the data you want:
```yaml
version: '1'
name: 'dev-data'

from:
  - table: users
    where: 'created_at > :since'
    params:
      since: '2026-01-01'

limits:
  rows:
    per_table: 1000
    total: 50000

anonymize:
  mode: auto
```
Start from recent users, follow FKs to collect related data, cap the size, auto-detect and anonymize PII. You can add explicit anonymization rules when you need them — we cover that in How to Anonymize PII in PostgreSQL for Development.
The important difference from a seed script: this config describes what to extract, not what to insert. New columns, new tables, and new relationships get picked up on the next snapshot without touching the config.
### 2. Create a snapshot
```shell
basecut snapshot create --config basecut.yml
```
Basecut connects to your database (or a read replica), traverses relationships, anonymizes PII inline, and writes the result. No real PII ever leaves production.
### 3. Restore locally
```shell
basecut snapshot restore dev-data:latest \
  --target "$LOCAL_DATABASE_URL"
```
That is the local dev setup. A new developer joining the team runs two commands and has a working database full of realistic, anonymized data.
Basecut handles all of this in one CLI command. See the quickstart →
### 4. Share it
Once a snapshot exists, anyone on the team can restore it. Everyone works against the same fixture data, which means bugs are reproducible across machines and "works on my machine" stops being about data differences.
## Comparison
| | Seed script | Production snapshot |
|---|---|---|
| Schema changes | Breaks until someone updates it | Automatic |
| FK integrity | Manual | Followed from the database |
| Data realism | Fabricated | Real, anonymized |
| PII risk | None (but no realism either) | Handled at extraction |
| Maintenance | Grows with the schema | Near zero |
| Onboarding | Run, debug, ask for help | Restore, start working |
| Edge cases | Only what someone added | Whatever production has |
More detail in our seed scripts vs Basecut comparison.
## Using production snapshots in CI/CD pipelines
The same snapshot works in CI. Replace the seed script step with a restore:
```yaml
name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: test_db
        ports:
          - 5432:5432

    steps:
      - uses: actions/checkout@v4

      - name: Install Basecut CLI
        run: |
          curl -fsSL https://basecut.dev/install.sh | sh
          echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Restore snapshot
        env:
          BASECUT_API_KEY: ${{ secrets.BASECUT_API_KEY }}
        run: |
          basecut snapshot restore dev-data:latest \
            --target "postgresql://postgres:postgres@localhost:5432/test_db"

      - name: Run tests
        run: npm test
```
New tables and columns from migrations show up in the next snapshot. No CI config changes needed.
More on this in our CI/CD test data guide.
## When seed scripts still make sense
Seed scripts are the right tool in some situations:
- Pre-launch projects with no production data yet.
- Intentionally fictional demos where you need a specific scenario.
- Unit tests that need three rows in one table. A snapshot is overkill there.
- Small schemas where the maintenance cost is genuinely low.
If your schema has fewer than ten tables and no PII, a seed script is probably the right choice. The crossover point is usually obvious — it is when maintaining the seed file takes more effort than it saves.
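For scale, here is the kind of case where a seed script stays the right tool — a sketch in Python's `sqlite3` with a hypothetical `plans` table: one table, three rows, no PII, trivially maintained.

```python
import sqlite3

def seeded_db() -> sqlite3.Connection:
    """Three rows in one table: a snapshot would be overkill here."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE plans (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO plans (id, name) VALUES (?, ?)",
        [(1, "free"), (2, "pro"), (3, "enterprise")],
    )
    conn.commit()
    return conn

db = seeded_db()
print(db.execute("SELECT COUNT(*) FROM plans").fetchone()[0])  # -> 3
```

At this size the fixture is its own documentation, and there is no drift to fight.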
## Migrating off the seed script
You do not have to rip it out in one PR.
Start with one workflow. Pick the place where the seed script hurts most — usually dev environment setup or CI. Set up a snapshot and run it alongside the seed script.
Compare. Run the app against both datasets. The snapshot will usually expose things the seed script missed: edge cases, data shapes that only exist in production, relationships that only worked because the script inserted rows in a specific order.
Switch gradually. Replace the seed script where the snapshot is better. Keep it for unit tests or demos if it still makes sense.
Let it go. Once the snapshot covers your main workflows, stop maintaining the seed script. Do not delete it if people reference it — just stop investing in keeping it current.
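The "compare" step above can start as something very cheap: diff per-table row counts between the two datasets before digging into data shapes. A minimal sketch with `sqlite3` (the helper name is hypothetical; against Postgres you would query `information_schema.tables` instead of `sqlite_master`):

```python
import sqlite3

def table_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Row count per table -- a cheap first diff between two datasets."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in tables}

# Hypothetical example: a seeded database vs. a restored snapshot.
seed = sqlite3.connect(":memory:")
seed.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
seed.execute("INSERT INTO users (id) VALUES (1)")

snapshot = sqlite3.connect(":memory:")
snapshot.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
snapshot.executemany("INSERT INTO users (id) VALUES (?)",
                     [(i,) for i in range(1, 48)])

print(table_counts(seed))      # {'users': 1}
print(table_counts(snapshot))  # {'users': 47}
```

Large gaps in the counts are usually where the seed script's blind spots live.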
## Final thought
The seed script is one of those things that works well enough to never get fixed. It seeds the database. The app boots. Nobody wants to touch it.
The problem is that "well enough" slowly gets worse. The schema changes, the data drifts, the edge cases multiply, and the gap between what the seed script produces and what production looks like gets wider every quarter.
Production snapshots close that gap by removing the maintenance entirely. The data stays current because it comes from the real database. The relationships stay valid because they are followed, not written by hand. And anonymization is part of the process rather than a separate step someone has to remember.
If your seed script is the file nobody wants to own, maybe the right move is making sure nobody has to.
Get started in minutes. Basecut extracts FK-aware, anonymized snapshots from PostgreSQL with one CLI command. Free for small teams. Try Basecut free →
Or explore first: quickstart guide · snapshot config reference