I'll be honest with you. For the longest time, writing database fixtures was just something I accepted as part of the job. You build a feature, you write the tests, you write the fake data the tests run against. Nobody loves doing it. Nobody complains loudly enough to change it. It just becomes part of the rhythm.
Then I inherited a codebase where the fixtures hadn't been touched in two years.
The schema had changed eleven times. Half the foreign keys in the fixture files pointed to IDs that no longer existed. Date fields were hardcoded to 2022, which meant every piece of logic tied to account age or subscription tenure was silently returning wrong results. The tests were green. The data was completely detached from anything resembling reality.
That was the moment I decided to actually fix this instead of patch it again.
The Problem With Handwritten Fixtures Nobody Talks About
The obvious problem is that fixtures go stale. You update a table, add a column, rename a field — and suddenly a fixture file three directories away is broken in a way that takes 45 minutes to track down.
But the deeper problem is that handwritten fixtures are optimistic by nature. When you write fake data by hand, you naturally write the clean, happy-path version. Realistic names, round numbers, sensible dates. You don't write the edge cases because you're not thinking about edge cases when you're setting up the test environment. You're thinking about the feature.
So your local database looks nothing like production. Production has users who signed up in 2017 with half-filled profiles. It has orders with NULL shipping addresses from a bug that lived for three weeks in 2023. It has customers whose subscription tier doesn't match their billing cycle because of a migration that ran slightly wrong. Your fixtures have none of that. And so your tests are validating behavior against a version of your product that doesn't actually exist. That gap is where bugs live. The ones that only show up in production.
Why Copying Production Doesn't Solve It
The obvious workaround is to grab a production snapshot. Real data, real distributions, real edge cases.
I've done this. Most developers have. And the problem isn't just the compliance angle — though that's real and worth taking seriously if your product touches anything regulated. The practical developer problem is that production data is unpredictable in ways that break local environments constantly.
You restore a prod snapshot locally and suddenly a background job triggers on stale records and hammers an external API. Or your seed script runs and conflicts with hardcoded IDs in your test suite. Or you realize you've just put 200,000 rows in a local Postgres instance and your laptop fan sounds like a helicopter.
And then there's the reset problem. Production data doesn't reset cleanly. You can't just wipe it and regenerate a fresh consistent state in 30 seconds when something goes wrong in a test run. Fixtures, for all their problems, at least give you that.
What I actually needed was data that behaved like production — same distributions, same edge cases, same relational complexity — but that was generated fresh on demand and had no connection to real customer records.
What I Use Now
I started using SyntheholDB a few months ago and it changed how I think about test data entirely.
The workflow is simpler than I expected. You describe your schema in plain English — not in a config file, not in YAML, just in a sentence or two the way you'd explain your data model to a new engineer on the team. Something like: "I need a Users table, an Orders table, and a Products table. Each order should link to a user by user ID. Order value should scale loosely with how long the user has been a customer. Include some users with no orders."
That's the input. What comes back is a fully generated dataset with foreign keys that actually resolve, value distributions that reflect the logic you described, date ranges that make sense, and edge cases baked in — including the users with no orders you mentioned. The relational integrity holds across every linked table.
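To make the idea concrete, here's a rough Python sketch of the kind of relational generation described above. The table names, column names, and ratios are illustrative assumptions, not SyntheholDB's actual output format: users get spread-out signup dates, order value scales loosely with tenure, about 20% of users get no orders, and every foreign key resolves.

```python
import random
from datetime import date, timedelta

random.seed(42)

# Users: signup dates spread across several years (illustrative range).
users = []
for uid in range(1, 101):
    signup = date(2018, 1, 1) + timedelta(days=random.randint(0, 2400))
    users.append({"id": uid, "signup_date": signup.isoformat()})

# Orders: value scales loosely with customer tenure; ~20% of users get none.
orders = []
oid = 1
for u in users:
    if random.random() < 0.20:
        continue  # deliberately leave this user with no orders
    tenure_days = (date(2025, 1, 1) - date.fromisoformat(u["signup_date"])).days
    for _ in range(random.randint(1, 5)):
        value = round(random.uniform(10, 50) * (1 + tenure_days / 1000), 2)
        orders.append({"id": oid, "user_id": u["id"], "value": value})
        oid += 1

# Referential integrity holds: every order points at a user that exists.
user_ids = {u["id"] for u in users}
assert all(o["user_id"] in user_ids for o in orders)
```

The point isn't that you'd write this by hand, it's that this bookkeeping (resolving keys, scaling distributions, preserving edge cases) is exactly what the generation step takes off your plate.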
Before the export, a PII scan runs automatically. If any generated value accidentally matches a pattern that resembles a real identifier — an email format that collides with an actual domain, a number sequence that looks like an SSN — it gets flagged before the file leaves the tool. Nothing slips into your environment that shouldn't be there.
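I don't know how SyntheholDB implements its scan internally, but the core idea is pattern matching over generated values before export. A minimal sketch, with two hypothetical rules (a much larger ruleset would exist in practice):

```python
import re

# Illustrative patterns only: SSN-shaped digit runs and emails at real domains.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
REAL_DOMAIN_EMAIL = re.compile(r"\b[\w.+-]+@(gmail|yahoo|outlook)\.com\b", re.I)

def scan_rows(rows):
    """Return (row_index, value) pairs that look like real identifiers."""
    flagged = []
    for i, row in enumerate(rows):
        for value in row:
            if SSN_LIKE.search(value) or REAL_DOMAIN_EMAIL.search(value):
                flagged.append((i, value))
    return flagged

rows = [
    ["alice@example.test", "ok"],          # safe: reserved test domain
    ["bob@gmail.com", "123-45-6789"],      # both values should be flagged
]
print(scan_rows(rows))
```

Running the scan on the sample rows flags both suspicious values in the second row while leaving the reserved `.test` domain alone.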
The whole thing from blank screen to a CSV ready to seed my local database takes under five minutes. You can try it free at db.synthehol.ai — no credit card, no setup call.
How It Fits Into a Real Workflow
For local development I just run the generation once, commit the output CSV as my seed file, and the team uses it. When the schema changes, I regenerate. Takes two minutes. No more hunting down which fixture file is now broken.
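Loading a committed seed CSV is the boring half of this workflow, but for completeness, here's one way it can look. The schema and file contents are made up, and I'm using SQLite's stdlib driver so the sketch is self-contained; with Postgres you'd swap the connection (or just use `COPY`):

```python
import csv
import io
import sqlite3

# Stand-in for the committed seed CSV (inlined so the example runs as-is).
seed_csv = """id,email,signup_date
1,user1@example.test,2019-03-04
2,user2@example.test,2021-11-20
"""

conn = sqlite3.connect(":memory:")  # swap for your real dev database
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, signup_date TEXT)"
)

# csv.DictReader yields dicts whose keys match the named placeholders below.
reader = csv.DictReader(io.StringIO(seed_csv))
conn.executemany(
    "INSERT INTO users (id, email, signup_date) VALUES (:id, :email, :signup_date)",
    reader,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 2
```

Because the CSV is regenerated rather than hand-edited, this seed step never drifts from the schema: regenerate, recommit, reseed.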
For CI I wired SyntheholDB into the pipeline seed step directly. Every test run starts from a freshly generated dataset that's consistent, realistic, and relational. Test isolation improved significantly because we're no longer sharing stale state between runs.
For demos it's been genuinely useful in a way I didn't anticipate. Spinning up a demo environment with realistic-looking data used to require either using sanitized production exports — which always felt slightly risky — or building a custom seed script that took days to get right.
Now I describe what the demo data should look like and have a seeded environment in under five minutes. The data looks real enough to be convincing without any actual customer records involved.
What Actually Changed
The thing I noticed most wasn't the time saved, though that's real. It was that I stopped dreading schema changes.
When your test data is generated rather than handwritten, a schema change just means regenerating. You're not hunting through fixture files. You're not updating foreign key references by hand. You describe what you need, you get it back, you move on.
The edge cases improved too. Because I can describe the distribution I want — "include about 15% of users with incomplete profiles, and 5% with orders in a failed payment state" — the test suite now covers behavior that the old handwritten fixtures never got close to. We caught two bugs in the first month that would have lived until production under the old approach.
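Those described distributions are easy to verify once the data exists. A quick sketch of the same ratios in plain Python (the field names and 15%/5% figures mirror the example above, nothing more):

```python
import random

random.seed(7)
N = 1000

users = []
for uid in range(N):
    users.append({
        "id": uid,
        # ~15% of users get an incomplete profile.
        "profile_complete": random.random() >= 0.15,
    })

orders = [
    {
        "user_id": u["id"],
        # ~5% of orders land in a failed payment state.
        "status": "payment_failed" if random.random() < 0.05 else "paid",
    }
    for u in users
]

incomplete = sum(not u["profile_complete"] for u in users) / N
failed = sum(o["status"] == "payment_failed" for o in orders) / N
print(round(incomplete, 2), round(failed, 2))  # close to 0.15 and 0.05
```

The value of stating distributions up front is that your test suite exercises the messy 15% and 5% on every run, instead of whenever someone remembers to hand-write an unhappy fixture.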
I'm not going back to writing fixtures by hand. The 20 minutes I spent figuring out SyntheholDB bought back hours every sprint and gave me test data that actually reflects how a real product behaves.
If you're still maintaining fixture files that are six months out of sync with your schema, that time is worth spending.