Discussion on: How do you populate your development databases?

View post

Wow I was just talking about this with @maestromac. We seed our database with dummy data as our default development process. Even though we could, we don't grab copies of production DB unless we also have some safety constraints in place. I just wouldn't want to do something bad by accident in that regard.

But our process is inefficient and is not an ideal development experience. I feel like it might be better if we actively maintained a central dummy database we manicured by reseeding it with a well-thought out process of creating dummy data or preprocessing production data. Then storing that in the cloud and every time someone wants to reset their dev environment, they download the ready-to-go DB that is primed for a beautiful, fresh dev setup.

Does this make sense or am I way off-base?

Jared Silver • Oct 9 '17

Yup, this makes a lot of sense -- thanks, @ben !

Though, I'm curious about what your data structure looks like that you're able to generate dummy data as part of your development process.

Prod has the benefit of giving us virtually every imaginable permutation of data, whereas manually generating it will likely result in blind spots that make it tougher to catch things during the QA process.

I definitely agree with respect to the security concerns around grabbing prod, and I really like the idea of trying to maintain a central dummy database!

Ben Halpern • Oct 9 '17

Yeah our dummy data system has its flaws and only works because it's worked so far. Far from a universal-forever solution, but we have a small team and gradually building out concrete solutions as our team and data-scaling needs necessitate them.

The centrally maintained dummy DB could be based off production, but I think it's key to have it be its own project with dedicated maintenance (from few or many) and setting the team up for success.

I'd really love to hear from others, especially those in large, hairy, long-lived projects.

Grumpy • Oct 11 '17

I've worked at a couple major online retailers with long-lived databases and they've been grotty. Basically, after a decade or more, the only way you end up with a prod database you could synthetically dummy up is if all the members that have ever worked on the team (including management) have kept technical debt near zero. Not. Gunna. Happen. Deadlines, deferred maintenance, and shifting business goals impose compromises that have long-lived effects on how other features are built, etc. The ripples go on and on.

It becomes insane to try to maintain a synthetic dev/test database that duplicates all the things that users do, or have done, possibly using features that no longer exist, or by exploiting bugs. At some point, trying to dummy up "production-adjacent" data is as much work as just using a copy of production.

The best solution I've seen used a nightly snapshot of production together with Docker so that all devs did their daily work with a full copy of the production DB. (With user passwords, etc. stripped.) We could let our dev DB get paved over every night, or flip a switch so that the replication process would leave it alone.