I created Mockyard - a free, open source, and self-hostable alternative to Mockaroo.
Mockyard ships as a Docker container that just runs:
docker run -p 8080:8080 ghcr.io/portside-labs/mockyard
Problem
If you don't know what Mockaroo is, it's an online tool for generating large amounts of mock data in formats like CSV, JSON, SQL, etc.
The catch is that Mockaroo is limited to 1K rows per file on the free tier, and costs $60/year if you want to generate files with up to 100K rows. It's also not open source and not self-hostable.
I built Mockyard for two reasons:
- AI has made it possible to build things that used to live in the "I wish I had time for this" category.
- I needed to test CSV ingestion pipelines with hundreds of thousands to millions of records, and I wanted something that was fast, memory efficient, easy to use, and didn't require going online or installing a bunch of languages, tools, or packages.
Now it costs me $0 to generate up to 10 million rows per file, every day of the year.
Honestly, I figured someone would have already built this, but either nobody actually has or I'm terrible at Googling.
And yes, I tried generatedata.com too. It's good, but it didn't quite fit the way I needed to generate some of my mock data.
Differences
One thing I wanted was the ability to generate data using weighted enums. In Mockyard, you can specify not just fixed enum values, but also their distribution.
For example:
- 20% of records should have role = Admin
- 30% should have role = Manager
- The remaining 50% should have role = Viewer
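To make the idea concrete, here is a minimal Python sketch of weighted enum sampling. It is not Mockyard's implementation or its config format; the `ROLE_WEIGHTS` mapping and the `sample_roles` helper are hypothetical names, made up for illustration.

```python
import random
from collections import Counter

# Hypothetical weights mirroring the example above (not Mockyard's config format).
ROLE_WEIGHTS = {"Admin": 20, "Manager": 30, "Viewer": 50}


def sample_roles(n, weights=ROLE_WEIGHTS, seed=None):
    """Draw n enum values, each picked with probability proportional to its weight."""
    rng = random.Random(seed)
    values = list(weights)
    return rng.choices(values, weights=[weights[v] for v in values], k=n)


roles = sample_roles(100_000, seed=42)
counts = Counter(roles)

# Observed share of each role; with this many rows it lands close to 20/30/50.
shares = {role: counts[role] / len(roles) for role in ROLE_WEIGHTS}
print(shares)
```

The shares are approximate by design: each row is drawn independently, so small files will drift further from the target split than large ones.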
Another issue I had was address realism. Seeing things like:
Miami, Yukon Territory, Switzerland
for city, state, and country combinations hurt my eyes.
So Mockyard supports lookup tables so fields can stay logically connected. If a city is selected, the state and country can match appropriately instead of being generated independently and producing nonsense.
You still have to specify your own lookup values, but at least the generated data looks realistic.
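The lookup-table idea can be sketched in a few lines of Python. This is an illustration under assumptions, not how Mockyard does it internally: the `LOCATIONS` table, `generate_rows`, and `to_csv` are hypothetical names. The point is that one row of the table is picked per record, so city, state, and country always come from the same entry.

```python
import csv
import io
import random

# Hypothetical user-supplied lookup table: each entry is one valid combination.
LOCATIONS = [
    {"city": "Miami", "state": "Florida", "country": "United States"},
    {"city": "Whitehorse", "state": "Yukon", "country": "Canada"},
    {"city": "Zurich", "state": "Zurich", "country": "Switzerland"},
]


def generate_rows(n, seed=None):
    """Yield n records whose location fields are taken together from one lookup entry."""
    rng = random.Random(seed)
    for i in range(1, n + 1):
        location = rng.choice(LOCATIONS)  # pick the whole entry, not each field separately
        yield {"id": i, **location}


def to_csv(rows):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "city", "state", "country"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = list(generate_rows(5, seed=1))
csv_text = to_csv(rows)
print(csv_text)
```

Generating the three fields independently is what produces combinations like the one above; sampling the entry as a unit rules them out.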
Performance
Initial benchmarks for the same CSV with four columns:
| Rows | Format | Time | Throughput (rows/sec) |
|---|---|---|---|
| 1,000 | CSV | 0.02s | ~50,000 |
| 10,000 | CSV | 0.09s | ~111,111 |
| 100,000 | CSV | 0.53s | ~188,679 |
| 1,000,000 | CSV | 4.89s | ~204,499 |
| 10,000,000 | CSV | 53.61s | ~186,532 |
10 million rows is currently the max in Mockyard.
Excel tops out at 1,048,576 rows per worksheet, so beyond roughly one million rows it won't load the whole file anyway. The 10 million cap should cover most real-world scenarios.
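For reference, the throughput column in the table is just rows divided by elapsed seconds, rounded to the nearest whole number. A small Python check using the figures copied from the table (the `BENCHMARKS` list and `throughput` helper are names made up for this sketch):

```python
# (rows, seconds) pairs copied from the benchmark table above.
BENCHMARKS = [
    (1_000, 0.02),
    (10_000, 0.09),
    (100_000, 0.53),
    (1_000_000, 4.89),
    (10_000_000, 53.61),
]


def throughput(rows, seconds):
    """Rows generated per second, rounded to the nearest whole row."""
    return round(rows / seconds)


results = {rows: throughput(rows, seconds) for rows, seconds in BENCHMARKS}
for rows, rate in results.items():
    print(f"{rows:>10,} rows -> ~{rate:,} rows/sec")
```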
Web or API - Your Pick
If you want to generate data programmatically, Mockyard also exposes an API endpoint.
The UI actually uses the exact same API under the hood.
Output
Right now, output is limited to CSV and JSON.
Why?
Because I personally haven't needed anything else yet.
That said, if people actually find this useful and want support for additional formats, feel free to open an issue on the repo.
Repository: https://github.com/portside-labs/mockyard

