Every developer knows the pain: you need test data for your database, but generating realistic relational data is a nightmare.
You end up with orders pointing to customers that don't exist. Invoices referencing products with invalid IDs. Foreign key constraints screaming at you.
So I built DataForge - a mock data API where foreign keys actually work.
The Problem
Let's say you have a simple e-commerce schema:
customers (id, name, email)
orders (id, customer_id, total, status) -- customer_id references customers.id
When you generate mock data, you need orders.customer_id to contain real IDs that exist in the customers table.
Most mock data tools fill that column with random UUIDs that match no row in customers. Then your inserts fail on the foreign key constraint, or you spend hours writing scripts to wire up the relationships by hand.
The Solution
With DataForge, you define relationships in your schema, and the API handles the rest:
{
  "tables": {
    "customers": {
      "count": 100,
      "columns": {
        "id": { "type": "uuid" },
        "name": { "type": "person.fullName" },
        "email": { "type": "person.email" }
      }
    },
    "orders": {
      "count": 500,
      "columns": {
        "id": { "type": "uuid" },
        "customer_id": { "reference": "customers.id" },
        "total": { "type": "commerce.price(50,500)" },
        "status": { "type": "random([\"pending\",\"shipped\",\"delivered\"])" }
      }
    }
  }
}
The magic is in this line:
"customer_id": { "reference": "customers.id" }
DataForge automatically:
- Generates customers first
- Collects all the customer IDs
- Assigns real, valid IDs to each order
No broken foreign keys. No constraint violations. Just clean, relational data.
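To make those three steps concrete, here is a minimal Python sketch of the idea. It is an illustration only, not DataForge's actual implementation: the `generate` function is hypothetical, it handles only `uuid` and `reference` columns, and it assumes parent tables are listed before the tables that reference them.

```python
import random
import uuid


def generate(schema, seed=None):
    """Hypothetical sketch: fill 'reference' columns with IDs that already exist."""
    rng = random.Random(seed)
    data = {}
    # Assumption: parent tables appear before their children in the schema.
    for table, spec in schema["tables"].items():
        rows = []
        for _ in range(spec["count"]):
            row = {}
            for column, rule in spec["columns"].items():
                if "reference" in rule:
                    parent_table, parent_column = rule["reference"].split(".")
                    # Pick the value from a parent row that was already generated.
                    row[column] = rng.choice(data[parent_table])[parent_column]
                elif rule.get("type") == "uuid":
                    row[column] = str(uuid.UUID(int=rng.getrandbits(128), version=4))
                else:
                    row[column] = None  # other column types are omitted in this sketch
            rows.append(row)
        data[table] = rows
    return data


schema = {
    "tables": {
        "customers": {"count": 3, "columns": {"id": {"type": "uuid"}}},
        "orders": {
            "count": 10,
            "columns": {
                "id": {"type": "uuid"},
                "customer_id": {"reference": "customers.id"},
            },
        },
    }
}

data = generate(schema, seed=1)
customer_ids = {customer["id"] for customer in data["customers"]}
```

Because each `customer_id` is drawn from rows that already exist, every order points at a real customer by construction.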
Real Output
Here's actual output from the API:
{
  "customers": [
    { "id": "d457875d-...", "name": "David Miller", "email": "david.miller@gmail.com" },
    { "id": "e45b7aa7-...", "name": "Mary Gonzalez", "email": "mary.g@outlook.com" },
    { "id": "ae5e7e8b-...", "name": "Emma Jones", "email": "emma.jones42@yahoo.com" }
  ],
  "orders": [
    { "id": "156a6208-...", "customer_id": "e45b7aa7-...", "total": 116.32, "status": "delivered" },
    { "id": "b8089f49-...", "customer_id": "d457875d-...", "total": 165.47, "status": "pending" },
    { "id": "86cc7ffe-...", "customer_id": "e45b7aa7-...", "total": 176.77, "status": "delivered" }
  ]
}
Every customer_id in orders is a valid UUID from the customers table.
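If you want to check that claim on your own output, a few lines of Python are enough. The `find_orphans` helper below is a hypothetical name of mine, not part of the API. The data is the sample above with the IDs left truncated exactly as shown.

```python
def find_orphans(data, child, fk, parent, pk="id"):
    """Return rows of `child` whose foreign key matches no row in `parent`."""
    parent_keys = {row[pk] for row in data[parent]}
    return [row for row in data[child] if row[fk] not in parent_keys]


data = {
    "customers": [
        {"id": "d457875d-...", "name": "David Miller"},
        {"id": "e45b7aa7-...", "name": "Mary Gonzalez"},
        {"id": "ae5e7e8b-...", "name": "Emma Jones"},
    ],
    "orders": [
        {"id": "156a6208-...", "customer_id": "e45b7aa7-..."},
        {"id": "b8089f49-...", "customer_id": "d457875d-..."},
        {"id": "86cc7ffe-...", "customer_id": "e45b7aa7-..."},
    ],
}

orphans = find_orphans(data, "orders", "customer_id", "customers")
print(orphans)  # [] because every order points at an existing customer
```

An empty list means no order is orphaned. Note that the reverse does not have to hold: Emma Jones has no orders in this sample, which is perfectly valid.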
50+ Data Types
Beyond foreign keys, DataForge supports all the types you'd expect:
People & Identity
person.firstName, person.lastName, person.fullName
person.email, person.phone, person.username
person.age, person.jobTitle, person.dateOfBirth
Addresses
address.street, address.city, address.state
address.country, address.zipCode
address.latitude, address.longitude
Commerce & Finance
commerce.price(min, max), commerce.productName
commerce.company, commerce.category
finance.creditCard, finance.bitcoin, finance.iban
Internet
internet.url, internet.domain, internet.ip
internet.email, internet.hexColor
Dates
date.past(years), date.future(years), date.recent(days)
date.between(start, end), date.timestamp
Special Types
uuid, autoincrement, boolean(probability)
random(["option1", "option2", "option3"])
weighted({"active": 0.8, "inactive": 0.2})
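As a rough mental model for the last two special types, here is a Python sketch. It assumes `boolean(probability)` yields true with the given probability and `weighted(...)` picks each key in proportion to its weight. Those semantics are my reading of the names, and the helper functions are hypothetical stand-ins rather than DataForge code.

```python
import random


def weighted(options, rng):
    """Pick one key from `options`, treating the values as relative weights."""
    values = list(options)
    weights = [options[value] for value in values]
    return rng.choices(values, weights=weights, k=1)[0]


def boolean(probability, rng):
    """Return True with the given probability (assumed semantics)."""
    return rng.random() < probability


rng = random.Random(42)
sample = [weighted({"active": 0.8, "inactive": 0.2}, rng) for _ in range(10_000)]
share_active = sample.count("active") / len(sample)

flags = [boolean(0.8, rng) for _ in range(10_000)]
share_true = sum(flags) / len(flags)
```

With 10,000 draws, both `share_active` and `share_true` land close to 0.8.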
Multiple Output Formats
JSON (default)
curl -X POST https://your-api/generate \
-H "Content-Type: application/json" \
-d '{"tables": {...}, "output": "json"}'
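The same request can be built from Python with only the standard library. This is a sketch under assumptions: `https://your-api/generate` is the placeholder URL from the curl example, `build_request` is a hypothetical helper, and any authentication headers your hosting platform requires are left out.

```python
import json
import urllib.request

API_URL = "https://your-api/generate"  # placeholder, same as in the curl example


def build_request(tables, output="json", **options):
    """Build (but do not send) a POST request carrying the schema as JSON."""
    body = {"tables": tables, "output": output, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


tables = {"customers": {"count": 1, "columns": {"id": {"type": "uuid"}}}}
request = build_request(tables)

# To send it against a real endpoint:
# with urllib.request.urlopen(request) as response:
#     data = json.load(response)
```

Extra top-level fields go through `**options`, so `build_request(tables, output="sql", dialect="postgresql")` mirrors the SQL request.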
SQL (PostgreSQL, MySQL, SQL Server, SQLite)
curl -X POST https://your-api/generate \
-H "Content-Type: application/json" \
-d '{"tables": {...}, "output": "sql", "dialect": "postgresql"}'
Output:
INSERT INTO "customers" ("id", "name", "email") VALUES
('d457875d-...', 'David Miller', 'david.miller@gmail.com'),
('e45b7aa7-...', 'Mary Gonzalez', 'mary.g@outlook.com');
INSERT INTO "orders" ("id", "customer_id", "total", "status") VALUES
('156a6208-...', 'e45b7aa7-...', 116.32, 'delivered'),
('b8089f49-...', 'd457875d-...', 165.47, 'pending');
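If you ask for JSON and want to produce statements like these yourself, the conversion is short. The sketch below is not how DataForge renders SQL. It is a hypothetical helper that uses PostgreSQL-style quoting only, and for application code you would normally prefer your database driver's parameterized queries.

```python
def quote_identifier(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_value(value):
    """Render a Python value as a SQL literal (PostgreSQL-style, sketch only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def to_insert(table, rows):
    """Build one multi-row INSERT statement from a non-empty list of dicts."""
    columns = list(rows[0])
    column_list = ", ".join(quote_identifier(column) for column in columns)
    value_rows = ",\n".join(
        "(" + ", ".join(quote_value(row[column]) for column in columns) + ")"
        for row in rows
    )
    return f"INSERT INTO {quote_identifier(table)} ({column_list}) VALUES\n{value_rows};"


orders = [
    {"id": "156a6208-...", "customer_id": "e45b7aa7-...", "total": 116.32, "status": "delivered"},
    {"id": "b8089f49-...", "customer_id": "d457875d-...", "total": 165.47, "status": "pending"},
]
sql = to_insert("orders", orders)
print(sql)
```

Doubling single quotes matters as soon as a generated name contains an apostrophe, for example `quote_value("O'Brien")` gives `'O''Brien'`.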
CSV - One file per table with headers
Reproducible Results
Need the same data every time? Use a seed:
{
  "tables": { ... },
  "seed": "my-test-seed-123"
}
The same seed with the same schema produces identical output. Perfect for:
- Consistent test suites
- Reproducible demos
- Debugging specific scenarios
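Seeding works the same way in most random generators, and Python's standard library shows the principle well. This sketch says nothing about how DataForge seeds its generators internally. `generate_totals` is a hypothetical function that draws a few order totals from a private, seeded generator.

```python
import random


def generate_totals(seed, count=3):
    """Draw `count` prices from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [round(rng.uniform(50, 500), 2) for _ in range(count)]


first = generate_totals("my-test-seed-123")
second = generate_totals("my-test-seed-123")
other = generate_totals("another-seed")

print(first == second)  # True: same seed, same sequence
```

The key design choice is a dedicated `random.Random(seed)` instance per run instead of the shared module-level generator, so unrelated code cannot disturb the sequence.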
Complex Schema Example
Here's a more realistic example with multiple relationships:
{
  "tables": {
    "users": {
      "count": 50,
      "columns": {
        "id": { "type": "uuid", "unique": true },
        "username": { "type": "person.username", "unique": true },
        "email": { "type": "person.email", "unique": true },
        "created_at": { "type": "date.past(2)" }
      }
    },
    "categories": {
      "count": 10,
      "columns": {
        "id": { "type": "autoincrement" },
        "name": { "type": "commerce.category" },
        "slug": { "type": "text.slug(2)" }
      }
    },
    "products": {
      "count": 200,
      "columns": {
        "id": { "type": "uuid" },
        "category_id": { "reference": "categories.id" },
        "name": { "type": "commerce.productName" },
        "price": { "type": "commerce.price(10,500)" },
        "in_stock": { "type": "boolean(0.8)" }
      }
    },
    "orders": {
      "count": 500,
      "columns": {
        "id": { "type": "uuid" },
        "user_id": { "reference": "users.id" },
        "total": { "type": "commerce.price(20,1000)" },
        "status": { "type": "weighted({\"pending\":0.2,\"processing\":0.3,\"shipped\":0.3,\"delivered\":0.2})" },
        "created_at": { "type": "date.recent(90)" }
      }
    },
    "order_items": {
      "count": 1500,
      "columns": {
        "id": { "type": "uuid" },
        "order_id": { "reference": "orders.id" },
        "product_id": { "reference": "products.id" },
        "quantity": { "type": "number.between(1,5)" },
        "price": { "type": "commerce.price(10,500)" }
      }
    }
  }
}
Five related tables, all with valid foreign keys, generated in milliseconds.
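With several levels of references, parents have to be generated before children: order_items needs orders and products, which in turn need users and categories. One common way to find such an order is a topological sort. The sketch below uses Python's `graphlib` (3.9+) and a hypothetical `generation_order` helper. Whether DataForge resolves dependencies this way internally is an assumption on my part.

```python
from graphlib import TopologicalSorter


def generation_order(schema):
    """Return table names ordered so every referenced table comes first."""
    graph = {}
    for table, spec in schema["tables"].items():
        # Each table depends on the tables named in its "reference" columns.
        graph[table] = {
            rule["reference"].split(".")[0]
            for rule in spec["columns"].values()
            if "reference" in rule
        }
    return list(TopologicalSorter(graph).static_order())


# Only the reference columns of the schema above, listed children-first on purpose.
schema = {
    "tables": {
        "order_items": {
            "columns": {
                "order_id": {"reference": "orders.id"},
                "product_id": {"reference": "products.id"},
            }
        },
        "orders": {"columns": {"user_id": {"reference": "users.id"}}},
        "products": {"columns": {"category_id": {"reference": "categories.id"}}},
        "categories": {"columns": {}},
        "users": {"columns": {}},
    }
}

order = generation_order(schema)
```

In the resulting `order`, users precedes orders, categories precedes products, and order_items comes last. A circular reference would raise `graphlib.CycleError`, which is a reasonable point to reject the schema.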
Use Cases
Testing
Populate your test database with realistic data that respects all constraints.
Demos & Prototypes
Show clients a working product with believable data instead of "Lorem ipsum" everywhere.
Load Testing
Generate thousands of rows to stress-test your queries and indexes.
Documentation
Create realistic API response examples for your docs.
Learning
Practice SQL joins, ORMs, and database design with proper relational data.
Try It
DataForge is available on RapidAPI.
What's Next?
I'm actively developing DataForge and would love feedback. Some features on the roadmap:
- More data types (healthcare, gaming, IoT)
- GraphQL output
- Schema inference from existing databases
- Visual schema builder
Drop a comment with what features you'd find useful!
What's your current approach to generating test data? Have you run into the foreign key problem before? Let me know in the comments.