DEV Community

Divan


Mock Data API That Actually Understands Foreign Keys

Every developer knows the pain: you need test data for your database, but generating realistic relational data is a nightmare.

You end up with orders pointing to customers that don't exist. Invoices referencing products with invalid IDs. Foreign key constraints screaming at you.

So I built DataForge - a mock data API where foreign keys actually work.


The Problem

Let's say you have a simple e-commerce schema:

customers (id, name, email)
orders (id, customer_id, total, status)  -- customer_id references customers.id

When you generate mock data, you need orders.customer_id to contain real IDs that exist in the customers table.

Many generators just fill foreign-key columns with random UUIDs that match nothing. Then your app crashes, or you spend hours writing scripts to wire up the relationships by hand.


The Solution

With DataForge, you define relationships in your schema, and the API handles the rest:

{
  "tables": {
    "customers": {
      "count": 100,
      "columns": {
        "id": { "type": "uuid" },
        "name": { "type": "person.fullName" },
        "email": { "type": "person.email" }
      }
    },
    "orders": {
      "count": 500,
      "columns": {
        "id": { "type": "uuid" },
        "customer_id": { "reference": "customers.id" },
        "total": { "type": "commerce.price(50,500)" },
        "status": { "type": "random([\"pending\",\"shipped\",\"delivered\"])" }
      }
    }
  }
}

The magic is in this line:

"customer_id": { "reference": "customers.id" }

DataForge automatically:

  1. Generates customers first
  2. Collects all the customer IDs
  3. Assigns real, valid IDs to each order

No broken foreign keys. No constraint violations. Just clean, relational data.
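Those three steps boil down to a small loop. Here's a minimal Python sketch of the idea, not DataForge's actual code: `generate`, its `order` argument and the seeded-UUID trick are my own assumptions, and every column type other than `uuid` and `reference` is simply left as `None`.

```python
import random
import uuid

def generate(schema: dict, order: list, seed: int = 0) -> dict:
    """Hypothetical sketch: fill tables parent-first so references resolve."""
    rng = random.Random(seed)
    data = {}      # table name -> list of generated rows
    id_pools = {}  # "table.column" -> values collected from that parent column
    for table in order:  # step 1: parents must come before children in `order`
        spec = schema["tables"][table]
        rows = []
        for _ in range(spec["count"]):
            row = {}
            for col, col_spec in spec["columns"].items():
                if "reference" in col_spec:
                    ref = col_spec["reference"]
                    if ref not in id_pools:
                        # Step 2: collect every value of the parent column once
                        parent_table, parent_col = ref.split(".")
                        id_pools[ref] = [r[parent_col] for r in data[parent_table]]
                    # Step 3: assign an ID that really exists in the parent table
                    row[col] = rng.choice(id_pools[ref])
                elif col_spec.get("type") == "uuid":
                    # Seeded UUIDv4-shaped value (uuid.uuid4() would ignore the seed)
                    row[col] = str(uuid.UUID(int=rng.getrandbits(128), version=4))
                else:
                    row[col] = None  # other generators are out of scope here
            rows.append(row)
        data[table] = rows
    return data

schema = {"tables": {
    "customers": {"count": 3, "columns": {"id": {"type": "uuid"}}},
    "orders": {"count": 5, "columns": {
        "id": {"type": "uuid"},
        "customer_id": {"reference": "customers.id"},
    }},
}}
out = generate(schema, ["customers", "orders"], seed=42)
```

Because each order draws its `customer_id` from the pool of IDs that were actually generated, an orphaned reference simply can't occur.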


Real Output

Here's an excerpt of actual output from the API (UUIDs truncated for readability):

{
  "customers": [
    { "id": "d457875d-...", "name": "David Miller", "email": "david.miller@gmail.com" },
    { "id": "e45b7aa7-...", "name": "Mary Gonzalez", "email": "mary.g@outlook.com" },
    { "id": "ae5e7e8b-...", "name": "Emma Jones", "email": "emma.jones42@yahoo.com" }
  ],
  "orders": [
    { "id": "156a6208-...", "customer_id": "e45b7aa7-...", "total": 116.32, "status": "delivered" },
    { "id": "b8089f49-...", "customer_id": "d457875d-...", "total": 165.47, "status": "pending" },
    { "id": "86cc7ffe-...", "customer_id": "e45b7aa7-...", "total": 176.77, "status": "delivered" }
  ]
}

Every customer_id in orders matches an id that exists in the customers table.
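If you want to double-check that on your own output, a referential-integrity check takes only a few lines once the JSON response is parsed (for example with `json.loads`). `find_orphans` is a hypothetical helper of mine, not part of the API:

```python
def find_orphans(data: dict, child: str, fk: str, parent: str, pk: str = "id") -> list:
    """Return the child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[pk] for row in data[parent]}
    return [row for row in data[child] if row[fk] not in parent_keys]

sample = {
    "customers": [{"id": "c1"}, {"id": "c2"}],
    "orders": [{"id": "o1", "customer_id": "c2"}, {"id": "o2", "customer_id": "c9"}],
}
print(find_orphans(sample, "orders", "customer_id", "customers"))
# [{'id': 'o2', 'customer_id': 'c9'}]
```

An empty list means every reference resolves, which is exactly what a database would enforce with a `FOREIGN KEY` constraint.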


50+ Data Types

Beyond foreign keys, DataForge supports all the types you'd expect:

People & Identity

person.firstName, person.lastName, person.fullName
person.email, person.phone, person.username
person.age, person.jobTitle, person.dateOfBirth

Addresses

address.street, address.city, address.state
address.country, address.zipCode
address.latitude, address.longitude

Commerce & Finance

commerce.price(min, max), commerce.productName
commerce.company, commerce.category
finance.creditCard, finance.bitcoin, finance.iban

Internet

internet.url, internet.domain, internet.ip
internet.email, internet.hexColor

Dates

date.past(years), date.future(years), date.recent(days)
date.between(start, end), date.timestamp

Special Types

uuid, autoincrement, boolean(probability)
random(["option1", "option2", "option3"])
weighted({"active": 0.8, "inactive": 0.2})
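The parameterised types all share a `name(args)` shape. The sketch below shows one way to parse and sample the special types in Python. The regex, the `parse_spec`/`sample` names and the reading of `boolean(p)` as "true with probability p" are my assumptions, inferred from the type names rather than taken from DataForge's docs:

```python
import json
import random
import re
from typing import Optional, Tuple

# 'commerce.price(50,500)' -> name='commerce.price', args='50,500'; 'uuid' -> args=None
SPEC_RE = re.compile(r"^(?P<name>[\w.]+)(?:\((?P<args>.*)\))?$", re.DOTALL)

def parse_spec(spec: str) -> Tuple[str, Optional[str]]:
    m = SPEC_RE.match(spec)
    if m is None:
        raise ValueError(f"unrecognized type spec: {spec!r}")
    return m.group("name"), m.group("args")

def sample(spec: str, rng: random.Random):
    name, args = parse_spec(spec)
    if name == "random":
        # args is a JSON array, e.g. ["pending","shipped","delivered"]
        return rng.choice(json.loads(args))
    if name == "weighted":
        # args is a JSON object of value -> weight, e.g. {"active": 0.8, "inactive": 0.2}
        weights = json.loads(args)
        return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
    if name == "boolean":
        p = float(args) if args else 0.5  # assumed default when no probability given
        return rng.random() < p
    raise NotImplementedError(name)

rng = random.Random(7)
status = sample('weighted({"active": 0.8, "inactive": 0.2})', rng)
```

Keeping the arguments as JSON means the same escaping you already see in the schema (`\"pending\"`) carries straight through to the parser.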

Multiple Output Formats

JSON (default)

curl -X POST https://your-api/generate \
  -H "Content-Type: application/json" \
  -d '{"tables": {...}, "output": "json"}'

SQL (PostgreSQL, MySQL, SQL Server, SQLite)

curl -X POST https://your-api/generate \
  -H "Content-Type: application/json" \
  -d '{"tables": {...}, "output": "sql", "dialect": "postgresql"}'

Output:

INSERT INTO "customers" ("id", "name", "email") VALUES
    ('d457875d-...', 'David Miller', 'david.miller@gmail.com'),
    ('e45b7aa7-...', 'Mary Gonzalez', 'mary.g@outlook.com');

INSERT INTO "orders" ("id", "customer_id", "total", "status") VALUES
    ('156a6208-...', 'e45b7aa7-...', 116.32, 'delivered'),
    ('b8089f49-...', 'd457875d-...', 165.47, 'pending');
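Turning rows into statements like these is mostly a matter of quoting. Here's a minimal PostgreSQL-flavoured sketch with a hypothetical `to_insert` helper; it only handles strings, numbers, booleans and `NULL`, and other dialects quote identifiers differently (MySQL, for example, uses backticks by default):

```python
def sql_literal(value) -> str:
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Strings: wrap in single quotes and double any embedded single quote
    return "'" + str(value).replace("'", "''") + "'"

def quote_ident(name: str) -> str:
    # Double-quoted identifier, doubling embedded double quotes
    return '"' + name.replace('"', '""') + '"'

def to_insert(table: str, rows: list) -> str:
    cols = list(rows[0])
    header = f"INSERT INTO {quote_ident(table)} ({', '.join(quote_ident(c) for c in cols)}) VALUES"
    values = ",\n".join(
        "    (" + ", ".join(sql_literal(row[c]) for c in cols) + ")" for row in rows
    )
    return f"{header}\n{values};"

print(to_insert("orders", [{"id": "o1", "total": 116.32, "status": "delivered"}]))
```

Escaping by doubling quotes is what keeps a generated name like "O'Brien" from breaking the whole batch.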

CSV (one file per table, with a header row)
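In Python, the standard `csv` module covers that header-plus-rows layout. This sketch uses a hypothetical `to_csv_files` helper that returns each file's contents keyed by `<table>.csv` instead of writing to disk:

```python
import csv
import io

def to_csv_files(data: dict) -> dict:
    files = {}
    for table, rows in data.items():
        buf = io.StringIO()
        # Column order follows the first row's keys; empty tables get an empty header
        writer = csv.DictWriter(buf, fieldnames=list(rows[0]) if rows else [])
        writer.writeheader()
        writer.writerows(rows)
        files[f"{table}.csv"] = buf.getvalue()
    return files

files = to_csv_files({
    "customers": [{"id": "c1", "name": "David Miller"}],
    "orders": [{"id": "o1", "customer_id": "c1"}],
})
```

`DictWriter` also takes care of quoting values that contain commas or quotes.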


Reproducible Results

Need the same data every time? Use a seed:

{
  "tables": { ... },
  "seed": "my-test-seed-123"
}

Same seed = identical output. Perfect for:

  • Consistent test suites
  • Reproducible demos
  • Debugging specific scenarios
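I haven't described how DataForge turns the seed string into values, but the general principle is easy to show in Python: derive every random value from one seeded generator rather than from global or OS-level randomness. `fake_ids` is a hypothetical helper:

```python
import random
import uuid

def fake_ids(seed: str, n: int) -> list:
    # A private generator: seeding it doesn't touch the global `random` state
    rng = random.Random(seed)
    return [str(uuid.UUID(int=rng.getrandbits(128), version=4)) for _ in range(n)]

first = fake_ids("my-test-seed-123", 3)
second = fake_ids("my-test-seed-123", 3)
```

`uuid.uuid4()` reads from `os.urandom` and can't be seeded, so drawing the 128 bits from the seeded `random.Random` is what keeps the IDs identical between runs.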

Complex Schema Example

Here's a more realistic example with multiple relationships:

{
  "tables": {
    "users": {
      "count": 50,
      "columns": {
        "id": { "type": "uuid", "unique": true },
        "username": { "type": "person.username", "unique": true },
        "email": { "type": "person.email", "unique": true },
        "created_at": { "type": "date.past(2)" }
      }
    },
    "categories": {
      "count": 10,
      "columns": {
        "id": { "type": "autoincrement" },
        "name": { "type": "commerce.category" },
        "slug": { "type": "text.slug(2)" }
      }
    },
    "products": {
      "count": 200,
      "columns": {
        "id": { "type": "uuid" },
        "category_id": { "reference": "categories.id" },
        "name": { "type": "commerce.productName" },
        "price": { "type": "commerce.price(10,500)" },
        "in_stock": { "type": "boolean(0.8)" }
      }
    },
    "orders": {
      "count": 500,
      "columns": {
        "id": { "type": "uuid" },
        "user_id": { "reference": "users.id" },
        "total": { "type": "commerce.price(20,1000)" },
        "status": { "type": "weighted({\"pending\":0.2,\"processing\":0.3,\"shipped\":0.3,\"delivered\":0.2})" },
        "created_at": { "type": "date.recent(90)" }
      }
    },
    "order_items": {
      "count": 1500,
      "columns": {
        "id": { "type": "uuid" },
        "order_id": { "reference": "orders.id" },
        "product_id": { "reference": "products.id" },
        "quantity": { "type": "number.between(1,5)" },
        "price": { "type": "commerce.price(10,500)" }
      }
    }
  }
}

Five related tables, all with valid foreign keys, generated in milliseconds.
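With five tables, the "generate customers first" rule generalises to a topological sort over the `reference` edges. Here's a sketch using Python's standard `graphlib` module (3.9+); `generation_order` is a hypothetical helper, and the schema below keeps only the columns that matter for ordering:

```python
from graphlib import TopologicalSorter

def generation_order(schema: dict) -> list:
    graph = {}
    for table, spec in schema["tables"].items():
        # A table depends on every table its reference columns point at
        graph[table] = {
            col["reference"].split(".")[0]
            for col in spec["columns"].values()
            if "reference" in col
        }
    # static_order() lists each table after all of its dependencies
    return list(TopologicalSorter(graph).static_order())

schema = {"tables": {
    "users": {"columns": {"id": {"type": "uuid"}}},
    "categories": {"columns": {"id": {"type": "autoincrement"}}},
    "products": {"columns": {"category_id": {"reference": "categories.id"}}},
    "orders": {"columns": {"user_id": {"reference": "users.id"}}},
    "order_items": {"columns": {
        "order_id": {"reference": "orders.id"},
        "product_id": {"reference": "products.id"},
    }},
}}
order = generation_order(schema)
```

`static_order()` raises `graphlib.CycleError` on circular references, including a self-reference such as a hypothetical `employees.manager_id`. One common workaround is to generate those rows first and fill the self-referencing column in a second pass.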


Use Cases

Testing
Populate your test database with realistic data that respects all constraints.

Demos & Prototypes
Show clients a working product with believable data instead of "Lorem ipsum" everywhere.

Load Testing
Generate thousands of rows to stress-test your queries and indexes.

Documentation
Create realistic API response examples for your docs.

Learning
Practice SQL joins, ORMs, and database design with proper relational data.


Try It

DataForge is available on RapidAPI:

👉 DataForge on RapidAPI


What's Next?

I'm actively developing DataForge and would love feedback. Some features on the roadmap:

  • More data types (healthcare, gaming, IoT)
  • GraphQL output
  • Schema inference from existing databases
  • Visual schema builder

Drop a comment with what features you'd find useful!


What's your current approach to generating test data? Have you run into the foreign key problem before? Let me know in the comments.
