Divan

Posted on Dec 15, 2025

Mock Data API That Actually Understands Foreign Keys

#api #programming #sideprojects #webdev

Every developer knows the pain: you need test data for your database, but generating realistic relational data is a nightmare.

You end up with orders pointing to customers that don't exist. Invoices referencing products with invalid IDs. Foreign key constraints screaming at you.

So I built DataForge, a mock data API where foreign keys actually work.

The Problem

Let's say you have a simple e-commerce schema:

customers (id, name, email)
orders (id, customer_id, total, status)  -- customer_id references customers.id

When you generate mock data, you need orders.customer_id to contain real IDs that exist in the customers table.

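Most solutions fill customer_id with random UUIDs that match no row in customers. Then your inserts fail the foreign key check, or your app crashes on records that aren't there. Or you spend hours writing scripts to wire up the relationships by hand.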
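Here's that failure in miniature. This is a sketch using Python's built-in `sqlite3` module with the two tables above; the `CREATE TABLE` statements are a reconstruction of the schema, not something DataForge ships. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and once it does, an order pointing at a made-up customer is rejected:

```python
import sqlite3
import uuid


def insert_orphan_order() -> bool:
    """Return True if SQLite rejects an order whose customer_id matches no customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT, email TEXT)")
    conn.execute(
        "CREATE TABLE orders (id TEXT PRIMARY KEY, "
        "customer_id TEXT NOT NULL REFERENCES customers(id), total REAL, status TEXT)"
    )
    conn.execute(
        "INSERT INTO customers VALUES (?, ?, ?)",
        (str(uuid.uuid4()), "David Miller", "david.miller@gmail.com"),
    )
    try:
        # A fresh random UUID as customer_id: it matches no existing customer.
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?, ?)",
            (str(uuid.uuid4()), str(uuid.uuid4()), 116.32, "delivered"),
        )
        conn.commit()
        return False
    except sqlite3.IntegrityError:  # "FOREIGN KEY constraint failed"
        return True
    finally:
        conn.close()


print(insert_orphan_order())  # True
```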

The Solution

With DataForge, you define relationships in your schema, and the API handles the rest:

{
  "tables": {
    "customers": {
      "count": 100,
      "columns": {
        "id": { "type": "uuid" },
        "name": { "type": "person.fullName" },
        "email": { "type": "person.email" }
      }
    },
    "orders": {
      "count": 500,
      "columns": {
        "id": { "type": "uuid" },
        "customer_id": { "reference": "customers.id" },
        "total": { "type": "commerce.price(50,500)" },
        "status": { "type": "random([\"pending\",\"shipped\",\"delivered\"])" }
      }
    }
  }
}

The magic is in this line:

"customer_id": { "reference": "customers.id" }

DataForge automatically:

Generates customers first
Collects all the customer IDs
Assigns real, valid IDs to each order

No broken foreign keys. No constraint violations. Just clean, relational data.
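The three steps above fit in a few lines of Python. This is not DataForge's implementation, just an illustration under simplifying assumptions: tables are listed parent-first, every non-reference column carries a hypothetical `gen` callable in place of a type string like `person.fullName`, and reference columns are filled with `random.choice` over the parent's already-generated values:

```python
import random
import uuid
from typing import Any

Row = dict[str, Any]


def generate(tables: dict[str, dict], rng: random.Random) -> dict[str, list[Row]]:
    """Generate rows table by table, filling 'reference' columns from parent rows.

    Assumes `tables` lists every parent before its children (dicts keep insertion order).
    Non-reference columns use a hypothetical "gen" callable instead of a type string.
    """
    out: dict[str, list[Row]] = {}
    for name, spec in tables.items():
        # Step 2: collect the values each reference column is allowed to point at.
        pools: dict[str, list[Any]] = {}
        for col, col_spec in spec["columns"].items():
            if "reference" in col_spec:
                parent_table, parent_col = col_spec["reference"].split(".")
                pools[col] = [row[parent_col] for row in out[parent_table]]
        rows: list[Row] = []
        for _ in range(spec["count"]):
            row: Row = {}
            for col, col_spec in spec["columns"].items():
                if col in pools:
                    # Step 3: pick an ID that really exists (raises on an empty parent table).
                    row[col] = rng.choice(pools[col])
                else:
                    row[col] = col_spec["gen"](rng)
            rows.append(row)
        out[name] = rows  # Step 1 falls out of the ordering: parents are stored first.
    return out


def new_uuid(rng: random.Random) -> str:
    # Build a version-4 UUID from the seeded generator so results are repeatable.
    return str(uuid.UUID(int=rng.getrandbits(128), version=4))


schema = {
    "customers": {"count": 3, "columns": {
        "id": {"gen": new_uuid},
        "name": {"gen": lambda r: r.choice(["David Miller", "Mary Gonzalez", "Emma Jones"])},
    }},
    "orders": {"count": 5, "columns": {
        "id": {"gen": new_uuid},
        "customer_id": {"reference": "customers.id"},
        "total": {"gen": lambda r: round(r.uniform(50, 500), 2)},
    }},
}
data = generate(schema, random.Random(42))
```

Collecting each parent pool once per table, rather than once per row, keeps the work linear in the number of rows generated.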

Real Output

Here's actual output from the API (UUIDs truncated for readability):

{
  "customers": [
    { "id": "d457875d-...", "name": "David Miller", "email": "david.miller@gmail.com" },
    { "id": "e45b7aa7-...", "name": "Mary Gonzalez", "email": "mary.g@outlook.com" },
    { "id": "ae5e7e8b-...", "name": "Emma Jones", "email": "emma.jones42@yahoo.com" }
  ],
  "orders": [
    { "id": "156a6208-...", "customer_id": "e45b7aa7-...", "total": 116.32, "status": "delivered" },
    { "id": "b8089f49-...", "customer_id": "d457875d-...", "total": 165.47, "status": "pending" },
    { "id": "86cc7ffe-...", "customer_id": "e45b7aa7-...", "total": 176.77, "status": "delivered" }
  ]
}

Every customer_id in orders is a valid UUID from the customers table.
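If you'd rather verify that than take it on trust, a few lines of Python will do. This sketch runs against the sample above with the IDs shortened to their visible prefixes; `dangling_refs` is a hypothetical helper, not part of the API:

```python
import json


def dangling_refs(data: dict, child: str, fk: str, parent: str, pk: str = "id") -> list[dict]:
    """Return rows of `child` whose `fk` value does not appear in `parent[pk]`."""
    valid = {row[pk] for row in data[parent]}
    return [row for row in data[child] if row[fk] not in valid]


sample = json.loads("""{
  "customers": [{"id": "d457875d"}, {"id": "e45b7aa7"}, {"id": "ae5e7e8b"}],
  "orders": [
    {"id": "156a6208", "customer_id": "e45b7aa7"},
    {"id": "b8089f49", "customer_id": "d457875d"},
    {"id": "86cc7ffe", "customer_id": "e45b7aa7"}
  ]
}""")
print(dangling_refs(sample, "orders", "customer_id", "customers"))  # []
```

An empty list means every order points at a real customer.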

50+ Data Types

Beyond foreign keys, DataForge supports all the types you'd expect:

People & Identity

person.firstName, person.lastName, person.fullName
person.email, person.phone, person.username
person.age, person.jobTitle, person.dateOfBirth

Addresses

address.street, address.city, address.state
address.country, address.zipCode
address.latitude, address.longitude

Commerce & Finance

commerce.price(min, max), commerce.productName
commerce.company, commerce.category
finance.creditCard, finance.bitcoin, finance.iban

Internet

internet.url, internet.domain, internet.ip
internet.email, internet.hexColor

Dates

date.past(years), date.future(years), date.recent(days)
date.between(start, end), date.timestamp

Special Types

uuid, autoincrement, boolean(probability)
random(["option1", "option2", "option3"])
weighted({"active": 0.8, "inactive": 0.2})
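The post doesn't spell out exactly how the special types are sampled. One plausible reading, sketched here as an assumption: `random([...])` picks uniformly, `weighted({...})` picks in proportion to the weights, and `boolean(p)` is true with probability p. The `sample_special` parser is hypothetical and handles only these three forms; it is not DataForge's grammar:

```python
import json
import random
import re

# Hypothetical "name(args)" syntax; args are parsed as JSON for random/weighted.
SPEC = re.compile(r"^(\w+)\((.*)\)$")


def sample_special(spec: str, rng: random.Random):
    """Sample one value from a random(...), weighted(...) or boolean(...) type string."""
    m = SPEC.match(spec)
    if not m:
        raise ValueError(f"unrecognised type: {spec}")
    name, args = m.group(1), m.group(2)
    if name == "random":
        return rng.choice(json.loads(args))  # uniform over the listed options
    if name == "weighted":
        weights = json.loads(args)
        return rng.choices(list(weights), weights=list(weights.values()))[0]
    if name == "boolean":
        return rng.random() < float(args)  # True with probability p
    raise ValueError(f"unsupported type: {name}")


rng = random.Random(7)
statuses = [sample_special('weighted({"active": 0.8, "inactive": 0.2})', rng) for _ in range(10_000)]
print(statuses.count("active") / len(statuses))  # close to 0.8
```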

Multiple Output Formats

JSON (default)

curl -X POST https://your-api/generate \
  -H "Content-Type: application/json" \
  -d '{"tables": {...}, "output": "json"}'

SQL (PostgreSQL, MySQL, SQL Server, SQLite)

curl -X POST https://your-api/generate \
  -H "Content-Type: application/json" \
  -d '{"tables": {...}, "output": "sql", "dialect": "postgresql"}'

Output:

INSERT INTO "customers" ("id", "name", "email") VALUES
    ('d457875d-...', 'David Miller', 'david.miller@gmail.com'),
    ('e45b7aa7-...', 'Mary Gonzalez', 'mary.g@outlook.com');

INSERT INTO "orders" ("id", "customer_id", "total", "status") VALUES
    ('156a6208-...', 'e45b7aa7-...', 116.32, 'delivered'),
    ('b8089f49-...', 'd457875d-...', 165.47, 'pending');
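For the curious, here's a minimal sketch (not DataForge's code) of turning rows into a multi-row `INSERT` like the one above, using PostgreSQL-style quoting: identifiers in double quotes, strings in single quotes with embedded quotes doubled, numbers left bare. `quote_ident`, `literal` and `to_insert` are illustrative names:

```python
from typing import Any


def quote_ident(name: str) -> str:
    # Double-quote identifiers; embedded double quotes are doubled.
    return '"' + name.replace('"', '""') + '"'


def literal(value: Any) -> str:
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"  # standard SQL string escape


def to_insert(table: str, rows: list[dict[str, Any]]) -> str:
    """Render rows (assumed non-empty, same keys) as one multi-row INSERT statement."""
    cols = list(rows[0])
    header = f"INSERT INTO {quote_ident(table)} ({', '.join(map(quote_ident, cols))}) VALUES"
    values = ",\n".join("    (" + ", ".join(literal(row[c]) for c in cols) + ")" for row in rows)
    return f"{header}\n{values};"


print(to_insert("customers", [
    {"id": "d457875d", "name": "David Miller", "email": "david.miller@gmail.com"},
    {"id": "e45b7aa7", "name": "Mary Gonzalez", "email": "mary.g@outlook.com"},
]))
```

Generated literals are fine for a seed file; when loading into a live database from code, your driver's parameter binding is the safer route.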

CSV (one file per table, with a header row)
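Producing one CSV per table from the JSON-shaped output is straightforward with the standard `csv` module. A sketch, assuming each table is a list of dicts sharing the same keys; the `<table>.csv` naming is a hypothetical convention:

```python
import csv
import io


def to_csv_files(data: dict[str, list[dict]]) -> dict[str, str]:
    """Render each table as CSV text with a header row, keyed by '<table>.csv'."""
    files = {}
    for table, rows in data.items():
        buf = io.StringIO()
        fieldnames = list(rows[0]) if rows else []
        writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        files[f"{table}.csv"] = buf.getvalue()
    return files


files = to_csv_files({"customers": [{"id": "d457875d", "name": "David Miller"}]})
print(files["customers.csv"])
```

`csv.DictWriter` quotes values that contain commas or quote characters, so names like "Smith, Jr." survive the round trip.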

Reproducible Results

Need the same data every time? Use a seed:

{
  "tables": { ... },
  "seed": "my-test-seed-123"
}

Same seed and same schema = identical output. Perfect for:

Consistent test suites
Reproducible demos
Debugging specific scenarios
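Seeded reproducibility usually comes down to routing every random draw through one generator seeded from the string. A minimal sketch with Python's `random.Random`, which accepts a string seed and derives the same state from it on every run; this says nothing about how DataForge seeds internally, and `fake_customers` is a hypothetical helper:

```python
import random
import uuid


def fake_customers(seed: str, count: int) -> list[dict]:
    rng = random.Random(seed)  # every draw below comes from this one generator
    first = ["David", "Mary", "Emma", "James", "Linda"]
    last = ["Miller", "Gonzalez", "Jones", "Smith", "Brown"]
    return [
        {
            # uuid.uuid4() reads os.urandom and would break reproducibility,
            # so the UUID bits come from the seeded generator instead.
            "id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
            "name": f"{rng.choice(first)} {rng.choice(last)}",
        }
        for _ in range(count)
    ]


a = fake_customers("my-test-seed-123", 5)
b = fake_customers("my-test-seed-123", 5)
print(a == b)  # True
```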

Complex Schema Example

Here's a more realistic example with multiple relationships:

{
  "tables": {
    "users": {
      "count": 50,
      "columns": {
        "id": { "type": "uuid", "unique": true },
        "username": { "type": "person.username", "unique": true },
        "email": { "type": "person.email", "unique": true },
        "created_at": { "type": "date.past(2)" }
      }
    },
    "categories": {
      "count": 10,
      "columns": {
        "id": { "type": "autoincrement" },
        "name": { "type": "commerce.category" },
        "slug": { "type": "text.slug(2)" }
      }
    },
    "products": {
      "count": 200,
      "columns": {
        "id": { "type": "uuid" },
        "category_id": { "reference": "categories.id" },
        "name": { "type": "commerce.productName" },
        "price": { "type": "commerce.price(10,500)" },
        "in_stock": { "type": "boolean(0.8)" }
      }
    },
    "orders": {
      "count": 500,
      "columns": {
        "id": { "type": "uuid" },
        "user_id": { "reference": "users.id" },
        "total": { "type": "commerce.price(20,1000)" },
        "status": { "type": "weighted({\"pending\":0.2,\"processing\":0.3,\"shipped\":0.3,\"delivered\":0.2})" },
        "created_at": { "type": "date.recent(90)" }
      }
    },
    "order_items": {
      "count": 1500,
      "columns": {
        "id": { "type": "uuid" },
        "order_id": { "reference": "orders.id" },
        "product_id": { "reference": "products.id" },
        "quantity": { "type": "number.between(1,5)" },
        "price": { "type": "commerce.price(10,500)" }
      }
    }
  }
}

Five related tables, all with valid foreign keys, generated in milliseconds.
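With five tables, the generation order has to be derived from the references rather than written by hand. One way to do that (an assumption, not necessarily what DataForge does) is a topological sort using Python's standard-library `graphlib` (3.9+), treating each `reference` as a dependency of the child table on its parent. The schema below is trimmed to the columns that matter for ordering:

```python
from graphlib import TopologicalSorter


def generation_order(tables: dict[str, dict]) -> list[str]:
    """Order tables so every referenced (parent) table comes before its children."""
    graph = {}
    for name, spec in tables.items():
        parents = {
            col["reference"].split(".")[0]
            for col in spec["columns"].values()
            if "reference" in col
        }
        graph[name] = parents  # `name` depends on each of its parent tables
    return list(TopologicalSorter(graph).static_order())  # raises CycleError on cycles


schema = {
    "users": {"columns": {"id": {"type": "uuid"}}},
    "categories": {"columns": {"id": {"type": "autoincrement"}}},
    "products": {"columns": {"category_id": {"reference": "categories.id"}}},
    "orders": {"columns": {"user_id": {"reference": "users.id"}}},
    "order_items": {"columns": {
        "order_id": {"reference": "orders.id"},
        "product_id": {"reference": "products.id"},
    }},
}
print(generation_order(schema))
```

A self-reference (say, a hypothetical `employees.manager_id`) counts as a cycle here, so it would need a different strategy, such as a nullable column filled in a second pass.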

Use Cases

Testing
Populate your test database with realistic data that respects all constraints.

Demos & Prototypes
Show clients a working product with believable data instead of "Lorem ipsum" everywhere.

Load Testing
Generate thousands of rows to stress-test your queries and indexes.

Documentation
Create realistic API response examples for your docs.

Learning
Practice SQL joins, ORMs, and database design with proper relational data.
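For testing and learning, generated rows can go straight into a local database. A minimal sketch that loads the sample customers and orders from earlier (IDs shortened to their visible prefixes) into an in-memory SQLite database with foreign keys enforced, then runs a join; the table definitions are a reconstruction, not DataForge output:

```python
import sqlite3

customers = [("d457875d", "David Miller"), ("e45b7aa7", "Mary Gonzalez"), ("ae5e7e8b", "Emma Jones")]
orders = [("156a6208", "e45b7aa7", 116.32), ("b8089f49", "d457875d", 165.47), ("86cc7ffe", "e45b7aa7", 176.77)]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id TEXT REFERENCES customers(id), total REAL)"
)
with conn:  # commits on success; a dangling customer_id would raise IntegrityError here
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# LEFT JOIN keeps customers with no orders (Emma Jones) in the result.
totals = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, ROUND(COALESCE(SUM(o.total), 0), 2) AS spent
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id ORDER BY spent DESC
""").fetchall()
for row in totals:
    print(row)
```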

Try It

DataForge is available on RapidAPI:

👉 DataForge on RapidAPI

What's Next?

I'm actively developing DataForge and would love feedback. Some features on the roadmap:

More data types (healthcare, gaming, IoT)
GraphQL output
Schema inference from existing databases
Visual schema builder

Drop a comment with what features you'd find useful!

What's your current approach to generating test data? Have you run into the foreign key problem before? Let me know in the comments.

DEV Community