Hassann

Posted on Jun 17 • Originally published at apidog.com

How to Create Realistic API Test Data

Every API test needs data to run against. A login test needs users. A checkout test needs orders, addresses, and payment records. A search test needs thousands of rows so pagination, sorting, and filtering behave like they do in production. Hand-typing that data is slow, brittle, and usually too “clean” to catch real bugs.


A test data generator fixes this by producing realistic, varied records on demand. Instead of maintaining static fixtures, you define the shape of the data you need and generate it when your mocks or tests run. This guide shows what a test data generator is, which types are available, and how to generate API test data directly inside Apidog.

If you’re new to faking API responses, start with what a mock API is, then come back here for the data-generation side.

What is a test data generator?

A test data generator is a tool or library that creates synthetic records that look like production data.

Instead of writing this repeatedly:

{
  "name": "test",
  "email": "test@test.com"
}

You define the fields and constraints you need:

{
  "name": "full name",
  "email": "valid email",
  "price": "number between 10 and 500",
  "createdAt": "recent date"
}

The generator fills in realistic values.
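To make the idea concrete, here is a minimal sketch of a spec-driven generator in plain JavaScript. It is not how any particular tool works internally: `randomInt`, `pick`, `userSpec`, `generate`, and `generateMany` are hypothetical names chosen for illustration, and the name lists are made up. Each field in the spec maps to a small function that produces one value.

```javascript
// Hypothetical helpers; none of these names come from a real library.
const randomInt = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min;
const pick = (items) => items[randomInt(0, items.length - 1)];

const FIRST_NAMES = ['Amara', 'José', 'Mei', 'Oluwaseun', 'Zoë'];
const LAST_NAMES = ['Nguyen', "O'Brien", 'García', 'Kowalski', 'Smith'];
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// The spec: one generator function per field.
const userSpec = {
  name: () => `${pick(FIRST_NAMES)} ${pick(LAST_NAMES)}`,
  email: () => `user${randomInt(1, 99999)}@example.com`,
  price: () => randomInt(10, 500), // number between 10 and 500
  createdAt: () => new Date(Date.now() - randomInt(0, WEEK_MS)).toISOString(), // last 7 days
};

// Build one record by calling every field generator in the spec.
function generate(spec) {
  return Object.fromEntries(
    Object.entries(spec).map(([field, make]) => [field, make()])
  );
}

// Build `count` independent records.
function generateMany(spec, count) {
  return Array.from({ length: count }, () => generate(spec));
}

console.log(generateMany(userSpec, 3));
```

A library such as Faker.js replaces the hand-written value functions with much richer ones, but the overall shape (a field spec in, records out) stays the same.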

Good test data should be:

Realistic: names look like names, emails pass validation, dates are sensible.
Varied: records differ enough to catch edge cases and boundary bugs.
Safe: data is synthetic, so no real customer PII enters your test suite.

The goal is not pretty demo data. The goal is coverage: empty strings, Unicode names, huge numbers, expired dates, invalid formats, and other inputs that can break your API.

Why generated data matters for API testing

APIs validate input and branch based on request data. They reject malformed emails, enforce limits, clamp numbers, and handle optional fields differently.

If every test uses:

John Doe / john@example.com / quantity: 1

then you mostly test the happy path.

Generated data helps you:

Test at volume: generate thousands of records to exercise pagination, sorting, filtering, and performance.
Target edge cases: use values like 0, -1, null, oversized strings, invalid emails, or expired timestamps.
Run data-driven tests: feed many input rows into one test case and assert the expected result for each row.

That last workflow is especially useful for APIs: one request template, many generated inputs, repeatable assertions.

Main types of test data generators

Most teams use more than one generator depending on the test scenario.

1. Code libraries

Libraries like Faker.js and Faker for Python give you a programmatic API for generating values.

Example with Faker.js:

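import { faker } from '@faker-js/faker';

// Build one synthetic user; each run produces different values.
const user = {
  id: faker.string.uuid(),                      // random UUID
  name: faker.person.fullName(),                // first and last name
  email: faker.internet.email(),                // syntactically valid address
  createdAt: faker.date.recent().toISOString()  // recent timestamp, ISO 8601
};

console.log(user);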

Use this option when you need:

Full control in code
Repeatable data with seeds
Generated data inside scripts, test runners, or CI jobs

The trade-off is maintenance: you own the scripts and must keep them aligned with your API schema.

If you use JavaScript, see Faker.js and how to use it in Apidog.

2. Standalone and online generators

Tools like Mockaroo let you define fields in a UI and export CSV, JSON, or SQL.

Use this option when you need:

A quick seed file
A one-time dataset
No-code data generation

The downside: exported data is static. When your schema changes, you usually regenerate and re-import the dataset manually.

3. Schema-based generators

Schema-based generators read your OpenAPI spec or JSON Schema and generate records based on field types, formats, and constraints.

For example, if your schema defines:

{
  "type": "object",
  "properties": {
    "email": {
      "type": "string",
      "format": "email"
    },
    "age": {
      "type": "integer",
      "minimum": 18
    }
  }
}

A schema-aware generator can create matching records automatically.
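As a rough illustration of what "schema-aware" means, the sketch below walks the schema shown above and produces a matching record. It is deliberately tiny and makes assumptions: it handles only `type`, `properties`, `format: "email"`, `minimum`, and `maximum`, and the function name `fromSchema` and the fallback range are hypothetical. Real schema-based generators cover far more of JSON Schema.

```javascript
// The same schema as above, written as a JavaScript object.
const schema = {
  type: 'object',
  properties: {
    email: { type: 'string', format: 'email' },
    age: { type: 'integer', minimum: 18 },
  },
};

const randomInt = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min;

// Recursively generate a value that satisfies the given schema node.
function fromSchema(node) {
  switch (node.type) {
    case 'object':
      return Object.fromEntries(
        Object.entries(node.properties ?? {}).map(([key, child]) => [key, fromSchema(child)])
      );
    case 'integer': {
      const min = node.minimum ?? 0;
      // Assumption: with no "maximum", cap the range at minimum + 100.
      const max = node.maximum ?? min + 100;
      return randomInt(min, max);
    }
    case 'string':
      return node.format === 'email'
        ? `user${randomInt(1, 99999)}@example.com`
        : `text-${randomInt(1, 99999)}`;
    default:
      throw new Error(`Unsupported schema type: ${node.type}`);
  }
}

console.log(fromSchema(schema));
```

Because the constraints live in the schema, changing `minimum` to 21 changes the generated data without touching the generator code.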

Use this option when you need:

Data that matches your API contract
Less manual fixture maintenance
Mock responses aligned with OpenAPI or JSON Schema

See how to generate mock data from OpenAPI schemas. The JSON Schema standard makes this possible by describing types, formats, ranges, and required fields in a machine-readable way.

4. AI-based generators

AI-based generators create context-aware records, such as:

A realistic support ticket
A coherent user profile
A plausible product description
Related fields that make sense together

Use this option when random field-level values are not enough and you need data with semantic consistency.

For a hands-on example, see generating mock data using Claude Code.

How to generate test data in Apidog

If you test APIs in Apidog, you can generate data directly in the same workspace where you define endpoints, mocks, and tests.

Apidog supports test data generation in three practical places:

1. Smart mock data from field rules

When Apidog mocks an endpoint, it reads field names and types to generate realistic values.

Examples:

email returns a valid email address
createdAt returns a date
price returns a number
name returns a realistic name-like value

You can also attach Faker-style rules to fields when you need more control.

For example, you might define status as one of:

active
pending
closed

Then your mock response keeps the same shape as your real API while returning varied data.

Download Apidog and any endpoint you define can start returning generated data without maintaining a separate db.json.

2. AI-generated test records

Apidog can generate batches of test records from an endpoint schema. This is useful when you want multiple realistic examples without manually writing rules for every field.

Use it when you need to quickly create:

Users
Orders
Products
Tickets
Search results
Validation test inputs

3. Data-driven API tests

Data-driven testing lets you attach a CSV or JSON dataset to a test step. Apidog runs the step once per row and substitutes values as variables.

For example, a CSV file might look like this:

email,password,expectedStatus
valid@example.com,correct-password,200
invalid-email,correct-password,400
valid@example.com,wrong-password,401

Your request can reference those values:

{
  "email": "{{email}}",
  "password": "{{password}}"
}

Then your assertion checks the expected result:

response.status == {{expectedStatus}}

One test step now covers multiple scenarios.
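The mechanics behind this can be sketched in a few lines of JavaScript. This is an illustration of the general data-driven pattern, not Apidog's implementation: `parseCsv`, `render`, and `fakeLogin` are hypothetical helpers, the CSV parser assumes no quoted fields, and `fakeLogin` stands in for a real login endpoint.

```javascript
const csv = `email,password,expectedStatus
valid@example.com,correct-password,200
invalid-email,correct-password,400
valid@example.com,wrong-password,401`;

// Naive CSV parsing: assumes no quoted fields or embedded commas.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split('\n');
  const columns = header.split(',');
  return lines.map((line) => {
    const cells = line.split(',');
    return Object.fromEntries(columns.map((col, i) => [col, cells[i]]));
  });
}

// Replace each {{name}} placeholder with the matching value from the row.
function render(template, row) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) => row[name]);
}

// Hypothetical stand-in for the real login endpoint.
function fakeLogin({ email, password }) {
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email)) return { status: 400 };
  if (password !== 'correct-password') return { status: 401 };
  return { status: 200 };
}

const bodyTemplate = '{"email": "{{email}}", "password": "{{password}}"}';

// One request template, one assertion, one run per dataset row.
const results = parseCsv(csv).map((row) => {
  const body = JSON.parse(render(bodyTemplate, row));
  const response = fakeLogin(body);
  return { email: row.email, passed: response.status === Number(row.expectedStatus) };
});

console.log(results);
```

Adding a scenario means adding a row to the dataset; the request template and the assertion stay untouched.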

For setup details, see the parameterized testing guide.

Step by step: generate test data for an endpoint in Apidog

Use this workflow for an API endpoint that needs realistic test responses and repeatable input data.

1. Open your Apidog project

   Select the API project that contains the endpoint you want to test.

2. Select or create an endpoint

   Open the endpoint definition for the API route, such as:

   GET /users
   POST /orders
   GET /products

3. Define the response schema

   Add the response fields manually or import them from an OpenAPI file.

   Example response shape:

   {
     "id": "string",
     "name": "string",
     "email": "string",
     "createdAt": "string"
   }

4. Enable the mock

   Turn on mocking for the endpoint. Apidog generates values for the fields based on names, types, and rules.

5. Customize fields where needed

   Add field-specific mock rules for values that must follow known constraints.

   Example:

   status: active | pending | closed
   role: admin | user | guest

6. Create a dataset for test execution

   Prepare a CSV or JSON dataset for input-driven scenarios.

   Example JSON dataset:

   [
     {
       "email": "valid@example.com",
       "password": "correct-password",
       "expectedStatus": 200
     },
     {
       "email": "invalid-email",
       "password": "correct-password",
       "expectedStatus": 400
     }
   ]

7. Attach the dataset to a test step

   Configure the test step to iterate over each row and substitute variables into the request.

8. Run and review results

   Apidog executes the same request against every dataset row, giving you repeatable coverage without duplicating test cases.

You now have generated mock responses for development and structured datasets for test execution in the same workflow.

How to choose a test data generator

If you need…	Use	Why
Full programmatic control in JS/Python	Faker library	Flexible, scriptable, reproducible with seeds
A quick static seed file	Mockaroo or similar	No code, export and go
Data that matches your API contract	Schema-based generation	Stays aligned with OpenAPI or JSON Schema
Context-aware records	AI generator	Produces coherent multi-field data
Generated data wired into mocks and API tests	Apidog	One workspace for mock, generate, and run

There is no single best option for every team. A scripting-heavy team may prefer Faker. A team already designing and testing APIs in Apidog can keep generation, mocking, and data-driven execution in one place.

Best practices for API test data

Seed data when you need reproducibility

Random data is useful, but failing tests must be reproducible. Use fixed seeds for test runs you need to debug later.
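The sketch below shows why a seed makes a failing run debuggable. It uses a small hand-written seeded generator in the style of the well-known mulberry32 routine so the example has no dependencies; `makeUsers` and the field choices are hypothetical. If you use Faker.js, its `faker.seed()` function serves the same purpose.

```javascript
// Small seeded pseudo-random generator (mulberry32-style).
// The same seed always yields the same sequence of numbers in [0, 1).
function mulberry32(seed) {
  let a = seed | 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generate `count` users whose values depend only on the seed.
function makeUsers(seed, count) {
  const rand = mulberry32(seed);
  const randomInt = (min, max) => Math.floor(rand() * (max - min + 1)) + min;
  return Array.from({ length: count }, (_, i) => ({
    id: i + 1,
    email: `user${randomInt(1000, 9999)}@example.com`,
    quantity: randomInt(1, 20),
  }));
}

const runA = makeUsers(42, 3);
const runB = makeUsers(42, 3);

// Same seed, same data: a failing input can be regenerated exactly.
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true
```

A common pattern is to log the seed at the start of every test run, so a failure in CI can be replayed locally with the same data.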

Generate invalid data too

Do not only generate valid records. Add rows for:

Empty fields
Missing required fields
Wrong types
Oversized strings
Negative quantities
Expired tokens
Malformed emails
Boundary values
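One way to produce these rows is to start from a single valid record and derive one broken variant per case. The sketch below assumes a hypothetical order payload; the field names, the 10,001-character length, and the choice of 0 as the boundary value are illustrative, so adjust them to your own API's limits.

```javascript
// A known-good request body (hypothetical fields).
const validOrder = {
  email: 'buyer@example.com',
  quantity: 2,
  note: 'Leave at the door',
  tokenExpiresAt: '2999-01-01T00:00:00.000Z',
};

// Each case changes exactly one thing, so a failure points at one rule.
const negativeCases = [
  { label: 'empty field', mutate: (o) => ({ ...o, email: '' }) },
  { label: 'missing required field', mutate: ({ email, ...rest }) => rest },
  { label: 'wrong type', mutate: (o) => ({ ...o, quantity: 'two' }) },
  { label: 'oversized string', mutate: (o) => ({ ...o, note: 'x'.repeat(10001) }) },
  { label: 'negative quantity', mutate: (o) => ({ ...o, quantity: -1 }) },
  { label: 'expired token', mutate: (o) => ({ ...o, tokenExpiresAt: '2000-01-01T00:00:00.000Z' }) },
  { label: 'malformed email', mutate: (o) => ({ ...o, email: 'not-an-email' }) },
  // Whether 0 is accepted depends on your API; it is included as a boundary probe.
  { label: 'boundary value', mutate: (o) => ({ ...o, quantity: 0 }) },
];

// Mutations copy the record, so validOrder itself is never modified.
const negativeRows = negativeCases.map(({ label, mutate }) => ({
  label,
  body: mutate(validOrder),
}));

console.log(negativeRows.map((row) => row.label));
```

Pairing each row with the status code you expect turns this list directly into a dataset for a data-driven test.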

Keep data aligned with the schema

When your API contract changes, regenerate your test data. Schema-based generation helps reduce drift between fixtures and actual API behavior.

Never use real PII

Do not copy production customer records into tests. Synthetic data avoids privacy risk and prevents sensitive data from leaking into repositories or CI logs.

Match dataset size to the test

Use small datasets for validation tests and large datasets for pagination, filtering, search, and performance scenarios.

Examples:

Test type	Recommended data volume
Field validation	5–20 rows
Authentication scenarios	5–10 rows
Pagination	Hundreds or thousands of rows
Search and filtering	Hundreds or thousands of rows
Performance testing	As much as the scenario requires
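To show why pagination needs volume, the sketch below generates a dataset whose size is deliberately not a multiple of the page size, then checks the last page and the page after it. `makeProducts` and `paginate` are hypothetical; `paginate` stands in for a paginated endpoint such as `GET /products`, and the `page` and `limit` parameter names are assumptions.

```javascript
// Generate `count` products with predictable ids so pages are easy to assert on.
function makeProducts(count) {
  return Array.from({ length: count }, (_, i) => ({
    id: i + 1,
    name: `Product ${i + 1}`,
    price: 10 + ((i * 7) % 491), // always between 10 and 500
  }));
}

// Hypothetical stand-in for a paginated endpoint (1-based page numbers).
function paginate(items, page, limit) {
  const start = (page - 1) * limit;
  return {
    total: items.length,
    page,
    items: items.slice(start, start + limit),
  };
}

const products = makeProducts(1005);
const limit = 25;
const lastPage = Math.ceil(products.length / limit);

console.log(lastPage);                                             // 41
console.log(paginate(products, lastPage, limit).items.length);     // 5 (partial last page)
console.log(paginate(products, lastPage + 1, limit).items.length); // 0 (past the end)
```

With only a handful of rows, neither the partial last page nor the empty page past the end would ever be exercised.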

FAQ

What’s the difference between a test data generator and a mock server?

A test data generator creates the data. A mock server serves that data over HTTP as fake API responses.

You often need both. In Apidog, the mock can return data generated from your schema and mock rules. A standalone generator usually just gives you a file.

Can I generate test data from my OpenAPI spec?

Yes. Schema-based tools read OpenAPI types and constraints to produce matching records.

See generating mock data from OpenAPI schemas.

Is generated test data safe to commit to a repo?

Synthetic test data is generally safe because it contains no real personal information. Never commit exported production data or customer records.

How do I run one API test against many generated inputs?

Use data-driven testing. Attach a CSV or JSON dataset to a test step, reference row values as variables, and run the step once per row.

The parameterized testing guide shows the setup.

Do I need to run a fake server to use test data?

Not always. Data-driven tests feed a dataset straight into your requests, so no mock server is involved. If you do want a throwaway REST API backed by a flat file, see the guide to json-server and JSONPlaceholder.

For schema-aware, team-shareable mocks, use Apidog’s built-in mock server.

The short version

A test data generator turns manual fixture writing into a repeatable workflow. Use code libraries when you need scripting control, schema-based tools when your data must match an API contract, and AI-based generation when records need to make sense across multiple fields.

If you already test APIs in Apidog, you can generate data, serve smart mock responses, and run data-driven tests in one place. Download Apidog, point it at an endpoint, and start testing with realistic data on the first request.
