Every API test needs data to run against. A login test needs users. A checkout test needs orders, addresses, and payment records. A search test needs thousands of rows so pagination, sorting, and filtering behave like they do in production. Hand-typing that data is slow, brittle, and usually too “clean” to catch real bugs.
A test data generator fixes this by producing realistic, varied records on demand. Instead of maintaining static fixtures, you define the shape of the data you need and generate it when your mocks or tests run. This guide shows what a test data generator is, which types are available, and how to generate API test data directly inside Apidog.
If you’re new to faking API responses, start with what a mock API is, then come back here for the data-generation side.
What is a test data generator?
A test data generator is a tool or library that creates synthetic records that look like production data.
Instead of writing this repeatedly:
{
  "name": "test",
  "email": "test@test.com"
}
You define the fields and constraints you need:
{
  "name": "full name",
  "email": "valid email",
  "price": "number between 10 and 500",
  "createdAt": "recent date"
}
The generator fills in realistic values.
Good test data should be:
- Realistic: names look like names, emails pass validation, dates are sensible.
- Varied: records differ enough to catch edge cases and boundary bugs.
- Safe: data is synthetic, so no real customer PII enters your test suite.
The goal is not pretty demo data. The goal is coverage: empty strings, Unicode names, huge numbers, expired dates, invalid formats, and other inputs that can break your API.
Why generated data matters for API testing
APIs validate input and branch based on request data. They reject malformed emails, enforce limits, clamp numbers, and handle optional fields differently.
If every test uses:
John Doe / john@example.com / quantity: 1
then you mostly test the happy path.
Generated data helps you:
- Test at volume: generate thousands of records to exercise pagination, sorting, filtering, and performance.
- Target edge cases: use values like 0, -1, null, oversized strings, invalid emails, or expired timestamps.
- Run data-driven tests: feed many input rows into one test case and assert the expected result for each row.
That last workflow is especially useful for APIs: one request template, many generated inputs, repeatable assertions.
Main types of test data generators
Most teams use more than one generator depending on the test scenario.
1. Code libraries
Libraries like Faker.js and Faker for Python give you a programmatic API for generating values.
Example with Faker.js:
import { faker } from '@faker-js/faker';

// Build one synthetic user record; values differ on every run unless you set a seed.
const user = {
  id: faker.string.uuid(),
  name: faker.person.fullName(),
  email: faker.internet.email(),
  createdAt: faker.date.recent().toISOString()
};

console.log(user);
Use this option when you need:
- Full control in code
- Repeatable data with seeds
- Generated data inside scripts, test runners, or CI jobs
The trade-off is maintenance: you own the scripts and must keep them aligned with your API schema.
If you use JavaScript, see Faker.js and how to use it in Apidog.
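Seeding is what makes library-generated data repeatable. The sketch below shows the idea without any dependency: a small seeded random function (mulberry32, a widely used compact PRNG) drives a hypothetical makeUserGenerator helper, so two generators created with the same seed return identical users. The name lists and helper names are made up for illustration; with Faker.js you would normally call faker.seed() rather than write your own generator.

```javascript
// Compact seeded PRNG (mulberry32 variant). Same seed, same sequence.
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Illustrative name pools (assumptions, not real data).
const FIRST_NAMES = ['Ana', 'Bo', 'Chidi', 'Dana'];
const LAST_NAMES = ['Okafor', 'Silva', 'Nguyen', 'Meyer'];

// Hypothetical helper: returns a function that yields one user per call.
function makeUserGenerator(seed) {
  const rand = mulberry32(seed);
  const pick = (list) => list[Math.floor(rand() * list.length)];
  let nextId = 1;
  return function generateUser() {
    const first = pick(FIRST_NAMES);
    const last = pick(LAST_NAMES);
    return {
      id: nextId++,
      name: `${first} ${last}`,
      email: `${first}.${last}@example.com`.toLowerCase(),
    };
  };
}

// Two independent runs with the same seed produce the same records.
const runA = makeUserGenerator(42);
const runB = makeUserGenerator(42);
const usersA = [runA(), runA(), runA()];
const usersB = [runB(), runB(), runB()];

console.log(JSON.stringify(usersA) === JSON.stringify(usersB)); // true
```

When a test fails on generated input, logging the seed is usually enough to replay the exact same records later.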
2. Standalone and online generators
Tools like Mockaroo let you define fields in a UI and export CSV, JSON, or SQL.
Use this option when you need:
- A quick seed file
- A one-time dataset
- No-code data generation
The downside: exported data is static. When your schema changes, you usually regenerate and re-import the dataset manually.
3. Schema-based generators
Schema-based generators read your OpenAPI spec or JSON Schema and generate records based on field types, formats, and constraints.
For example, if your schema defines:
{
  "type": "object",
  "properties": {
    "email": {
      "type": "string",
      "format": "email"
    },
    "age": {
      "type": "integer",
      "minimum": 18
    }
  }
}
A schema-aware generator can create matching records automatically.
Use this option when you need:
- Data that matches your API contract
- Less manual fixture maintenance
- Mock responses aligned with OpenAPI or JSON Schema
See how to generate mock data from OpenAPI schemas. The JSON Schema standard makes this possible by describing types, formats, ranges, and required fields in a machine-readable way.
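To make the schema-driven idea concrete, here is a minimal sketch of a generator that walks the example schema above. It only understands the keywords that appear in that example (type, properties, format: email, minimum, maximum); the function names and the default upper bound are assumptions for illustration, and real schema-based tools cover far more of JSON Schema.

```javascript
// The example schema from this section, as a JavaScript object.
const userSchema = {
  type: 'object',
  properties: {
    email: { type: 'string', format: 'email' },
    age: { type: 'integer', minimum: 18 },
  },
};

// Inclusive random integer between min and max.
function randomInt(min, max) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Hypothetical helper: produce one value that satisfies the given schema node.
function generateFromSchema(schema) {
  switch (schema.type) {
    case 'object': {
      const record = {};
      for (const [key, propSchema] of Object.entries(schema.properties ?? {})) {
        record[key] = generateFromSchema(propSchema);
      }
      return record;
    }
    case 'integer': {
      const min = schema.minimum ?? 0;
      const max = schema.maximum ?? min + 100; // assumed default range
      return randomInt(min, max);
    }
    case 'string':
      if (schema.format === 'email') {
        return `user${randomInt(1, 9999)}@example.com`;
      }
      return `text-${randomInt(1, 9999)}`;
    default:
      throw new Error(`Unsupported schema type: ${schema.type}`);
  }
}

const sample = generateFromSchema(userSchema);
console.log(sample);
```

Because the constraints live in the schema, changing minimum from 18 to 21 changes the generated data without touching the generator.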
4. AI-based generators
AI-based generators create context-aware records, such as:
- A realistic support ticket
- A coherent user profile
- A plausible product description
- Related fields that make sense together
Use this option when random field-level values are not enough and you need data with semantic consistency.
For a hands-on example, see generating mock data using Claude Code.
How to generate test data in Apidog
If you test APIs in Apidog, you can generate data directly in the same workspace where you define endpoints, mocks, and tests.
Apidog supports test data generation in three practical places:
1. Smart mock data from field rules
When Apidog mocks an endpoint, it reads field names and types to generate realistic values.
Examples:
- email returns a valid email address
- createdAt returns a date
- price returns a number
- name returns a realistic name-like value
You can also attach Faker-style rules to fields when you need more control.
For example, you might define status as one of:
active
pending
closed
Then your mock response keeps the same shape as your real API while returning varied data.
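As a rough mental model of name-based rules plus an explicit enum rule, consider the sketch below. It is not how Apidog implements smart mocking; the fieldRules table, the mockRecord helper, and the fallback value are all hypothetical and exist only to show the pattern.

```javascript
// Pick one element from a list at random.
const pickOne = (list) => list[Math.floor(Math.random() * list.length)];

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical rule table: field name -> value generator.
const fieldRules = {
  email: () => `user${Math.floor(Math.random() * 1000)}@example.com`,
  createdAt: () => new Date(Date.now() - Math.floor(Math.random() * WEEK_MS)).toISOString(),
  price: () => Number((10 + Math.random() * 490).toFixed(2)),
  // Explicit rule: status must be one of three known values.
  status: () => pickOne(['active', 'pending', 'closed']),
};

// Build a record for the requested fields; unknown names get a placeholder.
function mockRecord(fieldNames) {
  return Object.fromEntries(
    fieldNames.map((name) => [name, (fieldRules[name] ?? (() => 'sample'))()])
  );
}

const mockOrder = mockRecord(['email', 'createdAt', 'price', 'status', 'nickname']);
console.log(mockOrder);
```

The shape stays fixed while the values vary, which is the property you want from a mock response.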
Download Apidog, and any endpoint you define can start returning generated data without a separately maintained fixture file such as json-server's db.json.
2. AI-generated test records
Apidog can generate batches of test records from an endpoint schema. This is useful when you want multiple realistic examples without manually writing rules for every field.
Use it when you need to quickly create:
- Users
- Orders
- Products
- Tickets
- Search results
- Validation test inputs
3. Data-driven API tests
Data-driven testing lets you attach a CSV or JSON dataset to a test step. Apidog runs the step once per row and substitutes values as variables.
For example, a CSV file might look like this:
email,password,expectedStatus
valid@example.com,correct-password,200
invalid-email,correct-password,400
valid@example.com,wrong-password,401
Your request can reference those values:
{
  "email": "{{email}}",
  "password": "{{password}}"
}
Then your assertion checks the expected result:
response.status == {{expectedStatus}}
One test step now covers multiple scenarios.
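Outside of any particular tool, the same pattern is a loop over rows. The sketch below parses the CSV above and runs one check per row against fakeLogin, a hypothetical stand-in for the real endpoint (its validation rules are assumptions chosen to match the sample rows). The CSV parsing is deliberately naive and does not handle quoted fields.

```javascript
const csv = `email,password,expectedStatus
valid@example.com,correct-password,200
invalid-email,correct-password,400
valid@example.com,wrong-password,401`;

// Naive CSV parser: header row becomes the keys of each row object.
function parseCsv(text) {
  const [headerLine, ...lines] = text.trim().split('\n');
  const headers = headerLine.split(',');
  return lines.map((line) => {
    const cells = line.split(',');
    return Object.fromEntries(headers.map((header, i) => [header, cells[i]]));
  });
}

// Hypothetical stand-in for POST /login; replace with a real HTTP call.
function fakeLogin({ email, password }) {
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email)) return { status: 400 };
  if (password !== 'correct-password') return { status: 401 };
  return { status: 200 };
}

// One request template, one assertion, executed once per row.
const results = parseCsv(csv).map((row) => {
  const response = fakeLogin(row);
  return {
    ...row,
    actualStatus: response.status,
    passed: response.status === Number(row.expectedStatus),
  };
});

for (const r of results) {
  console.log(`${r.email} / ${r.password}: ${r.passed ? 'pass' : 'fail'}`);
}
```

Adding a scenario means adding a row, not writing another test.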
For setup details, see:
- How to run parameterized API tests from CSV and JSON
- Which tool to use for data-driven API testing
- Data-driven testing in the Apidog CLI
Step by step: generate test data for an endpoint in Apidog
Use this workflow for an API endpoint that needs realistic test responses and repeatable input data.
- Open your Apidog project
Select the API project that contains the endpoint you want to test.
- Select or create an endpoint
Open the endpoint definition for the API route, such as:
GET /users
POST /orders
GET /products
- Define the response schema
Add the response fields manually or import them from an OpenAPI file.
Example response shape:
{
  "id": "string",
  "name": "string",
  "email": "string",
  "createdAt": "string"
}
- Enable the mock
Turn on mocking for the endpoint. Apidog generates values for the fields based on names, types, and rules.
- Customize fields where needed
Add field-specific mock rules for values that must follow known constraints.
Example:
status: active | pending | closed
role: admin | user | guest
- Create a dataset for test execution
Prepare a CSV or JSON dataset for input-driven scenarios.
Example JSON dataset:
[
  {
    "email": "valid@example.com",
    "password": "correct-password",
    "expectedStatus": 200
  },
  {
    "email": "invalid-email",
    "password": "correct-password",
    "expectedStatus": 400
  }
]
- Attach the dataset to a test step
Configure the test step to iterate over each row and substitute variables into the request.
- Run and review results
Apidog executes the same request against every dataset row, giving you repeatable coverage without duplicating test cases.
You now have generated mock responses for development and structured datasets for test execution in the same workflow.
How to choose a test data generator
| If you need… | Use | Why |
|---|---|---|
| Full programmatic control in JS/Python | Faker library | Flexible, scriptable, reproducible with seeds |
| A quick static seed file | Mockaroo or similar | No code, export and go |
| Data that matches your API contract | Schema-based generation | Stays aligned with OpenAPI or JSON Schema |
| Context-aware records | AI generator | Produces coherent multi-field data |
| Generated data wired into mocks and API tests | Apidog | One workspace for mock, generate, and run |
There is no single best option for every team. A scripting-heavy team may prefer Faker. A team already designing and testing APIs in Apidog can keep generation, mocking, and data-driven execution in one place.
Best practices for API test data
Seed data when you need reproducibility
Random data is useful, but failing tests must be reproducible. Use fixed seeds for test runs you need to debug later.
Generate invalid data too
Do not only generate valid records. Add rows for:
- Empty fields
- Missing required fields
- Wrong types
- Oversized strings
- Negative quantities
- Expired tokens
- Malformed emails
- Boundary values
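One way to produce such rows is to start from a single valid record and apply one mutation at a time, so each row fails for exactly one reason. The sketch below does that for a hypothetical order payload; the field names and the mutation labels are assumptions, and expired tokens are left out because they depend on your auth scheme.

```javascript
// A known-good request body (hypothetical fields).
const validOrder = {
  email: 'buyer@example.com',
  quantity: 1,
  note: 'Leave at the door',
};

// Each mutation returns a modified copy and leaves the original untouched.
const mutations = {
  emptyField: (r) => ({ ...r, note: '' }),
  missingRequired: ({ email, ...rest }) => rest, // drops the email field
  wrongType: (r) => ({ ...r, quantity: 'one' }),
  oversizedString: (r) => ({ ...r, note: 'x'.repeat(10000) }),
  negativeQuantity: (r) => ({ ...r, quantity: -1 }),
  malformedEmail: (r) => ({ ...r, email: 'buyer@@example' }),
  boundaryZero: (r) => ({ ...r, quantity: 0 }),
};

// One labeled row per mutation, ready to feed into a data-driven test.
const invalidRows = Object.entries(mutations).map(([label, mutate]) => ({
  label,
  body: mutate(validOrder),
}));

console.log(invalidRows.map((row) => row.label));
```

Keeping a label on each row makes a failing case easy to identify in the test report.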
Keep data aligned with the schema
When your API contract changes, regenerate your test data. Schema-based generation helps reduce drift between fixtures and actual API behavior.
Never use real PII
Do not copy production customer records into tests. Synthetic data avoids privacy risk and prevents sensitive data from leaking into repositories or CI logs.
Match dataset size to the test
Use small datasets for validation tests and large datasets for pagination, filtering, search, and performance scenarios.
Examples:
| Test type | Recommended data volume |
|---|---|
| Field validation | 5–20 rows |
| Authentication scenarios | 5–10 rows |
| Pagination | Hundreds or thousands of rows |
| Search and filtering | Hundreds or thousands of rows |
| Performance testing | As much as the scenario requires |
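For the high-volume rows in the table, a few lines of code are usually enough to produce the data. The sketch below generates 1,050 product records and walks every page of a hypothetical paginated handler to confirm that nothing is lost or duplicated, including on the partial last page. The generateProducts and getPage helpers are assumptions standing in for your own generator and endpoint.

```javascript
// Generate `count` records with predictable ids and prices between 10 and 500.
function generateProducts(count) {
  return Array.from({ length: count }, (_, i) => ({
    id: i + 1,
    name: `Product ${i + 1}`,
    price: 10 + ((i * 7) % 491),
  }));
}

// Hypothetical stand-in for GET /products?page=N&pageSize=M (1-based pages).
function getPage(items, page, pageSize) {
  const start = (page - 1) * pageSize;
  return items.slice(start, start + pageSize);
}

const products = generateProducts(1050);
const pageSize = 100;
const lastPage = Math.ceil(products.length / pageSize);

// Collect every id returned across all pages.
const seenIds = new Set();
for (let page = 1; page <= lastPage; page++) {
  for (const item of getPage(products, page, pageSize)) {
    seenIds.add(item.id);
  }
}

console.log(lastPage, getPage(products, lastPage, pageSize).length, seenIds.size);
// 11 50 1050
```

A record count that is not a multiple of the page size is a deliberate choice here, since the partial last page is where off-by-one bugs tend to appear.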
FAQ
What’s the difference between a test data generator and a mock server?
A test data generator creates the data. A mock server serves that data over HTTP as fake API responses.
You often need both. In Apidog, the mock can return data generated from your schema and mock rules. A standalone generator usually just gives you a file.
Can I generate test data from my OpenAPI spec?
Yes. Schema-based tools read OpenAPI types and constraints to produce matching records.
See generating mock data from OpenAPI schemas.
Is generated test data safe to commit to a repo?
Synthetic test data is generally safe because it contains no real personal information. Never commit exported production data or customer records.
How do I run one API test against many generated inputs?
Use data-driven testing. Attach a CSV or JSON dataset to a test step, reference row values as variables, and run the step once per row.
The parameterized testing guide shows the setup.
Do I need to run a fake server to use test data?
Not always. If you want a throwaway REST API backed by a flat file, see the guide to json-server and JSONPlaceholder.
For schema-aware, team-shareable mocks, use Apidog’s built-in mock server.
The short version
A test data generator turns manual fixture writing into a repeatable workflow. Use code libraries when you need scripting control, schema-based tools when your data must match an API contract, and AI-based generation when records need to make sense across multiple fields.
If you already test APIs in Apidog, you can generate data, serve smart mock responses, and run data-driven tests in one place. Download Apidog, point it at an endpoint, and start testing with realistic data on the first request.

