DEV Community: Smeet Gohel

Postman Collections vs OpenAPI Specs: Which Scales as a Source of Truth?

Smeet Gohel — Wed, 01 Jul 2026 14:32:54 +0000

Our team kept both for two years. Eventually they drifted, and the postmortem from that drift is the reason this post exists.

At first, maintaining both seemed perfectly reasonable.

Our OpenAPI specification documented the API.

Our Postman collection helped developers explore and test it.

Each served a different purpose.

Or so we thought.

As the API grew, something subtle happened.

A new endpoint would be added to the application.

The OpenAPI specification would be updated during code review.

The Postman collection would be updated a week later.

Sometimes.

Other times it wouldn't.

Months later, developers started asking questions like:

"Which request body is correct?"
"Why does the documentation say one thing but Postman sends something else?"
"Why does the generated SDK accept a field that isn't in the collection?"

Eventually we realized we no longer had one source of truth.

We had two.

And neither one completely matched production.

That experience forced us to rethink the role of each format.

The conclusion wasn't that one was "better."

It was that they solve different problems—and only one of them scales well as the authoritative API contract.

What Each Format Actually Models (And What It Doesn't)

The first mistake many teams make is assuming that Postman collections and OpenAPI specifications represent the same thing.

They don't.

Their goals are fundamentally different.

What a Postman Collection Models

A Postman collection describes interactions.

It contains:

Requests
Headers
Variables
Authentication
Example payloads
Test scripts
Pre-request scripts
Environment values

Its focus is execution.

Developers can immediately send requests and observe responses.

That's incredibly valuable during development and debugging.

What It Doesn't Model Well

Collections don't naturally describe:

Complete API contracts
Reusable schemas
Type relationships
Polymorphism
Code generation metadata
Validation rules

While Postman has added schema-related capabilities over time, contract modeling isn't its primary design goal.

What an OpenAPI Specification Models

OpenAPI focuses on describing the API itself.

It defines:

Endpoints
Operations
Parameters
Request bodies
Response schemas
Authentication
Data models
Error responses

Everything is structured around the contract.

That contract can then power:

Documentation
Client SDK generation
Mock servers
Contract validation
Test generation

OpenAPI isn't primarily an execution format.

It's a specification format.

That distinction matters.

The Drift Problem at the 50-Endpoint Mark

For small APIs, maintaining both formats rarely feels painful.

Imagine ten endpoints.

Updating two files after each change is manageable.

Now imagine fifty.

Then one hundred.

Eventually every API change requires updating:

Implementation
OpenAPI specification
Postman collection

Three artifacts.

Three opportunities for drift.

How Drift Begins

A developer adds:

```text
PATCH /customers/{id}
```

The OpenAPI specification updates immediately.

The Postman collection still contains:

```text
PUT /customers/{id}
```

Nothing breaks immediately.

Weeks later:

Someone imports the Postman collection.

Someone else generates an SDK from OpenAPI.

Now different teams are using different contracts.

Both appear correct.

Neither matches reality perfectly.
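One cheap guard against this kind of drift is a CI check that compares the operations each artifact declares. Below is a minimal TypeScript sketch. It assumes the OpenAPI document and a Postman v2.1 collection have already been parsed from JSON into plain objects. The type shapes, `findDrift`, and the rule that treats every path parameter as equivalent are illustrative assumptions, not part of either tool.

```typescript
// Minimal shapes: only the fields this sketch reads from each artifact.
interface OpenApiDoc {
  paths: Record<string, Record<string, unknown>>;
}

interface PostmanItem {
  item?: PostmanItem[]; // folders contain nested items
  request?: { method: string; url: string | { path?: string[] } };
}

const HTTP_METHODS = ["get", "put", "post", "delete", "patch", "head", "options", "trace"];

// Collapse path parameters ("{id}", "{{id}}", ":id") to "{}" so both formats compare equal.
function normalizePath(path: string): string {
  const segments = path.split("/").filter((segment) => segment.length > 0);
  return "/" + segments.map((s) => (/^(\{.+\}|:.+)$/.test(s) ? "{}" : s)).join("/");
}

// "METHOD /path" for every operation declared in the spec.
function specOperations(doc: OpenApiDoc): Set<string> {
  const ops = new Set<string>();
  for (const [path, pathItem] of Object.entries(doc.paths)) {
    for (const key of Object.keys(pathItem)) {
      // Path items also hold non-operation keys such as "parameters"; skip those.
      if (HTTP_METHODS.includes(key)) ops.add(`${key.toUpperCase()} ${normalizePath(path)}`);
    }
  }
  return ops;
}

// "METHOD /path" for every request in the collection, walking folders recursively.
function collectionOperations(items: PostmanItem[], ops = new Set<string>()): Set<string> {
  for (const item of items) {
    if (item.item) collectionOperations(item.item, ops);
    if (!item.request) continue;
    const { method, url } = item.request;
    // Assumes string URLs start with a "{{baseUrl}}"-style variable; drop it and any query string.
    const path = typeof url === "string"
      ? url.replace(/^\{\{[^}]+\}\}/, "").split("?")[0]
      : "/" + (url.path ?? []).join("/");
    ops.add(`${method.toUpperCase()} ${normalizePath(path)}`);
  }
  return ops;
}

// Operations present in one artifact but not the other.
function findDrift(doc: OpenApiDoc, collection: { item: PostmanItem[] }) {
  const spec = specOperations(doc);
  const coll = collectionOperations(collection.item);
  return {
    missingFromCollection: Array.from(spec).filter((op) => !coll.has(op)),
    missingFromSpec: Array.from(coll).filter((op) => !spec.has(op)),
  };
}
```

Run against the PATCH/PUT example above, it reports `PATCH /customers/{}` as missing from the collection and `PUT /customers/{}` as missing from the spec. Comparing only method and path keeps the check simple. Request bodies and parameters would need a deeper diff.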

Why This Gets Worse Over Time

Drift compounds.

Each inconsistent update creates another future inconsistency.

Eventually developers stop trusting both artifacts.

Instead they inspect the implementation directly.

That's exactly what documentation should prevent.

Tooling Around Each: Code Generation, Documentation, Mocks, and Tests

One of OpenAPI's biggest strengths is the surrounding ecosystem.

A single specification can generate:

Interactive documentation
Type-safe SDKs
Mock servers
Validation middleware
Contract tests
API clients

Everything originates from one contract.

Postman Tooling

Postman excels in different areas.

Examples include:

Manual exploration
Team collaboration
Environment management
Request execution
Automated collection runs
Monitoring

It's an outstanding productivity tool for developers and QA engineers.

Where OpenAPI Pulls Ahead

When APIs become larger, the contract becomes increasingly valuable.

Teams begin relying on:

Code generation
Continuous validation
Consumer-driven contracts
API governance
Version compatibility

Those workflows naturally align with the OpenAPI specification.

They Complement Each Other

This isn't an either/or situation.

Postman helps humans interact with APIs.

OpenAPI helps tools understand APIs.

That's an important distinction.

Migrating Postman → OpenAPI Without Losing Examples

One reason many teams hesitate to adopt OpenAPI is fear of losing years of curated request examples.

Fortunately, that's usually unnecessary.

A gradual migration works surprisingly well.

Step 1: Export Existing Collections

Most collections already contain valuable examples.

Those shouldn't disappear.

Export them first.

Step 2: Generate a Baseline Specification

Several tools can convert Postman collections into an initial OpenAPI document.

The output usually requires cleanup, but it's an excellent starting point.

Step 3: Promote Examples

Move useful request and response examples into:

Request schemas
Response examples
Components
Reusable objects

Examples become part of the specification itself rather than remaining hidden inside collections.
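Much of this promotion can be scripted. The sketch below is a hypothetical TypeScript helper. It assumes a Postman v2.1 collection export with object-form URLs. It walks the collection and collects each saved request body and saved response as an OpenAPI-style `examples` entry, keyed by operation. Merging the result into the spec is left to whatever tooling already edits it.

```typescript
// Only the Postman v2.1 fields this sketch reads (object-form URLs assumed).
interface SavedResponse { name: string; code: number; body?: string; }
interface PostmanRequest {
  method: string;
  url: { path?: string[] };
  body?: { mode: string; raw?: string };
}
interface PostmanItem {
  name?: string;
  item?: PostmanItem[];
  request?: PostmanRequest;
  response?: SavedResponse[]; // saved examples
}

// OpenAPI-style example maps: { [exampleName]: { value } }.
type ExampleMap = Record<string, { value: unknown }>;
interface OperationExamples {
  requestExamples: ExampleMap;
  responseExamples: Record<string, ExampleMap>; // keyed by HTTP status code
}

// Parse a body as JSON; non-JSON bodies are skipped rather than guessed at.
function parseJson(raw: string | undefined): unknown {
  if (!raw) return undefined;
  try {
    return JSON.parse(raw);
  } catch {
    return undefined;
  }
}

// Walk the collection and gather examples keyed by "METHOD /path".
function extractExamples(
  items: PostmanItem[],
  out: Record<string, OperationExamples> = {},
): Record<string, OperationExamples> {
  for (const item of items) {
    if (item.item) extractExamples(item.item, out);
    if (!item.request) continue;
    const { method, url, body } = item.request;
    // Postman ":id" segments become OpenAPI "{id}" templates.
    const segments = (url.path ?? []).map((s) => (s.startsWith(":") ? `{${s.slice(1)}}` : s));
    const key = `${method.toUpperCase()} /${segments.join("/")}`;
    const op = (out[key] ??= { requestExamples: {}, responseExamples: {} });

    const requestValue = body?.mode === "raw" ? parseJson(body.raw) : undefined;
    if (requestValue !== undefined) op.requestExamples[item.name ?? key] = { value: requestValue };

    for (const saved of item.response ?? []) {
      const value = parseJson(saved.body);
      if (value === undefined) continue;
      const byStatus = (op.responseExamples[String(saved.code)] ??= {});
      byStatus[saved.name] = { value };
    }
  }
  return out;
}
```

Each response entry maps onto `responses[status].content[...].examples` for the matching operation. Each request entry maps onto `requestBody.content[...].examples`. Bodies that aren't valid JSON are skipped, so they can be reviewed by hand.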

Step 4: Generate New Collections

Once OpenAPI becomes authoritative, regenerate Postman collections whenever the specification changes.

Instead of manually maintaining two sources, maintain one.

Generate the other.

This single workflow eliminates most synchronization problems.

When Keeping Both Actually Makes Sense

Despite everything I've written so far, there is one situation where maintaining both is entirely reasonable.

Consumer workflows.

Imagine an external developer onboarding experience.

OpenAPI provides:

Documentation
Schemas
SDK generation

Postman provides:

Ready-to-run requests
Authentication helpers
Environment variables
Example workflows

These serve different audiences.

As long as the collection is generated from OpenAPI rather than maintained independently, both remain valuable.

The key is understanding which artifact owns the contract.

A Practical Recommendation

If I were starting a new API project today, my workflow would look like this.

Step 1

Design the API using OpenAPI.

Step 2

Review the contract.

Step 3

Generate:

Documentation
SDKs
Mock servers
Contract tests
Postman collections

Step 4

Implement the API.

Step 5

Validate implementation against the contract.

Notice what's missing.

Manual synchronization.

Every artifact originates from the same specification.

That's the real advantage.

Common Mistakes Teams Make

Over time, I've seen several recurring patterns.

Treating Postman as Documentation

Collections explain requests.

They're not comprehensive API contracts.

Maintaining Both Manually

This almost always leads to drift.

Ignoring Examples

Examples are valuable.

Move them into the specification.

Don't lose them during migration.

Writing Tests Against Collections Alone

Tests become stronger when they validate against the API contract rather than only request collections.

Delaying OpenAPI Adoption

The larger the API becomes, the harder migration becomes later.

Starting early usually pays off.

Final Thoughts

The debate around Postman vs OpenAPI often assumes they're competing technologies.

I don't think that's the right way to view them.

Postman is excellent for interacting with APIs.

OpenAPI is excellent for describing APIs.

Those are complementary goals.

Where teams run into trouble is treating both as independent sources of truth.

That approach works for small projects but becomes increasingly difficult as APIs evolve.

Once an API reaches dozens of endpoints, keeping multiple manually maintained contracts synchronized becomes a maintenance problem rather than a productivity benefit.

For most organizations, the sustainable approach is straightforward:

Use OpenAPI as the authoritative contract.

Generate documentation, SDKs, mocks, tests, and even Postman collections from that contract.

Then let each tool do what it does best instead of asking every artifact to become the source of truth.

If you're evaluating workflows or deciding how to structure your API tooling, the Total Shift Left vs Postman breakdown provides a practical comparison of how contract-first approaches differ from collection-first workflows and where each fits within a modern API development process.

Testing Async Jobs and Queues End-to-End (Without sleep())

Smeet Gohel — Mon, 29 Jun 2026 13:51:54 +0000

Search the average backend test suite for sleep or wait_for and you'll find a depressing number of arbitrary values: 2, 5, sometimes 30.

Those numbers usually have a story behind them.

Someone wrote an asynchronous test that occasionally failed because a background job hadn't completed yet. To make it pass, they added a sleep(2).

A few months later, the infrastructure changed. The job occasionally took three seconds.

The test became flaky again.

Someone increased the timeout to five seconds.

Later, another environment was slower, so the timeout became thirty seconds.

Eventually, every asynchronous test in the suite was waiting far longer than necessary, pipelines became slower, and intermittent failures were dismissed as "just another flaky test."

If you've experienced this cycle, you're not alone.

Testing asynchronous systems is fundamentally different from testing synchronous APIs. The goal isn't to wait a fixed amount of time—it’s to detect when the expected outcome has actually happened.

Over the past few years, I've found that a few simple patterns eliminate most of the unnecessary sleeps while making asynchronous tests faster and far more reliable.

Here's the approach.

Why Async APIs Are Hard to Test

Unlike a synchronous REST endpoint, asynchronous workflows return before the real work has finished.

A typical request looks like this:

Client
   │
POST /orders
   │
   ▼
API
   │
Stores message
   │
Returns 202 Accepted
   │
───────────────
Background Worker
   │
Processes message
   │
Updates database
   │
Publishes event
   │
Clears cache

Your API responds immediately.

The actual business logic happens later.

The test now has two responsibilities:

Verify the request was accepted.
Verify the background processing completed correctly.

That's where many teams reach for sleep().

Why `sleep()` Is Almost Always the Wrong Tool

Consider this example:

await sleep(5000);

const order = await db.orders.findOne({
    id: orderId
});

expect(order.status).toBe("Completed");

This works…

until it doesn't.

Problems include:

The job finishes in 200 ms, but the test still waits five seconds.
The job takes six seconds during peak load, and the test fails.
CI machines are slower than local development.
Multiple background jobs compete for resources.

The fixed delay becomes either:

Too short (flaky tests), or
Too long (slow pipelines).

Neither outcome is desirable.

Why Polling Beats Sleep

Instead of waiting for a fixed duration, wait for the condition you're expecting.

Conceptually:

Is the order completed?

No.

Wait briefly.

Check again.

Still no.

Wait briefly.

Check again.

Yes.

Continue immediately.

The test finishes as soon as the condition becomes true.

Not one second later.

Polling Without Hammering the Database

One concern is excessive database traffic.

Fortunately, polling doesn't require checking every millisecond.

A practical strategy looks like:

Poll every 250–500 ms.
Stop immediately once the condition succeeds.
Respect an overall timeout.

Example:

await eventually(async () => {
    const order = await repository.find(orderId);

    expect(order.status).toBe("Completed");
});

Most jobs complete after only a few polling iterations.

The database load remains minimal.

The `eventually()` Helper — 20 Lines That Eliminate Most Sleeps

One of the most useful utilities we've adopted is a simple helper called eventually().

It repeatedly executes an assertion until either:

It succeeds, or
The timeout expires.

A simplified implementation looks like:

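async function eventually(assertion, timeout = 10000, interval = 250) {
    const deadline = Date.now() + timeout;
    let lastError;

    while (Date.now() < deadline) {
        try {
            await assertion();
            return; // condition met: continue immediately
        } catch (err) {
            lastError = err; // remember why the latest attempt failed
            await new Promise(r => setTimeout(r, interval));
        }
    }

    // Include the last assertion failure so a timeout is debuggable
    throw new Error(`Condition not satisfied within ${timeout} ms: ${lastError}`);
}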

Despite being only a few lines of code, this helper replaces dozens of arbitrary sleeps across a typical test suite.

Why It Works So Well

Instead of writing:

await sleep(10000);

you simply write:

await eventually(async () => {

    expect(await orderCompleted(orderId))
        .toBe(true);

});

If the job completes after 800 milliseconds, the test finishes within one polling interval of that: roughly 800 to 1,050 milliseconds with the default 250 ms interval.

If it needs four seconds, the helper patiently waits.

No guessing required.
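To see the timing behaviour for yourself, here is a self-contained TypeScript sketch. It pairs a typed port of the helper above with a hypothetical `startFakeJob` that completes after a chosen delay. Plain `throw` statements stand in for a test framework's `expect`.

```typescript
// Typed port of eventually(): retry the assertion until it passes or the deadline passes.
async function eventually(
  assertion: () => void | Promise<void>,
  timeoutMs = 10_000,
  intervalMs = 250,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let lastError: unknown;
  while (Date.now() < deadline) {
    try {
      await assertion();
      return; // condition met: continue immediately
    } catch (err) {
      lastError = err; // keep the most recent failure for the timeout message
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Condition not satisfied within ${timeoutMs} ms: ${String(lastError)}`);
}

// Hypothetical stand-in for a background worker: marks the order completed after delayMs.
function startFakeJob(delayMs: number): { status: string } {
  const order = { status: "Pending" };
  setTimeout(() => {
    order.status = "Completed";
  }, delayMs);
  return order;
}

// Returns how long the test actually waited for an 800 ms job.
async function measureWait(): Promise<number> {
  const order = startFakeJob(800);
  const started = Date.now();
  await eventually(() => {
    if (order.status !== "Completed") throw new Error(`status is ${order.status}`);
  }, 5_000, 250);
  return Date.now() - started;
}
```

In Jest or Vitest, `expect(...)` failures are thrown errors too, so the helper works unchanged with real assertions.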

Asserting on Side Effects: The Row, the Event, the Cache

One common mistake is checking only the API response.

Imagine:

POST /orders

returns:

202 Accepted

Many tests stop there.

That tells you only that the message entered the queue.

It says nothing about whether processing succeeded.

Instead, verify the side effects.

1. Database Changes

Example:

await eventually(async () => {

    const order = await db.orders.find(orderId);

    expect(order.status).toBe("Completed");

});

2. Published Events

Many async systems emit events.

Verify:

Event exists.
Payload is correct.
Event type matches expectations.

For example:

OrderCompleted

should appear exactly once.
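A minimal sketch of the event checks follows. It assumes the test can swap the real broker for a recording fake. `RecordingEventBus` and `expectPublishedOnce` are hypothetical names. In an async test the check would sit inside `eventually()`, but it is shown synchronously here to stay self-contained.

```typescript
// Hypothetical recording fake that stands in for the real broker during tests.
interface DomainEvent {
  type: string;
  payload: Record<string, unknown>;
}

class RecordingEventBus {
  readonly published: DomainEvent[] = [];

  publish(event: DomainEvent): void {
    this.published.push(event);
  }
}

// Passes only if exactly one event of `type` carries all the expected payload fields.
function expectPublishedOnce(
  bus: RecordingEventBus,
  type: string,
  expected: Record<string, unknown>,
): DomainEvent {
  const matches = bus.published.filter(
    (e) => e.type === type && Object.entries(expected).every(([key, value]) => e.payload[key] === value),
  );
  if (matches.length !== 1) {
    throw new Error(`Expected exactly one ${type} matching ${JSON.stringify(expected)}, found ${matches.length}`);
  }
  return matches[0];
}
```

Counting matches, rather than only checking that one exists, is what catches the duplicate `OrderCompleted` that a careless retry can produce. Filtering by payload fields keeps the count correct when other tests publish to the same bus.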

3. Cache Updates

Suppose order summaries are cached.

After processing completes:

Verify:

Cache exists.
Cached values are correct.
Stale entries disappeared.

Ignoring cache validation often hides production bugs.
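The three cache checks can be sketched in TypeScript, with in-memory maps standing in for the database and cache. The key names (`order-summary:<id>`, `open-orders`) and the worker `completeOrder` are hypothetical.

```typescript
// Hypothetical in-memory stand-ins for the orders table and the summary cache.
interface OrderSummary {
  orderId: string;
  status: string;
}

const ordersTable = new Map<string, OrderSummary>();
const summaryCache = new Map<string, OrderSummary | OrderSummary[]>();

// Worker under test: persist the status, refresh the summary, drop the stale list entry.
function completeOrder(orderId: string): void {
  const summary = { orderId, status: "Completed" };
  ordersTable.set(orderId, summary);
  summaryCache.set(`order-summary:${orderId}`, summary);
  summaryCache.delete("open-orders"); // the cached list of open orders is now stale
}

// The three cache checks: entry exists, matches the database, stale entry is gone.
function expectCacheUpdated(orderId: string): void {
  const cached = summaryCache.get(`order-summary:${orderId}`);
  if (cached === undefined) throw new Error("summary missing from cache");
  if (Array.isArray(cached) || cached.status !== ordersTable.get(orderId)?.status) {
    throw new Error("cached summary does not match the database");
  }
  if (summaryCache.has("open-orders")) throw new Error("stale open-orders entry still cached");
}
```

Comparing the cached value against the database, rather than against a hard-coded string, also catches a cache that was refreshed from stale data.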

4. Notifications

If background jobs send:

Emails
SMS
Push notifications

Test the message queue or mock notification service rather than relying solely on database assertions.

Choosing a Timeout Strategy

The next question becomes:

"How long should eventually() wait?"

Many teams guess.

I recommend using production metrics instead.

Measure the 99th Percentile

Suppose monitoring shows:

Average: 400 ms
95th percentile (P95): 900 ms
99th percentile (P99): 1.8 seconds

Choose a timeout of around 3 × P99.

In this example, 3 × 1.8 s = 5.4 s, so roughly 5–6 seconds.

This provides enough tolerance for occasional variance without masking genuine performance regressions.
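The arithmetic is easy to automate, assuming you can export recent job durations (in milliseconds) from monitoring. This sketch uses the nearest-rank percentile, which is one of several conventions (monitoring tools often interpolate). It also rounds up to whole seconds, which is a choice made here rather than part of the rule.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Timeout = multiplier x P99, rounded up to whole seconds (the rounding is a choice made here).
function timeoutFromLatencies(samplesMs: number[], multiplier = 3): number {
  return Math.ceil((multiplier * percentile(samplesMs, 99)) / 1000) * 1000;
}
```

With the P99 of 1.8 seconds from the example, this yields 6 seconds, the top of the 5–6 second range. Recomputing it from recent samples keeps the timeout in line with how the job actually behaves.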

Why Not Infinite Retries?

Infinite retries create dangerous tests.

A failed job should fail the pipeline—not wait forever.

A timeout communicates:

"This condition never became true."

That's valuable debugging information.

Testing Job Retries

Many queues automatically retry failed jobs.

Those retries deserve explicit tests.

Suppose processing fails because an external API is temporarily unavailable.

Expected behavior:

Attempt 1 → Failure → Retry → Attempt 2 → Success

Your test should verify:

Retry count.
Retry delay.
Final success.
No duplicate side effects.

Retries often introduce subtle bugs such as duplicate database writes or duplicate notifications.
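The sketch below covers those four checks, assuming the retry loop can be exercised directly. `runWithRetries`, the exponential backoff policy, and the `chargeOrder` job are hypothetical. Your queue may use fixed delays or configure retries on the broker instead. The delay function is injected, so a test can assert the retry schedule without actually waiting.

```typescript
// Hypothetical retry policy; real queues usually configure this on the broker or worker.
interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number;
}

interface JobResult {
  attempts: number;
  delaysMs: number[];
  succeeded: boolean;
}

// Runs `handler` with exponential backoff; `wait` is injected so tests need not really sleep.
async function runWithRetries(
  handler: () => Promise<void>,
  policy: RetryPolicy,
  wait: (ms: number) => Promise<void>,
): Promise<JobResult> {
  const delaysMs: number[] = [];
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      await handler();
      return { attempts: attempt, delaysMs, succeeded: true };
    } catch {
      if (attempt === policy.maxAttempts) break;
      const delay = policy.baseDelayMs * 2 ** (attempt - 1);
      delaysMs.push(delay);
      await wait(delay);
    }
  }
  return { attempts: policy.maxAttempts, delaysMs, succeeded: false };
}

// Example job: a guarded write, then an external call that fails on its first use.
const chargeLog: string[] = []; // one entry per charge actually written
let providerCalls = 0;

async function chargeOrder(orderId: string): Promise<void> {
  // Idempotency guard: without it, the retry would append a second charge.
  if (!chargeLog.includes(orderId)) chargeLog.push(orderId);
  providerCalls++;
  if (providerCalls === 1) throw new Error("503 from payment provider"); // transient failure
}
```

Remove the idempotency guard in `chargeOrder` and the retry appends a second entry to `chargeLog`. Asserting `chargeLog.length === 1` after the job succeeds is therefore what catches duplicate side effects.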

Testing Dead Letter Queues (DLQs)

Some failures will never succeed, however many times they are retried.

For example:

Invalid payload
Corrupt message
Missing required fields

After repeated retries:

The message should move into the Dead Letter Queue.

Test expectations include:

Retry limit reached.
DLQ contains message.
Original queue is empty.
Error logged.

Ignoring DLQ behavior leaves one of the most important resilience mechanisms completely untested.
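Here is a minimal in-memory model of that behaviour, with the hypothetical `InMemoryBroker` standing in for a real queue and its dead letter queue. It retries immediately for brevity, whereas real brokers usually add a delay. The expectations from the list above map onto `queue`, `deadLetters`, `attempts`, and `errorLog`.

```typescript
interface QueueMessage {
  id: string;
  body: string;
  attempts: number;
  lastError?: string;
}

// Hypothetical in-memory broker: a main queue, a dead letter queue, and an error log.
class InMemoryBroker {
  readonly queue: QueueMessage[] = [];
  readonly deadLetters: QueueMessage[] = [];
  readonly errorLog: string[] = [];
  readonly maxAttempts: number;

  constructor(maxAttempts: number) {
    this.maxAttempts = maxAttempts;
  }

  enqueue(id: string, body: string): void {
    this.queue.push({ id, body, attempts: 0 });
  }

  // Process until empty: failures are requeued until maxAttempts, then dead-lettered.
  drain(handler: (body: string) => void): void {
    while (this.queue.length > 0) {
      const message = this.queue.shift()!;
      message.attempts++;
      try {
        handler(message.body);
      } catch (err) {
        message.lastError = err instanceof Error ? err.message : String(err);
        if (message.attempts >= this.maxAttempts) {
          this.deadLetters.push(message);
          this.errorLog.push(`dead-lettered ${message.id}: ${message.lastError}`);
        } else {
          this.queue.push(message); // retried immediately here; real brokers add a delay
        }
      }
    }
  }
}

// Example handler: permanently rejects payloads missing a required field.
function processOrder(body: string): void {
  const payload = JSON.parse(body) as { orderId?: string };
  if (!payload.orderId) throw new Error("missing required field: orderId");
}
```

A test can enqueue one invalid and one valid message, then drain the queue. It should then assert that only the invalid message ends up in `deadLetters`, with `attempts` equal to the retry limit.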

Common Mistakes in Async API Testing

Over time, I've seen the same patterns repeatedly.

Using Fixed Sleeps

Creates slow and unreliable pipelines.

Verifying Only HTTP Responses

A 202 Accepted response does not guarantee successful processing.

Ignoring Side Effects

Database updates, cache invalidation, and events deserve verification.

Skipping Retry Logic

Retries often behave differently from first attempts.

Never Testing Failure Paths

DLQs and permanent failures are part of the application—not edge cases.

A Practical Async Testing Checklist

For every asynchronous workflow, ask:

Did the API return the expected acknowledgement?
Did the background job finish?
Was the database updated?
Were downstream events published?
Was the cache refreshed?
Were retries handled correctly?
Were permanent failures routed to the DLQ?
Did everything complete within acceptable time?

Answering these questions provides much stronger confidence than simply waiting five seconds and hoping the job finished.

Final Thoughts

Asynchronous systems introduce complexity that synchronous APIs simply don't have.

The temptation to sprinkle sleep() calls throughout the test suite is understandable, but those arbitrary delays almost always lead to slower pipelines, flaky builds, and difficult debugging sessions.

A better approach is to wait for outcomes rather than time.

Polling with a lightweight eventually() helper allows tests to complete as soon as work finishes, while side-effect assertions ensure background jobs actually performed the expected business operations.

Combined with sensible timeout strategies based on production metrics and explicit tests for retries and Dead Letter Queues, this creates a much more reliable approach to async API testing.

Instead of asking, "Has enough time passed?", your tests begin asking the more important question:

"Has the expected outcome happened?"

That's the question your users ultimately care about—and your automation should too.

If you're implementing asynchronous APIs with queues, events, or background workers, you'll find additional examples in the async/queue testing pattern we documented, including end-to-end workflows, queue validation techniques, and testing strategies for distributed microservices.

API Error Codes: A Test Suite Pattern I Stole from Stripe

Smeet Gohel — Fri, 26 Jun 2026 13:39:06 +0000

Read Stripe's API reference for an hour and you'll notice that errors get as much care as successes: a complete enumerated list of error codes, with example payloads. Then look at your own API.

The contrast is hard to ignore.

Stripe's API documentation treats errors as first-class citizens. Failures are documented as carefully as the happy path, with structured error types and codes, descriptions, HTTP status codes, and example responses.

Now compare that to many APIs in production.

You might find a generic list of HTTP status codes somewhere in the documentation, but business-specific errors are often buried inside controller logic, scattered across wiki pages, or simply undocumented. The test suite isn't much better—there are dozens of happy-path tests, but only a handful of negative scenarios.

That imbalance creates problems for everyone involved:

Developers don't know which errors are expected.
Frontend teams can't reliably handle failures.
QA engineers miss important negative cases.
Refactoring accidentally changes error responses without anyone noticing.

A few years ago, I borrowed a simple idea from Stripe's documentation and turned it into a testing strategy.

Instead of treating error responses as exceptions, we created an error-code catalog and made it the foundation of our negative test suite.

The result wasn't just better API error code testing—it also improved documentation, simplified maintenance, and made API contracts far more consistent.

Here's how the pattern works.

Why Error Responses Are Part of the API Contract

When people think about API testing, they naturally focus on successful responses.

Typical assertions include:

HTTP 200 OK
HTTP 201 Created
Correct JSON payload
Required fields
Business calculations

Negative testing often gets much less attention.

Maybe there are a few tests for:

Invalid authentication
Missing required fields
Unknown resources

Beyond that, many APIs rely on manual testing or hope that the framework handles everything correctly.

The problem is that real users encounter failures just as often as successful requests.

Examples include:

Customer account locked
Payment declined
Coupon expired
Inventory unavailable
Duplicate registration
Subscription canceled
Rate limit exceeded

These aren't exceptional scenarios.

They're expected business outcomes.

Treating them as first-class API contracts changes how you design both documentation and tests.

The Error-Code Catalog as a Test Input

The first step is creating a centralized catalog of every business error the API can intentionally return.

A simplified example might look like this:

errors:
  USER_NOT_FOUND:
    httpStatus: 404
    message: User not found

  EMAIL_ALREADY_EXISTS:
    httpStatus: 409
    message: Email already exists

  INVALID_TOKEN:
    httpStatus: 401
    message: Invalid authentication token

  PAYMENT_DECLINED:
    httpStatus: 402
    message: Payment declined

  ORDER_ALREADY_SHIPPED:
    httpStatus: 409
    message: Order cannot be modified

This catalog becomes far more than documentation.

It becomes an executable specification.

Instead of asking:

"What errors should this endpoint return?"

the answer already exists in one authoritative location.

Every new business error must be added here before it reaches production.

That single requirement dramatically improves consistency.

Why a Catalog Helps

Without a catalog:

Documentation drifts.
Tests become incomplete.
Frontend teams discover errors by accident.
Reviewers overlook breaking changes.

With a catalog:

Every error is documented.
Every error becomes testable.
Every API consumer sees the same contract.

The catalog becomes the foundation for automation.

One Test per Error Code, Generated from the Catalog

Once the catalog exists, generating negative tests becomes surprisingly straightforward.

Rather than manually writing dozens of repetitive tests, a generator simply iterates through every defined error.

Conceptually:

for (const [errorCode, definition] of Object.entries(catalog.errors)) {
    generateNegativeTest(errorCode, definition);
}

Each generated test validates four things:

The expected HTTP status
The error code
The error message
The response schema

Consider EMAIL_ALREADY_EXISTS.

The generated scenario might:

Create a user.
Attempt to create the same user again.
Verify the response:

{
  "code": "EMAIL_ALREADY_EXISTS",
  "message": "Email already exists"
}

The implementation differs depending on the framework, but the testing philosophy remains the same:

Every documented error deserves exactly one corresponding test.

As new error codes are introduced, new tests appear automatically.

No engineer has to remember to write them.
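Here is a framework-free TypeScript sketch of the generator, assuming the YAML catalog has been parsed into an object keyed by error code. The `triggers` map is a hypothetical addition. Generation guarantees that every catalog entry gets a test, but someone still has to describe how to provoke each error. Here, a missing trigger produces a failing test instead of a silent gap.

```typescript
// Catalog entries once the YAML is parsed (shape assumed from the example above).
interface ErrorDefinition {
  httpStatus: number;
  message: string;
}
type Catalog = Record<string, ErrorDefinition>;

interface ApiResponse {
  status: number;
  body: Record<string, unknown>;
}

// Hypothetical: a scenario that provokes one error code and returns the response.
type Trigger = () => ApiResponse;

interface GeneratedTest {
  name: string;
  run: () => void;
}

// One test per catalog entry; each checks status, code, message, and envelope keys.
function generateNegativeTests(catalog: Catalog, triggers: Partial<Record<string, Trigger>>): GeneratedTest[] {
  return Object.entries(catalog).map(([code, def]) => ({
    name: `${code} returns ${def.httpStatus}`,
    run: () => {
      const trigger = triggers[code];
      if (!trigger) throw new Error(`No trigger defined for ${code}`);
      const res = trigger();
      if (res.status !== def.httpStatus) throw new Error(`${code}: status ${res.status}, expected ${def.httpStatus}`);
      if (res.body.code !== code) throw new Error(`${code}: body.code was ${String(res.body.code)}`);
      if (res.body.message !== def.message) throw new Error(`${code}: message was ${String(res.body.message)}`);
      const keys = Object.keys(res.body).sort().join(",");
      if (keys !== "code,message,requestId") throw new Error(`${code}: unexpected envelope keys ${keys}`);
    },
  }));
}
```

With Jest or Vitest, each case would be registered as `test(testCase.name, testCase.run)`. The triggers would normally be async HTTP calls rather than the synchronous stubs used here.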

Why This Scales Better

Imagine your API exposes:

150 endpoints
90 business error codes

Maintaining those manually quickly becomes tedious.

Generation solves two maintenance problems simultaneously:

Missing tests
Duplicate effort

Instead of asking developers to remember every negative case, the catalog guarantees baseline coverage.

Engineers can then focus on more complex business workflows rather than repetitive validation tests.

The Shape Assertion That Prevents Silent Error Drift

One lesson we learned very early was this:

Checking only the HTTP status is almost useless.

Imagine an endpoint originally returns:

{
  "code": "USER_NOT_FOUND",
  "message": "User not found",
  "requestId": "abc123"
}

Months later, someone refactors the global exception handler.

The response becomes:

{
  "error": "User not found"
}

The HTTP status is still 404 Not Found.

Many tests still pass.

But every client expecting the original response contract is now broken.

We call this silent error drift.

Nothing appears wrong until consumers start failing.

The Solution: Shape Assertions

Every negative test also validates the response structure.

Example:

expect(response.body).toEqual({
    code: expect.any(String),
    message: expect.any(String),
    requestId: expect.any(String)
});

Notice that we're not only validating values.

We're validating the schema itself.

That single assertion guards every API consumer against accidental changes to the error shape.
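Here is the same idea without Jest, as a minimal TypeScript sketch. `errorShapeProblems` is a hypothetical helper that compares a body against a declared envelope and returns every mismatch. The drifted `{ "error": ... }` response above is therefore reported as three missing fields plus one unexpected one.

```typescript
type FieldType = "string" | "number" | "boolean";

// The documented error envelope: every field and its JSON type (assumed from the example above).
const ERROR_SHAPE: Record<string, FieldType> = {
  code: "string",
  message: "string",
  requestId: "string",
};

// Returns every mismatch; an empty list means the body matches the shape exactly.
function errorShapeProblems(body: unknown, shape: Record<string, FieldType> = ERROR_SHAPE): string[] {
  if (typeof body !== "object" || body === null || Array.isArray(body)) return ["body is not a JSON object"];
  const record = body as Record<string, unknown>;
  const problems: string[] = [];
  for (const [field, type] of Object.entries(shape)) {
    if (!(field in record)) problems.push(`missing field "${field}"`);
    else if (typeof record[field] !== type) problems.push(`"${field}" should be a ${type}`);
  }
  for (const field of Object.keys(record)) {
    if (!(field in shape)) problems.push(`unexpected field "${field}"`);
  }
  return problems;
}
```

Rejecting unknown fields is a policy choice. It catches accidental leaks and renames, but additive changes are usually non-breaking, so some teams only check that required fields are present.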

Why This Matters

Consumers often depend on:

Error codes
Localization keys
Correlation IDs
Documentation URLs

Removing any of these fields can become a breaking API change even though the HTTP status remains correct.

Schema validation catches those problems immediately.

Keeping the Catalog in Sync with the Code (Code Generation)

The obvious concern is maintenance.

If engineers must manually update both:

Source code
Error catalog

the catalog eventually becomes outdated.

The solution is code generation.

Most applications already define errors centrally.

For example:

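// String values keep codes readable and stable on the wire; numeric members would serialize as 0, 1, 2, ...
export enum ErrorCode {
    USER_NOT_FOUND = "USER_NOT_FOUND",
    INVALID_TOKEN = "INVALID_TOKEN",
    PAYMENT_DECLINED = "PAYMENT_DECLINED",
    EMAIL_ALREADY_EXISTS = "EMAIL_ALREADY_EXISTS"
}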

A simple generation step can produce:

API documentation
OpenAPI components
Markdown reference tables
Test inputs
SDK constants

All from the same source.

Now there's only one place where error definitions live.

Everything else is generated automatically.
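The enum alone carries only names, so the HTTP status and message have to live next to it. The minimal sketch below assumes that metadata sits beside the code list. `ERROR_METADATA`, `markdownTable`, and `catalogEntries` are hypothetical names, and a const object stands in for the enum (a string enum works the same way). Typing the metadata as `Record<ErrorCode, ...>` means the compiler rejects a new code that has no status or message.

```typescript
// Single list of error codes (a const object here; a string enum works the same way).
const ErrorCode = {
  USER_NOT_FOUND: "USER_NOT_FOUND",
  INVALID_TOKEN: "INVALID_TOKEN",
  PAYMENT_DECLINED: "PAYMENT_DECLINED",
  EMAIL_ALREADY_EXISTS: "EMAIL_ALREADY_EXISTS",
} as const;
type ErrorCode = (typeof ErrorCode)[keyof typeof ErrorCode];

// Record<ErrorCode, ...> makes the compiler reject a code that has no metadata.
const ERROR_METADATA: Record<ErrorCode, { httpStatus: number; message: string }> = {
  USER_NOT_FOUND: { httpStatus: 404, message: "User not found" },
  INVALID_TOKEN: { httpStatus: 401, message: "Invalid authentication token" },
  PAYMENT_DECLINED: { httpStatus: 402, message: "Payment declined" },
  EMAIL_ALREADY_EXISTS: { httpStatus: 409, message: "Email already exists" },
};

// Generated Markdown reference table for the docs.
function markdownTable(): string {
  const rows = (Object.keys(ERROR_METADATA) as ErrorCode[]).map((code) => {
    const { httpStatus, message } = ERROR_METADATA[code];
    return `| \`${code}\` | ${httpStatus} | ${message} |`;
  });
  return ["| Code | HTTP status | Message |", "| --- | --- | --- |", ...rows].join("\n");
}

// Generated test inputs in the catalog shape.
function catalogEntries(): Array<{ code: ErrorCode; httpStatus: number; message: string }> {
  return (Object.keys(ERROR_METADATA) as ErrorCode[]).map((code) => ({ code, ...ERROR_METADATA[code] }));
}
```

The same `ERROR_METADATA` object could feed OpenAPI components or SDK constants in the same way. Each output is just another function over one source.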

Benefits of Codegen

This approach creates several advantages:

Documentation Never Falls Behind

As soon as a new error appears in code, documentation updates automatically.

Generated Tests Stay Current

No manual synchronization required.

API Consumers Stay Aligned

Client SDKs can reference the same constants used by the server.

Code Reviews Become Easier

Adding a new business error becomes highly visible because it affects generated documentation and tests.

The Two Error Codes We Deliberately Don't Test (And Why)

Although our negative suite covers nearly every business error, there are two categories we intentionally exclude.

1. Generic Internal Server Errors

Example:

500 Internal Server Error

These represent unexpected failures.

They're not part of normal business behavior.

Rather than intentionally triggering every possible internal exception, we verify:

Sensitive details aren't exposed
Generic messages are returned
Correlation IDs exist
Logging occurs correctly

Testing every possible server failure adds little value.

Testing the response contract provides much greater return.
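Those checks can be expressed as a small contract assertion. The sketch assumes a JSON body with `message` and `requestId` fields. The leak patterns are illustrative, not exhaustive. The logging check is omitted because it depends on how your tests capture logs.

```typescript
interface HttpResult {
  status: number;
  body: Record<string, unknown>;
}

// Illustrative signs of leaked internals: stack frames, exception class names, SQL, socket errors.
const LEAK_PATTERNS: RegExp[] = [/\bat \S+ \(.*:\d+:\d+\)/, /Exception\b/, /\bSELECT\b.*\bFROM\b/i, /ECONNREFUSED/];

// Returns every way a 500 response breaks the generic error contract (empty means it complies).
function internalErrorProblems(res: HttpResult, genericMessage = "Internal server error"): string[] {
  const problems: string[] = [];
  if (res.status !== 500) problems.push(`status ${res.status}, expected 500`);
  if (res.body.message !== genericMessage) problems.push("message is not the generic text");
  const requestId = res.body.requestId;
  if (typeof requestId !== "string" || requestId.length === 0) problems.push("missing requestId");
  const serialized = JSON.stringify(res.body);
  for (const pattern of LEAK_PATTERNS) {
    if (pattern.test(serialized)) problems.push(`body matches leak pattern ${pattern.source}`);
  }
  return problems;
}
```

One way to use it is to force a single internal failure in a test environment and run this check against the response. That exercises the contract without enumerating every possible exception.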

2. Infrastructure Failures

Examples include:

Database unavailable
Network partition
DNS outage
Message broker failure
Cloud storage unavailable

These failures belong to resilience testing rather than standard API automation.

They are better validated using:

Chaos engineering
Fault injection
Infrastructure testing
Disaster recovery exercises

Mixing infrastructure scenarios into routine API negative tests usually creates unstable pipelines.

Keeping them separate results in cleaner and more reliable automation.

Additional Benefits We Didn't Expect

Once the catalog became part of our development process, several unexpected improvements appeared.

More Consistent APIs

Every endpoint used the same response format.

Better Frontend Development

Frontend teams no longer guessed which errors could occur.

Simpler Documentation

Error references stayed synchronized automatically.

Cleaner Pull Requests

Adding a new error became an explicit design decision rather than an implementation detail.

Better QA Coverage

Negative scenarios became just as visible as successful ones.

Final Thoughts

Most engineering teams invest heavily in testing successful requests while treating failures as secondary concerns.

Stripe demonstrates a different philosophy.

Errors are documented, standardized, and treated as an integral part of the public API contract.

Building an error-code catalog allowed us to adopt that same mindset.

Instead of manually maintaining dozens of repetitive error response testing scenarios, we generated them from a single source of truth.

Combined with response schema validation and code generation, the approach dramatically reduced maintenance while increasing confidence that every documented failure behaved exactly as expected.

If your API already has a growing collection of business errors, consider creating a centralized catalog before the list becomes unmanageable.

The investment is relatively small, but the payoff in documentation quality, test coverage, and long-term maintainability is substantial.

If you'd like to explore how automated API testing can support this approach, you can spin up a free trial to try this catalog pattern and see how generated negative tests, schema validation, and API contracts work together in practice.

5 OpenAPI Mistakes That Break Every Test Generator I've Tried

Smeet Gohel — Thu, 25 Jun 2026 13:28:37 +0000

I fed the same OpenAPI spec into four different test generators last weekend. All four failed on the same five things.

That wasn't supposed to happen.

The generators came from different vendors. They used different parsing engines, different test-generation strategies, and different AI capabilities. Some focused on contract testing. Others emphasized intelligent test creation and edge-case discovery.

Yet all four stumbled on the exact same sections of the specification.

The surprising part wasn't that the tools failed.

The surprising part was that every failure could be traced back to issues inside the OpenAPI document itself.

Over the years, I've learned that most API test generation problems aren't actually tool problems. They're specification quality problems.

A clean OpenAPI specification can generate hundreds of useful tests automatically.

A flawed specification can confuse even the smartest generator.

If you're using OpenAPI to generate tests, SDKs, mocks, documentation, or validation suites, these are the five most common OpenAPI mistakes I've seen repeatedly—and why they cause so much trouble.

1. Missing Required Fields on Nested `$ref` Objects

This is easily one of the most overlooked issues in OpenAPI specifications.

Consider the following schema:

Customer:
  type: object
  properties:
    id:
      type: integer
    address:
      $ref: '#/components/schemas/Address'

The referenced schema looks like:

Address:
  type: object
  properties:
    street:
      type: string
    city:
      type: string

Looks harmless.

Now imagine the actual API requires:

{
  "street": "Main Street",
  "city": "London"
}

but the schema never specifies:

required:
  - street
  - city

The generator now assumes both fields are optional.

Why Test Generators Struggle

When generating positive and negative test cases, the tool needs to know:

Which fields must exist
Which fields may be omitted
Which omissions should trigger validation failures

Without required declarations, generators often create:

{
  "address": {}
}

and treat the payload as valid.

The resulting tests provide little value because they don't reflect real application behavior.

Best Practice

Always define required fields explicitly at every schema level.

Even when schemas are reused via $ref, each referenced object should clearly identify mandatory properties.

Never assume the generator will infer business intent.

Schemas should be unambiguous.
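This is easy to lint for. The sketch below walks `components.schemas` (assumed already parsed from YAML or JSON) and flags every object schema that declares properties but no `required` list. Objects that are intentionally all-optional will be flagged too, so treat the output as a review list rather than a hard failure.

```typescript
// Only the schema keywords this sketch inspects.
interface SchemaNode {
  type?: string;
  properties?: Record<string, SchemaNode>;
  required?: string[];
  items?: SchemaNode;
  $ref?: string;
}

// Flags object schemas with properties but no `required` list, including nested ones.
// `$ref` nodes are skipped because the referenced schema is checked where it is defined.
function missingRequired(schemas: Record<string, SchemaNode>): string[] {
  const findings: string[] = [];
  const visit = (node: SchemaNode, path: string): void => {
    if (node.$ref) return;
    if (node.properties) {
      if (node.required === undefined) findings.push(path);
      for (const [name, child] of Object.entries(node.properties)) visit(child, `${path}.${name}`);
    }
    if (node.items) visit(node.items, `${path}[]`);
  };
  for (const [name, schema] of Object.entries(schemas)) visit(schema, name);
  return findings;
}
```

Run against the Customer and Address schemas above, it flags both.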

2. `anyOf` With No Discriminator (The One That Hurts Most)

If I had to choose the single most problematic specification pattern, this would be it.

Consider:

Pet:
  anyOf:
    - $ref: '#/components/schemas/Cat'
    - $ref: '#/components/schemas/Dog'

This tells the generator:

"The payload may match Cat or Dog."

Sounds reasonable.

The problem is that the generator has no reliable way to determine which schema should be used for a specific test case.

Why It Breaks Generation

Imagine:

Cat:
  properties:
    name:
      type: string

Dog:
  properties:
    name:
      type: string

Now the payload:

{
  "name": "Buddy"
}

matches both schemas.

The generator faces several questions:

Is this a Cat?
Is this a Dog?
Should both test paths be created?
Which validation rules apply?

Different tools make different assumptions.

Many of those assumptions are wrong.
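The ambiguity is easy to reproduce. The sketch below uses a deliberately tiny subset of JSON Schema validation (only `type`, `required`, and `properties`) and counts how many `anyOf` branches a payload satisfies. The Cat/Dog schemas mirror the ones above with `type: object` added. `matches` and `matching_variants` are hypothetical helpers, not a real validator's API.

```python
from typing import Any

# The Cat/Dog schemas from above (with `type: object` added). Neither declares
# `required` or restricts additional properties, so both accept almost anything.
CAT: dict[str, Any] = {"type": "object", "properties": {"name": {"type": "string"}}}
DOG: dict[str, Any] = {"type": "object", "properties": {"name": {"type": "string"}}}

JSON_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict}


def matches(schema: dict[str, Any], value: Any) -> bool:
    """Tiny validator: checks type, required, and nested properties only."""
    expected = JSON_TYPES[schema["type"]]
    if not isinstance(value, expected):
        return False
    # In Python, bool is a subclass of int; JSON Schema keeps them separate.
    if expected is int and isinstance(value, bool):
        return False
    if expected is dict:
        if any(name not in value for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in value and not matches(sub, value[name]):
                return False
    return True


def matching_variants(variants: dict[str, dict[str, Any]], value: Any) -> list[str]:
    """Return the name of every branch the value satisfies."""
    return [name for name, schema in variants.items() if matches(schema, value)]
```

For `{"name": "Buddy"}`, `matching_variants({"Cat": CAT, "Dog": DOG}, ...)` returns both names, so the generator has to guess. It also shows why swapping `anyOf` for `oneOf` alone does not help: `oneOf` requires exactly one branch to match, so the same payload would fail validation outright.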

The Fix

Use discriminators whenever possible, and use oneOf when a payload should match exactly one variant.

Example:

Pet:
  oneOf:
    - $ref: '#/components/schemas/Cat'
    - $ref: '#/components/schemas/Dog'
  discriminator:
    propertyName: petType

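For this to work, Cat and Dog must each declare a petType property (normally as a required string). Payloads then become: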

{
  "petType": "Dog",
  "name": "Buddy"
}

The ambiguity disappears.

Test generation becomes deterministic.

This single change often improves generated coverage dramatically.
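Here is a minimal sketch of why the discriminator makes generation deterministic. It assumes the Pet schema above with an explicit `mapping` added. OpenAPI's implicit default maps a value like `"Dog"` to the schema named Dog, and the code falls back to that convention. `pick_variant` and `positive_cases` are hypothetical helpers.

```python
from typing import Any

# The Pet schema from above, with an explicit discriminator mapping added.
PET: dict[str, Any] = {
    "oneOf": [
        {"$ref": "#/components/schemas/Cat"},
        {"$ref": "#/components/schemas/Dog"},
    ],
    "discriminator": {
        "propertyName": "petType",
        "mapping": {
            "Cat": "#/components/schemas/Cat",
            "Dog": "#/components/schemas/Dog",
        },
    },
}


def pick_variant(schema: dict[str, Any], payload: dict[str, Any]) -> str:
    """Return the $ref a payload must be validated against, or raise ValueError."""
    disc = schema["discriminator"]
    prop = disc["propertyName"]
    if prop not in payload:
        raise ValueError(f"payload is missing discriminator property {prop!r}")
    value = payload[prop]
    # Explicit mapping first; otherwise fall back to the schema-name convention.
    ref = disc.get("mapping", {}).get(value, f"#/components/schemas/{value}")
    if ref not in {branch["$ref"] for branch in schema["oneOf"]}:
        raise ValueError(f"{prop}={value!r} does not select any oneOf branch")
    return ref


def positive_cases(schema: dict[str, Any]) -> list[dict[str, Any]]:
    """One seed payload per branch: the generator knows exactly which type it is building."""
    disc = schema["discriminator"]
    by_ref = {ref: value for value, ref in disc.get("mapping", {}).items()}
    cases = []
    for branch in schema["oneOf"]:
        ref = branch["$ref"]
        cases.append({disc["propertyName"]: by_ref.get(ref, ref.rsplit("/", 1)[-1])})
    return cases
```

`pick_variant(PET, {"petType": "Dog", "name": "Buddy"})` resolves to the Dog schema, a payload without `petType` is rejected instead of guessed at, and `positive_cases` enumerates exactly one seed per branch.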

3. Untyped Query Parameters (String vs Integer Ambiguity)

Another surprisingly common issue involves query parameters.

Consider:

parameters:
  - name: page
    in: query

Looks simple.

Unfortunately, nothing specifies the data type.

Why This Creates Problems

A generator needs type information to create meaningful tests.

Should it generate:

?page=1

?page=abc

?page=true

Without explicit typing, every option becomes theoretically valid.

Different generators handle this differently:

Some assume strings
Some guess based on naming
Some generate everything
Some skip validation entirely

None of these approaches are ideal.

The Hidden Impact

This issue affects more than positive tests.

It also impacts:

Boundary testing
Negative testing
Fuzz testing
Schema validation

For example:

schema:
  type: integer
  minimum: 1
  maximum: 100

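enables generators to automatically create boundary cases such as ?page=0, ?page=1, ?page=100, and ?page=101, plus type-mismatch cases such as ?page=abc.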

Without typing, those valuable edge cases disappear.
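A minimal sketch of how those edge cases fall out of the schema. It considers only `type`, `minimum`, and `maximum`, and `boundary_cases` is a hypothetical helper. Keywords such as `exclusiveMinimum` or `multipleOf` are deliberately left out.

```python
from typing import Any


def boundary_cases(schema: dict[str, Any]) -> dict[str, list[Any]]:
    """Derive valid and invalid values for an integer query parameter.

    Values are later serialized into ?page=..., so type-mismatch cases
    are plain strings.
    """
    if schema.get("type") != "integer":
        # Without a declared type there is nothing principled to derive.
        return {"valid": [], "invalid": []}
    valid: list[Any] = []
    invalid: list[Any] = []
    if "minimum" in schema:
        valid.append(schema["minimum"])        # lower bound is accepted
        invalid.append(schema["minimum"] - 1)  # one below is rejected
    if "maximum" in schema:
        valid.append(schema["maximum"])        # upper bound is accepted
        invalid.append(schema["maximum"] + 1)  # one above is rejected
    invalid.extend(["abc", "1.5"])             # not integers at all
    return {"valid": valid, "invalid": invalid}
```

For `{"type": "integer", "minimum": 1, "maximum": 100}` this yields valid values 1 and 100 and invalid values 0, 101, `"abc"`, and `"1.5"`. Drop `type` and the result is empty.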

Best Practice

Always define parameter schemas completely.

Example:

parameters:
  - name: page
    in: query
    schema:
      type: integer
      minimum: 1

The more constraints provided, the better the generated tests become.
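Untyped parameters are also easy to catch before a generator ever sees them. Below is a minimal lint sketch over a spec already parsed into Python dicts (for example, by a YAML loader). `untyped_parameters` is hypothetical. It checks only inline operation parameters and skips `$ref`'d parameters. For context, OpenAPI 3.x requires each parameter to declare either `schema` or `content`, so the first kind of finding is invalid outright, while a `schema` that merely lacks `type` is valid but just as ambiguous.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def untyped_parameters(spec: dict[str, Any]) -> list[str]:
    """Report operation parameters whose type a generator would have to guess."""
    problems = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skips path-item keys such as summary or parameters
            for param in operation.get("parameters", []):
                if "$ref" in param:
                    continue  # resolving shared parameters is out of scope here
                where = f"{method.upper()} {path} {param['in']}:{param['name']}"
                schema = param.get("schema")
                if schema is None and "content" not in param:
                    problems.append(f"{where}: no schema")
                elif schema is not None and "type" not in schema and "$ref" not in schema:
                    problems.append(f"{where}: schema has no type")
    return problems
```

Running this in CI and failing on any finding keeps the `page` example above from ever reaching a test generator.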

4. Example Values That Don't Match the Schema

This problem creates some of the most confusing failures.

Imagine:

type: integer
example: "123"

type: boolean
example: "true"

The examples look correct at first glance.

But they violate the schema definition.

Why This Matters

Most generators rely heavily on example values.

Examples help generate:

Request payloads
Mock data
Positive test cases
Sample assertions

When examples contradict schemas, the generator receives conflicting instructions.

The schema says:

type: integer

The example says:

"123"

Which one should the tool trust?

What Usually Happens

Different generators respond differently:

Generator A

Uses schema definition.

Generator B

Uses example value.

Generator C

Attempts to coerce data types.

Generator D

Fails validation completely.

The result is inconsistent behavior across platforms.

Best Practice

Treat examples as executable documentation.

Every example should validate successfully against its schema.

A useful review process is:

Validate schema
Validate examples
Validate generated payloads

All three should agree.
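The "validate examples" step can start very small. This sketch checks only that each `example` has the JSON type its schema declares, walking `properties` and `items`. It assumes the spec is already parsed, so a YAML `example: "123"` arrives as a Python `str`. A real JSON Schema validator checks far more. `example_matches` and `bad_examples` are hypothetical helpers.

```python
from typing import Any

# OpenAPI types mapped to the Python types a JSON/YAML loader produces.
PY_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def example_matches(schema: dict[str, Any]) -> bool:
    """True if the schema has no example/type, or the example has the declared type."""
    if "example" not in schema or "type" not in schema:
        return True
    value, expected = schema["example"], schema["type"]
    # bool is a subclass of int in Python, so rule it out for numeric types.
    if isinstance(value, bool) and expected != "boolean":
        return False
    return isinstance(value, PY_TYPES[expected])


def bad_examples(schema: dict[str, Any], path: str = "#") -> list[str]:
    """Return a JSON-pointer-style path for every mismatched example."""
    problems = [] if example_matches(schema) else [path]
    for name, sub in schema.get("properties", {}).items():
        problems.extend(bad_examples(sub, f"{path}/properties/{name}"))
    if "items" in schema:
        problems.extend(bad_examples(schema["items"], f"{path}/items"))
    return problems
```

Both examples from this section fail: `{"type": "integer", "example": "123"}` and `{"type": "boolean", "example": "true"}` are reported, while `123` and `True` pass.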

5. Auth Scheme Defined Globally but Overridden Per-Path

Authentication definitions often create subtle specification issues.

Consider:

security:
  - bearerAuth: []

This applies globally.

Everything looks fine.

Later, a specific endpoint introduces:

paths:
  /public-data:
    get:
      security: []

This explicitly removes authentication.

Still valid.

The trouble begins when specifications contain dozens or hundreds of endpoints.

Why Test Generators Fail Here

Generators must determine:

Which endpoints require authentication
Which endpoints are public
Which credentials to attach
Which negative auth scenarios to generate

Conflicting security definitions make this surprisingly difficult.

I've seen generators:

Add tokens to public endpoints
Skip tokens on protected endpoints
Generate invalid authentication tests
Ignore overrides completely
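The resolution rule itself is short, which makes these failures more frustrating. In OpenAPI, an operation-level `security` replaces the top-level list entirely, `security: []` means no authentication, and an empty requirement object `{}` inside the list makes authentication optional. A minimal sketch over a parsed spec follows. `effective_security` and `is_public` are hypothetical helpers.

```python
from typing import Any

HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")

Requirements = list[dict[str, list[str]]]


def effective_security(spec: dict[str, Any]) -> dict[str, Requirements]:
    """Resolve the security requirements that actually apply to each operation."""
    global_security = spec.get("security", [])
    resolved: dict[str, Requirements] = {}
    for path, item in spec.get("paths", {}).items():
        for method in HTTP_METHODS:
            if method not in item:
                continue
            operation = item[method]
            # Key presence, not truthiness: an explicit [] must override the global list.
            if "security" in operation:
                resolved[f"{method.upper()} {path}"] = operation["security"]
            else:
                resolved[f"{method.upper()} {path}"] = global_security
    return resolved


def is_public(requirements: Requirements) -> bool:
    """Public if nothing is required, or one alternative is the empty requirement {}."""
    return not requirements or any(req == {} for req in requirements)
```

A tempting shortcut such as `operation.get("security") or global_security` treats `[]` as falsy and silently re-applies the global token to `/public-data`, which is one way a generator ends up ignoring overrides.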

The Real Problem

Many teams inherit APIs over several years.

Security definitions evolve.

Documentation evolves.

Endpoints move between versions.

Eventually, the specification contains multiple overlapping security patterns.

Humans can usually understand the intent.

Generators often cannot.

Best Practice

Maintain a clear authentication strategy.

Review:

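Global security settings
Operation-level overrides (OpenAPI has no path-level security field, so every override lives on a specific operation)
Explicit opt-outs such as security: []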

Whenever possible, minimize exceptions.

The fewer special cases you create, the easier automated tooling becomes.

Why These Mistakes Keep Appearing

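What's interesting is that most of these issues are not technically invalid OpenAPI. The one exception is a parameter with no schema at all, which OpenAPI 3.x forbids; a schema that merely omits type is valid and just as ambiguous.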

Many specifications containing these patterns will still:

Render documentation correctly
Generate SDKs successfully
Pass schema validation

The problems emerge only when advanced automation enters the picture.

Test generation requires much higher precision than documentation generation.

Documentation can tolerate ambiguity.

Automated testing cannot.

A Simple Validation Checklist

Before feeding an OpenAPI document into any generator, review the following:

Schema Definitions

Are required fields defined?
Are nested references complete?
Are enums constrained properly?

Polymorphism

Does every anyOf or oneOf include a discriminator?
Can payload types be identified deterministically?

Parameters

Are query parameters typed?
Are constraints defined?

Examples

Do examples validate against schemas?
Are example payloads realistic?

Authentication

Are security rules consistent?
Are overrides intentional?

This five-minute review can save hours of debugging later.

Final Thoughts

After testing multiple generators against the same API specification, one conclusion became obvious:

Most test-generation failures begin long before the generator runs.

They begin when the specification is written.

The better your OpenAPI document, the better your generated tests, mocks, SDKs, documentation, and validation suites become.

Tools continue to improve every year.

AI-assisted generation is becoming increasingly capable.

Yet even the most advanced platform struggles when the specification contains ambiguity, conflicting definitions, or incomplete schema information.

If you're planning to generate automated API tests from your OpenAPI specification, making sure the specification is well-designed and complete is one of the most valuable investments you can make.

A clean specification doesn't just improve documentation.

It becomes the foundation for every automation layer built on top of it.
