Smeet Gohel

Posted on Jun 26

API Error Codes: A Test Suite Pattern I Stole from Stripe

#api #testing #ai #opensource

Read Stripe's API reference for an hour and you'll notice every endpoint has a complete enumerated list of error codes with example payloads. Then look at your own API.

The contrast is hard to ignore.

Stripe's API documentation treats errors as first-class citizens. Every endpoint clearly documents not only the happy path but also every expected failure, complete with structured error codes, descriptions, HTTP status codes, and example responses.

Now compare that to many APIs in production.

You might find a generic list of HTTP status codes somewhere in the documentation, but business-specific errors are often buried inside controller logic, scattered across wiki pages, or simply undocumented. The test suite isn't much better—there are dozens of happy-path tests, but only a handful of negative scenarios.

That imbalance creates problems for everyone involved:

Developers don't know which errors are expected.
Frontend teams can't reliably handle failures.
QA engineers miss important negative cases.
Refactoring accidentally changes error responses without anyone noticing.

A few years ago, I borrowed a simple idea from Stripe's documentation and turned it into a testing strategy.

Instead of treating error responses as exceptions, we created an error-code catalog and made it the foundation of our negative test suite.

The result wasn't just better API error code testing—it also improved documentation, simplified maintenance, and made API contracts far more consistent.

Here's how the pattern works.

Why Error Responses Are Part of the API Contract

When people think about API testing, they naturally focus on successful responses.

Typical assertions include:

HTTP 200 OK
HTTP 201 Created
Correct JSON payload
Required fields
Business calculations

Negative testing often gets much less attention.

Maybe there are a few tests for:

Invalid authentication
Missing required fields
Unknown resources

Beyond that, many APIs rely on manual testing or hope that the framework handles everything correctly.

The problem is that real users run into failures routinely, not only when something is badly broken.

Examples include:

Customer account locked
Payment declined
Coupon expired
Inventory unavailable
Duplicate registration
Subscription canceled
Rate limit exceeded

These aren't exceptional scenarios.

They're expected business outcomes.

Treating them as first-class API contracts changes how you design both documentation and tests.

The Error-Code Catalog as a Test Input

The first step is creating a centralized catalog of every business error the API can intentionally return.

A simplified example might look like this:

errors:
  USER_NOT_FOUND:
    httpStatus: 404
    message: User not found

  EMAIL_ALREADY_EXISTS:
    httpStatus: 409
    message: Email already exists

  INVALID_TOKEN:
    httpStatus: 401
    message: Invalid authentication token

  PAYMENT_DECLINED:
    httpStatus: 402
    message: Payment declined

  ORDER_ALREADY_SHIPPED:
    httpStatus: 409
    message: Order cannot be modified

This catalog becomes far more than documentation.

It becomes an executable specification.

Instead of asking:

"What errors should this endpoint return?"

the answer already exists in one authoritative location.

Every new business error must be added here before it reaches production.

That single requirement dramatically improves consistency.
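One way to make that requirement enforceable is to lint the catalog itself in CI. The snippet below is a minimal sketch, not our actual tooling: it assumes the YAML has already been parsed into a plain object, and the names `ErrorSpec`, `ErrorCatalog`, and `validateCatalog` are made up for illustration. The rule that business errors must use a 4xx status is also an assumption you may want to relax.

```typescript
// Hypothetical in-memory form of the YAML catalog shown earlier.
interface ErrorSpec {
  httpStatus: number;
  message: string;
}

type ErrorCatalog = Record<string, ErrorSpec>;

const catalog: ErrorCatalog = {
  USER_NOT_FOUND: { httpStatus: 404, message: "User not found" },
  EMAIL_ALREADY_EXISTS: { httpStatus: 409, message: "Email already exists" },
  INVALID_TOKEN: { httpStatus: 401, message: "Invalid authentication token" },
  PAYMENT_DECLINED: { httpStatus: 402, message: "Payment declined" },
  ORDER_ALREADY_SHIPPED: { httpStatus: 409, message: "Order cannot be modified" },
};

// Returns human-readable problems; an empty list means the catalog is well-formed.
function validateCatalog(c: ErrorCatalog): string[] {
  const problems: string[] = [];
  for (const [code, spec] of Object.entries(c)) {
    if (!/^[A-Z][A-Z0-9_]*$/.test(code)) {
      problems.push(`${code}: code must be UPPER_SNAKE_CASE`);
    }
    // Assumption: business errors are always client errors (4xx).
    if (!Number.isInteger(spec.httpStatus) || spec.httpStatus < 400 || spec.httpStatus > 499) {
      problems.push(`${code}: httpStatus must be a 4xx integer`);
    }
    if (spec.message.trim() === "") {
      problems.push(`${code}: message must not be empty`);
    }
  }
  return problems;
}
```

Failing the build when `validateCatalog(catalog)` returns anything keeps malformed entries out before any test is generated from them.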

Why a Catalog Helps

Without a catalog:

Documentation drifts.
Tests become incomplete.
Frontend teams discover errors by accident.
Reviewers overlook breaking changes.

With a catalog:

Every error is documented.
Every error becomes testable.
Every API consumer sees the same contract.

The catalog becomes the foundation for automation.

One Test per Error Code, Generated from the Catalog

Once the catalog exists, generating negative tests becomes surprisingly straightforward.

Rather than manually writing dozens of repetitive tests, a generator simply iterates through every defined error.

Conceptually:

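// The parsed catalog is an object keyed by error code, so iterate its entries.
for (const [code, spec] of Object.entries(catalog.errors)) {
    generateNegativeTest(code, spec);
}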

Each generated test validates four things:

The expected HTTP status
The error code
The error message
The response schema

Consider EMAIL_ALREADY_EXISTS.

The generated scenario might:

Create a user.
Attempt to create the same user again.
Verify the response:

{
  "code": "EMAIL_ALREADY_EXISTS",
  "message": "Email already exists"
}

The implementation differs depending on the framework, but the testing philosophy remains the same:

Every documented error deserves exactly one corresponding test.

As new error codes are introduced, new tests appear automatically.

No engineer has to remember to write them.
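Here is a rough, framework-free sketch of what such a generator could look like. Everything in it is hypothetical: `Scenario`, `buildNegativeCases`, `checkCase`, and the fake signup function stand in for real HTTP calls and for whatever test runner you use. Shape checking is left out to keep the example short.

```typescript
interface ErrorSpec {
  httpStatus: number;
  message: string;
}

interface ApiResponse {
  status: number;
  body: unknown;
}

// A scenario does whatever setup provokes one specific error and returns
// the failing response. Synchronous here only to keep the sketch short.
type Scenario = () => ApiResponse;

interface NegativeCase {
  name: string;
  code: string;
  spec: ErrorSpec;
  run: Scenario;
}

// One case per catalog entry; a code without a scenario fails loudly.
function buildNegativeCases(
  catalog: Record<string, ErrorSpec>,
  scenarios: Record<string, Scenario | undefined>,
): NegativeCase[] {
  return Object.entries(catalog).map(([code, spec]) => {
    const run = scenarios[code];
    if (run === undefined) {
      throw new Error(`No scenario registered for ${code}`);
    }
    return { name: `${code} -> HTTP ${spec.httpStatus}`, code, spec, run };
  });
}

// Compares status, code, and message against the catalog entry.
function checkCase(testCase: NegativeCase): string[] {
  const failures: string[] = [];
  const res = testCase.run();
  if (res.status !== testCase.spec.httpStatus) {
    failures.push(`status: expected ${testCase.spec.httpStatus}, got ${res.status}`);
  }
  if (typeof res.body !== "object" || res.body === null) {
    failures.push("body is not a JSON object");
    return failures;
  }
  const body = res.body as Record<string, unknown>;
  if (body.code !== testCase.code) {
    failures.push(`code: expected ${testCase.code}`);
  }
  if (body.message !== testCase.spec.message) {
    failures.push(`message: expected "${testCase.spec.message}"`);
  }
  return failures;
}

// Stand-in for real HTTP calls: a fake in-memory signup endpoint.
function makeFakeSignup(): (email: string) => ApiResponse {
  const seen = new Set<string>();
  return (email) => {
    if (seen.has(email)) {
      return {
        status: 409,
        body: { code: "EMAIL_ALREADY_EXISTS", message: "Email already exists" },
      };
    }
    seen.add(email);
    return { status: 201, body: { email } };
  };
}

const scenarios: Record<string, Scenario | undefined> = {
  EMAIL_ALREADY_EXISTS: () => {
    const signup = makeFakeSignup();
    signup("a@example.com"); // create a user
    return signup("a@example.com"); // attempt to create the same user again
  },
};
```

In a Jest project, the array returned by `buildNegativeCases` could be fed to `test.each`, with `checkCase` replaced by ordinary `expect` assertions. The important design choice is that a catalog entry with no scenario throws instead of being skipped silently.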

Why This Scales Better

Imagine your API exposes:

150 endpoints
90 business error codes

Writing and maintaining negative tests for all of those by hand quickly becomes tedious.

Generation solves two maintenance problems simultaneously:

Missing tests
Duplicate effort

Instead of asking developers to remember every negative case, the catalog guarantees baseline coverage.

Engineers can then focus on more complex business workflows rather than repetitive validation tests.

The Shape Assertion That Prevents Silent Error Drift

One lesson we learned very early was this:

Checking only the HTTP status is almost useless.

Imagine an endpoint originally returns:

{
  "code": "USER_NOT_FOUND",
  "message": "User not found",
  "requestId": "abc123"
}

Months later, someone refactors the global exception handler.

The response becomes:

{
  "error": "User not found"
}

The HTTP status is still 404.

Many tests still pass.

But every client expecting the original response contract is now broken.

We call this silent error drift.

Nothing appears wrong until consumers start failing.

The Solution: Shape Assertions

Every negative test also validates the response structure.

Example:

expect(response.body).toEqual({
    code: expect.any(String),
    message: expect.any(String),
    requestId: expect.any(String)
});

Notice that this assertion doesn't pin any specific values.

It validates the schema itself: exactly these three fields, each one a string.

That single assertion protects every API consumer from accidental response changes.
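If you aren't using Jest, the same idea fits in a few lines of plain TypeScript. This is a minimal sketch under the assumption that the error envelope has the three string fields from the example above; `errorShape` and `shapeViolations` are illustrative names, not part of any library.

```typescript
type FieldType = "string" | "number" | "boolean";

// Expected error envelope, assuming the three fields from the example response.
const errorShape: Record<string, FieldType> = {
  code: "string",
  message: "string",
  requestId: "string",
};

// Own-property check, so inherited names like "toString" are not mistaken for fields.
const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Reports missing fields, wrong types, and unexpected extra fields.
function shapeViolations(body: unknown, shape: Record<string, FieldType>): string[] {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return ["body is not a JSON object"];
  }
  const record = body as Record<string, unknown>;
  const violations: string[] = [];
  for (const [field, expected] of Object.entries(shape)) {
    if (!hasOwn(record, field)) {
      violations.push(`missing field: ${field}`);
    } else if (typeof record[field] !== expected) {
      violations.push(`wrong type for ${field}: expected ${expected}`);
    }
  }
  for (const field of Object.keys(record)) {
    if (!hasOwn(shape, field)) {
      violations.push(`unexpected field: ${field}`);
    }
  }
  return violations;
}
```

Run against the drifted response `{ "error": "User not found" }`, this reports three missing fields plus one unexpected field, which is exactly the breakage a status-only test would miss.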

Why This Matters

Consumers often depend on:

Error codes
Localization keys
Correlation IDs
Documentation URLs

Removing any of these fields can become a breaking API change even though the HTTP status remains correct.

Schema validation catches those problems immediately.

Keeping the Catalog in Sync with the Code (Code Generation)

The obvious concern is maintenance.

If engineers must manually update both:

Source code
Error catalog

the catalog eventually becomes outdated.

The solution is code generation.

Most applications already define errors centrally.

For example:

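// String values keep the codes readable in JSON responses and generated files.
export enum ErrorCode {
    USER_NOT_FOUND = "USER_NOT_FOUND",
    INVALID_TOKEN = "INVALID_TOKEN",
    PAYMENT_DECLINED = "PAYMENT_DECLINED",
    EMAIL_ALREADY_EXISTS = "EMAIL_ALREADY_EXISTS"
}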

A simple generation step can produce:

API documentation
OpenAPI components
Markdown reference tables
Test inputs
SDK constants

All from the same source.

Now there's only one place where error definitions live.

Everything else is generated automatically.
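As a sketch of what one generation step might look like, the snippet below renders a Markdown reference table from the enum. It assumes string-valued enum members and a companion `errorSpecs` map holding status and message; both the map and `toMarkdownTable` are hypothetical names for illustration, not a description of any particular tool.

```typescript
enum ErrorCode {
  USER_NOT_FOUND = "USER_NOT_FOUND",
  INVALID_TOKEN = "INVALID_TOKEN",
  PAYMENT_DECLINED = "PAYMENT_DECLINED",
  EMAIL_ALREADY_EXISTS = "EMAIL_ALREADY_EXISTS",
}

interface ErrorSpec {
  httpStatus: number;
  message: string;
}

// Record<ErrorCode, ...> makes the compiler reject a missing or misspelled
// entry, so a new enum member cannot ship without a status and message.
const errorSpecs: Record<ErrorCode, ErrorSpec> = {
  [ErrorCode.USER_NOT_FOUND]: { httpStatus: 404, message: "User not found" },
  [ErrorCode.INVALID_TOKEN]: { httpStatus: 401, message: "Invalid authentication token" },
  [ErrorCode.PAYMENT_DECLINED]: { httpStatus: 402, message: "Payment declined" },
  [ErrorCode.EMAIL_ALREADY_EXISTS]: { httpStatus: 409, message: "Email already exists" },
};

// Renders one table row per enum member, in declaration order.
function toMarkdownTable(specs: Record<ErrorCode, ErrorSpec>): string {
  const header = ["| Code | HTTP status | Message |", "| --- | --- | --- |"];
  const rows = Object.values(ErrorCode).map((code) => {
    const spec = specs[code];
    return `| ${code} | ${spec.httpStatus} | ${spec.message} |`;
  });
  return [...header, ...rows].join("\n");
}
```

The same loop over `Object.values(ErrorCode)` could just as well emit YAML for the test catalog or constants for an SDK; only the row template changes.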

Benefits of Codegen

This approach creates several advantages:

Documentation Never Falls Behind

As soon as a new error appears in code, documentation updates automatically.

Generated Tests Stay Current

No manual synchronization required.

API Consumers Stay Aligned

Client SDKs can reference the same constants used by the server.

Code Reviews Become Easier

Adding a new business error becomes highly visible because it affects generated documentation and tests.

The Two Error Categories We Deliberately Don't Test (And Why)

Although our negative suite covers nearly every business error, there are two categories we intentionally exclude.

1. Generic Internal Server Errors

Example:

500 Internal Server Error

These represent unexpected failures.

They're not part of normal business behavior.

Rather than intentionally triggering every possible internal exception, we verify:

Sensitive details aren't exposed
Generic messages are returned
Correlation IDs exist
Logging occurs correctly

Testing every possible server failure adds little value.

Testing the response contract provides much greater return.
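A contract check for 500 responses can be sketched in a few lines. The version below is an assumption-heavy illustration: the `requestId` field name follows the earlier example, the `INTERNAL_ERROR` code in the usage note is invented, and the leak patterns are simple heuristics rather than a complete list. Verifying that logging happened is out of scope here.

```typescript
interface ApiResponse {
  status: number;
  body: unknown;
}

// Heuristic signs of leaked internals. Illustrative only, not exhaustive.
const leakPatterns: RegExp[] = [
  /\bat .+\(.+:\d+:\d+\)/, // Node-style stack frame
  /\w+Exception\b/, // exception class names
  /\bSELECT\b.+\bFROM\b/i, // raw SQL
];

// Checks the contract of a 500 response without caring what caused it.
function internalErrorViolations(res: ApiResponse): string[] {
  const violations: string[] = [];
  if (res.status !== 500) {
    violations.push(`expected status 500, got ${res.status}`);
  }
  if (typeof res.body !== "object" || res.body === null) {
    violations.push("body is not a JSON object");
    return violations;
  }
  const body = res.body as Record<string, unknown>;
  if (typeof body.requestId !== "string" || body.requestId === "") {
    violations.push("missing correlation ID (requestId)");
  }
  // Scan the whole serialized body so a leak in any field is caught.
  const serialized = JSON.stringify(body);
  if (leakPatterns.some((pattern) => pattern.test(serialized))) {
    violations.push("response appears to expose internal details");
  }
  return violations;
}
```

A body such as `{ "code": "INTERNAL_ERROR", "message": "Something went wrong", "requestId": "abc123" }` passes, while a message containing `NullPointerException in UserService.load` is flagged. One forced failure through a test-only route is usually enough to exercise this check.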

2. Infrastructure Failures

Examples include:

Database unavailable
Network partition
DNS outage
Message broker failure
Cloud storage unavailable

These failures belong to resilience testing rather than standard API automation.

They are better validated using:

Chaos engineering
Fault injection
Infrastructure testing
Disaster recovery exercises

Mixing infrastructure scenarios into routine API negative tests usually creates unstable pipelines.

Keeping them separate results in cleaner and more reliable automation.

Additional Benefits We Didn't Expect

Once the catalog became part of our development process, several unexpected improvements appeared.

More Consistent APIs

Every endpoint used the same response format.

Better Frontend Development

Frontend teams no longer guessed which errors could occur.

Simpler Documentation

Error references stayed synchronized automatically.

Cleaner Pull Requests

Adding a new error became an explicit design decision rather than an implementation detail.

Better QA Coverage

Negative scenarios became just as visible as successful ones.

Final Thoughts

Most engineering teams invest heavily in testing successful requests while treating failures as secondary concerns.

Stripe demonstrates a different philosophy.

Errors are documented, standardized, and treated as an integral part of the public API contract.

Building an error-code catalog allowed us to adopt that same mindset.

Instead of manually maintaining dozens of repetitive error response testing scenarios, we generated them from a single source of truth.

Combined with response schema validation and code generation, the approach dramatically reduced maintenance while increasing confidence that every documented failure behaved exactly as expected.

If your API already has a growing collection of business errors, consider creating a centralized catalog before the list becomes unmanageable.

The investment is relatively small, but the payoff in documentation quality, test coverage, and long-term maintainability is substantial.

If you'd like to explore how automated API testing can support this approach, you can spin up a free trial to try this catalog pattern and see how generated negative tests, schema validation, and API contracts work together in practice.

DEV Community