My mock server lied to me. So I built a stateful API sandbox.


Last month I was integrating with a payment API. Wrote my tests against a mock server, everything passed, shipped to staging — and the whole flow broke.

The mock told me POST /charges returns {"id": "ch_123"}. And it does. But my code then called GET /charges/ch_123 to verify the status, and the mock returned 404. Because the mock doesn't actually store anything. Every request lives in its own universe.

I lost half a day to this. And it wasn't the first time.

The problem with stateless mocks

I've used Prism, WireMock, and Mockoon, and they're solid tools. You point them at an OpenAPI spec (or a set of stubs) and they generate responses. But out of the box the responses are canned, with no memory between requests:

POST /customers → 201 {"id": "cust_123"}
GET /customers/cust_123 → 404   # has no idea you just created this

This works fine for unit tests where you're testing your HTTP client. It falls apart the moment you have a multi-step flow.
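
To make the failure mode concrete, here is a toy Python model of a stateless mock. It is only a sketch of the behaviour described above, not how Prism, WireMock, or Mockoon are implemented; the `CANNED` table and `stateless_mock` function are made-up names for illustration.

```python
# Hypothetical stateless mock: each route maps to one fixed (status, body) pair.
CANNED = {
    ("POST", "/customers"): (201, {"id": "cust_123"}),
}


def stateless_mock(method, path, body=None):
    """Return the canned response for a route; the request body is never stored."""
    return CANNED.get((method, path), (404, {"error": "not found"}))


status, created = stateless_mock("POST", "/customers", {"email": "test@acme.com"})
print(status, created)  # 201 {'id': 'cust_123'}

# The follow-up read misses because the POST above left no trace anywhere.
status, _ = stateless_mock("GET", f"/customers/{created['id']}")
print(status)  # 404
```

The lookup depends only on the method and path, so nothing a request sends can influence a later response.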

Think about how a real Stripe integration works:

1. Create a customer
2. Create a payment intent for that customer (needs the customer ID from step 1)
3. Confirm the payment intent (needs the PI ID from step 2)
4. A webhook fires (your server needs to handle it)

A mock server can't do steps 2-4. The IDs don't carry over. The webhook never fires. You're testing a fantasy.

What I actually needed

I needed a sandbox where:

POST creates a real resource I can GET later
IDs chain between requests like they would in production
State transitions work (a charge goes from pending to succeeded)
Webhooks fire when things change

Basically — not a mock, but a tiny fake version of the actual API that behaves like the real thing.
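
A minimal sketch of that idea in Python, assuming a single charges resource kept in a dictionary. The class name `FakeChargesApi`, the `capture_charge` operation, and the 409 response for an invalid transition are my own illustrative choices, not FetchSandbox's actual code or Stripe's actual behaviour.

```python
import itertools


class FakeChargesApi:
    """In-memory stand-in for a charges API (hypothetical, for illustration)."""

    def __init__(self):
        self._charges = {}
        self._ids = itertools.count(1)

    def create_charge(self, amount):
        # POST: store the new resource so later requests can find it.
        charge = {"id": f"ch_{next(self._ids)}", "amount": amount, "status": "pending"}
        self._charges[charge["id"]] = charge
        return 201, dict(charge)

    def get_charge(self, charge_id):
        # GET: read back whatever an earlier POST stored.
        charge = self._charges.get(charge_id)
        if charge is None:
            return 404, {"error": "not found"}
        return 200, dict(charge)

    def capture_charge(self, charge_id):
        # State transition: only pending -> succeeded is allowed.
        charge = self._charges.get(charge_id)
        if charge is None:
            return 404, {"error": "not found"}
        if charge["status"] != "pending":
            return 409, {"error": f"cannot capture a {charge['status']} charge"}
        charge["status"] = "succeeded"
        return 200, dict(charge)


api = FakeChargesApi()
_, charge = api.create_charge(2000)
print(api.get_charge(charge["id"]))  # (200, {'id': 'ch_1', 'amount': 2000, 'status': 'pending'})
api.capture_charge(charge["id"])
print(api.get_charge(charge["id"])[1]["status"])  # succeeded
```

The whole difference from a canned mock is the dictionary: once a POST writes to it, the ID it handed out means something to every later request.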

So I built one

I've been heads-down on FetchSandbox for a few months now. You give it an OpenAPI spec and it generates a stateful sandbox with seed data, state machines, and webhook events.

Here's what it looks like from the terminal:

npm install -g fetchsandbox

fetchsandbox generate ./stripe-openapi.yaml
# ✓ Sandbox ready: 587 endpoints, 63 seed records

fetchsandbox run stripe --all
# ✓ Accept a payment — 3/3 steps passed
# ✓ Onboard a connected account — 3/3 steps passed
# ✓ Respond to a dispute — 2/2 steps passed
# ✓ All workflows passed — 3/3 (9ms)

That run --all command is the thing I wish I'd had. It executes every integration workflow end-to-end — creating resources, chaining IDs between steps, and verifying each response. If something breaks, you see exactly which step failed and why.
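
For a feel of what "chaining IDs between steps and verifying each response" involves, here is a rough Python sketch of a workflow runner. It is an assumption-laden illustration rather than FetchSandbox's implementation: `run_workflow`, the step tuple layout, and the tiny `fake_api` are all hypothetical.

```python
def run_workflow(name, steps, call):
    """Run steps in order; each step can read the responses of earlier steps.

    steps: list of (label, build_request, expected_status), where
           build_request(ctx) returns (method, path, body).
    call:  function (method, path, body) -> (status, body) for the API under test.
    Returns (passed, message).
    """
    ctx = {}
    for index, (label, build_request, expected) in enumerate(steps, start=1):
        method, path, body = build_request(ctx)
        status, response = call(method, path, body)
        if status != expected:
            return False, (
                f"{name}: step {index} ({label}) {method} {path} "
                f"-> {status}, expected {expected}"
            )
        ctx[label] = response  # make this response available to later steps
    return True, f"{name}: {len(steps)}/{len(steps)} steps passed"


# Tiny stateful fake so the example runs on its own.
store = {}


def fake_api(method, path, body):
    if method == "POST" and path == "/customers":
        store["cust_1"] = {"id": "cust_1", **body}
        return 201, store["cust_1"]
    if method == "GET" and path.startswith("/customers/"):
        record = store.get(path.rsplit("/", 1)[-1])
        return (200, record) if record else (404, {"error": "not found"})
    return 404, {"error": "unknown route"}


steps = [
    ("create", lambda ctx: ("POST", "/customers", {"email": "test@acme.com"}), 201),
    ("fetch", lambda ctx: ("GET", f"/customers/{ctx['create']['id']}", None), 200),
]
print(run_workflow("onboard customer", steps, fake_api))
# (True, 'onboard customer: 2/2 steps passed')
```

The second step builds its path from the first step's response, which is exactly the part a canned mock cannot support.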

The stuff that surprised me while building it

Error scenarios were harder than happy paths. I added a --scenario flag so you can switch the whole sandbox to "auth_failure" mode and see what happens:

fetchsandbox run stripe accept_payment --scenario auth_failure
# ✗ Step 1: POST /v1/payment_intents → 401 Unauthorized
# Scenario "auth_failure" correctly caused failure.
# Scenario reset to default.

My code had a bug where it didn't handle 401 on the payment intent endpoint — only on the customer endpoint. Would never have caught that with a regular mock.
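
The check that would have caught this is simple once the whole fake can be flipped into a failure mode. Below is a sketch in Python under my own assumptions: `ScenarioFake`, `AuthError`, and `call_or_raise` are hypothetical names, and `/v1/customers` is used only as a second example path.

```python
class AuthError(Exception):
    """Raised by the client when the API rejects its credentials."""


class ScenarioFake:
    """Fake API whose behaviour is switched globally by a scenario name."""

    def __init__(self):
        self.scenario = "default"

    def request(self, method, path):
        if self.scenario == "auth_failure":
            return 401, {"error": "Unauthorized"}
        return 200, {"ok": True}


def call_or_raise(api, method, path):
    """Client-side wrapper: turn a 401 from any endpoint into AuthError."""
    status, body = api.request(method, path)
    if status == 401:
        raise AuthError(f"{method} {path} rejected credentials")
    return body


api = ScenarioFake()
api.scenario = "auth_failure"

# Exercise every endpoint the integration touches, not just the first one.
unhandled = []
for method, path in [("POST", "/v1/customers"), ("POST", "/v1/payment_intents")]:
    try:
        call_or_raise(api, method, path)
        unhandled.append(path)  # reached only if the 401 was silently accepted
    except AuthError:
        pass

api.scenario = "default"  # reset so later checks see normal behaviour
print(unhandled)  # []
```

Putting the 401 handling in one wrapper that every call goes through is one way to avoid the per-endpoint gap I hit.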

Webhooks were a rabbit hole. In a real Stripe integration, half the logic is in webhook handlers. The sandbox now fires webhook events when resources mutate, and you can watch them in real time:

fetchsandbox webhook-listen stripe
# 12:04:31  payment_intent.created  pi_xyz  → requires_confirmation
# 12:04:32  payment_intent.succeeded pi_xyz → succeeded
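
Conceptually, "fire an event when a resource mutates" is an observer hooked into the store. Here is a simplified Python sketch, assuming in-process listeners instead of real HTTP deliveries; `PaymentIntentStore` and its methods are hypothetical names, and the event names and statuses mirror the output above.

```python
class PaymentIntentStore:
    """In-memory payment intents that notify listeners on every mutation."""

    def __init__(self):
        self._intents = {}
        self._listeners = []

    def subscribe(self, listener):
        # A real sandbox would POST to a webhook URL; here we just call a function.
        self._listeners.append(listener)

    def _emit(self, event_type, intent):
        # Snapshot the fields so later mutations do not rewrite past events.
        event = {"type": event_type, "id": intent["id"], "status": intent["status"]}
        for listener in self._listeners:
            listener(event)

    def create(self, intent_id):
        intent = {"id": intent_id, "status": "requires_confirmation"}
        self._intents[intent_id] = intent
        self._emit("payment_intent.created", intent)
        return intent

    def confirm(self, intent_id):
        intent = self._intents[intent_id]
        intent["status"] = "succeeded"
        self._emit("payment_intent.succeeded", intent)
        return intent


received = []
store = PaymentIntentStore()
store.subscribe(received.append)
store.create("pi_xyz")
store.confirm("pi_xyz")
for event in received:
    print(event["type"], event["id"], "->", event["status"])
# payment_intent.created pi_xyz -> requires_confirmation
# payment_intent.succeeded pi_xyz -> succeeded
```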

Inspecting state is underrated. After running a workflow, you can see exactly what's in the sandbox:

fetchsandbox state stripe customers
# customers — 3 records
# ┌──────────────┬─────────────────┬──────────┐
# │ id           │ email           │ status   │
# ├──────────────┼─────────────────┼──────────┤
# │ cust_abc123  │ test@acme.com   │ active   │
# └──────────────┴─────────────────┴──────────┘

How it compares to the alternatives

I'm not going to pretend FetchSandbox replaces everything. Here's where I honestly think it sits:

| | Mock server (Prism) | Vendor sandbox (Stripe test mode) | FetchSandbox |
|---|---|---|---|
| Setup time | 1 min | 15-30 min (account + keys) | < 30 sec |
| Stateful | No | Yes | Yes |
| Signup required | No | Yes | No |
| Works offline | Yes | No | No (hosted; offline planned) |
| Matches prod exactly | No | Yes | No (schema-accurate, not logic-accurate) |
| Webhooks | No | Yes (with CLI forwarding) | Yes (built-in) |
| Any OpenAPI spec | Yes | Only their API | Yes |

The honest gap: FetchSandbox doesn't replicate vendor-specific business logic. Stripe's test mode knows that a card ending in 4242 succeeds and 4000000000000002 declines. FetchSandbox doesn't. It validates your integration pattern, not the vendor's edge cases.

For me, that's the right tradeoff. I use FetchSandbox while building the integration, then switch to the vendor's test mode for final validation.

CI/CD is where it clicks

The thing I'm most excited about is this:

# GitHub Actions
- name: Prove integration works before deploy
  run: npx fetchsandbox run stripe --all --json

Exit code 0 = all workflows pass. Exit code 1 = something broke. Your pipeline catches integration regressions before they hit staging.
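
The same exit-code contract works outside GitHub Actions, for example in a local pre-deploy script. Here is a small Python sketch that relies only on the 0 / non-zero behaviour described above; the `gate` helper is a hypothetical name, and the sketch ignores the `--json` output because its shape isn't covered here.

```python
import subprocess
import sys


def gate(command):
    """Run a check command and return True only if it exits with code 0."""
    result = subprocess.run(command)
    return result.returncode == 0


# Demonstrated with the Python interpreter so the example runs anywhere.
print(gate([sys.executable, "-c", "raise SystemExit(0)"]))  # True
print(gate([sys.executable, "-c", "raise SystemExit(1)"]))  # False
```

In a real script you would pass the command from the workflow step, `["npx", "fetchsandbox", "run", "stripe", "--all", "--json"]`, and stop the deploy when `gate` returns False.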

Numbers

I ran a benchmark — time from "I want to explore this API" to "I made my first successful call":

Vendor docs path: 15-30 minutes (signup → dashboard → keys → local server → SDK → code → run)
FetchSandbox path: under 1 second (open portal → endpoint is callable)

Nordic APIs defines TTFC (time to first call) benchmarks: under 2 minutes is "Champion" tier. Over 10 minutes is a "Red Flag."

Try it

19 APIs are live right now — Stripe, GitHub, Twilio, WorkOS, OpenAI, DigitalOcean, and more. No signup needed.

fetchsandbox.com

It's free during early access while I figure out what developers actually need from it. If you try it and something breaks or feels wrong, I genuinely want to know — I'm @fetchsandbox on X.

Curious what other people's testing setups look like for third-party APIs. Do you mock everything? Use vendor test modes? Some hybrid? Drop a comment — I've been deep in this problem for months and I'm still learning.