Last month I was integrating with a payment API. Wrote my tests against a mock server, everything passed, shipped to staging — and the whole flow broke.
The mock told me POST /charges returns {"id": "ch_123"}. And it does. But my code then called GET /charges/ch_123 to verify the status, and the mock returned 404. Because the mock doesn't actually store anything. Every request lives in its own universe.
I lost half a day to this. And it wasn't the first time.
The problem with stateless mocks
I've used Prism, WireMock, and Mockoon, and they're solid tools. You point them at an OpenAPI spec and they generate responses. But out of the box those responses are canned, with no memory between requests:
```
POST /customers → 201 {"id": "cust_123"}
GET /customers/cust_123 → 404 # has no idea you just created this
```
This works fine for unit tests where you're testing your HTTP client. It falls apart the moment you have a multi-step flow.
Think about how a real Stripe integration works:
- Create a customer
- Create a payment intent for that customer (needs the customer ID from step 1)
- Confirm the payment intent (needs the PI ID from step 2)
- A webhook fires (your server needs to handle it)
A mock server can't do steps 2-4. The IDs don't carry over. The webhook never fires. You're testing a fantasy.
What I actually needed
I needed a sandbox where:
- POST creates a real resource I can GET later
- IDs chain between requests like they would in production
- State transitions work (a charge goes from pending to succeeded)
- Webhooks fire when things change
Basically — not a mock, but a tiny fake version of the actual API that behaves like the real thing.
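The "state transitions" requirement is essentially a small state machine per resource. Here's a minimal sketch. It assumes a simplified charge lifecycle where a charge only moves `pending → succeeded` or `pending → failed`; that lifecycle is picked purely for illustration, and real Stripe objects have more states.

```python
class InvalidTransition(Exception):
    """Raised when a resource is asked to move to a status its lifecycle forbids."""


# Hypothetical, simplified charge lifecycle: current status -> statuses it may move to.
CHARGE_TRANSITIONS: dict[str, set[str]] = {
    "pending": {"succeeded", "failed"},
    "succeeded": set(),  # terminal
    "failed": set(),  # terminal
}


class Charge:
    def __init__(self, charge_id: str) -> None:
        self.id = charge_id
        self.status = "pending"  # every new charge starts here

    def transition(self, new_status: str) -> None:
        allowed = CHARGE_TRANSITIONS.get(self.status, set())
        if new_status not in allowed:
            raise InvalidTransition(f"{self.id}: {self.status} -> {new_status} not allowed")
        self.status = new_status


charge = Charge("ch_123")
charge.transition("succeeded")  # pending -> succeeded: allowed
print(charge.status)  # succeeded
try:
    charge.transition("pending")  # succeeded is terminal
except InvalidTransition as err:
    print(err)  # ch_123: succeeded -> pending not allowed
```

Rejecting illegal transitions is the useful part. Integration code that, for example, tries to act on a charge that has already settled fails loudly instead of silently getting a canned 200.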
So I built one
I've been heads-down on FetchSandbox for a few months now. You give it an OpenAPI spec and it generates a stateful sandbox with seed data, state machines, and webhook events.
Here's what it looks like from the terminal:
```bash
npm install -g fetchsandbox
fetchsandbox generate ./stripe-openapi.yaml
# ✓ Sandbox ready: 587 endpoints, 63 seed records
fetchsandbox run stripe --all
# ✓ Accept a payment — 3/3 steps passed
# ✓ Onboard a connected account — 3/3 steps passed
# ✓ Respond to a dispute — 2/2 steps passed
# ✓ All workflows passed — 3/3 (9ms)
```
That `run --all` command is the thing I wish I'd had. It executes every integration workflow end-to-end: creating resources, chaining IDs between steps, and verifying each response. If something breaks, you see exactly which step failed and why.
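Conceptually, chaining IDs between steps just means each step can read values captured by the steps before it. Here's a minimal sketch of that idea against a tiny in-memory fake, modelled on an "accept a payment" flow. The `Step`/`run_workflow` names and the fake's functions are hypothetical. They are not FetchSandbox's internals or API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    # Receives the values captured by earlier steps; returns (passed?, new values to capture).
    action: Callable[[dict], tuple[bool, dict]]


def run_workflow(steps: list[Step]) -> tuple[bool, str]:
    context: dict = {}
    for number, step in enumerate(steps, start=1):
        passed, captured = step.action(context)
        if not passed:
            return False, f"Step {number} ({step.name}) failed"  # stop at the first broken step
        context.update(captured)  # e.g. {"customer_id": "cust_1"}, visible to later steps
    return True, f"{len(steps)}/{len(steps)} steps passed"


# A tiny in-memory fake standing in for the sandbox.
customers: dict[str, dict] = {}
intents: dict[str, dict] = {}


def create_customer(ctx: dict) -> tuple[bool, dict]:
    customer_id = f"cust_{len(customers) + 1}"
    customers[customer_id] = {"id": customer_id}
    return True, {"customer_id": customer_id}


def create_payment_intent(ctx: dict) -> tuple[bool, dict]:
    if ctx.get("customer_id") not in customers:  # needs the ID from the previous step
        return False, {}
    intent_id = f"pi_{len(intents) + 1}"
    intents[intent_id] = {"id": intent_id, "customer": ctx["customer_id"], "status": "requires_confirmation"}
    return True, {"intent_id": intent_id}


def confirm_payment_intent(ctx: dict) -> tuple[bool, dict]:
    intent = intents.get(ctx.get("intent_id"))
    if intent is None or intent["status"] != "requires_confirmation":
        return False, {}
    intent["status"] = "succeeded"
    return True, {}


accept_payment = [
    Step("Create customer", create_customer),
    Step("Create payment intent", create_payment_intent),
    Step("Confirm payment intent", confirm_payment_intent),
]
print(run_workflow(accept_payment))  # (True, '3/3 steps passed')
```

Running the confirm step on its own fails at step 1, because there's no `intent_id` in the context. Against a stateless mock, the same chain breaks at step 2, since the customer it just "created" doesn't exist.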
The stuff that surprised me while building it
Error scenarios were harder than happy paths. I added a --scenario flag so you can switch the whole sandbox to "auth_failure" mode and see what happens:
```bash
fetchsandbox run stripe accept_payment --scenario auth_failure
# ✗ Step 1: POST /v1/payment_intents → 401 Unauthorized
# Scenario "auth_failure" correctly caused failure.
# Scenario reset to default.
```
My code had a bug: it handled 401 on the customer endpoint but not on the payment intent endpoint. I would never have caught that with a regular mock.
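That bug class ("error handling exists on one endpoint but not another") is exactly what a global scenario switch flushes out. Here's a minimal sketch. It assumes a hypothetical `FakeAPI` whose `scenario` attribute makes every call return 401, and a hypothetical `Client` that routes every response through one shared check, which is one common way to fix this kind of bug. None of these names come from FetchSandbox or Stripe.

```python
class FakeAPI:
    def __init__(self) -> None:
        self.scenario = "default"

    def request(self, method: str, path: str) -> tuple[int, dict]:
        if self.scenario == "auth_failure":
            return 401, {"error": "Unauthorized"}  # every endpoint fails the same way
        return 200, {"path": path}


class AuthError(Exception):
    pass


def _check(status: int, body: dict) -> dict:
    # One shared response check, so no endpoint can forget to handle 401.
    if status == 401:
        raise AuthError(body.get("error", "Unauthorized"))
    return body


class Client:
    def __init__(self, api: FakeAPI) -> None:
        self.api = api

    def create_customer(self) -> dict:
        return _check(*self.api.request("POST", "/v1/customers"))

    def create_payment_intent(self) -> dict:
        return _check(*self.api.request("POST", "/v1/payment_intents"))


api = FakeAPI()
client = Client(api)
api.scenario = "auth_failure"  # flip the whole fake into failure mode
for call in (client.create_customer, client.create_payment_intent):
    try:
        call()
        print(f"{call.__name__}: 401 NOT handled")
    except AuthError:
        print(f"{call.__name__}: 401 handled")
```

If `create_payment_intent` skipped `_check`, it would print "401 NOT handled". That is the bug described above, caught by flipping one switch instead of hand-writing an error stub per endpoint.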
Webhooks were a rabbit hole. In a real Stripe integration, half the logic is in webhook handlers. The sandbox now fires webhook events when resources mutate, and you can watch them in real-time:
```bash
fetchsandbox webhook-listen stripe
# 12:04:31 payment_intent.created pi_xyz → requires_confirmation
# 12:04:32 payment_intent.succeeded pi_xyz → succeeded
```
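"Fire a webhook when a resource mutates" is essentially the observer pattern: every write to the store publishes an event to subscribers. Here's a minimal in-process sketch. `EventBus` and `PaymentIntentStore` are hypothetical names. The event types are borrowed from the output above, and the envelope (`type`, `data.object`) is only loosely shaped like Stripe's. A real sandbox would POST these events to your webhook endpoint instead of calling a Python function.

```python
from datetime import datetime
from typing import Callable

Event = dict
Listener = Callable[[Event], None]


class EventBus:
    def __init__(self) -> None:
        self.listeners: list[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def publish(self, event_type: str, obj: dict) -> None:
        # Snapshot the object so later mutations don't rewrite past events.
        event = {"type": event_type, "data": {"object": dict(obj)}, "created": datetime.now()}
        for listener in self.listeners:
            listener(event)


class PaymentIntentStore:
    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.intents: dict[str, dict] = {}

    def create(self, intent_id: str) -> dict:
        intent = {"id": intent_id, "status": "requires_confirmation"}
        self.intents[intent_id] = intent
        self.bus.publish("payment_intent.created", intent)  # every write emits an event
        return intent

    def confirm(self, intent_id: str) -> dict:
        intent = self.intents[intent_id]
        intent["status"] = "succeeded"
        self.bus.publish("payment_intent.succeeded", intent)
        return intent


def print_event(event: Event) -> None:
    obj = event["data"]["object"]
    print(f'{event["created"]:%H:%M:%S} {event["type"]} {obj["id"]} -> {obj["status"]}')


bus = EventBus()
bus.subscribe(print_event)
store = PaymentIntentStore(bus)
store.create("pi_xyz")  # prints a timestamped payment_intent.created line
store.confirm("pi_xyz")  # prints a timestamped payment_intent.succeeded line
```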
Inspecting state is underrated. After running a workflow, you can see exactly what's in the sandbox:
```bash
fetchsandbox state stripe customers
# customers — 3 records
# ┌──────────────┬─────────────────┬──────────┐
# │ id │ email │ status │
# ├──────────────┼─────────────────┼──────────┤
# │ cust_abc123 │ test@acme.com │ active │
# └──────────────┴─────────────────┴──────────┘
```
How it compares to the alternatives
I'm not going to pretend FetchSandbox replaces everything. Here's where I honestly think it sits:
| | Mock server (Prism) | Vendor sandbox (Stripe test mode) | FetchSandbox |
|---|---|---|---|
| Setup time | 1 min | 15-30 min (account + keys) | < 30 sec |
| Stateful | No | Yes | Yes |
| Signup required | No | Yes | No |
| Works offline | Yes | No | Hosted (offline coming) |
| Matches prod exactly | No | Yes | No (schema-accurate, not logic-accurate) |
| Webhooks | No | Yes (with CLI forwarding) | Yes (built-in) |
| Any OpenAPI spec | Yes | Only their API | Yes |
The honest gap: FetchSandbox doesn't replicate vendor-specific business logic. Stripe's test mode knows that the test card 4242 4242 4242 4242 succeeds and 4000 0000 0000 0002 is declined. FetchSandbox doesn't. It validates your integration pattern, not the vendor's edge cases.
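To make the gap concrete, closing it would mean hand-coding rules like the ones below for every vendor. Here's a hypothetical sketch. The two card numbers are Stripe's documented success and decline test cards. The `confirm_with_card` function, the outcome dicts, and the "succeed by default" fallback are simplifications made up for illustration.

```python
# Stripe's documented test cards: 4242... succeeds, 4000...0002 is declined.
# The outcome dicts and the fallback are hypothetical simplifications.
TEST_CARD_OUTCOMES: dict[str, dict] = {
    "4242424242424242": {"status": "succeeded"},
    "4000000000000002": {"status": "failed", "error": "card_declined"},
}
DEFAULT_OUTCOME: dict = {"status": "succeeded"}  # what a logic-free sandbox might do for any card


def confirm_with_card(card_number: str) -> dict:
    normalized = card_number.replace(" ", "")
    # Copy so callers can't mutate the shared rule table.
    return dict(TEST_CARD_OUTCOMES.get(normalized, DEFAULT_OUTCOME))


print(confirm_with_card("4242 4242 4242 4242"))  # {'status': 'succeeded'}
print(confirm_with_card("4000 0000 0000 0002"))  # {'status': 'failed', 'error': 'card_declined'}
```

Each rule is vendor knowledge someone has to keep in sync with the vendor, and that maintenance cost is why schema accuracy stops short of logic accuracy.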
For me, that's the right tradeoff. I use FetchSandbox while building the integration, then switch to the vendor's test mode for final validation.
CI/CD is where it clicks
The thing I'm most excited about is this:
```yaml
# GitHub Actions
- name: Prove integration works before deploy
  run: npx fetchsandbox run stripe --all --json
```
Exit code 0 = all workflows pass. Exit code 1 = something broke. Your pipeline catches integration regressions before they hit staging.
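The same exit-code contract works outside GitHub Actions, because any CI system or git hook treats a non-zero exit as failure. Here's a minimal sketch of a local gate that runs the command from the workflow step above and passes its exit code through. The `gate` helper and the `INTEGRATION_CHECK` constant are hypothetical names.

```python
import subprocess
import sys

# The command from the workflow step; swap in any command that exits 0 on success.
INTEGRATION_CHECK = ["npx", "fetchsandbox", "run", "stripe", "--all", "--json"]


def gate(command: list[str]) -> int:
    """Run `command` and return its exit code: 0 lets the push or deploy continue."""
    try:
        result = subprocess.run(command)
    except FileNotFoundError:
        print(f"Command not found: {command[0]}", file=sys.stderr)
        return 127  # shell convention for "command not found"
    if result.returncode != 0:
        print(f"Integration check failed (exit {result.returncode})", file=sys.stderr)
    return result.returncode
```

From a pre-push hook you'd call `sys.exit(gate(INTEGRATION_CHECK))`, so a broken workflow blocks the push the same way it blocks the pipeline.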
Numbers
I ran a benchmark — time from "I want to explore this API" to "I made my first successful call":
- Vendor docs path: 15-30 minutes (signup → dashboard → keys → local server → SDK → code → run)
- FetchSandbox path: under 1 second (open portal → endpoint is callable)
Nordic APIs defines TTFC (time to first call) benchmarks: under 2 minutes is "Champion" tier. Over 10 minutes is a "Red Flag."
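Here's the arithmetic of plugging the two measured paths into those thresholds. This is a small sketch: `classify_ttfc` is a hypothetical helper, and everything between 2 and 10 minutes is deliberately left unlabeled rather than guessing the name of the middle tier.

```python
def classify_ttfc(seconds: float) -> str:
    """Bucket a time-to-first-call using the two thresholds quoted from Nordic APIs."""
    if seconds < 2 * 60:
        return "Champion"
    if seconds > 10 * 60:
        return "Red Flag"
    return "between the two thresholds"  # middle tier deliberately left unnamed


# The measured paths from the benchmark, in seconds (FetchSandbox uses its 1-second upper bound).
measured = {
    "Vendor docs, best case": 15 * 60,
    "Vendor docs, worst case": 30 * 60,
    "FetchSandbox": 1,
}
for path, seconds in measured.items():
    print(f"{path}: {classify_ttfc(seconds)}")
# Vendor docs, best case: Red Flag
# Vendor docs, worst case: Red Flag
# FetchSandbox: Champion
```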
Try it
19 APIs are live right now — Stripe, GitHub, Twilio, WorkOS, OpenAI, DigitalOcean, and more. No signup needed.
It's free during early access while I figure out what developers actually need from it. If you try it and something breaks or feels wrong, I genuinely want to know — I'm @fetchsandbox on X.
Curious what other people's testing setups look like for third-party APIs. Do you mock everything? Use vendor test modes? Some hybrid? Drop a comment — I've been deep in this problem for months and I'm still learning.