Pritom Mazumdar

Posted on Mar 18

We fired 150 dispute webhooks at a payment service. 12 handlers crashed. Here's what we built.

#stripe #python #testing #webdev

Every company processing payments tests the happy path.

Payment succeeds, order gets fulfilled, customer gets a confirmation email. That’s the flow that gets reviewed, tested in staging, and monitored in production.

What doesn’t get tested is everything else.

Dispute spikes. Refund storms. Gateway errors that leave orders stuck. Webhook sequences your handlers were never built to handle at volume.

These are the failure modes that show up in production, usually at the worst possible time.

The problem is not that engineering teams don’t care.
It’s that the tools to test this don’t exist.

Why you can’t test this in Razorpay’s sandbox

Razorpay’s test API cannot create disputes.

Disputes are raised by banks and card networks, not merchants. There is no POST /disputes endpoint.

Even if you could trigger disputes manually, you can’t fire 150 of them in 10 seconds on a test account: Razorpay would rate-limit you. And no sandbox lets you control the timing or sequence of webhook events.

So the failure mode you most need to test is the one the provider doesn’t let you simulate.

Teams ship, cross their fingers, and find out what breaks only when customers hit it first.

What we built

Carbon Layer is an open-source chaos engineering tool for payment flows.

You run a scenario (dispute spike, refund storm, payment decline spike) and it fires Razorpay-format webhook events directly at your endpoint.

Same JSON shape.
Same headers.
Same HMAC-SHA256 signature as real Razorpay webhooks.

Your server can’t tell the difference.
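Razorpay signs each webhook with an HMAC-SHA256 of the raw request body, keyed with the webhook secret, and sends the hex digest in the `X-Razorpay-Signature` header. A minimal sketch of both sides, assuming that scheme; `sign_body` and `verify_signature` are hypothetical helper names, not Carbon Layer's API, and the secret is a placeholder:

```python
import hashlib
import hmac
import json


def sign_body(raw_body: bytes, webhook_secret: str) -> str:
    """Hex HMAC-SHA256 of the exact request bytes, keyed with the webhook secret."""
    return hmac.new(webhook_secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def verify_signature(raw_body: bytes, received_signature: str, webhook_secret: str) -> bool:
    """Receiver side: recompute the digest and compare in constant time."""
    expected = sign_body(raw_body, webhook_secret)
    return hmac.compare_digest(expected, received_signature)


# Sender side: serialize once, then sign exactly the bytes that will be sent.
secret = "test_webhook_secret"  # hypothetical secret shared by sender and receiver
body = json.dumps({"event": "payment.dispute.created"}).encode("utf-8")
headers = {
    "Content-Type": "application/json",
    "X-Razorpay-Signature": sign_body(body, secret),
}

print(verify_signature(body, headers["X-Razorpay-Signature"], secret))         # True
print(verify_signature(body + b" ", headers["X-Razorpay-Signature"], secret))  # False
```

Because the digest covers the exact bytes, a handler should verify against the raw body before parsing JSON; re-serializing a parsed payload can change whitespace or key order and fail the check.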

pip install carbon-layer

carbon run dispute-spike \
  --provider mock \
  --webhook-url http://localhost:8000/webhooks/razorpay

No Razorpay account needed.
No sandbox credentials.
No rate limits.
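Firing 150 events in a few seconds and bucketing every response is mostly a concurrency problem. The post says Carbon Layer is built on httpx; this minimal sketch instead assumes a pluggable `send` coroutine (here a fake endpoint) so it runs with only the standard library. `deliver_all`, `classify`, the concurrency limit of 20, and the 5-second timeout are illustrative choices, not the tool's internals:

```python
import asyncio
import random
from typing import Optional


def classify(status: Optional[int]) -> str:
    """Map an HTTP status (None means the request timed out) to a report bucket."""
    if status is None:
        return "timeout"
    if 200 <= status < 300:
        return "2xx"
    if 400 <= status < 500:
        return "4xx"
    if 500 <= status < 600:
        return "5xx"
    return "other"


async def deliver_all(events, send, concurrency=20, timeout_s=5.0):
    """Send every event with bounded concurrency.

    Returns one (event_type, bucket) pair per event, in input order.
    """
    sem = asyncio.Semaphore(concurrency)

    async def deliver_one(event):
        async with sem:  # at most `concurrency` requests in flight
            try:
                status = await asyncio.wait_for(send(event), timeout=timeout_s)
            except asyncio.TimeoutError:
                status = None
            return event["event"], classify(status)

    return await asyncio.gather(*(deliver_one(e) for e in events))


async def fake_send(event):
    """Stand-in for an HTTP POST: short random latency, always answers 200."""
    await asyncio.sleep(random.uniform(0, 0.01))
    return 200


events = [{"event": "payment.dispute.created", "id": i} for i in range(150)]
results = asyncio.run(deliver_all(events, fake_send))
print(len(results), results[0])  # 150 ('payment.dispute.created', '2xx')
```

A real sender would also need to map connection errors to a bucket; this sketch only distinguishes status codes and timeouts.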

The report

The report shows exactly what happened:

Webhook Delivery Summary
Target: http://localhost:8000/webhooks/razorpay

Event Type                 Sent    2xx    4xx    5xx    Timeout
payment.captured            100     98      0      1          1
payment.dispute.created     150    135      0     12          3
refund.processed             50     49      0      1          0

Total                       300    282      0     14          4

18 events didn’t get a 2xx from your handler: 14 returned a 5xx and 4 timed out.

In production, each unhandled dispute is a chargeback the merchant loses by default.
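The per-event-type table comes from counting delivery outcomes. A minimal sketch, assuming each delivery has already been reduced to an `(event_type, bucket)` pair; `summarize` and `render` are hypothetical names, and the sample input simply reproduces the `payment.dispute.created` row from the report:

```python
from collections import Counter, defaultdict

BUCKETS = ("2xx", "4xx", "5xx", "timeout")


def summarize(results):
    """results: iterable of (event_type, bucket) pairs -> {event_type: Counter}."""
    table = defaultdict(Counter)
    for event_type, bucket in results:
        table[event_type]["sent"] += 1
        table[event_type][bucket] += 1
    return table


def render(table):
    """Format the counts as a fixed-width text table with a Total row."""
    lines = [f"{'Event Type':<26}{'Sent':>6}" + "".join(f"{b:>9}" for b in BUCKETS)]
    total = Counter()
    for event_type, counts in table.items():
        total.update(counts)  # Counter.update adds counts rather than replacing them
        lines.append(f"{event_type:<26}{counts['sent']:>6}" + "".join(f"{counts[b]:>9}" for b in BUCKETS))
    lines.append(f"{'Total':<26}{total['sent']:>6}" + "".join(f"{total[b]:>9}" for b in BUCKETS))
    return "\n".join(lines)


sample = (
    [("payment.dispute.created", "2xx")] * 135
    + [("payment.dispute.created", "5xx")] * 12
    + [("payment.dispute.created", "timeout")] * 3
)
table = summarize(sample)
print(render(table))
disputes = table["payment.dispute.created"]
print(disputes["sent"] - disputes["2xx"])  # 15 disputes not acknowledged
```

Bucketing by event type is what makes a regression in one handler visible even when the overall error count looks small.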

Scenarios available

• dispute-spike — 150 disputes on captured payments
• payment-decline-spike — Simulates a 30% payment failure rate
• refund-storm — Mass refunds across captured payments
• flash-sale — High-volume order and payment flow
• gateway-error-burst — Intermittent gateway failures
• min-amount — Minimum paise transactions
• max-amount — Large-value transactions
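A scenario like `dispute-spike` boils down to picking captured payments and emitting one `payment.dispute.created` event per pick. A minimal sketch, assuming a simplified envelope modelled loosely on Razorpay's webhook format (`entity`, `event`, `contains`, `payload`, `created_at`); the fields Carbon Layer actually emits, and the full Razorpay schema, may differ, and `make_dispute_events` is a hypothetical helper:

```python
import random
import secrets
import time


def make_dispute_events(captured_payments, count, seed=None):
    """Build `count` dispute events, each on a different captured payment.

    The envelope is a simplified, illustrative shape, not the authoritative schema.
    """
    rng = random.Random(seed)
    now = int(time.time())
    events = []
    # sample() picks without replacement, so no payment is disputed twice.
    for payment in rng.sample(captured_payments, k=count):
        dispute = {
            "id": "disp_" + secrets.token_hex(7),
            "entity": "dispute",
            "payment_id": payment["id"],
            "amount": payment["amount"],  # in paise, like the payment amount
            "currency": payment["currency"],
            "status": "open",
            "created_at": now,
        }
        events.append({
            "entity": "event",
            "event": "payment.dispute.created",
            "contains": ["payment", "dispute"],
            "payload": {
                "payment": {"entity": payment},
                "dispute": {"entity": dispute},
            },
            "created_at": now,
        })
    return events


# Hypothetical pool of captured payments to dispute.
payments = [
    {"id": f"pay_{i:014d}", "entity": "payment", "amount": 49900,
     "currency": "INR", "status": "captured"}
    for i in range(500)
]
events = make_dispute_events(payments, count=150, seed=42)
print(len(events), events[0]["event"])  # 150 payment.dispute.created
```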

All scenarios work with the mock adapter.
No external account required.

If you use Razorpay

You can also run scenarios against your actual Razorpay test account:

carbon run dispute-spike \
  --provider razorpay \
  --api-key rzp_test_xxx \
  --api-secret yyy \
  --webhook-url https://your-staging-app.com/webhooks/razorpay

Note: Razorpay’s API doesn’t support server-side payment or dispute creation.
Scenarios that need these fall back to the mock adapter automatically.
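The automatic fallback can be pictured as a capability check per operation. A minimal sketch; the adapter classes, their `supports` sets, and the operation names are all hypothetical, chosen only to illustrate routing operations that Razorpay's API cannot perform to the mock adapter:

```python
class MockAdapter:
    name = "mock"
    # The mock can synthesize anything a scenario needs.
    supports = {"create_order", "create_payment", "create_refund", "create_dispute"}


class RazorpayAdapter:
    name = "razorpay"
    # Illustrative: no server-side payment or dispute creation, per the note above.
    supports = {"create_order", "create_refund"}


def pick_adapter(operation, preferred, fallback):
    """Use the preferred adapter when it can perform the operation, else fall back."""
    if operation in preferred.supports:
        return preferred
    if operation in fallback.supports:
        return fallback
    raise ValueError(f"no adapter supports {operation!r}")


rzp, mock = RazorpayAdapter(), MockAdapter()
for op in ("create_order", "create_payment", "create_dispute"):
    print(op, "->", pick_adapter(op, rzp, mock).name)
# create_order -> razorpay
# create_payment -> mock
# create_dispute -> mock
```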

Try it

pip install carbon-layer

GitHub:

https://github.com/Pritom14/carbon-layer

Update: new features shipped in v0.2.0

Three new features based on feedback:

Parameter overrides -- override scenario parameters at runtime without editing YAML:

carbon run dispute-spike --set baseline_orders=500 --set dispute_rate=0.3
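Conceptually, `--set key=value` overrides are layered on top of the scenario's YAML defaults. A minimal sketch of that merge, assuming values are coerced with `ast.literal_eval` (so `500` becomes an int and `0.3` a float) and unknown keys are rejected; `apply_overrides` and the defaults dict are hypothetical, and Carbon Layer's actual parsing may differ:

```python
import ast


def parse_override(item):
    """Split 'key=value' and coerce the value to a Python literal when possible."""
    key, sep, raw = item.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {item!r}")
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        value = raw  # non-literals such as plain words stay strings
    return key.strip(), value


def apply_overrides(defaults, items):
    """Return a copy of the scenario defaults with the overrides applied."""
    params = dict(defaults)
    for item in items:
        key, value = parse_override(item)
        if key not in params:
            raise KeyError(f"unknown scenario parameter: {key}")
        params[key] = value
    return params


# Hypothetical defaults, as if loaded from the dispute-spike YAML file.
defaults = {"baseline_orders": 300, "dispute_rate": 0.5}
params = apply_overrides(defaults, ["baseline_orders=500", "dispute_rate=0.3"])
print(params)  # {'baseline_orders': 500, 'dispute_rate': 0.3}
```

Rejecting unknown keys catches typos like `dispute_rte=0.3` that would otherwise be silently ignored.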

HTML reports -- export a shareable report for your team:

carbon report --run-id <id> --format html
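An HTML export is essentially the same per-event-type table rendered as markup. A minimal sketch, assuming the summary is available as a dict of per-type counts; `render_html` and that input shape are hypothetical, and text values are passed through `html.escape` so an event name or URL cannot inject markup:

```python
import html

COLUMNS = ("sent", "2xx", "4xx", "5xx", "timeout")


def render_html(target, summary):
    """summary: {event_type: {column: count}} -> a standalone HTML page."""
    head = "".join(f"<th>{html.escape(c)}</th>" for c in ("Event Type",) + COLUMNS)
    rows = []
    for event_type, counts in summary.items():
        cells = "".join(f"<td>{counts.get(c, 0)}</td>" for c in COLUMNS)  # missing -> 0
        rows.append(f"<tr><td>{html.escape(event_type)}</td>{cells}</tr>")
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        "<title>Webhook Delivery Summary</title></head><body>"
        f"<h1>Webhook Delivery Summary</h1><p>Target: {html.escape(target)}</p>"
        f"<table><thead><tr>{head}</tr></thead><tbody>{''.join(rows)}</tbody></table>"
        "</body></html>"
    )


summary = {"payment.dispute.created": {"sent": 150, "2xx": 135, "5xx": 12, "timeout": 3}}
page = render_html("http://localhost:8000/webhooks/razorpay", summary)
```

Writing `page` to a `.html` file gives a report that opens in any browser without extra assets.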

CI/CD integration -- POST run results to your pipeline:

carbon run dispute-spike \
    --webhook-url http://localhost:8000/webhooks/razorpay \
    --callback-url http://ci/carbon/results
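On the CI side, the callback is most useful when it can fail the build. A minimal sketch of a gate, assuming the posted results can be reduced to per-event-type counts like the report's columns; the payload shape, `gate`, and the 99% threshold are hypothetical, not Carbon Layer's documented callback format:

```python
import json


def gate(results, min_success_rate=0.99):
    """Return failure messages; an empty list means the run passes."""
    failures = []
    for event_type, counts in results.items():
        sent = counts.get("sent", 0)
        ok = counts.get("2xx", 0)
        if sent and ok / sent < min_success_rate:
            failures.append(f"{event_type}: {ok}/{sent} acknowledged ({ok / sent:.1%})")
    return failures


# Hypothetical callback body, shaped like the dispute row of the report.
body = '{"payment.dispute.created": {"sent": 150, "2xx": 135, "5xx": 12, "timeout": 3}}'
problems = gate(json.loads(body))
for line in problems:
    print("FAIL", line)  # FAIL payment.dispute.created: 135/150 acknowledged (90.0%)
```

A CI step would then exit non-zero whenever `problems` is non-empty, so a handler regression blocks the merge instead of surfacing in production.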

Built with Python, asyncpg, httpx, and Typer.
Open source under Apache 2.0.

Feedback is welcome, especially if you’re building on Razorpay and want to run this against your staging environment.

Top comments (1)

arun rajkumar • May 13

The "tools to test the failure modes don't exist" framing is exactly right — every team I've worked with on payments has the happy-path test suite covered and a complete blind spot on dispute spikes, refund storms, and the gateway-error sequences that show up at the worst possible time. Two patterns from running this at scale on the open-banking side that translate one-to-one to your card-side scenarios: (1) the Razorpay-event-shape fidelity is the right call — the moment your synthetic events drift from the real provider's payload (timestamp precision, header casing, signature scheme, optional fields the SDK injects), your handlers catch bugs the real webhook stream wouldn't, and you get false confidence; (2) the per-event-type breakdown in the report (sent / 2xx / 4xx / 5xx / timeout) is the right aggregation — total error count is what most teams alert on, but the real signal is "dispute.created went from 99% 2xx to 87% 2xx in the last hour" and that only shows up when you bucket by event type. Carbon Layer's CI/CD callback in v0.2.0 is the underrated half — the moment chaos runs are gated on PR merge, the failure modes stop showing up first in production. Will be sharing this with our infra team.

One question: are the synthetic dispute events stable across Razorpay schema versions, or does the mock adapter need updating each time the provider tweaks the payload shape? That's usually the maintenance burden that kills these tools after month six.