Every company processing payments tests the happy path.
Payment succeeds, order gets fulfilled, customer gets a confirmation email. That’s the flow that gets reviewed, tested in staging, and monitored in production.
What doesn’t get tested is everything else.
Dispute spikes. Refund storms. Gateway errors that leave orders stuck. Webhook sequences your handlers were never built to handle at volume.
These are the failure modes that show up in production, usually at the worst possible time.
The problem is not that engineering teams don’t care.
It’s that the tools to test this don’t exist.
Why you can’t test this in Razorpay’s sandbox
Razorpay’s test API cannot create disputes.
Disputes are raised by banks and card networks, not merchants. There is no POST /disputes endpoint.
Even if you could trigger disputes manually, you can’t fire 150 of them in 10 seconds on a test account. Razorpay would rate-limit you. And you can’t control the timing or sequence of webhook events in any sandbox.
So the failure mode you most need to test is the one the provider doesn’t let you simulate.
Teams ship, cross their fingers, and find out what breaks when customers find it first.
What we built
Carbon Layer is an open-source chaos engineering tool for payment flows.
You run a scenario (dispute spike, refund storm, payment decline spike) and it fires Razorpay-format webhook events directly at your endpoint.
Same JSON shape.
Same headers.
Same HMAC-SHA256 signature as real Razorpay webhooks.
Your server can’t tell the difference.
pip install carbon-layer
carbon run dispute-spike \
  --provider mock \
  --webhook-url http://localhost:8000/webhooks/razorpay
No Razorpay account needed.
No sandbox credentials.
No rate limits.
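Because the events carry an HMAC-SHA256 signature, the handler under test should verify it the same way it would for real traffic. Here is a minimal sketch of that check, assuming the signature is the hex digest of the raw request body keyed with your webhook secret and arrives in the X-Razorpay-Signature header, as Razorpay documents for its own webhooks. The function names and the secret value are illustrative and not part of Carbon Layer.

```python
import hashlib
import hmac


def sign_payload(raw_body: bytes, secret: str) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def is_valid_signature(raw_body: bytes, received_signature: str, secret: str) -> bool:
    """Compare the expected and received signatures in constant time."""
    expected = sign_payload(raw_body, secret)
    # Encode both sides so compare_digest never sees a non-ASCII str.
    return hmac.compare_digest(
        expected.encode("utf-8"), received_signature.encode("utf-8")
    )


# Hypothetical values for illustration only.
EXAMPLE_SECRET = "example_webhook_secret"
EXAMPLE_BODY = b'{"event": "payment.dispute.created"}'
EXAMPLE_SIGNATURE = sign_payload(EXAMPLE_BODY, EXAMPLE_SECRET)
```

Verify against the raw bytes of the request, not a re-serialized JSON object. Re-encoding can change whitespace or key order and make a valid signature fail.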
The report
The report shows exactly what happened:
Webhook Delivery Summary
Target: http://localhost:8000/webhooks/razorpay
Event Type                 Sent   2xx   4xx   5xx   Timeout
payment.captured            100    98     0     1         1
payment.dispute.created     150   135     0    12         3
refund.processed             50    49     0     1         0
Total                       300   282     0    14         4
14 events your handler answered with a 5xx, plus 4 more that timed out.
In production, each unhandled dispute is a chargeback the merchant loses by default.
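The numbers are easier to act on as per-event success rates than as one total. The sketch below hard-codes the rows from the summary above and flags any event type whose 2xx rate falls under a threshold. The DeliveryRow class, the 95% threshold, and the helper name are assumptions for illustration; this is not Carbon Layer's report format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryRow:
    event_type: str
    sent: int
    ok_2xx: int
    err_4xx: int
    err_5xx: int
    timeout: int

    @property
    def success_rate(self) -> float:
        """Fraction of sent events that received a 2xx response."""
        return self.ok_2xx / self.sent if self.sent else 0.0

    @property
    def failed(self) -> int:
        """Events that did not get a 2xx (4xx, 5xx, or timeout)."""
        return self.sent - self.ok_2xx


# Rows copied from the delivery summary in this post.
ROWS = [
    DeliveryRow("payment.captured", 100, 98, 0, 1, 1),
    DeliveryRow("payment.dispute.created", 150, 135, 0, 12, 3),
    DeliveryRow("refund.processed", 50, 49, 0, 1, 0),
]


def below_threshold(rows: list[DeliveryRow], min_rate: float) -> list[str]:
    """Return the event types whose 2xx rate is under min_rate."""
    return [row.event_type for row in rows if row.success_rate < min_rate]


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.event_type}: {row.success_rate:.0%} 2xx, {row.failed} failed")
    print("Below 95%:", below_threshold(ROWS, 0.95))
```

With these rows, payment.dispute.created is the only event type under 95%, at 135 of 150 delivered. The overall rate of 282 out of 300 (94%) does not show which handler is failing.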
Scenarios available
• dispute-spike — 150 disputes on captured payments
• payment-decline-spike — Simulates a 30% payment failure rate
• refund-storm — Mass refunds across captured payments
• flash-sale — High-volume order and payment flow
• gateway-error-burst — Intermittent gateway failures
• min-amount — Minimum paise transactions
• max-amount — Large-value transactions
All scenarios work with the mock adapter.
No external account required.
If you use Razorpay
You can also run scenarios against your actual Razorpay test account:
carbon run dispute-spike \
  --provider razorpay \
  --api-key rzp_test_xxx \
  --api-secret yyy \
  --webhook-url https://your-staging-app.com/webhooks/razorpay
Note: Razorpay’s API doesn’t support server-side payment or dispute creation.
Scenarios that need these fall back to the mock adapter automatically.
Try it
pip install carbon-layer
GitHub: https://github.com/Pritom14/carbon-layer
Update: new features shipped in v0.2.0
Three new features based on feedback:
Parameter overrides -- override scenario parameters at runtime without editing YAML:
carbon run dispute-spike --set baseline_orders=500 --set dispute_rate=0.3
HTML reports -- export a shareable report for your team:
carbon report --run-id <id> --format html
CI/CD integration -- POST run results to your pipeline:
carbon run dispute-spike \
  --webhook-url http://localhost:8000/webhooks/razorpay \
  --callback-url http://ci/carbon/results
Built with Python, asyncpg, httpx, and Typer.
Open source under Apache 2.0.
Feedback welcome, especially if you’re building on Razorpay and want to run this against your staging environment.
Top comments (1)
The "tools to test the failure modes don't exist" framing is exactly right — every team I've worked with on payments has the happy-path test suite covered and a complete blind spot on dispute spikes, refund storms, and the gateway-error sequences that show up at the worst possible time. Two patterns from running this at scale on the open-banking side that translate one-to-one to your card-side scenarios: (1) the Razorpay-event-shape fidelity is the right call — the moment your synthetic events drift from the real provider's payload (timestamp precision, header casing, signature scheme, optional fields the SDK injects), your handlers catch bugs the real webhook stream wouldn't, and you get false confidence; (2) the per-event-type breakdown in the report (sent / 2xx / 4xx / 5xx / timeout) is the right aggregation — total error count is what most teams alert on, but the real signal is "dispute.created went from 99% 2xx to 87% 2xx in the last hour" and that only shows up when you bucket by event type. Carbon Layer's CI/CD callback in v0.2.0 is the underrated half — the moment chaos runs are gated on PR merge, the failure modes stop showing up first in production. Will be sharing this with our infra team.
One question: are the synthetic dispute events stable across Razorpay schema versions, or does the mock adapter need updating each time the provider tweaks the payload shape? That's usually the maintenance burden that kills these tools after month six.