<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pritom Mazumdar</title>
    <description>The latest articles on DEV Community by Pritom Mazumdar (@pritom14).</description>
    <link>https://dev.to/pritom14</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3830966%2F061117bf-b5c8-4495-9afa-26bc517bb90c.jpg</url>
      <title>DEV Community: Pritom Mazumdar</title>
      <link>https://dev.to/pritom14</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pritom14"/>
    <language>en</language>
    <item>
      <title>Tired of Being Paged at 3am? Let Your AI Handle the Runbook</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Fri, 03 Apr 2026 18:58:49 +0000</pubDate>
      <link>https://dev.to/pritom14/tired-of-being-paged-at-3am-let-your-ai-handle-the-runbook-1l0c</link>
      <guid>https://dev.to/pritom14/tired-of-being-paged-at-3am-let-your-ai-handle-the-runbook-1l0c</guid>
<description>&lt;p&gt;When that alert fires at 3:14am on a Sunday, you know the drill: VPN in, SSH to the server, check the logs, maybe restart the service, and watch the page escalate to someone else. You've probably done this 100 times.&lt;/p&gt;

&lt;p&gt;What if the runbook executed itself? &lt;a href="https://www.loom.com/share/94472577f5ce4dd18975801e7877838c" rel="noopener noreferrer"&gt;See it for yourself&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meet RunbookAI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RunbookAI is an open-source autonomous incident response agent. Connect it to PagerDuty, fire a webhook at it, and it reads your runbook, diagnoses the problem, and acts—without paging a human first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Alert fires&lt;/strong&gt; → RunbookAI reads the runbook&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diagnosis&lt;/strong&gt; → runs tools: check_logs, http_check, run_db_check, query_metrics, check_disk, check_processes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remediation&lt;/strong&gt; → executes: restart_service, clear_disk, scale_service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resolves or escalates&lt;/strong&gt; → full summary either way, often with no human in the loop
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Pritom14/runbookai
&lt;span class="nb"&gt;cd &lt;/span&gt;runbookai
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;".[dev]"&lt;/span&gt;

&lt;span class="c"&gt;# Run with local LLM (no API keys)&lt;/span&gt;
ollama pull qwen2.5:7b
&lt;span class="nv"&gt;DEMO_MODE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true &lt;/span&gt;uvicorn runbookai.main:app &lt;span class="nt"&gt;--port&lt;/span&gt; 7000
python demo/run_demo.py regression
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
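
&lt;p&gt;Under the hood, the demo script just fires an alert payload at the running server. Roughly like this -- the route and payload shape here are illustrative, so check the repo for the exact webhook contract:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -X POST http://localhost:7000/alerts \
  -H "Content-Type: application/json" \
  -d '{"service": "checkout-api", "alert": "HighErrorRate", "severity": "critical"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;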



&lt;p&gt;&lt;strong&gt;The Game-Changer: Regression Detection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's the real magic. Your service crashed 2 hours ago. RunbookAI restarted it. But if it crashes &lt;em&gt;again&lt;/em&gt; within 6 hours, the agent is warned: "Don't just restart again—you did that before. Dig deeper."&lt;/p&gt;

&lt;p&gt;Instead of blindly running the same remediation, it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checks for new logs&lt;/li&gt;
&lt;li&gt;Queries recent metrics changes&lt;/li&gt;
&lt;li&gt;Looks for disk space issues, process hangs, or configuration drift&lt;/li&gt;
&lt;li&gt;Suggests a root cause before acting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns "fix the symptom" into "understand the problem."&lt;/p&gt;
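
&lt;p&gt;The check itself is conceptually simple. A minimal sketch of the idea, assuming a hypothetical incident-history shape (the field names are made up, not RunbookAI's actual schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import timedelta

REGRESSION_WINDOW = timedelta(hours=6)

def regression_context(service, history, now):
    # Hypothetical shape: if the same service was remediated within the
    # window, inject a warning into the agent's prompt.
    for incident in history:
        if incident["service"] == service and now - incident["resolved_at"] &lt; REGRESSION_WINDOW:
            return (f"Regression: {service} was already remediated with "
                    f"{incident['action']} at {incident['resolved_at']}. Dig deeper.")
    return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;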

&lt;p&gt;&lt;strong&gt;Suggest Mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;High-risk actions (service restart, disk cleanup, scale-up) pause with a 5-second countdown for human approval. You stay in control while the agent handles the grunt work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auto-Generated Postmortem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After every resolved incident, hit &lt;code&gt;GET /incidents/{id}/postmortem&lt;/code&gt; and get a ready-to-share markdown document: full timeline, actions taken, regression analysis, duration, and a recommendations checklist. Two hours of postmortem work, done automatically.&lt;/p&gt;
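
&lt;p&gt;For example, against the demo server from the quick start (the incident id here is made up):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -s http://localhost:7000/incidents/42/postmortem -o postmortem-42.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;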

&lt;p&gt;&lt;strong&gt;Slack Lifecycle Notifications&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Set &lt;code&gt;SLACK_WEBHOOK_URL&lt;/code&gt; and RunbookAI posts a rich message at every stage: incident started, approval required (with the curl command to approve), resolved with duration, escalated with reason. Your Slack channel becomes your incident dashboard.&lt;/p&gt;
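
&lt;p&gt;Enabling it is one variable on top of the quick-start command (the webhook URL below is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T000/B000/XXXX \
  DEMO_MODE=true uvicorn runbookai.main:app --port 7000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;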

&lt;p&gt;&lt;strong&gt;AgentTrace Replay UI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every tool call, every decision, every second of the remediation is logged. Open the browser, replay the entire incident timeline. Understand what the agent decided and why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Open Source?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Incident response is deeply custom—every company's runbooks, tools, and risk tolerance differ. We ship the core (diagnosis + remediation) free and self-hosted. No vendor lock-in, no SaaS fee, no pinging external APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No API Keys Needed&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Runs on Ollama locally. qwen2.5:7b is small, fast, and good enough for runbook reasoning. Everything stays on your infrastructure. Or swap in OpenAI, Anthropic, or Groq with a single env var, no code changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Pritom14/runbookai" rel="noopener noreferrer"&gt;https://github.com/Pritom14/runbookai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try it now. Fire a demo alert. See regression detection in action. Fork, extend, and own your incident response.&lt;/p&gt;




</description>
      <category>pagerduty</category>
      <category>ai</category>
      <category>claude</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How I Built Video Token Optimization for Vision LLMs: Cutting Costs 13-45% with Frame Dedup + Scene Detection</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Mon, 30 Mar 2026 19:30:58 +0000</pubDate>
      <link>https://dev.to/pritom14/how-i-built-video-token-optimization-for-vision-llms-cutting-costs-13-45-with-frame-dedup-scene-2ic</link>
      <guid>https://dev.to/pritom14/how-i-built-video-token-optimization-for-vision-llms-cutting-costs-13-45-with-frame-dedup-scene-2ic</guid>
      <description>&lt;p&gt;A few weeks ago I launched &lt;a href="https://github.com/Pritom14/token0" rel="noopener noreferrer"&gt;Token0&lt;/a&gt; -- an open-source proxy that optimizes images before they hit vision LLMs like GPT-4o, Claude, and Ollama models. The reception was good, so I kept building.&lt;/p&gt;

&lt;p&gt;The most requested feature was video. If images are expensive, video is brutal -- every second at 30fps is 30 images. This post covers how I built the video optimization pipeline, what I learned benchmarking it across 5 models, and the model-aware edge case that nearly broke everything.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Problem with Naive Video&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most apps that analyze video do one of two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract frames at 1fps and send every one of them&lt;/li&gt;
&lt;li&gt;Send a handful of manually selected keyframes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both approaches waste tokens in predictable ways. At 1fps on a 60-second product demo video:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You get 60 frames&lt;/li&gt;
&lt;li&gt;Frames 1-29 of the same talking head are near-identical (Hamming distance &amp;lt; 10 between perceptual hashes)&lt;/li&gt;
&lt;li&gt;The only frames with unique information are at scene transitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You're paying for 60 images when 8-12 contain all the information.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Pipeline: 4 Layers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Token0's video optimization runs in four stages, each optional and composable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Frame Extraction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenCV extracts frames at 1fps (configurable). A 60s video at 30fps → 60 frames. Hard cap at 32 frames sent to the LLM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_frames&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_frames&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VideoCapture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;video_fps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CAP_PROP_FPS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mf"&gt;30.0&lt;/span&gt;
    &lt;span class="n"&gt;frame_interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_fps&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;fps&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="c1"&gt;# yield every frame_interval-th frame as PIL image
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Layer 2: QJL Perceptual Hash Deduplication&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the core insight. I reused the same QJL (Quantized Johnson-Lindenstrauss) hash infrastructure I built for the image cache:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Compute 256-bit perceptual hash of each frame (dhash on 16x16 grayscale)&lt;/li&gt;
&lt;li&gt;Compress to 128-bit binary signature using a random JL projection matrix&lt;/li&gt;
&lt;li&gt;Compute Hamming distance between consecutive frames&lt;/li&gt;
&lt;li&gt;If distance is 12 or below, drop the frame (near-duplicate)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;DEDUP_HAMMING_THRESHOLD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;  &lt;span class="c1"&gt;# tighter than cache (consecutive frames are very similar)
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;deduplicate_frames&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hamming_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DEDUP_HAMMING_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;kept&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;prev_sig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_jl_compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;_image_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:]:&lt;/span&gt;
        &lt;span class="n"&gt;sig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_jl_compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;_image_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_hamming_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prev_sig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;hamming_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;kept&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;prev_sig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sig&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;kept&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
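
&lt;p&gt;The &lt;code&gt;_image_hash&lt;/code&gt;, &lt;code&gt;_jl_compress&lt;/code&gt;, and &lt;code&gt;_hamming_distance&lt;/code&gt; helpers live in the existing cache code and aren't shown above. A minimal self-contained sketch of how they could work (Token0's exact dhash and projection details may differ):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

_P = np.random.default_rng(0).standard_normal((256, 128))  # fixed JL projection

def _image_hash(img):
    # 256-bit dhash: compare horizontally adjacent pixels on a 16x16 grid
    g = np.asarray(img.convert("L").resize((17, 16)), dtype=np.float32)
    return (g[:, 1:] &gt; g[:, :-1]).flatten()  # 16*16 = 256 bools

def _jl_compress(bits):
    # project {0,1}^256 into 128 dims with a fixed Gaussian matrix, binarize
    return ((bits.astype(np.float32) * 2 - 1) @ _P) &gt; 0

def _hamming_distance(a, b):
    return int(np.count_nonzero(a != b))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;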



&lt;p&gt;On a document scanning video (invoice + receipt + screenshot on screen), this collapsed 15 consecutive near-duplicate frames down to 3 unique ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Scene Change Detection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pixel-level diff between consecutive frames (160x120 downsampled, mean absolute difference). Frames above the threshold (15.0 mean pixel diff) are kept as scene boundaries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;detect_scene_changes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;15.0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;kept&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;prev_arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;resize&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;160&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;))).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;curr_arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;resize&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;160&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;))).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;curr_arr&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;prev_arr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;kept&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;kept&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Layer 4: CLIP Scoring (optional)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;sentence-transformers&lt;/code&gt; is installed, Token0 scores each remaining frame against the user's prompt using CLIP (ViT-B/32) and keeps the top-K most relevant. The code is wired in, but CLIP is an optional dependency -- most deployments skip it, since the first three layers are already effective.&lt;/p&gt;
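
&lt;p&gt;A sketch of what that scoring can look like with &lt;code&gt;sentence-transformers&lt;/code&gt; (not necessarily Token0's exact code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sentence_transformers import SentenceTransformer, util

_clip = SentenceTransformer("clip-ViT-B-32")

def top_k_frames(frames, prompt, k=8):
    # frames: list of (timestamp, PIL image). Rank by CLIP similarity
    # to the prompt, keep the k best, then restore chronological order.
    img_emb = _clip.encode([img for _, img in frames])
    txt_emb = _clip.encode([prompt])
    scores = util.cos_sim(txt_emb, img_emb)[0]
    best = scores.argsort(descending=True)[:k].tolist()
    return [frames[i] for i in sorted(best)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;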




&lt;p&gt;&lt;strong&gt;Each Keyframe Goes Through the Full Image Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After frame selection, every keyframe runs through the existing image optimization stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smart resize (downscale to provider max)&lt;/li&gt;
&lt;li&gt;OCR routing (if the frame is text-heavy)&lt;/li&gt;
&lt;li&gt;JPEG recompression&lt;/li&gt;
&lt;li&gt;Prompt-aware detail mode&lt;/li&gt;
&lt;li&gt;Tile-optimized resize&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means you get compounding savings: fewer frames &lt;strong&gt;and&lt;/strong&gt; each frame is smaller.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Benchmark Results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I tested against 5 Ollama vision models using 3 videos (product showcase, document montage, mixed content). Naive baseline = all frames at 1fps sent raw. Token0 = full pipeline.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Naive Tokens&lt;/th&gt;
&lt;th&gt;Token0 Tokens&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gemma3:4b&lt;/td&gt;
&lt;td&gt;14,706&lt;/td&gt;
&lt;td&gt;8,081&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;45.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llava:7b&lt;/td&gt;
&lt;td&gt;15,731&lt;/td&gt;
&lt;td&gt;12,845&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;18.3%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llava-llama3&lt;/td&gt;
&lt;td&gt;15,658&lt;/td&gt;
&lt;td&gt;12,789&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;18.3%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;minicpm-v&lt;/td&gt;
&lt;td&gt;7,428&lt;/td&gt;
&lt;td&gt;6,447&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;13.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;moondream&lt;/td&gt;
&lt;td&gt;12,288&lt;/td&gt;
&lt;td&gt;11,714&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.7%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why the spread?&lt;/strong&gt; Gemma3 uses a high-resolution image encoder, so every dropped frame removes a lot of tokens -- hence the 45%. Moondream uses a tiny encoder (~50 tokens/frame), so frame dedup has less absolute impact even when it removes the same number of frames.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-4o extrapolation&lt;/strong&gt; (using OpenAI's published tile formula):&lt;/p&gt;

&lt;p&gt;60s video, 30fps → 1fps = 60 frames → dedup to ~10 keyframes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Naive: 60 × 425 tokens = &lt;strong&gt;25,500 tokens&lt;/strong&gt; (~$0.064/video)&lt;/li&gt;
&lt;li&gt;Token0: 10 × 425 = &lt;strong&gt;4,250 tokens&lt;/strong&gt; (~$0.011/video)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~83% savings&lt;/strong&gt; per video&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 10K videos/day: $19,125/mo → $3,188/mo.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Edge Case That Nearly Broke Everything&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While benchmarking, I discovered that llama3.2-vision was showing &lt;strong&gt;-124% savings&lt;/strong&gt; (negative -- Token0 was making it worse).&lt;/p&gt;

&lt;p&gt;The root cause was two bugs stacked on top of each other:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 1: Provider detection miss&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;get_provider_from_model()&lt;/code&gt; didn't include &lt;code&gt;llama3.2-vision&lt;/code&gt;, so it fell through to the &lt;code&gt;"openai"&lt;/code&gt; default. OCR routing is only meant to kick in when estimated image tokens &amp;gt; OCR text tokens -- and with the wrong provider, that estimate was computed with the wrong formula, so the routing decision was wrong.&lt;/p&gt;

&lt;p&gt;Fix: explicitly add &lt;code&gt;llama3.2-vision&lt;/code&gt;, &lt;code&gt;llama3.2&lt;/code&gt;, &lt;code&gt;gemma3&lt;/code&gt;, &lt;code&gt;granite3.2&lt;/code&gt;, &lt;code&gt;qwen2.5vl&lt;/code&gt;, &lt;code&gt;qwen3-vl&lt;/code&gt; to the Ollama model list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 2: Ultra-efficient encoders break the OCR savings assumption&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;llama3.2-vision uses ~8-27 tokens per image natively. The standard OCR flow routes text-heavy images to EasyOCR and returns extracted text (~200-700 tokens depending on content). For a model that uses 15 tokens/image, returning 300 tokens of OCR text is &lt;strong&gt;20x more expensive&lt;/strong&gt;, not cheaper.&lt;/p&gt;

&lt;p&gt;The fix was a named allowlist of ultra-efficient models that skip OCR entirely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_ultra_efficient_models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.2-vision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;is_ultra_efficient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;_ultra_efficient_models&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;is_ultra_efficient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Skip OCR -- image tokens are already cheaper than text extraction
&lt;/span&gt;    &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reasons&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skip OCR: ultra-efficient encoder (~&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;estimated_image_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tokens &amp;lt; OCR cost)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After both fixes: llama3.2-vision went from -124% to 0% (correct passthrough). gemma3 stayed at 24.8% (was briefly broken by an intermediate fix attempt). granite3.2-vision: 53.1%.&lt;/p&gt;

&lt;p&gt;The lesson: optimization strategies that help high-token-count models hurt ultra-efficient ones. You need model-aware routing, not just image-aware routing.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;How to Use Video in Token0&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;token0
token0 serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then point your OpenAI client at the proxy and send the video as a base64 data URL:&lt;/p&gt;




&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_demo.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;video_b64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What happens in this video?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:video/mp4;base64,&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video_b64&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;extra_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Provider-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Token0 extracted keyframes, deduped, optimized, forwarded
# response.token0.tokens_saved, optimizations_applied, etc.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Already using LiteLLM? Video works through the hook too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;litellm&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;token0.litellm_hook&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Token0Hook&lt;/span&gt;

&lt;span class="n"&gt;litellm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Token0Hook&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;span class="c1"&gt;# video_url content type automatically handled
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;What's Next&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CLIP scoring (Layer 2)&lt;/strong&gt;: score each frame against the user's prompt and keep the top-K most relevant. Code is wired, needs &lt;code&gt;pip install sentence-transformers clip&lt;/code&gt; to activate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Saliency-based ROI cropping&lt;/strong&gt;: detect what region the prompt is asking about, crop and send only that. "What's the total?" on an invoice → crop to bottom-right only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive quality escalation&lt;/strong&gt;: send low-detail first (85 tokens), retry at high-detail only if the response shows uncertainty. Happy path (60-70% of cases) = massive savings.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Apache 2.0. &lt;code&gt;pip install token0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Pritom14/token0" rel="noopener noreferrer"&gt;github.com/Pritom14/token0&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're processing video through vision LLMs and have benchmarks on your own models, I'd love to compare notes. Especially curious about Gemini 2.5 Pro's native video support vs frame-by-frame through Token0.&lt;/p&gt;




</description>
      <category>python</category>
      <category>opensource</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Token0 v0.2.0: Streaming Support + Updated Benchmarks: 35-42% Savings Across 4 Vision Models</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Fri, 27 Mar 2026 11:46:39 +0000</pubDate>
      <link>https://dev.to/pritom14/token0-v020-streaming-support-updated-benchmarks-35-42-savings-across-4-vision-models-1m1i</link>
      <guid>https://dev.to/pritom14/token0-v020-streaming-support-updated-benchmarks-35-42-savings-across-4-vision-models-1m1i</guid>
      <description>&lt;p&gt;A few days ago I launched &lt;a href="https://github.com/Pritom14/token0" rel="noopener noreferrer"&gt;Token0&lt;/a&gt; -- an open-source API proxy that makes vision LLM calls cheaper by optimizing images before they hit the model. The response was great, so here is the first real update: &lt;strong&gt;v0.2.0 with full streaming support and expanded benchmarks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's New in v0.2.0&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Streaming support (&lt;code&gt;stream=true&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was the most requested feature. Token0 now supports Server-Sent Events streaming across all four providers -- OpenAI, Anthropic, Google, and Ollama.&lt;/p&gt;

&lt;p&gt;How it works: Token0 optimizes your images &lt;em&gt;before&lt;/em&gt; streaming begins, then tokens flow word-by-word exactly like native provider APIs. You get the cost savings without sacrificing the real-time UX.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe this image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/jpeg;base64,...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Provider-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Final chunk includes token0 stats (tokens_saved, optimizations_applied)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few details worth noting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI-compatible SSE format&lt;/strong&gt; -- &lt;code&gt;data: {...}\n\n&lt;/code&gt; chunks with &lt;code&gt;delta&lt;/code&gt; (not &lt;code&gt;message&lt;/code&gt;), ending with &lt;code&gt;data: [DONE]&lt;/code&gt; (sketched just after this list)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimization stats on the final chunk&lt;/strong&gt; -- the last streaming chunk includes a &lt;code&gt;token0&lt;/code&gt; field with tokens saved and which optimizations were applied&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cached responses stream too&lt;/strong&gt; -- if Token0 has a cache hit, it simulates streaming by sending the cached response in small chunks, so your client code does not need to handle two different response formats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero overhead on text-only&lt;/strong&gt; -- if there are no images in the request, streaming passes through with no added latency&lt;/li&gt;
&lt;/ul&gt;
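
&lt;p&gt;Concretely, the tail of a stream looks roughly like this (the values and exact &lt;code&gt;token0&lt;/code&gt; payload below are illustrative; the field names come from the stats described above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data: {"choices":[{"index":0,"delta":{"content":" blue"}}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"token0":{"tokens_saved":680,"optimizations_applied":["smart_resize","tile_optimize"]}}

data: [DONE]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;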

&lt;p&gt;&lt;strong&gt;2. Expanded benchmarks (full suite)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In v0.1.0, I only benchmarked on the real-world image suite (5 images). For v0.2.0, I ran the full benchmark suite across 6 categories: single images, text passthrough, multi-image requests, multi-turn conversations, different task types (classification, extraction, description, Q&amp;amp;A), and real-world images.&lt;/p&gt;

&lt;p&gt;Results across all four Ollama vision models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Direct Tokens&lt;/th&gt;
&lt;th&gt;Token0 Tokens&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;minicpm-v&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;10,877&lt;/td&gt;
&lt;td&gt;6,276&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42.3%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;moondream&lt;/td&gt;
&lt;td&gt;1.7B&lt;/td&gt;
&lt;td&gt;16,457&lt;/td&gt;
&lt;td&gt;10,240&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;37.8%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llava-llama3&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;13,365&lt;/td&gt;
&lt;td&gt;8,486&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;36.5%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llava:7b&lt;/td&gt;
&lt;td&gt;7B&lt;/td&gt;
&lt;td&gt;13,384&lt;/td&gt;
&lt;td&gt;8,701&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;35.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The numbers are higher than v0.1.0 because the full suite includes more text-heavy test cases where OCR routing delivers 93-97% savings per image.&lt;/p&gt;

&lt;p&gt;Key findings from the expanded benchmarks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OCR routing is the biggest win&lt;/strong&gt;: 93-97% savings on text-heavy images (documents, screenshots, receipts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero overhead on text-only&lt;/strong&gt;: confirmed 0 extra tokens across all 4 models on text-only requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-turn conversations&lt;/strong&gt;: images in conversation history get optimized too -- no wasted tokens on re-sent images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency improves in most cases&lt;/strong&gt;: OCR routing is actually faster than sending the full image to the model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Ollama provider routing fix&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;v0.1.0 had a bug where Ollama models (moondream, llava, etc.) could be incorrectly routed to the OpenAI provider. Fixed -- Token0 now correctly detects and routes all Ollama vision models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-4o Cost Projections (unchanged)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These projections from v0.1.0 still hold -- they are based on OpenAI's published token formulas, not local model benchmarks:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scale&lt;/th&gt;
&lt;th&gt;Without Token0&lt;/th&gt;
&lt;th&gt;With Token0&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1K images/day&lt;/td&gt;
&lt;td&gt;$67.58/mo&lt;/td&gt;
&lt;td&gt;$0.74/mo&lt;/td&gt;
&lt;td&gt;98.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100K images/day&lt;/td&gt;
&lt;td&gt;$6,757.50/mo&lt;/td&gt;
&lt;td&gt;$74.47/mo&lt;/td&gt;
&lt;td&gt;98.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Upgrade&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; token0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. No config changes needed. Streaming works automatically when you pass &lt;code&gt;stream=True&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's Next&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Video optimization&lt;/strong&gt; -- keyframe extraction + per-frame optimization for video LLM calls&lt;/li&gt;
&lt;li&gt;More provider-specific optimizations as new models launch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;code&gt;pip install token0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Pritom14/token0" rel="noopener noreferrer"&gt;github.com/Pritom14/token0&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; Apache 2.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Already using LiteLLM? Token0 plugs in as a callback hook -- &lt;code&gt;litellm.callbacks = [Token0Hook()]&lt;/code&gt; -- no proxy needed. If you tried v0.1.0, upgrade and let me know how streaming works on your workload. If you haven't tried it yet -- &lt;code&gt;pip install token0 &amp;amp;&amp;amp; token0 serve&lt;/code&gt; and change your base URL. That is all it takes.&lt;/p&gt;




</description>
      <category>webdev</category>
      <category>vision</category>
      <category>python</category>
    </item>
    <item>
      <title>I Cut Vision LLM Costs by 98.9% -&gt; Here's How Token0 Works Under the Hood</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Fri, 27 Mar 2026 04:34:50 +0000</pubDate>
      <link>https://dev.to/pritom14/i-cut-vision-llm-costs-by-989-heres-how-token0-works-under-the-hood-4ldc</link>
      <guid>https://dev.to/pritom14/i-cut-vision-llm-costs-by-989-heres-how-token0-works-under-the-hood-4ldc</guid>
      <description>&lt;p&gt;Every time you send an image to GPT-4o, Claude, or Gemini, you are paying for vision tokens. And most of them are wasted.&lt;/p&gt;

&lt;p&gt;I built Token0: an open-source API proxy that sits between your app and the LLM provider, optimizes every image request automatically, and typically saves 70-99% on vision costs. It is now live on PyPI.&lt;/p&gt;

&lt;p&gt;In this post, I will walk through the problem, the seven optimization strategies, the benchmarks, and how to get started in under a minute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem: Vision Tokens Are Expensive and Poorly Optimized&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Text token optimization is a solved problem. Prompt caching, compression, smart routing -- the tooling is mature.&lt;/p&gt;

&lt;p&gt;But images -- the modality that costs 2-5x more per token -- have almost no optimization tooling.&lt;/p&gt;

&lt;p&gt;Here is what happens today:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wasted pixels.&lt;/strong&gt; You send a 4000x3000 photo to Claude. Claude silently downscales it to 1568px max. You paid for the original resolution. Those tokens are gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong modality.&lt;/strong&gt; A screenshot of a document costs ~765 tokens on GPT-4o as an image. The same information extracted as text costs ~30 tokens. That is a 25x markup for identical information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong detail level.&lt;/strong&gt; "Classify this image" on GPT-4o uses high-detail mode at 1,105 tokens. Low-detail mode gives the same answer for 85 tokens. A 13x difference that nobody is optimizing for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wasted tiles.&lt;/strong&gt; GPT-4o tiles images into 512x512 blocks. A 1280x720 image creates 4 tiles (765 tokens). Resizing to 1024x768 gives 2 tiles (425 tokens). 44% savings, zero quality loss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Token0 Works&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your App --&amp;gt; Token0 Proxy --&amp;gt; [Analyze -&amp;gt; Classify -&amp;gt; Route -&amp;gt; Transform -&amp;gt; Cache] --&amp;gt; LLM Provider
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You change one line -- your base URL -- and Token0 handles everything automatically.&lt;/p&gt;

&lt;p&gt;Token0 applies seven optimizations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Smart Resize&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each provider has a maximum resolution it actually processes. Claude caps at 1568px, GPT-4o at 2048px. Token0 downscales to these limits before sending. There is no quality loss, because the provider would have done the same thing -- you just stop paying for the discarded pixels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. OCR Routing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When an image is mostly text (screenshots, receipts, invoices, documents), Token0 extracts the text via OCR and sends that instead. Text tokens cost 10-50x less than vision tokens.&lt;/p&gt;

&lt;p&gt;The detection uses a multi-signal heuristic: background uniformity, color variance, horizontal line structure, and edge density. It was validated at 91% accuracy on real-world images, and photos are never falsely routed to OCR.&lt;/p&gt;
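
&lt;p&gt;A minimal sketch of what a multi-signal check can look like -- the signals match the description above, but the thresholds are illustrative, not Token0's tuned classifier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def looks_text_heavy(img):
    # img is a PIL image. Illustrative thresholds only; the real classifier
    # is tuned against labeled real-world images.
    g = np.asarray(img.convert("L").resize((256, 256)), dtype=np.float32)
    bg_uniform = (np.abs(g - np.median(g)) &lt; 12).mean()  # big flat background
    color_var = np.asarray(img.resize((64, 64))).std()   # documents: low variance
    edge_density = np.abs(np.diff(g, axis=1)).mean()     # crisp glyph edges
    return bg_uniform &gt; 0.5 and color_var &lt; 60 and edge_density &gt; 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;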

&lt;p&gt;&lt;strong&gt;3. JPEG Recompression&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PNG screenshots get converted to optimized JPEG when transparency is not needed. Smaller payload, faster upload, same visual information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Prompt-Aware Detail Mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the interesting one. Token0 analyzes your &lt;em&gt;prompt&lt;/em&gt;, not just the image, to decide the detail level.&lt;/p&gt;

&lt;p&gt;"What is in this image?" --&amp;gt; low detail (85 tokens)&lt;br&gt;
"Extract all the text from this receipt" --&amp;gt; high detail (1,105 tokens)&lt;/p&gt;

&lt;p&gt;A keyword classifier on the prompt text makes this decision. Simple queries get low-detail mode automatically.&lt;/p&gt;
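
&lt;p&gt;A sketch of the idea -- the cue list below is hypothetical, not Token0's actual keyword set:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;HIGH_DETAIL_CUES = ("extract", "read", "transcribe", "fine print", "serial", "total")

def detail_mode(prompt):
    # Simple queries default to low detail; extraction-style prompts go high
    p = prompt.lower()
    return "high" if any(cue in p for cue in HIGH_DETAIL_CUES) else "low"

detail_mode("What is in this image?")                  # -&gt; "low"  (85 tokens)
detail_mode("Extract all the text from this receipt") # -&gt; "high" (1,105 tokens)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;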

&lt;p&gt;&lt;strong&gt;5. Tile-Optimized Resize&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenAI charges by 512px tiles. Token0 resizes images to land exactly on tile boundaries, minimizing the number of tiles without changing the aspect ratio meaningfully.&lt;/p&gt;
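
&lt;p&gt;The arithmetic behind this uses OpenAI's published high-detail formula (85 base tokens plus 170 per 512px tile); the resize helper below is a simplified sketch, not Token0's implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

def tile_tokens(w, h):
    # OpenAI high-detail pricing: 85 base + 170 per 512x512 tile
    return 85 + 170 * math.ceil(w / 512) * math.ceil(h / 512)

def snap_to_tiles(w, h):
    # Round each side down to the nearest 512px boundary so no tile
    # is only partially covered (sketch; aspect-ratio handling omitted)
    return max(512, w // 512 * 512), max(512, h // 512 * 512)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;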

&lt;p&gt;&lt;strong&gt;6. Model Cascade&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not every image needs the flagship model. Token0 analyzes task complexity and routes simple tasks to cheaper models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-4o --&amp;gt; GPT-4o-mini (16.7x cheaper)&lt;/li&gt;
&lt;li&gt;Claude Opus --&amp;gt; Claude Haiku (6.25x cheaper)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Complex tasks stay on the original model.&lt;/p&gt;
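
&lt;p&gt;A sketch of the routing decision -- the cue list is hypothetical, and Token0's real complexity analysis is richer:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;CASCADE = {"gpt-4o": "gpt-4o-mini", "claude-3-opus": "claude-3-haiku"}
SIMPLE_CUES = ("classify", "yes or no", "what is in", "is there")

def route_model(model, prompt):
    # Downgrade to the cheaper sibling only when the task looks simple
    if any(cue in prompt.lower() for cue in SIMPLE_CUES):
        return CASCADE.get(model, model)
    return model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;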

&lt;p&gt;&lt;strong&gt;7. Semantic Response Cache&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Token0 generates a perceptual hash of each image combined with the prompt text. If a similar request has been seen before, the cached response is returned. Zero tokens consumed.&lt;/p&gt;

&lt;p&gt;This is particularly effective on repetitive workloads: product image classification, document processing pipelines, batch operations.&lt;/p&gt;
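
&lt;p&gt;One possible shape for the cache key -- exact matching on (perceptual hash, normalized prompt); a near-duplicate lookup would compare Hamming distances instead of hashing exactly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib

def cache_key(image_phash, prompt):
    # Same-looking image + same question -&gt; same cache entry, zero tokens
    norm = " ".join(prompt.lower().split())
    return hashlib.sha256(f"{image_phash}|{norm}".encode()).hexdigest()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;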

&lt;p&gt;&lt;strong&gt;Benchmarks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I tested Token0 on four Ollama vision models with real-world images -- actual photos, a real store receipt, a typed invoice, and a desktop screenshot.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Token Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;moondream&lt;/td&gt;
&lt;td&gt;1.7B&lt;/td&gt;
&lt;td&gt;36.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llava-llama3&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;31.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;minicpm-v&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;25.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llava:7b&lt;/td&gt;
&lt;td&gt;7B&lt;/td&gt;
&lt;td&gt;24.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On GPT-4o with all seven optimizations enabled:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scale&lt;/th&gt;
&lt;th&gt;Direct Cost&lt;/th&gt;
&lt;th&gt;Token0 Cost&lt;/th&gt;
&lt;th&gt;Monthly Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1K images/day&lt;/td&gt;
&lt;td&gt;$67.58&lt;/td&gt;
&lt;td&gt;$0.74&lt;/td&gt;
&lt;td&gt;$66.83&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10K images/day&lt;/td&gt;
&lt;td&gt;$675.75&lt;/td&gt;
&lt;td&gt;$7.45&lt;/td&gt;
&lt;td&gt;$668.30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100K images/day&lt;/td&gt;
&lt;td&gt;$6,757.50&lt;/td&gt;
&lt;td&gt;$74.47&lt;/td&gt;
&lt;td&gt;$6,683.03&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is a 98.9% cost reduction.&lt;/p&gt;

&lt;p&gt;Key finding: OCR routing alone delivers 47-70% token savings on text-heavy images. If you do nothing else, just routing screenshots and documents through OCR instead of vision is worth it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick Start&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Install from PyPI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;token0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add your API key to a &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;token0 serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. No Docker, no Postgres, no Redis. Token0 starts in lite mode by default with SQLite and in-memory cache.&lt;/p&gt;

&lt;p&gt;Now change your base URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s in this image?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/jpeg;base64,...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;extra_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Provider-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check your savings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8000/v1/usage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_requests"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;47&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens_saved"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12840&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_cost_saved_usd"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0321&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"avg_compression_ratio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;3.2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Works With Everything&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Token0 supports four providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI&lt;/strong&gt; -- GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic&lt;/strong&gt; -- Claude Sonnet, Claude Opus, Claude Haiku&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google&lt;/strong&gt; -- Gemini 2.5 Flash, Gemini 2.5 Pro&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; -- moondream, llava, llava-llama3, minicpm-v, any vision model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For production, switch to full mode with PostgreSQL, Redis, and S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;token0[full]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Try It&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;code&gt;pip install token0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Pritom14/token0" rel="noopener noreferrer"&gt;github.com/Pritom14/token0&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; Apache 2.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fully open source. If you are sending images to LLMs and paying for vision tokens, give it a try and let me know what savings you see.&lt;/p&gt;




</description>
      <category>vision</category>
      <category>python</category>
      <category>webdev</category>
      <category>deepseek</category>
    </item>
    <item>
      <title>Carbon Layer v0.6 : Webhook resilience testing for payment handlers (idempotency, out-of-order, signature verification)</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Tue, 24 Mar 2026 08:46:54 +0000</pubDate>
      <link>https://dev.to/pritom14/carbon-layer-v06-webhook-resilience-testing-for-payment-handlers-idempotency-out-of-order-307d</link>
      <guid>https://dev.to/pritom14/carbon-layer-v06-webhook-resilience-testing-for-payment-handlers-idempotency-out-of-order-307d</guid>
      <description>&lt;p&gt;&lt;strong&gt;New release of Carbon Layer : the open source chaos engineering tool for payment flows.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;v0.5 added multi-provider support (Razorpay, Stripe, Cashfree, Juspay). v0.6 focuses on a different problem: &lt;strong&gt;how resilient is your webhook handler?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most webhook handlers are tested against the happy path: one event, correct signature, delivered in order. Production is different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Payment gateways retry failed deliveries, so your handler gets the same webhook 2-5 times&lt;/li&gt;
&lt;li&gt;Webhook delivery order is not guaranteed — &lt;code&gt;payment.captured&lt;/code&gt; can arrive before &lt;code&gt;payment.authorized&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;If your handler doesn't verify signatures, anyone can forge webhook events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the bugs that don't show up in staging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's new in v0.6&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Idempotency testing&lt;/strong&gt;: fire each webhook N times and see whether your handler processes it once or N times:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-repeat&lt;/span&gt; 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
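
&lt;p&gt;A handler passes this check when it deduplicates on the event ID before doing any work. A minimal sketch (FastAPI, in-memory set, illustrative field names; use a durable store in production):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Idempotent handler: acknowledge every delivery, process only the first.
from fastapi import FastAPI, Request

app = FastAPI()
seen: set[str] = set()  # swap for a DB/Redis key with a TTL in production

@app.post("/webhooks")
async def handle(request: Request):
    event = await request.json()
    event_id = event["id"]  # gateways send a stable ID per event
    if event_id in seen:
        return {"status": "duplicate_ignored"}
    seen.add(event_id)
    # ... apply the state change exactly once ...
    return {"status": "processed"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;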



&lt;p&gt;&lt;strong&gt;Out-of-order delivery&lt;/strong&gt;: randomize or reverse webhook delivery order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-order&lt;/span&gt; random
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
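
&lt;p&gt;Surviving this test usually means treating webhooks as facts about state rather than ordered commands. One common guard, sketched with illustrative status names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Rank statuses and only ever move forward, so a late payment.authorized
# arriving after payment.captured becomes a no-op instead of a regression.
STATUS_RANK = {"created": 0, "authorized": 1, "captured": 2, "refunded": 3}

def apply_status(current: str, incoming: str) -&amp;gt; str:
    if STATUS_RANK[incoming] &amp;gt; STATUS_RANK[current]:
        return incoming
    return current  # out-of-order or duplicate event: ignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;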



&lt;p&gt;&lt;strong&gt;Signature verification&lt;/strong&gt;: send webhooks with missing, corrupted, or wrong-secret signatures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-signature&lt;/span&gt; missing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Webhook replay&lt;/strong&gt;: re-fire webhooks from any previous run, which is useful for regression testing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon replay &amp;lt;run_id&amp;gt; &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CI/CD exit codes&lt;/strong&gt;: exit with code 1 if any webhook returned a 5xx or timed out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ci&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4 new scenarios&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;upi-timeout&lt;/code&gt;: UPI payments stuck without terminal status&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vpa-not-found&lt;/code&gt;: invalid UPI VPA failures&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mandate-rejection&lt;/code&gt;: UPI autopay mandate rejections&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;settlement-delay&lt;/code&gt;: refunds on captured-but-unsettled payments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;That brings us to 11 scenarios total.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick start&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;carbon-layer
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;No database setup, no gateway credentials. 11 scenarios, 5 providers, webhook resilience testing. Apache 2.0.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Pritom14/carbon-layer" rel="noopener noreferrer"&gt;github.com/Pritom14/carbon-layer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're building a hosted version with dashboards, scheduled runs, and compliance reports. Join the waitlist: &lt;a href="https://pritom14.github.io/carbon-layer/waitlist" rel="noopener noreferrer"&gt;pritom14.github.io/carbon-layer/waitlist&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>webdev</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>Carbon Layer v0.5.1 — Chaos testing for payment webhooks, now with Juspay (4 providers supported)</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Mon, 23 Mar 2026 11:52:09 +0000</pubDate>
      <link>https://dev.to/pritom14/carbon-layer-v051-chaos-testing-for-payment-webhooks-now-with-juspay-4-providers-supported-2lli</link>
      <guid>https://dev.to/pritom14/carbon-layer-v051-chaos-testing-for-payment-webhooks-now-with-juspay-4-providers-supported-2lli</guid>
      <description>&lt;p&gt;&lt;em&gt;Quick update on Carbon Layer — the open-source chaos engineering tool for payment flows.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's new in v0.5&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Juspay support&lt;/strong&gt;: Carbon Layer now generates Juspay-specific webhook payloads (&lt;code&gt;ORDER_SUCCEEDED&lt;/code&gt;, &lt;code&gt;ORDER_FAILED&lt;/code&gt;, &lt;code&gt;ORDER_REFUNDED&lt;/code&gt;) with Basic Auth signing. That brings us to 4 providers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Header&lt;/th&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Encoding / Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Razorpay&lt;/td&gt;
&lt;td&gt;X-Razorpay-Signature&lt;/td&gt;
&lt;td&gt;HMAC-SHA256&lt;/td&gt;
&lt;td&gt;Hex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stripe&lt;/td&gt;
&lt;td&gt;Stripe-Signature (t=..., v1=...)&lt;/td&gt;
&lt;td&gt;HMAC-SHA256&lt;/td&gt;
&lt;td&gt;Includes timestamp for replay-attack protection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cashfree&lt;/td&gt;
&lt;td&gt;x-webhook-signature&lt;/td&gt;
&lt;td&gt;HMAC-SHA256&lt;/td&gt;
&lt;td&gt;Base64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Juspay&lt;/td&gt;
&lt;td&gt;Authorization: Basic ...&lt;/td&gt;
&lt;td&gt;Basic Auth&lt;/td&gt;
&lt;td&gt;Base64-encoded username:password&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each implementation is verified against the provider's official documentation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Juspay webhooks (new)&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; juspay &lt;span class="nt"&gt;--juspay-key&lt;/span&gt; your_key &lt;span class="nt"&gt;--juspay-merchant-id&lt;/span&gt; your_mid &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks

&lt;span class="c"&gt;# Stripe webhooks&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; stripe &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks

&lt;span class="c"&gt;# Cashfree webhooks&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; cashfree &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks

&lt;span class="c"&gt;# Razorpay webhooks&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; razorpay &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why Juspay is different&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Juspay doesn't use HMAC signing for webhooks; it uses Basic Auth. Its webhook payload format is also unusual: everything is wrapped under &lt;code&gt;content.order&lt;/code&gt; rather than split into separate entity types. Disputes are dashboard-only (no API), and payments auto-capture (no manual capture step). The adapter handles all of these quirks.&lt;/p&gt;
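
&lt;p&gt;On the handler side, the two styles verify differently. A hedged sketch (not from Carbon Layer's codebase):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Razorpay-style HMAC-SHA256 (hex) vs Juspay-style Basic Auth.
import base64
import hashlib
import hmac

def verify_hmac_hex(body: bytes, signature: str, secret: str) -&amp;gt; bool:
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

def verify_basic_auth(auth_header: str, username: str, password: str) -&amp;gt; bool:
    expected = base64.b64encode(f"{username}:{password}".encode()).decode()
    return hmac.compare_digest(f"Basic {expected}", auth_header)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;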

&lt;p&gt;&lt;strong&gt;Quick start&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;carbon-layer
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No database setup, no gateway credentials. SQLite by default, PostgreSQL optional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Carbon Layer does&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're new here: Carbon Layer simulates payment failure modes (dispute spikes, refund storms, gateway errors) and fires signed webhook events at your endpoint. It reports exactly what your handler returned for each event type.&lt;/p&gt;

&lt;p&gt;7 scenarios built in. 4 providers. HTML reports. CI/CD callbacks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's next&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're building a hosted version with dashboards, scheduled runs, and compliance reports for teams.&lt;/p&gt;

&lt;p&gt;Join the waitlist: &lt;a href="https://pritom14.github.io/carbon-layer/waitlist" rel="noopener noreferrer"&gt;pritom14.github.io/carbon-layer/waitlist&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Pritom14/carbon-layer" rel="noopener noreferrer"&gt;github.com/Pritom14/carbon-layer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback welcome, especially from teams using Juspay in production.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>webdev</category>
      <category>python</category>
      <category>stripe</category>
    </item>
    <item>
      <title>Carbon Layer v0.4 — Chaos testing for payment webhooks, now with Stripe and Cashfree support</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Sat, 21 Mar 2026 09:37:15 +0000</pubDate>
      <link>https://dev.to/pritom14/carbon-layer-v04-chaos-testing-for-payment-webhooks-now-with-stripe-and-cashfree-support-3lnk</link>
      <guid>https://dev.to/pritom14/carbon-layer-v04-chaos-testing-for-payment-webhooks-now-with-stripe-and-cashfree-support-3lnk</guid>
      <description>&lt;p&gt;A few days ago I shared Carbon Layer -&amp;gt; an open-source chaos engineering tool for payment flows. Here's what's new:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-provider webhook support&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Carbon Layer now generates provider-specific webhook payloads for &lt;strong&gt;Razorpay, Stripe, and Cashfree&lt;/strong&gt; with the correct signing format for each:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Signature Header&lt;/th&gt;
&lt;th&gt;Signing Method&lt;/th&gt;
&lt;th&gt;Encoding / Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Razorpay&lt;/td&gt;
&lt;td&gt;X-Razorpay-Signature&lt;/td&gt;
&lt;td&gt;HMAC-SHA256&lt;/td&gt;
&lt;td&gt;Hex encoded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stripe&lt;/td&gt;
&lt;td&gt;Stripe-Signature (t=timestamp, v1=signature)&lt;/td&gt;
&lt;td&gt;HMAC-SHA256&lt;/td&gt;
&lt;td&gt;Timestamp verification to prevent replay attacks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cashfree&lt;/td&gt;
&lt;td&gt;x-webhook-signature&lt;/td&gt;
&lt;td&gt;HMAC-SHA256&lt;/td&gt;
&lt;td&gt;Base64 encoded&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Your webhook handler gets tested with the exact same format and signing it sees in production. We verified each implementation against the provider's official documentation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Stripe webhooks&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; stripe &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks

&lt;span class="c"&gt;# Cashfree webhooks&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; cashfree &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks

&lt;span class="c"&gt;# Razorpay webhooks (original)&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; razorpay &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
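
&lt;p&gt;Stripe's scheme deserves a note: the timestamp is part of the signed payload, which is what blocks replays. A sketch of the documented check (use Stripe's official SDK in production):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Verify a Stripe-style "t=..., v1=..." header with a replay window.
import hashlib
import hmac
import time

def verify_stripe_style(body: bytes, header: str, secret: str,
                        tolerance_s: int = 300) -&amp;gt; bool:
    parts = dict(p.strip().split("=", 1) for p in header.split(","))
    timestamp, received = parts["t"], parts["v1"]
    if abs(time.time() - int(timestamp)) &amp;gt; tolerance_s:
        return False  # stale timestamp: possible replay
    signed_payload = timestamp.encode() + b"." + body
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;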



&lt;p&gt;&lt;strong&gt;Zero-config install&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The other big change — PostgreSQL is no longer required. Carbon Layer now uses SQLite by default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;carbon-layer
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No database setup. No environment variables. Two commands and you're testing.&lt;/p&gt;

&lt;p&gt;PostgreSQL is still supported for teams: &lt;code&gt;pip install carbon-layer[postgres]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Carbon Layer does&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you missed the first post — Carbon Layer simulates payment failure modes (dispute spikes, refund storms, gateway errors) and fires signed webhook events at your endpoint. It reports exactly what your handler returned for each event type.&lt;/p&gt;

&lt;p&gt;7 scenarios built in. Provider specific signed payloads. HTML reports. CI/CD callbacks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's next&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're building a hosted version with dashboards, scheduled runs, and compliance reports for teams.&lt;/p&gt;

&lt;p&gt;If that interests you, join the waitlist: &lt;a href="https://pritom14.github.io/carbon-layer/waitlist" rel="noopener noreferrer"&gt;pritom14.github.io/carbon-layer/waitlist&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Pritom14/carbon-layer" rel="noopener noreferrer"&gt;github.com/Pritom14/carbon-layer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback welcome, especially if you've tried it against your Stripe, Razorpay, or Cashfree webhook handlers.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>python</category>
      <category>stripe</category>
      <category>webdev</category>
    </item>
    <item>
      <title>We fired 150 dispute webhooks at a payment service. 12 handlers crashed. Here's what we built.</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Wed, 18 Mar 2026 08:42:53 +0000</pubDate>
      <link>https://dev.to/pritom14/we-fired-150-dispute-webhooks-at-a-payment-service-12-handlers-crashed-heres-what-we-built-1khp</link>
      <guid>https://dev.to/pritom14/we-fired-150-dispute-webhooks-at-a-payment-service-12-handlers-crashed-heres-what-we-built-1khp</guid>
      <description>&lt;p&gt;Every company processing payments tests the happy path.&lt;/p&gt;

&lt;p&gt;Payment succeeds, order gets fulfilled, customer gets a confirmation email. That’s the flow that gets reviewed, tested in staging, and monitored in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What doesn’t get tested is everything else.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Dispute spikes. Refund storms. Gateway errors that leave orders stuck. Webhook sequences your handlers were never built to handle at volume.&lt;/p&gt;

&lt;p&gt;These are the failure modes that show up in production, usually at the worst possible time.&lt;/p&gt;

&lt;p&gt;The problem is not that engineering teams don’t care.&lt;br&gt;
It’s that the tools to test this don’t exist.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Why you can’t test this in Razorpay’s sandbox&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Razorpay’s test API cannot create disputes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Disputes are raised by banks and card networks, not merchants. There is no POST /disputes endpoint.&lt;/p&gt;

&lt;p&gt;Even if you could trigger disputes manually, you can’t fire 150 of them in 10 seconds on a test account. Razorpay would rate limit you. And you can’t control the timing or sequence of webhook events in any sandbox.&lt;/p&gt;

&lt;p&gt;So the failure mode you most need to test is the one the provider doesn’t let you simulate.&lt;/p&gt;

&lt;p&gt;Teams ship, cross their fingers, and find out what breaks when customers find it first.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;What we built&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Carbon Layer is an open-source chaos engineering tool for payment flows.&lt;/p&gt;

&lt;p&gt;You run a scenario (dispute spike, refund storm, payment decline spike) and it fires Razorpay-format webhook events directly at your endpoint.&lt;/p&gt;

&lt;p&gt;Same JSON shape.&lt;br&gt;
Same headers.&lt;br&gt;
Same HMAC-SHA256 signature as real Razorpay webhooks.&lt;/p&gt;

&lt;p&gt;Your server can’t tell the difference.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;carbon-layer

carbon run dispute-spike &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks/razorpay
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No Razorpay account needed.&lt;br&gt;
No sandbox credentials.&lt;br&gt;
No rate limits.&lt;/p&gt;
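
&lt;p&gt;Producing an indistinguishable event mostly comes down to signing the exact bytes you send. A sketch of the idea (the function here is illustrative, not Carbon Layer’s internal API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sign a Razorpay-format payload and fire it at a webhook handler.
import hashlib
import hmac
import json

import httpx

def fire_signed_webhook(url: str, event: dict, secret: str) -&amp;gt; int:
    body = json.dumps(event).encode()
    signature = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    resp = httpx.post(url, content=body, headers={
        "Content-Type": "application/json",
        "X-Razorpay-Signature": signature,  # same header as real deliveries
    })
    return resp.status_code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;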



&lt;p&gt;&lt;strong&gt;The report&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The report shows exactly what happened:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Webhook Delivery Summary
Target: http://localhost:8000/webhooks/razorpay

Event Type                 Sent    2xx    4xx    5xx    Timeout
payment.captured            100     98      0      1          1
payment.dispute.created     150    135      0     12          3
refund.processed             50     49      0      1          0

Total                       300    282      0     14          4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s 18 events your handler didn’t process correctly: 14 drew a 5xx and 4 more timed out.&lt;/p&gt;

&lt;p&gt;In production, each unhandled dispute is a chargeback the merchant loses by default.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Scenarios available&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;dispute-spike&lt;/code&gt;: 150 disputes on captured payments&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;payment-decline-spike&lt;/code&gt;: simulates a 30% payment failure rate&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;refund-storm&lt;/code&gt;: mass refunds across captured payments&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;flash-sale&lt;/code&gt;: high-volume order and payment flow&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gateway-error-burst&lt;/code&gt;: intermittent gateway failures&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;min-amount&lt;/code&gt;: minimum-paise transactions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;max-amount&lt;/code&gt;: large-value transactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All scenarios work with the mock adapter.&lt;br&gt;
No external account required.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;If you use Razorpay&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can also run scenarios against your actual Razorpay test account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--provider&lt;/span&gt; razorpay &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--api-key&lt;/span&gt; rzp_test_xxx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--api-secret&lt;/span&gt; yyy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; https://your-staging-app.com/webhooks/razorpay

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: Razorpay’s API doesn’t support server-side payment or dispute creation.&lt;br&gt;
Scenarios that need these fall back to the mock adapter automatically.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Try it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install carbon-layer&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Pritom14/carbon-layer" rel="noopener noreferrer"&gt;github.com/Pritom14/carbon-layer&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Update: new features shipped in v0.2.0&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three new features based on feedback:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parameter overrides&lt;/strong&gt; -- override scenario parameters at runtime without editing YAML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;baseline_orders&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;500 &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;dispute_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;HTML reports&lt;/strong&gt; -- export a shareable report for your team:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon report &lt;span class="nt"&gt;--run-id&lt;/span&gt; &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt; html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CI/CD integration&lt;/strong&gt; -- POST run results to your pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks/razorpay &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--callback-url&lt;/span&gt; http://ci/carbon/results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;Built with Python, asyncpg, httpx, and Typer.&lt;br&gt;
Open source under Apache 2.0.&lt;/p&gt;

&lt;p&gt;Feedback welcome, especially if you’re building on Razorpay and want to run this against your staging environment.&lt;/p&gt;

</description>
      <category>stripe</category>
      <category>python</category>
      <category>testing</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
