<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pritom Mazumdar</title>
    <description>The latest articles on DEV Community by Pritom Mazumdar (@pritom14).</description>
    <link>https://dev.to/pritom14</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3830966%2F061117bf-b5c8-4495-9afa-26bc517bb90c.jpg</url>
      <title>DEV Community: Pritom Mazumdar</title>
      <link>https://dev.to/pritom14</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pritom14"/>
    <language>en</language>
    <item>
      <title>Tired of Being Paged at 3am? Let Your AI Handle the Runbook</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Fri, 03 Apr 2026 18:58:49 +0000</pubDate>
      <link>https://dev.to/pritom14/tired-of-being-paged-at-3am-let-your-ai-handle-the-runbook-1l0c</link>
      <guid>https://dev.to/pritom14/tired-of-being-paged-at-3am-let-your-ai-handle-the-runbook-1l0c</guid>
<description>&lt;p&gt;When that alert fires at 3:14am on a Sunday, you know the drill: VPN in, SSH to the server, check the logs, maybe restart the service, and watch the page escalate to someone else. You've probably done this 100 times.&lt;/p&gt;

&lt;p&gt;What if the runbook executed itself? &lt;a href="https://www.loom.com/share/94472577f5ce4dd18975801e7877838c" rel="noopener noreferrer"&gt;See it for yourself&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meet RunbookAI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RunbookAI is an open-source autonomous incident response agent. Connect it to PagerDuty, fire a webhook at it, and it reads your runbook, diagnoses the problem, and acts—without paging a human first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Alert fires&lt;/strong&gt; → RunbookAI reads the runbook&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diagnosis&lt;/strong&gt; → runs tools: check_logs, http_check, run_db_check, query_metrics, check_disk, check_processes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remediation&lt;/strong&gt; → executes: restart_service, clear_disk, scale_service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resolves or escalates&lt;/strong&gt; → full summary either way, often with no human in the loop
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Pritom14/runbookai
&lt;span class="nb"&gt;cd &lt;/span&gt;runbookai
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;".[dev]"&lt;/span&gt;

&lt;span class="c"&gt;# Run with local LLM (no API keys)&lt;/span&gt;
ollama pull qwen2.5:7b
&lt;span class="nv"&gt;DEMO_MODE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true &lt;/span&gt;uvicorn runbookai.main:app &lt;span class="nt"&gt;--port&lt;/span&gt; 7000
python demo/run_demo.py regression
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
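
&lt;p&gt;Under the hood, the demo script just fires an alert payload at the running server. Roughly like this -- the route and payload shape here are illustrative, so check the repo for the exact webhook contract:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -X POST http://localhost:7000/alerts \
  -H "Content-Type: application/json" \
  -d '{"service": "checkout-api", "alert": "HighErrorRate", "severity": "critical"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;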



&lt;p&gt;&lt;strong&gt;The Game-Changer: Regression Detection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's the real magic. Your service crashed 2 hours ago. RunbookAI restarted it. But if it crashes &lt;em&gt;again&lt;/em&gt; within 6 hours, the agent is warned: "Don't just restart again—you did that before. Dig deeper."&lt;/p&gt;

&lt;p&gt;Instead of blindly running the same remediation, it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checks for new logs&lt;/li&gt;
&lt;li&gt;Queries recent metrics changes&lt;/li&gt;
&lt;li&gt;Looks for disk space issues, process hangs, or configuration drift&lt;/li&gt;
&lt;li&gt;Suggests a root cause before acting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns "fix the symptom" into "understand the problem."&lt;/p&gt;
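
&lt;p&gt;The check itself is conceptually simple. A minimal sketch of the idea, assuming a hypothetical incident-history shape (the field names are made up, not RunbookAI's actual schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import timedelta

REGRESSION_WINDOW = timedelta(hours=6)

def regression_context(service, history, now):
    # Hypothetical shape: if the same service was remediated within the
    # window, inject a warning into the agent's prompt.
    for incident in history:
        if incident["service"] == service and now - incident["resolved_at"] &lt; REGRESSION_WINDOW:
            return (f"Regression: {service} was already remediated with "
                    f"{incident['action']} at {incident['resolved_at']}. Dig deeper.")
    return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;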

&lt;p&gt;&lt;strong&gt;Suggest Mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;High-risk actions (service restart, disk cleanup, scale-up) pause with a 5-second countdown for human approval. You stay in control while the agent handles the grunt work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auto-Generated Postmortem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After every resolved incident, hit &lt;code&gt;GET /incidents/{id}/postmortem&lt;/code&gt; and get a ready-to-share markdown document: full timeline, actions taken, regression analysis, duration, and a recommendations checklist. Two hours of postmortem work, done automatically.&lt;/p&gt;
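
&lt;p&gt;For example, against the demo server from the quick start (the incident id here is made up):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -s http://localhost:7000/incidents/42/postmortem -o postmortem-42.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;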

&lt;p&gt;&lt;strong&gt;Slack Lifecycle Notifications&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Set &lt;code&gt;SLACK_WEBHOOK_URL&lt;/code&gt; and RunbookAI posts a rich message at every stage: incident started, approval required (with the curl command to approve), resolved with duration, escalated with reason. Your Slack channel becomes your incident dashboard.&lt;/p&gt;
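
&lt;p&gt;Enabling it is one variable on top of the quick-start command (the webhook URL below is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T000/B000/XXXX \
  DEMO_MODE=true uvicorn runbookai.main:app --port 7000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;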

&lt;p&gt;&lt;strong&gt;AgentTrace Replay UI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every tool call, every decision, every second of the remediation is logged. Open the browser, replay the entire incident timeline. Understand what the agent decided and why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Open Source?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Incident response is deeply custom—every company's runbooks, tools, and risk tolerance differ. We ship the core (diagnosis + remediation) free and self-hosted. No vendor lock-in, no SaaS fee, no pinging external APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No API Keys Needed&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Runs on Ollama locally. qwen2.5:7b is small, fast, and good enough for runbook reasoning. Everything stays on your infrastructure. Or swap in OpenAI, Anthropic, or Groq with a single env var, no code changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Pritom14/runbookai" rel="noopener noreferrer"&gt;https://github.com/Pritom14/runbookai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try it now. Fire a demo alert. See regression detection in action. Fork, extend, and own your incident response.&lt;/p&gt;




</description>
      <category>pagerduty</category>
      <category>ai</category>
      <category>claude</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How I Built Video Token Optimization for Vision LLMs: Cutting Costs 13-45% with Frame Dedup + Scene Detection</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Mon, 30 Mar 2026 19:30:58 +0000</pubDate>
      <link>https://dev.to/pritom14/how-i-built-video-token-optimization-for-vision-llms-cutting-costs-13-45-with-frame-dedup-scene-2ic</link>
      <guid>https://dev.to/pritom14/how-i-built-video-token-optimization-for-vision-llms-cutting-costs-13-45-with-frame-dedup-scene-2ic</guid>
      <description>&lt;p&gt;A few weeks ago I launched &lt;a href="https://github.com/Pritom14/token0" rel="noopener noreferrer"&gt;Token0&lt;/a&gt; -- an open-source proxy that optimizes images before they hit vision LLMs like GPT-4o, Claude, and Ollama models. The reception was good, so I kept building.&lt;/p&gt;

&lt;p&gt;The most requested feature was video. If images are expensive, video is brutal -- every second at 30fps is 30 images. This post covers how I built the video optimization pipeline, what I learned benchmarking it across 5 models, and the model-aware edge case that nearly broke everything.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Problem with Naive Video&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most apps that analyze video do one of two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract frames at 1fps and send every one of them&lt;/li&gt;
&lt;li&gt;Send a handful of manually selected keyframes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both approaches waste tokens in predictable ways. At 1fps on a 60-second product demo video:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You get 60 frames&lt;/li&gt;
&lt;li&gt;Frames 1-29 of the same talking head are near-identical (Hamming distance &amp;lt; 10 between perceptual hashes)&lt;/li&gt;
&lt;li&gt;The only frames with unique information are at scene transitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You're paying for 60 images when 8-12 contain all the information.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Pipeline: 4 Layers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Token0's video optimization runs in four stages, each optional and composable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Frame Extraction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenCV extracts frames at 1fps (configurable). A 60s video at 30fps → 60 frames. Hard cap at 32 frames sent to the LLM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_frames&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_frames&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VideoCapture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;video_fps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CAP_PROP_FPS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mf"&gt;30.0&lt;/span&gt;
    &lt;span class="n"&gt;frame_interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_fps&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;fps&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="c1"&gt;# yield every frame_interval-th frame as PIL image
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Layer 2: QJL Perceptual Hash Deduplication&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the core insight. I reused the same QJL (Quantized Johnson-Lindenstrauss) hash infrastructure I built for the image cache:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Compute 256-bit perceptual hash of each frame (dhash on 16x16 grayscale)&lt;/li&gt;
&lt;li&gt;Compress to 128-bit binary signature using a random JL projection matrix&lt;/li&gt;
&lt;li&gt;Compute Hamming distance between consecutive frames&lt;/li&gt;
&lt;li&gt;If distance is 12 or below, drop the frame (near-duplicate)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;DEDUP_HAMMING_THRESHOLD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;  &lt;span class="c1"&gt;# tighter than cache (consecutive frames are very similar)
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;deduplicate_frames&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hamming_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DEDUP_HAMMING_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;kept&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;prev_sig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_jl_compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;_image_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:]:&lt;/span&gt;
        &lt;span class="n"&gt;sig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_jl_compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;_image_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_hamming_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prev_sig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;hamming_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;kept&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;prev_sig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sig&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;kept&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
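
&lt;p&gt;The &lt;code&gt;_image_hash&lt;/code&gt;, &lt;code&gt;_jl_compress&lt;/code&gt;, and &lt;code&gt;_hamming_distance&lt;/code&gt; helpers live in the existing cache code and aren't shown above. A minimal self-contained sketch of how they could work (Token0's exact dhash and projection details may differ):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

_P = np.random.default_rng(0).standard_normal((256, 128))  # fixed JL projection

def _image_hash(img):
    # 256-bit dhash: compare horizontally adjacent pixels on a 16x16 grid
    g = np.asarray(img.convert("L").resize((17, 16)), dtype=np.float32)
    return (g[:, 1:] &gt; g[:, :-1]).flatten()  # 16*16 = 256 bools

def _jl_compress(bits):
    # project {0,1}^256 into 128 dims with a fixed Gaussian matrix, binarize
    return ((bits.astype(np.float32) * 2 - 1) @ _P) &gt; 0

def _hamming_distance(a, b):
    return int(np.count_nonzero(a != b))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;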



&lt;p&gt;On a document scanning video (invoice + receipt + screenshot on screen), this collapsed 15 consecutive near-duplicate frames down to 3 unique ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Scene Change Detection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pixel-level diff between consecutive frames (160x120 downsampled, mean absolute difference). Frames above the threshold (15.0 mean pixel diff) are kept as scene boundaries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;detect_scene_changes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;15.0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;kept&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;prev_arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;resize&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;160&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;))).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;curr_arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;resize&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;160&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;))).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;curr_arr&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;prev_arr&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;kept&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;kept&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Layer 4: CLIP Scoring (optional)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;sentence-transformers&lt;/code&gt; is installed, Token0 scores each remaining frame against the user's prompt using CLIP (ViT-B/32) and keeps the top-K most relevant. The code is wired in, but CLIP is an optional dependency -- most deployments skip it, since the first three layers are already effective.&lt;/p&gt;
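
&lt;p&gt;A sketch of what that scoring can look like with &lt;code&gt;sentence-transformers&lt;/code&gt; (not necessarily Token0's exact code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sentence_transformers import SentenceTransformer, util

_clip = SentenceTransformer("clip-ViT-B-32")

def top_k_frames(frames, prompt, k=8):
    # frames: list of (timestamp, PIL image). Rank by CLIP similarity
    # to the prompt, keep the k best, then restore chronological order.
    img_emb = _clip.encode([img for _, img in frames])
    txt_emb = _clip.encode([prompt])
    scores = util.cos_sim(txt_emb, img_emb)[0]
    best = scores.argsort(descending=True)[:k].tolist()
    return [frames[i] for i in sorted(best)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;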




&lt;p&gt;&lt;strong&gt;Each Keyframe Goes Through the Full Image Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After frame selection, every keyframe runs through the existing image optimization stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smart resize (downscale to provider max)&lt;/li&gt;
&lt;li&gt;OCR routing (if the frame is text-heavy)&lt;/li&gt;
&lt;li&gt;JPEG recompression&lt;/li&gt;
&lt;li&gt;Prompt-aware detail mode&lt;/li&gt;
&lt;li&gt;Tile-optimized resize&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means you get compounding savings: fewer frames &lt;strong&gt;and&lt;/strong&gt; each frame is smaller.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Benchmark Results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I tested against 5 Ollama vision models using 3 videos (product showcase, document montage, mixed content). Naive baseline = all frames at 1fps sent raw. Token0 = full pipeline.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Naive Tokens&lt;/th&gt;
&lt;th&gt;Token0 Tokens&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gemma3:4b&lt;/td&gt;
&lt;td&gt;14,706&lt;/td&gt;
&lt;td&gt;8,081&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;45.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llava:7b&lt;/td&gt;
&lt;td&gt;15,731&lt;/td&gt;
&lt;td&gt;12,845&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;18.3%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llava-llama3&lt;/td&gt;
&lt;td&gt;15,658&lt;/td&gt;
&lt;td&gt;12,789&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;18.3%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;minicpm-v&lt;/td&gt;
&lt;td&gt;7,428&lt;/td&gt;
&lt;td&gt;6,447&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;13.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;moondream&lt;/td&gt;
&lt;td&gt;12,288&lt;/td&gt;
&lt;td&gt;11,714&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.7%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why the spread?&lt;/strong&gt; Gemma3 uses a high-resolution image encoder, so every dropped frame removes a lot of tokens -- hence the 45%. Moondream uses a tiny encoder (~50 tokens/frame), so frame dedup has less absolute impact even when it removes the same number of frames.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-4o extrapolation&lt;/strong&gt; (using OpenAI's published tile formula):&lt;/p&gt;

&lt;p&gt;60s video, 30fps → 1fps = 60 frames → dedup to ~10 keyframes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Naive: 60 × 425 tokens = &lt;strong&gt;25,500 tokens&lt;/strong&gt; (~$0.064/video)&lt;/li&gt;
&lt;li&gt;Token0: 10 × 425 = &lt;strong&gt;4,250 tokens&lt;/strong&gt; (~$0.011/video)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~83% savings&lt;/strong&gt; per video&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 10K videos/day: $19,125/mo → $3,188/mo.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Edge Case That Nearly Broke Everything&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While benchmarking, I discovered that llama3.2-vision was showing &lt;strong&gt;-124% savings&lt;/strong&gt; (negative -- Token0 was making it worse).&lt;/p&gt;

&lt;p&gt;The root cause was two bugs stacked on top of each other:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 1: Provider detection miss&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;get_provider_from_model()&lt;/code&gt; didn't include &lt;code&gt;llama3.2-vision&lt;/code&gt;, so it fell through to the &lt;code&gt;"openai"&lt;/code&gt; default. OCR routing is only meant to kick in when estimated image tokens &amp;gt; OCR text tokens -- and with the wrong provider, that estimate was computed with the wrong formula, so the routing decision was wrong.&lt;/p&gt;

&lt;p&gt;Fix: explicitly add &lt;code&gt;llama3.2-vision&lt;/code&gt;, &lt;code&gt;llama3.2&lt;/code&gt;, &lt;code&gt;gemma3&lt;/code&gt;, &lt;code&gt;granite3.2&lt;/code&gt;, &lt;code&gt;qwen2.5vl&lt;/code&gt;, &lt;code&gt;qwen3-vl&lt;/code&gt; to the Ollama model list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 2: Ultra-efficient encoders break the OCR savings assumption&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;llama3.2-vision uses ~8-27 tokens per image natively. The standard OCR flow routes text-heavy images to EasyOCR and returns extracted text (~200-700 tokens depending on content). For a model that uses 15 tokens/image, returning 300 tokens of OCR text is &lt;strong&gt;20x more expensive&lt;/strong&gt;, not cheaper.&lt;/p&gt;

&lt;p&gt;The fix was a named allowlist of ultra-efficient models that skip OCR entirely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_ultra_efficient_models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.2-vision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;is_ultra_efficient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;_ultra_efficient_models&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;is_ultra_efficient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Skip OCR -- image tokens are already cheaper than text extraction
&lt;/span&gt;    &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reasons&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skip OCR: ultra-efficient encoder (~&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;estimated_image_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tokens &amp;lt; OCR cost)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After both fixes: llama3.2-vision went from -124% to 0% (correct passthrough). gemma3 stayed at 24.8% (was briefly broken by an intermediate fix attempt). granite3.2-vision: 53.1%.&lt;/p&gt;

&lt;p&gt;The lesson: optimization strategies that help high-token-count models hurt ultra-efficient ones. You need model-aware routing, not just image-aware routing.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;How to Use Video in Token0&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;token0
token0 serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then point your OpenAI client at the proxy and send the video as a base64 data URL:&lt;/p&gt;




&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_demo.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;video_b64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What happens in this video?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:video/mp4;base64,&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;video_b64&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;extra_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Provider-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Token0 extracted keyframes, deduped, optimized, forwarded
# response.token0.tokens_saved, optimizations_applied, etc.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Already using LiteLLM? Video works through the hook too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;litellm&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;token0.litellm_hook&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Token0Hook&lt;/span&gt;

&lt;span class="n"&gt;litellm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callbacks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Token0Hook&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;span class="c1"&gt;# video_url content type automatically handled
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;What's Next&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CLIP scoring (Layer 2)&lt;/strong&gt;: score each frame against the user's prompt and keep the top-K most relevant. Code is wired, needs &lt;code&gt;pip install sentence-transformers clip&lt;/code&gt; to activate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Saliency-based ROI cropping&lt;/strong&gt;: detect what region the prompt is asking about, crop and send only that. "What's the total?" on an invoice → crop to bottom-right only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive quality escalation&lt;/strong&gt;: send low-detail first (85 tokens), retry at high-detail only if the response shows uncertainty. Happy path (60-70% of cases) = massive savings.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Apache 2.0. &lt;code&gt;pip install token0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Pritom14/token0" rel="noopener noreferrer"&gt;github.com/Pritom14/token0&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're processing video through vision LLMs and have benchmarks on your own models, I'd love to compare notes. Especially curious about Gemini 2.5 Pro's native video support vs frame-by-frame through Token0.&lt;/p&gt;




</description>
      <category>python</category>
      <category>opensource</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Token0 v0.2.0: Streaming Support + Updated Benchmarks: 35-42% Savings Across 4 Vision Models</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Fri, 27 Mar 2026 11:46:39 +0000</pubDate>
      <link>https://dev.to/pritom14/token0-v020-streaming-support-updated-benchmarks-35-42-savings-across-4-vision-models-1m1i</link>
      <guid>https://dev.to/pritom14/token0-v020-streaming-support-updated-benchmarks-35-42-savings-across-4-vision-models-1m1i</guid>
      <description>&lt;p&gt;A few days ago I launched &lt;a href="https://github.com/Pritom14/token0" rel="noopener noreferrer"&gt;Token0&lt;/a&gt; -- an open-source API proxy that makes vision LLM calls cheaper by optimizing images before they hit the model. The response was great, so here is the first real update: &lt;strong&gt;v0.2.0 with full streaming support and expanded benchmarks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's New in v0.2.0&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Streaming support (&lt;code&gt;stream=true&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was the most requested feature. Token0 now supports Server-Sent Events streaming across all four providers -- OpenAI, Anthropic, Google, and Ollama.&lt;/p&gt;

&lt;p&gt;How it works: Token0 optimizes your images &lt;em&gt;before&lt;/em&gt; streaming begins, then tokens flow word-by-word exactly like native provider APIs. You get the cost savings without sacrificing the real-time UX.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe this image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/jpeg;base64,...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Provider-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Final chunk includes token0 stats (tokens_saved, optimizations_applied)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few details worth noting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI-compatible SSE format&lt;/strong&gt; -- &lt;code&gt;data: {...}\n\n&lt;/code&gt; chunks with &lt;code&gt;delta&lt;/code&gt; (not &lt;code&gt;message&lt;/code&gt;), ending with &lt;code&gt;data: [DONE]&lt;/code&gt; (sketched just after this list)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimization stats on the final chunk&lt;/strong&gt; -- the last streaming chunk includes a &lt;code&gt;token0&lt;/code&gt; field with tokens saved and which optimizations were applied&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cached responses stream too&lt;/strong&gt; -- if Token0 has a cache hit, it simulates streaming by sending the cached response in small chunks, so your client code does not need to handle two different response formats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero overhead on text-only&lt;/strong&gt; -- if there are no images in the request, streaming passes through with no added latency&lt;/li&gt;
&lt;/ul&gt;
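
&lt;p&gt;Concretely, the tail of a stream looks roughly like this (the values and exact &lt;code&gt;token0&lt;/code&gt; payload below are illustrative; the field names come from the stats described above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data: {"choices":[{"index":0,"delta":{"content":" blue"}}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"token0":{"tokens_saved":680,"optimizations_applied":["smart_resize","tile_optimize"]}}

data: [DONE]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;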

&lt;p&gt;&lt;strong&gt;2. Expanded benchmarks (full suite)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In v0.1.0, I only benchmarked on the real-world image suite (5 images). For v0.2.0, I ran the full benchmark suite across 6 categories: single images, text passthrough, multi-image requests, multi-turn conversations, different task types (classification, extraction, description, Q&amp;amp;A), and real-world images.&lt;/p&gt;

&lt;p&gt;Results across all four Ollama vision models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Direct Tokens&lt;/th&gt;
&lt;th&gt;Token0 Tokens&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;minicpm-v&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;10,877&lt;/td&gt;
&lt;td&gt;6,276&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42.3%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;moondream&lt;/td&gt;
&lt;td&gt;1.7B&lt;/td&gt;
&lt;td&gt;16,457&lt;/td&gt;
&lt;td&gt;10,240&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;37.8%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llava-llama3&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;13,365&lt;/td&gt;
&lt;td&gt;8,486&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;36.5%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llava:7b&lt;/td&gt;
&lt;td&gt;7B&lt;/td&gt;
&lt;td&gt;13,384&lt;/td&gt;
&lt;td&gt;8,701&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;35.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The numbers are higher than v0.1.0 because the full suite includes more text-heavy test cases where OCR routing delivers 93-97% savings per image.&lt;/p&gt;

&lt;p&gt;Key findings from the expanded benchmarks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OCR routing is the biggest win&lt;/strong&gt;: 93-97% savings on text-heavy images (documents, screenshots, receipts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero overhead on text-only&lt;/strong&gt;: confirmed 0 extra tokens across all 4 models on text-only requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-turn conversations&lt;/strong&gt;: images in conversation history get optimized too -- no wasted tokens on re-sent images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency improves in most cases&lt;/strong&gt;: OCR routing is actually faster than sending the full image to the model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Ollama provider routing fix&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;v0.1.0 had a bug where Ollama models (moondream, llava, etc.) could be incorrectly routed to the OpenAI provider. Fixed -- Token0 now correctly detects and routes all Ollama vision models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-4o Cost Projections (unchanged)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These projections from v0.1.0 still hold -- they are based on OpenAI's published token formulas, not local model benchmarks:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scale&lt;/th&gt;
&lt;th&gt;Without Token0&lt;/th&gt;
&lt;th&gt;With Token0&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1K images/day&lt;/td&gt;
&lt;td&gt;$67.58/mo&lt;/td&gt;
&lt;td&gt;$0.74/mo&lt;/td&gt;
&lt;td&gt;98.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100K images/day&lt;/td&gt;
&lt;td&gt;$6,757.50/mo&lt;/td&gt;
&lt;td&gt;$74.47/mo&lt;/td&gt;
&lt;td&gt;98.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Upgrade&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; token0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. No config changes needed. Streaming works automatically when you pass &lt;code&gt;stream=True&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's Next&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Video optimization&lt;/strong&gt; -- keyframe extraction + per-frame optimization for video LLM calls&lt;/li&gt;
&lt;li&gt;More provider-specific optimizations as new models launch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;code&gt;pip install token0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Pritom14/token0" rel="noopener noreferrer"&gt;github.com/Pritom14/token0&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; Apache 2.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Already using LiteLLM? Token0 plugs in as a callback hook -- &lt;code&gt;litellm.callbacks = [Token0Hook()]&lt;/code&gt; -- no proxy needed. If you tried v0.1.0, upgrade and let me know how streaming works on your workload. If you haven't tried it yet -- &lt;code&gt;pip install token0 &amp;amp;&amp;amp; token0 serve&lt;/code&gt; and change your base URL. That is all it takes.&lt;/p&gt;




</description>
      <category>webdev</category>
      <category>vision</category>
      <category>python</category>
    </item>
    <item>
      <title>I Cut Vision LLM Costs by 98.9% -&gt; Here's How Token0 Works Under the Hood</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Fri, 27 Mar 2026 04:34:50 +0000</pubDate>
      <link>https://dev.to/pritom14/i-cut-vision-llm-costs-by-989-heres-how-token0-works-under-the-hood-4ldc</link>
      <guid>https://dev.to/pritom14/i-cut-vision-llm-costs-by-989-heres-how-token0-works-under-the-hood-4ldc</guid>
      <description>&lt;p&gt;Every time you send an image to GPT-4o, Claude, or Gemini, you are paying for vision tokens. And most of them are wasted.&lt;/p&gt;

&lt;p&gt;I built Token0: an open-source API proxy that sits between your app and the LLM provider, optimizes every image request automatically, and typically saves 70-99% on vision costs. It is now live on PyPI.&lt;/p&gt;

&lt;p&gt;In this post, I will walk through the problem, the seven optimization strategies, the benchmarks, and how to get started in under a minute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem: Vision Tokens Are Expensive and Poorly Optimized&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Text token optimization is a solved problem. Prompt caching, compression, smart routing -- the tooling is mature.&lt;/p&gt;

&lt;p&gt;But images -- the modality that costs 2-5x more per token -- have almost no optimization tooling.&lt;/p&gt;

&lt;p&gt;Here is what happens today:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wasted pixels.&lt;/strong&gt; You send a 4000x3000 photo to Claude. Claude silently downscales it to 1568px max. You paid for the original resolution. Those tokens are gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong modality.&lt;/strong&gt; A screenshot of a document costs ~765 tokens on GPT-4o as an image. The same information extracted as text costs ~30 tokens. That is a 25x markup for identical information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong detail level.&lt;/strong&gt; "Classify this image" on GPT-4o uses high-detail mode at 1,105 tokens. Low-detail mode gives the same answer for 85 tokens. A 13x difference that nobody is optimizing for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wasted tiles.&lt;/strong&gt; GPT-4o tiles images into 512x512 blocks. A 1280x720 image creates 4 tiles (765 tokens). Resizing to 1024x768 gives 2 tiles (425 tokens). 44% savings, zero quality loss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Token0 Works&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your App --&amp;gt; Token0 Proxy --&amp;gt; [Analyze -&amp;gt; Classify -&amp;gt; Route -&amp;gt; Transform -&amp;gt; Cache] --&amp;gt; LLM Provider
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You change one line -- your base URL -- and Token0 handles everything automatically.&lt;/p&gt;

&lt;p&gt;Token0 applies seven optimizations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Smart Resize&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each provider has a maximum resolution it actually processes. Claude caps at 1568px, GPT-4o at 2048px. Token0 downscales to these limits before sending. There is no quality loss, because the provider would have done the same thing -- you just stop paying for the discarded pixels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. OCR Routing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When an image is mostly text (screenshots, receipts, invoices, documents), Token0 extracts the text via OCR and sends that instead. Text tokens cost 10-50x less than vision tokens.&lt;/p&gt;

&lt;p&gt;The detection uses a multi-signal heuristic: background uniformity, color variance, horizontal line structure, and edge density. It was validated at 91% accuracy on real-world images, and photos are never falsely routed to OCR.&lt;/p&gt;
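
&lt;p&gt;A minimal sketch of what a multi-signal check can look like -- the signals match the description above, but the thresholds are illustrative, not Token0's tuned classifier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def looks_text_heavy(img):
    # img is a PIL image. Illustrative thresholds only; the real classifier
    # is tuned against labeled real-world images.
    g = np.asarray(img.convert("L").resize((256, 256)), dtype=np.float32)
    bg_uniform = (np.abs(g - np.median(g)) &lt; 12).mean()  # big flat background
    color_var = np.asarray(img.resize((64, 64))).std()   # documents: low variance
    edge_density = np.abs(np.diff(g, axis=1)).mean()     # crisp glyph edges
    return bg_uniform &gt; 0.5 and color_var &lt; 60 and edge_density &gt; 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;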

&lt;p&gt;&lt;strong&gt;3. JPEG Recompression&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PNG screenshots get converted to optimized JPEG when transparency is not needed. Smaller payload, faster upload, same visual information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Prompt-Aware Detail Mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the interesting one. Token0 analyzes your &lt;em&gt;prompt&lt;/em&gt;, not just the image, to decide the detail level.&lt;/p&gt;

&lt;p&gt;"What is in this image?" --&amp;gt; low detail (85 tokens)&lt;br&gt;
"Extract all the text from this receipt" --&amp;gt; high detail (1,105 tokens)&lt;/p&gt;

&lt;p&gt;A keyword classifier on the prompt text makes this decision. Simple queries get low-detail mode automatically.&lt;/p&gt;
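
&lt;p&gt;A sketch of the idea -- the cue list below is hypothetical, not Token0's actual keyword set:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;HIGH_DETAIL_CUES = ("extract", "read", "transcribe", "fine print", "serial", "total")

def detail_mode(prompt):
    # Simple queries default to low detail; extraction-style prompts go high
    p = prompt.lower()
    return "high" if any(cue in p for cue in HIGH_DETAIL_CUES) else "low"

detail_mode("What is in this image?")                  # -&gt; "low"  (85 tokens)
detail_mode("Extract all the text from this receipt") # -&gt; "high" (1,105 tokens)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;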

&lt;p&gt;&lt;strong&gt;5. Tile-Optimized Resize&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenAI charges by 512px tiles. Token0 resizes images to land exactly on tile boundaries, minimizing the number of tiles without changing the aspect ratio meaningfully.&lt;/p&gt;
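
&lt;p&gt;The arithmetic behind this uses OpenAI's published high-detail formula (85 base tokens plus 170 per 512px tile); the resize helper below is a simplified sketch, not Token0's implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

def tile_tokens(w, h):
    # OpenAI high-detail pricing: 85 base + 170 per 512x512 tile
    return 85 + 170 * math.ceil(w / 512) * math.ceil(h / 512)

def snap_to_tiles(w, h):
    # Round each side down to the nearest 512px boundary so no tile
    # is only partially covered (sketch; aspect-ratio handling omitted)
    return max(512, w // 512 * 512), max(512, h // 512 * 512)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;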

&lt;p&gt;&lt;strong&gt;6. Model Cascade&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not every image needs the flagship model. Token0 analyzes task complexity and routes simple tasks to cheaper models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-4o --&amp;gt; GPT-4o-mini (16.7x cheaper)&lt;/li&gt;
&lt;li&gt;Claude Opus --&amp;gt; Claude Haiku (6.25x cheaper)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Complex tasks stay on the original model.&lt;/p&gt;
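
&lt;p&gt;A sketch of the routing decision -- the cue list is hypothetical, and Token0's real complexity analysis is richer:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;CASCADE = {"gpt-4o": "gpt-4o-mini", "claude-3-opus": "claude-3-haiku"}
SIMPLE_CUES = ("classify", "yes or no", "what is in", "is there")

def route_model(model, prompt):
    # Downgrade to the cheaper sibling only when the task looks simple
    if any(cue in prompt.lower() for cue in SIMPLE_CUES):
        return CASCADE.get(model, model)
    return model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;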

&lt;p&gt;&lt;strong&gt;7. Semantic Response Cache&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Token0 generates a perceptual hash of each image combined with the prompt text. If a similar request has been seen before, the cached response is returned. Zero tokens consumed.&lt;/p&gt;

&lt;p&gt;This is particularly effective on repetitive workloads: product image classification, document processing pipelines, batch operations.&lt;/p&gt;
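
&lt;p&gt;One possible shape for the cache key -- exact matching on (perceptual hash, normalized prompt); a near-duplicate lookup would compare Hamming distances instead of hashing exactly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib

def cache_key(image_phash, prompt):
    # Same-looking image + same question -&gt; same cache entry, zero tokens
    norm = " ".join(prompt.lower().split())
    return hashlib.sha256(f"{image_phash}|{norm}".encode()).hexdigest()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;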

&lt;p&gt;&lt;strong&gt;Benchmarks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I tested Token0 on four Ollama vision models with real-world images -- actual photos, a real store receipt, a typed invoice, and a desktop screenshot.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Token Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;moondream&lt;/td&gt;
&lt;td&gt;1.7B&lt;/td&gt;
&lt;td&gt;36.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llava-llama3&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;31.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;minicpm-v&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;25.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llava:7b&lt;/td&gt;
&lt;td&gt;7B&lt;/td&gt;
&lt;td&gt;24.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On GPT-4o with all seven optimizations enabled:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scale&lt;/th&gt;
&lt;th&gt;Direct Cost&lt;/th&gt;
&lt;th&gt;Token0 Cost&lt;/th&gt;
&lt;th&gt;Monthly Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1K images/day&lt;/td&gt;
&lt;td&gt;$67.58&lt;/td&gt;
&lt;td&gt;$0.74&lt;/td&gt;
&lt;td&gt;$66.83&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10K images/day&lt;/td&gt;
&lt;td&gt;$675.75&lt;/td&gt;
&lt;td&gt;$7.45&lt;/td&gt;
&lt;td&gt;$668.30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100K images/day&lt;/td&gt;
&lt;td&gt;$6,757.50&lt;/td&gt;
&lt;td&gt;$74.47&lt;/td&gt;
&lt;td&gt;$6,683.03&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is a 98.9% cost reduction.&lt;/p&gt;

&lt;p&gt;Key finding: OCR routing alone delivers 47-70% token savings on text-heavy images. If you do nothing else, just routing screenshots and documents through OCR instead of vision is worth it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick Start&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Install from PyPI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;token0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add your API key to a &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;token0 serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. No Docker, no Postgres, no Redis. Token0 starts in lite mode by default with SQLite and in-memory cache.&lt;/p&gt;

&lt;p&gt;Now change your base URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s in this image?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/jpeg;base64,...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;extra_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Provider-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check your savings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8000/v1/usage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_requests"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;47&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens_saved"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12840&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_cost_saved_usd"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0321&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"avg_compression_ratio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;3.2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Works With Everything&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Token0 supports four providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI&lt;/strong&gt; -- GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic&lt;/strong&gt; -- Claude Sonnet, Claude Opus, Claude Haiku&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google&lt;/strong&gt; -- Gemini 2.5 Flash, Gemini 2.5 Pro&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; -- moondream, llava, llava-llama3, minicpm-v, any vision model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For production, switch to full mode with PostgreSQL, Redis, and S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;token0[full]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Try It&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;code&gt;pip install token0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Pritom14/token0" rel="noopener noreferrer"&gt;github.com/Pritom14/token0&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; Apache 2.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fully open source. If you are sending images to LLMs and paying for vision tokens, give it a try and let me know what savings you see.&lt;/p&gt;




</description>
      <category>vision</category>
      <category>python</category>
      <category>webdev</category>
      <category>deepseek</category>
    </item>
    <item>
      <title>Carbon Layer v0.6 : Webhook resilience testing for payment handlers (idempotency, out-of-order, signature verification)</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Tue, 24 Mar 2026 08:46:54 +0000</pubDate>
      <link>https://dev.to/pritom14/carbon-layer-v06-webhook-resilience-testing-for-payment-handlers-idempotency-out-of-order-307d</link>
      <guid>https://dev.to/pritom14/carbon-layer-v06-webhook-resilience-testing-for-payment-handlers-idempotency-out-of-order-307d</guid>
      <description>&lt;p&gt;&lt;strong&gt;New release of Carbon Layer : the open source chaos engineering tool for payment flows.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;v0.5 added multi-provider support (Razorpay, Stripe, Cashfree, Juspay). v0.6 focuses on a different problem: &lt;strong&gt;how resilient is your webhook handler?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most webhook handlers are tested against the happy path: one event, correct signature, delivered in order. Production is different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Payment gateways retry failed deliveries, so your handler gets the same webhook 2-5 times&lt;/li&gt;
&lt;li&gt;Webhook delivery order is not guaranteed — &lt;code&gt;payment.captured&lt;/code&gt; can arrive before &lt;code&gt;payment.authorized&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;If your handler doesn't verify signatures, anyone can forge webhook events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the bugs that don't show up in staging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's new in v0.6&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Idempotency testing&lt;/strong&gt;: fire each webhook N times and see whether your handler processes it once or N times:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-repeat&lt;/span&gt; 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
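
&lt;p&gt;A handler passes this check when it deduplicates on the event ID before doing any work. A minimal sketch (FastAPI, in-memory set, illustrative field names; use a durable store in production):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Idempotent handler: acknowledge every delivery, process only the first.
from fastapi import FastAPI, Request

app = FastAPI()
seen: set[str] = set()  # swap for a DB/Redis key with a TTL in production

@app.post("/webhooks")
async def handle(request: Request):
    event = await request.json()
    event_id = event["id"]  # gateways send a stable ID per event
    if event_id in seen:
        return {"status": "duplicate_ignored"}
    seen.add(event_id)
    # ... apply the state change exactly once ...
    return {"status": "processed"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;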



&lt;p&gt;&lt;strong&gt;Out-of-order delivery&lt;/strong&gt;: randomize or reverse webhook delivery order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-order&lt;/span&gt; random
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
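
&lt;p&gt;Surviving this test usually means treating webhooks as facts about state rather than ordered commands. One common guard, sketched with illustrative status names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Rank statuses and only ever move forward, so a late payment.authorized
# arriving after payment.captured becomes a no-op instead of a regression.
STATUS_RANK = {"created": 0, "authorized": 1, "captured": 2, "refunded": 3}

def apply_status(current: str, incoming: str) -&amp;gt; str:
    if STATUS_RANK[incoming] &amp;gt; STATUS_RANK[current]:
        return incoming
    return current  # out-of-order or duplicate event: ignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;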



&lt;p&gt;&lt;strong&gt;Signature verification&lt;/strong&gt;: send webhooks with missing, corrupted, or wrong-secret signatures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-signature&lt;/span&gt; missing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Webhook replay&lt;/strong&gt;: re-fire webhooks from any previous run, which is useful for regression testing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon replay &amp;lt;run_id&amp;gt; &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CI/CD exit codes&lt;/strong&gt;: exit with code 1 if any webhook returned a 5xx or timed out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ci&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4 new scenarios&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;upi-timeout&lt;/code&gt;: UPI payments stuck without terminal status&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vpa-not-found&lt;/code&gt;: invalid UPI VPA failures&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mandate-rejection&lt;/code&gt;: UPI autopay mandate rejections&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;settlement-delay&lt;/code&gt;: refunds on captured-but-unsettled payments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;That brings us to 11 scenarios total.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick start&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;carbon-layer
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;No database setup, no gateway credentials. 11 scenarios, 5 providers, webhook resilience testing. Apache 2.0.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Pritom14/carbon-layer" rel="noopener noreferrer"&gt;github.com/Pritom14/carbon-layer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're building a hosted version with dashboards, scheduled runs, and compliance reports. Join the waitlist: &lt;a href="https://pritom14.github.io/carbon-layer/waitlist" rel="noopener noreferrer"&gt;pritom14.github.io/carbon-layer/waitlist&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>webdev</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>Carbon Layer v0.5.1 — Chaos testing for payment webhooks, now with Juspay (4 providers supported)</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Mon, 23 Mar 2026 11:52:09 +0000</pubDate>
      <link>https://dev.to/pritom14/carbon-layer-v051-chaos-testing-for-payment-webhooks-now-with-juspay-4-providers-supported-2lli</link>
      <guid>https://dev.to/pritom14/carbon-layer-v051-chaos-testing-for-payment-webhooks-now-with-juspay-4-providers-supported-2lli</guid>
      <description>&lt;p&gt;&lt;em&gt;Quick update on Carbon Layer — the open-source chaos engineering tool for payment flows.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's new in v0.5&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Juspay support&lt;/strong&gt;: Carbon Layer now generates Juspay-specific webhook payloads (&lt;code&gt;ORDER_SUCCEEDED&lt;/code&gt;, &lt;code&gt;ORDER_FAILED&lt;/code&gt;, &lt;code&gt;ORDER_REFUNDED&lt;/code&gt;) with Basic Auth signing. That brings us to 4 providers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Header&lt;/th&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Encoding / Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Razorpay&lt;/td&gt;
&lt;td&gt;X-Razorpay-Signature&lt;/td&gt;
&lt;td&gt;HMAC-SHA256&lt;/td&gt;
&lt;td&gt;Hex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stripe&lt;/td&gt;
&lt;td&gt;Stripe-Signature (t=..., v1=...)&lt;/td&gt;
&lt;td&gt;HMAC-SHA256&lt;/td&gt;
&lt;td&gt;Includes timestamp for replay-attack protection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cashfree&lt;/td&gt;
&lt;td&gt;x-webhook-signature&lt;/td&gt;
&lt;td&gt;HMAC-SHA256&lt;/td&gt;
&lt;td&gt;Base64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Juspay&lt;/td&gt;
&lt;td&gt;Authorization: Basic ...&lt;/td&gt;
&lt;td&gt;Basic Auth&lt;/td&gt;
&lt;td&gt;Base64-encoded username:password&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each implementation is verified against the provider's official documentation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Juspay webhooks (new)&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; juspay &lt;span class="nt"&gt;--juspay-key&lt;/span&gt; your_key &lt;span class="nt"&gt;--juspay-merchant-id&lt;/span&gt; your_mid &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks

&lt;span class="c"&gt;# Stripe webhooks&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; stripe &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks

&lt;span class="c"&gt;# Cashfree webhooks&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; cashfree &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks

&lt;span class="c"&gt;# Razorpay webhooks&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; razorpay &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why Juspay is different&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Juspay doesn't use HMAC signing for webhooks; it uses Basic Auth. Its webhook payload format is also unusual: everything is wrapped under &lt;code&gt;content.order&lt;/code&gt; rather than split into separate entity types. Disputes are dashboard-only (no API), and payments auto-capture (no manual capture step). The adapter handles all of these quirks.&lt;/p&gt;
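
&lt;p&gt;On the handler side, the two styles verify differently. A hedged sketch (not from Carbon Layer's codebase):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Razorpay-style HMAC-SHA256 (hex) vs Juspay-style Basic Auth.
import base64
import hashlib
import hmac

def verify_hmac_hex(body: bytes, signature: str, secret: str) -&amp;gt; bool:
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

def verify_basic_auth(auth_header: str, username: str, password: str) -&amp;gt; bool:
    expected = base64.b64encode(f"{username}:{password}".encode()).decode()
    return hmac.compare_digest(f"Basic {expected}", auth_header)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;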

&lt;p&gt;&lt;strong&gt;Quick start&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;carbon-layer
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No database setup, no gateway credentials. SQLite by default, PostgreSQL optional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Carbon Layer does&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're new here: Carbon Layer simulates payment failure modes (dispute spikes, refund storms, gateway errors) and fires signed webhook events at your endpoint. It reports exactly what your handler returned for each event type.&lt;/p&gt;

&lt;p&gt;7 scenarios built in. 4 providers. HTML reports. CI/CD callbacks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's next&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're building a hosted version with dashboards, scheduled runs, and compliance reports for teams.&lt;/p&gt;

&lt;p&gt;Join the waitlist: &lt;a href="https://pritom14.github.io/carbon-layer/waitlist" rel="noopener noreferrer"&gt;pritom14.github.io/carbon-layer/waitlist&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Pritom14/carbon-layer" rel="noopener noreferrer"&gt;github.com/Pritom14/carbon-layer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback welcome, especially from teams using Juspay in production.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>webdev</category>
      <category>python</category>
      <category>stripe</category>
    </item>
    <item>
      <title>Carbon Layer v0.4 — Chaos testing for payment webhooks, now with Stripe and Cashfree support</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Sat, 21 Mar 2026 09:37:15 +0000</pubDate>
      <link>https://dev.to/pritom14/carbon-layer-v04-chaos-testing-for-payment-webhooks-now-with-stripe-and-cashfree-support-3lnk</link>
      <guid>https://dev.to/pritom14/carbon-layer-v04-chaos-testing-for-payment-webhooks-now-with-stripe-and-cashfree-support-3lnk</guid>
      <description>&lt;p&gt;A few days ago I shared Carbon Layer -&amp;gt; an open-source chaos engineering tool for payment flows. Here's what's new:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-provider webhook support&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Carbon Layer now generates provider-specific webhook payloads for &lt;strong&gt;Razorpay, Stripe, and Cashfree&lt;/strong&gt; with the correct signing format for each:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Signature Header&lt;/th&gt;
&lt;th&gt;Signing Method&lt;/th&gt;
&lt;th&gt;Encoding / Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Razorpay&lt;/td&gt;
&lt;td&gt;X-Razorpay-Signature&lt;/td&gt;
&lt;td&gt;HMAC-SHA256&lt;/td&gt;
&lt;td&gt;Hex encoded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stripe&lt;/td&gt;
&lt;td&gt;Stripe-Signature (t=timestamp, v1=signature)&lt;/td&gt;
&lt;td&gt;HMAC-SHA256&lt;/td&gt;
&lt;td&gt;Timestamp verification to prevent replay attacks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cashfree&lt;/td&gt;
&lt;td&gt;x-webhook-signature&lt;/td&gt;
&lt;td&gt;HMAC-SHA256&lt;/td&gt;
&lt;td&gt;Base64 encoded&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Your webhook handler gets tested with the exact same format and signing it sees in production. We verified each implementation against the provider's official documentation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Stripe webhooks&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; stripe &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks

&lt;span class="c"&gt;# Cashfree webhooks&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; cashfree &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks

&lt;span class="c"&gt;# Razorpay webhooks (original)&lt;/span&gt;
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; razorpay &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
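
&lt;p&gt;Stripe's scheme deserves a note: the timestamp is part of the signed payload, which is what blocks replays. A sketch of the documented check (use Stripe's official SDK in production):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Verify a Stripe-style "t=..., v1=..." header with a replay window.
import hashlib
import hmac
import time

def verify_stripe_style(body: bytes, header: str, secret: str,
                        tolerance_s: int = 300) -&amp;gt; bool:
    parts = dict(p.strip().split("=", 1) for p in header.split(","))
    timestamp, received = parts["t"], parts["v1"]
    if abs(time.time() - int(timestamp)) &amp;gt; tolerance_s:
        return False  # stale timestamp: possible replay
    signed_payload = timestamp.encode() + b"." + body
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;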



&lt;p&gt;&lt;strong&gt;Zero-config install&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The other big change — PostgreSQL is no longer required. Carbon Layer now uses SQLite by default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;carbon-layer
carbon run dispute-spike &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No database setup. No environment variables. Two commands and you're testing.&lt;/p&gt;

&lt;p&gt;PostgreSQL is still supported for teams: &lt;code&gt;pip install carbon-layer[postgres]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Carbon Layer does&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you missed the first post — Carbon Layer simulates payment failure modes (dispute spikes, refund storms, gateway errors) and fires signed webhook events at your endpoint. It reports exactly what your handler returned for each event type.&lt;/p&gt;

&lt;p&gt;7 scenarios built in. Provider specific signed payloads. HTML reports. CI/CD callbacks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's next&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're building a hosted version with dashboards, scheduled runs, and compliance reports for teams.&lt;/p&gt;

&lt;p&gt;If that interests you, join the waitlist: &lt;a href="https://pritom14.github.io/carbon-layer/waitlist" rel="noopener noreferrer"&gt;pritom14.github.io/carbon-layer/waitlist&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Pritom14/carbon-layer" rel="noopener noreferrer"&gt;github.com/Pritom14/carbon-layer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback welcome, especially if you've tried it against your Stripe, Razorpay, or Cashfree webhook handlers.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>python</category>
      <category>stripe</category>
      <category>webdev</category>
    </item>
    <item>
      <title>We fired 150 dispute webhooks at a payment service. 12 handlers crashed. Here's what we built.</title>
      <dc:creator>Pritom Mazumdar</dc:creator>
      <pubDate>Wed, 18 Mar 2026 08:42:53 +0000</pubDate>
      <link>https://dev.to/pritom14/we-fired-150-dispute-webhooks-at-a-payment-service-12-handlers-crashed-heres-what-we-built-1khp</link>
      <guid>https://dev.to/pritom14/we-fired-150-dispute-webhooks-at-a-payment-service-12-handlers-crashed-heres-what-we-built-1khp</guid>
      <description>&lt;p&gt;Every company processing payments tests the happy path.&lt;/p&gt;

&lt;p&gt;Payment succeeds, order gets fulfilled, customer gets a confirmation email. That’s the flow that gets reviewed, tested in staging, and monitored in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What doesn’t get tested is everything else.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Dispute spikes. Refund storms. Gateway errors that leave orders stuck. Webhook sequences your handlers were never built to handle at volume.&lt;/p&gt;

&lt;p&gt;These are the failure modes that show up in production, usually at the worst possible time.&lt;/p&gt;

&lt;p&gt;The problem is not that engineering teams don’t care.&lt;br&gt;
It’s that the tools to test this don’t exist.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Why you can’t test this in Razorpay’s sandbox&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Razorpay’s test API cannot create disputes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Disputes are raised by banks and card networks, not merchants. There is no POST /disputes endpoint.&lt;/p&gt;

&lt;p&gt;Even if you could trigger disputes manually, you can’t fire 150 of them in 10 seconds on a test account. Razorpay would rate limit you. And you can’t control the timing or sequence of webhook events in any sandbox.&lt;/p&gt;

&lt;p&gt;So the failure mode you most need to test is the one the provider doesn’t let you simulate.&lt;/p&gt;

&lt;p&gt;Teams ship, cross their fingers, and find out what breaks when customers find it first.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;What we built&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Carbon Layer is an open-source chaos engineering tool for payment flows.&lt;/p&gt;

&lt;p&gt;You run a scenario (dispute spike, refund storm, payment decline spike) and it fires Razorpay-format webhook events directly at your endpoint.&lt;/p&gt;

&lt;p&gt;Same JSON shape.&lt;br&gt;
Same headers.&lt;br&gt;
Same HMAC-SHA256 signature as real Razorpay webhooks.&lt;/p&gt;

&lt;p&gt;Your server can’t tell the difference.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;carbon-layer

carbon run dispute-spike &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--provider&lt;/span&gt; mock &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks/razorpay
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No Razorpay account needed.&lt;br&gt;
No sandbox credentials.&lt;br&gt;
No rate limits.&lt;/p&gt;
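
&lt;p&gt;Producing an indistinguishable event mostly comes down to signing the exact bytes you send. A sketch of the idea (the function here is illustrative, not Carbon Layer’s internal API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sign a Razorpay-format payload and fire it at a webhook handler.
import hashlib
import hmac
import json

import httpx

def fire_signed_webhook(url: str, event: dict, secret: str) -&amp;gt; int:
    body = json.dumps(event).encode()
    signature = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    resp = httpx.post(url, content=body, headers={
        "Content-Type": "application/json",
        "X-Razorpay-Signature": signature,  # same header as real deliveries
    })
    return resp.status_code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;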



&lt;p&gt;&lt;strong&gt;The report&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The report shows exactly what happened:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Webhook Delivery Summary
Target: http://localhost:8000/webhooks/razorpay

Event Type                 Sent    2xx    4xx    5xx    Timeout
payment.captured            100     98      0      1          1
payment.dispute.created     150    135      0     12          3
refund.processed             50     49      0      1          0

Total                       300    282      0     14          4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s 18 events your handler didn’t process correctly: 14 drew a 5xx and 4 more timed out.&lt;/p&gt;

&lt;p&gt;In production, each unhandled dispute is a chargeback the merchant loses by default.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Scenarios available&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;dispute-spike&lt;/code&gt;: 150 disputes on captured payments&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;payment-decline-spike&lt;/code&gt;: simulates a 30% payment failure rate&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;refund-storm&lt;/code&gt;: mass refunds across captured payments&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;flash-sale&lt;/code&gt;: high-volume order and payment flow&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gateway-error-burst&lt;/code&gt;: intermittent gateway failures&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;min-amount&lt;/code&gt;: minimum-paise transactions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;max-amount&lt;/code&gt;: large-value transactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All scenarios work with the mock adapter.&lt;br&gt;
No external account required.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;If you use Razorpay&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can also run scenarios against your actual Razorpay test account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--provider&lt;/span&gt; razorpay &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--api-key&lt;/span&gt; rzp_test_xxx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--api-secret&lt;/span&gt; yyy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; https://your-staging-app.com/webhooks/razorpay

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: Razorpay’s API doesn’t support server-side payment or dispute creation.&lt;br&gt;
Scenarios that need these fall back to the mock adapter automatically.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Try it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install carbon-layer&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Pritom14/carbon-layer" rel="noopener noreferrer"&gt;github.com/Pritom14/carbon-layer&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Update: new features shipped in v0.2.0&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three new features based on feedback:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parameter overrides&lt;/strong&gt; -- override scenario parameters at runtime without editing YAML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;baseline_orders&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;500 &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;dispute_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;HTML reports&lt;/strong&gt; -- export a shareable report for your team:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon report &lt;span class="nt"&gt;--run-id&lt;/span&gt; &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt; html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CI/CD integration&lt;/strong&gt; -- POST run results to your pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;carbon run dispute-spike &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--webhook-url&lt;/span&gt; http://localhost:8000/webhooks/razorpay &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--callback-url&lt;/span&gt; http://ci/carbon/results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;Built with Python, asyncpg, httpx, and Typer.&lt;br&gt;
Open source under Apache 2.0.&lt;/p&gt;

&lt;p&gt;Feedback welcome, especially if you’re building on Razorpay and want to run this against your staging environment.&lt;/p&gt;

</description>
      <category>stripe</category>
      <category>python</category>
      <category>testing</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
