DEV Community: Akarsh Cholapurath

How to test LLM integrations in CI without burning tokens

Akarsh Cholapurath — Mon, 23 Feb 2026 17:46:54 +0000

The problem nobody talks about

Every tutorial on building with AI shows you the happy path: call the API, get a response, render it. Ship it.

But here's what actually happens when you're building AI features for real:

You write integration tests that call OpenAI, so every test run costs real money. If your CI pipeline runs 20 times a day, you pay for 20 full test runs daily before you've shipped a single feature.

Then there's the debugging loop. Something breaks in your response parsing logic. You tweak the code, hit the API, wait 3 seconds, read the response, realize the bug is elsewhere, tweak again. Every single iteration costs tokens and adds latency. You're paying to debug your own code, not even the AI part.

And the worst part? AI responses are non-deterministic. Your test passes today, fails tomorrow, because the model decided to phrase something differently. Your CI goes red for no reason. You re-run it and it's green. Nobody trusts the test suite anymore.

We hit this exact wall while building AI apps. We were integrating multiple AI providers, and every debug cycle meant burning tokens across OpenAI, Anthropic and others, sometimes simultaneously. The cost wasn't just financial. It was the constant uncertainty of not knowing whether a failure was our code, the model or the network.

So we built something to fix it.

What we built

We added a feature called Test Mode.

The idea is dead simple: when a workflow is in Test Mode, it returns your pre-defined sample data, shaped exactly the way your app expects, without ever hitting OpenAI, Anthropic or any other provider. Zero tokens consumed, zero AI provider charges, and a response that is instant unless you configure a delay.

Here's what that looks like in practice.

You define your expected output once

In your structured output, you provide sample data alongside the JSON schema. This is the data that Test Mode will return:

{
  "currency": "INR",
  "customer": "Akarsh",
  "invoice_id": "INV-1024",
  "items": [
    { "name": "ModelRiver Pro", "price": 1499, "qty": 1 }
  ],
  "status": "paid",
  "subtotal": 1499
}

In the ModelRiver dashboard, the schema sits on the left and the sample data on the right.
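Since every test will see this sample, it can be worth checking once that the sample itself is consistent. The following is a minimal sketch, assuming the invoice shape shown above; `checkInvoice` is a hypothetical helper, not part of ModelRiver.

```javascript
// Hypothetical helper: returns a list of problems found in a sample invoice.
// An empty list means the sample is internally consistent.
function checkInvoice(invoice) {
  const problems = [];
  for (const key of ['currency', 'customer', 'invoice_id', 'status']) {
    if (typeof invoice[key] !== 'string' || invoice[key] === '') {
      problems.push(`${key} must be a non-empty string`);
    }
  }
  if (!Array.isArray(invoice.items) || invoice.items.length === 0) {
    problems.push('items must be a non-empty array');
    return problems;
  }
  // The subtotal should equal the sum of price * qty over all line items.
  const expected = invoice.items.reduce((sum, item) => sum + item.price * item.qty, 0);
  if (invoice.subtotal !== expected) {
    problems.push(`subtotal is ${invoice.subtotal}, expected ${expected}`);
  }
  return problems;
}

// The sample data from the article.
const sample = {
  currency: 'INR',
  customer: 'Akarsh',
  invoice_id: 'INV-1024',
  items: [{ name: 'ModelRiver Pro', price: 1499, qty: 1 }],
  status: 'paid',
  subtotal: 1499,
};

console.log(checkInvoice(sample)); // prints []
```

Running a check like this in CI catches a stale or hand-edited sample before it misleads the tests that depend on it.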

You flip the workflow to Test Mode

Every workflow has a mode: Production or Testing. When set to Testing, the workflow skips the entire AI provider pipeline.

You still use your API key (auth and request logging all run as normal), but you don't need any AI provider credentials configured. Just the structured output with your sample data.

Your API calls work exactly the same

Your code doesn't change at all. Same endpoint, same payload, same auth:

curl -X POST https://api.modelriver.com/v1/ai \
  -H "Authorization: Bearer mr_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "workflow": "Test_Mode_Demo_Blog",
    "messages": [{"role": "user", "content": "Test input"}]
  }'

The response comes back with your sample data:

{
  "message": "success",
  "status": "success",
  "data": {
    "currency": "INR",
    "customer": "Akarsh",
    "invoice_id": "INV-1024",
    "items": [
      { "name": "ModelRiver Pro", "price": 1499, "qty": 1 }
    ],
    "status": "paid",
    "subtotal": 1499
  },
  "customer_data": {},
  "meta": {
    "http_status": 200,
    "workflow": "Test_Mode_Demo_Blog",
    "provider": "Testing",
    "model": "test-mode",
    "used_provider": "Testing",
    "used_model": "test-mode",
    "test_mode": true,
    "structured_output": true,
    "duration_ms": 2000,
    "customer_fields": [],
    "usage": {
      "prompt_tokens": 0,
      "completion_tokens": 0,
      "total_tokens": 0
    }
  }
}

In the Playground, the same request shows the provider as "Testing" and 0 tokens used.

Your application code doesn't change. Your frontend doesn't change. The response shape is identical to what a real provider would return. The only difference: no AI provider is called, no tokens are spent, and you get the same deterministic response every time.
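One way to use the `meta` block in a test suite is as a safety net: fail fast if a workflow has been flipped back to Production and the tests are about to spend tokens. This is a sketch under that assumption; `assertTestMode` is a hypothetical helper that relies only on the `meta.test_mode` and `meta.usage` fields shown in the response above.

```javascript
// Hypothetical guard: throws unless the response came from Test Mode,
// then returns the payload so tests can keep asserting on it.
function assertTestMode(body) {
  const meta = body && body.meta;
  if (!meta || meta.test_mode !== true) {
    throw new Error('Expected a Test Mode response; is the workflow set to Production?');
  }
  if (meta.usage && meta.usage.total_tokens !== 0) {
    throw new Error(`Expected 0 tokens, got ${meta.usage.total_tokens}`);
  }
  return body.data;
}

// Trimmed copy of the response shown above.
const response = {
  status: 'success',
  data: { invoice_id: 'INV-1024', status: 'paid', subtotal: 1499 },
  meta: {
    test_mode: true,
    usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 },
  },
};

console.log(assertTestMode(response).invoice_id); // prints INV-1024
```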

Why this actually matters

CI/CD that doesn't burn money

We run our integration tests on every push. Before Test Mode, that meant either mocking the AI layer (which doesn't actually test the real flow) or paying for API calls on every CI run (which means your token bill scales with your commit frequency — not the kind of scale you want).

With Test Mode, our CI tests hit the real ModelRiver API, go through the real authentication and routing pipeline, and get back predictable data. The only thing skipped is the external AI provider call. Everything else (auth, request logging, response formatting) is exercised for real.

# GitHub Actions — no mocking, no AI costs
- name: Run integration tests
  env:
    MODELRIVER_API_KEY: ${{ secrets.MR_TEST_KEY }}
  run: npm test
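What `npm test` runs is up to you. As a rough sketch, a test helper could send the same request as the curl example and fail on any non-2xx status. `runWorkflow` and `stubFetch` are hypothetical names; the stub exists only so the snippet runs offline, and in CI you would pass the global `fetch` (Node 18+) together with the key from `MODELRIVER_API_KEY`.

```javascript
// Sends the same request as the curl example. `fetchImpl` is injectable so
// this sketch runs without network access; pass the global fetch in CI.
async function runWorkflow(fetchImpl, apiKey, workflow, content) {
  const res = await fetchImpl('https://api.modelriver.com/v1/ai', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ workflow, messages: [{ role: 'user', content }] }),
  });
  if (!res.ok) throw new Error(`ModelRiver returned HTTP ${res.status}`);
  return res.json();
}

// Offline stand-in that mimics the Test Mode response shown earlier.
const stubFetch = async (url, init) => ({
  ok: true,
  status: 200,
  json: async () => ({
    status: 'success',
    data: { invoice_id: 'INV-1024', status: 'paid' },
    meta: { test_mode: true, workflow: JSON.parse(init.body).workflow },
  }),
});

const result = runWorkflow(stubFetch, 'mr_live_YOUR_KEY', 'Test_Mode_Demo_Blog', 'Test input');
result.then((body) => console.log(body.data.status)); // prints paid
```

In a real suite the call would look like `runWorkflow(fetch, process.env.MODELRIVER_API_KEY, 'Test_Mode_Demo_Blog', 'Test input')`, with no stub involved.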

Simulating latency

One thing that caught us off guard early on: our frontend loading states looked fine in dev but janky in production because real GPT-4 calls take 2–5 seconds. We were testing against instant responses and shipping UIs that hadn't been tested under real latency.

Test Mode has a configurable response delay. Set it to 2000ms and your workflow responds after roughly a 2-second pause — close enough to simulate what a real AI call feels like. This lets you test loading states, timeout handling, and retry logic properly. Without spending a single token.
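To make the timeout case concrete, here is a small sketch of a client-side timeout wrapper exercised against a delayed response. `withTimeout` and `delayedCall` are hypothetical helpers; `delayedCall` stands in for a Test Mode workflow with a configured delay, and the millisecond values are shortened so the example finishes quickly.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it cannot fire after the race is decided.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for a Test Mode call with a configured response delay.
const delayedCall = (delayMs) =>
  new Promise((resolve) => setTimeout(() => resolve({ status: 'success' }), delayMs));

// Response arrives before the timeout: the result passes through.
const fast = withTimeout(delayedCall(20), 200).then((r) => r.status);
// Response arrives after the timeout: the wrapper rejects first.
const slow = withTimeout(delayedCall(200), 20).catch((e) => e.message);

Promise.all([fast, slow]).then(console.log); // prints [ 'success', 'Timed out after 20ms' ]
```

Against a real workflow, you would set the Test Mode delay above your client timeout to confirm that the loading state and retry path behave as intended.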

Unblocking frontend development

This one's been huge for us. Your frontend team doesn't need to wait for the AI pipeline to be production-ready. They can start building against Test Mode workflows immediately.

The sample data acts as a contract: "This is the shape of the response. Build against it." When the AI pipeline is ready, flip the toggle to Production. The frontend code stays identical.

Async and event-driven workflows too

Test Mode works with ModelRiver's async API as well. Fire an async request, and the test response flows through the same WebSocket delivery pipeline — including reconnection after page refresh. If you have event-driven workflows with backend callbacks, those get triggered too, with the sample data as the payload.

This means you can test your entire real-time pipeline end-to-end — webhooks, callbacks, WebSocket delivery — without a single provider call.

What Test Mode doesn't do

A few honest boundaries:

It doesn't test your prompts. The AI model is never called, so you won't know if your prompt produces good results. Test Mode is for testing your application logic, not your prompt engineering.
It doesn't test failover. Since no provider is called, the backup model chain isn't exercised. For that, use Production mode with a cheaper model.
Responses are static. The same sample data comes back every time. That's the whole point — determinism — but it means you won't catch edge cases in output variability.

The bigger picture

Test Mode is one piece of something we care a lot about: making AI infrastructure as testable and predictable as any other part of your stack.

Right now, most AI integrations are treated like black boxes. You fire a request and hope for the best. If something breaks, you check logs and squint at the provider dashboard. We've been there, and it's not a great developer experience.

We think this can be better. Real observability. Real testing. Real CI support. The kind of rigor we'd expect from databases, queues, and APIs — but applied to AI.

Have you solved this differently?
Curious how others are testing LLM features in production.

Build a real-time streaming AI chatbot with zero streaming infrastructure - async + webhooks + failover

Akarsh Cholapurath — Tue, 03 Feb 2026 16:42:03 +0000

Have you ever tried building a production-ready AI chatbot that streams responses token-by-token, handles failover across providers, enforces structured JSON outputs, and lets you inject custom logic (like metadata tracking or approval gates) — all without managing WebSocket servers, polling, timeouts, or connection state?

Most vanilla setups (OpenAI/Anthropic streaming) force you into complex infra. But what if a lightweight gateway handled all that?

Enter this full-stack example using ModelRiver (an AI gateway I'm building). It demonstrates a clean pattern for true end-to-end streaming with async requests, event-driven webhooks, automatic failover, and easy local dev — no ngrok needed.

In ~30-45 minutes, you can recreate this: React frontend → Node.js backend → ModelRiver → real-time WebSocket back to browser.

(Disclosure: I work on ModelRiver. This is a genuine technical demo for feedback on production LLM patterns.)

Why This Pattern Matters in 2026

Modern AI apps need:

Instant, human-like streaming UX
Reliability (failover if a provider flakes)
Structured, type-safe outputs (e.g., sentiment + action items)
Business logic gates (validation, enrichment, custom IDs for DB)
Zero heavy infra (no persistent WebSockets on your side)

This example covers all of that with async requests, webhook callbacks, and a lightweight client SDK.

Architecture at a Glance

User (React) → Node.js Backend → ModelRiver Async API
↓
AI Processing (background, failover)
↓
Webhook to Backend (enrich/inject)
↓
Callback to ModelRiver
↓
WebSocket Stream → Frontend (real-time)

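The key step: ModelRiver processes the request asynchronously and calls your webhook before final delivery. Your backend enriches the response and posts it to the callback URL, and ModelRiver then delivers the result to the browser over WebSocket.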

Prerequisites

Node.js 20+ (recent Vite releases no longer support Node 16)
ModelRiver account (free tier): console.modelriver.com
API key from console
Optional: Ollama/llama.cpp/vLLM for local inference testing

Step 1: Set Up ModelRiver (Console)

Create a project.
Add providers (OpenAI/Anthropic + local if wanted).
Define structured output schema (e.g., chatbot_response):

{
  "reply": "string",
  "summary": "string",
  "sentiment": "positive | negative | neutral | mixed",
  "confidence": "number",
  "topics": "array<string>",
  "action_items": "array<{task: string, priority: 'high' | 'medium' | 'low'}>"
}

Create workflow mr_chatbot_workflow with structured output + event new_chat.
Set webhook type to "Localhost CLI" for dev.
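The schema from the setup steps can also be checked in your own code, which is handy in the webhook handler or in tests. This is a minimal hand-rolled sketch; `validateChatbotResponse` is a hypothetical helper and the sample payload is made up for illustration.

```javascript
const SENTIMENTS = ['positive', 'negative', 'neutral', 'mixed'];
const PRIORITIES = ['high', 'medium', 'low'];

// Returns a list of problems; an empty list means the payload matches the schema.
function validateChatbotResponse(r) {
  const errors = [];
  if (typeof r.reply !== 'string') errors.push('reply must be a string');
  if (typeof r.summary !== 'string') errors.push('summary must be a string');
  if (!SENTIMENTS.includes(r.sentiment)) errors.push('sentiment is not a known value');
  if (typeof r.confidence !== 'number') errors.push('confidence must be a number');
  if (!Array.isArray(r.topics) || !r.topics.every((t) => typeof t === 'string')) {
    errors.push('topics must be an array of strings');
  }
  const itemsOk =
    Array.isArray(r.action_items) &&
    r.action_items.every(
      (a) => a && typeof a.task === 'string' && PRIORITIES.includes(a.priority)
    );
  if (!itemsOk) errors.push('action_items must be an array of {task, priority}');
  return errors;
}

// Made-up payload in the chatbot_response shape.
const example = {
  reply: 'Your order ships tomorrow.',
  summary: 'Shipping update',
  sentiment: 'positive',
  confidence: 0.92,
  topics: ['shipping'],
  action_items: [{ task: 'Send tracking link', priority: 'medium' }],
};

console.log(validateChatbotResponse(example)); // prints []
```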

Step 2: Backend (Node.js + Express)

Install deps:

npm init -y
# node-fetch v3 is ESM-only, so pin v2 for use with require()
npm i express uuid dotenv node-fetch@2

.env:

MODELRIVER_API_KEY=your_key
PORT=4000
BACKEND_PUBLIC_URL=http://localhost:4000
WEBHOOK_SECRET=your_secret_from_console
EVENT_NAME=new_chat

Main index.js (key endpoints):

const express = require('express');
const { v4: uuidv4 } = require('uuid');
require('dotenv').config();
const fetch = require('node-fetch'); // requires node-fetch v2; v3 is ESM-only

const app = express();
app.use(express.json());

// Dev-only CORS: the Vite dev server (localhost:5173) is a different origin,
// so the browser blocks its JSON POST to this backend without these headers.
app.use((req, res, next) => {
  res.set('Access-Control-Allow-Origin', 'http://localhost:5173');
  res.set('Access-Control-Allow-Headers', 'Content-Type');
  if (req.method === 'OPTIONS') return res.sendStatus(204);
  next();
});

const pendingRequests = new Map(); // channel_id → {conversationId, messageId}

// Starts an async AI request and hands the WebSocket details to the browser.
app.post('/chat', async (req, res) => {
  const { message, workflow = 'mr_chatbot_workflow' } = req.body;
  const conversationId = uuidv4();
  const messageId = uuidv4();

  const resp = await fetch('https://api.modelriver.com/v1/ai/async', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.MODELRIVER_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      workflow,
      messages: [{ role: 'user', content: message }],
      delivery_method: 'websocket',
      webhook_url: `${process.env.BACKEND_PUBLIC_URL}/webhook/modelriver`,
      events: ['webhook_received'],
      metadata: { conversationId, messageId },
    }),
  });

  // Surface gateway errors instead of storing an undefined channel_id.
  if (!resp.ok) {
    return res.status(502).json({ error: `ModelRiver returned HTTP ${resp.status}` });
  }

  const data = await resp.json();
  pendingRequests.set(data.channel_id, { conversationId, messageId });

  res.json({
    channel_id: data.channel_id,
    websocket_url: data.websocket_url,
    ws_token: data.ws_token,
  });
});

// Called by ModelRiver before final delivery: enrich, then post to callback_url.
app.post('/webhook/modelriver', async (req, res) => {
  const { channel_id, ai_response, callback_url } = req.body;
  const pending = pendingRequests.get(channel_id);

  if (!pending) return res.status(404).json({ error: 'Not found' });

  // Enrich with custom IDs (add validation/sentiment gates here!)
  const enriched = {
    id: pending.messageId,
    conversation_id: pending.conversationId,
    ...(ai_response && ai_response.data), // tolerate a missing ai_response
  };

  await fetch(callback_url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(enriched),
  });

  pendingRequests.delete(channel_id);
  res.status(200).json({ success: true });
});

app.listen(process.env.PORT, () => console.log(`Backend on ${process.env.PORT}`));
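The webhook handler's comment mentions validation and sentiment gates without showing one. A gate could look roughly like the sketch below. `applyGate`, the `needs_review` and `review_reasons` fields, and the 0.6 threshold are all hypothetical illustrations, not ModelRiver features or defaults.

```javascript
// Hypothetical gate: flags AI responses that should not go straight to the user.
// The 0.6 threshold is an arbitrary example value.
function applyGate(aiData, minConfidence = 0.6) {
  const reasons = [];
  if (typeof aiData.reply !== 'string' || aiData.reply.trim() === '') {
    reasons.push('empty reply');
  }
  if (typeof aiData.confidence !== 'number' || aiData.confidence < minConfidence) {
    reasons.push('low confidence');
  }
  // Return a copy so the original webhook payload is left untouched.
  return { ...aiData, needs_review: reasons.length > 0, review_reasons: reasons };
}

console.log(applyGate({ reply: 'Hi', confidence: 0.4 }).review_reasons); // prints [ 'low confidence' ]
```

In the handler, the gated result could be spread into `enriched` in place of the raw `ai_response.data`, so the frontend can decide how to present flagged messages.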

Step 3: Frontend (React + ModelRiver Client SDK)

npx create-vite@latest frontend --template react
cd frontend
npm i @modelriver/client

Use the useModelRiver hook for streaming:

import { useModelRiver } from '@modelriver/client';
import { useState } from 'react';

function App() {
  const [input, setInput] = useState('');
  const { connect, message, status } = useModelRiver();

  const send = async () => {
    if (!input) return;
    const res = await fetch('http://localhost:4000/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: input }),
    });
    const { websocket_url, ws_token, channel_id } = await res.json();
    connect({ websocket_url, ws_token, channel_id });
    setInput('');
  };

  return (
    <div>
      <h1>Streaming Chatbot Demo</h1>
      <div>{status}</div>
      <div style={{ whiteSpace: 'pre-wrap' }}>{message?.reply || 'Waiting...'}</div>
      {message?.sentiment && <p>Sentiment: {message.sentiment} ({(message.confidence * 100).toFixed(0)}%)</p>}
      {/* Render topics, action_items similarly */}
      <input value={input} onChange={e => setInput(e.target.value)} />
      <button onClick={send}>Send</button>
    </div>
  );
}

export default App;
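The component leaves rendering of `action_items` as an exercise. Before rendering them, it may help to order them by priority. The sketch below is one way to do that; `sortActionItems` is a hypothetical helper and the sample items are made up.

```javascript
const PRIORITY_ORDER = { high: 0, medium: 1, low: 2 };

// Unknown priorities sort after the known ones.
const rank = (priority) => (priority in PRIORITY_ORDER ? PRIORITY_ORDER[priority] : 3);

// Returns a new array ordered high → medium → low; the input is not mutated.
function sortActionItems(items = []) {
  return [...items].sort((a, b) => rank(a.priority) - rank(b.priority));
}

// Made-up action items in the schema's {task, priority} shape.
const items = [
  { task: 'Send recap', priority: 'low' },
  { task: 'Fix login bug', priority: 'high' },
  { task: 'Update docs', priority: 'medium' },
];

console.log(sortActionItems(items).map((i) => i.task));
// prints [ 'Fix login bug', 'Update docs', 'Send recap' ]
```

Inside the component, the sorted list could then be mapped to `<li>` elements in the spot marked by the JSX comment.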

Step 4: Local Dev (No ngrok!)

Install CLI: npm i -g @modelriver/cli
Run: modelriver forward (forwards webhooks to localhost)
Start backend: node index.js
Start frontend: npm run dev

Test at http://localhost:5173 (Vite default).

Production Benefits Recap

No streaming servers — ModelRiver handles WS.
Async non-blocking — UI stays responsive.
Failover built-in — Auto-switches providers.
Structured + enriched — JSON schema + your logic.
Local-first dev — CLI makes webhooks trivial.
Metadata tracking — Easy DB/logging integration.

Next Steps & Repo

Full repo: https://github.com/modelriver/modelriver-chatbot-demo.git (clone, follow README).
Docs: https://modelriver.com/docs/chatbot-example
