Akarsh Cholapurath

Build a real-time streaming AI chatbot with zero streaming infrastructure - async + webhooks + failover

Have you ever tried building a production-ready AI chatbot that streams responses token-by-token, handles failover across providers, enforces structured JSON outputs, and lets you inject custom logic (like metadata tracking or approval gates) — all without managing WebSocket servers, polling, timeouts, or connection state?

Most vanilla setups (OpenAI/Anthropic streaming) force you into complex infra. But what if a lightweight gateway handled all that?

Enter this full-stack example using ModelRiver (an AI gateway I'm building). It demonstrates a clean pattern for true end-to-end streaming with async requests, event-driven webhooks, automatic failover, and easy local dev — no ngrok needed.

In ~30-45 minutes, you can recreate this: React frontend → Node.js backend → ModelRiver → real-time WebSocket back to browser.

(Disclosure: I work on ModelRiver. This is a genuine technical demo for feedback on production LLM patterns.)

Why This Pattern Matters in 2026

Modern AI apps need:

  • Instant, human-like streaming UX
  • Reliability (failover if a provider flakes)
  • Structured, type-safe outputs (e.g., sentiment + action items)
  • Business logic gates (validation, enrichment, custom IDs for DB)
  • Zero heavy infra (no persistent WebSockets on your side)

This example solves all that with async + webhook callbacks + lightweight client SDK.

Architecture at a Glance

User (React) → Node.js Backend → ModelRiver Async API
↓
AI Processing (background, failover)
↓
Webhook to Backend (enrich/inject)
↓
Callback to ModelRiver
↓
WebSocket Stream → Frontend (real-time)

Key magic: ModelRiver processes async, hits your webhook before final delivery → you enrich → callback → streams via WS.
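Concretely, the two payloads in that round trip look roughly like this. This is a sketch: the field names match the webhook handler in Step 2 and the schema in Step 1, but the values and the exact envelope are illustrative, so check the ModelRiver docs for the authoritative shape.

// What ModelRiver POSTs to your webhook (illustrative)
const webhookBody = {
  channel_id: 'ch_abc123',            // ties the callback to your pending request
  callback_url: 'https://api.modelriver.com/...', // where you POST the enriched result
  ai_response: {
    data: {                           // conforms to the chatbot_response schema
      reply: '...',
      summary: '...',
      sentiment: 'positive',
      confidence: 0.92,
      topics: ['billing'],
      action_items: [{ task: 'Send the updated invoice', priority: 'high' }],
    },
  },
};

// What your backend POSTs back to callback_url after enrichment
const enrichedCallback = {
  id: '<your messageId>',             // injected server-side in Step 2
  conversation_id: '<your conversationId>',
  ...webhookBody.ai_response.data,
};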

Prerequisites

  • Node.js 16+
  • ModelRiver account (free tier): console.modelriver.com
  • API key from console
  • Optional: Ollama/llama.cpp/vLLM for local inference testing

Step 1: Set Up ModelRiver (Console)

  1. Create a project.
  2. Add providers (OpenAI/Anthropic + local if wanted).
  3. Define structured output schema (e.g., chatbot_response):
{
  "reply": "string",
  "summary": "string",
  "sentiment": "positive | negative | neutral | mixed",
  "confidence": "number",
  "topics": "array<string>",
  "action_items": "array<{task: string, priority: 'high' | 'medium' | 'low'}>"
}
  4. Create workflow mr_chatbot_workflow with structured output + event new_chat.
  5. Set webhook type to "Localhost CLI" for dev.

Step 2: Backend (Node.js + Express)

Install deps (node-fetch v2, since v3 is ESM-only and this example uses CommonJS require; on Node 18+ you can skip it and use the built-in fetch):

npm init -y
npm i express uuid dotenv node-fetch@2

.env:

MODELRIVER_API_KEY=your_key
PORT=4000
BACKEND_PUBLIC_URL=http://localhost:4000
WEBHOOK_SECRET=your_secret_from_console
EVENT_NAME=new_chat

Main index.js (key endpoints):

const express = require('express');
const { v4: uuidv4 } = require('uuid');
require('dotenv').config();
const fetch = require('node-fetch');

const app = express();
app.use(express.json());

const pendingRequests = new Map(); // channel_id → {conversationId, messageId}

app.post('/chat', async (req, res) => {
  const { message, workflow = 'mr_chatbot_workflow' } = req.body;
  const conversationId = uuidv4();
  const messageId = uuidv4();

  const resp = await fetch('https://api.modelriver.com/v1/ai/async', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.MODELRIVER_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      workflow,
      messages: [{ role: 'user', content: message }],
      delivery_method: 'websocket',
      webhook_url: `${process.env.BACKEND_PUBLIC_URL}/webhook/modelriver`,
      events: ['webhook_received'],
      metadata: { conversationId, messageId },
    }),
  });

  const data = await resp.json();
  pendingRequests.set(data.channel_id, { conversationId, messageId });

  res.json({
    channel_id: data.channel_id,
    websocket_url: data.websocket_url,
    ws_token: data.ws_token,
  });
});

app.post('/webhook/modelriver', async (req, res) => {
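  // Production note: this demo skips it, but you should verify incoming requests
  // against the WEBHOOK_SECRET from .env before trusting the payload.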
  const { channel_id, ai_response, callback_url } = req.body;
  const pending = pendingRequests.get(channel_id);

  if (!pending) return res.status(404).json({ error: 'Not found' });

  // Enrich with custom IDs (add validation/sentiment gates here!)
  const enriched = {
    id: pending.messageId,
    conversation_id: pending.conversationId,
    ...ai_response.data,
  };

  await fetch(callback_url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(enriched),
  });

  pendingRequests.delete(channel_id);
  res.status(200).json({ success: true });
});

app.listen(process.env.PORT, () => console.log(`Backend on ${process.env.PORT}`));

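Before wiring up the frontend, you can smoke-test the backend on its own. A minimal sketch (it assumes the global fetch from Node 18+; on older Node, use node-fetch@2 as above):

// quick-test.js: POST a message and print the streaming handles the backend returns
fetch('http://localhost:4000/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: 'Hello from the smoke test' }),
})
  .then((res) => res.json())
  .then((data) => console.log(data)); // { channel_id, websocket_url, ws_token }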

Step 3: Frontend (React + ModelRiver Client SDK)

npx create-vite@latest frontend --template react
cd frontend
npm i @modelriver/client

Use the useModelRiver hook for streaming:

import { useModelRiver } from '@modelriver/client';
import { useState } from 'react';

function App() {
  const [input, setInput] = useState('');
  const { connect, message, status } = useModelRiver();

  const send = async () => {
    if (!input) return;
    const res = await fetch('http://localhost:4000/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: input }),
    });
    const { websocket_url, ws_token, channel_id } = await res.json();
    connect({ websocket_url, ws_token, channel_id });
    setInput('');
  };

  return (
    <div>
      <h1>Streaming Chatbot Demo</h1>
      <div>{status}</div>
      <div style={{ whiteSpace: 'pre-wrap' }}>{message?.reply || 'Waiting...'}</div>
      {message?.sentiment && <p>Sentiment: {message.sentiment} ({(message.confidence * 100).toFixed(0)}%)</p>}
      {/* Render topics, action_items similarly */}
      <input value={input} onChange={e => setInput(e.target.value)} />
      <button onClick={send}>Send</button>
    </div>
  );
}

export default App;
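The commented-out part above ("Render topics, action_items similarly") could look like this. A minimal sketch against the same message shape defined by the schema in Step 1:

// Renders the structured action_items array from the chatbot_response schema
function ActionItems({ items = [] }) {
  return (
    <ul>
      {items.map((item, i) => (
        <li key={i}>
          {item.task} ({item.priority})
        </li>
      ))}
    </ul>
  );
}

// In App's JSX, next to the sentiment line:
// {message?.topics && <p>Topics: {message.topics.join(', ')}</p>}
// {message?.action_items && <ActionItems items={message.action_items} />}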

Step 4: Local Dev (No ngrok!)

  1. Install CLI: npm i -g @modelriver/cli
  2. Run: modelriver forward (forwards webhooks to localhost)
  3. Start backend: node index.js
  4. Start frontend: npm run dev

Test at http://localhost:5173 (Vite default).

Production Benefits Recap

  • No streaming servers — ModelRiver handles WS.
  • Async non-blocking — UI stays responsive.
  • Failover built-in — Auto-switches providers.
  • Structured + enriched — JSON schema + your logic.
  • Local-first dev — CLI makes webhooks trivial.
  • Metadata tracking — Easy DB/logging integration (see the sketch below).
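For that last point, the IDs injected in the webhook handler map straight onto persistence. A minimal sketch, with saveMessage standing in for whatever database layer you use (it is not part of the demo):

// Inside the /webhook/modelriver handler, after building `enriched`
// (saveMessage is a hypothetical helper over your own DB)
await saveMessage({
  id: enriched.id,                          // messageId generated in /chat
  conversationId: enriched.conversation_id, // ties the row to the conversation
  reply: enriched.reply,
  sentiment: enriched.sentiment,
  actionItems: enriched.action_items,
  createdAt: new Date(),
});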

Next Steps & Repo

Full repo: https://github.com/modelriver/modelriver-chatbot-demo.git (clone, follow README).
Docs: https://modelriver.com/docs/chatbot-example
