David C Cavalcante

Posted on May 30

ModelChain: Measurable LLM Router with Adaptive Model Selection, Real-Time Scoring, Budget Guards and Failover for Node.js, Edge and Browser

#ai #javascript #llm #showdev

As a solo LLMOps engineer with over 25 years building production AI systems, I kept hitting the same limitation: when you have access to multiple LLM providers and models, any rule for choosing the right one per request is fragile and quickly goes out of date.

Static if/else rules or fixed fallbacks do not survive real-world changes in pricing, latency, or model quality. Manual benchmarking is time-consuming and error-prone.

ModelChain (@takk/modelchain) was built to solve this.

The Problem

Developers and companies with keys for OpenAI, Anthropic, Gemini, Groq, and others waste time and money because they cannot dynamically route each prompt to the best available model based on current cost, observed latency, and actual response quality. Hard-coded choices quickly become suboptimal.

The Solution

ModelChain is a measurable, adaptive LLM router for Node.js, Edge runtimes, and the browser. It selects the best model per request using one of seven routing strategies, scores every response in real time, feeds those scores back into future decisions, enforces hard budget guards, and includes per-model circuit breakers with automatic failover.

It normalises responses, tool calling, and streaming across providers while remaining zero-runtime-dependency and fully tree-shakable.

Core Features

Seven declarative routing strategies (cost-then-quality, cost-first, quality-first, etc.)
Six pluggable scorers (latency, token-budget, length-bound, regex-match, exact-match, schema-valid)
Native streaming over Web Streams with a unified CompletionChunk type
Normalised tool calling across OpenAI, Anthropic, and Gemini
Hard budget guard (per-request, per-task, daily ceilings) that throws before any network call
Per-model circuit breaker + full-jitter exponential backoff + automatic failover
EWMA health scoring that decays on failure and recovers on success
Thirteen in-process telemetry events (no external OpenTelemetry required)
Vercel AI SDK adapter (toVercelAILanguageModel)
CLI proxy, inspect, and bench modes
Six tree-shakeable entry points (core, providers, web, edge, ai-sdk, cli)
SLSA provenance on every release
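
To make two of the items above concrete, here is a minimal sketch of an EWMA health update and a full-jitter exponential backoff delay. It is not ModelChain's internal code: the function names, the smoothing factor of 0.3, and the base and cap delays are all assumptions chosen for illustration.

```typescript
// Hypothetical helpers for illustration; not part of the @takk/modelchain API.

/**
 * EWMA health update. A success counts as 1 and a failure as 0, so the
 * score decays toward 0 on failures and recovers toward 1 on successes.
 * alpha (assumed 0.3 here) is the weight given to the newest observation.
 */
function updateHealth(previous: number, succeeded: boolean, alpha = 0.3): number {
  const observation = succeeded ? 1 : 0;
  return alpha * observation + (1 - alpha) * previous;
}

/**
 * Full-jitter exponential backoff: a uniformly random delay between 0 and
 * min(capMs, baseMs * 2^attempt). The random source is injectable so the
 * function can be tested deterministically.
 */
function fullJitterDelayMs(
  attempt: number,
  baseMs = 100,
  capMs = 10_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return random() * ceiling;
}

// Two failures followed by one success, starting from a perfect score.
let health = 1;
health = updateHealth(health, false); // decays to about 0.7
health = updateHealth(health, false); // decays to about 0.49
health = updateHealth(health, true);  // recovers to about 0.643
```

Injecting the random source keeps the jitter testable, and the cap prevents the delay ceiling from growing without bound after many retries.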

Quickstart Examples

1. Basic Router Setup

import { createModelchain } from '@takk/modelchain';
import { openaiModel, anthropicModel, geminiModel } from '@takk/modelchain/providers';

const router = createModelchain({
  models: [
    openaiModel('gpt-4o-mini', {
      cost: { costPer1kInput: 0.00015, costPer1kOutput: 0.00060 },
      keys: process.env.OPENAI_API_KEY ?? '',
    }),
    anthropicModel('claude-3-5-haiku-latest', {
      cost: { costPer1kInput: 0.00080, costPer1kOutput: 0.00400 },
      keys: process.env.ANTHROPIC_API_KEY ?? '',
    }),
    geminiModel('gemini-2.0-flash', {
      cost: { costPer1kInput: 0.00010, costPer1kOutput: 0.00040 },
      keys: process.env.GEMINI_API_KEY ?? '',
    }),
  ],
  strategy: 'cost-then-quality',
  scoring: { built: ['latency', 'token-budget'] },
  budget: { perRequestUsd: 0.02, dailyUsd: 5 },
  telemetry: { enabled: true },
});

const response = await router.complete({
  prompt: 'Summarise X in 3 bullets.',
  maxTokens: 200,
});
console.log(response.text, response.finishReason, response.usage);
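
For a sense of how the cost fields and the per-request budget in this setup relate, the sketch below works through a worst-case estimate for one request. It is only an illustration: the `CostTable` type, both helper functions, and the assumed prompt size of 1,000 input tokens are hypothetical, and ModelChain's actual budget guard may estimate cost differently.

```typescript
// Hypothetical types and helpers for illustration; not the library's API.
interface CostTable {
  costPer1kInput: number;  // USD per 1,000 input tokens
  costPer1kOutput: number; // USD per 1,000 output tokens
}

/** Worst case: every input token plus the full maxTokens of output. */
function worstCaseUsd(cost: CostTable, inputTokens: number, maxOutputTokens: number): number {
  return (
    (inputTokens / 1000) * cost.costPer1kInput +
    (maxOutputTokens / 1000) * cost.costPer1kOutput
  );
}

/** Throws if the estimate exceeds the ceiling, so no network call is made. */
function assertWithinBudget(estimateUsd: number, perRequestUsd: number): void {
  if (estimateUsd > perRequestUsd) {
    throw new Error(
      `Estimated cost $${estimateUsd.toFixed(5)} exceeds per-request budget $${perRequestUsd}`,
    );
  }
}

// Prices copied from the gpt-4o-mini entry in the setup above.
const gpt4oMini: CostTable = { costPer1kInput: 0.00015, costPer1kOutput: 0.0006 };

// Assumed 1,000 input tokens and the maxTokens of 200 from the example:
// 1.0 * 0.00015 + 0.2 * 0.0006, roughly $0.00027.
const estimate = worstCaseUsd(gpt4oMini, 1000, 200);
assertWithinBudget(estimate, 0.02); // well under the $0.02 per-request ceiling
```

Using `maxTokens` as the output bound makes the estimate pessimistic, which suits a guard that has to decide before the response length is known.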

2. Streaming

for await (const chunk of router.stream({ prompt: 'Tell me a story.' })) {
  if (chunk.type === 'text-delta') process.stdout.write(chunk.delta);
  if (chunk.type === 'finish') console.log('\nDone:', chunk.finishReason, chunk.usage);
}

3. Vercel AI SDK Integration

import { generateText } from 'ai';
import { toVercelAILanguageModel } from '@takk/modelchain/ai-sdk';

const { text } = await generateText({
  model: toVercelAILanguageModel(router),
  prompt: 'Hello.',
});

4. Tool Calling (normalised)

const result = await router.complete({
  prompt: 'What is the weather in Tokyo?',
  tools: [ /* ToolDefinition shape */ ],
});

How It Works (Request Flow)

1. Select the best model using the chosen strategy and current health/scores
2. Run the pre-flight budget guard check
3. Dispatch the request through the normalised provider adapter
4. Classify the response or error
5. Update the EWMA health score and circuit breaker state
6. Score response quality and record it for future routing
7. Emit telemetry events

All operations happen in-process with zero external dependencies.
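
The circuit breaker and failover parts of this flow can be sketched as a small in-process state machine. This is a generic illustration of the pattern, not ModelChain's implementation: the class, the `pickAvailable` helper, the failure threshold, and the cooldown are all hypothetical, and the clock is passed in explicitly so the behaviour is deterministic.

```typescript
// Hypothetical sketch; names and thresholds are assumptions, not ModelChain's API.
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold = 3, cooldownMs = 30_000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  /** May a request be sent to this model at time `now` (ms)? */
  allow(now: number): boolean {
    if (this.state === 'open' && now - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: let probe requests through until a result is recorded.
      this.state = 'half-open';
    }
    return this.state !== 'open';
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(now: number): void {
    this.failures += 1;
    // A failed probe, or too many consecutive failures, opens the breaker.
    if (this.state === 'half-open' || this.failures >= this.threshold) {
      this.state = 'open';
      this.openedAt = now;
    }
  }
}

interface Candidate {
  id: string;
  breaker: CircuitBreaker;
}

/** Failover: the first candidate, in preference order, whose breaker allows traffic. */
function pickAvailable(candidates: Candidate[], now: number): Candidate | undefined {
  return candidates.find((c) => c.breaker.allow(now));
}

const primary: Candidate = { id: 'primary', breaker: new CircuitBreaker(2, 1000) };
const backup: Candidate = { id: 'backup', breaker: new CircuitBreaker(2, 1000) };

primary.breaker.recordFailure(0);
primary.breaker.recordFailure(10); // second failure opens the breaker

const chosen = pickAvailable([primary, backup], 20);  // falls over to backup
const later = pickAvailable([primary, backup], 1010); // cooldown over: primary is probed again
```

In a real router the caller would dispatch to the chosen candidate and then call `recordSuccess` or `recordFailure` depending on how the response or error was classified.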

Installation

pnpm add @takk/modelchain
# or
npm install @takk/modelchain
# or
yarn add @takk/modelchain
# or
bun add @takk/modelchain

Optional peer dependencies are needed only if you use the richer typed adapters.

Why ModelChain Exists

ModelChain is the second building block (after KeyMesh) of a long-term family of high-reliability, open-source-first npm libraries for AI-native infrastructure that I plan to maintain through 2026–2030.

I built it because dynamic, measurable routing is the missing layer between raw LLM providers and production applications that care about cost, latency, quality, and reliability.

Top comments (1)

Harjot Singh • May 31

This is exactly the layer I think most AI apps are missing. Adaptive model selection + real-time scoring + budget guards is the trifecta, route to the cheapest model that clears the bar, measure whether it actually did, and hard-stop before a runaway loop bills you. The failover piece is underrated too, a provider hiccup shouldn't take your whole feature down. I built essentially this routing-plus-budget-guard logic into Moonshift so a full run stays a few dollars without quality cliffs. Curious how you score quality in real time, a judge model, heuristics on the output, or downstream signal? That scoring is the hard part.
