Combining Chrome's new WebMCP API with Transformers.js taught me that AI development isn't just a backend problem anymore
When most people think about adding AI to a web app, they picture the same architecture: a frontend that sends requests to a backend that calls OpenAI or Anthropic, and a response that travels back the same way. It works. But it also means API bills, latency, data leaving the device, and infrastructure to maintain.
I wanted to challenge that assumption. So I built a document analysis app where everything — the AI model, the tool execution, the agent reasoning — runs entirely in the browser. No backend. No API keys. No cloud inference.
Two tools made it possible: Transformers.js and Chrome's WebMCP API. They solve different problems, and combining them produced something more interesting than either could alone.
Transformers.js: The Model Comes to the Browser
Transformers.js is Hugging Face's JavaScript library for running AI models locally in the browser. It uses ONNX format models, accelerated by WebAssembly, with optional WebGPU support for faster inference where available.
The model I used was Qwen2.5-0.5B-Instruct — a 0.5 billion parameter instruction-tuned model, quantised to 4-bit (~300MB download). It downloads once, caches in the browser, and runs completely offline from that point on.
The critical engineering decision is where the model runs: a Web Worker. This keeps inference off the main thread entirely, so the UI stays responsive while the model is generating. Without this, the page would freeze for every token generated.
What surprised me was how capable the model is for focused tasks. Summarisation, keyword extraction, document Q&A, deciding which tool to call — Qwen2.5-0.5B handles all of these well. It's not GPT-4. But for scoped, well-defined tasks it's genuinely useful, and it's free to run after the initial download.
What Transformers.js is good for:
- Privacy-sensitive workflows where data can't leave the device
- Offline-capable applications
- High-volume internal tools where per-token API costs add up fast
- Any task where you want instant responses without a network round trip
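The Web Worker setup boils down to a small request/response protocol between the UI thread and the thread doing inference. Here's a minimal sketch of the main-thread side; the message shapes, the `generate` task name, and `createWorkerClient` are my own illustrative choices, not the POC's actual code:

```js
// Main-thread side: wraps anything with postMessage/onmessage
// (a real Worker, or a fake in tests) in a promise-based API so
// UI code can await generations without blocking the main thread.
function createWorkerClient(worker) {
  let nextId = 0;
  const pending = new Map(); // request id → { resolve, reject }

  worker.onmessage = (event) => {
    const { id, result, error } = event.data;
    const entry = pending.get(id);
    if (!entry) return;
    pending.delete(id);
    error ? entry.reject(new Error(error)) : entry.resolve(result);
  };

  return {
    generate(prompt, options = {}) {
      const id = nextId++;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject });
        worker.postMessage({ id, type: "generate", prompt, options });
      });
    },
  };
}
```

On the worker side you'd load the Transformers.js pipeline once at startup and answer each `generate` message with the model's output; only small strings cross the thread boundary, so the UI never sees the heavy work.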
Chrome WebMCP: Making Your Web App Agent-Ready
WebMCP is newer and less understood. It's a proposed web standard, incubated through the W3C and co-developed by engineers at Google and Microsoft, currently available in Chrome 146 Canary behind a feature flag.
The core idea: websites can register structured, callable tools directly in the browser using a new API called navigator.modelContext. Any AI agent that has access to the browser session — via Chrome's DevTools protocol or a connected MCP client — can discover those tools and call them as typed function calls.
```js
// Register a tool that any AI agent can call
navigator.modelContext.registerTool({
  name: "count_words",
  description: "Count words and estimate reading time for the loaded document",
  inputSchema: {},
  execute: async () => ({ wordCount: 423, readingTime: "2 min" })
});
```
This is a significant shift from how browser-based AI agents work today. Current approaches require agents to visually interpret pages — taking screenshots, reading the DOM, simulating clicks. It's brittle. Every redesign breaks the agent's understanding of the page.
WebMCP replaces visual guessing with a semantic contract. Your page declares what it can do, in terms AI agents already understand.
What WebMCP is not — it's not for your app talking to an AI. It's for an AI agent talking to your app. The direction of the call is reversed from what most developers expect.
What Happens When You Combine Them
Here's where it gets interesting. These two tools solve genuinely different problems:
- Transformers.js provides local AI intelligence — the ability to reason, summarise, and decide
- WebMCP provides agent interoperability — the ability for external agents to interact with your app through a typed interface
In the POC I built, they work together as a complete agent loop:
```
User: "What are the main topics in this document?"
        ↓
Qwen2.5 (local, Transformers.js) — Pass 1
  Reads tool descriptions, decides: call extract_keywords({ topN: 8 })
        ↓
toolRegistry.execute("extract_keywords")   ← WebMCP handler
  Returns: [{ word: "transformer", count: 8 }, ...]
        ↓
Qwen2.5 (local, Transformers.js) — Pass 2
  Reads the JSON result, writes natural language answer
        ↓
"The main topics are transformers, attention mechanisms, and neural networks..."
```
The model runs twice per agent turn — once to decide which tool to call, once to turn the tool result into a human answer. WebMCP provides the structured bridge in between. Neither step touches a server.
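The two-pass turn can be sketched as one small async function. This is illustrative, not the POC's exact code: `generate(prompt)` stands in for the local model call, `registry` maps tool names to handlers, and the prompt wording is mine:

```js
// One agent turn, two model passes. `generate` and `registry` are
// injected stand-ins for the Transformers.js worker call and the
// WebMCP-backed tool registry.
async function agentTurn(question, generate, registry) {
  // Pass 1: ask the model to pick a tool, answering in JSON.
  const toolList = Object.keys(registry).join(", ");
  const decision = await generate(
    `Available tools: ${toolList}\n` +
    `Question: ${question}\n` +
    `Reply with JSON: {"tool": "...", "args": {...}}`
  );
  const { tool, args } = JSON.parse(decision);

  // Structured bridge: execute the chosen tool locally.
  const result = await registry[tool](args);

  // Pass 2: turn the raw JSON result into a human answer.
  return generate(
    `Tool ${tool} returned: ${JSON.stringify(result)}\n` +
    `Answer the question "${question}" in plain language.`
  );
}
```

A real implementation would also need to handle the model emitting malformed JSON in pass 1 — small models do that often enough that a retry or a regex fallback is worth having.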
The UI makes the whole loop visible: a three-panel layout with the document on the left, AI summary in the middle, and an agent chat console on the right where you can see the navigator.modelContext tool call cards — showing the exact JSON request and response — alongside the natural language answer.
The Practical Difference Between the Two APIs
One thing worth being precise about: WebMCP's navigator.modelContext is a write-only registration API from your own page's perspective. You register tools. External agents call them. There's no getTools() method — you can't call your own registered tools back through the navigator API.
In the POC, I solved this by maintaining a parallel toolRegistry — a simple JavaScript object that stores each tool's execute function. The WebMCP registration exposes tools to external agents, while the registry ref lets the local agent console call the same functions directly.
Each tool registers in two places:
1. navigator.modelContext.registerTool() → for external AI agents
2. toolRegistry["tool_name"] = fn → for local agent loop
Same logic, same output. One is the external interface, one is the internal wiring.
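Wrapping both registrations in one helper keeps the two surfaces from drifting apart. A sketch, with the helper name and the feature-detection guard being my own additions:

```js
// Local registry: plain object lookup for the in-page agent loop.
const toolRegistry = {};

// Register a tool once, expose it twice: internally via toolRegistry,
// and externally via navigator.modelContext when the browser has it.
function registerTool(tool) {
  toolRegistry[tool.name] = tool.execute; // internal wiring
  if (globalThis.navigator?.modelContext) {
    navigator.modelContext.registerTool(tool); // external interface
  }
}

registerTool({
  name: "count_words",
  description: "Count words in the loaded document",
  inputSchema: {},
  execute: async () => ({ wordCount: 423 }),
});
```

The guard also means the same code runs unchanged in browsers without the flag enabled: external discovery silently degrades while the local agent loop keeps working.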
The Real Use Cases This Unlocks
Working through this POC made the practical applications obvious:
Legal and medical document review — documents never leave the device, no compliance questions about data egress, AI assistance is still fully functional.
Internal enterprise tools — no per-token API costs, no rate limits, works on restricted networks. A tool used by a hundred people all day costs the same as one person using it for five minutes.
Agent-ready web apps — with WebMCP, your web app becomes part of the broader AI agent ecosystem automatically. When Claude in Chrome, Cursor, or any MCP-compatible agent visits your page, it gets a clean typed API instead of having to scrape your UI.
Offline-first applications — once the model is cached, everything works without connectivity. Field tools, healthcare applications, anything that needs to work in low-connectivity environments.
The Honest Trade-offs
Model capability: A 0.5B parameter model is good at focused tasks but not at complex multi-step reasoning. If you need GPT-4 level capability, you still need a cloud model. Browser-native AI is excellent for scoped, well-defined tasks.
First load: ~300MB is a real UX consideration. Users need to see clear progress and understand what's happening. After the first download it's instant.
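Transformers.js accepts a `progress_callback` option on `pipeline()` for exactly this. A sketch of turning its events into user-facing text — the event fields used here (`status`, `file`, `progress`) match what I've observed, but treat them as assumptions rather than a documented contract:

```js
// Render download progress so the ~300MB first load isn't a blank screen.
function formatProgress(event) {
  if (event.status === "progress") {
    return `Downloading ${event.file}: ${Math.round(event.progress)}%`;
  }
  return event.status; // e.g. "initiate", "done", "ready"
}

// Usage inside the worker (not run here, since it triggers the download):
// import { pipeline } from "@huggingface/transformers";
// const generator = await pipeline(
//   "text-generation",
//   "onnx-community/Qwen2.5-0.5B-Instruct", // assumed model id
//   {
//     progress_callback: (e) =>
//       postMessage({ type: "progress", text: formatProgress(e) }),
//   }
// );
```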
WebMCP maturity: Chrome 146 Canary only, behind a flag. No Firefox or Safari signals yet. The @mcp-b/global polyfill bridges the gap for demos and development, but native agent discovery requires Chrome Canary for now.
WebGPU on Windows: GPU driver crashes with WebGPU are common on Windows right now. WASM is the stable fallback — slower but reliable everywhere.
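Choosing the backend defensively is cheap: probe for a WebGPU adapter and fall back to WASM if anything goes wrong. The `device` option is real Transformers.js API; the adapter probe and `pickDevice` helper are my own sketch:

```js
// Prefer WebGPU only when an adapter is actually available; otherwise
// use the WASM backend, which is slower but stable everywhere.
async function pickDevice() {
  try {
    const adapter =
      globalThis.navigator?.gpu &&
      (await navigator.gpu.requestAdapter());
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm"; // no adapter or a driver failure → safe fallback
  }
}

// Usage (commented out — needs a browser and the model download):
// const generator = await pipeline("text-generation", MODEL_ID, {
//   device: await pickDevice(),
// });
```

This doesn't catch crashes that happen mid-inference, but it avoids the most common failure mode of requesting WebGPU on machines that can't provide it.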
The Bigger Picture
What this combination demonstrates is that AI development has a new axis to reason about: not just which model, but where the model runs and how your app exposes itself to agents. For a long time, "frontend developer" and "AI developer" were different job descriptions. Transformers.js and WebMCP are collapsing that gap. A frontend developer who understands these two APIs can now ship a complete AI-powered workflow — with local inference, structured tool use, and agent interoperability — without touching a backend. The web page is becoming an AI-native surface. Not just a UI for humans, but a typed interface for agents. WebMCP is the standard that makes that possible without giving up the browser's security model or the user's privacy.