In the dynamic world of front-end development, the tools and APIs available to us are constantly evolving. For years, integrating AI into a web application meant relying on heavy server-side processing, expensive API calls to cloud providers, and dealing with the inherent latency of network requests. Today, a significant shift is occurring, and AI is moving to the edge, directly into the user’s browser.
Google Chrome’s new Prompt API, powered by the on-device Gemini Nano model, is a game changer. It lets front-end developers run natural language processing tasks locally, which keeps user data on the device, removes network round trips from inference, and allows the app to work offline. However, providing a user with an empty text box and asking an on-device model to “do everything” rarely yields enterprise-grade results. To build truly smart, autonomous applications, we need Agents.
In this comprehensive guide, we will explore how to build a robust, LangChain-style AI agent architecture entirely in the browser. We will tackle the architectural challenges of running AI on the UI thread, leverage Web Workers for heavy lifting, and implement the ReAct (Reason + Act) pattern to give our on-device model the ability to use custom tools.
The Architectural Challenge: AI on the Main Thread
Before we dive into the code, we must address the elephant in the room: performance.
As front-end architects, we know that the browser’s main thread is responsible for everything the user interacts with: rendering the DOM, handling click events, executing JavaScript, and painting pixels to the screen. When you introduce a computationally heavy task to the main thread, the UI freezes. This phenomenon, known as “jank,” degrades the user experience significantly.
While Gemini Nano is highly optimized for edge devices, querying a Large Language Model (LLM) is still a heavy asynchronous operation. If we were to run a complex, multi-step agent loop (where the model thinks, acts, observes, and thinks again) entirely on the main thread, the browser would struggle to maintain a smooth 60 frames per second (FPS).
The Web Worker Dilemma
The standard solution for offloading heavy computations in the browser is the Web Worker. Web Workers allow you to run JavaScript in background threads, separate from the UI.
However, in Chrome’s current implementation, the Prompt API (exposed as the global LanguageModel) is not available inside Web Workers. The limitation stems from the complexity of determining which document is responsible for a worker when checking its Permissions Policy status (that is, whether the site is allowed to run the model).
The Proxy Architecture Solution
To solve this, we must implement a Proxy Architecture.
- The Main Thread: Acts as the “LLM Server.” It holds the actual LanguageModel session and listens for prompts.
- The Web Worker: Acts as the “Agent Orchestrator.” It runs the heavy logic loop, executes background tools (like fetching data or parsing large strings), and uses Remote Procedure Calls (RPC) via postMessage to ask the Main Thread for LLM completions.
This decoupling keeps our application responsive, even when the agent is executing a 10-step chain of thought.
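The RPC part of this design comes down to matching each reply to the request that caused it. The sketch below shows one way to do that with an id and a map of pending promises. The helper name `createRpcClient` and the `{ id, ok, payload }` message shape are assumptions made for illustration; the library code later in this article uses its own message types.

```javascript
// Hypothetical helper: correlates postMessage requests with their replies by id.
// `port` is anything with postMessage() and an onmessage slot (a Worker, or `self` inside a worker).
function createRpcClient(port) {
  let nextId = 0;
  const pending = new Map();

  // Replies are expected to echo the id of the request they answer.
  port.onmessage = (event) => {
    const { id, ok, payload } = event.data;
    const entry = pending.get(id);
    if (!entry) return; // unknown or already-settled id
    pending.delete(id);
    if (ok) entry.resolve(payload);
    else entry.reject(new Error(payload));
  };

  // Returns a promise that settles when the matching reply arrives.
  return function call(payload) {
    return new Promise((resolve, reject) => {
      const id = ++nextId;
      pending.set(id, { resolve, reject });
      port.postMessage({ id, payload });
    });
  };
}
```

Because replies are matched by id rather than by arrival order, several requests can be in flight at once without being mixed up.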
Borrowing from LangChain: The ReAct Pattern
To make our on-device model autonomous, we will borrow a concept popularized by frameworks like LangChain: the ReAct (Reason + Act) pattern.
Traditional prompting involves asking a question and getting a static answer. Agentic prompting involves giving the model a goal, a set of tools, and a structured loop to follow.
In the ReAct pattern, the LLM acts as the routing engine. Given a prompt, the orchestrating code runs a loop in which each iteration follows these steps:
- Reason (Thought): The model evaluates the current state and decides what it needs to do next.
- Act (Action): The model selects a tool from its available arsenal and provides the input for that tool.
- Observe (Observation): The system executes the tool, captures the output, and feeds it back to the model.
- Repeat: The model reasons about the new observation until it can formulate a Final Answer.
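Before the full implementation, the four steps above can be condensed into a few lines. This is a minimal sketch, not the engine built later in this article: `reactLoop`, the `model` callback, and the `{ thought, action, input, answer }` step shape are hypothetical stand-ins so the loop can run without a real LLM.

```javascript
// Minimal ReAct loop. `model` is an async function that receives the transcript
// and returns the next step; `tools` is a Map of tool name -> function.
async function reactLoop(goal, model, tools, maxSteps = 5) {
  const transcript = [`User: ${goal}`];
  for (let step = 0; step < maxSteps; step++) {
    // Reason: the model reads the transcript and proposes the next step.
    const { thought, action, input, answer } = await model(transcript.join("\n"));
    transcript.push(`Thought: ${thought}`);
    if (answer) return { answer, transcript };

    // Act: run the chosen tool; an unknown name becomes an observation, not a crash.
    const tool = tools.get(action);
    const observation = tool ? await tool(input) : `Unknown tool "${action}"`;

    // Observe: feed the result back so the next iteration can reason about it.
    transcript.push(`Action: ${action}(${input})`, `Observation: ${observation}`);
  }
  return { answer: null, transcript }; // step budget exhausted
}
```

The step limit matters as much as the loop itself: without it, a model that never produces an answer would keep the agent running indefinitely.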
Enforcing Structure with JSON Schemas
Large cloud models can often follow ReAct instructions via plain text prompts. However, smaller on-device models like Gemini Nano require stricter guardrails.
To prevent the model from hallucinating tool names or outputting unstructured text, we utilize the responseConstraint feature of the Prompt API. By passing a JSON Schema to the API, we force the model to output strict, deterministically parsable JSON on every iteration.
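One way to tighten the guardrail further is to list the allowed tool names in the schema itself. The sketch below is an assumption-laden illustration: `buildAgentSchema` is a hypothetical helper, the field names mirror the schema used in Step 2 of this article, and it only helps if the browser's constraint implementation honors the JSON Schema `enum` keyword.

```javascript
// Hypothetical helper: builds a JSON Schema whose "toolName" is limited to
// the registered tool names plus the "none" off-ramp.
function buildAgentSchema(toolNames) {
  return {
    type: "object",
    properties: {
      thought: { type: "string" },
      toolName: { type: "string", enum: [...toolNames, "none"] },
      toolInput: { type: "string" },
      finalAnswer: { type: "string" },
    },
    required: ["thought", "toolName", "toolInput", "finalAnswer"],
    additionalProperties: false,
  };
}
```

The result is plain data, so it can be sent through postMessage and passed as the responseConstraint value. The runtime check for unknown tool names is still worth keeping as a second line of defense.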
Step 1: The Main Thread Host
Let’s start building. The first piece of our architecture is the PromptChainHost. This class lives on the main thread. Its job is deliberately narrow: instantiate the Web Worker, initialize the Gemini Nano session, and act as a bridge between the background agent and the Prompt API.
import { MessageContext } from "./consts.js";
export class PromptChainHost {
constructor(workerUrl) {
this.worker = new Worker(workerUrl, { type: 'module' });
this.session = null;
this.callbacks = new Map();
this.msgId = 0;
this.worker.onmessage = this.handleWorkerMessage.bind(this);
}
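async init(systemPrompt) {
// Current Chrome documentation passes the system prompt as the first entry of initialPrompts
this.session = await LanguageModel.create({
initialPrompts: [{ role: 'system', content: systemPrompt }]
});
}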
async handleWorkerMessage(e) {
const { id, type, payload } = e.data;
if (type === MessageContext.llmRequest) {
try {
// Force the LLM to output JSON so we can parse tool requests
const responseText = await this.session.prompt(payload.prompt, {
responseConstraint: payload.schema
});
this.worker.postMessage({ id, type: MessageContext.llmResponse, payload: responseText });
} catch (err) {
this.worker.postMessage({ id, type: MessageContext.llmError, payload: err.message });
}
}
// Use the shared constant so the check matches what the worker actually sends
else if (type === MessageContext.agentLog) {
window.dispatchEvent(new CustomEvent(MessageContext.agentLog, { detail: payload }));
}
else if (type === MessageContext.agentComplete) {
const cb = this.callbacks.get(id);
if (cb) {
cb.resolve(payload);
}
this.callbacks.delete(id);
}
else if (type === MessageContext.agentError) {
const cb = this.callbacks.get(id);
if (cb) {
cb.reject(new Error(payload));
}
this.callbacks.delete(id);
}
}
runAgent(userPrompt) {
return new Promise((resolve, reject) => {
const id = ++this.msgId;
this.callbacks.set(id, { resolve, reject });
this.worker.postMessage({ id, type: MessageContext.startLoop, payload: userPrompt });
});
}
}
Architectural Nuances of the Host
Notice how thin this layer is. The host does not know what tools exist, it does not parse JSON, and it does not handle any business logic. By keeping the main thread lean, we ensure that the browser’s rendering engine has maximum resources available. The window.dispatchEvent mechanism is particularly useful here, as it allows us to decouple the core logic from our React, Angular, or Vanilla JS components.
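To make the decoupling concrete, here is a small sketch of how a UI component might subscribe to the log events and clean up after itself. The helper name `subscribeToAgentLogs` is hypothetical, and the event name is passed in rather than hard-coded because the actual value lives in consts.js, which is not shown in this article.

```javascript
// Hypothetical helper: subscribes to agent log events on any EventTarget
// (for example `window`) and returns a function that removes the listener.
function subscribeToAgentLogs(target, eventName, onLog) {
  const handler = (event) => onLog(event.detail);
  target.addEventListener(eventName, handler);
  return () => target.removeEventListener(eventName, handler);
}
```

A component would call this when it mounts, for example `const stop = subscribeToAgentLogs(window, MessageContext.agentLog, appendLine)` where `appendLine` is the component's own callback, and call `stop()` when it unmounts so no stale listeners remain.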
Step 2: The Worker Core Engine
This is the heart of our library. Because we cannot pass JavaScript functions (tools) via postMessage due to the browser’s Structured Clone algorithm, we create a generic worker engine. Developers will import this file into their own worker scripts to define their tools.
This file handles the while loop, manages the conversation context string, parses the JSON, and manually triggers the functions.
import { MessageContext } from "./consts.js";
export class Tool {
constructor(name, description, executeFn) {
this.name = name;
this.description = description;
this.executeFn = executeFn;
}
}
export function createAgentWorker(toolsArray, maxIterations = 7) {
let msgId = 0;
const resolvers = new Map();
const toolsMap = new Map(toolsArray.map(t => [t.name, t]));
// Schema to force the LLM to choose between thinking, acting, or answering
const agentSchema = {
"type": "object",
"properties": {
"thought": { "type": "string", "description": "Reasoning for the current step." },
"toolName": { "type": "string", "description": "Name of tool to use. Empty if no tool needed." },
"toolInput": { "type": "string", "description": "Input for the tool." },
"finalAnswer": { "type": "string", "description": "The final answer. Empty if using a tool." }
},
"required": ["thought", "toolName", "toolInput", "finalAnswer"],
};
function askLLM(prompt) {
return new Promise((resolve, reject) => {
const id = ++msgId;
resolvers.set(id, { resolve, reject });
self.postMessage({ id, type: MessageContext.llmRequest, payload: { prompt, schema: agentSchema } });
});
}
function logToMain(message) {
self.postMessage({ id: 0, type: MessageContext.agentLog, payload: message });
}
async function runReActLoop(userPrompt) {
let isComplete = false;
let finalResult = "";
let loopCount = 0;
// Inject available tools into the conversation context
const toolDescriptions = toolsArray.map(t => `- ${t.name}: ${t.description}`).join('\n');
let context = `System: You are an AI agent. Think step-by-step.
Available tools:
${toolDescriptions}
- none: Use this if you do not need a tool and can answer the user directly.
Rules:
1. If you need data, set "toolName" to a tool and "toolInput" to the query. Leave "finalAnswer" as "".
2. If you know the answer, set "toolName" to "none" and put the answer in "finalAnswer".
Conversation:
User: ${userPrompt}\n`;
while (!isComplete && loopCount < maxIterations) {
loopCount++;
const responseText = await askLLM(`${context}\nOutput your next step as JSON:`);
let response;
try {
response = JSON.parse(responseText);
} catch (e) {
context += `Observation: Invalid JSON format. Please output strictly valid JSON.\n`;
continue;
}
if (response.thought) {
logToMain(`Thought: ${response.thought}`);
}
if (response.finalAnswer && response.finalAnswer.trim() !== "") {
finalResult = response.finalAnswer;
isComplete = true;
} else if (response.toolName && response.toolName !== "none" && toolsMap.has(response.toolName)) {
logToMain(`Action: Running ${response.toolName} with input "${response.toolInput}"`);
try {
const tool = toolsMap.get(response.toolName);
const toolResult = await tool.executeFn(response.toolInput);
context += `Action: ${response.toolName}("${response.toolInput}")\n`;
context += `Observation: ${toolResult}\n`;
logToMain(`Observation: ${toolResult}`);
} catch (err) {
context += `Observation: Tool failed with error: ${err.message}\n`;
}
} else if (response.toolName === "none" || response.toolName === "") {
// Edge case: Model selected none, but forgot to populate finalAnswer
context += `Observation: You selected no tools, but didn't provide a finalAnswer. Please provide the final answer.\n`;
}
else {
// Fallback: Model hallucinated a tool that doesn't exist
context += `Observation: Tool '${response.toolName}' does not exist. Use an available tool or 'none'.\n`;
}
}
return finalResult || "Error: Reached maximum iterations.";
}
self.addEventListener('message', async (e) => {
const { id, type, payload } = e.data;
if (type === MessageContext.llmResponse) {
resolvers.get(id)?.resolve(payload);
resolvers.delete(id);
} else if (type === MessageContext.llmError) {
resolvers.get(id)?.reject(new Error(payload));
resolvers.delete(id);
} else if (type === MessageContext.startLoop) {
try {
const answer = await runReActLoop(payload);
self.postMessage({ id, type: MessageContext.agentComplete, payload: answer });
} catch (err) {
self.postMessage({ id, type: MessageContext.agentError, payload: err.message });
}
}
});
}
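The structured clone restriction mentioned at the start of this step is easy to verify. The sketch below uses the standard `structuredClone` function, which applies the same algorithm as postMessage; the helper name `isCloneable` is made up for this example.

```javascript
// Returns true if `value` could be sent through postMessage,
// false if the structured clone algorithm rejects it.
function isCloneable(value) {
  try {
    structuredClone(value);
    return true;
  } catch {
    return false; // e.g. DataCloneError when the value contains a function
  }
}

// Tool metadata is plain data and clones fine; the executable part does not.
const metadataOnly = { name: "Calculator", description: "Evaluates math expressions." };
const withFunction = { ...metadataOnly, executeFn: (x) => x };
```

Here `isCloneable(metadataOnly)` is true and `isCloneable(withFunction)` is false, which is why the tools are defined inside the worker script instead of being passed in from the main thread.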
The Importance of the “None” Off-Ramp
One of the most critical parts of this code is the handling of the “none” tool state. Smaller models, when presented with a JSON schema that demands a toolName, will often try to satisfy the schema even if they already know the answer (e.g., answering “What is a car?”).
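By explicitly defining none: Use this if you do not need a tool, we give the model permission to bypass the tool execution block entirely. We also have to treat an empty or whitespace-only finalAnswer as “no answer yet”: the schema requires the field on every step, so the model fills it with an empty string while it is still using tools. The response.finalAnswer.trim() !== "" check keeps a blank answer from being mistaken for the final result, and the dedicated none branch nudges the model toward an answer instead of letting it burn through all of its iterations in the fallback path.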
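The order of those checks is the subtle part, so it can help to see the decision in isolation. The sketch below restates the branching of the loop as a pure function; `classifyStep` and its return labels are hypothetical names, and unlike the loop above it also trims toolName.

```javascript
// Hypothetical helper: decides what the loop should do with one parsed step.
// `knownTools` is a Map or Set keyed by tool name.
function classifyStep(step, knownTools) {
  const answer = (step.finalAnswer ?? "").trim();
  const tool = (step.toolName ?? "").trim();

  if (answer !== "") return "final";                           // a real answer wins
  if (tool === "" || tool === "none") return "missing-answer"; // off-ramp taken, answer forgotten
  return knownTools.has(tool) ? "tool" : "unknown-tool";       // run it, or report the hallucination
}
```

Checking the answer first means a step that names a tool and also supplies an answer is treated as finished, which matches the behavior of the loop in this step.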
Step 3: The Consumer’s Implementation
Now that the generic engine is built, we can consume it. This file is the actual Web Worker script that your application will load. Here, you define the tools specific to your business domain. Because these functions run in the worker, they can perform heavy fetch calls, parse massive CSV files, or execute complex mathematics without ever blocking the UI.
import { Tool, createAgentWorker } from './prompt-chain-worker.js';
// Define custom tools
const fetchTool = new Tool(
"FetchData",
"Fetches text content from a URL.",
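async (url) => {
const res = await fetch(url);
// Surface HTTP failures as tool errors so the agent sees them as an observation
if (!res.ok) {
throw new Error(`Request failed with status ${res.status}`);
}
return await res.text();
}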
);
const mathTool = new Tool(
"Calculator",
"Evaluates math expressions (e.g. '100 * 5').",
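(expression) => {
// The input is model-generated, so allow only digits, whitespace, and arithmetic characters before evaluating it
if (!/^[\d\s+\-*\/().]+$/.test(expression)) {
throw new Error("Unsupported characters in expression.");
}
return String(eval(expression));
}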
);
// Initialize the generic engine with these specific tools
createAgentWorker([fetchTool, mathTool]);
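Because the Calculator tool evaluates text written by the model, a small parser that avoids eval entirely is one alternative worth considering. The following is a sketch of a recursive descent evaluator for +, -, *, / and parentheses. The function name `evaluateArithmetic` is made up for this example and it is not part of the library.

```javascript
// Evaluates + - * / and parentheses without eval. Throws SyntaxError on anything else.
function evaluateArithmetic(expression) {
  // Numbers, operators, and parentheses become tokens; any other non-space
  // character becomes a one-character token that the parser rejects.
  const tokens = expression.match(/\d+(?:\.\d+)?|[-+*\/()]|\S/g) ?? [];
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  // expression := term (('+' | '-') term)*
  function parseExpression() {
    let value = parseTerm();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      const rhs = parseTerm();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  // term := factor (('*' | '/') factor)*
  function parseTerm() {
    let value = parseFactor();
    while (peek() === "*" || peek() === "/") {
      const op = next();
      const rhs = parseFactor();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  // factor := number | '-' factor | '(' expression ')'
  function parseFactor() {
    const token = next();
    if (token === undefined) throw new SyntaxError("Unexpected end of input");
    if (token === "-") return -parseFactor();
    if (token === "(") {
      const value = parseExpression();
      if (next() !== ")") throw new SyntaxError("Expected ')'");
      return value;
    }
    if (/^\d/.test(token)) return Number(token);
    throw new SyntaxError(`Unexpected token "${token}"`);
  }

  const result = parseExpression();
  if (pos < tokens.length) throw new SyntaxError(`Unexpected token "${peek()}"`);
  return result;
}
```

To use it, the tool callback would become `(expression) => String(evaluateArithmetic(expression))`. Any thrown SyntaxError is caught by the agent loop and reported back to the model as a failed observation.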
Step 4: Tying it Together in the UI
Finally, we construct an HTML document to test our architecture. We listen for the custom agent-log events dispatched by the host to build a LangChain-style verbose output console, allowing the user to watch the model “think” in real time.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Prompt API Agent Test</title>
<link rel="stylesheet" href="./styles.css">
</head>
<body>
<h1>On-Device Agent Tester</h1>
<div class="controls">
<input type="text" id="prompt-input" value="What is 542 multiplied by 13?" />
<button id="run-btn">Run Agent</button>
</div>
<h3>Agent Reasoning Stream:</h3>
<div id="logs"></div>
<h3>Final Output:</h3>
<div id="result"></div>
<script type="module">
import { PromptChainHost } from './prompt-chain-host.js';
import {MessageContext} from "./consts.js";
const logBox = document.getElementById('logs');
const resultBox = document.getElementById('result');
const runBtn = document.getElementById('run-btn');
const input = document.getElementById('prompt-input');
// Listen for custom events dispatched by the Host
window.addEventListener(MessageContext.agentLog, (e) => {
const msg = e.detail;
let className = 'log-entry thought';
if (msg.startsWith('Action:')) className = 'log-entry action';
if (msg.startsWith('Observation:')) className = 'log-entry observation';
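// Build the entry with textContent so model or tool output is never parsed as HTML
const entry = document.createElement('div');
entry.className = className;
entry.textContent = msg;
logBox.appendChild(entry);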
logBox.scrollTop = logBox.scrollHeight;
});
async function init() {
if (!window.LanguageModel) {
resultBox.textContent = "Error: LanguageModel is not available. Please ensure Chrome flags are enabled.";
return;
}
const availability = await LanguageModel.availability();
if (availability === "unavailable") {
resultBox.textContent = "LanguageModel API is not available on this device/browser.";
return;
}
runBtn.disabled = true;
runBtn.textContent = "Initializing Model...";
// Instantiate the host, pointing it to the developer's custom worker
const host = new PromptChainHost('./my-agent.js');
await host.init(
"You are a logical assistant. You must always output JSON matching the provided schema. Think step-by-step."
);
runBtn.disabled = false;
runBtn.textContent = "Run Agent";
runBtn.addEventListener('click', async () => {
logBox.innerHTML = '';
resultBox.textContent = 'Thinking...';
runBtn.disabled = true;
try {
const finalAnswer = await host.runAgent(input.value);
resultBox.textContent = finalAnswer;
} catch (error) {
resultBox.textContent = `Error: ${error.message}`;
} finally {
runBtn.disabled = false;
}
});
}
init();
</script>
</body>
</html>
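The page above only distinguishes "unavailable" from everything else. As a sketch, the other availability states can be mapped to user-facing messages with a small pure function. The helper name `describeAvailability` and the message wording are illustrative, and the status strings are the ones the Prompt API documents at the time of writing, so treat them as an assumption if the API changes.

```javascript
// Hypothetical helper: maps the string returned by LanguageModel.availability()
// to a flag and a message the UI can show.
function describeAvailability(status) {
  switch (status) {
    case "available":
      return { canRun: true, message: "Model is ready." };
    case "downloadable":
      return { canRun: true, message: "Model will be downloaded on first use." };
    case "downloading":
      return { canRun: true, message: "Model download is in progress." };
    case "unavailable":
      return { canRun: false, message: "LanguageModel API is not available on this device/browser." };
    default:
      // Unknown values are treated conservatively.
      return { canRun: false, message: `Unrecognized availability status: ${status}` };
  }
}
```

In init(), the result could drive the button label so that users understand why the first run may take longer while the model downloads.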
Summary
By combining Chrome’s native Prompt API with the asynchronous, non-blocking nature of Web Workers, we have created a front-end architecture that keeps agent orchestration off the main thread.
We avoided framework lock-in (the proverbial “Catholic wedding”) by building our core logic only on vanilla browser APIs, so this code can be integrated into React, Vue, or Web Components architectures. Furthermore, by implementing the ReAct pattern manually with strict JSON schemas, we bridged the gap between a simple chat interface and a truly autonomous, tool-wielding local agent.
The shift to on-device AI is not just a trend; it is a fundamental architectural change that reduces cloud costs and enhances user privacy. With this setup, your users can run complex agentic workflows directly in their browser while the UI stays responsive.
If you are interested in the code, you can find it on my GitHub: https://github.com/gilf/prompt-chain.
