Wilson Xu

Posted on Mar 19

Building Chrome Extensions That Bridge AI Agents to Your Browser

#chrome #ai #typescript #tutorial

AI agents can read files, call APIs, and write code. But they cannot see what is on your screen, click a button in a web app, or read the DOM of the page you are looking at. The browser remains a walled garden -- unless you build a bridge.

This tutorial shows you how to build a Chrome extension that exposes browser state to AI agents through two patterns: a CDP (Chrome DevTools Protocol) relay that proxies low-level browser commands, and an MCP (Model Context Protocol) bridge that wraps those capabilities as structured tools. By the end, you will have a working TypeScript extension (Manifest V3) that lets any MCP-compatible agent read pages, fill forms, and navigate the web using your authenticated session.

Why an Extension and Not Just Puppeteer?

The obvious question: why build an extension when you can connect to Chrome over CDP directly?

Three reasons make extensions the right choice for agent-browser integration:

1. No launch flags required. Direct CDP access requires starting Chrome with --remote-debugging-port=9222. That means quitting your browser, relaunching from the terminal, and hoping no other process grabs that port. An extension works in any running Chrome instance with zero configuration.

2. Access to privileged APIs. Extensions get access to chrome.debugger, chrome.tabs, chrome.scripting, chrome.cookies, and chrome.webNavigation -- all without a raw CDP socket. The chrome.debugger API is particularly powerful: it gives you a sandboxed CDP channel per tab, managed by Chrome's permission system instead of an open network port.

3. Security boundaries. An open debugging port exposes every tab to any process on the machine (or the network, if bound to 0.0.0.0). An extension communicates through Chrome's internal messaging, and you control exactly which capabilities to expose and to whom.

The tradeoff is that extensions run inside Chrome's sandbox and cannot spawn processes or open server sockets. We solve this by having the extension connect outward to a local relay server, inverting the typical client-server relationship.

Architecture Overview

The system has four components:

┌─────────────────────────────────────────────────────────┐
│                    Chrome Browser                        │
│                                                         │
│  ┌─────────────────┐       chrome.debugger API          │
│  │  Extension       │──────────────────────►  Tab DOM   │
│  │  (Service Worker │       chrome.scripting             │
│  │   + Content      │──────────────────────►  Page JS   │
│  │   Scripts)       │                                   │
│  └────────┬─────────┘                                   │
│           │  WebSocket (outbound)                       │
└───────────┼─────────────────────────────────────────────┘
            │
            ▼
┌───────────────────────┐       stdio / SSE       ┌──────────────┐
│  Local Relay Server   │◄───────────────────────►│   AI Agent   │
│  (Node.js, port 9800) │      MCP Protocol       │  (Claude,    │
│                       │                          │   GPT, etc.) │
└───────────────────────┘                          └──────────────┘

The extension uses chrome.debugger and chrome.scripting to interact with tabs, connecting outward to the relay via WebSocket. The relay translates between that WebSocket and the MCP protocol (stdio or SSE), so any MCP-compatible AI client can connect. The agent sees tools like browser_navigate and browser_read_page -- it never knows a Chrome extension is on the other end. This separation means the extension never needs to run an HTTP server (impossible from a service worker).
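The WebSocket hop between the relay and the extension carries small JSON envelopes. Below is a minimal sketch of their shape, inferred from the code later in this tutorial. The type names and the makeRequest and isRelayResponse helpers are illustrative and not part of any library:

```typescript
// Illustrative envelope types for relay <-> extension messages.
interface RelayRequest {
  id: string;
  method: string; // e.g. "browser.readPage", or a raw CDP method name
  params: Record<string, unknown>;
}

type RelayResponse =
  | { id: string; result?: unknown }
  | { id: string; error: { message: string } };

// Hypothetical helper: builds a request envelope.
function makeRequest(
  id: number,
  method: string,
  params: Record<string, unknown> = {}
): RelayRequest {
  return { id: String(id), method, params };
}

// Hypothetical guard: true when a parsed message looks like a response.
function isRelayResponse(msg: unknown): msg is RelayResponse {
  if (typeof msg !== "object" || msg === null) return false;
  const m = msg as Record<string, unknown>;
  if (typeof m.id !== "string") return false;
  // "result" can be absent when a handler returned undefined.
  if (!("error" in m)) return true;
  const err = m.error as { message?: unknown } | null;
  return typeof err === "object" && err !== null
    && typeof err.message === "string";
}
```

A guard like this lets the relay drop anything that is not a reply, such as a keepalive ping, before it looks up a pending request.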

Step 1: Extension Scaffold

Create the project structure:

mkdir chrome-ai-bridge && cd chrome-ai-bridge
npm init -y
npm install @modelcontextprotocol/sdk ws zod
npm install -D typescript tsx @types/ws @types/chrome
mkdir -p src/extension src/relay

Start with the manifest. This is Manifest V3, which Chrome requires for all new extensions:

// src/extension/manifest.json
{
  "manifest_version": 3,
  "name": "AI Browser Bridge",
  "version": "1.0.0",
  "description": "Exposes browser state to AI agents via MCP",
  "permissions": [
    "debugger",
    "tabs",
    "scripting",
    "activeTab",
    "cookies"
  ],
  "host_permissions": ["<all_urls>"],
  "background": {
    "service_worker": "background.js",
    "type": "module"
  },
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["content.js"],
      "run_at": "document_idle"
    }
  ]
}

The debugger permission is the critical one. It enables the chrome.debugger API, which gives us a full CDP channel to any tab without an external debugging port.

Step 2: The CDP Relay Pattern

The CDP relay pattern works like this: the extension attaches chrome.debugger to a target tab, then forwards CDP commands received over a WebSocket from the relay server. Responses flow back the same way. The extension acts as a CDP proxy, but one that lives inside Chrome's security model.

Here is the service worker that implements this:

// src/extension/background.ts

let socket: WebSocket | null = null;
let attachedTabs = new Map<number, boolean>();

function connectToRelay(): void {
  socket = new WebSocket("ws://localhost:9800/extension");

  socket.onmessage = async (event: MessageEvent) => {
    const { id, method, params } = JSON.parse(event.data as string);
    try {
      const handlers: Record<string, Function> = {
        "browser.navigate": navigateTab,
        "browser.readPage": readPage,
        "browser.click": clickElement,
        "browser.fill": fillInput,
        "browser.screenshot": captureScreenshot,
        "browser.evaluate": evaluateScript,
        "browser.getTabs": getOpenTabs,
      };
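      // Anything else is treated as a raw CDP method. Strip tabId from the
      // CDP params and fall back to the active tab when none is given.
      const handler = handlers[method]
        ?? (async ({ tabId, ...cdpParams }: any = {}) =>
          sendCDP(tabId ?? (await getActiveTabId()), method, cdpParams));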
      const result = await handler(params);
      socket?.send(JSON.stringify({ id, result }));
    } catch (error) {
      socket?.send(JSON.stringify({
        id, error: { message: (error as Error).message }
      }));
    }
  };

  socket.onclose = () => setTimeout(connectToRelay, 3000);
}

// Connect as soon as the service worker starts.
connectToRelay();
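The fixed three-second retry in onclose is fine for local development. If the relay may be down for longer stretches, a capped exponential backoff avoids reconnecting in a tight loop. This is a sketch only. The reconnectDelay name and its default values are assumptions, not part of the original code:

```typescript
// Hypothetical helper: capped exponential backoff, in milliseconds.
// attempt 0 -> baseMs, attempt 1 -> 2 * baseMs, and so on up to maxMs.
function reconnectDelay(
  attempt: number,
  baseMs = 1000,
  maxMs = 30_000
): number {
  return Math.min(maxMs, baseMs * 2 ** Math.max(0, attempt));
}
```

To use it, keep an attempt counter next to socket, reset it to 0 in an onopen handler, and schedule the retry with setTimeout(connectToRelay, reconnectDelay(attempt++)).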

The key function is sendCDP, which attaches the debugger to a tab if needed and sends raw CDP commands:

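// Register the detach listener once at module scope. Adding it inside
// ensureAttached would stack up a new listener on every attach.
chrome.debugger.onDetach.addListener((source) => {
  if (source.tabId !== undefined) attachedTabs.delete(source.tabId);
});

async function ensureAttached(tabId: number): Promise<void> {
  if (attachedTabs.has(tabId)) return;

  await chrome.debugger.attach({ tabId }, "1.3");
  attachedTabs.set(tabId, true);
}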

async function sendCDP(
  tabId: number,
  method: string,
  params: Record<string, unknown> = {}
): Promise<unknown> {
  await ensureAttached(tabId);
  return chrome.debugger.sendCommand({ tabId }, method, params);
}

Note the version string "1.3" in the attach call. This is the CDP protocol version, not the Chrome version. Version 1.3 is stable and supported across all modern Chrome releases.
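As an illustration of building on sendCDP, here is one way an evaluate operation could be written on top of the CDP Runtime.evaluate command. The evaluateViaCDP name is hypothetical. The sendCDP function is passed in as a parameter so the sketch runs outside Chrome, and the response type lists only the fields this sketch reads:

```typescript
// Same signature as the sendCDP function defined above.
type SendCDP = (
  tabId: number,
  method: string,
  params?: Record<string, unknown>
) => Promise<unknown>;

// Subset of the Runtime.evaluate response used here.
interface EvaluateResponse {
  result: { type: string; value?: unknown };
  exceptionDetails?: { text: string };
}

async function evaluateViaCDP(
  sendCDP: SendCDP,
  tabId: number,
  expression: string
): Promise<unknown> {
  const response = (await sendCDP(tabId, "Runtime.evaluate", {
    expression,
    returnByValue: true, // serialize the value instead of returning a handle
    awaitPromise: true,  // resolve promises before returning
  })) as EvaluateResponse;

  if (response.exceptionDetails) {
    throw new Error(`Evaluation failed: ${response.exceptionDetails.text}`);
  }
  return response.result.value;
}
```

Inside the extension this would be called as evaluateViaCDP(sendCDP, tabId, "document.title").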

Step 3: High-Level Browser Operations

Raw CDP commands are too low-level for an AI agent to use effectively. An agent does not want to call DOM.querySelector followed by DOM.getBoxModel followed by Input.dispatchMouseEvent. It wants to call click("#submit-button"). The extension bridges this gap:

async function readPage(params: Record<string, unknown>) {
  const tabId = (params.tabId as number) ?? (await getActiveTabId());

  const results = await chrome.scripting.executeScript({
    target: { tabId },
    func: () => {
      const getText = (el: Element): string => {
        if (el.tagName === "SCRIPT" || el.tagName === "STYLE") return "";
        return Array.from(el.childNodes)
          .map((n) => n.nodeType === Node.TEXT_NODE
            ? (n.textContent ?? "").trim()
            : n.nodeType === Node.ELEMENT_NODE ? getText(n as Element) : "")
          .filter(Boolean).join(" ");
      };
      return {
        title: document.title,
        url: location.href,
        text: getText(document.body).slice(0, 15000),
        links: Array.from(document.querySelectorAll("a[href]")).slice(0, 50)
          .map((a) => ({ text: a.textContent?.trim() ?? "",
                         href: (a as HTMLAnchorElement).href })),
      };
    },
  });
  return results[0].result;
}

async function fillInput(params: Record<string, unknown>) {
  const { tabId: rawTabId, selector, value } = params as {
    tabId?: number; selector: string; value: string;
  };
  const tabId = rawTabId ?? (await getActiveTabId());

  const results = await chrome.scripting.executeScript({
    target: { tabId },
    func: (sel: string, val: string) => {
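      const el = document.querySelector(sel) as
        HTMLInputElement | HTMLTextAreaElement | null;
      if (!el) return { filled: false, selector: sel };

      // Call the prototype's value setter rather than assigning el.value.
      // React tracks the value on the element instance, and a plain
      // assignment would make it ignore the input event dispatched below.
      // Pick the prototype that matches the element: calling the input
      // setter on a <textarea> throws.
      const proto = el instanceof HTMLTextAreaElement
        ? HTMLTextAreaElement.prototype
        : HTMLInputElement.prototype;
      const nativeSetter = Object.getOwnPropertyDescriptor(proto, "value")?.set;
      nativeSetter?.call(el, val);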
      el.dispatchEvent(new Event("input", { bubbles: true }));
      el.dispatchEvent(new Event("change", { bubbles: true }));
      return { filled: true, selector: sel };
    },
    args: [selector, value],
  });
  return results[0].result;
}

async function captureScreenshot(params: Record<string, unknown>) {
  const tabId = (params.tabId as number) ?? (await getActiveTabId());
  await ensureAttached(tabId);

  const result = (await chrome.debugger.sendCommand(
    { tabId }, "Page.captureScreenshot", { format: "png" }
  )) as { data: string };
  return { dataUrl: `data:image/png;base64,${result.data}` };
}

Two things are worth noting. First, fillInput calls the native value setter from the prototype instead of assigning to el.value directly. This matters mainly for React, which installs its own value tracker on each controlled input: a plain assignment updates that tracker as well, so when the input event fires React sees no difference and ignores it. Calling the prototype setter bypasses the tracker, and the dispatched input and change events then register as a real edit. Vue and Angular listen for those events directly, so dispatching them is what matters there. Second, captureScreenshot uses CDP via chrome.debugger.sendCommand rather than chrome.tabs.captureVisibleTab, because the latter can only capture the tab that is currently active in a window.
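The clickElement handler is not shown in this tutorial. One possible shape, following the DOM.querySelector, DOM.getBoxModel, Input.dispatchMouseEvent sequence mentioned at the start of this step, is sketched below. The clickViaCDP and quadCenter names are hypothetical, and sendCDP is injected so the sketch runs outside Chrome. It assumes the element is already scrolled into view:

```typescript
// Same signature as the sendCDP function defined in Step 2.
type SendCDP = (
  tabId: number,
  method: string,
  params?: Record<string, unknown>
) => Promise<unknown>;

// A CDP quad is eight numbers: x1, y1, x2, y2, x3, y3, x4, y4.
function quadCenter(quad: number[]): { x: number; y: number } {
  const x = (quad[0] + quad[2] + quad[4] + quad[6]) / 4;
  const y = (quad[1] + quad[3] + quad[5] + quad[7]) / 4;
  return { x, y };
}

async function clickViaCDP(sendCDP: SendCDP, tabId: number, selector: string) {
  const { root } = (await sendCDP(tabId, "DOM.getDocument")) as {
    root: { nodeId: number };
  };
  const { nodeId } = (await sendCDP(tabId, "DOM.querySelector", {
    nodeId: root.nodeId,
    selector,
  })) as { nodeId: number };
  if (!nodeId) return { clicked: false, selector }; // nodeId 0 means no match

  const { model } = (await sendCDP(tabId, "DOM.getBoxModel", { nodeId })) as {
    model: { content: number[] };
  };
  const { x, y } = quadCenter(model.content);

  // A click is a press followed by a release at the same point.
  for (const type of ["mousePressed", "mouseReleased"]) {
    await sendCDP(tabId, "Input.dispatchMouseEvent", {
      type, x, y, button: "left", clickCount: 1,
    });
  }
  return { clicked: true, selector, x, y };
}
```

Dispatching real input events through CDP exercises the same path as a user click. A simpler alternative is to call element.click() through chrome.scripting.executeScript, at the cost of skipping pointer events.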

Step 4: The MCP Bridge Server

The relay server sits between the extension and the AI agent. It speaks WebSocket to the extension and MCP (over stdio) to the agent. Here is the implementation:

// src/relay/server.ts

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { WebSocketServer, WebSocket } from "ws";
import { z } from "zod";

let extensionSocket: WebSocket | null = null;
let pendingRequests = new Map<
  string,
  { resolve: (v: unknown) => void; reject: (e: Error) => void }
>();
let nextId = 0;

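// Bind to loopback only so the relay is not reachable from the network.
const wss = new WebSocketServer({ host: "127.0.0.1", port: 9800 });

wss.on("connection", (ws, req) => {
  // Close anything that is not the extension endpoint instead of leaving
  // the socket open.
  if (req.url !== "/extension") {
    ws.close();
    return;
  }
  extensionSocket = ws;

  ws.on("message", (data) => {
    const msg = JSON.parse(data.toString());
    const pending = pendingRequests.get(msg.id);
    if (!pending) return; // also drops keepalive pings, which carry no id
    pendingRequests.delete(msg.id);
    if (msg.error) pending.reject(new Error(msg.error.message));
    else pending.resolve(msg.result);
  });

  // Only clear the reference if a newer connection has not replaced it.
  ws.on("close", () => {
    if (extensionSocket === ws) extensionSocket = null;
  });
});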

function callExtension(
  method: string,
  params: Record<string, unknown>
): Promise<unknown> {
  if (!extensionSocket) {
    throw new Error("Extension not connected");
  }
  const id = String(nextId++);
  return new Promise((resolve, reject) => {
    pendingRequests.set(id, { resolve, reject });
    extensionSocket!.send(JSON.stringify({ id, method, params }));
    setTimeout(() => {
      if (pendingRequests.delete(id)) reject(new Error("Timeout"));
    }, 30_000);
  });
}

// MCP server -- register one tool per browser operation
const server = new McpServer({ name: "chrome-ai-bridge", version: "1.0.0" });

server.tool(
  "browser_navigate",
  "Navigate a tab to a URL",
  {
    url: z.string().describe("The URL to navigate to"),
    tabId: z.number().optional().describe("Tab ID (default: active tab)"),
  },
  async ({ url, tabId }) => {
    const result = await callExtension("browser.navigate", { url, tabId });
    return { content: [{ type: "text", text: JSON.stringify(result) }] };
  }
);

server.tool(
  "browser_read_page",
  "Extract text content, title, URL, and links from a page",
  { tabId: z.number().optional() },
  async ({ tabId }) => {
    const result = await callExtension("browser.readPage", { tabId });
    return { content: [{ type: "text", text: JSON.stringify(result) }] };
  }
);

server.tool(
  "browser_click",
  "Click an element matching a CSS selector",
  {
    selector: z.string().describe("CSS selector"),
    tabId: z.number().optional(),
  },
  async ({ selector, tabId }) => {
    const result = await callExtension("browser.click", { selector, tabId });
    return { content: [{ type: "text", text: JSON.stringify(result) }] };
  }
);

// Additional tools follow the same pattern: browser_fill, browser_screenshot,
// browser_evaluate, browser_get_tabs. Each delegates to callExtension()
// with the matching method name. The screenshot tool returns MCP image content:
//
//   const result = await callExtension("browser.screenshot", { tabId });
//   return { content: [{
//     type: "image",
//     data: result.dataUrl.replace(/^data:image\/png;base64,/, ""),
//     mimeType: "image/png",
//   }] };

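async function main(): Promise<void> {
  // The stdio transport ships with the MCP TypeScript SDK. It is imported
  // here, next to its only use; a top-level import works equally well.
  const { StdioServerTransport } = await import(
    "@modelcontextprotocol/sdk/server/stdio.js"
  );
  // Log to stderr: stdout is reserved for MCP messages.
  console.error("[Relay] Listening on ws://localhost:9800");
  await server.connect(new StdioServerTransport());
}

main().catch((error) => {
  console.error("[Relay] Fatal:", error);
  process.exit(1);
});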

Notice how each MCP tool maps cleanly to an extension method. The relay is deliberately thin -- it does not interpret browser state or make decisions. It translates protocols and manages connection lifecycle.
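The screenshot conversion described in the comment inside server.ts can be pulled out into a small pure function. This is a sketch. The dataUrlToImageContent name and the McpImageContent interface are illustrative, though the object shape follows the MCP image content type (type, data, mimeType):

```typescript
// Shape of an MCP image content item.
interface McpImageContent {
  type: "image";
  data: string;     // base64 payload without the data URL prefix
  mimeType: string;
}

// Hypothetical helper: splits a base64 data URL into MCP image content.
function dataUrlToImageContent(dataUrl: string): McpImageContent {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) throw new Error("Expected a base64 data URL");
  return { type: "image", data: match[2], mimeType: match[1] };
}
```

The browser_screenshot tool could then return { content: [dataUrlToImageContent(result.dataUrl)] } once result has been narrowed to an object with a dataUrl string.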

Step 5: Registering the MCP Server

To make this available to AI agents, register it in your MCP client's configuration. For Claude Desktop, add this to claude_desktop_config.json:

{
  "mcpServers": {
    "chrome-ai-bridge": {
      "command": "npx",
      "args": ["tsx", "/path/to/src/relay/server.ts"]
    }
  }
}

For Claude Code, add it to .mcp.json:

{
  "mcpServers": {
    "chrome-ai-bridge": {
      "type": "stdio",
      "command": "npx",
      "args": ["tsx", "/path/to/src/relay/server.ts"]
    }
  }
}

The relay starts its WebSocket server on port 9800 and waits for the extension to connect. Once both sides are up, the AI agent has full browser access through the MCP tools.
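Because the relay listens on a local port, a web page that opens a WebSocket to localhost:9800 could try to pose as the extension. One possible guard is to compare the Origin header of the incoming connection against your extension's ID. This sketch assumes Chrome sends an Origin of the form chrome-extension://<extension id> for sockets opened by an extension, which is worth verifying in your own setup. The function name is hypothetical:

```typescript
// Hypothetical check: accept only connections whose Origin header matches
// the expected extension. Browsers set Origin themselves, so a web page
// cannot forge it; a local native process still can.
function isTrustedExtensionOrigin(
  origin: string | undefined,
  extensionId: string
): boolean {
  return origin === `chrome-extension://${extensionId}`;
}
```

In the relay's connection handler this would read req.headers.origin and close the socket when the check fails. It filters out web pages only, so it complements the loopback-only binding and does not replace it.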

Step 6: Content Script for Deep DOM Access

Some operations need direct DOM access that chrome.scripting.executeScript cannot provide efficiently -- for example, observing mutations over time or maintaining persistent state across multiple calls. A content script solves this:

// src/extension/content.ts

function getInteractiveElements() {
  const selectors = [
    "a[href]", "button", "input", "textarea", "select",
    "[role='button']", "[role='link']", "[onclick]",
  ];

  return Array.from(document.querySelectorAll(selectors.join(",")))
    .filter((el) => {
      const rect = el.getBoundingClientRect();
      const style = getComputedStyle(el);
      return rect.width > 0 && rect.height > 0
        && style.display !== "none" && style.visibility !== "hidden";
    })
    .slice(0, 200)
    .map((el) => {
      const rect = el.getBoundingClientRect();
      return {
        tag: el.tagName.toLowerCase(),
        id: el.id,
        text: (el.textContent ?? "").trim().slice(0, 100),
        rect: { x: Math.round(rect.x), y: Math.round(rect.y),
                width: Math.round(rect.width), height: Math.round(rect.height) },
      };
    });
}

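chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
  if (message.type === "getInteractiveElements") {
    sendResponse(getInteractiveElements());
  }
  // The response is sent synchronously, so do not return true: that would
  // hold the message channel open for messages this listener ignores.
  return false;
});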

This content script builds a semantic map of the visible interactive elements on the page: buttons, links, inputs, and anything with an ARIA button or link role or an inline onclick attribute. Handlers attached with addEventListener are invisible to a CSS selector, so those elements are not detected. Even so, this is much more useful to an AI agent than raw HTML. Instead of parsing a 500KB DOM tree, the agent gets a structured list of elements it can actually interact with, complete with bounding boxes for visual grounding.
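The service worker still has to ask the content script for this map. A minimal sketch of that call follows, with the messaging function injected so the code runs outside Chrome. In the extension it would wrap chrome.tabs.sendMessage. The fetchInteractiveElements and describeElement names are hypothetical:

```typescript
interface InteractiveElement {
  tag: string;
  id: string;
  text: string;
  rect: { x: number; y: number; width: number; height: number };
}

// In the extension: (tabId, msg) => chrome.tabs.sendMessage(tabId, msg)
type SendMessage = (tabId: number, message: unknown) => Promise<unknown>;

async function fetchInteractiveElements(
  tabId: number,
  sendMessage: SendMessage
): Promise<InteractiveElement[]> {
  const response = await sendMessage(tabId, { type: "getInteractiveElements" });
  // Treat anything other than an array as "no elements found".
  return Array.isArray(response) ? (response as InteractiveElement[]) : [];
}

// Hypothetical formatter: one compact line per element for the agent.
function describeElement(el: InteractiveElement, index: number): string {
  const idPart = el.id ? `#${el.id}` : "";
  const { x, y, width, height } = el.rect;
  return `[${index}] ${el.tag}${idPart} "${el.text}" at (${x},${y}) ${width}x${height}`;
}
```

Returning short lines instead of raw JSON is a design choice that trades structure for fewer tokens. Either form works with the MCP text content type.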

Handling Real-World Complexity

The Detach Problem

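Chrome can detach the debugger without your extension asking for it. The onDetach event reports why: canceled_by_user when the user dismisses the debugging banner, and target_closed when the tab goes away. Depending on the Chrome version, opening DevTools on the same tab may also detach it. Your extension needs to handle all of these gracefully with a module-level listener: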

chrome.debugger.onDetach.addListener((source, reason) => {
  if (source.tabId) {
    attachedTabs.delete(source.tabId);
    console.log(
      `[AI Bridge] Debugger detached from tab ${source.tabId}: ${reason}`
    );
  }
});

Always reattach lazily (in ensureAttached) rather than eagerly. If a tab's debugger gets detached, the next CDP command will reattach it automatically.
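If a detach event is ever missed, the attachedTabs map can claim a tab is attached when it is not. One defensive option, sketched here under that assumption, is to clear the entry and retry a failed command once. The withReattach name is hypothetical, and run stands in for a call such as () => sendCDP(tabId, method, params):

```typescript
// Hypothetical wrapper: on failure, forget the attach state for the tab
// and retry once. The retry goes through ensureAttached again via run().
async function withReattach<T>(
  tabId: number,
  attached: Map<number, boolean>,
  run: () => Promise<T>
): Promise<T> {
  try {
    return await run();
  } catch {
    attached.delete(tabId);
    return run(); // a second failure propagates to the caller
  }
}
```

This retries on any error, including ones unrelated to attachment, so it suits idempotent commands better than ones with side effects.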

Service Worker Lifecycle

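Chrome can terminate a Manifest V3 service worker after about 30 seconds of inactivity. Since Chrome 116, an open WebSocket extends the worker's lifetime, but only while messages are flowing: each message sent or received resets the idle timer. Send a keepalive ping well inside that window: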

setInterval(() => {
  if (socket?.readyState === WebSocket.OPEN) {
    socket.send(JSON.stringify({ method: "ping" }));
  }
}, 20_000);

Cross-Origin Iframes

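By default, chrome.scripting.executeScript runs in the top frame only. If the content you need is inside an iframe, either inject into every frame with allFrames: true, as below, or target specific frames with frameIds. Your host_permissions must cover the iframe's origin as well: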

const results = await chrome.scripting.executeScript({
  target: { tabId, allFrames: true },
  func: () => document.title,
});
// results is an array, one entry per frame
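Each entry in that array carries a frameId next to its result, and the top frame has frameId 0. A small sketch of collecting the useful entries follows. The FrameResult type is a trimmed stand-in for Chrome's injection result, and nonEmptyFrameResults is a hypothetical name:

```typescript
// Trimmed stand-in for the per-frame result returned by executeScript.
interface FrameResult<T> {
  frameId: number;
  result?: T;
}

// Keep only frames that returned a non-empty string, tagged by frame.
function nonEmptyFrameResults(
  results: FrameResult<string>[]
): { frameId: number; value: string }[] {
  return results
    .filter((r): r is { frameId: number; result: string } =>
      typeof r.result === "string" && r.result.length > 0)
    .map((r) => ({ frameId: r.frameId, value: r.result }));
}
```

Keeping the frameId lets a later call target that one frame through frameIds instead of injecting everywhere again.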

Rate Limiting and Safety

If you expose a browser_evaluate tool, block patterns like chrome.runtime, chrome.storage, window.open, and document.cookie = to prevent an agent from accidentally executing dangerous operations. This is not a security boundary -- pattern matching is always bypassable -- but it is a useful guardrail. Combine it with Chrome's extension permissions and the agent framework's own safety layers for defense in depth.
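A minimal sketch of such a guardrail, covering the four patterns named above. The regular expressions and the findBlockedPattern name are illustrative, and as noted this is trivially bypassable, for example through string concatenation:

```typescript
// Illustrative blocklist for code passed to browser_evaluate.
const BLOCKED_PATTERNS: { name: string; pattern: RegExp }[] = [
  { name: "chrome.runtime", pattern: /\bchrome\s*\.\s*runtime\b/ },
  { name: "chrome.storage", pattern: /\bchrome\s*\.\s*storage\b/ },
  { name: "window.open", pattern: /\bwindow\s*\.\s*open\s*\(/ },
  // Matches assignment but not == or === comparisons.
  { name: "document.cookie assignment",
    pattern: /\bdocument\s*\.\s*cookie\s*=(?!=)/ },
];

// Returns the name of the first blocked pattern found, or null if none.
function findBlockedPattern(code: string): string | null {
  const hit = BLOCKED_PATTERNS.find(({ pattern }) => pattern.test(code));
  return hit ? hit.name : null;
}
```

The evaluate handler can call findBlockedPattern before running anything and return an error that names the pattern, which gives the agent a chance to rephrase its request.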

Testing the Complete Flow

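Build the extension first so that the compiled background.js and content.js sit next to manifest.json, since Chrome cannot load the TypeScript sources directly. Then start the relay server with npx tsx src/relay/server.ts and load the extension in Chrome via chrome://extensions (Developer Mode, "Load unpacked," pointing at the folder that contains the manifest and the compiled scripts). The service worker connects to the relay automatically. If your MCP client launches the relay itself from the configuration in Step 5, skip the manual start: two copies would compete for port 9800.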

When an agent calls browser_read_page, the request flows: MCP client -> relay server -> WebSocket -> extension -> chrome.scripting.executeScript -> page context -> structured result back through the same chain. On localhost, expect round-trip latency of roughly 15-40ms for text extraction and 80-150ms for screenshots; the exact numbers depend on page size and hardware.

What You Can Build With This

Once an AI agent has browser access, practical applications include: extracting data from dashboards behind SSO using your existing login session; filling expense reports and time sheets from natural language instructions; reading data from one tab and entering it into another to bridge SaaS apps with no API integration; and capturing screenshots for visual QA with multimodal models. The key advantage in every case is that the agent uses your authenticated session: there are no credentials to manage, and traffic comes from your real browser profile rather than a fresh automated one, which makes bot-detection challenges less likely.

Conclusion

The combination of a Chrome extension and a local MCP relay server gives AI agents something they cannot get any other way: access to the web as you see it, with your cookies, your permissions, and your authenticated sessions. The CDP relay pattern handles low-level browser control. The MCP bridge presents clean, structured tools to any compatible agent. And the extension's security model keeps everything sandboxed inside Chrome's permission system.

The full architecture -- extension, relay, MCP server -- is roughly 400 lines of TypeScript. Small enough to understand completely, flexible enough to extend with any browser capability Chrome exposes. Start with the five core tools (navigate, readPage, click, fill, screenshot), then add domain-specific ones as your workflows demand.

The browser is the last major platform that AI agents cannot access natively. Extensions are the key that unlocks it.
