Wilson Xu


Turn Any Website Into a CLI API Using Chrome DevTools Protocol

Building a CLI That Controls Any Website Using Chrome DevTools Protocol and AI

Browser automation has traditionally meant spinning up headless instances, wrestling with authentication flows, and maintaining fragile selectors. But there is a better way. By connecting a CLI tool directly to your existing browser session through the Chrome DevTools Protocol (CDP), you can build command-line interfaces that control any website you are already logged into — no credential management, no CAPTCHA solving, no cookie juggling.

In this tutorial, you will build a TypeScript CLI that attaches to a running Chrome instance, executes scripts on live pages, extracts structured data, and bridges the gap between terminal workflows and the modern web. This is the same architectural pattern behind tools like Puppeteer's connect() mode and the emerging Model Context Protocol (MCP) ecosystem that lets AI agents drive browsers.

Why Reuse a Live Browser Session?

Most web automation tutorials start with puppeteer.launch() or playwright.chromium.launch(). That creates a fresh browser with no cookies, no extensions, no logged-in sessions. You immediately hit two problems:

  1. Authentication is hard. OAuth flows, MFA, CAPTCHAs, and bot detection all fight against automated logins.
  2. The page you see isn't the page your bot sees. Extensions, feature flags tied to accounts, and personalized content all differ.

By connecting to a browser you already use, you sidestep both problems entirely. Your CLI inherits every cookie, every extension, every authenticated session. If you can see it in your browser, your CLI can interact with it.

This pattern is especially powerful for:

  • Internal tools behind SSO where programmatic auth is impossible
  • Dashboards that require human login but repetitive data extraction
  • AI-assisted workflows where an LLM needs to read or act on page content
  • Developer tooling that augments your browser with terminal superpowers

Architecture Overview

The system has three layers:

┌──────────────┐     CDP (WebSocket)     ┌──────────────────┐
│   Your CLI   │ ◄──────────────────────► │  Chrome Browser   │
│  (Node.js)   │                          │  (--remote-debug) │
└──────┬───────┘                          └──────────────────┘
       │
       │  Optional: MCP Bridge
       │
┌──────▼───────┐
│   AI Agent   │
│  (LLM/Tool)  │
└──────────────┘

Chrome DevTools Protocol (CDP) is a WebSocket-based protocol that Chrome exposes when launched with the --remote-debugging-port flag. It gives you full programmatic access to every tab: DOM manipulation, JavaScript evaluation, network interception, cookie access, screenshots, and more.

Model Context Protocol (MCP) is an open standard for connecting AI models to external tools. By wrapping your CDP bridge as an MCP server, you let AI agents drive the browser through a well-defined tool interface — the same architecture used by Claude's computer use capabilities and similar systems.

The key insight: CDP is not just for testing. It is a general-purpose browser control API, and combining it with a CLI gives you a scriptable interface to the entire web.
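Concretely, every CDP command is just a JSON frame over that WebSocket: a client-chosen `id`, a `method` like `Runtime.evaluate`, and optional `params`; the response echoes the same `id`. A minimal sketch of the framing (the helper names here are illustrative, not part of any library):

```typescript
// Shape of a raw CDP command frame (illustrative, not a library type).
interface CdpCommand {
  id: number; // client-chosen sequence number; the response echoes it
  method: string; // e.g. "Runtime.evaluate"
  params?: Record<string, unknown>;
}

let nextId = 0;

// Build the JSON frame a CDP client writes to the WebSocket.
function buildCommand(method: string, params?: Record<string, unknown>): string {
  const cmd: CdpCommand = { id: ++nextId, method, params };
  return JSON.stringify(cmd);
}

// The kind of frame sent for Runtime.evaluate:
console.log(
  buildCommand("Runtime.evaluate", {
    expression: "document.title",
    returnByValue: true,
  })
);
```

Libraries like `chrome-remote-interface` handle this framing, the `id` bookkeeping, and event dispatch for you, which is why the rest of this tutorial never touches raw WebSockets.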

Step 1: Launch Chrome with Remote Debugging

First, start Chrome with the debugging port open. This is the only setup required on the browser side.

macOS:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222

Linux:

google-chrome --remote-debugging-port=9222

Windows:

& "C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222

Verify it is working by visiting http://localhost:9222/json/version in another browser or with curl:

curl http://localhost:9222/json/version

You should see JSON with the browser's WebSocket debugger URL:

{
  "Browser": "Chrome/131.0.6778.86",
  "Protocol-Version": "1.3",
  "webSocketDebuggerUrl": "ws://localhost:9222/devtools/browser/..."
}
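The sibling endpoint http://localhost:9222/json returns the full target list as a JSON array — pages, service workers, extensions, and devtools windows all mixed together. A small sketch of filtering it down to real tabs (field names per the CDP HTTP endpoint; the fetch call assumes Chrome is running on port 9222):

```typescript
// A CDP target as returned by http://localhost:9222/json (fields abridged).
interface RawTarget {
  id: string;
  type: string; // "page", "service_worker", "iframe", ...
  title: string;
  url: string;
  webSocketDebuggerUrl?: string;
}

// Keep only real page tabs; extension and devtools targets also appear in the list.
function pageTargets(targets: RawTarget[]): RawTarget[] {
  return targets.filter((t) => t.type === "page");
}

// With Chrome running on port 9222, fetch the live list:
// const targets: RawTarget[] = await (await fetch("http://localhost:9222/json")).json();
// console.log(pageTargets(targets).map((t) => t.title));
```

This is exactly what the library call in Step 3 does under the hood.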

Step 2: Project Setup

Initialize a TypeScript project with the CDP client library:

mkdir browser-cli && cd browser-cli
npm init -y
npm install chrome-remote-interface commander
npm install -D typescript tsx @types/node @types/chrome-remote-interface
npx tsc --init --target ES2022 --module Node16 --moduleResolution Node16

chrome-remote-interface is a lightweight CDP client. commander handles CLI argument parsing. We use tsx to run TypeScript directly during development.

Step 3: Connect to Chrome and List Open Tabs

Create src/browser.ts — the core module that manages the CDP connection:

import CDP from "chrome-remote-interface";

export interface TabInfo {
  id: string;
  title: string;
  url: string;
}

export async function listTabs(port = 9222): Promise<TabInfo[]> {
  const targets = await CDP.List({ port });
  return targets
    .filter((t) => t.type === "page")
    .map((t) => ({
      id: t.id,
      title: t.title,
      url: t.url,
    }));
}

export async function connectToTab(
  tabId: string,
  port = 9222
): Promise<CDP.Client> {
  const client = await CDP({ target: tabId, port });
  await client.Runtime.enable();
  await client.Network.enable();
  await client.DOM.enable();
  return client;
}

export async function evaluateOnPage<T>(
  client: CDP.Client,
  expression: string
): Promise<T> {
  const { result, exceptionDetails } = await client.Runtime.evaluate({
    expression,
    returnByValue: true,
    awaitPromise: true,
  });

  if (exceptionDetails) {
    throw new Error(
      `Page evaluation failed: ${exceptionDetails.text}\n` +
        `${exceptionDetails.exception?.description ?? ""}`
    );
  }

  return result.value as T;
}

Three functions, each doing one thing:

  • listTabs queries the CDP HTTP endpoint for all open page targets.
  • connectToTab opens a WebSocket connection to a specific tab and enables the domains we need.
  • evaluateOnPage runs arbitrary JavaScript in the context of the connected page and returns the result.
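Because evaluateOnPage sends a raw string to the page, embedding user input safely matters: a selector containing quotes would otherwise break out of the generated expression. JSON.stringify emits a valid JavaScript string literal, which is the escaping trick the extract command below relies on. A sketch of the idea in isolation:

```typescript
// Build a page-side expression that queries a selector without escaping bugs.
// JSON.stringify produces a valid JavaScript string literal, so quotes and
// backslashes in the selector cannot break out of the expression.
function textExpression(selector: string): string {
  return `Array.from(document.querySelectorAll(${JSON.stringify(selector)}))
    .map(el => el.textContent.trim())
    .filter(Boolean)`;
}

// Even a selector with embedded quotes stays a single safe literal:
console.log(textExpression(`a[title="it's fine"]`));
```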

Step 4: Build the CLI Interface

Create src/cli.ts:

import { program } from "commander";
import { listTabs, connectToTab, evaluateOnPage } from "./browser.js";

program
  .name("browser-cli")
  .description("Control your browser from the terminal")
  .version("1.0.0");

program
  .command("tabs")
  .description("List all open browser tabs")
  .action(async () => {
    const tabs = await listTabs();
    tabs.forEach((tab, i) => {
      console.log(`[${i}] ${tab.title}`);
      console.log(`    ${tab.url}`);
      console.log(`    id: ${tab.id}`);
    });
  });

program
  .command("eval <tabIndex> <script>")
  .description("Evaluate JavaScript in a tab")
  .action(async (tabIndex: string, script: string) => {
    const tabs = await listTabs();
    const tab = tabs[parseInt(tabIndex, 10)];
    if (!tab) {
      console.error(`No tab at index ${tabIndex}`);
      process.exit(1);
    }

    const client = await connectToTab(tab.id);
    try {
      const result = await evaluateOnPage(client, script);
      console.log(JSON.stringify(result, null, 2));
    } finally {
      await client.close();
    }
  });

program
  .command("extract <tabIndex>")
  .description("Extract structured page data from a tab")
  .option("--selector <css>", "CSS selector to extract text from")
  .option("--attr <name>", "Attribute to extract instead of textContent")
  .option("--json", "Output as JSON array")
  .action(async (tabIndex: string, opts) => {
    const tabs = await listTabs();
    const tab = tabs[parseInt(tabIndex, 10)];
    if (!tab) {
      console.error(`No tab at index ${tabIndex}`);
      process.exit(1);
    }

    const client = await connectToTab(tab.id);
    try {
      const selector = opts.selector || "h1, h2, h3, p";
      const attr = opts.attr || null;

      const data = await evaluateOnPage<string[]>(
        client,
        `Array.from(document.querySelectorAll(${JSON.stringify(selector)}))
          .map(el => ${attr ? `el.getAttribute(${JSON.stringify(attr)})` : "el.textContent.trim()"})
          .filter(Boolean)`
      );

      if (opts.json) {
        console.log(JSON.stringify(data, null, 2));
      } else {
        data.forEach((item) => console.log(item));
      }
    } finally {
      await client.close();
    }
  });

program.parse();

Add a run script to package.json, and set "type": "module" so Node treats the project as ESM (the .js import specifiers and the MCP server's top-level await depend on it):

{
  "type": "module",
  "scripts": {
    "cli": "tsx src/cli.ts"
  }
}

Now you can use it:

# List all open tabs
npm run cli -- tabs

# Get the page title from the first tab
npm run cli -- eval 0 "document.title"

# Extract all links from a page
npm run cli -- extract 0 --selector "a[href]" --attr href --json

Step 5: Network Interception and Data Capture

One of CDP's most powerful features is intercepting network requests. This lets you capture API responses that the page fetches, giving you structured data without scraping the DOM at all.

Add this to src/browser.ts:

export async function captureRequests(
  client: CDP.Client,
  urlPattern: string,
  duration = 10000
): Promise<Array<{ url: string; status: number; body: string }>> {
  const captured: Array<{ url: string; status: number; body: string }> = [];
  const pending = new Map<string, { url: string; status: number }>();

  client.on("Network.responseReceived", (params) => {
    if (params.response.url.includes(urlPattern)) {
      pending.set(params.requestId, {
        url: params.response.url,
        status: params.response.status,
      });
    }
  });

  client.on("Network.loadingFinished", async (params) => {
    const meta = pending.get(params.requestId);
    if (!meta) return;

    try {
      const { body } = await client.Network.getResponseBody({
        requestId: params.requestId,
      });
      captured.push({ ...meta, body });
    } catch {
      // Response body may not be available for all requests
    }
    pending.delete(params.requestId);
  });

  // Wait for the specified duration to collect responses
  await new Promise((resolve) => setTimeout(resolve, duration));
  return captured;
}

Wire captureRequests into a capture command (the wiring mirrors the extract command, so it is omitted here) and use it to record API traffic:

# Capture all XHR responses matching "api" for 5 seconds
npm run cli -- capture 0 --pattern "api" --duration 5000

This is enormously useful. Many modern SPAs fetch all their data via JSON APIs. Instead of parsing DOM elements, you capture the raw API responses — perfectly structured data with no scraping required.
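The bodies captureRequests returns are raw strings, and on a real page many of them will not be JSON at all. A small post-processing helper (hypothetical, not part of the CLI above) that keeps only the responses that parse cleanly:

```typescript
// The record shape returned by captureRequests.
interface CapturedResponse {
  url: string;
  status: number;
  body: string;
}

// Keep only successful responses whose body parses as JSON, with parsed data.
function jsonResponses(
  captured: CapturedResponse[]
): Array<{ url: string; data: unknown }> {
  const out: Array<{ url: string; data: unknown }> = [];
  for (const r of captured) {
    if (r.status < 200 || r.status >= 300) continue;
    try {
      out.push({ url: r.url, data: JSON.parse(r.body) });
    } catch {
      // HTML, images, tracking pixels — skip anything that is not JSON
    }
  }
  return out;
}
```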

Advanced Patterns

Cookie Extraction for Downstream Use

Sometimes you need your browser's session cookies in other tools — a curl command, a Python script, or an API client. CDP makes this trivial:

export async function extractCookies(
  client: CDP.Client,
  domain?: string
): Promise<Array<{ name: string; value: string; domain: string }>> {
  const { cookies } = await client.Network.getCookies();
  const filtered = domain
    ? cookies.filter((c) => c.domain.includes(domain))
    : cookies;

  return filtered.map((c) => ({
    name: c.name,
    value: c.value,
    domain: c.domain,
  }));
}

Assuming a cookies command wired up like extract, pipe them directly into curl (note --silent, so npm's own banner does not pollute the captured value):

# Extract session cookies and use them in a curl request
COOKIES=$(npm run --silent cli -- cookies 0 --domain "github.com" --format curl)
curl -b "$COOKIES" https://api.github.com/user
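The --format curl flag above implies a formatter; a sketch of one (hypothetical helper, not shown in the CLI code) that produces the name=value; ... string curl -b accepts:

```typescript
// Minimal cookie shape, matching what extractCookies returns.
interface CookiePair {
  name: string;
  value: string;
  domain: string;
}

// Serialize cookies into the "name=value; name2=value2" form `curl -b` accepts.
function toCurlCookieString(cookies: CookiePair[]): string {
  return cookies.map((c) => `${c.name}=${c.value}`).join("; ");
}

console.log(
  toCurlCookieString([
    { name: "session", value: "abc123", domain: ".github.com" },
    { name: "theme", value: "dark", domain: ".github.com" },
  ])
); // session=abc123; theme=dark
```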

Injecting Extra Request Headers

You can ask CDP to attach extra headers to every outgoing request a page makes — no need to monkey-patch fetch. This is useful for adding API keys or authorization tokens on the fly:

export async function injectHeaders(
  client: CDP.Client,
  headers: Record<string, string>
): Promise<void> {
  await client.Network.setExtraHTTPHeaders({ headers });
}

This modifies every request the tab makes, including XHR and fetch calls. Use it to add bearer tokens, custom tracing headers, or API version headers without modifying the website's code.

Waiting for Dynamic Content

Real pages load data asynchronously. A robust CLI needs to wait for content rather than hoping it is already there:

export async function waitForSelector(
  client: CDP.Client,
  selector: string,
  timeout = 10000
): Promise<boolean> {
const poll = `new Promise((resolve, reject) => {
    const interval = setInterval(() => {
      if (document.querySelector(${JSON.stringify(selector)})) {
        clearInterval(interval);
        resolve(true);
      }
    }, 100);
    setTimeout(() => {
      clearInterval(interval);
      // JSON.stringify keeps the selector safe inside the generated error message
      reject(new Error("Timeout waiting for " + ${JSON.stringify(selector)}));
    }, ${timeout});
  })`;

  return evaluateOnPage<boolean>(client, poll);
}

Bridging to AI: The MCP Pattern

The Model Context Protocol lets you expose your CDP bridge as a set of tools that AI agents can call. Here is a minimal MCP server that wraps the functions you have already built (it needs two extra dependencies: npm install @modelcontextprotocol/sdk zod):

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { listTabs, connectToTab, evaluateOnPage } from "./browser.js";

const server = new McpServer({
  name: "browser-bridge",
  version: "1.0.0",
});

server.tool("list_tabs", "List open browser tabs", {}, async () => {
  const tabs = await listTabs();
  return { content: [{ type: "text", text: JSON.stringify(tabs, null, 2) }] };
});

server.tool(
  "evaluate",
  "Run JavaScript in a browser tab",
  {
    tabIndex: z.number().describe("Index of the tab"),
    script: z.string().describe("JavaScript to evaluate"),
  },
  async ({ tabIndex, script }) => {
    const tabs = await listTabs();
    const client = await connectToTab(tabs[tabIndex].id);
    try {
      const result = await evaluateOnPage(client, script);
      return {
        content: [{ type: "text", text: JSON.stringify(result, null, 2) }],
      };
    } finally {
      await client.close();
    }
  }
);

server.tool(
  "extract_content",
  "Extract text content from elements matching a CSS selector",
  {
    tabIndex: z.number().describe("Index of the tab"),
    selector: z.string().describe("CSS selector"),
  },
  async ({ tabIndex, selector }) => {
    const tabs = await listTabs();
    const client = await connectToTab(tabs[tabIndex].id);
    try {
      const data = await evaluateOnPage<string[]>(
        client,
        `Array.from(document.querySelectorAll(${JSON.stringify(selector)}))
          .map(el => el.textContent.trim()).filter(Boolean)`
      );
      return { content: [{ type: "text", text: data.join("\n") }] };
    } finally {
      await client.close();
    }
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);

Once registered in your AI client's MCP configuration, an LLM can say "read the title of my first tab" and the tool calls flow through: LLM invokes list_tabs, picks the index, then calls evaluate with document.title. The AI can now see and interact with everything in your browser.
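What that registration looks like depends on your AI client; as one example, a Claude Desktop-style config entry might look like the following, assuming you saved the server above as src/mcp.ts (the path and the mcpServers format are client-specific — treat this as a sketch):

```json
{
  "mcpServers": {
    "browser-bridge": {
      "command": "npx",
      "args": ["tsx", "/path/to/browser-cli/src/mcp.ts"]
    }
  }
}
```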

Real-World Use Cases

Internal dashboard reporting. Your company's analytics dashboard requires SSO and shows data that has no API. Connect to the tab, extract the numbers with querySelectorAll, and pipe them into a spreadsheet or Slack message on a cron job.

Competitive price monitoring. Log into a supplier portal in your browser, then use the CLI to extract pricing tables on demand. Because you reuse the session, you never trigger bot detection.

AI-assisted form filling. Point an AI agent at a complex multi-step form (insurance quotes, government applications). The LLM reads each page's fields via the MCP bridge, decides what to fill in, and submits — all through your authenticated session.

Developer workflow automation. Extract the current Jira ticket from your browser, pull the description, and auto-generate a git branch name and commit template — all from one terminal command.

Testing against production state. Connect to a production app you are logged into and run assertions against the actual DOM. No need to replicate auth in your test environment.

Security Considerations

Exposing CDP on port 9222 means any local process can control your browser. Keep these precautions in mind:

  • Bind to localhost only. Chrome does this by default, but never use --remote-debugging-address=0.0.0.0.
  • Close the debugging port when not in use. Restart Chrome normally when you are done.
  • Be careful with cookie extraction. Session tokens in logs or shell history are a security risk. Pipe them directly rather than echoing to the terminal.
  • Do not run untrusted scripts. The eval command executes arbitrary JavaScript with full page privileges.

Conclusion

The Chrome DevTools Protocol turns your browser into a programmable platform. By connecting to an existing session rather than launching a new one, you skip the hardest parts of web automation — authentication, bot detection, and environment parity — and go straight to building useful tools.

The CLI you built in this tutorial is a foundation. From here you can add site-specific commands (a github subcommand that lists your notifications, a jira subcommand that extracts sprint data), wrap everything as an MCP server for AI agents, or build a full TUI with interactive tab selection.

The code from this tutorial is deliberately minimal — under 200 lines for the core — because the architecture itself is the insight. CDP gives you the bridge. Your CLI gives you the interface. What you build on top is limited only by what your browser can render.

Key takeaways:

  • Launch Chrome with --remote-debugging-port=9222 to enable CDP access.
  • Use chrome-remote-interface to connect, evaluate scripts, and intercept network traffic.
  • Reusing a live session means zero authentication code for any site you are logged into.
  • The MCP bridge pattern turns your CLI into an AI-accessible tool, enabling LLM-driven browser automation.
  • Network interception often yields cleaner data than DOM scraping, since you capture the raw API responses.

Start with tabs and eval. Once you see your browser respond to terminal commands, you will never look at web automation the same way again.


Check out my other CLI tools: websnap-reader, gitpulse, depcheck-ai
