Mike
Stop Feeding DOM Snapshots to Claude — Use a Rubber Duck Instead

This is Part 2. If you're new to MCP Rubber Duck, start with Part 1: Stop Copy-Pasting Between AI Tabs for the basics.

The Problem: Chrome DevTools MCP Eats Your Context

If you use Chrome DevTools MCP for browser automation in Claude Code, you've probably noticed something painful.

Every take_snapshot call returns the entire page as a Chrome accessibility tree. For a complex web app, that's 20–30k+ characters — roughly 5–15k tokens. Every click-and-check cycle dumps all of that into your host LLM's context window.

A typical multi-step browser flow needs about 6 snapshots. That's tens of thousands of tokens of raw DOM fed to Opus just to find a few buttons.

If you're on a Claude Code subscription, this eats into your usage limits and triggers context compaction sooner. If you're on API billing, it hits your wallet directly.

What if the DOM never touched Claude's context at all?

The Solution: Ducks As Middleware

MCP Rubber Duck is an MCP server that lets you route work to other LLMs — Gemini, GPT, Groq, local models — and MCP tools. Its MCP bridge lets ducks call other MCP servers autonomously.

I connected Chrome DevTools to the bridge. Now a cheap model (Gemini Flash) does all the DOM processing, and Claude only sees short summaries like "uid is 8_37".

Here's the flow:

Claude → ask_duck("find the Submit button")
Duck   → [calls take_snapshot, parses 25k chars]
Duck   → "uid is 1_462"
Claude → [sees 10 tokens, not 15,000]

The DOM snapshot lives and dies inside the duck's context. Your host LLM never touches it.

Setup

You'll need:

  • Claude Code (or any MCP host)
  • mcp-rubber-duck
  • Chrome DevTools MCP
  • A Gemini API key (or any supported provider)

Step 1: Add Chrome DevTools to the Duck Bridge

In your Claude Code config file, add these environment variables to the rubber-duck MCP server:

macOS/Linux: ~/.claude.json
Windows: %USERPROFILE%\.claude.json

"MCP_SERVER_CHROME_TYPE": "stdio",
"MCP_SERVER_CHROME_COMMAND": "npx",
"MCP_SERVER_CHROME_ARGS": "chrome-devtools-mcp@latest",
"MCP_SERVER_CHROME_ENABLED": "true",
"MCP_TRUSTED_TOOLS_CHROME": "*"

These go inside the env block of your existing rubber-duck MCP server configuration.
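For context, here is roughly where those variables sit in the config file. This is a sketch, not a canonical config: your command, args, and other env entries (such as the GEMINI_API_KEY shown here) will depend on how you installed mcp-rubber-duck.

```json
{
  "mcpServers": {
    "rubber-duck": {
      "command": "npx",
      "args": ["mcp-rubber-duck"],
      "env": {
        "GEMINI_API_KEY": "your-key-here",
        "MCP_SERVER_CHROME_TYPE": "stdio",
        "MCP_SERVER_CHROME_COMMAND": "npx",
        "MCP_SERVER_CHROME_ARGS": "chrome-devtools-mcp@latest",
        "MCP_SERVER_CHROME_ENABLED": "true",
        "MCP_TRUSTED_TOOLS_CHROME": "*"
      }
    }
  }
}
```

The `MCP_TRUSTED_TOOLS_CHROME: "*"` line trusts all Chrome tools, so the duck can call them without per-tool approval.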

Step 2: Remove Direct Chrome MCP

Important: Only one process can own the Chrome profile. Remove any direct chrome-devtools MCP server from your project config so it doesn't conflict with the duck's bridge.

Two chrome-devtools-mcp processes fighting over a SingletonLock file is not a fun debugging session.

Step 3: Restart and Verify

Restart Claude Code, then check the bridge status:

mcp__rubber-duck__mcp_status

🟢 chrome (stdio) - connected, 26 tools

The duck now has access to all 26 Chrome DevTools tools: click, take_snapshot, navigate_page, fill, wait_for, and more.

See It In Action

Here's what changes in practice:

Before (Direct Chrome MCP)

→ take_snapshot [entire DOM into Opus context]
→ Opus parses it, finds uid
→ Usage: ~5–15k Opus tokens per snapshot

After (Duck Bridge)

→ ask_duck(gemini):
  "Call take_snapshot. Find button containing Submit.
   Report ONLY its uid."
→ Gemini Flash: "8_37"
  [DOM processed in duck's context, invisible to Opus]
→ Opus sees: "8_37"
→ Usage: ~100 Opus tokens
  + Gemini tokens (your Gemini API, not Claude quota)

The numbers for a typical multi-step browser automation:

                           Direct Chrome MCP             Duck Bridge
Opus tokens per snapshot   ~5,000–15,000                 ~100 (summary only)
Snapshots seen by Opus     ~6                            0
Total Opus context impact  tens of thousands of tokens   ~600 tokens
Who processes the DOM      Opus (your subscription)      Gemini Flash (pennies via API)

Gotchas You Should Know

I learned these the hard way so you don't have to.

1. One Tool Per Duck Prompt

In practice, Gemini Flash works best when each prompt triggers a single, focused tool call.

❌ "Navigate to the page, take snapshot, find the button"
   → [empty response or confused output]

✅ "Call take_snapshot MCP tool.
    Find the Submit button. Report ONLY its uid."
   → "1_462"

2. Cache Busting

Rubber Duck caches identical prompts by design. If you need to click the same button twice, vary your wording:

❌ Same prompt twice → second one returns cached

✅ "Call click with uid 8_37. Report the result."
   "Click the Submit button now. Call click with uid 8_37."
   → Both execute
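The rewording trick can also be automated. If you drive the duck from a script rather than conversationally, a tiny helper that appends a unique request id to each prompt has the same effect. This is a hypothetical helper, not part of mcp-rubber-duck; it assumes the cache keys on exact prompt text, which the rewording workaround above suggests.

```python
import uuid


def bust_cache(prompt: str) -> str:
    """Make an otherwise-identical prompt unique so it never hits the cache.

    Hypothetical helper: mcp-rubber-duck has no built-in flag for this,
    but since identical prompts are cached, any textual variation defeats it.
    """
    return f"{prompt} [request-id: {uuid.uuid4().hex[:8]}]"


first = bust_cache("Call click with uid 8_37. Report the result.")
second = bust_cache("Call click with uid 8_37. Report the result.")
assert first != second  # distinct text, so both calls actually execute
```

The suffix is noise to the model but enough to make the prompt a cache miss.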

3. Directive Prompts

LLMs sometimes philosophize about tools instead of calling them. Be direct:

❌ "Can you take a snapshot?"
   → "I can call take_snapshot, but it provides
      a text snapshot of the accessibility tree..."

✅ "Call take_snapshot MCP tool. Report what you see."
   → [actually does it]

Use "Call X" not "Can you use X". Be the manager, not the coworker.

Bonus: Multimodal Possibilities

The setup above uses take_snapshot (text accessibility tree), but Chrome DevTools also supports take_screenshot (actual images). Since Gemini is multimodal, you could have the duck process visual screenshots:

ask_duck(gemini):
  "Call take_screenshot. Describe what you see.
   Is there a Submit button? Where is it?"

Visual debugging through a cheap multimodal model, without the image ever touching your host LLM's context. I haven't fully tested this path yet, but the architecture supports it.

Beyond Chrome DevTools

This pattern isn't limited to browser automation. Any MCP tool that produces large output can be routed through a duck:

  • Documentation scrapers — duck reads the docs, returns a summary
  • Code analyzers — duck processes the report, returns key findings
  • Log parsers — duck digests thousands of lines, returns what matters
  • Database queries — duck processes large result sets, returns insights

The principle is the same: keep your expensive host LLM focused on reasoning, offload data processing to a cheap model.
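The pattern itself is library-agnostic and fits in a few lines. In this sketch, `offload` and `ask_cheap_model` are hypothetical names: `ask_cheap_model` stands in for whatever cheap-LLM call you have (Gemini Flash via its API, a local model, or an ask_duck call), and the stub lambda below just simulates one.

```python
def offload(big_output: str, question: str, ask_cheap_model) -> str:
    """Send large tool output to a cheap model; return only its short answer.

    The host LLM never sees big_output -- only the returned summary.
    """
    prompt = (
        f"{question}\n"
        "Answer in one short line. Here is the raw data:\n"
        f"{big_output}"
    )
    return ask_cheap_model(prompt)


# Demo with a stub "model" that finds the relevant line in a huge log dump:
logs = "\n".join(f"line {i}: ok" for i in range(10_000)) + "\nline 10000: ERROR disk full"
answer = offload(
    logs,
    "Which line contains an error?",
    lambda p: next(l for l in p.splitlines() if "ERROR" in l),
)
print(answer)  # the host LLM only ever sees this one short line
```

Ten thousand lines go in; one line comes back. Swap the lambda for a real API call and the economics are the same as the Chrome setup above.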

You could use even cheaper models for this. gemini-2.5-flash-lite has a massive context window and costs almost nothing — perfect for tasks where you don't need deep reasoning, just "find the thing and tell me about it."

The Architecture

┌──────────────────────────────────┐
│ Claude Code (Opus)               │
│                                  │
│  "ask_duck: find Submit button"  │
│                                  │
│  ┌────────────────────────────┐  │
│  │ Rubber Duck MCP Server     │  │
│  │                            │  │
│  │  Gemini Flash ←→ Chrome    │  │
│  │  [processes       DevTools │  │
│  │   entire DOM]    [26 tools]│  │
│  └────────────────────────────┘  │
│                                  │
│  Duck returns: "uid is 8_37"     │
│  Opus context: ~100 tokens       │
└──────────────────────────────────┘

The Bottom Line

Chrome DevTools MCP is powerful but context-hungry. By routing it through MCP Rubber Duck, you get the same browser automation capabilities while keeping your host LLM's context clean. The DOM enters the duck. A uid exits the duck. Your context window stays focused on what matters.

Try It Now

GitHub: github.com/nesquikm/mcp-rubber-duck

The MCP bridge supports any server — stdio or HTTP. Chrome DevTools is just one use case. If you have an MCP tool that produces massive output, a duck can probably tame it.

Your Turn

Have you hit context window limits with Chrome DevTools MCP or other data-heavy tools? I'd love to hear how you're managing it. Drop a comment or open an issue on the repo — the ducks are listening. 🦆
