DEV Community

UnitBuilds for UnitBuilds CC


Stop Giving LLMs Raw HTML: Build a Graph-Native, 96% More Efficient MCP Web Browser

As developers building AI agents, we’ve all run into the same massive bottleneck: how do you let a large language model (LLM) browse the web without destroying your API budget or driving execution latency through the roof?

Most current agent frameworks solve web browsing in one of two ways:

  1. Raw HTML Scraping: Injecting often more than 100,000 tokens of raw HTML per page (complete with styling classes, analytics scripts, and SVG paths) straight into the context window.
  2. Visual Screenshotting: Sending high-resolution images to a Vision LLM (VLM) and hoping it doesn't hallucinate element coordinates when clicking buttons.

Both methods are slow, expensive, and fragile.

Today, we’re open-sourcing MCP Lite, a standalone, graph-native Model Context Protocol (MCP) browser automation server written in Go. By using Chrome’s Accessibility Object Model (AOM), custom stealth extensions, and Neo4j graph transition mapping, we reduced the context footprint by up to 96.8% and achieved a 3.0x execution speedup on replayed workflows.

Here is how we built it, the benchmarks we hit, and how you can use it in your agent stack.


The Core Architecture: Semantic AOM Pruning

When a human looks at a webpage, they don't read the raw HTML code. They look for interactive components: buttons, text fields, links, and headers. Chrome’s accessibility engine translates the visual DOM into a structured Accessibility Object Model (AOM) for screen readers.

MCP Lite taps directly into Chrome's AOM via chromedp. We extract this tree and apply a recursive pruning filter:

  • Prune Non-Semantic Nodes: We strip away generic containers, layout tables, and presentation divs.
  • Redundant Flattening: If a link contains a single text child with the same label, we flatten it.
  • Retain Spatial Bounding Boxes: We keep physical screen coordinates (x, y, width, height) only for actionable nodes.
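The three pruning rules can be sketched as a single recursive pass. This is a minimal illustration, not MCP Lite's code: the AXNode and Rect types, the role lists, and the demoPage tree are hypothetical stand-ins for what chromedp would return from Chrome's accessibility tree.

```go
package main

import (
	"fmt"
	"strings"
)

// AXNode is a hypothetical, simplified accessibility node. MCP Lite's own
// types (built on chromedp) are not shown in the post.
type AXNode struct {
	Role     string
	Name     string
	Box      *Rect // screen coordinates; kept only for actionable roles
	Children []*AXNode
}

// Rect is a bounding box in CSS pixels.
type Rect struct{ X, Y, W, H float64 }

// Hypothetical role sets; the real lists used by MCP Lite may differ.
var (
	nonSemantic = map[string]bool{"generic": true, "none": true, "presentation": true, "LayoutTable": true}
	actionable  = map[string]bool{"button": true, "link": true, "textbox": true, "checkbox": true, "combobox": true}
)

// prune returns the nodes that replace n in its parent after pruning.
func prune(n *AXNode) []*AXNode {
	var kids []*AXNode
	for _, c := range n.Children {
		kids = append(kids, prune(c)...)
	}
	// Non-semantic container: drop it and hoist its pruned children.
	if nonSemantic[n.Role] {
		return kids
	}
	// Redundant flattening: a link whose only child repeats its label.
	if n.Role == "link" && len(kids) == 1 && kids[0].Role == "StaticText" && kids[0].Name == n.Name {
		kids = nil
	}
	// Keep the bounding box only for actionable nodes.
	box := n.Box
	if !actionable[n.Role] {
		box = nil
	}
	return []*AXNode{{Role: n.Role, Name: n.Name, Box: box, Children: kids}}
}

// render prints the tree one node per line, indented by depth.
func render(nodes []*AXNode) string {
	var sb strings.Builder
	var walk func(ns []*AXNode, depth int)
	walk = func(ns []*AXNode, depth int) {
		for _, n := range ns {
			fmt.Fprintf(&sb, "%s%s %q", strings.Repeat("  ", depth), n.Role, n.Name)
			if n.Box != nil {
				fmt.Fprintf(&sb, " @(%g,%g %gx%g)", n.Box.X, n.Box.Y, n.Box.W, n.Box.H)
			}
			sb.WriteByte('\n')
			walk(n.Children, depth+1)
		}
	}
	walk(nodes, 0)
	return sb.String()
}

// demoPage builds a tiny hand-made tree standing in for Chrome's output.
func demoPage() *AXNode {
	return &AXNode{Role: "RootWebArea", Name: "Demo", Children: []*AXNode{
		{Role: "generic", Children: []*AXNode{
			{Role: "link", Name: "Home", Box: &Rect{10, 10, 60, 20},
				Children: []*AXNode{{Role: "StaticText", Name: "Home"}}},
			{Role: "heading", Name: "Welcome", Box: &Rect{10, 40, 300, 30}},
		}},
		{Role: "button", Name: "Search", Box: &Rect{400, 10, 80, 24}},
	}}
}

func main() {
	fmt.Print(render(prune(demoPage())))
}
```

Running it leaves four lines: the root, the flattened "Home" link with its box, the heading without a box (not actionable), and the "Search" button with its box. The generic wrapper disappears entirely.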

Here is the visual lifecycle of a page representation as it moves through the pruning pipeline:

graph TD
    HTML["Raw HTML / DOM (100K+ Tokens)"] -->|CDP Session| AOM["Accessibility Object Model (AOM)"]
    AOM -->|Pruning & Redundancy Filter| Pruned["Pruned AOM (90%+ Token Reduction)"]
    Pruned -->|Targeted Clicks| Action["Logical Clicks (nodeId)"]
    Pruned -->|SHA-256 Signature| Hashing["Deterministic State Hashing"]

The Impact: Wikipedia & GitHub Benchmarks

We ran our benchmarking engine against real-world targets comparing the pruned AOM against standard full-HTML extraction:

| Target site | Metric | Standard full HTML | MCP Lite (pruned AOM) | Token savings / speedup |
|---|---|---|---|---|
| wikipedia.org | Payload size | 435 KB | 47 KB | |
| wikipedia.org | Est. tokens | 108,863 | 11,956 | 89.02% reduction in input tokens |
| wikipedia.org | Latency (3 steps) | 9.0 s (100% agent loop) | 3.0 s (0% agent loop) | 3.0x speedup via workflow playback |
| github.com | Payload size | 594 KB | 19 KB | |
| github.com | Est. tokens | 148,625 | 4,774 | 96.79% reduction (well suited to modern SPAs) |

Instead of feeding 148,625 tokens to the LLM for a single GitHub page, the agent processes only 4,774 tokens, which makes inference faster and significantly cheaper.
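The savings column follows directly from the token counts. The post doesn't say how "Est. tokens" is derived, but the figures appear consistent with the common rule of thumb of roughly 4 bytes per token (108,863 × 4 ≈ 435 KB). A small sketch under that assumption:

```go
package main

import "fmt"

// estTokens uses the common "about 4 bytes per token" rule of thumb.
// Assumption: the post doesn't say how its Est. tokens column is computed.
func estTokens(payloadBytes int) int {
	return payloadBytes / 4
}

// reductionPct is the share of input tokens removed by pruning, in percent.
func reductionPct(before, after int) float64 {
	return 100 * (1 - float64(after)/float64(before))
}

func main() {
	fmt.Println("tokens for a 435,000-byte page:", estTokens(435000))
	fmt.Printf("wikipedia.org: %.2f%% fewer tokens\n", reductionPct(108863, 11956))
	fmt.Printf("github.com:    %.2f%% fewer tokens\n", reductionPct(148625, 4774))
}
```

The two percentages it prints match the table's 89.02% and 96.79%.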


Bypassing Bot Detection: Hardening Chrome

Automating browsers usually triggers bot protection (Cloudflare, DataDome, etc.). MCP Lite loads a custom extension at startup whose content script (inject.js) executes at document_start in all frames:

  1. navigator.webdriver Masking: Instead of just overriding the property, we mock its getter. If a bot detector calls .toString() on the navigator.webdriver getter, our mock returns the exact native string signature:
   function get webdriver() { [native code] }
  2. PluginArray Prototype Realignment: Standard mock scripts fail checks like navigator.plugins instanceof PluginArray. We reconstruct the fake plugin list and link its prototype to PluginArray.prototype and each entry's prototype to Plugin.prototype.
  3. Permissions Query Alignment: We align navigator.permissions.query with Notification.permission so that a notifications query resolves to prompt, matching real browser configurations.

The result? A 100% green pass on bot-detection test pages like bot.sannysoft.com, and seamless access to sites like G2.com.
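The extension side of this setup is declared in the extension's manifest. The sketch below generates one from Go, assuming a Manifest V3 extension; the post confirms inject.js, document_start, and all frames, while the extension name is hypothetical and world: "MAIN" is our assumption (a content script must run in the page's main world for its navigator patches to be visible to page scripts).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// contentScript and manifest model only the Manifest V3 keys used here.
type contentScript struct {
	Matches   []string `json:"matches"`
	JS        []string `json:"js"`
	RunAt     string   `json:"run_at"`
	AllFrames bool     `json:"all_frames"`
	World     string   `json:"world"`
}

type manifest struct {
	ManifestVersion int             `json:"manifest_version"`
	Name            string          `json:"name"`
	Version         string          `json:"version"`
	ContentScripts  []contentScript `json:"content_scripts"`
}

// stealthManifest describes a hypothetical extension that injects inject.js.
func stealthManifest() manifest {
	return manifest{
		ManifestVersion: 3,
		Name:            "stealth-inject", // hypothetical name
		Version:         "1.0",
		ContentScripts: []contentScript{{
			Matches:   []string{"<all_urls>"},
			JS:        []string{"inject.js"},
			RunAt:     "document_start", // before any page script runs
			AllFrames: true,             // iframes get the same patches
			World:     "MAIN",           // patch the page's own navigator object
		}},
	}
}

// encode renders manifest.json without escaping "<" and ">".
func encode(m manifest) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	enc.SetIndent("", "  ")
	if err := enc.Encode(m); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	s, err := encode(stealthManifest())
	if err != nil {
		panic(err)
	}
	fmt.Print(s)
}
```

SetEscapeHTML(false) only affects readability: the default escaping of "<all_urls>" would still be valid JSON.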


Graph-Native Navigation Mapping

Every web interaction is a state transition. When the agent clicks a button, the page layout changes.

MCP Lite integrates with Neo4j to represent these layouts as nodes and interactions as edges. At every page load:

  1. We compute a SHA-256 hash of the serialized pruned AOM structure to represent the page’s unique layout State.
  2. We log transitions as an INTERACTED edge:
MERGE (f:State {hash: $from})
MERGE (t:State {hash: $to})
MERGE (f)-[r:INTERACTED {role: $role, name: $name, index: $index}]->(t)
SET r.x = $x, r.y = $y, r.w = $w, r.h = $h
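Step 1, the state hash, can be sketched with the standard library alone. This is an illustration, not MCP Lite's serializer: the stateNode type is hypothetical, and leaving coordinates out of the hash (so small layout shifts don't create new states) is our assumption, since the post doesn't say which fields are hashed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// stateNode is a hypothetical serialization of one pruned AOM node.
// Coordinates are deliberately excluded from the hashed structure.
type stateNode struct {
	Role     string      `json:"role"`
	Name     string      `json:"name"`
	Children []stateNode `json:"children,omitempty"`
}

// stateHash serializes the tree deterministically (encoding/json writes
// struct fields in declaration order) and returns the hex SHA-256 digest.
func stateHash(root stateNode) (string, error) {
	b, err := json.Marshal(root)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	home := stateNode{Role: "RootWebArea", Name: "Home", Children: []stateNode{
		{Role: "button", Name: "Search"},
	}}
	h, err := stateHash(home)
	if err != nil {
		panic(err)
	}
	fmt.Println(h) // 64 hex characters, usable as $from or $to in the MERGE query
}
```

Identical pruned trees always produce the same hash, so revisiting a page matches the existing State node instead of creating a new one.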

If an agent needs to navigate from its current page to a target URL, it queries Neo4j for the shortest path of logical interactions (e.g. "click Search -> type username -> click Submit") and replays them instantly, avoiding reasoning loops.
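Neo4j can answer this with its shortestPath function; the Cypher constant below is the kind of query we would expect, not one taken from MCP Lite. To show the idea without a database, the Go function is an in-memory breadth-first search over the same kind of INTERACTED edges, which also returns the fewest interactions between two states. The Edge type and demoGraph data are hypothetical.

```go
package main

import "fmt"

// shortestPathCypher is the kind of query a Neo4j-backed version might issue
// (our assumption; the post does not show MCP Lite's query).
const shortestPathCypher = `
MATCH p = shortestPath((f:State {hash: $from})-[:INTERACTED*]->(t:State {hash: $to}))
RETURN p`

// Edge mirrors one INTERACTED relationship: the action and the state it leads to.
type Edge struct {
	Role, Name string
	To         string
}

// shortestPath is an in-memory breadth-first search over state hashes that
// returns the fewest interactions leading from one state to another.
func shortestPath(g map[string][]Edge, from, to string) ([]Edge, bool) {
	if from == to {
		return nil, true
	}
	type parent struct {
		prev string
		edge Edge
	}
	seen := map[string]parent{from: {}}
	queue := []string{from}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, e := range g[cur] {
			if _, ok := seen[e.To]; ok {
				continue
			}
			seen[e.To] = parent{prev: cur, edge: e}
			if e.To == to {
				// Follow parent links back to the start, then reverse.
				var path []Edge
				for s := to; s != from; s = seen[s].prev {
					path = append(path, seen[s].edge)
				}
				for i, j := 0, len(path)-1; i < j; i, j = i+1, j-1 {
					path[i], path[j] = path[j], path[i]
				}
				return path, true
			}
			queue = append(queue, e.To)
		}
	}
	return nil, false
}

// demoGraph uses readable labels where real keys would be SHA-256 hashes.
func demoGraph() map[string][]Edge {
	return map[string][]Edge{
		"home":   {{Role: "button", Name: "Search", To: "search"}, {Role: "link", Name: "About", To: "about"}},
		"search": {{Role: "textbox", Name: "username", To: "typed"}},
		"typed":  {{Role: "button", Name: "Submit", To: "profile"}},
		"about":  {{Role: "link", Name: "Home", To: "home"}},
	}
}

func main() {
	path, ok := shortestPath(demoGraph(), "home", "profile")
	fmt.Println("found:", ok)
	for _, e := range path {
		fmt.Printf("%s %q\n", e.Role, e.Name)
	}
}
```

For the demo graph it prints the three-step route button "Search", textbox "username", button "Submit", mirroring the example above.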


Replaying Workflows with Zero LLM Overhead

For repetitive automation (such as bulk form submissions or multi-page scrapes), running the LLM in a step-by-step loop is expensive.

MCP Lite includes a workflow engine:

  • Record: Call workflow_record_start and perform actions.
  • Template: Actions are written to a JSON file. Use double-braces ({{variable_name}}) to define input fields.
  • Replay: Play back the workflow with an input dataset:
[
  {"search_query": "adidas campus green"},
  {"search_query": "nike air force white"}
]

The browser handles the automation loop natively, feeding each row's variables into the template and running the actions sequentially, with no LLM calls during playback.
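The {{variable_name}} binding step can be sketched as a regular-expression substitution over each recorded step. The Step type and its fields are hypothetical, since the post doesn't show the workflow JSON schema; failing on a placeholder that the input row doesn't supply is our design choice.

```go
package main

import (
	"fmt"
	"regexp"
)

// Step is a hypothetical recorded action; MCP Lite's JSON schema isn't shown.
type Step struct {
	Action string `json:"action"`
	Target string `json:"target"`
	Value  string `json:"value,omitempty"`
}

// placeholder matches {{name}}, allowing optional spaces inside the braces.
var placeholder = regexp.MustCompile(`\{\{\s*(\w+)\s*\}\}`)

// bind returns a copy of the steps with every {{name}} filled from one input
// row, or an error naming the first placeholder the row doesn't provide.
func bind(steps []Step, row map[string]string) ([]Step, error) {
	out := make([]Step, len(steps))
	for i, s := range steps {
		var missing string
		s.Value = placeholder.ReplaceAllStringFunc(s.Value, func(m string) string {
			key := placeholder.FindStringSubmatch(m)[1]
			v, ok := row[key]
			if !ok && missing == "" {
				missing = key
			}
			return v
		})
		if missing != "" {
			return nil, fmt.Errorf("step %d: no value for {{%s}}", i, missing)
		}
		out[i] = s
	}
	return out, nil
}

func main() {
	tmpl := []Step{
		{Action: "click", Target: "Search"},
		{Action: "type", Target: "Search", Value: "{{search_query}}"},
		{Action: "press", Target: "Enter"},
	}
	rows := []map[string]string{
		{"search_query": "adidas campus green"},
		{"search_query": "nike air force white"},
	}
	for _, row := range rows {
		steps, err := bind(tmpl, row)
		if err != nil {
			panic(err)
		}
		// A real engine would now dispatch each step to the browser.
		fmt.Println(steps[1].Value)
	}
}
```

Because bind works on copies, the recorded template stays reusable for every row in the dataset.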


Try It Yourself!

The repository is open-source under the GNU AGPL-v3 license.

Quick Build

git clone https://github.com/UnitBuilds-CC/MCP-Lite.git
cd MCP-Lite
go mod tidy
REM Rebuild the server, benchmark, and test binaries
build.bat

Configure Claude Desktop

Add it to your claude_desktop_config.json:

{
  "mcpServers": {
    "agentic-browser-mcp-lite": {
      "command": "C:\\path\\to\\mcp-lite\\mcp-server.exe",
      "env": {
        "NEO4J_URI": "bolt://localhost:7687",
        "NEO4J_USER": "neo4j",
        "NEO4J_PASS": "secure_password"
      }
    }
  }
}
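The env block above is how the server receives its Neo4j connection settings. How mcp-server.exe actually reads them isn't shown in the post, so the loader below is a hypothetical sketch: the neo4jConfig type and the fallback values are assumptions, and only the three variable names come from the config.

```go
package main

import (
	"fmt"
	"os"
)

// neo4jConfig holds the settings passed through the "env" block.
type neo4jConfig struct {
	URI, User, Pass string
}

// loadNeo4jConfig reads the variables through a lookup function (os.LookupEnv
// in production) so it can be tested without touching the real environment.
// The defaults are illustrative assumptions, not MCP Lite's documented behavior.
func loadNeo4jConfig(lookup func(string) (string, bool)) (neo4jConfig, error) {
	get := func(key, fallback string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return fallback
	}
	c := neo4jConfig{
		URI:  get("NEO4J_URI", "bolt://localhost:7687"),
		User: get("NEO4J_USER", "neo4j"),
		Pass: get("NEO4J_PASS", ""),
	}
	if c.Pass == "" {
		return c, fmt.Errorf("NEO4J_PASS is not set")
	}
	return c, nil
}

func main() {
	c, err := loadNeo4jConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("connecting to", c.URI, "as", c.User)
}
```

Failing fast on a missing password surfaces a misconfigured claude_desktop_config.json at startup rather than on the first graph write.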

Check out the code, run the benchmark suite against your own target sites, and join us in building a more efficient web automation standard!

Star MCP-Lite on GitHub

UnitBuilds-CC / MCP-Lite

AOM-driven agentic browser MCP

Agentic Browser (MCP Lite) 🚀

A highly efficient, standalone, graph-native version of the Agentic Browser MCP Server. It provides direct LLM control over Google Chrome via the Model Context Protocol (MCP) using chromedp. It integrates with Neo4j to build site graphs, query shortest action paths, and automate complex workflows with looping and parameter binding.

Features

  1. Raw MCP Browser Automation: Perform standard interactions (navigate, click, type, scroll, wait, drag, screenshot, list frames) using standard logical pixel offsets.
  2. Neo4j Graph Logging: Automatically logs page navigation data (titles, links, scripts, and cookies) and AOM state-transition graphs directly to a Neo4j database.
  3. BFS Site Crawling: Traverses a site using Breadth-First Search (BFS) to map structural accessibility states and link action transitions.
  4. Shortest Path Execution: Query Neo4j for the shortest sequence of actions from any start state/URL to a target state/URL, and execute the transitions automatically.
  5. Workflow Recording & Execution





Discussion

How are you currently handling web browsing for your AI agents? Are you hitting context limits with raw HTML, or have you tried vision-based approaches?

I'd love to hear your thoughts on using the AOM vs. the DOM for these types of workflows!

Top comments (12)

keeper (lanternproton)

Solid approach — AOM over raw DOM is a genuinely good insight, and the benchmarks show real numbers. A couple of questions though:

  1. How does it hold up on SPA-heavy sites (React dashboards, maps, etc.) where AOM coverage is spottier than on Wikipedia/GitHub? Would be interesting to see the same benchmark on something like a Google Sheets or Notion page.

  2. Neo4j for state transitions makes sense for complex workflows, but for simpler "navigate → extract → done" patterns, do you see it as a net win or does the infrastructure overhead outweigh the benefit?

Either way, clean write-up. Curious to see how this evolves.

UnitBuilds (UnitBuilds CC)
  1. While it's not as good as native AOM coverage, it uses a heuristic search for interactive elements, and inspect_node can isolate a node and send commands with respect to its node ID. That still results in cleaner interactivity and a token reduction compared with the traditional approach.

  2. Neo4j is integrated as a 'long-game' solution. Mapping sites at enterprise level would mean a large amount of RAM usage, but practically zero inference cost for any action triggered on that site after the initial mapping. So as the scale gets cranked up, efficiency rises. A little overhead for scalability makes a big difference when it's system memory instead of VRAM plus inference.

UnitBuilds (UnitBuilds CC)

Docs updated on the repo for the SPA test

Gary Doman/TizWildin

I would need standard benchmark stats vs. this. This is potentially novel information if you can prove there's no intelligence loss in any way, or corners skipped.

UnitBuilds (UnitBuilds CC)

Docs updated.
