As developers building AI agents, we’ve all run into the same massive bottleneck: how do you let a Large Language Model (LLM) browse the web without destroying your API budget or driving execution latency through the roof?
Most current agent frameworks solve web browsing in one of two ways:
- Raw HTML Scraping: Injecting hundreds of thousands of tokens of raw HTML (complete with styling classes, analytics scripts, and SVG paths) straight into the context window.
- Visual Screenshotting: Sending high-resolution images to a Vision LLM (VLM) and hoping it doesn't hallucinate element coordinates when clicking buttons.
Both methods are slow, expensive, and fragile.
Today, we’re open-sourcing MCP Lite, a standalone, graph-native Model Context Protocol (MCP) browser automation server written in Go. By using Chrome’s Accessibility Object Model (AOM), custom stealth extensions, and Neo4j graph transition mapping, we reduced the context footprint by up to 96.8% and achieved a 3.0x speedup when replaying recorded workflows.
Here is how we built it, the benchmarks we hit, and how you can use it in your agent stack.
The Core Architecture: Semantic AOM Pruning
When a human looks at a webpage, they don't read the raw HTML code. They look for interactive components: buttons, text fields, links, and headers. Chrome’s accessibility engine translates the visual DOM into a structured Accessibility Object Model (AOM) for screen readers.
MCP Lite taps directly into Chrome's AOM via chromedp. We extract this tree and apply a recursive pruning filter:
- Prune Non-Semantic Nodes: We strip away generic containers, layout tables, and presentation divs.
- Redundant Flattening: If a link contains a single text child with the same label, we flatten it.
- Retain Spatial Bounding Boxes: We keep physical screen coordinates (x, y, width, height) only for actionable nodes.
Here is the visual lifecycle of a page representation as it moves through the pruning pipeline:
```mermaid
graph TD
    HTML["Raw HTML / DOM (100K+ Tokens)"] -->|CDP Session| AOM["Accessibility Object Model (AOM)"]
    AOM -->|Pruning & Redundancy Filter| Pruned["Pruned AOM (90%+ Token Reduction)"]
    Pruned -->|Targeted Clicks| Action["Logical Clicks (nodeId)"]
    Pruned -->|SHA-256 Signature| Hashing["Deterministic State Hashing"]
```
The Impact: Wikipedia & GitHub Benchmarks
We ran our benchmarking engine against real-world targets comparing the pruned AOM against standard full-HTML extraction:
| Target Site | Metric | Standard Full-HTML | MCP Lite (Pruned AOM) | Token Savings / Speedup |
|---|---|---|---|---|
| wikipedia.org | Payload Size / Est. Tokens | 435 KB / 108,863 | 47 KB / 11,956 | 89.02% reduction in input tokens |
| | Latency (3 Steps) | 9.0s (100% Agent Loop) | 3.0s (0% Agent Loop) | 3.0x speedup via workflow playback |
| github.com | Payload Size / Est. Tokens | 594 KB / 148,625 | 19 KB / 4,774 | 96.79% reduction (perfect for modern SPAs) |
Instead of feeding roughly 148,600 tokens to the LLM for a single GitHub page, the agent processes only about 4,800. That makes inference faster and significantly cheaper.
Bypassing Bot Detection: Hardening Chrome
Automating browsers usually triggers bot protection (Cloudflare, DataDome, etc.). MCP Lite loads a custom content script extension (inject.js) at startup that executes at document_start across all frames:
- `navigator.webdriver` Masking: Instead of just overriding the property, we mock the getter. If a bot detector calls `.toString()` on the `navigator.webdriver` getter, our mock returns the exact native string signature: `function get webdriver() { [native code] }`
- `PluginArray` Prototype Realignment: Standard mock scripts fail checks like `navigator.plugins instanceof PluginArray`. We reconstruct the fake plugins list and link their prototypes directly to `PluginArray.prototype` and `Plugin.prototype`.
- Permissions Query Alignment: We align `Notification.permission` and `navigator.permissions.query` so that they agree. The query resolves to `prompt`, the state that corresponds to a `Notification.permission` of `default`, which matches real browser configurations.
The result? A 100% green pass on benchmarks like bot.sannysoft.com and seamless access to sites like G2.com.
Graph-Native Navigation Mapping
Every web interaction is a state transition. When the agent clicks a button, the page layout changes.
MCP Lite integrates with Neo4j to represent these layouts as nodes and interactions as edges. At every page load:
- We compute a SHA-256 hash of the serialized pruned AOM structure to represent the page’s unique layout as a `State` node.
- We log transitions as an `INTERACTED` edge:
```cypher
MERGE (f:State {hash: $from})
MERGE (t:State {hash: $to})
MERGE (f)-[r:INTERACTED {role: $role, name: $name, index: $index}]->(t)
SET r.x = $x, r.y = $y, r.w = $w, r.h = $h
```
If an agent needs to navigate from its current page to a target URL, it queries Neo4j for the shortest path of logical interactions (e.g., "click Search -> type username -> click Submit") and replays it directly, without an LLM reasoning loop.
Replaying Workflows with Zero LLM Overhead
For repetitive automation (such as bulk form submissions or multi-page scrapes), running the LLM in a step-by-step loop is expensive.
MCP Lite includes a workflow engine:
- Record: Call `workflow_record_start` and perform actions.
- Template: Actions are written to a JSON file. Use double braces (`{{variable_name}}`) to define input fields.
- Replay: Play back the workflow with an input dataset:
```json
[
  {"search_query": "adidas campus green"},
  {"search_query": "nike air force white"}
]
```
During playback, MCP Lite runs the automation loop itself. It feeds each row's variables into the template and executes the actions sequentially, with no LLM calls involved.
Try It Yourself!
The repository is open-source under the GNU AGPL-v3 license.
Quick Build
```bat
git clone https://github.com/UnitBuilds-CC/MCP-Lite.git
cd MCP-Lite
go mod tidy
REM Rebuild the server, benchmark, and test binaries
build.bat
```
Configure Claude Desktop
Add it to your claude_desktop_config.json:
```json
{
  "mcpServers": {
    "agentic-browser-mcp-lite": {
      "command": "C:\\path\\to\\mcp-lite\\mcp-server.exe",
      "env": {
        "NEO4J_URI": "bolt://localhost:7687",
        "NEO4J_USER": "neo4j",
        "NEO4J_PASS": "secure_password"
      }
    }
  }
}
```
Check out the code, run the benchmark suite against your own target sites, and join us in building a more efficient web automation standard!
UnitBuilds-CC/MCP-Lite: AOM-driven agentic browser MCP
Agentic Browser (MCP Lite) 🚀
A highly efficient, standalone, graph-native version of the Agentic Browser MCP Server. It provides direct LLM control over Google Chrome via the Model Context Protocol (MCP) using chromedp. It integrates with Neo4j to build site graphs, query shortest action paths, and automate complex workflows with looping and parameter binding.
Features
- Raw MCP Browser Automation: Perform standard interactions (navigate, click, type, scroll, wait, drag, screenshot, list frames) using standard logical pixel offsets.
- Neo4j Graph Logging: Automatically logs page navigation data (titles, links, scripts, and cookies) and AOM state-transition graphs directly to a Neo4j database.
- BFS Site Crawling: Traverses a site using Breadth-First Search (BFS) to map structural accessibility states and link action transitions.
- Shortest Path Execution: Query Neo4j for the shortest sequence of actions from any start state/URL to a target state/URL, and execute the transitions automatically.
- Workflow Recording & Execution…
Discussion
How are you currently handling web browsing for your AI agents? Are you hitting context limits with raw HTML, or have you tried vision-based approaches?
I'd love to hear your thoughts on using the AOM vs. the DOM for these types of workflows!
Top comments (12)
Solid approach — AOM over raw DOM is a genuinely good insight, and the benchmarks show real numbers. A couple of questions though:
How does it hold up on SPA-heavy sites (React dashboards, maps, etc.) where AOM coverage is spottier than on Wikipedia/GitHub? Would be interesting to see the same benchmark on something like a Google Sheets or Notion page.
Neo4j for state transitions makes sense for complex workflows, but for simpler "navigate → extract → done" patterns, do you see it as a net win or does the infrastructure overhead outweigh the benefit?
Either way, clean write-up. Curious to see how this evolves.
While not as good as native AOM coverage, it does use a heuristic search for interactive elements, and it has inspect_node to isolate a node and send commands with respect to its node ID. The result is still cleaner interactivity and a token reduction compared with the traditional approach.
Neo4j is integrated as a 'long-game' solution. Mapping sites at enterprise level would mean a large amount of RAM usage, but practically zero inference cost for any action triggered on that site after the initial mapping. So as scale increases, efficiency rises. A little overhead for scalability makes a big difference when it's system memory instead of VRAM plus inference.
Docs updated on the repo for the SPA test
I would need standard benchmark stats vs. this. This is potentially novel information if you can prove there's no intelligence loss in any way, or corners skipped.
Docs updated.