There's a clean story everyone tells about AI agents and the web.
Agents will call structured tools. Websites will expose those tools. Everything will be typed, reliable, and boring in the good way. Google even shipped an early implementation of a proposed standard for it: WebMCP, in Chrome, behind a flag.
It's a genuinely good idea. There's just one problem.
Almost nobody has implemented it. Adoption is effectively zero. And web standards don't get adopted in quarters — they get adopted in years, if they get adopted at all.
So in the meantime, your agent is still doing the embarrassing thing: screenshotting pages, scraping the DOM, clicking at pixel coordinates, and quietly praying the layout didn't shift since last Tuesday.
I got tired of waiting. So I asked a different question:
What if you didn't need the website's permission?
A search box — a labeled input next to a submit button — already is a search(query) tool. The spec is right there, rendered in HTML. Someone just has to read it and write it down.
That's the entire idea behind webmcp-gen.
```shell
pip install webmcp-gen
webmcp-gen https://news.ycombinator.com --groq
```

```json
{
  "tools": [{
    "name": "searchStories",
    "description": "Search Hacker News stories",
    "parameters": {
      "type": "object",
      "properties": { "query": { "type": "string" } },
      "required": ["query"]
    }
  }]
}
```
It drives a real browser, reads the page the way a person would, and emits WebMCP tool definitions. Then — the part that makes it useful instead of a toy — it runs as an MCP server, so Claude Desktop, Cline, or any MCP client can actually call those tools on the live site and get structured results back.
The pipeline is four stages:
```text
EXTRACT   real browser -> DOM + Shadow DOM + iframes -> stable CSS selectors
ANALYZE   heuristic or LLM -> WebMCP tools, each param bound to its selector
SERVE     MCP server (stdio / SSE / streamable-HTTP)
EXECUTE   fill by selector -> submit -> read structured results back
```
Let me show you the two parts that actually took thought.
Part 1: the selector binding (why it doesn't fall over on real pages)
Most "let AI use the browser" tools work by showing the model the DOM and letting it guess what to click. That guessing is exactly where they fall apart on a real, messy page — the model picks the wrong input, or the layout shifts and the coordinates rot.
webmcp-gen makes a different bet: resolve the target once, deterministically, at generation time. Every parameter the analyzer emits carries a _selector — the exact CSS selector that fills it. The tool an agent sees is clean:
```json
{ "query": { "type": "string", "description": "Search term" } }
```
But the version the executor holds also carries the binding:
```json
{
  "query": {
    "type": "string",
    "description": "Search term",
    "_selector": "input[name=\"q\"]"
  },
  "_submit_selector": "form#search"
}
```
When the agent calls searchStories(query="rust"), there is no guessing. The executor fills input[name="q"] and submits form#search. The LLM was used once, up front, to name things and infer intent — never on the hot path to re-derive what a search box is.
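To make the split concrete, here is a minimal sketch of how a bound schema could be separated into the clean view the agent sees and a fill-then-submit plan for the executor. The helper names `splitBoundSchema` and `planCall` and the step shape are illustrative assumptions, not webmcp-gen's actual API; only the `_selector` and `_submit_selector` keys come from the example above.

```javascript
// Hypothetical helpers: only the `_selector` / `_submit_selector` keys are from the article.
function splitBoundSchema(bound) {
  const publicProps = {}; // what the agent is shown
  const bindings = {};    // parameter name -> CSS selector, kept by the executor
  for (const [name, spec] of Object.entries(bound)) {
    if (name.startsWith('_')) continue; // tool-level private keys, e.g. _submit_selector
    const { _selector, ...visible } = spec;
    publicProps[name] = visible;
    if (_selector) bindings[name] = _selector;
  }
  return { publicProps, bindings, submitSelector: bound._submit_selector ?? null };
}

// Turn a tool call into deterministic steps: no model involved at call time.
function planCall(bound, args) {
  const { bindings, submitSelector } = splitBoundSchema(bound);
  const steps = [];
  for (const [name, value] of Object.entries(args)) {
    const selector = bindings[name];
    if (!selector) throw new Error(`No selector bound for parameter "${name}"`);
    steps.push({ action: 'fill', selector, value: String(value) });
  }
  if (submitSelector) steps.push({ action: 'submit', selector: submitSelector });
  return steps;
}

const bound = {
  query: { type: 'string', description: 'Search term', _selector: 'input[name="q"]' },
  _submit_selector: 'form#search',
};
const plan = planCall(bound, { query: 'rust' });
console.log(plan);
```

The plan is plain data, so it can be logged, replayed, or checked before anything touches the browser.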
The selectors themselves are generated with a fallback chain, most-stable first:
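```javascript
function stableSelector(el) {
  // Read attributes, not properties: on a <form>, el.id can be clobbered
  // by a field named "id" (see Part 2).
  const id = el.getAttribute('id');
  if (id) return '#' + CSS.escape(id);

  // data-testid values are arbitrary strings, so escape them before quoting.
  const testId = el.getAttribute('data-testid');
  if (testId) return `[data-testid="${CSS.escape(testId)}"]`;

  if (el.name && el.tagName === 'INPUT')
    return `input[name="${CSS.escape(el.name)}"]`;
  // ... select / textarea by name ...
  // last resort: a bounded path with :nth-of-type
}
```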
#id is best. data-testid is what good frontends ship for exactly this purpose. [name=...] is reliable for form fields. Only if all of those fail do we build a structural path — and even then it's capped at five levels so it can't generate a brittle 12-deep selector that breaks on the next deploy.
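The last-resort branch is elided in the snippet above, so here is one way a depth-capped structural path could be built. This is a sketch under assumptions: the name `boundedPath` and the exact output format are hypothetical, and the default of five simply mirrors the cap described above. It only relies on `tagName`, `parentElement`, and `children`, which real DOM elements provide.

```javascript
// Assumption: 5 mirrors the "five levels" cap; the real constant may differ.
const MAX_DEPTH = 5;

function boundedPath(el, maxDepth = MAX_DEPTH) {
  const parts = [];
  let node = el;
  // Walk upward, stopping at the root or once the cap is reached.
  while (node && node.parentElement && parts.length < maxDepth) {
    const tag = node.tagName.toLowerCase();
    const sameTag = Array.from(node.parentElement.children)
      .filter((sibling) => sibling.tagName === node.tagName);
    // :nth-of-type is 1-based and only needed when same-tag siblings exist.
    const index = sameTag.indexOf(node) + 1;
    parts.unshift(sameTag.length > 1 ? `${tag}:nth-of-type(${index})` : tag);
    node = node.parentElement;
  }
  return parts.join(' > ');
}
```

A capped path trades uniqueness for stability: on a deep page it may match more than one element, so a caller would likely want to confirm that `document.querySelectorAll(path).length === 1` before trusting it.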
Part 2: the bug that taught me to never trust form.method
Here's the war story, because it's the kind of thing you only hit once you run against dozens of real sites instead of your own test page.
Extraction was crashing on certain sites. Not erroring gracefully — crashing the entire page extraction, returning zero tools. The stack trace pointed at this innocent-looking line:
```javascript
method: (form.method || 'GET').toUpperCase()
```
The culprit is DOM clobbering. If a form contains an input named method — say <input name="method"> — then form.method no longer returns the string "get". It returns the input element. And elements don't have .toUpperCase(), so the whole thing throws and takes the page down with it.
Plenty of real forms have fields named method, action, submit, id. The property accessor is a trap.
The fix is to stop reading properties and read attributes instead:
```javascript
method: (form.getAttribute('method') || 'GET').toUpperCase()
```
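getAttribute reads the attribute itself, so a field named method can't shadow the value. (The one theoretical hole is a field literally named getAttribute, which would shadow the method; calling Element.prototype.getAttribute.call(form, 'method') closes it.) I went through and did the same everywhere I'd touched form/field properties, and wrapped each form's parsing in its own try/catch so one malformed form can't nuke the rest of the page. Recovered a bunch of sites that had been silently returning nothing.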
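A rough sketch of what that per-form isolation might look like. The function names `describeForm` and `extractForms` and the returned shape are hypothetical; the attribute read is the fix described above.

```javascript
// Hypothetical names; reads attributes rather than clobberable properties.
function describeForm(form) {
  return {
    method: (form.getAttribute('method') || 'GET').toUpperCase(),
    action: form.getAttribute('action') || '',
  };
}

// Parse each form in isolation so one failure cannot discard the others.
function extractForms(forms) {
  const parsed = [];
  const errors = [];
  for (const form of forms) {
    try {
      parsed.push(describeForm(form));
    } catch (err) {
      // Record the failure and keep going instead of aborting the page.
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  return { parsed, errors };
}
```

In a page context this would be called as `extractForms(document.querySelectorAll('form'))`. Keeping the errors rather than swallowing them makes it possible to see which forms were skipped and why.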
It's a small fix. But it's the difference between "works in the demo" and "works on the actual web," and you don't find it by being clever — you find it by running against real sites and reading the failures.
There's a related subtlety in extraction worth a line: webmcp-gen waits for the page with a MutationObserver that resolves when the DOM stops changing, not a fixed sleep. Single-page apps render after load; a sleep(2) either wastes two seconds or misses the content. Watching for stability is both faster and more correct. It also walks open Shadow DOM and same-origin iframes, so component-framework sites aren't invisible.
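For readers who haven't written a stability wait before, this is roughly the shape of the idea: restart a short quiet timer on every mutation and resolve once it finally fires, with a hard timeout as a backstop. The names `waitForQuiet` and `observeDom` and the 500 ms / 10 s defaults are assumptions for illustration, not values taken from webmcp-gen.

```javascript
// Resolves true once no change has been seen for quietMs,
// or false if timeoutMs passes first. Defaults are illustrative.
function waitForQuiet(subscribe, { quietMs = 500, timeoutMs = 10000 } = {}) {
  return new Promise((resolve) => {
    let quietTimer = null;
    let hardTimer = null;
    let unsubscribe = () => {};

    const finish = (stable) => {
      clearTimeout(quietTimer);
      clearTimeout(hardTimer);
      unsubscribe();
      resolve(stable);
    };
    const restartQuietTimer = () => {
      clearTimeout(quietTimer);
      quietTimer = setTimeout(() => finish(true), quietMs);
    };

    hardTimer = setTimeout(() => finish(false), timeoutMs);
    unsubscribe = subscribe(restartQuietTimer);
    restartQuietTimer(); // a page that never mutates still resolves
  });
}

// Browser wiring: every DOM mutation counts as "still changing".
function observeDom(onChange) {
  const observer = new MutationObserver(onChange);
  observer.observe(document.documentElement, {
    childList: true, subtree: true, attributes: true, characterData: true,
  });
  return () => observer.disconnect();
}
```

In a page this would run as `await waitForQuiet(observeDom)`. The hard timeout matters because pages with tickers or animations may never go fully quiet.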
The part I'm actually proud of: it tells the truth
Drive a headless browser and some sites will try to stop you.
webmcp-gen patches the obvious headless tells — navigator.webdriver, the missing window.chrome, an empty plugin list, the SwiftShader WebGL giveaway. That's enough for a surprising amount of the web.
It is not enough for Cloudflare challenges, CAPTCHAs, or behavioral fingerprinting. Beating those means residential proxies and a TLS-spoofing arms race I deliberately don't ship.
So when a site blocks it, webmcp-gen says so:
```json
{
  "success": false,
  "blocked": true,
  "error": "Blocked by anti-bot protection (redirected to '418.html')."
}
```
It never fakes a success: true over a CAPTCHA page. For an agent, a fake success with garbage results is far more dangerous than an honest "I was blocked" — the agent can recover from the second one, but it'll happily act on the first.
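As a sketch of the principle, a result classifier can check for block signals before it ever looks at extracted results. Everything here is an assumption for illustration: the function name `classifyOutcome`, the marker list, the status codes treated as blocks, and the shape of the success object. A real detector would be tuned against block pages seen in the wild.

```javascript
// Hypothetical markers and statuses, chosen for illustration only.
const CHALLENGE_MARKERS = ['captcha', 'verify you are human', 'access denied'];
const BLOCK_STATUSES = new Set([403, 429]);

function classifyOutcome({ status, finalUrl, bodyText, results }) {
  const text = (bodyText || '').toLowerCase();
  const marker = CHALLENGE_MARKERS.find((m) => text.includes(m));
  // Block signals are checked first, so results scraped from a
  // challenge page can never be reported as a success.
  if (BLOCK_STATUSES.has(status) || marker) {
    const reason = marker ? `page contains "${marker}"` : `HTTP ${status}`;
    return {
      success: false,
      blocked: true,
      error: `Blocked by anti-bot protection (${reason}; final URL ${finalUrl}).`,
    };
  }
  return { success: true, blocked: false, results };
}
```

Plain substring markers will occasionally flag a legitimate page that merely mentions a CAPTCHA. For an agent, that false "blocked" is the cheaper mistake of the two.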
Does it actually work?
There's a benchmark in the repo. It runs the full pipeline against real sites grouped by difficulty, because "X% success" is meaningless until you say which sites.
| Tier | What it means |
|---|---|
| sandbox | sites built for automation |
| open | public sites, no aggressive detection |
| guarded | real sites that may throttle or challenge |
| walled | known hard-blocks (reported blocked, never faked) |
On the open and sandbox tiers it lands the large majority of sites — including successful live runs against names like Google, Bing, GitHub, and Wikipedia, not just toy pages. On the walled tier it correctly reports blocked.
The point of the tiers is honesty: one aggregate percentage hides which sites it actually handles. The whole suite is in the source, and you can re-run it:
```shell
webmcp-benchmark --suite full
```
It does more than single calls
- Multi-page crawl: one page rarely shows everything a site can do. `--crawl` walks the origin and merges tools from every page it finds.
- Authenticated sessions: for gated sites, you log in once in a real browser (you type the password, not the tool), and it reuses the session.
- Tool-chaining workflows: chain `search -> open result -> act`, passing earlier results into later steps. The page is re-read between steps, so a "reserve" button that only appears on a detail page becomes callable when you get there.
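To illustrate what "passing earlier results into later steps" can mean in practice, here is a generic sketch of a step runner. It is not webmcp-gen's workflow format: `runWorkflow`, the step shape, and the tool names in the usage note are all hypothetical, and `callTool` stands in for whatever actually invokes a tool (and would be the place where the page is re-read between steps).

```javascript
// Hypothetical runner: each step's args may be a plain object or a
// function of the results collected from earlier steps.
async function runWorkflow(steps, callTool) {
  const results = [];
  for (const step of steps) {
    const args = typeof step.args === 'function' ? step.args(results) : step.args;
    // Steps run strictly in order; a thrown error stops the chain.
    results.push(await callTool(step.tool, args));
  }
  return results;
}
```

A chain might then read `[{ tool: 'search', args: { query: 'rust' } }, { tool: 'openResult', args: (prev) => ({ url: prev[0].items[0].url }) }]`, with the second step built from the first step's output.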
All of it works with any OpenAI-compatible API: Groq, OpenAI, or a local Ollama model. With a local model, the analysis step needs no hosted LLM at all.
Try it
```shell
pip install webmcp-gen
playwright install chromium
webmcp-gen https://en.wikipedia.org --groq
```
It's MIT-licensed, on PyPI, and the README has the architecture diagrams and the honest caveats spelled out.
→ github.com/Nidhicodes/webmcp-gen
If you're building agents and you've felt this exact frustration, I'd genuinely love to know where it breaks for you. The interesting failures are the ones I haven't seen yet.