<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nidhi Singh</title>
    <description>The latest articles on DEV Community by Nidhi Singh (@nidhi-singh).</description>
    <link>https://dev.to/nidhi-singh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3965352%2Ff7df9782-3d7f-4890-8b49-0f4f83523eba.jpg</url>
      <title>DEV Community: Nidhi Singh</title>
      <link>https://dev.to/nidhi-singh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nidhi-singh"/>
    <language>en</language>
    <item>
      <title>I gave one Gemini agent two observability tools. The correlation it found surprised me.</title>
      <dc:creator>Nidhi Singh</dc:creator>
      <pubDate>Sat, 06 Jun 2026 12:54:29 +0000</pubDate>
      <link>https://dev.to/nidhi-singh/i-gave-one-gemini-agent-two-observability-tools-the-correlation-it-found-surprised-me-2h4i</link>
      <guid>https://dev.to/nidhi-singh/i-gave-one-gemini-agent-two-observability-tools-the-correlation-it-found-surprised-me-2h4i</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftso582wux0t3fqbqbvij.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftso582wux0t3fqbqbvij.png" width="800" height="415"&gt;&lt;/a&gt;&lt;br&gt;
There is a category of production bug in AI systems that I find genuinely fascinating, because the difficulty has almost nothing to do with the bug itself. The bug is often simple. What makes it nearly undebuggable is the way we've chosen to organize our tools. I want to walk through it carefully, because once the shape of the problem is clear, the solution becomes almost forced, and that solution turned into a project I'll show you at the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two worlds that never meet
&lt;/h2&gt;

&lt;p&gt;When you put a model into production, you quickly find yourself watching two different things.&lt;/p&gt;

&lt;p&gt;The first is the infrastructure. This is the world of memory, CPU, pods, network, latency — the machinery the model runs on. We have excellent tools for this; Dynatrace is the one I used.&lt;/p&gt;

&lt;p&gt;The second is the model's own behavior. Is it hallucinating? Are its answers relevant? How is the eval score trending, and how many tokens is it consuming? This is a genuinely different kind of observability, and again we have good tools for it; I used Arize Phoenix.&lt;/p&gt;

&lt;p&gt;Here is the important part, and it's so ordinary that it's easy to miss: these two worlds are monitored by two different products, and those products do not know about each other. Worse, they're usually watched by two different teams. The infrastructure has its on-call rotation; the model has its own. Each group is fluent in its own dashboard and effectively blind to the other.&lt;/p&gt;

&lt;h2&gt;
  
  
  The failure that lives in the seam
&lt;/h2&gt;

&lt;p&gt;Now consider a specific incident. A memory leak begins on one of your pods. Under memory pressure, the system does something reasonable in isolation: it trims the buffer that assembles prompts before they go to the model, to reclaim space. The consequence is that the model starts receiving prompts with part of their context silently removed. And a model running on truncated context does the only thing it can: it fills the missing pieces by guessing. The hallucination rate climbs.&lt;/p&gt;

&lt;p&gt;Watch what each observer sees. The infrastructure engineer sees memory utilization spike. That's a familiar, almost boring signal: restart the pod, reclaim the memory, move on. The ML engineer sees the model's answer quality fall off a cliff and begins the long investigation into prompts, retrieval, weights. Each of them is looking at exactly one link of a single causal chain, and nothing in their tool gives them any reason to suspect that the other link exists, let alone that it belongs to the same story.&lt;/p&gt;

&lt;p&gt;This is the insight I kept coming back to: the bug is not technical, it's organizational. Every piece of information required to solve it is already being collected. The failure is purely that the two halves of the chain never arrive in the same place, in the same mind, at the same time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The forced solution
&lt;/h2&gt;

&lt;p&gt;Once you frame it that way, the fix is almost not a choice. If the problem is that no single observer sees both layers, then you create an observer that does. You put one agent in front of both dashboards.&lt;/p&gt;

&lt;p&gt;That's ARIA. It connects to both Dynatrace and Arize Phoenix through their MCP servers, pulls the relevant signals from each, and hands the combined picture to Gemini — orchestrated with Google's Agent Development Kit as a planner → reasoner → executor pipeline — to reason over as one problem instead of two.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwyv5vakx9cu84mn9y0cn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwyv5vakx9cu84mn9y0cn.png" alt=" " width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The one decision that actually mattered
&lt;/h2&gt;

&lt;p&gt;I'll share the mistake, because it's the most useful part. My first design was two agents: one that understood Dynatrace, one that understood Arize, talking to each other. It felt natural — mirror the org structure in software. It does not work. All it does is faithfully reproduce the two-teams blind spot inside your code. Each agent is still an expert in one half and a stranger to the other.&lt;/p&gt;

&lt;p&gt;The correlation only emerges when a single agent holds both toolsets inside one reasoning context. When one mind can call a Dynatrace tool and an Arize tool in the same turn, and keep both results in view at once, it can finally see the chain end to end. That's the whole product compressed into a sentence: one mind, both halves.&lt;/p&gt;
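
&lt;p&gt;To make "both halves in one place" concrete, here is a toy sketch. It is not ARIA's actual code: it simply joins pre-flagged anomaly events from the two sources on a shared minute key. The function name &lt;code&gt;correlateByMinute&lt;/code&gt;, the event shape, and the signal labels are all assumptions made up for illustration.&lt;/p&gt;

```javascript
// Toy illustration only: the event shape and signal names are hypothetical.
// Each event is { minute: 'HH:MM', signal: string } and has already been
// flagged as anomalous by whichever tool produced it.
function correlateByMinute(infraEvents, modelEvents) {
  // Index infrastructure anomalies by the minute they occurred.
  const infraByMinute = new Map();
  for (const event of infraEvents) {
    infraByMinute.set(event.minute, event.signal);
  }

  // Keep only the model anomalies that share a minute with an infrastructure anomaly.
  const linked = [];
  for (const event of modelEvents) {
    if (infraByMinute.has(event.minute)) {
      linked.push({
        minute: event.minute,
        infra: infraByMinute.get(event.minute),
        model: event.signal,
      });
    }
  }
  return linked;
}

const linked = correlateByMinute(
  [{ minute: '12:01', signal: 'memory_spike' }],
  [
    { minute: '12:01', signal: 'hallucination_rate_up' },
    { minute: '12:09', signal: 'eval_score_drop' },
  ]
);
console.log(linked);
```

&lt;p&gt;An exact-minute match is deliberately crude; a real correlation would need time windows and lag. The point is only that the join is trivial once both event streams sit in the same context, and impossible while they sit in two.&lt;/p&gt;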

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Live: &lt;a href="https://aria-three-lac.vercel.app/" rel="noopener noreferrer"&gt;https://aria-three-lac.vercel.app/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Code: &lt;a href="https://github.com/Nidhicodes/aria" rel="noopener noreferrer"&gt;https://github.com/Nidhicodes/aria&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Demo: &lt;a href="https://youtu.be/odIU07C-jfY" rel="noopener noreferrer"&gt;https://youtu.be/odIU07C-jfY&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Nidhi Singh</dc:creator>
      <pubDate>Tue, 02 Jun 2026 21:36:58 +0000</pubDate>
      <link>https://dev.to/nidhi-singh/-4o04</link>
      <guid>https://dev.to/nidhi-singh/-4o04</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/nidhi-singh/webmcp-has-0-adoption-so-i-generated-the-tools-myself-59k7" class="crayons-story__hidden-navigation-link"&gt;WebMCP has 0% adoption. So I generated the tools myself.&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/nidhi-singh" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3965352%2Ff7df9782-3d7f-4890-8b49-0f4f83523eba.jpg" alt="nidhi-singh profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/nidhi-singh" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Nidhi Singh
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Nidhi Singh
                
              
              &lt;div id="story-author-preview-content-3805675" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/nidhi-singh" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3965352%2Ff7df9782-3d7f-4890-8b49-0f4f83523eba.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Nidhi Singh&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/nidhi-singh/webmcp-has-0-adoption-so-i-generated-the-tools-myself-59k7" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jun 2&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/nidhi-singh/webmcp-has-0-adoption-so-i-generated-the-tools-myself-59k7" id="article-link-3805675"&gt;
          WebMCP has 0% adoption. So I generated the tools myself.
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/opensource"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;opensource&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/mcp"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;mcp&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/nidhi-singh/webmcp-has-0-adoption-so-i-generated-the-tools-myself-59k7" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;2&lt;span class="hidden s:inline"&gt;&amp;nbsp;reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/nidhi-singh/webmcp-has-0-adoption-so-i-generated-the-tools-myself-59k7#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              &lt;span class="hidden s:inline"&gt;Add&amp;nbsp;Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            6 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>WebMCP has 0% adoption. So I generated the tools myself.</title>
      <dc:creator>Nidhi Singh</dc:creator>
      <pubDate>Tue, 02 Jun 2026 21:05:42 +0000</pubDate>
      <link>https://dev.to/nidhi-singh/webmcp-has-0-adoption-so-i-generated-the-tools-myself-59k7</link>
      <guid>https://dev.to/nidhi-singh/webmcp-has-0-adoption-so-i-generated-the-tools-myself-59k7</guid>
      <description>&lt;p&gt;There's a clean story everyone tells about AI agents and the web.&lt;/p&gt;

&lt;p&gt;Agents will call structured tools. Websites will expose those tools. Everything will be typed, reliable, and boring in the good way. Google even shipped a standard for it — &lt;strong&gt;WebMCP&lt;/strong&gt;, in Chrome, behind a flag.&lt;/p&gt;

&lt;p&gt;It's a genuinely good idea. There's just one problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Almost nobody has implemented it.&lt;/strong&gt; Adoption is effectively zero. And web standards don't get adopted in quarters — they get adopted in years, if they get adopted at all.&lt;/p&gt;

&lt;p&gt;So in the meantime, your agent is still doing the embarrassing thing: screenshotting pages, scraping the DOM, clicking at pixel coordinates, and quietly praying the layout didn't shift since last Tuesday.&lt;/p&gt;

&lt;p&gt;I got tired of waiting. So I asked a different question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if you didn't need the website's permission?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A search box — a labeled input next to a submit button — already &lt;em&gt;is&lt;/em&gt; a &lt;code&gt;search(query)&lt;/code&gt; tool. The spec is right there, rendered in HTML. Someone just has to read it and write it down.&lt;/p&gt;

&lt;p&gt;That's the entire idea behind &lt;strong&gt;webmcp-gen&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;webmcp-gen
webmcp-gen https://news.ycombinator.com &lt;span class="nt"&gt;--groq&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"searchStories"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Search Hacker News stories"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It drives a real browser, reads the page the way a person would, and emits WebMCP tool definitions. Then — the part that makes it useful instead of a toy — it runs as an &lt;strong&gt;MCP server&lt;/strong&gt;, so Claude Desktop, Cline, or any MCP client can &lt;em&gt;actually call those tools on the live site&lt;/em&gt; and get structured results back.&lt;/p&gt;

&lt;p&gt;The pipeline is four stages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXTRACT   real browser -&amp;gt; DOM + Shadow DOM + iframes -&amp;gt; stable CSS selectors
ANALYZE   heuristic or LLM -&amp;gt; WebMCP tools, each param bound to its selector
SERVE     MCP server (stdio / SSE / streamable-HTTP)
EXECUTE   fill by selector -&amp;gt; submit -&amp;gt; read structured results back
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me show you the two parts that actually took thought.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: the selector binding (why it doesn't fall over on real pages)
&lt;/h2&gt;

&lt;p&gt;Most "let AI use the browser" tools work by showing the model the DOM and letting it guess what to click. That guessing is exactly where they fall apart on a real, messy page — the model picks the wrong input, or the layout shifts and the coordinates rot.&lt;/p&gt;

&lt;p&gt;webmcp-gen makes a different bet: &lt;strong&gt;resolve the target once, deterministically, at generation time.&lt;/strong&gt; Every parameter the analyzer emits carries a &lt;code&gt;_selector&lt;/code&gt; — the exact CSS selector that fills it. The tool an agent sees is clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Search term"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the version the executor holds also carries the binding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Search term"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"_selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"input[name=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;q&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_submit_selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"form#search"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the agent calls &lt;code&gt;searchStories(query="rust")&lt;/code&gt;, there is no guessing. The executor fills &lt;code&gt;input[name="q"]&lt;/code&gt; and submits &lt;code&gt;form#search&lt;/code&gt;. The LLM was used &lt;strong&gt;once&lt;/strong&gt;, up front, to name things and infer intent — never on the hot path to re-derive what a search box is.&lt;/p&gt;
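
&lt;p&gt;One way to get from the executor's bound version to the clean one the agent sees is to drop every key that starts with an underscore. The sketch below assumes exactly that convention; the helper names &lt;code&gt;toAgentSchema&lt;/code&gt; and &lt;code&gt;isPlainObject&lt;/code&gt; are mine for illustration, not necessarily how webmcp-gen does it internally.&lt;/p&gt;

```javascript
// Sketch only: assumes binding metadata always lives under keys starting with "_".
function isPlainObject(value) {
  if (value === null) return false;
  if (Array.isArray(value)) return false;
  return typeof value === 'object';
}

// Return a copy of the bound definition with all "_"-prefixed keys removed,
// recursing into nested objects so per-parameter bindings are stripped too.
function toAgentSchema(bound) {
  const clean = {};
  for (const [key, value] of Object.entries(bound)) {
    if (key.startsWith('_')) continue;
    clean[key] = isPlainObject(value) ? toAgentSchema(value) : value;
  }
  return clean;
}

const bound = {
  query: { type: 'string', description: 'Search term', _selector: 'input[name="q"]' },
  _submit_selector: 'form#search',
};
console.log(JSON.stringify(toAgentSchema(bound)));
// prints {"query":{"type":"string","description":"Search term"}}
```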

&lt;p&gt;The selectors themselves are generated with a fallback chain, most-stable first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;stableSelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;CSS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;data-testid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`[data-testid="&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;data-testid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;"]`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tagName&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;INPUT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`input[name="&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;CSS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;"]`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// ... select / textarea by name ...&lt;/span&gt;
  &lt;span class="c1"&gt;// last resort: a bounded path with :nth-of-type&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;#id&lt;/code&gt; is best. &lt;code&gt;data-testid&lt;/code&gt; is what good frontends ship for exactly this purpose. &lt;code&gt;[name=...]&lt;/code&gt; is reliable for form fields. Only if all of those fail do we build a structural path — and even then it's capped at five levels so it can't generate a brittle 12-deep selector that breaks on the next deploy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 2: the bug that taught me to never trust &lt;code&gt;form.method&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Here's the war story, because it's the kind of thing you only hit once you run against dozens of real sites instead of your own test page.&lt;/p&gt;

&lt;p&gt;Extraction was crashing on certain sites. Not erroring gracefully — crashing the &lt;em&gt;entire page extraction&lt;/em&gt;, returning zero tools. The stack trace pointed at this innocent-looking line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;form&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toUpperCase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The culprit is &lt;strong&gt;DOM clobbering.&lt;/strong&gt; If a form contains an input named &lt;code&gt;method&lt;/code&gt; — say &lt;code&gt;&amp;lt;input name="method"&amp;gt;&lt;/code&gt; — then &lt;code&gt;form.method&lt;/code&gt; no longer returns the string &lt;code&gt;"get"&lt;/code&gt;. It returns the &lt;em&gt;input element&lt;/em&gt;. And elements don't have &lt;code&gt;.toUpperCase()&lt;/code&gt;, so the whole thing throws and takes the page down with it.&lt;/p&gt;

&lt;p&gt;Plenty of real forms have fields named &lt;code&gt;method&lt;/code&gt;, &lt;code&gt;action&lt;/code&gt;, &lt;code&gt;submit&lt;/code&gt;, &lt;code&gt;id&lt;/code&gt;. The property accessor is a trap.&lt;/p&gt;

&lt;p&gt;The fix is to stop reading properties and read attributes instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;form&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;method&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toUpperCase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



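&lt;p&gt;A field name can't change what &lt;code&gt;getAttribute('method')&lt;/code&gt; returns, because it reads the attribute rather than a named property. (Strictly, a field literally named &lt;code&gt;getAttribute&lt;/code&gt; could still shadow the method itself; &lt;code&gt;Element.prototype.getAttribute.call(form, 'method')&lt;/code&gt; closes even that gap.) I went through and did the same everywhere I'd touched form/field properties, and wrapped each form's parsing in its own try/catch so one malformed form can't nuke the rest of the page. That recovered a bunch of sites that had been silently returning nothing.&lt;/p&gt;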
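
&lt;p&gt;The per-form isolation is worth seeing in miniature. This is a simplified sketch rather than the real extractor: &lt;code&gt;parseAllForms&lt;/code&gt; and the &lt;code&gt;parseForm&lt;/code&gt; callback are hypothetical names, and the only thing it demonstrates is that a throw inside one form gets recorded instead of aborting the loop.&lt;/p&gt;

```javascript
// Read the method from the attribute, not from the clobberable form.method property.
function readFormMethod(form) {
  return (form.getAttribute('method') || 'GET').toUpperCase();
}

// Parse every form independently so one bad form cannot abort the rest.
// parseForm is a hypothetical per-form parser supplied by the caller.
function parseAllForms(forms, parseForm) {
  const parsed = [];
  const failures = [];
  // Array.from also accepts an HTMLCollection such as document.forms.
  Array.from(forms).forEach(function (form, index) {
    try {
      parsed.push(parseForm(form));
    } catch (err) {
      // Record the failure and keep going with the remaining forms.
      failures.push({ index: index, message: String(err) });
    }
  });
  return { parsed: parsed, failures: failures };
}
```

&lt;p&gt;In a page this could be called as &lt;code&gt;parseAllForms(document.forms, ...)&lt;/code&gt;; returning the failures alongside the parsed forms keeps a bad form visible in the output instead of swallowing it.&lt;/p&gt;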

&lt;p&gt;It's a small fix. But it's the difference between "works in the demo" and "works on the actual web," and you don't find it by being clever — you find it by running against real sites and reading the failures.&lt;/p&gt;

&lt;p&gt;There's a related subtlety in extraction worth a line: webmcp-gen waits for the page with a &lt;strong&gt;MutationObserver&lt;/strong&gt; that resolves when the DOM stops changing, not a fixed &lt;code&gt;sleep&lt;/code&gt;. Single-page apps render after load; a &lt;code&gt;sleep(2)&lt;/code&gt; either wastes two seconds or misses the content. Watching for stability is both faster and more correct. It also walks open &lt;strong&gt;Shadow DOM&lt;/strong&gt; and same-origin &lt;strong&gt;iframes&lt;/strong&gt;, so component-framework sites aren't invisible.&lt;/p&gt;
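
&lt;p&gt;For readers who haven't written one, a stability wait of this kind can be sketched in a few lines of in-page JavaScript. This is my own minimal version, not the code webmcp-gen ships: the name &lt;code&gt;waitForDomStable&lt;/code&gt; and the &lt;code&gt;quietMs&lt;/code&gt; and &lt;code&gt;timeoutMs&lt;/code&gt; parameters are assumptions, and it expects to run in a browser where &lt;code&gt;MutationObserver&lt;/code&gt; exists.&lt;/p&gt;

```javascript
// Resolve once no DOM mutation has been seen for quietMs milliseconds,
// or after timeoutMs at the latest so a page that never settles cannot hang us.
function waitForDomStable(root, quietMs, timeoutMs) {
  return new Promise(function (resolve) {
    let quietTimer = null;
    let hardStop = null;

    function finish() {
      clearTimeout(quietTimer);
      clearTimeout(hardStop);
      observer.disconnect();
      resolve();
    }

    // Every observed mutation restarts the quiet-period countdown.
    const observer = new MutationObserver(function () {
      clearTimeout(quietTimer);
      quietTimer = setTimeout(finish, quietMs);
    });

    observer.observe(root, {
      childList: true,
      subtree: true,
      attributes: true,
      characterData: true,
    });

    // Start the first countdown in case the page is already idle.
    quietTimer = setTimeout(finish, quietMs);
    hardStop = setTimeout(finish, timeoutMs);
  });
}
```

&lt;p&gt;Something like &lt;code&gt;waitForDomStable(document.body, 300, 10000)&lt;/code&gt; would wait for 300 ms of silence with a ten-second ceiling; both numbers are placeholders to tune per site.&lt;/p&gt;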




&lt;h2&gt;
  
  
  The part I'm actually proud of: it tells the truth
&lt;/h2&gt;

&lt;p&gt;Drive a headless browser and some sites will try to stop you.&lt;/p&gt;

&lt;p&gt;webmcp-gen patches the obvious headless tells — &lt;code&gt;navigator.webdriver&lt;/code&gt;, the missing &lt;code&gt;window.chrome&lt;/code&gt;, an empty plugin list, the SwiftShader WebGL giveaway. That's enough for a surprising amount of the web.&lt;/p&gt;

&lt;p&gt;It is &lt;strong&gt;not&lt;/strong&gt; enough for Cloudflare challenges, CAPTCHAs, or behavioral fingerprinting. Beating those means residential proxies and a TLS-spoofing arms race I deliberately don't ship.&lt;/p&gt;

&lt;p&gt;So when a site blocks it, webmcp-gen says so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"success"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"blocked"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Blocked by anti-bot protection (redirected to '418.html')."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It never fakes a &lt;code&gt;success: true&lt;/code&gt; over a CAPTCHA page. For an agent, a fake success with garbage results is far more dangerous than an honest "I was blocked" — the agent can recover from the second one, but it'll happily act on the first.&lt;/p&gt;
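
&lt;p&gt;A rough sketch of that rule in Python, under stated assumptions: the result shape follows the JSON above, but &lt;code&gt;build_result&lt;/code&gt;, the &lt;code&gt;BLOCK_MARKERS&lt;/code&gt; list, and the &lt;code&gt;tools&lt;/code&gt; key are hypothetical names for illustration, not webmcp-gen's actual detection logic.&lt;/p&gt;

```python
# Hypothetical markers; a real detector would need a longer, tested list.
BLOCK_MARKERS = ("captcha", "verify you are human", "access denied")


def build_result(final_url, page_text, tools):
    """Return an honest result: never success=True over a block page."""
    if "418.html" in final_url:
        # A redirect to a block page is the clearest signal.
        page = final_url.rsplit("/", 1)[-1]
        return {
            "success": False,
            "blocked": True,
            "error": f"Blocked by anti-bot protection (redirected to '{page}').",
        }
    lowered = page_text.lower()
    for marker in BLOCK_MARKERS:
        if marker in lowered:
            return {
                "success": False,
                "blocked": True,
                "error": f"Blocked by anti-bot protection (page contains '{marker}').",
            }
    # Only report success when no block signal was seen.
    return {"success": True, "blocked": False, "tools": tools}
```

&lt;p&gt;The block checks run before anything else on purpose: the success branch is reachable only after every block signal has been ruled out.&lt;/p&gt;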




&lt;h2&gt;
  
  
  Does it actually work?
&lt;/h2&gt;

&lt;p&gt;There's a benchmark in the repo. It runs the full pipeline against real sites grouped by difficulty, because "X% success" is meaningless until you say &lt;em&gt;which&lt;/em&gt; sites.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;sandbox&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;sites built for automation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;open&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;public sites, no aggressive detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;guarded&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;real sites that may throttle or challenge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;walled&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;known hard-blocks (reported &lt;code&gt;blocked&lt;/code&gt;, never faked)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On the sandbox and open tiers it succeeds on the large majority of sites, including live runs against Google, Bing, GitHub, and Wikipedia, not just toy pages. On the walled tier it correctly reports &lt;code&gt;blocked&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The point of the tiers is honesty: one aggregate percentage hides which sites it actually handles. The whole suite is in the source, and you can re-run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;webmcp-benchmark &lt;span class="nt"&gt;--suite&lt;/span&gt; full
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
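
&lt;p&gt;For a sense of what per-tier reporting looks like, here is a small Python sketch. It is an illustration, not the benchmark's real code: &lt;code&gt;summarize_by_tier&lt;/code&gt; and the outcome labels are assumed names, and any runs you feed it in this form are your own data, not results from the repo.&lt;/p&gt;

```python
from collections import Counter

TIERS = ("sandbox", "open", "guarded", "walled")


def summarize_by_tier(runs):
    """Group (tier, outcome) pairs by tier instead of one aggregate number.

    outcome is assumed to be "success", "blocked", or "failed".
    """
    per_tier = {tier: Counter() for tier in TIERS}
    for tier, outcome in runs:
        per_tier[tier][outcome] += 1
    summary = {}
    for tier, counts in per_tier.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no runs recorded for this tier
        summary[tier] = {
            "total": total,
            "success": counts["success"],
            "blocked": counts["blocked"],
            "failed": counts["failed"],
        }
    return summary
```

&lt;p&gt;Keeping &lt;code&gt;blocked&lt;/code&gt; separate from &lt;code&gt;failed&lt;/code&gt; matters here: a walled site that reports &lt;code&gt;blocked&lt;/code&gt; is the tool behaving correctly, not a miss.&lt;/p&gt;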






&lt;h2&gt;
  
  
  It does more than single calls
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-page crawl&lt;/strong&gt; — one page rarely shows everything a site can do. &lt;code&gt;--crawl&lt;/code&gt; walks the origin and merges tools from every page it finds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authenticated sessions&lt;/strong&gt; — for gated sites, you log in once in a real browser (you type the password, not the tool), and it reuses the session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool-chaining workflows&lt;/strong&gt; — chain &lt;code&gt;search -&amp;gt; open result -&amp;gt; act&lt;/code&gt;, passing earlier results into later steps. The page is re-read between steps, so a "reserve" button that only appears on a detail page becomes callable when you get there.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of it works with any OpenAI-compatible API: Groq, OpenAI, or a &lt;strong&gt;local Ollama model&lt;/strong&gt;. With a local model, the analysis can run fully offline.&lt;/p&gt;
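
&lt;p&gt;The tool-chaining bullet above is easier to see in code. This is a toy Python sketch with a fake three-page site, not webmcp-gen's workflow engine: &lt;code&gt;run_workflow&lt;/code&gt;, &lt;code&gt;fake_tools&lt;/code&gt;, and the step format are all assumptions made for illustration.&lt;/p&gt;

```python
def run_workflow(steps, tools_for_page, start_page):
    """Run steps in order, re-reading the available tools before each one.

    steps: list of (tool_name, make_args); make_args(results) builds the
           keyword arguments from the results of earlier steps.
    tools_for_page: maps a page name to {tool_name: callable}; each
           callable returns (result, next_page).
    """
    page = start_page
    results = []
    for tool_name, make_args in steps:
        tools = tools_for_page(page)  # re-read the page between steps
        if tool_name not in tools:
            return {
                "success": False,
                "error": f"Tool '{tool_name}' not available on page '{page}'.",
                "results": results,
            }
        result, page = tools[tool_name](**make_args(results))
        results.append(result)
    return {"success": True, "results": results}


def fake_tools(page):
    """Toy site: 'reserve' only exists on the detail page."""
    if page == "home":
        return {"search": lambda query: ([f"{query}-1", f"{query}-2"], "results")}
    if page == "results":
        return {"open_result": lambda item: (item, "detail")}
    if page == "detail":
        return {"reserve": lambda item: (f"reserved {item}", "detail")}
    return {}


steps = [
    ("search", lambda results: {"query": "table"}),
    ("open_result", lambda results: {"item": results[0][0]}),
    ("reserve", lambda results: {"item": results[1]}),
]
outcome = run_workflow(steps, fake_tools, "home")
```

&lt;p&gt;Calling &lt;code&gt;reserve&lt;/code&gt; straight from the home page returns an error instead of guessing, because the tool list is looked up fresh for whatever page the previous step landed on.&lt;/p&gt;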




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;webmcp-gen
playwright &lt;span class="nb"&gt;install &lt;/span&gt;chromium
webmcp-gen https://en.wikipedia.org &lt;span class="nt"&gt;--groq&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's MIT-licensed, on PyPI, and the README has the architecture diagrams and the honest caveats spelled out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;→ &lt;a href="https://github.com/Nidhicodes/webmcp-gen" rel="noopener noreferrer"&gt;github.com/Nidhicodes/webmcp-gen&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're building agents and you've felt this exact frustration, I'd genuinely love to know where it breaks for you. The interesting failures are the ones I haven't seen yet.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
