<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Omid Seyfan</title>
    <description>The latest articles on DEV Community by Omid Seyfan (@omidseyfan).</description>
    <link>https://dev.to/omidseyfan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4006855%2Feb1bd74a-bd17-488a-a2af-5ae6abf8af25.jpg</url>
      <title>DEV Community: Omid Seyfan</title>
      <link>https://dev.to/omidseyfan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/omidseyfan"/>
    <language>en</language>
    <item>
      <title>How We Made Our AI Browser Agent Stop Clicking the Wrong Button</title>
      <dc:creator>Omid Seyfan</dc:creator>
      <pubDate>Tue, 30 Jun 2026 13:07:00 +0000</pubDate>
      <link>https://dev.to/omidseyfan/how-we-made-our-ai-browser-agent-stop-clicking-the-wrong-button-3kkl</link>
      <guid>https://dev.to/omidseyfan/how-we-made-our-ai-browser-agent-stop-clicking-the-wrong-button-3kkl</guid>
      <description>&lt;p&gt;At Smoketest.sh, you describe a flow in a sentence ("log in, add a paid seat, confirm the invoice updates") and an AI agent runs it in a real browser. The agent reads the page, decides what to do, and drives Playwright to do it.&lt;/p&gt;

&lt;p&gt;The first version worked great in the demo and fell apart on the second run. This is the story of why, and the fix that made element targeting reliable: never let the model invent a selector. Hand it stable IDs from the accessibility tree and make it point at those.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Letting an LLM target page elements by natural-language description is flaky. The description is regenerated every run and rarely resolves to exactly one element.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;page.ariaSnapshot({ mode: 'ai' })&lt;/code&gt; returns the page as a role-based tree and stamps every interactive element with a stable &lt;code&gt;[ref=eN]&lt;/code&gt; ID.&lt;/li&gt;
&lt;li&gt;Playwright resolves &lt;code&gt;aria-ref=eN&lt;/code&gt; as a first-class locator, so the model can act on the exact element it just saw.&lt;/li&gt;
&lt;li&gt;Make the model cite refs from the tree and keep description as a fallback only. The wrong-element problem mostly disappears.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is how each piece works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "click the Sign in button" is flaky
&lt;/h2&gt;

&lt;p&gt;The naive design is the obvious one. Give the model a &lt;code&gt;click&lt;/code&gt; tool that takes a description, and let it figure out the rest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// tempting, and wrong&lt;/span&gt;
&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;the Sign in button&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood you turn that string into a locator. On a clean login page, &lt;code&gt;getByRole('button', { name: 'Sign in' })&lt;/code&gt; finds exactly one element and it works. Ship it, watch the demo pass, feel good.&lt;/p&gt;

&lt;p&gt;Then it meets a real app:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There are three things matching "Sign in": a nav link, a footer link, and the actual button. The locator matches all three, so Playwright's strict mode refuses to act and throws. If you silence that with &lt;code&gt;.first()&lt;/code&gt;, you click whichever match comes first in the DOM, which navigates somewhere you did not expect.&lt;/li&gt;
&lt;li&gt;The button says "Sign in" this week and "Log in" after a copy change, so the description the model wrote last run no longer matches.&lt;/li&gt;
&lt;li&gt;The model rewords its own description between runs. "the Sign in button" becomes "the blue login button at the top right," and now your role-and-name lookup misses entirely.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are bugs in the model. They are the consequence of using a regenerated English phrase as a selector. The phrase is fuzzy by construction, and fuzzy selectors on a busy page do not resolve to one element.&lt;/p&gt;
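&lt;p&gt;The first and third failures can be reproduced with a toy model in plain TypeScript rather than real Playwright. &lt;code&gt;PageElement&lt;/code&gt;, &lt;code&gt;matchByText&lt;/code&gt;, and &lt;code&gt;resolveStrict&lt;/code&gt; are hypothetical names. The two behaviors they mimic are real, though. By default, &lt;code&gt;getByText&lt;/code&gt; matching is a case-insensitive substring search. Locator actions are strict, so a phrase that matches more than one element produces an error instead of a click.&lt;/p&gt;

```typescript
// Toy model for illustration only: PageElement is a made-up shape,
// not a Playwright type.
interface PageElement {
  role: string; // e.g. "button", "link"
  name: string; // accessible name as the page renders it today
}

// Mimics the default (non-exact) matching of Playwright's getByText:
// case-insensitive substring search.
function matchByText(elements: PageElement[], phrase: string): PageElement[] {
  const needle = phrase.trim().toLowerCase();
  return elements.filter((el) => el.name.toLowerCase().includes(needle));
}

// Mimics locator strictness: an action needs exactly one target.
function resolveStrict(elements: PageElement[], phrase: string): PageElement {
  const matches = matchByText(elements, phrase);
  if (matches.length === 0) {
    throw new Error(`no element matches "${phrase}"`);
  }
  if (matches.length > 1) {
    throw new Error(`strict mode violation: "${phrase}" matches ${matches.length} elements`);
  }
  return matches[0];
}

// The busy real-world page from the list above.
const loginPage: PageElement[] = [
  { role: "link", name: "Sign in" },   // nav link
  { role: "link", name: "Sign in" },   // footer link
  { role: "button", name: "Sign in" }, // the actual button
];

for (const phrase of ["Sign in", "the blue login button at the top right"]) {
  try {
    resolveStrict(loginPage, phrase);
  } catch (err) {
    console.log((err as Error).message);
  }
}
// strict mode violation: "Sign in" matches 3 elements
// no element matches "the blue login button at the top right"
```

&lt;p&gt;Neither phrase resolves to exactly one element. Rewording the description cannot fix that, because the ambiguity is in the page, not in the wording.&lt;/p&gt;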

&lt;h2&gt;
  
  
  The accessibility tree is the agent's source of truth
&lt;/h2&gt;

&lt;p&gt;The fix starts by changing what the model looks at. Instead of letting it guess from a screenshot or raw HTML, we hand it Playwright's &lt;a href="https://playwright.dev/docs/aria-snapshots" rel="noopener noreferrer"&gt;accessibility snapshot&lt;/a&gt; in AI mode, a compact view of the page's &lt;a href="https://developer.mozilla.org/en-US/docs/Glossary/Accessibility_tree" rel="noopener noreferrer"&gt;accessibility tree&lt;/a&gt;. Exposing it to the model takes a single tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;getAccessibilityTree&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Return a structured representation of page content as an accessibility tree to understand the page.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="nx"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ariaSnapshot&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tree&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;page.ariaSnapshot({ mode: 'ai' })&lt;/code&gt; returns the page as a compact, role-based tree. The important part of AI mode: every interactive element gets a &lt;code&gt;[ref=eN]&lt;/code&gt; tag. A login page comes back looking roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;heading "Welcome back" [level=1]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;textbox "Email" [ref=e4]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;textbox "Password" [ref=e5]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;button "Sign in" [ref=e6]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;link "Forgot password?" [ref=e7]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model no longer has to describe the button. It can refer to &lt;code&gt;e6&lt;/code&gt;. That ref is the contract between "what the model saw" and "what Playwright clicks," and it is the whole game.&lt;/p&gt;
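&lt;p&gt;Because the ref is the contract, the harness can check it before acting. The sketch below assumes snapshot lines shaped like the sample above. The &lt;code&gt;indexRefs&lt;/code&gt; and &lt;code&gt;lookupRef&lt;/code&gt; names are hypothetical and not part of the Playwright API. It indexes every &lt;code&gt;[ref=eN]&lt;/code&gt; tag in the latest snapshot, then refuses any ref the model cites that is not among them.&lt;/p&gt;

```typescript
// Hypothetical harness helper, not a Playwright API. The line format is
// inferred from the sample snapshot above; real AI-mode snapshots are nested
// and may carry extra attributes, which this regex skips over.
interface RefEntry {
  ref: string;  // e.g. "e6"
  role: string; // e.g. "button"
  name: string; // accessible name, or "" when the line has none
}

type RefIndex = { [ref: string]: RefEntry };

// Matches `- role "name" ... [ref=eN]`, with optional leading indentation.
const REF_LINE = /^\s*-\s+([\w-]+)(?:\s+"([^"]*)")?.*\[ref=(e\d+)\]/;

function indexRefs(snapshot: string): RefIndex {
  const index: RefIndex = {};
  for (const line of snapshot.split("\n")) {
    const m = REF_LINE.exec(line);
    if (m === null) continue; // headings, plain text, anything without a ref
    index[m[3]] = { ref: m[3], role: m[1], name: m[2] ?? "" };
  }
  return index;
}

// Refuse a ref the model did not actually see in the latest snapshot.
function lookupRef(index: RefIndex, cited: string): RefEntry {
  const entry: RefEntry | undefined = index[cited];
  if (entry === undefined) {
    throw new Error(`ref ${cited} is not in the latest snapshot; call getAccessibilityTree again`);
  }
  return entry;
}

const snapshot = [
  '- heading "Welcome back" [level=1]',
  '- textbox "Email" [ref=e4]',
  '- textbox "Password" [ref=e5]',
  '- button "Sign in" [ref=e6]',
  '- link "Forgot password?" [ref=e7]',
].join("\n");

const refs = indexRefs(snapshot);
const cited = lookupRef(refs, "e6");
console.log(`${cited.role} "${cited.name}"`); // button "Sign in"
```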

&lt;p&gt;This is the same structured-snapshot approach Microsoft's &lt;a href="https://github.com/microsoft/playwright-mcp" rel="noopener noreferrer"&gt;Playwright MCP server&lt;/a&gt; takes: let the model act on accessibility refs, not on pixels or guesses.&lt;/p&gt;

&lt;h2&gt;
  
  
  aria-ref is a first-class Playwright locator
&lt;/h2&gt;

&lt;p&gt;The reason refs work is that Playwright resolves them directly. &lt;code&gt;aria-ref=e6&lt;/code&gt; is a real locator engine, not something we built. So the &lt;code&gt;click&lt;/code&gt; tool prefers the ref and only falls back to a description when it has none:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;refStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;refStr&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;click requires either ref or description&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;locator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;refStr&lt;/span&gt;
    &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;locator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`aria-ref=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;refStr&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;// stable: resolves against the snapshot&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;resolveLocator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;         &lt;span class="c1"&gt;// fallback: fuzzy, best-effort&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;locator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ref path is stable because it is resolved against the exact snapshot the model just read, not re-derived from a phrase. Same idea for &lt;code&gt;fill&lt;/code&gt;, &lt;code&gt;select&lt;/code&gt;, and &lt;code&gt;getText&lt;/code&gt;. Every interaction tool takes &lt;code&gt;ref&lt;/code&gt; first and &lt;code&gt;description&lt;/code&gt; second.&lt;/p&gt;
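&lt;p&gt;Every tool applies the same ref-first rule, so that rule can live in one place. The sketch below uses a hypothetical &lt;code&gt;pickTarget&lt;/code&gt; helper and &lt;code&gt;Target&lt;/code&gt; type, neither of which is Playwright API. It returns either an &lt;code&gt;aria-ref=&lt;/code&gt; selector to pass to &lt;code&gt;page.locator()&lt;/code&gt; or a phrase for the fuzzy fallback. It rejects calls that carry neither. Accepting echoed snapshot syntax such as &lt;code&gt;[ref=e6]&lt;/code&gt; is an extra assumption added for illustration.&lt;/p&gt;

```typescript
// Hypothetical shared helper, not part of Playwright: click, fill, select,
// and getText could all call it before touching the page.
type Target =
  | { kind: "ref"; selector: string }      // hand to page.locator(selector)
  | { kind: "description"; text: string }; // hand to the fuzzy fallback

interface TargetArgs {
  ref?: string;
  description?: string;
}

// Assumption: models sometimes echo snapshot syntax ("[ref=e6]", "ref=e6")
// instead of the bare id, so accept those forms and reject anything else.
function normalizeRef(raw: string): string | null {
  const m = /^\[?(?:ref=)?(e\d+)\]?$/.exec(raw.trim());
  return m === null ? null : m[1];
}

function pickTarget(args: TargetArgs): Target {
  const rawRef = args.ref?.trim() ?? "";
  const text = args.description?.trim() ?? "";
  if (rawRef !== "") {
    const ref = normalizeRef(rawRef);
    if (ref === null) {
      // Do not silently fall back: a bad ref means the model should re-read the tree.
      throw new Error(`"${rawRef}" is not a snapshot ref like e6; call getAccessibilityTree`);
    }
    return { kind: "ref", selector: `aria-ref=${ref}` };
  }
  if (text !== "") return { kind: "description", text };
  throw new Error("action requires either ref or description");
}

const picked = pickTarget({ ref: "[ref=e6]", description: "the Sign in button" });
if (picked.kind === "ref") console.log(picked.selector); // aria-ref=e6
```

&lt;p&gt;The helper rejects a malformed ref instead of quietly falling back to the description. That keeps the model on the reliable path: the error sends it back to the tree, rather than teaching it that a description will do.&lt;/p&gt;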

&lt;h2&gt;
  
  
  The model needs rules, not just tools
&lt;/h2&gt;

&lt;p&gt;Tools that accept a ref are not enough. The model will still reach for a description if you let it, because describing things in English is what language models love to do. So the rules you give it have to make the ordering non-negotiable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read the accessibility tree before touching a page it has not seen.&lt;/li&gt;
&lt;li&gt;On every action, prefer the ref over a description.&lt;/li&gt;
&lt;li&gt;After a failed action, read the tree again for fresh refs instead of rewording the description.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last rule is the one that earns its place. The instinct of a language model after a failed action is to try a more elaborate description. That is exactly the wrong move, because the description was never the reliable path. Re-reading the tree gives it fresh refs that match the current DOM, which is what actually changed.&lt;/p&gt;
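&lt;p&gt;The rules live in the system prompt, but the harness can back them up. The sketch below assumes a hypothetical &lt;code&gt;AgentState&lt;/code&gt; record that the harness updates after each tool call. It refuses ref-based actions until the current URL has been snapshotted, and again after any failed action. That pushes the model toward a fresh &lt;code&gt;getAccessibilityTree&lt;/code&gt; call instead of a reworded description.&lt;/p&gt;

```typescript
// Hypothetical harness state, not a Playwright or Smoketest API.
interface AgentState {
  snapshotUrl: string | null; // page URL when getAccessibilityTree last ran
  lastActionFailed: boolean;  // did the previous ref-based action throw?
}

// The first and third rules as a check: refs are trusted only if the
// current page was snapshotted and nothing has failed since.
function refsAreTrusted(state: AgentState, currentUrl: string): boolean {
  if (state.snapshotUrl !== currentUrl) return false; // page not seen yet
  return !state.lastActionFailed;                     // a failure may mean stale refs
}

function beforeRefAction(state: AgentState, currentUrl: string): void {
  if (!refsAreTrusted(state, currentUrl)) {
    throw new Error(
      "Refs may be stale. Call getAccessibilityTree for fresh refs instead of rewording the description."
    );
  }
}

function afterSnapshot(currentUrl: string): AgentState {
  return { snapshotUrl: currentUrl, lastActionFailed: false };
}

function afterAction(state: AgentState, succeeded: boolean): AgentState {
  return { ...state, lastActionFailed: !succeeded };
}

const loginUrl = "https://app.example.com/login";
let state: AgentState = { snapshotUrl: null, lastActionFailed: false };
state = afterSnapshot(loginUrl);
beforeRefAction(state, loginUrl);  // passes: fresh snapshot of this page
state = afterAction(state, false); // suppose the click threw
console.log(refsAreTrusted(state, loginUrl)); // false until the next snapshot
```

&lt;p&gt;Keying on the URL is a simplification. A single-page app can change its DOM without changing the URL, which is one more reason to re-snapshot after every failure.&lt;/p&gt;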

&lt;h2&gt;
  
  
  When there is no ref, fall back deliberately
&lt;/h2&gt;

&lt;p&gt;Refs are not always available. The model might be acting on something it inferred rather than something in the last snapshot. So &lt;code&gt;resolveLocator&lt;/code&gt; is a deliberate ladder, not a single guess. For each candidate phrase it tries &lt;a href="https://playwright.dev/docs/locators" rel="noopener noreferrer"&gt;role, then label, then placeholder, then text&lt;/a&gt;, and takes the first one that is actually visible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;phrase&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;phrases&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;roleHint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;roleLocator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByRole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;roleHint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;exact&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;isVisible&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;roleLocator&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;roleLocator&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;labelLocator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByLabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;exact&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;isVisible&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;labelLocator&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;labelLocator&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;placeholderLocator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByPlaceholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;exact&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;isVisible&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;placeholderLocator&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;placeholderLocator&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;textLocator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;exact&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;isVisible&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;textLocator&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;textLocator&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Could not find a visible element for description: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;isVisible&lt;/code&gt; is a 5-second &lt;code&gt;waitFor({ state: 'visible' })&lt;/code&gt; wrapped in a try/catch, so a candidate that exists but is hidden does not win. The phrase extraction pulls quoted substrings out of the description first (&lt;code&gt;click the button labeled "Place order"&lt;/code&gt; yields &lt;code&gt;Place order&lt;/code&gt;), so the model's verbosity does not poison the match.&lt;/p&gt;
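&lt;p&gt;Both helpers in that paragraph are small. Here is one hypothetical way to write them, with two assumptions stated up front. First, &lt;code&gt;extractPhrases&lt;/code&gt; treats only straight double quotes as marking a phrase. Second, &lt;code&gt;isVisible&lt;/code&gt; takes a made-up structural &lt;code&gt;Waitable&lt;/code&gt; type instead of Playwright's &lt;code&gt;Locator&lt;/code&gt;, so it can run against fakes. A real &lt;code&gt;Locator&lt;/code&gt; satisfies that type through &lt;code&gt;locator.waitFor({ state: 'visible', timeout })&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical sketches of the two helpers described above.

// Quoted substrings first, then the whole description as a last resort.
// Assumption: only straight double quotes count as quoting.
function extractPhrases(description: string): string[] {
  const phrases: string[] = [];
  const quoted = /"([^"]+)"/g;
  let m: RegExpExecArray | null;
  while ((m = quoted.exec(description)) !== null) {
    const phrase = m[1].trim();
    if (phrase !== "") phrases.push(phrase);
  }
  const whole = description.trim();
  if (whole !== "") phrases.push(whole);
  return phrases.filter((p, i) => phrases.indexOf(p) === i); // drop exact duplicates
}

// Structural stand-in for the one Locator method used here, so the sketch
// runs without Playwright. A real Locator fits: waitFor({ state, timeout })
// rejects if the element is not visible before the timeout.
interface Waitable {
  waitFor(options: { state: "visible"; timeout: number }): unknown;
}

async function isVisible(candidate: Waitable, timeoutMs = 5000) {
  try {
    await candidate.waitFor({ state: "visible", timeout: timeoutMs });
    return true;
  } catch {
    return false; // hidden, detached, or timed out: this candidate does not win
  }
}

console.log(extractPhrases('click the button labeled "Place order"').join(" | "));
// Place order | click the button labeled "Place order"
```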

&lt;p&gt;This is the fuzzy path, and we treat it as such. It is good enough to recover, and it is exactly why we want the model on refs whenever it can be.&lt;/p&gt;

&lt;h2&gt;
  
  
  Don't fail with "element not found"
&lt;/h2&gt;

&lt;p&gt;When even the fallback misses, the worst thing you can return is a bare "element not found." The model has nothing to act on and will flail. So a failed click collects diagnostics about what the page actually contains and returns them with the error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;diagnostics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;collectClickDiagnostics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;getErrorMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;. Diagnostics: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;diagnostics&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;collectClickDiagnostics&lt;/code&gt; counts how many elements matched by role, by label, and by text, and includes a sample of the page's links:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;roleHint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;roleMatch&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;roleCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;// e.g. 0 buttons matched&lt;/span&gt;
  &lt;span class="nx"&gt;labelCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// e.g. 0 labels matched&lt;/span&gt;
  &lt;span class="nx"&gt;textCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;// e.g. 3 text nodes matched&lt;/span&gt;
  &lt;span class="na"&gt;sampleLinks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;linkSamples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;currentUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;url&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the failure is legible. &lt;code&gt;textCount: 3, roleCount: 0&lt;/code&gt; tells the model (and us, in the trace) that the thing it called a button is really three pieces of text, so it should re-read the tree and target a real interactive element. The recovery loop closes because the error carries enough to act on.&lt;/p&gt;
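&lt;p&gt;The counts help only if something reads them. In the sketch below, the &lt;code&gt;ClickDiagnostics&lt;/code&gt; type mirrors the snippet above. &lt;code&gt;nextStep&lt;/code&gt; and &lt;code&gt;formatClickError&lt;/code&gt; are hypothetical additions, not part of the original code. Together they turn a pattern like &lt;code&gt;textCount: 3, roleCount: 0&lt;/code&gt; into an explicit instruction appended to the error.&lt;/p&gt;

```typescript
// Shape of the diagnostics object above. Field names follow that snippet;
// sampleLinks as plain strings is an assumption.
interface ClickDiagnostics {
  description: string;
  roleHint: string | null;
  roleCount: number;
  labelCount: number;
  textCount: number;
  sampleLinks: string[];
  currentUrl: string;
}

// Hypothetical helper: map the counts to one concrete next step.
function nextStep(d: ClickDiagnostics): string {
  if (d.roleCount + d.labelCount + d.textCount === 0) {
    return `Nothing on ${d.currentUrl} matches "${d.description}". Check that this is the right page.`;
  }
  // Text exists but no element with the hinted role: it is text, not a control.
  const textNotControl = [d.roleHint !== null, d.roleCount === 0, d.textCount > 0].every(Boolean);
  if (textNotControl) {
    return `The text exists but no ${d.roleHint} has that name. Re-read the accessibility tree and target an interactive element by ref.`;
  }
  if (Math.max(d.roleCount, d.labelCount, d.textCount) > 1) {
    return "Several elements match. Re-read the accessibility tree and cite the exact ref.";
  }
  return "Re-read the accessibility tree for fresh refs before retrying.";
}

// Same message layout as the throw above, plus the suggested next step.
function formatClickError(message: string, d: ClickDiagnostics): string {
  return `${message}. Diagnostics: ${JSON.stringify(d)}. Next step: ${nextStep(d)}`;
}

const diagnostics: ClickDiagnostics = {
  description: "the Place order button",
  roleHint: "button",
  roleCount: 0,
  labelCount: 0,
  textCount: 3,
  sampleLinks: [],
  currentUrl: "https://app.example.com/checkout",
};
console.log(nextStep(diagnostics));
// The text exists but no button has that name. Re-read the accessibility tree and target an interactive element by ref.
```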

&lt;p&gt;There is also a small specialization for links: if the model meant to click a link and the locator missed, we look up the href by matching link text or &lt;code&gt;aria-label&lt;/code&gt; and navigate directly. That sidesteps a whole class of failures where an overlay intercepts the click.&lt;/p&gt;
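&lt;p&gt;Here is a sketch of that link shortcut. It assumes the harness has already gathered each anchor's text, &lt;code&gt;aria-label&lt;/code&gt;, and raw &lt;code&gt;href&lt;/code&gt;. The &lt;code&gt;LinkInfo&lt;/code&gt; shape and the &lt;code&gt;findLinkHref&lt;/code&gt; name are hypothetical. Resolving the raw &lt;code&gt;href&lt;/code&gt; against the current page URL with the standard &lt;code&gt;URL&lt;/code&gt; constructor produces an absolute address. &lt;code&gt;page.goto()&lt;/code&gt; can load that address directly, so there is no click for an overlay to intercept.&lt;/p&gt;

```typescript
// Hypothetical link record; assumes the harness already collected each
// anchor's visible text, aria-label, and raw href attribute.
interface LinkInfo {
  text: string;
  ariaLabel: string | null;
  href: string; // may be relative, e.g. "/settings/billing"
}

// Match by text or aria-label (case-insensitive). Exact matches win over
// substring matches, so "Sign in" does not pick "Sign in with SSO".
// Returns an absolute URL suitable for page.goto(), or null.
function findLinkHref(links: LinkInfo[], phrase: string, currentUrl: string): string | null {
  const needle = phrase.trim().toLowerCase();
  if (needle === "") return null;
  const labelsOf = (l: LinkInfo) => [l.text, l.ariaLabel ?? ""].map((s) => s.trim().toLowerCase());
  const match =
    links.find((l) => labelsOf(l).includes(needle)) ??
    links.find((l) => labelsOf(l).some((s) => s.includes(needle)));
  return match === undefined ? null : new URL(match.href, currentUrl).toString();
}

const links: LinkInfo[] = [
  { text: "Sign in with SSO", ariaLabel: null, href: "/sso" },
  { text: "Sign in", ariaLabel: null, href: "/login" },
  { text: "", ariaLabel: "Billing settings", href: "/settings/billing" },
];

const href = findLinkHref(links, "billing settings", "https://app.example.com/dashboard");
console.log(href); // https://app.example.com/settings/billing
// In the harness: if (href !== null) await page.goto(href);
```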

&lt;h2&gt;
  
  
  The trade-offs, honestly
&lt;/h2&gt;

&lt;p&gt;This is reliable element targeting, not a deterministic agent. Two limits worth stating plainly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Refs are only valid for the snapshot you took.&lt;/strong&gt; After a navigation or a DOM change, &lt;code&gt;e6&lt;/code&gt; may point at nothing or at the wrong node. That is why the prompt forces a fresh &lt;code&gt;getAccessibilityTree&lt;/code&gt; after failures and on new pages. Treat refs as per-snapshot, not durable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Snapshots cost tokens.&lt;/strong&gt; An accessibility tree of a content-heavy page can be large, and feeding one to the model after every navigation adds up fast. We wrote about that cost in detail in &lt;a href="https://smoketest.sh/blog/playwright-mcp-claude-code" rel="noopener noreferrer"&gt;What Works (and What Breaks) Running Playwright MCP in Claude Code&lt;/a&gt;, where a single snapshot can run to tens of thousands of tokens. The reliability is worth it for us, but it is a real line on the bill. That is why we re-snapshot only when the page is new or an action failed, not on every step.&lt;/li&gt;
&lt;/ul&gt;
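&lt;p&gt;One way to enforce "per-snapshot" mechanically, rather than by convention, is to tag each ref with the snapshot it came from. The sketch below assumes a hypothetical &lt;code&gt;RefRegistry&lt;/code&gt; in the harness. It also assumes a &lt;code&gt;generation&lt;/code&gt; number that is returned with each tree and echoed back by the model. This is our illustration, not how Playwright tracks refs. With it, a stale ref fails with a message that says why, instead of pointing at nothing or at the wrong node.&lt;/p&gt;

```typescript
// Hypothetical ref registry, not a Playwright API. Each snapshot gets a
// generation number; a ref is honored only if it comes from the current one.
interface ScopedRef {
  generation: number; // returned to the model alongside the tree
  ref: string;        // e.g. "e6"
}

class RefRegistry {
  private generation = 0;
  private refs: string[] = [];

  // Call after every getAccessibilityTree; returns the new generation.
  recordSnapshot(snapshot: string): number {
    this.generation += 1;
    this.refs = [];
    const tag = /\[ref=(e\d+)\]/g;
    let m: RegExpExecArray | null;
    while ((m = tag.exec(snapshot)) !== null) this.refs.push(m[1]);
    return this.generation;
  }

  // Turn a scoped ref into an aria-ref selector, or explain why not.
  resolve(scoped: ScopedRef): string {
    if (scoped.generation !== this.generation) {
      throw new Error(
        `ref ${scoped.ref} is from snapshot ${scoped.generation}, but the current one is ${this.generation}; use refs from the latest tree`
      );
    }
    if (!this.refs.includes(scoped.ref)) {
      throw new Error(`ref ${scoped.ref} is not in snapshot ${this.generation}`);
    }
    return `aria-ref=${scoped.ref}`;
  }
}

const registry = new RefRegistry();
const firstGen = registry.recordSnapshot('- button "Sign in" [ref=e6]');
console.log(registry.resolve({ generation: firstGen, ref: "e6" })); // aria-ref=e6
registry.recordSnapshot('- link "Dashboard" [ref=e2]'); // after a navigation
// registry.resolve({ generation: firstGen, ref: "e6" }) now throws
```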

&lt;p&gt;And the model still decides &lt;em&gt;what&lt;/em&gt; to do. Refs make sure that when it decides to click the Sign in button, it clicks that button and not the footer link with the same name. They do not stop it from deciding to click the wrong thing in the first place. That is a different problem, solved with a separate evaluation pass.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you are building an LLM browser agent
&lt;/h2&gt;

&lt;p&gt;The one idea to take away: do not let the model emit selectors. A selector invented from an English phrase is regenerated every run and rarely resolves to one element. Instead,&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Give the model a structured snapshot of the page with stable IDs (&lt;code&gt;page.ariaSnapshot({ mode: 'ai' })&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Make every action tool take an ID first and a description only as a fallback (&lt;code&gt;page.locator('aria-ref=eN')&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Enforce snapshot-then-act in the system prompt, and re-snapshot on failure instead of rewording.&lt;/li&gt;
&lt;li&gt;Return rich diagnostics on a miss so the model can recover instead of guessing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That sequence is what moved our agent from "passes the demo" to "passes on the second run, and the hundredth."&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it on your own app
&lt;/h2&gt;

&lt;p&gt;We run this in production at Smoketest. You describe the flows that matter (login, checkout, onboarding, billing), and we run them in a real browser after every deploy and tell you what broke. No Playwright suite for you to own or maintain. Take a look at &lt;a href="https://smoketest.sh" rel="noopener noreferrer"&gt;smoketest.sh&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
