<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: aras</title>
    <description>The latest articles on DEV Community by aras (@name_ara_1bda5ded839304).</description>
    <link>https://dev.to/name_ara_1bda5ded839304</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3878839%2Ffe6ef8db-992f-4723-bea4-ff9c7c5d602d.png</url>
      <title>DEV Community: aras</title>
      <link>https://dev.to/name_ara_1bda5ded839304</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/name_ara_1bda5ded839304"/>
    <language>en</language>
    <item>
      <title>Cutting LLM tokens 10x for AI browser automation</title>
      <dc:creator>aras</dc:creator>
      <pubDate>Tue, 14 Apr 2026 14:53:36 +0000</pubDate>
      <link>https://dev.to/name_ara_1bda5ded839304/cutting-llm-tokens-10x-for-ai-browser-automation-1195</link>
      <guid>https://dev.to/name_ara_1bda5ded839304/cutting-llm-tokens-10x-for-ai-browser-automation-1195</guid>
      <description>&lt;p&gt;Last month I was building a browser-automation pipeline for an insurance-quote aggregator — a freelance gig that needed to fill forms across 20+ provider websites. I picked &lt;a href="https://github.com/browserbase/stagehand" rel="noopener noreferrer"&gt;Stagehand&lt;/a&gt; because the &lt;code&gt;act()&lt;/code&gt; / &lt;code&gt;extract()&lt;/code&gt; API looked clean.&lt;/p&gt;

&lt;p&gt;Two days in, the Gemini bill was at $60 and I'd only automated three sites.&lt;/p&gt;

&lt;p&gt;I dug into the token accounting. Stagehand was sending the &lt;strong&gt;entire&lt;/strong&gt; accessibility tree of every page to the LLM on every action. For an Amazon search-results page that's ~50,000 tokens per decision. For a Booking.com hotel listing, ~45,000.&lt;/p&gt;

&lt;p&gt;This post is about the filtering heuristic I ended up writing, a head-to-head benchmark against Stagehand, and the open-source framework (Sentinel) that came out of it.&lt;/p&gt;




&lt;h2&gt;The problem: brute-force accessibility tree&lt;/h2&gt;

&lt;p&gt;When you ask an LLM to click a button, it needs to know what interactive elements exist on the page. The standard approach is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parse the browser's accessibility tree (AOM)&lt;/li&gt;
&lt;li&gt;Serialize every interactive element (button, link, textbox, checkbox…)&lt;/li&gt;
&lt;li&gt;Send the whole serialization to the LLM along with your instruction&lt;/li&gt;
&lt;li&gt;LLM picks an element ID and returns an action&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For a simple page (login form, 5 inputs) this is fine. Maybe 500 tokens. The LLM has no trouble picking the right field.&lt;/p&gt;

&lt;p&gt;For a real-world page — Amazon search results, a form on durchblicker.at, the Gmail inbox — you easily hit &lt;strong&gt;300+ interactive elements&lt;/strong&gt; per page. Serialized, that's 30,000–50,000 tokens. Every. Single. Action.&lt;/p&gt;

&lt;p&gt;Three problems compound:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: At ~50,000 input tokens per action, GPT-4o at $2.50/M input tokens comes to ~$0.12 per action, so a 15-step task costs about $1.80. Scale that to 10k users and you're bankrupt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: More tokens = slower responses, sometimes 15–30 seconds per decision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy&lt;/strong&gt;: LLMs get worse at picking the right element when there are 300 candidates. The signal-to-noise ratio tanks.&lt;/li&gt;
&lt;/ol&gt;
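
&lt;p&gt;To make the cost arithmetic concrete, here is a minimal sketch. The function and variable names are mine, and so is the assumption that token count scales linearly with the number of serialized elements; the price and token counts are the figures quoted above. The unrounded per-action cost is $0.125, so the 15-step total lands slightly above the rounded $1.80.&lt;/p&gt;

```typescript
// Back-of-the-envelope input-token cost for an LLM-driven browser task.
// Illustrative only: numbers are the ones quoted in the text, not measurements.
function inputCostUsd(
  tokensPerAction: number,
  usdPerMillionTokens: number,
  actions: number
): number {
  return (tokensPerAction / 1e6) * usdPerMillionTokens * actions;
}

const fullTreeTokens = 50000; // whole accessibility tree per action
const pricePerMillion = 2.5;  // USD per million input tokens
const steps = 15;

// Assumption: keeping 50 of 300 elements shrinks the prompt proportionally.
const filteredTokens = fullTreeTokens * (50 / 300);

const perAction = inputCostUsd(fullTreeTokens, pricePerMillion, 1);
const unfilteredTask = inputCostUsd(fullTreeTokens, pricePerMillion, steps);
const filteredTask = inputCostUsd(filteredTokens, pricePerMillion, steps);

console.log({ perAction, unfilteredTask, filteredTask });
```

&lt;p&gt;Under these assumptions the filtered task costs one sixth of the unfiltered one. Instruction text and system prompt are ignored here, so real savings will differ.&lt;/p&gt;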




&lt;h2&gt;The insight: you don't need the full tree&lt;/h2&gt;

&lt;p&gt;Here's the thing — when a user says "click the Add to Cart button", there are maybe &lt;strong&gt;10 elements&lt;/strong&gt; on the page that could plausibly match. The other 290 are header nav, footer links, unrelated product cards, cookie banners, modal stubs.&lt;/p&gt;

&lt;p&gt;If you could filter to the top-50 most relevant elements before sending to the LLM, you'd have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5–10× fewer tokens → 5–10× cheaper&lt;/li&gt;
&lt;li&gt;Shorter prompts → faster responses&lt;/li&gt;
&lt;li&gt;Less noise → better element picks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question is: how do you rank "relevance" without an LLM call (which would defeat the purpose)?&lt;/p&gt;




&lt;h2&gt;The filter: keyword overlap scoring&lt;/h2&gt;

&lt;p&gt;The approach I ended up with is embarrassingly simple:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;filterRelevantElements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;elements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;UIElement&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;maxElements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;UIElement&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Tokenize the instruction (handle multiple languages with \p{L}\p{N})&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;instructionTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;instruction&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;[\p&lt;/span&gt;&lt;span class="sr"&gt;{L}&lt;/span&gt;&lt;span class="se"&gt;\p&lt;/span&gt;&lt;span class="sr"&gt;{N}&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+/gu&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Score each element by keyword overlap&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scored&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;elements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elementText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elementTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;elementText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;[\p&lt;/span&gt;&lt;span class="sr"&gt;{L}&lt;/span&gt;&lt;span class="se"&gt;\p&lt;/span&gt;&lt;span class="sr"&gt;{N}&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+/gu&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;elementTokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;instructionTokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Keep top-N by score, preserve original order for tied scores&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;scored&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxElements&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the core: about 25 lines of code, zero dependencies, and it runs in &amp;lt;5ms for 500 elements.&lt;/p&gt;

&lt;p&gt;Three refinements that matter in practice:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Always keep form fields.&lt;/strong&gt; If the instruction is "fill out the form", the keyword "form" doesn't appear on any input label. Always preserve &lt;code&gt;role=textbox/combobox/checkbox/radio/slider&lt;/code&gt; elements regardless of score.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Always keep buttons near form fields.&lt;/strong&gt; The submit button often has a generic label like "Continue" that doesn't match the instruction keywords. Keep buttons that are positionally close to form fields.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Unicode tokenization.&lt;/strong&gt; &lt;code&gt;/[a-z0-9]+/&lt;/code&gt; breaks on German umlauts, the Turkish dotted i, and Czech diacritics, because it splits a word at every non-ASCII letter. Use &lt;code&gt;/[\p{L}\p{N}]+/gu&lt;/code&gt;, which matches letters and digits in any script, so accented Latin-script words stay in one piece.&lt;/p&gt;
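
&lt;p&gt;The three refinements could be combined roughly as follows. This is a sketch under stated assumptions, not Sentinel's actual code: the &lt;code&gt;UIElement&lt;/code&gt; shape, the function names, and the &lt;code&gt;buttonRadius&lt;/code&gt; parameter are hypothetical, and "positionally close" is approximated by distance in page order rather than by on-screen geometry.&lt;/p&gt;

```typescript
// Hypothetical element shape; the real UIElement type is not shown in the post.
interface UIElement {
  id: number;
  role: string;
  name: string;
  value?: string;
}

// Refinement 1: roles that are kept regardless of keyword score.
const FORM_ROLES = ["textbox", "combobox", "checkbox", "radio", "slider"];

// Refinement 3: Unicode-aware tokenizer.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
}

function filterWithRefinements(
  elements: UIElement[],
  instruction: string,
  maxElements = 50,
  buttonRadius = 2 // assumption: "near" means within 2 positions in page order
): UIElement[] {
  const wanted = new Set(tokenize(instruction));
  const isField = elements.map(el => FORM_ROLES.indexOf(el.role) !== -1);

  // Refinements 1 and 2: form fields, plus buttons with a field nearby.
  const forced = elements.map((el, i) => {
    if (isField[i]) return true;
    if (el.role !== "button") return false;
    const start = Math.max(0, i - buttonRadius);
    return isField.slice(start, i + buttonRadius + 1).some(Boolean);
  });

  // Rank everything else by keyword overlap; the stable sort keeps ties in page order.
  const ranked = elements
    .map((el, i) => {
      const text = `${el.role} ${el.name} ${el.value ?? ""}`;
      return { i, score: tokenize(text).filter(t => wanted.has(t)).length };
    })
    .filter(entry => !forced[entry.i])
    .sort((a, b) => b.score - a.score);

  // Forced elements always survive; the rest fill whatever budget is left.
  const budget = Math.max(0, maxElements - forced.filter(Boolean).length);
  const keep = new Set(ranked.slice(0, budget).map(entry => entry.i));

  // Return survivors in page order.
  return elements.filter((_, i) => forced[i] || keep.has(i));
}

// Usage with a made-up five-element page and a budget of three.
const page: UIElement[] = [
  { id: 0, role: "link", name: "Impressum" },
  { id: 1, role: "textbox", name: "Straße" },
  { id: 2, role: "button", name: "Weiter" },
  { id: 3, role: "link", name: "Datenschutz" },
  { id: 4, role: "button", name: "Angebot berechnen" },
];
const kept = filterWithRefinements(page, "Angebot berechnen", 3);
console.log(kept.map(el => el.name)); // ["Straße", "Weiter", "Angebot berechnen"]
console.log(tokenize("Kopfhörer über 50 €")); // ["kopfhörer", "über", "50"]
```

&lt;p&gt;One design choice in this sketch: forced elements do not compete for the budget, so a page with more than &lt;code&gt;maxElements&lt;/code&gt; form fields returns more than &lt;code&gt;maxElements&lt;/code&gt; elements. Whether that is acceptable depends on how form-heavy your target pages are.&lt;/p&gt;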




&lt;h2&gt;The benchmark&lt;/h2&gt;

&lt;p&gt;I ran the same task with Stagehand and Sentinel using the &lt;strong&gt;exact same model&lt;/strong&gt; — Gemini 3 Flash — and identical browser config (viewport 1920×1080, domSettleTimeoutMs 3000).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task&lt;/strong&gt;: Amazon.de → search "bluetooth kopfhörer over-ear" → filter by brand Sony → sort by customer rating → extract first 3 products (name, price, rating).&lt;/p&gt;

&lt;h3&gt;Sentinel&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1: ✅ Fill search field with "bluetooth kopfhörer over-ear"
Step 2: ✅ Click "Los" (submit)
Step 3: ✅ Click Sony brand filter in sidebar
Step 4: ✅ Select "Durchschn. Kundenrezension" from sort dropdown
Step 5: 🔍 Extract top 3 products

Result: ✅ Goal achieved in 5 steps
Time: 100s
Tokens: 33k total
Cost: $0.003
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Extracted:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"product_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sony WH-1000XM3…"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"star_rating"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4,6 von 5 Sternen"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"product_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sony WH-CH520…"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"25,20 €"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"star_rating"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4,5 von 5 Sternen"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"product_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sony WH-CH720N…"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"60,62 €"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"star_rating"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4,5 von 5 Sternen"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Stagehand&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Action 1: ariaTree (page observation)
Action 2: click "Akzeptieren" (cookie banner)
Action 3: type search query
Action 4: click "Los"
Action 5: ariaTree
Action 6: click "Weitere" button in Marken filter
Action 7: ariaTree
Action 8: type "Sony" into search box (misread the filter UI)
Action 9: scroll 50% down
Action 10: ariaTree
Action 11: scroll 50% down
Action 12: click "Sony" filter (finally)

⏱ Timed out at 300s. One LLM call near the end used:
   - 147,158 input tokens
   - 62,914 reasoning tokens
   - 75 output tokens
   (= 210k tokens for a single decision)

💥 Crashed: "Cannot read properties of null (reading 'awaitActivePage')"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stagehand never got to sorting or extraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 210k-token single-decision call is the headline number.&lt;/strong&gt; That's not a typical call, but it happens when the model keeps re-observing the page without narrowing down. The average call was still in the 30–50k range.&lt;/p&gt;




&lt;h2&gt;Honest limitations&lt;/h2&gt;

&lt;p&gt;The filter approach is not magic. Failure modes I've hit:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Index-based sliders.&lt;/strong&gt; Amazon's price filter is a &lt;code&gt;&amp;lt;input type="range"&amp;gt;&lt;/code&gt; with &lt;code&gt;min=0 max=145&lt;/code&gt; — the values are &lt;strong&gt;positions&lt;/strong&gt;, not Euros. Setting &lt;code&gt;value=100&lt;/code&gt; puts the slider at bucket 100, which maps to ~1,200 EUR, not 100 EUR. The real value is in &lt;code&gt;aria-valuetext&lt;/code&gt;. Both Sentinel and Stagehand fail here. Fix on my roadmap: aria-valuetext binary search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Non-Latin scripts.&lt;/strong&gt; The tokenizer is Unicode-aware but untested on CJK. Chinese and Japanese text has no spaces between words, so the regex returns whole runs of characters as a single token, and I don't think keyword overlap works for those languages at all. They probably need embedding-based scoring instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Synonyms.&lt;/strong&gt; If the user says "submit" and the button says "Apply", keyword overlap scores zero. I mitigate this with the "keep nearby buttons" rule, but it's not a complete fix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Vision-only pages.&lt;/strong&gt; Canvas games, interactive maps, WebGL dashboards — the accessibility tree is empty. You need vision grounding. Sentinel has a &lt;code&gt;mode: 'vision'&lt;/code&gt; fallback but it's slower and more expensive.&lt;/p&gt;
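
&lt;p&gt;For the slider limitation, the aria-valuetext binary search on the roadmap might look roughly like this. It is a sketch of the idea, not an existing Sentinel feature: &lt;code&gt;readValueText&lt;/code&gt; is a hypothetical callback that would move the slider to a position and return its &lt;code&gt;aria-valuetext&lt;/code&gt;, the label format is an assumption, and the search only works if the labelled price never decreases as the position grows.&lt;/p&gt;

```typescript
// Parse a price from a label such as "1.200 €" or "25,20 €" (German formatting).
// The exact label format a given site uses is an assumption.
function parseEuros(valueText: string): number {
  const digits = valueText.replace(/[^\d,]/g, "").replace(",", ".");
  return Number(digits);
}

// Find the lowest slider position whose label is at or above the target price.
// Assumes labels are non-decreasing in position. If every label is below the
// target, the result is `max`.
function findSliderPosition(
  min: number,
  max: number,
  targetEuros: number,
  readValueText: (position: number) => string // hypothetical browser hook
): number {
  let lo = min;
  let hi = max;
  while (hi > lo) {
    const mid = Math.floor((lo + hi) / 2);
    if (parseEuros(readValueText(mid)) >= targetEuros) {
      hi = mid; // mid is high enough, look for something lower
    } else {
      lo = mid + 1; // mid is too cheap, discard it and everything below
    }
  }
  return lo;
}

// Usage against a fake slider with a made-up linear scale (12 EUR per step).
let reads = 0;
const fakeSlider = (position: number): string => {
  reads += 1;
  return `${position * 12} €`;
};
const position = findSliderPosition(0, 145, 100, fakeSlider);
console.log(position, reads); // 9 7
```

&lt;p&gt;With 146 positions the search needs at most eight reads. That matters because each read is a real slider interaction in the browser.&lt;/p&gt;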




&lt;h2&gt;Other things I built along the way&lt;/h2&gt;

&lt;p&gt;If you read this far you might care about these. Quick list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;fillForm(json)&lt;/code&gt;&lt;/strong&gt; — declarative form filling. Pass a JSON object of fields, Sentinel figures out which input is which.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;intercept(pattern, trigger)&lt;/code&gt;&lt;/strong&gt; — capture the raw API response during a browser action instead of scraping the rendered DOM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TOTP/MFA&lt;/strong&gt; — auto-generate 2FA codes during login (&lt;code&gt;mfa: { type: 'totp', secret: '...' }&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planner/executor model split&lt;/strong&gt; — use Gemini 3 Pro for planning decisions, Gemini Flash for action execution. Cheaper than all-Pro, smarter than all-Flash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Click-target verification&lt;/strong&gt; — before every click, verify the element at the target coordinates matches the intended target. Catches stale-coordinate bugs in dynamic dropdowns.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub (MIT): &lt;a href="https://github.com/ArasHuseyin/sentinel.ai" rel="noopener noreferrer"&gt;https://github.com/ArasHuseyin/sentinel.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;npm: &lt;code&gt;@isoldex/sentinel&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Full docs and live benchmark comparison: &lt;a href="https://isoldex.ai" rel="noopener noreferrer"&gt;https://isoldex.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The E2E test from this benchmark: &lt;code&gt;src/__tests__/e2e/amazon-filter-sort.test.ts&lt;/code&gt; — run with your own Gemini key.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If you've worked on LLM-driven browser automation and have a better scoring heuristic than keyword overlap, I'd love to hear about it. The current filter works but feels primitive. Embeddings would probably be better if latency allowed.&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
