<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ralph van der Horst</title>
    <description>The latest articles on DEV Community by Ralph van der Horst (@ralph_vanderhorst_fc092).</description>
    <link>https://dev.to/ralph_vanderhorst_fc092</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1557495%2F2c55b1d7-b95a-496d-9953-b278122b85aa.png</url>
      <title>DEV Community: Ralph van der Horst</title>
      <link>https://dev.to/ralph_vanderhorst_fc092</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ralph_vanderhorst_fc092"/>
    <language>en</language>
    <item>
      <title>Selenium AI Agent 2.3.0: AI-Powered Browser Automation with 74 Tools</title>
      <dc:creator>Ralph van der Horst</dc:creator>
      <pubDate>Tue, 24 Feb 2026 23:52:40 +0000</pubDate>
      <link>https://dev.to/ralph_vanderhorst_fc092/selenium-ai-agent-230-ai-powered-browser-automation-with-74-tools-443f</link>
      <guid>https://dev.to/ralph_vanderhorst_fc092/selenium-ai-agent-230-ai-powered-browser-automation-with-74-tools-443f</guid>
      <description>&lt;h2&gt;
  
  
  What is Selenium AI Agent?
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;selenium-ai-agent&lt;/code&gt; is an MCP (Model Context Protocol) server that gives your AI assistant full control over a real browser.&lt;/p&gt;

&lt;p&gt;It is driven by Selenium WebDriver, controlled by your AI. Once installed, your AI can navigate pages, fill forms, click elements, take screenshots, run tests, heal broken locators, and explore your entire app, all from a single prompt.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; selenium-ai-agent
npx selenium-ai-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works with Claude Desktop, Claude Code, Cursor, Cline, GitHub Copilot (VS Code 1.99+), and Windsurf — any AI client that supports MCP.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"selenium-mcp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"selenium-ai-agent"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Requirements: Node.js 18+ and Chrome, Firefox, or Edge. ChromeDriver is auto-managed, so no manual setup is needed. If you've used Playwright MCP or similar browser automation agents, Selenium AI Agent will feel immediately familiar. It is the same concept and the same prompt-driven workflow, but built on Selenium WebDriver. That means you get the battle-tested cross-browser engine that teams have relied on for years, now supercharged with AI.&lt;/p&gt;

&lt;p&gt;Unlike most browser automation MCP servers, which run a single local browser, Selenium AI Agent has first-class Selenium Grid support built in. Spin up a full Grid with Docker Compose in one command and run parallel sessions across Chrome and Firefox simultaneously. You can also connect it to BrowserStack or any other server-side grid. Your AI agent can explore multiple URLs at the same time, run cross-browser tests in parallel, and merge results into a single report, all without leaving your prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;grid_start  &lt;span class="c"&gt;# Launches 4 Chrome + 1 Firefox nodes via Docker Compose&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Can It Do? (74 Tools)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Navigate &amp;amp; Interact
&lt;/h3&gt;

&lt;p&gt;Navigate to URLs, click elements, type text, hover, drag and drop, press keys, upload files, handle browser dialogs, manage tabs — everything a real user can do.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capture &amp;amp; Verify
&lt;/h3&gt;

&lt;p&gt;Take viewport or full-page screenshots, capture the accessibility tree snapshot, verify elements are visible, verify text on page, wait for conditions, monitor network requests, collect console logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test Planner
&lt;/h3&gt;

&lt;p&gt;Your AI walks through your app, understands its structure, and produces a structured markdown test plan — ready to hand off to the generator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;planner_setup_page → planner_explore_page → planner_save_plan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test Generator
&lt;/h3&gt;

&lt;p&gt;AI interacts with your app, records actions with element locators, validates selectors against the live DOM, and writes framework-ready test files. A &lt;code&gt;.test-manifest.json&lt;/code&gt; is created so the healer knows exactly how to run them later.&lt;/p&gt;

&lt;p&gt;Supported: Playwright, WebdriverIO, Selenium Python/Java, Robot Framework, and more. It is programming-language independent, as long as it is Selenium.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;generator_setup_page → [interact] → stop_recording → generator_write_test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Self-Healing Tests
&lt;/h3&gt;

&lt;p&gt;When tests break due to UI changes, the healer pipeline finds the drift and fixes it automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;healer_run_tests → healer_inspect_page → healer_fix_test → healer_run_tests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;healer_inspect_page&lt;/code&gt; compares your expected locators against the live page, spots UI drift, and suggests fixes. &lt;code&gt;healer_fix_test&lt;/code&gt; validates selectors before writing — no more broken locators silently committed.&lt;/p&gt;
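
&lt;p&gt;To make the idea of locator drift concrete, here is a minimal sketch of how expected locators could be compared against what is found on the live page. Everything in it (&lt;code&gt;ExpectedLocator&lt;/code&gt;, &lt;code&gt;LiveElement&lt;/code&gt;, &lt;code&gt;findDrift&lt;/code&gt;, and the match-by-name heuristic) is a hypothetical illustration, not the actual implementation of &lt;code&gt;healer_inspect_page&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical shapes for illustration; the real healer tools may use different data.
interface ExpectedLocator { name: string; selector: string; }
interface LiveElement { name: string; selector: string; }
interface DriftReport { name: string; expected: string; suggestion: string | null; }

// Report every expected selector that no longer exists on the live page,
// and suggest a replacement when a live element has the same accessible name.
function findDrift(expected: ExpectedLocator[], live: LiveElement[]): DriftReport[] {
  const liveSelectors = new Set(live.map((el) => el.selector));
  const reports: DriftReport[] = [];
  for (const locator of expected) {
    if (liveSelectors.has(locator.selector)) continue; // selector still valid
    const match = live.find((el) => el.name === locator.name);
    reports.push({
      name: locator.name,
      expected: locator.selector,
      suggestion: match ? match.selector : null,
    });
  }
  return reports;
}

const expectedLocators: ExpectedLocator[] = [
  { name: "Login", selector: "#login-btn" },
  { name: "Search", selector: "#search" },
];
const liveElements: LiveElement[] = [
  { name: "Login", selector: "[data-testid=login]" },
  { name: "Search", selector: "#search" },
];

// Only the Login button drifted; Search is untouched.
const driftReports = findDrift(expectedLocators, liveElements);
```

&lt;p&gt;In this sketch a suggestion is only a candidate. The validation step described above would still have to confirm it against the live DOM before anything is written.&lt;/p&gt;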

&lt;h3&gt;
  
  
  Selenium Grid — Parallel Execution
&lt;/h3&gt;

&lt;p&gt;First-class Grid support with Docker Compose. Explore multiple URLs simultaneously or run the same test on Chrome and Firefox at the same time. The &lt;code&gt;exploration_merge&lt;/code&gt; tool deduplicates results and builds a unified map of your app across all sessions.&lt;/p&gt;
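
&lt;p&gt;The deduplication step can be pictured with a short sketch. The &lt;code&gt;PageResult&lt;/code&gt; shape and the &lt;code&gt;mergeExplorations&lt;/code&gt; function below are assumptions made for illustration; the real &lt;code&gt;exploration_merge&lt;/code&gt; tool may key and combine results differently.&lt;/p&gt;

```typescript
// Hypothetical result shape for one explored page.
interface PageResult { url: string; title: string; links: string[]; }

// Merge results from several parallel sessions into one list,
// keeping one entry per URL and the union of discovered links.
function mergeExplorations(sessions: PageResult[][]): PageResult[] {
  const merged: PageResult[] = [];
  const seen: { [url: string]: PageResult } = Object.create(null);
  for (const session of sessions) {
    for (const page of session) {
      const existing = seen[page.url];
      if (existing === undefined) {
        const copy: PageResult = { url: page.url, title: page.title, links: [...page.links] };
        seen[page.url] = copy;
        merged.push(copy);
      } else {
        for (const link of page.links) {
          if (existing.links.indexOf(link) === -1) existing.links.push(link);
        }
      }
    }
  }
  return merged;
}

const chromeSession: PageResult[] = [
  { url: "/home", title: "Home", links: ["/about"] },
  { url: "/about", title: "About", links: [] },
];
const firefoxSession: PageResult[] = [
  { url: "/home", title: "Home", links: ["/about", "/pricing"] },
];

// Two sessions, three raw results, two unique pages.
const appMap = mergeExplorations([chromeSession, firefoxSession]);
```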

&lt;h3&gt;
  
  
  BiDi Cross-Browser Features
&lt;/h3&gt;

&lt;p&gt;Full-page and element screenshots, PDF generation across Chrome, Firefox, and Edge, real-time console events via BiDi LogInspector, network request monitoring, and stealth mode via BiDi preload script injection.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's New in 2.3.0 — Accessibility Tree Discovery
&lt;/h2&gt;

&lt;p&gt;The biggest change in this release: &lt;strong&gt;how the agent sees a page.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before v2.3.0, capturing a page returned a flat list of interactive elements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Interactive Elements:
  [e1] button: Play
  [e2] a: English 7,141,000+ articles
  [e3] a: 日本語 1,491,000+ articles
  ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No hierarchy. No context. A lot of noise. On a complex page, 100+ elements would fill the entire budget — leaving the AI blind to other parts of the page. A nav link and a skip link looked identical. The agent had no idea what region of the page an element belonged to.&lt;/p&gt;

&lt;h3&gt;
  
  
  Now it looks like this
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- main [e21]
  - button "Play" [e1]
  - heading "Wikipedia — 25 years..." [level=1] [e2]
  - navigation "Main menu" [e3]
    - link "Contents" [e4]
    - link "Current events" [e5]
  - search [e9]
    - searchbox "Search Wikipedia" [e6]
    - button "Search" [e7]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent now understands &lt;strong&gt;where things are&lt;/strong&gt; and &lt;strong&gt;what they mean&lt;/strong&gt; — not just that they exist.&lt;/p&gt;
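
&lt;p&gt;As a rough illustration of how such an indented snapshot can be produced from a tree, here is a small sketch. The &lt;code&gt;SketchNode&lt;/code&gt; shape and &lt;code&gt;formatTree&lt;/code&gt; function are hypothetical; the project's own &lt;code&gt;AccessibilityNode&lt;/code&gt; type and &lt;code&gt;formatAccessibilityTree()&lt;/code&gt; utility may look different.&lt;/p&gt;

```typescript
// Hypothetical node shape; the real AccessibilityNode type may differ.
interface SketchNode {
  role: string;
  name?: string;
  level?: number;
  ref?: string;
  children: SketchNode[];
}

// Render one node per line, indenting two spaces per depth level.
function formatTree(node: SketchNode, depth: number = 0): string[] {
  const parts: string[] = [node.role];
  if (node.name !== undefined) parts.push(JSON.stringify(node.name));
  if (node.level !== undefined) parts.push("[level=" + node.level + "]");
  if (node.ref !== undefined) parts.push("[" + node.ref + "]");
  const lines: string[] = ["  ".repeat(depth) + "- " + parts.join(" ")];
  for (const child of node.children) {
    lines.push(...formatTree(child, depth + 1));
  }
  return lines;
}

const searchRegion: SketchNode = {
  role: "search",
  ref: "e9",
  children: [
    { role: "searchbox", name: "Search Wikipedia", ref: "e6", children: [] },
    { role: "button", name: "Search", ref: "e7", children: [] },
  ],
};

// Produces the three "search" lines from the snapshot above.
const rendered = formatTree(searchRegion).join("\n");
```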

&lt;p&gt;The new discovery engine walks the DOM recursively from &lt;code&gt;&amp;lt;body&amp;gt;&lt;/code&gt;, computes the ARIA role for each element using the implicit role map, resolves accessible names following the W3C algorithm (&lt;code&gt;aria-label&lt;/code&gt; → &lt;code&gt;aria-labelledby&lt;/code&gt; → &lt;code&gt;alt&lt;/code&gt; → text content), and prunes non-semantic wrappers — &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; with no role are collapsed, promoting their meaningful children up the tree. Refs are only assigned to nodes that have both a role AND a name (or are structural landmarks), keeping the list tight.&lt;/p&gt;
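
&lt;p&gt;The pruning step is easier to follow in code. The sketch below works on a simplified stand-in for DOM elements rather than a real browser DOM. The &lt;code&gt;RawElement&lt;/code&gt; shape, the tiny &lt;code&gt;IMPLICIT_ROLES&lt;/code&gt; table, and the &lt;code&gt;prune&lt;/code&gt; function are all assumptions for illustration. The name lookup follows the order described above, with &lt;code&gt;aria-labelledby&lt;/code&gt; represented by already-resolved text.&lt;/p&gt;

```typescript
// Hypothetical, simplified stand-in for a DOM element.
interface RawElement {
  tag: string;
  role?: string;            // explicit role attribute, if any
  ariaLabel?: string;
  labelledByText?: string;  // text of the element referenced by aria-labelledby
  alt?: string;
  text?: string;
  children: RawElement[];
}

interface PrunedNode { role: string; name: string; children: PrunedNode[]; }

// Tiny illustrative subset of an implicit role map (assumption).
const IMPLICIT_ROLES: { [tag: string]: string } = {
  a: "link", button: "button", nav: "navigation", main: "main", img: "img",
};

function resolveRole(el: RawElement): string | undefined {
  return el.role !== undefined ? el.role : IMPLICIT_ROLES[el.tag];
}

// First non-empty candidate wins: aria-label, aria-labelledby, alt, text.
function resolveName(el: RawElement): string {
  for (const candidate of [el.ariaLabel, el.labelledByText, el.alt, el.text]) {
    if (candidate !== undefined) {
      const trimmed = candidate.trim();
      if (trimmed.length > 0) return trimmed;
    }
  }
  return "";
}

// Returns one node when the element has a role, or its already-pruned
// children when it is a non-semantic wrapper (div/span without a role).
function prune(el: RawElement): PrunedNode[] {
  const children: PrunedNode[] = [];
  for (const child of el.children) children.push(...prune(child));
  const role = resolveRole(el);
  if (role === undefined) return children; // collapse wrapper, promote children
  return [{ role, name: resolveName(el), children }];
}

const page: RawElement = {
  tag: "main",
  children: [{
    tag: "div",
    children: [
      { tag: "span", children: [{ tag: "button", text: "Play", children: [] }] },
      { tag: "a", ariaLabel: "Contents", text: "fallback text", children: [] },
    ],
  }],
};

// The div and span disappear; button and link become direct children of main.
const pruned = prune(page);
```

&lt;p&gt;Because &lt;code&gt;prune&lt;/code&gt; returns a list, wrappers nested to any depth collapse in the same recursive pass.&lt;/p&gt;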

&lt;p&gt;On a real Wikipedia page: &lt;strong&gt;100 elements → 46, a 54% noise reduction&lt;/strong&gt; — with full structural context preserved.&lt;/p&gt;

&lt;h3&gt;
  
  
  Also in 2.3.0
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;New &lt;code&gt;scroll_page&lt;/code&gt; tool&lt;/strong&gt; — directional scrolling (up/down/left/right) with configurable pixel amounts, plus scroll-into-view by CSS selector. Previously scrolling was jammed into other tools; now it's a first-class citizen.&lt;/p&gt;
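
&lt;p&gt;Directional scrolling comes down to turning a direction and a pixel amount into a horizontal and vertical offset. The sketch below shows that mapping only. The &lt;code&gt;ScrollDirection&lt;/code&gt; type and &lt;code&gt;scrollDelta&lt;/code&gt; function are hypothetical and are not the parameters of the actual &lt;code&gt;scroll_page&lt;/code&gt; tool.&lt;/p&gt;

```typescript
// Hypothetical names for illustration only.
type ScrollDirection = "up" | "down" | "left" | "right";

// Translate a direction and pixel amount into the (x, y) offsets that a
// browser call such as window.scrollBy(x, y) expects.
function scrollDelta(direction: ScrollDirection, pixels: number): { x: number; y: number } {
  switch (direction) {
    case "up": return { x: 0, y: -pixels };
    case "down": return { x: 0, y: pixels };
    case "left": return { x: -pixels, y: 0 };
    case "right": return { x: pixels, y: 0 };
  }
}

// Scroll down by 300 pixels: no horizontal movement, positive vertical offset.
const delta = scrollDelta("down", 300);
```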

&lt;p&gt;&lt;strong&gt;Element ref budget doubled&lt;/strong&gt; — from 100 to 200 refs. Combined with smarter pruning, agents can now navigate dense pages without hitting the ceiling.&lt;/p&gt;
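
&lt;p&gt;A small sketch shows how a ref budget and the "role and name, or landmark" rule might interact. The &lt;code&gt;Candidate&lt;/code&gt; shape, the &lt;code&gt;LANDMARK_ROLES&lt;/code&gt; list, and &lt;code&gt;assignRefs&lt;/code&gt; are assumptions for illustration; the order in which the real engine hands out refs may differ.&lt;/p&gt;

```typescript
// Hypothetical candidate shape; an empty string means "no role" or "no name".
interface Candidate { role: string; name: string; }

// Illustrative subset of landmark roles (assumption).
const LANDMARK_ROLES = ["main", "navigation", "search", "banner", "contentinfo"];

// Give a ref to each eligible node until the budget is used up.
// Nodes that are not eligible, or that arrive after the budget is spent, get null.
function assignRefs(nodes: Candidate[], budget: number): (string | null)[] {
  let next = 1;
  return nodes.map((node) => {
    const isLandmark = LANDMARK_ROLES.indexOf(node.role) !== -1;
    const isNamed = node.role !== "" ? node.name !== "" : false;
    if (!(isLandmark || isNamed)) return null;
    if (next > budget) return null;
    const ref = "e" + next;
    next += 1;
    return ref;
  });
}

const candidates: Candidate[] = [
  { role: "main", name: "" },            // landmark: eligible without a name
  { role: "button", name: "Play" },      // role and name: eligible
  { role: "", name: "stray text" },      // no role: skipped
  { role: "link", name: "Contents" },    // role and name: eligible
];

const refsWithRoom = assignRefs(candidates, 200);
const refsWhenTight = assignRefs(candidates, 2);
```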

&lt;p&gt;&lt;strong&gt;&lt;code&gt;ElementInfo&lt;/code&gt; now carries &lt;code&gt;role&lt;/code&gt; and &lt;code&gt;level&lt;/code&gt; fields&lt;/strong&gt; — &lt;code&gt;discoverElements()&lt;/code&gt; returns &lt;code&gt;{ elements, tree }&lt;/code&gt; instead of just a flat map.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug fixes&lt;/strong&gt;: deeply nested non-semantic wrappers are now correctly flattened, element resolution order is fixed in the drag/hover tools, and trailing commas are now consistent across tool schema definitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full Changelog
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;New:&lt;/strong&gt; Accessibility tree discovery with ARIA role computation · Hierarchical snapshot output · &lt;code&gt;AccessibilityNode&lt;/code&gt; type · &lt;code&gt;formatAccessibilityTree()&lt;/code&gt; utility · &lt;code&gt;ScrollPageTool&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Changed:&lt;/strong&gt; &lt;code&gt;discoverElements()&lt;/code&gt; returns &lt;code&gt;{ elements, tree }&lt;/code&gt; · &lt;code&gt;PageSnapshot&lt;/code&gt; includes &lt;code&gt;tree: AccessibilityNode&lt;/code&gt; · Ref budget 100→200 · Improved text-matching fallback · Grid session and exploration coordinator updated for new discovery API&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fixed:&lt;/strong&gt; Recursive flattening of &lt;code&gt;__promote&lt;/code&gt; nodes for deeply nested non-semantic wrappers · Trailing commas in tool schemas · Ref resolution order in drag/hover tools&lt;/p&gt;

&lt;p&gt;The project is still in beta, but it is open source. If you want to help work on it, contributions are very welcome!&lt;/p&gt;

&lt;p&gt;📦 &lt;a href="https://www.npmjs.com/package/selenium-ai-agent" rel="noopener noreferrer"&gt;npm&lt;/a&gt; · 🐙 &lt;a href="https://github.com/learn-automated-testing/selenium_agent" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · 📖 &lt;a href="https://learnautomatedtesting.com/blog/selenium-agent-release-accessibility-tree-discovery/" rel="noopener noreferrer"&gt;Full release notes&lt;/a&gt;&lt;/p&gt;

</description>
      <category>selenium</category>
      <category>automation</category>
      <category>ai</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
