<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gaddour</title>
    <description>The latest articles on DEV Community by Gaddour (@gaddour).</description>
    <link>https://dev.to/gaddour</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1214549%2Ff215ebed-c43b-4d32-b736-fb2b9beaceb5.png</url>
      <title>DEV Community: Gaddour</title>
      <link>https://dev.to/gaddour</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gaddour"/>
    <language>en</language>
    <item>
      <title>Our AI Automation Agent Got Accepted at RoboCon 2026</title>
      <dc:creator>Gaddour</dc:creator>
      <pubDate>Wed, 26 Nov 2025 13:25:29 +0000</pubDate>
      <link>https://dev.to/gaddour/introducing-our-ai-agent-vision-language-automation-for-real-apps-131b</link>
      <guid>https://dev.to/gaddour/introducing-our-ai-agent-vision-language-automation-for-real-apps-131b</guid>
      <description>&lt;p&gt;Most UI tests today still look like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you code the steps,&lt;/li&gt;
&lt;li&gt;you hard-code selectors (IDs, XPath, CSS),&lt;/li&gt;
&lt;li&gt;you pray they don’t break on the next release.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works… until:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the accessibility tree is a mess,&lt;/li&gt;
&lt;li&gt;the app runs inside a WebView,&lt;/li&gt;
&lt;li&gt;the UI is legacy or hybrid,&lt;/li&gt;
&lt;li&gt;or there is &lt;strong&gt;no reliable locator at all&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, traditional automation just gives up.&lt;/p&gt;

&lt;p&gt;I’ve been working on a different approach:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instead of hard-coding selectors and steps,&lt;br&gt;&lt;br&gt;
&lt;strong&gt;let an AI agent build the locator and the action at runtime&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is what our project, &lt;strong&gt;AI Agent&lt;/strong&gt;, is about.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🚀 &lt;strong&gt;AI Agent is open source &amp;amp; early-stage.&lt;br&gt;&lt;br&gt;
If this resonates with you, please ⭐ star the repo:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://github.com/aidriventesting/Agent" rel="noopener noreferrer"&gt;https://github.com/aidriventesting/Agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It helps a &lt;em&gt;lot&lt;/em&gt; with visibility and future sponsorship.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;What is AI Agent?&lt;/h2&gt;

&lt;p&gt;AI Agent is an open-source project that plugs into your existing tests and tools, and moves the “intelligence” to runtime:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You give &lt;strong&gt;an instruction&lt;/strong&gt; (in natural language or structured form).&lt;/li&gt;
&lt;li&gt;The agent analyzes the &lt;strong&gt;current UI&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;It decides &lt;strong&gt;what element to interact with&lt;/strong&gt; and &lt;strong&gt;what action to perform&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Then it calls Appium / Playwright / Robot Framework keywords behind the scenes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For accessible apps, the agent can still use locators — but it &lt;strong&gt;builds them on the fly&lt;/strong&gt;, instead of you hard-coding them.&lt;/p&gt;

&lt;p&gt;For non-accessible apps (no IDs, no labels, weird trees), it can switch to &lt;strong&gt;vision-based mode&lt;/strong&gt; and work directly from the screenshot.&lt;/p&gt;


&lt;h2&gt;Why vision-based?&lt;/h2&gt;

&lt;p&gt;Some apps are just not testable with classic locators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;custom rendering,&lt;/li&gt;
&lt;li&gt;games,&lt;/li&gt;
&lt;li&gt;kiosk / embedded UIs,&lt;/li&gt;
&lt;li&gt;“designer” apps with no semantic structure,&lt;/li&gt;
&lt;li&gt;broken or incomplete accessibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In those cases, “find element by ID” is not an option.&lt;/p&gt;

&lt;p&gt;That’s where a &lt;strong&gt;vision agent&lt;/strong&gt; comes in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it receives a &lt;strong&gt;screenshot&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;it detects &lt;strong&gt;interactive regions&lt;/strong&gt; (buttons, inputs, icons…),&lt;/li&gt;
&lt;li&gt;it understands text and layout,&lt;/li&gt;
&lt;li&gt;it chooses where to click / type based on the screen, not the DOM.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Right now, AI Agent integrates &lt;strong&gt;OmniParser&lt;/strong&gt; for this, and the plan is to support more models and eventually a dedicated model tuned for interactive zones in mobile &amp;amp; web UIs.&lt;/p&gt;
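
&lt;p&gt;As a rough sketch (the instructions are illustrative, using the &lt;code&gt;Agent.Do&lt;/code&gt; / &lt;code&gt;Agent.Check&lt;/code&gt; keywords introduced below), a vision-only interaction on a canvas-rendered UI could look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;*** Test Cases ***
Tap Button In Canvas-Rendered UI
    # No IDs or accessibility labels here: the agent works from the screenshot
    Agent.Do    Tap the orange "Start" button in the bottom-right corner
    Agent.Check    Verify that a score counter is visible at the top of the screen
&lt;/code&gt;&lt;/pre&gt;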


&lt;h2&gt;Two ways to use AI Agent&lt;/h2&gt;

&lt;p&gt;AI Agent is not “all or nothing”.&lt;br&gt;&lt;br&gt;
You can use it in two complementary ways.&lt;/p&gt;
&lt;h3&gt;1. Agent mode: &lt;code&gt;Agent.Do&lt;/code&gt; and &lt;code&gt;Agent.Check&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;This is the “agentic” interface.&lt;/p&gt;

&lt;p&gt;You give it a step-level goal, and it decides what to do on the current screen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Agent.Do&lt;/code&gt; → perform an action based on the instruction and UI
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Agent.Check&lt;/code&gt; → verify something visually or semantically on the current screen
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example (simplified Robot Framework style):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight robot_framework"&gt;&lt;code&gt;&lt;span class="gh"&gt;*** Settings ***&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="bp"&gt;Library&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;AIAgentLibrary&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="gh"&gt;*** Test Cases ***&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;Login With Runtime Agent&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nf"&gt;Open Application&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;my_app&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nf"&gt;Agent.Do&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;Tap the login button&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nf"&gt;Agent.Do&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;Type "user@example.com" into the email field&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nf"&gt;Agent.Do&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;Type "Secret123" into the password field&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nf"&gt;Agent.Do&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;Submit the login form&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nf"&gt;Agent.Check&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;Verify that the home screen is visible and shows the username&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No hard-coded XPath.&lt;br&gt;&lt;br&gt;
The agent looks at the UI / accessibility / screenshot and makes a decision in the moment.&lt;/p&gt;

&lt;p&gt;Today, this is step-by-step.&lt;br&gt;&lt;br&gt;
The roadmap includes &lt;code&gt;Agent.Autonomous&lt;/code&gt; for multi-step flows in one shot.&lt;/p&gt;
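
&lt;p&gt;As a purely speculative sketch (the keyword is on the roadmap and its final signature is not settled), a multi-step flow might collapse into a single instruction:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;*** Test Cases ***
Login In One Shot
    Open Application    my_app
    # Hypothetical future keyword: one goal, agent-planned intermediate steps
    Agent.Autonomous    Log in as user@example.com with password "Secret123" and land on the home screen
&lt;/code&gt;&lt;/pre&gt;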




&lt;h3&gt;2. AI-in-the-loop tools for any test&lt;/h3&gt;

&lt;p&gt;You don’t have to rewrite your whole suite to use AI.&lt;/p&gt;

&lt;p&gt;AI Agent also provides small, focused keywords/tools that you can drop into any existing test. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Locate GUI element visually&lt;/strong&gt; → get bounding box / description of an element on the screen.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explain what is on this screen&lt;/strong&gt; → useful for debugging and test failure analysis.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Report a bug with visual context&lt;/strong&gt; → capture screenshot + regions + description.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suggest a locator&lt;/strong&gt; → propose a more robust selector based on the UI.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This “AI in the loop” mode is meant to &lt;strong&gt;augment&lt;/strong&gt; your traditional tests, not replace them.&lt;br&gt;&lt;br&gt;
You keep your framework, your asserts, your structure — and use AI only where it actually helps.&lt;/p&gt;
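
&lt;p&gt;As a hedged sketch (these keyword names are illustrative, not the library's confirmed API), dropping AI into an otherwise classic test could look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;*** Test Cases ***
Classic Test With AI In The Loop
    # Hypothetical keyword: attach screenshot + regions + description on failure
    [Teardown]    Run Keyword If Test Failed    Report Bug With Visual Context
    # Your existing, unchanged steps (e.g. SeleniumLibrary / AppiumLibrary)
    Click Element    id=checkout_button
    # Hypothetical keyword names for the tools listed above
    ${box}=    Locate GUI Element Visually    the total price label
    ${locator}=    Suggest Locator    the total price label
&lt;/code&gt;&lt;/pre&gt;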




&lt;h2&gt;How it works (high level)&lt;/h2&gt;

&lt;p&gt;Under the hood, AI Agent has three main parts:&lt;/p&gt;

&lt;h3&gt;1. UI understanding from structure&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;uses whatever is available: accessibility tree, DOM, widget hierarchy, etc.
&lt;/li&gt;
&lt;li&gt;can build locators dynamically and choose good candidates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;2. UI understanding from vision&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;uses models like &lt;strong&gt;OmniParser&lt;/strong&gt; (for now) to parse screenshots into blocks, text, regions.
&lt;/li&gt;
&lt;li&gt;future: dedicated model for &lt;strong&gt;“interactive zones”&lt;/strong&gt; (tappable, typable, etc.).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;3. Decision layer&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;takes the current instruction + perceived UI,
&lt;/li&gt;
&lt;li&gt;picks a target element and an action,
&lt;/li&gt;
&lt;li&gt;dispatches to Appium / WebDriver / Robot Framework.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The focus right now is on &lt;strong&gt;reliable per-step decisions&lt;/strong&gt;, clear logs, and reproducible behavior — not on creating a mysterious black-box “magic agent”.&lt;/p&gt;




&lt;h2&gt;📢 AI Agent at RoboCon 2026&lt;/h2&gt;

&lt;p&gt;AI Agent will be presented at &lt;strong&gt;RoboCon 2026&lt;/strong&gt; in Helsinki, the main Robot Framework community conference:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://www.robocon.io/agenda/helsinki#what-if-robot-framework-have-a-brain" rel="noopener noreferrer"&gt;https://www.robocon.io/agenda/helsinki#what-if-robot-framework-have-a-brain&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The talk will explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why runtime locator generation matters,
&lt;/li&gt;
&lt;li&gt;how vision-based perception fits into real-world testing,
&lt;/li&gt;
&lt;li&gt;how &lt;code&gt;Agent.Do&lt;/code&gt; / &lt;code&gt;Agent.Check&lt;/code&gt; and future &lt;code&gt;Agent.Autonomous&lt;/code&gt; can live together with classic Robot Framework suites.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re attending RoboCon 2026, come say hi and bring your weirdest UI problems. 😄&lt;/p&gt;




&lt;h2&gt;Roadmap&lt;/h2&gt;

&lt;h3&gt;Short-term&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Improve runtime locator generation from accessibility / DOM.
&lt;/li&gt;
&lt;li&gt;Strengthen the OmniParser integration and add alternative vision backends.
&lt;/li&gt;
&lt;li&gt;Provide robust &lt;code&gt;Agent.Do&lt;/code&gt; / &lt;code&gt;Agent.Check&lt;/code&gt; implementations with good logging.
&lt;/li&gt;
&lt;li&gt;Expose useful “AI-in-the-loop” keywords for common use cases:

&lt;ul&gt;
&lt;li&gt;visual location,
&lt;/li&gt;
&lt;li&gt;smart attachments for bug reports,
&lt;/li&gt;
&lt;li&gt;visual checks.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;Mid-term&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Agent.Autonomous&lt;/code&gt; for multi-step flows.
&lt;/li&gt;
&lt;li&gt;A custom model for interactive UI zones.
&lt;/li&gt;
&lt;li&gt;Benchmarks for agent-based vs selector-based testing.
&lt;/li&gt;
&lt;li&gt;Better support for non-accessible and legacy apps.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;⭐ How you can support the project right now&lt;/h2&gt;

&lt;p&gt;If you want this direction to exist for real and stay open source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;⭐ Star the GitHub repo&lt;/strong&gt; (this is the most important signal)&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://github.com/aidriventesting/Agent" rel="noopener noreferrer"&gt;https://github.com/aidriventesting/Agent&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Share the project&lt;/strong&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;your QA / automation team,
&lt;/li&gt;
&lt;li&gt;anyone fighting with fragile locators,
&lt;/li&gt;
&lt;li&gt;people working on vision/agentic testing.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Open issues&lt;/strong&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;your use cases,
&lt;/li&gt;
&lt;li&gt;screenshots of hard-to-test UIs,
&lt;/li&gt;
&lt;li&gt;ideas for AI-in-the-loop keywords.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Support the project financially&lt;/strong&gt; via Open Collective (infra, models, device farms):&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://opencollective.com/ai-testing-agent" rel="noopener noreferrer"&gt;https://opencollective.com/ai-testing-agent&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;Get involved&lt;/h2&gt;

&lt;p&gt;I’m especially interested in feedback from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mobile &amp;amp; web test engineers,
&lt;/li&gt;
&lt;li&gt;people dealing with inaccessible / legacy UIs,
&lt;/li&gt;
&lt;li&gt;researchers working on UI understanding or agents,
&lt;/li&gt;
&lt;li&gt;teams that want to bring “just enough AI” into existing test suites.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Comments, critiques, weird edge cases… all welcome.&lt;br&gt;&lt;br&gt;
Let’s see how far we can push &lt;strong&gt;runtime UI automation&lt;/strong&gt; with an AI agent in the loop.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>automation</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
