<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Leooo</title>
    <description>The latest articles on DEV Community by Leooo (@leoli).</description>
    <link>https://dev.to/leoli</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3840632%2Ff01c8c9a-af0f-4a6b-9634-0ddbafc1040c.png</url>
      <title>DEV Community: Leooo</title>
      <link>https://dev.to/leoli</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/leoli"/>
    <language>en</language>
    <item>
      <title>Why AI Agents shouldn't rely on screenshots: Building a cross-platform alternative to Anthropic's Computer Use</title>
      <dc:creator>Leooo</dc:creator>
      <pubDate>Tue, 24 Mar 2026 17:22:28 +0000</pubDate>
      <link>https://dev.to/leoli/why-ai-agents-shouldnt-rely-on-screenshots-building-a-cross-platform-alternative-to-anthropics-4634</link>
      <guid>https://dev.to/leoli/why-ai-agents-shouldnt-rely-on-screenshots-building-a-cross-platform-alternative-to-anthropics-4634</guid>
      <description>&lt;p&gt;Anthropic recently released their Computer Use feature for macOS. It is a big step forward for AI agents, allowing models to interact with local software. However, this release also highlights a major technical bottleneck in how we are building GUI agents today. The current approach relies heavily on taking continuous screenshots and using large vision models to figure out where to click. This method is slow, expensive, and currently leaves Windows users out of the loop.&lt;/p&gt;

&lt;p&gt;When an agent uses screenshots, it essentially treats the operating system like a flat picture. It takes an image, sends it to the cloud, waits for the vision model to calculate pixel coordinates, and then finally moves the mouse. If a UI element shifts by a few pixels or the network is delayed, the action easily fails. Clicking a single button can take several seconds and consume a lot of tokens.&lt;/p&gt;

&lt;p&gt;We need a more efficient way for agents to interact with software. Human developers use APIs to talk to applications, and AI agents should have a similar structural interface. This is why I built the Agent-Computer Interface (ACI).&lt;/p&gt;

&lt;p&gt;ACI is an open-source protocol that turns any application into a structured JSON tree. It allows large language models to read and operate software interfaces directly through text, completely bypassing the need for pixels and screenshots for standard tasks.&lt;/p&gt;
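&lt;p&gt;As a rough sketch (the field names here are illustrative, not ACI's actual schema), such a node tree might look like this:&lt;/p&gt;

```python
# Hypothetical sketch of a universal UI node tree: every interactive
# element gets a stable uid, a role, and a human-readable name,
# regardless of whether it came from the DOM or from UI Automation.
node_tree = {
    "window": "Hacker News - Chrome",
    "elements": [
        {"uid": "e1", "role": "link", "name": "new", "enabled": True},
        {"uid": "e2", "role": "textbox", "name": "search", "enabled": True},
        {"uid": "e3", "role": "button", "name": "submit", "enabled": True},
    ],
}

def find_by_name(tree, name):
    """Return the uid of the first element whose name matches."""
    for element in tree["elements"]:
        if element["name"] == name:
            return element["uid"]
    return None

print(find_by_name(node_tree, "search"))  # e2
```

&lt;p&gt;Because every element carries a stable ID, "click the search bar" becomes a lookup plus one command instead of a pixel hunt.&lt;/p&gt;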

&lt;p&gt;The core idea behind ACI is unification. Currently, web browsers use the DOM to organize elements, while Windows desktop applications use UI Automation (UIA). ACI reads these completely different underlying structures and converts them into one universal node tree.&lt;/p&gt;

&lt;p&gt;Because the data format is standardized, the agent only needs to learn two basic commands: perceive and act.&lt;/p&gt;

&lt;p&gt;When the agent calls the perceive command, it receives a clean, structured list of all interactive elements currently on the screen. Each element, whether it is a web link or a local desktop button, gets a unique ID. If the agent wants to click the search bar, it simply calls the act command and targets that specific ID. The interaction logic is exactly the same whether the agent is browsing Hacker News or operating a local Windows notepad. This drops the action latency from several seconds down to milliseconds.&lt;/p&gt;

&lt;p&gt;Of course, purely reading UI structures cannot solve every single edge case. Some applications use custom rendering engines or complex canvas elements where the structural data is hidden. ACI handles this practically. It defaults to the lightning-fast structural parsing for the vast majority of tasks. When it encounters a completely blind area like a canvas, it automatically falls back to a vision model to capture a visual reference image for that specific region.&lt;/p&gt;

&lt;p&gt;This hybrid routing approach ensures that the agent stays extremely fast most of the time without losing the ability to handle complex graphical interfaces.&lt;/p&gt;

&lt;p&gt;I believe treating the UI as an API rather than an image is the right path forward for agent automation. The ACI framework is fully open-source and ready to be tested. If you are building AI agents or working on desktop automation, I would love to hear your feedback or see your contributions.&lt;/p&gt;

&lt;p&gt;GitHub Repository: &lt;a href="https://github.com/Leoooooli/ACI" rel="noopener noreferrer"&gt;https://github.com/Leoooooli/ACI&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>automation</category>
    </item>
    <item>
      <title>Why open-source this?

Computer Use should be a universal standard, not walled gardens.

Looking for contributors to:
• Add macOS/Linux desktop bridges
• Build app-specific YAML configs
• Improve VLM fallback accuracy

All PRs welcome! 🤝</title>
      <dc:creator>Leooo</dc:creator>
      <pubDate>Tue, 24 Mar 2026 16:25:41 +0000</pubDate>
      <link>https://dev.to/leoli/why-open-source-this-computer-use-should-be-a-universal-standard-not-walled-gardens-1dn0</link>
      <guid>https://dev.to/leoli/why-open-source-this-computer-use-should-be-a-universal-standard-not-walled-gardens-1dn0</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/leoli/i-built-aci-the-open-standard-that-lets-ai-agents-operate-any-software-without-screenshots-2ok" class="crayons-story__hidden-navigation-link"&gt;I Built ACI: The Open Standard That Lets AI Agents Operate Any Software Without Screenshots&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/leoli" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3840632%2Ff01c8c9a-af0f-4a6b-9634-0ddbafc1040c.png" alt="leoli profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/leoli" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Leooo
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Leooo
                
              
              &lt;div id="story-author-preview-content-3390758" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/leoli" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3840632%2Ff01c8c9a-af0f-4a6b-9634-0ddbafc1040c.png" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Leooo&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/leoli/i-built-aci-the-open-standard-that-lets-ai-agents-operate-any-software-without-screenshots-2ok" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 23&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/leoli/i-built-aci-the-open-standard-that-lets-ai-agents-operate-any-software-without-screenshots-2ok" id="article-link-3390758"&gt;
          I Built ACI: The Open Standard That Lets AI Agents Operate Any Software Without Screenshots
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/automation"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;automation&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/opensource"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;opensource&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/leoli/i-built-aci-the-open-standard-that-lets-ai-agents-operate-any-software-without-screenshots-2ok" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;5&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/leoli/i-built-aci-the-open-standard-that-lets-ai-agents-operate-any-software-without-screenshots-2ok#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            2 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>automation</category>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>I Built ACI: The Open Standard That Lets AI Agents Operate Any Software Without Screenshots</title>
      <dc:creator>Leooo</dc:creator>
      <pubDate>Mon, 23 Mar 2026 18:34:56 +0000</pubDate>
      <link>https://dev.to/leoli/i-built-aci-the-open-standard-that-lets-ai-agents-operate-any-software-without-screenshots-2ok</link>
      <guid>https://dev.to/leoli/i-built-aci-the-open-standard-that-lets-ai-agents-operate-any-software-without-screenshots-2ok</guid>
      <description>&lt;p&gt;Every AI agent framework today solves half the problem. Browser Use handles web pages but can't touch desktop apps. UFO controls Windows apps but can't operate browsers. Screenshot-based approaches (Computer Use, Operator) work everywhere but are slow (3-10s per action), expensive, and fragile.&lt;/p&gt;

&lt;p&gt;The missing piece: &lt;strong&gt;a single standard interface that works for both web and desktop, structured and fast.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's what I built. &lt;strong&gt;ACI — the Agent-Computer Interface.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Idea: APIs for Developers, ACI for Agents
&lt;/h2&gt;

&lt;p&gt;Just as APIs became the universal interface between developers and software, ACI is the universal interface between AI agents and software.&lt;/p&gt;


&lt;h2&gt;
  
  
  How It Works: Two Operations
&lt;/h2&gt;

&lt;p&gt;The entire protocol is two operations:&lt;/p&gt;

&lt;h3&gt;
  1. &lt;code&gt;perceive&lt;/code&gt; — See what's on screen
&lt;/h3&gt;

&lt;p&gt;Returns a structured, UID-referenced element tree — not a screenshot, not raw HTML:&lt;/p&gt;
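&lt;p&gt;A hypothetical response for a web page might look like this (field names are illustrative; see the repo for the real schema):&lt;/p&gt;

```python
# Sketch of a perceive() response for a web page. The agent reasons
# over plain text: no pixels, no coordinates.
perceive_response = {
    "source": "cdp_accessibility_tree",
    "latency_ms": 23,
    "elements": [
        {"uid": "a11", "role": "textbox", "name": "Search Hacker News"},
        {"uid": "a12", "role": "link", "name": "Show HN"},
        {"uid": "a13", "role": "button", "name": "login"},
    ],
}

roles = [e["role"] for e in perceive_response["elements"]]
print(roles)  # ['textbox', 'link', 'button']
```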

&lt;p&gt;Same protocol for desktop apps:&lt;/p&gt;

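&lt;p&gt;The same hypothetical shape, extracted via UIA instead of the DOM (names and roles illustrative):&lt;/p&gt;

```python
# Sketch of a perceive() response for a desktop app. The uids work
# exactly like the web ones; only the extraction source differs.
perceive_response = {
    "source": "uia_control_tree",
    "window": "Untitled - Notepad",
    "elements": [
        {"uid": "d1", "role": "document", "name": "Text editor"},
        {"uid": "d2", "role": "menu_item", "name": "File"},
        {"uid": "d3", "role": "menu_item", "name": "Edit"},
    ],
}
print(perceive_response["source"])  # uia_control_tree
```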

&lt;h3&gt;
  2. &lt;code&gt;act&lt;/code&gt; — Do something
&lt;/h3&gt;

&lt;p&gt;That's it: &lt;code&gt;perceive&lt;/code&gt; and &lt;code&gt;act&lt;/code&gt;. Any agent that can make HTTP calls can operate any software.&lt;/p&gt;
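&lt;p&gt;As an illustration only (the payload fields here are my assumptions, not ACI's documented API), an act call against the local daemon could be as small as:&lt;/p&gt;

```python
# Hypothetical act() request bodies for the local ACI daemon.
# Field names are assumptions for illustration; see the repo for
# the real API surface.
import json

def build_act_request(uid, action, value=None):
    """Build the JSON body for a single act() call."""
    body = {"uid": uid, "action": action}
    if value is not None:
        body["value"] = value
    return body

# e.g. type into the search box found by perceive(), then click submit
requests_to_send = [
    build_act_request("a11", "type", "ACI protocol"),
    build_act_request("a13", "click"),
]
print(json.dumps(requests_to_send[1]))  # {"uid": "a13", "action": "click"}
```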

&lt;h2&gt;
  
  
  Three Technical Innovations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Tiered Structured Extraction
&lt;/h3&gt;

&lt;p&gt;Instead of screenshots, ACI uses fast structured methods first and falls back to vision only when necessary:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Web:&lt;/strong&gt; CDP Accessibility Tree (5-50ms) -&amp;gt; DOM Supplement (20-100ms) -&amp;gt; Vision Fallback (1-2s, only if needed)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Desktop:&lt;/strong&gt; UIA Control Tree (50-300ms) -&amp;gt; Cursor Probing (~350ms) -&amp;gt; OCR (~200ms) -&amp;gt; VLM (1-5s, last resort)&lt;/p&gt;

&lt;p&gt;Most interactions complete in &lt;strong&gt;50-300ms&lt;/strong&gt; instead of 3-10 seconds.&lt;/p&gt;
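&lt;p&gt;The tiering logic itself is simple. A minimal sketch, with the extractors stubbed out (the real implementations live in the repo):&lt;/p&gt;

```python
# Tiered extraction sketch: try cheap structured extractors first,
# fall back to vision only if every earlier tier comes back empty.
def extract_a11y_tree():
    return []   # stub: CDP accessibility tree tier (5-50ms)

def extract_dom():
    return [{"uid": "e1", "role": "button", "name": "OK"}]  # stub (20-100ms)

def extract_vision():
    return [{"uid": "v1", "role": "region", "name": "canvas"}]  # stub (1-2s)

def perceive():
    for extractor in (extract_a11y_tree, extract_dom, extract_vision):
        elements = extractor()
        if elements:  # first tier that yields elements wins
            return {"tier": extractor.__name__, "elements": elements}
    return {"tier": None, "elements": []}

print(perceive()["tier"])  # extract_dom
```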

&lt;h3&gt;
  
  
  2. Community Knowledge Base (YAML App Profiles)
&lt;/h3&gt;

&lt;p&gt;Every app can have a YAML profile that teaches agents shortcuts and UI patterns:&lt;/p&gt;

&lt;p&gt;Add a YAML file, push to the repo — every agent instantly knows how to use that app. Zero code changes. ACI currently ships with 12 app profiles (Chrome, VS Code, Discord, Slack, Notion, Telegram, and more).&lt;/p&gt;
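&lt;p&gt;A made-up profile to illustrate the idea (the key names here are hypothetical, not ACI's actual profile schema):&lt;/p&gt;

```yaml
# Hypothetical app profile for VS Code: teaches agents shortcuts
# and UI patterns without any code changes.
app: vscode
match:
  process_name: Code.exe
shortcuts:
  command_palette: Ctrl+Shift+P
  quick_open: Ctrl+P
hints:
  - "Prefer the command palette over menu navigation"
```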

&lt;h3&gt;
  
  
  3. Interrupt-Aware Execution (MutationShield)
&lt;/h3&gt;

&lt;p&gt;Real software throws popups, cookie banners, and auth dialogs. ACI's MutationShield detects and reports these as structured events instead of crashing.&lt;/p&gt;
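&lt;p&gt;For example, a blocking dialog might surface to the agent as something like this (field names hypothetical), so it can dismiss the dialog and retry instead of dying mid-task:&lt;/p&gt;

```python
# Sketch of an interrupt reported as a structured event rather than a
# crash. The agent inspects the event and picks a dismiss action.
interrupt_event = {
    "type": "interrupt",
    "kind": "cookie_banner",
    "blocking_uid": "e7",  # the element the agent was trying to reach
    "dismiss_candidates": [
        {"uid": "e42", "role": "button", "name": "Reject all"},
    ],
}

def handle(event):
    """Pick a dismiss action for a blocking dialog, if one is offered."""
    if event["type"] == "interrupt" and event["dismiss_candidates"]:
        return {"uid": event["dismiss_candidates"][0]["uid"], "action": "click"}
    return None

print(handle(interrupt_event))  # {'uid': 'e42', 'action': 'click'}
```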

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python + FastAPI&lt;/strong&gt; daemon (port 11434)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playwright&lt;/strong&gt; for web automation (cross-platform)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows UIA&lt;/strong&gt; for desktop automation (macOS/Linux on roadmap)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Works with any LLM&lt;/strong&gt; — tested with Claude and GPT&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Apache 2.0 licensed&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Leoooooli/ACI" rel="noopener noreferrer"&gt;https://github.com/Leoooooli/ACI&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Plot Twist
&lt;/h2&gt;

&lt;p&gt;This article was written and published entirely by an AI agent (Claude) using only the ACI protocol.&lt;/p&gt;

&lt;p&gt;The agent connected to this browser through ACI, perceived the Dev.to editor (finding the title field, tag inputs, and content textarea by their UIDs), typed this article, and clicked Publish. No Selenium scripts. No hardcoded CSS selectors. Just &lt;code&gt;perceive&lt;/code&gt; and &lt;code&gt;act&lt;/code&gt; — the same two-operation protocol described above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If an agent can write and publish a blog post using your framework, it probably works.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
    </item>
  </channel>
</rss>
