<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kacper Gadomski</title>
    <description>The latest articles on DEV Community by Kacper Gadomski (@k4cper-g).</description>
    <link>https://dev.to/k4cper-g</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3801448%2Fbdf69960-715b-4caf-bb76-9b6d92f47adb.jpg</url>
      <title>DEV Community: Kacper Gadomski</title>
      <link>https://dev.to/k4cper-g</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/k4cper-g"/>
    <language>en</language>
    <item>
      <title>We Built an Open Protocol So AI Agents Can Actually See Your Screen</title>
      <dc:creator>Kacper Gadomski</dc:creator>
      <pubDate>Mon, 02 Mar 2026 10:35:26 +0000</pubDate>
      <link>https://dev.to/k4cper-g/we-built-an-open-protocol-so-ai-agents-can-actually-see-your-screen-55aj</link>
      <guid>https://dev.to/k4cper-g/we-built-an-open-protocol-so-ai-agents-can-actually-see-your-screen-55aj</guid>
      <description>&lt;p&gt;You know what's wild? Every major AI lab is building "computer use" agents right now. Models that can look at your screen, understand what they see, and click buttons on your behalf. Anthropic has Claude Computer Use. OpenAI shipped CUA. Microsoft built UFO2.&lt;/p&gt;

&lt;p&gt;And every single one of them is independently solving the same problem: &lt;em&gt;how do you describe a UI to an AI?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We thought that was broken, so we built &lt;strong&gt;Computer Use Protocol (CUP)&lt;/strong&gt;, an open specification that gives AI agents a universal way to perceive and interact with any desktop UI. One format. Every platform. MIT licensed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/computeruseprotocol/computeruseprotocol" rel="noopener noreferrer"&gt;computeruseprotocol/computeruseprotocol&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://computeruseprotocol.com" rel="noopener noreferrer"&gt;computeruseprotocol.com&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Problem: Six Platforms, Six Different Languages
&lt;/h2&gt;

&lt;p&gt;Here's the fragmentation that every computer-use agent has to deal with today:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Accessibility API&lt;/th&gt;
&lt;th&gt;Role Count&lt;/th&gt;
&lt;th&gt;IPC Mechanism&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;UIA (COM)&lt;/td&gt;
&lt;td&gt;~40 ControlTypes&lt;/td&gt;
&lt;td&gt;COM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;macOS&lt;/td&gt;
&lt;td&gt;AXUIElement&lt;/td&gt;
&lt;td&gt;AXRole + AXSubrole&lt;/td&gt;
&lt;td&gt;XPC / Mach&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Linux&lt;/td&gt;
&lt;td&gt;AT-SPI2&lt;/td&gt;
&lt;td&gt;~100+ AtspiRole values&lt;/td&gt;
&lt;td&gt;D-Bus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web&lt;/td&gt;
&lt;td&gt;ARIA&lt;/td&gt;
&lt;td&gt;~80 ARIA roles&lt;/td&gt;
&lt;td&gt;In-process / CDP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Android&lt;/td&gt;
&lt;td&gt;AccessibilityNodeInfo&lt;/td&gt;
&lt;td&gt;Java class names&lt;/td&gt;
&lt;td&gt;Binder&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iOS&lt;/td&gt;
&lt;td&gt;UIAccessibility&lt;/td&gt;
&lt;td&gt;~15 trait flags&lt;/td&gt;
&lt;td&gt;In-process&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's roughly &lt;strong&gt;300+ combined role types&lt;/strong&gt; across platforms, each with different naming, different semantics, and different ways to query them. If you're building an agent that needs to work on more than one OS, you're writing a lot of glue code.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Solution: One Schema, One Vocabulary
&lt;/h2&gt;

&lt;p&gt;CUP collapses all of that into a single, ARIA-derived schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;59 universal roles&lt;/strong&gt;, the subset that maps cleanly across all 6 platforms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;16 state flags&lt;/strong&gt;, only truthy/active states listed (absence = default)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;15 action verbs&lt;/strong&gt;, a canonical vocabulary for what agents can do with elements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform escape hatch&lt;/strong&gt;, raw native properties preserved for advanced use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what a CUP snapshot looks like in JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"platform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"windows"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1740067200000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"screen"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2560&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"h"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1440&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"scale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"app"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Spotify"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"pid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tree"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"e0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"window"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Spotify"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"bounds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1680&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"h"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1020&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"states"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"focused"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"click"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"children"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whether that UI was captured on Windows via UIA, macOS via AXUIElement, or Linux via AT-SPI2, it comes out looking exactly the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Compact Format (97% Fewer Tokens)
&lt;/h2&gt;

&lt;p&gt;Sending JSON trees to an LLM burns context fast. CUP defines a compact text format that's optimized for token efficiency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# CUP 0.1.0 | windows | 2560x1440
# app: Spotify
# 63 nodes (280 before pruning)

[e0] window "Spotify" @120,40 1680x1020
  [e1] document "Spotify" @120,40 1680x1020
    [e2] button "Back" @132,52 32x32 [click]
    [e3] button "Forward" @170,52 32x32 {disabled} [click]
    [e7] navigation "Main" @120,88 240x972
      [e8] link "Home" @132,100 216x40 {selected} [click]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each line follows: &lt;code&gt;[id] role "name" @x,y wxh {states} [actions]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Same information. A fraction of the tokens. Your agent sees more UI in less context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: Protocol First
&lt;/h2&gt;

&lt;p&gt;CUP is intentionally layered. The protocol is the foundation, and everything else is optional.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Protocol&lt;/strong&gt; (core repo)&lt;/td&gt;
&lt;td&gt;Defines the universal tree format: roles, states, actions, schema&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SDKs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Capture native accessibility trees, normalize to CUP, execute actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP Servers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Expose CUP as tools for AI agents (Claude Code, Cursor, Copilot, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can adopt just the schema. Or use the SDKs. Or go all the way to MCP integration. Each layer is independent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started in 30 Seconds
&lt;/h2&gt;

&lt;p&gt;Install the SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# TypeScript&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;computeruseprotocol

&lt;span class="c"&gt;# Python&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;computeruseprotocol
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Capture a UI tree and interact with it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;computeruseprotocol&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;// Capture the active window's UI tree&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;// Click a button&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;click&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;e14&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Type into a search box&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;e9&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hello world&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Send a keyboard shortcut&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;press&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ctrl+s&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The SDK auto-detects your OS and loads the right platform adapter.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP Integration
&lt;/h2&gt;

&lt;p&gt;CUP ships a built-in MCP server. Add it to your &lt;code&gt;claude_desktop_config.json&lt;/code&gt; or equivalent and your agent can start controlling desktop UIs immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cup"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cup-mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This exposes tools like &lt;code&gt;snapshot&lt;/code&gt; (capture window tree), &lt;code&gt;action&lt;/code&gt; (interact with elements), &lt;code&gt;overview&lt;/code&gt; (list all open windows), and &lt;code&gt;find&lt;/code&gt; (search elements in the last tree). It works with Claude Code, Cursor, OpenClaw, Codex, and anything MCP-compatible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Computer-use agents are evolving fast, but the infrastructure layer is still ad-hoc. Every team building an agent that needs to "see" a desktop is solving the same problems from scratch: how to capture UI state, how to represent it for an LLM, how to execute actions reliably across platforms.&lt;/p&gt;

&lt;p&gt;CUP standardizes that layer so teams can focus on what makes their agent unique (the reasoning, the planning, the task execution) instead of reimplementing platform-specific UI perception.&lt;/p&gt;

&lt;p&gt;Think of it like this: HTTP didn't make web browsers smart, but it gave them a common language. CUP aims to do the same for computer-use agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current Status &amp;amp; Contributing
&lt;/h2&gt;

&lt;p&gt;CUP is at v0.1.0, early but functional. The spec covers 59 roles mapped across 6 platforms, with SDKs for Python and TypeScript.&lt;/p&gt;

&lt;p&gt;Contributions are very welcome, especially around new role/action proposals with cross-platform mapping rationale, platform mapping improvements, and SDK contributions like new platform adapters or bug fixes.&lt;/p&gt;

&lt;p&gt;Check out the repos:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Spec:&lt;/strong&gt; &lt;a href="https://github.com/computeruseprotocol/computeruseprotocol" rel="noopener noreferrer"&gt;github.com/computeruseprotocol/computeruseprotocol&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python SDK:&lt;/strong&gt; &lt;a href="https://github.com/computeruseprotocol/python-sdk" rel="noopener noreferrer"&gt;github.com/computeruseprotocol/python-sdk&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TypeScript SDK:&lt;/strong&gt; &lt;a href="https://github.com/computeruseprotocol/typescript-sdk" rel="noopener noreferrer"&gt;github.com/computeruseprotocol/typescript-sdk&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://computeruseprotocol.com" rel="noopener noreferrer"&gt;computeruseprotocol.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building computer-use agents, cross-platform UI testing, or accessibility tooling, we'd love to hear from you. Open an issue, submit a PR, or just star the repo if you think this problem is worth solving.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;CUP is MIT licensed and community-driven. The protocol belongs to everyone building in this space.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>computeruse</category>
      <category>protocol</category>
    </item>
  </channel>
</rss>
