<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: screenhand</title>
    <description>The latest articles on DEV Community by screenhand (@screenahand).</description>
    <link>https://dev.to/screenahand</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3811990%2F714e0db8-33ce-4682-9f1a-90dfc2ac959c.png</url>
      <title>DEV Community: screenhand</title>
      <link>https://dev.to/screenahand</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/screenahand"/>
    <language>en</language>
    <item>
      <title>ScreenHand: Give AI Agents Eyes and Hands on Your Desktop (Open Source MCP Server)</title>
      <dc:creator>screenhand</dc:creator>
      <pubDate>Sat, 07 Mar 2026 18:46:46 +0000</pubDate>
      <link>https://dev.to/screenahand/screenhand-give-ai-agents-eyes-and-hands-on-your-desktop-open-source-mcp-server-2b6h</link>
      <guid>https://dev.to/screenahand/screenhand-give-ai-agents-eyes-and-hands-on-your-desktop-open-source-mcp-server-2b6h</guid>
      <description>&lt;h2&gt;
  
  
  What is ScreenHand?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ScreenHand&lt;/strong&gt; is an open-source MCP (Model Context Protocol) server that gives AI agents native desktop control on macOS and Windows. It lets Claude, Cursor, or any other MCP-compatible client see your screen and interact with it: clicking buttons, typing text, navigating apps, and automating browser workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why We Built It
&lt;/h2&gt;

&lt;p&gt;AI agents are powerful reasoners, but they're blind. They can write code but can't click a button. They can draft an email but can't send it. ScreenHand bridges this gap with &lt;strong&gt;82 MCP tools&lt;/strong&gt; spanning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Desktop automation&lt;/strong&gt; — click, type, scroll, drag, OCR, accessibility tree&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser control via CDP&lt;/strong&gt; — navigate, fill forms, click with anti-detection, execute JS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anti-detection&lt;/strong&gt; — human-like typing delays, stealth mode, realistic mouse movements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory system&lt;/strong&gt; — persistent learning from errors and patterns across sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Job system&lt;/strong&gt; — multi-step persistent jobs with worker daemon&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playbooks&lt;/strong&gt; — reusable automation sequences for Instagram, X/Twitter, LinkedIn, YouTube, Reddit, Discord, and more&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MCP Client (Claude, Cursor, etc.)
  | stdio (Model Context Protocol)
mcp-desktop.ts — 82 tools, Zod validation
  |
  +-- Native Bridge (Swift on macOS, C# on Windows)
  |     JSON-RPC over stdio, accessibility APIs
  |
  +-- CDP Chrome Adapter
  |     Chrome DevTools Protocol for browser automation
  |
  +-- Session Supervisor — lease management, recovery
  +-- Job System — persistent multi-step workflows
  +-- Playbook Engine — battle-tested automation recipes
  +-- Memory Service — learns from errors across sessions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
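
&lt;p&gt;As a rough illustration of the "JSON-RPC over stdio" link in the diagram above, here is a minimal TypeScript sketch of request framing. It assumes newline-delimited JSON-RPC 2.0 messages; the &lt;code&gt;click&lt;/code&gt; method name and its parameters are hypothetical and are not taken from ScreenHand's actual bridge protocol.&lt;/p&gt;

```typescript
// Hypothetical sketch: newline-delimited JSON-RPC 2.0 framing over stdio.
// Method names and params are illustrative, not ScreenHand's real bridge API.

type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: unknown;
};

type JsonRpcResponse = {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
};

// Monotonic counter so each request can be matched to its response by id.
let nextId = 1;

// Serialize one request as a single line, ready to write to the bridge's stdin.
function encodeRequest(method: string, params: unknown): string {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method, params };
  return JSON.stringify(request) + "\n";
}

// Parse one line read from the bridge's stdout; throw if it carries an error object.
function decodeResponse(line: string): JsonRpcResponse {
  const response = JSON.parse(line) as JsonRpcResponse;
  if (response.error !== undefined) {
    throw new Error(`bridge error ${response.error.code}: ${response.error.message}`);
  }
  return response;
}
```

&lt;p&gt;For example, &lt;code&gt;encodeRequest("click", { x: 10, y: 20 })&lt;/code&gt; yields one line of JSON that a parent process could write to the bridge's stdin, and each line read back from stdout would go through &lt;code&gt;decodeResponse&lt;/code&gt;.&lt;/p&gt;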



&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; screenhand
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add to your MCP client config:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"screenhand"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"screenhand"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Your AI agent now has eyes and hands.&lt;/p&gt;

&lt;h2&gt;
  
  
  Battle-Tested Playbooks
&lt;/h2&gt;

&lt;p&gt;We've built and tested playbooks for 8 platforms:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Actions Tested&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Instagram&lt;/td&gt;
&lt;td&gt;Like, comment, save, DM, create post, follow, search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;X/Twitter&lt;/td&gt;
&lt;td&gt;Like, reply, retweet, bookmark, create post, DM, follow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LinkedIn&lt;/td&gt;
&lt;td&gt;Post, like, comment, connect, message, search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YouTube&lt;/td&gt;
&lt;td&gt;Upload video, like, comment, subscribe, search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reddit&lt;/td&gt;
&lt;td&gt;Upvote, comment, create post, search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Discord&lt;/td&gt;
&lt;td&gt;Join server, navigate channels, send messages, DM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Threads&lt;/td&gt;
&lt;td&gt;Like, reply, repost, create post, follow, search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n8n&lt;/td&gt;
&lt;td&gt;Create workflows, add nodes, execute, publish&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each playbook documents real selectors, error patterns, and workarounds discovered through live testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Technical Insights
&lt;/h2&gt;

&lt;p&gt;Building desktop automation taught us things docs don't cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reddit uses shadow DOM&lt;/strong&gt; — &lt;code&gt;shreddit-post.shadowRoot&lt;/code&gt; for action buttons&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X/Twitter needs JS dispatch&lt;/strong&gt; — &lt;code&gt;mousedown+mouseup+click&lt;/code&gt; for retweet menus&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn uses Quill editor&lt;/strong&gt; — &lt;code&gt;.ql-editor&lt;/code&gt; for posts and comments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YouTube uploads work via DataTransfer API&lt;/strong&gt; — no file picker needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discord message actions only appear on hover&lt;/strong&gt; — need mouseover dispatch&lt;/li&gt;
&lt;/ul&gt;
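
&lt;p&gt;Two of the patterns above, reaching into a shadow root and dispatching a full mouse event sequence, can be sketched in TypeScript roughly as follows. The helper names &lt;code&gt;queryInShadow&lt;/code&gt; and &lt;code&gt;dispatchClickSequence&lt;/code&gt; are hypothetical and not part of ScreenHand; the structural types are an assumption made so the sketch can also run outside a browser.&lt;/p&gt;

```typescript
// Hypothetical helpers illustrating two general DOM techniques.
// Structural types stand in for real DOM elements so this also runs in Node.

type ShadowHost = {
  shadowRoot: { querySelector(selector: string): unknown } | null;
};

type EventSink = {
  dispatchEvent(event: unknown): boolean;
};

// Look up an element inside a host's open shadow root.
// A document-level querySelector does not reach into shadow trees.
function queryInShadow(host: ShadowHost, selector: string): unknown {
  if (host.shadowRoot === null) {
    return null; // no shadow root, or it is closed
  }
  return host.shadowRoot.querySelector(selector);
}

// Order matters: some UIs listen for mousedown or mouseup rather than click.
const CLICK_SEQUENCE = ["mousedown", "mouseup", "click"];

// Dispatch the full sequence on a target. The caller supplies the event factory.
// In a browser that could be:
//   (type) => new MouseEvent(type, { bubbles: true, cancelable: true })
function dispatchClickSequence(
  target: EventSink,
  createEvent: (type: string) => unknown,
): string[] {
  const fired: string[] = [];
  for (const type of CLICK_SEQUENCE) {
    target.dispatchEvent(createEvent(type));
    fired.push(type);
  }
  return fired;
}
```

&lt;p&gt;Passing the event factory in, rather than constructing &lt;code&gt;MouseEvent&lt;/code&gt; inside the helper, is only a convenience for testing; in page context the two steps would normally be combined.&lt;/p&gt;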

&lt;h2&gt;
  
  
  Open Source
&lt;/h2&gt;

&lt;p&gt;ScreenHand is AGPL-3.0 licensed and available on GitHub and npm:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/manushi4/Screenhand" rel="noopener noreferrer"&gt;github.com/manushi4/Screenhand&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm&lt;/strong&gt;: &lt;code&gt;screenhand&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'd love contributions, especially new platform playbooks.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⭐ Star Us on GitHub
&lt;/h2&gt;

&lt;p&gt;If ScreenHand looks useful, please &lt;a href="https://github.com/manushi4/Screenhand" rel="noopener noreferrer"&gt;star us on GitHub&lt;/a&gt; — it helps others discover the project and motivates us to keep building.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/manushi4/Screenhand" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fgithub%2Fstars%2Fmanushi4%2FScreenhand%3Fstyle%3Dsocial" alt="GitHub stars" width="82" height="20"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://clazro.com" rel="noopener noreferrer"&gt;Clazro Technology&lt;/a&gt;. We believe AI agents should be able to do everything a human can on a computer.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
