What is ScreenHand?
ScreenHand is an open-source MCP (Model Context Protocol) server that gives AI agents native desktop control on macOS and Windows. Think of it as giving Claude, Cursor, or any MCP-compatible AI the ability to see your screen and interact with it — clicking buttons, typing text, navigating apps, and automating browser workflows.
Why We Built It
AI agents are powerful reasoners, but they're blind. They can write code but can't click a button. They can draft an email but can't send it. ScreenHand bridges this gap with 82 MCP tools spanning:
- Desktop automation — click, type, scroll, drag, OCR, accessibility tree
- Browser control via CDP — navigate, fill forms, click with anti-detection, execute JS
- Anti-detection — human-like typing delays, stealth mode, realistic mouse movements
- Memory system — persistent learning from errors and patterns across sessions
- Job system — multi-step persistent jobs with worker daemon
- Playbooks — reusable automation sequences for Instagram, X/Twitter, LinkedIn, YouTube, Reddit, Discord, and more
Architecture
MCP Client (Claude, Cursor, etc.)
    |  stdio (Model Context Protocol)
mcp-desktop.ts (82 tools, Zod validation)
    |
    +-- Native Bridge (Swift on macOS, C# on Windows)
    |     JSON-RPC over stdio, accessibility APIs
    |
    +-- CDP Chrome Adapter
    |     Chrome DevTools Protocol for browser automation
    |
    +-- Session Supervisor: lease management, recovery
    +-- Job System: persistent multi-step workflows
    +-- Playbook Engine: battle-tested automation recipes
    +-- Memory Service: learns from errors across sessions
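To make the top edge of this diagram concrete, here is a minimal sketch of what travels over stdio between an MCP client and a server. The `tools/call` method and the one-JSON-object-per-line framing come from the MCP specification; the tool name `click` and its `x`/`y` arguments are hypothetical placeholders, not ScreenHand's documented tool schema.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Build a tools/call request with a fresh id so the response can be matched to it.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Serialize one message for the stdio transport: a single line of JSON plus "\n".
// JSON.stringify without an indent argument never emits raw newlines.
function frame(message: object): string {
  return JSON.stringify(message) + "\n";
}

// Split buffered stdout text into complete messages, keeping any trailing
// partial line in `rest` until more data arrives.
function parseFrames(buffer: string): { messages: unknown[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? "";
  const messages = lines.filter((line) => line.trim() !== "").map((line) => JSON.parse(line));
  return { messages, rest };
}

// Hypothetical tool name and arguments, for illustration only.
const request = buildToolCall("click", { x: 120, y: 340 });
const wire = frame(request);
```

A real client would write `wire` to the server's stdin and feed stdout chunks through `parseFrames`, carrying `rest` over to the next chunk.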
Quick Start
npm install -g screenhand
Add to your MCP client config:
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["screenhand"]
    }
  }
}
That's it. Your AI agent now has eyes and hands.
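If your client config already lists other servers, the entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that; the `McpClientConfig` type and the `other` server are assumptions made for illustration, and only the `screenhand` entry itself comes from the config above.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

// Assumed minimal shape of a client config; real clients may store extra keys,
// which the index signature lets us carry through untouched.
interface McpClientConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Return a new config with the screenhand entry added, leaving every
// existing server and top-level key as it was. The input is not mutated.
function addScreenhand(config: McpClientConfig): McpClientConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      screenhand: { command: "npx", args: ["screenhand"] },
    },
  };
}

// Hypothetical existing config with one unrelated server.
const existing: McpClientConfig = {
  mcpServers: { other: { command: "node", args: ["other-server.js"] } },
};
const merged = addScreenhand(existing);
```

Writing `JSON.stringify(merged, null, 2)` back to the client's config file would complete the step; where that file lives depends on the client.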
Battle-Tested Playbooks
We've built and tested playbooks for 8 platforms:
| Platform | Actions Tested |
|---|---|
| Instagram | Like, comment, save, DM, create post, follow, search |
| X/Twitter | Like, reply, retweet, bookmark, create post, DM, follow |
| LinkedIn | Post, like, comment, connect, message, search |
| YouTube | Upload video, like, comment, subscribe, search |
| Reddit | Upvote, comment, create post, search |
| Discord | Join server, navigate channels, send messages, DM |
| Threads | Like, reply, repost, create post, follow, search |
| n8n | Create workflows, add nodes, execute, publish |
Each playbook documents real selectors, error patterns, and workarounds discovered through live testing.
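As a rough illustration of what "selectors, error patterns, and workarounds" could look like as data, here is a hypothetical playbook shape. None of these type names, fields, or error strings are taken from ScreenHand's actual playbook format; the only detail drawn from this post is that Reddit's action buttons sit inside the `shreddit-post` shadow root.

```typescript
// Hypothetical types: a sketch of one possible playbook layout, not ScreenHand's schema.
interface PlaybookStep {
  action: string;
  selector: string;
  note?: string;
}

interface KnownError {
  pattern: RegExp; // matched against an error message
  workaround: string; // what to try when the pattern matches
}

interface Playbook {
  platform: string;
  steps: PlaybookStep[];
  knownErrors: KnownError[];
}

const redditUpvote: Playbook = {
  platform: "Reddit",
  steps: [
    { action: "click", selector: "shreddit-post", note: "action buttons live in the shadow root" },
  ],
  knownErrors: [
    { pattern: /element not found/i, workaround: "query inside shreddit-post.shadowRoot" },
  ],
};

// Look up the first documented workaround whose pattern matches the error message.
function findWorkaround(playbook: Playbook, errorMessage: string): string | undefined {
  return playbook.knownErrors.find((known) => known.pattern.test(errorMessage))?.workaround;
}
```

Keeping error patterns next to the steps means a failed run can be mapped to a known fix without re-discovering it.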
Key Technical Insights
Building desktop automation taught us things docs don't cover:
- Reddit uses shadow DOM: `shreddit-post.shadowRoot` for action buttons
- X/Twitter needs JS dispatch: `mousedown` + `mouseup` + `click` for retweet menus
- LinkedIn uses Quill editor: `.ql-editor` for posts and comments
- YouTube uploads work via DataTransfer API: no file picker needed
- Discord message actions only appear on hover: need `mouseover` dispatch
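The shadow DOM, event-sequence, and hover points above can be sketched as small builders that produce in-page JavaScript and wrap it in a CDP `Runtime.evaluate` command. This is an illustrative sketch, not ScreenHand's implementation: the helper names and the inner selectors (`button[upvote]`, `[data-testid="retweet"]`) are assumptions, and `shadowRoot` is only reachable this way when the shadow root is open.

```typescript
// In-page expression that selects an element from the main document.
function bySelector(selector: string): string {
  return `document.querySelector(${JSON.stringify(selector)})`;
}

// In-page expression that selects an element inside a host's (open) shadow root.
function shadowQueryExpr(hostSelector: string, innerSelector: string): string {
  return `${bySelector(hostSelector)}?.shadowRoot?.querySelector(${JSON.stringify(innerSelector)})`;
}

// In-page script that dispatches the given mouse events, in order, on the
// element produced by targetExpr. Evaluates to false if the element is missing.
function dispatchScript(targetExpr: string, events: string[]): string {
  return `(() => {
  const el = ${targetExpr};
  if (!el) return false;
  for (const type of ${JSON.stringify(events)}) {
    el.dispatchEvent(new MouseEvent(type, { bubbles: true, cancelable: true, view: window }));
  }
  return true;
})()`;
}

// Wrap a script in a Chrome DevTools Protocol Runtime.evaluate command.
function evaluateCommand(id: number, expression: string) {
  return { id, method: "Runtime.evaluate", params: { expression, returnByValue: true } };
}

// Hypothetical selectors, for illustration only.
const retweet = dispatchScript(bySelector('[data-testid="retweet"]'), ["mousedown", "mouseup", "click"]);
const redditButton = dispatchScript(shadowQueryExpr("shreddit-post", "button[upvote]"), ["click"]);
const discordHover = dispatchScript(bySelector('[id^="chat-messages-"]'), ["mouseover"]);
const command = evaluateCommand(1, retweet);
```

Sending `command` as JSON over the browser's DevTools WebSocket runs the script in the page. Building the script as a string keeps selectors safely quoted through `JSON.stringify` instead of ad hoc concatenation.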
Open Source
ScreenHand is AGPL-3.0 licensed and available on GitHub:
- GitHub: github.com/manushi4/Screenhand
- npm: `screenhand`
Star us if you find it useful! We'd love contributions — especially new platform playbooks.
Built by Clazro Technology. We believe AI agents should be able to do everything a human can on a computer.