ScreenHand: Give AI Agents Eyes and Hands on Your Desktop (Open Source MCP Server)

What is ScreenHand?

ScreenHand is an open-source MCP (Model Context Protocol) server that gives AI agents native desktop control on macOS and Windows. Think of it as giving Claude, Cursor, or any MCP-compatible AI the ability to see your screen and interact with it — clicking buttons, typing text, navigating apps, and automating browser workflows.

Why We Built It

AI agents are powerful reasoners, but they're blind. They can write code but can't click a button. They can draft an email but can't send it. ScreenHand bridges this gap with 82 MCP tools spanning:

  • Desktop automation — click, type, scroll, drag, OCR, accessibility tree
  • Browser control via CDP — navigate, fill forms, click with anti-detection, execute JS
  • Anti-detection — human-like typing delays, stealth mode, realistic mouse movements
  • Memory system — persistent learning from errors and patterns across sessions
  • Job system — multi-step persistent jobs with worker daemon
  • Playbooks — reusable automation sequences for Instagram, X/Twitter, LinkedIn, YouTube, Reddit, Discord, and more
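Under MCP, each of these surfaces as a tool that the client invokes with a `tools/call` request over stdio. A minimal TypeScript sketch of what such a request looks like (the tool name `desktop_click` and its arguments are illustrative, not ScreenHand's actual schema):

```typescript
// Shape of an MCP tools/call request as sent over stdio.
// The tool name "desktop_click" and its arguments below are hypothetical
// examples, not ScreenHand's documented tool schema.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function makeToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const req = makeToolCall(1, "desktop_click", { x: 640, y: 360, button: "left" });
console.log(JSON.stringify(req));
```

The client never calls platform APIs directly; it only ever speaks this one protocol, and the server decides whether the request is served by the native bridge, the CDP adapter, or a playbook.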

Architecture

```
MCP Client (Claude, Cursor, etc.)
  | stdio (Model Context Protocol)
mcp-desktop.ts — 82 tools, Zod validation
  |
  +-- Native Bridge (Swift on macOS, C# on Windows)
  |     JSON-RPC over stdio, accessibility APIs
  |
  +-- CDP Chrome Adapter
  |     Chrome DevTools Protocol for browser automation
  |
  +-- Session Supervisor — lease management, recovery
  +-- Job System — persistent multi-step workflows
  +-- Playbook Engine — battle-tested automation recipes
  +-- Memory Service — learns from errors across sessions
```
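The native bridge runs as a child process and exchanges one JSON message per line over stdio. A minimal TypeScript sketch of that framing (the `ax.click` method and its params are hypothetical placeholders, not ScreenHand's actual bridge API):

```typescript
// Line-delimited JSON-RPC framing between mcp-desktop.ts and the native
// bridge. The method "ax.click" and its params are illustrative only.
type RpcRequest = { jsonrpc: "2.0"; id: number; method: string; params: object };
type RpcResponse = {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
};

function encodeRequest(id: number, method: string, params: object): string {
  const req: RpcRequest = { jsonrpc: "2.0", id, method, params };
  return JSON.stringify(req) + "\n"; // one request per line on the child's stdin
}

function decodeResponse(line: string): RpcResponse {
  return JSON.parse(line) as RpcResponse; // one response per line on its stdout
}

const wire = encodeRequest(7, "ax.click", { elementId: "btn-send" });
const reply = decodeResponse('{"jsonrpc":"2.0","id":7,"result":{"clicked":true}}');
```

Line framing keeps the bridge trivially parseable from Swift or C# without any length-prefix bookkeeping, at the cost of forbidding raw newlines inside messages (JSON string escaping handles that).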

Quick Start

```shell
npm install -g screenhand
```

Add to your MCP client config:

```json
{
  "mcpServers": {
    "screenhand": {
      "command": "npx",
      "args": ["screenhand"]
    }
  }
}
```

That's it. Your AI agent now has eyes and hands.

Battle-Tested Playbooks

We've built and tested playbooks for 8 platforms:

| Platform | Actions Tested |
| --- | --- |
| Instagram | Like, comment, save, DM, create post, follow, search |
| X/Twitter | Like, reply, retweet, bookmark, create post, DM, follow |
| LinkedIn | Post, like, comment, connect, message, search |
| YouTube | Upload video, like, comment, subscribe, search |
| Reddit | Upvote, comment, create post, search |
| Discord | Join server, navigate channels, send messages, DM |
| Threads | Like, reply, repost, create post, follow, search |
| n8n | Create workflows, add nodes, execute, publish |

Each playbook documents real selectors, error patterns, and workarounds discovered through live testing.
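Conceptually, a playbook is an ordered list of steps, each pairing an action with a selector and an optional fallback. A hypothetical TypeScript sketch of such a shape (the field names and the Reddit selectors are illustrative, not ScreenHand's actual playbook format):

```typescript
// Hypothetical playbook shape: an ordered list of steps, each with a
// selector discovered through live testing and an optional fallback.
// Field names and selectors here are illustrative only.
interface PlaybookStep {
  action: "click" | "type" | "waitFor";
  selector: string;        // CSS selector for the target element
  text?: string;           // payload for "type" steps
  fallback?: PlaybookStep; // tried if the primary selector fails
}

interface Playbook {
  platform: string;
  name: string;
  steps: PlaybookStep[];
}

const upvote: Playbook = {
  platform: "reddit",
  name: "upvote",
  steps: [
    { action: "waitFor", selector: "shreddit-post" },
    { action: "click", selector: 'button[aria-label="Upvote"]' },
  ],
};
```

Encoding error patterns and fallbacks as data rather than code is what lets a playbook survive the small selector churn these platforms ship every few weeks.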

Key Technical Insights

Building desktop automation taught us things docs don't cover:

  • Reddit uses shadow DOM: action buttons live under `shreddit-post.shadowRoot`
  • X/Twitter needs a JS dispatch of `mousedown` + `mouseup` + `click` to open retweet menus
  • LinkedIn uses the Quill editor: target `.ql-editor` for posts and comments
  • YouTube uploads work via the `DataTransfer` API — no file picker needed
  • Discord message actions only appear on hover — need a `mouseover` dispatch first
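The X/Twitter workaround above, for instance, boils down to evaluating a small script in the page through CDP's `Runtime.evaluate`. A sketch of a helper that builds that script (the selector in the usage example is an illustrative placeholder, not a tested X/Twitter selector):

```typescript
// Builds a JS expression that dispatches mousedown + mouseup + click on an
// element, suitable for passing to CDP Runtime.evaluate. The selector used
// in the example below is an illustrative placeholder.
function buildDispatchScript(selector: string): string {
  return `
    (() => {
      const el = document.querySelector(${JSON.stringify(selector)});
      if (!el) return false;
      for (const type of ["mousedown", "mouseup", "click"]) {
        el.dispatchEvent(new MouseEvent(type, { bubbles: true, cancelable: true }));
      }
      return true;
    })()`;
}

const script = buildDispatchScript('[data-testid="retweet"]');
```

Dispatching the full `mousedown`/`mouseup`/`click` sequence matters because some widgets listen on the press events rather than the synthesized `click`, so a bare `el.click()` silently does nothing.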

Open Source

ScreenHand is AGPL-3.0 licensed and available on GitHub.

Star us if you find it useful! We'd love contributions — especially new platform playbooks.



Built by Clazro Technology. We believe AI agents should be able to do everything a human can on a computer.
