I built a Windows computer-use MCP server in pure Go. One EXE. No Python. No Docker.
It's a single 27 MB executable that gives LLMs, through any MCP client (Claude, Gemini, Cursor, Kiro, OpenCode, Ollama, you name it), the ability to actually use a Windows desktop. Think of it as giving an LLM a mouse, keyboard, eyes, and long-term memory.
(Screenshots and demo clips coming soon — wanted to get the post up first.)
Over 14,000 lines of Go, zero dependencies on OpenCV, go-ole, or any COM binding library. Almost every subsystem was implemented from scratch in pure Go instead of wrapping existing libraries.
The fun part
I'm not a professional software engineer. I built this by pair-programming with multiple AI models across hundreds of iterations.
And I didn't spend a cent on API tokens. Gemini CLI (free tier), Claude Code (trial credits), Ollama (local, free), GitHub Copilot, OpenCode's Big Pickle (200K ctx, free). The ollama launch claude trick was a workhorse — point Claude Code at a Nemotron or MiniMax3 locally and get agentic scaffolding on budget hardware.
Why I built it
This started as "I want my AI to click a button." It somehow turned into a Windows automation framework with vision, memory, and a training pipeline.
But the real reason? A friend who's disabled and uses Narrator as their primary computer interface. They asked for a month to test once I was ready to go public. After countless trials and Python's "works on my machine" nonsense, I wanted something that actually ships.
What it does
120 MCP tools covering the full desktop stack:
- Mouse, keyboard, screenshot, OCR (native WinRT COM — 2–8x faster than PowerShell)
- Window management, browser automation, File Explorer control
- find_image / find_all_images with triple cascade: template matching → ONNX YOLO → OCR
- ocr_languages, middle mouse, horizontal scroll, fullscreen detection
- SQLite memory store: the AI remembers UI elements across sessions
- Training pipeline — every click saves screenshot+metadata for future model fine-tuning
- Adaptive engine that learns timing, success rates, and predicts next actions
- Input recorder with click-vs-drag disambiguation, replays as native MCP tools
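For a rough idea of what click-vs-drag disambiguation involves, here is a minimal sketch, not the project's actual recorder. It assumes the simplest possible rule: compare where the button went down with where it came up, and call it a drag if the pointer moved more than a few pixels. The `Press` type, the `Classify` function, and the 5-pixel threshold are all made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Press is a hypothetical record of one button press: where the button went
// down and where it came back up, in screen pixels.
type Press struct {
	DownX, DownY int
	UpX, UpY     int
}

// dragThresholdPx is an assumed tolerance, not a value taken from the project.
const dragThresholdPx = 5.0

// Classify returns "drag" when the pointer travelled farther than the
// threshold between button-down and button-up, and "click" otherwise.
func Classify(p Press) string {
	dist := math.Hypot(float64(p.UpX-p.DownX), float64(p.UpY-p.DownY))
	if dist > dragThresholdPx {
		return "drag"
	}
	return "click"
}

func main() {
	// A couple of pixels of hand jitter versus a deliberate 60-pixel move.
	fmt.Println(Classify(Press{DownX: 100, DownY: 100, UpX: 102, UpY: 101}))
	fmt.Println(Classify(Press{DownX: 100, DownY: 100, UpX: 160, UpY: 100}))
}
```

A real recorder would probably read the tolerance from the OS instead of hard-coding it; Windows exposes its own drag tolerance through GetSystemMetrics with SM_CXDRAG and SM_CYDRAG.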
Battle-tested with: Claude Desktop, Claude Code, Kiro, Cursor, Windsurf, Gemini CLI, OpenCode, Ollama, Antigravity IDE, Cline, Android Studio, Zed, Obsidian, and more.
Under the hood (briefly)
- Raw COM/WinRT vtable dispatch: no go-ole, no CGO, no C++/WinRT. 36 COM calls through syscall.SyscallN(vtblMethod(obj, N), ...) with indices hand-verified against Windows SDK headers.
- Hand-written NCC template matcher: brute-force O(n⁴) Pearson correlation in pure Go, no OpenCV. Cascades to ONNX YOLO, then WinRT OCR.
- SQLite Bayesian priors — per-class frequency distributions and spatial z-scores computed entirely in SQL. ONNX confidence adjusts based on where elements usually appear per window. No ML framework needed.
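To make the NCC bullet concrete, here is a minimal sketch of brute-force template matching with Pearson correlation in plain Go. It is an illustration of the technique, not the code from the repo: the `Gray` type and `MatchNCC` function are hypothetical names, and it assumes 8-bit grayscale input with no pyramid, masking, or early-exit tricks.

```go
package main

import (
	"fmt"
	"math"
)

// Gray is a hypothetical row-major 8-bit grayscale image used only in this sketch.
type Gray struct {
	W, H int
	Pix  []uint8
}

// at returns the pixel at (x, y) as a float64.
func (g Gray) at(x, y int) float64 { return float64(g.Pix[y*g.W+x]) }

// MatchNCC slides tmpl over every position of img and returns the top-left
// offset with the highest Pearson correlation, plus that score in [-1, 1].
// If no window can be scored, it returns (0, 0, -1).
func MatchNCC(img, tmpl Gray) (bestX, bestY int, best float64) {
	best = -1
	n := float64(tmpl.W * tmpl.H)

	// Template mean and sum of squared deviations are computed once.
	var tMean, tVar float64
	for _, p := range tmpl.Pix {
		tMean += float64(p)
	}
	tMean /= n
	for _, p := range tmpl.Pix {
		d := float64(p) - tMean
		tVar += d * d
	}

	for y := 0; y+tmpl.H <= img.H; y++ {
		for x := 0; x+tmpl.W <= img.W; x++ {
			// Mean of the image window under the template.
			var wMean float64
			for j := 0; j < tmpl.H; j++ {
				for i := 0; i < tmpl.W; i++ {
					wMean += img.at(x+i, y+j)
				}
			}
			wMean /= n

			// Covariance with the template and the window's squared deviations.
			var cov, wVar float64
			for j := 0; j < tmpl.H; j++ {
				for i := 0; i < tmpl.W; i++ {
					dw := img.at(x+i, y+j) - wMean
					cov += dw * (tmpl.at(i, j) - tMean)
					wVar += dw * dw
				}
			}

			denom := math.Sqrt(wVar * tVar)
			if denom == 0 {
				continue // flat window or flat template: correlation is undefined
			}
			if r := cov / denom; r > best {
				best, bestX, bestY = r, x, y
			}
		}
	}
	return bestX, bestY, best
}

func main() {
	// 6x5 image of uniform gray with a 2x2 checker pattern placed at (3, 2).
	img := Gray{W: 6, H: 5, Pix: make([]uint8, 30)}
	for i := range img.Pix {
		img.Pix[i] = 10
	}
	img.Pix[2*6+3], img.Pix[2*6+4] = 0, 255
	img.Pix[3*6+3], img.Pix[3*6+4] = 255, 0

	tmpl := Gray{W: 2, H: 2, Pix: []uint8{0, 255, 255, 0}}
	x, y, r := MatchNCC(img, tmpl)
	fmt.Printf("match at (%d,%d) score=%.3f\n", x, y, r)
}
```

The four nested loops (image width × image height × template width × template height) are where the O(n⁴) comes from. Because the score is mean-subtracted and normalized, it is insensitive to uniform brightness and contrast shifts, which is presumably why a low score is a reasonable trigger for falling through to the next stage of the cascade.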
Why should you care?
If you're building AI agents, this gives them hands.
If you're building desktop automation, this gives you 120 reusable tools.
If you're learning Windows internals, it shows how raw WinRT COM, OCR, ONNX, and UI automation fit together in a real project.
If you're just curious how far one person can get with modern AI tools, this is my answer.
Security
Yes, it can control your computer. So can AutoHotkey, PowerShell, Selenium, and every RPA tool. This is local-first: every action can be logged, and privacy controls can be toggled at runtime. Not spyware. Not a remote admin tool. Just an automation engine for your own machine.
Things I'm weirdly proud of
- Pure Go, raw COM/WinRT, zero CGO — no bindings, no wrappers
- Hand-written NCC — no OpenCV dependency
- SQLite Bayesian priors — no ML framework for the learning layer
- One 27 MB EXE — no Python, Docker, or Electron
- 120 MCP tools — every OS automation primitive you'd want
- 36 COM vtbl call sites, 51 WinRT IIDs — all annotated, tested, cross-referenced
- Built entirely on free AI tokens — you don't need a budget to build something real
- My GTX 1070 8GB handles YOLO inference fine — you don't need a $3,000 GPU either
I'd love feedback
Especially from people into Go, MCP, local AI, computer vision, automation, or Windows internals. Open issues, suggest features, steal patterns — the repo has templates and a security policy now. Tell me what I broke, what to build next, or that I'm insane for hand-writing COM vtables in Go.
AI didn't build this project. AI became my pair programmer.
The architecture, direction, debugging, testing, and endless "why doesn't Windows do what the docs say?" moments were still mine.
This project convinced me that one curious computer technician, armed with persistence and today's AI tools, can build things I wouldn't have been able to create on my own just a few years ago.
Links
GitHub: https://github.com/coff33ninja/go-mcp-computer-use
Docs: docs/reference/tools.md, docs/security.md, docs/mcp-client-configs.md