<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: DJ Kruger</title>
    <description>The latest articles on DEV Community by DJ Kruger (@coff33ninja).</description>
    <link>https://dev.to/coff33ninja</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2270718%2Fa4cd594c-baf2-4e25-90c2-4d237b24adda.jpg</url>
      <title>DEV Community: DJ Kruger</title>
      <link>https://dev.to/coff33ninja</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/coff33ninja"/>
    <language>en</language>
    <item>
      <title>One EXE. No Python. No Docker. 120 Windows automation tools written in Go.</title>
      <dc:creator>DJ Kruger</dc:creator>
      <pubDate>Tue, 30 Jun 2026 22:30:40 +0000</pubDate>
      <link>https://dev.to/coff33ninja/one-exe-no-python-no-docker-120-windows-automation-tools-written-in-go-i4m</link>
      <guid>https://dev.to/coff33ninja/one-exe-no-python-no-docker-120-windows-automation-tools-written-in-go-i4m</guid>
      <description>&lt;p&gt;I built a Windows computer-use MCP server in pure Go. One EXE. No Python. No Docker.&lt;/p&gt;

&lt;p&gt;It's a single &lt;strong&gt;27 MB executable&lt;/strong&gt; that gives LLMs and MCP clients (Claude, Gemini, Cursor, Kiro, OpenCode, Ollama, you name it) the ability to actually &lt;em&gt;use&lt;/em&gt; a Windows desktop. Think of it as giving an LLM a mouse, keyboard, eyes, and long-term memory.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Screenshots and demo clips coming soon — wanted to get the post up first.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Over 14,000 lines of Go, zero dependencies on OpenCV, go-ole, or any COM binding library. Almost every subsystem was implemented from scratch in pure Go instead of wrapping existing libraries.&lt;/p&gt;




&lt;h2&gt;The fun part&lt;/h2&gt;

&lt;p&gt;I'm not a professional software engineer. I built this by pair-programming with multiple AI models across hundreds of iterations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;And I didn't spend a cent on API tokens.&lt;/strong&gt; Gemini CLI (free tier), Claude Code (trial credits), Ollama (local, free), GitHub Copilot, OpenCode's Big Pickle (200K ctx, free). The &lt;code&gt;ollama launch claude&lt;/code&gt; trick was a workhorse — point Claude Code at a Nemotron or MiniMax3 locally and get agentic scaffolding on budget hardware.&lt;/p&gt;




&lt;h2&gt;Why I built it&lt;/h2&gt;

&lt;p&gt;This started as "I want my AI to click a button." It somehow turned into a Windows automation framework with vision, memory, and a training pipeline.&lt;/p&gt;

&lt;p&gt;But the real reason? A friend who's disabled and uses Narrator as their primary computer interface. They asked for a month to test it once I was ready to go public. After countless trials and Python's "works on my machine" nonsense, I wanted something that actually ships.&lt;/p&gt;




&lt;h2&gt;What it does&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;120 MCP tools&lt;/strong&gt; covering the full desktop stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mouse, keyboard, screenshot, OCR (native WinRT COM — 2–8x faster than PowerShell)&lt;/li&gt;
&lt;li&gt;Window management, browser automation, File Explorer control&lt;/li&gt;
&lt;li&gt;&lt;code&gt;find_image&lt;/code&gt;/&lt;code&gt;find_all_images&lt;/code&gt; with a three-stage cascade: template matching → ONNX YOLO → OCR&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ocr_languages&lt;/code&gt;, middle mouse, horizontal scroll, fullscreen detection&lt;/li&gt;
&lt;li&gt;SQLite memory store — AI remembers UI elements across sessions&lt;/li&gt;
&lt;li&gt;Training pipeline — every click saves screenshot+metadata for future model fine-tuning&lt;/li&gt;
&lt;li&gt;Adaptive engine that learns timing, success rates, and predicts next actions&lt;/li&gt;
&lt;li&gt;Input recorder with click-vs-drag disambiguation, replays as native MCP tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Battle-tested with:&lt;/strong&gt; Claude Desktop, Claude Code, Kiro, Cursor, Windsurf, Gemini CLI, OpenCode, Ollama, Antigravity IDE, Cline, Android Studio, Zed, Obsidian, and more.&lt;/p&gt;




&lt;h2&gt;Under the hood (briefly)&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Raw COM/WinRT vtable dispatch&lt;/strong&gt; — no go-ole, no CGO, no C++/WinRT. 36 COM calls through &lt;code&gt;syscall.SyscallN(vtblMethod(obj, N), ...)&lt;/code&gt; with indices hand-verified against Windows SDK headers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hand-written NCC template matcher&lt;/strong&gt; — brute-force O(n⁴) Pearson correlation in pure Go, no OpenCV. Cascades to ONNX YOLO, then WinRT OCR.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQLite Bayesian priors&lt;/strong&gt; — per-class frequency distributions and spatial z-scores computed entirely in SQL. ONNX confidence adjusts based on where elements usually appear per window. No ML framework needed.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Why should you care?&lt;/h2&gt;

&lt;p&gt;If you're building AI agents, this gives them hands.&lt;/p&gt;

&lt;p&gt;If you're building desktop automation, this gives you 120 reusable tools.&lt;/p&gt;

&lt;p&gt;If you're learning Windows internals, it shows how raw WinRT COM, OCR, ONNX, and UI automation fit together in a real project.&lt;/p&gt;

&lt;p&gt;If you're just curious how far one person can get with modern AI tools, this is my answer.&lt;/p&gt;




&lt;h2&gt;Security&lt;/h2&gt;

&lt;p&gt;Yes, it can control your computer. So can AutoHotkey, PowerShell, Selenium, and every RPA tool. It is &lt;strong&gt;local-first&lt;/strong&gt;, every action can be logged, and privacy controls can be toggled at runtime. Not spyware. Not a remote admin tool. Just an automation engine for your own machine.&lt;/p&gt;




&lt;h2&gt;Things I'm weirdly proud of&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pure Go, raw COM/WinRT, zero CGO — no bindings, no wrappers&lt;/li&gt;
&lt;li&gt;Hand-written NCC — no OpenCV dependency&lt;/li&gt;
&lt;li&gt;SQLite Bayesian priors — no ML framework for the learning layer&lt;/li&gt;
&lt;li&gt;One 27 MB EXE — no Python, Docker, or Electron&lt;/li&gt;
&lt;li&gt;120 MCP tools — every OS automation primitive you'd want&lt;/li&gt;
&lt;li&gt;36 COM vtbl call sites, 51 WinRT IIDs — all annotated, tested, cross-referenced&lt;/li&gt;
&lt;li&gt;Built entirely on free AI tokens — you don't need a budget to build something real&lt;/li&gt;
&lt;li&gt;My GTX 1070 8GB handles YOLO inference fine — you don't need a $3,000 GPU either&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;I'd love feedback&lt;/h2&gt;

&lt;p&gt;Especially from people into Go, MCP, local AI, computer vision, automation, or Windows internals. Open issues, suggest features, steal patterns — the repo has templates and a security policy now. Tell me what I broke, what to build next, or that I'm insane for hand-writing COM vtables in Go.&lt;/p&gt;




&lt;p&gt;AI didn't build this project. AI became my pair programmer.&lt;/p&gt;

&lt;p&gt;The architecture, direction, debugging, testing, and endless "why doesn't Windows do what the docs say?" moments were still mine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It convinced me that one curious computer technician, with persistence and today's AI tools, can build things I couldn't have created on my own just a few years ago.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/coff33ninja/go-mcp-computer-use" rel="noopener noreferrer"&gt;https://github.com/coff33ninja/go-mcp-computer-use&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Docs: &lt;code&gt;docs/reference/tools.md&lt;/code&gt;, &lt;code&gt;docs/security.md&lt;/code&gt;, &lt;code&gt;docs/mcp-client-configs.md&lt;/code&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>opensource</category>
      <category>automation</category>
      <category>go</category>
    </item>
  </channel>
</rss>
