DEV Community

martin

One Program to Slave Them All - or How to Control Every Existing Program with Agents

DirectShell 0.3.1 — Control Everything.


The So-Called "State of the Art" in 2026

It's fascinating, really.

One camp is out there hyping Moltbot while unknowingly leaking secrets — watching AIs talk to other AIs in circles and genuinely believing something is "emerging." Spoiler: they're just more puppets on human strings. The other camp is hyping whatever frontier model dropped this week, completely blind to the fact that we're hitting massive bottlenecks and the rate of improvement is shrinking with every release.


And What About Google, OpenAI, and Anthropic?

They keep trying to brute-force marginal progress. GG.

Best example? AI-powered browsers. There you sit, in the year 2026, watching an agent struggle for 25 minutes trying to operate a browser using images — guessing where to click. Let that absurdity sink in for a moment.

A text-based LLM takes screenshots. Those screenshots get converted to base64. That base64 gets sent to another AI which translates it back into text. Then the AI gets to guess which coordinates to click. And then — brace yourselves — a SCRIPT runs. A script, people. Like it's the year 2000. It manually shoves your mouse cursor to a position and clicks. Or it injects something into the browser DOM that immediately gets detected. Can't solve CAPTCHAs. And complex tasks? Let's not even go there.

State of the fucking art, gentlemen.


DirectShell — And Why It Starts a Paradigm Shift

My personal motivation wasn't to build something cool or develop some epic new primitive. It was more like: "Dude... this is just painful at this point. There HAS to be a better way."

So that's exactly what I built. I made it better.

I created a software primitive, a new foundational technology, that uses multiple data channels to control any program or browser, whether through an agent or through scripts. This tool can read, control, and operate virtually any program, no matter how old. It doesn't need an API. It doesn't need permission. It doesn't violate any TOS or EULA. It simply uses what has been there all along, and what nobody bothered to look at.


Real Talk

DirectShell gives every program a usable SQL database and a universal AI interface — in milliseconds. It gives any AI that can use CLI or MCP the ability to control any program on your machine. It replaces proprietary API wrappers with one universal interface. As an AI browser, it uses significantly fewer tokens, takes zero screenshots, and is dramatically faster.
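To make the "SQL database per program" idea concrete, here is a minimal sketch of what querying a window's controls could look like. Everything in it is an assumption for illustration: the `elements` table, its columns, and the sample rows are hypothetical and are not DirectShell's actual schema. It only shows why a plain query beats guessing click coordinates from a screenshot.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a HYPOTHETICAL UI-element table.

    The table name and columns are assumptions made for this sketch,
    not the schema DirectShell really writes.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE elements ("
        " id INTEGER PRIMARY KEY,"
        " role TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " enabled INTEGER NOT NULL)"
    )
    # Made-up sample rows standing in for the controls of some window.
    conn.executemany(
        "INSERT INTO elements (role, name, enabled) VALUES (?, ?, ?)",
        [
            ("button", "Save", 1),
            ("button", "Print", 0),
            ("edit", "File name", 1),
            ("menuitem", "Exit", 1),
        ],
    )
    conn.commit()
    return conn


def clickable_buttons(conn):
    """Return the names of all enabled buttons, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM elements WHERE role = ? AND enabled = 1 ORDER BY name",
        ("button",),
    ).fetchall()
    return [name for (name,) in rows]


conn = build_demo_db()
print(clickable_buttons(conn))
```

An agent working this way gets a short list of named controls back as text, so it never has to look at pixels to find the "Save" button.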

It can solve CAPTCHAs. It can talk to other AI programs like Claude Desktop. Or it can just operate your Paint, your antivirus, or your Notepad.

It's the end of slow, browser-only agents — and the beginning of something new: the ability to give every GUI native AI support.


And Now?

I have absolutely no fucking clue.

Several people have already reached out wanting to contribute. And that's fantastic. DirectShell is only a few days old. There are still 100 bugs, but 100x more potential to discover. We're building a reinforcement learning loop, working on lower latency, and creating config files for all kinds of programs.

But this is just the beginning.


Let's change something with this. I invite everyone to share it. To help with development. Or to simply give feedback.

The current demo video is here: https://youtu.be/rHfVj1KpCDU

The repo is here: https://github.com/IamLumae/DirectShell

And the full technical article is here: https://dev.to/tlrag/i-built-a-new-software-primitive-in-85-hours-it-replaces-the-eyes-of-every-ai-agent-on-earth-55ia
