Everyone loves a 10-minute demo of an AI agent building a snake game. But try to build a production-ready full-stack app, and the magic dies pretty quickly.
I spent the last few months trying to scale my side projects using various AI coding tools and agent swarms. The pattern was always the same: everything is great until day three. Then the context gets too large. The agent forgets the original product spec and tries to rewrite your database schema when you just asked it to fix a CSS button.
Most frameworks try to fix this by having multiple agents chat with each other over APIs (like a virtual software company). But debugging an API-driven agent conversation is a nightmare. And if the Python process crashes, the agents lose all their in-memory state.
I got fed up and decided to go back to the 1970s.
The Unix Philosophy applied to LLMs
I built an open-source framework called harness-all. Instead of a massive monolithic orchestrator, I applied the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.
I physically split my workflow into 5 isolated directories on my hard drive: PM, Design, Dev, Growth, and Ops.
They do not talk over a network. They don't use vector databases. They communicate entirely via a "sneakernet" of Markdown files.
Markdown is the API
Because the agents are decoupled, there is no heavy orchestration layer. The PM agent researches the market, writes a rigid PRD, and dumps it into a docs/handoff/pm-to-solo.md file.
I take two minutes to review that file (human-in-the-loop). If it looks good, the Dev agent picks it up and starts coding. Markdown is literally the API.
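To make the review gate concrete, here is a minimal sketch of a file-based handoff. It is not taken from harness-all: the `<!-- approved -->` marker and the function names are hypothetical, and only the `docs/handoff/pm-to-solo.md` path comes from the workflow above.

```python
from pathlib import Path
from typing import Optional

HANDOFF = Path("docs/handoff/pm-to-solo.md")  # path mentioned in the post
APPROVAL_MARKER = "<!-- approved -->"         # hypothetical convention


def write_handoff(path: Path, prd_markdown: str) -> None:
    """Upstream agent (e.g. PM) drops its output as a Markdown file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(prd_markdown, encoding="utf-8")


def approve(path: Path) -> None:
    """Human reviewer signs off by appending the marker to the file."""
    text = path.read_text(encoding="utf-8")
    if APPROVAL_MARKER not in text:
        path.write_text(
            text.rstrip("\n") + "\n\n" + APPROVAL_MARKER + "\n",
            encoding="utf-8",
        )


def read_approved_handoff(path: Path) -> Optional[str]:
    """Downstream agent (e.g. Dev) only sees the spec once it is approved."""
    if not path.exists():
        return None
    text = path.read_text(encoding="utf-8")
    if APPROVAL_MARKER not in text:
        return None
    return text.replace(APPROVAL_MARKER, "").strip()
```

In this sketch the Dev side would call `read_approved_handoff(HANDOFF)` and do nothing while it returns `None`. The whole "protocol" is a file you can open in any editor, which is the point.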
You can use the PM agent standalone to just generate specs. Or you can chain them all together with a simple bash script to fully automate building a feature from idea to deployed code.
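The chaining idea can be sketched in a few lines. I mention a bash script above; this version is in Python purely for illustration, and the stage functions, file names, and `run_chain` itself are hypothetical stand-ins rather than anything from the repo. Each stage reads the previous Markdown file from disk and writes its own, so every intermediate result stays inspectable.

```python
from pathlib import Path
from typing import Callable

# A stage takes the previous handoff text and returns the next one.
# Real stages would call an agent; here they are plain functions.
Stage = Callable[[str], str]


def run_chain(stages: list[tuple[str, Stage]], idea: str, handoff_dir: Path) -> Path:
    """Run stages in order, passing work along only through Markdown files."""
    if not stages:
        raise ValueError("at least one stage is required")
    handoff_dir.mkdir(parents=True, exist_ok=True)

    # The initial idea is itself just another file in the chain.
    current = handoff_dir / "00-idea.md"
    current.write_text(idea, encoding="utf-8")

    for index, (name, stage) in enumerate(stages, start=1):
        # Read from disk instead of keeping text in memory between stages.
        output = stage(current.read_text(encoding="utf-8"))
        current = handoff_dir / f"{index:02d}-{name}.md"
        current.write_text(output, encoding="utf-8")

    return current  # path of the final handoff file
```

Running only the first stage gives you the standalone PM use case. Passing the full list gives you the end-to-end chain.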
Forcing Honesty with an Evidence-Based Loop
The biggest remaining issue was that AI agents lie. They write a piece of code, don't run it, and confidently tell you "I fixed the bug!"
To fix this, I hardcoded a state machine using a simple state.yaml file. The Dev agent is structurally forbidden from marking a task as "done" unless it physically runs a bash test and pipes the successful stdout into an evidence.md file. No evidence, no merge.
If the test fails, it logs the error in the YAML state file and retries.
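Here is roughly what that gate could look like, as a sketch under assumptions rather than the framework's real code. The `run_with_evidence` function, the `attempt_fix` hook, and the state keys are all hypothetical. The one rule it enforces is the one described above: the status only becomes "done" after a test command exits successfully and its stdout has been written to the evidence file.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def run_with_evidence(
    command: list[str],
    evidence_path: Path,
    state: dict,
    max_attempts: int = 3,
    attempt_fix: Optional[Callable[[str], None]] = None,
) -> dict:
    """Mark the task done only if `command` exits 0 and its stdout is saved."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(command, capture_output=True, text=True)
        state["attempts"] = attempt

        if result.returncode == 0:
            # Evidence first, status second: "done" never exists without proof.
            evidence_path.write_text(
                "# Evidence\n\n"
                f"Command: {' '.join(command)}\n\n"
                f"{result.stdout}",
                encoding="utf-8",
            )
            state["status"] = "done"
            state["last_error"] = None
            return state

        # Failure: record the error so the next attempt can see it.
        state["status"] = "failed"
        state["last_error"] = result.stderr.strip() or f"exit code {result.returncode}"

        # Hypothetical hook where an agent would try to repair the code.
        if attempt_fix is not None and attempt < max_attempts:
            attempt_fix(state["last_error"])

    return state
```

The agent never gets to set the status itself. Only the exit code of a real process can do that, which is what makes the claim checkable.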
Because the entire memory state is serialized to a local file, if my laptop dies on Friday, the agent simply reads the YAML file when I reboot on Monday and resumes the exact same debug loop.
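A small sketch of the resume logic, with two loud assumptions: the state keys are hypothetical, and JSON stands in for YAML so the example runs on the standard library alone (the real file is state.yaml). The write goes to a temp file and is then swapped in with `os.replace`, so a crash mid-write should not leave a half-written state file behind.

```python
import json
import os
from pathlib import Path

# Hypothetical state shape; the real state.yaml may look different.
DEFAULT_STATE = {"task": None, "status": "pending", "attempts": 0, "last_error": None}


def save_state(path: Path, state: dict) -> None:
    """Persist state by writing a temp file and renaming it over the old one."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # swaps the new file in place of the old one


def load_state(path: Path) -> dict:
    """Load the last saved state, or start fresh if none exists."""
    if not path.exists():
        return dict(DEFAULT_STATE)
    return json.loads(path.read_text(encoding="utf-8"))


def next_action(state: dict) -> str:
    """Decide what to do after a restart, based only on what is on disk."""
    if state["status"] == "done":
        return "skip"
    if state["status"] == "failed":
        return "retry"
    return "start"
```

On Monday, `next_action(load_state(path))` would return "retry" for a task that was mid-debug on Friday, because nothing about the loop ever lived only in process memory.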
Moving away from the hype
I built this so I could stop typing code and start acting as the reviewer for my own local AI studio.
The repo is fully open-source (MIT) here:
https://github.com/LuckyOneTwoThree/harness-all
I'm really curious how other indie devs are handling context bloat. Has anyone else moved away from heavy API orchestrators back to raw file I/O to keep their projects stable? Let me know.
Top comments (1)
Hey everyone, OP here! 👋 The biggest reason I built this was watching my AI agent confidently tell me "I fixed the code!" 5 times in a row without actually running it. 😅
I’m super curious: what's the absolute worst (or most hilarious) AI hallucination you've run into when using LLMs for large projects?