I applied the Unix Philosophy to AI Agents. Here’s why plain text beats API swarms.

LuckyOneTwoThree — Thu, 25 Jun 2026 16:41:44 +0000

Everyone loves a 10-minute demo of an AI agent building a snake game. But try to build a production-ready full-stack app, and the magic dies pretty quickly.

I spent the last few months trying to scale my side projects using various AI coding tools and agent swarms. The pattern was always the same: everything is great until day three. Then the context gets too large. The agent forgets the original product spec and tries to rewrite your database schema when you just asked it to fix a CSS button.

Most frameworks try to fix this by having multiple agents chat with each other over APIs (like a virtual software company). But debugging an API-driven agent conversation is a nightmare. And if the Python process crashes, the agents lose all their in-memory state.

I got fed up and decided to go back to the 1970s.

The Unix Philosophy applied to LLMs

I built an open-source framework called harness-all. Instead of a massive monolithic orchestrator, I applied the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

I physically split my workflow into 5 isolated directories on my hard drive: PM, Design, Dev, Growth, and Ops.

They do not talk over a network. They don't use vector databases. They communicate entirely via a "sneakernet" of Markdown files.

Markdown is the API

Because the agents are decoupled, there is no heavy orchestration layer. The PM agent researches the market, writes a rigid PRD, and dumps it into a docs/handoff/pm-to-solo.md file.

I take two minutes to review that file (human-in-the-loop). If it looks good, the Dev agent picks it up and starts coding. Markdown is literally the API.

You can use the PM agent standalone to just generate specs. Or you can chain them all together with a simple bash script to fully automate building a feature from idea to deployed code.
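
To make the chaining idea concrete, here is a minimal sketch in Python rather than bash. It is not the actual harness-all code: the `Agent` callable signature, `run_stage`, `run_pipeline`, and the `approve` callback are all hypothetical names I made up for illustration. The only thing taken from the project is the idea that each stage reads one Markdown file, writes another, and a human can veto the handoff in between.

```python
from pathlib import Path
from typing import Callable, List, Optional, Tuple

# Hypothetical agent signature: receives the upstream handoff text
# (None for the first stage) and returns the Markdown it hands off.
Agent = Callable[[Optional[str]], str]


def run_stage(agent: Agent, inbox: Optional[Path], outbox: Path) -> Path:
    """Run one agent: read its inbox file, write its handoff file."""
    upstream = inbox.read_text(encoding="utf-8") if inbox else None
    outbox.parent.mkdir(parents=True, exist_ok=True)
    outbox.write_text(agent(upstream), encoding="utf-8")
    return outbox


def run_pipeline(
    stages: List[Tuple[Agent, Path]],
    approve: Callable[[Path], bool],
) -> List[Path]:
    """Chain stages through files; stop as soon as the reviewer rejects one."""
    completed: List[Path] = []
    inbox: Optional[Path] = None
    for agent, outbox in stages:
        handoff = run_stage(agent, inbox, outbox)
        if not approve(handoff):  # human-in-the-loop gate
            break
        completed.append(handoff)
        inbox = handoff  # the approved file is the next stage's only input
    return completed
```

Passing `approve=lambda path: True` gives the fully automated chain, while a callback that opens the file and waits for a yes/no keeps the two-minute human review in the loop.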

Forcing Honesty with an Evidence-Based Loop

The biggest remaining issue was that AI agents lie. They write a piece of code, don't run it, and confidently tell you "I fixed the bug!"

To fix this, I hardcoded a state machine using a simple state.yaml file. The Dev agent is structurally forbidden from marking a task as "done" unless it physically runs a bash test and pipes the successful stdout into an evidence.md file. No evidence, no merge.

If the test fails, it logs the error in the YAML state and retries.

Because the entire memory state is serialized to a local file, if my laptop dies on Friday, the agent simply reads the YAML file when I reboot on Monday and resumes the exact same debug loop.
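
Here is a rough sketch of what such an evidence gate could look like, written in Python for illustration. The function names `verify_with_evidence` and `mark_done` and the layout of the evidence file are my assumptions for this example, not the framework's real interface; the part that mirrors the rule above is simply "exit code 0 plus captured stdout, or no done status".

```python
import subprocess
from pathlib import Path
from typing import List


def verify_with_evidence(test_cmd: List[str], evidence_path: Path) -> bool:
    """Run the test command; keep evidence only if it exits with code 0."""
    # Drop evidence from earlier runs so a stale pass cannot be reused.
    evidence_path.unlink(missing_ok=True)
    result = subprocess.run(test_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return False
    evidence_path.write_text(
        f"## Command\n\n{' '.join(test_cmd)}\n\n## Output\n\n{result.stdout}",
        encoding="utf-8",
    )
    return True


def mark_done(evidence_path: Path) -> str:
    """Refuse the 'done' status unless a non-empty evidence file exists."""
    if not evidence_path.is_file() or evidence_path.stat().st_size == 0:
        raise RuntimeError("no evidence, no merge")
    return "done"
```

A call such as `verify_with_evidence(["pytest", "-q"], Path("evidence.md"))` would be the typical shape. The point of keeping the gate in code instead of in the prompt is that the agent cannot talk its way past a missing file.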

Moving away from the hype

I built this so I could stop typing code and start acting as the reviewer for my own local AI studio.

The repo is fully open-source (MIT) here:
https://github.com/LuckyOneTwoThree/harness-all

I'm really curious how other indie devs are handling context bloat. Has anyone else moved away from heavy API orchestrators back to raw file I/O to keep their projects stable? Let me know.

Stop Using One AI Agent for Everything. I Built a Contract-Driven Multi-Agent Architecture.

LuckyOneTwoThree — Wed, 24 Jun 2026 12:15:00 +0000

GitHub: https://github.com/LuckyOneTwoThree/harness-all

If you've been building apps with AI coding assistants (like Cursor, Claude, or custom LangChain setups), you've probably experienced the "Context Explosion" phenomenon.

You start a project. You tell the AI to act as a Product Manager and write a spec. Then you tell it to act as a Designer to pick colors. Then you ask it to write the backend in Go and the frontend in React.

By day 3, the AI is a mess. It forgets the initial Acceptance Criteria (AC). It hallucinates React components inside your Go server. It tries to fix a CSS bug and accidentally deletes your database connection logic.

We are forcing AIs to be omnipotent "full-stack gods" in a single context window. In the real world, human teams don't work like this. We have physical isolation and separation of concerns.

So, I decided to build an architecture that mimics how real remote teams work. I open-sourced it, and I call it harness-all.

🛠 What is `harness-all`?

harness-all is not a heavy Python SDK or an API wrapper. It is a family of file-based, contract-driven multi-agent frameworks.

Instead of putting all prompts in one giant system prompt, I split the AI's "brain" into 5 physically isolated workspaces:

🎯 harness-pm: Writes PRDs, tracks metrics, and defines strict Acceptance Criteria.
🎨 harness-design: Consumes PRDs and generates Design Systems and Component Maps.
💻 harness-solo: The Developer. Strictly follows TDD and ingests the component maps to write code.
🚀 harness-growth: Handles SEO, funnel events, and marketing copy.
🛡️ harness-ops: Handles IaC, deployment, and security.

They do not talk to each other via live APIs. They communicate exactly like we do: through Markdown handoff documents.

🧠 The 3 Core Architectural Decisions

I wanted to share the technical decisions behind this framework because, in my experience, this pattern removes most of the hallucination problems I kept hitting with AI agents.

1. The "Sneakernet" Contract Handoff

When the PM agent finishes a spec, it generates a file called docs/handoff/pm-to-solo.md.
The Solo (Dev) agent reads this file. It doesn't know how the PM arrived at these decisions; it only sees the strict ACs.

By isolating the context, the Dev agent doesn't waste its precious LLM context window on user research or marketing personas. It only sees pure engineering requirements. If the Dev agent finds an ambiguity, it pauses and we generate a "Ticket" back to the PM.
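
As a sketch of how that pause-and-ticket step could be mechanized, consider the Python below. Everything specific in it is an assumption on my part for illustration: the `- AC-1: ...` bullet convention, the `TBD`/`TODO`/trailing-question-mark markers for ambiguity, and the names `triage_handoff`, `Ticket`, and `render_ticket`. In practice the agent itself would judge what counts as ambiguous; the sketch only shows the shape of the round trip.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed handoff convention (hypothetical): each acceptance criterion is a
# bullet such as "- AC-1: User can log in with email and password".
AC_LINE = re.compile(r"^-\s*(AC-\d+):\s*(.+)$")
# Assumed ambiguity markers: placeholder words or an open question.
AMBIGUOUS = re.compile(r"\b(TBD|TODO)\b|\?\s*$")


@dataclass
class Ticket:
    ac_id: str
    question: str


def triage_handoff(markdown: str) -> Tuple[Dict[str, str], List[Ticket]]:
    """Split a handoff into workable ACs and tickets for the PM."""
    ready: Dict[str, str] = {}
    tickets: List[Ticket] = []
    for line in markdown.splitlines():
        match = AC_LINE.match(line.strip())
        if not match:
            continue  # anything that is not an AC bullet is ignored
        ac_id, text = match.groups()
        if AMBIGUOUS.search(text):
            tickets.append(Ticket(ac_id, f"Please clarify: {text}"))
        else:
            ready[ac_id] = text
    return ready, tickets


def render_ticket(ticket: Ticket) -> str:
    """Format a ticket as Markdown so it can travel back as a file."""
    return f"## Ticket for {ticket.ac_id}\n\n{ticket.question}\n"
```

Because the ticket is just another Markdown file, the PM agent can consume it the same way the Dev agent consumes the original handoff.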

2. Explicit State Machine (`state.yaml`) over Implicit RAM

Most AI tools rely on vector databases or hidden conversational memory to remember what they were doing. This is a black box.

In harness-all, memory is an explicit State Machine stored on your hard drive.

# .harness/loops/specs/001-user-login/state.yaml
current_task: 001-user-login
iteration: 3
stage: verify
status: retrying
last_error: "test_auth.py::test_login_empty_password FAILED"

If you close your IDE on Friday and open it on Monday, the AI reads state.yaml and instantly resumes at "Iteration 3, fixing a failed password test." You have 100% read/write access to the AI's memory.
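
To show how little machinery the resume step needs, here is a small Python sketch that reads a flat state file like the one above and reports where to pick up. It deliberately avoids third-party packages, so `load_flat_state` only understands simple `key: value` lines; a real setup would more likely use a proper YAML parser such as PyYAML's `yaml.safe_load`. The function names are hypothetical and not taken from the repo.

```python
from pathlib import Path


def load_flat_state(text: str) -> dict:
    """Parse flat `key: value` lines; comment and blank lines are skipped."""
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain "::".
        key, _, value = line.partition(":")
        value = value.strip().strip('"')
        state[key.strip()] = int(value) if value.isdigit() else value
    return state


def load_state(path: Path) -> dict:
    """Read the state file from disk (path layout is up to the project)."""
    return load_flat_state(path.read_text(encoding="utf-8"))


def resume_summary(state: dict) -> str:
    """Describe where the loop should pick up after a restart."""
    summary = (
        f"Iteration {state['iteration']}, "
        f"stage {state['stage']} ({state['status']})"
    )
    if state.get("last_error"):
        summary += f", last error: {state['last_error']}"
    return summary
```

Since the state is plain text, editing `iteration` or clearing `last_error` by hand is a legitimate way to steer the agent before it resumes.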

3. The Evidence-Based LOOP Engine

AI is notoriously lazy. It will watch a test fail, change some code, and tell you "I fixed it!" without ever re-running the test.

I built a strict LOOP engine (Plan → Act → Verify). Before the AI can change a task status to done, it is forced by its constitutional rules to run the bash test command, capture the standard output, and write it to an evidence.md file. No evidence, no merge.
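
The control flow of such a loop can be sketched in a few lines of Python. This is my own simplified illustration, not the engine from the repo: `run_loop`, the `plan`/`act`/`verify` callables, the `max_iterations` cap, and the `evidence` key are all assumptions. The state keys `iteration`, `stage`, `status`, and `last_error` follow the `state.yaml` fields shown earlier.

```python
from typing import Callable, List, Optional, Tuple


def run_loop(
    plan: Callable[[Optional[str]], List[str]],
    act: Callable[[List[str]], None],
    verify: Callable[[], Tuple[bool, str]],
    state: dict,
    max_iterations: int = 5,
) -> dict:
    """Plan -> Act -> Verify until verification passes or the cap is hit."""
    state.setdefault("iteration", 0)
    while state["iteration"] < max_iterations:
        state["iteration"] += 1

        state["stage"] = "plan"
        steps = plan(state.get("last_error"))  # the last failure informs the plan

        state["stage"] = "act"
        act(steps)

        state["stage"] = "verify"
        passed, output = verify()  # output is the captured test output
        if passed:
            # "done" is only reachable through a passing verify step.
            state.update(status="done", last_error=None, evidence=output)
            return state
        state.update(status="retrying", last_error=output)

    state["status"] = "failed"
    return state
```

The design choice worth noting is that `status` is never set by the agent's own say-so: the only code path to `done` runs through the return value of `verify`.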

🤝 Looking for Feedback

I originally built this just to help me launch my own indie SaaS projects without losing my sanity. But as the architecture matured, I realized this "Contract-Driven, File-Based" approach is incredibly stable for LLMs.

It’s completely open-source (MIT License) and operates entirely locally within your project directories.

I would love for the Dev.to community to tear it apart, critique the architecture, or try it out for your next side project.

Let me know what you think in the comments! What is the biggest issue you face when using AI agents for large projects?

https://github.com/LuckyOneTwoThree/harness-all

DEV Community: LuckyOneTwoThree