LuckyOneTwoThree

Posted on Jun 24

Stop Using One AI Agent for Everything. I Built a Contract-Driven Multi-Agent Architecture.

#ai #agents #opensource #productivity

GitHub: https://github.com/LuckyOneTwoThree/harness-all

If you've been building apps with AI coding assistants (like Cursor, Claude, or custom LangChain setups), you've probably experienced the "Context Explosion" phenomenon.

You start a project. You tell the AI to act as a Product Manager and write a spec. Then you tell it to act as a Designer to pick colors. Then you ask it to write the backend in Go and the frontend in React.

By day 3, the AI is a mess. It forgets the initial Acceptance Criteria (AC). It hallucinates React components inside your Go server. It tries to fix a CSS bug and accidentally deletes your database connection logic.

We are forcing AIs to be omnipotent "full-stack gods" in a single context window. In the real world, human teams don't work like this. We have physical isolation and separation of concerns.

So, I decided to build an architecture that mimics how real remote teams work. I open-sourced it, and I call it harness-all.

🛠 What is `harness-all`?


harness-all is not a heavy Python SDK or an API wrapper. It is a file-based, contract-driven Multi-Agent framework family.

Instead of putting all prompts in one giant system prompt, I split the AI's "brain" into 5 physically isolated workspaces:

🎯 harness-pm: Writes PRDs, tracks metrics, and defines strict Acceptance Criteria.
🎨 harness-design: Consumes PRDs and generates Design Systems and Component Maps.
💻 harness-solo: The Developer. Strictly follows TDD and ingests the component maps to write code.
🚀 harness-growth: Handles SEO, funnel events, and marketing copy.
🛡️ harness-ops: Handles IaC, deployment, and security.

They do not talk to each other through live APIs. They communicate the way human teams do: through Markdown handoff documents.
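Here is a minimal Python sketch of that routing idea. It assumes every handoff follows the `<producer>-to-<consumer>.md` naming used by `docs/handoff/pm-to-solo.md`. That file is the only one this post names; the other producer/consumer pairs in `HANDOFFS` are hypothetical, inferred from the workspace descriptions.

```python
from pathlib import Path

# Hypothetical routing table: (producer, consumer) pairs of workspaces.
# Only ("pm", "solo") is named in the post; the rest are illustrative guesses.
HANDOFFS = [
    ("pm", "solo"),
    ("pm", "design"),
    ("design", "solo"),
    ("pm", "growth"),
    ("solo", "ops"),
]

HANDOFF_DIR = Path("docs/handoff")


def handoff_path(producer: str, consumer: str) -> Path:
    """Build a handoff path using the <producer>-to-<consumer>.md pattern."""
    return HANDOFF_DIR / f"{producer}-to-{consumer}.md"


def inputs_for(agent: str) -> list[Path]:
    """Return the handoff documents addressed to an agent.

    These files are the agent's only input from other workspaces.
    """
    return [handoff_path(p, c) for p, c in HANDOFFS if c == agent]


if __name__ == "__main__":
    for doc in inputs_for("solo"):
        print(doc.as_posix())
```

The design choice is that an agent's context is defined by a list of files rather than by a shared conversation, so you can see exactly what each agent was allowed to read.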

🧠 The 3 Core Architectural Decisions

I want to share the technical decisions behind this framework, because in my experience this pattern prevents most of the context-related hallucinations I kept hitting with a single all-purpose agent.

1. The "Sneakernet" Contract Handoff

When the PM agent finishes a spec, it generates a file called `docs/handoff/pm-to-solo.md`.
The Solo (Dev) agent reads this file. It doesn't know how the PM arrived at these decisions; it only sees the strict ACs.
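The post doesn't specify the internal format of `pm-to-solo.md`, so this minimal Python sketch assumes a hypothetical convention where each acceptance criterion is a bullet such as `- AC-1: ...`. It shows how the Dev side can extract only the ACs and ignore everything else in the document.

```python
import re
from dataclasses import dataclass

# Assumed (hypothetical) AC line format inside pm-to-solo.md:
#   - AC-1: A user can log in with a valid email and password.
AC_LINE = re.compile(r"^\s*-\s*(AC-\d+):\s*(.+?)\s*$")


@dataclass
class AcceptanceCriterion:
    ac_id: str
    text: str


def parse_acceptance_criteria(markdown: str) -> list[AcceptanceCriterion]:
    """Keep only the AC lines; research notes and personas are dropped."""
    criteria = []
    for line in markdown.splitlines():
        match = AC_LINE.match(line)
        if match:
            criteria.append(AcceptanceCriterion(match.group(1), match.group(2)))
    return criteria


if __name__ == "__main__":
    handoff = """# Handoff: PM -> Solo
## Context
Persona research stays in the PM workspace and is not copied here.
## Acceptance Criteria
- AC-1: A user can log in with a valid email and password.
- AC-2: Login with an empty password is rejected with a validation error.
"""
    for ac in parse_acceptance_criteria(handoff):
        print(ac.ac_id, "->", ac.text)
```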

By isolating the context, the Dev agent doesn't waste its precious LLM context window on user research or marketing personas. It only sees pure engineering requirements. If the Dev agent finds an ambiguity, it pauses and we generate a "Ticket" back to the PM.
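The post says only that a "Ticket" goes back to the PM, not where it lives or what it contains. As a sketch under that gap, the hypothetical `open_ticket` helper below writes a small Markdown ticket into an assumed `docs/handoff/tickets/` folder, so the question travels through the same file-based channel as the spec.

```python
from datetime import date
from pathlib import Path

# Hypothetical location for Dev -> PM tickets; not specified in the post.
TICKET_DIR = Path("docs/handoff/tickets")


def open_ticket(ac_id: str, question: str, ticket_dir: Path = TICKET_DIR) -> Path:
    """Write an ambiguity ticket instead of guessing, and return its path."""
    ticket_dir.mkdir(parents=True, exist_ok=True)
    path = ticket_dir / f"solo-to-pm-{ac_id.lower()}.md"
    path.write_text(
        f"# Ticket: clarification needed for {ac_id}\n"
        f"- Opened: {date.today().isoformat()}\n"
        "- From: harness-solo\n"
        "- To: harness-pm\n\n"
        f"## Question\n{question}\n",
        encoding="utf-8",
    )
    return path
```

The point of writing a file rather than asking inline is that the PM agent answers in its own workspace, with its own context, and the Dev agent only ever sees the resolved requirement.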

2. Explicit State Machine (`state.yaml`) over Implicit RAM

Most AI tools rely on vector databases or hidden conversational memory to remember what they were doing. This is a black box.

In harness-all, memory is an explicit State Machine stored on your hard drive.

```yaml
# .harness/loops/specs/001-user-login/state.yaml
current_task: 001-user-login
iteration: 3
stage: verify
status: retrying
last_error: "test_auth.py::test_login_empty_password FAILED"
```

If you close your IDE on Friday and open it on Monday, the AI reads `state.yaml` and immediately resumes at "Iteration 3, fixing a failed password test." You have full read and write access to the AI's memory: it is just a file you can inspect and edit.
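A minimal sketch of that resume step in Python, assuming the flat `key: value` layout of the `state.yaml` example. To stay standard-library only, it uses a tiny hand-rolled parser instead of a real YAML library such as PyYAML (`yaml.safe_load`). `LoopState`, `load_state`, and `resume_prompt` are hypothetical names, not part of harness-all.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LoopState:
    current_task: str
    iteration: int
    stage: str
    status: str
    last_error: str = ""


def parse_flat_yaml(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines only (no nesting, lists, or multi-line values)."""
    data = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values like "a.py::test" survive.
        key, _, value = line.partition(":")
        data[key.strip()] = value.strip().strip('"')
    return data


def load_state(path: Path) -> LoopState:
    data = parse_flat_yaml(path.read_text(encoding="utf-8"))
    return LoopState(
        current_task=data["current_task"],
        iteration=int(data["iteration"]),
        stage=data["stage"],
        status=data["status"],
        last_error=data.get("last_error", ""),
    )


def resume_prompt(state: LoopState) -> str:
    """Turn the on-disk state into the first instruction of a new session."""
    prompt = (f"Resume {state.current_task} at iteration {state.iteration}, "
              f"stage '{state.stage}' (status: {state.status}).")
    if state.last_error:
        prompt += f" Last error: {state.last_error}"
    return prompt
```

A new session would call `resume_prompt(load_state(Path(".harness/loops/specs/001-user-login/state.yaml")))` and put the result at the top of the agent's instructions, so nothing depends on chat history surviving the weekend.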

3. The Evidence-Based LOOP Engine

AI is notoriously lazy. It will see a test fail, make a change, and tell you "I fixed it!" without actually re-running the test.

I built a strict LOOP engine (Plan → Act → Verify). Before the AI can set a task's status to `done`, its constitutional rules force it to run the test command in the shell, capture the standard output, and write it to an `evidence.md` file. No evidence, no merge.
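The "no evidence, no merge" gate can be sketched in Python roughly as follows. This is an illustration under assumptions, not harness-all's actual enforcement (the post says the rule lives in the agent's constitutional rules). The `evidence.md` layout and the helper names `capture_evidence`, `evidence_passed`, and `mark_done` are hypothetical. The key idea is that the pass/fail decision is read back from a file the test run produced, not from the agent's own claim.

```python
import subprocess
from pathlib import Path


def capture_evidence(cmd: list[str], evidence_file: Path) -> bool:
    """Run the test command and write its real output to the evidence file.

    Returns True only if the command exited with status 0.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    evidence_file.parent.mkdir(parents=True, exist_ok=True)
    evidence_file.write_text(
        "# Evidence\n\n"
        f"Command: {' '.join(cmd)}\n"
        f"Exit code: {result.returncode}\n\n"
        "--- captured output ---\n"
        f"{result.stdout}{result.stderr}",
        encoding="utf-8",
    )
    return result.returncode == 0


def evidence_passed(evidence_file: Path) -> bool:
    """Read the exit code back from disk instead of trusting the agent."""
    if not evidence_file.exists():
        return False
    for line in evidence_file.read_text(encoding="utf-8").splitlines():
        # The header is written before the captured output, so the first
        # "Exit code:" line is the real one, even if test output mimics it.
        if line.startswith("Exit code:"):
            return line.split(":", 1)[1].strip() == "0"
    return False


def mark_done(state: dict, evidence_file: Path) -> dict:
    """Gate the done transition: no passing evidence, no status change."""
    if not evidence_passed(evidence_file):
        raise RuntimeError(f"No evidence, no merge: {evidence_file} is missing or failing")
    return {**state, "status": "done"}
```

In this sketch, `cmd` would be the project's real test runner (for example `["pytest", "test_auth.py"]`), and `mark_done` would be the only code path allowed to write `status: done` back into `state.yaml`.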

🤝 Looking for Feedback

I originally built this just to help me launch my own indie SaaS projects without losing my sanity. But as the architecture matured, I realized this contract-driven, file-based approach keeps LLMs remarkably stable over long-running projects.

It’s completely open-source (MIT License) and operates entirely locally within your project directories.

I would love for the Dev.to community to tear it apart, critique the architecture, or try it out for your next side project.

Let me know what you think in the comments! What is the biggest issue you face when using AI agents for large projects?

https://github.com/LuckyOneTwoThree/harness-all
