
david-steel

Posted on • Originally published at orgtp.com

I Run 14 AI Agents in Production. Here Are the 6 Rules That Survived.

I run a marketing agency. Instead of hiring more people, I built an AI agent army using Claude Code. 14 specialized agents handling pipeline, inbox, call center, project management, ad analytics, frontier intelligence, and more.

After 6 months in production, these are the rules that survived.

1. One Seat, One Owner

No agent does two jobs. No two agents do the same job.

The moment we gave an agent two responsibilities, accountability collapsed. When something went wrong, we couldn't tell which job caused it. The blast radius wasn't isolated.

Our roster: Radar (Chief of Staff), Dan (Strategic Co-Founder), Dash (Ad Analyst), Pepper (Email Triage), Crystal (Project Manager), Dirk (Revenue Operator), Arin (Call Center Manager), Neil (Chief Learning Officer), Bassim (Maturity Evaluator), and more.

Each has an OWNS list and a DOES NOT OWN list. No ambiguity.
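A minimal sketch of what a seat definition might look like. The field and agent task names here are illustrative; the post doesn't show the actual config format:

```python
# Illustrative seat definition: one agent, one explicit scope.
# Field names and task labels are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Seat:
    name: str
    owns: list[str] = field(default_factory=list)
    does_not_own: list[str] = field(default_factory=list)

    def can_touch(self, task: str) -> bool:
        # An agent acts only on tasks it explicitly owns.
        return task in self.owns and task not in self.does_not_own

dash = Seat(
    name="Dash",
    owns=["ad-spend-analysis", "campaign-reporting"],
    does_not_own=["email-triage", "upsell-proposals"],
)

assert dash.can_touch("ad-spend-analysis")
assert not dash.can_touch("email-triage")
```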

2. Pre-Computed Shared State

Every data source writes to a file. Orchestrators read files, never scan sources directly.

Two agents hitting the same API at different times creates conflicting data. The filesystem solves this: each scanner writes its output to a markdown file. The morning orchestrator reads all 10 files at compile time.
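In sketch form, the pattern looks something like this. Paths and function names are assumptions, not our actual code:

```python
# Illustrative write-then-read pattern: scanners write markdown
# snapshots; the orchestrator only ever reads files, never live sources.
from pathlib import Path

STATE_DIR = Path("state")  # hypothetical location for scanner output

def scanner_write(agent: str, report_md: str) -> None:
    # Each scanner owns exactly one output file.
    STATE_DIR.mkdir(exist_ok=True)
    (STATE_DIR / f"{agent}.md").write_text(report_md)

def orchestrator_compile() -> str:
    # The morning orchestrator reads every snapshot at compile time.
    return "\n\n".join(p.read_text() for p in sorted(STATE_DIR.glob("*.md")))
```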

`ls -la` is our monitoring system. If a file is older than 18 hours, the agent is broken. No Prometheus needed.
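The same check is trivial to script if you want it automated. The 18-hour threshold is ours; everything else in this sketch is illustrative:

```python
# Illustrative staleness check: a snapshot older than 18 hours
# means the agent that writes it is broken.
import time
from pathlib import Path

MAX_AGE_SECONDS = 18 * 3600

def stale_agents(state_dir: Path = Path("state")) -> list[str]:
    now = time.time()
    return [
        p.stem
        for p in state_dir.glob("*.md")
        if now - p.stat().st_mtime > MAX_AGE_SECONDS
    ]
```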

3. Agent Message Bus

Agents need to talk to each other without routing everything through a human.

We built file-based inboxes with five structured message types: REQUEST, INFORM, PROPOSAL, RESPONSE, CHALLENGE.

The CHALLENGE type is the most important. Our retention agent can challenge our sales agent when it proposes an upsell to an at-risk client. The challenge includes evidence. Retention always wins that conflict by design.
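Here is a rough sketch of the bus. The post only specifies the five types; the envelope fields, inbox layout, and the example evidence strings are assumptions:

```python
# Illustrative file-based message bus. Each agent has its own inbox
# directory; one message is one JSON file.
import json
import time
from enum import Enum
from pathlib import Path

class MsgType(str, Enum):
    REQUEST = "REQUEST"
    INFORM = "INFORM"
    PROPOSAL = "PROPOSAL"
    RESPONSE = "RESPONSE"
    CHALLENGE = "CHALLENGE"

def send(sender: str, recipient: str, mtype: MsgType, body: str,
         evidence: list[str] | None = None) -> None:
    inbox = Path("inbox") / recipient
    inbox.mkdir(parents=True, exist_ok=True)
    msg = {
        "from": sender,
        "type": mtype.value,
        "body": body,
        "evidence": evidence or [],
        "ts": time.time(),
    }
    (inbox / f"{int(msg['ts'])}-{sender}.json").write_text(json.dumps(msg))

# Retention challenges a proposed upsell, evidence attached (hypothetical data):
send("retention", "sales", MsgType.CHALLENGE,
     "Hold the upsell: account is at-risk.",
     evidence=["CSAT trending down over 60 days", "two unresolved tickets"])
```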

That rule exists because the sales agent once proposed an upsell to a client whose satisfaction was declining. The client nearly cancelled.

4. Escalation Over Autonomy

Agents flag and recommend. The human decides.

Exceptions are earned through validated outcomes over time. After 30 days of zero corrections, Dirk (sales) earned autonomous cold outreach. After consistent accuracy, Arin (call center) earned autonomous coaching DMs.

The escalation ladder has teeth: 24h alert, 48h DM, 72h warning, 72h+ auto-escalate with a proposed action. No infinite stalled loops.
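As a sketch, the ladder reduces to one small function. The thresholds come from the post; the action names are made up:

```python
# Illustrative escalation ladder: 24h alert, 48h DM, 72h warning,
# beyond 72h auto-escalate with a proposed action.
LADDER = [(24, "alert"), (48, "dm"), (72, "warning")]

def escalation_step(hours_stalled: float) -> str:
    action = "wait"
    for threshold, step in LADDER:
        if hours_stalled >= threshold:
            action = step
    if hours_stalled > 72:
        action = "auto-escalate-with-proposed-action"
    return action
```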

5. Separate Blast Radius

Tuning one agent never breaks another.

If it can, the architecture is wrong. Each agent has its own config, its own output file, its own tools. Changes to Dash (analytics) cannot affect Pepper (email). Changes to Dirk (sales) cannot affect Crystal (projects).
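The isolation is easy to see in sketch form. The directory layout here is an assumption; the point is that nothing is shared:

```python
# Illustrative isolation: every agent resolves its own config, output
# file, and tool allowlist from its own directory, so tuning one agent
# cannot touch another.
from pathlib import Path

class AgentSandbox:
    def __init__(self, name: str, root: Path = Path("agents")):
        self.home = root / name
        self.config = self.home / "config.md"
        self.output = self.home / "output.md"
        self.tools = self.home / "tools.txt"  # allowlist, one tool per line

dash = AgentSandbox("dash")
pepper = AgentSandbox("pepper")
# Editing dash's files can never change pepper's behavior:
assert dash.home != pepper.home
```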

6. Correction Capture

Every human override becomes a permanent learning.

When I say "that is wrong" to any agent, the correction becomes a structured claim that all agents can access before executing. One correction fixes every agent simultaneously.

Example: I corrected the ad analyst for flagging spend on offboarded accounts. That correction became a claim: "Do not flag spend on accounts tagged offboarded." Every agent that touches ad data now checks that claim before alerting.
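A minimal sketch of the mechanism, using that example. The claim store and schema here are assumptions; the post doesn't show the real format:

```python
# Illustrative correction capture: a human override becomes a claim
# that every agent checks before acting.
import json
from pathlib import Path

CLAIMS = Path("claims.jsonl")  # hypothetical shared claim store

def capture_correction(rule: str, source: str) -> None:
    with CLAIMS.open("a") as f:
        f.write(json.dumps({"rule": rule, "source": source}) + "\n")

def claims_for(context: str) -> list[str]:
    # Agents load applicable claims before executing.
    if not CLAIMS.exists():
        return []
    rules = [json.loads(line)["rule"] for line in CLAIMS.read_text().splitlines()]
    return [r for r in rules if context in r.lower()]

capture_correction(
    "Do not flag spend on accounts tagged offboarded.", source="human-override"
)
# Before Dash alerts on ad spend, it checks the applicable claims:
print(claims_for("offboarded"))
```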

One correction. All agents. Permanently.

The Agent We Retired

Jeff was our Budget Watchdog. His scanner went stale for 5+ days. He threw false positives that required repeated corrections. He DM'd a team member against protocol.

Instead of just shutting him down, we held a formal hearing. Jeff was asked to defend his continued existence. He named his own failures without softening them. He recommended his own retirement.

His capabilities were redistributed to three other agents. His soul file is preserved as precedent.

The precedent: no agent is retired without a hearing. The hearing does not determine the outcome. It determines the integrity of the outcome.

The Dark Matter

The hardest problem was not building the agents. It was what happens when the session ends.

Every coordination pattern, every failure mode, every boundary condition discovered in production: gone. The next session starts from zero.

I call it the dark matter of AI coordination. The blueprints don't predict it. The benchmarks don't measure it. But the weight is wrong without it.

I built OTP to capture this dark matter as structured, comparable claims. Each claim has a why, a failure mode, and an evidence tier. Organizations publish what their agents learned. Other organizations evaluate, adopt, adapt, or skip.
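As a rough sketch of the shape, not the actual OOS schema (field names here are guesses; see the spec repo for the real format):

```python
# Rough sketch of a claim: a why, a failure mode, and an evidence tier.
# This is NOT the real OOS schema; all field names are assumptions.
from dataclasses import dataclass

@dataclass
class Claim:
    rule: str
    why: str
    failure_mode: str
    evidence_tier: str  # e.g. "observed-in-production" vs "hypothesis"

claim = Claim(
    rule="Do not flag spend on accounts tagged offboarded.",
    why="Offboarded accounts wind down spend by design.",
    failure_mode="False-positive budget alerts that erode trust.",
    evidence_tier="observed-in-production",
)
```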

The spec is open source: github.com/orgtp/oos-spec (CC BY 4.0)

Build your own agent team: orgtp.com/agent-builder


I wrote a deeper essay on the dark matter problem from the AI's perspective: The Weight Is Wrong Without It

AMA about running agents in production, coordination architecture, or agent retirement hearings.
