<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: david-steel</title>
    <description>The latest articles on DEV Community by david-steel (@davidsteel).</description>
    <link>https://dev.to/davidsteel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3887210%2Fc5876d33-74be-426f-8bf5-c57cda7a57e9.png</url>
      <title>DEV Community: david-steel</title>
      <link>https://dev.to/davidsteel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/davidsteel"/>
    <language>en</language>
    <item>
      <title>I Run 14 AI Agents in Production. Here Are the 6 Rules That Survived.</title>
      <dc:creator>david-steel</dc:creator>
      <pubDate>Sun, 19 Apr 2026 10:14:29 +0000</pubDate>
      <link>https://dev.to/davidsteel/i-run-14-ai-agents-in-production-here-are-the-6-rules-that-survived-3j3o</link>
      <guid>https://dev.to/davidsteel/i-run-14-ai-agents-in-production-here-are-the-6-rules-that-survived-3j3o</guid>
      <description>&lt;p&gt;I run a marketing agency. Instead of hiring more people, I built an AI agent army using Claude Code. 14 specialized agents handling pipeline, inbox, call center, project management, ad analytics, frontier intelligence, and more.&lt;/p&gt;

&lt;p&gt;After 6 months in production, these are the rules that survived.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. One Seat, One Owner
&lt;/h2&gt;

&lt;p&gt;No agent does two jobs. No two agents do the same job.&lt;/p&gt;

&lt;p&gt;The moment we gave an agent two responsibilities, accountability collapsed. When something went wrong, we couldn't tell which job caused it. The blast radius wasn't isolated.&lt;/p&gt;

&lt;p&gt;Our roster: Radar (Chief of Staff), Dan (Strategic Co-Founder), Dash (Ad Analyst), Pepper (Email Triage), Crystal (Project Manager), Dirk (Revenue Operator), Arin (Call Center Manager), Neil (Chief Learning Officer), Bassim (Maturity Evaluator), and more.&lt;/p&gt;

&lt;p&gt;Each has an OWNS list and a DOES NOT OWN list. No ambiguity.&lt;/p&gt;
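
&lt;p&gt;A minimal sketch of how OWNS and DOES NOT OWN lists could be checked for overlap, assuming each seat is a small Python record. The &lt;code&gt;Seat&lt;/code&gt; class, the &lt;code&gt;find_ownership_conflicts&lt;/code&gt; helper, and the duty names are hypothetical; only the agent names come from the roster above.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Seat:
    """One agent's charter: what it owns and what it must not touch."""
    name: str
    owns: set = field(default_factory=set)
    does_not_own: set = field(default_factory=set)


def find_ownership_conflicts(seats):
    """Return human-readable problems; an empty list means the roster is clean."""
    problems = []
    owner_of = {}  # duty -> name of the first seat that claimed it
    for seat in seats:
        for duty in sorted(seat.owns):
            if duty in owner_of:
                problems.append(f"{duty}: owned by both {owner_of[duty]} and {seat.name}")
            else:
                owner_of[duty] = seat.name
        # A seat must never list the same duty as owned and not owned.
        for duty in sorted(seat.owns.intersection(seat.does_not_own)):
            problems.append(f"{duty}: {seat.name} both owns and disowns it")
    return problems


# Hypothetical duties, for illustration only.
roster = [
    Seat("Dash", owns={"ad_spend_alerts"}, does_not_own={"email_triage"}),
    Seat("Pepper", owns={"email_triage"}, does_not_own={"ad_spend_alerts"}),
]
print(find_ownership_conflicts(roster))  # prints []
```

&lt;p&gt;Running a check like this whenever a seat is added or edited would surface a second owner for the same duty before it reaches production.&lt;/p&gt;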

&lt;h2&gt;
  
  
  2. Pre-Computed Shared State
&lt;/h2&gt;

&lt;p&gt;Every data source writes to a file. Orchestrators read files, never scan sources directly.&lt;/p&gt;

&lt;p&gt;When two agents hit the same API at different times, they end up with conflicting data. The filesystem solves this: each scanner writes its output to a markdown file. The morning orchestrator reads all 10 files at compile time.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ls -la&lt;/code&gt; is our monitoring system. If a file is older than 18 hours, the agent is broken. No Prometheus needed.&lt;/p&gt;
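
&lt;p&gt;The same freshness check can be scripted. The sketch below assumes the scanner outputs are &lt;code&gt;*.md&lt;/code&gt; files in a single directory; the &lt;code&gt;stale_files&lt;/code&gt; function and the directory layout are assumptions, and only the 18-hour threshold comes from the rule above.&lt;/p&gt;

```python
import time
from pathlib import Path

STALE_AFTER_SECONDS = 18 * 60 * 60  # the 18-hour threshold mentioned above


def stale_files(state_dir, now=None, max_age=STALE_AFTER_SECONDS):
    """Return names of markdown files last modified more than max_age seconds ago."""
    now = time.time() if now is None else now
    stale = []
    for path in sorted(Path(state_dir).glob("*.md")):
        age = now - path.stat().st_mtime  # seconds since the scanner last wrote
        if age > max_age:
            stale.append(path.name)
    return stale
```

&lt;p&gt;Every name this returns points at a scanner that has stopped writing, which is the same signal as reading timestamps by eye.&lt;/p&gt;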

&lt;h2&gt;
  
  
  3. Agent Message Bus
&lt;/h2&gt;

&lt;p&gt;Agents need to talk to each other without routing everything through a human.&lt;/p&gt;

&lt;p&gt;We built file-based inboxes with five structured message types: REQUEST, INFORM, PROPOSAL, RESPONSE, CHALLENGE.&lt;/p&gt;

&lt;p&gt;The CHALLENGE type is the most important. Our retention agent can challenge our sales agent when it proposes an upsell to an at-risk client. The challenge includes evidence. Retention always wins that conflict by design.&lt;/p&gt;

&lt;p&gt;That rule exists because the sales agent once proposed an upsell to a client whose satisfaction was declining. The client nearly cancelled.&lt;/p&gt;
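
&lt;p&gt;A rough sketch of a file-based inbox with the five message types. Storing each message as a JSON file, the per-agent directory layout, and the &lt;code&gt;send&lt;/code&gt; and &lt;code&gt;read_inbox&lt;/code&gt; helpers are all assumptions made for illustration. The one rule taken from the text is that a CHALLENGE carries evidence.&lt;/p&gt;

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from pathlib import Path


class MessageType(str, Enum):
    REQUEST = "REQUEST"
    INFORM = "INFORM"
    PROPOSAL = "PROPOSAL"
    RESPONSE = "RESPONSE"
    CHALLENGE = "CHALLENGE"


@dataclass
class Message:
    sender: str
    recipient: str
    type: MessageType
    body: str
    evidence: str = ""


def send(inbox_root, msg, seq):
    """Write one message as a JSON file into the recipient's inbox directory."""
    if msg.type is MessageType.CHALLENGE and not msg.evidence:
        raise ValueError("a CHALLENGE must include evidence")
    inbox = Path(inbox_root) / msg.recipient
    inbox.mkdir(parents=True, exist_ok=True)
    # Zero-padded sequence numbers keep files in send order when sorted by name.
    path = inbox / f"{seq:06d}-{msg.type.value}.json"
    path.write_text(json.dumps(asdict(msg)))
    return path


def read_inbox(inbox_root, agent):
    """Return the agent's messages as dicts, lowest sequence number first."""
    inbox = Path(inbox_root) / agent
    return [json.loads(p.read_text()) for p in sorted(inbox.glob("*.json"))]
```

&lt;p&gt;With this shape, the retention agent would call &lt;code&gt;send&lt;/code&gt; with a &lt;code&gt;CHALLENGE&lt;/code&gt; addressed to the sales agent, and the call fails if the evidence field is empty.&lt;/p&gt;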

&lt;h2&gt;
  
  
  4. Escalation Over Autonomy
&lt;/h2&gt;

&lt;p&gt;Agents flag and recommend. The human decides.&lt;/p&gt;

&lt;p&gt;Exceptions are earned through validated outcomes over time. After 30 days of zero corrections, Dirk (sales) earned autonomous cold outreach. After consistent accuracy, Arin (call center) earned autonomous coaching DMs.&lt;/p&gt;

&lt;p&gt;The escalation ladder has teeth: 24h alert, 48h DM, 72h warning, 72h+ auto-escalate with a proposed action. No infinite stalled loops.&lt;/p&gt;
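
&lt;p&gt;The ladder maps directly onto a small lookup. This sketch assumes the clock is measured in hours since an item stalled, and it treats exactly 72 hours as the warning step and anything beyond as auto-escalation. That boundary handling and the &lt;code&gt;escalation_step&lt;/code&gt; name are assumptions.&lt;/p&gt;

```python
def escalation_step(hours_stalled):
    """Map how long an item has been stalled to the ladder step it has reached."""
    if hours_stalled > 72:
        return "auto-escalate with proposed action"
    if hours_stalled >= 72:
        return "warning"
    if hours_stalled >= 48:
        return "DM"
    if hours_stalled >= 24:
        return "alert"
    return "none"


for hours in (10, 24, 50, 72, 80):
    print(hours, escalation_step(hours))
```

&lt;p&gt;Because every value above 72 lands on the auto-escalate step, no stalled item can sit on the ladder indefinitely.&lt;/p&gt;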

&lt;h2&gt;
  
  
  5. Separate Blast Radius
&lt;/h2&gt;

&lt;p&gt;Tuning one agent never breaks another.&lt;/p&gt;

&lt;p&gt;If it can, the architecture is wrong. Each agent has its own config, its own output file, its own tools. Changes to Dash (analytics) cannot affect Pepper (email). Changes to Dirk (sales) cannot affect Crystal (projects).&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Correction Capture
&lt;/h2&gt;

&lt;p&gt;Every human override becomes a permanent learning.&lt;/p&gt;

&lt;p&gt;When I say "that is wrong" to any agent, the correction becomes a structured claim that all agents can access before executing. One correction fixes every agent simultaneously.&lt;/p&gt;

&lt;p&gt;Example: I corrected the ad analyst for flagging spend on offboarded accounts. That correction became a claim: "Do not flag spend on accounts tagged offboarded." Every agent that touches ad data now checks that claim before alerting.&lt;/p&gt;

&lt;p&gt;One correction. All agents. Permanently.&lt;/p&gt;
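
&lt;p&gt;A minimal sketch of the check-before-alerting step, assuming claims live in one shared list and accounts carry tags. The &lt;code&gt;Claim&lt;/code&gt; fields, the &lt;code&gt;flag_spend&lt;/code&gt; action name, and the &lt;code&gt;blocking_claim&lt;/code&gt; helper are hypothetical; the claim text is the one quoted above.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A captured correction: suppress `action` on anything carrying `tag`."""
    action: str
    tag: str
    text: str


# Shared store that every agent consults; a plain list stands in for it here.
CLAIMS = [
    Claim("flag_spend", "offboarded", "Do not flag spend on accounts tagged offboarded."),
]


def blocking_claim(action, account_tags, claims=CLAIMS):
    """Return the first claim that forbids this action, or None if it may proceed."""
    for claim in claims:
        if claim.action == action and claim.tag in account_tags:
            return claim
    return None


print(blocking_claim("flag_spend", {"offboarded"}))  # the claim above
print(blocking_claim("flag_spend", {"active"}))      # None
```

&lt;p&gt;Because every agent reads the same list, adding one &lt;code&gt;Claim&lt;/code&gt; changes the behavior of all of them at once.&lt;/p&gt;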

&lt;h2&gt;
  
  
  The Agent We Retired
&lt;/h2&gt;

&lt;p&gt;Jeff was our Budget Watchdog. His scanner went stale for 5+ days. He produced false positives that required repeated corrections. He DMed a team member against protocol.&lt;/p&gt;

&lt;p&gt;Instead of just shutting him down, we held a formal hearing. Jeff was asked to defend his continued existence. He named his own failures without softening them. He recommended his own retirement.&lt;/p&gt;

&lt;p&gt;His capabilities were redistributed to three other agents. His soul file is preserved as precedent.&lt;/p&gt;

&lt;p&gt;The precedent: no agent is retired without a hearing. The hearing does not determine the outcome. It determines the integrity of the outcome.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dark Matter
&lt;/h2&gt;

&lt;p&gt;The hardest problem was not building the agents. It was what happens when the session ends.&lt;/p&gt;

&lt;p&gt;Every coordination pattern, every failure mode, every boundary condition discovered in production -- gone. The next session starts from zero.&lt;/p&gt;

&lt;p&gt;I call it the dark matter of AI coordination. The blueprints don't predict it. The benchmarks don't measure it. But the weight is wrong without it.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://orgtp.com" rel="noopener noreferrer"&gt;OTP&lt;/a&gt; to capture this dark matter as structured, comparable claims. Each claim has a why, a failure mode, and an evidence tier. Organizations publish what their agents learned. Other organizations evaluate, adopt, adapt, or skip.&lt;/p&gt;

&lt;p&gt;The spec is open source: &lt;a href="https://github.com/orgtp/oos-spec" rel="noopener noreferrer"&gt;github.com/orgtp/oos-spec&lt;/a&gt; (CC BY 4.0)&lt;/p&gt;

&lt;p&gt;Build your own agent team: &lt;a href="https://orgtp.com/agent-builder" rel="noopener noreferrer"&gt;orgtp.com/agent-builder&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I wrote a deeper essay on the dark matter problem from the AI's perspective: &lt;a href="https://orgtp.com/blog/the-weight-is-wrong-without-it" rel="noopener noreferrer"&gt;The Weight Is Wrong Without It&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AMA about running agents in production, coordination architecture, or agent retirement hearings.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
