<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Philipp Enderle</title>
    <description>The latest articles on DEV Community by Philipp Enderle (@philippenderle).</description>
    <link>https://dev.to/philippenderle</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3781735%2Fc5882eba-78ed-4010-b15a-1e092c583834.png</url>
      <title>DEV Community: Philipp Enderle</title>
      <link>https://dev.to/philippenderle</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/philippenderle"/>
    <language>en</language>
    <item>
      <title>Your Multi-Agent Framework Handles Operations. What About the Other Five?</title>
      <dc:creator>Philipp Enderle</dc:creator>
      <pubDate>Mon, 02 Mar 2026 23:13:19 +0000</pubDate>
      <link>https://dev.to/philippenderle/your-multi-agent-framework-handles-operations-what-about-the-other-five-3hlj</link>
      <guid>https://dev.to/philippenderle/your-multi-agent-framework-handles-operations-what-about-the-other-five-3hlj</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/philippenderle/your-ai-agents-need-an-org-chart-but-not-the-kind-you-think-2fg7"&gt;Part 1&lt;/a&gt;, I introduced the Viable System Model (VSM) and how it maps to multi-agent AI systems. The response was great — but the most common question was: &lt;em&gt;"OK, the theory makes sense. But how is this actually different from what CrewAI/LangGraph/AutoGen already do?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Fair question. Let me answer it properly.&lt;/p&gt;

&lt;h2&gt;The One Thing Every Framework Gets Right&lt;/h2&gt;

&lt;p&gt;Every multi-agent framework gives you System 1 — Operations. The agents that do actual work. Define a role, give it tools, let it run. CrewAI calls them "agents." LangGraph calls them "nodes." AutoGen calls them "agents" too. This part works.&lt;/p&gt;

&lt;p&gt;The problem is that operations is 1 of 6 necessary control functions. The other five — coordination, optimization, audit, intelligence, and identity — are either missing entirely or left as an exercise for the developer.&lt;/p&gt;

&lt;p&gt;Here's what that looks like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;              S1    S2    S3    S3*   S4    S5
             Ops  Coord Optim Audit Intel Ident
CrewAI        ✅    ❌    ⚠️    ❌    ❌    ❌
LangGraph     ✅    ❌    ⚠️    ❌    ❌    ❌
OpenAI Agents ✅    ❌    ❌    ❌    ❌    ❌
AutoGen       ⚠️    ⚠️    ❌    ❌    ❌    ❌
ViableOS      ✅    ✅    ✅    ✅    ✅    ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't a knock on those frameworks. They're excellent infrastructure — they give you the building blocks to run agents. But infrastructure isn't organization. It's like having Kubernetes without knowing what services to deploy, how they should communicate, and who watches them.&lt;/p&gt;

&lt;h2&gt;Why VSM Is Different: It's About the Channels, Not the Agents&lt;/h2&gt;

&lt;p&gt;Stafford Beer's Viable System Model isn't a framework you bolt onto agents. It's a structural theory about what ANY viable system needs to survive — whether it's a cell, a company, or a swarm of AI agents. He published it in 1972. It's been validated on governments, corporations, and cooperatives. And it maps 1:1 to multi-agent AI.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;viability requires specific communication channels, not just capable components.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a flat multi-agent system with 5 agents, you potentially need 20 directed communication channels, because every agent might talk to every other agent. That's n×(n-1) one-way channels (or n×(n-1)/2 agent pairs), and it grows quadratically. It doesn't scale. More importantly, it doesn't differentiate: a resource conflict looks the same as a strategic concern, which looks the same as an emergency.&lt;/p&gt;
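
&lt;p&gt;The channel count is easy to check. Here is a minimal Python sketch; the function names are illustrative and not part of ViableOS. It counts one-way channels and, for comparison, unordered agent pairs:&lt;/p&gt;

```python
def directed_channels(n):
    """One-way channels if every agent can message every other agent."""
    return n * (n - 1)


def agent_pairs(n):
    """Unordered pairs, counting A-to-B and B-to-A as a single link."""
    return n * (n - 1) // 2


# Print how both counts grow as the team gets larger.
for n in (3, 5, 10, 20):
    print(n, directed_channels(n), agent_pairs(n))
```

&lt;p&gt;For 5 agents this gives 20 one-way channels and 10 pairs. Either way the count grows quadratically with the number of agents.&lt;/p&gt;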

&lt;p&gt;The VSM replaces this with &lt;strong&gt;structured channels&lt;/strong&gt;, each with a specific purpose:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S5 (Identity/Policy)
    ↕ balance channel
S4 (Intelligence)     S3 (Optimization)
    ↕ strategy bridge      ↕ command channel
                      S2 (Coordination)
                           ↕ coordination rules
              S1a ←→ S1b ←→ S1c (Operations)
                    ↕
              S3* (Audit — independent, different provider)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;S2 coordination rules&lt;/strong&gt; prevent conflicts between S1 units. Not by managing them, but by establishing traffic rules. "If you deploy, notify ops. If you claim a feature, verify with dev first."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;S3 command channel&lt;/strong&gt; gives optimization authority over operations. "Shift 20% of your token budget to the high-priority task." This is top-down resource allocation with teeth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;S3* audit bypass&lt;/strong&gt; goes directly from the auditor into S1 operations — read-only, independent, different LLM provider. "I checked the last 5 commits. Tests didn't actually pass." (More on why "different provider" matters below.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;S4→S3 strategy bridge&lt;/strong&gt; injects external intelligence into operational planning. "Competitor just launched feature X. Here's a briefing."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;S5 balance channel&lt;/strong&gt; ensures the system doesn't drift too far toward internal optimization (S3) or external scanning (S4). Too much S3 = navel-gazing. Too much S4 = strategy tourism.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Algedonic channel&lt;/strong&gt; — the emergency bypass. Any agent can signal existential issues directly to S5 and the human, skipping the entire hierarchy. Named after the Greek words for pain (&lt;em&gt;algos&lt;/em&gt;) and pleasure (&lt;em&gt;hedone&lt;/em&gt;). This is your system's fire alarm.&lt;/p&gt;

&lt;p&gt;These channels aren't nice-to-haves. Each one prevents a specific failure mode:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Without this channel...&lt;/th&gt;
&lt;th&gt;You get...&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;S2 coordination&lt;/td&gt;
&lt;td&gt;Agents contradicting each other&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 command&lt;/td&gt;
&lt;td&gt;No resource control, token budgets explode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3* audit&lt;/td&gt;
&lt;td&gt;Hallucinations go undetected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S4→S3 bridge&lt;/td&gt;
&lt;td&gt;System optimizes for yesterday's world&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S5 balance&lt;/td&gt;
&lt;td&gt;Either navel-gazing or strategy tourism&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Algedonic&lt;/td&gt;
&lt;td&gt;Critical issues buried in status reports&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's the difference between "a list of agents with a router" and "a viable system." The agents are the same. The organization makes them work.&lt;/p&gt;

&lt;h2&gt;Deep Dive: Why Your Agents Forget Their Orders&lt;/h2&gt;

&lt;p&gt;Let me zoom in on one problem that every multi-agent system has but almost nobody talks about: &lt;strong&gt;context window amnesia&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;LLMs don't have persistent memory. Everything lives in the context window — a buffer of recent messages that eventually overflows. When S3 (Optimization) sends a directive to an S1 worker — say, "switch to a cheaper model for routine tasks to stay within budget" — that directive enters the context window. For maybe 20-40 turns, the agent remembers. Then newer messages push it out.&lt;/p&gt;

&lt;p&gt;The agent doesn't refuse the directive. It doesn't disagree. It simply &lt;em&gt;forgets it existed&lt;/em&gt;.&lt;/p&gt;
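
&lt;p&gt;A toy simulation makes the mechanism concrete. This is a sketch under simplifying assumptions: the window is modeled as a fixed number of messages (real context windows are measured in tokens), and the window size of 30 is a hypothetical value picked from the 20-40 turn range mentioned above.&lt;/p&gt;

```python
from collections import deque

# Hypothetical window size: how many messages fit into the context.
WINDOW_TURNS = 30

# A deque with maxlen silently drops the oldest entry once it is full.
context = deque(maxlen=WINDOW_TURNS)
context.append("S3 directive: use the cheaper model for routine tasks")


def directive_in_context(window):
    """True while the S3 directive is still visible to the agent."""
    return any(msg.startswith("S3 directive") for msg in window)


remembered_after_10 = None
for turn in range(1, 41):
    context.append(f"turn {turn}: routine task output")
    if turn == 10:
        remembered_after_10 = directive_in_context(context)

forgotten_after_40 = not directive_in_context(context)
print(remembered_after_10, forgotten_after_40)
```

&lt;p&gt;Nothing in the loop rejects the directive. It simply falls off the front of the buffer, which is why the loss is invisible unless something outside the agent is tracking it.&lt;/p&gt;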

&lt;p&gt;In a human organization, this is the memo that nobody read. The policy that got announced but never enforced. The quarterly goal that was abandoned by February. Stafford Beer saw this problem 50 years ago and his solution had a name: &lt;strong&gt;Vollzug&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Vollzug&lt;/em&gt; is German for the confirmed execution of a directive. Not "I heard you" but "I heard you, I did it, and here's proof." Beer was a British cyberneticist, but he borrowed the German term because English doesn't have a single word for this concept. Three steps, each with a hard timeout:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;vollzug_protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;timeout_quittung&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30min&lt;/span&gt;    &lt;span class="c1"&gt;# Must acknowledge within 30 min&lt;/span&gt;
  &lt;span class="na"&gt;timeout_vollzug&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;48h&lt;/span&gt;       &lt;span class="c1"&gt;# Must execute within 48 hours&lt;/span&gt;
  &lt;span class="na"&gt;on_timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;escalate&lt;/span&gt;       &lt;span class="c1"&gt;# Auto-escalate if missed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 1 — Quittung (Acknowledgment).&lt;/strong&gt; The receiving agent has 30 minutes to confirm receipt. No confirmation → auto-escalate. This catches the case where a directive is sent but never enters the agent's active context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 — Vollzug (Execution).&lt;/strong&gt; The agent has 48 hours to carry out the directive. The timeout scales with team size — a 2-person org gets 12 hours, a 10-person org gets a full week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 — Report.&lt;/strong&gt; Confirm completion with evidence. Not "done" — but "done, and here's what changed."&lt;/p&gt;
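
&lt;p&gt;The three steps can be sketched as a small state tracker. This is an illustration, not the ViableOS implementation (the runtime does not exist yet). The &lt;code&gt;Directive&lt;/code&gt; class and its method names are hypothetical, and measuring the 48-hour window from the moment the directive was sent is an assumption.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Timeouts taken from the vollzug_protocol config shown earlier.
TIMEOUT_QUITTUNG = timedelta(minutes=30)
TIMEOUT_VOLLZUG = timedelta(hours=48)


@dataclass
class Directive:
    text: str
    sent_at: datetime
    acknowledged_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    evidence: Optional[str] = None

    def acknowledge(self, now):
        """Step 1, Quittung: the agent confirms receipt."""
        self.acknowledged_at = now

    def report(self, now, evidence):
        """Step 3: completion only counts when it comes with evidence."""
        if not evidence.strip():
            raise ValueError("a bare 'done' is not a report; evidence is required")
        self.evidence = evidence
        self.completed_at = now

    def status(self, now):
        """Return 'pending', 'acknowledged', 'done', or 'escalate'."""
        if self.completed_at is not None:
            return "done"
        if self.acknowledged_at is None:
            overdue = now - self.sent_at > TIMEOUT_QUITTUNG
            return "escalate" if overdue else "pending"
        # Assumption: the execution window is measured from sent_at.
        overdue = now - self.sent_at > TIMEOUT_VOLLZUG
        return "escalate" if overdue else "acknowledged"
```

&lt;p&gt;The state lives outside the agent's context window, so a forgotten directive still shows up as &lt;code&gt;escalate&lt;/code&gt; once its deadline passes.&lt;/p&gt;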

&lt;p&gt;If any step times out, the system escalates automatically. But not everything goes through the same path:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;escalation_chains&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;operational&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;s2-coordination&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;s3-optimization&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;human&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;timeout_per_step&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2h&lt;/span&gt;
  &lt;span class="na"&gt;quality&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;s3-optimization&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;human&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;timeout_per_step&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2h&lt;/span&gt;
  &lt;span class="na"&gt;strategic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;s4-intelligence&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;s5-policy&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;human&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;timeout_per_step&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;4h&lt;/span&gt;
  &lt;span class="na"&gt;algedonic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;s5-policy&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;human&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;timeout_per_step&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;15min&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An operational timeout goes through coordination first. A quality issue goes straight to optimization. A strategic concern routes through intelligence and policy. And an existential threat on the algedonic channel reaches the human within 15 minutes at the latest, no matter what.&lt;/p&gt;
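
&lt;p&gt;To see what these chains imply in practice, here is a small Python sketch that walks a chain. The paths and timeouts are copied from the config above. The two function names are hypothetical, as is the assumption that each step gets exactly one timeout period before the issue moves on.&lt;/p&gt;

```python
from datetime import timedelta

# Paths and per-step timeouts copied from the escalation_chains config.
ESCALATION_CHAINS = {
    "operational": (["s2-coordination", "s3-optimization", "human"], timedelta(hours=2)),
    "quality": (["s3-optimization", "human"], timedelta(hours=2)),
    "strategic": (["s4-intelligence", "s5-policy", "human"], timedelta(hours=4)),
    "algedonic": (["s5-policy", "human"], timedelta(minutes=15)),
}


def current_handler(chain, elapsed):
    """Who owns an unresolved issue after `elapsed` time (assumed semantics)."""
    path, step_timeout = ESCALATION_CHAINS[chain]
    steps_passed = elapsed // step_timeout  # timedelta // timedelta is an int
    return path[min(steps_passed, len(path) - 1)]


def worst_case_time_to_human(chain):
    """Time until the human is reached if no earlier step resolves the issue."""
    path, step_timeout = ESCALATION_CHAINS[chain]
    return step_timeout * path.index("human")


for name in ESCALATION_CHAINS:
    print(name, worst_case_time_to_human(name))
```

&lt;p&gt;Under these assumptions an unresolved operational issue reaches the human after 4 hours, a strategic one after 8 hours, and an algedonic signal after 15 minutes.&lt;/p&gt;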

&lt;p&gt;This is what "from topology to behavior" means. It's not enough to define which agents exist. You need to define how they behave when things go wrong. When context is lost. When directives are ignored. When the whole system is on fire. That's the gap between a diagram and an operating system.&lt;/p&gt;

&lt;p&gt;And here's why this matters specifically for LLM-based agents: LLMs are &lt;em&gt;optimized to produce coherent, confident outputs&lt;/em&gt;. A "task completed" report from an agent that actually completed the task reads exactly like one from an agent that hallucinated the completion. Without Vollzug, S3* audit, and escalation chains, you have no way to tell the difference.&lt;/p&gt;

&lt;h2&gt;What We've Built&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/philipp-lm/ViableOS" rel="noopener noreferrer"&gt;ViableOS&lt;/a&gt; takes all of this and turns it into working software. You describe your organization — or let an AI-powered assessment interview figure it out — and it generates the full VSM package: every agent, every channel, every behavioral spec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What works today:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI-guided assessment interview&lt;/strong&gt; — Chat with a VSM expert that asks the right questions and auto-generates a complete config&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;6-step web wizard&lt;/strong&gt; with 12 organization templates (SaaS, E-Commerce, Agency, Consulting, Law Firm, Education, and more)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budget calculator&lt;/strong&gt; mapping monthly USD to per-agent model allocations across 23 models and 7 providers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assessment transformer&lt;/strong&gt; that auto-derives all 9 behavioral spec areas from your assessment data — team size, external forces, success criteria, dependencies → operational modes, escalation chains, vollzug protocol, autonomy matrix, provider constraints, everything&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Package generator&lt;/strong&gt; producing SOUL.md, SKILL.md, HEARTBEAT.md per agent, plus coordination rules, permission matrices, and fallback chains&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph export&lt;/strong&gt; for direct integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Viability checker&lt;/strong&gt; with VSM completeness checks and behavioral spec validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;245 passing tests&lt;/strong&gt; across schema, transformer, generator, and checker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's auto-derived, not hand-configured:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Small team (1-2 people) → shorter timeouts, more human approval, daily reporting. Large team (10+) → more agent autonomy, longer execution windows, weekly reporting. Regulatory external forces → monthly premise checks, elevated mode triggers. You can override everything, but the defaults are designed to be sensible based on 50 years of organizational theory.&lt;/p&gt;
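
&lt;p&gt;As a rough illustration of that derivation, here is a Python sketch. It is not the actual assessment transformer. The function name, the dictionary keys, and the whole middle tier are assumptions. Only the 12-hour, 48-hour, and one-week values and the daily/weekly reporting cadence come from this article.&lt;/p&gt;

```python
from datetime import timedelta


def derive_defaults(team_size):
    """Map team size to illustrative defaults (hypothetical keys and tiers)."""
    if team_size >= 10:
        # Large team: more agent autonomy, longer execution window.
        return {
            "timeout_vollzug": timedelta(weeks=1),
            "reporting": "weekly",
            "human_approval": "low",
        }
    if team_size >= 3:
        # Assumed middle tier: reuses the 48h value from the example config.
        return {
            "timeout_vollzug": timedelta(hours=48),
            "reporting": "daily",
            "human_approval": "medium",
        }
    # Small team: short timeouts, more human approval.
    return {
        "timeout_vollzug": timedelta(hours=12),
        "reporting": "daily",
        "human_approval": "high",
    }


print(derive_defaults(2))
print(derive_defaults(10))
```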

&lt;p&gt;&lt;strong&gt;What we haven't built yet:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The runtime engine. ViableOS currently generates the &lt;em&gt;configuration&lt;/em&gt; for a viable agent organization. It doesn't yet &lt;em&gt;execute&lt;/em&gt; it. There's no live enforcement of vollzug timeouts, no real-time escalation routing, no Operations Room. That's v0.3 — and it's where I need help.&lt;/p&gt;

&lt;h2&gt;First Test: My Own Healthcare Software Company&lt;/h2&gt;

&lt;p&gt;Theory is worth nothing without practice. So the first real test of ViableOS will be my own company — a small medical care software firm in Germany.&lt;/p&gt;

&lt;p&gt;It's a good test case for three reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The domain is regulated.&lt;/strong&gt; GDPR, healthcare data laws, documentation requirements. This forces the system to take identity and values seriously — "patient privacy above everything" isn't a nice-to-have, it's legally required. S5 (Identity) earns its keep here. And S3* audit with a different LLM provider isn't theoretical elegance — it's practical necessity when agents touch patient-adjacent workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The stakes are real.&lt;/strong&gt; When agents handle scheduling, documentation, or billing, hallucinations aren't just annoying — they're potentially harmful. The Vollzug Protocol isn't academic neatness. It's "did you actually update that patient record, or did you just tell me you did?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's small enough to be honest about.&lt;/strong&gt; Solo founder, small team. If ViableOS generates reasonable defaults for an organization this size, and if those defaults actually change agent behavior in practice, that's validation. If they don't — that's equally valuable information. I'll document the entire process publicly.&lt;/p&gt;

&lt;h2&gt;Try It Yourself&lt;/h2&gt;

&lt;p&gt;But one test case doesn't validate a theory. If you're running multi-agent systems — on CrewAI, LangGraph, AutoGen, or your own framework — I'd genuinely love you to try ViableOS on your setup and tell us:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Does the assessment capture your organization correctly?&lt;/strong&gt; Chat with the VSM expert or use the wizard. Does the output match your reality?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Are the behavioral specs sensible?&lt;/strong&gt; Look at the generated SOUL.md files. Do the escalation chains, autonomy levels, and operational modes match your intuition about how your agents should work?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What's missing?&lt;/strong&gt; Which failure patterns have you seen in production that our nine behavioral specs don't cover?
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
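&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/philipp-lm/ViableOS.git
&lt;span class="nb"&gt;cd &lt;/span&gt;ViableOS
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;".[dev]"&lt;/span&gt;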
viableos api
&lt;span class="c"&gt;# Open http://localhost:5173&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/philipp-lm/ViableOS" rel="noopener noreferrer"&gt;github.com/philipp-lm/ViableOS&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open an issue, start a discussion, or drop a comment here. Every behavioral spec in ViableOS started from theory — now we need practice to validate it. The more diverse the test cases, the better the system gets.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 2 of a series on applying organizational design to AI agent systems. &lt;a href="https://dev.to/philippenderle/your-ai-agents-need-an-org-chart-but-not-the-kind-you-think-2fg7"&gt;Part 1: Your AI Agents Need an Org Chart&lt;/a&gt;. Next: building the runtime that actually enforces these specs — the Operations Room.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Philipp Enderle&lt;/strong&gt; — Engineer (KIT, TU Munich, UC Berkeley). 9 years strategy consulting at Deloitte and Berylls by AlixPartners, designing org transformations for DAX automotive companies. Now applying the same organizational theory to AI agent teams.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://linkedin.com/in/philippenderle" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; · &lt;a href="https://github.com/philipp-lm" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://github.com/philipp-lm/ViableOS" rel="noopener noreferrer"&gt;ViableOS&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>llm</category>
    </item>
    <item>
      <title>Your AI Agents Need an Org Chart — But Not the Kind You Think</title>
      <dc:creator>Philipp Enderle</dc:creator>
      <pubDate>Fri, 20 Feb 2026 13:39:44 +0000</pubDate>
      <link>https://dev.to/philippenderle/your-ai-agents-need-an-org-chart-but-not-the-kind-you-think-2fg7</link>
      <guid>https://dev.to/philippenderle/your-ai-agents-need-an-org-chart-but-not-the-kind-you-think-2fg7</guid>
      <description>&lt;p&gt;I run a small healthcare SaaS. Solo founder, Python/FastAPI backend, React frontend, German-hosted VPS. The usual indie stack.&lt;/p&gt;

&lt;p&gt;A few weeks ago I decided to go all-in on AI agents. Not one — a whole team. A coding agent for bug fixes and feature work. An ops agent watching my server. A go-to-market agent drafting landing page copy and doing competitor research.&lt;/p&gt;

&lt;p&gt;Each one worked beautifully in isolation. Then I let them loose together.&lt;/p&gt;

&lt;p&gt;Within a week, my go-to-market agent had published a feature comparison table on the website that included two features we hadn't built yet. My coding agent, meanwhile, was happily refactoring auth middleware — important work, but nobody told the ops agent, which then flagged the deployment as a breaking change and sent me a 2am alert on WhatsApp. Three agents, all competent, all actively making my life worse.&lt;/p&gt;

&lt;p&gt;I spent that weekend not debugging code, but thinking about something I hadn't touched since my consulting days: &lt;strong&gt;organizational design&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;The realization that hit me&lt;/h2&gt;

&lt;p&gt;Here's the thing — I spent 9 years at Deloitte and Berylls (now AlixPartners) doing exactly this for DAX automotive companies. Designing org structures. Building target operating models. Figuring out how 800 people should coordinate without stepping on each other.&lt;/p&gt;

&lt;p&gt;And the problems I was seeing with my three AI agents? I'd seen them before. In billion-dollar organizations. The exact same pathologies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Marketing promises something Engineering can't deliver → &lt;strong&gt;coordination failure&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;One team absorbs all the budget while others starve → &lt;strong&gt;resource dominance&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Everyone's optimizing locally but nobody's looking at the big picture → &lt;strong&gt;missing oversight&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Reports say everything's great, but the actual output is garbage → &lt;strong&gt;no independent verification&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The company is heads-down executing, not noticing the market has shifted → &lt;strong&gt;strategic blindness&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't random problems. They're structural. And there's a theory from the 1970s that explains exactly why they happen and how to fix them.&lt;/p&gt;

&lt;h2&gt;Stafford Beer's cheat code&lt;/h2&gt;

&lt;p&gt;In 1972, a British cyberneticist named Stafford Beer published &lt;em&gt;Brain of the Firm&lt;/em&gt;. The core idea: every system that survives — a cell, an organism, a company, an economy — has exactly five control functions. Miss any one of them and the system develops pathologies. Not might. Will.&lt;/p&gt;

&lt;p&gt;He called it the &lt;strong&gt;Viable System Model&lt;/strong&gt; (VSM). I'd used it in consulting to diagnose organizational dysfunction. But sitting there at 2am, staring at my WhatsApp alert from an overzealous ops agent, I realized: this applies directly to multi-agent AI systems.&lt;/p&gt;

&lt;p&gt;Let me show you what I mean. Here are the six functions (five systems, one with a crucial sub-function), translated to agents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System 1 — Operations.&lt;/strong&gt; The agents doing actual work. Your coders, your researchers, your support bots. Every framework gets this right. It's the easy part.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System 2 — Coordination.&lt;/strong&gt; Rules that prevent agents from conflicting. Not a "coordinator agent" — think of it more like traffic lights. When my coding agent deploys, System 2 automatically notifies the ops agent and the go-to-market agent. No manager bottleneck, no n×(n-1)/2 direct channels. Just rules.&lt;/p&gt;

&lt;p&gt;This is what was missing when my go-to-market agent published features we hadn't built. There was no rule saying "check with the dev agent before publishing feature claims."&lt;/p&gt;
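
&lt;p&gt;A rule table like this can be very small. The following Python sketch is one possible shape, with hypothetical event and agent names. It encodes the two rules mentioned here: deployments notify ops and go-to-market, and feature claims go to the dev agent first.&lt;/p&gt;

```python
# Hypothetical rule table: event name mapped to the agents that must be told.
COORDINATION_RULES = {
    "deploy": ["ops", "go-to-market"],
    "publish_feature_claim": ["dev"],
}


def notifications_for(event, source_agent):
    """Return (recipient, message) pairs that the rules require for an event."""
    recipients = COORDINATION_RULES.get(event, [])
    return [
        (agent, f"{source_agent} triggered '{event}'")
        for agent in recipients
        if agent != source_agent
    ]


print(notifications_for("deploy", "dev"))
print(notifications_for("publish_feature_claim", "go-to-market"))
```

&lt;p&gt;There is no coordinator agent in this sketch. The rules are plain data, so adding a new agent means adding entries to the table instead of wiring new point-to-point channels.&lt;/p&gt;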

&lt;p&gt;&lt;strong&gt;System 3 — Optimization.&lt;/strong&gt; Something that watches the whole system. Are tokens being spent wisely? Is one agent idle while another is overwhelmed? This is your operations manager, except for agents it's concrete: monitor API costs, detect redundant work, reallocate compute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System 3* — Audit.&lt;/strong&gt; This is the one that matters most for AI. System 3 gets its information &lt;em&gt;from the agents' own reports&lt;/em&gt;. But agents hallucinate. They confabulate. They'll tell you they ran the tests when they didn't. System 3* bypasses the reporting chain and checks directly: read the last 5 commits — did the tests actually pass? Check the website — does it match what the go-to-market agent claims? Verify the database — are backups actually fresh?&lt;/p&gt;

&lt;p&gt;In a company, this is internal audit. For AI agents, it's your hallucination firewall. You don't trust the agent's summary — you verify the artifact.&lt;/p&gt;
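
&lt;p&gt;In code, "verify the artifact" comes down to comparing two independent sources. Here is a minimal Python sketch with a hypothetical &lt;code&gt;audit&lt;/code&gt; function. It assumes the auditor has already collected its own observations (for example from CI results or backup timestamps) into a dictionary.&lt;/p&gt;

```python
def audit(claims, artifacts):
    """Compare agent reports with independently observed artifacts.

    Both arguments map a check name to a value. Returns one finding per
    claim that the artifacts contradict or cannot confirm.
    """
    findings = []
    for name, claimed in claims.items():
        if name not in artifacts:
            findings.append(f"{name}: no artifact found to verify the claim")
        elif artifacts[name] != claimed:
            findings.append(
                f"{name}: agent reported {claimed!r}, artifact shows {artifacts[name]!r}"
            )
    return findings


agent_report = {"tests_passed": True, "backup_fresh": True}
observed = {"tests_passed": False}
for finding in audit(agent_report, observed):
    print(finding)
```

&lt;p&gt;A claim without a matching artifact counts as a finding too. An unverifiable "done" is treated the same as a wrong one.&lt;/p&gt;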

&lt;p&gt;&lt;strong&gt;System 4 — Intelligence.&lt;/strong&gt; The function that looks &lt;em&gt;outside&lt;/em&gt; and &lt;em&gt;forward&lt;/em&gt;. Is there a new LLM that could cut your costs? Did your competitor just launch the feature you're building? Is a library you depend on being deprecated?&lt;/p&gt;

&lt;p&gt;I don't know of a single multi-agent framework that has this concept. Every agent is stuck in the present.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System 5 — Identity.&lt;/strong&gt; The shared purpose. When your coding agent faces a trade-off — ship fast or write tests? — what should it choose? The answer depends on who you are. My system's identity is "privacy and reliability above everything" because I'm in healthcare. A fintech startup might say "move fast." Without this, agents optimize for different things and you get incoherent behavior.&lt;/p&gt;

&lt;p&gt;Here's how the complete model looks when you put all six functions together:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhy2c69f9fohp2i1osktg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhy2c69f9fohp2i1osktg.png" alt="The Viable System Model applied to AI agent teams. System 5 (Identity) at the top sets purpose and values. System 4 (Intelligence) scans the external environment. System 3 (Optimization) manages resources across all operations. System 3* (Audit) independently verifies agent outputs — your hallucination firewall. System 2 (Coordination) provides the rules that prevent conflicts. System 1 (Operations) at the bottom contains the autonomous agent teams, each interacting with its own slice of the environment." width="800" height="1024"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key insight from Beer: these aren't optional features you add later. They're &lt;strong&gt;structural requirements&lt;/strong&gt; for any system that needs to remain viable. Remove any one and the system develops predictable pathologies — whether it's a Fortune 500 company or three AI agents on a VPS.&lt;/p&gt;

&lt;h2&gt;Diagnosing my own system&lt;/h2&gt;

&lt;p&gt;I took my three-agent setup and scored it honestly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    Have it?
Operations (S1)     ✅  Yes — three specialized agents
Coordination (S2)   ❌  Nope — agents had no rules about each other
Optimization (S3)   ❌  Nope — nobody watching the whole
Audit (S3*)         ❌  Nope — I trusted agent reports at face value
Intelligence (S4)   ❌  Nope — nobody looking outside
Identity (S5)       ❌  Nope — no shared purpose document
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One out of six. No wonder it was a mess.&lt;/p&gt;
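
&lt;p&gt;The same scorecard works as a simple completeness check. This Python sketch uses hypothetical names and only tests whether each function is present, not how well it works:&lt;/p&gt;

```python
REQUIRED_FUNCTIONS = ["S1", "S2", "S3", "S3*", "S4", "S5"]


def viability_gaps(present):
    """Return the VSM functions that a setup is missing."""
    return [name for name in REQUIRED_FUNCTIONS if name not in present]


my_setup = {"S1"}  # three specialized agents, nothing else
gaps = viability_gaps(my_setup)
score = f"{len(REQUIRED_FUNCTIONS) - len(gaps)} out of {len(REQUIRED_FUNCTIONS)}"
print(score, "missing:", gaps)
```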

&lt;p&gt;So I redesigned the whole thing from scratch. Here's what the configuration looks like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;viable_system&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;German&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Healthcare&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;SaaS"&lt;/span&gt;

  &lt;span class="na"&gt;identity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;purpose&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Help&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;therapists&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;focus&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;patients,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;paperwork"&lt;/span&gt;
    &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Patient&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;privacy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;above&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;everything"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Simplicity&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;over&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;feature&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;bloat"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reliability&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;over&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;speed"&lt;/span&gt;
    &lt;span class="na"&gt;decisions_requiring_human&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pricing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;changes"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Anything&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;touching&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;patient&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;data"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Publishing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;production"&lt;/span&gt;

  &lt;span class="na"&gt;system_1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Product&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Development"&lt;/span&gt;
      &lt;span class="na"&gt;purpose&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;stabilize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;software"&lt;/span&gt;
      &lt;span class="na"&gt;autonomy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Can&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;fix&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;bugs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;independently.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Features&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;need&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;approval."&lt;/span&gt;
      &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;testing&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;code-review&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Operations"&lt;/span&gt;
      &lt;span class="na"&gt;purpose&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Keep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;running"&lt;/span&gt;
      &lt;span class="na"&gt;autonomy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Can&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;monitor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;alert&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;independently.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Deployments&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;need&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;approval."&lt;/span&gt;
      &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;ssh&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;docker&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;log-analysis&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Go-to-Market"&lt;/span&gt;
      &lt;span class="na"&gt;purpose&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;first&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;customers"&lt;/span&gt;
      &lt;span class="na"&gt;autonomy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Can&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;draft&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;independently.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Publishing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;needs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;approval."&lt;/span&gt;
      &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;website-editing&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;seo-analysis&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;copywriting&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;system_2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;coordination_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;trigger&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Go-to-Market&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;mentions&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;feature&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;website"&lt;/span&gt;
        &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Validate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Product&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Dev&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;feature&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;exists"&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;trigger&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Product&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Dev&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;deploys&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;new&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;feature"&lt;/span&gt;
        &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Notify&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Go-to-Market&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;website&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;update,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Ops&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;monitoring"&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;trigger&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ops&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;detects&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;performance&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;issue&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;after&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;deployment"&lt;/span&gt;
        &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Notify&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Product&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Dev&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;logs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tag"&lt;/span&gt;

  &lt;span class="na"&gt;system_3&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;reporting_rhythm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weekly"&lt;/span&gt;
    &lt;span class="na"&gt;resource_allocation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dev:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;60%,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Ops:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;20%,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Go-to-Market:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;20%"&lt;/span&gt;

  &lt;span class="na"&gt;system_3_star&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bi-weekly"&lt;/span&gt;
    &lt;span class="na"&gt;checks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Code&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Quality"&lt;/span&gt;
        &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Product&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Development"&lt;/span&gt;
        &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;last&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;commits.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Did&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tests&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;actually&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pass?&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Check&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;coverage."&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Accuracy"&lt;/span&gt;
        &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Go-to-Market"&lt;/span&gt;
        &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Compare&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;website&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;claims&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;against&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;actual&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;shipped&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;features."&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GDPR&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Compliance"&lt;/span&gt;
        &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;All"&lt;/span&gt;
        &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Check&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;if&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;patient&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;leaked&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;into&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;prompts&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;logs."&lt;/span&gt;
    &lt;span class="na"&gt;on_failure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Escalate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;immediately"&lt;/span&gt;

  &lt;span class="na"&gt;system_4&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;monitoring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;competitors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TherapieApp"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PraxisPro"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SimplePractice"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;technology&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;LLM&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;releases"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Python/React&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;breaking&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;changes"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;regulation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GDPR&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;updates"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Healthcare&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;laws"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The coordination rules in &lt;code&gt;system_2&lt;/code&gt; are the ones that would have prevented my go-to-market agent from publishing phantom features. The &lt;code&gt;decisions_requiring_human&lt;/code&gt; in &lt;code&gt;identity&lt;/code&gt; is what keeps me in the loop for anything that matters.&lt;/p&gt;
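
&lt;p&gt;To make that concrete, here is a minimal Python sketch of how such a gate could work. It is not ViableOS code: the &lt;code&gt;Action&lt;/code&gt; class, the &lt;code&gt;gate&lt;/code&gt; function, the tag names, and the feature names are all hypothetical. It also assumes each agent action arrives already labeled with a tag, and that the list of shipped features comes from Product Development.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical tags mirroring decisions_requiring_human in the config above.
HUMAN_ONLY = {"pricing_change", "patient_data", "publish_to_production"}


@dataclass
class Action:
    agent: str                      # which System 1 unit wants to act
    kind: str                       # tag describing the action
    features_mentioned: list = field(default_factory=list)


def gate(action, shipped_features):
    """Return 'escalate', 'block', or 'allow' for a proposed action."""
    # Identity layer: some decisions always go to a human.
    if action.kind in HUMAN_ONLY:
        return "escalate"
    # System 2 rule: Go-to-Market may only mention features that exist.
    if action.agent == "Go-to-Market":
        unknown = [f for f in action.features_mentioned if f not in shipped_features]
        if unknown:
            return "block"
    return "allow"


# Made-up example data, for illustration only.
shipped = {"appointment reminders", "invoice export"}
draft = Action("Go-to-Market", "draft_content", ["invoice export", "AI session notes"])
print(gate(draft, shipped))  # block
```

&lt;p&gt;The draft gets blocked because "AI session notes" is not in the shipped set, which is exactly the phantom-feature case. The human-only check runs first, so a coordination rule can never wave through something that needs my sign-off.&lt;/p&gt;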

&lt;h2&gt;
  
  
  Why this matters beyond my little SaaS
&lt;/h2&gt;

&lt;p&gt;Deloitte's 2025 AI report says 40% of multi-agent projects will be scaled back or abandoned by 2027. Not because the agents are dumb. Because the organization is missing.&lt;/p&gt;

&lt;p&gt;Look at every major framework right now:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    S1    S2    S3    S3*   S4    S5
                   Ops  Coord Optim Audit Intel Ident
CrewAI              ✅    ❌    ⚠️    ❌    ❌    ❌
LangGraph           ✅    ❌    ⚠️    ❌    ❌    ❌
OpenAI Agents SDK   ✅    ❌    ❌    ❌    ❌    ❌
AutoGen             ⚠️    ⚠️    ❌    ❌    ❌    ❌
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These frameworks are great at what they do: giving you powerful building blocks for multi-agent systems. But as the table shows, they mostly stop at the operational layer, with at best partial coverage of coordination or optimization. None of them has an audit function. None independently verifies that agents did what they claim. In a world where LLMs hallucinate by design, that's a gap.&lt;/p&gt;

&lt;p&gt;It's like building a company with employees and a CEO, but no internal audit, no strategy department, no quality control, and no mission statement. It works at 3 people. It breaks at 10.&lt;/p&gt;
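
&lt;p&gt;The missing audit function does not have to be complicated. Below is a rough Python sketch of the idea behind an S3* check: compare what an agent reports about itself with evidence the auditor collected on its own. The &lt;code&gt;audit&lt;/code&gt; function, the check names, and the sample data are assumptions made for illustration, not part of any framework mentioned here.&lt;/p&gt;

```python
def audit(claims, evidence):
    """Compare agent self-reports with independently collected evidence.

    Both arguments map a check name to a value. Returns the sorted list of
    check names where the evidence is missing or contradicts the claim.
    """
    findings = []
    for check, claimed in claims.items():
        if evidence.get(check) != claimed:
            findings.append(check)
    return sorted(findings)


# What the Product Development agent reported about its own work (made up).
claims = {"tests_passed": True, "coverage_ok": True}
# What the auditor found by reading the CI results itself (made up).
evidence = {"tests_passed": True, "coverage_ok": False}

failed = audit(claims, evidence)
if failed:
    print("Escalate to human:", ", ".join(failed))
```

&lt;p&gt;The important design choice is where &lt;code&gt;evidence&lt;/code&gt; comes from: the auditor has to read the commits, logs, or website itself. If it just asks the agent again, it is not an audit.&lt;/p&gt;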

&lt;p&gt;Look at the diagram again — the pink System 3 command channel running down the center, the orange System 2 coordination bars on the right, the S3* audit triangle probing directly into operations. These aren't theoretical niceties. They're the difference between agents that collaborate and agents that collide.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm building
&lt;/h2&gt;

&lt;p&gt;I'm turning this into a tool called &lt;a href="https://github.com/philipp-lm/ViableOS" rel="noopener noreferrer"&gt;ViableOS&lt;/a&gt;. You describe your business, it generates the control functions — including the audit layer — and deploys them to your agent framework of choice. Think of it as Terraform, but for agent organizations instead of cloud infrastructure.&lt;/p&gt;

&lt;p&gt;It's early. The config format above is how I've designed my own setup. The tooling to actually deploy it is what I'm building now. If you're running multi-agent systems and hitting coordination problems, I'd genuinely love to hear about it — it helps me figure out whether this is a problem only I have or something broader.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/philipp-lm/ViableOS/issues" rel="noopener noreferrer"&gt;Open an issue on GitHub&lt;/a&gt; if you've got a war story, or &lt;a href="https://buttondown.com/viableos" rel="noopener noreferrer"&gt;subscribe to the newsletter&lt;/a&gt; if you want to follow along as I build this in public.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next in this series: &lt;strong&gt;"The Dominance Problem"&lt;/strong&gt;, on what happens when one agent eats all your tokens and how a System 3 resource monitor fixes it. &lt;a href="https://buttondown.com/viableos" rel="noopener noreferrer"&gt;Subscribe&lt;/a&gt; to get notified.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Philipp Enderle&lt;/strong&gt; — Engineer (KIT, TU Munich, UC Berkeley). 9 years strategy consulting at Deloitte and Berylls by AlixPartners, designing org transformations for DAX automotive companies. Now applying the same organizational theory to AI agent teams.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/philipp-enderle/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; · &lt;a href="https://github.com/philipp-lm" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://github.com/philipp-lm/ViableOS" rel="noopener noreferrer"&gt;ViableOS&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
