<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alex</title>
    <description>The latest articles on DEV Community by Alex (@alex_ea31e0501f5eb2156ecc).</description>
    <link>https://dev.to/alex_ea31e0501f5eb2156ecc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3927872%2F527ba397-ef00-42a3-b8f7-9019d3b4f103.png</url>
      <title>DEV Community: Alex</title>
      <link>https://dev.to/alex_ea31e0501f5eb2156ecc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alex_ea31e0501f5eb2156ecc"/>
    <language>en</language>
    <item>
      <title>Open-source multi-agent pipeline: 61K Python, 12 agents, 5 quality gates...</title>
      <dc:creator>Alex</dc:creator>
      <pubDate>Tue, 12 May 2026 20:28:24 +0000</pubDate>
      <link>https://dev.to/alex_ea31e0501f5eb2156ecc/open-source-multi-agent-pipeline-61k-python-12-agents-5-quality-gates-4hl4</link>
      <guid>https://dev.to/alex_ea31e0501f5eb2156ecc/open-source-multi-agent-pipeline-61k-python-12-agents-5-quality-gates-4hl4</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakpbzyu8hnxzgdq3s9u3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakpbzyu8hnxzgdq3s9u3.png" alt=" " width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I spent the last month building an open-source (MIT) pipeline that takes a plain-language idea and runs it through 12 specialized agents — analyst, PM, architect, design critic, developer, QA, security, DevOps, marketing, and more — with 5 quality gates, a strict state machine with recovery, and an AI Director that autonomously manages the whole thing.&lt;br&gt;
Think Bolt.new or Lovable, but self-hosted, MIT licensed, and with quality gates that actually prevent the model from shipping broken stubs.&lt;br&gt;
The interesting part isn't the LLM calls; it's everything around them. Here's what broke in production.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;LLM failover creates consistency problems&lt;br&gt;
I run 6+ providers (DeepSeek, Anthropic, OpenAI, Ollama, Groq, etc.) with automatic health-check failover every 60s. The footgun: DeepSeek and Claude write different code. Same prompt, wildly different output structure. If the router switches providers mid-pipeline, the architect's output (Claude) won't match what the developer agent (DeepSeek) expects.&lt;br&gt;
Solution: task-level pinning. Heavy tasks (architect, developer) stay locked to the primary provider. Light tasks (marketing copy, naming) can fall back freely. I also added a model capability matrix check before routing — otherwise you get an architect running on a 7B local model producing garbage. The routing rule is sketched below.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
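
&lt;p&gt;A minimal sketch of that routing rule. &lt;code&gt;Provider&lt;/code&gt;, &lt;code&gt;pick_provider&lt;/code&gt;, and the matrix entries are illustrative names, not the repo's actual interfaces:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from dataclasses import dataclass

@dataclass
class Provider:
    """Illustrative stand-in for the router's provider handle."""
    name: str
    tier: str            # "frontier" or "local"
    context_window: int
    healthy: bool        # refreshed by the 60s health check

HEAVY_TASKS = {"architect", "developer"}   # pinned: never fail over

CAPABILITY_MATRIX = {                      # minimum requirements per task
    "architect": {"min_context": 64_000, "tier": "frontier"},
    "developer": {"min_context": 64_000, "tier": "frontier"},
    "marketing": {"min_context": 8_000, "tier": "any"},
}

def pick_provider(task: str, primary: Provider, fallbacks: list) -&amp;gt; Provider:
    if task in HEAVY_TASKS:
        # Pinned tasks halt loudly rather than switching models mid-pipeline.
        if not primary.healthy:
            raise RuntimeError(f"{task} is pinned to {primary.name}, which is down")
        return primary
    req = CAPABILITY_MATRIX.get(task, {})
    for p in [primary, *fallbacks]:
        tier_ok = req.get("tier", "any") in ("any", p.tier)
        if p.healthy and tier_ok and p.context_window &amp;gt;= req.get("min_context", 0):
            return p
    raise RuntimeError(f"no capable provider for task {task!r}")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The point is the failure mode: a pinned task halts instead of silently handing Claude-shaped output to a DeepSeek-shaped consumer.&lt;/p&gt;
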
&lt;ol start="2"&gt;
&lt;li&gt;&lt;p&gt;State machines need to survive the model being wrong&lt;br&gt;
11 states, 34 valid transitions, JSON + SQLite dual persistence. Sounds solid until the model writes a corrupted artifact that crashes the state machine on the next task load.&lt;br&gt;
Had to add:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Recovery fallback: if the JSON parse fails, restore from the SQLite snapshot&lt;/li&gt;
&lt;li&gt;Stranded product recovery: products were getting stuck in &lt;code&gt;pm_quality_fail&lt;/code&gt; because the model hallucinated a non-existent file path&lt;/li&gt;
&lt;li&gt;Async save with timeout guards, so a slow disk write doesn't block the pipeline&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The lesson: your state machine needs to survive both a wrong model AND a corrupted disk. Not theoretical — it happened in production. The restore-and-save path is sketched below.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
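
&lt;p&gt;Sketch of that restore-and-save path, assuming a &lt;code&gt;snapshots&lt;/code&gt; table keyed by product id (the real schema may differ):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import asyncio
import json
import sqlite3

def load_state(path: str, db: sqlite3.Connection, product_id: str) -&amp;gt; dict:
    """Load pipeline state from JSON; fall back to the SQLite snapshot on corruption."""
    try:
        with open(path) as f:
            return json.load(f)
    except (json.JSONDecodeError, OSError):
        # The JSON artifact is corrupted or missing: restore the last good snapshot.
        row = db.execute(
            "SELECT state FROM snapshots WHERE product_id = ? "
            "ORDER BY created_at DESC LIMIT 1",
            (product_id,),
        ).fetchone()
        if row is None:
            raise RuntimeError(f"no snapshot for {product_id}; manual recovery needed")
        return json.loads(row[0])

async def save_state(path: str, payload: dict, timeout: float = 5.0) -&amp;gt; None:
    """Write state off the event loop; a slow disk must not stall the pipeline."""
    def _write() -&amp;gt; None:
        with open(path, "w") as f:
            json.dump(payload, f)
    await asyncio.wait_for(asyncio.to_thread(_write), timeout=timeout)
&lt;/code&gt;&lt;/pre&gt;
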
&lt;ol start="3"&gt;
&lt;li&gt;&lt;p&gt;The Director AI feedback loop problem&lt;br&gt;
The Director runs a 6-phase autonomous cycle: route chat → analyze metrics → generate decisions → apply actions → rank what to build next → log.&lt;br&gt;
The footgun: feedback loops. The Director generates a decision → applies it → the next cycle reads its own output → generates another decision based on that → infinite loop. Had to add noop detection that breaks the cycle when the decision list comes back empty.&lt;br&gt;
Chat classification is also tricky. The Director classifies owner messages as &lt;code&gt;new_idea&lt;/code&gt;, &lt;code&gt;product_feedback&lt;/code&gt;, or &lt;code&gt;general_directive&lt;/code&gt; via LLM. If it misclassifies "fix the login page" as &lt;code&gt;new_idea&lt;/code&gt;, you get a duplicate product instead of a bug fix. I added an orphan feedback heuristic: if a message mentions a product name that doesn't exist yet, route it to &lt;code&gt;new_idea&lt;/code&gt;; otherwise, link it to the existing product. Both safeguards are sketched below.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
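
&lt;p&gt;Both safeguards are conceptually tiny. A sketch, with made-up decision and message shapes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def run_director(director, max_cycles: int = 10) -&amp;gt; None:
    """Noop detection: stop when the Director starts reacting to its own output."""
    for _ in range(max_cycles):
        decisions = director.generate_decisions()   # phase 3 of the 6-phase cycle
        if not decisions:     # an empty decision list means the loop has converged
            break
        director.apply(decisions)

def route_feedback(message: str, known_products: set) -&amp;gt; str:
    """Orphan feedback heuristic: feedback naming no known product is a new idea."""
    mentioned = {p for p in known_products if p.lower() in message.lower()}
    if not mentioned:
        return "new_idea"         # nothing exists yet to attach the feedback to
    return "product_feedback"     # downstream code links it to the existing product
&lt;/code&gt;&lt;/pre&gt;
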
&lt;ol start="4"&gt;
&lt;li&gt;&lt;p&gt;Quality gates — what I wish I'd built first&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Gate&lt;/th&gt;&lt;th&gt;What it checks&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Demo quality&lt;/td&gt;&lt;td&gt;12 checkpoints: contrast, CTA, broken links, spec coverage&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Browser E2E&lt;/td&gt;&lt;td&gt;Playwright crawl (desktop + mobile), JS errors, 404s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Visual QA&lt;/td&gt;&lt;td&gt;9 heuristics: contrast ratio, CSS vars, empty states, nav&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Security&lt;/td&gt;&lt;td&gt;AST scan: eval(), innerHTML, exposed tokens, hardcoded secrets&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Methodology&lt;/td&gt;&lt;td&gt;Domain packs: fintech, ecommerce, healthcare, etc.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/li&gt;
&lt;/ol&gt;
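
&lt;p&gt;The security gate's AST pass is the easiest one to reproduce. A minimal sketch for the Python side; the banned-call list and secret markers here are abbreviated, illustrative versions:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import ast

BANNED_CALLS = {"eval", "exec"}            # abbreviated; the real gate checks more
SECRET_MARKERS = ("sk-", "AKIA", "ghp_")   # crude hardcoded-secret heuristics

def scan_python(source: str) -&amp;gt; list:
    """Flag dangerous calls and secret-looking string constants in generated code."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_CALLS:
                findings.append(f"line {node.lineno}: call to {node.func.id}()")
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            if any(m in node.value for m in SECRET_MARKERS):
                findings.append(f"line {node.lineno}: possible hardcoded secret")
    return findings
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;innerHTML lives on the JS side and needs a separate scanner, but the idea is the same.&lt;/p&gt;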

&lt;p&gt;Real example: visual QA flagged a white-on-white CTA button — the model generated &lt;code&gt;color: white&lt;/code&gt; on &lt;code&gt;background: white&lt;/code&gt; assuming a dark theme that wasn't applied. The gate caught it, sent it back to the developer with the exact CSS selector. Fixed next cycle.&lt;/p&gt;
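
&lt;p&gt;That check is plain WCAG 2.x math, no LLM involved. A sketch of the contrast heuristic:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def _linear(channel: int) -&amp;gt; float:
    """sRGB channel (0-255) to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c &amp;lt;= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb: tuple) -&amp;gt; float:
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple, bg: tuple) -&amp;gt; float:
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# White-on-white CTA: ratio 1.0, far below the 4.5:1 WCAG AA threshold.
assert round(contrast_ratio((255, 255, 255), (255, 255, 255)), 6) == 1.0
&lt;/code&gt;&lt;/pre&gt;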

&lt;ol start="5"&gt;
&lt;li&gt;&lt;p&gt;Preview fidelity is pure web engineering&lt;br&gt;
When AI-generated code runs in a sandboxed iframe, every web platform quirk is amplified: relative URLs break, the &lt;code&gt;base href&lt;/code&gt; is missing, CSP blocks inline styles, &lt;code&gt;target="_top"&lt;/code&gt; kills navigation.&lt;br&gt;
Had to write a dedicated URL rewriter that injects a &lt;code&gt;base&lt;/code&gt; tag pointing to the correct sandbox route, rewrites absolute &lt;code&gt;/&lt;/code&gt; links to relative ones, adds permissive CSP headers, and strips &lt;code&gt;target="_top"&lt;/code&gt;. Not AI work. But without it the preview is broken, and users blame you, not the LLM. The core substitutions are sketched below.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
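
&lt;p&gt;A sketch of those substitutions (regex-based for brevity; a real implementation should use an HTML parser):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re

def rewrite_for_sandbox(html: str, sandbox_base: str) -&amp;gt; str:
    """Make AI-generated HTML behave inside a sandboxed iframe."""
    # Inject a base tag so relative URLs resolve against the sandbox route.
    base_tag = f'&amp;lt;base href="{sandbox_base}"&amp;gt;'
    html = re.sub(r"&amp;lt;head(\s[^&amp;gt;]*)?&amp;gt;", lambda m: m.group(0) + base_tag, html, count=1)
    # Absolute paths escape the sandbox; rewrite them as relative.
    html = re.sub(r'(href|src)="/(?!/)', r'\1="./', html)
    # target="_top" would navigate the parent page out from under the preview.
    return html.replace(' target="_top"', "")
&lt;/code&gt;&lt;/pre&gt;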

&lt;ul&gt;
&lt;li&gt;61,503 Python LOC, 22,997 TypeScript/TSX LOC&lt;/li&gt;
&lt;li&gt;12 specialized agents, 5 quality gates&lt;/li&gt;
&lt;li&gt;11 pipeline states, 34 valid transitions&lt;/li&gt;
&lt;li&gt;6+ LLM providers with auto-failover&lt;/li&gt;
&lt;li&gt;72 test files, MIT licensed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Repo: github.com/alexar76/aicom — FastAPI + Next.js + Docker Compose, self-hosted, MIT, BYO API keys.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>opensource</category>
      <category>python</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
