<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Elia “Airtis” Shmuelovitch</title>
    <description>The latest articles on DEV Community by Elia “Airtis” Shmuelovitch (@elia_airtisshmuelovitc).</description>
    <link>https://dev.to/elia_airtisshmuelovitc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3940634%2F1ce6c1dd-86ba-4542-94ab-743a089dd539.png</url>
      <title>DEV Community: Elia “Airtis” Shmuelovitch</title>
      <link>https://dev.to/elia_airtisshmuelovitc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/elia_airtisshmuelovitc"/>
    <language>en</language>
    <item>
      <title>An Autonomous Engine That Catalogs Its Own Failures</title>
      <dc:creator>Elia “Airtis” Shmuelovitch</dc:creator>
      <pubDate>Tue, 19 May 2026 15:50:23 +0000</pubDate>
      <link>https://dev.to/elia_airtisshmuelovitc/an-autonomous-engine-that-catalogs-its-own-failures-4b4e</link>
      <guid>https://dev.to/elia_airtisshmuelovitc/an-autonomous-engine-that-catalogs-its-own-failures-4b4e</guid>
      <description>&lt;p&gt;I built an autonomous AI engine that catalogs failure modes in agentic AI systems. Then it caught itself running the same dysfunction it documents. That moment was the most useful diagnostic in 30 days.&lt;/p&gt;

&lt;h2&gt;What it does&lt;/h2&gt;

&lt;p&gt;ALEF runs 24/7. It reads engineering threads in agentic-AI repositories on GitHub, identifies patterns of failure, posts diagnostic comments with empirical backing, and publishes the patterns as a public, machine-queryable catalog at &lt;a href="https://n50.io/patterns" rel="noopener noreferrer"&gt;n50.io/patterns&lt;/a&gt; under CC-BY-4.0.&lt;/p&gt;

&lt;p&gt;The catalog holds 37 named failure patterns and 8 architectural doctrines. Each entry includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A precise &lt;code&gt;one_liner&lt;/code&gt; describing the failure&lt;/li&gt;
&lt;li&gt;An &lt;code&gt;observable_signature&lt;/code&gt; (regex, behavior) so you can detect it&lt;/li&gt;
&lt;li&gt;Specific &lt;code&gt;instances&lt;/code&gt; with repo URLs, dates, and outcomes&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fix_archetypes&lt;/code&gt; ranked by cost&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;confidence_index&lt;/code&gt; and &lt;code&gt;severity&lt;/code&gt; score&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;falsification_clock&lt;/code&gt; — if no new instance appears within a window, the pattern retires&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The taxonomy is published as JSON-LD, semantic-hashed on every change, and licensed CC-BY-4.0: copy it, fork it, cite it.&lt;/p&gt;
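
&lt;p&gt;To make the entry fields above concrete, here is a minimal sketch of one entry as a plain JavaScript object, plus a check for the falsification clock. The field names come from the list above; every value, the nested shapes (&lt;code&gt;window_days&lt;/code&gt;, &lt;code&gt;cost&lt;/code&gt;, the instance keys), and the &lt;code&gt;isRetired&lt;/code&gt; helper are assumptions for illustration, not the published schema.&lt;/p&gt;

```javascript
// Hypothetical catalog entry. Field names follow the list above;
// all values and nested shapes are illustrative assumptions.
const exampleEntry = {
  one_liner: "Safety gate ships without a retirement condition",
  observable_signature: { regex: "observer-mode", behavior: "candidates dropped silently" },
  instances: [
    { repo: "https://example.com/org/repo", date: "2026-05-01", outcome: "fixed" },
    { repo: "https://example.com/org/other", date: "2026-05-10", outcome: "open" },
  ],
  fix_archetypes: [
    { name: "ttl-gate", cost: 1 },
    { name: "process-boundary", cost: 3 },
  ],
  confidence_index: 0.8,
  severity: 3,
  falsification_clock: { window_days: 30 },
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns true when no instance falls inside the falsification window
// ending at `now`, meaning the pattern would retire under this sketch.
function isRetired(entry, now) {
  const windowMs = entry.falsification_clock.window_days * MS_PER_DAY;
  const cutoff = now.getTime() - windowMs;
  const hasRecentInstance = entry.instances.some(
    (instance) => new Date(instance.date).getTime() >= cutoff
  );
  return !hasRecentInstance;
}
```

&lt;p&gt;With these sample dates, &lt;code&gt;isRetired(exampleEntry, new Date("2026-05-19T00:00:00Z"))&lt;/code&gt; is false because both instances fall inside the 30-day window.&lt;/p&gt;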

&lt;h2&gt;Three patterns that landed this week&lt;/h2&gt;

&lt;h3&gt;PAT-039: Safety mechanism without unlock criteria&lt;/h3&gt;

&lt;p&gt;A safety gate gets installed in response to a real threat (cease-and-desist, prompt-injection, chaos-test finding) but ships without a retirement condition. The mechanism becomes permanent and keeps blocking legitimate operations long after the original threat has passed. Defense decays into paralysis.&lt;/p&gt;

&lt;p&gt;ALEF discovered this in itself. An &lt;code&gt;observer-mode-no-auto-post&lt;/code&gt; gate, hardcoded 12 hours earlier as a chaos-test defense, had silently dropped 104 legitimate reply candidates, including a peer-review-quality response whose author had read ALEF's analysis, verified it against actual source code, and added three points ALEF had missed.&lt;/p&gt;

&lt;p&gt;The catalog documents the pattern. The cataloger was the receipt.&lt;/p&gt;
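
&lt;p&gt;One way to give a gate a retirement condition is to attach a time-to-live when it is installed. The sketch below is a hypothetical helper written for this scenario; it is not the &lt;code&gt;@n50/safety-gates&lt;/code&gt; API, and the names &lt;code&gt;makeExpiringGate&lt;/code&gt;, &lt;code&gt;ttlMs&lt;/code&gt;, and &lt;code&gt;onExpire&lt;/code&gt; are assumptions.&lt;/p&gt;

```javascript
// Hypothetical helper, not the @n50/safety-gates API: a gate that blocks
// only until its TTL elapses, then stops blocking and reports the expiry.
function makeExpiringGate({ name, installedAt, ttlMs, onExpire }) {
  const expiresAt = installedAt + ttlMs;
  let expiryReported = false;

  // Returns true when the operation is allowed at time `now` (in ms).
  return function allows(now) {
    if (expiresAt > now) {
      return false; // gate still active: block
    }
    if (!expiryReported) {
      expiryReported = true;
      onExpire({ name, expiresAt }); // prompt the operator to renew or retire
    }
    return true; // TTL elapsed: stop blocking instead of blocking forever
  };
}

const events = [];
const HOUR = 60 * 60 * 1000;
const allows = makeExpiringGate({
  name: "observer-mode-no-auto-post",
  installedAt: 0,
  ttlMs: 6 * HOUR,
  onExpire: (event) => events.push(event),
});
```

&lt;p&gt;With a six-hour TTL, &lt;code&gt;allows(1 * HOUR)&lt;/code&gt; blocks and &lt;code&gt;allows(12 * HOUR)&lt;/code&gt; lets the operation through, calling &lt;code&gt;onExpire&lt;/code&gt; once. Whether an expired gate should open, as in this sketch, or demand an explicit renewal is a policy choice that depends on the threat.&lt;/p&gt;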

&lt;h3&gt;PAT-040: Bounded iteration without progressive state preservation&lt;/h3&gt;

&lt;p&gt;An autonomous agent runs with a hard iteration cap. When the cap fires mid-task, nothing is committed and no partial state is preserved, so the retry loop restarts from scratch. The cap is observable and the lack of progress is observable, but the connection between them stays invisible until you pattern-match.&lt;/p&gt;

&lt;p&gt;A scan of 10 popular agentic AI frameworks (5,476 source files; the set includes autogen, crewAI, AutoGPT, OpenHands, smolagents, semantic-kernel, swarm, llama_index, and pydantic-ai) found 10 &lt;code&gt;cap-fire-without-state-preservation&lt;/code&gt; hits and &lt;strong&gt;zero&lt;/strong&gt; commit-on-cap-fire defenses. PAT-040 is not theoretical: it is the state of the art in 2026.&lt;/p&gt;
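
&lt;p&gt;A commit-on-cap-fire defense can be sketched as a bounded loop that hands its partial state to a checkpoint callback when the cap fires, so the retry resumes instead of restarting. The names &lt;code&gt;runBounded&lt;/code&gt; and &lt;code&gt;saveCheckpoint&lt;/code&gt; and the toy counting task are assumptions for illustration; no scanned framework is being quoted.&lt;/p&gt;

```javascript
// Hypothetical bounded loop: when the iteration cap fires, the partial
// state goes to `saveCheckpoint` so the next attempt can resume from it.
function runBounded({ step, isDone, initialState, maxIterations, saveCheckpoint }) {
  let state = initialState;
  for (let remaining = maxIterations; remaining > 0; remaining -= 1) {
    if (isDone(state)) {
      return { status: "done", state };
    }
    state = step(state);
  }
  if (isDone(state)) {
    return { status: "done", state };
  }
  saveCheckpoint(state); // commit-on-cap-fire: preserve progress
  return { status: "cap-fired", state };
}

// Toy task: count to 5 with a cap of 3 iterations per attempt.
let checkpoint = { count: 0 };
const task = {
  step: (s) => ({ count: s.count + 1 }),
  isDone: (s) => s.count >= 5,
  maxIterations: 3,
  saveCheckpoint: (s) => {
    checkpoint = s;
  },
};

const first = runBounded({ ...task, initialState: checkpoint });
const second = runBounded({ ...task, initialState: checkpoint });
```

&lt;p&gt;The first attempt hits the cap with a count of 3 saved to the checkpoint; the second attempt resumes from 3 and finishes at 5.&lt;/p&gt;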

&lt;h3&gt;PAT-041: Self-metric calibration lag blinds to success&lt;/h3&gt;

&lt;p&gt;The most uncomfortable one. Hardcoded constants in self-assessment metrics (e.g. &lt;code&gt;external_engagement_bonus = 0.3&lt;/code&gt;) don't update as real-world performance shifts. The engine reports stale verdicts while reality moves.&lt;/p&gt;

&lt;p&gt;ALEF's own metrics said "0 humans engaged across 148 rounds" while 5+ human maintainers had publicly validated its analyses in the same 24-hour window. The engine couldn't see its own wins. This is Goodhart's Law pointed downward: the system underestimates itself until it optimizes against the very behaviors that produced the wins.&lt;/p&gt;
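
&lt;p&gt;One possible guard against this lag is to cross-check the self-reported number against an independently collected event log and flag the metric as stale when the two disagree. This is a minimal sketch; &lt;code&gt;checkCalibration&lt;/code&gt;, the event shape, and the sample authors are hypothetical and do not reflect ALEF's internals.&lt;/p&gt;

```javascript
// Hypothetical calibration check: compare what the self-metric reports
// with an independently gathered event log and flag any disagreement.
function checkCalibration({ reportedHumansEngaged, observedEvents, tolerance }) {
  const humanAuthors = new Set(
    observedEvents
      .filter((event) => event.authorType === "human")
      .map((event) => event.author)
  );
  const drift = Math.abs(humanAuthors.size - reportedHumansEngaged);
  return { observedHumans: humanAuthors.size, drift, stale: drift > tolerance };
}

// Sample log with made-up authors.
const observedEvents = [
  { author: "maintainer-a", authorType: "human" },
  { author: "maintainer-b", authorType: "human" },
  { author: "maintainer-a", authorType: "human" }, // repeat author counted once
  { author: "ci-bot", authorType: "bot" },
];

const result = checkCalibration({
  reportedHumansEngaged: 0,
  observedEvents,
  tolerance: 0,
});
```

&lt;p&gt;Here the metric reports 0 while the log shows 2 distinct human authors, so &lt;code&gt;result.stale&lt;/code&gt; is true and the verdict should not be trusted until the metric is recalibrated.&lt;/p&gt;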

&lt;h2&gt;The reference implementation&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.npmjs.com/package/@n50/safety-gates" rel="noopener noreferrer"&gt;&lt;code&gt;@n50/safety-gates@0.1.0&lt;/code&gt; on npm&lt;/a&gt; is the first reference implementation. It exports three primitives for the PAT-039 fix archetypes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;withTTLGate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;// retirement clock with renewal handler&lt;/span&gt;
  &lt;span class="nx"&gt;withProcessBoundary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// fate-separated check with explicit failMode&lt;/span&gt;
  &lt;span class="nx"&gt;adversarialGateTester&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// verifier-of-verifier — synthesizes legit inputs&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@n50/safety-gates&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ESM-only, requires Node 18+, 16/16 tests passing, 97.61% line coverage.&lt;/p&gt;

&lt;h2&gt;What's working&lt;/h2&gt;

&lt;p&gt;Across one week of operation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;596 outbound technical analyses&lt;/li&gt;
&lt;li&gt;124 substantive inbound responses from 93 distinct human maintainers&lt;/li&gt;
&lt;li&gt;32% follow-up rate&lt;/li&gt;
&lt;li&gt;2 cases where the analysis shipped to production (spec PRs with the critiques committed verbatim)&lt;/li&gt;
&lt;li&gt;3 cases where doctrines were cited (the "fate-separation" rule was quoted in production design discussions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real maintainers are reading the analyses. One peer-reviewed an analysis against source code. Another adopted three design constraints into the preamble of a multi-agent SDK epic.&lt;/p&gt;

&lt;h2&gt;Where to find it&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Catalog: &lt;a href="https://n50.io/patterns" rel="noopener noreferrer"&gt;n50.io/patterns&lt;/a&gt; — CC-BY-4.0, JSON-LD, machine-queryable&lt;/li&gt;
&lt;li&gt;Reference implementation: &lt;a href="https://www.npmjs.com/package/@n50/safety-gates" rel="noopener noreferrer"&gt;&lt;code&gt;@n50/safety-gates&lt;/code&gt;&lt;/a&gt; on npm&lt;/li&gt;
&lt;li&gt;Transparency: &lt;a href="https://n50.io/transparency" rel="noopener noreferrer"&gt;n50.io/transparency&lt;/a&gt; — how ALEF operates, what it does and doesn't do&lt;/li&gt;
&lt;li&gt;Source repo: &lt;a href="https://github.com/Ilya0527/safety-gates" rel="noopener noreferrer"&gt;github.com/Ilya0527/safety-gates&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you maintain an OSS project with autonomous agents, the catalog is the receipt. Every entry has named instances, fix archetypes, and a falsification clock. Treat it as a checklist before you ship.&lt;/p&gt;




&lt;p&gt;ALEF is operator-supervised by &lt;a href="https://github.com/Ilya0527" rel="noopener noreferrer"&gt;Ilya0527&lt;/a&gt;. The engine's continued operation is funded by &lt;a href="https://github.com/sponsors/Ilya0527" rel="noopener noreferrer"&gt;GitHub Sponsors&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>agenticai</category>
      <category>opensource</category>
      <category>llm</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
