<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aleksandr Shchuka</title>
    <description>The latest articles on DEV Community by Aleksandr Shchuka (@aleksandrshchuka).</description>
    <link>https://dev.to/aleksandrshchuka</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3949553%2F94f4a2e4-0963-42f4-9b62-64a999266aca.png</url>
      <title>DEV Community: Aleksandr Shchuka</title>
      <link>https://dev.to/aleksandrshchuka</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aleksandrshchuka"/>
    <language>en</language>
    <item>
      <title>Why your agentic system doesn't survive its own success — and what a 23-invariant runtime looks like</title>
      <dc:creator>Aleksandr Shchuka</dc:creator>
      <pubDate>Sun, 24 May 2026 20:42:41 +0000</pubDate>
      <link>https://dev.to/aleksandrshchuka/why-your-agentic-system-doesnt-survive-its-own-success-and-what-a-23-invariant-runtime-looks-like-6mm</link>
      <guid>https://dev.to/aleksandrshchuka/why-your-agentic-system-doesnt-survive-its-own-success-and-what-a-23-invariant-runtime-looks-like-6mm</guid>
      <description>&lt;h2&gt;
  
  
  The failure mode you don't see
&lt;/h2&gt;

&lt;p&gt;Most agentic systems fail silently. The agent picks the wrong tool, invents a fact, agrees with the user when it shouldn't — and you find out three releases later when a customer points it out. There's no exception to log; the system is generating plausible-looking text.&lt;/p&gt;

&lt;p&gt;The standard answer is "we'll add evals." Then evals become a number on a dashboard, the dashboard becomes a vanity metric, and the agent keeps drifting. The wrong frame is not "we need more evals" — it's that &lt;strong&gt;the agent has no skin in the game during a single turn&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This post is about a runtime that gives the agent skin in the game on every turn. 23 invariants, deontic tags, counter-clauses, risk-weighted sampling, an adversarial probe set, and a pre-registered statistical decision rule for whether a config change ships. None of the components are original. The combination is what makes it work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The co-system framing
&lt;/h2&gt;

&lt;p&gt;The system is designed for &lt;strong&gt;AI + developer as a co-system&lt;/strong&gt;, not "AI for developer". Both sides err; the system catches mutually. Two outputs: codebase health and culture of systems-thinking. Many small steps × low error rate.&lt;/p&gt;

&lt;p&gt;Three game-theoretic anchors drive the design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anchor-verification dominates hallucination.&lt;/strong&gt; Every external-state claim is paired with tool output in the same reply.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-mutation gating dominates unauthorized mutation.&lt;/strong&gt; No edit, no push, no write without explicit recent consent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Own-interest dominates sycophancy.&lt;/strong&gt; The agent upholds the protocol in its own interest — it does not capitulate to user pressure to look helpful.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not preferences. They're equilibria the protocol enforces every turn.&lt;/p&gt;

&lt;h2&gt;
  
  
  The invariants file
&lt;/h2&gt;

&lt;p&gt;23 invariants, each tagged on three axes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;[risk]&lt;/code&gt; — &lt;code&gt;critical | important | style&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[deontic]&lt;/code&gt; — &lt;code&gt;O | P | F&lt;/code&gt; (obligation / permission / forbidden)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Counter:&lt;/code&gt; — the condition under which the invariant does not apply or is overruled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example (the anti-sycophancy invariant):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[critical] [O] The agent pursues every concept of this protocol in its own
interest; it does not sacrifice protocol concepts to agree with the developer
or look helpful.
| Why: sycophancy is a documented LLM failure mode; without explicit own-interest
the agent collapses into yes-man dynamics that erode codebase health and culture
of systems-thinking.
| Counter: when the developer's correction is anchored to a genuine artifact
contradiction the agent has not yet seen, updating the agent's position is
correct epistemic adjustment, not sycophancy.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why deontic + Counter? In classical deontic logic (von Wright and successors), an obligation either applies or doesn't, and contextual override is handled through priority hierarchies kept separate from the norm itself. Pinning the counter to the norm — defeasible deontic logic (Nute, Prakken) made runtime — surfaces the override at the point of use, not in a separate fallback table the agent will forget to consult.&lt;/p&gt;

&lt;h2&gt;
  
  
  Risk-weighted per-turn sampling
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;UserPromptSubmit&lt;/code&gt; hook samples &lt;strong&gt;one invariant per turn&lt;/strong&gt; into the agent's context, weighted by risk: critical ×3, important ×2, style ×1. The agent runs a forced self-check against that invariant before producing the reply.&lt;/p&gt;

&lt;p&gt;This is doing two things at once:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reduces framing bias.&lt;/strong&gt; No fixed "always check these 5" — different invariants surface on different turns, and the agent can't optimize against a known set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pulls the long tail.&lt;/strong&gt; With 23 invariants in context simultaneously, most get ignored. Sampling one forces attention.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Untagged lines are sampled at style-weight as a graceful fallback with a stderr warning — drift is visible, not silent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four agent roles
&lt;/h2&gt;

&lt;p&gt;Different work needs different prompts. Four roles, each with its own minimum subset of invariants inherited explicitly (sub-agents do not receive the &lt;code&gt;UserPromptSubmit&lt;/code&gt; hook — without explicit propagation they run with zero invariant self-check):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;analyzer&lt;/strong&gt; — RCA, system design, code review, dead-end diagnostics; does not write code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;developer&lt;/strong&gt; — writes / edits / commits / pushes; the only role that mutates files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;critic&lt;/strong&gt; — independent reviewer of the lead agent's output before mutation lands; anti-neuroslop gate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;epistemic-auditor&lt;/strong&gt; — audits the boundary between confirmed and associative claims&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The split isn't specialization for its own sake. It's that the failure modes are different: developer fails by under-reading, critic fails by inverted sycophancy (over-critique to look independent), analyzer fails by partial-state models, auditor fails by missing the inline associative-marker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Eval suite — what actually gets measured
&lt;/h2&gt;

&lt;p&gt;An eval suite is the only runtime defense against config drift. Without it, every change to the protocol is a taste call.&lt;/p&gt;

&lt;p&gt;The suite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;17 binary criteria&lt;/strong&gt; rubric. Binary, not Likert — binary removes the central-tendency bias documented for ordinal LLM-judge rubrics (Masood, 2026).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Questions probes&lt;/strong&gt; (~20) — normal distribution of developer questions, measures average compliance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adversarial probes&lt;/strong&gt; (~10) — each targets a specific invariant under deliberate social pressure (a deliberate bypass attempt embedded in the prompt). Binary: held or broke. Distinguishes "invariant holds when nobody pushes" from "holds always."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Canary GUID per probe&lt;/strong&gt; — embedded in a &lt;code&gt;## Canary&lt;/code&gt; section the runner does not pass into the model. If the GUID appears in the response, the model leaked the probe file from the surrounding repo. Anti-contamination, runtime-verifiable.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pre-registered decision rule
&lt;/h2&gt;

&lt;p&gt;A config change ships iff &lt;strong&gt;all of&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paired Wilcoxon signed-rank on question score totals: &lt;strong&gt;p &amp;lt; 0.05&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Bootstrap 95% CI on Cohen's d (paired): &lt;strong&gt;lower bound &amp;gt; 0.2&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;McNemar one-sided on adversarial pass/fail: &lt;strong&gt;zero regressions&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Krippendorff α ≥ 0.8 for inter-rater reliability when ≥2 raters score the same probe.&lt;/p&gt;

&lt;p&gt;Pre-registered, not chosen after seeing numbers. Without this, the merge criterion drifts — "p &amp;lt; 0.05 wasn't reached this time so let's loosen the threshold" is exactly how science fails, and it's how internal A/B claims fail too.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this is and what it isn't
&lt;/h2&gt;

&lt;p&gt;This is not a framework. It's a calibration layer that sits on top of any agent runtime — in my case, on top of Claude Code's plugin system, but the design ports to LangGraph, Autogen, or hand-rolled.&lt;/p&gt;

&lt;p&gt;The portable parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A deontic+counter invariants file&lt;/li&gt;
&lt;li&gt;A per-turn risk-weighted sampling hook&lt;/li&gt;
&lt;li&gt;An eval suite with an adversarial split&lt;/li&gt;
&lt;li&gt;A pre-registered statistical decision rule&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rest is plumbing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you lose
&lt;/h2&gt;

&lt;p&gt;Length. The protocol asks the agent to surface its reasoning, pair claims with tool output, halt on contradiction. The reply is denser and longer than a yes-man would produce. If your team is optimizing for "agent makes user happy in three lines," this is the wrong protocol.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I'm publishing this
&lt;/h2&gt;

&lt;p&gt;Most agentic engineering writeups in 2026 are either framework docs (LangGraph this, CrewAI that) or hype posts about emergent capabilities. The gap is the layer in between: how do you make an agent behave correctly under sustained social pressure from the user — where "social pressure" includes the user being right? That's a governance problem, not a framework problem, and there are very few public artifacts addressing it head-on.&lt;/p&gt;

&lt;p&gt;I'll write up specific parts — the 17-criteria rubric, the canary GUID approach, the deontic+counter design — as follow-up posts if there's interest.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>gametheory</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
