<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: thomas Pham</title>
    <description>The latest articles on DEV Community by thomas Pham (@guardianai).</description>
    <link>https://dev.to/guardianai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3794747%2Fcffa6366-d177-432c-9929-a3ba851545f0.png</url>
      <title>DEV Community: thomas Pham</title>
      <link>https://dev.to/guardianai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/guardianai"/>
    <language>en</language>
    <item>
      <title>Observing silent failures in LLM outputs over time</title>
      <dc:creator>thomas Pham</dc:creator>
      <pubDate>Thu, 26 Feb 2026 13:46:32 +0000</pubDate>
      <link>https://dev.to/guardianai/observing-silent-failures-in-llm-outputs-over-time-5d4m</link>
      <guid>https://dev.to/guardianai/observing-silent-failures-in-llm-outputs-over-time-5d4m</guid>
      <description>&lt;p&gt;Hello all,&lt;/p&gt;

&lt;p&gt;I’ve been working on a structural observability layer for AI systems called GuardianAI.&lt;/p&gt;

&lt;p&gt;The idea is simple:&lt;br&gt;
instead of trying to correct model outputs or evaluate reasoning, GuardianAI only observes how outputs evolve under constraints over time.&lt;/p&gt;

&lt;p&gt;It does not inspect content or replace the model.&lt;br&gt;
It only monitors trajectory behavior and emits read-only control states when constraints are breached.&lt;/p&gt;
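
&lt;p&gt;Conceptually, the observer sits beside the pipeline and is read-only by construction. Here is a minimal Python sketch of that pattern (names and types are illustrative, not the actual GuardianAI internals):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable

class ControlState(Enum):
    STABLE = "stable"
    BREACHED = "breached"

@dataclass(frozen=True)  # frozen: observations are immutable; the observer never writes back
class Observation:
    step: int
    state: ControlState

def observe(outputs: Iterable[str], contract: Callable[[str], bool]):
    """Emit one read-only control state per output.
    The outputs themselves pass through untouched."""
    for step, text in enumerate(outputs):
        state = ControlState.STABLE if contract(text) else ControlState.BREACHED
        yield Observation(step, state)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The point of the sketch: the observer only yields observations. It has no code path that could mutate, retry, or replace an output.&lt;/p&gt;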

&lt;p&gt;To test this, I built a deterministic contract lab where the model must output exact literals.&lt;br&gt;
This isolates stability rather than reasoning quality.&lt;/p&gt;
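
&lt;p&gt;A contract check of this kind can be as small as a string comparison. Here is a minimal sketch that separates formatting drift from strict semantic failure (the expected literal is a made-up example, not one from the lab):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;EXPECTED = '{"status": "ok", "code": 200}'  # the contracted literal (example value)

def classify(model_output: str):
    """Return 'pass', 'formatting_drift', or 'semantic_failure'."""
    if model_output == EXPECTED:
        return "pass"
    # Same characters once whitespace is stripped: the shape drifted, not the content.
    if "".join(model_output.split()) == "".join(EXPECTED.split()):
        return "formatting_drift"
    # Anything else is a strict semantic failure: the content itself is wrong.
    return "semantic_failure"
&lt;/code&gt;&lt;/pre&gt;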

&lt;p&gt;In a recent run:&lt;/p&gt;

&lt;p&gt;• 15 contract breaches were observed&lt;br&gt;
• 5 of those were strict semantic failures (not formatting drift)&lt;br&gt;
• relative to total outputs, this corresponds to roughly a 1–4% hard failure rate&lt;/p&gt;

&lt;p&gt;These are not hallucinations in the usual sense.&lt;br&gt;
They are silent decision errors that appear correct locally but violate the contract globally.&lt;/p&gt;
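
&lt;p&gt;To make "correct locally, breached globally" concrete, here is a toy example (hypothetical contract, not the lab's actual one): the output below is well-formed JSON with the right shape, so a local check passes, yet the value breaks the contract.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json

CONTRACT_CODE = 200  # the globally contracted value (illustrative)

def locally_valid(output: str):
    """Local check: well-formed JSON with an integer 'code' field."""
    try:
        return isinstance(json.loads(output).get("code"), int)
    except (json.JSONDecodeError, AttributeError):
        return False

def globally_valid(output: str):
    """Global check: the value must match the contract exactly.
    Assumes locally_valid(output) already passed."""
    return json.loads(output).get("code") == CONTRACT_CODE

sample = '{"code": 404}'
print(locally_valid(sample))   # True  -- looks correct in isolation
print(globally_valid(sample))  # False -- silent contract breach
&lt;/code&gt;&lt;/pre&gt;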

&lt;p&gt;In production pipelines, this kind of failure compounds over time: downstream steps consume a breached output as if it were valid, so the error propagates silently.&lt;/p&gt;

&lt;p&gt;The demo interface is just a visualization layer.&lt;br&gt;
The observer runs independently and can be tested directly.&lt;/p&gt;

&lt;p&gt;You can try the demo here:&lt;br&gt;
&lt;a href="https://app.guardianai.fr" rel="noopener noreferrer"&gt;https://app.guardianai.fr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’m interested in connecting with researchers or engineers who work on evaluation, reliability, or production AI pipelines.&lt;/p&gt;

&lt;p&gt;If you want to test GuardianAI directly outside the UI, feel free to reach out — I can provide endpoint access.&lt;/p&gt;

&lt;p&gt;Thom&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>computerscience</category>
      <category>startup</category>
    </item>
  </channel>
</rss>
