<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Solomon Mithra</title>
    <description>The latest articles on DEV Community by Solomon Mithra (@solomon-mithra).</description>
    <link>https://dev.to/solomon-mithra</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3740925%2F87afef24-7222-426b-bbef-76be69a286a3.png</url>
      <title>DEV Community: Solomon Mithra</title>
      <link>https://dev.to/solomon-mithra</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/solomon-mithra"/>
    <language>en</language>
    <item>
      <title>Treat AI Output as Untrusted Input</title>
      <dc:creator>Solomon Mithra</dc:creator>
      <pubDate>Sat, 31 Jan 2026 09:48:45 +0000</pubDate>
      <link>https://dev.to/solomon-mithra/treat-ai-output-as-untrusted-input-1lhp</link>
      <guid>https://dev.to/solomon-mithra/treat-ai-output-as-untrusted-input-1lhp</guid>
      <description>&lt;p&gt;In every serious system we build, there’s a rule we don’t argue with:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User input is untrusted.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We validate it.&lt;br&gt;&lt;br&gt;
We sanitize it.&lt;br&gt;&lt;br&gt;
We enforce boundaries before it’s allowed to do anything meaningful.&lt;/p&gt;

&lt;p&gt;Yet when it comes to AI systems, many teams quietly abandon this rule.&lt;/p&gt;


&lt;h2&gt;The dangerous assumption&lt;/h2&gt;

&lt;p&gt;In production AI systems, model output often flows directly into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;customer-facing responses&lt;/li&gt;
&lt;li&gt;financial decisions&lt;/li&gt;
&lt;li&gt;workflow automation&lt;/li&gt;
&lt;li&gt;compliance-sensitive paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The implicit assumption is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The model did what we asked, so the output must be okay.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is where things go wrong.&lt;/p&gt;

&lt;p&gt;When failures happen, the postmortem usually says:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“The prompt wasn’t strict enough”&lt;/li&gt;
&lt;li&gt;“We should retry more”&lt;/li&gt;
&lt;li&gt;“The model hallucinated”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But those aren’t root causes.&lt;/p&gt;


&lt;h2&gt;The real failure is the boundary&lt;/h2&gt;

&lt;p&gt;The model didn’t break the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The system trusted the model.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;From a systems perspective, AI output is just another external data source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;probabilistic&lt;/li&gt;
&lt;li&gt;non-deterministic&lt;/li&gt;
&lt;li&gt;not guaranteed to respect invariants&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That puts it in the same category as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user input&lt;/li&gt;
&lt;li&gt;webhook payloads&lt;/li&gt;
&lt;li&gt;third-party API responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We don’t &lt;em&gt;trust&lt;/em&gt; those.&lt;br&gt;&lt;br&gt;
We &lt;strong&gt;verify&lt;/strong&gt; them.&lt;/p&gt;


&lt;h2&gt;Why prompts and retries don’t solve this&lt;/h2&gt;

&lt;p&gt;Prompts are instructions, not enforcement.&lt;/p&gt;

&lt;p&gt;Retries increase the chance of a better answer, but they don’t guarantee:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;structural correctness&lt;/li&gt;
&lt;li&gt;compliance&lt;/li&gt;
&lt;li&gt;safety&lt;/li&gt;
&lt;li&gt;consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using one LLM to judge another only adds another probabilistic component to the system.&lt;/p&gt;

&lt;p&gt;None of these create a &lt;strong&gt;hard stop&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;The correct production architecture&lt;/h2&gt;

&lt;p&gt;Once you see it, it’s hard to unsee.&lt;/p&gt;

&lt;p&gt;LLM → Verification Layer → System&lt;/p&gt;

&lt;p&gt;The verification layer runs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;after generation&lt;/li&gt;
&lt;li&gt;before delivery&lt;/li&gt;
&lt;li&gt;outside the model’s control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Its job is not to be smart.&lt;/p&gt;

&lt;p&gt;Its job is to be &lt;strong&gt;strict&lt;/strong&gt;.&lt;/p&gt;
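&lt;p&gt;As a minimal sketch of that layer (all names here are invented for illustration, and the checks are deliberately simplistic), the boundary can be as small as a single function that sits between the model call and everything downstream:&lt;/p&gt;

```typescript
// Hypothetical verification gate: the model's raw text goes in,
// an explicit decision comes out. Nothing downstream touches raw output.
type Decision =
  | { kind: "allow"; value: unknown }
  | { kind: "block"; reason: string };

function verifyModelOutput(raw: string): Decision {
  // 1. Contract: the output must parse as JSON at all.
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { kind: "block", reason: "not valid JSON" };
  }
  // 2. Policy: deterministic rules run on the text (here, a toy SSN check).
  if (/\b\d{3}-\d{2}-\d{4}\b/.test(raw)) {
    return { kind: "block", reason: "possible SSN in output" };
  }
  // 3. Explicit decision: only verified output is allowed through.
  return { kind: "allow", value: parsed };
}
```

&lt;p&gt;The specific checks are not the point; the point is that the gate runs after generation and outside the model’s control.&lt;/p&gt;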


&lt;h2&gt;What verification actually means&lt;/h2&gt;

&lt;p&gt;In practice, verification enforces three things:&lt;/p&gt;
&lt;h3&gt;1. Contracts&lt;/h3&gt;

&lt;p&gt;Does the output match the structure your system expects?&lt;/p&gt;

&lt;p&gt;If not, it doesn’t proceed.&lt;/p&gt;
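&lt;p&gt;For example, a hand-rolled type guard can be the contract (a schema library would work equally well; the &lt;code&gt;RefundDraft&lt;/code&gt; shape is invented for illustration):&lt;/p&gt;

```typescript
// Invented example shape: what the system expects the model to produce.
interface RefundDraft {
  orderId: string;
  amountCents: number;
}

// The type guard is the contract: either the output has exactly the
// structure the system expects, or it does not proceed.
function isRefundDraft(value: unknown): value is RefundDraft {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { orderId?: unknown; amountCents?: unknown };
  if (typeof v.orderId !== "string") return false;
  if (typeof v.amountCents !== "number") return false;
  return Number.isInteger(v.amountCents);
}
```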
&lt;h3&gt;2. Policies&lt;/h3&gt;

&lt;p&gt;Does the output violate any deterministic rules?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compliance language&lt;/li&gt;
&lt;li&gt;PII exposure&lt;/li&gt;
&lt;li&gt;secret leakage&lt;/li&gt;
&lt;li&gt;unsafe markup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If yes, the system blocks or rewrites explicitly.&lt;/p&gt;
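&lt;p&gt;Deterministic here means plain predicates: the same input always yields the same verdict. A toy sketch (these regexes are illustrative only; real policy sets are broader and tested):&lt;/p&gt;

```typescript
// Illustrative deterministic policies: each is a named plain predicate.
const policies = [
  ["guarantee-language", (text: string) => /\bguaranteed?\b/i.test(text)],
  ["email-pii", (text: string) => /[\w.+-]+@[\w-]+\.[\w.]+/.test(text)],
  ["secret-leak", (text: string) => /\b(api[_-]?key|sk-[A-Za-z0-9]+)\b/i.test(text)],
] as const;

// Returns the names of every policy the text violates.
function violatedPolicies(text: string): string[] {
  return policies.filter(([, check]) => check(text)).map(([name]) => name);
}
```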
&lt;h3&gt;3. Explicit decisions&lt;/h3&gt;

&lt;p&gt;Every response results in a clear outcome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;allow&lt;/li&gt;
&lt;li&gt;block&lt;/li&gt;
&lt;li&gt;rewrite&lt;/li&gt;
&lt;li&gt;audit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No silent failures.&lt;br&gt;&lt;br&gt;
No “probably fine.”&lt;/p&gt;
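&lt;p&gt;One way to make “probably fine” unrepresentable is a discriminated union: downstream code must switch on the outcome to get at the text at all. The names below are invented for this sketch:&lt;/p&gt;

```typescript
// Every verified response ends in exactly one of these outcomes.
type Verdict =
  | { outcome: "allow"; text: string }
  | { outcome: "rewrite"; text: string; original: string }
  | { outcome: "block"; reason: string };

// Invented audit helper: every verdict is recorded, including allows,
// so the system can later explain why a response went through.
function audit(verdict: Verdict): string {
  const at = new Date().toISOString();
  return JSON.stringify({ at, ...verdict });
}
```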


&lt;h2&gt;Why this changes everything&lt;/h2&gt;

&lt;p&gt;Once AI output is treated as untrusted input:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simpler models become viable&lt;/li&gt;
&lt;li&gt;failures become predictable&lt;/li&gt;
&lt;li&gt;compliance becomes enforceable&lt;/li&gt;
&lt;li&gt;incidents are caught &lt;em&gt;before&lt;/em&gt; damage is done&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model becomes a suggestion engine, not a source of truth.&lt;/p&gt;

&lt;p&gt;That’s exactly where probabilistic systems belong.&lt;/p&gt;


&lt;h2&gt;This isn’t about safety; it’s about systems&lt;/h2&gt;

&lt;p&gt;This isn’t a moral argument.&lt;br&gt;&lt;br&gt;
It’s a production one.&lt;/p&gt;

&lt;p&gt;Every mature system enforces trust at boundaries.&lt;/p&gt;

&lt;p&gt;AI systems are no different.&lt;/p&gt;


&lt;h2&gt;Final principle&lt;/h2&gt;

&lt;p&gt;If your system cannot deterministically explain &lt;strong&gt;why&lt;/strong&gt; an AI response was allowed,&lt;br&gt;&lt;br&gt;
then it should not have been allowed.&lt;/p&gt;



&lt;p&gt;If you’re interested in enforcing this boundary in real systems,&lt;br&gt;&lt;br&gt;
&lt;strong&gt;&lt;a href="https://gateia.co" rel="noopener noreferrer"&gt;Gateia&lt;/a&gt;&lt;/strong&gt; is an open-source TypeScript SDK built specifically for post-generation verification:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;gateia
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Built to be boring.&lt;br&gt;
Built to be strict.&lt;br&gt;
Built for production.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>infrastructure</category>
      <category>security</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Probability Is a Liability in Production</title>
      <dc:creator>Solomon Mithra</dc:creator>
      <pubDate>Fri, 30 Jan 2026 03:39:21 +0000</pubDate>
      <link>https://dev.to/solomon-mithra/probability-is-a-liability-in-production-41md</link>
      <guid>https://dev.to/solomon-mithra/probability-is-a-liability-in-production-41md</guid>
      <description>&lt;p&gt;Large Language Models are impressive.&lt;br&gt;&lt;br&gt;
They’re also &lt;strong&gt;probabilistic&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Production systems are not.&lt;/p&gt;

&lt;p&gt;That mismatch is where most AI failures actually happen.&lt;/p&gt;


&lt;h2&gt;AI failures are usually trust failures&lt;/h2&gt;

&lt;p&gt;When AI systems fail in production, it’s rarely dramatic.&lt;/p&gt;

&lt;p&gt;It’s not “the model crashed.”&lt;br&gt;&lt;br&gt;
It’s quieter and more dangerous:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;malformed JSON reaches a parser
&lt;/li&gt;
&lt;li&gt;guarantee language slips into a response
&lt;/li&gt;
&lt;li&gt;PII leaks into customer-facing text
&lt;/li&gt;
&lt;li&gt;unsafe markup reaches a client
&lt;/li&gt;
&lt;li&gt;assumptions are violated silently
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are &lt;strong&gt;trust failures&lt;/strong&gt;, not intelligence failures.&lt;/p&gt;


&lt;h2&gt;We validate inputs. We don’t verify outputs.&lt;/h2&gt;

&lt;p&gt;Every serious system treats user input as untrusted.&lt;/p&gt;

&lt;p&gt;We validate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;types&lt;/li&gt;
&lt;li&gt;formats&lt;/li&gt;
&lt;li&gt;invariants&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We fail closed when validation fails.&lt;/p&gt;
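&lt;p&gt;This is the familiar version of the rule, applied to user input: parse, check invariants, and throw on anything else. A minimal sketch (the bounds are invented for illustration):&lt;/p&gt;

```typescript
// Fail-closed input validation: bad input never produces a value,
// it produces an error the caller must handle.
function parseQuantity(input: string): number {
  const n = Number(input);
  // Type and format: must be an integer, not NaN, not "12abc".
  if (!Number.isInteger(n)) throw new Error("quantity must be an integer");
  // Invariant: must be within the range the business allows.
  if (!(n >= 1) || n > 1000) throw new Error("quantity out of range");
  return n;
}
```

&lt;p&gt;The argument of these posts is simply that model output deserves the same discipline.&lt;/p&gt;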

&lt;p&gt;But AI output often skips this step entirely.&lt;/p&gt;

&lt;p&gt;Instead, teams rely on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompts&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;“the model usually behaves”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not a safety model.&lt;br&gt;&lt;br&gt;
That’s hope.&lt;/p&gt;

&lt;p&gt;An LLM is just another untrusted computation.&lt;/p&gt;


&lt;h2&gt;Compliance is enforced at boundaries&lt;/h2&gt;

&lt;p&gt;This is the key insight.&lt;/p&gt;

&lt;p&gt;Databases aren’t “GDPR-aware.”&lt;br&gt;&lt;br&gt;
APIs aren’t “SOC2-aware.”&lt;br&gt;&lt;br&gt;
Users aren’t trusted.&lt;/p&gt;

&lt;p&gt;Compliance is enforced at &lt;strong&gt;boundaries&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;validation layers
&lt;/li&gt;
&lt;li&gt;policy checks
&lt;/li&gt;
&lt;li&gt;explicit allow/block decisions
&lt;/li&gt;
&lt;li&gt;audit logs
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI systems need the same treatment.&lt;/p&gt;

&lt;p&gt;Trying to make AI “behave” by adding more AI only increases uncertainty.&lt;/p&gt;


&lt;h2&gt;Deterministic verification beats AI judging AI&lt;/h2&gt;

&lt;p&gt;Many AI safety tools rely on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLMs evaluating LLMs&lt;/li&gt;
&lt;li&gt;probabilistic moderation&lt;/li&gt;
&lt;li&gt;confidence scores&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That fails quietly.&lt;/p&gt;

&lt;p&gt;A verifier should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;never hallucinate
&lt;/li&gt;
&lt;li&gt;never guess
&lt;/li&gt;
&lt;li&gt;never be creative
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It should be boring and correct.&lt;/p&gt;


&lt;h2&gt;Gateia: verifying AI output before it ships&lt;/h2&gt;

&lt;p&gt;This is why I built &lt;strong&gt;&lt;a href="https://www.npmjs.com/package/gateia" rel="noopener noreferrer"&gt;Gateia&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Gateia does &lt;strong&gt;not&lt;/strong&gt; generate AI output.&lt;br&gt;&lt;br&gt;
It does &lt;strong&gt;not&lt;/strong&gt; orchestrate agents.&lt;br&gt;&lt;br&gt;
It does &lt;strong&gt;not&lt;/strong&gt; manage prompts or models.&lt;/p&gt;

&lt;p&gt;Gateia runs &lt;strong&gt;after generation&lt;/strong&gt; and answers one question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Is this output allowed to enter my system?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It enforces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;schema contracts
&lt;/li&gt;
&lt;li&gt;deterministic safety &amp;amp; compliance policies
&lt;/li&gt;
&lt;li&gt;explicit pass / warn / block decisions
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything is auditable.&lt;br&gt;&lt;br&gt;
Failures are explicit.&lt;br&gt;&lt;br&gt;
Security fails closed.&lt;/p&gt;
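&lt;p&gt;In shape, the pattern looks like this. To be clear, this is a generic sketch of post-generation gating, &lt;em&gt;not&lt;/em&gt; Gateia’s actual API; every name here is invented, and the package docs are the source of truth for real signatures:&lt;/p&gt;

```typescript
// Generic sketch of a pass / warn / block gate (names invented).
type GateResult = { decision: "pass" | "warn" | "block"; reasons: string[] };

function gateOutput(raw: string): GateResult {
  const reasons: string[] = [];
  // Contract: the output must be a JSON object at all.
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) {
      reasons.push("contract: expected a JSON object");
    }
  } catch {
    reasons.push("contract: not valid JSON");
  }
  // Policy: a deterministic compliance rule (toy example).
  if (/\bguaranteed\b/i.test(raw)) reasons.push("policy: guarantee language");
  // Fail closed: any contract breach blocks; a policy hit downgrades to warn.
  if (reasons.some((r) => r.startsWith("contract"))) {
    return { decision: "block", reasons };
  }
  return { decision: reasons.length === 0 ? "pass" : "warn", reasons };
}
```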


&lt;h2&gt;A missing layer, not a framework&lt;/h2&gt;

&lt;p&gt;Gateia isn’t an orchestration framework.&lt;br&gt;&lt;br&gt;
It’s deliberately narrow.&lt;/p&gt;

&lt;p&gt;Every production AI system eventually needs a gate — either by design or after an incident.&lt;/p&gt;

&lt;p&gt;Verification is not exciting.&lt;br&gt;&lt;br&gt;
But it is inevitable.&lt;/p&gt;


&lt;h2&gt;Final thought&lt;/h2&gt;

&lt;p&gt;AI doesn’t fail in production because it’s not smart enough.&lt;/p&gt;

&lt;p&gt;It fails because &lt;strong&gt;we trust probability where we should enforce rules&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Production systems don’t need smarter models.&lt;br&gt;&lt;br&gt;
They need stronger boundaries.&lt;/p&gt;



&lt;p&gt;If you’re interested in deterministic verification for AI outputs,&lt;br&gt;&lt;br&gt;
Gateia is available as an open-source TypeScript SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;gateia
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ai</category>
      <category>typescript</category>
      <category>opensource</category>
      <category>security</category>
    </item>
  </channel>
</rss>
