Solomon Mithra

Treat AI Output as Untrusted Input

In every serious system we build, there’s a rule we don’t argue with:

User input is untrusted.

We validate it.

We sanitize it.

We enforce boundaries before it’s allowed to do anything meaningful.

Yet when it comes to AI systems, many teams quietly abandon this rule.


The dangerous assumption

 
In production AI systems, model output often flows directly into:

  • customer-facing responses
  • financial decisions
  • workflow automation
  • compliance-sensitive paths

The implicit assumption is:

“The model did what we asked, so the output must be okay.”

This is where things go wrong.

When failures happen, the postmortem usually says:

  • “The prompt wasn’t strict enough”
  • “We should retry more”
  • “The model hallucinated”

But those aren’t root causes.


The real failure is the boundary

 
The model didn’t break the system.

The system trusted the model.

From a systems perspective, AI output is just another external data source:

  • probabilistic
  • non-deterministic
  • not guaranteed to respect invariants

That puts it in the same category as:

  • user input
  • webhook payloads
  • third-party API responses

We don’t trust those.

We verify them.


Why prompts and retries don’t solve this

 
Prompts are instructions, not enforcement.

Retries increase the chance of a better answer, but they don’t guarantee:

  • structural correctness
  • compliance
  • safety
  • consistency

Using one LLM to judge another just adds another probabilistic component, not a guarantee.

None of these create a hard stop.


The correct production architecture

 
Once you see it, it’s hard to unsee.

LLM → Verification Layer → System

The verification layer runs:

  • after generation
  • before delivery
  • outside the model’s control

Its job is not to be smart.

Its job is to be strict.
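
To make that concrete, here's a minimal TypeScript sketch of the flow. Every name in it (callModel, verify, handleRequest) is an illustrative placeholder, not any particular SDK's API.

```typescript
// Minimal sketch: generation, then deterministic verification, then delivery.
// All names here are illustrative placeholders, not a real SDK's API.

type Verdict =
  | { allowed: true; output: string }
  | { allowed: false; reason: string };

// Stand-in for any model call: probabilistic, non-deterministic, untrusted.
async function callModel(prompt: string): Promise<string> {
  return `{"answer": "${prompt.length}"}`;
}

// The verification layer: runs after generation, outside the model's control.
function verify(raw: string): Verdict {
  try {
    JSON.parse(raw);
  } catch {
    return { allowed: false, reason: "output is not valid JSON" };
  }
  return { allowed: true, output: raw };
}

async function handleRequest(prompt: string): Promise<string> {
  const raw = await callModel(prompt);
  const verdict = verify(raw);
  if (!verdict.allowed) {
    // Unverified output never reaches the rest of the system.
    throw new Error(`AI output blocked: ${verdict.reason}`);
  }
  return verdict.output;
}
```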


What verification actually means

 
In practice, verification enforces three things:

1. Contracts

Does the output match the structure your system expects?

If not, it doesn’t proceed.
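
A contract check can be as plain as schema validation. The sketch below assumes zod and a hypothetical refund-decision shape; the only point is that structurally invalid output never proceeds.

```typescript
import { z } from "zod";

// Hypothetical contract: the shape this system expects from the model.
const RefundDecision = z.object({
  approved: z.boolean(),
  amount: z.number().nonnegative(),
  reason: z.string().min(1),
});

type RefundDecision = z.infer<typeof RefundDecision>;

function enforceContract(raw: string): RefundDecision {
  let candidate: unknown;
  try {
    candidate = JSON.parse(raw);
  } catch {
    throw new Error("Contract violation: output is not valid JSON");
  }
  const parsed = RefundDecision.safeParse(candidate);
  if (!parsed.success) {
    // Wrong structure: the output does not proceed.
    throw new Error(`Contract violation: ${parsed.error.message}`);
  }
  return parsed.data;
}
```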

2. Policies

Does the output violate any deterministic rules?

  • compliance language
  • PII exposure
  • secret leakage
  • unsafe markup

If yes, the system blocks or rewrites explicitly.
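
Policy checks are deterministic rules, so they can be ordinary code. The patterns below are illustrative examples only, not a production-grade rule set.

```typescript
type PolicyResult = { ok: true } | { ok: false; violation: string };

// Plain, deterministic rules: no model in the loop, no probability.
const POLICIES: Array<{ name: string; pattern: RegExp }> = [
  { name: "secret leakage", pattern: /sk-[A-Za-z0-9]{20,}/ }, // API-key-like strings
  { name: "PII exposure", pattern: /\b\d{3}-\d{2}-\d{4}\b/ }, // SSN-shaped numbers
  { name: "unsafe markup", pattern: /<script\b/i },           // script tags in output
];

function checkPolicies(output: string): PolicyResult {
  for (const policy of POLICIES) {
    if (policy.pattern.test(output)) {
      return { ok: false, violation: policy.name };
    }
  }
  return { ok: true };
}
```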

3. Explicit decisions

Every response results in a clear outcome:

  • allow
  • block
  • rewrite
  • audit

No silent failures.

No “probably fine.”
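
One way to keep outcomes explicit is a discriminated union, so every response carries a machine-readable decision and a reason. The names here are hypothetical.

```typescript
// Every response resolves to exactly one explicit, auditable outcome.
type Decision =
  | { action: "allow" }
  | { action: "block"; reason: string }
  | { action: "rewrite"; reason: string; rewritten: string }
  | { action: "audit"; note: string };

// A deterministic record of why each response was (or was not) allowed.
function recordDecision(responseId: string, decision: Decision): void {
  console.log(
    JSON.stringify({ responseId, ...decision, at: new Date().toISOString() })
  );
}
```

Recording the decision and its reason is what makes the final principle below enforceable.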


Why this changes everything

 
Once AI output is treated as untrusted input:

  • simpler models become viable
  • failures become predictable
  • compliance becomes enforceable
  • incidents are caught before they cause damage

The model becomes a suggestion engine, not a source of truth.

That’s exactly where probabilistic systems belong.


This isn’t about safety; it’s about systems

 
This isn’t a moral argument.

It’s a production one.

Every mature system enforces trust at boundaries.

AI systems are no different.


Final principle

 
If your system cannot deterministically explain why an AI response was allowed,

then it should not have been allowed.


If you’re interested in enforcing this boundary in real systems,

Gateia is an open-source TypeScript SDK built specifically for post-generation verification:

npm install gateia

Built to be boring.
Built to be strict.
Built for production.

