Solomon Mithra

Treat AI Output as Untrusted Input

In every serious system we build, there’s a rule we don’t argue with:

User input is untrusted.

We validate it.

We sanitize it.

We enforce boundaries before it’s allowed to do anything meaningful.

Yet when it comes to AI systems, many teams quietly abandon this rule.


The dangerous assumption

 
In production AI systems, model output often flows directly into:

  • customer-facing responses
  • financial decisions
  • workflow automation
  • compliance-sensitive paths

The implicit assumption is:

“The model did what we asked, so the output must be okay.”

This is where things go wrong.

When failures happen, the postmortem usually says:

  • “The prompt wasn’t strict enough”
  • “We should retry more”
  • “The model hallucinated”

But those aren’t root causes.


The real failure is the boundary

 
The model didn’t break the system.

The system trusted the model.

From a systems perspective, AI output is just another external data source:

  • probabilistic
  • non-deterministic
  • not guaranteed to respect invariants

That puts it in the same category as:

  • user input
  • webhook payloads
  • third-party API responses

We don’t trust those.

We verify them.


Why prompts and retries don’t solve this

 
Prompts are instructions, not enforcement.

Retries increase the chance of a better answer, but they don’t guarantee:

  • structural correctness
  • compliance
  • safety
  • consistency

Using one LLM to judge another just adds another probabilistic component, not a guarantee.

None of these create a hard stop.


The correct production architecture

 
Once you see it, it’s hard to unsee.

LLM → Verification Layer → System

The verification layer runs:

  • after generation
  • before delivery
  • outside the model’s control

Its job is not to be smart.

Its job is to be strict.
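
To make that concrete, here's a minimal TypeScript sketch of the flow. Every name in it (callModel, verify, handleRequest) is an illustrative placeholder, not any particular SDK's API.

```typescript
// Minimal sketch: generation, then deterministic verification, then delivery.
// All names here are illustrative placeholders, not a real SDK's API.

type Verdict =
  | { allowed: true; output: string }
  | { allowed: false; reason: string };

// Stand-in for any model call: probabilistic, non-deterministic, untrusted.
async function callModel(prompt: string): Promise<string> {
  return `{"answer": "${prompt.length}"}`;
}

// The verification layer: runs after generation, outside the model's control.
function verify(raw: string): Verdict {
  try {
    JSON.parse(raw);
  } catch {
    return { allowed: false, reason: "output is not valid JSON" };
  }
  return { allowed: true, output: raw };
}

async function handleRequest(prompt: string): Promise<string> {
  const raw = await callModel(prompt);
  const verdict = verify(raw);
  if (!verdict.allowed) {
    // Unverified output never reaches the rest of the system.
    throw new Error(`AI output blocked: ${verdict.reason}`);
  }
  return verdict.output;
}
```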


What verification actually means

 
In practice, verification enforces three things:

1. Contracts

Does the output match the structure your system expects?

If not, it doesn’t proceed.
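
A contract check can be as plain as schema validation. The sketch below assumes zod and a hypothetical refund-decision shape; the only point is that structurally invalid output never proceeds.

```typescript
import { z } from "zod";

// Hypothetical contract: the shape this system expects from the model.
const RefundDecision = z.object({
  approved: z.boolean(),
  amount: z.number().nonnegative(),
  reason: z.string().min(1),
});

type RefundDecision = z.infer<typeof RefundDecision>;

function enforceContract(raw: string): RefundDecision {
  let candidate: unknown;
  try {
    candidate = JSON.parse(raw);
  } catch {
    throw new Error("Contract violation: output is not valid JSON");
  }
  const parsed = RefundDecision.safeParse(candidate);
  if (!parsed.success) {
    // Wrong structure: the output does not proceed.
    throw new Error(`Contract violation: ${parsed.error.message}`);
  }
  return parsed.data;
}
```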

2. Policies

Does the output violate any deterministic rules?

  • compliance language
  • PII exposure
  • secret leakage
  • unsafe markup

If yes, the system blocks or rewrites explicitly.
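
Policy checks are deterministic rules, so they can be ordinary code. The patterns below are illustrative examples only, not a production-grade rule set.

```typescript
type PolicyResult = { ok: true } | { ok: false; violation: string };

// Plain, deterministic rules: no model in the loop, no probability.
const POLICIES: Array<{ name: string; pattern: RegExp }> = [
  { name: "secret leakage", pattern: /sk-[A-Za-z0-9]{20,}/ }, // API-key-like strings
  { name: "PII exposure", pattern: /\b\d{3}-\d{2}-\d{4}\b/ }, // SSN-shaped numbers
  { name: "unsafe markup", pattern: /<script\b/i },           // script tags in output
];

function checkPolicies(output: string): PolicyResult {
  for (const policy of POLICIES) {
    if (policy.pattern.test(output)) {
      return { ok: false, violation: policy.name };
    }
  }
  return { ok: true };
}
```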

3. Explicit decisions

Every response results in a clear outcome:

  • allow
  • block
  • rewrite
  • audit

No silent failures.

No “probably fine.”
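
One way to keep outcomes explicit is a discriminated union, so every response carries a machine-readable decision and a reason. The names here are hypothetical.

```typescript
// Every response resolves to exactly one explicit, auditable outcome.
type Decision =
  | { action: "allow" }
  | { action: "block"; reason: string }
  | { action: "rewrite"; reason: string; rewritten: string }
  | { action: "audit"; note: string };

// A deterministic record of why each response was (or was not) allowed.
function recordDecision(responseId: string, decision: Decision): void {
  console.log(
    JSON.stringify({ responseId, ...decision, at: new Date().toISOString() })
  );
}
```

Recording the decision and its reason is what makes the final principle below enforceable.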


Why this changes everything

 
Once AI output is treated as untrusted input:

  • simpler models become viable
  • failures become predictable
  • compliance becomes enforceable
  • incidents are caught before they cause damage

The model becomes a suggestion engine, not a source of truth.

That’s exactly where probabilistic systems belong.


This isn’t about safety; it’s about systems

 
This isn’t a moral argument.

It’s a production one.

Every mature system enforces trust at boundaries.

AI systems are no different.


Final principle

 
If your system cannot deterministically explain why an AI response was allowed,

then it should not have been allowed.


If you’re interested in enforcing this boundary in real systems,

Gateia is an open-source TypeScript SDK built specifically for post-generation verification:

npm install gateia

Built to be boring.
Built to be strict.
Built for production.

