LLM guardrails have gotten good. Tools like Guardrails AI, NVIDIA NeMo Guardrails, Lakera Guard, and Bifrost solve a real problem: they sit between your application and the language model, scanning prompts for injection attacks, filtering responses for PII and toxic content, and enforcing output format constraints. If you're running an LLM in production without any of these, you should fix that today.
But here's the thing nobody in the guardrails conversation is saying clearly enough: LLM input/output filtering is one layer of a much larger governance problem. And for most organizations, it's not even the most important layer.
The companies getting asked hard governance questions by enterprise customers, auditors, and regulators aren't being asked "do you filter toxic LLM outputs?" They're being asked "how do you enforce your organizational policies across every system where AI-generated content flows?" That's a fundamentally different question, and guardrails tools don't answer it.
What Guardrails Tools Actually Do
To be fair to the existing tools, let's be precise about what they cover. LLM guardrails operate on a single interaction pattern: a prompt goes in, a response comes out, and rules are applied to one or both.
The typical capabilities include prompt injection detection, which catches attempts to override system instructions; PII detection and redaction, which prevents models from leaking sensitive data in responses; toxicity and content safety filtering, which blocks harmful or inappropriate outputs; output format validation, which ensures structured responses match expected schemas; and topic restriction, which keeps the model from engaging with off-limits subjects.
These are valuable capabilities. If your product sends user input to GPT-4 and returns the response, you absolutely need this layer. The tools that exist are mature, performant, and well-documented.
The problem is scope.
The Five Gaps
Gap 1: Guardrails Cover One Surface. Organizations Have Many.
An AI-powered healthcare company doesn't just have LLM outputs to worry about. They have developers committing code that might contain API keys or hardcoded credentials. They have team members sharing documents through Google Drive that might contain unredacted patient data. They have Slack channels where sensitive information gets discussed. They have emails going to external recipients that might violate disclosure policies. They have AI agents taking multi-step actions across systems.
Every one of these surfaces has rules that need to be enforced. A guardrails tool that filters LLM responses leaves all of these completely ungoverned.
The compliance officer isn't thinking in terms of "LLM calls." They're thinking in terms of "everywhere sensitive data flows." The CISO isn't thinking about "prompt injection." They're thinking about "every vector where a policy can be violated." Governance that only covers the LLM call leaves the other surfaces to manual review, hope, or nothing.
Gap 2: Guardrails Are Stateless. Agent Governance Requires State.
LLM guardrails evaluate a single request-response pair. Each check is independent. There's no memory of what happened before, no context about what's coming next.
This works fine for a chatbot. It doesn't work for an AI agent.
An agent receives a goal, breaks it into steps, calls tools, reads results, makes decisions, and takes actions across multiple systems. The individual steps might each pass a guardrails check. The sequence might be a policy violation.
An agent that reads a patient's medical records (allowed), drafts a summary (allowed), and emails it to an external address (violation) won't be caught by stateless guardrails. Each action looks fine on its own. The violation only emerges when you evaluate the full session: the agent accessed PHI earlier and is now attempting to send content externally.
Session-aware evaluation requires tracking what tools have been used, what data has been accessed, and what actions have already been taken within a workflow. Guardrails tools don't maintain session state because they were designed for a different problem.
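As a minimal sketch of what session-aware evaluation means in practice (all names here are hypothetical, not the API of any particular tool), the key difference from stateless guardrails is that accumulated session state is an input to every check:

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Accumulated state for one agent workflow."""
    tools_used: list = field(default_factory=list)
    data_tags: set = field(default_factory=set)  # e.g. {"PHI"} once records are read

def record(session: Session, tool: str, tags=()):
    """Log a completed agent action into the session."""
    session.tools_used.append(tool)
    session.data_tags.update(tags)

def evaluate(session: Session, action: str) -> str:
    """A cross-action rule: no external sends after PHI was accessed this session."""
    if action == "send_external" and "PHI" in session.data_tags:
        return "BLOCK"
    return "ALLOW"

s = Session()
record(s, "read_medical_record", tags={"PHI"})  # passes any stateless check
record(s, "draft_summary")                      # passes any stateless check
print(evaluate(s, "send_external"))             # → BLOCK, but only given the history
```

A stateless filter sees only the final send, which looks harmless on its own; the block decision is only reachable because the session remembers the earlier PHI access.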
Gap 3: Guardrails Apply Generic Safety Rules. Organizations Need Their Own Rules.
The rules baked into guardrails tools are universal: don't leak PII, don't generate toxic content, don't allow prompt injection. These are table stakes and they're the same for every company.
But the rules that actually matter to a specific organization are unique to that organization. A financial services company needs to enforce that AI-generated communications never promise specific returns. A healthcare company needs to enforce minimum necessary standards for PHI disclosure. A SaaS company needs to enforce that AI outputs comply with their specific brand voice guidelines and don't mention competitors. A legal team needs to ensure AI-drafted contracts don't include unauthorized indemnification clauses.
These rules can't be defined by a guardrails vendor because the vendor doesn't know your business. They live in your compliance documents, your brand guides, your contracts, your regulatory frameworks. Turning these organization-specific rules into enforceable checks requires a different architecture: one that can ingest your documents, extract your rules, and evaluate content against your specific context.
This is the difference between generic safety (what guardrails tools provide) and organizational governance (what enterprises actually need).
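To make that distinction concrete, here is a toy sketch of what an organization-specific rule might look like once extracted from a compliance document and turned into an enforceable check. The source path, rule text, and pattern are all invented for illustration; a real system would derive these from your documents rather than hardcode them:

```python
import re

# Hypothetical rules extracted from the organization's own documents.
# Each check carries a pointer back to the source it was derived from.
RULES = [
    {
        "source": "compliance/external-comms-policy.md",
        "description": "Never promise specific returns in external communications",
        "pattern": re.compile(r"guaranteed?\s+\d+(\.\d+)?%\s+returns?", re.IGNORECASE),
    },
]

def violations(text: str) -> list[str]:
    """Return the descriptions of every organizational rule the text breaks."""
    return [r["description"] for r in RULES if r["pattern"].search(text)]

print(violations("Our fund offers a guaranteed 12% return."))
print(violations("Past performance does not predict future results."))
```

No generic safety filter would flag the first sentence: it contains no PII and no toxicity. It is only a violation against this organization's own policy.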
Gap 4: Guardrails Monitor or Block. Governance Requires Graduated Enforcement.
Most guardrails tools offer two modes: allow or block. Some add a "flag" option. This binary model works for clear-cut safety rules, such as "never output a social security number." But organizational governance isn't binary.
Some violations should block the action entirely. A code commit containing an AWS secret key should never reach the repository. Some violations should warn but allow. A marketing email that uses slightly off-brand language should flag for review, not get blocked in the middle of a campaign. Some violations should monitor silently. A Slack message that mentions a competitor's product might be worth tracking for pattern analysis, but blocking it would be disruptive and counterproductive.
Enforcement modes need to match the severity and context of the violation. And they need to apply differently across different surfaces. A HIPAA violation in an LLM response should block. The same content in an internal Slack channel between authorized personnel might only need monitoring. The enforcement logic has to be configurable per policy, per surface, per context.
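A graduated enforcement matrix can be as simple as a lookup keyed by policy and surface. This is an illustrative sketch with made-up policy and surface names, not a real product's configuration format:

```python
from enum import Enum

class Mode(Enum):
    BLOCK = "block"      # stop the action entirely
    WARN = "warn"        # allow, but flag for human review
    MONITOR = "monitor"  # allow silently, record for pattern analysis

# The same policy can enforce differently depending on the surface.
ENFORCEMENT = {
    ("no-phi-disclosure", "llm_response"): Mode.BLOCK,
    ("no-phi-disclosure", "internal_slack"): Mode.MONITOR,
    ("no-secrets-in-code", "code_commit"): Mode.BLOCK,
    ("brand-voice", "marketing_email"): Mode.WARN,
}

def mode_for(policy: str, surface: str, default: Mode = Mode.MONITOR) -> Mode:
    """Resolve the enforcement mode for a policy on a given surface."""
    return ENFORCEMENT.get((policy, surface), default)

print(mode_for("no-phi-disclosure", "llm_response").value)    # block
print(mode_for("no-phi-disclosure", "internal_slack").value)  # monitor
```

The point of the sketch is the shape of the table: one policy, several surfaces, different modes, with a configurable default rather than a hardcoded allow/block binary.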
Gap 5: Guardrails Don't Generate Compliance Evidence
When an auditor asks "how do you govern your AI systems?", they don't want to hear "we run Lakera on our API calls." They want to see evidence: what policies exist, when they were last updated, what violations were detected, how they were resolved, and what the compliance rate looks like over time.
LLM guardrails tools are security tools. They block bad things. They don't generate the audit trails, compliance reports, and versioned policy histories that regulated industries require.
Governance evidence includes versioned policies showing who changed what and when. It includes violation logs with full context about what triggered the rule, what content was evaluated, and what enforcement action was taken. It includes compliance reports with time-range filtering for audit periods. It includes explainable evaluations showing why a specific rule triggered on specific content.
This evidence needs to be generated automatically as a byproduct of enforcement, not assembled manually before an audit. Every evaluation, every violation, every enforcement action should produce a record that contributes to a complete audit trail.
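In code, "evidence as a byproduct" means the enforcement function itself emits the audit record, so the trail cannot drift out of sync with what was actually enforced. A minimal sketch with hypothetical field names:

```python
import hashlib
import json
import datetime

def evaluate_and_log(policy_id: str, policy_version: str, surface: str,
                     content: str, decision: str, log: list) -> str:
    """Enforcement and evidence in one step: every decision appends a record."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "policy_id": policy_id,
        "policy_version": policy_version,  # versioned policies, auditable over time
        "surface": surface,
        # Store a content hash, not the raw content, to keep the log itself safe.
        "content_sha256": hashlib.sha256(content.encode()).hexdigest(),
        "decision": decision,
    }
    log.append(record)
    return decision

audit_log = []
evaluate_and_log("no-pii-external", "v3", "email", "draft body", "BLOCK", audit_log)
print(json.dumps(audit_log[0], indent=2))
```

Because the record is written on every evaluation, not just on violations, compliance reports and time-range queries fall out of the same log instead of being reassembled before an audit.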
What a Complete Governance Stack Looks Like
LLM guardrails aren't wrong. They're incomplete. They're one layer of what should be a multi-layer governance architecture.
A complete stack has five components.
Layer 1: LLM Input/Output Safety. This is what guardrails tools do today. Prompt injection prevention, PII filtering, toxicity detection, output validation. Keep using these. They're good at this specific job.
Layer 2: Multi-Surface Policy Enforcement. The same organizational rules need to apply wherever content flows. Code repositories, cloud storage, email, messaging platforms, and LLM calls should all be governed by the same policy engine. A rule that says "no PII in external communications" should work whether the communication is an LLM response, an email, a Slack message, or a shared document.
Layer 3: Organization-Specific Rules. Generic safety isn't enough. The governance engine needs to understand your specific compliance documents, brand guidelines, contracts, and regulatory requirements. This means ingesting your documents, extracting enforceable rules from them, and evaluating content against your organizational context, not just universal safety patterns.
Layer 4: Session-Aware Agent Governance. For AI agents, governance must be stateful. Track what tools the agent has used, what data it has accessed, and what actions it has taken across the entire workflow. Evaluate each action in the context of the full session, not in isolation. Enforce rules that span multiple actions, like "block external communication when PHI was accessed in this session."
Layer 5: Audit Trail and Compliance Evidence. Every evaluation should produce a record. Policies should be versioned. Violations should include full context. Compliance reports should be generated automatically. When the auditor asks, the evidence should already exist.
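To illustrate the Layer 2 idea, the following sketch defines one rule once and applies it to content from several surfaces; in a full system only the adapters that fetch the content would differ per surface. The SSN-style pattern and surface names are illustrative assumptions:

```python
import re

# One rule definition, owned by the policy engine, not by any single integration.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def check_no_pii(content: str) -> str:
    """The same 'no PII' check, regardless of where the content came from."""
    return "VIOLATION" if SSN_PATTERN.search(content) else "OK"

# Different surfaces feed the same check; only the content source varies.
surfaces = {
    "llm_response": "Your SSN is 123-45-6789.",
    "slack_message": "Lunch at noon?",
    "code_commit": "# TODO remove test ssn 123-45-6789",
}
for surface, content in surfaces.items():
    print(surface, check_no_pii(content))
```

Writing the rule once and routing every surface through it is what makes "no PII in external communications" a single policy rather than five parallel reimplementations.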
The Organizational Disconnect
There's a reason this gap exists. The teams buying guardrails tools and the teams responsible for organizational governance are often different people.
The engineering team deploys Guardrails AI or NeMo because they need to ship a safe LLM integration. They're solving an immediate technical problem: make sure the model doesn't say something dangerous. The tool works. The problem feels solved.
Meanwhile, the compliance officer is manually reviewing AI outputs against a regulatory checklist. The CISO is assembling audit evidence from scattered logs. The VP of Engineering is getting security questionnaires from enterprise prospects asking how the company governs AI across all its systems, and there's no good answer because the "guardrails" only cover one surface.
The engineering team thinks governance is handled. The compliance team knows it isn't. Neither team is wrong about their own piece. They're just looking at different scopes of the same problem.
The Convergence
These worlds are going to merge. The same tool that enforces rules on LLM outputs should enforce rules on code commits, document sharing, email content, and agent actions. The same policy definition should work across all surfaces. The same audit trail should capture every enforcement decision regardless of where it happened.
This isn't a theoretical argument. It's a practical one driven by two forces.
First, regulations don't scope to a single surface. The EU AI Act doesn't say "govern your LLM calls." HIPAA doesn't say "filter PHI from chatbot responses." They say "protect patient data wherever it appears." SOC 2 auditors ask about controls across all systems, not just the ones running language models. Governance that only covers one surface creates a compliance gap that widens as AI gets embedded in more workflows.
Second, enterprise buyers are asking for it. When a Fortune 500 company evaluates an AI vendor, their security review covers the entire system. "We use Lakera for LLM safety" answers one question on a 200-question security questionnaire. The other 199 questions are about data handling, access controls, audit trails, and policy enforcement across the whole product. Vendors who can demonstrate comprehensive governance pass the review. Vendors who can only point to LLM guardrails get more questions.
Where This Leaves You
If you're running LLMs in production, keep your guardrails. Prompt injection prevention and PII filtering are real, immediate needs.
But if you're building AI products for regulated industries, if your enterprise customers are asking how you govern AI across your entire system, if your compliance team is manually reviewing content that should be automatically enforced, then guardrails are necessary but not sufficient.
The question isn't "do we have LLM guardrails?" It's "do we have governance across every surface where our organizational rules need to be enforced?"
For most organizations, the honest answer is no. And the gap between "we filter LLM outputs" and "we enforce our policies everywhere, automatically, with evidence" is where the actual governance work begins.
I'm building Aguardic — a policy-as-code platform that enforces organizational rules across code, AI outputs, documents, agents, email, and messaging, with full audit trails. Happy to answer questions about governance beyond LLM guardrails in the comments.