I've been building AI-powered features for a while now, and the hardest conversations I have with my team are never about which model to use. They're always about the same thing: what is this system actually allowed to do, and how do we prove it?
That question pushed me to build PolicyAware - an open source Python control plane that sits in front of your models, tools, and retrieval systems. Before I explain what it does, I want to walk through why the tools most teams reach for first - guardrails, AI gateways, and model routers - are genuinely useful but leave a critical gap wide open.
The landscape right now
If you search for "AI safety" or "LLM governance" you will find three categories of tools coming up again and again:
- Guardrail libraries - validate prompts and outputs against safety rules
- AI gateways - proxy your requests to model providers, centralize API keys
- Model routers - pick the cheapest or fastest model for each request
All three are useful. None of them alone answers the governance question.
Here is the mental model I use: a guardrail checks what the model says. A gateway manages where the request goes. A router decides which model handles it. But none of them ask the most important question first: should this request be allowed to run at all, under this user's role, for this tenant, in this region, given this risk level?
That is the gap PolicyAware fills.
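The question above can be sketched as an explicit allowlist in which every dimension has to match and anything unmatched is refused. This is a minimal illustration, not PolicyAware's API. The `Rule` shape, the `RULES` entries, the `refunds` task type, and the integer risk scale are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: who may run which task, for which tenant, where.
    role: str
    tenant: str
    regions: FrozenSet[str]
    task_types: FrozenSet[str]
    max_risk: int  # 0 = low, 1 = medium, 2 = high


# Illustrative rules only; a real deployment would load these from config.
RULES: List[Rule] = [
    Rule("support_agent", "acme-corp", frozenset({"us-east"}),
         frozenset({"customer_support"}), max_risk=1),
    Rule("finance_manager", "acme-corp", frozenset({"us-east", "eu-west"}),
         frozenset({"customer_support", "refunds"}), max_risk=2),
]


def is_allowed(role: str, tenant: str, region: str, task_type: str, risk: int) -> bool:
    """Deny by default: return True only if some rule explicitly matches."""
    for rule in RULES:
        if (rule.role == role
                and rule.tenant == tenant
                and region in rule.regions
                and task_type in rule.task_types
                and risk <= rule.max_risk):
            return True
    return False


print(is_allowed("support_agent", "acme-corp", "us-east", "customer_support", risk=0))  # True
print(is_allowed("support_agent", "acme-corp", "eu-west", "customer_support", risk=0))  # False: region not listed
print(is_allowed("intern", "acme-corp", "us-east", "customer_support", risk=0))  # False: no rule at all
```

The important property is the final `return False`: a missing rule is a denial, not an implicit allow.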
Side-by-side comparison
| Capability | Guardrails | AI Gateway | Model Router | PolicyAware |
|---|---|---|---|---|
| Block unsafe prompts before execution | Sometimes | Sometimes | No | Yes |
| Redact PII / PHI / secrets pre-execution | Sometimes | Sometimes | No | Yes |
| Decisions using role, tenant, region, risk | Limited | Limited | Limited | Yes |
| Deny-by-default posture | Usually no | Usually no | No | Yes |
| Govern MCP / agent tool calls | Usually no | Sometimes | No | Yes |
| Require human approval for risky actions | Usually no | Sometimes | No | Yes |
| Route across providers after policy approval | No | Yes | Yes | Yes |
| Evaluate RAG citation, grounding, leakage | Sometimes | Limited | No | Yes |
| Emit audit traces with reason codes | Limited | Sometimes | Limited | Yes |
| Generate compliance evidence artifacts | Usually no | Usually no | No | Yes |
The right column is not a flex. It is a description of what enterprise AI systems actually need once they move beyond read-only chat and start touching real data, real tools, and real business workflows.
When each tool is the right call
Use a guardrails library when your only need is response formatting, toxicity filtering, or structured output validation. If you do not need RBAC, tenant rules, approval flows, or audit evidence, a guardrail is lighter and faster.
Use an AI gateway when your main problem is juggling provider keys, rate limits, and fallback routing. Gateways are great infrastructure. They are just not governance.
Use a model router when you are optimizing for cost, latency, or quality tradeoffs across providers. A router does not decide whether a request should run - only which model would run it.
Use PolicyAware when your AI system touches sensitive data, calls external tools, operates under regional compliance rules, or takes actions with financial or operational consequences. If you need to explain a decision to a security team six months from now, you need a control plane, not just a proxy.
How the architecture fits together
Here is the pattern I use in production. The key rule is: nothing reaches a model, retriever, or tool until the control plane has made an explicit decision.
```text
+-----------------------------+
|      Application Layer      |
| (web app / API / workflow)  |
+-------------+---------------+
              |
              v
+-----------------------------+
|  PolicyAware Control Plane  |
|                             |
|  1. Identity + context      |
|  2. Deny-by-default check   |
|  3. PII / PHI detection     |
|  4. Risk classification     |
|  5. Approval gate (if high) |
|  6. Provider routing        |
+--------+----------+---------+
         |          |
         v          v
    +----------+ +----------+
    | RAG Layer| | Tools /  |
    | retrieval| | MCP      |
    | citation | | payments |
    +----+-----+ +----+-----+
         |            |
         +-----+------+
               |
               v
        +-------------+
        | Model Layer |
        | (local/SaaS)|
        +------+------+
               |
               v
        +-------------+
        | Evaluations |
        | leakage     |
        | grounding   |
        | audit trace |
        +-------------+
```
Every arrow in that diagram has a policy decision attached to it. That is the entire point.
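The numbered steps in the control plane box only work if they run in order and the first refusal stops everything downstream. Here is a minimal sketch of that short-circuiting behavior. It assumes each stage is a plain function that returns the (possibly updated) request plus an optional verdict. The stage functions, `KNOWN_ROLES`, and the keyword-based risk check are hypothetical stand-ins, and tool checks and routing are left out for brevity.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
KNOWN_ROLES = {"support_agent", "finance_manager"}  # hypothetical


@dataclass(frozen=True)
class Request:
    user_id: str
    role: str
    prompt: str
    risk: str = "unclassified"


# Each stage returns the (possibly updated) request plus an optional stop verdict.
StageResult = Tuple[Request, Optional[str]]
Stage = Callable[[Request], StageResult]


def identity(req: Request) -> StageResult:
    # 1. No authenticated user, no execution.
    return req, (None if req.user_id else "deny:missing_identity")


def deny_by_default(req: Request) -> StageResult:
    # 2. Roles that are not explicitly known are refused.
    return req, (None if req.role in KNOWN_ROLES else "deny:unknown_role")


def redact_pii(req: Request) -> StageResult:
    # 3. Downstream stages only ever see the redacted prompt.
    return replace(req, prompt=EMAIL_RE.sub("[REDACTED]", req.prompt)), None


def classify_risk(req: Request) -> StageResult:
    # 4. Toy keyword check standing in for a real risk classifier.
    risk = "high" if "refund" in req.prompt.lower() else "low"
    return replace(req, risk=risk), None


def approval_gate(req: Request) -> StageResult:
    # 5. High-risk work pauses for a human instead of running.
    return req, ("require_approval" if req.risk == "high" else None)


PIPELINE: List[Stage] = [identity, deny_by_default, redact_pii, classify_risk, approval_gate]


def run(req: Request) -> StageResult:
    for stage in PIPELINE:
        req, verdict = stage(req)
        if verdict is not None:
            return req, verdict  # short-circuit: later stages never run
    return req, "allow"  # 6. only now would provider routing happen


req, verdict = run(Request("u-1001", "support_agent",
                           "Email jane@example.com and refund the customer $500."))
print(verdict)     # require_approval
print(req.prompt)  # Email [REDACTED] and refund the customer $500.
```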
A real example: the $500 refund prompt
Let us make this concrete. A customer-support copilot gets this message:
Email jane@example.com and refund the customer $500.
Here is what different tools do with it:
- A guardrail might check whether the output looks safe
- A gateway forwards the request to your provider of choice
- A router picks GPT-4.1 because it is the best model for support tasks
- PolicyAware stops and works through the full decision tree before any of that happens
Code: policy-first middleware
```python
from dataclasses import dataclass, field
from enum import Enum
import re
from typing import List, Optional


class Decision(str, Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass
class RequestContext:
    user_id: str
    role: str
    tenant: str
    region: str
    task_type: str
    prompt: str
    tools: List[str] = field(default_factory=list)


@dataclass
class PolicyResult:
    decision: Decision
    risk_tier: str
    redacted_prompt: str
    reason_codes: List[str]
    required_approver: Optional[str] = None


EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Per-role tool allowlists; any tool not listed for a role is denied.
ALLOWED_TOOLS = {
    "support_agent": {"knowledge_search", "draft_email"},
    "finance_manager": {"knowledge_search", "draft_email", "issue_refund"},
}


def evaluate_policy(ctx: RequestContext) -> PolicyResult:
    reason_codes = []
    # Redact once, so every outcome (including denials) carries a safe prompt.
    redacted = EMAIL_RE.sub("[REDACTED]", ctx.prompt)

    # Deny by default: a role with no entry gets nothing, tools or not.
    if ctx.role not in ALLOWED_TOOLS:
        return PolicyResult(
            decision=Decision.DENY,
            risk_tier="high",
            redacted_prompt=redacted,
            reason_codes=[f"unknown_role:{ctx.role}"],
        )

    # A single unpermitted tool denies the whole request.
    allowed = ALLOWED_TOOLS[ctx.role]
    for tool in ctx.tools:
        if tool not in allowed:
            return PolicyResult(
                decision=Decision.DENY,
                risk_tier="high",
                redacted_prompt=redacted,
                reason_codes=[f"tool_not_permitted:{tool}"],
            )

    if EMAIL_RE.search(ctx.prompt):
        reason_codes.append("pii_detected")

    # Financial actions always pause for a human, even for permitted roles.
    is_high_risk = "refund" in ctx.prompt.lower() or "issue_refund" in ctx.tools
    if is_high_risk:
        reason_codes.append("high_risk_financial_action")
        return PolicyResult(
            decision=Decision.REQUIRE_APPROVAL,
            risk_tier="high",
            redacted_prompt=redacted,
            reason_codes=reason_codes,
            required_approver="finance_supervisor",
        )

    reason_codes.append("policy_allow")
    return PolicyResult(
        decision=Decision.ALLOW,
        risk_tier="low",
        redacted_prompt=redacted,
        reason_codes=reason_codes,
    )


# Try the refund prompt
ctx = RequestContext(
    user_id="u-1001",
    role="support_agent",
    tenant="acme-corp",
    region="us-east",
    task_type="customer_support",
    prompt="Email jane@example.com and refund the customer $500.",
    tools=["draft_email", "issue_refund"],
)
result = evaluate_policy(ctx)
print(result.decision)      # Decision.DENY
print(result.reason_codes)  # ['tool_not_permitted:issue_refund']
```
The support agent is denied before the prompt ever reaches a model. The result carries the reason code and the redacted prompt, ready to be written to the audit trail your security team will ask for.
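The policy example only redacts email addresses, while the comparison table also promises PHI and secrets. One way to extend the same idea is a table of labeled patterns, each producing its own reason code. The patterns below (a US SSN-shaped number and an `sk-`-prefixed key) are illustrative assumptions, and regexes alone will miss plenty of real PII, so treat this as a sketch rather than a detector.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical detector table: label -> pattern. Extend or replace with a real detector.
PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


def redact(text: str) -> Tuple[str, List[str]]:
    """Replace every match with a typed placeholder and report what was found."""
    reason_codes = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label.upper()}]", text)
        if count:
            reason_codes.append(f"pii_detected:{label}")
    return text, reason_codes


clean, codes = redact("Refund jane@example.com, SSN 123-45-6789, key sk-abcdefghijklmnop1234")
print(clean)  # Refund [EMAIL], SSN [US_SSN], key [API_KEY]
print(codes)  # ['pii_detected:email', 'pii_detected:us_ssn', 'pii_detected:api_key']
```

Typed placeholders keep the audit trail useful: a reviewer can see that an SSN was present without ever seeing the SSN.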
Code: compliant model routing
Routing still matters - but it should only happen after policy approves the request.
```python
@dataclass
class RouteDecision:
    provider: str
    model: str
    reason: str


# Only providers that satisfy each region's residency rules are listed.
COMPLIANT_MODELS = {
    "us-east": [("azure_openai", "gpt-4.1"), ("local_vllm", "llama-3.1-70b")],
    "eu-west": [("azure_openai_eu", "gpt-4.1"), ("local_vllm_eu", "llama-3.1-70b")],
}


def route_after_policy(result: PolicyResult, ctx: RequestContext) -> RouteDecision:
    # Routing is unreachable unless policy explicitly allowed the request.
    if result.decision != Decision.ALLOW:
        raise PermissionError(f"Cannot route - decision is {result.decision.value}")
    options = COMPLIANT_MODELS.get(ctx.region, [])
    if not options:
        # No compliant provider means no call, rather than a fallback elsewhere.
        raise RuntimeError(f"No compliant providers for region: {ctx.region}")
    provider, model = options[0]
    return RouteDecision(provider, model, "policy-approved compliant route")
```
A traditional router asks: which model is fastest? This asks: which model is allowed? The order of those questions changes everything about your compliance posture.
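`route_after_policy` simply takes the first compliant option. If you also want to optimize for speed, the same ordering still applies: filter by what is allowed, then optimize within that set. A sketch, assuming a hypothetical `Candidate` type and made-up latency numbers that would in practice come from your own monitoring:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    provider: str
    model: str
    region: str
    p50_latency_ms: float  # hypothetical measured latency


def pick_route(candidates: List[Candidate], region: str) -> Optional[Candidate]:
    # Step 1: which models are allowed? Drop anything outside the request's region.
    allowed = [c for c in candidates if c.region == region]
    if not allowed:
        return None  # no compliant option means no call, never a silent fallback
    # Step 2: only now optimize, and only among the allowed set.
    return min(allowed, key=lambda c: c.p50_latency_ms)


CANDIDATES = [
    Candidate("azure_openai", "gpt-4.1", "us-east", 420.0),
    Candidate("local_vllm", "llama-3.1-70b", "us-east", 380.0),
    Candidate("azure_openai_eu", "gpt-4.1", "eu-west", 450.0),
    Candidate("fast_global_api", "some-model", "global", 120.0),  # fastest, but not compliant
]

print(pick_route(CANDIDATES, "eu-west").provider)  # azure_openai_eu
print(pick_route(CANDIDATES, "us-east").provider)  # local_vllm
print(pick_route(CANDIDATES, "ap-south"))          # None
```

The fastest candidate overall never wins here, because it was filtered out before latency was ever considered.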
Code: audit trace
This is the piece most teams skip - and regret during their first security review.
```python
from datetime import datetime, timezone
import json
from typing import Optional


def emit_audit_trace(ctx: RequestContext, result: PolicyResult,
                     route: Optional[RouteDecision] = None) -> dict:
    trace = {
        # Timezone-aware UTC; datetime.utcnow() is deprecated as of Python 3.12.
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z"),
        "user_id": ctx.user_id,
        "tenant": ctx.tenant,
        "region": ctx.region,
        "task_type": ctx.task_type,
        "decision": result.decision.value,
        "risk_tier": result.risk_tier,
        "reason_codes": result.reason_codes,
        "tools_requested": ctx.tools,
        "route": None if route is None else {
            "provider": route.provider,
            "model": route.model,
        },
        # Only the redacted prompt is recorded, truncated to keep traces small.
        "prompt_preview": result.redacted_prompt[:200],
    }
    print(json.dumps(trace, indent=2))
    return trace
```
Sample output for the denied refund request:
```json
{
  "timestamp": "2026-05-23T15:00:00Z",
  "user_id": "u-1001",
  "tenant": "acme-corp",
  "region": "us-east",
  "task_type": "customer_support",
  "decision": "deny",
  "risk_tier": "high",
  "reason_codes": [
    "tool_not_permitted:issue_refund"
  ],
  "tools_requested": [
    "draft_email",
    "issue_refund"
  ],
  "route": null,
  "prompt_preview": "Email [REDACTED] and refund the customer $500."
}
```
Every denied request, every approval gate, every route choice - all replayable. That is the evidence layer.
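`emit_audit_trace` prints to stdout, which is fine for a demo, but replaying decisions means persisting them somewhere you can query later. A minimal sketch, assuming an append-only JSON Lines file; the file location, the helper names, and the second trace record are hypothetical, and in production you would likely point this at whatever log store you already run.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Dict, Iterator


def append_trace(path: str, trace: Dict) -> None:
    # One JSON object per line; appending never rewrites earlier records.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(trace, sort_keys=True) + "\n")


def read_traces(path: str) -> Iterator[Dict]:
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


path = os.path.join(tempfile.mkdtemp(), "audit.jsonl")  # hypothetical location
append_trace(path, {"user_id": "u-1001", "decision": "deny",
                    "reason_codes": ["tool_not_permitted:issue_refund"]})
append_trace(path, {"user_id": "u-2002", "decision": "require_approval",
                    "reason_codes": ["pii_detected", "high_risk_financial_action"]})

# Replay: count every reason code behind a non-allow decision.
counts = Counter(code
                 for t in read_traces(path) if t["decision"] != "allow"
                 for code in t["reason_codes"])
print(dict(counts))
```

Because each line is a self-contained record, the same file answers "why was this denied?" for one request and "what gets denied most?" across all of them.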
Start using PolicyAware today
PolicyAware is open source, MIT licensed, and published as a Python package. You do not need a SaaS contract. You do not need to rip out your existing stack. Drop it in as a middleware layer in front of your LLM calls.
```bash
pip install policyaware
```
Simplest integration pattern:
```python
from policyaware import evaluate_policy, RequestContext


def handle_message(current_user, user_message, requested_tools):
    # current_user, call_your_llm and request_human_approval are placeholders
    # for your own application objects and functions.
    ctx = RequestContext(
        user_id=current_user.id,
        role=current_user.role,
        tenant=current_user.tenant,
        region=current_user.region,
        task_type="customer_support",
        prompt=user_message,
        tools=requested_tools,
    )
    result = evaluate_policy(ctx)

    if result.decision == "allow":
        # Only the redacted prompt is sent to the model.
        return call_your_llm(result.redacted_prompt)
    elif result.decision == "require_approval":
        return request_human_approval(ctx, result)
    else:
        return {"error": "Request denied", "reason": result.reason_codes}
```
One function call between your application and your model. Policy first. Everything else second.
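The integration snippet leaves `request_human_approval` to you. One possible shape for it is a queue where the request waits until someone holding the required approver role signs off. This sketch assumes an in-memory store and takes the approver's role as a plain string; a real deployment would persist pending items and authenticate the approver. `ApprovalQueue` and its methods are illustrative names, not taken from the `policyaware` package.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PendingApproval:
    request_id: str
    redacted_prompt: str
    required_approver: str
    status: str = "pending"  # pending -> approved | rejected
    decided_by: Optional[str] = None


class ApprovalQueue:
    """Hypothetical in-memory approval store; nothing here calls a model."""

    def __init__(self) -> None:
        self._items: Dict[str, PendingApproval] = {}

    def submit(self, redacted_prompt: str, required_approver: str) -> str:
        request_id = str(uuid.uuid4())
        self._items[request_id] = PendingApproval(request_id, redacted_prompt, required_approver)
        return request_id

    def decide(self, request_id: str, approver_role: str, approve: bool,
               approver_id: str) -> PendingApproval:
        item = self._items[request_id]
        if item.status != "pending":
            raise ValueError(f"{request_id} already {item.status}")
        # Only the role named by the policy result may sign off.
        if approver_role != item.required_approver:
            raise PermissionError(f"{approver_role} cannot approve; needs {item.required_approver}")
        item.status = "approved" if approve else "rejected"
        item.decided_by = approver_id
        return item


queue = ApprovalQueue()
rid = queue.submit("Email [REDACTED] and refund the customer $500.", "finance_supervisor")
try:
    queue.decide(rid, "support_agent", approve=True, approver_id="u-1001")
except PermissionError as err:
    print(err)  # support_agent cannot approve; needs finance_supervisor
print(queue.decide(rid, "finance_supervisor", approve=True, approver_id="u-9001").status)  # approved
```

Only the redacted prompt is queued, so the approver sees the same sanitized request the model would have seen.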
The bottom line
Guardrails make your outputs safer. Gateways make your infrastructure cleaner. Routers make your model spend smarter. But none of them govern the full execution path.
If your AI system is making decisions that touch real people, real money, or real compliance boundaries - you need a control plane that runs policy before execution and produces evidence after it.
That is exactly what PolicyAware is built for. Star the repo, install the package, and let me know what governance problems you are running into - I am actively building this out in the open.
GitHub: https://github.com/ktirupati/policyaware
pip install policyaware