KRISHNA KISHOR TIRUPATI

Posted on May 18

Build a Policy-Aware AI Gateway in Python: Data Protection + Policy Enforcement with policyaware

#ai #security #opensource #python

Most AI apps ship without any real governance layer. Prompts flow raw to models, sensitive data ends up in logs, and nobody finds out until a compliance audit or a breach. I built policyaware to fix that. It is a Python-first package that puts data protection and policy enforcement in front of any AI system.

This article is a hands-on technical walkthrough. Every section has working code. By the end you will have a pattern you can wire into any AI gateway or agent pipeline today.

Quick Install

pip install policyaware  # inside a Jupyter notebook, prefix the command with "!"

GitHub: https://github.com/ktirupati/policyaware
Wiki: https://github.com/ktirupati/policyaware/wiki

Part 1 — Data Protection

What the engine detects

The DataProtectionEngine scans any string and returns a structured DataFindings object. It classifies content into three buckets:

Bucket	What it catches
PII	email, phone, SSN, credit card
PHI	medical record, patient ID, diagnosis, medication
Secrets	API keys, bearer tokens, private keys

Inspecting a prompt

from policyaware import DataProtectionEngine

text = "Hi, I'm Jane. Reach me at jane@example.com or 212-555-7890."

engine = DataProtectionEngine()
findings = engine.inspect(text)

print(findings.contains_pii)        # True
print(findings.contains_phi)        # False
print(findings.contains_secrets)    # False
print(findings.contains_sensitive)  # True  (aggregate flag)
print(findings.categories)          # ['email', 'phone']
print(findings.redactions)          # 2

DataFindings field reference

Field	Type	Description
`contains_pii`	bool	email, phone, SSN, credit card detected
`contains_phi`	bool	medical record, diagnosis, medication detected
`contains_secrets`	bool	API key, bearer token, private key detected
`contains_sensitive`	bool	True if any of the above is True
`categories`	list	e.g. `['email', 'phone', 'ssn']`
`redactions`	int	Total number of matches found
`redacted_text`	str	Sanitised text returned by `.redact()`
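
To make the field semantics concrete, here is a minimal stdlib-only sketch of a regex scanner that produces the same kind of result. It is not policyaware's implementation: the patterns, the `Findings` class, the `scan` function, and the `[REDACTED:...]` placeholder format are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns only; production detectors need far stricter rules.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class Findings:
    """Hypothetical stand-in for a findings object."""
    categories: list = field(default_factory=list)
    redactions: int = 0
    redacted_text: str = ""

    @property
    def contains_pii(self) -> bool:
        return bool(self.categories)


def scan(text: str) -> Findings:
    """Replace every pattern match with a labelled placeholder and count it."""
    found = Findings(redacted_text=text)
    for name, pattern in PATTERNS.items():
        found.redacted_text, count = pattern.subn(
            f"[REDACTED:{name}]", found.redacted_text
        )
        if count:
            found.categories.append(name)
            found.redactions += count
    return found


result = scan("Hi, I'm Jane. Reach me at jane@example.com or 212-555-7890.")
print(result.categories)     # ['email', 'phone']
print(result.redactions)     # 2
print(result.redacted_text)
```

Note that the name "Jane" survives redaction in this sketch, because plain regexes cannot recognise names reliably.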

Part 2 — Policy Enforcement

Data protection tells you what is in the request. Policy enforcement tells you what to do about it. The PolicyEngine loads a YAML file and evaluates every request against your rules, returning a structured PolicyDecision.

The four decision outcomes

Decision	Meaning
`allow`	Request passes through after any transforms are applied
`deny`	Request is blocked outright
`conditional_allow`	Passes but triggers follow-up checks
`require_approval`	Routes to a human-in-the-loop flow

The engine is deny-by-default. If no rule explicitly grants access, the request is blocked. No silent pass-throughs.
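
The deny-by-default idea fits in a few lines of plain Python. The sketch below illustrates the general pattern rather than policyaware's actual algorithm: the `resolve` function and its precedence (any matching deny wins, otherwise at least one matching allow is required) are assumptions.

```python
def resolve(matched_effects: list[str], default: str = "deny") -> str:
    """Combine the effects of all matched rules into one decision.

    Assumed precedence for this sketch: an explicit deny always wins,
    an allow is needed to grant access, and anything else falls back
    to the default.
    """
    if "deny" in matched_effects:
        return "deny"
    if "allow" in matched_effects:
        return "allow"
    return default


print(resolve([]))                      # deny: no rule granted access
print(resolve(["transform"]))           # deny: a transform alone grants nothing
print(resolve(["transform", "allow"]))  # allow
print(resolve(["allow", "deny"]))       # deny
```

The important property is the last line of the function: forgetting to write a rule produces a block, never an accidental pass-through.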

Writing your first policy YAML

Rules reference DataFindings fields directly via the data root:

# support_policy.yaml
id: support_policy
schema_version: "0.2"
default: deny

rules:

  # Rule 1: Block anything containing secrets (API keys, tokens)
  - name: deny_secret_leakage
    effect: deny
    when:
      data.contains_secrets: true

  # Rule 2: Redact PII for everyone except privacy admins and compliance officers
  - name: redact_pii_standard_users
    effect: transform
    action: redact
    when:
      data.contains_pii: true
      user.role_not_in:
        - privacy_admin
        - compliance_officer

  # Rule 3: Allow support agents and managers in the US region for low- or medium-risk requests
  - name: allow_support_agents
    effect: allow
    when:
      user.role_in:
        - support_agent
        - support_manager
      request.region: us
      risk.tier_in:
        - low
        - medium
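
One way to read these `when` clauses: every key must hold for the rule to match, and a `_in` or `_not_in` suffix turns an equality check into a membership check. The following sketch implements that reading against a nested dictionary. The semantics are inferred from the YAML above and may differ from the library's evaluator; `lookup`, `matches`, and the context layout are hypothetical.

```python
from typing import Any


def lookup(context: dict, path: str) -> Any:
    """Resolve a dotted path such as 'user.role' in a nested dict."""
    node: Any = context
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def matches(when: dict, context: dict) -> bool:
    """Return True only if every condition in the clause holds (logical AND)."""
    for key, expected in when.items():
        # Check "_not_in" first, because it also ends with "_in".
        if key.endswith("_not_in"):
            if lookup(context, key[: -len("_not_in")]) in expected:
                return False
        elif key.endswith("_in"):
            if lookup(context, key[: -len("_in")]) not in expected:
                return False
        elif lookup(context, key) != expected:
            return False
    return True


context = {
    "user": {"role": "support_agent"},
    "request": {"region": "us"},
    "risk": {"tier": "low"},
    "data": {"contains_pii": True, "contains_secrets": False},
}

redact_rule = {
    "data.contains_pii": True,
    "user.role_not_in": ["privacy_admin", "compliance_officer"],
}
deny_rule = {"data.contains_secrets": True}

print(matches(redact_rule, context))  # True
print(matches(deny_rule, context))    # False
```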

Enforcing the policy at runtime

Load the YAML, build a GatewayRequest, inspect the prompt, then call decide:

from policyaware import DataProtectionEngine, GatewayRequest, PolicyEngine

# Load policy from YAML file
policy = PolicyEngine.from_file("support_policy.yaml")

# Build the request context
request = GatewayRequest(
    tenant="acme-corp",
    app="support-copilot",
    user={"role": "support_agent", "id": "u_001"},
    context={"region": "us", "risk": "low"},
    messages=[{"role": "user", "content": "Email jane@example.com, urgent!"}],
)

# Step 1: inspect the prompt
findings = DataProtectionEngine().inspect(request.prompt_text)

# Step 2: evaluate policy
decision = policy.decide(request, findings)

# Step 3: act on the decision
print(decision.decision.value)   # 'allow' / 'deny' / 'conditional_allow' / 'require_approval'
print(decision.actions)          # ['redact']
print(decision.matched_rules)    # ['redact_pii_standard_users', 'allow_support_agents']
print(decision.violated_rules)   # []
print(decision.reason)           # Human-readable explanation
print(decision.reason_codes)     # Machine-readable codes for logging
print(decision.risk_score)       # Numeric risk score
print(decision.risk_tier)        # 'low' / 'medium' / 'high' / 'critical'
print(decision.remediation)      # Suggested fix if blocked
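
What "act on the decision" means depends on your gateway. Below is a minimal sketch of a dispatcher for the four outcomes. It uses a stand-in `Decision` dataclass rather than the real `PolicyDecision`, and `handle`, `call_model`, `redact`, and the status strings are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Decision:
    """Stand-in carrying only the fields this sketch needs."""
    decision: str
    actions: list = field(default_factory=list)
    reason: str = ""


def handle(decision: Decision, prompt: str,
           call_model: Callable[[str], str],
           redact: Callable[[str], str]) -> dict:
    """Route a request according to the policy outcome."""
    if decision.decision == "deny":
        return {"status": "blocked", "reason": decision.reason}
    if decision.decision == "require_approval":
        # Park the request; a human reviewer releases or rejects it later.
        return {"status": "pending_approval", "reason": decision.reason}
    if decision.decision not in ("allow", "conditional_allow"):
        # Unknown outcome: fail closed rather than forward the prompt.
        return {"status": "blocked", "reason": "unrecognised decision"}

    # Apply transforms before the prompt leaves the gateway.
    if "redact" in decision.actions:
        prompt = redact(prompt)
    status = "ok" if decision.decision == "allow" else "ok_with_followup"
    return {"status": status, "response": call_model(prompt)}


# Toy stand-ins so the sketch runs end to end.
def echo_model(prompt: str) -> str:
    return f"model saw: {prompt}"


def mask_email(prompt: str) -> str:
    return prompt.replace("jane@example.com", "[REDACTED]")


out = handle(Decision("allow", ["redact"]), "Email jane@example.com, urgent!",
             echo_model, mask_email)
print(out)
```

Treating an unrecognised outcome as a block keeps the gateway consistent with the deny-by-default stance of the policy itself.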

PolicyDecision field reference

Field	Type	Description
`decision`	enum	`allow`, `deny`, `conditional_allow`, `require_approval`
`actions`	list	Transforms to apply e.g. `['redact']`
`matched_rules`	list	Rules that matched the request
`violated_rules`	list	Rules that were violated (for audit logs)
`reason`	str	Human-readable explanation
`reason_codes`	list	Machine-readable codes for dashboards
`risk_score`	float	Numeric risk score
`risk_tier`	str	`low`, `medium`, `high`, `critical`
`remediation`	str	Suggested fix when request is blocked
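
Because the decision is structured, it serialises naturally into one JSON log line per request. The sketch below assumes the decision has already been turned into a plain dictionary with the fields from the table. `audit_record`, the `request_id` parameter, and the sample values are hypothetical, and how you extract the fields from a real `PolicyDecision` object depends on the library.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("policy.audit")

# Machine-readable fields only; free-text fields are left out of this sketch.
AUDIT_FIELDS = ("decision", "actions", "matched_rules", "violated_rules",
                "reason_codes", "risk_score", "risk_tier")


def audit_record(request_id: str, decision: dict) -> str:
    """Build one JSON line from the selected decision fields."""
    record = {name: decision.get(name) for name in AUDIT_FIELDS}
    record["request_id"] = request_id
    record["timestamp"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(record, sort_keys=True)


# Made-up sample values, for illustration only.
line = audit_record("req-001", {
    "decision": "allow",
    "actions": ["redact"],
    "matched_rules": ["redact_pii_standard_users", "allow_support_agents"],
    "violated_rules": [],
    "reason": "free-text explanation",
    "reason_codes": [],
    "risk_score": 0.2,
    "risk_tier": "low",
})
logger.info(line)
print(line)
```

The free-text `reason` and `remediation` fields are omitted here to keep each line compact; include them if your audit requirements call for it.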

Policy Context Roots

Inside every when clause you can reference these roots:

Root	Example usage	What it covers
`tenant`	`tenant: acme`	Customer or team identifier
`app`	`app: support-copilot`	Calling application or service
`user`	`user.role_in: [support_agent]`	Role, ID, department attributes
`request`	`request.region: us`	Region, task type, autonomy level
`data`	`data.contains_pii: true`	Output from `DataProtectionEngine`
`risk`	`risk.tier_in: [low, medium]`	Risk score and tier
`ml`	`ml.prompt_injection.detected: true`	Optional ML classifier signals

Validate Policies Before Production

Ship broken policies and you get silent misses or unintended blocks. policyaware ships a schema validator and CLI to catch issues early.

Python validator:

import yaml
from policyaware import PolicySchemaValidator

with open("support_policy.yaml", "r", encoding="utf-8") as f:
    policy = yaml.safe_load(f)

PolicySchemaValidator().validate(policy)  # raises on schema errors

CLI commands:

# Validate the YAML schema
policyaware policy validate support_policy.yaml

# Explain how a specific request flows through your rules
policyaware policy explain --request sample_request.json

The explain command is especially useful in CI/CD pipelines — you can run policy checks against a suite of sample requests before merging.
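
A CI step along these lines could be a short script that runs the validator over every policy file and fails the build if any file is rejected. This sketch shells out to the `policyaware policy validate` command shown above and assumes it exits with a non-zero status on invalid input, which this article does not confirm. `validate_all`, `main`, and the `policies` directory are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Iterable


def validate_all(
    paths: Iterable[Path],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[Path]:
    """Run the validator on each file and return the ones that failed."""
    failed = []
    for path in paths:
        result = run(
            ["policyaware", "policy", "validate", str(path)],
            capture_output=True, text=True,
        )
        # Assumption: a non-zero exit code signals a validation failure.
        if result.returncode != 0:
            print(f"FAIL {path}\n{result.stdout}{result.stderr}", file=sys.stderr)
            failed.append(path)
    return failed


def main(policy_dir: str = "policies") -> int:
    """Return a process exit code: 0 if every policy validated, else 1."""
    failed = validate_all(sorted(Path(policy_dir).glob("*.yaml")))
    return 1 if failed else 0


# In a CI job, finish the script with: sys.exit(main())
```

The `run` parameter exists so the loop can be unit-tested with a fake runner. The same loop could wrap `policyaware policy explain` over a directory of sample requests.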

Optional: ML-Assisted PII Detection with Presidio

Regex-based rules miss things like names and addresses. For those, policyaware supports an optional Microsoft Presidio integration:

pip install "policyaware[presidio]"

from policyaware import PresidioPIIClassifier

classifier = PresidioPIIClassifier(score_threshold=0.5)

assessment = classifier.classify(
    "Jane Doe lives at 120 Main St and her phone is 212-555-7890."
)

print(assessment.model_dump())
# Prints the detected entities with type, value, and confidence score

The Presidio findings feed back into the same data and ml roots in your YAML, giving you deterministic and ML-based detection in one framework.
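
The `score_threshold` argument suggests the usual confidence cut-off: detections scored below the threshold are dropped. Here is a sketch of that idea, independent of both policyaware and Presidio. The `Entity` class, `keep_confident`, and the sample scores are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical detection record: type, matched text, confidence."""
    entity_type: str
    text: str
    score: float


def keep_confident(entities: list[Entity], threshold: float = 0.5) -> list[Entity]:
    """Drop detections whose confidence falls below the threshold."""
    return [e for e in entities if e.score >= threshold]


# Invented scores, not real classifier output.
detections = [
    Entity("PERSON", "Jane Doe", 0.85),
    Entity("LOCATION", "120 Main St", 0.40),
    Entity("PHONE_NUMBER", "212-555-7890", 0.75),
]

kept = keep_confident(detections, threshold=0.5)
print([e.entity_type for e in kept])  # ['PERSON', 'PHONE_NUMBER']
```

Raising the threshold trades recall for precision: fewer false positives reach your policy rules, but borderline entities such as the address above slip through undetected.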

TL;DR — What You Get in One Package

Capability	How
Detect PII, PHI, Secrets	`DataProtectionEngine().inspect(text)`
Redact sensitive content	`DataProtectionEngine().redact(text)`
Enforce access policies via YAML	`PolicyEngine.from_file("policy.yaml")`
Rich audit-ready decisions	`PolicyDecision` with reason, risk, remediation
ML-assisted detection	`PresidioPIIClassifier` (optional extra)
Validate policies before shipping	`PolicySchemaValidator` + CLI

Get Started Now

pip install policyaware

Here is the fastest path to seeing value:

1. Install the package
2. Run DataProtectionEngine().inspect() on one real prompt from your app
3. Write a 3-rule YAML that reflects your actual governance needs
4. Call policy.decide(request, findings) and log the full PolicyDecision

That four-step experiment is enough to understand whether policyaware fits your stack.

I am the author and sole maintainer of this package. I built it because every AI project I worked on had the same gap — no structured layer between raw user input and the model. If you run into anything unexpected, have a governance pattern not covered yet, or want to contribute, I want to hear from you.

GitHub: https://github.com/ktirupati/policyaware
Wiki & Docs: https://github.com/ktirupati/policyaware/wiki

If this was useful, drop a like, share it with your team, and star the repo. Every bit of feedback helps make policyaware better for everyone building serious AI systems in Python.
