Most AI apps ship without any real governance layer. Prompts flow raw to models, sensitive data ends up in logs, and nobody finds out until a compliance audit or a breach. I built policyaware to fix that — a Python-first package that gives you data protection and policy enforcement in front of any AI system.
This article is a hands-on technical walkthrough. Every section has working code. By the end you will have a pattern you can wire into any AI gateway or agent pipeline today.
Quick Install
!pip install policyaware
GitHub: https://github.com/ktirupati/policyaware
Wiki: https://github.com/ktirupati/policyaware/wiki
Part 1 — Data Protection
What the engine detects
The DataProtectionEngine scans any string and returns a structured DataFindings object. It classifies content into three buckets:
| Bucket | What it catches |
|---|---|
| PII | email, phone, SSN, credit card |
| PHI | medical record, patient ID, diagnosis, medication |
| Secrets | API keys, bearer tokens, private keys |
Inspecting a prompt
from policyaware import DataProtectionEngine
text = "Hi, I'm Jane. Reach me at jane@example.com or 212-555-7890."
engine = DataProtectionEngine()
findings = engine.inspect(text)
print(findings.contains_pii) # True
print(findings.contains_phi) # False
print(findings.contains_secrets) # False
print(findings.contains_sensitive) # True (aggregate flag)
print(findings.categories) # ['email', 'phone']
print(findings.redactions) # 2
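To make the bucket idea concrete, here is a minimal stdlib-only sketch of regex-based detection. It is not policyaware's implementation: the patterns, the SketchFindings class, and inspect_sketch are all hypothetical names I made up for illustration, and real detectors need far more careful patterns.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for illustration only; not policyaware's actual rules.
PATTERNS = {
    "pii": {
        "email": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
        "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    },
    "secrets": {
        "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._-]{20,}"),
    },
}


@dataclass
class SketchFindings:
    categories: list = field(default_factory=list)  # e.g. ['email', 'phone']
    buckets: set = field(default_factory=set)       # e.g. {'pii'}
    matches: int = 0                                # total number of hits


def inspect_sketch(text: str) -> SketchFindings:
    """Scan text with every pattern and record which buckets were hit."""
    findings = SketchFindings()
    for bucket, patterns in PATTERNS.items():
        for category, pattern in patterns.items():
            hits = pattern.findall(text)
            if hits:
                findings.categories.append(category)
                findings.buckets.add(bucket)
                findings.matches += len(hits)
    return findings


result = inspect_sketch("Hi, I'm Jane. Reach me at jane@example.com or 212-555-7890.")
print(result.categories, sorted(result.buckets), result.matches)
# ['email', 'phone'] ['pii'] 2
```

The point of the structured result is that downstream policy code can branch on flags and categories instead of re-scanning the text.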
DataFindings field reference
| Field | Type | Description |
|---|---|---|
| contains_pii | bool | email, phone, SSN, credit card detected |
| contains_phi | bool | medical record, diagnosis, medication detected |
| contains_secrets | bool | API key, bearer token, private key detected |
| contains_sensitive | bool | True if any of the above is True |
| categories | list | e.g. ['email', 'phone', 'ssn'] |
| redactions | int | Total number of matches found |
| redacted_text | str | Sanitised text returned by .redact() |
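The redacted_text field is easier to reason about with a small example of what redaction does. The sketch below is an assumption-laden illustration, not the package's code: the placeholder format, the two patterns, and the redact_sketch helper are all hypothetical.

```python
import re

# Hypothetical patterns and placeholder format, chosen for illustration only.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact_sketch(text: str) -> tuple[str, int]:
    """Replace each match with a [LABEL] placeholder and count replacements."""
    total = 0
    for label, pattern in REDACTION_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        total += count
    return text, total


clean, n = redact_sketch("Reach me at jane@example.com or 212-555-7890.")
print(clean)  # Reach me at [EMAIL] or [PHONE].
print(n)      # 2
```

Typed placeholders keep the sanitised prompt readable for the model while the original values never leave your boundary.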
Part 2 — Policy Enforcement
Data protection tells you what is in the request. Policy enforcement tells you what to do about it. The PolicyEngine loads a YAML file and evaluates every request against your rules, returning a structured PolicyDecision.
The four decision outcomes
| Decision | Meaning |
|---|---|
| allow | Request passes through, apply any transforms |
| deny | Request is blocked outright |
| conditional_allow | Passes but triggers follow-up checks |
| require_approval | Routes to a human-in-the-loop flow |
The engine is deny-by-default. If no rule explicitly grants access, the request is blocked. No silent pass-throughs.
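Deny-by-default is easy to state and easy to get wrong, so here is a small sketch of the idea. It covers only allow and deny, and the precedence it uses (deny beats allow, a transform alone never grants access) is my assumption for illustration, not a documented description of how PolicyEngine combines rules. The combine_effects helper and the "transform:<action>" string format are hypothetical.

```python
from enum import Enum


class Outcome(str, Enum):
    ALLOW = "allow"
    DENY = "deny"


def combine_effects(matched_effects: list[str]) -> tuple[Outcome, list[str]]:
    """Combine the effects of all matched rules into one outcome.

    Assumed precedence: any deny wins, 'transform:<action>' only adds an
    action, and without an explicit allow the result falls back to deny.
    """
    actions = [e.split(":", 1)[1] for e in matched_effects if e.startswith("transform:")]
    if "deny" in matched_effects:
        return Outcome.DENY, []
    if "allow" in matched_effects:
        return Outcome.ALLOW, actions
    return Outcome.DENY, []  # deny-by-default: nothing granted access


print(combine_effects([]))                               # no rule matched: deny
print(combine_effects(["transform:redact", "allow"]))    # allow, with a redact action
```

The last branch is the important one: an empty or incomplete rule set fails closed instead of letting traffic through.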
Writing your first policy YAML
Rules reference DataFindings fields directly via the data root:
# support_policy.yaml
id: support_policy
schema_version: "0.2"
default: deny
rules:
  # Rule 1: Block anything containing secrets (API keys, tokens)
  - name: deny_secret_leakage
    effect: deny
    when:
      data.contains_secrets: true
  # Rule 2: Redact PII for standard users, but not for compliance officers
  - name: redact_pii_standard_users
    effect: transform
    action: redact
    when:
      data.contains_pii: true
      user.role_not_in:
        - privacy_admin
        - compliance_officer
  # Rule 3: Allow support agents in US for low/medium risk requests
  - name: allow_support_agents
    effect: allow
    when:
      user.role_in:
        - support_agent
        - support_manager
      request.region: us
      risk.tier_in:
        - low
        - medium
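If you want a mental model for how a when clause could be evaluated, the sketch below may help. It assumes three things that the YAML suggests but that I am not quoting from the package: conditions inside one when block are ANDed, a _in suffix means membership, and a _not_in suffix means exclusion. The lookup and matches helpers are hypothetical.

```python
from typing import Any


def lookup(context: dict, dotted: str) -> Any:
    """Walk a dotted path such as 'user.role' through nested dicts."""
    node: Any = context
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def matches(when: dict, context: dict) -> bool:
    """Return True only if every condition holds (assumed AND semantics)."""
    for key, expected in when.items():
        # Check '_not_in' first, because such keys also end with '_in'.
        if key.endswith("_not_in"):
            if lookup(context, key[: -len("_not_in")]) in expected:
                return False
        elif key.endswith("_in"):
            if lookup(context, key[: -len("_in")]) not in expected:
                return False
        elif lookup(context, key) != expected:
            return False
    return True


context = {
    "user": {"role": "support_agent"},
    "request": {"region": "us"},
    "risk": {"tier": "low"},
    "data": {"contains_pii": True, "contains_secrets": False},
}
rule_3 = {
    "user.role_in": ["support_agent", "support_manager"],
    "request.region": "us",
    "risk.tier_in": ["low", "medium"],
}
print(matches(rule_3, context))                              # True
print(matches({"data.contains_secrets": True}, context))     # False
```

Note that a missing path resolves to None here, so an equality condition on a field that was never populated simply does not match.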
Enforcing the policy at runtime
Load the YAML, build a GatewayRequest, inspect the prompt, then call decide:
from policyaware import DataProtectionEngine, GatewayRequest, PolicyEngine
# Load policy from YAML file
policy = PolicyEngine.from_file("support_policy.yaml")
# Build the request context
request = GatewayRequest(
    tenant="acme-corp",
    app="support-copilot",
    user={"role": "support_agent", "id": "u_001"},
    context={"region": "us", "risk": "low"},
    messages=[{"role": "user", "content": "Email jane@example.com, urgent!"}],
)
# Step 1: inspect the prompt
findings = DataProtectionEngine().inspect(request.prompt_text)
# Step 2: evaluate policy
decision = policy.decide(request, findings)
# Step 3: act on the decision
print(decision.decision.value) # 'allow' / 'deny' / 'conditional_allow' / 'require_approval'
print(decision.actions) # ['redact']
print(decision.matched_rules) # ['redact_pii_standard_users', 'allow_support_agents']
print(decision.violated_rules) # []
print(decision.reason) # Human-readable explanation
print(decision.reason_codes) # Machine-readable codes for logging
print(decision.risk_score) # Numeric risk score
print(decision.risk_tier) # 'low' / 'medium' / 'high' / 'critical'
print(decision.remediation) # Suggested fix if blocked
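Printing the decision is only half the job; a gateway also has to act on it. Here is one way that dispatch could look, written against plain strings and lists so it runs without the package. The enforce function, the RequestBlocked exception, and the status values are hypothetical names for this sketch, and treating unknown decision values as a block is my design choice, not documented behaviour.

```python
from typing import Callable


class RequestBlocked(Exception):
    """Raised when a request must not reach the model."""


def enforce(decision: str, actions: list[str], prompt: str,
            redact: Callable[[str], str]) -> dict:
    """Map a decision value and its actions to a gateway outcome."""
    if decision == "require_approval":
        # Park the request for a human reviewer; nothing is forwarded yet.
        return {"status": "pending_approval", "prompt": None}
    if decision not in ("allow", "conditional_allow"):
        # 'deny' and any unrecognised value fail closed.
        raise RequestBlocked(f"request blocked: {decision}")
    if "redact" in actions:
        prompt = redact(prompt)
    status = "forward_with_checks" if decision == "conditional_allow" else "forward"
    return {"status": status, "prompt": prompt}


outcome = enforce("allow", ["redact"], "Email jane@example.com, urgent!",
                  redact=lambda text: text.replace("jane@example.com", "[EMAIL]"))
print(outcome)
# {'status': 'forward', 'prompt': 'Email [EMAIL], urgent!'}
```

In a real integration you would pass decision.decision.value and decision.actions from the PolicyDecision, and a redact callable backed by the data protection engine.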
PolicyDecision field reference
| Field | Type | Description |
|---|---|---|
| decision | enum | allow, deny, conditional_allow, require_approval |
| actions | list | Transforms to apply e.g. ['redact'] |
| matched_rules | list | Rules that matched the request |
| violated_rules | list | Rules that were violated (for audit logs) |
| reason | str | Human-readable explanation |
| reason_codes | list | Machine-readable codes for dashboards |
| risk_score | float | Numeric risk score |
| risk_tier | str | low, medium, high, critical |
| remediation | str | Suggested fix when request is blocked |
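Since these fields are meant for audit trails, it is worth sketching what a log record could look like. The DecisionRecord dataclass below is a stand-in that only mirrors the field names from the table; it is not the package's PolicyDecision class, and to_audit_line, the request_id parameter, and the SECRET_DETECTED code are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    # Stand-in that mirrors the field names in the table above.
    decision: str
    actions: list = field(default_factory=list)
    matched_rules: list = field(default_factory=list)
    violated_rules: list = field(default_factory=list)
    reason: str = ""
    reason_codes: list = field(default_factory=list)
    risk_score: float = 0.0
    risk_tier: str = "low"
    remediation: str = ""


def to_audit_line(record: DecisionRecord, request_id: str) -> str:
    """Serialise one decision as a single JSON line for an audit log."""
    payload = {
        "request_id": request_id,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        **asdict(record),
    }
    return json.dumps(payload, sort_keys=True)


line = to_audit_line(
    DecisionRecord(decision="deny", reason_codes=["SECRET_DETECTED"], risk_tier="high"),
    request_id="req-001",
)
print(line)
```

Notice that the record carries the decision and its reasons but not the raw prompt, which keeps sensitive text out of the log itself.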
Policy Context Roots
Inside every when clause you can reference these roots:
| Root | Example usage | What it covers |
|---|---|---|
| tenant | tenant: acme | Customer or team identifier |
| app | app: support-copilot | Calling application or service |
| user | user.role_in: [support_agent] | Role, ID, department attributes |
| request | request.region: us | Region, task type, autonomy level |
| data | data.contains_pii: true | Output from DataProtectionEngine |
| risk | risk.tier_in: [low, medium] | Risk score and tier |
| ml | ml.prompt_injection.detected: true | Optional ML classifier signals |
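A typo in a root name (usr instead of user, say) is the kind of mistake that is cheap to catch with a few lines of your own code. The check below is a hypothetical lint over an already-parsed policy dict, using the seven roots from the table; it is independent of the package and makes no claim about what the package's own validator checks.

```python
KNOWN_ROOTS = {"tenant", "app", "user", "request", "data", "risk", "ml"}


def unknown_roots(policy: dict) -> list[str]:
    """Return 'rule_name: key' for every when-key whose root is not recognised."""
    problems = []
    for rule in policy.get("rules", []):
        for key in rule.get("when", {}):
            root = key.split(".", 1)[0]
            if root not in KNOWN_ROOTS:
                problems.append(f"{rule.get('name', '<unnamed>')}: {key}")
    return problems


policy = {
    "rules": [
        {"name": "ok_rule", "when": {"tenant": "acme", "user.role_in": ["support_agent"]}},
        {"name": "typo_rule", "when": {"usr.role_in": ["support_agent"]}},
    ]
}
print(unknown_roots(policy))  # ['typo_rule: usr.role_in']
```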
Validate Policies Before Production
Ship broken policies and you get silent misses or unintended blocks. policyaware ships a schema validator and CLI to catch issues early.
Python validator:
import yaml
from policyaware import PolicySchemaValidator
with open("support_policy.yaml", "r", encoding="utf-8") as f:
    policy = yaml.safe_load(f)
PolicySchemaValidator().validate(policy)  # raises on schema errors
CLI commands:
# Validate the YAML schema
policyaware policy validate support_policy.yaml
# Explain how a specific request flows through your rules
policyaware policy explain --request sample_request.json
The explain command is especially useful in CI/CD pipelines — you can run policy checks against a suite of sample requests before merging.
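The same idea works as a plain Python test if you prefer to keep policy checks next to your unit tests. The sketch below is a table-driven harness: run_policy_cases is a hypothetical helper, and toy_decide is a stand-in decision function so the example runs on its own. In practice you would wrap your real policy call in a function with the same shape.

```python
from typing import Callable


def run_policy_cases(decide: Callable[[dict], str], cases: list[dict]) -> list[str]:
    """Return a failure message for every case whose outcome differs from 'expect'."""
    failures = []
    for case in cases:
        actual = decide(case["request"])
        if actual != case["expect"]:
            failures.append(f"{case['name']}: expected {case['expect']}, got {actual}")
    return failures


def toy_decide(request: dict) -> str:
    # Stand-in for a real policy evaluation, so the sketch is self-contained.
    return "allow" if request.get("role") == "support_agent" else "deny"


CASES = [
    {"name": "agent_allowed", "request": {"role": "support_agent"}, "expect": "allow"},
    {"name": "guest_denied", "request": {"role": "guest"}, "expect": "deny"},
]

failures = run_policy_cases(toy_decide, CASES)
if failures:
    # A non-zero exit fails the CI job and lists every mismatch.
    raise SystemExit("\n".join(failures))
print("all policy cases passed")
```

Keeping the expected outcome next to each sample request turns a policy change into a reviewable diff.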
Optional: ML-Assisted PII Detection with Presidio
Regex-based rules miss things like names and addresses. For those, policyaware supports an optional Microsoft Presidio integration:
pip install "policyaware[presidio]"
from policyaware import PresidioPIIClassifier
classifier = PresidioPIIClassifier(score_threshold=0.5)
assessment = classifier.classify(
    "Jane Doe lives at 120 Main St and her phone is 212-555-7890."
)
print(assessment.model_dump())
# Returns detected entities with type, value, and confidence score
The Presidio findings feed back into the same data and ml roots in your YAML, giving you deterministic + ML detection in one framework.
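To picture how deterministic and ML signals might sit side by side, here is a small merge sketch. Everything about the shapes is an assumption on my part: I model ML entities as dicts with type and score keys, and the merge_signals helper and its output layout are hypothetical rather than the structure policyaware actually builds.

```python
def merge_signals(regex_categories: list[str], ml_entities: list[dict],
                  score_threshold: float = 0.5) -> dict:
    """Combine regex categories with ML entities at or above the threshold."""
    confident = [e for e in ml_entities if e["score"] >= score_threshold]
    ml_types = {e["type"].lower() for e in confident}
    return {
        "data": {"categories": sorted(set(regex_categories) | ml_types)},
        "ml": {"pii_entities": confident},
    }


signals = merge_signals(
    regex_categories=["phone"],
    ml_entities=[
        {"type": "PERSON", "score": 0.85},
        {"type": "LOCATION", "score": 0.30},  # below threshold, dropped
    ],
)
print(signals["data"]["categories"])  # ['person', 'phone']
```

The threshold is the knob to watch: set it too low and names in ordinary sentences trigger redaction, too high and real PII slips through.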
TL;DR — What You Get in One Package
| Capability | How |
|---|---|
| Detect PII, PHI, Secrets | DataProtectionEngine().inspect(text) |
| Redact sensitive content | DataProtectionEngine().redact(text) |
| Enforce access policies via YAML | PolicyEngine.from_file("policy.yaml") |
| Rich audit-ready decisions | PolicyDecision with reason, risk, remediation |
| ML-assisted detection | PresidioPIIClassifier (optional extra) |
| Validate policies before shipping | PolicySchemaValidator + CLI |
Get Started Now
!pip install policyaware
Here is the fastest path to seeing value:
- Install the package
- Run DataProtectionEngine().inspect() on one real prompt from your app
- Write a 3-rule YAML that reflects your actual governance needs
- Call policy.decide(request, findings) and log the full PolicyDecision
That four-step experiment is enough to understand whether policyaware fits your stack.
I am the author and sole maintainer of this package. I built it because every AI project I worked on had the same gap — no structured layer between raw user input and the model. If you run into anything unexpected, have a governance pattern not covered yet, or want to contribute, I want to hear from you.
- GitHub: https://github.com/ktirupati/policyaware
- Wiki & Docs: https://github.com/ktirupati/policyaware/wiki
If this was useful, drop a like, share it with your team, and star the repo. Every bit of feedback helps make policyaware better for everyone building serious AI systems in Python.