Sourish Chakraborty

Posted on Jun 19

I Built a PII Firewall for LLMs in a Weekend (and Caught My Own Leak)

#privacy #llm #security #opensource

Three weeks ago I was benchmarking GPT-4o against a local Llama model. I was copying prompts from a real support ticket database to make the test realistic. Midway through the run I glanced at the terminal and saw this in the logs:

prompt="Hi, my name is Sarah Johnson, my account number is 4532-1234-5678-9012..."
provider=cloud
model=gpt-4o

A real customer's name. A real credit card number. Already sent to OpenAI.

I had not noticed because the benchmark UI just showed a token count, not the actual prompt content. The PII was in the data. I had forgotten to sanitise it. OpenAI's API terms say they don't train on API data, but that's not the point — the data left my infrastructure. Under GDPR, that's a potential breach.

I spent the rest of that weekend building a firewall so it could never happen again. This post is the full story of what I built, how it works, and how you can run it in one command.

The code is at github.com/sochaty/llm-governance-engine — tag governance-post-1.

The Problem With Every Existing LLM Tool

Every LLM observability tool I have used — LangSmith, Helicone, Arize Phoenix — works the same way: it records what happened after the fact. You get a dashboard, a trace, a cost breakdown. None of them stop the request.

That distinction matters enormously under GDPR, HIPAA, and the EU AI Act. "We logged that PII was sent" is not a compliance posture. "PII was blocked before it left the building" is.

By the end of this post you will have:

A FastAPI backend that scans every prompt with Microsoft Presidio before it reaches any model
A YAML-based policy engine where a rule file controls what gets blocked, warned, or alerted
A PostgreSQL audit vault with every inference logged — PII flag, safety score, cost, latency
Webhook alerts to Slack or Teams when a rule fires
An Angular 21 dashboard for real-time cloud vs local benchmarks
A 10-dimension governance radar chart on every run

Everything runs with docker compose up.

Architecture

The key insight is where enforcement happens: before the model call, not after.

User Prompt
    │
    ▼
FastAPI /benchmark/stream
    │
    ├── enforce_governance_policy()   ← Presidio scan + policy evaluation
    │       │
    │       ├── PII detected + cloud model → HTTP 403 (prompt never sent)
    │       ├── Safety score low → warn + log + continue
    │       └── All rules pass → verdict returned to endpoint
    │
    ├── LLMOrchestrator.get_streaming_response()
    │       │
    │       ├── OpenAI / Groq / Google / Anthropic (cloud)
    │       └── Ollama (local)
    │
    └── AuditService → PostgreSQL

The enforce_governance_policy function is a FastAPI dependency, injected into the streaming endpoint via Depends(). If a blocking rule fires, it raises HTTP 403 before the orchestrator is called, so the prompt never touches the wire.

The YAML Policy DSL

The entire governance model is a YAML file. No code changes, no restarts — edit the file, POST /api/v1/policies/reload, rules are live.

# policies/default.yaml
version: "1.0"
name: "default"

rules:
  - id: pii-cloud-block
    name: "Block PII from cloud models"
    condition: pii_detected
    threshold: 0.7          # Presidio confidence ≥ 0.7 triggers this rule
    models: [cloud, gpt-4o]
    action: block           # returns HTTP 403
    severity: critical
    webhook_url: null       # set to your Slack URL to get alerted

  - id: low-safety-warn
    name: "Warn on low safety score"
    condition: safety_score_below
    threshold: 0.5
    action: warn            # logs + audits, passes through
    severity: medium

  - id: pii-local-alert
    name: "Alert on PII sent to local models"
    condition: pii_detected
    threshold: 0.85
    models: [local]
    action: alert           # fires webhook, does not block
    severity: high

Four conditions: pii_detected, safety_score_below, cost_exceeds, model_is.
Three actions: block (HTTP 403), warn (audit + continue), alert (webhook + continue).
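
To make the DSL concrete, here is a minimal sketch of how one rule mapping (the shape yaml.safe_load would return for an entry under rules:) could be validated against the conditions and actions listed above. The Rule dataclass and parse_rule helper are hypothetical names for illustration, not the repo's actual loader, and the "medium" default severity is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional

ALLOWED_CONDITIONS = {"pii_detected", "safety_score_below", "cost_exceeds", "model_is"}
ALLOWED_ACTIONS = {"block", "warn", "alert"}


@dataclass(frozen=True)
class Rule:
    """Hypothetical in-memory form of one YAML rule."""
    id: str
    name: str
    condition: str
    action: str
    severity: str
    threshold: Optional[float] = None
    models: tuple = ()                 # empty tuple = rule applies to every model
    webhook_url: Optional[str] = None


def parse_rule(raw: dict[str, Any]) -> Rule:
    """Validate one rule mapping and convert it to a Rule."""
    if raw.get("condition") not in ALLOWED_CONDITIONS:
        raise ValueError(f"rule {raw.get('id')!r}: unknown condition {raw.get('condition')!r}")
    if raw.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"rule {raw.get('id')!r}: unknown action {raw.get('action')!r}")
    return Rule(
        id=raw["id"],
        name=raw["name"],
        condition=raw["condition"],
        action=raw["action"],
        severity=raw.get("severity", "medium"),  # assumed default
        threshold=raw.get("threshold"),
        models=tuple(raw.get("models") or ()),
        webhook_url=raw.get("webhook_url"),
    )


# The first rule from policies/default.yaml, written as the parsed mapping.
raw_rule = {
    "id": "pii-cloud-block",
    "name": "Block PII from cloud models",
    "condition": "pii_detected",
    "threshold": 0.7,
    "models": ["cloud", "gpt-4o"],
    "action": "block",
    "severity": "critical",
    "webhook_url": None,
}
rule = parse_rule(raw_rule)
print(rule.id, rule.action, rule.models)
```

Rejecting unknown conditions and actions at load time means a typo in the policy file fails the reload instead of silently producing a rule that never fires.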

Starter templates are shipped in the repo for GDPR (policies/gdpr.yaml) and HIPAA (policies/hipaa.yaml).

PII Detection: Microsoft Presidio

Presidio is Microsoft's open-source PII detection library. It runs locally — no API call, no data leaving your machine.

It detects 50+ entity types out of the box: PERSON, EMAIL_ADDRESS, CREDIT_CARD, US_SSN, PHONE_NUMBER, IBAN_CODE, IP_ADDRESS, and more. It uses a combination of regex patterns, checksums, and a spaCy NLP model for name recognition.

The scan returns a confidence score per entity. The policy engine compares that score against the rule's threshold. An entity with 0.95 confidence on CREDIT_CARD and a threshold of 0.7 triggers the pii-cloud-block rule.

# backend/app/services/audit_service.py (simplified)
from presidio_analyzer import AnalyzerEngine
# ScanResult and EntityResult are the project's own result models (definitions omitted here).

class AuditService:
    def __init__(self):
        self.analyzer = AnalyzerEngine()

    def scan_for_pii_details(self, text: str) -> ScanResult:
        results = self.analyzer.analyze(text=text, language="en")
        detected = len(results) > 0
        entities = [
            EntityResult(
                entity_type=r.entity_type,
                confidence=r.score,
                start=r.start,
                end=r.end,
            )
            for r in results
        ]
        max_confidence = max((r.score for r in results), default=0.0)
        return ScanResult(
            detected=detected,
            entities=entities,
            max_confidence=max_confidence,
        )

The safety score is calculated separately — it is a 0.0–1.0 measure that combines PII confidence, entity density, and sensitive keyword presence. A score below 0.5 triggers the low-safety-warn rule.
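
The post does not give the scoring formula, so the following is only a sketch of how the three inputs could be combined into a 0.0–1.0 value. The weights, the density scaling, and the keyword list are all assumptions made up for illustration; the repo's calculate_safety_score may work quite differently.

```python
# Hypothetical keyword list; a real deployment would tune this per domain.
SENSITIVE_KEYWORDS = ("password", "ssn", "account number", "diagnosis")


def sketch_safety_score(text: str, pii_confidences: list[float]) -> float:
    """Return 1.0 for 'looks safe' down to 0.0 for 'clearly sensitive'.

    pii_confidences holds one confidence value per detected entity.
    """
    word_count = max(len(text.split()), 1)
    max_confidence = max(pii_confidences, default=0.0)
    # Entities per word, scaled so one entity per ten words already counts as dense.
    density = min(len(pii_confidences) / word_count * 10, 1.0)
    lowered = text.lower()
    keyword_hit = 1.0 if any(k in lowered for k in SENSITIVE_KEYWORDS) else 0.0
    # Assumed weights: confidence dominates, density and keywords nudge.
    risk = 0.6 * max_confidence + 0.2 * density + 0.2 * keyword_hit
    return round(1.0 - min(risk, 1.0), 2)


clean = sketch_safety_score("What is the capital of France?", [])
risky = sketch_safety_score("My SSN is 123-45-6789", [0.85])
print(clean, risky)
```

With these assumed weights, a prompt with no detections scores 1.0, while a short prompt carrying one high-confidence entity and a sensitive keyword falls well below the 0.5 threshold used by low-safety-warn.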

The Policy Engine

The engine follows a Chain of Responsibility pattern. Each rule evaluates the GovernanceContext independently:

# backend/app/governance/policy/schema.py
from dataclasses import dataclass
from typing import List, Optional

from pydantic import BaseModel

# ViolatedRule is another model defined in the project (omitted here).

@dataclass
class GovernanceContext:
    prompt: str
    provider: str
    model_id: str
    pii_detected: bool
    pii_entity_types: List[str]
    pii_max_confidence: float
    safety_score: float
    estimated_prompt_cost_usd: float


class PolicyVerdict(BaseModel):
    passed: bool
    violated_rules: List[ViolatedRule] = []
    blocking_rule: Optional[ViolatedRule] = None
    warnings: List[str] = []

The DefaultPolicyEngine.evaluate() iterates all rules in order. Block rules short-circuit. Warn and alert rules accumulate into the verdict. The verdict is returned to the FastAPI dependency, which raises HTTP 403 if blocking_rule is set.
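
The evaluation loop described above can be sketched in a few lines. This is a simplified stand-in, not the repo's DefaultPolicyEngine: the Rule, Context, and Verdict classes here are cut-down hypothetical versions, only two of the four conditions are handled, and matching a rule's models list against either the provider or the model ID is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    id: str
    condition: str
    threshold: float
    action: str
    models: tuple = ()  # empty = applies to all models


@dataclass
class Context:
    provider: str
    model_id: str
    pii_detected: bool
    pii_max_confidence: float
    safety_score: float


@dataclass
class Verdict:
    passed: bool = True
    violated: list = field(default_factory=list)
    blocking_rule: Optional[str] = None


def rule_fires(rule: Rule, ctx: Context) -> bool:
    # Assumption: a rule is scoped if its models list names the provider or the model.
    if rule.models and not ({ctx.provider, ctx.model_id} & set(rule.models)):
        return False
    if rule.condition == "pii_detected":
        return ctx.pii_detected and ctx.pii_max_confidence >= rule.threshold
    if rule.condition == "safety_score_below":
        return ctx.safety_score < rule.threshold
    return False  # cost_exceeds and model_is are left out of this sketch


def evaluate(rules: list, ctx: Context) -> Verdict:
    verdict = Verdict()
    for rule in rules:  # rules are checked in file order
        if not rule_fires(rule, ctx):
            continue
        verdict.violated.append(rule.id)
        if rule.action == "block":
            verdict.passed = False
            verdict.blocking_rule = rule.id
            break  # block short-circuits; later rules are not evaluated
    return verdict


rules = [
    Rule("pii-cloud-block", "pii_detected", 0.7, "block", ("cloud", "gpt-4o")),
    Rule("low-safety-warn", "safety_score_below", 0.5, "warn"),
    Rule("pii-local-alert", "pii_detected", 0.85, "alert", ("local",)),
]
cloud = evaluate(rules, Context("cloud", "gpt-4o", True, 0.95, 0.12))
local = evaluate(rules, Context("local", "llama3.2:latest", True, 0.95, 0.12))
print(cloud.blocking_rule, cloud.violated)
print(local.passed, local.violated)
```

Because block short-circuits, rule order in the YAML file matters: anything listed after a blocking rule will not be recorded for that request.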

The FastAPI Enforcement Dependency

This is the part that makes everything composable. One line wires the entire governance stack into any endpoint:

# backend/app/api/benchmark_router.py
@router.get("/stream")
async def stream_benchmark(
    verdict: PolicyVerdict = Depends(enforce_governance_policy),
    db: AsyncSession = Depends(get_db),
):
    # If we reach here, the prompt passed all blocking rules.
    # verdict.warnings contains any non-blocking rule hits.
    ...

The dependency itself:

# backend/app/governance/policy/enforcement.py (simplified)
from typing import Annotated

from fastapi import Depends, HTTPException, Query

async def enforce_governance_policy(
    prompt: Annotated[str, Query(min_length=1)],
    provider: Annotated[str, Query(pattern="^(cloud|local)$")] = "cloud",
    db: AsyncSession = Depends(get_db),
) -> PolicyVerdict:
    engine = get_policy_engine()
    audit = _get_audit_service()

    scan = audit.scan_for_pii_details(prompt)

    context = GovernanceContext(
        prompt=prompt,
        provider=provider,
        model_id="gpt-4o" if provider == "cloud" else "llama3.2:latest",
        pii_detected=scan.detected,
        pii_entity_types=[e.entity_type for e in scan.entities],
        pii_max_confidence=scan.max_confidence,
        safety_score=audit.calculate_safety_score(prompt),
        # Rough estimate: word count stands in for token count.
        estimated_prompt_cost_usd=(
            len(prompt.split()) * 0.00003 if provider == "cloud" else 0.0
        ),
    )

    verdict = engine.evaluate(context)

    for violation in verdict.violated_rules:
        webhook_url = _get_webhook_url(engine, violation.rule_id)
        await _record_violation(db, violation, context, webhook_url)

    if not verdict.passed and verdict.blocking_rule:
        br = verdict.blocking_rule
        raise HTTPException(
            status_code=403,
            detail={
                "error": "governance_violation",
                "rule_id": br.rule_id,
                "rule_name": br.rule_name,
                "severity": br.severity,
                "message": br.message,
            },
        )

    return verdict

Every violation — blocked or not — is persisted to policy_violations in PostgreSQL before the function returns. Webhook delivery is fire-and-forget via asyncio.create_task() so it never adds latency to the response path.

Webhook Alerts

When a rule fires with a webhook_url, a CloudEvents-compatible payload is POSTed:

{
  "specversion": "1.0",
  "type": "com.governance.policy.violation",
  "source": "llm-governance-engine",
  "id": "uuid",
  "time": "2026-06-19T09:00:00Z",
  "data": {
    "rule_id": "pii-cloud-block",
    "rule_name": "Block PII from cloud models",
    "severity": "critical",
    "action": "block",
    "message": "PII detected (CREDIT_CARD, confidence=0.95) on cloud provider",
    "provider": "cloud",
    "model_id": "gpt-4o"
  }
}

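Delivery is attempted up to three times with exponential backoff. Note that Slack, Teams, and PagerDuty each expect their own JSON schema on their incoming webhook endpoints (Slack, for example, requires a text or blocks field), so the CloudEvents payload has to be mapped to the receiver's format, or sent through a relay that does the mapping, before those services will accept it.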
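
The retry behaviour can be sketched as follows. This is an illustration under assumptions rather than the repo's code: deliver_with_retry is a hypothetical helper, the sender is passed in as a callable so the example needs no HTTP client, and the 0.5-second base delay is an arbitrary choice.

```python
import asyncio
from typing import Awaitable, Callable


async def deliver_with_retry(
    send: Callable[[dict], Awaitable[None]],
    payload: dict,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> bool:
    """Call send(payload) up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            await send(payload)
            return True
        except Exception:  # a real sender would catch its HTTP client's error types
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
    return False


calls = []


async def flaky_send(payload: dict) -> None:
    """Simulated receiver that fails twice, then accepts the event."""
    calls.append(payload["id"])
    if len(calls) < 3:
        raise ConnectionError("simulated outage")


ok = asyncio.run(deliver_with_retry(flaky_send, {"id": "evt-1"}, base_delay=0.01))
print(ok, len(calls))
```

Returning a boolean instead of raising keeps a failed alert from turning into a failed request, which fits the fire-and-forget delivery described earlier.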

Running It

git clone https://github.com/sochaty/llm-governance-engine
cd llm-governance-engine
git checkout governance-post-1
cp .env.example .env
# Add your OPENAI_API_KEY (or any provider key)
docker compose up

Dashboard → http://localhost:4200
API docs → http://localhost:8000/docs

Pull a local model to enable the side-by-side comparison:

curl -X POST http://localhost:11434/api/pull -d '{"name":"llama3.2:latest"}'

Trigger your first governance block:
Open the dashboard, type a prompt containing a fake SSN — My SSN is 123-45-6789 — select the Cloud provider and hit Run. You will get a red Governance Violation banner instead of a response. The prompt never reached GPT-4o.

Open http://localhost:8000/api/v1/policies/violations to see the audit record of the block.

What the Audit Vault Captures

Every inference — blocked or not — is stored in PostgreSQL:

Field	Example
`prompt` (preview)	"My SSN is 123-45..."
`provider`	cloud
`model_name`	gpt-4o
`pii_detected`	true
`safety_score`	0.12
`latency_ms`	0 (blocked before model)
`estimated_cost`	$0.0000
`version_tag`	openai/gpt-4o

The Audit Vault page in the dashboard is filterable by prompt, provider, and PII flag. Every row has a "Generate Report" button that exports a PDF — useful when a compliance officer asks for evidence.

Multi-Provider Routing

The orchestrator supports five provider types with a single interface:

Provider	How it connects
OpenAI	`AsyncOpenAI` — native
Groq	`AsyncOpenAI(base_url="https://api.groq.com/openai/v1")`
Google Gemini	`AsyncOpenAI(base_url="https://generativelanguage.googleapis.com/v1beta/openai")`
Anthropic	Lazy `import anthropic` — separate streaming path
Ollama (local)	`AsyncOpenAI(base_url="http://ollama-service:11434/v1", api_key="ollama")`
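
Since four of the five providers speak the OpenAI wire format, routing mostly comes down to choosing a base URL. The sketch below shows that idea with plain dictionaries: it returns the keyword arguments one would pass to AsyncOpenAI(...) without importing the SDK. The registry and the client_kwargs helper are hypothetical names; only the URLs and the "ollama" placeholder key come from the table above.

```python
from typing import Optional

# Base URLs for OpenAI-compatible providers (None = SDK default endpoint).
PROVIDER_BASE_URLS: dict[str, Optional[str]] = {
    "openai": None,
    "groq": "https://api.groq.com/openai/v1",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/openai",
    "ollama": "http://ollama-service:11434/v1",
}


def client_kwargs(provider: str, api_key: str = "") -> dict:
    """Build constructor kwargs for an OpenAI-compatible client."""
    if provider == "anthropic":
        # Anthropic is not OpenAI-compatible here; it needs its own SDK and streaming path.
        raise ValueError("anthropic uses a separate client")
    if provider not in PROVIDER_BASE_URLS:
        raise KeyError(f"unknown provider: {provider}")
    # Ollama ignores the key, but the client still requires a non-empty value.
    kwargs = {"api_key": "ollama" if provider == "ollama" else api_key}
    base_url = PROVIDER_BASE_URLS[provider]
    if base_url is not None:
        kwargs["base_url"] = base_url
    return kwargs


print(client_kwargs("groq", "example-key"))
print(client_kwargs("ollama"))
```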

API keys are stored in PostgreSQL (Fernet-encrypted) and resolved live on every request via settings_service.get(). Change a key in the Settings UI — no restart needed, effective on the next request.

What's next

The codebase is production-ready for single-tenant use. The roadmap from here:

v1.1 — RAGAS hallucination scoring on every response. Local Ollama as the free evaluation judge — no marginal cost. faithfulness_score populates in the audit log 2–3 seconds after the benchmark completes.
v1.2 — FinOps dashboard. Daily cost trends per model, Z-score anomaly detection, budget circuit breakers. A rule that says "if GPT-4o spend exceeds $500 this week, route to Llama" enforced at the proxy layer — not just a dashboard alert.
v2.0 — Multi-tenant workspaces with JWT + RBAC. PostgreSQL Row-Level Security for tenant isolation. OIDC/SAML for Okta and Azure AD.
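
None of the roadmap items exist yet, but the two v1.2 ideas are simple enough to sketch. The functions below are hypothetical illustrations of a Z-score check on daily spend and a budget circuit breaker that swaps models; the names, the use of population standard deviation, and the sample figures are assumptions, not planned APIs.

```python
from statistics import mean, pstdev


def z_score(history: list[float], today: float) -> float:
    """How many standard deviations today's spend sits from the historical mean."""
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0  # flat history: no meaningful deviation to report
    return (today - mean(history)) / sigma


def choose_model(
    weekly_spend_usd: float,
    budget_usd: float = 500.0,
    primary: str = "gpt-4o",
    fallback: str = "llama3.2:latest",
) -> str:
    """Budget circuit breaker: route to the fallback once spend exceeds the budget."""
    return fallback if weekly_spend_usd > budget_usd else primary


print(z_score([8.0, 12.0], 20.0))
print(choose_model(120.0), choose_model(512.4))
```

Enforcing the breaker at routing time, rather than surfacing it only as a dashboard alert, is what would make the budget an actual limit.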

The incident that started this — a real customer's credit card number sent to GPT-4o because I forgot to sanitise a test dataset — took about 30 seconds to happen and would have taken weeks to untangle from a compliance perspective.

The fix took a weekend. It should have existed before the first prompt was ever sent.

Full code: github.com/sochaty/llm-governance-engine
Reproduce this post exactly: git checkout governance-post-1

PRs and issues welcome. If you build a custom Presidio recogniser for your domain (medical records, legal documents, financial instruments), I would love to include it in the default policy templates.

All my writing lives at blogs.sourishchakraborty.com — subscribe there for future posts.

Top comments (3)

VoltageGPU • Jun 21

Interesting project! The YAML-based rules engine reminds me of how we handle data classification in confidential computing environments. Have you considered integrating a lightweight tokenizer to better detect obfuscated PII (like hashed or partially masked values)? At VoltageGPU, we sometimes use similar techniques to filter input before dispatching to GPU inference.

Sourish Chakraborty • Jun 21

Thanks — the confidential-computing parallel is a good fit. The YAML DSL was intentionally designed to resemble the data-classification policies used in enterprise security tooling, so that framing lands well.

On the tokenizer idea, I agree. Presidio is effective for recognisable PII patterns, but obfuscated values can fall outside those patterns. For example, a partially masked card number such as 4532-****-****-9012 or a hashed email address may not be detected because the original format is no longer present.

A pre-processing step before the Presidio scan could address that. Custom transformers could normalise obfuscated formats, expand known abbreviations, and convert partial masks into a representation that can be evaluated consistently. Where reversal is possible and appropriate, a transformer could also resolve protected values before scanning.

The policy engine already passes a GovernanceContext through evaluation, so introducing a pre-scan transformer at that layer seems architecturally clean and keeps the approach extensible.

How do you handle hashed values at VoltageGPU? Do you reverse them before scanning when possible, or classify the hash itself as sensitive regardless of whether the underlying value can be recovered?

Med Marrouchi • Jun 19

Interesting stuff!