What We Are Building
By the end of this workshop, you will have a working AI governance layer you can drop into any project shipping AI features. No compliance team required. No enterprise tooling. Just four patterns that will save you from a painful rewrite the moment a regulated customer shows up.
Let me show you a pattern I use in every project — and it starts before your first AI inference ever reaches a user.
Prerequisites
- A project making API calls to any LLM (OpenAI, Anthropic, etc.)
- A PostgreSQL database (or any append-only store)
- Basic familiarity with YAML and structured logging
- About 15-20% more patience during your initial build (it pays back 4-8x)
Step 1: Decision Logging as Infrastructure
Every AI call your app makes should produce a structured record before the response hits the user. This is not `console.log`. This is an immutable audit trail.
Here is the minimal setup to get this working:
```sql
CREATE TABLE ai_decisions (
    decision_id    TEXT PRIMARY KEY,
    model          TEXT NOT NULL,
    input_hash     TEXT NOT NULL,
    output_summary TEXT,
    confidence     NUMERIC,
    policy_version TEXT,
    created_at     TIMESTAMPTZ DEFAULT now()
);

-- Write-only service role. No updates. No deletes.
REVOKE UPDATE, DELETE ON ai_decisions FROM app_service;
```
Then wrap your inference calls:
```python
import datetime
import hashlib
import uuid

def log_decision(model, input_text, output_summary, confidence, policy_version):
    return {
        "decision_id": f"d-{uuid.uuid4().hex[:12]}",
        "model": model,
        "input_hash": hashlib.sha256(input_text.encode()).hexdigest(),
        "output_summary": output_summary[:500],
        "confidence": confidence,
        "policy_version": policy_version,
        # utcnow() is deprecated; use an explicit timezone-aware timestamp
        "created_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
```
Call this before you return any AI output. Every inference gets a record. No exceptions.
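To make the "log first, then return" ordering concrete, here is a minimal sketch of a wrapper around an inference call. `run_model` and `persist` are hypothetical stand-ins for your LLM client and your INSERT into `ai_decisions`; the helper is repeated so the snippet runs on its own:

```python
import datetime
import hashlib
import uuid

def log_decision(model, input_text, output_summary, confidence, policy_version):
    # Same record shape as the helper above, repeated so this sketch is self-contained.
    return {
        "decision_id": f"d-{uuid.uuid4().hex[:12]}",
        "model": model,
        "input_hash": hashlib.sha256(input_text.encode()).hexdigest(),
        "output_summary": output_summary[:500],
        "confidence": confidence,
        "policy_version": policy_version,
        "created_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

def governed_call(model, input_text, run_model, persist, policy_version="v1"):
    # run_model(input_text) -> (output_text, confidence): your LLM client.
    # persist(record): your INSERT into ai_decisions.
    output_text, confidence = run_model(input_text)
    record = log_decision(model, input_text, output_text, confidence, policy_version)
    persist(record)      # record the decision first...
    return output_text   # ...then hand the output to the user
```

Injecting `persist` also makes the wrapper trivially testable: in a unit test it can append to a list instead of hitting Postgres.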
Step 2: Pin Your Model Versions
Never point production code at `latest`. Pin every model reference explicitly.

```python
# Do this
MODEL_ID = "gpt-4o-2025-12-17"

# Never this
MODEL_ID = "gpt-4o"
```
When you upgrade, run the new version alongside the old one and compare outputs on real inputs before cutting over. This is the difference between confidently answering "which model produced this output six months ago" and guessing.
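A shadow comparison can be as small as the sketch below. The model IDs and the `call_model` signature are placeholders for your own client; the point is that both versions are pinned and the diff happens before cutover:

```python
OLD_MODEL = "gpt-4o-2025-12-17"   # current pinned production model
NEW_MODEL = "gpt-4o-2026-03-01"   # candidate (hypothetical ID)

def shadow_compare(inputs, call_model):
    # call_model(model_id, text) -> output string: your LLM client.
    # Returns every input where the two pinned versions disagree.
    diffs = []
    for text in inputs:
        old_out = call_model(OLD_MODEL, text)
        new_out = call_model(NEW_MODEL, text)
        if old_out != new_out:
            diffs.append({"input": text, "old": old_out, "new": new_out})
    return diffs
```

Run it over a sample of real production inputs; cut over only when the diff list is empty or every divergence has been reviewed.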
Step 3: Build a Human Review Hook
Route anything below your confidence threshold to a review queue. For a solo dev, this can be dead simple — a Slack webhook:
```python
import os
import requests

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]  # your incoming-webhook URL
CONFIDENCE_THRESHOLD = 0.75

def maybe_flag_for_review(decision):
    if decision["confidence"] < CONFIDENCE_THRESHOLD:
        requests.post(SLACK_WEBHOOK_URL, json={
            "text": f"🔍 Low-confidence output ({decision['confidence']}): "
                    f"{decision['output_summary'][:200]}\n"
                    f"Decision ID: {decision['decision_id']}"
        })
```
You are not building a full moderation dashboard. You are building a tripwire.
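One refinement worth considering, sketched here with an injectable `post` callable so it stays testable: a tripwire should never take the request path down with it, so the webhook call is wrapped and failures are logged rather than raised. This assumes the `decision` dict from Step 1 and a webhook URL in an environment variable:

```python
import logging
import os

def _slack_post(payload):
    # Real sender; requests and SLACK_WEBHOOK_URL assumed as in the snippet above.
    import requests
    requests.post(os.environ["SLACK_WEBHOOK_URL"], json=payload, timeout=5)

def safe_flag_for_review(decision, post=_slack_post, threshold=0.75):
    # Returns True if the decision was flagged. A failed webhook must never
    # block the AI response, so errors are logged and swallowed.
    if decision["confidence"] >= threshold:
        return False
    payload = {
        "text": f"🔍 Low-confidence output ({decision['confidence']}): "
                f"{decision['output_summary'][:200]}\n"
                f"Decision ID: {decision['decision_id']}"
    }
    try:
        post(payload)
    except Exception:
        logging.exception("Review webhook failed; continuing")
    return True
```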
Step 4: Policy-as-Code
Here is the gotcha that will save you hours: governance rules belong in version control, not in a Google Doc nobody reads.
```yaml
# governance.yml — checked in alongside your app code
policies:
  - name: high_risk_review
    trigger: "category in ['healthcare', 'financial', 'pii']"
    action: route_to_human_review

  - name: confidence_gate
    trigger: "confidence < 0.75"
    action: flag_for_review

  - name: approved_models
    allowed:
      - "gpt-4o-2025-12-17"
      - "claude-sonnet-4-6"
```
Load this at startup. Every deployment is now traceable to a specific policy state. When an auditor asks "what were your rules on March 15th?", you run `git log governance.yml`.
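Enforcement at startup can stay small. The sketch below works on the already-parsed policy (in production you would load it with `yaml.safe_load(open("governance.yml"))`; a dict literal with the same shape stands in so the snippet is self-contained), and the two trigger expressions are evaluated as hard-coded checks rather than a full expression engine:

```python
# In production: POLICY = yaml.safe_load(open("governance.yml"))
POLICY = {
    "policies": [
        {"name": "high_risk_review",
         "trigger": "category in ['healthcare', 'financial', 'pii']",
         "action": "route_to_human_review"},
        {"name": "confidence_gate",
         "trigger": "confidence < 0.75",
         "action": "flag_for_review"},
        {"name": "approved_models",
         "allowed": ["gpt-4o-2025-12-17", "claude-sonnet-4-6"]},
    ]
}

HIGH_RISK = ("healthcare", "financial", "pii")

def check_model_allowed(policy, model_id):
    for rule in policy["policies"]:
        if rule.get("name") == "approved_models":
            return model_id in rule.get("allowed", [])
    return False  # fail closed: no allow-list means no model runs

def evaluate_gates(policy, category, confidence):
    # Returns the actions triggered for this decision.
    actions = []
    for rule in policy["policies"]:
        if rule.get("name") == "high_risk_review" and category in HIGH_RISK:
            actions.append(rule["action"])
        if rule.get("name") == "confidence_gate" and confidence < 0.75:
            actions.append(rule["action"])
    return actions
```

Call `check_model_allowed` once at startup and `evaluate_gates` per inference, and tag every decision record with the policy version in force.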
Gotchas and Common Mistakes
- Mixing decision logs with application logs. Keep them separate. App logs get rotated and deleted. Decision records are permanent. The docs do not mention this, but the moment you need to prove an output's lineage, grepping through nginx logs is not going to cut it.
- Hashing full outputs instead of inputs. Hash the input to prove what went in. Store a summary of what came out. Full output storage gets expensive fast and often contains PII you do not want sitting in a flat table.
- Skipping governance on internal tools. If your AI feature touches data that belongs to a customer, it needs a decision log — even if only your team sees the output.
- Treating model upgrades as dependency bumps. A model upgrade is a deployment. Test it, compare outputs, then cut over. Silent upgrades are the number one source of "it worked last week" bugs in AI systems.
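The input-hash point above is what makes lineage provable later. A sketch of the verification step, assuming the record shape from Step 1:

```python
import hashlib

def verify_lineage(record, candidate_input):
    # True iff candidate_input is the exact text that produced this logged decision.
    return hashlib.sha256(candidate_input.encode()).hexdigest() == record["input_hash"]
```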
Wrapping Up
That 15-20% upfront investment covers you the moment an enterprise prospect asks how your model arrived at a specific output. You are not competing on compliance team size — you are competing on architectural maturity.
Start with decision logging today. Add the other three patterns as you ship. A solo dev with governance built in will outsell an ungoverned team of twenty in any regulated market.