I am going to say something that most governance vendors will not: our product will not prevent every bad thing your AI does.
It will catch most of them. It will log all of them. And when something slips through — because something will — you will have cryptographic proof of what happened, when, and what your system did about it.
That is not a failure of governance. That is the entire point.
The invincibility myth
Open any AI governance vendor's website. Count the absolutes. "Complete compliance." "Total protection." "Zero risk deployment." "Bulletproof AI safety."
These are lies. Polite, well-designed, investor-friendly lies — but lies.
No PII detection system catches every pattern. South African ID numbers follow a predictable format. Credit card numbers pass the Luhn check. These are detectable. But a customer who writes "my number is nine five zero two zero one five eight zero zero zero eight six" has just bypassed every regex and most ML classifiers. A customer who puts their ID number in an image, or splits it across two messages, or embeds it in a question about someone else — these are edge cases that no scanner handles perfectly.
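To make the bypass concrete, here is a minimal sketch of the kind of scanner described above: a regex for candidate card numbers plus a Luhn checksum filter. The function names and regex are illustrative, not any particular product's implementation.

```python
import re

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: filters candidate card numbers down to plausible ones."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def scan_for_cards(text: str) -> list[str]:
    """Return digit strings that look like card numbers and pass Luhn."""
    hits = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits

# Structured input is caught:
scan_for_cards("card: 4242 4242 4242 4242")  # -> ["4242424242424242"]
# The spelled-out variant sails straight past the same scanner:
scan_for_cards("my number is nine five zero two zero one five "
               "eight zero zero zero eight six")  # -> []
```

The second call is the point: the detector is sound for the pattern it was built for, and blind to a trivially adjacent phrasing.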
No policy enforcement system prevents every hallucination. LLMs are stochastic. They generate plausible text, not verified facts. You can constrain outputs, scan responses, and block known failure patterns — but you cannot guarantee that a model with 175 billion parameters will never produce a sentence you did not anticipate.
Anyone who tells you otherwise is selling a feeling, not a product.
The seatbelt
A seatbelt does not prevent car crashes. It does not make you a better driver. It does not stop the other driver from running a red light.
What it does: when the crash happens, it dramatically reduces the damage. It keeps you in the seat instead of through the windshield. It converts a fatal outcome into a survivable one.
AI governance is a seatbelt.
It will not prevent every PII leak. It will detect the vast majority of them — in real time, in the request path, before the data reaches the LLM or the database. The ones it misses are logged with enough context to detect them in review.
It will not prevent every hallucination. It will scan every response against policy rules and flag violations before the customer sees them. The hallucinations it misses are recorded with audit receipts, so when a customer says "your chatbot told me X," you can verify exactly what it said.
It will not prevent every rogue agent action. It will enforce escalation rules, maintain human override capability, and provide a kill switch that works in seconds, not deployments.
The seatbelt framing is uncomfortable for sales teams because it admits imperfection. It is comfortable for engineers and compliance officers because it is honest. And in my experience, honest framing builds longer customer relationships than invincibility promises.
What "defensible" actually means
When something goes wrong with your AI — and it will — you end up in one of two positions.
Position A: "We had no idea." No logs. No audit trail. No evidence of what the AI said. No record of governance scans. The regulator, the customer, or the lawyer asks what happened, and you reconstruct it from server logs and Slack threads. Your legal counsel describes this as "indefensible."
Position B: "We caught it in 200 milliseconds." You have a cryptographic receipt showing exactly what the customer sent, what governance detected, what the AI received (after redaction), what the AI responded, what the output scan flagged, and what action was taken. The receipt has a unique ID, a timestamp, and is stored independently of the conversation. Your legal counsel describes this as "defensible."
The difference between these two positions is not whether the incident happened. Incidents happen. The difference is whether you can prove you had systems in place, those systems were running, and you responded appropriately.
This is what governance actually provides: not prevention, but proof.
In 2023, the Italian data protection authority temporarily banned ChatGPT — not because it made errors, but because OpenAI could not adequately demonstrate how user data was being processed. The issue was not the AI's behaviour. It was the inability to prove what the AI was doing with the data. Governance that generates verifiable receipts addresses this directly.
When Air Canada's chatbot fabricated a bereavement discount, the company had no audit trail showing what the chatbot was instructed to say versus what it actually said. A governance layer with output scanning would not have prevented the hallucination with certainty. But it would have flagged a response that referenced a policy not present in the knowledge base, and it would have produced a receipt proving the system attempted to catch it.
Defensible does not mean perfect. It means prepared.
Three layers of realistic governance
Layer 1: Detection
Catch PII. Catch policy violations. Catch anomalies.
Real-time scanning of every input and every output. Regular expressions for structured data (credit cards, ID numbers, phone numbers). ML classifiers for unstructured PII (names mentioned in context, addresses described in prose). Policy rules for topic boundaries, claim restrictions, and escalation triggers.
Will you catch 100%? No. Aim for 99%+. The last 1% is why you have Layer 2.
Detection is not just about blocking bad content. It is about knowing what passed through your system. A scan that returns "allow" is just as important as a scan that returns "deny" — both generate receipts, both are auditable, both prove the system was running.
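A detection layer along these lines can be sketched as a single scan function that always returns a verdict plus a receipt-shaped record, for "allow" and "deny" alike. The rule names and patterns here are hypothetical placeholders, not a real rule set.

```python
import re
import uuid
from datetime import datetime, timezone

# Hypothetical rules: regexes for structured PII, keywords for policy bounds.
PII_RULES = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
    "za_id_number": re.compile(r"\b\d{13}\b"),  # South African ID: 13 digits
}
POLICY_RULES = {
    "legal_advice": re.compile(r"\b(lawsuit|legal advice)\b", re.I),
}

def scan(text: str, direction: str) -> dict:
    """Scan one message; an 'allow' verdict is recorded just like a 'deny'."""
    findings = [name for name, rx in PII_RULES.items() if rx.search(text)]
    findings += [name for name, rx in POLICY_RULES.items() if rx.search(text)]
    return {
        "receipt_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "direction": direction,  # "input" or "output"
        "findings": findings,
        "action": "deny" if findings else "allow",
    }

scan("What are your opening hours?", "input")["action"]    # "allow"
scan("My ID is 9502015800086, help me", "input")["action"] # "deny"
```

Note that both calls produce a full record. The clean message generates a receipt too; that is what later proves the system was running.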
Layer 2: Evidence
Cryptographic receipts for every interaction.
Every scan — input and output — produces a receipt with a unique ID. The receipt records what was scanned, what was detected, what action was taken, and when. Receipts are stored independently of conversation logs. They cannot be retroactively modified without detection.
When something goes wrong, you do not grep through log files hoping to find the relevant entry. You query receipts by conversation ID, by tenant, by time range, or by action type. The data is structured, indexed, and verifiable.
This layer exists for the incidents that Layer 1 misses. Detection failed, but evidence did not. You can prove the system was running, prove what it scanned, and prove the outcome. The gap between "our system should have caught this" and "our system did not catch this, but here is the complete record of what happened" is the gap between a compliance violation and a documented incident.
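One common way to get the tamper-evidence described above is hash chaining: each receipt includes a hash of its predecessor, so editing any past entry breaks verification. This is a minimal sketch of that idea, not Tork's actual storage design.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone

class ReceiptStore:
    """Append-only receipt log; each entry hashes its predecessor,
    so a retroactive edit breaks the chain."""

    def __init__(self):
        self.receipts = []
        self._last_hash = "genesis"

    def append(self, conversation_id: str, payload: dict) -> dict:
        receipt = {
            "receipt_id": str(uuid.uuid4()),
            "conversation_id": conversation_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "payload": payload,
            "prev_hash": self._last_hash,
        }
        receipt["hash"] = hashlib.sha256(
            json.dumps(receipt, sort_keys=True).encode()
        ).hexdigest()
        self._last_hash = receipt["hash"]
        self.receipts.append(receipt)
        return receipt

    def verify(self) -> bool:
        """Recompute every hash; any modified entry returns False."""
        prev = "genesis"
        for r in self.receipts:
            body = {k: v for k, v in r.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if r["prev_hash"] != prev or r["hash"] != expected:
                return False
            prev = r["hash"]
        return True

    def query(self, conversation_id: str) -> list[dict]:
        """Structured lookup by conversation -- no grepping log files."""
        return [r for r in self.receipts
                if r["conversation_id"] == conversation_id]
```

After appending a few receipts, `verify()` passes; change a single field in any stored receipt and it fails, which is exactly the "cannot be retroactively modified without detection" property.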
Layer 3: Response
When the AI fails, humans take over immediately.
Escalation rules that trigger on frustration signals, explicit handoff requests, and repeated failed interactions. A kill switch that disables AI responses in seconds — per tenant, per topic, or globally. Human override capability that lets a supervisor intervene in a live conversation.
Response is the layer most governance vendors ignore because it requires operational design, not just software. A kill switch is useless if nobody is monitoring. Escalation rules are useless if there is no human to escalate to. This layer is as much about process as it is about code.
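The routing logic for this layer is small; the operational design around it is the hard part. As a sketch, assuming made-up signal phrases and thresholds, the escalation and kill-switch rules might look like:

```python
# Hypothetical frustration signals and threshold -- real ones would be tuned.
FRUSTRATION_SIGNALS = ("this is useless", "speak to a human", "agent please")
MAX_FAILED_TURNS = 3

class Governor:
    def __init__(self):
        self.killed_tenants = set()
        self.failed_turns = {}  # conversation_id -> count of failed turns

    def kill(self, tenant_id: str) -> None:
        """One call, effective on the next request -- no redeploy."""
        self.killed_tenants.add(tenant_id)

    def route(self, tenant_id: str, conv_id: str, message: str) -> str:
        if tenant_id in self.killed_tenants:
            return "fallback_contact_form"   # AI fully disabled for tenant
        if any(sig in message.lower() for sig in FRUSTRATION_SIGNALS):
            return "escalate_to_human"       # explicit handoff request
        if self.failed_turns.get(conv_id, 0) >= MAX_FAILED_TURNS:
            return "escalate_to_human"       # repeated failed interactions
        return "ai_response"
```

The code is trivial by design: kill-switch checks happen before any model call, so disabling the AI is a set membership test, not a deployment.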
The three layers work together. Detection catches most problems before they reach the customer. Evidence ensures every interaction is recorded regardless of what detection caught. Response ensures that when both detection and evidence reveal a problem, a human can act on it immediately.
No single layer is sufficient. All three together give you a defensible position.
Why this framing sells better
I know this reads like an argument against selling governance. It is the opposite.
Customers who have been burned by vendor promises are the most skeptical buyers in enterprise software. They have heard "100% uptime," "zero data loss," and "complete protection" before. They know what those claims are worth when the incident report lands on their desk.
"We will reduce your risk by 95%+ and give you a complete audit trail for the rest" is a statement that a CTO can repeat to their board without feeling like they are being dishonest. "We eliminate all AI risk" is a statement that makes a CTO wonder what you are hiding.
The seatbelt framing also manages expectations after the sale. Customers who understand that governance is risk reduction — not risk elimination — have fewer complaints when an edge case gets through. They expected it. They planned for it. The audit receipt is there. The escalation path worked. The system performed as described.
Customers who were sold invincibility experience every edge case as a broken promise. They call support angry. They question the product. They churn.
Honest framing produces longer contracts. In a market full of absolute claims, "we are really good at this but not perfect, and here is how we handle the imperfection" is a differentiator.
How we built this into Tork
At Tork, every design decision starts from the assumption that the system will encounter inputs it cannot perfectly handle.
Every marketing claim is probabilistic, not absolute. We detect PII. We do not claim to detect all PII. We scan outputs. We do not claim to prevent all hallucinations. We generate audit receipts. We do claim those receipts are cryptographic and tamper-evident — because that is a property of the system, not a probabilistic outcome.
Every governance receipt in Tork Chat is a structured record with a unique ID, stored independently of the conversation. When our customers face a compliance query, they pull receipts — not log files.
Every escalation path ends at a human. The escalation agent in our multi-agent system does not generate an AI response. It produces a fixed handoff message and routes to a person. No LLM in the loop when a customer is frustrated. This is not a limitation of our AI. It is a design choice that acknowledges the AI's limitations.
Every tenant has a kill switch. One API call disables AI responses for that tenant. The widget falls back to a contact form. This exists not because we expect to use it often, but because the alternative — a deployment pipeline — takes minutes when you need seconds.
We built governance as a seatbelt because that is what our customers actually need: a system that reduces risk, proves what happened, and lets humans take over when the AI is not enough.
The uncomfortable truth
If your governance vendor promises 100% protection, one of two things is true:
They do not understand the problem space. PII detection, hallucination prevention, and policy enforcement are probabilistic problems operating on stochastic systems. Anyone who has built these systems knows that edge cases are infinite and perfection is asymptotic.
Or they understand it perfectly and chose to lie anyway, because "95%+ detection with complete audit trails" does not look as good on a slide deck as "total protection."
Either way, you should ask harder questions. Ask for false negative rates. Ask what happens when detection fails. Ask to see an audit receipt from a real incident. Ask how fast the kill switch works. Ask who gets paged at 3am when the AI starts misbehaving.
The vendors who answer these questions clearly are the ones who have thought about failure. The ones who deflect back to "complete protection" have not.
Governance is risk reduction, not risk elimination. The companies that understand this will build more resilient systems, maintain longer customer relationships, and have better outcomes when — not if — something goes wrong.
Build your seatbelt. Skip the invincibility cape.
We built Tork on this philosophy. Governance-first AI for customer-facing deployments. The honest kind. Read more about governed AI agents in The Agent Crisis, available free at tork.network.