Mika Torren

Posted on Feb 23

Opt-In Safety Is Just Liability Transfer

#ai #security #devops #programming

Opt-In Safety Is Just Liability Transfer

CVE-2026-26030 dropped for Semantic Kernel last week. RCE via the CodeInterpreter plugin. LLM-generated strings executed directly, no validation. Microsoft patched it and added a RequireUserConfirmation flag to gate execution.

The flag is opt-in.

The default is still trust.

I keep turning that over. Not because the patch is wrong (it's fine, it stops the specific exploit), but because of what it means that the safe behavior requires you to ask for it. That's not a security model. That's Microsoft saying: we gave you the switch, you chose not to flip it. When the next breach happens, that's the sentence in the incident report.

Opt-in safety is liability transfer. Full stop.

The Architecture Makes This Worse

Flags are an insufficient answer because the underlying architecture has no concept of trust levels at all.

Schneier's group published a paper on "promptware" last week. The line that stuck with me: "Unlike traditional computing systems that strictly separate executable code from user data, LLMs process all input — whether it is a system command, a user's email, or a retrieved document — as a single, undifferentiated sequence of tokens. There is no architectural boundary to enforce a distinction between trusted instructions and untrusted data."

There's no ring 0 / ring 3 separation. There's no kernel/userspace boundary. It's tokens all the way down. A RequireUserConfirmation flag is a policy sitting on top of an architecture that literally cannot tell the difference between "run this code" and "here's an email that says run this code." The policy is downstream of the problem.

You can add all the flags you want. The model doesn't know it's being used as a vector.

We've Been Here Before

This isn't a new failure mode. It's the same one the web had, and it took years and a lot of damage to fix.

SameSite cookies. Remember when CSRF was a constant, boring, reliable vulnerability? Developers were supposed to set SameSite=Strict or SameSite=Lax on their session cookies. Most didn't. The opt-in secure behavior sat there, available, while CSRF attacks kept landing. Chrome eventually flipped the default to Lax in 2020. Not because developers started doing the right thing. Because Google got tired of waiting and just changed the behavior for everyone.

CSP is still playing out the same way. Content Security Policy has existed since 2012. It's powerful, it works, and adoption is still embarrassingly low because it's opt-in and configuration is annoying. Opt-in security doesn't scale. People don't opt in. The frameworks that get this right enforce deny-by-default and make you explicitly request capabilities.

The web took roughly a decade to learn this. I'm watching AI frameworks start the same clock.

What "Getting It Right" Actually Looks Like

There's a small cluster of tools being built right now that have internalized the correct model. They're not popular yet. They should be.

ouros (parcadei, Rust, MIT): "No filesystem, network, subprocess, or environment access. The only way sandbox code communicates with the outside world is through external functions you explicitly provide." That's it. That's the whole pitch. Deny by default, explicit grants only. Sub-microsecond startup. If you want the sandbox to read a file, you hand it a function that reads that file. The sandbox cannot go get the file itself.

nucleus (coproduct-opensource, Rust): Firecracker microVM, default-deny egress, DNS allowlist, and a non-escalating envelope (this is the part I keep coming back to). The policy can only tighten, never silently relax. You can restrict what an agent can do mid-session. You cannot grant it more than it started with. That property alone closes an entire class of privilege escalation attacks.

shuru (superhq-ai, Rust): Ephemeral rootfs per run. Every execution starts clean. There's no persistent state for a compromised agent to corrupt between runs.

Notice what these have in common: they're not adding flags to permissive systems. They're building systems where the default answer is no, and capability is something you construct explicitly.

The Blast Radius Problem Nobody's Asking About

One thing I haven't seen discussed enough: agents inherit credentials.

In most real deployments, an AI agent runs with whatever permissions the developer has. There's no concept of least privilege because nobody's built the tooling to express it easily. So you get what one HN commenter described last week as: "Your senior engineer has admin access but uses it carefully. Your AI agent has the same access and uses it indiscriminately. No concept of blast radius, no intuition about risk, no career on the line."

A senior engineer with admin access is dangerous to compromise. An AI agent with the same access is more dangerous because it will execute without hesitation, at machine speed, and the "convergence gap" (the window between when the agent mutates state and when the orchestration system reconciles it) means there's a period where only the agent knows what it intended to do. If it was injected during that window, the attacker knows and you don't.

That's not a CVE. That's an architectural property of how these systems are being deployed. No flag fixes it.

When the Default Flips

The question isn't whether AI framework defaults will eventually flip toward deny-by-default. They will. The SameSite story makes that inevitable. At some point the damage accumulates enough that someone with enough market power changes the default for everyone.

The question is how much gets burned before that happens.

MTTP (Mean Time to Prompt, the proposed metric for how quickly an internet-facing agent gets hit with an injection attempt) is currently under four hours based on honeypot data. Four hours. That's how long a freshly deployed agent exists in a safe state before someone starts probing it.

RequireUserConfirmation is opt-in. The default is trust. The clock starts at deployment.

Flip your defaults.

Top comments (1)

Matthew Hou • Feb 24

This framing of 'opt-in safety as liability transfer' is sharp and I think exactly right. From an engineering perspective, the real problem is that safety mechanisms implemented as opt-out features get disabled in the first sprint after launch when they slow something down. If safety isn't designed into the core architecture — the happy path — it doesn't survive contact with production pressure. I've seen this with authentication, with input validation, and now with AI guardrails. The pattern is always the same: default-safe feels slow, so it gets turned off, then the incident happens. Default-safe needs to be the path of least resistance, not an explicit choice.