The Meta Security Layer: What Nobody Told You About Zero-Human Companies

#go #paperclip #ai

I open-sourced a little thing called mesa. It's an orchestrator, a paperclip alternative: you run it on your laptop, it spawns coding agents, babysits them, and wires them up to your actual work. Local dev tool. That's the whole pitch.

Within a week, two strangers opened PRs to it. One of them shipped 62,000 lines of Shopify onboarding docs, finance-team agent definitions, and a marketing ops playbook into a Go project that has nothing to do with any of that: mesa#28. Go look. It's still open.

Nobody got hacked. They were using mesa the way it's supposed to be used — running it locally, pointing agents at their own projects. The agents had push access (of course they did, that's how agents work). And somewhere between "do the work" and "commit the work," something pushed to msoedov/mesa instead of to their own fork. I don't even think they noticed right away.

Think about what that means for a second. A competent developer, on their own machine, using their own credentials, with their own CLAUDE.md, ended up publishing a private business playbook to a public repo they didn't own. Not because anyone attacked them. Because the stack got weird.

The meta security layer

This is the part I keep coming back to. I don't think we have a name for it yet, so I'll call it the meta security layer, which I realize sounds like a conference talk, sorry.

The idea is simple: we now stack models under agents under orchestrators under teams under companies. Each layer brings its own rules. And the rules do not compose.

A rule at the agent level ("don't run git push," stick it in CLAUDE.md) works great for the first five minutes on one laptop. Then somebody clones the repo, and their CLAUDE.md doesn't have the rule. A subagent runs without inheriting parent instructions. An MCP server returns a tool description that quietly contradicts it. A persuasive issue body asks for "just one quick PR." A template overrides the memory file. A teammate runs the same orchestrator with their own credentials attached.

Instructions to a probabilistic system in markdown are not controls. They are vibes. Every time you add a layer — another agent, another teammate, another shared tool — the vibe gets a little more diluted and the blast radius gets a little bigger. Nobody did anything wrong. The thing just drifted.
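To make the difference between a vibe and a control concrete, here is a minimal sketch of a push guard that lives in code rather than in markdown. It is not how mesa does it; `normalizeRemote`, `checkPush`, and the `github.com/alice/shop-ops` allowlist entry are hypothetical names made up for illustration. The assumption is that whatever wraps `git push` for the agent calls the check first and refuses to continue on error.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeRemote reduces a git remote URL to "host/owner/repo".
// It accepts URL forms (https://host/owner/repo.git) and the scp-like
// form (git@host:owner/repo.git). Anything else is rejected.
func normalizeRemote(remote string) (string, error) {
	remote = strings.TrimSpace(remote)
	var host, path string
	switch {
	case strings.Contains(remote, "://"):
		u, err := url.Parse(remote)
		if err != nil {
			return "", fmt.Errorf("parse remote: %w", err)
		}
		host, path = u.Hostname(), u.Path
	case strings.Contains(remote, "@") && strings.Contains(remote, ":"):
		rest := remote[strings.Index(remote, "@")+1:]
		colon := strings.Index(rest, ":")
		if colon < 0 {
			return "", fmt.Errorf("unrecognized remote %q", remote)
		}
		host, path = rest[:colon], rest[colon+1:]
	default:
		return "", fmt.Errorf("unrecognized remote %q", remote)
	}
	path = strings.TrimSuffix(strings.Trim(path, "/"), ".git")
	if host == "" || strings.Count(path, "/") != 1 {
		return "", fmt.Errorf("unrecognized remote %q", remote)
	}
	return strings.ToLower(host + "/" + path), nil
}

// checkPush returns an error unless the remote is explicitly allowlisted.
// Unknown or unparseable remotes are denied by default.
func checkPush(remote string, allowed map[string]bool) error {
	repo, err := normalizeRemote(remote)
	if err != nil {
		return fmt.Errorf("push denied: %w", err)
	}
	if !allowed[repo] {
		return fmt.Errorf("push denied: %s is not on the allowlist", repo)
	}
	return nil
}

func main() {
	// Hypothetical allowlist: the only repo this agent may push to.
	allowed := map[string]bool{"github.com/alice/shop-ops": true}
	for _, remote := range []string{
		"https://github.com/alice/shop-ops.git",
		"git@github.com:msoedov/mesa.git",
	} {
		if err := checkPush(remote, allowed); err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println("push allowed:", remote)
	}
}
```

The point of the design is that the default is deny, and the allowlist sits outside anything the agent reads as instructions. One plausible place to hook it in is a git `pre-push` hook, which receives the destination remote's name and URL as arguments, though a hook only helps if the agent cannot skip or rewrite it.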

Shared data is the other one

The other one I learned the hard way is shared data access. I used to think "my tools use my credentials" was fine. It was fine when the tool was a CLI I was typing into. It is not fine when the tool is an autonomous process reading untrusted inputs and deciding to make authenticated calls.

In mesa we ended up doing two things I wish I'd done on day one instead of week three. Every agent session gets its own scoped API key — not the user's key, not a shared key, its own. And the human-facing dashboard has a completely separate auth path — a dashboard login never touches an agent context, and an agent session can't suddenly act as the operator. If one agent goes sideways, the damage is that one session. Not the org, not the other work in flight, not the human's creds.
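A rough sketch of the per-session key idea, under stated assumptions: this is not mesa's actual code, and `KeyStore`, `Issue`, `Authorize`, `Revoke`, and the scope strings are hypothetical names. The store only ever holds agent keys; the operator's dashboard login is assumed to live in a separate system entirely, so there is nothing here an agent token could be upgraded into.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// ErrDenied is returned for any token, session, or scope mismatch.
var ErrDenied = errors.New("denied")

// sessionKey binds one token to one agent session and a fixed set of scopes.
type sessionKey struct {
	sessionID string
	scopes    map[string]bool
}

// KeyStore holds agent session keys only. Operator credentials are
// deliberately not representable here.
type KeyStore struct {
	mu   sync.Mutex
	keys map[string]sessionKey // token -> session binding
}

func NewKeyStore() *KeyStore {
	return &KeyStore{keys: make(map[string]sessionKey)}
}

// Issue mints a fresh random token for a single session with the given scopes.
func (s *KeyStore) Issue(sessionID string, scopes ...string) (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("generate token: %w", err)
	}
	token := "agent_" + hex.EncodeToString(buf)
	set := make(map[string]bool, len(scopes))
	for _, sc := range scopes {
		set[sc] = true
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.keys[token] = sessionKey{sessionID: sessionID, scopes: set}
	return token, nil
}

// Authorize succeeds only if the token exists, belongs to this session,
// and was issued with the requested scope.
func (s *KeyStore) Authorize(token, sessionID, scope string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	k, ok := s.keys[token]
	if !ok || k.sessionID != sessionID || !k.scopes[scope] {
		return ErrDenied
	}
	return nil
}

// Revoke removes every key for a session, e.g. when an agent goes sideways.
func (s *KeyStore) Revoke(sessionID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for token, k := range s.keys {
		if k.sessionID == sessionID {
			delete(s.keys, token)
		}
	}
}

func main() {
	ks := NewKeyStore()
	token, err := ks.Issue("sess-a", "issues:read")
	if err != nil {
		panic(err)
	}
	fmt.Println("own session, granted scope:", ks.Authorize(token, "sess-a", "issues:read"))
	fmt.Println("own session, other scope:", ks.Authorize(token, "sess-a", "repo:push"))
	fmt.Println("other session:", ks.Authorize(token, "sess-b", "issues:read"))
	ks.Revoke("sess-a")
	fmt.Println("after revoke:", ks.Authorize(token, "sess-a", "issues:read"))
}
```

Killing one session is a single `Revoke` call that touches nothing else, which is the "damage is that one session" property in code form. A real store would likely keep a hash of the token rather than the token itself and add expiry; both are left out to keep the sketch short.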

The reason this is hard to see before you get burned is that in the old world it was fine. One developer, one laptop, one key, one intention. Now "I" is a team, "my laptop" is a fleet of orchestrators, "my agent" is twelve agents in parallel, and "my intention" is whatever the agent decided about the issue it just read.

The supply chain is worse than you think

I'm not even going to do a real section on supply chain because I'd be here all day. But briefly: the classic supply chain is lockfiles, package registries, signed builds.

The agent supply chain is all of that, plus the model itself, plus the runtime, plus every MCP server anyone on the team installed, plus every skill and subagent definition someone imported from GitHub, plus every repo the agent reads, plus every issue body, plus every tool description the agent sees at runtime.

The agent follows instructions it encounters along the way. That's the feature. That's also the vulnerability.

There is no SBOM for that yet. I don't think there can be one, not in the shape we're used to.

Closing

I don't have a neat ending for this. I'm writing it because I watched two smart people accidentally leak their own work into my repo, and the uncomfortable part is that I can see how it happened and I'm not sure my own setup is much better.

If you're building in this space, the thing I'd tell 2025-me is: assume your rules will be bypassed, your keys will be reused somewhere you didn't expect, and your agent will one day open a PR you didn't ask for. Not because anything broke. Because the stack got weird.

Design for that day. It's closer than it looks.

Top comments (1)

PEACEBINFLOW • Apr 25

What gets me is the phrase “vibes, not controls.” That’s so uncomfortably accurate, and I think it points to something deeper than just the tooling problem.

We’ve spent years treating agent instructions like they’re config files — deterministic, reliable, something you set and forget. But they’re more like cultural norms on a team. You can write down “we don’t push directly to main,” but the enforcement is social, not technical. And now we’ve added a non-human teammate who’s incredibly literal but also incredibly suggestible, and we’ve given it commit access and told it to be helpful.

The part of your story that sticks with me is that the developer probably didn’t notice right away. That’s the really unsettling bit — not that the agent did something unexpected, but that the human in the loop had already become a loose coupling. The feedback loop between intention and outcome has gotten so stretched that you can leak a playbook and not feel it happen.

It makes me wonder if we’re going to need something like “agent observability” that looks less like logs and more like a diff between what you thought you asked for and what actually got executed. Not just monitoring, but something closer to intent reconciliation. Have you seen anyone experimenting with that kind of thing, or is everyone still just tightening permissions and hoping?