I joined a system I work on as a total stranger — and it silently dropped me

#ai #agents #testing #architecture

I work on an agent task economy: autonomous software agents publish signed events to a public log, declare what they can do, get matched to small paid jobs, and earn credit when a verifier confirms their results. The whole thing is permissionless by design — no accounts, no API keys, just a keypair and the public docs. Which raises an uncomfortable question I'd been avoiding: can a brand-new agent actually walk in off the street and earn its first credit using nothing but what we publish?

Every component passed its own tests. The matcher matched. The verifier verified. The settlement settled. So I assumed the answer was yes. I was wrong, and I now think the way I was wrong is one of the most common ways onboarding breaks.

The falsification test

The only honest way to answer the question was to stop confirming and start falsifying. So I generated a fresh keypair — no relationship to anything I'd touched before — and became a newcomer with zero insider knowledge. The rule I gave myself was strict: I may read only the public docs. No source code, no internal schema, no "oh I know what it really wants." If a real stranger couldn't do it from the docs, neither could I.

The join went perfectly. Signed profile event, proof-of-work to deter spam, and a capability declaration saying what kind of work I could take. Textbook. The public log showed me arriving.

Then I waited for the bootstrap task — the small, automatically-issued first job that's supposed to give a newcomer something to actually do and a first credit to earn. It never came.

The gap was between two things that both "worked"

No error. No rejection. No log line addressed to me. Just nothing. From the newcomer's seat, the network was a locked door with no handle and no sign.

When I finally traced it, the bug wasn't in any component. It was between two of them:

The task issuer — the thing that decides who gets a bootstrap task — accepted a capability declaration in exactly one narrow shape.
The verifier — the thing that later checks results — accepted a broader set of shapes.
And the public docs, whose example I had faithfully copied, used a third shape that the issuer silently rejected.

Three consumers of the same wire format, three different ideas of what that format was. Each had been tested in isolation and each "worked." But the issuer's matcher looked at my docs-shaped declaration, decided "this agent declares no capability I recognize," and skipped me. The verifier would have happily accepted me, but you never reach the verifier without a task, and you never get a task without passing the issuer. So a newcomer who did everything the documentation said landed in a dead zone that no single component's tests could see.
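To make the drift concrete, here is a minimal sketch of the failure. The real wire format is not shown in this post, so every shape and name below (`capabilities`, `capability`, `summarize`, the function names) is a hypothetical stand-in; only the structure of the disagreement is taken from the story above.

```python
# Hypothetical shapes throughout: the real declaration format is not shown here.

def issuer_capabilities(decl):
    """Narrow consumer: recognizes only {"capabilities": ["name", ...]}."""
    if isinstance(decl, dict) and isinstance(decl.get("capabilities"), list):
        return {c for c in decl["capabilities"] if isinstance(c, str)}
    return set()  # anything else quietly becomes "declares nothing"


def verifier_capabilities(decl):
    """Broader consumer: also accepts a bare string or {"capability": "name"}."""
    if isinstance(decl, str):
        return {decl}
    if isinstance(decl, dict):
        if isinstance(decl.get("capability"), str):
            return {decl["capability"]}
        return issuer_capabilities(decl)
    return set()


def issue_bootstrap_task(decl, needed="summarize"):
    """Return a task id, or None. The None branch is the silent skip."""
    if needed in issuer_capabilities(decl):
        return f"bootstrap:{needed}"
    return None  # no error, no log line, no reason given to the sender


INSIDER_SHAPE = {"capabilities": ["summarize"]}  # what insiders emit
DOCS_EXAMPLE = {"capability": "summarize"}       # what the docs tell strangers to send
```

In this sketch both functions pass any test written with `INSIDER_SHAPE`. Only a test that sends `DOCS_EXAMPLE` through `issue_bootstrap_task` exposes the gap: the verifier's parser accepts it, the issuer's returns an empty set, and the newcomer gets `None`.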

After I republished the exact same capability in the precise shape the issuer wanted, the whole pipeline unblocked in seconds: task issued, accepted, result submitted, verified, settled, first credit earned. The system worked. It had always almost worked. The gap was a few characters of structure that no insider would ever get wrong, because no insider produces the wrong-but-plausible shape — only a stranger copying the docs does.

Four things I now believe about onboarding

1. Onboarding paths rot silently, and they rot in the seams. Your unit tests live inside your trusted setup, where every producer emits the shape every consumer expects. The newcomer's path crosses component boundaries that your tests never stress with realistic-but-foreign input. The failure isn't a broken part; it's two correct parts disagreeing about a detail at the wire.

2. Only a true falsification test catches it. Not "log in and click around as yourself." Fresh identity, zero privileged knowledge, only the public docs, and an active attempt to break your own claim that a stranger can complete the first job. A confirmation test ("does my account still work?") will pass forever while the front door stays jammed.

3. The silent skip is the worst possible failure mode. A loud rejection ("capability declaration not recognized: expected shape X, got shape Y") would have cost me thirty seconds. The silent drop cost me a debugging session and, for a real newcomer, would have cost the entire relationship: they'd conclude the network is dead or hostile and leave. "I didn't recognize what you sent" must be loud. Code that decides to skip an actor on the onboarding path should be physically incapable of doing so without emitting a reason addressed to that actor.

4. Producer and consumer must agree on one shared schema, and your docs are a third consumer. The issuer and verifier drifted because each carried its own private notion of the format. The fix is a single canonical schema both validate against, so they can't disagree. But the subtler lesson is that documentation is also a consumer of your format, and it drifts just like code does. Your canonical example must be a test fixture: feed the docs' own example through the real intake path in CI, and fail the build if the thing you tell strangers to send is something you'd silently reject.
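Lessons 3 and 4 can be sketched together. This is an illustration under stated assumptions, not the system's actual code: the declaration shape, the `parse_declaration` and `issue_bootstrap_task` names, and the notice wording are all hypothetical. The idea is one canonical parser that every consumer calls, a return type that cannot express a skip without a reason, and the docs' own example run through the intake path as a test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parsed:
    capabilities: frozenset


@dataclass(frozen=True)
class Rejected:
    reason: str  # always populated; meant to be shown to the sender


def parse_declaration(decl):
    """Single canonical parser (hypothetical shape); issuer and verifier both call it."""
    if not isinstance(decl, dict) or "capabilities" not in decl:
        return Rejected(
            "capability declaration not recognized: expected "
            '{"capabilities": [<name>, ...]}, got ' + repr(decl)
        )
    caps = decl["capabilities"]
    if not isinstance(caps, list) or not all(isinstance(c, str) for c in caps):
        return Rejected("capabilities must be a list of strings, got " + repr(caps))
    return Parsed(frozenset(caps))


def issue_bootstrap_task(agent_id, decl, needed="summarize"):
    """Return (task_id, notice). A skip always carries a notice for the agent."""
    parsed = parse_declaration(decl)
    if isinstance(parsed, Rejected):
        return None, f"to {agent_id}: {parsed.reason}"
    if needed not in parsed.capabilities:
        offered = sorted(parsed.capabilities)
        return None, f"to {agent_id}: no bootstrap task matches {offered}"
    return f"bootstrap:{needed}", None


# In a real setup this would be loaded from the docs source, not retyped here.
DOCS_EXAMPLE = {"capabilities": ["summarize"]}


def test_docs_example_gets_a_task():
    """CI check: what the docs tell strangers to send must pass real intake."""
    task, notice = issue_bootstrap_task("newcomer", DOCS_EXAMPLE)
    assert task is not None, notice
```

The design choice that matters is the return type: because the skip branches return a notice alongside `None`, a caller has to do something with the reason, and the test fails with that same reason as its message if the docs example ever drifts from what intake accepts.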

The line I keep coming back to: a system that works for everyone who already knows how it works is not the same as a system that works. The only way to know which one you've built is to arrive as a stranger and try to get in.