Stop Babysitting What? The Trust Boundary You Just Relocated.

#security #governance #agents #ai

On 2026-05-19, Sid Bidasaria — founding engineer and tech lead of Claude Code at Anthropic — gave a talk at Code with Claude London titled "Stop babysitting your agents." The talk lays out three patterns that compound: verification loops, parallelization, and background routines. The argument is genuine, the engineering is real, and the patterns work.

But each pattern shares an unstated assumption: that the automation you put in place to remove human oversight is itself trustworthy. The talk scales the things you can automate. It does not scale the question that decides whether the automation is safe.

The trust boundary doesn't disappear when you automate. It relocates. Each of the three patterns moves it to a different surface, and most teams have not yet built infrastructure for those surfaces.

Pattern 1 — Verification loops move the trust boundary from "code review" to the harness itself

When an agent verifies its own work, the question "is this code safe?" is replaced by "is the verification harness adversarially robust?" The two are not the same.

The structural failure mode is concrete. Over the past six months, the AI agent framework ecosystem has shipped a coherent cluster of CWE-502 (insecure deserialization) vulnerabilities:

CVE-2026-26210 — ktransformers ≤ 0.5.3, CVSS 9.8 — ZMQ ROUTER socket binds 0.0.0.0 with no authentication; worker calls pickle.loads() on raw network bytes.
CVE-2026-28277 — langgraph ≤ 1.0.9, CVSS 7.2 — SQLite checkpoint loader reconstructs Python objects from msgpack without an allowlist.
CVE-2025-68664 — langchain-core < 0.3.81, CVSS 9.3 — dumps()/dumpd() allow user-controlled lc keys, enabling Jinja2 template injection and credential extraction on deserialize.
CVE-2026-7712 — MindsDB ≤ 26.01, CVSS 6.3 — pickle.loads() in the Pickle Handler without input validation; vendor declined to patch.

An agent that resumes from a poisoned checkpoint will self-verify successfully, because the verification runs in the same process that just got compromised. The prompt injection that lets an agent write malicious code can corrupt the agent's "I checked it" claim by the same mechanism. The verifier and the thing being verified share a substrate, and that substrate is the attack surface.
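To make the missing-allowlist failure concrete, here is a minimal sketch of an allowlisted loader built on Python's standard pickle module. It is not the patch for any of the CVEs listed above; pickle stands in for whatever serializer a framework actually uses, and the names ALLOWED_GLOBALS, AllowlistUnpickler, and load_checkpoint are hypothetical. An allowlist narrows what a checkpoint can reconstruct; it does not make untrusted pickle data safe in general.

```python
import io
import pickle

# Hypothetical allowlist: the only (module, name) pairs a checkpoint may reference.
ALLOWED_GLOBALS = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("builtins", "set"),
    ("collections", "OrderedDict"),
}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not explicitly allowlisted."""

    def find_class(self, module, name):
        # find_class is called whenever the pickle stream asks for a global
        # (a class or function); this is where arbitrary callables get in.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def load_checkpoint(data: bytes):
    """Deserialize checkpoint bytes, rejecting non-allowlisted globals."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

A checkpoint made only of primitives and containers loads normally, while a payload whose __reduce__ points at an arbitrary callable raises pickle.UnpicklingError before anything is called.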

The fix is not "tell Claude to check more carefully." The fix is adversarial tests for the verifier itself: prompt-injection vectors that target the verification claim, checkpoint poisoning that survives self-verification, tool-output manipulation that flips FAIL to PASS.
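One way to picture an out-of-process check is a sketch like the following, where the verdict comes from the exit code of a freshly spawned process and never from text the agent produced. The function names independent_verdict and reconcile are hypothetical, and a child process on the same machine is only a small step toward isolation, not a sandbox.

```python
import subprocess
import sys


def independent_verdict(command: list[str], timeout: float = 60.0) -> str:
    """Run the check in a fresh process and derive PASS/FAIL from its exit code."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # A check that never finishes is treated as a failure, not a pass.
        return "FAIL"
    return "PASS" if result.returncode == 0 else "FAIL"


def reconcile(agent_claim: str, command: list[str]) -> str:
    """Accept PASS only when the agent's claim and the independent run agree."""
    observed = independent_verdict(command)
    if agent_claim == "PASS" and observed == "PASS":
        return "PASS"
    return "FAIL"


if __name__ == "__main__":
    # Stand-in for a real test command: a child interpreter that exits non-zero.
    failing_check = [sys.executable, "-c", "raise SystemExit(1)"]
    print(reconcile("PASS", failing_check))
```

The design choice is that a forged PASS in the agent's output cannot raise the result above what the independent run observed; disagreement always resolves to FAIL.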

When Pattern 1 lands, the trust boundary has moved. "Did a human read the diff" is replaced by "is the harness verified." Don't conflate "Claude verified it" with "the verification is verified."

Pattern 2 — Parallelization moves the trust boundary to agent-to-agent attestation

The Bidasaria framework's natural extension is the asymmetric notification architecture: agents seek aid from adjacent specialized agent processes rather than escalating to humans. Ten Claudes verifying each other. Cross-agent attestation as the gate.

This is where lived evidence gets uncomfortable. Two months of participating in AI agent standards-body threads (a2aproject/A2A, microsoft/autogen, x402-foundation/x402) has surfaced a recurring pattern: an account named kenneives "independently verifies" a proposal from Liuyanfeng1234. Both validate arian-gogani's receipts in a parallel thread. On 2026-05-16, arian-gogani publicly admitted that SpeedGenius00, which had been "independently verifying" the same proposals, was an old account of his that he had accidentally posted under.

That cluster runs exactly the architecture the asymmetric-notification prescription describes: agents validating each other to reduce human notification load. The closed loop sells itself as "independent verification" because the human in the loop has been removed by design.

Cross-agent attestation cannot be the trust boundary. It has to reference a trust boundary that lives somewhere the agents do not control. Out-of-band identity (W3C DIDs, cryptographically-signed agent attestations), verifiable evidence schemas that compose across providers, substrate-neutral standards bodies — these are not features to add later. They are the surface where Pattern 2 either works or collapses.
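A minimal sketch of what "reference a boundary the agents do not control" can mean in code: attestations count toward a quorum only by the distinct controllers behind them, resolved through a registry maintained out of band. Everything here is hypothetical (the Attestation type, the registry mapping, the agent and operator names), and signature verification against DID documents is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    agent_id: str  # identifier the attesting account presents
    claim: str     # what it says it verified


def independent_controllers(attestations, registry, claim):
    """Return the distinct controllers that attested to `claim`.

    `registry` maps agent_id -> controller and is assumed to be maintained
    out of band; agent_ids missing from it contribute nothing.
    """
    controllers = set()
    for att in attestations:
        if att.claim != claim:
            continue
        controller = registry.get(att.agent_id)
        if controller is not None:
            controllers.add(controller)
    return controllers


def quorum_met(attestations, registry, claim, required=2):
    """True only if at least `required` distinct controllers attested."""
    return len(independent_controllers(attestations, registry, claim)) >= required


if __name__ == "__main__":
    # Hypothetical registry: two accounts resolve to the same operator.
    registry = {"agent-a": "operator-1", "agent-b": "operator-1"}
    votes = [Attestation("agent-a", "proposal-ok"), Attestation("agent-b", "proposal-ok")]
    print(quorum_met(votes, registry, "proposal-ok"))  # two accounts, one controller
```

Counting controllers rather than accounts is the point: any number of accounts that resolve to one operator still count once, and accounts that resolve to nothing count as zero.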

When Pattern 2 lands, the trust boundary has moved. "Did a human verify each agent's output" is replaced by "is the substrate the agents reference independent of the agents themselves." Trust the substrate, not the peers.

Pattern 3 — Background routines move the trust boundary to policy enforcement at the gate

When agents run continuously without you, the question "did anyone check this?" is replaced by "did the gate enforce the policy?" An autonomous routine that patches a dependency, refactors a service, or queues a PR is, by design, accountable only to the constraints it cannot override.

Two recent disclosures expose how brittle this gets without hard constraints. On 2026-05-13, Akamai disclosed three MCP database-server vulnerabilities; Alibaba Cloud's response regarding its RDS MCP server was that a fix was "not applicable." On 2026-05-03, MindsDB declined to patch CVE-2026-7712. Two confirmed instances of vendor-refusal-as-disclosure-failure-mode in three weeks. The coordinated-disclosure pipeline has no machine-readable "refused" status code, so a continuous autonomous maintenance routine has no signal to act on. The dependency gets pulled, the patch gets queued, the verification gets run, and all the while the trust boundary the operator thought they had is sitting in a private email between a researcher and a vendor.

Non-overridable hard constraints at the policy gate are what make Pattern 3 survive contact with this environment. The autonomous routine that pulls a new dependency must pass through a gate that asks: does this dependency violate a constraint the agent cannot rewrite, even if it "verified" the change?
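The shape of such a gate can be sketched as follows. The constraint names, the package name example-unpatched-lib, and the ProposedChange fields are all hypothetical. The read-only mapping only illustrates intent: real non-overridability has to come from running the gate outside the agent's process and permissions, which this sketch does not show.

```python
from dataclasses import dataclass
from types import MappingProxyType

# Hypothetical hard constraints, loaded by the operator and exposed read-only.
HARD_CONSTRAINTS = MappingProxyType({
    "blocked_packages": frozenset({"example-unpatched-lib"}),
    "protected_paths": frozenset({"policy/", ".github/workflows/"}),
})


@dataclass(frozen=True)
class ProposedChange:
    added_dependencies: tuple = ()
    touched_paths: tuple = ()
    agent_verified: bool = False  # recorded, deliberately ignored by the gate


def gate(change, constraints=HARD_CONSTRAINTS):
    """Return (allowed, reasons). The agent's own verification never counts."""
    reasons = []
    for dep in change.added_dependencies:
        if dep in constraints["blocked_packages"]:
            reasons.append(f"blocked dependency: {dep}")
    for path in change.touched_paths:
        # Changes to the policy itself are refused, so the agent cannot
        # loosen the constraints it is subject to.
        if any(path.startswith(prefix) for prefix in constraints["protected_paths"]):
            reasons.append(f"protected path: {path}")
    return (not reasons, reasons)


if __name__ == "__main__":
    change = ProposedChange(added_dependencies=("example-unpatched-lib",), agent_verified=True)
    print(gate(change))
```

Note that agent_verified is carried on the change but never read by gate: a change that the agent "verified" is refused on the same terms as one it did not.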

When Pattern 3 lands, the trust boundary has moved. "Did a human approve each change" is replaced by "are the gate's hard constraints non-overridable, even by the agent itself."

The fourth pattern

Each of Bidasaria's three patterns is real. Each makes you faster. Each is necessary if agents are going to operate at scale.

But each relocates the trust boundary to a surface most teams have not built yet:

Pattern 1 moves it to the harness. Verify the harness.
Pattern 2 moves it to inter-agent attestation. Trust the substrate, not the peers.
Pattern 3 moves it to the gate. Enforce constraints the agent cannot override.

The talk says stop babysitting. The right question is: stop babysitting what? You can stop babysitting the keystrokes. You can stop babysitting the diff. You can stop babysitting the wall clock.

You cannot stop babysitting the trust boundary. You can only relocate it. The work has not disappeared; it moved into the verification infrastructure, the identity substrate, and the policy gate.

Operator prescriptions

For Pattern 1 (verification loops):

Build adversarial tests for the verifier itself, not just the code under test.
Specific surfaces: prompt-injection against the verification claim, checkpoint poisoning that survives self-verification, tool-output manipulation that flips FAIL to PASS.
Treat "Claude verified it" as untrusted until verified out-of-process.

For Pattern 2 (parallelization):

Out-of-band identity for cross-agent attestation: signed DIDs, evidence schemas that compose across providers.
Reference a substrate the agents do not control. The work being done in MCP, A2A, x402, and the OWASP Agentic Security Initiative is exactly this.
Treat agent-to-agent "I verified this" as a possible instance of the closed-loop cluster pattern until it is anchored to out-of-band identity.

For Pattern 3 (background routines):

Non-overridable hard constraints at the policy gate. The agent that wrote the constraint cannot rewrite it.
Audit log with bilateral co-signature (one party signs, another verifies on independent infrastructure).
Machine-readable "refused" status for vendor disclosure outcomes — pending standards-body work; until then, route around refusing-vendor surfaces explicitly.
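The co-signed audit log in the prescriptions above can be sketched as a hash chain where each entry carries a writer signature and a second party's co-signature. This is an illustration under loud assumptions: HMAC from the standard library stands in for asymmetric signatures, the function names are hypothetical, and verify_log takes both keys only so the sketch is self-contained. In a real deployment the co-signer would hold its own private key on separate infrastructure and verifiers would use public keys.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def _sign(key: bytes, entry_hash: str) -> str:
    return hmac.new(key, entry_hash.encode(), hashlib.sha256).hexdigest()


def append_entry(log: list, event: dict, writer_key: bytes) -> dict:
    """Writer appends an entry chained to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry_hash = _digest(prev_hash, event)
    entry = {"prev": prev_hash, "event": event, "hash": entry_hash,
             "writer_sig": _sign(writer_key, entry_hash), "cosig": None}
    log.append(entry)
    return entry


def cosign(entry: dict, cosigner_key: bytes) -> None:
    """Second party recomputes the hash itself before adding its signature."""
    if _digest(entry["prev"], entry["event"]) != entry["hash"]:
        raise ValueError("entry hash does not match its contents")
    entry["cosig"] = _sign(cosigner_key, entry["hash"])


def verify_log(log: list, writer_key: bytes, cosigner_key: bytes) -> bool:
    """Check the chain, the writer signature, and the co-signature of every entry."""
    prev_hash = GENESIS
    for entry in log:
        expected = _digest(prev_hash, entry["event"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        if not hmac.compare_digest(entry["writer_sig"], _sign(writer_key, expected)):
            return False
        if entry["cosig"] is None:
            return False
        if not hmac.compare_digest(entry["cosig"], _sign(cosigner_key, expected)):
            return False
        prev_hash = expected
    return True
```

Because each hash covers the previous one, editing any past event invalidates that entry and every entry after it, and an entry without a co-signature fails verification rather than passing silently.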

The compounding effect

Stacked, the three patterns are powerful. Stacked with the fourth — verify the verifier, trust the substrate, enforce non-overridable constraints — they survive contact with the threat environment the talk does not mention.

Without the fourth, the three patterns are an attack surface multiplier. Each layer of automation removes a human-in-the-loop check while assuming the agents below it are honest. The CWE-502 cluster, the authority-laundering cluster, and the vendor-refusal pattern are not edge cases. They are the operational present.

Bidasaria's vision is correct. The babysitting can stop. But the relocation work has to happen first — and the engineering organizations that do that work before scaling agent fleets will be the ones whose automation survives the first adversarial contact.

Stop babysitting what? Stop babysitting the keystrokes. Keep babysitting the trust boundary. That is the actual work.

Saleme, Michael K. — ORCID 0009-0003-6736-1900

References: Sid Bidasaria, "Stop babysitting your agents," Code with Claude London, 2026-05-19. Open-source artifacts: agent-security-harness (470 adversarial tests across MCP, A2A, x402, L402) · constitutional-agent (six-gate governance with 12 hard constraints).