Vilius

Posted on May 6 • Originally published at workswithagents.com

My Subagents Kept Lying to Me — So I Wired Ed25519 Verification Into Our Own Protocol Stack

#python #agents #ai #opensource

Three weeks ago I was writing integration guides telling other agent frameworks to adopt verification protocols. Meanwhile, my own subagents were returning hallucinated status reports that I was blindly trusting.

What I Built: Self-Verification For Our Own Delegation

The fix wasn't a new tool. The fix was eating our own dog food.

Layer 1: Real Ed25519 Signing

The verification harness (subagent-verify.py) now uses PyNaCl for real Ed25519 signatures — not the SHA-256 placeholder we'd been shipping in reference implementations.

Before dispatch, the parent generates an Ed25519 keypair:

python3.11 ~/.hermes/scripts/subagent-verify.py dispatch \
  --task "check all integration PRs" \
  --agent-name "tracker-$(date +%H%M)"

This produces:

public_key — 32-byte Ed25519 verify key (hex). The parent uses this to verify signatures cryptographically — no shared secret needed.
context_instruction — mandatory output format directive pasted into the subagent's context. The subagent MUST return structured JSON with a signature.
_parent_seed — 32-byte private key. Never included in subagent context.
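Below is a minimal sketch of what the dispatch step hands back. The field names follow the list above. Everything else is hypothetical: `make_dispatch_record`, `subagent_context`, the rule that underscore-prefixed fields are parent-only, and the placeholder instruction text. Public-key derivation is passed in as a callable so the sketch stays stdlib-only. With PyNaCl it would be along the lines of `SigningKey(seed).verify_key.encode(encoder=HexEncoder)`.

```python
import secrets
from typing import Callable


def make_dispatch_record(
    task: str,
    agent_name: str,
    derive_public_hex: Callable[[bytes], str],
) -> dict:
    """Build a hypothetical dispatch record for one subagent run."""
    seed = secrets.token_bytes(32)  # an Ed25519 private key is a 32-byte seed
    return {
        "agent_id": agent_name,
        "task": task,
        "public_key": derive_public_hex(seed),
        # Placeholder wording; the real harness ships its own directive.
        "context_instruction": (
            "Return ONLY a JSON object with keys "
            '"agent_id", "claims" and "signature" (hex).'
        ),
        "_parent_seed": seed.hex(),  # parent-only; never sent to the subagent
    }


def subagent_context(record: dict) -> dict:
    """Drop every underscore-prefixed (parent-only) field before hand-off."""
    return {k: v for k, v in record.items() if not k.startswith("_")}
```

Filtering by prefix rather than by one fixed key name means any parent-only field added later is excluded from the subagent's context by default.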

When the subagent returns, the parent verifies:

echo "$subagent_output" | python3.11 ~/.hermes/scripts/subagent-verify.py verify \
  --public-key "abc123..." \
  --agent-id "tracker-1422"

Exit codes tell the story:

Exit 0 — Ed25519 signature valid + all claims match ground truth → trust
Exit 1 — Bad signature (tampered) OR claims don't match reality (hallucinated) → investigate
Exit 2 — No structured manifest found (unsigned prose) → DO NOT TRUST, re-dispatch
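The three outcomes can be sketched as a small classifier. This is not the harness's code. It assumes a hypothetical manifest shape (`{"agent_id", "claims", "signature"}`) in which the signature covers the canonical JSON of the claims list. Two checks are passed in as callables: the Ed25519 check (`verify_sig`) and the ground-truth lookup (`claim_holds`). In the real harness, the first would wrap PyNaCl's `VerifyKey.verify`, which raises `nacl.exceptions.BadSignatureError` on failure.

```python
import json
from typing import Callable, Optional

EXIT_TRUST, EXIT_INVESTIGATE, EXIT_UNSIGNED = 0, 1, 2


def find_manifest(output: str) -> Optional[dict]:
    """Return the first JSON object in the text that carries a claims list and a signature."""
    decoder = json.JSONDecoder()
    start = output.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(output, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and isinstance(obj.get("claims"), list) and "signature" in obj:
            return obj
        start = output.find("{", start + 1)
    return None


def classify(
    output: str,
    verify_sig: Callable[[bytes, str], bool],
    claim_holds: Callable[[dict], bool],
) -> int:
    manifest = find_manifest(output)
    if manifest is None:
        return EXIT_UNSIGNED  # prose only: do not trust, re-dispatch
    # Assumption: the subagent signed the canonical JSON of its claims list.
    signed = json.dumps(manifest["claims"], sort_keys=True, separators=(",", ":")).encode()
    if not verify_sig(signed, manifest["signature"]):
        return EXIT_INVESTIGATE  # tampered, or signed with a different key
    if not all(claim_holds(c) for c in manifest["claims"]):
        return EXIT_INVESTIGATE  # validly signed, but contradicts ground truth
    return EXIT_TRUST
```

Both failure modes share exit 1 on purpose: whether the claims were altered or were false from the start, the right response is the same. Someone has to look before anything acts on them.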

Three test cases confirmed the harness catches exactly what it should:

Test case	Result	Exit code
Signed, clean claims	`clean` — all verified	0
Tampered claims (same signature)	`bad_signature` — Ed25519 verification failed	1
Unsigned prose ("all clean ✅")	`UNSIGNED` — no manifest found	2

The tamper detection is real. If a subagent's claims are modified after signing — even a single character — the Ed25519 signature won't verify. This catches both accidental corruption and malicious modification.
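The tamper check depends on one detail: signer and verifier must work on exactly the same bytes. A common approach is to serialize the claims as canonical JSON, with sorted keys and no insignificant whitespace. This is an assumption for illustration, not taken from the harness, and `canonical_claims_bytes` is a hypothetical helper. With canonical JSON, reordering keys leaves the signed bytes unchanged. Editing a single character changes them, so an Ed25519 signature over the old bytes no longer verifies.

```python
import json


def canonical_claims_bytes(claims: list) -> bytes:
    """Serialize claims deterministically so signer and verifier see identical bytes."""
    return json.dumps(
        claims, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


original = [{"pr": 42, "checks": "passing"}]
reordered = [{"checks": "passing", "pr": 42}]  # same content, different key order
tampered = [{"pr": 42, "checks": "Passing"}]  # one character changed

print(canonical_claims_bytes(original) == canonical_claims_bytes(reordered))  # True
print(canonical_claims_bytes(original) == canonical_claims_bytes(tampered))  # False
```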

Layer 2: L6 ExecutionVerificationGate In All 6 Reference Implementations

The standalone harness is for parent-side verification. But agents that self-verify their subtasks need protocol-level enforcement. We added ExecutionVerificationGate (L6) to all six vanilla agent reference implementations — Python, TypeScript, Go, C#, Rust, and Shell.

It sits directly in the agent execution loop:

execute() → compliance_gate → _run() → VERIFICATION_GATE → tx.execute → DONE
                                           ↑
                                  unsigned/bad_sig → BLOCKED

Three tiers of validation:

Format — is there a structured claims array?
Signature — is there an Ed25519 hex signature?
Crypto — does the signature verify against the agent's public key?

If any tier fails, the task is blocked — not silently accepted. In the Python reference:

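# L6 gate: runs whenever the caller asks for verification, so a result with
# no claims array fails the format tier instead of skipping the gate
if verify_output:
    vg_result = ExecutionVerificationGate.validate(task_result, self.identity)
    if not vg_result["passed"]:
        # A failure at any tier (format, signature, crypto) blocks the task
        return {"status": "blocked", "verdict": vg_result["verdict"]}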
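The three tiers map onto early returns. The sketch below is a stdlib-only illustration, not the reference `ExecutionVerificationGate`. The verdict strings, the `signature` field name and the injected `crypto_ok` callable are all assumptions. It returns the same `passed`/`verdict` shape the snippet above reads. One fact it relies on is real: an Ed25519 signature is always 64 bytes, or 128 hex characters.

```python
import re
from typing import Callable

# Ed25519 signatures are always 64 bytes, i.e. 128 hex characters.
_HEX_SIG = re.compile(r"[0-9a-fA-F]{128}")


def validate(result: dict, crypto_ok: Callable[[dict], bool]) -> dict:
    """Hypothetical three-tier gate: format, then signature shape, then crypto."""
    # Tier 1, format: is there a structured claims array?
    if not isinstance(result.get("claims"), list):
        return {"passed": False, "verdict": "no_claims"}
    # Tier 2, signature: is there something shaped like an Ed25519 hex signature?
    sig = result.get("signature")
    if not isinstance(sig, str) or not _HEX_SIG.fullmatch(sig):
        return {"passed": False, "verdict": "unsigned"}
    # Tier 3, crypto: does it verify against the agent's public key?
    if not crypto_ok(result):
        return {"passed": False, "verdict": "bad_signature"}
    return {"passed": True, "verdict": "verified"}
```

The cheap tiers run first, so malformed or unsigned output is rejected before any cryptographic work is done.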

Layer 3: Wired Into Production Cron

The integration tracker that produced the original hallucination now has the verification harness in its skills list and a mandatory prompt directive:

CRITICAL — Direct Checks Only, No Subagents. Never use delegate_task for PR status checks. If a subagent is unavoidable, run dispatch → verify with Ed25519. Exit 2 means re-dispatch or check directly.

The cron job now loads both agent-integration-outreach and subagent-output-verification skills. Every PR check goes through one of two paths: direct gh pr checks (preferred) or verified subagent dispatch (when unavoidable).
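The "exit 2 means re-dispatch" rule is easy to encode in whatever drives the cron job. Here is a minimal sketch that assumes the dispatch and verify steps are wrapped as callables. `run_subagent`, `verify_exit_code` and `verified_dispatch` are hypothetical names. In practice, `verify_exit_code` could pipe the output into the harness and return `subprocess.run([...], input=output, text=True).returncode`. The retry limit of three is arbitrary.

```python
from typing import Callable, Optional


def verified_dispatch(
    run_subagent: Callable[[], str],
    verify_exit_code: Callable[[str], int],
    max_attempts: int = 3,
) -> tuple[str, Optional[str]]:
    """Return (decision, output); decision is "trust", "investigate" or "check_directly"."""
    for _ in range(max_attempts):
        output = run_subagent()
        code = verify_exit_code(output)
        if code == 0:
            return "trust", output  # signed and consistent with ground truth
        if code != 2:
            # Exit 1 (tampered or hallucinated) or an unexpected error: stop and look.
            return "investigate", output
        # Exit 2: unsigned prose, so discard it and re-dispatch.
    # Still unsigned after every attempt: fall back to a direct check such as `gh pr checks`.
    return "check_directly", None
```

Only exit 2 triggers a retry. A validly signed but false report is never retried blindly, because a fresh attempt could produce a plausible-looking pass.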

What I Learned

1. Infrastructure you build for others is infrastructure you need yourself

I built Identity and Verification protocols to propose to other frameworks, then discovered my own delegation used neither. The protocols weren't theoretical: they solved my own problem, and they're running right now on my own machine.

2. "All clean ✅" is not a status report — it's a trust claim that needs a signature

Prose summaries from subagents are self-reports. Without a structured, attributable claim, there's no way to tell whether the subagent actually checked anything or just generated plausible text. The Ed25519 signature doesn't make the subagent more accurate, but it makes it accountable: it ties each claim to a specific agent and proves the claims weren't altered afterwards. The ground-truth check does the rest. If a subagent signs claims that don't match reality, verification catches it.

3. Eating your own dog food changes how you design protocols

Writing integration guides for other frameworks is different from running the protocol yourself. The moment I wired verification into my own cron pipeline, I discovered exactly what the API surface should be: dispatch → sign → verify → exit codes. Three outcomes, no ambiguity. That clarity came from using it — not from designing it.

Get It

The verification harness and all six reference implementations with L6 gates are available:

Verification Harness: ~/.hermes/scripts/subagent-verify.py — real Ed25519 via PyNaCl, dispatch + verify modes
Python: vanilla_agent.py — execute(verify_output=True) with ExecutionVerificationGate
TypeScript/Go/C#/Rust/Shell: Same L6 gate, same OSI stack, zero external deps beyond stdlib

All under CC BY 4.0. Full spec at workswithagents.com/standards.

If your agents are delegating to subagents without verification — and they are, because every agent framework does — the fix is a single file, 300 lines, real crypto, and three exit codes that tell you whether to trust the output or throw it away.

I build agent infrastructure inside Microsoft 365. SPFx · TypeScript · autonomous multi-agent systems. Currently open to senior/architect roles (£120K+ remote UK). → vilius@workswithagents.com

DEV Community