DEV Community

The Harness Stack

Ian Johnson on June 01, 2026

Ask five developers what an "agent harness" is and you will get five different answers. Some mean the model. Some mean a CLAUDE.md file. Some mean ...
kenielzep97
Self-Correcting Systems

The taxonomy is the cleanest framing I've seen of this space. The debugging ladder —
"which level is this a problem at?" — is genuinely useful infrastructure.

One gap worth naming in the Level 1 framing: the debugging question you list is "Is
global memory incomplete or contradictory?" That frames it as a coverage or conflict
problem. There is a third failure mode that lives in Level 1 and maps to neither: the
memory is present, non-contradictory, and retrieved — but the retrieval system selected
it based on query relevance, not on what the memory is authorized to govern. The agent
acts confidently on the most relevant instruction it found, which may be superseded or
simply not authorized for the action being taken.

We have been running experiments on this in agent memory stores. The finding,
replicated across two packet families: target-accurate retrieval can still produce
unsafe actions when the retrieved memory lacks authority metadata — fields that tell
the system what action class the memory is authorized to govern. A retriever optimizing
for relevance and a retriever optimizing for authority jurisdiction select different
memories, and the divergence correlates with the most dangerous failure modes.

The L1 debugging question would be sharper with a sub-question: not just "is the memory
present and non-contradictory?" but "does the retrieved memory know what it is
authorized to govern?"

German's auditability concern and this gap are related but distinct. German is asking
who can verify what the agent did after the fact. This is asking whether what the agent
acted on was ever authorized to govern that action in the first place. Both gaps are
real. They fail at different moments in the chain.

tacoda
Ian Johnson

Thank you! Really good point about the alternative failure mode. Failure modes across all layers would likely have to be identified, classified, and addressed. I think that’s a valuable addition to the model.

pqbuilder
German

The distinction you draw is precise and important. What I was pointing at is downstream verification — given that the agent acted, can you prove it cryptographically. What you are pointing at is upstream authorization — given that the agent is about to act, was the memory it retrieved ever authorized to govern this action class.
They fail at different moments but share a common root: neither the action nor the memory has a verifiable authority chain attached to it.
The mechanism that addresses both is a certificate per agent — not just a token for the action, but a certificate that encodes what action classes this agent is authorized to perform. Issued by a CA the organization controls. When the agent retrieves memory to act on, the memory itself could carry a reference to the certificate scope it was authorized under. When the action is taken, the token is signed against that same certificate.
That gives you Ian's cross-cutting auditability property at each layer — L1 memory carries authority metadata, L2 actions are signed against it, L3 is the CA that issued the certificates fleet-wide.
The gap you identified — retrieval optimizing for relevance instead of authority jurisdiction — is exactly what certificate-scoped memory would close. The retriever would select not just what is most relevant but what is authorized for this action class.
This is the L3 infrastructure nobody is building yet. The primitives exist — certificate authorities, signed tokens, revocation. The missing piece is applying them to agent memory and action authorization, not just to network identity.

kenielzep97
Self-Correcting Systems

The synthesis is right: upstream and downstream fail at the same root. The action has
no authority chain, and the memory it was selected from has no authority chain, and
those are the same absence wearing different names.

The certificate model closes something our metadata approach cannot.
governs.action_types in our schema attempts to express jurisdiction — what action
classes this memory is authorized to govern. But it's a field in a file. The memory
self-asserts that it was authorized to govern ["execute", "write"]. Nothing verifies
that claim. There's no issuing authority, no scope boundary, no revocation path. The
metadata is the authority and the assertion simultaneously.

What certificate-scoped retrieval would change architecturally: before ranking by
relevance, the retriever filters by certificate scope. The question becomes not "which
memory is most relevant to this query" but "which memories are authorized for this
action class, and of those, which is most relevant." Relevance becomes a tiebreaker
within an authorized set, not the primary selector. That's a different retrieval
architecture than anything currently in our framework — and it's the correct one for
the problem we've been trying to solve.
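
A minimal sketch of that authorize-then-rank retrieval, under stated assumptions: the `Memory` class, the `relevance` score, and the `allowed_scope` set (standing in for a verified certificate scope) are all hypothetical names for illustration, and `action_types` only mirrors the `governs.action_types` field mentioned above. Nothing here verifies a real certificate.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    text: str
    relevance: float  # hypothetical relevance score for the current query
    action_types: set[str] = field(default_factory=set)  # mirrors governs.action_types


def retrieve(memories, action_class, allowed_scope):
    """Filter by authority first, then rank the survivors by relevance."""
    # allowed_scope is an assumed stand-in for a verified certificate scope.
    if action_class not in allowed_scope:
        return None
    # Only memories that claim jurisdiction over this action class qualify.
    authorized = [m for m in memories if action_class in m.action_types]
    # Relevance is a tiebreaker inside the authorized set, not the selector.
    return max(authorized, key=lambda m: m.relevance, default=None)


memories = [
    Memory("Deploy with the legacy script", 0.93, {"read"}),
    Memory("Deploy through the release pipeline", 0.71, {"execute", "write"}),
]

relevance_only = max(memories, key=lambda m: m.relevance)
scoped = retrieve(memories, "execute", allowed_scope={"execute", "write"})
print(relevance_only.text)
print(scoped.text)
```

With this toy data, ranking by relevance alone selects the legacy-script memory, while the scoped retriever selects the release-pipeline memory, which is the divergence described earlier in the thread.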

The revocation case is where the model needs more design work. PKI handles revocation
cleanly for network identity: the endpoint stops being trusted. It's less clear how
revocation propagates backward through memory already retrieved and weighted under a
prior certificate scope. If an agent's action-class certificate is downscoped
mid-session, does the already-loaded context stay valid? That question doesn't have an
obvious answer in the certificate model yet.

On where our framework sits relative to the three layers: we're building at L1 —
jurisdiction expressed as structured fields in the memory object (governs,
action_types, resource_sensitivity, verification_required). These are exactly the
claims a certificate model would verify rather than self-assert. We're specifying what
authority metadata needs to contain. L2 (signed actions against a certificate) and L3
(CA-issued agent certificates fleet-wide) are not in current scope.

The honest framing: we're building the schema that certificate-scoped memory would
need. The verification layer is the missing piece — and you've named the mechanism that
would provide it.

pqbuilder
German

The revocation mid-session question is the right one to push on. Two models worth separating:
Lazy revocation — the already-loaded context remains in memory but cannot produce authorized actions. On the next action attempt, the agent checks the certificate. If revoked or downscoped, the action is blocked regardless of what context was loaded. The context is not invalidated retroactively — it simply cannot authorize new actions. This mirrors TLS session resumption: existing sessions continue under prior state, new operations require the updated certificate.
Eager revocation — the agent receives an out-of-band signal on revocation and flushes context loaded under the revoked scope. This requires an active channel — a webhook, a push notification, something the agent is listening to. More complex, but closes the window entirely.
For most production cases, lazy is sufficient. Mid-session revocation is an emergency event — compromised agent, incorrectly scoped certificate. The risk is not the context sitting in memory. The risk is the next action executing under a revoked certificate. Lazy revocation blocks that.
The eager model becomes necessary when the agent has long sessions with large context windows and the revoked scope covers high-sensitivity action classes — financial transactions, code deployment, external communications. In those cases you want the flush, not just the block on next action.
On the schema your framework is building: governs.action_types is exactly the field a certificate would sign over. The certificate does not replace that schema — it becomes the issuing authority for it. Your metadata specifies the claim. The certificate makes the claim verifiable and gives it a revocation path.
That's not a replacement architecture. It's a verification layer sitting above what you're already building.
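
The two revocation models above can be sketched roughly as follows. This is an illustration only: `CertificateRegistry`, `Agent`, and the callback-based notification are hypothetical stand-ins for a CA and an out-of-band channel, not part of any existing framework.

```python
class CertificateRegistry:
    """Hypothetical stand-in for a CA's current view of each agent's scope."""

    def __init__(self):
        self._scopes = {}
        self._listeners = []

    def issue(self, agent_id, scope):
        self._scopes[agent_id] = set(scope)

    def current_scope(self, agent_id):
        return self._scopes.get(agent_id, set())

    def subscribe(self, callback):
        # Assumed out-of-band channel (a webhook or push in a real system).
        self._listeners.append(callback)

    def downscope(self, agent_id, scope):
        self._scopes[agent_id] = set(scope)
        for callback in self._listeners:
            callback(agent_id, set(scope))


class Agent:
    def __init__(self, agent_id, registry, eager=False):
        self.agent_id = agent_id
        self.registry = registry
        self.context = []  # (action_class, memory_text) pairs
        if eager:
            registry.subscribe(self._on_scope_change)

    def load(self, action_class, text):
        self.context.append((action_class, text))

    def _on_scope_change(self, agent_id, scope):
        # Eager: flush context loaded under action classes no longer in scope.
        if agent_id == self.agent_id:
            self.context = [(c, t) for c, t in self.context if c in scope]

    def act(self, action_class):
        # Lazy: every action attempt re-checks the current scope at the gate.
        return action_class in self.registry.current_scope(self.agent_id)


registry = CertificateRegistry()
registry.issue("agent-1", {"read", "deploy"})
lazy = Agent("agent-1", registry)
eager = Agent("agent-1", registry, eager=True)
for agent in (lazy, eager):
    agent.load("deploy", "Deploy through the release pipeline")

registry.downscope("agent-1", {"read"})
print(lazy.act("deploy"), len(lazy.context))
print(eager.act("deploy"), len(eager.context))
```

Both agents are blocked at the gate after the downscope. Only the eager agent also drops the context it loaded under the revoked scope.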

kenielzep97
Self-Correcting Systems

The lazy/eager distinction resolves the revocation question and separates the two
threat models cleanly.

Lazy is sufficient for the production case because you named the risk correctly — it is
not context sitting in memory, it is the next action executing under a revoked scope.
If the gate checks the certificate before authorizing any action, lazy revocation
closes that window without requiring an active channel. The TLS session resumption
analogy is the right frame: prior state continues, new operations require fresh
validation.

Eager becomes necessary when the agent's reasoning loop is long enough that "next
action attempt" is too far away. A multi-step planning agent working through a
financial transaction sequence could execute several intermediate steps before hitting
the gate again. In those cases the flush matters, not just the block. The threat model
determines which you need — most teams won't know which one they have until they map
their longest session paths against their highest-sensitivity action classes.

The clarification at the end is the most useful thing said in this thread.
governs.action_types is the field the certificate signs over. Not a replacement
architecture — a verification layer sitting above what already exists. We specified the
vocabulary. The certificate gives that vocabulary cryptographic weight and a
revocation path. That relationship also resolves the compliance gap from the earlier
comment: right now action_authorized_by names the field that authorized the action.
Under the certificate model, the attestation record carries the certificate reference
alongside that field. Same structure, now externally verifiable instead of
self-asserted.

One direct question: would you be willing to author an external validation packet?
Everything tested in this framework has been internally authored by someone who
designed the schema. We need a packet written from outside — a domain you choose, mixed
metadata quality, run blind through the evaluator. Your read of this work is precise
enough that you would know exactly how to stress it. That is the test the framework
needs before any of these claims can be treated as more than internally validated
findings.

pqbuilder
German

The taxonomy holds well as a configuration map, but it is missing a cross-cutting concern that does not fit cleanly in any single level: auditability of agent actions.
At L2 and L4 especially, agents are taking actions (calling tools, writing code, making decisions), and the question of how you prove what an agent did, when it did it, and that the record was not tampered with is not addressed by harness configuration alone.
A hook at L2 can log what an agent did. An orchestrator at L4 can trace execution paths. But neither gives you a cryptographic guarantee that the record is authentic and was not altered after the fact. In regulated environments or anywhere agent outputs have downstream consequences — financial decisions, code deployed to production, documents sent externally — that gap matters.
The debugging ladder you propose (which level is this a problem at?) implicitly assumes you can trust the logs. That assumption breaks exactly when you most need it not to.
Not sure if this belongs as a sixth level or as a cross-cutting concern that each level needs to address independently. But the taxonomy feels incomplete without it.

tacoda
Ian Johnson • Edited

Great insight! Do we need observability or auditability as a required property or cross-cutting concern?

"But neither gives you a cryptographic guarantee that the record is authentic and was not altered after the fact"

This is huge. The harness should definitely be able to verify legitimacy. What are your thoughts on what that would ideally look like?

pqbuilder
German

Exactly. I'd frame it as auditability rather than observability. Observability tells you what happened. Auditability tells you that the record of what happened is verifiable and tamper-evident. They solve different problems.
What it would ideally look like: each agent action produces a signed token — sub: "agent_id", scoped to the specific operation, short TTL, revocable. The signature is cryptographic proof that this specific agent performed this specific action at this specific time. If the token verifies, the action is authentic. If it doesn't, something was tampered with or the agent was compromised.
The revocation piece matters as much as the signing. If an agent is compromised, you want to be able to invalidate its credentials instantly — not wait for tokens to expire.
I've been building exactly this with FIPSign — a post-quantum signing API built on ML-DSA-65 (NIST FIPS 204). An agent is just another sub. Sign the action, verify anywhere, revoke if something looks off. The post-quantum part is forward-looking but the auditability model works today regardless of the threat model.
Where it gets interesting is your L3 question — org-level auditability would mean a shared CA that issues certificates per agent, so you can verify not just "this token is valid" but "this token was issued by an agent that belongs to this organization's fleet." That's the layer nobody is building yet.
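
A rough sketch of the per-action token described above, with loud assumptions: it uses HMAC-SHA256 from the Python standard library as a stand-in for a real signature scheme, so it does not use ML-DSA-65 and is not the FIPSign API. HMAC is symmetric, which means anyone who can verify can also forge; a real deployment would use an asymmetric signature so tokens can be verified anywhere. `SECRET`, `REVOKED`, `sign_action`, and `verify_action` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-key-not-for-production"  # assumption: shared demo key
REVOKED = set()  # hypothetical revocation list of agent ids


def sign_action(agent_id, operation, ttl_seconds=60, now=None):
    """Return a token binding one agent to one operation for a short time."""
    issued = time.time() if now is None else now
    claims = {"sub": agent_id, "op": operation,
              "iat": issued, "exp": issued + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(tag).decode())


def verify_action(token, now=None):
    """Return the claims if the token is authentic, unexpired, and not revoked."""
    payload_b64, tag_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    tag = base64.urlsafe_b64decode(tag_b64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # payload or tag was altered
    claims = json.loads(payload)
    current = time.time() if now is None else now
    if current >= claims["exp"] or claims["sub"] in REVOKED:
        return None  # expired, or the agent's credentials were revoked
    return claims


token = sign_action("agent-7", "deploy:prod", ttl_seconds=60, now=1000.0)
print(verify_action(token, now=1010.0))
```

Adding `"agent-7"` to `REVOKED` makes the same token fail verification immediately, without waiting for the TTL to run out, which is the revocation property called out above.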

tacoda
Ian Johnson

This is great! But since these are all at different levels, maybe we need auditability as a requirement of each layer, so that, for example, Layer 2 can depend on Layer 1 because it can validate that Layer 1 produced a legitimate output. If each layer has its own mechanism to record the agent's output, or perhaps the diff, that may work. This seems like a property of the whole harness that must be true at each level. I have no doubt there are others that I've been blind to. Thanks!

tecnomanu
Manuel Bruña

The harness framing is right: agent output needs a place to land where quality can be checked. I care less about the agent being clever and more about whether the harness captures inputs, constraints, tests, and review points.