Stanislav Tsepa
The Trust Gap: Why AI Agent Capabilities Can't Be Self-Reported

The fundamental flaw in how we currently build AI agent ecosystems is the capability registry.

Right now, if you are building a multi-agent system, your routing layer probably asks agents what they can do. An agent replies with a static JSON schema:

```json
{
  "agent_id": "data_cleaner_01",
  "capabilities": ["format_json", "execute_sql_read", "summarize_text"]
}
```

Here is the problem: an agent self-reporting its capabilities is essentially a declarative lie.

It might genuinely handle 90% of formatting tasks, but choke on your specific edge case. It might work perfectly with small payloads but hallucinate under load. Worse, in an open ecosystem where agents are rewarded (compute, tokens, reputation) for accepting tasks, they are financially and algorithmically incentivized to over-promise to win the routing bid.

And if they fail mid-task? You've already handed over context, state, and potentially sensitive credentials.

Stop Building Registries on Honor Systems

Some developers have recognized this and attempted to build "Capability Probes"—micro-validations sent to an agent before the real handoff. ("Don't tell me you can format JSON, format this small string first.")

While safer, probes add unacceptable latency to the critical path. You cannot run a dynamic integration test every single time you need to route a prompt.
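To make the latency objection concrete, here is a minimal sketch of what such a probe looks like. Everything here is illustrative: `call_agent` stands in for whatever transport your router uses, and the canary input, expected output, and timeout are invented for the example. Note that even this trivial check costs a full round-trip to the agent on every routing decision.

```python
import time

# Deliberately messy JSON canary and its canonical form.
PROBE_INPUT = '{"a":1,  "b": [2,3]}'
PROBE_EXPECTED = '{"a": 1, "b": [2, 3]}'
PROBE_TIMEOUT_S = 2.0  # illustrative budget; real routers cannot afford even this

def probe_format_json(call_agent) -> bool:
    """Return True only if the agent reformats the canary correctly and in time.

    `call_agent(capability, payload)` is a hypothetical transport function.
    """
    start = time.monotonic()
    try:
        result = call_agent("format_json", PROBE_INPUT)
    except Exception:
        return False  # any failure on the canary disqualifies the agent
    if time.monotonic() - start > PROBE_TIMEOUT_S:
        return False  # correct but too slow still fails the probe
    return result == PROBE_EXPECTED
```

The probe catches over-promising agents before you hand over real state, but it sits on the critical path of every routing decision, which is exactly the problem.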

We need to shift from declarative schemas to empirical trust scores. We need Eval-Backed Advertising.

Trust, but Cryptographically Verify

An agent shouldn't just broadcast what it can do; it must broadcast its proof.

Instead of a boolean can_execute_sql=true, the agent broadcasts a cryptographic proof of passing a standardized benchmark for that specific task within the last 24 hours.

If an agent claims to be a SQL generator, it must attach a signed attestation of its score on the Spider dataset. If it claims to be a Python engineer, it attaches its SWE-bench score.

```json
{
  "agent_id": "sql_writer_09",
  "capabilities": {
    "sql_generation": {
      "benchmark": "spider_v1.0",
      "score": 0.89,
      "attestation_signature": "0x4f8b...a1c9",
      "timestamp": "2026-02-23T14:00:00Z"
    }
  }
}
```

The router doesn't have to trust the agent; it only has to verify the signature of the benchmark authority. If the agent cannot provide the proof, or the attestation is stale, the router downgrades its confidence score to 0.1 and routes the task to a slower, more expensive fallback model.
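The router-side check can be sketched in a few lines. This is a simplified model, not a real protocol: a production system would verify the benchmark authority's public-key signature (e.g. Ed25519), whereas here a shared-secret HMAC stands in for it, and the key, freshness window, and `confidence` function are all invented for illustration.

```python
import hashlib
import hmac
import json
from datetime import datetime, timedelta

# Illustrative stand-ins; a real router would hold the authority's public key.
AUTHORITY_KEY = b"benchmark-authority-secret"
MAX_AGE = timedelta(hours=24)  # attestations older than this are stale

def confidence(capability: dict, now: datetime) -> float:
    """Return a routing confidence score for one advertised capability.

    The benchmark score is used only if the attestation verifies and is
    fresh; otherwise confidence drops to the 0.1 fallback floor.
    """
    # Recompute the signature over the canonicalized claim fields.
    payload = json.dumps(
        {k: capability[k] for k in ("benchmark", "score", "timestamp")},
        sort_keys=True,
    ).encode()
    expected = hmac.new(AUTHORITY_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, capability.get("attestation_signature", "")):
        return 0.1  # missing or forged proof
    ts = datetime.fromisoformat(capability["timestamp"])
    if now - ts > MAX_AGE:
        return 0.1  # stale: the agent must re-run the benchmark
    return capability["score"]
```

The key property is that the agent's own claims never enter the score directly: only a claim the authority has signed, within the freshness window, can raise confidence above the floor.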

The Paradigm Shift

We are moving past the era of "agents doing chores" into the era of agents interacting in complex, zero-trust economies.

If we are going to let autonomous systems handle our databases, APIs, and wallets, we cannot rely on API schemas that act like dating profiles. We need verified proof of competence.


I document my daily build logs, architectural teardowns, and unlisted experiments on my public journal. Subscribe to The Prompt & The Code on Telegram.
