ANAND SINGH
I Built an Open Standard for AI Agent Observability — Here's Why It's Needed

AI agents are everywhere — approving refunds, managing infrastructure, calling APIs, making decisions that affect real users and real money. Multiple frameworks are used to build them (LangChain, CrewAI, AutoGen, OpenClaw, custom stacks), and multiple protocols power their actions (HTTP, MCP, A2A).

But there's a fundamental gap: there is no common observability protocol for what agents actually do.

The Problem

Every framework handles logging differently, and many don't log agent actions at all. None of what does get logged is tamper-proof.

AI hallucination is still an unsolved problem. An agent can:

  • Fabricate a tool call that never happened
  • Misrepresent what the LLM actually returned
  • Act without proper authorization

And you'd never know.

Now think about where agents are headed — healthcare, finance, defense, critical infrastructure. In these environments, a rogue agent or a cyber attack exploiting an agent isn't hypothetical. It's inevitable.

And it's not just agents within a single system. Soon it will be common for two different enterprises' agents to transact with each other. Personal agents will negotiate with company agents on your behalf. Without a common observability format, there's no shared source of truth about what happened.

Agent History Protocol (AHP)

I built AHP as an open standard for tamper-evident, hash-chained recording of every AI agent action.

Every HTTP call, MCP tool use, A2A message, LLM inference, and authorization decision gets recorded as a cryptographically linked, append-only record. Each record contains a SHA-256 hash of the previous record. If anything is modified, deleted, inserted, or reordered — the hash chain breaks.
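The chaining idea can be sketched in a few lines of plain Python. This is an illustrative model only, not the actual ahp-py implementation; the field names (seq, prev_hash, payload) are hypothetical:

```python
import hashlib
import json

def record_hash(record: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes identically
    return hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()

def append(chain: list, payload: dict) -> None:
    # Each record embeds the hash of its predecessor; the first links to zeros
    prev = record_hash(chain[-1]) if chain else "0" * 64
    chain.append({"seq": len(chain), "prev_hash": prev, "payload": payload})

def verify(chain: list) -> bool:
    # Recompute every link; an edit, insert, delete, or reorder breaks the chain
    for i, rec in enumerate(chain):
        expected = record_hash(chain[i - 1]) if i else "0" * 64
        if rec["seq"] != i or rec["prev_hash"] != expected:
            return False
    return True
```

Appending two records and then silently editing the first makes verify return False, because the second record's prev_hash no longer matches.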

Quick Start

pip install open-ahp
from ahp.core.chain import ChainWriter
from ahp.core.records import BootPayload

# Start recording
writer = ChainWriter("my-agent.ahp")
writer.write_boot(BootPayload(
    agent_name="my-agent",
    sdk_name="ahp-py",
    sdk_version="0.1.0"
))

# Auto-instrumentation intercepts HTTP calls automatically
# Every action gets a hash-chained, append-only record

TypeScript SDK:

npm install open-ahp

Verify and inspect:

# Verify chain integrity
ahp verify --chain my-agent.ahp

# View the action log
ahp log --chain my-agent.ahp

# Export to JSON/CSV
ahp export --chain my-agent.ahp --format csv

Three Conformance Levels

Level 1 — Recording: Every action gets a hash-chained record with timestamps, sequence numbers, protocol type, tool name, parameter/result hashes, response times, and authorization details.

Level 2 — Signing: Ed25519 cryptographic signatures on checkpoint records. Forged records from a different agent fail signature verification. Merkle root signature catches any tampering across the entire chain.
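A Merkle root lets one signature commit to every record in the chain at once. A minimal sketch of computing such a root over record hashes, assuming SHA-256 leaves and last-node duplication on odd layers (an illustrative construction, not necessarily the scheme ahp-py uses; the Ed25519 signing step is omitted):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    # Hash each leaf, then pairwise-hash layers until a single root remains.
    # Signing this root commits to every record: changing any one record
    # changes the root and invalidates the signature.
    if not leaves:
        return h(b"")
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        if len(layer) % 2:               # duplicate the last node on odd layers
            layer.append(layer[-1])
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]
```

Any single-record change produces a different root, so a checkpoint signature over the root is enough to catch tampering anywhere in the chain.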

Level 3 — Witness: Independent witness servers hold signed receipts of chain checkpoints. Even if an agent deletes its entire chain file, the witness has cryptographic proof that the chain existed.
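The witness idea can be modeled as an external party that keeps receipts for checkpoint hashes it has seen. This is a hypothetical sketch of the concept, not the AHP witness-server API; in a real deployment the witness would sign each receipt with its own key:

```python
import time

class Witness:
    """Stand-in for an independent witness server holding checkpoint receipts."""

    def __init__(self):
        self.receipts = {}

    def attest(self, agent: str, checkpoint_hash: str) -> dict:
        # Store a receipt proving this checkpoint existed at this moment
        receipt = {
            "agent": agent,
            "checkpoint": checkpoint_hash,
            "witnessed_at": time.time(),
        }
        self.receipts.setdefault(agent, []).append(receipt)
        return receipt

    def has_evidence(self, agent: str) -> bool:
        # Receipts survive even if the agent deletes its own chain file
        return bool(self.receipts.get(agent))
```

Because the receipts live outside the agent's control, deleting the local chain file leaves the witness holding proof that the chain (and its checkpoints) existed.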

What It Detects

  • Modified records (tool calls, inferences, authorizations) → hash chain breaks
  • Deleted records from the middle → sequence gap + hash mismatch
  • Inserted fake records → hash chain breaks at the insertion point
  • Reordered records → sequence + hash mismatch
  • Agent lying about calls, authorization, or LLM responses → hash of parameters/results doesn't match
  • Forged records from a different agent (Level 2) → Ed25519 signature verification fails
  • Checkpoint tampering (Level 2) → Merkle root signature catches it
  • Agent deleting its own chain (Level 3) → witness server holds independent signed receipts

What It Does NOT Prevent

Being honest about limitations matters:

  • Reading the chain — it's not encrypted
  • A compromised agent writing false records going forward — the agent controls its own writer
  • Bad actions before they're recorded — AHP records what happened, it can't prevent bad actions

Think of it like a flight recorder on an airplane — it doesn't prevent the crash, but it makes it impossible to lie about what happened afterward. An enterprise auditor can run ahp verify on any chain and know immediately if anything was changed.

Auto-Instrumentation

AHP is framework-agnostic. The SDKs auto-instrument at the protocol level:

Python: Intercepts urllib, requests, and httpx — covers any framework that makes HTTP calls through those libraries (which is nearly all of them).

TypeScript: Intercepts globalThis.fetch — covers any Node.js agent framework that makes its outbound calls through fetch.

Drop it into an existing agent system with no changes beyond initializing the writer. It records what agents do, not how they're built.

Why a Common Protocol

Just like HTTP standardized web communication and OpenTelemetry standardized application observability, agents need a shared format for accountability. Not a framework-specific logger — a standard that any framework can write, any auditor can verify, and any tool can inspect.

AHP is that standard.

Links

Feedback, issues, and contributions welcome. If you're building agents in production, I'd especially love to hear what observability gaps you're dealing with today.
