Your LLM agent just decided to send $45,000 to a vendor. The invoice number? Hallucinated. The recipient? Close enough to sound right. The approval? A Slack message it misread from an unrelated thread.
By the time you notice, the money is gone.
This is not hypothetical. OWASP published the Agentic AI Top 10 in late 2025, and the top threats read like a horror show: goal hijacking, tool misuse, privilege escalation through tool chaining. Meanwhile, 48% of cybersecurity professionals now call agentic AI the number one attack vector, yet only about a third of enterprises have AI-specific security controls in place.
I built an open-source protocol to fix this. It's called PIC (Provenance & Intent Contracts), and it works by forcing agents to prove every important action before it happens.
Guardrails Don't Solve This
If you have worked with AI safety tooling, you've probably used guardrails: NeMo Guardrails, Guardrails AI, or something similar. They are good at constraining what a model says. Content filters. Output validation. Topic rails.
But none of them constrain what an agent does.
An agent can pass every output filter you have and still trigger an unauthorized wire transfer, export a customer database, or delete a production table. The guardrail sees the text. It doesn't see the tool call. And it definitely doesn't ask why the agent decided to make that tool call or where the decision data came from.
That's the gap. Guardrails sit at the output boundary. The real danger is at the action boundary: the moment between "the LLM decided to do something" and "the tool actually executes."
PIC: One Rule, Enforced Everywhere
PIC sits at that action boundary. The idea is simple:
Before any high-impact tool call executes, the agent must submit a structured proposal declaring what it wants to do, why, and where the decision data came from. PIC verifies the proposal and blocks anything that doesn't check out.
Here is what a proposal looks like:
```json
{
  "protocol": "PIC/1.0",
  "intent": "Execute wire transfer for Q4 server costs.",
  "impact": "money",
  "provenance": [
    { "id": "cfo_signed_invoice_hash", "trust": "trusted" },
    { "id": "slack_approval_manager", "trust": "semi_trusted" }
  ],
  "claims": [
    {
      "text": "Invoice hash matches authorized payment list",
      "evidence": ["cfo_signed_invoice_hash"]
    }
  ],
  "action": {
    "tool": "treasury.wire_transfer",
    "args": { "recipient": "AWS_Global_Payments", "amount": 45000 }
  }
}
```
Every proposal must include:
| Field | What it does |
|---|---|
| `intent` | Plain-language description of what the agent is trying to do |
| `impact` | Risk class: `read`, `write`, `money`, `privacy`, `irreversible`, etc. |
| `provenance` | Where the decision data came from, with explicit trust levels |
| `claims` | The agent's assertions, each pointing to evidence |
| `action` | The actual tool call (tool + args) |
The core verification rule: high-impact actions (money, privacy, irreversible) require at least one claim backed by evidence from trusted provenance. No trusted evidence? Blocked. Missing fields? Blocked. Schema invalid? Blocked. Any error at all? Blocked.
This is fail-closed by design. There is no "allow anyway" fallback.
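The fail-closed rule can be sketched in a few lines of plain Python. This is an illustrative reimplementation, not the library's actual code: field names follow the proposal JSON above, while `HIGH_IMPACT` and `REQUIRED_FIELDS` are my own shorthand.

```python
HIGH_IMPACT = {"money", "privacy", "irreversible"}
REQUIRED_FIELDS = {"protocol", "intent", "impact", "provenance", "claims", "action"}

def verify(proposal: dict) -> bool:
    """Fail-closed check: return True only if every rule passes."""
    try:
        # Missing fields? Blocked.
        if not REQUIRED_FIELDS <= proposal.keys():
            return False
        # Low-impact actions pass once the schema is intact.
        if proposal["impact"] not in HIGH_IMPACT:
            return True
        # High-impact: at least one claim must cite trusted provenance.
        trusted = {p["id"] for p in proposal["provenance"] if p["trust"] == "trusted"}
        return any(
            trusted & set(claim.get("evidence", []))
            for claim in proposal["claims"]
        )
    except Exception:
        # Any error at all? Blocked. There is no "allow anyway" branch.
        return False
```

Note that a proposal whose claims cite only `untrusted` or `semi_trusted` sources falls through to `False` for any high-impact action: that single `any(...)` line is the whole "no trusted evidence, no execution" guarantee.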
See It Work in 30 Seconds
```shell
pip install pic-standard

# This proposal has trusted provenance + valid evidence → passes
pic-cli verify examples/financial_irreversible.json

# This one has a bad SHA-256 hash → blocked
pic-cli verify examples/failing/financial_hash_bad.json --verify-evidence
```
The first command passes: the proposal has trusted provenance backing a high-impact action. The second one fails: the evidence hash doesn't match the artifact. The action never executes.
That's the entire verification loop: schema check → verifier rules → tool binding check → evidence verification → allow or block. All local, all deterministic, zero external dependencies.
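That kind of loop can be modeled as a chain of checks where any failure, or any exception, short-circuits to a block. A minimal sketch; the stage functions below are placeholders standing in for the real pic-standard internals:

```python
def run_pipeline(proposal: dict, stages) -> str:
    """Run each verification stage in order; any failure or error blocks."""
    try:
        for stage in stages:
            if not stage(proposal):
                return "block"
        return "allow"
    except Exception:
        return "block"  # fail-closed: errors never fall through to allow

# Placeholder stages mirroring the order described in the text.
stages = [
    lambda p: isinstance(p, dict) and "action" in p,  # schema check
    lambda p: p.get("protocol") == "PIC/1.0",         # verifier rules
    lambda p: "tool" in p["action"],                  # tool binding check
    lambda p: True,                                   # evidence verification (stub)
]
```

The design choice worth copying is the `except Exception: return "block"`: a verifier that crashes must behave exactly like a verifier that said no.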
How This Maps to Real Threats
Let's walk through the OWASP Agentic Top 10 threats and how PIC handles them:
Prompt injection → side effect (ASI01: Agent Goal Hijack)
A malicious email gets ingested by the agent and triggers a payment attempt. PIC tracks the email as untrusted provenance. Untrusted data alone cannot trigger a money action: it needs trusted evidence to "bridge" the taint. The transfer is blocked.
Hallucination → financial loss (ASI02: Tool Misuse)
The LLM fabricates an invoice number and tries to send $500. PIC requires cryptographic evidence (a SHA-256 hash or Ed25519 signature) from a trusted source. Hallucinations don't produce evidence. Blocked.
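Hash evidence of that kind is cheap to check with the standard library. A sketch of what SHA-256 artifact verification amounts to; the function name is mine, not the library's:

```python
import hashlib

def artifact_matches(path: str, expected_sha256: str) -> bool:
    """Recompute the artifact's SHA-256 and compare it to the attested digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks so large evidence files don't load into memory.
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256.lower()
```

A fabricated invoice number has no artifact behind it, so there is nothing whose digest could ever match: the claim simply cannot be backed.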
Privilege escalation via tool chaining (ASI03)
Agent chains a series of harmless read calls, then attempts a money transfer. PIC gates each tool call independently by its impact class. The reads pass (low impact). The transfer still needs its own trusted evidence. Chaining doesn't help.
Untrusted data laundering (ASI04)
User input or webhook data gets treated as authoritative. PIC's provenance model forces explicit trust labels (`trusted`, `semi_trusted`, `untrusted`), and the verifier enforces the distinction. You can't launder untrusted data into a trusted claim without cryptographic proof.
It Plugs Into Your Existing Stack
PIC is not a framework. It's a verification layer that slots into whatever you're already using:
LangGraph - PICToolNode drops into your graph as a tool executor that verifies proposals before dispatch:
```shell
pip install "pic-standard[langgraph]"
```
MCP (Model Context Protocol) - Wrap any MCP tool with guard_mcp_tool for fail-closed verification with request tracing and DoS limits:
```shell
pip install "pic-standard[mcp]"
```
OpenClaw - A full TypeScript plugin with three hooks: pic-gate (blocks before execution), pic-init (injects PIC awareness at session start), and pic-audit (structured audit logging).
Cordum - A Go-based Pack that adds a job.pic-standard.verify worker topic to Cordum workflows, with three-way routing: proceed, fail, or require_approval for human-in-the-loop on high-impact actions.
There is also a language-agnostic HTTP bridge (pic-cli serve) so you can integrate from Go, TypeScript, Rust, or anything that speaks HTTP.
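From any language, talking to the bridge is just an HTTP POST with a JSON proposal. Here is a Python client sketch; the `/verify` path and the shape of the response are assumptions on my part, not documented `pic-cli serve` endpoints, so check the bridge docs before relying on them:

```python
import json
import urllib.request

def verify_remote(base_url: str, proposal: dict, timeout: float = 5.0) -> dict:
    """POST a proposal to a running bridge and return its parsed JSON decision.

    NOTE: the /verify path and the response fields are illustrative assumptions.
    """
    body = json.dumps(proposal).encode()
    req = urllib.request.Request(
        base_url.rstrip("/") + "/verify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

The explicit `timeout` matters for the same fail-closed reason as everything else here: a hung verifier should surface as an error (and therefore a block), never as a silent allow.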
What's Under the Hood
This is not a weekend project. Some numbers:
- 108 tests across 18 test files (schema, verifier rules, evidence, keyring, integrations, HTTP bridge hardening, pipeline)
- 7 impact classes with formal evidence requirements
- 2 evidence types: SHA-256 hash verification and Ed25519 digital signatures
- Trusted keyring with expiry timestamps and revocation lists
- DoS hardening: 64KB max proposal, 500ms eval budget, 5MB max evidence file, 1MB HTTP body limit, 5-second socket timeout
- Formal spec: RFC-0001 with a 7-threat model and SHA-256 spec fingerprints
- CI: Tested across Python 3.10, 3.11, 3.12
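The keyring behavior in that list reduces to a simple predicate: a key counts as trusted only if it exists, has not expired, and has not been revoked. An illustrative sketch with a data layout of my own invention, not the library's actual format:

```python
import time

def key_is_trusted(keyring: dict, key_id: str, now=None) -> bool:
    """A key is trusted only if present, unexpired, and not revoked."""
    now = time.time() if now is None else now
    entry = keyring.get("keys", {}).get(key_id)
    if entry is None:
        return False                      # unknown key: fail closed
    if key_id in keyring.get("revoked", []):
        return False                      # revocation list wins over expiry
    return now < entry["expires_at"]      # expired keys stop verifying signatures
```

Expiry plus revocation is what lets a compromised signing key be cut off without reissuing every proposal that cited it.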
The whole thing is published as a defensive publication under Apache 2.0, meaning the core concepts (causal taint semantics, action-boundary gating, provenance bridging) are documented and timestamped specifically to prevent anyone from patenting them.
Try It
```shell
pip install pic-standard
pic-cli verify examples/financial_irreversible.json
```
That's one command to verify your first proposal. From there:
- Read the quickstart
- Browse the example proposals (passing and failing)
- Check the RFC if you want the formal spec
If you are building AI agents that touch money, user data, or anything irreversible, this is the layer that was missing.
GitHub: github.com/madeinplutofabio/pic-standard
PyPI: pic-standard
License: Apache 2.0
If this is useful, a star on the repo helps more than you'd think.