Rudson Kiyoshi Souza Carvalho

Posted on Jun 16 • Originally published at github.com

Your AI agent is inventing behavior — and you have no way to prove otherwise

#ai #softwareengineering #architecture #devops

You reviewed the PR. The code looks right. The tests pass.

But that new field in the API response — who asked for that?

You check the history, the requirements, the conversation with the PO. It's nowhere. The AI agent just added it. And if you hadn't looked closely, it would have shipped to production.

This can happen any time an agent generates code, and it will happen more often as pipelines become more autonomous. The problem isn't that the AI makes mistakes. It's that when it adds something nobody asked for, no structural mechanism exists today to stop it from getting through.

The gap nobody closed

There's an entire ecosystem of standards for tracking what happens to software. But each one covers a different piece:

RTM, ReqIF, OSLC — track requirements, but they're documents humans fill in. Nothing stops an agent from generating code that doesn't correspond to any of them.
SLSA, SPDX, CycloneDX — cover the build chain and component inventory. Excellent, but they operate on the compiled artifact, not on behavior.
W3C PROV — models data provenance in general. It doesn't go down to "who asked for this field in the response?"
C2PA — provenance for media and digital content. A different domain.

The layer between an approved requirement and generated behavior is empty. No machine-checkable contract exists today that says: "this behavior has traceable origin, or it's rejected."

The root of the problem

When a human engineer adds a field nobody asked for, there's natural friction: the PR goes to review, someone asks "where did this come from?", the person has to justify it.

With AI agents the cycle is different:

The agent receives a context
The agent generates code
The code is plausible, it compiles, the tests pass
Nobody has an automated way to ask: which requirement was this specific behavior derived from?

The honest answer is that no such mechanism exists. This isn't process fussiness: in regulated environments (finance, healthcare, aerospace), it is real risk.

The core idea behind BPR

The Behavioral Provenance Record (BPR) is a conformance specification that attacks exactly this problem.

The logic is simple: instead of trying to prove the agent didn't invent anything — which is impossible — you turn invention into a structural failure.

Every node in the pipeline (a requirement, a behavioral example, a scenario, a contract, a unit of code, a test) emits a provenance record. That record says: "this artifact came from that upstream node." If it didn't come from anywhere, it's rejected.

Core rule: provenance-or-reject.

Every derived node MUST cite the upstream node it came from.
No resolvable origin → rejected.
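
As a rough illustration of the rule, here is a minimal sketch of a provenance-or-reject check. The field names `id`, `kind`, and `derived_from` are hypothetical and chosen for readability; the actual record shape is defined by the BPR JSON Schema in the repository. The sketch only checks that each cited id exists in the same record set. It does not detect cycles or verify that a chain ends at an authored root.

```python
from typing import Dict, List

# Hypothetical record shape for illustration only; not the BPR schema.
Record = Dict[str, object]


def check_provenance(records: List[Record]) -> List[str]:
    """Return one rejection reason per derived record lacking a resolvable origin."""
    known_ids = {r["id"] for r in records}
    failures = []
    for r in records:
        if r["kind"] == "authored":
            continue  # authored nodes are roots and cite nothing
        upstream = r.get("derived_from") or []
        if not upstream:
            failures.append(f"{r['id']}: derived node cites no upstream node")
            continue
        for ref in upstream:
            if ref not in known_ids:
                failures.append(f"{r['id']}: upstream '{ref}' does not resolve")
    return failures


records = [
    {"id": "need-1", "kind": "authored"},
    {"id": "scenario-1", "kind": "derived", "derived_from": ["need-1"]},
    {"id": "code-1", "kind": "derived", "derived_from": ["scenario-1"]},
    {"id": "code-2", "kind": "derived", "derived_from": []},          # cites nothing
    {"id": "test-1", "kind": "derived", "derived_from": ["code-9"]},  # dangling reference
]

for reason in check_provenance(records):
    print("REJECT", reason)
```

In this toy input, `code-2` and `test-1` are rejected and the other three records pass.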

Where it enters the SDLC

It's not a document you fill in afterward. BPR enters the flow at the moment the artifact is produced:

The human authors the need and the behavioral examples. These are the trust anchors — authored nodes, the roots of the graph.
The agent derives the scenario, the contract, the code, and the test — emitting a derived record at each step, citing what came before.
A conformance gate (a CI stage, a PR check) reads the records and passes or fails the change before merge.

The result is a traceability graph. The classic RTM you know is just a 2D projection of this graph — not the actual object.
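
To make the projection idea concrete, here is a minimal sketch that flattens a small provenance graph into RTM-style (need, test) rows. The fields `id`, `type`, and `derived_from` are assumptions made for this example, not the BPR schema.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical records: each derived node lists the upstream ids it cites.
records = [
    {"id": "need-1", "type": "need", "derived_from": []},
    {"id": "scenario-1", "type": "scenario", "derived_from": ["need-1"]},
    {"id": "code-1", "type": "code", "derived_from": ["scenario-1"]},
    {"id": "test-1", "type": "test", "derived_from": ["scenario-1", "code-1"]},
]


def ancestors(node_id: str, by_id: Dict[str, dict]) -> Set[str]:
    """Collect every upstream id reachable from node_id."""
    seen: Set[str] = set()
    stack = list(by_id[node_id]["derived_from"])
    while stack:
        current = stack.pop()
        if current in seen or current not in by_id:
            continue
        seen.add(current)
        stack.extend(by_id[current]["derived_from"])
    return seen


def project_rtm(records: List[dict]) -> List[Tuple[str, str]]:
    """Flatten the graph into (need, test) rows, the classic RTM view."""
    by_id = {r["id"]: r for r in records}
    rows = []
    for r in records:
        if r["type"] != "test":
            continue
        for up in sorted(ancestors(r["id"], by_id)):
            if by_id[up]["type"] == "need":
                rows.append((up, r["id"]))
    return rows


print(project_rtm(records))  # [('need-1', 'test-1')]
```

The matrix keeps only the two endpoints. The intermediate scenario and code nodes, which explain why the row exists, live only in the graph.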

The hardest level: anti-invention

The interesting part is what happens at the highest level — when you want to know whether the agent added behavior nobody asked for.

BPR doesn't try to prove a negative. Instead, each node can declare its claims — the behavioral assertions it makes. And every claim needs to cite an upstream claim.

Every claim has a type defined by observability:

behavioral — changes the observable functional contract (response field, status code). Must be traced.
operational — changes observable non-functional behavior (latency, retry, logging, metrics, security). Must be traced.
implementation — internal, no observable footprint (data structure, in-process cache). Exempt — but only if non-observability is attested. Without attestation, it's treated as behavioral. This closes the loophole of "laundering" invention by labeling it an internal detail.

A behavioral claim with no upstream ancestry is invention — now a localized, named, attributable failure. Not a hallucination hidden in the middle of the code.
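
The classification rules above can be sketched as follows. This is an illustration under assumptions: the fields `type`, `cites`, and `non_observable_attested` are hypothetical names, and requiring every cited id to resolve is a choice made for this sketch rather than something taken from the specification.

```python
from typing import Dict, List, Set

TRACED_TYPES = {"behavioral", "operational"}


def effective_type(claim: Dict) -> str:
    """An implementation claim without attestation is treated as behavioral."""
    if claim["type"] == "implementation" and not claim.get("non_observable_attested", False):
        return "behavioral"
    return claim["type"]


def find_inventions(claims: List[Dict], upstream_claim_ids: Set[str]) -> List[str]:
    """Return ids of claims that must be traced but lack resolvable upstream claims."""
    inventions = []
    for claim in claims:
        if effective_type(claim) not in TRACED_TYPES:
            continue  # attested implementation detail: exempt
        cites = claim.get("cites", [])
        if not cites or any(c not in upstream_claim_ids for c in cites):
            inventions.append(claim["id"])
    return inventions


upstream = {"ex-1.c1"}
claims = [
    {"id": "code-1.c1", "type": "behavioral", "cites": ["ex-1.c1"]},              # traced
    {"id": "code-1.c2", "type": "behavioral", "cites": []},                       # extra response field
    {"id": "code-1.c3", "type": "implementation", "non_observable_attested": True},  # exempt
    {"id": "code-1.c4", "type": "implementation"},                                # laundering attempt
]

print(find_inventions(claims, upstream))  # ['code-1.c2', 'code-1.c4']
```

The unattested `implementation` claim is flagged exactly like the uncited behavioral one, which is the laundering loophole being closed.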

Incremental adoption — not all or nothing

This is the point I consider most important in the design: you don't have to adopt everything at once.

L1 and L2 already deliver real value today, without depending on anything sophisticated:

L1: every scenario has a test. Every test verifies a scenario. Simple, deterministic, and most teams still don't have this enforced.
L2: every approved need has downstream coverage through to a test. (Deferred, rejected, or informational requirements are explicitly exempt — the status field handles that.)
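
Both checks are small graph queries. The sketch below shows one way they could look, assuming hypothetical fields `id`, `type`, `status`, and `derived_from`; the reference validator in the repository is the authoritative behavior.

```python
from typing import Dict, List, Set

records = [
    {"id": "need-1", "type": "need", "status": "approved", "derived_from": []},
    {"id": "need-2", "type": "need", "status": "deferred", "derived_from": []},
    {"id": "need-3", "type": "need", "status": "approved", "derived_from": []},
    {"id": "scenario-1", "type": "scenario", "derived_from": ["need-1"]},
    {"id": "scenario-2", "type": "scenario", "derived_from": ["need-3"]},
    {"id": "test-1", "type": "test", "derived_from": ["scenario-1"]},
]


def check_l1(records: List[dict]) -> List[str]:
    """Every test verifies a scenario; every scenario has a test."""
    by_id = {r["id"]: r for r in records}
    covered: Set[str] = set()
    errors = []
    for r in records:
        if r["type"] != "test":
            continue
        scenarios = [u for u in r["derived_from"]
                     if by_id.get(u, {}).get("type") == "scenario"]
        if not scenarios:
            errors.append(f"{r['id']}: test verifies no scenario")
        covered.update(scenarios)
    for r in records:
        if r["type"] == "scenario" and r["id"] not in covered:
            errors.append(f"{r['id']}: scenario has no test")
    return errors


def check_l2(records: List[dict]) -> List[str]:
    """Every approved need reaches at least one test downstream."""
    by_id = {r["id"]: r for r in records}
    children: Dict[str, List[str]] = {}
    for r in records:
        for up in r["derived_from"]:
            children.setdefault(up, []).append(r["id"])

    def reaches_test(node_id: str, seen: Set[str]) -> bool:
        if by_id[node_id]["type"] == "test":
            return True
        seen.add(node_id)
        return any(reaches_test(c, seen)
                   for c in children.get(node_id, []) if c not in seen)

    return [f"{r['id']}: approved need has no downstream test"
            for r in records
            if r["type"] == "need" and r.get("status") == "approved"
            and not reaches_test(r["id"], set())]


print(check_l1(records))  # ['scenario-2: scenario has no test']
print(check_l2(records))  # ['need-3: approved need has no downstream test']
```

The deferred `need-2` has no downstream nodes at all and is still not reported, because only approved needs are checked.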

L3 is the frontier, where behavior the agent invented becomes a detectable failure. It is harder and depends on external checkers (human, NLI model, LLM-judge), but it is fully specified and falsifiable.

What BPR standardizes — and what it doesn't

This distinction is what separates a standard from a product:

What is the standard: the record schema, the serialization, the invariants, the conformance levels, and claim semantics.

What is not the standard (replaceable): the validator, the checkers that produce verdicts, how the agent emits records in the pipeline, where the graph is stored, and the policy for who can attest.

You implement your own validator in Go, TypeScript, whatever. If two independent implementations reach the same conformance verdict on the same examples, that's a standard — not a private API.

Honesty about the limits

Conformance level is a claim about structure, not correctness.

"L3b conformant" means every behavioral claim is cited and attested as supported. It doesn't mean the attestation is correct. It's not proof that nothing was invented.

BPR guarantees attributability: invented behavior can't be committed silently. A weak checker can still issue a wrong "supported" verdict. The quality of the checker is the responsibility of whoever chooses it, and is out of scope for a purpose-built standard.

What BPR doesn't solve on its own: a malicious agent forging records, an incompetent checker, a bad original requirement, absence of organizational policy, tampering with files after the fact. These are solved through composition — record signing, attestation policy, human process — not by expanding the scope of the standard.
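
As one example of that kind of composition, the sketch below seals a record with an HMAC so that later edits to the file are detectable. This is not part of BPR, and an HMAC is a shared-key stand-in used here for brevity; a real deployment would more likely use asymmetric signatures. The record fields and the key are hypothetical.

```python
import hashlib
import hmac
import json


def seal(record: dict, key: bytes) -> str:
    """Return a hex MAC over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(record: dict, key: bytes, tag: str) -> bool:
    """Check the record against a previously issued tag in constant time."""
    return hmac.compare_digest(seal(record, key), tag)


key = b"demo-key-not-for-production"  # hypothetical key for the example
record = {"id": "code-2", "kind": "derived", "derived_from": []}
tag = seal(record, key)

# Someone edits the record after the fact to make it look traced.
tampered = dict(record, derived_from=["need-1"])

print(verify(record, key, tag))    # True
print(verify(tampered, key, tag))  # False
```

Sorting keys and fixing separators before hashing keeps the tag stable across serializers that emit the same data in a different key order.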

Current state and what's missing

BPR is published as a preprint of an initial specification:

✅ JSON Schema (draft 2020-12)
✅ Reference validator L0–L3c in Python (runs and rejects invalid input with a specific reason)
✅ Conformant and deliberately broken examples (including laundering attempts)
🔲 Versioning + staleness (v0.5) — when a requirement changes, which downstream nodes become stale?
🔲 Anti-self-attestation at L3c — issuer ≠ verifier as a specifiable property
🔲 Normative mapping to W3C PROV-O

The line between a specification and a standard is an independent implementation. Today there's one — the reference one. Publishing this is the invitation for a second.

How to test it right now

git clone https://github.com/RudsonCarvalho/bpr.git
cd bpr
pip install jsonschema

# base example — should pass at L2
python validator/validate.py examples/profile-retrieval.records.json --level L2

# example with claims — should pass at L3b
python validator/validate.py examples/profile-retrieval.l3.records.json --level L3b

# example with invention and laundering — should FAIL
python validator/validate.py examples/profile-retrieval.l3-broken.json --level L3b

Conclusion

We don't need to prove the AI didn't invent anything. We need any invention to be a named, localized, traceable failure — not a hallucination that shipped to production because the tests passed.

BPR proposes exactly that: a minimal, neutral, verifiable contract. The immediate value is in L1/L2. The long-term promise is in L3.

If you work with AI pipelines generating code: clone it, run the examples, try to break L3, send adversarial cases. That's what turns a specification into infrastructure.

Specification + schema + validator: 👉 github.com/RudsonCarvalho/bpr

Preprint with DOI: 👉 doi.org/10.5281/zenodo.20710512