Building an Offline Verifier for AI Decision Evidence
Most audit-log systems have a quiet dependency baked into them:
to believe the log, you have to believe the system that wrote it.
A cloud provider sealing its own logs is still the cloud provider vouching for itself. A SaaS platform exporting its own audit trail is still asking the reviewer to trust the platform that produced the evidence.
That is not always wrong. But it is not independent.
If the only way to check the evidence is to trust the party that generated it, then the evidence has a structural ceiling.
I have been building AURORA, a system that seals AI decisions into signed, timestamped evidence records.
This week I shipped the part that makes the word "independent" more real:
an offline verifier.
A reviewer can now take an AURORA evidence bundle, drop the .zip into a browser, and verify the bundle against the cryptographic material inside it.
No OpenSSL.
No shell.
No account.
No trust in me required.
The verifier checks the manifest, file-hash inventory, RSA signature, record hash chain, and RFC 3161 timestamp evidence. Each result is shown with its own scope boundary.
Here is what I learned building it.
1. The verifier has to be provider-neutral
The first important design decision was not about UI.
It was about what the verifier should actually verify.
A naive verifier checks a vendor:
Is this a valid GlobalSign timestamp?
or:
Is this a valid FreeTSA token?
That sounds reasonable at first. But it couples your verification logic to a specific provider.
The moment you change timestamp providers, add a second provider, or let enterprise customers bring their own TSA, your verifier becomes a provider adapter instead of an evidence verifier.
That is not the shape I wanted.
So the verifier checks the bundle contract, not the vendor.
It checks:
- the canonical record bytes hash to the recorded data_hash
- the RSA signature verifies over the canonical JSON with the bundled public key
- the public key fingerprint matches the published fingerprint
- the record hash chain is self-consistent where the required material exists
- the RFC 3161 timestamp token imprint binds to the timestamped message
- the bundle file inventory matches the recorded SHA-256 hashes
None of those checks require the verifier to care which TSA vendor issued the timestamp.
A bundle timestamped by a free TSA and a bundle timestamped by a qualified TSA should travel through the same verification model.
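As a rough illustration, the two hash-based checks from the list above can be written without any reference to a timestamp provider. This is a minimal sketch, not AURORA's actual code: the function names are hypothetical, and it assumes data_hash is a hex-encoded SHA-256 digest and that the inventory maps file names to hex SHA-256 digests.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def check_record_hash(canonical_record: bytes, recorded_data_hash: str) -> bool:
    """True if the canonical record bytes hash to the recorded data_hash.

    Assumption: data_hash is a hex SHA-256 digest (case-insensitive).
    """
    return sha256_hex(canonical_record) == recorded_data_hash.strip().lower()


def check_file_inventory(
    files: dict[str, bytes], inventory: dict[str, str]
) -> dict[str, bool]:
    """Compare every file listed in the inventory against its recorded hash.

    A file that is listed but absent from the bundle counts as a failure.
    """
    results: dict[str, bool] = {}
    for name, expected in inventory.items():
        content = files.get(name)
        results[name] = (
            content is not None
            and sha256_hex(content) == expected.strip().lower()
        )
    return results
```

Nothing in either function knows or asks which TSA was involved, which is the point: swapping the provider would leave these checks untouched.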
The trust layer can change.
The verification contract should not.
That separation matters.
If the verifier is tied to a single provider, then the evidence format inherits the lifecycle of that provider relationship. If the verifier is tied to the bundle contract, the evidence format can outlive provider changes.
That is the property I wanted first.
2. One core, two consumers, no hidden app dependency
The verifier has two consumers.
The first is a standalone CLI. That is for auditors, engineers, or reviewers who want to run verification outside AURORA entirely.
The second is the backend /verify/bundle endpoint. That is for the browser-based upload flow.
The obvious move would be to put the verifier core in one shared package and import it everywhere.
I did not do that.
The standalone verifier needs to stay dependency-minimal. An external reviewer should not need to install a FastAPI app, database client, auth layer, storage SDK, or internal AURORA package just to check a signature.
So the core is a pure library:
bytes in.
structured result out.
No database.
No service imports.
No app.crypto.
No TSA client imports.
No HTTP assumptions.
The CLI reads a path, loads the bundle, calls the core, and renders text or JSON.
The backend receives an uploaded .zip, calls the same core, and returns the structured result.
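A minimal sketch of that shape, using only the standard library. Everything named here is hypothetical: the manifest.json file name, the "files" key, and the function names are assumptions for illustration, and only the file-inventory check is shown. Handling of unreadable input is left out to keep the sketch short.

```python
import hashlib
import io
import json
import zipfile
from dataclasses import asdict, dataclass, field


@dataclass
class VerificationResult:
    ok: bool
    checks: dict = field(default_factory=dict)  # check name -> "pass" / "fail"


def verify_bundle(bundle_bytes: bytes) -> VerificationResult:
    """Pure core: bytes in, structured result out. No database, no HTTP."""
    checks = {}
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        # Assumed layout: manifest.json holds {"files": {name: sha256_hex}}.
        manifest = json.loads(archive.read("manifest.json"))
        for name, expected in manifest["files"].items():
            try:
                actual = hashlib.sha256(archive.read(name)).hexdigest()
            except KeyError:  # listed in the manifest but missing from the zip
                checks[name] = "fail"
                continue
            checks[name] = "pass" if actual == expected else "fail"
    ok = bool(checks) and all(status == "pass" for status in checks.values())
    return VerificationResult(ok=ok, checks=checks)


def cli_main(argv: list) -> int:
    """CLI consumer: read a path, call the core, render JSON."""
    with open(argv[1], "rb") as handle:
        result = verify_bundle(handle.read())
    print(json.dumps(asdict(result), indent=2))
    return 0 if result.ok else 1


def handle_upload(uploaded: bytes) -> dict:
    """Backend consumer: same core, result returned as a plain dict."""
    return asdict(verify_bundle(uploaded))
```

Both consumers are thin wrappers. Neither adds verification logic of its own, so a result from the CLI and a result from the endpoint should agree for the same bytes.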
To keep the two copies aligned, the verifier core is vendored into the backend by a sync script. The script stamps a SHA-256 of the source copy, and CI can run a --check mode to fail the build if the backend copy drifts from the standalone verifier.
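The sync-and-check idea could look roughly like the sketch below. It is not the actual AURORA script: the single-file core, the positional arguments, and the stamp-file format are all assumptions made for illustration. Only the --check flag comes from the description above.

```python
import argparse
import hashlib
import shutil
import sys
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hex SHA-256 of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync(source: Path, vendored: Path, stamp: Path) -> None:
    """Copy the standalone core into the backend tree and stamp its SHA-256."""
    vendored.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, vendored)
    stamp.write_text(file_digest(source) + "\n")


def check(source: Path, vendored: Path, stamp: Path) -> bool:
    """True only if the source, the vendored copy, and the stamp all agree."""
    if not (vendored.exists() and stamp.exists()):
        return False
    expected = stamp.read_text().strip()
    return file_digest(source) == expected == file_digest(vendored)


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Vendor the verifier core.")
    parser.add_argument("source", type=Path)
    parser.add_argument("vendored", type=Path)
    parser.add_argument("stamp", type=Path)
    parser.add_argument("--check", action="store_true",
                        help="fail instead of copying if the copies differ")
    args = parser.parse_args(argv)
    if args.check:
        if check(args.source, args.vendored, args.stamp):
            return 0
        print("vendored verifier core has drifted", file=sys.stderr)
        return 1
    sync(args.source, args.vendored, args.stamp)
    return 0
```

In CI, the script would be run with --check and its non-zero exit code would fail the build. Locally, running it without the flag refreshes the vendored copy and the stamp.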
This feels redundant until you remember the product goal.
The whole point is not to ask people to trust AURORA more than necessary.
A verifier that only works inside AURORA’s app tree is not an independent verifier. It is just another internal endpoint.
The standalone path has to remain real.
3. The bug that taught me the most: normalize, do not overfit
The first regression caught a very small mistake with very large implications.
I added a manifest check for the bundle format.
The verifier had a whitelist of accepted format strings.
My test fixture passed.
A real production bundle failed.
The reason was not cryptographic. It was not a broken signature. It was not a malformed bundle.
It was casing.
Production used:
AURORA_Verification_Bundle
The verifier expected a slightly different casing and separator convention.
The bundle was valid.
My equality check was too brittle.
The fix was to normalize before comparing:
- trim
- lowercase
- normalize separators
- then compare against the canonical format identity
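Those four steps fit in a few lines. This is a sketch under assumptions: the canonical identity string below is made up for illustration, as is the set of separators being folded together (whitespace, underscores, hyphens).

```python
import re

# Assumed canonical identity; the real value may differ.
CANONICAL_FORMAT = "aurora_verification_bundle"


def normalize_format(value: str) -> str:
    """Trim, lowercase, and collapse separator runs into one underscore."""
    trimmed = value.strip().lower()
    return re.sub(r"[\s_\-]+", "_", trimmed)


def is_accepted_format(value: str) -> bool:
    """Compare the normalized label against the canonical format identity."""
    return normalize_format(value) == CANONICAL_FORMAT
```

With this in place, "AURORA_Verification_Bundle" and " aurora-verification-bundle " are both accepted, while an unrelated label such as "other_bundle" is still rejected. The normalization widens what counts as the same identifier without widening which identifiers are allowed.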
The lesson is simple:
when validating identifiers you control on both sides, normalize first.
Exact string whitelists are useful at protocol boundaries. But when your own system already has historical or production variants, a strict literal check can turn valid evidence into a false failure.
False negatives in a verifier are dangerous.
Imagine an auditor uploads a legitimate evidence bundle and the tool says FAILED because one internal label used a different underscore pattern.
That is not integrity.
That is your validator mistaking formatting for evidence.
4. Failure modes need to be honest
This is the part I care about most.
A verifier has power because it produces a verdict.
That also makes it dangerous.
If it hides ambiguity, rounds warnings into passes, or makes trust claims it has not actually proven, it becomes another system the reviewer has to trust blindly.
That defeats the purpose.
So the verifier separates required checks, warnings, unavailable checks, and non-applicable checks.
For example, AURORA currently has a dual-path timestamp condition in some bundles.
The timestamp hash bound at sealing time and the direct hash of the bundled timestamp token can differ because the bundled token may come from a separate TSA call.
That looks suspicious if you only compare bytes mechanically.
But it is not necessarily a tampering signal.
So the verifier reports it as a warning, with a plain-language explanation. The integrity verdict can still pass if the required signature, hash, manifest, and record-chain checks pass.
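One way to keep those categories from blurring into each other is to make them explicit in the result type. The sketch below is illustrative only: the status names, the Check structure, and the "VERIFIED" and "INCOMPLETE" verdict labels are assumptions. Treating a required check that could not run as INCOMPLETE rather than as a pass is a choice made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    PASS = "pass"
    FAIL = "fail"
    WARNING = "warning"
    UNAVAILABLE = "unavailable"
    NOT_APPLICABLE = "not_applicable"


@dataclass(frozen=True)
class Check:
    name: str
    status: Status
    required: bool
    detail: str = ""  # plain-language explanation shown to the reviewer


def integrity_verdict(checks: list) -> str:
    """Derive the verdict from required checks only.

    Warnings and unavailable optional checks are reported alongside the
    verdict but never change it.
    """
    required = [c for c in checks if c.required]
    if any(c.status is Status.FAIL for c in required):
        return "FAILED"
    if not required or any(c.status is not Status.PASS for c in required):
        return "INCOMPLETE"  # never round a missing required check up to a pass
    return "VERIFIED"
```

Under this model, a dual-path timestamp mismatch would be a non-required check with status WARNING and a detail string, so it appears in the output without flipping the verdict.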
Same with timestamp trust status.
If a TSA is non-qualified, the result says non_qualified.
If a qualified trust-list validation has not been performed, the verifier does not pretend it has.
That boundary is part of the result.
Not hidden.
Not dressed up.
Not silently promoted.
The principle is:
say exactly what was checked, exactly what passed, exactly what failed, and exactly what was not proven.
That may sound obvious.
It is not how a lot of software behaves.
A verifier that hides its own limits is just another trust demand.
5. HTTP semantics matter more than they look
The browser verifier exposed another small but important distinction:
a malformed upload and a failed verification are not the same thing.
If someone uploads a random file, or a .zip without a manifest, the server cannot perform bundle verification at all.
That is a structural input problem.
It should return something like:
422 Unprocessable Entity
But if someone uploads a well-formed AURORA bundle and the RSA signature fails, that is different.
The verifier successfully ran.
The answer is just no.
That should return:
200 OK
{
  "ok": false,
  "result": "FAILED"
}
A failed cryptographic check is not a server error.
It is a valid verification result.
Collapsing both cases into a generic 400 loses the distinction between:
I could not read this evidence.
and:
I read this evidence, and it does not verify.
For auditors, that distinction matters.
For debugging, it matters.
For trust, it matters.
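A framework-neutral sketch of that mapping, returning a status code and a body instead of using any particular web framework's response type. The manifest.json name, the MalformedBundle exception, the respond function, and the "PASSED" label are assumptions for illustration. The verify argument stands in for the real verifier core.

```python
import io
import json
import zipfile


class MalformedBundle(ValueError):
    """The upload could not be read as a bundle, so verification never ran."""


def load_manifest(upload: bytes) -> dict:
    """Open the upload as a zip and parse its manifest, or raise MalformedBundle."""
    try:
        with zipfile.ZipFile(io.BytesIO(upload)) as archive:
            manifest = json.loads(archive.read("manifest.json"))
    except (zipfile.BadZipFile, KeyError, ValueError) as exc:
        # Not a zip, no manifest entry, or manifest is not valid JSON.
        raise MalformedBundle(f"unreadable bundle: {exc}") from exc
    if not isinstance(manifest, dict):
        raise MalformedBundle("manifest is not a JSON object")
    return manifest


def respond(upload: bytes, verify) -> tuple:
    """Map the outcome to (status_code, body).

    422: the input was structurally unusable.
    200: verification ran; the body says whether it passed.
    """
    try:
        manifest = load_manifest(upload)
    except MalformedBundle as exc:
        return 422, {"detail": str(exc)}
    passed = bool(verify(manifest))
    return 200, {"ok": passed, "result": "PASSED" if passed else "FAILED"}
```

The key property is that a failing verify callback can never produce a 4xx, and an unreadable upload can never produce a body with "ok" in it.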
6. The browser is convenience. The CLI is independence.
The new browser verifier is useful because it removes friction.
A reviewer does not need OpenSSL installed. They do not need to know which command verifies a signature or how to parse an RFC 3161 token.
They can upload the bundle and see the axes:
- manifest
- file inventory
- canonical record hash
- signature
- record hash chain
- timestamp evidence
That is the accessible path.
But it is not the only path.
The CLI remains important because it is the fully independent path.
The browser flow still asks AURORA’s server to run the verifier core. That is useful, but not maximally independent.
The CLI lets a reviewer run the same verification logic outside AURORA’s infrastructure.
That is why the verification guide now has both:
- browser-based bundle verification for no-CLI review
- OpenSSL / CLI verification for fully independent offline review
The browser lowers the barrier.
The CLI preserves the trust boundary.
Both matter.
7. What the verifier does not do
The verifier does not tell you the AI decision was correct.
It does not tell you the decision was lawful.
It does not tell you the decision was fair.
It does not tell you a regulator approved it.
It does not tell you a court will admit it.
That is not its job.
Its job is narrower:
- is this evidence package intact?
- do the hashes match?
- does the signature verify?
- does the timestamp evidence bind to the expected message?
- can an external reviewer reproduce the integrity checks?
- does the result state its own limits clearly?
That narrowness is intentional.
AURORA is not trying to be the judge.
It is trying to be the custody layer.
Where it lands
The result is now live:
aurora-audit.com/verify-bundle
An external reviewer with no account, no CLI, and no reason to trust me can drag an AURORA evidence bundle into the browser and watch the verifier check the package.
The core is still beta.
The rough edges still show.
Qualified timestamp trust-list validation is a separate trust layer and is still being hardened. The verifier does not pretend otherwise.
But the important part is real now:
the evidence can leave the original workflow and still be checked.
That is the line I wanted to cross first.
If you have built independently verifiable evidence systems, I would be interested in how you handled the trust-boundary question:
where did you draw the line between “verify” and “trust me”?
