Healthcare and legal customers kept asking the same question about our document extraction API:
"How do we prove in court that the extracted JSON wasn't tampered with after the fact?"
Good question. Here's the answer I shipped.
The architecture in 60 seconds
Every extraction record is:
- HMAC-SHA256 signed at write time with a per-tenant secret
- inserted into a daily Merkle tree (one tree per UTC day)
- covered by that day's Merkle root, which is published so clients can check it at any time
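Here is a minimal sketch of the per-tenant signing step using Node's built-in `crypto` module. `signRecord` and `verifyRecordSignature` are hypothetical names. The post's library exports `hmacSign`, but its exact signature isn't shown. The sketch also assumes the record has already been serialized to a canonical string.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: HMAC-SHA256 over a record's canonical serialization, keyed by the tenant's secret.
function signRecord(tenantSecret: string, canonicalRecord: string): string {
  return createHmac("sha256", tenantSecret).update(canonicalRecord, "utf8").digest("hex");
}

// Recompute the HMAC and compare it to the stored signature in constant time.
function verifyRecordSignature(tenantSecret: string, canonicalRecord: string, signatureHex: string): boolean {
  const expected = Buffer.from(signRecord(tenantSecret, canonicalRecord), "hex");
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on a length mismatch, so reject wrong-length input first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

The explicit length check matters. Node's `timingSafeEqual` throws when the two buffers differ in length, so a truncated or malformed signature should simply return false.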
When a customer needs to prove integrity, they request a Merkle proof for their specific record ID. The proof is a sequence of sibling hashes that lets anyone re-derive the tree root from the leaf. If the recomputed root matches the published root → the record is unaltered. If anything changed in the record after the daily tree was sealed, the proof fails.
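To make the proof mechanics concrete, here is a minimal, self-contained sketch. It builds a tree, produces a proof for one leaf, and re-derives the root from that proof. It is not the post's `lib/audit-merkle.ts`, and several choices are assumptions:
- `merkleRoot`, `makeProof`, and `checkProof` are hypothetical names.
- Leaves are plain strings.
- Leaf and interior hashes use the 0x00/0x01 domain-separation prefixes from RFC 6962 (Certificate Transparency).
- A lone node at the end of a level is promoted unchanged rather than duplicated.

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: Buffer): Buffer => createHash("sha256").update(data).digest();
// RFC 6962-style prefixes keep a leaf hash from ever colliding with an interior-node hash.
const hashLeaf = (leaf: string): Buffer =>
  sha256(Buffer.concat([Buffer.from([0x00]), Buffer.from(leaf, "utf8")]));
const hashNode = (left: Buffer, right: Buffer): Buffer =>
  sha256(Buffer.concat([Buffer.from([0x01]), left, right]));

// One proof step: the sibling hash plus which side of the running hash it sits on.
interface ProofStep {
  sibling: string; // hex
  siblingOnLeft: boolean;
}

// Build every level bottom-up; a lone last node is promoted to the next level unchanged.
function buildLevels(leaves: string[]): Buffer[][] {
  if (leaves.length === 0) throw new Error("cannot build a tree with no leaves");
  const levels: Buffer[][] = [leaves.map(hashLeaf)];
  while (levels[levels.length - 1].length > 1) {
    const prev = levels[levels.length - 1];
    const next: Buffer[] = [];
    for (let i = 0; i < prev.length; i += 2) {
      next.push(i + 1 < prev.length ? hashNode(prev[i], prev[i + 1]) : prev[i]);
    }
    levels.push(next);
  }
  return levels;
}

function merkleRoot(leaves: string[]): string {
  const levels = buildLevels(leaves);
  return levels[levels.length - 1][0].toString("hex");
}

// Collect the sibling at each level on the path from the leaf up to the root.
function makeProof(leaves: string[], index: number): ProofStep[] {
  if (index < 0 || index >= leaves.length) throw new RangeError("leaf index out of range");
  const levels = buildLevels(leaves);
  const proof: ProofStep[] = [];
  let i = index;
  for (let level = 0; level < levels.length - 1; level++) {
    const nodes = levels[level];
    const siblingIndex = i % 2 === 0 ? i + 1 : i - 1;
    if (siblingIndex < nodes.length) {
      proof.push({ sibling: nodes[siblingIndex].toString("hex"), siblingOnLeft: i % 2 === 1 });
    } // otherwise this node was promoted and the level contributes no step
    i = Math.floor(i / 2);
  }
  return proof;
}

// Re-derive the root from the leaf and its siblings, then compare with the published root.
function checkProof(leaf: string, proof: ProofStep[], rootHex: string): boolean {
  let acc = hashLeaf(leaf);
  for (const step of proof) {
    const sibling = Buffer.from(step.sibling, "hex");
    acc = step.siblingOnLeft ? hashNode(sibling, acc) : hashNode(acc, sibling);
  }
  return acc.toString("hex") === rootHex;
}
```

Each step records the sibling's side because hashing (left, right) is order-sensitive. Changing any byte of the leaf changes every hash on its path, so the recomputed root no longer matches.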
Pure Node crypto. Zero external services. ~150 LOC.
The library API
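```typescript
import { createHash } from "node:crypto";
import { buildMerkleTree, generateProof, verifyProof, hmacSign } from "./audit-merkle";

// Hex SHA-256 digest helper for the payload hashes.
const sha256 = (data: string | Buffer): string => createHash("sha256").update(data).digest("hex");

// rawPdfBytes = the uploaded document's bytes; extractedData = the extraction result.
const records = [
  { id: "ext-1", customer_id: "acme", timestamp: "2026-05-08T12:00:00Z",
    raw_payload_hash: sha256(rawPdfBytes),
    extracted_payload_hash: sha256(JSON.stringify(extractedData)) },
  // ... more records for the same UTC day
];

const tree = buildMerkleTree(records);
console.log("Daily root:", tree.hash); // publish this
const proof = generateProof(records, /* index of leaf */ 0);
const valid = verifyProof(proof, tree.hash); // true
```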
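One detail worth flagging in the example is how `JSON.stringify(extractedData)` orders keys. It serializes them in the object's own property order, which for ordinary string keys is insertion order. So two extractions with identical content but different key order hash differently. Re-hashing a re-serialized copy could then fail verification even though nothing was altered.

Below is a minimal sketch of a sorted-key serializer, assuming the extraction output is plain JSON data. `canonicalJson` and `sha256Hex` are hypothetical names, not part of `audit-merkle.ts`. RFC 8785 (JSON Canonicalization Scheme) specifies this more rigorously, including number formatting.

```typescript
import { createHash } from "node:crypto";

// Plain JSON data (no undefined, functions, Dates, or class instances).
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted at every depth, so key order never affects the output.
function canonicalJson(value: Json): string {
  if (Array.isArray(value)) {
    // Array order is meaningful, so elements keep their positions.
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj: { [key: string]: Json } = value;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  // Primitives: reuse JSON.stringify for string escaping and number formatting.
  return JSON.stringify(value);
}

// Hex SHA-256 of a UTF-8 string.
function sha256Hex(data: string): string {
  return createHash("sha256").update(data, "utf8").digest("hex");
}
```

Swapping it in would mean hashing `canonicalJson(extractedData)` instead of `JSON.stringify(extractedData)`. Whether that is needed depends on whether the stored JSON is ever re-serialized before someone re-checks its hash.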
What I tested
Real records (5 extractions, BBC test data). Result:
- Build root: a8ad1f8ac2f60a1e...
- Generate proof for leaf 3: VALID (3 siblings)
- Tamper test: changed extracted_payload_hash of leaf 3 → recomputed root differs → proof verification fails
- HMAC verify with wrong key: returns false (timing-safe)
Tamper-evident, exactly as advertised.
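The "3 siblings" figure follows from tree depth. A proof carries at most one sibling per level, so a tree over n leaves needs at most ⌈log₂ n⌉ sibling hashes. In layouts that promote a lone odd node, a promoted leaf gets a shorter proof. Below is a quick sketch of how that bound grows, assuming 32-byte SHA-256 hashes; `maxProofSteps` is a hypothetical helper.

```typescript
// Upper bound on sibling hashes in a proof for a tree with n leaves: ceil(log2 n).
function maxProofSteps(n: number): number {
  if (!Number.isInteger(n) || n < 1) throw new RangeError("n must be a positive integer");
  let steps = 0;
  // Count how many doublings it takes for a full binary tree to cover n leaves.
  for (let width = 1; width < n; width *= 2) steps++;
  return steps;
}

const SHA256_BYTES = 32;
for (const n of [5, 10000, 1000000]) {
  const steps = maxProofSteps(n);
  console.log(`${n} leaves: <= ${steps} siblings, <= ${steps * SHA256_BYTES} bytes of hashes`);
}
// 5 leaves: <= 3 siblings, <= 96 bytes of hashes
// 10000 leaves: <= 14 siblings, <= 448 bytes of hashes
// 1000000 leaves: <= 20 siblings, <= 640 bytes of hashes
```

Building the tree is similarly cheap in a promote-odd-node layout: n leaf hashes plus n - 1 interior hashes. At ~10k records/day that is roughly 20k SHA-256 calls.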
Why this matters
For enterprise/legal/healthcare buyers, "trust us, we don't alter your data" is not enough. Cryptographic proof closes deals.
For everyone else: it costs nothing extra to add. Our Vercel + Upstash free tiers already cover it. The only addition is one cron job that builds the daily tree and writes the root somewhere queryable.
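The cron job itself can stay small. Below is a minimal sketch of the sealing step, built on these assumptions:
- the record shape comes from the library example;
- each leaf is encoded as a JSON array with a fixed field order;
- records are sorted by `id`, so the same day always rebuilds the same tree.

`sealDay`, `utcDay`, `leafBytes`, and `merkleRootHex` are hypothetical names. The final write to a queryable store is left out rather than guessing at a specific client API.

```typescript
import { createHash } from "node:crypto";

// Record shape mirroring the fields in the library example.
interface ExtractionRecord {
  id: string;
  customer_id: string;
  timestamp: string; // ISO 8601, any offset
  raw_payload_hash: string;
  extracted_payload_hash: string;
}

const sha256 = (data: Buffer): Buffer => createHash("sha256").update(data).digest();

// UTC calendar day of a timestamp, e.g. "2026-05-08"; offsets are normalized to UTC first.
function utcDay(timestamp: string): string {
  return new Date(timestamp).toISOString().slice(0, 10);
}

// Assumed leaf encoding: a JSON array with a fixed field order (no key-order ambiguity).
function leafBytes(r: ExtractionRecord): Buffer {
  const fields = [r.id, r.customer_id, r.timestamp, r.raw_payload_hash, r.extracted_payload_hash];
  return Buffer.from(JSON.stringify(fields), "utf8");
}

// Merkle root with RFC 6962-style 0x00/0x01 prefixes; a lone last node is promoted unchanged.
function merkleRootHex(leaves: Buffer[]): string {
  if (leaves.length === 0) throw new Error("cannot build a tree with no leaves");
  let level = leaves.map((leaf) => sha256(Buffer.concat([Buffer.from([0x00]), leaf])));
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(
        i + 1 < level.length ? sha256(Buffer.concat([Buffer.from([0x01]), level[i], level[i + 1]])) : level[i],
      );
    }
    level = next;
  }
  return level[0].toString("hex");
}

// Seal one UTC day: select its records, sort them deterministically, and compute the root.
function sealDay(records: ExtractionRecord[], day: string): { day: string; root: string; leafCount: number } | null {
  const todays = records
    .filter((r) => utcDay(r.timestamp) === day)
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  if (todays.length === 0) return null;
  return { day, root: merkleRootHex(todays.map(leafBytes)), leafCount: todays.length };
}
```

The deterministic sort matters because proofs are positional. `generateProof(records, index)` in the library example takes a leaf index, so whoever rebuilds a day's tree has to use the same leaf order the cron job used.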
The full code
Lib code: lib/audit-merkle.ts in our docs API repo.
Live endpoint: POST /api/v1/audit-proof (mode=generate or mode=verify).
Try it: parseflow.dev/case-studies
What other tamper-evident patterns have you seen ship in production? Curious if pure-crypto approaches scale beyond ~10k records/day before you need a dedicated WORM store.