DEV Community

DevToolsmith

Cryptographic audit trail for document APIs (Merkle tree, pure Node crypto, $0 cost)

Healthcare and legal customers kept asking the same question about our document extraction API:

"How do we prove in court that the extracted JSON wasn't tampered with after the fact?"

Good question. Here's the answer I shipped.

The architecture in 60 seconds

Every extraction record goes through three steps:

  1. It is signed with HMAC-SHA256 at write time, using a per-tenant secret
  2. It is inserted into a daily Merkle tree (one tree per UTC day)
  3. The daily Merkle root is published, so clients can check it at any time
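To make step 1 concrete, here is a minimal sketch of signing and verifying a record with Node's built-in crypto. The names `AuditRecord`, `canonicalize`, `signRecord` and `verifyRecord` are hypothetical and are not the library's actual exports. The fixed-order serialization is also an assumption, chosen so the same record always yields the same bytes.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical record shape, mirroring the fields used in this post.
interface AuditRecord {
  id: string;
  customer_id: string;
  timestamp: string;
  raw_payload_hash: string;
  extracted_payload_hash: string;
}

// Assumption: serialize fields in a fixed order so the signed bytes do not
// depend on object key order.
function canonicalize(r: AuditRecord): string {
  return JSON.stringify([
    r.id,
    r.customer_id,
    r.timestamp,
    r.raw_payload_hash,
    r.extracted_payload_hash,
  ]);
}

// HMAC-SHA256 over the canonical form, keyed with the tenant's secret.
function signRecord(r: AuditRecord, tenantSecret: string): string {
  return createHmac("sha256", tenantSecret).update(canonicalize(r)).digest("hex");
}

// Recompute the signature and compare in constant time.
function verifyRecord(r: AuditRecord, signature: string, tenantSecret: string): boolean {
  const expected = Buffer.from(signRecord(r, tenantSecret), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```

Comparing with `timingSafeEqual` rather than `===` avoids leaking how many leading bytes of a forged signature were correct.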

When a customer needs to prove integrity, they request a Merkle proof for their specific record ID. The proof is the sequence of sibling hashes that lets anyone re-derive the tree root from the leaf. If the recomputed root matches the published root, the record is unaltered. If anything in the record changed after the daily tree was sealed, the recomputed root differs and the proof fails.
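The proof mechanics are easier to follow in code. Below is a minimal sketch, not the actual `audit-merkle` implementation. The function names, the `leaf:`/`node:` prefixes, and the choice to pair an unpaired last node with itself are all assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

const sha256Hex = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

// Assumption: distinct prefixes for leaves and internal nodes, so an
// internal node hash cannot be passed off as a leaf.
const hashLeaf = (data: string): string => sha256Hex("leaf:" + data);
const hashPair = (left: string, right: string): string =>
  sha256Hex("node:" + left + right);

interface ProofStep {
  sibling: string;        // hash of the neighbouring node at this level
  siblingOnLeft: boolean; // true when the sibling is the left operand
}

// Build every level bottom-up; levels[0] holds leaf hashes, the last level the root.
function buildLevels(leaves: string[]): string[][] {
  if (leaves.length === 0) throw new Error("cannot build a tree with no leaves");
  const levels: string[][] = [leaves.map(hashLeaf)];
  while (levels[levels.length - 1].length > 1) {
    const prev = levels[levels.length - 1];
    const next: string[] = [];
    for (let i = 0; i < prev.length; i += 2) {
      // Assumption: an unpaired last node is hashed with itself.
      const right = i + 1 < prev.length ? prev[i + 1] : prev[i];
      next.push(hashPair(prev[i], right));
    }
    levels.push(next);
  }
  return levels;
}

function merkleRoot(leaves: string[]): string {
  const levels = buildLevels(leaves);
  return levels[levels.length - 1][0];
}

// Collect one sibling hash per level on the path from the leaf to the root.
function makeProof(leaves: string[], index: number): ProofStep[] {
  if (index < 0 || index >= leaves.length) throw new RangeError("leaf index out of range");
  const levels = buildLevels(leaves);
  const steps: ProofStep[] = [];
  let i = index;
  for (let depth = 0; depth < levels.length - 1; depth++) {
    const level = levels[depth];
    const siblingIndex = i % 2 === 0 ? i + 1 : i - 1;
    const sibling = siblingIndex < level.length ? level[siblingIndex] : level[i];
    steps.push({ sibling, siblingOnLeft: i % 2 === 1 });
    i = Math.floor(i / 2);
  }
  return steps;
}

// Re-derive the root from the leaf data and the proof, then compare.
function checkProof(leafData: string, steps: ProofStep[], expectedRoot: string): boolean {
  let acc = hashLeaf(leafData);
  for (const { sibling, siblingOnLeft } of steps) {
    acc = siblingOnLeft ? hashPair(sibling, acc) : hashPair(acc, sibling);
  }
  return acc === expectedRoot;
}
```

With five leaves this sketch produces a three-step proof, and passing altered leaf data to `checkProof` yields a different root. One caveat of pairing the last node with itself: a list whose final leaf is repeated can share a root with the shorter list, so a production version may prefer to promote the unpaired node instead.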

Pure Node crypto. No external services. ~150 LOC.

The library API

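import { createHash } from "node:crypto";
import { buildMerkleTree, generateProof, verifyProof, hmacSign } from "./audit-merkle";

// sha256 is not among the imports above, so a minimal helper is assumed here:
// hex-encoded SHA-256 of a string or byte buffer.
const sha256 = (data: string | Buffer): string =>
  createHash("sha256").update(data).digest("hex");

// rawPdfBytes and extractedData come from the extraction step.
const records = [
  { id: "ext-1", customer_id: "acme", timestamp: "2026-05-08T12:00:00Z",
    raw_payload_hash: sha256(rawPdfBytes),
    extracted_payload_hash: sha256(JSON.stringify(extractedData)) },
  // ... more records for the same UTC day
];

const tree = buildMerkleTree(records);
console.log("Daily root:", tree.hash); // publish this

const proof = generateProof(records, /* index of leaf */ 0);
const valid = verifyProof(proof, tree.hash); // true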

What I tested

Real records (5 extractions, BBC test data). Result:

  • Build root: a8ad1f8ac2f60a1e...
  • Generate proof for leaf 3: VALID (3 siblings)
  • Tamper test: changed extracted_payload_hash of leaf 3 → recomputed root differs → proof verification fails
  • HMAC verify with wrong key: returns false (the comparison is timing-safe)

Tamper-evident, exactly as advertised.

Why this matters

For enterprise/legal/healthcare buyers, "trust us, we don't alter your data" is not enough. Cryptographic proof closes deals.

For everyone else: it costs nothing extra to add. The Vercel and Upstash free tiers already cover it. The only addition is one cron job that builds the daily tree and writes the root somewhere queryable.
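For a sense of what that cron job does, here is a rough sketch of sealing one UTC day. It is not the production job. `SealedRecord`, `utcDay`, `rootOf` and `sealDay` are hypothetical names, the sort by `id` is an assumed way to get a reproducible leaf order, and where the result gets stored is left out entirely.

```typescript
import { createHash } from "node:crypto";

const sha256Hex = (s: string): string =>
  createHash("sha256").update(s).digest("hex");

// Hypothetical shape: leafData is whatever canonical string the leaf hash covers.
interface SealedRecord {
  id: string;
  timestamp: string; // ISO-8601
  leafData: string;
}

// UTC calendar day of a timestamp, e.g. "2026-05-08".
function utcDay(timestamp: string): string {
  return new Date(timestamp).toISOString().slice(0, 10);
}

// Compact root computation. Assumption: an unpaired last node is hashed with itself.
function rootOf(leafHashes: string[]): string {
  let level = leafHashes;
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = i + 1 < level.length ? level[i + 1] : level[i];
      next.push(sha256Hex(level[i] + right));
    }
    level = next;
  }
  return level[0];
}

// Select one UTC day's records, order them deterministically, and compute the root.
// Returns null when the day has no records.
function sealDay(
  records: SealedRecord[],
  day: string,
): { day: string; root: string; count: number } | null {
  const todays = records
    .filter((r) => utcDay(r.timestamp) === day)
    // Assumption: ids are unique, so sorting by id fixes the leaf order.
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  if (todays.length === 0) return null;
  const root = rootOf(todays.map((r) => sha256Hex(r.leafData)));
  return { day, root, count: todays.length };
}
```

The deterministic ordering matters because the same set of records in a different order produces a different root, which would make the published value impossible to reproduce later.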

The full code

Lib code: lib/audit-merkle.ts in our docs API repo.
Live endpoint: POST /api/v1/audit-proof (mode=generate or mode=verify).

Try it: parseflow.dev/case-studies

What other tamper-evident patterns have you seen ship in production? Curious if pure-crypto approaches scale beyond ~10k records/day before you need a dedicated WORM store.
