SEN LLC

Posted on May 6

Building an Interactive Merkle Tree Visualizer — One Byte Changes, the Whole Root Changes, in O(log n) Hops

#javascript #webdev #crypto #frontend

"If a Merkle tree commits to a million transactions and one of them is altered, the root changes. To prove inclusion of any one transaction, you only need ~20 hashes." Most explanations stop there and leave you to imagine the picture. This article is the picture: an interactive visualizer where you edit a leaf and watch the change ripple to the root, and a proof — the actual sibling hashes you'd send to a verifier — gets generated and replayed live.

🌳 Demo: https://sen.ltd/portfolio/merkle-viz/
📦 GitHub: https://github.com/sen-ltd/merkle-viz

The whole thing is ~200 lines of pure logic + 150 lines of DOM/SVG glue, no build step, no dependencies. Hashing is crypto.subtle.digest("SHA-256", …) in both the browser and Node 18+, so the same merkle.js ships to production and runs under node --test. 17 tests cover the math.

What a Merkle tree actually is

You have a list of items: transactions, log entries, file blobs, anything addressable. You want a single short fingerprint that commits to the entire list, such that:

Changing any item changes the fingerprint.
Given any item, you can prove to someone holding only the fingerprint that the item was in the list — without sending them the rest of the list.
Both verification and proof generation are cheap: O(log n) in the list size.

A naïve answer is "hash the concatenation of every item." That gives you property 1 but blows up property 2 — the verifier needs the entire list. A Merkle tree gets all three.

The construction is mechanical:

level 0:  H(L0)  H(L1)  H(L2)  H(L3)        ← leaves: hash each item
level 1:    H(H(L0)||H(L1))     H(H(L2)||H(L3))   ← pair them up, hash again
level 2:        H( level1_left || level1_right )  ← keep going
                       =
                     ROOT

H is SHA-256. || is byte concatenation (not hex-string concatenation — that's a real footgun). For odd-count levels, this implementation duplicates the last node and pairs it with itself — the convention used by Bitcoin's consensus rules. (Certificate Transparency carries the odd one up unchanged; the Ethereum Patricia trie is a different beast entirely. Pick a convention, document it, stick with it.)

Once you've built it, the inclusion proof for leaf i is just the sibling hash at every level on the way from leaf i up to the root. For 4 leaves, every proof is 2 hashes long. For 1024 leaves, it's 10. For 1 million, it's 20.

The pure-logic core

merkle.js exports four functions a verifier needs and three more for the visualizer to drive the UI. The whole thing is async because crypto.subtle.digest is async; nothing else about it is exotic.

const enc = new TextEncoder();

export async function sha256Hex(bytes) {
  const buf = bytes instanceof Uint8Array ? bytes : enc.encode(bytes);
  const digest = await crypto.subtle.digest("SHA-256", buf);
  return bytesToHex(new Uint8Array(digest));
}

export async function hashLeaf(text) {
  return sha256Hex(text);
}

export async function hashPair(leftHex, rightHex) {
  const buf = new Uint8Array(64);
  buf.set(hexToBytes(leftHex), 0);
  buf.set(hexToBytes(rightHex), 32);
  return sha256Hex(buf);
}

hashPair is where most ad-hoc implementations go wrong. The two child hashes need to be concatenated as bytes, not as hex strings. sha256("ab" + "cd") and sha256(bytes_of("ab") || bytes_of("cd")) produce different roots, and verifiers built against one convention will reject proofs built against the other. The unit tests pin this down explicitly:

test("hashPair concatenates child bytes (not hex strings)", async () => {
  const left = await hashLeaf("L");
  const right = await hashLeaf("R");
  const got = await hashPair(left, right);

  const concat = Buffer.concat([
    Buffer.from(left, "hex"),
    Buffer.from(right, "hex"),
  ]);
  assert.equal(got, createHash("sha256").update(concat).digest("hex"));
});

Building the tree is a straightforward bottom-up loop:

export async function buildTree(leaves) {
  if (!leaves.length) return { levels: [[]], root: null };

  const leafHashes = await Promise.all(leaves.map(hashLeaf));
  const levels = [leafHashes];

  while (levels[levels.length - 1].length > 1) {
    const cur = levels[levels.length - 1];
    const next = [];
    for (let i = 0; i < cur.length; i += 2) {
      const left = cur[i];
      const right = i + 1 < cur.length ? cur[i + 1] : cur[i]; // duplicate odd
      next.push(await hashPair(left, right));
    }
    levels.push(next);
  }
  return { levels, root: levels[levels.length - 1][0] };
}

I keep all the levels around (not just the root) because the visualizer wants to draw them, but a real-world implementation that only cares about the root can discard each level after consuming it.

Generating and verifying proofs

The proof for leaf at index i walks up the tree, collecting the sibling at each level:

export function getProof(levels, index) {
  if (!levels.length || !levels[0].length) return [];
  const proof = [];
  let i = index;
  for (let lvl = 0; lvl < levels.length - 1; lvl++) {
    const layer = levels[lvl];
    const isRight = i % 2 === 1;
    const sibIdx = isRight ? i - 1 : i + 1;
    const sibling = sibIdx < layer.length ? layer[sibIdx] : layer[i]; // odd-end self-dup
    proof.push({ hash: sibling, side: isRight ? "left" : "right" });
    i = Math.floor(i / 2);
  }
  return proof;
}

Two non-obvious bits:

side matters. When you concatenate the running hash with the sibling, order is part of the input. If the running hash is the left child, you concat running || sibling; if it's the right child, sibling || running. Forget this and the verifier silently rejects every proof.
Odd-end leaves pair with themselves. A 3-leaf tree has [H(L0), H(L1), H(L2)] at level 0. Level 1 is [H(H(L0)||H(L1)), H(H(L2)||H(L2))]. So the proof for L2's first step has the sibling field set to L2's own hash — self-duplication is part of the protocol, not a bug.

Verification is the reverse:

export async function verifyProof(leafText, proof, expectedRoot) {
  let running = await hashLeaf(leafText);
  for (const step of proof) {
    if (step.side === "left") {
      running = await hashPair(step.hash, running);
    } else {
      running = await hashPair(running, step.hash);
    }
  }
  return { ok: running === expectedRoot, computedRoot: running };
}

Note the verifier never sees the other leaves. It hashes the candidate leaf, replays each pair-hash with the supplied sibling, and checks the final value against the root it already trusted. That's the whole point: O(log n) hashes prove inclusion in an O(n) dataset.

Visualizing the propagation

The killer demo for Merkle trees is "change one byte of one leaf, watch the root change." Once you can see it, the abstract claim becomes obvious. The visualizer keeps two trees in memory — a baseline and a current one — and diffs them at draw time:

export function diffLevels(a, b) {
  const changed = new Set();
  for (let lvl = 0; lvl < Math.max(a.length, b.length); lvl++) {
    const la = a[lvl] || [], lb = b[lvl] || [];
    const w = Math.max(la.length, lb.length);
    for (let i = 0; i < w; i++) {
      if (la[i] !== lb[i]) changed.add(`${lvl}:${i}`);
    }
  }
  return changed;
}

When one leaf is edited, exactly the leaf-to-root chain ends up in the changed set. Every other node — including the sibling chain that the proof relies on — stays identical. The unit test pins the shape:

test("diffLevels finds exactly the path of changed nodes", async () => {
  const before = await buildTree(["a", "b", "c", "d"]);
  const after  = await buildTree(["a", "b", "c!", "d"]);
  const changed = diffLevels(before.levels, after.levels);
  // Expected: leaf 2, parent (1,1), root (2,0).
  assert.ok(changed.has("0:2"));
  assert.ok(changed.has("1:1"));
  assert.ok(changed.has("2:0"));
  // Untouched sibling chain.
  assert.ok(!changed.has("0:0")); assert.ok(!changed.has("0:1"));
  assert.ok(!changed.has("0:3")); assert.ok(!changed.has("1:0"));
});

That's the whole "tampering propagates in O(log n)" claim, expressed as code. The SVG layer walks the changed set and paints the matching nodes red. It's the picture every Merkle-tree explainer wants but doesn't draw.

Why two files

merkle.js has no DOM access, no globals, no module state. Every export is pure (or async-pure). That's what lets node --test consume it directly:

$ npm test
✔ sha256Hex matches node:crypto for ASCII input
✔ hashLeaf is deterministic and matches sha256(text)
✔ hashPair concatenates child bytes (not hex strings)
✔ buildTree on 4 leaves yields a 3-level tree
✔ buildTree duplicates the last leaf when count is odd (Bitcoin style)
✔ proofs verify against the real root for every leaf
✔ proof verification fails if a single character of the leaf changes
✔ proof verification fails if a sibling hash is swapped
✔ diffLevels finds exactly the path of changed nodes
…
ℹ tests 17  ℹ pass 17  ℹ fail 0

script.js is the DOM/SVG glue: editor for the leaves, SVG renderer for the tree, panel for the proof. It owns nothing about hashing — every byte that becomes a hash goes through merkle.js. That's how the same module ships to production and runs under tests.

The runtime story is symmetric: crypto.subtle.digest is identical in modern browsers and Node 18+, no polyfill, no shim. The async-everywhere style is the price; the payoff is that import { buildTree } from "./merkle.js" works in both contexts unchanged.

Where this data structure actually shows up

Bitcoin block headers commit to a single Merkle root over every transaction in the block. SPV ("simplified payment verification") wallets use exactly this proof shape: server sends the wallet a transaction + a Merkle proof, wallet verifies inclusion against the block header it already has. The 2008 paper builds on this; the alternative — sending the whole block — is the network-bandwidth disaster the design avoids.
Git tree objects hash a directory of blobs and recursive sub-trees. It's not the exact same construction (Git uses a content-addressable graph, not a balanced binary tree), but the property is the same: changing one byte of one file changes every ancestor tree's hash up to the commit.
Certificate Transparency logs commit to every TLS certificate issued by a participating CA. Browsers verify, before trusting a certificate, that it was logged — using a Merkle inclusion proof against a public log root. The convention is slightly different (no leaf duplication), but the math is the same.
Content-addressable storage in IPFS, Plan 9, and friends uses Merkle DAGs to address blobs. Same idea, different topology.

The shared insight is that hashing is composable: a hash of a hash is still a fingerprint, and you can build a tree where the root is a fingerprint of the whole subtree. Once you see that, the rest is plumbing.

What's not in scope here

The visualizer is intentionally small. A few things real implementations care about that this one doesn't:

Domain separation — production Merkle trees prefix leaf hashes with a different byte than internal-node hashes (0x00 vs 0x01) to defeat second-preimage attacks where a verifier is tricked into treating an internal node as a leaf. CT and recent Bitcoin proposals do this; the demo doesn't, because adding it complicates the explanation.
Streaming construction — buildTree here holds every level in memory. For a 100M-entry log you'd compute parent levels on-the-fly and forget the leaves once their level is finished.
Proof formats — real-world proofs are serialized to specific byte layouts (RFC 6962 for CT, the Bitcoin protocol's compact format). The visualizer keeps proofs as plain JS objects since serialization isn't the point.

If you're shipping production code: read the RFC 6962 (CT) or the relevant section of the Bitcoin developer reference instead of trusting any visualizer's defaults.

Try it

The demo at https://sen.ltd/portfolio/merkle-viz/ starts with four "transaction" leaves. Click tamper next to leaf 2 to see the canonical demo: the leaf input goes red, the SVG node and its parent and the root all turn red, the root hash above the tree changes. The inclusion proof to the right still verifies — because it's been replayed against the new root, with the new sibling hashes that fall out of the new tree. Then open the "tamper with the leaf I'm verifying" details panel and type something different from the leaf text — that's when the verification fails, because now the candidate leaf doesn't actually correspond to any leaf in the tree.

Source: https://github.com/sen-ltd/merkle-viz — MIT, ~350 lines total, 17 unit tests, no build step. Open merkle.js first; the rest is presentation.

🛠 Built by SEN LLC as part of an ongoing series of small, focused developer tools. Browse the full portfolio for more.

DEV Community