Dmitrii Zatona

Posted on Mar 1 • Originally published at atl-protocol.org

The Super-Tree: How One Merkle Tree Proves Another

#rust #cryptography #security #opensource

A transparency log that lives forever will eventually contain billions of entries. A single Merkle tree with a billion leaves has a depth of 30, which means inclusion proofs are 30 hashes long and every append touches 30 nodes. That is manageable. But the tree file on disk grows without bound, proof generation requires random access across the entire structure, and rotating keys or changing operational parameters means you are stuck with decisions you made on day one.

ATL Protocol splits the problem into two levels: short-lived Data Trees and an eternal Super-Tree. Each Data Tree accumulates entries for a bounded period (configurable -- say, 24 hours or 100,000 entries). When the period ends, the tree is closed, its root hash becomes a leaf in the Super-Tree, and a fresh Data Tree starts accepting new entries. The Super-Tree is itself a Merkle tree -- it grows by one leaf every time a Data Tree closes.

I want to walk through why this architecture exists, how the two levels connect cryptographically, and how two receipts held by two different people can independently prove that the entire log history was never rewritten -- without contacting the server.

Why Not One Big Tree

The obvious approach is a single append-only Merkle tree. Every entry is a leaf. The root hash represents the entire log state. This works, but it has operational problems.

First, the tree file grows forever. ATL uses memory-mapped slab files where every node (leaves and intermediates) is stored at a fixed offset. A tree with 1 million leaves has approximately 2 million nodes, each 32 bytes -- that is 64 MB per slab. With a single tree, you eventually have a multi-gigabyte file that must remain memory-mapped for proof generation. Splitting into bounded Data Trees means each slab is a fixed, predictable size.

Second, key rotation. If the log operator wants to rotate their Ed25519 signing key, a single-tree design forces a choice: sign new entries with the new key (breaking consistency with old checkpoints) or keep the old key forever (defeating the purpose of rotation). With Data Trees, each tree has its own checkpoint signed at close time. Rotating keys between trees is a natural boundary.

Third, anchoring granularity. In ATL Protocol v2.0, RFC 3161 timestamps anchor Data Tree roots, and Bitcoin OTS anchors the Super Root. The Super Root changes every time a Data Tree closes, so each new snapshot gets its own OTS anchor -- but each anchor commits to the entire log history up to that point, not just one tree. TSA timestamps give every Data Tree a seconds-level proof of time, while Bitcoin gives each Super Root snapshot permanent immutability. A single-tree design would force a choice between frequent expensive Bitcoin anchoring or infrequent timestamps.

The Genesis Leaf

When a new Data Tree starts, its first entry is not user data. It is a genesis leaf -- a cryptographic link to the previous tree.

pub const GENESIS_DOMAIN: &[u8] = b"ATL-CHAIN-v1";

pub fn compute_genesis_leaf_hash(prev_root_hash: &Hash, prev_tree_size: u64) -> Hash {
    let mut hasher = Sha256::new();
    hasher.update([LEAF_PREFIX]);
    hasher.update(GENESIS_DOMAIN);
    hasher.update(prev_root_hash);
    hasher.update(prev_tree_size.to_le_bytes());
    hasher.finalize().into()
}

The genesis leaf hash is:

SHA256(0x00 || "ATL-CHAIN-v1" || prev_root_hash || prev_tree_size_le)

This binds the new tree to the exact state of the previous tree at close time -- not just its root hash, but also its size. If the operator tries to rewrite the previous tree (changing entries, adding entries, removing entries), the root hash or the size changes, and the genesis leaf in the next tree no longer matches. The chain is broken, and any verifier who holds a receipt from the previous tree will detect the inconsistency.

The domain separator ATL-CHAIN-v1 prevents confusion between genesis leaves and regular data leaves. Without it, an attacker could craft a data entry whose payload hash and metadata hash happen to collide with a genesis leaf hash, potentially confusing the chain verification. The domain separator makes this a different hash domain entirely -- the input space for genesis leaves and data leaves does not overlap.

The 0x00 prefix is the RFC 6962 leaf prefix, shared with data leaves. This is intentional: the genesis leaf occupies a regular leaf slot in the Data Tree. The Merkle tree does not need special handling for it. It is leaf 0, hashed like any other leaf, included in proofs like any other leaf. The distinction between "genesis" and "data" exists only in the semantic layer, not in the tree structure.

The Super-Tree

When a Data Tree is closed, its root hash is appended to the Super-Tree as a new leaf. The Super-Tree is a standard RFC 6962 Merkle tree -- the same data structure, the same inclusion proofs, the same consistency proofs. The only difference is what its leaves represent: not individual documents, but entire Data Trees.

The Super-Tree root (super_root) is the single hash that represents the entire log history. Every Data Tree that ever existed, every entry that was ever written, is committed to by this one root. If you trust the super_root, you trust every entry in every Data Tree.

In ATL Protocol v2.0, Bitcoin OTS anchors the super_root, not individual Data Tree roots. Each time the Super-Tree grows, the new Super Root gets its own OTS anchor -- and each anchor commits to every Data Tree in the log up to that point. TSA timestamps anchor individual Data Tree roots for faster, per-tree time proofs.

What a Receipt Contains

Every Receipt-Full in ATL Protocol v2.0 includes a super_proof:

{
  "genesis_super_root": "sha256:aabb...",
  "data_tree_index": 150,
  "super_tree_size": 152,
  "super_root": "sha256:ccdd...",
  "inclusion": ["sha256:...", "sha256:...", "sha256:..."],
  "consistency_to_origin": ["sha256:...", "sha256:..."]
}

Six fields. Each one does something specific.

genesis_super_root is the Super-Tree root when it contained exactly one leaf -- the first Data Tree ever closed. This is the identity of the log. Two receipts with the same genesis_super_root claim to be from the same log instance.

data_tree_index is the position of this receipt's Data Tree in the Super-Tree. Data Tree 0 is the first tree the log ever created. Data Tree 150 is the 151st.

super_tree_size is how many Data Trees the Super-Tree contained at the time this receipt was issued.

super_root is the Super-Tree root hash at that size.

inclusion is a Merkle inclusion proof -- the sibling hashes needed to reconstruct the super_root from the Data Tree root at position data_tree_index. This proves that this specific Data Tree is part of the Super-Tree.

consistency_to_origin is a Merkle consistency proof from Super-Tree size 1 to super_tree_size. This proves that the current Super-Tree is an append-only extension of the genesis state -- no historical Data Trees were removed or modified.

Verifying Super-Tree Inclusion

The first verification step: is this Data Tree actually in the Super-Tree?

pub fn verify_super_inclusion(data_tree_root: &Hash, super_proof: &SuperProof) -> AtlResult<bool> {
    if super_proof.super_tree_size == 0 {
        return Err(AtlError::InvalidTreeSize {
            size: 0,
            reason: "super_tree_size cannot be zero",
        });
    }

    if super_proof.data_tree_index >= super_proof.super_tree_size {
        return Err(AtlError::LeafIndexOutOfBounds {
            index: super_proof.data_tree_index,
            tree_size: super_proof.super_tree_size,
        });
    }

    let expected_super_root = super_proof.super_root_bytes()?;
    let inclusion_path = super_proof.inclusion_path_bytes()?;

    let inclusion_proof = InclusionProof {
        leaf_index: super_proof.data_tree_index,
        tree_size: super_proof.super_tree_size,
        path: inclusion_path,
    };

    verify_inclusion(data_tree_root, &inclusion_proof, &expected_super_root)
}

The Data Tree root is the leaf. The Super-Tree is the tree. Standard Merkle inclusion verification -- the same algorithm used for verifying individual entries within a Data Tree. The Super-Tree does not need special proof algorithms. It reuses the same verify_inclusion function because it is the same data structure.

Two structural checks before any cryptographic work: tree size cannot be zero (there is no Super-Tree with no Data Trees), and the index cannot exceed the size (you cannot be at position 150 in a tree with 100 leaves). These catch malformed proofs before touching the expensive hash operations.

Verifying Consistency to Origin

The second verification step: is the Super-Tree an honest extension of the genesis state?

pub fn verify_consistency_to_origin(super_proof: &SuperProof) -> AtlResult<bool> {
    if super_proof.super_tree_size == 0 {
        return Err(AtlError::InvalidTreeSize {
            size: 0,
            reason: "super_tree_size cannot be zero",
        });
    }

    let genesis_super_root = super_proof.genesis_super_root_bytes()?;
    let super_root = super_proof.super_root_bytes()?;

    if super_proof.super_tree_size == 1 {
        if super_proof.consistency_to_origin.is_empty() {
            return Ok(use_constant_time_eq(&genesis_super_root, &super_root));
        }
        return Err(AtlError::InvalidProofStructure {
            reason: format!(
                "consistency_to_origin must be empty for super_tree_size 1, got {} hashes",
                super_proof.consistency_to_origin.len()
            ),
        });
    }

    let consistency_path = super_proof.consistency_to_origin_bytes()?;

    let consistency_proof = ConsistencyProof {
        from_size: 1,
        to_size: super_proof.super_tree_size,
        path: consistency_path,
    };

    verify_consistency(&consistency_proof, &genesis_super_root, &super_root)
}

This is an RFC 9162 consistency proof, but with a specific twist: from_size is always 1. The proof says: "the Super-Tree at size 1 (containing just the first Data Tree) is a prefix of the Super-Tree at size N." If this holds, then Data Tree 0 is still at position 0, unchanged. Data Tree 1 is still at position 1. And so on, up to the current size.

The genesis case (super_tree_size == 1) is a degenerate consistency proof: the only valid state is genesis_super_root == super_root with an empty proof path. If someone provides a non-empty path for size 1, the proof is structurally invalid -- there is nothing to prove consistency between a tree of size 1 and itself except equality.

Again, this reuses verify_consistency -- the same RFC 9162 implementation I wrote about in a previous post. The Super-Tree does not reinvent consistency proofs. It parametrizes them differently (always from size 1), but the algorithm is identical.

Cross-Receipt Verification

This is the part that makes the two-level architecture worth the complexity.

Suppose Alice receives a receipt today, and Bob receives a receipt next month. They have never communicated. The server might have been compromised in between. Can Alice and Bob independently determine whether the log was tampered with, without talking to the server?

Yes.

pub fn verify_cross_receipts(
    receipt_a: &Receipt,
    receipt_b: &Receipt,
) -> CrossReceiptVerificationResult {
    // ...

    // Step 1: Both receipts must have super_proof
    let super_proof_a = receipt_a.super_proof.as_ref()?;
    let super_proof_b = receipt_b.super_proof.as_ref()?;

    // Step 2: Same genesis?
    let genesis_a = super_proof_a.genesis_super_root_bytes()?;
    let genesis_b = super_proof_b.genesis_super_root_bytes()?;

    if !use_constant_time_eq(&genesis_a, &genesis_b) {
        // Different logs entirely
        return result;
    }

    // Step 3: Both consistent with genesis?
    let consistency_a = verify_consistency_to_origin(super_proof_a);
    let consistency_b = verify_consistency_to_origin(super_proof_b);

    match (consistency_a, consistency_b) {
        (Ok(true), Ok(true)) => {
            result.history_consistent = true;
        }
        // ...
    }

    result
}

Three checks, no server required:

Same genesis? If genesis_super_root differs, the receipts are from different log instances. Done.
Receipt A consistent with genesis? If the consistency proof from size 1 to A's super_tree_size verifies, then the Super-Tree at A's snapshot is an honest extension of genesis.
Receipt B consistent with genesis? Same check for B.

If both receipts are consistent with the same genesis, then by the transitive property of Merkle consistency, the history between them was not modified. The operator cannot have deleted Data Trees, reordered them, or substituted different content -- because both receipts independently prove that their respective Super-Tree snapshots are append-only extensions of the same origin.

This works because consistency proofs are transitive. If the Super-Tree at size 50 is consistent with size 1, and the Super-Tree at size 100 is consistent with size 1, then size 100 is consistent with size 50. Any modification to the first 50 Data Trees would break at least one of the two consistency proofs.

No communication between Alice and Bob. No server access. No trusted third party. Two receipts, one function call, and the math proves the rest.

Why `from_size` Is Always 1

I could have designed the consistency proof to go from the previous receipt's Super-Tree size to the current one. That would be a shorter proof -- proving consistency from size 50 to size 100 requires fewer hashes than proving from size 1 to size 100.

But it would also require the verifier to have the previous receipt. Cross-receipt verification would become sequential: to verify receipt C, you need receipt B, and to verify receipt B, you need receipt A, all the way back to the genesis.

By always proving consistency from size 1, every receipt is self-contained. Each receipt independently proves its relationship to the origin. Verification is O(1) receipts, not O(N). Any single receipt, in isolation, proves that the entire log history up to that point is an append-only extension of genesis.

The cost is a slightly longer proof path. For a Super-Tree with 1000 Data Trees, the consistency proof from size 1 is at most 2 * ceil(log2(1000)) = 20 hashes -- 640 bytes. For a Super-Tree with a million Data Trees, it is 40 hashes -- 1280 bytes. This is negligible compared to the value of self-contained verification.

The Two-Level Trust Chain

Putting it all together, the verification chain for a single receipt is:

Entry level: Verify the document hash matches the receipt's payload_hash. This proves "this document is what was anchored."
Data Tree level: Verify the Merkle inclusion proof from the entry's leaf hash to the Data Tree root (proof.root_hash). This proves "this entry is in this Data Tree."
Super-Tree level (inclusion): Verify the Super-Tree inclusion proof from the Data Tree root to the Super Root (super_proof.super_root). This proves "this Data Tree is in the Super-Tree."
Super-Tree level (consistency): Verify the consistency proof from genesis to the current Super Root. This proves "the Super-Tree was never rewritten."
Anchor level: Verify external anchors (TSA on the Data Tree root, Bitcoin OTS on the Super Root). This proves "these roots existed at these specific times, attested by parties outside the operator's control."

Each level builds on the one below it. Compromise any level and the chain breaks -- the verifier gets a definitive failure, not a silent pass.

And because every level uses standard Merkle proofs (RFC 9162 inclusion and consistency), the entire verification stack is built from two primitives. One for "this leaf is in this tree." One for "this smaller tree is a prefix of this larger tree." Everything else is composition.

The full implementation is open source: github.com/evidentum-io/atl-core (Apache-2.0)

Files discussed in this post:

src/core/verify/super_tree.rs -- Super-Tree verification: inclusion, consistency to origin, cross-receipt verification
src/core/merkle/crypto.rs -- genesis leaf hash with ATL-CHAIN-v1 domain separator
src/core/receipt.rs -- SuperProof struct definition

DEV Community

The Super-Tree: How One Merkle Tree Proves Another

Why Not One Big Tree

The Genesis Leaf

The Super-Tree

What a Receipt Contains

Verifying Super-Tree Inclusion

Verifying Consistency to Origin

Cross-Receipt Verification

Why `from_size` Is Always 1

The Two-Level Trust Chain

Top comments (0)

Why Not One Big Tree

The Genesis Leaf

The Super-Tree

What a Receipt Contains

Verifying Super-Tree Inclusion

Verifying Consistency to Origin

Cross-Receipt Verification

Why from_size Is Always 1

The Two-Level Trust Chain

Why `from_size` Is Always 1