Ankita Virani

From Merkle Trees to Verkle Commitments: How Ethereum Actually Stores and Proves Truth

The Assumption That Quietly Breaks Most Systems

Most engineers think they understand how blockchain stores data.

They don’t.

They operate on simplified mental models:

  • “Transactions are stored in blocks”
  • “Merkle Trees are used for verification”

These are not wrong.
They are incomplete.

And incomplete mental models lead to bad system design.

The Core Misconception

A blockchain does not store truth.

It stores a cryptographic commitment to a state.

Formally:

R = C(D)

Where:

  • D is the full dataset (transactions, accounts, state)
  • C is a commitment function
  • R is a fixed-size root (typically 32 bytes)

This is the fundamental abstraction most people miss.
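As a minimal sketch of R = C(D), the snippet below uses a flat SHA-256 hash over a length-prefixed encoding as the commitment function. The name `commit` and the encoding are illustrative assumptions, not Ethereum's actual scheme; the point is only that the root stays 32 bytes no matter how large D is.

```python
import hashlib

def commit(dataset: list[bytes]) -> bytes:
    """Toy commitment C(D): hash a length-prefixed concatenation of the items."""
    h = hashlib.sha256()
    for item in dataset:
        h.update(len(item).to_bytes(8, "big"))  # length prefix keeps item boundaries unambiguous
        h.update(item)
    return h.digest()

small = commit([b"alice:10", b"bob:5"])
large = commit([f"account-{i}".encode() for i in range(100_000)])
print(len(small), len(large))  # 32 32
```

A flat hash like this commits to the data but offers no cheap way to prove that a single item is included, which is the gap Merkle trees fill.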

Why This Matters (System Consequences)

This single design choice enables:

  • Sub-linear verification
  • Stateless clients
  • Light client security
  • Rollup architectures
  • ZK proof systems

But it also imposes a requirement:

If you don’t understand commitments, you cannot reason about modern blockchain systems.

The Ethereum System Model (What Most Engineers Never Formalize)

Stop thinking in components. Think in layers.

Ethereum System = {
  Commitment Layer   → Data compression
  Execution Layer    → State transition
  Verification Layer → Proof of correctness
}

1. Commitment Layer

R = C(D)

Compresses large state into a constant-size root.

2. Execution Layer

S(t+1) = E(S(t), T)

Where:

  • S(t) is the current state
  • T is the set of transactions

3. Verification Layer

V(S, T, R) = valid

Ensures correctness without recomputing everything.

Critical Insight

These layers are conceptually separate but tightly coupled:

  • Execution produces state
  • Commitment compresses it
  • Verification proves it

If your commitment layer is weak, everything above it collapses.
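The three layers can be sketched as three small functions. Everything here is a toy under stated assumptions: the state is a balance map, `execute` applies simple transfers, `commit` hashes a canonical JSON encoding, and `verify` checks a claimed root by naive re-execution. Real systems replace that last step with proofs so that verifiers do not have to recompute everything.

```python
import hashlib
import json

def execute(state: dict[str, int], txs: list[tuple[str, str, int]]) -> dict[str, int]:
    """E(S, T): apply (sender, receiver, amount) transfers, skipping invalid ones."""
    new_state = dict(state)
    for sender, receiver, amount in txs:
        if amount > 0 and new_state.get(sender, 0) >= amount:
            new_state[sender] -= amount
            new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state

def commit(state: dict[str, int]) -> bytes:
    """C(S): hash of a canonical (sorted-key) JSON encoding of the state."""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).digest()

def verify(state: dict[str, int], txs: list[tuple[str, str, int]], claimed_root: bytes) -> bool:
    """V(S, T, R): naive verifier that re-executes and compares roots."""
    return commit(execute(state, txs)) == claimed_root

s0 = {"alice": 10, "bob": 0}
txs = [("alice", "bob", 3)]
root = commit(execute(s0, txs))
print(verify(s0, txs, root))  # True
```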

Merkle Trees Are Not Just a Data Structure

They are a commitment scheme over a dataset.

That distinction is not academic. It is architectural.

Formal Construction

Let:

D = {x1, x2, ..., xn}

Step 1: Leaf Hashing

h_i = H(x_i)

Step 2: Pairwise Aggregation

h(i,j) = H(h_i || h_j)

Step 3: Recursive Reduction

Continue until a single value remains:

R = C_merkle(D)
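The three steps can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 as H and duplicating the last node when a level has an odd number of entries (one common convention, not the only one). Production designs usually also separate leaf and internal-node hashing, which is omitted here.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(items: list[bytes]) -> bytes:
    if not items:
        raise ValueError("cannot commit to an empty dataset")
    level = [H(x) for x in items]               # Step 1: leaf hashing
    while len(level) > 1:                       # Step 3: repeat until one value remains
        if len(level) % 2 == 1:
            level.append(level[-1])             # assumption: duplicate the last node
        level = [H(level[i] + level[i + 1])     # Step 2: pairwise aggregation
                 for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"tx1", b"tx2", b"tx3"])
tampered = merkle_root([b"tx1", b"tx2", b"tx4"])
print(len(root), root != tampered)  # 32 True
```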

What the Root Actually Represents

The root R:

  • Represents the entire dataset
  • Is constant size
  • Changes if any element changes

Formally:

x_k ≠ x_k' → R ≠ R'   (except with negligible probability, assuming H is collision-resistant)

This is a global integrity guarantee.

Visual Model: Data → Commitment

One root. Entire dataset.

Change one bit → new root.

That’s the contract.

Proof of Inclusion (Where the Real Power Is)

Compression is not the innovation.

Selective verification is.

Definition

For element x_k, the proof is:

π_k = {s1, s2, ..., s_log(n)}

These are the sibling hashes along the path from the leaf to the root, each tagged with its side (left or right).

Verification

h_k = H(x_k)

h1 = H(h_k || s1)    or H(s1 || h_k) if s1 is the left sibling
h2 = H(h1 || s2)     same ordering rule at every level
...
R̂ = h_log(n)

Check:

R̂ == R

What’s Actually Happening

You reconstruct the root from:

  • one leaf
  • a logarithmic proof

No full dataset needed.
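A minimal sketch of proof generation and verification follows, assuming SHA-256 as H and the duplicate-last-node padding rule. The helper names (`build_levels`, `prove`, `verify`) are hypothetical. Each proof entry records the sibling hash and whether that sibling sits on the left, so the verifier knows the concatenation order.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(items: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, leaves first, padding odd levels by duplication."""
    level = [H(x) for x in items]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
            levels[-1] = level                  # keep the padded version for proofs
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_left) from leaf level up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1                     # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(item: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    h = H(item)
    for sibling, sibling_is_left in proof:
        h = H(sibling + h) if sibling_is_left else H(h + sibling)
    return h == root

txs = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
levels = build_levels(txs)
root = levels[-1][0]
proof = prove(levels, 2)
print(len(proof), verify(b"tx2", proof, root), verify(b"txX", proof, root))  # 3 True False
```

The verifier touches only the leaf and three sibling hashes here, never the other four transactions.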

Complexity

Operation | Complexity
----------|-----------
Build     | O(n)
Proof     | O(log n)
Verify    | O(log n)

Real Blockchain Flow

Transaction verification:

User → Request Proof  
→ Receive (Transaction + Merkle Proof)  
→ Recompute Root  
→ Compare with Block Header Root  
→ Accept / Reject

Where Merkle Trees Break (Real Systems, Not Textbooks)

This is where most articles stop. This is where real engineering starts.

1. Write Amplification

Update Cost = O(log n) hashes per modified leaf

In Ethereum:

  • State size exceeds 100GB
  • Each block touches hundreds to thousands of accounts
  • Every update requires recomputing multiple trie nodes

This creates sustained pressure on:

  • disk I/O
  • CPU hashing
  • state synchronization
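To make the update cost concrete, the sketch below stores every level of a binary Merkle tree and counts the hashes triggered by a single leaf update. It assumes SHA-256 and a power-of-two number of leaves to keep the code short; the class name `CountingMerkleTree` is hypothetical, and Ethereum's actual trie has a different node layout.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class CountingMerkleTree:
    """Binary Merkle tree that stores every level and counts hashes on update."""

    def __init__(self, items: list[bytes]):
        n = len(items)
        if n == 0 or n & (n - 1):
            raise ValueError("this sketch needs a power-of-two number of leaves")
        self.levels = [[H(x) for x in items]]
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            self.levels.append(
                [H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)]
            )
        self.hash_calls = 0  # counts only hashes performed by update()

    @property
    def root(self) -> bytes:
        return self.levels[-1][0]

    def update(self, index: int, item: bytes) -> None:
        """Replace one leaf and recompute every ancestor up to the root."""
        self.levels[0][index] = H(item)
        self.hash_calls += 1
        for depth in range(1, len(self.levels)):
            index //= 2
            below = self.levels[depth - 1]
            self.levels[depth][index] = H(below[2 * index] + below[2 * index + 1])
            self.hash_calls += 1

items = [f"account-{i}".encode() for i in range(1024)]
tree = CountingMerkleTree(items)
tree.update(7, b"account-7-new-balance")
print(tree.hash_calls)  # 11: one leaf hash plus 10 internal nodes
```

One write to a 1024-leaf tree costs 11 hashes and touches 11 stored nodes. A block that modifies many keys multiplies that, which is where the I/O and CPU pressure comes from.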

2. Proof Size Problem

Theoretical complexity:

O(log n)

Practical reality:

  • ~500 bytes to 2KB per proof
  • A block with ~1000 accesses → ~1MB witness

This directly impacts:

  • stateless client feasibility
  • rollup bandwidth
  • light client performance

Merkle proofs scale logarithmically, but not efficiently enough in practice.
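The witness estimate above is simple multiplication, sketched here with the per-proof range quoted in this section. The function name is hypothetical, and the result is a rough upper estimate under the assumption that proofs are sent independently; proofs that share upper nodes can be deduplicated in practice.

```python
def witness_bytes(accesses: int, proof_bytes: int) -> int:
    """Naive witness size: every state access ships its own proof."""
    return accesses * proof_bytes

accesses = 1000
low = witness_bytes(accesses, 500)     # 500_000 bytes
high = witness_bytes(accesses, 2048)   # 2_048_000 bytes
print(low / 1_000_000, high / 1_000_000)  # 0.5 2.048 (MB)
```

So "~1MB" sits in the middle of a 0.5 MB to 2 MB band, paid on every block by every client that verifies statelessly.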

3. ZK Mismatch (Critical)

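Inside a ZK circuit, a hash function's cost is measured in arithmetic constraints, not CPU cycles. Bit-oriented hashes such as SHA-256 are expensive to express this way; algebraic hashes such as Poseidon are designed for it.

Hash     | Approx. ZK cost (constraints per hash)
---------|---------------------------------------
SHA-256  | ~25k
Poseidon | ~200–300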

That’s not a small gap.

That’s a design failure if ignored.
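A quick calculation shows how the per-hash gap compounds along a Merkle path. The constraint counts are the rough figures quoted in this section (with 250 taken as a midpoint for Poseidon), and the depth of 32 is an assumption chosen for illustration.

```python
SHA256_CONSTRAINTS = 25_000    # rough figure from the table above
POSEIDON_CONSTRAINTS = 250     # assumed midpoint of the ~200-300 range
DEPTH = 32                     # hypothetical tree depth

def path_constraints(per_hash: int, depth: int) -> int:
    """Constraints needed to verify one Merkle path inside a circuit."""
    return per_hash * depth

sha_path = path_constraints(SHA256_CONSTRAINTS, DEPTH)
poseidon_path = path_constraints(POSEIDON_CONSTRAINTS, DEPTH)
print(sha_path, poseidon_path, sha_path // poseidon_path)  # 800000 8000 100
```

Under these assumptions, proving a single inclusion costs about 800k constraints with SHA-256 versus about 8k with Poseidon, and a circuit that checks thousands of paths scales both numbers linearly.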

4. State Structure Mismatch

Plain binary Merkle trees assume:

  • ordered data
  • static structure

Blockchain state is:

  • dynamic
  • sparse
  • key-value

Result:

  • Merkle Patricia Tries
  • hexary tries
  • complexity explosion

Ethereum Reality: It’s Not a Simple Merkle Tree

Ethereum does not use a binary Merkle tree.

It uses a Merkle Patricia Trie, which introduces:

  • Hexary branching (16 children per node)
  • Key hashing using Keccak-256
  • Separate tries for accounts and storage

Implications:

  • more complex paths (branch, extension, and leaf nodes)
  • larger proofs, because each branch node on the path carries up to 15 sibling hashes
  • higher update cost

This design made Ethereum flexible, but not optimal for scalability.
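A back-of-the-envelope comparison suggests where the larger proofs come from. The sketch below models an idealized, fully populated tree with 32-byte hashes and ignores node encoding overhead and extension or leaf nodes, so the numbers are illustrative rather than measurements of Ethereum's trie. The helper names are hypothetical.

```python
def levels_needed(n_leaves: int, branching: int) -> int:
    """Smallest depth d such that branching**d >= n_leaves."""
    depth, capacity = 0, 1
    while capacity < n_leaves:
        capacity *= branching
        depth += 1
    return depth

def proof_bytes(n_leaves: int, branching: int, hash_size: int = 32) -> int:
    """Each level contributes (branching - 1) sibling hashes to the proof."""
    siblings_per_level = branching - 1
    return levels_needed(n_leaves, branching) * siblings_per_level * hash_size

n = 2 ** 28
binary = proof_bytes(n, 2)    # 28 levels * 1 sibling * 32 bytes
hexary = proof_bytes(n, 16)   # 7 levels * 15 siblings * 32 bytes
print(binary, hexary)  # 896 3360
```

The hexary tree is shallower (7 levels instead of 28), but every level costs 15 siblings instead of one, so the proof is several times larger in this model.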

ZK Reality (Most Engineers Underestimate This)

Cost_zk(hash) >> Cost_cpu(hash)

This flips design priorities.

Implication

Future systems must be:

ZK-native, not ZK-compatible

Evolution: From Merkle to Verkle

Merkle Trees solve:

  • integrity

They do NOT solve:

  • proof size
  • stateless scalability
  • zk efficiency

Verkle Trees (Next Generation)

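Instead of hashing children together, each node commits to its children with a vector commitment built from polynomial-commitment techniques. A Pedersen-style commitment has the form:

C = Σ (a_i * g_i)

where the a_i are the committed values and the g_i are fixed, independent group elements (elliptic-curve points in practice).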
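The structure of this commitment can be sketched with a toy group. The code below works in the multiplicative group of integers modulo a prime, so the sum Σ a_i * g_i appears as a product of powers; this is the same construction in different notation. The modulus, the hash-derived `generator` function, and all names are illustrative assumptions and are not secure. Real designs use elliptic-curve groups and add opening proofs, which are not shown.

```python
import hashlib

P = 2**127 - 1  # a Mersenne prime; toy modulus, NOT suitable for real use

def generator(i: int) -> int:
    """Derive a group element g_i by hashing the index (toy setup)."""
    digest = hashlib.sha256(f"generator-{i}".encode()).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 2   # value in [2, P-1]

def commit(values: list[int]) -> int:
    """C = prod(g_i ** a_i) mod P, the multiplicative form of sum(a_i * g_i)."""
    c = 1
    for i, a in enumerate(values):
        c = c * pow(generator(i), a, P) % P
    return c

def update(commitment: int, index: int, old: int, new: int) -> int:
    """Adjust the commitment for one changed value without touching the others."""
    delta = (new - old) % (P - 1)        # exponents live modulo the group order
    return commitment * pow(generator(index), delta, P) % P

values = [5, 7, 11, 13]
c = commit(values)
c_updated = update(c, 2, old=11, new=42)
print(c_updated == commit([5, 7, 42, 13]))  # True
```

The `update` function illustrates a property that hash-based nodes lack: changing one child adjusts the parent commitment with a single group operation, without re-reading the sibling values.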

What Changes

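Property     | Merkle          | Verkle
-------------|-----------------|-------------------------------
Proof size   | O(log n) hashes | ~constant per access
Commitment   | hash function   | vector (polynomial) commitment
Verification | hashing         | elliptic-curve operations (pairings with KZG; an inner product argument in Ethereum's proposed design)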

Why Verkle Trees Matter

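A Merkle proof carries sibling hashes at every level, so its size grows with tree depth and branching width.

A Verkle proof needs no sibling hashes. It carries the commitments along the path plus a small opening proof, and openings for many keys can be aggregated, so the per-access cost is nearly constant.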

This changes everything:

  • Smaller witnesses per block
  • Practical stateless clients
  • Reduced bandwidth requirements

This is not an optimization.

This is a requirement for Ethereum’s future scalability.

The Compression Stack (Unifying Insight)

This is the part most people never connect:

  • Merkle → compress data
  • Verkle → compress proofs
  • ZK → compress verification

System Objective

min(trust, data, verification cost)

That’s the real optimization problem.

Post-Quantum Reality (Subtle but Important)

Merkle Trees rely on hash functions.

So:

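Primitive             | Status
----------------------|-------
ECDSA                 | broken by Shor's algorithm on a large quantum computer
RSA                   | broken by Shor's algorithm
Merkle trees          | degraded but viable (Grover's algorithm gives only a quadratic speedup on preimage search)
Hash-based signatures | strong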
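The last row can be made concrete with a Lamport one-time signature, one of the simplest hash-based schemes. This is a minimal sketch assuming SHA-256; the function names are hypothetical, and a key pair must sign only one message, since each signature reveals half of the secret key.

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keygen():
    """Secret key: 256 pairs of random values. Public key: their hashes."""
    sk = [[secrets.token_bytes(32) for _ in range(2)] for _ in range(256)]
    pk = [[H(s) for s in pair] for pair in sk]
    return sk, pk

def message_bits(msg: bytes) -> list[int]:
    """The 256 bits of H(msg), most significant bit first."""
    d = H(msg)
    return [(d[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(sk, msg: bytes) -> list[bytes]:
    """Reveal one secret per digest bit, chosen by that bit's value."""
    return [sk[i][b] for i, b in enumerate(message_bits(msg))]

def verify(pk, msg: bytes, sig: list[bytes]) -> bool:
    if len(sig) != 256:
        return False
    return all(H(s) == pk[i][b]
               for i, (s, b) in enumerate(zip(sig, message_bits(msg))))

sk, pk = keygen()
sig = sign(sk, b"block header")
print(verify(pk, b"block header", sig), verify(pk, b"forged header", sig))  # True False
```

Security rests only on the hash function's preimage resistance, which is why schemes in this family are considered robust against quantum attacks. Practical variants use Merkle trees to bundle many one-time keys under a single public root.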

The Hidden Insight

Merkle Trees are not about trees.

They are about:

compressing trust into a verifiable commitment

Final Thought

Merkle Trees explain how blockchains compress data into commitments.

But real systems are not limited by correctness.

They are limited by:

  • bandwidth
  • state growth
  • verification cost

This is why the evolution toward Verkle Trees and ZK systems is not optional.

It is inevitable.

Because the goal is not just to store truth.

It is to make truth efficiently verifiable at global scale.
