Ankita Virani

Posted on Apr 19

From Merkle Trees to Verkle Commitments: How Ethereum Actually Stores and Proves Truth

#ethereum #blockchain #web3 #cryptography

The Assumption That Quietly Breaks Most Systems

Most engineers think they understand how blockchain stores data.

They don’t.

They operate on simplified mental models:

“Transactions are stored in blocks”
“Merkle Trees are used for verification”

These are not wrong.
They are incomplete.

And incomplete mental models lead to bad system design.

The Core Misconception

A blockchain does not store truth.

It stores a cryptographic commitment to a state.

Formally:

R = C(D)

Where:

D is the full dataset (transactions, accounts, state)
C is a commitment function
R is a fixed-size root (typically 32 bytes)

This is the fundamental abstraction most people miss.

Why This Matters (System Consequences)

This single design choice enables:

Sub-linear verification
Stateless clients
Light client security
Rollup architectures
ZK proof systems

But it also imposes a requirement:

If you don’t understand commitments, you cannot reason about modern blockchain systems.

The Ethereum System Model (What Most Engineers Never Formalize)

Stop thinking in components. Think in layers.

Ethereum System = {
  Commitment Layer   → Data compression
  Execution Layer    → State transition
  Verification Layer → Proof of correctness
}


1. Commitment Layer

R = C(D)

Compresses large state into a constant-size root.

2. Execution Layer

S(t+1) = E(S(t), T)

Where:

S(t) is the current state
T is the transaction set

3. Verification Layer

V(S, T, R) = valid

Ensures correctness without recomputing everything.

Critical Insight

These layers are conceptually separate but tightly coupled:

Execution produces state
Commitment compresses it
Verification proves it

If your commitment layer is weak, everything above it collapses.
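
To make the three layers concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the state is a plain balance map, `commit` is a SHA-256 hash over a canonical JSON encoding (not Ethereum's actual trie commitment), and `verify` is the naive verifier that simply re-executes. The whole point of the proof systems discussed later is to replace that last step with something cheaper.

```python
import hashlib
import json

def commit(state: dict) -> bytes:
    """Commitment layer: R = C(D). Toy commitment over a canonical encoding."""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).digest()

def execute(state: dict, txs: list) -> dict:
    """Execution layer: S(t+1) = E(S(t), T). Applies (sender, recipient, amount) transfers."""
    new_state = dict(state)  # never mutate the caller's state
    for sender, recipient, amount in txs:
        if new_state.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state

def verify(state: dict, txs: list, claimed_root: bytes) -> bool:
    """Verification layer: V(S, T, R). Naive version: re-execute and compare roots."""
    try:
        return commit(execute(state, txs)) == claimed_root
    except ValueError:
        return False

s0 = {"alice": 10, "bob": 0}
txs = [("alice", "bob", 3)]
s1 = execute(s0, txs)
print(verify(s0, txs, commit(s1)))  # True
```

Note that the verifier only ever compares 32-byte roots. What it needs in order to trust a root is exactly what the rest of this article is about.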

Merkle Trees Are Not a Data Structure

They are a commitment scheme over a dataset.

That distinction is not academic. It is architectural.

Formal Construction

Let:

D = {x1, x2, ..., xn}

Step 1: Leaf Hashing

h_i = H(x_i)

Step 2: Pairwise Aggregation

h(i,j) = H(h_i || h_j)

Step 3: Recursive Reduction

Continue until a single value remains:

R = C_merkle(D)
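
The three steps can be sketched in a few lines of Python. This is a minimal illustration, not Ethereum's construction: it assumes SHA-256 for H, and when a level has an odd number of nodes it duplicates the last one, which is one common convention (Bitcoin uses it) among several. It also omits the domain separation between leaves and internal nodes that a production design would want.

```python
import hashlib

def H(data: bytes) -> bytes:
    """Hash function H (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()

def merkle_root(items: list) -> bytes:
    """Compute R = C_merkle(D) for a non-empty list of byte strings."""
    if not items:
        raise ValueError("cannot commit to an empty dataset")
    level = [H(x) for x in items]              # Step 1: leaf hashing
    while len(level) > 1:                      # Step 3: recursive reduction
        if len(level) % 2 == 1:
            level = level + [level[-1]]        # pad odd levels (assumed convention)
        level = [H(level[i] + level[i + 1])    # Step 2: pairwise aggregation
                 for i in range(0, len(level), 2)]
    return level[0]

D = [b"tx1", b"tx2", b"tx3", b"tx4"]
R = merkle_root(D)
print(R.hex())
```

Whatever the size of D, the output is always 32 bytes.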

What the Root Actually Represents

The root R:

Represents the entire dataset
Is constant size
Changes if any element changes

Formally:

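x_k ≠ x_k' → R ≠ R'   (except with negligible probability, assuming H is collision-resistant)

This is a global integrity guarantee. It is computational, not absolute: it holds exactly as long as the hash function does.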

Visual Model: Data → Commitment

One root. Entire dataset.

Change one bit → new root.

That’s the contract.

Proof of Inclusion (Where the Real Power Is)

Compression is not the innovation.

Selective verification is.

Definition

For element x_k, the proof is:

π_k = {s1, s2, ..., s_log(n)}

Sibling hashes along the path.

Verification

h_k = H(x_k)

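h_1 = H(h_k || s1)
h_2 = H(h_1 || s2)
...
R̂ = h_log(n)

The concatenation order at each level follows position in the tree: when the sibling is the left child, compute H(s || h) instead of H(h || s).

Check: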

R̂ == R

What’s Actually Happening

You reconstruct the root from:

one leaf
a logarithmic proof

No full dataset needed.
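
Here is a minimal sketch of proof generation and verification, under the same assumptions as before (SHA-256, duplicate-last padding for odd levels). The function names `build_levels`, `prove`, and `verify` are made up for this example. Each proof entry carries the sibling hash plus a flag saying which side it sits on, because the concatenation order matters.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(items: list) -> list:
    """Return every level of the tree, leaves first, root level last."""
    level = [H(x) for x in items]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]   # pad odd levels (assumed convention)
            levels[-1] = level            # store the padded level for proofs
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels: list, k: int) -> list:
    """Collect (sibling_hash, sibling_is_left) pairs from leaf k up to the root."""
    proof = []
    for level in levels[:-1]:             # the root level has no sibling
        sibling = k ^ 1                   # neighbor index within the pair
        proof.append((level[sibling], sibling < k))
        k //= 2                           # index of the parent in the next level
    return proof

def verify(x: bytes, proof: list, root: bytes) -> bool:
    """Recompute the root from one leaf and its proof, then compare."""
    h = H(x)
    for sibling, sibling_is_left in proof:
        h = H(sibling + h) if sibling_is_left else H(h + sibling)
    return h == root

D = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
levels = build_levels(D)
root = levels[-1][0]
proof = prove(levels, 4)
print(len(proof), verify(D[4], proof, root))  # 3 True
```

The verifier touches one leaf and three hashes here, never the other four transactions.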

Complexity

Operation	Complexity
Build	O(n)
Proof	O(log n)
Verify	O(log n)

Real Blockchain Flow

Verification:

User → Request Proof  
→ Receive (Transaction + Merkle Proof)  
→ Recompute Root  
→ Compare with Block Header Root  
→ Accept / Reject

Where Merkle Trees Break (Real Systems, Not Textbooks)

This is where most articles stop. This is where real engineering starts.

1. Write Amplification

Update cost = O(log n) hash recomputations per modified leaf

In Ethereum:

State size exceeds 100GB
Each block touches hundreds to thousands of accounts
Every update requires recomputing multiple trie nodes

This creates sustained pressure on:

disk I/O
CPU hashing
state synchronization
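
A back-of-the-envelope calculation shows how quickly this adds up. The numbers below are assumptions chosen for illustration (an idealized balanced binary tree with 2^28 leaves, 1,000 touched leaves per block), not measurements from an Ethereum client. The result is an upper bound, since updates that share ancestors can rehash those ancestors once.

```python
def tree_depth(n_leaves: int, branching: int = 2) -> int:
    """Smallest depth whose capacity (branching ** depth) covers n_leaves."""
    depth, capacity = 0, 1
    while capacity < n_leaves:
        capacity *= branching
        depth += 1
    return depth

def rehash_upper_bound(n_leaves: int, touched: int, branching: int = 2) -> int:
    """Upper bound on node hashes recomputed when `touched` leaves change."""
    # Each changed leaf dirties itself plus one ancestor per level up to the root.
    return touched * (tree_depth(n_leaves, branching) + 1)

N_LEAVES = 2**28   # hypothetical state size
TOUCHED = 1_000    # hypothetical leaves modified in one block

print(tree_depth(N_LEAVES))                    # 28
print(rehash_upper_bound(N_LEAVES, TOUCHED))   # 29000
```

Tens of thousands of hashes per block is cheap on a CPU. The disk reads and writes behind each of those nodes are what hurt.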

2. Proof Size Problem

Theoretical complexity:

O(log n)

Practical reality:

~500 bytes to 2KB per proof
A block with ~1000 accesses → ~1MB witness

This directly impacts:

stateless client feasibility
rollup bandwidth
light client performance

Merkle proofs scale logarithmically, but not efficiently enough in practice.
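
The witness estimate above is simple multiplication, worked through here with the figures from this section. The function name is made up for the example, and the calculation is naive on purpose: it ignores the deduplication of trie nodes shared between proofs, so real witnesses can come in below it.

```python
def naive_witness_bytes(accesses: int, proof_bytes: int) -> int:
    """Witness size if every state access ships its own full proof."""
    return accesses * proof_bytes

ACCESSES = 1_000                                # state accesses in one block
low = naive_witness_bytes(ACCESSES, 500)        # ~500 bytes per proof
high = naive_witness_bytes(ACCESSES, 2_000)     # ~2 KB per proof

print(f"{low / 1e6:.1f} MB to {high / 1e6:.1f} MB")  # 0.5 MB to 2.0 MB
```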

3. ZK Mismatch (Critical)

Hash functions behave differently in zk systems.

Hash	Approx. ZK cost (constraints per hash)
SHA-256	~25k
Poseidon	~200–300

That’s not a small gap.

That’s a design failure if ignored.
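
To see what the gap means for a single Merkle proof inside a circuit, multiply the per-hash cost by the path length. The constraint counts are the rough figures from the table (250 is taken as a midpoint for Poseidon), and the depth of 32 is a hypothetical chosen for illustration.

```python
SHA256_CONSTRAINTS = 25_000   # approximate, per hash
POSEIDON_CONSTRAINTS = 250    # approximate midpoint of ~200-300, per hash
DEPTH = 32                    # hypothetical Merkle path length

def path_constraints(per_hash: int, depth: int) -> int:
    """Constraints needed to verify one Merkle path of `depth` hashes in-circuit."""
    return per_hash * depth

sha = path_constraints(SHA256_CONSTRAINTS, DEPTH)
poseidon = path_constraints(POSEIDON_CONSTRAINTS, DEPTH)

print(sha)              # 800000
print(poseidon)         # 8000
print(sha // poseidon)  # 100
```

A 100x difference per proof, multiplied across every state access in a block, decides whether a proving system is practical at all.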

4. State Structure Mismatch

Merkle Trees assume:

ordered data
static structure

Blockchain state is:

dynamic
sparse
key-value

Result:

Merkle Patricia Trees
hexary tries
complexity explosion

Ethereum Reality: It’s Not a Simple Merkle Tree

Ethereum does not use a binary Merkle tree.

It uses a Merkle Patricia Trie, which introduces:

Hexary branching (16 children per node)
Key hashing using Keccak-256
Separate tries for accounts and storage

Implications:

more complex paths (branch, extension, and leaf nodes)
larger proofs (each branch node on the path exposes up to 15 sibling hashes)
higher update cost

This design made Ethereum flexible, but not optimal for scalability.
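
The effect of branching factor on proof size can be estimated with a short calculation. This sketch assumes an idealized, fully balanced tree with 32-byte hashes and ignores RLP encoding and extension or leaf nodes, so it is a rough comparison rather than a measurement of real Ethereum proofs. A wider tree has fewer levels, but every level on the path must reveal all the other children of that node.

```python
HASH_BYTES = 32

def depth_for(n_leaves: int, branching: int) -> int:
    """Smallest depth whose capacity (branching ** depth) covers n_leaves."""
    depth, capacity = 0, 1
    while capacity < n_leaves:
        capacity *= branching
        depth += 1
    return depth

def proof_bytes(n_leaves: int, branching: int) -> int:
    """Sibling hashes needed per level times the number of levels."""
    siblings_per_level = branching - 1
    return depth_for(n_leaves, branching) * siblings_per_level * HASH_BYTES

N = 16**7  # hypothetical leaf count (equal to 2**28)

print(proof_bytes(N, 2))    # 896   (28 levels x 1 sibling x 32 bytes)
print(proof_bytes(N, 16))   # 3360  (7 levels x 15 siblings x 32 bytes)
```

Under these assumptions the hexary layout costs several times more bytes per proof than a binary tree over the same data.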

ZK Reality (Most Engineers Underestimate This)

Cost_zk(hash) >> Cost_cpu(hash)

This flips design priorities.

Implication

Future systems must be:

ZK-native, not ZK-compatible

Evolution: From Merkle to Verkle

Merkle Trees solve:

integrity

They do NOT solve:

proof size
stateless scalability
zk efficiency

Verkle Trees (Next Generation)

Instead of hashing pairs, they commit to all children of a node at once with a vector commitment (built from polynomial commitment techniques).

C = Σ (a_i * g_i)
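
The shape of that formula can be demonstrated with a toy example. This sketch is deliberately insecure and simplified: it works in the multiplicative group modulo a prime rather than on an elliptic curve, so the sum Σ a_i * g_i becomes a product of powers, and the bases 3, 5, 7, 11 are arbitrary placeholders rather than properly generated points. It shows the algebra only. It does not produce or verify opening proofs.

```python
P = 2**127 - 1                 # a Mersenne prime; toy modulus, not a secure group
GENERATORS = [3, 5, 7, 11]     # hypothetical fixed bases g_i

def commit(values: list) -> int:
    """C = product of g_i ** a_i mod P (multiplicative form of sum a_i * g_i)."""
    c = 1
    for g, a in zip(GENERATORS, values):
        c = (c * pow(g, a, P)) % P
    return c

def update(c: int, i: int, old: int, new: int) -> int:
    """Change slot i from old to new without touching the other slots."""
    delta = (new - old) % (P - 1)   # exponents wrap mod P - 1 (Fermat's little theorem)
    return (c * pow(GENERATORS[i], delta, P)) % P

a = [1, 2, 3, 4]
C = commit(a)

# One multiplication updates the commitment; no sibling values are needed.
print(update(C, 2, 3, 10) == commit([1, 2, 10, 4]))  # True
```

Two properties stand out: the commitment to many children is a single group element, and it is homomorphic, so an update needs only the old and new value of the slot that changed.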

What Changes

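Property	Merkle	Verkle
Proof size	O(log n) hashes, times sibling count	near-constant per access
Commitment	hash function	vector (polynomial) commitment
Verification	hashing	elliptic-curve operations (inner-product argument in Ethereum's proposed design; pairings with KZG)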

Why Verkle Trees Matter

Merkle proof size grows with tree depth.

Verkle proof size is nearly constant.

This changes everything:

Smaller witnesses per block
Practical stateless clients
Reduced bandwidth requirements

This is not an optimization.

This is a requirement for Ethereum’s future scalability.

The Compression Stack (Unifying Insight)

This is the part most people never connect:

Merkle → compress data
Verkle → compress proofs
ZK → compress verification

System Objective

min(trust, data, verification cost)

That’s the real optimization problem.

Post-Quantum Reality (Subtle but Important)

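Merkle Trees rely only on hash functions, which quantum algorithms weaken (Grover's search) but do not break outright.

So:

Primitive	Post-quantum status
ECDSA	broken (Shor's algorithm)
RSA	broken (Shor's algorithm)
Merkle	degraded but viable with large enough hash outputs
Hash-based signatures	strong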

The Hidden Insight

Merkle Trees are not about trees.

They are about:

compressing trust into a verifiable commitment

Final Thought

Merkle Trees explain how blockchains compress data into commitments.

But real systems are not limited by correctness.

They are limited by:

bandwidth
state growth
verification cost

This is why the evolution toward Verkle Trees and ZK systems is not optional.

It is inevitable.

Because the goal is not just to store truth.

It is to make truth efficiently verifiable at global scale.