You get a file hash and a proof that claims it's anchored in some blockchain transaction. How do you verify that without trusting the service that created it?
The answer is Merkle proof validation. You take the file's SHA-256, combine it with sibling hashes step by step, and see if you end up at the same root that's stored on-chain. If the math checks out, the proof is valid. If it doesn't, the file, the proof, or the claimed root is not what it's supposed to be.
Here's how to build that validation from scratch.
What You're Actually Checking
A Merkle tree bundles multiple file hashes into a single root hash. Each level combines pairs of hashes until you get one value at the top. That root gets written to a blockchain transaction.
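As a rough sketch of how a root comes together, assume each parent is the SHA-256 of the two children's raw bytes concatenated, and that a level with an odd number of hashes duplicates its last one. That odd-level rule is a common convention but not a universal one, so your anchoring service may pair hashes differently. The names hash_pair and merkle_root are illustrative:

```python
import hashlib

def hash_pair(left_hex, right_hex):
    """Hash the concatenated raw bytes of two hex-encoded hashes."""
    return hashlib.sha256(bytes.fromhex(left_hex + right_hex)).hexdigest()

def merkle_root(leaf_hashes):
    """Reduce a list of leaf hashes to a single root (hypothetical helper).

    Assumption: a level with an odd count duplicates its last hash.
    """
    if not leaf_hashes:
        raise ValueError("need at least one leaf hash")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        # Pair neighbours: (0, 1), (2, 3), ...
        level = [hash_pair(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Three made-up "files", hashed to get the leaves
leaves = [hashlib.sha256(name.encode()).hexdigest() for name in ("a.txt", "b.txt", "c.txt")]
print(merkle_root(leaves))
```

With three leaves, the third one is paired with a copy of itself, so the tree still has two levels above the leaves.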
The proof gives you a path from your file hash to that root. At each step, you get a sibling hash and a position (left or right). You combine them, hash the result, and move up one level.
import hashlib

def combine_hashes(left_hash, right_hash):
    """Combine two SHA-256 hashes into their parent hash"""
    combined = left_hash + right_hash
    return hashlib.sha256(bytes.fromhex(combined)).hexdigest()

# Example: combining two leaf hashes
left = "a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3"
right = "b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78"
parent = combine_hashes(left, right)
print(f"Parent: {parent}")
The critical detail: you convert the hex strings to bytes before hashing. Using .encode('utf-8') on the hex string gives you a different result entirely.
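To see the difference concretely, here is a small comparison of the two encodings. The variable names are only for illustration:

```python
import hashlib

hex_hash = "a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3"

raw = bytes.fromhex(hex_hash)      # the 32 bytes of the digest itself
text = hex_hash.encode("utf-8")    # the 64 ASCII characters of the hex string

print(len(raw), len(text))  # 32 64

# Different input bytes, so the resulting hashes differ
print(hashlib.sha256(raw).hexdigest() == hashlib.sha256(text).hexdigest())  # False
```

Which form a given service hashes is part of its proof format. The code in this article assumes raw bytes.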
Walking Up the Tree
A proof contains your starting hash plus a list of steps. Each step has a sibling hash and whether it goes on the left or right side.
def verify_merkle_proof(file_hash, proof_steps):
    """
    Verify a Merkle proof by walking from leaf to root

    Args:
        file_hash: SHA-256 of your file (64-char hex)
        proof_steps: List of dicts with 'hash' and 'position' keys

    Returns:
        Final root hash after walking all steps
    """
    current = file_hash
    for step in proof_steps:
        sibling = step['hash']
        position = step['position']
        if position == 'left':
            # Sibling goes left, current goes right
            current = combine_hashes(sibling, current)
        else:
            # Current goes left, sibling goes right
            current = combine_hashes(current, sibling)
        print(f"Step: {current}")
    return current
The position tells you which side the sibling goes on, not your current hash. If the sibling position is "left", your hash goes on the right, and vice versa. The naming can be confusing, but the math is straightforward.
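One way to sanity-check the walk is to build a small tree yourself, generate a proof for one leaf, and confirm that it lands on the root. The sketch below assumes the same pairing rule as above (raw bytes concatenated, then SHA-256) and duplicates the last hash on odd levels. The names combine, build_proof and walk are illustrative, not part of any service's API:

```python
import hashlib

def combine(left_hex, right_hex):
    """Parent hash of two hex-encoded children (raw bytes concatenated)."""
    return hashlib.sha256(bytes.fromhex(left_hex + right_hex)).hexdigest()

def build_proof(leaves, index):
    """Return (proof_steps, root) for leaves[index] (hypothetical helper)."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    steps = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate last hash on odd levels
        if index % 2 == 0:
            # Our node is a left child, so the sibling sits on the right
            steps.append({"hash": level[index + 1], "position": "right"})
        else:
            steps.append({"hash": level[index - 1], "position": "left"})
        level = [combine(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return steps, level[0]

def walk(leaf_hex, steps):
    """Recompute the root from a leaf and its proof steps."""
    current = leaf_hex
    for step in steps:
        if step["position"] == "left":
            current = combine(step["hash"], current)
        else:
            current = combine(current, step["hash"])
    return current

leaves = [hashlib.sha256(bytes([i])).hexdigest() for i in range(5)]
steps, root = build_proof(leaves, 3)
print(walk(leaves[3], steps) == root)  # True
```

Swapping in any other leaf hash while keeping the same steps should produce a different root, which is the failure case a validator needs to catch.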
Complete Validation Example
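Let's verify a proof with a realistic structure. You'd get something like this from a blockchain anchoring service (treat the hash values here as illustrative and substitute the ones from your own proof):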
# Your file's SHA-256 hash
file_hash = "a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3"

# Proof steps to walk up the tree
proof_steps = [
    {
        'hash': 'b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78',
        'position': 'right'
    },
    {
        'hash': 'c3499c2729730a7f807efb8676a266dcdb59aa4f7b1a27d3c33a9e8f1b14e8b5',
        'position': 'left'
    }
]

# Expected root (what should be on the blockchain)
expected_root = "d5688a52d55a02ec4aea5ec1eadfffe1c9e0ee6a4ddbe2377f98326d42dfc975"

# Verify the proof
calculated_root = verify_merkle_proof(file_hash, proof_steps)
if calculated_root == expected_root:
    print(f"✓ Proof valid! Root matches: {calculated_root}")
else:
    print("✗ Proof invalid!")
    print(f"  Expected: {expected_root}")
    print(f"  Got: {calculated_root}")
If this prints a checkmark, your file's hash was included in the tree that produced this root. If it fails, something doesn't line up: the proof is wrong, the expected root is wrong, or the file has been modified.
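A plain == comparison is case-sensitive, and some tools may print hashes in uppercase or with a 0x prefix. A small normalizing step avoids false mismatches and rejects malformed input early. This is a sketch, and normalize_hash and roots_match are hypothetical helpers:

```python
import string

def normalize_hash(value):
    """Lowercase a SHA-256 hex string and strip an optional 0x prefix."""
    cleaned = value.strip().lower()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    if len(cleaned) != 64 or any(c not in string.hexdigits for c in cleaned):
        raise ValueError(f"not a 64-character hex hash: {value!r}")
    return cleaned

def roots_match(calculated_root, expected_root):
    """Compare two roots after normalizing their text form."""
    return normalize_hash(calculated_root) == normalize_hash(expected_root)

root = "d5688a52d55a02ec4aea5ec1eadfffe1c9e0ee6a4ddbe2377f98326d42dfc975"
print(roots_match(root, "0x" + root.upper()))  # True
```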
What This Actually Proves
Merkle proof validation tells you two things:
- Inclusion: Your file hash was part of the batch when the root was calculated
- Integrity: The file hasn't changed since then (different file = different hash = proof fails)
What it doesn't prove:
- That the root actually exists on a blockchain (you need to check that separately)
- When the anchoring happened (that's in the transaction timestamp)
- Who created the proof (anyone with the tree structure can generate valid proofs)
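How you check the first point depends on the chain and on how the service embeds the root, so there is no single recipe. As a minimal sketch, assume you have already fetched the transaction's data payload as hex from a node or block explorer you trust, and that the service writes the raw 32-byte root somewhere in that payload. Both are assumptions about your setup, and root_in_payload is a hypothetical helper:

```python
def root_in_payload(expected_root_hex, tx_payload_hex):
    """Return True if the 32-byte root appears in the transaction payload.

    Assumption: the payload is hex text, optionally prefixed with 0x,
    and the root is embedded as raw bytes.
    """
    payload = tx_payload_hex.strip().lower()
    if payload.startswith("0x"):
        payload = payload[2:]
    root = bytes.fromhex(expected_root_hex)
    if len(root) != 32:
        raise ValueError("expected a 32-byte SHA-256 root")
    # Compare as bytes so a match cannot start halfway through a byte
    return root in bytes.fromhex(payload)

# Made-up payload: a 4-byte prefix followed by the root
root = "d5688a52d55a02ec4aea5ec1eadfffe1c9e0ee6a4ddbe2377f98326d42dfc975"
payload = "0x" + "deadbeef" + root
print(root_in_payload(root, payload))  # True
```

Fetching the payload is deliberately left out, since that step is specific to each chain and client.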
The math holds as long as SHA-256 remains collision-resistant, but you're still trusting that the blockchain transaction is real and the timestamp is accurate.
Build Your Own Validator
You can wrap this into a complete validation function:
def validate_blockchain_proof(file_path, proof_data):
    """Validate a file against a blockchain proof"""
    # Hash the file
    with open(file_path, 'rb') as f:
        file_content = f.read()
    file_hash = hashlib.sha256(file_content).hexdigest()

    # Extract proof components
    merkle_steps = proof_data.get('merkle_path', [])
    expected_root = proof_data.get('merkle_root')
    if not merkle_steps or not expected_root:
        return {'valid': False, 'error': 'Missing proof data'}

    # Walk the tree
    calculated_root = verify_merkle_proof(file_hash, merkle_steps)
    return {
        'valid': calculated_root == expected_root,
        'file_hash': file_hash,
        'calculated_root': calculated_root,
        'expected_root': expected_root
    }
This gives you independent verification of any Merkle proof that uses the same conventions: SHA-256, raw-byte concatenation, and a path of sibling hashes with positions. The service that created it doesn't need to be online. The algorithm works the same whether you're validating evidence for legal proceedings, checking backup integrity, or auditing content pipelines.
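For large files, reading everything into memory at once may be impractical. A variant of the hashing step that reads fixed-size chunks produces the same digest. The name hash_file and the 1 MiB default chunk size are illustrative choices:

```python
import hashlib
import os
import tempfile

def hash_file(file_path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a throwaway file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example contents" * 1000)
    demo_path = tmp.name
try:
    print(hash_file(demo_path, chunk_size=4096))
finally:
    os.remove(demo_path)
```

The result can replace the file_hash computed inside the validator without changing anything else.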