Introduction
In my previous article, I tried to describe the concept of a blockchain with code. This time, I'll try to describe the structure of a single block. I will use the Bitcoin blockchain to explain blocks, but keep in mind that the concepts remain more or less the same for other blockchains. It could be useful to read my last article first to understand a few things.
Structure of a block
A block is a container data structure. In the Bitcoin world, a block contains more than 500 transactions on average. The average size of a block seems to be 1 MB (source). In Bitcoin Cash (a hard fork of the Bitcoin blockchain), the size of a block can go up to 8 MB. This enables more transactions to be processed per second.
Anyway, a block is composed of a header and a long list of transactions. Let's start with the header.
Block Header
The header contains metadata about a block. There are three different sets of metadata:
The previous block hash. Remember that in a blockchain, every block inherits from the previous block because we use the previous block's hash to create the new block's hash. For every block N, we feed it the hash of the block N-1.
Mining competition. For a block to be part of the blockchain, it needs to be given a valid hash. This set contains the timestamp, the nonce and the difficulty, the values miners work with to produce that valid hash. Mining is another crucial part of the blockchain technology, but it is outside the scope of this article.
The merkle tree root. This is a hash that summarizes all the transactions in the block. We will leave it at that for now; more on this later.
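As a rough sketch, the three sets of metadata above map onto the six fields of the 80-byte Bitcoin block header. The name `HEADER_FIELDS` is made up for this illustration, and the `version` field is not part of the three sets described here; the sizes are the field widths in bytes.

```javascript
// Illustrative layout of a Bitcoin block header (sizes in bytes).
// HEADER_FIELDS is a name invented for this sketch, not part of any library.
const HEADER_FIELDS = [
  { name: 'version', size: 4 },            // block format version
  { name: 'previousBlockHash', size: 32 }, // link to block N-1
  { name: 'merkleRoot', size: 32 },        // summary of the transactions
  { name: 'timestamp', size: 4 },          // mining: creation time
  { name: 'bits', size: 4 },               // mining: encoded difficulty target
  { name: 'nonce', size: 4 }               // mining: counter changed by miners
]

const headerSize = HEADER_FIELDS.reduce((total, field) => total + field.size, 0)
console.log(headerSize) // 80
```

Whatever the number of transactions in the block, the header keeps this fixed size.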
Block identifiers
To identify a block, you have a cryptographic hash, a digital fingerprint if you will. This is created by hashing the block header twice with the SHA256 algorithm. For example, this is a block. I will refer to this block as an example throughout this article.
The block header hash for this particular block is (right column): 000000000000000000301fcfeb141088a93b77dc0d52571a1185b425256ae2fb
We also can see the previous block's hash (right column): 0000000000000000004b1ef0105dc1275b3adfd067aed63a43324929bed64fd7
Remember that we used the second hash to create the first. Every block uses the previous block's hash to construct its own hash. The block hash is a unique identifier. You won't find two blocks with the same hash.
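Here is a minimal sketch of that double hashing, using Node's built-in `crypto` module on the raw header bytes. The previous block hash and merkle root are the ones quoted in this article, but `version`, `timestamp`, `bits` and `nonce` are placeholders I made up, so the output will not match the real hash of the example block. The helper names (`serializeHeader`, `blockHash`) are also just for this illustration.

```javascript
const crypto = require('crypto')

// SHA-256 applied twice to raw bytes, returning a Buffer.
function doubleSha256 (bytes) {
  const first = crypto.createHash('sha256').update(bytes).digest()
  return crypto.createHash('sha256').update(first).digest()
}

// Block explorers display hashes in the reverse byte order of the raw digest.
function reverseHex (hex) {
  return Buffer.from(hex, 'hex').reverse().toString('hex')
}

// Serialize the six header fields into 80 bytes (integers are little-endian).
function serializeHeader (header) {
  const bytes = Buffer.alloc(80)
  bytes.writeUInt32LE(header.version, 0)
  Buffer.from(reverseHex(header.previousBlockHash), 'hex').copy(bytes, 4)
  Buffer.from(reverseHex(header.merkleRoot), 'hex').copy(bytes, 36)
  bytes.writeUInt32LE(header.timestamp, 68)
  bytes.writeUInt32LE(header.bits, 72)
  bytes.writeUInt32LE(header.nonce, 76)
  return bytes
}

function blockHash (header) {
  return doubleSha256(serializeHeader(header)).reverse().toString('hex')
}

const exampleHeader = {
  version: 1, // placeholder
  previousBlockHash: '0000000000000000004b1ef0105dc1275b3adfd067aed63a43324929bed64fd7',
  merkleRoot: 'a89769d0487a29c73057e14d89afafa0c01e02782cba6c89b7018e5129d475cc',
  timestamp: 0, // placeholder
  bits: 0x1d00ffff, // placeholder
  nonce: 0 // placeholder
}

console.log(blockHash(exampleHeader)) // a 64-character hex string
```

Because the previous block's hash is one of the inputs, changing anything in block N-1 changes the hash of block N as well.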
The other way to identify a specific block is the block height. This is the position of the block in the blockchain. Our example block is at height 500312. Because the very first block has height 0, this means that there are 500312 blocks before this one. Since the creation of the Bitcoin blockchain in 2009, 500313 blocks have been created, counting this one (at the time of writing, obviously).
A block height is not unique. Several blocks can compete for the same position in the case of a fork, like Bitcoin Cash for example.
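To make the difference concrete, here is a small sketch with three hypothetical blocks, two of which compete for the same height. The shortened hashes are invented for readability; real block hashes are 64 hex characters.

```javascript
// Hypothetical, shortened hashes used only for this illustration.
const blocks = [
  { hash: 'aaa1', previousHash: 'fff0', height: 100 },
  { hash: 'bbb2', previousHash: 'aaa1', height: 101 },
  { hash: 'ccc3', previousHash: 'aaa1', height: 101 } // competing block
]

// Index the blocks by hash (one block per key) and by height (a list per key).
const byHash = new Map()
const byHeight = new Map()
for (const block of blocks) {
  byHash.set(block.hash, block)
  const sameHeight = byHeight.get(block.height) || []
  sameHeight.push(block)
  byHeight.set(block.height, sameHeight)
}

console.log(byHash.get('ccc3').height) // 101
console.log(byHeight.get(101).map(block => block.hash)) // [ 'bbb2', 'ccc3' ]
```

A lookup by hash always returns a single block, while a lookup by height may return several candidates.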
Merkle Trees
The transactions in a block are contained in a structure called a merkle tree or binary hash tree.
I feel that topics like this are easier to understand with actual examples, so we'll write some code. A merkle tree is constructed by recursively hashing pairs of nodes (in this case, transactions), until there is only one hash left, called the root or merkle root. If we stay in the Bitcoin world, the cryptographic hash algorithm used is SHA256. It is applied twice each time.
An example: We have a block with 4 transactions. For the sake of simplicity, each transaction is a string:
const tA = 'Hello'
const tB = 'How are you?'
const tC = 'This is Thursday'
const tD = 'Happy new Year'
To construct our merkle tree, we start from the bottom. We take each transaction and double-hash it. I'll use the js-sha256 package here.
const sha256 = require('js-sha256').sha256
// Double-hashing here
const hA = sha256(sha256(tA))
const hB = sha256(sha256(tB))
const hC = sha256(sha256(tC))
const hD = sha256(sha256(tD))
//Results
52c87cd40ccfbd7873af4180fced6d38803d4c3684ed60f6513e8d16077e5b8e //hA
426436adcaca92d2f41d221e0dd48d1518b524c56e4e93fd324d10cb4ff8bfb9 //hB
6eeb307fb7fbc0b0fdb8bcfdcd2d455e4f6f347ff8007ed47475481a462e1aeb //hC
fd0df328a806a6517e2eafeaacea72964f689d29560185294b4e99ca16c63f8f //hD
Ok, great. Now remember that I wrote that a merkle tree is constructed by hashing pairs of nodes. So, we will pair our transactions and concatenate their hashes. Then, we will double-hash the result too. We will create one hash from hA and hB, and another from hC and hD. Then, we repeat that process until we have only one hash left and no more pairs to work with. The last hash will be our merkle root.
With only four transactions, this will be rather quick:
//Pairing hA and hB
const hAB = sha256(sha256(hA + hB))
//5dc23d1a2151665e2ac258340aa9a11ed227a4cc235e142a3e1738333575590b
//Pairing hC and hD
const hCD = sha256(sha256(hC + hD))
//ff220daefda29821435691a9aa07dd2c47ca1d2574b8b77344aa783142bae330
// We do it again. We pair hAB and hCD
// This is our root!
const hABCD = sha256(sha256(hAB + hCD))
//301faf21178234b04e1746ee43e126e7b2ecd2188e3fe6986356cc1dd7aa1377
The node at the top of the merkle tree is called the root. This is the information that is stored in the block header in each block on the blockchain. This is how transactions are summarized in each block. In our example block given earlier, the merkle root can be found in the right column:
a89769d0487a29c73057e14d89afafa0c01e02782cba6c89b7018e5129d475cc
It doesn't matter how many transactions are contained in a block; they will always be summarized by a 32-byte hash.
Note: The merkle tree is a binary tree. If there is an odd number of transactions, the last one will be duplicated so we can construct our tree.
Because every node in our tree is built from the hashes below it, it is impossible to alter one leaf without altering everything above it. If you change only one leaf (one transaction), its hash changes, therefore the hash you constructed by pairing it with another leaf changes, and so on up the tree until the merkle root itself is different.
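The manual steps above can be generalized to any number of transactions. The sketch below uses Node's built-in `crypto` module instead of js-sha256, but follows the same string-in, hex-out convention as the earlier snippets. Keep in mind that this convention is a simplification: the real Bitcoin implementation hashes raw bytes, not hex strings. The function names are my own.

```javascript
const crypto = require('crypto')

// Same convention as the snippets above: hash a string, get a hex string back.
function sha256Hex (text) {
  return crypto.createHash('sha256').update(text).digest('hex')
}

function doubleHash (text) {
  return sha256Hex(sha256Hex(text))
}

// Reduce a list of transactions to a single root hash.
function merkleRoot (transactions) {
  if (transactions.length === 0) throw new Error('at least one transaction is required')
  let level = transactions.map(tx => doubleHash(tx))
  while (level.length > 1) {
    // Odd number of nodes: duplicate the last one so every node has a partner.
    if (level.length % 2 === 1) level.push(level[level.length - 1])
    const next = []
    for (let i = 0; i < level.length; i += 2) {
      next.push(doubleHash(level[i] + level[i + 1]))
    }
    level = next
  }
  return level[0]
}

// Changing a single transaction changes the root.
const original = merkleRoot(['Hello', 'How are you?', 'This is Thursday', 'Happy new Year'])
const tampered = merkleRoot(['Hello', 'How are you?', 'This is Friday', 'Happy new Year'])
console.log(original === tampered) // false

// Three transactions: the third hash is paired with a copy of itself.
const threeLeaves = merkleRoot(['Hello', 'How are you?', 'This is Thursday'])
const duplicated = merkleRoot(['Hello', 'How are you?', 'This is Thursday', 'This is Thursday'])
console.log(threeLeaves === duplicated) // true
```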
You can prove that any transaction is included in a block by creating an authentication path, or merkle path. You only need to know about log base 2(N) 32-byte hashes, rounded up. For example:
For my merkle tree of 4 transactions:
log base 2(4) = 2 => With a path of 2 hashes for a tree of 4 transactions, I can prove whether a transaction belongs to this merkle tree.
For a merkle tree of 16 transactions:
log base 2(16) = 4 => With a path of 4 hashes for a tree of 16 transactions, I can prove whether a transaction belongs to this merkle tree.
For a merkle tree of 1500 transactions:
log base 2(1500) = 10.55 => With a path of 11 hashes for a tree of 1500 transactions, I can prove whether a transaction belongs to this merkle tree.
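The same numbers can be checked with a few lines of code. This sketch counts how many times the list of hashes is halved on the way to the root, which avoids any floating-point rounding surprises with logarithms. `merklePathLength` is a name I chose for the illustration.

```javascript
// Count how many times the list of hashes must be halved to reach the root.
// Each halving contributes one sibling hash to the merkle path.
function merklePathLength (transactionCount) {
  let nodes = transactionCount
  let length = 0
  while (nodes > 1) {
    nodes = Math.ceil(nodes / 2) // an odd level is padded by duplicating its last hash
    length++
  }
  return length
}

for (const count of [4, 16, 1500]) {
  const hashes = merklePathLength(count)
  console.log(`${count} transactions: ${hashes} hashes, ${hashes * 32} bytes`)
}
// 4 transactions: 2 hashes, 64 bytes
// 16 transactions: 4 hashes, 128 bytes
// 1500 transactions: 11 hashes, 352 bytes
```

The path grows very slowly: going from 16 to 1500 transactions only adds 7 hashes.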
Perhaps a little diagram will help.
There are 16 leaves in this tree. We construct our tree from the bottom up by pairing each leaf. Now, anybody can prove that the leaf I (in orange) is part of this block by having the path given in green. We have only 4 hashes, but that is enough to know if the leaf I belongs here. That is because with that information, we are able to reconstruct every single node we need (in yellow). We can create IJ, IJKL, IJKLMNOP and the root, and check whether those hashes match. This is why it is very complicated to cheat a blockchain. To change one thing means you must change everything.
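Here is a minimal sketch of that verification for a tree of 16 stand-in transactions named A to P. It keeps the simplified string-in, hex-out hashing convention used earlier in this article (real Bitcoin hashes raw bytes), and the function names `buildLevels`, `merklePath` and `verify` are my own, not part of any library.

```javascript
const crypto = require('crypto')

function doubleHash (text) {
  const once = crypto.createHash('sha256').update(text).digest('hex')
  return crypto.createHash('sha256').update(once).digest('hex')
}

// Build every level of the tree, from the leaf hashes up to the root.
function buildLevels (transactions) {
  const levels = [transactions.map(tx => doubleHash(tx))]
  while (levels[levels.length - 1].length > 1) {
    const level = levels[levels.length - 1].slice()
    if (level.length % 2 === 1) level.push(level[level.length - 1])
    const next = []
    for (let i = 0; i < level.length; i += 2) next.push(doubleHash(level[i] + level[i + 1]))
    levels.push(next)
  }
  return levels
}

// Collect the sibling hash needed at each level to climb from a leaf to the root.
function merklePath (levels, index) {
  const path = []
  for (let depth = 0; depth < levels.length - 1; depth++) {
    const level = levels[depth]
    const isRight = index % 2 === 1
    const siblingIndex = isRight ? index - 1 : index + 1
    // With an odd level, the last node is its own sibling.
    const sibling = siblingIndex < level.length ? level[siblingIndex] : level[index]
    path.push({ hash: sibling, siblingOnLeft: isRight })
    index = Math.floor(index / 2)
  }
  return path
}

// Recompute the root from one transaction and its path.
function verify (transaction, path, root) {
  let hash = doubleHash(transaction)
  for (const step of path) {
    hash = step.siblingOnLeft ? doubleHash(step.hash + hash) : doubleHash(hash + step.hash)
  }
  return hash === root
}

const transactions = 'ABCDEFGHIJKLMNOP'.split('') // 16 stand-in transactions
const levels = buildLevels(transactions)
const root = levels[levels.length - 1][0]
const path = merklePath(levels, transactions.indexOf('I')) // J, KL, MNOP, ABCDEFGH

console.log(path.length) // 4
console.log(verify('I', path, root)) // true
console.log(verify('X', path, root)) // false
```

The verifier never sees the other 15 transactions. It only needs the transaction itself, the 4 hashes of the path and the root from the block header.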
Well, that's pretty much what a block contains in the Bitcoin blockchain. Hope it helped!
Top comments (28)
Damien, this is absolutely brilliant. I've re-read twice now and it's just beginning to click. Thanks for writing this up. I especially love the graphic to explain the merkle root concept.
Thank you for the kind words. Glad it helped.
This really helped -- so clearly written!
I have a rookie question: how are the transactions included in a block determined? I expect that at any given time, all the miners are working with a similar but non-identical set of transactions to create the next block. Do they have to check the transactions included in every new block while they are computing the hashes, and discard all the work if a newly added block includes any transaction they were working with?
Yes, whenever a new block is mined, miners have to create a new candidate block. This candidate block includes all the transactions that have not been mined yet. If a miner was trying to mine a block and failed, she will check which transactions have been included in the winning block. Whatever transactions are left over become part of her new candidate block.
If you want a deeper explanation of this, I wrote an article called Blockchain: what is mining? that explains this concept :)
Thank you! I will read up that article!
There are 16 leaves in this tree. We construct our tree from the bottom up by pairing each leaf. Now, anybody can prove that the leaf I ( in orange ) is part of this block by having the path given in green. We have only 4 hashes, but that is enough to know if the leaf I belongs here. That is because with those informations, we are able to construct every single leaf we need( in yellow ). We can create IJ, IJKL, IJKLMNOP and the root and check if those hashes correspond. This is why it is very complicated to cheat a blockchain. To change one thing means you must change everything.
How can we prove that I belongs to this block? Is it done mathematically, or in some other way? Can you explain a little?
In this example, by having the path J, KL, MNOP and ABCDEFGH, you can re-create the hash of each pair.
You have the I hash and J hash, so you can create a hash IJ. Because you also have the hash KL, you can create the hash IJKL.... If one hash doesn't match the original, you know the I hash is corrupt.
Do we have the I hash, J hash and so on? I thought a block contains only the Merkle root? When you say I hash and J hash, do you mean the transaction IDs that are included in the block? If so, then we can compute the IJ hash, etc. until we arrive at the Merkle root? Does the block include the IJ hash for validation purposes? Or is it just the Merkle root that can be used for validation? Thx
Yes, I, J, K... are the hashes of transactions that are included in the block here.
Whenever a transaction (in this case I) claims to be part of a block, we can check whether the hashes we compute are the same.
One more doubt. Are intermediate hashes (eg. IJ hash) included in the block or is it only the Merkle root that is included?
Only the merkle root is stored, in the block header. The intermediate hashes are not stored in the block; they are recomputed from the transactions when needed. If I understood this part correctly, when a lightweight client wants to verify that a transaction is part of a block, it sends a bloom filter to a full node, and the node answers with the necessary hashes to verify whether or not this transaction is part of the block.
This saves a lot of resources, because you only need a few nodes of the tree, and not the entire merkle tree. With the path you get, you can check whether you end up with the same hashes.
thank you very much
but let's say i'm a full-node and i modify an existing block/transaction. specifically, i modify a transaction's unspent output. then i make another new transaction on top of this where i use the previously modified output as an input. let's assume i am the owner of this unspent output that i will be using as input for the new transaction. yes, the hash info for the transaction and thus the block will no longer be correct. when a peer node goes to verify the transaction they notice the transaction has been signed by a valid private key and thus proving i am the owner of the modified unspent output. how long before I actually get caught? am i missing some part of the verification process?
If I understand correctly what your scenario is:
For this new transaction to be part of the blockchain, you, or another miner, would have to find a new Proof-of-Work for this block. This is an entirely new block now.
You mine it and find a valid Proof-of-Work. You propagate your finding to the network and they have to validate this new block.
Here, the problem is much bigger for this block to be accepted.
1) The network sees you tried to cheat the system. The block is rejected and you wasted your resources mining this new block for nothing.
2) The network accepts this new block ( for whatever reasons ). Now, we have to mine every single block that was validated after the one you just proposed because they are all invalid now.
I believe in this case, you won't get caught, because your block will never be part of the blockchain. You can't modify a block that is already part of the blockchain. You will create a new block that will act as a replacement of an existing block. So, for this block to be accepted, the network would also have to provide valid proof-of-work for every single block after the one you want to change.
When you find a valid Proof-of-Work for your block, it is propagated to other nodes. These nodes will verify that the Proof-of-Work is valid, but they will also verify every single transaction in the block. There is a very long list of criteria that nodes must check before calling a transaction valid (inputs, outputs...). In this scenario, this is where your fraud will be stopped. Your block won't be accepted by the network.
I hope I answered your question. Let me know if anything wasn't clear.
ok that makes enough sense. from your type-up i get that the consensus from other peer nodes will not be in my favor due to this proof-of-work algorithm that is performed on new blocks.
Very good. I like your article series - I wish I would have read it when I got started, you make it really easy to understand (or at least: as easy as possible).
One small thing I noticed: you wrote "Mining is another crucial part of the blockchain technology, but it is outside the scope of this article".
I have heard many people speak about "the blockchain", implying there is only one, when in fact there are many. The most popular one is the Bitcoin blockchain, and for that chain, your statement is correct. It uses "Proof-of-work" a.k.a. mining for creating new blocks/coins.
There are other blockchains that use "Proof-of-stake", so instead of mining, new blocks are created based on stake, i.e. already owned coins, and random distribution.
Am I correct?
Yes, you are absolutely right. Proof-of-stake is a mechanism that also requires a lot less computational power, so it could probably be used in future to help blockchains scale.
Great and very simplistic article....Thanks
Really useful article, but one question.
How is data stored in a block exactly? For example, if we consider a medical health record system, are many health records stored in one block, or is only one medical record stored per block? If there can be many records in one block, does each record (transaction) have a unique key or hash?
And if we need to update one record how can it happen?
Hi Damien,
Thank you for your very helpful article. However, I have one question: which transaction do we choose to validate? In the article above, you choose the transaction I and validate that it exists in the block. What about the other transactions? Are there any specific rules for choosing which transactions need to be validated?
Very clear, thanks. Just a small nitpick: I believe that you refer to the pink I leaf when you say the orange one.
Well-structured article! It was very clear. Thanks