The Ethereum-blockchain size will not exceed 1TB anytime soon.
Afri Schoedon Nov 29 '17 Updated on Dec 17, 2017
Before diving into this article, please read the two disclosures about my involvement (1,2) and the one on data accuracy (3) at the bottom of the article.
At least once a month, someone posts a chart on r/ethereum predicting that the Ethereum blockchain will soon exceed 1 TB. I want to take this chance to clear up some of the stories around the Ethereum blockchain size and explain why this chart is technically correct, but not the full picture.
Let's have a look at this chart first. It shows the complete data directory size of an Ethereum node (red), Geth in this case, and a Bitcoin node (blue), probably Bitcoin Core, plotted over time. While the Bitcoin graph rises slowly in a seemingly linear fashion, the Ethereum graph looks like an exponential curve.
Users accusing Ethereum of blockchain bloat are not far off with their assumptions. Strictly speaking, however, it is not the chain that is bloated, but the Ethereum state. Before proceeding, I want to go over some terminology from the whitepaper.
- Block. A bundle of transactions which, after proper execution, update the state. Each block gets a number, has a difficulty, and commits to the resulting state through the state root in its header.
- State. The state is made up of all initialized Ethereum accounts. At the time of writing, there are around 12 million known accounts and contracts, growing at a rate of roughly 100k new accounts per day.
- Block-History. The chain of all historical blocks, from the genesis block up to the latest best block, also known as the blockchain.
- State-History. The states of all historical blocks together make up the state history. I will go into detail on this later.
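To make these four terms concrete, here is a minimal Python sketch under heavily simplified assumptions: the state is just a dictionary of balances, a block is a list of transfers, and every state is stored as a full copy. Real clients keep the state in a Merkle-Patricia trie and share unchanged nodes between states, so all names and structures below are hypothetical illustrations, not client code.

```python
from dataclasses import dataclass, field

# Toy state: account name -> balance. Real Ethereum state also holds nonces,
# contract code and storage, kept in a Merkle-Patricia trie (not modeled here).
State = dict[str, int]

@dataclass
class Transaction:
    sender: str
    recipient: str
    value: int

@dataclass
class Block:
    number: int
    transactions: list[Transaction] = field(default_factory=list)

def apply_block(state: State, block: Block) -> State:
    """Execute a block's transactions on a copy of the parent state."""
    new_state = dict(state)
    for tx in block.transactions:
        if new_state.get(tx.sender, 0) < tx.value:
            continue  # toy rule: skip transfers the sender cannot afford
        new_state[tx.sender] = new_state.get(tx.sender, 0) - tx.value
        new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.value
    return new_state

# Hypothetical genesis allocation and blocks, purely for illustration.
genesis_state: State = {"alice": 100}
block_history = [
    Block(0),
    Block(1, [Transaction("alice", "bob", 30)]),
    Block(2, [Transaction("bob", "carol", 10)]),
]

# State history: one entry per block, the state after executing that block.
state_history: list[State] = []
state = genesis_state
for block in block_history:
    state = apply_block(state, block)
    state_history.append(state)

print(state_history[-1])  # {'alice': 70, 'bob': 20, 'carol': 10}
```

The point to notice: the block history only grows by the size of each new block, while the state history gains a whole state entry for every block, which is why keeping all historical states is what makes a node's database large.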
If this already bores you, please read on anyway.
In early 2016, the Go-Ethereum team introduced a so-called fast synchronization mode. Since then, it has been very popular to run geth --fast, especially after the spam attacks on Ethereum later that year made the full synchronization mode painful. I am highlighting these mode names because I will come back to an essential disambiguation later in this article. Just keep them in mind for now.
The Parity team (formerly Ethcore) reacted to the on-chain spam by offering a warp synchronization mode at the end of 2016 to ease chain synchronization for new users. Much like Geth's fast mode, parity --warp soon became the de facto standard for users trying to synchronize the Ethereum chain. Today, both options are the default in their respective clients.
But what does it mean to fast-sync versus full-sync a Geth node? What does it actually mean to warp-sync a Parity node rather than no-warp-syncing it?
A full Geth node processes the entire blockchain and replays all transactions that ever happened. A fast Geth node downloads all blocks and, in parallel, their transaction receipts, plus the most recent state database. Once done, it switches to full synchronization mode. Note that this results not only in a fast sync but also in a pruned state database, because historical states are not available for blocks older than the best block minus 1024. That's not an issue, but before reading on, please keep in mind that Geth's synchronization modes are also pruning modes.
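The pruning side effect of a fast sync can be pictured as a sliding window over block numbers. This is a minimal Python illustration that takes the 1024-block figure from the paragraph above; the inclusive boundary and the in-memory dictionary of states are assumptions made for the sketch, not how Geth actually stores its database.

```python
# Window of recent block states kept after a fast sync (figure from the text;
# exact boundary handling is an assumption).
PRUNING_WINDOW = 1024

def has_state(block_number: int, best_block: int, window: int = PRUNING_WINDOW) -> bool:
    """True if the state of `block_number` is still inside the kept window."""
    return best_block - window <= block_number <= best_block

def prune_states(states: dict[int, bytes], best_block: int,
                 window: int = PRUNING_WINDOW) -> dict[int, bytes]:
    """Drop states older than best_block - window. Block data is never touched."""
    return {n: s for n, s in states.items() if has_state(n, best_block, window)}

# Hypothetical example: blocks 0..2000, each with a placeholder state blob.
states = {n: b"state-%d" % n for n in range(2001)}
kept = prune_states(states, best_block=2000)
print(len(kept), min(kept), max(kept))  # 1025 976 2000
```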
Looking at Parity's configuration options, this gets more complex. In addition to the synchronization modes mentioned above, Parity also offers separate pruning modes, namely fast and archive... Right: Geth fast, as we learned, is a sync mode that also prunes, whereas Parity fast is a pruning mode that is not tightly coupled to the sync mode. At this point, I have to admit the terminology is confusing, and I might have lost you already. Let's draw something with pen and paper.
Geth's fast enables both quicker synchronization and database pruning; Geth's full disables both. Parity's warp, however, can be disabled without disabling state-trie pruning! This is a significant point, so I want to stress it. I am not comparing Ethereum clients here; that is not my intention. I want to show you that it is possible to run a fully verifying Ethereum node with a small database. Parity simply provides the proof of concept for this.
But why is that? Because as long as you have all historical blocks on your disk, you can compute any historical state from them by reprocessing the entire chain. And in most use cases, you don't need historical states at all! Therefore, it is smart to simply delete outdated entries from the state history and reduce your required disk space by 95%.
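The argument that pruned states remain recoverable can be shown with a toy replay. In this minimal Python sketch, assuming a hypothetical chain of simple balance transfers with no validity checks, a pruned node keeps all blocks but only the latest state, and any older state is rebuilt by re-executing the blocks from genesis.

```python
from typing import Dict, List, Tuple

Balances = Dict[str, int]
Block = List[Tuple[str, str, int]]  # toy block: (sender, recipient, value) transfers

def apply_block(state: Balances, block: Block) -> Balances:
    """Execute a toy block on a copy of the parent state (no validity checks)."""
    new_state = dict(state)
    for sender, recipient, value in block:
        new_state[sender] = new_state.get(sender, 0) - value
        new_state[recipient] = new_state.get(recipient, 0) + value
    return new_state

def state_at(genesis: Balances, blocks: List[Block], number: int) -> Balances:
    """Rebuild the state after block `number` by replaying blocks 0..number."""
    state = dict(genesis)
    for block in blocks[: number + 1]:
        state = apply_block(state, block)
    return state

# Hypothetical chain: a pruned node stores these blocks plus only the latest state.
genesis = {"alice": 100}
blocks = [[], [("alice", "bob", 30)], [("bob", "carol", 10)]]
latest = state_at(genesis, blocks, len(blocks) - 1)

# A pruned historical state can still be recomputed on demand:
print(state_at(genesis, blocks, 1))  # {'alice': 70, 'bob': 30}
```

The trade-off is time versus disk: recomputing an old state costs a replay, which is acceptable because most use cases never ask for historical states.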
Running such a node takes some tens of GB, by just running parity --no-warp. Earlier this fall, it was less than 20 GB, but the state is growing very fast. Currently, the raw historical block data containing the blocks and transactions is approximately 12-15 GB in size, and the latest state around 1-2 GB.
But is this to be considered a full Ethereum node? Yes:
- It runs a full blockchain synchronization starting at genesis.
- It replays all transactions and executes all contracts.
- It recomputes the state for each block.
- It keeps all historical blocks on the disk.
- It keeps the most recent states on the disk and prunes ancient states.
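The checklist above can be written down as a predicate. The `NodeSetup` type and its fields are hypothetical names for this sketch, and the mapping of table rows 02, 05, and 07 onto those fields is an illustrative reading of the disk-usage table at the end of the article, not a Parity API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeSetup:
    """Hypothetical summary of how a node was synced and what it keeps on disk."""
    synced_from_genesis: bool   # full block sync starting at genesis (no warp)
    replays_all_txs: bool       # re-executes every transaction and contract
    recomputes_states: bool     # derives the state for each block itself
    keeps_all_blocks: bool      # no ancient blocks missing from disk
    keeps_recent_states: bool   # latest states available (ancient ones may be pruned)

def is_full_node(node: NodeSetup) -> bool:
    """Apply the five criteria listed above; pruning ancient states is allowed."""
    return all((
        node.synced_from_genesis,
        node.replays_all_txs,
        node.recomputes_states,
        node.keeps_all_blocks,
        node.keeps_recent_states,
    ))

# Illustrative reading of three table rows (not taken from any client API).
archive_02 = NodeSetup(True, True, True, True, True)          # --pruning archive
pruned_05 = NodeSetup(True, True, True, True, True)           # --no-warp
no_ancient_07 = NodeSetup(False, False, False, False, True)   # --no-ancient-blocks

print(is_full_node(archive_02), is_full_node(pruned_05), is_full_node(no_ancient_07))
```

There is deliberately no field for keeping ancient states: the archive node (02) and the pruned node (05) both pass, while the node without ancient blocks (07) does not.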
Something an Ethereum client never does is delete old blocks. This is a significant difference between Bitcoin and Ethereum, because pruning a Bitcoin node leaves no choice but to remove old blocks. With this context, it's easier to understand why users often think a pruned Ethereum node is not a full node. But now, dear reader, you know the opposite is true.
On top of this, even a warp-synced Parity node downloads the whole block history after the initial synchronization, which allows it to serve the network as a full node once the ancient-block synchronization is complete.
Below is a screenshot of my nicely colored spreadsheet, which tries to distinguish the node security of different Parity operation modes.
Configurations 00 to 05 are all to be considered full nodes.
Configuration 06 is a default-configuration warp node, which can be regarded as full once the ancient-block download is finished. However, it does not replay all transactions; it only checks the proof-of-work of the historical blocks.
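The difference between replaying transactions and only checking proof-of-work can be illustrated with a toy header check. This Python sketch uses SHA-256 with a simple numeric target as a stand-in for Ethash, an assumption made purely for illustration; the relevant part is what `verify_pow_only` does not do: it never executes a transaction or builds a state.

```python
import hashlib

# Toy stand-in for Ethash (assumption): SHA-256 over header bytes plus an 8-byte nonce.
def header_hash(header: bytes, nonce: int) -> int:
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def pow_valid(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """True if the hash lies below the target 2**(256 - difficulty_bits)."""
    return header_hash(header, nonce) < (1 << (256 - difficulty_bits))

def mine(header: bytes, difficulty_bits: int) -> int:
    """Brute-force the first nonce that satisfies the toy target."""
    nonce = 0
    while not pow_valid(header, nonce, difficulty_bits):
        nonce += 1
    return nonce

def verify_pow_only(headers: list[tuple[bytes, int]], difficulty_bits: int) -> bool:
    """Check every (header, nonce) pair; no transactions are executed, no state is built."""
    return all(pow_valid(h, n, difficulty_bits) for h, n in headers)

# Hypothetical ancient blocks, mined at a tiny difficulty so the demo runs instantly.
DIFFICULTY = 8
ancient = [(b"block-%d" % i, mine(b"block-%d" % i, DIFFICULTY)) for i in range(3)]
print(verify_pow_only(ancient, DIFFICULTY))  # True
```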
Configuration 07 is something users often ask for, but it should be strongly discouraged in production use. This setting is comparable to a pruned Bitcoin node, as some historical blocks are not available. This is not a full node anymore. Note how I added a separator above this paragraph. You get the idea.
Configuration 08 is a light client, but that's worth another blog article. Thanks for scrolling this far down; here is your conclusion: an Ethereum full node does not require more than 20-30 GB of disk space by default. :)
Noteworthy disclosures and bottom-line comments.
(1) I work for Parity. I'm comparing different Parity configurations not only because I know and understand them well, but also because Parity allows users to configure the pruning mode and synchronization mode separately.
(2) I hold some Bitcoin and some Ether. I hope this does not have any influence on the technical aspects I'm outlining in this article. Also, I'm trying not to become overly political about this.
(3) I have been running Parity in 36 different configurations over six weeks to gather these numbers. This is time- and resource-consuming, and it still has the issue that I cannot keep all configurations running at the same time; therefore, the numbers presented in this article should be taken with caution. I expect the results to differ by up to ±20% from other nodes running the same configuration. But you get the idea:
| ID | Pruning / DB Config | Verification    | Available History          | ETH    | ETC    | MSC     | EXP     | Parity CLI Options                         |
|====|=====================|=================|============================|========|========|=========|=========|============================================|
| 00 | archive +Fat +Trace | Full/No-Warp    | All Blocks + States        | 385 GB | 90 GB  | 25 GB   | 5.6 GB  | --pruning archive --tracing on --fat-db on |
| 01 | archive +Trace      | Full/No-Warp    | All Blocks + States        | 334 GB | 90 GB  | 21 GB   | 5.8 GB  | --pruning archive --tracing on             |
| 02 | archive             | Full/No-Warp    | All Blocks + States        | 326 GB | 91 GB  | 30 GB   | 5.5 GB  | --pruning archive                          |
| 03 | fast +Fat +Trace    | Full/No-Warp    | All Blocks + Recent States | 37 GB  | 13 GB  | 3.5 GB  | 1.3 GB  | --tracing on --fat-db on                   |
| 04 | fast +Trace         | Full/No-Warp    | All Blocks + Recent States | 34 GB  | 13 GB  | 3.5 GB  | 1.2 GB  | --tracing on                               |
| 05 | fast                | Full/No-Warp    | All Blocks + Recent States | 26 GB  | 9.7 GB | 3.0 GB  | 1.1 GB  | --no-warp                                  |
| 06 | fast +Warp          | PoW-Only/Warp   | All Blocks + Recent States | 25 GB  | 9.6 GB | 2.6 GB  | 0.96 GB |                                            |
| 07 | fast +Warp -Ancient | No-Ancient/Warp | Recent Blocks + States     | 5.3 GB | 2.9 GB | 0.19 GB | 0.13 GB | --no-ancient-blocks                        |
| 08 | light               | Headers/Light   | No Blocks + No State       | 5 MB   | 3 MB   | 4 MB    | 5 MB    | --light                                    |
Version: Parity/v1.8.0-unstable-7940bf6ec-20170921/x86_64-linux-gnu/rustc1.19.0 from source w/ musicoin support Ubuntu: 17.04 Kernel 4.10.0-35-generic / September 2017 / Lenovo Thinkpad X270, Core i7-7600U, 1TB SSD, 16GB RAM
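As a quick cross-check of the savings claimed earlier, the following Python snippet computes the reduction from archive mode (row 02) to the default fast pruning without warp (row 05), using the numbers from the table exactly as given. Keep the ±20% accuracy caveat above in mind when reading the results.

```python
# Disk usage in GB, copied from rows 02 (archive) and 05 (fast, --no-warp) of the table.
archive_gb = {"ETH": 326, "ETC": 91, "MSC": 30, "EXP": 5.5}
pruned_gb = {"ETH": 26, "ETC": 9.7, "MSC": 3.0, "EXP": 1.1}

def savings(archive: float, pruned: float) -> float:
    """Fraction of disk space saved by pruning ancient states."""
    return 1 - pruned / archive

for chain, size in archive_gb.items():
    print(f"{chain}: {savings(size, pruned_gb[chain]):.0%} less disk space")
```

For ETH this comes out at roughly 92%, and between 80% and 90% for the smaller chains.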
Thanks for scrolling to the bottom. <3
Fun fact: While I was publishing this article, the price of Bitcoin broke 10,000 USD and Ethereum 500 USD. I think I will add current market prices to my articles in the future, just for fun.
Update: Thanks for rating this the top post, dear dev.to team <3 <3 <3
Update: Here is a more controversial discussion on Hacker News.