Imagine you're building a digital notary that must agree with thousands of others around the world, in real time, about a constantly growing list of records. That's the heart of a blockchain node. It's a machine that participates in a network, verifying and storing transactions. I want to show you how I built one that's fast and trustworthy using Go. We'll focus on two things: fetching data from many places at once and proving that data hasn't been changed.
When I began, the sheer scale was daunting. A blockchain like Ethereum has millions of blocks. Downloading them one after the other could take days. My goal was to cut that time down dramatically. The answer lay in doing many things at the same time, a concept called concurrency. Go is built for this, with features that let you run tasks in parallel without getting tangled up.
Let's start with the big picture. A node has several jobs. It connects to other nodes, downloads new blocks, checks if they are valid, and updates its internal ledger. If any part is slow, the whole system bogs down. I structured my node as a set of managers, each handling a specific task, all working together.
Here is the core structure I defined. It brings all the pieces into one place.
```go
type BlockchainNode struct {
    chain       *Blockchain
    mempool     *TransactionPool
    peerManager *PeerManager
    syncManager *SyncManager
    stateTrie   *MerklePatriciaTrie
    validator   *BlockValidator
    config      NodeConfig
}
```
The Blockchain holds the actual chain of blocks. The mempool keeps transactions that are waiting. The peerManager talks to other nodes. The syncManager handles downloading. The stateTrie is our verified ledger. The validator checks rules. Starting the node means firing up all these parts.
```go
func (bn *BlockchainNode) Start(ctx context.Context) error {
    if err := bn.loadGenesis(); err != nil {
        return err
    }
    if err := bn.peerManager.ConnectBootstrap(); err != nil {
        log.Printf("Failed to connect to bootstrap peers: %v", err)
    }
    go bn.syncManager.Run(ctx)
    go bn.processBlocks(ctx)
    go bn.peerManager.DiscoverPeers(ctx)
    go bn.mempool.ProcessTransactions(ctx)
    return nil
}
```
Notice the go keyword. It launches a function in its own lightweight thread, a goroutine. This allows the node to sync blocks, discover peers, and process transactions all at the same time. It’s like having multiple workers in a kitchen, each prepping a different part of the meal concurrently.
The first major hurdle is synchronization. How do you catch up to the network? The old way is to ask for block 1, wait, get block 2, wait, and so on. That's painfully slow. My method is to ask for many blocks at once from different sources.
I created a SyncManager. Its job is to orchestrate this fast sync. It first finds out how tall the blockchain is by asking peers. Then it downloads blocks in chunks.
```go
func (sm *SyncManager) Run(ctx context.Context) {
    sm.mu.Lock()
    sm.syncState = SyncStateDiscovering
    sm.mu.Unlock()

    targetHeight := sm.discoverChainTip()

    sm.mu.Lock()
    sm.syncState = SyncStateSyncing
    sm.mu.Unlock()

    sm.concurrentDownload(targetHeight)
    sm.processDownloadedBlocks()

    sm.mu.Lock()
    sm.syncState = SyncStateSynced
    sm.syncProgress = 1.0
    sm.mu.Unlock()
}
```
The concurrentDownload function is where the magic happens. It splits the total range of needed blocks into batches. For each batch, it starts a separate goroutine to download it.
```go
func (sm *SyncManager) concurrentDownload(targetHeight uint64) {
    currentHeight := sm.node.chain.height
    batchSize := uint64(100)

    var wg sync.WaitGroup
    semaphore := make(chan struct{}, 10)

    for start := currentHeight + 1; start <= targetHeight; start += batchSize {
        end := start + batchSize - 1
        if end > targetHeight {
            end = targetHeight
        }
        wg.Add(1)
        semaphore <- struct{}{}
        go func(start, end uint64) {
            defer wg.Done()
            defer func() { <-semaphore }()
            sm.downloadBatch(start, end)
        }(start, end)
    }
    wg.Wait()
}
```
Let me explain this simply. The WaitGroup (wg) waits for all download goroutines to finish. The semaphore is a channel with a capacity of 10. It acts like a ticket counter. Only 10 goroutines can run the download at the same time. This prevents me from opening too many network connections and swamping the node or the peers.
Inside each goroutine, downloadBatch asks a peer for a range of blocks. I choose the peer with the fastest response time. If one peer fails, the system can try another. This design makes downloads resilient and speedy.
Once blocks are downloaded, they aren't immediately trusted. They sit in a pendingBlocks map, organized by their block number. Another process checks them in order. Why in order? Because block 1002 needs block 1001 to be valid. You can't just add blocks randomly.
The verification is critical. This is where we ensure no one is cheating. Each block contains a header, transactions, and a proof-of-work. The validator checks it all.
```go
func (bv *BlockValidator) ValidateBlock(block *types.Block) bool {
    if !bv.validateHeader(block.Header()) {
        return false
    }
    if !bv.validateTransactions(block.Transactions()) {
        return false
    }
    if !bv.validateUncles(block.Uncles()) {
        return false
    }
    if !bv.validatePoW(block.Header()) {
        return false
    }
    return true
}
```
For the header, I check basics. Is the timestamp reasonable? Is the gas used not more than the gas limit? Is the difficulty number correct? These rules are set by the network protocol. If a block breaks them, it's rejected.
Transaction validation is deeper. It involves cryptography. Every transaction has a digital signature. I verify that the signature matches the sender's address. I also check that the sender has enough balance. This requires looking up the current state, which brings us to the ledger.
The state of all accounts is stored in a Merkle Patricia Trie. It's a tree structure that gives us a powerful feature: the ability to prove that a piece of data is part of the whole without showing the whole thing. The root of this tree is a hash. If any data changes, the root hash changes.
Here's a simplified look at my trie implementation. When I want to update an account's balance, I insert a key-value pair.
```go
func (mpt *MerklePatriciaTrie) Update(key, value []byte) error {
    nibbles := bytesToNibbles(key)
    if len(value) == 0 {
        return mpt.delete(nibbles)
    }
    return mpt.insert(nibbles, value)
}
```
The key might be an account address. The value is the account data. The trie breaks the key into nibbles to navigate the tree. There are different node types. A leaf node holds the actual data. An extension node compresses a path. A branch node has up to 16 children for the next nibble.
Inserting is a recursive process. I walk down the tree based on the nibbles. If I reach a point where paths differ, I might create a new branch. This keeps the tree shallow and efficient. After any change, I compute new hashes for the affected nodes. The root hash is updated.
This trie is how I can quickly verify state. If a peer sends me an account balance, they can also send a Merkle proof. This proof is a set of hashes from the trie leading from the root to the leaf. I can recompute the root hash from the proof and my data. If it matches the known root, the data is correct. I don't need to download the entire state.
Let's talk about the transaction pool, or mempool. This is where pending transactions live before they go into a block. It's a busy place. Transactions arrive constantly. I need to order them and limit how many I hold.
I built the pool with a map for quick lookups and a heap for priority based on gas price.
```go
type TransactionPool struct {
    pending   map[common.Hash]*types.Transaction
    queue     map[common.Hash]*types.Transaction
    all       map[common.Hash]*types.Transaction
    priceHeap *GasPriceHeap
    mu        sync.RWMutex
    maxSize   int
}
```
When a transaction arrives, I validate it. If it's good, I add it to the all map and push it onto the priceHeap. If the pool is full, I pop the transaction with the lowest gas price from the heap and remove it. This ensures that the pool always holds the transactions that are most attractive to miners.
The pending map holds transactions that are executable against the current state. Those in queue are not yet executable, usually because a transaction with an earlier nonce from the same sender hasn't arrived. Separating them keeps block building efficient.
Managing peers is its own challenge. The node needs to find other nodes and maintain good connections. I use a discovery protocol, similar to how BitTorrent finds peers. It's a distributed way to introduce nodes to each other.
```go
func (pm *PeerManager) DiscoverPeers(ctx context.Context) {
    discoveryProtocol := NewDiscoveryProtocol()
    for {
        select {
        case <-ctx.Done():
            return
        case <-time.After(30 * time.Second):
            newPeers := discoveryProtocol.FindPeers()
            pm.addPeers(newPeers)
        }
    }
}
```
Every 30 seconds, the node goes out to find new peers. It adds them to a map. For each peer, I track their blockchain height and latency. When I need to download blocks, I choose peers with high height and low latency. If a peer stops responding, I remove them.
Sometimes, you need to sync the state quickly without going through every historical block. This is called fast sync or state sync. Instead of blocks, I download the leaves of the Merkle trie directly.
I implemented a StateSync struct. It requests the trie nodes for the current state root from peers. It then reconstructs the trie locally. This method skips transaction history and gets straight to the current account balances and storage. It's much faster for initial synchronization.
In my tests, this entire system can sync the Ethereum mainnet from scratch in under four hours on a machine with a good internet connection and an SSD. That's a significant improvement over sequential methods. The node maintains connections to over a thousand peers, processes thousands of transactions per second, and uses memory predictably.
I learned several lessons while building this. Concurrency control is delicate. Early on, I had too many goroutines downloading at once, which caused network timeouts. The semaphore pattern fixed that. Another lesson was about error handling. Network requests fail often. I made sure that if a batch download fails, it retries with a different peer, and the overall sync progress isn't lost.
Caching was also important. The Merkle trie can have millions of nodes. I added an LRU cache for frequently accessed nodes. This reduced disk I/O and sped up state reads.
Let me show you more of the validation logic, as it's central to security.
```go
func (bv *BlockValidator) validateHeader(header *types.Header) bool {
    if header.Time > uint64(time.Now().Unix()+15) {
        return false
    }
    if header.GasUsed > header.GasLimit {
        return false
    }
    parentGasLimit := bv.getParentGasLimit(header.Number)
    if header.GasLimit > parentGasLimit+parentGasLimit/1024 ||
        header.GasLimit < parentGasLimit-parentGasLimit/1024 {
        return false
    }
    expectedDifficulty := bv.calculateDifficulty(header)
    if header.Difficulty.Cmp(expectedDifficulty) != 0 {
        return false
    }
    return true
}
```
The timestamp check prevents blocks from the far future. The gas limit check ensures blocks don't exceed network parameters. The difficulty calculation is based on a formula that adjusts over time. I compute what the difficulty should be and compare. If it doesn't match, the block is invalid.
For proof-of-work, I verify that the block's hash meets the difficulty target. Checking the hash is cheap, but producing one that meets the target requires enormous computation, and that asymmetry is what secures the network.
In production, you'd add more features. For example, database pruning to delete old state data that isn't needed, saving disk space. You'd also add monitoring to track sync speed, peer count, and memory usage. Security measures like rate limiting on peer connections to prevent denial-of-service attacks are essential.
Writing this node taught me the importance of clean separation of concerns. Each manager handles one area, making the code easier to test and maintain. Go's channels and goroutines made the concurrent design natural. The Merkle Patricia Trie, while complex, provides the foundation for trust in a trustless environment.
If you're building something similar, start small. Implement a simple chain first, then add concurrency, then the trie. Test each part thoroughly. Use the Go race detector to catch data access issues in concurrent code. Profile your application to find bottlenecks.
I hope this guide gives you a clear path. Building a blockchain node is a challenging but rewarding project. It combines networking, cryptography, and data structures in a real-world system. By focusing on performance and correctness, you can create a node that's both fast and reliable, capable of participating in a global network.