At Agnostic, we build open-source infrastructure for collaborative blockchain data platforms. One of our flagship tools is clickhouse-evm, a suite of high-performance User Defined Functions (UDFs) that brings native Ethereum decoding and querying capabilities directly into ClickHouse.
While our Go-based implementation has served us well, we've been exploring whether Rust—with its rapidly maturing Ethereum ecosystem—could take us even further. The potential benefits are compelling: better performance, enhanced safety, and improved portability that could make it easier to bring these UDFs to other analytical engines like DataFusion or DuckDB.
At the heart of this exploration is Alloy, a promising Rust library offering composable, well-designed primitives for Ethereum data. Its type system and tooling make it an ideal candidate for building cleaner, more robust decoding logic and ABI handling.
To put this theory to the test, we reimplemented a core piece of our stack in Rust: the evm_decode_event UDF. The results were encouraging enough that we wanted to share our findings.
This post walks through our benchmarking methodology, compares the Rust and Go implementations, and explores the performance gains, developer experience improvements, and future opportunities this migration unlocks.
Understanding evm_decode_event
Before diving into the benchmarks, let's clarify what evm_decode_event actually does. Here's a practical example:
SELECT evm_decode_event(
    [
        evm_hex_decode('0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'),
        evm_hex_decode('0x00000000000000000000000063dfe4e34a3bfc00eb0220786238a7c6cef8ffc4'),
        evm_hex_decode('0x000000000000000000000000936c700adf05d1118d6550a3355f66e93c9476c6')
    ]::Array(FixedString(32)),
    evm_hex_decode('0x0000000000000000000000000000000000000000000000000000000252e9f940'),
    ['event Transfer(address indexed,address indexed,uint256)']
) as evt
SETTINGS output_format_arrow_string_as_string=0
Result:
{
  "inputs": {
    "arg0": "0x63DFE4e34A3bFC00eB0220786238a7C6cEF8Ffc4",
    "arg1": "0x936C700Adf05d1118D6550A3355f66e93C9476C6",
    "arg2": "9981000000"
  },
  "fullsig": "event Transfer(address indexed, address indexed, uint256)",
  "signature": "Transfer(address,address,uint256)"
}
The UDF takes raw bytes from a log's topics and data, along with a list of fullsig strings—compact, human-readable representations of ABI event signatures containing all the information needed for decoding. It attempts to decode the log using each fullsig in sequence, returning a JSON representation of the decoded event on success, or an error if no signatures match.
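In Rust terms, that contract boils down to a sequential-match loop. The sketch below is illustrative only: try_decode is a hypothetical stand-in for the real fullsig parsing and ABI decoding, which we cover in the Alloy section later in this post.

/// Hypothetical core of the UDF: try each candidate fullsig in order and
/// return the first successful decode as a JSON string.
fn decode_event(
    topics: &[[u8; 32]],
    data: &[u8],
    fullsigs: &[String],
) -> Result<String, String> {
    for fullsig in fullsigs {
        // try_decode stands in for fullsig parsing + ABI decoding.
        if let Some(json) = try_decode(fullsig, topics, data) {
            return Ok(json); // {"inputs": ..., "fullsig": ..., "signature": ...}
        }
    }
    Err("unable to decode log with the provided fullsigs".to_string())
}

// Stub so the sketch compiles; the Alloy example below does the real work.
fn try_decode(fullsig: &str, topics: &[[u8; 32]], data: &[u8]) -> Option<String> {
    let _ = (fullsig, topics, data);
    None
}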
The Rust Implementation Journey
ClickHouse communicates with UDF processes via standard input/output, streaming blocks of rows back and forth. The first challenge in porting evm_decode_event was selecting an efficient serialization format for data exchange between ClickHouse and our Rust binary.
Serialization Strategy
In our Go implementation, we leveraged the excellent ch-go library, which provides low-level serialization directly in ClickHouse's native binary format—the most efficient and tightly integrated option available.
Unfortunately, we couldn't find an equivalent implementation of the ClickHouse native format in Rust. As an alternative, we opted for the Arrow IPC format (see the sketch after this list), which offered two key advantages:
- High performance with minimal overhead when converting to/from ClickHouse native blocks
- Mature Rust ecosystem support through libraries like arrow-rs
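To make the exchange concrete, here is a minimal sketch of the stdin/stdout loop using arrow-rs. It assumes ClickHouse is configured to stream its ArrowStream format to the UDF process; a real UDF would declare its own output schema (for example, a single JSON string column) rather than echo its input, and error handling is elided.

use std::io::{stdin, stdout};

use arrow::error::Result;
use arrow::ipc::reader::StreamReader;
use arrow::ipc::writer::StreamWriter;

// Skeleton UDF process: read Arrow IPC batches from stdin, transform them,
// and stream Arrow IPC batches back on stdout.
fn main() -> Result<()> {
    let reader = StreamReader::try_new(stdin().lock(), None)?;
    // Echoing the input schema keeps this sketch self-contained; a real UDF
    // would build its own output schema here.
    let mut writer = StreamWriter::try_new(stdout().lock(), &reader.schema())?;
    for batch in reader {
        let batch = batch?;
        // ... decode each row of `batch` here ...
        writer.write(&batch)?;
    }
    writer.finish()?;
    Ok(())
}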
The Alloy Advantage
For the core decoding logic, Alloy proved to be a game-changer. It provides ergonomic and efficient abstractions for working with Ethereum data, including crucial functionality to parse fullsig strings into typed Event objects. Once parsed, decoding logs from topics and data becomes straightforward.
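As an illustration, here is roughly what that looks like with alloy's json-abi and dyn-abi crates, using the Transfer log from the example above (method signatures vary slightly between alloy versions, so treat this as a sketch):

use alloy_dyn_abi::{DecodedEvent, EventExt};
use alloy_json_abi::Event;
use alloy_primitives::{hex, B256};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse the human-readable fullsig into a typed Event.
    let event = Event::parse("event Transfer(address indexed, address indexed, uint256)")?;

    // Topics and data from the Transfer log shown earlier.
    let topics: Vec<B256> = vec![
        "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef".parse()?,
        "0x00000000000000000000000063dfe4e34a3bfc00eb0220786238a7c6cef8ffc4".parse()?,
        "0x000000000000000000000000936c700adf05d1118d6550a3355f66e93c9476c6".parse()?,
    ];
    let data = hex::decode("0x0000000000000000000000000000000000000000000000000000000252e9f940")?;

    // Decode indexed values from the topics and the remaining inputs from data.
    let decoded: DecodedEvent = event.decode_log_parts(topics, &data)?;
    println!("indexed: {:?}", decoded.indexed); // the two addresses
    println!("body:    {:?}", decoded.body);    // the uint256 amount
    Ok(())
}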
The porting experience was surprisingly smooth—even for someone relatively new to Rust. Alloy handled much of the heavy lifting, and Arrow proved to be a practical bridge to ClickHouse integration.
Benchmarking Setup: Real-World Scale
To ensure meaningful results, we designed our benchmark around substantial real-world data: 100,000 blocks' worth of raw log data (over 40 million rows) from our open-source dataset, extracted using ClickHouse's native format for efficient processing.
Data extraction query:
SELECT *
FROM iceberg('https://data.agnostic.dev/agnostic-ethereum-mainnet/logs',
SETTINGS iceberg_use_version_hint=true)
WHERE block_number BETWEEN 21000000 AND 21099999
INTO OUTFILE './tmp/ethereum_mainnet_logs_sample_21000000_21099999.bin'
FORMAT Native
Dataset characteristics:
SELECT
formatReadableQuantity(count(*)) as total_logs,
min(block_number) as start_block,
max(block_number) as end_block,
formatReadableSize(any(_size)) as file_size
FROM file('./benchmark/ethereum_mainnet_logs_sample_21000000_21099999.bin', Native)
| Total Logs | Start Block | End Block | File Size |
| --- | --- | --- | --- |
| 40.87 million | 21,000,000 | 21,099,999 | 11.40 GiB |
This dataset provides a rich, realistic snapshot of Ethereum mainnet activity—large enough to meaningfully stress-test our decoding pipeline while remaining manageable for iterative experimentation.
ABI Signature Dictionary
We also needed a comprehensive collection of fullsig definitions to interpret the raw log data. We maintain a regularly updated set sourced from excellent daily Sourcify dumps, providing broad coverage of known Ethereum event signatures.
CREATE DICTIONARY evm_abi_decoding (
selector String,
fullsigs Array(String)
)
PRIMARY KEY selector
SOURCE(file(path './benchmark/sourcify_20250519.parquet' format 'Parquet'))
LIFETIME(0)
LAYOUT(hashed())
Dictionary stats:
- Signatures: 1.62 million
- Memory usage: 223.98 MiB
ClickHouse's dictionary engine enables efficient querying of this rich signature set with minimal memory overhead—perfect for high-throughput benchmarking.
Benchmark Results: A Performance Journey
Our benchmark query processes each log row through evm_decode_event, attempting to decode it into structured JSON:
WITH decoded_logs AS (
SELECT
JSONExtract(
evm_decode_event(
topics::Array(FixedString(32)),
data::String,
dictGet(evm_abi_decoding, 'fullsigs', topics[1]::String)
),
'JSON'
) as evt
FROM file('./tmp/ethereum_mainnet_logs_sample_21000000_21099999.bin', 'Native')
)
SELECT
formatReadableQuantity(count(*)) as total_logs,
formatReadableQuantity(countIf(evt.error IS NULL)) as decoded_logs,
formatReadableQuantity(countIf(evt.error IS NOT NULL)) as undecoded_logs,
formatReadableQuantity(countIf(evt.error IS NULL) / count(*)) as decoding_ratio
FROM decoded_logs
Round 1: The Debug Build Embarrassment
Initial results were... disappointing:
| Implementation | Duration | Decoded Logs | Events/s | Decoding Ratio |
| --- | --- | --- | --- | --- |
| clickhouse-evm (Go) | 63s | 39.87M | 632.86K | 0.98 |
| ch-evm (Rust) | 344s | 40.16M | 116.74K | 0.98 |
After some head-scratching, we discovered a classic mistake: we were running a debug build! 🤦‍♂️
Sometimes the biggest performance bottleneck is your own oversight. Always double-check your build mode before blaming the code!
Round 2: Release Build Reality Check
| Implementation | Duration | Decoded Logs | Events/s | Decoding Ratio |
| --- | --- | --- | --- | --- |
| clickhouse-evm (Go) | 63s | 39.87M | 632.86K | 0.98 |
| ch-evm (debug) | 344s | 40.16M | 116.74K | 0.98 |
| ch-evm (release) | 54s | 40.16M | 738.64K | 0.98 |
Much better! The Rust implementation now outperforms Go by roughly 17% (738.64K vs 632.86K events/s), even with the overhead of native ⇄ Arrow conversions. The slightly higher decoded-log count (40.16M vs 39.87M) suggests Alloy's ABI decoder handles more edge cases than our custom Go implementation.
Round 3: Eliminating serde_arrow
Profiling with flamegraph-rs revealed that ~10% of CPU time was spent on serde_arrow conversions. In our Go implementation, we operate directly on columnar data for maximum performance.
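Here is a hedged sketch of what operating on the columns directly looks like with arrow-rs, assuming a batch layout of topics as List<FixedSizeBinary(32)> and data as Binary; the actual column order and names in ch-evm may differ.

use arrow::array::{Array, BinaryArray, FixedSizeBinaryArray, ListArray};
use arrow::record_batch::RecordBatch;

// Walk a batch column-wise: typed views over the Arrow buffers, with no
// per-row deserialization into intermediate structs.
fn process_batch(batch: &RecordBatch) {
    let topics_col = batch
        .column(0)
        .as_any()
        .downcast_ref::<ListArray>()
        .expect("topics: List<FixedSizeBinary(32)>");
    let data_col = batch
        .column(1)
        .as_any()
        .downcast_ref::<BinaryArray>()
        .expect("data: Binary");

    for row in 0..batch.num_rows() {
        // Each list element is one 32-byte topic.
        let topics = topics_col.value(row);
        let topics = topics
            .as_any()
            .downcast_ref::<FixedSizeBinaryArray>()
            .expect("topic: FixedSizeBinary(32)");
        let first_topic: Option<&[u8]> = (!topics.is_empty()).then(|| topics.value(0));
        let data: &[u8] = data_col.value(row);
        // ... decode with alloy and append JSON to an output StringBuilder ...
        let _ = (first_topic, data);
    }
}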
We removed the serde_arrow layer and worked directly with Arrow arrays:
| Implementation | Duration | Decoded Logs | Events/s | Decoding Ratio |
| --- | --- | --- | --- | --- |
| clickhouse-evm (Go) | 63s | 39.87M | 632.86K | 0.98 |
| ch-evm (no serde_arrow) | 41s | 40.16M | 990.68K | 0.98 |
Significant improvement! We're now decoding at nearly 1 million logs per second—a 57% improvement over the original Go implementation.
Round 4: JSON Serialization Optimization
Further profiling showed substantial time spent in serde_json. Instead of converting Alloy's DynSolValue enum to a serde_json::Value and then serializing, we implemented a direct formatter that visits DynSolValue recursively, writing JSON chunks straight to the output buffer.
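A condensed sketch of the idea, using variant names from alloy_dyn_abi::DynSolValue (real code also needs JSON string escaping and the remaining variants):

use alloy_dyn_abi::DynSolValue;
use alloy_primitives::hex;
use std::fmt::Write;

// Visit the decoded value tree and write JSON into one reusable buffer,
// skipping the intermediate serde_json::Value allocation per node.
fn write_json(out: &mut String, value: &DynSolValue) {
    match value {
        DynSolValue::Bool(b) => { let _ = write!(out, "{b}"); }
        DynSolValue::Uint(n, _) => { let _ = write!(out, "\"{n}\""); }
        DynSolValue::Address(a) => { let _ = write!(out, "\"{}\"", a.to_checksum(None)); }
        DynSolValue::Bytes(b) => { let _ = write!(out, "\"0x{}\"", hex::encode(b)); }
        DynSolValue::Array(items)
        | DynSolValue::FixedArray(items)
        | DynSolValue::Tuple(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 { out.push(','); }
                write_json(out, item);
            }
            out.push(']');
        }
        // Int, String, FixedBytes, Function, ... elided for brevity.
        _ => out.push_str("null"),
    }
}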
This eliminated a whole layer of allocations:
| Implementation | Duration | Decoded Logs | Events/s | Decoding Ratio |
| --- | --- | --- | --- | --- |
| clickhouse-evm (Go) | 63s | 39.87M | 632.86K | 0.98 |
| ch-evm (final optimized) | 34s | 40.16M | 1.21M | 0.98 |
Final Results: 91% Performance Improvement
Here is the full journey, summarized:

| Implementation | Duration | Events/s |
| --- | --- | --- |
| clickhouse-evm (Go) | 63s | 632.86K |
| ch-evm (debug) | 344s | 116.74K |
| ch-evm (release) | 54s | 738.64K |
| ch-evm (no serde_arrow) | 41s | 990.68K |
| ch-evm (final optimized) | 34s | 1.21M |
Our optimized Rust implementation achieved 1.21 million events per second—a 91% performance improvement over the original Go version while maintaining the same high decoding accuracy.
Beyond Performance: Strategic Advantages
This exploration delivered more than just speed improvements:
Developer Experience
Alloy's ergonomic design makes complex Ethereum operations feel natural and safe. The type system catches errors at compile time that might only surface during runtime in other languages.
Portability
The Rust implementation positions us well for expansion beyond ClickHouse. We can now more easily bring these UDFs to DataFusion, DuckDB, and other analytical engines.
Ecosystem Access
Rust opens doors to blockchain ecosystems where it's the dominant language—particularly Solana. This natural fit could significantly expand Agnostic's reach in blockchain data analytics.
Looking Forward
This proof-of-concept has been so successful that we've decided to rewrite all UDFs in clickhouse-evm using Rust. The new repository, ch-evm, will be our primary development focus going forward, with all new UDFs developed in Rust.
The combination of Rust's performance characteristics, Alloy's powerful abstractions, and the broader ecosystem opportunities make this transition a strategic win for Agnostic's data infrastructure evolution.
Conclusion
The Rust + Alloy combination has proven to be a powerful force multiplier for blockchain data processing. What started as an exploration of potential performance gains has evolved into a comprehensive strategy for expanding our capabilities across the blockchain ecosystem.
For teams working with blockchain data at scale, this experience demonstrates that modern Rust tooling—particularly Alloy—has matured to the point where it can deliver both superior performance and developer productivity. The future of blockchain data infrastructure is looking increasingly Rust-colored.
Interested in blockchain data infrastructure? Check out our open-source tools at Agnostic or explore our datasets at data.agnostic.dev.