At Agnostic, we build open-source infrastructure for collaborative blockchain data platforms. One of our flagship tools is clickhouse-evm, a suite of high-performance User Defined Functions (UDFs) that brings native Ethereum decoding and querying capabilities directly into ClickHouse.
While our Go-based implementation has served us well, we've been exploring whether Rust—with its rapidly maturing Ethereum ecosystem—could take us even further. The potential benefits are compelling: better performance, enhanced safety, and improved portability that could make it easier to bring these UDFs to other analytical engines like DataFusion or DuckDB.
At the heart of this exploration is Alloy, a promising Rust library offering composable, well-designed primitives for Ethereum data. Its type system and tooling make it an ideal candidate for building cleaner, more robust decoding logic and ABI handling.
To put this theory to the test, we reimplemented a core piece of our stack in Rust: the evm_decode_event UDF. The results were encouraging enough that we wanted to share our findings.
This post walks through our benchmarking methodology, compares the Rust and Go implementations, and explores the performance gains, developer experience improvements, and future opportunities this migration unlocks.
Understanding evm_decode_event
Before diving into the benchmarks, let's clarify what evm_decode_event actually does. Here's a practical example:
SELECT evm_decode_event(
    [
        evm_hex_decode('0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'),
        evm_hex_decode('0x00000000000000000000000063dfe4e34a3bfc00eb0220786238a7c6cef8ffc4'),
        evm_hex_decode('0x000000000000000000000000936c700adf05d1118d6550a3355f66e93c9476c6')
    ]::Array(FixedString(32)),
    evm_hex_decode('0x0000000000000000000000000000000000000000000000000000000252e9f940'),
    ['event Transfer(address indexed,address indexed,uint256)']
) as evt
SETTINGS output_format_arrow_string_as_string=0
Result:
{
  "inputs": {
    "arg0": "0x63DFE4e34A3bFC00eB0220786238a7C6cEF8Ffc4",
    "arg1": "0x936C700Adf05d1118D6550A3355f66e93C9476C6",
    "arg2": "9981000000"
  },
  "fullsig": "event Transfer(address indexed, address indexed, uint256)",
  "signature": "Transfer(address,address,uint256)"
}
The UDF takes raw bytes from a log's topics and data, along with a list of fullsig strings—compact, human-readable representations of ABI event signatures containing all the information needed for decoding. It attempts to decode the log using each fullsig in sequence, returning a JSON representation of the decoded event on success, or an error if no signatures match.
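In Rust terms, that contract boils down to a sequential-match loop. The sketch below is illustrative only: try_decode is a hypothetical stand-in for the real fullsig parsing and ABI decoding, which we cover in the Alloy section later in this post.

/// Hypothetical core of the UDF: try each candidate fullsig in order and
/// return the first successful decode as a JSON string.
fn decode_event(
    topics: &[[u8; 32]],
    data: &[u8],
    fullsigs: &[String],
) -> Result<String, String> {
    for fullsig in fullsigs {
        // try_decode stands in for fullsig parsing + ABI decoding.
        if let Some(json) = try_decode(fullsig, topics, data) {
            return Ok(json); // {"inputs": ..., "fullsig": ..., "signature": ...}
        }
    }
    Err("unable to decode log with the provided fullsigs".to_string())
}

// Stub so the sketch compiles; the Alloy example below does the real work.
fn try_decode(fullsig: &str, topics: &[[u8; 32]], data: &[u8]) -> Option<String> {
    let _ = (fullsig, topics, data);
    None
}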
The Rust Implementation Journey
ClickHouse communicates with UDF processes via standard input/output, streaming blocks of rows back and forth. The first challenge in porting evm_decode_event was selecting an efficient serialization format for data exchange between ClickHouse and our Rust binary.
Serialization Strategy
In our Go implementation, we leveraged the excellent ch-go library, which provides low-level serialization directly in ClickHouse's native binary format—the most efficient and tightly integrated option available.
Unfortunately, we couldn't find an equivalent implementation of the ClickHouse native format in Rust. As an alternative, we opted for the Arrow IPC format (see the sketch after this list), which offered two key advantages:
- High performance with minimal overhead when converting to/from ClickHouse native blocks
- Mature Rust ecosystem support through libraries like arrow-rs
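To make the exchange concrete, here is a minimal sketch of the stdin/stdout loop using arrow-rs. It assumes ClickHouse is configured to stream its ArrowStream format to the UDF process; a real UDF would declare its own output schema (for example, a single JSON string column) rather than echo its input, and error handling is elided.

use std::io::{stdin, stdout};

use arrow::error::Result;
use arrow::ipc::reader::StreamReader;
use arrow::ipc::writer::StreamWriter;

// Skeleton UDF process: read Arrow IPC batches from stdin, transform them,
// and stream Arrow IPC batches back on stdout.
fn main() -> Result<()> {
    let reader = StreamReader::try_new(stdin().lock(), None)?;
    // Echoing the input schema keeps this sketch self-contained; a real UDF
    // would build its own output schema here.
    let mut writer = StreamWriter::try_new(stdout().lock(), &reader.schema())?;
    for batch in reader {
        let batch = batch?;
        // ... decode each row of `batch` here ...
        writer.write(&batch)?;
    }
    writer.finish()?;
    Ok(())
}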
The Alloy Advantage
For the core decoding logic, Alloy proved to be a game-changer. It provides ergonomic and efficient abstractions for working with Ethereum data, including crucial functionality to parse fullsig strings into typed Event objects. Once parsed, decoding logs from topics and data becomes straightforward.
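As an illustration, here is roughly what that looks like with alloy's json-abi and dyn-abi crates, using the Transfer log from the example above (method signatures vary slightly between alloy versions, so treat this as a sketch):

use alloy_dyn_abi::{DecodedEvent, EventExt};
use alloy_json_abi::Event;
use alloy_primitives::{hex, B256};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse the human-readable fullsig into a typed Event.
    let event = Event::parse("event Transfer(address indexed, address indexed, uint256)")?;

    // Topics and data from the Transfer log shown earlier.
    let topics: Vec<B256> = vec![
        "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef".parse()?,
        "0x00000000000000000000000063dfe4e34a3bfc00eb0220786238a7c6cef8ffc4".parse()?,
        "0x000000000000000000000000936c700adf05d1118d6550a3355f66e93c9476c6".parse()?,
    ];
    let data = hex::decode("0x0000000000000000000000000000000000000000000000000000000252e9f940")?;

    // Decode indexed values from the topics and the remaining inputs from data.
    let decoded: DecodedEvent = event.decode_log_parts(topics, &data)?;
    println!("indexed: {:?}", decoded.indexed); // the two addresses
    println!("body:    {:?}", decoded.body);    // the uint256 amount
    Ok(())
}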
The porting experience was surprisingly smooth—even for someone relatively new to Rust. Alloy handled much of the heavy lifting, and Arrow proved to be a practical bridge to ClickHouse integration.
Benchmarking Setup: Real-World Scale
To ensure meaningful results, we designed our benchmark around substantial real-world data: 100,000 blocks' worth of raw log data (over 40 million rows) from our open-source dataset, extracted using ClickHouse's native format for efficient processing.
Data extraction query:
SELECT *
FROM iceberg('https://data.agnostic.dev/agnostic-ethereum-mainnet/logs',
SETTINGS iceberg_use_version_hint=true)
WHERE block_number BETWEEN 21000000 AND 21099999
INTO OUTFILE './tmp/ethereum_mainnet_logs_sample_21000000_21099999.bin'
FORMAT Native
Dataset characteristics:
SELECT
formatReadableQuantity(count(*)) as total_logs,
min(block_number) as start_block,
max(block_number) as end_block,
formatReadableSize(any(_size)) as file_size
FROM file('./benchmark/ethereum_mainnet_logs_sample_21000000_21099999.bin', Native)
| Total Logs | Start Block | End Block | File Size |
| --- | --- | --- | --- |
| 40.87 million | 21,000,000 | 21,099,999 | 11.40 GiB |
This dataset provides a rich, realistic snapshot of Ethereum mainnet activity—large enough to meaningfully stress-test our decoding pipeline while remaining manageable for iterative experimentation.
ABI Signature Dictionary
We also needed a comprehensive collection of fullsig definitions to interpret the raw log data. We maintain a regularly updated set sourced from excellent daily Sourcify dumps, providing broad coverage of known Ethereum event signatures.
CREATE DICTIONARY evm_abi_decoding (
selector String,
fullsigs Array(String)
)
PRIMARY KEY selector
SOURCE(file(path './benchmark/sourcify_20250519.parquet' format 'Parquet'))
LIFETIME(0)
LAYOUT(hashed())
Dictionary stats:
- Signatures: 1.62 million
- Memory usage: 223.98 MiB
ClickHouse's dictionary engine enables efficient querying of this rich signature set with minimal memory overhead—perfect for high-throughput benchmarking.
Benchmark Results: A Performance Journey
Our benchmark query processes each log row through evm_decode_event, attempting to decode it into structured JSON:
WITH decoded_logs AS (
SELECT
JSONExtract(
evm_decode_event(
topics::Array(FixedString(32)),
data::String,
dictGet(evm_abi_decoding, 'fullsigs', topics[1]::String)
),
'JSON'
) as evt
FROM file('./tmp/ethereum_mainnet_logs_sample_21000000_21099999.bin', 'Native')
)
SELECT
formatReadableQuantity(count(*)) as total_logs,
formatReadableQuantity(countIf(evt.error IS NULL)) as decoded_logs,
formatReadableQuantity(countIf(evt.error IS NOT NULL)) as undecoded_logs,
formatReadableQuantity(countIf(evt.error IS NULL) / count(*)) as decoding_ratio
FROM decoded_logs
Round 1: The Debug Build Embarrassment
Initial results were... disappointing:
| Implementation | Duration | Decoded Logs | Events/s | Decoding Ratio |
| --- | --- | --- | --- | --- |
| clickhouse-evm (Go) | 63s | 39.87M | 632.86K | 0.98 |
| ch-evm (Rust) | 344s | 40.16M | 116.74K | 0.98 |
After some head-scratching, we discovered a classic mistake: we were running a debug build! 🤦‍♂️
Sometimes the biggest performance bottleneck is your own oversight. Always double-check your build mode before blaming the code!
Round 2: Release Build Reality Check
| Implementation | Duration | Decoded Logs | Events/s | Decoding Ratio |
| --- | --- | --- | --- | --- |
| clickhouse-evm (Go) | 63s | 39.87M | 632.86K | 0.98 |
| ch-evm (debug) | 344s | 40.16M | 116.74K | 0.98 |
| ch-evm (release) | 54s | 40.16M | 738.64K | 0.98 |
Much better! The Rust implementation now outperforms Go by roughly 17% (738.64K vs 632.86K events/s), even with the overhead of native ⇄ Arrow conversions. The slightly higher decoded-log count (40.16M vs 39.87M) suggests Alloy's ABI decoder handles more edge cases than our custom Go implementation.
Round 3: Eliminating serde_arrow
Profiling with flamegraph-rs revealed that ~10% of CPU time was spent on serde_arrow conversions. In our Go implementation, we operate directly on columnar data for maximum performance.
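Here is a hedged sketch of what operating on the columns directly looks like with arrow-rs, assuming a batch layout of topics as List<FixedSizeBinary(32)> and data as Binary; the actual column order and names in ch-evm may differ.

use arrow::array::{Array, BinaryArray, FixedSizeBinaryArray, ListArray};
use arrow::record_batch::RecordBatch;

// Walk a batch column-wise: typed views over the Arrow buffers, with no
// per-row deserialization into intermediate structs.
fn process_batch(batch: &RecordBatch) {
    let topics_col = batch
        .column(0)
        .as_any()
        .downcast_ref::<ListArray>()
        .expect("topics: List<FixedSizeBinary(32)>");
    let data_col = batch
        .column(1)
        .as_any()
        .downcast_ref::<BinaryArray>()
        .expect("data: Binary");

    for row in 0..batch.num_rows() {
        // Each list element is one 32-byte topic.
        let topics = topics_col.value(row);
        let topics = topics
            .as_any()
            .downcast_ref::<FixedSizeBinaryArray>()
            .expect("topic: FixedSizeBinary(32)");
        let first_topic: Option<&[u8]> = (!topics.is_empty()).then(|| topics.value(0));
        let data: &[u8] = data_col.value(row);
        // ... decode with alloy and append JSON to an output StringBuilder ...
        let _ = (first_topic, data);
    }
}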
We removed the serde_arrow layer and worked directly with Arrow arrays:
| Implementation | Duration | Decoded Logs | Events/s | Decoding Ratio |
| --- | --- | --- | --- | --- |
| clickhouse-evm (Go) | 63s | 39.87M | 632.86K | 0.98 |
| ch-evm (no serde_arrow) | 41s | 40.16M | 990.68K | 0.98 |
Significant improvement! We're now decoding at nearly 1 million logs per second—a 57% improvement over the original Go implementation.
Round 4: JSON Serialization Optimization
Further profiling showed substantial time spent in serde_json. Instead of converting Alloy's DynSolValue enum to a serde_json::Value and then serializing, we implemented a direct formatter that visits DynSolValue recursively, writing JSON chunks straight to the output buffer.
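A condensed sketch of the idea, using variant names from alloy_dyn_abi::DynSolValue (real code also needs JSON string escaping and the remaining variants):

use alloy_dyn_abi::DynSolValue;
use alloy_primitives::hex;
use std::fmt::Write;

// Visit the decoded value tree and write JSON into one reusable buffer,
// skipping the intermediate serde_json::Value allocation per node.
fn write_json(out: &mut String, value: &DynSolValue) {
    match value {
        DynSolValue::Bool(b) => { let _ = write!(out, "{b}"); }
        DynSolValue::Uint(n, _) => { let _ = write!(out, "\"{n}\""); }
        DynSolValue::Address(a) => { let _ = write!(out, "\"{}\"", a.to_checksum(None)); }
        DynSolValue::Bytes(b) => { let _ = write!(out, "\"0x{}\"", hex::encode(b)); }
        DynSolValue::Array(items)
        | DynSolValue::FixedArray(items)
        | DynSolValue::Tuple(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 { out.push(','); }
                write_json(out, item);
            }
            out.push(']');
        }
        // Int, String, FixedBytes, Function, ... elided for brevity.
        _ => out.push_str("null"),
    }
}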
This eliminated a whole layer of allocations:
| Implementation | Duration | Decoded Logs | Events/s | Decoding Ratio |
| --- | --- | --- | --- | --- |
| clickhouse-evm (Go) | 63s | 39.87M | 632.86K | 0.98 |
| ch-evm (final optimized) | 34s | 40.16M | 1.21M | 0.98 |
Final Results: 91% Performance Improvement
Here is the full journey, summarized:

| Implementation | Duration | Events/s |
| --- | --- | --- |
| clickhouse-evm (Go) | 63s | 632.86K |
| ch-evm (debug) | 344s | 116.74K |
| ch-evm (release) | 54s | 738.64K |
| ch-evm (no serde_arrow) | 41s | 990.68K |
| ch-evm (final optimized) | 34s | 1.21M |
Our optimized Rust implementation achieved 1.21 million events per second—a 91% performance improvement over the original Go version while maintaining the same high decoding accuracy.
Beyond Performance: Strategic Advantages
This exploration delivered more than just speed improvements:
Developer Experience
Alloy's ergonomic design makes complex Ethereum operations feel natural and safe. The type system catches errors at compile time that might only surface during runtime in other languages.
Portability
The Rust implementation positions us well for expansion beyond ClickHouse. We can now more easily bring these UDFs to DataFusion, DuckDB, and other analytical engines.
Ecosystem Access
Rust opens doors to blockchain ecosystems where it's the dominant language—particularly Solana. This natural fit could significantly expand Agnostic's reach in blockchain data analytics.
Looking Forward
This proof-of-concept has been so successful that we've decided to rewrite all UDFs in clickhouse-evm using Rust. The new repository, ch-evm, will be our primary development focus going forward, with all new UDFs developed in Rust.
The combination of Rust's performance characteristics, Alloy's powerful abstractions, and the broader ecosystem opportunities make this transition a strategic win for Agnostic's data infrastructure evolution.
Conclusion
The Rust + Alloy combination has proven to be a powerful force multiplier for blockchain data processing. What started as an exploration of potential performance gains has evolved into a comprehensive strategy for expanding our capabilities across the blockchain ecosystem.
For teams working with blockchain data at scale, this experience demonstrates that modern Rust tooling—particularly Alloy—has matured to the point where it can deliver both superior performance and developer productivity. The future of blockchain data infrastructure is looking increasingly Rust-colored.
Interested in blockchain data infrastructure? Check out our open-source tools at Agnostic or explore our datasets at data.agnostic.dev.