boba bobo

Posted on Jun 18

OpenChainBench: the open audit layer crypto infrastructure was missing

#webdev #opensource #crypto #api

Every time I wanted to pick a crypto data provider, an RPC, a bridge aggregator or a price oracle, I hit the same wall. Every vendor claims the lowest latency. Every vendor claims the best uptime. Nobody publishes a methodology you can audit. The status pages are green even when production is on fire. Comparison posts read like sponsored content.

So I built the thing I always wanted, then made it fully open source.

It is called OpenChainBench. Today it runs over 20 benchmarks across six categories, on a fixed methodology, with raw Prometheus metrics anyone can query, a public MCP server for AI agents, JSON endpoints for any frontend, and badges you can drop into a README. The site, the harnesses, the specs and the data layer code are MIT licensed. The data itself is CC BY 4.0.

This post walks through what has shipped, why we built it the way we did, and how to use it if you ship anything that talks to a chain.

The shape of the problem

The crypto API stack is now wide. A typical dapp or trading bot will touch:

An RPC provider (Alchemy, QuickNode, Ankr, Tenderly, dRPC, PublicNode, a self hosted node).
A market data provider (Mobula, CoinGecko, CoinMarketCap, Coinpaprika, DefiLlama).
A bridge aggregator (LiFi, Socket, Squid, Across, Jumper).
A perp or DEX aggregator (Jupiter, 1inch, Paraswap, OpenOcean).
A wallet label or NFT metadata service (Reservoir, Simplehash, OpenSea, Magic Eden).
An oracle (Chainlink, Pyth, Redstone, Switchboard).
A prediction market venue (Polymarket, Kalshi).

Each layer ships marketing pages with three numbers and an SLA. None of them publish the actual distributions. None of them tell you what happens at p99 when the chain forks, when there is a 200 BNB volume spike, when Singapore goes through a degraded peering window.

OpenChainBench measures the providers in each of these layers, side by side, on identical inputs, from multiple regions, every minute, into a shared Prometheus instance you can query.

The methodology in five rules

The whole project hinges on a methodology people can attack. If the design is opaque, the numbers are worthless. So we wrote it down and pinned it to every page.

Identical inputs. Every provider in a given benchmark receives the same request, the same pair, the same notional, the same destination, the same region, submitted concurrently. If we test bridge quote latency for 1 ETH from Arbitrum to Base, all aggregators get that same payload in the same second.
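
To make the rule concrete, here is a minimal sketch of one measurement round. It is not the project's harness: the payload field names and the two stand-in providers are made up for illustration, and a real harness would make HTTP calls where the stand-ins sleep. The point is only that every provider receives a copy of the same payload and all calls are launched together.

```python
import asyncio
import time

# Hypothetical payload: field names are illustrative, not the project's schema.
PAYLOAD = {"from_chain": "arbitrum", "to_chain": "base", "token": "ETH", "amount": "1"}


async def timed_call(name, call, payload):
    """Send one request and record latency and success for that provider."""
    start = time.perf_counter()
    try:
        await call(dict(payload))  # each provider gets its own copy of the same input
        ok = True
    except Exception:
        ok = False
    return {"provider": name, "ok": ok, "latency_s": time.perf_counter() - start}


async def run_round(providers, payload):
    """Fire the same payload at every provider concurrently."""
    tasks = [timed_call(name, call, payload) for name, call in providers.items()]
    return await asyncio.gather(*tasks)


# Stand-in providers that simulate network calls.
async def fast_provider(payload):
    await asyncio.sleep(0.01)
    return {"quote": "ok"}


async def failing_provider(payload):
    await asyncio.sleep(0.02)
    raise RuntimeError("simulated upstream error")


results = asyncio.run(
    run_round({"fast": fast_provider, "failing": failing_provider}, PAYLOAD)
)
```

Failures are recorded rather than dropped, so they can feed the success rate in the next rule.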

Honest aggregates. We publish p50, p90 and p99 latency, plus success rate. We never headline means. A vendor that fails 30 percent of requests but is fast on the other 70 percent will not look fast in our tables.
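
A small sketch of why percentiles plus success rate are harder to game than a mean. The nearest-rank percentile is a standard definition. Charging each failed request a fixed timeout is my assumption for the example, as is the 10 second value. The post does not say how the real harness scores failures.

```python
import math


def percentile(sorted_values, q):
    """Nearest-rank percentile: the smallest sample with at least q percent at or below it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(samples, timeout_s=10.0):
    """samples is a list of (ok, latency_s). Failures are charged timeout_s (assumption)."""
    charged = sorted(lat if ok else timeout_s for ok, lat in samples)
    successes = sum(1 for ok, _ in samples if ok)
    return {
        "p50": percentile(charged, 50),
        "p90": percentile(charged, 90),
        "p99": percentile(charged, 99),
        "success_rate": successes / len(samples),
    }


# The vendor from the text: 70 fast successes, 30 failures.
samples = [(True, 0.05)] * 70 + [(False, 0.0)] * 30
summary = summarize(samples)
```

In this example the median still reads 0.05 seconds, but p90 and p99 both land on the timeout and the success rate reads 0.7. A mean over the successful calls alone would have hidden all of that.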

Auditable runs. Every benchmark spec is a YAML file in the repo. Every measurement lands in a public Prometheus federation. The harness code is open. If you do not believe a number, you can rerun the bench against the same provider with the same query.

No cherry picking. The benchmark plan is committed before the first run. Providers are locked. We do not silently drop the providers that look bad in our tables.

Neutral presentation. Tables sort mechanically by p50. There is no preset winner. The site does not say "X is the best". It says "X had the lowest median over the rolling 24 hour window on this date". If the order changes tomorrow, the page changes tomorrow.
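
As a rough illustration of a mechanical sort, the sketch below ranks rows on p50 alone and builds the headline sentence from whatever ends up first. The row shape, the provider names and the tie break on name are assumptions for the example, not the site's actual code.

```python
def rank_by_p50(rows):
    """Sort on median latency only; ties break on provider name so the order is deterministic."""
    return sorted(rows, key=lambda r: (r["p50_ms"], r["provider"]))


def headline(ranked, as_of, window="rolling 24 hour window"):
    """Describe the current leader without declaring a winner."""
    leader = ranked[0]["provider"]
    return f"{leader} had the lowest median over the {window} on {as_of}"


# Placeholder rows; names and values are invented.
rows = [
    {"provider": "provider_c", "p50_ms": 120.0},
    {"provider": "provider_a", "p50_ms": 95.0},
    {"provider": "provider_b", "p50_ms": 120.0},
]
ranked = rank_by_p50(rows)
sentence = headline(ranked, as_of="the snapshot date")
```

Nothing in the ranking code knows which provider it is looking at, which is the whole point.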

The full version with examples lives at openchainbench.com/methodology.

The benchmarks at a glance

Six categories. The names are intentionally boring because we want vendors to engage with the criteria, not the branding.

Aggregators. Bridge quote latency, bridge fee, aggregator head lag (how stale is the index against the chain tip), perp fees.

Blockchains. L1 finality, L2 block time, validator yield, gas estimation accuracy, token deployment cost, network fees.

Bridges. Bridge quote latency, bridge fee.

NFT APIs. Collection metadata coverage.

RPCs. RPC capabilities matrix, network coverage, metadata coverage.

Trading. Polymarket API latency, Polymarket data freshness, Polymarket rate limits, Polymarket resolution delay, Hyperliquid frontends, stablecoin peg, oracle deviation, buyback audit.

You can browse the full list at openchainbench.com/benchmarks.

How the site is organised

The information architecture today is built around two stable surfaces.

/products/[provider]. One page per provider. Shows every bench that provider appears in, the current rank, the headline value, and a sparkline. The full provider registry is browsable at openchainbench.com/products.

/compare/[pair]. Head to head pages between curated provider pairs. Each page surfaces the aggregate values, the per chain breakdown when the bench exposes a chain dimension, and the per region breakdown when the bench exposes a region dimension.

The shape of the site will grow. New surfaces ship on a staging branch first so we never push a half wired hub straight to production.

The data, free and CORS open

OpenChainBench is useless if you cannot pull the data out. So every measurement is exposed through plain JSON endpoints, with CORS open, edge cached, CC BY 4.0 licensed.

# Flat index of every live benchmark
curl https://openchainbench.com/api/citable

# Full detail for one bench (rankings, sparkline, methodology, citation)
curl https://openchainbench.com/api/stat/aggregator-head-lag

# Freshness probe (one timestamp per bench)
curl https://openchainbench.com/api/freshness

# A markdown blob ready to inject in a system prompt
curl https://openchainbench.com/api/llm-context

# OpenAPI 3.1 schema of all endpoints
curl https://openchainbench.com/api/openapi.json
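
The same calls from Python, as a minimal standard library sketch. The helper names are mine. The code deliberately assumes nothing about the response fields. Check the OpenAPI schema for the real shape before indexing into the result.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://openchainbench.com/api"
CITABLE_URL = f"{BASE_URL}/citable"      # flat index of every live benchmark
FRESHNESS_URL = f"{BASE_URL}/freshness"  # one timestamp per bench


def stat_url(slug):
    """Build the detail URL for one bench, escaping the slug defensively."""
    return f"{BASE_URL}/stat/{quote(slug, safe='')}"


def fetch_json(url, timeout_s=10.0):
    """GET a URL and decode the JSON body. Raises on HTTP or network errors."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like `fetch_json(stat_url("aggregator-head-lag"))`. Nothing here hits the network until `fetch_json` is called.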

Two extra surfaces worth knowing about.

Badges. Drop the current rank of a provider into any README. The SVG is refreshed every minute.

![bench](https://openchainbench.com/api/badge/aggregator-head-lag/codex)

OG images. Each bench gets a 1200 by 630 PNG with the current value, the leader and a tiny sparkline. Tweet a bench and the embed is current, not a screenshot from last week.

https://openchainbench.com/api/og/aggregator-head-lag

The MCP server

This one is the piece I am most proud of. AI agents need provider performance data, but they need it in a form they can reason about. Static markdown goes stale. Web scraping is slow and fragile. So we ship a Model Context Protocol server.

Endpoint: https://openchainbench.com/api/mcp/mcp (Streamable HTTP, no SSE).

It exposes three tools:

list_benchmarks returns the flat index. The agent uses this to discover what is available.
get_benchmark(slug, chain?, region?) returns full detail for one bench with optional dimensional slicing.
query_prom(query, windowSec?, steps?) is a guarded PromQL passthrough.
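
For readers who have not seen MCP on the wire, tool calls are JSON-RPC 2.0 requests with the method tools/call. The sketch below only builds the request bodies for the three tools. The argument names come from the signatures above. The example values (the chain name, the window and step counts) are placeholders I picked. In practice an MCP client library would handle the initialize handshake and the HTTP transport for you.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids


def tool_call(name, arguments=None):
    """Build a JSON-RPC 2.0 request that invokes one MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


def list_benchmarks_call():
    return tool_call("list_benchmarks")


def get_benchmark_call(slug, chain=None, region=None):
    """chain and region are optional, so they are only sent when provided."""
    args = {"slug": slug}
    if chain is not None:
        args["chain"] = chain
    if region is not None:
        args["region"] = region
    return tool_call("get_benchmark", args)


def query_prom_call(query, window_sec=None, steps=None):
    args = {"query": query}
    if window_sec is not None:
        args["windowSec"] = window_sec
    if steps is not None:
        args["steps"] = steps
    return tool_call("query_prom", args)


# Serialized body for one call; "base" is an illustrative chain value.
body = json.dumps(get_benchmark_call("aggregator-head-lag", chain="base"))
```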

Resources are also pinnable. Every live bench is exposed as openchainbench://benchmark/{slug} in both Markdown and JSON, so an agent can stick a benchmark in its context window for the whole session.

The PromQL passthrough is the interesting part. We want agents to be able to write their own queries, but we cannot let them enumerate the entire Prometheus instance. So the endpoint runs every query through a hardening pipeline.

Pattern blocks reject __name__=~, empty selectors {}, and aggregations over topology labels.
Every bare metric name must match an allowlisted prefix tied to a published benchmark namespace (head_lag_seconds, bridge_*, l1_finality_*, metadata_*, and so on).
The query is stripped of comments, strings and durations before the scan, so Unicode whitespace tricks do not slip through.
Per IP rate limit at 60 requests per 60 seconds. Request body capped at 64 KB. JSON RPC batches rejected.
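
To give a feel for the first three steps, here is a regex-based sketch of such a guard. It is not the production pipeline. The allowlist only contains the prefixes named above, the topology labels (instance, job) are my guess, and rejecting non-ASCII outside string literals is a simplification I added. A real guard would more likely walk a proper PromQL parse tree.

```python
import re

# Prefixes named in the text; the real allowlist is longer.
ALLOWED_PREFIXES = ("head_lag_seconds", "bridge_", "l1_finality_", "metadata_")
# Hypothetical topology labels; the post does not list them.
TOPOLOGY_LABELS = {"instance", "job"}
# PromQL keywords that can appear bare and are not metric names (partial list).
KEYWORDS = {"by", "without", "on", "ignoring", "group_left", "group_right",
            "and", "or", "unless", "offset", "bool"}

STRING_OR_COMMENT = re.compile(r'"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'|`[^`]*`|#[^\n]*')
DURATION = re.compile(r"\[[^\]]*\]|\b\d+(?:ms|[smhdwy])\b")
GROUPING = re.compile(r"\b(by|without|on|ignoring|group_left|group_right)\s*\(([^)]*)\)")
SELECTOR = re.compile(r"\{[^}]*\}")
IDENT = re.compile(r"(?<![\w:.])[A-Za-z_:][A-Za-z0-9_:]*")


def strip_noise(query):
    """Replace string literals with "", then drop comments and durations."""
    def repl(match):
        return " " if match.group(0).startswith("#") else '""'
    return DURATION.sub(" ", STRING_OR_COMMENT.sub(repl, query))


def check_query(query):
    """Return None if the query passes, otherwise a short rejection reason."""
    text = strip_noise(query)
    if not text.isascii():
        return "non-ASCII outside string literals"
    if re.search(r"__name__\s*=~", text):
        return "regex match on __name__"
    for m in SELECTOR.finditer(text):
        if m.group(0)[1:-1].strip() == "":
            return "empty selector"
        if not re.search(r"[A-Za-z0-9_:]\s*$", text[:m.start()]):
            return "selector without a metric name"
    for m in GROUPING.finditer(text):
        labels = {part.strip() for part in m.group(2).split(",")}
        if m.group(1) in ("by", "without") and labels & TOPOLOGY_LABELS:
            return "aggregation over a topology label"
    # Remove label lists and selectors so only functions and metric names remain.
    bare = SELECTOR.sub(" ", GROUPING.sub(" ", text))
    for m in IDENT.finditer(bare):
        name = m.group(0)
        if name in KEYWORDS or re.match(r"\s*\(", bare[m.end():]):
            continue  # keyword, function, or aggregation operator
        if not name.startswith(ALLOWED_PREFIXES):
            return f"metric {name!r} is not allowlisted"
    return None
```

The sketch errs on the side of rejecting. Anything it cannot classify as a keyword or a function call has to match the allowlist, so an unfamiliar but harmless query fails closed rather than open.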

No auth, no signup, public.

A Claude or Cursor session that adds OpenChainBench as an MCP server gets full read access to live infrastructure performance, in tool form, with cited URLs the agent can include in its answers. The walkthrough is at openchainbench.com/mcp.

How we run it

The architecture is intentionally boring.

Site is Next.js on Vercel, App Router, ISR.
Snapshots live in Upstash Redis. The site reads snapshots only, never Prometheus directly.
A worker on Railway sweeps every bench every 60 seconds and writes new snapshots.
The harnesses for each bench category run on dedicated Railway containers. Each one hardcodes a :2112 Prometheus exporter so the shared federation can scrape it on a fixed port.
The Prometheus federation is one shared instance. Every harness is scraped into it.
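
For a sense of what a fixed port exporter involves, here is a standard library sketch that serves one gauge in the Prometheus text exposition format on :2112. The real harnesses most likely use a Prometheus client library instead. The label names (provider, region), the sample values and the choice of a gauge for head_lag_seconds are assumptions for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PORT = 2112  # the fixed exporter port mentioned in the text

# Latest value per (provider, region); names and numbers are placeholders.
SAMPLES = {("provider_a", "region_1"): 1.5}


def escape_label(value):
    """Escape backslash, double quote and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(samples):
    """Render the samples as Prometheus text exposition lines."""
    lines = [
        "# HELP head_lag_seconds Seconds the provider index trails the chain tip.",
        "# TYPE head_lag_seconds gauge",
    ]
    for (provider, region), value in sorted(samples.items()):
        labels = f'provider="{escape_label(provider)}",region="{escape_label(region)}"'
        lines.append(f"head_lag_seconds{{{labels}}} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Block forever serving /metrics; call this from the harness entry point."""
    HTTPServer(("0.0.0.0", PORT), MetricsHandler).serve_forever()
```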

The snapshot store doubles as a carry forward layer, and that matters. When a single harness goes down (CDN issue, RPC outage, container restart), the page still renders the last known value with a staleness indicator instead of going blank. Cold misses are rare because the worker is always warm.
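
The carry forward idea fits in a few lines. This is a sketch, not the worker's code. The Snapshot shape is invented, and the threshold of three missed sweeps before a value is flagged stale is an assumption. Only the 60 second sweep interval comes from the setup described above.

```python
from dataclasses import dataclass
from typing import Optional

SWEEP_INTERVAL_S = 60                 # the worker sweeps every 60 seconds
STALE_AFTER_S = 3 * SWEEP_INTERVAL_S  # assumption: flag after three missed sweeps


@dataclass
class Snapshot:
    value: float
    measured_at: float  # unix seconds


def carry_forward(previous: Optional[Snapshot], fresh: Optional[Snapshot]) -> Optional[Snapshot]:
    """Keep the last known snapshot when a sweep produced nothing."""
    return fresh if fresh is not None else previous


def render(snapshot: Optional[Snapshot], now: float) -> dict:
    """What the page would show: the value plus a staleness flag, never a blank."""
    if snapshot is None:
        return {"value": None, "stale": True, "age_s": None}
    age = now - snapshot.measured_at
    return {"value": snapshot.value, "stale": age > STALE_AFTER_S, "age_s": age}
```

If a harness misses a sweep, `carry_forward(last, None)` returns `last` unchanged, and `render` starts flagging it once its age passes the threshold.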

Hosting cost is in the tens of dollars per month, not hundreds. The whole thing can pay for itself in newsletter sponsorships when it gets traction.

The vision

There are three places I want to push this in the next twelve months.

Coverage. Twenty plus benchmarks is enough to be useful but nowhere near complete. Missing categories I want to add: account abstraction bundlers, intent solvers, MEV protected RPC, Solana priority fee oracles, ZK prover networks, restaking points feeds, prediction market resolution feeds beyond Polymarket and Kalshi.

Reproducibility. Every bench should have a one command reproduction. Today the harness code is open but the orchestration on Railway is not yet a turnkey Docker Compose setup. The roadmap is to ship a make reproduce target that any engineer can run on a laptop and get the same distributions, modulo network locality.

Editorial layer. The raw data is one half. The other half is helping operators make the call. The end state is something closer to a Consumer Reports of crypto infrastructure: every quarter, a short editorial post per category with the data underneath it.

A neutral, open, audited benchmark layer for the whole crypto API stack. That is the goal.

Get involved

The repo is at github.com/ChainBench/OpenChainBench. The license is MIT for the site and the harnesses, CC BY 4.0 for the data. PRs are welcome. The lowest barrier contribution is adding a provider to a registry. The highest is proposing a new benchmark category.

Site: openchainbench.com
Methodology: openchainbench.com/methodology
API: openchainbench.com/api/openapi.json
MCP server: https://openchainbench.com/api/mcp/mcp
Twitter: @OpenChainBench

If you ship anything that talks to a chain, I would love your benchmark wishlist. If you are a provider and you want to be added to a registry, open an issue. If you find a number you disagree with, file a PR with the harness rerun output.

The whole point of the project is that nobody should have to take our word for it. The data is open. The methodology is open. The code is open. Now go look at it.

DEV Community