We Replaced Our RPC Layer With eRPC

TL;DR

We finished migrating from in-house RPC to eRPC. Our homegrown stack handled scale, but it was noisy and hard to tune. eRPC—the open-source EVM RPC proxy—gave us a pragmatic RPC load balancer with JSON-RPC caching, hedging, and clean failure modes. Result: fewer spikes, clearer scaling signals, and latency mostly dominated by providers (not our proxy).

Why touch a “working” system?

Running wallets across many chains means lots of JSON-RPC reads/writes, multiple providers per chain, 4337/bundler methods, and strict uptime. Our custom stack (a Go service, caches, and provider-selection logic) worked… until tail-latency spikes and flaky providers made it painful to operate and evolve.

What we wanted in one box:

  • Multi-upstream per chain with smart selection
  • RPC load balancer behavior (hedging, retries, backoff)
  • JSON-RPC caching that’s reorg-aware
  • Clean observability and failure modes
  • Easy k8s deploy

Why eRPC?

eRPC is an EVM RPC proxy that fronts your upstream providers and adds:

  • Hedged requests to cut tail latency (see the sketch after this list)
  • Permanent/reorg-aware caching and multiplexing to slash duplicate reads
  • Failover + circuit breakers so provider blips don’t page you
  • Simple config, Kubernetes-friendly
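
Hedging did the most for our tails, so it is worth spelling out the idea. The sketch below is not eRPC's implementation, just the concept in Go with placeholder names (`callFn`, `hedgedCall`): send the request to the preferred upstream, and if it has not answered within the hedge delay, duplicate it to the next upstream and take whichever responds first.

```go
package hedging

import (
	"context"
	"time"
)

// callFn is a placeholder for a single JSON-RPC call against one upstream.
type callFn func(ctx context.Context, upstreamURL string) ([]byte, error)

type result struct {
	body []byte
	err  error
}

// hedgedCall sends the request to the primary upstream and, if no response
// has arrived within hedgeDelay, fires the same request at the backup.
// The first successful response wins; the loser is cancelled.
func hedgedCall(ctx context.Context, primary, backup string, hedgeDelay time.Duration, call callFn) ([]byte, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	results := make(chan result, 2)
	launch := func(url string) {
		go func() {
			body, err := call(ctx, url)
			results <- result{body, err}
		}()
	}

	launch(primary)
	hedge := time.After(hedgeDelay)

	inflight, hedged := 1, false
	for {
		select {
		case <-hedge:
			if !hedged {
				launch(backup) // primary is slow: duplicate the request
				inflight++
				hedged = true
			}
		case r := <-results:
			if r.err == nil {
				return r.body, nil
			}
			inflight--
			if !hedged {
				launch(backup) // primary failed outright: try the backup now
				inflight++
				hedged = true
			} else if inflight == 0 {
				return nil, r.err // both attempts failed
			}
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
}
```

eRPC exposes this as configuration rather than code; the hedge delay you pick trades a bit of extra upstream load for a flatter p95/p99.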

Migration story (short & honest)

We rolled it out per-chain:

  1. Start with one chain and a couple of upstreams
  2. Verify standard and 4337/bundler calls
  3. Run in parallel with our old system for a while (see the shadow-comparison sketch below)
  4. Tune cache TTLs, hedging, and selection policy
  5. Promote to more chains
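
For the parallel-run step, we mirrored a sample of traffic to both stacks and compared answers before trusting the new path. Here is a minimal sketch of that kind of shadow comparison; the proxy functions and mismatch handling are placeholders, and in practice you would sample, skip non-idempotent writes, and normalize responses before diffing.

```go
package shadow

import (
	"bytes"
	"context"
	"log"
	"time"
)

// proxyFn forwards a raw JSON-RPC request to one of the two stacks.
type proxyFn func(ctx context.Context, req []byte) ([]byte, error)

// shadowCompare serves the request from the old proxy (still the source of
// truth) and replays it against eRPC in the background, logging divergences.
func shadowCompare(ctx context.Context, req []byte, oldProxy, erpc proxyFn) ([]byte, error) {
	start := time.Now()
	oldResp, oldErr := oldProxy(ctx, req)
	oldLatency := time.Since(start)

	go func() {
		// Detached context so the shadow call never slows or cancels user traffic.
		sctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()

		start := time.Now()
		newResp, newErr := erpc(sctx, req)
		newLatency := time.Since(start)

		switch {
		case (oldErr == nil) != (newErr == nil):
			log.Printf("shadow: error mismatch old=%v new=%v", oldErr, newErr)
		case oldErr == nil && !bytes.Equal(oldResp, newResp):
			// NOTE: byte equality is naive; real comparison should normalize ids and ordering.
			log.Printf("shadow: body mismatch (%d vs %d bytes)", len(oldResp), len(newResp))
		default:
			log.Printf("shadow: ok oldLatency=%s newLatency=%s", oldLatency, newLatency)
		}
	}()

	return oldResp, oldErr
}
```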

One gotcha: an upstream designated as a “failsafe” only kicks in when the others are unhealthy. We wanted “A, then B, then a true last-resort C,” and marking that last provider as a normal upstream with low routing priority did the trick.
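
For clarity, here is the ordering we were after, sketched in Go rather than in eRPC's config syntax (the types and the health-check hook are hypothetical): the last-resort provider is just another upstream with the worst priority, so it still receives traffic when everything above it is down.

```go
package routing

import "sort"

// Upstream is one provider endpoint with a routing priority.
// Lower Priority means "prefer me"; the expensive last-resort provider is a
// normal upstream that simply carries the highest Priority number.
type Upstream struct {
	Name     string
	URL      string
	Priority int
	Healthy  func() bool // fed by health checks / circuit-breaker state
}

// orderForRequest returns upstreams in the order they should be tried.
// Unhealthy upstreams are pushed to the back rather than dropped, so the
// last-resort entry is still reachable when A and B are both down.
func orderForRequest(ups []Upstream) []Upstream {
	ordered := make([]Upstream, len(ups))
	copy(ordered, ups)

	healthy := make(map[string]bool, len(ordered))
	for _, u := range ordered {
		healthy[u.Name] = u.Healthy() // snapshot health once per request
	}

	sort.SliceStable(ordered, func(i, j int) bool {
		hi, hj := healthy[ordered[i].Name], healthy[ordered[j].Name]
		if hi != hj {
			return hi // healthy before unhealthy
		}
		return ordered[i].Priority < ordered[j].Priority
	})
	return ordered
}
```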

Results that mattered

  • Stability: CPU/memory stopped yo-yoing. Spikes turned into smooth plateaus.
  • Latency: After hedging + cache tuning, p95 mostly tracked provider health, not our proxy.
  • Operational clarity: Easier to attribute errors (provider vs proxy), calmer on-call.
  • Costs: Caching/multiplexing reduced duplicate reads.


If you run multi-chain infra, try this

  • Ship small: 1 chain, a few upstreams, and explicit routing for writes.
  • Run both paths for a week; compare logs/metrics before cutting over.
  • Be intentional with “last resort” upstreams (don’t rely on magic).
  • Tune TTLs per method; eth_getLogs and eth_getBlockByNumber near the head benefit from short, reorg-aware caches (see the sketch after this list).
  • Watch tail latency—hedging is your friend, but set limits.
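
To make the TTL point concrete, here is a sketch of a per-method cache policy in Go. The numbers are illustrative, not recommendations, and the `nearHead` flag stands in for eRPC's own reorg awareness, which is more involved than this.

```go
package cachepolicy

import "time"

// ttlForMethod returns how long a response for a JSON-RPC method may be
// cached. Data that can still reorg gets a short TTL; finalized data and
// static values can live much longer; writes are never cached.
func ttlForMethod(method string, nearHead bool) time.Duration {
	switch method {
	case "eth_chainId", "net_version":
		return 24 * time.Hour // effectively static
	case "eth_getLogs", "eth_getBlockByNumber":
		if nearHead {
			return 2 * time.Second // short window while a reorg is still possible
		}
		return 10 * time.Minute // finalized ranges are safe to keep longer
	case "eth_getTransactionReceipt":
		if nearHead {
			return 2 * time.Second
		}
		return time.Hour
	case "eth_sendRawTransaction", "eth_sendUserOperation":
		return 0 // never cache writes or bundler submissions
	default:
		return 5 * time.Second
	}
}
```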

Openfort overview (how we wire this into wallets):
Docs → Overview

Open to Discussion:

  • How would you tune hedging for indexers vs. interactive user flows?
  • What’s your policy for eth_getLogs (batch-and-split vs single call)?
  • Any clever upstream selection strategies we should test?
