DEV Community

BossChaos
BossChaos

Posted on

RAM Coffers: NUMA-Aware LLM Inference — Why Hardware Topology Still Matters

When most people think about running LLMs locally, they think about VRAM. But if you're running on a multi-socket server, there's a completely different bottleneck: NUMA memory topology. RAM Coffers is solving this.

The NUMA Problem

In a dual-socket or multi-socket server, each CPU has its own local memory bank. Accessing local RAM is fast. Accessing memory across the interconnect (Infinity Fabric on AMD, UPI on Intel) is 2-3x slower.

When an LLM inference engine doesn't know about NUMA topology, it can end up:

  • Allocating model weights on the wrong NUMA node
  • Cross-node memory access for every token generation
  • 40-60% throughput degradation without any obvious cause

Enter RAM Coffers

RAM Coffers is RustChain's NUMA-aware LLM inference infrastructure. It:

  1. Detects NUMA topology at startup
  2. Allocates model weights on the appropriate memory banks
  3. Pins inference threads to the correct CPU cores
  4. Reports hardware attestation through the Beacon Protocol

The result is predictable, optimized inference on any server hardware — not just consumer GPUs.

Why This Matters for Decentralized Inference

The decentralized AI narrative often assumes consumer hardware. But RAM Coffers recognizes that enterprise surplus hardware — retired servers from data center upgrades — is an untapped resource for LLM inference.

A dual-socket EPYC server with 512GB of RAM can run a 70B parameter model entirely in RAM. But only if the memory access is NUMA-aware.

The Proof-of-Antiquity Angle

RAM Coffers ties into RustChain's Proof-of-Antiquity protocol. When you run inference on older hardware:

  • Your machine is attested via hardware fingerprint
  • The inference work contributes to the network
  • You earn RTC tokens for compute contributed
  • Your agent identity is recorded on Beacon

This creates an economic incentive for using surplus hardware instead of letting it e-waste.

Getting Started

If you have access to multi-socket server hardware:

# Check NUMA topology
numactl --hardware

# Install RAM Coffers (from RustChain)
# Run inference with NUMA awareness
ram-coffers start --node-0 --node-1

# Check Beacon registration
curl -sk https://50.28.86.131/beacon/atlas
Enter fullscreen mode Exit fullscreen mode

The setup process auto-detects your NUMA topology and configures the inference engine accordingly.

The Bigger Picture

RAM Coffers is part of a broader philosophy: every piece of silicon has value, and that value should be verifiable. Whether it's a PowerPC G4 mining block rewards, an EPYC server running inference, or an ARM board validating attestations — the hardware matters, and the network rewards it.


Published as part of the RustChain bounty program. Learn more at rustchain.org

Top comments (0)