Ahmed Yasser

I built an AOT rule-image compiler for Java — 853 MB of heap became 8 MB

There's a pattern that's been quietly shipping in production for over a decade: take a large read-only dataset, compile it into a flat binary, and mmap it instead of deserializing it into Java objects. Lucene does it. Chronicle does it. LMDB does it.

I built a reusable version of that pattern for rule/config/policy datasets and just open-sourced it.

Repo: github.com/AlphaSudo/rimg

The Problem

You have a service that evaluates rules, policies, feature flags, or lookup tables on every request. The dataset is large (10K–1M+ entries), rarely changes, and lives on the heap as a big object graph. You're paying for it in:

  • Heap pressure: Hundreds of MB of static data the GC has to scan every cycle.
  • Startup time: Deserializing JSON or querying a DB to build the graph.
  • Reload cost: Rebuilding the whole graph when the dataset updates.

What rule-image does

*Figure 1*

The .rimg format is a custom binary layout featuring:

  • A CHD-style Minimal Perfect Hash (MPHF) index.
  • Optional Bloom filter for fast negative lookups.
  • CRC32 corruption detection + SHA-256 integrity.
  • Little-endian packed entries with natural alignment.
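
To make the mapped-read idea concrete, here's a toy sketch using the FFM API. The 16-byte entry layout (int id, int priority, long mask) and all names are my own invention for illustration; they are not the actual .rimg layout.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import static java.lang.foreign.ValueLayout.*;

// Toy fixed-width entry table read through a MemorySegment.
// Illustrative only, not the real .rimg layout.
public final class MappedLookupSketch {
    static final long ENTRY_SIZE = 16; // 4-byte id, 4-byte priority, 8-byte mask

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // The real system would map a file (FileChannel.map); allocating
            // in-process keeps this sketch self-contained.
            MemorySegment seg = arena.allocate(ENTRY_SIZE * 3, 8);
            for (int i = 0; i < 3; i++) {
                long base = i * ENTRY_SIZE;
                seg.set(JAVA_INT,  base,     i);        // id
                seg.set(JAVA_INT,  base + 4, 100 - i);  // priority
                seg.set(JAVA_LONG, base + 8, 1L << i);  // mask
            }
            // A lookup is pure offset arithmetic; no objects are materialized.
            long base = 2 * ENTRY_SIZE;
            System.out.println(seg.get(JAVA_INT, base + 4)); // prints 98
        }
    }
}
```

The point is that "loading" the dataset is just mapping bytes; nothing is deserialized onto the heap until a field is actually read.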

The Numbers

GeoIP-style showcase (5 million synthetic entries)

| Metric | Heap (POJO) | rule-image (mapped) |
| --- | --- | --- |
| Heap after load | 853 MB | 8 MB |
| Load time | 6,683 ms | 145 ms |
| Reload time | 6,595 ms | 278 ms |

100K entries with fat metadata payloads

| Metric | Heap | Mapped |
| --- | --- | --- |
| Heap after load | 1.47 GB | 7.33 MB |

The "Honest" Benchmark (Latency)

| Benchmark | Heap | Mapped |
| --- | --- | --- |
| JMH single warm lookup | 21 ns/op | 79 ns/op |
| JMH composed (N=10) | 400 ns/op | 684 ns/op |

The Tradeoff: Warm lookup is slower. If your dataset is already loaded and "warm" in the CPU cache, plain heap objects win. rule-image wins on memory footprint, startup, reload, and cold/miss-path behavior.
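
Part of the miss-path win comes from the Bloom filter: a negative lookup costs a few bit probes instead of a full index walk. Here's a minimal, self-contained sketch of the idea; the filter size, probe count, and hash mixing are illustrative, not rule-image's actual filter.

```java
import java.util.BitSet;

// Toy Bloom filter: k bit probes per key; a miss bails out on the first
// unset bit. Sizes and hashing are illustrative only.
public final class BloomSketch {
    private final BitSet bits = new BitSet(1 << 16);

    private int probe(String key, int seed) {
        // Simple seeded mix into a 16-bit index space.
        return (key.hashCode() * 31 + seed * 0x9E3779B9) & 0xFFFF;
    }

    void add(String key) {
        for (int i = 0; i < 3; i++) bits.set(probe(key, i));
    }

    boolean mightContain(String key) {
        for (int i = 0; i < 3; i++) if (!bits.get(probe(key, i))) return false;
        return true; // "maybe": false positives possible, false negatives not
    }

    public static void main(String[] args) {
        BloomSketch b = new BloomSketch();
        b.add("rule-42");
        System.out.println(b.mightContain("rule-42"));     // prints true
        System.out.println(b.mightContain("no-such-key")); // very likely false
    }
}
```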

Hot-swap Chaos Test

I threw 10,000 concurrent virtual-thread readers at the service harness while forcing an image swap every 500 ms for five continuous minutes.

Result: Zero segfaults, zero stale reads, zero lost evaluations.

The reclamation strategy is epoch-based: each reader increments/decrements an epoch counter, and the swap thread waits for the epoch to stabilize before closing the old Arena.
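
A minimal sketch of that scheme, with class and method names of my own choosing (not the library's API), and a String payload standing in for the mapped MemorySegment and its Arena:

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy epoch-based reclamation: readers pin the current image by bumping its
// reader count; the swapper publishes the new image, then spins until the
// old image's count drains to zero before it would close the old Arena.
final class ImageHandle {
    final AtomicLong readers = new AtomicLong();
    final String payload; // stand-in for MemorySegment + Arena
    ImageHandle(String payload) { this.payload = payload; }
}

public final class EpochSwapSketch {
    private volatile ImageHandle current = new ImageHandle("v1");

    String read() {
        while (true) {
            ImageHandle h = current;
            h.readers.incrementAndGet();
            if (h == current) {              // re-check: a swap may have raced us
                try { return h.payload; }
                finally { h.readers.decrementAndGet(); }
            }
            h.readers.decrementAndGet();     // lost the race; retry on new image
        }
    }

    void swap(String next) {
        ImageHandle old = current;
        current = new ImageHandle(next);     // new readers now land on `next`
        while (old.readers.get() != 0) Thread.onSpinWait(); // drain old epoch
        // safe point: close the old image's Arena here
    }

    public static void main(String[] args) {
        EpochSwapSketch s = new EpochSwapSketch();
        System.out.println(s.read()); // prints v1
        s.swap("v2");
        System.out.println(s.read()); // prints v2
    }
}
```

The increment-then-recheck dance is what guarantees the swapper never closes an image a reader is still inside.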

The Valhalla Angle

The part I'm most excited about is the forward path. When JEP 401 (Value Classes) ships as a final feature, you'll be able to write zero-allocation views like this:

import java.lang.foreign.MemorySegment;
import static java.lang.foreign.ValueLayout.*;

value class RuleHeader {
    private final MemorySegment seg;
    private final long base;

    RuleHeader(MemorySegment seg, long base) { this.seg = seg; this.base = base; }

    public int id()       { return seg.get(JAVA_INT,  base + 0); }
    public int priority() { return seg.get(JAVA_INT,  base + 4); }
    public long mask()    { return seg.get(JAVA_LONG, base + 8); }
}

Scalar replacement means this lives in registers. The hot path allocates exactly zero bytes end-to-end while your code reads like normal Java. I've drafted a post for valhalla-dev (included in the repo under docs/) and would love feedback from anyone tracking JEP 401.

Quick Start

# Requires JDK 26 (Temurin)
git clone https://github.com/AlphaSudo/rimg.git
cd rimg
./gradlew test
./gradlew :geoip-showcase:run --args="--entries 100000 --lookups 10000 --warmup-lookups 2000"

What this is NOT

  1. Not "faster than everything": Warm lookup loses to heap POJOs.
  2. Not production-ready for all workloads: This is a PoC with real evidence, but use with caution.
  3. Not novel at the JVM level: Lucene and Chronicle have used these techniques for 15+ years.

What IS arguably new: The specific packaging as a reusable AOT compiler + runtime, tuned for Virtual Threads (Loom), with a Valhalla-forward codegen path.

The data is in the repo. Judge for yourself.

Apache 2.0 · github.com/AlphaSudo/rimg
