V.E.L.O.C.I.T.Y.-OS: The JIT Compiler Core – From AST to Native Closures (Part 4)

#showdev #coding #compilers #rust

Solving Rust borrow issues with HRTBs

With the standalone IDE running, I had a sandboxed environment to write and execute Neural Document Architecture (NDA) programs. However, executing the binary AST via a standard recursive tree-walk interpreter was adding unacceptable dispatch overhead.

Every opcode instruction required match branching, dynamic type checking, and variable lookup cycles. I needed a Just-In-Time (JIT) compiler to turn the AST into native machine code.
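
To make that overhead concrete, here is a minimal sketch of the kind of tree-walk evaluation I was moving away from. The `Expr` enum, `eval`, and `demo_tree_walk` are hypothetical stand-ins for illustration, not the real NDA AST or interpreter.

```rust
use std::collections::HashMap;

// Hypothetical mini-AST for illustration; not the real NdaNode definition.
enum Expr {
    Const(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

// Tree-walk evaluation: every visit re-runs the match on the node kind,
// and every variable read is a string-keyed hash map lookup.
fn eval(expr: &Expr, env: &HashMap<String, i64>) -> Result<i64, String> {
    match expr {
        Expr::Const(v) => Ok(*v),
        Expr::Var(name) => env
            .get(name)
            .copied()
            .ok_or_else(|| format!("unbound variable {name}")),
        Expr::Add(lhs, rhs) => Ok(eval(lhs, env)? + eval(rhs, env)?),
    }
}

// Evaluates `a + 1` with `a = 5`.
fn demo_tree_walk() -> Result<i64, String> {
    let mut env = HashMap::new();
    env.insert("a".to_string(), 5);
    let expr = Expr::Add(
        Box::new(Expr::Var("a".to_string())),
        Box::new(Expr::Const(1)),
    );
    eval(&expr, &env)
}
```

If `eval` runs inside a loop body, the match and the hash lookup are paid again on every iteration, which is the cost a compile-once design tries to avoid.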

The V.E.L.O.C.I.T.Y.-OS 12-Part Roadmap

We are building a bare-metal, self-healing operating system running entirely inside the CPU's L3 cache. Here is the roadmap for this 12-part series:

Part 1: The Spark — Exposing the "Safe-Room" security leak and building the compiler gate.
Part 2: The NDA Language — Designing a content-addressed triplet representation to cure context bloat.
Part 3: Ditching the Web Stack — Building a native 30MB IDE with 1,500,000x IPC latency drops.
Part 4: The Closure JIT — Compiling AST blocks to nested closures and bypassing borrow checker limits. (You are here)
Part 5: JIT Math Optimizations — Replacing division operations with precomputed 16-bit lookup tables.
Part 6: x86-64 Assembler & SCEV-Lite — Compiling scalar loops directly to native code in constant time.
Part 7: Classic Compiler Passes — Implementing inter-procedural Dead Code Elimination and loop unrolling.
Part 8: Reclaiming Ring 0 — Exiting UEFI boot services and transitioning the kernel to Ring 0.
Part 9: Bare-Metal Drivers — Writing a PCI scanner, NVMe block storage controller, and FAT32 parser.
Part 10: Synaptic Canvas — Rendering a spatial, force-directed GUI based on model token activation vectors.
Part 11: Swarms & Hot-Patching — Building multi-agent scheduling and zero-downtime RCU driver updates.
Part 12: Self-Evolution — Handing system control over to a local LLM Terminal that self-optimizes via telemetry.

Tier-1: The Closure JIT

I started by designing a Tier-1 Closure-Based JIT Compiler.

Instead of compiling directly to machine instructions, the compiler walks the AST at load-time and generates a chain of nested Rust closures (stored as Arc<dyn Fn> trait objects).

This approach resolves all opcode matches, scope checks, and control-flow branches once, at JIT compile time. At runtime, the engine simply walks a flat, pre-compiled chain of closures. Each step is still an indirect call through a trait object, so what disappears is the per-node match dispatch and name resolution, not every possible branch misprediction or instruction cache miss.

Here is how the compiler defines the JIT function type and registers the compilation sequence in src/compiler/nda_jit.rs:

// compiler/nda_jit.rs — Closure JIT definitions
use std::sync::Arc;

pub enum JitControlFlow {
    Continue,
    Break,
    Return,
}

// A compiled JIT closure: accepts a mutable state reference of *any* lifetime 'a
pub type JitFn = Arc<dyn for<'a> Fn(&mut JitState<'a>) -> Result<JitControlFlow, String> + Send + Sync>;

// Compile a sequence of NDA AST nodes into a flat chain of closures
fn compile_sequence(nodes: &[NdaNode], counter: &mut usize, registry: &VarRegistry) -> Vec<JitFn> {
    nodes.iter().map(|n| compile_node(n, counter, registry)).collect()
}
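
The snippet above stops at building the chain. As a rough sketch of what walking it could look like, the driver below runs each closure in order and stops on the first non-Continue signal. `run_sequence`, the simplified `JitState` fields, and `demo_run` are my assumptions for illustration; the real engine may treat Break and Return differently.

```rust
use std::sync::Arc;

#[derive(Debug, PartialEq)]
pub enum JitControlFlow {
    Continue,
    Break,
    Return,
}

// Simplified stand-in for the real JitState (fields are assumed).
pub struct JitState<'a> {
    pub stack: Vec<i64>,
    pub log: &'a mut Vec<String>,
}

pub type JitFn =
    Arc<dyn for<'a> Fn(&mut JitState<'a>) -> Result<JitControlFlow, String> + Send + Sync>;

// Hypothetical driver: run each compiled step in order and hand any
// Break/Return signal back to the caller instead of continuing.
pub fn run_sequence(seq: &[JitFn], state: &mut JitState<'_>) -> Result<JitControlFlow, String> {
    for step in seq {
        match step(&mut *state)? {
            JitControlFlow::Continue => {}
            other => return Ok(other),
        }
    }
    Ok(JitControlFlow::Continue)
}

// Builds a four-step chain by hand; the last step sits after a Return.
pub fn demo_run() -> (Result<JitControlFlow, String>, Vec<i64>, Vec<String>) {
    let push = |v: i64| -> JitFn {
        Arc::new(move |s: &mut JitState<'_>| {
            s.stack.push(v);
            s.log.push(format!("push {v}"));
            Ok(JitControlFlow::Continue)
        })
    };
    let ret: JitFn = Arc::new(|_s: &mut JitState<'_>| Ok(JitControlFlow::Return));
    let seq = vec![push(1), push(2), ret, push(3)];

    let mut log = Vec::new();
    let mut state = JitState { stack: Vec::new(), log: &mut log };
    let flow = run_sequence(&seq, &mut state);
    let stack = state.stack;
    (flow, stack, log)
}
```

In `demo_run`, the step after the Return closure never executes, so only the first two pushes reach the stack.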

Dynamic Dispatch: How AST Nodes Compile to Closures

To understand why this compiler is so fast, we have to look at how the AST nodes compile into closures.

In a standard interpreter, executing an assignment like let a = 5 and a load like a + 1 requires querying a hash map by string name on every loop iteration. The JIT closure compiler bypasses this by pre-allocating variable slots at load-time and wrapping the runtime actions in nested closures that hold direct index offsets.

Here is the exact implementation in src/compiler/nda_jit.rs for compiling Let and Load nodes:

// compiler/nda_jit.rs — Compiling Let and Load AST nodes to closures
fn compile_node(node: &NdaNode, counter: &mut usize, registry: &VarRegistry) -> JitFn {
    *counter += 1;
    match node {
        // Compile a variable declaration
        NdaNode::Let { name_hash, init } => {
            let slot = registry.get_or_create_slot(*name_hash);
            let init_fn = compile_node(init, counter, registry);

            Arc::new(move |state: &mut JitState<'_>| {
                state.executed_nodes += 1;
                // Evaluate the initialization expression
                init_fn(state)?;
                let val = state.stack.pop().ok_or("Stack underflow in Let init")?;

                // Write directly to the pre-allocated flat array index
                if slot >= state.variables.len() {
                    state.variables.resize(slot + 1, None);
                }
                state.variables[slot] = Some(val);
                Ok(JitControlFlow::Continue)
            })
        }

        // Compile a variable reference load
        NdaNode::Load { name_hash } => {
            let slot = registry.get_or_create_slot(*name_hash);

            Arc::new(move |state: &mut JitState<'_>| {
                state.executed_nodes += 1;
                // Flat array read by slot index, no hash map lookup
                let val = state.variables.get(slot)
                    .and_then(|v| v.as_ref())
                    .ok_or_else(|| format!("Load of uninitialized variable slot {}", slot))?;

                state.stack.push(val.clone());
                Ok(JitControlFlow::Continue)
            })
        }
        // ... other nodes (Matrix, Norm, Loop, Add) compile similarly
    }
}

By resolving variable lookups to slot indices during compilation and mapping them directly to pre-allocated indices in JitState::variables, we reduce variable load/store operations from hash table lookups to flat memory offsets.
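
The registry itself is not shown above, so here is a minimal sketch of how a slot registry along these lines could work. This `VarRegistry` is a hypothetical reconstruction: the `u64` hash type and the `RefCell` are assumptions (interior mutability is assumed because `compile_node` takes `&VarRegistry` yet can create slots), and `slot_count` and `demo_slots` are names I made up.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical registry mapping name hashes to dense slot indices.
#[derive(Default)]
pub struct VarRegistry {
    slots: RefCell<HashMap<u64, usize>>,
}

impl VarRegistry {
    // Returns the existing slot for a name hash, or assigns the next free index.
    pub fn get_or_create_slot(&self, name_hash: u64) -> usize {
        let mut slots = self.slots.borrow_mut();
        let next = slots.len();
        *slots.entry(name_hash).or_insert(next)
    }

    // Number of slots handed out so far; useful for pre-sizing the variable array.
    pub fn slot_count(&self) -> usize {
        self.slots.borrow().len()
    }
}

// The hash map is only consulted here, at compile time. The runtime side
// just indexes a flat Vec with the resolved slot numbers.
pub fn demo_slots() -> (usize, usize, usize, Vec<Option<i64>>) {
    let registry = VarRegistry::default();
    let a = registry.get_or_create_slot(0xA);
    let b = registry.get_or_create_slot(0xB);
    let a_again = registry.get_or_create_slot(0xA);

    let mut variables: Vec<Option<i64>> = vec![None; registry.slot_count()];
    variables[a] = Some(5);
    (a, b, a_again, variables)
}
```

Pre-sizing the array from `slot_count` would also make the runtime resize check in the Let closure a fallback rather than the common path.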

The Lifetime Trap: Higher-Ranked Trait Bounds (HRTBs)

However, I immediately hit a massive Rust lifetime wall.

The JIT execution closures needed to query my persistent Merkle database (SiteMap) to resolve content-addressed function calls. Because the JIT closures were stored and executed dynamically, satisfying Rust’s borrow checker required wrapping the SiteMap in an Arc<SiteMap>.

This meant that every variable assignment, function call, and closure jump had to clone the Arc, which performs an atomic reference-count increment (and a matching decrement on drop). The CPU was wasting cycles on atomic read-modify-write operations in the hot path.

To fix this, I refactored the JIT engine to accept direct reference inputs &SiteMap instead. I solved the lifetime constraint by using Higher-Ranked Trait Bounds (HRTBs):

type JitFn = Arc<dyn for<'a> Fn(&mut JitState<'a>) -> Result<JitControlFlow, String> + Send + Sync>;

By specifying for<'a>, I explicitly instructed the compiler that the JIT closure could accept a JitState of any lifetime 'a, rather than one specific lifetime fixed when the closure is created. This allowed the JIT engine to reference the live, stack-allocated database directly, eliminating the Arc<SiteMap> clones and their reference-count writes from the hot path.
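
A minimal sketch of the pattern, assuming the state carries a plain `&SiteMap` borrow. The `SiteMap` contents, the `site_map` field, `compile_lookup`, and `demo_hrtb` are hypothetical stand-ins, and the return type is simplified to `Result<(), String>` to keep the example short.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for the persistent Merkle database.
pub struct SiteMap {
    entries: HashMap<u64, i64>,
}

// Assumed layout: the state borrows the database for some lifetime 'a.
pub struct JitState<'a> {
    pub site_map: &'a SiteMap,
    pub stack: Vec<i64>,
}

// The closure must work for every lifetime 'a, not one fixed at creation time.
pub type JitFn = Arc<dyn for<'a> Fn(&mut JitState<'a>) -> Result<(), String> + Send + Sync>;

// Compiled once, with no SiteMap in scope: the closure captures only the key.
pub fn compile_lookup(key: u64) -> JitFn {
    Arc::new(move |state: &mut JitState<'_>| {
        match state.site_map.entries.get(&key) {
            Some(value) => {
                state.stack.push(*value);
                Ok(())
            }
            None => Err(format!("no SiteMap entry for key {key}")),
        }
    })
}

// Runs the same compiled closure against two stack-allocated databases
// with different lifetimes, without any Arc<SiteMap>.
pub fn demo_hrtb() -> Result<Vec<i64>, String> {
    let lookup = compile_lookup(7);

    let site_map = SiteMap { entries: HashMap::from([(7, 42)]) };
    let mut state = JitState { site_map: &site_map, stack: Vec::new() };
    lookup(&mut state)?;
    let mut out = state.stack;

    {
        let other = SiteMap { entries: HashMap::from([(7, 99)]) };
        let mut inner = JitState { site_map: &other, stack: Vec::new() };
        lookup(&mut inner)?;
        out.extend(inner.stack);
    }
    Ok(out)
}
```

The closure exists before either database does, which is only expressible because the trait object is quantified over every 'a.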

The JIT Sandbox

I wrapped this JIT engine in a custom JIT Sandbox (NdaJitSandbox). Before any program was committed to the codebase, the sandbox:

Compiled the AST on the fly (taking just 93 microseconds).
Ran the execution inside a panic-safe boundary (std::panic::catch_unwind with an AssertUnwindSafe wrapper).
Captured print buffers and returned execution metadata.
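
The panic boundary and buffer capture can be sketched roughly as follows. This is not the real NdaJitSandbox: `SandboxReport`, `run_sandboxed`, and the closure-based program type are assumptions for illustration, and the compile step and timing metadata are left out.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Hypothetical execution report; the real sandbox metadata is not shown in this post.
#[derive(Debug, PartialEq)]
pub struct SandboxReport {
    pub output: Vec<String>,
    pub error: Option<String>,
}

// Runs a program inside a panic boundary and captures whatever it printed.
pub fn run_sandboxed<F>(program: F) -> SandboxReport
where
    F: FnOnce(&mut Vec<String>) -> Result<(), String>,
{
    let mut output = Vec::new();

    // The closure captures `&mut output`, which is not UnwindSafe by default.
    // AssertUnwindSafe states that reading a partially filled buffer after a
    // panic is acceptable here.
    let outcome = catch_unwind(AssertUnwindSafe(|| program(&mut output)));

    let error = match outcome {
        Ok(Ok(())) => None,
        Ok(Err(msg)) => Some(msg),
        Err(_) => Some("program panicked".to_string()),
    };
    SandboxReport { output, error }
}
```

A panicking program such as `run_sandboxed(|out| { out.push("before".to_string()); panic!("boom") })` still returns a report with the output captured before the panic. Note that `catch_unwind` only helps when the binary is built with unwinding panics, not with `panic = "abort"`.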

The flowchart below maps the JIT compilation pipeline and the sandbox's verification and execution path:

Fig 1: The two-tier JIT sandbox compilation pipeline and execution pathways (flowchart showing the choice between Tier-1 closures and Tier-2 machine-code assembly).

Pascal's Analysis: Bypassing the Serialization Wall

When I shared the performance gains (the JIT sandbox executing a 4-layer network block in 206µs, including compile and run time), Pascal CESCATO analyzed the structural benefits:

"The format itself enforces consistency at write time, so the model can commit incrementally — each triple is either valid against the current graph or it isn't. The correction happens at write speed, not at review time."

By compiling directly to closures, I was allowing the model's output to bypass the serialization wall completely.

But my JIT closures still relied on heap allocations and standard integer loops. I needed to push compiler performance to match—and exceed—native Rust scalar math.

In the next post, I'll document how I optimized the JIT math by introducing slot-based registries and division-free byte loops.

Discussion

How do you handle runtime extensibility in compiled languages? Have you worked with closure chains or dynamic function dispatch in Rust? How do you tackle borrow checker constraints when dealing with dynamic state sharing? Let's discuss in the comments below!

Special thanks to Pascal CESCATO for showing me that a structured compilation pipeline is the ultimate guard against model hallucinations.

Disclaimer: AI was used throughout this project, so it is only fitting that it co-authors with me. Special thanks to the Foundry for its tireless hours toiling away and to Gemini for producing the cover image.