How I dropped LLM latency from 500ms to 0ms in real-time physics loops

#ai #robotics #opensource #embodiedai

If you’ve tried to put an LLM in charge of a 60fps physics loop (robotics, MuJoCo, game NPCs), you’ve hit the wall.

The "Brain-Pull" model, where the brain has to micromanage every tool call, is just too slow. At 60fps each frame gets about 16.7ms, so a 500ms API round trip spans roughly 30 frames. Physics doesn't wait for an API response.

I just open-sourced a "Body-Push" protocol called SCP (Spatial Context Protocol) and an orchestrator called Plexa to solve this.

The Problem: The "Brain-Pull" Bottleneck

Standard tool-calling (like MCP) is passive: the brain asks, the body waits. In a 3D environment, this leads to:

Frozen Agents: The simulation pauses or the robot crashes while waiting for the LLM.
Massive API Bills: Paying for the same decision every single frame.

The Solution: Digital Muscle Memory

SCP inverts the hierarchy. Instead of the brain micromanaging, the body owns the loop.

Muscle-First: The body runs at 60fps locally using a Pattern Store.
Cache Miss: It only pings the LLM when it encounters a "novel state" (something it hasn't seen).
Local Learning: Once the LLM gives advice, the body caches the pattern locally.

Brain teaches once. Muscle remembers forever.
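
Here is a minimal sketch of that loop in Python. It is not the actual SCP code: `PatternStore`, `Body`, `fake_brain`, and the `bin_size` quantization are all hypothetical names and choices made for illustration, and `fake_brain` stands in for a real LLM call. The point is only to show the shape of the idea: look up locally first, call the brain on a miss, cache the answer.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

State = Tuple[float, ...]
Key = Tuple[int, ...]


class PatternStore:
    """Hypothetical local cache mapping a coarse state key to an action."""

    def __init__(self, bin_size: float = 0.1) -> None:
        self.bin_size = bin_size
        self.patterns: Dict[Key, int] = {}

    def key(self, state: State) -> Key:
        # Assumption: quantize the state so nearby states share one entry.
        return tuple(round(x / self.bin_size) for x in state)

    def lookup(self, state: State) -> Optional[int]:
        return self.patterns.get(self.key(state))

    def learn(self, state: State, action: int) -> None:
        self.patterns[self.key(state)] = action


class Body:
    """Owns the loop; only calls the brain on a cache miss."""

    def __init__(self, brain: Callable[[State], int], store: PatternStore) -> None:
        self.brain = brain
        self.store = store
        self.brain_calls = 0

    def act(self, state: State) -> int:
        action = self.store.lookup(state)
        if action is None:  # cache miss: novel state
            self.brain_calls += 1
            action = self.brain(state)
            self.store.learn(state, action)  # muscle memory
        return action


def fake_brain(state: State) -> int:
    # Stand-in for an LLM call: push toward the side the pole leans.
    return 1 if state[0] > 0 else 0


def run_loop(body: Body, states: Sequence[State]) -> int:
    """Run one pass over the states; return how many brain calls it needed."""
    before = body.brain_calls
    for s in states:
        body.act(s)
    return body.brain_calls - before


body = Body(fake_brain, PatternStore(bin_size=0.1))
states = [(0.03 * i - 0.3,) for i in range(21)]  # sweep of pole angles
first = run_loop(body, states)   # brain is consulted for each new bin
second = run_loop(body, states)  # every state is now a cache hit
```

On the second pass over the same states the brain is never called, which is the same effect as the cart-pole numbers below, just on a toy scale.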

The Proof (MuJoCo Cart-Pole)

I tested this on a standard MuJoCo cart-pole balancing task:

Loop 1: The LLM was called 27 times.
Loop 17: The LLM was called 0 times.

The local pattern cache took over completely. With no LLM calls left in the loop, the LLM-induced latency dropped to 0ms and the API cost for that loop was $0.

One Brain, Many Bodies (Plexa)

I also built Plexa, an orchestration framework that sits on top of SCP. It handles the "Motor Cortex" logic: taking a high-level intent like "Secure the room" and sequencing it across multiple autonomous SCP bodies (drones, smart locks, cameras) without letting them desync.
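
To make the sequencing idea concrete, here is a rough Python sketch. It is not Plexa's API: `StubBody`, `run_plan`, the command strings, and the hand-written plan are all hypothetical. It assumes one simple way to avoid desync, which is to split the intent into ordered phases and only advance once every body in the current phase has acknowledged.

```python
from typing import Dict, List


class StubBody:
    """Hypothetical stand-in for an SCP body; records the commands it ran."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []

    def execute(self, command: str) -> bool:
        self.log.append(command)
        return True  # acknowledge completion


Phase = Dict[str, str]  # body name -> command


def run_plan(bodies: Dict[str, StubBody], plan: List[Phase]) -> int:
    """Run phases in order; advance only when every body in the phase acks."""
    completed = 0
    for phase in plan:
        acks = [bodies[name].execute(cmd) for name, cmd in phase.items()]
        if not all(acks):
            break  # stop rather than let bodies drift out of step
        completed += 1
    return completed


bodies = {n: StubBody(n) for n in ("lock", "camera", "drone")}
# Assumption: "Secure the room" has already been decomposed into this plan.
plan = [
    {"lock": "engage", "camera": "start_recording"},
    {"drone": "patrol_perimeter"},
]
done = run_plan(bodies, plan)
```

The drone only starts its patrol after both the lock and the camera have acknowledged phase one.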

Open Source & Community Roast

This is still at an early stage, and I’m looking for the community to battle-test the architecture. I’m specifically looking for feedback on:

State Invalidation: How can we make the "3-strikes" cache wipe more robust?
High-Dimensional Scaling: How does the k-NN similarity hold up with 100+ agents?
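
For anyone who wants something concrete to poke at, here is a simplified Python sketch of both ideas. It is not the code in the repo: `StrikeStore`, `radius`, and `max_strikes` are hypothetical names, the lookup is a brute-force nearest-neighbor search (k = 1) rather than full k-NN, and it assumes a "strike" means a cached action that failed, with three consecutive failures evicting the pattern.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Pattern:
    state: Sequence[float]
    action: int
    strikes: int = 0


class StrikeStore:
    """Hypothetical pattern cache with nearest-neighbor lookup and eviction."""

    def __init__(self, radius: float = 0.2, max_strikes: int = 3) -> None:
        self.radius = radius
        self.max_strikes = max_strikes
        self.patterns: List[Pattern] = []

    def nearest(self, state: Sequence[float]) -> Optional[Pattern]:
        """Return the closest pattern within `radius`, or None (novel state)."""
        best, best_d = None, self.radius
        for p in self.patterns:  # brute force: O(n) per lookup
            d = math.dist(p.state, state)
            if d <= best_d:
                best, best_d = p, d
        return best

    def learn(self, state: Sequence[float], action: int) -> None:
        self.patterns.append(Pattern(tuple(state), action))

    def report(self, pattern: Pattern, success: bool) -> None:
        """Reset strikes on success; evict after `max_strikes` failures in a row."""
        if success:
            pattern.strikes = 0
            return
        pattern.strikes += 1
        if pattern.strikes >= self.max_strikes:
            self.patterns = [p for p in self.patterns if p is not pattern]


store = StrikeStore(radius=0.2)
store.learn((0.0, 0.0), action=1)
hit = store.nearest((0.05, 0.05))   # close enough: returns the cached pattern
for _ in range(3):
    store.report(hit, success=False)
miss = store.nearest((0.05, 0.05))  # pattern was evicted, so this is a miss
```

The brute-force scan is the part most likely to hurt as the number of patterns and agents grows, which is exactly the scaling question above.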

Check the code here: