Persistent world models are what keep long‑horizon tasks coherent when the agent’s sensors go dark. Linear temporal attention and associative graph memories give embodied systems a way to write and read state across those dark intervals, reducing the drift that has long plagued simulation‑to‑real pipelines.
Before these advances, world models behaved like camera‑following renderers: they could generate plausible frames while observed, but they fell apart the moment the viewpoint changed. An analysis of 9 600 videos across 23 models showed a systematic “preservation‑access‑re‑observed‑consistency” gap, with the re‑observed state almost never correct regardless of image quality or model size [1].
Kairos demonstrates that hybrid linear temporal attention can close that gap without exploding compute. Its gated linear attention runs in O(n) time in the sequence length, and a 5‑second video is processed in ≈ 11.7 seconds on an NVIDIA A800 [2]. That is roughly 2.3 times the clip's duration on a data‑center GPU, so it is not yet real‑time edge inference, but the cost grows linearly with clip length rather than quadratically.
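To make the O(n) claim concrete, here is a minimal sketch of gated linear attention in its recurrent form. It is not the Kairos implementation: the scalar per-step gate, the function names, and the plain-list representation are all assumptions chosen for readability. The point is that each step only updates a fixed-size state matrix, so the cost per frame does not depend on how many frames came before. A quadratic reference that revisits every past step is included for comparison.

```python
from typing import List

Vector = List[float]


def gated_linear_attention(
    queries: List[Vector],
    keys: List[Vector],
    values: List[Vector],
    gates: List[float],
) -> List[Vector]:
    """Causal gated linear attention, recurrent form (illustrative only).

    Assumed recurrence, with a scalar gate g_t in [0, 1]:
        S_t = g_t * S_{t-1} + outer(k_t, v_t)
        o_t = q_t @ S_t
    """
    d_k, d_v = len(keys[0]), len(values[0])
    # Fixed-size state: d_k x d_v, independent of sequence length.
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs: List[Vector] = []
    for q, k, v, g in zip(queries, keys, values, gates):
        for i in range(d_k):
            for j in range(d_v):
                # Decay the old memory, then write the new key/value pair.
                state[i][j] = g * state[i][j] + k[i] * v[j]
        # Read from memory with the current query.
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs


def quadratic_reference(
    queries: List[Vector],
    keys: List[Vector],
    values: List[Vector],
    gates: List[float],
) -> List[Vector]:
    """Same outputs computed in O(n^2) by revisiting every past step."""
    outputs: List[Vector] = []
    for t, q in enumerate(queries):
        out = [0.0] * len(values[0])
        decay = 1.0  # product of gates applied since step s was written
        for s in range(t, -1, -1):
            score = decay * sum(qi * ki for qi, ki in zip(q, keys[s]))
            for j, vj in enumerate(values[s]):
                out[j] += score * vj
            decay *= gates[s]
        outputs.append(out)
    return outputs
```

Both functions return the same values; only the recurrent one keeps per-step work constant as the clip gets longer, which is the property the O(n) figure refers to.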
MRAgent shows that a graph‑structured episodic memory can make the same persistence cheap. By integrating reasoning into memory access, it “reduces prompt tokens to 118 k, a significant decrease from baselines like A‑Mem (632 k)”, an 81 % reduction that also cuts runtime cost while preserving expressive power [3].
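The 81 % figure follows directly from the two token counts quoted above. A quick check, where the helper name is only illustrative:

```python
def token_reduction(baseline_tokens: int, reduced_tokens: int) -> float:
    """Fraction of prompt tokens saved relative to the baseline."""
    return 1.0 - reduced_tokens / baseline_tokens


# Token counts as quoted from [3]: A-Mem baseline vs. MRAgent.
reduction = token_reduction(baseline_tokens=632_000, reduced_tokens=118_000)
print(f"{reduction:.1%}")  # prints 81.3%
```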
The new components are not a silver bullet. Kairos still relies on sliding and dilated windows, so events that span beyond the window length may be truncated, and MRAgent’s reconstruction loop grows with the number of graph nodes, raising concerns about scalability to truly open‑ended lifespans. The papers discuss these design trade‑offs but do not claim a specific solution for fully unbounded persistence.
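The truncation concern can be illustrated with a toy windowing scheme. The sketch below is hypothetical: the window size, dilation rate, and the way the two windows are combined are assumptions, not values or mechanics taken from the Kairos paper, and it models only the windowed component, not any state carried by the linear-attention path.

```python
from typing import Set


def visible_positions(t: int, window: int, dilation: int) -> Set[int]:
    """Past frame indices reachable from step t (hypothetical scheme).

    Combines a dense sliding window of `window` consecutive steps with a
    dilated window that samples `window` steps spaced `dilation` apart.
    """
    dense = {t - i for i in range(window)}
    dilated = {t - i * dilation for i in range(window)}
    # Drop indices that fall before the start of the sequence.
    return {p for p in dense | dilated if p >= 0}


def oldest_visible(t: int, window: int, dilation: int) -> int:
    """Earliest frame the windowed attention can still see at step t."""
    return min(visible_positions(t, window, dilation))


# With window=8 and dilation=4, step 100 reaches back only to frame 72,
# so an event at frame 50 is outside the windowed view.
print(oldest_visible(100, window=8, dilation=4))  # prints 72
print(50 in visible_positions(100, window=8, dilation=4))  # prints False
```

Dilation widens the reach for the same number of attended positions, but the reach is still bounded, which is why very long events can fall outside it.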
If persistent state is the missing piece, then the community’s evaluation suite must reflect it. WRBench’s three‑stage diagnostic—intervention, continuity, and re‑observed‑state correctness—should become a default test for any world‑model rollout, forcing developers to measure not just visual fidelity but the ability to keep the world moving while nobody is looking.