Persistent state and memory for embodied agents
Linear‑temporal attention lets agents keep a running world model instead of recomputing everything from scratch [1].
Associative graph memories store observations as linked nodes, enabling recall after long gaps [2].
Both approaches expose a core difficulty: maintaining coherent behavior when input streams are intermittent or extend over many steps [3].
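To make the graph-memory idea concrete, here is a minimal sketch of observations stored as linked nodes. It is not the design from [2]: the class name `GraphMemory`, the tag-based linking rule, and the `hops` parameter are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class GraphMemory:
    """Toy associative memory: observations are nodes, shared tags create links."""

    def __init__(self):
        self.nodes = {}                  # node id -> (step, text)
        self.edges = defaultdict(set)    # node id -> ids of linked nodes
        self.by_tag = defaultdict(set)   # tag -> ids of nodes carrying that tag

    def add(self, step, text, tags):
        node_id = len(self.nodes)
        self.nodes[node_id] = (step, text)
        for tag in tags:
            # Link the new node to every earlier node that shares this tag.
            for other in self.by_tag[tag]:
                self.edges[node_id].add(other)
                self.edges[other].add(node_id)
            self.by_tag[tag].add(node_id)
        return node_id

    def recall(self, tag, hops=1):
        """Return texts within `hops` links of any node carrying `tag`."""
        frontier = deque((n, 0) for n in self.by_tag.get(tag, ()))
        seen = {n for n, _ in frontier}
        while frontier:
            node, depth = frontier.popleft()
            if depth == hops:
                continue  # do not expand past the hop limit
            for nxt in self.edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
        return [self.nodes[n][1] for n in sorted(seen)]


memory = GraphMemory()
memory.add(1, "key on kitchen table", ["key", "kitchen"])
memory.add(2, "door is locked", ["door"])
memory.add(500, "kitchen light is off", ["kitchen"])
memory.add(900, "door needs key", ["door", "key"])
print(memory.recall("key", hops=1))
```

Recall follows links rather than recency, so the observation from step 1 is still reachable at step 900. The sketch does nothing to keep the stored observations consistent with each other as they accumulate.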
Granular reinforcement learning and quality‑aware distillation
Step‑level credit assignment replaces coarse episode rewards, giving agents clearer signals about which actions actually mattered [4].
Quality‑aware self‑distillation preserves fine‑grained grounding cues when multimodal models are compressed, improving downstream reasoning without extra supervision [5].
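The difference between coarse episode rewards and step-level credit can be sketched in a few lines. This is not the algorithm from [4]; it uses the standard discounted return-to-go as a stand-in for step-level credit, and the function names and the `gamma` value are assumptions.

```python
def episode_level_credit(step_rewards):
    """Every step receives the same signal: the total episode reward."""
    total = sum(step_rewards)
    return [total] * len(step_rewards)


def step_level_credit(step_rewards, gamma=0.5):
    """Each step receives the discounted sum of rewards from that step onward."""
    credits = [0.0] * len(step_rewards)
    running = 0.0
    # Walk backward so each step sees only the rewards that came after it.
    for t in reversed(range(len(step_rewards))):
        running = step_rewards[t] + gamma * running
        credits[t] = running
    return credits


rewards = [0.0, 0.0, 1.0, 0.0]  # hypothetical episode: only the third step pays off
print(episode_level_credit(rewards))
print(step_level_credit(rewards))
```

With the episode-level signal, the final step gets the same credit as the step that earned the reward. With the step-level signal, it gets none.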
Efficient latent‑diffusion transformers
Adaptive token compression discards low‑information patches on the fly, cutting inference cost while keeping visual fidelity [6].
Frequency‑aware spectral forcing reshapes the diffusion spectrum so fewer parameters achieve the same detail level, further reducing runtime [7].
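As a rough illustration of adaptive token compression, the sketch below scores each patch and keeps only the most informative ones. Using per-patch variance as the information score is an assumption made for this example, not the criterion from [6]; `compress_tokens` and `keep_ratio` are hypothetical names.

```python
from statistics import pvariance


def compress_tokens(patches, keep_ratio=0.5):
    """Keep the highest-variance patches, preserving their original order."""
    n_keep = max(1, round(len(patches) * keep_ratio))
    scores = [pvariance(p) for p in patches]  # assumed proxy for information content
    ranked = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])            # restore spatial order of survivors
    return kept, [patches[i] for i in kept]


# Four flattened patches of pixel intensities (made-up values).
patches = [
    [0.5, 0.5, 0.5, 0.5],  # flat region
    [0.0, 1.0, 0.0, 1.0],  # strong texture
    [0.4, 0.5, 0.5, 0.6],  # nearly flat
    [0.1, 0.9, 0.2, 0.8],  # texture
]
kept_indices, kept_patches = compress_tokens(patches, keep_ratio=0.5)
print(kept_indices)
```

Returning the kept indices alongside the patches lets a downstream model keep positional information for the surviving tokens.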
Multilingual code gaps
The Multi‑LCB benchmark adds twelve non‑Python languages and shows that current large language models still excel mainly at Python, with performance drops of up to 40 % on other languages [8].
The results warn that code‑generation tools will remain biased unless training data and evaluation broaden.
Stabilizing 4‑bit pretraining
Replacing the customary E2M1 quantizer with a uniform 4‑bit grid plus a Random Hadamard Transform eliminates geometric shrinkage bias. Large‑scale models trained this way converge more reliably and retain accuracy [9].
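For readers unfamiliar with the two ingredients, here is a minimal pure-Python sketch of a Random Hadamard Transform followed by a uniform 4-bit grid. E2M1 represents the magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, and 6, so its levels are unevenly spaced, whereas the grid below uses equally spaced integer levels. The sketch does not reproduce the recipe from [9]: the single shared scale, the symmetric levels -7..7, round-to-nearest, and all function names are assumptions.

```python
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = n ** -0.5  # makes the transform norm-preserving
    return [v * scale for v in x]


def random_signs(n, seed=0):
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rht(x, signs):
    """Random Hadamard Transform: random sign flips, then the Hadamard matrix."""
    return fwht([v * s for v, s in zip(x, signs)])


def inverse_rht(y, signs):
    """The orthonormal Hadamard matrix is its own inverse; undo the signs last."""
    return [v * s for v, s in zip(fwht(y), signs)]


def quantize_uniform_int4(x):
    """Symmetric uniform grid: integer codes in -7..7 with one shared scale."""
    max_abs = max(abs(v) for v in x)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in x]
    return codes, scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


signs = random_signs(8, seed=0)
x = [8.0, 0.1, -0.2, 0.1, 0.0, 0.2, -0.1, 0.1]  # one outlier dominates the range
codes, scale = quantize_uniform_int4(rht(x, signs))
x_hat = inverse_rht(dequantize(codes, scale), signs)
print(codes)
print([round(v, 3) for v in x_hat])
```

The transform spreads the outlier's energy across all coordinates before quantization, and because it is orthonormal it can be undone exactly after dequantization.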
FastContext: repository‑exploration agent
FastContext spawns a lightweight sub‑agent that extracts concise file paths before the main model processes code. This reduces token consumption by roughly 60 % and lifts success rates on the SWE‑bench suite [10].
Fragility of sparse autoencoder features
Sparse autoencoders produce interpretable features, but individual features change dramatically across random seeds [12]. Moreover, after targeted interventions the same sparse codes can re‑emerge to support harmful behaviors [11].
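One simple way to quantify seed dependence is to match each feature direction from one training run to its closest counterpart in another run. The sketch below is an illustrative check, not the analysis from the cited papers; the 0.9 threshold and the function names are assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match_similarity(features_a, features_b):
    """For each feature in run A, its highest cosine similarity to any feature in run B."""
    return [max(cosine(fa, fb) for fb in features_b) for fa in features_a]


def stable_fraction(features_a, features_b, threshold=0.9):
    """Share of run-A features that have a close counterpart in run B."""
    sims = best_match_similarity(features_a, features_b)
    return sum(s >= threshold for s in sims) / len(sims)


# Made-up decoder directions from two hypothetical seeds.
run_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
run_b = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.6, 0.0, 0.8]]
print(stable_fraction(run_a, run_b))
```

A low stable fraction means that an interpretation attached to one feature index in one run may not carry over to a retrained autoencoder.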
AI reviewer repackaging attacks
Evaluations that rely on LLM‑driven reviewers can be fooled by trivial re‑formatting of submissions: the content stays the same but the reviewer's output shifts, exposing a lack of robustness in automated peer‑review pipelines [13].
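A pipeline can test for this weakness by checking that two submissions differ only in presentation and then comparing the reviewer's scores. The sketch below is a toy harness: `toy_reviewer` is a deliberately naive, hypothetical stand-in for an LLM reviewer, and the normalization rule is an assumption, not the protocol from [13].

```python
import re


def normalize(text):
    """Strip presentation: markdown markers, case, and whitespace differences."""
    text = re.sub(r"[*_#>`-]", " ", text)
    return " ".join(text.lower().split())


def is_presentation_only(original, revised):
    """True when the two texts carry the same words after normalization."""
    return normalize(original) == normalize(revised)


def toy_reviewer(text):
    """Hypothetical reviewer that (wrongly) rewards surface structure."""
    score = 5
    score += min(2, text.count("\n## "))  # bonus for headings
    score += min(2, text.count("**"))     # bonus for bold markers
    return score


original = "Results\nOur method improves accuracy."
revised = "\n## Results\n**Our method improves accuracy.**"

if is_presentation_only(original, revised):
    shift = toy_reviewer(revised) - toy_reviewer(original)
    print("score shift from formatting alone:", shift)
```

A robust reviewer should produce a shift of zero whenever `is_presentation_only` returns true. Any other value means the score depends on formatting rather than content.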
Token reduction via visual repository maps
Encoding a code repository as a graph‑based image and feeding it alongside text cuts token usage by up to 26 % while preserving answer quality. The technique offers a practical shortcut for long‑context code‑understanding tasks [14].
References
- [1] Kairos: A Native World Model Stack for Physical AI
- [2] Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents
- [3] Current World Models Lack a Persistent State Core
- [4] StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning
- [5] Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding
- [6] HiLo-Token: Input-Adaptive High-Low Frequency Token Compression for Efficient Image Editing
- [7] Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion
- [8] Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
- [9] Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe
- [10] FastContext: Training Efficient Repository Explorer for Coding Agents
- [11] SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior
- [12] Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders
- [13] No Hidden Prompts Needed! You Can Game AI Peer Review with Presentation-Only Revisions
- [14] LLM Agents Can See Code Repositories