Scaling Long‑Horizon Video Generation
Recent work replaces sliding‑window attention with memory‑centric designs. A learnable evolving memory compresses the entire history at fixed cost, enabling real‑time rollouts of unbounded length [1]. A shared low‑rank latent with 3‑D RoPE replaces per‑head KV caches, cutting cache memory by roughly 90 % while preserving visual fidelity [2]. Together, these techniques extend video models to lengths that were previously infeasible and sharply reduce GPU memory footprints [3].
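To make the cache arithmetic concrete, the sketch below compares a conventional per-head KV cache with a single shared latent stored per token and layer. It is a back-of-the-envelope illustration, not the accounting used in [2]: the function names and every dimension (16 heads, head size 128, latent width 384, 2-byte elements) are assumptions chosen only so the numbers are easy to follow.

```python
def per_head_kv_bytes(tokens, layers, heads, head_dim, bytes_per_elem=2):
    """Bytes for a standard cache: one key and one value vector per head."""
    return 2 * tokens * layers * heads * head_dim * bytes_per_elem


def shared_latent_bytes(tokens, layers, latent_dim, bytes_per_elem=2):
    """Bytes when each token stores one shared low-rank latent per layer."""
    return tokens * layers * latent_dim * bytes_per_elem


def reduction(baseline, compressed):
    """Fraction of memory saved relative to the baseline."""
    return 1.0 - compressed / baseline


# Hypothetical configuration (not taken from the paper).
baseline = per_head_kv_bytes(tokens=100_000, layers=30, heads=16, head_dim=128)
latent = shared_latent_bytes(tokens=100_000, layers=30, latent_dim=384)
saving = reduction(baseline, latent)

print(f"per-head cache: {baseline / 1e9:.2f} GB")
print(f"shared latent:  {latent / 1e9:.2f} GB")
print(f"saving:         {saving:.2%}")
```

Under these assumed sizes the latent occupies 384 / 4096 of the per-head cache, a saving of about 90.6 %. The saving depends only on the ratio of the latent width to 2 × heads × head size, not on sequence length or layer count.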
Stabilizing On‑Policy Distillation for LLMs
Distilling policies from RL‑trained LLMs suffers from high KL variance and distribution drift. Aligning hidden representations reduces this variance, yielding smoother student updates [4]. Trust‑region constraints enforce a bounded policy shift, preventing collapse during distillation [5]. Two orthogonal techniques, logit‑free chunk verification [6] and self‑distilled policy gradients [7], provide additional stability for RL fine‑tuning.
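A bounded policy shift can be illustrated with a small backtracking rule borrowed from TRPO-style line search: shrink the step toward a proposed distribution until its KL divergence from the old one falls under a budget. This is a generic sketch, not the procedure of [5]; the names `kl_divergence`, `trust_region_update`, `delta`, and `shrink` are hypothetical, and it operates directly on a single categorical distribution rather than on model parameters.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two categorical distributions given as lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def trust_region_update(old, proposed, delta, shrink=0.5, max_backtracks=20):
    """Move from `old` toward `proposed`, keeping KL(old || new) <= delta."""
    alpha = 1.0
    for _ in range(max_backtracks):
        # A convex combination of two distributions is still a distribution.
        candidate = [(1.0 - alpha) * o + alpha * n for o, n in zip(old, proposed)]
        if kl_divergence(old, candidate) <= delta:
            return candidate, alpha
        alpha *= shrink  # step too large: back off and retry
    return list(old), 0.0  # no acceptable step found: keep the old policy


# Hypothetical three-token policy and an aggressive proposed update.
old_policy = [0.5, 0.3, 0.2]
proposed_policy = [0.05, 0.05, 0.9]
new_policy, step = trust_region_update(old_policy, proposed_policy, delta=0.01)

print("accepted step size:", step)
print("KL(old || new):", kl_divergence(old_policy, new_policy))
```

The full proposed update is rejected because it moves far outside the KL budget; only a fraction of the step is accepted. An update that already lies inside the budget passes through unchanged with a step size of 1.0.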
Agentic Reliability and Safety Frameworks
Benchmarks now probe LLM agents on extended reasoning tasks that require memory‑policy optimization and plan adaptation under shifting constraints [8], [9]. Counterfactual context revision audits reveal how agents change stance when presented with altered evidence, exposing hidden failure modes [10]. Self‑evolving prompt agents automatically refine system prompts, improving alignment without human intervention [11].
Standout Papers
VideoMLA: Efficient KV Cache Reduction – Introduces a shared low‑rank latent and 3‑D RoPE to replace per‑head KV caches, cutting memory use by over 90 % while keeping generation quality intact [2].
Echo‑Infinity: Infinite Video Rollouts – Proposes a learnable evolving memory that stores compressed history at fixed cost, allowing autoregressive video generation to run indefinitely in real time [1].
Hamilton‑Jacobi Evolution for NN Dynamics – Shows that gradient steps follow a viscous Hamilton‑Jacobi PDE, unifying ResNets, Transformers, and RNNs under a single mathematical lens [12].
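For readers unfamiliar with the term, a viscous Hamilton-Jacobi equation generically takes the form below. The symbols are the standard textbook ones and are assumptions here: $u(x,t)$ is the evolving scalar field, $H$ is the Hamiltonian, and $\nu > 0$ is the viscosity coefficient. How [12] maps network quantities onto these symbols is not stated in this digest.

```latex
\partial_t u(x,t) + H\bigl(x, \nabla u(x,t)\bigr) = \nu \, \Delta u(x,t), \qquad \nu > 0 .
```

Setting $\nu = 0$ recovers the inviscid Hamilton-Jacobi equation; the Laplacian term is what makes the equation "viscous".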
Other Notable Findings
Distributional watermarking is fragile – Linear ensembles can erase watermark perturbations, indicating that current watermarking schemes may be easy to bypass [13].
Speculative decoding throughput – A pipeline‑parallel speculative decoding framework processes multiple tokens per step, delivering a theoretical speedup far beyond prior baselines [14].
Sparse MoE via optimal transport – Differentiable optimal transport converts dense feed‑forward layers into sparse experts, offering a principled recipe for Mixture‑of‑Experts specialization [15].
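The optimal-transport idea in the last item can be illustrated with Sinkhorn iterations, a standard way to compute an entropy-regularized transport plan. The sketch below is not the DOT-MoE algorithm from [15]; it only shows how a balanced plan spreads feed-forward neurons evenly across experts even when the raw affinities all favor one expert. The function name `sinkhorn_plan`, the affinity matrix, and the `epsilon` and `iters` values are hypothetical.

```python
import math


def sinkhorn_plan(scores, epsilon=0.5, iters=200):
    """Entropy-regularized transport plan with uniform row and column marginals.

    scores[i][j] is the affinity of neuron i for expert j (higher is better).
    """
    n, m = len(scores), len(scores[0])
    kernel = [[math.exp(s / epsilon) for s in row] for row in scores]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows so each neuron carries mass 1/n ...
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # ... then rescale columns so each expert receives mass 1/m.
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical affinities: every neuron prefers expert 0, to varying degrees.
scores = [[1.0, 0.0], [0.9, 0.0], [0.2, 0.0], [0.1, 0.0]]
plan = sinkhorn_plan(scores)

greedy = [max(range(2), key=lambda j: row[j]) for row in scores]
balanced = [max(range(2), key=lambda j: row[j]) for row in plan]

print("greedy assignment:  ", greedy)    # [0, 0, 0, 0]
print("balanced assignment:", balanced)  # [0, 0, 1, 1]
```

Greedy assignment sends all four neurons to expert 0, while the balanced plan keeps the two strongest matches there and moves the other two to expert 1. A smaller `epsilon` makes the plan closer to a hard assignment but needs more iterations to converge.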
These results narrow the gap between research prototypes and production‑ready systems. Memory‑efficient video models can now power longer streams, stable distillation makes LLM policy fine‑tuning safer, and new safety benchmarks expose hidden brittleness in agentic behavior. The accompanying techniques (low‑rank caches, evolving memories, trust‑region distillation, and optimal‑transport MoEs) give engineers concrete tools for building the next generation of generative and autonomous AI.
References
[1] Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation
[2] VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
[3] LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation
[4] OPRD: On-Policy Representation Distillation
[5] Trust Region On-Policy Distillation
[6] OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification
[7] Self-Distilled Policy Gradient
[8] Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents
[9] AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints
[10] Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions
[11] SePO: Self-Evolving Prompt Agent for System Prompt Optimization
[12] The Hamilton-Jacobi Theory of Deep Learning
[13] Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs
[14] Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism
[15] DOT-MoE: Differentiable Optimal Transport for MoEfication