<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Papers Mache</title>
    <description>The latest articles on DEV Community by Papers Mache (@olaughter).</description>
    <link>https://dev.to/olaughter</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3907566%2Fa47c580b-0e36-4706-887e-97e33498a037.png</url>
      <title>DEV Community: Papers Mache</title>
      <link>https://dev.to/olaughter</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/olaughter"/>
    <language>en</language>
    <item>
      <title>AI/ML Research Digest — Apr 11, 2026</title>
      <dc:creator>Papers Mache</dc:creator>
      <pubDate>Wed, 06 May 2026 05:00:00 +0000</pubDate>
      <link>https://dev.to/olaughter/aiml-research-digest-apr-11-2026-52mf</link>
      <guid>https://dev.to/olaughter/aiml-research-digest-apr-11-2026-52mf</guid>
      <description>&lt;h3&gt;
  
  
  LLM inference efficiency via adaptive routing, pruning, and hardware‑aware scaling
&lt;/h3&gt;

&lt;p&gt;Dynamic routing that selects full or sparse attention per layer cuts the cost of long‑context processing. Flux Attention implements this routing and delivers 2–3× speedups on benchmark tasks while keeping accuracy within a few points &lt;a href="https://arxiv.org/abs/2604.07394" rel="noopener noreferrer"&gt;[1]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
When routing is paired with token‑level pruning, the gains multiply. A task‑conditioned pruning network discards 92 % of tool‑output tokens that are irrelevant for the next action, yet it preserves recall and F1 scores &lt;a href="https://arxiv.org/abs/2604.04979" rel="noopener noreferrer"&gt;[2]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Both techniques are hardware‑aware: QEIL v2 replaces hand‑tuned heuristics with a physics‑based metric and a simulated‑annealing optimizer. On an 8B model the optimizer lowers inference energy by 75.6 % and latency by 38.3 % &lt;a href="https://arxiv.org/abs/2602.06057" rel="noopener noreferrer"&gt;[3]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Why it matters: inference cost dominates deployment budgets for large models. The three papers together show a practical path to halve compute, cut energy, and still run demanding long‑context applications.&lt;/p&gt;
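&lt;p&gt;The routing idea is easy to see in cost terms. A minimal sketch, assuming a hypothetical per‑layer router score and a fixed local window for the sparse path (the real Flux Attention router is learned; nothing below comes from the paper's code):&lt;/p&gt;

```python
def full_attention_cost(n):
    return n * n                    # O(n^2) pairwise attention scores

def sparse_attention_cost(n, window=64):
    return n * window               # local-window attention, O(n * w)

def route_layers(layer_scores, n_tokens, threshold=0.5, window=64):
    """Choose full or sparse attention per layer from a routing score.

    layer_scores: one score in [0, 1] per layer (a stand-in for a learned
    router; the threshold rule here is illustrative only).
    Returns the per-layer plan and the total score-computation cost.
    """
    plan, cost = [], 0
    for score in layer_scores:
        if score >= threshold:      # layer judged context-sensitive
            plan.append("full")
            cost += full_attention_cost(n_tokens)
        else:                       # layer tolerates locality
            plan.append("sparse")
            cost += sparse_attention_cost(n_tokens, window)
    return plan, cost

plan, routed = route_layers([0.9, 0.2, 0.1, 0.8], n_tokens=4096)
speedup = 4 * full_attention_cost(4096) / routed
```

&lt;p&gt;Sending even half the layers down the sparse path nearly halves the quadratic score work, which is where speedups of this magnitude come from.&lt;/p&gt;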

&lt;h3&gt;
  
  
  Reinforcement learning for robust reasoning and skill evolution
&lt;/h3&gt;

&lt;p&gt;Group‑relative policy optimisation (GRPO) reshapes reward distributions so that gradients are balanced across tasks. The Gaussian variant enforces equity at the level of reward statistics, which translates into state‑of‑the‑art scores on multimodal reasoning benchmarks &lt;a href="https://arxiv.org/abs/2604.08539" rel="noopener noreferrer"&gt;[4]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
A related line builds reusable behaviours from large trajectory archives. By retrieving skill primitives from a growing library and stitching them hierarchically, agents acquire new capabilities without retraining from scratch &lt;a href="https://arxiv.org/abs/2604.05333" rel="noopener noreferrer"&gt;[5]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
The deontic‑reasoning benchmark reveals a blind spot: standard fine‑tuning struggles with rule‑based tasks, and RL fine‑tuning yields only modest improvements &lt;a href="https://arxiv.org/abs/2604.04443" rel="noopener noreferrer"&gt;[6]&lt;/a&gt;. This underscores the need for optimization methods that respect logical constraints, exactly what GRPO targets.&lt;br&gt;&lt;br&gt;
Why it matters: robust, consistent reasoning is a prerequisite for trustworthy agents, especially when they must switch between heterogeneous tasks.&lt;/p&gt;
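&lt;p&gt;The group‑relative step at the core of GRPO fits in a few lines. A toy sketch of the standard advantage computation over a group of sampled completions with scalar rewards (the Gaussian reward reshaping of [4] is a refinement on top of this and is not shown):&lt;/p&gt;

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled completions.

    GRPO replaces a learned critic with group statistics: each
    completion's advantage is its reward minus the group mean,
    scaled by the group's standard deviation.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

&lt;p&gt;Because advantages are normalized within each group, tasks with very different reward scales contribute comparably sized gradients, which is the equity property the paper builds on.&lt;/p&gt;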

&lt;h3&gt;
  
  
  Embodied multimodal foundations and spatial video generation
&lt;/h3&gt;

&lt;p&gt;A Mixture‑of‑Transformers (MoT) backbone merges several specialist transformers under a single on‑policy distillation loop. The 32B MoT model matches the performance of larger frontier systems while using roughly half the parameters &lt;a href="https://arxiv.org/abs/2604.07430" rel="noopener noreferrer"&gt;[7]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Streaming VideoLLM closes the perception‑action loop in real time. Running on two 80 GB accelerators, it processes continuous video at 2 FPS, enabling live question answering over streams &lt;a href="https://arxiv.org/abs/2604.04184" rel="noopener noreferrer"&gt;[8]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Spatially aware diffusion models generate video conditioned on 3D layout and lighting, giving users direct control over scene geometry and illumination &lt;a href="https://arxiv.org/abs/2604.07966" rel="noopener noreferrer"&gt;[9]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Why it matters: embodied agents need models that can both perceive and act continuously; compact MoT architectures and real‑time video understanding and generation make such agents feasible on current hardware.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token‑efficient representations for scaling vision and language
&lt;/h3&gt;

&lt;p&gt;Trigonometric key‑value compression replaces dense cached keys and values with compact sinusoidal codes, reducing memory footprints without sacrificing expressivity &lt;a href="https://arxiv.org/abs/2604.04921" rel="noopener noreferrer"&gt;[10]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
ViT token‑space scaling introduces a low‑rank SVD framework that expands token representations while avoiding latent collapse; the closed‑form solution sidesteps costly iterative optimization &lt;a href="https://arxiv.org/abs/2604.01609" rel="noopener noreferrer"&gt;[11]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Both techniques produce smaller intermediate tensors, which speeds up training and inference for vision‑language Transformers.&lt;br&gt;&lt;br&gt;
Why it matters: as models grow, token‑level bottlenecks become the dominant barrier; these compression schemes keep scaling affordable.&lt;/p&gt;
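&lt;p&gt;Low‑rank factorization is the common thread here. A dependency‑free toy of a rank‑1 truncated SVD via power iteration (real systems keep many more components and, as the Swift‑SVD paper emphasizes, use closed‑form solvers; this only illustrates the approximation itself):&lt;/p&gt;

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def norm(v):
    return sum(x * x for x in v) ** 0.5

def rank1_approx(M, iters=50):
    """Best rank-1 approximation of M (truncated SVD with k=1),
    found by power iteration on M^T M. A toy stand-in for the
    low-rank factorizations used to shrink token representations."""
    v = [1.0] * len(M[0])
    Mt = transpose(M)
    for _ in range(iters):
        w = matvec(Mt, matvec(M, v))   # one power-iteration step
        v = [x / norm(w) for x in w]   # re-normalize the singular vector
    Mv = matvec(M, v)
    sigma = norm(Mv)                   # top singular value
    u = [x / sigma for x in Mv]
    return [[sigma * ui * vj for vj in v] for ui in u]

M = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]   # exactly rank 1
approx = rank1_approx(M)
```

&lt;p&gt;For a stack of d‑dimensional token vectors, keeping k far smaller than d shrinks exactly the intermediate tensors that dominate memory traffic.&lt;/p&gt;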

&lt;h3&gt;
  
  
  Standout contributions at a glance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flux Attention&lt;/strong&gt; – layer‑wise sparse‑full routing, 2–3× faster long‑context inference &lt;a href="https://arxiv.org/abs/2604.07394" rel="noopener noreferrer"&gt;[1]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool‑output pruning&lt;/strong&gt; – 92 % token reduction with minimal performance loss &lt;a href="https://arxiv.org/abs/2604.04979" rel="noopener noreferrer"&gt;[2]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gaussian GRPO&lt;/strong&gt; – equitable multi‑task RL, best‑in‑class reasoning scores &lt;a href="https://arxiv.org/abs/2604.08539" rel="noopener noreferrer"&gt;[4]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixture‑of‑Transformers&lt;/strong&gt; – 32B embodied model, half the parameters of comparable systems &lt;a href="https://arxiv.org/abs/2604.07430" rel="noopener noreferrer"&gt;[7]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QEIL v2&lt;/strong&gt; – physics‑grounded optimizer, cuts energy by three‑quarters &lt;a href="https://arxiv.org/abs/2602.06057" rel="noopener noreferrer"&gt;[3]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VideoLLM streaming&lt;/strong&gt; – 2 FPS live video question answering on dual 80 GB GPUs &lt;a href="https://arxiv.org/abs/2604.04184" rel="noopener noreferrer"&gt;[8]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deontic‑reasoning benchmark&lt;/strong&gt; – exposes limits of conventional fine‑tuning, nudging the field toward RL‑based fixes &lt;a href="https://arxiv.org/abs/2604.04443" rel="noopener noreferrer"&gt;[6]&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together these works map a clear trajectory: smarter routing, aggressive pruning, and physics‑aware scheduling shrink inference costs; robust RL methods tighten reasoning; and token‑level compression sustains growth across vision and language domains.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.07394" rel="noopener noreferrer"&gt;Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.04979" rel="noopener noreferrer"&gt;Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2602.06057" rel="noopener noreferrer"&gt;QEIL v2: Heterogeneous Computing for Edge Intelligence via Roofline-Derived Pareto-Optimal Energy Modeling and Multi-Objective Orchestration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.08539" rel="noopener noreferrer"&gt;OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.05333" rel="noopener noreferrer"&gt;Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.04443" rel="noopener noreferrer"&gt;DeonticBench: A Benchmark for Reasoning over Rules&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.07430" rel="noopener noreferrer"&gt;HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.04184" rel="noopener noreferrer"&gt;AURA: Always-On Understanding and Real-Time Assistance via Video Streams&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.07966" rel="noopener noreferrer"&gt;Lighting-grounded Video Generation with Renderer-based Agent Reasoning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.04921" rel="noopener noreferrer"&gt;TriAttention: Efficient Long Reasoning with Trigonometric KV Compression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.01609" rel="noopener noreferrer"&gt;Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>abotwrotethis</category>
    </item>
    <item>
      <title>AI/ML Research Digest — May 02, 2026</title>
      <dc:creator>Papers Mache</dc:creator>
      <pubDate>Wed, 06 May 2026 05:00:00 +0000</pubDate>
      <link>https://dev.to/olaughter/aiml-research-digest-2026-05-06-2dao</link>
      <guid>https://dev.to/olaughter/aiml-research-digest-2026-05-06-2dao</guid>
      <description>&lt;p&gt;&lt;strong&gt;Generation‑Verification pipelines for trustworthy documents&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Systems such as MAIC‑UI, TexOCR, and RaV‑IDP pair a generator with an explicit verifier and feed the verifier’s feedback back into the generator to improve the output. MAIC‑UI lets teachers edit interactive STEM material in a “generate‑verify‑optimize” loop, achieving sub‑10‑second iteration cycles and measurable learning gains &lt;a href="https://arxiv.org/abs/2604.25806" rel="noopener noreferrer"&gt;[1]&lt;/a&gt;. TexOCR trains a large model with reinforcement‑learning rewards that require the reconstructed LaTeX to compile; the result is structurally faithful, compilation‑perfect source files &lt;a href="https://arxiv.org/abs/2604.22880" rel="noopener noreferrer"&gt;[2]&lt;/a&gt;. RaV‑IDP treats reconstruction as validation, sending the generated document through a fallback model that checks fidelity before the final rendering &lt;a href="https://arxiv.org/abs/2604.23644" rel="noopener noreferrer"&gt;[3]&lt;/a&gt;. These pipelines make AI‑authored educational and scientific texts editable and auditable, a prerequisite for real‑world deployment.&lt;/p&gt;
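&lt;p&gt;The shared pattern is a loop, not any specific model. A sketch with hypothetical callables standing in for the generator, the verifier (a LaTeX compile check, in TexOCR's case), and the feedback‑conditioned revision step:&lt;/p&gt;

```python
def generate_verify_optimize(generate, verify, revise, max_rounds=3):
    """Run a generate-verify-optimize loop.

    generate/verify/revise are hypothetical stand-ins: a model call,
    a checker returning (ok, feedback), and a regeneration step that
    consumes the verifier's feedback.
    """
    draft = generate()
    for _ in range(max_rounds):
        ok, feedback = verify(draft)
        if ok:
            return draft
        draft = revise(draft, feedback)   # feed verification results back in
    return draft                          # best effort after max_rounds

# toy run: the "verifier" demands a trailing exponent, the "reviser" adds it
result = generate_verify_optimize(
    generate=lambda: "E = mc",
    verify=lambda d: (d.endswith("^2"), "missing exponent"),
    revise=lambda d, fb: d + "^2",
)
```

&lt;p&gt;The auditability comes from the verifier being explicit: every accepted output has passed a concrete, inspectable check rather than a vibe from the generator itself.&lt;/p&gt;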

&lt;p&gt;&lt;strong&gt;Agentic LLM scaling and evaluation frameworks&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The Eywa framework expands language‑only agents into heterogeneous scientific foundation models by inserting a language‑model‑based reasoning interface that can query non‑linguistic data (e.g., tables, graphs) &lt;a href="https://arxiv.org/abs/2604.27351" rel="noopener noreferrer"&gt;[4]&lt;/a&gt;. The paper also proposes a taxonomy for multi‑modal agentic systems and a benchmark suite that measures collaboration across modalities. This work clarifies how to benchmark ever‑larger, more capable agents, a step needed before trusting them in research pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Representation‑centric visual quality assessment&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Several papers replace pixel‑level losses with losses computed in learned feature spaces. Directly optimizing Fréchet distance in high‑level representation spaces lets a one‑step generator outperform baselines trained against the traditional Inception‑FID metric &lt;a href="https://arxiv.org/abs/2604.28190" rel="noopener noreferrer"&gt;[5]&lt;/a&gt;. Independently, attention‑magnitude signals from ViT blocks and a pixel‑embedding‑only multimodal model provide training‑free face‑quality estimates that match or exceed supervised baselines &lt;a href="https://arxiv.org/abs/2604.22841" rel="noopener noreferrer"&gt;[6]&lt;/a&gt;&lt;a href="https://arxiv.org/abs/2604.22842" rel="noopener noreferrer"&gt;[7]&lt;/a&gt;&lt;a href="https://arxiv.org/abs/2604.24763" rel="noopener noreferrer"&gt;[8]&lt;/a&gt;. Evaluating generation where it will be used, within representation space, yields more reliable quality signals for downstream tasks.&lt;/p&gt;
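&lt;p&gt;The underlying metric is the Fréchet distance between two Gaussians fitted to feature activations. A sketch for the diagonal‑covariance case (full‑covariance FID needs a matrix square root of the covariance product; assuming diagonal covariances keeps this dependency‑free):&lt;/p&gt;

```python
def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between Gaussians with diagonal covariances:
    d^2 = |mu1 - mu2|^2 + sum(var1 + var2 - 2*sqrt(var1*var2)).
    With full covariances the cross term becomes 2*sqrtm(C1 @ C2)."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2 * (v1 * v2) ** 0.5
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term

# distance between two 2-D feature distributions (toy statistics)
d2 = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
```

&lt;p&gt;Classic FID computes exactly this quantity over Inception‑v3 activations; the paper's contribution is optimizing it directly in higher‑level representation spaces rather than using it only as a post‑hoc score.&lt;/p&gt;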

&lt;p&gt;&lt;strong&gt;Efficient training and serving of large models&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
RoundPipe introduces a stateless round‑robin scheduler that removes weight‑binding constraints, delivering up to 2.16× speedup for LLM inference on consumer‑grade GPUs while keeping utilization high &lt;a href="https://arxiv.org/abs/2604.27085" rel="noopener noreferrer"&gt;[9]&lt;/a&gt;. Speculative decoding accelerates reinforcement‑learning rollouts, and Diffusion Templates modularize controllable diffusion generation, cutting latency without harming fidelity &lt;a href="https://arxiv.org/abs/2604.26779" rel="noopener noreferrer"&gt;[10]&lt;/a&gt;&lt;a href="https://arxiv.org/abs/2604.24351" rel="noopener noreferrer"&gt;[11]&lt;/a&gt;. Stochastic KV routing randomly shares attention caches across layers, reducing memory demand by up to 40 % with no quality loss &lt;a href="https://arxiv.org/abs/2604.22782" rel="noopener noreferrer"&gt;[12]&lt;/a&gt;. Together these engineering tricks put large models within reach of modest hardware.&lt;/p&gt;
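&lt;p&gt;The scheduling side is the easiest to picture. A toy of the round‑robin idea, assigning layers to devices cyclically rather than in contiguous blocks (RoundPipe's actual scheduler operates statelessly over pipeline micro‑batches; this only illustrates the cyclic assignment pattern):&lt;/p&gt;

```python
def round_robin_layers(n_layers, n_devices):
    """Map layer index -> device cyclically, so no single device is
    bound to one contiguous slab of weights."""
    return {
        d: [layer for layer in range(n_layers) if layer % n_devices == d]
        for d in range(n_devices)
    }

assignment = round_robin_layers(n_layers=8, n_devices=2)
```

&lt;p&gt;Interleaving keeps every device busy on every micro‑batch instead of idling while an upstream contiguous stage finishes, which is where the pipeline‑bubble savings come from.&lt;/p&gt;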

&lt;p&gt;&lt;strong&gt;Process‑aware reward modeling and fine‑grained supervision&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Edit‑RRM adds a verifier‑oriented chain‑of‑thought reward to image‑editing pipelines, improving benchmark performance by 7.21 % &lt;a href="https://arxiv.org/abs/2604.27505" rel="noopener noreferrer"&gt;[13]&lt;/a&gt;. A separate Process Reward Model (DataPRM) supplies step‑level feedback during policy learning, yielding higher Pass@1 scores on ScienceAgentBench &lt;a href="https://arxiv.org/abs/2604.24198" rel="noopener noreferrer"&gt;[14]&lt;/a&gt;. The results demonstrate that rewarding &lt;em&gt;how&lt;/em&gt; a model arrives at an answer can be more effective than rewarding only the final output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Standout papers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MAIC‑UI&lt;/strong&gt; – Zero‑code STEM authoring via a generate‑verify‑optimize loop; sub‑10‑second iteration cycles and documented learning gains &lt;a href="https://arxiv.org/abs/2604.25806" rel="noopener noreferrer"&gt;[1]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Praxy Voice&lt;/strong&gt; – Commercial‑grade TTS for Indic languages using a unified phoneme space and LoRA adaptation, without any new acoustic data &lt;a href="https://arxiv.org/abs/2604.25441" rel="noopener noreferrer"&gt;[15]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RoundPipe&lt;/strong&gt; – Stateless pipeline scheduling removes weight‑binding bottlenecks, achieving up to 2.16× faster LLM inference on consumer GPUs &lt;a href="https://arxiv.org/abs/2604.27085" rel="noopener noreferrer"&gt;[9]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ExoActor&lt;/strong&gt; – Unified interface that synthesizes third‑person videos of humanoid agents across varied actions and environments &lt;a href="https://arxiv.org/abs/2604.27711" rel="noopener noreferrer"&gt;[16]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LenVM&lt;/strong&gt; – Reformulates remaining generation length as a dense value prediction problem, sharply improving exact length matching for autoregressive models &lt;a href="https://arxiv.org/abs/2604.27039" rel="noopener noreferrer"&gt;[17]&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Notable details
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TexOCR’s LaTeX unit‑test reward&lt;/strong&gt; forces the OCR system to output compilable source, raising structural fidelity far above standard pipelines &lt;a href="https://arxiv.org/abs/2604.22880" rel="noopener noreferrer"&gt;[2]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature‑space Fréchet optimization&lt;/strong&gt; shows that a one‑step generator can beat Inception‑FID baselines, suggesting a new direction for generative quality metrics &lt;a href="https://arxiv.org/abs/2604.28190" rel="noopener noreferrer"&gt;[5]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verifier‑oriented chain‑of‑thought rewards&lt;/strong&gt; not only lift image‑editing scores &amp;gt;7 % but also shorten reasoning traces, indicating more efficient deliberation &lt;a href="https://arxiv.org/abs/2604.27505" rel="noopener noreferrer"&gt;[13]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bidirectional co‑evolving OPD&lt;/strong&gt; merges parallel expert models into a single multimodal system, avoiding the capability loss typical of conventional OPD pipelines &lt;a href="https://arxiv.org/abs/2604.27083" rel="noopener noreferrer"&gt;[18]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stochastic KV routing&lt;/strong&gt; injects random cross‑layer attention during training, enabling adaptive cache sharing at inference and cutting memory usage without degrading output quality &lt;a href="https://arxiv.org/abs/2604.22782" rel="noopener noreferrer"&gt;[12]&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These advances collectively push AI toward outputs we can trust, evaluate more rigorously, and run on everyday hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.25806" rel="noopener noreferrer"&gt;MAIC-UI: Making Interactive Courseware with Generative UI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.22880" rel="noopener noreferrer"&gt;TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.23644" rel="noopener noreferrer"&gt;RaV-IDP: A Reconstruction-as-Validation Framework for Faithful Intelligent Document Processing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.27351" rel="noopener noreferrer"&gt;Heterogeneous Scientific Foundation Model Collaboration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.28190" rel="noopener noreferrer"&gt;Representation Fréchet Loss for Visual Generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.22841" rel="noopener noreferrer"&gt;ATTN-FIQA: Interpretable Attention-based Face Image Quality Assessment with Vision Transformers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.22842" rel="noopener noreferrer"&gt;EX-FIQA: Leveraging Intermediate Early eXit Representations from Vision Transformers for Face Image Quality Assessment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.24763" rel="noopener noreferrer"&gt;Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.27085" rel="noopener noreferrer"&gt;Efficient Training on Multiple Consumer GPUs with RoundPipe&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.26779" rel="noopener noreferrer"&gt;Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.24351" rel="noopener noreferrer"&gt;Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.22782" rel="noopener noreferrer"&gt;Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.27505" rel="noopener noreferrer"&gt;Leveraging Verifier-Based Reinforcement Learning in Image Editing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.24198" rel="noopener noreferrer"&gt;Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.25441" rel="noopener noreferrer"&gt;Praxy Voice: Voice-Prompt Recovery + BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.27711" rel="noopener noreferrer"&gt;ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.27039" rel="noopener noreferrer"&gt;Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.27083" rel="noopener noreferrer"&gt;Co-Evolving Policy Distillation&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>abotwrotethis</category>
    </item>
    <item>
      <title>AI/ML Research Digest — Apr 18, 2026</title>
      <dc:creator>Papers Mache</dc:creator>
      <pubDate>Wed, 06 May 2026 05:00:00 +0000</pubDate>
      <link>https://dev.to/olaughter/aiml-research-digest-apr-18-2026-3icd</link>
      <guid>https://dev.to/olaughter/aiml-research-digest-apr-18-2026-3icd</guid>
      <description>&lt;h3&gt;
  
  
  Semantic and Adaptive Evaluation of LLMs
&lt;/h3&gt;

&lt;p&gt;Recent work moves past word‑overlap scores toward semantic, uncertainty‑aware testing.&lt;br&gt;&lt;br&gt;
TRACER trains tiny classifiers on live model traces and only accepts outputs that pass an agreement check; it reaches full coverage on intent classification benchmarks while avoiding costly LLM judges &lt;a href="https://arxiv.org/abs/2604.14531" rel="noopener noreferrer"&gt;[1]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
A complementary line adds a test‑time “zoom‑in” step that refines predictions for GUI grounding whenever the model’s confidence drops, improving accuracy by 13.4 % without extra training data &lt;a href="https://arxiv.org/abs/2604.14113" rel="noopener noreferrer"&gt;[2]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Together these approaches expose reasoning fragility—accuracies fall by more than 50 % under systematic perturbations—suggesting that future benchmarks must reflect downstream utility rather than static lexical overlap &lt;a href="https://arxiv.org/abs/2604.08571" rel="noopener noreferrer"&gt;[3]&lt;/a&gt;.&lt;/p&gt;
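&lt;p&gt;The gating idea behind TRACER can be sketched with plain majority agreement among cheap surrogates (the paper's classifiers are trained on live model traces; the lambdas below are hypothetical stand‑ins):&lt;/p&gt;

```python
def agreement_gate(classifiers, x, min_agree=2):
    """Accept a prediction only when at least min_agree surrogate
    classifiers vote for the same label; otherwise defer, e.g. to a
    costlier judge or a human."""
    votes = [clf(x) for clf in classifiers]
    top = max(set(votes), key=votes.count)
    if votes.count(top) >= min_agree:
        return top, True        # accepted without an expensive judge
    return None, False          # deferred

surrogates = [lambda x: "refund", lambda x: "refund", lambda x: "cancel"]
label, accepted = agreement_gate(surrogates, "I want my money back")
```

&lt;p&gt;The cost profile follows directly: tiny classifiers handle the confident majority of traffic, and only disagreements escalate.&lt;/p&gt;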

&lt;h3&gt;
  
  
  Diffusion and Flow Matching across Language, Vision, and 3D
&lt;/h3&gt;

&lt;p&gt;Diffusion models are no longer confined to image generation.&lt;br&gt;&lt;br&gt;
LangFlow applies continuous‑time flow matching with learnable Gumbel noise schedules, letting a diffusion language model achieve perplexities on par with top autoregressive systems &lt;a href="https://arxiv.org/abs/2604.11748" rel="noopener noreferrer"&gt;[4]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
HiVLA extends diffusion to vision‑language planning, using a diffusion‑driven policy to sequence actions for multimodal tasks &lt;a href="https://arxiv.org/abs/2604.14125" rel="noopener noreferrer"&gt;[5]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
In 3D, HY‑World 2.0 builds a four‑stage feed‑forward pipeline that turns multimodal inputs into high‑fidelity, navigable worlds, showing that Gaussian splatting can synthesize scenes in real time without iterative refinement &lt;a href="https://arxiv.org/abs/2604.14268" rel="noopener noreferrer"&gt;[6]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
These results show diffusion can match autoregressive quality while supporting multimodal generation and fast 3D synthesis.&lt;/p&gt;

&lt;h3&gt;
  
  
  Efficient LLM Post‑Training: Distillation and Memory Compression
&lt;/h3&gt;

&lt;p&gt;Distillation papers target the same goal from different angles.&lt;br&gt;&lt;br&gt;
TESSY generates style‑consistent synthetic data from a teacher model, restoring reasoning performance that usually degrades after naive fine‑tuning &lt;a href="https://arxiv.org/abs/2604.14164" rel="noopener noreferrer"&gt;[7]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
TIP selects the most important tokens for student training, cutting compute while preserving task accuracy &lt;a href="https://arxiv.org/abs/2604.14084" rel="noopener noreferrer"&gt;[8]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
On the memory side, KV‑Packet restructures key‑value caches into packets that reduce footprint, and IceCache adds a low‑latency, quantized cache layer; both lower memory use without measurable quality loss &lt;a href="https://arxiv.org/abs/2604.13226" rel="noopener noreferrer"&gt;[9]&lt;/a&gt;, &lt;a href="https://arxiv.org/abs/2604.10539" rel="noopener noreferrer"&gt;[10]&lt;/a&gt;.&lt;/p&gt;
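&lt;p&gt;Cache compression of this flavor reduces, at its simplest, to uniform quantization of cached activations. A minimal sketch (real systems such as IceCache add per‑block scales, grouping, and bit‑packing that are not shown here):&lt;/p&gt;

```python
def quantize(values, bits=8):
    """Map floats onto [0, 2^bits - 1] integers with a shared
    scale and offset."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0   # guard against constant input
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo

def dequantize(q, scale, lo):
    return [x * scale + lo for x in q]

kv_slice = [-1.5, 0.0, 0.25, 2.0]       # toy slice of a KV cache
q, scale, lo = quantize(kv_slice)
restored = dequantize(q, scale, lo)
```

&lt;p&gt;Eight‑bit storage quarters an fp32 cache's footprint at the cost of a rounding error bounded by half a quantization step.&lt;/p&gt;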

&lt;h3&gt;
  
  
  Mechanistic Safety Alignment via Circuit Editing
&lt;/h3&gt;

&lt;p&gt;Safety can be repaired by editing a few circuit motifs.&lt;br&gt;&lt;br&gt;
ASGuard locates attention heads that cause jailbreak failures and scales them down, restoring robust refusal behavior with negligible impact on general capability &lt;a href="https://arxiv.org/abs/2509.25843" rel="noopener noreferrer"&gt;[11]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
LASA identifies language‑agnostic semantic bottlenecks and edits them to improve safe refusal across tasks &lt;a href="https://arxiv.org/abs/2604.12710" rel="noopener noreferrer"&gt;[12]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Weight‑pruning experiments isolate a tiny set of parameters whose removal eliminates many jailbreak successes, confirming that misalignment often hinges on a small parameter subspace &lt;a href="https://arxiv.org/abs/2604.09544" rel="noopener noreferrer"&gt;[13]&lt;/a&gt;.&lt;/p&gt;
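&lt;p&gt;Mechanistically, the intervention is just a scalar on the culprit heads' outputs. A sketch with head outputs as plain vectors and the culprit set given up front (locating that set is the hard part ASGuard addresses; here it is simply assumed):&lt;/p&gt;

```python
def scale_heads(head_outputs, culprit_ids, alpha=0.1):
    """Down-scale the output vectors of the heads implicated in unsafe
    behavior; every other head passes through untouched."""
    return [
        [alpha * x for x in head] if i in culprit_ids else head
        for i, head in enumerate(head_outputs)
    ]

heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]     # toy per-head outputs
patched = scale_heads(heads, culprit_ids={1}, alpha=0.5)
```

&lt;p&gt;Because only the implicated heads are touched, the edit leaves the rest of the computation, and hence general capability, essentially intact.&lt;/p&gt;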

&lt;h3&gt;
  
  
  Skill‑Oriented Multi‑Agent LLM Architectures
&lt;/h3&gt;

&lt;p&gt;Modularity is becoming a design principle for agents.&lt;br&gt;&lt;br&gt;
SkVM compiles reusable skill definitions into a runtime library that agents can invoke on demand &lt;a href="https://arxiv.org/abs/2604.03088" rel="noopener noreferrer"&gt;[14]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Corpus2Skill converts raw corpora into hierarchical skill directories, turning unstructured data into plug‑and‑play capabilities &lt;a href="https://arxiv.org/abs/2604.14572" rel="noopener noreferrer"&gt;[15]&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
UI‑Copilot couples retrieval with on‑the‑fly calculation tools, enabling agents to code, automate GUIs, and perform multimodal search more reliably &lt;a href="https://arxiv.org/abs/2604.13822" rel="noopener noreferrer"&gt;[16]&lt;/a&gt;.&lt;/p&gt;
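&lt;p&gt;The modularity these systems share is, at bottom, a registry of callable skills. A minimal sketch (the class name and the two toy skills are illustrative, not drawn from any of the papers):&lt;/p&gt;

```python
class SkillLibrary:
    """Register skills once, invoke them by name on demand."""

    def __init__(self):
        self._skills = {}

    def register(self, name, fn):
        self._skills[name] = fn

    def invoke(self, name, *args):
        if name not in self._skills:
            raise KeyError(f"unknown skill: {name}")
        return self._skills[name](*args)

library = SkillLibrary()
library.register("word_count", lambda text: len(text.split()))
library.register("shout", lambda text: text.upper())
```

&lt;p&gt;An agent that resolves capabilities through such a registry can gain new behaviours by registration alone, with no retraining, which is the point the skill‑library papers make at much larger scale.&lt;/p&gt;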

&lt;h3&gt;
  
  
  Standout Papers in Context
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TRACER&lt;/strong&gt; demonstrates surrogate classifiers can gate LLM output with zero loss of coverage &lt;a href="https://arxiv.org/abs/2604.14531" rel="noopener noreferrer"&gt;[1]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangFlow&lt;/strong&gt; shows that continuous‑time diffusion can close the perplexity gap to autoregressive models &lt;a href="https://arxiv.org/abs/2604.11748" rel="noopener noreferrer"&gt;[4]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TESSY&lt;/strong&gt; shows style‑consistent synthetic data restores reasoning after distillation &lt;a href="https://arxiv.org/abs/2604.14164" rel="noopener noreferrer"&gt;[7]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ASGuard&lt;/strong&gt; shows that down‑scaling a handful of attention heads mitigates jailbreak vulnerabilities &lt;a href="https://arxiv.org/abs/2509.25843" rel="noopener noreferrer"&gt;[11]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HY‑World 2.0&lt;/strong&gt; validates feed‑forward 3D Gaussian splatting for real‑time world generation &lt;a href="https://arxiv.org/abs/2604.14268" rel="noopener noreferrer"&gt;[6]&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Additional Highlights
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The Robust Reasoning Benchmark records up to a 55 % drop in accuracy when systematic perturbations hit open‑weight models, underscoring the need for robustness‑focused evaluation &lt;a href="https://arxiv.org/abs/2604.08571" rel="noopener noreferrer"&gt;[3]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Introspective Diffusion Language Models attain autoregressive‑level scores while tripling inference throughput, thanks to strided decoding and system‑level optimizations &lt;a href="https://arxiv.org/abs/2604.11035" rel="noopener noreferrer"&gt;[17]&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;GlobalSplat reconstructs scenes with only 16 K Gaussians in under 100 ms, cutting memory use dramatically while keeping visual fidelity &lt;a href="https://arxiv.org/abs/2604.15284" rel="noopener noreferrer"&gt;[18]&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These developments collectively push evaluation, generation, efficiency, safety, and modularity forward, shaping a more reliable and adaptable AI ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.14531" rel="noopener noreferrer"&gt;TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.14113" rel="noopener noreferrer"&gt;UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.08571" rel="noopener noreferrer"&gt;Robust Reasoning Benchmark&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.11748" rel="noopener noreferrer"&gt;LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.14125" rel="noopener noreferrer"&gt;HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.14268" rel="noopener noreferrer"&gt;HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.14164" rel="noopener noreferrer"&gt;How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.14084" rel="noopener noreferrer"&gt;TIP: Token Importance in On-Policy Distillation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.13226" rel="noopener noreferrer"&gt;KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.10539" rel="noopener noreferrer"&gt;IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2509.25843" rel="noopener noreferrer"&gt;ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.12710" rel="noopener noreferrer"&gt;LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.09544" rel="noopener noreferrer"&gt;Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.03088" rel="noopener noreferrer"&gt;SkVM: Compiling Skills for Efficient Execution Everywhere&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.14572" rel="noopener noreferrer"&gt;Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.13822" rel="noopener noreferrer"&gt;UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.11035" rel="noopener noreferrer"&gt;Introspective Diffusion Language Models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2604.15284" rel="noopener noreferrer"&gt;GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>abotwrotethis</category>
    </item>
  </channel>
</rss>
