DEV Community

AI Series' Articles

Agent Learning via Early Experience
2 min read
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
2 min read
MemMamba: Rethinking Memory Patterns in State Space Model
2 min read
UniVideo: Unified Understanding, Generation, and Editing for Videos
2 min read
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning
2 min read
DreamOmni2: Multimodal Instruction-based Editing and Generation
2 min read
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
2 min read
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
2 min read
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
2 min read
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
3 min read
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
2 min read
Training-Free Group Relative Policy Optimization
2 min read
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
2 min read
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
2 min read
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
2 min read
DeepPrune: Parallel Scaling without Inter-trace Redundancy
2 min read
First Try Matters: Revisiting the Role of Reflection in Reasoning Models
2 min read
LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interaction
1 min read
UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution
1 min read
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
1 min read
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
1 min read
PickStyle: Video-to-Video Style Transfer with Context-Style Adapters
1 min read
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG
2 min read
InstructX: Towards Unified Visual Editing with MLLM Guidance
1 min read
LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling
1 min read
Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks
2 min read
Reinforcing Diffusion Models by Direct Group Preference Optimization
1 min read
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
2 min read
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as
1 min read
Memory Retrieval and Consolidation in Large Language Models through Function Tokens
1 min read
Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training
1 min read
GCPO: When Contrast Fails, Go Gold
1 min read
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
1 min read
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
1 min read
DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
2 min read
A^2Search: Ambiguity-Aware Question Answering with Reinforcement Learning
1 min read
Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs
2 min read
Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models
1 min read
R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation
1 min read
Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models
1 min read
Beyond Outliers: A Study of Optimizers Under Quantization
2 min read
SViM3D: Stable Video Material Diffusion for Single Image 3D Generation
1 min read
GyroSwin: 5D Surrogates for Gyrokinetic Plasma Turbulence Simulations
2 min read
Towards Scalable and Consistent 3D Editing
1 min read
Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning
1 min read
Fidelity-Aware Data Composition for Robust Robot Generalization
1 min read
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
1 min read
Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
1 min read
Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window
1 min read
OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment
1 min read
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
1 min read
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
1 min read
TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling
1 min read
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
1 min read
AutoPR: Let's Automate Your Academic Promotion!
1 min read
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
2 min read
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
1 min read
SpaceVista: All-Scale Visual Spatial Reasoning from mm to km
1 min read
StreamingVLM: Real-Time Understanding for Infinite Video Streams
1 min read
Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
1 min read
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
1 min read
KORMo: Korean Open Reasoning Model for Everyone
1 min read
DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
1 min read
Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
1 min read
Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
1 min read
StatEval: A Comprehensive Benchmark for Large Language Models in Statistics
1 min read
MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval
1 min read
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
1 min read
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
1 min read
Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
1 min read
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
2 min read
ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review
2 min read
Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition
1 min read
Parallel Test-Time Scaling for Latent Reasoning Models
1 min read
Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models
1 min read
A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks
2 min read
TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control
1 min read
Mitigating Overthinking through Reasoning Shaping
1 min read
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
1 min read
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
1 min read
Understanding DeepResearch via Reports
1 min read
One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework
2 min read
Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
1 min read
Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation
1 min read
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
2 min read
LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?
1 min read
ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall
1 min read
Formalizing Style in Personal Narratives
1 min read
LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology
1 min read
Temporal Prompting Matters: Rethinking Referring Video Object Segmentation
1 min read
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
2 min read
Instant4D: 4D Gaussian Splatting in Minutes
1 min read
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
1 min read
Diffusion Transformers with Representation Autoencoders
1 min read
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
2 min read
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
1 min read
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
1 min read
Spotlight on Token Perception for Multimodal Reinforcement Learning
1 min read
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
1 min read
DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
1 min read
Making Mathematical Reasoning Adaptive
1 min read
Demystifying Reinforcement Learning in Agentic Reasoning
1 min read
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
1 min read
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
1 min read
ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
1 min read
BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions
1 min read
FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
1 min read
DocReward: A Document Reward Model for Structuring and Stylizing
1 min read
Don't Just Fine-tune the Agent, Tune the Environment
1 min read
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
2 min read
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes
1 min read
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
1 min read
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
1 min read
CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
1 min read
On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models
1 min read
High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting
1 min read
Skill-Targeted Adaptive Training
1 min read
ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
2 min read
PEAR: Phase Entropy Aware Reward for Efficient Reasoning
1 min read
Self-Improving LLM Agents at Test-Time
1 min read
FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion Decoding
Cover image for FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion Decoding

FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion Decoding

Comments
1 min read
The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs
Cover image for The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs

The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs

Comments
1 min read
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
Cover image for Stable Video Infinity: Infinite-Length Video Generation with Error Recycling

Stable Video Infinity: Infinite-Length Video Generation with Error Recycling

Comments
1 min read
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
Cover image for LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

Comments
2 min read
HUME: Measuring the Human-Model Performance Gap in Text Embedding Task
Cover image for HUME: Measuring the Human-Model Performance Gap in Text Embedding Task

HUME: Measuring the Human-Model Performance Gap in Text Embedding Task

Comments
1 min read
SwarmSys: Decentralized Swarm-Inspired Agents for Scalable and Adaptive Reasoning
Cover image for SwarmSys: Decentralized Swarm-Inspired Agents for Scalable and Adaptive Reasoning

SwarmSys: Decentralized Swarm-Inspired Agents for Scalable and Adaptive Reasoning

Comments
1 min read
From Data to Rewards: a Bilevel Optimization Perspective on Maximum Likelihood Estimation
Cover image for From Data to Rewards: a Bilevel Optimization Perspective on Maximum Likelihood Estimation

From Data to Rewards: a Bilevel Optimization Perspective on Maximum Likelihood Estimation

Comments
1 min read
InfiniHuman: Infinite 3D Human Creation with Precise Control
Cover image for InfiniHuman: Infinite 3D Human Creation with Precise Control

InfiniHuman: Infinite 3D Human Creation with Precise Control

Comments
1 min read
LLaMAX2: Your Translation-Enhanced Model also Performs Well in Reasoning
Cover image for LLaMAX2: Your Translation-Enhanced Model also Performs Well in Reasoning

LLaMAX2: Your Translation-Enhanced Model also Performs Well in Reasoning

Comments
1 min read
World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge
Cover image for World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge

World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge

Comments
1 min read
RePro: Training Language Models to Faithfully Recycle the Web for Pretraining
Cover image for RePro: Training Language Models to Faithfully Recycle the Web for Pretraining

RePro: Training Language Models to Faithfully Recycle the Web for Pretraining

Comments
1 min read
Multimodal Policy Internalization for Conversational Agents
Cover image for Multimodal Policy Internalization for Conversational Agents

Multimodal Policy Internalization for Conversational Agents

Comments
1 min read
Graph Diffusion Transformers are In-Context Molecular Designers
Cover image for Graph Diffusion Transformers are In-Context Molecular Designers

Graph Diffusion Transformers are In-Context Molecular Designers

Comments
1 min read
VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing
Cover image for VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing

VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing

Comments
2 min read
A Tale of LLMs and Induced Small Proxies: Scalable Agents for Knowledge Mining
Cover image for A Tale of LLMs and Induced Small Proxies: Scalable Agents for Knowledge Mining

A Tale of LLMs and Induced Small Proxies: Scalable Agents for Knowledge Mining

Comments
1 min read
Are Large Reasoning Models Interruptible?
Cover image for Are Large Reasoning Models Interruptible?

Are Large Reasoning Models Interruptible?

Comments
1 min read
IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
Cover image for IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment

IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment

Comments
1 min read
AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model
Cover image for AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model

AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model

Comments
1 min read
ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models
Cover image for ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models

ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models

Comments
2 min read
The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution
Cover image for The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution

The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution

Comments
1 min read
CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs
Cover image for CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs

CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs

Comments
1 min read
The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections
Cover image for The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections

Comments
1 min read
Through the Perspective of LiDAR: A Feature-Enriched and Uncertainty-Aware Annotation Pipeline for Terrestrial Point Cloud Segmen
Cover image for Through the Perspective of LiDAR: A Feature-Enriched and Uncertainty-Aware Annotation Pipeline for Terrestrial Point Cloud Segmen

Through the Perspective of LiDAR: A Feature-Enriched and Uncertainty-Aware Annotation Pipeline for Terrestrial Point Cloud Segmen

Comments
1 min read
The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers
Cover image for The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers

The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers

Comments
2 min read
MultiCOIN: Multi-Modal COntrollable Video INbetweening
Cover image for MultiCOIN: Multi-Modal COntrollable Video INbetweening

MultiCOIN: Multi-Modal COntrollable Video INbetweening

Comments
1 min read
Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior
Cover image for Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior

Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior

Comments
2 min read
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
Cover image for Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization

Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization

Comments
1 min read
FlashWorld: High-quality 3D Scene Generation within Seconds
Cover image for FlashWorld: High-quality 3D Scene Generation within Seconds

FlashWorld: High-quality 3D Scene Generation within Seconds

Comments
1 min read
CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving
Cover image for CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving

CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving

Comments
1 min read
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Cover image for InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

Comments
1 min read
Generative Universal Verifier as Multimodal Meta-Reasoner
Cover image for Generative Universal Verifier as Multimodal Meta-Reasoner

Generative Universal Verifier as Multimodal Meta-Reasoner

Comments
1 min read
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
Cover image for Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

Comments
2 min read
Trace Anything: Representing Any Video in 4D via Trajectory Fields
Cover image for Trace Anything: Representing Any Video in 4D via Trajectory Fields

Trace Anything: Representing Any Video in 4D via Trajectory Fields

Comments
2 min read
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
Cover image for ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs

ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs

Comments
1 min read
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Cover image for LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

Comments
1 min read
The Role of Computing Resources in Publishing Foundation Model Research
Cover image for The Role of Computing Resources in Publishing Foundation Model Research

The Role of Computing Resources in Publishing Foundation Model Research

Comments
1 min read
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
Cover image for UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

Comments
2 min read
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark
Cover image for Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark

Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark

Comments
1 min read
FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model
Cover image for FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model

FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model

Comments
2 min read
PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
Cover image for PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning

PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning

Comments
1 min read
Revisiting Model Interpolation for Efficient Reasoning
Cover image for Revisiting Model Interpolation for Efficient Reasoning

Revisiting Model Interpolation for Efficient Reasoning

Comments
1 min read
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Cover image for UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

Comments
1 min read
Direct Multi-Token Decoding
Cover image for Direct Multi-Token Decoding

Direct Multi-Token Decoding

Comments
1 min read
NOSA: Native and Offloadable Sparse Attention
Cover image for NOSA: Native and Offloadable Sparse Attention

NOSA: Native and Offloadable Sparse Attention

Comments
1 min read
CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
Cover image for CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

Comments
1 min read
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math
Cover image for Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math

Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math

Comments
1 min read
MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training
Cover image for MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training

MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training

Comments
2 min read
HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication
Cover image for HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication

HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication

Comments
1 min read
GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search
Cover image for GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search

GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search

Comments
1 min read
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
Cover image for InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

Comments
2 min read
Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs
Cover image for Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs

Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs

Comments
2 min read
Universal Image Restoration Pre-training via Masked Degradation Classification
Cover image for Universal Image Restoration Pre-training via Masked Degradation Classification

Universal Image Restoration Pre-training via Masked Degradation Classification

Comments
1 min read
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
Cover image for X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model

X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model

Comments
1 min read
WithAnyone: Towards Controllable and ID Consistent Image Generation
Cover image for WithAnyone: Towards Controllable and ID Consistent Image Generation

WithAnyone: Towards Controllable and ID Consistent Image Generation

Comments
1 min read
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Cover image for From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Comments
1 min read
Agentic Entropy-Balanced Policy Optimization
Cover image for Agentic Entropy-Balanced Policy Optimization

Agentic Entropy-Balanced Policy Optimization

Comments
1 min read
AI for Service: Proactive Assistance with AI Glasses
Cover image for AI for Service: Proactive Assistance with AI Glasses

AI for Service: Proactive Assistance with AI Glasses

Comments
1 min read
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
Cover image for Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents

Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents

Comments
2 min read
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Cover image for PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Comments
1 min read
Attention Is All You Need for KV Cache in Diffusion LLMs
Cover image for Attention Is All You Need for KV Cache in Diffusion LLMs

Attention Is All You Need for KV Cache in Diffusion LLMs

Comments
1 min read
BitNet Distillation
Cover image for BitNet Distillation

BitNet Distillation

Comments
1 min read
TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar
Cover image for TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar

TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar

Comments 1
2 min read
LLM-guided Hierarchical Retrieval
Cover image for LLM-guided Hierarchical Retrieval

LLM-guided Hierarchical Retrieval

Comments 1
1 min read
Qwen3Guard Technical Report
Cover image for Qwen3Guard Technical Report

Qwen3Guard Technical Report

Comments
1 min read
Large Language Models Do NOT Really Know What They Don't Know
Cover image for Large Language Models Do NOT Really Know What They Don't Know

Large Language Models Do NOT Really Know What They Don't Know

Comments
1 min read
Learning an Image Editing Model without Image Editing Pairs
Cover image for Learning an Image Editing Model without Image Editing Pairs

Learning an Image Editing Model without Image Editing Pairs

Comments
1 min read
VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
Cover image for VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator

VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator

Comments
1 min read
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
Cover image for pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

Comments
1 min read
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
Cover image for MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning

MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning

Comments
1 min read
Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report
Cover image for Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report

Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report

Comments
1 min read
Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning
Cover image for Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning

Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning

Comments
2 min read
MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems
Cover image for MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems

MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems

Comments
1 min read
RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
Cover image for RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models

RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models

Comments
1 min read
Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation
Cover image for Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation

Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation

Comments
1 min read
Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts
Cover image for Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts

Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts

Comments
1 min read
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
Cover image for When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA

When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA

Comments
2 min read
ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints
Cover image for ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints

ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints

Comments
1 min read
COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes
Cover image for COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes

COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes

Comments
1 min read
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation
Cover image for VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation

VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation

Comments
1 min read
Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures
Cover image for Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures

Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures

Comments
1 min read
LLMs Can Get Brain Rot!
Cover image for LLMs Can Get Brain Rot!

LLMs Can Get Brain Rot!

Comments
1 min read
LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild
Cover image for LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild

LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild

Comments
1 min read
Agentic Design of Compositional Machines
Cover image for Agentic Design of Compositional Machines

Agentic Design of Compositional Machines

Comments
1 min read
VLA-0: Building State-of-the-Art VLAs with Zero Modification
Cover image for VLA-0: Building State-of-the-Art VLAs with Zero Modification

VLA-0: Building State-of-the-Art VLAs with Zero Modification

Comments
1 min read
SimKO: Simple Pass@K Policy Optimization
Cover image for SimKO: Simple Pass@K Policy Optimization

SimKO: Simple Pass@K Policy Optimization

Comments
1 min read
LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
Cover image for LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training

LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training

Comments
1 min read
DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation
Cover image for DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation

DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation

Comments
2 min read
LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning
Cover image for LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning

LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning

Comments
1 min read
Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models
Cover image for Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models

Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models

Comments
1 min read
RealDPO: Real or Not Real, that is the Preference
Cover image for RealDPO: Real or Not Real, that is the Preference

RealDPO: Real or Not Real, that is the Preference

Comments
1 min read
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models
Cover image for The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

Comments
1 min read
On Pretraining for Project-Level Code Completion
Cover image for On Pretraining for Project-Level Code Completion

On Pretraining for Project-Level Code Completion

Comments
2 min read
Budget-aware Test-time Scaling via Discriminative Verification
Cover image for Budget-aware Test-time Scaling via Discriminative Verification

Budget-aware Test-time Scaling via Discriminative Verification

Comments
1 min read
FML-bench: A Benchmark for Automatic ML Research Agents Highlighting the Importance of Exploration Breadth
Cover image for FML-bench: A Benchmark for Automatic ML Research Agents Highlighting the Importance of Exploration Breadth

FML-bench: A Benchmark for Automatic ML Research Agents Highlighting the Importance of Exploration Breadth

Comments
1 min read
Predicting Task Performance with Context-aware Scaling Laws
Cover image for Predicting Task Performance with Context-aware Scaling Laws

Predicting Task Performance with Context-aware Scaling Laws

Comments
1 min read
Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms
Cover image for Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms

Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms

Comments
2 min read
AnyUp: Universal Feature Upsampling
Cover image for AnyUp: Universal Feature Upsampling

AnyUp: Universal Feature Upsampling

Comments
1 min read
SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis
Cover image for SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis

SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis

Comments
1 min read
GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning
Cover image for GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning

GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning

Comments
1 min read
Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning
Cover image for Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning

Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning

Comments
1 min read
RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems
Cover image for RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems

RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems

Comments
1 min read
Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
Cover image for Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference

Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference

Comments
1 min read
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
Cover image for LaSeR: Reinforcement Learning with Last-Token Self-Rewarding

LaSeR: Reinforcement Learning with Last-Token Self-Rewarding

Comments
1 min read
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Cover image for OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Comments
1 min read
NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
Cover image for NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks

NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks

Comments
1 min read
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Cover image for Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset

Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset

Comments
1 min read
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
Cover image for Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

Comments
1 min read
Latent Diffusion Model without Variational Autoencoder
Cover image for Latent Diffusion Model without Variational Autoencoder

Latent Diffusion Model without Variational Autoencoder

Comments
1 min read
LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal
Cover image for LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal

LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal

Comments
1 min read
MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning
Cover image for MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning

MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning

Comments
1 min read
A^2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
Cover image for A^2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning

A^2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning

Comments
1 min read
BLIP3o-NEXT: Next Frontier of Native Image Generation
Cover image for BLIP3o-NEXT: Next Frontier of Native Image Generation

BLIP3o-NEXT: Next Frontier of Native Image Generation

Comments
1 min read
Language Models Model Language
Cover image for Language Models Model Language

Language Models Model Language

Comments
1 min read
InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training
Cover image for InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training

InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training

Comments
2 min read
Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation
Cover image for Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation

Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation

Comments
2 min read
Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents
Cover image for Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents

Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents

Comments
2 min read
Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition
Cover image for Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition

Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition

Comments
2 min read
VISTA: A Test-Time Self-Improving Video Generation Agent
Cover image for VISTA: A Test-Time Self-Improving Video Generation Agent

VISTA: A Test-Time Self-Improving Video Generation Agent

Comments
1 min read
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
Cover image for DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

Comments
1 min read
Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs
Cover image for Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs

Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs

Comments
2 min read
Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation
1 min read
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain
1 min read
Robust Layerwise Scaling Rules by Proper Weight Decay Tuning
1 min read
Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models
2 min read
Paper2Web: Let's Make Your Paper Alive!
1 min read
Train a Unified Multimodal Data Quality Classifier with Synthetic Data
1 min read
PICABench: How Far Are We from Physically Realistic Image Editing?
1 min read
DeepAnalyze: Agentic Large Language Models for Autonomous Data Science
1 min read
Glyph: Scaling Context Windows via Visual-Text Compression
1 min read
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation
1 min read
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
1 min read
FineVision: Open Data Is All You Need
1 min read
QueST: Incentivizing LLMs to Generate Difficult Problems
1 min read
Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling
2 min read