AI Series' Articles

Agent Learning via Early Experience
2 min read
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
2 min read
MemMamba: Rethinking Memory Patterns in State Space Model
2 min read
UniVideo: Unified Understanding, Generation, and Editing for Videos
2 min read
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning
2 min read
DreamOmni2: Multimodal Instruction-based Editing and Generation
2 min read
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
2 min read
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
2 min read
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
2 min read
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
3 min read
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
2 min read
Training-Free Group Relative Policy Optimization
2 min read
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
2 min read
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
2 min read
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
2 min read
DeepPrune: Parallel Scaling without Inter-trace Redundancy
2 min read
First Try Matters: Revisiting the Role of Reflection in Reasoning Models
2 min read
LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interaction
1 min read
UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution
1 min read
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
1 min read
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
1 min read
PickStyle: Video-to-Video Style Transfer with Context-Style Adapters
1 min read
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG
2 min read
InstructX: Towards Unified Visual Editing with MLLM Guidance
1 min read
LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling
1 min read
Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks
2 min read
Reinforcing Diffusion Models by Direct Group Preference Optimization
1 min read
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
2 min read
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as
1 min read
Memory Retrieval and Consolidation in Large Language Models through Function Tokens
1 min read
Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training
1 min read
GCPO: When Contrast Fails, Go Gold
1 min read
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
1 min read
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
1 min read
DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
2 min read
A^2Search: Ambiguity-Aware Question Answering with Reinforcement Learning
1 min read
Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs
2 min read
Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models
1 min read
R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation
1 min read
Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models
1 min read
Beyond Outliers: A Study of Optimizers Under Quantization
2 min read
SViM3D: Stable Video Material Diffusion for Single Image 3D Generation
1 min read
GyroSwin: 5D Surrogates for Gyrokinetic Plasma Turbulence Simulations
2 min read
Towards Scalable and Consistent 3D Editing
1 min read
Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning
1 min read
Fidelity-Aware Data Composition for Robust Robot Generalization
1 min read
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
1 min read
Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
1 min read
Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window
1 min read
OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment
1 min read
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
1 min read
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
1 min read
TAG:Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling
1 min read
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
1 min read
AutoPR: Let's Automate Your Academic Promotion!
1 min read
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
2 min read
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
1 min read
SpaceVista: All-Scale Visual Spatial Reasoning from mm to km
1 min read
StreamingVLM: Real-Time Understanding for Infinite Video Streams
1 min read
Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
1 min read
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
1 min read
KORMo: Korean Open Reasoning Model for Everyone
1 min read
DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
1 min read
Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
1 min read
Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
1 min read
StatEval: A Comprehensive Benchmark for Large Language Models in Statistics
1 min read
MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval
1 min read
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
1 min read
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
1 min read
Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
1 min read
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
2 min read
ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review
2 min read
Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition
1 min read
Parallel Test-Time Scaling for Latent Reasoning Models
1 min read
Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models
1 min read
A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks
2 min read
TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control
1 min read
Mitigating Overthinking through Reasoning Shaping
1 min read
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
1 min read
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
1 min read
Understanding DeepResearch via Reports
1 min read
One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework
2 min read
Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
1 min read
Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation
1 min read
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
2 min read
LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?
1 min read
ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall
1 min read
Formalizing Style in Personal Narratives
1 min read
LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology
1 min read
Temporal Prompting Matters: Rethinking Referring Video Object Segmentation
1 min read
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
2 min read
Instant4D: 4D Gaussian Splatting in Minutes
1 min read
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
1 min read
Diffusion Transformers with Representation Autoencoders
1 min read
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
2 min read
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
1 min read
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
1 min read
Spotlight on Token Perception for Multimodal Reinforcement Learning
1 min read
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
1 min read
DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
1 min read
Making Mathematical Reasoning Adaptive
1 min read
Demystifying Reinforcement Learning in Agentic Reasoning
1 min read
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
1 min read
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
1 min read
ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
1 min read
BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions
1 min read
FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
1 min read
DocReward: A Document Reward Model for Structuring and Stylizing
1 min read
Don't Just Fine-tune the Agent, Tune the Environment
1 min read
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
2 min read
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes
1 min read
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
1 min read
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
1 min read
CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
1 min read
On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models
1 min read
High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting
1 min read
Skill-Targeted Adaptive Training
1 min read
ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
2 min read
PEAR: Phase Entropy Aware Reward for Efficient Reasoning
1 min read
Self-Improving LLM Agents at Test-Time
1 min read
FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion Decoding
1 min read
The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs
1 min read
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
1 min read
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
2 min read
HUME: Measuring the Human-Model Performance Gap in Text Embedding Task
1 min read
SwarmSys: Decentralized Swarm-Inspired Agents for Scalable and Adaptive Reasoning
1 min read
From Data to Rewards: a Bilevel Optimization Perspective on Maximum Likelihood Estimation
1 min read
InfiniHuman: Infinite 3D Human Creation with Precise Control
1 min read
LLaMAX2: Your Translation-Enhanced Model also Performs Well in Reasoning
1 min read
World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge
1 min read
RePro: Training Language Models to Faithfully Recycle the Web for Pretraining
1 min read
Multimodal Policy Internalization for Conversational Agents
1 min read
Graph Diffusion Transformers are In-Context Molecular Designers
1 min read
VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing
2 min read
A Tale of LLMs and Induced Small Proxies: Scalable Agents for Knowledge Mining
1 min read
Are Large Reasoning Models Interruptible?
1 min read
IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
1 min read
AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model
1 min read
ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models
2 min read
The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution
1 min read
CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs
1 min read
The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections
1 min read
Through the Perspective of LiDAR: A Feature-Enriched and Uncertainty-Aware Annotation Pipeline for Terrestrial Point Cloud Segmen
1 min read
The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers
2 min read
MultiCOIN: Multi-Modal COntrollable Video INbetweening
1 min read
Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior
2 min read
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
1 min read
FlashWorld: High-quality 3D Scene Generation within Seconds
1 min read
CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving
1 min read
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
1 min read
Generative Universal Verifier as Multimodal Meta-Reasoner
1 min read
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
2 min read
Trace Anything: Representing Any Video in 4D via Trajectory Fields
2 min read
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
1 min read
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
1 min read
The Role of Computing Resources in Publishing Foundation Model Research
1 min read
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
2 min read
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark
1 min read
FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model
2 min read
PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
1 min read
Revisiting Model Interpolation for Efficient Reasoning
1 min read
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
1 min read
Direct Multi-Token Decoding
1 min read
NOSA: Native and Offloadable Sparse Attention
1 min read
CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
1 min read
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math
1 min read
MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training
2 min read
HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication
1 min read
GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search
1 min read
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
2 min read
Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs
2 min read
Universal Image Restoration Pre-training via Masked Degradation Classification
1 min read
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
1 min read
WithAnyone: Towards Controllable and ID Consistent Image Generation
1 min read
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
1 min read
Agentic Entropy-Balanced Policy Optimization
1 min read
AI for Service: Proactive Assistance with AI Glasses
1 min read
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
2 min read
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
1 min read
Attention Is All You Need for KV Cache in Diffusion LLMs
1 min read
BitNet Distillation
1 min read
TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar
Comments 1
2 min read
LLM-guided Hierarchical Retrieval
Comments 1
1 min read
Qwen3Guard Technical Report
1 min read
Large Language Models Do NOT Really Know What They Don't Know
1 min read
Learning an Image Editing Model without Image Editing Pairs
1 min read
VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
1 min read
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
1 min read
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
1 min read
Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report
1 min read
Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning
2 min read
MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems
1 min read
RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
1 min read
Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation
1 min read
Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts
1 min read
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
2 min read
ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints
1 min read
COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes
1 min read
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation
1 min read
Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures
1 min read
LLMs Can Get Brain Rot!
1 min read
LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild
1 min read
Agentic Design of Compositional Machines
1 min read
VLA-0: Building State-of-the-Art VLAs with Zero Modification
1 min read
SimKO: Simple Pass@K Policy Optimization
1 min read
LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
1 min read
DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation
2 min read
LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning
1 min read
Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models
1 min read
RealDPO: Real or Not Real, that is the Preference
1 min read
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models
1 min read
On Pretraining for Project-Level Code Completion
2 min read
Budget-aware Test-time Scaling via Discriminative Verification
1 min read
FML-bench: A Benchmark for Automatic ML Research Agents Highlighting the Importance of Exploration Breadth
1 min read
Predicting Task Performance with Context-aware Scaling Laws
1 min read
Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms
2 min read
AnyUp: Universal Feature Upsampling
1 min read
SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis
1 min read
GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning
1 min read
Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning
1 min read
RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems
1 min read
Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
1 min read
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
1 min read
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
1 min read
NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
1 min read
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
1 min read
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
1 min read
Latent Diffusion Model without Variational Autoencoder
1 min read
LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal
1 min read
MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning
1 min read
A^2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
1 min read
BLIP3o-NEXT: Next Frontier of Native Image Generation
1 min read
Language Models Model Language
1 min read
InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training
2 min read
Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation
2 min read
Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents
2 min read
Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition
2 min read
VISTA: A Test-Time Self-Improving Video Generation Agent
1 min read
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
1 min read
Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs
2 min read
Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation
Cover image for Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation

Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation

Comments
1 min read
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain
Cover image for FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain

FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain

Comments
1 min read
Robust Layerwise Scaling Rules by Proper Weight Decay Tuning
Cover image for Robust Layerwise Scaling Rules by Proper Weight Decay Tuning

Robust Layerwise Scaling Rules by Proper Weight Decay Tuning

Comments
1 min read
Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models
Cover image for Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models

Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models

Comments
2 min read
Paper2Web: Let's Make Your Paper Alive!
Cover image for Paper2Web: Let's Make Your Paper Alive!

Paper2Web: Let's Make Your Paper Alive!

Comments
1 min read
Train a Unified Multimodal Data Quality Classifier with Synthetic Data
Cover image for Train a Unified Multimodal Data Quality Classifier with Synthetic Data

Train a Unified Multimodal Data Quality Classifier with Synthetic Data

Comments
1 min read
PICABench: How Far Are We from Physically Realistic Image Editing?
Cover image for PICABench: How Far Are We from Physically Realistic Image Editing?

PICABench: How Far Are We from Physically Realistic Image Editing?

Comments
1 min read
DeepAnalyze: Agentic Large Language Models for Autonomous Data Science
Cover image for DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

Comments
1 min read
Glyph: Scaling Context Windows via Visual-Text Compression
Cover image for Glyph: Scaling Context Windows via Visual-Text Compression

Glyph: Scaling Context Windows via Visual-Text Compression

Comments
1 min read
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation
Cover image for Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation

Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation

Comments
1 min read
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
Cover image for When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

Comments
1 min read
FineVision: Open Data Is All You Need
Cover image for FineVision: Open Data Is All You Need

FineVision: Open Data Is All You Need

Comments
1 min read
QueST: Incentivizing LLMs to Generate Difficult Problems
Cover image for QueST: Incentivizing LLMs to Generate Difficult Problems

QueST: Incentivizing LLMs to Generate Difficult Problems

Comments
1 min read
Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling
Cover image for Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling

Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling

Comments
2 min read
RL makes MLLMs see better than SFT
Cover image for RL makes MLLMs see better than SFT

RL makes MLLMs see better than SFT

Comments
1 min read
Annotation-Efficient Universal Honesty Alignment
Cover image for Annotation-Efficient Universal Honesty Alignment

Annotation-Efficient Universal Honesty Alignment

Comments
1 min read
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
Cover image for Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback

Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback

Comments
1 min read
ConsistEdit: Highly Consistent and Precise Training-free Visual Editing
Cover image for ConsistEdit: Highly Consistent and Precise Training-free Visual Editing

ConsistEdit: Highly Consistent and Precise Training-free Visual Editing

Comments
1 min read
Executable Knowledge Graphs for Replicating AI Research
Cover image for Executable Knowledge Graphs for Replicating AI Research

Executable Knowledge Graphs for Replicating AI Research

Comments
1 min read
Deep Self-Evolving Reasoning
Cover image for Deep Self-Evolving Reasoning

Deep Self-Evolving Reasoning

Comments
1 min read
Chronos-2: From Univariate to Universal Forecasting
Cover image for Chronos-2: From Univariate to Universal Forecasting

Chronos-2: From Univariate to Universal Forecasting

Comments
1 min read
Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI
Cover image for Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI

Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI

Comments
1 min read
Constantly Improving Image Models Need Constantly Improving Benchmarks
Cover image for Constantly Improving Image Models Need Constantly Improving Benchmarks

Constantly Improving Image Models Need Constantly Improving Benchmarks

Comments
1 min read
Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics
Cover image for Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics

Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics

Comments
1 min read
UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action
Cover image for UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action

UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action

Comments
1 min read
Agentic Reinforcement Learning for Search is Unsafe
Cover image for Agentic Reinforcement Learning for Search is Unsafe

Agentic Reinforcement Learning for Search is Unsafe

Comments
1 min read
Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense
Cover image for Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense

Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense

Comments
1 min read
Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset
Cover image for Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset

Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset

Comments
1 min read
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
Cover image for Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering

Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering

Comments
1 min read
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
Cover image for Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains

Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains

Comments
1 min read
MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models
Cover image for MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models

MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models

Comments
1 min read
Balanced Multi-Task Attention for Satellite Image Classification: A Systematic Approach to Achieving 97.23% Accuracy on EuroSAT W
Cover image for Balanced Multi-Task Attention for Satellite Image Classification: A Systematic Approach to Achieving 97.23% Accuracy on EuroSAT W

Balanced Multi-Task Attention for Satellite Image Classification: A Systematic Approach to Achieving 97.23% Accuracy on EuroSAT W

Comments
1 min read
Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models
Cover image for Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models

Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models

Comments
1 min read
Automated Composition of Agents: A Knapsack Approach for Agentic Component Selection
Cover image for Automated Composition of Agents: A Knapsack Approach for Agentic Component Selection

Automated Composition of Agents: A Knapsack Approach for Agentic Component Selection

Comments
1 min read
AsyncVoice Agent: Real-Time Explanation for LLM Planning and Reasoning
Cover image for AsyncVoice Agent: Real-Time Explanation for LLM Planning and Reasoning

AsyncVoice Agent: Real-Time Explanation for LLM Planning and Reasoning

Comments
1 min read
On Non-interactive Evaluation of Animal Communication Translators
Cover image for On Non-interactive Evaluation of Animal Communication Translators

On Non-interactive Evaluation of Animal Communication Translators

Comments
1 min read
GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer
Cover image for GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

Comments
1 min read
Test-Time Scaling of Reasoning Models for Machine Translation
Cover image for Test-Time Scaling of Reasoning Models for Machine Translation

Test-Time Scaling of Reasoning Models for Machine Translation

Comments
1 min read
What Limits Agentic Systems Efficiency?
Cover image for What Limits Agentic Systems Efficiency?

What Limits Agentic Systems Efficiency?

Comments
1 min read
LightMem: Lightweight and Efficient Memory-Augmented Generation
Cover image for LightMem: Lightweight and Efficient Memory-Augmented Generation

LightMem: Lightweight and Efficient Memory-Augmented Generation

Comments
1 min read
World-in-World: World Models in a Closed-Loop World
Cover image for World-in-World: World Models in a Closed-Loop World

World-in-World: World Models in a Closed-Loop World

Comments
2 min read
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation
Cover image for UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation

UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation

Comments
1 min read
Chem-R: Learning to Reason as a Chemist
Cover image for Chem-R: Learning to Reason as a Chemist

Chem-R: Learning to Reason as a Chemist

Comments
1 min read
MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation
Cover image for MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation

MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation

Comments
1 min read
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
Cover image for Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Comments
1 min read
IF-VidCap: Can Video Caption Models Follow Instructions?
Cover image for IF-VidCap: Can Video Caption Models Follow Instructions?

IF-VidCap: Can Video Caption Models Follow Instructions?

Comments
2 min read
MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
Cover image for MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

Comments
2 min read
ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning
Cover image for ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning

ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning

Comments
1 min read
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
Cover image for ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder

ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder

Comments
1 min read
MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
Cover image for MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models

MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models

Comments
1 min read
DSI-Bench: A Benchmark for Dynamic Spatial Intelligence
Cover image for DSI-Bench: A Benchmark for Dynamic Spatial Intelligence

DSI-Bench: A Benchmark for Dynamic Spatial Intelligence

Comments
1 min read
UltraGen: High-Resolution Video Generation with Hierarchical Attention
Cover image for UltraGen: High-Resolution Video Generation with Hierarchical Attention

UltraGen: High-Resolution Video Generation with Hierarchical Attention

Comments
1 min read
Video Reasoning without Training
Cover image for Video Reasoning without Training

Video Reasoning without Training

Comments
1 min read
Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos
Cover image for Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos

Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos

Comments
2 min read
PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies
Cover image for PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies

PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies

Comments
1 min read
AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading
Cover image for AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading

AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading

Comments
1 min read
Extracting alignment data in open models
Cover image for Extracting alignment data in open models

Extracting alignment data in open models

Comments
1 min read
EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning
Cover image for EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning

EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning

Comments
1 min read
Efficient Long-context Language Model Training by Core Attention Disaggregation
Cover image for Efficient Long-context Language Model Training by Core Attention Disaggregation

Efficient Long-context Language Model Training by Core Attention Disaggregation

Comments
1 min read
GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver
Cover image for GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver

GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver

Comments
1 min read
Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution
Cover image for Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Comments
1 min read
DeepSeek-OCR: Contexts Optical Compression
Cover image for DeepSeek-OCR: Contexts Optical Compression

DeepSeek-OCR: Contexts Optical Compression

Comments
1 min read
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
Cover image for Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

Comments
1 min read
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
Cover image for Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth

Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth

Comments
1 min read
Expanding the Action Space of LLMs to Reason Beyond Language
Cover image for Expanding the Action Space of LLMs to Reason Beyond Language

Expanding the Action Space of LLMs to Reason Beyond Language

Comments
1 min read
Planned Diffusion
Cover image for Planned Diffusion

Planned Diffusion

Comments
1 min read
Unimedvl: Unifying Medical Multimodal Understanding And Generation Through Observation-Knowledge-Analysis
Cover image for Unimedvl: Unifying Medical Multimodal Understanding And Generation Through Observation-Knowledge-Analysis

Unimedvl: Unifying Medical Multimodal Understanding And Generation Through Observation-Knowledge-Analysis

Comments
1 min read
Predicting the Unpredictable: Reproducible BiLSTM Forecasting of Incident Counts in the Global Terrorism Database (GTD)
Cover image for Predicting the Unpredictable: Reproducible BiLSTM Forecasting of Incident Counts in the Global Terrorism Database (GTD)

Predicting the Unpredictable: Reproducible BiLSTM Forecasting of Incident Counts in the Global Terrorism Database (GTD)

Comments
1 min read
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulatio
Cover image for Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulatio

Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulatio

Comments
1 min read
PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold
Cover image for PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold

PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold

Comments
2 min read
Pruning Overparameterized Multi-Task Networks for Degraded Web Image Restoration
Cover image for Pruning Overparameterized Multi-Task Networks for Degraded Web Image Restoration

Pruning Overparameterized Multi-Task Networks for Degraded Web Image Restoration

Comments
1 min read
When Correct Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?
Cover image for When Correct Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?

When Correct Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?

Comments
1 min read
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
Cover image for LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts

LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts

Comments
1 min read
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
Cover image for Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

Comments
1 min read
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
Cover image for BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

Comments
1 min read
DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents
Cover image for DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents

DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents

Comments
1 min read
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
Cover image for GigaBrain-0: A World Model-Powered Vision-Language-Action Model

GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Comments
1 min read
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases
Cover image for ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases

ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases

Comments
1 min read
Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1
Cover image for Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1

Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1

Comments
1 min read
Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning
Cover image for Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning

Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning

Comments
2 min read
VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos
Cover image for VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos

VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos

Comments
1 min read
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
Cover image for Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

Comments
1 min read
Language Models are Injective and Hence Invertible
Cover image for Language Models are Injective and Hence Invertible

Language Models are Injective and Hence Invertible

Comments
1 min read
Attention Sinks in Diffusion Language Models
Cover image for Attention Sinks in Diffusion Language Models

Attention Sinks in Diffusion Language Models

Comments
1 min read
Unified Reinforcement and Imitation Learning for Vision-Language Models
Cover image for Unified Reinforcement and Imitation Learning for Vision-Language Models

Unified Reinforcement and Imitation Learning for Vision-Language Models

Comments
1 min read
olmOCR 2: Unit Test Rewards for Document OCR
Cover image for olmOCR 2: Unit Test Rewards for Document OCR

olmOCR 2: Unit Test Rewards for Document OCR

Comments
1 min read
Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation
Cover image for Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation

Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation

Comments
1 min read
FinSight: Towards Real-World Financial Deep Research
Cover image for FinSight: Towards Real-World Financial Deep Research

FinSight: Towards Real-World Financial Deep Research

Comments
1 min read
Directional Reasoning Injection for Fine-Tuning MLLMs
Cover image for Directional Reasoning Injection for Fine-Tuning MLLMs

Directional Reasoning Injection for Fine-Tuning MLLMs

Comments
1 min read
KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints
Cover image for KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints

KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints

Comments
1 min read
Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues
Cover image for Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues

Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues

Comments
1 min read
OmniNWM: Omniscient Driving Navigation World Models
Cover image for OmniNWM: Omniscient Driving Navigation World Models

OmniNWM: Omniscient Driving Navigation World Models

Comments
1 min read
ColorAgent: Building A Robust, Personalized, and Interactive OS Agent
Cover image for ColorAgent: Building A Robust, Personalized, and Interactive OS Agent

ColorAgent: Building A Robust, Personalized, and Interactive OS Agent

Comments
1 min read
TheMCPCompany: Creating General-purpose Agents with Task-specific Tools
Cover image for TheMCPCompany: Creating General-purpose Agents with Task-specific Tools

TheMCPCompany: Creating General-purpose Agents with Task-specific Tools

Comments
1 min read
NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning
Cover image for NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning

NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning

Comments
1 min read
From Charts to Code: A Hierarchical Benchmark for Multimodal Models
Cover image for From Charts to Code: A Hierarchical Benchmark for Multimodal Models

From Charts to Code: A Hierarchical Benchmark for Multimodal Models

Comments
1 min read
MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models
Cover image for MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models

MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models

Comments
2 min read
Steering Autoregressive Music Generation with Recursive Feature Machines
Cover image for Steering Autoregressive Music Generation with Recursive Feature Machines

Steering Autoregressive Music Generation with Recursive Feature Machines

Comments
1 min read
ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
Cover image for ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge

Comments
1 min read
Learning from the Best, Differently: A Diversity-Driven Rethinking on Data Selection
Cover image for Learning from the Best, Differently: A Diversity-Driven Rethinking on Data Selection

Learning from the Best, Differently: A Diversity-Driven Rethinking on Data Selection

Comments
1 min read
When Do Transformers Learn Heuristics for Graph Connectivity?
Cover image for When Do Transformers Learn Heuristics for Graph Connectivity?

When Do Transformers Learn Heuristics for Graph Connectivity?

Comments
2 min read
See the Text: From Tokenization to Visual Reading
Cover image for See the Text: From Tokenization to Visual Reading

See the Text: From Tokenization to Visual Reading

Comments
1 min read
RIR-Mega: a large-scale simulated room impulse response dataset for machine learning and room acoustics modeling
Cover image for RIR-Mega: a large-scale simulated room impulse response dataset for machine learning and room acoustics modeling

RIR-Mega: a large-scale simulated room impulse response dataset for machine learning and room acoustics modeling

Comments
1 min read
What Questions Should Robots Be Able to Answer? A Dataset of User Questions for Explainable Robotics
Cover image for What Questions Should Robots Be Able to Answer? A Dataset of User Questions for Explainable Robotics

What Questions Should Robots Be Able to Answer? A Dataset of User Questions for Explainable Robotics

Comments
2 min read
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
Cover image for DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models

DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models

Comments
2 min read
Machine Text Detectors are Membership Inference Attacks
Cover image for Machine Text Detectors are Membership Inference Attacks

Machine Text Detectors are Membership Inference Attacks

Comments
1 min read
SAVANT: Semantic Analysis with Vision-Augmented Anomaly deTection
Cover image for SAVANT: Semantic Analysis with Vision-Augmented Anomaly deTection

SAVANT: Semantic Analysis with Vision-Augmented Anomaly deTection

Comments
1 min read
Accelerating Vision Transformers with Adaptive Patch Sizes
Cover image for Accelerating Vision Transformers with Adaptive Patch Sizes

Accelerating Vision Transformers with Adaptive Patch Sizes

Comments
1 min read
Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs
Cover image for Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs

Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs

Comments
1 min read
DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking
Cover image for DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking

DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking

Comments
1 min read
HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application
Cover image for HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application

HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application

Comments
2 min read
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
Cover image for Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall

Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall

Comments
1 min read
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
Cover image for Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Comments
1 min read
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values
Cover image for Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Comments
1 min read
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
Cover image for HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

Comments
1 min read
LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas
Cover image for LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas

LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas

Comments
1 min read
AlphaFlow: Understanding and Improving MeanFlow Models
Cover image for AlphaFlow: Understanding and Improving MeanFlow Models

AlphaFlow: Understanding and Improving MeanFlow Models

Comments
1 min read
ARGenSeg: Image Segmentation with Autoregressive Image Generation Model
Cover image for ARGenSeg: Image Segmentation with Autoregressive Image Generation Model

ARGenSeg: Image Segmentation with Autoregressive Image Generation Model

Comments
1 min read
Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence
Cover image for Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence

Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence

Comments
1 min read
The Massive Legal Embedding Benchmark (MLEB)
Cover image for The Massive Legal Embedding Benchmark (MLEB)

The Massive Legal Embedding Benchmark (MLEB)

Comments
1 min read
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders
Cover image for AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

Comments
1 min read
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion
Cover image for DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

Comments
1 min read
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
Cover image for Search Self-play: Pushing the Frontier of Agent Capability without Supervision

Search Self-play: Pushing the Frontier of Agent Capability without Supervision

Comments
1 min read
Emergence of Linear Truth Encodings in Language Models
Cover image for Emergence of Linear Truth Encodings in Language Models

Emergence of Linear Truth Encodings in Language Models

Comments
1 min read
From Masks to Worlds: A Hitchhiker's Guide to World Models
Cover image for From Masks to Worlds: A Hitchhiker's Guide to World Models

From Masks to Worlds: A Hitchhiker's Guide to World Models

Comments
1 min read
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets
Cover image for Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets

Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets

Comments
1 min read
Thought Communication in Multiagent Collaboration
Cover image for Thought Communication in Multiagent Collaboration

Thought Communication in Multiagent Collaboration

Comments
1 min read
AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library
Cover image for AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

Comments
1 min read
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
Cover image for SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models

SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models

Comments
1 min read
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
Cover image for Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations

Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations

Comments
1 min read
Diff-XYZ: A Benchmark for Evaluating Diff Understanding
Cover image for Diff-XYZ: A Benchmark for Evaluating Diff Understanding

Diff-XYZ: A Benchmark for Evaluating Diff Understanding

Comments
1 min read
CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation
Cover image for CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation

CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation

Comments
1 min read
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
Cover image for Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs

Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs

Comments
1 min read
Communication to Completion: Modeling Collaborative Workflows with Intelligent Multi-Agent Communication
Cover image for Communication to Completion: Modeling Collaborative Workflows with Intelligent Multi-Agent Communication

Communication to Completion: Modeling Collaborative Workflows with Intelligent Multi-Agent Communication

Comments
1 min read
Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference
Cover image for Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference

Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference

Comments
1 min read
Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism
Cover image for Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism

Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism

Comments
1 min read
ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature
Cover image for ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature

ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature

Comments
1 min read
MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration
Cover image for MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration

MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration

Comments
1 min read
DeepAgent: A General Reasoning Agent with Scalable Toolsets
Cover image for DeepAgent: A General Reasoning Agent with Scalable Toolsets

DeepAgent: A General Reasoning Agent with Scalable Toolsets

Comments
1 min read
Video-As-Prompt: Unified Semantic Control for Video Generation
Cover image for Video-As-Prompt: Unified Semantic Control for Video Generation

Video-As-Prompt: Unified Semantic Control for Video Generation

Comments
2 min read
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
Cover image for UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning

UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning

Comments
1 min read
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
Cover image for Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Comments
1 min read
A Definition of AGI
Cover image for A Definition of AGI

A Definition of AGI

Comments
1 min read
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model
Cover image for From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

Comments
1 min read
Sparser Block-Sparse Attention via Token Permutation
Cover image for Sparser Block-Sparse Attention via Token Permutation

Sparser Block-Sparse Attention via Token Permutation

Comments
1 min read
RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
Cover image for RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging

RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging

Comments
1 min read
Reasoning with Sampling: Your Base Model is Smarter Than You Think
Cover image for Reasoning with Sampling: Your Base Model is Smarter Than You Think

Reasoning with Sampling: Your Base Model is Smarter Than You Think

Comments
1 min read
Model Merging with Functional Dual Anchors
Cover image for Model Merging with Functional Dual Anchors

Model Merging with Functional Dual Anchors

Comments
1 min read
Attention Is All You Need
Cover image for Attention Is All You Need

Attention Is All You Need

Comments
1 min read
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Cover image for RoBERTa: A Robustly Optimized BERT Pretraining Approach

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Comments
1 min read
YOLOv3: An Incremental Improvement
Cover image for YOLOv3: An Incremental Improvement

YOLOv3: An Incremental Improvement

Comments
1 min read
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Cover image for MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Comments
1 min read
Proximal Policy Optimization Algorithms
Cover image for Proximal Policy Optimization Algorithms

Proximal Policy Optimization Algorithms

Comments
1 min read
Distilling the Knowledge in a Neural Network
Cover image for Distilling the Knowledge in a Neural Network

Distilling the Knowledge in a Neural Network

Comments
1 min read
LLaMA: Open and Efficient Foundation Language Models
Cover image for LLaMA: Open and Efficient Foundation Language Models

LLaMA: Open and Efficient Foundation Language Models

Comments
1 min read
YOLOv4: Optimal Speed and Accuracy of Object Detection
Cover image for YOLOv4: Optimal Speed and Accuracy of Object Detection

YOLOv4: Optimal Speed and Accuracy of Object Detection

Comments
1 min read
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Cover image for Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Comments
1 min read
Playing Atari with Deep Reinforcement Learning
Cover image for Playing Atari with Deep Reinforcement Learning

Playing Atari with Deep Reinforcement Learning

Comments
1 min read
Representation Learning with Contrastive Predictive Coding
Cover image for Representation Learning with Contrastive Predictive Coding

Representation Learning with Contrastive Predictive Coding

Comments
1 min read
Layer Normalization
Cover image for Layer Normalization

Layer Normalization

Comments
1 min read
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Cover image for TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

Comments
1 min read
Community detection in graphs
Cover image for Community detection in graphs

Community detection in graphs

Comments
1 min read
Conditional Generative Adversarial Nets
Cover image for Conditional Generative Adversarial Nets

Conditional Generative Adversarial Nets

Comments
1 min read
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
Cover image for UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

Comments
1 min read
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Cover image for Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Comments
1 min read
Rethinking Atrous Convolution for Semantic Image Segmentation
Cover image for Rethinking Atrous Convolution for Semantic Image Segmentation

Rethinking Atrous Convolution for Semantic Image Segmentation

Comments
1 min read
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Cover image for DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Comments
1 min read
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
Cover image for SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

Comments
1 min read
Hierarchical Text-Conditional Image Generation with CLIP Latents
Cover image for Hierarchical Text-Conditional Image Generation with CLIP Latents

Hierarchical Text-Conditional Image Generation with CLIP Latents

Comments
1 min read
Improving neural networks by preventing co-adaptation of feature detectors
Cover image for Improving neural networks by preventing co-adaptation of feature detectors

Improving neural networks by preventing co-adaptation of feature detectors

Comments
1 min read
Evaluating Large Language Models Trained on Code
Cover image for Evaluating Large Language Models Trained on Code

Evaluating Large Language Models Trained on Code

Comments
1 min read
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Cover image for Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Comments
1 min read
ADADELTA: An Adaptive Learning Rate Method
Cover image for ADADELTA: An Adaptive Learning Rate Method

ADADELTA: An Adaptive Learning Rate Method

Comments
1 min read
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
Cover image for UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

Comments
1 min read
Training Verifiers to Solve Math Word Problems
Cover image for Training Verifiers to Solve Math Word Problems

Training Verifiers to Solve Math Word Problems

Comments
1 min read
Scaling Laws for Neural Language Models
Cover image for Scaling Laws for Neural Language Models

Scaling Laws for Neural Language Models

Comments
1 min read
Attention U-Net: Learning Where to Look for the Pancreas
Cover image for Attention U-Net: Learning Where to Look for the Pancreas

Attention U-Net: Learning Where to Look for the Pancreas

Comments
1 min read
ShapeNet: An Information-Rich 3D Model Repository
Cover image for ShapeNet: An Information-Rich 3D Model Repository

ShapeNet: An Information-Rich 3D Model Repository

Comments
1 min read
Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation
Cover image for Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation

Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation

Comments
1 min read
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Cover image for An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

Comments
1 min read
On the Opportunities and Risks of Foundation Models
Cover image for On the Opportunities and Risks of Foundation Models

On the Opportunities and Risks of Foundation Models

Comments
1 min read
OpenAI Gym
Cover image for OpenAI Gym

OpenAI Gym

Comments
1 min read
Variational Inference: A Review for Statisticians
Cover image for Variational Inference: A Review for Statisticians

Variational Inference: A Review for Statisticians

Comments
1 min read
A Quantitative Measure Of Fairness And Discrimination For Resource Allocation In Shared Computer Systems
Cover image for A Quantitative Measure Of Fairness And Discrimination For Resource Allocation In Shared Computer Systems

A Quantitative Measure Of Fairness And Discrimination For Resource Allocation In Shared Computer Systems

Comments
1 min read
YOLOX: Exceeding YOLO Series in 2021
Cover image for YOLOX: Exceeding YOLO Series in 2021

YOLOX: Exceeding YOLO Series in 2021

Comments
1 min read
Federated Learning: Strategies for Improving Communication Efficiency
Cover image for Federated Learning: Strategies for Improving Communication Efficiency

Federated Learning: Strategies for Improving Communication Efficiency

Comments
1 min read
Wasserstein GAN
Cover image for Wasserstein GAN

Wasserstein GAN

Comments
1 min read
Classifier-Free Diffusion Guidance
Cover image for Classifier-Free Diffusion Guidance

Classifier-Free Diffusion Guidance

Comments
1 min read
Fast Graph Representation Learning with PyTorch Geometric
Cover image for Fast Graph Representation Learning with PyTorch Geometric

Fast Graph Representation Learning with PyTorch Geometric

Comments
1 min read
CoSaMP: Iterative signal recovery from incomplete and inaccurate samples
Cover image for CoSaMP: Iterative signal recovery from incomplete and inaccurate samples

CoSaMP: Iterative signal recovery from incomplete and inaccurate samples

Comments
1 min read
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Cover image for DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Comments
1 min read
Longformer: The Long-Document Transformer
Cover image for Longformer: The Long-Document Transformer

Longformer: The Long-Document Transformer

Comments
1 min read
TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation
Cover image for TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

Comments
1 min read
End to End Learning for Self-Driving Cars
Cover image for End to End Learning for Self-Driving Cars

End to End Learning for Self-Driving Cars

Comments
1 min read
Bidirectional LSTM-CRF Models for Sequence Tagging
Cover image for Bidirectional LSTM-CRF Models for Sequence Tagging

Bidirectional LSTM-CRF Models for Sequence Tagging

Comments
1 min read
OPT: Open Pre-trained Transformer Language Models
Cover image for OPT: Open Pre-trained Transformer Language Models

OPT: Open Pre-trained Transformer Language Models

Comments
1 min read
Generating Sequences With Recurrent Neural Networks
Cover image for Generating Sequences With Recurrent Neural Networks

Generating Sequences With Recurrent Neural Networks

Comments
1 min read
The Kinetics Human Action Video Dataset
Cover image for The Kinetics Human Action Video Dataset

The Kinetics Human Action Video Dataset

Comments
1 min read
Improved Regularization of Convolutional Neural Networks with Cutout
Cover image for Improved Regularization of Convolutional Neural Networks with Cutout

Improved Regularization of Convolutional Neural Networks with Cutout

Comments
1 min read
Variational Graph Auto-Encoders
Cover image for Variational Graph Auto-Encoders

Variational Graph Auto-Encoders

Comments
1 min read
Instance Normalization: The Missing Ingredient for Fast Stylization
Cover image for Instance Normalization: The Missing Ingredient for Fast Stylization

Instance Normalization: The Missing Ingredient for Fast Stylization

Comments
1 min read
The information bottleneck method
Cover image for The information bottleneck method

The information bottleneck method

Comments
1 min read
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Cover image for Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Comments
1 min read
Improved Baselines with Momentum Contrastive Learning
Cover image for Improved Baselines with Momentum Contrastive Learning

Improved Baselines with Momentum Contrastive Learning

Comments
1 min read
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Cover image for Sparks of Artificial General Intelligence: Early experiments with GPT-4

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Comments
1 min read
A Survey of Large Language Models
Cover image for A Survey of Large Language Models

A Survey of Large Language Models

Comments
1 min read
Deep Learning using Rectified Linear Units (ReLU)
Cover image for Deep Learning using Rectified Linear Units (ReLU)

Deep Learning using Rectified Linear Units (ReLU)

Comments
1 min read
Objects as Points
Cover image for Objects as Points

Objects as Points

Comments
1 min read
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Cover image for Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Comments
1 min read
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
Cover image for Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

Comments
1 min read
In Defense of the Triplet Loss for Person Re-Identification
Cover image for In Defense of the Triplet Loss for Person Re-Identification

In Defense of the Triplet Loss for Person Re-Identification

Comments
1 min read
Relational inductive biases, deep learning, and graph networks
Cover image for Relational inductive biases, deep learning, and graph networks

Relational inductive biases, deep learning, and graph networks

Comments
1 min read
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Cover image for Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Comments
1 min read
MMDetection: Open MMLab Detection Toolbox and Benchmark
Cover image for MMDetection: Open MMLab Detection Toolbox and Benchmark

MMDetection: Open MMLab Detection Toolbox and Benchmark

Comments
1 min read
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Cover image for DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Comments
1 min read
Empirical Evaluation of Rectified Activations in Convolutional Network
Cover image for Empirical Evaluation of Rectified Activations in Convolutional Network

Empirical Evaluation of Rectified Activations in Convolutional Network

Comments
1 min read
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Cover image for Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

Comments
2 min read
An Overview of Multi-Task Learning in Deep Neural Networks
Cover image for An Overview of Multi-Task Learning in Deep Neural Networks

An Overview of Multi-Task Learning in Deep Neural Networks

Comments
1 min read
On discrete cosine transform
Cover image for On discrete cosine transform

On discrete cosine transform

Comments
1 min read
A Neural Algorithm of Artistic Style
Cover image for A Neural Algorithm of Artistic Style

A Neural Algorithm of Artistic Style

Comments
1 min read
The Effectiveness of Data Augmentation in Image Classification using Deep Learning
Cover image for The Effectiveness of Data Augmentation in Image Classification using Deep Learning

The Effectiveness of Data Augmentation in Image Classification using Deep Learning

Comments
1 min read
CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning
Cover image for CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning

CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning

Comments
1 min read
HuggingFace's Transformers: State-of-the-art Natural Language Processing
Cover image for HuggingFace's Transformers: State-of-the-art Natural Language Processing

HuggingFace's Transformers: State-of-the-art Natural Language Processing

Comments
1 min read
Recurrent Neural Network Regularization
Cover image for Recurrent Neural Network Regularization

Recurrent Neural Network Regularization

Comments
1 min read
Federated Learning with Non-IID Data
Cover image for Federated Learning with Non-IID Data

Federated Learning with Non-IID Data

Comments
1 min read
Mistral 7B
Cover image for Mistral 7B

Mistral 7B

Comments
1 min read
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Cover image for Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Comments
1 min read
Link Prediction in Complex Networks: A Survey
Cover image for Link Prediction in Complex Networks: A Survey

Link Prediction in Complex Networks: A Survey

Comments
1 min read
Soft Actor-Critic Algorithms and Applications
Cover image for Soft Actor-Critic Algorithms and Applications

Soft Actor-Critic Algorithms and Applications

Comments
1 min read
Microsoft COCO Captions: Data Collection and Evaluation Server
Cover image for Microsoft COCO Captions: Data Collection and Evaluation Server

Microsoft COCO Captions: Data Collection and Evaluation Server

Comments
1 min read
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Cover image for BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Comments
1 min read
Concrete Problems in AI Safety
Cover image for Concrete Problems in AI Safety

Concrete Problems in AI Safety

Comments
1 min read
Program Synthesis with Large Language Models
Cover image for Program Synthesis with Large Language Models

Program Synthesis with Large Language Models

Comments
1 min read
Progressive Neural Networks
Cover image for Progressive Neural Networks

Progressive Neural Networks

Comments
1 min read
A Tutorial on Principal Component Analysis
Cover image for A Tutorial on Principal Component Analysis

A Tutorial on Principal Component Analysis

Comments
1 min read
Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR
Cover image for Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR

Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR

Comments
1 min read
Code Llama: Open Foundation Models for Code
Cover image for Code Llama: Open Foundation Models for Code

Code Llama: Open Foundation Models for Code

Comments
1 min read
Fine-Grained Visual Classification of Aircraft
Cover image for Fine-Grained Visual Classification of Aircraft

Fine-Grained Visual Classification of Aircraft

Comments
1 min read
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Cover image for Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Comments
1 min read
Qwen2.5 Technical Report
Cover image for Qwen2.5 Technical Report

Qwen2.5 Technical Report

Comments
1 min read
Retrieval-Augmented Generation for Large Language Models: A Survey
Cover image for Retrieval-Augmented Generation for Large Language Models: A Survey

Retrieval-Augmented Generation for Large Language Models: A Survey

Comments
1 min read
A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinfo
Cover image for A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinfo

A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinfo

Comments
1 min read
YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications
Cover image for YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications

YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications

Comments
1 min read
A Critical Review of Recurrent Neural Networks for Sequence Learning
Cover image for A Critical Review of Recurrent Neural Networks for Sequence Learning

A Critical Review of Recurrent Neural Networks for Sequence Learning

Comments
1 min read
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
Cover image for LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

Comments
1 min read
Training Compute-Optimal Large Language Models
Cover image for Training Compute-Optimal Large Language Models

Training Compute-Optimal Large Language Models

Comments
1 min read
Invariant Risk Minimization
Cover image for Invariant Risk Minimization

Invariant Risk Minimization

Comments
1 min read
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Cover image for The Pile: An 800GB Dataset of Diverse Text for Language Modeling

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Comments
1 min read
Iterative Hard Thresholding for Compressed Sensing
Cover image for Iterative Hard Thresholding for Compressed Sensing

Iterative Hard Thresholding for Compressed Sensing

Comments
1 min read
Neural Turing Machines
Cover image for Neural Turing Machines

Neural Turing Machines

Comments
1 min read
Decoupled Weight Decay Regularization
Cover image for Decoupled Weight Decay Regularization

Decoupled Weight Decay Regularization

Comments
1 min read
On First-Order Meta-Learning Algorithms
Cover image for On First-Order Meta-Learning Algorithms

On First-Order Meta-Learning Algorithms

Comments
1 min read
SmoothGrad: removing noise by adding noise
Cover image for SmoothGrad: removing noise by adding noise

SmoothGrad: removing noise by adding noise

Comments
1 min read
Theano: A Python framework for fast computation of mathematical expressions
Cover image for Theano: A Python framework for fast computation of mathematical expressions

Theano: A Python framework for fast computation of mathematical expressions

Comments
1 min read
Adversarial Autoencoders
Cover image for Adversarial Autoencoders

Adversarial Autoencoders

Comments
1 min read
GPT-4o System Card
Cover image for GPT-4o System Card

GPT-4o System Card

Comments
1 min read
Deep Learning for Medical Image Analysis
Cover image for Deep Learning for Medical Image Analysis

Deep Learning for Medical Image Analysis

Comments
1 min read
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
Cover image for MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

Comments
1 min read
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Cover image for Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Comments
1 min read
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Cover image for Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Comments
1 min read
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
Cover image for ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

Comments
1 min read
Deep Speech: Scaling up end-to-end speech recognition
Cover image for Deep Speech: Scaling up end-to-end speech recognition

Deep Speech: Scaling up end-to-end speech recognition

Comments
1 min read
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
Cover image for DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

Comments
1 min read
Generating Long Sequences with Sparse Transformers
Cover image for Generating Long Sequences with Sparse Transformers

Generating Long Sequences with Sparse Transformers

Comments
1 min read
VisualBERT: A Simple and Performant Baseline for Vision and Language
Cover image for VisualBERT: A Simple and Performant Baseline for Vision and Language

VisualBERT: A Simple and Performant Baseline for Vision and Language

Comments
1 min read
Constitutional AI: Harmlessness from AI Feedback
Cover image for Constitutional AI: Harmlessness from AI Feedback

Constitutional AI: Harmlessness from AI Feedback

Comments
1 min read
Learning Face Representation from Scratch
Cover image for Learning Face Representation from Scratch

Learning Face Representation from Scratch

Comments
1 min read
Fine-Tuning Language Models from Human Preferences
Cover image for Fine-Tuning Language Models from Human Preferences

Fine-Tuning Language Models from Human Preferences

Comments
1 min read
Universal and Transferable Adversarial Attacks on Aligned Language Models
Cover image for Universal and Transferable Adversarial Attacks on Aligned Language Models

Universal and Transferable Adversarial Attacks on Aligned Language Models

Comments
1 min read
Qwen2.5-VL Technical Report
Cover image for Qwen2.5-VL Technical Report

Qwen2.5-VL Technical Report

Comments
1 min read
Federated Optimization: Distributed Machine Learning for On-Device Intelligence
Cover image for Federated Optimization: Distributed Machine Learning for On-Device Intelligence

Federated Optimization: Distributed Machine Learning for On-Device Intelligence

Comments
1 min read
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Cover image for Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Comments
1 min read
A Tutorial on Bayesian Optimization
Cover image for A Tutorial on Bayesian Optimization

A Tutorial on Bayesian Optimization

Comments
1 min read
Binarized Neural Networks
Cover image for Binarized Neural Networks

Binarized Neural Networks

Comments
1 min read
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
Cover image for Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

Comments
1 min read
Smart Radio Environments Empowered by Reconfigurable Intelligent Surfaces: How it Works, State of Research, and Road Ahead
Cover image for Smart Radio Environments Empowered by Reconfigurable Intelligent Surfaces: How it Works, State of Research, and Road Ahead

Smart Radio Environments Empowered by Reconfigurable Intelligent Surfaces: How it Works, State of Research, and Road Ahead

Comments
1 min read
DSSD : Deconvolutional Single Shot Detector
Cover image for DSSD : Deconvolutional Single Shot Detector

DSSD : Deconvolutional Single Shot Detector

Comments
1 min read
Weight Uncertainty in Neural Networks
Cover image for Weight Uncertainty in Neural Networks

Weight Uncertainty in Neural Networks

Comments
1 min read
Sequence Transduction with Recurrent Neural Networks
Cover image for Sequence Transduction with Recurrent Neural Networks

Sequence Transduction with Recurrent Neural Networks

Comments
1 min read
BERTopic: Neural topic modeling with a class-based TF-IDF procedure
Cover image for BERTopic: Neural topic modeling with a class-based TF-IDF procedure

BERTopic: Neural topic modeling with a class-based TF-IDF procedure

Comments
1 min read
Linformer: Self-Attention with Linear Complexity
Cover image for Linformer: Self-Attention with Linear Complexity

Linformer: Self-Attention with Linear Complexity

Comments
1 min read
Dota 2 with Large Scale Deep Reinforcement Learning
Cover image for Dota 2 with Large Scale Deep Reinforcement Learning

Dota 2 with Large Scale Deep Reinforcement Learning

Comments
1 min read
Artificial Intelligence: the global landscape of ethics guidelines
Cover image for Artificial Intelligence: the global landscape of ethics guidelines

Artificial Intelligence: the global landscape of ethics guidelines

Comments
1 min read
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
Cover image for BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Comments
1 min read
MOT16: A Benchmark for Multi-Object Tracking
Cover image for MOT16: A Benchmark for Multi-Object Tracking

MOT16: A Benchmark for Multi-Object Tracking

Comments
1 min read
IPFS - Content Addressed, Versioned, P2P File System
Cover image for IPFS - Content Addressed, Versioned, P2P File System

IPFS - Content Addressed, Versioned, P2P File System

Comments
1 min read
Community detection in networks: A user guide
Cover image for Community detection in networks: A user guide

Community detection in networks: A user guide

Comments
1 min read
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
Cover image for Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Comments
2 min read
Understanding Neural Networks Through Deep Visualization
Cover image for Understanding Neural Networks Through Deep Visualization

Understanding Neural Networks Through Deep Visualization

Comments
1 min read
cuDNN: Efficient Primitives for Deep Learning
Cover image for cuDNN: Efficient Primitives for Deep Learning

cuDNN: Efficient Primitives for Deep Learning

Comments
1 min read
Tutorial on Variational Autoencoders
Cover image for Tutorial on Variational Autoencoders

Tutorial on Variational Autoencoders

Comments
1 min read
AutoAugment: Learning Augmentation Policies from Data
Cover image for AutoAugment: Learning Augmentation Policies from Data

AutoAugment: Learning Augmentation Policies from Data

Comments
1 min read
Multitask Prompted Training Enables Zero-Shot Task Generalization
Cover image for Multitask Prompted Training Enables Zero-Shot Task Generalization

Multitask Prompted Training Enables Zero-Shot Task Generalization

Comments
1 min read
Open3D: A Modern Library for 3D Data Processing
Cover image for Open3D: A Modern Library for 3D Data Processing

Open3D: A Modern Library for 3D Data Processing

Comments
1 min read
Highway Networks
Cover image for Highway Networks

Highway Networks

Comments
1 min read
Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
Cover image for Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

Comments
1 min read
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Predi
Cover image for Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Predi

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Predi

Comments
2 min read
A Neural Conversational Model
Cover image for A Neural Conversational Model

A Neural Conversational Model

Comments
1 min read
NIPS 2016 Tutorial: Generative Adversarial Networks
Cover image for NIPS 2016 Tutorial: Generative Adversarial Networks

NIPS 2016 Tutorial: Generative Adversarial Networks

Comments
1 min read
Imagen Video: High Definition Video Generation with Diffusion Models
Cover image for Imagen Video: High Definition Video Generation with Diffusion Models

Imagen Video: High Definition Video Generation with Diffusion Models

Comments
1 min read
LaMDA: Language Models for Dialog Applications
Cover image for LaMDA: Language Models for Dialog Applications

LaMDA: Language Models for Dialog Applications

Comments
1 min read
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Cover image for Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Comments
1 min read
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Cover image for Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Comments
1 min read
Deep Reinforcement Learning: An Overview
Cover image for Deep Reinforcement Learning: An Overview

Deep Reinforcement Learning: An Overview

Comments
1 min read
Deep learning in remote sensing: a review
Cover image for Deep learning in remote sensing: a review

Deep learning in remote sensing: a review

Comments
1 min read
word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method
Cover image for word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method

word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method

Comments
1 min read
Federated Learning for Mobile Keyboard Prediction
Cover image for Federated Learning for Mobile Keyboard Prediction

Federated Learning for Mobile Keyboard Prediction

Comments
1 min read
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Cover image for LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

Comments
1 min read
Deep Convolutional Networks on Graph-Structured Data
Cover image for Deep Convolutional Networks on Graph-Structured Data

Deep Convolutional Networks on Graph-Structured Data

Comments
1 min read
Deep Learning for Anomaly Detection: A Survey
Cover image for Deep Learning for Anomaly Detection: A Survey

Deep Learning for Anomaly Detection: A Survey

Comments
1 min read
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Cover image for Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Comments
1 min read
Exploiting Similarities among Languages for Machine Translation
Cover image for Exploiting Similarities among Languages for Machine Translation

Exploiting Similarities among Languages for Machine Translation

Comments
1 min read
MediaPipe: A Framework for Building Perception Pipelines
Cover image for MediaPipe: A Framework for Building Perception Pipelines

MediaPipe: A Framework for Building Perception Pipelines

Comments
1 min read
A guide to convolution arithmetic for deep learning
Cover image for A guide to convolution arithmetic for deep learning

A guide to convolution arithmetic for deep learning

Comments
1 min read
Coase's Penguin, or Linux and the Nature of the Firm
Cover image for Coase's Penguin, or Linux and the Nature of the Firm

Coase's Penguin, or Linux and the Nature of the Firm

Comments
1 min read
CatBoost: gradient boosting with categorical features support
Cover image for CatBoost: gradient boosting with categorical features support

CatBoost: gradient boosting with categorical features support

Comments
1 min read
Consistent Individualized Feature Attribution for Tree Ensembles
Cover image for Consistent Individualized Feature Attribution for Tree Ensembles

Consistent Individualized Feature Attribution for Tree Ensembles

Comments
1 min read
LEAF: A Benchmark for Federated Settings
Cover image for LEAF: A Benchmark for Federated Settings

LEAF: A Benchmark for Federated Settings

Comments
1 min read
Qwen3 Technical Report
Cover image for Qwen3 Technical Report

Qwen3 Technical Report

Comments
1 min read
Pitfalls of Graph Neural Network Evaluation
Cover image for Pitfalls of Graph Neural Network Evaluation

Pitfalls of Graph Neural Network Evaluation

Comments
1 min read
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
Cover image for D4RL: Datasets for Deep Data-Driven Reinforcement Learning

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

Comments
1 min read
WebGPT: Browser-assisted question-answering with human feedback
Cover image for WebGPT: Browser-assisted question-answering with human feedback

WebGPT: Browser-assisted question-answering with human feedback

Comments
1 min read
Qwen2 Technical Report
Cover image for Qwen2 Technical Report

Qwen2 Technical Report

Comments
1 min read
Opening the Black Box of Deep Neural Networks via Information
Cover image for Opening the Black Box of Deep Neural Networks via Information

Opening the Black Box of Deep Neural Networks via Information

Comments
1 min read
No Language Left Behind: Scaling Human-Centered Machine Translation
Cover image for No Language Left Behind: Scaling Human-Centered Machine Translation

No Language Left Behind: Scaling Human-Centered Machine Translation

Comments
1 min read
The CMA Evolution Strategy: A Tutorial
Cover image for The CMA Evolution Strategy: A Tutorial

The CMA Evolution Strategy: A Tutorial

Comments
1 min read
MUSAN: A Music, Speech, and Noise Corpus
Cover image for MUSAN: A Music, Speech, and Noise Corpus

MUSAN: A Music, Speech, and Noise Corpus

Comments
1 min read
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Cover image for Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Comments
1 min read
Mixtral of Experts
Cover image for Mixtral of Experts

Mixtral of Experts

Comments
1 min read
Differentially Private Federated Learning: A Client Level Perspective
Cover image for Differentially Private Federated Learning: A Client Level Perspective

Differentially Private Federated Learning: A Client Level Perspective

Comments
1 min read
The Roadmap to 6G -- AI Empowered Wireless Networks
Cover image for The Roadmap to 6G -- AI Empowered Wireless Networks

The Roadmap to 6G -- AI Empowered Wireless Networks

Comments
1 min read
Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)
Cover image for Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)

Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)

Comments
1 min read
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Cover image for Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

Comments
1 min read
A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT
Cover image for A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

Comments
1 min read
Theano: new features and speed improvements
Cover image for Theano: new features and speed improvements

Theano: new features and speed improvements

Comments
1 min read
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
Cover image for Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

Comments
1 min read
Improved Simulation of Stabilizer Circuits
Cover image for Improved Simulation of Stabilizer Circuits

Improved Simulation of Stabilizer Circuits

Comments
1 min read
Gemma 2: Improving Open Language Models at a Practical Size
Cover image for Gemma 2: Improving Open Language Models at a Practical Size

Gemma 2: Improving Open Language Models at a Practical Size

Comments
1 min read
The 2017 DAVIS Challenge on Video Object Segmentation
Cover image for The 2017 DAVIS Challenge on Video Object Segmentation

The 2017 DAVIS Challenge on Video Object Segmentation

Comments
1 min read
Blockchain Technology Overview
Cover image for Blockchain Technology Overview

Blockchain Technology Overview

Comments
1 min read
Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
Cover image for Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

Comments
1 min read
DeepID3: Face Recognition with Very Deep Neural Networks
Cover image for DeepID3: Face Recognition with Very Deep Neural Networks

DeepID3: Face Recognition with Very Deep Neural Networks

Comments
1 min read
ERNIE: Enhanced Representation through Knowledge Integration
Cover image for ERNIE: Enhanced Representation through Knowledge Integration

ERNIE: Enhanced Representation through Knowledge Integration

Comments
1 min read
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Cover image for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Comments
2 min read
Resnet in Resnet: Generalizing Residual Architectures
Cover image for Resnet in Resnet: Generalizing Residual Architectures

Resnet in Resnet: Generalizing Residual Architectures

Comments
1 min read
Towards Accurate Generative Models of Video: A New Metric & Challenges
Cover image for Towards Accurate Generative Models of Video: A New Metric & Challenges

Towards Accurate Generative Models of Video: A New Metric & Challenges

Comments
1 min read
The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches
Cover image for The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches

The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches

Comments
1 min read
Joint 2D-3D-Semantic Data for Indoor Scene Understanding
Cover image for Joint 2D-3D-Semantic Data for Indoor Scene Understanding

Joint 2D-3D-Semantic Data for Indoor Scene Understanding

Comments
1 min read
An O(m) Algorithm for Cores Decomposition of Networks
Cover image for An O(m) Algorithm for Cores Decomposition of Networks

An O(m) Algorithm for Cores Decomposition of Networks

Comments
1 min read
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Cover image for eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

Comments
1 min read