Breaking New Ground in AI Research Capabilities with Up to 600 Tool Calls Per Task
Published by MiroMind Team | November 2025
Introduction: The Dawn of Autonomous Research Intelligence
The landscape of artificial intelligence is witnessing a profound transformation. We're moving beyond static text generation toward dynamic, tool-augmented agents capable of conducting sophisticated research autonomously. The ability to formulate hypotheses, retrieve and verify evidence, and synthesize insights across diverse information sources represents a new frontier in AI capability—one that demands more than just linguistic fluency.
Proprietary systems like ChatGPT Agent and Claude Research have demonstrated near-human proficiency in literature review, comparative analysis, and reasoning-driven knowledge discovery. However, these systems remain closed, constraining transparency, reproducibility, and community-driven innovation. The open-source community has struggled to match their performance, facing limitations in model scale, context length, and interaction depth.
"MiroThinker v1.0 introduces a third dimension of scaling—interactive scaling—that enables sustained multi-turn reasoning through up to 600 tool calls per task within a 256K context window."
Enter MiroThinker v1.0, a groundbreaking open-source research agent that fundamentally reimagines how we approach AI research capabilities. Unlike previous approaches that focused solely on scaling model size or context length, MiroThinker explores interaction scaling as a third critical dimension of performance improvement.
Key Innovation: The Three Dimensions of Agent Scaling
MiroThinker v1.0 represents a paradigm shift by systematically addressing three complementary scaling dimensions:
1. Model Size Scaling
Built on the robust Qwen2.5 and Qwen3 foundations, MiroThinker is available in three variants to accommodate diverse computational budgets:
- 8B variant: Optimized for efficiency while maintaining strong performance
- 30B variant: Balanced performance-to-compute ratio for most applications
- 72B variant: State-of-the-art performance approaching commercial systems
2. Context Length Scaling
With a 256K context window, MiroThinker can maintain extensive conversation histories, complex reasoning chains, and comprehensive tool interaction records. This extended context enables the model to synthesize information across multiple documents and maintain coherent long-horizon planning.
3. Interactive Scaling (The Breakthrough)
The most revolutionary aspect of MiroThinker is its systematic training for deeper and more frequent agent-environment interactions. Unlike traditional LLM test-time scaling that operates in isolation and risks degradation with longer reasoning chains, interactive scaling leverages environment feedback and external information acquisition to correct errors and refine trajectories.
Performance Breakthrough: MiroThinker-72B achieves up to 81.9% on GAIA, 37.7% on Humanity's Last Exam, 47.1% on BrowseComp, and 55.6% on BrowseComp-ZH, surpassing previous open-source agents and approaching GPT-5-high performance.
Technical Deep Dive
ReAct Workflow Architecture
MiroThinker operates under the ReAct (Reasoning and Acting) paradigm, implementing a sophisticated iterative loop of reasoning, tool invocation, and observation. The model maintains a trajectory history and alternates between generating internal thoughts and executing structured tool calls until task completion.
The core workflow follows this pattern:
- Think: Generate internal reasoning about the current state and next action
- Act: Execute a structured tool invocation based on the reasoning
- Observe: Process the tool response and update internal understanding
- Repeat: Continue until the task is resolved or termination criteria are met
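The loop above can be sketched in a few lines of Python. This is an illustrative stand-in rather than MiroThinker's actual implementation: `llm_generate` and `execute_tool` are hypothetical helpers for the model call and the tool runtime, and the message schema is an assumption.

```python
# Minimal ReAct-style driver loop (illustrative sketch, not MiroThinker's actual code).
# llm_generate() and execute_tool() are hypothetical stand-ins, left as stubs so the
# control flow stays readable.

def llm_generate(trajectory):
    """Stand-in for the policy model: returns an object with
    .text, .final_answer, .tool_name, and .tool_args."""
    raise NotImplementedError

def execute_tool(name, args):
    """Stand-in for the tool runtime (search, sandbox, file transfer, ...)."""
    raise NotImplementedError

def run_react(task: str, max_steps: int = 600) -> str:
    trajectory = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # Think + Act: the model emits reasoning and, optionally, a structured tool call.
        step = llm_generate(trajectory)
        trajectory.append({"role": "assistant", "content": step.text})

        if step.final_answer is not None:  # termination criterion reached
            return step.final_answer

        # Observe: run the requested tool and append the observation to the history.
        observation = execute_tool(step.tool_name, step.tool_args)
        trajectory.append({"role": "tool", "content": observation})

    return "Maximum interaction depth reached without a final answer."
```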
Comprehensive Tool Interface
Execution Environment
MiroThinker employs a Linux sandbox that provides isolated runtime for command and code execution. The agent can create sandbox instances and execute both shell commands and Python code within secure, controlled environments. This design ensures safe interaction with system-level resources while maintaining flexibility.
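The snippet below illustrates the call pattern such an execution tool exposes to the agent. It uses local subprocesses purely as a stand-in; the real system runs commands and Python code inside an isolated Linux sandbox, whose actual API is not shown here.

```python
# Conceptual sketch: running shell commands and Python snippets in separate processes.
# This is a local stand-in for the sandbox call pattern, not the isolation mechanism itself.

import subprocess

def run_shell(cmd: str, timeout: int = 30) -> str:
    """Execute a shell command and capture its output (stand-in for a sandbox call)."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)
    return result.stdout + result.stderr

def run_python(code: str, timeout: int = 30) -> str:
    """Execute a Python snippet in a separate interpreter process."""
    result = subprocess.run(["python3", "-c", code], capture_output=True, text=True, timeout=timeout)
    return result.stdout + result.stderr

print(run_shell("echo hello from the sandbox stand-in"))
print(run_python("print(sum(range(10)))"))
```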
File Management
The system implements bidirectional file transfer capabilities, supporting:
- Upload from local systems to sandbox environments
- Download from sandbox to local storage
- Direct retrieval of remote assets from URLs
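A rough sketch of these three operations from the agent's side follows. The URL retrieval uses only the Python standard library, while `sandbox_upload` and `sandbox_download` are hypothetical names standing in for MiroThinker's actual transfer tools.

```python
# Illustrative file-transfer helpers. sandbox_upload/sandbox_download are hypothetical
# placeholders for the real transfer tools; fetch_url uses only the standard library.

import urllib.request

def fetch_url(url: str, dest: str) -> str:
    """Retrieve a remote asset directly into local storage."""
    urllib.request.urlretrieve(url, dest)
    return dest

# Hypothetical call pattern for the two sandbox-transfer directions:
# sandbox_upload(local_path="data.csv", sandbox_path="/workspace/data.csv")
# sandbox_download(sandbox_path="/workspace/results.json", local_path="results.json")
```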
Information Retrieval
Two sophisticated retrieval tools power MiroThinker's research capabilities:
- Google Search Integration: Returns structured search results for broad information gathering
- Intelligent Web Scraping: Uses a lightweight LLM (Qwen3-14B) to extract task-relevant information from target URLs, serving as an efficient context management mechanism
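The sketch below shows how these two tools might fit together. The interfaces are assumptions for illustration: `google_search` and `call_llm` are stubs, and only the LLM-filtered scraping pattern reflects the design described above.

```python
# Sketch of the two retrieval tools; the exact interfaces are assumptions.

import urllib.request

def google_search(query: str, num_results: int = 10) -> list[dict]:
    """Hypothetical wrapper returning structured results (title, url, snippet)."""
    raise NotImplementedError("stand-in for the actual search tool")

def call_llm(model: str, prompt: str) -> str:
    """Hypothetical helper: send a prompt to a lightweight extraction model (e.g. Qwen3-14B)."""
    raise NotImplementedError("stand-in for the actual LLM client")

def scrape_relevant(url: str, task: str) -> str:
    """Fetch a page and let a lightweight LLM keep only task-relevant content,
    so the main agent's context is not flooded with raw HTML."""
    html = urllib.request.urlopen(url, timeout=30).read().decode("utf-8", errors="ignore")
    prompt = f"Task: {task}\n\nExtract only the passages relevant to the task:\n{html[:50000]}"
    return call_llm(model="Qwen3-14B", prompt=prompt)
```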
Advanced Context Management
To maximize the efficiency of the 256K context window and enable up to 600 tool calls per task, MiroThinker implements two key strategies:
Recency-Based Context Retention
Rather than retaining all tool outputs (which would quickly overwhelm the context), the system preserves only the most recent tool responses while maintaining the complete sequence of thoughts and actions. This approach leverages the empirical observation that subsequent actions depend primarily on recent observations rather than distant ones.
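A minimal sketch of this retention policy follows, assuming a simple role-tagged message list and an illustrative `keep_last` budget (the real cutoff is not specified here).

```python
# Recency-based retention sketch: keep every thought/action, but collapse all
# but the most recent tool observations to a short marker.

def apply_recency_retention(messages: list[dict], keep_last: int = 5) -> list[dict]:
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = set(tool_indices[:-keep_last]) if len(tool_indices) > keep_last else set()
    retained = []
    for i, m in enumerate(messages):
        if i in stale:
            # Older observation: drop the body, keep a marker so the action sequence stays intact.
            retained.append({"role": "tool", "content": "[observation omitted to save context]"})
        else:
            retained.append(m)
    return retained
```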
Result Truncation
Long outputs from code execution and command tools are automatically truncated with clear indicators, preventing context overflow while preserving essential information.
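A truncation helper along these lines might look as follows; the 4,000-character budget is an illustrative assumption, not a documented setting.

```python
# Minimal truncation helper: clip long tool outputs and mark the cut clearly.

def truncate_output(text: str, max_chars: int = 4000) -> str:
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"\n... [truncated: {omitted} characters omitted]"
```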
"The recency-based retention strategy preserves reasoning traces while focusing attention on contextually relevant observations, freeing additional context for extended reasoning and deeper tool-use trajectories."
Data Construction: Building the MiroVerse Dataset
MultiDocQA Synthesis
The team developed a sophisticated pipeline that transforms interlinked web documents into complex, multi-hop QA pairs:
- Document Corpus Construction: Diverse sources including Wikipedia and Common Crawl with preserved hyperlink structures
- Knowledge Graph Creation: Connected subgraphs of related documents following internal hyperlinks
- Fact Extraction: Key statements requiring cross-document reasoning
- Constraint Obfuscation: Systematic transformation of facts into indirect constraints requiring deeper reasoning
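The knowledge-graph step can be illustrated with a toy sketch using `networkx`. The corpus format, the `max_size` bound, and the component criterion are assumptions for illustration; the fact-extraction and constraint-obfuscation steps are LLM-driven in the paper and are only noted as downstream stages here.

```python
# Toy sketch of the knowledge-graph step: grouping hyperlinked documents into
# connected subgraphs that can seed multi-hop questions.

import networkx as nx

def build_subgraphs(docs: dict[str, list[str]], max_size: int = 6):
    """docs maps a document ID to the IDs it hyperlinks to (corpus-internal links only)."""
    graph = nx.DiGraph()
    for doc_id, links in docs.items():
        for target in links:
            if target in docs:  # keep only links that stay inside the corpus
                graph.add_edge(doc_id, target)
    # Each weakly connected component is a candidate cluster for cross-document QA synthesis.
    for component in nx.weakly_connected_components(graph):
        if 2 <= len(component) <= max_size:
            yield component
```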
Agentic Trajectory Synthesis
High-quality trajectory data is generated through multiple complementary approaches:
- Agent Paradigms: Both ReAct single-agent and MiroFlow multi-agent frameworks
- Tool Invocation Methods: Traditional function calling and flexible Model Context Protocol (MCP)
- Diverse Model Integration: Multiple leading LLMs including GPT-OSS and DeepSeek-V3.1
Three-Stage Training Pipeline
Stage 1: Agentic Supervised Fine-Tuning (SFT)
The foundation stage establishes fundamental agentic behaviors through imitation learning on expert trajectories. The model learns to mimic complex multi-hop reasoning and tool use patterns, with rigorous filtering to ensure trajectory quality and consistency.
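A common way to implement this kind of imitation learning is to compute the cross-entropy loss only on the model's own tokens (thoughts, tool calls, final answers) while masking tool observations, so they provide context without being supervised. Whether MiroThinker uses exactly this masking is not stated here, so the sketch below is a generic recipe rather than the team's implementation.

```python
# Generic agentic-SFT label masking (a common recipe, not necessarily MiroThinker's setup):
# supervise only assistant-generated tokens; tool observations contribute context but no loss.

IGNORE_INDEX = -100  # conventional ignore value for cross-entropy in PyTorch-style trainers

def build_labels(token_ids: list[int], roles: list[str]) -> list[int]:
    """roles[i] marks which message produced token i ('assistant', 'tool', or 'user')."""
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(token_ids, roles)]
```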
Stage 2: Agentic Preference Optimization (DPO)
Direct Preference Optimization refines decision-making by learning from preference pairs. Crucially, the team avoided rigid structural constraints, instead focusing on answer correctness as the primary ranking criterion to prevent systematic biases.
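For reference, this stage optimizes the standard DPO objective over preference pairs $(x, y_w, y_l)$, where $y_w$ is the trajectory whose final answer is judged correct and $y_l$ a rejected alternative:

$$
\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

Here $\pi_\theta$ is the policy being trained, $\pi_{\text{ref}}$ is the frozen reference model (typically the SFT checkpoint), $\sigma$ is the logistic function, and $\beta$ controls how far the policy may drift from the reference.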
Stage 3: Agentic Reinforcement Learning (GRPO)
The final stage employs Group Relative Policy Optimization with fully online policy training. This enables creative solution discovery and adaptation to diverse real-world environments through direct interaction and exploration. The system supports thousands of concurrent agentic rollouts with sophisticated reward design balancing correctness and format compliance.
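GRPO's defining trait is that it estimates advantages relative to a group of rollouts rather than with a learned value critic. For each task, $G$ trajectories are sampled and scored with the reward described above (here, a mix of answer correctness and format compliance), and each rollout's advantage is its group-normalized reward:

$$
A_i = \frac{r_i - \operatorname{mean}(r_1, \ldots, r_G)}{\operatorname{std}(r_1, \ldots, r_G)}
$$

These advantages then drive a PPO-style clipped policy update, which keeps the method lightweight enough to support thousands of concurrent rollouts.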
Benchmark Results: Setting New Standards
MiroThinker v1.0 demonstrates exceptional performance across multiple challenging benchmarks, establishing new state-of-the-art results for open-source research agents:
Standout Achievements:
- GAIA Benchmark: 81.9% accuracy, surpassing MiniMax-M2 by 6.2 percentage points
- Humanity's Last Exam: 37.7% score, outperforming GPT-5-high by 2.5 points
- BrowseComp: 47.1% accuracy, competitive with OpenAI DeepResearch
- BrowseComp-ZH: 55.6% accuracy, setting new open-source records for Chinese benchmarks
The results demonstrate that MiroThinker not only leads among open-source alternatives but approaches and sometimes exceeds the performance of leading commercial systems while maintaining complete transparency and reproducibility.
Interactive Scaling: The Game-Changing Discovery
Perhaps the most significant finding from the MiroThinker research is the empirical validation of interactive scaling as a fundamental dimension of agent performance improvement. The analysis reveals that research performance improves predictably as the model engages in deeper and more frequent agent-environment interactions.
"Interactive scaling exhibits behaviors analogous to model size and context length scaling, establishing it as a third critical dimension for building next-generation research agents."
Key insights from the interactive scaling analysis:
- Consistent Improvement: Performance gains scale predictably with interaction depth across all benchmarks
- Error Correction: Environment feedback enables trajectory refinement and error correction
- Creative Exploration: Reinforcement learning drives discovery of novel solution paths
- Sustained Reasoning: Extended interaction sequences maintain coherence and progress toward goals
The reinforcement learning-trained models exhibit substantially longer and deeper interaction trajectories compared to their supervised fine-tuning counterparts, with corresponding improvements in task performance. This demonstrates that the capacity for extended, meaningful interaction with environments is not just beneficial but essential for advanced research capabilities.
Implications for the Future of AI Research
MiroThinker v1.0 represents more than just another capable AI model—it establishes a new framework for thinking about agent capabilities and scaling laws. The discovery that interactive scaling constitutes a third fundamental dimension alongside model size and context length has profound implications for future research directions.
This breakthrough suggests that the path to human-level research capability may lie not only in larger models or longer contexts, but in fundamentally different training approaches that emphasize iterative interaction with environments. The open-source nature of MiroThinker ensures that these advances can be studied, reproduced, and built upon by the entire research community.
The model's ability to perform sustained multi-turn reasoning through hundreds of tool calls opens new possibilities for autonomous research workflows, from literature review and hypothesis generation to experimental design and result analysis. As these capabilities mature, we may witness the emergence of AI systems that can genuinely contribute to scientific discovery and knowledge advancement.
Conclusion
MiroThinker v1.0 establishes a new paradigm for open-source research agents, demonstrating that it's possible to match and sometimes exceed the performance of proprietary systems while maintaining transparency and community accessibility. The introduction of interactive scaling as a third fundamental dimension of agent capability represents a conceptual breakthrough that will likely influence the direction of AI research for years to come.
By systematically addressing model size, context length, and interaction depth, MiroThinker proves that the gap between open-source and commercial AI capabilities can be closed through thoughtful engineering and innovative training approaches. The model's exceptional performance across diverse benchmarks, combined with its comprehensive tool suite and sophisticated context management, positions it as a valuable resource for researchers, developers, and organizations seeking advanced AI research capabilities.
Access MiroThinker v1.0
Online Demo: https://dr.miromind.ai
Code Repository: https://github.com/MiroMindAI/MiroThinker
Model Weights:
- 72B: huggingface.co/miromind-ai/MiroThinker-v1.0-72B
- 30B: huggingface.co/miromind-ai/MiroThinker-v1.0-30B
- 8B: huggingface.co/miromind-ai/MiroThinker-v1.0-8B
Dataset: MiroVerse v0.1
Paper: arXiv:2511.11793