Originally published on 2026-01-25
Original article (Japanese): "MiroThinker: The Era When AI Gets Smarter Through 'Investigation', Not 'Memory'"
The conventional wisdom that the only way to enhance AI performance is to increase model size is now being challenged.
MiroThinker is an open-source search agent developed by MiroMind. Despite being a relatively small model with 30 billion parameters, it achieves performance that surpasses models with one trillion parameters. The secret lies in a new scaling method called "Interactive Scaling."
## Three Axes of Scaling in AI Evolution
Traditionally, there have been two axes for improving AI performance.
```mermaid
graph LR
    subgraph Traditional Scaling
        A[Model Size] --> B[Increase in Parameter Count]
        C[Context Length] --> D[Expansion of Input Length]
    end
    subgraph New Scaling
        E[Interactive Scaling] --> F[Depth of Tool Invocation]
    end
    B --> G[Performance Improvement]
    D --> G
    F --> G
```
- Scaling Model Size: Increasing the number of parameters (7B → 70B → 700B)
- Scaling Context Length: Increasing the number of tokens that can be input (4K → 32K → 128K)
The "Interactive Scaling" proposed by MiroThinker represents a third axis that follows these.
## What is Interactive Scaling?
The essence of Interactive Scaling is the idea that "to make AI smarter, we should not cram in more knowledge, but rather engage it in deeper interactions with the external world."
### Traditional Approach: Relying on Memory
Conventional large language models (LLMs) learn from vast amounts of text data and "remember" this information to generate responses.
```text
Question → [Search from LLM's Memory] → Answer
```
The issues with this approach are clear.
- Hallucination: When memory is vague, it confidently provides incorrect answers.
- Obsolescence of Information: It lacks knowledge of information beyond the training data.
- Increased Costs: To store more information, a larger model is required.
### MiroThinker's Approach: Investigating
Instead of relying on memory, MiroThinker "investigates."
```text
Question → [Formulate Hypothesis] → [Investigate with Tools] → [Verify Results] → [Reinvestigate if Necessary] → Answer
```
The key point is that this "investigate → verify → reinvestigate" loop can be repeated up to 400 times (in the case of v1.5). This is the essence of "Interactive Scaling."
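This loop can be sketched in a few lines of Python. Everything below (the function names and the `propose`/`investigate`/`verify` callbacks) is a hypothetical illustration of the control flow, not MiroThinker's actual implementation; only the 400-call budget comes from the article.

```python
# Hypothetical sketch of an "investigate -> verify -> reinvestigate" loop.
# None of these names come from MiroThinker's codebase; they only
# illustrate the control flow described above.

def research_loop(question, propose, investigate, verify, max_tool_calls=400):
    """Repeat tool-assisted investigation until the hypothesis checks out
    or the tool-call budget (400 for v1.5, per the article) is spent."""
    hypothesis = propose(question)          # formulate an initial hypothesis
    for _ in range(max_tool_calls):
        evidence = investigate(hypothesis)  # one tool invocation (search, browse, ...)
        ok, revised = verify(hypothesis, evidence)
        if ok:
            return hypothesis               # evidence supports the hypothesis
        hypothesis = revised                # reinvestigate with the corrected hypothesis
    return hypothesis                       # budget exhausted: best-effort answer
```

The key design point is that `max_tool_calls` is the knob "Interactive Scaling" turns: raising it buys more rounds of investigation rather than more parameters.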
## Performance Proportional to "Tool Invocation Count"
The MiroThinker paper reports an interesting finding:
> "As the depth and frequency of tool invocation increase, the performance on research tasks improves in a predictable manner."
In other words, just like model size and context length, the number of tool invocations also follows a scaling law.
## How Can a 30B Model Surpass One Trillion Parameters?
MiroThinker v1.5 (30B parameters) outperformed the one trillion parameter Kimi-K2-Thinking on the BrowseComp-ZH benchmark.
| Model | Parameter Count | BrowseComp-ZH |
|---|---|---|
| Kimi-K2-Thinking | 1T | 68.5 |
| MiroThinker v1.5 | 30B | 69.8 |
This reversal can be explained as follows.
### Limitations of Large Models
A one trillion parameter model "remembers" vast amounts of knowledge. However:
- Memory is not perfect.
- It lacks knowledge of new information.
- Complex reasoning makes it difficult to "connect the dots" in memory.
### Strength of Small Models + Deep Investigation
A 30B model may fall short in memory capacity, but:
- It has sufficient ability to determine "what to investigate."
- It can obtain the latest and most accurate information through external tools.
- It improves accuracy through the loop of hypothesis → verification → correction.
As a result, the "ability to investigate intelligently" has surpassed the "ability to remember a lot."
## Scientist Mode: A Mechanism to Reduce Hallucinations
One of the distinctive designs of MiroThinker is "Scientist Mode."
According to a VentureBeat article:
> "MiroThinker is trained to execute verifiable research loops instead of generating statistically plausible answers from memory patterns."
Specifically, it follows these steps:
1. Propose a hypothesis: formulate a hypothesis in response to the question.
2. Query external sources: use tools to seek evidence.
3. Identify discrepancies: detect contradictions between the hypothesis and the evidence.
4. Modify the conclusion: update the hypothesis as needed.
5. Revalidate: check the modified conclusion again.
This approach provides the auditability that matters in enterprise environments: you can trace how the AI arrived at a given conclusion.
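As a sketch, the five steps could look like the following loop, which also records an audit trail per round. All names here are illustrative assumptions, not MiroThinker's actual code.

```python
# Hypothetical sketch of the five "Scientist Mode" steps, with an audit
# trail so each conclusion can be traced back to the evidence behind it.
# The step names mirror the article; the code itself is illustrative only.

def scientist_mode(question, propose, query_sources, find_discrepancies,
                   revise, max_rounds=5):
    trail = []                                   # auditable record of each round
    hypothesis = propose(question)               # 1. propose a hypothesis
    for round_no in range(max_rounds):
        evidence = query_sources(hypothesis)     # 2. query external sources
        issues = find_discrepancies(hypothesis, evidence)  # 3. identify discrepancies
        trail.append({"round": round_no, "hypothesis": hypothesis,
                      "evidence": evidence, "issues": issues})
        if not issues:                           # 5. revalidation found no contradictions
            return hypothesis, trail
        hypothesis = revise(hypothesis, issues)  # 4. modify the conclusion
    return hypothesis, trail
```

Because the trail stores the hypothesis, evidence, and discrepancies for every round, an auditor can replay exactly why the final answer was accepted.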
## Cost Efficiency: Approaching GPT-5 at 1/20 the Cost
Another advantage of Interactive Scaling is cost efficiency.
| Item | MiroThinker v1.5 | Kimi-K2-Thinking |
|---|---|---|
| Inference Cost | $0.07/call | $1.40/call |
| Parameter Count | 30B | 1T |
| Required GPU | Medium Scale | Large Scale Cluster |
According to reports from MiroMind, the inference cost of MiroThinker v1.5 is about 1/20 that of Kimi-K2-Thinking. This means:
- Local deployment is feasible: Can be operated on corporate servers.
- Reduced dependency on APIs: Lower reliance on external APIs.
- Lower barriers for experimentation: Many experiments can be conducted at low cost.
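The 1/20 figure can be sanity-checked directly from the per-call prices quoted in the table above; the monthly-volume figure below is an invented example, not a reported number.

```python
# Back-of-the-envelope check of the cost figures quoted above.
mirothinker_cost = 0.07   # $/call, MiroThinker v1.5 (from the table)
kimi_cost = 1.40          # $/call, Kimi-K2-Thinking (from the table)

ratio = kimi_cost / mirothinker_cost
print(f"MiroThinker costs 1/{ratio:.0f} of Kimi-K2-Thinking per call")  # 1/20

# At a hypothetical 10,000 research calls per month, the gap is substantial:
calls_per_month = 10_000
savings = (kimi_cost - mirothinker_cost) * calls_per_month
print(f"Monthly savings: ${savings:,.0f}")  # $13,300
```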
## Time-Sensitive Training Sandbox
Another technical innovation of MiroThinker is the "Time-Sensitive Training Sandbox."
In traditional model training, the model has a "God's eye view," meaning it can access "future information" contained in the training data. This is not reflective of real-world reasoning conditions.
In MiroThinker's training:
- Access is limited to information prior to a specific timestamp.
- Prevents "future leakage."
- Forces reasoning under incomplete information.
This leads to reasoning capabilities that are more suitable for real-world tasks.
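A minimal sketch of such a sandbox, assuming the corpus stores a publication date per document (the documents and dates below are invented for illustration):

```python
# Hypothetical sketch of a time-sensitive sandbox: the retriever refuses
# to return any document published after the episode's cutoff date, so
# the model cannot peek at "future" information during training.
from datetime import date

CORPUS = [  # illustrative documents, not real training data
    {"text": "Model X released", "published": date(2023, 6, 1)},
    {"text": "Model X benchmark results", "published": date(2024, 3, 15)},
    {"text": "Model X v2 announced", "published": date(2025, 1, 10)},
]

def sandboxed_search(query, as_of):
    """Return only documents published on or before the `as_of` cutoff,
    preventing future leakage into the training episode."""
    return [doc for doc in CORPUS if doc["published"] <= as_of]

# An episode anchored at 2024-06-01 sees two documents, not three:
visible = sandboxed_search("Model X", as_of=date(2024, 6, 1))
print(len(visible))  # 2
```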
## Real-World Use Cases
You can try MiroThinker in the online demo.
It can also be run locally, serving the model with SGLang or vLLM.
```bash
# Example deployment with SGLang
python -m sglang.launch_server \
    --model-path miromind-ai/MiroThinker-v1.5-30B \
    --tp 4 \
    --host 0.0.0.0 \
    --port 1234
```
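Once the server is running, SGLang exposes an OpenAI-compatible HTTP API, so a client can query it with nothing but the standard library. The endpoint path and payload shape below follow the OpenAI chat-completions convention; the port and model path match the launch command above, while the temperature value is an arbitrary choice for illustration.

```python
# Minimal client sketch for the SGLang server launched above, using its
# OpenAI-compatible /v1/chat/completions endpoint. Standard library only.
import json
import urllib.request

def build_chat_request(question, model="miromind-ai/MiroThinker-v1.5-30B"):
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.6,  # arbitrary illustrative value
    }

def ask(question, host="localhost", port=1234):
    payload = json.dumps(build_chat_request(question)).encode()
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Who developed MiroThinker?"))
```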
It is released under the MIT license, allowing for commercial use.
## Analogies with Human Abilities
Let’s consider the three axes of AI scaling in terms of human abilities.
| AI Scaling Axis | Human Ability | Example |
|---|---|---|
| Model Size | Memory & Knowledge | An erudite person, someone with high memorization skills |
| Context Length | Working Memory | The ability to hold and process multiple pieces of information simultaneously |
| Interactive Scaling | Investigative & Research Skills | Researching in a library, consulting experts, experimenting and verifying |
### Memory-Based vs. Investigative
Traditional LLMs resemble "a person who has memorized an encyclopedia." They possess vast knowledge, but they cannot answer questions outside that knowledge, and they may confidently relay inaccuracies.
In contrast, MiroThinker is akin to "an excellent researcher" or "an investigative journalist."
- Even without memorizing everything, it knows "what to investigate."
- It has a habit of consulting primary sources.
- It formulates hypotheses, verifies them, and corrects them if wrong.
- It cross-references multiple sources to confirm reliability.
| Memory-Based (Traditional LLM) | Investigative (MiroThinker) |
|---|---|
| A person who memorized an encyclopedia | An excellent librarian |
| Quiz champion | Investigative journalist |
| A student who scores high on tests | A researcher capable of writing papers |
The core difference is clear.
- Memory-Based: Can only answer what it "knows."
- Investigative: Can research and answer what it "does not know."
We humans do not remember everything. Rather, we investigate, verify, and correct when necessary; this ability is what we might call true "intelligence."
## Implications of Interactive Scaling
The success of MiroThinker poses fundamental questions about the evolution of AI.
"Should we make AI smarter by having it remember more, or by enabling it to investigate more intelligently?"
Until now, the answer has been "memory." Frontier models like GPT-4, Claude, and Gemini have all evolved towards "being larger and remembering more."
However, MiroThinker shows a different path.
- Memory has its limits (hallucinations, obsolescence, costs).
- Investigative abilities scale (proportional to tool invocation count).
- Even small models can surpass larger models with sufficient investigative capability.
The evolution of AI may be undergoing a paradigm shift from "remembering more" to "investigating more intelligently."