Sakana Fugu is a multi-agent orchestration system from Sakana AI, exposed as a single foundation model behind an OpenAI-compatible API. Instead of answering every prompt directly, Fugu acts as a trained conductor: it delegates work, coordinates agent communication, and synthesizes outputs from a pool of LLMs, including recursive copies of itself. Sakana announced Fugu on June 22, 2026, under the title “One Model to Command Them All” on its official release page. If you have followed other frontier releases like Claude Fable 5, Fugu is different: it is a conductor, not a soloist.
The name fits the architecture. Fugu is the Japanese pufferfish: valuable, but only when prepared correctly. That is a useful way to think about an AI system whose value comes from how it coordinates other models rather than from a single model answering alone.
TL;DR
- What it is: A trained “conductor” LLM that orchestrates a team of frontier models behind one endpoint.
- Variants: fugu for balanced, lower-latency use; fugu-ultra for maximum answer quality.
- API shape: OpenAI-compatible. Existing OpenAI clients can usually be pointed at Fugu’s base URL.
- Important caveat: Fugu calls other vendors’ models and can recursively call itself. Treat benchmark results as model-of-models results, not single-model wins.
- Access: Product page plus console.sakana.ai behind Google/email login.
What Sakana Fugu actually is
Most LLM APIs work like this:
- You send a prompt.
- One model processes it.
- You receive tokens from that model.
Fugu adds an orchestration layer. When a request arrives, Fugu decides whether to answer directly or to assemble a team of models. It can split the task, assign subtasks, route messages between agents, and merge the results into one response.
From your code, Fugu looks like one model behind one endpoint. Internally, it is a model-of-models. That distinction matters when you evaluate latency, cost, and benchmark claims.
The branding also matches Sakana’s larger theme. “Sakana” means “fish” in Japanese, and the company often frames its research around school-of-fish collective intelligence: many small agents producing behavior stronger than any individual agent.
The two variants: Fugu and Fugu Ultra
Sakana ships two variants through the same API.
Fugu
Use Fugu when latency matters.
Typical use cases:
- Coding assistance
- Code review
- Chatbots
- Interactive developer tools
- General productivity workflows
This variant was called “Fugu Mini” during the beta. The current name is Fugu; “Mini” is the retired beta name.
Fugu Ultra
Use Fugu Ultra when answer quality matters more than latency.
Typical use cases:
- AI research
- Paper reproduction
- Cybersecurity analysis
- Literature review
- Patent or technical investigation
Both variants run behind one endpoint. You choose the model ID, but you do not manually control the internal orchestration.
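As a minimal sketch of that choice, a small helper can map a workload flag to a model ID. The helper name `pick_model` is hypothetical and not part of any Sakana SDK, and the two ID strings are the reported ones, which should be verified in the console.

```python
# Hypothetical helper for illustration; not part of any Sakana SDK.
# The model IDs are the reported ones and should be verified in the console.
FUGU = "fugu"
FUGU_ULTRA = "fugu-ultra"


def pick_model(latency_sensitive: bool) -> str:
    """Return the reported model ID that matches the workload."""
    # Interactive work favors the lower-latency variant;
    # everything else goes to the quality-focused variant.
    return FUGU if latency_sensitive else FUGU_ULTRA


if __name__ == "__main__":
    print(pick_model(latency_sensitive=True))   # e.g. interactive code review
    print(pick_model(latency_sensitive=False))  # e.g. paper reproduction
```

The only knob here is the model ID, which mirrors the API: the orchestration behind each ID is not something the caller configures.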
For a deeper comparison, see the Fugu Ultra vs Fable 5 vs Mythos breakdown.
Spec table
| Attribute | Detail |
|---|---|
| Vendor | Sakana AI |
| Released | June 22, 2026 |
| Type | Multi-agent orchestration system, shipped as one foundation model |
| Variants | Fugu, Fugu Ultra |
| Old beta name | “Fugu Mini” for the smaller variant |
| API | One OpenAI-compatible endpoint |
| Model IDs reported | fugu, fugu-ultra; verify in console before shipping |
| Base URL | Not published publicly; copy from console.sakana.ai |
| Access | Product page plus console.sakana.ai with Google/email login |
| Pricing structure | Subscription tiers plus pay-as-you-go for heavier or enterprise use |
| Research lineage | Trinity, arXiv:2512.04695; Conductor, arXiv:2512.04388; both ICLR 2026 |
How the orchestration works
Fugu is not just a static router. The core idea is a learned conductor.
A traditional router picks one model for a request and forwards the prompt. Tools like OpenRouter or Martian follow that pattern.
An agent framework like Swarm, AutoGen, or LangGraph gives you the primitives to build multi-agent workflows, but you write the coordination logic.
Fugu sits between those approaches:
- You call one API.
- Fugu decides whether the task needs multiple agents.
- Fugu delegates subtasks.
- Fugu manages communication.
- Fugu synthesizes the final answer.
Per Sakana, the conductor handles three main jobs:
- Delegation: It chooses which agents, including possible recursive copies of itself, should handle which subtasks.
- Communication: It manages messages between agents and can shape the team structure dynamically.
- Synthesis: It merges partial outputs into one coherent final response.
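To make the three jobs concrete, here is a toy sketch of a delegate, communicate, synthesize loop. It is not Sakana's implementation: the agents are stub functions rather than LLM calls, the task is split on semicolons, and assignment is plain round-robin, where a learned conductor would decide all of this adaptively. Every name in the block is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Stub "agents": plain functions standing in for LLM calls.
Agent = Callable[[str], str]


def make_agent(name: str) -> Agent:
    """Create a stub agent that tags its reply with its own name."""
    def agent(subtask: str) -> str:
        return f"[{name}] handled: {subtask}"
    return agent


def delegate(task: str, agents: Dict[str, Agent]) -> List[Tuple[str, str]]:
    """Delegation: split the task and pair each subtask with an agent."""
    subtasks = [part.strip() for part in task.split(";") if part.strip()]
    names = list(agents)
    # Round-robin assignment; a learned conductor would choose adaptively.
    return [(names[i % len(names)], sub) for i, sub in enumerate(subtasks)]


def communicate(plan: List[Tuple[str, str]], agents: Dict[str, Agent]) -> List[str]:
    """Communication: send each subtask to its agent and collect replies."""
    return [agents[name](sub) for name, sub in plan]


def synthesize(partials: List[str]) -> str:
    """Synthesis: merge partial outputs into one response."""
    return "\n".join(partials)


def conduct(task: str, agents: Dict[str, Agent]) -> str:
    """Run the three jobs in order for one request."""
    plan = delegate(task, agents)
    partials = communicate(plan, agents)
    return synthesize(partials)


if __name__ == "__main__":
    pool = {"agent-a": make_agent("agent-a"), "agent-b": make_agent("agent-b")}
    print(conduct("find bugs; check style; summarize", pool))
```

From the caller's side, only `conduct` is visible, which is the same shape as calling one endpoint and getting one answer back.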
Two governance-related mechanics are also important:
- Swappable agents: The model pool is not fixed. Teams can opt specific agents out for data, policy, or compliance reasons.
- Routing around restrictions: Per Sakana, Fugu can dynamically route around provider restrictions by choosing a different agent when one is unavailable or disallowed.
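Both mechanics can be pictured as a filter over a preference-ordered pool. The sketch below is an illustration under stated assumptions, not Fugu's routing logic: the agent names are placeholders, and the opt-out and availability sets are passed in by hand, whereas Sakana has not documented how Fugu tracks them.

```python
from typing import Iterable, Optional, Set


def choose_agent(
    preferred: Iterable[str],
    opted_out: Set[str],
    unavailable: Set[str],
) -> Optional[str]:
    """Return the first preferred agent that is allowed and reachable."""
    for name in preferred:
        # Skip agents the team has opted out of (data, policy, compliance)
        # and agents that are currently unavailable or disallowed.
        if name in opted_out or name in unavailable:
            continue
        return name
    # No agent in the pool satisfies the constraints.
    return None


if __name__ == "__main__":
    pool = ["agent-a", "agent-b", "agent-c"]  # placeholder names
    print(choose_agent(pool, opted_out={"agent-a"}, unavailable={"agent-b"}))
```

Returning `None` when nothing qualifies keeps the failure explicit, so a caller can decide whether to relax a constraint or fail the request.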
For a related single-model framing, see the Mythos-class model explainer.
The benchmark caveat: orchestrator, not single model
Read Fugu benchmark claims carefully.
Fugu is an orchestrator that calls other vendors’ frontier models and can recursively call itself. A Fugu benchmark result may come from Fugu calling Opus 4.8, Fable 5, or multiple models, then synthesizing their outputs. That is a model-of-models result, not a like-for-like single-model win.
This matters for two Sakana claims.
First, Sakana says Fugu Ultra “stands shoulder-to-shoulder with leading models like Fable 5 and Mythos Preview” across engineering, scientific, and reasoning benchmarks. Read that as a parity claim, not a “Fugu’s own weights beat every single model” claim.
Second, Sakana says Fugu “consistently outperforms” Gemini 3.1 Pro high, Opus 4.8 max, and GPT 5.5 xhigh on these application tasks:
- AutoResearch
- Rubik’s Cube
- Mechanical Design
- Japanese Handwriting Analysis
- One-Shot Chess
- Financial Time Series Prediction
The honest interpretation: the orchestrated team can outperform an individual model while still depending on that individual model as part of the team.
That does not make the result invalid. Orchestration is a real capability. It just means you should label the result correctly.
For context, Fable 5 is Anthropic’s most powerful generally available model, while Mythos Preview was an unreleased frontier model. The Claude Fable 5 explainer covers that side in more detail.
Research lineage: Trinity and Conductor
Fugu builds on two ICLR 2026 research threads.
Trinity: An Evolved LLM Coordinator
Trinity: An Evolved LLM Coordinator describes a tiny coordinator with fewer than 20,000 parameters, optimized by derivative-free evolution.
It assigns roles such as:
- Thinker
- Worker
- Verifier
The key point: a very small evolved controller can drive a useful multi-agent loop.
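For readers unfamiliar with derivative-free evolution, the sketch below shows the general idea with a (1+1) evolution strategy on a three-parameter vector. It does not reproduce Trinity's algorithm, parameters, or fitness function: `evolve`, `toy_fitness`, and the target vector are all made up for illustration, and the toy objective stands in for whatever task score a real coordinator would be evaluated on.

```python
import random
from typing import Callable, List


def evolve(
    fitness: Callable[[List[float]], float],
    dim: int,
    generations: int = 200,
    sigma: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """(1+1) evolution strategy: keep a mutation only if it is not worse."""
    rng = random.Random(seed)
    parent = [0.0] * dim
    parent_score = fitness(parent)
    for _ in range(generations):
        # Mutate every parameter with Gaussian noise; no gradients are used.
        child = [w + rng.gauss(0.0, sigma) for w in parent]
        child_score = fitness(child)
        if child_score >= parent_score:
            parent, parent_score = child, child_score
    return parent


def toy_fitness(weights: List[float]) -> float:
    """Toy objective: negative squared distance to an arbitrary target."""
    target = [0.5, -0.25, 1.0]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


if __name__ == "__main__":
    best = evolve(toy_fitness, dim=3)
    print(best, toy_fitness(best))
```

Because the loop only needs a score per candidate, the same pattern works when the score comes from a non-differentiable process such as running a multi-agent task end to end.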
Conductor: Learning to Orchestrate Agents in Natural Language
Conductor: Learning to Orchestrate Agents in Natural Language describes a 7B model trained with reinforcement learning to learn communication structures between agents.
It claims to outperform Mixture-of-Agents at lower cost.
These are different methods:
- Trinity uses evolution.
- Conductor uses reinforcement learning.
- Trinity is sub-20K parameters.
- Conductor is 7B parameters.
Do not assume the shipped Fugu product has the same parameter count or implementation details. Sakana has not published a parameter count for Fugu, so applying details from these papers directly to the product is inference, not an official spec.
The practical novelty is narrower and more useful: Fugu packages learned, adaptive, cost-aware orchestration behind one endpoint.
What early users report
Sakana shared two vendor-reported testimonials.
One software engineer using Fugu Ultra for code review said it surfaced “more than twenty” issues where other tools flagged “about three,” and called it better than GPT-5.5.
A security engineer said one scoped instruction drove a full end-to-end assessment, including recon, XSS and SQLi probing, and auth review, while staying within scope.
Treat these as anecdotes, not benchmarks. They are still useful signals for workload fit. Fugu is most interesting for tasks that:
- Decompose into subtasks
- Benefit from parallel investigation
- Need verification or synthesis at the end
- Have enough complexity to justify orchestration overhead
For Sakana’s related model lineup, see the Mirofish explainer.
Using the Fugu API
Fugu exposes an OpenAI-compatible endpoint. That is the main implementation advantage: if your app already uses the OpenAI SDK, you can usually switch by changing the API key, base URL, and model ID.
The request shape follows the standard OpenAI Chat Completions API.
Important caveat: as of 2026-06-22, the base URL is not published on a public page. Do not guess it. Copy it from console.sakana.ai.
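To show what that request shape looks like on the wire, the sketch below builds, but does not send, a Chat Completions style POST with the standard library. The function name `build_chat_request` is hypothetical, and the `/chat/completions` path is an assumption based on the OpenAI convention, so confirm the actual path in the console. The base URL is a parameter precisely because it should be copied, not guessed.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, model: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) a Chat Completions style POST request."""
    # Assumption: the path follows the OpenAI convention of
    # "<base_url>/chat/completions". Confirm it in the console.
    url = base_url.rstrip("/") + "/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. In practice an OpenAI-compatible SDK builds the same request for you.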
Python example
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_FUGU_API_KEY",
    base_url="<YOUR_FUGU_BASE_URL_FROM_CONSOLE>",
)

response = client.chat.completions.create(
    model="fugu",
    messages=[
        {
            "role": "system",
            "content": "You are a precise coding assistant."
        },
        {
            "role": "user",
            "content": "Review this function for off-by-one bugs."
        },
    ],
)

print(response.choices[0].message.content)
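The example above uses inline placeholders. A common alternative, sketched here, is to read both values from the environment and fail fast when either is missing, so a guessed or empty base URL never reaches the client. The variable names `FUGU_API_KEY` and `FUGU_BASE_URL` and the helper `load_settings` are conventions chosen for this sketch, not names defined by Sakana.

```python
import os
from typing import Mapping, NamedTuple


class FuguSettings(NamedTuple):
    api_key: str
    base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> FuguSettings:
    """Read credentials from the environment and fail fast if any is missing."""
    required = ("FUGU_API_KEY", "FUGU_BASE_URL")
    # Treat unset and empty values the same way: both are unusable.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return FuguSettings(api_key=env["FUGU_API_KEY"], base_url=env["FUGU_BASE_URL"])
```

The result plugs into the client constructor as `OpenAI(api_key=settings.api_key, base_url=settings.base_url)`.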
Switch to Fugu Ultra
Change the model ID:
response = client.chat.completions.create(
    model="fugu-ultra",
    messages=[
        {
            "role": "user",
            "content": "Reproduce the headline result from this paper."
        },
    ],
)

print(response.choices[0].message.content)
The reported model IDs are:
fugu
fugu-ultra
Verify the exact strings in the Sakana console before shipping. Model IDs and dated variants can change.
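One way to automate part of that check, assuming the Fugu endpoint implements the OpenAI-style model listing route (this is not confirmed by Sakana), is to compare the IDs you expect against what the endpoint reports. `client.models.list()` is the OpenAI SDK call for that route; the helper name `missing_model_ids` is hypothetical.

```python
from typing import Iterable, List


def missing_model_ids(client, expected: Iterable[str]) -> List[str]:
    """Return the expected model IDs that the endpoint does not list."""
    # client.models.list() is the OpenAI SDK call for the model listing route.
    # Assumption: Fugu's OpenAI-compatible endpoint implements that route.
    available = {model.id for model in client.models.list()}
    return [model_id for model_id in expected if model_id not in available]
```

An empty list from `missing_model_ids(client, ["fugu", "fugu-ultra"])` would mean both reported IDs are present. If the route is not supported, the console remains the source of truth.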
For a full walkthrough, see the Sakana Fugu API guide.
How to test Fugu in Apidog
Because Fugu uses the OpenAI Chat Completions format, you can test it like any other LLM endpoint in Apidog.
A practical setup:
- Copy your Fugu base URL from console.sakana.ai.
- Create an Apidog environment variable, for example:
FUGU_BASE_URL=<YOUR_FUGU_BASE_URL_FROM_CONSOLE>
FUGU_API_KEY=<YOUR_FUGU_API_KEY>
- Create a new request using the chat completions path from your Fugu-compatible endpoint.
- Add authorization:
Authorization: Bearer {{FUGU_API_KEY}}
- Send a JSON body like this:
{
  "model": "fugu",
  "messages": [
    {
      "role": "system",
      "content": "You are a precise API testing assistant."
    },
    {
      "role": "user",
      "content": "Review this API response schema and identify breaking changes."
    }
  ]
}
- Save the request as a reusable test case.
- Duplicate it and change only the model ID to compare fugu and fugu-ultra.
This matters more for Fugu than for a single-model endpoint. Because Fugu may assemble different teams per request, latency and cost can vary. Capturing request timing, response shape, and token usage in Apidog gives you workload-specific data instead of relying only on vendor benchmarks.
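The same measurement can be scripted. The sketch below times repeated calls per model ID and averages token usage. The `call` argument is a hypothetical function you supply: it should send one request for the given model ID and return a dict with a `total_tokens` entry. With the OpenAI SDK that value would typically come from `response.usage.total_tokens`, assuming Fugu populates the usage field.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(
    call: Callable[[str], Dict[str, int]],
    model_ids: List[str],
    runs: int = 3,
) -> Dict[str, Dict[str, float]]:
    """Time repeated calls per model ID and summarize latency and tokens."""
    results: Dict[str, Dict[str, float]] = {}
    for model_id in model_ids:
        latencies: List[float] = []
        tokens: List[int] = []
        for _ in range(runs):
            start = time.perf_counter()
            response = call(model_id)
            latencies.append(time.perf_counter() - start)
            # Missing usage data is counted as zero tokens.
            tokens.append(response.get("total_tokens", 0))
        results[model_id] = {
            "median_latency_s": statistics.median(latencies),
            "mean_total_tokens": statistics.mean(tokens),
        }
    return results
```

Several runs per model matter here: if team composition changes between requests, a single timing can be misleading, and the median is less sensitive to one slow outlier than the mean.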
You can download Apidog and point a new request at your console base URL to start testing.
Frequently asked questions
Is Sakana Fugu a single model or many models?
Both, depending on perspective.
To your code, Fugu is one model behind one API. Internally, it is a trained conductor that can call a pool of frontier models and synthesize their outputs. That is why its benchmark results should be treated as model-of-models results, not single-model wins.
See the Mythos-class model explainer for the single-model tier Fugu is often compared against.
What is the difference between Fugu and Fugu Ultra?
Fugu is the balanced, lower-latency variant for everyday tasks like coding, code review, and chatbots.
Fugu Ultra trades latency for higher answer quality and targets heavier work like research, security analysis, and deep technical investigation.
Both use the same endpoint. You select the variant with the model ID.
Does Fugu really beat Opus 4.8 and GPT 5.5?
Per Sakana, Fugu consistently outperforms Gemini 3.1 Pro, Opus 4.8, and GPT 5.5 on a specific list of application tasks.
The precise reading is that Fugu can orchestrate a team that may include those same kinds of models. A team result can beat a solo result while still depending on the solo model. Do not present it as Fugu’s own weights beating every individual model.
How do I call the Fugu API?
Use an OpenAI-compatible client, set your Fugu API key, set the Fugu base URL from the Sakana console, and send a standard Chat Completions request.
Example:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_FUGU_API_KEY",
    base_url="<YOUR_FUGU_BASE_URL_FROM_CONSOLE>",
)

response = client.chat.completions.create(
    model="fugu",
    messages=[
        {"role": "user", "content": "Explain this error log and suggest a fix."}
    ],
)

print(response.choices[0].message.content)
The base URL is not public, so copy it from console.sakana.ai.
For a complete example, see the Sakana Fugu API guide.
Is Fugu available to everyone right now?
Access runs through the product page and console.sakana.ai, behind Google or email login.
The beta reportedly ran with roughly 500 users from late April 2026. Whether fully self-serve GA signup is open, and whether any regional restrictions apply, should be checked live in the console.
How is Fugu different from a router or an agent framework?
A router chooses one model and forwards your request.
An agent framework gives you primitives to build multi-agent workflows, but you write the coordination logic.
Fugu trains the coordinator itself. A learned model decides delegation, communication, and synthesis, then exposes the result behind one endpoint.
Bottom line
Fugu is a bet that the next gains come from how models work together, not only from larger single-model weights.
For developers, the practical evaluation is straightforward:
- Use the OpenAI-compatible API.
- Test fugu and fugu-ultra on your real tasks.
- Measure latency, cost, and output quality.
- Compare against the single-model endpoints you already use.
- Decide whether orchestration improves your workload enough to justify the overhead.
Set up your first Fugu request in Apidog, save it as a reusable test case, and compare the results against your current LLM stack.




