Hassann

Posted on Jun 22 • Originally published at apidog.com

Fugu Ultra vs Fable 5 vs Mythos: How the Sakana Orchestrator Stacks Up Against the Frontier

Fugu Ultra is Sakana AI’s top Fugu variant. Treat it as a frontier-comparable orchestration system, not as a single model that “beats” the frontier. Sakana says Fugu Ultra “stands shoulder-to-shoulder with leading models like Fable 5 and Mythos Preview” on engineering, scientific, and reasoning benchmarks. That is a parity claim. It matters because Fugu can call other vendors’ models behind one OpenAI-compatible API. Read the official details on the Sakana Fugu release page, and see our deeper breakdown in what is Sakana Fugu.

What you are actually comparing

Fugu is a multi-agent orchestration system exposed as one model behind one OpenAI-compatible API. Sakana describes it as a trained language model specialized in delegation, agent communication, and work synthesis. On each request, Fugu can answer directly or assemble a team of models, including recursive instances of itself.

Fable 5 and Mythos are different: each is a single Anthropic model. Fable 5 is Anthropic’s most powerful generally available model, described as a “Mythos-class” model made safe for public use and positioned above Opus 4.8. Mythos Preview, which dates to April 7, 2026, is the frontier model Anthropic described as too dangerous to release.

One naming detail matters: Sakana compares against Mythos Preview, not the current Mythos 5. For the Anthropic side, see Fable 5 vs Mythos 5 and the Mythos-class model explained.

Use this frame for the rest of the comparison:

Fugu / Fugu Ultra: model-of-models orchestration system
Fable 5 / Mythos: single Anthropic models

That difference changes how you should read every benchmark.

Fugu and Fugu Ultra, briefly

Fugu ships in two variants through one endpoint:

Fugu: balanced, lower-latency option for everyday tasks, coding, code review, chatbots, and interactive services.
Fugu Ultra: maximum-quality option for AI research, paper reproduction, cybersecurity analysis, literature review, patent investigation, and other complex workflows.

The beta and some coverage called the smaller variant “Fugu Mini,” but the release page leads with “Fugu” and “Fugu Ultra,” so those are the names to use.

Why “orchestrator vs single model” matters

Do not skip this distinction.

Fugu is an orchestrator. When it produces a strong answer, it may have called another vendor’s frontier model, then added planning, verification, retries, or synthesis on top. That could include calls to Opus 4.8, Gemini, or recursive Fugu instances.

So if you see a result where Fugu “beats Opus 4.8,” that does not necessarily mean Fugu’s own weights beat Opus. It may mean the full Fugu system used Opus, checked the result, improved it, and returned a better final answer.

That is still useful. It is just a different kind of capability.

A fair summary is:

Fugu is an orchestrated system that can reach frontier-comparable quality, partly by routing to frontier models.

That is more accurate than saying “Fugu beats Fable 5.” For a benchmark walkthrough, see Sakana Fugu benchmarks.

Tier one: Sakana’s parity claim

Sakana’s first claim is parity.

Per Sakana, Fugu Ultra stands shoulder-to-shoulder with Fable 5 and Mythos Preview across engineering, scientific, and reasoning benchmarks.

Read that literally:

It does not say Fugu Ultra wins.
It says Fugu Ultra keeps pace.
It compares against Mythos Preview, not Mythos 5.

That last point matters. Anthropic pricing as of June 9, 2026 reflects the gap:

Model	Input	Output
Fable 5	$10 / 1M tokens	$50 / 1M tokens
Mythos 5	$10 / 1M tokens	$50 / 1M tokens
Mythos Preview	$25 / 1M tokens	$125 / 1M tokens

Naming Mythos Preview is defensible as a reproducible comparison point, but it does not mean Fugu Ultra was measured against the current ceiling.
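To make the price gap concrete, here is a small sketch that prices one workload under each row of the table above. The per-token prices are copied from that table; the workload size (200K input tokens, 50K output tokens) is a made-up figure for illustration only.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Fable 5": (10.0, 50.0),
    "Mythos 5": (10.0, 50.0),
    "Mythos Preview": (25.0, 125.0),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the listed per-1M-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 200K input tokens and 50K output tokens.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 200_000, 50_000):.2f}")
```

On this workload Mythos Preview comes to $11.25 against $4.50 for Fable 5 or Mythos 5, a 2.5x difference at the listed prices.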

For the single-model side, Claude Fable 5 vs Opus 4.8 explains where Fable 5 lands.

Tier two: Sakana’s outperform claim

Sakana also makes a separate outperform claim, but against a different model set.

Per Sakana, Fugu “consistently outperforms” these models on specific applications:

Gemini 3.1 Pro high
Opus 4.8 max
GPT 5.5 xhigh

The named applications are narrow:

AutoResearch
Rubik’s Cube
Mechanical Design
Japanese Handwriting Analysis
One-Shot Chess
Financial Time Series Prediction

This is where orchestration can shine. A system that can plan, delegate, verify, and retry is well-suited to structured, multi-step tasks.

The correct interpretation:

Fugu’s coordination layer can add measurable value on structured, verifiable tasks, sometimes enough to beat a single frontier model on that task.

The incorrect interpretation:

Fugu’s underlying model weights are generally stronger than Opus 4.8, Gemini, or GPT 5.5.

Sakana’s claim is application-level, not a broad general-benchmark win.
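A toy sketch can show why a coordination layer helps on verifiable tasks even when no individual model gets better. This is not Sakana’s method: the two "workers" are hypothetical stand-in functions with deliberately planted errors, the task (integer square root) is chosen only because an answer is cheap to check, and the orchestrator simply tries workers in order until one answer verifies.

```python
import math
from typing import Callable, Optional

Worker = Callable[[int], int]


def flaky_worker(n: int) -> int:
    """Stand-in for a model that gets every odd input wrong."""
    root = math.isqrt(n)
    return root if n % 2 == 0 else root + 1


def careful_worker(n: int) -> int:
    """Stand-in for a second model that gets every multiple of 3 wrong."""
    root = math.isqrt(n)
    return root if n % 3 != 0 else root - 1


def verify(n: int, answer: int) -> bool:
    """Check an integer square root without recomputing it."""
    return answer >= 0 and answer * answer <= n < (answer + 1) * (answer + 1)


def orchestrate(n: int, workers: list[Worker]) -> Optional[int]:
    """Delegate to each worker in turn; return the first answer that verifies."""
    for worker in workers:
        answer = worker(n)
        if verify(n, answer):
            return answer
    return None  # no worker produced a verifiable answer


def accuracy(solve: Callable[[int], Optional[int]], inputs: range) -> float:
    """Fraction of inputs for which the solver returns a verified answer."""
    passed = 0
    for n in inputs:
        answer = solve(n)
        if answer is not None and verify(n, answer):
            passed += 1
    return passed / len(inputs)


inputs = range(1, 13)
team = lambda n: orchestrate(n, [flaky_worker, careful_worker])
print("flaky alone:  ", accuracy(flaky_worker, inputs))
print("careful alone:", accuracy(careful_worker, inputs))
print("orchestrated: ", accuracy(team, inputs))
```

Over the inputs 1 to 12, the stubs pass 6 of 12 and 8 of 12 on their own, while the orchestrated loop passes 10 of 12, failing only where both workers are wrong. The gain comes entirely from delegation and verification, which is the system-level effect described above.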

Comparison table

Read the first row before the rest.

Dimension	Fugu / Fugu Ultra	Fable 5	Mythos Preview / Mythos 5
System type	Orchestrator that calls multiple LLMs, including itself	Single Anthropic model	Single Anthropic model
Vendor	Sakana AI	Anthropic	Anthropic
Sakana’s claim	Parity with Fable 5 and Mythos Preview	Named parity peer	Named parity peer is Preview, not 5
Separate outperform claim	Vs Gemini 3.1 Pro, Opus 4.8, GPT 5.5 on named apps	Not the outperform target	Not the outperform target
Pricing	Reported tiers plus PAYG, all unverified	$10 input / $50 output per 1M tokens	Preview: $25 / $125; Mythos 5: $10 / $50
API surface	One OpenAI-compatible endpoint	Anthropic API	Anthropic API
Best fit	Structured multi-step tasks and governance-aware routing	General-purpose frontier quality	Raw frontier ceiling

Fugu pricing figures are reported from secondary or JS-rendered sources, not confirmed directly from the release page. Treat them as unverified until you confirm them in your Sakana console. For Fable 5’s own benchmark profile, see Claude Fable 5 benchmarks.

Pricing: what to verify before using it

Sakana confirms the pricing structure:

Subscription tiers for everyday use
Pay-as-you-go for heavier and enterprise workloads

The structure is confirmed. The actual numbers are not.

As of 2026-06-22, reported figures come from secondary or JS-rendered sources rather than the release page. Reported subscription tiers are:

$20 / month
$100 / month
$200 / month

Reported pay-as-you-go pricing is roughly:

$5 input / 1M tokens
$30 output / 1M tokens
$0.50 cached / 1M tokens
Context surcharge above 272K tokens

The base Fugu variant is reportedly passthrough-billed at the standard rate of whichever underlying model it calls. No standalone free tier has surfaced.

Do not budget against those numbers yet. Verify them in your console first.

Research lineage: what it supports and what it does not

Fugu did not invent orchestration. Mixture-of-Agents from Together AI, published at ICLR 2025, already showed orchestrated models beating a single model.

Fugu’s narrower novelty is a learned, adaptive, cost-selective topology exposed as one endpoint.

Two ICLR 2026 papers sit behind this direction:

Trinity, “An Evolved LLM Coordinator” (arXiv:2512.04695)
Conductor, “Learning to Orchestrate Agents in Natural Language” (arXiv:2512.04388)

They are not the same method:

Paper	Method	Size / structure
Trinity	Derivative-free evolution	Sub-20K-parameter coordinator with Thinker, Worker, and Verifier roles
Conductor	Reinforcement learning	7B model that learns communication structure

Do not assume either paper maps directly onto the shipped Fugu product. The official release does not give a product parameter count, so applying the 7B detail to Fugu itself is inference, not a stated fact.

The practical distinction:

Routers like OpenRouter or Martian pick one model and send the request there.
Agent frameworks like Swarm, AutoGen, or LangGraph make you build the coordinator.
Fugu trains the coordinator and exposes the whole system behind a single call.

How to test Fugu in your own workflow

Fugu exposes one OpenAI-compatible endpoint. That means you can reuse an existing OpenAI client and change the base URL.

The base URL is not published on a public page as of 2026-06-22. Do not copy random hosts from social posts or examples. Get the real base URL from your Sakana console:

console.sakana.ai

Reported model IDs are:

fugu
fugu-ultra

Confirm the exact IDs in your console before hardcoding them.

Example Python call

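import os

from openai import OpenAI

# Read the key from the environment instead of hardcoding it.
# The variable name SAKANA_API_KEY is our own choice, not an official one.
client = OpenAI(
    api_key=os.environ["SAKANA_API_KEY"],
    base_url="<YOUR_FUGU_BASE_URL_FROM_CONSOLE>",  # copy from your console
)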

response = client.chat.completions.create(
    model="fugu-ultra",  # confirm exact ID in console
    messages=[
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user", "content": "Review this pull request for security issues."},
    ],
)

print(response.choices[0].message.content)

Because Fugu uses the OpenAI chat completions format, you can test it the same way you test other OpenAI-compatible model endpoints. See the OpenAI API reference for request shape details.

How to compare Fugu Ultra with Fable 5 or Opus 4.8

Use the same prompts, same evaluation criteria, and same expected output format.

A practical test loop:

Create one request for Fugu Ultra.
Create equivalent requests for Fable 5 and Opus 4.8.
Use the same system prompt and user prompt.
Save representative test cases:
- code review
- vulnerability analysis
- scientific reasoning
- financial time-series explanation
- long-context synthesis
Compare outputs against objective criteria:
- correctness
- citations or evidence
- missed edge cases
- ability to follow constraints
- latency
- total cost
Repeat with enough examples to avoid judging from one lucky answer.
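The loop above can be sketched as a small provider-agnostic harness. Everything here is illustrative: each model is represented by a plain callable that takes a system prompt and a user prompt and returns text, so you can wrap whichever client each vendor requires. The stub functions and the labels "fugu-ultra" and "fable-5" are placeholders that let the sketch run offline; they are not confirmed model IDs.

```python
import time
from dataclasses import dataclass
from typing import Callable

# (system_prompt, user_prompt) -> model output text
AskFn = Callable[[str, str], str]


@dataclass
class Result:
    model: str
    case: str
    output: str
    latency_s: float


def run_comparison(
    models: dict[str, AskFn], system_prompt: str, cases: dict[str, str]
) -> list[Result]:
    """Send every test case to every model with an identical system prompt."""
    results = []
    for case_name, user_prompt in cases.items():
        for model_name, ask in models.items():
            start = time.perf_counter()
            output = ask(system_prompt, user_prompt)
            elapsed = time.perf_counter() - start
            results.append(Result(model_name, case_name, output, elapsed))
    return results


# Stubs so the sketch runs offline; replace each with a real client call.
def stub_fugu(system_prompt: str, user_prompt: str) -> str:
    return f"[stub fugu-ultra] {user_prompt}"


def stub_fable(system_prompt: str, user_prompt: str) -> str:
    return f"[stub fable-5] {user_prompt}"


results = run_comparison(
    models={"fugu-ultra": stub_fugu, "fable-5": stub_fable},
    system_prompt="You are a careful code reviewer.",
    cases={"code review": "Review this pull request for security issues."},
)
for r in results:
    print(f"{r.case} | {r.model} | {r.latency_s:.4f}s | {r.output}")
```

Keeping the harness independent of any one SDK means the same test cases and the same scoring apply to every model, which is the point of the comparison. Token counts and cost are left out because how you read them depends on each provider’s response format.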

In Apidog, create a request against your Sakana console base URL, set model to fugu-ultra, and save it as a reusable case. Then run equivalent requests against Fable 5 or Opus 4.8 endpoints. This turns Sakana’s parity claim into something you can measure with your own prompts.

You can download Apidog to set up the comparison.

Example request body

{
  "model": "fugu-ultra",
  "messages": [
    {
      "role": "system",
      "content": "You are a senior security engineer. Be specific and cite the risky code paths."
    },
    {
      "role": "user",
      "content": "Review the following API handler for authentication and authorization issues..."
    }
  ]
}

Suggested assertions for API testing

If you are evaluating model behavior inside an API workflow, assert on concrete output properties instead of vibes:

{
  "must_include": [
    "authentication",
    "authorization",
    "input validation"
  ],
  "must_not_include": [
    "unsupported claim",
    "fabricated CVE"
  ],
  "minimum_findings": 3
}

For code review tasks, you can track:

number of valid findings
severity classification accuracy
false positives
suggested patch quality
whether the model respects scope
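As a rough illustration, the assertion spec above can be applied to a model response with a few lines of Python. Two assumptions are baked in and should be adjusted to your own setup: phrases are matched as case-insensitive substrings, and a "finding" is counted as any line that starts with a bullet or a number. Entries such as "fabricated CVE" read more like categories than literal phrases, so a real check for them would need something stronger than substring matching.

```python
import re

SPEC = {
    "must_include": ["authentication", "authorization", "input validation"],
    "must_not_include": ["unsupported claim", "fabricated CVE"],
    "minimum_findings": 3,
}


def count_findings(text: str) -> int:
    """Assumption: each finding is a line starting with '-', '*', or '1.'-style numbering."""
    return len(re.findall(r"^\s*(?:[-*]|\d+[.)])\s+\S", text, flags=re.MULTILINE))


def check_output(text: str, spec: dict) -> list[str]:
    """Return failure messages; an empty list means the output passed."""
    lowered = text.lower()
    failures = []
    for phrase in spec["must_include"]:
        if phrase.lower() not in lowered:
            failures.append(f"missing required phrase: {phrase!r}")
    for phrase in spec["must_not_include"]:
        if phrase.lower() in lowered:
            failures.append(f"contains forbidden phrase: {phrase!r}")
    found = count_findings(text)
    minimum = spec["minimum_findings"]
    if found < minimum:
        failures.append(f"only {found} findings, expected at least {minimum}")
    return failures


# Hand-written sample response, used only to exercise the checker.
sample = """\
1. Authentication: the handler trusts a client-supplied user ID.
2. Authorization: no role check before the delete call.
3. Input validation: the limit parameter is not bounded.
"""
print(check_output(sample, SPEC))
```

Returning a list of failures rather than a single boolean makes it easier to see which property a given model missed when you compare runs.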

Compliance note: test provider opt-outs

Sakana says Fugu’s agents are swappable, specific agents can be opted out of the pool for data or compliance reasons, and Fugu dynamically routes around provider restrictions.

If you are testing Fugu in a regulated context, validate that behavior directly:

Exclude the providers you cannot use.
Run prompts that would normally trigger multi-agent routing.
Inspect the response trace if available.
Confirm excluded providers do not appear.
Save the test as a regression case.

Do not rely on provider opt-out claims without testing them in your own environment.
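The last two steps can be captured as a tiny regression check. The source does not describe what a Fugu response trace looks like, so this sketch assumes you have already extracted a plain list of provider names from whatever trace your console or response exposes. The provider names below are placeholders.

```python
def excluded_providers_seen(trace_providers: list[str], excluded: set[str]) -> set[str]:
    """Return any excluded providers that appear in the trace (case-insensitive)."""
    normalized_excluded = {name.strip().lower() for name in excluded}
    return {p for p in trace_providers if p.strip().lower() in normalized_excluded}


# Hypothetical data: provider names pulled from one response trace.
trace_providers = ["sakana", "provider-a", "sakana"]
excluded = {"provider-b"}

violations = excluded_providers_seen(trace_providers, excluded)
assert not violations, f"excluded providers were used: {sorted(violations)}"
print("no excluded providers in trace")
```

Run the check across many prompts, since one clean trace does not show that routing always avoids the excluded providers.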

Verdict

The strongest case for Fugu Ultra is practical: Sakana shipped a trained conductor behind one endpoint, backed by a real research line, and claims frontier parity plus application-level wins. For structured tasks like AutoResearch, chess, code review, and security analysis, orchestration is exactly the kind of architecture that can help.

The case for caution is just as important:

The parity claim is against Mythos Preview, not Mythos 5.
The outperform claim is against a different model set.
Fugu can call the same kinds of frontier models it is compared against.
Orchestration beating single models is not a new concept.
Pricing numbers should be verified live before budgeting.

For most teams, the right move is simple: benchmark Fugu Ultra against Fable 5 and Opus 4.8 on your own structured tasks. Watch where the coordination layer improves quality, measure the latency and cost, and verify provider routing if compliance matters.

Fugu may be valuable. Just evaluate it as an orchestrated system, not as a single-model replacement.

Frequently Asked Questions

Does Fugu Ultra beat Fable 5?

No. Sakana does not claim that. Sakana says Fugu Ultra stands shoulder-to-shoulder with Fable 5 and Mythos Preview, which is a parity claim. Because Fugu is an orchestrator that can call frontier models, any apparent win may come from routing and synthesis rather than single-model superiority. See Fable 5 vs Mythos 5 for the single-model comparison.

What does Sakana mean when it says Fugu outperforms Opus 4.8?

That is a separate application-level claim. Per Sakana, Fugu consistently outperforms Gemini 3.1 Pro, Opus 4.8, and GPT 5.5 on tasks like AutoResearch, one-shot chess, and financial time-series prediction. Since Fugu may call frontier models inside its loop, treat this as a system-level win, not a weights-level win.

Why does Sakana compare against Mythos Preview instead of Mythos 5?

Mythos Preview is the April 2026 frontier model Anthropic described as too dangerous to release. Mythos 5 is the current generally available version. Sakana named the older preview in its comparison. That can be defensible for reproducibility, but it means the parity claim is not measured against the current ceiling. See the Mythos-class model explained.

Is Fugu a single model or a group of models?

It is an orchestrated group of models exposed as one model-like API. Fugu is a trained conductor that delegates to multiple LLMs, including recursive copies of itself. Fable 5 and Mythos are single Anthropic models that answer from their own weights.

How do I test Fugu against Fable 5 myself?

Point an OpenAI-compatible client at your Sakana console base URL, set the model to fugu-ultra, and run the same prompts against Fable 5 or Opus 4.8. In Apidog, save each model request and compare outputs side by side.

How much does Fugu cost compared to Fable 5?

The pricing structure is confirmed: subscription tiers plus pay-as-you-go. The specific Fugu dollar figures are reported from secondary sources and remain unverified as of 2026-06-22, so confirm them in your console before budgeting. For reference, Anthropic lists Fable 5 at $10 per million input tokens and $50 per million output tokens. Our Sakana Fugu benchmarks piece tracks pricing as it gets confirmed.