Hassann

Posted on Jun 22 • Originally published at apidog.com

Sakana Fugu Pricing: Subscription Tiers, Pay-As-You-Go, and the Passthrough Cost Model

Sakana Fugu pricing has two confirmed structures: subscription tiers for everyday usage and pay-as-you-go pricing for heavier or enterprise workloads. The key billing mechanic is passthrough billing, where multiple agents can run inside one request without adding a separate fee per agent. The dollar figures below are reported by secondary sources because Sakana’s own release page explains the pricing structure but does not publish the numbers directly. Also note that Fugu is an orchestration system presented as a single model, which is what makes this billing model possible.


How Fugu pricing is structured

Fugu is not priced like a normal chat model. It acts as a conductor: a trained language model specialized in delegation. For each prompt, it can either answer directly or assemble a set of worker models, including recursive instances of itself.

That architecture affects how you should estimate cost.

Sakana confirms two purchase paths:

Subscription tiers for everyday work, where you pay a flat monthly fee for coding, code review, chatbots, and interactive services.

Pay-as-you-go pricing for heavier and enterprise workloads, where you pay per token. This is better suited to batch jobs, research runs, and spiky traffic.

Both the balanced fugu variant and the higher-quality fugu-ultra variant are exposed behind one OpenAI-compatible endpoint. You select the variant with a model ID string, and Fugu decides internally how much model capacity to use.

There is no standalone free tier reported. The closest reported offer is a launch promotion, covered below.

For implementation testing, treat Fugu like any other OpenAI-compatible API first: point your existing OpenAI client at the Fugu endpoint, send real requests, and log token usage per response. Apidog can help you inspect requests and responses call by call, which matters here because one Fugu request can fan out into multiple model calls behind the scenes.

Reported pricing: verify live before budgeting

The numbers below are reported from secondary sources. Before committing budget, verify current pricing inside the Sakana console.

Reported subscription tiers

Plan	Reported monthly price	Best fit
Entry	Reported $20 / month	Individual developers and light daily usage
Mid	Reported $100 / month	Teams with steady coding and review workloads
Top	Reported $200 / month	Power users and high-volume interactive services

The same subscription ladder reportedly applies to both Fugu and Fugu Ultra.

A launch promotion reportedly provides a free second month if you subscribe before the end of July 2026. If that promotion affects your decision, confirm it in the console first. Launch offers can change quickly, and this one is not confirmed on Sakana’s public release page.

Reported pay-as-you-go rates

Token type	Reported rate per 1M tokens	Reported rate per 1M tokens above 272K context
Input	Reported $5	Reported $10
Output	Reported $30	Reported $45
Cached input	Reported $0.50	Reported $1.00

The long-context column is the part to monitor. For requests above roughly 272K context tokens, the reported input and cached-input rates double, and the reported output rate rises by 50% (from $30 to $45 per million tokens).

This matters because orchestration prompts can grow quickly. Fugu may pass context between worker agents, so a research-grade Fugu Ultra request can enter the long-context pricing band faster than a standard single-model call.
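To make the two pricing bands concrete, here is a minimal sketch of a per-request estimator built on the reported rates. Treat it as an illustration, not Sakana's billing logic: the function name is hypothetical, the rates are the unverified figures from the table, and the rule that the long-context rates apply to every token once the context passes the threshold is an assumption that has not been confirmed.

```python
# Reported (unverified) PAYG rates in USD per 1M tokens, copied from the table.
STANDARD_RATES = {"input": 5.00, "output": 30.00, "cached_input": 0.50}
LONG_CONTEXT_RATES = {"input": 10.00, "output": 45.00, "cached_input": 1.00}
LONG_CONTEXT_THRESHOLD = 272_000  # reported threshold, in context tokens


def estimate_payg_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of one request from its token counts.

    Assumption: once uncached plus cached input exceeds the threshold,
    the long-context rates apply to every token in the request.
    """
    context_tokens = input_tokens + cached_input_tokens
    if context_tokens > LONG_CONTEXT_THRESHOLD:
        rates = LONG_CONTEXT_RATES
    else:
        rates = STANDARD_RATES
    cost = (
        input_tokens * rates["input"]
        + output_tokens * rates["output"]
        + cached_input_tokens * rates["cached_input"]
    ) / 1_000_000
    return round(cost, 6)


# Hypothetical requests: one in the standard band, one in the long-context band.
print(estimate_payg_cost(100_000, 10_000))
print(estimate_payg_cost(300_000, 10_000))
```

Feeding the token counts from real responses into a function like this shows how much of your spend comes from requests that cross the threshold.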

If you have already priced Claude Fable 5, the shape is familiar: input, output, cached input, and long-context behavior all affect the final bill.

Understand the Sakana margin

Passthrough billing does not mean “free orchestration.”

The base Fugu variant reportedly uses the underlying model’s rate without stacking a separate per-agent orchestration fee. However, Fugu Ultra and the PAYG rates are higher than the cheapest worker models because you are also paying for Sakana’s orchestration layer:

Routing decisions
Agent-to-agent communication
Recursive delegation
Final synthesis into one answer

The practical way to think about it:

The conductor itself is relatively cheap. The Trinity paper describes a sub-20,000-parameter coordinator optimized by evolution, and the Conductor paper describes a 7B model, trained with reinforcement learning, that its authors report beats Mixture-of-Agents at lower cost.
The models the conductor calls can be expensive. If Fugu decides your request needs a frontier model, you pay for those tokens plus Sakana’s orchestration margin.

So the cost question is not:

Is Fugu cheaper per token than a frontier model?

The better question is:

How often will my workload force Fugu to call expensive worker models?

If your traffic is mostly simple requests with occasional hard problems, Fugu may help control cost. If every request is frontier-grade reasoning, the orchestration layer may not reduce your bill.

Compare Fugu against published frontier pricing

Do not evaluate Fugu’s reported numbers in isolation. Anchor them against models with published pricing.

These figures come from Anthropic’s published pricing as of 2026-06-09:

Model	Input per 1M tokens	Output per 1M tokens	Description
Fable 5	$10	$50	Anthropic’s most powerful generally available model, a tier above Opus 4.8
Mythos 5	$10	$50	Same price band as Fable 5
Mythos Preview	$25	$125	April 2026 frontier model held back as “too dangerous to release”

Fugu’s reported PAYG rate of about $5 input and $30 output per million tokens is, on paper, half of Fable 5’s input rate and 60% of its output rate. But that comparison is incomplete.

Sakana claims Fugu Ultra “stands shoulder-to-shoulder with leading models like Fable 5 and Mythos Preview” across engineering, scientific, and reasoning benchmarks. That is a parity claim, not a claim that Fugu always beats those models.

Also remember the architecture: Fugu is an orchestrator. It can call other vendors’ frontier models and synthesize the result. The visible model ID is not the whole cost story; the effective cost depends on what Fugu decides to invoke internally.
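As a baseline, the list-price side of this comparison can be sketched in a few lines. The Fugu rates are the reported, unverified PAYG figures and the Fable 5 rates are the ones quoted in the table; the function name, the dictionary labels, and the 20,000 / 2,000 token request are hypothetical.

```python
# USD per 1M tokens. Fugu figures are reported by secondary sources;
# Fable 5 figures are the published rates quoted in this article.
RATES = {
    "fugu (reported PAYG)": {"input": 5.00, "output": 30.00},
    "Fable 5 (published)": {"input": 10.00, "output": 50.00},
}


def list_price(model, input_tokens, output_tokens):
    """Return the list-price cost in USD for the given token counts."""
    rate = RATES[model]
    return (
        input_tokens * rate["input"] + output_tokens * rate["output"]
    ) / 1_000_000


# Hypothetical request: 20,000 input tokens and 2,000 output tokens.
for model in RATES:
    print(model, round(list_price(model, 20_000, 2_000), 4))
```

This only compares equal token counts. If an orchestrated request reports more tokens than a direct call for the same prompt, the gap narrows, so substitute the usage numbers you actually observe.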

If you are still evaluating access, see this guide on how to access Sakana Fugu. The single-model versus model-orchestration distinction is critical when comparing cost.

Call Fugu through an OpenAI-compatible client

Fugu exposes an OpenAI-compatible endpoint. That means you can usually reuse your existing OpenAI client and change only:

API key
Base URL
Model ID

The base URL is not published on a public page. Copy the real value from the Sakana console instead of guessing or hardcoding a host.

from openai import OpenAI

# Copy the real base URL from console.sakana.ai after you log in.
client = OpenAI(
    api_key="YOUR_SAKANA_API_KEY",
    base_url="<YOUR_FUGU_BASE_URL_FROM_CONSOLE>",
)

# "fugu" selects the balanced, passthrough-billed variant.
# "fugu-ultra" selects the maximum-quality variant.
response = client.chat.completions.create(
    model="fugu",
    messages=[
        {
            "role": "user",
            "content": "Review this function for security issues."
        },
    ],
)

print(response.choices[0].message.content)

# Log this for cost tracking.
print(response.usage)

The reported model IDs are:

fugu
fugu-ultra

Sakana may also use dated suffixes, so confirm the exact IDs in your console.

The request and response shape follows the standard OpenAI chat completions contract. Any tool or SDK that supports that protocol should work with minimal changes.

For a full setup flow, see this walkthrough on how to use the Sakana Fugu API.

Track Fugu cost per request

Because Fugu is an orchestrator, you should log usage on every response.

At minimum, capture the three token counts from each response:

usage = response.usage

print({
    "prompt_tokens": usage.prompt_tokens,
    "completion_tokens": usage.completion_tokens,
    "total_tokens": usage.total_tokens,
})

For production usage, persist this data with your request metadata:

cost_log = {
    "model": "fugu",
    "prompt_tokens": response.usage.prompt_tokens,
    "completion_tokens": response.usage.completion_tokens,
    "total_tokens": response.usage.total_tokens,
    "request_type": "security_review",
}

Then aggregate by workload:

Code review
Chatbot requests
Research tasks
Batch analysis
Long-context requests
Fugu versus Fugu Ultra

This is the only reliable way to determine whether Fugu’s routing behavior saves money for your actual traffic.
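The aggregation step can be sketched as follows, assuming each log entry has the same keys as the cost_log dictionary shown earlier. The function name and the sample entries are hypothetical.

```python
from collections import defaultdict


def aggregate_usage(cost_logs):
    """Sum token counts per (request_type, model) pair."""
    totals = defaultdict(
        lambda: {"requests": 0, "prompt_tokens": 0, "completion_tokens": 0}
    )
    for entry in cost_logs:
        bucket = totals[(entry["request_type"], entry["model"])]
        bucket["requests"] += 1
        bucket["prompt_tokens"] += entry["prompt_tokens"]
        bucket["completion_tokens"] += entry["completion_tokens"]
    return dict(totals)


# Hypothetical log entries, shaped like the cost_log dictionary.
sample_logs = [
    {"model": "fugu", "request_type": "security_review",
     "prompt_tokens": 1200, "completion_tokens": 300, "total_tokens": 1500},
    {"model": "fugu", "request_type": "security_review",
     "prompt_tokens": 800, "completion_tokens": 200, "total_tokens": 1000},
    {"model": "fugu-ultra", "request_type": "research",
     "prompt_tokens": 5000, "completion_tokens": 2000, "total_tokens": 7000},
]

for key, bucket in aggregate_usage(sample_logs).items():
    print(key, bucket)
```

Keying on both the workload and the model ID keeps the Fugu versus Fugu Ultra comparison visible inside each workload.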

Subscription or PAYG: how to choose

Use a subscription when:

Your workload is steady
You mostly run interactive coding or review tasks
You want predictable monthly spend
You do not expect frequent long-context requests

Use pay-as-you-go when:

Traffic is spiky
You run batch jobs
You need research-grade or enterprise workloads
You want to meter before committing to a subscription tier

Watch for requests above roughly 272K context tokens. Based on reported pricing, those can move into a higher cost band quickly.
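One rough way to frame the choice is to ask how many requests per month it takes for pay-as-you-go spend to reach a plan's flat fee. The sketch below uses the reported, unverified prices and the standard-band PAYG rates. It ignores cached input, long-context pricing, and any usage limits on the subscription tiers, which have not been reported. The function name and the average request size are hypothetical.

```python
# Reported (unverified) figures from this article, in whole US dollars.
REPORTED_PLANS_USD = {"Entry": 20, "Mid": 100, "Top": 200}
INPUT_USD_PER_MILLION = 5
OUTPUT_USD_PER_MILLION = 30


def break_even_requests(plan_price_usd, avg_input_tokens, avg_output_tokens):
    """Requests per month at which PAYG spend reaches the plan price.

    tokens * (USD per 1M tokens) yields millionths of a dollar, so the
    arithmetic stays in integers and avoids float rounding.
    """
    micro_usd_per_request = (
        avg_input_tokens * INPUT_USD_PER_MILLION
        + avg_output_tokens * OUTPUT_USD_PER_MILLION
    )
    plan_micro_usd = plan_price_usd * 1_000_000
    # Ceiling division: round up to the next whole request.
    return -(-plan_micro_usd // micro_usd_per_request)


# Hypothetical average request: 4,000 input tokens and 1,000 output tokens.
for plan, price in REPORTED_PLANS_USD.items():
    print(plan, break_even_requests(price, 4_000, 1_000))
```

If your measured monthly volume sits well below the break-even count for a tier, metering on pay-as-you-go first is the lower-risk starting point.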

Frequently Asked Questions

Does Fugu have a free tier?

No standalone free tier has been reported.

The closest reported offer is a launch promotion that gives you a free second month if you subscribe before the end of July 2026. This promotion is not confirmed on Sakana’s release page, so verify it live at console.sakana.ai before relying on it.

Why can Fugu look cheaper per token but still cost more?

Because the visible rate is not always the whole cost.

Fugu is an orchestrator. It can delegate hard problems to other vendors’ models, and those internal calls affect the final usage. A single frontier model has one published rate with no hidden fan-out. That is why a direct Claude Fable 5 pricing breakdown is easier to model.

What is passthrough billing in Fugu?

Passthrough billing means the base Fugu variant is reportedly billed at the standard rate of the underlying model it calls, without adding a separate per-agent orchestration fee for every internal agent.

You still pay for orchestration through Sakana’s margin, especially on Fugu Ultra and PAYG pricing.

Should I pick Fugu or Fugu Ultra?

Start with fugu unless you have a clear need for maximum-quality reasoning.

Use fugu-ultra for requests where answer quality matters more than cost, such as:

Complex engineering analysis
Scientific reasoning
Multi-step research
High-stakes synthesis

For cost-sensitive workloads, run the same sample prompts through both variants and compare response.usage.
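A minimal sketch of that comparison, assuming an OpenAI-compatible client like the one configured earlier and the reported model IDs. The helper name is hypothetical, and it only sums the prompt_tokens and completion_tokens fields from the standard usage object.

```python
def compare_variants(client, prompts, models=("fugu", "fugu-ultra")):
    """Send every prompt to every model and sum the reported token usage.

    `client` is any object exposing the OpenAI-style
    client.chat.completions.create(model=..., messages=...) call.
    """
    totals = {
        model: {"prompt_tokens": 0, "completion_tokens": 0}
        for model in models
    }
    for model in models:
        for prompt in prompts:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            totals[model]["prompt_tokens"] += response.usage.prompt_tokens
            totals[model]["completion_tokens"] += response.usage.completion_tokens
    return totals
```

Call it as compare_variants(client, sample_prompts) with a list of prompts drawn from real traffic. Each call sends live, billable requests, so keep the sample small.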

How do I track what a Fugu request costs?

Log the usage field on every response. That is the best available signal for how many tokens the orchestrated request consumed.

If you are comparing Fugu with routing aggregators, this OpenRouter alternatives guide explains how to think about routing cost versus orchestration cost.

Bottom line

Fugu pricing favors workloads that are mostly easy, with occasional hard prompts that justify orchestration. It is less attractive if every request requires frontier-grade reasoning.

Before choosing a tier:

Confirm all reported prices in the Sakana console.
Run your real prompts through fugu and fugu-ultra.
Log response.usage for every request.
Watch for long-context requests above roughly 272K tokens.
Compare effective cost against direct frontier-model pricing.

To inspect Fugu token usage request by request while testing OpenAI-compatible calls, download Apidog.
