To test the Sakana Fugu API in Apidog, create a POST request to Fugu’s OpenAI-compatible /chat/completions endpoint, add Authorization: Bearer <key>, and send a standard chat-completions payload with either the fugu or fugu-ultra model. Because Fugu exposes one OpenAI-compatible endpoint, any OpenAI-style client or API tool can call it without changing SDKs. Apidog is useful here because you can save request variants, inspect streaming SSE deltas, compare response history, and measure latency in one workspace.
If you want the code-first integration path, use the companion guide to using the Sakana Fugu API. This article focuses on testing and observing Fugu inside Apidog.
What you are testing with Fugu
Fugu is not a plain chat model. Per Sakana, it is a multi-agent orchestration system exposed as a single foundation model behind one API. A trained language model handles delegation, agent communication, and synthesis, then dynamically coordinates multiple LLMs, including recursive instances of itself.
For more context on the orchestration model, see the explainer on what Sakana Fugu is.
That design changes what you should test. One request may be answered directly, or Fugu may coordinate a team of models behind the scenes. You still receive one response, but the work underneath may involve several model calls.
In Apidog, focus on:
- Latency: a proxy for whether Fugu answered directly or performed an orchestration hop.
- Streaming behavior: especially the delay before the first SSE delta.
- usage object: token accounting for the parent Fugu request.
- Response quality: compare the balanced and Ultra variants with the same prompt.
Fugu exposes two main variants through the same endpoint:
- fugu: balanced, lower-latency variant for everyday coding, code review, chatbots, and interactive services.
- fugu-ultra: quality-focused variant for AI research, paper reproduction, cybersecurity analysis, and literature or patent investigation.
The beta and early coverage used the label “Fugu Mini.” The release page now leads with “Fugu” and “Fugu Ultra,” so use those names unless your console shows otherwise.
Get the base URL and API key
Fugu is available behind Sakana’s console. Sign in at console.sakana.ai with Google or email, then copy:
- Your API key
- Your Fugu base URL
- The currently supported model IDs
As of 2026-06-22, Sakana has not published the base URL on a public page. Do not guess it. Copy the exact host from the console and store it as a variable.
In the examples below, replace:
<YOUR_FUGU_BASE_URL_FROM_CONSOLE>
with the value shown in your Sakana console.
Also check the console for current access details. Availability, self-serve sign-up, and any regional restrictions may change.
Set up the Fugu request in Apidog
Create a new Apidog project, then create a new HTTP request.
1. Create environment variables
Do not paste secrets directly into request URLs or headers. Create an Apidog environment such as Fugu Prod, then add:
| Variable | Value |
|---|---|
| fugu_base_url | Your Fugu base URL from the Sakana console |
| fugu_key | Your Sakana API key |
Your request can then use:
{{fugu_base_url}}/chat/completions
and:
Bearer {{fugu_key}}
This makes it easy to switch between staging and production keys without editing each saved request.
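To make the substitution concrete, here is a minimal Python sketch of how a double-brace placeholder could be resolved against an environment. It is an illustration only, not Apidog's implementation: the `resolve` function and the `ENV` dictionary are hypothetical, and the values are placeholders you would replace with what your console shows.

```python
import re

# Hypothetical stand-in for an Apidog environment; both values are placeholders.
ENV = {
    "fugu_base_url": "<YOUR_FUGU_BASE_URL_FROM_CONSOLE>",
    "fugu_key": "<YOUR_SAKANA_API_KEY>",
}

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve(template, env):
    """Replace each {{name}} with env[name]; raise on an undefined name."""
    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"undefined variable: {name}")
        return env[name]

    return PLACEHOLDER.sub(lookup, template)


if __name__ == "__main__":
    print(resolve("{{fugu_base_url}}/chat/completions", ENV))
    print(resolve("Bearer {{fugu_key}}", ENV))
```

Failing on an undefined name is a deliberate choice in this sketch: a silently empty base URL or key tends to produce confusing 401 or DNS errors later.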
If you have used OpenAI-compatible provider routing before, this is the same pattern shown in the Claude Code with OpenRouter walkthrough: one base URL and one bearer token redirect an OpenAI-style client to another backend.
2. Configure the request
Use:
POST {{fugu_base_url}}/chat/completions
Add headers:
Authorization: Bearer {{fugu_key}}
Content-Type: application/json
Add this JSON body:
{
  "model": "fugu",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise API testing assistant."
    },
    {
      "role": "user",
      "content": "Summarize what an SSE delta is in two sentences."
    }
  ],
  "stream": false
}
The payload shape follows the OpenAI chat completions reference. That is the main benefit of Fugu’s OpenAI-compatible endpoint: the request format stays familiar.
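If you want to cross-check the Apidog request from a script, the same call can be sketched with Python's standard library. This is a minimal sketch under assumptions: the environment variable names `FUGU_BASE_URL` and `FUGU_KEY` and the helper names `build_chat_request` and `send` are hypothetical, and the base URL must come from your Sakana console.

```python
import json
import os
import urllib.request


def build_chat_request(base_url, api_key, model, messages, stream=False):
    """Build (but do not send) a POST to {base_url}/chat/completions."""
    body = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=120):
    """Send a non-streaming request and return the parsed JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical variable names; copy the real values from the console.
    base_url = os.environ.get("FUGU_BASE_URL")
    api_key = os.environ.get("FUGU_KEY")
    if base_url and api_key:
        messages = [
            {"role": "system", "content": "You are a concise API testing assistant."},
            {"role": "user", "content": "Summarize what an SSE delta is in two sentences."},
        ]
        reply = send(build_chat_request(base_url, api_key, "fugu", messages))
        print(reply["choices"][0]["message"]["content"])
        print(reply.get("usage"))
    else:
        print("Set FUGU_BASE_URL and FUGU_KEY to send a live request.")
```

Keeping request construction separate from sending lets you inspect the URL, headers, and body without making a network call.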
Model IDs reported at launch include:
fugu
fugu-ultra
You may also see dated IDs such as:
fugu-ultra-20260615
Confirm the exact IDs in the Sakana console instead of hardcoding dated values, because dated IDs can rotate.
Send the request. You should receive a chat completion object with a choices array and a usage block.
Save this request as:
Fugu balanced
Create an Ultra request variant
Duplicate the saved request and change only the model field:
{
  "model": "fugu-ultra",
  "messages": [
    {
      "role": "user",
      "content": "Reproduce the core result of the Trinity coordinator paper in plain language and note one limitation."
    }
  ],
  "stream": false
}
Save it as:
Fugu Ultra
Now you have two saved requests:
| Request | Endpoint | Model |
|---|---|---|
| Fugu balanced | /chat/completions | fugu |
| Fugu Ultra | /chat/completions | fugu-ultra |
This setup makes comparison simple:
- Send the same prompt to both requests.
- Compare response time.
- Compare output quality.
- Diff response bodies.
- Re-run prompts and inspect response history.
For a broader approach to sequencing and comparing API calls, see the API test orchestration guide.
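If you export the two answers and want a text diff outside Apidog, a minimal sketch with Python's `difflib` looks like this. The function name `diff_answers` is hypothetical, and the inputs are assumed to be the assistant message text copied from each response.

```python
import difflib


def diff_answers(balanced, ultra):
    """Return unified-diff lines between the two answer texts."""
    return list(
        difflib.unified_diff(
            balanced.splitlines(),
            ultra.splitlines(),
            fromfile="fugu",
            tofile="fugu-ultra",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Made-up answers for illustration only.
    a = "An SSE delta is a partial update.\nIt arrives as a data line."
    b = "An SSE delta is a partial update.\nIt arrives as one event in a stream."
    print("\n".join(diff_answers(a, b)))
```

A line diff is a coarse tool for free-form model output, so treat it as a quick way to spot structural differences rather than a quality score.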
Inspect SSE streaming deltas
Streaming is useful because Fugu may spend time coordinating agents before tokens start arriving.
Change stream to true:
{
  "model": "fugu-ultra",
  "messages": [
    {
      "role": "user",
      "content": "Walk through a one-shot chess opening analysis, step by step."
    }
  ],
  "stream": true
}
The response is returned as:
Content-Type: text/event-stream
You will receive a sequence of data: chunks:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" Sicilian"},"finish_reason":null}]}
data: [DONE]
The delta object carries incremental token content.
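To see how those chunks turn back into text, here is a minimal Python sketch that parses `data:` lines and joins the `delta.content` fragments. It assumes the OpenAI-style chunk shape shown above; the function name `iter_delta_content` is hypothetical.

```python
import json


def iter_delta_content(lines):
    """Yield content fragments from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    sample = [
        'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}',
        'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" Sicilian"},"finish_reason":null}]}',
        "data: [DONE]",
    ]
    print("".join(iter_delta_content(sample)))  # prints "The Sicilian"
```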
Typical stream behavior:
- The first chunk may include the assistant role.
- Later chunks include content fragments.
- The final chunk sets finish_reason.
- The stream ends with:
data: [DONE]
In Apidog, watch the time before the first visible delta. A long pause before streaming starts can indicate that Fugu is coordinating agents before producing the answer. The balanced fugu variant typically starts sooner because it targets lower latency, while fugu-ultra may spend more time coordinating for higher answer quality.
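The same measurement can be sketched in Python if you capture the stream yourself. This is a sketch under assumptions: `measure_stream` is a hypothetical helper, it expects an iterable of decoded SSE lines in the OpenAI-style chunk shape, and the injectable `clock` parameter exists only so the timing logic can be checked without a live connection.

```python
import json
import time


def measure_stream(lines, clock=time.perf_counter):
    """Return (seconds_to_first_content_delta, total_seconds) for an SSE stream.

    The first value is None if no chunk carried content before the stream ended.
    """
    start = clock()
    first = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        delta = choices[0].get("delta", {}) if choices else {}
        # A role-only first chunk does not count as visible output.
        if first is None and delta.get("content"):
            first = clock() - start
    return first, clock() - start
```

In a live run you would pass the response's line iterator, decoded to text, and start the call immediately before sending the request so that connection time is included.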
Read the usage object
For non-streaming calls, inspect the usage block in the JSON response:
{
  "usage": {
    "prompt_tokens": 38,
    "completion_tokens": 412,
    "total_tokens": 450
  }
}
This shows token accounting for your parent request to Fugu.
Important limitation: Fugu is an orchestrator that can call other frontier models and recursive instances of itself. The usage object you see in the response is not necessarily a full trace of every downstream model call.
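For logging across runs, a minimal Python sketch can pull the three counters out of a saved response and check that they add up. The helper name `summarize_usage` is hypothetical, and the check covers only the parent-request counters shown above, not any downstream calls.

```python
def summarize_usage(response):
    """Extract token counts from a chat completion dict and sanity-check the total."""
    usage = response.get("usage") or {}
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    total = usage.get("total_tokens", prompt + completion)
    return {
        "prompt": prompt,
        "completion": completion,
        "total": total,
        "consistent": total == prompt + completion,
    }


if __name__ == "__main__":
    sample = {"usage": {"prompt_tokens": 38, "completion_tokens": 412, "total_tokens": 450}}
    print(summarize_usage(sample))
```

If `consistent` comes back false on a real response, that is worth noting before you draw cost conclusions, since the total may then include tokens the other two fields do not itemize.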
Per Sakana, pricing uses subscription tiers for everyday usage plus pay-as-you-go plans for heavier and enterprise workloads. Check the console or Sakana pricing page for current billing details.
If you want a single-model comparison baseline in the same Apidog project, the Claude Fable 5 API guide covers an endpoint you can test alongside Fugu.
Measure orchestration hop latency
To measure the visible orchestration cost:
- Open Fugu balanced.
- Send a fixed prompt.
- Record Apidog’s response time.
- Open Fugu Ultra.
- Send the same prompt.
- Record the response time.
- Repeat several times.
Example prompt:
Analyze this Python function for correctness, performance, and edge cases. Suggest a safer implementation.
Use the same messages array for both models:
{
  "messages": [
    {
      "role": "user",
      "content": "Analyze this Python function for correctness, performance, and edge cases. Suggest a safer implementation."
    }
  ]
}
Only change:
"model": "fugu"
to:
"model": "fugu-ultra"
You will usually see the balanced variant return faster. Per Sakana, fugu targets low latency and interactive usage, while fugu-ultra targets maximum quality for research-grade tasks.
The latency difference is your practical signal of orchestration overhead. If Ultra takes longer, that extra time may reflect deeper coordination instead of a single-pass answer.
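Once you have several response times per variant, a short Python sketch can summarize them. The function name `compare_latency` is hypothetical, and the numbers in the usage example are made up for illustration, not measurements of Fugu.

```python
import statistics


def compare_latency(balanced_ms, ultra_ms):
    """Compare median response times (milliseconds) for the two variants."""
    balanced = statistics.median(balanced_ms)
    ultra = statistics.median(ultra_ms)
    return {
        "balanced_median_ms": balanced,
        "ultra_median_ms": ultra,
        "overhead_ms": ultra - balanced,
    }


if __name__ == "__main__":
    # Made-up sample timings copied by hand from a response history.
    print(compare_latency([900, 1100, 1000], [4000, 3500, 4500]))
```

The median is used here because a single slow run, for example a cold connection, would skew a mean when you only have a handful of samples.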
Good stress-test prompts include tasks from Sakana’s stated application areas:
- AutoResearch
- Mechanical design
- Financial time-series prediction
- One-shot chess
- Paper reproduction
- Cybersecurity analysis
Sakana states that Fugu performs strongly against models such as Gemini 3.1 Pro, Opus 4.8, and GPT 5.5 on specific applications. Read those claims carefully: because Fugu is a model-of-models orchestrator, a benchmark result may come from calling and synthesizing outputs from other models, not from a single standalone model. Test with your own prompts and compare outputs directly.
Probe routing and governance behavior
Sakana’s release page describes agent-pool mechanics that are worth testing when available in your console:
- Swappable agents
- Agent opt-out for data or compliance reasons
- Dynamic routing around provider restrictions
If your console exposes agent-pool controls, run this workflow:
- Save a baseline request in Apidog.
- Send the request and save the response.
- Change the eligible agent pool in the console.
- Re-run the same saved request.
- Compare latency, output, and usage.
- Diff responses in Apidog.
This gives you a repeatable way to observe how routing changes affect the final answer.
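If you record each run outside Apidog, the before-and-after comparison can be sketched in Python. The record shape used here (`latency_ms`, `content`, `usage`) is a hypothetical convention for your own notes, not a format that Apidog or Sakana defines.

```python
USAGE_KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def compare_runs(baseline, rerun):
    """Summarize what changed between two recorded runs of the same request."""
    return {
        "latency_delta_ms": rerun["latency_ms"] - baseline["latency_ms"],
        "content_changed": rerun["content"] != baseline["content"],
        "usage_delta": {
            key: rerun["usage"].get(key, 0) - baseline["usage"].get(key, 0)
            for key in USAGE_KEYS
        },
    }


if __name__ == "__main__":
    # Made-up records for illustration only.
    before = {"latency_ms": 1200, "content": "Answer A",
              "usage": {"prompt_tokens": 38, "completion_tokens": 400, "total_tokens": 438}}
    after = {"latency_ms": 2100, "content": "Answer B",
             "usage": {"prompt_tokens": 38, "completion_tokens": 412, "total_tokens": 450}}
    print(compare_runs(before, after))
```

Because model output varies between identical calls, repeat each configuration a few times before attributing a change to the agent pool.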
The research lineage behind this approach includes two ICLR 2026 papers:
- Trinity, “An Evolved LLM Coordinator”, a sub-20K-parameter coordinator optimized by derivative-free evolution with Thinker, Worker, and Verifier roles.
- Conductor, “Learning to Orchestrate Agents in Natural Language”, a 7B model trained with reinforcement learning that learns its own communication structure and claims to beat Mixture-of-Agents at lower cost.
They use different methods and sizes, so do not conflate them. Also avoid mapping a specific parameter count directly onto the shipped Fugu product unless Sakana states it officially.
Fit this into your Apidog workflow
Testing Fugu in Apidog is more useful than a one-off curl because the workflow is repeatable:
- Store fugu_base_url and fugu_key as environment variables.
- Create a saved fugu request.
- Duplicate it for fugu-ultra.
- Send identical prompts to both.
- Inspect latency and usage.
- Enable streaming and watch SSE deltas.
- Diff responses across runs.
- Update one environment variable when keys, hosts, or model IDs change.
Sakana takes its name from the Japanese word for fish, and the school-of-fish branding fits an orchestrator that coordinates many models into one answer. Fugu, the pufferfish, is a delicacy that is safe only when prepared carefully. Treat that as a useful metaphor for agent routing, not as a benchmark.
Point your OpenAI-compatible request at Fugu, save both model variants, and use Apidog to observe how latency, streaming, and response quality change under real prompts.
Frequently Asked Questions
What base URL do I use to test Fugu in Apidog?
Copy the base URL from console.sakana.ai after you sign in. Sakana has not published the host on a public page as of 2026-06-22, so do not guess it. Store it as an Apidog environment variable and reference it as:
{{fugu_base_url}}/chat/completions
Do I need a special SDK to call Fugu?
No. Fugu exposes an OpenAI-compatible endpoint, so any OpenAI client or API tool that supports the chat completions format can call it with a base URL and key change. The same redirect pattern is shown in the Claude Code with OpenRouter guide.
How do I test streaming responses from Fugu?
Set:
"stream": true
The response arrives as text/event-stream with data: chunks carrying incremental delta content. The stream ends with:
data: [DONE]
Apidog renders the SSE stream live, so you can watch deltas arrive in real time.
What is the difference between Fugu and Fugu Ultra?
fugu is the balanced, lower-latency variant for everyday coding, review, chatbots, and interactive services.
fugu-ultra targets maximum answer quality for research, paper reproduction, security analysis, and deeper reasoning tasks.
Both use the same endpoint. The difference is the model field.
Why is Fugu Ultra slower than the balanced variant?
The extra latency most likely reflects orchestration overhead. Per Sakana, Fugu can answer directly or coordinate multiple models. Ultra leans toward deeper coordination for higher-quality answers, so it may take longer before returning a final response or starting a stream.
Are Fugu’s benchmark wins single-model results?
Not necessarily. Fugu is an orchestrator that can call other vendors’ frontier models as well as recursive instances of itself. A result that “beats Opus 4.8,” per Sakana, may come from calling Opus and synthesizing its output. Treat Fugu’s numbers as model-of-models results and verify against your own prompts.

