Owen

Posted on Jun 26 • Originally published at ofox.ai

Doubao Seed 2.1 API (2026): Pro & Turbo, No Volcano Signup

#ai #api #llm #bytedance

Call Doubao Seed 2.1 Pro ($0.884/$4.42 per million tokens) and Turbo (about half: $0.442/$2.212) from one endpoint. 256K context, no Volcano Engine account, no Chinese phone number.

Published Jun 26, 2026 · updated Jun 26, 2026

ByteDance announced Doubao Seed 2.1 on June 24, 2026, at the Volcano Engine FORCE conference. Two variants, Pro and Turbo, both at 256K context. The direct route to them runs through a Volcano Engine account, which wants a Chinese phone number and a mainland ID. This guide skips that. You call both variants from one OpenAI-compatible endpoint with a single key, and you flip between them by editing one string.

30-second answer

What you can do: Call Doubao Seed 2.1 Pro and Turbo from the standard OpenAI SDK (Python or Node), switch between them by changing the model string, and send image input to either one.

Time required: About 5 minutes if you already have an ofox key. About 10 if you need to sign up.

What you need: An ofox.ai API key, the openai SDK (any recent version), and the two model IDs: volcengine/doubao-seed-2.1-pro and volcengine/doubao-seed-2.1-turbo.

The short version of the pricing, since it drives every routing decision below: Pro is $0.884 input and $4.42 output per million tokens. Turbo is about half, $0.442 and $2.212. Cached input lowers the floor further, to $0.177 on Pro and $0.085 on Turbo. Same 256K window on both.

	Doubao Seed 2.1 Pro	Doubao Seed 2.1 Turbo
Model ID	`volcengine/doubao-seed-2.1-pro`	`volcengine/doubao-seed-2.1-turbo`
Input ($/M)	$0.884	$0.442
Output ($/M)	$4.42	$2.212
Cached input ($/M)	$0.177	$0.085
Context window	256,000	256,000
Max output	256,000	256,000
Modality	Text + image in, text out	Text + image in, text out
Positioning	Flagship deep thinking: complex coding, long-chain agents, multi-step delivery	Low cost, low latency: high-frequency enterprise traffic

Turbo's per-token price is half of Pro's on input and within rounding of half on output; cached input comes in slightly under half ($0.085 against $0.177). ByteDance says Turbo's features are complete and its performance is comparable to Pro. That is the vendor's framing, not a benchmark, so the routing question below is really "how confident are you that the cheap variant holds up on this specific task?"
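To make the price table concrete, here is a minimal sketch of a per-request cost estimate. The rates are copied from the table above. The names `PRICES` and `request_cost` are illustrative, and the billing model is an assumption: it treats cached tokens as a subset of input tokens that is charged at the cached rate instead of the full input rate. Check that against your actual invoice before relying on it.

```python
# USD per million tokens, copied from the pricing table above.
PRICES = {
    "pro":   {"input": 0.884, "output": 4.42,  "cached": 0.177},
    "turbo": {"input": 0.442, "output": 2.212, "cached": 0.085},
}


def request_cost(tier: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from
    cache, billed at the cached rate instead of the full input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    p = PRICES[tier]
    fresh = input_tokens - cached_tokens
    total = (fresh * p["input"]
             + cached_tokens * p["cached"]
             + output_tokens * p["output"])
    return total / 1_000_000


# Example: a 4,000-token prompt whose 3,000-token system block is cached.
print(request_cost("pro", 4_000, 500))
print(request_cost("pro", 4_000, 500, cached_tokens=3_000))
```

The second figure is lower than the first because three quarters of the prompt is charged at the cached rate.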

What You Can Do After This Setup (And What You Can't)

Setting expectations first, because nobody likes finding the wall after the build.

Here is what the setup gets you:

Call both Seed 2.1 variants through the OpenAI Chat Completions shape. Your existing OpenAI code mostly works after three edits: key, base URL, model.
Route by cost. Send cheap, high-frequency calls to Turbo and reserve Pro for the hard reasoning, with one string per call deciding which.
Send images. Both variants take an image_url content block, so a screenshot or a diagram goes in alongside text.
Bill in USD with an international card. No Chinese phone number, no mainland ID, no CNY top-up through Alipay or WeChat Pay.
Share one key across Doubao and the other models on the same gateway, which matters when you want a fallback that isn't another signup.

And here is what it does not get you:

ByteDance's exact mainland ARK list price. A gateway sits in the path, so the USD numbers here are the ofox rate, not the raw Volcano Engine rate. They track each other closely (roughly 6.8 RMB to the dollar against ByteDance's published ¥6 / ¥30 per-million numbers), but they are not identical.
A guarantee that "Turbo performs like Pro." That is ByteDance's framing from the launch. Test it on your own workload before you route production traffic on the strength of a marketing line.
An offline or self-hosted option. Seed 2.1 is an API-only model. There is no open-weight checkpoint to download.

If you ran the Doubao Seed 2.0 setup earlier this year, the muscle memory carries over. The difference is the lineup: 2.0 was a four-tier budget family (Pro, Lite, Mini, Code), 2.1 is a two-variant flagship split (a deep-thinking Pro and a half-price Turbo), and the model IDs changed accordingly.

Decision Frame: When to Use This Setup (and When NOT)

Before the steps, decide whether the gateway path is actually your path.

Use it when:

You're outside mainland China and don't want to chase a Chinese phone number and ID just to evaluate a model.
You want Pro and Turbo behind one key so cost routing is a string swap, not a second integration.
You already call other models through an OpenAI-compatible endpoint and want Doubao to join the same code path.

Skip it when:

You're a China-based team that already has a verified Volcano Engine account and only ever calls Doubao. Direct ARK avoids the gateway hop, and you've already paid the registration tax.
You need ByteDance's exact mainland list price to the fen for a procurement spreadsheet. Go to the source.
Your compliance rules demand a specific data-residency guarantee. Confirm that with the provider directly; a third-party gateway doesn't change where inference runs.

One stop rule: if all you wanted was a first successful call to confirm the model exists and answers, you can stop at Step 4. Step 5 repeats the call in Node, Step 6 covers Pro/Turbo switching, and the sections after that cover error handling, team setup, and routing.

System Requirements

Nothing heavy. The whole point of an OpenAI-compatible endpoint is that the client is boring.

Component	Requirement	Notes
Runtime	Python 3.8+ or Node.js 18+	Whatever your existing OpenAI SDK already runs on
SDK	`openai` (Python or JS)	Any recent version; the Chat Completions shape is stable
API key	One ofox.ai key (`sk-ofox-...`)	From the ofox dashboard after signup
Endpoint	`https://api.ofox.ai/v1`	The OpenAI-compatible base URL
Network	Outbound HTTPS	No VPN gymnastics, no mainland-only routing

You do not need the Volcano Engine SDK, a volces.com endpoint, or any ByteDance-specific client. The gateway normalizes the underlying API into the OpenAI shape.

Step-by-Step Installation

Step 1: Get an API key

Sign up at ofox.ai, open the dashboard, and create a key. It looks like sk-ofox-.... Keep it out of source control; an environment variable is the usual place.

export OFOX_API_KEY="sk-ofox-your-key-here"

Expected result: echo $OFOX_API_KEY prints your key in the current shell.

Step 2: Install the SDK

# Python
pip install openai

# or Node
npm install openai

Expected result: pip show openai (or npm ls openai) reports an installed version. Anything recent is fine; the request shape used here hasn't changed across the modern SDK line.

Step 3: Smoke-test the endpoint with curl

Before writing any code, confirm the key and endpoint talk to each other. This call hits Turbo because it's the cheaper one to test against.

curl https://api.ofox.ai/v1/chat/completions \
  -H "Authorization: Bearer $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "volcengine/doubao-seed-2.1-turbo",
    "messages": [{"role": "user", "content": "Reply with the single word: ready"}]
  }'

Expected result: a JSON body with choices[0].message.content containing ready. If you get a 401, the key is wrong or unset. If you get a 404 on the model, recheck the ID spelling (it's volcengine/doubao-seed-2.1-turbo, with a dot in 2.1, not a dash).

Step 4: First call from Python

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OFOX_API_KEY"],   # exported in Step 1; keeps the key out of source
    base_url="https://api.ofox.ai/v1",
)

resp = client.chat.completions.create(
    model="volcengine/doubao-seed-2.1-pro",
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences."}],
)
print(resp.choices[0].message.content)

Expected result: a two-sentence answer on your terminal. Three things differ from a stock OpenAI call: the api_key, the base_url, and the model. Streaming, tools, and structured output all use the same SDK methods you already know.

Step 5: Same call from Node

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OFOX_API_KEY,
  baseURL: "https://api.ofox.ai/v1",
});

const resp = await client.chat.completions.create({
  model: "volcengine/doubao-seed-2.1-pro",
  messages: [{ role: "user", content: "Explain MoE routing in two sentences." }],
});
console.log(resp.choices[0].message.content);

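Expected result: the same two-sentence answer. The JS SDK uses camelCase option names (apiKey, baseURL) where Python uses snake_case (api_key, base_url). That's the only spelling trap. The top-level await means this file has to run as an ES module.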

Step 6: Switch Pro and Turbo with one string

This is the part worth slowing down for, because it's the whole reason to run both behind one key. Nothing changes except the model value.

MODELS = {
    "pro":   "volcengine/doubao-seed-2.1-pro",
    "turbo": "volcengine/doubao-seed-2.1-turbo",
}

def ask(tier: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODELS[tier],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("turbo", "Summarize this ticket in one line."))   # cheap path
print(ask("pro",   "Plan a three-step refactor for this module."))  # hard path

Expected result: both calls return. The cheap summary goes through Turbo at $0.442/$2.212; the planning task goes through Pro at $0.884/$4.42. You decide per call which one pays.

Common Errors During Setup (and Fixes)

The failures here are almost all the same three categories: wrong key, wrong model string, wrong request shape. The table covers what actually shows up.

Symptom	Likely cause	Fix
`401 Unauthorized`	Key missing, expired, or with a stray space	Re-export the key; confirm the `Authorization: Bearer` header has no trailing whitespace
`404` on the model	Typo in the ID, usually `2-1` instead of `2.1`	Use the exact strings: `volcengine/doubao-seed-2.1-pro` / `volcengine/doubao-seed-2.1-turbo`
`Connection refused` / DNS error	Typo'd host in the base URL	Set base URL to `https://api.ofox.ai/v1` (note the `/v1`); a base URL still pointing at OpenAI usually surfaces as a `401` instead
`400` on an image request	`image_url` block malformed or missing the `data:` prefix on base64	Send `{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}`
Empty or truncated output	`max_tokens` set too low, or you're reading the wrong field	Raise `max_tokens`; read `choices[0].message.content`
`429 Too Many Requests`	Burst above your current rate allowance	Add exponential backoff; retry after the delay the response suggests
Slow first token on Pro	Deep-thinking model spends time before emitting	Expected on Pro for hard prompts; route latency-sensitive calls to Turbo instead
`model` works in curl, fails in SDK	SDK pinned to a stale base URL via env var	Check `OPENAI_BASE_URL`; the explicit `base_url`/`baseURL` argument should win, but a leftover env var can confuse older setups
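The backoff advice in the `429` row can be sketched as a small wrapper. This is one possible shape, not the gateway's prescribed retry policy: `call_with_backoff` is a hypothetical helper, the delays are arbitrary defaults, and it does not read any retry delay the response may suggest. It treats an exception as rate limiting when it carries `status_code == 429`, which is how the openai Python SDK's `RateLimitError` presents itself.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(); on a 429-style error, wait and retry with exponential backoff.

    Any other exception, or a 429 on the final attempt, is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            is_rate_limit = getattr(exc, "status_code", None) == 429
            if not is_rate_limit or attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 50% jitter
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Usage would look like `call_with_backoff(lambda: ask("turbo", prompt))`, with `ask` being the helper from Step 6. The openai SDK client also accepts a `max_retries` argument that handles simple cases without a wrapper.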

Team / Multi-Developer Configuration

Solo setup is one key in one environment variable. A team needs the key to be shared safely and the model choice to be consistent, so people aren't each hardcoding a different tier.

The pattern that holds up: keep the key in your secret manager, expose the endpoint and default tier through environment variables, and let a small config decide Pro versus Turbo per environment.

# .env.example (committed); real .env stays out of git
OFOX_API_KEY=          # pulled from the team secret manager, never committed
OFOX_BASE_URL=https://api.ofox.ai/v1
DOUBAO_TIER=turbo      # dev/staging default; prod can override to pro per route

Then read those instead of literals, so no developer pins a tier by accident:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OFOX_API_KEY"],
    base_url=os.environ.get("OFOX_BASE_URL", "https://api.ofox.ai/v1"),
)
DEFAULT_MODEL = f"volcengine/doubao-seed-2.1-{os.environ.get('DOUBAO_TIER', 'turbo')}"
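The f-string above will build a model ID from whatever `DOUBAO_TIER` contains, so a typo only shows up later as a `404`. A small guard can fail fast at startup instead. This is a sketch; `resolve_model` and `ALLOWED_TIERS` are illustrative names, not part of any SDK.

```python
import os

ALLOWED_TIERS = ("pro", "turbo")


def resolve_model(env=None) -> str:
    """Build the model ID from DOUBAO_TIER, rejecting unknown tiers."""
    env = os.environ if env is None else env
    tier = env.get("DOUBAO_TIER", "turbo").strip().lower()
    if tier not in ALLOWED_TIERS:
        raise ValueError(
            f"DOUBAO_TIER must be one of {ALLOWED_TIERS}, got {tier!r}"
        )
    return f"volcengine/doubao-seed-2.1-{tier}"


DEFAULT_MODEL = resolve_model()
```

Passing a plain dict as `env` makes the function easy to unit-test without touching the real environment.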

A few things that keep a team out of trouble:

Concern	Solo	Team
Key storage	One env var locally	Secret manager (Vault, AWS Secrets Manager, Doppler), injected at deploy
Tier choice	Hardcoded is fine	Driven by `DOUBAO_TIER` env var so dev defaults to Turbo, prod opts into Pro
Cost visibility	Eyeball the dashboard	Tag requests per service so the Pro/Turbo split is attributable
Onboarding	"Here's a key"	`.env.example` in the repo, key handed out through the secret manager only

The single-key, single-endpoint shape is what makes this cheap to administer. One credential to rotate, one base URL, and the only per-team decision is which tier each environment defaults to. For cost attribution, read the usage object on each response (prompt_tokens, completion_tokens) and log it against the tier you called; that's how you find out after a month whether your Pro/Turbo split matched your plan or quietly drifted toward the expensive variant. If you're standing up a broader gateway in front of several models, the multi-model router pattern covers the routing layer that sits above this.
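The usage logging described above might look like the following in-memory sketch. `UsageLedger` is a hypothetical helper; a real deployment would more likely emit these counts to its metrics or logging system. It assumes each response's `usage` object exposes `prompt_tokens` and `completion_tokens`, as the Chat Completions shape does, and tolerates a missing `usage`.

```python
from collections import defaultdict


class UsageLedger:
    """Accumulate request and token counts per tier."""

    def __init__(self):
        self.totals = defaultdict(
            lambda: {"requests": 0, "prompt_tokens": 0, "completion_tokens": 0}
        )

    def record(self, tier, usage):
        """Add one response's usage object to the tier's totals."""
        if usage is None:  # nothing to attribute
            return
        row = self.totals[tier]
        row["requests"] += 1
        row["prompt_tokens"] += usage.prompt_tokens
        row["completion_tokens"] += usage.completion_tokens

    def pro_share(self) -> float:
        """Fraction of recorded requests that went to Pro."""
        total = sum(r["requests"] for r in self.totals.values())
        return self.totals["pro"]["requests"] / total if total else 0.0
```

After each call, `ledger.record(tier, resp.usage)` keeps the totals current, and comparing `ledger.pro_share()` with the planned split shows whether traffic has drifted toward Pro.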

Advanced: Pro/Turbo Routing and Image Input

Cost-aware routing in one loop

A common production shape is a cheap first pass on Turbo with an escalation to Pro only when the cheap answer isn't good enough. The escalation rule is yours, and that is the part worth thinking about, since a bad rule either escalates everything (you've paid Pro prices for a Turbo-shaped problem) or never escalates (you ship Turbo answers on tasks that needed Pro). A confidence threshold, a length check, or a cheap validator pass are all reasonable triggers. The model swap itself is one line.

def answer(prompt: str, hard: bool) -> str:
    tier = "pro" if hard else "turbo"
    resp = client.chat.completions.create(
        model=f"volcengine/doubao-seed-2.1-{tier}",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
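The function above takes the hard/easy decision as an input. The "cheap first pass, escalate if needed" shape can be sketched with the model call and the validator injected as plain functions. Both `answer_with_escalation` and the `min_length` validator are illustrative; the length check in particular is deliberately crude and stands in for whatever rule fits your workload.

```python
from typing import Callable, Tuple


def answer_with_escalation(
    prompt: str,
    call_model: Callable[[str, str], str],
    good_enough: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try Turbo first; call Pro only if the validator rejects the draft.

    call_model(tier, prompt) returns the answer text.
    Returns (tier_used, answer).
    """
    draft = call_model("turbo", prompt)
    if good_enough(draft):
        return "turbo", draft
    return "pro", call_model("pro", prompt)


def min_length(n: int) -> Callable[[str], bool]:
    """Crude example validator: accept answers of at least n characters."""
    return lambda text: len(text.strip()) >= n
```

With the `ask(tier, prompt)` helper from Step 6, a call would be `answer_with_escalation(prompt, ask, min_length(200))`. An escalated request pays for both the Turbo draft and the Pro answer, so the validator's rejection rate feeds directly into the bill.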

The math is the reason this pays off. Take a workload of one million requests a month, each averaging 500 input and 500 output tokens. All-Pro, that's 500M input tokens at $0.884 per million ($442) plus 500M output tokens at $4.42 per million ($2,210), about $2,652 a month before cached-input savings. All-Turbo, the same volume lands near $1,327, about half the bill, because Turbo's input and output rates are half of Pro's to within rounding. Route 80 percent to Turbo and escalate the hard 20 percent to Pro, and you sit around $1,592, much closer to the Turbo floor than the Pro ceiling. The split is the lever, not the model. Cached input pushes it lower again on prompts that repeat a system block, since the cache rate is $0.177 on Pro and $0.085 on Turbo against the full input rate.
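The arithmetic in that paragraph can be reproduced with a few lines, which also makes it easy to plug in your own volumes and split. The rates come from the pricing table; `monthly_cost` is an illustrative name, and the sketch ignores cached input and assumes every request has the same token counts.

```python
# (input, output) USD per million tokens, from the pricing table.
RATES = {"pro": (0.884, 4.42), "turbo": (0.442, 2.212)}


def monthly_cost(requests: int, in_tokens: int, out_tokens: int,
                 pro_share: float) -> float:
    """Monthly USD cost when pro_share of requests go to Pro, the rest to Turbo."""
    def tier_cost(tier: str, n: float) -> float:
        in_rate, out_rate = RATES[tier]
        return n * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

    return (tier_cost("pro", requests * pro_share)
            + tier_cost("turbo", requests * (1 - pro_share)))


for share in (1.0, 0.0, 0.2):
    print(f"{share:.0%} Pro: ${monthly_cost(1_000_000, 500, 500, share):,.0f}")
```

For one million requests at 500 input and 500 output tokens, this prints $2,652 for all-Pro, $1,327 for all-Turbo, and $1,592 for the 80/20 split.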

Streaming a response

Long Pro answers feel slow if you wait for the whole completion. Stream tokens as they arrive; the only change is stream=True and iterating the chunks. The model swap stays a one-liner here too.

stream = client.chat.completions.create(
    model="volcengine/doubao-seed-2.1-pro",
    messages=[{"role": "user", "content": "Draft a migration plan, step by step."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Expected result: text prints incrementally instead of all at once. This matters more on Pro, where a deep-thinking pass can sit quiet for a beat before it starts emitting. Turbo's first token usually lands faster, which is the whole reason it exists.
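If you also need the full text after streaming (for logging or validation), the loop can be wrapped in a helper that collects the pieces. `collect_stream` is a hypothetical name. The guard on `chunk.choices` is defensive: some OpenAI-compatible servers send chunks with an empty `choices` list, for example a final usage-only chunk, and the article does not say whether this gateway does.

```python
from typing import Callable, Iterable, Optional


def collect_stream(chunks: Iterable,
                   on_delta: Optional[Callable[[str], None]] = None) -> str:
    """Join streamed deltas into the full text.

    Skips chunks with no choices and deltas with no content.
    on_delta, if given, is called with each text piece as it arrives.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a usage-only chunk
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
            if on_delta is not None:
                on_delta(delta)
    return "".join(parts)
```

Passing `on_delta=lambda t: print(t, end="", flush=True)` keeps the incremental printing from the example above while still returning the complete answer.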

Sending an image to either variant

Both variants are multimodal (text plus image in, text out). The content block is the standard OpenAI vision shape, so a screenshot or a chart goes straight in.

import base64

with open("screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="volcengine/doubao-seed-2.1-pro",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this error dialog say to do?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)

Expected result: a text answer that reads the image. Swap the model string to volcengine/doubao-seed-2.1-turbo and the same call runs on the cheaper variant. If you need image generation rather than understanding, that's a different ByteDance model.
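The example above hardcodes `image/png` in the data URI. For mixed file types, a small helper can infer the MIME type from the file extension. This is a sketch with a hypothetical `image_block` name; it relies on the extension being accurate and does not inspect the file's bytes, and which image formats the model accepts is not covered here.

```python
import base64
import mimetypes
from pathlib import Path


def image_block(path: str) -> dict:
    """Build an image_url content block with a base64 data URI for a local file."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    b64 = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}
```

The returned dict drops into the `content` list next to the text block, for example `[{"type": "text", "text": "..."}, image_block("chart.jpg")]`.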

Want to try it on a real workload? A single ofox key calls both Seed 2.1 variants plus the rest of the catalog from https://api.ofox.ai/v1, billed in USD with no Volcano Engine signup. Start on the Doubao Seed 2.1 Pro model page.

Alternatives

If the gateway path isn't right for you, the honest options:

ofox.ai (this guide). One key, both variants, USD billing, OpenAI-compatible endpoint, and other models on the same credential. Best when you want Doubao without a Volcano Engine account and want a fallback model on the same key. A gateway markup sits over mainland ARK pricing.
Volcano Engine ARK (direct). ByteDance's own endpoint. Cheapest list price if you can clear the registration: Chinese phone number, mainland ID, and CNY top-up. The right call for a verified China-based team that only uses Doubao.
Another OpenAI-compatible aggregator. Several gateways now carry Doubao. The integration shape is the same as here; compare on price, the breadth of the rest of the catalog, and billing currency.

FAQ

What is Doubao Seed 2.1 and when was it released? Doubao Seed 2.1 is ByteDance's next-generation model family, announced June 24, 2026 at the Volcano Engine FORCE conference. Two variants, Pro and Turbo, both at 256K context. Pro is the flagship deep-thinking model; Turbo is the low-cost, low-latency version for high-volume traffic.

How much does the Doubao Seed 2.1 API cost? Via ofox.ai in USD: Pro is $0.884 input and $4.42 output per million tokens, cached input $0.177. Turbo is about half: $0.442 input, $2.212 output, $0.085 cached input. Both carry 256K context and 256K max output.

Can I use Doubao Seed 2.1 without a Volcano Engine account? Yes. Direct ARK registration wants a Chinese phone number and mainland ID. The ofox.ai endpoint takes an email signup and an international card, and one key calls both variants plus other models.

What is the difference between Pro and Turbo? Pro is the flagship deep-thinking model for high-complexity work. Turbo costs about half as much per token and targets latency-sensitive, high-frequency production. ByteDance says Turbo's performance is comparable to Pro; treat that as a vendor claim and verify on your own tasks.

How do I switch between Pro and Turbo in code? Change one string. Both run on the same endpoint, so you swap model between volcengine/doubao-seed-2.1-pro and volcengine/doubao-seed-2.1-turbo. Everything else stays identical.

Does Doubao Seed 2.1 support image input? Yes. Both variants are multimodal (text plus image in, text out). Attach an image_url content block carrying a URL or a base64 data URI alongside your text prompt.

How does Doubao Seed 2.1 compare to GPT-5.5? ByteDance positions Seed 2.1's three upgrades (coding delivery, agent long-chain tasks, multimodal understanding) against GPT-5.5. That is the vendor framing from the launch, not an independent benchmark, so verify it before you depend on it.

What is the context window? 256,000 tokens of context and up to 256,000 tokens of max output, the same on both Pro and Turbo.

Originally published on ofox.ai/blog.
