DEV Community

Hassann

Posted on • Originally published at apidog.com

How to Use the ERNIE 5.1 API?

ERNIE 5.1 shipped on May 9, 2026, and Qianfan API support followed within a week. If you want to call the model from your own code, route tool calls through it, or wire it into an agent workflow with Apidog, this guide covers the implementation path: account setup, API key, request body, streaming, tool use, and error handling.


By the end, you’ll have working curl, Python, and Node.js examples, plus a request setup you can reproduce in Apidog.

If you have not read the ERNIE 5.1 launch breakdown, start there. It covers benchmarks and trade-offs versus DeepSeek V4 and Kimi K2.6. This article focuses on implementation.


Step 1: Get a Qianfan API key

ERNIE 5.1 is served through Baidu Intelligent Cloud’s Qianfan platform. There is no separate “ERNIE API”; requests go through Qianfan.

  1. Go to cloud.baidu.com and create or sign in to a Baidu Intelligent Cloud account.
  2. Open the Qianfan console: console.bce.baidu.com/qianfan.
  3. Go to API Key Management (API Key 管理).
  4. Click Create API Key.
  5. Select the workspace and grant access to the chat-completions service.
  6. Copy the key. It should look like this:
bce-v3/ALTAK-xxxx/xxxx

Store it in an environment variable instead of hardcoding it:

export QIANFAN_API_KEY="bce-v3/ALTAK-xxxx/xxxx"

Two implementation notes:

  • The newer v2 endpoint uses a single Bearer token. Avoid building new integrations on the older v1 OAuth access_token flow.
  • ERNIE 5.1 is paid from day one. Add a small balance before your first request.
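
A minimal sketch of reading the key at startup and failing fast when it is missing. The helper name load_qianfan_key and the prefix check are illustrative assumptions based on the sample key format above; they are not part of any Baidu SDK.

```python
import os


def load_qianfan_key(env=os.environ):
    """Return the Qianfan API key from the environment or raise a clear error."""
    key = env.get("QIANFAN_API_KEY", "").strip()
    if not key:
        raise RuntimeError("QIANFAN_API_KEY is not set")
    # Assumption: v2 keys start with "bce-v3/", as in the sample key above.
    if not key.startswith("bce-v3/"):
        raise RuntimeError("QIANFAN_API_KEY does not look like a v2 Bearer key")
    return key
```

Calling this once at process start surfaces a missing or malformed key before the first billed request, instead of as a 401 later.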

Step 2: Call the OpenAI-compatible endpoint with curl

Qianfan exposes an OpenAI-compatible chat-completions endpoint. If your code already supports OpenAI-style requests, you mainly need to change the base URL and model ID.

Base URL:

https://qianfan.baidubce.com/v2

Model ID:

ernie-5.1

For early-access features, you may also see:

ernie-5.1-preview

Minimum working request:

curl https://qianfan.baidubce.com/v2/chat/completions \
  -H "Authorization: Bearer $QIANFAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ernie-5.1",
    "messages": [
      {"role": "system", "content": "You are a senior API designer."},
      {"role": "user", "content": "Sketch a REST schema for a GitHub-style PR review API. Be concise."}
    ],
    "temperature": 0.3
  }'

Expected response shape:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1746780000,
  "model": "ernie-5.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 318,
    "total_tokens": 360
  }
}
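
If you are handling the raw HTTP body yourself rather than going through an SDK, the response shape above can be unpacked roughly like this. The function name summarize_completion is hypothetical, and the sketch assumes the body matches the shape shown.

```python
import json


def summarize_completion(raw_body):
    """Extract the assistant text, finish reason, and token count from a response body."""
    data = json.loads(raw_body)
    choice = data["choices"][0]
    usage = data.get("usage", {})  # usage may be absent on some error paths
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }
```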

Common first-run failures:

  • 401 Unauthorized: the key is wrong or expired.
  • 403 Forbidden: the key is valid, but ERNIE 5.1 is not enabled for the workspace.

If you get 403, return to the console and add ERNIE 5.1 to the workspace’s allowed models.

Step 3: Call ERNIE 5.1 from Python

Because the endpoint is OpenAI-compatible, the official openai Python SDK works. Configure it with your Qianfan key and base URL.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["QIANFAN_API_KEY"],
    base_url="https://qianfan.baidubce.com/v2",
)

response = client.chat.completions.create(
    model="ernie-5.1",
    messages=[
        {"role": "system", "content": "You explain APIs in plain English."},
        {"role": "user", "content": "Why would I use server-sent events over WebSockets for a chat UI?"},
    ],
    temperature=0.4,
)

print(response.choices[0].message.content)
print(f"\nTokens used: {response.usage.total_tokens}")

If your application already wraps the OpenAI SDK, testing ERNIE 5.1 can be a small configuration change. The same pattern works for DeepSeek’s API and many other OpenAI-compatible providers.

Step 4: Stream tokens for chat UIs

For user-facing chat interfaces, use streaming. Set stream: true and consume server-sent events.

stream = client.chat.completions.create(
    model="ernie-5.1",
    messages=[
        {"role": "user", "content": "Write a haiku about API versioning."}
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Curl equivalent for debugging:

curl https://qianfan.baidubce.com/v2/chat/completions \
  -H "Authorization: Bearer $QIANFAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ernie-5.1",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Stream a 3-sentence joke."}
    ]
  }' \
  --no-buffer

The stream uses the OpenAI-style SSE format:

data: {...}
data: {...}
data: [DONE]
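
When you consume the stream without the SDK (for example from a raw HTTP client), the lines can be reassembled along these lines. This is a sketch that assumes each data payload carries choices[0].delta.content as in the Python example above; collect_sse_text is a hypothetical helper name.

```python
import json


def collect_sse_text(lines):
    """Concatenate delta.content from OpenAI-style SSE lines until [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks may carry only metadata
        text = choices[0].get("delta", {}).get("content")
        if text:
            parts.append(text)
    return "".join(parts)
```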

Step 5: Use ERNIE 5.1 with tools

ERNIE 5.1 supports OpenAI-style function calling. Define tools, let the model choose one, execute the function in your own code, then send the tool result back to the model.

Example tool definition:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. Singapore"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    },
                },
                "required": ["city"],
            },
        },
    }
]

Send the request:

response = client.chat.completions.create(
    model="ernie-5.1",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo right now?"}
    ],
    tools=tools,
    tool_choice="auto",
)

tool_calls = response.choices[0].message.tool_calls

if tool_calls:
    call = tool_calls[0]
    print(f"Model wants to call: {call.function.name}({call.function.arguments})")

A typical tool loop looks like this:

  1. Send user message plus tools.
  2. Check response.choices[0].message.tool_calls.
  3. Execute the requested function in your application.
  4. Append the result as a tool role message.
  5. Call the model again.
  6. Stop when finish_reason == "stop" and no more tool calls are returned.
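
The steps above can be sketched as a small loop. This is an illustration, not Baidu's reference implementation: run_tool_loop, the registry dict, and max_rounds are hypothetical names, and create stands for any callable with the same keyword arguments as client.chat.completions.create.

```python
import json


def run_tool_loop(create, messages, tools, registry, max_rounds=5):
    """Call the model, execute requested tools, and repeat until it answers in text.

    registry maps tool names to local Python functions.
    """
    messages = list(messages)  # work on a copy so the caller's list is untouched
    for _ in range(max_rounds):
        response = create(
            model="ernie-5.1", messages=messages, tools=tools, tool_choice="auto"
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # no more tool calls: this is the final answer
        messages.append(message)  # keep the assistant turn that requested the tools
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            result = registry[call.function.name](**args)
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
            )
    raise RuntimeError("tool loop did not finish within max_rounds")
```

With the real SDK you would pass client.chat.completions.create as create and something like {"get_weather": get_weather} as registry. The max_rounds cap guards against a model that keeps requesting tools indefinitely.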

Defensive parsing is useful. ERNIE 5.1 returns tool arguments as a JSON string, so wrap parsing in try/except. If parsing fails, strip any Markdown code fence around the payload (three backticks, often followed by json) and retry.
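
A minimal sketch of that defensive parse. The helper name parse_tool_arguments is hypothetical, and the fence handling covers only the simple case where the whole payload is wrapped in a single fence.

```python
import json

FENCE = "`" * 3  # the Markdown code-fence marker (three backticks)


def parse_tool_arguments(raw):
    """Parse tool-call arguments, retrying once after stripping a Markdown fence."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        text = raw.strip()
        wrapped = text.startswith(FENCE) and text.endswith(FENCE)
        if not wrapped or len(text) < 2 * len(FENCE):
            raise  # not a fenced payload: surface the original error
        inner = text[len(FENCE):-len(FENCE)]
        # Drop an optional language tag such as "json" on the opening fence line.
        first_line, sep, rest = inner.partition("\n")
        if sep and first_line.strip().isalpha():
            inner = rest
        return json.loads(inner)
```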

Step 6: Call ERNIE 5.1 from Node.js

For Node.js projects using openai v5+, configure baseURL and apiKey.


import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.QIANFAN_API_KEY,
  baseURL: "https://qianfan.baidubce.com/v2",
});

const completion = await client.chat.completions.create({
  model: "ernie-5.1",
  messages: [
    {
      role: "user",
      content: "Return a JSON object with 3 API design tips.",
    },
  ],
  response_format: {
    type: "json_object",
  },
});

console.log(completion.choices[0].message.content);



response_format: { type: "json_object" } works for JSON-mode responses. Strict JSON schemas using json_schema are still being rolled out on Qianfan, so validate response shape in your application code instead of relying only on the model constraint.
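
A sketch of that application-side validation, written in Python for consistency with the earlier examples. It assumes you asked the model for an object shaped like {"tips": ["...", "...", "..."]}; the "tips" key and the parse_tips name are illustrative, since JSON mode guarantees syntax but not any particular shape.

```python
import json


def parse_tips(content, expected_count=3):
    """Validate a JSON-mode reply and return its list of tips."""
    data = json.loads(content)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    tips = data.get("tips")
    if not isinstance(tips, list) or len(tips) != expected_count:
        raise ValueError(f"expected a list of {expected_count} tips")
    if not all(isinstance(tip, str) and tip.strip() for tip in tips):
        raise ValueError("every tip must be a non-empty string")
    return tips
```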

Step 7: Test and compare providers with Apidog

If you are comparing ERNIE 5.1, DeepSeek V4, and Kimi K2.6, avoid managing everything manually from the terminal. Use Apidog to create one workspace, one folder per provider, and saved environments for API keys.

Setup:

  1. Open Apidog.
  2. Create a new project named LLM bake-off.


Add an environment with these variables:


QIANFAN_API_KEY
DEEPSEEK_API_KEY
MOONSHOT_API_KEY




Create three requests, one per provider:

| Provider | Model |
| --- | --- |
| Qianfan | ernie-5.1 |
| DeepSeek | deepseek-chat |
| Moonshot/Kimi | kimi-k2-6 |

Use the same messages array in all three requests. Then use Apidog’s Run feature to execute them and compare outputs.

Apidog also saves request history per environment, so you can rerun the same evaluation later against new model versions.
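
If you also want a scripted version of the same comparison, the request bodies can be generated from one shared messages array. This sketch only builds the payloads; build_bakeoff_requests is a hypothetical helper, and it uses the model IDs and environment variable names listed above without assuming any provider's base URL.

```python
def build_bakeoff_requests(messages, temperature=0.3):
    """Build one chat-completions body per provider from the same messages."""
    providers = {
        "Qianfan": ("QIANFAN_API_KEY", "ernie-5.1"),
        "DeepSeek": ("DEEPSEEK_API_KEY", "deepseek-chat"),
        "Moonshot/Kimi": ("MOONSHOT_API_KEY", "kimi-k2-6"),
    }
    return {
        name: {
            "key_env": key_env,  # which environment variable holds the key
            "body": {"model": model, "messages": messages, "temperature": temperature},
        }
        for name, (key_env, model) in providers.items()
    }
```

Keeping the messages and temperature identical across providers is what makes the outputs comparable.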

For more multi-provider testing patterns, see Test local LLMs as APIs and the GLM 5.1 API guide.

Pricing, rate limits, and quotas

Public Qianfan pricing for ERNIE 5.1 was not included in the release post. Check the live console rate card before quoting costs internally.

Practical notes:

  • Rate limits are workspace-scoped. New accounts usually start with a low QPS cap. Increase it from the console after testing.
  • Token usage is returned per response. Log prompt_tokens, completion_tokens, and total_tokens from the usage field.
  • Prompt caching is not automatic. Qianfan does not currently expose a prompt-caching primitive for ERNIE 5.1. If your system prompt is 2,000 tokens, you pay for it on each call.
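
A small sketch of tracking usage across calls and estimating what a resent system prompt costs in tokens. UsageTotals and repeated_prompt_tokens are hypothetical names; prices are deliberately left out because the rate card was not published with the release.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running totals built from each response's usage field."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    calls: int = 0

    def add(self, usage):
        """Accumulate one usage dict, e.g. response.usage.model_dump() from the SDK."""
        self.prompt_tokens += usage["prompt_tokens"]
        self.completion_tokens += usage["completion_tokens"]
        self.calls += 1

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens


def repeated_prompt_tokens(system_prompt_tokens, calls):
    """Prompt tokens billed for resending the same system prompt on every call."""
    return system_prompt_tokens * calls
```

For example, repeated_prompt_tokens(2000, 1000) returns 2000000: a 2,000-token system prompt sent on 1,000 calls accounts for two million billed prompt tokens on its own.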

Error handling checklist

These are the errors you are most likely to hit:

| Status | Meaning | Fix |
| --- | --- | --- |
| 401 | Bearer token is wrong or expired | Regenerate the key from the console |
| 403 | Model is not enabled on the workspace | Add ERNIE 5.1 in the console |
| 429 | Rate limit hit | Use backoff and retry with jitter |
| 400 invalid messages | Invalid message-role ordering | Ensure valid user/assistant/tool ordering |
| 500 / 502 | Qianfan-side failure | Retry once, then check status/support |

Wrap calls in exponential backoff with a maximum of three attempts. In production, also log the request_id from response headers because Baidu support may need it for debugging.
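
The retry decision can be isolated into a small pure function, which makes the policy easy to unit-test. This is a sketch under stated assumptions: next_delay and RETRYABLE are hypothetical names, and it applies the three-attempt cap to every retryable status rather than the stricter retry-once advice for 500 / 502.

```python
import random

RETRYABLE = {429, 500, 502}  # statuses treated as retryable in the checklist


def next_delay(status, attempt, max_attempts=3, rng=random.random):
    """Return seconds to wait before retrying, or None if the call should not be retried.

    attempt is zero-based: 0 means the first attempt just failed.
    """
    if status not in RETRYABLE or attempt >= max_attempts - 1:
        return None
    # Exponential backoff (1s, 2s, 4s, ...) plus up to one second of jitter.
    return (2 ** attempt) + rng()
```

Client errors such as 400, 401, and 403 return None immediately, because retrying them without changing the request or the key cannot succeed.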

Minimal production-ready Python wrapper

Here is a small wrapper that handles basic retries for rate limits and transient server errors:


import os
import random
import time

from openai import OpenAI, RateLimitError, APIStatusError

client = OpenAI(
    api_key=os.environ["QIANFAN_API_KEY"],
    base_url="https://qianfan.baidubce.com/v2",
)

def chat(messages, *, model="ernie-5.1", temperature=0.3, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=temperature,
            )
        except RateLimitError:
            # 429: exponential backoff with jitter; skip the sleep after the last attempt.
            if attempt < max_retries - 1:
                time.sleep((2 ** attempt) + random.random())
        except APIStatusError as e:
            # APIStatusError always carries status_code (plain APIError may not).
            # Retry only 5xx responses; re-raise everything else immediately.
            if e.status_code >= 500 and attempt < max_retries - 1:
                time.sleep(1 + attempt)
                continue
            raise

    raise RuntimeError("ERNIE 5.1 retries exhausted")



Use this as the base layer. Add streaming and tool-loop handling on top.

Frequently asked questions

Is the ERNIE 5.1 API free?

No. Qianfan is pay-as-you-go. There is no permanent free tier, although new accounts may sometimes receive trial credits. For free experimentation, use the ernie.baidu.com chat UI or review these free LLM options.

Can I run ERNIE 5.1 locally?

No. Public weights are not available. If on-prem deployment is required, see how to run DeepSeek V4 locally or this guide to the best local LLMs in 2026.

Does the OpenAI SDK work without changes?

Mostly yes. Set:


base_url = https://qianfan.baidubce.com/v2
api_key = your Qianfan key
model = ernie-5.1



Function calling, streaming, and response_format: json_object work. Strict json_schema validation is still rolling out.

How does ERNIE 5.1 handle Chinese vs English prompts?

Both are first-class. The Arena Search score of 1,223 came from a mixed-language voter pool. For technical English tasks such as code and API design, it is competitive with closed frontier models. For Chinese creative writing, it is best-in-class among Chinese models.

What is the max output length?

It has not been officially published. In practice, single-turn responses cap around 8K tokens before the model wraps up. For long-form generation, chunk the task and continue.
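
One way to chunk a long task is to request each section in its own call and join the results. This is a rough sketch: generate_in_chunks is a hypothetical helper, create stands for any callable with the keyword arguments of client.chat.completions.create, and the prompts are placeholders you would tune.

```python
def generate_in_chunks(create, task, sections):
    """Generate a long document one section at a time and join the pieces."""
    parts = []
    for title in sections:
        messages = [
            {
                "role": "system",
                "content": f"You are writing one section of a longer document. Overall task: {task}",
            },
            {"role": "user", "content": f"Write only the section titled: {title}"},
        ]
        response = create(model="ernie-5.1", messages=messages)
        parts.append(response.choices[0].message.content)
    return "\n\n".join(parts)
```

Each call stays well under the observed output cap, at the cost of resending the task description with every section.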

Building an agent on ERNIE 5.1? Download Apidog and use the OpenAI-compatible request collection to mock, test, and document the Qianfan endpoint alongside the rest of your services.
