GPT-5.5 launched on April 23, 2026. OpenAI immediately made the model available in ChatGPT and Codex, with the Responses and Chat Completions APIs coming “very soon.” This guide covers both: how to call GPT-5.5 as soon as API keys work, and how to access it today via the Codex sign-in path.
This article includes endpoint shapes, authentication, Python and Node examples, the parameter table, pricing breakdown, error handling, and a testing workflow in Apidog to help you save credits while iterating.
For a product overview, see What is GPT-5.5. For a free-tier guide, see How to use GPT-5.5 API for free.
## TL;DR
- GPT-5.5 is available via the Responses and Chat Completions endpoints. Model IDs: `gpt-5.5` and `gpt-5.5-pro`.
- API pricing: $5 / M input, $30 / M output; Pro: $30 / M input, $180 / M output.
- Context window: 1M tokens (API), 400K (Codex CLI).
- Until API GA, access GPT-5.5 via Codex with ChatGPT sign-in.
- Use Apidog to pre-build collections; the request shape matches GPT-5.4 with a new model ID and an expanded `reasoning` block.
## Prerequisites
Before making your first call, ensure:
- OpenAI developer account with a billable tier. ChatGPT Plus/Pro is separate from API billing; to use GPT-5.5 in both the UI and the API, you need both.
- API key with GPT-5 access. Prefer project-scoped keys for production workloads.
- SDK version supporting `gpt-5.5`: Python `openai>=2.1.0`, Node `openai@5.1.0` or newer.
- API client for easy request replay. Use curl for one-off calls, then switch to Apidog or similar for iteration.
Export your API key:
```bash
export OPENAI_API_KEY="sk-proj-..."
```
## Endpoint and authentication
GPT-5.5 uses the same endpoints as GPT-5:
```
POST https://api.openai.com/v1/responses
POST https://api.openai.com/v1/chat/completions
```
The Responses API is tool-aware (it supports thinking mode, web search, and computer use). Chat Completions maintains compatibility with legacy integrations.
Authenticate using a bearer token. Every request sends a JSON body with model ID, prompt/message array, and additional parameters as needed.
```bash
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "Summarize the last 10 releases of the openai/codex repo in three bullets.",
    "reasoning": { "effort": "medium" }
  }'
```
Successful calls return a JSON object with an output array and a usage block (input, output, reasoning tokens). Errors return a standard OpenAI envelope with code and message; see the error table below.
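To make the response shape concrete, here is a minimal parsing sketch. The field names follow the GPT-5-era Responses API and the `usage` block described above; the body itself is a fabricated example, so verify the exact structure against a live response.

```python
import json

# Hypothetical response body shaped like the Responses API described above;
# the specific values and nesting are illustrative, not captured output.
raw = """
{
  "output": [
    {"type": "message", "content": [{"type": "output_text",
      "text": "- Release A\\n- Release B\\n- Release C"}]}
  ],
  "usage": {"input_tokens": 42, "output_tokens": 120, "reasoning_tokens": 380}
}
"""

body = json.loads(raw)
usage = body["usage"]

# Bill against all three counts, not just input/output.
total_billable = usage["input_tokens"] + usage["output_tokens"] + usage["reasoning_tokens"]
print(total_billable)  # 542
```

The point to notice is the separate `reasoning_tokens` entry: it is easy to under-forecast spend if you only track input and output.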
## Request parameters

Here’s a full map of `gpt-5.5` parameters and their effects:
| Parameter | Type | Values | Notes |
|---|---|---|---|
| `model` | string | `gpt-5.5`, `gpt-5.5-pro` | Required. Pro is 6× cost. |
| `input` / `messages` | string or array | Prompt or chat array | Required. Use `input` for Responses, `messages` for Chat Completions. |
| `reasoning.effort` | string | `none`, `low`, `medium`, `high`, `xhigh` | Default: `low`. `xhigh` = max depth, higher cost. |
| `max_output_tokens` | integer | 1 – 128000 | Output cap; excludes reasoning tokens. |
| `tools` | array | `function`, `web_search`, `file_search`, `computer_use`, `code_interpreter` | Define available tools. Model chains them as needed. |
| `tool_choice` | string/object | `auto`, `none`, or a specific tool | Force specific tool usage. |
| `response_format` | object | `{ "type": "json_schema", "schema": {...} }` | Structured output. Strict mode by default. |
| `stream` | boolean | `true` / `false` | Server-sent events; reasoning tokens streamed separately. |
| `user` | string | Free-form | Helps abuse detection. Pass a hashed user ID. |
| `metadata` | object | Up to 16 key-value pairs | Visible in OpenAI dashboard/logs. |
| `seed` | integer | Any int32 | Soft determinism; output is similar for the same prompt + seed. |
| `temperature` | number | 0 – 2 | Ignored if `reasoning.effort` >= `medium`. |
Parameters most affecting cost: `reasoning.effort`, `max_output_tokens`, and `tools`. `high` or `xhigh` `reasoning.effort` can increase output tokens 3–8× compared to `low`.
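To budget for that multiplier, the listed rates can be turned into a small estimator. The rates below are copied from the pricing above; treating reasoning tokens as billed at the output rate is an assumption you should verify against your actual invoice.

```python
# Per-million-token rates for gpt-5.5, taken from the pricing listed above.
INPUT_RATE = 5.00    # USD per 1M input tokens
OUTPUT_RATE = 30.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Estimate request cost in USD. Assumes reasoning tokens bill at the output rate."""
    billed_output = output_tokens + reasoning_tokens
    return input_tokens / 1e6 * INPUT_RATE + billed_output / 1e6 * OUTPUT_RATE

# A large input with a heavy reasoning pass:
print(round(estimate_cost(900_000, 8_000, 60_000), 2))  # 6.54
```

Running an estimate like this before switching to `gpt-5.5-pro` (6× the rates) or `xhigh` effort makes the cost trade-off explicit.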
## Python example

SDK usage mirrors GPT-5.4; update the model ID and use the expanded `reasoning.effort` range.
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    input=[
        {
            "role": "system",
            "content": "You are a senior Go engineer. Answer in terse, runnable code.",
        },
        {
            "role": "user",
            "content": (
                "Write a worker pool with bounded concurrency and a context "
                "cancellation path. No third-party deps."
            ),
        },
    ],
    reasoning={"effort": "medium"},
    max_output_tokens=4000,
)

print(response.output_text)
print(response.usage.model_dump())
```
- `response.output_text` flattens the output array. For structured events (tool calls, citations, etc.), use `response.output`.
- `usage` contains `input_tokens`, `output_tokens`, `reasoning_tokens`. Bill against all three.
## Node example
```javascript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-5.5",
  input: [
    { role: "system", content: "You are a careful reviewer." },
    {
      role: "user",
      content:
        "Review this migration and flag any operation that would lock a write-heavy table for more than 200 ms.",
    },
  ],
  reasoning: { effort: "high" },
  tools: [{ type: "file_search" }],
  max_output_tokens: 6000,
});

console.log(response.output_text);
console.log(response.usage);
```
Set `reasoning.effort` to `high` for review tasks where correctness outweighs cost.
## Thinking mode

Thinking mode uses `reasoning.effort` set to `high` or `xhigh` with a higher `max_output_tokens`. There’s no special model ID; just adjust these parameters per request.
- Default to `medium` for most tasks (agentic work, multi-file debugging, doc generation). Costs remain close to GPT-5.4.
- Use `high`/`xhigh` for research, correctness-critical tasks, and long tool chains. Budget for 3–8× output tokens and longer response times.
If using `computer_use` or long web-search chains, higher effort reduces hallucinations (see OpenAI’s launch post).
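Since thinking mode is just a parameter change, it can be wrapped in a small helper. This is a sketch with illustrative values (the token caps here are our own guesses sized for the 3–8× output multiplier, not OpenAI recommendations):

```python
# Thinking mode is parameters, not a model ID: raise effort and the output cap.
def thinking_mode_kwargs(deep: bool = False) -> dict:
    """Build request kwargs for a thinking-mode call; caps are illustrative."""
    return {
        "model": "gpt-5.5",
        "reasoning": {"effort": "xhigh" if deep else "high"},
        # Leave headroom: high/xhigh can emit 3-8x the output tokens of low.
        "max_output_tokens": 32_000 if deep else 16_000,
    }

print(thinking_mode_kwargs(deep=True)["reasoning"])  # {'effort': 'xhigh'}
```

You would then splat these into the SDK call, e.g. `client.responses.create(input=..., **thinking_mode_kwargs(deep=True))`.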
## Structured output

Strict JSON output is the default. Pass a schema to the SDK for guaranteed JSON structure.
```python
response = client.responses.create(
    model="gpt-5.5",
    input="Extract the title, speaker, and start time from this transcript chunk.",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "session_extract",
            "strict": True,
            "schema": {
                "type": "object",
                "required": ["title", "speaker", "start_time"],
                "properties": {
                    "title": {"type": "string"},
                    "speaker": {"type": "string"},
                    "start_time": {"type": "string", "format": "date-time"},
                },
            },
        },
    },
)
```
For pipelines that feed downstream code, always set a schema. This prevents malformed output and eliminates manual retry logic.
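Even with strict mode, it's cheap to validate before handing the payload downstream. A minimal sketch using only the standard library, checking the required keys from the schema above (the sample payload is invented for illustration):

```python
import json

# Required keys mirror the "required" list in the schema above.
REQUIRED = {"title", "speaker", "start_time"}

def parse_session_extract(raw: str) -> dict:
    """Parse model output and fail fast if required keys are missing."""
    data = json.loads(raw)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"model output missing keys: {sorted(missing)}")
    return data

ok = parse_session_extract(
    '{"title": "Keynote", "speaker": "A. Chen", "start_time": "2026-04-23T09:00:00Z"}'
)
print(ok["speaker"])  # A. Chen
```

A failed parse here is the signal to retry the request, rather than letting a malformed record propagate into your pipeline.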
## Tool use and agents

The Responses API exposes five first-party tool types:
- `web_search`: real-time search with citations
- `file_search`: vector search over uploaded files
- `code_interpreter`: sandboxed Python
- `computer_use`: mouse, keyboard, and browser via the Operator stack
- `function`: custom callbacks
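The `function` type covers anything not in the built-in list. Here is a sketch of a function-tool definition; the layout follows the GPT-5-era Responses API tool format, but the tool name and its parameters are invented for illustration:

```python
# Hypothetical custom tool: the name "lookup_release" and its parameters
# are made up for this example, not a real API.
lookup_release_tool = {
    "type": "function",
    "name": "lookup_release",
    "description": "Fetch metadata for a tagged release of a GitHub repo.",
    "parameters": {
        "type": "object",
        "required": ["repo", "tag"],
        "properties": {
            "repo": {"type": "string", "description": "owner/name, e.g. openai/codex"},
            "tag": {"type": "string", "description": "release tag to look up"},
        },
    },
}

# Custom tools sit alongside built-ins in the same array, e.g.:
# tools=[{"type": "web_search"}, lookup_release_tool]
print(lookup_release_tool["name"])  # lookup_release
```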
GPT-5.5 chains tools more effectively than 5.4. In tests like The Decoder’s, 5.5 completed 11% more multi-step tool chains without user intervention.
## Error handling and retries
Handle these common error codes explicitly:
| Code | Meaning | Retry? |
|---|---|---|
| 429 `rate_limit_exceeded` | Rate cap hit. | Yes, use exponential backoff + jitter. |
| 400 `context_length_exceeded` | Input + output + reasoning > 1M tokens. | No; shorten input. |
| 500 `server_error` | OpenAI server error. | Yes, up to 3 attempts. |
| 403 `policy_violation` | Safety refusal. | No; rewrite prompt. |
Reasoning tokens count toward the context window. For example, `reasoning.effort: "xhigh"` on a 900K-token input can trigger context overflow.
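The retry column above can be implemented as a single wrapper. A sketch with exponential backoff plus jitter, retrying only 429 and 500 (the `ApiError` class here is a stand-in for the SDK's real exception types, which carry more detail):

```python
import random
import time

class ApiError(Exception):
    """Stand-in for the SDK's error class; carries an HTTP status code."""
    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status

RETRYABLE = {429, 500}  # rate_limit_exceeded and server_error

def call_with_retries(fn, max_attempts: int = 3, base_delay: float = 1.0):
    """Retry fn() on retryable statuses with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as exc:
            if exc.status not in RETRYABLE or attempt == max_attempts - 1:
                raise  # 400/403 (and exhausted retries) surface immediately
            # 1s, 2s, 4s... plus up to 250 ms of jitter to avoid thundering herds.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.25))

# Demo: fail twice with 429, then succeed on the third attempt.
attempts = iter([ApiError(429), ApiError(429), "ok"])
def flaky():
    result = next(attempts)
    if isinstance(result, Exception):
        raise result
    return result

print(call_with_retries(flaky, base_delay=0.01))  # ok
```

Note that 400 and 403 are raised immediately: retrying a context overflow or a safety refusal only burns credits.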
## Testing workflow with Apidog
Due to GPT-5.5’s cost, avoid burning tokens with repeated trial runs. Recommended workflow:
- Build the request in Apidog, save it in a collection, and tag the environment (dev/staging/prod).
- Use Apidog’s mock server to replay the last real response while refining downstream code.
- Switch to a live key only when your schema and logic are stable.
Apidog also integrates with Claude Code and Cursor, so you can access collections directly from your editor. See the VS Code walkthrough and Apidog vs. Postman comparison for setup instructions.
## Calling GPT-5.5 before API general availability
Until OpenAI’s Responses API is fully available, use the Codex sign-in flow for early access. The Codex free guide explains how to install the CLI, authenticate with ChatGPT, and select the model.
## FAQ
**Is there a gpt-5.5-mini?** Not at launch. `gpt-5.4-mini` remains the cost-optimized option.

**Context window size?** 1M tokens (API), 400K (Codex CLI). Both count reasoning tokens.

**Do I need to rewrite GPT-5.4 code?** No. Swap the model ID, adjust `max_output_tokens` if needed, and tune `reasoning.effort` as appropriate.

**How to reduce cost?** Options: Batch (50% off), Flex (50% off with a slower queue), and strict schemas to avoid retries. See the GPT-5.5 pricing breakdown for details.
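To see what that 50% is worth in practice, a quick arithmetic sketch at the standard rates listed earlier (assuming the discount applies to both input and output tokens, which should be verified against the pricing page):

```python
# gpt-5.5 rates from the pricing above, USD per 1M tokens.
STANDARD = {"input": 5.00, "output": 30.00}

def monthly_cost(input_m: float, output_m: float, discount: float = 0.0) -> float:
    """Cost for input_m / output_m million tokens at an optional flat discount."""
    full = input_m * STANDARD["input"] + output_m * STANDARD["output"]
    return full * (1 - discount)

# Example workload: 100M input + 10M output tokens per month.
print(monthly_cost(100, 10))       # 800.0 USD at standard rates
print(monthly_cost(100, 10, 0.5))  # 400.0 USD via Batch or Flex
```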
**Where to get API GA updates?** Watch the OpenAI developer community and the OpenAI API pricing page.