Hassann

Posted on • Originally published at apidog.com

How to Use the GPT-5.5 API

GPT-5.5 launched on April 23, 2026. OpenAI immediately opened the model for ChatGPT and Codex, with Responses and Chat Completions APIs coming “very soon.” This guide covers both: how to call GPT-5.5 as soon as API keys work, and how to access it today via the Codex sign-in path.


This article includes endpoint shapes, authentication, Python and Node examples, the parameter table, pricing breakdown, error handling, and a testing workflow in Apidog to help you save credits while iterating.

For a product overview, see What is GPT-5.5. For a free-tier guide, see How to use GPT-5.5 API for free.

TL;DR

  • GPT-5.5 is available via Responses and Chat Completions endpoints. Model IDs: gpt-5.5 and gpt-5.5-pro.
  • API pricing: $5 / M input, $30 / M output; Pro: $30 / M input, $180 / M output.
  • Context window: 1M tokens (API), 400K (Codex CLI).
  • Until API GA, access GPT-5.5 via Codex with ChatGPT sign-in.
  • Use Apidog to pre-build collections; request shape matches GPT-5.4 with new model ID and expanded reasoning block.

Prerequisites

Before making your first call, ensure:

  • OpenAI developer account with a billable tier. ChatGPT Plus/Pro billing is separate from API billing; to use both the UI and the API, you need both.
  • API key with GPT-5 access. Prefer project-scoped keys for production workloads.
  • SDK version supporting gpt-5.5: Python openai>=2.1.0, Node openai@5.1.0 or newer.
  • API client for easy request replay. Use curl for one-off, then switch to Apidog or similar for iteration.

Export your API key:

export OPENAI_API_KEY="sk-proj-..."

Endpoint and authentication

GPT-5.5 uses the same endpoints as GPT-5:

POST https://api.openai.com/v1/responses
POST https://api.openai.com/v1/chat/completions

Responses API is tool-aware (supports thinking mode, web search, computer use). Chat Completions maintains compatibility with legacy integrations.

Authenticate using a bearer token. Every request sends a JSON body with model ID, prompt/message array, and additional parameters as needed.

curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "Summarize the last 10 releases of the openai/codex repo in three bullets.",
    "reasoning": { "effort": "medium" }
  }'

Successful calls return a JSON object with an output array and a usage block (input, output, reasoning tokens). Errors return a standard OpenAI envelope with code and message; see the error table below.

Request parameters

Here’s a full map of gpt-5.5 parameters and their effects:

| Parameter | Type | Values | Notes |
|---|---|---|---|
| `model` | string | `gpt-5.5`, `gpt-5.5-pro` | Required. Pro is 6× the cost. |
| `input` / `messages` | string or array | Prompt or chat array | Required. Use `input` for Responses, `messages` for Chat Completions. |
| `reasoning.effort` | string | `none`, `low`, `medium`, `high`, `xhigh` | Default: `low`. `xhigh` = max depth, higher cost. |
| `max_output_tokens` | integer | 1 – 128000 | Output cap; excludes reasoning tokens. |
| `tools` | array | `function`, `web_search`, `file_search`, `computer_use`, `code_interpreter` | Define available tools. Model chains them as needed. |
| `tool_choice` | string/object | `auto`, `none`, or a specific tool | Force specific tool usage. |
| `response_format` | object | `{ "type": "json_schema", "schema": {...} }` | Structured output. Strict mode default. |
| `stream` | boolean | `true` / `false` | Server-sent events; reasoning tokens streamed separately. |
| `user` | string | Free-form | Helps abuse detection. Pass a hashed user ID. |
| `metadata` | object | Up to 16 key-value pairs | Visible in the OpenAI dashboard/logs. |
| `seed` | integer | Any int32 | Soft determinism; output is similar for the same prompt + seed. |
| `temperature` | number | 0 – 2 | Ignored if `reasoning.effort` >= `medium`. |

Parameters most affecting cost: reasoning.effort, max_output_tokens, and tools. High or xhigh reasoning.effort can increase output tokens 3–8× compared to low.
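To make the cost impact concrete, here is a back-of-envelope estimator built from the per-million-token rates quoted in this article's TL;DR. The `estimate_cost` helper is hypothetical (not part of the SDK), and it assumes reasoning tokens bill at the output rate; verify both against the official pricing page.

```python
# Hypothetical cost estimator using the rates quoted above
# ($5/M in, $30/M out for gpt-5.5; $30/M in, $180/M out for Pro).
RATES = {
    "gpt-5.5": {"input": 5.00, "output": 30.00},
    "gpt-5.5-pro": {"input": 30.00, "output": 180.00},
}

def estimate_cost(model: str, input_tokens: int,
                  output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Estimate dollars for one request; assumes reasoning tokens
    bill at the output rate (check the pricing page to confirm)."""
    rate = RATES[model]
    dollars = (
        input_tokens * rate["input"]
        + (output_tokens + reasoning_tokens) * rate["output"]
    ) / 1_000_000
    return round(dollars, 4)

# A 10K-token prompt with 4K output and 8K reasoning tokens on gpt-5.5:
print(estimate_cost("gpt-5.5", 10_000, 4_000, 8_000))  # 0.41
```

Running this before switching from `low` to `xhigh` effort makes the 3–8× output-token multiplier visible in dollars rather than tokens.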

Python example

SDK usage mirrors GPT-5.4; update the model ID and use the expanded reasoning.effort range.

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    input=[
        {
            "role": "system",
            "content": "You are a senior Go engineer. Answer in terse, runnable code.",
        },
        {
            "role": "user",
            "content": (
                "Write a worker pool with bounded concurrency and a context "
                "cancellation path. No third-party deps."
            ),
        },
    ],
    reasoning={"effort": "medium"},
    max_output_tokens=4000,
)

print(response.output_text)
print(response.usage.model_dump())
  • response.output_text flattens the output array. For structured events (tool calls, citations, etc.), use response.output.
  • usage contains input_tokens, output_tokens, reasoning_tokens. Bill against all three.

Node example

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-5.5",
  input: [
    { role: "system", content: "You are a careful reviewer." },
    {
      role: "user",
      content:
        "Review this migration and flag any operation that would lock a write-heavy table for more than 200 ms.",
    },
  ],
  reasoning: { effort: "high" },
  tools: [{ type: "file_search" }],
  max_output_tokens: 6000,
});

console.log(response.output_text);
console.log(response.usage);

Set reasoning.effort to high for review tasks where correctness outweighs cost.

Thinking mode

Thinking mode uses reasoning.effort set to high or xhigh with a higher max_output_tokens. There’s no special model ID—just adjust these parameters per request.

  • Default to medium for most tasks (agentic work, multi-file debugging, doc generation). Costs remain close to GPT-5.4.
  • Use high/xhigh for research, correctness-critical tasks, and long tool chains. Budget for 3–8× output tokens and longer response times.

If using computer_use or long web-search chains, higher effort reduces hallucinations (see OpenAI’s launch post).
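Since effort is a per-request knob, one practical pattern is mapping task types to effort levels in one place. This is a sketch under the guidance above; the `EFFORT_BY_TASK` categories and `reasoning_params` helper are hypothetical names, not SDK features.

```python
# Hypothetical task-to-effort map, following the guidance above:
# medium for everyday agentic work, high/xhigh when correctness wins.
EFFORT_BY_TASK = {
    "chat": "low",
    "agentic": "medium",
    "debugging": "medium",
    "review": "high",
    "research": "xhigh",
}

def reasoning_params(task: str, output_budget: int = 4000) -> dict:
    """Build the reasoning/max_output_tokens kwargs for responses.create()."""
    effort = EFFORT_BY_TASK.get(task, "medium")
    # Deep reasoning can inflate output tokens 3-8x, so widen the cap.
    if effort in ("high", "xhigh"):
        output_budget *= 4
    return {"reasoning": {"effort": effort}, "max_output_tokens": output_budget}

print(reasoning_params("research"))
# {'reasoning': {'effort': 'xhigh'}, 'max_output_tokens': 16000}
```

The kwargs can then be splatted into a call, e.g. `client.responses.create(model="gpt-5.5", input=prompt, **reasoning_params("review"))`.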

Structured output

Strict JSON output is default. Pass a schema to the SDK for guaranteed JSON structure.

response = client.responses.create(
    model="gpt-5.5",
    input="Extract the title, speaker, and start time from this transcript chunk.",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "session_extract",
            "strict": True,
            "schema": {
                "type": "object",
                "required": ["title", "speaker", "start_time"],
                "properties": {
                    "title": {"type": "string"},
                    "speaker": {"type": "string"},
                    "start_time": {"type": "string", "format": "date-time"},
                },
            },
        },
    },
)

For pipelines that feed downstream code, always set a schema. This prevents malformed output and eliminates manual retry logic.
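Even with strict mode, a downstream pipeline benefits from a fail-fast parse step. This is a minimal sketch assuming the schema above; `parse_session` and the sample payload are hypothetical, and strict mode is assumed to guarantee syntactically valid JSON.

```python
import json

# Hypothetical post-processing step: parse the model's structured output
# and fail fast if a required field from the schema above is missing.
REQUIRED = ("title", "speaker", "start_time")

def parse_session(output_text: str) -> dict:
    record = json.loads(output_text)  # strict mode should yield valid JSON
    missing = [key for key in REQUIRED if key not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record

# Hypothetical sample payload shaped like the schema's output:
sample = '{"title": "Keynote", "speaker": "A. Chen", "start_time": "2026-04-23T09:00:00Z"}'
print(parse_session(sample)["speaker"])  # A. Chen
```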

Tool use and agents

The Responses API exposes five first-party tool types:

  • web_search: real-time search with citations
  • file_search: vector search over uploaded files
  • code_interpreter: sandboxed Python
  • computer_use: mouse, keyboard, and browser via Operator stack
  • function: custom callbacks

GPT-5.5 chains tools more effectively than 5.4. In tests like The Decoder’s, 5.5 completed 11% more multi-step tool chains without user intervention.
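For the `function` tool type, your application defines a JSON Schema for each callback and dispatches tool calls locally. The sketch below assumes the GPT-5-era Responses API tool shape (`type`/`name`/`parameters` at the top level) and uses a made-up metric lookup; verify the exact schema against the current API reference before shipping.

```python
# Sketch of a custom function tool definition, assuming the GPT-5-era
# Responses API shape; the tool name and metrics are hypothetical.
get_lock_time_tool = {
    "type": "function",
    "name": "get_table_lock_time",
    "description": "Return the observed write-lock time for a table in ms.",
    "parameters": {
        "type": "object",
        "required": ["table"],
        "properties": {"table": {"type": "string"}},
    },
}

# Fake local data standing in for a real metrics backend.
FAKE_METRICS = {"orders": 340, "users": 12}

def dispatch(name: str, arguments: dict) -> str:
    """Run the local callback when the model emits a matching tool call."""
    if name == "get_table_lock_time":
        return str(FAKE_METRICS.get(arguments["table"], -1))
    raise ValueError(f"unknown tool: {name}")

print(dispatch("get_table_lock_time", {"table": "orders"}))  # 340
```

You would pass `tools=[get_lock_time_tool]` in the request, then feed `dispatch(...)`'s return value back as the tool result on the next turn.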

Error handling and retries

Handle these common error codes explicitly:

| Code | Meaning | Retry? |
|---|---|---|
| 429 | `rate_limit_exceeded`: rate cap hit. | Yes, use exponential backoff + jitter. |
| 400 | `context_length_exceeded`: input + output + reasoning > 1M tokens. | No; shorten input. |
| 500 | `server_error`: OpenAI server error. | Yes, up to 3 attempts. |
| 403 | `policy_violation`: safety refusal. | No; rewrite prompt. |

Reasoning tokens count toward context window. For example, reasoning.effort: "xhigh" on a 900K-token input can trigger context overflow.
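The retry policy in the table can be sketched as a small wrapper. This is a minimal illustration, not SDK code; `call` here is a stand-in returning `(status, body)` tuples, and the delays are arbitrary starting points.

```python
import random
import time

# Minimal retry sketch for the table above: exponential backoff with
# jitter on 429/500, no retry on 400/403. `call` is a hypothetical
# stand-in for the real SDK invocation.
RETRYABLE = {429, 500}

def with_retries(call, max_attempts=3, base_delay=0.5):
    for attempt in range(max_attempts):
        status, body = call()
        if status == 200:
            return body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"giving up with status {status}")
        # Exponential backoff plus jitter to avoid synchronized retries.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError("unreachable")

# Stand-in call that fails once with a 429, then succeeds:
attempts = iter([(429, None), (200, "ok")])
print(with_retries(lambda: next(attempts), base_delay=0.01))  # ok
```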

Testing workflow with Apidog

Due to GPT-5.5’s cost, avoid burning tokens with repeated trial runs. Recommended workflow:

  1. Build the request in Apidog, save it in a collection, and tag the environment (dev/staging/prod).
  2. Use Apidog’s mock server to replay the last real response while refining downstream code.
  3. Switch to a live key only when your schema and logic are stable.

Apidog also integrates with Claude Code and Cursor, so you can access collections directly from your editor. See the VS Code walkthrough and Apidog vs. Postman comparison for setup instructions.

Calling GPT-5.5 before the API is general

Until OpenAI’s Responses API is fully available, use the Codex sign-in flow for early access. The Codex free guide explains how to install the CLI, authenticate with ChatGPT, and select the model.

FAQ

Is there a gpt-5.5-mini? Not at launch. gpt-5.4-mini remains the cost-optimized option.

Context window size? 1M tokens (API), 400K (Codex CLI). Both count reasoning tokens.

Do I need to rewrite GPT-5.4 code? No. Swap the model ID, adjust max_output_tokens if needed, and tune reasoning.effort as appropriate.

How to reduce cost? Options: Batch (50% off), Flex (50% off with slower queue), and strict schemas to avoid retries. See the GPT-5.5 pricing breakdown for details.

Where to get API GA updates? Watch the OpenAI developer community and OpenAI API pricing page.
