Hassann

Posted on • Originally published at apidog.com

How to Use the GPT-5.5 API

GPT-5.5 launched on April 23, 2026. OpenAI immediately opened the model for ChatGPT and Codex, with Responses and Chat Completions APIs coming “very soon.” This guide covers both: how to call GPT-5.5 as soon as API keys work, and how to access it today via the Codex sign-in path.


This article includes endpoint shapes, authentication, Python and Node examples, the parameter table, pricing breakdown, error handling, and a testing workflow in Apidog to help you save credits while iterating.

For a product overview, see What is GPT-5.5. For a free-tier guide, see How to use GPT-5.5 API for free.

TL;DR

  • GPT-5.5 is available via Responses and Chat Completions endpoints. Model IDs: gpt-5.5 and gpt-5.5-pro.
  • API pricing: $5 / M input, $30 / M output; Pro: $30 / M input, $180 / M output.
  • Context window: 1M tokens (API), 400K (Codex CLI).
  • Until API GA, access GPT-5.5 via Codex with ChatGPT sign-in.
  • Use Apidog to pre-build collections; request shape matches GPT-5.4 with new model ID and expanded reasoning block.

Prerequisites

Before making your first call, ensure:

  • OpenAI developer account with a billable tier. ChatGPT Plus/Pro billing is separate from API billing; to use both the UI and the API, you need both.
  • API key with GPT-5 access. Prefer project-scoped keys for production workloads.
  • SDK version supporting gpt-5.5: Python openai>=2.1.0, Node openai@5.1.0 or newer.
  • API client for easy request replay. Use curl for one-off, then switch to Apidog or similar for iteration.

Export your API key:

export OPENAI_API_KEY="sk-proj-..."

Endpoint and authentication

GPT-5.5 uses the same endpoints as GPT-5:

POST https://api.openai.com/v1/responses
POST https://api.openai.com/v1/chat/completions

Responses API is tool-aware (supports thinking mode, web search, computer use). Chat Completions maintains compatibility with legacy integrations.

Authenticate using a bearer token. Every request sends a JSON body with model ID, prompt/message array, and additional parameters as needed.

curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "input": "Summarize the last 10 releases of the openai/codex repo in three bullets.",
    "reasoning": { "effort": "medium" }
  }'

Successful calls return a JSON object with an output array and a usage block (input, output, reasoning tokens). Errors return a standard OpenAI envelope with code and message; see the error table below.

Request parameters

Here’s a full map of gpt-5.5 parameters and their effects:

| Parameter | Type | Values | Notes |
|---|---|---|---|
| `model` | string | `gpt-5.5`, `gpt-5.5-pro` | Required. Pro is 6× the cost. |
| `input` / `messages` | string or array | Prompt or chat array | Required. Use `input` for Responses, `messages` for Chat Completions. |
| `reasoning.effort` | string | `none`, `low`, `medium`, `high`, `xhigh` | Default: `low`. `xhigh` = max depth, higher cost. |
| `max_output_tokens` | integer | 1 – 128000 | Output cap; excludes reasoning tokens. |
| `tools` | array | `function`, `web_search`, `file_search`, `computer_use`, `code_interpreter` | Define available tools. Model chains them as needed. |
| `tool_choice` | string/object | `auto`, `none`, or a specific tool | Force specific tool usage. |
| `response_format` | object | `{ "type": "json_schema", "schema": {...} }` | Structured output. Strict mode default. |
| `stream` | boolean | `true` / `false` | Server-sent events; reasoning tokens streamed separately. |
| `user` | string | Free-form | Helps abuse detection. Pass a hashed user ID. |
| `metadata` | object | Up to 16 key-value pairs | Visible in the OpenAI dashboard/logs. |
| `seed` | integer | Any int32 | Soft determinism; output is similar for the same prompt + seed. |
| `temperature` | number | 0 – 2 | Ignored if `reasoning.effort` >= `medium`. |

Parameters most affecting cost: reasoning.effort, max_output_tokens, and tools. High or xhigh reasoning.effort can increase output tokens 3–8× compared to low.
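To make the cost impact concrete, here is a back-of-envelope estimator built from the per-million-token rates quoted in this article's TL;DR. The `estimate_cost` helper is hypothetical (not part of the SDK), and it assumes reasoning tokens bill at the output rate; verify both against the official pricing page.

```python
# Hypothetical cost estimator using the rates quoted above
# ($5/M in, $30/M out for gpt-5.5; $30/M in, $180/M out for Pro).
RATES = {
    "gpt-5.5": {"input": 5.00, "output": 30.00},
    "gpt-5.5-pro": {"input": 30.00, "output": 180.00},
}

def estimate_cost(model: str, input_tokens: int,
                  output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Estimate dollars for one request; assumes reasoning tokens
    bill at the output rate (check the pricing page to confirm)."""
    rate = RATES[model]
    dollars = (
        input_tokens * rate["input"]
        + (output_tokens + reasoning_tokens) * rate["output"]
    ) / 1_000_000
    return round(dollars, 4)

# A 10K-token prompt with 4K output and 8K reasoning tokens on gpt-5.5:
print(estimate_cost("gpt-5.5", 10_000, 4_000, 8_000))  # 0.41
```

Running this before switching from `low` to `xhigh` effort makes the 3–8× output-token multiplier visible in dollars rather than tokens.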

Python example

SDK usage mirrors GPT-5.4; update the model ID and use the expanded reasoning.effort range.

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    input=[
        {
            "role": "system",
            "content": "You are a senior Go engineer. Answer in terse, runnable code.",
        },
        {
            "role": "user",
            "content": (
                "Write a worker pool with bounded concurrency and a context "
                "cancellation path. No third-party deps."
            ),
        },
    ],
    reasoning={"effort": "medium"},
    max_output_tokens=4000,
)

print(response.output_text)
print(response.usage.model_dump())
  • response.output_text flattens the output array. For structured events (tool calls, citations, etc.), use response.output.
  • usage contains input_tokens, output_tokens, reasoning_tokens. Bill against all three.

Node example

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-5.5",
  input: [
    { role: "system", content: "You are a careful reviewer." },
    {
      role: "user",
      content:
        "Review this migration and flag any operation that would lock a write-heavy table for more than 200 ms.",
    },
  ],
  reasoning: { effort: "high" },
  tools: [{ type: "file_search" }],
  max_output_tokens: 6000,
});

console.log(response.output_text);
console.log(response.usage);

Set reasoning.effort to high for review tasks where correctness outweighs cost.

Thinking mode

Thinking mode uses reasoning.effort set to high or xhigh with a higher max_output_tokens. There’s no special model ID—just adjust these parameters per request.

  • Default to medium for most tasks (agentic work, multi-file debugging, doc generation). Costs remain close to GPT-5.4.
  • Use high/xhigh for research, correctness-critical tasks, and long tool chains. Budget for 3–8× output tokens and longer response times.

If using computer_use or long web-search chains, higher effort reduces hallucinations (see OpenAI’s launch post).
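Since effort is a per-request knob, one practical pattern is mapping task types to effort levels in one place. This is a sketch under the guidance above; the `EFFORT_BY_TASK` categories and `reasoning_params` helper are hypothetical names, not SDK features.

```python
# Hypothetical task-to-effort map, following the guidance above:
# medium for everyday agentic work, high/xhigh when correctness wins.
EFFORT_BY_TASK = {
    "chat": "low",
    "agentic": "medium",
    "debugging": "medium",
    "review": "high",
    "research": "xhigh",
}

def reasoning_params(task: str, output_budget: int = 4000) -> dict:
    """Build the reasoning/max_output_tokens kwargs for responses.create()."""
    effort = EFFORT_BY_TASK.get(task, "medium")
    # Deep reasoning can inflate output tokens 3-8x, so widen the cap.
    if effort in ("high", "xhigh"):
        output_budget *= 4
    return {"reasoning": {"effort": effort}, "max_output_tokens": output_budget}

print(reasoning_params("research"))
# {'reasoning': {'effort': 'xhigh'}, 'max_output_tokens': 16000}
```

The kwargs can then be splatted into a call, e.g. `client.responses.create(model="gpt-5.5", input=prompt, **reasoning_params("review"))`.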

Structured output

Strict JSON output is default. Pass a schema to the SDK for guaranteed JSON structure.

response = client.responses.create(
    model="gpt-5.5",
    input="Extract the title, speaker, and start time from this transcript chunk.",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "session_extract",
            "strict": True,
            "schema": {
                "type": "object",
                "required": ["title", "speaker", "start_time"],
                "properties": {
                    "title": {"type": "string"},
                    "speaker": {"type": "string"},
                    "start_time": {"type": "string", "format": "date-time"},
                },
            },
        },
    },
)

For pipelines that feed downstream code, always set a schema. This prevents malformed output and eliminates manual retry logic.
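Even with strict mode, a downstream pipeline benefits from a fail-fast parse step. This is a minimal sketch assuming the schema above; `parse_session` and the sample payload are hypothetical, and strict mode is assumed to guarantee syntactically valid JSON.

```python
import json

# Hypothetical post-processing step: parse the model's structured output
# and fail fast if a required field from the schema above is missing.
REQUIRED = ("title", "speaker", "start_time")

def parse_session(output_text: str) -> dict:
    record = json.loads(output_text)  # strict mode should yield valid JSON
    missing = [key for key in REQUIRED if key not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record

# Hypothetical sample payload shaped like the schema's output:
sample = '{"title": "Keynote", "speaker": "A. Chen", "start_time": "2026-04-23T09:00:00Z"}'
print(parse_session(sample)["speaker"])  # A. Chen
```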

Tool use and agents

The Responses API exposes five first-party tool types:

  • web_search: real-time search with citations
  • file_search: vector search over uploaded files
  • code_interpreter: sandboxed Python
  • computer_use: mouse, keyboard, and browser via Operator stack
  • function: custom callbacks

GPT-5.5 chains tools more effectively than 5.4. In tests like The Decoder’s, 5.5 completed 11% more multi-step tool chains without user intervention.
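For the `function` tool type, your application defines a JSON Schema for each callback and dispatches tool calls locally. The sketch below assumes the GPT-5-era Responses API tool shape (`type`/`name`/`parameters` at the top level) and uses a made-up metric lookup; verify the exact schema against the current API reference before shipping.

```python
# Sketch of a custom function tool definition, assuming the GPT-5-era
# Responses API shape; the tool name and metrics are hypothetical.
get_lock_time_tool = {
    "type": "function",
    "name": "get_table_lock_time",
    "description": "Return the observed write-lock time for a table in ms.",
    "parameters": {
        "type": "object",
        "required": ["table"],
        "properties": {"table": {"type": "string"}},
    },
}

# Fake local data standing in for a real metrics backend.
FAKE_METRICS = {"orders": 340, "users": 12}

def dispatch(name: str, arguments: dict) -> str:
    """Run the local callback when the model emits a matching tool call."""
    if name == "get_table_lock_time":
        return str(FAKE_METRICS.get(arguments["table"], -1))
    raise ValueError(f"unknown tool: {name}")

print(dispatch("get_table_lock_time", {"table": "orders"}))  # 340
```

You would pass `tools=[get_lock_time_tool]` in the request, then feed `dispatch(...)`'s return value back as the tool result on the next turn.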

Error handling and retries

Handle these common error codes explicitly:

| Code | Meaning | Retry? |
|---|---|---|
| 429 | `rate_limit_exceeded`: rate cap hit. | Yes, use exponential backoff + jitter. |
| 400 | `context_length_exceeded`: input + output + reasoning > 1M tokens. | No; shorten input. |
| 500 | `server_error`: OpenAI server error. | Yes, up to 3 attempts. |
| 403 | `policy_violation`: safety refusal. | No; rewrite prompt. |

Reasoning tokens count toward context window. For example, reasoning.effort: "xhigh" on a 900K-token input can trigger context overflow.
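The retry policy in the table can be sketched as a small wrapper. This is a minimal illustration, not SDK code; `call` here is a stand-in returning `(status, body)` tuples, and the delays are arbitrary starting points.

```python
import random
import time

# Minimal retry sketch for the table above: exponential backoff with
# jitter on 429/500, no retry on 400/403. `call` is a hypothetical
# stand-in for the real SDK invocation.
RETRYABLE = {429, 500}

def with_retries(call, max_attempts=3, base_delay=0.5):
    for attempt in range(max_attempts):
        status, body = call()
        if status == 200:
            return body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"giving up with status {status}")
        # Exponential backoff plus jitter to avoid synchronized retries.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError("unreachable")

# Stand-in call that fails once with a 429, then succeeds:
attempts = iter([(429, None), (200, "ok")])
print(with_retries(lambda: next(attempts), base_delay=0.01))  # ok
```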

Testing workflow with Apidog

Due to GPT-5.5’s cost, avoid burning tokens with repeated trial runs. Recommended workflow:

  1. Build the request in Apidog, save it in a collection, and tag the environment (dev/staging/prod).
  2. Use Apidog’s mock server to replay the last real response while refining downstream code.
  3. Switch to a live key only when your schema and logic are stable.

Apidog also integrates with Claude Code and Cursor, so you can access collections directly from your editor. See the VS Code walkthrough and Apidog vs. Postman comparison for setup instructions.

Calling GPT-5.5 before the API is general

Until OpenAI’s Responses API is fully available, use the Codex sign-in flow for early access. The Codex free guide explains how to install the CLI, authenticate with ChatGPT, and select the model.

FAQ

Is there a gpt-5.5-mini? Not at launch. gpt-5.4-mini remains the cost-optimized option.

Context window size? 1M tokens (API), 400K (Codex CLI). Both count reasoning tokens.

Do I need to rewrite GPT-5.4 code? No. Swap the model ID, adjust max_output_tokens if needed, and tune reasoning.effort as appropriate.

How to reduce cost? Options: Batch (50% off), Flex (50% off with slower queue), and strict schemas to avoid retries. See the GPT-5.5 pricing breakdown for details.

Where to get API GA updates? Watch the OpenAI developer community and OpenAI API pricing page.
