OpenAI swapped ChatGPT’s default model on May 5, 2026. GPT-5.5 Instant replaced GPT-5.3 Instant, reduced hallucinated claims on high-stakes prompts by 52.5%, and kept the low-latency behavior developers expect from the Instant tier. If you build against the API, use the gpt-5.5 model name, a 1M-token context window, and published per-million-token pricing.
This guide shows how to access GPT-5.5 Instant in ChatGPT, how the router switches between Instant and Thinking, and how to call the model from code with reproducible API tests.
TL;DR
GPT-5.5 Instant is OpenAI’s new ChatGPT default and the fast tier of the GPT-5.5 family.
- Free users: 10 messages every 5 hours
- Plus users: 160 messages every 3 hours
- Pro and Business users: unlimited use, subject to abuse guardrails
- API model name: `gpt-5.5`
- Recommended endpoint: Responses API
- Instant-like API setting: `reasoning.effort: "minimal"`
- Context window: 1M tokens
- Max output: 128,000 tokens
- Standard pricing: $5 per 1M input tokens, $30 per 1M output tokens
What GPT-5.5 Instant is
GPT-5.5 Instant is the latency-optimized variant of GPT-5.5. In ChatGPT, OpenAI exposes three GPT-5.5 modes:
| Mode | Best for |
|---|---|
| GPT-5.5 Instant | Fast responses, default chat, low-latency UX |
| GPT-5.5 Thinking | Deeper reasoning and harder multi-step tasks |
| GPT-5.5 Pro | Extra compute for paid tiers and highest-accuracy workloads |
Instant exists because OpenAI uses a router. When a prompt looks simple, ChatGPT stays on Instant. When the prompt requires more reasoning, the router may switch to GPT-5.5 Thinking automatically.
Paid users can also pin Instant manually from the model picker when predictable latency matters.
GPT-5.5 Instant and GPT-5.5 Thinking share the same underlying model family. The difference is the reasoning budget, not the knowledge base.
Both support:
- 1M-token context
- Up to 128,000 output tokens
- Code generation and debugging
- Live web search through the search tool
- File handling for PDFs, images, and spreadsheets
- Memory in supported Plus and Pro web sessions
For the broader release details, see the GPT-5.5 overview.
How to access GPT-5.5 Instant in ChatGPT
Open chatgpt.com or the mobile app and send a message. GPT-5.5 Instant is now the default model across account tiers.
The main difference between plans is the message cap.
| Plan | GPT-5.5 Instant cap | After the cap |
|---|---|---|
| Free | 10 messages every 5 hours | Falls back to GPT-5.5 mini |
| Plus | 160 messages every 3 hours | Falls back to GPT-5.5 mini |
| Pro | Unlimited, subject to abuse guardrails | Stays on GPT-5.5 |
| Business | Unlimited, subject to abuse guardrails | Stays on GPT-5.5 |
| Enterprise | Unlimited, subject to abuse guardrails | Stays on GPT-5.5 |
Plus, Pro, and Business users can pin GPT-5.5 Instant or GPT-5.5 Thinking from the model picker in the chat header. Pinning applies to the current chat, not your whole account.
When the router switches to Thinking
If you do not pin a model, ChatGPT decides whether to use Instant or Thinking. OpenAI has not published the full routing rules, but Thinking commonly appears when a prompt:
- Requires a multi-step plan
- Has ambiguous constraints
- Involves high-stakes domains such as medicine, law, or finance
- Needs synthesis across a long context
- Uses tools or agent-like execution
For normal chat, Instant is usually the right default. For guaranteed reasoning depth, pin Thinking manually or set a higher reasoning effort in the API.
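OpenAI has not published the routing rules, so any client-side analogue is guesswork. Still, the API-side equivalent of "pick a model per prompt" is just selecting an effort level per request. The sketch below is purely illustrative, our own heuristic and not OpenAI's router; the keyword list and length threshold are arbitrary assumptions:

```python
def pick_effort(prompt: str) -> str:
    """Crude, illustrative stand-in for a router: choose a reasoning
    effort from surface features of the prompt. The keyword list and
    thresholds here are arbitrary assumptions, not OpenAI's logic."""
    text = prompt.lower()
    high_stakes = ("medical", "diagnos", "legal", "contract", "financial")
    if any(term in text for term in high_stakes):
        return "high"
    if len(prompt) > 4000 or "step by step" in text:
        return "medium"
    return "minimal"
```

In practice you would tune rules like these against your own traffic; the point is only that effort can be selected per request rather than fixed globally.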
How to call GPT-5.5 Instant through the API
In the API, GPT-5.5 Instant does not have a separate model ID. Use `gpt-5.5`, then control the reasoning behavior with `reasoning.effort`.
Supported values: `minimal`, `low`, `medium`, and `high`.

For Instant-like behavior, use:

```json
{
  "reasoning": {
    "effort": "minimal"
  }
}
```
GPT-5.5 is available through two endpoints:
| Endpoint | Use case |
|---|---|
| Responses API, `/v1/responses` | Recommended for new apps, tools, structured output, and streaming |
| Chat Completions API, `/v1/chat/completions` | Legacy compatibility |
Pricing
| Tier | Input | Output |
|---|---|---|
| Standard | $5.00 / 1M tokens | $30.00 / 1M tokens |
| Batch | $2.50 / 1M tokens | $15.00 / 1M tokens |
| Flex | $2.50 / 1M tokens | $15.00 / 1M tokens |
| Priority | $12.50 / 1M tokens | $75.00 / 1M tokens |
Important: prompts above 272K input tokens are billed at 2x input and 1.5x output for the rest of the session on every tier except Priority.
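A quick estimator makes the surcharge concrete. The sketch below uses the standard-tier prices from the table and applies the multipliers per request once input exceeds the threshold; this simplifies the rest-of-session billing described above, so treat it as a rough lower bound rather than official billing logic:

```python
INPUT_PER_M = 5.00     # standard tier, USD per 1M input tokens
OUTPUT_PER_M = 30.00   # standard tier, USD per 1M output tokens
CLIFF = 272_000        # long-context input-token threshold

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Back-of-envelope request cost in USD. Applies the 2x input /
    1.5x output multipliers when input exceeds the cliff. Real billing
    applies them for the rest of the session, so this underestimates
    later requests in an already-escalated session."""
    in_mult, out_mult = (2.0, 1.5) if input_tokens > CLIFF else (1.0, 1.0)
    cost = input_tokens / 1_000_000 * INPUT_PER_M * in_mult
    cost += output_tokens / 1_000_000 * OUTPUT_PER_M * out_mult
    return round(cost, 4)

# Crossing the cliff roughly doubles the effective input rate:
# estimate_cost(100_000, 10_000) -> 0.8
# estimate_cost(300_000, 10_000) -> 3.45
```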
For more examples, see the GPT-5.5 pricing breakdown.
Minimal Python request
Create an API key from the OpenAI platform, then install the SDK.
```bash
pip install --upgrade openai
export OPENAI_API_KEY="sk-..."
```
Call the Responses API:
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "minimal"},
    input=[
        {
            "role": "user",
            "content": "Summarize this changelog entry in 3 bullet points: ..."
        }
    ],
    max_output_tokens=400,
)

print(response.output_text)
```
Use `reasoning={"effort": "minimal"}` for the closest API equivalent to GPT-5.5 Instant in ChatGPT. Increase the effort when needed with `reasoning={"effort": "medium"}` or `reasoning={"effort": "high"}`.
Minimal Node.js request
```javascript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-5.5",
  reasoning: { effort: "minimal" },
  input: [
    {
      role: "user",
      content: "Translate this product description into Spanish, keeping HTML intact: ..."
    }
  ],
  max_output_tokens: 600,
});

console.log(response.output_text);
```
Stream GPT-5.5 Instant responses
Streaming gives users faster perceived latency because the UI can render tokens as they arrive.
```python
from openai import OpenAI

client = OpenAI()

stream = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "minimal"},
    input=[
        {
            "role": "user",
            "content": "Draft a release note for v2.7..."
        }
    ],
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```
If you are migrating from Chat Completions, note that the response shape is different. The `output_text` helper flattens the structured response blocks into a plain string.
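To see what that flattening does, here is a hand-rolled equivalent. This is a sketch: the `output`/`content`/`type` field names reflect the Responses API object shape as this guide describes it, so verify them against your SDK version before relying on it.

```python
def flatten_output_text(response) -> str:
    """Rebuild what the output_text helper returns by walking the
    structured output blocks: collect the text of every output_text
    block inside every message item, in order."""
    parts = []
    for item in response.output:
        if getattr(item, "type", None) == "message":
            for block in item.content:
                if getattr(block, "type", None) == "output_text":
                    parts.append(block.text)
    return "".join(parts)
```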
For free-tier API usage and quota details, see the GPT-5.5 free access guide.
Test GPT-5.5 Instant requests with Apidog before shipping
A notebook is enough for quick experiments. Production work needs repeatable request templates, environment secrets, response assertions, and CI-ready tests.
Apidog gives you that workflow without writing throwaway scripts.
Step 1: Import the OpenAI OpenAPI spec
Apidog supports OpenAPI 3.x. Import the Responses API spec so your endpoints, parameters, and response schemas are available in one workspace.
Step 2: Store your API key as an environment secret
Add your OpenAI key to an Apidog environment, for example:

```
OPENAI_API_KEY=sk-...
```

Then reference it in the Authorization header:

```
Authorization: Bearer {{OPENAI_API_KEY}}
```
This keeps staging and production credentials separate.
Step 3: Save a GPT-5.5 Instant request template
Create a request body like this:
```json
{
  "model": "gpt-5.5",
  "reasoning": {
    "effort": "minimal"
  },
  "input": [
    {
      "role": "user",
      "content": "Summarize this changelog entry in 3 bullet points: ..."
    }
  ],
  "max_output_tokens": 400
}
```
Save it as a reusable request so teammates can replay the exact same prompt and settings.
Step 4: Compare Instant and Thinking behavior
Duplicate the request and change only the reasoning effort:
```json
{
  "reasoning": {
    "effort": "high"
  }
}
```
Run both requests and compare:
- Latency
- Token usage
- Response body
- Accuracy on your expected output
- Cost impact
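You can automate the same comparison in a script. The harness below is a sketch: it accepts any callable, so it works with a stubbed client in tests, and it assumes the response exposes a `usage.total_tokens` attribute (adapt that field name to whatever your client actually returns):

```python
import time

def compare_runs(call_model, efforts=("minimal", "high")):
    """Time the same request at each effort level and collect token
    usage. `call_model` is any callable taking an effort string and
    returning an object with a `usage.total_tokens` attribute (an
    assumed shape for this sketch)."""
    results = {}
    for effort in efforts:
        start = time.perf_counter()
        response = call_model(effort)
        elapsed = time.perf_counter() - start
        results[effort] = {
            "latency_s": round(elapsed, 3),
            "total_tokens": response.usage.total_tokens,
        }
    return results
```

Run it twice a day against a fixed prompt set and you get a cheap drift check on both latency and cost.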
Step 5: Add assertions
Turn the request into a test scenario. Assert on fields that matter to your app, such as response status, schema, or required text.
Example checks:
```javascript
pm.test("status is 200", function () {
  pm.response.to.have.status(200);
});

pm.test("response contains output text", function () {
  const json = pm.response.json();
  pm.expect(json.output).to.exist;
});
```
Step 6: Run the scenario in CI
Use Apidog scenarios to catch regressions when:
- You edit a system prompt
- OpenAI ships a model update
- You change request parameters
- You move from staging to production
For a deeper testing workflow, see API testing for QA engineers. You can also download and install Apidog to follow along.
Advanced implementation tips
Pin reasoning effort per route
Do not use high reasoning everywhere. Route by task complexity.
Example:
```javascript
const reasoningByRoute = {
  support_triage: "minimal",
  docs_qa: "low",
  security_review: "medium",
  incident_analysis: "high",
};
```
Then pass the selected value into the request:
```javascript
const response = await client.responses.create({
  model: "gpt-5.5",
  reasoning: { effort: reasoningByRoute.support_triage },
  input: [{ role: "user", content: ticketText }],
  max_output_tokens: 500,
});
```
Cap output tokens
GPT-5.5 can generate up to 128,000 output tokens. Always set `max_output_tokens`.
```python
response = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "minimal"},
    input=[{"role": "user", "content": "Write a concise support reply: ..."}],
    max_output_tokens=300,
)
```
Watch the 272K-token billing cliff
If your prompt crosses 272K input tokens, the rest of the session can cost more. For long-document workflows:
- Chunk documents
- Use retrieval instead of stuffing full documents into one prompt
- Stream partial results
- Keep session boundaries clear
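A simple character-based chunker is often enough to stay under the threshold. The 4-characters-per-token ratio below is a rough English-text average, not a tokenizer; use a real tokenizer such as tiktoken when you need exact counts:

```python
def chunk_text(text: str, max_tokens: int = 200_000,
               chars_per_token: int = 4) -> list[str]:
    """Split a document into chunks that stay well under the 272K
    input-token threshold. Token count is approximated as
    len(text) / chars_per_token, which is only a heuristic."""
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Each chunk can then be summarized independently and the summaries merged in a final, much smaller request.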
Use Batch for offline jobs
Batch is cheaper and fits workloads without strict latency needs:
- Bulk support ticket classification
- Weekly report summaries
- Backfills
- Large-scale content transformation
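Batch jobs are submitted as a JSONL file of request envelopes. A helper to build those lines might look like this; the `custom_id`/`method`/`url`/`body` envelope follows the OpenAI Batch API format as we understand it, and the upload calls are left as comments since they need a live key:

```python
import json

def build_batch_lines(prompts, model="gpt-5.5"):
    """Build JSONL request lines for an offline batch job: one
    independent Responses API request per prompt."""
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"task-{i}",
            "method": "POST",
            "url": "/v1/responses",
            "body": {
                "model": model,
                "reasoning": {"effort": "minimal"},
                "input": [{"role": "user", "content": prompt}],
                "max_output_tokens": 300,
            },
        }))
    return "\n".join(lines)

# Upload and start the job (sketch; requires an API key):
# batch_file = client.files.create(file=open("batch.jsonl", "rb"),
#                                  purpose="batch")
# job = client.batches.create(input_file_id=batch_file.id,
#                             endpoint="/v1/responses",
#                             completion_window="24h")
```

Results come back keyed by `custom_id`, so the IDs should map cleanly to your own records.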
Use Priority only when latency matters
Priority costs more. Reserve it for user-facing paths where response time affects the product experience.
Stream to your frontend
For web apps, stream tokens to the browser through WebSockets or Server-Sent Events.
Example SSE shape:
```javascript
app.get("/chat", async (req, res) => {
  res.setHeader("Content-Type", "text/event-stream");

  const stream = await client.responses.create({
    model: "gpt-5.5",
    reasoning: { effort: "minimal" },
    input: [{ role: "user", content: req.query.prompt }],
    stream: true,
  });

  for await (const event of stream) {
    if (event.type === "response.output_text.delta") {
      res.write(`data: ${JSON.stringify({ delta: event.delta })}\n\n`);
    }
  }

  res.end();
});
```
Common mistakes to avoid
- Using `gpt-5.5-pro` for low-stakes prompts. Pro costs significantly more. Use it only when the quality gain justifies the bill.
- Leaving the system prompt empty. A short system prompt improves consistency and usually reduces wasted tokens.
- Not setting `reasoning.effort` explicitly. Pin it so traces are reproducible.
- Hardcoding API keys. Use environment variables, a secret manager, or Apidog environments.
- Forgetting `max_output_tokens`. Always cap output to control cost.
Alternatives and comparison
GPT-5.5 Instant is one option among fast frontier models.
| Model | Input | Output | Context | Notable strength |
|---|---|---|---|---|
| GPT-5.5 Instant | $5.00 / 1M | $30.00 / 1M | 1M | ChatGPT default, low hallucination, broad tool use |
| GPT-5.5 Pro | $30.00 / 1M | $180.00 / 1M | 1M | Highest accuracy in the OpenAI lineup |
| Gemini 3 Flash Preview | varies | varies | 1M | Fast multimodal, Google ecosystem fit |
| DeepSeek V4 | low | low | 128K | Low-cost open-weights frontier model |
Use GPT-5.5 Instant when you need ChatGPT-grade reliability, tool use, and low latency. Consider alternatives when your infrastructure, cost model, or multimodal requirements point elsewhere.
Real-world implementation patterns
Customer support triage
Use minimal effort for fast classification.
```python
response = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "minimal"},
    input=[
        {
            "role": "system",
            "content": "Classify support tickets by intent. Return only JSON."
        },
        {
            "role": "user",
            "content": "I was charged twice for my subscription."
        }
    ],
    max_output_tokens=200,
)
```
Expected output shape:
```json
{
  "intent": "billing_issue",
  "priority": "high",
  "needs_human": true
}
```
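Because the model returns plain text, validate the JSON before acting on it. A minimal check against the shape above (the field names come from the expected output in this guide, not an official schema):

```python
import json

# Expected fields and their types, taken from the example output shape.
REQUIRED_FIELDS = {"intent": str, "priority": str, "needs_human": bool}

def parse_triage(raw: str) -> dict:
    """Parse the model's triage reply and fail loudly on a bad shape
    instead of routing a ticket on garbage."""
    data = json.loads(raw)
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"bad or missing field: {field}")
    return data
```

On a ValueError you can retry the request or fall back to human review rather than silently misrouting the ticket.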
Documentation Q&A
Use retrieval-augmented context and keep the reasoning effort low unless the question requires synthesis.
```javascript
const response = await client.responses.create({
  model: "gpt-5.5",
  reasoning: { effort: "low" },
  input: [
    {
      role: "system",
      content: "Answer using only the provided documentation context."
    },
    {
      role: "user",
      content: `Context:\n${retrievedDocs}\n\nQuestion: ${question}`
    }
  ],
  max_output_tokens: 700,
});
```
Code review assistant
Use low for common review comments and medium for security-sensitive code.
```python
response = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "medium"},
    input=[
        {
            "role": "system",
            "content": "Review the code for bugs, security issues, and API misuse."
        },
        {
            "role": "user",
            "content": diff_text
        }
    ],
    max_output_tokens=1200,
)
```
Pair this with the Apidog VS Code extension when you want inline API tests for suggested changes.
Conclusion
GPT-5.5 Instant is the default path for using GPT-5.5 with low latency. In ChatGPT, it is already enabled. In the API, call gpt-5.5 and set reasoning.effort to "minimal".
Key implementation points:
- Use `gpt-5.5` for API calls.
- Set `reasoning.effort` explicitly.
- Use `minimal` for Instant-like latency.
- Set `max_output_tokens` to control cost.
- Watch the 272K-token billing threshold.
- Stream responses for better UX.
- Test prompts and request bodies before deployment.
If you are building with the API, install Apidog, save a reusable gpt-5.5 request template, and run it across environments before shipping.
FAQ
Is GPT-5.5 Instant free?
Yes, with caps. Free ChatGPT accounts get 10 messages every 5 hours. Plus accounts get 160 messages every 3 hours. Pro and Business accounts get unlimited use, subject to abuse guardrails.
What is the API model name for GPT-5.5 Instant?
Use `gpt-5.5`. There is no separate `gpt-5.5-instant` model ID. Set `reasoning.effort: "minimal"` for Instant-like behavior.
See the GPT-5.5 API guide.
How is GPT-5.5 Instant different from GPT-5.5 Thinking?
They use the same underlying GPT-5.5 family, but with different reasoning budgets. Instant is optimized for fast responses. Thinking spends more compute on harder multi-step tasks. Pro adds more compute on top of Thinking.
Does GPT-5.5 Instant support tool use?
Yes. Through the Responses API, you can use the tools parameter for supported tool workflows, including search, code execution, and file-based operations.
What is the context window?
GPT-5.5 supports a 1M-token context window and up to 128,000 output tokens per response.
Watch the 272K input-token threshold. Past that point, standard, batch, and flex sessions are billed at 2x input and 1.5x output.
Can I pin GPT-5.5 Instant in ChatGPT?
Yes, on Plus, Pro, and Business plans. Open the model picker in the chat header and select GPT-5.5 Instant. The selection applies to the current chat.
How do I test GPT-5.5 Instant before deploying?
Save the request in Apidog, store the API key as an environment secret, add response assertions, and run the scenario in CI.
What happens when GPT-5.5 Instant routes me to Thinking?
ChatGPT’s router may switch to Thinking when the prompt looks complex. You may see a longer wait for the first token. In the API, pin the behavior yourself by setting reasoning.effort.



