Hassann

Posted on • Originally published at apidog.com

How to Access and Use GPT-5.5 Instant: ChatGPT + API Guide

OpenAI swapped ChatGPT’s default model on May 5, 2026. GPT-5.5 Instant replaced GPT-5.3 Instant, reduced hallucinated claims on high-stakes prompts by 52.5%, and kept the low-latency behavior developers expect from the Instant tier. On the API side, the model ships under the gpt-5.5 name with a 1M-token context window and published per-million-token pricing.


This guide shows how to access GPT-5.5 Instant in ChatGPT, how the router switches between Instant and Thinking, and how to call the model from code with reproducible API tests.

TL;DR

GPT-5.5 Instant is OpenAI’s new ChatGPT default and the fast tier of the GPT-5.5 family.

  • Free users: 10 messages every 5 hours
  • Plus users: 160 messages every 3 hours
  • Pro and Business users: unlimited use, subject to abuse guardrails
  • API model name: gpt-5.5
  • Recommended endpoint: Responses API
  • Instant-like API setting: reasoning.effort: "minimal"
  • Context window: 1M tokens
  • Max output: 128,000 tokens
  • Standard pricing: $5 per 1M input tokens, $30 per 1M output tokens

What GPT-5.5 Instant is

GPT-5.5 Instant is the latency-optimized variant of GPT-5.5. In ChatGPT, OpenAI exposes three GPT-5.5 modes:

| Mode | Best for |
| --- | --- |
| GPT-5.5 Instant | Fast responses, default chat, low-latency UX |
| GPT-5.5 Thinking | Deeper reasoning and harder multi-step tasks |
| GPT-5.5 Pro | Extra compute for paid tiers and highest-accuracy workloads |

GPT-5.5 Instant screenshot

Instant exists because OpenAI uses a router. When a prompt looks simple, ChatGPT stays on Instant. When the prompt requires more reasoning, the router may switch to GPT-5.5 Thinking automatically.

Paid users can also pin Instant manually from the model picker when predictable latency matters.

GPT-5.5 model picker

GPT-5.5 Instant and GPT-5.5 Thinking share the same underlying model family. The difference is the reasoning budget, not the knowledge base.

Both support:

  • 1M-token context
  • Up to 128,000 output tokens
  • Code generation and debugging
  • Live web search through the search tool
  • File handling for PDFs, images, and spreadsheets
  • Memory in supported Plus and Pro web sessions

For the broader release details, see the GPT-5.5 overview.

How to access GPT-5.5 Instant in ChatGPT

Open chatgpt.com or the mobile app and send a message. GPT-5.5 Instant is now the default model across account tiers.

The main difference is the message cap.

| Plan | GPT-5.5 Instant cap | After the cap |
| --- | --- | --- |
| Free | 10 messages every 5 hours | Falls back to GPT-5.5 mini |
| Plus | 160 messages every 3 hours | Falls back to GPT-5.5 mini |
| Pro | Unlimited, subject to abuse guardrails | Stays on GPT-5.5 |
| Business | Unlimited, subject to abuse guardrails | Stays on GPT-5.5 |
| Enterprise | Unlimited, subject to abuse guardrails | Stays on GPT-5.5 |

Plus, Pro, and Business users can pin GPT-5.5 Instant or GPT-5.5 Thinking from the model picker in the chat header. Pinning applies to the current chat, not your whole account.

When the router switches to Thinking

If you do not pin a model, ChatGPT decides whether to use Instant or Thinking. OpenAI has not published the full routing rules, but Thinking commonly appears when a prompt:

  • Requires a multi-step plan
  • Has ambiguous constraints
  • Involves high-stakes domains such as medicine, law, or finance
  • Needs synthesis across a long context
  • Uses tools or agent-like execution

For normal chat, Instant is usually the right default. For guaranteed reasoning depth, pin Thinking manually or set a higher reasoning effort in the API.

How to call GPT-5.5 Instant through the API

In the API, GPT-5.5 Instant does not have a separate model ID. Use:

gpt-5.5

Then control the reasoning behavior with reasoning.effort.

Supported values:

minimal
low
medium
high

For Instant-like behavior, use:

{
  "reasoning": {
    "effort": "minimal"
  }
}

GPT-5.5 is available through two endpoints:

| Endpoint | Use case |
| --- | --- |
| Responses API (/v1/responses) | Recommended for new apps, tools, structured output, and streaming |
| Chat Completions API (/v1/chat/completions) | Legacy compatibility |

Pricing

| Tier | Input | Output |
| --- | --- | --- |
| Standard | $5.00 / 1M tokens | $30.00 / 1M tokens |
| Batch | $2.50 / 1M tokens | $15.00 / 1M tokens |
| Flex | $2.50 / 1M tokens | $15.00 / 1M tokens |
| Priority | $12.50 / 1M tokens | $75.00 / 1M tokens |

Important: once a prompt exceeds 272K input tokens, input is billed at 2x and output at 1.5x for the rest of the session on every tier except Priority.
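The rates and the long-context surcharge can be folded into a simple cost estimator. A minimal sketch using the Standard-tier numbers from the table above; applying the multipliers per request (rather than per session) is a simplifying assumption:

```python
# Sketch of a per-request cost estimator for gpt-5.5 Standard-tier pricing.
# Rates come from the pricing table above; the long-context surcharge
# (2x input, 1.5x output past 272K input tokens) is applied per request
# here, which is a simplifying assumption.

INPUT_RATE = 5.00 / 1_000_000    # USD per input token (Standard tier)
OUTPUT_RATE = 30.00 / 1_000_000  # USD per output token (Standard tier)
LONG_CONTEXT_THRESHOLD = 272_000

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    over = input_tokens > LONG_CONTEXT_THRESHOLD
    in_mult, out_mult = (2.0, 1.5) if over else (1.0, 1.0)
    return input_tokens * INPUT_RATE * in_mult + output_tokens * OUTPUT_RATE * out_mult

print(round(estimate_cost(100_000, 2_000), 4))  # comfortably under the threshold
print(round(estimate_cost(300_000, 2_000), 4))  # past the threshold, surcharged
```

Running the two calls side by side makes the cliff concrete: the second prompt is only 3x longer but costs well over 5x more.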

For more examples, see the GPT-5.5 pricing breakdown.

Minimal Python request

Create an API key from the OpenAI platform, then install the SDK.

OpenAI API key screenshot

pip install --upgrade openai
export OPENAI_API_KEY="sk-..."

Call the Responses API:

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "minimal"},
    input=[
        {
            "role": "user",
            "content": "Summarize this changelog entry in 3 bullet points: ..."
        }
    ],
    max_output_tokens=400,
)

print(response.output_text)

Use reasoning={"effort": "minimal"} for the closest API equivalent to GPT-5.5 Instant in ChatGPT.

Increase the effort when needed:

reasoning={"effort": "medium"}

or:

reasoning={"effort": "high"}

Minimal Node.js request

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-5.5",
  reasoning: { effort: "minimal" },
  input: [
    {
      role: "user",
      content: "Translate this product description into Spanish, keeping HTML intact: ..."
    }
  ],
  max_output_tokens: 600,
});

console.log(response.output_text);

Stream GPT-5.5 Instant responses

Streaming gives users faster perceived latency because the UI can render tokens as they arrive.

from openai import OpenAI

client = OpenAI()

stream = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "minimal"},
    input=[
        {
            "role": "user",
            "content": "Draft a release note for v2.7..."
        }
    ],
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

If you are migrating from Chat Completions, note that the response shape is different. The output_text helper flattens the structured response blocks into a plain string.
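If you want the structured blocks rather than the helper, you can flatten them yourself. A minimal sketch, assuming the response shape this guide describes (an output list whose message items carry output_text parts); the exact field names should be checked against the current API reference:

```python
# Flatten a Responses API payload into plain text, mirroring what the
# SDK's output_text helper does. The dict shape below is an assumption
# based on the structured response blocks described above.

def flatten_output_text(response: dict) -> str:
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip reasoning items, tool calls, etc.
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)

sample = {
    "output": [
        {"type": "reasoning", "summary": []},
        {"type": "message", "content": [{"type": "output_text", "text": "Hello."}]},
    ]
}
print(flatten_output_text(sample))  # → Hello.
```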

For free-tier API usage and quota details, see the GPT-5.5 free access guide.

Test GPT-5.5 Instant requests with Apidog before shipping

A notebook is enough for quick experiments. Production work needs repeatable request templates, environment secrets, response assertions, and CI-ready tests.

Apidog API testing screenshot

Apidog gives you that workflow without writing throwaway scripts.

Step 1: Import the OpenAI OpenAPI spec

Apidog supports OpenAPI 3.x. Import the Responses API spec so your endpoints, parameters, and response schemas are available in one workspace.

Step 2: Store your API key as an environment secret

Add your OpenAI key to an Apidog environment, for example:

OPENAI_API_KEY=sk-...

Then reference it in the Authorization header:

Authorization: Bearer {{OPENAI_API_KEY}}

This keeps staging and production credentials separate.

Step 3: Save a GPT-5.5 Instant request template

Create a request body like this:

{
  "model": "gpt-5.5",
  "reasoning": {
    "effort": "minimal"
  },
  "input": [
    {
      "role": "user",
      "content": "Summarize this changelog entry in 3 bullet points: ..."
    }
  ],
  "max_output_tokens": 400
}

Save it as a reusable request so teammates can replay the exact same prompt and settings.

Step 4: Compare Instant and Thinking behavior

Duplicate the request and change only the reasoning effort:

{
  "reasoning": {
    "effort": "high"
  }
}

Run both requests and compare:

  • Latency
  • Token usage
  • Response body
  • Accuracy on your expected output
  • Cost impact

Step 5: Add assertions

Turn the request into a test scenario. Assert on fields that matter to your app, such as response status, schema, or required text.

Example checks:

pm.test("status is 200", function () {
  pm.response.to.have.status(200);
});

pm.test("response contains output text", function () {
  const json = pm.response.json();
  pm.expect(json.output).to.exist;
});

Step 6: Run the scenario in CI

Use Apidog scenarios to catch regressions when:

  • You edit a system prompt
  • OpenAI ships a model update
  • You change request parameters
  • You move from staging to production

For a deeper testing workflow, see API testing for QA engineers. You can install Apidog from Download Apidog.

Advanced implementation tips

Pin reasoning effort per route

Do not use high reasoning everywhere. Route by task complexity.

Example:

const reasoningByRoute = {
  support_triage: "minimal",
  docs_qa: "low",
  security_review: "medium",
  incident_analysis: "high",
};

Then pass the selected value into the request:

const response = await client.responses.create({
  model: "gpt-5.5",
  reasoning: { effort: reasoningByRoute.support_triage },
  input: [{ role: "user", content: ticketText }],
  max_output_tokens: 500,
});

Cap output tokens

GPT-5.5 can generate up to 128,000 output tokens. Always set max_output_tokens.

response = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "minimal"},
    input=[{"role": "user", "content": "Write a concise support reply: ..."}],
    max_output_tokens=300,
)

Watch the 272K-token billing cliff

If your prompt crosses 272K input tokens, the rest of the session can cost more. For long-document workflows:

  • Chunk documents
  • Use retrieval instead of stuffing full documents into one prompt
  • Stream partial results
  • Keep session boundaries clear
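The chunking step can be sketched with a simple overlapping splitter. This one budgets by characters as a stand-in for token counting; production code would measure with a real tokenizer such as tiktoken:

```python
# Naive overlap chunker for long-document workflows. Splits on a
# character budget as a stand-in for token counting; swap in a real
# tokenizer (e.g. tiktoken) before relying on it for billing limits.

def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks of at most max_chars characters."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

doc = "x" * 5000
pieces = chunk_text(doc)
print(len(pieces))  # → 3
```

Each chunk can then go into its own request, keeping every prompt far below the 272K-token threshold.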

Use Batch for offline jobs

Batch is cheaper and fits workloads without strict latency needs:

  • Bulk support ticket classification
  • Weekly report summaries
  • Backfills
  • Large-scale content transformation

Use Priority only when latency matters

Priority costs more. Reserve it for user-facing paths where response time affects the product experience.

Stream to your frontend

For web apps, stream tokens to the browser through WebSockets or Server-Sent Events.

Example SSE shape:

app.get("/chat", async (req, res) => {
  res.setHeader("Content-Type", "text/event-stream");

  const stream = await client.responses.create({
    model: "gpt-5.5",
    reasoning: { effort: "minimal" },
    input: [{ role: "user", content: req.query.prompt }],
    stream: true,
  });

  for await (const event of stream) {
    if (event.type === "response.output_text.delta") {
      res.write(`data: ${JSON.stringify({ delta: event.delta })}\n\n`);
    }
  }

  res.end();
});

Common mistakes to avoid

  1. Using gpt-5.5-pro for low-stakes prompts

    Pro costs significantly more. Use it only when the quality gain justifies the bill.

  2. Leaving the system prompt empty

    A short system prompt improves consistency and usually reduces wasted tokens.

  3. Not setting reasoning.effort explicitly

    Pin it so traces are reproducible.

  4. Hardcoding API keys

    Use environment variables, a secret manager, or Apidog environments.

  5. Forgetting max_output_tokens

    Always cap output to control cost.
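A lightweight pre-flight check can catch most of these mistakes before a request ever ships. A hypothetical sketch, with one check per item in the list above:

```python
# Hypothetical pre-flight linter for gpt-5.5 request payloads. Each
# check corresponds to one of the common mistakes listed above.

def lint_request(payload: dict) -> list[str]:
    """Return human-readable warnings for a Responses API payload."""
    warnings = []
    if payload.get("model") == "gpt-5.5-pro":
        warnings.append("gpt-5.5-pro selected: confirm the quality gain justifies the cost")
    roles = {m.get("role") for m in payload.get("input", [])}
    if "system" not in roles:
        warnings.append("no system prompt: add one for consistency")
    if "effort" not in payload.get("reasoning", {}):
        warnings.append("reasoning.effort not set: pin it for reproducible traces")
    if "max_output_tokens" not in payload:
        warnings.append("max_output_tokens missing: cap output to control cost")
    return warnings

bad = {"model": "gpt-5.5", "input": [{"role": "user", "content": "hi"}]}
print(len(lint_request(bad)))  # → 3
```

Wire a check like this into code review or CI so a payload with missing guardrails fails loudly instead of quietly burning tokens.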

Alternatives and comparison

GPT-5.5 Instant is one option among fast frontier models.

| Model | Input | Output | Context | Notable strength |
| --- | --- | --- | --- | --- |
| GPT-5.5 Instant | $5.00 / 1M | $30.00 / 1M | 1M | ChatGPT default, low hallucination, broad tool use |
| GPT-5.5 Pro | $30.00 / 1M | $180.00 / 1M | 1M | Highest accuracy in the OpenAI lineup |
| Gemini 3 Flash Preview | varies | varies | 1M | Fast multimodal, Google ecosystem fit |
| DeepSeek V4 | low | low | 128K | Low-cost open-weights frontier model |

Use GPT-5.5 Instant when you need ChatGPT-grade reliability, tool use, and low latency. Consider alternatives when your infrastructure, cost model, or multimodal requirements point elsewhere.

Real-world implementation patterns

Customer support triage

Use minimal effort for fast classification.

response = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "minimal"},
    input=[
        {
            "role": "system",
            "content": "Classify support tickets by intent. Return only JSON."
        },
        {
            "role": "user",
            "content": "I was charged twice for my subscription."
        }
    ],
    max_output_tokens=200,
)

Expected output shape:

{
  "intent": "billing_issue",
  "priority": "high",
  "needs_human": true
}

Documentation Q&A

Use retrieval-augmented context and keep the reasoning effort low unless the question requires synthesis.

const response = await client.responses.create({
  model: "gpt-5.5",
  reasoning: { effort: "low" },
  input: [
    {
      role: "system",
      content: "Answer using only the provided documentation context."
    },
    {
      role: "user",
      content: `Context:\n${retrievedDocs}\n\nQuestion: ${question}`
    }
  ],
  max_output_tokens: 700,
});

Code review assistant

Use low for common review comments and medium for security-sensitive code.

response = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "medium"},
    input=[
        {
            "role": "system",
            "content": "Review the code for bugs, security issues, and API misuse."
        },
        {
            "role": "user",
            "content": diff_text
        }
    ],
    max_output_tokens=1200,
)

Pair this with the Apidog VS Code extension when you want inline API tests for suggested changes.


Conclusion

GPT-5.5 Instant is the default path for using GPT-5.5 with low latency. In ChatGPT, it is already enabled. In the API, call gpt-5.5 and set reasoning.effort to "minimal".

Key implementation points:

  • Use gpt-5.5 for API calls.
  • Set reasoning.effort explicitly.
  • Use minimal for Instant-like latency.
  • Set max_output_tokens to control cost.
  • Watch the 272K-token billing threshold.
  • Stream responses for better UX.
  • Test prompts and request bodies before deployment.

If you are building with the API, install Apidog, save a reusable gpt-5.5 request template, and run it across environments before shipping.


FAQ

Is GPT-5.5 Instant free?

Yes, with caps. Free ChatGPT accounts get 10 messages every 5 hours. Plus accounts get 160 messages every 3 hours. Pro and Business accounts get unlimited use, subject to abuse guardrails.

What is the API model name for GPT-5.5 Instant?

Use gpt-5.5. There is no separate gpt-5.5-instant model ID. Set reasoning.effort: "minimal" for Instant-like behavior.

See the GPT-5.5 API guide.

How is GPT-5.5 Instant different from GPT-5.5 Thinking?

They use the same underlying GPT-5.5 family, but with different reasoning budgets. Instant is optimized for fast responses. Thinking spends more compute on harder multi-step tasks. Pro adds more compute on top of Thinking.

Does GPT-5.5 Instant support tool use?

Yes. Through the Responses API, you can use the tools parameter for supported tool workflows, including search, code execution, and file-based operations.
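As a sketch, a request body enabling a hosted tool might be built like this. The "web_search" tool type identifier is an assumption; verify the exact name against OpenAI's current tools documentation before relying on it:

```python
# Build a Responses API payload with a hosted tool enabled. The
# "web_search" tool type is an assumption for illustration; check the
# current tools documentation for the exact identifier.

def build_tool_request(prompt: str) -> dict:
    return {
        "model": "gpt-5.5",
        "reasoning": {"effort": "minimal"},
        "tools": [{"type": "web_search"}],
        "input": [{"role": "user", "content": prompt}],
        "max_output_tokens": 500,
    }

payload = build_tool_request("What changed in the latest stable release of Node.js?")
print(payload["tools"][0]["type"])  # → web_search
```

The payload then goes to client.responses.create(**payload) exactly as in the earlier examples.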

What is the context window?

GPT-5.5 supports a 1M-token context window and up to 128,000 output tokens per response.

Watch the 272K input-token threshold. Past that point, standard, batch, and flex sessions are billed at 2x input and 1.5x output.

Can I pin GPT-5.5 Instant in ChatGPT?

Yes, on Plus, Pro, and Business plans. Open the model picker in the chat header and select GPT-5.5 Instant. The selection applies to the current chat.

How do I test GPT-5.5 Instant before deploying?

Save the request in Apidog, store the API key as an environment secret, add response assertions, and run the scenario in CI.

What happens when GPT-5.5 Instant routes me to Thinking?

ChatGPT’s router may switch to Thinking when the prompt looks complex. You may see a longer wait for the first token. In the API, pin the behavior yourself by setting reasoning.effort.
