DEV Community

Vaishali

Chat Completions vs OpenAI Responses API: What Actually Changed

While learning about structured outputs, I noticed something strange.
Almost every tutorial, course, and example I found was still using the Chat Completions API.

But the OpenAI documentation kept referencing something newer:
The Responses API.

At first I assumed it was just another wrapper around the same thing.
But the more I looked into it, the more it became clear:

The Responses API isn’t just a new endpoint.
It’s the direction OpenAI is pushing future AI applications.


πŸ€– A Quick Look at the Evolution

OpenAI APIs have gone through a few stages:

Completions API
↓
Chat Completions API
↓
Responses API

Each step moved the API closer to something easier to use inside real applications.

  • Completions β†’ simple text generation
  • Chat Completions β†’ conversation format
  • Responses API β†’ full AI system interface

The Responses API doesn't just rename endpoints β€” it simplifies how AI systems handle conversations, tools, and structured data.

It was built for modern capabilities like reasoning models, tool usage, and structured outputs.

Several small changes in the API design make it noticeably easier to build real applications.


🧩 Simpler Requests and Cleaner Responses

With the Chat Completions API, prompts are structured as message arrays.

Example:

from openai import OpenAI
client = OpenAI()

completion = client.chat.completions.create(
  model="gpt-5",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
)

print(completion.choices[0].message.content)

Two things stand out here:

  1. Requests require managing a messages array.
  2. Responses are nested inside a choices list.

Even when you only generate one response, you still have to access it like this:

completion.choices[0].message.content

The Responses API simplifies both sides.

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
   model="gpt-5",
   instructions="You are a helpful assistant.",
   input="Hello!"
)

print(response.output_text)

Now:

  • requests use clearer fields like instructions and input
  • responses can be accessed directly with response.output_text

This removes unnecessary nesting and makes the API simpler to read and easier to work with.


πŸ” Handling Conversations Is Much Cleaner

With Chat Completions, you have to manually manage conversation history.

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]

res1 = client.chat.completions.create(
    model="gpt-5",
    messages=messages
)

messages += [res1.choices[0].message]
messages += [{"role": "user", "content": "And its population?"}]

res2 = client.chat.completions.create(
    model="gpt-5",
    messages=messages
)

Every response has to be manually appended to the message history.

The Responses API introduces a much cleaner approach.

res1 = client.responses.create(
    model="gpt-5",
    input="What is the capital of France?",
    store=True
)

res2 = client.responses.create(
    model="gpt-5",
    input="And its population?",
    previous_response_id=res1.id,
    store=True
)

Here the API keeps track of context using:

previous_response_id

Instead of passing the entire conversation again, the model can continue reasoning from the previous response.


βš™οΈ Structured Outputs Are Cleaner Too

In Chat Completions, structured outputs are defined with response_format.

response = client.chat.completions.create(
  model="gpt-5",
  messages=[{"role":"user","content":"Jane, 54 years old"}],
  response_format={           # <--- Important
    "type": "json_schema",
    "json_schema": {
      "name": "person",
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "age": {"type": "number"}
        }
      }
    }
  }
)

In the Responses API, this moves into a more intuitive structure:

response = client.responses.create(
  model="gpt-5",
  input="Jane, 54 years old",
  text={                       # <--- Important
    "format": {
      "type": "json_schema",
      "name": "person",
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "age": {"type": "number"}
        }
      }
    }
  }
)

This makes structured output feel like a native capability of the API, rather than an add-on.
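Because the schema constrains the model to valid JSON, the text in `output_text` can be parsed directly with the standard library. A minimal sketch, using a sample string standing in for `response.output_text`:

```python
import json

# Sample text standing in for response.output_text from the request above
output_text = '{"name": "Jane", "age": 54}'

person = json.loads(output_text)
print(person["name"], person["age"])  # Jane 54
```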


πŸ›  Function Calling Is Simpler

Function calling also became cleaner in the Responses API.

In Chat Completions, functions are defined with an extra layer of nesting:

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Determine weather in my location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string"
        }
      }
    }
  }
}

The Responses API removes that unnecessary wrapper and simplifies the structure:

{
  "type": "function",
  "name": "get_weather",
  "description": "Determine weather in my location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string"
      }
    }
  }
}

The schema now lives directly inside the tool definition itself, which makes function definitions easier to read and maintain.

Another small but important difference:

  • Chat Completions functions are non-strict by default
  • Responses API functions are strict by default

This means the model is more likely to follow the defined schema without extra validation logic.
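For comparison, here is a sketch of what an explicitly strict definition looks like. In strict mode, every property must appear in `required` and `additionalProperties` must be false:

```python
# Sketch of an explicitly strict tool definition (Responses API style).
# Strict schemas must list every property in "required" and set
# "additionalProperties" to False.
get_weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Determine weather in my location",
    "strict": True,  # the Responses API default, shown here explicitly
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        },
        "required": ["location"],
        "additionalProperties": False,
    },
}
```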


🧠 Built-in Tool Usage

Another major difference is native tool support.

With Chat Completions, developers typically had to define and manage tools themselves.

Example:

import requests

def web_search(query):
    r = requests.get("https://api.example.com/search", params={"q": query})
    return r.json().get("results", [])

functions=[
  {
    "name": "web_search",
    "description": "Search the web",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {"type": "string"}
      }
    }
  }
]
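And defining the function is only half the work: when the model returns a tool call, your code also has to parse the arguments and dispatch to the right function. A minimal sketch of that dispatch step, using a hard-coded tool call in place of a real model response:

```python
import json

def web_search(query):
    # Stub standing in for the real search function above
    return [f"result for {query}"]

# Hard-coded tool call, shaped like what Chat Completions returns
tool_call = {"name": "web_search", "arguments": '{"query": "capital of France"}'}

# Look up the requested function, parse its JSON arguments, and call it
available = {"web_search": web_search}
args = json.loads(tool_call["arguments"])
result = available[tool_call["name"]](**args)
print(result)  # ['result for capital of France']
```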

The Responses API introduces built-in tools that can be used directly.

Some examples available on the OpenAI platform include:

  • Web search
  • File search
  • Image generation
  • Code interpreter
  • Remote MCP servers
  • Skills

Instead of implementing these manually, you can simply specify the tool you want to use.

answer = client.responses.create(
    model="gpt-5",
    input="Who is the current president of France?",
    tools=[{"type": "web_search_preview"}]
)

print(answer.output_text)

The model can now use the tool inside the same request, making it easier to build tool-powered AI applications.


πŸ“ˆ Other Improvements

The Responses API also introduces several practical improvements:

  • Better performance with reasoning models
  • Lower costs through improved caching
  • Stateful context between requests
  • Built-in tool integrations
  • Future compatibility with upcoming models

These changes make it easier to build agent-like workflows without complex orchestration logic.


πŸ” So Should You Still Use Chat Completions?

Chat Completions still works and is widely used.

But OpenAI is clearly designing new models and features around the Responses API.

For new projects, the newer API often provides:

  • simpler requests
  • cleaner structured outputs
  • built-in tool support
  • better context management

🌱 The Takeaway

At first glance, the Responses API might look like a small change.
But it represents something bigger.

Earlier APIs treated LLMs like chat interfaces.

The Responses API treats them more like programmable systems β€” capable of reasoning, using tools, and maintaining context.

And that subtle change makes building AI systems much easier.
