If you're here because you just got this error:
Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'.
…from the OpenAI Responses API on a request that worked last week, you're not debugging your code. You're debugging OpenAI's.
What happened
On or around October 21, 2025, the OpenAI Responses API started rejecting input_text content items when they appeared inside a message with role: "assistant". The documented format — the one every SDK and every tutorial was using — suddenly returned a 400.
Before:
{
"role": "assistant",
"content": [
{ "type": "input_text", "text": "Hi, how can I help?" }
]
}
After:
HTTP 400 Bad Request
Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'.
Same endpoint. Same SDK version. Same request body. Just stopped working.
The community thread that went up the day it broke captures the vibe:
"I'm wondering why there is no backward compatibility and no even deprecation messages for that. It's really bad user experience."
A moderator responded:
"Thanks for flagging this! I asked staff for clarification because apparently everything is gone besides input audio."
"Apparently everything is gone" is not how you want to find out your production AI pipeline has been migrated under you.
The fix
For assistant-role messages, the type name changed. input_text is now only valid inside user-role messages. For assistant messages, you need output_text:
{
"role": "assistant",
"content": [
- { "type": "input_text", "text": "Hi, how can I help?" }
+ { "type": "output_text", "text": "Hi, how can I help?" }
]
}
That's it. That's the whole fix. The semantics are identical; the type tag just has to match the role now.
If you're building the assistant message from scratch (e.g., seeding a conversation), you use output_text. If you're echoing back a prior turn, you were probably already using output_text (because that's what the API returned). The breakage is specifically on the "I manually built an assistant turn" path.
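To make that concrete: a minimal sketch of the patched request in Python, using raw HTTP against the /v1/responses endpoint. The model name and conversation content are illustrative.

# Seeding a conversation with a manually built assistant turn.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/responses",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o",  # illustrative
        "input": [
            {
                "role": "assistant",
                # output_text, because the role is assistant
                "content": [{"type": "output_text", "text": "Hi, how can I help?"}],
            },
            {
                "role": "user",
                # input_text is still the right tag for user turns
                "content": [{"type": "input_text", "text": "Where's my order?"}],
            },
        ],
    },
    timeout=30,
)
resp.raise_for_status()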
A few things worth noting before you patch and move on:
- The error message is honest, but only if you read it closely. It lists the valid values for the role you're sending — but it doesn't say "try output_text for assistant messages." You have to infer that from the full schema.
- The Realtime API had a nearly identical incident earlier. This older thread shows Realtime API developers hit with Invalid value: 'input_text'. Value must be 'text'. Same family of problem: the type tag you need depends on which API surface and which role you're using, and the docs don't cross-reference.
- There's a related bug report open on the WordPress PHP AI client where text-only messages hit the same 400 because the client was constructing input_text blocks by default.
Why nobody's tests caught it
Every time a silent API change lands, the retrospective hits the same beats:
Unit tests don't catch it because unit tests mock the API response. The mock was built from the docs. The docs still showed input_text as valid.
Integration tests don't catch it unless they're pointed at the live OpenAI endpoint and run often enough to catch the change before a customer does. Most teams run integration tests on PRs, not on a schedule.
Type systems don't catch it because the official Python and TypeScript SDKs type the content[] as a union that includes input_text. The runtime rejects what the static types allow.
Monitoring doesn't catch it because a 400 error on a single request looks like a bad input from your own code. You check your code, can't find the bug, and file it under "weird" — until enough customers report it that you realize it's everyone.
The SDK version didn't change because this wasn't a client-side change. Your openai package is the same version it was yesterday. The server just started enforcing a stricter schema on the exact same request.
That last one is the killer. Pinning the SDK is the first move most teams reach for when an API breaks, and in this case it does absolutely nothing.
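To see why the unit-test layer is blind here, a hedged sketch of the failure mode. The fixture and helper names are hypothetical; the point is that the expected shape was copied from the docs and never touches the live API.

# A fixture frozen the day the test was written, back when the docs
# still showed input_text as valid for assistant messages.
EXPECTED_TURN = {
    "role": "assistant",
    "content": [{"type": "input_text", "text": "Hi, how can I help?"}],
}

def build_seed_turn():
    # Hypothetical app code that manually constructs an assistant turn.
    return {
        "role": "assistant",
        "content": [{"type": "input_text", "text": "Hi, how can I help?"}],
    }

def test_seed_turn_shape():
    # Green before October 21, green after: nothing here talks to OpenAI,
    # so the server's stricter validator never gets a vote.
    assert build_seed_turn() == EXPECTED_TURN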
The pattern, not just the incident
This is the fourth incident I've written about this month where a top-tier API changed shape without a version bump or a deprecation warning:
- GitHub PushEvent stripped payload.commits from the Events API in October 2025. No version bump — the Events API isn't versioned. Abuse-detection pipelines ran on empty data for weeks.
- Stripe's Basil release (2025-03-31) removed current_period_end and billing_thresholds from the subscription object. Teams that let their account default version float got silently migrated.
- Shopify's 2025-01 Admin API changed fulfillmentHold from a string to an object and removed PrivateMetafield entirely. Apps still on 2024-10 started returning nulls where structured data used to be.
- OpenAI's Responses API — this one. No announcement, no version header to pin, just a stricter server-side validator.
The common thread: the API provider has a legitimate reason for the change — abuse mitigation, internal consistency, safer defaults — and the consumer's tests assume a frozen structure. The gap between those two realities is where production breaks.
You can't stop providers from making changes. Most of the changes are even good. What you can do is refuse to find out about them from your own users.
How to actually catch this
Three defenses, in order of how much they help:
1. Pin the API version where the API supports it. Stripe, Shopify, OpenAI's Chat Completions — all of these let you pin. OpenAI's Responses API does not currently expose a version header, which is why this incident hit even teams with otherwise disciplined version management. Pin where you can, and know which of your dependencies don't give you that lever.
2. Assert on response shape in integration tests — and schedule them. Not "does our assistant respond?" but "does content[0].type equal the value we rely on?" And run these tests on a cron against the live endpoint, not just on PRs. Daily is fine. Hourly for anything customer-facing. There's a sketch of this right after the list.
3. Monitor the response shape in production. Poll the endpoints you depend on (or sample live traffic), record the shape over time, and diff against a learned baseline. When a field changes type, disappears, or a new enum value starts returning 400s, you get an alert — usually hours before customers do.
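To make point 2 concrete, here's a hedged sketch of a scheduled shape assertion. It assumes the official openai Python SDK's Responses interface; adjust the attribute access if your SDK version structures the output differently.

# Run this on a cron (daily, or hourly for customer-facing paths),
# not just in the PR pipeline.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def test_assistant_content_is_output_text():
    resp = client.responses.create(model="gpt-4o", input="ping")
    # Find the assistant message in the output; some models emit
    # other item types (e.g. reasoning) first.
    message = next(item for item in resp.output if item.type == "message")
    # The contract we actually rely on, not just "did it respond".
    assert message.content[0].type == "output_text"
    assert isinstance(message.content[0].text, str)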
The third one is what I've been building at FlareCanary. Point it at your critical OpenAI, Stripe, GitHub, Shopify endpoints — the ones whose schema changes would ruin a Monday morning — and it polls on a schedule, learns the expected structure, and flags drift. Removed fields. Changed types. Nullability shifts. New fields that might be the start of a migration. Severity-classified so noise stays low.
You don't need a dedicated tool for this. You can cron a script that hits your top handful of endpoints, hashes the field set, and diffs. The point is that some layer has to be watching response shape, because nothing else in your stack is.
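A hedged sketch of that script, if you want a starting point. The endpoint list and baseline path are placeholders; wire the print to whatever alerting you already have.

# Reduce each response to its field-set "shape" (paths plus JSON types),
# then diff against the baseline saved on the previous run.
import json
import sys
from pathlib import Path

import requests

ENDPOINTS = {
    "github_events": "https://api.github.com/events",  # placeholder
}
BASELINE = Path("shape_baseline.json")

def shape(value, prefix=""):
    """Flatten a JSON value into a set of 'path:type' strings."""
    if isinstance(value, dict):
        out = set()
        for k, v in value.items():
            out |= shape(v, f"{prefix}.{k}")
        return out
    if isinstance(value, list):
        return shape(value[0], f"{prefix}[]") if value else {f"{prefix}[]"}
    return {f"{prefix}:{type(value).__name__}"}

current = {name: sorted(shape(requests.get(url, timeout=30).json()))
           for name, url in ENDPOINTS.items()}

if BASELINE.exists():
    baseline = json.loads(BASELINE.read_text())
    for name, fields in current.items():
        removed = set(baseline.get(name, [])) - set(fields)
        added = set(fields) - set(baseline.get(name, []))
        if removed or added:
            print(f"{name}: removed={sorted(removed)} added={sorted(added)}",
                  file=sys.stderr)

BASELINE.write_text(json.dumps(current, indent=2))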
The harder question
Every one of these incidents ended the same way: developers on a community forum, pasting the error, asking if anyone else had hit it. That's the monitoring layer right now. It's community forums and other people's stack traces.
Most teams know their dependency graph at the package level. They can tell you which OpenAI model they call, what GitHub endpoints they depend on, which Stripe SDK is in package.json. Almost none of them can tell you whether any of those endpoints has returned a different response shape in the last week.
HTTP 200s tell you the endpoint is up. Latency tells you it's fast. Neither of those tells you the contract is still what you thought.
If you'd been diffing Responses API calls against a baseline on October 20, you'd have had an alert on October 21 — before any of your customers opened a ticket. The capability exists. The habit doesn't.
That gap is the real lesson, and it applies to every API you don't control.
If you've been hit by an API schema change that slipped through your tests — especially the "same SDK version, same request, suddenly a 400" variety — drop a reply. I've been collecting these and the pattern is remarkably consistent.