OpenAI's Codex CLI (codex-tui) shipped a new feature in version 0.128 that broke every third-party AI gateway I tested with it: streaming responses now go over WebSocket on /v1/responses instead of HTTP+SSE. If your gateway only registers a POST /v1/responses handler, every Codex session fails with a confusing storm of 405 Method Not Allowed errors interleaved with the occasional successful POST.
I run a hosted AI gateway at g0i.ai — the same problem hit me. This post is the writeup of the four-layer diagnosis I did to make it work, with the exact config and code at each layer. If you're running your own gateway (LiteLLM, Helicone, Portkey, your own Go/Python proxy, whatever) and Codex CLI users are hitting your endpoint, this is the path I'd hand you.
The symptom
Every codex-tui session emitted a flood of failed requests:
```
POST /v1/responses HTTP/1.1  200  254552 bytes  opencode/1.14.32   ✓
GET  /v1/responses HTTP/1.1  405  31 bytes      codex-tui/0.128.0  ✗
GET  /v1/responses HTTP/1.1  405  31 bytes      codex-tui/0.128.0  ✗
GET  /v1/responses HTTP/1.1  405  31 bytes      codex-tui/0.128.0  ✗
```
A 405 with Allow: POST is the smoking gun: the client is sending GET, the server has only POST registered, hence the rejection. But why GET? Because Codex 0.128 opens the stream with a WebSocket upgrade handshake, which on the wire is GET /v1/responses HTTP/1.1 with Upgrade: websocket and Connection: Upgrade headers.
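To make the failure concrete: every layer in the chain has to recognize this handshake shape and treat it differently from a plain POST. Here's a minimal sketch of that check (a hypothetical helper for illustration, not part of any layer's actual code):

```python
def is_websocket_upgrade(method: str, headers: dict[str, str]) -> bool:
    # RFC 6455 handshake: GET plus Upgrade: websocket plus Connection: Upgrade.
    # Header names and values are case-insensitive, and Connection may carry
    # a comma-separated list (e.g. "keep-alive, Upgrade").
    h = {k.lower(): v.lower() for k, v in headers.items()}
    return (
        method == "GET"
        and h.get("upgrade") == "websocket"
        and "upgrade" in h.get("connection", "")
    )

# The failing requests in the log above are exactly this shape:
codex_handshake = is_websocket_upgrade(
    "GET", {"Upgrade": "websocket", "Connection": "Upgrade"}
)
# ...while the working opencode requests are plain POSTs:
plain_post = is_websocket_upgrade("POST", {"Content-Type": "application/json"})
```

A server with only a POST route registered never gets this far: the framework rejects the GET before any handler sees it, which is where the 405s come from.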
The fix sounds easy: register a WebSocket route at the same path. But the actual request has to traverse four layers, three of which strip the upgrade headers by default.
The path the upgrade has to survive
```
client (codex-tui)
    ↓ wss://api.your-gateway.com/v1/responses
[1] Cloudflare Worker (or whatever edge you're using)
    ↓
[2] Cloudflare Tunnel / your VPN to origin
    ↓
[3] nginx reverse proxy (terminating TLS, multiplexing services)
    ↓
[4] FastAPI / Express / your application backend
```
Three of these strip the upgrade by default. Let me walk through each.
Layer 1 — Cloudflare Worker
If you're using a CF Worker for HTTP filtering, smart routing, prompt rewriting, or whatever, you're almost certainly maintaining a header sanitizer. Mine looks like this — copied from a hundred Worker examples online:
```typescript
const HOP_BY_HOP = new Set([
  'connection', 'keep-alive', 'proxy-authenticate',
  'proxy-authorization', 'te', 'trailer',
  'transfer-encoding', 'upgrade',
  // ... CF-specific headers
]);

function cleanHeaders(input: Headers): Headers {
  const out = new Headers();
  for (const [k, v] of input.entries()) {
    if (HOP_BY_HOP.has(k.toLowerCase())) continue;
    out.append(k, v);
  }
  return out;
}
```
This is technically correct per RFC 7230 — Upgrade and Connection are hop-by-hop headers and shouldn't be forwarded by a true proxy. But CF Workers implementing fetch passthrough need to preserve them when the goal is letting the WebSocket upgrade reach origin.
The fix is a one-liner at the top of the fetch handler:
```typescript
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    // WS passthrough — skip the regular HTTP pipeline. Cloudflare Workers
    // forward WebSocket upgrades natively when we don't strip the headers.
    if (request.headers.get('Upgrade')?.toLowerCase() === 'websocket') {
      const originUrl = buildOriginRequest(url, env);
      return fetch(originUrl.toString(), {
        method: request.method,
        headers: request.headers, // pass original — don't sanitize
        body: request.body,
      });
    }

    // ... existing HTTP flow with cleanHeaders, JSON parsing, etc.
  },
};
```
That's it. CF Workers' fetch() knows how to forward WebSocket upgrades to origin as long as the headers are intact. The Upgrade: websocket and Connection: Upgrade headers reach origin verbatim.
Layer 2 — Cloudflare Tunnel (or your VPN)
If you're using cloudflared, the tunnel itself supports WebSocket out of the box for HTTP services. The config in ~/.cloudflared/config.yml doesn't need any special directive — service: http://localhost:8085 forwards both HTTP/1.1 traffic and WS upgrades correctly. Same for wireguard, tailscale, etc.
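For reference, a minimal config.yml of the shape I'm describing (the tunnel ID, credentials path, and hostname are placeholders; port 8085 matches the example above):

```yaml
tunnel: <your-tunnel-id>
credentials-file: /home/you/.cloudflared/<your-tunnel-id>.json

ingress:
  - hostname: api.your-gateway.com
    service: http://localhost:8085   # carries both plain HTTP and WS upgrades
  - service: http_status:404
```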
If you're using something more aggressive (a custom nginx-stream forwarder, a Lambda@Edge function), check that your transport handles upgrades.
Layer 3 — nginx
The standard reverse-proxy block in every nginx-on-rails tutorial is broken for WebSockets:
```nginx
location /v1/ {
    proxy_pass http://my_backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";  # ← strips Upgrade
    proxy_set_header Host $host;
}
```
proxy_set_header Connection "" is the conventional way to enable upstream keepalive pooling, because nginx's Connection: keep-alive from the client should NOT be forwarded to upstream. But it also wipes out Connection: Upgrade in WebSocket handshakes.
The fix uses the standard nginx map directive to set Connection based on whether the client requested an upgrade. Add this once at the http {} level:
```nginx
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
```
Then for the specific endpoint that handles WebSockets, add a dedicated location block ABOVE your generic /v1/ block:
```nginx
location = /v1/responses {
    proxy_pass http://my_backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
    proxy_buffering off;
}
```
The = makes it an exact-match block — POST and GET to /v1/responses both go here, and the upgrade headers get forwarded properly only when the client sends them. Other /v1/* paths fall through to the keepalive-friendly block.
proxy_buffering off is critical for any streaming endpoint — buffered nginx hangs on to the response body until it has the whole thing, which defeats the entire point of streaming.
Layer 4 — FastAPI (or your backend)
The actual WebSocket handler. FastAPI makes this nicely concise:
```python
import asyncio
import json

import httpx
from fastapi import APIRouter, WebSocket, WebSocketDisconnect

router = APIRouter()


@router.websocket("/v1/responses")
async def responses_ws(ws: WebSocket):
    # Auth via Bearer header on the WS upgrade request
    auth = ws.headers.get("authorization", "")
    if not auth.lower().startswith("bearer "):
        await ws.close(code=4401, reason="Unauthorized")
        return
    token = auth[7:].strip()
    user = await validate_token(token)  # gateway-specific helper
    if not user:
        await ws.close(code=4401, reason="Invalid token")
        return

    await ws.accept()

    # First frame: the request body as a single JSON message
    try:
        first = await asyncio.wait_for(ws.receive_text(), timeout=60)
    except asyncio.TimeoutError:
        await ws.close(code=4408, reason="No request received")
        return

    body = json.loads(first)
    body["stream"] = True  # always stream over the bridge
    upstream_url = resolve_upstream(body["model"]) + "/v1/responses"

    # httpx.Timeout needs a default (or all four values) alongside read=
    timeout = httpx.Timeout(10.0, read=600.0)
    async with httpx.AsyncClient(timeout=timeout) as client:
        async with client.stream("POST", upstream_url, json=body) as r:
            if r.status_code != 200:
                err = await r.aread()
                await ws.send_text(json.dumps({
                    "type": "error",
                    "error": {"message": err.decode(errors="ignore")[:500]}
                }))
                await ws.close(code=4502)
                return

            # Forward each SSE event as a WS text frame
            async for line in r.aiter_lines():
                if line.startswith("data: "):
                    payload = line[6:]
                    if payload.strip() == "[DONE]":
                        break
                    try:
                        await ws.send_text(payload)
                    except WebSocketDisconnect:
                        return  # client gone

    await ws.close()
```
A few non-obvious things in here:
1. Auth is on the upgrade request itself. The client sends Authorization: Bearer sk-... as a regular HTTP header during the GET /v1/responses upgrade. FastAPI's WebSocket.headers exposes them. You authenticate BEFORE calling await ws.accept() — if auth fails, ws.close(code=4401) rejects the upgrade cleanly with a custom close code.
2. Don't reuse a request-scoped DB session inside the WebSocket handler. I learned this the hard way — passing the route's db: AsyncSession into an asyncio task that outlives the route return causes:
asyncpg.exceptions._base.InterfaceError: cannot perform operation: another operation is in progress
The framework cleans up the request-scoped session as soon as the route returns, but the WS handler keeps running. Use a fresh session via async_session_factory() inside the WS handler.
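This failure mode is easy to reproduce in miniature with a stand-in session object that, like asyncpg, allows only one in-flight operation per connection (all names here are illustrative, not the real driver API):

```python
import asyncio


class FakeSession:
    """Stand-in for an asyncpg-backed AsyncSession: one operation at a time."""

    def __init__(self):
        self._busy = False

    async def execute(self, query: str) -> str:
        if self._busy:
            raise RuntimeError("another operation is in progress")
        self._busy = True
        await asyncio.sleep(0.01)  # simulate a round-trip to the database
        self._busy = False
        return f"result of {query}"


def session_factory() -> FakeSession:
    # In a real app: async_session_factory() creating a fresh AsyncSession.
    return FakeSession()


async def shared_session_bug() -> bool:
    # One session used from two concurrent tasks, like a request-scoped
    # session leaking into a long-lived WS handler: it blows up.
    s = session_factory()
    try:
        await asyncio.gather(s.execute("q1"), s.execute("q2"))
        return False
    except RuntimeError:
        return True


async def fresh_sessions_ok() -> list[str]:
    # One session per task, which is what the WS handler should do.
    return await asyncio.gather(
        session_factory().execute("q1"),
        session_factory().execute("q2"),
    )
```

The real error surfaces only under concurrency, which is exactly why it tends to show up first in long-lived WebSocket handlers rather than in ordinary request handlers.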
3. The stream=True upstream call uses httpx.stream(), not httpx.AsyncClient.post(). This is the difference between "wait for the entire response, then forward" (broken — defeats streaming) and "iterate over the SSE lines as they arrive" (correct).
4. Upstream's SSE has data: prefix lines and possibly event: prefix lines. The OpenAI Responses API embeds the event type INSIDE the JSON payload (the "type": "response.output_text.delta" field), so we forward only the data: content. If your upstream uses event:-typed SSE, you may need to multiplex differently.
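Point 4 in code form: a pure function over SSE lines that mirrors what the handler's forwarding loop does (a sketch, not the handler itself):

```python
def sse_data_payloads(lines: list[str]) -> list[str]:
    """Collect forwardable payloads from an SSE stream: keep data: lines,
    stop at [DONE], and skip event:/id:/comment lines and blank keep-alives."""
    out = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break
        out.append(payload)
    return out


# A stream shaped like the Responses API output (abbreviated, hypothetical):
sample = [
    "event: response.output_text.delta",  # type also lives inside the JSON
    'data: {"type":"response.output_text.delta","delta":"BAN"}',
    "",
    'data: {"type":"response.output_text.delta","delta":"ANA"}',
    "data: [DONE]",
    'data: {"never":"reached"}',
]
```

Because the JSON payload already carries "type", dropping the event: lines loses nothing for this API; a gateway bridging an event:-typed upstream would need to carry the event name across as well.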
Verification
Once all four layers are wired, test with wscat:
```
$ wscat -c "wss://api.your-gateway.com/v1/responses" \
    -H "Authorization: Bearer sk-..."
Connected (press CTRL+C to quit)
> {"model":"gpt-5.5","input":"reply with the word BANANA","stream":true}
< {"type":"response.created","response":{"id":"resp_..."...}}
< {"type":"response.in_progress","response":{...}}
< {"type":"response.output_item.added","item":{...}}
< {"type":"response.output_text.delta","delta":"BAN","item_id":"..."}
< {"type":"response.output_text.delta","delta":"ANA","item_id":"..."}
< {"type":"response.output_text.done","text":"BANANA",...}
< {"type":"response.completed","response":{"status":"completed",...}}
Disconnected (code: 1000, reason: "")
```
That's the protocol codex-tui expects. With this in place, point your codex-tui config at your gateway:
```
export OPENAI_BASE_URL=https://api.your-gateway.com/v1
export OPENAI_API_KEY=sk-...
codex
```
And the WebSocket handshake succeeds, the stream flows, and Codex's UI updates token-by-token like it should.
CDN-specific notes
A few CDNs I tested:
- Cloudflare — supports WS through Workers (with the passthrough above) and through plain proxied zones. No extra config needed if you're not using a Worker.
- Bunny.net — supports WS upgrade verbatim. The CDN-RequestPullCode: 101 response header confirms the edge pulled "101 Switching Protocols" from origin and forwarded it. No special configuration needed beyond pointing the pull zone at your origin.
- Fastly — needs explicit WS service config; their default HTTP service doesn't pass upgrades.
- AWS CloudFront — supports WebSockets but only on certain origin types; check their docs.
If you're on a CDN that doesn't pass WS upgrades AT ALL, the fallback is to bypass the CDN entirely for the /v1/responses path: point a DNS-only wsapi.your-domain.com record directly at origin, and have clients hit that for WebSocket sessions. Less elegant, but it works.
What I'd watch for next
OpenAI's Responses API is still in flux — they've been adding fields, changing how reasoning blocks are encoded, and the WebSocket variant of /v1/responses is undocumented as of this writing (Jan 2026). If you're shipping a gateway that supports Codex, expect to chase a moving target.
The other two integration points worth watching are MCP servers (codex 0.128 added native MCP support, also over WebSocket in some configurations) and the realtime audio API (/v1/realtime, fully WebSocket, has been stable longer).
If this saved you some time, drop a comment — happy to compare notes on what other agents (Cline, Cursor, Aider, Continue) actually need on the wire. I run g0i which has integration guides for each of those, and the lessons from this codex-tui chase generalized to all of them.
If you're a gateway operator and want to compare notes on edge-case clients, my DMs are open.