Got a weird one that keeps coming up. You set up your API client — Cursor, Open WebUI, Cherry Studio, whatever — point it at your gateway, hit test connection, and it passes. Model list loads fine. Then you actually try to chat and... timeout. Or 400. Or just silence.
The model list working means the URL and key are correct. So what breaks between "can list models" and "can actually generate text"?
Here are the five things I keep seeing.
Model name mismatch. The /models endpoint returns what the gateway registered, which might not match what the upstream actually expects. Your gateway calls it "gpt4", upstream wants "gpt-4o". Request goes through, upstream says "I don't know this model", you get a 400 or 404. Quick test: curl the chat endpoint directly with the exact model name you're using. If it fails, check what the upstream actually expects.
Streaming breaks things. Most clients default to stream:true. Some gateways handle streaming poorly — buffers don't flush, SSE format gets mangled, chunks arrive incomplete. The client chokes and either hangs or drops the connection. Test: turn off streaming in the client settings. If it suddenly works, that's your problem.
Timeout too short. LLM inference takes time. Long prompts, large models, high server load — all push response times up. If your client or gateway times out at 10 seconds, longer requests will fail even though the server is still working on them. The tell: requests fail at a consistent time boundary (10s, 30s, 60s). Fix: find the timeout setting and bump it to 60s or more.
Request body incompatibility. OpenAI's API format is the de facto standard, but details vary. tool_choice, response_format, function calling — some upstreams don't support all fields. Your client auto-includes these params, upstream rejects the whole request with a 400. Debug: grab the actual JSON your client sends and strip fields one by one until it works.
Rate limiting hidden from the UI. 429 errors sometimes get swallowed by the client. No error shown, just empty responses or infinite retries. Shared keys, free tier accounts, and low-tier API plans hit concurrency limits fast. Check: curl the same request manually and look at the HTTP status code. If it's 429, check the Retry-After header.
The universal debugging step: bypass the client and curl the upstream directly. If curl works, the problem is in your client or gateway config. If curl fails, it's upstream or the key itself. This step is non-negotiable — saves hours of guessing.
What other weird failures have you hit with API gateways? Curious what edge cases people have run into.
Top comments (0)