It was during a live client demo.
The AI was mid-session. The user was answering questions.
Everything was going perfectly.
Then — this:
"Sorry, there was an error processing your request. Please try again."
The client looked at us. My manager looked at me. I looked at my laptop
and wanted to disappear.
The Investigation
First thing I checked: OpenAI dashboard. No failed runs. Nothing.
I checked our server logs. There it was:
run_timeout — after exactly 60 seconds
But here's the thing — the run wasn't failing. It was just slow.
OpenAI was still processing. Our backend gave up at 60s.
OpenAI finished at 87s.
We quit too early.
Why Does This Happen?
The longer a session gets, the more history OpenAI has to process.
Early in a session: 3–5 seconds.
Mid-session (10+ messages): 30–50 seconds.
Long sessions: 60–90+ seconds.
Our hardcoded 60-second limit didn't match reality.
The Fix
Step 1: Made the timeout configurable via environment variable.
# .env
OPENAI_RUN_TIMEOUT_MS=150000
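A configurable value can also be misconfigured, so it may be worth validating it before use. Here is a minimal sketch, assuming a hypothetical helper named readTimeoutMs (not part of our original code) that falls back to 150000 whenever the variable is missing, non-numeric, or not a positive integer:

```javascript
// Hypothetical helper (an assumption for illustration, not from the original code).
const DEFAULT_TIMEOUT_MS = 150000;

function readTimeoutMs(env = process.env) {
  // Environment values arrive as strings; treat a missing value as empty.
  const raw = String(env.OPENAI_RUN_TIMEOUT_MS ?? '').trim();
  const parsed = Number(raw);
  // Accept only positive whole numbers of milliseconds.
  const valid = raw !== '' && Number.isInteger(parsed) && parsed > 0;
  return valid ? parsed : DEFAULT_TIMEOUT_MS;
}
```

Unlike a bare parseInt, this rejects a value such as "150000abc" instead of silently accepting its numeric prefix.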
Step 2: Updated the polling loop to use it.
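// Fall back to 150 s when the variable is missing or not a number.
const TIMEOUT_MS = parseInt(process.env.OPENAI_RUN_TIMEOUT_MS, 10) || 150000;
// Statuses that end polling. 'requires_action' is not final, but the run waits there for tool outputs.
const TERMINAL = ['completed', 'failed', 'cancelled', 'expired', 'requires_action'];
const startTime = Date.now();
let runStatus = await openai.beta.threads.runs.retrieve(threadId, run.id);
while (!TERMINAL.includes(runStatus.status)) {
  if (Date.now() - startTime >= TIMEOUT_MS) throw new Error('run_timeout');
  await new Promise(r => setTimeout(r, 1000)); // poll once per second
  runStatus = await openai.beta.threads.runs.retrieve(threadId, run.id);
}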
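The loop above can also be wrapped in a small function so the deadline logic is testable without calling OpenAI. This is only a sketch under assumptions: pollRun, its options, and the extra fields on the error are hypothetical names, and retrieve stands for any async function that resolves to an object with a status field.

```javascript
// Statuses at which polling should stop (same list as the loop above).
const STOP_POLLING = new Set(['completed', 'failed', 'cancelled', 'expired', 'requires_action']);

// Hypothetical helper: polls `retrieve` until the run settles or the deadline passes.
async function pollRun(retrieve, { timeoutMs = 150000, intervalMs = 1000 } = {}) {
  const startTime = Date.now();
  let run = await retrieve();
  while (!STOP_POLLING.has(run.status)) {
    const elapsedMs = Date.now() - startTime;
    if (elapsedMs >= timeoutMs) {
      // Attach context so the server log shows how long we waited and where the run was.
      const err = new Error('run_timeout');
      err.elapsedMs = elapsedMs;
      err.lastStatus = run.status;
      throw err;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    run = await retrieve();
  }
  return run;
}
```

In our backend the call would look roughly like pollRun(() => openai.beta.threads.runs.retrieve(threadId, run.id), { timeoutMs: TIMEOUT_MS }). Recording the last status on the error makes it easier to tell a slow run from a failed one, which is exactly the distinction we missed during the demo.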
Step 3: Deployed. No more errors.
Lessons Learned
- Always handle ALL 5 statuses that end polling (completed, failed, cancelled, expired, requires_action), not just "completed"
- Never hardcode timeouts for AI workloads — they vary by session length
- Your error logs and OpenAI dashboard together tell the full story
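To make the first lesson concrete, here is one way to branch on each status once polling stops. Treat it as a sketch: describeRunOutcome and the action names are hypothetical, and the right response to each status depends on your product. The only field read from the run besides status is last_error, which is assumed to be populated when a run fails.

```javascript
// Hypothetical mapping from a run's status to what the backend should do next.
function describeRunOutcome(run) {
  switch (run.status) {
    case 'completed':
      return { ok: true, action: 'read_messages' };
    case 'requires_action':
      // Not a failure: the run is waiting for tool outputs.
      return { ok: true, action: 'submit_tool_outputs' };
    case 'failed':
      return { ok: false, action: 'report_error', detail: run.last_error?.message ?? 'unknown error' };
    case 'cancelled':
      return { ok: false, action: 'notify_cancelled' };
    case 'expired':
      return { ok: false, action: 'retry_run' };
    default:
      // queued, in_progress, or anything unrecognized: the run has not settled yet.
      return { ok: false, action: 'keep_polling' };
  }
}
```

An explicit default branch means a status the code does not recognize is never mistaken for success.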
What's Next
I'm exploring runs.stream() — streaming responses in real time,
no polling, no timeouts. Will write a follow-up once it's in production.
Have you hit this before? How did you handle it?
Drop it in the comments.