Elizabeth Fuentes L for AWS

Fix MCP Timeouts: Async HandleId Pattern

MCP tools freeze AI agents when external APIs are slow, causing 424 errors. The async handleId pattern returns immediately with a job ID and polls for results without blocking.

MCP tool timeout occurs when an AI agent calls a Model Context Protocol (MCP) tool that depends on a slow external API. The tool blocks the agent indefinitely instead of returning an error. The result is a 424 (Failed Dependency) error or a frozen workflow with no user feedback. This post shows the problem with real scenarios and how the async handleId pattern provides immediate responses.

This demo uses Strands Agents with MCP (Model Context Protocol). The async pattern is framework-agnostic and applies to any agent that calls external APIs through MCP.

Working code: github.com/aws-samples/sample-why-agents-fail

Series: Why AI Agents Fail

  1. Context Window Overflow — Memory Pointer Pattern for large data
  2. MCP Tools That Never Respond (this post) — Async pattern for slow external APIs
  3. AI Agent Reasoning Loops — Detect and block repeated tool calls

The Problem: MCP Tools That Never Respond

The Model Context Protocol (MCP) enables AI agents to call external tools. But when those tools depend on slow APIs, the entire agent workflow freezes. The agent waits. The user waits. Nothing happens.

Community observation from Octopus (Resilient AI Agents With MCP, 2025) identifies the core issue: as external system integrations increase, so does the likelihood of failure. Systems become unavailable, slow to respond, or return errors. Agents have no built-in strategy to handle this.

OpenAI Community reports confirm the real-world impact:

  • 424 errors when MCP tools take too long
  • Unresponsive states where requests neither succeed nor fail
  • Tools that pass handshake validation but timeout during execution

Why This Happens

MCP expects tools to respond quickly. When a tool calls a slow external API, it blocks until that API returns, and the protocol's expectations break down.

The MCP protocol has implicit timeout expectations. If the tool doesn't respond within ~7-10 seconds, the connection may drop with a 424 (Failed Dependency) error. The agent receives an error instead of data, and the user gets no useful response.

Three failure modes:

  1. Slow API — Tool waits 15+ seconds, poor UX but eventually responds
  2. Failing API — External service unavailable, 424 error after timeout
  3. Unresponsive state — Request accepted but never returns, requires session restart
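These failure modes are easy to reproduce with plain asyncio. A scaled-down sketch: a 0.3s "API" against a 0.1s deadline stands in for a 15s API against the ~7-10s MCP window (the function names here are illustrative, not from the demo repo):

```python
import asyncio

# Hypothetical stand-in for a slow external dependency (data pipeline, batch job)
async def slow_external_api() -> str:
    await asyncio.sleep(0.3)  # simulates the slow upstream call
    return "data"

async def call_tool_with_timeout() -> str:
    # The client enforces a deadline; the 0.1s here models MCP's implicit window
    try:
        return await asyncio.wait_for(slow_external_api(), timeout=0.1)
    except asyncio.TimeoutError:
        # This is the moment the agent sees a failure instead of data
        return "424 Failed Dependency: tool did not respond in time"

print(asyncio.run(call_tool_with_timeout()))
```

The same shape plays out at real timescales: the agent never sees the data, only the dependency failure.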

The Demo: Simulating Real Timeout Scenarios

We built an MCP server that simulates these real-world scenarios:

from mcp.server import FastMCP
import asyncio

# FastMCP is a lightweight MCP server framework — tools are registered with @mcp.tool()
mcp = FastMCP("Timeout Demo Server")

# Baseline: responds in 1s, well within MCP's implicit timeout threshold (~7-10s)
@mcp.tool(description="Fast API - responds in 1 second")
async def fast_api(query: str) -> str:
    await asyncio.sleep(1)
    return f"Fast result for: {query}"

# Problem case: 15s delay exceeds MCP timeout — the agent freezes waiting for this
@mcp.tool(description="Slow API - responds in 15 seconds")
async def slow_api(query: str) -> str:
    await asyncio.sleep(15)  # Simulates a slow external service (data pipeline, batch job)
    return f"Slow result for: {query}"

# Failure case: 7s delay triggers the timeout, then raises Failed Dependency (424)
@mcp.tool(description="Failing API - returns 424 after delay")
async def failing_api(query: str) -> str:
    await asyncio.sleep(7)
    raise Exception("Failed Dependency: External service unavailable")

The Async HandleId Solution

Comparison of synchronous MCP tool call blocked for 17.2 seconds versus async handleId pattern completing in 1.7 seconds

Instead of waiting for slow operations, return immediately with a tracking ID:

import uuid

# In-memory job store: maps job_id → {status, query, result}
# For production, replace with a persistent store (Redis, DynamoDB) for durability across restarts
JOBS = {}

# The handleId pattern: return a tracking ID immediately instead of blocking
@mcp.tool(description="Start a long-running job, returns immediately with job ID")
async def start_async_job(query: str) -> str:
    job_id = str(uuid.uuid4())[:8]  # Short ID the LLM can pass in follow-up calls
    JOBS[job_id] = {"status": "processing", "query": query}

    # Fire-and-forget: slow work runs in the background, tool returns before it finishes
    asyncio.create_task(do_work(job_id, query))

    # The agent receives this in under 1s: no timeout, no frozen UI
    return f"Job started: {job_id}. Use check_job_status to poll for results."

# Background worker: runs the slow external call, then stores the result for polling
async def do_work(job_id: str, query: str):
    await asyncio.sleep(15)  # stands in for the slow external API call
    JOBS[job_id]["status"] = "completed"
    JOBS[job_id]["result"] = f"Processed result for: {query}"

# Polling endpoint: the agent calls this repeatedly until status is "completed"
@mcp.tool(description="Check status of a running job")
async def check_job_status(job_id: str) -> str:
    job = JOBS.get(job_id)
    if not job:
        return f"Job {job_id} not found"
    if job["status"] == "completed":
        return f"COMPLETED: {job['result']}"  # Return the actual result to the agent
    return f"PROCESSING: Job {job_id} still running"  # Agent polls again after a short wait

Demo Results

We tested all four scenarios with a Strands Agent connected to the MCP server:

| Scenario | Response Time | User Experience | Research Finding |
| --- | --- | --- | --- |
| Fast API (1s delay) | 3.2s total | ✅ Good UX | Baseline |
| Slow API (15s delay) | 17.8s total | ❌ Poor UX (agent waits) | Octopus: "agent waits indefinitely" |
| Failing API (424) | 7.7s total | ❌ Error after wait | OpenAI Community: 424 errors |
| Async pattern (handleId) | 3.7s total | ✅ Immediate response | Solution: "respond ASAP with handleId" |

Bar chart comparing MCP tool response times across fast API, slow API, failing API, and async handleId scenarios

The async pattern transforms a 17.8s wait into a 3.7s immediate response. The agent tells the user "job started" and can check status later, with no frozen UI and no timeout errors.
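From the agent's side, the pattern is just a poll loop. A minimal sketch, assuming some call_tool() wrapper for invoking MCP tools; the fake below returns PROCESSING twice and then COMPLETED, purely to exercise the loop:

```python
import asyncio
import itertools

# Fake tool responses: two PROCESSING polls, then COMPLETED forever after
_responses = itertools.chain(
    ["PROCESSING", "PROCESSING"],
    itertools.repeat("COMPLETED: Result ready"),
)

# Hypothetical MCP call wrapper (a real agent would go through its MCP client)
async def call_tool(name: str, job_id: str) -> str:
    return next(_responses)

async def poll_until_done(job_id: str, interval: float = 0.05, max_polls: int = 20) -> str:
    for _ in range(max_polls):
        status = await call_tool("check_job_status", job_id)
        if status.startswith("COMPLETED"):
            return status
        await asyncio.sleep(interval)  # wait before polling again
    return f"TIMEOUT: job {job_id} never completed"

print(asyncio.run(poll_until_done("a1b2c3d4")))
```

A real client would also add backoff between polls and an overall deadline; max_polls plays that role here.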

Why Strands Agents for MCP Integration?

Strands' MCPClient connects to any MCP server in a couple of lines. The agent discovers available tools at runtime through list_tools_sync(), so you don't maintain a hardcoded tool list. When the MCP server implements the async handleId pattern, the agent polls automatically without extra orchestration code.

Strands supports multiple model providers (OpenAI, Amazon Bedrock, Anthropic, Ollama). The MCP timeout patterns shown here work identically across all providers.

When to Use Each Pattern

Direct call (fast tools < 5s):

  • Lookups, calculations, small API calls
  • No timeout risk

Async handleId (slow tools > 5s):

  • External API calls with unpredictable latency
  • Data processing, report generation
  • Any operation that might exceed MCP timeout

Retry with backoff (intermittent failures):

  • Services that occasionally fail but recover
  • Network-dependent operations
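For the intermittent-failure case, a retry helper with exponential backoff and jitter is enough. A minimal sketch (not from the demo repo; delays are scaled down for illustration):

```python
import asyncio
import random

# Retry a flaky async call with exponentially growing delays plus jitter
async def retry_with_backoff(func, max_retries: int = 4, base_delay: float = 0.05):
    for attempt in range(max_retries):
        try:
            return await func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # 0.05s, 0.1s, 0.2s, ... plus random jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)

# Flaky stand-in for a recovering service: fails twice, then succeeds
_calls = {"n": 0}
async def flaky_api() -> str:
    _calls["n"] += 1
    if _calls["n"] < 3:
        raise ConnectionError("service temporarily unavailable")
    return "ok"

print(asyncio.run(retry_with_backoff(flaky_api)))
```

Backoff suits transient faults; for calls that are reliably slow rather than flaky, the handleId pattern above is the better fit.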

Try It Yourself

You need Python 3.9+, uv, and an OpenAI API key. The MCP server runs locally as a subprocess, so no external services are needed.

git clone https://github.com/aws-samples/sample-why-agents-fail
cd sample-why-agents-fail/stop-ai-agents-wasting-tokens/02-mcp-timeout-demo
uv venv && uv pip install -r requirements.txt
export OPENAI_API_KEY="your-key-here"

uv run python test_mcp_timeout.py   # Runs all 4 scenarios

Or open test_mcp_timeout.ipynb in Jupyter, JupyterLab, VS Code, or your preferred notebook environment.

Key Takeaways

  1. MCP tools timeout silently — 424 errors with no recovery
  2. Slow APIs freeze the entire agent — 17.8s wait with no feedback
  3. Async handleId pattern solves it — immediate response, poll for results
  4. Design for failure — every external call can timeout, plan accordingly

Frequently Asked Questions

What causes 424 errors in MCP tool calls?

A 424 (Failed Dependency) error occurs when an MCP tool takes longer than the implicit timeout threshold (typically 7-10 seconds) to respond. The MCP protocol expects tools to return quickly. When an external API blocks the tool beyond this threshold, the connection drops and the agent receives a 424 error instead of data.

When should I use the async handleId pattern instead of a direct MCP tool call?

Use the async handleId pattern for any tool that calls an external API with unpredictable latency: data processing, report generation, third-party service calls, or any operation that might exceed 5 seconds. For fast lookups, calculations, and small API calls under 5 seconds, direct calls work fine.

Does the async handleId pattern work with any MCP server, not only Strands?

Yes. The async handleId pattern is an MCP server design pattern, not a framework feature. Any MCP-compatible agent can call start_async_job and check_job_status tools. The pattern works with OpenAI Agents, LangChain MCP integrations, and any client that supports the Model Context Protocol.



Thanks!

