Sangmin Lee

Posted on • Originally published at claudeguide.io

Deploying Claude Agents to Production: Fly.io, Vercel, and Lambda

Originally published at claudeguide.io/claude-agent-production-deploy

Claude agent deployments commonly fail for three reasons: timeouts (LLM calls often take 5-30 seconds, and many serverless platforms default to request timeouts of 30 seconds or less), cold starts (agents with heavy initialization start too slowly for serverless), and missing environment variables in production. Choosing the right deployment target for your agent type, and configuring it explicitly, prevents all three. This guide covers the three main deployment patterns with complete configuration.
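
Of the three failure modes, missing environment variables is the cheapest to catch: check for them once at startup so a misconfigured deploy fails immediately instead of on the first request. The following is a minimal sketch using only the standard library; the helper names `missing_env_vars` and `require_env` are hypothetical, and `ANTHROPIC_API_KEY` is the only variable name taken from this guide.

```python
import os
from collections.abc import Mapping


def missing_env_vars(
    required: list[str], environ: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


def require_env(
    required: list[str], environ: Mapping[str, str] = os.environ
) -> None:
    """Raise before any request is served if configuration is incomplete."""
    missing = missing_env_vars(required, environ)
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
```

Calling `require_env(["ANTHROPIC_API_KEY"])` at the top of the server module would make the container exit during boot, which most platforms surface as a failed deploy rather than a runtime error.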


Deployment Target Decision Matrix

| Agent type | Recommended platform | Why |
| --- | --- | --- |
| Long-running (5+ minutes) | Fly.io | No timeout limits, persistent processes |
| API endpoint (< 30s response) | Vercel | Zero-config, automatic scaling |
| Event-driven (webhooks, queues) | AWS Lambda | Pay-per-invocation, natural event model |
| Streaming responses | Vercel Edge | Low latency, streaming SSE support |
| High-volume, cost-sensitive | Fly.io + Redis queue | Full control, no per-invocation billing |
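
The matrix can also be read as a small decision function. The sketch below is one possible encoding, not a rule from the platforms themselves: the function name `recommend_platform`, its parameters, and the precedence between rows are all assumptions, and the matrix does not say what to do with runs between 30 seconds and 5 minutes, so the sketch sends those to Fly.io as well.

```python
def recommend_platform(
    expected_seconds: float,
    *,
    event_driven: bool = False,
    streaming: bool = False,
    high_volume: bool = False,
) -> str:
    """Map agent characteristics to a row of the decision matrix.

    The order of the checks is an assumption of this sketch; the matrix
    itself does not rank its rows against each other.
    """
    if high_volume:
        return "Fly.io + Redis queue"
    if expected_seconds >= 30:
        # The matrix names Fly.io for 5+ minute runs. Treating every run
        # past the 30 s API budget the same way is an assumption.
        return "Fly.io"
    if event_driven:
        return "AWS Lambda"
    if streaming:
        return "Vercel Edge"
    return "Vercel"
```

For example, `recommend_platform(600)` falls into the long-running row, while `recommend_platform(5, event_driven=True)` falls into the Lambda row.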

Fly.io: Long-Running Agents

Best for: agents that run for minutes, background processing, agents that need to hold state in memory.
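
An agent that runs for minutes usually cannot return its result in the request that started it, so callers typically submit a job and poll for completion. A minimal sketch of that client-side loop follows. `fetch_status` is a hypothetical callable standing in for an HTTP GET against a status endpoint, and the `"running"` status string is assumed to mark an unfinished job.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    *,
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job leaves the "running" state or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        if status["status"] != "running":
            return status  # finished, successfully or not
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still running after {timeout} s")
        sleep(poll_interval)
```

Passing `sleep` as a parameter keeps the loop easy to test; in real use the default `time.sleep` is enough.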

Project structure

my-agent/
├── Dockerfile
├── fly.toml
├── requirements.txt
└── agent/
    ├── __init__.py
    ├── main.py
    └── tools.py

Dockerfile

FROM python:3.12-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY agent/ ./agent/

# Port served by uvicorn (the /health endpoint is reachable here)
EXPOSE 8080

CMD ["python", "-m", "uvicorn", "agent.main:app", "--host", "0.0.0.0", "--port", "8080"]

FastAPI agent server


# agent/main.py
import os
import asyncio
from fastapi import FastAPI, HTTPException, BackgroundTasks
from pydantic import BaseModel
import anthropic

app = FastAPI()
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# In-memory job tracker (use Redis in production for multi-instance)
jobs = {}


class AgentRequest(BaseModel):
    goal: str
    webhook_url: str | None = None


class JobStatus(BaseModel):
    job_id: str
    status: str  # "running" | "done" | "failed"
    result: str | None = None
    error: str | None = None


@app.get("/health")
async def health():
    return {"status": "ok"}


@app.post("/run")
async def run_agent(request: AgentRequest, background_tasks: BackgroundTasks):
    import uuid
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "result": None, "error": None}

    background_tasks.add_task(execute_agent_job, job_id, request.goal, request.webhook_url)
    return {"job_id": job_id}


@app.get("/status/{job_id}")
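async def get_status(job_id: str) -> JobStatus:
    # The source listing is cut off at this signature; this body is a minimal reconstruction.
    job = jobs.get(job_id)
    if job is None:
        raise HTTPException(status_code=404, detail="Job not found")
    return JobStatus(job_id=job_id, **job)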

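The `/run` handler schedules `execute_agent_job`, which the listing does not show. The sketch below is one way such a function might look, not the author's implementation. It keeps the `(job_id, goal, webhook_url)` argument order used by `add_task`, while `run_agent_loop` and `notify` are hypothetical injected callables standing in for the Claude call and the webhook POST, so the example runs without network access.

```python
from typing import Callable, Optional

# Same shape as the in-memory tracker in agent/main.py.
jobs: dict[str, dict] = {}


def execute_agent_job(
    job_id: str,
    goal: str,
    webhook_url: Optional[str] = None,
    *,
    run_agent_loop: Callable[[str], str],
    notify: Optional[Callable[[str, dict], None]] = None,
) -> None:
    """Run one job and record the outcome so the status endpoint can report it."""
    try:
        result = run_agent_loop(goal)
        jobs[job_id] = {"status": "done", "result": result, "error": None}
    except Exception as exc:
        # Record the failure instead of letting the background task die silently.
        jobs[job_id] = {"status": "failed", "result": None, "error": str(exc)}

    # Tell the caller the job finished, if a webhook and a sender were provided.
    if webhook_url and notify is not None:
        notify(webhook_url, {"job_id": job_id, **jobs[job_id]})
```

Because `BackgroundTasks.add_task` forwards keyword arguments, the handler could pass the real agent loop as `run_agent_loop=...` when scheduling the job.
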