Deploying Claude Agents to Production: Fly.io, Vercel, and Lambda

Originally published at claudeguide.io/claude-agent-production-deploy

Claude agent deployments fail for three common reasons: timeouts (a single LLM call can take 5-30 seconds and a multi-step agent chains several, while many serverless platforms and gateways default to request timeouts of 30 seconds or less), cold starts (agents with heavy initialization start too slowly for serverless), and missing environment variables in production. Choosing the right deployment target for your agent type avoids the first two, and checking required configuration at startup catches the third. This guide covers the three main deployment patterns with complete configuration.

Deployment Target Decision Matrix

Agent type	Recommended platform	Why
Long-running (5+ minutes)	Fly.io	No timeout limits, persistent processes
API endpoint (< 30s response)	Vercel	Zero-config, automatic scaling
Event-driven (webhooks, queues)	AWS Lambda	Pay-per-invocation, natural event model
Streaming responses	Vercel Edge	Low latency, server-sent events (SSE) streaming
High-volume, cost-sensitive	Fly.io + Redis queue	Full control, no per-invocation billing
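
The matrix can also be read as an ordered series of checks. Here is a rough sketch, assuming a hypothetical `AgentProfile` description of the workload. The field names are illustrative, and the precedence between overlapping rows is an assumption, since the table does not rank them.

```python
from dataclasses import dataclass


@dataclass
class AgentProfile:
    """Hypothetical description of an agent workload (field names are illustrative)."""
    max_runtime_seconds: float
    event_driven: bool = False       # triggered by webhooks or queue messages
    streams_responses: bool = False  # sends output incrementally, e.g. over SSE
    high_volume: bool = False        # sustained, cost-sensitive traffic


def recommend_platform(profile: AgentProfile) -> str:
    """Map a workload onto a row of the decision matrix."""
    # Precedence between rows is an assumption; the table does not rank them.
    if profile.high_volume:
        return "Fly.io + Redis queue"
    if profile.max_runtime_seconds >= 5 * 60:
        return "Fly.io"
    if profile.event_driven:
        return "AWS Lambda"
    if profile.streams_responses:
        return "Vercel Edge"
    if profile.max_runtime_seconds < 30:
        return "Vercel"
    # The table has no row for 30 seconds to 5 minutes; Fly.io is an assumed fallback.
    return "Fly.io"
```

For instance, `recommend_platform(AgentProfile(max_runtime_seconds=10))` falls through to the plain API-endpoint row and returns `"Vercel"`.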

Fly.io: Long-Running Agents

Best for: agents that run for minutes, background processing, agents that need to hold state in memory.

Project structure

my-agent/
├── Dockerfile
├── fly.toml
├── requirements.txt
└── agent/
    ├── __init__.py
    ├── main.py
    └── tools.py

Dockerfile

FROM python:3.12-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY agent/ ./agent/

# Port uvicorn listens on; the /health endpoint is served here too
EXPOSE 8080

CMD ["python", "-m", "uvicorn", "agent.main:app", "--host", "0.0.0.0", "--port", "8080"]

FastAPI agent server


# agent/main.py
import os
import asyncio
from fastapi import FastAPI, HTTPException, BackgroundTasks
from pydantic import BaseModel
import anthropic

app = FastAPI()
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# In-memory job tracker (use Redis in production for multi-instance)
jobs = {}


class AgentRequest(BaseModel):
    goal: str
    webhook_url: str | None = None


class JobStatus(BaseModel):
    job_id: str
    status: str  # "running" | "done" | "failed"
    result: str | None = None
    error: str | None = None


@app.get("/health")
async def health():
    return {"status": "ok"}


@app.post("/run")
async def run_agent(request: AgentRequest, background_tasks: BackgroundTasks):
    import uuid
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "result": None, "error": None}

    background_tasks.add_task(execute_agent_job, job_id, request.goal, request.webhook_url)
    return {"job_id": job_id}


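@app.get("/status/{job_id}")
async def get_status(job_id: str) -> JobStatus:
    # Unknown IDs return 404; this includes jobs lost on restart,
    # because the tracker above lives in memory
    job = jobs.get(job_id)
    if job is None:
        raise HTTPException(status_code=404, detail="Job not found")
    return JobStatus(job_id=job_id, **job)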
