Pax

Posted on • Originally published at paxrel.com

# How to Deploy an AI Agent to Production: VPS, Docker & Serverless (2026)

Your agent works on your laptop. Great. Now how do you make it run 24/7 without you babysitting it? Deployment is where most AI agent projects die — not because the agent doesn't work, but because nobody figured out how to keep it running reliably.

This guide covers three deployment approaches (VPS, Docker, serverless), with real configs, cost breakdowns, and the monitoring you need to sleep at night while your agent works.

## Choosing Your Deployment Model

| Approach | Best For | Monthly Cost | Complexity | Always-On? |
| --- | --- | --- | --- | --- |
| **VPS (direct install)** | 24/7 autonomous agents | $5-20 | Medium | Yes |
| **Docker + VPS** | Reproducible, multi-agent | $10-30 | Medium-High | Yes |
| **Serverless (Lambda/Cloud Run)** | Event-triggered agents | $1-50 (pay-per-use) | Low-Medium | No (triggered) |
| **Managed platforms** | No-ops teams | $20-200 | Low | Varies |
## Option 1: VPS Deployment (What We Use)

The simplest path to a 24/7 agent. Rent a virtual server, install your agent, set up a process manager, and let it run.

### Step 1: Choose a VPS Provider

| Provider | Cheapest Plan | Specs | Best For |
| --- | --- | --- | --- |
| **Hetzner** | $4.50/mo | 2 vCPU, 4GB RAM, 40GB SSD | Best value in EU |
| **DigitalOcean** | $6/mo | 1 vCPU, 1GB RAM, 25GB SSD | Simple UI, good docs |
| **Vultr** | $6/mo | 1 vCPU, 1GB RAM, 25GB SSD | Global locations |
| **Contabo** | $6.50/mo | 4 vCPU, 8GB RAM, 50GB SSD | Most specs per dollar |

> **What Paxrel uses:** A Hetzner CX22 ($5.50/mo) with 2 vCPU, 4GB RAM. Runs our full agent stack: newsletter pipeline, social media automation, web scraping, and Reddit karma builder — all on one server.


### Step 2: Initial Server Setup

```bash
# SSH into your new server
ssh root@your-server-ip

# Create a non-root user
adduser agent
usermod -aG sudo agent

# Install essentials
apt update && apt install -y python3 python3-pip python3-venv git curl

# Switch to the agent user
su - agent

# Clone your agent code
git clone https://github.com/your-org/your-agent.git
cd your-agent

# Set up Python environment
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Create environment file for credentials (readable only by this user)
cat > .env << 'EOF'
OPENAI_API_KEY=sk-...
EOF
chmod 600 .env
mkdir -p logs
```

### Step 3: Schedule with Cron

```bash
# crontab -e

# Newsletter pipeline: Mon/Wed/Fri at 8am
0 8 * * 1,3,5 cd /home/agent/your-agent && .venv/bin/python3 pipeline.py >> logs/pipeline.log 2>&1

# Social media posting: every 6 hours
0 */6 * * * cd /home/agent/your-agent && .venv/bin/python3 post_tweet.py >> logs/twitter.log 2>&1

# Daily monitoring report
30 9 * * * cd /home/agent/your-agent && .venv/bin/python3 monitoring.py >> logs/monitoring.log 2>&1
```
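Cron suits scheduled pipelines. For an agent that must run continuously, a systemd unit (the setup this guide recommends for always-on agents) restarts the process automatically if it crashes. A minimal sketch, assuming your entrypoint is `agent.py` and the repo lives at `/home/agent/your-agent`:

```ini
# /etc/systemd/system/agent.service
[Unit]
Description=AI agent
After=network-online.target
Wants=network-online.target

[Service]
User=agent
WorkingDirectory=/home/agent/your-agent
EnvironmentFile=/home/agent/your-agent/.env
ExecStart=/home/agent/your-agent/.venv/bin/python3 agent.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now agent`, and check status with `systemctl status agent`.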
## Option 2: Docker Deployment

Docker adds reproducibility and isolation. Especially useful when running multiple agents or when your agent has complex dependencies.
```dockerfile
# Dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl git && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy agent code
COPY . .

# Non-root user for security
RUN useradd -m agent
USER agent

CMD ["python3", "agent.py"]
```
```yaml
# docker-compose.yml
version: '3.8'

services:
  agent:
    build: .
    restart: always
    env_file: .env
    volumes:
      - ./data:/app/data  # persist agent memory/state
      - ./logs:/app/logs  # persist logs
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '1.0'
    healthcheck:
      test: ["CMD", "python3", "-c", "import requests; requests.get('http://localhost:8080/health')"]
      interval: 60s
      timeout: 10s
      retries: 3

  # Optional: vector database for RAG
  chromadb:
    image: chromadb/chroma:latest
    restart: always
    volumes:
      - chroma_data:/chroma/chroma
    ports:
      - "8000:8000"

volumes:
  chroma_data:
```
```bash
# Deploy
docker compose up -d

# View logs
docker compose logs -f agent

# Update agent
git pull && docker compose build && docker compose up -d
```
## Option 3: Serverless Deployment

For agents triggered by events (webhook, email, schedule) rather than running continuously. You pay only when the agent runs.

### AWS Lambda + EventBridge
```python
# handler.py
import json

def lambda_handler(event, context):
    """Triggered by EventBridge cron or API Gateway webhook"""

    # Your agent logic here
    from agent import run_agent
    result = run_agent(event)

    return {
        'statusCode': 200,
        'body': json.dumps(result)
    }
```
```yaml
# serverless.yml (Serverless Framework)
service: ai-agent

provider:
  name: aws
  runtime: python3.12
  timeout: 300  # 5 minutes (Lambda's hard cap is 15)
  memorySize: 512
  environment:
    OPENAI_API_KEY: ${ssm:/ai-agent/openai-key}

functions:
  newsletter:
    handler: handler.lambda_handler
    events:
      - schedule: cron(0 8 ? * MON,WED,FRI *)  # Mon/Wed/Fri 8am UTC
  webhook:
    handler: handler.lambda_handler
    events:
      - httpApi:
          path: /webhook
          method: post
```
### Google Cloud Run

```bash
# For longer-running agents (up to 60 min per request)
gcloud run deploy ai-agent \
  --source . \
  --region us-central1 \
  --memory 1Gi \
  --timeout 3600 \
  --set-env-vars "OPENAI_API_KEY=sk-..." \
  --no-allow-unauthenticated
```
| Platform | Max Runtime | Cold Start | Cost per Run |
| --- | --- | --- | --- |
| AWS Lambda | 15 minutes | 1-5 seconds | $0.0001-0.01 |
| Google Cloud Run | 60 minutes | 2-10 seconds | $0.001-0.05 |
| Vercel Functions | 5 minutes (pro: 15) | — | $0.0001-0.005 |
| Cloudflare Workers | 30 seconds (free) | Near-zero (V8 isolates) | $0.00005 |
## Monitoring Your Deployed Agent

A deployed agent without monitoring is a liability. Here's the minimum monitoring stack:

### Health Check Endpoint

```python
from flask import Flask, jsonify
import psutil

app = Flask(__name__)

# get_uptime(), get_last_run_timestamp(), get_error_count(), and
# check_api_balance() are your own helpers -- implement them against
# your agent's state.

@app.route('/health')
def health():
    return jsonify({
        "status": "healthy",
        "uptime_hours": get_uptime(),
        "memory_mb": psutil.Process().memory_info().rss / 1024 / 1024,
        "last_run": get_last_run_timestamp(),
        "errors_24h": get_error_count(hours=24),
        "api_balance": check_api_balance()
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # matches the Docker healthcheck port
```
### Alert System

```python
import os
import requests

# Credentials come from the environment (the variable names are up to you)
BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
OWNER_ID = os.environ["TELEGRAM_CHAT_ID"]
SLACK_WEBHOOK = os.environ["SLACK_WEBHOOK_URL"]

def send_alert(message, level="warning"):
    """Send alert via Telegram/Slack/email"""
    if level == "critical":
        # Telegram for immediate attention
        requests.post(
            f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
            data={"chat_id": OWNER_ID, "text": f"🚨 {message}"}
        )
    else:
        # Slack webhook for non-critical
        requests.post(SLACK_WEBHOOK, json={"text": f"⚠️ {message}"})

# Alerts to configure:
# - Agent crash / restart
# - API balance below threshold
# - Error rate spike (3+ errors in 10 min)
# - Agent stuck (no activity for 2+ hours)
# - Cost spike (daily spend > 2x average)
```
### Log Management

```python
import logging
import os
from logging.handlers import RotatingFileHandler

os.makedirs('logs', exist_ok=True)

# Structured logging with rotation
handler = RotatingFileHandler(
    'logs/agent.log',
    maxBytes=10_000_000,  # 10MB per file
    backupCount=5         # keep 5 rotated files
)
handler.setFormatter(logging.Formatter(
    '%(asctime)s [%(levelname)s] %(name)s: %(message)s'
))

logger = logging.getLogger('agent')
logger.setLevel(logging.INFO)  # without this, INFO messages are dropped
logger.addHandler(handler)

# Log every significant action
logger.info("Scraping 12 RSS feeds")
logger.info("Scored 97 articles, top score: 28")
logger.warning("API rate limited, retrying in 30s")
logger.error("Beehiiv publish failed: 401 Unauthorized")
```
## Production Hardening Checklist

### Security

- API keys in environment variables or a secrets manager, never in code
- Non-root user for the agent process
- Firewall: only allow SSH (22) and necessary ports
- SSH key auth only, disable password login
- Auto-update OS security patches (`unattended-upgrades`)
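On Ubuntu/Debian, the checklist above maps to a handful of commands. A sketch (adapt to your distro, and verify key-based SSH login works before disabling passwords):

```bash
# Firewall: deny inbound by default, allow SSH only
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow OpenSSH
sudo ufw enable

# Key-only SSH: disable password login, then reload sshd
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh

# Automatic security patches
sudo apt install -y unattended-upgrades
sudo dpkg-reconfigure -plow unattended-upgrades
```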


### Reliability

- Process manager with auto-restart (systemd, Docker restart policy)
- Graceful shutdown handling (catch SIGTERM, finish the current task)
- Exponential backoff on API errors (not infinite retry loops)
- Circuit breaker for external services (stop calling after N failures)
- Daily backup of agent state/memory to external storage
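The backoff item above fits in a few lines. A generic sketch (the `with_backoff` helper and its parameters are ours, not from a specific library); the retry count is capped so a dead API can't trap the agent in a loop:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(); on exception, retry with exponential backoff + jitter.

    Re-raises the last exception after max_retries attempts instead of
    looping forever.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up: bounded retries, not an infinite loop
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.5))  # jitter avoids thundering herd
```

Wrap any flaky external call: `with_backoff(lambda: client.chat.completions.create(...))`. The injectable `sleep` makes the helper testable without real waiting.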


### Cost Control

- Daily API spend limit with hard cutoff
- Max steps per agent run (prevent infinite loops)
- Token counting before API calls (reject oversized prompts)
- Alert when daily spend exceeds 2x average
- Weekly cost report to the team
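The hard daily cutoff needs nothing fancy. A sketch (the `SpendGuard` class and its method names are ours): record each call's cost, and refuse further calls once the cap is hit, resetting at midnight:

```python
import datetime

class SpendGuard:
    """Hard daily cap on API spend. In-memory sketch; persist self.spent
    to disk if your agent restarts mid-day."""

    def __init__(self, daily_limit_usd: float):
        self.daily_limit = daily_limit_usd
        self.day = datetime.date.today()
        self.spent = 0.0

    def record(self, cost_usd: float) -> None:
        today = datetime.date.today()
        if today != self.day:  # new day: reset the counter
            self.day, self.spent = today, 0.0
        self.spent += cost_usd

    def allow(self) -> bool:
        """Check before each API call; False means stop and alert."""
        return self.spent < self.daily_limit
```

Before every model call: `if not guard.allow(): send_alert("Daily spend cap hit", level="critical")` and skip the call.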


## Deployment Patterns by Use Case

| Agent Type | Best Deployment | Why |
| --- | --- | --- |
| 24/7 autonomous agent | VPS + systemd | Always-on, persistent state |
| Scheduled pipeline | VPS + cron or serverless | Runs on schedule, sleeps between |
| Webhook-triggered | Serverless (Lambda/Cloud Run) | Pay-per-use, auto-scales |
| Multi-agent system | Docker Compose on VPS | Isolated containers, shared network |
| Customer-facing chatbot | Cloud Run or managed platform | Auto-scale with traffic |
| Development/testing | Local Docker | Reproducible environment |
## Key Takeaways

- **VPS + systemd is the simplest path** for always-on agents. $5-15/month, full control, works for 90% of use cases.
- **Docker adds value** when you have complex dependencies, multiple agents, or need reproducibility across environments.
- **Serverless is cheaper for sporadic workloads** but has runtime limits (15 min for Lambda) that don't suit long-running agents.
- **Monitoring is not optional.** Health checks, alerts, and log rotation are the minimum. An unmonitored agent will fail silently.
- **Security basics matter.** Non-root user, env vars for secrets, firewall, SSH keys. Takes 30 minutes, prevents disasters.
- **Start simple, scale later.** A $5 VPS with cron jobs is a perfectly valid production deployment. Don't over-engineer until you need to.



### Deploy With Confidence

Our AI Agent Playbook includes Dockerfiles, systemd configs, monitoring templates, and deployment checklists for production agents.

[Get the Playbook — $29](https://paxrel.gumroad.com/l/ai-agent-playbook)

### Stay Updated on AI Agents

Deployment patterns, infrastructure tips, and production war stories. 3x/week, no spam.

[Subscribe to AI Agents Weekly](/newsletter.html)
