Target Keyword: "docker llm application deployment"
Tags: docker,devops,ai,programming,developer
Type: Tutorial
Content
Docker for AI Development: Containerizing LLM Applications
Docker simplifies AI application deployment by providing consistent environments from development to production. Here's how to containerize your AI applications powered by Claude and ofox.ai.
Why Docker for AI Apps?
- Reproducible environments — Same behavior locally and in production
- Dependency isolation — Python packages, system libraries, CUDA versions
- Easy deployment — Ship to any cloud with Docker
- Resource control — Limit CPU/memory per container (see the example after this list)
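For example, resource limits are just flags on docker run (the image name and values below are illustrative):
# Cap a container at 2 CPUs and 4 GB of RAM (illustrative values)
docker run -d --cpus="2.0" --memory="4g" my-ai-app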
Basic Dockerfile for AI App
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements first (for caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Set environment variables
ENV PYTHONUNBUFFERED=1
# Expose the port and start the API server
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
# requirements.txt
fastapi==0.109.0
uvicorn==0.27.0
httpx==0.26.0
python-dotenv==1.0.0
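Because requirements.txt is copied before the rest of the source, the pip install layer stays cached until your dependencies change. A .dockerignore file (a suggested addition; adjust to your repo) keeps COPY . . from dragging caches and secrets into the image:
# .dockerignore (suggested contents)
__pycache__/
*.pyc
.env
.git/
venv/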
Docker Compose for AI Services
# docker-compose.yml
version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OFOX_API_KEY=${OFOX_API_KEY}
      - MODEL=claude-3-5-sonnet-20241022
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Optional: Add a Redis cache
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

volumes:
  redis-data:
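Compose resolves ${OFOX_API_KEY} from your shell environment or from a .env file next to docker-compose.yml. A minimal sketch with a placeholder value; keep this file out of version control:
# .env (placeholder value, do not commit)
OFOX_API_KEY=sk-your-key-here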
Production-Ready FastAPI + ofox.ai
# main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional
import httpx
import os
app = FastAPI(title="Claude API Service", version="1.0.0")
class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    messages: List[Message]
    model: str = "claude-3-5-sonnet-20241022"
    max_tokens: Optional[int] = 1024
    temperature: Optional[float] = 0.7

@app.get("/health")
async def health():
    return {"status": "healthy"}

@app.post("/chat")
async def chat(request: ChatRequest):
    async with httpx.AsyncClient(timeout=120.0) as client:
        try:
            response = await client.post(
                "https://api.ofox.ai/v1/chat/completions",
                headers={
                    "Authorization": f"Bearer {os.environ['OFOX_API_KEY']}",
                    "Content-Type": "application/json"
                },
                json={
                    "model": request.model,
                    "messages": [m.model_dump() for m in request.messages],
                    "max_tokens": request.max_tokens,
                    "temperature": request.temperature
                }
            )
            response.raise_for_status()
            data = response.json()
            return {
                "content": data["choices"][0]["message"]["content"],
                "model": data["model"],
                "tokens": data["usage"]["total_tokens"]
            }
        except httpx.HTTPStatusError as e:
            raise HTTPException(status_code=e.response.status_code, detail=str(e))
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))
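With the service running, you can exercise the endpoint with curl (the payload below is illustrative):
# Example request to the /chat endpoint
curl -s http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'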
GPU Support for Local Models
# Dockerfile with GPU support
FROM nvidia/cuda:12.1.0-base-ubuntu22.04
RUN apt-get update && apt-get install -y \
    python3 python3-pip curl \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
# For running local models like Ollama (the install script needs curl, installed above)
RUN curl -fsSL https://ollama.ai/install.sh | sh
COPY . .
CMD ["python3", "main.py"]
# docker-compose.yml with GPU
services:
  api:
    build: .
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
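GPU passthrough assumes the NVIDIA Container Toolkit is installed on the host. A quick sanity check:
# Verify that containers can see the GPU
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi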
Multi-Stage Build (Smaller Images)
# Build stage
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt
# Production stage
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "main.py"]
Environment-Based Configuration
# config.py
import os
from dataclasses import dataclass
@dataclass
class Config:
    api_key: str
    model: str
    max_tokens: int
    temperature: float

def get_config() -> Config:
    return Config(
        api_key=os.environ["OFOX_API_KEY"],
        model=os.environ.get("MODEL", "claude-3-5-sonnet-20241022"),
        max_tokens=int(os.environ.get("MAX_TOKENS", "1024")),
        temperature=float(os.environ.get("TEMPERATURE", "0.7"))
    )
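In main.py you would then build the config once at startup rather than reading os.environ throughout; a minimal sketch assuming the config.py above:
# main.py (sketch)
from config import get_config

config = get_config()  # fails fast at startup if OFOX_API_KEY is missing
print(f"Using model: {config.model}")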
Building and Running
# Build
docker build -t claude-api-service .
# Run
docker run -d -p 8000:8000 \
-e OFOX_API_KEY=your-key-here \
--name claude-api \
claude-api-service
# With Docker Compose
docker-compose up -d
# View logs
docker logs -f claude-api
# Shell into container
docker exec -it claude-api /bin/bash
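After the container starts, a quick smoke test against the health endpoint:
# Should print {"status":"healthy"}
curl http://localhost:8000/health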
CI/CD with GitHub Actions
name: Build and Deploy

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t claude-api:${{ github.sha }} .
      - name: Run tests
        # Assumes pytest is installed in the image (add it to requirements.txt or a dev requirements file)
        run: |
          docker run claude-api:${{ github.sha }} pytest
      - name: Push to registry
        run: |
          docker tag claude-api:${{ github.sha }} registry/app/claude-api:latest
          docker push registry/app/claude-api:latest
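The push step assumes you are already authenticated to your registry. A sketch of a login step using repository secrets (the secret names and registry host are placeholders):
- name: Log in to registry
  run: echo "${{ secrets.REGISTRY_TOKEN }}" | docker login registry -u "${{ secrets.REGISTRY_USER }}" --password-stdin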
Deploy Anywhere
With Docker, your AI application deploys to:
- AWS ECS — Managed container service
- Google Cloud Run — Serverless containers (see the sketch after this list)
- Azure Container Instances — Simple deployment
- DigitalOcean App Platform — Simple PaaS
- Your own server — With docker-compose
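As one example, once the image is pushed to a registry, deploying to Cloud Run is a single command (service name, image path, and region are placeholders):
# Deploy to Cloud Run (placeholder names)
gcloud run deploy claude-api \
  --image gcr.io/your-project/claude-api:latest \
  --port 8000 \
  --region us-central1 \
  --set-env-vars OFOX_API_KEY=your-key-here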
Getting Started
Containerize your AI applications and deploy with confidence. Power them with ofox.ai — reliable Claude API with competitive pricing and 99.9% uptime.
This article contains affiliate links.