Docker for AI Development: Containerizing LLM Applications

Docker simplifies AI application deployment by providing consistent environments from development to production. Here's how to containerize your AI applications powered by Claude and ofox.ai.

Why Docker for AI Apps?

  1. Reproducible environments — Same behavior locally and in production
  2. Dependency isolation — Python packages, system libraries, CUDA versions
  3. Easy deployment — Ship to any cloud with Docker
  4. Resource control — Limit CPU/memory per container (see the example below)
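
For point 4, limits are applied per container at run time. A minimal sketch (the image name is illustrative):

```bash
# Cap the container at 2 CPUs and 4 GB of memory
docker run -d --cpus="2" --memory="4g" my-ai-app
```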

Basic Dockerfile for AI App

```dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first (for layer caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Set environment variables
ENV PYTHONUNBUFFERED=1

# Run the application
CMD ["python", "main.py"]
```
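
Since `COPY . .` copies the entire build context into the image, it pairs well with a `.dockerignore` file to keep secrets and clutter out. A minimal sketch (entries are typical assumptions, adjust for your project):

```text
# .dockerignore
.git
__pycache__/
*.pyc
.env
venv/
```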

```text
# requirements.txt
fastapi==0.109.0
uvicorn==0.27.0
httpx==0.26.0
python-dotenv==1.0.0
```

Docker Compose for AI Services

```yaml
# docker-compose.yml
version: '3.8'

services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OFOX_API_KEY=${OFOX_API_KEY}
      - MODEL=claude-3-5-sonnet-20241022
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Optional: add a Redis cache
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

volumes:
  redis-data:
```
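
Compose substitutes `${OFOX_API_KEY}` from your shell environment or from a `.env` file placed next to `docker-compose.yml`. A sketch (the key value is a placeholder):

```text
# .env (never commit this file)
OFOX_API_KEY=your-key-here
MODEL=claude-3-5-sonnet-20241022
```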

Production-Ready FastAPI + ofox.ai

```python
# main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional
import httpx
import os

app = FastAPI(title="Claude API Service", version="1.0.0")

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    messages: List[Message]
    model: str = "claude-3-5-sonnet-20241022"
    max_tokens: Optional[int] = 1024
    temperature: Optional[float] = 0.7

@app.get("/health")
async def health():
    return {"status": "healthy"}

@app.post("/chat")
async def chat(request: ChatRequest):
    async with httpx.AsyncClient(timeout=120.0) as client:
        try:
            response = await client.post(
                "https://api.ofox.ai/v1/chat/completions",
                headers={
                    "Authorization": f"Bearer {os.environ['OFOX_API_KEY']}",
                    "Content-Type": "application/json"
                },
                json={
                    "model": request.model,
                    "messages": [m.model_dump() for m in request.messages],
                    "max_tokens": request.max_tokens,
                    "temperature": request.temperature
                }
            )
            response.raise_for_status()
            data = response.json()
            return {
                "content": data["choices"][0]["message"]["content"],
                "model": data["model"],
                "tokens": data["usage"]["total_tokens"]
            }
        except httpx.HTTPStatusError as e:
            raise HTTPException(status_code=e.response.status_code, detail=str(e))
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

# Entry point so the Dockerfile's CMD ["python", "main.py"] actually starts the server
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
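
Once the container is up, you can exercise the endpoint with curl (the payload shape matches the `ChatRequest` model above):

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, Claude!"}]}'
```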

GPU Support for Local Models

```dockerfile
# Dockerfile with GPU support
FROM nvidia/cuda:12.1.0-base-ubuntu22.04

RUN apt-get update && apt-get install -y \
    python3.11 python3.11-venv python3-pip \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# For running local models with Ollama
RUN curl -fsSL https://ollama.ai/install.sh | sh

COPY . .
CMD ["python3", "main.py"]
```

```yaml
# docker-compose.yml with GPU
services:
  api:
    build: .
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
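
Outside Compose, the equivalent `docker run` needs the `--gpus` flag (this requires the NVIDIA Container Toolkit on the host):

```bash
# Expose all host GPUs to the container
docker run -d --gpus all claude-api-service
```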

Multi-Stage Build (Smaller Images)

```dockerfile
# Build stage
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt

# Production stage
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "main.py"]
```
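
You can verify the savings by building both variants and comparing image sizes (the tag names and `Dockerfile.multi` filename are illustrative):

```bash
docker build -t claude-api:single -f Dockerfile .
docker build -t claude-api:multi -f Dockerfile.multi .
docker images | grep claude-api
```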

Environment-Based Configuration

```python
# config.py
import os
from dataclasses import dataclass

@dataclass
class Config:
    api_key: str
    model: str
    max_tokens: int
    temperature: float

def get_config() -> Config:
    return Config(
        api_key=os.environ["OFOX_API_KEY"],
        model=os.environ.get("MODEL", "claude-3-5-sonnet-20241022"),
        max_tokens=int(os.environ.get("MAX_TOKENS", "1024")),
        temperature=float(os.environ.get("TEMPERATURE", "0.7"))
    )
```
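
In the app, you'd build the config once at startup rather than reading `os.environ` inline. A sketch of how `main.py` might consume it (hypothetical usage, not from the original):

```python
from config import get_config

config = get_config()  # raises KeyError early if OFOX_API_KEY is missing

# Pass config.model and config.max_tokens into the /chat handler defaults
print(f"Serving model {config.model} (max_tokens={config.max_tokens})")
```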

Building and Running

```bash
# Build
docker build -t claude-api-service .

# Run
docker run -d -p 8000:8000 \
  -e OFOX_API_KEY=your-key-here \
  --name claude-api \
  claude-api-service

# With Docker Compose
docker-compose up -d

# View logs
docker logs -f claude-api

# Shell into the container
docker exec -it claude-api /bin/bash
```
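
After starting, confirm the service is up (this is the same endpoint the Compose healthcheck polls):

```bash
curl http://localhost:8000/health
# {"status": "healthy"}
```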

CI/CD with GitHub Actions

```yaml
name: Build and Deploy

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t claude-api:${{ github.sha }} .

      - name: Run tests
        run: |
          docker run claude-api:${{ github.sha }} pytest

      - name: Push to registry
        run: |
          docker tag claude-api:${{ github.sha }} registry/app/claude-api:latest
          docker push registry/app/claude-api:latest
```
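
Two caveats: the test step runs pytest inside the image, so pytest must be installed there (e.g., listed in requirements.txt), and the push step assumes you're already authenticated. In practice you'd add a login step before the push; a sketch using Docker's official action (registry URL and secret names are placeholders):

```yaml
      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: registry.example.com
          username: ${{ secrets.REGISTRY_USER }}
          password: ${{ secrets.REGISTRY_PASSWORD }}
```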

Deploy Anywhere

With Docker, your AI application deploys to:
- AWS ECS — Managed container service
- Google Cloud Run — Serverless containers
- Azure Container Instances — Simple deployment
- DigitalOcean App Platform — Simple PaaS
- Your own server — With docker-compose
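
As one example, a Cloud Run deployment is a single command once the image is in a registry (project ID, region, and key value are placeholders):

```bash
gcloud run deploy claude-api \
  --image gcr.io/my-project/claude-api:latest \
  --region us-central1 \
  --set-env-vars OFOX_API_KEY=your-key-here
```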

Getting Started

Containerize your AI applications and deploy with confidence. Power them with ofox.ai — reliable Claude API with competitive pricing and 99.9% uptime.

👉 Get started with ofox.ai

This article contains affiliate links.

Tags: docker,devops,ai,programming,developer