WDSEGA

The Complete Guide to Deploying AI Applications with Docker: From Zero to Production

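Docker has become the de facto standard for taking AI models from development to deployment. This article walks you step by step through building a production-grade Docker setup for AI applications.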

Basic Dockerfile

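FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04
# Print Python logs immediately instead of buffering them
ENV PYTHONUNBUFFERED=1
# The CUDA runtime image ships without Python, so install it first
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Copy requirements first so the dependency layer stays cached across code changes
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python3", "app.py"]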

GPU Support

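# docker-compose.yml
# The host needs the NVIDIA driver and the NVIDIA Container Toolkit installed.
services:
  app:  # illustrative service name; deploy must sit under a service
    build: .
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]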
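To confirm that the GPU reservation actually reached the container, a small check run inside it can help. The following is a minimal sketch, not part of the original setup: the function name `visible_gpus` is hypothetical, and it assumes `nvidia-smi` is made available in the container by the NVIDIA Container Toolkit.

```python
import shutil
import subprocess


def visible_gpus() -> list[str]:
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    # If nvidia-smi is missing, the GPU reservation most likely did not take effect.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # Driver problems, timeouts, and non-zero exits all count as "no GPU".
        return []
    # One GPU name per non-empty output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"{len(gpus)} GPU(s) visible: {gpus}")
```

An empty list on a GPU host usually points at the Compose configuration or a missing NVIDIA Container Toolkit rather than at the application code.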

In Practice: An LLM Inference Service

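from fastapi import FastAPI
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import uvicorn

MODEL_ID = "Qwen/Qwen2.5-7B-Instruct"

app = FastAPI()
# Load the tokenizer and model once at startup, not per request
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"  # requires the accelerate package
)

@app.post("/generate")
def generate(prompt: str):
    # Plain def: FastAPI runs it in a worker thread, so the blocking
    # model.generate call does not stall the event loop
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    # Note: the decoded text contains the prompt followed by the completion
    return {"response": tokenizer.decode(outputs[0], skip_special_tokens=True)}

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the server is reachable through the published port
    uvicorn.run(app, host="0.0.0.0", port=8000)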
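Once the container is up, the endpoint can be exercised from any HTTP client. Below is a minimal standard-library sketch. It assumes the service listens on `http://localhost:8000` and that, because the handler declares `prompt: str`, FastAPI reads the prompt from the query string. The helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_generate_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for /generate with the prompt as a query parameter."""
    query = urllib.parse.urlencode({"prompt": prompt})
    url = f"{base_url.rstrip('/')}/generate?{query}"
    return urllib.request.Request(url, method="POST")


def generate(base_url: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the service and return the "response" field."""
    request = build_generate_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]


# Usage, with the service running:
#   print(generate("http://localhost:8000", "Explain Docker in one sentence."))
```

The generous default timeout is deliberate: generating up to 512 new tokens on a 7B model can take a while, especially on the first request.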

Production Optimizations

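  • Use multi-stage builds to shrink the image
  • Serve with vLLM for faster inference (a 3-5x speedup)
  • Add Prometheus monitoring and health checks
  • Quantize the model (FP16→INT4, roughly 70% less memory)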
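The memory figure for quantization can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts weight storage only and assumes a round 7 billion parameters for a "7B" model. The helper `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


params = 7e9  # assumed round parameter count for a "7B" model
fp16_gb = weight_memory_gb(params, 16)
int4_gb = weight_memory_gb(params, 4)
reduction = 1 - int4_gb / fp16_gb

print(f"FP16: {fp16_gb:.1f} GB, INT4: {int4_gb:.1f} GB, saved: {reduction:.0%}")
# prints "FP16: 14.0 GB, INT4: 3.5 GB, saved: 75%"
```

The 75% is an upper bound for the weights themselves. Quantization metadata such as scales, plus the KV cache and activations that stay in higher precision, eat into that saving, which is consistent with an overall figure nearer 70%.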

📢 This is a condensed version. The full version, which covers Kubernetes deployment and a monitoring setup, is available on WD Tech Blog.

Follow the blog for the latest AI tutorials!