DEV Community

WDSEGA

Deploying AI Models on Kubernetes in Practice: From Docker to Production-Grade MLOps

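Introduction

Training an AI model is only the beginning; the real challenge is deploying it to production reliably and efficiently. Kubernetes supports AI model deployment with capabilities such as elastic scaling, rolling updates, and service discovery.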

Dockerizing the AI Model

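# Build stage: install the Python dependencies
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Runtime stage: copy the installed packages and the application code
FROM python:3.11-slim AS runtime
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY app/ ./app/
COPY models/ ./models/
EXPOSE 8000
# Only site-packages is copied from the builder, so the uvicorn console script
# is not on PATH here; running it as a module avoids that.
CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]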

FastAPI Server

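import joblib  # assumption: the model was serialized with joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="AI Model Service")

# Assumption: the original snippet never defines `model`. The path below is a
# hypothetical placeholder for a scikit-learn-style estimator under models/.
model = joblib.load("models/model.joblib")

class PredictRequest(BaseModel):
    features: list[float]

@app.get("/health")
async def health():
    return {"status": "healthy"}

# Declared with plain `def` so FastAPI runs the blocking predict call in its
# thread pool instead of stalling the event loop.
@app.post("/predict")
def predict(request: PredictRequest):
    # Reshape the flat feature list into a single-row 2D array
    input_array = np.array(request.features).reshape(1, -1)
    prediction = int(model.predict(input_array)[0])
    return {"prediction": prediction}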
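
To exercise the /predict endpoint of the service above, a client could look something like the following minimal sketch. It uses only the Python standard library. The helper names `build_predict_request` and `call_predict`, the timeout value, and any base URL passed in are assumptions for illustration and are not part of the original service.

```python
import json
import urllib.request


def build_predict_request(base_url: str, features: list[float]) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the PredictRequest schema."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_predict(base_url: str, features: list[float], timeout: float = 5.0) -> int:
    """Send the request and return the integer prediction from the JSON response."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["prediction"]
```

With the service reachable locally (for example through a port-forward to port 8000), `call_predict("http://localhost:8000", [5.1, 3.5, 1.4, 0.2])` would return the predicted class as an integer. The number of features must match what the model was trained on.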

Kubernetes Deployment

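apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-model-service
spec:
  replicas: 3
  # Required by apps/v1: the selector must match the Pod template labels.
  # The label app: ai-model-service is an assumed value, not from the original.
  selector:
    matchLabels:
      app: ai-model-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: ai-model-service
    spec:
      containers:
      - name: ai-model
        image: your-registry/ai-model-service:latest
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "2000m"
        # Restart the container if /health stops responding
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
        # Route traffic to a Pod only once /health responds, so that
        # maxUnavailable: 0 actually protects availability during rollouts
        readinessProbe:
          httpGet:
            path: /health
            port: 8000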
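
The rolling-update settings in the Deployment above bound how many Pods can exist during a rollout. As a rough illustration, the arithmetic can be sketched in Python. The function name `rollout_bounds` is hypothetical, and the sketch handles only absolute integer values of maxSurge and maxUnavailable, not the percentage form that Kubernetes also accepts.

```python
def rollout_bounds(replicas: int, max_surge: int, max_unavailable: int) -> tuple[int, int]:
    """Return (min_available, max_total) Pods during a RollingUpdate.

    Only absolute integer values are handled; percentages are not covered.
    """
    if max_surge == 0 and max_unavailable == 0:
        # Kubernetes rejects this combination because the rollout could never progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    min_available = replicas - max_unavailable
    max_total = replicas + max_surge
    return min_available, max_total


# Values from the Deployment above: replicas=3, maxSurge=1, maxUnavailable=0
min_available, max_total = rollout_bounds(3, 1, 0)
print(f"at least {min_available} Pods available, at most {max_total} Pods total")
```

With these values the rollout never drops below 3 available Pods and briefly runs up to 4, so the cluster needs spare capacity for one extra Pod (512Mi of memory and 500m of CPU in requests).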

Autoscaling with HPA

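apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-model-service-hpa  # assumed name; the original omits metadata
spec:
  # Required: points the HPA at the Deployment defined above
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-model-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization  # required alongside averageUtilization
        averageUtilization: 70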
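
To make the 70% target more concrete, here is a simplified Python sketch of the replica calculation the HPA performs for a utilization metric: the desired count is the current count scaled by the ratio of observed to target utilization, rounded up, then clamped to the configured bounds. The function name `desired_replicas` is hypothetical, and the sketch assumes the default tolerance of 0.1. It ignores details the real controller handles, such as unready Pods, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float = 70.0,
    min_replicas: int = 2,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA replica calculation for a CPU utilization target."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured in the HPA spec.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(3, 140.0))  # 3 * 140 / 70 = 6
print(desired_replicas(3, 72.0))   # within tolerance, stays at 3
print(desired_replicas(3, 10.0))   # ceil(3 * 10 / 70) = 1, clamped to minReplicas = 2
```

CPU utilization is measured relative to each Pod's CPU request, so with the 500m request from the Deployment, a 70% target corresponds to an average of roughly 350m of CPU per Pod.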

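Best Practices

  1. Model file management: keep model files on shared storage instead of packaging them into the image
  2. GPU scheduling: set nvidia.com/gpu resource limits
  3. Canary releases: use Istio for traffic splitting
  4. Monitoring and alerting: Prometheus + Grafana for end-to-end monitoring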
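
For the first practice, one common pattern is to mount shared storage into the Pod and tell the service where the model lives through an environment variable. The sketch below is only an illustration under stated assumptions: the variable name `MODEL_PATH`, the default `/mnt/models/model.joblib`, and the helpers `resolve_model_path` and `require_model_file` are all hypothetical and do not come from the original article.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical default: a mount point for a shared volume (e.g. a PersistentVolumeClaim).
DEFAULT_MODEL_PATH = "/mnt/models/model.joblib"


def resolve_model_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the model file location, preferring the MODEL_PATH variable."""
    source = os.environ if env is None else env
    return Path(source.get("MODEL_PATH", DEFAULT_MODEL_PATH))


def require_model_file(path: Path) -> Path:
    """Fail fast at startup if the shared volume is not mounted or the file is missing."""
    if not path.is_file():
        raise FileNotFoundError(f"model file not found: {path}")
    return path


# Example with an explicit mapping instead of the real process environment
print(resolve_model_path({"MODEL_PATH": "/data/models/v2/model.joblib"}))
```

Resolving the path at startup keeps the image independent of any particular model version: swapping models becomes a change to the mounted volume or the environment variable rather than an image rebuild.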

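Conclusion

Kubernetes brings enterprise-grade reliability and elasticity to AI model deployment, making it a strong choice for building highly available AI inference services.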


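📢 This is a condensed version of the article. The full version, with exclusive tool recommendations and in-depth analysis, is available on the WD Tech Blog.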