Introduction
As Large Language Models (LLMs) become integral to applications, deploying them in a scalable and portable way is essential. Containerization enables this by packaging LLM applications with all dependencies into lightweight, portable containers that can run anywhere—cloud, on-premises, or edge devices.
Why Containerize LLM Applications?
- Portability: Containers ensure your application runs consistently across different environments.
- Scalability: Seamlessly scale up or down using orchestration tools like Kubernetes.
- Isolation: Keeps the environment clean and avoids conflicts between dependencies.
- Efficiency: Faster deployments and lightweight resource usage compared to traditional virtual machines.
Tools for Containerization
- Docker: A popular containerization platform to build and run containers.
- Kubernetes: For managing containerized applications at scale.
- Docker Compose: Simplifies multi-container configurations.
Steps to Containerize LLM Applications
1. Prepare the LLM Application
Ensure your application (e.g., a REST API for LLM inference) works locally.
Example: Use FastAPI to create an LLM inference service.
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health_check():
    return {"status": "Running!"}
2. Write a Dockerfile
Create a Dockerfile to define the container image.
# Use an official Python image
FROM python:3.9-slim
# Set the working directory
WORKDIR /app
# Copy application files
COPY . /app
# Install dependencies
RUN pip install --no-cache-dir fastapi uvicorn transformers torch
# Expose the port
EXPOSE 8000
# Define the command to run the application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
3. Build the Docker Image
Build the image using the Dockerfile.
docker build -t llm-api .
4. Run the Container
Run the container locally to test it.
docker run -d -p 8000:8000 llm-api
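Before testing, you can confirm the container came up cleanly with standard Docker commands:

# List running containers and note the container ID
docker ps
# Follow the container's logs (replace <container-id> with the ID from docker ps)
docker logs -f <container-id>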
5. Test the Containerized Application
Verify the application by sending a request.
curl http://localhost:8000/health
Expected response:
{"status": "Running!"}
Deployment with Orchestration
Using Docker Compose
For multi-container setups (e.g., API + Redis), create a docker-compose.yml.
version: '3.8'
services:
  llm-api:
    build:
      context: .
    ports:
      - "8000:8000"
    depends_on:
      - redis
    environment:
      - REDIS_HOST=redis
    volumes:
      - ./:/app
  redis:
    image: "redis:latest"
    volumes:
      - redis-data:/data
volumes:
  redis-data:
Run the setup:
docker-compose up
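The compose file only passes the Redis hostname to the API via REDIS_HOST; how the service uses it is up to you. One common pattern is caching completed generations, sketched below with the redis-py client (the redis package would need to be added to the image's dependencies; the key scheme and TTL are assumptions, not part of the original setup):

import os
import redis

# REDIS_HOST is injected by docker-compose.yml; fall back to localhost for local runs.
cache = redis.Redis(host=os.getenv("REDIS_HOST", "localhost"), port=6379, decode_responses=True)

def cached_generate(prompt: str, generate_fn) -> str:
    # Return a cached generation if we have one; otherwise call the model
    # and cache the result for an hour.
    key = f"gen:{prompt}"
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = generate_fn(prompt)
    cache.set(key, result, ex=3600)
    return result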
Using Kubernetes
Deploy at scale using Kubernetes. Define a deployment YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-api
  template:
    metadata:
      labels:
        app: llm-api
    spec:
      containers:
        - name: llm-api
          image: llm-api:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1
              memory: 1Gi
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 15
            periodSeconds: 10
Apply the deployment:
kubectl apply -f deployment.yaml
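A Deployment alone is not reachable by other workloads or from outside the cluster; you would typically pair it with a Service. A minimal sketch (the name, port mapping, and default ClusterIP type are assumptions):

apiVersion: v1
kind: Service
metadata:
  name: llm-api
spec:
  selector:
    app: llm-api
  ports:
    - port: 80
      targetPort: 8000

Apply it with kubectl apply -f service.yaml; kubectl get pods -l app=llm-api should then show the three replicas once they pass their readiness probes.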
Best Practices
- Optimize Images: Use lightweight base images such as python:3.9-slim; note that Alpine-based images (python:3.9-alpine) often lack prebuilt wheels for packages like torch.
- Environment Variables: Keep configuration in .env files instead of hardcoding it (see the sketch after this list).
- Resource Limits: Set CPU and memory limits for containers.
- Monitoring: Use tools like Prometheus and Grafana.
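As a sketch of the environment-variable practice above, a .env file plus the corresponding Compose reference might look like this (the variable names are illustrative):

# .env
REDIS_HOST=redis
MODEL_NAME=distilgpt2

# docker-compose.yml (excerpt)
services:
  llm-api:
    env_file:
      - .env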
Conclusion
Containerizing LLM applications ensures portability, scalability, and efficiency. Using tools like Docker and Kubernetes, you can deploy LLMs seamlessly across environments, enabling robust and scalable AI-powered applications.