Ajeet Singh Raina

Ollama Meets AMD GPUs

Large Language Models (LLMs) are revolutionizing the way we interact with machines. Their ever-growing complexity demands ever-increasing processing power. This is where accelerators like GPUs come into play, offering a significant boost for training and inference tasks.

The good news? Ollama, a popular self-hosted large language model server, now joins the party with official support for AMD GPUs through ROCm! This blog dives into how to leverage this exciting new development, even if your Ollama server resides within a Kubernetes cluster.

Ollama Meets AMD GPUs: A Match Made in Compute Heaven

Ollama's integration with ROCm lets you harness the raw power of your AMD graphics card for running LLMs, which translates to faster responses and smoother inference experiences. But wait, there's more!

Benefits of AMD + ROCm for Ollama:

  • Cost-effective performance: AMD GPUs offer exceptional value for money, making them a great choice for budget-conscious LLM enthusiasts.
  • Open-source advantage: ROCm, the open-source platform powering AMD's GPU ecosystem, fosters a collaborative environment and continuous development.

Setting Up Ollama with AMD and ROCm on Kubernetes

Here's how to deploy Ollama with ROCm support on your Kubernetes cluster:

  1. Install the ROCm Kubernetes Device Plugin:

This plugin facilitates communication between Ollama and your AMD GPU. Follow the official guide at https://github.com/ROCm/k8s-device-plugin/blob/master/README.md for installation instructions.
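In many clusters this boils down to a single kubectl command. Here's a sketch; the manifest path below is taken from the ROCm/k8s-device-plugin repository and may change, so verify it against the README before running:

# Deploy the AMD GPU device plugin as a DaemonSet
kubectl create -f https://raw.githubusercontent.com/ROCm/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml

# Confirm the node now advertises the amd.com/gpu resource
kubectl describe node <your-gpu-node> | grep amd.com/gpu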

  2. Deploy Ollama with ROCm Support (using Kubernetes YAML):

The following YAML configuration offers a solid template:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-rocm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama-rocm
  template:
    metadata:
      labels:
        app: ollama-rocm
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:rocm # ROCm-enabled build of Ollama
        ports:
        - containerPort: 11434 # Ollama's default API port
          name: ollama
        volumeMounts:
        - name: ollama-data
          mountPath: /root/.ollama # where Ollama stores pulled models
        resources:
          requests:
            memory: "32Gi"
            cpu: "64"
          limits:
            memory: "100Gi"
            cpu: "64"
            amd.com/gpu: 1 # one AMD GPU, scheduled via the ROCm device plugin
      volumes:
      - name: ollama-data
        hostPath:
          path: /var/lib/ollama/.ollama
          type: DirectoryOrCreate
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-service-rocm
spec:
  selector:
    app: ollama-rocm
  ports:
  - protocol: TCP
    port: 11434
    targetPort: 11434
    name: ollama
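Assuming you save both manifests above to a single file (the name ollama-rocm.yaml here is arbitrary), deploying and checking on the result looks like this:

# Apply the Deployment and Service
kubectl apply -f ollama-rocm.yaml

# Watch the pod come up
kubectl get pods -l app=ollama-rocm -w

# Ollama logs the GPU it detects at startup, so the logs confirm ROCm is in use
kubectl logs deployment/ollama-rocm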

Key points to note:

  1. The ollama/ollama:rocm image ensures you're using the ROCm-compatible build of Ollama.
  2. The amd.com/gpu: 1 entry under limits allocates one AMD GPU to Ollama (for extended resources like GPUs, Kubernetes treats the limit as the request).
  3. The Service definition exposes Ollama's port (11434) inside the cluster; the snippet below shows how to reach it from your workstation.
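For a quick smoke test, port-forward the Service and exercise Ollama's HTTP API with curl. The model name llama2 below is only an example; substitute whatever model you intend to serve:

# Forward the Service to localhost (leave this running)
kubectl port-forward svc/ollama-service-rocm 11434:11434

# In a second terminal: pull a model, then run a test prompt
curl http://localhost:11434/api/pull -d '{"name": "llama2"}'
curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?"}'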

Important Note:

A Docker Compose configuration written for Nvidia GPUs won't work for AMD with ROCm. Refer to Ollama's documentation for ROCm-specific container settings.
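For reference, outside Kubernetes the Ollama documentation runs the ROCm image by passing the host's /dev/kfd and /dev/dri devices straight through to the container:

docker run -d \
  --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm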

Unleash the Power of Your AMD GPU with Ollama!

With Ollama and ROCm working in tandem on your AMD-powered Kubernetes cluster, you're well-equipped to tackle demanding LLM tasks. Remember to consult Ollama's official documentation for detailed instructions and troubleshooting. Happy experimenting!
