Ajeet Singh Raina

Ollama Meets AMD GPUs

Large Language Models (LLMs) are revolutionizing the way we interact with machines. Their ever-growing complexity demands ever-increasing processing power. This is where accelerators like GPUs come into play, offering a significant boost for training and inference tasks.

The good news? Ollama, a popular self-hosted large language model server, now joins the party with official support for AMD GPUs through ROCm! This blog dives into how to leverage this exciting new development, even if your Ollama server resides within a Kubernetes cluster.

Ollama Meets AMD GPUs: A Match Made in Compute Heaven

Ollama's integration with ROCm lets you harness the raw power of your AMD graphics card for running LLMs, which translates to faster responses and smoother inference experiences. But wait, there's more!

Benefits of AMD + ROCm for Ollama:

  • Cost-effective performance: AMD GPUs offer exceptional value for money, making them a great choice for budget-conscious LLM enthusiasts.
  • Open-source advantage: ROCm, the open-source platform powering AMD's GPU ecosystem, fosters a collaborative environment and continuous development.

Setting Up Ollama with AMD and ROCm on Kubernetes

Here's how to deploy Ollama with ROCm support on your Kubernetes cluster:

  1. Install the ROCm Kubernetes Device Plugin:

This plugin facilitates communication between Ollama and your AMD GPU. Follow the official guide at https://github.com/ROCm/k8s-device-plugin/blob/master/README.md for installation instructions.
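In many clusters this boils down to a single kubectl command. Here's a sketch; the manifest path below is taken from the ROCm/k8s-device-plugin repository and may change, so verify it against the README before running:

# Deploy the AMD GPU device plugin as a DaemonSet
kubectl create -f https://raw.githubusercontent.com/ROCm/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml

# Confirm the node now advertises the amd.com/gpu resource
kubectl describe node <your-gpu-node> | grep amd.com/gpu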

  2. Deploy Ollama with ROCm Support (using Kubernetes YAML):

The following YAML configuration offers a solid template:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-rocm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama-rocm
  template:
    metadata:
      labels:
        app: ollama-rocm
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:rocm # ROCm-enabled build of Ollama
        ports:
        - containerPort: 11434 # Ollama's default API port
          name: ollama
        volumeMounts:
        - name: ollama-data
          mountPath: /root/.ollama # where Ollama stores pulled models
        resources:
          requests:
            memory: "32Gi"
            cpu: "64"
          limits:
            memory: "100Gi"
            cpu: "64"
            amd.com/gpu: 1 # one AMD GPU, scheduled via the ROCm device plugin
      volumes:
      - name: ollama-data
        hostPath:
          path: /var/lib/ollama/.ollama
          type: DirectoryOrCreate
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-service-rocm
spec:
  selector:
    app: ollama-rocm
  ports:
  - protocol: TCP
    port: 11434
    targetPort: 11434
    name: ollama
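Assuming you save both manifests above to a single file (the name ollama-rocm.yaml here is arbitrary), deploying and checking on the result looks like this:

# Apply the Deployment and Service
kubectl apply -f ollama-rocm.yaml

# Watch the pod come up
kubectl get pods -l app=ollama-rocm -w

# Ollama logs the GPU it detects at startup, so the logs confirm ROCm is in use
kubectl logs deployment/ollama-rocm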

Key points to note:

  1. The ollama/ollama:rocm image ensures you're using the ROCm-compatible build of Ollama.
  2. The amd.com/gpu: 1 entry under limits allocates one AMD GPU to Ollama (for extended resources like GPUs, Kubernetes treats the limit as the request).
  3. The Service definition exposes Ollama's port (11434) inside the cluster; the snippet below shows how to reach it from your workstation.
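For a quick smoke test, port-forward the Service and exercise Ollama's HTTP API with curl. The model name llama2 below is only an example; substitute whatever model you intend to serve:

# Forward the Service to localhost (leave this running)
kubectl port-forward svc/ollama-service-rocm 11434:11434

# In a second terminal: pull a model, then run a test prompt
curl http://localhost:11434/api/pull -d '{"name": "llama2"}'
curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?"}'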

Important Note:

A Docker Compose configuration written for Nvidia GPUs won't work for AMD with ROCm. Refer to Ollama's documentation for ROCm-specific container settings.
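For reference, outside Kubernetes the Ollama documentation runs the ROCm image by passing the host's /dev/kfd and /dev/dri devices straight through to the container:

docker run -d \
  --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm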

Unleash the Power of Your AMD GPU with Ollama!

With Ollama and ROCm working in tandem on your AMD-powered Kubernetes cluster, you're well-equipped to tackle demanding LLM tasks. Remember to consult Ollama's official documentation for detailed instructions and troubleshooting. Happy experimenting!
