Ajeet Singh Raina

Posted on Jan 29, 2024

Running Docker GenAI Stack Using GPU

The Docker GenAI Stack is a set of open-source tools and technologies that simplifies the development and deployment of Generative AI (GenAI) applications. It aims to make building and running AI models like large language models (LLMs) and other complex AI systems easier and more accessible for developers, especially those not deeply familiar with AI infrastructure.

The GenAI Stack combines four key components:

LangChain: A framework (available for Python and JavaScript) for building LLM-powered applications by composing prompts, models, retrievers, and tools into chains.
Docker: Containerization platform for packaging and running applications in a consistent and isolated environment.
Neo4j: Graph database that stores both the knowledge graph and the vector embeddings used to retrieve relevant context for the LLM (retrieval-augmented generation).
Ollama: Tool for downloading and running open-source LLMs (such as Llama 2) locally, on either CPUs or GPUs.
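To make Ollama's role concrete: it serves models over a local HTTP API (port 11434 by default), and LangChain's Ollama integration sends requests to it on the application's behalf. The sketch below builds, but does not send, a raw non-streaming request to Ollama's /api/generate endpoint using only the Python standard library. The base URL, model name, and prompt are illustrative assumptions, and the stack's own apps go through LangChain rather than code like this.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumed local Ollama on its default port; nothing is sent here.
    req = build_generate_request("http://localhost:11434", "llama2", "Why is the sky blue?")
    print(req.get_method(), req.full_url)
    # To actually send it (requires a running Ollama server):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["response"])
```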

By integrating these components, the GenAI Stack provides several benefits:

Faster Development: Streamlines the development process by providing pre-built modules and tools for common GenAI tasks.
Simplified Deployment: Makes it easier to deploy GenAI applications across different environments and platforms.
Improved Efficiency: Optimizes AI models for better performance and resource utilization.
Accessibility: Lowers the barrier to entry for developers new to GenAI by providing a user-friendly platform.

In essence, the Docker GenAI Stack tackles the complexity of building and deploying GenAI applications, democratizing access to this powerful technology for a wider range of developers and enabling faster innovation in the field.

Using GenAI Stack with GPU

Using a GPU with the Docker GenAI Stack offers several key advantages for working with Generative AI (GenAI) models, particularly large language models (LLMs):

1. Significantly faster training and inference

GPUs are designed for massively parallel processing, which makes them far faster than CPUs at the large matrix operations involved in training and running LLMs. This translates to:

Reduced training time: Your models will be trained in a fraction of the time compared to a CPU-only setup, speeding up development and experimentation.
Real-time responsiveness: Inference (generating output from a trained model) becomes much faster, allowing for real-time interactions with LLMs, for example in chatbots or voice assistants.
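A simple way to compare CPU and GPU inference speed on your own hardware is generation throughput in tokens per second. A non-streaming response from Ollama's /api/generate endpoint includes eval_count (tokens generated) and eval_duration (nanoseconds spent generating them). A minimal sketch of the arithmetic; the sample numbers are made up for illustration, not measurements from this setup:

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama /api/generate response.

    eval_count is the number of generated tokens and eval_duration is the
    time spent generating them, in nanoseconds.
    """
    count = response["eval_count"]
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return count / (duration_ns / 1e9)


# Hypothetical values for illustration only (not measured):
sample = {"eval_count": 120, "eval_duration": 4_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")  # 30.0 tokens/s
```

Running the same prompt once against a CPU-only setup and once with the GPU profile gives a like-for-like comparison.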

2. Increased model capacity and complexity

GPUs enable training and running larger and more complex LLMs that wouldn't be feasible on CPUs. This allows you to:

Handle larger datasets: Train on more data to create more accurate and comprehensive models.
Build models with higher parameter counts: This leads to LLMs with greater capabilities and more nuanced understanding of language.
Explore more advanced architectures: Leverage the power of GPUs to experiment with cutting-edge LLM architectures.

3. Improved resource efficiency

Running GenAI workloads on GPUs can provide better resource utilization, leading to:

Reduced energy consumption: For highly parallel workloads such as LLM training and inference, GPUs can complete the same work using less energy than CPUs, saving on power and operating costs.
Lower overall hardware cost: In some cases, using a single GPU-powered system can be more cost-effective than a larger number of CPU-based machines for GenAI tasks.

4. Simplified development and deployment

The GenAI Stack includes a linux-gpu Compose profile that runs the LLM service with GPU access, making it:

Easier to build and deploy GPU-accelerated GenAI applications: The stack handles resource allocation and configuration for you, streamlining the process.
More portable: You can easily move your GenAI workflow across different environments with GPU support without major adjustments.

If you're not sure whether a GPU is necessary for your specific use case, consider the size and complexity of your models, desired training and inference speeds, and budget constraints. For smaller models or less demanding tasks, a CPU-only setup might still be sufficient.
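A useful first check is whether the model's weights even fit in GPU memory. As a rough rule, weights take parameters × bits per weight ÷ 8 bytes; this ignores the KV cache, activations, and runtime overhead, so treat it as a lower bound. The sketch below assumes a 7-billion-parameter model quantized to 4 bits (an assumption about the model, not something stated in this post) and the 4096 MiB GPU that appears in the nvidia-smi output later in this post; the 80% headroom factor is also an arbitrary assumption.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Excludes KV cache, activations and runtime overhead, so real usage is higher.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_mib: int, headroom: float = 0.8) -> bool:
    """True if the weights fit in `headroom` (an assumed safety margin) of the GPU memory."""
    return weight_memory_gib(params_billion, bits_per_weight) * 1024 <= vram_mib * headroom


# Assumed: 7B parameters at 4 bits per weight, on a 4096 MiB GPU.
print(round(weight_memory_gib(7, 4), 2))  # 3.26
print(fits_in_vram(7, 4, 4096))  # False
```

Under these assumptions the weights alone need about 3.3 GiB, which leaves little headroom on a 4 GiB GPU; a larger GPU or a smaller model gives more room for the runtime's own memory use.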

Getting Started

Prerequisites

1. Download the required NVIDIA driver

Visit the official NVIDIA drivers page to download and install the proper drivers. Reboot your system once you have done so.

2. Install nvidia-container-runtime

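Follow the instructions at https://nvidia.github.io/nvidia-container-runtime/ to set up NVIDIA's package repository, then install the runtime. Note that this package has since been superseded by the NVIDIA Container Toolkit (nvidia-container-toolkit), which NVIDIA's current documentation recommends instead.



sudo apt-get install nvidia-container-runtime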

Ensure the nvidia-container-runtime-hook is accessible from $PATH.



which nvidia-container-runtime-hook
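If you'd rather check all the relevant binaries at once, for example in a setup script, the same $PATH lookup can be done with Python's shutil.which. A minimal sketch; the list of names is an assumption based on the packages in this step plus nvidia-smi from the driver:

```python
import shutil

# Assumed set of executables to look for; adjust to your installation.
REQUIRED_BINARIES = [
    "nvidia-smi",
    "nvidia-container-runtime",
    "nvidia-container-runtime-hook",
]


def find_binaries(names):
    """Map each executable name to its absolute path on $PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_binaries(REQUIRED_BINARIES).items():
        print(f"{name:32} {path or 'NOT FOUND'}")
```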

3. Restart the Docker daemon (on systemd-based hosts: sudo systemctl restart docker)
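Restarting matters because the daemon only reads /etc/docker/daemon.json at startup. The --gpus flag relies on the hook found on $PATH, but if you also want to select the runtime explicitly with --runtime=nvidia, it has to be registered in that file before the restart. A minimal sketch of merging an "nvidia" runtime entry into an existing config without clobbering other settings; the binary location /usr/bin/nvidia-container-runtime is an assumption, so confirm it with which first:

```python
import json

# Assumption: confirm the real location with `which nvidia-container-runtime`.
NVIDIA_RUNTIME_PATH = "/usr/bin/nvidia-container-runtime"


def add_nvidia_runtime(config: dict, runtime_path: str = NVIDIA_RUNTIME_PATH,
                       make_default: bool = False) -> dict:
    """Return a copy of a daemon.json config with an "nvidia" runtime registered.

    Existing keys and other runtimes are preserved; the input dict is not modified.
    """
    updated = dict(config)
    runtimes = dict(updated.get("runtimes", {}))
    runtimes["nvidia"] = {"path": runtime_path, "runtimeArgs": []}
    updated["runtimes"] = runtimes
    if make_default:
        updated["default-runtime"] = "nvidia"
    return updated


if __name__ == "__main__":
    existing = {"log-driver": "json-file"}
    print(json.dumps(add_nvidia_runtime(existing), indent=2))
```

Write the resulting JSON to /etc/docker/daemon.json as root, then restart the daemon so it takes effect.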

4. Expose GPUs for use

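Pass the --gpus flag when you start a container to give it access to GPU resources. The value selects the GPUs: all exposes every GPU, a number such as 2 exposes that many, and device= selects specific GPUs by index or UUID. For example: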



docker run -it --rm --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
29202e855b20: Pull complete
Digest: sha256:e6173d4dc55e76b87c4af8db8821b1feae4146dd47341e4d431118c7dd060a74
Status: Downloaded newer image for ubuntu:latest
Mon Jan 29 02:36:51 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100D-4C       On   | 00000000:06:00.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      0MiB /  4096MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
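If you launch these containers from a script rather than a shell, the --gpus value takes the forms used in this section: all, device=..., and an optional capabilities=... field. Docker parses the value as a single CSV record, which is why a device list containing commas has to be wrapped in double quotes (hence the '"device=..."' quoting in the shell examples). A sketch of building the argument list for subprocess; the helper names are hypothetical:

```python
import shlex


def gpus_value(devices=None, capabilities=None) -> str:
    """Build the value for `docker run --gpus`.

    devices=None exposes all GPUs; otherwise pass GPU indexes or UUIDs.
    """
    fields = ["all" if devices is None else "device=" + ",".join(str(d) for d in devices)]
    if capabilities:
        fields.append("capabilities=" + ",".join(capabilities))
    # The value is parsed as one CSV record, so any field that itself
    # contains commas must be wrapped in double quotes.
    return ",".join(f'"{field}"' if "," in field else field for field in fields)


def docker_run_argv(image, command, devices=None, capabilities=None) -> list:
    """argv list suitable for subprocess.run (no shell quoting needed)."""
    return ["docker", "run", "--rm", "--gpus", gpus_value(devices, capabilities), image, *command]


# Print the equivalent shell command for a two-GPU run.
print(shlex.join(docker_run_argv("ubuntu", ["nvidia-smi"], devices=[0, 1])))
```

Passing the list to subprocess.run avoids shell quoting entirely; shlex.join is only used here to show the equivalent shell command.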

To expose only a specific GPU, pass its index with device=:



docker run -it --rm --gpus '"device=0"' ubuntu nvidia-smi
Mon Jan 29 02:42:33 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100D-4C       On   | 00000000:06:00.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      0MiB /  4096MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
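The tables above are meant for people; for scripts, nvidia-smi also has a query mode that prints CSV. A sketch that runs that query and parses the result; non-numeric values such as [N/A] become None. The sample input in the usage line is transcribed from the table above rather than captured from a real run.

```python
import csv
import io
import subprocess

# nvidia-smi query mode: one CSV line per GPU, no header, no units.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def _to_int(value: str):
    """Convert a numeric field to int; values such as '[N/A]' become None."""
    return int(value) if value.isdigit() else None


def parse_gpu_query(output: str) -> list:
    """Turn the CSV printed by QUERY_CMD into one dict per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(output), skipinitialspace=True):
        if len(row) != 4:
            continue  # skip blank or unexpected lines
        index, name, used, total = (field.strip() for field in row)
        gpus.append({
            "index": int(index),
            "name": name,
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
        })
    return gpus


def query_gpus() -> list:
    """Run nvidia-smi and parse its output (requires the NVIDIA driver)."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return parse_gpu_query(result.stdout)


print(parse_gpu_query("0, GRID A100D-4C, 0, 4096\n"))
```

Inside a container, the same query can be run with docker run --rm --gpus all ubuntu followed by the QUERY_CMD arguments.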

Please note that NVIDIA GPUs can only be accessed by systems running a single engine.

Set NVIDIA capabilities

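Capabilities control which NVIDIA driver features are made available inside the container, and you can set them manually. For example, on Ubuntu the following limits the container to the utility capability, which provides nvidia-smi and NVML but not the CUDA compute libraries; this is why the output reports CUDA Version: N/A:



docker run --gpus 'all,capabilities=utility' --rm ubuntu nvidia-smi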
Mon Jan 29 02:43:47 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100D-4C       On   | 00000000:06:00.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      0MiB /  4096MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
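To check this in a script, for example as a smoke test after setup, you can pull the driver and CUDA versions out of the nvidia-smi header. A minimal sketch using a regular expression; it assumes the header layout shown in the outputs in this post, which may differ between driver versions:

```python
import re

# Matches e.g. "Driver Version: 525.125.06   CUDA Version: 12.0"
HEADER_RE = re.compile(r"Driver Version:\s*(?P<driver>\S+)\s+CUDA Version:\s*(?P<cuda>\S+)")


def parse_smi_header(text: str):
    """Return {'driver': ..., 'cuda': ...} from nvidia-smi output, or None if not found.

    'cuda' is None when nvidia-smi reports N/A (e.g. with capabilities=utility).
    """
    match = HEADER_RE.search(text)
    if match is None:
        return None
    cuda = match.group("cuda")
    return {"driver": match.group("driver"), "cuda": None if cuda == "N/A" else cuda}


print(parse_smi_header("| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: N/A      |"))
```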

Clone the repo



git clone https://github.com/docker/genai-stack
cd genai-stack

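Setting up the environment variables

Create a .env file in the root of the cloned repository. With the linux-gpu profile, OLLAMA_BASE_URL must point at the llm-gpu service, as below: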



cat .env
OPENAI_API_KEY=sk-EsNJzI5uMBCXXXXXXXX0Htnig8KIil4x
OLLAMA_BASE_URL=http://llm-gpu:11434
NEO4J_URI=neo4j://database:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=XXXXX
LLM=llama2 #or any Ollama model tag, or gpt-4 or gpt-3.5
EMBEDDING_MODEL=sentence_transformer #or openai or ollama

LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_TRACING_V2=true # or false to disable LangSmith tracing
LANGCHAIN_PROJECT=default
LANGCHAIN_API_KEY=ls__cbXXXXXXX1d6106dd
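Typos or gaps in .env are a common source of startup errors, so it can help to sanity-check the file before bringing the stack up. The sketch below is a deliberately minimal parser (KEY=VALUE lines, # comments, optional quotes; not Docker Compose's full .env syntax) plus a check for missing values. Which keys count as required is an assumption drawn from the comments in the file above: the OpenAI key only when LLM is a gpt-* model or EMBEDDING_MODEL is openai, and the LangSmith key only when tracing is enabled.

```python
def parse_env(text: str) -> dict:
    """Minimal .env parser: KEY=VALUE lines, '#' comments and optional quotes.

    A simplified sketch, not Docker Compose's full .env syntax.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if value[:1] in ("'", '"'):
            # Quoted value: keep everything up to the matching closing quote.
            end = value.find(value[0], 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop an inline " #" comment.
            value = value.split(" #", 1)[0].strip()
        env[key.strip()] = value
    return env


BASE_REQUIRED = ["OLLAMA_BASE_URL", "NEO4J_URI", "NEO4J_USERNAME",
                 "NEO4J_PASSWORD", "LLM", "EMBEDDING_MODEL"]


def missing_keys(env: dict) -> list:
    """Keys that look required for this configuration (assumed rules) but are empty or absent."""
    required = list(BASE_REQUIRED)
    if env.get("LLM", "").startswith("gpt") or env.get("EMBEDDING_MODEL") == "openai":
        required.append("OPENAI_API_KEY")
    if env.get("LANGCHAIN_TRACING_V2", "").lower() == "true":
        required.append("LANGCHAIN_API_KEY")
    return [key for key in required if not env.get(key)]
```

Usage: missing_keys(parse_env(open(".env").read())) returns an empty list when nothing obvious is missing.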

Bringing up the Docker GenAI Stack



docker compose --profile linux-gpu up

Verify that all the required services are up and running:



docker compose ps
NAME                      IMAGE                   COMMAND                                                                  SERVICE     CREATED         STATUS                     PORTS
genai-stack-api-1         genai-stack-api         "uvicorn api:app --host 0.0.0.0 --port 8504"                             api         8 minutes ago   Up 6 minutes (healthy)     0.0.0.0:8504->8504/tcp, :::8504->8504/tcp
genai-stack-bot-1         genai-stack-bot         "streamlit run bot.py --server.port=8501 --server.address=0.0.0.0"       bot         8 minutes ago   Up 6 minutes (healthy)     0.0.0.0:8501->8501/tcp, :::8501->8501/tcp
genai-stack-database-1    neo4j:5.11              "tini -g -- /startup/docker-entrypoint.sh neo4j"                         database    8 minutes ago   Up 8 minutes (healthy)     0.0.0.0:7474->7474/tcp, :::7474->7474/tcp, 7473/tcp, 0.0.0.0:7687->7687/tcp, :::7687->7687/tcp
genai-stack-front-end-1   genai-stack-front-end   "npm run dev"                                                            front-end   8 minutes ago   Up 5 minutes               0.0.0.0:8505->8505/tcp, :::8505->8505/tcp
genai-stack-llm-gpu-1     ollama/ollama:latest    "/bin/ollama serve"                                                      llm-gpu     4 weeks ago     Up 20 minutes              11434/tcp
genai-stack-loader-1      genai-stack-loader      "streamlit run loader.py --server.port=8502 --server.address=0.0.0.0"    loader      8 minutes ago   Up 6 minutes (unhealthy)   0.0.0.0:8502->8502/tcp, :::8502->8502/tcp, 0.0.0.0:8081->8080/tcp, :::8081->8080/tcp
genai-stack-pdf_bot-1     genai-stack-pdf_bot     "streamlit run pdf_bot.py --server.port=8503 --server.address=0.0.0.0"   pdf_bot     8 minutes ago   Up 6 minutes (healthy)     0.0.0.0:8503->8503/tcp, :::8503->8503/tcp
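The table above is easy to scan by eye; note that genai-stack-loader-1 reports (unhealthy). For scripting, docker compose ps can also emit JSON with --format json. A sketch that flags services that are not running or whose healthcheck is not passing; it assumes the Service, State, and Health fields of Compose v2's JSON output, and accepts both the single-array and one-object-per-line shapes that different Compose versions print:

```python
import json
import subprocess


def parse_compose_ps(output: str) -> list:
    """Parse `docker compose ps --format json` output into a list of dicts."""
    text = output.strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)  # older Compose v2: one JSON array
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def not_healthy(services: list) -> list:
    """Service names that are not running or whose healthcheck is not 'healthy'.

    Services without a healthcheck have an empty Health field and are judged on State only.
    """
    flagged = []
    for svc in services:
        health = svc.get("Health", "")
        if svc.get("State") != "running" or (health and health != "healthy"):
            flagged.append(svc.get("Service") or svc.get("Name", "?"))
    return flagged


def check_stack() -> list:
    """Run `docker compose ps` in the current directory and return the flagged services."""
    out = subprocess.run(["docker", "compose", "ps", "--format", "json"],
                         capture_output=True, text=True, check=True).stdout
    return not_healthy(parse_compose_ps(out))
```

Run check_stack() from the genai-stack directory; for the state shown above it would be expected to report the loader service.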

The docker compose up logs also show the pull-model service downloading the model set in LLM:



genai-stack-pull-model-1  | pulling 8934d96d3f08... 100% ▕▏ 3.8 GB
genai-stack-pull-model-1  | pulling 8c17c2ebb0ea... 100% ▕▏ 7.0 KB
genai-stack-pull-model-1  | pulling 7c23fb36d801... 100% ▕▏ 4.8 KB
genai-stack-pull-model-1  | pulling 2e0493f67d0c... 100% ▕▏   59 B
genai-stack-pull-model-1  | pulling fa304d675061... 100% ▕▏   91 B
genai-stack-pull-model-1  | pulling 42ba7f8a01dd... 100% ▕▏  557 B
...

Here's what's in this repo:

| Name | Main files | Compose name | URLs | Description |
|------|------------|--------------|------|-------------|
| Support Bot | `bot.py` | `bot` | http://localhost:8501 | Main use case. Full-stack Python application. |
| Stack Overflow Loader | `loader.py` | `loader` | http://localhost:8502 | Loads Stack Overflow data into the database (creates vector embeddings, etc.). Full-stack Python application. |
| PDF Reader | `pdf_bot.py` | `pdf_bot` | http://localhost:8503 | Reads a local PDF so you can ask questions about it. Full-stack Python application. |
| Standalone Bot API | `api.py` | `api` | http://localhost:8504 | Standalone HTTP API with streaming (SSE) and non-streaming endpoints. Python. |
| Standalone Bot UI | `front-end/` | `front-end` | http://localhost:8505 | Standalone client that uses the Standalone Bot API to interact with the model. JavaScript (Svelte) front-end. |
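To confirm that each UI and API from the table actually answers, you can probe the URLs, together with the Neo4j browser on port 7474. A minimal sketch using only the standard library; it treats any HTTP response, including an error status, as "up", and assumes the stack runs on the same machine (replace localhost with your host's IP otherwise):

```python
import urllib.error
import urllib.request

# Ports taken from the table above, plus the Neo4j browser.
SERVICES = {
    "bot": "http://localhost:8501",
    "loader": "http://localhost:8502",
    "pdf_bot": "http://localhost:8503",
    "api": "http://localhost:8504",
    "front-end": "http://localhost:8505",
    "database": "http://localhost:7474",
}


def probe(url: str, timeout: float = 3.0) -> bool:
    """True if the URL answers with any HTTP response (even a 4xx/5xx status)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server is up but returned an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, DNS failure, timeout, ...


def report(services: dict = SERVICES) -> None:
    """Print one up/DOWN line per service."""
    for name, url in services.items():
        print(f"{name:10} {url:25} {'up' if probe(url) else 'DOWN'}")
```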

The database can be explored at http://localhost:7474.

Verifying that the GPU is being used

Let's try the PDF bot sample app and check whether the GPU is being used.

Open http://HostIP:8503/ to access the PDF bot, upload a PDF, and ask it a question.

While the bot is answering, run nvidia-smi on the host: a new process appears in the Processes section, which indicates that the GPU is being used.
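To check this without reading the nvidia-smi table by eye, you can ask nvidia-smi for the processes currently using the GPU in CSV form. A sketch; the sample line used in the usage below is hypothetical, not captured from this setup:

```python
import subprocess

# nvidia-smi query mode for processes with an active compute context.
COMPUTE_APPS_CMD = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(output: str) -> list:
    """Parse COMPUTE_APPS_CMD output into one dict per GPU process."""
    apps = []
    for line in output.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            continue  # skip blank or non-CSV lines
        pid, name, used = fields
        apps.append({
            "pid": int(pid),
            "process_name": name,
            "used_memory_mib": int(used) if used.isdigit() else None,
        })
    return apps


def gpu_in_use() -> bool:
    """True if at least one process currently holds GPU memory (requires nvidia-smi)."""
    out = subprocess.run(COMPUTE_APPS_CMD, capture_output=True, text=True, check=True).stdout
    return bool(parse_compute_apps(out))


# Hypothetical sample line, for illustration only:
print(parse_compute_apps("12345, /usr/bin/ollama, 3500\n"))
```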

Top comments (2)

Armel BOBDA • Feb 18 '24 • Edited

Thank you for the detailed process. I tried to follow it but after running the command:
docker run -it --rm --gpus '"device=0"' ubuntu nvidia-smi
I got the following error:
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

I use docker-desktop.

Help, please

Ajeet Singh Raina • Feb 18 '24

It seems like there might be an issue with the NVIDIA container toolkit or the environment setup in your Docker container. Here are a few steps you can take to troubleshoot and resolve this issue:

Check NVIDIA Container Toolkit Installation: Make sure that the NVIDIA container toolkit is installed correctly on your system. You can follow the official documentation from NVIDIA to install the toolkit for your specific operating system: docs.nvidia.com/datacenter/cloud-n...
Verify Docker Configuration: Ensure that your Docker daemon is configured to use the NVIDIA runtime and that the necessary environment variables are set. You can check the Docker daemon configuration file (usually located at /etc/docker/daemon.json) to verify this.
Restart Docker Daemon: Sometimes, restarting the Docker daemon can resolve issues related to environment setup. You can do this by running the following command:

sudo systemctl restart docker

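Rebuild Docker Image: If you're using a custom Docker image, make sure it includes the CUDA libraries your application needs. The driver libraries themselves (such as libnvidia-ml.so.1) are mounted from the host by the NVIDIA Container Toolkit rather than baked into the image, so you may need to rebuild the image with the appropriate configuration.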
Check NVIDIA Driver Installation: Ensure that the NVIDIA drivers are installed correctly on your system. You can use the nvidia-smi command outside of Docker to verify that the drivers are functioning properly.
Permissions: Ensure that your user has the necessary permissions to access the NVIDIA GPU devices. You may need to add your user to the docker group or adjust permissions accordingly.
Update Docker and NVIDIA Drivers: Make sure that both Docker and the NVIDIA drivers are up to date. Outdated software versions can sometimes lead to compatibility issues.
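The error in the question above means that libnvidia-ml.so.1, the NVML library that ships with the NVIDIA driver, could not be loaded. As part of the "Check NVIDIA Driver Installation" step, you can test whether the dynamic loader can find it on the machine where the Docker engine runs. A minimal sketch using ctypes; a False result suggests the driver is missing or not on the loader's search path:

```python
import ctypes


def can_load_nvml(soname: str = "libnvidia-ml.so.1") -> bool:
    """True if the dynamic loader can open the given NVML shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("NVML loadable:", can_load_nvml())
```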