DEV Community

Ajeet Singh Raina

Running Llama 2 on NVIDIA Jetson Nano with GPU using Docker

Ollama is a rapidly growing developer tool for running large language models (LLMs) locally, with 10,000 Docker Hub pulls in a short period of time. In this post we will use it to run Llama 2, an open LLM from Meta trained on a massive dataset of text and code. Llama 2 can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

To run Ollama on a Jetson Nano, you will need to install the following software:

  • Docker Engine
  • NVIDIA Container Toolkit
  • The Ollama Docker image (ollama/ollama)

Installing Docker

To install Docker on a Jetson Nano, follow these steps:

Update the package list:

sudo apt update

Install Docker:

curl -sSL https://get.docker.com/ | sudo sh

Add your user to the Docker group:

sudo groupadd docker
sudo usermod -aG docker $USER

Log out and back in for the changes to take effect.
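Before moving on, it is worth confirming the installation took. A quick sanity check might look like this (a sketch; the exact version string will vary, and the hello-world smoke test needs network access):

```shell
# Sanity check (sketch): confirm Docker is installed and the daemon responds.
if command -v docker >/dev/null 2>&1; then
    echo "docker: $(docker --version)"
    # hello-world is Docker's own smoke-test image
    docker run --rm hello-world >/dev/null 2>&1 && echo "daemon: OK"
else
    echo "docker: not found - rerun the install script above"
fi
```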

Installing the NVIDIA Container Toolkit

To let Docker containers use the Jetson's GPU, install the NVIDIA Container Toolkit with apt.

Configure the repository

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
    | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
    | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
    | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update

Install the NVIDIA Container Toolkit packages

sudo apt-get install -y nvidia-container-toolkit
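To verify the package landed, you can query dpkg (a sketch; assumes the Debian-based JetPack userland):

```shell
# Check whether the toolkit package is registered with dpkg (sketch).
if dpkg -s nvidia-container-toolkit >/dev/null 2>&1; then
    echo "nvidia-container-toolkit: installed"
else
    echo "nvidia-container-toolkit: not installed yet"
fi
```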

Configure Docker to use Nvidia driver

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
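The `nvidia-ctk` command above writes the runtime entry into Docker's daemon configuration. You can confirm it was registered before starting any containers (a sketch; assumes the default config path):

```shell
# Confirm the NVIDIA runtime entry was written to the daemon config (sketch).
CONF=/etc/docker/daemon.json
if [ -f "$CONF" ] && grep -q '"nvidia"' "$CONF"; then
    echo "nvidia runtime: configured in $CONF"
else
    echo "nvidia runtime: not found in $CONF"
fi
```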

Start the container

sudo docker run -d --gpus=all --runtime=nvidia -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
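Once the container is up, the Ollama server listens on the published port 11434, and a plain GET to the root path returns the status string "Ollama is running". A quick health check (sketch):

```shell
# Health check (sketch): the container publishes port 11434 on the host.
# When the server is up, a GET to / returns "Ollama is running".
if curl -s --max-time 5 http://localhost:11434/; then
    echo
else
    echo "API not reachable - check 'docker ps' and 'docker logs ollama'"
fi
```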

Run a model locally

Now you can run a model:

sudo docker exec -it ollama ollama run llama2
pulling manifest
pulling 8daa9615cce3... 100% |████████████████████████████| (3.8/3.8 GB, 2.4 MB/s)
pulling 8c17c2ebb0ea... 100% |████████████████████████████████| (7.0/7.0 kB, 3.7 kB/s)
pulling 7c23fb36d801... 100% |████████████████████████████████| (4.8/4.8 kB, 2.0 kB/s)
pulling bec56154823a... 100% |█████████████████████████████████████| (59/59 B, 18 B/s)
pulling e35ab70a78c7... 100% |█████████████████████████████████████| (90/90 B, 32 B/s)
pulling 09fe89200c09... 100% |██████████████████████████████████| (529/529 B, 180 B/s)
verifying sha256 digest
writing manifest
removing any unused layers
success

The command sudo docker exec -it ollama ollama run llama2 starts the Llama 2 model inside the ollama container and drops you into an interactive prompt, so you can chat with the model directly from the command line.

To use the Llama 2 model, you can send it text prompts and it will generate text in response. For example, to generate a poem about a cat, pass the prompt as an argument to ollama run:

docker exec -it ollama ollama run llama2 "Write a poem about a cat."

This will generate a poem about a cat and print it to the console.

You can also use the Llama 2 model to translate languages, write different kinds of creative content, and answer your questions in an informative way.

Experiment with different prompts to test the capabilities of the Llama 2 model.

Here are some examples of prompts you can use with the Llama 2 model:

  • Translate the sentence "Hello, world!" into Spanish.
  • Write a short story about a robot who falls in love with a human.
  • Generate a list of ideas for new products.
  • Answer the question "What is the meaning of life?"
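Beyond the interactive CLI, the same prompts can be sent to the Ollama server's HTTP API on port 11434. A minimal sketch that builds the JSON body for the /api/generate endpoint (the commented-out curl call assumes the container started earlier is still running):

```shell
# Build a JSON body for Ollama's /api/generate endpoint (sketch).
# "stream": false asks for one complete JSON response instead of a token stream.
PROMPT="Write a poem about a cat."
PAYLOAD=$(printf '{"model": "llama2", "prompt": "%s", "stream": false}' "$PROMPT")
echo "$PAYLOAD"

# With the container running, send it like this:
# curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
```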

Ollama and the Llama 2 model are both still evolving quickly, but together they already have the potential to be a powerful local toolkit for a variety of tasks.

Top comments (2)

andyD

Yeah so this doesn't even use the GPU on the Jetson. If it was a little more than just a rehash of hub.docker.com/r/ollama/ollama page you'd quickly see that it's just using the CPUs because of the way the Jetson works....

andyD

Great info, thanks. When I run this on my Jetson however I see 100% CPU usage and 0% GPU usage... what am I missing to run the model on the Ampere GPU on this board?