
Ajeet Singh Raina

Running Llama 2 on the NVIDIA Jetson Nano GPU with Ollama and Docker

Ollama is a rapidly growing development tool, with 10,000 Docker Hub pulls in a short period of time. It lets you run large language models (LLMs) locally, and in this post we will use it to run Llama 2, Meta's open-weight LLM. Llama 2 can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

To run Ollama on a Jetson Nano, you will need to install the following software:

  • Docker Engine
  • Ollama Docker image

Installing Docker

To install Docker on a Jetson Nano, follow these steps:

Update the package list:

sudo apt update

Install Docker:

sudo curl -sSL https://get.docker.com/ | sh

Add your user to the Docker group:

sudo groupadd docker
sudo usermod -aG docker $USER

Log out and back in for the changes to take effect.

Installing the NVIDIA Container Toolkit

Configure the repository:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
    | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
    | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
    | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update

Install the NVIDIA Container Toolkit packages:

sudo apt-get install -y nvidia-container-toolkit

Configure Docker to use the NVIDIA runtime:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
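The `nvidia-ctk runtime configure` step registers an `nvidia` runtime entry in `/etc/docker/daemon.json`. A minimal sketch to confirm the entry is present before restarting Docker (the sample config in the comment is an assumption of the typical output, not captured from a Jetson):

```python
import json

def has_nvidia_runtime(daemon_json_text: str) -> bool:
    # daemon.json maps runtime names to their binary paths under "runtimes",
    # e.g. {"runtimes": {"nvidia": {"path": "nvidia-container-runtime"}}}
    config = json.loads(daemon_json_text)
    return "nvidia" in config.get("runtimes", {})

# On the Jetson itself you would read the real file:
# with open("/etc/docker/daemon.json") as f:
#     print(has_nvidia_runtime(f.read()))
```

If this returns False after running `nvidia-ctk`, Docker will silently fall back to the default runtime and the container will not see the GPU.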

Start the Ollama container:

sudo docker run -d --gpus=all --runtime=nvidia -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Run the model locally

Now you can run a model:

sudo docker exec -it ollama ollama run llama2
pulling manifest
pulling 8daa9615cce3... 100% |████████████████████████████| (3.8/3.8 GB, 2.4 MB/s)
pulling 8c17c2ebb0ea... 100% |████████████████████████████████| (7.0/7.0 kB, 3.7 kB/s)
pulling 7c23fb36d801... 100% |████████████████████████████████| (4.8/4.8 kB, 2.0 kB/s)
pulling bec56154823a... 100% |█████████████████████████████████████| (59/59 B, 18 B/s)
pulling e35ab70a78c7... 100% |█████████████████████████████████████| (90/90 B, 32 B/s)
pulling 09fe89200c09... 100% |██████████████████████████████████| (529/529 B, 180 B/s)
verifying sha256 digest
writing manifest
removing any unused layers
success

The command sudo docker exec -it ollama ollama run llama2 starts the Llama 2 model inside the ollama container and drops you into an interactive prompt, so you can chat with the model directly from the command line.

To use Llama 2, you send it text prompts and it generates text in response. For example, to generate a poem about a cat, pass the prompt as a positional argument:

docker exec -it ollama ollama run llama2 "Write a poem about a cat."

This will generate a poem about a cat and print it to the console.
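Besides the interactive CLI, the container also exposes Ollama's REST API on the port we published (11434). A minimal Python sketch, assuming the container from above is running and reachable on localhost:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> bytes:
    # stream=False asks the server to return one complete JSON object
    # instead of a stream of partial responses.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(prompt: str, model: str = "llama2") -> str:
    req = request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # The non-streaming reply carries the generated text in "response"
        return json.loads(resp.read())["response"]
```

With the container up, `generate("Write a poem about a cat.")` should return the poem as a plain string, which makes it easy to script prompts instead of typing them interactively.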

You can also use Llama 2 to translate languages, write different kinds of creative content, and answer your questions in an informative way.

Experiment with different prompts to test the capabilities of the model.

Here are some examples of prompts you can use:

  • Translate the sentence "Hello, world!" into Spanish.
  • Write a short story about a robot who falls in love with a human.
  • Generate a list of ideas for new products.
  • Answer the question "What is the meaning of life?"

Llama 2 is still under active development, but it has the potential to be a powerful tool for a wide variety of tasks.



