DEV Community

Ravi Rai

Originally published at Medium

Complete Guide: Setting Up Ollama on Intel GPU with Intel Graphics Package Manager

I remember using ChatGPT for the first time to write a reply when I received appreciation from the leadership team at my previous company. Nowadays AI is part of day-to-day life and has made mine easier. I started wondering: what if I could run an LLM locally on my laptop? I installed the Ollama desktop app for Windows. With just 16 GB of RAM, my laptop handled small models fine for basic email-writing tasks, but running a 1B-parameter model alongside my regular apps (Teams, Chrome, etc.) frequently made it unresponsive. On another laptop with a dedicated graphics card, I could run models up to 8B parameters smoothly.

I thought: why not use the integrated Intel GPU for the GPU-heavy work on my laptop? I started exploring and found Intel's ipex-llm project on GitHub. It provides a portable zip that you can extract and use to run Ollama locally on an Intel GPU. I did this setup on Ubuntu 24.04 running on Windows WSL. Here is the step-by-step process:

  1. Update the GPU driver on the machine

Follow the steps below to install the Intel graphics packages:

A. Refresh the package index and install software-properties-common (which provides add-apt-repository)

sudo apt-get update
sudo apt-get install -y software-properties-common

B. Add the intel-graphics Personal Package Archive (PPA)

sudo add-apt-repository -y ppa:kobuk-team/intel-graphics

C. Install compute-related packages

sudo apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc

D. Install media-related packages

sudo apt-get install -y intel-media-va-driver-non-free libmfx-gen1 libvpl2 libvpl-tools libva-glx2 va-driver-all vainfo

E. Verify the installation

clinfo | grep "Device Name"

Result of running `clinfo | grep "Device Name"`:

 Device Name Intel(R) Graphics [0xa721]
 Device Name Intel(R) Graphics [0xa721]
 Device Name Intel(R) Graphics [0xa721]
 Device Name Intel(R) Graphics [0xa721]

If you do not see output like the above, your user may be missing access to the GPU render node; run the commands below to add it to the render group:

sudo gpasswd -a ${USER} render
newgrp render
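A quick membership check confirms the change took effect; here is a minimal sketch (the `render` group is the standard Ubuntu group for GPU render-node access):

```shell
# Verify the current session is in the render group; print the fix otherwise.
if id -nG | grep -qw render; then
  echo "render group: ok"
else
  echo "render group: missing - run: sudo gpasswd -a \$USER render"
fi
```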

With the above steps, we have installed the Intel graphics packages in Ubuntu running on WSL. Next, set up the portable Ollama build:

  1. Download the Ollama portable zip from the ipex-llm releases page (see References).

  2. Extract the file

tar -xvf [Downloaded tgz file path]
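For example, assuming the downloaded archive is named `ollama-ipex-llm-ubuntu.tgz` (a hypothetical name; substitute the actual filename of the release you downloaded):

```shell
# Hypothetical archive name - substitute the .tgz you actually downloaded.
tar -xvf ollama-ipex-llm-ubuntu.tgz
# The extracted folder contains start-ollama.sh and the ollama binary.
```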

  2. Go to the extracted folder and run start-ollama.sh
cd PATH/TO/EXTRACTED/FOLDER
./start-ollama.sh

Screenshot of Ollama running after the start command.
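Before moving on, you can confirm from another terminal that the server is actually up; Ollama listens on localhost:11434 by default (assuming the portable build keeps that default):

```shell
# A JSON reply such as {"version":"..."} means the server is running.
curl -s http://localhost:11434/api/version
```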


  3. Open another terminal and run your model
cd PATH/TO/EXTRACTED/FOLDER
./ollama run llama3.2:1b

Sample run of Ollama from the Ubuntu terminal.
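Besides the interactive prompt, the same server can be called over Ollama's REST API, which is how tools like LibreChat connect to it; a minimal sketch (assuming llama3.2:1b has been pulled as above):

```shell
# One-shot, non-streaming generation request against the local server.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Write a one-line thank-you reply.",
  "stream": false
}'
```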


  4. You can verify the GPU usage from Task Manager.

Screenshot of GPU usage from Task Manager.

Conclusion

I was able to run small models like qwen3:1.7b, qwen3:0.6b, llama3.2:1b, and gemma3:1b smoothly. The deepseek-r1:1.5b model gave garbage responses, and I managed to run gemma3:4b only once; after that it kept failing. That is about what I can expect from a machine with 16 GB of RAM and an i5 processor. It was a good learning experience, and I connected the locally running Ollama to LibreChat and played with it.

References:

  1. https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md
  2. https://github.com/ipex-llm/ipex-llm/releases/tag/v2.3.0-nightly
  3. https://dgpu-docs.intel.com/driver/client/overview.html
