I remember using ChatGPT for the first time to write a reply when I received appreciation from the leadership team for my work at my previous company. Nowadays AI is part of day-to-day life and has made my life easier. I started wondering whether I could run an LLM locally on my laptop, so I installed the Ollama desktop app for Windows. My laptop, with just 16 GB of RAM, handled small models fine for basic email-writing tasks. But with a 1B-parameter model running alongside my regular apps like Teams and Chrome, the laptop would frequently become unresponsive. On my other laptop with a dedicated graphics card, I was able to run models up to 8B parameters smoothly.
I thought: why can't we use the Intel GPU to handle the GPU-heavy work on this laptop? I started exploring and found a reference to Intel's ipex-llm project on GitHub. It provides a portable package that you can extract and use to run Ollama locally on an Intel GPU. I did this setup on Ubuntu 24.04 running in Windows WSL. Here is the step-by-step process:
- Update the GPU driver on the machine
Follow the steps below to install the required packages from Intel.
A. Refresh the package index and install software-properties-common (which provides add-apt-repository)
sudo apt-get update
sudo apt-get install -y software-properties-common
B. Add the intel-graphics Personal Package Archive (PPA)
sudo add-apt-repository -y ppa:kobuk-team/intel-graphics
C. Install compute-related packages
sudo apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc
D. Install media-related packages
sudo apt-get install -y intel-media-va-driver-non-free libmfx-gen1 libvpl2 libvpl-tools libva-glx2 va-driver-all vainfo
E. Verify the installation
clinfo | grep "Device Name"
If the output does not list your Intel GPU device, there could be a permissions issue with the user you are using; run the commands below to add your user to the render group.
sudo gpasswd -a ${USER} render
newgrp render
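To double-check the fix, confirm your user is now in the render group and re-run the clinfo check:
id -nG ${USER} | grep render
clinfo | grep "Device Name"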
With the steps above, we have installed the Intel graphics packages in Ubuntu running inside WSL.
- Download the Ollama portable tgz file from this link.
- Extract the file
tar -xvf [Downloaded tgz file path]
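For example, if the downloaded archive were named ollama-ipex-llm-ubuntu.tgz (the actual file name depends on the release you downloaded), the command would be:
tar -xvf ollama-ipex-llm-ubuntu.tgz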
- Go to the extracted folder and run start-ollama.sh
cd PATH/TO/EXTRACTED/FOLDER
./start-ollama.sh
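If you want to reach the server from outside WSL (for example from an app running on Windows), you can try setting the standard Ollama environment variable before starting it; I am assuming here that the portable build honors OLLAMA_HOST like a regular Ollama install:
export OLLAMA_HOST=0.0.0.0:11434
./start-ollama.sh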
- Open another terminal and run your model
cd PATH/TO/EXTRACTED/FOLDER
./ollama run llama3.2:1b
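Since it is a regular Ollama server underneath, you can also talk to it over the HTTP API from another terminal; the examples below assume the default port 11434:
curl http://localhost:11434/api/tags
curl http://localhost:11434/api/generate -d '{"model": "llama3.2:1b", "prompt": "Write a one-line greeting.", "stream": false}'
The first call lists the models the server knows about, and the second sends a quick non-streaming prompt to the model we just ran.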
- You can verify the GPU usage from the Windows Task Manager.
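If you prefer checking from inside WSL instead of Task Manager, you can optionally install intel-gpu-tools and watch the GPU load while a prompt is running; note that this package is not part of the steps above and may or may not report correctly under WSL:
sudo apt-get install -y intel-gpu-tools
sudo intel_gpu_top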
Conclusion
I was able to run small models like qwen3:1.7b, qwen3:0.6b, llama3.2:1b, and gemma3:1b smoothly. Running deepseek-r1:1.5b gave garbage responses, and I managed to run gemma3:4b only once; after that it kept failing. That is about what you can expect from a machine with 16 GB of RAM and an i5 processor. It was a good learning experience, and I connected the locally running Ollama to LibreChat and played with it.
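For the LibreChat side, the locally running Ollama server can be added as a custom endpoint. As a quick sanity check that the server answers OpenAI-style chat requests (which is what LibreChat expects), something like the following should work, assuming the portable build exposes Ollama's usual OpenAI-compatible API:
curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "llama3.2:1b", "messages": [{"role": "user", "content": "Hello"}]}'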