<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ravi Rai</title>
    <description>The latest articles on DEV Community by Ravi Rai (@rrai).</description>
    <link>https://dev.to/rrai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F172214%2F0b763906-1f37-4315-93ae-f49d9fbd9b53.jpeg</url>
      <title>DEV Community: Ravi Rai</title>
      <link>https://dev.to/rrai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rrai"/>
    <language>en</language>
    <item>
      <title>Complete Guide: Setting Up Ollama on Intel GPU with Intel Graphics Package Manager</title>
      <dc:creator>Ravi Rai</dc:creator>
      <pubDate>Fri, 05 Sep 2025 17:44:20 +0000</pubDate>
      <link>https://dev.to/rrai/complete-guide-setting-up-ollama-on-intel-gpu-with-intel-graphics-package-manager-151b</link>
      <guid>https://dev.to/rrai/complete-guide-setting-up-ollama-on-intel-gpu-with-intel-graphics-package-manager-151b</guid>
      <description>&lt;p&gt;I remember using ChatGPT for the first time to write a reply when i received appreciation from leadership team for my work in my previous company. Nowadays, it is part of day to day life, AI has made my life easier. I was wondering what if we can run LLM locally on my laptop. I installed Ollama desktop for windows on my Laptop. My laptop with just 16 GB RAM was working fine with small models with basic email writing task. Using a model with 1b parameters and my regular apps like teams, chrome etc, my laptop was frequently become unresponsive. On my another Laptop with dedicated graphics card, I was able to run models upto 8b parameters smoothly.&lt;/p&gt;

&lt;p&gt;I thought: why can’t we use the Intel GPU to handle the GPU-heavy work on my laptop? I started exploring and found a reference to the &lt;a href="https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md" rel="noopener noreferrer"&gt;Intel IPEX-LLM&lt;/a&gt; project on GitHub. It provides a zip file that you can extract to run Ollama locally using the Intel GPU. I did this setup on Ubuntu 24.04 running on Windows WSL. Here is the step-by-step process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Update the GPU driver on your machine&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Follow the steps below to install the required packages from Intel.&lt;br&gt;&lt;br&gt;
A. Refresh the package index and install software-properties-common (it provides the add-apt-repository command)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get update
sudo apt-get install -y software-properties-common
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;B. Add the intel-graphics Personal Package Archive (PPA)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo add-apt-repository -y ppa:kobuk-team/intel-graphics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;C. Install the compute-related packages&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;D. Install the media-related packages&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get install -y intel-media-va-driver-non-free libmfx-gen1 libvpl2 libvpl-tools libva-glx2 va-driver-all vainfo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;E. Verify the installation&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;clinfo | grep "Device Name"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AEdC9FM9mOBbjMNLwzdFkrA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AEdC9FM9mOBbjMNLwzdFkrA.png" alt="result of running command clinfo | grep “Device Name”&amp;lt;br&amp;gt;
 Device Name Intel(R) Graphics [0xa721]&amp;lt;br&amp;gt;
 Device Name Intel(R) Graphics [0xa721]&amp;lt;br&amp;gt;
 Device Name Intel(R) Graphics [0xa721]&amp;lt;br&amp;gt;
 Device Name Intel(R) Graphics [0xa721]" width="800" height="85"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you do not see output like the above, your user may lack permission to access the GPU device. Run the commands below to add your user to the render group.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo gpasswd -a ${USER} render
newgrp render
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
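
&lt;p&gt;To confirm the group change took effect (a quick sanity check on top of the guide), you can list your groups and re-run the clinfo check from step E:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# your user should now appear in the render group
id -nG | grep -w render
# the Device Name lines should show up again
clinfo | grep "Device Name"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;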



&lt;p&gt;With the above steps, we have installed the Intel graphics packages in Ubuntu running on WSL.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;p&gt;Download the file from this &lt;a href="https://github.com/ipex-llm/ipex-llm/releases/tag/v2.3.0-nightly" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extract the file&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tar -xvf [Downloaded tgz file path]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
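
&lt;p&gt;For example, assuming the archive landed in your Downloads folder under a hypothetical name (the actual file name depends on the release you download):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# hypothetical file name - replace it with the tgz you actually downloaded
tar -xvf ~/Downloads/ollama-ipex-llm-ubuntu.tgz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;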



&lt;ol start="4"&gt;
&lt;li&gt;Go to the extracted folder and run start-ollama.sh
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd PATH/TO/EXTRACTED/FOLDER
./start-ollama.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69ubb8nigwsbl4g8dzh7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69ubb8nigwsbl4g8dzh7.png" alt="screenshot of ollama running after running command" width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;
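
&lt;p&gt;Leave this terminal open while the server runs. To confirm the server is reachable, you can query it from another terminal (Ollama listens on port 11434 by default):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# should print "Ollama is running"
curl http://localhost:11434
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;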

&lt;ol start="5"&gt;
&lt;li&gt;Open another terminal and run your model
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd PATH/TO/EXTRACTED/FOLDER
./ollama run llama3.2:1b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88aisuznd3z5awu68dt4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88aisuznd3z5awu68dt4.png" alt="sample run for running ollama from ubuntu terminal" width="800" height="207"&gt;&lt;/a&gt;&lt;/p&gt;
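
&lt;p&gt;Besides the interactive prompt, you can also script requests against the standard Ollama REST API, which should work the same with this portable build (a sketch, assuming the default port 11434):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# one-shot, non-streaming generation request
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Write a one-line summary of WSL.",
  "stream": false
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;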

&lt;ol start="6"&gt;
&lt;li&gt;You can verify the GPU usage from Windows Task Manager.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jy0ms08qddbdp1zua85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jy0ms08qddbdp1zua85.png" alt="Screenshot of GPU usage from task manager." width="751" height="529"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I was able to run small models like qwen3:1.7b, qwen3:0.6b, llama3.2:1b, and gemma3:1b smoothly. Running the DeepSeek model deepseek-r1:1.5b gave garbage responses, and I managed to run gemma3:4b only once; after that it kept failing. But what more can I expect from a machine with 16 GB of RAM and an i5 processor? It was a good learning experience, and I connected the locally running Ollama with LibreChat and played with it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md" rel="noopener noreferrer"&gt;https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ipex-llm/ipex-llm/releases/tag/v2.3.0-nightly" rel="noopener noreferrer"&gt;https://github.com/ipex-llm/ipex-llm/releases/tag/v2.3.0-nightly&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dgpu-docs.intel.com/driver/client/overview.html" rel="noopener noreferrer"&gt;https://dgpu-docs.intel.com/driver/client/overview.html&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>intelgpu</category>
      <category>ipexllm</category>
      <category>generativeai</category>
      <category>localai</category>
    </item>
  </channel>
</rss>
