I installed a local LLM on my laptop for the first time yesterday. I thought I'd document the experience for those who would like to try it.
Here's the info on my OS and PC:
OS: Debian 12 Linux (CrunchBang++)
Machine: HP laptop
CPU: 11th Gen Intel Core i7-1165G7 (4 cores, 8 threads), up to 4.7 GHz, with Iris Xe integrated graphics.
RAM: 12 GiB (reported as 11 GiB usable) - sufficient for 4B–7B quantized LLMs.
GPU: Intel Iris Xe (integrated, no dedicated GPU), so Ollama runs purely on CPU - still viable with quantized models.
Download and install couldn't be easier:
curl -fsSL https://ollama.com/install.sh | sh
I chose one of the lightest models, phi3:mini-4k, since without an NVIDIA GPU my CPU would have to run the model unassisted. Once the install finishes, a single command pulls the model and starts an interactive session:

ollama run phi3:mini-4k
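Before settling into the interactive session, a couple of quick checks confirm everything landed properly. These are standard Ollama commands; the reply on the second one is what the background server returned on my install:

# Confirm the binary is installed and report its version
ollama --version

# The install script starts a background server on port 11434 - it should reply "Ollama is running"
curl http://localhost:11434

# You can also pass a prompt directly for a one-shot, non-interactive answer
ollama run phi3:mini-4k "Explain quantization in two sentences."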
That's it - the model began to run in my terminal. I even turned off my wireless, so that it hit home that I was really running this AI on my own machine. I asked it for a 300-word article on the Great Oxygenation Event to keep it busy while I opened a second terminal and launched htop. I knew the CPU would be working hard, but was surprised to see a value of 392% - htop counts each hardware thread up to 100%, so this 8-thread chip tops out at 800%, and 392% means roughly four threads running flat out, i.e. all 4 of my physical cores at nearly 100%. The process was using about 4-5 GB of RAM during active inference.
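If you want to watch the load without hunting through the full htop process list, you can pin the view to just the Ollama processes. pgrep and top are standard tools here; "ollama" is simply the process name that showed up on my machine:

# Live CPU and memory view limited to the ollama processes
top -p $(pgrep -d',' ollama)

# Or a one-shot snapshot - %CPU above 100 means more than one core's worth of work
ps -C ollama -o pid,%cpu,%mem,rss,cmd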
Ctrl+C interrupts the current response while leaving the session running. Ctrl+D exits the session entirely.
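The interactive session also has its own slash commands. These are the ones I found useful - /? prints the full list in your version, which is the authoritative reference:

/?           # list available commands
/show info   # print details about the loaded model
/bye         # exit the session (same effect as Ctrl+D)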
The model weights remain on disk and persist across reboots. As soon as the process is terminated the RAM is released, so the model can sit on my PC until I want to experiment with it again.
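Housekeeping is simple too. These are standard Ollama commands, and the systemd service is what the Linux install script sets up (your setup may differ if you installed another way):

# See which models are stored locally and how much disk they use
ollama list

# Delete a model's weights when you're done with it
ollama rm phi3:mini-4k

# Stop the background server if you want every last resource back
sudo systemctl stop ollama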
That's it - an easy download and install, and the model runs well enough to experiment with, though without a GPU to handle the parallel processing, even this small model is too heavy for my CPU to be practical for regular use.
I was surprised at how easy it was to get up and running, so I thought I'd document the process and post it here for dev members. Since I won't be buying a new laptop with an NVIDIA GPU anytime soon, I'll be back to the online LLMs. But it was fun to see an AI model run on my PC for the first time.
I purposely run my i7 laptop without a heavy desktop environment, always using a Debian minimal install with the Openbox window manager and the tint2 panel (i.e. CrunchBang++ and BunsenLabs). This lets me devote as much of my CPU as possible to whatever processes I'm running. Still, even this tiny model maxed it out and clearly revealed what kind of compute these LLMs require and how necessary a GPU and its parallel-processing ability really is for this purpose. It also helped drive home exactly why NVIDIA is currently valued at around five trillion dollars.
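One experiment worth trying on a CPU-only box: the inference thread count can be changed from inside the session, so you can see in htop how much it actually matters. The /set parameter syntax below is what recent Ollama builds accept; I'm assuming the default matches the physical core count, as it appeared to on my machine:

ollama run phi3:mini-4k
>>> /set parameter num_thread 2
>>> write one sentence about basalt

If the parameter takes effect, htop should hover around 200% while generating instead of the ~400% at the default - a direct view of how the work scales with threads.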
Try this out on your PC if it has the specs - paste them into Qwen, or your favorite LLM, and it will tell you whether running a small model on that machine is viable. If so, give it a try - it's a great experience.
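A quick way to gather those specs to paste into the chat (all standard Linux tools, nothing Ollama-specific):

# CPU model, core and thread count
lscpu | grep -E 'Model name|^CPU\(s\)|Thread'

# Total and available memory
free -h

# Dedicated GPU vs. integrated graphics
lspci | grep -iE 'vga|3d'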
Ben Santora - January 2026
