
Ollama: How to Easily Run LLMs Locally on Your Computer

Richa Parekh on June 26, 2025

I just found an interesting open-source tool called Ollama. It's a command-line application that lets you run Large Language Models (LLMs) on your ...

Solve Computer Science

Try the qwen2.5-coder model family. Yes, 4GB is insufficient to run anything useful. I'm trying qwen2.5-coder:14b-instruct-q2_K (so low quantization and a higher parameter count) and it's not bad at all. The speed and quality are decent, all things considered. You'll need about 20GB of RAM, however. Be aware that I got Chinese-only replies when running the 1.5B models of that family.
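For anyone following along, pulling and chatting with that exact tag looks roughly like this (the tag is the one named above; swap in a smaller one if ~20GB of RAM isn't available):

```bash
# Download the quantized 14B tag mentioned above (expect roughly a 20GB RAM footprint)
ollama pull qwen2.5-coder:14b-instruct-q2_K

# Start an interactive chat session with it
ollama run qwen2.5-coder:14b-instruct-q2_K
```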

Richa Parekh

Thanks for the tip! I'll definitely check out qwen2.5-coder.

Solve Computer Science

ollama.com/library/qwen2.5-coder/tags

You'll have to experiment with the smallest models at different quantization levels and avoid swapping to disk during inference.
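For example, a quick way to experiment is to pull one tag at a time and watch the memory footprint; if the loaded size is bigger than your free RAM, the machine will start swapping (the &lt;tag&gt; placeholder below stands for one of the tags listed on that page):

```bash
# See which models are already downloaded and how large they are
ollama list

# Pull a specific quantized tag from the page linked above (replace <tag> with a listed tag)
ollama pull qwen2.5-coder:<tag>

# While a model is loaded, check its memory use and the CPU/GPU split
ollama ps
```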

Richa Parekh

Thanks for the link! I’ll explore the smallest models and test their performance. Thank you for the suggestion!

Dotallio

Totally get the RAM struggle with local LLMs; I had a similar bottleneck running anything larger than a 3B model too.

Have you found any tricks to make chat-style workflows smoother in the CLI, or do you just keep it basic?

Richa Parekh

Since I'm still learning the concepts and getting an understanding of how everything works, I'm sticking to the basics for now.

Alexander Ertli

Hey,

Welcome to the genAI tech space.
There is nothing wrong with using smaller models; I resort to them all the time.

If you are interested, you could try a much smaller model like smollm2:135m or qwen:0.5b; they should be much more responsive on your hardware.

Also, Ollama typically tries to run models on the GPU, at least partially, if you have a compatible one.
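For example (ollama run downloads the model automatically the first time, and Ollama will offload layers to a compatible GPU on its own):

```bash
# A tiny model that should stay responsive even on 4GB of RAM
ollama run smollm2:135m

# A slightly larger alternative
ollama run qwen:0.5b
```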

I hope this helps.

Richa Parekh

Yes, I will check out the smaller models. Thanks for the useful advice.

Arindam Majumder

Ollama is great. You can also use Docker Model Runner for this.
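A rough sketch of the Docker Model Runner flow, assuming Docker Desktop with Model Runner enabled (ai/smollm2 is just an example model name from Docker Hub's ai namespace; exact commands can vary by Docker version):

```bash
# Download a model through Docker Model Runner
docker model pull ai/smollm2

# Chat with it interactively
docker model run ai/smollm2
```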

Richa Parekh

Yeah, Ollama is a valuable tool. Thanks for sharing.

Praveen Rajamani

Thanks for being clear about the hardware limits. Many people try to run local LLMs, thinking it will just work, then get frustrated when it is slow or crashes. Posts like this help save a lot of time and confusion.

Richa Parekh

Appreciate that! I'm glad the post was helpful.

Nathan Tarbert

This is extremely impressive; love how you documented the process and called out the RAM struggle directly. Makes me wanna try it on my old laptop now.

Richa Parekh

Thank you for the appreciation. Ollama is definitely worth a try.