
Richa Parekh

Ollama: How to Easily Run LLMs Locally on Your Computer

I recently came across an interesting open-source tool called Ollama. It's a command-line application that lets you run Large Language Models (LLMs) directly on your computer. I wanted to know more, so I tried it out over the weekend. Here is what I learned from using Ollama.

๐Ÿ” What Is Ollama?

Ollama is a lightweight yet powerful tool that lets you run LLMs like LLaMA, Mistral, DeepSeek, Starling, and others directly on your own computer. It runs in the background and exposes both:

  • A Command-Line Interface (CLI) for quick management and interactions
  • An API that you can use in your own programs

The Advantage?

No cloud dependency. No API keys. Just LLMs running on your own machine.
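To show what that looks like in practice, here is a rough sketch of calling the local REST API, assuming Ollama is already installed and running (covered in the next section), a small model such as llama3.2:1b has been pulled, and the Python requests package is installed. By default the server listens on port 11434.

```python
# Minimal sketch: send a prompt to the local Ollama REST API, no API key needed.
# Assumes Ollama is running and llama3.2:1b has been pulled (use any model you have).
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:1b",   # any locally pulled model tag
        "prompt": "Hello! What can you do?",
        "stream": False,          # return a single JSON object instead of a stream
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])  # the model's reply text
```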

💻 Installing Ollama

It was surprisingly easy to get started: I just downloaded the Windows installer from the Ollama website and ran it.

Once installed, Ollama starts in the background, and we can run models using the CLI.
💡 Installers for Linux and macOS can be found on the same download page.
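If you want to double-check that the background service is actually up before moving on, a quick sketch like this should do it (assuming the default port 11434 and the requests package):

```python
# Rough check that the Ollama background service is reachable on its default port.
import requests

try:
    r = requests.get("http://localhost:11434/", timeout=5)
    print(r.status_code, r.text)  # typically prints: 200 Ollama is running
except requests.ConnectionError:
    print("Ollama doesn't seem to be running on localhost:11434")
```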

๐Ÿ› ๏ธ Exploring the CLI

As soon as the installation finished, I opened the Windows Command Prompt (CMD) and began to explore. Here is a summary of what I tried:

1. ollama

  • Running this by itself prints a useful usage guide listing all the available commands and flags.

2. ollama list

  • This displays every model that is currently installed. If nothing appears, it simply means no models have been installed yet.

3. ollama run llama3.2:1b

  • I used the llama3.2:1b model.

Why did I go with this model over the others? I'll explain later in this post, so read till the end.

  • Ollama started a chat session directly in the terminal after automatically pulling the model, which took a few seconds.
  • I started the conversation with a simple "hello" message. In response to my greeting, the model said:

Hello. Is there something I can help you with, or would you like to chat?


  • Then I continued with a few more exchanges, and the model's responses were accurate and well-structured.

4. /bye

  • Simply type /bye to end a chat session.

5. ollama rm llama3.2:1b

  • This command deletes the model from the system immediately, freeing up disk space.

These are the first commands I tried with Ollama. They are only the basics; there is a lot more you can do. See the Ollama GitHub repository for further information.
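If you would rather script these steps than type them into CMD, the same things can be done through the local REST API. Here is a rough sketch of the equivalents of ollama list and the step-3 chat, assuming the default port and the llama3.2:1b model from above:

```python
# Sketch: scripted equivalents of `ollama list` and the step-3 chat,
# using Ollama's local REST API (default port 11434).
import requests

BASE = "http://localhost:11434"

# Equivalent of `ollama list`: show which models are installed locally.
tags = requests.get(f"{BASE}/api/tags", timeout=10).json()
for model in tags.get("models", []):
    print(model["name"])

# Equivalent of the step-3 chat: send a "hello" message to llama3.2:1b.
chat = requests.post(
    f"{BASE}/api/chat",
    json={
        "model": "llama3.2:1b",
        "messages": [{"role": "user", "content": "hello"}],
        "stream": False,
    },
    timeout=300,
).json()
print(chat["message"]["content"])
```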

โ‰๏ธ Why I use llama3.2:1b model?

After installing Ollama, I first ran ollama run llama3.2 (the default 3B model). For a simple "hello" prompt, it took a few minutes to produce even this basic response:
Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?

Why is that? It comes down to the memory limitations of my system. As per the Ollama documentation:

You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

And my system has only 4GB of RAM 😅

Since this subject is new to me, I did some research into the reasons behind these requirements.

This is what I discovered 👇
🔴 Main issue: Not enough RAM

  • For a smooth experience, Llama 3.2 (3B parameters) requires around 6-8GB of RAM.
  • The total RAM on my system is just 4GB.
  • Windows 10 itself requires 2-3GB of RAM.
  • As a result, the AI model has very little memory left.
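To make those numbers more concrete, here is my own back-of-envelope estimate (the constants are rough assumptions, not official figures): the weights of a quantized model take roughly parameters × bytes-per-parameter, and you need headroom on top for the OS, the runtime, and the model's working memory.

```python
# Back-of-envelope RAM estimate for a locally run, quantized model.
# The constants below are rough assumptions, not official requirements.
def estimate_ram_gb(params_billion: float,
                    bytes_per_param: float = 0.5,   # ~4-bit quantization
                    overhead_gb: float = 2.5):      # OS + runtime + working memory
    weights_gb = params_billion * bytes_per_param   # 1e9 params * bytes is ~ GB
    return weights_gb + overhead_gb

for size in (1, 3, 7):
    print(f"{size}B model: roughly {estimate_ram_gb(size):.1f} GB of RAM in total")

# Prints roughly 3.0 / 4.0 / 6.0 GB for 1B / 3B / 7B. Add extra headroom for a
# smooth experience, and it becomes clear why only the 1B model is comfortable
# on a 4GB Windows machine.
```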

🟦 What happens if there is not enough RAM?

  • The computer starts using "virtual memory": disk space that stands in for RAM.
  • RAM is orders of magnitude faster than a hard drive.
  • The model is constantly "swapped" between the hard drive and RAM, which creates a bottleneck that makes everything very slow.

So I switched to the smaller Llama 3.2 1B model, which runs much more smoothly than the 3B default. I also tried a few other models, but they didn't work because of their system requirements.
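One small habit that helped me: check how much RAM is actually free before deciding which model tag to pull. Here is a rough sketch using the third-party psutil package (pip install psutil); the thresholds are my own guesses based on the documentation quoted above.

```python
# Sketch: check available RAM before picking a model size.
# Thresholds are rough guesses, not official Ollama guidance.
import psutil

available_gb = psutil.virtual_memory().available / (1024 ** 3)
print(f"Available RAM: {available_gb:.1f} GB")

if available_gb >= 8:
    print("A 7B model should be workable (per the Ollama docs).")
elif available_gb >= 4:
    print("Stick to ~3B models or smaller.")
else:
    print("Try a 1B (or smaller) model, e.g. llama3.2:1b.")
```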

💬 Final Thoughts

Without depending on cloud APIs or remote inference, Ollama provides a very developer-friendly way to explore and play with LLMs. This tool is valuable if you're interested in developing local-first applications or simply want to learn how LLMs work off-cloud.

The CLI is easy to use, and the setup went smoothly in my experience. Ollama is definitely worth a try, whether you're a developer building edge-native apps or a hobbyist learning AI.

Have you tried running LLMs locally? Which models have you explored, or what are you building with tools like Ollama? Share your experience in the comments below!

Thank you for reading! ✨

Top comments (14)

Solve Computer Science

Try the qwen2.5-coder model family. Yes, 4GB is insufficient to run anything useful. I'm trying qwen2.5-coder:14b-instruct-q2_K (so low quantization and a higher parameter count) and it's not bad at all. The speed and quality are decent, all things considered. You'll need about 20GB of RAM, however. Be aware that I got Chinese-only replies when running the 1.5B models of that family.

Richa Parekh

Thanks for the tip! I'll definitely check out qwen2.5-coder.

Solve Computer Science

ollama.com/library/qwen2.5-coder/tags You'll have to experiment with the smallest models at different quantization levels and avoid swapping to disk during inference.

Richa Parekh

Thanks for the link! I'll explore the smallest models and test their performance. Thank you for the suggestion!

Dotallio

Totally get the RAM struggle with local LLMs; I hit a similar bottleneck running anything larger than a 3B model too.

Have you found any tricks to make chat-style workflows smoother in the CLI, or do you just keep it basic?

Richa Parekh

Since I'm still learning the concepts and getting an understanding of how everything works, I'm sticking to the basics for now.

Alexander Ertli

Hey,

Welcome to the genAI techspace.
There is nothing wrong with using smaller models; I resort to them all the time.

If you are interested, you could try a much smaller model like smollm2:135m or qwen:0.5b; they should be much more responsive on your hardware.

Also, Ollama typically tries to run models on the GPU, at least partially, if you have a compatible one.

I hope this helps.

Richa Parekh

Yes, I will check out the smaller models. Thanks for the useful advice.

Arindam Majumder

Ollama is great. You can also use Docker Model Runner for this.

Richa Parekh

Yeah, Ollama is a valuable tool. Thanks for sharing.

Praveen Rajamani

Thanks for being clear about the hardware limits. Many people try to run local LLMs, thinking it will just work, then get frustrated when it is slow or crashes. Posts like this help save a lot of time and confusion.

Richa Parekh

Appreciate that! I'm glad the post was helpful.

Nathan Tarbert

This is extremely impressive, love how you documented the process and called out the RAM struggle directly. Makes me wanna try it on my old laptop now

Richa Parekh

Thank you for the appreciation. Ollama is definitely worth a try.