
Richa Parekh

Ollama: How to Easily Run LLMs Locally on Your Computer

I recently came across an interesting open-source tool called Ollama. It's a command-line application that lets you run Large Language Models (LLMs) directly on your computer. I wanted to know more, so I tried it out over the weekend. Here is what I learned from using Ollama.

๐Ÿ” What Is Ollama?

Ollama is a lightweight yet powerful tool that lets you run LLMs like LLaMA, Mistral, DeepSeek, Starling, and others directly on your own computer. It runs in the background and exposes both:

  • A Command-Line Interface (CLI) for quick management and interactions
  • An API that you can use in your own programs

The Advantage?

No cloud dependency. No API keys. Just LLMs running on your own machine.
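To show what that looks like in practice, here is a rough sketch of calling the local REST API, assuming Ollama is already installed and running (covered in the next section), a small model such as llama3.2:1b has been pulled, and the Python requests package is installed. By default the server listens on port 11434.

```python
# Minimal sketch: send a prompt to the local Ollama REST API, no API key needed.
# Assumes Ollama is running and llama3.2:1b has been pulled (use any model you have).
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:1b",   # any locally pulled model tag
        "prompt": "Hello! What can you do?",
        "stream": False,          # return a single JSON object instead of a stream
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])  # the model's reply text
```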

💻 Installing Ollama

It was surprisingly easy to get started: I just downloaded the Windows installer from the Ollama website and ran it.

Once installed, Ollama starts in the background, and we can run models using the CLI.
💡 Installers for Linux and macOS can be found on the same download page.
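If you want to double-check that the background service is actually up before moving on, a quick sketch like this should do it (assuming the default port 11434 and the requests package):

```python
# Rough check that the Ollama background service is reachable on its default port.
import requests

try:
    r = requests.get("http://localhost:11434/", timeout=5)
    print(r.status_code, r.text)  # typically prints: 200 Ollama is running
except requests.ConnectionError:
    print("Ollama doesn't seem to be running on localhost:11434")
```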

๐Ÿ› ๏ธ Exploring the CLI

As soon as the installation finished, I opened the Windows Command Prompt (CMD) and began to explore. Here is a summary of what I tried:

1. ollama

  • Running this by itself prints a useful usage guide listing all the available commands and flags.

2. ollama list

  • This displays every model that is currently installed. If nothing appears, it simply means no models have been installed yet.

3. ollama run llama3.2:1b

  • I used the llama3.2:1b model.

Why did I go with this model over the others? I'll explain later in this post, so read till the end.

  • Ollama started a chat session directly in the terminal after automatically pulling the model, which took a few seconds.
  • I started the conversation with a simple "hello" message. In response to my greeting, the model said:

Hello. Is there something I can help you with, or would you like to chat?


  • Then I continued with a few more exchanges, and the model's responses were accurate and well-structured.

4. /bye

  • Simply type /bye to end a chat session.

5. ollama rm llama3.2:1b

  • This command deletes the model from the system immediately, freeing up disk space.

These are the first commands I tried with Ollama. They are only the basics; there is a lot more you can do. See the Ollama GitHub repository for further information.
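If you would rather script these steps than type them into CMD, the same things can be done through the local REST API. Here is a rough sketch of the equivalents of ollama list and the step-3 chat, assuming the default port and the llama3.2:1b model from above:

```python
# Sketch: scripted equivalents of `ollama list` and the step-3 chat,
# using Ollama's local REST API (default port 11434).
import requests

BASE = "http://localhost:11434"

# Equivalent of `ollama list`: show which models are installed locally.
tags = requests.get(f"{BASE}/api/tags", timeout=10).json()
for model in tags.get("models", []):
    print(model["name"])

# Equivalent of the step-3 chat: send a "hello" message to llama3.2:1b.
chat = requests.post(
    f"{BASE}/api/chat",
    json={
        "model": "llama3.2:1b",
        "messages": [{"role": "user", "content": "hello"}],
        "stream": False,
    },
    timeout=300,
).json()
print(chat["message"]["content"])
```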

โ‰๏ธ Why I use llama3.2:1b model?

After installing Ollama, I first ran ollama run llama3.2 (the default 3B model). For a simple "hello" prompt, it took a few minutes to produce even this basic response:
Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?

Why is that? It comes down to the memory limitations of my system. As per the Ollama documentation:

You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

And my system has only 4GB of RAM 😅

Since this subject is new to me, I did some research into the reasons behind these requirements.

This is what I discovered 👇
🔴 Main issue: Not enough RAM

  • For a smooth experience, Llama 3.2 (3B parameters) requires around 6-8GB of RAM.
  • The total RAM on my system is just 4GB.
  • Windows 10 itself requires 2-3GB of RAM.
  • As a result, the AI model has very little memory left.
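To make those numbers more concrete, here is my own back-of-envelope estimate (the constants are rough assumptions, not official figures): the weights of a quantized model take roughly parameters × bytes-per-parameter, and you need headroom on top for the OS, the runtime, and the model's working memory.

```python
# Back-of-envelope RAM estimate for a locally run, quantized model.
# The constants below are rough assumptions, not official requirements.
def estimate_ram_gb(params_billion: float,
                    bytes_per_param: float = 0.5,   # ~4-bit quantization
                    overhead_gb: float = 2.5):      # OS + runtime + working memory
    weights_gb = params_billion * bytes_per_param   # 1e9 params * bytes is ~ GB
    return weights_gb + overhead_gb

for size in (1, 3, 7):
    print(f"{size}B model: roughly {estimate_ram_gb(size):.1f} GB of RAM in total")

# Prints roughly 3.0 / 4.0 / 6.0 GB for 1B / 3B / 7B. Add extra headroom for a
# smooth experience, and it becomes clear why only the 1B model is comfortable
# on a 4GB Windows machine.
```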

🟦 What happens if there is not enough RAM?

  • The computer starts using "virtual memory": disk space that stands in for RAM.
  • RAM is orders of magnitude faster than a hard drive.
  • The model is constantly "swapped" between the hard drive and RAM, which creates a bottleneck that makes everything very slow.

So I switched to the smaller Llama 3.2 1B model, which runs much more smoothly than the 3B default. I also tried a few other models, but they didn't work because of their system requirements.
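One small habit that helped me: check how much RAM is actually free before deciding which model tag to pull. Here is a rough sketch using the third-party psutil package (pip install psutil); the thresholds are my own guesses based on the documentation quoted above.

```python
# Sketch: check available RAM before picking a model size.
# Thresholds are rough guesses, not official Ollama guidance.
import psutil

available_gb = psutil.virtual_memory().available / (1024 ** 3)
print(f"Available RAM: {available_gb:.1f} GB")

if available_gb >= 8:
    print("A 7B model should be workable (per the Ollama docs).")
elif available_gb >= 4:
    print("Stick to ~3B models or smaller.")
else:
    print("Try a 1B (or smaller) model, e.g. llama3.2:1b.")
```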

💬 Final Thoughts

Without depending on cloud APIs or remote inference, Ollama provides a very developer-friendly way to explore and play with LLMs. This tool is valuable if you're interested in developing local-first applications or simply want to learn how LLMs work off-cloud.

The CLI is easy to use, and the setup went smoothly in my experience. Ollama is definitely worth a try, whether you're a developer building edge-native apps or a hobbyist learning AI.

Have you tried running LLMs locally? Which models have you explored, or what are you building with tools like Ollama? Share your experience in the comments below!

Thank you for reading! ✨

Top comments (14)

Solve Computer Science

Try the qwen2.5-coder model family. Yes, 4GB is insufficient to run anything useful. I'm trying qwen2.5-coder:14b-instruct-q2_K (so low quantization and a higher parameter count) and it's not bad at all. The speed and quality are decent, all things considered. You'll need about 20GB of RAM, however. Be aware that I got Chinese-only replies when running the 1.5B models of that family.

Richa Parekh

Thanks for the tip! I'll definitely check out qwen2.5-coder.

Solve Computer Science

ollama.com/library/qwen2.5-coder/tags You'll have to experiment with the smallest models at different quantization levels and avoid swapping to disk during inference.

Richa Parekh

Thanks for the link! I'll explore the smallest models and test their performance. Thank you for the suggestion!

Dotallio

Totally get the RAM struggle with local LLMs; I hit a similar bottleneck running anything larger than a 3B model too.

Have you found any tricks to make chat-style workflows smoother in the CLI, or do you just keep it basic?

Richa Parekh

Since I'm still learning the concepts and getting an understanding of how everything works, I'm sticking to the basics for now.

Alexander Ertli

Hey,

Welcome to the genAI techspace.
There is nothing wrong with using smaller models; I resort to them all the time.

If you are interested, you could try a much smaller model like smollm2:135m or qwen:0.5b; they should be much more responsive on your hardware.

Also, Ollama typically tries to run models on the GPU, at least partially, if you have a compatible one.

I hope this helps.

Richa Parekh

Yes, I will check out the smaller models. Thanks for the useful advice.

Arindam Majumder

Ollama is great. You can also use Docker Model Runner for this.

Richa Parekh

Yeah, Ollama is a valuable tool. Thanks for sharing.

Praveen Rajamani

Thanks for being clear about the hardware limits. Many people try to run local LLMs, thinking it will just work, then get frustrated when it is slow or crashes. Posts like this help save a lot of time and confusion.

Richa Parekh

Appreciate that! I'm glad the post was helpful.

Nathan Tarbert

This is extremely impressive, love how you documented the process and called out the RAM struggle directly. Makes me wanna try it on my old laptop now

Richa Parekh

Thank you for the appreciation. Ollama is definitely worth a try.