Most people hear the term “AI model” and think of something that exists only in the cloud, is accessed through an API key, and comes with a monthly fee. But not all large language models (LLMs) have to live in the cloud or a data center. Ollama lets you and me run LLMs on our own computers. That gives us the flexibility to use an LLM with no internet connection and $0 in the bank, and our chats are saved on our computers as well.
In short, Ollama is a local LLM runtime: a lightweight environment that lets you download, run, and chat with LLMs locally; think of it as VS Code for LLMs. If you would rather run an LLM in a container (like Docker), that is also an option. The goal of Ollama is to handle the heavy lifting of executing models and managing memory, so you can focus on using the model rather than wiring everything up from scratch.
In this short guide, we will walk through installing Ollama, downloading a model, and building with that model in a simple Python project.
Prerequisites
To follow along with this guide, you will need:
- A laptop or PC with at least 16GB of RAM
- Windows 10+ or macOS 12+
Installing Ollama
Before installing Ollama, make sure you meet the prerequisites above for a smooth experience.
Head over to the Ollama website and click Download. You will be redirected to the download page, where you can select your OS and follow the on-screen instructions to install it on your computer.
To confirm you have Ollama installed properly, you can run this command:
ollama --version
You should see the version of Ollama you have installed.
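The exact number depends on the release you installed; the output is a single line along the lines of the following (the version shown here is just an example):
ollama version is 0.5.7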
Downloading a Model
Once you have the Ollama software installed, you will need to download a model. Ollama has an awesome list of open-source models to choose from; they also offer embedding models. Head over to their model section and pick a model. For this guide, we will download the deepseek-r1 model.
Something to keep in mind when downloading a model is that each model comes in different versions with different parameter sizes. Parameters are the weights the model learns during training; they are how the model stores what it has learned from its training data. Larger parameter counts generally mean better responses, especially for complex tasks, but they also require more RAM and disk space.
We will download the smallest parameter size, so click the size you want on the model page and copy the command it shows into your terminal.
ollama run deepseek-r1:1.5b
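As a side note, ollama run downloads the model the first time (if it is not already on your machine) and then drops you straight into an interactive chat. If you only want to download the model for later use, ollama pull does the download without starting a chat:
ollama pull deepseek-r1:1.5b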
To confirm the model downloaded, run the command again; it should open a prompt where you can send messages to the model, which is now running entirely on your machine.
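You can also check from the terminal without opening a chat: ollama list prints every model you have downloaded, along with its size and when it was last modified.
ollama list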
This is cool, but it doesn’t stop here. Let’s see how this model works in a project like a simple chatbot.
Accessing the Ollama Endpoint
Ollama exposes a local endpoint that lets you call any model you have downloaded and interact with it. This section features a simple project built in Python, but the Python part is optional: the endpoint works in any project that can send HTTP requests, as long as you have the necessary software installed.
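For instance, while Ollama is running, you can hit the endpoint straight from your terminal with curl (assuming a Unix-style shell for the quoting; the prompt is just an example):
curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:1.5b", "prompt": "Why is the sky blue?", "stream": false}'
The JSON that comes back includes a response field containing the model's answer, which is the same field we will read from Python below.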
You will need to install the requests library using pip to run the project; do this inside your virtual environment.
pip install requests
We need to import requests and json.
import requests
import json
Then define the Ollama localhost endpoint and the model name. This endpoint lets you send requests to the models you have downloaded and get responses back.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-r1:1.5b"
Next, define the functions that take user input and send it to the model via Ollama. The ask_ollama function takes the user's input as its prompt argument and sends it to the model in a POST request.
def ask_ollama(prompt):
    # Build the request body: which model to use, the prompt, and no streaming
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "stream": False
    }
    try:
        response = requests.post(OLLAMA_URL, json=payload)
        response.raise_for_status()
        result = response.json()
        # The generated text comes back in the "response" field
        if "response" in result:
            return result["response"]
        else:
            return "No response received from the model"
    except requests.exceptions.ConnectionError:
        return "Error: Cannot connect to Ollama. Make sure it's running (ollama serve)"
    except requests.exceptions.HTTPError as he:
        if "404" in str(he):
            return f"Error: Model '{MODEL}' not found. Try running: ollama pull {MODEL}"
        return f"HTTP Error: {he}"
    except json.JSONDecodeError:
        return "Error: Invalid response from Ollama"
    except Exception as e:
        return f"Error: {e}"
Then, the main function ties the project together: it loops over the user's input, prints the model's response, and tells the user which model is handling the requests. Typing 'exit' ends the chat.
def main():
    print(f"Chatbot using {MODEL} via Ollama. Type 'exit' to quit.")
    while True:
        user_input = input("You: ")
        if user_input.lower() == "exit":
            break
        print("Bot:", ask_ollama(user_input).strip())

if __name__ == "__main__":
    main()
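Save the script and run it from the same virtual environment where you installed requests (the file name chatbot.py here is just an example; use python3 if that is how Python is invoked on your system):
python chatbot.py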
Your output should look like this.
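Something along these lines, where the exact reply will vary from run to run (and deepseek-r1 may include its reasoning wrapped in <think> tags inside the response):
Chatbot using deepseek-r1:1.5b via Ollama. Type 'exit' to quit.
You: Hello, what can you do?
Bot: <the model's reply appears here>
You: exit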
Wrapping Up
And that is how you can interact with the models you download. It gives you open-source models you can experiment with when building AI-powered solutions or agents.
The downside of using a local model rather than a cloud-based one is that it may not always give you the results you want: smaller models are generally less capable than their larger cloud-hosted counterparts, and their training data has a cutoff, so they lack real-time information. Regardless, this is a great way to save costs compared with calling GPT, Claude, etc., for a simple project.
If you prefer the video version of this guide, you should check out the video below.