Ganesh Kumar
How to Set Up an LLM Locally on Your Machine (Linux)

Hello, I'm Ganesh. I'm working on FreeDevTools, a single online platform for development tools, cheat sheets, and TL;DRs: a free, open-source hub where developers can quickly find and use tools without the hassle of searching the internet.

In the age of ChatGPT and Claude, we often rely on cloud services to run AI. But did you know you can run powerful Large Language Models (LLMs) right on your own laptop?

Running an LLM locally offers total privacy, zero subscription fees, and offline access. In this guide, we’ll walk through exactly how to set up Google’s lightweight Gemma 2 (2B) model on a Linux machine using a tool called Ollama.

1. What is an LLM?

A Large Language Model (LLM) is a type of Artificial Intelligence trained on massive amounts of text data. Think of it as a super-advanced "autocomplete" that understands context, logic, and coding.

While famous models like GPT-4 have trillions of parameters (the "brain cells" of the AI), modern "Small Language Models" (SLMs) like Gemma 2 (2B) are optimized to run on consumer hardware without needing a massive server farm.

2. Prerequisites

  • A computer running Linux (Ubuntu, Mint, Fedora, etc.).
  • Basic familiarity with the Terminal.
  • At least 4GB of RAM (The Gemma 2B model is very efficient).

3. The Tool: Why Ollama?

We will use Ollama, the most popular open-source tool for running LLMs locally. It handles all the complex configuration (drivers, weights, interfaces) automatically.

Step 1: Install Ollama

Open your terminal and run the official installation script. This single command downloads and installs the Ollama service.

curl -fsSL https://ollama.com/install.sh | sh


Note: You may be asked for your sudo password to complete the installation.

Step 2: Verify the Installation

Once the script finishes, ensure Ollama is running by checking its version:

ollama --version


If you see a version number (e.g., ollama version is 0.5.4), you are ready to go.

4. Get the Model: Gemma 2 (2B)

We will use gemma2:2b, Google's open-weight Gemma 2 model at 2 billion parameters. It is heavily optimized, making it fast and surprisingly capable for its size.

To download and run the model in one go, use the following command:

ollama run gemma2:2b


What happens next?

  1. Ollama will pull the model manifest.
  2. It will download the model layers (approx. 1.6 GB).
  3. Once finished, it will drop you directly into a chat prompt.

5. Test: Your First Prompt

Once the download is complete, you will see a prompt that looks like >>>. You are now chatting directly with the AI on your machine.

Try this simple test prompt:

>>> Write a Python function to check if a number is prime.


Expected Output:
The model should instantly generate code similar to this:

def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

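To sanity-check the generated function outside the chat, you can paste it into a regular Python session and spot-check it against a few known primes:

```python
# is_prime as generated above, reproduced here so the check is standalone
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

# List the primes below 20 to confirm the function behaves as expected
print([n for n in range(2, 20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```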

Another fun test:

>>> Why is the sky blue? Explain it like I'm 5.


6. Managing Your Models

To exit the chat, type /bye or press Ctrl + d.
To see which models you have installed, run:

ollama list


Conclusion

You now have a private AI assistant running entirely on your Linux machine! You can use it to summarize logs, write scripts, or brainstorm ideas without sending a single byte of data to the cloud.
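If you want to script against your local model rather than chat interactively, Ollama also exposes a local REST API, by default at http://localhost:11434. Below is a minimal sketch using only the Python standard library. The endpoint and payload fields follow Ollama's /api/generate interface, but treat the exact details as an assumption to verify against the version you installed:

```python
import json
import urllib.request

# Default Ollama endpoint on the local machine (assumes the service is running)
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="gemma2:2b"):
    # stream=False asks Ollama to return one complete JSON response
    # instead of a stream of partial chunks
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt, model="gemma2:2b"):
    # Send the prompt to the locally running model and return its reply text
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires the Ollama service and the gemma2:2b model to be available):
# print(ask("Summarize this log line: 'disk usage at 91% on /dev/sda1'"))
```

Because everything runs on localhost, this gives you a scriptable assistant with the same privacy guarantees as the interactive chat.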


FreeDevTools

I’ve been building FreeDevTools.

A collection of UI/UX-focused tools crafted to simplify workflows, save time, and reduce friction when searching for tools and materials.

Any feedback or contributions are welcome!

It’s online, open-source, and ready for anyone to use.

👉 Check it out: FreeDevTools

⭐ Star it on GitHub: freedevtools
