Akarsh Jaiswal

How to Run Mistral Locally Using Ollama

If you’ve ever wanted to run powerful language models on your own machine without cloud costs or complex setup, Ollama makes that incredibly easy. In this post, I'll walk you through running the Mistral model locally using Ollama, from installation to making API calls from Python.

Let’s get started.

What is Ollama?

Ollama is a lightweight tool that lets you run large language models locally with minimal effort. It handles downloading, starting, and serving models through a local API—no Docker setup or GPU required (though it can use one if available).

Step 1: Install Ollama

First, install Ollama on your system. It's available for macOS, Linux, and Windows (via WSL).

macOS (via Homebrew):

brew install ollama

Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows (via WSL):

  1. Install WSL (if you haven’t already)
  2. Run the Linux install command inside your WSL terminal

Once installed, you can start the Ollama server:

ollama serve
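
Before moving on, you can sanity-check that the server is reachable. Here's a minimal Python sketch, assuming the default port 11434 (the root endpoint normally replies with a short status message):

import requests

# The Ollama server listens on http://localhost:11434 by default
try:
    r = requests.get('http://localhost:11434')
    print(r.text)  # usually prints "Ollama is running"
except requests.ConnectionError:
    print('Ollama is not reachable; start it with "ollama serve"')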

Step 2: Download the Mistral Model

Ollama supports various open-weight models. To download Mistral, simply run:

ollama pull mistral

This will download the Mistral 7B model and prepare it for use.
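
If you'd rather confirm the download from code than from the CLI, Ollama's /api/tags endpoint lists the models available locally. A small sketch, assuming the server from Step 1 is running:

import requests

# List every model that has been pulled locally
tags = requests.get('http://localhost:11434/api/tags').json()
for model in tags.get('models', []):
    print(model['name'])  # e.g. "mistral:latest"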

Step 3: Run Mistral in Your Terminal

To start using the model in an interactive chat format, run:

ollama run mistral

This drops you into an interactive prompt where you can chat with the model just like you would with any LLM:

> What are some fun weekend projects using Raspberry Pi?

Mistral will generate the response locally, in real time.

Step 4: Access Mistral Programmatically via API

Ollama also runs a local HTTP API at http://localhost:11434. You can use this to integrate the model into any app or script.

Here’s a quick example in Python:

import requests

# Send a non-streaming generation request to the local Ollama server
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'mistral',
        'prompt': 'Explain how quantum computing works in simple terms.',
        'stream': False  # return the full reply as a single JSON object
    }
)

print(response.json()['response'])

Make sure ollama serve is running in the background when you make this request.
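
The example above disables streaming to keep things simple. If you leave 'stream' at its default (true), the API returns one JSON object per line as tokens are generated, which you can print incrementally. A rough sketch of that pattern:

import json
import requests

# Stream the response token by token instead of waiting for the full reply
with requests.post(
    'http://localhost:11434/api/generate',
    json={'model': 'mistral', 'prompt': 'Write a haiku about local LLMs.'},
    stream=True,
) as response:
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get('response', ''), end='', flush=True)
        if chunk.get('done'):
            break
print()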

Optional: Customize Mistral with a Modelfile

You can also create a customized version of the model using a simple Modelfile. This allows you to define a default system prompt or other behaviors.

Example Modelfile:

FROM mistral
SYSTEM You are a concise and friendly assistant.

To create and run your custom model:

ollama create custom-mistral -f Modelfile
ollama run custom-mistral

This is useful if you want to adjust the tone or role of the assistant without doing any actual fine-tuning.
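
Your custom model is served through the same local API; you just reference it by name. As a variation, here's a sketch using Ollama's chat-style endpoint (/api/chat), which takes a list of messages instead of a single prompt:

import requests

# Chat-style request against the model created from the Modelfile
response = requests.post(
    'http://localhost:11434/api/chat',
    json={
        'model': 'custom-mistral',
        'messages': [
            {'role': 'user', 'content': 'Give me three Raspberry Pi project ideas.'}
        ],
        'stream': False
    }
)

print(response.json()['message']['content'])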

Final Thoughts

With Ollama, running high-performance models like Mistral locally is no longer just for AI researchers or DevOps wizards. You can get started in minutes, and it's ideal for:

  • Offline development
  • Privacy-sensitive tasks
  • Learning how LLMs work behind the scenes

If you're interested in experimenting with open-source LLMs, this setup is one of the easiest ways to dive in.
