If you’ve ever wanted to run powerful language models on your own machine without cloud costs or complex setups, Ollama makes that incredibly easy. In this post, I'll walk you through running the Mistral model locally using Ollama, from installation to making API calls from Python.
Let’s get started.
What is Ollama?
Ollama is a lightweight tool that lets you run large language models locally with minimal effort. It handles downloading, starting, and serving models through a local API—no Docker setup or GPU required (though it can use one if available).
Step 1: Install Ollama
First, install Ollama on your system. It's available for macOS, Linux, and Windows (via WSL).
macOS (via Homebrew):
brew install ollama
Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows (via WSL):
- Install WSL (if you haven’t already)
- Run the Linux install command inside your WSL terminal
Once installed, you can start the Ollama server:
ollama serve
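If you want to confirm the server is actually up before moving on, a quick request to the local endpoint does the trick. Here's a minimal Python sketch (assuming you have the requests package installed; 11434 is Ollama's default port):
import requests

# Ollama answers a plain GET on its root URL with a short status message.
r = requests.get('http://localhost:11434')
print(r.status_code, r.text)  # expected: 200 Ollama is running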
Step 2: Download the Mistral Model
Ollama supports various open-weight models. To download Mistral, simply run:
ollama pull mistral
This downloads the Mistral 7B model (a few gigabytes, so the first pull can take a while) and prepares it for use.
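You can check which models are installed with ollama list in the terminal, or by asking the local API. A small Python sketch, assuming the server is running:
import requests

# /api/tags lists the models Ollama has stored locally.
tags = requests.get('http://localhost:11434/api/tags').json()
print([m['name'] for m in tags['models']])  # should include something like 'mistral:latest'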
Step 3: Run Mistral in Your Terminal
To start using the model in an interactive chat format, run:
ollama run mistral
You’ll drop into an interactive prompt where you can chat with the model just like any other LLM:
> What are some fun weekend projects using Raspberry Pi?
Mistral will generate the response locally, in real time.
Step 4: Access Mistral Programmatically via API
Ollama also runs a local HTTP API at http://localhost:11434. You can use this to integrate the model into any app or script.
Here’s a quick example in Python:
import requests

response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'mistral',
        'prompt': 'Explain how quantum computing works in simple terms.',
        'stream': False  # wait for the full response instead of streaming chunks
    }
)

print(response.json()['response'])
Make sure ollama serve is running in the background when you make this request.
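If you drop 'stream': False, Ollama streams the answer back as newline-delimited JSON chunks, which is handy for printing tokens as they arrive. A minimal sketch of how that might look:
import json
import requests

# With streaming enabled, each line of the response body is a JSON object
# carrying a piece of the generated text; the last one has "done": true.
with requests.post(
    'http://localhost:11434/api/generate',
    json={'model': 'mistral', 'prompt': 'Write a haiku about local LLMs.'},
    stream=True,
) as response:
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get('response', ''), end='', flush=True)
        if chunk.get('done'):
            print()
            break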
Optional: Customize Mistral with a Modelfile
You can also create a customized version of the model using a simple Modelfile. This allows you to define a default system prompt or other behaviors.
Example Modelfile:
FROM mistral
SYSTEM You are a concise and friendly assistant.
To create and run your custom model:
ollama create custom-mistral -f Modelfile
ollama run custom-mistral
This is useful if you want to set the tone or role of the assistant without touching the model weights.
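Once created, the custom model is addressed by name through the same API. A quick sketch, assuming you built custom-mistral as above:
import requests

# The custom model inherits the SYSTEM prompt from its Modelfile,
# so responses should follow the "concise and friendly" instruction by default.
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'custom-mistral',
        'prompt': 'Give me three tips for writing better commit messages.',
        'stream': False
    }
)

print(response.json()['response'])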
Final Thoughts
With Ollama, running high-performance models like Mistral locally is no longer just for AI researchers or DevOps wizards. You can get started in minutes, and it's ideal for:
- Offline development
- Privacy-sensitive tasks
- Learning how LLMs work behind the scenes
If you're interested in experimenting with open-source LLMs, this setup is one of the easiest ways to dive in.