In this blog post, I will show you how to run LLMs locally on macOS, put an AI interface on top of them, and integrate an AI coding assistant into VS Code.
Ollama
Ollama is an open source tool that allows you to run large language models (LLMs) directly on a local machine.
I use Homebrew to install Ollama, but there are alternative installation options available; see https://ollama.com/download.
Please run Ollama as a standalone application outside of Docker containers as Docker Desktop does not support GPUs. (Source)
Use the following command to install Ollama:
brew install ollama
To run Ollama as a service in the background, execute the following command:
brew services start ollama
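If you prefer not to run Ollama as a background service, you can also start the server manually in a terminal:
ollama serve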
Ollama should now be running and accessible at http://localhost:11434/. If you open the URL in your browser, you will see "Ollama is running".
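You can also verify this from the terminal; a quick curl call to the same URL should return the same "Ollama is running" message:
curl http://localhost:11434/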
You can now download the LLM of your choice. An overview of all available LLMs can be found in the Ollama model library.
Let's start with the DeepSeek-R1 LLM. You can download it using the following command:
ollama pull deepseek-r1
Ready to go. Run the model with the following command:
ollama run deepseek-r1
Now ask DeepSeek-R1 something.
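If you prefer to talk to the model programmatically instead of via the interactive prompt, Ollama also exposes a REST API on the same port. A minimal example using the /api/generate endpoint (with streaming disabled; the prompt is just an illustration) could look like this:
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'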
AI interface
As an alternative to the command-line interface, I would like to use a graphical user interface, so let's take a look at Open WebUI.
Open WebUI is an extensible, self-hosted AI interface that adapts to your workflow, all while operating entirely offline.
If you are already using Docker, installation is straightforward. Run the following command:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
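If you prefer Docker Compose, the command above translates roughly into the following compose file. This is only a sketch based on the flags shown above; the file name docker-compose.yml and the named-volume declaration are my assumptions:
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - open-webui:/app/backend/data
volumes:
  open-webui: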
See the documentation for alternative installation methods.
Open WebUI is now available at http://localhost:3000/.
AI Coding Assistant in Visual Studio Code
I would like a local AI coding assistant in VS Code as an alternative to GitHub Copilot. There are a number of different extensions, one of which is Continue.
The leading open-source AI code assistant. You can connect any models and any context to create custom autocomplete and chat experiences inside the IDE
You can find the extension for VS Code in the Visual Studio Marketplace.
Continue provides the following features:
- Chat to understand and iterate on code in the sidebar
- Autocomplete to receive inline code suggestions as you type
- Edit to modify code without leaving your current file
- Actions to establish shortcuts for common use cases
I want to use the Llama 3.1 8B model for chat and the Qwen2.5-Coder 1.5B model for autocomplete.
These models can be downloaded with Ollama using the following commands:
ollama pull llama3.1:8b
ollama pull qwen2.5-coder:1.5b
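You can verify that both models have been downloaded with:
ollama list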
The VS Code extension Continue must now be configured:
{
  "models": [
    {
      "title": "Llama 3.1 8B",
      "model": "llama3.1:8b",
      "provider": "ollama",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 1.5B",
    "model": "qwen2.5-coder:1.5b",
    "provider": "ollama",
    "apiBase": "http://localhost:11434"
  }
}
provider is always ollama, and apiBase is always the URL where Ollama can be reached (http://localhost:11434).
The Continue features listed above are now available and ready to use in VS Code.
If you have any feedback, suggestions, or ideas, feel free to comment on this post!