Daniel Bayerlein

Run LLMs locally with Ollama on macOS for Developers

In this blog post, I will show you how to run LLMs locally on macOS, set up a web-based AI interface, and integrate an AI coding assistant into VS Code.

Ollama

Ollama is an open source tool that allows you to run large language models (LLMs) directly on a local machine.

I use Homebrew to install Ollama, but alternative installation options are available; see https://ollama.com/download.

Please run Ollama as a standalone application outside of Docker containers, as Docker Desktop does not support GPUs. (Source)

Use the following command to install Ollama:

brew install ollama

To run Ollama as a service in the background, execute the following command:

brew services start ollama
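
If you no longer need Ollama running in the background, you can stop the service again with Homebrew:

brew services stop ollama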

Ollama should now be running and accessible at http://localhost:11434/. If you open the URL in your browser, you will see the message "Ollama is running".
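
If you prefer the terminal, you can also check the API directly. For example, the version endpoint should respond with a small JSON payload:

curl http://localhost:11434/api/version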

You can now download the LLM of your choice. An overview of all available LLMs can be found in the Ollama model library (https://ollama.com/library).

Let's start with the DeepSeek-R1 LLM. You can download it using the following command:

ollama pull deepseek-r1

Ready to go. Run the model with the following command:

ollama run deepseek-r1

Now ask DeepSeek-R1 something.
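
Ollama is not limited to the interactive prompt: it also exposes a REST API on the same port, so you can talk to the model from scripts as well. A request along these lines should work (the prompt is just an example):

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'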

Demo

AI interface

As an alternative to the command-line interface, I would like to use a graphical user interface. Let's take a look at Open WebUI.

Open WebUI is an extensible, self-hosted AI interface that adapts to your workflow, all while operating entirely offline.

If you are already using Docker, installation is straightforward. Run the following command:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main

See the documentation for alternative installation methods.

Open WebUI is now available at http://localhost:3000/.
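
The --add-host flag allows the container to reach the Ollama instance running on your Mac. If Open WebUI does not pick up Ollama automatically, you can point it at the host explicitly with the OLLAMA_BASE_URL environment variable; this is a sketch based on the Open WebUI documentation:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main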

Open WebUI

AI Coding Assistant in Visual Studio Code

I would like a local AI coding assistant in VS Code as an alternative to GitHub Copilot. There are a number of different extensions, one of which is Continue.

The leading open-source AI code assistant. You can connect any models and any context to create custom autocomplete and chat experiences inside the IDE.

You can find the extension for VS Code in the Visual Studio Marketplace.

Continue provides the following features:

  • Chat to understand and iterate on code in the sidebar
  • Autocomplete to receive inline code suggestions as you type
  • Edit to modify code without leaving your current file
  • Actions to establish shortcuts for common use cases

I want to use the Llama 3.1 8B model for chat and the Qwen2.5-Coder 1.5B model for autocomplete.

These models can be provided with Ollama using the following commands:

ollama pull llama3.1:8b
ollama pull qwen2.5-coder:1.5b
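
You can verify that both models are available locally with:

ollama list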

The VS Code extension Continue must now be configured (the configuration typically lives in ~/.continue/config.json):

{
  "models": [
    {
      "title": "Llama 3.1 8B",
      "model": "llama3.1:8b",
      "provider": "ollama",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 1.5B",
    "model": "qwen2.5-coder:1.5b",
    "provider": "ollama",
    "apiBase": "http://localhost:11434"
  }
}

provider is always ollama, and apiBase is the URL where Ollama can be reached (http://localhost:11434).

The Continue features are now available in VS Code.

VS Code


If you have any kind of feedback, suggestions, or ideas, feel free to comment on this post!


