How to Use AI Models Locally in VS Code with the Continue Plugin (with Multi-Model Switching Support)

AI-assisted coding has become a daily tool for many developers — from explaining complex code to generating entire functions in seconds. But most AI coding tools rely on cloud-based models like GitHub Copilot or ChatGPT, which means you’re always dependent on an internet connection, API tokens, and third-party privacy policies.

What if you could bring that power entirely local, right inside VS Code, with no external API calls and the ability to switch between multiple models at will?

That’s exactly what we’ll cover in this guide. You’ll learn how to use the Continue plugin in VS Code to run AI models locally using Ollama, and even set up multi-model switching for different coding scenarios.

What You’ll Need

Before you begin, make sure you have the following:

  • Visual Studio Code (latest version)
  • Internet connection (only for installation)
  • Ollama (for running local AI models)
  • System resources — at least 8 GB RAM (16 GB recommended)
  • Basic familiarity with JSON configuration files

Step 1: Install the Continue Plugin in VS Code

  1. Open VS Code.
  2. Go to the Extensions Marketplace (Ctrl+Shift+X / Cmd+Shift+X).
  3. Search for “Continue” by Continue.dev.
  4. Click Install.

Once installed, you’ll notice a new 🧠 Continue icon on your left sidebar. Clicking it will open the Continue chat panel.

_Show the Continue plugin installation screen in the VS Code Marketplace._

_Show the Continue panel in VS Code after installation._
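
If you prefer the terminal, you can also install or verify the extension with the VS Code CLI (a small sketch, assuming the marketplace extension ID is Continue.continue):

code --install-extension Continue.continue    # install from the command line
code --list-extensions | grep -i continue     # confirm it appears in the installed list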

Step 2: Set Up Ollama for Local Models

Ollama lets you run open-source AI models like Llama 3, Mistral, CodeLlama, and more — all locally on your machine.

Install Ollama

Run this command in your terminal:

curl -fsSL https://ollama.com/install.sh | sh
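
The script above targets Linux. On macOS you can install Ollama with Homebrew instead (assuming the ollama formula), or download the desktop app from ollama.com:

brew install ollama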

Then start Ollama:

ollama serve
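
To confirm the server is up, send a plain GET request to the root URL from another terminal; it should answer with a short "Ollama is running" message:

curl http://localhost:11434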

Pull a Model

Pull the models you want to use (each command downloads one model):

ollama pull llama3
ollama pull mistral
ollama pull codellama
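
You can check which models have been downloaded at any point:

ollama list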

Once a model is pulled, Ollama serves it locally at http://localhost:11434, ready to respond to requests.

_Show Ollama running in the terminal with “Listening on port 11434”._
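
If you want to sanity-check a model before wiring up the editor, you can call the local API directly. This uses Ollama's /api/generate endpoint; the model and prompt below are just examples:

curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Write a one-line docstring for a function that reverses a string.",
  "stream": false
}'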

Step 3: Configure Continue to Use Local Models

Now configure Continue to use Ollama as the provider for your local AI models.

Open Continue’s Configuration File

  1. In VS Code, open the Command Palette (Ctrl+Shift+P / Cmd+Shift+P).
  2. Search “Continue: Open Config File”.
  3. This opens a file named config.json in the ~/.continue directory.

Add a Local Model

{
  "models": [
    {
      "name": "Mistral Local",
      "provider": "ollama",
      "model": "mistral"
    }
  ]
}

Save the file.

Now Continue will use your locally hosted Mistral model via Ollama.

_Show `.continue/config.json` with the model configuration._
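
By default, Continue talks to Ollama at http://localhost:11434. If you ever serve Ollama on a different address or port (the value below is purely illustrative), start the server with an explicit bind address and point that model's apiBase setting in config.json at the new URL:

OLLAMA_HOST=127.0.0.1:11500 ollama serve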

Step 4: Add Multiple Models and Enable Switching

One of Continue’s most powerful features is multi-model support. You can define multiple models — local or remote — and switch between them instantly from the Continue sidebar.

Here’s an example setup:

{
  "models": [
    { "title": "Llama 3", "provider": "ollama", "model": "llama3" },
    { "title": "Mistral", "provider": "ollama", "model": "mistral" },
    { "title": "CodeLlama", "provider": "ollama", "model": "codellama" }
  ]
}

How to Switch Models

  • Click the model dropdown in the Continue sidebar and choose one.
  • Or, use the shortcut command in the chat:
  /switch llama 3

_Show the Continue sidebar with a model selection dropdown._
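
Switching only changes which model Continue sends requests to; Ollama loads models into memory on demand. On recent Ollama releases you can see which models are currently loaded:

ollama ps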

Step 5: Start Using Continue Locally

Now you’re ready to use your local AI assistant — completely offline.

Some great use cases include:

  • Explaining or summarizing existing code
  • Generating unit tests
  • Suggesting function names or documentation
  • Refactoring large files with reasoning

Example Interaction

Prompt: “Using Python, how do I get the index of a specific string in a sentence?”

Model (Mistral): “To find the index (position) of a specific string in a sentence using Python, you can utilize the built-in str.find() method.”

_Show the model interaction in the Continue chat panel._
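
You can verify the suggestion straight from a terminal. A one-liner along these lines (assuming Python 3 is on your PATH) prints the starting index of the substring:

python3 -c 'sentence = "run AI models locally in VS Code"; print(sentence.find("locally"))'
# prints 14, the index where "locally" starts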

Step 6: Troubleshooting and Optimization

Here are a few quick tips to make your setup smoother:

Issue                 Possible Fix
Model not found       Make sure the model is pulled via Ollama (ollama pull mistral)
Slow responses        Try smaller models like phi3 or codellama
JSON config errors    Validate using VS Code’s built-in JSON formatter
High memory use       Limit concurrency in Ollama or close other running models
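
For the memory-related issues, Ollama can be tuned with environment variables when you start the server. The variables below are standard Ollama settings, and the ollama stop subcommand (available on recent releases) unloads a model immediately; treat the exact values as a starting point for your own hardware:

OLLAMA_MAX_LOADED_MODELS=1 OLLAMA_NUM_PARALLEL=1 ollama serve
ollama stop mistral    # unload a model you are done with to free RAM/VRAM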

⚡ Optional: Hybrid Setup (Local + Cloud Models)

Continue lets you combine local and remote models in the same workspace.

You can use OpenAI or Anthropic APIs for high-power reasoning tasks while keeping local models for everyday completions.

Example hybrid config:

{
  "models": [
    { "name": "Llama 3 Local", "provider": "ollama", "model": "llama3" },
    { "name": "GPT-4 Turbo", "provider": "openai", "model": "gpt-4-turbo" }
  ]
}

Then simply switch between them based on your needs — offline vs. advanced reasoning.

Conclusion

And that’s it! You now have a fully local AI coding assistant in VS Code — powered by the Continue plugin and Ollama.

✅ You installed the Continue plugin

✅ Configured local models like Mistral and Llama

✅ Added multiple models with seamless switching

✅ Used your AI assistant completely offline

This setup gives you the best of both worlds — privacy, flexibility, and zero dependency on cloud APIs.

So go ahead and experiment — try different open models, optimize your workflows, and experience the power of AI-assisted coding locally.
