Anita Ihuman 🌼

Best Offline AI Coding Assistant: How to Run LLMs Locally Without Internet

As developers, we’ve all felt the shift: AI coding assistants have quietly become part of our workflow, helping us write cleaner code, understand new codebases faster, and move from idea to implementation with less friction. They’ve gone from “nice to have” to tools we rely on daily. But the moment the internet drops, everything stops.

This led me to explore options I can use both when I have internet access and when I am offline. It turns out the Continue extension supports exactly this. Instead of depending on the cloud, it lets me load local models downloaded with tools like Ollama, LM Studio, or Hugging Face and use them directly inside VS Code. Now I can access my coding assistant offline, with features like chat, autocomplete, and agents available without interruption.

How does it work?

Offline coding assistants like Continue.dev let you run large language models directly on your machine, giving you cloud-level intelligence without an internet connection. Instead of sending your code and prompts to a remote API, they pull the models to your device and handle all inference locally. Here’s how the process works under the hood:

  1. Download the model locally — The assistant connects to a model source such as Hugging Face or Ollama, then downloads a compatible model (often quantized) to your machine.
  2. Load the model into a local runtime — It uses an inference engine such as llama.cpp (built on GGML) or a GPU-accelerated backend to load the model into your RAM or VRAM.
  3. Run all inference on-device — When you type a prompt, Continue.dev sends it to the local runtime instead of the cloud. Your CPU or GPU handles all the computation needed to generate the next tokens (the example request after this list shows the same round trip by hand).
  4. Stream results back to your editor — The model’s responses are streamed directly to Continue.dev inside your IDE, with no network delay or external data transfer.
  5. Stay fully offline — Because everything happens on your system, from prompt to output, you can keep coding even when offline, with full privacy and consistent performance.
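
You can see this local loop for yourself. The sketch below (assuming Ollama, installed in the next section, is already running on its default port 11434 and a model such as qwen2.5-coder:3b has been pulled) sends a prompt straight to the same localhost API that the editor extension talks to:

# Prompt the local Ollama runtime over localhost; no external network is involved
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:3b",
  "prompt": "Write a one-line Python function that reverses a string.",
  "stream": false
}'

The JSON response is generated entirely on your own machine; unplug your network and the call still works.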

Downloading LLMs with Ollama

While Continue.dev lets you use local LLMs directly inside your editor, tools like Ollama are where you actually download and run the models themselves. Ollama is an open-source platform for running large language models locally.

Install Ollama

For Windows and macOS, go to the Ollama download page and click the Download button. Once the download finishes, open the file and click Install.
For Linux, run this command:

curl -fsSL https://ollama.com/install.sh | sh

After installation, the Ollama GUI should open.
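
If the GUI does not open, or you simply want to confirm the install from the terminal, these standard Ollama commands are a quick check (assuming a default installation):

# Print the installed version to confirm the CLI is on your PATH
ollama --version

# Start the local server manually if it is not already running in the background
ollama serve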

Running an LLM

You can download and run a wide variety of LLMs with Ollama. To run an LLM, use the following command:

ollama run <model-name>

Example:

ollama run deepseek-r1

When the download is complete, your LLM should be running, and you can interact with it directly in your terminal.
You can also open the Ollama GUI, confirm the downloaded model appears in the dropdown, and interact with it there, without needing an internet connection.
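
From the terminal, you can also verify what has been downloaded and what is currently loaded into memory:

# List every model downloaded to your machine
ollama list

# Show the models currently loaded and ready to answer prompts
ollama ps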

Check out the Ollama repository for more models, information, and commands.

Use your LLM in VS Code with Continue.dev

Install the Continue Extension
Open VS Code, go to the Extensions tab, and type “Continue” in the search bar.

Click Install. After installing, the Continue extension icon should appear. Click it to open the extension window.

Pull/Download the Models

For this article, we are using two lightweight models recommended by Continue.dev, one for chat and edits and one for autocomplete. Run the following commands:

ollama run qwen2.5-coder:3b
ollama run qwen2.5-coder:1.5b
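
If you prefer to download the models without opening an interactive chat session each time, ollama pull fetches the same models:

# Download the models only; no interactive session is started
ollama pull qwen2.5-coder:3b
ollama pull qwen2.5-coder:1.5b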

Run Configs

In your Continue extension window, click the settings icon, navigate to Configs, and click Local Config.

A config.yaml file will show up. Populate it with this configuration:

name: Local Assistant
version: 1.0.0
schema: v1

models:
  - name: Qwen2.5-Coder 3B
    provider: ollama
    model: qwen2.5-coder:3b
    roles:
      - chat
      - edit
      - apply

  - name: Qwen2.5-Coder 1.5B (Autocomplete)
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete

  - name: Autodetect
    provider: ollama
    model: AUTODETECT

context:
  - provider: code
  - provider: docs
  - provider: diff
  - provider: terminal
  - provider: problems
  - provider: folder
  - provider: codebase

These configurations define which models can be used and how. Check out the Continue reference docs to learn more about configurations.

Voila! You are done. You can now interact with your local LLM directly in your code editor without needing the internet.
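
If chat or autocomplete does not respond, a quick sanity check (assuming Ollama’s default port) is to confirm that the local endpoint the ollama provider in the config connects to is reachable and lists your models:

# Ask Ollama's local API which models it can serve; the models from
# config.yaml should appear in this response
curl -s http://localhost:11434/api/tags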

Advantages of using Local LLMs

Aside from not needing the internet to run, pairing local LLMs with the Continue extension comes with more advantages for developers:

  • Privacy and data control — With Continue.dev running models locally, your code, logs, and prompts never leave your machine. It communicates directly with your on-device model runtime, so nothing is sent to external servers. This gives you full control over sensitive repositories, internal APIs, and proprietary logic without cloud-related security risks.
  • Customizability — Continue.dev lets you choose exactly which local model you want to run: Llama, Mistral, Codellama, Qwen, and more. You can swap models per project, adjust context lengths, tweak inference settings, or even fine-tune your own model and point Continue.dev to it. It adapts to your hardware and workflow instead of locking you into a single provider.
  • Predictable performance — Because Continue.dev operates through a local inference engine, your response time depends only on your machine, not on server load or network stability. There’s no throttling, rate-limiting, or unpredictable latency. Whether you’re refactoring a file or generating new components, the speed stays consistent.
  • Cost efficiency — Once you download the model and set it up with Continue.dev, you avoid ongoing API charges entirely. There are no per-token costs, monthly bills, or usage caps. You get unlimited inference, unlimited tokens, and full autonomy without burning through credits or subscriptions.

In essence, local LLMs paired with the Continue extension bring control, reliability, and security directly into your code editor. Whether you’re dealing with unstable or no internet, sensitive code, or the need for faster, more consistent performance, running models on your own machine gives you an edge.

What’s Next?

Offline coding assistance is here, and it keeps evolving, with tools like Continue and Ollama letting developers run large language models locally in their environments without relying on the internet.
Setting up local LLM support in your code editor is now easier than ever, so play around with it, run more models, sharpen your skills, and enjoy AI coding assistance even without an internet connection.
