David Bilson

Posted on May 26

How I Escaped Claude & Cursor Limits: The Ultimate Free Local AI Coding Setup with Ollama + Continue.dev (2026 Guide)

#ai #vscode #llm #programming

As a developer, I got tired of waiting 5 days for Claude resets and burning through Cursor’s $20 credit pool.

I was building an editorial platform and had reached a critical point: a new feature that required changes to the admin, the server, and the platform at the same time, so the model needed the complete context. Right in the middle of it, I hit Claude's weekly limit and had to wait 5 days for the reset.

Frustrating!

I'm building at senior engineering level with Claude's Opus 4.7 in VS Code, and I want to be able to do that without usage limits, weekly caps, or surprise bills.

After lots of research online, I finally set up completely free, local AI models running directly in VS Code using Ollama + Continue.dev.

These models now handle most of my daily coding work and they’re shockingly good (some tasks feel close to Claude Opus 4.7 level).

This is the exact step-by-step guide that worked for me.

Why You Need This Setup

No more waiting for resets
No credit pools or usage-based billing traps
Works completely offline after download
Private (your code never leaves your machine)
Easy to switch between multiple free models

Step-by-Step Setup (Tested on an M1 MacBook Air)

Step 1: Install Ollama

Go to ollama.com, then download and install it.

Make sure to select your operating system before clicking download. After installation, open Ollama; it will run in the background.

Open any terminal and run:

ollama list

At first, ollama list should show no models.

You can use VS Code's terminal, since the rest of the work happens in VS Code.
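If ollama list reports an error instead of an empty list, the background app may not be running. A minimal sketch of a reachability check, assuming Ollama listens on its default local address (http://localhost:11434) and that curl is installed; the helper name ollama_is_up is hypothetical:

```shell
# Assumption: Ollama listens on its default local address.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Hypothetical helper: succeeds only if something answers at the given URL.
ollama_is_up() {
  curl -fsS --max-time 3 "$1" >/dev/null 2>&1
}

if ollama_is_up "$OLLAMA_URL"; then
  echo "Ollama is reachable at $OLLAMA_URL"
else
  echo "Ollama is not reachable at $OLLAMA_URL; open the app and try again"
fi
```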

Step 2: Download Good Free Models

I started with these two, plus a small model for autocomplete that the config in Step 4 relies on.

In your terminal, run the following and wait for each download to complete:

ollama pull qwen2.5-coder:7b

ollama pull deepseek-coder-v2:16b

ollama pull qwen2.5-coder:1.5b

qwen2.5-coder:7b is roughly 4.7 GB to download and deepseek-coder-v2:16b is roughly 8.9 GB. The 1.5b model is much smaller.
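Once the downloads finish, it can help to confirm that every model the guide relies on is actually installed. The sketch below assumes that ollama list prints a header row followed by one model tag per line in the first column; the helper name missing_models is hypothetical, and the wanted list simply combines the pull commands above with the autocomplete model named in the Step 4 config.

```shell
# Model tags this guide relies on (pull commands plus the Step 4 config).
WANTED="qwen2.5-coder:7b deepseek-coder-v2:16b qwen2.5-coder:1.5b"

# Hypothetical helper: reads `ollama list` output on stdin and prints
# every wanted tag that does not appear in the first column.
missing_models() {
  installed=$(awk 'NR > 1 { print $1 }')   # skip the header row
  for model in "$@"; do
    printf '%s\n' "$installed" | grep -qxF "$model" || printf '%s\n' "$model"
  done
}

# Only run against the real CLI when it is installed.
if command -v ollama >/dev/null 2>&1; then
  ollama list | missing_models $WANTED
fi
```

Any tag it prints still needs an ollama pull.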

Step 3: Install Continue.dev in VS Code

Open VS Code if it isn't already open
Click the Extensions tab in the sidebar (on the left or right side of VS Code)
In the Extensions tab, search for "Continue" by Continue; it should be the first result

Then install Continue. It becomes available in the sidebar immediately after installation.

Step 4: Configure Continue.dev

Now this is the slightly tricky part.

In the Continue top bar, click the gear icon to open settings

Click Configs in the left tab to open the config panel

Replace everything in the config.yaml file with this configuration:

name: Local Config
version: 1.0.0
schema: v1

models:
  - name: Qwen2.5 Coder 7B
    provider: ollama
    model: qwen2.5-coder:7b

  - name: DeepSeek Coder V2 16B
    provider: ollama
    model: deepseek-coder-v2:16b

  # Small model reserved for tab autocomplete through its role.
  # (config.yaml assigns autocomplete via roles; the separate
  # tabAutocompleteModel key belongs to the older config.json format.)
  - name: Qwen 1.5B Autocomplete
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete

Save the file, then close and reopen VS Code. Click the Continue tab in the sidebar to open it again; you should now see a chat panel in Continue.

Open the models dropdown to check whether your models are listed. If none appear, reload the config: click the Local Config dropdown right above the chat interface, then click Reload.
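If a model still does not show up or fails to respond, one possible cause is a tag in the config that was never pulled. A rough sketch for listing the tags the config refers to, assuming Continue stores the file at ~/.continue/config.yaml (an assumption; check the path shown in the config panel) and that each tag sits on its own model: line; the helper name config_models is hypothetical:

```shell
# Assumption: Continue keeps its local config at this path.
CONFIG="${CONFIG:-$HOME/.continue/config.yaml}"

# Hypothetical helper: prints each unique `model:` value found in a YAML file.
config_models() {
  sed -n 's/^[[:space:]]*model:[[:space:]]*//p' "$1" | sort -u
}

# Only read the file when it exists at the assumed location.
if [ -f "$CONFIG" ]; then
  config_models "$CONFIG"
fi
```

Comparing this output with ollama list shows which tags still need to be pulled.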

Step 5: Test It

In the Continue sidebar, select Qwen2.5 Coder 7B
Turn Agent mode OFF (very important for stability)
Type “Hello” and send.
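If the chat panel stays silent, the same test can be run outside VS Code to tell whether the problem is in Ollama or in Continue. The sketch below assumes the default local address and uses Ollama's /api/generate endpoint with streaming turned off; the helper name generate_payload is hypothetical, and it assumes the model tag and prompt contain no double quotes or backslashes.

```shell
# Hypothetical helper: builds the JSON body for Ollama's /api/generate endpoint.
# Assumption: neither argument contains double quotes or backslashes.
generate_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Only send the request when the Ollama CLI is installed.
if command -v ollama >/dev/null 2>&1; then
  curl -s http://localhost:11434/api/generate \
    -d "$(generate_payload 'qwen2.5-coder:7b' 'Hello')"
  echo
fi
```

A JSON reply containing a response field suggests that Ollama and the model are fine, which points the remaining troubleshooting at the Continue config.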

Common Problems & Fixes (From My Own Struggle)

“No models configured” → Reload config after saving yaml, or restart VS Code.
DeepSeek shows “does not support tools” → Switch to Qwen2.5 7B and turn Agent OFF.
Too slow on MacBook Air → Use the 1.5B model for daily work, keep 7B for serious tasks.
First response is very slow → Normal; the model has to load into memory first. Be patient for the first message.
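For the slow first response, one option is to load the model before opening the chat. This is a sketch under assumptions rather than part of the original setup: it sends a request with no prompt to the default local /api/generate endpoint, which loads the model without generating text, and uses the keep_alive field to ask Ollama to keep it in memory longer. The helper name preload_payload and the 30m duration are illustrative choices.

```shell
# Hypothetical helper: JSON body that loads a model without generating text
# and asks Ollama to keep it in memory for the given duration (e.g. 30m).
preload_payload() {
  printf '{"model": "%s", "keep_alive": "%s"}' "$1" "$2"
}

# Only talk to Ollama when the CLI is installed.
if command -v ollama >/dev/null 2>&1; then
  curl -s http://localhost:11434/api/generate \
    -d "$(preload_payload 'qwen2.5-coder:7b' '30m')" >/dev/null
  # List the models currently loaded in memory.
  ollama ps || echo "Could not query Ollama; is the app running?"
fi
```

Keeping a model loaded uses memory the whole time, so on a MacBook Air a shorter duration may be the better trade-off.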

Model Recommendations

Daily driver: Qwen2.5 Coder 7B (best balance)
Fast autocomplete: Qwen 1.5B
Strong reasoning: DeepSeek Coder V2 16B (when speed is not critical)

I no longer worry about Claude resets or Cursor limit traps. I now have reliable, free, private AI coding assistance running locally in VS Code.

This setup won’t beat the latest Claude Opus on every task, but for 60-70% of my work it’s more than enough, and it spares me the frustration of waiting for a limit reset or paying extra.

Direct Links to Resources

Ollama Download
https://ollama.com/download

VS Code IDE Download
https://code.visualstudio.com/download
