This article was originally published on aifoss.dev
---
title: 'Cline Setup Guide 2026: VS Code Agent with Local Models'
description: 'Set up Cline v3.84 in VS Code with Ollama local models. Covers Plan/Act modes, .clinerules configuration, model recommendations, and when to use cloud.'
pubDate: 'May 22 2026'
tags: ["ai", "coding", "productivity", "llm", "opensource"]
---
Cline is a VS Code extension that executes code changes autonomously: it reads your files, proposes a plan, and, with your approval, writes the edits, runs commands, inspects errors, and iterates until the task is done. As of May 2026, the VS Code extension is at v3.84.0 and the CLI at v3.0.11. The project has 62,000+ GitHub stars, carries an Apache 2.0 license, and counts five million installs across the VS Code Marketplace and Open VSX.
Unlike Continue.dev, which gives you autocomplete and inline suggestions you apply one at a time, Cline takes a task description and does the work. You're approving, not steering. That's a meaningful shift in how you spend time with a codebase, and Cline can run entirely on local models via Ollama, which matters if you're dealing with a private codebase, working offline, or unwilling to pipe your code to external APIs.
How Cline Works
Two modes structure everything:
Plan mode — Cline reads the relevant files, asks clarifying questions if it needs to, and lays out exactly what it's going to do before touching anything. You can push back, redirect, or approve. This is where you catch misunderstood requirements before they become multi-file edits to undo.
Act mode — Cline executes. Every file edit, terminal command, and tool call is shown before it runs. You can approve each step individually or configure auto-approval thresholds for low-risk operations like reading files or running test commands.
The approve-everything default feels slow for the first few tasks. After a few cycles, you realize it's the mechanism that prevents you from shipping a half-executed refactor. Tune auto-approve thresholds in settings once you've calibrated what the model does with your codebase.
There's also YOLO mode — Cline transitions Plan → Act automatically without waiting for your sign-off. Don't start there. Get a feel for what the agent actually does before letting it run unsupervised on production code.
Cline runs inside VS Code, JetBrains, Cursor, Windsurf, and Zed, plus a preview CLI for macOS and Linux. The VS Code extension is by far the most mature.
Installation
Open VS Code and press Ctrl+Shift+X (macOS: Cmd+Shift+X). Search for Cline, confirm the publisher is saoudrizwan, and click Install.
After installation, the Cline icon appears in the left sidebar. Click it and the chat panel opens. No Python environment, no Node version management on your end — the extension bundles what it needs.
Quick Start with a Cloud Model
If you want to validate Cline's behavior before setting up local inference, the fastest path is through Anthropic's API:
- Open Cline settings (gear icon in the Cline panel)
- Set API Provider to Anthropic
- Paste your API key
- Select a current Sonnet model from the dropdown
Give it a scoped task: "Add a --dry-run flag to the CLI entry point that prints what would happen without executing." Watch Plan mode describe the approach, then Act mode carry it out across files.
Running this once with a frontier model gives you a baseline for what "a correct Cline task execution" looks like. That baseline is useful when you're later comparing it against local models and trying to diagnose whether a bad result is model quality or task scope.
Local Models via Ollama
Cline talks to Ollama's local HTTP API the same way it talks to any OpenAI-compatible endpoint. The configuration is minimal.
Step 1: Install Ollama
If Ollama isn't already running, the Ollama 2026 review covers the full setup. The short path:
```bash
# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS (Homebrew)
brew install ollama
```
Start the server:
```bash
ollama serve
```
Ollama binds to http://localhost:11434 by default.
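Before you touch Cline's settings, it helps to know exactly which address to paste into its Base URL field. The sketch below is a hypothetical helper (ollama_base_url is not part of Ollama or Cline). It assumes you follow Ollama's convention of overriding the bind address with the OLLAMA_HOST environment variable, which may be written as a bare host:port without a scheme.

```bash
# Hypothetical helper (not part of Ollama or Cline): print the URL to paste into
# Cline's Base URL field. Assumes OLLAMA_HOST, if set, is either "host:port" or
# a full "http(s)://host:port" URL; otherwise falls back to Ollama's default.
ollama_base_url() {
  local host="${OLLAMA_HOST:-localhost:11434}"
  case "$host" in
    http://*|https://*) ;;              # already has a scheme
    *) host="http://$host" ;;           # bare host:port, so prepend http://
  esac
  printf '%s\n' "${host%/}"             # drop one trailing slash, if present
}

# Usage: curl -s "$(ollama_base_url)/api/tags"   # JSON list of pulled models
```

If that curl call fails to connect, the server usually isn't running; start it with ollama serve and try again.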
Step 2: Pull a Coding Model
```bash
# Fast and capable on modest hardware (~5GB disk, 8GB+ VRAM)
ollama pull qwen2.5-coder:7b

# Strong multi-file reasoning; the default pick if you have ~22GB+ VRAM
ollama pull qwen2.5-coder:32b

# Reasoning-focused; strong on complex bugs but 2-3x slower
ollama pull deepseek-r1:14b
```
Step 3: Configure Cline
In the Cline settings panel:
- API Provider → Ollama
- Base URL → http://localhost:11434
- Model → select the model you pulled (e.g., qwen2.5-coder:32b)
The model list auto-populates from whatever Ollama has downloaded. Save. That's the full local setup.
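Because the dropdown only shows what Ollama has already downloaded, a missing entry usually means the pull never finished. Here is a minimal sketch of checking for a model before you open the settings panel. It assumes the usual ollama list layout: a NAME/ID/SIZE/MODIFIED header row, then one model per line with the name:tag in the first column. has_model is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if MODEL appears in `ollama list` output on stdin.
# Skips the header row and compares the first column (name:tag) exactly.
has_model() {
  awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Usage: ollama list | has_model qwen2.5-coder:32b || ollama pull qwen2.5-coder:32b
```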
Which Local Model to Run
The right answer depends on your hardware and task complexity. Here's where each model fits:
| Model | VRAM / RAM needed | Best for | Weakness |
|---|---|---|---|
| qwen2.5-coder:7b | 8GB VRAM / 16GB RAM | Boilerplate, scaffolding, quick refactors | Weak multi-file reasoning |
| qwen2.5-coder:32b | 22GB VRAM / 32GB RAM | Multi-file edits, reliable tool calls | Needs serious hardware |
| deepseek-r1:14b | 10GB VRAM / 24GB RAM | Complex debugging, step-by-step reasoning | 2–3x slower than Qwen2.5 |
| Qwen3-Coder 30B | 20GB VRAM / 32GB RAM | Agentic workflows, 256K context, best tool use | Large; requires high-end hardware |
| llama4:scout | 64GB+ VRAM or unified memory | Balanced general coding, multimodal tasks | MoE: 17B active, but all 109B params must be loaded; less community-tested for agents |
For developers with a 24GB GPU, qwen2.5-coder:32b is the default pick. (A 16GB card can still run it, but Ollama offloads the layers that don't fit to system RAM, and throughput drops sharply.) It handles multi-file edits without hallucinating tool calls, which matters more for Cline than raw benchmark scores: agentic use requires the model to reliably call read_file and write_to_file in the right sequence.
On Apple Silicon (M3 Pro / M3 Max or newer), qwen2.5-coder:32b at Q4_K_M runs at 20–30 tokens/sec thanks to unified memory. On NVIDIA, you need roughly 22GB VRAM for the 32B variant at Q4_K_M quantization.
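A rough sanity check on that 22GB figure: quantized weights take about parameter count times average bits per weight, divided by 8, in bytes. The sketch below assumes Q4_K_M averages roughly 4.85 bits per weight. That is an approximation, and the real value varies by model. It also uses the nominal 32B parameter count. est_weights_gb is a hypothetical helper.

```bash
# Hypothetical helper: approximate size of quantized weights in GB (1e9 bytes).
# Args: parameter count in billions, average bits per weight (both assumptions).
est_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Usage: est_weights_gb 32 4.85   # nominal 32B model at ~Q4_K_M
```

Under those assumptions the weights alone come to about 19GB. The KV cache and runtime overhead take up the rest of a ~22GB budget, and the KV cache grows with the context length you configure.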
If your machine has an 8GB GPU, use qwen2.5-coder:7b for contained tasks and switch to a cloud provider (OpenRouter, Anthropic) for anything requiring coherent reasoning across a large file tree.
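The hardware guidance above can be condensed into a small decision helper. This is a hypothetical sketch, not a Cline or Ollama feature. It takes GPU memory in whole gigabytes and applies the table's 8GB and 22GB thresholds. Below the minimum it returns "cloud". It ignores Apple Silicon unified memory and the alternative models in the table.

```bash
# Hypothetical helper encoding the table's VRAM thresholds (GPU memory in whole GB).
# Simplified: ignores unified memory and the deepseek/Qwen3/Llama alternatives.
pick_model() {
  local vram_gb="$1"
  if [ "$vram_gb" -ge 22 ]; then
    echo "qwen2.5-coder:32b"   # multi-file edits, reliable tool calls
  elif [ "$vram_gb" -ge 8 ]; then
    echo "qwen2.5-coder:7b"    # contained tasks; go to cloud for large file trees
  else
    echo "cloud"               # below the table's minimum: use a hosted provider
  fi
}

# Usage: model=$(pick_model 24); [ "$model" = cloud ] || ollama pull "$model"
```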
Qwen3-Coder 30B is worth trying if you have the hardware. It was tuned specifically for agentic workflows — it understands tool-use sequences rather than just text generation, which directly benefits how Cline chains file reads and writes.
Configuring Cline for Your Codebase
This is where Cline's setup diverges from most coding tools. The .clinerules/ directory in your project root is the mechanism for giving Cline persistent, project-scoped instructions. Think of it as a version-controlled system prompt that the agent itself can also edit.
Create .clinerules/project-rules.md:
```markdown
# Project Rules

## Tech Stack
- TypeScript, Node.js 22, Vitest for tests
- All database access through the `db/` module — never raw SQL elsewhere
- Prefer functional patterns; avoid class hierarchies unless necessary

## Style
- camelCase variables, PascalCase types/interfaces, UPPER_SNAKE_CASE constants
- No `any` types without a comment explaining why

## Testing
- All new functions need a unit test in the co-located `__tests__/` directory
- Run `pnpm test` before declaring a task complete

## Branching
- Never commit directly to main
- Branch name format: `feat/short-description` or `fix/short-description`
```
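Rules in .clinerules/ are instructions the model reads, not checks Cline enforces, so the model can still get a rule like the branch-name format wrong. If you want a hard guarantee, you can back the rule with a small script that you run yourself, for example from a git hook. The sketch below assumes the short-description part is lowercase letters and digits joined by hyphens. valid_branch is a hypothetical name.

```bash
# Hypothetical check for the Branching rule: feat/... or fix/... followed by
# lowercase words joined by hyphens (the word format is an assumption).
valid_branch() {
  printf '%s\n' "$1" | grep -Eq '^(feat|fix)/[a-z0-9]+(-[a-z0-9]+)*$'
}

# Usage: valid_branch "$(git rev-parse --abbrev-ref HEAD)" || echo "bad branch name" >&2
```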
Keep each rule file under 150 lines. For separate concerns — architecture, style, testing, deployment — split into separate files: 01-architecture.md, 02-style.md, and so on. Cline processes all .md and .txt files in .clinerules/ and merges them into a single context block.
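The following sketch gives a rough preview of the merged context and flags files that have outgrown the 150-line guideline. It concatenates the rule files in lexical order. This is an approximation under stated assumptions, not Cline's actual merge logic. The filename ordering, the blank-line separator, and the helper name merge_clinerules are all assumptions.

```bash
# Hypothetical approximation of the .clinerules/ merge; NOT Cline's actual code.
# Assumptions: files are read in lexical (glob) order, only .md and .txt count,
# and a blank line separates files. Merged text goes to stdout, warnings to stderr.
merge_clinerules() {
  local dir="${1:-.clinerules}" f n
  for f in "$dir"/*; do
    [ -f "$f" ] || continue             # skip subdirectories and unmatched globs
    case "$f" in
      *.md|*.txt) ;;                    # rule files
      *) continue ;;                    # anything else is ignored
    esac
    n=$(awk 'END { print NR }' "$f")    # line count, even without a final newline
    if [ "$n" -ge 150 ]; then
      echo "warning: $f has $n lines (guideline: under 150)" >&2
    fi
    cat "$f"
    echo                                # blank line between files
  done
}

# Usage: merge_clinerules .clinerules > /tmp/merged-rules.md
```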
Workspace rules take precedence over global rules when they conflict. Global rules (in your VS Code settings directory) work well for personal preferences you want across every project — indentation style, tool call verbosity, thing