Ayyaz Zafar

Posted on • Originally published at ayyaztech.com

Run Claude Code with a Free Local Model — Qwen 3.5 + Ollama Setup

Claude Code is powerful but costs money. Every prompt burns API tokens and your code is sent to external servers. What if you could run the same workflow with a free local model?

Meet Qwen 3.5

A 27B-parameter model distilled from Claude 4.6 Opus reasoning traces:

  • Beats Claude Sonnet 4.5 on SWE-bench
  • 96.91% HumanEval accuracy
  • 24% less chain-of-thought bloat (faster responses)
  • Fits on a single GPU with 16GB VRAM
  • 300K+ downloads on Hugging Face

Setup

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the model
ollama pull qwen3.5

# Launch Claude Code with local model
claude --model ollama:qwen3.5

One command and you're coding with local AI: the same Claude Code workflow, running on your GPU.
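Before wiring the model into Claude Code, it's worth sanity-checking that Ollama is actually serving it. Ollama exposes an HTTP API on localhost:11434; the sketch below sends a one-off prompt to its `/api/generate` endpoint. This is a minimal illustration, assuming Ollama is running and the `qwen3.5` tag from the pull command above is available locally:

```python
# Sanity-check the local model via Ollama's HTTP API (default port 11434).
# Assumes `ollama pull qwen3.5` has completed and the Ollama server is running.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST the prompt to the local Ollama server and return the response text."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("qwen3.5", "Write a one-line Python hello world."))
```

If this prints a sensible completion, the model is pulled and served, and the `claude --model ollama:qwen3.5` invocation has a working backend to talk to.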

What It Can Do

  • Write code: Clean async Python with error handling and type hints
  • Fix bugs: Reads file, finds bug, explains why, fixes in place
  • Build UIs: Full dark-theme landing page with Tailwind CSS from one prompt

All locally. No API calls. No internet required.
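The bug-fix workflow above boils down to a chat-style exchange with the local model. As an illustration, here is a hedged sketch using Ollama's `/api/chat` endpoint with a system message and a user message carrying the buggy code; the reviewer prompt and the `add` example are made up for demonstration:

```python
# Drive a bug-fix style exchange against the local model via Ollama's
# /api/chat endpoint. Assumes Ollama is serving qwen3.5 on the default port;
# no external API is involved.
import json
import urllib.request

CHAT_URL = "http://localhost:11434/api/chat"

def build_chat(model: str, system: str, user: str) -> dict:
    """Build a non-streaming chat request with a system and a user message."""
    return {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

def chat(model: str, system: str, user: str) -> str:
    """Send the chat request to the local Ollama server and return the reply."""
    payload = json.dumps(build_chat(model, system, user)).encode()
    req = urllib.request.Request(
        CHAT_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

if __name__ == "__main__":
    buggy = "def add(a, b):\n    return a - b"  # hypothetical buggy snippet
    print(chat("qwen3.5", "You are a concise code reviewer.",
               f"Find and fix the bug:\n{buggy}"))
```

The request shape is the same one agentic coding tools build under the hood, so everything from "find the bug" to "explain the fix" happens against localhost.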

Why This Matters

  • Free forever — no API costs, no rate limits
  • Private — your code never leaves your machine
  • Same workflow — identical Claude Code experience
  • Opus-level reasoning — distilled from Claude 4.6 Opus

Qwen 3.5 + Ollama + Claude Code = full agentic AI coding, running locally, free forever.


