You love Claude Code. You hate the API bill. š
If you've used Claude Code even moderately, you've probably watched your Anthropic credits disappear faster than a coffee on a Monday morning. Heavy agentic sessions ā reading files, editing code, running commands ā can rack up $50 to $200+ a month on flagship models. That's a real cost for indie developers and learners.
Here's the good news: you don't need to pay a single dollar to use Claude Code anymore.
Thanks to Ollama's native support for the Anthropic Messages API (added in v0.14.0), you can point Claude Code at free open-source models running either locally on your machine or on Ollama's free cloud tier. Same CLI. Same workflow. Zero Anthropic tokens.
Want to know exactly how to set it all up ā step by step, command by command ā and which free cloud models are worth your time? Let's go. š
What Is This Setup All About?
Claude Code is Anthropic's terminal-based AI coding agent. It reads your codebase, edits files across multiple locations, runs shell commands, calls tools, and handles multi-step tasks ā all from your terminal, driven by natural language.
By default, it talks to Anthropic's API. But there's one small environment variable ā ANTHROPIC_BASE_URL ā that lets you redirect all those requests to a completely different backend.
Ollama is an open-source tool that lets you run AI models on your own machine. Since v0.14.0 (January 2026), Ollama also exposes an Anthropic-compatible Messages API on localhost:11434. That means Claude Code and Ollama now speak the same language natively ā no adapters, no hacks.
The result: Claude Code's full feature set (file editing, tool calls, multi-step reasoning, git integration) running against powerful open-source models at zero cost.
Why This Matters
For most developers, Claude Code was exciting in theory but expensive in practice. A typical heavy session ā analyzing a codebase, refactoring multiple files, generating tests ā could burn through dollars fast.
This setup changes the math completely:
- Local models: completely free, private, works offline, no data leaves your machine
- Ollama cloud models (free tier): runs on Ollama's servers, no GPU required, free daily usage with session limits
-
Same CLI experience: you still type
claudein your terminal. Nothing feels different
For learning, building demos, exploring Claude Code's features, and handling everyday coding tasks, these free alternatives are more than capable.
Part 1: Installing Ollama
macOS / Linux
Open your terminal and run:
curl -fsSL https://ollama.com/install.sh | sh
Windows
Download the installer directly from ollama.com/download and run it. Ollama installs as a background service automatically.
Verify Installation
ollama --version
You should see version 0.14.0 or later (0.6.x is current as of mid-2026). This version is required for Anthropic API compatibility.
Start Ollama Server (if not auto-started)
ollama serve
This starts the local Ollama daemon on http://localhost:11434. On macOS, it usually starts automatically as a menu bar app.
Part 2: Essential Ollama Commands
These are the commands you'll use regularly. Learn them once and you'll never forget them.
Pull (Download) a Model
ollama pull <model-name>
Example:
ollama pull qwen3-coder:480b-cloud
For cloud models (with the :cloud suffix), this registers the model reference ā no actual weights are downloaded to your disk. Inference runs on Ollama's servers.
Run a Model Interactively
ollama run <model-name>
Example:
ollama run nemotron-3-super:cloud
This opens an interactive chat session in your terminal. Type your message and press Enter.
List All Downloaded / Registered Models
ollama ls
Remove a Model
ollama rm <model-name>
Check Running Models
ollama ps
Show Model Info
ollama show <model-name>
Stop a Running Model
ollama stop <model-name>
Connect Your Ollama Account (Required for Cloud Models)
ollama auth login
This opens a browser tab to ollama.com/connect. Sign in and your credentials are stored locally. All cloud model requests use this automatically.
Launch Claude Code Directly from Ollama
ollama launch claude
This is the magic command. ā” It starts Claude Code with Ollama as the backend, auto-sets all required environment variables, and prompts you to pick a model interactively.
You can also specify the model directly:
ollama launch claude --model nemotron-3-super:cloud
Part 3: Installing Claude Code
Claude Code requires Node.js 18+. Check your version first:
node --version
If you don't have Node.js, download it from nodejs.org.
Install Claude Code via npm
npm install -g @anthropic-ai/claude-code
Verify Installation
claude --version
Update Claude Code
npm update -g @anthropic-ai/claude-code
Part 4: Connecting Claude Code to Ollama
This is the three-step configuration. You set three environment variables to redirect Claude Code away from Anthropic's servers and toward your local Ollama instance.
Option A: Set Variables for the Current Session (Quick Test)
macOS / Linux:
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"
Windows (PowerShell):
$env:ANTHROPIC_AUTH_TOKEN = "ollama"
$env:ANTHROPIC_API_KEY = ""
$env:ANTHROPIC_BASE_URL = "http://localhost:11434"
Option B: Permanent Setup (Recommended)
Add these lines to your ~/.bashrc, ~/.zshrc, or ~/.bash_profile:
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"
Then reload:
source ~/.zshrc
# or
source ~/.bashrc
Verify the Variables Are Set
printenv | grep ANTHROPIC
You should see all three variables.
Part 5: Running Claude Code with Ollama Models
Once your environment variables are set, start Claude Code like this:
Method 1: Using the claude Command Directly
claude --model nemotron-3-super:cloud --allow-dangerously-skip-permissions
Replace the model name with any Ollama-supported model. The --allow-dangerously-skip-permissions flag skips the interactive trust prompt (useful for faster startup during development).
Method 2: Using ollama launch claude (The Clean Way)
ollama launch claude
This method is recommended for most users. It:
- Auto-sets all environment variables
- Lets you pick the model from a list
- Handles all Anthropic API routing automatically
With a specific model:
ollama launch claude --model gpt-oss:120b-cloud
With auto-confirm and auto-pull:
ollama launch claude --model qwen3-coder:480b-cloud --yes
The --yes flag automatically pulls the model (if not already registered) and skips all confirmation prompts.
Method 3: Test Your Project Setup
# Create a test project
mkdir my-test-project
cd my-test-project
git init
echo "# My Project" > README.md
git add README.md
git commit -m "initial commit"
# Start Claude Code with Ollama
ollama launch claude --model gemma4:31b-cloud
You'll see the familiar Claude Code interface ā but it's now powered by a free open-source model. š
The 5 Free Ollama Cloud Models You Should Know
These are all available on Ollama's platform. Cloud models run on Ollama's servers, and all users get a free tier with daily/session usage limits. No GPU required on your machine.
ā Verification note: All five models below are confirmed available on Ollama's official model library (
ollama.com/library) as:cloudvariants. Ollama's free tier provides limited daily cloud usage ā heavier models (higher usage levels) consume your quota faster. For unlimited access, a Pro plan ($20/month) is available, but the free tier is genuinely useful for learning and everyday development.
1. nemotron-3-super:cloud
NVIDIA's Nemotron 3 Super is a 120B Mixture-of-Experts model that activates only 12B parameters at a time ā so you get strong reasoning quality with surprisingly efficient compute. It supports English, French, German, Italian, Japanese, Spanish, and Chinese, making it great for multilingual work and complex multi-agent tasks.
ollama run nemotron-3-super:cloud
2. gpt-oss:120b-cloud
OpenAI's open-weight model designed for reasoning, agentic tasks, and developer use cases. The 120B variant has strong tool-use support and thinking capabilities ā making it a natural fit for Claude Code's multi-step file editing and shell command workflows.
ollama run gpt-oss:120b-cloud
3. gemma4:31b-cloud
Google's Gemma 4 in the 31B size brings vision capabilities alongside solid general reasoning. It's a good all-rounder for chat, code review, and general development tasks. If you want a model that handles both text and images, this is your pick.
ollama run gemma4:31b-cloud
4. qwen3-vl:235b-cloud
Alibaba's most powerful vision-language model to date. With 235B parameters, it handles complex vision tasks alongside text, including image understanding, document parsing, and code from screenshots. If you're working on projects involving both visuals and code, this is a compelling option.
ollama run qwen3-vl:235b-cloud
5. qwen3-coder:480b-cloud
This is the one developers talk about most. Qwen3-Coder at 480B is specifically optimized for long-context coding and agentic tasks ā exactly the kind of work Claude Code does. It supports a 128K context window and is built for multi-file edits and code generation workflows. Heavy model (higher usage level), so pace yourself on the free tier.
ollama run qwen3-coder:480b-cloud
Model Comparison Table
Here's a clear side-by-side comparison of all five models to help you pick the right one for your task:
| Model | Parameters | Context Window | Vision | Tool Use | Thinking | Best For | Usage Level |
|---|---|---|---|---|---|---|---|
nemotron-3-super:cloud |
120B MoE (12B active) | 128K | ā | ā | ā | Reasoning, multi-agent, multilingual | Level 2 |
gpt-oss:120b-cloud |
120B | 128K | ā | ā | ā | Agentic tasks, coding, reasoning | Level 2 |
gemma4:31b-cloud |
31B | 128K | ā | ā | ā | General chat, multimodal, code review | Level 2 |
qwen3-vl:235b-cloud |
235B | 128K | ā | ā | ā | Vision + code, document parsing | Level 3 |
qwen3-coder:480b-cloud |
480B | 128K | ā | ā | ā | Code generation, agentic coding | Level 3ā4 |
š” Quick tip: On the free tier, start with
nemotron-3-super:cloudorgemma4:31b-cloud(Level 2) to stretch your free usage further. Useqwen3-coder:480b-cloudwhen you need serious coding power and have quota available.
Claude Code Commands You Should Know
Once inside a Claude Code session, here are the most useful commands:
| Command | What it does |
|---|---|
/help |
Shows all available commands |
/exit |
Exits Claude Code |
/clear |
Clears the conversation history |
/compact |
Compresses conversation to save context |
/model |
Shows the current model in use |
Ctrl+C |
Cancel the current response |
Tab |
Switch between planning and build mode |
Ctrl+P |
Opens options panel |
Best Tips for This Setup ā
Do these:
- Always verify your environment variables are set before starting Claude Code ā run
printenv | grep ANTHROPICto confirm - Start with
ollama launch claudeinstead of the manualclaudecommand. It's simpler and handles everything automatically - Use
qwen3-coder:480b-cloudfor serious coding sessions when you have quota. It's optimized for exactly this kind of work - Create a test project first before running Claude Code on real code, so you can verify everything works
- Sign into your Ollama account before using any
:cloudmodel ā runollama auth loginonce and you're done
Avoid these:
- Don't skip the
ollama auth loginstep for cloud models ā they won't run without it - Don't set
ANTHROPIC_API_KEYto a real Anthropic key when using Ollama ā leave it empty or set to a dummy value - Don't expect local inference speed to match cloud ā on a 16GB RAM machine, local models can be slow. Cloud models are faster for heavy tasks
- Don't use cloud models on the free tier for large batch jobs ā you'll hit session limits fast. Stick to focused coding sessions
Common Mistakes People Make
1. Forgetting to start the Ollama server
If you see a connection error when Claude Code starts, Ollama's daemon probably isn't running. Fix it:
ollama serve
2. Missing the ollama auth login step for cloud models
Cloud models (*:cloud) require you to be signed into your Ollama account. Without this, requests fail. Run ollama auth login once and your credentials are saved automatically.
3. Setting the wrong ANTHROPIC_BASE_URL
Some older guides suggest http://localhost:11434/v1. The correct URL for Ollama's Anthropic-compatible endpoint is:
http://localhost:11434
Without the /v1 suffix, unless you're specifically testing the OpenAI-compatible endpoint.
4. Expecting identical Claude behavior
Open-source models follow Claude Code's expected output format well for most tasks, but complex multi-step reasoning chains may behave differently than Anthropic's flagship models. For simple to mid-complexity tasks, they perform great. For highly complex architectural decisions, results may vary.
5. Running out of free cloud quota on heavy models
The Ollama free tier has session and weekly limits. Models like qwen3-coder:480b-cloud are large (Level 3ā4 usage), so they consume your quota faster. Be intentional with your requests on the free tier, and use gemma4:31b-cloud for lighter tasks to preserve quota.
Putting It All Together: Full Setup Walkthrough
Here's the complete setup from scratch in one place:
# Step 1: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Step 2: Verify Ollama version (needs 0.14+)
ollama --version
# Step 3: Sign in for cloud model access
ollama auth login
# Step 4: Install Claude Code
npm install -g @anthropic-ai/claude-code
# Step 5: Set environment variables (add to ~/.zshrc for permanent setup)
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"
# Step 6: Launch Claude Code with Ollama (the easy way)
ollama launch claude
# OR with a specific model
ollama launch claude --model nemotron-3-super:cloud
# OR the manual way
claude --model qwen3-coder:480b-cloud --allow-dangerously-skip-permissions
That's it. You're now running Claude Code on free, powerful open-source models. š
Conclusion
The combination of Claude Code and Ollama is genuinely exciting. What used to be an expensive tool reserved for developers with budget for API credits is now accessible to everyone ā students, indie developers, learners, and anyone curious about agentic AI coding.
The five cloud models covered here ā nemotron-3-super:cloud, gpt-oss:120b-cloud, gemma4:31b-cloud, qwen3-vl:235b-cloud, and qwen3-coder:480b-cloud ā are all available on Ollama's free tier and cover a strong range of use cases from coding to vision to multilingual reasoning.
The ollama launch claude command is genuinely the simplest way to start. One command, pick your model, and you're in. No manual environment variable juggling required.
Start free. Learn the workflow. When your usage outgrows the free tier, Ollama's Pro plan at $20/month is still a fraction of what Anthropic's API costs for heavy use.
Happy coding ā and may your context windows always be long enough. š
For more dev guides like this, visit hamidrazadev.com ā if this post saved you money or time, share it with a developer friend who could use it too. š
Top comments (0)