Muhammad Hamid Raza

Posted on May 19

How to Use Claude Code with Ollama for Free (+ 5 Powerful Cloud Models You Need to Try)

#ai #opensource #llm #webdev

You love Claude Code. You hate the API bill. 😅

If you've used Claude Code even moderately, you've probably watched your Anthropic credits disappear faster than a coffee on a Monday morning. Heavy agentic sessions — reading files, editing code, running commands — can rack up $50 to $200+ a month on flagship models. That's a real cost for indie developers and learners.

Here's the good news: you don't need to pay a single dollar to use Claude Code anymore.

Thanks to Ollama's native support for the Anthropic Messages API (added in v0.14.0), you can point Claude Code at free open-source models running either locally on your machine or on Ollama's free cloud tier. Same CLI. Same workflow. Zero Anthropic tokens.

Want to know exactly how to set it all up — step by step, command by command — and which free cloud models are worth your time? Let's go. 🚀

What Is This Setup All About?

Claude Code is Anthropic's terminal-based AI coding agent. It reads your codebase, edits files across multiple locations, runs shell commands, calls tools, and handles multi-step tasks — all from your terminal, driven by natural language.

By default, it talks to Anthropic's API. But there's one small environment variable — ANTHROPIC_BASE_URL — that lets you redirect all those requests to a completely different backend.

Ollama is an open-source tool that lets you run AI models on your own machine. Since v0.14.0 (January 2026), Ollama also exposes an Anthropic-compatible Messages API on localhost:11434. That means Claude Code and Ollama now speak the same language natively — no adapters, no hacks.

The result: Claude Code's full feature set (file editing, tool calls, multi-step reasoning, git integration) running against powerful open-source models at zero cost.

Why This Matters

For most developers, Claude Code was exciting in theory but expensive in practice. A typical heavy session — analyzing a codebase, refactoring multiple files, generating tests — could burn through dollars fast.

This setup changes the math completely:

Local models: completely free, private, works offline, no data leaves your machine
Ollama cloud models (free tier): runs on Ollama's servers, no GPU required, free daily usage with session limits
Same CLI experience: you still type claude in your terminal. Nothing feels different

For learning, building demos, exploring Claude Code's features, and handling everyday coding tasks, these free alternatives are more than capable.

Part 1: Installing Ollama

macOS / Linux

Open your terminal and run:

curl -fsSL https://ollama.com/install.sh | sh

Windows

Download the installer directly from ollama.com/download and run it. Ollama installs as a background service automatically.

Verify Installation

ollama --version

You should see version 0.14.0 or later (0.6.x is current as of mid-2026). This version is required for Anthropic API compatibility.

Start Ollama Server (if not auto-started)

ollama serve

This starts the local Ollama daemon on http://localhost:11434. On macOS, it usually starts automatically as a menu bar app.

Part 2: Essential Ollama Commands

These are the commands you'll use regularly. Learn them once and you'll never forget them.

Pull (Download) a Model

ollama pull <model-name>

Example:

ollama pull qwen3-coder:480b-cloud

For cloud models (with the :cloud suffix), this registers the model reference — no actual weights are downloaded to your disk. Inference runs on Ollama's servers.

Run a Model Interactively

ollama run <model-name>

Example:

ollama run nemotron-3-super:cloud

This opens an interactive chat session in your terminal. Type your message and press Enter.

List All Downloaded / Registered Models

ollama ls

Remove a Model

ollama rm <model-name>

Check Running Models

ollama ps

Show Model Info

ollama show <model-name>

Stop a Running Model

ollama stop <model-name>

Connect Your Ollama Account (Required for Cloud Models)

ollama auth login

This opens a browser tab to ollama.com/connect. Sign in and your credentials are stored locally. All cloud model requests use this automatically.

Launch Claude Code Directly from Ollama

ollama launch claude

This is the magic command. ⚡ It starts Claude Code with Ollama as the backend, auto-sets all required environment variables, and prompts you to pick a model interactively.

You can also specify the model directly:

ollama launch claude --model nemotron-3-super:cloud

Part 3: Installing Claude Code

Claude Code requires Node.js 18+. Check your version first:

node --version

If you don't have Node.js, download it from nodejs.org.

Install Claude Code via npm

npm install -g @anthropic-ai/claude-code

Verify Installation

claude --version

Update Claude Code

npm update -g @anthropic-ai/claude-code

Part 4: Connecting Claude Code to Ollama

This is the three-step configuration. You set three environment variables to redirect Claude Code away from Anthropic's servers and toward your local Ollama instance.

Option A: Set Variables for the Current Session (Quick Test)

macOS / Linux:

export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"

Windows (PowerShell):

$env:ANTHROPIC_AUTH_TOKEN = "ollama"
$env:ANTHROPIC_API_KEY = ""
$env:ANTHROPIC_BASE_URL = "http://localhost:11434"

Option B: Permanent Setup (Recommended)

Add these lines to your ~/.bashrc, ~/.zshrc, or ~/.bash_profile:

export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"

Then reload:

source ~/.zshrc
# or
source ~/.bashrc

Verify the Variables Are Set

printenv | grep ANTHROPIC

You should see all three variables.

Part 5: Running Claude Code with Ollama Models

Once your environment variables are set, start Claude Code like this:

Method 1: Using the `claude` Command Directly

claude --model nemotron-3-super:cloud --allow-dangerously-skip-permissions

Replace the model name with any Ollama-supported model. The --allow-dangerously-skip-permissions flag skips the interactive trust prompt (useful for faster startup during development).

Method 2: Using `ollama launch claude` (The Clean Way)

ollama launch claude

This method is recommended for most users. It:

Auto-sets all environment variables
Lets you pick the model from a list
Handles all Anthropic API routing automatically

With a specific model:

ollama launch claude --model gpt-oss:120b-cloud

With auto-confirm and auto-pull:

ollama launch claude --model qwen3-coder:480b-cloud --yes

The --yes flag automatically pulls the model (if not already registered) and skips all confirmation prompts.

Method 3: Test Your Project Setup

# Create a test project
mkdir my-test-project
cd my-test-project
git init
echo "# My Project" > README.md
git add README.md
git commit -m "initial commit"

# Start Claude Code with Ollama
ollama launch claude --model gemma4:31b-cloud

You'll see the familiar Claude Code interface — but it's now powered by a free open-source model. 🎉

The 5 Free Ollama Cloud Models You Should Know

These are all available on Ollama's platform. Cloud models run on Ollama's servers, and all users get a free tier with daily/session usage limits. No GPU required on your machine.

✅ Verification note: All five models below are confirmed available on Ollama's official model library (ollama.com/library) as :cloud variants. Ollama's free tier provides limited daily cloud usage — heavier models (higher usage levels) consume your quota faster. For unlimited access, a Pro plan ($20/month) is available, but the free tier is genuinely useful for learning and everyday development.

1. `nemotron-3-super:cloud`

NVIDIA's Nemotron 3 Super is a 120B Mixture-of-Experts model that activates only 12B parameters at a time — so you get strong reasoning quality with surprisingly efficient compute. It supports English, French, German, Italian, Japanese, Spanish, and Chinese, making it great for multilingual work and complex multi-agent tasks.

ollama run nemotron-3-super:cloud

2. `gpt-oss:120b-cloud`

OpenAI's open-weight model designed for reasoning, agentic tasks, and developer use cases. The 120B variant has strong tool-use support and thinking capabilities — making it a natural fit for Claude Code's multi-step file editing and shell command workflows.

ollama run gpt-oss:120b-cloud

3. `gemma4:31b-cloud`

Google's Gemma 4 in the 31B size brings vision capabilities alongside solid general reasoning. It's a good all-rounder for chat, code review, and general development tasks. If you want a model that handles both text and images, this is your pick.

ollama run gemma4:31b-cloud

4. `qwen3-vl:235b-cloud`

Alibaba's most powerful vision-language model to date. With 235B parameters, it handles complex vision tasks alongside text, including image understanding, document parsing, and code from screenshots. If you're working on projects involving both visuals and code, this is a compelling option.

ollama run qwen3-vl:235b-cloud

5. `qwen3-coder:480b-cloud`

This is the one developers talk about most. Qwen3-Coder at 480B is specifically optimized for long-context coding and agentic tasks — exactly the kind of work Claude Code does. It supports a 128K context window and is built for multi-file edits and code generation workflows. Heavy model (higher usage level), so pace yourself on the free tier.

ollama run qwen3-coder:480b-cloud

Model Comparison Table

Here's a clear side-by-side comparison of all five models to help you pick the right one for your task:

Model	Parameters	Context Window	Vision	Tool Use	Thinking	Best For	Usage Level
`nemotron-3-super:cloud`	120B MoE (12B active)	128K	❌	✅	❌	Reasoning, multi-agent, multilingual	Level 2
`gpt-oss:120b-cloud`	120B	128K	❌	✅	✅	Agentic tasks, coding, reasoning	Level 2
`gemma4:31b-cloud`	31B	128K	✅	✅	❌	General chat, multimodal, code review	Level 2
`qwen3-vl:235b-cloud`	235B	128K	✅	✅	❌	Vision + code, document parsing	Level 3
`qwen3-coder:480b-cloud`	480B	128K	❌	✅	❌	Code generation, agentic coding	Level 3–4

💡 Quick tip: On the free tier, start with nemotron-3-super:cloud or gemma4:31b-cloud (Level 2) to stretch your free usage further. Use qwen3-coder:480b-cloud when you need serious coding power and have quota available.

Claude Code Commands You Should Know

Once inside a Claude Code session, here are the most useful commands:

Command	What it does
`/help`	Shows all available commands
`/exit`	Exits Claude Code
`/clear`	Clears the conversation history
`/compact`	Compresses conversation to save context
`/model`	Shows the current model in use
`Ctrl+C`	Cancel the current response
`Tab`	Switch between planning and build mode
`Ctrl+P`	Opens options panel

Best Tips for This Setup ✅

Do these:

Always verify your environment variables are set before starting Claude Code — run printenv | grep ANTHROPIC to confirm
Start with ollama launch claude instead of the manual claude command. It's simpler and handles everything automatically
Use qwen3-coder:480b-cloud for serious coding sessions when you have quota. It's optimized for exactly this kind of work
Create a test project first before running Claude Code on real code, so you can verify everything works
Sign into your Ollama account before using any :cloud model — run ollama auth login once and you're done

Avoid these:

Don't skip the ollama auth login step for cloud models — they won't run without it
Don't set ANTHROPIC_API_KEY to a real Anthropic key when using Ollama — leave it empty or set to a dummy value
Don't expect local inference speed to match cloud — on a 16GB RAM machine, local models can be slow. Cloud models are faster for heavy tasks
Don't use cloud models on the free tier for large batch jobs — you'll hit session limits fast. Stick to focused coding sessions

Common Mistakes People Make

1. Forgetting to start the Ollama server

If you see a connection error when Claude Code starts, Ollama's daemon probably isn't running. Fix it:

ollama serve

2. Missing the `ollama auth login` step for cloud models

Cloud models (*:cloud) require you to be signed into your Ollama account. Without this, requests fail. Run ollama auth login once and your credentials are saved automatically.

3. Setting the wrong `ANTHROPIC_BASE_URL`

Some older guides suggest http://localhost:11434/v1. The correct URL for Ollama's Anthropic-compatible endpoint is:

http://localhost:11434

Without the /v1 suffix, unless you're specifically testing the OpenAI-compatible endpoint.

4. Expecting identical Claude behavior

Open-source models follow Claude Code's expected output format well for most tasks, but complex multi-step reasoning chains may behave differently than Anthropic's flagship models. For simple to mid-complexity tasks, they perform great. For highly complex architectural decisions, results may vary.

5. Running out of free cloud quota on heavy models

The Ollama free tier has session and weekly limits. Models like qwen3-coder:480b-cloud are large (Level 3–4 usage), so they consume your quota faster. Be intentional with your requests on the free tier, and use gemma4:31b-cloud for lighter tasks to preserve quota.

Putting It All Together: Full Setup Walkthrough

Here's the complete setup from scratch in one place:

# Step 1: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Step 2: Verify Ollama version (needs 0.14+)
ollama --version

# Step 3: Sign in for cloud model access
ollama auth login

# Step 4: Install Claude Code
npm install -g @anthropic-ai/claude-code

# Step 5: Set environment variables (add to ~/.zshrc for permanent setup)
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"

# Step 6: Launch Claude Code with Ollama (the easy way)
ollama launch claude

# OR with a specific model
ollama launch claude --model nemotron-3-super:cloud

# OR the manual way
claude --model qwen3-coder:480b-cloud --allow-dangerously-skip-permissions

That's it. You're now running Claude Code on free, powerful open-source models. 🎉

Conclusion

The combination of Claude Code and Ollama is genuinely exciting. What used to be an expensive tool reserved for developers with budget for API credits is now accessible to everyone — students, indie developers, learners, and anyone curious about agentic AI coding.

The five cloud models covered here — nemotron-3-super:cloud, gpt-oss:120b-cloud, gemma4:31b-cloud, qwen3-vl:235b-cloud, and qwen3-coder:480b-cloud — are all available on Ollama's free tier and cover a strong range of use cases from coding to vision to multilingual reasoning.

The ollama launch claude command is genuinely the simplest way to start. One command, pick your model, and you're in. No manual environment variable juggling required.

Start free. Learn the workflow. When your usage outgrows the free tier, Ollama's Pro plan at $20/month is still a fraction of what Anthropic's API costs for heavy use.

Happy coding — and may your context windows always be long enough. 🚀

For more dev guides like this, visit hamidrazadev.com — if this post saved you money or time, share it with a developer friend who could use it too. 🙌

What Is This Setup All About?

Why This Matters

Part 1: Installing Ollama

macOS / Linux

Windows

Verify Installation

Start Ollama Server (if not auto-started)

Part 2: Essential Ollama Commands

Pull (Download) a Model

Run a Model Interactively

List All Downloaded / Registered Models

Remove a Model

Check Running Models

Show Model Info

Stop a Running Model

Connect Your Ollama Account (Required for Cloud Models)

Launch Claude Code Directly from Ollama

Part 3: Installing Claude Code

Install Claude Code via npm

Verify Installation

Update Claude Code

Part 4: Connecting Claude Code to Ollama

Option A: Set Variables for the Current Session (Quick Test)

Option B: Permanent Setup (Recommended)

Verify the Variables Are Set

Part 5: Running Claude Code with Ollama Models

Method 1: Using the claude Command Directly

Method 2: Using ollama launch claude (The Clean Way)

Method 3: Test Your Project Setup

The 5 Free Ollama Cloud Models You Should Know

1. nemotron-3-super:cloud

2. gpt-oss:120b-cloud

3. gemma4:31b-cloud

4. qwen3-vl:235b-cloud

5. qwen3-coder:480b-cloud

Model Comparison Table

Claude Code Commands You Should Know

Best Tips for This Setup ✅

Common Mistakes People Make

1. Forgetting to start the Ollama server

2. Missing the ollama auth login step for cloud models

3. Setting the wrong ANTHROPIC_BASE_URL

4. Expecting identical Claude behavior

5. Running out of free cloud quota on heavy models

Putting It All Together: Full Setup Walkthrough

Conclusion

Method 1: Using the `claude` Command Directly

Method 2: Using `ollama launch claude` (The Clean Way)

1. `nemotron-3-super:cloud`

2. `gpt-oss:120b-cloud`

3. `gemma4:31b-cloud`

4. `qwen3-vl:235b-cloud`

5. `qwen3-coder:480b-cloud`

2. Missing the `ollama auth login` step for cloud models

3. Setting the wrong `ANTHROPIC_BASE_URL`