DEV Community

Cover image for How to Use Claude Code with Ollama for Free (+ 5 Powerful Cloud Models You Need to Try)
Muhammad Hamid Raza
Muhammad Hamid Raza

Posted on

How to Use Claude Code with Ollama for Free (+ 5 Powerful Cloud Models You Need to Try)

You love Claude Code. You hate the API bill. šŸ˜…

If you've used Claude Code even moderately, you've probably watched your Anthropic credits disappear faster than a coffee on a Monday morning. Heavy agentic sessions — reading files, editing code, running commands — can rack up $50 to $200+ a month on flagship models. That's a real cost for indie developers and learners.

Here's the good news: you don't need to pay a single dollar to use Claude Code anymore.

Thanks to Ollama's native support for the Anthropic Messages API (added in v0.14.0), you can point Claude Code at free open-source models running either locally on your machine or on Ollama's free cloud tier. Same CLI. Same workflow. Zero Anthropic tokens.

Want to know exactly how to set it all up — step by step, command by command — and which free cloud models are worth your time? Let's go. šŸš€


What Is This Setup All About?

Claude Code is Anthropic's terminal-based AI coding agent. It reads your codebase, edits files across multiple locations, runs shell commands, calls tools, and handles multi-step tasks — all from your terminal, driven by natural language.

By default, it talks to Anthropic's API. But there's one small environment variable — ANTHROPIC_BASE_URL — that lets you redirect all those requests to a completely different backend.

Ollama is an open-source tool that lets you run AI models on your own machine. Since v0.14.0 (January 2026), Ollama also exposes an Anthropic-compatible Messages API on localhost:11434. That means Claude Code and Ollama now speak the same language natively — no adapters, no hacks.

The result: Claude Code's full feature set (file editing, tool calls, multi-step reasoning, git integration) running against powerful open-source models at zero cost.


Why This Matters

For most developers, Claude Code was exciting in theory but expensive in practice. A typical heavy session — analyzing a codebase, refactoring multiple files, generating tests — could burn through dollars fast.

This setup changes the math completely:

  • Local models: completely free, private, works offline, no data leaves your machine
  • Ollama cloud models (free tier): runs on Ollama's servers, no GPU required, free daily usage with session limits
  • Same CLI experience: you still type claude in your terminal. Nothing feels different

For learning, building demos, exploring Claude Code's features, and handling everyday coding tasks, these free alternatives are more than capable.


Part 1: Installing Ollama

macOS / Linux

Open your terminal and run:

curl -fsSL https://ollama.com/install.sh | sh
Enter fullscreen mode Exit fullscreen mode

Windows

Download the installer directly from ollama.com/download and run it. Ollama installs as a background service automatically.

Verify Installation

ollama --version
Enter fullscreen mode Exit fullscreen mode

You should see version 0.14.0 or later (0.6.x is current as of mid-2026). This version is required for Anthropic API compatibility.

Start Ollama Server (if not auto-started)

ollama serve
Enter fullscreen mode Exit fullscreen mode

This starts the local Ollama daemon on http://localhost:11434. On macOS, it usually starts automatically as a menu bar app.


Part 2: Essential Ollama Commands

These are the commands you'll use regularly. Learn them once and you'll never forget them.

Pull (Download) a Model

ollama pull <model-name>
Enter fullscreen mode Exit fullscreen mode

Example:

ollama pull qwen3-coder:480b-cloud
Enter fullscreen mode Exit fullscreen mode

For cloud models (with the :cloud suffix), this registers the model reference — no actual weights are downloaded to your disk. Inference runs on Ollama's servers.

Run a Model Interactively

ollama run <model-name>
Enter fullscreen mode Exit fullscreen mode

Example:

ollama run nemotron-3-super:cloud
Enter fullscreen mode Exit fullscreen mode

This opens an interactive chat session in your terminal. Type your message and press Enter.

List All Downloaded / Registered Models

ollama ls
Enter fullscreen mode Exit fullscreen mode

Remove a Model

ollama rm <model-name>
Enter fullscreen mode Exit fullscreen mode

Check Running Models

ollama ps
Enter fullscreen mode Exit fullscreen mode

Show Model Info

ollama show <model-name>
Enter fullscreen mode Exit fullscreen mode

Stop a Running Model

ollama stop <model-name>
Enter fullscreen mode Exit fullscreen mode

Connect Your Ollama Account (Required for Cloud Models)

ollama auth login
Enter fullscreen mode Exit fullscreen mode

This opens a browser tab to ollama.com/connect. Sign in and your credentials are stored locally. All cloud model requests use this automatically.

Launch Claude Code Directly from Ollama

ollama launch claude
Enter fullscreen mode Exit fullscreen mode

This is the magic command. ⚔ It starts Claude Code with Ollama as the backend, auto-sets all required environment variables, and prompts you to pick a model interactively.

You can also specify the model directly:

ollama launch claude --model nemotron-3-super:cloud
Enter fullscreen mode Exit fullscreen mode

Part 3: Installing Claude Code

Claude Code requires Node.js 18+. Check your version first:

node --version
Enter fullscreen mode Exit fullscreen mode

If you don't have Node.js, download it from nodejs.org.

Install Claude Code via npm

npm install -g @anthropic-ai/claude-code
Enter fullscreen mode Exit fullscreen mode

Verify Installation

claude --version
Enter fullscreen mode Exit fullscreen mode

Update Claude Code

npm update -g @anthropic-ai/claude-code
Enter fullscreen mode Exit fullscreen mode

Part 4: Connecting Claude Code to Ollama

This is the three-step configuration. You set three environment variables to redirect Claude Code away from Anthropic's servers and toward your local Ollama instance.

Option A: Set Variables for the Current Session (Quick Test)

macOS / Linux:

export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"
Enter fullscreen mode Exit fullscreen mode

Windows (PowerShell):

$env:ANTHROPIC_AUTH_TOKEN = "ollama"
$env:ANTHROPIC_API_KEY = ""
$env:ANTHROPIC_BASE_URL = "http://localhost:11434"
Enter fullscreen mode Exit fullscreen mode

Option B: Permanent Setup (Recommended)

Add these lines to your ~/.bashrc, ~/.zshrc, or ~/.bash_profile:

export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"
Enter fullscreen mode Exit fullscreen mode

Then reload:

source ~/.zshrc
# or
source ~/.bashrc
Enter fullscreen mode Exit fullscreen mode

Verify the Variables Are Set

printenv | grep ANTHROPIC
Enter fullscreen mode Exit fullscreen mode

You should see all three variables.


Part 5: Running Claude Code with Ollama Models

Once your environment variables are set, start Claude Code like this:

Method 1: Using the claude Command Directly

claude --model nemotron-3-super:cloud --allow-dangerously-skip-permissions
Enter fullscreen mode Exit fullscreen mode

Replace the model name with any Ollama-supported model. The --allow-dangerously-skip-permissions flag skips the interactive trust prompt (useful for faster startup during development).

Method 2: Using ollama launch claude (The Clean Way)

ollama launch claude
Enter fullscreen mode Exit fullscreen mode

This method is recommended for most users. It:

  • Auto-sets all environment variables
  • Lets you pick the model from a list
  • Handles all Anthropic API routing automatically

With a specific model:

ollama launch claude --model gpt-oss:120b-cloud
Enter fullscreen mode Exit fullscreen mode

With auto-confirm and auto-pull:

ollama launch claude --model qwen3-coder:480b-cloud --yes
Enter fullscreen mode Exit fullscreen mode

The --yes flag automatically pulls the model (if not already registered) and skips all confirmation prompts.

Method 3: Test Your Project Setup

# Create a test project
mkdir my-test-project
cd my-test-project
git init
echo "# My Project" > README.md
git add README.md
git commit -m "initial commit"

# Start Claude Code with Ollama
ollama launch claude --model gemma4:31b-cloud
Enter fullscreen mode Exit fullscreen mode

You'll see the familiar Claude Code interface — but it's now powered by a free open-source model. šŸŽ‰


The 5 Free Ollama Cloud Models You Should Know

These are all available on Ollama's platform. Cloud models run on Ollama's servers, and all users get a free tier with daily/session usage limits. No GPU required on your machine.

āœ… Verification note: All five models below are confirmed available on Ollama's official model library (ollama.com/library) as :cloud variants. Ollama's free tier provides limited daily cloud usage — heavier models (higher usage levels) consume your quota faster. For unlimited access, a Pro plan ($20/month) is available, but the free tier is genuinely useful for learning and everyday development.

1. nemotron-3-super:cloud

NVIDIA's Nemotron 3 Super is a 120B Mixture-of-Experts model that activates only 12B parameters at a time — so you get strong reasoning quality with surprisingly efficient compute. It supports English, French, German, Italian, Japanese, Spanish, and Chinese, making it great for multilingual work and complex multi-agent tasks.

ollama run nemotron-3-super:cloud
Enter fullscreen mode Exit fullscreen mode

2. gpt-oss:120b-cloud

OpenAI's open-weight model designed for reasoning, agentic tasks, and developer use cases. The 120B variant has strong tool-use support and thinking capabilities — making it a natural fit for Claude Code's multi-step file editing and shell command workflows.

ollama run gpt-oss:120b-cloud
Enter fullscreen mode Exit fullscreen mode

3. gemma4:31b-cloud

Google's Gemma 4 in the 31B size brings vision capabilities alongside solid general reasoning. It's a good all-rounder for chat, code review, and general development tasks. If you want a model that handles both text and images, this is your pick.

ollama run gemma4:31b-cloud
Enter fullscreen mode Exit fullscreen mode

4. qwen3-vl:235b-cloud

Alibaba's most powerful vision-language model to date. With 235B parameters, it handles complex vision tasks alongside text, including image understanding, document parsing, and code from screenshots. If you're working on projects involving both visuals and code, this is a compelling option.

ollama run qwen3-vl:235b-cloud
Enter fullscreen mode Exit fullscreen mode

5. qwen3-coder:480b-cloud

This is the one developers talk about most. Qwen3-Coder at 480B is specifically optimized for long-context coding and agentic tasks — exactly the kind of work Claude Code does. It supports a 128K context window and is built for multi-file edits and code generation workflows. Heavy model (higher usage level), so pace yourself on the free tier.

ollama run qwen3-coder:480b-cloud
Enter fullscreen mode Exit fullscreen mode

Model Comparison Table

Here's a clear side-by-side comparison of all five models to help you pick the right one for your task:

Model Parameters Context Window Vision Tool Use Thinking Best For Usage Level
nemotron-3-super:cloud 120B MoE (12B active) 128K āŒ āœ… āŒ Reasoning, multi-agent, multilingual Level 2
gpt-oss:120b-cloud 120B 128K āŒ āœ… āœ… Agentic tasks, coding, reasoning Level 2
gemma4:31b-cloud 31B 128K āœ… āœ… āŒ General chat, multimodal, code review Level 2
qwen3-vl:235b-cloud 235B 128K āœ… āœ… āŒ Vision + code, document parsing Level 3
qwen3-coder:480b-cloud 480B 128K āŒ āœ… āŒ Code generation, agentic coding Level 3–4

šŸ’” Quick tip: On the free tier, start with nemotron-3-super:cloud or gemma4:31b-cloud (Level 2) to stretch your free usage further. Use qwen3-coder:480b-cloud when you need serious coding power and have quota available.


Claude Code Commands You Should Know

Once inside a Claude Code session, here are the most useful commands:

Command What it does
/help Shows all available commands
/exit Exits Claude Code
/clear Clears the conversation history
/compact Compresses conversation to save context
/model Shows the current model in use
Ctrl+C Cancel the current response
Tab Switch between planning and build mode
Ctrl+P Opens options panel

Best Tips for This Setup āœ…

Do these:

  • Always verify your environment variables are set before starting Claude Code — run printenv | grep ANTHROPIC to confirm
  • Start with ollama launch claude instead of the manual claude command. It's simpler and handles everything automatically
  • Use qwen3-coder:480b-cloud for serious coding sessions when you have quota. It's optimized for exactly this kind of work
  • Create a test project first before running Claude Code on real code, so you can verify everything works
  • Sign into your Ollama account before using any :cloud model — run ollama auth login once and you're done

Avoid these:

  • Don't skip the ollama auth login step for cloud models — they won't run without it
  • Don't set ANTHROPIC_API_KEY to a real Anthropic key when using Ollama — leave it empty or set to a dummy value
  • Don't expect local inference speed to match cloud — on a 16GB RAM machine, local models can be slow. Cloud models are faster for heavy tasks
  • Don't use cloud models on the free tier for large batch jobs — you'll hit session limits fast. Stick to focused coding sessions

Common Mistakes People Make

1. Forgetting to start the Ollama server

If you see a connection error when Claude Code starts, Ollama's daemon probably isn't running. Fix it:

ollama serve
Enter fullscreen mode Exit fullscreen mode

2. Missing the ollama auth login step for cloud models

Cloud models (*:cloud) require you to be signed into your Ollama account. Without this, requests fail. Run ollama auth login once and your credentials are saved automatically.

3. Setting the wrong ANTHROPIC_BASE_URL

Some older guides suggest http://localhost:11434/v1. The correct URL for Ollama's Anthropic-compatible endpoint is:

http://localhost:11434
Enter fullscreen mode Exit fullscreen mode

Without the /v1 suffix, unless you're specifically testing the OpenAI-compatible endpoint.

4. Expecting identical Claude behavior

Open-source models follow Claude Code's expected output format well for most tasks, but complex multi-step reasoning chains may behave differently than Anthropic's flagship models. For simple to mid-complexity tasks, they perform great. For highly complex architectural decisions, results may vary.

5. Running out of free cloud quota on heavy models

The Ollama free tier has session and weekly limits. Models like qwen3-coder:480b-cloud are large (Level 3–4 usage), so they consume your quota faster. Be intentional with your requests on the free tier, and use gemma4:31b-cloud for lighter tasks to preserve quota.


Putting It All Together: Full Setup Walkthrough

Here's the complete setup from scratch in one place:

# Step 1: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Step 2: Verify Ollama version (needs 0.14+)
ollama --version

# Step 3: Sign in for cloud model access
ollama auth login

# Step 4: Install Claude Code
npm install -g @anthropic-ai/claude-code

# Step 5: Set environment variables (add to ~/.zshrc for permanent setup)
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"

# Step 6: Launch Claude Code with Ollama (the easy way)
ollama launch claude

# OR with a specific model
ollama launch claude --model nemotron-3-super:cloud

# OR the manual way
claude --model qwen3-coder:480b-cloud --allow-dangerously-skip-permissions
Enter fullscreen mode Exit fullscreen mode

That's it. You're now running Claude Code on free, powerful open-source models. šŸŽ‰


Conclusion

The combination of Claude Code and Ollama is genuinely exciting. What used to be an expensive tool reserved for developers with budget for API credits is now accessible to everyone — students, indie developers, learners, and anyone curious about agentic AI coding.

The five cloud models covered here — nemotron-3-super:cloud, gpt-oss:120b-cloud, gemma4:31b-cloud, qwen3-vl:235b-cloud, and qwen3-coder:480b-cloud — are all available on Ollama's free tier and cover a strong range of use cases from coding to vision to multilingual reasoning.

The ollama launch claude command is genuinely the simplest way to start. One command, pick your model, and you're in. No manual environment variable juggling required.

Start free. Learn the workflow. When your usage outgrows the free tier, Ollama's Pro plan at $20/month is still a fraction of what Anthropic's API costs for heavy use.

Happy coding — and may your context windows always be long enough. šŸš€


For more dev guides like this, visit hamidrazadev.com — if this post saved you money or time, share it with a developer friend who could use it too. šŸ™Œ

Top comments (0)