DEV Community

Jerry Tian
Jerry Tian

Posted on • Originally published at tianjerry.com

Run Claude Code For Free (for Fun and Profit)

Uber blew through its entire 2026 IT budget on AI in 4 months. The State of FinOps Report now lists "FinOps for AI" as the number one priority for 2026 — surpassing traditional cloud cost optimization for the first time. And in typical fighting-an-AI-problem-with-more-AI fashion, Google announced a FinOps Explainability agent at Cloud Next '26 whose entire job is to autonomously investigate why your other AI is costing so much. They also shipped Spend Caps that literally pause your API traffic when the budget runs out.

When a hyperscaler builds a product specifically to help you stop spending money on their platform, you know the burn rate has gotten out of control.

The default for AI-powered dev tools is a $20–200/month subscription piped to someone else's servers. Most developers don't question it. They sign up, hand over a credit card, and start streaming every keystroke to a data center in Virginia. But the silicon you already own can run surprisingly capable models. And the best agentic coding tool — Claude Code — already speaks the protocol you need to point it at a local model or a free cloud endpoint instead.

Your Code Doesn't Need a PhD

Most coding tasks aren't "explain quantum mechanics." They're summarize, classify, extract, rewrite, refactor. Local models handle these well. A 9B parameter model running on your laptop can rubber-duck a bug, suggest a refactor, and scaffold a component without ever touching a network socket.

As Brad Taunt put it: you took a UX feature and turned it into a distributed system that costs you money. The same logic applies to dev tooling. Every keystroke going to a remote API means latency, data exposure, vendor lock-in, and billing surprises. You're paying to solve problems your own hardware can solve.

The Setup Is Two Scripts

Here's the key insight: Claude Code doesn't care where the model lives. It speaks an OpenAI-compatible protocol. You can point it at a cloud endpoint, a local Ollama instance, or a free tier — the agentic scaffolding (tool use, file editing, shell execution, git awareness) stays the same.

Free cloud models via OpenRouter:

#!/bin/bash
MODEL="openrouter/auto:free"

export ANTHROPIC_DEFAULT_HAIKU_MODEL="$MODEL"
export ANTHROPIC_DEFAULT_SONNET_MODEL="$MODEL"
export ANTHROPIC_DEFAULT_OPUS_MODEL="$MODEL"
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-..."  # your OpenRouter API key
export ANTHROPIC_API_KEY=""
claude --model "$MODEL"
Enter fullscreen mode Exit fullscreen mode

Save this as claude-openrouter somewhere on your PATH (/usr/local/bin, ~/.local/bin, etc.), chmod +x it, and you can launch it from anywhere with a single command.

Local models via Ollama:

#!/bin/bash
MODEL="gemma4:31b"

export ANTHROPIC_DEFAULT_HAIKU_MODEL="$MODEL"
export ANTHROPIC_DEFAULT_SONNET_MODEL="$MODEL"
export ANTHROPIC_DEFAULT_OPUS_MODEL="$MODEL"
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
ollama pull "$MODEL"
claude --model "$MODEL"
Enter fullscreen mode Exit fullscreen mode

Save this one as claude-ollama. Zero cost. Zero network. Zero privacy concerns. The model runs on your machine, the data stays on your machine, and you can work on an airplane.

chmod +x claude-openrouter claude-ollama
mv claude-openrouter claude-ollama /usr/local/bin/
Enter fullscreen mode Exit fullscreen mode

Now you have two commands: claude-openrouter for free cloud inference, claude-ollama for fully local. Pick whichever fits the moment.

Why Claude Code and Not the Alternatives

Johanna Larsson recently documented her experience running local models with Pi and OpenCode. Both work, but both have friction.

Pi is powerful, but you can end up spending more time tuning the harness than doing the work. OpenCode is promising, but still rough around edge cases.

Claude Code is the more mature agentic scaffolding. Battle-tested tool use, file editing, shell execution, git awareness, and context management. The critical difference: you get the best tool interface regardless of which model you choose. The tool is the constant. The model is the variable. Swap in Gemma 4 locally, swap in a free OpenRouter model on the road, swap in Sonnet when you need the heavy artillery. The workflow stays identical.

When Free Catches Up to Frontier

Here's the prediction that should make investors nervous: the gap between open models and frontier models is shrinking on a predictable curve. Look at the OpenRouter rankings — open and non-frontier models are already taking serious share on OpenRouter:

  • Tencent GLM / Hy3 Preview — 2.68T tokens/week (+12%)
  • Moonshot Kimi K2.6 (Qwen family) — 1.61T tokens/week (+11%)
  • DeepSeek V4 Flash — 1.11T tokens/week (+58%)
  • Google Gemma 3 Flash — 1.07T tokens/week (+11%)
  • DeepSeek V3.2 — 868B tokens/week (+4%)
  • DeepSeek V4 Pro — 816B tokens/week (+99%)
  • MiniMax M2.7 — 745B tokens/week (+2%)

DeepSeek, MiniMax, Tencent, and Qwen-family models are already eating Claude and GPT's market share on token volume. The business risk isn't that open models become better than frontier models. It's that they become good enough for high-volume work. Each generation closes the distance faster than the last. What Opus 4.7 can do today, an open model running on your laptop will do in 12–18 months. Maybe less.

When that happens, the pricing model of OpenAI and Anthropic collapses. Their valuations assume the moat is intelligence itself. But intelligence is commoditizing faster than any technology in history. Every architecture gets open-sourced. Every training trick leaks. And here's the thing — most tasks are overserved by frontier models anyway. You wouldn't hire Einstein to be your restaurant's chef. You don't need a 200 IQ model to rename a variable, write a unit test, or scaffold a React component.

Imagine in the not-so-distant future, an open model catches up to Opus 4.7 capability. You run it locally, or hit it through a free endpoint. The willingness to pay $200/month evaporates overnight for most developers.

The Bottom Line

We should be investing in local model ecosystems — not as a hobby, but as infrastructure. Open models as a subsystem in your workflow saves on token cost, sure. But the deeper reason is structural. The tech industry has a tendency to concentrate power into fewer and fewer hands. The lesson of Lord of the Rings was not that the wrong person had the ring. It was that the ring should not exist. We've seen this movie before with cloud, with social, with search.

AI doesn't have to follow the same script. The models can run on your hardware. The weights can be open. The inference can be distributed. No single company needs to be the gatekeeper between you and your own development workflow.

Two scripts. Zero dollars. Full agentic coding. Keep the ring distributed.

Originally published at tianjerry.com.

Top comments (0)