If you've wanted to try Claude Code but balked at the $20 Pro subscription or API credit commitment, there's an open-source proxy worth knowing about: free-claude-code.
It's a small FastAPI server that sits on localhost:8082. You set `ANTHROPIC_BASE_URL=http://localhost:8082`, and Claude Code's API calls get rerouted to whatever backend you've mapped. The proxy translates in both directions between the Anthropic Messages SSE stream Claude Code expects and whatever the backend speaks, whether OpenAI-style chat completions or Anthropic Messages.
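To make that concrete, here's a minimal sketch of the translation pattern, written by me rather than taken from the project (`BACKEND_URL` is a hypothetical stand-in, and the real proxy also handles streaming, tool calls, and system prompts):

```python
import os

import httpx
from fastapi import FastAPI, Request

app = FastAPI()
# Hypothetical OpenAI-compatible backend (e.g. a local LM Studio server).
BACKEND_URL = os.environ.get("BACKEND_URL", "http://localhost:1234/v1")

def flatten(content) -> str:
    # Anthropic message content may be a string or a list of typed blocks;
    # OpenAI-style chat expects a plain string here.
    if isinstance(content, str):
        return content
    return "".join(block.get("text", "") for block in content)

@app.post("/v1/messages")
async def messages(request: Request):
    body = await request.json()
    openai_payload = {
        "model": body["model"],
        "messages": [
            {"role": m["role"], "content": flatten(m["content"])}
            for m in body["messages"]
        ],
        "max_tokens": body.get("max_tokens", 1024),
    }
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(f"{BACKEND_URL}/chat/completions", json=openai_payload)
    choice = resp.json()["choices"][0]
    # Wrap the completion back into Anthropic's Messages response shape.
    return {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": choice["message"]["content"]}],
        "stop_reason": "end_turn",
    }
```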
Why it's interesting
Two reasons.
First, the backends. NVIDIA NIM gives you 40 req/min for free with just an nvapi- key. OpenRouter has free models like DeepSeek R1. LM Studio, llama.cpp, and Ollama run locally with no rate limits at all.
Second, the model tier mapping. Claude Code internally picks between Opus/Sonnet/Haiku based on task complexity. The proxy intercepts that decision and routes each tier to a different backend you choose.
```bash
MODEL_OPUS="nvidia_nim/moonshotai/kimi-k2.5"
MODEL_SONNET="open_router/deepseek/deepseek-r1-0528:free"
MODEL_HAIKU="lmstudio/unsloth/GLM-4.7-Flash-GGUF"
MODEL="nvidia_nim/z-ai/glm4.7"
```
Heavy reasoning → Kimi K2.5 on NIM. Regular work → free DeepSeek R1 on OpenRouter. Fast responses → local LM Studio. Total cost: $0. And critically, it preserves Claude Code's own tier-routing logic, which is the part you can't easily replicate by manually configuring a single model.
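Each of those values is a provider prefix plus a model id. A rough sketch of the resolution step (my illustration; the base URLs are the providers' standard OpenAI-compatible endpoints, but the project's actual lookup may differ):

```python
import os

# Assumed provider -> OpenAI-compatible base URL table (illustrative).
PROVIDERS = {
    "nvidia_nim": "https://integrate.api.nvidia.com/v1",
    "open_router": "https://openrouter.ai/api/v1",
    "lmstudio": "http://localhost:1234/v1",
}

def resolve(tier: str) -> tuple[str, str]:
    """Map a tier name like 'OPUS' to (backend base URL, model id)."""
    value = os.environ[f"MODEL_{tier}"]    # e.g. "nvidia_nim/moonshotai/kimi-k2.5"
    provider, model = value.split("/", 1)  # first segment picks the backend
    return PROVIDERS[provider], model

# resolve("OPUS") -> ("https://integrate.api.nvidia.com/v1", "moonshotai/kimi-k2.5")
```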
Built-in features that aren't obvious
- Thinking token conversion: `<think>` tags and `reasoning_content` from reasoning models get converted to Claude's native thinking blocks, so you don't lose the reasoning trace (see the sketch after this list).
- Heuristic tool parser: Some free models emit tool calls as plain text. The proxy parses them back into structured `tool_use` blocks. Imperfect but helpful.
- Subagent control: It intercepts Claude Code's Task tool and forces `run_in_background=False`. Without this, free-tier quotas can disappear in seconds when subagents spawn in parallel.
- Smart rate limiting: Rolling-window throttling and 429 exponential backoff are baked in.
- Discord/Telegram bots: Optional. Tree-based threading, session persistence, voice note transcription. You can trigger coding sessions remotely.
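As a sketch of what the thinking-token conversion involves (assumed behavior based on the feature description, not the project's actual parser):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def to_content_blocks(raw: str) -> list[dict]:
    """Split a model's raw text into a thinking block plus the visible answer."""
    blocks = []
    match = THINK_RE.search(raw)
    if match:
        # Preserve the reasoning trace as a Claude-style thinking block.
        blocks.append({"type": "thinking", "thinking": match.group(1).strip()})
        raw = THINK_RE.sub("", raw)
    blocks.append({"type": "text", "text": raw.strip()})
    return blocks
```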
Setup
```bash
# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.14

# 2. Clone and configure
git clone https://github.com/Alishahryar1/free-claude-code.git
cd free-claude-code
cp .env.example .env
# Edit .env: add NVIDIA_NIM_API_KEY=nvapi-xxx

# 3. Run the proxy
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```
Then in another terminal:
```bash
ANTHROPIC_AUTH_TOKEN="freecc" \
ANTHROPIC_BASE_URL="http://localhost:8082" \
claude
```
ANTHROPIC_AUTH_TOKEN can be any string. The proxy doesn't validate it.
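If you want to sanity-check the proxy before wiring up Claude Code, a direct request works (my example; the model id is just what a real Anthropic client would send, and the proxy remaps it according to your `MODEL_*` config):

```python
import httpx

resp = httpx.post(
    "http://localhost:8082/v1/messages",
    headers={"x-api-key": "freecc", "anthropic-version": "2023-06-01"},
    json={
        "model": "claude-sonnet-4-20250514",  # remapped to your Sonnet-tier backend
        "max_tokens": 64,
        "messages": [{"role": "user", "content": "Say hi in five words."}],
    },
    timeout=120,
)
print(resp.status_code, resp.json())
```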
For convenience, wrap it in a shell function:
```bash
fcc() {
  ANTHROPIC_AUTH_TOKEN="freecc" \
  ANTHROPIC_BASE_URL="http://localhost:8082" \
  claude "$@"
}
```
There's also claude-pick, an fzf-based interactive selector if you don't want to keep editing .env to swap models.
Honest tradeoffs
This isn't a free Claude Code. It's a way to route Claude Code's interface to different backends, with the quality and reliability tradeoffs that implies.
- Tool use accuracy: The hardest part. Anthropic's models are tuned heavily for accurate tool calling. Free models are usually weaker, and the heuristic parser only goes so far.
- Throughput: NIM's 40 req/min cap means heavy parallel work will throttle.
- Translation loss: The OpenAI chat → Anthropic conversion can drop features in edge cases.
- Self-hosted reliability: Uptime is on you.
If you ship production code, Claude Pro at $20/month is still the right answer. Where this shines:
- Trying Claude Code's workflow before committing to a subscription.
- Sensitive code (client data, internal repos) that you want processed entirely locally via Ollama.
- Hybrid setups: real Anthropic for the Opus tier, free backends for Sonnet/Haiku. Cuts cost 70-90% with surprisingly small quality loss for boilerplate work (a config sketch follows this list).
- Remote autonomous coding via the Discord/Telegram bots.
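For the hybrid case, the routing config might look something like this. Note that the `anthropic/` provider prefix is my assumption for illustration; check the project's `.env.example` for the real variable syntax:

```bash
# Hypothetical hybrid routing: real Anthropic for Opus, free elsewhere.
# The "anthropic/" prefix is illustrative -- verify against .env.example.
MODEL_OPUS="anthropic/claude-opus-4-1"
MODEL_SONNET="open_router/deepseek/deepseek-r1-0528:free"
MODEL_HAIKU="lmstudio/unsloth/GLM-4.7-Flash-GGUF"
```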
Where I'd start
NIM-only setup. Get an nvapi- key, set the three MODEL_* tier variables to NIM models, run the proxy, point Claude Code at it. Twenty minutes of setup, zero cost, and you can immediately tell whether Claude Code's workflow fits how you work.
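Concretely, the starter .env can be as small as this (model ids taken from the examples above; swap in anything from NIM's catalog):

```bash
# All-NIM starter: every tier routed to NVIDIA NIM.
NVIDIA_NIM_API_KEY="nvapi-xxx"
MODEL_OPUS="nvidia_nim/moonshotai/kimi-k2.5"
MODEL_SONNET="nvidia_nim/z-ai/glm4.7"
MODEL_HAIKU="nvidia_nim/z-ai/glm4.7"
```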
Once that's working, layer in Ollama for sensitive work and OpenRouter free models for variety. Keep a separate .env per project so you can swap the routing strategy per repo.
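One way to do the per-repo swap is a small launcher that starts the proxy with whatever env file the current project carries (hypothetical helper: `uv run --env-file` injects the variables, and `~/free-claude-code` stands in for wherever you cloned the repo):

```bash
# Start the proxy with the current repo's routing config.
fcc-proxy() {
  local env_file="${1:-$PWD/.env}"
  (cd ~/free-claude-code &&
    uv run --env-file "$env_file" uvicorn server:app --host 0.0.0.0 --port 8082)
}
```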