DEV Community

WonderLab


One Open Source Project a Day (No.49): free-claude-code - Run Claude Code for Free with One Environment Variable

Introduction

"When a tool is too expensive, programmers build a cheaper key."

This is article No.49 in the "One Open Source Project a Day" series. Today's project is free-claude-code (GitHub).

Claude Code is Anthropic's AI coding agent — deeply integrated into the terminal and VS Code, able to autonomously read files, edit code, and run commands. The catch: it requires a real Anthropic API key, and those API calls add up fast.

free-claude-code is built on a deceptively simple insight: Claude Code is just an HTTP client that calls the Anthropic Messages API. Run a compatible proxy server locally that intercepts those requests and forwards them to any free or cheap backend, and Claude Code can't tell the difference. Change one environment variable (ANTHROPIC_BASE_URL), and suddenly your $20/month tool is running on NVIDIA's free GPU credits.

14,300+ Stars, 2,000+ Forks — it surged to the top of GitHub Trending in April and stayed there for four consecutive days. The author, Ali Khokhar (Alishahryar1), was virtually unknown before this project. His other pinned repos have 3 stars each. This is a textbook "one-hit wonder" in the best sense.

What You'll Learn

  • The core proxy architecture: how ANTHROPIC_BASE_URL redirection works
  • Multi-backend routing: NVIDIA NIM free tier, OpenRouter free models, local Ollama
  • API format translation: adapting Anthropic Messages ↔ OpenAI Chat Completions
  • Thinking Token conversion: mapping <think> tags to Claude's native thinking blocks
  • Discord/Telegram bot mode: remote-control Claude Code from your phone
  • Real limitations: what you actually lose when you swap out the model

Prerequisites

  • Comfortable with terminal operations and environment variables
  • Basic familiarity with Claude Code
  • Basic understanding of REST APIs (optional)

Project Background

What Is It?

free-claude-code is a local HTTP proxy server built with FastAPI that emulates the Anthropic API interface. When Claude Code sends API requests, the proxy intercepts them, translates formats, routes to a configured free backend, converts the response back to Anthropic format, and returns it to Claude Code — completely transparently.

The name is blunt and accurate: free-claude-code. It runs Claude Code for free.

The core architecture in one sentence:

Claude Code → sends Anthropic API requests
           → proxy intercepts at localhost:8082
           → translates to OpenAI format
           → forwards to NVIDIA NIM / OpenRouter / Ollama
           → translates response back to Anthropic format
           → returns to Claude Code as if nothing happened

About the Author

  • Author: Ali Khokhar (GitHub: Alishahryar1)
  • Location: Sunnyvale, California
  • Bio: "Writing easily understandable code..."
  • Background: Individual developer; before this project, essentially no significant GitHub presence. Classic "overnight" open-source success.

Project Stats

  • GitHub Stars: 14,300+ (10,700+ in April alone)
  • 🍴 Forks: 2,000+
  • 👥 Contributors: 22
  • 📄 License: MIT
  • 📈 Trend: Topped GitHub Trending (Python + global) April 24-27

Key Features

6 Free Backends

NVIDIA NIM    → Free tier: 40 req/min, models: GLM-4, Llama 3, Mistral
OpenRouter    → 580+ models, many with daily free quotas
DeepSeek      → Ultra-low cost, native Anthropic Messages format support
LM Studio     → Local GUI for running quantized models
llama.cpp     → CPU/GPU inference, maximum control
Ollama        → One-line local model deployment, completely offline

Per-Model-Tier Routing

Claude Code internally uses three model "tiers" (Opus, Sonnet, Haiku) for different task complexity levels. free-claude-code lets you route each tier to a different backend:

# Route heavy reasoning tasks to a large model
MODEL_OPUS=nvidia_nim/nvidia/llama-3.1-nemotron-ultra-253b-v1

# Route standard coding to a mid-size model
MODEL_SONNET=nvidia_nim/nvidia/llama-3.3-70b-instruct

# Route simple tasks (status checks, classification) to a small fast model
MODEL_HAIKU=nvidia_nim/meta/llama-3.1-8b-instruct
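The `provider/model` convention in these values is easy to parse: the first path segment names the backend, and everything after it names the model. A hypothetical resolver (the helper name and default are mine, not the project's):

```python
import os

def resolve_tier(tier: str, default: str = "ollama/llama3") -> tuple:
    """Split a MODEL_<TIER> value into (provider, model).
    Hypothetical sketch of the routing convention, not the project's code."""
    value = os.environ.get(f"MODEL_{tier.upper()}", default)
    provider, _, model = value.partition("/")
    return provider, model

os.environ["MODEL_SONNET"] = "nvidia_nim/nvidia/llama-3.3-70b-instruct"
print(resolve_tier("sonnet"))  # → ('nvidia_nim', 'nvidia/llama-3.3-70b-instruct')
```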

Thinking Token Conversion

Some open-source models (DeepSeek-R1, QwQ, GLM-Z1) output their reasoning wrapped in <think> tags. The proxy automatically extracts these and maps them to Claude's native thinking content blocks:

# Proxy logic (simplified): extract reasoning wrapped in <think> tags
import re

match = re.search(r"<think>(.*?)</think>", model_output, re.DOTALL)
if match:
    # Prepend a native Anthropic "thinking" content block
    response["content"].insert(0, {
        "type": "thinking",
        "thinking": match.group(1)
    })

This means VS Code's Claude Code extension — which collapses and displays thinking blocks — works correctly with open-source reasoning models.

Intelligent Rate Limiting

NVIDIA NIM's free tier caps at 40 requests/minute. Claude Code's aggressive request pattern (frequent context sends, tool calls) hits this quickly. The proxy implements three layers of protection:

  1. Proactive throttling: Before sending a request, predicts whether it would exceed the rate limit and preemptively waits
  2. Reactive backoff: On receiving 429 Too Many Requests, parses retry-after headers or applies exponential backoff
  3. Concurrency control: PROVIDER_MAX_CONCURRENCY env var limits simultaneous in-flight requests
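The proactive throttling step can be sketched as a sliding-window limiter. This is a simplified stand-in for the project's logic, with the PROVIDER_* configuration plumbing elided:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Proactive throttle: block until one more request stays under the cap.
    Simplified sketch, not the project's implementation."""
    def __init__(self, max_requests: int = 40, window_s: float = 60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self.sent = deque()  # monotonic timestamps of recent requests

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()
        if len(self.sent) >= self.max_requests:
            # Predictively wait until the oldest request leaves the window
            time.sleep(self.window_s - (now - self.sent[0]))
            self.sent.popleft()
        self.sent.append(time.monotonic())

limiter = SlidingWindowLimiter(max_requests=40, window_s=60.0)
# call limiter.acquire() before every backend request
```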

Discord/Telegram Bot Mode

Beyond pure proxying, the project manages Claude Code's full lifecycle in bot mode:

Phone message: "refactor the auth module to use async/await"
      ↓
Telegram/Discord bot receives message
      ↓
Spawns Claude Code subprocess in CLAUDE_WORKSPACE directory
      ↓
Streams real-time output back to your chat window
      ↓
Session persisted for follow-up messages
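The spawn-and-stream step above can be sketched with asyncio subprocess streaming. The `claude -p` one-shot invocation and the env override are assumptions for illustration, not the project's exact command line:

```python
import asyncio
import os

async def stream_cli(prompt: str, workspace: str = ".",
                     argv: tuple = ("claude", "-p")):
    """Spawn a CLI subprocess and yield its output line by line, the way a
    bot handler would relay it to a chat channel. Assumes a one-shot
    `claude -p <prompt>` style invocation (illustrative only)."""
    proc = await asyncio.create_subprocess_exec(
        *argv, prompt,
        cwd=workspace,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
        # Point the child at the local proxy, as in the Quick Start
        env={**os.environ, "ANTHROPIC_BASE_URL": "http://localhost:8082"},
    )
    async for line in proc.stdout:
        yield line.decode(errors="replace")
    await proc.wait()

async def main():
    async for chunk in stream_cli("refactor the auth module to use async/await"):
        print(chunk, end="")  # in bot mode: send to Telegram/Discord instead

# asyncio.run(main())  # requires the `claude` CLI on PATH
```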

Quick Start

# 1. Clone the repo
git clone https://github.com/Alishahryar1/free-claude-code.git
cd free-claude-code

# 2. Copy config
cp .env.example .env

# 3. Edit .env — add your free NVIDIA NIM key
# NVIDIA_NIM_API_KEY=nvapi-xxxx
# MODEL_OPUS=nvidia_nim/nvidia/llama-3.1-nemotron-ultra-253b-v1
# MODEL_SONNET=nvidia_nim/nvidia/llama-3.3-70b-instruct
# MODEL_HAIKU=nvidia_nim/meta/llama-3.1-8b-instruct

# 4. Start the proxy (requires uv: pip install uv)
uv run uvicorn server:app --host 0.0.0.0 --port 8082

# 5. In another terminal, run Claude Code through the proxy
ANTHROPIC_BASE_URL="http://localhost:8082" claude

That's it. Claude Code is now running on NVIDIA's free GPU infrastructure. No Anthropic API key. No charges.


Deep Dive

Architecture: Transparent Proxy Pattern

Claude Code CLI / VS Code Extension
              │
              │  POST /v1/messages (Anthropic format)
              ▼
┌─────────────────────────────────────┐
│        free-claude-code Proxy       │
│  ┌────────────────────────────────┐ │
│  │    FastAPI Router              │ │
│  │  ┌──────────────────────────┐  │ │
│  │  │ Request parsing &        │  │ │
│  │  │ model tier routing       │  │ │
│  │  └──────────┬───────────────┘  │ │
│  │             │                   │ │
│  │  ┌──────────▼───────────────┐  │ │
│  │  │ Format translation layer │  │ │
│  │  │  Anthropic → OpenAI      │  │ │
│  │  │  (or direct passthrough) │  │ │
│  │  └──────────┬───────────────┘  │ │
│  │             │                   │ │
│  │  ┌──────────▼───────────────┐  │ │
│  │  │ Rate limiting & retry    │  │ │
│  │  └──────────┬───────────────┘  │ │
│  └─────────────┼─────────────────┘ │
└────────────────┼────────────────────┘
                 │
    ┌────────────┼────────────┐
    ▼            ▼            ▼
NVIDIA NIM   OpenRouter    Ollama
(free tier)  (free models) (local)

The Hard Part: API Format Translation

This is the real engineering challenge. Anthropic Messages API and OpenAI Chat Completions API differ significantly:

Anthropic format (what Claude Code sends):

{
  "model": "claude-opus-4-5",
  "max_tokens": 8096,
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Refactor this code"},
        {"type": "tool_result", "tool_use_id": "xxx", "content": "..."}
      ]
    }
  ],
  "tools": [...],
  "thinking": {"type": "enabled", "budget_tokens": 5000}
}

OpenAI format (what NVIDIA NIM / OpenRouter accepts):

{
  "model": "nvidia/llama-3.1-nemotron-ultra-253b-v1",
  "messages": [
    {"role": "user", "content": "Refactor this code"}
  ],
  "tools": [...],
  "stream": true
}

The translation points the proxy handles:

  • Multimodal content: Flatten Anthropic's content array (text, images, tool results) into OpenAI's string or object format
  • Tool calls: tool_use blocks ↔ function_call / tool_calls
  • Streaming: Convert backend SSE stream to Anthropic's event: content_block_delta format
  • Thinking blocks: Extract <think> tags into separate thinking content blocks
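The tool-call mapping in particular has a sharp edge worth showing: Anthropic carries tool arguments as a JSON object, while OpenAI expects a JSON-encoded string. A hypothetical one-block translator (illustrative, not the project's code):

```python
import json

def tool_use_to_openai(block: dict) -> dict:
    """Map one Anthropic tool_use content block to an OpenAI tool_calls entry.
    Illustrative sketch of the translation layer, not the project's code."""
    return {
        "id": block["id"],
        "type": "function",
        "function": {
            "name": block["name"],
            # Anthropic: arguments are a JSON object; OpenAI: a JSON string
            "arguments": json.dumps(block["input"]),
        },
    }

block = {"id": "toolu_01", "name": "read_file", "input": {"path": "main.py"}}
print(tool_use_to_openai(block)["function"]["arguments"])  # → {"path": "main.py"}
```

The reverse direction has to `json.loads` the string back, and malformed argument strings from weaker models are exactly where the heuristic fallback parsing mentioned later comes in.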

Full Environment Variable Reference

# === Backend selection ===
NVIDIA_NIM_API_KEY=nvapi-xxxxx
OPENROUTER_API_KEY=sk-or-xxxxx
DEEPSEEK_API_KEY=sk-xxxxx
OLLAMA_BASE_URL=http://localhost:11434

# === Model tier routing ===
MODEL_OPUS=nvidia_nim/nvidia/llama-3.1-nemotron-ultra-253b-v1
MODEL_SONNET=nvidia_nim/nvidia/llama-3.3-70b-instruct
MODEL_HAIKU=nvidia_nim/meta/llama-3.1-8b-instruct

# === Thinking support ===
ENABLE_SONNET_THINKING=true
ENABLE_OPUS_THINKING=true

# === Rate limiting ===
PROVIDER_RATE_LIMIT=1           # max requests/second
PROVIDER_MAX_CONCURRENCY=5      # max concurrent requests

# === Bot config (optional) ===
MESSAGING_PLATFORM=telegram
TELEGRAM_BOT_TOKEN=xxxxx
ALLOWED_TELEGRAM_USER_ID=123456789
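A startup sanity check over these variables might look like the sketch below. The function and its rules are purely illustrative; the project's actual validation may differ:

```python
import os

# Any one of these is enough to have a usable backend
BACKEND_VARS = ["NVIDIA_NIM_API_KEY", "OPENROUTER_API_KEY",
                "DEEPSEEK_API_KEY", "OLLAMA_BASE_URL"]

def check_config() -> list:
    """Return human-readable warnings for an incomplete .env (illustrative)."""
    warnings = []
    if not any(os.environ.get(k) for k in BACKEND_VARS):
        warnings.append("no backend configured: set at least one API key or OLLAMA_BASE_URL")
    for tier in ("OPUS", "SONNET", "HAIKU"):
        if "/" not in os.environ.get(f"MODEL_{tier}", ""):
            warnings.append(f"MODEL_{tier} should look like provider/model-name")
    return warnings
```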

Limitations Worth Knowing

Before setting this up, these caveats matter:

1. Model quality gap is the fundamental trade-off
free-claude-code gives you the Claude Code interface, not Claude models. Open-source models on free tiers lag behind Claude Sonnet/Opus on complex multi-step reasoning, instruction following stability, and tool call reliability. One YouTube reviewer put it clearly: "Simply wrapping Claude's application layer around open LLMs will not produce the same quality output."

2. NVIDIA NIM free tier exhausts quickly
40 req/min sounds like a lot until Claude Code starts sending its full context window on every turn. Rate-limited sessions introduce noticeable pauses. Real coding sessions will hit the ceiling.

3. Tool call compatibility is imperfect
Claude Code depends heavily on structured tool calls (file read/write, bash execution, search). Open-source models vary in their tool call formatting discipline. The proxy includes heuristic parsing as a fallback, but failures happen — especially with smaller models.

4. Claude-specific features unavailable

  • Computer Use (Anthropic's vision+interaction feature)
  • True Extended Thinking (deep reasoning mode)
  • Latest Claude training data and safety alignment

5. Local models need real hardware
Running quality coding models locally (e.g., Qwen2.5-Coder-32B via Ollama) requires 20-24GB+ VRAM. Most consumer GPUs won't run the best models comfortably.


How It Compares

Project           Approach                     vs. free-claude-code
openclaw          Alternative Claude Code CLI  Independent implementation; doesn't use the official Claude Code client
Aider             Standalone AI coding CLI     Mature and stable; native multi-model support; different UX entirely
OpenCode          Terminal AI coding tool      Native multi-model design, not a proxy
free-claude-code  API proxy layer              Preserves the complete Claude Code UX; just swaps the backend

The unique value: users already fluent in Claude Code's workflow don't need to learn a new tool. Change one environment variable, and costs drop to zero.



Summary

Key Takeaways

  1. The insight is the trick: Claude Code is just an Anthropic API client. ANTHROPIC_BASE_URL redirection is all it takes — the engineering complexity is in the format translation layer, not the core idea
  2. Real technical work: Anthropic ↔ OpenAI format conversion, streaming response handling, tool call adaptation, and Thinking Token mapping — these are non-trivial engineering challenges the project solves well
  3. "Free" has a price: You save on API costs but trade model quality. NVIDIA NIM's free tier rate limits will make sessions feel slow during heavy usage
  4. Community momentum: 14k+ stars, 22 contributors, active issues — the project is iterating fast on the rough edges
  5. Use case clarity: Best for learning/experimentation and offline private deployment; not a substitute for production-grade Claude in complex agentic tasks

Who Should Use This

  • Students and hobbyists: Want to experience Claude Code's full terminal agent workflow without paying for an API subscription
  • Model researchers: Want to compare open-source models using Claude Code's interface as a consistent test harness
  • Enterprise private deployment: Using Ollama for a fully offline, air-gapped AI coding assistant on internal infrastructure
  • Remote coding enthusiasts: Using the Telegram/Discord bot to control a remote server's Claude Code instance from a phone

A Question Worth Sitting With

free-claude-code's virality is more than just "free stuff is popular." It reflects a specific tension: Anthropic built a genuinely excellent developer tool, then gated it behind a per-token billing model that makes sustained use expensive. The community's immediate response was to route around it.

The question isn't whether this is "ethical" — it's clearly operating in a gray zone. The more interesting question is what it signals: when developers immediately build free alternatives to paid AI tools, it suggests the underlying capability is perceived as infrastructure, not a premium product. Infrastructure wants to be free or at least flat-rate. The market is making that preference clear.


Visit my personal site for more useful knowledge and interesting products
