DEV Community

Cover image for Free-Model Playbook for Claude Code and Codex
Max Quimby
Max Quimby

Posted on • Originally published at agentconn.com

Free-Model Playbook for Claude Code and Codex

📖 Read the full version with charts and embedded sources on AgentConn →

The $200/Month Problem

Claude Code Max costs $200/month. Codex Pro costs $200/month. For a professional developer shipping production code daily, those are reasonable numbers. For a student, a hobbyist, a bootstrapped founder building nights and weekends, or an operator in a country where $200 is a month's rent — they're a wall.

But here's what changed in 2026: both Claude Code and Codex CLI now support third-party model backends. You can point them at any OpenAI-compatible API endpoint. That means you can route them through gateways that aggregate dozens of free models — and keep the agent harness you already know while paying nothing for inference.

A Hacker News thread titled "Reallocating $100/Month Claude Code Spend to Zed and OpenRouter" captured the shift. Developers are discovering that 80% of their coding tasks — completions, single-file edits, test generation, boilerplate — don't need a frontier model. They need a model that's good enough, available right now, and free.

This playbook covers two gateway options (OpenRouter and OmniRoute), the best free models for coding in June 2026, and step-by-step setup for both Claude Code and Codex CLI.

Two Gateway Options

You have two paths to free models. One is turnkey. The other gives you more control.

OpenRouter: The Turnkey Path

OpenRouter is an established routing gateway that aggregates hundreds of models from dozens of providers behind a single API endpoint. As of June 2026, it offers 29 free models — including GLM-5.2, DeepSeek V4 Flash, Qwen3-Coder, and Devstral 2. No local server, no Docker, no GPU. Sign up, get an API key, set three environment variables, and go.

The tradeoff: you're still routing through a third party. Your prompts hit OpenRouter's servers before reaching the model provider. Rate limits on the free tier are real (20 requests/min, 200/day). And you're subject to whatever availability and latency OpenRouter's routing layer introduces.

OmniRoute: The Self-Hosted Path

OmniRoute is a newer, open-source alternative (5.1K stars and climbing). It's a self-hosted gateway you run locally that aggregates 160+ providers, with 50+ offering free tiers. The headline feature beyond provider count is token compression: OmniRoute's RTK and Caveman modes claim 15–95% savings on token usage.

OmniRoute also supports MCP and A2A protocols natively. And because it runs on your machine, your prompts never leave your network until they hit the model provider directly.

OpenRouter vs OmniRoute — OpenRouter is a hosted service: zero setup, rate-limited free tier, your prompts transit their servers. OmniRoute is self-hosted infrastructure: more setup, no rate limits beyond the upstream provider's, prompts route directly from your machine to the model provider.

Best Free Models for Coding (June 2026)

GLM-5.2 (Z.ai / Zhipu)

The breakout model of June 2026. GLM-5.2 is MIT-licensed, uses a 753B Mixture of Experts architecture with a 1M token context window, and scores 62.1% on SWE-bench Pro. This is the first fully open-source model that competes with proprietary frontier models on real-world coding benchmarks.

Where to run it free:

  • OpenRouter: Available on the free tier with standard rate limits
  • Cloudflare Workers AI: Added June 16, 2026
  • Z.ai direct: The model creator offers free API access

DeepSeek V4 Flash

Free for a limited promotional period. Strong reasoning capabilities, particularly on multi-step problems.

Qwen3-Coder 480B

Alibaba's specialized coding model. 262K context window, state-of-the-art on agentic coding benchmarks, available on OpenRouter's free tier.

Devstral 2

Mistral's lightweight coding model. Fast and reliable for quick completions, simple edits, and boilerplate generation.

💡 Routing strategy — Point routine tasks at Devstral 2 or DeepSeek V4 Flash for speed. Reserve GLM-5.2 or Qwen3-Coder for multi-file refactors and complex reasoning.

Step-by-Step: Claude Code + OpenRouter

This is the fastest path from "paying $200/month" to "paying nothing."

1. Get an OpenRouter API key

Sign up at openrouter.ai and generate an API key.

2. Set the environment variables

export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_API_KEY="sk-or-v1-your-key-here"
export ANTHROPIC_MODEL="z-ai/glm-5.2"
Enter fullscreen mode Exit fullscreen mode

Three lines. Claude Code will now route all requests through OpenRouter.

3. Log out of the native Anthropic session

claude /logout
Enter fullscreen mode Exit fullscreen mode

4. Select your model

/model z-ai/glm-5.2
/model deepseek/deepseek-v4-flash
/model qwen/qwen3-coder-480b
Enter fullscreen mode Exit fullscreen mode

5. Understand the rate limits

Free tier: 20 requests/minute, 200 requests/day per model. Three models give you 600 free requests/day.

âš ī¸ Rate limit tip — The free tier rate limit of 200 requests/day is per-model, not per-account. Cycle through models to extend your free runway.

Step-by-Step: Claude Code + OmniRoute

1. Install OmniRoute

git clone https://github.com/diegosouzapw/OmniRoute.git
cd OmniRoute
pip install -r requirements.txt
Enter fullscreen mode Exit fullscreen mode

2. Configure providers

providers:
  - name: openrouter-free
    base_url: https://openrouter.ai/api/v1
    api_key: ${OPENROUTER_API_KEY}
    models: ["z-ai/glm-5.2", "deepseek/deepseek-v4-flash"]
    priority: 1
  - name: cloudflare-workers
    base_url: https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1
    api_key: ${CF_API_TOKEN}
    models: ["@cf/z-ai/glm-5.2"]
    priority: 2
Enter fullscreen mode Exit fullscreen mode

3. Start the gateway and point Claude Code at it

python omniroute.py --port 8080
export ANTHROPIC_BASE_URL="http://localhost:8080/v1"
export ANTHROPIC_API_KEY="omniroute-local"
export ANTHROPIC_MODEL="z-ai/glm-5.2"
Enter fullscreen mode Exit fullscreen mode

Step-by-Step: Codex CLI + Free Models

Codex CLI's Responses API requirement adds friction. You need a translation layer.

Option A: OpenRouter BYOK — The Knightli guide walks through configuring Codex CLI with OpenRouter's BYOK mode.

Option B: codeproxy-ai/cli — A local proxy that translates between API formats. See the community gist.

Option C: OmniRoute — Handles the API format translation natively. Point Codex at the same gateway.

â„šī¸ Codex CLI note — Codex CLI's Responses API requirement adds friction that Claude Code doesn't have. If cost is your primary concern, Claude Code's simpler gateway integration is an advantage.

When Free Models Break Down

Complex multi-file refactors. Operations requiring coordinated changes across 5+ files lose coherence on free models.

Deep architectural reasoning. "Redesign this module to use event sourcing" requires pattern understanding that free models don't reliably handle.

Very long agent loops. If each tool call succeeds 92% on a free model versus 98% on frontier, a 10-step workflow succeeds 43% versus 82%.

Subtle bug diagnosis. When the bug involves a race condition, stale cache, and off-by-one error interacting, free models fixate on one dimension.

The 80/20 Strategy

Routine tasks that free models handle reliably: completions, single-file edits, test generation, boilerplate, documentation, simple bug fixes.

Tasks that still justify paid tokens: multi-file refactors, architecture decisions, complex debugging, performance optimization, security-sensitive code review.

If 80% of your usage is routine, switching those tasks to free models turns a $200/month bill into $40/month — or $0/month if the free tier covers your volume.

The Operator's Checklist

  1. Start with OpenRouter. Sign up, get an API key, set three env vars. Running in under 5 minutes.
  2. Test your actual workload. Spend a day on free models. Note your personal 80/20 split.
  3. Set up model cycling. Three models = 600 free requests/day.
  4. Add OmniRoute when you need more control. Self-hosted gateway with token compression and automatic fallback.
  5. Keep a frontier model configured for the hard 20%.
  6. If using Codex CLI, set up the translation layer early.

The $200/month subscription isn't going away. But the wall between "paying developer" and "priced-out developer" just got a door. Three env vars, a free API key, and a model that scores 62.1% on SWE-bench Pro. That's the playbook.

Originally published at AgentConn

Top comments (0)