DEV Community

Nucleus OS
Nucleus OS

Posted on • Originally published at eidetic.works

Running Claude Code at zero per-token cost: the Max-plan OAuth shim pattern

If you're running Claude Code or any agentic system that calls the Anthropic Messages API, you're probably paying per token. For light use, that's fine. For multi-agent systems with parallel workloads, it adds up fast.

There's a different model: Claude Max subscription. Flat monthly cost, no per-token billing. The problem is that Max exposes a browser OAuth flow, not an API key. Your agent code expects ANTHROPIC_API_KEY. Max doesn't give you one.

We built a shim that bridges the two.

What it is

oauth.nucleusos.dev is an HTTP wrapper that exposes the Anthropic Messages API endpoint at /v1/messages. Internally, it routes each request through a Max-plan OAuth bearer token instead of an API key. The interface is 1:1 with the native Anthropic API.

To use it, set two environment variables in your agent:

ANTHROPIC_BASE_URL=https://oauth.nucleusos.dev
ANTHROPIC_API_KEY=<your-shared-secret>
Enter fullscreen mode Exit fullscreen mode

Your existing Claude Code installation, any Claude SDK wrapper, or raw HTTP client keeps working without code changes. The billing model changes; the interface doesn't.

The wire shape

Each incoming request hits an HMAC-validated gate (the ANTHROPIC_API_KEY value you set is treated as a shared secret, compared with hmac.compare_digest — no timing oracle). Valid requests get the Bearer token substituted and are forwarded to api.anthropic.com. The response streams back unchanged.

Security model: the shim trusts whoever holds the shared secret. This is for personal or team use — not for handing out to arbitrary clients. If you're running it on a VPS for your own agent fleet, that's the threat model it's designed for.

Smoke test results (actual, not synthetic)

  • GET /health → 200 {"status": "ok"}
  • Wrong shared secret → 401
  • Valid request to /v1/messages with a real Claude model → returned actual Anthropic response shape, usage: {"input_tokens": 17, "output_tokens": 6}

What we added after the initial ship

The initial PR (#578, 2026-06-16) covered Claude only. Subsequent PRs added:

  • Gemini API routing on a separate port (8890) — same deployment, same auth model, second provider. One shim, Claude + Gemini.
  • Non-root Dockerfile (USER nobody) — the initial container ran as root, caught in review.
  • _HTTP_TIMEOUT_S bumped to 600 — the default 300s timeout was clipping long agent runs. Opus calls on complex prompts run long; you need the headroom.
  • gemini_keys.txt env override — you can bake a key file into the image for air-gapped deployments.

Current deployment: OCI A1 ARM (24GB, Mumbai). CPU is the binding constraint on this instance, not RAM.

The cost math

Claude Max subscription: $100/month (Pro tier) or $200/month (Max tier with higher usage limits).

Anthropic API comparison for equivalent workloads depends heavily on your token mix. For a team running Claude Code across multiple parallel sessions with shared context, API costs can exceed $200/month easily. The breakeven calculation is specific to your token volume.

We're not publishing specific numbers because our workload (5 AI agents, agentic coding sessions, multi-file context) may not generalize to yours. The pattern is worth knowing; the math is worth running against your own usage.

What this isn't

This isn't a production multi-tenant API. It's a single-org shim. If you need rate-limiting, per-user billing, or audit logs, you need more infrastructure than this.

It also doesn't give you higher rate limits than Max plan imposes. If you're hitting Max-plan throttles under heavy load, the shim won't help.

Self-hosting

The Dockerfile and compose file are in the repo. The README covers the non-root deployment, env vars, and the gemini_keys.txt override.

The shim is ~200 lines of Python (FastAPI). Not a framework. Inspectable in an afternoon.

Why we built it in-house

We're building Eidetic Works on our own agentic substrate. Claude Code is the primary execution surface. Token costs are a real operational cost for us, not a theoretical one. The shim was born from a direct cost problem, not from wanting to build infrastructure.

The telemetry endpoint we shipped in the same cohort (eidetic.works/api/telemetry/metrics) is what lets us measure whether it's working — we can now count daemon installs without relying on download counts or manual surveys.

What's next

The Gemini routing is v1 — it works but hasn't been load-tested under the same conditions as the Claude path. The next substantive addition is probably a structured logging layer so we can see which models are taking which paths at what latency. Right now it's effective but opaque.

If you build on this pattern, drop a comment below — interested in what you add.


Read more on what we're building at eidetic.works

Top comments (0)