If you're running Claude Code or any agentic system that calls the Anthropic Messages API, you're probably paying per token. For light use, that's fine. For multi-agent systems with parallel workloads, it adds up fast.
There's a different model: Claude Max subscription. Flat monthly cost, no per-token billing. The problem is that Max exposes a browser OAuth flow, not an API key. Your agent code expects ANTHROPIC_API_KEY. Max doesn't give you one.
We built a shim that bridges the two.
What it is
oauth.nucleusos.dev is an HTTP wrapper that exposes the Anthropic Messages API endpoint at /v1/messages. Internally, it routes each request through a Max-plan OAuth bearer token instead of an API key. The interface is 1:1 with the native Anthropic API.
To use it, set two environment variables in your agent:
ANTHROPIC_BASE_URL=https://oauth.nucleusos.dev
ANTHROPIC_API_KEY=<your-shared-secret>
Your existing Claude Code installation, any Claude SDK wrapper, or raw HTTP client keeps working without code changes. The billing model changes; the interface doesn't.
The wire shape
Each incoming request hits an HMAC-validated gate (the ANTHROPIC_API_KEY value you set is treated as a shared secret, compared with hmac.compare_digest — no timing oracle). Valid requests get the Bearer token substituted and are forwarded to api.anthropic.com. The response streams back unchanged.
Security model: the shim trusts whoever holds the shared secret. This is for personal or team use — not for handing out to arbitrary clients. If you're running it on a VPS for your own agent fleet, that's the threat model it's designed for.
Smoke test results (actual, not synthetic)
-
GET /health→ 200{"status": "ok"} - Wrong shared secret → 401
- Valid request to
/v1/messageswith a real Claude model → returned actual Anthropic response shape,usage: {"input_tokens": 17, "output_tokens": 6}
What we added after the initial ship
The initial PR (#578, 2026-06-16) covered Claude only. Subsequent PRs added:
- Gemini API routing on a separate port (8890) — same deployment, same auth model, second provider. One shim, Claude + Gemini.
- Non-root Dockerfile (
USER nobody) — the initial container ran as root, caught in review. -
_HTTP_TIMEOUT_Sbumped to 600 — the default 300s timeout was clipping long agent runs. Opus calls on complex prompts run long; you need the headroom. -
gemini_keys.txtenv override — you can bake a key file into the image for air-gapped deployments.
Current deployment: OCI A1 ARM (24GB, Mumbai). CPU is the binding constraint on this instance, not RAM.
The cost math
Claude Max subscription: $100/month (Pro tier) or $200/month (Max tier with higher usage limits).
Anthropic API comparison for equivalent workloads depends heavily on your token mix. For a team running Claude Code across multiple parallel sessions with shared context, API costs can exceed $200/month easily. The breakeven calculation is specific to your token volume.
We're not publishing specific numbers because our workload (5 AI agents, agentic coding sessions, multi-file context) may not generalize to yours. The pattern is worth knowing; the math is worth running against your own usage.
What this isn't
This isn't a production multi-tenant API. It's a single-org shim. If you need rate-limiting, per-user billing, or audit logs, you need more infrastructure than this.
It also doesn't give you higher rate limits than Max plan imposes. If you're hitting Max-plan throttles under heavy load, the shim won't help.
Self-hosting
The Dockerfile and compose file are in the repo. The README covers the non-root deployment, env vars, and the gemini_keys.txt override.
The shim is ~200 lines of Python (FastAPI). Not a framework. Inspectable in an afternoon.
Why we built it in-house
We're building Eidetic Works on our own agentic substrate. Claude Code is the primary execution surface. Token costs are a real operational cost for us, not a theoretical one. The shim was born from a direct cost problem, not from wanting to build infrastructure.
The telemetry endpoint we shipped in the same cohort (eidetic.works/api/telemetry/metrics) is what lets us measure whether it's working — we can now count daemon installs without relying on download counts or manual surveys.
What's next
The Gemini routing is v1 — it works but hasn't been load-tested under the same conditions as the Claude path. The next substantive addition is probably a structured logging layer so we can see which models are taking which paths at what latency. Right now it's effective but opaque.
If you build on this pattern, drop a comment below — interested in what you add.
Read more on what we're building at eidetic.works
Top comments (0)