TobiasBond
How I Built a Cost Proxy to Stop OpenClaw from Burning My API Budget

If you've been in the OpenClaw community for more than a week, you've seen the posts.

"$3,600/month on API calls."
"Woke up to a $200 bill from a heartbeat loop running all night."
"I have zero visibility into what my agent is spending."

OpenClaw is one of the most exciting open-source projects of 2026 — 210K+ GitHub stars, a personal AI agent that actually does things. But the moment you give an AI agent unrestricted access to paid APIs, you're playing with fire.

I kept seeing these horror stories and realized nobody had built a proper solution. There are a few monitoring tools out there (ClawMetry gives you read-only stats, Tokscale is CLI-only), but nothing that actually stops the bleeding in real time.

So I built TokPinch.

What TokPinch Does

TokPinch is a transparent proxy that sits between OpenClaw and your LLM provider (Anthropic, OpenAI). Every API request passes through it.

Setup is literally one line in your OpenClaw config:

ANTHROPIC_BASE_URL=http://localhost:4100/v1

That's it. Your agent doesn't know TokPinch exists. But now you have:

  • Real-time cost tracking — every request logged with model, tokens, cost, and session
  • Budget enforcement — set daily/monthly limits that actually block requests when exceeded
  • Loop detection — catches runaway agents (rapid fire, repeated content, cost spirals, heartbeat storms) and pauses them automatically
  • Smart model routing — automatically downgrades cheap tasks (heartbeats, short messages) from Opus to Haiku, saving 10-50%
  • Telegram/email alerts — get notified the second something goes wrong
  • A dashboard that doesn't suck — dark mode, real-time WebSocket updates, cost charts, budget gauges
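Budget enforcement, for instance, boils down to one check before a request is forwarded upstream. Here's a minimal sketch of the idea — the names `BudgetState` and `checkBudget` are illustrative, not TokPinch's actual API:

```typescript
// Minimal budget-check sketch. Names are illustrative, not TokPinch's API.
interface BudgetState {
  dailyLimitUsd: number;
  spentTodayUsd: number;
}

type BudgetDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

function checkBudget(state: BudgetState, estimatedCostUsd: number): BudgetDecision {
  const projected = state.spentTodayUsd + estimatedCostUsd;
  if (projected > state.dailyLimitUsd) {
    // Block the request instead of merely reporting it after the fact.
    return {
      allowed: false,
      reason: `Daily budget exceeded: $${projected.toFixed(2)} > $${state.dailyLimitUsd.toFixed(2)}`,
    };
  }
  return { allowed: true };
}
```

The point is that the proxy position makes this possible at all: a read-only monitor can tell you the budget was blown, but only something in the request path can refuse to forward the call.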

The Architecture

I wanted TokPinch to be fast, self-hosted, and free of any dependency on external services.

OpenClaw → TokPinch (localhost:4100) → Anthropic/OpenAI
                ↓
         SQLite (metadata only)
                ↓
         React Dashboard + WebSocket
                ↓
         Telegram/Email Alerts

The tech stack:

  • TypeScript — end to end, because life is too short for runtime type errors
  • Fastify — one of the fastest Node.js HTTP frameworks, a natural fit for a proxy
  • SQLite (better-sqlite3) — zero config, WAL mode for concurrent reads, file-based so it deploys anywhere
  • React 18 + Vite + Tailwind — for the dashboard
  • Docker — multi-stage build, runs as non-root with read-only filesystem

The key design decision: TokPinch never stores API keys or message content. Keys pass through in headers and are discarded immediately. Only metadata hits the database (model name, token counts, cost, timestamp, session ID). This is documented in our SECURITY.md.
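That rule is easy to enforce at the type level: the record that hits SQLite simply has no field where content or keys could go. A sketch of the shape (field names are illustrative, not the actual schema):

```typescript
// Sketch of the metadata-only rule: the persisted record type has no
// field for message content or API keys, so they can never be stored.
interface RequestRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  timestamp: number;
  sessionId: string;
}

// Only usage metadata survives the conversion; the message body and
// auth headers are never touched here. (Illustrative field names.)
function toRecord(
  sessionId: string,
  model: string,
  usage: { input_tokens: number; output_tokens: number },
  costUsd: number
): RequestRecord {
  return {
    model,
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
    costUsd,
    timestamp: Date.now(),
    sessionId,
  };
}
```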

Building the Loop Detector

This was the most interesting engineering challenge. OpenClaw agents can get stuck in loops — the infamous heartbeat bug, where the agent sends the same message repeatedly, burning through your budget at 20+ requests per minute.

I implemented four detection rules:

  1. Rapid fire — more than 20 requests per minute from the same session
  2. Repeated content — same message hash appearing 5+ times in 5 minutes (uses djb2 hash on first 200 chars)
  3. Cost spiral — more than $2 spent in a 5-minute window
  4. Heartbeat storm — 10+ heartbeat-pattern messages in 10 minutes
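Rule 2 is the cheapest of the four: hash the first 200 characters and count sightings in a sliding window. A simplified sketch of that logic (a per-session map in practice; the djb2 variant shown here is the xor flavor):

```typescript
// djb2 hash over the first 200 chars, plus a sliding-window counter.
// Thresholds mirror rule 2 above: 5+ identical hashes within 5 minutes.
function djb2(s: string): number {
  let hash = 5381;
  for (let i = 0; i < s.length; i++) {
    hash = ((hash * 33) ^ s.charCodeAt(i)) >>> 0; // keep unsigned 32-bit
  }
  return hash;
}

const WINDOW_MS = 5 * 60 * 1000;
const THRESHOLD = 5;

// hash -> timestamps of recent sightings (illustrative; per-session in practice)
const sightings = new Map<number, number[]>();

function isRepeatedContent(message: string, now: number = Date.now()): boolean {
  const hash = djb2(message.slice(0, 200));
  const recent = (sightings.get(hash) ?? []).filter((t) => now - t < WINDOW_MS);
  recent.push(now);
  sightings.set(hash, recent);
  return recent.length >= THRESHOLD;
}
```

Hashing only the first 200 characters keeps the check O(1)-ish per request while still catching the heartbeat case, where the repeated messages are short and byte-identical.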

When any rule triggers, TokPinch pauses that session with exponential backoff (starting at 5 minutes, doubling up to 30 minutes). The agent gets a clear error message, and you get a Telegram alert.

🔄 Loop detected! Session loop-test sent identical content 6 times 
in 5 minutes. Spending $0.0000. Paused for 5 minute(s).
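The pause schedule itself is one line of arithmetic — 5 minutes doubled per consecutive trigger, capped at 30. A sketch:

```typescript
// Exponential backoff for paused sessions: 5 min base, doubling per
// consecutive trigger, capped at 30 min (per the schedule described above).
function pauseMinutes(consecutiveTriggers: number): number {
  const base = 5;
  const cap = 30;
  return Math.min(base * 2 ** (consecutiveTriggers - 1), cap);
}
```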

The circular buffer approach keeps memory usage constant — 100 slots per session, O(1) lookups.
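A minimal version of that buffer — fixed capacity, oldest entry overwritten, most-recent-first reads for window scans:

```typescript
// Fixed-size ring buffer: constant memory, O(1) push, oldest overwritten.
// A minimal sketch of the structure, not TokPinch's exact implementation.
class RingBuffer<T> {
  private slots: (T | undefined)[];
  private head = 0;   // next write position
  private count = 0;  // number of filled slots
  constructor(private capacity: number = 100) {
    this.slots = new Array(capacity);
  }
  push(item: T): void {
    this.slots[this.head] = item;
    this.head = (this.head + 1) % this.capacity;
    if (this.count < this.capacity) this.count++;
  }
  get size(): number {
    return this.count;
  }
  // Most-recent-first snapshot, handy for sliding-window scans.
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 1; i <= this.count; i++) {
      out.push(this.slots[(this.head - i + this.capacity) % this.capacity] as T);
    }
    return out;
  }
}
```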

Smart Model Routing

This is the feature that saves real money. Not every API call needs the most expensive model.

When OpenClaw sends a heartbeat ping or a short message like "hi" to Claude Opus ($15/MTok input), TokPinch intercepts it and routes to Haiku ($0.80/MTok input) instead. The response quality for trivial tasks is identical, but the cost drops by ~95%.

The routing rules are configurable:

  • Route to cheap model when: message is under 200 tokens, no tools/images/documents, system prompt under 500 tokens
  • Never downgrade when: user explicitly set the model, images or documents are present, more than 5 tools are being used
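Those rules compose into a single predicate. An illustrative sketch — the field names are assumptions, not TokPinch's actual config schema:

```typescript
// Illustrative routing predicate mirroring the rules above.
interface RoutingInput {
  messageTokens: number;
  systemPromptTokens: number;
  toolCount: number;
  hasImages: boolean;
  hasDocuments: boolean;
  modelExplicitlySet: boolean;
}

function shouldDowngrade(req: RoutingInput): boolean {
  // Never downgrade when the user pinned a model or rich content is present.
  if (req.modelExplicitlySet) return false;
  if (req.hasImages || req.hasDocuments) return false;
  if (req.toolCount > 5) return false;
  // Route to the cheap model only for genuinely trivial requests.
  return (
    req.messageTokens < 200 &&
    req.toolCount === 0 &&
    req.systemPromptTokens < 500
  );
}
```

The "never downgrade" checks run first on purpose: a false positive here silently changes model behavior for the user, so the predicate should fail closed toward the expensive model.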

During testing, a request to claude-opus-4 with just "hi" was correctly routed to claude-haiku-4-5, confirmed in the server logs:

🔀 Routed: claude-opus-4 → claude-haiku-4-5-20251001 (low_token_chat, saved ~$0.0037)

Security: Built for the OpenClaw Crisis

Security isn't an afterthought — it's a feature. OpenClaw has had a rough security track record: one-click RCE, 824+ malicious skills on ClawHub, 42,000+ exposed instances. TokPinch sitting in the API request path means it must be bulletproof.

What we did:

  • API keys are never stored or logged — pino logger has redact paths for all auth headers
  • Zero message content on disk — the requests table schema literally has no column for it
  • Docker runs as non-root with read-only filesystem and no-new-privileges
  • JWT auth with auto-generated 512-bit secrets
  • Rate limiting on every endpoint (proxy, API, and login)
  • Content-Security-Policy, X-Frame-Options, HSTS headers on all responses
  • Test endpoints auto-disabled in production
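The "never logged" guarantee can also be enforced before anything reaches the logger. TokPinch uses pino's declarative redact paths; here is the equivalent logic as a plain function, with an illustrative (not exhaustive) set of sensitive header names:

```typescript
// Scrub auth-bearing headers before a request object is ever logged.
// The header list here is illustrative; the real config uses pino redact paths.
const SENSITIVE_HEADERS = new Set(["authorization", "x-api-key"]);

function scrubHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    // Match case-insensitively but preserve the original header casing.
    out[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? "[REDACTED]" : value;
  }
  return out;
}
```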

The full audit is in SECURITY.md.

The Dashboard

I wanted the dashboard to feel like a proper product, not a developer afterthought. Dark theme (zinc-950 base), JetBrains Mono for numbers, Outfit for headings, real-time WebSocket updates, Framer Motion animations.

Overview — 4 stat cards, cost-over-time chart, model breakdown, budget gauges, live request feed

Sessions — every session with cost, request count, tokens, and the most-used model. Expandable rows showing individual requests.

Budget — arc gauges showing spend vs. limit, status badges (ACTIVE/WARNING/PAUSED/OVERRIDE), one-click resume after manual review.

Alerts — all budget warnings, loop detections, and daily digests with filter tabs and delivery status.

What I Learned

  1. Native modules on Windows are painful. better-sqlite3 needs to be compiled for your exact Node.js version. Switching Node versions (via nvm) breaks the binary. Solution: run npm rebuild better-sqlite3 after every version switch.

  2. Streaming proxies are tricky. SSE (Server-Sent Events) responses from Anthropic need to be intercepted without buffering — you want zero-latency passthrough while still accumulating usage data from the final event. The SSEInterceptor Transform stream handles this.

  3. Test with real money before shipping. Mock tests proved the code worked. Real Anthropic API calls found two critical bugs: a wrong default model ID in routing rules, and a missing anthropic-version header that the proxy wasn't injecting. Both would have broken every user's setup.

  4. Security documentation is a feature. In the OpenClaw ecosystem where trust is low (malicious skills, exposed instances), having a thorough SECURITY.md that explains exactly how API keys are handled makes people comfortable using your tool.
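To make lesson 2 concrete: the interceptor's job reduces to scanning each SSE data: line for usage fields while the bytes pass through untouched. A simplified sketch of just the parsing step (this is not the actual SSEInterceptor, and it assumes whole events arrive per call):

```typescript
// Accumulate usage from Anthropic-style SSE events. Simplified sketch:
// real code runs inside a Transform stream and handles split chunks.
interface Usage {
  input_tokens?: number;
  output_tokens?: number;
}

function accumulateUsage(acc: Usage, sseChunk: string): Usage {
  for (const line of sseChunk.split("\n")) {
    if (!line.startsWith("data:")) continue;
    try {
      const event = JSON.parse(line.slice(5).trim());
      // Usage may sit at the top level (message_delta) or nested
      // under message (message_start), so check both.
      const usage = event.usage ?? event.message?.usage;
      if (usage) {
        acc = {
          input_tokens: usage.input_tokens ?? acc.input_tokens,
          output_tokens: usage.output_tokens ?? acc.output_tokens,
        };
      }
    } catch {
      // Non-JSON data lines pass through untouched; we only observe.
    }
  }
  return acc;
}
```

Because the accumulator only reads lines it recognizes and never modifies them, it can run alongside a zero-latency passthrough instead of buffering the whole response.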

Try It

TokPinch is 100% free and open source (MIT licensed).

Quick start:

docker run -p 4100:4100 -v tokpinch-data:/app/data \
  -e DASHBOARD_PASSWORD=yourpassword \
  tokpinch/tokpinch

Then add one line to your OpenClaw config:

ANTHROPIC_BASE_URL=http://localhost:4100/v1

Open http://localhost:4100/dashboard and watch your costs in real time.


If you're running OpenClaw or any AI agent with paid API access, give TokPinch a try. Star the repo if it's useful, and open an issue if you find bugs.

Built by TobieTom 🇳🇬


What features would you want to see next? Drop a comment below or open a GitHub issue.
