If you've been in the OpenClaw community for more than a week, you've seen the posts.
"$3,600/month on API calls."
"Woke up to a $200 bill from a heartbeat loop running all night."
"I have zero visibility into what my agent is spending."
OpenClaw is one of the most exciting open-source projects of 2026 — 210K+ GitHub stars, a personal AI agent that actually does things. But the moment you give an AI agent unrestricted access to paid APIs, you're playing with fire.
I kept seeing these horror stories and realized nobody had built a proper solution. There are a few monitoring tools out there (ClawMetry gives you read-only stats, Tokscale is CLI-only), but nothing that actually stops the bleeding in real time.
So I built TokPinch.
What TokPinch Does
TokPinch is a transparent proxy that sits between OpenClaw and your LLM provider (Anthropic, OpenAI). Every API request passes through it.
Setup is literally one line in your OpenClaw config:
ANTHROPIC_BASE_URL=http://localhost:4100/v1
That's it. Your agent doesn't know TokPinch exists. But now you have:
- Real-time cost tracking — every request logged with model, tokens, cost, and session
- Budget enforcement — set daily/monthly limits that actually block requests when exceeded
- Loop detection — catches runaway agents (rapid fire, repeated content, cost spirals, heartbeat storms) and pauses them automatically
- Smart model routing — automatically downgrades cheap tasks (heartbeats, short messages) from Opus to Haiku, saving 10-50%
- Telegram/email alerts — get notified the second something goes wrong
- A dashboard that doesn't suck — dark mode, real-time WebSocket updates, cost charts, budget gauges
The Architecture
I wanted TokPinch to be fast, self-hosted, and free of dependencies on external services.
OpenClaw → TokPinch (localhost:4100) → Anthropic/OpenAI
↓
SQLite (metadata only)
↓
React Dashboard + WebSocket
↓
Telegram/Email Alerts
The tech stack:
- TypeScript — end to end, because life is too short for runtime type errors
- Fastify — one of the fastest Node.js HTTP frameworks, perfect for a proxy
- SQLite (better-sqlite3) — zero config, WAL mode for concurrent reads, file-based so it deploys anywhere
- React 18 + Vite + Tailwind — for the dashboard
- Docker — multi-stage build, runs as non-root with read-only filesystem
The key design decision: TokPinch never stores API keys or message content. Keys pass through in headers and are discarded immediately. Only metadata hits the database (model name, token counts, cost, timestamp, session ID). This is documented in our SECURITY.md.
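To make the metadata-only rule concrete, here's a minimal sketch of what a stored record could look like. The field and function names are illustrative, not TokPinch's actual schema — the point is simply that the record type has no slot for an API key or message body, so there is nothing sensitive to leak.

```typescript
// Illustrative metadata-only record (not the real TokPinch schema).
interface RequestRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  timestamp: number;
  sessionId: string;
}

// Build a record from a proxied response. The API key and message
// content are read only to forward the request; they are never
// copied into the record that hits the database.
function toRecord(
  model: string,
  usage: { input_tokens: number; output_tokens: number },
  costUsd: number,
  sessionId: string,
): RequestRecord {
  return {
    model,
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
    costUsd,
    timestamp: Date.now(),
    sessionId,
  };
}
```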
Building the Loop Detector
This was the most interesting engineering challenge. OpenClaw agents can get stuck in loops — the infamous heartbeat bug, where the agent sends the same message repeatedly, burning through your budget at 20+ requests per minute.
I implemented four detection rules:
- Rapid fire — more than 20 requests per minute from the same session
- Repeated content — same message hash appearing 5+ times in 5 minutes (uses djb2 hash on first 200 chars)
- Cost spiral — more than $2 spent in a 5-minute window
- Heartbeat storm — 10+ heartbeat-pattern messages in 10 minutes
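As a sketch of the repeated-content rule: hash the first 200 characters with djb2, then count matching hashes in a sliding window. This is a simplified standalone version, not the actual TokPinch implementation; thresholds and names are taken from the rules above.

```typescript
// djb2-style hash over the first 200 characters.
function djb2(s: string): number {
  let h = 5381;
  for (let i = 0; i < Math.min(s.length, 200); i++) {
    h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// True when `hash` has appeared `threshold`+ times within `windowMs`.
function repeatedContent(
  events: { hash: number; ts: number }[],
  hash: number,
  now: number,
  windowMs = 5 * 60_000,
  threshold = 5,
): boolean {
  const hits = events.filter((e) => e.hash === hash && now - e.ts <= windowMs);
  return hits.length >= threshold;
}
```

Hashing only a 200-character prefix keeps the check cheap while still catching the "same message over and over" pattern.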
When any rule triggers, TokPinch pauses that session with exponential backoff (starting at 5 minutes, doubling up to 30 minutes). The agent gets a clear error message, and you get a Telegram alert.
🔄 Loop detected! Session loop-test sent identical content 6 times
in 5 minutes. Spending $0.0000. Paused for 5 minute(s).
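The backoff schedule described above (5 minutes, doubling, capped at 30) reduces to a one-liner. `strike` counting from 1 is my assumption here:

```typescript
// Pause for the nth consecutive trigger: 5, 10, 20, 30, 30, ... minutes.
function pauseMinutes(strike: number): number {
  return Math.min(5 * 2 ** (strike - 1), 30);
}
```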
The circular buffer approach keeps memory usage constant — 100 slots per session, O(1) lookups.
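A minimal ring buffer of the kind described (100 slots, constant memory, O(1) appends) might look like this — a generic sketch, not TokPinch's exact class:

```typescript
// Fixed-size ring buffer: constant memory, O(1) push.
// Old entries are overwritten once the buffer wraps.
class RingBuffer<T> {
  private buf: (T | undefined)[];
  private head = 0;
  private count = 0;

  constructor(private capacity = 100) {
    this.buf = new Array(capacity);
  }

  push(item: T): void {
    this.buf[this.head] = item;
    this.head = (this.head + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  // Most-recent-first snapshot for the detection rules to scan.
  recent(): T[] {
    const out: T[] = [];
    for (let i = 1; i <= this.count; i++) {
      out.push(this.buf[(this.head - i + this.capacity) % this.capacity]!);
    }
    return out;
  }
}
```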
Smart Model Routing
This is the feature that saves real money. Not every API call needs the most expensive model.
When OpenClaw sends a heartbeat ping or a short message like "hi" to Claude Opus ($15/MTok input), TokPinch intercepts it and routes to Haiku ($0.80/MTok input) instead. The response quality for trivial tasks is identical, but the cost drops by ~95%.
The routing rules are configurable:
- Route to cheap model when: message is under 200 tokens, no tools/images/documents, system prompt under 500 tokens
- Never downgrade when: user explicitly set the model, images or documents are present, more than 5 tools are being used
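The rules above boil down to one predicate. This is a sketch under assumed field names, not TokPinch's actual config shape:

```typescript
// Illustrative inputs to the routing decision.
interface RouteInput {
  messageTokens: number;
  systemPromptTokens: number;
  toolCount: number;
  hasImagesOrDocs: boolean;
  modelExplicitlySet: boolean;
}

function shouldDowngrade(r: RouteInput): boolean {
  // Never downgrade when the user pinned a model or media is attached.
  if (r.modelExplicitlySet || r.hasImagesOrDocs) return false;
  // Route to the cheap model only for short, tool-free requests
  // with a small system prompt.
  return (
    r.messageTokens < 200 &&
    r.systemPromptTokens < 500 &&
    r.toolCount === 0
  );
}
```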
During testing, a request to claude-opus-4 with just "hi" was correctly routed to claude-haiku-4-5, confirmed in the server logs:
🔀 Routed: claude-opus-4 → claude-haiku-4-5-20251001 (low_token_chat, saved ~$0.0037)
Security: Built for the OpenClaw Crisis
Security isn't an afterthought — it's a feature. OpenClaw has had a rough security track record: one-click RCE, 824+ malicious skills on ClawHub, 42,000+ exposed instances. TokPinch sitting in the API request path means it must be bulletproof.
What we did:
- API keys are never stored or logged — pino logger has redact paths for all auth headers
- Zero message content on disk — the requests table schema literally has no column for it
- Docker runs as non-root with read-only filesystem and no-new-privileges
- JWT auth with auto-generated 512-bit secrets
- Rate limiting on every endpoint (proxy, API, and login)
- Content-Security-Policy, X-Frame-Options, HSTS headers on all responses
- Test endpoints auto-disabled in production
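For the redaction point above, pino's `redact` option takes paths to censor before anything is written. The exact header list here is my guess at what a proxy like this would cover, not TokPinch's real config:

```typescript
// Pino redaction options of the kind described above.
// Header paths are illustrative; check the project's logger setup
// for the authoritative list.
const redactOptions = {
  paths: [
    'req.headers.authorization',
    'req.headers["x-api-key"]',
  ],
  censor: '[REDACTED]',
};
// Usage: pino({ redact: redactOptions })
```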
The full audit is in SECURITY.md.
The Dashboard
I wanted the dashboard to feel like a proper product, not a developer afterthought. Dark theme (zinc-950 base), JetBrains Mono for numbers, Outfit for headings, real-time WebSocket updates, Framer Motion animations.
Overview — 4 stat cards, cost-over-time chart, model breakdown, budget gauges, live request feed
Sessions — every session with cost, request count, tokens, and the most-used model. Expandable rows showing individual requests.
Budget — arc gauges showing spend vs. limit, status badges (ACTIVE/WARNING/PAUSED/OVERRIDE), one-click resume after manual review.
Alerts — all budget warnings, loop detections, and daily digests with filter tabs and delivery status.
What I Learned
- Native modules on Windows are painful. better-sqlite3 needs to be compiled for your exact Node.js version, and switching Node versions (via nvm) breaks the binary. Solution: always run npm rebuild after version changes.
- Streaming proxies are tricky. SSE (Server-Sent Events) responses from Anthropic need to be intercepted without buffering — you want zero-latency passthrough while still accumulating usage data from the final event. The SSEInterceptor Transform stream handles this.
- Test with real money before shipping. Mock tests proved the code worked. Real Anthropic API calls found two critical bugs: a wrong default model ID in the routing rules, and a missing anthropic-version header that the proxy wasn't injecting. Either one would have broken every user's setup.
- Security documentation is a feature. In the OpenClaw ecosystem, where trust is low (malicious skills, exposed instances), a thorough SECURITY.md that explains exactly how API keys are handled makes people comfortable using your tool.
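The passthrough-while-accumulating pattern is worth sketching. This is a simplified stand-in for the interceptor described above — it forwards every chunk immediately and scrapes any `usage` object out of complete `data:` lines, keeping a partial-line tail between chunks. Event parsing is deliberately minimal here:

```typescript
import { Transform, TransformCallback } from 'node:stream';

// Simplified SSE interceptor sketch: zero-latency passthrough plus
// usage accumulation. Not the actual TokPinch SSEInterceptor.
class SSEUsageTap extends Transform {
  usage: { input_tokens?: number; output_tokens?: number } = {};
  private tail = '';

  _transform(chunk: Buffer, _enc: BufferEncoding, cb: TransformCallback): void {
    this.push(chunk); // forward immediately, no buffering
    this.tail += chunk.toString('utf8');
    // Only complete lines are parsed; the partial tail waits for more data.
    const lines = this.tail.split('\n');
    this.tail = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.startsWith('data:')) continue;
      try {
        const event = JSON.parse(line.slice(5).trim());
        if (event.usage) Object.assign(this.usage, event.usage);
      } catch {
        // ignore keep-alives and non-JSON data lines
      }
    }
    cb();
  }
}
```

Keeping the unparsed tail between `_transform` calls is the important detail — SSE events routinely split across TCP chunks.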
Try It
TokPinch is 100% free and open source (MIT licensed).
Quick start:
docker run -p 4100:4100 -v tokpinch-data:/app/data \
-e DASHBOARD_PASSWORD=yourpassword \
tokpinch/tokpinch
Then add one line to your OpenClaw config:
ANTHROPIC_BASE_URL=http://localhost:4100/v1
Open http://localhost:4100/dashboard and watch your costs in real time.
Links:
- GitHub: github.com/TobieTom/tokpinch
- Landing page: tokpinch.vercel.app
- Cloud version waitlist: tokpinch.vercel.app/#waitlist
If you're running OpenClaw or any AI agent with paid API access, give TokPinch a try. Star the repo if it's useful, and open an issue if you find bugs.
Built by TobieTom 🇳🇬
What features would you want to see next? Drop a comment below or open a GitHub issue.