Last week, Anthropic cut off subscription access for OpenClaw and all third-party agent tools. If you were running OpenClaw on a Claude Max plan, you already know the math changed overnight.
The immediate advice everywhere was the same: switch to Haiku, use cheaper models, trim your system prompts. But none of these suggestions tell you where your money is actually going across your full stack, or how much you could realistically save by optimizing token usage.
I wanted actual numbers. So I built a tool that connects to billing APIs, pulls historical spend, and runs optimization analyses across all of it.
The full picture is bigger than LLM inference tokens
Most people only think about their OpenAI or Anthropic bill. But if you are running anything in production, you are also paying for cloud infra, monitoring, communication APIs, search services, creative generation, and probably Stripe fees on revenue.
I started with a handful of common providers, based on my own usage (GPT, Claude, AWS, Vercel, Gemini, SendGrid, Tavily, even cryptocurrencies!), and tracked everything going back a few months, categorizing spend broadly into:
- LLM inference
- Cloud infra
- Creative/media generation
- Communication
- Monitoring
- Search/data
- Advertising
- Cash/Treasury
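To make the bucketing concrete, here is a minimal sketch of how billing line items might be rolled up into the categories above. The provider mapping and amounts are illustrative placeholders, not ANVX's actual implementation or real billing data.

```python
# Sketch: bucket billing rows into spend categories.
# CATEGORY_BY_PROVIDER is an illustrative mapping, not the tool's real one.
from collections import defaultdict

CATEGORY_BY_PROVIDER = {
    "openai": "LLM inference",
    "anthropic": "LLM inference",
    "aws": "Cloud infra",
    "vercel": "Cloud infra",
    "sendgrid": "Communication",
    "tavily": "Search/data",
}

def categorize(rows):
    """rows: iterable of (provider, usd_amount) billing line items.
    Returns total spend per category; unknown providers land in 'Other'."""
    totals = defaultdict(float)
    for provider, amount in rows:
        totals[CATEGORY_BY_PROVIDER.get(provider, "Other")] += amount
    return dict(totals)

print(categorize([("openai", 412.50), ("aws", 930.10), ("sendgrid", 42.00)]))
```

Once spend is in buckets like this, the per-category optimization modules below have something to work against.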
The spend breakdown itself was useful, but the optimization analyses were the real finding.
Six optimization categories, hundreds in savings
The tool runs six modules on billing data. Every number comes from your actual usage patterns and live model pricing. Nothing is hardcoded or assumed.
- Model Routing: 80% of GPT-4o requests had under 500 input tokens - classification, extraction, short lookups. The tool identifies these by looking at the request size distribution, then prices the same workload on cheaper models using live rates from OpenRouter. Switching those tasks to GPT-4o mini works out to hundreds per month in savings.
- Caching Opportunity: My API calls had an 85% consistency score - the same input content repeated on every call (system prompts, tool definitions, context preamble). That repeated prefix is exactly what prompt caching discounts.
- Batch Processing: 70% of my daily API volume was consistent day over day - automated pipelines, not interactive queries. Moving those to OpenAI's Batch API gets a 50% discount.
- Unit Economics: With Stripe connected, the tool builds a P&L per customer. My AI inference cost per customer is just shy of a dollar per month, with total operating costs close to $2.50. Margins looked fine, but AI costs were growing much faster than revenue - the tool flagged this and recommended optimization strategies.
- Price Comparison: My exact Anthropic workload - 2M input, 500k output tokens - priced on every comparable model at current rates. Not a theoretical comparison table. My actual volume at today's prices.
- Spend Forecast: Growth rates calculated from my billing history and bank statement uploads. Forecasting is never perfect (and no guarantee of future trends), but it was helpful for spotting the points where spend becomes critical.
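The model-routing math above can be sketched in a few lines: take the short-request share of a workload and price it on a cheaper model. The function name and the per-million-token rates below are illustrative placeholders, not the tool's code or live OpenRouter prices (ANVX fetches those at runtime).

```python
# Sketch of the model-routing savings estimate. Prices are placeholder
# USD-per-1M-token rates, not live quotes.

def routing_savings(requests, small_cutoff_tokens, big_price, small_price):
    """requests: list of (input_tokens, output_tokens) per call.
    big_price / small_price: (input_rate, output_rate) in USD per 1M tokens.
    Returns the dollars saved by routing short requests to the cheap model."""
    def cost(reqs, price):
        in_rate, out_rate = price
        return sum(i * in_rate + o * out_rate for i, o in reqs) / 1_000_000

    # Requests small enough to route to the cheaper model
    small = [r for r in requests if r[0] < small_cutoff_tokens]
    return cost(small, big_price) - cost(small, small_price)

# 80% of calls under 500 input tokens, as in the distribution above
reqs = [(300, 150)] * 80 + [(4_000, 1_200)] * 20
# placeholder rates: $2.50/$10.00 per 1M vs $0.15/$0.60 per 1M
print(routing_savings(reqs, 500, (2.50, 10.00), (0.15, 0.60)))
```

The same cost helper generalizes to the price-comparison module: run a fixed workload (say, 2M input and 500k output tokens) through each model's rates and sort the results.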
The tool
It's called ANVX. Open source, data stays local.
It's available as an MCP server for Claude Desktop and ChatGPT.
GitHub: https://github.com/tje8x/anvx
Or if you're running a swarm of agents on OpenClaw, install it as an OpenClaw skill:
clawhub install anvx
It pulls your billing history the moment you connect a provider - no waiting period. Upload bank or card statements to fill in the gaps. The financial model builds over time, and the forecasts get more accurate as more data comes in.
Why this matters right now
The cutoff forced cost visibility on everyone at the same time. For many builders, this is the first time they're paying per-token and the first time they have any reason to look at what they're spending.
Generic advice captures one optimization category. Model routing might save a few hundred a month but caching saves another hundred, batch processing saves tens of dollars, and the unit economics analysis might show your cost trajectory as unsustainable regardless. Without looking at actual data, we're all just guessing.
What's next
The intelligence layer is live. Next: a card and wallet layer that lets you act on the recommendations - program spend, route payments, execute model switches, consolidate all your tokens.
Early access: https://anvx.io