DEV Community

Tokens Forge
Tokens Forge

Posted on

AI API cost control is a routing problem, not a pricing spreadsheet

Most teams start AI cost control with a spreadsheet: model A costs this much, model B costs that much, so use the cheaper one.

That helps for a week. Then production traffic arrives.

The real cost problem is not the model price. It is losing the path between a user request and the billable provider call.

Once a product has multiple features, API keys, environments, retries, and fallback routes, the invoice stops answering the question founders actually care about:

Which product path created this spend, and could we have routed it better?

The failure mode

A typical early setup looks like this:

  • one OpenAI key in an environment variable
  • one Claude key for higher quality tasks
  • maybe Gemini or a proxy for cheaper workloads
  • logs that show application errors, but not token economics
  • a monthly provider invoice that arrives too late

This is fine while one developer is experimenting.

It breaks when several workflows share the same provider account. A single retry loop, a background summarizer, or a test environment can quietly become the largest customer in your AI budget.

The bad part is not only that money was spent. The bad part is that you cannot reconstruct the route.

Treat every AI request like a billable event

The cleaner pattern is to attach accounting data before the request leaves your system.

At minimum, every call should carry:

  • user or API key owner
  • project or workspace
  • requested model
  • actual upstream model
  • route type, such as direct, backup, or cheaper pool
  • input and output tokens
  • settlement bucket, such as credits, wallet balance, or internal cost center
  • request id for debugging

This makes the gateway the source of truth, not the provider invoice.

If a request starts as gpt-5.5 but gets served by a backup route, that decision should be visible. If a cheaper model pool handles a non-critical workflow, that should be visible too. If a premium direct route is used, it should be attached to the right balance and owner immediately.

Route policy matters more than average price

Averages hide the thing you need to tune.

For example, a team may discover that 80% of its calls are low-risk transformations that can tolerate a cheaper route, while 20% need the official direct model path. If both are merged into one monthly spend line, nobody can make a good routing decision.

A practical setup separates:

  • official/direct models for workloads where predictability matters
  • ordinary or pooled routes for lower-cost throughput
  • fallback channels for provider instability
  • per-route usage and error logs
  • clear balances or budgets for each settlement path

That is also how you avoid confusing product pricing with provider pricing. A product might sell usage-based credits while still routing internally across several providers. The customer should see a stable API surface; the operator should see the routing economics.

Alerts should trigger on velocity, not just totals

Daily spend alerts are too slow for runaway loops.

Token velocity catches problems earlier. A workflow that normally burns 20k tokens per hour and suddenly burns 2M tokens in 10 minutes is the event you care about. The absolute daily total may still look acceptable when the damage starts.

Useful alert signals include:

  • tokens per minute by API key
  • error rate by upstream channel
  • fallback route frequency
  • spend by model route
  • sudden provider/model mix changes
  • failed requests that still consumed tokens

This is where gateway-level logs beat provider dashboards. Provider dashboards are useful, but they do not know your feature boundaries.

What we are building

I am building Tokens Forge around this idea: one OpenAI-compatible API surface, but with model routing, official/direct and lower-cost routes, usage logs, balance separation, and AI Researcher workflows in one place.

The goal is not to hide complexity with a black-box proxy. The goal is to make the routing and billing path inspectable enough that a founder can answer:

  • which users or keys are spending
  • which models are actually serving requests
  • which routes are expensive but necessary
  • which routes can be moved to a cheaper path
  • which failures need operational attention

If you are building AI features, I would treat gateway instrumentation as product infrastructure, not billing admin.

Once the request leaves your app, the chance to attach useful business context is already mostly gone.

Tokens Forge: https://tokens-forge.com/

Top comments (0)