Tokens Forge

Posted on Jun 27

Multi-agent apps need token budgets, not only cheaper models

#saas #ai #devtools #api

When teams start using AI agents, the first cost-control instinct is usually simple: move more traffic to cheaper models.

That helps, but it does not solve the real operational problem.

A long-running workflow does not fail financially because one model is expensive. It fails because nobody can explain the chain of spending after the run finishes.

Which API key started the task? Which project owned it? Which model route did each step use? Did the request fall back to another route? Did it retry three times? Which balance bucket paid for the final bill?

If those questions are not answerable, a cheaper model only delays the same problem.

The unit of control should be the task

Most dashboards show spend by model, day, or provider. That is useful for accounting, but it is too coarse for agent work.

Agents do not spend money in clean daily rows. They spend money through task chains:

a research task expands context
a coding task calls multiple models
a retry loop quietly repeats a failed step
a fallback route changes the model used
a report generation task runs for 30 to 45 minutes

A monthly cap alone is not enough. The operator also needs a per-task budget envelope.

A task-level budget says: this workflow can spend up to this amount, on these route types, with these fallback rules. When it crosses the boundary, stop the workflow or require a new decision.

That is a different primitive from provider billing.
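To make the envelope idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the Tokens Forge implementation: the names `TaskBudget`, `BudgetExceeded`, and `charge`, the route-type strings, and the choice of integer cents are all hypothetical.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a charge would cross the task's budget boundary."""


@dataclass
class TaskBudget:
    # Hypothetical names for illustration. Amounts are integer cents so the
    # limit comparison below is exact.
    task_id: str
    limit_cents: int
    allowed_route_types: frozenset
    max_fallbacks: int = 1
    spent_cents: int = 0
    fallbacks_used: int = 0

    def charge(self, cost_cents: int, route_type: str, is_fallback: bool = False) -> None:
        """Record a charge, or raise without recording it if a rule is violated."""
        if route_type not in self.allowed_route_types:
            raise BudgetExceeded(f"{self.task_id}: route type {route_type!r} is not allowed")
        if is_fallback and self.fallbacks_used >= self.max_fallbacks:
            raise BudgetExceeded(f"{self.task_id}: fallback limit reached")
        if self.spent_cents + cost_cents > self.limit_cents:
            raise BudgetExceeded(f"{self.task_id}: charge would exceed {self.limit_cents} cents")
        self.spent_cents += cost_cents
        if is_fallback:
            self.fallbacks_used += 1


# Usage: the second charge would cross the boundary, so the workflow stops
# (or asks for a new decision) instead of spending silently.
budget = TaskBudget("research-42", limit_cents=500,
                    allowed_route_types=frozenset({"lower_cost_pool"}))
budget.charge(300, "lower_cost_pool")
try:
    budget.charge(250, "lower_cost_pool")
except BudgetExceeded as exc:
    print(f"stopping workflow: {exc}")
```

The check runs before the spend is recorded, so a rejected charge leaves the envelope unchanged and the caller decides whether to stop or raise the limit.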

Route ledgers matter as much as route selection

Routing is usually presented as a way to lower cost: send easier work to cheaper models, reserve premium routes for harder work, and keep backups ready.

That is only half of the product.

The other half is the ledger.

For every model request, the system should store enough context to explain the charge later:

API key and project owner
requested model and resolved route
upstream model actually called
route type, such as premium/direct or lower-cost pool
fallback chain
retry count
input and output token usage
settlement bucket or balance bucket
latency and error state
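One way to picture a ledger row with the fields listed above is a small immutable record. This is a sketch only: the class name `RouteLedgerEntry`, the field names, the sample values, and the JSON-lines serialization are assumptions for illustration, not a documented Tokens Forge schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RouteLedgerEntry:
    # One row per model request. All field names are hypothetical.
    api_key_id: str
    project_owner: str
    requested_model: str
    resolved_route: str
    upstream_model: str               # the model actually called
    route_type: str                   # e.g. "premium_direct" or "lower_cost_pool"
    fallback_chain: Tuple[str, ...]   # routes tried, in order
    retry_count: int
    input_tokens: int
    output_tokens: int
    balance_bucket: str               # which bucket settled the charge
    latency_ms: int
    error_state: Optional[str] = None

    def to_json_line(self) -> str:
        """Serialize the row as one JSON line for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values, not real keys, models, or routes.
entry = RouteLedgerEntry(
    api_key_id="key_123",
    project_owner="team-research",
    requested_model="model-large",
    resolved_route="pool-a",
    upstream_model="model-large-v2",
    route_type="lower_cost_pool",
    fallback_chain=("direct-x", "pool-a"),
    retry_count=2,
    input_tokens=1200,
    output_tokens=350,
    balance_bucket="routed",
    latency_ms=840,
)
print(entry.to_json_line())
```

Freezing the record reflects the audit goal: a ledger row is evidence about a past request, so it should not be mutated after it is written.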

Without that ledger, a routing layer can become a black box. It may save money most of the time, but when a user asks why a task consumed so much balance, there is no useful answer.

Separate balances make the product clearer

One thing we learned while building Tokens Forge is that balance semantics matter.

Premium/direct model access and lower-cost routed access should not feel like the same wallet with a hidden exchange rate. They have different expectations.

A user buying official model credit wants predictable premium access. A user using lower-cost routes wants discounted throughput and understands that routing can include pools, backups, and different upstream behavior.

Putting those into clear buckets makes the UI easier to explain and the ledger easier to audit.
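As a rough sketch of what separate buckets mean in code, the wallet below settles each charge against exactly one named bucket and never borrows from the other. The class name `BucketedWallet`, the bucket names, and the integer credit units are hypothetical choices for this example.

```python
class InsufficientBalance(Exception):
    """Raised when the named bucket cannot cover a charge."""


class BucketedWallet:
    def __init__(self, premium_direct: int, lower_cost_pool: int) -> None:
        # Balances are kept apart; there is no conversion between them.
        self._buckets = {
            "premium_direct": premium_direct,
            "lower_cost_pool": lower_cost_pool,
        }

    def balance(self, bucket: str) -> int:
        return self._buckets[bucket]

    def settle(self, bucket: str, amount: int) -> None:
        """Deduct from one bucket only; never fall through to another."""
        if bucket not in self._buckets:
            raise KeyError(f"unknown bucket: {bucket}")
        if amount > self._buckets[bucket]:
            raise InsufficientBalance(
                f"{bucket} has {self._buckets[bucket]}, needs {amount}"
            )
        self._buckets[bucket] -= amount


wallet = BucketedWallet(premium_direct=100, lower_cost_pool=50)
wallet.settle("lower_cost_pool", 50)
try:
    # The routed bucket is now empty. The premium bucket is left untouched
    # rather than being drained at an implicit exchange rate.
    wallet.settle("lower_cost_pool", 1)
except InsufficientBalance as exc:
    print(f"top-up required: {exc}")
```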

This is especially important for research workflows

Tokens Forge also includes an AI Researcher workflow. That made the budget problem more obvious.

A short chat request is easy to understand. A research run is different. It can collect data, produce analysis, call quick and deeper models, and generate a long report. It may run for 15, 30, or 45 minutes depending on depth.

For that kind of workflow, token usage must be visible on both sides of the run: an estimate before it starts and the actual figure after it finishes. The user needs enough balance before starting, and the operator needs a ledger if the run costs more than expected.
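A before-and-after check of this kind could be sketched as follows. The function names, the 20 percent safety margin, and the idea of estimating in tokens are assumptions made for illustration; how a real run is estimated is not specified here.

```python
def required_balance(estimated_tokens: int, margin_pct: int = 20) -> int:
    """Estimated usage plus a safety margin, rounded up to a whole token."""
    return (estimated_tokens * (100 + margin_pct) + 99) // 100


def can_start(balance_tokens: int, estimated_tokens: int, margin_pct: int = 20) -> bool:
    """Pre-run check: refuse to start a run the balance cannot cover."""
    return balance_tokens >= required_balance(estimated_tokens, margin_pct)


def reconcile(estimated_tokens: int, actual_tokens: int) -> dict:
    """Post-run check: report any overrun so the ledger can be reviewed."""
    overrun = max(0, actual_tokens - estimated_tokens)
    return {
        "estimated": estimated_tokens,
        "actual": actual_tokens,
        "overrun": overrun,
        "needs_review": overrun > 0,
    }


# Hypothetical numbers for a deep research run.
if can_start(balance_tokens=150_000, estimated_tokens=100_000):
    report = reconcile(estimated_tokens=100_000, actual_tokens=130_000)
    print(report)
```

The margin exists because an estimate for a long research run is only a guess; the reconciliation step is what tells the operator when the guess was wrong.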

That is why we treat the AI Researcher as a workflow built on top of the gateway, not as a separate gimmick. It is a practical test of whether the accounting layer is good enough.

The takeaway

Cheaper models are useful. Fallback routing is useful. Unified APIs are useful.

But for real products, the gateway also needs budget boundaries and route-level evidence.

The cost-control question should not be only:

Which model is cheapest?

It should be:

Which task spent this money, which route spent it, and was that spend allowed?
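Given ledger rows that carry a task identifier, that question reduces to a small aggregation. The sketch below assumes hypothetical row keys (`task_id`, `route`, `cost_cents`) and a per-task budget table; none of these names come from a real schema.

```python
from collections import defaultdict


def spend_by_task_and_route(entries):
    """Total cost per (task, route) pair: which task and which route spent it."""
    totals = defaultdict(int)
    for e in entries:
        totals[(e["task_id"], e["route"])] += e["cost_cents"]
    return dict(totals)


def tasks_over_budget(entries, budgets):
    """Tasks whose total spend exceeds their budget: was the spend allowed?

    A task with no budget entry is treated as having a budget of zero,
    so any spend on it is flagged.
    """
    per_task = defaultdict(int)
    for e in entries:
        per_task[e["task_id"]] += e["cost_cents"]
    return {t: spent for t, spent in per_task.items() if spent > budgets.get(t, 0)}


# Placeholder ledger rows.
ledger = [
    {"task_id": "t1", "route": "pool-a", "cost_cents": 120},
    {"task_id": "t1", "route": "direct-x", "cost_cents": 300},
    {"task_id": "t2", "route": "pool-a", "cost_cents": 40},
    {"task_id": "t1", "route": "pool-a", "cost_cents": 30},
]
print(spend_by_task_and_route(ledger))
print(tasks_over_budget(ledger, {"t1": 400, "t2": 100}))
```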

That is the direction we are building with Tokens Forge: low-cost multi-model API access, visible route ledgers, separate balance semantics, and AI Researcher workflows that make token usage explicit.

https://tokens-forge.com/

Top comments (1)

Alex Shev • Jun 27

Budgets become more important than price once agents run in chains. A cheap model does not help much if the workflow retries, fans out, or escalates without attribution. Per-project and per-run budgets give teams a way to stop runaway work before the invoice becomes the monitoring system.