Discussion on: 💰I Built a Token Billing System for My AI Agent - Here's How It Works


The decision to meter at the gateway level instead of the application layer is smart — I've seen teams build token tracking into their app code and it becomes a maintenance nightmare when you add new models or providers. The gateway already sees everything, so why duplicate that logic? One challenge I've run into with per-token billing is that users often can't predict their costs because token counts are invisible to them. A "2,000 token request" means nothing to a non-technical user. Have you considered adding a cost-estimate preview before the request actually executes, or some kind of budget cap that blocks requests once a threshold is hit? That seems like the missing UX piece for making usage-based AI billing actually work for end users.

Teja Kummarikuntla • Apr 1

Totally agree on both points.

Gateway-level metering was mainly about avoiding duplication and keeping model/provider changes out of the app layer.

On the UX side - you’re right, token counts aren’t intuitive at all. Right now this setup solves accurate billing, but not predictable costs. Adding these would make it more solid:

cost previews
usage alerts
hard budget caps

Estimation is a bit tricky (especially output tokens), but even a rough preview would go a long way. Feels like that’s the next layer needed to make this usable for non-technical users.
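
To make the idea concrete, here is a minimal sketch of a rough cost preview combined with a hard budget cap. It is not the billing system from the post. The 4-characters-per-token heuristic, the function names, and the prices are all assumptions for illustration; a real gateway would use the provider's tokenizer and actual price list. Since output length is unknown up front, the sketch reports a range: the low end is input only, the high end assumes the model uses its full output limit.

```python
import math
from dataclasses import dataclass

# Assumption: about 4 characters per token, a rough rule of thumb for
# English text. This is not a real tokenizer.
CHARS_PER_TOKEN = 4


@dataclass
class CostPreview:
    input_tokens: int
    max_output_tokens: int
    low_usd: float   # cost if the model returns no output at all
    high_usd: float  # cost if the model uses the full output limit


def preview_cost(prompt: str, max_output_tokens: int,
                 input_price_per_1k: float,
                 output_price_per_1k: float) -> CostPreview:
    """Estimate a cost range before the request is sent (hypothetical helper)."""
    input_tokens = math.ceil(len(prompt) / CHARS_PER_TOKEN)
    low = input_tokens / 1000 * input_price_per_1k
    high = low + max_output_tokens / 1000 * output_price_per_1k
    return CostPreview(input_tokens, max_output_tokens, low, high)


def within_budget(preview: CostPreview, spent_usd: float, cap_usd: float) -> bool:
    """Hard cap: allow the request only if the worst case still fits the cap."""
    return spent_usd + preview.high_usd <= cap_usd


if __name__ == "__main__":
    # Prices here are made up for the example.
    p = preview_cost("Summarize this document ...", max_output_tokens=500,
                     input_price_per_1k=1.0, output_price_per_1k=2.0)
    print(f"Estimated cost: ${p.low_usd:.4f} to ${p.high_usd:.4f}")
    print("Allowed:", within_budget(p, spent_usd=8.0, cap_usd=10.0))
```

Checking the cap against the worst-case estimate is conservative: it may block a request that would have fit, but it never lets a user overshoot the cap. A usage alert could reuse the same comparison with a lower threshold and a warning instead of a block.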