I've been building an AI agent that routes requests across multiple LLM providers (OpenAI, Anthropic, and so on) based on the task. But I quickly hit a real problem: how do you charge for this fairly?
Flat subscriptions didn't make sense. Token costs vary by model, input vs output, and actual usage. A user generating a two-line summary isn't the same as someone churning out 3,000-word articles, yet flat pricing treats them the same.
I looked at a few options for usage-based billing. Stripe Billing has metered subscriptions, but you have to build your own token tracking pipeline on top. Orb and Metronome are solid, but they're separate vendors; you'd still need something to capture token data from your LLM calls and pipe it in. What I wanted was something at the gateway level, where the traffic already flows.
I ended up using Kong AI Gateway with Konnect Metering & Billing (built on OpenMeter). The gateway proxies every LLM request, so it already knows the token counts. The metering layer plugs directly into that. No separate vendor, no custom pipeline.
So instead of debating pricing models, I set up the billing layer: a working system where every API request flows through a gateway, gets tracked, and is priced based on real usage:
- Route requests through AI Gateway
- Tokens get metered per consumer
- Pricing gets applied
- Invoice generated
Here's the whole setup, step by step.
- Set up the gateway
- Step 1: Create a consumer
- Step 2: Configure the AI Proxy
- Step 3: Enable token metering
- Step 4: Create a feature
- Step 5: Create a plan with a rate card
- Step 6: Create a subscription
- Step 7: Validate the invoice
- Step 8: Connect Stripe
The Setup
The billing pipeline has three layers:
Kong AI Gateway proxies the LLM requests. It sits between the app and the provider, handles auth, and, crucially for billing, logs token statistics for every request.
Konnect Metering & Billing (this is built on OpenMeter) takes those token events and aggregates them per consumer, per billing cycle. It supports defining features, pricing models, and plans on top of the raw usage data.
Stripe collects payment. The metering layer generates invoices that sync to Stripe.
Let me walk through each piece.
Prerequisites
You can do this entirely through the UI or via CLI. I'll cover both as we go.
- A Kong Konnect account
- An OpenAI API key (or any LLM provider key of your choice)
For CLI, you'll also need decK (v1.43+) installed and a personal access token (PAT) from Kong Konnect.
Set Up the Gateway
Once you log in, click on API Gateway and create one.
I'm using Serverless here. You can choose Self-managed too. Enter the gateway name as ai-service and click Create and configure. Once that's done, click Add a service and route and fill in:
- Service Name: ai-service
- Service URL: http://httpbin.konghq.com/anything
- Route Name: ai-chat
- Route Path: /chat
CLI
If you prefer the command line, generate your PAT and run:
export KONNECT_TOKEN='your_konnect_pat'
curl -Ls https://get.konghq.com/quickstart | bash -s -- \
  -k $KONNECT_TOKEN --deck-output
This gives you a running Kong Gateway connected to Konnect. It'll output some environment variables; export them as instructed. You'll also need:
export DECK_OPENAI_API_KEY='your_openai_api_key'
Then set up the service and route:
_format_version: "3.0"
services:
  - name: ai-service
    url: http://httpbin.konghq.com/anything
routes:
  - name: ai-chat
    paths:
      - "/chat"
    service:
      name: ai-service
Apply it with deck gateway apply. Now you have a route at /chat that we'll wire up to an LLM.
Step 1: Create a Consumer
You can't bill anyone if the gateway doesn't know who is making the request. Consumers are how Kong identifies API callers. Later, we'll map each consumer to a billing customer.
Add a consumer with a key-auth credential:
You can enter the Key value as acme-secret-key.
Now, you need to add the key-auth plugin to the service so the gateway actually requires authentication:
- Click on Plugins in the left sidebar
- Click on New Plugin
- Select Key Authentication from the plugin list
- Select Service as the scope or keep it as Global
- Click Save
CLI
_format_version: "3.0"
consumers:
  - username: acme-corp
    keyauth_credentials:
      - key: acme-secret-key
Then enable the key-auth plugin on the service so the gateway actually requires authentication:
_format_version: "3.0"
plugins:
  - name: key-auth
    service: ai-service
    config:
      key_names:
        - apikey
Apply both with deck gateway apply.
Now every request to /chat must include an apikey header. The gateway identifies the caller as acme-corp, and that identity flows through to metering. Without this step, usage events have no subject. They're anonymous, and you can't attribute them to anyone.
Step 2: Configure the AI Proxy
Next, wire the route to an actual LLM. The AI Proxy plugin accepts requests in OpenAI's chat format and forwards them to the configured provider.
- Navigate to Plugins
- Click on New Plugin
- Select AI Proxy from the plugin list
For the CLI, use the YAML below; in the UI, configure the plugin fields to match:
_format_version: "3.0"
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        provider: openai
        name: gpt-4o
      logging:
        log_payloads: true
        log_statistics: true
Two things to note here:
log_statistics: true is what makes billing possible. Without it, the gateway proxies requests but doesn't record token counts. When enabled, it captures prompt tokens, completion tokens, and total tokens on every response. This is the data that metering consumes downstream.
log_payloads: true logs the actual request/response content. This is optional and useful for debugging, but you'd probably turn it off in production for privacy reasons.
Apply with deck gateway apply and test:
curl -X POST "$KONNECT_PROXY_URL/chat" \
  -H "Content-Type: application/json" \
  -H "apikey: acme-secret-key" \
  --json '{
    "messages": [
      {"role": "system", "content": "You are a mathematician."},
      {"role": "user", "content": "What is 1+1?"}
    ]
  }'
You should get a response from GPT-4o. The gateway handled auth, forwarded the request, and logged the token statistics.
If you want to proxy multiple providers (say, OpenAI and Anthropic with automatic failover), you'd use [ai-proxy-advanced](https://developer.konghq.com/plugins/ai-proxy-advanced/) instead with a load balancing config. I stuck with a single provider here to keep the billing walkthrough focused.
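For the curious, a multi-provider setup would look roughly like this. This is an untested sketch, not something configured in this walkthrough; the balancer/targets structure and per-target field names are my assumptions, so verify them against the ai-proxy-advanced plugin schema before applying:

```yaml
# Hypothetical ai-proxy-advanced config: round-robin across two providers.
# Field names are assumptions -- check the plugin docs before use.
_format_version: "3.0"
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        algorithm: round-robin
      targets:
        - route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
          model:
            provider: openai
            name: gpt-4o
        - route_type: llm/v1/chat
          auth:
            header_name: x-api-key
            header_value: ${{ env "DECK_ANTHROPIC_API_KEY" }}
          model:
            provider: anthropic
            name: claude-3-5-sonnet-20240620
```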
Step 3: Enable Token Metering
Now we connect the gateway's token logs to the metering system.
In Konnect, go to Metering & Billing in the sidebar. You'll see an AI Gateway Tokens section. Click Enable Related API Gateways, select your control plane (the quickstart one), and confirm.
This activates a built-in meter called kong_konnect_llm_tokens. It uses SUM aggregation on the token count, grouped by:
- $.model: which LLM handled the request
- $.type: whether the tokens are input (request) or output (response)
The grouping matters because LLM providers charge differently for input vs. output tokens. Output tokens are typically 3-5x more expensive: input can be processed in parallel across GPUs, while output generation is sequential, since each token depends on all previous tokens. If your metering doesn't split these, your pricing will be wrong.
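To see why the split matters, here's a back-of-the-envelope calculation using the GPT-4o list prices quoted later in this post ($2.50 per 1M input tokens, $10.00 per 1M output tokens). The token counts are made up for illustration:

```shell
# Cost of one hypothetical request at GPT-4o list prices.
awk 'BEGIN {
  in_tok  = 1000; out_tok = 500            # illustrative token counts
  in_cost  = in_tok  * 2.50  / 1000000     # $2.50 per 1M input tokens
  out_cost = out_tok * 10.00 / 1000000     # $10.00 per 1M output tokens
  printf "input:  $%.4f\n", in_cost
  printf "output: $%.4f\n", out_cost
  printf "total:  $%.4f\n", in_cost + out_cost
}'
```

Half as many output tokens, yet twice the cost of the input. A single flat per-token rate hides that asymmetry.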
At this point, every authenticated request through the AI Gateway generates a usage event that gets aggregated by the meter. But usage alone doesn't generate invoices. You need to define what's billable and how it's priced.
Step 4: Create a Feature
A feature is the link between raw metered data and something that appears on an invoice. Without it, usage is tracked but never billed.
Go to Metering & Billing → Product Catalog → Features and create one:
- Name: ai-token
- Meter: AI Gateway Tokens
- Group by filters:
  - Provider = openai
  - Type = request (this tracks input tokens; you'd create a separate feature for output tokens if you want to price them differently)
The filters narrow the meter to a specific slice of usage. In a real setup, you'd likely create multiple features (one per model and per token direction) to apply different rates. For this walkthrough, I'm keeping it to one feature to show the flow.
Step 5: Create a Plan with a Rate Card
Plans bundle features with pricing. Go to Product Catalog → Plans and create one:
- Name: Starter
- Billing cadence: 1 month

Add a rate card:

- Feature: ai-token
- Pricing model: Usage Based
- Price per unit: 1
- Entitlement type: Boolean (grants access to the feature)
A note on what "price per unit" means here: 1 unit = 1 token, because the meter SUMs individual tokens. So entering 1 means $1.00 per token, which is way too expensive for real use. I'm using it here because the official tutorial does the same thing: a round number that makes invoice changes easy to spot during testing.
For production, you'd enter something like 0.0000025 for GPT-4o input tokens ($2.50 per 1M tokens) or 0.00001 for GPT-4o output tokens ($10.00 per 1M tokens). There's no "per 1,000" toggle in the UI; you do the math yourself and enter the per-token price as a decimal.
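The conversion is simple but easy to fat-finger, so it's worth sanity-checking. Using the GPT-4o figures quoted in the "Things I Ran Into" section below:

```shell
# Convert per-1M-token list prices into the per-token decimals
# you enter in the rate card.
awk 'BEGIN {
  printf "GPT-4o input:  %.7f per token\n", 2.50  / 1000000
  printf "GPT-4o output: %.7f per token\n", 10.00 / 1000000
}'
```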
Publish the plan. It's now available for subscriptions.
Step 6: Create a Customer and Start a Subscription
This is where the consumer from Step 1 connects to the billing system.
Go to Metering & Billing → Billing → Customers and create one:
- Name: Acme Corp
- Include usage from: select the acme-corp consumer
This mapping is what ties gateway traffic to a billable entity. The consumer handles identity at the gateway level; the customer handles identity at the billing level. They're separate concepts joined here.
Now create a subscription:
- Go to the Acme Corp customer, then Subscriptions → Create a Subscription
- Plan: Starter
- Start the subscription
One important detail: metering only invoices events that occur after the subscription starts. If you sent test requests before creating the subscription, those tokens won't appear on any invoice. I spent some time confused by this before finding it in the docs.
Step 7: Validate the Invoice
Send a few requests through the gateway:
for i in {1..6}; do
  curl -s -X POST "$KONNECT_PROXY_URL/chat" \
    -H "Content-Type: application/json" \
    -H "apikey: acme-secret-key" \
    --json '{
      "messages": [
        {"role": "user", "content": "Explain what a Fourier transform does in two sentences."}
      ]
    }'
  echo ""
done
Wait a minute or two for the events to propagate, then go to Metering & Billing → Billing → Invoices. Click on Acme Corp, go to the Invoicing tab, and hit Preview Invoice.
You should see the ai-token feature listed with the aggregated token count and the calculated charge based on your rate card. That's the billing pipeline working end to end, from an API request to a line item on an invoice.
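One way to sanity-check the preview: at the test rate of $1 per token, the charge should simply equal the token count the ai-token feature captured (input tokens only, given the Step 4 filters). The count below is a placeholder; read the real number from your meter:

```shell
# Expected charge at the deliberately-absurd test rate of $1/token.
awk 'BEGIN {
  tokens = 240       # placeholder: e.g. 6 requests x ~40 input tokens each
  rate   = 1.00      # price per unit from the Starter rate card
  printf "expected line item: $%.2f\n", tokens * rate
}'
```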
Connecting Stripe
Konnect syncs invoices to Stripe, which handles payment collection, receipts, and retry logic for failed payments. You connect your Stripe account in the Metering & Billing settings, and invoices flow through automatically at the end of each billing cycle.
The result for end users is a transparent invoice showing exactly what they consumed: token count, model, rate applied. Not a flat fee with no breakdown.
Things I Ran Into
The consumer-customer mapping confused me at first. Kong Gateway has "consumers" (API identity). Metering & Billing has "customers" (billing identity). They're separate. You create both, then link them. If you skip the consumer or forget to link it, usage events come in but they're not attributed to anyone billable. Set this up before you start sending traffic.
Input vs. output pricing is a bigger deal than I expected. Output tokens from OpenAI's GPT-4o cost $10.00/1M vs. $2.50/1M for input. If you use a single flat rate for "tokens," you'll underprice output-heavy workloads significantly. Splitting features by token type (request vs. response) and pricing them separately is worth the extra configuration.
The order of operations matters. Specifically: create the consumer and link it to a customer before you start sending traffic you care about billing for. Events that arrive before a subscription exists don't retroactively appear on invoices.
Where I'd Take This Next
This walkthrough uses a single provider and a single feature. A production setup would look more like:
- Multiple features: one per model per token direction (GPT-4o input, GPT-4o output, Claude input, Claude output)
- Tiered pricing: lower per-token rates at higher usage thresholds to incentivize growth
- Entitlements with metered limits: cap total tokens per month per plan tier, so you can offer Starter (500K tokens), Pro (5M tokens), Enterprise (unlimited)
- AI Proxy Advanced: route across multiple providers with load balancing (lowest-latency, round-robin, or cost-based routing)
The docs for all of these are at developer.konghq.com/metering-and-billing and developer.konghq.com/ai-gateway.
If you're building an AI agent and thinking about how to charge for it, I'd be curious to hear your approach. Per-token, credits, flat rate? What's working, what's not? Drop your thoughts in the comments.