I've been building an AI agent that routes requests across multiple LLM providers (OpenAI, Anthropic, and so on) based on the task. But I quickly hit a real problem: how do you charge for this fairly?
Flat subscriptions didn't make sense. Token costs vary by model, input vs output, and actual usage. A user generating a two-line summary isn't the same as someone churning out 3,000-word articles, yet flat pricing treats them the same.
I looked at a few options for usage-based billing. Stripe Billing has metered subscriptions, but you have to build your own token tracking pipeline on top. Orb and Metronome are solid, but they're separate vendors; you'd still need something to capture token data from your LLM calls and pipe it in. What I wanted was something at the gateway level, where the traffic already flows.
I ended up using Kong AI Gateway with Konnect Metering & Billing (built on OpenMeter). The gateway proxies every LLM request, so it already knows the token counts. The metering layer plugs directly into that. No separate vendor, no custom pipeline.
So instead of debating pricing models, I set up the billing layer: a working system where every API request flows through a gateway, gets tracked, and is priced based on real usage:
- Route requests through AI Gateway
- Tokens get metered per consumer
- Pricing gets applied
- Invoice generated
Here's the whole setup, step by step.
- Set up the gateway
- Step 1: Create a consumer
- Step 2: Configure the AI Proxy
- Step 3: Enable token metering
- Step 4: Create a feature
- Step 5: Create a plan with a rate card
- Step 6: Create a subscription
- Step 7: Validate the invoice
- Step 8: Connect Stripe
The Setup
The billing pipeline has three layers:
Kong AI Gateway proxies the LLM requests. It sits between the app and the provider, handles auth, and, crucially for billing, logs token statistics for every request.
Konnect Metering & Billing (this is built on OpenMeter) takes those token events and aggregates them per consumer, per billing cycle. It supports defining features, pricing models, and plans on top of the raw usage data.
Stripe collects payment. The metering layer generates invoices that sync to Stripe.
Let me walk through each piece.
Prerequisites
You can do this entirely through the UI or via CLI. I'll cover both as we go.
- A Kong Konnect account
- An OpenAI API key (or any LLM provider key of your choice)
For CLI, you'll also need decK (v1.43+) installed and a personal access token (PAT) from Kong Konnect.
Set Up the Gateway
Once you log in, click on API Gateway and create one.
I'm using Serverless here. You can choose Self-managed too. Enter the gateway name as ai-service and click Create and configure. Once that's done, click Add a service and route and fill in:
- Service Name: ai-service
- Service URL: http://httpbin.konghq.com/anything
- Route Name: ai-chat
- Route Path: /chat
CLI
If you prefer the command line, generate your PAT and run:
export KONNECT_TOKEN='your_konnect_pat'
curl -Ls https://get.konghq.com/quickstart | bash -s -- \
  -k $KONNECT_TOKEN --deck-output
This gives you a running Kong Gateway connected to Konnect. It'll output some environment variables; export them as instructed. You'll also need:
export DECK_OPENAI_API_KEY='your_openai_api_key'
Then set up the service and route:
_format_version: "3.0"
services:
  - name: ai-service
    url: http://httpbin.konghq.com/anything
routes:
  - name: ai-chat
    paths:
      - "/chat"
    service:
      name: ai-service
Apply it with deck gateway apply. Now you have a route at /chat that we'll wire up to an LLM.
Step 1: Create a Consumer
You can't bill anyone if the gateway doesn't know who is making the request. Consumers are how Kong identifies API callers. Later, we'll map each consumer to a billing customer.
Add a consumer with a key-auth credential:
You can enter the Key value as acme-secret-key.
Now, you need to add the key-auth plugin to the service so the gateway actually requires authentication:
- Click on Plugins in the left sidebar
- Click on New Plugin
- Select Key Authentication from the plugin list
- Select Service as the scope or keep it as Global
- Click Save
CLI
_format_version: "3.0"
consumers:
  - username: acme-corp
    keyauth_credentials:
      - key: acme-secret-key
Then enable the key-auth plugin on the service so the gateway actually requires authentication:
_format_version: "3.0"
plugins:
  - name: key-auth
    service: ai-service
    config:
      key_names:
        - apikey
Apply both with deck gateway apply.
Now every request to /chat must include an apikey header. The gateway identifies the caller as acme-corp, and that identity flows through to metering. Without this step, usage events have no subject. They're anonymous, and you can't attribute them to anyone.
Step 2: Configure the AI Proxy
Next, wire the route to an actual LLM. The AI Proxy plugin accepts requests in OpenAI's chat format and forwards them to the configured provider.
- Navigate to Plugins
- Click on New Plugin
- Select AI Proxy from the plugin list
For the CLI, use the YAML below; in the UI, configure the plugin fields to match:
_format_version: "3.0"
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        provider: openai
        name: gpt-4o
      logging:
        log_payloads: true
        log_statistics: true
Two things to note here:
log_statistics: true is what makes billing possible. Without it, the gateway proxies requests but doesn't record token counts. When enabled, it captures prompt tokens, completion tokens, and total tokens on every response. This is the data that metering consumes downstream.
log_payloads: true logs the actual request/response content. This is optional and useful for debugging, but you'd probably turn it off in production for privacy reasons.
Apply with deck gateway apply and test:
curl -X POST "$KONNECT_PROXY_URL/chat" \
  -H "Content-Type: application/json" \
  -H "apikey: acme-secret-key" \
  --json '{
    "messages": [
      {"role": "system", "content": "You are a mathematician."},
      {"role": "user", "content": "What is 1+1?"}
    ]
  }'
You should get a response from GPT-4o. The gateway handled auth, forwarded the request, and logged the token statistics.
If you want to proxy multiple providers (say, OpenAI and Anthropic with automatic failover), you'd use [ai-proxy-advanced](https://developer.konghq.com/plugins/ai-proxy-advanced/) instead with a load balancing config. I stuck with a single provider here to keep the billing walkthrough focused.
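For the curious, a multi-provider setup would look roughly like this. This is an untested sketch, not something configured in this walkthrough; the balancer/targets structure and per-target field names are my assumptions, so verify them against the ai-proxy-advanced plugin schema before applying:

```yaml
# Hypothetical ai-proxy-advanced config: round-robin across two providers.
# Field names are assumptions -- check the plugin docs before use.
_format_version: "3.0"
plugins:
  - name: ai-proxy-advanced
    config:
      balancer:
        algorithm: round-robin
      targets:
        - route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
          model:
            provider: openai
            name: gpt-4o
        - route_type: llm/v1/chat
          auth:
            header_name: x-api-key
            header_value: ${{ env "DECK_ANTHROPIC_API_KEY" }}
          model:
            provider: anthropic
            name: claude-3-5-sonnet-20240620
```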
Step 3: Enable Token Metering
Now we connect the gateway's token logs to the metering system.
In Konnect, go to Metering & Billing in the sidebar. You'll see an AI Gateway Tokens section. Click Enable Related API Gateways, select your control plane (the quickstart one), and confirm.
This activates a built-in meter called kong_konnect_llm_tokens. It uses SUM aggregation on the token count, grouped by:
- $.model: which LLM handled the request
- $.type: whether the tokens are input (request) or output (response)
The grouping matters because LLM providers charge differently for input vs. output tokens. Output tokens are typically 3-5x more expensive: input can be processed in parallel across GPUs, while output generation is sequential, since each token depends on all previous tokens. If your metering doesn't split these, your pricing will be wrong.
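To see why the split matters, here's a back-of-the-envelope calculation using the GPT-4o list prices quoted later in this post ($2.50 per 1M input tokens, $10.00 per 1M output tokens). The token counts are made up for illustration:

```shell
# Cost of one hypothetical request at GPT-4o list prices.
awk 'BEGIN {
  in_tok  = 1000; out_tok = 500            # illustrative token counts
  in_cost  = in_tok  * 2.50  / 1000000     # $2.50 per 1M input tokens
  out_cost = out_tok * 10.00 / 1000000     # $10.00 per 1M output tokens
  printf "input:  $%.4f\n", in_cost
  printf "output: $%.4f\n", out_cost
  printf "total:  $%.4f\n", in_cost + out_cost
}'
```

Half as many output tokens, yet twice the cost of the input. A single flat per-token rate hides that asymmetry.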
At this point, every authenticated request through the AI Gateway generates a usage event that gets aggregated by the meter. But usage alone doesn't generate invoices. You need to define what's billable and how it's priced.
Step 4: Create a Feature
A feature is the link between raw metered data and something that appears on an invoice. Without it, usage is tracked but never billed.
Go to Metering & Billing → Product Catalog → Features and create one:
- Name: ai-token
- Meter: AI Gateway Tokens
- Group by filters:
  - Provider = openai
  - Type = request (this tracks input tokens; you'd create a separate feature for output tokens if you want to price them differently)
The filters narrow the meter to a specific slice of usage. In a real setup, you'd likely create multiple features (one per model and per token direction) to apply different rates. For this walkthrough, I'm keeping it to one feature to show the flow.
Step 5: Create a Plan with a Rate Card
Plans bundle features with pricing. Go to Product Catalog → Plans and create one:
- Name: Starter
- Billing cadence: 1 month

Add a rate card:

- Feature: ai-token
- Pricing model: Usage Based
- Price per unit: 1
- Entitlement type: Boolean (grants access to the feature)
A note on what "price per unit" means here: 1 unit = 1 token, because the meter SUMs individual tokens. So entering 1 means $1.00 per token, which is way too expensive for real use. I'm using it here because the official tutorial does the same thing: a round number that makes invoice changes easy to spot during testing.
For production, you'd enter something like 0.0000025 for GPT-4o input tokens ($2.50 per 1M tokens) or 0.00001 for GPT-4o output tokens ($10.00 per 1M tokens). There's no "per 1,000" toggle in the UI; you do the math yourself and enter the per-token price as a decimal.
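The conversion is simple but easy to fat-finger, so it's worth sanity-checking. Using the GPT-4o figures quoted in the "Things I Ran Into" section below:

```shell
# Convert per-1M-token list prices into the per-token decimals
# you enter in the rate card.
awk 'BEGIN {
  printf "GPT-4o input:  %.7f per token\n", 2.50  / 1000000
  printf "GPT-4o output: %.7f per token\n", 10.00 / 1000000
}'
```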
Publish the plan. It's now available for subscriptions.
Step 6: Create a Customer and Start a Subscription
This is where the consumer from Step 1 connects to the billing system.
Go to Metering & Billing → Billing → Customers and create one:
- Name: Acme Corp
- Include usage from: select the acme-corp consumer
This mapping is what ties gateway traffic to a billable entity. The consumer handles identity at the gateway level; the customer handles identity at the billing level. They're separate concepts joined here.
Now create a subscription:
- Go to the Acme Corp customer, then Subscriptions → Create a Subscription
- Plan: Starter
- Start the subscription
One important detail: metering only invoices events that occur after the subscription starts. If you sent test requests before creating the subscription, those tokens won't appear on any invoice. I spent some time confused by this before finding it in the docs.
Step 7: Validate the Invoice
Send a few requests through the gateway:
for i in {1..6}; do
  curl -s -X POST "$KONNECT_PROXY_URL/chat" \
    -H "Content-Type: application/json" \
    -H "apikey: acme-secret-key" \
    --json '{
      "messages": [
        {"role": "user", "content": "Explain what a Fourier transform does in two sentences."}
      ]
    }'
  echo ""
done
Wait a minute or two for the events to propagate, then go to Metering & Billing → Billing → Invoices. Click on Acme Corp, go to the Invoicing tab, and hit Preview Invoice.
You should see the ai-token feature listed with the aggregated token count and the calculated charge based on your rate card. That's the billing pipeline working end to end, from an API request to a line item on an invoice.
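One way to sanity-check the preview: at the test rate of $1 per token, the charge should simply equal the token count the ai-token feature captured (input tokens only, given the Step 4 filters). The count below is a placeholder; read the real number from your meter:

```shell
# Expected charge at the deliberately-absurd test rate of $1/token.
awk 'BEGIN {
  tokens = 240       # placeholder: e.g. 6 requests x ~40 input tokens each
  rate   = 1.00      # price per unit from the Starter rate card
  printf "expected line item: $%.2f\n", tokens * rate
}'
```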
Connecting Stripe
Konnect syncs invoices to Stripe, which handles payment collection, receipts, and retry logic for failed payments. You connect your Stripe account in the Metering & Billing settings, and invoices flow through automatically at the end of each billing cycle.
The result for end users is a transparent invoice showing exactly what they consumed: token count, model, rate applied. Not a flat fee with no breakdown.
Things I Ran Into
The consumer-customer mapping confused me at first. Kong Gateway has "consumers" (API identity). Metering & Billing has "customers" (billing identity). They're separate. You create both, then link them. If you skip the consumer or forget to link it, usage events come in but they're not attributed to anyone billable. Set this up before you start sending traffic.
Input vs. output pricing is a bigger deal than I expected. Output tokens from OpenAI's GPT-4o cost $10.00/1M vs. $2.50/1M for input. If you use a single flat rate for "tokens," you'll underprice output-heavy workloads significantly. Splitting features by token type (request vs. response) and pricing them separately is worth the extra configuration.
The order of operations matters. Specifically: create the consumer and link it to a customer before you start sending traffic you care about billing for. Events that arrive before a subscription exists don't retroactively appear on invoices.
Where I'd Take This Next
This walkthrough uses a single provider and a single feature. A production setup would look more like:
- Multiple features: one per model per token direction (GPT-4o input, GPT-4o output, Claude input, Claude output)
- Tiered pricing: lower per-token rates at higher usage thresholds to incentivize growth
- Entitlements with metered limits: cap total tokens per month per plan tier, so you can offer Starter (500K tokens), Pro (5M tokens), Enterprise (unlimited)
- AI Proxy Advanced: route across multiple providers with load balancing (lowest-latency, round-robin, or cost-based routing)
The docs for all of these are at developer.konghq.com/metering-and-billing and developer.konghq.com/ai-gateway.
If you're building an AI agent and thinking about how to charge for it, I'd be curious to hear your approach. Per-token, credits, flat rate? What's working, what's not? Drop your thoughts in the comments.