💰Monetize Your AI Agents with LangChain and Kong

Teja Kummarikuntla for Kong

Say you built an AI agent and customers are starting to pay for it. Sooner or later you'll want to charge them by what they actually use, because some customers hammer the agent all day while others send a handful of messages a week. A single flat fee loses money on the heavy users and overcharges the light ones.

The billing problem is the same whether your agent runs on your own model (self-hosted, fine-tuned, or trained from scratch) or calls a third-party API like OpenAI, Anthropic, or Gemini. You still need to know which customer made which call, count the tokens it used, and turn that into a dollar amount on a real invoice. That mapping (request → customer → token count → dollar amount → invoice) is yours to build, and that's what this tutorial sets up.

The agent uses LangChain, which sits one layer above the model so the same metering code works regardless of what's behind it. The example runs on OpenAI's gpt-4o-mini for convenience, but swap the chat model and nothing else changes. A small LangChain callback records each call's input and output token counts, tagged with the customer ID. Those records flow to Kong Konnect Metering & Billing, which keeps a running per-customer tally, applies your prices (input and output tokens can be priced separately), and produces invoices on a monthly cycle.

See it in action first

Before getting into the setup, here is what the finished pipeline looks like end to end. The agent runs on one side and reports the tokens it just used. Those same tokens land as a billable line item on the customer's invoice in Kong on the other.

The AI Agent App

The user types Hello world. The agent replies with Hello! How can I assist you today?. Both sides happen to land on 9 tokens. The input count is 9 rather than 2 because OpenAI wraps the prompt in chat-message formatting, which adds a few tokens beyond the literal words; the output just happens to land on 9 as well. The agent fires off one record for the input tokens and another for the output tokens, both tagged with the customer (acme).

Metering and Billing the Agent in Kong

The same call now sits there as a real billable line item. With a simple test pricing of $1 per input token and $2 per output token, the math lines up:

  • Input: 9 tokens × $1 = $9

  • Output: 9 tokens × $2 = $18

  • Total: $27

Same numbers on both sides of the pipeline. That is what we are about to build.

Let's go through it step by step.

AI Agent App: github.com/tejakummarikuntla/llm-metering-langchian-kong.

Architecture

Every LLM call produces two CloudEvents. One carries the prompt token count, the other carries the response token count. Both events carry a subject field set to the customer identifier. Kong groups events by subject, sums the token field, multiplies by the rate card configured on the customer's plan, and rolls everything into invoices on the billing cycle.
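
Concretely, one call that used 9 prompt tokens and 9 completion tokens for customer acme lands in Kong as a pair of events shaped like this (illustrative values; the exact construction is shown in the handler walkthrough below):

{
  "specversion": "1.0",
  "id": "<runId>-input",
  "source": "langchain",
  "type": "kong.llm_request",
  "subject": "acme",
  "data": { "type": "input", "tokens": 9, "model": "gpt-4o-mini", "provider": "openai" }
}
{
  "specversion": "1.0",
  "id": "<runId>-output",
  "source": "langchain",
  "type": "kong.llm_request",
  "subject": "acme",
  "data": { "type": "output", "tokens": 9, "model": "gpt-4o-mini", "provider": "openai" }
}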

Why this stack

Kong Konnect Metering & Billing fits this tutorial for two specific reasons:

  • Open source core. The metering side is built on OpenMeter, which is open source. You can self-host the metering pipeline, or use the managed Konnect service.
  • Configurable billing engine. Meters, features, plans, rate cards, and subscriptions are first-class primitives, configured in the portal rather than shipped as code.

You're not replacing Stripe here; you're using Kong as the metering and invoicing layer that feeds it.

What you will build

  1. A LangChain callback handler that emits two CloudEvents per LLM call
  2. A Kong meter that filters kong.llm_request events and sums the tokens field
  3. Two features (input and output tokens) feeding a plan with separate rate cards
  4. A customer subscribed to that plan, with metered usage and dollar values in the Konnect portal

Prerequisites

  • Node.js 22.6 or higher
  • pnpm: npm install -g pnpm
  • An OpenAI API key
  • A free Kong Konnect account: konghq.com
  • A Konnect Personal Access Token with Metering & Billing write permissions

Tutorial map

Part 1: Add Metering into the AI Agent app

  1. Clone the AI agent app
  2. Configure environment variables
  3. Walk through the codebase
  4. Run the AI Agent app

Part 2: Connect to Kong Metering & Billing

  1. Create a Meter in Kong M&B
  2. Create Features for input and output tokens
  3. Create a Plan with Rate Cards
  4. Create the Customer
  5. Add a Subscription
  6. Inspect usage and Invoices
  7. Connect a Payment provider

Part 1: Add Metering into the AI Agent app

Clone the AI Agent app

git clone https://github.com/tejakummarikuntla/llm-metering-langchian-kong
cd llm-metering-langchian-kong
pnpm install

The reference app is two TypeScript files. handler.ts is the metering callback. index.ts is a small chain that reads a prompt from stdin so you have something to exercise the handler with. No sidecar service, no separate ingestion worker, no extra runtime dependency beyond LangChain and the OpenAI client.

Configure environment variables

cp .env.example .env

Open .env and fill in real values:

API_URL=https://us.api.konghq.tech/v3/openmeter/events
API_KEY=your-konnect-personal-access-token
SUBJECT=acme
MODEL=gpt-4o-mini
OPENAI_API_KEY=your-openai-api-key
  • API_URL: Kong Konnect ingestion endpoint. The default is the US region; EU organizations use https://eu.api.konghq.tech/v3/openmeter/events.
  • API_KEY: Konnect Personal Access Token with Metering & Billing write scope.
  • SUBJECT: Customer identifier attached to every event. Use acme for testing. In production this comes from your authenticated session, not an env var.
  • MODEL: Any chat-completion model. gpt-4o-mini keeps testing cheap.
  • OPENAI_API_KEY: Standard OpenAI API key.

Walk through the codebase

MeteringCallbackHandler extends LangChain's BaseCallbackHandler and implements two of its lifecycle hooks. Callbacks fire at the same place token counts are reported, so you do not need to subclass the LLM client, and the LangChain runId gives you a stable event ID for free.
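
The snippets below reference a few pieces of handler state (runMetadata, apiUrl, apiKey) without declaring them. A minimal skeleton consistent with those snippets, as a sketch rather than a copy of handler.ts:

import { BaseCallbackHandler } from '@langchain/core/callbacks/base';

export class MeteringCallbackHandler extends BaseCallbackHandler {
  name = 'metering-callback-handler';

  // runId -> metadata, written by handleLLMStart and read by handleLLMEnd
  private runMetadata = new Map<string, Record<string, unknown>>();

  constructor(
    private apiUrl: string,
    private apiKey: string,
  ) {
    super();
  }

  // handleLLMStart, handleLLMEnd, and ingest follow.
}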

handleLLMStart

This hook fires immediately before the model is called. The handler captures run metadata so the LLM end hook can build a CloudEvent with the right customer attribution:

async handleLLMStart(
  _llm: Serialized,
  _prompts: string[],
  runId: string,
  parentRunId?: string,
  _extraParams?: Record<string, unknown>,
  _tags?: string[],
  metadata: Record<string, unknown> = {},
) {
  if (parentRunId) {
    const parentMetadata = this.runMetadata.get(parentRunId);
    if (parentMetadata) {
      Object.assign(metadata, parentMetadata);
    }
  }
  this.runMetadata.set(runId, metadata);
}

The parent run check matters. LLM calls almost always run inside a chain, agent, or tool-calling flow, which LangChain models as a parent run. When you set metadata at chain.invoke({}, { metadata: { subject: 'acme' } }), LangChain attaches it to the chain run, not the child LLM run. Without merging parent metadata into the child, the LLM end hook reads an empty metadata object and the subject is lost.

handleLLMEnd

This hook fires after the model returns. The handler reads token counts from output.llmOutput.tokenUsage (the field OpenAI fills on non-streaming completions), builds two CloudEvents, and posts each to the Kong ingestion endpoint:

async handleLLMEnd(output: LLMResult, runId: string) {
  const { promptTokens = 0, completionTokens = 0 } =
    output.llmOutput?.['tokenUsage'] ??
    output.llmOutput?.['estimatedTokenUsage'] ??
    {};

  if (!(promptTokens > 0 || completionTokens > 0)) return;

  const metadata = this.runMetadata.get(runId) ?? {};
  const { subject, ls_model_name, ls_provider, ls_model_type, ...data } = metadata;

  const inputEvent = {
    specversion: '1.0',
    id: `${runId}-input`,
    source: 'langchain',
    type: 'kong.llm_request',
    subject,
    data: { ...data, type: 'input', tokens: promptTokens, model: ls_model_name, provider: ls_provider, model_type: ls_model_type },
  };

  const outputEvent = {
    specversion: '1.0',
    id: `${runId}-output`,
    source: 'langchain',
    type: 'kong.llm_request',
    subject,
    data: { ...data, type: 'output', tokens: completionTokens, model: ls_model_name, provider: ls_provider, model_type: ls_model_type },
  };

  await this.ingest(inputEvent);
  await this.ingest(outputEvent);
}

A few decisions in this block matter for production:

  • The id field combines the LangChain runId with -input or -output. Kong deduplicates events by id plus source, so retries do not double-bill.

  • data.type separates input from output tokens at the event level. That separation is what makes per-token-class pricing possible without running two meters.

  • Anything you pass in metadata at chain.invoke time spreads into data. Tenant tier, region, feature flag: add it once at invoke time and filter on it in the meter. No handler changes.

  • ingest is a plain fetch POST with a Bearer token header. No SDK, no batching layer.
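
The ingest method itself is not reproduced above. A minimal sketch of that plain-fetch POST, assuming a JSON content type and log-and-continue error handling (handler.ts may differ):

private async ingest(event: Record<string, unknown>) {
  // POST one CloudEvent to the Kong ingestion endpoint.
  const res = await fetch(this.apiUrl, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${this.apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(event),
  });
  if (!res.ok) {
    // Log and continue; see the retry note in the production checklist.
    console.error(`MeteringCallbackHandler: ingest failed (${res.status})`);
  }
}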

Read the agent entry point

index.ts wires ChatOpenAI up with the metering handler and runs a small one-shot chain:

const handler = new MeteringCallbackHandler(apiUrl, apiKey);

const llm = new ChatOpenAI({
  model,
  apiKey: openaiApiKey,
  callbacks: [handler],
});

const chain = PromptTemplate.fromTemplate('{input}')
  .pipe(llm)
  .pipe(new StringOutputParser());

const result = await chain.invoke(
  { input: userInput },
  {
    metadata: {
      subject,
      kong: 'strong',
    },
  },
);

Two lines do the integration. callbacks: [handler] on the ChatOpenAI instance attaches the handler to every call made through it. The metadata block on chain.invoke carries the customer identifier into the run metadata that handleLLMStart reads. The kong: 'strong' field is just a metadata pass-through demonstration: anything you add in that block lands inside data on the CloudEvent.
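
The handler doesn't have to live on the model instance. LangChain also accepts callbacks in the per-invocation config, which scopes metering to specific calls; a variant sketch:

const result = await chain.invoke(
  { input: userInput },
  {
    callbacks: [handler], // attached to this one invocation only
    metadata: { subject },
  },
);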

Run the AI Agent app

Start the app:
pnpm start

Type a prompt:

You: Explain how token-based usage billing works for LLM applications.

The handler logs both events as it sends them:

MeteringCallbackHandler: ingesting event {
  specversion: '1.0',
  id: '019dd41b-f14e-705b-a4dd-894bd025c73d-input',
  source: 'langchain',
  type: 'kong.llm_request',
  subject: 'acme',
  data: { kong: 'strong', type: 'input', tokens: 18, model: 'gpt-4o-mini', provider: 'openai', model_type: 'chat' }
}

AI: Token-based usage billing charges customers based on the number of tokens consumed...

MeteringCallbackHandler: ingesting event {
  specversion: '1.0',
  id: '019dd41b-f14e-705b-a4dd-894bd025c73d-output',
  source: 'langchain',
  type: 'kong.llm_request',
  subject: 'acme',
  data: { kong: 'strong', type: 'output', tokens: 156, model: 'gpt-4o-mini', provider: 'openai', model_type: 'chat' }
}

Both events are in Kong. They will not appear in a customer's usage view or invoice until Part 2 is set up.


Part 2: Connect to Kong Metering & Billing

The next sections build the meter, features, plan, and subscription that turn the raw event stream into priced, per-customer usage. The flow follows the Konnect M&B concepts model: events feed meters, meters feed features, features attach to plans through rate cards, customers subscribe to plans.

Open cloud.konghq.com and confirm you're in the region matching API_URL.

Create the LLM Tokens meter

A meter is a continuously running query over the event stream. It selects events that match a filter, applies an aggregation, and exposes the result as a numeric usage value.

In the Konnect console:

  1. Left navigation: Metering & Billing → Metering
  2. Top right: Create Meter
  3. Choose template: LLM Tokens

The LLM Tokens template fills in the right defaults for this handler:

  • Event type filter: kong.llm_request (matches the type field on every CloudEvent the handler emits)
  • Aggregation: Sum
  • Value property: tokens (reads data.tokens)

Click Save.

CLI alternative

The same meter can be created through the Konnect API:

curl -X POST https://us.api.konghq.tech/v3/openmeter/meters \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "LLM Tokens",
    "key": "llm-tokens",
    "description": "LLM token usage",
    "event_type": "kong.llm_request",
    "aggregation": "SUM",
    "value_property": "$.tokens",
    "dimensions": { "type": "$.type", "provider": "$.provider", "model": "$.model" }
  }'

Create Features

Features turn a single meter into multiple billable units. You need two: one exposing only input-token events, one exposing only output-token events. The split is what makes asymmetric pricing possible (most providers charge more for output than input).

Left navigation: Product Catalog → Features.

Input token feature

Click Create Feature:

  • Name: Input Token
  • Key: auto-fills from the name (input_token)
  • Meter: LLM Tokens (from the dropdown)
  • Meter Group Filters: add a single filter
    • Field: type
    • Operator: equals
    • Value: input

Save.

Output token feature

Same form, output values:

  • Name: Output Token
  • Key: auto-fills from the name (output_token)
  • Meter: LLM Tokens
  • Meter Group Filter: type equals output

Save.

The same meter now feeds two features, each filtered to a different event subset.


Create a Plan with usage-based Rate Cards

A plan is what a customer subscribes to. Inside it, rate cards attach prices to features.

Product Catalog → Plans → New Plan:

  • Name: Pro

  • Click Save

Inside the new plan, add two rate cards.

Input token rate card

Click Add Rate Card and select the input token feature:

  • Pricing model: Usage-based

  • Price per unit: 1

Two notes about this field that bite people on the first run.

First, price per unit is the price for a single token. Not per thousand, not per million. There is no toggle that switches the unit. Production rates are decimals like 0.000003; at that rate, a million input tokens bills $3. The example uses 1 here so the dollar values on the test invoice are large and obvious.

Second, the pricing model selector decides whether the feature is metered or flat. Choosing flat-fee here would charge a fixed amount per cycle regardless of usage, which is the opposite of what you want for a metered feature.

Output token rate card

Click Add Rate Card again, select output token:

  • Pricing model: Usage-based

  • Price per unit: 2

Save. Output tokens now cost twice the input rate, which roughly mirrors how OpenAI and most other providers price the underlying API.


Create the customer

The customer record needs to be created manually. The subject field on every CloudEvent ties a token usage event to a specific customer through the customer's key, so the key has to match the SUBJECT value in your .env (acme in this tutorial).

Left navigation: Metering & Billing → Billing → Customers. Top right: Create new.

Fill in the form:

  • Name: acme (display name shown in the portal)
  • Key: acme (must match the SUBJECT env value)

Click Save.

The customer is now in the system but does not have any plan attached yet. Token events tagged with subject: acme will be associated with this record once a subscription is in place.

Add a subscription

A subscription connects this customer to the Pro plan you built earlier. Without it, events still flow into the meter but never produce invoice line items.

Open the acme customer page and switch to the Subscriptions tab. Click Create subscription.

Step 1 of the wizard: pick the plan.

  • Subscription plan: Pro (the plan with input-token and output-token rate cards)

Click Next.

Step 2: timing and billing cycle. Defaults are fine for testing.

  • Start subscription: Immediately

  • Bill: Monthly

  • Starting: Start of subscription

Click Next, then Start subscription on the confirmation step.

The subscription is now active. The next call from pnpm start lands inside an active billing window and rolls into an invoice.

Track usage and invoices

Run the agent a few times with prompts long enough that the response is more than a handful of tokens; otherwise the input and output counts can look almost identical.

pnpm start

Back in Konnect, open the acme customer page from the Billing section and switch to the Invoicing tab.

The view shows the active plan, both rate cards, accumulated usage per feature, and the running invoice total. With test rates of $1 per input token and $2 per output token, even four prompts produce a dollar value that is easy to verify against the handler's logged token counts. Switch to production decimals like 0.0000015 and 0.000006 and the same view continues to work, just with smaller numbers.
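
As a concrete check, the call logged in Part 1 (18 input tokens, 156 output tokens) should show up as:

  • Input: 18 tokens × $1 = $18

  • Output: 156 tokens × $2 = $312

  • Total: $330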


Connect a payment provider

The metering and billing layer ends at invoice generation. Actually charging the customer needs a payment provider.

Konnect connects to providers like Stripe to:

  • Sync customer payment methods between Konnect and the provider
  • Charge invoices automatically when the billing cycle closes
  • Handle dunning, retries, and failed payments

The metering pipeline doesn't change when payment providers change. Kong owns usage aggregation and invoice generation. The provider only handles collection. That separation makes it possible to support multiple providers, switch between them, or test with one provider in staging and another in production without touching any code.


Gotchas

Input and output token counts that look identical. Short prompts can produce the same input and output token count by coincidence. The input count includes chat message formatting overhead (role markers, message delimiters) added by OpenAI before the prompt reaches the model, so a two-word prompt is rarely two tokens. Use a longer prompt to see the counts diverge clearly.

Events appear in the meter but not in invoices. The subscription started after the events were ingested. Kong only invoices events that fall inside an active subscription window. Run the app again after creating the subscription.

subject missing warning in the logs. The handler logs could not find 'subject' in run metadata when the metadata block doesn't include a subject. Check that .env exists (not just .env.example), that SUBJECT is set, and that the metadata block in index.ts reads subject from the env variable.

EU vs US endpoint. The default API_URL is the US endpoint. EU Konnect organizations need https://eu.api.konghq.tech/v3/openmeter/events. Wrong region produces silent ingestion failures. Confirm the region from Konnect organization settings.

Event deduplication. Kong deduplicates by id plus source. Replaying the same event twice produces one record, not two. The handler builds id from the LangChain runId, so this is rarely an issue in normal use, but worth knowing if events are being replayed or generated outside this handler.


Production checklist

The reference app demonstrates the mechanics. A production setup needs a few real changes.

  • subject from auth, not env. Replace SUBJECT=acme with a value pulled from the authenticated user session. Each chain invocation passes the real customer ID into the metadata block.

  • Per-model pricing. Add model to the meter group filters on each feature and run different rate cards per model. GPT-4o, GPT-4o-mini, Claude, and others can all be priced independently while sharing one meter.

  • Custom segmentation. Any field added to the metadata block lands in data on the CloudEvent. Add tenant tier, region, or provider and filter or group on them in the meter to bill differently per segment.

  • Usage alerts. Once events flow, configure usage thresholds in Kong to notify customers, throttle them, or pause subscriptions when they hit a limit.

  • Idempotent retries. The handler doesn't retry failed ingest() calls. Wrap fetch with a small retry layer (exponential backoff, max attempts) to handle transient network errors without losing billable events. Kong's deduplication on id + source makes safe retries straightforward.
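
A minimal sketch of such a retry layer; the function name, attempt count, and backoff constants here are illustrative, not from the reference app:

async function ingestWithRetry(
  url: string,
  apiKey: string,
  event: Record<string, unknown>,
  maxAttempts = 3,
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch(url, {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${apiKey}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(event),
      });
      if (res.ok) return;
      throw new Error(`ingest failed with status ${res.status}`);
    } catch (err) {
      if (attempt === maxAttempts) throw err; // out of attempts; surface the error
      // Exponential backoff before the next attempt: 1s, 2s, 4s, ...
      // Replaying the same event is safe because Kong deduplicates on id + source.
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
    }
  }
}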


The full reference AI Agent app is at https://github.com/tejakummarikuntla/llm-metering-langchian-kong. Clone, configure, and the metering pipeline runs locally in a few minutes. Adding it to an existing LangChain agent is a single line: callbacks: [handler] on the LLM client. Everything else is Kong configuration.

What's the trickiest part of metering an AI agent in production for you? Streaming responses, multi-model pricing, or per-tenant segmentation? Drop a comment.
