DEV Community: Kong

I Turned Hermes Into a Paid AI Agent, Then Billed Every Token and Tool Call

Teja Kummarikuntla — Mon, 08 Jun 2026 17:55:57 +0000

If you clicked this, you probably already like Hermes. So do I. I have had it running on my laptop for a while, the Hermes Agent is all over my feed lately, and the open models are good enough now that building your own agent on them is genuinely fun. Somewhere around the third tool I bolted on, my question quietly changed. It stopped being "can this thing do the task" and became "if I turn this into a paid product, what would it cost me, and how could I charge for it?"

I have put a price on software before, and an API is the easy case: meter the calls, pick a number, done. An agent is different. It does two things that cost real money, and they are not the same thing. It thinks, which is tokens. And it acts, which is the search it fires, the page it pulls, the report it writes. When I actually sat and watched mine run, the tool calls were doing as much work as the model. Pricing only the tokens would have billed half of what the agent really does.

So I stopped theorizing and tried it. I took my small Hermes research agent, gave it a few genuine tools, and wired up billing for both sides: every token and every tool call, as their own line items, ending in a real invoice. No pretend company, no pretend customers. Just an honest end-to-end run to find out what turning Hermes into a paid agent actually takes.

The billing runs on Kong Konnect Metering & Billing (the managed version of OpenMeter). I kept the path deliberately short: the agent posts its own usage events straight from the code it already runs. One agent run comes out the other end as one invoice, with a line for thinking and a line for each kind of acting. Here is how it went.

Here's the complete codebase: https://github.com/tejakummarikuntla/Hermes-Billing-with-KongMB

Here's what I had to do:
🧠 Set up a research agent on Hermes with three tools (search, fetch, report)
🪙 Meter every token, split into input and output
🔧 Meter every tool call, by tool name
💵 Price thinking and acting in Kong Konnect Metering & Billing
🧾 Turn one agent run into one invoice

Here's the complete flow

Set up Hermes

You can use Hermes hosted or local. The agent code is identical either way; you only change three environment variables.

Option A: hosted (Nous Research API)

Create a key at portal.nousresearch.com. It is an OpenAI-compatible endpoint, so you point the OpenAI client at it:

LLM_BASE_URL=https://inference-api.nousresearch.com/v1
LLM_API_KEY=sk-nous-your-key
MODEL=nousresearch/hermes-4-70b

One thing to know up front: Hermes 4 is a paid model and needs purchased credits (a one-time grant only covers free models). Also, the Nous API does not expose the OpenAI tools parameter, so the agent uses Hermes's native <tool_call> format there. More on that later.

Option B: local and free (Ollama)

If you do not want to spend anything, run Hermes 3 locally with Ollama. This is what the rest of the tutorial uses.

# The Homebrew cask bundles the inference runner. The CLI-only formula does not,
# so it can pull models but cannot actually run them.
brew install --cask ollama

ollama serve &        # start the local server on http://localhost:11434
ollama pull hermes3   # about 4.7GB, one time

That gives you an OpenAI-compatible Hermes at http://localhost:11434/v1 with no API key:

LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=ollama
MODEL=hermes3

Prerequisites
Part 1: The Hermes agent
- Set up the project
- The tools
- Meter tokens and tool calls
- The agent loop
- Run it
Part 2: The billing setup
- Provision Kong with one script
- Create the two meters
- Create the features
- Create a plan with rate cards
- Create the customer and subscribe
- Run the agent and read the invoice
Where I'd take this next

Prerequisites

Python 3.10, 3.11, 3.12, or 3.13
A Hermes endpoint: local Ollama (free) or a Nous Research API key
A free Kong Konnect account
A Kong Konnect Personal Access Token with Metering & Billing permissions

Part 1: The Hermes agent

Set up the project

python -m venv .venv && source .venv/bin/activate
pip install openai httpx beautifulsoup4 python-dotenv

requirements.txt:

openai>=1.40.0
httpx>=0.27.0
beautifulsoup4>=4.12.0
python-dotenv>=1.0.0

.env (the local Ollama defaults plus your Kong values):

LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=ollama
MODEL=hermes3

KONG_API_URL=https://us.api.konghq.com   # use eu or au if your org is there
KONG_PAT=kpat_your_konnect_token
SUBJECT=hermes-demo

SUBJECT is the customer identifier. Every usage event carries it, and Kong attributes the usage to the customer that owns that subject.

The tools

Three tools, all keyless so you only need the Kong PAT. web_search hits Wikipedia's open API, fetch_url reads a page, and make_report writes a file. Swap web_search for Tavily or Brave in production; nothing else changes.

# tools.py
import os, json, datetime
import httpx
from bs4 import BeautifulSoup

REPORTS_DIR = os.path.join(os.path.dirname(__file__), "reports")
UA = {"User-Agent": "hermes-paid-agent/0.1"}


def web_search(query: str) -> str:
    """Search for information on a topic. Returns titles, URLs, and snippets."""
    r = httpx.get("https://en.wikipedia.org/w/api.php",
                  params={"action": "query", "list": "search", "srsearch": query,
                          "format": "json", "srlimit": 5}, headers=UA, timeout=20.0)
    r.raise_for_status()
    hits = r.json().get("query", {}).get("search", [])
    return json.dumps([{
        "title": h["title"],
        "url": "https://en.wikipedia.org/wiki/" + h["title"].replace(" ", "_"),
        "snippet": BeautifulSoup(h.get("snippet", ""), "html.parser").get_text(),
    } for h in hits])


def fetch_url(url: str) -> str:
    """Fetch a web page and return its readable text (truncated)."""
    r = httpx.get(url, headers=UA, timeout=20.0, follow_redirects=True)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, "html.parser")
    for tag in soup(["script", "style", "nav", "footer", "header"]):
        tag.decompose()
    return " ".join(soup.get_text(" ").split())[:4000]


def make_report(title: str, findings: str) -> str:
    """Write a final research report (markdown). The premium tool."""
    os.makedirs(REPORTS_DIR, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    slug = "".join(c if c.isalnum() or c in "-_" else "-" for c in title.lower())[:40]
    path = os.path.join(REPORTS_DIR, f"{stamp}-{slug}.md")
    with open(path, "w") as f:
        f.write(f"# {title}\n\n{findings}\n")
    return f"Report written to {path}"


TOOLS = [
    {"type": "function", "function": {
        "name": "web_search", "description": "Search for information on a topic.",
        "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}}},
    {"type": "function", "function": {
        "name": "fetch_url", "description": "Fetch the readable text of a web page by URL.",
        "parameters": {"type": "object", "properties": {"url": {"type": "string"}}, "required": ["url"]}}},
    {"type": "function", "function": {
        "name": "make_report", "description": "Write the final report. Call once when done.",
        "parameters": {"type": "object", "properties": {"title": {"type": "string"}, "findings": {"type": "string"}},
                       "required": ["title", "findings"]}}},
]
DISPATCH = {"web_search": web_search, "fetch_url": fetch_url, "make_report": make_report}

Meter tokens and tool calls

This is the whole billing integration. Two kinds of CloudEvent posted straight to Kong's ingest endpoint, no gateway:

hermes.tokens with {tokens, type, model}: one event for input, one for output, per model call.
hermes.tool_call with {tool}: one event each time a tool runs.

# metering.py
import os, uuid, datetime
import httpx
from dotenv import load_dotenv

load_dotenv()
KONG_API_URL = os.environ["KONG_API_URL"].rstrip("/")
KONG_PAT = os.environ["KONG_PAT"]
SUBJECT = os.environ.get("SUBJECT", "hermes-demo")
SOURCE = "hermes-paid-agent"

INGEST_URL = f"{KONG_API_URL}/v3/openmeter/events"
HEADERS = {"Authorization": f"Bearer {KONG_PAT}", "Content-Type": "application/cloudevents+json"}


def _now():
    return datetime.datetime.now(datetime.timezone.utc).isoformat()


def _post(event):
    r = httpx.post(INGEST_URL, headers=HEADERS, json=event, timeout=30.0)
    if r.status_code >= 300:
        raise RuntimeError(f"ingest failed {r.status_code}: {r.text}")


def emit_token_event(tokens, token_type, model):   # token_type is "input" or "output"
    _post({"specversion": "1.0", "id": str(uuid.uuid4()), "source": SOURCE,
           "type": "hermes.tokens", "time": _now(), "subject": SUBJECT,
           "data": {"tokens": tokens, "type": token_type, "model": model}})


def emit_tool_event(tool):
    _post({"specversion": "1.0", "id": str(uuid.uuid4()), "source": SOURCE,
           "type": "hermes.tool_call", "time": _now(), "subject": SUBJECT,
           "data": {"tool": tool}})

Each event gets a fresh id. Kong de-duplicates events by id plus source, so a fresh UUID per event keeps every one of them counted.
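
To see why the fresh id matters, here is a minimal local sketch of de-duplication keyed on source plus id. DedupLedger is a hypothetical stand-in written for this illustration, not Kong's implementation; it only shows that re-sending the same event is counted once, while two events with different ids are both counted.

```python
import uuid


class DedupLedger:
    """Hypothetical in-memory stand-in for an ingest endpoint that
    de-duplicates on (source, id). Not Kong's actual implementation."""

    def __init__(self):
        self._seen = set()
        self.accepted = []

    def ingest(self, event):
        key = (event["source"], event["id"])
        if key in self._seen:
            return False  # same source + id already seen: ignored
        self._seen.add(key)
        self.accepted.append(event)
        return True


def tool_event(tool):
    # Same envelope shape as emit_tool_event in metering.py.
    return {"specversion": "1.0", "id": str(uuid.uuid4()),
            "source": "hermes-paid-agent", "type": "hermes.tool_call",
            "subject": "hermes-demo", "data": {"tool": tool}}


ledger = DedupLedger()
first = tool_event("web_search")
ledger.ingest(first)
ledger.ingest(first)                     # retry of the same event: dropped
ledger.ingest(tool_event("web_search"))  # new id: counted as a second call
print(len(ledger.accepted))
```

The flip side is worth keeping in mind: if you ever add retries around _post, reuse the original event (and its id) on the retry so a network hiccup does not bill the same tool call twice.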

The agent loop

The loop is small: call Hermes, run any tools it asks for, feed results back, repeat until it answers. Two details make it Hermes-specific.

First, tool calling has two modes. When Hermes is served by Ollama, the server exposes the OpenAI tools parameter and returns structured tool_calls. The Nous API does not expose tools, so Hermes emits its native <tool_call> tags in the text and we parse them. The agent auto-selects the mode from the endpoint.

Second, metering sits inline: after each model call we read the usage block and emit two token events; each time a tool runs we emit a tool event.

# agent.py
import os, re, sys, json
from openai import OpenAI
from dotenv import load_dotenv
import tools
from metering import emit_token_event, emit_tool_event

load_dotenv()
BASE_URL = os.environ["LLM_BASE_URL"]
client = OpenAI(api_key=os.environ.get("LLM_API_KEY", "ollama"), base_url=BASE_URL)
MODEL = os.environ.get("MODEL", "hermes3")
MAX_STEPS, MAX_TOKENS, TEMPERATURE = 10, 1024, 0.3

# "api" = server-side tools parameter; "native" = Hermes <tool_call> parsing.
TOOL_MODE = os.environ.get("TOOL_MODE", "").lower() or ("native" if "nousresearch" in BASE_URL else "api")

SYSTEM = ("You are a research assistant. Use web_search to find sources, then call fetch_url ONLY "
          "with a url returned by web_search (never invent URLs). After 1-2 searches and one fetch, "
          "call make_report once with a title and concise findings that cite the source URLs. Then "
          "write a short 2-3 sentence summary as your final reply. Use at most 4 tools in total.")

HERMES_TOOL_INSTRUCTIONS = (
    "You are provided with function signatures within <tools></tools> XML tags. To call a function, "
    "return a JSON object with its name and arguments within <tool_call></tool_call> tags, like:\n"
    '<tool_call>\n{"name": "web_search", "arguments": {"query": "..."}}\n</tool_call>\n'
    "Call one function per step. When you have the final answer, reply with plain text and no tags.")

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def meter_usage(usage):
    if usage:
        emit_token_event(usage.prompt_tokens, "input", MODEL)
        emit_token_event(usage.completion_tokens, "output", MODEL)
        print(f"[meter] tokens  in={usage.prompt_tokens} out={usage.completion_tokens}")


def run_tool(name, args):
    print(f"[tool] {name}({args})")
    try:
        result = tools.DISPATCH[name](**args)
    except Exception as e:
        result = f"ERROR: {e}"
    emit_tool_event(name)
    print(f"[meter] tool_call {name}")
    return result


def run_api(question):   # Ollama and other endpoints that expose the tools parameter
    messages = [{"role": "system", "content": SYSTEM}, {"role": "user", "content": question}]
    for step in range(MAX_STEPS):
        kwargs = {"model": MODEL, "messages": messages, "max_tokens": MAX_TOKENS, "temperature": TEMPERATURE}
        if step < MAX_STEPS - 1:
            kwargs["tools"] = tools.TOOLS
        resp = client.chat.completions.create(**kwargs)
        meter_usage(resp.usage)
        msg = resp.choices[0].message
        messages.append(msg.model_dump(exclude_none=True))
        if not msg.tool_calls:
            print("\n=== Answer ===\n" + (msg.content or "(no answer)"))
            return
        for call in msg.tool_calls:
            result = run_tool(call.function.name, json.loads(call.function.arguments or "{}"))
            messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})


def run_native(question):   # Nous API: Hermes emits <tool_call> tags we parse
    sigs = "\n".join(json.dumps(t["function"]) for t in tools.TOOLS)
    system = f"{SYSTEM}\n\n{HERMES_TOOL_INSTRUCTIONS}\nHere are the available tools:\n<tools>\n{sigs}\n</tools>"
    messages = [{"role": "system", "content": system}, {"role": "user", "content": question}]
    for _ in range(MAX_STEPS):
        resp = client.chat.completions.create(model=MODEL, messages=messages,
                                              max_tokens=MAX_TOKENS, temperature=TEMPERATURE)
        meter_usage(resp.usage)
        content = resp.choices[0].message.content or ""
        messages.append({"role": "assistant", "content": content})
        parsed = [json.loads(m) for m in TOOL_CALL_RE.findall(content)]   # parse each block once
        calls = [(c["name"], c.get("arguments", {})) for c in parsed]
        if not calls:
            print("\n=== Answer ===\n" + content)
            return
        for name, args in calls:
            result = run_tool(name, args)
            messages.append({"role": "user",
                             "content": f"<tool_response>\n{json.dumps({'name': name, 'content': result})}\n</tool_response>"})


if __name__ == "__main__":
    print(f"[hermes] model={MODEL} endpoint={BASE_URL} tool_mode={TOOL_MODE}")
    (run_native if TOOL_MODE == "native" else run_api)(" ".join(sys.argv[1:]) or input("Ask Hermes: "))
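
If you want to sanity-check the native parsing without calling a model, the regex can be exercised on its own. This is a small standalone sketch: parse_tool_calls is a helper name I made up for the example, and the sample strings are invented; only the pattern itself comes from agent.py.

```python
import json
import re

# Same pattern as TOOL_CALL_RE in agent.py.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return (name, arguments) pairs for each <tool_call> block in text."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        obj = json.loads(raw)
        calls.append((obj["name"], obj.get("arguments", {})))
    return calls


with_call = (
    "I will search first.\n"
    '<tool_call>\n{"name": "web_search", "arguments": {"query": "Kong Inc"}}\n</tool_call>'
)
plain_answer = "Here is a short summary with no tags."

print(parse_tool_calls(with_call))     # one (name, arguments) pair
print(parse_tool_calls(plain_answer))  # empty list, so the loop treats it as the final answer
```

The lazy match still captures the nested arguments object, because the pattern only stops at a closing brace that is followed by the closing tag.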

Run it

python agent.py "Who founded Kong Inc. and what does the company build?"

You will see the metering happen in real time:

[hermes] model=hermes3 endpoint=http://localhost:11434/v1 tool_mode=api
[meter] tokens  in=391 out=54
[tool] web_search({'query': 'Kong Inc'})
[meter] tool_call web_search
[tool] fetch_url({'url': 'https://en.wikipedia.org/wiki/Kong_Inc.'})
[meter] tool_call fetch_url
[meter] tokens  in=833 out=249
[tool] make_report({'title': 'Kong Inc Overview', 'findings': '...'})
[meter] tool_call make_report
[meter] tokens  in=3392 out=87

=== Answer ===
The report on Kong Inc. has been written...

Every [meter] line is a CloudEvent already sitting in Kong. Now we price it.
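
As a quick cross-check before pricing anything, you can total the [meter] lines locally. The sketch below parses the sample output above; summarize is a helper name invented for this example, and the log format is just what agent.py prints.

```python
import re
from collections import Counter

# The [meter] lines from the sample run above.
LOG = """\
[meter] tokens  in=391 out=54
[meter] tool_call web_search
[meter] tool_call fetch_url
[meter] tokens  in=833 out=249
[meter] tool_call make_report
[meter] tokens  in=3392 out=87
"""

TOKENS_RE = re.compile(r"^\[meter\] tokens\s+in=(\d+) out=(\d+)$")
TOOL_RE = re.compile(r"^\[meter\] tool_call (\w+)$")


def summarize(log):
    """Total input tokens, output tokens, and tool calls per tool."""
    tokens_in = tokens_out = 0
    tools = Counter()
    for line in log.splitlines():
        if m := TOKENS_RE.match(line):
            tokens_in += int(m.group(1))
            tokens_out += int(m.group(2))
        elif m := TOOL_RE.match(line):
            tools[m.group(1)] += 1
    return {"input": tokens_in, "output": tokens_out, "tools": dict(tools)}


print(summarize(LOG))
```

For this run that comes to 4,616 input tokens and 390 output tokens, which are the numbers the two token features should report once the meters exist.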

Part 2: The billing setup

The model: two meters, then a feature per billable dimension, then a plan that prices each feature, then a customer and a subscription. Each step shows the Konnect UI and the equivalent API call.

Provision Kong with one script

If you want to skip the clicking, the repo has a kong_setup.py script that creates everything below in order and is safe to re-run (it reuses anything that already exists):

python kong_setup.py

The rest of this section is what that script does, step by step, so you understand each piece.

Create the two meters

A meter turns a stream of events into a number. We need two.

In the UI: Metering & Billing → Metering → Create Meter.

Meter	Event type	Aggregation	Value property	Group by
Hermes Tokens	`hermes.tokens`	Sum	`$.tokens`	`type`, `model`
Hermes Tool Calls	`hermes.tool_call`	Count	(none)	`tool`

The tokens meter sums the tokens field and keeps type as a dimension so we can split input from output. The tool meter just counts events and keeps tool as a dimension.

CLI

curl -s -X POST "$KONG_API_URL/v3/openmeter/meters" \
  -H "Authorization: Bearer $KONG_PAT" -H "Content-Type: application/json" \
  -d '{"key":"hermes_tokens","name":"Hermes Tokens","event_type":"hermes.tokens",
       "aggregation":"sum","value_property":"$.tokens",
       "dimensions":{"type":"$.type","model":"$.model"}}'

curl -s -X POST "$KONG_API_URL/v3/openmeter/meters" \
  -H "Authorization: Bearer $KONG_PAT" -H "Content-Type: application/json" \
  -d '{"key":"hermes_tool_calls","name":"Hermes Tool Calls","event_type":"hermes.tool_call",
       "aggregation":"count","dimensions":{"tool":"$.tool"}}'

Note the field names are snake_case: event_type, value_property, dimensions.
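
To make the two aggregations concrete, here is a rough local model of what the meters compute: a sum over a value property and a count of events, each grouped by its dimensions. The aggregate function is a simplification I wrote for illustration, not how Kong evaluates meters, and the events are a subset of the sample run.

```python
from collections import defaultdict

events = [
    {"type": "hermes.tokens", "data": {"tokens": 391, "type": "input", "model": "hermes3"}},
    {"type": "hermes.tokens", "data": {"tokens": 54, "type": "output", "model": "hermes3"}},
    {"type": "hermes.tool_call", "data": {"tool": "web_search"}},
    {"type": "hermes.tool_call", "data": {"tool": "fetch_url"}},
    {"type": "hermes.tokens", "data": {"tokens": 833, "type": "input", "model": "hermes3"}},
    {"type": "hermes.tokens", "data": {"tokens": 249, "type": "output", "model": "hermes3"}},
]


def aggregate(events, event_type, aggregation, group_by, value_key=None):
    """Sum a value (or count events) per combination of group_by dimensions."""
    totals = defaultdict(int)
    for ev in events:
        if ev["type"] != event_type:
            continue  # a meter only sees its own event type
        group = tuple(ev["data"][dim] for dim in group_by)
        totals[group] += ev["data"][value_key] if aggregation == "sum" else 1
    return dict(totals)


token_totals = aggregate(events, "hermes.tokens", "sum", ["type", "model"], "tokens")
tool_totals = aggregate(events, "hermes.tool_call", "count", ["tool"])
print(token_totals)
print(tool_totals)
```

The grouped keys are what the features in the next step filter on: type for the token meter, tool for the tool-call meter.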

Create the features

A feature is a billable thing tied to a meter, filtered to one slice of it. We make five: input tokens, output tokens, and one per tool.

In the UI: Product Catalog → Features. For each, pick the meter and add a meter filter.

Feature key	Meter	Filter
`input_tokens`	Hermes Tokens	`type` = `input`
`output_tokens`	Hermes Tokens	`type` = `output`
`tool_web_search`	Hermes Tool Calls	`tool` = `web_search`
`tool_fetch_url`	Hermes Tool Calls	`tool` = `fetch_url`
`tool_make_report`	Hermes Tool Calls	`tool` = `make_report`

This is the step that bit me, so read the next line twice. The only feature shape that actually persists the filter is a nested meter object. If you send a different shape, the API still returns 201, but it silently drops the filter, and then every feature meters the whole meter and your invoice shows no per-line charges.

CLI

# meter id from: curl .../v3/openmeter/meters
curl -s -X POST "$KONG_API_URL/v3/openmeter/features" \
  -H "Authorization: Bearer $KONG_PAT" -H "Content-Type: application/json" \
  -d '{"key":"input_tokens","name":"Input tokens",
       "meter":{"id":"<HERMES_TOKENS_METER_ID>","filters":{"type":{"eq":"input"}}}}'

After creating each feature, read it back and confirm the filter is there:

curl -s "$KONG_API_URL/v3/openmeter/features" -H "Authorization: Bearer $KONG_PAT" \
  | python3 -c "import sys,json;[print(f['key'],f.get('meter',{}).get('filters')) for f in json.load(sys.stdin)['data']]"
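
If you would rather fail loudly than eyeball that output, the same check fits in a small function. This sketch assumes the response shape the one-liner above relies on (a data list whose items carry meter.filters); features_missing_filters and the sample payload are made up for the example.

```python
def features_missing_filters(payload):
    """Return the keys of features whose meter filter is absent or empty.

    Assumes the list response looks like:
    {"data": [{"key": ..., "meter": {"filters": {...}}}, ...]}
    """
    missing = []
    for feature in payload.get("data", []):
        filters = (feature.get("meter") or {}).get("filters")
        if not filters:
            missing.append(feature["key"])
    return missing


# Invented sample: one good feature, two where the filter did not persist.
sample = {"data": [
    {"key": "input_tokens", "meter": {"id": "m1", "filters": {"type": {"eq": "input"}}}},
    {"key": "output_tokens", "meter": {"id": "m1"}},
    {"key": "tool_web_search"},
]}

print(features_missing_filters(sample))
```

In a setup script you could raise an error when the returned list is non-empty, so a silently dropped filter stops provisioning instead of surfacing later as a wrong invoice.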

Create a plan with rate cards

The plan prices each feature. These are illustrative numbers chosen so every line is visible. The price is per single unit, so for tokens it is the price of one token. For production you would use small decimals (Hermes 4 70B costs about $0.00000005 per input token, so you would mark up from there).

In the UI: Product Catalog → Plans → New Plan, currency USD, monthly. Add five usage-based rate cards:

Rate card (key = feature key)	Price per unit
`input_tokens`	$0.0005
`output_tokens`	$0.0015
`tool_web_search`	$0.02
`tool_fetch_url`	$0.01
`tool_make_report`	$0.10

The rate card key must equal the feature key. If they differ, the API returns rate_card_key_feature_key_mismatch.

CLI

curl -s -X POST "$KONG_API_URL/v3/openmeter/plans" \
  -H "Authorization: Bearer $KONG_PAT" -H "Content-Type: application/json" \
  -d '{"key":"hermes_pro","name":"Hermes Pro","currency":"USD","billing_cadence":"P1M",
       "phases":[{"key":"default","name":"Default","rate_cards":[
         {"billing_cadence":"P1M","key":"input_tokens","name":"Input tokens",
          "feature":{"id":"<INPUT_TOKENS_FEATURE_ID>"},"price":{"type":"unit","amount":"0.0005"}}
       ]}]}'

A plan is created as a draft. Publish it before anything can subscribe:

curl -s -X POST "$KONG_API_URL/v3/openmeter/plans/<PLAN_ID>/publish" \
  -H "Authorization: Bearer $KONG_PAT"
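
The plan request above shows only one rate card to keep it readable. A sketch of building the full five-card body in Python might look like this. It follows the same JSON shape as that request; build_plan is a hypothetical helper, and the feature ids are placeholders you would replace with the ids returned when you created the features.

```python
import json

# Prices from the rate card table, kept as strings to avoid float rounding.
PRICES = {
    "input_tokens": "0.0005",
    "output_tokens": "0.0015",
    "tool_web_search": "0.02",
    "tool_fetch_url": "0.01",
    "tool_make_report": "0.10",
}


def build_plan(feature_ids):
    """Build the plan body with one unit-priced rate card per feature.

    feature_ids maps feature key -> feature id from the features API.
    """
    missing = sorted(set(PRICES) - set(feature_ids))
    if missing:
        raise ValueError(f"no feature id for: {', '.join(missing)}")
    rate_cards = [
        {
            "billing_cadence": "P1M",
            "key": key,  # must equal the feature key
            "name": key.replace("_", " ").capitalize(),
            "feature": {"id": feature_ids[key]},
            "price": {"type": "unit", "amount": amount},
        }
        for key, amount in PRICES.items()
    ]
    return {
        "key": "hermes_pro", "name": "Hermes Pro",
        "currency": "USD", "billing_cadence": "P1M",
        "phases": [{"key": "default", "name": "Default", "rate_cards": rate_cards}],
    }


# Placeholder ids, in the same style as the request above.
placeholder_ids = {key: f"<{key.upper()}_FEATURE_ID>" for key in PRICES}
plan = build_plan(placeholder_ids)
print(json.dumps(plan, indent=2))
```

Deriving each rate card key from the feature key means the two cannot drift apart, which avoids the mismatch error by construction.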

Create the customer and subscribe

Customers are not created from events. The subject rides along on every event, but you have to create a customer whose usage_attribution.subject_keys contains that subject, then subscribe it to the plan.

CLI

curl -s -X POST "$KONG_API_URL/v3/openmeter/customers" \
  -H "Authorization: Bearer $KONG_PAT" -H "Content-Type: application/json" \
  -d '{"key":"hermes-demo","name":"Hermes Demo","usage_attribution":{"subject_keys":["hermes-demo"]}}'

curl -s -X POST "$KONG_API_URL/v3/openmeter/subscriptions" \
  -H "Authorization: Bearer $KONG_PAT" -H "Content-Type: application/json" \
  -d '{"customer":{"id":"<CUSTOMER_ID>"},"plan":{"key":"hermes_pro"},"active_from":"2026-01-01T00:00:00Z"}'

One ordering rule: events sent before the subscription starts do not get billed. Subscribe first, then run the agent.
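
If you want a guard in your own code, a simple local pre-check is to compare the event time with the subscription's active_from before sending. This is only a sketch of that comparison. The function names are mine, and treating an event exactly at active_from as billable is an assumption, not something I verified against Kong.

```python
from datetime import datetime


def parse_ts(value):
    # datetime.fromisoformat on Python 3.10 does not accept a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_after_start(event_time, active_from):
    """True if the event is at or after the subscription start.

    Assumption: the boundary is inclusive. Kong's exact rule may differ.
    """
    return parse_ts(event_time) >= parse_ts(active_from)


ACTIVE_FROM = "2026-01-01T00:00:00Z"  # from the subscription request above

print(is_after_start("2026-06-08T17:55:57+00:00", ACTIVE_FROM))
print(is_after_start("2025-12-31T23:59:59+00:00", ACTIVE_FROM))
```

Both timestamps need a UTC offset, as the ones produced by _now() in metering.py do, because Python refuses to compare timezone-aware and naive datetimes.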

Run the agent and read the invoice

With the subscription live, run the agent again:

python agent.py "Who founded Kong Inc. and what does the company build?"

One run on local Hermes 3 produced this, attributed to the customer:

Line	Usage	Price	Charge
Input tokens	4,616	$0.0005	$2.31
Output tokens	390	$0.0015	$0.58
web_search	2	$0.02	$0.04
fetch_url	3	$0.01	$0.03
make_report	2	$0.10	$0.20
Total			$3.16

Open the customer in Metering & Billing → Customers and the upcoming invoice shows thinking and acting as separate lines.
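
The arithmetic behind that table is just usage times unit price per line, summed. Here is a small sketch that reproduces the total with Decimal so nothing drifts through float rounding. How Kong rounds each individual line is not something this models; it only rounds the final total to cents.

```python
from decimal import Decimal

# Usage and prices copied from the invoice and rate card tables.
USAGE = {
    "input_tokens": 4616,
    "output_tokens": 390,
    "tool_web_search": 2,
    "tool_fetch_url": 3,
    "tool_make_report": 2,
}
PRICES = {
    "input_tokens": "0.0005",
    "output_tokens": "0.0015",
    "tool_web_search": "0.02",
    "tool_fetch_url": "0.01",
    "tool_make_report": "0.10",
}

# One charge per line item, unrounded.
charges = {key: Decimal(USAGE[key]) * Decimal(PRICES[key]) for key in USAGE}
total = sum(charges.values(), Decimal("0"))

for key, amount in charges.items():
    print(f"{key:18} {amount}")
print("total", total.quantize(Decimal("0.01")))
```

Thinking (tokens) accounts for about $2.89 of the total and acting (tools) for $0.27 at these illustrative prices, which is the split the separate line items are meant to expose.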

Where I'd take this next

Free quota then overage per tool, instead of flat per-call pricing.
Add MCP tools and meter each one as its own line.
Move to hosted Hermes 4 70B for a stronger agent, with TOOL_MODE=native.
If you would rather not put metering in app code at all, put the calls behind Kong AI Gateway and let it emit the token usage for you.

How would you price an agent?

Per token? Per tool call? A flat platform fee plus usage? Free searches then paid ones? I went with separate lines for thinking and acting because that is where the cost actually splits, but I am curious what you would do. Drop a comment with the model you use.

The full code is at https://github.com/tejakummarikuntla/Hermes-Billing-with-KongMB (PRs welcome).

How to Set Up Per-Agent Billing for CrewAI Agents with Kong

Teja Kummarikuntla — Tue, 02 Jun 2026 17:04:33 +0000

Setting up billing for a single AI agent is easy. The agent uses tokens, you multiply by a price, you send an invoice. Setting up billing for a CrewAI crew is more challenging. A crew has multiple agents working together. Each agent uses tokens differently. Roll them all into one number and you can't tell which agent drove the cost.

In this tutorial, we will build per-agent token billing for a CrewAI multi-agent app. We will track token usage per agent role in CrewAI, send the usage to Kong Konnect Metering & Billing (the managed version of OpenMeter), and turn one crew run into three invoice line items, one per agent.

Here is why this matters. In my CrewAI research crew, the Writer agent uses about twice as many tokens as the Researcher agent. A flat per-token price overcharges Researcher-heavy runs and undercharges Writer-heavy runs. Per-agent billing fixes that. Each agent gets its own meter slice, its own filter, and its own price.

This is a common need for any multi-agent SaaS product, any team trying to monetize CrewAI agents, and any team setting up usage-based billing for AI agents. The same pattern works for LangChain agents, AutoGen crews, or any multi-agent framework that exposes per-call token usage.

Here's how the billing looks for each agent in your CrewAI crew in Kong Metering & Billing. You will be able to achieve this by the end of this tutorial.

The full app is about 200 lines of Python. Setup takes about 30 minutes end to end.

The full reference repo: github.com/tejakummarikuntla/Billing-CrewAI-with-KongMB. Clone it if you want to skim the working code first, or follow the steps below and build it file by file.

git clone https://github.com/tejakummarikuntla/Billing-CrewAI-with-KongMB.git
cd Billing-CrewAI-with-KongMB

Architecture

Every LLM call produces two events. One for the prompt (input) tokens, one for the completion (output) tokens. Both events carry the agent_role. Kong's meter groups token usage per agent. Each feature pulls one agent's slice out of the meter. The plan attaches a per-token price to each feature. The invoice ends up with three line items, one per agent.
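
As a rough picture of that first step, one LLM call could be turned into its two events like this. The envelope follows the CloudEvents fields and the crewai.llm_call type used in this tutorial, but the data field names (tokens, token_type, agent_role, model) and the helper name are assumptions for this sketch; the real listener is built later in billing.py.

```python
import uuid
from datetime import datetime, timezone


def build_llm_call_events(agent_role, model, prompt_tokens, completion_tokens,
                          subject="acme", source="crewai-research-crew"):
    """Return two CloudEvents for one LLM call: input tokens, then output tokens.

    The keys inside "data" are illustrative assumptions, not a fixed schema.
    """
    now = datetime.now(timezone.utc).isoformat()

    def event(token_type, tokens):
        return {
            "specversion": "1.0",
            "id": str(uuid.uuid4()),  # fresh id per event
            "source": source,
            "type": "crewai.llm_call",
            "time": now,
            "subject": subject,
            "data": {"tokens": tokens, "token_type": token_type,
                     "agent_role": agent_role, "model": model},
        }

    return [event("input", prompt_tokens), event("output", completion_tokens)]


# Hypothetical token counts for a single Researcher call.
events = build_llm_call_events("Researcher", "gpt-4o-mini", 420, 180)
for ev in events:
    print(ev["data"])
```

Because both events carry agent_role, the meter can group by it and each feature can pick out one agent's slice.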

What you'll build

This tutorial has two parts: a Python app that uses CrewAI, and a set of resources you configure in Kong Konnect Metering & Billing.

Part 1: The Python app (CrewAI)

A research crew with three agents: Researcher, Analyst, and Writer. Each agent has its own role, goal, and backstory. They run sequentially: Researcher gathers facts, Analyst picks the key insights, Writer turns the insights into a one-page briefing.
A billing listener that captures every LLM call. This is a small Python class called KongBillingListener. It subscribes to CrewAI's event bus. CrewAI fires a notification called LLMCallCompletedEvent every time an agent makes an LLM call. Our listener catches that event, reads the token count and the agent's role, and sends a usage event to Kong.
An entry-point script. Loads the API keys, builds the crew, runs it, and prints a per-agent token summary.

Part 2: The billing setup (Kong Konnect Metering & Billing)

A meter. A meter is a rule that tells Kong which incoming events to count. We create one meter that listens for crewai.llm_call events and sums the tokens.
Three features, one per agent role. A feature is a named "slice" of a meter, filtered by a dimension value. We create one feature for Researcher tokens, one for Analyst tokens, one for Writer tokens. Each feature filters the meter by agent_role.
A plan with three rate cards. A plan groups features and assigns prices. Our plan is called CrewAI Research Pro. It charges $0.0001 per Researcher token, $0.0002 per Analyst token, $0.0005 per Writer token.
A customer and an active subscription. The customer is acme. The subscription connects the customer to the plan. Usage and invoice values then show up in the Konnect portal.
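
To see what those three prices do to a bill, here is a tiny worked example. The per-token prices are the ones from the plan above; the token counts are hypothetical round numbers chosen only so the Writer uses twice as many tokens as the Researcher.

```python
from decimal import Decimal

# Per-token prices from the CrewAI Research Pro plan.
PRICE_PER_TOKEN = {
    "Researcher": Decimal("0.0001"),
    "Analyst": Decimal("0.0002"),
    "Writer": Decimal("0.0005"),
}

# Hypothetical usage for one crew run (not measured data).
TOKENS = {"Researcher": 1000, "Analyst": 1500, "Writer": 2000}

line_items = {role: TOKENS[role] * PRICE_PER_TOKEN[role] for role in TOKENS}
total = sum(line_items.values(), Decimal("0"))

for role, charge in line_items.items():
    print(f"{role:10} {TOKENS[role]:>5} tokens  ${charge:.2f}")
print(f"{'Total':10} {'':>12}  ${total:.2f}")
```

With these numbers the Writer is under half of the tokens but $1.00 of the $1.40 total, which is exactly the kind of detail a single blended per-token price would hide.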

Files in the repo

File	What it does
`crew.py`	Builds the three agents (Researcher, Analyst, Writer), defines their tasks, and wires them into a sequential `Crew`. The agent `role` strings are what end up tagged on every billing event.
`billing.py`	`KongBillingListener` subclasses CrewAI's `BaseEventListener`, subscribes to `LLMCallCompletedEvent`, and POSTs one CloudEvent per token bucket (input + output) to Kong M&B. Tracks per-agent totals in memory for the run summary.
`main.py`	Entry point. Loads `.env`, instantiates the listener, builds the crew, runs `kickoff()`, and prints the final briefing plus per-agent usage.
`setup_kong.py`	One-shot provisioner. Creates the meter, three filtered features, plan, customer, and active subscription via the Kong M&B API. Pass `--teardown` to clean up an earlier run before recreating.
`requirements.txt`	Three deps: `crewai`, `httpx`, `python-dotenv`. No LiteLLM, no LangChain.
`.env.example`	Template for the four secrets and three config values.

Prerequisites

Python 3.10, 3.11, 3.12, or 3.13 (CrewAI requires Python below 3.14)
An OpenAI API key
A free Kong Konnect account: konghq.com
A Konnect Personal Access Token with Metering & Billing write permissions

Steps

🧑‍💻 Part 1: Build the Python app (CrewAI)

Set up the project
Define the research crew
Subscribe to LLMCallCompletedEvent
Run the crew and see per-agent tokens

🧾 Part 2: Set up billing in Kong Metering & Billing

Provision Kong with one script (or skip to the manual path)
Create the meter
Create one feature per agent role
Create a plan with three rate cards
Create the customer and subscribe
Run the crew again and check usage

Set up the project

Create a new folder and a Python virtual environment:

mkdir crewai-mb && cd crewai-mb
python3.12 -m venv .venv
source .venv/bin/activate

Three dependencies with version bounds. No LangChain, no LiteLLM, nothing hidden under the hood.

# requirements.txt
crewai>=1.14.0,<2.0.0
httpx>=0.27.0
python-dotenv>=1.0.1

Install:

pip install -r requirements.txt

Create a .env.example next to your code. This is where the API keys and other config live:

# .env.example

# OpenAI API key (from https://platform.openai.com/api-keys)
OPENAI_API_KEY=sk-...

# OpenAI model used by every CrewAI agent in this demo
MODEL=gpt-4o-mini

# Kong Konnect Metering & Billing ingestion endpoint
# US:  https://us.api.konghq.com/v3/openmeter/events
# EU:  https://eu.api.konghq.com/v3/openmeter/events
# AU:  https://au.api.konghq.com/v3/openmeter/events
KONG_INGEST_URL=https://us.api.konghq.com/v3/openmeter/events

# Personal Access Token from Konnect with Metering & Billing write permissions
# Konnect UI -> profile menu -> Personal Access Tokens
KONG_PAT=kpat_...

# Customer identifier. Becomes the `subject` on every CloudEvent
# and the customer in Konnect M&B once events arrive.
CUSTOMER_ID=acme

# Source identifier, becomes the `source` on every CloudEvent.
# Helps you tell different apps apart in the events view.
EVENT_SOURCE=crewai-research-crew

The KONG_INGEST_URL is region-specific. US orgs use us.api.konghq.com, EU orgs use eu.api.konghq.com, AU orgs use au.api.konghq.com. Use the wrong region and events get silently rejected. Check your region in the Konnect organization settings.
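
One way to make a typo in the region harder is to build the URL from a short region code instead of pasting it. This is a small sketch; ingest_url is a hypothetical helper, and it only knows the three regions listed in the .env.example above.

```python
KNOWN_REGIONS = ("us", "eu", "au")  # the regions listed in .env.example


def ingest_url(region):
    """Build the Metering & Billing ingest URL for a Konnect region code."""
    code = region.strip().lower()
    if code not in KNOWN_REGIONS:
        raise ValueError(f"unknown region {region!r}; expected one of {KNOWN_REGIONS}")
    return f"https://{code}.api.konghq.com/v3/openmeter/events"


print(ingest_url("us"))
print(ingest_url(" EU "))
```

This catches a misspelled region at startup. It cannot tell you that a valid region is the wrong one for your organization, so the check in the Konnect organization settings still applies.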

Copy .env.example to .env and fill in real values. Add .env to a .gitignore so secrets never get committed:

# .gitignore
.venv/
__pycache__/
*.pyc
.env
.env.local
*.log

Define the research crew

Three agents, three tasks, run one after the other. The agent role is the most important field. The role string is what we attach to every billing event and what shows up on the invoice. Pick names you are happy seeing on a customer's bill.

# crew.py
"""Three-agent research crew: Researcher -> Analyst -> Writer."""

from __future__ import annotations

import os

from crewai import LLM, Agent, Crew, Process, Task


def _llm() -> LLM:
    return LLM(
        model=os.environ.get("MODEL", "gpt-4o-mini"),
        api_key=os.environ["OPENAI_API_KEY"],
        temperature=0.4,
    )


def build_crew(topic: str) -> Crew:
    llm = _llm()

    researcher = Agent(
        role="Researcher",
        goal=f"Gather concrete, factual material about: {topic}",
        backstory=(
            "You are an analyst who pulls together raw facts, names, dates, "
            "and numbers on a topic. You write in dense bullet lists and "
            "never speculate."
        ),
        llm=llm,
        allow_delegation=False,
        verbose=True,
    )

    analyst = Agent(
        role="Analyst",
        goal="Distill research notes into the three sharpest insights",
        backstory=(
            "You read research notes and pull out the three insights that "
            "matter most. You discard noise. You explain each insight in "
            "two sentences."
        ),
        llm=llm,
        allow_delegation=False,
        verbose=True,
    )

    writer = Agent(
        role="Writer",
        goal="Turn the analyst's insights into a polished one-page briefing",
        backstory=(
            "You write executive briefings. You open with a one-sentence "
            "summary, then expand each insight with concrete supporting "
            "evidence. You never use jargon."
        ),
        llm=llm,
        allow_delegation=False,
        verbose=True,
    )

    research_task = Task(
        description=(
            f"Collect a tight set of facts about: {topic}. "
            "Aim for 8 to 12 bullet points. Each bullet should be a single "
            "fact with a year or named source where possible."
        ),
        expected_output="A bullet list of facts.",
        agent=researcher,
    )

    analysis_task = Task(
        description=(
            "Read the research notes. Pick the three insights that "
            "matter most to a builder evaluating this space. For each "
            "insight, write two sentences."
        ),
        expected_output="Three numbered insights, two sentences each.",
        agent=analyst,
        context=[research_task],
    )

    writing_task = Task(
        description=(
            "Write a one-page briefing for a busy engineering leader. "
            "Open with a one-sentence summary. Then expand each of the "
            "three insights with supporting evidence drawn from the "
            "research notes. Plain language only."
        ),
        expected_output="A one-page briefing in markdown.",
        agent=writer,
        context=[research_task, analysis_task],
    )

    return Crew(
        agents=[researcher, analyst, writer],
        tasks=[research_task, analysis_task, writing_task],
        process=Process.sequential,
        verbose=True,
    )

Subscribe to LLMCallCompletedEvent

CrewAI has a built-in event bus. Every LLM call inside an agent fires an LLMCallCompletedEvent. The event carries the token count, the model name, and the agent's role. To hook into it, we subclass BaseEventListener and register a handler.

# billing.py
"""Per-agent token billing listener for CrewAI.

Subscribes to LLMCallCompletedEvent and ships one CloudEvent per token bucket
(input + output) to Kong Konnect Metering & Billing.
"""

from __future__ import annotations

import logging
import os
import uuid
from datetime import datetime, timezone
from typing import Any

import httpx
from crewai.events import BaseEventListener, LLMCallCompletedEvent

logger = logging.getLogger(__name__)

EVENT_TYPE = "crewai.llm_call"
CLOUDEVENTS_SPEC_VERSION = "1.0"


class KongBillingListener(BaseEventListener):
    """Forwards CrewAI LLM token usage to Kong M&B as CloudEvents.

    One LLM call produces two events: one for prompt (input) tokens and
    one for completion (output) tokens. Both carry the agent_role so the
    meter can group spend per agent in the crew.
    """

    def __init__(
        self,
        ingest_url: str,
        api_key: str,
        subject: str,
        source: str = "crewai-research-crew",
        timeout: float = 5.0,
    ) -> None:
        self.ingest_url = ingest_url
        self.subject = subject
        self.source = source
        self._client = httpx.Client(
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/cloudevents+json",
            },
            timeout=timeout,
        )
        self.events_sent = 0
        self.tokens_by_agent: dict[str, dict[str, int]] = {}
        super().__init__()

    def setup_listeners(self, crewai_event_bus: Any) -> None:
        @crewai_event_bus.on(LLMCallCompletedEvent)
        def handle_llm_call(_source: Any, event: LLMCallCompletedEvent) -> None:
            self._record(event)

    def _record(self, event: LLMCallCompletedEvent) -> None:
        usage = event.usage or {}
        agent_role = getattr(event, "agent_role", None) or "unknown"
        model = event.model or "unknown"
        call_id = event.call_id

        prompt_tokens = int(usage.get("prompt_tokens", 0) or 0)
        completion_tokens = int(usage.get("completion_tokens", 0) or 0)

        bucket = self.tokens_by_agent.setdefault(
            agent_role, {"input": 0, "output": 0}
        )

        if prompt_tokens:
            self._emit(call_id, agent_role, model, "input", prompt_tokens)
            bucket["input"] += prompt_tokens

        if completion_tokens:
            self._emit(call_id, agent_role, model, "output", completion_tokens)
            bucket["output"] += completion_tokens

    def _emit(
        self,
        call_id: str,
        agent_role: str,
        model: str,
        token_type: str,
        tokens: int,
    ) -> None:
        payload = {
            "specversion": CLOUDEVENTS_SPEC_VERSION,
            "id": f"{call_id}-{token_type}-{uuid.uuid4().hex[:8]}",
            "source": self.source,
            "type": EVENT_TYPE,
            "subject": self.subject,
            "time": datetime.now(timezone.utc).isoformat(),
            "datacontenttype": "application/json",
            "data": {
                "tokens": tokens,
                "type": token_type,
                "agent_role": agent_role,
                "model": model,
                "call_id": call_id,
            },
        }

        try:
            response = self._client.post(self.ingest_url, json=payload)
            response.raise_for_status()
            self.events_sent += 1
        except httpx.HTTPError as exc:
            logger.warning(
                "Kong M&B ingest failed for %s (%s tokens): %s",
                agent_role,
                tokens,
                exc,
            )

    def close(self) -> None:
        self._client.close()

    def summary(self) -> str:
        lines = ["Per-agent token usage:"]
        for role, counts in self.tokens_by_agent.items():
            total = counts["input"] + counts["output"]
            lines.append(
                f"  {role:25s}  input={counts['input']:6d}  "
                f"output={counts['output']:6d}  total={total:6d}"
            )
        lines.append(f"Events sent to Kong M&B: {self.events_sent}")
        return "\n".join(lines)


def from_env() -> KongBillingListener:
    ingest_url = os.environ["KONG_INGEST_URL"]
    api_key = os.environ["KONG_PAT"]
    subject = os.environ.get("CUSTOMER_ID", "acme")
    source = os.environ.get("EVENT_SOURCE", "crewai-research-crew")
    return KongBillingListener(
        ingest_url=ingest_url,
        api_key=api_key,
        subject=subject,
        source=source,
    )

Three things worth pointing out in this code:

Two events per LLM call, not one. The listener sends one event for input tokens and one for output tokens. Splitting them now lets us bill them at different rates later.

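Unique event IDs for safe retries. Each event ID is built from CrewAI's call_id, the token type (input or output), and a short random string. Kong deduplicates events by id plus source, so resending the same payload is safe, and the token type in the id keeps the input and output events from colliding. One caveat: the random suffix is generated inside _emit, so a retry has to resend the stored payload rather than call _emit again, otherwise it gets a new id and is counted twice.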

Errors are logged, not raised. If Kong is briefly down, the crew run keeps going. A dropped event is better than a crashed customer run. In production, add a retry queue for the dropped events.
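One way to sketch that retry queue, assuming failed payloads are held in memory and resent unchanged so the event id stays the same and deduplication can do its job. RetryBuffer and the send callable are hypothetical names, not part of the repo or of any Kong SDK, and a real deployment would likely persist the queue instead of keeping it in process memory.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict

Payload = Dict[str, Any]


class RetryBuffer:
    """In-memory holding area for CloudEvents that failed to send (hypothetical helper)."""

    def __init__(self, send: Callable[[Payload], bool], max_size: int = 1000) -> None:
        self._send = send  # should return True when the ingest call succeeded
        # With maxlen set, the oldest payload is discarded once the buffer is full.
        self._pending: Deque[Payload] = deque(maxlen=max_size)

    def submit(self, payload: Payload) -> bool:
        """Try once; keep the payload for later if the send fails."""
        if self._send(payload):
            return True
        self._pending.append(payload)
        return False

    def flush(self) -> int:
        """Resend queued payloads unchanged and return how many went through."""
        delivered = 0
        for _ in range(len(self._pending)):
            payload = self._pending.popleft()
            if self._send(payload):
                delivered += 1
            else:
                self._pending.append(payload)  # still failing, keep for the next flush
        return delivered

    def __len__(self) -> int:
        return len(self._pending)
```

In the listener, the send callable could wrap the existing POST and return False inside the except branch, with flush() called once at the end of the run before close().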

The from_env() helper at the bottom is what main.py uses to build the listener from .env values.

Run the crew and see per-agent tokens

The entry-point script loads .env, builds the listener (which registers itself on the event bus during __init__), and kicks off the crew.

# main.py
"""Run the research crew and ship per-agent token usage to Kong M&B."""

from __future__ import annotations

import argparse
import os
import sys

from dotenv import load_dotenv

from billing import from_env
from crew import build_crew


def main() -> int:
    load_dotenv()

    parser = argparse.ArgumentParser(description="CrewAI research briefing")
    parser.add_argument(
        "topic",
        nargs="?",
        default="Usage-based pricing for AI agent products in 2026",
        help="Topic the crew should research",
    )
    parser.add_argument(
        "--customer",
        default=None,
        help="Customer ID to bill (overrides CUSTOMER_ID env var)",
    )
    args = parser.parse_args()

    if args.customer:
        os.environ["CUSTOMER_ID"] = args.customer

    listener = from_env()
    print(f"Billing customer: {listener.subject}")
    print(f"Topic:            {args.topic}\n")

    try:
        crew = build_crew(args.topic)
        result = crew.kickoff()
        print("\n--- Briefing ---\n")
        print(result.raw if hasattr(result, "raw") else result)
        print("\n--- Billing ---\n")
        print(listener.summary())
    finally:
        listener.close()

    return 0


if __name__ == "__main__":
    sys.exit(main())

The per-agent summary at the end comes from listener.summary(). The listener tracks tokens in memory as events fire and formats them at the end of the run.

Run it:

python main.py "Strategies for monetizing developer tools with usage-based pricing"

You will see CrewAI's verbose output as each agent thinks, then the final briefing, then the billing summary. From one of my runs:

Per-agent token usage:
  Researcher                 input=   154  output=   453  total=   607
  Analyst                    input=   587  output=   176  total=   763
  Writer                     input=   779  output=   421  total=  1200
Events sent to Kong M&B: 6

This is why per-agent billing matters. Each agent has a different token shape:

Researcher: short prompt in, long fact dump out.
Analyst: long facts in, three short insights out.
Writer: everything before it in (the largest prompt of the three), a full briefing out.

Because the shapes differ, so does the cost per agent. A single flat price hides all of that.
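To put numbers on that, here is a small sketch that prices the run above per role. The per-token prices are the ones the provisioning script later in this tutorial uses (0.0001, 0.0002, 0.0005); cost_by_role is a hypothetical helper for local arithmetic, not something Kong provides.

```python
from decimal import Decimal

# Token totals from the run above (input + output per role).
TOKENS = {"Researcher": 607, "Analyst": 763, "Writer": 1200}

# Per-token prices in USD, matching the provisioning script used later on.
PRICES = {
    "Researcher": Decimal("0.0001"),
    "Analyst": Decimal("0.0002"),
    "Writer": Decimal("0.0005"),
}


def cost_by_role(tokens: dict[str, int], prices: dict[str, Decimal]) -> dict[str, Decimal]:
    """Multiply each role's token total by that role's per-token price."""
    return {role: prices[role] * count for role, count in tokens.items()}


costs = cost_by_role(TOKENS, PRICES)
total = sum(costs.values(), Decimal("0"))

for role, cost in costs.items():
    print(f"{role:12s} {TOKENS[role]:5d} tokens  ${cost}")
print(f"{'Total':12s} {sum(TOKENS.values()):5d} tokens  ${total}")
```

With these numbers the Writer accounts for just under half of the tokens but roughly three quarters of the bill, which is the kind of skew a flat per-token price would flatten out.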

The events are in Kong M&B now, but no meter is matching them yet. They sit in the events table with a validation warning. The next steps fix that.

Provision Kong with one script

The next four steps (meter, features, plan, and customer plus subscription) can be done in a single command using the setup_kong.py script from the repo:

python setup_kong.py

To start over from a clean slate, pass --teardown. The script cancels the subscription, archives the plan, deletes the features and meter, and then recreates everything:

python setup_kong.py --teardown

The customer record is kept across teardowns so event history stays attached to the same subject.

Here is the script in full. It is the source of truth for the role names (Researcher, Analyst, Writer) and prices ($0.0001, $0.0002, $0.0005) used in the rest of this tutorial. The same values show up in the manual UI walk-through below, so the script and the click-by-click path produce the same setup.

# setup_kong.py
"""One-shot provisioner for Kong Konnect Metering & Billing.

Creates the meter, three features, a published plan with three rate cards,
the customer, and an active subscription. Designed to be re-run on a clean
org. Run with --teardown to delete a previous provisioning before recreating.

Each feature uses a meter group-by filter on agent_role so that the Researcher,
Analyst, and Writer features each only count tokens consumed by that agent.
Without the filter, every feature would aggregate the entire meter and the
invoice would show no per-role breakdown.
"""

from __future__ import annotations

import argparse
import os
import sys

import httpx
from dotenv import load_dotenv

ROLES = ["Researcher", "Analyst", "Writer"]
PRICES = {"Researcher": "0.0001", "Analyst": "0.0002", "Writer": "0.0005"}
METER_KEY = "crewai_tokens"
PLAN_KEY = "crewai_research_pro"


def _client() -> httpx.Client:
    load_dotenv()
    base = os.environ["KONG_INGEST_URL"].rsplit("/", 1)[0]
    pat = os.environ["KONG_PAT"]
    return httpx.Client(
        base_url=base,
        headers={
            "Authorization": f"Bearer {pat}",
            "Content-Type": "application/json",
        },
        timeout=15.0,
    )


def teardown(s: httpx.Client, customer_key: str) -> None:
    """Cancel/archive then delete subscription -> plan -> features -> meter.

    Subscriptions are cancelled (not deleted), plans are archived. Features
    and meters use DELETE. The customer is preserved so subjects keep their
    history.
    """
    print("Teardown ...")
    customers = s.get("/customers", params={"key": customer_key}).json().get("data", [])
    for c in customers:
        if c["key"] != customer_key:
            continue
        for sub in s.get("/subscriptions").json().get("data", []):
            if sub["customer_id"] == c["id"] and sub.get("status") == "active":
                r = s.post(f"/subscriptions/{sub['id']}/cancel", json={})
                print(f"  subscription {sub['id']} cancel -> {r.status_code}")

    for plan in s.get("/plans").json().get("data", []):
        if plan["key"] != PLAN_KEY:
            continue
        if plan.get("status") == "active":
            r = s.post(f"/plans/{plan['id']}/archive", json={})
            print(f"  plan {plan['id']} archive -> {r.status_code}")

    for feat in s.get("/features").json().get("data", []):
        if not feat["key"].startswith("crewai_"):
            continue
        r = s.delete(f"/features/{feat['id']}")
        print(f"  feature {feat['key']} -> {r.status_code}")

    for meter in s.get("/meters").json().get("data", []):
        if meter["key"] != METER_KEY:
            continue
        r = s.delete(f"/meters/{meter['id']}")
        print(f"  meter {meter['key']} -> {r.status_code}")


def provision(s: httpx.Client, customer_key: str) -> None:
    # 1. Meter
    print("Creating meter ...")
    r = s.post("/meters", json={
        "key": METER_KEY,
        "name": "CrewAI Tokens",
        "description": "Tokens consumed by CrewAI agents per role",
        "event_type": "crewai.llm_call",
        "value_property": "$.tokens",
        "aggregation": "sum",
        "dimensions": {
            "agent_role": "$.agent_role",
            "type": "$.type",
            "model": "$.model",
        },
    })
    r.raise_for_status()
    meter = r.json()
    print(f"  meter id={meter['id']}")

    # 2. Three features, each filtered by agent_role
    print("Creating features ...")
    feature_ids: dict[str, str] = {}
    for role in ROLES:
        key = f"crewai_{role.lower()}_tokens"
        r = s.post("/features", json={
            "key": key,
            "name": f"CrewAI {role} Tokens",
            "meter": {
                "id": meter["id"],
                "filters": {"agent_role": {"eq": role}},
            },
        })
        r.raise_for_status()
        feature_ids[role] = r.json()["id"]
        print(f"  {key:30s} id={feature_ids[role]}")

    # 3. Plan with three rate cards
    print("Creating plan ...")
    rate_cards = []
    for role in ROLES:
        rate_cards.append({
            "key": f"crewai_{role.lower()}_tokens",
            "name": f"{role} Tokens",
            "billing_cadence": "P1M",
            "feature": {"id": feature_ids[role]},
            "price": {"type": "unit", "amount": PRICES[role]},
        })
    r = s.post("/plans", json={
        "key": PLAN_KEY,
        "name": "CrewAI Research Pro",
        "currency": "USD",
        "billing_cadence": "P1M",
        "pro_rating_enabled": True,
        "phases": [
            {"key": "default", "name": "Default", "rate_cards": rate_cards}
        ],
    })
    r.raise_for_status()
    plan_id = r.json()["id"]
    print(f"  plan id={plan_id} status={r.json().get('status')}")

    # 4. Publish
    print("Publishing plan ...")
    r = s.post(f"/plans/{plan_id}/publish", json={})
    r.raise_for_status()
    print(f"  plan status={r.json().get('status')}")

    # 5. Customer (reuse if exists)
    print(f"Ensuring customer key={customer_key} ...")
    existing = [c for c in s.get("/customers", params={"key": customer_key}).json().get("data", []) if c["key"] == customer_key]
    if existing:
        customer_id = existing[0]["id"]
        print(f"  reusing customer id={customer_id}")
    else:
        r = s.post("/customers", json={
            "key": customer_key,
            "name": "Acme Inc",
            "currency": "USD",
            "usage_attribution": {"subject_keys": [customer_key]},
        })
        r.raise_for_status()
        customer_id = r.json()["id"]
        print(f"  customer id={customer_id}")

    # 6. Subscription
    print("Subscribing customer to plan ...")
    r = s.post("/subscriptions", json={
        "customer": {"id": customer_id},
        "plan": {"id": plan_id},
    })
    r.raise_for_status()
    sub = r.json()
    print(f"  subscription id={sub['id']} status={sub.get('status')}")

    print("\nDone. Provisioning summary:")
    print(f"  meter:        {meter['id']}")
    print(f"  features:     {feature_ids}")
    print(f"  plan:         {plan_id}")
    print(f"  customer:     {customer_id}")
    print(f"  subscription: {sub['id']}")


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--teardown", action="store_true",
                        help="Delete prior CrewAI provisioning before recreating")
    args = parser.parse_args()

    # Load .env first so a CUSTOMER_ID defined there is visible below.
    load_dotenv()
    customer_key = os.environ.get("CUSTOMER_ID", "acme")
    with _client() as s:
        if args.teardown:
            teardown(s, customer_key)
        provision(s, customer_key)
    return 0


if __name__ == "__main__":
    sys.exit(main())

A few things worth pointing out before the manual walk-through.

The four constants at the top are the only knobs. ROLES, PRICES, METER_KEY, and PLAN_KEY are the values you would change to use this script for a different crew or pricing model. Everything below them is mechanical.

The feature filter shape is strict. The script uses meter: {id, filters: {agent_role: {eq: role}}}. If you change the shape, the Kong API still returns 201 but it silently drops the filter. The feature then sums the whole meter and per-agent billing breaks. After creating a feature, always GET it back and confirm meter.filters is set.
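A minimal sketch of that read-back check, written as a pure function over the feature's JSON. It assumes the GET response mirrors the request shape (meter.filters.agent_role.eq), which is worth confirming against a real response; filter_is_set is a hypothetical helper name and the sample ids are placeholders.

```python
from typing import Any


def filter_is_set(feature: dict[str, Any], role: str) -> bool:
    """Return True only if the feature still carries an agent_role == role filter.

    Assumes the feature JSON mirrors the request body:
    {"meter": {"id": ..., "filters": {"agent_role": {"eq": role}}}}
    """
    meter = feature.get("meter") or {}
    filters = meter.get("filters") or {}
    condition = filters.get("agent_role")
    if not isinstance(condition, dict):
        return False  # missing, or a bare string instead of an operator object
    return condition.get("eq") == role


# One feature whose filter survived, and one where it was dropped.
kept = {
    "key": "crewai_writer_tokens",
    "meter": {"id": "01EXAMPLE", "filters": {"agent_role": {"eq": "Writer"}}},
}
dropped = {"key": "crewai_writer_tokens", "meter": {"id": "01EXAMPLE"}}

print(filter_is_set(kept, "Writer"))
print(filter_is_set(dropped, "Writer"))
```

In setup_kong.py this could run right after each feature is created, against the body of a follow-up GET for that feature, and raise if it returns False.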

Subscriptions and plans are not deleted. They are cancelled or archived. The teardown helper uses POST /subscriptions/{id}/cancel and POST /plans/{id}/archive. Features and meters do support DELETE.

The next four sections walk through the same steps by hand using the UI and curl. Read them to understand what each Kong resource does, or skip ahead to Run the crew again and check usage if you already ran the script.

Create the meter

A meter is a rule that tells Kong how to count incoming events. Open Konnect, go to Metering & Billing → Metering, and click Create Meter.

For this tutorial we skip the LLM Tokens template (it expects events from Kong AI Gateway) and configure the meter from scratch.

Field	Value
Name	CrewAI Tokens
Key	`crewai_tokens`
Event type	`crewai.llm_call`
Value property	`$.tokens`
Aggregation	Sum
Dimensions	`agent_role` → `$.agent_role`, `type` → `$.type`, `model` → `$.model`

The event_type must match the type field on the CloudEvents your listener sends. If they don't match, events still flow in but no meter picks them up.

Dimensions are important. They tell the meter to keep agent_role, type, and model available as group-by axes. Without dimensions, you get one big bucket of tokens with no breakdown.
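To make the roles of event type, value property, and dimensions concrete, here is a local sketch of what a sum meter does with the six events from the earlier run, plus one event with a mistyped type for contrast. It is an illustration in plain Python, not Kong's implementation; event and aggregate are hypothetical helpers.

```python
from collections import defaultdict

METER_EVENT_TYPE = "crewai.llm_call"


def event(event_type, role, token_type, tokens):
    """Build the slice of a CloudEvent that the meter looks at."""
    return {
        "type": event_type,
        "data": {"agent_role": role, "type": token_type, "tokens": tokens},
    }


EVENTS = [
    event("crewai.llm_call", "Researcher", "input", 154),
    event("crewai.llm_call", "Researcher", "output", 453),
    event("crewai.llm_call", "Analyst", "input", 587),
    event("crewai.llm_call", "Analyst", "output", 176),
    event("crewai.llm_call", "Writer", "input", 779),
    event("crewai.llm_call", "Writer", "output", 421),
    event("crewai.llmcall", "Writer", "output", 999),  # mistyped type, never matched
]


def aggregate(events, group_by):
    """Sum data.tokens for events matching the meter's type, grouped by dimensions."""
    totals = defaultdict(int)
    for item in events:
        if item["type"] != METER_EVENT_TYPE:
            continue  # wrong event type: the meter does not pick it up
        key = tuple(item["data"][dim] for dim in group_by)
        totals[key] += item["data"]["tokens"]
    return dict(totals)


print(aggregate(EVENTS, group_by=[]))              # one big bucket
print(aggregate(EVENTS, group_by=["agent_role"]))  # per-role totals
```

Grouping by nothing collapses the run into a single total of 2570 tokens, while grouping by agent_role recovers the 607, 763, and 1200 splits that the per-role features depend on.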

CLI

curl -X POST https://us.api.konghq.com/v3/openmeter/meters \
  -H "Authorization: Bearer $KONG_PAT" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "crewai_tokens",
    "name": "CrewAI Tokens",
    "description": "Tokens consumed by CrewAI agents per role",
    "event_type": "crewai.llm_call",
    "value_property": "$.tokens",
    "aggregation": "sum",
    "dimensions": {
      "agent_role": "$.agent_role",
      "type": "$.type",
      "model": "$.model"
    }
  }'

The response includes the meter id (a ULID starting with 01). Save it. You will need it when creating features.

Create one feature per agent role

A feature is a named slice of a meter, optionally filtered by dimension values. We need three features, one per agent role. All three point at the same crewai_tokens meter.

Go to Product Catalog → Features tab → Create Feature. Repeat three times, once per role:

Name	Key	Meter	Filter
CrewAI Researcher Tokens	`crewai_researcher_tokens`	CrewAI Tokens	`agent_role = Researcher`
CrewAI Analyst Tokens	`crewai_analyst_tokens`	CrewAI Tokens	`agent_role = Analyst`
CrewAI Writer Tokens	`crewai_writer_tokens`	CrewAI Tokens	`agent_role = Writer`

The feature key must match the rate card key you set on the plan in the next step. Pick descriptive keys now and the rest of the wiring stays clean.

Get the filter shape right. Kong expects the meter as an object with the meter id and a filters map. Filter values use operators like {"eq": "..."}, not bare strings. If you get it wrong, the API still returns 201 but silently drops the filter. The feature then sums the whole meter and your invoice loses its per-role breakdown. After creating a feature, always GET it back and check that meter.filters is set.

CLI

# Look up the meter id once
METER_ID=$(curl -s "https://us.api.konghq.com/v3/openmeter/meters" \
  -H "Authorization: Bearer $KONG_PAT" | \
  jq -r '.data[] | select(.key=="crewai_tokens") | .id')

for role in Researcher Analyst Writer; do
  lower=$(echo "$role" | tr '[:upper:]' '[:lower:]')
  curl -X POST https://us.api.konghq.com/v3/openmeter/features \
    -H "Authorization: Bearer $KONG_PAT" \
    -H "Content-Type: application/json" \
    -d "{
      \"key\": \"crewai_${lower}_tokens\",
      \"name\": \"CrewAI ${role} Tokens\",
      \"meter\": {
        \"id\": \"${METER_ID}\",
        \"filters\": {\"agent_role\": {\"eq\": \"${role}\"}}
      }
    }"
done

Create a plan with three rate cards

A plan ties features to prices. Go to Product Catalog → Plans tab → New Plan. Name it CrewAI Research Pro, currency USD, monthly cadence. Then add three rate cards in the default phase:

Rate card key	Feature	Price (USD per token)
`crewai_researcher_tokens`	CrewAI Researcher Tokens	0.0001
`crewai_analyst_tokens`	CrewAI Analyst Tokens	0.0002
`crewai_writer_tokens`	CrewAI Writer Tokens	0.0005

The rate card key must match the feature key. If they don't match, Kong returns a rate_card_key_feature_key_mismatch error.

Prices are per single token. To charge $5 per million tokens, enter 0.000005, not 5. The decimals look uncomfortable but they are correct. I used round numbers like 0.0001 here so usage and dollar amounts are easy to read while testing. Real production pricing usually looks like 0.0000003.
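A quick way to avoid miscounting zeros is to do the conversion in code. This sketch uses Decimal so the result is exact; per_token_price is a hypothetical helper, and the quoted prices are just sample inputs.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal("1000000")


def per_token_price(usd_per_million: str) -> Decimal:
    """Convert a price quoted per million tokens into a per-token amount."""
    return Decimal(usd_per_million) / TOKENS_PER_MILLION


for quoted in ("5", "0.30", "100"):
    amount = per_token_price(quoted)
    # format(..., "f") keeps fixed-point notation instead of forms like 3E-7.
    print(f"${quoted} per million -> {format(amount, 'f')} per token")
```

Read the other way, the 0.0001 test price used in this tutorial corresponds to $100 per million tokens, which is why it is convenient for testing and unrealistic for production.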

After the rate cards are in, click Publish. A draft plan cannot accept subscriptions.

CLI

Build the plan in two steps: create as draft, then publish.

PLAN=$(curl -s -X POST https://us.api.konghq.com/v3/openmeter/plans \
  -H "Authorization: Bearer $KONG_PAT" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "crewai_research_pro",
    "name": "CrewAI Research Pro",
    "currency": "USD",
    "billing_cadence": "P1M",
    "pro_rating_enabled": true,
    "phases": [{
      "key": "default",
      "name": "Default",
      "rate_cards": [
        {"key": "crewai_researcher_tokens", "name": "Researcher Tokens",
         "billing_cadence": "P1M",
         "feature": {"key": "crewai_researcher_tokens"},
         "price": {"type": "unit", "amount": "0.0001"}},
        {"key": "crewai_analyst_tokens", "name": "Analyst Tokens",
         "billing_cadence": "P1M",
         "feature": {"key": "crewai_analyst_tokens"},
         "price": {"type": "unit", "amount": "0.0002"}},
        {"key": "crewai_writer_tokens", "name": "Writer Tokens",
         "billing_cadence": "P1M",
         "feature": {"key": "crewai_writer_tokens"},
         "price": {"type": "unit", "amount": "0.0005"}}
      ]
    }]
  }' | jq -r .id)

curl -X POST "https://us.api.konghq.com/v3/openmeter/plans/$PLAN/publish" \
  -H "Authorization: Bearer $KONG_PAT"

Create the customer and subscribe

In Konnect, go to Customers → New Customer. Name it Acme Inc, key acme, currency USD. The important field is Subject keys. It must include acme. This is how Kong matches incoming events to a customer. Our listener sets the subject field on every event to acme (from the CUSTOMER_ID value in .env).

Then open the customer, click Add Subscription, pick CrewAI Research Pro, and start it immediately.

CLI

CUSTOMER=$(curl -s -X POST https://us.api.konghq.com/v3/openmeter/customers \
  -H "Authorization: Bearer $KONG_PAT" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "acme",
    "name": "Acme Inc",
    "currency": "USD",
    "usage_attribution": {"subject_keys": ["acme"]}
  }' | jq -r .id)

curl -X POST https://us.api.konghq.com/v3/openmeter/subscriptions \
  -H "Authorization: Bearer $KONG_PAT" \
  -H "Content-Type: application/json" \
  -d "{
    \"customer\": {\"id\": \"$CUSTOMER\"},
    \"plan\": {\"key\": \"crewai_research_pro\"}
  }"

Run the crew again and check usage

One detail is easy to miss. Events sent to Kong before a subscription starts do not get billed. Only events with a timestamp inside the active subscription window roll into an invoice.
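A small sketch of that rule, assuming an event counts only if its time is at or after the subscription start (and before the end, if there is one). is_billable is a hypothetical helper for reasoning about your own event timestamps, not a Kong API; the dates are made up and the exact boundary handling on Kong's side is an assumption here.

```python
from __future__ import annotations

from datetime import datetime, timezone


def is_billable(
    event_time: str,
    active_from: datetime,
    active_to: datetime | None = None,
) -> bool:
    """Return True if an ISO 8601 event time falls inside the subscription window."""
    ts = datetime.fromisoformat(event_time)
    if ts < active_from:
        return False
    return active_to is None or ts < active_to


# Example window: the subscription became active at 12:00 UTC.
subscription_start = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)

first_run = "2026-06-01T11:45:00+00:00"   # sent before the subscription existed
second_run = "2026-06-01T12:05:00+00:00"  # sent after it became active

print(is_billable(first_run, subscription_start))
print(is_billable(second_run, subscription_start))
```

The listener stamps each event with datetime.now(timezone.utc).isoformat(), so the strings it produces parse with fromisoformat in the same way.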

Run the crew one more time after the subscription is active:

python main.py "Best practices for instrumenting LLM token usage in multi-agent systems"

Open Konnect, go to the Acme Inc customer, and switch to the Usage tab. You should see three rows, one per feature, each with a total token count and a cost. Switch to Invoices and the same three rows show up as line items on the upcoming invoice.

To check from the CLI, query the events endpoint and confirm validation_errors is empty:

curl -s "https://us.api.konghq.com/v3/openmeter/events?type=crewai.llm_call&limit=6" \
  -H "Authorization: Bearer $KONG_PAT" | \
  jq '.data[] | {role: .event.data.agent_role,
                 type: .event.data.type,
                 tokens: .event.data.tokens,
                 errors: (.validation_errors | length)}'

A clean run looks like:

{"role": "Writer", "type": "output", "tokens": 421, "errors": 0}
{"role": "Writer", "type": "input", "tokens": 779, "errors": 0}
{"role": "Researcher", "type": "output", "tokens": 453, "errors": 0}
{"role": "Analyst", "type": "input", "tokens": 587, "errors": 0}
{"role": "Analyst", "type": "output", "tokens": 176, "errors": 0}
{"role": "Researcher", "type": "input", "tokens": 154, "errors": 0}

Six events, one per (agent role, token type) bucket. All six match the meter and roll into the customer's subscription.

How would you price your crew?

Per role like in this tutorial? Per total tokens? Per task? Per crew run? The right answer depends on what your customers can predict and what hurts your margin when they cannot. Drop a comment with the pricing model you use. I want to hear what is working in the wild.

The full code is at github.com/tejakummarikuntla/Billing-CrewAI-with-KongMB. PRs welcome.

Usage-Based Billing for AI Agents with FastAPI and Kong

Teja Kummarikuntla — Tue, 26 May 2026 15:34:49 +0000

If you've built an AI agent, the next question is simple: how do you charge for it?

Flat subscriptions don't fit AI workloads. Token costs vary by model, by direction (input vs output), and by how much each user actually consumes. One user might send 200 tokens a day. Another burns through 50,000. A flat fee either overcharges the light user or subsidizes the heavy one.

What you need is usage-based billing: each user pays for exactly what they use. In this tutorial, you'll build a sample AI agent and set up billing for it using FastAPI, OpenMeter, and Kong Metering & Billing (Cloud OpenMeter engine).

What is Usage-Based Billing?

Usage-based billing means charging customers based on actual consumption rather than a fixed amount. For AI agents, every API call has a measurable cost tied to token counts, and those costs vary by model. It's a natural fit.

A usage-based billing system needs four things:

Event ingestion: Capture usage data every time something billable happens
Metering: Aggregate raw events into per-customer totals per billing period
Pricing: Apply rate cards or tiers to metered usage
Invoicing: Generate bills and collect payment

Building all of this from scratch is a real engineering project. You need event storage, deduplication, windowed aggregation, and a billing layer on top. That's a lot of infrastructure for something that isn't your core product.
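To make the four stages concrete, here is a toy in-memory sketch: dedupe on ingest, sum per customer, apply a unit price, and produce an invoice total. Every name in it (ToyBilling, ingest, invoice, the sample event fields) is hypothetical and exists only for illustration; it has none of the durability, windowing, or scale a real system needs.

```python
from collections import defaultdict
from decimal import Decimal


class ToyBilling:
    """Ingestion, metering, pricing, and invoicing squeezed into one in-memory class."""

    def __init__(self, price_per_token: Decimal) -> None:
        self.price_per_token = price_per_token
        self._seen: set[tuple[str, str]] = set()        # (source, id) pairs already ingested
        self._usage: dict[str, int] = defaultdict(int)  # customer -> total tokens

    def ingest(self, event: dict) -> bool:
        """Accept an event once; a repeated (source, id) pair is ignored."""
        key = (event["source"], event["id"])
        if key in self._seen:
            return False
        self._seen.add(key)
        # Metering: keep a running sum of tokens per customer.
        self._usage[event["subject"]] += event["data"]["tokens"]
        return True

    def invoice(self, customer: str) -> Decimal:
        """Pricing and invoicing: metered tokens times the unit price."""
        return self._usage.get(customer, 0) * self.price_per_token


billing = ToyBilling(price_per_token=Decimal("0.000005"))
sample = {"id": "evt-1", "source": "ai-agent", "subject": "user-42", "data": {"tokens": 200}}

billing.ingest(sample)
billing.ingest(sample)  # duplicate delivery, ignored
print(billing.invoice("user-42"))
```

Even this toy version has to decide what counts as a duplicate and where totals live, which hints at how much more a production system has to get right.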

The Tools

OpenMeter is an open-source metering engine maintained by Kong. It's built on the CloudEvents standard (a CNCF specification) and handles real-time event ingestion, deduplication, and aggregation at scale. The source code is on GitHub, and you can self-host it if you want full control.

Kong Konnect Metering & Billing is the cloud platform built on top of OpenMeter. It adds the billing layer: features, pricing plans, subscriptions, invoicing, and payment provider integration. In this tutorial we will use this platform to ingest events, aggregate usage, apply rates, and generate invoices.

What You'll Build

Here's the architecture of the system you'll build:

The flow works like this:

1. A user calls one of your API endpoints (/api/generate, /api/summarize, or /api/analyze)
2. Your agent sends the request to OpenAI and gets a response with token counts
3. Your app sends a CloudEvent to the Kong Metering & Billing API with the usage data (tokens consumed, model used, user ID)
4. The platform aggregates the events into meters, applies pricing from the user's plan, and generates an invoice at the end of the billing cycle
5. A payment provider collects payment

Your code handles steps 1 through 3. Steps 4 and 5 happen automatically once you've configured the billing system.

Prerequisites

To follow along, you'll need:

Python 3.10 or later
An OpenAI API key (you can use any LLM provider, but this tutorial uses OpenAI)
A Kong Konnect account for Metering & Billing
A Konnect Personal Access Token (PAT) for API access
Basic familiarity with FastAPI and REST APIs

Step 1: Set Up the Project
Step 2: Build the AI API
Step 3: Add API Key Authentication
Step 4: Send Usage Events to Konnect Metering & Billing
Step 5: Set Up Meters
Step 6: Define Features and Pricing Plans
Step 7: Onboard a Customer and Create a Subscription
Step 8: Test the Full Billing Flow
Step 9: Connect a Payment Provider
Common Gotchas and Production Tips
What You Learned
Where to Go Next

Step 1: Set Up the Project

Create a new project directory and set up a virtual environment:

mkdir ai-billing-app
cd ai-billing-app
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the dependencies:

pip install fastapi uvicorn openai httpx python-dotenv pydantic

A few notes on the packages:

fastapi and uvicorn: The web framework and ASGI server for your API
openai: The OpenAI Python SDK for making LLM calls
httpx: HTTP client for sending usage events and making API calls to Konnect Metering & Billing
python-dotenv: Loads environment variables from a .env file
pydantic: Data validation (bundled with FastAPI, listed for clarity)

Create a .env file for your configuration:

# .env
OPENAI_API_KEY=sk-your-openai-api-key
KONNECT_API_URL=https://us.api.konghq.com/v3/openmeter
KONNECT_TOKEN=kpat_your_konnect_personal_access_token

The KONNECT_TOKEN is a Personal Access Token you create in Konnect under your account settings.

For the curl commands throughout this tutorial, export these variables in your terminal:

export KONNECT_API_URL=https://us.api.konghq.com/v3/openmeter
export KONNECT_TOKEN=kpat_your_konnect_personal_access_token

Create the main application file:

touch app.py

Your project structure will look like this:

ai-billing-app/
├── app.py           # Main FastAPI application
├── .env             # Environment variables
└── requirements.txt # Dependencies

Step 2: Build the AI API

Start with a basic FastAPI app that wraps OpenAI's API. This gives your users three capabilities: generating text, summarizing content, and analyzing text for insights.

Open app.py and add the following:

import os
import uuid
import datetime
from contextlib import asynccontextmanager

from fastapi import FastAPI, HTTPException, Header
from pydantic import BaseModel
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


class GenerateRequest(BaseModel):
    prompt: str
    model: str = "gpt-4o-mini"
    max_tokens: int = 1024


class SummarizeRequest(BaseModel):
    text: str
    model: str = "gpt-4o-mini"
    max_tokens: int = 512


class AnalyzeRequest(BaseModel):
    text: str
    query: str
    model: str = "gpt-4o-mini"
    max_tokens: int = 512


class AIResponse(BaseModel):
    content: str
    model: str
    usage: dict


app = FastAPI(title="AI Agent API")


@app.post("/api/generate", response_model=AIResponse)
def generate_text(request: GenerateRequest):
    """Generate text content based on a prompt."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {"role": "system", "content": "You are a helpful writing assistant."},
            {"role": "user", "content": request.prompt},
        ],
        max_tokens=request.max_tokens,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage={
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
        },
    )


@app.post("/api/summarize", response_model=AIResponse)
def summarize_text(request: SummarizeRequest):
    """Summarize a piece of text."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {
                "role": "system",
                "content": "Summarize the following text concisely.",
            },
            {"role": "user", "content": request.text},
        ],
        max_tokens=request.max_tokens,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage={
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
        },
    )


@app.post("/api/analyze", response_model=AIResponse)
def analyze_text(request: AnalyzeRequest):
    """Analyze text and extract insights based on a query."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {
                "role": "system",
                "content": "Analyze the provided text and answer the user's query about it.",
            },
            {"role": "user", "content": f"Text: {request.text}\n\nQuery: {request.query}"},
        ],
        max_tokens=request.max_tokens,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage={
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
        },
    )

Test it to make sure the API works:

uvicorn app:app --reload

In another terminal:

curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a haiku about Python programming"}'

You should get a response with generated text and token usage counts. The usage field is what you'll feed into the billing system.

Step 3: Add API Key Authentication

Before you can bill users, you need to know who's making each request. Add a simple API key authentication layer.

In a production system, you'd store API keys in a database and hash them. For this tutorial, you'll use an in-memory dictionary to keep the focus on billing.

Update app.py to add authentication:

# Add this after the load_dotenv() call

# In production, store these in a database with hashed keys
API_KEYS = {
    "ak_user1_abc123": {"user_id": "user-001", "name": "Alice"},
    "ak_user2_def456": {"user_id": "user-002", "name": "Bob"},
    "ak_user3_ghi789": {"user_id": "user-003", "name": "Charlie"},
}


def authenticate(x_api_key: str | None = Header(default=None)) -> dict:
    """Validate the API key and return the user info."""
    if not x_api_key:
        raise HTTPException(status_code=401, detail="Missing API key")
    user = API_KEYS.get(x_api_key)
    if not user:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return user

Now update each endpoint to require authentication. Here's the updated /api/generate endpoint as an example (apply the same pattern to /api/summarize and /api/analyze):

from fastapi import FastAPI, HTTPException, Header, Depends

@app.post("/api/generate", response_model=AIResponse)
def generate_text(request: GenerateRequest, user: dict = Depends(authenticate)):
    """Generate text content based on a prompt."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {"role": "system", "content": "You are a helpful writing assistant."},
            {"role": "user", "content": request.prompt},
        ],
        max_tokens=request.max_tokens,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage={
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
        },
    )

Test that authentication works:

# This should fail with 401
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello"}'

# This should succeed
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: ak_user1_abc123" \
  -d '{"prompt": "Write a haiku about Python programming"}'

Now you know who's making each request. That user_id field is what ties API usage to a billing customer.
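The production note above mentions storing hashed keys rather than plaintext ones. A minimal sketch of that idea, using only the standard library, might look like the following. The `HASHED_KEYS` store and the `hash_key` and `lookup_user` helpers are hypothetical names for illustration; a real system would back this with a database.

```python
import hashlib
import hmac
from typing import Optional


def hash_key(api_key: str) -> str:
    """Return the SHA-256 hex digest of an API key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


# Hypothetical store: in a real system this would be a database table
# keyed by the digest, so the plaintext key is never persisted.
HASHED_KEYS = {
    hash_key("ak_user1_abc123"): {"user_id": "user-001", "name": "Alice"},
    hash_key("ak_user2_def456"): {"user_id": "user-002", "name": "Bob"},
}


def lookup_user(api_key: str) -> Optional[dict]:
    """Find the user for a presented key by comparing digests."""
    presented = hash_key(api_key)
    for stored, user in HASHED_KEYS.items():
        # compare_digest does a constant-time comparison of the two digests
        if hmac.compare_digest(stored, presented):
            return user
    return None
```

The `authenticate` dependency could call `lookup_user(x_api_key)` in place of `API_KEYS.get(x_api_key)`; the rest of the flow stays the same.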

Step 4: Send Usage Events to Konnect Metering & Billing

This is where billing starts. Every time a user makes an API call, you'll send a usage event to Konnect Metering & Billing. The metering layer (powered by the open-source OpenMeter engine) ingests and aggregates these events. You'll talk to the API directly using standard HTTP calls, no extra dependencies needed.

What are CloudEvents?

The Konnect Metering & Billing API accepts events in the CloudEvents format, which is a CNCF standard for describing event data. Every usage event you send is a CloudEvents JSON object.

Here's what a single usage event looks like:

{
  "specversion": "1.0",
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "source": "ai-text-api",
  "type": "llm_token_usage",
  "subject": "user-001",
  "time": "2026-04-13T10:30:00Z",
  "datacontenttype": "application/json",
  "data": {
    "total_tokens": 247,
    "prompt_tokens": 52,
    "completion_tokens": 195,
    "model": "gpt-4o-mini",
    "endpoint": "/api/generate"
  }
}

Each field has a specific purpose in the billing pipeline:

Field	What it does	Example
`specversion`	CloudEvents version. Always `"1.0"`	`"1.0"`
`id`	Unique event ID. Used for deduplication.	UUID string
`source`	Identifies your app or service	`"ai-text-api"`
`type`	Matches events to meters. You'll define a meter that listens for this type.	`"llm_token_usage"`
`subject`	The customer/user this event belongs to. This is how usage is attributed.	`"user-001"`
`time`	When the event happened (RFC 3339 format)	`"2026-04-13T10:30:00Z"`
`datacontenttype`	Format of the data payload	`"application/json"`
`data`	Your custom payload. Contains the values you want to meter.	Token counts, model, endpoint

The subject field is critical. It's what ties an event to a specific customer. When you set up billing later, you'll create a customer with matching subject_keys, and all events with that subject will roll up into their usage.
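To catch a malformed event before it leaves your app, a small pre-flight check can help. The sketch below is an illustration, not part of the Konnect API: `missing_fields` is a hypothetical helper. The first tuple lists the attributes the CloudEvents 1.0 spec marks as required; the second lists the fields this tutorial's pipeline relies on for attribution and metering.

```python
# Attributes required by the CloudEvents 1.0 specification
REQUIRED_BY_SPEC = ("specversion", "id", "source", "type")
# Fields this tutorial depends on (an assumption of this sketch)
REQUIRED_FOR_ATTRIBUTION = ("subject", "data")


def missing_fields(event: dict) -> list:
    """Return the names of fields that are absent or empty in the event."""
    wanted = REQUIRED_BY_SPEC + REQUIRED_FOR_ATTRIBUTION
    return [name for name in wanted if not event.get(name)]


example = {
    "specversion": "1.0",
    "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "source": "ai-text-api",
    "type": "llm_token_usage",
    "subject": "user-001",
    "data": {"total_tokens": 247},
}
```

Calling `missing_fields(example)` returns an empty list; dropping `subject` from the event would return `["subject"]`.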

You can test event ingestion directly with curl to make sure your Konnect credentials work before wiring it into your Python code:

curl -X POST $KONNECT_API_URL/events \
  -H "Authorization: Bearer $KONNECT_TOKEN" \
  -H "Content-Type: application/cloudevents+json" \
  -d '{
    "specversion": "1.0",
    "id": "test-event-001",
    "source": "ai-text-api",
    "type": "llm_token_usage",
    "subject": "user-001",
    "time": "2026-04-13T10:30:00Z",
    "datacontenttype": "application/json",
    "data": {
      "total_tokens": 100,
      "prompt_tokens": 40,
      "completion_tokens": 60,
      "model": "gpt-4o-mini",
      "endpoint": "/api/generate"
    }
  }'

A 202 Accepted response means the event was ingested successfully. Note the content type: application/cloudevents+json. This tells the API you're sending a single CloudEvent.

Create the Billing Module

Now add a function to your Python app that sends usage events to Konnect after every API call. This uses httpx to POST CloudEvents directly to the Konnect Metering & Billing events endpoint:

import httpx

KONNECT_API_URL = os.getenv("KONNECT_API_URL", "https://us.api.konghq.com/v3/openmeter")
KONNECT_TOKEN = os.getenv("KONNECT_TOKEN")


def track_usage(user_id: str, model: str, endpoint: str, usage: dict):
    """Send a usage event to Konnect Metering & Billing."""
    event = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": "ai-text-api",
        "type": "llm_token_usage",
        "subject": user_id,
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": {
            "total_tokens": usage["total_tokens"],
            "prompt_tokens": usage["prompt_tokens"],
            "completion_tokens": usage["completion_tokens"],
            "model": model,
            "endpoint": endpoint,
        },
    }

    try:
        response = httpx.post(
            f"{KONNECT_API_URL}/events",
            headers={
                "Authorization": f"Bearer {KONNECT_TOKEN}",
                "Content-Type": "application/cloudevents+json",
            },
            json=event,
        )
        response.raise_for_status()
    except Exception as e:
        # Log the error but don't fail the request.
        # In production, use a dead-letter queue for failed events.
        print(f"Failed to track usage event: {e}")

A few important things about this function:

Each event gets a unique UUID as its id. The metering engine deduplicates events by the combination of id + source. If a network retry sends the same event twice, it won't be counted twice.
The subject is the user's ID, not their name or API key. This maps directly to the billing customer you'll create later.
The type is llm_token_usage. You'll configure a meter to listen for events with this exact type. If the type doesn't match, the events won't be metered.
The content type must be application/cloudevents+json. This is how the Konnect API knows to parse the request body as a CloudEvent.
Errors don't fail the request. Billing telemetry should never break your user's API experience. In production, you'd send failed events to a retry queue.
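The deduplication point matters most when you add retries. A minimal sketch of a retry loop that reuses the same event, and therefore the same id, on every attempt could look like this. The `send_with_retry` helper and its `send` callable are hypothetical; in this tutorial `send` would wrap the httpx.post call from `track_usage`.

```python
import time
import uuid


def send_with_retry(event, send, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Try to deliver one event, reusing the same event (and id) each time.

    `send` is any callable that raises on failure. Returns True once a
    send succeeds, or False after all attempts have failed.
    """
    for attempt in range(attempts):
        try:
            send(event)
            return True
        except Exception:
            if attempt < attempts - 1:
                # Exponential backoff: 0.5s, 1s, 2s, ...
                sleep(base_delay * (2 ** attempt))
    return False


# The id is generated once, outside the retry loop
event = {"id": str(uuid.uuid4()), "source": "ai-text-api"}
```

Because the id is created once and never regenerated inside the loop, a retry that follows a lost response is dropped as a duplicate rather than counted twice.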

Track Tokens on Every Request

Now wire track_usage into each endpoint. Update the /api/generate endpoint:

@app.post("/api/generate", response_model=AIResponse)
def generate_text(request: GenerateRequest, user: dict = Depends(authenticate)):
    """Generate text content based on a prompt."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {"role": "system", "content": "You are a helpful writing assistant."},
            {"role": "user", "content": request.prompt},
        ],
        max_tokens=request.max_tokens,
    )

    usage = {
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }

    # Track this usage event for billing
    track_usage(
        user_id=user["user_id"],
        model=response.model,
        endpoint="/api/generate",
        usage=usage,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage=usage,
    )

Apply the same pattern to /api/summarize and /api/analyze. The only difference is the endpoint value you pass to track_usage.

Here's the updated /api/summarize:

@app.post("/api/summarize", response_model=AIResponse)
def summarize_text(request: SummarizeRequest, user: dict = Depends(authenticate)):
    """Summarize a piece of text."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {"role": "system", "content": "Summarize the following text concisely."},
            {"role": "user", "content": request.text},
        ],
        max_tokens=request.max_tokens,
    )

    usage = {
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }

    track_usage(
        user_id=user["user_id"],
        model=response.model,
        endpoint="/api/summarize",
        usage=usage,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage=usage,
    )

And /api/analyze:

@app.post("/api/analyze", response_model=AIResponse)
def analyze_text(request: AnalyzeRequest, user: dict = Depends(authenticate)):
    """Analyze text and extract insights based on a query."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {
                "role": "system",
                "content": "Analyze the provided text and answer the user's query about it.",
            },
            {"role": "user", "content": f"Text: {request.text}\n\nQuery: {request.query}"},
        ],
        max_tokens=request.max_tokens,
    )

    usage = {
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }

    track_usage(
        user_id=user["user_id"],
        model=response.model,
        endpoint="/api/analyze",
        usage=usage,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage=usage,
    )

At this point, every API call sends a CloudEvent with the user's token consumption to the Konnect Metering & Billing API. The events are flowing. Now you need to tell the billing system what to do with them.

Step 5: Set Up Meters

A meter defines how raw events get aggregated into meaningful usage numbers. Think of it as a SQL GROUP BY query that runs continuously. It takes a stream of events and produces totals like "User A consumed 15,247 tokens this month."

What is a Meter?

A meter is defined by the following properties:

Property	What it does
`key`	A unique identifier for the meter (for example `llm_total_tokens`). Lowercase letters, numbers, and underscores only.
`event_type`	Which events feed this meter. Must match the `type` field in your CloudEvents.
`aggregation`	How to combine values: `sum`, `count`, `avg`, `min`, `max`, `unique_count`, `latest`
`value_property`	JSONPath to the numeric field in the event's `data` (for example `$.total_tokens`)
`dimensions`	Optional named JSONPath expressions to break down usage (for example by model or endpoint)

For billing token usage, you want a meter that SUMs the total_tokens field from every llm_token_usage event, grouped by model and endpoint.

Create a Token Usage Meter

You can create meters through the Konnect UI or via the API. Here's the API approach, which is more reproducible:

curl -X POST $KONNECT_API_URL/meters \
  -H "Authorization: Bearer $KONNECT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "llm_total_tokens",
    "name": "Total LLM Tokens",
    "description": "Total LLM tokens consumed per request",
    "event_type": "llm_token_usage",
    "aggregation": "sum",
    "value_property": "$.total_tokens",
    "dimensions": {
      "model": "$.model",
      "endpoint": "$.endpoint"
    }
  }'

This meter does the following:

Listens for events with type: "llm_token_usage" (matching what your app sends)
SUMs the total_tokens value from each event's data payload
Groups the totals by model and endpoint, so you can see usage breakdowns like "User A consumed 10,000 tokens on gpt-4o-mini via /api/generate"
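To make the aggregation concrete, here is a rough in-memory imitation of what such a meter computes. This is only a sketch of the idea (filter by type, sum a field, group by subject and dimensions), not how the metering engine is implemented; `aggregate` is a hypothetical helper.

```python
from collections import defaultdict


def aggregate(events, event_type, value_field, dimensions):
    """Sum value_field per (subject, *dimensions) for matching events."""
    totals = defaultdict(int)
    for event in events:
        if event["type"] != event_type:
            continue  # the meter ignores events of other types
        data = event["data"]
        group = (event["subject"],) + tuple(data.get(d) for d in dimensions)
        totals[group] += data[value_field]
    return dict(totals)


events = [
    {"type": "llm_token_usage", "subject": "user-001",
     "data": {"total_tokens": 247, "model": "gpt-4o-mini", "endpoint": "/api/generate"}},
    {"type": "llm_token_usage", "subject": "user-001",
     "data": {"total_tokens": 100, "model": "gpt-4o-mini", "endpoint": "/api/generate"}},
    {"type": "llm_token_usage", "subject": "user-001",
     "data": {"total_tokens": 80, "model": "gpt-4o-mini", "endpoint": "/api/summarize"}},
    {"type": "something_else", "subject": "user-001",
     "data": {"total_tokens": 999, "model": "gpt-4o-mini", "endpoint": "/api/generate"}},
]

totals = aggregate(events, "llm_token_usage", "total_tokens", ["model", "endpoint"])
```

Here `totals` holds 347 tokens for user-001 on /api/generate and 80 on /api/summarize; the event with a different type is skipped.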

You might also want separate meters for input and output tokens, since output tokens are typically more expensive (3 to 5x more with most LLM providers). Here's how to create an output token meter:

curl -X POST $KONNECT_API_URL/meters \
  -H "Authorization: Bearer $KONNECT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "llm_completion_tokens",
    "name": "LLM Completion Tokens",
    "description": "LLM output/completion tokens consumed",
    "event_type": "llm_token_usage",
    "aggregation": "sum",
    "value_property": "$.completion_tokens",
    "dimensions": {
      "model": "$.model"
    }
  }'

Verify Events Are Flowing

After creating your meters, send a few test requests through your API to generate some events:

# Send a test request
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: ak_user1_abc123" \
  -d '{"prompt": "Explain what a REST API is in two sentences"}'

Then verify that the meter is receiving events. You can confirm this in two ways:

Konnect dashboard: Go to Metering & Billing > Meters, select llm_total_tokens, and check for incoming data for subject user-001.
List meters via API to confirm the meter exists and is active:

curl $KONNECT_API_URL/meters \
  -H "Authorization: Bearer $KONNECT_TOKEN"

If the meter shows no data after a few seconds, check that:

Your KONNECT_TOKEN is valid
The event_type in your meter matches the type in your CloudEvents (llm_token_usage)
The value_property path ($.total_tokens) matches your event data structure
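The last two checks can be automated locally before you go digging in the dashboard. The sketch below compares one event against one meter definition. `diagnose` is a hypothetical helper, and it only understands simple `$.field` paths like the ones used in this tutorial, not full JSONPath.

```python
def diagnose(event: dict, meter: dict) -> list:
    """Return a list of reasons why this event would not feed this meter."""
    problems = []
    if event.get("type") != meter["event_type"]:
        problems.append("event type does not match meter event_type")

    path = meter["value_property"]
    # Only top-level paths of the form "$.field" are handled in this sketch
    field = path[2:] if path.startswith("$.") else None
    value = event.get("data", {}).get(field) if field else None
    if not isinstance(value, (int, float)) or isinstance(value, bool):
        problems.append(f"value_property {path} does not resolve to a number")
    return problems


meter = {"event_type": "llm_token_usage", "value_property": "$.total_tokens"}
good = {"type": "llm_token_usage", "data": {"total_tokens": 100}}
bad = {"type": "llm_tokens", "data": {"tokens": 100}}
```

`diagnose(good, meter)` returns an empty list, while `diagnose(bad, meter)` reports both the type mismatch and the missing numeric field.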

Step 6: Define Features and Pricing Plans

Meters give you raw usage data. Features and plans turn that data into billable items with prices.

The hierarchy works like this:

Meter (raw usage aggregation)
  └─► Feature (billable unit: "AI Tokens")
        └─► Rate Card (pricing: $0.002 per 1,000 tokens)
              └─► Plan (collection of rate cards: "Pro Plan")

Create a Feature

A feature is a customer-facing billable unit. It links a meter to something you can put a price on. You configure features in the Konnect dashboard:

Go to Metering & Billing > Product Catalog > Features
Click Create Feature
Set the key to ai_tokens, the name to "AI Tokens"
Select llm_total_tokens as the meter
Save

This creates a feature called "AI Tokens" that draws its usage data from the llm_total_tokens meter you created in the previous step.

Feature, plan, and rate card keys must match the pattern ^[a-z0-9]+(?:_[a-z0-9]+)*$ — lowercase letters, numbers, and underscores only. Hyphens are not allowed.
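If you generate keys programmatically, it can be worth checking them against that pattern before calling the API. A small sketch, where `is_valid_key` is a hypothetical helper:

```python
import re

# The key pattern quoted above
KEY_PATTERN = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")


def is_valid_key(key: str) -> bool:
    """True if the key uses only lowercase letters, digits, and single underscores."""
    return KEY_PATTERN.fullmatch(key) is not None
```

With this check, `ai_tokens` and `starter` pass, while `ai-tokens`, `AI_Tokens`, and `ai__tokens` are rejected.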

Create a Plan with Rate Cards

A plan bundles one or more features with pricing. This is where you define how much you charge.

In the Konnect dashboard:

Go to Metering & Billing > Product Catalog > Plans
Click Create Plan
Set the key to starter, the name to "Starter", currency to USD
Add a phase (the default billing period configuration)
Inside the phase, add a rate card:

*   Type: Usage-based
*   Feature: `ai_tokens`
*   Rate card key: `ai_tokens` (must match the feature key)
*   Price type: Unit
*   Unit price: `0.000002`

Save and publish the plan

The unit price is the cost per single token, not per thousand. For production pricing, you'd calculate this from your LLM provider's rates plus your margin. For example, if GPT-4o-mini costs you $0.15 per 1 million input tokens, that's $0.00000015 per token. With a 10x margin, you'd charge $0.0000015 per token. For testing, a higher value like 0.000002 makes the numbers easier to read on invoices.
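The per-token arithmetic is easy to get wrong by a factor of a thousand, so here is the calculation above as a short sketch. It uses `Decimal` to avoid floating-point noise in such small numbers; the helper names are hypothetical.

```python
from decimal import Decimal


def unit_price(cost_per_million: str, markup: int) -> Decimal:
    """Convert a provider's price per 1M tokens into a per-token sell price."""
    per_token_cost = Decimal(cost_per_million) / Decimal(1_000_000)
    return per_token_cost * markup


def per_thousand_to_unit(price_per_thousand: str) -> Decimal:
    """Convert a 'per 1,000 tokens' price into the per-token unit price."""
    return Decimal(price_per_thousand) / 1000


# $0.15 per 1M input tokens, sold at 10x
sell_price = unit_price("0.15", 10)

# $0.002 per 1,000 tokens expressed as a unit price
test_price = per_thousand_to_unit("0.002")
```

`sell_price` works out to 0.0000015 per token and `test_price` to 0.000002, matching the figures in the text.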

Konnect also supports tiered pricing for volume discounts. When creating a rate card, choose "Tiered" instead of "Unit" and define graduated tiers. For example:

First 100,000 tokens at $0.000003/token
Everything above 100,000 at $0.000002/token

Graduated pricing means each tier applies to its own range (the discounted rate doesn't apply retroactively to earlier usage).
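As a worked example of graduated pricing, the sketch below charges each slice of usage at its own tier's rate, using the two tiers listed above. `graduated_charge` and the `TIERS` layout are hypothetical; Konnect performs this calculation for you once the rate card is configured.

```python
from decimal import Decimal

# (upper bound of the tier in tokens, price per token); None means "no upper bound"
TIERS = [
    (100_000, Decimal("0.000003")),  # first 100,000 tokens
    (None, Decimal("0.000002")),     # everything above 100,000
]


def graduated_charge(tokens: int, tiers=TIERS) -> Decimal:
    """Charge each slice of usage at the rate of the tier it falls into."""
    total = Decimal("0")
    lower = 0
    for upper, price in tiers:
        if tokens <= lower:
            break  # no usage reaches this tier
        in_tier = (tokens if upper is None else min(tokens, upper)) - lower
        total += in_tier * price
        if upper is None:
            break
        lower = upper
    return total
```

For 150,000 tokens this gives 100,000 × $0.000003 + 50,000 × $0.000002 = $0.40, rather than the $0.30 you would get if the lower rate applied to all usage.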

Step 7: Onboard a Customer and Create a Subscription

Before usage events can turn into invoices, you need two things: a customer in the billing system that matches your app user, and a subscription that binds that customer to a plan.

Create a Customer

curl -X POST $KONNECT_API_URL/customers \
  -H "Authorization: Bearer $KONNECT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Alice",
    "key": "user-001",
    "usage_attribution": {
      "subject_keys": ["user-001"]
    }
  }'

The usage_attribution.subject_keys array must match the subject field in your CloudEvents. Your app sends events with subject: "user-001", and this customer has "user-001" in its subject_keys. That's how the billing system knows which events belong to which customer.

This is the consumer-to-customer mapping, and it's the most common setup mistake. If these don't match, events will be ingested but never attributed to a customer, and no invoices will be generated.
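A quick way to reason about this mapping is to imitate it locally. The sketch below matches events to customers by subject_keys and collects the ones that match nobody. It is a simplified illustration of the attribution rule, and `attribute` is a hypothetical helper.

```python
def attribute(events, customers):
    """Group events by customer key; return (attributed, orphaned)."""
    subject_to_customer = {}
    for customer in customers:
        for subject in customer["usage_attribution"]["subject_keys"]:
            subject_to_customer[subject] = customer["key"]

    attributed, orphaned = {}, []
    for event in events:
        customer_key = subject_to_customer.get(event["subject"])
        if customer_key is None:
            orphaned.append(event)  # ingested, but never billed to anyone
        else:
            attributed.setdefault(customer_key, []).append(event)
    return attributed, orphaned


customers = [
    {"name": "Alice", "key": "user-001",
     "usage_attribution": {"subject_keys": ["user-001"]}},
]
events = [
    {"subject": "user-001", "data": {"total_tokens": 100}},
    {"subject": "user_001", "data": {"total_tokens": 50}},  # underscore typo
]

attributed, orphaned = attribute(events, customers)
```

The second event ends up in `orphaned` because `user_001` is not in any customer's subject_keys, which is the silent failure described above.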

Create a Subscription

A subscription binds a customer to a plan. Events that occurred before the subscription start date are not billed:

curl -X POST $KONNECT_API_URL/subscriptions \
  -H "Authorization: Bearer $KONNECT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "customer": {
      "key": "user-001"
    },
    "plan": {
      "key": "starter"
    }
  }'

The subscription references the customer and plan by their key values (you can also use their ULID id if you stored it from the creation response). Once the subscription is active, all future events for this customer will be billed according to the plan's rate cards.

Repeat this process for each user you want to bill. In production, you'd automate this as part of your user registration flow:

def onboard_customer(user_id: str, name: str, plan: str = "starter"):
    """Create a billing customer and subscription for a new user."""
    import httpx

    base_url = os.getenv("KONNECT_API_URL", "https://us.api.konghq.com/v3/openmeter")
    token = os.getenv("KONNECT_TOKEN")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

    # Create the customer
    customer_resp = httpx.post(
        f"{base_url}/customers",
        headers=headers,
        json={
            "name": name,
            "key": user_id,
            "usage_attribution": {
                "subject_keys": [user_id],
            },
        },
    )
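    # Stop here if customer creation failed, rather than subscribing a missing customer
    customer_resp.raise_for_status()
    customer = customer_resp.json()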

    # Create the subscription
    sub_resp = httpx.post(
        f"{base_url}/subscriptions",
        headers=headers,
        json={
            "customer": {"key": user_id},
            "plan": {"key": plan},
        },
    )

    return customer, sub_resp.json()

Step 8: Test the Full Billing Flow

Now test the complete pipeline end to end. Send several requests as different users and verify that usage is tracked, metered, and reflected in invoices.

Send Test Requests

# Alice (user-001) generates text
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: ak_user1_abc123" \
  -d '{"prompt": "Write a short product description for a task management app"}'

# Alice summarizes text
curl -X POST http://localhost:8000/api/summarize \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: ak_user1_abc123" \
  -d '{"text": "FastAPI is a modern, fast web framework for building APIs with Python based on standard Python type hints. It is designed to be easy to use and learn while also being highly performant. FastAPI automatically generates interactive API documentation and validates request data using Pydantic models."}'

# Bob (user-002) generates text
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: ak_user2_def456" \
  -d '{"prompt": "Draft a welcome email for a new SaaS customer"}'

# Bob analyzes text
curl -X POST http://localhost:8000/api/analyze \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: ak_user2_def456" \
  -d '{"text": "Our Q1 revenue was $2.3M, up 15% from Q4. Customer churn decreased to 3.2%. New sign-ups increased by 22%.", "query": "What are the key business metrics and trends?"}'

Check Aggregated Usage

To see how usage is stacking up per user, open the Konnect dashboard and go to Metering & Billing > Meters. Select the llm_total_tokens meter. You'll see aggregated usage broken down by the dimensions you defined (model and endpoint), filterable by customer.

You can also verify that events are being ingested by checking the meter's event count. If Alice's requests went through, you should see her user-001 subject with token totals reflecting the requests you just sent.

Check the Invoice

After the billing period ends (or you trigger an invoice draft manually in the dashboard), check what the customer owes under Metering & Billing > Customers > Alice > Invoices.

The invoice will show line items based on the rate card in Alice's plan. If she consumed 15,000 tokens at $0.000002/token, the line item would be $0.03.

You can also check Alice's entitlement access via the API:

curl $KONNECT_API_URL/customers/{alice_customer_id}/entitlement-access \
  -H "Authorization: Bearer $KONNECT_TOKEN"

This returns Alice's configured entitlements (boolean, static, or metered). Pure usage-based rate cards without an explicit entitlement won't appear here — for usage verification, use the Konnect dashboard or the meter listing endpoint. Use the customer ID (ULID) returned when you created the customer, not the customer key.

Step 9: Connect a Payment Provider

Once invoices are generating, you need a way to collect payment. Konnect Metering & Billing integrates with Stripe. You connect your Stripe account through the Konnect dashboard (a one-time OAuth flow), link billing customers to their Stripe profiles, and invoices sync automatically.

For the full Stripe setup walkthrough, see the Konnect Metering & Billing documentation.

Common Gotchas and Production Tips

Here are problems you'll likely hit when moving this to production, along with how to handle them:

Events only count after the subscription starts. This is the most common surprise. If a user sends 10,000 tokens worth of requests before you create their subscription, those tokens won't appear on any invoice. Always create the subscription before the user starts using the API (or at least on the same day).
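To see the effect on a small sample, the sketch below filters events by a subscription start time. Treating the boundary as "at or after the start" is an assumption of this sketch, and `billable_events` is a hypothetical helper, not a Konnect call.

```python
from datetime import datetime, timezone


def billable_events(events, subscription_start):
    """Keep only events timestamped at or after the subscription start."""
    return [
        e for e in events
        if datetime.fromisoformat(e["time"]) >= subscription_start
    ]


start = datetime(2026, 4, 13, 12, 0, tzinfo=timezone.utc)
events = [
    {"time": "2026-04-13T10:30:00+00:00", "total_tokens": 10000},  # before start
    {"time": "2026-04-13T12:30:00+00:00", "total_tokens": 500},    # after start
]
kept = billable_events(events, start)
```

Only the 500-token event survives; the 10,000 tokens sent before the subscription existed never reach an invoice.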

Event deduplication is by id + source. If you retry a failed event with the same UUID and the same source string, the metering engine will silently drop the duplicate. This is good for reliability (safe retries), but it means you should never reuse event IDs across different events. Always generate a fresh UUID.
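The rule is easy to picture with a small in-memory imitation. This sketch only illustrates the id + source key described above; it is not the metering engine's implementation, and `ingest` is a hypothetical helper.

```python
def ingest(events):
    """Accept each (id, source) pair once; drop later duplicates."""
    seen = set()
    accepted = []
    for event in events:
        key = (event["id"], event["source"])
        if key in seen:
            continue  # duplicate: silently dropped
        seen.add(key)
        accepted.append(event)
    return accepted


events = [
    {"id": "evt-1", "source": "ai-text-api", "tokens": 100},
    {"id": "evt-1", "source": "ai-text-api", "tokens": 100},  # network retry
    {"id": "evt-1", "source": "other-service", "tokens": 40},  # same id, different source
]
accepted = ingest(events)
```

The retry is dropped, but the third event is kept because its source differs. Reusing an id for a genuinely new event from the same source would lose that event's usage.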

Price per unit means per single token. There's no "per 1,000 tokens" setting. If you want to charge $0.002 per 1,000 tokens, enter 0.000002 as the unit price. This trips up almost everyone on the first try.

Use background tasks for event ingestion. In the tutorial code, track_usage runs synchronously inside the request handler. In production, you'd want to push events to a background queue so that billing telemetry doesn't add latency to your API responses. FastAPI's BackgroundTasks is the simplest option:

from fastapi import BackgroundTasks

@app.post("/api/generate", response_model=AIResponse)
def generate_text(
    request: GenerateRequest,
    background_tasks: BackgroundTasks,  # injected by FastAPI; no default value needed
    user: dict = Depends(authenticate),
):
    response = openai_client.chat.completions.create(...)

    usage = {
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }

    # Send the billing event in the background
    background_tasks.add_task(
        track_usage,
        user_id=user["user_id"],
        model=response.model,
        endpoint="/api/generate",
        usage=usage,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage=usage,
    )

Test with recognizable token amounts. During testing, set the unit price to 1 (one dollar per token) so that the math is obvious. If a request uses 247 tokens, the invoice should show $247. Once the flow works, switch to production pricing.

What You Learned

In this tutorial, you built a complete usage-based billing system for an AI agent. Here's what you covered:

Built an AI API with FastAPI that wraps OpenAI's chat completions
Added user authentication so every request is tied to a specific user
Sent CloudEvents with token usage data to the Konnect Metering & Billing API on every request
Created meters that aggregate raw events into per-user usage totals
Defined features and pricing plans that turn metered usage into billable line items
Onboarded customers and created subscriptions to start the billing lifecycle
Tested the full flow from API request to invoice generation
Connected a payment provider for invoice collection

The core pattern is portable: capture a usage event at the moment something billable happens, send it to a metering system, and let the billing layer handle pricing and invoicing. You can apply this same approach to any metered resource, not just LLM tokens. API calls, documents processed, images generated, compute minutes, storage bytes. The CloudEvents format and the metering/billing pipeline work the same way.

Where to Go Next

Here are some things you'd add for a production deployment:

Entitlements and access control: Set monthly token limits per plan tier and enforce them at the API level. When a user hits their limit, return a 429 Too Many Requests response instead of burning through your LLM budget.
Multi-model pricing: Price input tokens and output tokens differently, and vary pricing by model. Output tokens typically cost several times more than input tokens (3 to 5x with most providers). Your rate cards should reflect that.
Real-time usage dashboards: Expose a /api/usage endpoint so users can check their own consumption. Use the entitlement access API to return their current-period totals and remaining allowances.
Webhook notifications: Send alerts when users hit 80% of their plan limit, when invoices are generated, or when payments fail.
Free tier with upgrade path: Create a plan with a first tier of 10,000 free tokens, then usage-based pricing after that. Graduated tiered pricing handles this automatically.
Dead-letter queue for failed events: If event ingestion fails, queue the event for retry rather than dropping it silently.
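For the entitlement and alerting items, the enforcement logic itself is small. A minimal sketch, assuming you already have the user's current-period usage and plan limit in hand (how you fetch them is out of scope here); `check_quota` and its return shape are hypothetical.

```python
def check_quota(used: int, limit: int, alert_percent: int = 80):
    """Return (http_status, state) for a user's current-period usage."""
    if used >= limit:
        return 429, "limit_reached"  # reject before calling the LLM
    # Integer comparison avoids floating-point rounding at the threshold
    if used * 100 >= limit * alert_percent:
        return 200, "near_limit"     # still served; a good moment to alert
    return 200, "ok"
```

An endpoint could call this before the OpenAI request and raise an HTTPException with status 429 when the first element of the result is 429.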

💰 Monetize Your AI Agents with LangChain and Kong

Teja Kummarikuntla — Tue, 05 May 2026 15:26:47 +0000

Say you built an AI agent and customers are starting to pay for it. Sooner or later you'll want to charge them by what they actually use, because some customers hammer the agent all day while others send a handful of messages a week. A single flat fee loses money on the heavy users and overcharges the light ones.

The billing problem is the same whether your agent runs on your own model (self-hosted, fine-tuned, or trained from scratch) or calls a third-party API like OpenAI, Anthropic, or Gemini. You still need to know which customer made which call, count the tokens it used, and turn that into a dollar amount on a real invoice. That mapping (request → customer → token count → dollar amount → invoice) is yours to build, and that's what this tutorial sets up.

The agent uses LangChain, which sits one layer above the model so the same metering code works regardless of what's behind it. The example runs on OpenAI's gpt-4o-mini for convenience, but swap the chat model and nothing else changes. A small LangChain callback records each call's input and output token counts, tagged with the customer ID. Those records flow to Kong Konnect Metering & Billing, which keeps a running per-customer tally, applies your prices (input and output tokens can be priced separately), and produces invoices on a monthly cycle.

See it in action first

Before getting into the setup, here is what the finished pipeline looks like end to end. The agent runs on one side and reports the tokens it just used. Those same tokens land as a billable line item on the customer's invoice in Kong on the other.

The AI Agent App

The user types Hello world. The agent replies with Hello! How can I assist you today? Both ends happen to land on 9 tokens. The input count is 9 rather than 2 because OpenAI wraps the prompt in chat-message formatting, which adds a few tokens beyond the literal words. The output landing on 9 as well is a coincidence. The agent fires off one record for the input tokens and another for the output tokens, both tagged with the customer (acme).

Metering and Billing the Agent in Kong

The same call now sits there as a real billable line item. With a simple test pricing of $1 per input token and $2 per output token, the math lines up:

Input: 9 tokens × $1 = $9
Output: 9 tokens × $2 = $18
Total: $27

Same numbers on both sides of the pipeline. That is what we are about to build.

Let's go through it step by step.

AI Agent App: github.com/tejakummarikuntla/llm-metering-langchian-kong.

Architecture

Every LLM call produces two CloudEvents. One carries the prompt token count, the other carries the response token count. Both events carry a subject field set to the customer identifier. Kong groups events by subject, sums the token field, multiplies by the rate card configured on the customer's plan, and rolls everything into invoices on the billing cycle.
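To make that flow concrete, here is a minimal sketch of the same arithmetic in TypeScript. It is an illustration only, not Kong's implementation: the `UsageEvent`, `RateCard`, and `InvoiceLine` types and the `priceUsage` function are hypothetical names invented for this example.

```typescript
// Illustrative only: mimics "group by subject, sum tokens, multiply by rate".
// UsageEvent, RateCard, InvoiceLine, and priceUsage are hypothetical names.
type TokenType = 'input' | 'output';

interface UsageEvent {
  subject: string; // customer identifier
  data: { type: TokenType; tokens: number };
}

// Price for a single token, per token class.
type RateCard = Record<TokenType, number>;

interface InvoiceLine {
  subject: string;
  type: TokenType;
  tokens: number;
  amount: number;
}

function priceUsage(events: UsageEvent[], rates: RateCard): InvoiceLine[] {
  const totals = new Map<string, InvoiceLine>();
  for (const { subject, data } of events) {
    // One running total per customer and token class.
    const key = `${subject}:${data.type}`;
    const line: InvoiceLine =
      totals.get(key) ?? { subject, type: data.type, tokens: 0, amount: 0 };
    line.tokens += data.tokens;
    line.amount = line.tokens * rates[data.type];
    totals.set(key, line);
  }
  return Array.from(totals.values());
}

// The "Hello world" call from the demo: 9 input and 9 output tokens,
// priced at the test rates of $1 and $2 per token.
const lines = priceUsage(
  [
    { subject: 'acme', data: { type: 'input', tokens: 9 } },
    { subject: 'acme', data: { type: 'output', tokens: 9 } },
  ],
  { input: 1, output: 2 },
);
const total = lines.reduce((sum, line) => sum + line.amount, 0);
console.log(lines, total);
```

With the demo's numbers this yields a $9 input line, an $18 output line, and a $27 total, matching the invoice shown earlier.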

Why this stack

Kong Konnect Metering & Billing fits this tutorial for a few specific reasons:

Open source core. The metering side is built on OpenMeter, which is open source. You can self-host the metering pipeline, or use the managed Konnect service.
Configurable billing engine. Meters, features, plans, rate cards, and subscriptions are first-class primitives, configured in the portal rather than shipped as code.

You're not replacing Stripe here; you're using Kong as the metering and invoicing layer that feeds it.

What you will build

A LangChain callback handler that emits two CloudEvents per LLM call
A Kong meter that filters kong.llm_request events and sums the tokens field
Two features (input and output tokens) feeding a plan with separate rate cards
A customer subscribed to that plan, with metered usage and dollar values in the Konnect portal

Prerequisites

Node.js 22.6 or higher
pnpm: npm install -g pnpm
An OpenAI API key
A free Kong Konnect account: konghq.com
A Konnect Personal Access Token with Metering & Billing write permissions

Tutorial map

Part 1: Add Metering into the AI Agent app

Clone the AI agent app
Configure environment variables
Walk through the codebase
Run the AI Agent app

Part 2: Connect to Kong Metering & Billing

Create a Meter in Kong M&B
Create Features for input and output tokens
Create a Plan with Rate Cards
Create the Customer
Add a Subscription
Inspect usage and Invoices
Connect a Payment provider

Part 1: Add Metering into the AI Agent app

Clone the AI Agent app

git clone https://github.com/tejakummarikuntla/llm-metering-langchian-kong
cd llm-metering-langchian-kong
pnpm install

The reference app is two TypeScript files. handler.ts is the metering callback. index.ts is a small chain that reads a prompt from stdin so you have something to exercise the handler with. No sidecar service, no separate ingestion worker, no extra runtime dependency beyond LangChain and the OpenAI client.

Configure environment variables

cp .env.example .env

Open .env and fill in real values:

API_URL=https://us.api.konghq.tech/v3/openmeter/events
API_KEY=your-konnect-personal-access-token
SUBJECT=acme
MODEL=gpt-4o-mini
OPENAI_API_KEY=your-openai-api-key

Variable	Purpose
`API_URL`	Kong Konnect ingestion endpoint. The default is the US region. EU organizations use `https://eu.api.konghq.tech/v3/openmeter/events`.
`API_KEY`	Konnect Personal Access Token with Metering & Billing write scope.
`SUBJECT`	Customer identifier attached to every event. Use `acme` for testing. In production this comes from your authenticated session, not an env var.
`MODEL`	Any chat-completion model. `gpt-4o-mini` keeps testing cheap.
`OPENAI_API_KEY`	Standard OpenAI API key.
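A missing variable is easier to catch at startup than halfway through a run. The sketch below validates the five variables from the table; `loadConfig` and `MeteringConfig` are hypothetical names, and treating every variable as required is an assumption (the reference app may apply defaults).

```typescript
// Hypothetical helper: validates the variables from the table above.
// loadConfig and MeteringConfig are illustrative names, not from the repo.
interface MeteringConfig {
  apiUrl: string;
  apiKey: string;
  subject: string;
  model: string;
  openaiApiKey: string;
}

function loadConfig(env: Record<string, string | undefined>): MeteringConfig {
  const required = ['API_URL', 'API_KEY', 'SUBJECT', 'MODEL', 'OPENAI_API_KEY'];
  // Treat unset and empty values the same way.
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    apiUrl: env['API_URL'] as string,
    apiKey: env['API_KEY'] as string,
    subject: env['SUBJECT'] as string,
    model: env['MODEL'] as string,
    openaiApiKey: env['OPENAI_API_KEY'] as string,
  };
}
```

In the app you would call it once as `loadConfig(process.env)` after the .env file has been loaded.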

Walk through the codebase

MeteringCallbackHandler extends LangChain's BaseCallbackHandler and implements two of its lifecycle hooks. Callbacks suit this job for three reasons: they fire at the same place token counts are reported, they do not require subclassing the LLM client, and the LangChain runId gives you a stable event ID for free.

handleLLMStart

This hook fires immediately before the model is called. The handler captures run metadata so the LLM end hook can build a CloudEvent with the right customer attribution:

async handleLLMStart(
  _llm: Serialized,
  _prompts: string[],
  runId: string,
  parentRunId?: string,
  _extraParams?: Record<string, unknown>,
  _tags?: string[],
  metadata: Record<string, unknown> = {},
) {
  if (parentRunId) {
    const parentMetadata = this.runMetadata.get(parentRunId);
    if (parentMetadata) {
      Object.assign(metadata, parentMetadata);
    }
  }
  this.runMetadata.set(runId, metadata);
}

The parent run check matters. LLM calls almost always run inside a chain, agent, or tool-calling flow, which LangChain models as a parent run. When you set metadata at chain.invoke({}, { metadata: { subject: 'acme' } }), LangChain attaches it to the chain run, not the child LLM run. Without merging parent metadata into the child, the LLM end hook reads an empty metadata object and the subject is lost.

handleLLMEnd

This hook fires after the model returns. The handler reads token counts from output.llmOutput.tokenUsage (the field OpenAI fills on non-streaming completions), builds two CloudEvents, and posts each to the Kong ingestion endpoint:

async handleLLMEnd(output: LLMResult, runId: string) {
  const { promptTokens = 0, completionTokens = 0 } =
    output.llmOutput?.['tokenUsage'] ??
    output.llmOutput?.['estimatedTokenUsage'] ??
    {};

  if (!(promptTokens > 0 || completionTokens > 0)) return;

  const metadata = this.runMetadata.get(runId) ?? {};
  const { subject, ls_model_name, ls_provider, ls_model_type, ...data } = metadata;

  const inputEvent = {
    specversion: '1.0',
    id: `${runId}-input`,
    source: 'langchain',
    type: 'kong.llm_request',
    subject,
    data: { ...data, type: 'input', tokens: promptTokens, model: ls_model_name, provider: ls_provider, model_type: ls_model_type },
  };

  const outputEvent = {
    specversion: '1.0',
    id: `${runId}-output`,
    source: 'langchain',
    type: 'kong.llm_request',
    subject,
    data: { ...data, type: 'output', tokens: completionTokens, model: ls_model_name, provider: ls_provider, model_type: ls_model_type },
  };

  await this.ingest(inputEvent);
  await this.ingest(outputEvent);
}

A few decisions in this block matter for production:

The id field combines the LangChain runId with -input or -output. Kong deduplicates events by id plus source, so retries do not double-bill.
data.type separates input from output tokens at the event level. That separation is what makes per-token-class pricing possible without running two meters.
Anything you pass in metadata at chain.invoke time spreads into data. Tenant tier, region, feature flag: add it once at invoke time and filter on it in the meter. No handler changes.
ingest is a plain fetch POST with a Bearer token header. No SDK, no batching layer.
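For reference, a plain fetch POST of that kind could look like the sketch below. The `CloudEvent` interface and the `buildIngestRequest` helper are hypothetical names, and the `application/cloudevents+json` content type is an assumption based on the CloudEvents JSON format; check handler.ts in the repo for the exact headers it sends.

```typescript
// Sketch of the request an ingest() call might send. CloudEvent and
// buildIngestRequest are hypothetical names; the content type is an assumption.
interface CloudEvent {
  specversion: '1.0';
  id: string;
  source: string;
  type: string;
  subject?: string;
  data: Record<string, unknown>;
}

function buildIngestRequest(apiKey: string, event: CloudEvent) {
  return {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/cloudevents+json',
    },
    body: JSON.stringify(event),
  };
}

async function ingest(apiUrl: string, apiKey: string, event: CloudEvent): Promise<void> {
  const response = await fetch(apiUrl, buildIngestRequest(apiKey, event));
  // Surface HTTP errors instead of silently dropping a billable event.
  if (!response.ok) {
    throw new Error(`Ingest failed: ${response.status} ${response.statusText}`);
  }
}
```

Keeping the request construction separate from the network call makes the headers and body easy to unit test without hitting the endpoint.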

Read the agent entry point

index.ts wires ChatOpenAI up with the metering handler and runs a small one-shot chain:

const handler = new MeteringCallbackHandler(apiUrl, apiKey);

const llm = new ChatOpenAI({
  model,
  apiKey: openaiApiKey,
  callbacks: [handler],
});

const chain = PromptTemplate.fromTemplate('{input}')
  .pipe(llm)
  .pipe(new StringOutputParser());

const result = await chain.invoke(
  { input: userInput },
  {
    metadata: {
      subject,
      kong: 'strong',
    },
  },
);

Two lines do the integration. callbacks: [handler] on the ChatOpenAI instance attaches the handler to every call made through it. The metadata block on chain.invoke carries the customer identifier into the run metadata that handleLLMStart reads. The kong: 'strong' field is just a metadata pass-through demonstration: anything you add in that block lands inside data on the CloudEvent.

Run the AI Agent app

Start the app:

pnpm start

Type a prompt:

You: Explain how token-based usage billing works for LLM applications.

The handler logs both events as it sends them:

MeteringCallbackHandler: ingesting event {
  specversion: '1.0',
  id: '019dd41b-f14e-705b-a4dd-894bd025c73d-input',
  source: 'langchain',
  type: 'kong.llm_request',
  subject: 'acme',
  data: { kong: 'strong', type: 'input', tokens: 18, model: 'gpt-4o-mini', provider: 'openai', model_type: 'chat' }
}

AI: Token-based usage billing charges customers based on the number of tokens consumed...

MeteringCallbackHandler: ingesting event {
  specversion: '1.0',
  id: '019dd41b-f14e-705b-a4dd-894bd025c73d-output',
  source: 'langchain',
  type: 'kong.llm_request',
  subject: 'acme',
  data: { kong: 'strong', type: 'output', tokens: 156, model: 'gpt-4o-mini', provider: 'openai', model_type: 'chat' }
}

Both events are in Kong. They will not appear in a customer's usage view or invoice until Part 2 is set up.

Part 2: Connect to Kong Metering & Billing

The next sections build the meter, features, plan, and subscription that turn the raw event stream into priced, per-customer usage. The flow follows the Konnect M&B concepts model: events feed meters, meters feed features, features attach to plans through rate cards, customers subscribe to plans.

Open cloud.konghq.com and confirm you're in the region matching API_URL.

Create the LLM Tokens meter

A meter is a continuously-running query over the event stream. It picks events that match a filter, applies an aggregation, and exposes the result as a numeric usage value.

In the Konnect console:

Left navigation: Metering & Billing → Metering
Top right: Create Meter
Choose template: LLM Tokens

The LLM Tokens template fills in the right defaults for this handler:

Event type filter: kong.llm_request (matches the type field on every CloudEvent the handler emits)
Aggregation: Sum
Value property: tokens (reads data.tokens)

Click Save.

CLI alternative

The same meter can be created through the Konnect API:

curl -X POST https://us.api.konghq.tech/v3/openmeter/meters \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "LLM Tokens",
    "key": "llm-tokens",
    "description": "LLM token usage",
    "event_type": "kong.llm_request",
    "aggregation": "SUM",
    "value_property": "$.tokens",
    "dimensions": { "type": "$.type", "provider": "$.provider", "model": "$.model" }
  }'

Create Features

Features turn a single meter into multiple billable units. You need two: one exposing only input-token events, one exposing only output-token events. The split is what makes asymmetric pricing possible (most providers charge more for output than input).

Left navigation: Product Catalog → Features.

Input token feature

Click Create Feature:

Name: Input Token
Key: auto-fills from the name (input_token)
Meter: LLM Tokens (from the dropdown)
Meter Group Filters: add a single filter
- Field: type
- Operator: equals
- Value: input

Save.

Output token feature

Same form, output values:

Name: Output Token
Key: auto-fills from the name (output_token)
Meter: LLM Tokens
Meter Group Filter: type equals output

Save.

The same meter now feeds two features, each filtered to a different event subset.

Create a Plan with usage-based Rate Cards

A plan is what a customer subscribes to. Inside it, rate cards attach prices to features.

Product Catalog → Plans → New Plan:

Name: Pro
Click Save

Inside the new plan, add two rate cards.

Input token rate card

Click Add Rate Card and select the input token feature:

Pricing model: Usage-based
Price per unit: 1

Two notes about this field that bite people on the first run.

First, price per unit is the price for a single token. Not per thousand, not per million. There is no toggle that switches the unit. Production rates are decimals like 0.000003. The example uses 1 here so the dollar values on the test invoice are large and obvious.

Second, the pricing model selector decides whether the feature is metered or flat. Choosing flat-fee here would charge a fixed amount per cycle regardless of usage, which is the opposite of what you want for a metered feature.
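Since providers usually quote prices per million tokens, the conversion to a per-token decimal is worth scripting once rather than counting zeros by hand. A small sketch follows; `perTokenPrice` and `formatRate` are hypothetical helpers, and the $3.00 per million figure is just an example value.

```typescript
// Converts a "dollars per million tokens" list price into the per-token
// decimal the rate card expects. Both helpers are hypothetical.
function perTokenPrice(dollarsPerMillion: number): number {
  return dollarsPerMillion / 1_000_000;
}

// Formats the rate as a plain decimal string. toFixed avoids scientific
// notation such as 3e-6, and the regex trims trailing zeros.
function formatRate(rate: number): string {
  return rate.toFixed(10).replace(/\.?0+$/, '');
}

console.log(formatRate(perTokenPrice(3))); // prints "0.000003"
```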

Output token rate card

Click Add Rate Card again, select output token:

Pricing model: Usage-based
Price per unit: 2

Save. Output tokens now cost twice the input rate. The real ratio varies by provider and model, but the direction mirrors how OpenAI and most other providers price the underlying API: output costs more than input.

Create the customer

The customer record needs to be created manually. The subject field on every CloudEvent ties a token usage event to a specific customer through the customer's key, so the key has to match the SUBJECT value in your .env (acme in this tutorial).

Left navigation: Metering & Billing → Billing → Customers. Top right: Create new.

Fill in the form:

Name: acme (display name shown in the portal)
Key: acme (must match the SUBJECT env value)

Click Save.

The customer is now in the system but does not have any plan attached yet. Token events tagged with subject: acme will associate to this record once a subscription is in place.

Add a subscription

A subscription connects this customer to the Pro plan you built earlier. Without it, events still flow into the meter but never produce invoice line items.

Open the acme customer page and switch to the Subscriptions tab. Click Create subscription.

Step 1 of the wizard: pick the plan.

Subscription plan: Pro (the plan with input-token and output-token rate cards)

Click Next.

Step 2: timing and billing cycle. Defaults are fine for testing.

Start subscription: Immediately
Bill: Monthly
Starting: Start of subscription

Click Next, then Start subscription on the confirmation step.

The subscription is now active. The next call from pnpm start lands inside an active billing window and rolls into an invoice.

Track usage and invoices

Run the agent a few times with prompts long enough that the response is more than a handful of tokens. Otherwise the input and output counts can look almost identical.

pnpm start

Back in Konnect, open the acme customer page from the Billing section and switch to the Invoicing tab.

The view shows the active plan, both rate cards, accumulated usage per feature, and the running invoice total. With test rates of $1 per input token and $2 per output token, even four prompts produce a dollar value that is easy to verify against the handler's logged token counts. Switch to production decimals like 0.0000015 and 0.000006 and the same view continues to work, just with smaller numbers.

Connect a payment provider

The metering and billing layer ends at invoice generation. Actually charging the customer needs a payment provider.

Konnect connects to providers like Stripe to:

Sync customer payment methods between Konnect and the provider
Charge invoices automatically when the billing cycle closes
Handle dunning, retries, and failed payments

The metering pipeline doesn't change when payment providers change. Kong owns usage aggregation and invoice generation. The provider only handles collection. That separation makes it possible to support multiple providers, switch between them, or test with one provider in staging and another in production without touching any code.

Gotchas

Input and output token counts that look identical. Short prompts can produce the same input and output token count by coincidence. The input count includes chat message formatting overhead (role markers, message delimiters) added by OpenAI before the prompt reaches the model, so a two-word prompt is rarely two tokens. Use a longer prompt to see the counts diverge clearly.

Events appear in the meter but not in invoices. The subscription started after the events were ingested. Kong only invoices events that fall inside an active subscription window. Run the app again after creating the subscription.

subject missing warning in the logs. The handler logs could not find 'subject' in run metadata when the metadata block doesn't include a subject. Check that .env exists (not just .env.example), that SUBJECT is set, and that the metadata block in index.ts reads subject from the env variable.

EU vs US endpoint. The default API_URL is the US endpoint. EU Konnect organizations need https://eu.api.konghq.tech/v3/openmeter/events. Using the wrong region produces silent ingestion failures. Confirm the region from Konnect organization settings.

Event deduplication. Kong deduplicates by id plus source. Replaying the same event twice produces one record, not two. The handler builds id from the LangChain runId, so this is rarely an issue in normal use, but worth knowing if events are being replayed or generated outside this handler.
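The effect of deduplicating on id plus source can be modeled in a few lines. This is an illustration of the behavior described above, not Kong's code; `Deduplicator` and `EventKey` are hypothetical names.

```typescript
// Illustrative model of deduplication on id + source; not Kong's code.
interface EventKey {
  id: string;
  source: string;
}

class Deduplicator {
  private seen = new Set<string>();

  // Returns true when the event is new, false when it is a replay.
  accept(event: EventKey): boolean {
    // JSON-encode the pair so the two fields cannot run into each other.
    const key = JSON.stringify([event.source, event.id]);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

const dedup = new Deduplicator();
const first = dedup.accept({ id: 'run-1-input', source: 'langchain' });
const replay = dedup.accept({ id: 'run-1-input', source: 'langchain' });
const otherSource = dedup.accept({ id: 'run-1-input', source: 'batch-import' });
console.log(first, replay, otherSource); // prints "true false true"
```

The last case is the one to watch: the same id arriving from a different source counts as a separate event, so anything generated outside the handler needs its own stable ids.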

Production checklist

The reference app demonstrates the mechanics. A production setup needs a few real changes.

subject from auth, not env. Replace SUBJECT=acme with a value pulled from the authenticated user session. Each chain invocation passes the real customer ID into the metadata block.
Per-model pricing. Add model to the meter group filters on each feature and run different rate cards per model. GPT-4o, GPT-4o-mini, Claude, and others can all be priced independently while sharing one meter.
Custom segmentation. Any field added to the metadata block lands in data on the CloudEvent. Add tenant tier, region, or provider and filter or group on them in the meter to bill differently per segment.
Usage alerts. Once events flow, configure usage thresholds in Kong to notify customers, throttle them, or pause subscriptions when they hit a limit.
Idempotent retries. The handler doesn't retry failed ingest() calls. Wrap fetch with a small retry layer (exponential backoff, max attempts) to handle transient network errors without losing billable events. Kong's deduplication on id + source makes safe retries straightforward.
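The retry layer from the last item can be a small generic wrapper. The sketch below is one way to write it, assuming you want exponential backoff with a fixed attempt cap; `withRetry` and its parameters are hypothetical and not part of the reference app.

```typescript
// Hypothetical retry wrapper with exponential backoff. The sleep function is
// injectable so tests do not have to wait for real timers.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 200,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait 200ms, 400ms, 800ms, ... but not after the final attempt.
      if (attempt < maxAttempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}

// Usage inside the handler might look like:
//   await withRetry(() => this.ingest(inputEvent));
```

Because each event keeps the same id across attempts, a retry that succeeds after an ambiguous failure does not double-bill. A production version would probably skip retries on non-transient errors such as a 401.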

The full reference AI Agent app is at https://github.com/tejakummarikuntla/llm-metering-langchian-kong. Clone, configure, and the metering pipeline runs locally in a few minutes. Adding it to an existing LangChain agent is a single line: callbacks: [handler] on the LLM client. Everything else is Kong configuration.

What's the trickiest part of metering an AI agent in production for you? Streaming responses, multi-model pricing, or per-tenant segmentation? Drop a comment.

💰 I Built a Token Billing System for My AI Agent - Here's How It Works

Teja Kummarikuntla — Tue, 31 Mar 2026 15:39:56 +0000

I've been building an AI agent that routes requests across multiple LLM providers (OpenAI, Anthropic, etc.) based on the task. But pretty quickly, I hit a real problem: how do you charge for this fairly?

Flat subscriptions didn't make sense. Token costs vary by model, input vs output, and actual usage. A user generating a two-line summary isn't the same as someone churning out 3,000-word articles, yet flat pricing treats them the same.

I looked at a few options for usage-based billing. Stripe Billing has metered subscriptions, but you have to build your own token tracking pipeline on top. Orb and Metronome are good, but they're separate vendors, so you'd still need something to capture token data from your LLM calls and pipe it in. What I wanted was something at the gateway level, where the traffic already flows.

I ended up using Kong AI Gateway with Konnect Metering & Billing (built on OpenMeter). The gateway proxies every LLM request, so it already knows the token counts. The metering layer plugs directly into that. No separate vendor, no custom pipeline.

So instead of debating pricing models, I set up the billing layer, a working system where every API request flows through a gateway, gets tracked, and is priced based on real usage:

🚧 Route requests through AI Gateway
🪙 Tokens get metered per consumer
💵 Pricing gets applied
🧾 Invoice generated

Here's the whole setup, step by step.

Set up the gateway
Step 1: Create a consumer
Step 2: Configure the AI Proxy
Step 3: Enable token metering
Step 4: Create a feature
Step 5: Create a plan with a rate card
Step 6: Create a subscription
Step 7: Validate the invoice
Step 8: Connect Stripe

The Setup

The billing pipeline has three layers:

Kong AI Gateway proxies the LLM requests. It sits between the app and the provider, handles auth, and (this is the part that matters for billing) logs token statistics for every request.

Konnect Metering & Billing (this is built on OpenMeter) takes those token events and aggregates them per consumer, per billing cycle. It supports defining features, pricing models, and plans on top of the raw usage data.

Stripe collects payment. The metering layer generates invoices that sync to Stripe.

Let me walk through each piece.

Prerequisites

You can do this entirely through the UI or via CLI. I'll cover both as we go.

A Kong Konnect account
An OpenAI API key (or any LLM provider key of your choice)

For CLI, you'll also need decK (v1.43+) installed and a PAT from Kong Konnect.

Set Up the Gateway

Once you log in, click on API Gateway and create one.

I'm using Serverless here. You can choose Self-managed too. Enter the gateway name as ai-service and click Create and configure. Once that's done, click Add a service and route and fill in:

Service Name: ai-service
Service URL: http://httpbin.konghq.com/anything
Route Name: ai-chat
Route Path: /chat

CLI

If you prefer the command line, generate your PAT and run:

export KONNECT_TOKEN='your_konnect_pat'
curl -Ls https://get.konghq.com/quickstart | bash -s -- \
  -k $KONNECT_TOKEN --deck-output

This gives you a running Kong Gateway connected to Konnect. It'll output some environment variables; export them as instructed. You'll also need:

export DECK_OPENAI_API_KEY='your_openai_api_key'

Then set up the service and route:

_format_version: "3.0"
services:
  - name: ai-service
    url: http://httpbin.konghq.com/anything
routes:
  - name: ai-chat
    paths:
      - "/chat"
    service:
      name: ai-service

Apply it with deck gateway apply. Now you have a route at /chat that we'll wire up to an LLM.

Step 1: Create a Consumer

You can't bill anyone if the gateway doesn't know who is making the request. Consumers are how Kong identifies API callers. Later, we'll map each consumer to a billing customer.

Add a consumer with a key-auth credential:

You can enter the Key value as acme-secret-key.

Now, you need to add the key-auth plugin to the service so the gateway actually requires authentication:

Click on Plugins in the left sidebar
Click on New Plugin
Select Key Authentication from the plugin list
Select Service as the scope or keep it as Global
Click Save

CLI

_format_version: "3.0"
consumers:
  - username: acme-corp
    keyauth_credentials:
      - key: acme-secret-key

Then enable the key-auth plugin on the service so the gateway actually requires authentication:

_format_version: "3.0"
plugins:
  - name: key-auth
    service: ai-service
    config:
      key_names:
        - apikey

Apply both with deck gateway apply.

Now every request to /chat must include an apikey header. The gateway identifies the caller as acme-corp, and that identity flows through to metering. Without this step, usage events have no subject. They're anonymous, and you can't attribute them to anyone.

Step 2: Configure the AI Proxy

Next, wire the route to an actual LLM. The AI Proxy plugin accepts requests in OpenAI's chat format and forwards them to the configured provider.

Navigate to Plugins
Click on New Plugin
Select AI Proxy from the plugin list

Follow the YAML below for the CLI, or configure the same plugin fields in the UI:

_format_version: "3.0"
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
      model:
        provider: openai
        name: gpt-4o
      logging:
        log_payloads: true
        log_statistics: true

Two things to note here:

log_statistics: true is what makes billing possible. Without it, the gateway proxies requests but doesn't record token counts. When enabled, it captures prompt tokens, completion tokens, and total tokens on every response. This is the data that metering consumes downstream.

log_payloads: true logs the actual request/response content. This is optional and useful for debugging, but you'd probably turn it off in production for privacy reasons.

Apply with deck gateway apply and test:

curl -X POST "$KONNECT_PROXY_URL/chat" \
  -H "Content-Type: application/json" \
  -H "apikey: acme-secret-key" \
  --json '{
    "messages": [
      {"role": "system", "content": "You are a mathematician."},
      {"role": "user", "content": "What is 1+1?"}
    ]
  }'

You should get a response from GPT-4o. The gateway handled auth, forwarded the request, and logged the token statistics.

If you want to proxy multiple providers (say, OpenAI and Anthropic with automatic failover), you'd use ai-proxy-advanced (https://developer.konghq.com/plugins/ai-proxy-advanced/) instead with a load balancing config. I stuck with a single provider here to keep the billing walkthrough focused.

Step 3: Enable Token Metering

Now we connect the gateway's token logs to the metering system.

In Konnect, go to Metering & Billing in the sidebar. You'll see an AI Gateway Tokens section. Click Enable Related API Gateways, select your control plane (the quickstart one), and confirm.

This activates a built-in meter called kong_konnect_llm_tokens. It uses SUM aggregation on the token count, grouped by:

$.model : which LLM handled the request
$.type : whether the tokens are input (request) or output (response)

The grouping matters because LLM providers charge differently for input vs. output tokens. Output tokens are typically 3-5x more expensive because input can be parallelized across GPUs while output generation is sequential: each token depends on all previous tokens. If your metering doesn't split these, your pricing will be wrong.

At this point, every authenticated request through the AI Gateway generates a usage event that gets aggregated by the meter. But usage alone doesn't generate invoices. You need to define what's billable and how it's priced.

Step 4: Create a Feature

A feature is the link between raw metered data and something that appears on an invoice. Without it, usage is tracked but never billed.

Go to Metering & Billing → Product Catalog → Features and create one:

Name: ai-token
Meter: AI Gateway Tokens
Group by filters:
- Provider = openai
- Type = request (this tracks input tokens; you'd create a separate feature for output tokens if you want to price them differently)

The filters narrow the meter to a specific slice of usage. In a real setup, you'd likely create multiple features, one per model, one per token direction, to apply different rates. For this walkthrough, I'm keeping it to one feature to show the flow.

Step 5: Create a Plan with a Rate Card

Plans bundle features with pricing. Go to Product Catalog → Plans and create one:

Name: Starter
Billing cadence: 1 month

Add a rate card:

Feature: ai-token
Pricing model: Usage Based
Price per unit: 1
Entitlement type: Boolean (grants access to the feature)

A note on what "price per unit" means here: 1 unit = 1 token, because the meter SUMs individual tokens. So entering 1 means $1.00 per token, which is way too expensive for real use. I'm using it here because the official tutorial does the same thing: a round number that makes invoice changes easy to spot during testing.

For production, you'd enter something like 0.0000025 for GPT-4o input tokens ($2.50 per 1M tokens) or 0.00001 for GPT-4o output tokens ($10.00 per 1M tokens). There's no "per 1,000" toggle in the UI. You do the math yourself and enter the per-token price as a decimal.

Publish the plan. It's now available for subscriptions.

Step 6: Create a Customer and Start a Subscription

This is where the consumer from Step 1 connects to the billing system.

Go to Metering & Billing → Billing → Customers and create one:

Name: Acme Corp
Include usage from: select the acme-corp consumer

This mapping is what ties gateway traffic to a billable entity. The consumer handles identity at the gateway level; the customer handles identity at the billing level. They're separate concepts joined here.

Now create a subscription:

Go to the Acme Corp customer, then Subscriptions → Create a Subscription
Plan: Starter
Start the subscription

One important detail: metering only invoices events that occur after the subscription starts. If you sent test requests before creating the subscription, those tokens won't appear on any invoice. I spent some time confused by this before finding it in the docs.
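The rule is easy to picture as a time-window check. Here's a minimal sketch of how I think about it; `Subscription` and `isInvoiceable` are hypothetical names of mine, not anything from the Konnect API, and the real system may handle edge cases differently.

```typescript
// Illustrative model of "only events inside the subscription window are
// invoiced". Subscription and isInvoiceable are hypothetical names.
interface Subscription {
  activeFrom: Date;
  activeTo?: Date; // undefined while the subscription is still running
}

function isInvoiceable(eventTime: Date, subscription: Subscription): boolean {
  const t = eventTime.getTime();
  if (t < subscription.activeFrom.getTime()) return false;
  return subscription.activeTo === undefined || t < subscription.activeTo.getTime();
}

const subscription: Subscription = { activeFrom: new Date('2026-03-31T12:00:00Z') };

// A test request sent before the subscription started is metered but not billed.
const earlyRequest = isInvoiceable(new Date('2026-03-31T11:45:00Z'), subscription);
// A request sent afterwards lands on the invoice.
const laterRequest = isInvoiceable(new Date('2026-03-31T12:05:00Z'), subscription);
console.log(earlyRequest, laterRequest); // prints "false true"
```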

Step 7: Validate the Invoice

Send a few requests through the gateway:

for i in {1..6}; do
  curl -s -X POST "$KONNECT_PROXY_URL/chat" \
    -H "Content-Type: application/json" \
    -H "apikey: acme-secret-key" \
    --json '{
      "messages": [
        {"role": "user", "content": "Explain what a Fourier transform does in two sentences."}
      ]
    }'
  echo ""
done

Wait a minute or two for the events to propagate, then go to Metering & Billing → Billing → Invoices. Click on Acme Corp, go to the Invoicing tab, and hit Preview Invoice.

You should see the ai-token feature listed with the aggregated token count and the calculated charge based on your rate card. That's the billing pipeline working end to end, from an API request to a line item on an invoice.

Connecting Stripe

Konnect syncs invoices to Stripe, which handles payment collection, receipts, and retry logic for failed payments. You connect your Stripe account in the Metering & Billing settings, and invoices flow through automatically at the end of each billing cycle.

The result for end users is a transparent invoice showing exactly what they consumed: token count, model, rate applied. Not a flat fee with no breakdown.

Things I Ran Into

The consumer-customer mapping confused me at first. Kong Gateway has "consumers" (API identity). Metering & Billing has "customers" (billing identity). They're separate. You create both, then link them. If you skip the consumer or forget to link it, usage events come in but they're not attributed to anyone billable. Set this up before you start sending traffic.

Input vs. output pricing is a bigger deal than I expected. Output tokens from OpenAI's GPT-4o cost $10.00/1M vs. $2.50/1M for input. If you use a single flat rate for "tokens," you'll underprice output-heavy workloads significantly. Splitting features by token type (request vs. response) and pricing them separately is worth the extra configuration.

The order of operations matters. Specifically: create the consumer and link it to a customer before you start sending traffic you care about billing for. Events that arrive before a subscription exists don't retroactively appear on invoices.
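To put a number on the input vs. output point, here's a quick sketch comparing split pricing against a single flat rate. The GPT-4o prices are the ones quoted above; the workload (200K input tokens, 800K output tokens) and the choice to set the flat rate at the input price are assumptions I made up for the example.

```typescript
// Compares split input/output pricing with one flat per-token rate.
// The workload and the flat-rate choice are made-up example values.
const PER_MILLION = 1_000_000;

interface Workload {
  inputTokens: number;
  outputTokens: number;
}

function splitCost(w: Workload, inputPerM: number, outputPerM: number): number {
  return (w.inputTokens / PER_MILLION) * inputPerM + (w.outputTokens / PER_MILLION) * outputPerM;
}

function flatCost(w: Workload, ratePerM: number): number {
  return ((w.inputTokens + w.outputTokens) / PER_MILLION) * ratePerM;
}

// An output-heavy workload, e.g. long article generation.
const workload: Workload = { inputTokens: 200_000, outputTokens: 800_000 };

const split = splitCost(workload, 2.5, 10); // GPT-4o: $2.50 in, $10.00 out per 1M
const flat = flatCost(workload, 2.5); // flat rate pegged to the input price

console.log(split.toFixed(2), flat.toFixed(2)); // prints "8.50 2.50"
```

Same million tokens, but the split price is $8.50 against $2.50 for the flat rate, so the flat rate would recover less than a third of what the output-heavy workload actually costs.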

Where I'd Take This Next

This walkthrough uses a single provider and a single feature. A production setup would look more like:

Multiple features: one per model per token direction (GPT-4o input, GPT-4o output, Claude input, Claude output)
Tiered pricing: lower per-token rates at higher usage thresholds to incentivize growth
Entitlements with metered limits: cap total tokens per month per plan tier, so you can offer Starter (500K tokens), Pro (5M tokens), Enterprise (unlimited)
AI Proxy Advanced: route across multiple providers with load balancing (lowest-latency, round-robin, or cost-based routing)
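As a rough idea of what tiered pricing computes, here's a sketch of a graduated scheme, where each tier's rate applies only to the tokens that fall inside that tier. The `Tier` shape, the `graduatedCharge` function, and the tier values are all hypothetical; I haven't checked how Konnect's own tiered pricing models are configured, so treat this as the general technique rather than its behavior.

```typescript
// Graduated tiered pricing: each tier's rate applies only to the tokens that
// fall inside that tier. Tier and graduatedCharge are hypothetical names.
interface Tier {
  upTo: number; // cumulative upper bound in tokens; Infinity for the last tier
  pricePerToken: number;
}

function graduatedCharge(tokens: number, tiers: Tier[]): number {
  let remaining = tokens;
  let lowerBound = 0;
  let charge = 0;
  for (const tier of tiers) {
    if (remaining <= 0) break;
    // Tokens billed in this tier: whatever is left, capped by the tier width.
    const billed = Math.min(remaining, tier.upTo - lowerBound);
    charge += billed * tier.pricePerToken;
    remaining -= billed;
    lowerBound = tier.upTo;
  }
  return charge;
}

// Round test numbers, in the same spirit as the $1-per-token rate card:
// the first 100 tokens cost $2 each, everything after costs $1 each.
const tiers: Tier[] = [
  { upTo: 100, pricePerToken: 2 },
  { upTo: Infinity, pricePerToken: 1 },
];

console.log(graduatedCharge(150, tiers)); // prints 250 (100 * 2 + 50 * 1)
```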

The docs for all of these are at developer.konghq.com/metering-and-billing and developer.konghq.com/ai-gateway.

If you're building an AI agent and thinking about how to charge for it, I'd be curious to hear your approach. Per-token, credits, flat rate? What's working, what's not? Drop your thoughts in the comments.
