Teja Kummarikuntla

for Kong

Posted on Jun 2

How to Set Up Per-Agent Billing for CrewAI Agents with Kong

#ai #programming #webdev #crewai

Setting up billing for a single AI agent is easy. The agent uses tokens, you multiply by a price, you send an invoice. Setting up billing for a CrewAI crew is more challenging. A crew has multiple agents working together. Each agent uses tokens differently. Roll them all into one number and you can't tell which agent drove the cost.

In this tutorial, we will build per-agent token billing for a CrewAI multi-agent app. We will track token usage per agent role in CrewAI, send the usage to Kong Konnect Metering & Billing (the managed version of OpenMeter), and turn one crew run into three invoice line items, one per agent.

Here is why this matters. In my CrewAI research crew, the Writer agent uses about twice as many tokens as the Researcher agent. A flat per-token price overcharges Researcher-heavy runs and undercharges Writer-heavy runs. Per-agent billing fixes that. Each agent gets its own meter slice, its own filter, and its own price.

This is a common need for any multi-agent SaaS product, any team trying to monetize CrewAI agents, and any team setting up usage-based billing for AI agents. The same pattern works for LangChain agents, AutoGen crews, or any multi-agent framework that exposes per-call token usage.

Here's how the billing looks for each agent of your CrewAI crew in Kong Metering & Billing. You will be able to achieve this by the end of this tutorial.

The full app is about 200 lines of Python. Setup takes about 30 minutes end to end.

The full reference repo: github.com/tejakummarikuntla/Billing-CrewAI-with-KongMB. Clone it if you want to skim the working code first, or follow the steps below and build it file by file.

git clone https://github.com/tejakummarikuntla/Billing-CrewAI-with-KongMB.git
cd Billing-CrewAI-with-KongMB

Architecture

Every LLM call produces two events. One for the prompt (input) tokens, one for the completion (output) tokens. Both events carry the agent_role. Kong's meter groups token usage per agent. Each feature pulls one agent's slice out of the meter. The plan attaches a per-token price to each feature. The invoice ends up with three line items, one per agent.

What you'll build

This tutorial has two parts: a Python app that uses CrewAI, and a set of resources you configure in Kong Konnect Metering & Billing.

Part 1: The Python app (CrewAI)

A research crew with three agents. Researcher, Analyst, and Writer. Each agent has its own role, goal, and backstory. They run sequentially: Researcher gathers facts, Analyst picks the key insights, Writer turns the insights into a one-page briefing.
A billing listener that captures every LLM call. This is a small Python class called KongBillingListener. It subscribes to CrewAI's event bus. CrewAI fires a notification called LLMCallCompletedEvent every time an agent makes an LLM call. Our listener catches that event, reads the token count and the agent's role, and sends a usage event to Kong.
An entry-point script. Loads the API keys, builds the crew, runs it, and prints a per-agent token summary.

Part 2: The billing setup (Kong Konnect Metering & Billing)

A meter. A meter is a rule that tells Kong which incoming events to count. We create one meter that listens for crewai.llm_call events and sums the tokens.
Three features, one per agent role. A feature is a named "slice" of a meter, filtered by a dimension value. We create one feature for Researcher tokens, one for Analyst tokens, one for Writer tokens. Each feature filters the meter by agent_role.
A plan with three rate cards. A plan groups features and assigns prices. Our plan is called CrewAI Research Pro. It charges $0.0001 per Researcher token, $0.0002 per Analyst token, $0.0005 per Writer token.
A customer and an active subscription. The customer is acme. The subscription connects the customer to the plan. Usage and invoice values then show up in the Konnect portal.

Files in the repo

File	What it does
`crew.py`	Builds the three agents (Researcher, Analyst, Writer), defines their tasks, and wires them into a sequential `Crew`. The agent `role` strings are what end up tagged on every billing event.
`billing.py`	`KongBillingListener` subclasses CrewAI's `BaseEventListener`, subscribes to `LLMCallCompletedEvent`, and POSTs one CloudEvent per token bucket (input + output) to Kong M&B. Tracks per-agent totals in memory for the run summary.
`main.py`	Entry point. Loads `.env`, instantiates the listener, builds the crew, runs `kickoff()`, and prints the final briefing plus per-agent usage.
`setup_kong.py`	One-shot provisioner. Creates the meter, three filtered features, plan, customer, and active subscription via the Kong M&B API. Pass `--teardown` to clean up an earlier run before recreating.
`requirements.txt`	Three deps: `crewai`, `httpx`, `python-dotenv`. No LiteLLM, no LangChain.
`.env.example`	Template for the two secrets (`OPENAI_API_KEY`, `KONG_PAT`) and four config values.

Prerequisites

Python 3.10, 3.11, 3.12, or 3.13 (CrewAI requires Python below 3.14)
An OpenAI API key
A free Kong Konnect account: konghq.com
A Konnect Personal Access Token with Metering & Billing write permissions

Steps

🧑‍💻 Part 1: Build the Python app (CrewAI)

Set up the project
Define the research crew
Subscribe to LLMCallCompletedEvent
Run the crew and see per-agent tokens

🧾 Part 2: Set up billing in Kong Metering & Billing

Provision Kong with one script (or skip to the manual path)
Create the meter
Create one feature per agent role
Create a plan with three rate cards
Create the customer and subscribe
Run the crew again and check usage

Set up the project

Create a new folder and a Python virtual environment:

mkdir crewai-mb && cd crewai-mb
python3.12 -m venv .venv
source .venv/bin/activate

Three dependencies, each with a minimum version. No LangChain, no LiteLLM, nothing hidden under the hood.

# requirements.txt
crewai>=1.14.0,<2.0.0
httpx>=0.27.0
python-dotenv>=1.0.1

Install:

pip install -r requirements.txt

Create a .env.example next to your code. This is where the API keys and other config live:

# .env.example

# OpenAI API key (from https://platform.openai.com/api-keys)
OPENAI_API_KEY=sk-...

# OpenAI model used by every CrewAI agent in this demo
MODEL=gpt-4o-mini

# Kong Konnect Metering & Billing ingestion endpoint
# US:  https://us.api.konghq.com/v3/openmeter/events
# EU:  https://eu.api.konghq.com/v3/openmeter/events
# AU:  https://au.api.konghq.com/v3/openmeter/events
KONG_INGEST_URL=https://us.api.konghq.com/v3/openmeter/events

# Personal Access Token from Konnect with Metering & Billing write permissions
# Konnect UI -> profile menu -> Personal Access Tokens
KONG_PAT=kpat_...

# Customer identifier. Becomes the `subject` on every CloudEvent
# and the customer in Konnect M&B once events arrive.
CUSTOMER_ID=acme

# Source identifier, becomes the `source` on every CloudEvent.
# Helps you tell different apps apart in the events view.
EVENT_SOURCE=crewai-research-crew

The KONG_INGEST_URL is region-specific. US orgs use us.api.konghq.com, EU orgs use eu.api.konghq.com, AU orgs use au.api.konghq.com. Use the wrong region and events get silently rejected. Check your region in the Konnect organization settings.
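Since a wrong region fails quietly, it can help to sanity-check the URL at startup. The sketch below is not part of the repo. `check_ingest_url` and `KNOWN_REGIONS` are hypothetical names, and the region list is just the three hosts named above. It only catches typos and unknown hosts; it cannot tell you which region your org actually lives in.

```python
from urllib.parse import urlparse

# Regions named in this tutorial. Extend the set if your org uses another one.
KNOWN_REGIONS = {"us", "eu", "au"}


def check_ingest_url(url: str) -> str:
    """Return the region prefix of a Konnect ingest URL, or raise ValueError."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # "us.api.konghq.com" -> region "us", rest "api.konghq.com"
    region, _, rest = host.partition(".")
    if parsed.scheme != "https" or rest != "api.konghq.com":
        raise ValueError(f"unexpected ingest host: {url!r}")
    if region not in KNOWN_REGIONS:
        raise ValueError(f"unknown region {region!r} in {url!r}")
    if not parsed.path.endswith("/openmeter/events"):
        raise ValueError(f"unexpected ingest path: {parsed.path!r}")
    return region


print(check_ingest_url("https://us.api.konghq.com/v3/openmeter/events"))
```

Calling this right after `load_dotenv()` would turn a mistyped host into a loud error instead of a run with missing usage.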

Copy .env.example to .env and fill in real values. Add .env to a .gitignore so secrets never get committed:

# .gitignore
.venv/
__pycache__/
*.pyc
.env
.env.local
*.log

Define the research crew

Three agents, three tasks, run one after the other. The agent role is the most important field. The role string is what we attach to every billing event and what shows up on the invoice. Pick names you are happy seeing on a customer's bill.

# crew.py
"""Three-agent research crew: Researcher -> Analyst -> Writer."""

from __future__ import annotations

import os

from crewai import LLM, Agent, Crew, Process, Task


def _llm() -> LLM:
    return LLM(
        model=os.environ.get("MODEL", "gpt-4o-mini"),
        api_key=os.environ["OPENAI_API_KEY"],
        temperature=0.4,
    )


def build_crew(topic: str) -> Crew:
    llm = _llm()

    researcher = Agent(
        role="Researcher",
        goal=f"Gather concrete, factual material about: {topic}",
        backstory=(
            "You are an analyst who pulls together raw facts, names, dates, "
            "and numbers on a topic. You write in dense bullet lists and "
            "never speculate."
        ),
        llm=llm,
        allow_delegation=False,
        verbose=True,
    )

    analyst = Agent(
        role="Analyst",
        goal="Distill research notes into the three sharpest insights",
        backstory=(
            "You read research notes and pull out the three insights that "
            "matter most. You discard noise. You explain each insight in "
            "two sentences."
        ),
        llm=llm,
        allow_delegation=False,
        verbose=True,
    )

    writer = Agent(
        role="Writer",
        goal="Turn the analyst's insights into a polished one-page briefing",
        backstory=(
            "You write executive briefings. You open with a one-sentence "
            "summary, then expand each insight with concrete supporting "
            "evidence. You never use jargon."
        ),
        llm=llm,
        allow_delegation=False,
        verbose=True,
    )

    research_task = Task(
        description=(
            f"Collect a tight set of facts about: {topic}. "
            "Aim for 8 to 12 bullet points. Each bullet should be a single "
            "fact with a year or named source where possible."
        ),
        expected_output="A bullet list of facts.",
        agent=researcher,
    )

    analysis_task = Task(
        description=(
            "Read the research notes. Pick the three insights that "
            "matter most to a builder evaluating this space. For each "
            "insight, write two sentences."
        ),
        expected_output="Three numbered insights, two sentences each.",
        agent=analyst,
        context=[research_task],
    )

    writing_task = Task(
        description=(
            "Write a one-page briefing for a busy engineering leader. "
            "Open with a one-sentence summary. Then expand each of the "
            "three insights with supporting evidence drawn from the "
            "research notes. Plain language only."
        ),
        expected_output="A one-page briefing in markdown.",
        agent=writer,
        context=[research_task, analysis_task],
    )

    return Crew(
        agents=[researcher, analyst, writer],
        tasks=[research_task, analysis_task, writing_task],
        process=Process.sequential,
        verbose=True,
    )

Subscribe to LLMCallCompletedEvent

CrewAI has a built-in event bus. Every LLM call inside an agent fires an LLMCallCompletedEvent. The event carries the token count, the model name, and the agent's role. To hook into it, we subclass BaseEventListener and register a handler.

# billing.py
"""Per-agent token billing listener for CrewAI.

Subscribes to LLMCallCompletedEvent and ships one CloudEvent per token bucket
(input + output) to Kong Konnect Metering & Billing.
"""

from __future__ import annotations

import logging
import os
import uuid
from datetime import datetime, timezone
from typing import Any

import httpx
from crewai.events import BaseEventListener, LLMCallCompletedEvent

logger = logging.getLogger(__name__)

EVENT_TYPE = "crewai.llm_call"
CLOUDEVENTS_SPEC_VERSION = "1.0"


class KongBillingListener(BaseEventListener):
    """Forwards CrewAI LLM token usage to Kong M&B as CloudEvents.

    One LLM call produces two events: one for prompt (input) tokens and
    one for completion (output) tokens. Both carry the agent_role so the
    meter can group spend per agent in the crew.
    """

    def __init__(
        self,
        ingest_url: str,
        api_key: str,
        subject: str,
        source: str = "crewai-research-crew",
        timeout: float = 5.0,
    ) -> None:
        self.ingest_url = ingest_url
        self.subject = subject
        self.source = source
        self._client = httpx.Client(
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/cloudevents+json",
            },
            timeout=timeout,
        )
        self.events_sent = 0
        self.tokens_by_agent: dict[str, dict[str, int]] = {}
        super().__init__()

    def setup_listeners(self, crewai_event_bus: Any) -> None:
        @crewai_event_bus.on(LLMCallCompletedEvent)
        def handle_llm_call(_source: Any, event: LLMCallCompletedEvent) -> None:
            self._record(event)

    def _record(self, event: LLMCallCompletedEvent) -> None:
        usage = event.usage or {}
        agent_role = getattr(event, "agent_role", None) or "unknown"
        model = event.model or "unknown"
        call_id = event.call_id

        prompt_tokens = int(usage.get("prompt_tokens", 0) or 0)
        completion_tokens = int(usage.get("completion_tokens", 0) or 0)

        bucket = self.tokens_by_agent.setdefault(
            agent_role, {"input": 0, "output": 0}
        )

        if prompt_tokens:
            self._emit(call_id, agent_role, model, "input", prompt_tokens)
            bucket["input"] += prompt_tokens

        if completion_tokens:
            self._emit(call_id, agent_role, model, "output", completion_tokens)
            bucket["output"] += completion_tokens

    def _emit(
        self,
        call_id: str,
        agent_role: str,
        model: str,
        token_type: str,
        tokens: int,
    ) -> None:
        payload = {
            "specversion": CLOUDEVENTS_SPEC_VERSION,
            "id": f"{call_id}-{token_type}-{uuid.uuid4().hex[:8]}",
            "source": self.source,
            "type": EVENT_TYPE,
            "subject": self.subject,
            "time": datetime.now(timezone.utc).isoformat(),
            "datacontenttype": "application/json",
            "data": {
                "tokens": tokens,
                "type": token_type,
                "agent_role": agent_role,
                "model": model,
                "call_id": call_id,
            },
        }

        try:
            response = self._client.post(self.ingest_url, json=payload)
            response.raise_for_status()
            self.events_sent += 1
        except httpx.HTTPError as exc:
            logger.warning(
                "Kong M&B ingest failed for %s (%s tokens): %s",
                agent_role,
                tokens,
                exc,
            )

    def close(self) -> None:
        self._client.close()

    def summary(self) -> str:
        lines = ["Per-agent token usage:"]
        for role, counts in self.tokens_by_agent.items():
            total = counts["input"] + counts["output"]
            lines.append(
                f"  {role:25s}  input={counts['input']:6d}  "
                f"output={counts['output']:6d}  total={total:6d}"
            )
        lines.append(f"Events sent to Kong M&B: {self.events_sent}")
        return "\n".join(lines)


def from_env() -> KongBillingListener:
    ingest_url = os.environ["KONG_INGEST_URL"]
    api_key = os.environ["KONG_PAT"]
    subject = os.environ.get("CUSTOMER_ID", "acme")
    source = os.environ.get("EVENT_SOURCE", "crewai-research-crew")
    return KongBillingListener(
        ingest_url=ingest_url,
        api_key=api_key,
        subject=subject,
        source=source,
    )

Three things worth pointing out in this code:

Two events per LLM call, not one. The listener sends one event for input tokens and one for output tokens. Splitting them now lets us bill them at different rates later.

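Unique event IDs for safe retries. Each event ID is built from CrewAI's call_id, the token type (input or output), and a short random string. Kong deduplicates events by id plus source, so a retry is only deduplicated if it resends the same payload with the same id. Because the random suffix is generated inside _emit, a retry has to reuse the original payload rather than call _emit again. The call_id and token type keep the input/output split intact either way.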

Errors are logged, not raised. If Kong is briefly down, the crew run keeps going. A dropped event is better than a crashed customer run. In production, add a retry queue for the dropped events.

The from_env() helper at the bottom is what main.py uses to build the listener from .env values.
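For the retry queue mentioned above, one option is to hold on to the failed payload and resend it unchanged, so the id Kong sees on the retry is the id from the first attempt. This is a minimal sketch, not code from the repo. `RetryQueue` is a hypothetical class, and `send` stands in for any callable that raises when delivery fails.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict

Payload = Dict[str, Any]


class RetryQueue:
    """Holds payloads that failed to send and replays them unchanged."""

    def __init__(self, send: Callable[[Payload], None], max_size: int = 1000) -> None:
        self._send = send
        # A deque with maxlen drops the oldest entry once the queue is full.
        self._pending: Deque[Payload] = deque(maxlen=max_size)

    def submit(self, payload: Payload) -> bool:
        """Try to send now. On failure, queue the same payload for later."""
        try:
            self._send(payload)
            return True
        except Exception:
            self._pending.append(payload)
            return False

    def flush(self) -> int:
        """Resend queued payloads in order. Returns how many succeeded."""
        sent = 0
        for _ in range(len(self._pending)):
            payload = self._pending.popleft()
            try:
                self._send(payload)
                sent += 1
            except Exception:
                self._pending.append(payload)  # keep it for the next flush
        return sent

    def __len__(self) -> int:
        return len(self._pending)


# Demo with a fake sender that fails on the first attempt only.
attempts: list = []


def flaky_send(payload: Payload) -> None:
    attempts.append(payload["id"])
    if len(attempts) == 1:
        raise ConnectionError("simulated outage")


queue = RetryQueue(flaky_send)
queue.submit({"id": "call-1-input-ab12cd34", "data": {"tokens": 154}})
queue.flush()
print(attempts)  # both attempts carry the same id
```

In billing.py, `send` could wrap the existing `self._client.post(...)` plus `raise_for_status()`, with a `flush()` call just before `close()`. The queue lives in memory, so anything still pending when the process exits is lost.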

Run the crew and see per-agent tokens

The entry-point script loads .env, builds the listener (which registers itself on the event bus during __init__), and kicks off the crew.

# main.py
"""Run the research crew and ship per-agent token usage to Kong M&B."""

from __future__ import annotations

import argparse
import os
import sys

from dotenv import load_dotenv

from billing import from_env
from crew import build_crew


def main() -> int:
    load_dotenv()

    parser = argparse.ArgumentParser(description="CrewAI research briefing")
    parser.add_argument(
        "topic",
        nargs="?",
        default="Usage-based pricing for AI agent products in 2026",
        help="Topic the crew should research",
    )
    parser.add_argument(
        "--customer",
        default=None,
        help="Customer ID to bill (overrides CUSTOMER_ID env var)",
    )
    args = parser.parse_args()

    if args.customer:
        os.environ["CUSTOMER_ID"] = args.customer

    listener = from_env()
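    # Read the subject from the listener: CUSTOMER_ID may be unset in the
    # environment, in which case from_env() falls back to "acme".
    print(f"Billing customer: {listener.subject}")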
    print(f"Topic:            {args.topic}\n")

    try:
        crew = build_crew(args.topic)
        result = crew.kickoff()
        print("\n--- Briefing ---\n")
        print(result.raw if hasattr(result, "raw") else result)
        print("\n--- Billing ---\n")
        print(listener.summary())
    finally:
        listener.close()

    return 0


if __name__ == "__main__":
    sys.exit(main())

The per-agent summary at the end comes from listener.summary(). The listener tracks tokens in memory as events fire and formats them at the end of the run.

Run it:

python main.py "Strategies for monetizing developer tools with usage-based pricing"

You will see CrewAI's verbose output as each agent thinks, then the final briefing, then the billing summary. From one of my runs:

Per-agent token usage:
  Researcher                 input=   154  output=   453  total=   607
  Analyst                    input=   587  output=   176  total=   763
  Writer                     input=   779  output=   421  total=  1200
Events sent to Kong M&B: 6

This is why per-agent billing matters. Each agent has a different token shape:

Researcher: short prompt in, long fact dump out.
Analyst: long facts in, three short insights out.
Writer: everything before it in, the longest output out.

Each role uses tokens differently. The cost per agent is different. A single flat price hides all of that.
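To put numbers on it, here is the arithmetic for the run above, using the per-token prices from the CrewAI Research Pro plan in this tutorial ($0.0001, $0.0002, $0.0005). This is only a local back-of-the-envelope calculation, not how Kong computes an invoice, and the variable names are made up for illustration.

```python
from decimal import Decimal

# Total tokens per agent from the sample run above.
tokens = {"Researcher": 607, "Analyst": 763, "Writer": 1200}

# Per-token prices from the plan in this tutorial.
prices = {
    "Researcher": Decimal("0.0001"),
    "Analyst": Decimal("0.0002"),
    "Writer": Decimal("0.0005"),
}

# One line item per agent: tokens times that agent's price.
line_items = {role: tokens[role] * prices[role] for role in tokens}

for role, amount in line_items.items():
    print(f"{role:12s} {tokens[role]:5d} tokens  ${amount}")
print(f"{'Total':12s} {sum(tokens.values()):5d} tokens  ${sum(line_items.values())}")
```

With these prices the Writer accounts for a bit under half of the tokens but roughly three quarters of the charge. `Decimal` is used so the tiny per-token amounts do not pick up floating-point noise.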

The events are in Kong M&B now, but no meter is matching them yet. They sit in the events table with a validation warning. The next steps fix that.

Provision Kong with one script

The next four steps (meter, features, plan, customer and subscription) can be done in a single command using the setup_kong.py script from the repo:

python setup_kong.py

To start over from a clean slate, pass --teardown. The script cancels the subscription, archives the plan, deletes the features and meter, and then recreates everything:

python setup_kong.py --teardown

The customer record is kept across teardowns so event history stays attached to the same subject.

Here is the script in full. It is the source of truth for the role names (Researcher, Analyst, Writer) and prices ($0.0001, $0.0002, $0.0005) used in the rest of this tutorial. The same values show up in the manual UI walk-through below, so the script and the click-by-click path produce the same setup.

# setup_kong.py
"""One-shot provisioner for Kong Konnect Metering & Billing.

Creates the meter, three features, a published plan with three rate cards,
the customer, and an active subscription. Designed to be re-run on a clean
org. Run with --teardown to delete a previous provisioning before recreating.

Each feature uses a meter group-by filter on agent_role so that the Researcher,
Analyst, and Writer features each only count tokens consumed by that agent.
Without the filter, every feature would aggregate the entire meter and the
invoice would show no per-role breakdown.
"""

from __future__ import annotations

import argparse
import os
import sys

import httpx
from dotenv import load_dotenv

ROLES = ["Researcher", "Analyst", "Writer"]
PRICES = {"Researcher": "0.0001", "Analyst": "0.0002", "Writer": "0.0005"}
METER_KEY = "crewai_tokens"
PLAN_KEY = "crewai_research_pro"


def _client() -> httpx.Client:
    load_dotenv()
    base = os.environ["KONG_INGEST_URL"].rsplit("/", 1)[0]
    pat = os.environ["KONG_PAT"]
    return httpx.Client(
        base_url=base,
        headers={
            "Authorization": f"Bearer {pat}",
            "Content-Type": "application/json",
        },
        timeout=15.0,
    )


def teardown(s: httpx.Client, customer_key: str) -> None:
    """Cancel/archive then delete subscription -> plan -> features -> meter.

    Subscriptions are cancelled (not deleted), plans are archived. Features
    and meters use DELETE. The customer is preserved so subjects keep their
    history.
    """
    print("Teardown ...")
    customers = s.get("/customers", params={"key": customer_key}).json().get("data", [])
    for c in customers:
        if c["key"] != customer_key:
            continue
        for sub in s.get("/subscriptions").json().get("data", []):
            if sub["customer_id"] == c["id"] and sub.get("status") == "active":
                r = s.post(f"/subscriptions/{sub['id']}/cancel", json={})
                print(f"  subscription {sub['id']} cancel -> {r.status_code}")

    for plan in s.get("/plans").json().get("data", []):
        if plan["key"] != PLAN_KEY:
            continue
        if plan.get("status") == "active":
            r = s.post(f"/plans/{plan['id']}/archive", json={})
            print(f"  plan {plan['id']} archive -> {r.status_code}")

    for feat in s.get("/features").json().get("data", []):
        if not feat["key"].startswith("crewai_"):
            continue
        r = s.delete(f"/features/{feat['id']}")
        print(f"  feature {feat['key']} -> {r.status_code}")

    for meter in s.get("/meters").json().get("data", []):
        if meter["key"] != METER_KEY:
            continue
        r = s.delete(f"/meters/{meter['id']}")
        print(f"  meter {meter['key']} -> {r.status_code}")


def provision(s: httpx.Client, customer_key: str) -> None:
    # 1. Meter
    print("Creating meter ...")
    r = s.post("/meters", json={
        "key": METER_KEY,
        "name": "CrewAI Tokens",
        "description": "Tokens consumed by CrewAI agents per role",
        "event_type": "crewai.llm_call",
        "value_property": "$.tokens",
        "aggregation": "sum",
        "dimensions": {
            "agent_role": "$.agent_role",
            "type": "$.type",
            "model": "$.model",
        },
    })
    r.raise_for_status()
    meter = r.json()
    print(f"  meter id={meter['id']}")

    # 2. Three features, each filtered by agent_role
    print("Creating features ...")
    feature_ids: dict[str, str] = {}
    for role in ROLES:
        key = f"crewai_{role.lower()}_tokens"
        r = s.post("/features", json={
            "key": key,
            "name": f"CrewAI {role} Tokens",
            "meter": {
                "id": meter["id"],
                "filters": {"agent_role": {"eq": role}},
            },
        })
        r.raise_for_status()
        feature_ids[role] = r.json()["id"]
        print(f"  {key:30s} id={feature_ids[role]}")

    # 3. Plan with three rate cards
    print("Creating plan ...")
    rate_cards = []
    for role in ROLES:
        rate_cards.append({
            "key": f"crewai_{role.lower()}_tokens",
            "name": f"{role} Tokens",
            "billing_cadence": "P1M",
            "feature": {"id": feature_ids[role]},
            "price": {"type": "unit", "amount": PRICES[role]},
        })
    r = s.post("/plans", json={
        "key": PLAN_KEY,
        "name": "CrewAI Research Pro",
        "currency": "USD",
        "billing_cadence": "P1M",
        "pro_rating_enabled": True,
        "phases": [
            {"key": "default", "name": "Default", "rate_cards": rate_cards}
        ],
    })
    r.raise_for_status()
    plan_id = r.json()["id"]
    print(f"  plan id={plan_id} status={r.json().get('status')}")

    # 4. Publish
    print("Publishing plan ...")
    r = s.post(f"/plans/{plan_id}/publish", json={})
    r.raise_for_status()
    print(f"  plan status={r.json().get('status')}")

    # 5. Customer (reuse if exists)
    print(f"Ensuring customer key={customer_key} ...")
    existing = [c for c in s.get("/customers", params={"key": customer_key}).json().get("data", []) if c["key"] == customer_key]
    if existing:
        customer_id = existing[0]["id"]
        print(f"  reusing customer id={customer_id}")
    else:
        r = s.post("/customers", json={
            "key": customer_key,
            "name": "Acme Inc",
            "currency": "USD",
            "usage_attribution": {"subject_keys": [customer_key]},
        })
        r.raise_for_status()
        customer_id = r.json()["id"]
        print(f"  customer id={customer_id}")

    # 6. Subscription
    print("Subscribing customer to plan ...")
    r = s.post("/subscriptions", json={
        "customer": {"id": customer_id},
        "plan": {"id": plan_id},
    })
    r.raise_for_status()
    sub = r.json()
    print(f"  subscription id={sub['id']} status={sub.get('status')}")

    print("\nDone. Provisioning summary:")
    print(f"  meter:        {meter['id']}")
    print(f"  features:     {feature_ids}")
    print(f"  plan:         {plan_id}")
    print(f"  customer:     {customer_id}")
    print(f"  subscription: {sub['id']}")


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--teardown", action="store_true",
                        help="Delete prior CrewAI provisioning before recreating")
    args = parser.parse_args()

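    # Load .env before reading CUSTOMER_ID. _client() also calls load_dotenv(),
    # but only after this line has already run.
    load_dotenv()
    customer_key = os.environ.get("CUSTOMER_ID", "acme")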
    with _client() as s:
        if args.teardown:
            teardown(s, customer_key)
        provision(s, customer_key)
    return 0


if __name__ == "__main__":
    sys.exit(main())

A few things worth pointing out before the manual walk-through.

The four constants at the top are the only knobs. ROLES, PRICES, METER_KEY, and PLAN_KEY are the values you would change to use this script for a different crew or pricing model. Everything below them is mechanical.

The feature filter shape is strict. The script uses meter: {id, filters: {agent_role: {eq: role}}}. If you change the shape, the Kong API still returns 201 but it silently drops the filter. The feature then sums the whole meter and per-agent billing breaks. After creating a feature, always GET it back and confirm meter.filters is set.

Subscriptions and plans are not deleted. They are cancelled or archived. The teardown helper uses POST /subscriptions/{id}/cancel and POST /plans/{id}/archive. Features and meters do support DELETE.
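The "GET it back and confirm" check can be automated. Here is a minimal sketch, assuming the feature JSON that comes back mirrors the shape the script sends (`meter.filters.agent_role.eq`). `filter_is_set` is a hypothetical helper, and the response shape is an assumption you should confirm against your own org.

```python
from typing import Any, Dict


def filter_is_set(feature: Dict[str, Any], role: str) -> bool:
    """True if the feature's meter filter pins agent_role to `role`."""
    meter = feature.get("meter")
    filters = meter.get("filters") if isinstance(meter, dict) else None
    clause = filters.get("agent_role") if isinstance(filters, dict) else None
    # A bare string such as "Writer" is rejected: the operator form is required.
    return isinstance(clause, dict) and clause.get("eq") == role


# The shape the script sends, and a response where the filter was dropped.
good = {
    "key": "crewai_writer_tokens",
    "meter": {"id": "01EXAMPLE", "filters": {"agent_role": {"eq": "Writer"}}},
}
dropped = {"key": "crewai_writer_tokens", "meter": {"id": "01EXAMPLE"}}

print(filter_is_set(good, "Writer"), filter_is_set(dropped, "Writer"))
```

In setup_kong.py this could run right after each feature is created, on the JSON from a GET to /features/{id}, assuming that path supports GET the same way it supports DELETE.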

The next four sections walk through the same steps by hand using the UI and curl. Read them to understand what each Kong resource does, or skip ahead to Run the crew again and check usage if you already ran the script.

Create the meter

A meter is a rule that tells Kong how to count incoming events. Open Konnect, go to Metering & Billing → Metering, and click Create Meter.

For this tutorial we skip the LLM Tokens template (it expects events from Kong AI Gateway) and configure the meter from scratch.

Field	Value
Name	CrewAI Tokens
Key	`crewai_tokens`
Event type	`crewai.llm_call`
Value property	`$.tokens`
Aggregation	Sum
Dimensions	`agent_role` → `$.agent_role`, `type` → `$.type`, `model` → `$.model`

The event_type must match the type field on the CloudEvents your listener sends. If they don't match, events still flow in but no meter picks them up.

Dimensions are important. They tell the meter to keep agent_role, type, and model available as group-by axes. Without dimensions, you get one big bucket of tokens with no breakdown.
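To make the matching and grouping rules concrete, here is a small local simulation of what a sum meter with these settings does with incoming events. It is an illustration in plain Python, not Kong's implementation. `meter_totals`, `make_event`, and the sample events are made up for this sketch.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple

EVENT_TYPE = "crewai.llm_call"
DIMENSIONS = ("agent_role", "type", "model")


def meter_totals(
    events: Iterable[Dict[str, Any]], group_by: Tuple[str, ...]
) -> Dict[Tuple[str, ...], int]:
    """Sum data.tokens for matching events, grouped by the chosen dimensions."""
    unknown = set(group_by) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"not a meter dimension: {sorted(unknown)}")
    totals: Dict[Tuple[str, ...], int] = defaultdict(int)
    for event in events:
        if event["type"] != EVENT_TYPE:
            continue  # wrong event type: this meter never counts it
        data = event["data"]
        key = tuple(str(data[dim]) for dim in group_by)
        totals[key] += int(data["tokens"])
    return dict(totals)


def make_event(event_type: str, role: str, token_type: str, tokens: int) -> Dict[str, Any]:
    """Build a trimmed-down event with only the fields the meter reads."""
    return {
        "type": event_type,
        "data": {
            "tokens": tokens,
            "type": token_type,
            "agent_role": role,
            "model": "gpt-4o-mini",
        },
    }


events = [
    make_event("crewai.llm_call", "Researcher", "input", 154),
    make_event("crewai.llm_call", "Researcher", "output", 453),
    make_event("crewai.llm_call", "Writer", "input", 779),
    make_event("crewai.llmcall", "Writer", "input", 999),  # typo in the event type
]

print(meter_totals(events, ("agent_role",)))  # one bucket per agent
print(meter_totals(events, ()))               # no group-by: one big bucket
```

The event with the mistyped type never reaches any bucket, and grouping by nothing collapses the rest into a single total, which are the two failure modes described above.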

CLI

curl -X POST https://us.api.konghq.com/v3/openmeter/meters \
  -H "Authorization: Bearer $KONG_PAT" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "crewai_tokens",
    "name": "CrewAI Tokens",
    "description": "Tokens consumed by CrewAI agents per role",
    "event_type": "crewai.llm_call",
    "value_property": "$.tokens",
    "aggregation": "sum",
    "dimensions": {
      "agent_role": "$.agent_role",
      "type": "$.type",
      "model": "$.model"
    }
  }'

The response includes the meter id (a ULID starting with 01). Save it. You will need it when creating features.

Create one feature per agent role

A feature is a named slice of a meter, optionally filtered by dimension values. We need three features, one per agent role. All three point at the same crewai_tokens meter.

Go to Product Catalog → Features tab → Create Feature. Repeat three times, once per role:

Name	Key	Meter	Filter
CrewAI Researcher Tokens	`crewai_researcher_tokens`	CrewAI Tokens	`agent_role = Researcher`
CrewAI Analyst Tokens	`crewai_analyst_tokens`	CrewAI Tokens	`agent_role = Analyst`
CrewAI Writer Tokens	`crewai_writer_tokens`	CrewAI Tokens	`agent_role = Writer`

The feature key must match the rate card key you set on the plan in the next step. Pick descriptive keys now and the rest of the wiring stays clean.

Get the filter shape right. Kong expects the meter as an object with the meter id and a filters map. Filter values use operators like {"eq": "..."}, not bare strings. If you get it wrong, the API still returns 201 but silently drops the filter. The feature then sums the whole meter and the invoice loses its per-agent breakdown. After creating a feature, always GET it back and check that meter.filters is set.

CLI

# Look up the meter id once
METER_ID=$(curl -s "https://us.api.konghq.com/v3/openmeter/meters" \
  -H "Authorization: Bearer $KONG_PAT" | \
  jq -r '.data[] | select(.key=="crewai_tokens") | .id')

for role in Researcher Analyst Writer; do
  lower=$(echo "$role" | tr '[:upper:]' '[:lower:]')
  curl -X POST https://us.api.konghq.com/v3/openmeter/features \
    -H "Authorization: Bearer $KONG_PAT" \
    -H "Content-Type: application/json" \
    -d "{
      \"key\": \"crewai_${lower}_tokens\",
      \"name\": \"CrewAI ${role} Tokens\",
      \"meter\": {
        \"id\": \"${METER_ID}\",
        \"filters\": {\"agent_role\": {\"eq\": \"${role}\"}}
      }
    }"
done

Create a plan with three rate cards

A plan ties features to prices. Go to Product Catalog → Plans tab → New Plan. Name it CrewAI Research Pro, currency USD, monthly cadence. Then add three rate cards in the default phase:

Rate card key	Feature	Price (USD per token)
`crewai_researcher_tokens`	CrewAI Researcher Tokens	0.0001
`crewai_analyst_tokens`	CrewAI Analyst Tokens	0.0002
`crewai_writer_tokens`	CrewAI Writer Tokens	0.0005

The rate card key must match the feature key. If they don't match, Kong returns a rate_card_key_feature_key_mismatch error.

Prices are per single token. To charge $5 per million tokens, enter 0.000005, not 5. The decimals look uncomfortable but they are correct. I used round numbers like 0.0001 here so usage and dollar amounts are easy to read while testing. Real production pricing usually looks like 0.0000003.
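If you think in dollars per million tokens, a tiny helper avoids counting zeros by hand. This is a sketch of the conversion described above; `per_token_price` is a hypothetical name, not something from the repo or the Kong API.

```python
from decimal import Decimal


def per_token_price(usd_per_million: str) -> str:
    """Convert a USD-per-million-tokens price to a per-token decimal string."""
    price = Decimal(usd_per_million) / Decimal(1_000_000)
    # Format with "f" so small values never come out in scientific notation.
    return format(price.normalize(), "f")


print(per_token_price("5"))     # $5 per million tokens
print(per_token_price("0.30"))  # $0.30 per million tokens
```

The result is returned as a string because the rate card `amount` in this tutorial is sent as a string, and because `Decimal` keeps the value exact where a float might not.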

After the rate cards are in, click Publish. A draft plan cannot accept subscriptions.

CLI

Build the plan in two steps: create as draft, then publish.

PLAN=$(curl -s -X POST https://us.api.konghq.com/v3/openmeter/plans \
  -H "Authorization: Bearer $KONG_PAT" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "crewai_research_pro",
    "name": "CrewAI Research Pro",
    "currency": "USD",
    "billing_cadence": "P1M",
    "pro_rating_enabled": true,
    "phases": [{
      "key": "default",
      "name": "Default",
      "rate_cards": [
        {"key": "crewai_researcher_tokens", "name": "Researcher Tokens",
         "billing_cadence": "P1M",
         "feature": {"key": "crewai_researcher_tokens"},
         "price": {"type": "unit", "amount": "0.0001"}},
        {"key": "crewai_analyst_tokens", "name": "Analyst Tokens",
         "billing_cadence": "P1M",
         "feature": {"key": "crewai_analyst_tokens"},
         "price": {"type": "unit", "amount": "0.0002"}},
        {"key": "crewai_writer_tokens", "name": "Writer Tokens",
         "billing_cadence": "P1M",
         "feature": {"key": "crewai_writer_tokens"},
         "price": {"type": "unit", "amount": "0.0005"}}
      ]
    }]
  }' | jq -r .id)

curl -X POST "https://us.api.konghq.com/v3/openmeter/plans/$PLAN/publish" \
  -H "Authorization: Bearer $KONG_PAT"

Create the customer and subscribe

In Konnect, go to Customers → New Customer. Set the name to Acme Inc, the key to acme, and the currency to USD. The important field is Subject keys: it must include acme, because this is how Kong matches incoming events to a customer. Our listener sets the subject field on every event to acme (from the CUSTOMER_ID value in .env).

Then open the customer, click Add Subscription, pick CrewAI Research Pro, and start it immediately.
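
To make the subject-key matching concrete, here is a minimal local sketch of the idea. It is not Kong's implementation: it assumes the match is an exact string comparison between the event's `subject` and the customer's `usage_attribution.subject_keys`, and the function name and second subject value are invented for the example.

```python
from typing import Optional


def match_customer(event: dict, customers: list) -> Optional[str]:
    """Return the key of the customer whose subject_keys contain the event subject."""
    for customer in customers:
        keys = customer.get("usage_attribution", {}).get("subject_keys", [])
        if event.get("subject") in keys:
            return customer["key"]
    return None  # no customer claims this subject


customers = [{
    "key": "acme",
    "name": "Acme Inc",
    "usage_attribution": {"subject_keys": ["acme"]},
}]

print(match_customer({"subject": "acme"}, customers))    # acme
print(match_customer({"subject": "globex"}, customers))  # None
```

If CUSTOMER_ID in .env and the customer's subject keys drift apart, events fall into the second case and never reach the subscription.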

CLI

CUSTOMER=$(curl -s -X POST https://us.api.konghq.com/v3/openmeter/customers \
  -H "Authorization: Bearer $KONG_PAT" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "acme",
    "name": "Acme Inc",
    "currency": "USD",
    "usage_attribution": {"subject_keys": ["acme"]}
  }' | jq -r .id)

curl -X POST https://us.api.konghq.com/v3/openmeter/subscriptions \
  -H "Authorization: Bearer $KONG_PAT" \
  -H "Content-Type: application/json" \
  -d "{
    \"customer\": {\"id\": \"$CUSTOMER\"},
    \"plan\": {\"key\": \"crewai_research_pro\"}
  }"

Run the crew again and check usage

One detail is easy to miss. Events sent to Kong before a subscription starts do not get billed. Only events with a timestamp inside the active subscription window roll into an invoice.
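
As a rough illustration of that rule, the sketch below checks whether an event timestamp falls inside a subscription window. The start time is a made-up value, and treating the start as inclusive and the end as exclusive is an assumption, not documented Kong behavior.

```python
from datetime import datetime, timezone
from typing import Optional


def is_billable(event_time: str, start: datetime,
                end: Optional[datetime] = None) -> bool:
    """True if the event timestamp falls inside [start, end)."""
    ts = datetime.fromisoformat(event_time)
    return ts >= start and (end is None or ts < end)


# Hypothetical subscription start, for illustration only
start = datetime(2025, 6, 2, 12, 0, tzinfo=timezone.utc)

print(is_billable("2025-06-02T11:59:00+00:00", start))  # False: sent before the start
print(is_billable("2025-06-02T12:05:00+00:00", start))  # True: inside the window
```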

Run the crew one more time after the subscription is active:

python main.py "Best practices for instrumenting LLM token usage in multi-agent systems"

Open Konnect, go to the Acme Inc customer, and switch to the Usage tab. You should see three rows, one per feature, each with a total token count and a cost. Switch to Invoices and the same three rows show up as line items on the upcoming invoice.

To check from the CLI, query the events endpoint and confirm validation_errors is empty:

curl -s "https://us.api.konghq.com/v3/openmeter/events?type=crewai.llm_call&limit=6" \
  -H "Authorization: Bearer $KONG_PAT" | \
  jq -c '.data[] | {role: .event.data.agent_role,
                    type: .event.data.type,
                    tokens: .event.data.tokens,
                    errors: (.validation_errors | length)}'

A clean run looks like:

{"role": "Writer", "type": "output", "tokens": 421, "errors": 0}
{"role": "Writer", "type": "input", "tokens": 779, "errors": 0}
{"role": "Researcher", "type": "output", "tokens": 453, "errors": 0}
{"role": "Analyst", "type": "input", "tokens": 587, "errors": 0}
{"role": "Analyst", "type": "output", "tokens": 176, "errors": 0}
{"role": "Researcher", "type": "input", "tokens": 154, "errors": 0}

Six events, one per (agent role, token type) bucket. All six match the meter and roll into the customer's subscription.
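
To see how these six events turn into three line items, here is a worked example of the arithmetic using the sample token counts above and the test prices from the plan. It is only a sketch of the math: how Kong rounds amounts on the actual invoice is not covered here.

```python
from collections import defaultdict
from decimal import Decimal

# Test prices from the rate cards (USD per token)
PRICES = {
    "Researcher": Decimal("0.0001"),
    "Analyst": Decimal("0.0002"),
    "Writer": Decimal("0.0005"),
}

# (agent_role, token type, tokens) copied from the sample output
EVENTS = [
    ("Writer", "output", 421),
    ("Writer", "input", 779),
    ("Researcher", "output", 453),
    ("Analyst", "input", 587),
    ("Analyst", "output", 176),
    ("Researcher", "input", 154),
]

# Sum input and output tokens per agent role
tokens = defaultdict(int)
for role, _token_type, count in EVENTS:
    tokens[role] += count

# One line item per role: total tokens times that role's per-token price
line_items = {role: tokens[role] * price for role, price in PRICES.items()}

for role, amount in line_items.items():
    print(role, tokens[role], amount)
# Researcher 607 0.0607
# Analyst 763 0.1526
# Writer 1200 0.6000

print("total", sum(line_items.values()))  # total 0.8133
```

In this sample the Writer uses 1,200 tokens against the Researcher's 607, which is the roughly 2x gap that motivates pricing each role separately.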

How would you price your crew?

Per role like in this tutorial? Per total tokens? Per task? Per crew run? The right answer depends on what your customers can predict and what hurts your margin when they cannot. Drop a comment with the pricing model you use. I want to hear what is working in the wild.

The full code is at github.com/tejakummarikuntla/Billing-CrewAI-with-KongMB. PRs welcome.

Top comments (2)

Max Quimby • Jun 8

The per-agent meter slicing is the part most people skip, and it pays off well beyond billing — we started attributing tokens per role mostly to invoice correctly and ended up using it as our main optimization signal. Once you can see the Writer burning ~2x the Researcher, you start asking whether the Writer actually needs the full context window or whether you're stuffing the entire research dump into its prompt when a summary would do.

One thing I'd flag from running crews in production: the cost that actually hurts usually isn't the clean single pass you're metering here — it's retries and tool-call fan-out. An agent that silently loops three times on a flaky tool quietly triples its slice. Are you tagging agent_role on retry events too, or do those land in an "unattributed" bucket? And when one agent delegates to another, does the child's usage roll up to the parent or stay separate? That hierarchy question is exactly where our first per-agent dashboard got confusing.

Teja Kummarikuntla Kong • Jun 8

Solid points. The optimization signal was the bigger win for me too. The Writer eating 2x almost always traces back to "we're stuffing the full research dump into a prompt that needed a 200-word summary."

Retries: LLMCallCompletedEvent fires per attempt with agent_role set, so successful retries do tag the right agent. The gap is LLMCallFailedEvent, which the listener in the post doesn't subscribe to. Partial token spend from hard failures gets dropped. For production I'd add a handler for failed events that emits under the same agent_role so failure cost shows up next to success cost in the same view.

Delegation: child usage stays separate. When A delegates to B, B's LLM calls fire with from_agent=B, so they bill to B. The delegation tool-call itself bills to A. That makes the parent's slice look smaller than what it actually triggered. What works for me: keep from_agent as the leaf attribution and add a second delegated_by dimension on the meter, so you can pivot between flat and hierarchical views.
