Teja Kummarikuntla Subscriber for Kong

Usage-Based Billing for AI Agents with FastAPI and Kong

If you've built an AI agent, the next question is simple: how do you charge for it?

Flat subscriptions don't fit AI workloads. Token costs vary by model, by direction (input vs output), and by how much each user actually consumes. One user might send 200 tokens a day. Another burns through 50,000. A flat fee either overcharges the light user or subsidizes the heavy one.

What you need is usage-based billing: each user pays for exactly what they use. In this tutorial, you'll build a sample AI agent and set up billing for it using FastAPI and Kong Konnect Metering & Billing, the cloud platform built on the open-source OpenMeter engine.

What is Usage-Based Billing?

Usage-based billing means charging customers based on actual consumption rather than a fixed amount. For AI agents, every API call has a measurable cost tied to token counts, and those costs vary by model. It's a natural fit.

A usage-based billing system needs four things:

  1. Event ingestion: Capture usage data every time something billable happens

  2. Metering: Aggregate raw events into per-customer totals per billing period

  3. Pricing: Apply rate cards or tiers to metered usage

  4. Invoicing: Generate bills and collect payment

Building all of this from scratch is a real engineering project. You need event storage, deduplication, windowed aggregation, and a billing layer on top. That's a lot of infrastructure for something that isn't your core product.

The Tools

OpenMeter is an open-source metering engine maintained by Kong. It's built on the CloudEvents standard (a CNCF specification) and handles real-time event ingestion, deduplication, and aggregation at scale. The source code is on GitHub, and you can self-host it if you want full control.

Kong Konnect Metering & Billing is the cloud platform built on top of OpenMeter. It adds the billing layer: features, pricing plans, subscriptions, invoicing, and payment provider integration. In this tutorial, you'll use this platform to ingest events, aggregate usage, apply rates, and generate invoices.

What You'll Build

The flow of the system you'll build works like this:

  1. A user calls your API endpoint (/api/generate, /api/summarize, or /api/analyze)

  2. Your agent sends the request to OpenAI and gets a response with token counts

  3. Your app sends a CloudEvent to the Kong Metering & Billing API with the usage data (tokens consumed, model used, user ID)

  4. The platform aggregates the events into meters, applies pricing from the user's plan, and generates an invoice at the end of the billing cycle

  5. A payment provider collects payment

Your code handles steps 1 through 3. Steps 4 and 5 happen automatically once you've configured the billing system.

Prerequisites

To follow along, you'll need:

  • Python 3.10 or later

  • An OpenAI API key (you can use any LLM provider, but this tutorial uses OpenAI)

  • A Kong Konnect account for Metering & Billing

  • A Konnect Personal Access Token (PAT) for API access

  • Basic familiarity with FastAPI and REST APIs

Step 1: Set Up the Project

Create a new project directory and set up a virtual environment:

mkdir ai-billing-app
cd ai-billing-app
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the dependencies:

pip install fastapi uvicorn openai httpx python-dotenv pydantic

A few notes on the packages:

  • fastapi and uvicorn: The web framework and ASGI server for your API

  • openai: The OpenAI Python SDK for making LLM calls

  • httpx: HTTP client for sending usage events and making API calls to Konnect Metering & Billing

  • python-dotenv: Loads environment variables from a .env file

  • pydantic: Data validation (bundled with FastAPI, listed for clarity)

Create a .env file for your configuration:

# .env
OPENAI_API_KEY=sk-your-openai-api-key
KONNECT_API_URL=https://us.api.konghq.com/v3/openmeter
KONNECT_TOKEN=kpat_your_konnect_personal_access_token

The KONNECT_TOKEN is a Personal Access Token you create in Konnect under your account settings.
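
A small startup check can catch a missing variable before the first request fails. The sketch below is one possible way to do that; the `Settings` class and the `load_settings` helper are illustrative names and not part of the tutorial's app.

```python
import os
from dataclasses import dataclass

# Names of the variables defined in the .env file above.
REQUIRED_VARS = ("OPENAI_API_KEY", "KONNECT_API_URL", "KONNECT_TOKEN")


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    konnect_api_url: str
    konnect_token: str


def load_settings(env=None) -> Settings:
    """Read the required variables and fail fast if any is missing or empty.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        # Strip a trailing slash so f"{url}/events" never produces "//events".
        konnect_api_url=env["KONNECT_API_URL"].rstrip("/"),
        konnect_token=env["KONNECT_TOKEN"],
    )
```

If you adopt something like this, call it once after load_dotenv() so that a misconfigured environment stops the app at startup rather than on the first billed request.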

For the curl commands throughout this tutorial, export these variables in your terminal:

export KONNECT_API_URL=https://us.api.konghq.com/v3/openmeter
export KONNECT_TOKEN=kpat_your_konnect_personal_access_token

Create the main application file:

touch app.py

Your project structure will look like this (requirements.txt is optional; you can create it with pip freeze > requirements.txt):

ai-billing-app/
├── app.py           # Main FastAPI application
├── .env             # Environment variables
└── requirements.txt # Dependencies

Step 2: Build the AI API

Start with a basic FastAPI app that wraps OpenAI's API. This gives your users three capabilities: generating text, summarizing content, and analyzing text for insights.

Open app.py and add the following:

import os
import uuid
import datetime
from contextlib import asynccontextmanager

from fastapi import FastAPI, HTTPException, Header
from pydantic import BaseModel
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


class GenerateRequest(BaseModel):
    prompt: str
    model: str = "gpt-4o-mini"
    max_tokens: int = 1024


class SummarizeRequest(BaseModel):
    text: str
    model: str = "gpt-4o-mini"
    max_tokens: int = 512


class AnalyzeRequest(BaseModel):
    text: str
    query: str
    model: str = "gpt-4o-mini"
    max_tokens: int = 512


class AIResponse(BaseModel):
    content: str
    model: str
    usage: dict


app = FastAPI(title="AI Agent API")


@app.post("/api/generate", response_model=AIResponse)
def generate_text(request: GenerateRequest):
    """Generate text content based on a prompt."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {"role": "system", "content": "You are a helpful writing assistant."},
            {"role": "user", "content": request.prompt},
        ],
        max_tokens=request.max_tokens,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage={
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
        },
    )


@app.post("/api/summarize", response_model=AIResponse)
def summarize_text(request: SummarizeRequest):
    """Summarize a piece of text."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {
                "role": "system",
                "content": "Summarize the following text concisely.",
            },
            {"role": "user", "content": request.text},
        ],
        max_tokens=request.max_tokens,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage={
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
        },
    )


@app.post("/api/analyze", response_model=AIResponse)
def analyze_text(request: AnalyzeRequest):
    """Analyze text and extract insights based on a query."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {
                "role": "system",
                "content": "Analyze the provided text and answer the user's query about it.",
            },
            {"role": "user", "content": f"Text: {request.text}\n\nQuery: {request.query}"},
        ],
        max_tokens=request.max_tokens,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage={
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
        },
    )

Test it to make sure the API works:

uvicorn app:app --reload

In another terminal:

curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a haiku about Python programming"}'

You should get a response with generated text and token usage counts. The usage field is what you'll feed into the billing system.
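
All three endpoints build the same usage dictionary, so you may prefer to pull that into one place. Here is a minimal sketch; `usage_to_dict` is a hypothetical helper name, and the `SimpleNamespace` object only stands in for `response.usage` so the example runs without calling OpenAI.

```python
from types import SimpleNamespace


def usage_to_dict(usage) -> dict:
    """Copy the three token counters off a usage object into a plain dict.

    Works with any object exposing prompt_tokens, completion_tokens and
    total_tokens attributes, such as the response.usage used above.
    """
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


# Stand-in for response.usage (illustrative values only).
fake_usage = SimpleNamespace(prompt_tokens=52, completion_tokens=195, total_tokens=247)
print(usage_to_dict(fake_usage))
```

With a helper like this, each endpoint can pass usage_to_dict(response.usage) to AIResponse instead of repeating the three fields.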

Step 3: Add API Key Authentication

Before you can bill users, you need to know who's making each request. Add a simple API key authentication layer.

In a production system, you'd store API keys in a database and hash them. For this tutorial, you'll use an in-memory dictionary to keep the focus on billing.

Update app.py to add authentication:

# Add this after the load_dotenv() call

# In production, store these in a database with hashed keys
API_KEYS = {
    "ak_user1_abc123": {"user_id": "user-001", "name": "Alice"},
    "ak_user2_def456": {"user_id": "user-002", "name": "Bob"},
    "ak_user3_ghi789": {"user_id": "user-003", "name": "Charlie"},
}


def authenticate(x_api_key: str | None = Header(default=None)) -> dict:
    """Validate the API key and return the user info."""
    if not x_api_key:
        raise HTTPException(status_code=401, detail="Missing API key")
    user = API_KEYS.get(x_api_key)
    if not user:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return user

Now update each endpoint to require authentication. Here's the updated /api/generate endpoint as an example (apply the same pattern to /api/summarize and /api/analyze):

# Update the existing fastapi import at the top of app.py to include Depends
from fastapi import FastAPI, HTTPException, Header, Depends

@app.post("/api/generate", response_model=AIResponse)
def generate_text(request: GenerateRequest, user: dict = Depends(authenticate)):
    """Generate text content based on a prompt."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {"role": "system", "content": "You are a helpful writing assistant."},
            {"role": "user", "content": request.prompt},
        ],
        max_tokens=request.max_tokens,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage={
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
        },
    )

Test that authentication works:

# This should fail with 401
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello"}'

# This should succeed
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: ak_user1_abc123" \
  -d '{"prompt": "Write a haiku about Python programming"}'

Now you know who's making each request. That user_id field is what ties API usage to a billing customer.
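
This step mentioned that a production system would store hashed keys rather than raw ones. The sketch below shows one way that lookup could work, assuming your API keys are long random strings (for which a plain SHA-256 digest is a common choice). `hash_api_key`, `HASHED_KEYS`, and `lookup_user` are hypothetical names, and the dictionary stands in for a database table.

```python
import hashlib


def hash_api_key(raw_key: str) -> str:
    """Return a hex SHA-256 digest of the key; only this digest is stored."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


# Hypothetical store: digest -> user record. In production this would be a
# database table, and the raw keys would never be persisted.
HASHED_KEYS = {
    hash_api_key("ak_user1_abc123"): {"user_id": "user-001", "name": "Alice"},
    hash_api_key("ak_user2_def456"): {"user_id": "user-002", "name": "Bob"},
}


def lookup_user(raw_key: str):
    """Hash the presented key and return the matching user record, or None."""
    return HASHED_KEYS.get(hash_api_key(raw_key))
```

The authenticate dependency could call lookup_user(x_api_key) in place of API_KEYS.get(x_api_key); the rest of the function would stay the same.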

Step 4: Send Usage Events to Konnect Metering & Billing

This is where billing starts. Every time a user makes an API call, you'll send a usage event to Konnect Metering & Billing. The metering layer (powered by the open-source OpenMeter engine) ingests and aggregates these events. You'll talk to the API directly using standard HTTP calls, so no SDK is needed beyond httpx.

What are CloudEvents?

The Konnect Metering & Billing API accepts events in the CloudEvents format, which is a CNCF standard for describing event data. Every usage event you send is a CloudEvents JSON object.

Here's what a single usage event looks like:

{
  "specversion": "1.0",
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "source": "ai-text-api",
  "type": "llm_token_usage",
  "subject": "user-001",
  "time": "2026-04-13T10:30:00Z",
  "datacontenttype": "application/json",
  "data": {
    "total_tokens": 247,
    "prompt_tokens": 52,
    "completion_tokens": 195,
    "model": "gpt-4o-mini",
    "endpoint": "/api/generate"
  }
}

Each field has a specific purpose in the billing pipeline:

| Field | What it does | Example |
| --- | --- | --- |
| specversion | CloudEvents version. Always "1.0" | "1.0" |
| id | Unique event ID. Used for deduplication. | UUID string |
| source | Identifies your app or service | "ai-text-api" |
| type | Matches events to meters. You'll define a meter that listens for this type. | "llm_token_usage" |
| subject | The customer/user this event belongs to. This is how usage is attributed. | "user-001" |
| time | When the event happened (RFC 3339 format) | "2026-04-13T10:30:00Z" |
| datacontenttype | Format of the data payload | "application/json" |
| data | Your custom payload. Contains the values you want to meter. | Token counts, model, endpoint |

The subject field is critical. It's what ties an event to a specific customer. When you set up billing later, you'll create a customer with matching subject_keys, and all events with that subject will roll up into their usage.
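
Because a malformed event can be accepted and then never metered, it can help to sanity-check events locally before sending them. The following is a minimal sketch, not a full CloudEvents validator: `validate_usage_event` is a hypothetical helper, and the checks on `subject` and `data.total_tokens` reflect this tutorial's event shape rather than anything the CloudEvents specification requires.

```python
# Attributes the CloudEvents 1.0 specification marks as required.
REQUIRED_ATTRIBUTES = ("id", "source", "type")


def validate_usage_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event looks sendable."""
    problems = []
    if event.get("specversion") != "1.0":
        problems.append("'specversion' must be \"1.0\"")
    # 'subject' is optional in CloudEvents, but this tutorial needs it for attribution.
    for attr in REQUIRED_ATTRIBUTES + ("subject",):
        value = event.get(attr)
        if not isinstance(value, str) or not value:
            problems.append(f"'{attr}' must be a non-empty string")
    data = event.get("data")
    if not isinstance(data, dict):
        problems.append("'data' must be an object")
    else:
        tokens = data.get("total_tokens")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(tokens, bool) or not isinstance(tokens, (int, float)):
            problems.append("'data.total_tokens' must be a number")
    return problems


example = {
    "specversion": "1.0",
    "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "source": "ai-text-api",
    "type": "llm_token_usage",
    "subject": "user-001",
    "data": {"total_tokens": 247, "model": "gpt-4o-mini"},
}
print(validate_usage_event(example))
```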

You can test event ingestion directly with curl to make sure your Konnect credentials work before wiring it into your Python code:

curl -X POST $KONNECT_API_URL/events \
  -H "Authorization: Bearer $KONNECT_TOKEN" \
  -H "Content-Type: application/cloudevents+json" \
  -d '{
    "specversion": "1.0",
    "id": "test-event-001",
    "source": "ai-text-api",
    "type": "llm_token_usage",
    "subject": "user-001",
    "time": "2026-04-13T10:30:00Z",
    "datacontenttype": "application/json",
    "data": {
      "total_tokens": 100,
      "prompt_tokens": 40,
      "completion_tokens": 60,
      "model": "gpt-4o-mini",
      "endpoint": "/api/generate"
    }
  }'

A 202 Accepted response means the event was ingested successfully. Note the content type: application/cloudevents+json. This tells the API you're sending a single CloudEvent.

Create the Billing Module

Now add a function to your Python app that sends usage events to Konnect after every API call. This uses httpx to POST CloudEvents directly to the Konnect Metering & Billing events endpoint:

import httpx

KONNECT_API_URL = os.getenv("KONNECT_API_URL", "https://us.api.konghq.com/v3/openmeter")
KONNECT_TOKEN = os.getenv("KONNECT_TOKEN")


def track_usage(user_id: str, model: str, endpoint: str, usage: dict):
    """Send a usage event to Konnect Metering & Billing."""
    event = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": "ai-text-api",
        "type": "llm_token_usage",
        "subject": user_id,
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": {
            "total_tokens": usage["total_tokens"],
            "prompt_tokens": usage["prompt_tokens"],
            "completion_tokens": usage["completion_tokens"],
            "model": model,
            "endpoint": endpoint,
        },
    }

    try:
        response = httpx.post(
            f"{KONNECT_API_URL}/events",
            headers={
                "Authorization": f"Bearer {KONNECT_TOKEN}",
                "Content-Type": "application/cloudevents+json",
            },
            json=event,
        )
        response.raise_for_status()
    except Exception as e:
        # Log the error but don't fail the request.
        # In production, use a dead-letter queue for failed events.
        print(f"Failed to track usage event: {e}")

A few important things about this function:

  1. Each event gets a unique UUID as its id. The metering engine deduplicates events by the combination of id + source. If a network retry sends the same event twice, it won't be counted twice.

  2. The subject is the user's ID, not their name or API key. This maps directly to the billing customer you'll create later.

  3. The type is llm_token_usage. You'll configure a meter to listen for events with this exact type. If the type doesn't match, the events won't be metered.

  4. The content type must be application/cloudevents+json. This is how the Konnect API knows to parse the request body as a CloudEvent.

  5. Errors don't fail the request. Billing telemetry should never break your user's API experience. In production, you'd send failed events to a retry queue.
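
Points 1 and 5 combine nicely: because duplicates are dropped by id and source, a retry is safe as long as it resends the same event. Here is a rough sketch of that idea. `send_with_retry` is a hypothetical helper, and `send` is any callable you supply that raises on failure (for example, a thin wrapper around the httpx.post call above).

```python
import time


def send_with_retry(event: dict, send, attempts: int = 3, base_delay: float = 0.5) -> bool:
    """Try to deliver one event, reusing the same event (and id) on every attempt.

    Returns True on success, False once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            send(event)
            return True
        except Exception as exc:
            print(f"Attempt {attempt + 1} failed: {exc}")
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                time.sleep(base_delay * (2 ** attempt))
    return False
```

The important detail is that the event is built once, outside the loop. Generating a new UUID per attempt would defeat deduplication and could double-count usage. When the function returns False, that is the point where a dead-letter or retry queue would take over.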

Track Tokens on Every Request

Now wire track_usage into each endpoint. Update the /api/generate endpoint:

@app.post("/api/generate", response_model=AIResponse)
def generate_text(request: GenerateRequest, user: dict = Depends(authenticate)):
    """Generate text content based on a prompt."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {"role": "system", "content": "You are a helpful writing assistant."},
            {"role": "user", "content": request.prompt},
        ],
        max_tokens=request.max_tokens,
    )

    usage = {
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }

    # Track this usage event for billing
    track_usage(
        user_id=user["user_id"],
        model=response.model,
        endpoint="/api/generate",
        usage=usage,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage=usage,
    )

Apply the same pattern to /api/summarize and /api/analyze. The only difference is the endpoint value you pass to track_usage.

Here's the updated /api/summarize:

@app.post("/api/summarize", response_model=AIResponse)
def summarize_text(request: SummarizeRequest, user: dict = Depends(authenticate)):
    """Summarize a piece of text."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {"role": "system", "content": "Summarize the following text concisely."},
            {"role": "user", "content": request.text},
        ],
        max_tokens=request.max_tokens,
    )

    usage = {
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }

    track_usage(
        user_id=user["user_id"],
        model=response.model,
        endpoint="/api/summarize",
        usage=usage,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage=usage,
    )

And /api/analyze:

@app.post("/api/analyze", response_model=AIResponse)
def analyze_text(request: AnalyzeRequest, user: dict = Depends(authenticate)):
    """Analyze text and extract insights based on a query."""
    response = openai_client.chat.completions.create(
        model=request.model,
        messages=[
            {
                "role": "system",
                "content": "Analyze the provided text and answer the user's query about it.",
            },
            {"role": "user", "content": f"Text: {request.text}\n\nQuery: {request.query}"},
        ],
        max_tokens=request.max_tokens,
    )

    usage = {
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }

    track_usage(
        user_id=user["user_id"],
        model=response.model,
        endpoint="/api/analyze",
        usage=usage,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage=usage,
    )

At this point, every API call sends a CloudEvent with the user's token consumption to the Konnect Metering & Billing API. The events are flowing. Now you need to tell the billing system what to do with them.

Step 5: Set Up Meters

A meter defines how raw events get aggregated into meaningful usage numbers. Think of it as a SQL GROUP BY query that runs continuously. It takes a stream of events and produces totals like "User A consumed 15,247 tokens this month."

What is a Meter?

A meter has five key properties:

| Property | What it does |
| --- | --- |
| key | A unique identifier for the meter (for example llm_total_tokens). Lowercase letters, numbers, and underscores only. |
| event_type | Which events feed this meter. Must match the type field in your CloudEvents. |
| aggregation | How to combine values: sum, count, avg, min, max, unique_count, latest |
| value_property | JSONPath to the numeric field in the event's data (for example $.total_tokens) |
| dimensions | Optional named JSONPath expressions to break down usage (for example by model or endpoint) |

For billing token usage, you want a meter that SUMs the total_tokens field from every llm_token_usage event, grouped by model and endpoint.
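
To make the GROUP BY analogy concrete, here is a small in-memory sketch of what a sum meter computes. It is only an illustration of the idea, not how the metering engine is implemented; `aggregate_tokens` and the sample events are made up for this example.

```python
from collections import defaultdict


def aggregate_tokens(events, event_type="llm_token_usage",
                     value_field="total_tokens", dimensions=("model", "endpoint")):
    """Sum value_field per (subject, *dimensions) for events of one type."""
    totals = defaultdict(int)
    for event in events:
        if event["type"] != event_type:
            continue  # a meter only sees events whose type matches
        data = event["data"]
        key = (event["subject"],) + tuple(data.get(d) for d in dimensions)
        totals[key] += data[value_field]
    return dict(totals)


def make_event(subject, endpoint, tokens, type_="llm_token_usage"):
    """Build a trimmed-down event with just the fields the aggregation reads."""
    return {
        "type": type_,
        "subject": subject,
        "data": {"total_tokens": tokens, "model": "gpt-4o-mini", "endpoint": endpoint},
    }


events = [
    make_event("user-001", "/api/generate", 247),
    make_event("user-001", "/api/generate", 100),
    make_event("user-001", "/api/summarize", 80),
    make_event("user-002", "/api/generate", 300),
    make_event("user-002", "/api/generate", 999, type_="some_other_event"),
]

for key, total in aggregate_tokens(events).items():
    print(key, total)
```

Passing dimensions=() collapses the breakdown to one total per subject, which is the number a per-customer invoice line is ultimately based on.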

Create a Token Usage Meter

You can create meters through the Konnect UI or via the API. Here's the API approach, which is more reproducible:

curl -X POST $KONNECT_API_URL/meters \
  -H "Authorization: Bearer $KONNECT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "llm_total_tokens",
    "name": "Total LLM Tokens",
    "description": "Total LLM tokens consumed per request",
    "event_type": "llm_token_usage",
    "aggregation": "sum",
    "value_property": "$.total_tokens",
    "dimensions": {
      "model": "$.model",
      "endpoint": "$.endpoint"
    }
  }'

This meter does the following:

  • Listens for events with type: "llm_token_usage" (matching what your app sends)

  • SUMs the total_tokens value from each event's data payload

  • Groups the totals by model and endpoint, so you can see usage breakdowns like "User A consumed 10,000 tokens on gpt-4o-mini via /api/generate"

You might also want separate meters for input and output tokens, since output tokens are typically more expensive (3 to 5x more with most LLM providers). Here's how to create an output token meter:

curl -X POST $KONNECT_API_URL/meters \
  -H "Authorization: Bearer $KONNECT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "llm_completion_tokens",
    "name": "LLM Completion Tokens",
    "description": "LLM output/completion tokens consumed",
    "event_type": "llm_token_usage",
    "aggregation": "sum",
    "value_property": "$.completion_tokens",
    "dimensions": {
      "model": "$.model"
    }
  }'

Verify Events Are Flowing

After creating your meters, send a few test requests through your API to generate some events:

# Send a test request
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: ak_user1_abc123" \
  -d '{"prompt": "Explain what a REST API is in two sentences"}'

Then verify that the meter is receiving events. You can confirm this in two ways:

  1. Konnect dashboard: Go to Metering & Billing > Meters, select llm_total_tokens, and check for incoming data for subject user-001.

  2. List meters via API to confirm the meter exists and is active:

curl $KONNECT_API_URL/meters \
  -H "Authorization: Bearer $KONNECT_TOKEN"

If the meter shows no data after a few seconds, check that:

  • Your KONNECT_TOKEN is valid

  • The event_type in your meter matches the type in your CloudEvents (llm_token_usage)

  • The value_property path ($.total_tokens) matches your event data structure
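
The last two checks can be run locally against a sample event. The sketch below is a hypothetical debugging aid: it only understands top-level paths of the form $.field, and it assumes the metered value should be a JSON number, which may be stricter than what the engine actually accepts.

```python
def resolve_simple_path(data: dict, path: str):
    """Resolve a top-level JSONPath such as "$.total_tokens"; nothing fancier."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    return data.get(path[2:])


def diagnose(event: dict, meter: dict) -> list[str]:
    """Compare one event with one meter definition and list any mismatches."""
    issues = []
    if event.get("type") != meter["event_type"]:
        issues.append(
            f"event type {event.get('type')!r} does not match "
            f"meter event_type {meter['event_type']!r}"
        )
    value = resolve_simple_path(event.get("data") or {}, meter["value_property"])
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        issues.append(f"{meter['value_property']} did not resolve to a number")
    return issues


meter = {"event_type": "llm_token_usage", "value_property": "$.total_tokens"}
event = {"type": "llm_token_usage", "data": {"total_tokens": 100}}
print(diagnose(event, meter))
```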

Step 6: Define Features and Pricing Plans

Meters give you raw usage data. Features and plans turn that data into billable items with prices.

The hierarchy works like this:

Meter (raw usage aggregation)
  └─► Feature (billable unit: "AI Tokens")
        └─► Rate Card (pricing: $0.002 per 1,000 tokens)
              └─► Plan (collection of rate cards: "Pro Plan")

Create a Feature

A feature is a customer-facing billable unit. It links a meter to something you can put a price on. You configure features in the Konnect dashboard:

  1. Go to Metering & Billing > Product Catalog > Features

  2. Click Create Feature

  3. Set the key to ai_tokens, the name to "AI Tokens"

  4. Select llm_total_tokens as the meter

  5. Save

This creates a feature called "AI Tokens" that draws its usage data from the llm_total_tokens meter you created in the previous step.

Feature, plan, and rate card keys must match the pattern ^[a-z0-9]+(?:_[a-z0-9]+)*$ — lowercase letters, numbers, and underscores only. Hyphens are not allowed.

Create a Plan with Rate Cards

A plan bundles one or more features with pricing. This is where you define how much you charge.

In the Konnect dashboard:

  1. Go to Metering & Billing > Product Catalog > Plans

  2. Click Create Plan

  3. Set the key to starter, the name to "Starter", currency to USD

  4. Add a phase (the default billing period configuration)

  5. Inside the phase, add a rate card:

     • Type: Usage-based

     • Feature: ai_tokens

     • Rate card key: ai_tokens (must match the feature key)

     • Price type: Unit

     • Unit price: 0.000002

  6. Save and publish the plan

The unit price is the cost per single token, not per thousand. For production pricing, you'd calculate this from your LLM provider's rates plus your margin. For example, if GPT-4o-mini costs you $0.15 per 1 million input tokens, that's $0.00000015 per token. With a 10x margin, you'd charge $0.0000015 per token. For testing, a higher value like 0.000002 makes the numbers easier to read on invoices.
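
The arithmetic above is easy to get wrong by a factor of ten when counting zeros by hand. A short sketch using Decimal avoids that; `per_token_price` is a hypothetical helper, and the 10x multiplier is the example markup from the paragraph above.

```python
from decimal import Decimal


def per_token_price(cost_per_million: str, markup: str) -> Decimal:
    """Convert a provider's per-million-token cost into a per-token sell price.

    Inputs are strings so Decimal parses them exactly (no float rounding).
    """
    return Decimal(cost_per_million) / Decimal(1_000_000) * Decimal(markup)


# $0.15 per 1M input tokens with a 10x multiplier.
price = per_token_price("0.15", "10")
print(format(price, "f"))
```

The result is the value you would enter as the unit price on the rate card.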

Konnect also supports tiered pricing for volume discounts. When creating a rate card, choose "Tiered" instead of "Unit" and define graduated tiers. For example:

  • First 100,000 tokens at $0.000003/token

  • Everything above 100,000 at $0.000002/token

Graduated pricing means each tier applies to its own range (the discounted rate doesn't apply retroactively to earlier usage).
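
Here is how the graduated example works out in code. This is a local sketch of the arithmetic only, using the two tiers listed above; `TIERS` and `graduated_charge` are illustrative names, not part of any Konnect API.

```python
from decimal import Decimal

# (upper bound of the tier in tokens, price per token); None means no upper bound.
TIERS = [
    (100_000, Decimal("0.000003")),
    (None, Decimal("0.000002")),
]


def graduated_charge(tokens: int, tiers=TIERS) -> Decimal:
    """Charge each slice of usage at the rate of the tier it falls into."""
    total = Decimal("0")
    lower = 0  # first token index not yet priced
    for upper, price in tiers:
        if tokens <= lower:
            break  # all usage has been priced
        tier_tokens = (tokens if upper is None else min(tokens, upper)) - lower
        total += tier_tokens * price
        if upper is None:
            break
        lower = upper
    return total


# 150,000 tokens: 100,000 at the first rate plus 50,000 at the second.
print(graduated_charge(150_000))
```

For 150,000 tokens the first 100,000 cost $0.30 and the remaining 50,000 cost $0.10, so the total is $0.40 rather than the $0.30 you would get if the lower rate applied to everything.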

Step 7: Onboard a Customer and Create a Subscription

Before usage events can turn into invoices, you need two things: a customer in the billing system that matches your app user, and a subscription that binds that customer to a plan.

Create a Customer

curl -X POST $KONNECT_API_URL/customers \
  -H "Authorization: Bearer $KONNECT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Alice",
    "key": "user-001",
    "usage_attribution": {
      "subject_keys": ["user-001"]
    }
  }'

The usage_attribution.subject_keys array must match the subject field in your CloudEvents. Your app sends events with subject: "user-001", and this customer has "user-001" in its subject_keys. That's how the billing system knows which events belong to which customer.

This is the consumer-to-customer mapping, and it's the most common setup mistake. If these don't match, events will be ingested but never attributed to a customer, and no invoices will be generated.
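
One way to catch this mismatch early is to compare the subjects your app emits with the subject keys of the customers you have created. The sketch below does that on plain dictionaries shaped like the request bodies above; `unattributed_subjects` is a hypothetical helper, and fetching the real customer list is left out.

```python
def unattributed_subjects(events, customers) -> set[str]:
    """Return event subjects that no customer lists in its subject_keys."""
    known = {
        key
        for customer in customers
        for key in customer["usage_attribution"]["subject_keys"]
    }
    return {event["subject"] for event in events} - known


customers = [
    {"name": "Alice", "key": "user-001",
     "usage_attribution": {"subject_keys": ["user-001"]}},
]
events = [
    {"subject": "user-001"},
    {"subject": "user-002"},  # no matching customer yet
    {"subject": "User-001"},  # differs in case, so it is treated as unmatched here
]
print(sorted(unattributed_subjects(events, customers)))
```

Note that this sketch compares strings exactly, so a difference in case or a stray prefix shows up as unattributed.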

Create a Subscription

A subscription binds a customer to a plan. Events that occurred before the subscription start date are not billed:

curl -X POST $KONNECT_API_URL/subscriptions \
  -H "Authorization: Bearer $KONNECT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "customer": {
      "key": "user-001"
    },
    "plan": {
      "key": "starter"
    }
  }'

The subscription references the customer and plan by their key values (you can also use their ULID id if you stored it from the creation response). Once the subscription is active, all future events for this customer will be billed according to the plan's rate cards.

Repeat this process for each user you want to bill. In production, you'd automate this as part of your user registration flow:

def onboard_customer(user_id: str, name: str, plan: str = "starter"):
    """Create a billing customer and subscription for a new user."""
    import httpx

    base_url = os.getenv("KONNECT_API_URL", "https://us.api.konghq.com/v3/openmeter")
    token = os.getenv("KONNECT_TOKEN")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

    # Create the customer
    customer_resp = httpx.post(
        f"{base_url}/customers",
        headers=headers,
        json={
            "name": name,
            "key": user_id,
            "usage_attribution": {
                "subject_keys": [user_id],
            },
        },
    )
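    customer_resp.raise_for_status()  # stop here if the customer was not created
    customer = customer_resp.json()

    # Create the subscription
    sub_resp = httpx.post(
        f"{base_url}/subscriptions",
        headers=headers,
        json={
            "customer": {"key": user_id},
            "plan": {"key": plan},
        },
    )
    sub_resp.raise_for_status()  # surface subscription errors to the caller

    return customer, sub_resp.json()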

Step 8: Test the Full Billing Flow

Now test the complete pipeline end to end. Send several requests as different users and verify that usage is tracked, metered, and reflected in invoices.

Send Test Requests

# Alice (user-001) generates text
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: ak_user1_abc123" \
  -d '{"prompt": "Write a short product description for a task management app"}'

# Alice summarizes text
curl -X POST http://localhost:8000/api/summarize \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: ak_user1_abc123" \
  -d '{"text": "FastAPI is a modern, fast web framework for building APIs with Python based on standard Python type hints. It is designed to be easy to use and learn while also being highly performant. FastAPI automatically generates interactive API documentation and validates request data using Pydantic models."}'

# Bob (user-002) generates text
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: ak_user2_def456" \
  -d '{"prompt": "Draft a welcome email for a new SaaS customer"}'

# Bob analyzes text
curl -X POST http://localhost:8000/api/analyze \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: ak_user2_def456" \
  -d '{"text": "Our Q1 revenue was $2.3M, up 15% from Q4. Customer churn decreased to 3.2%. New sign-ups increased by 22%.", "query": "What are the key business metrics and trends?"}'

Check Aggregated Usage

To see how usage is stacking up per user, open the Konnect dashboard and go to Metering & Billing > Meters. Select the llm_total_tokens meter. You'll see aggregated usage broken down by the dimensions you defined (model and endpoint), filterable by customer.

You can also verify that events are being ingested by checking the meter's event count. If Alice's requests went through, you should see her user-001 subject with token totals reflecting the requests you just sent.

Check the Invoice

After the billing period ends (or you trigger an invoice draft manually in the dashboard), check what the customer owes under Metering & Billing > Customers > Alice > Invoices.

The invoice will show line items based on the rate card in Alice's plan. If she consumed 15,000 tokens at $0.000002/token, the line item would be $0.03.
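
You can reproduce that line item locally to check an invoice against your own numbers. The sketch below assumes amounts are rounded to whole cents with half-up rounding; that rounding rule is an assumption for illustration, and the platform's actual rounding may differ. `line_item_amount` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def line_item_amount(tokens: int, unit_price: str) -> Decimal:
    """Multiply usage by the per-token price and round to cents (assumed half-up)."""
    raw = Decimal(tokens) * Decimal(unit_price)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(line_item_amount(15_000, "0.000002"))  # 0.03
```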

You can also check Alice's entitlement access via the API:

curl $KONNECT_API_URL/customers/{alice_customer_id}/entitlement-access \
  -H "Authorization: Bearer $KONNECT_TOKEN"

Use the customer ID (ULID) returned when you created the customer, not the customer key. The call returns Alice's configured entitlements (boolean, static, or metered). Pure usage-based rate cards without an explicit entitlement won't appear here; to verify usage, use the Konnect dashboard or the meter listing endpoint.

Step 9: Connect a Payment Provider

Once invoices are generating, you need a way to collect payment. Konnect Metering & Billing integrates with Stripe. You connect your Stripe account through the Konnect dashboard (a one-time OAuth flow), link billing customers to their Stripe profiles, and invoices sync automatically.

For the full Stripe setup walkthrough, see the Konnect Metering & Billing documentation.

Common Gotchas and Production Tips

Here are problems you'll likely hit when moving this to production, along with how to handle them:

Events only count after the subscription starts. This is the most common surprise. If a user sends 10,000 tokens worth of requests before you create their subscription, those tokens won't appear on any invoice. Always create the subscription before the user starts using the API (or at least on the same day).

Event deduplication is by id + source. If you retry a failed event with the same UUID and the same source string, the metering engine silently drops the duplicate. That makes retries safe, but it also means that reusing an ID for a different event causes that event to be dropped. Generate a fresh UUID for every distinct event, and reuse it only when retrying that same event.
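To make the retry semantics concrete, here is a minimal in-memory sketch of id + source deduplication. It is not how the metering engine is implemented; `DedupingMeter`, `new_event`, and the `ai-agent-api` source string are hypothetical names used only for illustration.

```python
import uuid


class DedupingMeter:
    """Toy stand-in for a metering engine that dedupes on (id, source)."""

    def __init__(self):
        self.seen = set()      # (id, source) pairs already accepted
        self.total_tokens = 0  # running aggregate across accepted events

    def ingest(self, event):
        key = (event["id"], event["source"])
        if key in self.seen:
            return False       # duplicate: dropped without raising an error
        self.seen.add(key)
        self.total_tokens += event["data"]["total_tokens"]
        return True


def new_event(total_tokens, source="ai-agent-api"):
    # One fresh UUID per distinct billable event (hypothetical event shape).
    return {
        "id": str(uuid.uuid4()),
        "source": source,
        "data": {"total_tokens": total_tokens},
    }


meter = DedupingMeter()
first = new_event(120)
meter.ingest(first)
meter.ingest(first)          # retry of the same event: ignored
meter.ingest(new_event(80))  # distinct event with its own UUID: counted
print(meter.total_tokens)    # 200
```

The retry reuses the original event object, so its ID is unchanged and the total stays correct. Building a new event for the retry would mint a new UUID and double-count the usage.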

Price per unit means per single token. There's no "per 1,000 tokens" setting. If you want to charge $0.002 per 1,000 tokens, enter 0.000002 as the unit price. This trips up almost everyone on the first try.
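The conversion is easy to get wrong by a factor of 1,000, so it can help to compute it rather than type it by hand. The sketch below uses Python's `decimal` module to avoid floating-point noise; `unit_price` and `line_item` are hypothetical helper names, not part of any Konnect SDK.

```python
from decimal import Decimal


def unit_price(price_per_thousand):
    """Convert a per-1,000-token price into the per-token price."""
    return Decimal(price_per_thousand) / Decimal(1000)


def line_item(tokens, price_per_token):
    """Amount owed for a metered quantity at a flat per-token rate."""
    return Decimal(tokens) * price_per_token


per_token = unit_price("0.002")
print(per_token)                                          # 0.000002
print(line_item(15_000, per_token) == Decimal("0.03"))    # True
print(line_item(247, Decimal("1")) == Decimal("247"))     # True
```

The second check reproduces the 15,000-token invoice example from Step 8, and the third shows the one-dollar-per-token test price, where the line item equals the token count.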

Use background tasks for event ingestion. In the tutorial code, track_usage runs synchronously inside the request handler, so every response waits on the billing call. In production, move event ingestion off the request path so that billing telemetry doesn't add latency to your API responses. FastAPI's BackgroundTasks is the simplest option; it runs the task in the same process after the response has been sent:

from fastapi import BackgroundTasks, Depends

@app.post("/api/generate", response_model=AIResponse)
def generate_text(
    request: GenerateRequest,
    background_tasks: BackgroundTasks,  # injected by FastAPI; no default needed
    user: dict = Depends(authenticate),
):
    response = openai_client.chat.completions.create(...)

    usage = {
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }

    # Send the billing event in the background
    background_tasks.add_task(
        track_usage,
        user_id=user["user_id"],
        model=response.model,
        endpoint="/api/generate",
        usage=usage,
    )

    return AIResponse(
        content=response.choices[0].message.content,
        model=response.model,
        usage=usage,
    )

Test with a recognizable unit price. During testing, set the unit price to 1 (one dollar per token) so that the math is obvious. If a request uses 247 tokens, the invoice should show $247. Once the flow works, switch to production pricing.

What You Learned

In this tutorial, you built a complete usage-based billing system for an AI agent. Here's what you covered:

  • Built an AI API with FastAPI that wraps OpenAI's chat completions

  • Added user authentication so every request is tied to a specific user

  • Sent CloudEvents with token usage data to the Konnect Metering & Billing API on every request

  • Created meters that aggregate raw events into per-user usage totals

  • Defined features and pricing plans that turn metered usage into billable line items

  • Onboarded customers and created subscriptions to start the billing lifecycle

  • Tested the full flow from API request to invoice generation

  • Connected a payment provider for invoice collection

The core pattern is portable: capture a usage event at the moment something billable happens, send it to a metering system, and let the billing layer handle pricing and invoicing. You can apply this same approach to any metered resource, not just LLM tokens: API calls, documents processed, images generated, compute minutes, or storage bytes. The CloudEvents format and the metering/billing pipeline work the same way.
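As a rough illustration of that portability, the sketch below builds CloudEvents 1.0 envelopes for three different resources with one helper. The `usage_event` function, the event type names, the `my-service` source, and the data fields are all hypothetical; only the envelope attributes (`specversion`, `id`, `source`, `type`, `subject`, `time`, `data`) come from the CloudEvents specification.

```python
import json
import uuid
from datetime import datetime, timezone


def usage_event(event_type, subject, data, source="my-service"):
    """Build a CloudEvents 1.0 envelope for any billable action."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),                         # unique per event
        "source": source,
        "type": event_type,
        "subject": subject,                              # who gets billed
        "time": datetime.now(timezone.utc).isoformat(),  # RFC 3339 timestamp
        "data": data,                                    # what the meter aggregates
    }


events = [
    usage_event("llm.tokens", "user-001", {"total_tokens": 247, "model": "gpt-4o"}),
    usage_event("image.generated", "user-002", {"count": 1, "size": "1024x1024"}),
    usage_event("storage.bytes", "user-001", {"bytes": 5_242_880}),
]
for event in events:
    print(json.dumps(event))
```

Only the `type` and `data` payload change between resources. Each would need its own meter configured to match the event type and aggregate the relevant field.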

Where to Go Next

Here are some things you'd add for a production deployment:

  • Entitlements and access control: Set monthly token limits per plan tier and enforce them at the API level. When a user hits their limit, return a 429 Too Many Requests response instead of burning through your LLM budget.

  • Multi-model pricing: Price input tokens and output tokens differently, and vary pricing by model. Output tokens typically cost several times more than input tokens (for GPT-4o, roughly 3x to 4x depending on the model version). Your rate cards should reflect that.

  • Real-time usage dashboards: Expose a /api/usage endpoint so users can check their own consumption. Use the entitlement access API to return their current-period totals and remaining allowances.

  • Webhook notifications: Send alerts when users hit 80% of their plan limit, when invoices are generated, or when payments fail.

  • Free tier with upgrade path: Create a plan with a first tier of 10,000 free tokens, then usage-based pricing after that. Graduated tiered pricing handles this automatically.

  • Dead-letter queue for failed events: If event ingestion fails, queue the event for retry rather than dropping it silently.
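To see what the free-tier bullet above means in numbers, here is a minimal sketch of graduated tiered pricing, where each slice of usage is charged at the rate of the tier it falls into. The tier boundaries and the per-token price are illustrative assumptions, and `graduated_charge` is a hypothetical helper; in practice the billing platform does this calculation for you.

```python
from decimal import Decimal

# Hypothetical tiers: (upper bound in tokens, price per token); None = no cap.
TIERS = [
    (10_000, Decimal("0")),       # first 10,000 tokens are free
    (None, Decimal("0.000002")),  # everything above is usage-priced
]


def graduated_charge(tokens, tiers=TIERS):
    """Charge each slice of usage at the rate of the tier it falls into."""
    total = Decimal("0")
    lower = 0  # lower bound of the current tier
    for upper, price in tiers:
        if tokens <= lower:
            break  # usage never reached this tier
        top = tokens if upper is None else min(tokens, upper)
        total += (top - lower) * price
        if upper is None:
            break  # uncapped tier is always the last one
        lower = upper
    return total


print(graduated_charge(8_000) == Decimal("0"))      # True: inside the free tier
print(graduated_charge(15_000) == Decimal("0.01"))  # True: 5,000 billable tokens
```

A user who stays under 10,000 tokens pays nothing, and a user at 15,000 tokens pays only for the 5,000 tokens above the free allowance.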
