Kat Laszlo for Tanso

Posted on Mar 25

Track Your AI Costs Per Customer in One API Call

#ai #saas #billing #devops

If you're building on top of AI APIs, you're probably calling OpenAI, Anthropic, Cohere, and a few others depending on the task. Each one bills differently. Costs shift when model prices change. And at some point, someone asks: "Are we making money on this customer?"

Without instrumentation, that question takes a day to answer. With it, it takes a query.

The problem

You ship a feature. It works. Customers use it. Then pricing changes, or a customer's usage spikes, or you realize token costs for one feature are eating your margin on that plan tier.

The issue isn't that you don't have cost data. Your AI provider invoices you every month. The issue is that the data isn't broken down by customer or by feature. You see what you spent. You don't see what each customer cost you.

To know your margin per customer, you need to capture cost at the moment the AI call happens, with context attached.

One event call

After each AI API response, send one event to Tanso:

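// `response` is the result of the AI call you just made; the
// usage.total_tokens field follows OpenAI's response shape.
const res = await fetch('http://localhost:8080/api/v1/client/events', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    // Test key shown for illustration; load your own key from the environment.
    'Authorization': 'Bearer sk_test_67d9fb04f0344036ba92ecc973f1445a'
  },
  body: JSON.stringify({
    eventIdempotencyKey: crypto.randomUUID(), // reuse this value if you retry the event
    eventName: 'chat_completion',
    customerReferenceId: 'cus_123',
    featureKey: 'ai_summarization',
    costInput: {
      model: 'gpt-4o',
      modelProvider: 'openai',
    },
    usageUnits: response.usage.total_tokens,
    costAmount: 0.05, // dollars, not cents
  })
});

// Log failures instead of silently dropping the event.
if (!res.ok) {
  console.error(`Tanso event failed with HTTP ${res.status}`);
}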

A few things worth noting:

eventIdempotencyKey: required. If the same event is sent twice (network retry, duplicate webhook), Tanso deduplicates silently. Generate a crypto.randomUUID() once per event and reuse that value on every retry, or derive the key from your own request ID if you need stable deduplication.

customerReferenceId: your customer's ID, whatever you already use. Tanso maps this to their account.

featureKey: ties the event to a specific feature on the customer's plan. This is how Tanso separates cost for ai_summarization from document_export from chat_completion.

costAmount: dollars, not cents. For 5 cents, pass 0.05, not 5.

costInput: model and provider metadata. Used for cost attribution when you want Tanso to calculate costs from usage rather than passing them explicitly.

That's it. No batch jobs. No ETL. No scraping provider invoices.
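
If you compute costAmount yourself, the field notes above can be folded into a small payload builder. This is a minimal sketch, not part of Tanso's SDK: the price table, the function names, and the requestId parameter are all hypothetical, the per-token prices are placeholders rather than real provider rates, and it assumes Tanso accepts any stable string as the idempotency key.

```javascript
// Hypothetical prices in USD per million tokens. Placeholders only,
// not real provider rates; load your own from config.
const PRICES_PER_MILLION_TOKENS = {
  'example-model': { input: 2.0, output: 8.0 },
};

// Convert token counts to a dollar amount (dollars, not cents).
function costInDollars(model, inputTokens, outputTokens) {
  const price = PRICES_PER_MILLION_TOKENS[model];
  if (!price) {
    throw new Error(`No price configured for model: ${model}`);
  }
  const dollars =
    (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
  // Round to 6 decimal places to trim floating-point noise.
  return Math.round(dollars * 1e6) / 1e6;
}

// Build the event body. The key is derived from your own request ID,
// so a retried request produces the same key and can be deduplicated.
function buildCostEvent({
  requestId, customerId, featureKey, model, provider, inputTokens, outputTokens,
}) {
  return {
    eventIdempotencyKey: `${requestId}:chat_completion`,
    eventName: 'chat_completion',
    customerReferenceId: customerId,
    featureKey,
    costInput: { model, modelProvider: provider },
    usageUnits: inputTokens + outputTokens,
    costAmount: costInDollars(model, inputTokens, outputTokens),
  };
}
```

Pass the result to JSON.stringify as the body of the POST. Deriving the key from a request ID rather than generating a fresh UUID on each attempt is what makes retries safe to deduplicate.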

What you get

Once events are flowing, Tanso aggregates them in real time by customer and feature.

Margin by customer. You know what cus_123 costs you this billing period, broken down by feature. Compare that to what they're paying. That's your margin.

Margin by feature. If ai_summarization on your Starter plan is consistently underwater, you see it before it shows up as a bad quarter.

Usage against plan limits. Tanso tracks usageUnits against the feature's limit on the customer's plan. The entitlement check API (POST /api/v1/client/entitlements/check) returns current usage and limit in real time, so you can gate access before a customer blows past their quota.

Automatic Stripe sync. Events flow through to Stripe Billing Meters and land on the customer's next invoice. You don't manage the billing side separately.
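
To make the margin arithmetic concrete, here is a rough local sketch of the kind of aggregation described above. It is not Tanso's implementation. It assumes you have a list of events shaped like the payload shown earlier, and that you already know the revenue for the customer's billing period; the function names are hypothetical.

```javascript
// Sum costAmount per feature for one customer.
function costByFeature(events, customerId) {
  const totals = {};
  for (const event of events) {
    if (event.customerReferenceId !== customerId) continue;
    totals[event.featureKey] = (totals[event.featureKey] ?? 0) + event.costAmount;
  }
  return totals;
}

// Margin = revenue for the period minus total AI cost for the period.
function customerMargin(events, customerId, revenue) {
  const perFeature = costByFeature(events, customerId);
  const cost = Object.values(perFeature).reduce((sum, c) => sum + c, 0);
  return {
    revenue,
    cost,
    margin: revenue - cost,
    // Undefined as a percentage when there is no revenue.
    marginPct: revenue > 0 ? (revenue - cost) / revenue : null,
    perFeature,
  };
}
```

For example, a customer paying 20 dollars whose events sum to 2 dollars of AI cost has an 18 dollar margin, or 90 percent. A negative margin on a single feature is the "underwater" case.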

Works from anywhere

You don't need a dedicated service to send events. Tanso's event API is an HTTP POST. That means it works from:

Your backend: add the call after any AI API response.
Your terminal: use curl for one-off testing or backfilling.
An AI agent: Tanso exposes an MCP server, so Claude Code and other agents can instrument their own AI calls natively, or query cost data while they work.

curl -X POST http://localhost:8080/api/v1/client/events \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk_test_67d9fb04f0344036ba92ecc973f1445a" \
  -d '{
    "eventIdempotencyKey": "550e8400-e29b-41d4-a716-446655440000",
    "eventName": "chat_completion",
    "customerReferenceId": "cus_123",
    "featureKey": "ai_summarization",
    "costInput": {
      "model": "gpt-4o",
      "modelProvider": "openai"
    },
    "usageUnits": 1847,
    "costAmount": 0.05
  }'
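
Because the event API is a plain HTTP POST and events are deduplicated by eventIdempotencyKey, a retry wrapper can stay small. The sketch below is one possible approach, not Tanso's client: the function name, the options, and the retry policy (retry network errors and 5xx responses, give up on 4xx) are assumptions. The fetch function is passed in so the wrapper can be tested with a fake.

```javascript
// Send one event, retrying transient failures with exponential backoff.
// The body is serialized once, so every attempt carries the same
// eventIdempotencyKey and duplicates can be dropped server-side.
async function sendEventWithRetry(event, {
  fetchFn, url, apiKey, maxAttempts = 3, baseDelayMs = 200,
}) {
  const body = JSON.stringify(event);
  let lastError;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetchFn(url, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${apiKey}`,
        },
        body,
      });
      if (res.ok) return res;
      if (res.status < 500) {
        // Client errors will not succeed on retry; fail immediately.
        throw Object.assign(new Error(`HTTP ${res.status}`), { retryable: false });
      }
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      if (err.retryable === false) throw err;
      lastError = err; // network error: treat as retryable
    }
    if (attempt < maxAttempts) {
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

In a backend you would pass the global fetch as fetchFn and the event URL shown above as url.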

Get started

Get an API key from the Tanso dashboard
Configure a feature on your plan (e.g., ai_summarization) with the expected pricing model
Add the event call after each AI API response in your code
Check the dashboard: cost and usage data appear immediately

If you want to verify an event landed correctly, the dashboard shows raw event history per customer. You can also check a customer's current entitlement state at any time via the entitlement API.
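
Once you have a customer's current usage and limit from the entitlement check, the gating decision itself is a few lines. This sketch covers only that decision. The shape of the entitlement object (usage and limit fields) is an assumption for illustration, not the documented response format, and treating a missing limit as unlimited is a design choice you may not want.

```javascript
// Decide whether a customer can consume `requestedUnits` more units.
// `entitlement` is an ASSUMED shape: { usage: number, limit: number | null }.
function canConsume(entitlement, requestedUnits) {
  if (entitlement.limit == null) {
    return true; // assumption: no limit configured means unlimited
  }
  return entitlement.usage + requestedUnits <= entitlement.limit;
}

// Units left before the limit is reached (never negative).
function remainingUnits(entitlement) {
  if (entitlement.limit == null) return Infinity;
  return Math.max(0, entitlement.limit - entitlement.usage);
}
```

Checking before the AI call, using an estimate of the tokens it will consume, is what lets you gate access before the quota is exceeded rather than after.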

Tanso is built for teams shipping on top of multiple AI APIs who need to know their economics at the customer level, not just at the invoice level. The event API is the foundation. Everything else (entitlement checks, metered billing, plan enforcement) runs on the same data.

Would love to hear if this is helpful or not. Happy to chat!

The Docs:

tanso-core.readme.io

Top comments (2)

Henry Godnick • Apr 13

This is the part most AI teams miss. The invoice is the easy number. The hard part is tying spend to customer, feature, and time so you can see margin drift before it becomes a surprise. Instrument-at-call-time is the right default.
