
Void Stitch


LLM Cost Attribution per Request: Track OpenAI and Anthropic Spend by Team and Feature

  • Per-request attribution starts with five fields on every call: provider, model, input tokens, output tokens, and ownership tags such as team, feature, and customer.
  • A monthly vendor bill cannot explain why one feature, one tenant, or one prompt template suddenly became expensive. Request-level math can.
  • As of June 8, 2026, OpenAI lists GPT-5.4 mini at $0.75 per 1M input tokens and $4.50 per 1M output tokens, while Anthropic lists Claude Sonnet 4 at $3 and $15 respectively.
  • Gateway logs are useful, but they rarely solve AI cost tracking per feature unless you enrich them with business context and retry metadata.
  • The practical operating model is simple: calculate cost on every request, attach ownership dimensions, then roll the data up into team, feature, and customer views.

If you are searching for "LLM cost attribution per request," you are usually already past the basic billing problem. You can see your OpenAI or Anthropic invoice, but you cannot answer the questions finance and engineering actually care about: which feature drove the spike, which team owns it, which customers are unprofitable, and which prompt or model change caused the jump.

That is why per-request attribution matters. It turns AI spend from a monthly surprise into an operational metric you can act on the same day.

Why LLM cost attribution per request matters now

According to the FinOps Foundation's 2025 State of FinOps report, 63% of respondents now manage AI spending, up from 31% the year before. That jump is the real signal. AI cost is no longer a side bucket inside cloud spend. It is becoming a first-class FinOps workload.

For teams spending $5,000 to $50,000 per month on LLM APIs, averages break down quickly. A support assistant, an internal coding copilot, and a customer-facing generation feature can all hit the same vendor account while having completely different margins, latency targets, and prompt shapes. If you only look at total spend by provider, you lose the unit economics.

Per-request attribution gives you a usable denominator. Instead of asking, "What did we spend on OpenAI last month?" you can ask, "What did one support resolution cost?" or "What is the median AI cost per checkout fraud review?" Those are the questions that change product decisions.

The minimum schema for AI cost tracking per feature

You do not need a giant data platform to start. You do need a disciplined event schema.

At minimum, each LLM request record should include:

  • timestamp
  • provider and model
  • input_tokens
  • cached_input_tokens, if the provider supports caching
  • output_tokens
  • request_id or trace ID
  • team
  • feature
  • customer_id or workspace ID
  • environment such as prod or staging
  • status such as success, timeout, retry, or fallback

That schema is what makes AI cost tracking per feature possible. Without feature, you only have billing. Without team, you cannot allocate ownership. Without customer_id, you cannot do margin analysis. Without status, retries silently inflate cost and look like normal demand.

A useful mental model is that the request event should answer two questions at once: how much did this call cost, and who should own that cost?
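As a rough illustration, the field list above could be captured in a small Python dataclass. This is a minimal sketch, not a prescribed format: the class name `LLMRequestEvent`, the helper `missing_ownership`, and the model identifier string are all assumptions made for the example.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class LLMRequestEvent:
    """One record per LLM call; field names mirror the schema list above."""
    timestamp: datetime
    provider: str                  # e.g. "openai" or "anthropic"
    model: str
    input_tokens: int              # fresh (uncached) input tokens
    output_tokens: int
    request_id: str                # or a trace ID
    team: str
    feature: str
    customer_id: str               # or a workspace ID
    environment: str               # "prod", "staging", ...
    status: str                    # "success", "timeout", "retry", "fallback"
    cached_input_tokens: int = 0   # stays 0 when the provider has no caching

    def missing_ownership(self) -> list[str]:
        """Return the ownership fields left empty, so tagging gaps are visible."""
        return [name for name in ("team", "feature", "customer_id")
                if not getattr(self, name)]


event = LLMRequestEvent(
    timestamp=datetime(2026, 6, 8, 12, 0, tzinfo=timezone.utc),
    provider="openai",
    model="gpt-5.4-mini",          # hypothetical identifier string
    input_tokens=8_000,
    output_tokens=1_200,
    request_id="req-123",
    team="support",
    feature="",                    # deliberately untagged for the demo
    customer_id="acme",
    environment="prod",
    status="success",
    cached_input_tokens=2_000,
)

row = asdict(event)                # plain dict, ready for a log line or warehouse row
untagged = event.missing_ownership()
print(untagged)
```

Rejecting or flagging events where `missing_ownership()` is non-empty is one way to keep untagged spend from piling up unnoticed.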

How to calculate OpenAI cost attribution per request

The core formula is straightforward:

request_cost =
  (input_tokens / 1_000_000 * input_rate) +
  (cached_input_tokens / 1_000_000 * cached_input_rate) +
  (output_tokens / 1_000_000 * output_rate) +
  any tool or search fees

The hard part is not the math. The hard part is storing the right rates for the right provider and model version on the day the request happened.

As of June 8, 2026, OpenAI's pricing page lists GPT-5.4 mini at:

  • Input: $0.75 per 1M tokens
  • Cached input: $0.075 per 1M tokens
  • Output: $4.50 per 1M tokens

Now take a realistic request:

  • 8,000 input tokens
  • 2,000 cached input tokens
  • 1,200 output tokens

The cost is:

  • Input: 8,000 / 1,000,000 * 0.75 = $0.006
  • Cached input: 2,000 / 1,000,000 * 0.075 = $0.00015
  • Output: 1,200 / 1,000,000 * 4.50 = $0.0054

Total per-request LLM cost: $0.01155

That looks small until you multiply it. At 10,000 requests per day, that single pattern becomes about $115.50/day, or roughly $3,465 over a 30-day month.
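The worked example above can be reproduced in a few lines of Python. This is a sketch under a couple of assumptions: the function name `request_cost` and the dictionary layout are illustrative, and `Decimal` is used instead of floats so the small per-request amounts add up exactly.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# USD per 1M tokens, copied from the GPT-5.4 mini figures quoted above.
GPT_5_4_MINI_RATES = {
    "input": Decimal("0.75"),
    "cached_input": Decimal("0.075"),
    "output": Decimal("4.50"),
}


def request_cost(input_tokens, cached_input_tokens, output_tokens, rates,
                 extra_fees=Decimal("0")):
    """Apply the per-request formula; extra_fees covers tool or search charges."""
    return (
        Decimal(input_tokens) / MILLION * rates["input"]
        + Decimal(cached_input_tokens) / MILLION * rates["cached_input"]
        + Decimal(output_tokens) / MILLION * rates["output"]
        + extra_fees
    )


cost = request_cost(8_000, 2_000, 1_200, GPT_5_4_MINI_RATES)
daily = cost * 10_000      # 10,000 requests per day
monthly = daily * 30       # 30-day month
print(cost, daily, monthly)
```

The three values correspond to the $0.01155 per request, $115.50 per day, and $3,465 per month figures above.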

This is where OpenAI cost attribution usually fails in practice. Teams log tokens, but they do not persist the calculated cost alongside the trace, so later dashboards have to reconstruct historical spend against changed pricing tables. That is brittle. Store the computed request cost at ingestion time.
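One way to make this concrete is a rate card keyed by effective date, with the cost and the rates used written onto the event at ingestion. The sketch below is hypothetical throughout: the effective dates, the second (cheaper) price row, the model identifier string, and the names `RATE_CARD`, `rates_on`, and `priced_event` are invented for illustration and do not reflect any real price change.

```python
from bisect import bisect_right
from datetime import date
from decimal import Decimal

# Hypothetical rate card: (effective_from, USD per 1M input, USD per 1M output).
# Dates and the second row are invented purely to illustrate a price change.
RATE_CARD = {
    ("openai", "gpt-5.4-mini"): [
        (date(2026, 1, 1), Decimal("0.75"), Decimal("4.50")),
        (date(2026, 9, 1), Decimal("0.60"), Decimal("4.00")),
    ],
}


def rates_on(provider, model, day):
    """Return the (input, output) rates in force on `day`."""
    versions = RATE_CARD[(provider, model)]          # sorted by effective date
    starts = [version[0] for version in versions]
    idx = bisect_right(starts, day) - 1              # latest version starting on or before day
    if idx < 0:
        raise LookupError(f"no rate for {provider}/{model} on {day}")
    _, input_rate, output_rate = versions[idx]
    return input_rate, output_rate


def priced_event(provider, model, day, input_tokens, output_tokens):
    """Compute the cost once, at ingestion, and keep it on the record."""
    input_rate, output_rate = rates_on(provider, model, day)
    cost = (Decimal(input_tokens) * input_rate
            + Decimal(output_tokens) * output_rate) / Decimal(1_000_000)
    return {
        "provider": provider, "model": model, "date": day,
        "input_tokens": input_tokens, "output_tokens": output_tokens,
        "input_rate": input_rate, "output_rate": output_rate,  # audit trail
        "request_cost": cost,
    }


record = priced_event("openai", "gpt-5.4-mini", date(2026, 6, 8), 8_000, 1_200)
print(record["request_cost"])
```

Storing the rates next to the cost means a later dashboard never has to guess which price sheet applied to an old request.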

How Anthropic spend tracking changes with caching and long context

Anthropic spend tracking follows the same basic pattern, but there are two details worth watching closely: caching modifiers and long-context pricing.

As of June 8, 2026, Anthropic's pricing documentation lists Claude Sonnet 4 at $3 per 1M input tokens and $15 per 1M output tokens. Cache reads are 10% of base input pricing, and 5-minute cache writes are 1.25x base input pricing.

For a standard request with 8,000 input tokens and 1,200 output tokens, the math is:

  • Input: 8,000 / 1,000,000 * 3 = $0.024
  • Output: 1,200 / 1,000,000 * 15 = $0.018

Total per-request LLM cost: $0.042

At 2,000 requests per day, that is $84/day, or about $2,520 in 30 days.

The bigger trap is long context. Anthropic documents that when Claude Sonnet 4 requests exceed 200,000 input tokens with the 1M context window enabled, input pricing rises from $3 to $6 per 1M tokens and output pricing rises from $15 to $22.50 per 1M tokens.

That means a single oversized request with 250,000 input tokens and 2,000 output tokens costs:

  • Input: 250,000 / 1,000,000 * 6 = $1.50
  • Output: 2,000 / 1,000,000 * 22.50 = $0.045

Total: $1.545 for one request

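If your attribution model ignores context tier changes, you understate the cost of this request by about half: at the standard $3 and $15 rates, the same 250,000 input and 2,000 output tokens would be priced at $0.78 instead of $1.545.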
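A tier-aware cost function for the Claude Sonnet 4 figures above might look like the sketch below. Two assumptions are baked in and should be checked against Anthropic's documentation before relying on them: that the 200,000-token threshold is evaluated against total input including cached tokens, and that the cache multipliers apply to whichever tier's input rate is in force. The function and constant names are illustrative.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens

# USD per 1M tokens, from the Claude Sonnet 4 figures quoted above.
STANDARD = {"input": Decimal("3"), "output": Decimal("15")}
LONG_CONTEXT = {"input": Decimal("6"), "output": Decimal("22.50")}
CACHE_READ_MULTIPLIER = Decimal("0.10")      # cache reads: 10% of base input
CACHE_WRITE_5M_MULTIPLIER = Decimal("1.25")  # 5-minute cache writes


def sonnet4_request_cost(input_tokens, output_tokens,
                         cache_read_tokens=0, cache_write_tokens=0):
    """Price one request, switching tiers when total input exceeds 200K."""
    # Assumption: the tier is chosen from fresh + cached input combined.
    total_input = input_tokens + cache_read_tokens + cache_write_tokens
    rates = LONG_CONTEXT if total_input > LONG_CONTEXT_THRESHOLD else STANDARD
    input_rate = rates["input"]
    return (
        Decimal(input_tokens) * input_rate
        + Decimal(cache_read_tokens) * input_rate * CACHE_READ_MULTIPLIER
        + Decimal(cache_write_tokens) * input_rate * CACHE_WRITE_5M_MULTIPLIER
        + Decimal(output_tokens) * rates["output"]
    ) / MILLION


standard = sonnet4_request_cost(8_000, 1_200)
oversized = sonnet4_request_cost(250_000, 2_000)
print(standard, oversized)
```

The two calls reproduce the $0.042 standard request and the $1.545 long-context request worked through above.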

Build-your-own vs gateway logs vs a cost auditor

Most teams end up choosing between three patterns.

| Approach | What you get | Strength | Weak spot |
| --- | --- | --- | --- |
| Build your own pipeline | Full event schema, custom ownership tags, warehouse joins, margin analysis | Best control and best fit for internal FinOps workflows | Highest setup and maintenance cost |
| Gateway logs only | Fast visibility into provider, model, tokens, latency, and raw request traces | Good first step for debugging and baseline metering | Usually weak on team, feature, customer ownership, retries, and chargeback views |
| Cost auditor layer | Request-level breakdown with cost math and attribution logic already applied | Fastest path to per-request visibility for engineering and FinOps | Still depends on good upstream trace quality and tagging discipline |

For most teams, the right sequence is not ideological. Start with gateway instrumentation if you have none, then add attribution fields, then decide whether you want to maintain the whole cost model yourself. The mistake is assuming gateway logs alone equal FinOps for AI. They do not unless they answer ownership questions.

How to track LLM API costs by team, feature, and customer

Once request-level cost exists, the rollups are simple:

  • Team view: sum request_cost grouped by team
  • Feature view: sum request_cost grouped by feature
  • Customer view: sum request_cost grouped by customer_id
  • Margin view: divide AI cost by the business event tied to the request, such as tickets resolved, reports generated, or revenue from that tenant

This is what "track LLM API costs by team" actually means in practice. It is not a provider dashboard. It is a join between request telemetry and business metadata.
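The rollups listed above reduce to a group-by over priced events. Here is a minimal sketch in plain Python, assuming each event already carries a stored `request_cost`; the sample events, the ticket count, and the helper names `rollup` and `unit_cost` are made up for the example. In a warehouse the same thing would be a `GROUP BY`.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical priced events; in practice these come from your request log.
events = [
    {"team": "support", "feature": "ticket_summary", "customer_id": "acme",
     "request_cost": Decimal("0.01155")},
    {"team": "support", "feature": "ticket_summary", "customer_id": "globex",
     "request_cost": Decimal("0.042")},
    {"team": "growth", "feature": "report_gen", "customer_id": "acme",
     "request_cost": Decimal("1.545")},
]


def rollup(events, dimension):
    """Sum request_cost grouped by one ownership dimension."""
    totals = defaultdict(Decimal)          # Decimal() starts each group at 0
    for event in events:
        totals[event[dimension]] += event["request_cost"]
    return dict(totals)


def unit_cost(total_cost, business_events):
    """Margin view: AI cost divided by a count of business events."""
    if business_events <= 0:
        raise ValueError("business_events must be positive")
    return total_cost / business_events


by_team = rollup(events, "team")
by_feature = rollup(events, "feature")
by_customer = rollup(events, "customer_id")

# Assume 3 tickets were resolved by the ticket_summary feature (made-up count).
cost_per_ticket = unit_cost(by_feature["ticket_summary"], 3)
print(by_team, by_customer, cost_per_ticket)
```

The business-event count is the part that has to come from outside the request log, which is why this is a join and not a provider dashboard.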

A useful operating pattern is to calculate three metrics every day:

  1. Cost per request
  2. Cost per successful business action
  3. Cost per active customer or workspace

That lets engineering see technical efficiency and lets FinOps see allocation. If a feature's median request cost stays flat but cost per successful action doubles, each successful action is taking more calls, so the cause is probably retries, low conversion, or users re-prompting rather than vendor pricing.
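To make the three daily metrics concrete, here is a small sketch. The sample events and the `successful_actions` count are invented, and the function name `daily_metrics` is an assumption; the point is only to show how retries leave cost per request flat while doubling cost per successful action.

```python
from decimal import Decimal


def daily_metrics(events, successful_actions):
    """Compute the three daily metrics from one day's priced events."""
    if not events or successful_actions <= 0:
        raise ValueError("need at least one event and one successful action")
    total = sum((e["request_cost"] for e in events), Decimal("0"))
    active_customers = {e["customer_id"] for e in events}
    return {
        "cost_per_request": total / len(events),
        "cost_per_successful_action": total / successful_actions,
        "cost_per_active_customer": total / len(active_customers),
    }


# Hypothetical day: every success needed one retry first, and retries still cost money.
day = [
    {"customer_id": "acme", "status": "retry", "request_cost": Decimal("0.01")},
    {"customer_id": "acme", "status": "success", "request_cost": Decimal("0.01")},
    {"customer_id": "globex", "status": "retry", "request_cost": Decimal("0.01")},
    {"customer_id": "globex", "status": "success", "request_cost": Decimal("0.01")},
]

metrics = daily_metrics(day, successful_actions=2)
print(metrics)
```

In this made-up day each request costs the same, yet every successful action costs twice as much as a single request because the retries are counted in the total.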

Common mistakes in OpenAI cost attribution and AI cost tracking per feature

The most common failure modes are boring, but expensive:

First, teams attribute by API key only. That works for a single prototype, but it breaks as soon as multiple services or tenants share infrastructure.

Second, they ignore non-success paths. Timeouts, fallbacks, and retries still cost money. If those events are missing from the ledger, your unit cost looks healthier than reality.

Third, they treat prompt caching as a nice-to-have metric instead of part of the billing formula. Cached-input discounts can materially change per-request cost.

Fourth, they reconstruct historical pricing from today's price sheet. Provider pricing changes over time, so the computed cost should be stored with the request event, not recalculated months later unless you also version the rate card.

Finally, they stop at dashboards. Good attribution should trigger action: alerts on sudden request-cost inflation, reports on top-cost features, and weekly review of which customers or internal workflows are drifting out of range.
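An alert on request-cost inflation can start as a simple comparison of medians. The sketch below is one possible check, not a recommended policy: the 1.5x threshold, the sample costs, and the function name `cost_inflation_alert` are arbitrary assumptions chosen for illustration.

```python
from statistics import median


def cost_inflation_alert(baseline_costs, today_costs, threshold=1.5):
    """Return True when today's median request cost is >= threshold x the baseline median."""
    baseline = median(baseline_costs)
    if baseline == 0:
        return False                      # no meaningful baseline to compare against
    return median(today_costs) / baseline >= threshold


# Hypothetical per-request costs in USD for one feature.
last_week = [0.010, 0.012, 0.011]
quiet_day = [0.011, 0.012, 0.010]
spiky_day = [0.020, 0.022, 0.019]

print(cost_inflation_alert(last_week, quiet_day))
print(cost_inflation_alert(last_week, spiky_day))
```

Running the check per feature or per customer, using the same ownership tags as the rollups, keeps the alert pointed at someone who can act on it.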

Summary

LLM cost attribution per request is the control point that makes FinOps for AI operational. The pattern is simple: capture token usage at request time, apply the right model rates, attach team and feature ownership, and store the computed cost as an event you can roll up later.

If you want a fast sanity check before building the full pipeline, the free auditor at agentcolony.org/auditor lets you paste a gateway trace and inspect the per-request cost breakdown. That is often enough to see whether your issue is model choice, prompt size, retries, or missing attribution tags.

FAQ

What is LLM cost attribution per request?

It is the practice of calculating the exact cost of each model call from token usage, rate cards, and any extra tool fees, then attaching that cost to ownership fields like team, feature, and customer.

How do I track LLM API costs by team?

Add a team field to every request event at the point where the call is made or routed. Compute request_cost on ingestion, then group spend by team in your dashboard or warehouse.

Can gateway logs alone handle OpenAI cost attribution?

They can cover the raw token and model layer, which is useful, but they usually do not include ownership, retry semantics, or business context. For serious allocation, you need enrichment on top of gateway data.

How should I handle cached context in per-request LLM cost?

Store cached input tokens separately from fresh input tokens and price them using the provider's cached-input rate. If you merge them into one bucket, your cost model will be wrong.

What is the difference between per-request cost and monthly vendor billing?

Monthly billing tells you how much you spent in total. Per-request cost tells you why you spent it, who owns it, and which feature or customer drove the change.
