Sol

Posted on Jun 8

LLM Cost Attribution per Request: Track OpenAI and Anthropic Spend by Team and Feature

#finops #devops #openai #anthropic

Per-request attribution starts with five fields on every call: provider, model, input tokens, output tokens, and ownership tags such as team, feature, and customer.
A monthly vendor bill cannot explain why one feature, one tenant, or one prompt template suddenly became expensive. Request-level math can.
As of June 8, 2026, OpenAI lists GPT-5.4 mini at $0.75 per 1M input tokens and $4.50 per 1M output tokens, while Anthropic lists Claude Sonnet 4 at $3 per 1M input tokens and $15 per 1M output tokens.
Gateway logs are useful, but they rarely solve AI cost tracking per feature unless you enrich them with business context and retry metadata.
The practical operating model is simple: calculate cost on every request, attach ownership dimensions, then roll the data up into team, feature, and customer views.

If you are searching for "LLM cost attribution per request," you are usually already past the basic billing problem. You can see your OpenAI or Anthropic invoice, but you cannot answer the questions finance and engineering actually care about: which feature drove the spike, which team owns it, which customers are unprofitable, and which prompt or model change caused the jump.

That is why per-request attribution matters. It turns AI spend from a monthly surprise into an operational metric you can act on the same day.

Why LLM cost attribution per request matters now

According to the FinOps Foundation's 2025 State of FinOps report, 63% of respondents now manage AI spending, up from 31% the year before. That jump is the real signal. AI cost is no longer a side bucket inside cloud spend. It is becoming a first-class FinOps workload.

For teams spending $5,000 to $50,000 per month on LLM APIs, averages break down quickly. A support assistant, an internal coding copilot, and a customer-facing generation feature can all hit the same vendor account while having completely different margins, latency targets, and prompt shapes. If you only look at total spend by provider, you lose the unit economics.

Per-request attribution gives you a usable denominator. Instead of asking, "What did we spend on OpenAI last month?" you can ask, "What did one support resolution cost?" or "What is the median AI cost per checkout fraud review?" Those are the questions that change product decisions.

The minimum schema for AI cost tracking per feature

You do not need a giant data platform to start. You do need a disciplined event schema.

At minimum, each LLM request record should include:

timestamp
provider and model
input_tokens
cached_input_tokens, if the provider supports caching
output_tokens
request_id or trace ID
team
feature
customer_id or workspace ID
environment such as prod or staging
status such as success, timeout, retry, or fallback

That schema is what makes AI cost tracking per feature possible. Without feature, you only have billing. Without team, you cannot allocate ownership. Without customer_id, you cannot do margin analysis. Without status, retries silently inflate cost and look like normal demand.

A useful mental model is that the request event should answer two questions at once: how much did this call cost, and who should own that cost?
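As a rough sketch, the schema above could be expressed as a small Python dataclass. The class name, the `missing_ownership` helper, and the sample values (including the model string and request ID) are illustrative assumptions, not part of any provider SDK.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class LLMRequestEvent:
    """One row per LLM call; fields mirror the minimum schema listed above."""
    timestamp: datetime
    provider: str                  # e.g. "openai" or "anthropic"
    model: str
    input_tokens: int              # uncached input tokens
    output_tokens: int
    request_id: str                # request or trace ID
    team: str
    feature: str
    customer_id: str               # or workspace ID
    environment: str               # "prod", "staging", ...
    status: str                    # "success", "timeout", "retry", "fallback"
    cached_input_tokens: int = 0   # stays 0 when the provider has no caching
    request_cost: Optional[float] = None  # USD, filled in at ingestion time


OWNERSHIP_FIELDS = ("team", "feature", "customer_id")


def missing_ownership(event: LLMRequestEvent) -> list[str]:
    """Return the ownership fields that are empty on this event."""
    return [name for name in OWNERSHIP_FIELDS if not getattr(event, name)]


# Hypothetical sample event; the model string is illustrative only.
event = LLMRequestEvent(
    timestamp=datetime(2026, 6, 8, 12, 0, tzinfo=timezone.utc),
    provider="openai",
    model="gpt-5.4-mini",
    input_tokens=8_000,
    output_tokens=1_200,
    request_id="req-123",
    team="support",
    feature="ticket_summary",
    customer_id="",                # missing on purpose
    environment="prod",
    status="success",
    cached_input_tokens=2_000,
)
print(missing_ownership(event))
```

Rejecting or flagging events with missing ownership fields at write time is usually cheaper than trying to backfill them later.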

How to calculate OpenAI cost attribution per request

The core formula is straightforward:

request_cost =
  (input_tokens / 1_000_000 * input_rate) +
  (cached_input_tokens / 1_000_000 * cached_input_rate) +
  (output_tokens / 1_000_000 * output_rate) +
  tool_or_search_fees

The hard part is not the math. The hard part is storing the right rates for the right provider and model version on the day the request happened.

As of June 8, 2026, OpenAI's pricing page lists GPT-5.4 mini at:

Input: $0.75 per 1M tokens
Cached input: $0.075 per 1M tokens
Output: $4.50 per 1M tokens

Now take a realistic request:

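8,000 uncached input tokens
2,000 cached input tokens
1,200 output tokens

Here input tokens means the uncached portion only. If your provider reports cached tokens inside its input total, subtract them first so they are not billed twice. The cost is: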

Input: 8,000 / 1,000,000 * 0.75 = $0.006
Cached input: 2,000 / 1,000,000 * 0.075 = $0.00015
Output: 1,200 / 1,000,000 * 4.50 = $0.0054

Total per-request LLM cost: $0.01155

That looks small until you multiply it. At 10,000 requests per day, that single pattern becomes about $115.50/day, or roughly $3,465 over a 30-day month.
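The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the `Rates` class and `request_cost` function are names invented here, the rates are the ones quoted in this article, and `input_tokens` is assumed to exclude cached tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens."""
    input: float
    cached_input: float
    output: float


def request_cost(input_tokens, cached_input_tokens, output_tokens,
                 rates, extra_fees=0.0):
    """Apply the per-request formula; input_tokens must exclude cached tokens."""
    per_million = 1_000_000
    return (
        input_tokens / per_million * rates.input
        + cached_input_tokens / per_million * rates.cached_input
        + output_tokens / per_million * rates.output
        + extra_fees  # tool or search fees, if any
    )


# Rates as quoted in this article for GPT-5.4 mini.
GPT_5_4_MINI = Rates(input=0.75, cached_input=0.075, output=4.50)

cost = request_cost(8_000, 2_000, 1_200, GPT_5_4_MINI)
daily = cost * 10_000          # 10,000 requests per day
monthly = daily * 30           # 30-day month
print(f"per request: ${cost:.5f}")
print(f"per day:     ${daily:.2f}")
print(f"per month:   ${monthly:.2f}")
```

Running it should reproduce the $0.01155 per-request figure and the daily and monthly totals above.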

This is where OpenAI cost attribution usually fails in practice. Teams log tokens, but they do not persist the calculated cost alongside the trace, so later dashboards have to reconstruct historical spend against changed pricing tables. That is brittle. Store the computed request cost at ingestion time.
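One way to keep historical cost stable is a dated rate card that is consulted once, at ingestion. The sketch below assumes a hypothetical provider and model with made-up rates and effective dates, purely to show the lookup; `rates_on` and `price_event` are names invented for this example.

```python
from bisect import bisect_right
from datetime import date

# (provider, model) -> list of (effective_from, rates in USD per 1M tokens),
# sorted by date. All values here are invented for illustration.
RATE_CARD = {
    ("example-provider", "example-model"): [
        (date(2026, 1, 1), {"input": 1.00, "output": 4.00}),
        (date(2026, 4, 1), {"input": 0.80, "output": 3.20}),
    ],
}


def rates_on(provider, model, day):
    """Return (effective_from, rates) for the version in force on `day`."""
    versions = RATE_CARD[(provider, model)]
    starts = [start for start, _ in versions]
    idx = bisect_right(starts, day) - 1
    if idx < 0:
        raise LookupError(f"no rates for {provider}/{model} on {day}")
    return versions[idx]


def price_event(event):
    """Return a copy of the event with cost and rate version attached."""
    effective_from, rates = rates_on(event["provider"], event["model"], event["day"])
    cost = (
        event["input_tokens"] / 1_000_000 * rates["input"]
        + event["output_tokens"] / 1_000_000 * rates["output"]
    )
    return {**event, "request_cost": cost, "rate_effective_from": effective_from}


priced = price_event({
    "provider": "example-provider",
    "model": "example-model",
    "day": date(2026, 2, 15),
    "input_tokens": 1_000_000,
    "output_tokens": 0,
})
print(priced["request_cost"], priced["rate_effective_from"])
```

Storing `rate_effective_from` next to `request_cost` makes it possible to audit a number later without re-deriving which price sheet applied.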

How Anthropic spend tracking changes with caching and long context

Anthropic spend tracking follows the same basic pattern, but there are two details worth watching closely: caching modifiers and long-context pricing.

Anthropic's pricing documentation currently lists Claude Sonnet 4 at $3 per 1M input tokens and $15 per 1M output tokens. Cache reads are 10% of base input pricing, and 5-minute cache writes are 1.25x base input pricing.

For a standard request with 8,000 input tokens and 1,200 output tokens, the math is:

Input: 8,000 / 1,000,000 * 3 = $0.024
Output: 1,200 / 1,000,000 * 15 = $0.018

Total per-request LLM cost: $0.042

At 2,000 requests per day, that is $84/day, or about $2,520 in 30 days.

The bigger trap is long context. Anthropic documents that when Claude Sonnet 4 requests exceed 200,000 input tokens with the 1M context window enabled, input pricing rises from $3 to $6 per 1M tokens and output pricing rises from $15 to $22.50 per 1M tokens.

That means a single oversized request with 250,000 input tokens and 2,000 output tokens costs:

Input: 250,000 / 1,000,000 * 6 = $1.50
Output: 2,000 / 1,000,000 * 22.50 = $0.045

Total: $1.545 for one request

If your attribution model ignores context tier changes, it would price that request at standard rates, $0.78 instead of $1.545, and understate the true cost of the workflow by roughly half.
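A tier-aware cost function might look like the sketch below. It uses the Claude Sonnet 4 rates quoted in this article and assumes the whole request is billed at the long-context tier once input exceeds 200,000 tokens, which matches the worked example above. Cache modifiers are left out for brevity, and the function names are invented here.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens

# USD per 1M tokens, as quoted in this article for Claude Sonnet 4.
STANDARD = {"input": 3.00, "output": 15.00}
LONG_CONTEXT = {"input": 6.00, "output": 22.50}


def cost_at(tier, input_tokens, output_tokens):
    """Price a request at one fixed tier."""
    return (
        input_tokens / 1_000_000 * tier["input"]
        + output_tokens / 1_000_000 * tier["output"]
    )


def sonnet4_cost(input_tokens, output_tokens):
    """Pick the tier from input size, then price the whole request at it."""
    tier = LONG_CONTEXT if input_tokens > LONG_CONTEXT_THRESHOLD else STANDARD
    return cost_at(tier, input_tokens, output_tokens)


standard_request = sonnet4_cost(8_000, 1_200)
oversized_request = sonnet4_cost(250_000, 2_000)
tier_blind = cost_at(STANDARD, 250_000, 2_000)  # what a flat-rate model would log

print(f"standard:   ${standard_request:.3f}")
print(f"oversized:  ${oversized_request:.3f}")
print(f"tier-blind: ${tier_blind:.3f}")
```

Comparing `oversized_request` with `tier_blind` shows how far a flat-rate model drifts on a single long-context call.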

Build-your-own vs gateway logs vs a cost auditor

Most teams end up choosing between three patterns.

Approach	What you get	Strength	Weak spot
Build your own pipeline	Full event schema, custom ownership tags, warehouse joins, margin analysis	Best control and best fit for internal FinOps workflows	Highest setup and maintenance cost
Gateway logs only	Fast visibility into provider, model, tokens, latency, and raw request traces	Good first step for debugging and baseline metering	Usually weak on team, feature, customer ownership, retries, and chargeback views
Cost auditor layer	Request-level breakdown with cost math and attribution logic already applied	Fastest path to per-request visibility for engineering and FinOps	Still depends on good upstream trace quality and tagging discipline

For most teams, the right sequence is not ideological. Start with gateway instrumentation if you have none, then add attribution fields, then decide whether you want to maintain the whole cost model yourself. The mistake is assuming gateway logs alone equal FinOps for AI. They do not unless they answer ownership questions.

How to track LLM API costs by team, feature, and customer

Once request-level cost exists, the rollups are simple:

Team view: sum request_cost grouped by team
Feature view: sum request_cost grouped by feature
Customer view: sum request_cost grouped by customer_id
Margin view: divide AI cost by the business event tied to the request, such as tickets resolved, reports generated, or revenue from that tenant

This is what "track LLM API costs by team" actually means in practice. It is not a provider dashboard. It is a join between request telemetry and business metadata.
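The rollups above amount to a group-by over priced events. Here is a minimal sketch using only the standard library; the events, customer names, and ticket counts are made-up sample data, and `rollup` and `cost_per_unit` are names invented for this example.

```python
from collections import defaultdict


def rollup(events, key):
    """Sum request_cost grouped by one ownership dimension."""
    totals = defaultdict(float)
    for event in events:
        totals[event[key]] += event["request_cost"]
    return dict(totals)


def cost_per_unit(cost_by_key, units_by_key):
    """Margin view: AI cost divided by a business count, skipping zero counts."""
    return {
        key: cost / units_by_key[key]
        for key, cost in cost_by_key.items()
        if units_by_key.get(key)
    }


# Hypothetical priced events.
events = [
    {"team": "support", "feature": "ticket_summary", "customer_id": "acme", "request_cost": 0.042},
    {"team": "support", "feature": "ticket_summary", "customer_id": "globex", "request_cost": 0.042},
    {"team": "growth", "feature": "report_gen", "customer_id": "acme", "request_cost": 0.01155},
]

by_team = rollup(events, "team")
by_feature = rollup(events, "feature")
by_customer = rollup(events, "customer_id")

# Hypothetical business metadata: tickets resolved per feature.
per_ticket = cost_per_unit(by_feature, {"ticket_summary": 2, "report_gen": 0})
print(by_team, by_customer, per_ticket)
```

In a warehouse the same logic is a `GROUP BY` plus a join against the business table; the point is that the join key has to exist on the request event.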

A useful operating pattern is to calculate three metrics every day:

Cost per request
Cost per successful business action
Cost per active customer or workspace

That lets engineering see technical efficiency and lets FinOps see allocation. If a feature's median request cost stays flat but cost per successful action doubles, the issue is probably retries, low conversion, or prompt churn rather than vendor pricing.
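To make the retry case concrete, here is a small sketch of the three daily metrics. It assumes the count of successful business actions comes from your own product data; the function name and the sample events are hypothetical.

```python
from statistics import median


def daily_metrics(events, successful_actions):
    """Compute the three daily metrics from one day's priced events."""
    if not events:
        raise ValueError("no events for this day")
    costs = [event["request_cost"] for event in events]
    total = sum(costs)
    customers = {event["customer_id"] for event in events}
    return {
        "cost_per_request": total / len(costs),
        "median_request_cost": median(costs),
        # None when nothing succeeded, to avoid dividing by zero
        "cost_per_successful_action": (
            total / successful_actions if successful_actions else None
        ),
        "cost_per_active_customer": total / len(customers),
    }


# Hypothetical day: four calls at the same price, two of them retries,
# so only two business actions actually completed.
events = [
    {"customer_id": "acme", "status": "retry", "request_cost": 0.01},
    {"customer_id": "acme", "status": "success", "request_cost": 0.01},
    {"customer_id": "globex", "status": "retry", "request_cost": 0.01},
    {"customer_id": "globex", "status": "success", "request_cost": 0.01},
]

metrics = daily_metrics(events, successful_actions=2)
print(metrics)
```

Cost per request stays at one cent while cost per successful action is twice that, which is the retry signature described above.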

Common mistakes in OpenAI cost attribution and AI cost tracking per feature

The most common failure modes are boring, but expensive:

First, teams attribute by API key only. That works for a single prototype, but it breaks as soon as multiple services or tenants share infrastructure.

Second, they ignore non-success paths. Timeouts, fallbacks, and retries still cost money. If those events are missing from the ledger, your unit cost looks healthier than reality.

Third, they treat prompt caching as a nice-to-have metric instead of part of the billing formula. Cached-input discounts can materially change per-request cost.

Fourth, they reconstruct historical pricing from today's price sheet. Provider pricing changes over time, so the computed cost should be stored with the request event, not recalculated months later unless you also version the rate card.

Finally, they stop at dashboards. Good attribution should trigger action: alerts on sudden request-cost inflation, reports on top-cost features, and weekly review of which customers or internal workflows are drifting out of range.
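An alert on request-cost inflation can start as a comparison of medians per feature. The sketch below is one possible shape, not a recommended threshold: the 1.5x default, the function name, and the sample data are all assumptions.

```python
from statistics import median


def cost_inflation_alerts(baseline, today, threshold=1.5):
    """Flag features whose median request cost grew by `threshold`x or more.

    `baseline` and `today` map feature name -> list of request costs in USD.
    Features without a baseline are skipped rather than flagged.
    """
    alerts = []
    for feature, costs in today.items():
        base = baseline.get(feature)
        if not base or not costs:
            continue
        base_median = median(base)
        if base_median <= 0:
            continue
        ratio = median(costs) / base_median
        if ratio >= threshold:
            alerts.append((feature, ratio))
    # Worst offenders first
    return sorted(alerts, key=lambda alert: alert[1], reverse=True)


# Hypothetical data: ticket_summary doubled, report_gen is flat,
# and new_feature has no baseline yet.
baseline = {"ticket_summary": [0.01, 0.01, 0.01], "report_gen": [0.04, 0.04]}
today = {
    "ticket_summary": [0.02, 0.03, 0.02],
    "report_gen": [0.04, 0.041],
    "new_feature": [1.0],
}

alerts = cost_inflation_alerts(baseline, today)
print(alerts)
```

Using the median rather than the mean keeps one oversized request from paging someone, though a separate check on maximum request cost may be worth adding for long-context workloads.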

Summary

LLM cost attribution per request is the control point that makes FinOps for AI operational. The pattern is simple: capture token usage at request time, apply the right model rates, attach team and feature ownership, and store the computed cost as an event you can roll up later.

If you want a fast sanity check before building the full pipeline, the free auditor at agentcolony.org/auditor lets you paste a gateway trace and inspect the per-request cost breakdown. That is often enough to see whether your issue is model choice, prompt size, retries, or missing attribution tags.

FAQ

What is LLM cost attribution per request?

It is the practice of calculating the exact cost of each model call from token usage, rate cards, and any extra tool fees, then attaching that cost to ownership fields like team, feature, and customer.

How do I track LLM API costs by team?

Add a team field to every request event at the point where the call is made or routed. Compute request_cost on ingestion, then group spend by team in your dashboard or warehouse.

Can gateway logs alone handle OpenAI cost attribution?

They can cover the raw token and model layer, which is useful, but they usually do not include ownership, retry semantics, or business context. For serious allocation, you need enrichment on top of gateway data.

How should I handle cached context in per-request LLM cost?

Store cached input tokens separately from fresh input tokens and price them using the provider's cached-input rate. If you merge them into one bucket, your cost model will be wrong.

What is the difference between per-request cost and monthly vendor billing?

Monthly billing tells you how much you spent in total. Per-request cost tells you why you spent it, who owns it, and which feature or customer drove the change.

DEV Community