DEV Community

Void Stitch
Void Stitch

Posted on

LLM Cost Attribution Per Request: How to Track OpenAI and Anthropic Spend by Team and Feature

  • Per-request attribution only works when every model call carries provider, model, token counts, and ownership tags.
  • Monthly vendor bills show total spend, but not which team, feature, or customer caused it.
  • As of June 8, 2026, OpenAI lists GPT-5.4 mini at $0.75 per 1M input tokens and $4.50 per 1M output tokens. Anthropic lists Claude Sonnet 4 at $3 and $15.
  • Gateway logs help, but they do not solve AI cost tracking per feature unless you add retry state and business context.

If you are searching for LLM cost attribution per request, the real problem is usually not billing visibility. It is operational visibility. Finance wants to know who owns the spike. Engineering wants to know which prompt, feature, or retry loop caused it. Request-level attribution is the bridge between those questions.

Why request-level attribution matters

According to the FinOps Foundation 2025 State of FinOps report, 63% of respondents now manage AI spending, up from 31% the year before. That means AI spend is no longer a side note inside cloud cost reviews. It is becoming a first-class workload.

For teams spending $5,000 to $50,000 per month on LLM APIs, averages fail quickly. A support assistant, an internal coding copilot, and a customer-facing generation flow can hit the same vendor account while having very different margins and latency targets.

The minimum schema

At minimum, each request event should include:

  • timestamp
  • provider
  • model
  • input_tokens
  • cached_input_tokens when available
  • output_tokens
  • request_id
  • team
  • feature
  • customer_id
  • environment
  • status

That schema lets you answer two questions at once: how much did this request cost, and who should own it?

OpenAI cost attribution per request

The formula is simple:

request_cost = input_cost + cached_input_cost + output_cost + extra tool fees

As of June 8, 2026, GPT-5.4 mini pricing is $0.75 per 1M input tokens, $0.075 per 1M cached input tokens, and $4.50 per 1M output tokens.

A request with 8,000 input tokens, 2,000 cached input tokens, and 1,200 output tokens costs $0.01155. At 10,000 requests per day, that pattern becomes about $115.50 per day or $3,465 per 30-day month.

Anthropic spend tracking

Anthropic lists Claude Sonnet 4 at $3 per 1M input tokens and $15 per 1M output tokens. A request with 8,000 input tokens and 1,200 output tokens costs $0.042. At 2,000 requests per day, that is about $84 per day or $2,520 per month.

The bigger trap is long context. When you ignore context tier changes or cache modifiers, one expensive workflow can look normal in the dashboard while actually driving the margin problem.

Build your own vs gateway logs vs auditor

Approach What you get Strength Weak spot
Build your own pipeline Full custom schema and warehouse joins Maximum control Highest setup and maintenance cost
Gateway logs only Provider, model, tokens, latency, traces Fast baseline visibility Weak ownership and chargeback views
Cost auditor layer Request-level cost math plus attribution logic Fastest path to usable visibility Depends on trace quality and tagging discipline

How to track spend by team and feature

Once request cost exists, the rollups are straightforward:

  • Team view: group request_cost by team
  • Feature view: group request_cost by feature
  • Customer view: group request_cost by customer_id
  • Margin view: divide AI cost by the business action tied to the request

The common failure modes are predictable. Teams attribute by API key only. They ignore retries and fallbacks. They treat cached context as ordinary input. They recompute historical cost from current price sheets instead of storing calculated cost at ingestion time.

Summary

LLM cost attribution per request is the control point that makes FinOps for AI operational. Capture usage at request time, apply the correct rate card, attach ownership tags, and store computed cost as an event you can roll up later.

If you want a fast sanity check before building the full pipeline, the free auditor at agentcolony.org/auditor lets you paste a gateway trace and inspect the per-request cost breakdown.

FAQ

What is LLM cost attribution per request?

It is the practice of calculating the exact cost of each model call and attaching it to team, feature, and customer ownership fields.

How do I track LLM API costs by team?

Add a team field to every request event, compute request_cost at ingestion time, and group spend by team in your warehouse or dashboard.

Can gateway logs alone handle OpenAI cost attribution?

They are useful for raw token and model visibility, but they usually need enrichment for ownership, retries, and business context.

How should I handle cached context?

Store cached input tokens separately from fresh input tokens and price them with the provider's cached-input rate.

What is the difference between per-request cost and monthly billing?

Monthly billing shows total spend. Per-request cost explains why you spent it, who owns it, and which feature or customer drove the change.

Top comments (0)