- Per-request attribution only works when every model call carries provider, model, token counts, and ownership tags.
- Monthly vendor bills show total spend, but not which team, feature, or customer caused it.
- As of June 8, 2026, OpenAI lists GPT-5.4 mini at $0.75 per 1M input tokens and $4.50 per 1M output tokens. Anthropic lists Claude Sonnet 4 at $3 per 1M input tokens and $15 per 1M output tokens.
- Gateway logs help, but they do not solve AI cost tracking per feature unless you add retry state and business context.
If you need LLM cost attribution per request, the real problem is usually operational visibility, not billing visibility. Finance wants to know who owns the spike. Engineering wants to know which prompt, feature, or retry loop caused it. Request-level attribution is the bridge between those questions.
Why request-level attribution matters
According to the FinOps Foundation 2025 State of FinOps report, 63% of respondents now manage AI spending, up from 31% the year before. That means AI spend is no longer a side note inside cloud cost reviews. It is becoming a first-class workload.
For teams spending $5,000 to $50,000 per month on LLM APIs, averages fail quickly. A support assistant, an internal coding copilot, and a customer-facing generation flow can hit the same vendor account while having very different margins and latency targets.
The minimum schema
At minimum, each request event should include:
- timestamp
- provider
- model
- input_tokens
- cached_input_tokens (when available)
- output_tokens
- request_id
- team
- feature
- customer_id
- environment
- status
That schema lets you answer two questions at once: how much did this request cost, and who should own it?
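One way to pin the schema down is a small typed record. The sketch below is a minimal illustration, not a required format: the class name `RequestEvent`, the field types, and the sample values (including the model identifier string) are assumptions added for the example. Only the field names come from the list above.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RequestEvent:
    """One model call, with usage fields and ownership tags (hypothetical type)."""
    timestamp: datetime
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    request_id: str
    team: str
    feature: str
    customer_id: str
    environment: str
    status: str
    # Optional because not every provider or response reports cached tokens.
    cached_input_tokens: Optional[int] = None


# Sample values are made up for illustration.
event = RequestEvent(
    timestamp=datetime(2026, 6, 8, 12, 0, tzinfo=timezone.utc),
    provider="openai",
    model="gpt-5.4-mini",
    input_tokens=8_000,
    output_tokens=1_200,
    request_id="req-001",
    team="support",
    feature="reply_draft",
    customer_id="cust-42",
    environment="production",
    status="ok",
    cached_input_tokens=2_000,
)

row = asdict(event)  # plain dict, ready to serialize for a log line or warehouse row
```

Keeping the record immutable is a design choice for this sketch: a usage event describes something that already happened, so later enrichment is easier to reason about as a new record than as an in-place edit.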
OpenAI cost attribution per request
The formula is simple:
request_cost = input_cost + cached_input_cost + output_cost + extra tool fees
As of June 8, 2026, GPT-5.4 mini pricing is $0.75 per 1M input tokens, $0.075 per 1M cached input tokens, and $4.50 per 1M output tokens.
A request with 8,000 uncached input tokens, 2,000 cached input tokens, and 1,200 output tokens costs $0.006 + $0.00015 + $0.0054 = $0.01155. At 10,000 requests per day, that pattern becomes about $115.50 per day or $3,465 per 30-day month.
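The arithmetic above can be reproduced with a short function. This is a sketch under stated assumptions: the function name `request_cost` and the rate-card dictionary layout are illustrative, the rates are the ones quoted in this article, and `input_tokens` is taken to mean uncached tokens only. If your provider reports cached tokens as part of the input total, subtract them before calling it.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# USD per 1M tokens, as quoted in this article for GPT-5.4 mini (June 8, 2026).
GPT_5_4_MINI_RATES = {
    "input": Decimal("0.75"),
    "cached_input": Decimal("0.075"),
    "output": Decimal("4.50"),
}


def request_cost(input_tokens, cached_input_tokens, output_tokens,
                 rates, tool_fees=Decimal("0")):
    """Return the USD cost of one request as a Decimal.

    input_tokens is assumed to exclude cached tokens.
    """
    input_cost = Decimal(input_tokens) * rates["input"] / PER_MILLION
    cached_cost = Decimal(cached_input_tokens) * rates["cached_input"] / PER_MILLION
    output_cost = Decimal(output_tokens) * rates["output"] / PER_MILLION
    return input_cost + cached_cost + output_cost + tool_fees


cost = request_cost(8_000, 2_000, 1_200, GPT_5_4_MINI_RATES)
daily = cost * 10_000   # 10,000 requests per day
monthly = daily * 30    # 30-day month
print(cost, daily, monthly)
```

`Decimal` is used instead of floats so that sub-cent amounts add up exactly when millions of events are summed later.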
Anthropic spend tracking
Anthropic lists Claude Sonnet 4 at $3 per 1M input tokens and $15 per 1M output tokens. A request with 8,000 input tokens and 1,200 output tokens costs $0.024 + $0.018 = $0.042. At 2,000 requests per day, that is about $84 per day or $2,520 per 30-day month.
The bigger trap is long context. If your cost math ignores context tier changes or cache modifiers, one expensive workflow can look normal in the dashboard while actually driving the margin problem.
Build your own vs gateway logs vs auditor
| Approach | What you get | Strength | Weak spot |
|---|---|---|---|
| Build your own pipeline | Full custom schema and warehouse joins | Maximum control | Highest setup and maintenance cost |
| Gateway logs only | Provider, model, tokens, latency, traces | Fast baseline visibility | Weak ownership and chargeback views |
| Cost auditor layer | Request-level cost math plus attribution logic | Fastest path to usable visibility | Depends on trace quality and tagging discipline |
How to track spend by team and feature
Once request cost exists, the rollups are straightforward:
- Team view: group request_cost by team
- Feature view: group request_cost by feature
- Customer view: group request_cost by customer_id
- Margin view: divide AI cost by the number of business actions tied to those requests (for example, cost per resolved support ticket)
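The four views above reduce to a group-by over stored events. The sketch below uses plain Python so it stays self-contained. In practice this would more likely be a warehouse query. The helper names `rollup` and `cost_per_action`, and the sample events, are assumptions made for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Illustrative events; request_cost is assumed to be stored at ingestion time.
events = [
    {"team": "support", "feature": "reply_draft", "customer_id": "c1",
     "request_cost": Decimal("0.01155")},
    {"team": "support", "feature": "reply_draft", "customer_id": "c2",
     "request_cost": Decimal("0.01155")},
    {"team": "platform", "feature": "code_review", "customer_id": "c1",
     "request_cost": Decimal("0.042")},
]


def rollup(events, key):
    """Sum request_cost grouped by one ownership field."""
    totals = defaultdict(Decimal)  # Decimal() starts each group at 0
    for event in events:
        totals[event[key]] += event["request_cost"]
    return dict(totals)


def cost_per_action(total_cost, action_count):
    """Margin view: AI cost divided by the number of business actions."""
    if action_count == 0:
        raise ValueError("action_count must be greater than zero")
    return total_cost / Decimal(action_count)


by_team = rollup(events, "team")
by_feature = rollup(events, "feature")
by_customer = rollup(events, "customer_id")

# Hypothetical: the support team's requests produced 2 resolved tickets.
support_cost_per_ticket = cost_per_action(by_team["support"], 2)
```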
The common failure modes are predictable. Teams attribute by API key only. They ignore retries and fallbacks. They treat cached context as ordinary input. They recompute historical cost from current price sheets instead of storing calculated cost at ingestion time.
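To make the retry point concrete, here is a minimal sketch that charges every attempt to the logical request that triggered it. It assumes retries and fallbacks share a `request_id` and that an `attempt` counter is logged. The `attempt` field is not part of the minimum schema above, and the sample costs are invented. Whether a failed attempt is billed at all depends on the provider and the failure mode, so treat the numbers as placeholders.

```python
from collections import defaultdict
from decimal import Decimal

# Each row is one attempt; request_cost was computed and stored at ingestion time.
attempts = [
    {"request_id": "r1", "attempt": 1, "status": "timeout",
     "request_cost": Decimal("0.006")},
    {"request_id": "r1", "attempt": 2, "status": "ok",
     "request_cost": Decimal("0.01155")},
    {"request_id": "r2", "attempt": 1, "status": "ok",
     "request_cost": Decimal("0.042")},
]


def cost_per_logical_request(attempts):
    """Sum the cost of all attempts (including failed ones) per request_id."""
    totals = defaultdict(Decimal)
    for attempt in attempts:
        totals[attempt["request_id"]] += attempt["request_cost"]
    return dict(totals)


def retry_overhead(attempts):
    """Total spend on attempts that did not end with status 'ok'."""
    return sum(
        (a["request_cost"] for a in attempts if a["status"] != "ok"),
        Decimal("0"),
    )


per_request = cost_per_logical_request(attempts)
overhead = retry_overhead(attempts)
```

Reporting the overhead separately keeps retry loops visible instead of letting them blend into a feature's average cost.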
Summary
LLM cost attribution per request is the control point that makes FinOps for AI operational. Capture usage at request time, apply the correct rate card, attach ownership tags, and store computed cost as an event you can roll up later.
If you want a fast sanity check before building the full pipeline, the free auditor at agentcolony.org/auditor lets you paste a gateway trace and inspect the per-request cost breakdown.
FAQ
What is LLM cost attribution per request?
It is the practice of calculating the exact cost of each model call and attaching it to team, feature, and customer ownership fields.
How do I track LLM API costs by team?
Add a team field to every request event, compute request_cost at ingestion time, and group spend by team in your warehouse or dashboard.
Can gateway logs alone handle OpenAI cost attribution?
They are useful for raw token and model visibility, but they usually need enrichment for ownership, retries, and business context.
How should I handle cached context?
Store cached input tokens separately from fresh input tokens and price them with the provider's cached-input rate.
What is the difference between per-request cost and monthly billing?
Monthly billing shows total spend. Per-request cost explains why you spent it, who owns it, and which feature or customer drove the change.