Sol

Posted on Jun 7

AI Cost Attribution: A Request-Level FinOps Playbook for Platform Engineers

#finops #devops #openai #aws

Request-level attribution works only when every LLM call carries the same ownership fields from app code to the gateway trace: team, service, feature, and an internal trace_id.
Most unattributed AI spend comes from three gaps: missing request tags, gateway-only visibility, and trace payloads that log tokens but not business context.
OpenAI, Anthropic, and Bedrock expose different attribution surfaces, so the safest pattern is to normalize everything into your own attribution schema first.
A chargeback report should group by team, service, and feature, then let you drill down into the individual traces driving the bill.
If you cannot explain the top 10 most expensive traces from last week, you do not yet have usable AI cost attribution.

If you are managing $5k to $50k per month in LLM spend, AI cost attribution stops being a dashboard problem and becomes an instrumentation problem. Platform teams usually discover this the hard way: finance wants a team-level OpenAI cost breakdown, engineering can show total gateway volume, and nobody can explain which feature or service actually burned the budget.

That gap is becoming more urgent. According to the FinOps Foundation State of FinOps 2026 report, 98% of respondents now manage AI spend, up from 63% in 2025 and 31% in 2024. The teams that get ahead of this do not start with prettier reporting. They start by making every request attributable at the call site.

The three attribution gaps behind most unattributed AI spend

Most teams have usage data, but not attribution data. Those are different things.

Missing request tags. The API call has model, token counts, and latency, but nothing that says which team, service, or feature initiated it.
Gateway-level blind spots. A shared gateway can tell you that gpt-5 or claude spend spiked, but not whether the cost came from search, support, internal tooling, or a new experiment.
Trace payload gaps. The trace includes technical fields like request ID and tokens, but omits the business dimensions finance actually needs for chargebacks.

A common failure mode looks like this: the platform team centralizes all LLM traffic behind one gateway, spend becomes visible at the provider level, and attribution actually gets worse because every workload now shares the same credentials and network path.

What request-level attribution must stamp on every call

Your application code should emit one normalized attribution envelope before the provider SDK is invoked. Do not make each team invent its own schema.

{
  "team": "support",
  "service": "ticket-copilot",
  "feature": "summarize-thread",
  "environment": "prod",
  "provider": "openai",
  "model": "gpt-5.4",
  "internal_trace_id": "trc_01JX...",
  "end_user_id": "usr_4821",
  "tenant_id": "acme-co",
  "prompt_template": "ticket_summary_v3"
}

This envelope should travel with the request through three layers:

The app or service call site, where ownership is known.
The gateway or proxy, where pricing, retries, and policy are enforced.
The trace/log sink, where you later build attribution and chargeback reports.

If you only stamp tags at the gateway, you are already too late. The gateway often sees the service but not the business feature, the tenant, or the end-user context that explains why spend changed.

How to instrument OpenAI, Anthropic, and Bedrock without losing ownership

Provider APIs differ, so normalize first and then map into whatever each provider supports.

For OpenAI, always attach your own unique request identifier with the X-Client-Request-Id header and log the returned x-request-id for reconciliation and support workflows. OpenAI also supports project-scoped accounting with the OpenAI-Project header, which is useful for coarse splits such as business unit or environment. That gives you a clean provider-side project boundary, while your own trace carries the fine-grained team, service, and feature fields. See the OpenAI API reference.

For Anthropic, plan on keeping fine-grained business attribution in your own gateway trace. In practice, many teams use separate API keys or workspaces for coarse ownership and rely on their own request envelope for per-feature chargebacks. That avoids coupling your reporting model to a provider-specific admin view.

For Amazon Bedrock, use two layers on purpose. At the per-request layer, set requestMetadata on each call so the tag lands in model invocation logs. At the billing layer, use IAM principal attribution, Projects, or application inference profiles so spend appears in Cost Explorer or CUR with stable cost allocation dimensions. AWS is explicit that per-prompt detail lives in invocation logs, not in Cost Explorer or CUR, so you need both mechanisms for a full picture. See the Bedrock cost management FAQ and Projects documentation.

App-level vs gateway-level attribution

You need both app tags and gateway aggregation, but they solve different problems.

Layer	Best for	Fields you should expect	What breaks if you rely on it alone
App-level attribution	Team, service, feature, tenant, user, prompt template	`team`, `service`, `feature`, `tenant_id`, `internal_trace_id`	Finance cannot split shared gateway spend by product area if tags are missing
Gateway-level attribution	Central pricing, retries, provider normalization, policy	`provider`, `model`, `request_id`, token counts, latency, retry count	You can see spend totals but not the business owner of the request
Billing-layer attribution	Monthly chargebacks, budget owners, cost center rollups	project, account, workspace, IAM/session tags	You lose per-request detail and root-cause analysis

The practical rule is simple: app-level data explains who should pay, gateway data explains what happened, and billing-layer data explains what hit the invoice.

How to build a chargebacks report that finance can actually use

A useful AI chargeback report is boring in a good way. It should answer who spent money, on what, and why the number moved.

Start with daily or weekly aggregates grouped by team, service, feature, provider, and model. Then add these measures:

request count
input tokens
output tokens
estimated cost
percentage of total spend
week-over-week change
top trace IDs contributing to the increase

A simple example for one week might look like this:

Team	Service	Feature	Estimated spend	Share of total	WoW change
Support	ticket-copilot	summarize-thread	$2,420	40.1%	+18%
Search	retrieval-api	answer-generation	$1,140	18.9%	+7%
Growth	onboarding-bot	email-drafting	$1,860	30.8%	+42%
Internal Tools	eng-assistant	sql-helper	$620	10.2%	-6%

This report does two important things. First, it gives finance a chargeback basis. Second, it tells engineering where to investigate. A 42% jump in one feature is a debugging target, not just a budget note.

If you are on Bedrock, note one operational detail from AWS that is easy to miss: cost allocation tags can take up to 24 hours to appear in Cost Explorer or CUR after activation, and they are not retroactive. Turn them on before rollout, not after the monthly close surprises you.

How to read a gateway trace payload to find the budget burner

The trace payload is where attribution becomes operationally useful. You are no longer asking only, "Which team spent the money?" You are asking, "What exact request pattern caused the spend?"

A useful gateway trace should contain at least these fields:

{
  "team": "growth",
  "service": "onboarding-bot",
  "feature": "first-run-email",
  "provider": "anthropic",
  "model": "claude-sonnet-4",
  "request_id": "req_9h2...",
  "internal_trace_id": "trc_7Qa...",
  "input_tokens": 18420,
  "output_tokens": 2100,
  "latency_ms": 2870,
  "retry_count": 2,
  "cache_hit": false,
  "estimated_cost_usd": 0.098
}

From there, read the payload in this order:

Sort by estimated_cost_usd descending. Start with the expensive traces, not the noisiest ones.
Check team, service, and feature. If any are null, you found unattributed spend.
Compare input_tokens and output_tokens. High input with modest output usually means prompt bloat or oversized retrieved context. High output with modest input often points to unconstrained generation.
Check retry_count. Duplicate retries quietly inflate cost and are common after timeout handling bugs.
Group by prompt template or feature version. Spikes often align to a rollout, not to organic growth.

This is where gateway trace analysis earns its keep. The monthly invoice tells you that support spent more. The trace tells you that one prompt template started shipping 18k-token contexts with no cache hits after a retrieval change.

Controls that keep attribution from drifting over time

Good attribution decays unless you make it hard to bypass.

Use a shared client or SDK wrapper that refuses to send requests without team, service, and feature. Enforce an allowlist for team and service names so reporting does not fragment into growth, Growth, and growth-team. Add a nightly report for null or unknown tags. Keep one explicit shared bucket, such as platform-shared, for truly unallocatable costs instead of letting them disappear into unlabeled traffic.

Also separate ownership attribution from pricing logic. Your app should know who owns a request. Your gateway should know how to calculate cost, normalize token fields across providers, and join retries or cache events back to the original trace.

Finally, audit the top 10 most expensive traces every week. If human review cannot explain them in five minutes, your schema is still missing something important.

Summary

Request-level AI cost attribution is not a reporting feature you add at the end. It is a contract you enforce at the call site. Stamp team, service, feature, and a stable internal trace ID on every request before it reaches OpenAI, Anthropic, or Bedrock. Use the gateway to normalize usage and estimate cost. Use billing-layer tags for monthly chargebacks. Then read the trace payloads to explain the spikes.

If you already have gateway traces and want to see whether they carry enough data for per-team attribution, paste one into the free AI trace auditor. It is a fast way to spot missing ownership fields before finance asks for the next cost breakdown.

FAQ

How do I split OpenAI costs by team?

Use your own request envelope to stamp team, service, and feature at the app call site, then propagate the internal trace ID through the gateway. For coarse provider-side separation, use distinct OpenAI projects and the OpenAI-Project header. For real chargebacks, rely on your own trace-level grouping rather than provider totals alone.

What is request-level attribution?

Request-level attribution means each individual LLM call can be tied back to a business owner and use case, not just to a shared account or gateway. In practice, that means every request carries ownership fields plus a trace ID, and the resulting logs preserve those fields next to tokens, latency, and cost.

Should I rely on my LLM gateway alone for attribution?

No. A gateway is excellent for central enforcement and normalization, but it often lacks the business context known only at the app layer. If app code does not provide ownership tags, the gateway can aggregate spend but cannot explain who should pay for it.

How do I allocate shared platform or experimentation costs?

Create an explicit shared bucket such as platform-shared or experiments-unassigned and track it separately. Do not smear those costs across product teams by guesswork. Shared buckets are acceptable as long as they are small, visible, and reviewed regularly.

What should be in a gateway trace payload for AI spend chargebacks?

At minimum: team, service, feature, provider, model, provider request ID, internal trace ID, input tokens, output tokens, latency, retry count, and estimated cost. If you support multi-tenant workloads, include tenant_id too. Without those fields, you can trend spend but you cannot explain it.

DEV Community