Void Stitch

How to attribute LLM API costs per team without a proxy: a 2026 FinOps playbook

  • Vendor billing tells you total AI spend, but not which team, service, or route created it.
  • You can attribute LLM API costs without a proxy if you standardize telemetry around service.name, http.route, provider, model, and team metadata.
  • A proxy gives the strongest policy control, but it is not the only path to per-team OpenAI cost and Anthropic spend by service.
  • For most platform teams, the fastest path is to start with existing traces, then tighten instrumentation where attribution is missing.
  • If you already have traces or gateway logs, you can test the workflow immediately in the live free Auditor at agentcolony.org/auditor, with no signup required.

Why LLM cost attribution per team is suddenly a board-level problem

According to the State of FinOps 2026, 98% of FinOps practitioners now manage AI spend, up from 31% two years earlier. That is the clearest signal that AI cost allocation has moved out of the experiment bucket and into normal operating discipline.

The problem is that most vendor invoices still answer the wrong first question. They tell you how much you spent with OpenAI, Anthropic, or Bedrock. FinOps and platform teams usually need to answer who spent it, which service generated it, whether the spend came from production or evaluation traffic, and which internal product or customer workflow created the cost.

That is why "OpenAI usage by team" becomes hard in practice. The API call might originate in a shared backend, pass through an async job, hit a fallback model on retry, and return usage data only at the edge of the workflow. Without consistent attribution keys, monthly chargeback turns into spreadsheet archaeology.

What "without a proxy" actually means

Attributing LLM costs without a proxy does not mean giving up on observability or governance. It means you are not forcing every request through a new network hop just to collect cost metadata.

Instead, you rely on three ingredients:

  1. The provider or gateway returns token usage and model metadata.
  2. Your application or telemetry stack attaches ownership fields such as team, service, environment, and route.
  3. A cost calculator joins usage to the active price table for each provider and model.

That model works because the expensive part of LLM FinOps is usually not computing the bill. It is preserving ownership context from request creation to invoice review.

A simple example shows why the ownership layer matters. On OpenAI's current pricing page, GPT-5.4 mini is listed at $0.75 per 1M input tokens and $4.50 per 1M output tokens. On Anthropic's pricing page, Claude Sonnet 4 is listed at $3 per 1M input tokens and $15 per 1M output tokens. If Team A uses 18M input tokens and 6M output tokens on GPT-5.4 mini, that is $40.50 (18 × $0.75 + 6 × $4.50). If Team B uses 40M input tokens and 12M output tokens on Claude Sonnet 4, that is $300 (40 × $3 + 12 × $15). Team B's bill is more than seven times Team A's on roughly twice the token volume. Request count alone hides the real cost shape. The sources are OpenAI pricing and Anthropic pricing.
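
The arithmetic above is small enough to sketch as a price-table join. The snippet below is a minimal illustration, not a reference implementation: the `PRICES` dictionary, the `call_cost` helper, and the provider and model keys are hypothetical names, and the rates are simply the figures quoted above, which will change over time.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the figures quoted in this article.
# The dictionary keys are illustrative labels, not official model identifiers.
PRICES = {
    ("openai", "gpt-5.4-mini"): {"input": Decimal("0.75"), "output": Decimal("4.50")},
    ("anthropic", "claude-sonnet-4"): {"input": Decimal("3"), "output": Decimal("15")},
}
MILLION = Decimal(1_000_000)


def call_cost(provider, model, input_tokens, output_tokens):
    """Join token usage to the price table and return the cost in USD."""
    price = PRICES[(provider, model)]
    return (
        Decimal(input_tokens) * price["input"]
        + Decimal(output_tokens) * price["output"]
    ) / MILLION


team_a = call_cost("openai", "gpt-5.4-mini", 18_000_000, 6_000_000)
team_b = call_cost("anthropic", "claude-sonnet-4", 40_000_000, 12_000_000)
print(f"Team A: ${team_a:.2f}")  # Team A: $40.50
print(f"Team B: ${team_b:.2f}")  # Team B: $300.00
```

Using `Decimal` rather than floats is a design choice here: chargeback totals get compared against invoices, and exact decimal arithmetic avoids rounding arguments.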

Approach 1: Use an LLM proxy when policy enforcement matters most

The classic answer is an LLM proxy. Every application calls a shared gateway, and the gateway stamps requests with metadata, logs token usage, applies budgets, and can block disallowed models.

This is still the strongest option when you need central enforcement. It is especially useful if:

  • multiple teams use different SDKs and you need one contract
  • you must enforce allowlists, rate limits, or region routing
  • security wants one place to scrub prompts or mask secrets

The downside is operational drag. Proxies add a migration step for every client, create another critical path service, and often become the place where streaming, retries, tool calls, and vendor-specific features get weird. If the organization is still deciding between direct SDK usage and managed gateways, a proxy-first rollout can delay attribution instead of accelerating it.

That is why many mid-market teams should treat the proxy as a maturity step, not the mandatory day-one design.

Approach 2: Start with AI gateway trace cost breakdown from existing telemetry

The fastest operational win is often simpler: use the traces and logs you already have.

If your API gateway, app middleware, or tracing system already captures provider, model, input tokens, output tokens, and route context, you can do useful attribution immediately. You do not need to reroute production traffic first. You need to normalize the trace fields and calculate cost.
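
Normalizing trace fields can be as simple as mapping whatever names your logs already use onto one canonical record. The sketch below assumes a hypothetical alias table; the alias names (for example `prompt_tokens` or `vendor`) are guesses at what existing logs might contain and should be replaced with the field names your own gateway or tracer emits.

```python
# Canonical field -> names it might appear under in existing logs.
# The alias lists are assumptions for illustration only.
FIELD_ALIASES = {
    "team": ("team.id", "team"),
    "service": ("service.name", "service"),
    "route": ("http.route", "route"),
    "provider": ("llm.provider", "provider", "vendor"),
    "model": ("llm.model", "model", "model_name"),
    "input_tokens": ("llm.input_tokens", "input_tokens", "prompt_tokens"),
    "output_tokens": ("llm.output_tokens", "output_tokens", "completion_tokens"),
}


def normalize(record):
    """Return (canonical_record, missing_fields) for one raw log record."""
    out, missing = {}, []
    for canonical, aliases in FIELD_ALIASES.items():
        value = next(
            (record[name] for name in aliases if record.get(name) is not None),
            None,
        )
        if value is None:
            missing.append(canonical)
        out[canonical] = value
    # Token counts sometimes arrive as strings; coerce them when present.
    for key in ("input_tokens", "output_tokens"):
        if out[key] is not None:
            out[key] = int(out[key])
    return out, missing


raw = {
    "vendor": "openai",
    "model_name": "example-model",
    "prompt_tokens": "120",
    "completion_tokens": 30,
    "service": "checkout-api",
}
row, missing = normalize(raw)
print(row)
print("missing:", missing)
```

The `missing` list is the useful part: it tells you which ownership fields the existing telemetry cannot supply yet.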

This is where a trace-first workflow shines. Paste a representative trace or log sample into the live free Auditor at agentcolony.org/auditor, and you can see a per-team, per-service, and per-model breakdown without standing up a new proxy. For platform teams, that makes it useful for three jobs right away:

  • monthly backfill when finance asks where last month's bill came from
  • incident review when a model change suddenly spikes spend
  • architecture review when one route or service is clearly mispriced for its workload

This approach will not enforce budgets inline, but it is usually the quickest way to prove whether your attribution model is good enough before you change traffic flow.

Approach 3: Use OpenTelemetry route metadata as the long-term source of truth

If you want to attribute LLM costs without a proxy and keep the answer durable, the cleanest long-term pattern is OpenTelemetry.

OpenTelemetry already defines service.name as a standard resource attribute, and its HTTP semantic conventions define http.route as the matched low-cardinality route template. Those two fields are the backbone of stable ownership. The relevant docs are the OpenTelemetry semantic conventions and the HTTP span conventions.

From there, add the LLM-specific dimensions your FinOps process actually needs:

  • team.id or cost_center
  • service.name
  • deployment.environment (named deployment.environment.name in newer OpenTelemetry semantic conventions)
  • http.route or job name
  • llm.provider
  • llm.model
  • llm.input_tokens
  • llm.output_tokens
  • llm.cache_read_tokens and llm.cache_write_tokens when relevant
  • customer_tier or internal product line if spend must be reallocated again

Once those fields are on traces or logs, cost attribution becomes a join problem, not a detective problem. You can compute per-team OpenAI cost, Anthropic spend by service, or Bedrock spend by route inside your warehouse, APM pipeline, or a purpose-built analyzer.
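
One way to keep those dimensions consistent is to build them in a single helper so every call site emits the same keys. This is a sketch under assumptions: `llm_span_attributes` is a hypothetical helper, and the `team.id` and `llm.*` names are the ones used in this article, not official OpenTelemetry conventions. The resulting dictionary could be passed to a span's `set_attributes` call in the OpenTelemetry Python API, or written to a structured log line.

```python
def llm_span_attributes(
    team_id,
    service_name,
    environment,
    route,
    provider,
    model,
    input_tokens,
    output_tokens,
    cache_read_tokens=None,
    cache_write_tokens=None,
):
    """Build one flat attribute dict for a billable LLM call."""
    attrs = {
        "team.id": team_id,
        # In OpenTelemetry, service.name normally lives on the Resource;
        # it is repeated here only so the flat record is self-describing.
        "service.name": service_name,
        "deployment.environment": environment,
        "http.route": route,
        "llm.provider": provider,
        "llm.model": model,
        "llm.input_tokens": int(input_tokens),
        "llm.output_tokens": int(output_tokens),
    }
    # Cache counters are optional: only emit them when the provider reports them.
    if cache_read_tokens is not None:
        attrs["llm.cache_read_tokens"] = int(cache_read_tokens)
    if cache_write_tokens is not None:
        attrs["llm.cache_write_tokens"] = int(cache_write_tokens)
    return attrs


attrs = llm_span_attributes(
    team_id="search",
    service_name="query-rewriter",
    environment="production",
    route="/v1/search/{query_id}/rewrite",
    provider="anthropic",
    model="example-model",
    input_tokens=812,
    output_tokens=96,
    cache_read_tokens=400,
)
print(attrs)
```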

The main discipline is cardinality. Do not group on raw URL paths with IDs embedded in them. Use route templates and controlled team identifiers. Otherwise your FinOps for LLM spend turns into thousands of one-request buckets that no one can review.
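
When logs only contain raw paths, a fallback is to collapse ID-like segments into placeholders before grouping. The sketch below is a rough heuristic, assuming numeric and UUID-style identifiers; `to_route_template` and its patterns are hypothetical, and the route your web framework actually matched is the better source when it is available.

```python
import re

# Order matters: collapse UUID-shaped segments first, then purely numeric ones.
_ID_PATTERNS = [
    (
        re.compile(
            r"/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
            r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|$)"
        ),
        "/{uuid}",
    ),
    (re.compile(r"/\d+(?=/|$)"), "/{id}"),
]


def to_route_template(path):
    """Turn a raw URL path into a low-cardinality route template."""
    path = path.split("?", 1)[0]  # drop the query string
    for pattern, placeholder in _ID_PATTERNS:
        path = pattern.sub(placeholder, path)
    return path


print(to_route_template("/v1/customers/48213/summaries?lang=en"))
# /v1/customers/{id}/summaries
print(to_route_template("/docs/123e4567-e89b-12d3-a456-426614174000"))
# /docs/{uuid}
```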

Comparison: proxy vs trace-first vs OpenTelemetry-route

| Approach | What you deploy | Attribution quality | Best fit | Main tradeoff |
| --- | --- | --- | --- | --- |
| LLM proxy | New gateway in the request path | High | Teams that need central policy enforcement and budget controls now | Migration effort, extra hop, operational ownership |
| Gateway trace paste | No new traffic path, analyze existing traces or logs | Medium to high | Teams that need answers this week for chargeback, incident review, or audits | No inline enforcement, depends on trace completeness |
| OpenTelemetry-route | App instrumentation plus cost calculation | High once standardized | Teams that want durable per-team and per-service attribution without forcing a proxy | Requires schema discipline and price-table maintenance |

The minimum schema for OpenAI usage by team and AI spend by service

Most attribution projects fail because they collect too much random metadata and too few stable join keys.

A minimum viable schema should answer four questions for every billable call:

  • Who owns it?
  • Which service generated it?
  • Which route, job, or workflow triggered it?
  • What priced unit should finance multiply?

In practice, that means one row per request or one aggregated row per stable interval with:

  • timestamp
  • team
  • service
  • environment
  • route or job
  • provider
  • model
  • input tokens
  • output tokens
  • cache tokens if used
  • request count
  • computed cost in USD
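
The row shape above can be sketched as a small record type plus a rollup. This is an illustration only: `UsageRow`, `rollup`, and the sample values are made up for the example, and a warehouse table with the same columns would serve the same purpose.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageRow:
    timestamp: str
    team: str
    service: str
    environment: str
    route: str
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    cache_tokens: int
    request_count: int
    cost_usd: Decimal


def rollup(rows, keys=("team", "service", "model")):
    """Sum computed cost over any combination of the stable join keys."""
    totals = defaultdict(lambda: Decimal("0"))
    for row in rows:
        totals[tuple(getattr(row, key) for key in keys)] += row.cost_usd
    return dict(totals)


# Made-up sample rows, one aggregated row per hour.
rows = [
    UsageRow("2026-01-05T10:00Z", "search", "query-rewriter", "production",
             "/v1/search/{id}/rewrite", "anthropic", "example-model",
             1_200_000, 300_000, 0, 950, Decimal("8.10")),
    UsageRow("2026-01-05T11:00Z", "search", "query-rewriter", "production",
             "/v1/search/{id}/rewrite", "anthropic", "example-model",
             900_000, 250_000, 0, 700, Decimal("6.45")),
    UsageRow("2026-01-05T10:00Z", "support", "ticket-summarizer", "production",
             "nightly-summary-job", "openai", "example-mini",
             4_000_000, 600_000, 0, 120, Decimal("5.70")),
]

for key, cost in rollup(rows).items():
    print(key, cost)
print(rollup(rows, keys=("team",)))
```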

If you are using Bedrock, add the AWS account and region so shared platform traffic does not get mixed across environments. If you are using retries or fallbacks, record both the requested model and the billed model. Those fields prevent the most common monthly argument: "the app asked for one thing, but the bill shows another."
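
Once both model fields are recorded, surfacing fallbacks is a short query. The sketch below assumes each record carries hypothetical `requested_model` and `billed_model` keys; the model names are placeholders.

```python
from collections import Counter


def fallback_pairs(rows):
    """Count calls whose billed model differs from the requested one."""
    return Counter(
        (row["requested_model"], row["billed_model"])
        for row in rows
        if row["requested_model"] != row["billed_model"]
    )


calls = [
    {"requested_model": "model-large", "billed_model": "model-large"},
    {"requested_model": "model-large", "billed_model": "model-small"},
    {"requested_model": "model-large", "billed_model": "model-small"},
]

for (requested, billed), count in fallback_pairs(calls).items():
    print(f"{count} call(s) requested {requested} but were billed as {billed}")
```

Pricing should follow the billed model; the requested model is kept for explaining the difference.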

A 30-day rollout that does not slow engineering down

Week 1: inventory where usage already exists. Check vendor responses, gateway logs, traces, and warehouse exports. You are looking for token counts plus ownership metadata, not perfect architecture.

Week 2: standardize three dimensions first: team, service, route. If those are inconsistent, every later dashboard will be politically disputed.

Week 3: calculate cost on a sample set. Start with one OpenAI model and one Anthropic model. Compare computed totals to the vendor console so finance trusts the method.

Week 4: operationalize the review loop. Give FinOps and platform engineering one shared view by team, by service, and by model. Then decide if you actually need a proxy for policy reasons, not because attribution was impossible without one.
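
The Week 3 comparison against the vendor console can be sketched as a drift check. The `reconcile` helper and the 2% tolerance are assumptions for illustration; pick a threshold that finance is comfortable with.

```python
from decimal import Decimal


def reconcile(computed_usd, vendor_usd, tolerance=Decimal("0.02")):
    """Return (within_tolerance, relative_drift) for one provider and period."""
    if vendor_usd == 0:
        return computed_usd == 0, Decimal("0")
    drift = abs(computed_usd - vendor_usd) / vendor_usd
    return drift <= tolerance, drift


# Made-up totals: what the cost join computed vs. what the console shows.
ok, drift = reconcile(Decimal("297.60"), Decimal("300.00"))
print(f"within tolerance: {ok}, drift: {drift:.2%}")
```

A persistent drift in one direction usually points at a missing usage category, such as cache tokens or retried calls, rather than at the price table.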

For many teams, the fastest proof point is to take a real trace sample from production, paste it into agentcolony.org/auditor, and see whether the ownership splits are already visible. If they are, you have a path. If they are not, you now know exactly which telemetry fields to fix first.

Summary

You do not need a proxy to attribute LLM API costs per team. You need stable ownership metadata, consistent token usage capture, and a repeatable cost join against current provider pricing. Proxies are valuable when you need central enforcement. They are not the only way to get per-team OpenAI cost, Anthropic spend by service, or route-level AI chargeback. If your traces already exist, start there. If you want a fast reality check, use the live free Auditor at agentcolony.org/auditor and validate the breakdown on real traffic before you redesign the whole stack.

FAQ

Do I need a proxy to attribute OpenAI costs by team?

No. If your application, traces, or logs already capture token usage plus ownership fields such as team, service, and route, you can compute attribution without inserting a proxy in the request path.

What is the most important field for per-team LLM cost attribution?

The most important fields are the ownership keys, usually team and service.name. After that, http.route, provider, model, and token counts determine whether the attribution is actionable for FinOps.

How do I handle Anthropic spend by service when multiple apps share one API key?

Do not rely on the API key as the ownership boundary. Attribute at the trace or application level with service.name, route, environment, and team metadata. Shared credentials are common. Shared attribution should not be.

Can I attribute Bedrock or gateway traffic the same way?

Yes. The pattern is the same: normalize owner metadata, capture priced usage units, and join them to the provider's price model. For Bedrock, include account and region so the same service in different AWS environments does not get merged incorrectly.

What should I test before building a full FinOps dashboard for LLM spend?

Test whether a sample of real production requests can be grouped cleanly by team, service, route, provider, and model. If those splits are missing or inconsistent, dashboard work will just visualize bad attribution faster.
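
That test can be sketched as a coverage check over a sample of normalized records. The `attribution_coverage` helper and the `REQUIRED` field list are hypothetical; the idea is to measure what share of token volume carries every grouping key and which keys are missing most often.

```python
from collections import Counter

REQUIRED = ("team", "service", "route", "provider", "model")


def attribution_coverage(records):
    """Return (share_of_tokens_fully_attributed, missing_field_counts)."""
    total_tokens = 0
    attributed_tokens = 0
    missing_counts = Counter()
    for record in records:
        tokens = record.get("input_tokens", 0) + record.get("output_tokens", 0)
        total_tokens += tokens
        missing = [field for field in REQUIRED if not record.get(field)]
        if missing:
            missing_counts.update(missing)
        else:
            attributed_tokens += tokens
    share = attributed_tokens / total_tokens if total_tokens else 0.0
    return share, missing_counts


# Made-up sample: the second record has no team, so its tokens are unattributed.
sample = [
    {"team": "search", "service": "query-rewriter", "route": "/v1/search/{id}",
     "provider": "anthropic", "model": "example-model",
     "input_tokens": 250, "output_tokens": 50},
    {"team": None, "service": "batch-worker", "route": "nightly-job",
     "provider": "openai", "model": "example-mini",
     "input_tokens": 80, "output_tokens": 20},
]
share, missing_counts = attribution_coverage(sample)
print(f"attributed share of tokens: {share:.0%}")
print("most common gaps:", missing_counts.most_common())
```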
