Cost Attribution for AI Workloads

Track LLM spend by team, project, and application using Bifrost's hierarchical budgets, virtual keys, and native observability exports.

Teams often discover that their LLM invoices from providers arrive as a single blended total, making it impossible to determine which internal group, service, or project was responsible for the cost. Bifrost, an open-source AI gateway built in Go by Maxim AI, sits between applications and providers as a single routing layer where every request is identified, costed, and recorded with team ownership tags before it travels to the vendor. This guide explains how to establish cost transparency for AI: the organizational hierarchy that owns spend, the instrumentation that captures it, and the export mechanisms that integrate with your monitoring infrastructure.

Understanding Per-Team Cost Attribution for LLM Workloads

Per-team cost attribution is the process of connecting each unit of provider spend to the team, service, or project that initiated it, rather than accepting a single monthly invoice total. This requires assigning an owner dimension to each request, computing its expense at the point of invocation, and rolling those costs up according to organizational structure.

The FinOps framework recognizes two distinct use cases built on cost attribution: showback displays costs to each team for awareness, while chargeback bills teams directly. The FinOps Foundation identifies cost allocation precision as the prerequisite for both models. For LLM systems, the cost unit is the individual request, so measurement must occur at the source of each request issuance.

Where Per-Team Attribution Fails When Teams Share Provider Keys

Attribution becomes impossible when a single provider credential is shared across many internal teams. A provider's cost dashboard shows spending per account and per model, but has zero visibility into your organizational structure, so splitting one AWS Bedrock or OpenAI invoice without external context is not feasible.

Three conditions compound this problem at scale:

Credential sharing: Multiple applications and teams use a single provider API key, so the provider sees one customer, not the actual ten teams generating traffic.
Multi-vendor billing: Usage distributed across OpenAI, Anthropic, AWS Bedrock, and other providers generates separate invoices with different cost semantics.
Missing request context: Without team, project, or environment metadata embedded in each request, reconstructing costs by team after the fact is either impossible or highly error-prone.

A gateway architecture addresses this by becoming the single aggregation point where every request is seen, priced, and tagged. Bifrost secures all real provider credentials internally and distributes scoped virtual keys to consumers as part of its governance model, which establishes cost ownership at the moment of the request rather than requiring retrospective invoice analysis.

How Bifrost's Three-Tier Governance Model Assigns Spend

Bifrost's cost model uses three organizational levels to partition spend: customers, teams, and virtual keys. Each tier maintains an independent budget, and cumulative budget checks occur before the request proceeds to the provider. This ensures automatic rollup: a virtual key's usage feeds into its team's budget, and a team's usage feeds into its customer's budget.

The model mirrors typical organizational charts:

Tier	Purpose in cost tracking	Budget model
Customer	Top-level entity or business division	Independent spending limit, organization-wide ceiling
Team	Department, squad, or functional area	Independent spending limit, group-level ceiling
Virtual Key	Individual developer, microservice, or CI/CD pipeline	Independent limit plus token count and request rate ceilings

Virtual keys function as the atomic governance unit. Each key represents one consumer, specifies which vendors and model families that consumer has access to, and maps to exactly one team or one customer. A twelve-engineer team might have a $500 monthly allocation split across the group, with each individual key capped at $60 monthly; a request succeeds only if both limits have remaining budget. Exceeding any budget level results in an error response coded as a governance violation, preventing further spend accumulation.

Because cost is calculated from the model pricing catalog and deducted across all tiers synchronously with request execution, the metering system naturally generates cost visibility at every level: customer, team, virtual key, and provider simultaneously. When teams negotiate volume discounts with vendors, custom pricing tables ensure reported costs align with negotiated rates rather than default pricing. Request rate limits and token caps also attach to virtual keys, providing request throttling independent of budget constraints.

Instrumentation: How Bifrost Records and Labels Usage

Governance determines ownership; instrumentation records the economics. Bifrost emits native Prometheus metrics through a /metrics endpoint, with counters such as bifrost_cost_total (USD spend), bifrost_input_tokens_total, and bifrost_output_tokens_total collected without introducing request-path latency.

All metrics carry intrinsic labels for immediate queryability: provider, model, virtual_key_id, and virtual_key_name. To align metrics with your own organizational taxonomy, custom dimensions can be attached in two ways:

Static configuration: Define dimensions like team, environment, organization, and project in the gateway's config file, automatically tagging every metric.
Per-request headers: Attach values at request time using x-bf-dim-* headers, for instance x-bf-dim-team: payments or x-bf-dim-project: agent-framework.

These per-request header dimensions propagate beyond Prometheus. The same x-bf-dim-* values are carried into application logs, OpenTelemetry span context, and Maxim's native tagging system, ensuring consistent attribution across all observability backends. A typical cost aggregation query then looks like: sum by (team) (increase(bifrost_cost_total[1d])), which returns 24-hour LLM spend grouped by team.

Requiring specific headers guarantees no request escapes measurement. When you configure a mandatory header like X-Tenant-ID, the gateway's policy layer rejects any request lacking it with a 400 error, preventing unattributed requests from reaching the provider and landing in an unallocated spend bucket.

The Export Stack: Where Usage Data Flows for Reporting

The export layer encompasses all systems and tools where cost and usage data ultimately appears, allowing your teams to view spending. Bifrost feeds existing observability systems rather than serving as a replacement, keeping cost visibility in the dashboards your infrastructure teams already know.

There are three core export channels:

Prometheus scraping and Grafana visualization: Ingest the /metrics endpoint and create cost dashboards using bifrost_cost_total and token counters, sliced by custom dimensions. Alert rules trigger when team spending surpasses thresholds.
OpenTelemetry export: Cost and token data are exported as OTLP traces conforming to the GenAI telemetry spec, feeding platforms like Datadog, New Relic, and Honeycomb.
Structured request logs: A built-in logging system records every request with associated token counts and cost, accessible via a logs query API that filters by provider, model, completion status, and time span for flexible team-level reporting.

The Bifrost Enterprise tier introduces audit logs and compliance-grade log exports for regulated organizations, combined with SAML/OIDC identity provider integration and RBAC so team membership stays current with your identity system. The same cost model also covers tool calls: when Bifrost operates as an MCP gateway, tool invocations flow through the same metered, budgeted path as language model calls. All of this operates with only 11 microseconds of added latency per request at 5,000 concurrent requests in independent performance benchmarks, so cost tracking does not impose a latency penalty.

Frequently Asked Questions

How does per-team attribution differ from what a cloud provider bills?

Provider billing dashboards show total charges per account and per model without understanding your internal organization. Per-team attribution labels every request with owning team metadata at execution time, splitting charges along your own organizational structure that the vendor never knows about.

Can you implement team-based cost tracking without code changes?

Absolutely. Switching to the gateway requires only changing the provider URL, and organizational dimensions can be supplied via x-bf-dim-* headers or configured as mandatory at the gateway layer. Your applications continue to use the same provider SDKs unchanged.

What occurs if a team burns through its allotted budget?

When a team's budget is exhausted, the gateway rejects requests with a budget-limit error message before sending anything to the provider, halting spend at that boundary. Budget enforcement cascades: a request is denied if any level (the virtual key, its parent team, or the parent customer) has zero budget remaining.

How is cost tracking managed when multiple teams share a vendor API key?

Real vendor credentials are kept secret inside the gateway. Teams and services instead receive virtual keys, so even if multiple virtual keys ultimately use one shared vendor credential, expense is tagged to the original virtual key and its owning team.

Implementing Per-Team Cost Attribution

Per-team cost attribution transforms an opaque provider invoice into granular visibility across teams, services, and business units by combining a layered ownership model with continuous metering. Using a three-tier budget hierarchy, virtual key credentials, customizable dimension labels, and out-of-the-box Prometheus, OpenTelemetry, and logs integration, the open-source Bifrost gateway equips infrastructure teams with the same cost governance tools for AI that they apply to cloud services.

To explore how Bifrost enables your platform team to implement per-team cost attribution and spending oversight across every service and AI provider, schedule a demo with the Bifrost team at https://getmaxim.ai/bifrost/book-a-demo.