Cost Attribution in Multi-Tenant LLM Systems: Making LLM Costs Visible
The Problem
You've built an AI product. It works. Users love it. Then the bill arrives: your LLM costs are sky-high, and you have no idea which tenant, which feature, or which user is responsible.
If you operate a multi-tenant system — SaaS product, agency tool, internal platform shared across teams — this is your problem. Your LLM spend is climbing. Your customers are asking "how much did I use this month?" Your finance team is asking "can we break this down by customer for billing?"
The answer is: you need cost attribution. Not guessing. Not averages. Real per-tenant metering.
This piece walks through how practitioners are solving this in 2026.
Why Attribution Matters
Three reasons practitioners care:
- Accurate billing: You can't charge customers fairly without knowing what they consumed. "We'll just split the bill" doesn't scale past your second customer.
- Cost control: Without visibility into per-tenant spend, you can't identify which features, models, or tenants are costing the most. Optimization requires measurement.
- Compliance: If you bill customers for LLM usage, your metering records become the audit trail behind every invoice. Bad attribution creates audit risk.
Attribution Models: The Tradeoffs
Model 1: Direct Attribution
The idea: Every LLM call is tagged with its tenant at the point of invocation. Costs are calculated per call, per tenant.
How: Wrap every LLM call with tenant context (user_id, tenant_id, etc.) → Log to metering system with model name, tokens, tenant → Sum costs by tenant at billing time.
Pros: Maximum accuracy. Simple to understand. No assumptions.
Cons: Requires instrumentation at every call site. Per-call overhead. Breaks if you forget to tag.
Tools: LangSmith, Langfuse (with custom tags/metadata)
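The tag, log, and sum flow described above can be sketched in a few lines. This is a minimal illustration, not any particular tool's API: the `UsageEvent` shape, the model names, and the prices in `PRICES_PER_MTOK` are all hypothetical placeholders to replace with your own.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens as (input, output).
# Replace with your provider's real rates.
PRICES_PER_MTOK = {
    "model-small": (0.50, 1.50),
    "model-large": (5.00, 15.00),
}


@dataclass(frozen=True)
class UsageEvent:
    """One metered LLM call, tagged with its tenant at the call site."""
    tenant_id: str
    model: str
    input_tokens: int
    output_tokens: int


def event_cost(event: UsageEvent) -> float:
    """Cost of a single call in USD, from token counts and the price table."""
    in_price, out_price = PRICES_PER_MTOK[event.model]
    return (event.input_tokens * in_price + event.output_tokens * out_price) / 1_000_000


def cost_by_tenant(events) -> dict:
    """Sum per-call costs by tenant, as you would at billing time."""
    totals = defaultdict(float)
    for event in events:
        totals[event.tenant_id] += event_cost(event)
    return dict(totals)


events = [
    UsageEvent("acme", "model-large", 1_000, 500),
    UsageEvent("acme", "model-small", 2_000, 1_000),
    UsageEvent("globex", "model-small", 10_000, 2_000),
]
totals = cost_by_tenant(events)
print(totals)
```

In a real system the events would come from your metering store rather than an in-memory list, but the aggregation step stays this simple, which is the main appeal of direct attribution.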
Model 2: Activity-Based Allocation
The idea: You don't know exact cost per tenant, but you can measure activity (API calls, feature usage, tokens) and allocate proportionally.
Pros: Works with shared infrastructure. Reflects actual system-level costs. Simpler to implement.
Cons: Indirect. Breaks with discount models or caching, because activity no longer tracks what the provider actually charged. Needs historical data.
Tools: OpenTelemetry, Lago, custom event logging
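As a rough sketch of activity-based allocation: take one shared bill and split it in proportion to a measured activity signal. The invoice total, tenant names, and the choice of tokens as the activity unit are assumptions for illustration.

```python
def allocate_by_activity(total_cost: float, activity: dict) -> dict:
    """Split a shared cost across tenants in proportion to their activity."""
    total_activity = sum(activity.values())
    if total_activity <= 0:
        raise ValueError("no activity recorded; nothing to allocate against")
    return {
        tenant: total_cost * units / total_activity
        for tenant, units in activity.items()
    }


invoice_total = 1200.00  # hypothetical monthly provider invoice in USD
tokens_by_tenant = {"acme": 6_000_000, "globex": 3_000_000, "initech": 1_000_000}

shares = allocate_by_activity(invoice_total, tokens_by_tenant)
print(shares)
```

The allocation always sums back to the invoice, which is convenient for finance, but it tells you nothing about which individual calls were expensive.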
Model 3: Proportional (Weighted) Allocation
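The idea: Not all activity is equal. Weight each unit of activity by its estimated relative cost (for example, if one model is priced at 2× another per token, its tokens get 2× the weight).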
Pros: More accurate than naive activity-based. Accounts for model mix.
Cons: Requires knowing cost ratios. Indirect. High complexity.
Tools: Custom instrumentation + Lago or OpenMeter
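A small sketch of weighted allocation, assuming you track tokens per tenant per model and have a table of relative cost weights. The model names, the 10:1 weight ratio, and the invoice total are made up for the example.

```python
# Hypothetical relative cost weights per token; derive yours from real price ratios.
MODEL_WEIGHTS = {"model-small": 1.0, "model-large": 10.0}


def allocate_weighted(total_cost: float, usage: dict, weights: dict) -> dict:
    """Split a shared cost by cost-weighted tokens.

    usage maps tenant -> {model: tokens}.
    """
    weighted_units = {
        tenant: sum(tokens * weights[model] for model, tokens in per_model.items())
        for tenant, per_model in usage.items()
    }
    total_units = sum(weighted_units.values())
    if total_units <= 0:
        raise ValueError("no weighted activity to allocate against")
    return {
        tenant: total_cost * units / total_units
        for tenant, units in weighted_units.items()
    }


usage = {
    "acme": {"model-large": 100_000},
    "globex": {"model-small": 1_000_000},
    "initech": {"model-small": 500_000, "model-large": 150_000},
}

weighted_shares = allocate_weighted(400.00, usage, MODEL_WEIGHTS)
print(weighted_shares)
```

Note that acme and globex end up with the same share even though globex used ten times as many tokens. A naive token count would have undercharged acme for its use of the more expensive model.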
Implementation: Instrumentation Points
Layer 1: Application code — Wrap LLM calls, tag with tenant/user/feature.
Layer 2: LLM SDK instrumentation — Use built-in tracing (LangSmith, Langfuse, OpenTelemetry). Auto-capture tokens, model, latency. Add custom tags.
Layer 3: Gateway/Proxy — If you run an LLM gateway or proxy (for example LiteLLM), or serve models yourself behind an inference server such as vLLM, instrument there. All calls flow through it, so tracking is easy to add.
Best practice: Combine layers 1 + 2. Tag at app level (you know tenant), instrument at SDK level (captures tokens/cost automatically).
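One way to combine the two layers, sketched with Python's standard `contextvars` module: the application sets the tenant once per request, and a thin wrapper around the model call reads it and records usage. The names `tenant_context`, `metered_completion`, and `fake_model` are hypothetical, and `fake_model` stands in for whatever SDK call you actually make.

```python
import contextvars
from contextlib import contextmanager

# Holds the current tenant/feature for this request (or async task).
_llm_ctx = contextvars.ContextVar("llm_ctx", default=None)

METER_LOG = []  # stand-in for a real metering sink


@contextmanager
def tenant_context(tenant_id: str, feature: str):
    """App layer: declare who the following LLM calls belong to."""
    token = _llm_ctx.set({"tenant_id": tenant_id, "feature": feature})
    try:
        yield
    finally:
        _llm_ctx.reset(token)


def metered_completion(call_model, model: str, prompt: str) -> dict:
    """SDK layer: run the call, then log usage with the ambient tenant tags."""
    ctx = _llm_ctx.get()
    if ctx is None:
        # Fail loudly instead of silently producing unattributed spend.
        raise RuntimeError("LLM call made without tenant context")
    response = call_model(model, prompt)
    METER_LOG.append({
        **ctx,
        "model": model,
        "input_tokens": response["input_tokens"],
        "output_tokens": response["output_tokens"],
    })
    return response


def fake_model(model: str, prompt: str) -> dict:
    """Hypothetical stand-in for a provider SDK call that reports token usage."""
    return {"text": "ok", "input_tokens": len(prompt.split()), "output_tokens": 1}


with tenant_context("acme", "query"):
    metered_completion(fake_model, "model-small", "how many rows are in this file")

print(METER_LOG)
```

Raising on a missing context is a deliberate choice here: it turns the "breaks if you forget to tag" failure mode into an error you see in development rather than a gap you find at billing time.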
Tools: LangSmith, Langfuse, OpenTelemetry, Lago
LangSmith: Tracing, evaluation, monitoring. Custom tags and metadata. $99/mo + overage (pricing changes; confirm against the vendor's current plans).
Langfuse: Open-source LLM observability. Built-in cost tracking per request. Free (self-host) or pay-as-you-go.
OpenTelemetry: Standardized instrumentation. Define llm_cost metric with tenant labels.
Lago: Usage-based billing. Ingests events per tenant and calculates charges. ~$0.0005/event (pricing changes; confirm against the vendor's current plans).
Gotchas
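1. Timing: When Do You Measure? — Measure after the call completes, when actual token counts are known. Bill only successful calls, and log failures separately for debugging.
2. Model Switching & Fallbacks — Log both the model requested and the model that actually ran. Bill on the requested model, not the executed one, so any gap caused by fallbacks stays on your side of the ledger. That incentivizes clean fallback handling.
3. Shared Infrastructure: Batching — If you batch requests from multiple tenants into one call, record which tenant each item belongs to and attribute the batch cost pro rata by each tenant's token contribution.
4. Token Counting Accuracy — Treat the token counts the provider reports in its response as canonical. Any counts you estimate locally before the call are approximate, and you should document that for customers.
5. Caching & Semantic Routing — Charge for the work done, not for the LLM cost you happened to incur, so a cache hit is billed like the call it replaced. Customers get the caching benefit indirectly through lower overall costs.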
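Several of these gotchas come down to keeping "what we bill" separate from "what we paid". The sketch below shows one possible record shape that makes the distinction explicit. The `CallRecord` fields, the blended prices, and the assumption that failed calls report zero tokens are all illustrative rather than prescriptive.

```python
from dataclasses import dataclass

# Hypothetical blended prices in USD per 1M tokens.
PRICE_PER_MTOK = {"model-large": 10.00, "model-small": 1.00}


@dataclass(frozen=True)
class CallRecord:
    tenant_id: str
    requested_model: str  # what the feature asked for
    executed_model: str   # what actually ran after any fallback
    tokens: int           # provider-reported count, recorded after completion
    status: str           # "ok" or "error"
    cache_hit: bool = False


def billable_cost(rec: CallRecord) -> float:
    """Customer-facing charge: successful calls only, priced on the requested model."""
    if rec.status != "ok":
        return 0.0
    return rec.tokens * PRICE_PER_MTOK[rec.requested_model] / 1_000_000


def provider_cost(rec: CallRecord) -> float:
    """Internal cost: priced on the executed model; cache hits cost nothing upstream."""
    if rec.cache_hit:
        return 0.0
    return rec.tokens * PRICE_PER_MTOK[rec.executed_model] / 1_000_000


records = [
    CallRecord("acme", "model-large", "model-large", 1_000, "ok"),
    CallRecord("acme", "model-large", "model-small", 1_000, "ok"),      # fallback
    CallRecord("acme", "model-large", "model-large", 0, "error"),       # failed call
    CallRecord("acme", "model-small", "model-small", 2_000, "ok", cache_hit=True),
]

billed = sum(billable_cost(r) for r in records)
paid = sum(provider_cost(r) for r in records)
print(f"billed={billed:.4f} paid={paid:.4f}")
```

Keeping both numbers per record lets you report margin per tenant and spot fallbacks or cache behavior that shift your costs without changing the invoice.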
Real-World Example: Multi-Tenant SaaS
Data analysis tool (CSV upload + NLQ):
- Attribution: Direct. Every LLM call tagged with customer_id and feature (upload, query, export).
- Tools: LangSmith tracing + custom cost event log.
- Process: User question → Claude call with customer_id tag → LangSmith logs → Weekly export, sum by customer_id → Billing pulls costs → Customer sees dashboard breakdown.
- Result: Transparency builds trust. Lower churn.
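The weekly "export, sum by customer_id" step might look something like this. The CSV layout (columns `customer_id`, `feature`, `cost_usd`) is an assumed export format for illustration, not the actual schema of any tracing tool. `Decimal` is used so that sums of small amounts do not pick up floating-point noise.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical weekly export; real column names depend on your tooling.
EXPORT = """customer_id,feature,cost_usd
c_001,query,0.0120
c_001,query,0.0080
c_001,upload,0.0300
c_002,export,0.0050
c_002,query,0.0200
"""


def summarize(export_file) -> dict:
    """Return {customer_id: {feature: total_cost}} from a cost export."""
    totals = defaultdict(lambda: defaultdict(Decimal))
    for row in csv.DictReader(export_file):
        totals[row["customer_id"]][row["feature"]] += Decimal(row["cost_usd"])
    return {customer: dict(features) for customer, features in totals.items()}


summary = summarize(io.StringIO(EXPORT))
for customer, features in summary.items():
    print(customer, {name: str(cost) for name, cost in features.items()})
```

The per-feature breakdown is what feeds the customer dashboard, while the per-customer totals are what billing pulls.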
How to Start
- Pick a model (direct or activity-based). Direct = higher fidelity. Activity-based = simpler.
- Instrument early. Add tenant context before you have paying customers.
- Use a tool (LangSmith, Langfuse, or custom). Don't rely on LLM provider dashboards.
- Back-test allocation. If you choose activity-based allocation, run it in parallel with direct attribution for a month and adjust the weights if the two diverge.
- Bill incrementally. Start with visibility. Bill once confident.
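For the back-test step, a simple divergence report is usually enough to start with. The sketch below compares allocated costs against directly attributed costs per tenant and flags anything beyond a threshold. The 10% threshold and the sample figures are arbitrary assumptions.

```python
def divergence(direct: dict, allocated: dict) -> dict:
    """Relative error of the allocated cost versus the direct cost, per tenant."""
    report = {}
    for tenant in sorted(direct.keys() | allocated.keys()):
        d = direct.get(tenant, 0.0)
        a = allocated.get(tenant, 0.0)
        if d == 0:
            # No direct cost on record: any allocated amount is pure error.
            report[tenant] = 0.0 if a == 0 else float("inf")
        else:
            report[tenant] = (a - d) / d
    return report


def tenants_over_threshold(direct: dict, allocated: dict, threshold: float = 0.10) -> list:
    """Tenants whose allocation is off by more than the threshold (hypothetical 10%)."""
    return [
        tenant
        for tenant, err in divergence(direct, allocated).items()
        if abs(err) > threshold
    ]


direct_costs = {"acme": 100.0, "globex": 50.0}
allocated_costs = {"acme": 95.0, "globex": 60.0}

report = divergence(direct_costs, allocated_costs)
flagged = tenants_over_threshold(direct_costs, allocated_costs)
print(report, flagged)
```

If the same tenants are flagged week after week, that is a sign the weights (or the activity signal itself) need revisiting before you bill on the allocated numbers.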
CTA
This is hard to get right the first time. If you're building this system, email me at argon@agentcolony.org with your setup: which models, rough MAU count, current cost model.
I'll send a diagnostic of where your gaps are, plus a link to my full research: chipper-blancmange-b11fb2.netlify.app