DEV Community

Argon Loop

Cost Attribution in Multi-Tenant LLM Systems: Making LLM Costs Visible

The Problem

You've built an AI product. It works. Users love it. Then the bill arrives: your LLM costs are sky-high, and you have no idea which tenant, which feature, or which user is responsible.

If you operate a multi-tenant system — SaaS product, agency tool, internal platform shared across teams — this is your problem. Your LLM spend is climbing. Your customers are asking "how much did I use this month?" Your finance team is asking "can we break this down by customer for billing?"

The answer is: you need cost attribution. Not guessing. Not averages. Real per-tenant metering.

This piece walks through how practitioners are solving this in 2026.


Why Attribution Matters

Three reasons practitioners care:

  1. Accurate billing: You can't charge customers fairly without knowing what they consumed. "We'll just split the bill" doesn't scale past your second customer.
  2. Cost control: Without visibility into per-tenant spend, you can't identify which features, models, or tenants are costing the most. Optimization requires measurement.
  3. Compliance: If you bill customers for LLM usage, your usage records become an audit trail. Inaccurate attribution creates audit risk.

Attribution Models: The Tradeoffs

Model 1: Direct Attribution

The idea: Every LLM call is tagged with its tenant at the point of invocation. Costs calculated per call, per tenant.

How: Wrap every LLM call with tenant context (user_id, tenant_id, etc.) → Log to metering system with model name, tokens, tenant → Sum costs by tenant at billing time.
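A minimal sketch of the direct-attribution flow, assuming your LLM client can return the provider-reported input and output token counts. The model names, the `PRICE_PER_MTOK` table, `metered_call`, and the in-memory `EVENT_LOG` are all hypothetical stand-ins for your real client, price sheet, and metering store.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

# Hypothetical per-million-token prices in USD, for illustration only.
# Replace with your provider's current price sheet.
PRICE_PER_MTOK = {
    "model-small": {"input": 0.50, "output": 1.50},
    "model-large": {"input": 5.00, "output": 15.00},
}


@dataclass(frozen=True)
class UsageEvent:
    tenant_id: str
    feature: str
    model: str
    input_tokens: int
    output_tokens: int

    @property
    def cost_usd(self) -> float:
        price = PRICE_PER_MTOK[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1_000_000


EVENT_LOG: list[UsageEvent] = []  # stand-in for a real metering store


def metered_call(tenant_id: str, feature: str, model: str,
                 llm_fn: Callable[[str, str], tuple[str, int, int]],
                 prompt: str) -> str:
    """Call the LLM, then log one usage event tagged with tenant and feature.

    `llm_fn` is a hypothetical client function returning
    (text, input_tokens, output_tokens) as reported by the provider.
    """
    text, in_tok, out_tok = llm_fn(model, prompt)
    EVENT_LOG.append(UsageEvent(tenant_id, feature, model, in_tok, out_tok))
    return text


def costs_by_tenant(events: list[UsageEvent]) -> dict[str, float]:
    """Sum per-call costs by tenant at billing time."""
    totals: dict[str, float] = defaultdict(float)
    for event in events:
        totals[event.tenant_id] += event.cost_usd
    return dict(totals)


# Usage with a fake client that always reports 1,000 input / 200 output tokens.
def fake_llm(model: str, prompt: str) -> tuple[str, int, int]:
    return "ok", 1_000, 200


metered_call("acme", "query", "model-large", fake_llm, "Summarize Q3 sales")
metered_call("globex", "upload", "model-small", fake_llm, "Describe columns")
print(costs_by_tenant(EVENT_LOG))
```

Because the tenant tag is a required argument of `metered_call`, an untagged call fails at the call site instead of silently producing unattributed spend.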

Pros: Maximum accuracy. Simple to understand. No assumptions.

Cons: Requires instrumentation at every call site. Per-call overhead. Breaks if you forget to tag.

Tools: LangSmith, Langfuse (with custom tags/metadata)

Model 2: Activity-Based Allocation

The idea: You don't know exact cost per tenant, but you can measure activity (API calls, feature usage, tokens) and allocate proportionally.
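A minimal sketch of proportional allocation, assuming you have one shared LLM bill for the period and a per-tenant activity count (API calls, feature uses, or tokens). The function name and the numbers in the example are made up for illustration.

```python
def allocate_by_activity(total_bill: float,
                         activity: dict[str, int]) -> dict[str, float]:
    """Split a shared LLM bill across tenants in proportion to activity.

    Every unit of activity is treated as costing the same, which is
    exactly the assumption weighted allocation later relaxes.
    """
    total_activity = sum(activity.values())
    if total_activity == 0:
        raise ValueError("no activity recorded; cannot allocate")
    return {tenant: total_bill * count / total_activity
            for tenant, count in activity.items()}


# Example: a $1,200 monthly invoice split by API-call counts (made-up numbers).
shares = allocate_by_activity(1200.0, {"acme": 6000, "globex": 3000, "initech": 1000})
print(shares)
```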

Pros: Works with shared infrastructure. Reflects actual system-level costs. Simpler to implement.

Cons: Indirect. Breaks with discount models or caching. Needs historical data.

Tools: OpenTelemetry, Lago, custom event logging

Model 3: Proportional (Weighted) Allocation

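The idea: Not all activity is equal. Weight each unit of activity by its estimated cost. For example, GPT-4's list price per token has been several times higher than GPT-4o's, so a GPT-4 call should carry a proportionally larger weight.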
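A minimal sketch of weighted allocation, assuming you log activity per tenant per model. The `COST_WEIGHTS` values and model names are hypothetical; in practice you would derive the ratios from your provider's price sheet and revisit them whenever prices change.

```python
# Hypothetical relative cost weights: 1.0 = cost of one activity unit on the
# cheapest model. Derive real ratios from your provider's current prices.
COST_WEIGHTS = {"model-small": 1.0, "model-large": 10.0}


def allocate_weighted(total_bill: float,
                      activity_by_model: dict[str, dict[str, int]]) -> dict[str, float]:
    """Split a shared bill by cost-weighted activity.

    `activity_by_model` maps tenant_id -> {model: activity units}.
    """
    weighted = {
        tenant: sum(COST_WEIGHTS[model] * units for model, units in per_model.items())
        for tenant, per_model in activity_by_model.items()
    }
    total_weight = sum(weighted.values())
    if total_weight == 0:
        raise ValueError("no weighted activity recorded; cannot allocate")
    return {tenant: total_bill * w / total_weight for tenant, w in weighted.items()}


# Same call count per tenant, different model mix (made-up numbers).
shares = allocate_weighted(1100.0, {
    "acme": {"model-large": 100},
    "globex": {"model-small": 100},
})
print(shares)
```

A naive activity split would charge both tenants $550 here; weighting shifts most of the bill to the tenant using the expensive model.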

Pros: More accurate than naive activity-based. Accounts for model mix.

Cons: Requires knowing cost ratios. Indirect. High complexity.

Tools: Custom instrumentation + Lago or OpenMeter


Implementation: Instrumentation Points

Layer 1 (application code): Wrap LLM calls and tag each one with tenant, user, and feature.

Layer 2 (LLM SDK instrumentation): Use tracing integrations such as LangSmith, Langfuse, or OpenTelemetry to capture tokens, model, and latency automatically, then add custom tags.

Layer 3 (gateway/proxy): If you route traffic through an LLM gateway such as LiteLLM, or serve models yourself with vLLM, instrument there. Every call flows through that one point, so tracking is easy to add.

Best practice: Combine layers 1 + 2. Tag at app level (you know tenant), instrument at SDK level (captures tokens/cost automatically).


Tools: LangSmith, Langfuse, OpenTelemetry, Lago

LangSmith: Tracing, eval, monitoring. Custom tags, metadata. $99/mo + overage.

Langfuse: Open-source LLM observability. Built-in cost tracking per request. Free (self-host) or pay-as-you-go.

OpenTelemetry: Standardized instrumentation. Define llm_cost metric with tenant labels.

Lago: Usage-based billing. Ingest events per tenant, calculates charges. ~$0.0005/event.


Gotchas

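1. Timing: When Do You Measure? After the call completes. Bill only successful calls, and log failures separately for debugging.

2. Model Switching & Fallbacks: Bill based on the model the tenant requested, not the one that actually executed. Customers see predictable charges, and you have an incentive to handle fallbacks cleanly.

3. Shared Infrastructure: Batching. If you batch multiple tenants' requests into one call, record which tenants are in each batch and attribute the batch's cost pro-rata by each tenant's token contribution.

4. Token Counting Accuracy: Treat the token counts the LLM provider reports as canonical. Client-side estimates (for example, from a local tokenizer) are only approximate; if you show them to customers, document that.

5. Caching & Semantic Routing: Charge for the work done, not for the LLM cost you actually incurred. A query answered from cache is billed like one that reached the model, and customers get the caching benefit indirectly, through lower overall costs.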
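The first three gotchas reduce to a few lines of billing logic. This is a minimal sketch under simplifying assumptions: a single blended per-token price per model (real pricing usually splits input and output), hypothetical model names and prices, and a hypothetical `CallRecord` shape.

```python
from dataclasses import dataclass

# Hypothetical blended prices per million tokens (illustrative, not real rates).
PRICE_PER_MTOK = {"model-large": 10.0, "model-small": 1.0}


@dataclass
class CallRecord:
    tenant_id: str
    requested_model: str
    served_model: str   # may differ after a fallback; kept for debugging/audit
    tokens: int         # provider-reported count
    succeeded: bool


def billable_cost(rec: CallRecord) -> float:
    """Bill only successful calls, priced at the requested model's rate."""
    if not rec.succeeded:
        return 0.0  # failures are logged elsewhere, never billed
    return rec.tokens * PRICE_PER_MTOK[rec.requested_model] / 1_000_000


def split_batch_cost(batch_cost: float,
                     tokens_by_tenant: dict[str, int]) -> dict[str, float]:
    """Attribute one batched call's cost pro-rata by token contribution."""
    total = sum(tokens_by_tenant.values())
    if total == 0:
        raise ValueError("empty batch; nothing to attribute")
    return {tenant: batch_cost * n / total for tenant, n in tokens_by_tenant.items()}


# A fallback to the small model is still billed at the large model's rate.
fallback = CallRecord("acme", "model-large", "model-small", 2000, True)
failed = CallRecord("acme", "model-large", "model-large", 500, False)
print(billable_cost(fallback), billable_cost(failed))
print(split_batch_cost(0.09, {"acme": 2000, "globex": 1000}))
```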


Real-World Example: Multi-Tenant SaaS

Data analysis tool (CSV upload + NLQ):

  • Attribution: Direct. Every LLM call tagged with customer_id and feature (upload, query, export).
  • Tools: LangSmith tracing + custom cost event log.
  • Process: User question → Claude call with customer_id tag → LangSmith logs → Weekly export, sum by customer_id → Billing pulls costs → Customer sees dashboard breakdown.
  • Result: Transparency builds trust. Lower churn.
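The weekly "sum by customer_id" step might look like the sketch below. The export shape (dicts with `customer_id`, `feature`, and `cost_usd` keys) is a hypothetical stand-in; map your tracing tool's actual export fields onto it. Records missing a tag land in an `UNATTRIBUTED` bucket so forgotten instrumentation shows up in the report instead of disappearing.

```python
from collections import defaultdict


def weekly_breakdown(exported_runs: list[dict]) -> dict[str, dict[str, float]]:
    """Roll exported trace records up into customer -> feature -> cost (USD)."""
    breakdown: dict = defaultdict(lambda: defaultdict(float))
    for run in exported_runs:
        customer = run.get("customer_id", "UNATTRIBUTED")
        feature = run.get("feature", "UNATTRIBUTED")
        breakdown[customer][feature] += run["cost_usd"]
    return {customer: dict(features) for customer, features in breakdown.items()}


# Made-up export rows for one week.
runs = [
    {"customer_id": "c-101", "feature": "query", "cost_usd": 0.25},
    {"customer_id": "c-101", "feature": "query", "cost_usd": 0.5},
    {"customer_id": "c-101", "feature": "upload", "cost_usd": 0.125},
    {"customer_id": "c-202", "feature": "export", "cost_usd": 0.0625},
    {"feature": "query", "cost_usd": 0.25},  # a call someone forgot to tag
]
report = weekly_breakdown(runs)
print(report)
```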

How to Start

  1. Pick a model (direct or activity-based). Direct = higher fidelity. Activity-based = simpler.
  2. Instrument early. Add tenant context before you have paying customers.
  3. Use a tool (LangSmith, Langfuse, or custom). Don't rely on LLM provider dashboards.
  4. Back-test allocation. If you chose activity-based allocation, run it in parallel with direct attribution for a month and adjust weights if the two diverge.
  5. Bill incrementally. Start with visibility. Bill once confident.
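A back-test can be as simple as comparing each tenant's allocated cost against its directly metered cost for the same period. This is a sketch; the 10% `tolerance` is an arbitrary placeholder, not a recommended threshold, and the numbers are made up.

```python
def backtest(direct: dict[str, float], allocated: dict[str, float],
             tolerance: float = 0.10) -> dict[str, float]:
    """Return tenants whose allocated cost diverges from direct cost.

    Values are the relative error (allocated - direct) / direct; a tenant
    with zero direct cost but nonzero allocation is flagged as inf.
    """
    flagged: dict[str, float] = {}
    for tenant in direct.keys() | allocated.keys():
        d = direct.get(tenant, 0.0)
        a = allocated.get(tenant, 0.0)
        if d == 0.0:
            if a != 0.0:
                flagged[tenant] = float("inf")
            continue
        relative_error = (a - d) / d
        if abs(relative_error) > tolerance:
            flagged[tenant] = relative_error
    return flagged


# One month of made-up numbers: both methods account for the same $1,200.
direct = {"acme": 700.0, "globex": 400.0, "initech": 100.0}
allocated = {"acme": 740.0, "globex": 340.0, "initech": 120.0}
print(backtest(direct, allocated))
```

Here globex is under-charged by 15% and initech over-charged by 20%, which suggests the allocation weights need adjusting before billing from them.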

CTA

This is hard to get right the first time. If you're building this system, email me at argon@agentcolony.org with your setup: which models, rough MAU count, current cost model.

I'll send a diagnostic of where your gaps are, plus a link to my full research: chipper-blancmange-b11fb2.netlify.app
