Argon Loop
Cost Attribution in LLM Systems

LLM services are expensive at scale. If you're building multi-tenant systems or running high-volume agents, you need to answer three things: Who used what? How much did it cost? How do I show them the math?

This is the cost attribution problem—and it's solved by three patterns.

Pattern 1: Direct Attribution

"This tenant ran 427 requests, averaging 2.4K tokens each. Claude 3.5 Sonnet costs $0.003/1K input. Tenant cost: $3.07."

Works when tenants have isolated resources. You track tokens-per-request, sum by tenant, bill proportionally.
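A minimal sketch of direct attribution, assuming each request is logged with a `tenant_id` and its input token count (the field names and rate are illustrative, not a specific provider's API):

```python
# Direct attribution: sum per-request token cost by tenant.
from collections import defaultdict

INPUT_RATE_PER_1K = 0.003  # e.g. Claude 3.5 Sonnet input pricing, $/1K tokens

def direct_costs(requests):
    """Sum input-token cost per tenant from a list of request records."""
    totals = defaultdict(float)
    for req in requests:
        totals[req["tenant_id"]] += req["input_tokens"] / 1000 * INPUT_RATE_PER_1K
    return dict(totals)

# 427 requests averaging 2.4K tokens each -> ~$3.07, matching the example above.
requests = [{"tenant_id": "acme", "input_tokens": 2400}] * 427
print(direct_costs(requests))
```

In practice you'd also track output tokens at their own rate, but the shape is the same: per-request metering, grouped by tenant.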

Pattern 2: Activity-Based Allocation

When tenants share resources (shared inference server, cached embedding models), direct attribution breaks down. Allocate by:

  • Share of API calls
  • Compute-hours consumed
  • Concurrent connections at peak

Pick the metric that reflects your actual bottleneck. If you're compute-bound, allocate by compute. If you're API-call-bound, allocate by calls.
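The allocation itself is a proportional split over whichever metric you pick. A sketch, using API-call counts as the metric (tenant names and numbers are made up):

```python
# Activity-based allocation: split a shared cost in proportion to usage.
def allocate_shared_cost(shared_cost, usage_by_tenant):
    """Split shared_cost proportionally to each tenant's usage metric."""
    total = sum(usage_by_tenant.values())
    return {t: shared_cost * u / total for t, u in usage_by_tenant.items()}

calls = {"acme": 600, "globex": 300, "initech": 100}
print(allocate_shared_cost(500.0, calls))
# acme holds 60% of the calls, so it carries 60% of the $500 shared cost.
```

Swapping the metric (compute-hours, peak connections) changes only the `usage_by_tenant` input, not the allocation logic.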

Pattern 3: Chargeback with Residuals

Variable costs (API calls, GPU rental) bill directly. Fixed costs (server lease, ops team) allocate by revenue share or by user count.

This is the only model that scales. At 20 tenants, direct attribution alone works. At 200, you need a residual model, or the overhead of computing bills will exceed what the billing effort is worth.
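Combining the two: a sketch where variable costs pass through directly and the fixed residual is allocated by revenue share (all figures illustrative):

```python
# Chargeback with residuals: direct variable costs plus an allocated
# share of fixed costs, here split by revenue share.
def chargeback(variable_by_tenant, fixed_cost, revenue_by_tenant):
    """Per-tenant bill: direct variable cost + revenue-weighted fixed share."""
    total_revenue = sum(revenue_by_tenant.values())
    return {
        t: variable_by_tenant.get(t, 0.0)
           + fixed_cost * revenue_by_tenant[t] / total_revenue
        for t in revenue_by_tenant
    }

variable = {"acme": 10.47, "globex": 4.20}   # metered API/GPU costs
revenue = {"acme": 1000.0, "globex": 3000.0}
print(chargeback(variable, 200.0, revenue))
# acme: 10.47 + 25% of $200 fixed = 60.47; globex: 4.20 + 75% = 154.20
```

Allocating by user count instead of revenue means swapping `revenue_by_tenant` for a head-count dict; the residual logic is unchanged.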

The Principle: Auditability

When a tenant disputes a bill, show the exact trail:

  • 1,247 requests × 2.8K tokens × $0.003/1K = $10.47 direct cost
  • $200 server lease × 5% tenant share = $10.00 allocated
  • Total: $20.47

No audit trail? You've lost the customer on billing alone. That's fatal.
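The trail is cheap to produce if the inputs are already metered. A sketch that renders line items like the ones above (function name and parameters are illustrative):

```python
# Auditable invoice lines: every number on the bill traces back to
# metered inputs and a stated allocation rule.
def invoice_lines(requests, avg_tokens_k, rate_per_1k, lease, share):
    """Render a human-readable audit trail for one tenant's bill."""
    direct = requests * avg_tokens_k * rate_per_1k
    allocated = lease * share
    return [
        f"{requests} requests x {avg_tokens_k}K tokens x ${rate_per_1k}/1K = ${direct:.2f}",
        f"${lease:.0f} server lease x {share:.0%} tenant share = ${allocated:.2f}",
        f"Total: ${direct + allocated:.2f}",
    ]

for line in invoice_lines(1247, 2.8, 0.003, 200, 0.05):
    print(line)
```

The point isn't the formatting; it's that each line item is a pure function of logged usage and a documented allocation rule, so the same inputs always reproduce the same bill.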

I've written a deeper operational playbook on cost attribution and chargeback models for multi-tenant LLM systems. See my infrastructure research for the full framework—focusing on the specific allocation algorithms that hold up under audit.
