Span attributes that catch LLM cost regressions before billing does

#ai #llm #devops #observability

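The default OTel + OpenInference span carries input and output token counts as numeric attributes (called llm.tokens.input and llm.tokens.output in this post; OpenInference's own names are llm.token_count.prompt and llm.token_count.completion). Useful for trace-level debugging. Not useful for catching per-team cost regressions, because nothing on the span says which team a trace belongs to.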

The 3 attributes that earned their keep:

team.id. Tagged on every span at the gateway layer (before the call routes to the LLM provider). This is the column that makes the cost rollup possible. Without it, you can attribute spend to an org but not to a team inside the org.

feature.id. The product feature that triggered the call (chat_assistant, summarizer, rag_search). Lets you see when one feature's token cost spikes vs the overall trend.

llm.model. Already standard in OpenInference (as llm.model_name) but worth flagging: without it, you cannot separate a cheap mini-tier model's spikes from a frontier model's spikes when both serve the same feature.
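As a rough illustration of the tagging step, here is a minimal Python sketch that builds the attribute set for one LLM call. The function name gateway_span_attributes, the KNOWN_FEATURES allowlist, and the "unknown" fallback bucket are assumptions made for this example, not part of our gateway or of any library.

```python
from typing import Dict, Union

AttrValue = Union[str, int]

# Hypothetical allowlist: pinning feature.id to a fixed set keeps cardinality bounded.
KNOWN_FEATURES = {"chat_assistant", "summarizer", "rag_search"}


def gateway_span_attributes(
    team_id: str,
    feature_id: str,
    model: str,
    input_tokens: int,
    output_tokens: int,
) -> Dict[str, AttrValue]:
    """Build the attributes the gateway would attach to one LLM span."""
    if not team_id:
        # Without team.id the spend can only be attributed to the org.
        raise ValueError("team.id is required for the per-team rollup")
    if feature_id not in KNOWN_FEATURES:
        # Assumption: unregistered features share one bucket instead of
        # creating a new label value each time.
        feature_id = "unknown"
    return {
        "team.id": team_id,
        "feature.id": feature_id,
        "llm.model": model,
        "llm.tokens.input": int(input_tokens),
        "llm.tokens.output": int(output_tokens),
    }


attrs = gateway_span_attributes("team-payments", "summarizer", "mini-tier", 1200, 300)
```

With the OpenTelemetry Python API, a dict like this could be passed to set_attributes on the span that wraps the provider call. Building it in one place at the gateway is what keeps the three grouping keys present on every span.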

The daily Tempo + Grafana query (TraceQL):

{ resource.service.name = "llm-gateway" }
| quantile_over_time(span.llm.tokens.output, .95) by (span.team.id, span.feature.id, span.llm.model)

The alert rule: page when the trailing 7-day average of output tokens per team per feature is more than 2x the average for the 7 days before it. We caught a runaway retry loop last quarter that the org-level spend dashboard missed: the total stayed within budget while one team's bill quietly doubled.
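The comparison behind that rule can be sketched in a few lines of Python. This is not our actual alerting config; it assumes you already have daily output-token totals per (team.id, feature.id) pair, and the function names and sample numbers are made up for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (team.id, feature.id)


def week_over_week_ratio(daily_tokens: List[float]) -> float:
    """Latest 7-day average divided by the 7-day average before it."""
    if len(daily_tokens) < 14:
        raise ValueError("need at least 14 daily values")
    previous = mean(daily_tokens[-14:-7])
    current = mean(daily_tokens[-7:])
    if previous == 0:
        # No baseline: any traffic counts as an unbounded jump.
        return float("inf") if current > 0 else 1.0
    return current / previous


def keys_to_page(series: Dict[Key, List[float]], threshold: float = 2.0) -> List[Key]:
    """Return the (team, feature) pairs whose ratio exceeds the threshold."""
    return sorted(
        key
        for key, days in series.items()
        if len(days) >= 14 and week_over_week_ratio(days) > threshold
    )


# Hypothetical data: one small team jumps 2.5x while a large team dips slightly.
series = {
    ("team-a", "chat_assistant"): [100] * 7 + [250] * 7,
    ("team-b", "summarizer"): [1000] * 7 + [900] * 7,
}
paged = keys_to_page(series)  # only team-a's chat_assistant crosses 2x
```

In this made-up data the org-wide daily total only moves from 1100 to 1150 tokens, which is the same shape as the retry-loop incident: the grouped ratio fires while the aggregate barely changes.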

What we tried and dropped:

user.id tagging: privacy concerns at scale, and the rollup-by-team covered the use case.

request.id tagging: redundant with the trace_id; just adds cardinality.

Drafted with AI assistance, edited and verified by author.