I used to look at AI costs the same way a lot of teams do:
- OpenAI dashboard
- model-by-model spend
- token counts
- maybe split traffic by API key if things got ugly
That works right up until your "one chatbot" turns into actual agent infrastructure.
Now the bot is touching Slack, Telegram, n8n, OpenClaw, a couple internal tools, and some workflow you forgot was still retrying in the background.
At that point, model-level pricing stops being the useful metric.
The thing you actually need is cost per workflow.
Not:
- cost per model
- cost per provider
- cost per token bucket
But:
- cost per Slack channel
- cost per customer
- cost per automation run
- cost per conversation
- cost per workflow execution
That distinction sounds small until you miss the one channel or workflow that is quietly lighting money on fire.
The Reddit post that said the quiet part out loud
While digging through agent cost discussions, I found a thread on r/openclaw about tracking cost per Slack channel or Telegram topic in SigNoz.
One suggestion was the usual answer: use separate API keys per channel.
The original poster immediately shut that down:
“Problem is that I need to track cost per slack channel without adding new api key when I add bot to new channel”
That is the real problem.
Not "which model is expensive?"
Not even "how many tokens did we use?"
The real question is:
Which unit of work is causing spend?
Once you start thinking in those terms, a lot of popular AI pricing advice looks incomplete.
Provider dashboards are telling the truth and still lying to you
I still want provider dashboards.
If GPT-5 usage spikes after a deploy, I want to know.
If Claude Opus 4.6 suddenly doubles in cost, I want to know.
If Grok 4.20 starts showing up in traces where it should not, I want to know.
But those dashboards mostly answer procurement questions.
They do not answer operator questions.
If one OpenClaw bot serves:
- 40 Slack channels
- 12 Telegram topics
- 3 customer automations in n8n
then "Claude cost this much this week" is interesting, but not actionable enough.
What I actually want is:
- Which Slack channel is the most expensive?
- Which customer account is triggering retries and tool loops?
- Which n8n workflow fans out into 5 model calls instead of 1?
- Which automation got more expensive after I added retrieval?
That is operations.
And once agents become multi-step systems, operations matters more than raw model spend.
Why per-model reporting breaks as soon as agents get real
A single LLM call is not the same thing as an agent run.
That sounds obvious, but teams still budget like those are interchangeable.
They are not.
A real agent run might look like this:
- classify the message
- retrieve context
- summarize prior conversation
- call a tool
- generate a response
- reformat output for Slack
- retry because the tool call failed the first time
That is one business event.
But under the hood it might hit multiple models, multiple retries, and multiple tools.
So when someone says, "just track model spend," my first reaction is: for what exactly?
Because the business event was not:
- GPT-5 used 18k tokens
- Claude Opus 4.6 handled generation
- some proxy logged a cheap call
The business event was:
- support escalation workflow failed twice and cost $1.18
- sales-assist bot in #enterprise-deals suddenly costs 4x more than last week
- onboarding workflow now uses 3 model calls where it used to use 1
That is what engineers and operators can act on.
The metric I would use first: cost per completed workflow
If you run agents in production, my opinionated take is simple:
Your primary cost metric should be cost per completed workflow.
Then break that down by the dimensions that match your system.
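Here is a minimal sketch of how that metric could be computed. The `WorkflowRun` shape, its field names, and the sample costs are assumptions for illustration, not something any provider gives you. The point is that spend from failed runs still counts, while only completed runs go in the denominator.

```python
from dataclasses import dataclass


@dataclass
class WorkflowRun:
    """One execution of a workflow (hypothetical record shape)."""
    workflow_id: str
    completed: bool               # did the run finish its business task?
    call_costs_usd: list[float]   # cost of every LLM call in the run, retries included


def cost_per_completed_workflow(runs: list[WorkflowRun]) -> dict[str, float]:
    """Total spend per workflow (failed runs included) divided by completed runs."""
    spend: dict[str, float] = {}
    completed: dict[str, int] = {}
    for run in runs:
        spend[run.workflow_id] = spend.get(run.workflow_id, 0.0) + sum(run.call_costs_usd)
        completed[run.workflow_id] = completed.get(run.workflow_id, 0) + int(run.completed)
    # A workflow with spend but no completed runs is reported as infinity.
    return {
        wf: (spend[wf] / completed[wf]) if completed[wf] else float("inf")
        for wf in spend
    }


# Made-up sample data: three runs, one of which failed after retries.
runs = [
    WorkflowRun("support_triage_v3", True, [0.02, 0.05, 0.03]),
    WorkflowRun("support_triage_v3", False, [0.02, 0.04, 0.04]),
    WorkflowRun("support_triage_v3", True, [0.02, 0.08]),
]
print(cost_per_completed_workflow(runs))
```

In this made-up data each run costs about $0.10, but because one of the three failed, the cost per completed workflow comes out near $0.15 rather than $0.10.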
The dimensions that actually matter
Here are the fields I would want on every LLM request:
- workflow_id
- customer_id
- channel
- stage
- feature
- user_id
- conversation_id
- environment
Examples:
- workflow_id=support_triage_v3
- channel=slack:#sales
- customer_id=acct_1284
- stage=retrieval
- feature=thread_summary
- environment=production
This is where a lot of so-called LLM cost optimization goes sideways.
Teams spend weeks debating whether to move from Claude to Qwen, or from GPT-5 to Llama, but they still cannot answer:
Which workflow is worth what it costs?
That is the question that should drive the model choice, not the other way around.
Practical example: tagging requests instead of guessing later
If you are using an OpenAI-compatible client, the cleanest pattern is to attach metadata at request time.
Here is a LiteLLM-style example:
from openai import OpenAI

# Point the standard OpenAI client at a local OpenAI-compatible proxy.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-local")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Summarize this support thread"}
    ],
    user="acct_1284",
    # extra_body is merged into the request JSON, so the proxy sees these tags.
    extra_body={
        "metadata": {
            "workflow_id": "support_triage_v3",
            "channel": "slack:#support-enterprise",
            "stage": "summarization",
            "conversation_id": "thread_98765",
            "environment": "production"
        }
    }
)
That one change gives you something way more useful than "model spend."
Now you can answer:
- how much support_triage_v3 costs
- whether #support-enterprise is abnormally expensive
- whether summarization is the expensive stage
- whether one account is causing most of the load
Without metadata, you are guessing after the fact.
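To make that concrete, here is a rough sketch of the rollup those tags enable. The `calls` list is a hypothetical export of logged requests (metadata plus a `cost_usd` field that I am assuming your proxy or tracing tool records), and `cost_by` is an illustrative helper, not part of any library.

```python
from collections import defaultdict

# Hypothetical call log: the metadata sent with each request plus the recorded cost.
calls = [
    {"workflow_id": "support_triage_v3", "channel": "slack:#support-enterprise",
     "stage": "summarization", "customer_id": "acct_1284", "cost_usd": 0.030},
    {"workflow_id": "support_triage_v3", "channel": "slack:#support-enterprise",
     "stage": "generation", "customer_id": "acct_1284", "cost_usd": 0.010},
    {"workflow_id": "support_triage_v3", "channel": "slack:#sales",
     "stage": "summarization", "customer_id": "acct_2001", "cost_usd": 0.005},
    {"stage": "generation", "cost_usd": 0.002},  # a call that shipped without tags
]


def cost_by(records, dimension):
    """Sum cost_usd per value of one metadata field, most expensive first."""
    totals = defaultdict(float)
    for record in records:
        # Calls missing the field are grouped under "untagged" instead of dropped.
        totals[record.get(dimension, "untagged")] += record["cost_usd"]
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


for dimension in ("workflow_id", "channel", "stage", "customer_id"):
    print(dimension, cost_by(calls, dimension))
```

Keeping an explicit "untagged" bucket is a deliberate choice here: it shows how much spend you still cannot attribute.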
Tiny per-call cost numbers are not enough
One thing that keeps fooling people: per-call cost looks precise, so it feels useful.
LiteLLM exposes headers like x-litellm-response-cost, and their docs show examples with tiny values like 2.85e-05.
That is cool. I want that data.
But on its own, it does not solve the operational problem.
If I see this:
curl -i -sSL 'http://0.0.0.0:4000/chat/completions' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "what llm are you"}]
  }' | grep 'x-litellm'
and I get back a tiny cost value, great.
But I still need to know:
- which workflow produced that call
- whether it was part of a retry loop
- which customer or channel it belonged to
- whether that call was useful or wasted
A precise number without context is still weak telemetry.
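One way to give that number context is to store it next to the metadata you sent, at the moment the response comes back. The sketch below assumes you can get at the response headers (with the OpenAI Python client, `client.chat.completions.with_raw_response.create(...)` should expose them) and that the cost arrives in `x-litellm-response-cost`. The `cost_record` helper and the `attempt` counter are hypothetical names of mine.

```python
def cost_record(headers, metadata, attempt=1):
    """Combine the per-call cost header with the business metadata for that call."""
    # Normalize header names so a plain dict works as well as a case-insensitive one.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw_cost = lowered.get("x-litellm-response-cost")
    return {
        **metadata,
        "attempt": attempt,
        "is_retry": attempt > 1,  # lets you separate useful calls from retry spend
        "cost_usd": float(raw_cost) if raw_cost is not None else None,
    }


# Example with a hand-written header dict standing in for a real response.
record = cost_record(
    {"x-litellm-response-cost": "2.85e-05"},
    {"workflow_id": "support_triage_v3", "channel": "slack:#support-enterprise"},
    attempt=2,
)
print(record)
```

The tiny cost value is unchanged, but it now answers which workflow and channel it belongs to and whether it was a retry.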
Helicone and Langfuse are pointing at the same answer
This is not some exotic pattern.
The tooling already exists.
Helicone is very explicit about custom properties like conversation, app, environment, and other request metadata.
Example:
Helicone-Property-Conversation: support_issue_2
Helicone-Property-App: mobile
Helicone-Property-Environment: production
That is exactly how you answer questions like:
- Why did mobile support get expensive yesterday?
- Why is production spending 4x more than staging?
- Which conversation type causes long agent loops?
Langfuse lands in the same place with traces, tags, user IDs, and rollups.
The common pattern is obvious once you see it:
Attach business context to every model call.
Then aggregate cost by workflow or conversation first, model second.
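If you already keep that context in a dict, turning it into Helicone-style property headers is mechanical. This is only a sketch: `helicone_property_headers` is a hypothetical helper, and the way it capitalizes field names is my own convention chosen to match the header examples above.

```python
def helicone_property_headers(metadata: dict[str, str]) -> dict[str, str]:
    """Map metadata fields to Helicone-Property-* request headers."""
    headers = {}
    for key, value in metadata.items():
        # workflow_id -> Workflow-Id, app -> App (naming convention is an assumption)
        name = "-".join(part.capitalize() for part in key.split("_"))
        headers[f"Helicone-Property-{name}"] = str(value)
    return headers


headers = helicone_property_headers({
    "conversation": "support_issue_2",
    "app": "mobile",
    "environment": "production",
    "workflow_id": "support_triage_v3",
})
for name, value in headers.items():
    print(f"{name}: {value}")
```

With the OpenAI Python client, a dict like this can typically be passed per request via the `extra_headers` argument, assuming your traffic is routed through Helicone.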
What OpenAI usage APIs can and cannot do
OpenAI's usage and cost APIs are better than they used to be.
Grouping by things like:
- project_ids
- user_ids
- api_key_ids
- models
is genuinely useful.
But there is a hard limit here.
If five workflows all share one project and one API key, OpenAI cannot infer your internal business unit of work.
It cannot know that:
- #support-enterprise is the expensive Slack channel
- lead_enrichment_v3 is retrying too much
- one tenant is generating most of the spend
That information only exists if you pass it through your stack.
So yes, use provider-native reporting.
Just do not expect it to solve workflow economics by itself.
The sneaky cost categories that make token charts misleading
A lot of teams still reason about cost like this:
- input tokens
- output tokens
That is already too simplistic for real agent systems.
Depending on your setup, cost behavior can also be affected by:
- cached tokens
- audio tokens
- image tokens
- retries
- tool-call loops
- provider-specific pricing tiers
- routing decisions between models
That means two workflows with similar token totals can have very different actual cost profiles.
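A quick worked example shows how far apart those profiles can be. The prices below are hypothetical placeholders, not any provider's real rates, and the two workflows are invented. Both use exactly 100,000 tokens.

```python
# Hypothetical prices in USD per million tokens; real rates vary by provider and model.
PRICE_PER_MTOK = {"input": 2.00, "cached_input": 0.50, "output": 8.00}


def call_cost(tokens: dict[str, int]) -> float:
    """Cost of one call given token counts per pricing category."""
    return sum(PRICE_PER_MTOK[kind] * count / 1_000_000 for kind, count in tokens.items())


def run_cost(calls: list[dict[str, int]]) -> float:
    """Cost of a whole workflow run, retries included."""
    return sum(call_cost(call) for call in calls)


def total_tokens(calls: list[dict[str, int]]) -> int:
    return sum(sum(call.values()) for call in calls)


# Workflow A: one call, mostly cached prompt tokens.
cache_friendly = [{"input": 10_000, "cached_input": 80_000, "output": 10_000}]
# Workflow B: the same call made twice because of a retry, with long outputs.
retry_heavy = [{"input": 20_000, "cached_input": 0, "output": 30_000}] * 2

for name, calls in (("cache_friendly", cache_friendly), ("retry_heavy", retry_heavy)):
    print(name, total_tokens(calls), round(run_cost(calls), 4))
```

Under these assumed rates the token totals are identical, yet the retry-heavy workflow costs about four times as much (roughly $0.56 versus $0.14).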
So when someone says, "we used 8 million tokens this week," my reaction is usually: okay, but where did the money actually go?
I would rather know:
- comment moderation costs $0.004 per item
- enterprise support escalation costs $0.19 per incident
- invoice reconciliation costs $0.11 per successful run
Those are numbers you can optimize.
If you do not propagate metadata on every request, the whole thing rots
This is the annoying part.
Workflow-level attribution only works if your engineers are disciplined.
If one service forgets to pass:
- workflow_id
- channel
- customer_id
- stage
then your charts slowly become fiction.
That is why I would treat metadata propagation as part of the contract, not a nice-to-have.
A setup I would actually ship
If I had to implement this tomorrow, I would do something like this:
1. Pick one primary unit of cost
Choose one:
- workflow run
- conversation
- customer interaction
- automation execution
Do not pick five primary units.
2. Define required metadata fields
For example:
{
  "workflow_id": "support_triage_v3",
  "customer_id": "acct_1284",
  "channel": "slack:#support-enterprise",
  "stage": "generation",
  "user_id": "u_991",
  "conversation_id": "thread_98765",
  "environment": "production"
}
3. Enforce it in middleware
If you already use an OpenAI-compatible layer, this is a good place to enforce request shape.
Pseudo-code:
// Reject any request whose metadata is missing a required cost-attribution field.
function assertCostMetadata(meta: Record<string, string>) {
  const required = [
    "workflow_id",
    "customer_id",
    "channel",
    "stage",
    "environment"
  ]

  for (const key of required) {
    // Empty strings count as missing too.
    if (!meta[key]) {
      throw new Error(`Missing required metadata: ${key}`)
    }
  }
}
4. Aggregate by workflow first, model second
That ordering matters.
If you reverse it, you end up shaving pennies off model selection while one workflow keeps exploding in production.
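As a rough illustration of that ordering, the sketch below groups by workflow first and only then splits each workflow's spend by model. The record shape and the numbers are made up, and `rollup` is a hypothetical helper.

```python
from collections import defaultdict


def rollup(records):
    """Group cost by workflow, then by model inside each workflow."""
    by_workflow = defaultdict(lambda: defaultdict(float))
    for record in records:
        by_workflow[record["workflow_id"]][record["model"]] += record["cost_usd"]

    report = [
        {
            "workflow_id": workflow_id,
            "total_usd": sum(models.values()),
            "by_model": dict(models),
        }
        for workflow_id, models in by_workflow.items()
    ]
    # Most expensive workflow first: that is where attention should go.
    report.sort(key=lambda row: row["total_usd"], reverse=True)
    return report


# Made-up records for illustration.
records = [
    {"workflow_id": "lead_enrichment_v3", "model": "gpt-4o-mini", "cost_usd": 0.40},
    {"workflow_id": "lead_enrichment_v3", "model": "gpt-3.5-turbo", "cost_usd": 0.10},
    {"workflow_id": "support_triage_v3", "model": "gpt-4o-mini", "cost_usd": 0.05},
]
for row in rollup(records):
    print(row)
```

Read top-down, the report points at the exploding workflow before anyone starts debating which model to swap.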
5. Use provider dashboards as a secondary lens
Good for:
- anomaly detection
- provider comparisons
- procurement discussions
- catching bad deploys
Not enough for:
- workflow economics
- per-channel attribution
- customer-level profitability
Where flat-rate infrastructure starts making more sense
This is also why flat-rate AI infrastructure is getting more interesting for agent teams.
Once you have agents running 24/7 across n8n, Make, Zapier, OpenClaw, or custom workflows, per-token billing creates a weird mental model.
Every extra branch, retry, or tool call feels like financial risk.
That pushes teams into defensive behavior:
- over-monitoring tokens
- prematurely downgrading models
- avoiding useful automation because cost is unpredictable
- constantly switching providers to chase a lower line item
That is part of why products like Standard Compute are appealing to agent builders.
If your stack already speaks the OpenAI API, a drop-in replacement with flat monthly pricing changes the optimization problem.
Instead of obsessing over every token, you can focus on:
- cost per workflow
- throughput
- latency
- reliability
- whether the automation is worth running at all
That is a healthier way to operate production agents.
Especially when your workload spans multiple models and tools anyway.
The part that finally clicked for me
Here is the shortest version.
If you run one prompt, model-level pricing is fine.
If you run agents, cost per workflow is the real bill.
That is the shift.
And once you see it, a lot of weird industry behavior makes sense:
- panic over token bills
- endless provider switching
- API key hacks for attribution
- growing interest in flat monthly AI compute
Because the actual operator question is brutally simple:
Did this workflow create enough value to justify what it cost?
Everything else is just a prettier dashboard.
Actionable takeaway
If you only do one thing after reading this, do this:
Start attaching business metadata to every LLM request today.
At minimum:
- workflow_id
- customer_id
- channel
- stage
- environment
Then build your reporting around cost per workflow, not cost per model.
That one change will tell you more about your agent system than another week of staring at token charts.
And if your team is tired of per-token pricing entirely, it is worth looking at infrastructure that lets you keep the OpenAI-compatible workflow you already have without turning every agent run into a budget event.
That is the part most teams actually want: fewer billing games, more useful automation.