DEV Community

Lars Winstand

Originally published at standardcompute.com

I kept tracking AI agent pricing by model and missed the Slack channel that was burning the budget

I used to look at AI costs the same way a lot of teams do:

  • OpenAI dashboard
  • model-by-model spend
  • token counts
  • maybe split traffic by API key if things got ugly

That works right up until your "one chatbot" turns into actual agent infrastructure.

Now the bot is touching Slack, Telegram, n8n, OpenClaw, a couple internal tools, and some workflow you forgot was still retrying in the background.

At that point, model-level pricing stops being the useful metric.

The thing you actually need is cost per workflow.

Not:

  • cost per model
  • cost per provider
  • cost per token bucket

But:

  • cost per Slack channel
  • cost per customer
  • cost per automation run
  • cost per conversation
  • cost per workflow execution

That distinction sounds small until you miss the one channel or workflow that is quietly lighting money on fire.

The Reddit post that said the quiet part out loud

While digging through agent cost discussions, I found a thread on r/openclaw about tracking cost per Slack channel or Telegram topic in SigNoz.

One suggestion was the usual answer: use separate API keys per channel.

The original poster immediately shut that down:

“Problem is that I need to track cost per slack channel without adding new api key when I add bot to new channel”

That is the real problem.

Not "which model is expensive?"

Not even "how many tokens did we use?"

The real question is:

Which unit of work is causing spend?

Once you start thinking in those terms, a lot of popular AI pricing advice looks incomplete.

Provider dashboards are telling the truth and still lying to you

I still want provider dashboards.

If GPT-5 usage spikes after a deploy, I want to know.
If Claude Opus 4.6 suddenly doubles in cost, I want to know.
If Grok 4.20 starts showing up in traces where it should not, I want to know.

But those dashboards mostly answer procurement questions.

They do not answer operator questions.

If one OpenClaw bot serves:

  • 40 Slack channels
  • 12 Telegram topics
  • 3 customer automations in n8n

then "Claude cost this much this week" is interesting, but not actionable enough.

What I actually want is:

  • Which Slack channel is the most expensive?
  • Which customer account is triggering retries and tool loops?
  • Which n8n workflow fans out into 5 model calls instead of 1?
  • Which automation got more expensive after I added retrieval?

That is operations.

And once agents become multi-step systems, operations matters more than raw model spend.

Why per-model reporting breaks as soon as agents get real

A single LLM call is not the same thing as an agent run.

That sounds obvious, but teams still budget like those are interchangeable.

They are not.

A real agent run might look like this:

  1. classify the message
  2. retrieve context
  3. summarize prior conversation
  4. call a tool
  5. generate a response
  6. reformat output for Slack
  7. retry because the tool call failed the first time

That is one business event.

But under the hood it might hit multiple models, multiple retries, and multiple tools.
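To make that concrete, here is a minimal sketch of how the calls behind one business event could be collapsed into a single figure. The `StepCall` record, the model names, and every dollar amount are hypothetical and only illustrate the shape of the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepCall:
    """One LLM call made while handling a single business event."""
    run_id: str
    stage: str
    model: str
    cost_usd: float
    is_retry: bool = False


def summarize_run(calls: list[StepCall]) -> dict:
    """Collapse every call in one run into a single cost summary."""
    total = sum(c.cost_usd for c in calls)
    retry_cost = sum(c.cost_usd for c in calls if c.is_retry)
    return {
        "calls": len(calls),
        "models": sorted({c.model for c in calls}),
        "total_usd": round(total, 6),
        "retry_usd": round(retry_cost, 6),
    }


# Hypothetical numbers for one support message, including one retried tool call.
run = [
    StepCall("run_1", "classify", "small-model", 0.0004),
    StepCall("run_1", "summarize", "small-model", 0.0021),
    StepCall("run_1", "tool_call", "large-model", 0.0150),
    StepCall("run_1", "tool_call", "large-model", 0.0150, is_retry=True),
    StepCall("run_1", "generate", "large-model", 0.0310),
]
summary = summarize_run(run)
```

The point of the summary is that five calls across two models show up as one line item, with the retry cost visible on its own.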

So when someone says, "just track model spend," my first reaction is: for what exactly?

Because the business event was not:

  • GPT-5 used 18k tokens
  • Claude Opus 4.6 handled generation
  • some proxy logged a cheap call

The business event was:

  • support escalation workflow failed twice and cost $1.18
  • sales-assist bot in #enterprise-deals suddenly costs 4x more than last week
  • onboarding workflow now uses 3 model calls where it used to use 1

That is what engineers and operators can act on.

The metric I would use first: cost per completed workflow

If you run agents in production, my opinionated take is simple:

Your primary cost metric should be cost per completed workflow.

Then break that down by the dimensions that match your system.
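One way to compute that metric is sketched below. It assumes failed runs still count toward spend, so wasted work raises the cost of each completed run. The record shape, status values, and dollar amounts are made up for illustration.

```python
def cost_per_completed(runs: list[dict]) -> float:
    """Total spend across all runs divided by the number that completed.

    Failed runs are included in the numerator on purpose: the money was
    spent even though no useful result came out.
    """
    total_spend = sum(r["cost_usd"] for r in runs)
    completed = sum(1 for r in runs if r["status"] == "completed")
    if completed == 0:
        raise ValueError("no completed runs to divide by")
    return total_spend / completed


# Hypothetical runs of one workflow: two completed, one failed.
runs = [
    {"workflow_id": "support_triage_v3", "status": "completed", "cost_usd": 0.20},
    {"workflow_id": "support_triage_v3", "status": "failed", "cost_usd": 0.10},
    {"workflow_id": "support_triage_v3", "status": "completed", "cost_usd": 0.30},
]
print(round(cost_per_completed(runs), 2))  # 0.3
```

Averaging only the successful runs would report $0.25 here, which hides the $0.10 burned on the failure.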

The dimensions that actually matter

Here are the fields I would want on every LLM request:

  • workflow_id
  • customer_id
  • channel
  • stage
  • feature
  • user_id
  • conversation_id
  • environment

Examples:

  • workflow_id=support_triage_v3
  • channel=slack:#sales
  • customer_id=acct_1284
  • stage=retrieval
  • feature=thread_summary
  • environment=production

This is where a lot of so-called LLM cost optimization goes sideways.

Teams spend weeks debating whether to move from Claude to Qwen, or from GPT-5 to Llama, but they still cannot answer:

Which workflow is worth what it costs?

That is the question that should drive the model choice, not the other way around.
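The fields listed above could be captured in a small typed record so that every service builds them the same way. This is a sketch rather than a prescribed schema: the `CostContext` name and the `as_metadata` helper are hypothetical, and which fields are required is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class CostContext:
    """Business context attached to every LLM request (hypothetical schema)."""
    # Assumed to be required on every call.
    workflow_id: str
    customer_id: str
    channel: str
    stage: str
    environment: str
    # Assumed to be optional; set when the caller knows them.
    feature: Optional[str] = None
    user_id: Optional[str] = None
    conversation_id: Optional[str] = None

    def as_metadata(self) -> dict[str, str]:
        """Return only the fields that are set, ready to attach to a request."""
        return {k: v for k, v in asdict(self).items() if v is not None}


ctx = CostContext(
    workflow_id="support_triage_v3",
    customer_id="acct_1284",
    channel="slack:#sales",
    stage="retrieval",
    environment="production",
    feature="thread_summary",
)
metadata = ctx.as_metadata()
```

Making the required fields positional-by-type means a call site that forgets `workflow_id` fails at construction time instead of producing an untagged request.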

Practical example: tagging requests instead of guessing later

If you are using an OpenAI-compatible client, the cleanest pattern is to attach metadata at request time.

Here is a LiteLLM-style example:

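from openai import OpenAI

# Point the OpenAI client at a local LiteLLM proxy (OpenAI-compatible endpoint).
client = OpenAI(base_url="http://localhost:4000", api_key="sk-local")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Summarize this support thread"}
    ],
    # Standard OpenAI field, used here to carry the customer account ID.
    user="acct_1284",
    # extra_body is merged into the JSON payload, so the proxy receives "metadata".
    extra_body={
        "metadata": {
            "workflow_id": "support_triage_v3",
            "channel": "slack:#support-enterprise",
            "stage": "summarization",
            "conversation_id": "thread_98765",
            "environment": "production"
        }
    }
)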

That one change gives you something way more useful than "model spend."

Now you can answer:

  • how much support_triage_v3 costs
  • whether #support-enterprise is abnormally expensive
  • whether summarization is the expensive stage
  • whether one account is causing most of the load

Without metadata, you are guessing after the fact.
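Once the tags are on each call, answering those questions is a plain group-by. Here is a minimal sketch, assuming you have already exported per-call records with a `cost_usd` value and the `metadata` dict. The record shape and the numbers are hypothetical and not any particular tool's export format.

```python
from collections import defaultdict


def cost_by(records: list[dict], dimension: str) -> dict[str, float]:
    """Sum per-call cost by one metadata dimension, most expensive first."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        # Calls that were never tagged are grouped together so they stay visible.
        key = rec["metadata"].get(dimension, "untagged")
        totals[key] += rec["cost_usd"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical exported call records.
records = [
    {"cost_usd": 0.012, "metadata": {"workflow_id": "support_triage_v3",
                                     "channel": "slack:#support-enterprise",
                                     "stage": "summarization"}},
    {"cost_usd": 0.003, "metadata": {"workflow_id": "support_triage_v3",
                                     "channel": "slack:#support-enterprise",
                                     "stage": "classification"}},
    {"cost_usd": 0.001, "metadata": {"workflow_id": "faq_bot",
                                     "channel": "slack:#general",
                                     "stage": "generation"}},
]
by_channel = cost_by(records, "channel")
by_stage = cost_by(records, "stage")
```

The same function serves every question in the list: only the `dimension` argument changes.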

Tiny per-call cost numbers are not enough

One thing that keeps fooling people: per-call cost looks precise, so it feels useful.

LiteLLM exposes headers like x-litellm-response-cost, and their docs show examples with tiny values like 2.85e-05.

That is cool. I want that data.

But on its own, it does not solve the operational problem.

If I see this:

curl -i -sSL 'http://0.0.0.0:4000/chat/completions' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "what llm are you"}]
  }' | grep 'x-litellm'

and I get back a tiny cost value, great.

But I still need to know:

  • which workflow produced that call
  • whether it was part of a retry loop
  • which customer or channel it belonged to
  • whether that call was useful or wasted

A precise number without context is still weak telemetry.
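A small sketch of closing that gap: take the cost header from the response and store it next to the business context you already had at request time. The `cost_record` helper and the `is_retry` field are hypothetical. Only the header name comes from the example above.

```python
from typing import Mapping, Optional

COST_HEADER = "x-litellm-response-cost"


def cost_record(headers: Mapping[str, str], metadata: Mapping[str, str]) -> dict:
    """Pair the per-call cost reported in the response headers with business context."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw_cost: Optional[str] = lowered.get(COST_HEADER)
    return {
        # Keep None when the header is absent instead of pretending the call was free.
        "cost_usd": float(raw_cost) if raw_cost is not None else None,
        **metadata,
    }


record = cost_record(
    {"X-LiteLLM-Response-Cost": "2.85e-05"},
    {
        "workflow_id": "support_triage_v3",
        "channel": "slack:#support-enterprise",
        "is_retry": "false",
    },
)
```

The number is the same tiny value as before, but now it can be rolled up by workflow, channel, or retry status.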

Helicone and Langfuse are pointing at the same answer

This is not some exotic pattern.

The tooling already exists.

Helicone is very explicit about custom properties like conversation, app, environment, and other request metadata.

Example:

Helicone-Property-Conversation: support_issue_2
Helicone-Property-App: mobile
Helicone-Property-Environment: production

That is exactly how you answer questions like:

  • Why did mobile support get expensive yesterday?
  • Why is production spending 4x more than staging?
  • Which conversation type causes long agent loops?

Langfuse lands in the same place with traces, tags, user IDs, and rollups.

The common pattern is obvious once you see it:

Attach business context to every model call.

Then aggregate cost by workflow or conversation first, model second.
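As an illustration of the pattern, the header-style properties shown above can be generated from the same context dict you use everywhere else. The helper name is hypothetical, and the property names are just the ones from the example.

```python
def helicone_property_headers(props: dict[str, str]) -> dict[str, str]:
    """Turn business context into Helicone-Property-* request headers."""
    return {f"Helicone-Property-{name}": value for name, value in props.items()}


headers = helicone_property_headers(
    {
        "Conversation": "support_issue_2",
        "App": "mobile",
        "Environment": "production",
    }
)
```

The resulting dict can be sent as extra request headers by whichever HTTP or SDK client sits in front of the model call.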

What OpenAI usage APIs can and cannot do

OpenAI's usage and cost APIs are better than they used to be.

Filtering by parameters like:

  • project_ids
  • user_ids
  • api_key_ids
  • models

and grouping by the matching singular fields (project_id, user_id, api_key_id, model) is genuinely useful.

But there is a hard limit here.

If five workflows all share one project and one API key, OpenAI cannot infer your internal business unit of work.

It cannot know that:

  • #support-enterprise is the expensive Slack channel
  • lead_enrichment_v3 is retrying too much
  • one tenant is generating most of the spend

That information only exists if you pass it through your stack.

So yes, use provider-native reporting.

Just do not expect it to solve workflow economics by itself.

The sneaky cost categories that make token charts misleading

A lot of teams still reason about cost like this:

  • input tokens
  • output tokens

That is already too simplistic for real agent systems.

Depending on your setup, cost behavior can also be affected by:

  • cached tokens
  • audio tokens
  • image tokens
  • retries
  • tool-call loops
  • provider-specific pricing tiers
  • routing decisions between models

That means two workflows with similar token totals can have very different actual cost profiles.
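A quick worked example shows how far apart they can land. The prices below are hypothetical placeholders in USD per million tokens, not any provider's actual rates, and the token counts are invented so that both workflows total exactly one million tokens.

```python
# Hypothetical prices in USD per million tokens; real prices vary by provider and model.
PRICE_PER_MILLION = {"input": 2.00, "cached_input": 0.50, "output": 8.00}


def call_cost(uncached_input: int, cached_input: int, output: int) -> float:
    """Cost in USD for a batch of tokens, pricing cached input separately."""
    total = (
        uncached_input * PRICE_PER_MILLION["input"]
        + cached_input * PRICE_PER_MILLION["cached_input"]
        + output * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


# Both workflows consume 1,000,000 tokens in total.
workflow_a = call_cost(uncached_input=900_000, cached_input=0, output=100_000)
workflow_b = call_cost(uncached_input=100_000, cached_input=800_000, output=100_000)
print(workflow_a, workflow_b)  # 2.6 1.4
```

Same token total, nearly a 2x difference in cost, and that is before retries or model routing enter the picture.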

So when someone says, "we used 8 million tokens this week," my reaction is usually: okay, but where did the money actually go?

I would rather know:

  • comment moderation costs $0.004 per item
  • enterprise support escalation costs $0.19 per incident
  • invoice reconciliation costs $0.11 per successful run

Those are numbers you can optimize.

If you do not propagate metadata on every request, the whole thing rots

This is the annoying part.

Workflow-level attribution only works if your engineers are disciplined.

If one service forgets to pass:

  • workflow_id
  • channel
  • customer_id
  • stage

then your charts slowly become fiction.

That is why I would treat metadata propagation as part of the contract, not a nice-to-have.
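One way to make that contract harder to forget is to set the context once at the edge of a request and let it flow to every model call underneath. The sketch below uses Python's standard `contextvars` module. The `cost_context` and `current_metadata` names are hypothetical, and the required-field list is an assumption.

```python
import contextvars
from contextlib import contextmanager

# Holds the metadata for whatever workflow is currently executing.
_cost_context = contextvars.ContextVar("cost_context")

# Assumed set of required fields.
REQUIRED = ("workflow_id", "customer_id", "channel", "stage", "environment")


@contextmanager
def cost_context(**fields: str):
    """Layer extra metadata on top of the current context for the enclosed block."""
    merged = {**_cost_context.get({}), **fields}
    token = _cost_context.set(merged)
    try:
        yield merged
    finally:
        # Restore the previous context even if the block raised.
        _cost_context.reset(token)


def current_metadata() -> dict:
    """Return the active metadata, refusing to proceed if a required field is missing."""
    meta = _cost_context.get({})
    missing = [key for key in REQUIRED if not meta.get(key)]
    if missing:
        raise ValueError(f"Missing required metadata: {', '.join(missing)}")
    return dict(meta)


# The entry point sets the stable fields; each stage only adds its own name.
with cost_context(
    workflow_id="support_triage_v3",
    customer_id="acct_1284",
    channel="slack:#support-enterprise",
    environment="production",
):
    with cost_context(stage="retrieval"):
        meta = current_metadata()
```

With this shape, a service that never enters a `cost_context` block fails loudly at the first model call instead of silently producing untagged spend.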

A setup I would actually ship

If I had to implement this tomorrow, I would do something like this:

1. Pick one primary unit of cost

Choose one:

  • workflow run
  • conversation
  • customer interaction
  • automation execution

Do not pick five primary units.

2. Define required metadata fields

For example:

{
  "workflow_id": "support_triage_v3",
  "customer_id": "acct_1284",
  "channel": "slack:#support-enterprise",
  "stage": "generation",
  "user_id": "u_991",
  "conversation_id": "thread_98765",
  "environment": "production"
}

3. Enforce it in middleware

If you already use an OpenAI-compatible layer, this is a good place to enforce request shape.

Pseudo-code:

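// Throws if any field required for cost attribution is missing or empty.
function assertCostMetadata(meta: Record<string, string | undefined>): void {
  const required = [
    "workflow_id",
    "customer_id",
    "channel",
    "stage",
    "environment"
  ]

  for (const key of required) {
    // An empty string counts as missing, the same as an absent key.
    if (!meta[key]) {
      throw new Error(`Missing required metadata: ${key}`)
    }
  }
}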

4. Aggregate by workflow first, model second

That ordering matters.

If you reverse it, you end up shaving pennies off model selection while one workflow keeps exploding in production.
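Here is a minimal sketch of that ordering: total by workflow first, then split each workflow by model. The record shape, the model names, and the amounts are hypothetical.

```python
from collections import defaultdict


def rollup(records: list[dict]) -> list[dict]:
    """Group cost by workflow, then by model, sorted by workflow total."""
    by_workflow: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for rec in records:
        by_workflow[rec["workflow_id"]][rec["model"]] += rec["cost_usd"]

    report = [
        {
            "workflow_id": workflow_id,
            "total_usd": sum(models.values()),
            "by_model": dict(models),
        }
        for workflow_id, models in by_workflow.items()
    ]
    # The most expensive workflow comes first, regardless of which model it used.
    report.sort(key=lambda row: row["total_usd"], reverse=True)
    return report


# Hypothetical per-call records.
records = [
    {"workflow_id": "lead_enrichment_v3", "model": "model-a", "cost_usd": 0.40},
    {"workflow_id": "lead_enrichment_v3", "model": "model-b", "cost_usd": 0.25},
    {"workflow_id": "support_triage_v3", "model": "model-a", "cost_usd": 0.10},
]
report = rollup(records)
```

A model-first view of the same records would show `model-a` as the top line and bury the fact that one workflow accounts for most of the spend.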

5. Use provider dashboards as a secondary lens

Good for:

  • anomaly detection
  • provider comparisons
  • procurement discussions
  • catching bad deploys

Not enough for:

  • workflow economics
  • per-channel attribution
  • customer-level profitability

Where flat-rate infrastructure starts making more sense

This is also why flat-rate AI infrastructure is getting more interesting for agent teams.

Once you have agents running 24/7 across n8n, Make, Zapier, OpenClaw, or custom workflows, per-token billing creates a weird mental model.

Every extra branch, retry, or tool call feels like financial risk.

That pushes teams into defensive behavior:

  • over-monitoring tokens
  • prematurely downgrading models
  • avoiding useful automation because cost is unpredictable
  • constantly switching providers to chase a lower line item

That is part of why products like Standard Compute are appealing to agent builders.

If your stack already speaks the OpenAI API, a drop-in replacement with flat monthly pricing changes the optimization problem.

Instead of obsessing over every token, you can focus on:

  • cost per workflow
  • throughput
  • latency
  • reliability
  • whether the automation is worth running at all

That is a healthier way to operate production agents.

Especially when your workload spans multiple models and tools anyway.

The part that finally clicked for me

Here is the shortest version.

If you run one prompt, model-level pricing is fine.

If you run agents, cost per workflow is the real bill.

That is the shift.

And once you see it, a lot of weird industry behavior makes sense:

  • panic over token bills
  • endless provider switching
  • API key hacks for attribution
  • growing interest in flat monthly AI compute

Because the actual operator question is brutally simple:

Did this workflow create enough value to justify what it cost?

Everything else is just a prettier dashboard.

Actionable takeaway

If you only do one thing after reading this, do this:

Start attaching business metadata to every LLM request today.

At minimum:

workflow_id
customer_id
channel
stage
environment

Then build your reporting around cost per workflow, not cost per model.

That one change will tell you more about your agent system than another week of staring at token charts.

And if your team is tired of per-token pricing entirely, it is worth looking at infrastructure that lets you keep the OpenAI-compatible workflow you already have without turning every agent run into a budget event.

That is the part most teams actually want: fewer billing games, more useful automation.
