I think a lot of companies are still telling themselves a very comforting story about AI costs.
The story goes like this:
Tokens are cheap.
Models keep getting better.
A few copilots here, a few agents there, maybe a chatbot for support, maybe some code generation in CI, and somehow this all stays in the “software subscription” bucket.
I do not buy that story anymore.
My take is simple:
tokens are starting to behave less like a cheap productivity feature and more like a volatile labor line item.
And in a growing number of workflows, they are already expensive enough to compete with what companies would happily pay for junior humans.
Not just junior developers.
Junior assistants too.
The worst part is not even the absolute price.
It is the unpredictability.
A junior hire has a salary.
A token budget has moods.
the spreadsheet starts lying very early
On paper, token prices still look harmless.
They are quoted per million tokens, which is a wonderful way to make real usage feel abstract.
A few examples from current public pricing pages:
- OpenAI GPT-5.4: $2.50 / 1M input and $15 / 1M output
- Anthropic Claude Sonnet 4.6: $3 / 1M input and $15 / 1M output
- Google Gemini 2.5 Pro: $1.25 / 1M input and $10 / 1M output for prompts up to 200k tokens, then $2.50 input and $15 output beyond that threshold
That still sounds cheap if you are thinking about a few prompts in a playground.
It stops sounding cheap the moment AI stops being a toy and starts becoming part of your operating model.
Let’s do slightly less fake math.
Imagine a team with 10 people using coding agents, document summarizers, support drafting, and internal automation.
Nothing science-fiction here.
Just normal “we adopted AI everywhere” behavior.
Assume each seat consumes 5 million input tokens and 2 million output tokens per workday.
That is not tiny, but it is also not insane once you include long contexts, retries, tool traces, generated code, explanations, and review loops.
Here is what that looks like over roughly 22 workdays:
| Provider/model | Approx monthly cost for 10 seats |
|---|---|
| OpenAI GPT-5.4 | $9,350 |
| Claude Sonnet 4.6 | $9,900 |
| Gemini 2.5 Pro | $5,775 to $9,350 |
That range on Gemini is already part of the point.
The same team can pay very different numbers depending on prompt size behavior.
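The table is easy to reproduce. The per-token prices below are the list prices quoted earlier; the seat count, tokens per workday, and 22 workdays are this article's assumptions, not measured data. The low Gemini figure assumes both input and output are billed at the sub-200k rate.

```python
# Back-of-the-envelope model for the monthly cost table above.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 2.5 Pro (<=200k tier)": (1.25, 10.00),
    "Gemini 2.5 Pro (>200k tier)": (2.50, 15.00),
}

SEATS = 10
INPUT_M_PER_DAY = 5    # millions of input tokens per seat per workday (assumption)
OUTPUT_M_PER_DAY = 2   # millions of output tokens per seat per workday (assumption)
WORKDAYS = 22

def monthly_cost(input_price: float, output_price: float) -> float:
    per_seat_day = INPUT_M_PER_DAY * input_price + OUTPUT_M_PER_DAY * output_price
    return per_seat_day * SEATS * WORKDAYS

for model, (inp, out) in PRICES.items():
    print(f"{model}: ${monthly_cost(inp, out):,.0f}/month")
```

Change the per-seat token assumptions and the whole table moves, which is exactly the problem.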
Now compare that with actual wage data.
The U.S. Bureau of Labor Statistics lists:
- $47,460/year as the 2024 median pay for secretaries and administrative assistants
- $133,080/year as the 2024 median pay for software developers
- $79,850/year as the 2024 10th-percentile wage for software developers
Monthly, that works out to roughly:
- $3,955/month for an administrative assistant at the median
- $6,654/month for a 10th-percentile software developer
- $11,090/month for the median software developer
So no, one engineer casually using a model is not suddenly more expensive than a junior developer.
That would be a silly headline.
But a company-wide AI workflow absolutely can become more expensive than junior labor, very fast.
And in some cases it already is.
Five heavy AI seats can outrun a median administrative assistant.
Ten can get uncomfortably close to, or exceed, what many companies would budget for an early-career developer.
That is before you count observability, vector databases, eval pipelines, orchestration glue, and the humans still needed to check whether the machine did something stupid.
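The break-even point is easy to sketch. Using the GPT-5.4 per-seat figure implied by the earlier assumptions ($935 per heavy seat per month) against the BLS medians quoted above:

```python
import math

# How many "heavy" AI seats it takes to match a monthly salary.
# PER_SEAT_MONTHLY assumes the heavy-usage profile from earlier:
# (5M input * $2.50 + 2M output * $15.00) per day * 22 workdays = $935.
PER_SEAT_MONTHLY = 935.0

salaries = {
    "admin assistant (median)": 47_460 / 12,
    "developer (10th percentile)": 79_850 / 12,
    "developer (median)": 133_080 / 12,
}

for role, monthly in salaries.items():
    seats = math.ceil(monthly / PER_SEAT_MONTHLY)
    print(f"{role}: ~${monthly:,.0f}/month ≈ {seats} heavy seats")
```

Five seats passes the median administrative assistant; a dozen passes the median developer. The crossover is closer than the per-million pricing makes it feel.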
token costs are worse than salaries because they are less stable
This is the part I think many executives still do not fully internalize.
A salary is expensive, yes.
But it is legible.
Token spend is worse in one important way:
you often do not know the real cost profile until after the workflow becomes popular.
A few reasons:
1. output is where the pain lives
A lot of people anchor on input pricing because it looks small.
That is the wrong anchor.
The expensive part is often output.
Especially when models reason longer, explain more, retry more, or emit giant blobs of code and text nobody asked for.
OpenAI GPT-5.4 is 6x more expensive on output than input.
Claude Sonnet 4.6 is 5x more expensive on output than input.
Gemini 2.5 Pro jumps hard on output too.
So the team that says, “we only send a lot of context” is often missing the real bill.
The bill usually shows up when the system starts talking back too much.
2. the same work can suddenly tokenize differently
Anthropic documents that Claude Opus 4.7 uses a new tokenizer that may consume up to 35% more tokens for the same fixed text.
That should make every finance person mildly uncomfortable.
Imagine paying 35% more for the same semantic workload because the tokenizer changed.
Not because your product changed.
Not because customers changed.
Just because the vendor changed how text gets counted.
That is not labor-like.
That is utility-bill-like.
3. thresholds and modes quietly change the bill
Gemini 2.5 Pro charges one rate for prompts up to 200k tokens and a higher one above that.
Anthropic has regional multipliers and a fast mode with premium pricing.
OpenAI offers batch discounts, but also a data residency premium.
So even if the application behavior looks “the same” from the outside, the internal billing shape can move around because:
- prompts got longer
- cache hit rates dropped
- a team enabled a faster mode
- a product shifted regions
- grounding or search got added
- the model started generating more output than last month
That is not predictable staffing.
That is spend drift.
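A sketch of how the threshold behaves, modeled on the Gemini 2.5 Pro rates quoted above. One assumption worth verifying against the vendor docs: here, crossing the 200k-token threshold applies the higher rate to the whole request.

```python
THRESHOLD = 200_000  # prompt tokens

def request_cost(prompt_tokens: int, output_tokens: int) -> float:
    # Rates in $ per 1M tokens, from the pricing quoted earlier.
    if prompt_tokens <= THRESHOLD:
        in_rate, out_rate = 1.25, 10.00
    else:
        in_rate, out_rate = 2.50, 15.00
    return (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# One extra context chunk can nearly double the cost of the whole call:
print(request_cost(199_000, 5_000))  # just under the threshold
print(request_cost(201_000, 5_000))  # just over: every token now costs more
```

Nothing about the product changed between those two calls. The prompt got 2k tokens longer and the bill shape moved.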
4. agents multiply hidden tokens
This gets worse with agents.
A normal chat interaction is one thing.
An agent loop is another beast entirely.
Now you are paying for:
- the original prompt
- tool schemas
- tool results
- chain-of-thought-adjacent reasoning budgets, depending on platform semantics
- retries
- file context
- summaries of prior turns
- review passes
- self-correction loops
People love saying “the agent did this task in 8 minutes.”
Cool.
What they often do not say is that the agent may have consumed the token equivalent of several ordinary interactions to get there.
That means your marginal cost per useful result is often much blurrier than the dashboard suggests.
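One cheap way to make the blur visible is a per-task token ledger. The event labels below mirror the list above; the token counts are illustrative, not measurements from any real agent run.

```python
from dataclasses import dataclass, field

@dataclass
class TokenLedger:
    """Accumulates token usage across every step of one agent task."""
    events: list = field(default_factory=list)

    def record(self, label: str, input_tokens: int, output_tokens: int) -> None:
        self.events.append((label, input_tokens, output_tokens))

    def totals(self) -> tuple:
        return (sum(i for _, i, _ in self.events),
                sum(o for _, _, o in self.events))

ledger = TokenLedger()
ledger.record("original prompt", 4_000, 800)
ledger.record("tool schemas + results", 12_000, 300)
ledger.record("retry after failed call", 9_000, 1_200)
ledger.record("summary of prior turns", 6_000, 500)
ledger.record("review + self-correction", 15_000, 2_500)

inp, out = ledger.totals()
print(f"one 'quick' agent task: {inp:,} input / {out:,} output tokens")
```

Even with made-up numbers, the shape is the lesson: the original prompt is a small fraction of what the task actually billed.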
this does not mean “stop using AI”
To be clear, I am not making the boomer argument here.
I am not saying, “AI is too expensive, go back to doing everything manually.”
That would be dumb.
AI is real leverage.
It is already useful.
It can absolutely make a strong person much stronger.
But I think companies need to stop treating token spend as if it were automatically better than human spend.
Sometimes it is.
Sometimes it is not.
And sometimes it is only better if a human is still clearly in charge of:
- scope
- review
- escalation
- quality control
- budget discipline
- model selection
The winning pattern is not “replace juniors with tokens.”
The winning pattern is more like:
use tokens to amplify good people, while good people remain the owners of correctness, cost, and consequences.
That is a much more boring sentence.
It is also the one that survives contact with finance.
my opinionated version
I think a lot of AI adoption right now is being sold with the same bad habit we saw in early cloud conversations.
People love the upside story.
Nobody wants to dwell on the bill shape.
So teams say things like:
- “it is only a few dollars per million tokens”
- “the model is cheap enough”
- “we will optimize later”
- “let’s just let everyone use the best model for now”
That is exactly how small variable costs become strategic costs.
And unlike hiring, token spend can get uglier without any emotionally obvious moment.
You do not interview a token.
You do not onboard a token.
You do not notice 14 small workflow expansions the same way you notice one new headcount request.
That is why this category is dangerous.
It slips past normal management instincts.
You would debate a junior hire.
You might not debate a bunch of “helpful” agent workflows until the invoice starts looking like a small payroll category.
what smart companies should do instead
My recommendation is not anti-AI.
It is anti-delusion.
If you are serious about using models across the company, then do a few boring things early:
price workflows, not prompts
Do not benchmark one cute demo request.
Measure the full workflow:
retries, context growth, tool calls, review passes, and average output length.
assign model tiers intentionally
Not every task deserves the frontier model.
Most companies are massively overpaying because they use the most expensive reasoning setup for work that could be routed to a cheaper model.
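Tiering can be as simple as a routing table keyed on task kind rather than per-prompt vibes. The model names and prices reuse the ones quoted earlier; the tier labels, the budget-model entry, and its pricing are invented for illustration.

```python
# Task kind -> (model, input $/1M, output $/1M). Defaulting to the cheapest
# tier makes frontier usage an explicit decision instead of an accident.
TIERS = {
    "boilerplate":    ("cheap-small-model", 0.15, 0.60),  # hypothetical budget model
    "summarize":      ("Gemini 2.5 Pro", 1.25, 10.00),
    "hard-reasoning": ("GPT-5.4", 2.50, 15.00),
}

def route(task_kind: str) -> tuple:
    return TIERS.get(task_kind, TIERS["boilerplate"])

model, in_rate, out_rate = route("summarize")
```

The interesting property is the default: an unclassified task falls to the cheap tier, so someone has to argue a workload *up* to the expensive model.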
put humans on the acceptance boundary
Do not use expensive models as a management substitute.
If the output matters, a human should still own acceptance.
Otherwise you are paying for generation and then paying again for the fallout.
treat token budgets like cloud budgets
Tag them.
Attribute them.
Alert on them.
Set hard ceilings where needed.
Cloud taught us this already.
Variable spend is only “efficient” when someone is actually watching it.
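The mechanics are not complicated. A minimal budget guard in the cloud-alerting spirit; the tag, thresholds, and hard-ceiling behavior are all assumptions for illustration, and a real setup would sit on the vendor's usage API or a gateway.

```python
class TokenBudget:
    """Tagged monthly spend guard: soft alert threshold, hard ceiling."""

    def __init__(self, tag: str, monthly_ceiling_usd: float, alert_fraction: float = 0.8):
        self.tag = tag
        self.ceiling = monthly_ceiling_usd
        self.alert_at = monthly_ceiling_usd * alert_fraction
        self.spent = 0.0

    def charge(self, usd: float) -> str:
        if self.spent + usd > self.ceiling:
            return "blocked"   # hard ceiling: refuse the call outright
        self.spent += usd
        if self.spent >= self.alert_at:
            return "alert"     # soft threshold: page a human
        return "ok"

budget = TokenBudget("support-drafting", monthly_ceiling_usd=500.0)
print(budget.charge(300.0))  # "ok"
print(budget.charge(150.0))  # "alert" (past 80% of the ceiling)
print(budget.charge(100.0))  # "blocked" (would exceed the hard ceiling)
```

The point is the tag: spend that is attributed to a workflow can be debated the way a headcount request gets debated.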
optimize for controlled leverage
The right comparison is not “AI versus humans.”
It is “AI plus one good human versus the old way of working.”
That framing usually leads to better architecture and more honest economics.
my take
Tokens are still useful.
Sometimes incredibly useful.
But they are no longer a cute rounding error.
And they are definitely not predictable enough to treat as a harmless software snack.
For many teams, token spend is becoming a real labor-adjacent budget category.
In some workflows it is already expensive enough to beat junior human cost.
In many more, it is at least expensive enough that the comparison should happen before the rollout, not after the invoice.
So no, I would not stop using AI.
I would just stop pretending that tokens are magically cheaper than people.
They are often cheaper than some kinds of work.
That is different.
And unlike people, tokens come with a billing model that can change under your feet, a cost profile that explodes with usage patterns, and a nasty habit of looking cheap right until they are not.
That is why my current default is simple:
use AI aggressively, but never let the token budget operate without adult supervision.
references
- OpenAI, API Pricing — https://openai.com/api/pricing/
- Anthropic, Claude pricing — https://docs.anthropic.com/en/docs/about-claude/pricing
- Google, Gemini Developer API pricing — https://ai.google.dev/gemini-api/docs/pricing
- U.S. Bureau of Labor Statistics, Software Developers, Quality Assurance Analysts, and Testers — https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm
- U.S. Bureau of Labor Statistics, Secretaries and Administrative Assistants — https://www.bls.gov/ooh/office-and-administrative-support/secretaries-and-administrative-assistants.htm