I think a lot of companies are still telling themselves a very comforting story about AI costs.
The story goes like this:
Tokens are cheap.
Models keep getting better.
A few copilots here, a few agents there, maybe a chatbot for support, maybe some code generation in CI, and somehow this all stays in the “software subscription” bucket.
I do not buy that story anymore.
My take is simple:
tokens are starting to behave less like a cheap productivity feature and more like a volatile labor line item.
And in a growing number of workflows, they are already expensive enough to compete with what companies would happily pay for junior humans.
Not just junior developers.
Junior assistants too.
The worst part is not even the absolute price.
It is the unpredictability.
A junior hire has a salary.
A token budget has moods.
the spreadsheet starts lying very early
On paper, token prices still look harmless.
They are quoted per million tokens, which is a wonderful way to make real usage feel abstract.
A few examples from current public pricing pages:
- OpenAI GPT-5.4: $2.50 / 1M input and $15 / 1M output
- Anthropic Claude Sonnet 4.6: $3 / 1M input and $15 / 1M output
- Google Gemini 2.5 Pro: $1.25 / 1M input and $10 / 1M output for prompts up to 200k tokens, then $2.50 input and $15 output beyond that threshold
That still sounds cheap if you are thinking about a few prompts in a playground.
It stops sounding cheap the moment AI stops being a toy and starts becoming part of your operating model.
Let’s do slightly less fake math.
Imagine a team with 10 people using coding agents, document summarizers, support drafting, and internal automation.
Nothing science-fiction here.
Just normal “we adopted AI everywhere” behavior.
Assume each seat consumes 5 million input tokens and 2 million output tokens per workday.
That is not tiny, but it is also not insane once you include long contexts, retries, tool traces, generated code, explanations, and review loops.
Here is what that looks like over roughly 22 workdays:
| Provider/model | Approx monthly cost for 10 seats |
|---|---|
| OpenAI GPT-5.4 | $9,350 |
| Claude Sonnet 4.6 | $9,900 |
| Gemini 2.5 Pro | $5,775 to $9,350 |
That range on Gemini is already part of the point.
The same team can pay very different numbers depending on prompt size behavior.
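The table is easy to reproduce. The per-token prices below are the list prices quoted earlier; the seat count, tokens per workday, and 22 workdays are this article's assumptions, not measured data. The low Gemini figure assumes both input and output are billed at the sub-200k rate.

```python
# Back-of-the-envelope model for the monthly cost table above.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 2.5 Pro (<=200k tier)": (1.25, 10.00),
    "Gemini 2.5 Pro (>200k tier)": (2.50, 15.00),
}

SEATS = 10
INPUT_M_PER_DAY = 5    # millions of input tokens per seat per workday (assumption)
OUTPUT_M_PER_DAY = 2   # millions of output tokens per seat per workday (assumption)
WORKDAYS = 22

def monthly_cost(input_price: float, output_price: float) -> float:
    per_seat_day = INPUT_M_PER_DAY * input_price + OUTPUT_M_PER_DAY * output_price
    return per_seat_day * SEATS * WORKDAYS

for model, (inp, out) in PRICES.items():
    print(f"{model}: ${monthly_cost(inp, out):,.0f}/month")
```

Change the per-seat token assumptions and the whole table moves, which is exactly the problem.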
Now compare that with actual wage data.
The U.S. Bureau of Labor Statistics lists:
- $47,460/year as the 2024 median pay for secretaries and administrative assistants
- $133,080/year as the 2024 median pay for software developers
- $79,850/year as the 2024 10th-percentile wage for software developers
Monthly, that works out to roughly:
- $3,955/month for an administrative assistant at the median
- $6,654/month for a 10th-percentile software developer
- $11,090/month for the median software developer
So no, one engineer casually using a model is not suddenly more expensive than a junior developer.
That would be a silly headline.
But a company-wide AI workflow absolutely can become more expensive than junior labor, very fast.
And in some cases it already is.
Five heavy AI seats can outrun a median administrative assistant.
Ten can get uncomfortably close to, or exceed, what many companies would budget for an early-career developer.
That is before you count observability, vector databases, eval pipelines, orchestration glue, and the humans still needed to check whether the machine did something stupid.
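The break-even point is easy to sketch. Using the GPT-5.4 per-seat figure implied by the earlier assumptions ($935 per heavy seat per month) against the BLS medians quoted above:

```python
import math

# How many "heavy" AI seats it takes to match a monthly salary.
# PER_SEAT_MONTHLY assumes the heavy-usage profile from earlier:
# (5M input * $2.50 + 2M output * $15.00) per day * 22 workdays = $935.
PER_SEAT_MONTHLY = 935.0

salaries = {
    "admin assistant (median)": 47_460 / 12,
    "developer (10th percentile)": 79_850 / 12,
    "developer (median)": 133_080 / 12,
}

for role, monthly in salaries.items():
    seats = math.ceil(monthly / PER_SEAT_MONTHLY)
    print(f"{role}: ~${monthly:,.0f}/month ≈ {seats} heavy seats")
```

Five seats passes the median administrative assistant; a dozen passes the median developer. The crossover is closer than the per-million pricing makes it feel.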
token costs are worse than salaries because they are less stable
This is the part I think many executives still do not fully internalize.
A salary is expensive, yes.
But it is legible.
Token spend is worse in one important way:
you often do not know the real cost profile until after the workflow becomes popular.
A few reasons:
1. output is where the pain lives
A lot of people anchor on input pricing because it looks small.
That is the wrong anchor.
The expensive part is often output.
Especially when models reason longer, explain more, retry more, or emit giant blobs of code and text nobody asked for.
OpenAI GPT-5.4 is 6x more expensive on output than input.
Claude Sonnet 4.6 is 5x more expensive on output than input.
Gemini 2.5 Pro jumps hard on output too.
So the team that says, “we only send a lot of context” is often missing the real bill.
The bill usually shows up when the system starts talking back too much.
2. the same work can suddenly tokenize differently
Anthropic documents that Claude Opus 4.7 uses a new tokenizer that may consume up to 35% more tokens for the same fixed text.
That should make every finance person mildly uncomfortable.
Imagine paying 35% more for the same semantic workload because the tokenizer changed.
Not because your product changed.
Not because customers changed.
Just because the vendor changed how text gets counted.
That is not labor-like.
That is utility-bill-like.
3. thresholds and modes quietly change the bill
Gemini 2.5 Pro charges one rate for prompts up to 200k tokens and a higher one above that.
Anthropic has regional multipliers and a fast mode with premium pricing.
OpenAI offers batch discounts, but also a data residency premium.
So even if the application behavior looks “the same” from the outside, the internal billing shape can move around because:
- prompts got longer
- cache hit rates dropped
- a team enabled a faster mode
- a product shifted regions
- grounding or search got added
- the model started generating more output than last month
That is not predictable staffing.
That is spend drift.
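A sketch of how the threshold behaves, modeled on the Gemini 2.5 Pro rates quoted above. One assumption worth verifying against the vendor docs: here, crossing the 200k-token threshold applies the higher rate to the whole request.

```python
THRESHOLD = 200_000  # prompt tokens

def request_cost(prompt_tokens: int, output_tokens: int) -> float:
    # Rates in $ per 1M tokens, from the pricing quoted earlier.
    if prompt_tokens <= THRESHOLD:
        in_rate, out_rate = 1.25, 10.00
    else:
        in_rate, out_rate = 2.50, 15.00
    return (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# One extra context chunk can nearly double the cost of the whole call:
print(request_cost(199_000, 5_000))  # just under the threshold
print(request_cost(201_000, 5_000))  # just over: every token now costs more
```

Nothing about the product changed between those two calls. The prompt got 2k tokens longer and the bill shape moved.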
4. agents multiply hidden tokens
This gets worse with agents.
A normal chat interaction is one thing.
An agent loop is another beast entirely.
Now you are paying for:
- the original prompt
- tool schemas
- tool results
- chain-of-thought-adjacent reasoning budgets, depending on platform semantics
- retries
- file context
- summaries of prior turns
- review passes
- self-correction loops
People love saying “the agent did this task in 8 minutes.”
Cool.
What they often do not say is that the agent may have consumed the token equivalent of several ordinary interactions to get there.
That means your marginal cost per useful result is often much blurrier than the dashboard suggests.
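One cheap way to make the blur visible is a per-task token ledger. The event labels below mirror the list above; the token counts are illustrative, not measurements from any real agent run.

```python
from dataclasses import dataclass, field

@dataclass
class TokenLedger:
    """Accumulates token usage across every step of one agent task."""
    events: list = field(default_factory=list)

    def record(self, label: str, input_tokens: int, output_tokens: int) -> None:
        self.events.append((label, input_tokens, output_tokens))

    def totals(self) -> tuple:
        return (sum(i for _, i, _ in self.events),
                sum(o for _, _, o in self.events))

ledger = TokenLedger()
ledger.record("original prompt", 4_000, 800)
ledger.record("tool schemas + results", 12_000, 300)
ledger.record("retry after failed call", 9_000, 1_200)
ledger.record("summary of prior turns", 6_000, 500)
ledger.record("review + self-correction", 15_000, 2_500)

inp, out = ledger.totals()
print(f"one 'quick' agent task: {inp:,} input / {out:,} output tokens")
```

Even with made-up numbers, the shape is the lesson: the original prompt is a small fraction of what the task actually billed.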
this does not mean “stop using AI”
To be clear, I am not making the boomer argument here.
I am not saying, “AI is too expensive, go back to doing everything manually.”
That would be dumb.
AI is real leverage.
It is already useful.
It can absolutely make a strong person much stronger.
But I think companies need to stop treating token spend as if it were automatically better than human spend.
Sometimes it is.
Sometimes it is not.
And sometimes it is only better if a human is still clearly in charge of:
- scope
- review
- escalation
- quality control
- budget discipline
- model selection
The winning pattern is not “replace juniors with tokens.”
The winning pattern is more like:
use tokens to amplify good people, while good people remain the owners of correctness, cost, and consequences.
That is a much more boring sentence.
It is also the one that survives contact with finance.
my opinionated version
I think a lot of AI adoption right now is being sold with the same bad habit we saw in early cloud conversations.
People love the upside story.
Nobody wants to dwell on the bill shape.
So teams say things like:
- “it is only a few dollars per million tokens”
- “the model is cheap enough”
- “we will optimize later”
- “let’s just let everyone use the best model for now”
That is exactly how small variable costs become strategic costs.
And unlike hiring, token spend can get uglier without any emotionally obvious moment.
You do not interview a token.
You do not onboard a token.
You do not notice 14 small workflow expansions the same way you notice one new headcount request.
That is why this category is dangerous.
It slips past normal management instincts.
You would debate a junior hire.
You might not debate a bunch of “helpful” agent workflows until the invoice starts looking like a small payroll category.
what smart companies should do instead
My recommendation is not anti-AI.
It is anti-delusion.
If you are serious about using models across the company, then do a few boring things early:
price workflows, not prompts
Do not benchmark one cute demo request.
Measure the full workflow:
retries, context growth, tool calls, review passes, and average output length.
assign model tiers intentionally
Not every task deserves the frontier model.
Most companies are massively overpaying because they use the most expensive reasoning setup for work that could be routed to a cheaper model.
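Tiering can be as simple as a routing table keyed on task kind rather than per-prompt vibes. The model names and prices reuse the ones quoted earlier; the tier labels, the budget-model entry, and its pricing are invented for illustration.

```python
# Task kind -> (model, input $/1M, output $/1M). Defaulting to the cheapest
# tier makes frontier usage an explicit decision instead of an accident.
TIERS = {
    "boilerplate":    ("cheap-small-model", 0.15, 0.60),  # hypothetical budget model
    "summarize":      ("Gemini 2.5 Pro", 1.25, 10.00),
    "hard-reasoning": ("GPT-5.4", 2.50, 15.00),
}

def route(task_kind: str) -> tuple:
    return TIERS.get(task_kind, TIERS["boilerplate"])

model, in_rate, out_rate = route("summarize")
```

The interesting property is the default: an unclassified task falls to the cheap tier, so someone has to argue a workload *up* to the expensive model.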
put humans on the acceptance boundary
Do not use expensive models as a management substitute.
If the output matters, a human should still own acceptance.
Otherwise you are paying for generation and then paying again for the fallout.
treat token budgets like cloud budgets
Tag them.
Attribute them.
Alert on them.
Set hard ceilings where needed.
Cloud taught us this already.
Variable spend is only “efficient” when someone is actually watching it.
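The mechanics are not complicated. A minimal budget guard in the cloud-alerting spirit; the tag, thresholds, and hard-ceiling behavior are all assumptions for illustration, and a real setup would sit on the vendor's usage API or a gateway.

```python
class TokenBudget:
    """Tagged monthly spend guard: soft alert threshold, hard ceiling."""

    def __init__(self, tag: str, monthly_ceiling_usd: float, alert_fraction: float = 0.8):
        self.tag = tag
        self.ceiling = monthly_ceiling_usd
        self.alert_at = monthly_ceiling_usd * alert_fraction
        self.spent = 0.0

    def charge(self, usd: float) -> str:
        if self.spent + usd > self.ceiling:
            return "blocked"   # hard ceiling: refuse the call outright
        self.spent += usd
        if self.spent >= self.alert_at:
            return "alert"     # soft threshold: page a human
        return "ok"

budget = TokenBudget("support-drafting", monthly_ceiling_usd=500.0)
print(budget.charge(300.0))  # "ok"
print(budget.charge(150.0))  # "alert" (past 80% of the ceiling)
print(budget.charge(100.0))  # "blocked" (would exceed the hard ceiling)
```

The point is the tag: spend that is attributed to a workflow can be debated the way a headcount request gets debated.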
optimize for controlled leverage
The right comparison is not “AI versus humans.”
It is “AI plus one good human versus the old way of working.”
That framing usually leads to better architecture and more honest economics.
my take
Tokens are still useful.
Sometimes incredibly useful.
But they are no longer a cute rounding error.
And they are definitely not predictable enough to treat as a harmless software snack.
For many teams, token spend is becoming a real labor-adjacent budget category.
In some workflows it is already expensive enough to beat junior human cost.
In many more, it is at least expensive enough that the comparison should happen before the rollout, not after the invoice.
So no, I would not stop using AI.
I would just stop pretending that tokens are magically cheaper than people.
They are often cheaper than some kinds of work.
That is different.
And unlike people, tokens come with a billing model that can change under your feet, a cost profile that explodes with usage patterns, and a nasty habit of looking cheap right until they are not.
That is why my current default is simple:
use AI aggressively, but never let the token budget operate without adult supervision.
references
- OpenAI, API Pricing — https://openai.com/api/pricing/
- Anthropic, Claude pricing — https://docs.anthropic.com/en/docs/about-claude/pricing
- Google, Gemini Developer API pricing — https://ai.google.dev/gemini-api/docs/pricing
- U.S. Bureau of Labor Statistics, Software Developers, Quality Assurance Analysts, and Testers — https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm
- U.S. Bureau of Labor Statistics, Secretaries and Administrative Assistants — https://www.bls.gov/ooh/office-and-administrative-support/secretaries-and-administrative-assistants.htm