Series:
AI Isn’t an Engineering Problem Anymore (Part 3)
It’s a cost problem, and most teams don’t realize it yet.
In the last post, I talked about how most LLM usage isn’t as “new” as it feels.
A lot of it is:
iterative
repetitive
overlapping
That’s interesting on its own.
But it becomes a lot more important when you start looking at it through a different lens:
cost.
The assumption most people make
When people think about AI costs, they usually assume the cost comes from:
heavy usage
complex and heavy queries
large models
high traffic
Which is partially true, but incomplete.
What actually adds up
In practice, a significant portion of usage comes from things like:
retrying prompts
slightly reworded questions
debugging loops
near-duplicate workflows
None of these feel expensive individually.
But together, they add up.
In my experience, in robotics and even outside engineering, anything that compounds tends to spiral faster than expected.
The part most people miss
There’s another layer that makes this worse:
context growth
As conversations get longer, the model doesn’t just process your latest message.
It processes:
your current prompt
plus everything that came before it (within the context window)
So each new message isn’t just:
“one more request”
It’s:
“one more request plus an increasing amount of prior context”.
Why this compounds quickly
Think about a long debugging session.
Message 1: “What is A?”
small context
relatively cheap
Message 10: “What is A made up of?”
includes previous messages
more tokens
Message 30: “Compound characteristics of sample A” + messages 1–29
includes a large conversation history
significantly more tokens
Now combine that with:
iteration loops
retries
near-duplicate prompts
And you get a pattern where:
cost doesn’t just grow linearly—it compounds with usage patterns.
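The arithmetic behind that claim is easy to sketch. Here's a minimal simulation (with made-up token counts, purely illustrative) of a conversation where every new message re-sends the full history:

```python
# Rough sketch: estimate total tokens processed when each request
# includes the new message plus all prior messages in the context window.

def tokens_processed(message_tokens):
    """Sum the tokens each request actually sends, history included."""
    total = 0
    history = 0
    for t in message_tokens:
        total += history + t   # prior context + current prompt
        history += t           # the new message joins the context
    return total

# A 30-message debugging session at ~100 tokens per message:
messages = [100] * 30
print(tokens_processed(messages))  # 46500 tokens processed, not 3000
```

Thirty messages “should” cost 3,000 tokens of input, but with context growth the model actually processes 46,500, more than 15x as much, and the gap widens the longer the session runs.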
A rough mental model
Imagine your usage looks like this:
40% - genuinely new work
30% - variations of the same request
20% - retries / debugging loops
10% - other
Now layer in context growth.
Even if each request seems small:
later requests are more expensive than earlier ones
And all of it is still treated as:
new work.
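To see how the mix and context growth interact, here's a back-of-the-envelope calculation using the percentages above. The cost multipliers are hypothetical; they just encode the idea that variations and retries tend to happen deeper in a conversation, where each request carries more prior context:

```python
# Illustrative only: the usage mix from the post, weighted by a
# hypothetical context-growth multiplier per category.

mix = {
    "new work":   0.40,
    "variations": 0.30,
    "retries":    0.20,
    "other":      0.10,
}

# Assumed multipliers: redundant requests arrive later in conversations,
# so each one drags along more history.
context_multiplier = {"new work": 1.0, "variations": 2.0, "retries": 3.0, "other": 1.0}

spend = {k: share * context_multiplier[k] for k, share in mix.items()}
total = sum(spend.values())

redundant = (spend["variations"] + spend["retries"]) / total
print(f"{redundant:.0%} of spend")  # 71% under these assumptions
```

Under these (invented) multipliers, the 50% of requests that are variations and retries end up driving roughly 70% of the spend. The exact numbers don't matter; the point is that the redundant half of your usage is also the expensive half.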
Why this is easy to miss
Because cost doesn’t show up per thought.
It shows up per request.
And each request feels justified.
The compounding effect
Now scale this:
across a team
across features
across users
What starts as:
“a bit of iteration”
becomes:
a large portion of your AI spend.
The hidden problem
The issue isn’t just cost.
It’s visibility.
Most teams don’t know:
where their AI usage is going
how much is repeated
how context is affecting cost
which workflows are inefficient
So they default to:
keep building, keep shipping, keep paying.
The shift
At some point, the question changes from:
“What can AI do?”
to:
“What is AI costing us?”
This is where AI stops being:
just an engineering problem
And becomes:
a financial one.
A different way to think about it
Knowledge is often described as power.
But knowledge on its own is more like potential energy.
It only becomes power when it’s applied—when it turns into decision-making and control.
In the context of AI:
access to models is knowledge
usage patterns are behavior
but efficiency is applied understanding
The teams that will win aren’t just the ones using AI.
They’re the ones who:
understand how their usage behaves
maximize what they already have
and design systems that avoid unnecessary repetition
A broader implication
In this age, AI is becoming something that almost every team, and even every household, will use.
But just like any other resource:
usage without visibility leads to waste
There needs to be some form of:
awareness
control
and, effectively, a “meter” on how it’s being used
Not to restrict usage, but to understand it.
What I’m trying to understand
At this point, the question isn’t just:
how do we use AI?
It’s:
how do we make AI usage economically sustainable?
What I’ll explore next
In the next post, I’ll go deeper into one specific piece of this:
why simple caching approaches don’t fully solve the problem
👉 Part 2 is here: (https://dev.to/joshua_chukwu_ccb92f05a94/why-most-llm-api-usage-is-quietly-inefficient-4eko?preview=401f3ac7119c46d21f585a03fcb4a625008594ab67a937a4cdfafeebd060d28d70ff4a0887e0f29bc789f100e99c71b343e3223a6756451f5f83dc94)
Closing thought
AI makes it easier to build.
But it also makes it easier to spend.
And unless we start thinking about how usage behaves and not just what models can do,
that spend can grow in ways that are hard to see until it’s too late.