Series: AI Isn’t an Engineering Problem Anymore (Part 4)
It’s a cost problem—and most teams don’t realize it yet.
In the last post, I talked about where AI costs actually come from—and how context growth quietly compounds them.
But there’s another pattern that’s even more uncomfortable once you notice it:
you’re probably paying twice for the same underlying answer.
Not literally the exact same string.
But functionally, the same work.
The obvious version
Let’s start with something simple.
You ask:
“Why can’t my rover perform a zero-radius turn?”
Then later you ask:
“What could cause skid-steer instability at low speeds?”
Different wording.
Same underlying problem.
The subtle version (this is where it gets interesting)
Now imagine a debugging session.
You go through a loop:
ask a question
get a partial answer
refine your prompt
ask again
Each step feels like progress.
But look at it from a system perspective:
you’re repeatedly asking for overlapping reasoning
And each time:
the model recomputes it
the system bills it
nothing is reused
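Here’s a rough sketch of that loop from the billing side. Every name and number in it is a made-up placeholder (the token estimate is a crude 4-characters-per-token rule, the price is invented); the point is that each iteration resends and reprocesses everything before it:

```python
# All numbers and helpers here are illustrative, not real pricing.

PRICE_PER_1K_INPUT_TOKENS = 0.003  # hypothetical rate


def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)


history: list[str] = []
total_input_tokens = 0

refinements = [
    "Why can't my rover perform a zero-radius turn?",
    "Here's the motor config -- does that change anything?",
    "Same question, but assume low speed on gravel.",
]

for prompt in refinements:
    history.append(prompt)
    # Every call resends the whole history, so earlier turns are
    # tokenized, processed, and billed again on each iteration.
    total_input_tokens += estimate_tokens("\n".join(history))

print(f"input tokens billed: {total_input_tokens}")
print(f"approx cost: ${total_input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS:.4f}")
```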
Why this happens naturally
Because this is how humans think.
We don’t ask perfect questions.
We:
explore
rephrase
iterate
So the system ends up doing:
slightly different versions of the same work
over and over again.
The team version (this is worse)
Now scale this beyond one person.
Multiple engineers might:
debug similar issues
build similar features
ask similar questions
But there’s no shared layer that says:
“we’ve already solved something like this”
So the same reasoning gets recomputed across:
people
time
workflows
Why simple caching doesn’t solve it
At this point, the obvious solution seems like:
“just cache the response”
But that only works for:
exact matches
In reality, most prompts are:
slightly reworded
slightly extended
slightly different
So traditional caching misses most of the problem.
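Here’s a minimal sketch of why. The ask_model function is a hypothetical stand-in for whatever API you actually call; the cache key is a hash of the raw prompt string, so changing a single word means a miss:

```python
import hashlib

cache: dict[str, str] = {}


def ask_model(prompt: str) -> str:
    """Stand-in for the real (billed) LLM call."""
    return f"(answer to: {prompt})"


def ask_cached(prompt: str) -> str:
    # Exact-match key: reword one phrase and the hash changes completely.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:
        return cache[key]  # hit: free
    answer = ask_model(prompt)  # miss: billed
    cache[key] = answer
    return answer


# Same underlying problem, different wording -> two misses, two bills:
ask_cached("Why can't my rover perform a zero-radius turn?")
ask_cached("What could cause skid-steer instability at low speeds?")
```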
A better way to think about it
Instead of asking:
“is this the same prompt?”
The better question is:
“is this the same intent?”
Because that’s what actually matters.
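One way to sketch “same intent” is to compare embeddings instead of strings. Everything below is a toy: embed() is a stand-in for a real embedding model, and the 0.9 threshold is arbitrary—tuning it well is the genuinely hard part:

```python
import math


def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: hash words into a
    fixed-size vector. Good enough to illustrate the idea, nothing more."""
    vec = [0.0] * 64
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ask_model(prompt: str) -> str:
    """Stand-in for the real (billed) LLM call."""
    return f"(answer to: {prompt})"


SIMILARITY_THRESHOLD = 0.9  # arbitrary placeholder

# each entry: (embedding of the original prompt, cached answer)
semantic_cache: list[tuple[list[float], str]] = []


def ask_by_intent(prompt: str) -> str:
    query = embed(prompt)
    for stored_vec, stored_answer in semantic_cache:
        if cosine_similarity(query, stored_vec) >= SIMILARITY_THRESHOLD:
            return stored_answer  # same intent: reused, not recomputed
    answer = ask_model(prompt)  # new intent: pay for it once
    semantic_cache.append((query, answer))
    return answer
```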
A quick analogy
It’s a very human pattern.
You’ve probably done this before: looking for something—say, a key—and checking the same place twice, even though you already know it’s not there.
Not because it makes sense.
But because you’re searching, refining, trying again.
That’s exactly how we interact with LLMs.
The hidden cost
This is where things connect back to Post 3.
If:
a large portion of your usage is repetitive
and context keeps growing
and nothing is reused
Then:
you’re not just paying for usage
you’re paying for recomputation
And computation has always had a cost.
We’re only noticing it now because LLMs make that cost visible at scale.
A simple mental model
Think of it like this:
Every time you ask a similar question, the system:
re-derives the reasoning
re-generates the explanation
re-processes the context
Even if:
you’ve effectively already “paid” for that knowledge before
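Some back-of-envelope arithmetic makes the shape of this visible. Every number below is invented purely for illustration:

```python
# Every number here is made up to show the shape of the cost, not measure it.
calls_per_day = 2_000          # team-wide LLM calls
overlap_fraction = 0.4         # share that re-asks something already answered
avg_tokens_per_call = 3_000    # context + answer
price_per_1k_tokens = 0.005    # hypothetical blended rate

recomputed_tokens = calls_per_day * overlap_fraction * avg_tokens_per_call
monthly_waste = recomputed_tokens / 1_000 * price_per_1k_tokens * 30
print(f"tokens recomputed per day: {recomputed_tokens:,.0f}")
print(f"recomputation cost per month: ${monthly_waste:,.2f}")
```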
Why this matters more than it seems
At small scale, this is fine.
At larger scale, it becomes:
expensive
inefficient
invisible
Because it doesn’t show up as:
“duplicate cost”
It shows up as:
normal usage
The uncomfortable realization
Most teams don’t have a way to:
detect repetition
reuse prior work
or even measure how much overlap exists
So they default to:
keep asking, keep recomputing, keep paying
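Measuring the overlap is actually the easiest place to start. A rough sketch, reusing the toy embed() and cosine_similarity() helpers from above:

```python
def overlap_fraction(prompts: list[str], threshold: float = 0.9) -> float:
    """Fraction of prompts with at least one earlier near-duplicate.
    Reuses the toy embed() / cosine_similarity() sketches above."""
    if not prompts:
        return 0.0
    vectors = [embed(p) for p in prompts]
    repeats = 0
    for i, vec in enumerate(vectors):
        if any(cosine_similarity(vec, earlier) >= threshold
               for earlier in vectors[:i]):
            repeats += 1
    return repeats / len(prompts)
```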
What I’m trying to understand
At this point, the question becomes:
how much of LLM usage is actually new?
And more importantly:
how much of it needs to be recomputed?
What I’ll explore next
In the next post, I’ll go deeper into this:
what a system would actually need in order to reuse work effectively (beyond simple caching)
Closing thought
AI gives you answers instantly.
But under the hood, every answer is being computed from scratch—again and again.
And unless we rethink how that work is reused,
we’re going to keep paying for the same intelligence
multiple times.