Series: AI Isn’t an Engineering Problem Anymore (Part 2)
It’s a cost problem—and most teams don’t realize it yet.
In the last post, I talked about hitting a usage limit while debugging my robot and realizing how repetitive my own AI usage had become.
At the time, it felt like a personal workflow issue.
But the more I thought about it, the more it became clear:
This isn’t just a “me problem.”
It’s a pattern.
The illusion of “new” work
When we use LLMs, whether through APIs or tools, it feels like every request is new.
You type something different.
You add more context.
You refine your question.
But under the hood, a lot of those requests are doing very similar work.
For example:
debugging the same issue from different angles
rewording prompts to get a better answer
retrying when output isn’t quite right
asking for clarification on something you already asked
Each one feels justified.
And most of them are.
Where inefficiency actually comes from
The inefficiency isn’t from using AI too much.
It comes from how we naturally interact with it.
A few patterns show up almost everywhere:
1. Iteration loops
You don’t ask once. You iterate.
“Try this approach”
“That didn’t work, what about this?”
“What if I change this parameter?”
Each step builds on the last, but often overlaps heavily.
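One mechanical reason for that overlap: chat-style APIs are typically stateless, so every new turn resends the whole conversation so far. A minimal sketch of that shape (`call_llm_chat` is a hypothetical stand-in, not any real client):

```python
def call_llm_chat(messages: list[dict]) -> str:
    # Stand-in for a real chat-completion call; a real one is
    # billed on every token in `messages`, old turns included.
    return f"(reply after reading {len(messages)} messages)"

history: list[dict] = []

def iterate(user_msg: str) -> str:
    history.append({"role": "user", "content": user_msg})
    reply = call_llm_chat(history)  # resends the whole history each time
    history.append({"role": "assistant", "content": reply})
    return reply

# Turn 1 sends 1 message, turn 2 sends 3, turn 3 sends 5:
# the overlap between consecutive requests grows at every step.
for msg in ["Try this approach", "That didn't work, what about this?",
            "What if I change this parameter?"]:
    print(iterate(msg))
```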
2. Near-duplicate prompts
These are the most interesting ones.
They’re not identical, but they’re close:
same intent
slightly different phrasing
maybe a bit more context
To a human, they’re obviously related.
To most systems, they’re treated as completely new.
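Here’s a toy comparison showing why (made-up prompts; stdlib `difflib` stands in for the embedding similarity a real system would use). Exact comparison, the basis of most caching, sees two unrelated strings:

```python
from difflib import SequenceMatcher

a = "Why does my motor controller time out after 30 seconds?"
b = "My motor controller keeps timing out after ~30s. Any ideas?"

# Exact-match logic: a completely "new" request.
print(a == b)  # False

# A fuzzy ratio in [0, 1] tells a different story: these prompts
# score far higher than unrelated prompts would.
print(SequenceMatcher(None, a.lower(), b.lower()).ratio())
```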
3. Retry behavior
Sometimes you just don’t like the answer.
So you try again.
same prompt
or a slightly modified one
This is normal.
But it means the same underlying request can be executed multiple times.
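A minimal sketch of why that matters for cost, assuming a simple exact-match cache (`call_llm` is a hypothetical stub standing in for a paid model call):

```python
import hashlib

def call_llm(prompt: str) -> str:
    # Stand-in for a real (metered) model call.
    return f"response to: {prompt!r}"

_cache: dict[str, str] = {}

def cached_call(prompt: str) -> str:
    # Exact-match caching: only a byte-identical retry is free.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)  # full recompute, full cost
    return _cache[key]

cached_call("explain this stack trace")      # computed
cached_call("explain this stack trace")      # identical retry: reused
cached_call("explain this stack trace pls")  # slightly modified: recomputed
```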
4. Team-level duplication
This gets amplified in teams.
Multiple developers might:
debug similar issues
build similar features
ask similar questions
But there’s no shared memory between them.
So the same work gets repeated across people.
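The simplest version of “shared memory” is just a cache every developer can reach. A sketch under the same exact-match assumption as above (the SQLite file name and schema are made up for illustration):

```python
import hashlib
import sqlite3

def call_llm(prompt: str) -> str:
    return f"response to: {prompt!r}"  # stand-in for a metered model call

db = sqlite3.connect("team_llm_cache.db")  # imagine this on shared storage
db.execute("CREATE TABLE IF NOT EXISTS answers (key TEXT PRIMARY KEY, response TEXT)")

def team_cached_call(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    row = db.execute("SELECT response FROM answers WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]            # a teammate already paid for this one
    response = call_llm(prompt)  # otherwise: recompute, again
    db.execute("INSERT INTO answers VALUES (?, ?)", (key, response))
    db.commit()
    return response
```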
Why this is hard to notice
The tricky part is:
None of this feels inefficient in the moment.
It feels like:
progress
exploration
iteration
And that’s because it is.
A quick analogy (this is exactly how it happens)
My dog gained 15 pounds in a year without us realizing.
What was happening?
Every weekday:
I fed him before leaving for work at 6:30am
my girlfriend fed him again at 8:30am
From each of our perspectives:
“I only fed him once.”
But at the system level:
He was getting fed twice, every single day.
We only noticed when something else didn’t add up:
a 25-pound bag of dog food disappearing way too fast.
Back to LLM usage
That’s exactly how LLM usage behaves.
Individually:
each request feels justified
each interaction feels necessary
But at the system level:
similar work is being recomputed
similar responses are being regenerated
similar costs are being incurred
Over and over again.
There’s also a more subtle version of this that most people don’t think about.
Even the smallest additions to prompts, the things that feel natural or polite, are still part of the computation.
Take “please” and “thank you.”
This is obviously a lighthearted example that seems trivial at first.
But multiply it across millions of requests…
Now it’s a systems problem.
How many extra tokens do you think you generate just from “please” and “thanks”?
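For a rough answer, you can count them yourself with a tokenizer. A back-of-envelope sketch using tiktoken (the prompts and the choice of the cl100k_base encoding are assumptions for illustration):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

bare = "Summarize this error log."
polite = "Hi! Could you please summarize this error log? Thank you so much!"

extra = len(enc.encode(polite)) - len(enc.encode(bare))
print(f"{extra} extra tokens per request, just for pleasantries")

# A handful of extra tokens is nothing once.
# Across millions of requests, it's millions of tokens.
```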
A simple mental model
Think of LLM usage like this:
Every request is treated as a completely new computation, even when it’s not.
There’s no built-in concept of:
“we’ve already solved something like this”
“this looks similar to a previous request”
“we could reuse part of this result”
So the system does exactly what it’s designed to do:
recompute everything
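To make the contrast concrete, here’s what the missing “this looks similar to a previous request” check could look like. A minimal sketch, not any provider’s actual behavior: the threshold is arbitrary, `call_llm` is a stub, and `difflib` again stands in for real embedding similarity.

```python
from difflib import SequenceMatcher

def call_llm(prompt: str) -> str:
    return f"response to: {prompt!r}"  # stand-in for a metered model call

history: list[tuple[str, str]] = []    # (prompt, response) pairs

def call_with_reuse(prompt: str, threshold: float = 0.9) -> str:
    # "We've already solved something like this": scan past prompts
    # for a near-duplicate before paying for a fresh computation.
    for past_prompt, past_response in history:
        if SequenceMatcher(None, prompt, past_prompt).ratio() >= threshold:
            return past_response       # reuse instead of recompute
    response = call_llm(prompt)        # default path: recompute everything
    history.append((prompt, response))
    return response
```

Whether reusing a near-duplicate’s answer is even acceptable is exactly the open question; the point is that nothing in the default request path asks it.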
When this becomes a real problem
If you’re just experimenting, this isn’t a big deal.
But once you start:
building products
scaling usage
or running teams
This pattern starts to matter.
Because now you have:
more requests
more iteration
more overlap
And all of it compounds.
This is where the shift happens
At some point, the problem stops being:
“How do we use AI effectively?”
And starts becoming:
“How do we use AI efficiently?”
That’s a very different question.
Why this isn’t obvious (yet)
Most of the conversation around AI is still focused on:
model quality
capabilities
performance
Not:
usage patterns
repetition
system-level efficiency
So a lot of teams don’t even look for this problem.
What I’m trying to understand
After noticing this pattern, the question I’ve been thinking about is:
How much of LLM usage is actually new… and how much is just repetition in disguise?
And more importantly:
If a meaningful portion is repetitive, what should we do about it?
What I’ll explore next
In the next post, I’ll go deeper into one specific part of this:
why you’re probably paying twice for the same LLM response, even when the prompts aren’t identical.
👉 Part 1 is here:
https://dev.to/joshua_chukwu_ccb92f05a94/youve-hit-your-chatgpt-usage-limit-and-what-it-actually-reveals-about-llm-usage-700