Series: AI Isn’t an Engineering Problem Anymore (Part 2)
It’s a cost problem—and most teams don’t realize it yet.
In the last post, I talked about hitting a usage limit while debugging my robot and realizing how repetitive my own AI usage had become.
At the time, it felt like a personal workflow issue.
But the more I thought about it, the more it became clear:
This isn’t just a “me problem.”
It’s a pattern.
The illusion of “new” work
When we use LLMs, whether through APIs or tools, it feels like every request is new.
You type something different.
You add more context.
You refine your question.
But under the hood, a lot of those requests are doing very similar work.
For example:
debugging the same issue from different angles
rewording prompts to get a better answer
retrying when output isn’t quite right
asking for clarification on something you already asked
Each one feels justified.
And most of them are.
Where inefficiency actually comes from
The inefficiency isn’t from using AI too much.
It comes from how we naturally interact with it.
A few patterns show up almost everywhere:
1. Iteration loops
You don’t ask once. You iterate.
“Try this approach”
“That didn’t work, what about this?”
“What if I change this parameter?”
Each step builds on the last, but often overlaps heavily.
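One mechanical reason for that overlap: chat-style APIs are typically stateless, so every new turn resends the whole conversation so far. A minimal sketch of that shape (`call_llm_chat` is a hypothetical stand-in, not any real client):

```python
def call_llm_chat(messages: list[dict]) -> str:
    # Stand-in for a real chat-completion call; a real one is
    # billed on every token in `messages`, old turns included.
    return f"(reply after reading {len(messages)} messages)"

history: list[dict] = []

def iterate(user_msg: str) -> str:
    history.append({"role": "user", "content": user_msg})
    reply = call_llm_chat(history)  # resends the whole history each time
    history.append({"role": "assistant", "content": reply})
    return reply

# Turn 1 sends 1 message, turn 2 sends 3, turn 3 sends 5:
# the overlap between consecutive requests grows at every step.
for msg in ["Try this approach", "That didn't work, what about this?",
            "What if I change this parameter?"]:
    print(iterate(msg))
```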
2. Near-duplicate prompts
These are the most interesting ones.
They’re not identical, but they’re close:
same intent
slightly different phrasing
maybe a bit more context
To a human, they’re obviously related.
To most systems, they’re treated as completely new.
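Here’s a toy comparison showing why (made-up prompts; stdlib `difflib` stands in for the embedding similarity a real system would use). Exact comparison, the basis of most caching, sees two unrelated strings:

```python
from difflib import SequenceMatcher

a = "Why does my motor controller time out after 30 seconds?"
b = "My motor controller keeps timing out after ~30s. Any ideas?"

# Exact-match logic: a completely "new" request.
print(a == b)  # False

# A fuzzy ratio in [0, 1] tells a different story: these prompts
# score far higher than unrelated prompts would.
print(SequenceMatcher(None, a.lower(), b.lower()).ratio())
```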
3. Retry behavior
Sometimes you just don’t like the answer.
So you try again.
same prompt
or a slightly modified one
This is normal.
But it means the same underlying request can be executed multiple times.
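A minimal sketch of why that matters for cost, assuming a simple exact-match cache (`call_llm` is a hypothetical stub standing in for a paid model call):

```python
import hashlib

def call_llm(prompt: str) -> str:
    # Stand-in for a real (metered) model call.
    return f"response to: {prompt!r}"

_cache: dict[str, str] = {}

def cached_call(prompt: str) -> str:
    # Exact-match caching: only a byte-identical retry is free.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)  # full recompute, full cost
    return _cache[key]

cached_call("explain this stack trace")      # computed
cached_call("explain this stack trace")      # identical retry: reused
cached_call("explain this stack trace pls")  # slightly modified: recomputed
```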
4. Team-level duplication
This gets amplified in teams.
Multiple developers might:
debug similar issues
build similar features
ask similar questions
But there’s no shared memory between them.
So the same work gets repeated across people.
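The simplest version of “shared memory” is just a cache every developer can reach. A sketch under the same exact-match assumption as above (the SQLite file name and schema are made up for illustration):

```python
import hashlib
import sqlite3

def call_llm(prompt: str) -> str:
    return f"response to: {prompt!r}"  # stand-in for a metered model call

db = sqlite3.connect("team_llm_cache.db")  # imagine this on shared storage
db.execute("CREATE TABLE IF NOT EXISTS answers (key TEXT PRIMARY KEY, response TEXT)")

def team_cached_call(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    row = db.execute("SELECT response FROM answers WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]            # a teammate already paid for this one
    response = call_llm(prompt)  # otherwise: recompute, again
    db.execute("INSERT INTO answers VALUES (?, ?)", (key, response))
    db.commit()
    return response
```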
Why this is hard to notice
The tricky part is:
None of this feels inefficient in the moment.
It feels like:
progress
exploration
iteration
And that’s because it is.
A quick analogy (this is exactly how it happens)
My dog gained 15 pounds in a year without us realizing.
What was happening?
Every weekday:
I fed him before leaving for work at 6:30am
my girlfriend fed him again at 8:30am
From each of our perspectives:
“I only fed him once.”
But at the system level:
He was getting fed twice, every single day.
We only noticed when something else didn’t add up:
a 25-pound bag of dog food disappearing way too fast.
Back to LLM usage
That’s exactly how LLM usage behaves.
Individually:
each request feels justified
each interaction feels necessary
But at the system level:
similar work is being recomputed
similar responses are being regenerated
similar costs are being incurred
Over and over again.
There’s also a more subtle version of this that most people don’t think about.
Even the smallest additions to prompts, the things that feel natural or polite, are still part of the computation.
Take “please” and “thank you.”
This is obviously a lighthearted example that seems trivial at first.
But multiply it across millions of requests…
Now it’s a systems problem.
How many extra tokens do you think you generate just from “please” and “thanks”?
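For a rough answer, you can count them yourself with a tokenizer. A back-of-envelope sketch using tiktoken (the prompts and the choice of the cl100k_base encoding are assumptions for illustration):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

bare = "Summarize this error log."
polite = "Hi! Could you please summarize this error log? Thank you so much!"

extra = len(enc.encode(polite)) - len(enc.encode(bare))
print(f"{extra} extra tokens per request, just for pleasantries")

# A handful of extra tokens is nothing once.
# Across millions of requests, it's millions of tokens.
```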
A simple mental model
Think of LLM usage like this:
Every request is treated as a completely new computation, even when it’s not.
There’s no built-in concept of:
“we’ve already solved something like this”
“this looks similar to a previous request”
“we could reuse part of this result”
So the system does exactly what it’s designed to do:
recompute everything
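To make the contrast concrete, here’s what the missing “this looks similar to a previous request” check could look like. A minimal sketch, not any provider’s actual behavior: the threshold is arbitrary, `call_llm` is a stub, and `difflib` again stands in for real embedding similarity.

```python
from difflib import SequenceMatcher

def call_llm(prompt: str) -> str:
    return f"response to: {prompt!r}"  # stand-in for a metered model call

history: list[tuple[str, str]] = []    # (prompt, response) pairs

def call_with_reuse(prompt: str, threshold: float = 0.9) -> str:
    # "We've already solved something like this": scan past prompts
    # for a near-duplicate before paying for a fresh computation.
    for past_prompt, past_response in history:
        if SequenceMatcher(None, prompt, past_prompt).ratio() >= threshold:
            return past_response       # reuse instead of recompute
    response = call_llm(prompt)        # default path: recompute everything
    history.append((prompt, response))
    return response
```

Whether reusing a near-duplicate’s answer is even acceptable is exactly the open question; the point is that nothing in the default request path asks it.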
When this becomes a real problem
If you’re just experimenting, this isn’t a big deal.
But once you start:
building products
scaling usage
or running teams
This pattern starts to matter.
Because now you have:
more requests
more iteration
more overlap
And all of it compounds.
This is where the shift happens
At some point, the problem stops being:
“How do we use AI effectively?”
And starts becoming:
“How do we use AI efficiently?”
That’s a very different question.
Why this isn’t obvious (yet)
Most of the conversation around AI is still focused on:
model quality
capabilities
performance
Not:
usage patterns
repetition
system-level efficiency
So a lot of teams don’t even look for this problem.
What I’m trying to understand
After noticing this pattern, the question I’ve been thinking about is:
How much of LLM usage is actually new… and how much is just repetition in disguise?
And more importantly:
If a meaningful portion is repetitive, what should we do about it?
What I’ll explore next
In the next post, I’ll go deeper into one specific part of this:
why you’re probably paying twice for the same LLM response, even when the prompts aren’t identical.
👉 Part 1 is here:
https://dev.to/joshua_chukwu_ccb92f05a94/youve-hit-your-chatgpt-usage-limit-and-what-it-actually-reveals-about-llm-usage-700