<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joshua Chukwu</title>
    <description>The latest articles on DEV Community by Joshua Chukwu (@joshua_chukwu_ccb92f05a94).</description>
    <link>https://dev.to/joshua_chukwu_ccb92f05a94</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3906568%2F3dc9d440-1a71-4e75-8be0-973c8a5ce1f2.png</url>
      <title>DEV Community: Joshua Chukwu</title>
      <link>https://dev.to/joshua_chukwu_ccb92f05a94</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/joshua_chukwu_ccb92f05a94"/>
    <language>en</language>
    <item>
      <title>What’s actually missing in most AI stacks</title>
      <dc:creator>Joshua Chukwu</dc:creator>
      <pubDate>Tue, 19 May 2026 13:44:21 +0000</pubDate>
      <link>https://dev.to/joshua_chukwu_ccb92f05a94/whats-actually-missing-in-most-ai-stacks-1673</link>
      <guid>https://dev.to/joshua_chukwu_ccb92f05a94/whats-actually-missing-in-most-ai-stacks-1673</guid>
      <description>&lt;p&gt;Series: AI Isn’t an Engineering Problem Anymore (Part 9)&lt;br&gt;
It’s a cost problem—and most teams don’t realize it yet.&lt;/p&gt;

&lt;p&gt;Over the last few posts, I talked a lot about the differences between how humans process information and how systems process information.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;When humans take a multiple-choice quiz and several answers look similar, we naturally slow down.&lt;/p&gt;

&lt;p&gt;We:&lt;/p&gt;

&lt;p&gt;scan carefully&lt;br&gt;
 compare patterns&lt;br&gt;
 revisit assumptions&lt;br&gt;
 and spend more time deciding what is actually relevant&lt;/p&gt;

&lt;p&gt;And when there’s a timer involved, we suddenly become aware of the actual cost of processing information:&lt;/p&gt;

&lt;p&gt;TIME.&lt;/p&gt;

&lt;p&gt;Systems process information differently, and much faster.&lt;/p&gt;

&lt;p&gt;But the underlying tradeoff still exists:&lt;/p&gt;

&lt;p&gt;more context&lt;br&gt;
 more comparisons&lt;br&gt;
 more ambiguity&lt;br&gt;
 more computation&lt;/p&gt;

&lt;p&gt;Which raises an interesting question:&lt;/p&gt;

&lt;p&gt;should we feed systems information the same way humans naturally think about problems?&lt;/p&gt;

&lt;p&gt;Or should we instead optimize workflows around the strengths and limitations of the systems themselves?&lt;/p&gt;

&lt;p&gt;The more I think about it, the more I feel like most AI stacks today are missing an entire layer.&lt;/p&gt;

&lt;p&gt;Not:&lt;/p&gt;

&lt;p&gt;smarter models&lt;br&gt;
 larger context windows&lt;br&gt;
 or more API access&lt;/p&gt;

&lt;p&gt;A different layer entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  The current AI stack is incomplete
&lt;/h2&gt;

&lt;p&gt;Right now, many AI workflows still look like this:&lt;/p&gt;

&lt;p&gt;User → Prompt → Model → Response&lt;/p&gt;

&lt;p&gt;At small scale, this works perfectly fine.&lt;/p&gt;

&lt;p&gt;But once usage compounds across:&lt;/p&gt;

&lt;p&gt;teams&lt;br&gt;
 organizations&lt;br&gt;
 agents&lt;br&gt;
 workflows&lt;br&gt;
 and long-running projects&lt;/p&gt;

&lt;p&gt;the cracks start appearing.&lt;/p&gt;

&lt;p&gt;Because the system has very little understanding of:&lt;/p&gt;

&lt;p&gt;reuse&lt;br&gt;
 coordination&lt;br&gt;
 attribution&lt;br&gt;
 cost efficiency&lt;br&gt;
 workflow overlap&lt;br&gt;
 or memory lifecycle management&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s missing
&lt;/h2&gt;

&lt;p&gt;I think most AI systems are currently missing:&lt;/p&gt;

&lt;p&gt;an operational intelligence layer.&lt;/p&gt;

&lt;p&gt;Something sitting between:&lt;/p&gt;

&lt;p&gt;humans and raw model inference&lt;/p&gt;

&lt;p&gt;A layer responsible for:&lt;/p&gt;

&lt;p&gt;orchestration&lt;br&gt;
 routing&lt;br&gt;
 observability&lt;br&gt;
 context optimization&lt;br&gt;
 memory lifecycle management&lt;br&gt;
 intelligent reuse&lt;br&gt;
 governance&lt;br&gt;
 and compute efficiency&lt;/p&gt;

&lt;p&gt;Not to replace the models.&lt;/p&gt;

&lt;p&gt;But to make large-scale AI usage sustainable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Right now, most systems are reactive
&lt;/h2&gt;

&lt;p&gt;Most current workflows only react after:&lt;/p&gt;

&lt;p&gt;costs spike&lt;br&gt;
 limits get hit&lt;br&gt;
 latency grows&lt;br&gt;
 context becomes bloated&lt;br&gt;
 or workflows become chaotic&lt;/p&gt;

&lt;p&gt;But by then:&lt;/p&gt;

&lt;p&gt;the inefficiency has already compounded.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cloud parallel keeps showing up
&lt;/h2&gt;

&lt;p&gt;Cloud computing followed a similar pattern.&lt;/p&gt;

&lt;p&gt;At first:&lt;/p&gt;

&lt;p&gt;compute availability was the breakthrough.&lt;/p&gt;

&lt;p&gt;Later:&lt;/p&gt;

&lt;p&gt;orchestration mattered&lt;br&gt;
 observability mattered&lt;br&gt;
 governance mattered&lt;br&gt;
 cost attribution mattered&lt;br&gt;
 optimization mattered&lt;/p&gt;

&lt;p&gt;I think AI is entering a similar phase now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Intelligence alone does not create efficiency
&lt;/h2&gt;

&lt;p&gt;This is the part I keep coming back to.&lt;/p&gt;

&lt;p&gt;A smarter model does not automatically solve:&lt;/p&gt;

&lt;p&gt;duplicated reasoning&lt;br&gt;
 overlapping workflows&lt;br&gt;
 repeated context&lt;br&gt;
 unnecessary inference&lt;br&gt;
 or organizational inefficiency&lt;/p&gt;

&lt;p&gt;Those problems exist above the model layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The dangerous illusion
&lt;/h2&gt;

&lt;p&gt;Larger context windows and more capable models can sometimes create the illusion that:&lt;/p&gt;

&lt;p&gt;scaling problems are solved.&lt;/p&gt;

&lt;p&gt;But in many cases:&lt;/p&gt;

&lt;p&gt;the system may simply be brute-forcing more compute through increasingly messy workflows.&lt;/p&gt;

&lt;p&gt;That works temporarily.&lt;/p&gt;

&lt;p&gt;Until scale compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What organizations will eventually ask
&lt;/h2&gt;

&lt;p&gt;I think organizations will increasingly start asking questions like:&lt;/p&gt;

&lt;p&gt;Where is our AI spend actually going?&lt;br&gt;
 Which workflows are inefficient?&lt;br&gt;
 Which teams generate the most repeated reasoning?&lt;br&gt;
 What context should persist?&lt;br&gt;
 What should expire?&lt;br&gt;
 What should never hit the model at all?&lt;br&gt;
 What work are we recomputing unnecessarily?&lt;/p&gt;

&lt;p&gt;Those are operational questions.&lt;/p&gt;

&lt;p&gt;Not purely model questions.&lt;/p&gt;
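&lt;p&gt;To make "What context should persist?" and "What should expire?" a little more concrete, here is a minimal sketch of a context store where every entry carries an optional time-to-live. The ContextStore class, its method names, the injected clock, and the example entries are all hypothetical, chosen for illustration rather than taken from any real tool.&lt;/p&gt;

```python
class ContextStore:
    """Toy context store where each entry carries an optional expiry time."""

    def __init__(self, clock):
        self._clock = clock   # callable returning the current time in seconds
        self._items = {}      # key -> (value, expires_at or None)

    def remember(self, key, value, ttl_seconds=None):
        # ttl_seconds=None means "persist until explicitly removed".
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._items[key] = (value, expires_at)

    def recall(self, key):
        if key not in self._items:
            return None
        value, expires_at = self._items[key]
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # expired context is dropped, never reused
            return None
        return value


# A fake clock keeps the example deterministic.
now = [0.0]
store = ContextStore(lambda: now[0])
store.remember("project-goal", "ship v2 API")                      # persists
store.remember("debug-log", "stack trace", ttl_seconds=3600)       # expires
now[0] = 7200.0  # two hours later
goal = store.recall("project-goal")
log = store.recall("debug-log")
```

&lt;p&gt;The point of the sketch is only that expiry is a decision made above the model: the long-lived goal survives, while the short-lived debugging context is never sent again.&lt;/p&gt;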




&lt;h2&gt;
  
  
  The next phase of AI infrastructure
&lt;/h2&gt;

&lt;p&gt;The first phase of AI was:&lt;/p&gt;

&lt;p&gt;access to intelligence.&lt;/p&gt;

&lt;p&gt;The next phase may become:&lt;/p&gt;

&lt;p&gt;efficient coordination of intelligence.&lt;/p&gt;

&lt;p&gt;And I think that changes the infrastructure conversation completely.&lt;/p&gt;

&lt;h2&gt;
  
  
  My opinion
&lt;/h2&gt;

&lt;p&gt;I think the companies that win long term won’t necessarily be:&lt;/p&gt;

&lt;p&gt;the companies generating the most tokens&lt;/p&gt;

&lt;p&gt;But the companies that best understand:&lt;/p&gt;

&lt;p&gt;orchestration&lt;br&gt;
 reuse&lt;br&gt;
 observability&lt;br&gt;
 memory management&lt;br&gt;
 and intelligent compute allocation&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’ll explore next
&lt;/h2&gt;

&lt;p&gt;In the final post of this series, I’ll talk about the conclusion this entire journey eventually led me to:&lt;/p&gt;

&lt;p&gt;The current setup and workflow I use to manage AI context, repeated reasoning, memory drift, and operational inefficiency across long-running projects.&lt;/p&gt;

&lt;p&gt;👉 Part 8 is here: &lt;a href="https://dev.to/joshua_chukwu_ccb92f05a94/why-you-cant-just-cache-everything-privacy-safety-and-reality-5943"&gt;https://dev.to/joshua_chukwu_ccb92f05a94/why-you-cant-just-cache-everything-privacy-safety-and-reality-5943&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;Most AI conversations today focus on:&lt;/p&gt;

&lt;p&gt;model intelligence.&lt;/p&gt;

&lt;p&gt;But I think the harder long-term problem may become:&lt;/p&gt;

&lt;p&gt;operational intelligence.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Why you can’t just cache everything (privacy, safety, and reality)</title>
      <dc:creator>Joshua Chukwu</dc:creator>
      <pubDate>Mon, 18 May 2026 12:05:00 +0000</pubDate>
      <link>https://dev.to/joshua_chukwu_ccb92f05a94/why-you-cant-just-cache-everything-privacy-safety-and-reality-5943</link>
      <guid>https://dev.to/joshua_chukwu_ccb92f05a94/why-you-cant-just-cache-everything-privacy-safety-and-reality-5943</guid>
      <description>&lt;p&gt;Series: AI Isn’t an Engineering Problem Anymore (Part 8)&lt;br&gt;
It’s a cost problem—and most teams don’t realize it yet.&lt;/p&gt;

&lt;p&gt;In the last few posts, I talked a lot about:&lt;br&gt;
repeated reasoning&lt;br&gt;
workflow duplication&lt;br&gt;
context growth&lt;br&gt;
reuse&lt;br&gt;
and AI control planes. &lt;/p&gt;

&lt;p&gt;At first glance, the solution feels obvious:&lt;br&gt;
“Why not just cache everything?”&lt;br&gt;
If organizations repeatedly ask similar questions, why recompute the same reasoning over and over again?&lt;br&gt;
The idea sounds simple.&lt;br&gt;
Reality is not.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hidden assumption
&lt;/h2&gt;

&lt;p&gt;A lot of discussions around AI optimization assume:&lt;br&gt;
every request is safely reusable.&lt;br&gt;
But once AI moves from:&lt;br&gt;
hobby projects&lt;br&gt;
to:&lt;br&gt;
organizations&lt;br&gt;
enterprises&lt;br&gt;
production workflows&lt;br&gt;
internal tooling&lt;br&gt;
the problem changes completely.&lt;br&gt;
Because now the system starts interacting with:&lt;br&gt;
private codebases&lt;br&gt;
internal documents&lt;br&gt;
customer data&lt;br&gt;
financial information&lt;br&gt;
credentials&lt;br&gt;
legal discussions&lt;br&gt;
deployment infrastructure&lt;br&gt;
and operational decisions&lt;br&gt;
That changes the optimization equation immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  The trust boundary problem
&lt;/h2&gt;

&lt;p&gt;This is where things become difficult.&lt;br&gt;
Two prompts may look semantically similar, or even identical:&lt;br&gt;
“Why is my deployment failing?”&lt;/p&gt;

&lt;p&gt;But underneath, the contexts may contain:&lt;br&gt;
completely different infrastructure&lt;br&gt;
different secrets&lt;br&gt;
different environments&lt;br&gt;
different permissions&lt;br&gt;
different organizations&lt;br&gt;
Which means:&lt;br&gt;
Similarity alone is not enough.&lt;br&gt;
A system cannot blindly reuse reasoning across trust boundaries.&lt;/p&gt;
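&lt;p&gt;One way to picture a trust boundary is a cache whose keys always include the scope the request came from. The sketch below is an assumption-heavy illustration: the ScopedCache class, the org_id parameter, and the example strings are hypothetical, and a real system would likely need finer scopes than a single organization id.&lt;/p&gt;

```python
import hashlib
import json


class ScopedCache:
    """Toy cache whose keys always include a trust scope (here, an org id)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(org_id, prompt):
        # Serialize scope and prompt together so identical prompts from
        # different orgs can never map to the same entry.
        raw = json.dumps([org_id, prompt])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, org_id, prompt):
        return self._store.get(self._key(org_id, prompt))

    def put(self, org_id, prompt, response):
        self._store[self._key(org_id, prompt)] = response


cache = ScopedCache()
cache.put("org-a", "Why is my deployment failing?", "diagnosis for org-a")
same_org = cache.get("org-a", "Why is my deployment failing?")
other_org = cache.get("org-b", "Why is my deployment failing?")
```

&lt;p&gt;Here the identical prompt from a second organization is a deliberate miss: the reuse opportunity is given up in exchange for isolation.&lt;/p&gt;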

&lt;h2&gt;
  
  
  The dangerous version of optimization
&lt;/h2&gt;

&lt;p&gt;This is the part I think many people underestimate.&lt;br&gt;
Aggressive optimization without governance can quietly become:&lt;br&gt;
a privacy problem&lt;br&gt;
a security problem&lt;br&gt;
or a trust problem&lt;br&gt;
Especially once:&lt;br&gt;
organizations&lt;br&gt;
teams&lt;br&gt;
or multiple users&lt;br&gt;
share the same AI infrastructure layer.&lt;br&gt;
Because now the system must answer questions like:&lt;br&gt;
What can safely be reused?&lt;br&gt;
What should remain isolated?&lt;br&gt;
Which contexts are sensitive?&lt;br&gt;
Who owns the generated reasoning?&lt;br&gt;
What should expire?&lt;br&gt;
What should never persist at all?&lt;br&gt;
Those are not just engineering problems anymore.&lt;br&gt;
They become:&lt;br&gt;
organizational&lt;br&gt;
legal&lt;br&gt;
operational&lt;br&gt;
and ethical problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why enterprise AI becomes harder
&lt;/h2&gt;

&lt;p&gt;At small scale, people mostly think about:&lt;br&gt;
“making the model smarter.”&lt;br&gt;
At organizational scale, companies start worrying about:&lt;br&gt;
observability&lt;br&gt;
compliance&lt;br&gt;
governance&lt;br&gt;
attribution&lt;br&gt;
auditability&lt;br&gt;
and trust boundaries&lt;br&gt;
Which is why scaling AI usage inside organizations becomes much more complicated than:&lt;br&gt;
“Just increase the context window.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Human behavior complicates this further
&lt;/h2&gt;

&lt;p&gt;Humans are messy.&lt;br&gt;
We:&lt;br&gt;
paste sensitive logs&lt;br&gt;
include unnecessary context&lt;br&gt;
reuse prompts carelessly&lt;br&gt;
carry old information forward&lt;br&gt;
and mix unrelated workflows together constantly&lt;br&gt;
That means optimization systems cannot simply assume:&lt;br&gt;
more memory = better.&lt;br&gt;
Sometimes:&lt;br&gt;
more memory increases risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  The difficult tradeoff
&lt;/h2&gt;

&lt;p&gt;This creates a difficult system tension.&lt;br&gt;
Organizations want:&lt;br&gt;
lower cost&lt;br&gt;
faster workflows&lt;br&gt;
more reuse&lt;br&gt;
less recomputation&lt;br&gt;
But they also need:&lt;br&gt;
isolation&lt;br&gt;
privacy&lt;br&gt;
security&lt;br&gt;
and trust&lt;br&gt;
And those goals often push against each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Opinion
&lt;/h2&gt;

&lt;p&gt;I believe that the long-term winners in AI infrastructure won’t just be:&lt;br&gt;
the companies with the largest models&lt;br&gt;
or the cheapest inference&lt;br&gt;
But the companies that best understand:&lt;br&gt;
orchestration&lt;br&gt;
trust boundaries&lt;br&gt;
memory lifecycle&lt;br&gt;
governance&lt;br&gt;
and intelligent reuse under constraints&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization without control becomes dangerous
&lt;/h2&gt;

&lt;p&gt;One thing cloud infrastructure taught us is this:&lt;br&gt;
efficiency without governance eventually creates chaos.&lt;br&gt;
I think AI infrastructure may be heading toward the same lesson.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’ll explore next
&lt;/h2&gt;

&lt;p&gt;In the next post, I’ll summarize what I think is currently missing across most AI stacks:&lt;br&gt;
the missing layer between model intelligence and operational efficiency.&lt;/p&gt;

&lt;p&gt;👉 Part 7 is here: &lt;a href="https://dev.to/joshua_chukwu_ccb92f05a94/why-ai-products-need-a-control-plane-not-just-api-calls-26ne?comments_sort=top#toggle-comments-sort-dropdown"&gt;https://dev.to/joshua_chukwu_ccb92f05a94/why-ai-products-need-a-control-plane-not-just-api-calls-26ne?comments_sort=top#toggle-comments-sort-dropdown&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;The challenge is no longer:&lt;br&gt;
“Can AI generate useful responses?”&lt;br&gt;
The harder challenge may become:&lt;br&gt;
“How do we optimize intelligence without breaking trust?”&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>machinelearning</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why AI products need a “control plane” (not just API calls)</title>
      <dc:creator>Joshua Chukwu</dc:creator>
      <pubDate>Fri, 15 May 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/joshua_chukwu_ccb92f05a94/why-ai-products-need-a-control-plane-not-just-api-calls-26ne</link>
      <guid>https://dev.to/joshua_chukwu_ccb92f05a94/why-ai-products-need-a-control-plane-not-just-api-calls-26ne</guid>
      <description>&lt;p&gt;Series: AI Isn’t an Engineering Problem Anymore (Part 7)&lt;br&gt;
It’s a cost problem—and most teams don’t realize it yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Control Plane?
&lt;/h2&gt;

&lt;p&gt;Imagine a world with hundreds of thousands of planes in the sky, but no airports, no central communication systems, and no coordination for routing, landing, or takeoff scheduling.&lt;br&gt;
Sounds chaotic, right?&lt;br&gt;
Air traffic control acts as a control plane.&lt;br&gt;
The planes do the actual work, but the control systems coordinate:&lt;br&gt;
routing&lt;br&gt;
scheduling&lt;br&gt;
communication&lt;br&gt;
and safety&lt;br&gt;
Now let’s refocus back to AI.&lt;/p&gt;

&lt;p&gt;Most AI products today are surprisingly simple underneath.&lt;br&gt;
At their core, many of them are essentially:&lt;br&gt;
User input → LLM API call → Response&lt;/p&gt;

&lt;p&gt;At a small scale, this works perfectly fine.&lt;br&gt;
But as AI adoption grows, raw API calls alone stop being enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hidden assumption
&lt;/h2&gt;

&lt;p&gt;A lot of current AI workflows assume:&lt;br&gt;
“Since the model is smart, the system will scale.”&lt;br&gt;
But intelligence alone does not solve:&lt;br&gt;
cost visibility&lt;br&gt;
repeated reasoning&lt;br&gt;
workflow duplication&lt;br&gt;
routing&lt;br&gt;
attribution&lt;br&gt;
memory growth&lt;br&gt;
governance&lt;br&gt;
or context management&lt;br&gt;
Those are infrastructural problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  What happens inside organizations
&lt;/h2&gt;

&lt;p&gt;Most businesses already use AI to some extent.&lt;br&gt;
Maybe not officially at the organizational level yet, but when:&lt;br&gt;
developers use Codex&lt;br&gt;
teams use ChatGPT&lt;br&gt;
support uses Claude&lt;br&gt;
designers use AI generation tools&lt;br&gt;
employees automate tasks independently&lt;br&gt;
then the organization is already relying on AI operationally.&lt;br&gt;
The problem is:&lt;br&gt;
Most of this usage is happening without coordination.&lt;/p&gt;

&lt;h2&gt;
  
  
  The duplication problem scales quietly
&lt;/h2&gt;

&lt;p&gt;At small scale:&lt;br&gt;
repeated prompts&lt;br&gt;
retries&lt;br&gt;
overlapping reasoning&lt;br&gt;
duplicate workflows&lt;br&gt;
feel harmless.&lt;br&gt;
At an organizational scale, they compound.&lt;br&gt;
Different people may:&lt;br&gt;
solve the same issue repeatedly&lt;br&gt;
regenerate similar reasoning&lt;br&gt;
feed the same context multiple times&lt;br&gt;
independently rediscover the same solution paths&lt;br&gt;
Without realizing it.&lt;br&gt;
This starts looking less like “usage”&lt;br&gt;
And more like:&lt;br&gt;
distributed inefficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why APIs alone are insufficient
&lt;/h2&gt;

&lt;p&gt;An API call only answers:&lt;br&gt;
“Can the model respond?”&lt;br&gt;
It does not answer:&lt;br&gt;
Have we solved something similar before?&lt;br&gt;
Should this request even hit the model?&lt;br&gt;
Is this the optimal model for this task?&lt;br&gt;
Is this repeated work?&lt;br&gt;
Is this context unnecessarily large?&lt;br&gt;
Which team is driving the cost?&lt;br&gt;
Which workflows are inefficient?&lt;br&gt;
What should persist?&lt;br&gt;
What should expire?&lt;br&gt;
Those questions exist above the model layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  This is where a control plane becomes important
&lt;/h2&gt;

&lt;p&gt;AI systems eventually need something closer to a control plane, not just raw model access.&lt;br&gt;
A layer responsible for:&lt;br&gt;
routing&lt;br&gt;
caching&lt;br&gt;
attribution&lt;br&gt;
observability&lt;br&gt;
governance&lt;br&gt;
context optimization&lt;br&gt;
and intelligent reuse&lt;br&gt;
Because once AI becomes operational infrastructure, organizations eventually need visibility into:&lt;br&gt;
where compute is going&lt;br&gt;
why it is being consumed&lt;br&gt;
and whether the work being performed is actually necessary&lt;/p&gt;
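&lt;p&gt;As a rough illustration of what such a layer might do, the sketch below wraps a model call with two of the responsibilities listed above: reuse and per-team attribution. Everything here is hypothetical (the ControlPlane class, the team labels, the fake_model stand-in), reuse is exact-match only, and it assumes all teams sit inside one trust boundary.&lt;/p&gt;

```python
from collections import defaultdict


class ControlPlane:
    """Toy layer in front of a model call: reuse plus per-team attribution."""

    def __init__(self, model_call):
        self._model_call = model_call
        self._seen = {}                      # prompt -> earlier response
        self.model_calls = defaultdict(int)  # team -> requests sent to the model
        self.reused = defaultdict(int)       # team -> requests served from reuse

    def handle(self, team, prompt):
        # "Have we solved something similar before?" (exact match only here)
        if prompt in self._seen:
            self.reused[team] += 1
            return self._seen[prompt]
        # "Which team is driving the cost?"
        self.model_calls[team] += 1
        response = self._model_call(prompt)
        self._seen[prompt] = response
        return response


def fake_model(prompt):
    # Hypothetical stand-in for a real, paid LLM API call.
    return "response to: " + prompt


plane = ControlPlane(fake_model)
plane.handle("platform", "Summarize the incident report")
plane.handle("support", "Summarize the incident report")  # reused, no model call
plane.handle("support", "Draft a customer reply")
print(dict(plane.model_calls), dict(plane.reused))
```

&lt;p&gt;Even this toy version can answer questions a bare API call cannot, such as how many requests per team actually reached the model.&lt;/p&gt;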

&lt;h2&gt;
  
  
  The cloud parallel
&lt;/h2&gt;

&lt;p&gt;Cloud computing went through something similar.&lt;br&gt;
At first:&lt;br&gt;
spinning up compute felt magical.&lt;br&gt;
Then eventually organizations realized:&lt;br&gt;
compute sprawl exists&lt;br&gt;
waste compounds&lt;br&gt;
visibility matters&lt;br&gt;
governance matters&lt;br&gt;
optimization matters&lt;br&gt;
I think AI is approaching a similar phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  The difficult part
&lt;/h2&gt;

&lt;p&gt;Earlier posts in this series argued that human interaction with AI is inherently messy.&lt;br&gt;
Humans:&lt;br&gt;
revisit ideas&lt;br&gt;
refine prompts&lt;br&gt;
change direction&lt;br&gt;
retry tasks&lt;br&gt;
explore uncertainty&lt;br&gt;
So unlike traditional deterministic systems, the boundaries between:&lt;br&gt;
“new work”&lt;br&gt;
and&lt;br&gt;
“repeated work”&lt;br&gt;
become much harder to detect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Because if organizations do not eventually solve:&lt;br&gt;
reuse&lt;br&gt;
coordination&lt;br&gt;
attribution&lt;br&gt;
and context efficiency&lt;br&gt;
then scaling AI usage may become significantly more expensive than most people currently expect.&lt;br&gt;
Especially for:&lt;br&gt;
startups&lt;br&gt;
vibecoders&lt;br&gt;
small engineering teams&lt;br&gt;
and companies heavily integrating AI into daily workflows&lt;/p&gt;

&lt;h2&gt;
  
  
  My Opinion
&lt;/h2&gt;

&lt;p&gt;I think the companies that win long term won’t just be the companies with the best models&lt;br&gt;
But the companies that best understand:&lt;br&gt;
orchestration&lt;br&gt;
memory management&lt;br&gt;
workflow efficiency&lt;br&gt;
and intelligent compute allocation&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’ll explore next
&lt;/h2&gt;

&lt;p&gt;In the next post, I’ll talk about something that makes this even harder:&lt;br&gt;
why you can’t simply cache everything&lt;br&gt;
Especially once:&lt;br&gt;
privacy&lt;br&gt;
enterprise trust&lt;br&gt;
and sensitive data&lt;br&gt;
enter the picture.&lt;/p&gt;

&lt;p&gt;👉 Part 6 is here: &lt;a href="https://dev.to/joshua_chukwu_ccb92f05a94/why-similarity-matters-more-than-exact-matches-in-llm-systems-46pa"&gt;https://dev.to/joshua_chukwu_ccb92f05a94/why-similarity-matters-more-than-exact-matches-in-llm-systems-46pa&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;The first phase of AI adoption was:&lt;br&gt;
“Can we make this work?”&lt;br&gt;
The next phase may become:&lt;br&gt;
“Can we make this scale efficiently?”&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>machinelearning</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>Why similarity matters more than exact matches in LLM systems</title>
      <dc:creator>Joshua Chukwu</dc:creator>
      <pubDate>Wed, 13 May 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/joshua_chukwu_ccb92f05a94/why-similarity-matters-more-than-exact-matches-in-llm-systems-46pa</link>
      <guid>https://dev.to/joshua_chukwu_ccb92f05a94/why-similarity-matters-more-than-exact-matches-in-llm-systems-46pa</guid>
      <description>&lt;p&gt;Series: AI Isn’t an Engineering Problem Anymore (Part 6)&lt;br&gt;
It’s a cost problem—and most teams don’t realize it yet.&lt;/p&gt;

&lt;p&gt;In the last post, I talked about why simple caching doesn’t solve most LLM inefficiency.&lt;br&gt;
Caching works well for:&lt;br&gt;
exact prompts&lt;br&gt;
repeated workflows&lt;br&gt;
deterministic pipelines&lt;br&gt;
But that’s not how humans actually use LLMs.&lt;br&gt;
Humans don’t think in exact matches.&lt;br&gt;
We think in:&lt;br&gt;
approximations&lt;br&gt;
iterations&lt;br&gt;
refinements&lt;br&gt;
revisits&lt;br&gt;
And that changes everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with exact matching
&lt;/h2&gt;

&lt;p&gt;Traditional caching systems work by asking:&lt;br&gt;
“Have I seen this exact request before?”&lt;br&gt;
That works great for:&lt;br&gt;
static assets&lt;br&gt;
repeated queries&lt;br&gt;
deterministic systems&lt;br&gt;
But LLM usage is rarely exact.&lt;br&gt;
You ask:&lt;br&gt;
“Why is this deployment failing?”&lt;br&gt;
Then later:&lt;br&gt;
“Could this Railway deployment issue be related to env validation?”&lt;br&gt;
Different prompts.&lt;br&gt;
Very similar intent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Humans think in similarity, not identity
&lt;/h2&gt;

&lt;p&gt;This is the part I think most systems still struggle with.&lt;br&gt;
When humans solve problems, we rarely:&lt;br&gt;
repeat the exact same sentence&lt;br&gt;
follow the exact same reasoning path&lt;br&gt;
or preserve identical context&lt;br&gt;
Instead we:&lt;br&gt;
circle around ideas&lt;br&gt;
refine wording&lt;br&gt;
revisit earlier thoughts&lt;br&gt;
approach the same problem from different angles&lt;br&gt;
From a human perspective:&lt;br&gt;
these are connected&lt;br&gt;
From most systems’ perspectives:&lt;br&gt;
they are completely unrelated requests&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Because if the system only recognizes:&lt;br&gt;
exact matches&lt;br&gt;
Then most real-world repetition gets missed.&lt;br&gt;
And that means:&lt;br&gt;
recomputation continues&lt;br&gt;
context keeps growing&lt;br&gt;
costs keep compounding&lt;/p&gt;

&lt;h2&gt;
  
  
  A pattern I kept noticing
&lt;/h2&gt;

&lt;p&gt;One thing I started noticing while working across different systems was this:&lt;br&gt;
The model would often regenerate reasoning I had effectively already paid for before.&lt;br&gt;
Not identical wording.&lt;br&gt;
But the same underlying explanation:&lt;br&gt;
same deployment issue&lt;br&gt;
same architectural tradeoff&lt;br&gt;
same debugging pattern&lt;br&gt;
same reasoning path&lt;br&gt;
Just wrapped in a slightly different context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hidden inefficiency
&lt;/h2&gt;

&lt;p&gt;This creates a strange situation.&lt;br&gt;
The user feels like:&lt;br&gt;
they are progressing through a problem&lt;br&gt;
But the system may actually be:&lt;br&gt;
repeatedly recomputing overlapping reasoning&lt;br&gt;
That overlap is hard to notice because:&lt;br&gt;
humans naturally think iteratively&lt;br&gt;
conversations evolve gradually&lt;br&gt;
context changes slightly every step&lt;/p&gt;

&lt;h2&gt;
  
  
  Why similarity is difficult
&lt;/h2&gt;

&lt;p&gt;Similarity sounds easy conceptually.&lt;br&gt;
It isn’t.&lt;br&gt;
Because similarity is:&lt;br&gt;
contextual&lt;br&gt;
probabilistic&lt;br&gt;
semantic&lt;br&gt;
constantly shifting&lt;br&gt;
Two prompts can:&lt;br&gt;
look different syntactically&lt;br&gt;
but mean almost the same thing&lt;br&gt;
Or:&lt;br&gt;
look similar&lt;br&gt;
but require completely different reasoning&lt;br&gt;
That makes reuse much harder than traditional caching.&lt;/p&gt;
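&lt;p&gt;A small experiment shows both failure modes. The sketch below uses Jaccard overlap of word tokens as a deliberately crude stand-in for similarity; it is an assumption for illustration, not how a production system would measure intent (those typically rely on learned embeddings). The staging and production prompts are made-up examples.&lt;/p&gt;

```python
import re


def tokens(text):
    # Lowercase word tokens; punctuation is dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    # Size of the token intersection divided by size of the token union.
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta.intersection(tb)) / len(ta.union(tb))


# Same underlying intent, very different wording.
same_intent = jaccard(
    "Why is this deployment failing?",
    "Could this Railway deployment issue be related to env validation?",
)

# Nearly identical wording, but the environments differ, so the
# reasoning needed may differ too.
different_work = jaccard(
    "Why is my staging deployment failing?",
    "Why is my production deployment failing?",
)

print(round(same_intent, 3), round(different_work, 3))
```

&lt;p&gt;With this measure the related pair scores lower than the pair that may need different reasoning, which is exactly the wrong way round for deciding what to reuse.&lt;/p&gt;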

&lt;h2&gt;
  
  
  The deeper issue
&lt;/h2&gt;

&lt;p&gt;At this point, the problem stops being:&lt;br&gt;
“Can we store responses?”&lt;br&gt;
And becomes:&lt;br&gt;
“Can we recognize when work is fundamentally overlapping?”&lt;br&gt;
That is a very different system problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context makes this worse
&lt;/h2&gt;

&lt;p&gt;Long threads amplify the issue even more.&lt;br&gt;
Because each new message carries:&lt;br&gt;
previous reasoning&lt;br&gt;
prior attempts&lt;br&gt;
accumulated context&lt;br&gt;
So even when the new information is small:&lt;br&gt;
the system still processes the entire growing state around it&lt;br&gt;
That’s one reason costs quietly compound over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  This changes how I think about memory
&lt;/h2&gt;

&lt;p&gt;A lot of current AI workflows treat memory like:&lt;br&gt;
“store more context”&lt;br&gt;
But I’m starting to think the more important question is:&lt;br&gt;
“What context actually needs to survive?”&lt;br&gt;
Because not all memory is useful.&lt;br&gt;
Some memory:&lt;br&gt;
reinforces direction&lt;br&gt;
Other memory:&lt;br&gt;
introduces drift&lt;br&gt;
redundancy&lt;br&gt;
noise&lt;br&gt;
repeated reasoning loops&lt;/p&gt;

&lt;h2&gt;
  
  
  The systems that win
&lt;/h2&gt;

&lt;p&gt;I think the systems that eventually win won’t just be:&lt;br&gt;
the smartest models&lt;br&gt;
or the largest context windows&lt;br&gt;
They’ll be the systems that:&lt;br&gt;
understand relevance&lt;br&gt;
recognize overlap&lt;br&gt;
manage context intelligently&lt;br&gt;
and avoid unnecessary recomputation&lt;/p&gt;
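&lt;p&gt;To put rough numbers on the growing-context effect described in this section, here is a back-of-the-envelope sketch. It assumes every turn resends the full history and adds the same number of tokens, and it ignores output tokens and any provider-side prompt caching. The function name and the figures (20 turns, 500 tokens each) are hypothetical.&lt;/p&gt;

```python
def total_tokens_processed(turns, tokens_per_turn):
    """Total input tokens processed when every turn resends the full history."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # the new message joins the history
        total += history            # the whole history is processed again
    return total


new_content = 20 * 500                       # tokens that were actually new
processed = total_tokens_processed(20, 500)  # tokens the system processed
print(new_content, processed)
```

&lt;p&gt;Under these assumptions the processed total grows with the square of the number of turns, while the genuinely new content grows only linearly.&lt;/p&gt;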

&lt;h2&gt;
  
  
  What I’m trying to understand
&lt;/h2&gt;

&lt;p&gt;At this point, the question I keep coming back to is:&lt;br&gt;
How much intelligence is actually new computation?&lt;br&gt;
Humans are dynamic in behavior and thought.&lt;br&gt;
We:&lt;br&gt;
revisit ideas&lt;br&gt;
refine reasoning&lt;br&gt;
circle back&lt;br&gt;
change direction constantly&lt;br&gt;
But systems tend toward determinism, because the more deterministic a system is,&lt;br&gt;
the easier it becomes to:&lt;br&gt;
optimize&lt;br&gt;
predict&lt;br&gt;
cache&lt;br&gt;
and run efficiently&lt;br&gt;
So the tension is:&lt;br&gt;
How intelligent can AI become while still achieving the ultimate goal of efficiency?&lt;br&gt;
Because intelligence may naturally resist exact repetition, while efficiency depends on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’ll explore next
&lt;/h2&gt;

&lt;p&gt;In the next post, I’ll go deeper into this idea:&lt;br&gt;
why bigger context windows alone may not actually solve the problem&lt;/p&gt;

&lt;p&gt;👉 Part 5 is here: &lt;a href="https://dev.to/joshua_chukwu_ccb92f05a94/i-tried-caching-llm-responses-it-didnt-work-the-way-i-expected-aj"&gt;https://dev.to/joshua_chukwu_ccb92f05a94/i-tried-caching-llm-responses-it-didnt-work-the-way-i-expected-aj&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;Humans don’t think in exact matches.&lt;br&gt;
We think in related ideas, partial overlaps, revisits, and refinements.&lt;br&gt;
But most LLM systems still treat those interactions as completely new work.&lt;br&gt;
And I think that gap is where a lot of the hidden inefficiency lives.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>performance</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>I tried caching LLM responses. It didn’t work the way I expected.</title>
      <dc:creator>Joshua Chukwu</dc:creator>
      <pubDate>Mon, 11 May 2026 12:30:00 +0000</pubDate>
      <link>https://dev.to/joshua_chukwu_ccb92f05a94/i-tried-caching-llm-responses-it-didnt-work-the-way-i-expected-aj</link>
      <guid>https://dev.to/joshua_chukwu_ccb92f05a94/i-tried-caching-llm-responses-it-didnt-work-the-way-i-expected-aj</guid>
      <description>&lt;p&gt;Series: AI Isn’t an Engineering Problem Anymore (Part 5)&lt;br&gt;
 It’s a cost problem—and most teams don’t realize it yet.&lt;/p&gt;

&lt;p&gt;In the last few posts, I’ve been exploring how LLM usage behaves in practice:&lt;br&gt;
it’s iterative&lt;br&gt;
it’s repetitive&lt;br&gt;
and it compounds through growing context&lt;br&gt;
So the obvious question becomes:&lt;br&gt;
why not just cache the responses?&lt;/p&gt;

&lt;h2&gt;
  
  
  The simple idea
&lt;/h2&gt;

&lt;p&gt;At first, this feels straightforward.&lt;br&gt;
If you’ve already asked something before:&lt;br&gt;
just store the response and reuse it&lt;br&gt;
No recomputation.&lt;br&gt;
 No extra cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reality
&lt;/h2&gt;

&lt;p&gt;It works… but only in very limited cases.&lt;br&gt;
Specifically:&lt;br&gt;
exact matches&lt;br&gt;
If the prompt is identical:&lt;br&gt;
same wording&lt;br&gt;
same structure&lt;br&gt;
same input&lt;br&gt;
Then yes, caching works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with exact matches
&lt;/h2&gt;

&lt;p&gt;That’s not how people use LLMs.&lt;br&gt;
In practice, prompts look like this:&lt;br&gt;
slightly reworded&lt;br&gt;
slightly extended&lt;br&gt;
slightly more context&lt;br&gt;
Same intent.&lt;br&gt;
Different string.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple example
&lt;/h2&gt;

&lt;p&gt;You ask:&lt;br&gt;
“Why is my rover not turning in place?”&lt;br&gt;
Later, you ask:&lt;br&gt;
“What could cause a skid-steer robot to fail a zero-radius turn?”&lt;br&gt;
These are clearly related.&lt;br&gt;
But to a basic cache:&lt;br&gt;
they are completely different requests&lt;br&gt;
So the system:&lt;br&gt;
misses the cache&lt;br&gt;
recomputes the answer&lt;br&gt;
charges again&lt;/p&gt;
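&lt;p&gt;The miss described above is easy to reproduce. The sketch below is a minimal exact-match cache keyed by a hash of the prompt string; the ExactMatchCache class and the fake_model function are hypothetical stand-ins, not any particular library's API.&lt;/p&gt;

```python
import hashlib


class ExactMatchCache:
    """Toy response cache keyed by a hash of the exact prompt string."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt):
        # Any change in wording produces a different key.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get_or_compute(self, prompt, compute):
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = compute(prompt)  # this is the call that costs money
        self._store[key] = response
        return response


def fake_model(prompt):
    # Hypothetical stand-in for a paid LLM call.
    return "answer for: " + prompt


cache = ExactMatchCache()
cache.get_or_compute("Why is my rover not turning in place?", fake_model)
cache.get_or_compute("Why is my rover not turning in place?", fake_model)
cache.get_or_compute(
    "What could cause a skid-steer robot to fail a zero-radius turn?", fake_model
)
print(cache.hits, cache.misses)
```

&lt;p&gt;Only the verbatim repeat is a hit. The rephrased question about the same problem goes back to the model.&lt;/p&gt;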

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Because most repetition isn’t:&lt;br&gt;
exact&lt;br&gt;
It’s:&lt;br&gt;
conceptual&lt;/p&gt;

&lt;h2&gt;
  
  
  What caching actually solves
&lt;/h2&gt;

&lt;p&gt;Basic caching helps with:&lt;br&gt;
identical API calls&lt;br&gt;
repeated automated workflows&lt;br&gt;
fixed prompt pipelines&lt;br&gt;
That’s useful.&lt;br&gt;
But it only captures a small portion of real-world usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it doesn’t solve
&lt;/h2&gt;

&lt;p&gt;It doesn’t handle:&lt;br&gt;
rephrased prompts&lt;br&gt;
debugging loops&lt;br&gt;
evolving context&lt;br&gt;
team-level overlap&lt;br&gt;
Which is where a lot of the cost actually comes from.&lt;/p&gt;

&lt;h2&gt;
  
  
  The deeper issue
&lt;/h2&gt;

&lt;p&gt;At this point, the problem isn’t:&lt;br&gt;
“how do we store responses?”&lt;br&gt;
It’s:&lt;br&gt;
“how do we recognize when two requests are actually the same work?”&lt;/p&gt;

&lt;h2&gt;
  
  
  A different framing
&lt;/h2&gt;

&lt;p&gt;Instead of thinking in terms of:&lt;br&gt;
prompt → response&lt;br&gt;
It becomes:&lt;br&gt;
intent → reasoning → output&lt;br&gt;
Caching only works at the prompt level.&lt;br&gt;
But the repetition happens at the intent level.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is hard
&lt;/h2&gt;

&lt;p&gt;Because intent isn’t explicit.&lt;br&gt;
It’s:&lt;br&gt;
inferred&lt;br&gt;
contextual&lt;br&gt;
and often slightly changing&lt;br&gt;
Which makes it difficult to:&lt;br&gt;
detect overlap&lt;br&gt;
reuse prior work&lt;br&gt;
or avoid recomputation&lt;/p&gt;
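
&lt;p&gt;To make the difficulty concrete, here is a rough sketch that compares the two rover prompts from earlier by plain word overlap (Jaccard similarity). This is a deliberately crude stand-in chosen for illustration, not a proposed way to detect intent, and the helper names word_set and jaccard are hypothetical.&lt;/p&gt;

```python
import re
from typing import Set


def word_set(prompt: str) -> Set[str]:
    """Lowercase the prompt and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", prompt.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by all distinct words, from 0.0 to 1.0."""
    words_a, words_b = word_set(a), word_set(b)
    union = words_a.union(words_b)
    if not union:
        return 0.0
    return len(words_a.intersection(words_b)) / len(union)


first = "Why is my rover not turning in place?"
second = "What could cause a skid-steer robot to fail a zero-radius turn?"

# The two prompts share no words at all ("turning" and "turn" differ),
# so a purely lexical comparison sees them as unrelated.
print(jaccard(first, second))
```

&lt;p&gt;The two prompts describe the same problem yet have no words in common, so loosening the match from exact strings to shared words still finds nothing to reuse.&lt;/p&gt;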

&lt;h2&gt;
  
  
  What this leads to
&lt;/h2&gt;

&lt;p&gt;So even after adding caching:&lt;br&gt;
costs still grow&lt;br&gt;
repetition still happens&lt;br&gt;
inefficiency remains&lt;br&gt;
Just slightly reduced.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’m trying to understand
&lt;/h2&gt;

&lt;p&gt;At this point, the question becomes:&lt;br&gt;
what would it take to reuse work beyond exact matches?&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’ll explore next
&lt;/h2&gt;

&lt;p&gt;In the next post, I’ll go deeper into this:&lt;br&gt;
what a system would need to actually recognize and reuse similar work&lt;/p&gt;

&lt;p&gt;Part 4 is here: (&lt;a href="https://dev.to/joshua_chukwu_ccb92f05a94/youre-probably-paying-twice-for-the-same-llm-response-481e?preview=c6a3bad002bb14076c2a13b65fc8db1237dfc016b3f1a582c0e448db6511dde5856630f2bb7a1d75865b66f23c5afdb373018f530b75a78e57b11a64"&gt;https://dev.to/joshua_chukwu_ccb92f05a94/youre-probably-paying-twice-for-the-same-llm-response-481e?preview=c6a3bad002bb14076c2a13b65fc8db1237dfc016b3f1a582c0e448db6511dde5856630f2bb7a1d75865b66f23c5afdb373018f530b75a78e57b11a64&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;Caching feels like the obvious solution.&lt;br&gt;
And it helps.&lt;br&gt;
But it doesn’t address the core issue:&lt;br&gt;
most of the repetition in LLM usage isn’t identical—it’s just similar&lt;br&gt;
And until we handle that,&lt;br&gt;
we’ll keep recomputing the same ideas&lt;br&gt;
 over and over again.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>You’re probably paying twice for the same LLM response</title>
      <dc:creator>Joshua Chukwu</dc:creator>
      <pubDate>Fri, 08 May 2026 12:06:00 +0000</pubDate>
      <link>https://dev.to/joshua_chukwu_ccb92f05a94/youre-probably-paying-twice-for-the-same-llm-response-481e</link>
      <guid>https://dev.to/joshua_chukwu_ccb92f05a94/youre-probably-paying-twice-for-the-same-llm-response-481e</guid>
      <description>&lt;p&gt;Series: AI Isn’t an Engineering Problem Anymore (Part 4)&lt;br&gt;
 It’s a cost problem—and most teams don’t realize it yet.&lt;/p&gt;

&lt;p&gt;In the last post, I talked about where AI costs actually come from—and how context growth quietly compounds them.&lt;br&gt;
But there’s another pattern that’s even more uncomfortable once you notice it:&lt;br&gt;
you’re probably paying twice for the same underlying answer&lt;br&gt;
Not literally the exact same string.&lt;br&gt;
But functionally, the same work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The obvious version
&lt;/h2&gt;

&lt;p&gt;Let’s start with something simple.&lt;br&gt;
You ask:&lt;br&gt;
“Why can’t my rover perform a zero-radius turn?”&lt;br&gt;
Then later you ask:&lt;br&gt;
“What could cause skid-steer instability at low speeds?”&lt;br&gt;
Different wording.&lt;br&gt;
Same underlying problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The subtle version (this is where it gets interesting)
&lt;/h2&gt;

&lt;p&gt;Now imagine a debugging session.&lt;br&gt;
You go through a loop:&lt;br&gt;
ask a question&lt;br&gt;
get a partial answer&lt;br&gt;
refine your prompt&lt;br&gt;
ask again&lt;br&gt;
Each step feels like progress.&lt;br&gt;
But look at it from a system perspective:&lt;br&gt;
you’re repeatedly asking for overlapping reasoning&lt;br&gt;
And each time:&lt;br&gt;
the model recomputes it&lt;br&gt;
the system bills it&lt;br&gt;
nothing is reused&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this happens naturally
&lt;/h2&gt;

&lt;p&gt;Because this is how humans think.&lt;br&gt;
We don’t ask perfect questions.&lt;br&gt;
We:&lt;br&gt;
explore&lt;br&gt;
rephrase&lt;br&gt;
iterate&lt;br&gt;
So the system ends up doing:&lt;br&gt;
slightly different versions of the same work&lt;br&gt;
over and over again.&lt;/p&gt;

&lt;h2&gt;
  
  
  The team version (this is worse)
&lt;/h2&gt;

&lt;p&gt;Now scale this beyond one person.&lt;br&gt;
Multiple engineers might:&lt;br&gt;
debug similar issues&lt;br&gt;
build similar features&lt;br&gt;
ask similar questions&lt;br&gt;
But there’s no shared layer that says:&lt;br&gt;
“we’ve already solved something like this”&lt;br&gt;
So the same reasoning gets recomputed across:&lt;br&gt;
people&lt;br&gt;
time&lt;br&gt;
workflows&lt;/p&gt;

&lt;h2&gt;
  
  
  Why simple caching doesn’t solve it
&lt;/h2&gt;

&lt;p&gt;At this point, the obvious solution seems like:&lt;br&gt;
“just cache the response”&lt;br&gt;
But that only works for:&lt;br&gt;
exact matches&lt;br&gt;
In reality, most prompts are:&lt;br&gt;
slightly reworded&lt;br&gt;
slightly extended&lt;br&gt;
slightly different&lt;br&gt;
So traditional caching misses most of the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  A better way to think about it
&lt;/h2&gt;

&lt;p&gt;Instead of asking:&lt;br&gt;
“is this the same prompt?”&lt;br&gt;
The better question is:&lt;br&gt;
“is this the same intent?”&lt;br&gt;
Because that’s what actually matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  A quick analogy
&lt;/h2&gt;

&lt;p&gt;It’s a very human pattern.&lt;br&gt;
You’ve probably done this before: while looking for something (a key, say), you check the same place twice, even though you already know it’s not there.&lt;br&gt;
Not because it makes sense.&lt;br&gt;
But because you’re searching, refining, trying again.&lt;br&gt;
That’s exactly how we interact with LLMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hidden cost
&lt;/h2&gt;

&lt;p&gt;This is where things connect back to Part 3.&lt;br&gt;
If:&lt;br&gt;
a large portion of your usage is repetitive&lt;br&gt;
and context keeps growing&lt;br&gt;
and nothing is reused&lt;br&gt;
Then:&lt;br&gt;
you’re not just paying for usage&lt;br&gt;
 you’re paying for recomputation&lt;br&gt;
And computation has always had a cost.&lt;br&gt;
We’re only noticing it now because LLMs make that cost visible at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple mental model
&lt;/h2&gt;

&lt;p&gt;Think of it like this:&lt;br&gt;
Every time you ask a similar question, the system:&lt;br&gt;
re-derives the reasoning&lt;br&gt;
re-generates the explanation&lt;br&gt;
re-processes the context&lt;br&gt;
Even if:&lt;br&gt;
you’ve effectively already “paid” for that knowledge before&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters more than it seems
&lt;/h2&gt;

&lt;p&gt;At small scale, this is fine.&lt;br&gt;
At larger scale, it becomes:&lt;br&gt;
expensive&lt;br&gt;
inefficient&lt;br&gt;
invisible&lt;br&gt;
Because it doesn’t show up as:&lt;br&gt;
“duplicate cost”&lt;br&gt;
It shows up as:&lt;br&gt;
normal usage&lt;/p&gt;

&lt;h2&gt;
  
  
  The uncomfortable realization
&lt;/h2&gt;

&lt;p&gt;Most teams don’t have a way to:&lt;br&gt;
detect repetition&lt;br&gt;
reuse prior work&lt;br&gt;
or even measure how much overlap exists&lt;br&gt;
So they default to:&lt;br&gt;
keep asking, keep recomputing, keep paying&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’m trying to understand
&lt;/h2&gt;

&lt;p&gt;At this point, the question becomes:&lt;br&gt;
how much of LLM usage is actually new?&lt;br&gt;
And more importantly:&lt;br&gt;
how much of it needs to be recomputed?&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’ll explore next
&lt;/h2&gt;

&lt;p&gt;In the next post, I’ll go deeper into this:&lt;br&gt;
what a system would actually need in order to reuse work effectively (beyond simple caching)&lt;/p&gt;

&lt;p&gt;Part 3 is here: (&lt;a href="https://dev.to/joshua_chukwu_ccb92f05a94/where-your-ai-budget-is-actually-going-its-not-what-you-think-3bi0?preview=856c097d9c26551c1a836539589fd827398a31aa4efacf6a4b5ce551f241a957c369cff2b5e18f6c0187f9f94b07430dd024cdd47bc200dd6501ba1d"&gt;https://dev.to/joshua_chukwu_ccb92f05a94/where-your-ai-budget-is-actually-going-its-not-what-you-think-3bi0?preview=856c097d9c26551c1a836539589fd827398a31aa4efacf6a4b5ce551f241a957c369cff2b5e18f6c0187f9f94b07430dd024cdd47bc200dd6501ba1d&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;AI gives you answers instantly.&lt;br&gt;
But under the hood, every answer is being computed from scratch—again and again.&lt;br&gt;
And unless we rethink how that work is reused,&lt;br&gt;
we’re going to keep paying for the same intelligence&lt;br&gt;
 multiple times.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>automation</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Where your AI budget is actually going (it’s not what you think)</title>
      <dc:creator>Joshua Chukwu</dc:creator>
      <pubDate>Wed, 06 May 2026 12:30:00 +0000</pubDate>
      <link>https://dev.to/joshua_chukwu_ccb92f05a94/where-your-ai-budget-is-actually-going-its-not-what-you-think-3bi0</link>
      <guid>https://dev.to/joshua_chukwu_ccb92f05a94/where-your-ai-budget-is-actually-going-its-not-what-you-think-3bi0</guid>
      <description>&lt;p&gt;Series: AI Isn’t an Engineering Problem Anymore (Part 3)&lt;br&gt;
 It’s a cost problem–and most teams don’t realize it yet.&lt;/p&gt;

&lt;p&gt;In the last post, I talked about how most LLM usage isn’t as “new” as it feels.&lt;br&gt;
A lot of it is:&lt;br&gt;
iterative&lt;br&gt;
repetitive&lt;br&gt;
overlapping&lt;br&gt;
That’s interesting on its own.&lt;br&gt;
But it becomes a lot more important when you start looking at it through a different lens:&lt;br&gt;
cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  The assumption most people make
&lt;/h2&gt;

&lt;p&gt;When people think about AI costs, they usually assume the costs come from:&lt;br&gt;
heavy usage&lt;br&gt;
complex queries&lt;br&gt;
large models&lt;br&gt;
high traffic&lt;br&gt;
Which is partially true, but incomplete.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually adds up
&lt;/h2&gt;

&lt;p&gt;In practice, a significant portion of usage comes from things like:&lt;br&gt;
retrying prompts&lt;br&gt;
slightly reworded questions&lt;br&gt;
debugging loops&lt;br&gt;
near-duplicate workflows&lt;br&gt;
None of these feel expensive individually.&lt;br&gt;
But together, they add up. &lt;br&gt;
In my experience from robotics and even outside engineering, anything that compounds tends to spiral faster than expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part most people miss
&lt;/h2&gt;

&lt;p&gt;There’s another layer that makes this worse:&lt;br&gt;
context growth&lt;br&gt;
As conversations get longer, the model doesn’t just process your latest message.&lt;br&gt;
It processes:&lt;br&gt;
your current prompt&lt;br&gt;
plus everything that came before it (within the context window)&lt;br&gt;
So each new message isn’t just:&lt;br&gt;
“one more request”&lt;br&gt;
It’s:&lt;br&gt;
“one more request plus an increasing amount of prior context.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this compounds quickly
&lt;/h2&gt;

&lt;p&gt;Think about a long debugging session.&lt;br&gt;
Message 1: “What is A?”&lt;br&gt;
small context&lt;br&gt;
relatively cheap&lt;br&gt;
Message 10: “What is A made up of?”&lt;br&gt;
includes previous messages&lt;br&gt;
more tokens&lt;br&gt;
Message 30: “What are the compound characteristics of sample A?” (plus messages 1–29)&lt;br&gt;
includes a large conversation history&lt;br&gt;
significantly more tokens&lt;br&gt;
Now combine that with:&lt;br&gt;
iteration loops&lt;br&gt;
retries&lt;br&gt;
near-duplicate prompts&lt;br&gt;
And you get a pattern where:&lt;br&gt;
cost doesn’t just grow linearly—it compounds with usage patterns.&lt;/p&gt;
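
&lt;p&gt;A rough sketch of that growth, under two simplifying assumptions that are mine rather than measured: every message adds the same number of tokens, and each request resends the full history. Model responses and context-window truncation are ignored. The function name and the figure of 100 tokens per turn are hypothetical.&lt;/p&gt;

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when every request resends all prior turns."""
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn  # history grows by one message
        total += context            # the whole history is processed again
    return total


# Hypothetical figure: 100 tokens per message.
for turns in (1, 10, 30):
    with_history = cumulative_input_tokens(turns, 100)
    without_history = turns * 100
    print(turns, with_history, without_history)
```

&lt;p&gt;Under these assumptions, 30 messages of 100 tokens each add up to 46,500 input tokens instead of 3,000, because the total grows with the square of the number of turns rather than in a straight line.&lt;/p&gt;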

&lt;h2&gt;
  
  
  A rough mental model
&lt;/h2&gt;

&lt;p&gt;Imagine your usage looks like this:&lt;br&gt;
40% - genuinely new work&lt;br&gt;
30% - variations of the same request&lt;br&gt;
20% - retries / debugging loops&lt;br&gt;
10% - other&lt;br&gt;
Now layer in context growth.&lt;br&gt;
Even if each request seems small:&lt;br&gt;
later requests are more expensive than earlier ones&lt;br&gt;
And all of it is still treated as:&lt;br&gt;
new work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is easy to miss
&lt;/h2&gt;

&lt;p&gt;Because cost doesn’t show up per thought.&lt;br&gt;
It shows up per request.&lt;br&gt;
And each request feels justified.&lt;/p&gt;

&lt;h2&gt;
  
  
  The compounding effect
&lt;/h2&gt;

&lt;p&gt;Now scale this:&lt;br&gt;
across a team&lt;br&gt;
across features&lt;br&gt;
across users&lt;br&gt;
What starts as:&lt;br&gt;
“a bit of iteration”&lt;br&gt;
becomes:&lt;br&gt;
a large portion of your AI spend.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hidden problem
&lt;/h2&gt;

&lt;p&gt;The issue isn’t just cost.&lt;br&gt;
It’s visibility.&lt;br&gt;
Most teams don’t know:&lt;br&gt;
where their AI usage is going&lt;br&gt;
how much is repeated&lt;br&gt;
how context is affecting cost&lt;br&gt;
which workflows are inefficient&lt;br&gt;
So they default to:&lt;br&gt;
keep building, keep shipping, keep paying.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shift
&lt;/h2&gt;

&lt;p&gt;At some point, the question changes from:&lt;br&gt;
“What can AI do?”&lt;br&gt;
to:&lt;br&gt;
“What is AI costing us?”&lt;br&gt;
This is where AI stops being:&lt;br&gt;
just an engineering problem&lt;br&gt;
And becomes:&lt;br&gt;
a financial one.&lt;/p&gt;

&lt;h2&gt;
  
  
  A different way to think about it
&lt;/h2&gt;

&lt;p&gt;Knowledge is often described as power.&lt;br&gt;
But knowledge on its own is more like potential energy.&lt;br&gt;
It only becomes power when it’s applied—when it turns into decision-making and control.&lt;br&gt;
In the context of AI:&lt;br&gt;
access to models is knowledge&lt;br&gt;
usage patterns are behavior&lt;br&gt;
but efficiency is applied understanding&lt;br&gt;
The teams that will win aren’t just the ones using AI.&lt;br&gt;
They’re the ones who:&lt;br&gt;
understand how their usage behaves&lt;br&gt;
maximize what they already have&lt;br&gt;
and design systems that avoid unnecessary repetition&lt;/p&gt;

&lt;h2&gt;
  
  
  A broader implication
&lt;/h2&gt;

&lt;p&gt;In this age, AI is becoming something that almost every team, and even every household, will use.&lt;br&gt;
But just like any other resource:&lt;br&gt;
usage without visibility leads to waste&lt;br&gt;
There needs to be some form of:&lt;br&gt;
awareness&lt;br&gt;
control&lt;br&gt;
and, effectively, a “meter” on how it’s being used&lt;br&gt;
Not to restrict usage, but to understand it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’m trying to understand
&lt;/h2&gt;

&lt;p&gt;At this point, the question isn’t just:&lt;br&gt;
how do we use AI?&lt;br&gt;
It’s:&lt;br&gt;
how do we make AI usage economically sustainable?&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’ll explore next
&lt;/h2&gt;

&lt;p&gt;In the next post, I’ll go deeper into one specific piece of this:&lt;br&gt;
why simple caching approaches don’t fully solve the problem&lt;br&gt;
👉 Part 2 is here: (&lt;a href="https://dev.to/joshua_chukwu_ccb92f05a94/why-most-llm-api-usage-is-quietly-inefficient-4eko?preview=401f3ac7119c46d21f585a03fcb4a625008594ab67a937a4cdfafeebd060d28d70ff4a0887e0f29bc789f100e99c71b343e3223a6756451f5f83dc94"&gt;https://dev.to/joshua_chukwu_ccb92f05a94/why-most-llm-api-usage-is-quietly-inefficient-4eko?preview=401f3ac7119c46d21f585a03fcb4a625008594ab67a937a4cdfafeebd060d28d70ff4a0887e0f29bc789f100e99c71b343e3223a6756451f5f83dc94&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;AI makes it easier to build.&lt;br&gt;
But it also makes it easier to spend.&lt;br&gt;
And unless we start thinking about how usage behaves and not just what models can do,&lt;br&gt;
that spend can grow in ways that are hard to see until it’s too late.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>openai</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Why most LLM API usage is quietly inefficient</title>
      <dc:creator>Joshua Chukwu</dc:creator>
      <pubDate>Mon, 04 May 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/joshua_chukwu_ccb92f05a94/why-most-llm-api-usage-is-quietly-inefficient-4eko</link>
      <guid>https://dev.to/joshua_chukwu_ccb92f05a94/why-most-llm-api-usage-is-quietly-inefficient-4eko</guid>
      <description>&lt;p&gt;Series: AI Isn’t an Engineering Problem Anymore (Part 2)&lt;br&gt;
 It’s a cost problem—and most teams don’t realize it yet.&lt;/p&gt;




&lt;p&gt;In the last post, I talked about hitting a usage limit while debugging my robot and realizing how repetitive my own AI usage had become.&lt;br&gt;
At the time, it felt like a personal workflow issue.&lt;br&gt;
But the more I thought about it, the more it became clear:&lt;br&gt;
This isn’t just a “me problem.”&lt;br&gt;
 It’s a pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  The illusion of “new” work
&lt;/h2&gt;

&lt;p&gt;When we use LLMs, whether through APIs or tools, it feels like every request is new.&lt;br&gt;
You type something different.&lt;br&gt;
 You add more context.&lt;br&gt;
 You refine your question.&lt;br&gt;
But under the hood, a lot of those requests are doing very similar work.&lt;br&gt;
For example:&lt;br&gt;
debugging the same issue from different angles&lt;br&gt;
rewording prompts to get a better answer&lt;br&gt;
retrying when output isn’t quite right&lt;br&gt;
asking for clarification on something you already asked&lt;br&gt;
Each one feels justified.&lt;br&gt;
And most of them are.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where inefficiency actually comes from
&lt;/h2&gt;

&lt;p&gt;The inefficiency isn’t from using AI too much.&lt;br&gt;
It comes from how we naturally interact with it.&lt;br&gt;
A few patterns show up almost everywhere:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Iteration loops
&lt;/h2&gt;

&lt;p&gt;You don’t ask once, you iterate.&lt;br&gt;
“Try this approach”&lt;br&gt;
“That didn’t work, what about this?”&lt;br&gt;
“What if I change this parameter?”&lt;br&gt;
Each step builds on the last, but often overlaps heavily.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Near-duplicate prompts
&lt;/h2&gt;

&lt;p&gt;These are the most interesting ones.&lt;br&gt;
They’re not identical, but they’re close:&lt;br&gt;
same intent&lt;br&gt;
slightly different phrasing&lt;br&gt;
maybe a bit more context&lt;br&gt;
To a human, they’re obviously related.&lt;br&gt;
To most systems, they’re treated as completely new.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Retry behavior
&lt;/h2&gt;

&lt;p&gt;Sometimes you just don’t like the answer.&lt;br&gt;
So you try again.&lt;br&gt;
same prompt&lt;br&gt;
or a slightly modified one&lt;br&gt;
This is normal.&lt;br&gt;
But it means the same underlying request can be executed multiple times.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Team-level duplication
&lt;/h2&gt;

&lt;p&gt;This gets amplified in teams.&lt;br&gt;
Multiple developers might:&lt;br&gt;
debug similar issues&lt;br&gt;
build similar features&lt;br&gt;
ask similar questions&lt;br&gt;
But there’s no shared memory between them.&lt;br&gt;
So the same work gets repeated across people.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is hard to notice
&lt;/h2&gt;

&lt;p&gt;The tricky part is:&lt;br&gt;
None of this feels inefficient in the moment.&lt;br&gt;
It feels like:&lt;br&gt;
progress&lt;br&gt;
exploration&lt;br&gt;
iteration&lt;br&gt;
And that’s because it is.&lt;/p&gt;

&lt;h2&gt;
  
  
  A quick analogy (this is exactly how it happens)
&lt;/h2&gt;

&lt;p&gt;My dog gained 15 pounds in a year without us realizing.&lt;br&gt;
What was happening?&lt;br&gt;
Every weekday:&lt;br&gt;
I fed him before leaving for work at 6:30am&lt;br&gt;
my girlfriend fed him again at 8:30am&lt;br&gt;
From each of our perspectives:&lt;br&gt;
“I only fed him once.”&lt;br&gt;
But at the system level:&lt;br&gt;
He was getting fed twice, every single day.&lt;br&gt;
We only noticed when something else didn’t add up,&lt;br&gt;
 a 25-pound bag of dog food disappearing way too fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Back to LLM usage
&lt;/h2&gt;

&lt;p&gt;That’s exactly how LLM usage behaves.&lt;br&gt;
Individually:&lt;br&gt;
each request feels justified&lt;br&gt;
each interaction feels necessary&lt;br&gt;
But at the system level:&lt;br&gt;
similar work is being recomputed&lt;br&gt;
similar responses are being regenerated&lt;br&gt;
similar costs are being incurred&lt;br&gt;
Over and over again.&lt;br&gt;
There’s also a more subtle version of this that most people don’t think about.&lt;br&gt;
Even the smallest additions to prompts—things that feel natural or polite—are still part of the computation.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flw45ocxqx70xovddbp0p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flw45ocxqx70xovddbp0p.png" alt=" " width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is obviously a lighthearted example that seems trivial at first.&lt;/p&gt;

&lt;p&gt;But multiply it across millions of requests…&lt;br&gt;
Now it’s a systems problem.&lt;br&gt;
How many extra tokens do you think you generate just from “please” and “thanks”?&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple mental model
&lt;/h2&gt;

&lt;p&gt;Think of LLM usage like this:&lt;br&gt;
Every request is treated as a completely new computation, even when it’s not.&lt;br&gt;
There’s no built-in concept of:&lt;br&gt;
“we’ve already solved something like this”&lt;br&gt;
“this looks similar to a previous request”&lt;br&gt;
“we could reuse part of this result”&lt;br&gt;
So the system does exactly what it’s designed to do:&lt;br&gt;
recompute everything&lt;/p&gt;

&lt;h2&gt;
  
  
  When this becomes a real problem
&lt;/h2&gt;

&lt;p&gt;If you’re just experimenting, this isn’t a big deal.&lt;br&gt;
But once you start:&lt;br&gt;
building products&lt;br&gt;
scaling usage&lt;br&gt;
or running teams&lt;br&gt;
This pattern starts to matter.&lt;br&gt;
Because now you have:&lt;br&gt;
more requests&lt;br&gt;
more iteration&lt;br&gt;
more overlap&lt;br&gt;
And all of it compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  This is where the shift happens
&lt;/h2&gt;

&lt;p&gt;At some point, the problem stops being:&lt;br&gt;
“How do we use AI effectively?”&lt;br&gt;
And starts becoming:&lt;br&gt;
“How do we use AI efficiently?”&lt;br&gt;
That’s a very different question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this isn’t obvious (yet)
&lt;/h2&gt;

&lt;p&gt;Most of the conversation around AI is still focused on:&lt;br&gt;
model quality&lt;br&gt;
capabilities&lt;br&gt;
performance&lt;br&gt;
Not:&lt;br&gt;
usage patterns&lt;br&gt;
repetition&lt;br&gt;
system-level efficiency&lt;br&gt;
So a lot of teams don’t even look for this problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’m trying to understand
&lt;/h2&gt;

&lt;p&gt;After noticing this pattern, the question I’ve been thinking about is:&lt;br&gt;
How much of LLM usage is actually new… and how much is just repetition in disguise?&lt;br&gt;
And more importantly:&lt;br&gt;
If a meaningful portion is repetitive, what should we do about it?&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’ll explore next
&lt;/h2&gt;

&lt;p&gt;In the next post, I’ll go deeper into one specific part of this:&lt;br&gt;
why you’re probably paying twice for the same LLM response, even when the prompts aren’t identical.&lt;/p&gt;

&lt;p&gt;👉 Part 1 is here:&lt;br&gt;
(&lt;a href="https://dev.to/joshua_chukwu_ccb92f05a94/youve-hit-your-chatgpt-usage-limit-and-what-it-actually-reveals-about-llm-usage-700"&gt;https://dev.to/joshua_chukwu_ccb92f05a94/youve-hit-your-chatgpt-usage-limit-and-what-it-actually-reveals-about-llm-usage-700&lt;/a&gt;)&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>llm</category>
    </item>
    <item>
      <title>“You’ve hit your ChatGPT usage limit” — and what it actually reveals about LLM usage</title>
      <dc:creator>Joshua Chukwu</dc:creator>
      <pubDate>Fri, 01 May 2026 00:21:36 +0000</pubDate>
      <link>https://dev.to/joshua_chukwu_ccb92f05a94/youve-hit-your-chatgpt-usage-limit-and-what-it-actually-reveals-about-llm-usage-700</link>
      <guid>https://dev.to/joshua_chukwu_ccb92f05a94/youve-hit-your-chatgpt-usage-limit-and-what-it-actually-reveals-about-llm-usage-700</guid>
      <description>&lt;h2&gt;
  
  
  “You’ve hit your ChatGPT usage limit.”
&lt;/h2&gt;

&lt;p&gt;I didn’t expect that message to mean anything beyond mild inconvenience.&lt;br&gt;
But it ended up revealing something much deeper about how I was using AI, and how most of us probably are.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;I’ve been working on building an autonomous snow-clearing robot since 2023.&lt;br&gt;
It’s one of those projects where everything sounds straightforward until you actually try to make it work:&lt;br&gt;
motor control&lt;br&gt;
traction&lt;br&gt;
turning dynamics&lt;br&gt;
real-world constraints&lt;br&gt;
Things got a lot more interesting once AI tools became part of my workflow. Suddenly:&lt;br&gt;
debugging got faster&lt;br&gt;
ideas came quicker&lt;br&gt;
I could iterate without getting stuck for hours&lt;br&gt;
It genuinely felt like I was getting closer to something I had been chasing for a while.&lt;/p&gt;

&lt;h2&gt;
  
  
  The turning point
&lt;/h2&gt;

&lt;p&gt;Then I made what felt like a small decision at the time:&lt;br&gt;
I bought a set of cheap motors from a manufacturer.&lt;br&gt;
Bad idea.&lt;br&gt;
The software was glitchy.&lt;br&gt;
 The behavior was inconsistent.&lt;br&gt;
 And my rover couldn’t perform a proper zero-radius turn.&lt;br&gt;
So I did what most of us do now:&lt;br&gt;
I leaned heavily on ChatGPT.&lt;/p&gt;

&lt;h2&gt;
  
  
  The usage spiral
&lt;/h2&gt;

&lt;p&gt;At first, I was on the free plan.&lt;br&gt;
That lasted… not very long.&lt;br&gt;
I’d start debugging in the morning, and before noon:&lt;br&gt;
“You’ve hit your usage limit.”&lt;br&gt;
That alone should have been a signal.&lt;br&gt;
Instead, I upgraded.&lt;/p&gt;

&lt;h2&gt;
  
  
  The upgrade (and addiction phase)
&lt;/h2&gt;

&lt;p&gt;When the “try free for 1 month” plan rolled out, I jumped on it.&lt;br&gt;
And honestly, it changed everything.&lt;br&gt;
I wasn’t just using it for debugging anymore:&lt;br&gt;
I started automating parts of my workflow&lt;br&gt;
I used it at work&lt;br&gt;
I even used it for things I used to avoid—like CAD design&lt;br&gt;
It stopped feeling like a tool.&lt;br&gt;
It started feeling like a multiplier.&lt;/p&gt;

&lt;h2&gt;
  
  
  The moment that stuck
&lt;/h2&gt;

&lt;p&gt;Then one day, I hit the limit again.&lt;br&gt;
But this time the message was different:&lt;br&gt;
“Your usage limit will reset in 7125 minutes.”&lt;br&gt;
7125 minutes.&lt;br&gt;
Such a strangely specific number that I had to calculate it.&lt;br&gt;
divide by 60 → hours&lt;br&gt;
divide by 24 → days&lt;br&gt;
It came out to roughly 5 days.&lt;br&gt;
That’s when it hit me:&lt;br&gt;
I had gotten so used to having this capability on demand that being cut off for a few days felt… unreal.&lt;br&gt;
Like I had to come back down to earth after living somewhere else for a while.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I started noticing
&lt;/h2&gt;

&lt;p&gt;After that moment, I started paying closer attention to how I was actually using ChatGPT.&lt;br&gt;
Not in a formal, instrumented way, just observing my own behavior.&lt;br&gt;
A few patterns stood out almost immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. I was asking the same question… differently
&lt;/h2&gt;

&lt;p&gt;When debugging the motor issue on my rover, I wasn’t asking completely new questions each time.&lt;br&gt;
It was more like:&lt;br&gt;
slight variations of the same prompt&lt;br&gt;
reworded explanations&lt;br&gt;
retrying when the answer didn’t feel quite right&lt;br&gt;
Something like:&lt;br&gt;
“Why can’t my rover perform a zero-radius turn?”&lt;br&gt;
would turn into:&lt;br&gt;
“What could cause skid-steer instability at low speeds?”&lt;br&gt;
 “Could motor torque limits affect turning radius?”&lt;br&gt;
 “Why does my robot struggle to rotate in place?”&lt;br&gt;
Different wording.&lt;br&gt;
 Same underlying problem.&lt;br&gt;
And every time, it was treated as a fresh request.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Debugging creates loops
&lt;/h2&gt;

&lt;p&gt;The nature of debugging makes this worse.&lt;br&gt;
You don’t just ask once and move on—you iterate:&lt;br&gt;
test something&lt;br&gt;
observe behavior&lt;br&gt;
come back with slightly more context&lt;br&gt;
ask again&lt;br&gt;
That loop might happen 10–20 times for a single issue.&lt;br&gt;
And each iteration:&lt;br&gt;
feels necessary&lt;br&gt;
feels new&lt;br&gt;
but often overlaps heavily with previous ones&lt;/p&gt;

&lt;h2&gt;
  
  
  3. I wasn’t aware of how much I was repeating
&lt;/h2&gt;

&lt;p&gt;At no point did it feel like I was “wasting” usage.&lt;br&gt;
It felt like I was:&lt;br&gt;
making progress&lt;br&gt;
refining my understanding&lt;br&gt;
getting closer to the answer&lt;br&gt;
But in reality, I was often:&lt;br&gt;
revisiting the same concepts&lt;br&gt;
re-triggering similar responses&lt;br&gt;
paying (in usage) for near-duplicate work&lt;/p&gt;

&lt;h2&gt;
  
  
  The realization
&lt;/h2&gt;

&lt;p&gt;That’s when the earlier message started to make more sense:&lt;br&gt;
“You’ve hit your ChatGPT usage limit.”&lt;br&gt;
It wasn’t just about “using too much AI.”&lt;br&gt;
It was about how I was using it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The uncomfortable question
&lt;/h2&gt;

&lt;p&gt;If this is how I was using it as a single person working on one project…&lt;br&gt;
What does this look like for:&lt;br&gt;
a small team of developers&lt;br&gt;
multiple engineers debugging in parallel&lt;br&gt;
a product that has users triggering similar workflows&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple thought experiment
&lt;/h2&gt;

&lt;p&gt;Imagine a team of 5 engineers (a real case from my office).&lt;br&gt;
Each one:&lt;br&gt;
debugs with AI&lt;br&gt;
iterates through similar prompts&lt;br&gt;
retries and rephrases&lt;br&gt;
Even if 30–40% of their prompts overlap conceptually, there’s no mechanism to:&lt;br&gt;
recognize that overlap&lt;br&gt;
reuse prior results&lt;br&gt;
or even measure it&lt;br&gt;
Every request is treated as completely new work&lt;/p&gt;
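
&lt;p&gt;To put rough numbers on this thought experiment, here is a small sketch. The 5 engineers and the 30–40% overlap come from the scenario above; the figure of 50 prompts per engineer per day and the function name are hypothetical, chosen only to make the arithmetic visible.&lt;/p&gt;

```python
def overlapping_prompts(engineers: int, prompts_each: int, overlap_percent: int) -> int:
    """Prompts per day that conceptually repeat work someone already requested."""
    total_prompts = engineers * prompts_each
    return total_prompts * overlap_percent // 100


# Hypothetical workload: 50 prompts per engineer per day.
low = overlapping_prompts(5, 50, 30)
high = overlapping_prompts(5, 50, 40)
print(low, high)
```

&lt;p&gt;With those assumed inputs, somewhere between 75 and 100 of the team’s 250 daily prompts would be repeat work, and none of it would be flagged as such.&lt;/p&gt;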

&lt;h2&gt;
  
  
  Why this matters (even if you’re not thinking about cost)
&lt;/h2&gt;

&lt;p&gt;At the time, I wasn’t thinking: “I’m wasting tokens”&lt;br&gt;
I was thinking: “I need to fix this rover”&lt;br&gt;
And that’s the point.&lt;br&gt;
Most usage doesn’t feel wasteful in the moment.&lt;br&gt;
It feels productive.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shift in perspective
&lt;/h2&gt;

&lt;p&gt;But once you zoom out, a different pattern appears:&lt;br&gt;
a lot of AI usage is iterative&lt;br&gt;
a lot of that iteration is repetitive&lt;br&gt;
and that repetition is invisible while you’re in it&lt;/p&gt;

&lt;h2&gt;
  
  
  What this post is really about
&lt;/h2&gt;

&lt;p&gt;This isn’t about:&lt;br&gt;
ChatGPT limits&lt;br&gt;
free vs paid plans&lt;br&gt;
or even just cost&lt;br&gt;
It’s about something more subtle:&lt;br&gt;
How easily we fall into patterns of repeated AI usage without realizing it&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this goes next
&lt;/h2&gt;

&lt;p&gt;In my case, this started as frustration:&lt;br&gt;
glitchy motors&lt;br&gt;
endless debugging&lt;br&gt;
hitting limits at the worst possible time&lt;br&gt;
But it led to a much more interesting question:&lt;br&gt;
How much of AI usage is actually new… and how much is just repetition in disguise?&lt;br&gt;
That’s what I’ll dig into next.&lt;/p&gt;

&lt;p&gt;(Next post)&lt;br&gt;
Why most LLM API usage is quietly inefficient&lt;/p&gt;

</description>
      <category>programming</category>
      <category>webdev</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
