<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sam</title>
    <description>The latest articles on DEV Community by Sam (@gauzzastrip).</description>
    <link>https://dev.to/gauzzastrip</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3851799%2Fb6fa26a0-1250-4ded-8ce4-80dd20407b95.jpg</url>
      <title>DEV Community: Sam</title>
      <link>https://dev.to/gauzzastrip</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gauzzastrip"/>
    <language>en</language>
    <item>
      <title>I Cut Coding Agent Context Usage by 22–45% by Killing Context Bloat</title>
      <dc:creator>Sam</dc:creator>
      <pubDate>Tue, 12 May 2026 19:02:23 +0000</pubDate>
      <link>https://dev.to/gauzzastrip/i-cut-coding-agent-context-usage-by-22-45-by-killing-context-bloat-2g3k</link>
      <guid>https://dev.to/gauzzastrip/i-cut-coding-agent-context-usage-by-22-45-by-killing-context-bloat-2g3k</guid>
      <description>&lt;h2&gt;
  
  
  A lot of AI coding workflows degrade the exact same way.
&lt;/h2&gt;

&lt;p&gt;At first, everything feels incredible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Your coding agent:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;understands the project&lt;/li&gt;
&lt;li&gt;moves insanely fast&lt;/li&gt;
&lt;li&gt;eliminates boilerplate&lt;/li&gt;
&lt;li&gt;compounds your momentum&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then a few weeks later:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; turns into a novel.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Prompts get bloated.&lt;/p&gt;

&lt;p&gt;The model starts missing obvious things.&lt;/p&gt;

&lt;p&gt;Responses become inconsistent.&lt;/p&gt;

&lt;p&gt;Token usage quietly becomes absurd.&lt;/p&gt;

&lt;p&gt;I kept running into this while building &lt;a href="https://empirical.gauzza.com" rel="noopener noreferrer"&gt;Empirical&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Eventually I realized the problem wasn’t:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The model needs more context.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The problem was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;h2&gt;
  
  
  “The model is carrying too much irrelevant context at once.”
&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;That distinction changed everything.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hidden Failure Mode of Coding Agents
&lt;/h2&gt;

&lt;p&gt;Most teams solve AI memory like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Just add it to the prompt.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And over time the context fills up with:&lt;/p&gt;

&lt;h3&gt;
  
  
  Permanent Context Soup
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;architecture decisions&lt;/li&gt;
&lt;li&gt;coding standards&lt;/li&gt;
&lt;li&gt;deployment notes&lt;/li&gt;
&lt;li&gt;UI preferences&lt;/li&gt;
&lt;li&gt;old implementation details&lt;/li&gt;
&lt;li&gt;temporary fixes&lt;/li&gt;
&lt;li&gt;abandoned experiments&lt;/li&gt;
&lt;li&gt;half-finished thoughts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Eventually every request drags all of it around forever.&lt;/p&gt;

&lt;p&gt;Even when most of it has absolutely nothing to do with the current task.&lt;/p&gt;

&lt;p&gt;That creates a brutal signal-to-noise problem.&lt;/p&gt;

&lt;p&gt;The model starts treating temporary junk and critical architecture decisions with equal importance.&lt;/p&gt;

&lt;p&gt;You can actually &lt;em&gt;feel&lt;/em&gt; the degradation happen.&lt;/p&gt;

&lt;h3&gt;
  
  
  Symptoms:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;the agent gets fuzzier&lt;/li&gt;
&lt;li&gt;architecture drift increases&lt;/li&gt;
&lt;li&gt;outputs become inconsistent&lt;/li&gt;
&lt;li&gt;you spend more time correcting than building&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Bigger Context Windows Aren’t the Real Solution
&lt;/h2&gt;

&lt;p&gt;I think the industry is optimizing the wrong thing right now.&lt;/p&gt;

&lt;p&gt;Everyone keeps pushing toward:&lt;/p&gt;

&lt;h2&gt;
  
  
  Bigger Everything
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;million-token windows&lt;/li&gt;
&lt;li&gt;infinite memory&lt;/li&gt;
&lt;li&gt;larger context sizes&lt;/li&gt;
&lt;li&gt;stuffing more into prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But humans don’t work like that either.&lt;/p&gt;

&lt;p&gt;Good engineering teams don’t bring every document into every meeting.&lt;/p&gt;

&lt;p&gt;Most information is situational.&lt;/p&gt;

&lt;p&gt;Most memory should stay dormant until it becomes relevant.&lt;/p&gt;

&lt;p&gt;That was the shift for me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Not:
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do I fit more into context?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  But:
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do I load only what matters right now?”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Worked Better
&lt;/h2&gt;

&lt;p&gt;I started treating AI memory more like &lt;strong&gt;layered working memory&lt;/strong&gt; instead of permanent prompt stuffing.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. Lean Persistent Context
&lt;/h3&gt;

&lt;p&gt;Keep permanent instructions &lt;em&gt;extremely small&lt;/em&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Only things like:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;architecture principles&lt;/li&gt;
&lt;li&gt;coding philosophy&lt;/li&gt;
&lt;li&gt;project identity&lt;/li&gt;
&lt;li&gt;non-negotiables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That layer should stay lean on purpose.&lt;/p&gt;
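&lt;p&gt;One way to keep that layer honest is to measure it. The sketch below is only an illustration: it assumes the persistent file is markdown (like &lt;code&gt;AGENTS.md&lt;/code&gt;), uses word count as a rough proxy for tokens, and the function names and budget are hypothetical.&lt;/p&gt;

```python
def words_by_section(markdown_text):
    """Map each markdown heading to the word count of the text under it."""
    counts, current = {}, "(preamble)"
    for line in markdown_text.splitlines():
        if line.startswith("#"):
            # A new heading starts a new section.
            current = line.lstrip("#").strip()
            counts.setdefault(current, 0)
        else:
            counts[current] = counts.get(current, 0) + len(line.split())
    return counts


def over_budget(markdown_text, max_words):
    """Words beyond max_words across the whole file (0 when within budget)."""
    total = sum(words_by_section(markdown_text).values())
    return max(total - max_words, 0)


# Hypothetical persistent context, kept deliberately tiny.
SAMPLE = "# Principles\nkeep modules small\n# Non-negotiables\nno secrets in code\n"
print(words_by_section(SAMPLE), over_budget(SAMPLE, max_words=5))
```

&lt;p&gt;The per-section counts show which part of the file is growing, so it can be moved out of the permanent layer before it bloats every request.&lt;/p&gt;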




&lt;h3&gt;
  
  
  2. Retrieved Context
&lt;/h3&gt;

&lt;p&gt;Pull implementation knowledge dynamically based on:&lt;/p&gt;

&lt;h4&gt;
  
  
  Relevance Signals
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;semantic similarity&lt;/li&gt;
&lt;li&gt;current task&lt;/li&gt;
&lt;li&gt;related code paths&lt;/li&gt;
&lt;li&gt;previous work in the same area&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only relevant context enters the active prompt.&lt;/p&gt;
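&lt;p&gt;A minimal sketch of that idea, not Empirical's actual implementation: rank stored notes against the current task and admit only the best matches that fit a token budget. Word overlap stands in for real semantic similarity here, and the notes, function names, and budget are all hypothetical.&lt;/p&gt;

```python
import math
from collections import Counter

# Hypothetical memory notes; a real system would load these from a memory store.
NOTES = [
    "deploy: staging uses docker compose and a nightly cron job",
    "auth: login tokens expire after one hour and refresh silently",
    "ui: buttons use rounded corners from our design system",
]


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a real embedding."""
    return Counter(text.lower().replace(":", " ").split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_tokens(text):
    """Rough size estimate: one token per whitespace-separated word."""
    return len(text.split())


def select_context(task, notes, token_budget):
    """Return the most relevant notes that fit inside token_budget."""
    task_vec = bag_of_words(task)
    scored = [(cosine(task_vec, bag_of_words(note)), note) for note in notes]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    chosen, remaining = [], token_budget
    for score, note in ranked:
        cost = estimate_tokens(note)
        # Skip notes with no word overlap and notes that would overflow the budget.
        if score == 0.0 or cost > remaining:
            continue
        chosen.append(note)
        remaining -= cost
    return chosen


print(select_context("fix the login tokens refresh bug", NOTES, token_budget=40))
```

&lt;p&gt;For the login task above, only the auth note is selected; the deploy and UI notes stay dormant. Swapping the word-count vectors for embeddings would change the scoring but not the shape of the loop.&lt;/p&gt;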




&lt;h3&gt;
  
  
  3. Session Context
&lt;/h3&gt;

&lt;p&gt;Use temporary working memory for:&lt;/p&gt;

&lt;h4&gt;
  
  
  Active Work
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;bugs&lt;/li&gt;
&lt;li&gt;in-progress features&lt;/li&gt;
&lt;li&gt;short-lived implementation decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then let it expire naturally instead of polluting long-term memory forever.&lt;/p&gt;
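&lt;p&gt;Expiry can be as simple as a time-to-live on each note. This is a sketch under stated assumptions, not a description of any particular tool: the &lt;code&gt;SessionMemory&lt;/code&gt; class, its method names, and the 60-second TTL are hypothetical.&lt;/p&gt;

```python
import time


class SessionMemory:
    """Short-lived notes that drop out once their time-to-live has passed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._notes = []    # list of (expires_at, text) pairs

    def remember(self, text):
        """Store a note that expires ttl_seconds from now."""
        self._notes.append((self.clock() + self.ttl_seconds, text))

    def active(self):
        """Return unexpired notes in insertion order and discard the rest."""
        now = self.clock()
        self._notes = [(exp, text) for exp, text in self._notes if exp > now]
        return [text for _, text in self._notes]


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
memory = SessionMemory(ttl_seconds=60, clock=lambda: fake_now[0])
memory.remember("bug: flaky login test")
fake_now[0] = 30.0
memory.remember("decision: stub the mailer for now")
fake_now[0] = 61.0
print(memory.active())
```

&lt;p&gt;At the 61-second mark the first note has aged out and only the later decision remains, so nothing short-lived leaks into the persistent layer unless it is promoted on purpose.&lt;/p&gt;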




&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;The biggest surprise wasn’t even the token savings.&lt;/p&gt;

&lt;p&gt;It was how much sharper the agents became once the noise disappeared.&lt;/p&gt;

&lt;h2&gt;
  
  
  After reducing context bloat:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;responses became more focused&lt;/li&gt;
&lt;li&gt;architecture stayed more consistent&lt;/li&gt;
&lt;li&gt;prompt babysitting dropped significantly&lt;/li&gt;
&lt;li&gt;outputs drifted less between sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The token reduction was just the measurable side effect.&lt;/p&gt;




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workflow&lt;/th&gt;
&lt;th&gt;Context Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Smaller focused tasks&lt;/td&gt;
&lt;td&gt;~22%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Larger iterative workflows&lt;/td&gt;
&lt;td&gt;Up to ~45%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That compounds &lt;em&gt;fast&lt;/em&gt; once agents start looping, because each iteration sends the context again.&lt;/p&gt;
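&lt;p&gt;A back-of-the-envelope calculation shows the effect. Only the 22% and 45% figures come from the table above; the 20,000-token baseline and the 30-call loop are made-up numbers for illustration, and the sketch assumes the same reduction applies to every call.&lt;/p&gt;

```python
def tokens_saved(baseline_tokens_per_call, reduction_percent, calls):
    """Total input tokens avoided across a loop where every call carries less context."""
    return baseline_tokens_per_call * reduction_percent * calls // 100


# Hypothetical workload: 20,000 context tokens per call, 30 calls in one agent loop.
BASELINE = 20_000
CALLS = 30

for percent in (22, 45):
    saved = tokens_saved(BASELINE, percent, CALLS)
    print(f"{percent}% reduction saves {saved:,} tokens over {CALLS} calls")
```

&lt;p&gt;Under those assumptions the loop avoids 132,000 tokens at 22% and 270,000 at 45%, which is why the savings matter more as sessions get longer.&lt;/p&gt;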




&lt;h2&gt;
  
  
  The Bigger Realization
&lt;/h2&gt;

&lt;p&gt;I think a lot of AI tooling is accidentally recreating bad human organizational habits.&lt;/p&gt;

&lt;p&gt;We already know what happens when people dump everything into:&lt;/p&gt;

&lt;h3&gt;
  
  
  Organizational Chaos
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;giant docs&lt;/li&gt;
&lt;li&gt;giant meetings&lt;/li&gt;
&lt;li&gt;giant Slack threads&lt;/li&gt;
&lt;li&gt;giant Notion pages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clarity collapses.&lt;/p&gt;

&lt;p&gt;Coding agents seem to behave better when memory works more like human working memory:&lt;/p&gt;

&lt;h2&gt;
  
  
  Better Memory Pattern
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;small active focus&lt;/li&gt;
&lt;li&gt;relevant recall&lt;/li&gt;
&lt;li&gt;long-term memory separated from immediate attention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That mattered far more than raw context size.&lt;/p&gt;




&lt;h2&gt;
  
  
  Full Breakdown
&lt;/h2&gt;

&lt;p&gt;I wrote up the complete breakdown, which covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieval architecture&lt;/li&gt;
&lt;li&gt;layered memory strategy&lt;/li&gt;
&lt;li&gt;implementation lessons&lt;/li&gt;
&lt;li&gt;where the 22–45% savings actually came from&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;→ &lt;a href="https://empirical.gauzza.com/blog/coding-agent-context-savings-coding-agent-context-savings-22-45-percent/" rel="noopener noreferrer"&gt;Reducing Coding Agent Context Usage by 22–45% with Retrieval-Based Memory Systems&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>webdev</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I’ve been using Empirical as my memory layer across AI tools.</title>
      <dc:creator>Sam</dc:creator>
      <pubDate>Fri, 08 May 2026 16:36:24 +0000</pubDate>
      <link>https://dev.to/gauzzastrip/ive-been-using-empirical-as-my-memory-layer-across-ai-tools-lji</link>
      <guid>https://dev.to/gauzzastrip/ive-been-using-empirical-as-my-memory-layer-across-ai-tools-lji</guid>
      <description>&lt;p&gt;ChatGPT memory helps.&lt;br&gt;
Local MD files help.&lt;/p&gt;

&lt;p&gt;But neither travels cleanly across everything I use, and packing too much into MD files eats context and tokens.&lt;/p&gt;

&lt;p&gt;With Empirical, I keep my AGENTS.md lean and let Codex pull context dynamically when it actually needs it. &lt;/p&gt;

&lt;p&gt;I can open ChatGPT on my phone, connected to Empirical, and it pulls the same memory context and writing tone I use in Codex or any other connected AI tool. &lt;br&gt;
That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;less repeated setup&lt;/li&gt;
&lt;li&gt;cleaner, cheaper prompts&lt;/li&gt;
&lt;li&gt;more consistent output across sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is just the tip of the iceberg.&lt;/p&gt;

&lt;p&gt;I wrote up a Codex example here:&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://empirical.gauzza.com/blog/codex-session-tone-voice-how-i-used-codex-empirical-to-lock-in-my-writing-voice/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fempirical.gauzza.com%2Fimages%2Fblog%2Fcodex-session-tone-voice%2Fhero.gif" height="450" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://empirical.gauzza.com/blog/codex-session-tone-voice-how-i-used-codex-empirical-to-lock-in-my-writing-voice/" rel="noopener noreferrer" class="c-link"&gt;
            How I Used Codex + Empirical to Lock In My Writing Voice | Empirical Blog
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            April 30 note on using Empirical with Codex to define a repeatable writing voice through guided questions and live revision.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fempirical.gauzza.com%2Flogo.png" width="800" height="800"&gt;
          empirical.gauzza.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


</description>
      <category>agents</category>
      <category>ai</category>
      <category>productivity</category>
      <category>tooling</category>
    </item>
    <item>
      <title>I Needed Memory That Survives Context Windows. Memory That Moves Across Environments</title>
      <dc:creator>Sam</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:05:00 +0000</pubDate>
      <link>https://dev.to/gauzzastrip/i-needed-memory-that-survives-context-windows-memory-that-moves-across-environments-p4</link>
      <guid>https://dev.to/gauzzastrip/i-needed-memory-that-survives-context-windows-memory-that-moves-across-environments-p4</guid>
      <description>&lt;p&gt;I kept running into the same thing with AI tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;great context disappears&lt;/li&gt;
&lt;li&gt;I repeat myself constantly&lt;/li&gt;
&lt;li&gt;every tool remembers different things (or nothing)&lt;/li&gt;
&lt;li&gt;my context doesn't follow me when I move between tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbkc8zluat4bhk8tijro.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbkc8zluat4bhk8tijro.jpg" alt="Image description=" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  So I built &lt;a href="https://empirical.gauzza.com" rel="noopener noreferrer"&gt;Empirical&lt;/a&gt;.
&lt;/h2&gt;

&lt;p&gt;It started in a pretty common place: I was iterating on a Philly-style hoagie roll recipe.&lt;/p&gt;

&lt;p&gt;I wanted the AI to remember what I liked, what failed, and what I wanted to try next without re-explaining it every time.&lt;/p&gt;

&lt;p&gt;I originally thought Empirical would be its own chatbot. I started down that path, then realized I was solving the wrong problem and reinventing the wheel.&lt;/p&gt;

&lt;p&gt;I didn’t need another chat interface.&lt;br&gt;
I needed a memory layer I could use everywhere.&lt;/p&gt;

&lt;p&gt;So I changed lanes and focused on MCP tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Now I use Empirical memory across:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Coding CLIs&lt;/li&gt;
&lt;li&gt;ChatGPT&lt;/li&gt;
&lt;li&gt;Claude Web&lt;/li&gt;
&lt;li&gt;Claw Agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same memory, different interfaces. Now if ChatGPT is no longer &lt;em&gt;cool&lt;/em&gt; or Claude leaks its entire codebase, I can switch to the latest hot thing and all my context and memories move with me.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real examples that made this click for me
&lt;/h3&gt;

&lt;p&gt;I can take a pic of a bourbon, say “I like this,” and that preference is saved as persistent memory.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02e6vz9di2q8msvvqqls.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02e6vz9di2q8msvvqqls.png" alt="Image description=" width="699" height="644"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I can send health data and query/chat over it later to help spot patterns.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiz9jaknlbmw2nsqucz78.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiz9jaknlbmw2nsqucz78.png" alt="Image description=" width="698" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I can draft a PRD with ChatGPT while out on a walk, then pull it back up in a CLI session at my desk.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s next
&lt;/h3&gt;

&lt;p&gt;I’m now working on connecting Empirical to more sources so memory reflects more of my actual life/workflow.&lt;/p&gt;

&lt;p&gt;Current focus:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;better pattern recognition over time&lt;/li&gt;
&lt;li&gt;stronger multimodal memory (text + image + structured data)&lt;/li&gt;
&lt;li&gt;cleaner memory workflows for agents&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  If this clicks with you, I'd love for you to check it out and give it a try:
&lt;/h3&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://empirical.gauzza.com" rel="noopener noreferrer"&gt;Empirical&lt;/a&gt;
&lt;/h2&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
