<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jean-Rodney Larrieux</title>
    <description>The latest articles on DEV Community by Jean-Rodney Larrieux (@jlarrieux).</description>
    <link>https://dev.to/jlarrieux</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F20424%2F6db2f43f-7f7f-40f8-a1bd-cfa33d4f1a7e.jpeg</url>
      <title>DEV Community: Jean-Rodney Larrieux</title>
      <link>https://dev.to/jlarrieux</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jlarrieux"/>
    <language>en</language>
    <item>
      <title>How I Spent My Vacation Day: Debugging Docker with AI (And Recovering 39GB)</title>
      <dc:creator>Jean-Rodney Larrieux</dc:creator>
      <pubDate>Wed, 30 Jul 2025 16:07:06 +0000</pubDate>
      <link>https://dev.to/jlarrieux/how-i-spent-my-vacation-day-debugging-docker-with-ai-and-recovering-39gb-182h</link>
      <guid>https://dev.to/jlarrieux/how-i-spent-my-vacation-day-debugging-docker-with-ai-and-recovering-39gb-182h</guid>
      <description>&lt;p&gt;&lt;strong&gt;Should have been poolside. Ended up in a terminal. Best vacation day ever.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yesterday morning, sipping my first coffee and checking my home server dashboard, I saw it: &lt;strong&gt;89% disk usage&lt;/strong&gt; on my 512GB SSD. My production Nomad cluster with 173 containers was slowly choking to death.&lt;/p&gt;

&lt;p&gt;I told myself "just a quick look." Six hours later, I'd solved a mystery that would have taken me days without my AI debugging partner.&lt;/p&gt;

&lt;h2&gt;The Mystery That Hooked Me 🕵️&lt;/h2&gt;

&lt;p&gt;The numbers didn't add up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docker reporting &lt;strong&gt;99GB&lt;/strong&gt; of space used&lt;/li&gt;
&lt;li&gt;Actual images: only &lt;strong&gt;13GB&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Where was the missing 86GB?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
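&lt;p&gt;To spot that kind of discrepancy on your own host, something like this works (a sketch, not the exact commands from my session - &lt;code&gt;docker system df&lt;/code&gt; is the real Docker CLI subcommand, while &lt;code&gt;sum_gb&lt;/code&gt; is a throwaway helper I'm assuming here):&lt;/p&gt;

```shell
# Hedged sketch: compare Docker's own accounting with the sum of image sizes.
# sum_gb is a hypothetical helper that totals values like "500MB" / "1.5GB"
# into gigabytes. (docker's Size strings can also use kB; extend as needed.)
sum_gb() {
  awk '/GB$/ { sub(/GB$/, ""); t += $0 }
       /MB$/ { sub(/MB$/, ""); t += $0 / 1000 }
       END   { printf "%.1f\n", t }'
}

# On a Docker host you would then run:
#   docker system df                               # what Docker thinks it uses
#   docker images --format '{{.Size}}' | sum_gb    # what the images add up to
```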

&lt;p&gt;This wasn't just about disk space. With 25 replicas each of my core services (ferengi, price-service, formatter), something was fundamentally wrong with my container architecture.&lt;/p&gt;

&lt;h2&gt;Enter My AI Debugging Partner 🤖&lt;/h2&gt;

&lt;p&gt;Instead of random Googling, I structured this as a systematic investigation with Claude. The collaboration pattern that emerged was powerful:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; "Here's what I'm seeing..." (intuition, context, constraints)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; "Let's check X, Y, Z systematically..." (comprehensive analysis, documentation)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Me:&lt;/strong&gt; "That's weird, but this makes sense..." (creative leaps, real-world experience)  &lt;/p&gt;

&lt;p&gt;We built diagnostic commands step by step, creating a repeatable investigation process.&lt;/p&gt;
&lt;h2&gt;Down the Rabbit Hole 🐰&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Hour 2:&lt;/strong&gt; Found the smoking gun - 482 overlay2 directories vs 173 running containers. Docker had accumulated massive "ghost layers" from months of deployments.&lt;/p&gt;
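&lt;p&gt;The check itself is simple to reproduce (a sketch - &lt;code&gt;count_dirs&lt;/code&gt; is a hypothetical helper, and the overlay2 path is Docker's default storage location, which may differ on your host and usually needs root):&lt;/p&gt;

```shell
# Hedged sketch: count top-level layer directories vs. running containers.
# count_dirs is a hypothetical helper, not a Docker command.
count_dirs() {
  find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# On the Docker host (typically as root):
#   count_dirs /var/lib/docker/overlay2   # layer directories on disk
#   docker ps -q | wc -l                  # running containers
# A large gap between the two suggests accumulated "ghost layers".
```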

&lt;p&gt;&lt;strong&gt;Hour 3:&lt;/strong&gt; Built a cleanup script, recovered 39GB. Victory! 🎉&lt;/p&gt;
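&lt;p&gt;If you try something similar, preview before you delete. Here's a minimal sketch (not my actual script) built around a dry-run guard; &lt;code&gt;docker image prune&lt;/code&gt; and &lt;code&gt;docker builder prune&lt;/code&gt; are real Docker CLI commands:&lt;/p&gt;

```shell
# Hedged cleanup sketch (NOT the author's actual script; adapt before use).
# DRY_RUN defaults to 1, so nothing is deleted until you opt in with DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run docker image prune -af      # remove all unused images
run docker builder prune -af    # remove dangling build cache
```

&lt;p&gt;Even "safe" pruning can disturb adjacent state, so dry-run first.&lt;/p&gt;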

&lt;p&gt;&lt;strong&gt;Hour 4:&lt;/strong&gt; Plot twist - everything broke. CNI networking conflicts, ECR authentication failures, Docker metadata corruption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hours 5-6:&lt;/strong&gt; "This should have been simple..." 😅&lt;/p&gt;

&lt;p&gt;Each new problem became a mini-learning session. Docker internals, CNI plugins, Nomad networking - concepts I'd used but never deeply understood were now crystal clear through systematic AI-assisted debugging.&lt;/p&gt;
&lt;h2&gt;The Real Breakthrough 💡&lt;/h2&gt;

&lt;p&gt;The infrastructure cleanup was only half the story. Even after recovery, my containers were consuming &lt;strong&gt;800MB+ each&lt;/strong&gt; - multiply that by 75+ containers and you're right back where you started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The aha moment&lt;/strong&gt;: Attack the problem at TWO levels:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt;: Clean ghost layers and corrupted state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application&lt;/strong&gt;: Optimize the images themselves&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Multi-Stage Docker Builds to the Rescue 🏗️&lt;/h2&gt;

&lt;p&gt;My old Dockerfile was a disaster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.10-slim&lt;/span&gt;
&lt;span class="c"&gt;# Install EVERYTHING including build tools&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get &lt;span class="nb"&gt;install &lt;/span&gt;gcc build-essential git...
&lt;span class="c"&gt;# Build tools live forever in final image&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;New approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Stage 1: Builder (with all the messy build tools)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;python:3.10-bookworm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;
&lt;span class="c"&gt;# Do all the building, installing, compiling...&lt;/span&gt;

&lt;span class="c"&gt;# Stage 2: Clean runtime (minimal dependencies)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.10-slim&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /app /app&lt;/span&gt;
&lt;span class="c"&gt;# Only copy what you need to RUN, not BUILD&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The math was beautiful: &lt;strong&gt;500MB saved per image&lt;/strong&gt; × layer sharing across replicas = massive space recovery.&lt;/p&gt;

&lt;h2&gt;Victory Metrics 📊&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Final result: 99GB → 60GB (39GB recovered)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System health: 89% → 58% disk usage&lt;/li&gt;
&lt;li&gt;Deployment efficiency: Images 60% smaller&lt;/li&gt;
&lt;li&gt;Security win: No build tools or secrets in production images&lt;/li&gt;
&lt;li&gt;Knowledge gained: Priceless&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Key Takeaways for Fellow Engineers 🎯&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The Two-Pronged Approach&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Don't just fix infrastructure OR application - optimize both layers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. AI as Debugging Partner&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
LLMs excel at systematic analysis and comprehensive checklists. Humans bring intuition and creative problem-solving. Together? Unstoppable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Multi-Stage Builds Aren't Optional&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If you're not using them, you're shipping build tools to production. Every. Single. Time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Systems Thinking Matters&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Local optimizations (cleaning up space) can miss bigger architectural issues (bloated images).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Document Your Wins&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
I turned this debugging session into a repeatable script. Future me will thank present me.&lt;/p&gt;

&lt;h2&gt;The Human-AI Collaboration That Worked 🤝&lt;/h2&gt;

&lt;p&gt;What made this effective wasn't replacing human insight with AI - it was amplifying it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI strength&lt;/strong&gt;: Systematic analysis, comprehensive command sequences, documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human strength&lt;/strong&gt;: Pattern recognition, creative leaps, understanding constraints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Together&lt;/strong&gt;: Accelerated learning and problem-solving&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of days of trial-and-error, I compressed months of Docker expertise into hours of focused collaboration.&lt;/p&gt;

&lt;h2&gt;The Bottom Line 💪&lt;/h2&gt;

&lt;p&gt;Best vacation day debugging session ever. Sometimes the most rewarding problems find you when you least expect them.&lt;/p&gt;

&lt;p&gt;Who else has had their "relaxing day off" hijacked by a fascinating technical challenge? Share your stories below! 👇&lt;/p&gt;




&lt;p&gt;&lt;em&gt;P.S. - The cleanup script is now version-controlled and includes automatic backup rotation, CNI state cleanup, and ECR re-authentication. Because future-me deserves nice things too.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>ai</category>
      <category>problemsolving</category>
      <category>lifelonglearning</category>
    </item>
  </channel>
</rss>
