<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alvarito1983</title>
    <description>The latest articles on DEV Community by Alvarito1983 (@alvarito1983).</description>
    <link>https://dev.to/alvarito1983</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3858088%2Fa0fbc217-d69f-4570-b98d-c85a5e01ed1d.png</url>
      <title>DEV Community: Alvarito1983</title>
      <link>https://dev.to/alvarito1983</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alvarito1983"/>
    <language>en</language>
    <item>
      <title>Quadlet: The Podman Feature That Finally Makes Sense on a Homelab</title>
      <dc:creator>Alvarito1983</dc:creator>
      <pubDate>Thu, 23 Apr 2026 08:45:09 +0000</pubDate>
      <link>https://dev.to/alvarito1983/quadlet-the-podman-feature-that-finally-makes-sense-on-a-homelab-2m3</link>
      <guid>https://dev.to/alvarito1983/quadlet-the-podman-feature-that-finally-makes-sense-on-a-homelab-2m3</guid>
      <description>&lt;p&gt;If you run a homelab on Docker Compose, you've probably accepted a quiet trade-off without noticing.&lt;/p&gt;

&lt;p&gt;Your containers work. They start on boot. They restart if they crash. But they live &lt;em&gt;outside&lt;/em&gt; the operating system's service model — orphaned from systemd, invisible to &lt;code&gt;journalctl&lt;/code&gt;, dependent on a daemon that has to be alive before anything else in your stack can exist. Your 40 services are really 40 children of a single PID, not 40 first-class citizens of your Linux host.&lt;/p&gt;

&lt;p&gt;This isn't a problem, exactly. It's just a ceiling. And over the last year a feature from the Podman side of the container world has quietly raised that ceiling in a way that's interesting enough to write about — even if you, like me, have zero intention of ripping out your Docker Compose setup tomorrow.&lt;/p&gt;

&lt;p&gt;The feature is called &lt;strong&gt;Quadlet&lt;/strong&gt;. It has been around since Podman 4.4, but 2026 is the year it's actually becoming the default way sysadmins on Red Hat-adjacent systems manage containers at home. And it does something I've not seen any other container tool do cleanly: it makes a container behave like a native systemd service.&lt;/p&gt;

&lt;p&gt;Let's look at what that actually means, what it's good for, and — importantly — when you should probably ignore it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Quadlet actually is, in one paragraph
&lt;/h2&gt;

&lt;p&gt;Quadlet is a systemd generator that reads declarative &lt;code&gt;.container&lt;/code&gt; files (plus &lt;code&gt;.network&lt;/code&gt;, &lt;code&gt;.volume&lt;/code&gt;, &lt;code&gt;.pod&lt;/code&gt;, &lt;code&gt;.kube&lt;/code&gt;, &lt;code&gt;.image&lt;/code&gt; and a few others) and converts them into real systemd service units, at boot and on every &lt;code&gt;systemctl daemon-reload&lt;/code&gt;. You don't run &lt;code&gt;podman run&lt;/code&gt; and you don't write unit files by hand. You drop an INI-style file into a specific directory, run &lt;code&gt;systemctl daemon-reload&lt;/code&gt;, and now you have a fully managed service with restart policies, dependency management, journal logging, healthchecks, and optional auto-updates — all driven by systemd, not by a background daemon.&lt;/p&gt;

&lt;p&gt;That's the whole idea. The container is no longer something Podman runs &lt;em&gt;on the side&lt;/em&gt;. It's something your operating system runs, the same way it runs &lt;code&gt;sshd&lt;/code&gt; or &lt;code&gt;nginx&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The simplest possible example
&lt;/h2&gt;

&lt;p&gt;A minimal rootless Quadlet for something useful — let's say Vaultwarden — looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.config/containers/systemd/vaultwarden.container
&lt;/span&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="py"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Vaultwarden password vault&lt;/span&gt;
&lt;span class="py"&gt;After&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;network-online.target&lt;/span&gt;

&lt;span class="nn"&gt;[Container]&lt;/span&gt;
&lt;span class="py"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;docker.io/vaultwarden/server:latest&lt;/span&gt;
&lt;span class="py"&gt;ContainerName&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;vaultwarden&lt;/span&gt;
&lt;span class="py"&gt;PublishPort&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;8080:80&lt;/span&gt;
&lt;span class="py"&gt;Volume&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;vaultwarden-data.volume:/data&lt;/span&gt;
&lt;span class="py"&gt;AutoUpdate&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;registry&lt;/span&gt;

&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;Restart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;always&lt;/span&gt;
&lt;span class="py"&gt;TimeoutStartSec&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;300&lt;/span&gt;

&lt;span class="nn"&gt;[Install]&lt;/span&gt;
&lt;span class="py"&gt;WantedBy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;default.target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl &lt;span class="nt"&gt;--user&lt;/span&gt; daemon-reload
systemctl &lt;span class="nt"&gt;--user&lt;/span&gt; start vaultwarden.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. There's no &lt;code&gt;vaultwarden.service&lt;/code&gt; file on disk — Quadlet generates it on the fly. If you edit the &lt;code&gt;.container&lt;/code&gt; file and reload, the generated unit updates. If a newer Podman version ships improvements to the generator, your service picks them up next reboot without you touching anything. One rootless caveat: user services only start at boot if lingering is enabled for your account, so run &lt;code&gt;loginctl enable-linger $USER&lt;/code&gt; once.&lt;/p&gt;
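&lt;p&gt;You can see exactly what the generator produced with plain systemd tooling; the second command below is Quadlet's documented dry-run mode, though the generator's path varies by distro:&lt;/p&gt;

```shell
# Show the generated unit (it lives under /run, not /etc)
systemctl --user cat vaultwarden.service

# Debug a file that refuses to generate
# (path varies by distro; /usr/libexec/podman/quadlet is common)
/usr/libexec/podman/quadlet -dryrun -user
```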

&lt;p&gt;The volume referenced as &lt;code&gt;vaultwarden-data.volume&lt;/code&gt; is another Quadlet file (a &lt;code&gt;.volume&lt;/code&gt;), which generates a named volume managed by systemd the same way.&lt;/p&gt;
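&lt;p&gt;For completeness, that companion file can be nearly empty. A minimal sketch:&lt;/p&gt;

```ini
# ~/.config/containers/systemd/vaultwarden-data.volume
[Volume]
# No keys are required; Quadlet generates vaultwarden-data-volume.service,
# which creates the named volume before the container starts.
```

&lt;p&gt;Because the &lt;code&gt;.container&lt;/code&gt; file references it by its Quadlet filename, the dependency between the two units is wired up for you.&lt;/p&gt;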




&lt;h2&gt;
  
  
  What you actually gain
&lt;/h2&gt;

&lt;p&gt;This is the part that took me a while to appreciate. It's not "Podman has replaced Docker." It's that &lt;strong&gt;the integration layer is fundamentally different.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unified logging.&lt;/strong&gt; &lt;code&gt;journalctl --user -u vaultwarden.service&lt;/code&gt; gives you the container's logs in the same place and same format as the rest of the system. No &lt;code&gt;docker logs&lt;/code&gt;, no parallel logging story. When something breaks at 2am, you're searching one place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real dependency management.&lt;/strong&gt; Want your Nextcloud container to only start after MariaDB is ready &lt;em&gt;and&lt;/em&gt; network-online.target has fired? Write it in the &lt;code&gt;[Unit]&lt;/code&gt; section like any other service. systemd handles the ordering, not a &lt;code&gt;depends_on&lt;/code&gt; line that only waits for process start.&lt;/p&gt;
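&lt;p&gt;As a sketch, assuming a hypothetical &lt;code&gt;mariadb.container&lt;/code&gt; that generates &lt;code&gt;mariadb.service&lt;/code&gt;, the Nextcloud unit's ordering is just:&lt;/p&gt;

```ini
# [Unit] section of a hypothetical nextcloud.container
[Unit]
Requires=mariadb.service
After=mariadb.service network-online.target
```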

&lt;p&gt;&lt;strong&gt;Restart policies and health-triggered recovery.&lt;/strong&gt; Quadlet exposes &lt;code&gt;HealthCmd&lt;/code&gt;, &lt;code&gt;HealthInterval&lt;/code&gt;, &lt;code&gt;HealthOnFailure&lt;/code&gt;, &lt;code&gt;HealthRetries&lt;/code&gt; directly. Combined with &lt;code&gt;HealthOnFailure=kill&lt;/code&gt; and systemd's &lt;code&gt;Restart=always&lt;/code&gt;, you get a self-healing service loop that doesn't depend on anything watching from outside.&lt;/p&gt;
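&lt;p&gt;A sketch of that loop inside a &lt;code&gt;[Container]&lt;/code&gt; section — the endpoint and timings are illustrative, not Vaultwarden-specific:&lt;/p&gt;

```ini
[Container]
HealthCmd=curl -fsS http://localhost:80/alive
HealthInterval=30s
HealthRetries=3
# Kill the container when the check fails; [Service] Restart=always revives it
HealthOnFailure=kill
```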

&lt;p&gt;&lt;strong&gt;Auto-update with automatic rollback.&lt;/strong&gt; Add &lt;code&gt;AutoUpdate=registry&lt;/code&gt; and the &lt;code&gt;podman-auto-update.timer&lt;/code&gt; checks daily for new image digests. If the new image fails the healthcheck after restart, Podman rolls back to the previous image automatically. No Watchtower, no cron hack, no custom script — and crucially, no blind &lt;code&gt;latest&lt;/code&gt; tag chasing that silently deploys a broken image.&lt;/p&gt;
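&lt;p&gt;One caveat: &lt;code&gt;AutoUpdate=registry&lt;/code&gt; only takes effect if the timer is actually running. For a rootless setup:&lt;/p&gt;

```shell
# Enable the daily update check for your user
systemctl --user enable --now podman-auto-update.timer

# Preview what would be updated without touching anything
podman auto-update --dry-run
```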

&lt;p&gt;&lt;strong&gt;Rootless by default.&lt;/strong&gt; The file above runs under your user account. A container breakout gets your UID, not root. For a homelab exposing services, this is a meaningful reduction in blast radius.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No daemon to keep alive.&lt;/strong&gt; There's no &lt;code&gt;dockerd&lt;/code&gt; that has to be up before your stack exists. Containers are direct children of systemd. If Podman itself has a bug tomorrow, the contract with your services is still "this is a systemd unit" — same tools, same lifecycle, same recovery procedures you already know.&lt;/p&gt;




&lt;h2&gt;
  
  
  The comparison that matters: Quadlet vs Docker Compose
&lt;/h2&gt;

&lt;p&gt;I want to be honest here because most of what's written online about Podman/Quadlet treats Docker Compose as the enemy. It isn't. Compose and Quadlet solve overlapping problems with very different philosophies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker Compose thinks in stacks.&lt;/strong&gt; One &lt;code&gt;docker-compose.yml&lt;/code&gt; describes a self-contained application: services, networks, volumes, dependencies, all in one file. You &lt;code&gt;docker compose up -d&lt;/code&gt; and the stack exists. It's portable across any machine with Docker. You can commit it to a git repo and anyone can reproduce your environment. The mental model is: &lt;em&gt;an application is a folder&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quadlet thinks in system services.&lt;/strong&gt; Each container is its own unit file. Networks and volumes are their own unit files. The relationships are expressed through systemd's dependency graph, not through a single manifest. The mental model is: &lt;em&gt;a container is a service that happens to run in a container&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;For development, portability, and throwaway environments, Compose wins. It's simpler, the tooling is mature, and the ecosystem is vast. For a long-lived single-host deployment where you want containers to behave like every other service on your Linux box, Quadlet wins.&lt;/p&gt;

&lt;p&gt;The honest answer for most homelabbers is that &lt;strong&gt;both are valid&lt;/strong&gt;, and which one fits depends on what you actually value.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Quadlet is the wrong answer
&lt;/h2&gt;

&lt;p&gt;Skip this feature if any of the following apply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your homelab runs on Windows / WSL2 / macOS.&lt;/strong&gt; Quadlet is Linux + systemd. Period. If you're on Docker Desktop, this isn't for you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're deeply invested in Compose-specific features&lt;/strong&gt; like &lt;code&gt;profiles&lt;/code&gt;, &lt;code&gt;extends&lt;/code&gt;, &lt;code&gt;x-*&lt;/code&gt; extension fields with YAML anchors, or the Compose CLI's build workflows. &lt;code&gt;podman-compose&lt;/code&gt; exists but compatibility isn't 100%, and Quadlet isn't trying to replicate Compose semantics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want GUI-first management.&lt;/strong&gt; Portainer, Dockge, and the like are built around Docker. Cockpit has Quadlet integration now, but the ecosystem is thinner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You share stacks across machines with non-Linux users.&lt;/strong&gt; Compose files are more portable as artifacts than a folder of &lt;code&gt;.container&lt;/code&gt; units tied to one host's systemd tree.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU passthrough for media transcoding is a hard requirement and you have it working on Docker.&lt;/strong&gt; It's doable on Podman, but expect extra work. If your Jellyfin is happy today, don't break it for philosophy.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When Quadlet is genuinely the right answer
&lt;/h2&gt;

&lt;p&gt;Reach for it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You run a Linux-native homelab&lt;/strong&gt; on Fedora, Rocky, Alma, CentOS Stream, Debian, or openSUSE and your containers are long-lived services rather than dev environments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You care about boot ordering.&lt;/strong&gt; "Start reverse proxy after databases after network is up" is a one-liner in systemd, and a fragile convention in Compose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want auto-update with rollback&lt;/strong&gt; without bolting on a third-party tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're tired of the Docker daemon being a single point of failure&lt;/strong&gt; for your entire container story.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're already comfortable with systemd&lt;/strong&gt; as an operator. This is the key one. If &lt;code&gt;systemctl&lt;/code&gt;, &lt;code&gt;journalctl&lt;/code&gt;, and unit file syntax feel natural to you, Quadlet is going to feel like containers finally speaking your language. If they don't, the learning curve is real and not worth it for a Plex install.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The hybrid setup that actually makes sense
&lt;/h2&gt;

&lt;p&gt;This is where I've landed, and I think it's where most practical people will land:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Docker Compose for development and prototyping.&lt;/strong&gt; Fast iteration, familiar tooling, easy to share. If I'm testing a new self-hosted tool for the first time, it's going into a Compose file in a scratch directory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quadlet for the services I've committed to.&lt;/strong&gt; Once something graduates from "I'm trying this out" to "this runs my household," it's worth the 15 minutes to port it to a &lt;code&gt;.container&lt;/code&gt; file and let systemd own its lifecycle.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't have to pick one. On a Linux homelab you can run both side by side. The Docker daemon and Podman's rootless stack don't fight each other — they just don't talk.&lt;/p&gt;
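&lt;p&gt;The porting step is mostly mechanical. A rough, not exhaustive, mapping of common Compose keys to their Quadlet equivalents (see &lt;code&gt;podman-systemd.unit(5)&lt;/code&gt; for the authoritative list):&lt;/p&gt;

```ini
# Compose key          Quadlet equivalent
# image:            -> [Container] Image=
# container_name:   -> [Container] ContainerName=
# ports:            -> [Container] PublishPort=
# volumes:          -> [Container] Volume=
# environment:      -> [Container] Environment=
# restart: always   -> [Service]   Restart=always
# depends_on:       -> [Unit]      Requires= / After=
```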




&lt;h2&gt;
  
  
  The honest closing thought
&lt;/h2&gt;

&lt;p&gt;Quadlet isn't going to replace Docker Compose in most homelabs, and I don't think it should. Compose is good at what Compose is good for. What Quadlet does is close a gap I didn't realize I had: my containers were never really part of my operating system. They were tenants. With Quadlet, for the services where it matters, they become residents.&lt;/p&gt;

&lt;p&gt;That's a small distinction until it isn't — until the night something breaks and you realize your recovery story is the same one you use for every other service on the box, instead of a separate container-shaped exception.&lt;/p&gt;

&lt;p&gt;If you run Linux and you've been curious about Podman but couldn't articulate what the actual upgrade was, this is it. Not the daemon thing. Not the rootless thing in isolation. The integration.&lt;/p&gt;

&lt;p&gt;Try it on one service. See how it feels. If it clicks, you'll know. If it doesn't, your Compose file is still there, and nothing has been lost.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you moved any part of your stack to Quadlet, or tried and bounced off it? I'd like to hear which services worked cleanly and which were more trouble than they were worth — drop it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>podman</category>
      <category>linux</category>
      <category>selfhosted</category>
      <category>homelab</category>
    </item>
    <item>
      <title>The WSL2 Guide I Wish I Had: 4 Gotchas That Will Eat Your Afternoon</title>
      <dc:creator>Alvarito1983</dc:creator>
      <pubDate>Wed, 22 Apr 2026 12:38:57 +0000</pubDate>
      <link>https://dev.to/alvarito1983/the-wsl2-guide-i-wish-i-had-4-gotchas-that-will-eat-your-afternoon-5aal</link>
      <guid>https://dev.to/alvarito1983/the-wsl2-guide-i-wish-i-had-4-gotchas-that-will-eat-your-afternoon-5aal</guid>
      <description>&lt;p&gt;WSL2 is a fantastic development environment on Windows. It's also a system with sharp edges that the official docs rarely highlight — the kind you only discover after losing an afternoon to a process eating 300% CPU for no apparent reason.&lt;/p&gt;

&lt;p&gt;This guide documents four specific problems I've hit repeatedly over the last year while using WSL2 as my main development environment for Docker-based projects. For each one: the root cause, why the obvious fix doesn't work, and what actually solves it.&lt;/p&gt;

&lt;p&gt;This isn't an introduction to WSL2. If you're already using it daily and something feels off, keep reading.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Docker Desktop, cgroups, and processes that ignore resource limits
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The symptom
&lt;/h3&gt;

&lt;p&gt;You run a container on Docker Desktop for Windows (which uses WSL2 under the hood). The container executes a CPU-intensive process — a vulnerability scanner, a compiler, a batch job.&lt;/p&gt;

&lt;p&gt;You watch &lt;code&gt;htop&lt;/code&gt; and the process is consuming &lt;strong&gt;300%+ CPU&lt;/strong&gt;, dragging the entire system down.&lt;/p&gt;

&lt;p&gt;You think: &lt;em&gt;"no problem, I'll throttle it."&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;heavy-worker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-scanner:latest&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1.0'&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1G&lt;/span&gt;
    &lt;span class="na"&gt;cpu_count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You restart the container.&lt;/p&gt;

&lt;p&gt;Still 300% CPU.&lt;/p&gt;

&lt;p&gt;You try &lt;code&gt;nice&lt;/code&gt;, &lt;code&gt;ionice&lt;/code&gt;, &lt;code&gt;cpulimit&lt;/code&gt;... nothing works.&lt;/p&gt;

&lt;h3&gt;
  
  
  The root cause
&lt;/h3&gt;

&lt;p&gt;Docker Desktop runs containers inside a WSL2-hosted VM using &lt;strong&gt;cgroup v2&lt;/strong&gt;, often with limited controllers.&lt;/p&gt;

&lt;p&gt;The result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;deploy.resources.limits&lt;/code&gt; → ignored&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cpu_count&lt;/code&gt; → ignored&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;nice&lt;/code&gt; / &lt;code&gt;ionice&lt;/code&gt; → ineffective&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can verify:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /sys/fs/cgroup/cgroup.controllers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll often see fewer controllers than on a native Linux system.&lt;/p&gt;

&lt;h3&gt;
  
  
  What actually works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Option A (best): limit the WSL2 VM&lt;/strong&gt; in &lt;code&gt;%UserProfile%\.wslconfig&lt;/code&gt; on the Windows side&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[wsl2]&lt;/span&gt;
&lt;span class="py"&gt;processors&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;
&lt;span class="py"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;8GB&lt;/span&gt;
&lt;span class="py"&gt;swap&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;2GB&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;wsl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--shutdown&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the VM is capped → containers behave predictably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option B: Use tool-level throttling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Examples (exact flag names vary by tool):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;--parallel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;span class="nt"&gt;--low-mem&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These bypass scheduler issues entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option C: Replace the tool&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If it's designed to max all cores and can't be tuned → wrong tool for WSL2.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key takeaway
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Don't trust container limits on WSL2. Control the VM or use self-throttling tools.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  2. Disk performance: &lt;code&gt;/mnt/c&lt;/code&gt; vs native WSL filesystem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The symptom
&lt;/h3&gt;

&lt;p&gt;Working in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/mnt/c/Users/you/projects
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;npm install&lt;/code&gt; → 8 minutes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;git status&lt;/code&gt; → 4 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Move to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/projects
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;npm install&lt;/code&gt; → 25 seconds&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;git status&lt;/code&gt; → instant&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The root cause
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;/mnt/c&lt;/code&gt; uses the &lt;strong&gt;9P protocol&lt;/strong&gt; → every filesystem call crosses the Windows ↔ Linux boundary.&lt;/p&gt;

&lt;p&gt;Heavy IO workloads (Node, Git, Docker builds) get destroyed by latency.&lt;/p&gt;

&lt;p&gt;Native WSL FS = ext4 inside VHDX → near-native Linux speed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real benchmark
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;&lt;code&gt;/mnt/c&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;WSL native&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;npm install&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~8 min&lt;/td&gt;
&lt;td&gt;~25 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;git status&lt;/code&gt; (10k files)&lt;/td&gt;
&lt;td&gt;~4 sec&lt;/td&gt;
&lt;td&gt;&amp;lt; 100 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;docker build&lt;/code&gt; context&lt;/td&gt;
&lt;td&gt;~90 sec&lt;/td&gt;
&lt;td&gt;~3 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  What actually works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Rule:&lt;/strong&gt; keep code inside WSL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/projects/myapp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Access options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VS Code + WSL extension ✅ (best)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;\\wsl.localhost\Ubuntu\home\you\projects&lt;/code&gt; from Explorer (OK; the older &lt;code&gt;\\wsl$&lt;/code&gt; prefix still works)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If you MUST use &lt;code&gt;/mnt/c&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Move heavy dirs (&lt;code&gt;node_modules&lt;/code&gt;, &lt;code&gt;.git&lt;/code&gt;) to WSL&lt;/li&gt;
&lt;li&gt;Use symlinks&lt;/li&gt;
&lt;/ul&gt;
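&lt;p&gt;The symlink trick, as a sketch. &lt;code&gt;repo&lt;/code&gt; and &lt;code&gt;stash&lt;/code&gt; are placeholder paths you'd point at your &lt;code&gt;/mnt/c&lt;/code&gt; checkout and a directory on the WSL side:&lt;/p&gt;

```shell
# Move the IO-heavy directory into WSL's native filesystem,
# then link it back into the repo on /mnt/c.
repo="${repo:-$(mktemp -d)/myapp}"      # e.g. /mnt/c/Users/you/projects/myapp
stash="${stash:-$(mktemp -d)/myapp}"    # e.g. ~/wsl-stash/myapp

mkdir -p "$repo" "$stash/node_modules"
rm -rf "$repo/node_modules"
ln -s "$stash/node_modules" "$repo/node_modules"
```

&lt;p&gt;Node still sees &lt;code&gt;./node_modules&lt;/code&gt;, but the actual reads and writes land on ext4.&lt;/p&gt;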

&lt;h3&gt;
  
  
  Key takeaway
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;/mnt/c&lt;/code&gt; is for compatibility, not performance.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  3. Networking: ports, shifting IPs, and host access
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The symptom
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;localhost:3000&lt;/code&gt; → sometimes works, sometimes not&lt;/li&gt;
&lt;li&gt;WSL IP changes every reboot&lt;/li&gt;
&lt;li&gt;LAN access → broken&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The root cause
&lt;/h3&gt;

&lt;p&gt;WSL2 networking is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NATed via Hyper-V&lt;/li&gt;
&lt;li&gt;Not bridged&lt;/li&gt;
&lt;li&gt;Dynamic IP&lt;/li&gt;
&lt;li&gt;Partial &lt;code&gt;localhost&lt;/code&gt; forwarding&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What actually works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Ensure localhost forwarding&lt;/strong&gt; (it defaults to on, but making it explicit rules it out as the culprit):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[wsl2]&lt;/span&gt;
&lt;span class="py"&gt;localhostForwarding&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Get WSL IP:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ip addr show eth0 | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'inet '&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $2}'&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;/ &lt;span class="nt"&gt;-f1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Expose to LAN (port proxy):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$wslIP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wsl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-I&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="nx"&gt;netsh&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;interface&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;portproxy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;add&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;v4tov4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;listenport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;listenaddress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="n"&gt;connectport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;connectaddress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$wslIP&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="n"&gt;New-NetFirewallRule&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DisplayName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"WSL 3000"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Direction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Inbound&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-LocalPort&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;3000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Protocol&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;TCP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Action&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Allow&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best solution (WSL 2.0+ on Windows 11 22H2 or later):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[wsl2]&lt;/span&gt;
&lt;span class="py"&gt;networkingMode&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;mirrored&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;✔ Same network as host&lt;/li&gt;
&lt;li&gt;✔ No NAT issues&lt;/li&gt;
&lt;li&gt;✔ LAN works directly&lt;/li&gt;
&lt;/ul&gt;
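&lt;p&gt;You can confirm which mode you're actually in from inside the distro; &lt;code&gt;wslinfo&lt;/code&gt; ships with WSL 2.0+:&lt;/p&gt;

```shell
wslinfo --networking-mode   # prints "mirrored" or "nat"
```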

&lt;h3&gt;
  
  
  Key takeaway
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;WSL2 networking = NAT by default. Use mirrored mode if available.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  4. Memory: the &lt;code&gt;vmmem&lt;/code&gt; problem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The symptom
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Start day → 4 GB used&lt;/li&gt;
&lt;li&gt;Work with Docker&lt;/li&gt;
&lt;li&gt;Stop everything&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vmmem&lt;/code&gt; still using 12 GB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Never released.&lt;/p&gt;

&lt;h3&gt;
  
  
  The root cause
&lt;/h3&gt;

&lt;p&gt;WSL2:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allocates memory dynamically&lt;/li&gt;
&lt;li&gt;Does not release it back&lt;/li&gt;
&lt;li&gt;Linux keeps cache (normal behavior)&lt;/li&gt;
&lt;li&gt;Windows cannot reclaim it&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What actually works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cap memory:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[wsl2]&lt;/span&gt;
&lt;span class="py"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;8GB&lt;/span&gt;
&lt;span class="py"&gt;swap&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;4GB&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Enable auto reclaim:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[experimental]&lt;/span&gt;
&lt;span class="py"&gt;autoMemoryReclaim&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;gradual&lt;/span&gt;
&lt;span class="py"&gt;sparseVhd&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Manual reclaim:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"echo 3 &amp;gt; /proc/sys/vm/drop_caches"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Last resort:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;wsl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--shutdown&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key takeaway
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;WSL will NOT give memory back unless you force it or configure it.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Minimal &lt;code&gt;.wslconfig&lt;/code&gt; (recommended)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[wsl2]&lt;/span&gt;
&lt;span class="py"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;8GB&lt;/span&gt;
&lt;span class="py"&gt;processors&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;
&lt;span class="py"&gt;swap&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;2GB&lt;/span&gt;

&lt;span class="py"&gt;localhostForwarding&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;networkingMode&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;mirrored&lt;/span&gt;

&lt;span class="nn"&gt;[experimental]&lt;/span&gt;
&lt;span class="py"&gt;autoMemoryReclaim&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;gradual&lt;/span&gt;
&lt;span class="py"&gt;sparseVhd&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📍 &lt;strong&gt;Path:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;C:&lt;span class="se"&gt;\U&lt;/span&gt;sers&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;you&amp;gt;&lt;span class="se"&gt;\.&lt;/span&gt;wslconfig
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;wsl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--shutdown&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
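Once the VM comes back up, you can confirm the cap from inside the distro. A minimal check, assuming a standard Linux `/proc` filesystem (the numbers should roughly match the `memory` and `processors` values above, minus a little VM overhead):

```python
import os

# Total memory the VM was booted with, reported in kB by the kernel
with open("/proc/meminfo") as f:
    mem_total_kb = int(next(line for line in f if line.startswith("MemTotal")).split()[1])

print(f"Memory visible to WSL2: {mem_total_kb / 1024 / 1024:.1f} GiB")
print(f"Processors visible to WSL2: {os.cpu_count()}")
```

If the numbers don't change after `wsl --shutdown`, the usual culprit is a `.wslconfig` saved in the wrong location or with a hidden `.txt` extension.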






&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;Most of these issues come from one fact:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;WSL2 is a VM pretending to be native Linux.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And the cracks show when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You push CPU&lt;/li&gt;
&lt;li&gt;You do heavy IO&lt;/li&gt;
&lt;li&gt;You rely on networking assumptions&lt;/li&gt;
&lt;li&gt;You expect Linux memory behavior to match Windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;WSL2 is still excellent — but only if you understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cgroups quirks&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;/mnt/c&lt;/code&gt; performance trap&lt;/li&gt;
&lt;li&gt;NAT networking&lt;/li&gt;
&lt;li&gt;memory ballooning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you do, most "random issues" become predictable.&lt;/p&gt;




&lt;p&gt;If you've hit other WSL2 gotchas, drop them in the comments 👇&lt;/p&gt;

&lt;p&gt;The one that surprised me most? Spending 3 days tuning container limits… that were being completely ignored.&lt;/p&gt;

&lt;p&gt;💡 &lt;em&gt;Did this save you an afternoon? A follow or reaction helps me write more of these.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>wsl2</category>
      <category>docker</category>
      <category>devops</category>
      <category>microsoft</category>
    </item>
    <item>
      <title>Everyone is using Claude Code wrong.</title>
      <dc:creator>Alvarito1983</dc:creator>
      <pubDate>Mon, 20 Apr 2026 14:17:15 +0000</pubDate>
      <link>https://dev.to/alvarito1983/everyone-is-using-claude-code-wrong-1i26</link>
      <guid>https://dev.to/alvarito1983/everyone-is-using-claude-code-wrong-1i26</guid>
      <description>&lt;p&gt;Not because they're bad developers. Because the mental model is wrong from the start.&lt;/p&gt;

&lt;p&gt;Most people open Claude Code and treat it like a smarter autocomplete. They ask it to write a function. Fix a bug. Generate a component. Then they wonder why the output needs so much correction, why the context gets lost after a few sessions, why it feels like babysitting rather than collaborating.&lt;/p&gt;

&lt;p&gt;The problem isn't Claude Code. It's the job description you gave it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The wrong mental model
&lt;/h2&gt;

&lt;p&gt;When you use Claude Code as a coding assistant, you're asking it to be a fast typist. It writes code, you review it, you correct it, you move on.&lt;/p&gt;

&lt;p&gt;That works. It's faster than writing everything yourself. But it's not where the leverage is.&lt;/p&gt;

&lt;p&gt;The leverage is in treating Claude Code as a contractor, not a typist. A contractor who can execute an entire feature end-to-end — backend, frontend, tests, documentation — if you give them the right briefing.&lt;/p&gt;

&lt;p&gt;The difference between a typist and a contractor isn't skill. It's context.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Claude Code actually needs
&lt;/h2&gt;

&lt;p&gt;Claude Code has no persistent memory between sessions. Every time you start a new conversation, it starts completely fresh. It doesn't know your architecture decisions, your naming conventions, your known bugs, or where you left off last time.&lt;/p&gt;

&lt;p&gt;Most people solve this by re-explaining everything at the start of each session. That's the wrong solution. It's slow, it's incomplete, and you always forget something important.&lt;/p&gt;

&lt;p&gt;The right solution is a single file that does the re-explaining for you.&lt;/p&gt;

&lt;p&gt;I call it CLAUDE.md. It lives at the root of every project. Claude Code reads it automatically at the start of every session.&lt;/p&gt;

&lt;p&gt;It has four sections:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current State&lt;/strong&gt; — what's working, what's broken, where to pick up. Updated at the end of every session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture Decisions&lt;/strong&gt; — not what you built, but why. "We use in-memory sessions because adding a persistence layer would complicate the Docker setup for homelab users." The reasoning that would otherwise live only in your head.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conventions&lt;/strong&gt; — specific rules Claude Code must follow. Not "write clean code." That's useless. Instead: exact patterns learned from real debugging sessions. Things that would take 30 minutes to rediscover without documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Known Issues&lt;/strong&gt; — bugs and limitations that are known but not yet fixed. Prevents Claude Code from "fixing" something intentionally left as-is.&lt;/p&gt;

&lt;p&gt;Without this file, every session starts at zero. With it, Claude Code picks up exactly where you left off.&lt;/p&gt;
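Put together, the skeleton is small. A minimal sketch — the section names come from this article, the bullet contents are illustrative, not prescriptive:

```markdown
# CLAUDE.md

## Current State — last updated [date]
- What's working, what's broken, where to pick up next session

## Architecture Decisions
- Each decision with the *why* and the trade-offs behind it

## Conventions
- Exact rules to follow, learned from real debugging sessions

## Known Issues
- Bugs and limitations that are known and intentionally unfixed
```

The file stays short on purpose: it's a briefing, not documentation.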




&lt;h2&gt;
  
  
  The second mistake: one task at a time
&lt;/h2&gt;

&lt;p&gt;Most people use Claude Code sequentially. Finish one thing, start the next.&lt;/p&gt;

&lt;p&gt;Claude Code supports sub-agents — multiple parallel workstreams running simultaneously. If you need to build five related components, you don't have to build them one by one.&lt;/p&gt;

&lt;p&gt;The mental shift is from "what's the next task" to "what can run in parallel." A frontend and a backend for the same feature. Unit tests while the implementation is being written. Documentation while the code is being reviewed.&lt;/p&gt;

&lt;p&gt;This is where the output multiplier kicks in. Not 2x faster — closer to 5x, because the bottleneck is no longer Claude Code's speed. It's your ability to architect work that can run in parallel.&lt;/p&gt;




&lt;h2&gt;
  
  
  The third mistake: no standards document
&lt;/h2&gt;

&lt;p&gt;When you use Claude Code across multiple sessions or multiple parts of a project, consistency breaks down fast. The button style from session one doesn't match session three. The error handling pattern in one module contradicts another.&lt;/p&gt;

&lt;p&gt;The fix is a standards document in CLAUDE.md. Not a style guide for humans — a set of rules Claude Code must follow exactly, every time, without being reminded.&lt;/p&gt;

&lt;p&gt;Color values. Component patterns. API response shapes. Auth flows. Every decision that should be consistent across the codebase, written down once, enforced automatically.&lt;/p&gt;

&lt;p&gt;Without it, you spend half your time correcting drift. With it, Claude Code enforces your standards better than you would yourself — because it doesn't get tired or forget.&lt;/p&gt;




&lt;h2&gt;
  
  
  The fourth mistake: no forcing function for updates
&lt;/h2&gt;

&lt;p&gt;The obvious weakness of CLAUDE.md is staleness. If you don't update it, it becomes worse than useless — it confidently points Claude Code in the wrong direction.&lt;/p&gt;

&lt;p&gt;Discipline doesn't work. The sessions where you forget to update are exactly the sessions where something important happened.&lt;/p&gt;

&lt;p&gt;The fix is simple: at the end of every session, ask Claude Code to update the Current State section before closing.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Before we finish, update the Current State section in CLAUDE.md to reflect what we did today, what's working, and where to pick up next time."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Thirty seconds. While the context is still fresh. The model is better at summarizing what just happened than you are at remembering to write it down.&lt;/p&gt;

&lt;p&gt;It catches maybe 80% of sessions. The other 20% are interrupted sessions — closed terminal, crashed IDE, ran out of time. Those you can't fully solve. But 80% is enough to make the system work.&lt;/p&gt;




&lt;h2&gt;
  
  
  What changes when you get this right
&lt;/h2&gt;

&lt;p&gt;The output doesn't feel like AI-assisted coding anymore. It feels like having a contractor who knows your codebase, follows your standards, picks up exactly where you left off, and can work on multiple things at once.&lt;/p&gt;

&lt;p&gt;The work you actually do shifts. Less time writing boilerplate. Less time correcting style drift. Less time re-explaining context. More time on architecture, on product decisions, on the problems that actually require human judgment.&lt;/p&gt;

&lt;p&gt;That's the job description Claude Code was built for. Not typist. Architect's executor.&lt;/p&gt;

&lt;p&gt;Most people never get there because they never give it the right briefing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The CLAUDE.md approach and sub-agent patterns came out of building a 15-tool Docker management platform over several weeks. If you want the specifics on how to structure the file, I wrote about it here: [link to previous article]&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
    <item>
      <title>I let Claude AI decide whether to patch my Docker vulnerabilities — here's what it found</title>
      <dc:creator>Alvarito1983</dc:creator>
      <pubDate>Sat, 18 Apr 2026 18:02:54 +0000</pubDate>
      <link>https://dev.to/alvarito1983/i-let-claude-ai-decide-whether-to-patch-my-docker-vulnerabilities-heres-what-it-found-4dpf</link>
      <guid>https://dev.to/alvarito1983/i-let-claude-ai-decide-whether-to-patch-my-docker-vulnerabilities-heres-what-it-found-4dpf</guid>
      <description>&lt;p&gt;Every security scanner will tell you what's vulnerable.&lt;/p&gt;

&lt;p&gt;None of them will tell you what to actually do about it.&lt;/p&gt;

&lt;p&gt;You get a list. CVE IDs, severity badges, affected packages. Then you're alone with the questions that actually matter: Is this exploitable in my setup? Can I safely apply the fix? Does patching it break anything?&lt;/p&gt;

&lt;p&gt;I've been building a self-hosted Docker management platform. Last week I wired up an AI layer on top of the vulnerability scanner — not to automate patching, but to automate the reasoning about whether to patch. Here's what happened.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with "critical"
&lt;/h2&gt;

&lt;p&gt;Critical doesn't mean the same thing in every context.&lt;/p&gt;

&lt;p&gt;CVE-2025-58050 is a critical vulnerability in pcre2. On paper: patch immediately. In practice: it's in a socket proxy image maintained by a third party. I don't control that Dockerfile. I can pull the latest image and hope the maintainer already shipped the fix — or I can wait. Neither option is obvious from the CVE report alone.&lt;/p&gt;

&lt;p&gt;CVE-2026-27143 is a critical vulnerability in Go stdlib. Fix available: upgrade to 1.25.9. Sounds straightforward. The complication: the binary that ships this version of stdlib is cloudflared, Cloudflare's tunnel client. I didn't write it. I can't easily recompile it. The fix depends on Cloudflare publishing an updated release.&lt;/p&gt;

&lt;p&gt;Both are "critical." Neither has the same answer. A scanner can't tell you that. A rule can't tell you that. It requires judgment.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Level 1 does
&lt;/h2&gt;

&lt;p&gt;The first layer of automation is rule-based. No AI, no API calls, no external dependencies.&lt;/p&gt;

&lt;p&gt;When a scan completes and finds a critical vulnerability, Level 1 fires an alert. It knows: this image has N critical CVEs, the threshold is 1, therefore alert. Fast, deterministic, always on.&lt;/p&gt;

&lt;p&gt;This covers the obvious cases. A container with a known critical CVE should be flagged. That part doesn't need AI.&lt;/p&gt;

&lt;p&gt;What Level 1 can't do is answer: &lt;em&gt;should I patch this right now, and at what risk?&lt;/em&gt;&lt;/p&gt;
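The rule itself fits in a couple of lines. A sketch — the threshold value comes from the article, the names are mine:

```python
CRITICAL_THRESHOLD = 1  # one critical CVE is enough to fire an alert

def level1_alert(critical_count: int) -> bool:
    """Deterministic Level 1 rule: alert when an image meets the critical threshold."""
    return critical_count >= CRITICAL_THRESHOLD
```

No model, no network call, no way to be wrong about what it does — which is exactly the point of Level 1.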




&lt;h2&gt;
  
  
  What Level 2 does
&lt;/h2&gt;

&lt;p&gt;When Level 1 detects a critical vulnerability, Level 2 kicks in if an Anthropic API key is configured. It builds a structured prompt with everything the model needs to reason about the situation: the image name, the CVEs, the packages affected, the versions with fixes available, and whether those fixes represent a patch bump, a minor version change, or a major version jump.&lt;/p&gt;
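The patch/minor/major classification is the one piece worth automating carefully before the prompt is built. A sketch of how that comparison might work, assuming plain semver-style version strings — suffixes like `-r0` get truncated here, and the function name is mine, not the platform's:

```python
def classify_bump(current: str, fixed: str) -> str:
    """Classify the upgrade distance between two semver-style versions.
    Sketch only; a real implementation would use a proper version parser."""

    def parts(version: str) -> list:
        nums = []
        for piece in version.split("."):
            digits = ""
            for ch in piece:
                if not ch.isdigit():
                    break  # stop at suffixes like "-r0"
                digits += ch
            nums.append(int(digits) if digits else 0)
        return (nums + [0, 0, 0])[:3]

    cur, fix = parts(current), parts(fixed)
    if fix[0] != cur[0]:
        return "major"
    if fix[1] != cur[1]:
        return "minor"
    return "patch"

# e.g. classify_bump("1.25.7", "1.25.9") -> "patch"
#      classify_bump("10.45-r0", "10.46-r0") -> "minor"
```

That single label does a lot of work downstream: it's the difference between "apply it tonight" and "schedule a maintenance window."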

&lt;p&gt;Claude Haiku then returns a structured analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exploitability&lt;/strong&gt; — is this remotely exploitable or does it require local access?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Urgency&lt;/strong&gt; — how quickly does this need to be addressed?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version Risk&lt;/strong&gt; — what's the upgrade risk? Patch bump vs minor vs major?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix Impact&lt;/strong&gt; — is the fix likely to break anything?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommended Action&lt;/strong&gt; — a concrete next step with reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This analysis goes into the notification. The email you receive isn't "critical CVE detected." It's a security brief.&lt;/p&gt;
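In concrete terms, a structured response with those five fields is easy to validate before it reaches the notification layer. A sketch — the field names and values here are my own illustration of the article's categories, not the platform's actual schema:

```python
import json

# Hypothetical example of a Level 2 analysis payload
raw = """{
  "exploitability": "remote",
  "urgency": "critical",
  "version_risk": "patch",
  "fix_impact": "low",
  "recommended_action": "Defer until the upstream vendor ships an updated binary."
}"""

REQUIRED_FIELDS = {"exploitability", "urgency", "version_risk",
                   "fix_impact", "recommended_action"}

def parse_analysis(text: str) -> dict:
    """Parse the model's JSON and fail loudly if any field is missing."""
    analysis = json.loads(text)
    missing = REQUIRED_FIELDS - analysis.keys()
    if missing:
        raise ValueError(f"analysis missing fields: {missing}")
    return analysis

analysis = parse_analysis(raw)
```

Validating the shape up front matters because a malformed response should degrade to a plain Level 1 alert, never to a silent drop.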




&lt;h2&gt;
  
  
  What it found on my stack
&lt;/h2&gt;

&lt;p&gt;Three critical vulnerabilities across two images. Here's what the AI analysis said about each:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CVE-2026-27143 and GHSA-p77j-4mvh-x3m3&lt;/strong&gt; — both in a tunnel manager image, both in third-party binaries (Go stdlib and gRPC shipped inside cloudflared). Exploitability: remote. Urgency: critical. Version risk: patch bump. Fix impact: low — &lt;em&gt;but the fix depends on the upstream vendor shipping an updated binary. Recommended action: defer until the vendor publishes an updated release, monitor for new cloudflared versions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CVE-2025-58050&lt;/strong&gt; — in a socket proxy image maintained by a third party. Package: pcre2. Fix available: 10.46-r0. Exploitability: remote. Urgency: critical. Fix impact: low. Recommended action: &lt;em&gt;pull the latest image version to pick up the fix if the maintainer has already shipped it. If not, defer and accept the risk with documentation.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In both cases the AI correctly identified that I don't control the vulnerable code. It didn't tell me to patch something I can't patch. It told me what I actually needed to know: these are third-party dependencies, here's the risk profile, here's what to do while you wait.&lt;/p&gt;




&lt;h2&gt;
  
  
  The part I didn't expect
&lt;/h2&gt;

&lt;p&gt;The most useful output wasn't the vulnerability analysis. It was the differentiation between "you can fix this" and "you're waiting on someone else."&lt;/p&gt;

&lt;p&gt;That distinction is obvious to a human who investigates the CVE. It's not obvious to a scanner. It requires knowing what the affected binary is, who maintains it, and whether the fix is in your control.&lt;/p&gt;

&lt;p&gt;The AI got this right without me telling it explicitly. It reasoned from the package name and context to the correct conclusion about ownership and fixability.&lt;/p&gt;

&lt;p&gt;That's the gap that rules-based automation can't close. You can write a rule that says "alert on critical CVEs." You can't write a rule that says "if the vulnerable binary is a third-party dependency with no upstream fix available, recommend deferral with documentation."&lt;/p&gt;




&lt;h2&gt;
  
  
  What I learned about AI in security workflows
&lt;/h2&gt;

&lt;p&gt;The value isn't automating the patch. It's automating the triage.&lt;/p&gt;

&lt;p&gt;A human security engineer looking at these three CVEs would spend 20-30 minutes researching each one: checking exploitability databases, looking at the upstream project's changelog, assessing version risk, writing up a recommendation. The AI does this in seconds, for every scan, every time.&lt;/p&gt;

&lt;p&gt;The output isn't always perfect. The model can misread version risk or miss context about your specific setup. Every decision is visible in the feed with the full reasoning attached — you can always override, and you should review anything the model flags as critical before acting.&lt;/p&gt;

&lt;p&gt;But the triage is right often enough to dramatically reduce the cognitive load of managing vulnerabilities across a fleet of containers. You stop reading CVE lists and start reading executive summaries with recommended actions.&lt;/p&gt;

&lt;p&gt;That's a different kind of tool.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building this as part of an open source self-hosted Docker management platform. If you're working on similar infrastructure automation problems, drop a comment.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>security</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>CLAUDE.md: the file that makes AI actually remember what you built and why</title>
      <dc:creator>Alvarito1983</dc:creator>
      <pubDate>Fri, 17 Apr 2026 14:51:25 +0000</pubDate>
      <link>https://dev.to/alvarito1983/claudemd-the-file-that-makes-ai-actually-remember-what-you-built-and-why-228d</link>
      <guid>https://dev.to/alvarito1983/claudemd-the-file-that-makes-ai-actually-remember-what-you-built-and-why-228d</guid>
      <description>&lt;p&gt;Every AI coding session starts the same way.&lt;/p&gt;

&lt;p&gt;You open a new chat. You explain the project. You explain the stack. You explain the decisions you made last week and why. You spend 15 minutes giving context before writing a single line of code.&lt;/p&gt;

&lt;p&gt;Then the session ends. The context disappears. Next time, you start over.&lt;/p&gt;

&lt;p&gt;I got tired of this. So I built a system around a single file called CLAUDE.md — and it changed how I work with AI completely.&lt;/p&gt;




&lt;h2&gt;
  
  
  What CLAUDE.md is
&lt;/h2&gt;

&lt;p&gt;It's a plain text file that lives at the root of every project I work on. Claude Code reads it automatically at the start of every session.&lt;/p&gt;

&lt;p&gt;Not a README. Not documentation for other developers. This file is written specifically for the AI — it contains everything the model needs to pick up exactly where we left off without me having to re-explain anything.&lt;/p&gt;

&lt;p&gt;The difference sounds small. It isn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  What goes in it
&lt;/h2&gt;

&lt;p&gt;The file has four sections that I've refined over months of daily use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current State&lt;/strong&gt; is the entry point. It's the first thing Claude Code reads and it answers three questions: what's working, what's broken, and where to pick up next session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Current State — last updated [date]&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; What's working: Hub SSO, all agents, Log Center
&lt;span class="p"&gt;-&lt;/span&gt; What's broken: standalone tool still on old version
&lt;span class="p"&gt;-&lt;/span&gt; Next session: bump versions, publish release
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This section gets updated at the end of every session. More on that later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture Decisions&lt;/strong&gt; is where I explain the &lt;em&gt;why&lt;/em&gt;, not the &lt;em&gt;what&lt;/em&gt;. Not "we use JWT auth" but "we use JWT auth with in-memory sessions because adding a persistence layer would complicate the Docker setup for homelab users — revisit when user base grows." The reasoning, the trade-offs, the constraints that shaped the decision.&lt;/p&gt;

&lt;p&gt;This is the section that saves me from re-litigating decisions. When Claude Code suggests something that contradicts a past decision, it sees the reasoning and understands why the alternative was rejected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conventions&lt;/strong&gt; is a list of rules Claude Code must follow. Specific ones, not generic ones. Not "write clean code" — that's useless. Instead: things learned from actual debugging sessions that would take 30 minutes to rediscover. Exact patterns that must be followed. Edge cases that look fixable but aren't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Known Issues&lt;/strong&gt; is a list of bugs and limitations that are known but not yet fixed. This prevents Claude Code from "fixing" something that's intentionally left as-is, or spending time diagnosing something I already understand.&lt;/p&gt;




&lt;h2&gt;
  
  
  The forcing function problem
&lt;/h2&gt;

&lt;p&gt;The obvious weakness: if you don't update the file, it goes stale. And stale context is worse than no context — it confidently points the AI in the wrong direction.&lt;/p&gt;

&lt;p&gt;I tried discipline. It doesn't work reliably. The sessions where you forget to update are exactly the sessions where something important happened — a late-night debugging run, an interrupted session, a decision made in conversation that never touched the codebase.&lt;/p&gt;

&lt;p&gt;The solution I landed on: ask Claude Code to update the Current State block at the end of every session before closing.&lt;/p&gt;

&lt;p&gt;The prompt is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Before we finish, update the Current State section in CLAUDE.md to reflect what we did today, what's working, and where to pick up next time."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It takes 30 seconds. It happens while the context is still fresh. And the model is better at summarizing what just happened than I am at remembering to write it down.&lt;/p&gt;

&lt;p&gt;This catches maybe 80% of sessions that would otherwise leave stale state. The other 20% are interrupted sessions — closed terminal, crashed IDE, ran out of time. Those you can't fully solve. But 80% is enough to make the system work.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem it doesn't solve
&lt;/h2&gt;

&lt;p&gt;There's a class of architectural knowledge that CLAUDE.md can't easily capture: the implicit decisions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"We avoid Y because of the incident in March"&lt;/em&gt; is the most important kind of architectural knowledge and the hardest to write down. It's not a pattern — it's a scar. The context that makes it meaningful lives in someone's memory, or in a post-mortem that nobody links to the codebase.&lt;/p&gt;

&lt;p&gt;A model reading CLAUDE.md can only match against what got written. If the decision was implicit — understood by everyone who was there, never documented because it seemed obvious at the time — the model has no surface to match against.&lt;/p&gt;

&lt;p&gt;My partial fix: at the end of sessions I ask Claude Code:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Is there anything we decided today that we'd regret not documenting in six months?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It catches some of it. Not all. But it asks in the right direction — not "what did we do" but "what would we wish we'd written down."&lt;/p&gt;




&lt;h2&gt;
  
  
  What it looks like in practice
&lt;/h2&gt;

&lt;p&gt;I'm building a self-hosted Docker management platform — 13 tools, each with its own frontend, backend, agent, and central Hub integration. The kind of project where losing context between sessions would be catastrophic.&lt;/p&gt;

&lt;p&gt;With CLAUDE.md, a new session starts like this: Claude Code reads the file, understands the current state of all 13 tools, knows the conventions for auth patterns and Docker socket connections and visual standards, and picks up exactly where we left off. No re-explanation. No re-litigating past decisions.&lt;/p&gt;

&lt;p&gt;Today's session: 7 new tools built from scratch, all integrated into the ecosystem, all following the same design system and agent patterns. Claude Code maintained consistency across all of them because the standards were written down, not carried in my head.&lt;/p&gt;

&lt;p&gt;Without CLAUDE.md, that consistency would have required constant correction. With it, the model enforces the standards itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  The one-line version
&lt;/h2&gt;

&lt;p&gt;CLAUDE.md is not a README. It's not documentation. It's the answer to the question the AI needs to ask at the start of every session:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"What do I need to know to be useful right now?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Write it for that question. Update it every session. The compounding effect over months of development is hard to overstate.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building an open source self-hosted Docker ecosystem. If you're interested in the project or in how I use Claude Code to build it, follow along.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>claudecode</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>I added autonomous AI agents to my self-hosted Docker platform — here's what Level 1 vs Level 2 autonomy actually means</title>
      <dc:creator>Alvarito1983</dc:creator>
      <pubDate>Thu, 16 Apr 2026 07:07:02 +0000</pubDate>
      <link>https://dev.to/alvarito1983/i-added-autonomous-ai-agents-to-my-self-hosted-docker-platform-heres-what-level-1-vs-level-2-5dde</link>
      <guid>https://dev.to/alvarito1983/i-added-autonomous-ai-agents-to-my-self-hosted-docker-platform-heres-what-level-1-vs-level-2-5dde</guid>
      <description>&lt;p&gt;Managing a Docker ecosystem manually doesn't scale. Not because the tools are bad — Docker is fine, Compose is fine — but because the cognitive load of watching six services across multiple hosts adds up fast. You miss things. A container dies at 3am. A CVE sits unactioned for two weeks because you were heads-down on something else. An SSL cert expires because the renewal check was a cron job you forgot about.&lt;/p&gt;

&lt;p&gt;I've been building a self-hosted Docker management platform for the past few months. Last week I added something I'd been putting off: autonomous agents. Not AI wrappers around Docker commands. Actual agents that monitor, decide, and act — with two distinct levels of autonomy depending on the situation.&lt;/p&gt;

&lt;p&gt;Here's what I learned building them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllsp7ii6jn10hunx3pl6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllsp7ii6jn10hunx3pl6.png" alt=" " width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with "smart" monitoring
&lt;/h2&gt;

&lt;p&gt;Most monitoring tools solve the detection problem. They tell you something is wrong. Then you still have to act.&lt;/p&gt;

&lt;p&gt;That's fine for a team with an on-call rotation. For a solo homelab operator or a small team without 24/7 coverage, detection without action means you're still waking up at 3am — just now with a notification in your hand.&lt;/p&gt;

&lt;p&gt;What I wanted was a system that could handle the obvious cases automatically, and only escalate to me when the situation genuinely required human judgment.&lt;/p&gt;

&lt;p&gt;That distinction — obvious cases vs judgment calls — is where the Level 1 / Level 2 split comes from.&lt;/p&gt;




&lt;h2&gt;
  
  
  Level 1 — Rule-based, no AI, always on
&lt;/h2&gt;

&lt;p&gt;Level 1 agents operate on pure logic. No API calls, no model inference, no external dependencies. They run on a poll interval and apply deterministic rules.&lt;/p&gt;

&lt;p&gt;Examples of what Level 1 handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A container exits unexpectedly → restart it&lt;/li&gt;
&lt;li&gt;An image has a new digest available → flag it for review&lt;/li&gt;
&lt;li&gt;A monitor has failed N consecutive checks → fire an alert&lt;/li&gt;
&lt;li&gt;A notification channel has been disabled → warn that delivery is impossible&lt;/li&gt;
&lt;li&gt;An SSL cert expires in less than X days → escalate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These cases have clear right answers. A container that died should come back up. A cert expiring in 7 days needs attention. No judgment required.&lt;/p&gt;

&lt;p&gt;Level 1 is the baseline. It works without any API key, without any configuration beyond enabling it. The idea is that anyone who installs the platform gets meaningful automation out of the box.&lt;/p&gt;

&lt;p&gt;One thing I got wrong initially: I had Level 1 acting too fast. A container that exits and restarts within Docker's own backoff window doesn't need the agent to intervene — Docker already handles it. The fix was adding a minimum dead time before the agent acts. If the container has been down for less than 60 seconds, wait. Docker might already be handling it.&lt;/p&gt;

&lt;p&gt;That &lt;code&gt;minDeadTime&lt;/code&gt; parameter turned out to be one of the most important tuning knobs in the whole system.&lt;/p&gt;
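The gate itself is tiny. A sketch, assuming the agent knows when the container last exited — the 60-second figure is the example from above, and the function name is mine:

```python
from datetime import datetime, timedelta, timezone

MIN_DEAD_TIME = timedelta(seconds=60)

def should_intervene(finished_at: datetime, now: datetime) -> bool:
    """Level 1 gate: only restart a dead container once it has been down
    longer than minDeadTime, so Docker's own restart policy gets first try."""
    return (now - finished_at) >= MIN_DEAD_TIME

# A container that died 10 seconds ago is left alone;
# one that has been down for 5 minutes gets restarted.
now = datetime.now(timezone.utc)
recent_crash = now - timedelta(seconds=10)
long_dead = now - timedelta(minutes=5)
```

The knob matters because acting too early doesn't just duplicate Docker's work — it can race with it, restarting a container that Docker is already bringing up.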




&lt;h2&gt;
  
  
  Level 2 — Claude Haiku decides when it's ambiguous
&lt;/h2&gt;

&lt;p&gt;Level 2 activates when a situation falls outside the clear rules. The trigger is usually a pattern that looks bad but might have a legitimate explanation.&lt;/p&gt;

&lt;p&gt;The clearest example: a container that keeps crashing.&lt;/p&gt;

&lt;p&gt;Level 1 will restart a crashed container. But what if it crashes again? And again? At some point, restarting a container in a crash loop isn't helpful — you're just burning resources and masking a real problem. But stopping it entirely might break something that depends on it.&lt;/p&gt;

&lt;p&gt;This is a judgment call. The right answer depends on context: what's in the logs, how many times it's restarted in the last hour, what the exit codes look like, whether this is a critical service or a background worker.&lt;/p&gt;

&lt;p&gt;Level 2 passes that context to Claude Haiku and asks for a decision: &lt;strong&gt;restart&lt;/strong&gt;, &lt;strong&gt;escalate&lt;/strong&gt;, or &lt;strong&gt;force-stop&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The prompt includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Container name and image&lt;/li&gt;
&lt;li&gt;Restart policy&lt;/li&gt;
&lt;li&gt;Exit code history&lt;/li&gt;
&lt;li&gt;Last 20 lines of logs (capped at 1500 chars)&lt;/li&gt;
&lt;li&gt;Current restart count within the window&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude returns a JSON decision with an action and a reason. That reason gets surfaced in the UI so the operator can see why the agent acted the way it did.&lt;/p&gt;

&lt;p&gt;A few things I got right here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fallback to escalate on any error.&lt;/strong&gt; If the API call fails, times out, or returns something unparseable, the agent escalates rather than retrying or doing nothing. Conservative failure mode.&lt;/p&gt;
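&lt;p&gt;That parse-and-fallback step can be sketched like this, assuming the &lt;code&gt;{action, reason}&lt;/code&gt; JSON schema described above (the helper name is hypothetical):&lt;/p&gt;

```python
import json

VALID_ACTIONS = {"restart", "escalate", "force-stop"}

def parse_decision(raw: str) -> dict:
    """Parse the model's JSON reply; anything unusable collapses to escalate."""
    try:
        decision = json.loads(raw)
        if isinstance(decision, dict) and decision.get("action") in VALID_ACTIONS:
            return decision
    except json.JSONDecodeError:
        pass
    # Conservative failure mode: a human looks at it instead of the agent guessing.
    return {"action": "escalate", "reason": "model response was unusable"}
```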

&lt;p&gt;&lt;strong&gt;Guard against decision spam.&lt;/strong&gt; The agent tracks its Claude decisions per container. If it already asked Claude about this container at restart count N, it won't ask again until the count changes. Without this, a stuck container would generate an API call every 30 seconds.&lt;/p&gt;
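&lt;p&gt;The guard is just a per-container memo of the restart count at the last question. A sketch (the cache shape is an assumption):&lt;/p&gt;

```python
# container name -> restart count at which the model was last consulted
_last_asked = {}

def needs_new_decision(container: str, restart_count: int) -> bool:
    """Ask the model at most once per restart count per container."""
    if _last_asked.get(container) == restart_count:
        return False  # same situation as before; don't burn another API call
    _last_asked[container] = restart_count
    return True
```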

&lt;p&gt;&lt;strong&gt;The AI badge in the feed.&lt;/strong&gt; When Claude made a decision, the action in the feed shows an AI badge and the model's reasoning inline. Operators can see "Claude decided to escalate this because the logs show a repeated OOM pattern" rather than just "escalated."&lt;/p&gt;




&lt;h2&gt;
  
  
  The demo that made it real
&lt;/h2&gt;

&lt;p&gt;The moment the system clicked for me was this test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker stop my-service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Within 90 seconds:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The container agent detected the exit (exit code 137: SIGKILL, which &lt;code&gt;docker stop&lt;/code&gt; sends if the process outlives the SIGTERM grace period)&lt;/li&gt;
&lt;li&gt;Waited for &lt;code&gt;minDeadTime&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Restarted the container automatically&lt;/li&gt;
&lt;li&gt;Logged the action: &lt;code&gt;Auto-restarted · exitCode=137 · policy=unless-stopped · restart #1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The orchestrator detected the service coming back online&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No intervention. No notification. It just handled it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Exit code 137 is &lt;code&gt;128 + 9&lt;/code&gt; — SIGKILL. That usually means an external stop rather than an application crash (the kernel's OOM killer also uses SIGKILL, which is why the logs matter). In production you'd probably exclude manually stopped containers from auto-restart. That's what the exclusion list is for.&lt;/p&gt;
&lt;/blockquote&gt;
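&lt;p&gt;The &lt;code&gt;128 + N&lt;/code&gt; convention is easy to decode mechanically. An illustrative helper, not part of the platform:&lt;/p&gt;

```python
# Exit codes above 128 follow the 128 + N convention,
# where N is the number of the signal that killed the process.
SIGNAL_NAMES = {9: "SIGKILL", 15: "SIGTERM"}

def exit_signal(code: int):
    """Return the signal behind a 128 + N exit code, or None for a plain exit."""
    if code > 128:
        n = code - 128
        return SIGNAL_NAMES.get(n, f"signal {n}")
    return None
```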




&lt;h2&gt;
  
  
  What I built across the ecosystem
&lt;/h2&gt;

&lt;p&gt;Six agents in total, one per tool:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Container Agent&lt;/strong&gt; — watches running containers, restarts crashed ones, escalates crash loops to Level 2.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update Agent&lt;/strong&gt; — monitors image digests for changes, evaluates whether updates are safe to apply automatically or should be deferred for review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monitor Agent&lt;/strong&gt; — analyzes uptime check results, detects three patterns: sustained failure, flapping (up-down-up-down), and SSL expiry. Level 2 handles ambiguous sustained failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scan Agent&lt;/strong&gt; — reads CVE scan results and evaluates severity. Critical vulnerabilities go to Level 2 — the model assesses whether the CVE is likely exploitable in the specific configuration. High vulnerabilities alert directly without AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Notify Agent&lt;/strong&gt; — watches the notification system itself. Catches misconfiguration (disabled channels, no active rules) and anomalies (delivery failure spikes, notification volume spikes).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Master Agent&lt;/strong&gt; — sits at the orchestrator level with visibility across the whole ecosystem. Detects cross-service patterns: ecosystem-wide degradation, event storms, correlated critical events across multiple services.&lt;/p&gt;
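&lt;p&gt;Of the Monitor Agent's three patterns, flapping is the least obvious to detect. A minimal sketch: count up/down transitions over a recent window (the threshold and window size are assumptions, not the platform's defaults):&lt;/p&gt;

```python
def is_flapping(history: list, min_transitions: int = 4) -> bool:
    """history holds oldest-to-newest up/down results for one monitor."""
    transitions = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return transitions >= min_transitions
```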




&lt;h2&gt;
  
  
  What Level 2 is not
&lt;/h2&gt;

&lt;p&gt;It's not a replacement for alerting. The agents escalate to the notification system when they decide a human needs to know. Level 2 makes that escalation smarter — it filters out the cases that don't warrant waking someone up.&lt;/p&gt;

&lt;p&gt;It's not always right. The model can make bad calls, especially with ambiguous logs. That's why every Level 2 decision is visible in the feed with the reasoning attached. You can always override, and you can always clear a decision and let the agent re-evaluate.&lt;/p&gt;

&lt;p&gt;It's not required. The whole system works without an API key. Level 1 covers the clear cases. Level 2 is additive.&lt;/p&gt;




&lt;h2&gt;
  
  
  The thing I didn't expect
&lt;/h2&gt;

&lt;p&gt;Building this forced me to think carefully about what "autonomous" actually means for infrastructure tooling.&lt;/p&gt;

&lt;p&gt;Full autonomy — an agent that can do anything without asking — is the wrong target. The right target is &lt;strong&gt;calibrated autonomy&lt;/strong&gt;: the agent acts confidently on clear cases, hesitates on ambiguous ones, and always leaves a paper trail explaining what it did and why.&lt;/p&gt;

&lt;p&gt;The Level 1 / Level 2 split is really just a formalization of that. Some situations have obvious answers. Some don't. The system should know the difference.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building this as an open-source self-hosted platform. If you're interested in following the progress, drop a comment.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>selfhosted</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>The next phase of AI isn't smarter models. It's infrastructure.</title>
      <dc:creator>Alvarito1983</dc:creator>
      <pubDate>Wed, 15 Apr 2026 06:54:02 +0000</pubDate>
      <link>https://dev.to/alvarito1983/the-next-phase-of-ai-isnt-smarter-models-its-infrastructure-6jl</link>
      <guid>https://dev.to/alvarito1983/the-next-phase-of-ai-isnt-smarter-models-its-infrastructure-6jl</guid>
      <description>&lt;p&gt;Everyone is talking about the models.&lt;/p&gt;

&lt;p&gt;GPT-5. Claude Opus 4. Gemini Ultra. Which one scores higher on benchmarks. Which one writes better code. Which one is worth the subscription.&lt;/p&gt;

&lt;p&gt;I think that's the wrong conversation. And I think in 18 months, most people will agree.&lt;/p&gt;

&lt;p&gt;The next phase of AI isn't about smarter models. It's about infrastructure. And I say that as someone who has spent 15 years building infrastructure for a living.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where we are now
&lt;/h2&gt;

&lt;p&gt;The shift that already happened — and that most people haven't fully absorbed — is the move from conversational AI to agentic AI.&lt;/p&gt;

&lt;p&gt;Conversational AI waits for you. You ask, it answers. You prompt, it responds. The human is the engine; the AI is the tool.&lt;/p&gt;

&lt;p&gt;Agentic AI plans and executes. You give it a goal. It reads your codebase, breaks the problem into steps, executes them in sequence, checks the results, fixes what broke, and reports back. The AI is the engine; the human is the director.&lt;/p&gt;

&lt;p&gt;This isn't future speculation. It's what Claude Code, GitHub Copilot's agent mode, and a dozen other tools are doing right now. I've been running multi-agent workflows for months — orchestrator agents coordinating specialist sub-agents building entire features in parallel while I review the output.&lt;/p&gt;

&lt;p&gt;But here's what I think most people are missing: this transition creates massive unsolved infrastructure problems. And those problems are going to define the next 18-24 months.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I think happens next
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Agents are going to need their own infrastructure
&lt;/h3&gt;

&lt;p&gt;Right now, agents run in your terminal, on your machine, inside someone else's cloud. That works at small scale. It breaks at large scale.&lt;/p&gt;

&lt;p&gt;Think about what a production agent system actually needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent state&lt;/strong&gt; — an agent mid-task needs to survive a restart. Where does its working memory live?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Networking&lt;/strong&gt; — agents calling other agents, agents calling external APIs, agents accessing internal services. Who manages that network?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identity and auth&lt;/strong&gt; — if an agent is making API calls, creating files, pushing commits, what identity does it have? How do you audit what it did?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource limits&lt;/strong&gt; — a runaway agent can consume compute indefinitely. Who enforces the limits?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; — when something goes wrong in a multi-agent workflow, how do you trace which agent made which decision at which step?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this exists in a coherent form yet. We're running agents the way we ran web apps in 2003 — on single servers, with manual restarts, hoping nothing crashes.&lt;/p&gt;

&lt;p&gt;Someone is going to build the Kubernetes of agents. Probably in the next 18 months. And when they do, someone is going to have to run it.&lt;/p&gt;

&lt;p&gt;That someone is infrastructure engineers.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Prompt engineering is going to die
&lt;/h3&gt;

&lt;p&gt;Not immediately. But the trend is clear.&lt;/p&gt;

&lt;p&gt;Right now, there's an entire cottage industry around "prompt engineering" — the art of asking AI the right question in the right way to get the right answer. It's a real skill. It matters today.&lt;/p&gt;

&lt;p&gt;But it's a transitional skill, not a permanent one.&lt;/p&gt;

&lt;p&gt;As agentic systems mature, the question stops being "how do I write the perfect prompt?" and starts being "how do I design this system of agents so it reliably solves this class of problems?"&lt;/p&gt;

&lt;p&gt;That's not prompt engineering. That's systems design.&lt;/p&gt;

&lt;p&gt;It's the same shift that happened with databases. Early practitioners had to be experts at writing optimal SQL queries. Then query optimizers got good enough that you could trust the system to figure it out — and the skill that mattered became schema design, indexing strategy, and query planning at the architecture level.&lt;/p&gt;

&lt;p&gt;The same thing is going to happen with AI. The people who will matter aren't the ones who can write clever prompts. They're the ones who can design reliable systems.&lt;/p&gt;

&lt;p&gt;I've been building a 6-tool Docker management ecosystem with Claude Code. The prompts matter less than I expected. What matters almost entirely is: clear architecture, explicit scope boundaries, good context management, and knowing when something is wrong. Those are systems thinking skills, not prompt writing skills.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Agents are going to start modifying themselves
&lt;/h3&gt;

&lt;p&gt;This one is further out, but the early signs are already there.&lt;/p&gt;

&lt;p&gt;Right now, agents execute within the boundaries you set. They read your CLAUDE.md, follow your instructions, build what you ask.&lt;/p&gt;

&lt;p&gt;But agents already write code. And some of that code is agent infrastructure — the scaffolding, the context management, the workflow definitions. The logical next step is agents that notice their own inefficiencies and propose modifications to their own workflows.&lt;/p&gt;

&lt;p&gt;We're not there yet. But the gap between "agent that executes a workflow" and "agent that improves a workflow" is narrower than it looks.&lt;/p&gt;

&lt;p&gt;When it closes, the role of the human changes again. Not from director to spectator — the human still needs to validate, approve, understand what changed and why. But the cycle time between "this workflow has a problem" and "the workflow is fixed" compresses dramatically.&lt;/p&gt;

&lt;p&gt;The implication: the humans who stay in the loop effectively are the ones who understand systems well enough to evaluate a proposed change. Generalists who can prompt but can't reason about system behavior will struggle here.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Self-hosted AI is going to become serious infrastructure
&lt;/h3&gt;

&lt;p&gt;This is the one I'm most confident about, because it's already starting.&lt;/p&gt;

&lt;p&gt;Right now, most AI runs on someone else's infrastructure. Anthropic's servers. OpenAI's servers. Google's servers. You send your data there, get a response back, trust that the provider handles it responsibly.&lt;/p&gt;

&lt;p&gt;For consumer use, this is fine. For enterprise use — especially in regulated industries, sensitive domains, or organizations that have genuinely learned the lesson about vendor dependency — this is becoming a problem.&lt;/p&gt;

&lt;p&gt;The models are getting small enough to run on-premise. Llama, Mistral, Qwen — capable open-source models that you can run on hardware you control. The tooling around self-hosted inference is maturing fast.&lt;/p&gt;

&lt;p&gt;And when organizations start running AI on their own infrastructure, someone has to manage it. GPUs don't configure themselves. Model updates need to be evaluated and deployed. Inference infrastructure needs to be monitored, scaled, and maintained.&lt;/p&gt;

&lt;p&gt;That's not a developer job. That's an infrastructure job.&lt;/p&gt;

&lt;p&gt;I manage 7,000 servers professionally. I can see exactly where the self-hosted AI infrastructure conversation is heading — it's heading toward the same conversations we had about on-premise databases and private cloud ten years ago. Same problems, new stack.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this means in practice
&lt;/h2&gt;

&lt;p&gt;I'm not making predictions about timelines with false precision. But the direction seems clear:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model layer is becoming a commodity.&lt;/strong&gt; When you have ten capable models competing for your subscription, the model itself stops being the differentiator. The system around it is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The infrastructure layer is becoming critical.&lt;/strong&gt; Agents need to run somewhere, persist state somewhere, authenticate somewhere, get monitored somewhere. That infrastructure doesn't exist yet in mature form.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The skills that matter are shifting.&lt;/strong&gt; Prompt writing → systems design. Single-agent workflows → multi-agent orchestration. Cloud-hosted AI → self-hosted AI infrastructure.&lt;/p&gt;

&lt;p&gt;I've been building on the leading edge of this — running agent workflows, building self-hosted tools, thinking about how multiple services coordinate and communicate. And what I keep noticing is that the problems I'm solving aren't AI problems. They're infrastructure problems wearing AI clothes.&lt;/p&gt;

&lt;p&gt;The next chapter of AI isn't written by the people building smarter models. It's written by the people building the systems those models run in.&lt;/p&gt;




&lt;p&gt;NEXUS Ecosystem is my attempt to build serious infrastructure for self-hosted Docker environments. Open source, 6 tools, unified control plane.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: github.com/Alvarito1983&lt;/li&gt;
&lt;li&gt;Docker Hub: hub.docker.com/u/afraguas1983&lt;/li&gt;
&lt;/ul&gt;


</description>
      <category>ai</category>
      <category>devops</category>
      <category>infrastructure</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Claude Code Part 2: How I use Sub-agents to build entire features in parallel</title>
      <dc:creator>Alvarito1983</dc:creator>
      <pubDate>Tue, 14 Apr 2026 11:26:27 +0000</pubDate>
      <link>https://dev.to/alvarito1983/claude-code-part-2-how-i-use-sub-agents-to-build-entire-features-in-parallel-aj3</link>
      <guid>https://dev.to/alvarito1983/claude-code-part-2-how-i-use-sub-agents-to-build-entire-features-in-parallel-aj3</guid>
      <description>&lt;p&gt;This is Part 2 of my Claude Code series. &lt;a href="https://dev.to/alvarito1983/claude-code-the-complete-guide-from-zero-to-autonomous-development-2fk"&gt;Part 1 covers the fundamentals&lt;/a&gt; — installation, CLAUDE.md, prompts, and workflow patterns. Read that first if you're new to Claude Code.&lt;/p&gt;

&lt;p&gt;This part is about something most guides don't cover: &lt;strong&gt;sub-agents&lt;/strong&gt;. How to run multiple specialized AI agents in parallel, coordinate them, and use them to build entire features simultaneously.&lt;/p&gt;

&lt;p&gt;This isn't theoretical. Everything here comes from building NEXUS Ecosystem — six self-hosted Docker tools — where sub-agents went from curiosity to core workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  What sub-agents actually are
&lt;/h2&gt;

&lt;p&gt;When you run Claude Code normally, you have one agent doing one thing at a time. It reads, thinks, writes, verifies — sequentially.&lt;/p&gt;

&lt;p&gt;Sub-agents change that. You can launch multiple specialized agents that work in parallel, each focused on a specific task, while the main agent orchestrates.&lt;/p&gt;

&lt;p&gt;The mental model: instead of one developer doing everything, you have a tech lead (the main agent) coordinating a team of specialists (sub-agents).&lt;/p&gt;

&lt;p&gt;Here's what it looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;● 2 background agents launched (↓ to manage)
   ├─ nexus-security improvements: Check Now fix, sidebar logs, Report view
   └─ nexus-hub Log Center: backend routes, tool log helpers, frontend Logs panel
● Both agents are running in parallel.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's two agents simultaneously building two different modules of the same project. One fixing and extending the Security tool. One building an entirely new Log Center in Hub. Neither blocking the other.&lt;/p&gt;

&lt;p&gt;The work that would have taken 3-4 sequential hours took under 45 minutes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Installing the agent marketplace
&lt;/h2&gt;

&lt;p&gt;Claude Code has a built-in agent system, but the real power comes from the community marketplace. The one worth installing is &lt;code&gt;wshobson/agents&lt;/code&gt; — 22 specialized agents covering everything from backend security to Kubernetes to incident response.&lt;/p&gt;

&lt;p&gt;Install it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add the marketplace&lt;/span&gt;
claude config &lt;span class="nb"&gt;set &lt;/span&gt;extraKnownMarketplaces.claude-code-workflows.source.source github
claude config &lt;span class="nb"&gt;set &lt;/span&gt;extraKnownMarketplaces.claude-code-workflows.source.repo wshobson/agents

&lt;span class="c"&gt;# Install the agents you need&lt;/span&gt;
claude plugin &lt;span class="nb"&gt;install &lt;/span&gt;agent-teams@claude-code-workflows
claude plugin &lt;span class="nb"&gt;install &lt;/span&gt;backend-development@claude-code-workflows
claude plugin &lt;span class="nb"&gt;install &lt;/span&gt;security-scanning@claude-code-workflows
claude plugin &lt;span class="nb"&gt;install &lt;/span&gt;debugging-toolkit@claude-code-workflows
claude plugin &lt;span class="nb"&gt;install &lt;/span&gt;comprehensive-review@claude-code-workflows
claude plugin &lt;span class="nb"&gt;install &lt;/span&gt;incident-response@claude-code-workflows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify what's installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude plugin list
&lt;span class="c"&gt;# or inside a session:&lt;/span&gt;
/agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;/agents&lt;/code&gt; command opens a panel showing all available agents with their model assignments (haiku, sonnet, or opus depending on complexity).&lt;/p&gt;




&lt;h2&gt;
  
  
  The agents worth knowing
&lt;/h2&gt;

&lt;p&gt;Not all 22 agents are equally useful. These are the ones I actually use:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;agent-teams:team-lead&lt;/code&gt; (opus)&lt;/strong&gt; — Orchestrates other agents. Use this when you need someone to break down a complex task and delegate to specialists. It thinks slower but plans better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;agent-teams:team-implementer&lt;/code&gt; (opus)&lt;/strong&gt; — Pure implementation. Give it a spec and it builds. Works best when the architecture is already decided.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;agent-teams:team-reviewer&lt;/code&gt; (opus)&lt;/strong&gt; — Code review. Run this after implementation to catch issues before you do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;backend-development:security-auditor&lt;/code&gt; (sonnet)&lt;/strong&gt; — Scans backend code for vulnerabilities, auth issues, injection risks. Genuinely useful before deploying.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;debugging-toolkit:debugger&lt;/code&gt; (sonnet)&lt;/strong&gt; — Specialized in diagnosis. Give it a symptom and it traces through the code to find the cause. Better than the general agent for tricky bugs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;comprehensive-review:architect-review&lt;/code&gt; (opus)&lt;/strong&gt; — High-level architecture review. Use this when you've built something significant and want a second opinion on the design decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;security-scanning:threat-modeling-expert&lt;/code&gt;&lt;/strong&gt; — Thinks about what could go wrong in your system. Good for security-sensitive features.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to launch sub-agents
&lt;/h2&gt;

&lt;p&gt;The simplest way is to describe parallel work in your prompt and ask Claude Code to use sub-agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Implement two features in parallel using sub-agents:

&lt;span class="gu"&gt;## SUB-AGENT 1 — Security improvements&lt;/span&gt;
[detailed spec for security work]

&lt;span class="gu"&gt;## SUB-AGENT 2 — Log Center&lt;/span&gt;
[detailed spec for log center work]

Report when both complete.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code launches both agents and manages them. You see output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;● Agent "Security improvements" completed
● Sub-agent 1 completed. Summary:
  - Check Now fix: polling every 3s...
  - Logs panel: 50-entry ring buffer...
  - Report view: CVE deduplication...
  ---
  Sub-agent 2 is still running.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: &lt;strong&gt;each sub-agent gets its own context window&lt;/strong&gt;. They don't share state or interfere with each other. This is why parallel work is possible — there's no race condition on the context.&lt;/p&gt;




&lt;h2&gt;
  
  
  Checking on agents without interrupting them
&lt;/h2&gt;

&lt;p&gt;This took me a while to figure out. When agents are running in background, you don't want to interrupt them — but you might want to know what's happening.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;/btw&lt;/code&gt; command is designed exactly for this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/btw how are the two background agents doing?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get a status report without the agents noticing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Los agentes siguen corriendo.
  Sub-agente 1 (nexus-security) — en progreso
  - Ya modificó containerMonitor.js
  - Queda: fix Check Now, panel de logs, vista Report

  Sub-agente 2 (nexus-hub Log Center) — en progreso  
  - Sin cambios confirmados aún
  - Tiene más trabajo: rutas logs.js, helper logToHub(),
    componente LogsPanel.jsx con 3 tabs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agents keep running. You stay informed. No interruption.&lt;/p&gt;

&lt;p&gt;To actually manage them, press &lt;code&gt;↓&lt;/code&gt; to open the agent panel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;● 2 background agents launched (↓ to manage)
   ├─ nexus-security improvements ✓ completed
   └─ nexus-hub Log Center — still running
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The orchestrator pattern
&lt;/h2&gt;

&lt;p&gt;For complex multi-service projects, the most effective pattern is explicit orchestration: one agent that reads the full context and coordinates, specialists that implement.&lt;/p&gt;

&lt;p&gt;Here's a real prompt structure I use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are the orchestrator for this task. 
Before launching any sub-agents:

1. Read the CLAUDE.md completely
2. Read these files to understand current state:
   - nexus-hub/backend/server.js
   - nexus-security/backend/src/routes/scan.js
   - nexus-hub/backend/src/services/flowEngine.js

3. Plan the work and split into parallel tracks
4. Launch sub-agents with specific, complete specs
5. Wait for all to complete
6. Run a final integration check

The goal: implement CVE scanning in Security with 
automatic alerts to Hub when Critical found.

Use security-scanning:security-auditor for the 
security implementation and backend-development:backend-architect 
for the Hub integration.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator reads everything first, makes the architectural decisions, then delegates implementation to specialists who don't need to understand the full picture.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real example: building Security and Log Center in parallel
&lt;/h2&gt;

&lt;p&gt;Here's a condensed version of a real session from building NEXUS. I needed to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add a progress bar to Security's Check Now button&lt;/li&gt;
&lt;li&gt;Add a sidebar activity log to Security
&lt;/li&gt;
&lt;li&gt;Build an entirely new Log Center in Hub with 3 tabs&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;logToHub()&lt;/code&gt; helper to all 5 tools&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of doing this sequentially (3-4 hours), I launched two sub-agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Dos tareas en paralelo con sub-agentes:

&lt;span class="gu"&gt;## SUB-AGENTE 1 — nexus-security&lt;/span&gt;
Lee nexus-security/frontend/src/components/Dashboard.jsx
y nexus-security/backend/src/services/containerMonitor.js

Implementa:
&lt;span class="p"&gt;1.&lt;/span&gt; Check Now con progreso real — polling /api/scan/results
   cada 3s, timeout 5min, toast al terminar
&lt;span class="p"&gt;2.&lt;/span&gt; Panel Activity en sidebar — ring buffer 50 entradas,
   logs tiempo real via Socket.io, color-coded por tipo

Rebuild: docker compose -f docker-compose.test.yml up --build -d nexus-security

&lt;span class="gu"&gt;## SUB-AGENTE 2 — nexus-hub Log Center&lt;/span&gt;
Lee nexus-hub/backend/server.js y src/routes/ (todos)

Implementa:
&lt;span class="p"&gt;-&lt;/span&gt; POST /api/logs — ingest con X-Api-Key
&lt;span class="p"&gt;-&lt;/span&gt; GET /api/logs/ecosystem, /app/:source, /docker/:source
&lt;span class="p"&gt;-&lt;/span&gt; Retención 10 días en /app/data/logs/[source]/[YYYY-MM-DD].json
&lt;span class="p"&gt;-&lt;/span&gt; LogsPanel.jsx con 3 tabs: Ecosystem Events / App Logs / Docker Logs
&lt;span class="p"&gt;-&lt;/span&gt; logToHub() helper en los 5 tools

Rebuild: todos los servicios al terminar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;● Agent "nexus-security improvements" completed  ✓  8m 12s
● Agent "nexus-hub Log Center" completed         ✓  19m 44s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two significant features. Parallel. ~20 minutes total.&lt;/p&gt;

&lt;p&gt;The Security agent finished first (simpler scope). The Log Center agent took longer (6 services to modify). But they didn't block each other.&lt;/p&gt;




&lt;h2&gt;
  
  
  What makes sub-agents faster
&lt;/h2&gt;

&lt;p&gt;It's not magic. Here's the actual reason parallel agents save time:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No context switching cost.&lt;/strong&gt; When one agent finishes reading Security's files, it doesn't have to context-switch to understand Hub's architecture. The Hub agent has its own fresh context, already loaded with the right files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No sequential dependencies (when designed correctly).&lt;/strong&gt; If you structure the work so agents aren't waiting on each other's output, they run fully in parallel. The Log Center doesn't need Security's code to be finished — it's a separate service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model specialization.&lt;/strong&gt; A &lt;code&gt;security-auditor&lt;/code&gt; agent running on sonnet is tuned differently than a general &lt;code&gt;team-implementer&lt;/code&gt;. The right model for the right task.&lt;/p&gt;

&lt;p&gt;The failure mode: creating false dependencies. If you ask Agent 2 to "use the pattern that Agent 1 will establish", Agent 2 has to wait. Design your parallel tasks to be genuinely independent.&lt;/p&gt;




&lt;h2&gt;
  
  
  Giving agents their own CLAUDE.md context
&lt;/h2&gt;

&lt;p&gt;This is underused. You can pre-load agents with specific context using the prompt structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## CONTEXT FOR THIS AGENT&lt;/span&gt;
You are working on nexus-security only.
The project is at E:&lt;span class="se"&gt;\C&lt;/span&gt;laude&lt;span class="se"&gt;\N&lt;/span&gt;EXUS&lt;span class="se"&gt;\n&lt;/span&gt;exus-security

Key facts:
&lt;span class="p"&gt;-&lt;/span&gt; Accent color: #ef4444 (red)
&lt;span class="p"&gt;-&lt;/span&gt; Backend port: 9093, container: nexus-security-test  
&lt;span class="p"&gt;-&lt;/span&gt; Frontend uses CSS custom properties, no Tailwind
&lt;span class="p"&gt;-&lt;/span&gt; Socket.io is already configured in server.js
&lt;span class="p"&gt;-&lt;/span&gt; The scan result format is: { id, severity, package, 
  version, fixedIn, description }
&lt;span class="p"&gt;-&lt;/span&gt; Do not touch nexus-hub — that's a separate agent's scope

&lt;span class="gu"&gt;## YOUR TASK&lt;/span&gt;
[specific instructions]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This context header tells the agent exactly what it needs to know without reading the entire CLAUDE.md. For sub-agents with narrow scope, targeted context is more efficient than full project context.&lt;/p&gt;




&lt;h2&gt;
  
  
  The mistakes I made
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Launching agents with vague specs.&lt;/strong&gt; Sub-agents don't have the conversational context you've built up in the main session. They start cold. If your spec says "fix the Check Now button", the agent doesn't know what's wrong with it, what the expected behavior is, or what files to look at. Be exhaustively specific.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating dependent parallel tasks.&lt;/strong&gt; I once launched two agents where Agent 2 needed to "follow the pattern Agent 1 establishes". Agent 2 started before Agent 1 finished, made its own decisions, and the patterns were inconsistent. Now I either sequence dependent tasks or make both specs completely explicit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not including the rebuild step.&lt;/strong&gt; Sub-agents will write perfect code and then stop. If you don't explicitly tell them to rebuild the Docker containers, you won't see the changes. Always end sub-agent specs with the rebuild command.&lt;/p&gt;
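&lt;p&gt;In practice that means the spec's last section is always an explicit finish line. Something like this — the service name, port, and command come from my setup, so adapt them to yours:&lt;/p&gt;

```markdown
## FINAL STEP (always include this)
After all code changes are complete:
- Rebuild and restart the container: docker compose up -d --build nexus-security
- Verify the backend responds on port 9093 before reporting done
- If the rebuild fails, report the error instead of stopping silently
```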

&lt;p&gt;&lt;strong&gt;Overloading a single sub-agent.&lt;/strong&gt; One agent doing 10 things is slower than two agents doing 5 things each. But one agent doing 30 things will run out of context before it finishes. Scope sub-agents to coherent, bounded tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignoring the completion order.&lt;/strong&gt; Agents complete when they're done, not when you expect them to. The faster one might complete while the slower one is still running. Check both before doing integration work.&lt;/p&gt;




&lt;h2&gt;
  
  
  When not to use sub-agents
&lt;/h2&gt;

&lt;p&gt;Sub-agents are powerful but not always the right tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't use them for tasks with shared state.&lt;/strong&gt; If both agents are writing to the same store.js or modifying the same Docker compose file, you'll get conflicts. One agent's changes will overwrite the other's.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't use them for highly sequential work.&lt;/strong&gt; If Step B genuinely cannot start until Step A is complete, sub-agents just add overhead. Sequential is fine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't use them for quick tasks.&lt;/strong&gt; Launching a sub-agent has overhead. For a 5-minute task, just do it directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't use them when you need to watch the work.&lt;/strong&gt; Sub-agents run in the background. If you need to review and approve intermediate steps, sequential work with explicit phase breaks is better.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real productivity shift
&lt;/h2&gt;

&lt;p&gt;When I look back at building NEXUS Ecosystem — six tools, SSO, CVE scanning, real-time logs, event-driven alerts — the sub-agent pattern is what made the scope possible for a solo developer.&lt;/p&gt;

&lt;p&gt;Not because it's magic. Because it matches how real work is structured: parallel tracks, specialized expertise, coordinated by someone who understands the full system.&lt;/p&gt;

&lt;p&gt;That someone is still you. The orchestration decisions, the architectural choices, the review — those are yours. Sub-agents execute. You direct.&lt;/p&gt;

&lt;p&gt;The ceiling on what one person can build has moved. Sub-agents are a significant part of why.&lt;/p&gt;




&lt;p&gt;NEXUS Ecosystem is open source. All the patterns above come from real sessions building it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: github.com/Alvarito1983
&lt;/li&gt;
&lt;li&gt;Docker Hub: hub.docker.com/u/afraguas1983&lt;/li&gt;
&lt;li&gt;Part 1: &lt;a href="https://dev.to/alvarito1983/claude-code-the-complete-guide-from-zero-to-autonomous-development-2fk"&gt;Claude Code: The Complete Guide&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  #claudecode #ai #programming #webdev #devtools #productivity #opensource #softwaredevelopment
&lt;/h1&gt;

</description>
      <category>claude</category>
      <category>devtools</category>
      <category>ai</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>I run 7,000 servers at work. My homelab taught me more about reliability than any of them.</title>
      <dc:creator>Alvarito1983</dc:creator>
      <pubDate>Tue, 14 Apr 2026 08:06:03 +0000</pubDate>
      <link>https://dev.to/alvarito1983/i-run-7000-servers-at-work-my-homelab-taught-me-more-about-reliability-than-any-of-them-5h8b</link>
      <guid>https://dev.to/alvarito1983/i-run-7000-servers-at-work-my-homelab-taught-me-more-about-reliability-than-any-of-them-5h8b</guid>
      <description>&lt;p&gt;I manage more than 7,000 servers.&lt;/p&gt;

&lt;p&gt;They're spread across data centers on multiple continents. They run critical telecom infrastructure for millions of users. When something breaks, it affects real people, real services, real money. There are escalation procedures, change management windows, runbooks, on-call rotations, SLAs.&lt;/p&gt;

&lt;p&gt;And yet, the place where I've learned the most about reliability isn't work.&lt;/p&gt;

&lt;p&gt;It's my homelab.&lt;/p&gt;




&lt;h2&gt;
  
  
  What enterprise infrastructure teaches you
&lt;/h2&gt;

&lt;p&gt;At scale, infrastructure engineering becomes a discipline of process.&lt;/p&gt;

&lt;p&gt;You don't just restart a service — you open a change ticket, get approval, schedule a maintenance window, notify stakeholders, execute the change with a rollback plan ready, and document what happened. You don't just deploy software — you go through staging, testing, canary deployments, gradual rollout.&lt;/p&gt;

&lt;p&gt;This is good. This is necessary. When you're responsible for infrastructure that thousands of people depend on, process is what keeps you from making catastrophic mistakes at 3am.&lt;/p&gt;

&lt;p&gt;But process also insulates you from consequences.&lt;/p&gt;

&lt;p&gt;When something breaks at work, there's a team. There's an escalation path. There's a senior engineer who's seen this before. There's documentation. There's a vendor support contract. The blast radius of any single mistake is contained by layers of process designed specifically to contain it.&lt;/p&gt;

&lt;p&gt;You learn a lot. But you learn it slowly, safely, with a net.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a homelab teaches you instead
&lt;/h2&gt;

&lt;p&gt;My homelab has no runbooks. No change management. No on-call rotation except me.&lt;/p&gt;

&lt;p&gt;When something breaks, I broke it. When something is down, it stays down until I fix it. When I make a mistake at 11pm, I'm the one staying up until 1am undoing it. When I deploy something that kills my Docker networking, my wife can't use the media server until I sort it out.&lt;/p&gt;

&lt;p&gt;That immediacy changes how you think.&lt;/p&gt;

&lt;p&gt;At work, I know intellectually that persistent volumes matter. In my homelab, I learned it viscerally the first time I rebuilt a container and lost three months of monitoring data because I forgot to mount the volume. I've never forgotten it since. The lesson cost me an evening. It was worth it.&lt;/p&gt;

&lt;p&gt;At work, I understand that networking configuration is consequential. In my homelab, I understand it in my hands — because I've misconfigured it, watched everything break, and traced the problem back through &lt;code&gt;docker network ls&lt;/code&gt; and &lt;code&gt;ip route&lt;/code&gt; until I found it. No ticket, no escalation, no one to ask. Just me and the problem.&lt;/p&gt;

&lt;p&gt;The difference is skin in the game.&lt;/p&gt;




&lt;h2&gt;
  
  
  The specific things I've learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Persistence is everything, and no one tells you until it's too late.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Enterprise storage is managed by a storage team. The persistence layer is abstracted away. In my homelab, I'm the storage team. The first time I rebuilt a service and lost its data because I didn't understand how Docker volumes worked with my compose configuration, I understood persistence at a level I never had before. Now I think about persistence first, before I think about almost anything else.&lt;/p&gt;
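&lt;p&gt;The fix, once you understand it, is a couple of lines of compose. A fragment like this (service, image, and volume names are placeholders) is the difference between data that survives a rebuild and data that doesn't:&lt;/p&gt;

```yaml
# A named volume survives container rebuilds and "docker compose down";
# data written to the container's own filesystem does not.
services:
  monitoring:
    image: example/monitoring:latest
    volumes:
      - monitoring-data:/var/lib/monitoring   # persist this path

volumes:
  monitoring-data: {}
```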

&lt;p&gt;&lt;strong&gt;Monitoring you build yourself is monitoring you actually understand.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At work, we have enterprise monitoring tools. They work. But they were configured by someone else, they alert on thresholds someone else set, and when they fire, the first question is always "what does this alert actually mean?"&lt;/p&gt;

&lt;p&gt;When I built Pulse — my own uptime monitoring tool — I wrote every check, set every threshold, decided what mattered and what didn't. When it alerts, I know exactly what it means. That understanding transfers back to how I think about monitoring at work.&lt;/p&gt;
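&lt;p&gt;The core of such a check is small enough to hold in your head. Here's a sketch of the verdict logic in JavaScript — the thresholds and field names are illustrative, not Pulse's actual code:&lt;/p&gt;

```javascript
// Turn one probe result into an up/degraded/down verdict.
// The thresholds are examples; the point is that YOU chose them.
function classifyCheck({ statusCode, latencyMs, degradedAboveMs = 2000 }) {
  if (statusCode === undefined) return 'down';        // no response at all
  if (statusCode >= 500) return 'down';               // server-side failure
  if (latencyMs > degradedAboveMs) return 'degraded'; // reachable, but slow
  if (statusCode >= 400) return 'degraded';           // 4xx: reachable, unhealthy
  if (statusCode >= 200) return 'up';                 // 2xx/3xx: healthy
  return 'degraded';                                  // 1xx: odd enough to flag
}
```

&lt;p&gt;When a check built like this fires, there's no mystery about what the alert means — every branch is a decision you made yourself.&lt;/p&gt;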

&lt;p&gt;&lt;strong&gt;Failure modes you've personally caused are failure modes you never forget.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I know theoretically what happens when a container can't reach its database. I've read the documentation. I've seen the symptoms described in runbooks.&lt;/p&gt;

&lt;p&gt;I also know exactly what it looks like in practice, because I've caused it. I misconfigured the network, watched the application fail in a confusing way, spent forty minutes figuring out it was a DNS resolution problem, and fixed it. That forty minutes of confusion is what makes the knowledge stick.&lt;/p&gt;

&lt;p&gt;At work, someone else usually causes the problems. The experience of debugging someone else's mistake is educational, but it's different from debugging your own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Small-scale forces clarity that large-scale obscures.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you're managing 7,000 servers, you think in abstractions. You have to — you can't think about individual machines. But sometimes those abstractions hide things.&lt;/p&gt;

&lt;p&gt;In my homelab, I have maybe thirty containers. I know what every one of them does. I know why it exists. I know what it depends on and what depends on it. That level of understanding is impossible at scale, but practicing it at small scale sharpens the instincts you need at large scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  The project that came out of it
&lt;/h2&gt;

&lt;p&gt;All of this eventually pushed me to build something.&lt;/p&gt;

&lt;p&gt;I kept finding that the tools I was using in my homelab — Portainer, various monitoring solutions, alerting systems — were built for someone else's use case. They were too heavy, too complex, too assumption-laden.&lt;/p&gt;

&lt;p&gt;So I built NEXUS Ecosystem: six self-hosted Docker tools designed specifically for the homelab and small-team use case. Container management, image update detection, uptime monitoring, CVE scanning, alerts, and a central Hub with SSO.&lt;/p&gt;

&lt;p&gt;Building it taught me more than using any existing tool would have. I had to make every design decision. I had to understand why the architecture worked, not just how to configure it. I broke it repeatedly and had to understand why it broke.&lt;/p&gt;

&lt;p&gt;That's the homelab ethos applied to software development.&lt;/p&gt;




&lt;h2&gt;
  
  
  What enterprise infrastructure still teaches you that a homelab can't
&lt;/h2&gt;

&lt;p&gt;I don't want to romanticize the homelab at the expense of the real thing.&lt;/p&gt;

&lt;p&gt;There are things you only learn at scale:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real failure modes.&lt;/strong&gt; A homelab doesn't have Byzantine failures, network partitions across continents, or hardware that fails in ambiguous ways while staying technically online. The edge cases that happen at scale are qualitatively different from the edge cases that happen with thirty containers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operational discipline.&lt;/strong&gt; The change management process I described earlier is genuinely valuable. The instinct to document, to plan rollback, to notify stakeholders — these habits save incidents. A homelab doesn't build them the same way because the stakes are too low.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Collaboration under pressure.&lt;/strong&gt; Debugging a production incident with four engineers on a call, each with different information, trying to converge on a diagnosis in real time — that's a skill that only develops in real incidents.&lt;/p&gt;

&lt;p&gt;The homelab and enterprise infrastructure teach different things. The engineers I've seen grow fastest are the ones who do both — who bring the experimental instincts of the homelab into their professional work, and the operational discipline of professional work back into their homelab.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest takeaway
&lt;/h2&gt;

&lt;p&gt;I've been in infrastructure for 15 years. I manage more servers than most people will ever touch.&lt;/p&gt;

&lt;p&gt;And the most important lessons I've internalized — about persistence, about failure modes, about what monitoring is actually for, about what it means to truly understand a system — came from a homelab running on hardware in my house, where the consequences of getting it wrong were an annoyed partner and a ruined evening.&lt;/p&gt;

&lt;p&gt;The stakes were low. The learning was real.&lt;/p&gt;

&lt;p&gt;If you're an infrastructure engineer who doesn't have a homelab: build one. Not because it will make you better at your job directly. But because the freedom to break things, fix them yourself, and understand why they broke is something you can't get anywhere else.&lt;/p&gt;




&lt;p&gt;NEXUS Ecosystem is what I built in my homelab. Open source, self-hosted, Docker-native.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: github.com/Alvarito1983&lt;/li&gt;
&lt;li&gt;Docker Hub: hub.docker.com/u/afraguas1983&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  #devops #homelab #docker #career #sysadmin #selfhosted #programming #discuss
&lt;/h1&gt;

</description>
      <category>devops</category>
      <category>homelab</category>
      <category>career</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I'm not a developer. I used AI to build a 6-tool software ecosystem anyway.</title>
      <dc:creator>Alvarito1983</dc:creator>
      <pubDate>Mon, 13 Apr 2026 08:38:33 +0000</pubDate>
      <link>https://dev.to/alvarito1983/im-not-a-developer-i-used-ai-to-build-a-6-tool-software-ecosystem-anyway-18pa</link>
      <guid>https://dev.to/alvarito1983/im-not-a-developer-i-used-ai-to-build-a-6-tool-software-ecosystem-anyway-18pa</guid>
      <description>&lt;p&gt;Let me be upfront about something.&lt;/p&gt;

&lt;p&gt;I'm a Computer Science Engineer. I work at a global telecommunications company managing more than 7,000 servers spread across the world. I know infrastructure deeply — HPE, VMware, Linux, Docker, AWS. I know exactly what happens when a system fails at 2am because I'm the one who fixes it.&lt;/p&gt;

&lt;p&gt;I understand systems. I understand networks. I understand why things break and how to prevent them from breaking.&lt;/p&gt;

&lt;p&gt;What I had never done is build and launch a software product of my own. Not because I didn't understand how things work — I knew perfectly well how to design an architecture, how services need to communicate with each other, what data needs to persist and what doesn't, when to use WebSockets and when not to. That was clear to me from the start.&lt;/p&gt;

&lt;p&gt;The gap was something else: time and implementation speed. Converting a clear architecture into working code, component by component, endpoint by endpoint, takes months when you're doing it alone in your spare time.&lt;/p&gt;

&lt;p&gt;AI didn't teach me to design systems. It let me build them at the speed my mind designs them.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;NEXUS Ecosystem is a suite of six self-hosted Docker management tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NEXUS&lt;/strong&gt; — container management (start, stop, deploy, terminal, metrics)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watcher&lt;/strong&gt; — automatic image update detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pulse&lt;/strong&gt; — uptime monitoring for HTTP, TCP, DNS, APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; — CVE vulnerability scanning + VirusTotal integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notify&lt;/strong&gt; — multi-channel alerts (email, Telegram, Discord)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hub&lt;/strong&gt; — central control with SSO, event bus, log center, automations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stack: React 18, Node.js, Express, Socket.io, Docker. Published on GitHub and Docker Hub. Running 24/7 on my homelab.&lt;/p&gt;

&lt;p&gt;A team of seven engineers would have taken close to a year to build this. I built it in weeks of evening sessions, as a side project, while working full time managing global infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  How it actually works with AI
&lt;/h2&gt;

&lt;p&gt;Everybody talks about AI writing code. That's not what happened here — or at least, that's not the useful frame.&lt;/p&gt;

&lt;p&gt;The useful frame is: &lt;strong&gt;AI changed what I could execute on at once.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As a systems engineer, I understand distributed systems deeply. I know how Docker networking works. I know what happens when a service can't reach its dependency. I know why you need persistent volumes. I know the difference between a health check and a readiness check. I know what an event bus is and why you'd use one.&lt;/p&gt;

&lt;p&gt;What I didn't have was the bandwidth to express all of that understanding in code at the speed I was thinking it. Designing the architecture takes minutes. Implementing it correctly, with all the edge cases, across six interconnected services — that's where the time goes.&lt;/p&gt;

&lt;p&gt;With Claude Code, I describe the system I want in terms I already understand deeply, and it produces the implementation. I review it, I understand it, I modify it when it's wrong — and I know when it's wrong because I understand the system. I'm the architect. AI is the contractor.&lt;/p&gt;

&lt;p&gt;I know exactly what I want built. I can specify it precisely. I can inspect the work and know when it doesn't meet the spec. I just don't have to write every line myself.&lt;/p&gt;




&lt;h2&gt;
  
  
  The parts that were still hard
&lt;/h2&gt;

&lt;p&gt;I want to be honest about this, because most AI hype glosses over it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debugging distributed systems is still hard.&lt;/strong&gt; When Security was emitting events that Hub wasn't receiving, tracking down whether the problem was the event bus, the network, the auth middleware, or the Socket.io room configuration took hours. AI helped narrow it down, but it didn't eliminate the work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture decisions are still yours.&lt;/strong&gt; AI will implement whatever you ask it to. It won't tell you that your data model is wrong until you've built three features on top of it and discovered the problem yourself. The decisions that matter — how services communicate, where state lives, what the failure modes are — those are entirely on you. This is where domain expertise is irreplaceable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context management is a real skill.&lt;/strong&gt; A large codebase across six tools has more context than fits in any single session. Knowing which files to include, how to describe the current state, when to start fresh versus continue — this is something you have to learn. The CLAUDE.md file I maintain for the project is as important as any piece of code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You have to know enough to know when it's wrong.&lt;/strong&gt; This is critical. When Claude Code suggested using &lt;code&gt;nice&lt;/code&gt; and &lt;code&gt;ionice&lt;/code&gt; to limit Grype's CPU usage inside a Docker container on Windows/WSL2, I knew immediately that was wrong — those Linux process priority tools don't work the way you'd expect in that environment. Someone without infrastructure experience might have shipped that and spent weeks confused. That judgment — knowing when the implementation is subtly broken — comes from years of experience, not from AI.&lt;/p&gt;
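&lt;p&gt;For the record, the right lever in that situation sits at the Docker level, not inside the container. Something along these lines — the service and image names here are placeholders:&lt;/p&gt;

```yaml
# Limit the scanner's CPU at the container level instead of relying on
# nice/ionice inside it.
services:
  security-scanner:
    image: example/scanner:latest
    cpus: "0.50"   # at most half of one CPU core
```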




&lt;h2&gt;
  
  
  What this means for the "AI will replace developers" debate
&lt;/h2&gt;

&lt;p&gt;I've been watching this debate with particular interest, because I sit in an unusual position relative to it.&lt;/p&gt;

&lt;p&gt;Here's what I actually think:&lt;/p&gt;

&lt;p&gt;AI didn't replace a developer to build NEXUS. It enabled someone with deep infrastructure and systems expertise — who already understood exactly what needed to be built — to build it without needing a dedicated development team.&lt;/p&gt;

&lt;p&gt;That's a different thing. And I think it's more interesting than the replacement narrative.&lt;/p&gt;

&lt;p&gt;The engineers I'd worry about, if I were a developer, aren't the senior people building complex distributed systems. Their judgment, architecture instincts, and debugging skills are more valuable with AI than without — they can move faster without moving sloppier.&lt;/p&gt;

&lt;p&gt;The role that's genuinely under pressure is the one that's primarily about translation: taking a specification from a domain expert and converting it into working code. That translation work — from "I need the system to do X" to a working implementation — is exactly what AI is getting good at.&lt;/p&gt;

&lt;p&gt;But here's the flip side: if you're a domain expert who already understands what needs to be built, AI is giving you superpowers you didn't have before. Infrastructure engineers, DevOps specialists, data analysts, scientists — people who understand their domain deeply and can design the right solution — are suddenly able to build things at a speed that wasn't previously possible.&lt;/p&gt;

&lt;p&gt;That's not replacement. That's expansion.&lt;/p&gt;




&lt;h2&gt;
  
  
  The uncomfortable part
&lt;/h2&gt;

&lt;p&gt;I built software that works, that solves real problems, that has real users. It has a design system. It has real-time communication. It has security scanning. It has an SSO system. It has an event-driven automation engine.&lt;/p&gt;

&lt;p&gt;I designed every piece of it. I made every architecture decision. I knew what needed to be built and why.&lt;/p&gt;

&lt;p&gt;AI let me build it at the speed I designed it.&lt;/p&gt;

&lt;p&gt;What this tells me is that the gap between "I understand how this should work" and "I can ship this" is closing fast. For people who already have the domain knowledge and the systems thinking — and just needed the implementation bandwidth — that gap is already gone.&lt;/p&gt;

&lt;p&gt;The value of deep expertise hasn't decreased. If anything, it's more valuable now — because the people who truly understand what needs to be built can now actually build it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where NEXUS is now
&lt;/h2&gt;

&lt;p&gt;NEXUS Ecosystem is open source and running in my homelab right now — six services, unified Hub with SSO, 24/7. The plan is to publish the Hub integration formally once it's been running stably for a few weeks.&lt;/p&gt;

&lt;p&gt;If you're an infrastructure engineer, DevOps specialist, or systems person who has the domain knowledge and the architecture vision but has always needed a development team to execute it — that constraint is gone.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: github.com/Alvarito1983&lt;/li&gt;
&lt;li&gt;Docker Hub: hub.docker.com/u/afraguas1983&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  #ai #career #devops #docker #selfhosted #programming #claudecode #discuss
&lt;/h1&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>career</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I stopped trying to learn every DevOps tool. So I built my own.</title>
      <dc:creator>Alvarito1983</dc:creator>
      <pubDate>Sun, 12 Apr 2026 12:49:48 +0000</pubDate>
      <link>https://dev.to/alvarito1983/i-stopped-trying-to-learn-every-devops-tool-so-i-built-my-own-30cp</link>
      <guid>https://dev.to/alvarito1983/i-stopped-trying-to-learn-every-devops-tool-so-i-built-my-own-30cp</guid>
      <description>&lt;p&gt;I've been a Systems Administrator for 15 years.&lt;/p&gt;

&lt;p&gt;In that time I've learned more tools than I can count. Portainer. Rancher. Kubernetes. Helm. Terraform. Ansible. Grafana. Prometheus. Datadog. PagerDuty. Slack integrations. Webhook hell. SaaS dashboards that cost $200/month and do 10% of what they promise.&lt;/p&gt;

&lt;p&gt;At some point I stopped and asked myself: what do I actually need?&lt;/p&gt;

&lt;p&gt;The answer was simpler than I expected.&lt;/p&gt;




&lt;h2&gt;
  
  
  The tool treadmill
&lt;/h2&gt;

&lt;p&gt;Every year the DevOps landscape produces a new set of "essential" tools. Every year the community collectively decides that the previous essential tools were actually terrible and you should learn the new ones instead.&lt;/p&gt;

&lt;p&gt;I've watched this cycle repeat itself for a decade and a half.&lt;/p&gt;

&lt;p&gt;The tools change. The underlying problems don't.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need to know what's running&lt;/li&gt;
&lt;li&gt;You need to know when something breaks&lt;/li&gt;
&lt;li&gt;You need to know when something needs updating&lt;/li&gt;
&lt;li&gt;You need to know if something is vulnerable&lt;/li&gt;
&lt;li&gt;You need to be alerted when any of the above happens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. That's the job. Everything else is someone else's business model dressed up as a solution to your problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The moment I decided to stop
&lt;/h2&gt;

&lt;p&gt;It wasn't dramatic. I was setting up yet another monitoring tool — configuring agents, wrestling with YAML, reading documentation that assumed I already knew the tool's proprietary concepts — and I realized I'd spent three hours doing something I could have built in an afternoon.&lt;/p&gt;

&lt;p&gt;Not because I'm a brilliant developer. Because the problem I was solving wasn't actually that complex. I was renting complexity I didn't need.&lt;/p&gt;

&lt;p&gt;So I stopped. And I started building.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I built instead
&lt;/h2&gt;

&lt;p&gt;NEXUS Ecosystem is six self-hosted Docker tools that do exactly what I need and nothing else:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NEXUS&lt;/strong&gt; — container management. See what's running, start/stop/restart, deploy stacks, inspect logs, open a terminal. No Kubernetes abstractions. No cloud provider lock-in. Just Docker, directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Watcher&lt;/strong&gt; — image update detection. Scans your running containers, checks Docker Hub for newer versions, tells you what's outdated. No surprise updates. No manual checking.&lt;/p&gt;
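&lt;p&gt;The heart of that check is just a version comparison between the tag you're running and the tags the registry reports. A simplified JavaScript sketch, assuming plain numeric semver tags — the real logic also has to cope with digests, &lt;code&gt;latest&lt;/code&gt;, and suffixes like &lt;code&gt;-alpine&lt;/code&gt;:&lt;/p&gt;

```javascript
// Parse "MAJOR.MINOR.PATCH" into numbers; anything else is ignored.
function parseTag(tag) {
  const m = tag.match(/^(\d+)\.(\d+)\.(\d+)$/);
  return m ? m.slice(1, 4).map(Number) : null;
}

// Which registry tags are strictly newer than the running tag?
function newerTags(runningTag, registryTags) {
  const current = parseTag(runningTag);
  if (current === null) return []; // non-semver running tag: can't compare
  return registryTags.filter((tag) => {
    const v = parseTag(tag);
    if (v === null) return false;
    for (let i = 0; i !== 3; i += 1) {
      if (v[i] !== current[i]) return v[i] > current[i];
    }
    return false; // identical version
  });
}
```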

&lt;p&gt;&lt;strong&gt;Pulse&lt;/strong&gt; — uptime monitoring. HTTP, TCP, DNS, database, API endpoints. Know when something is down before your users do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt; — CVE scanning with Grype + VirusTotal hash analysis. Know what vulnerabilities are in your images and which ones have a fix available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Notify&lt;/strong&gt; — alert routing. When something happens, tell me via email, Telegram, or Discord. One place to configure all channels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hub&lt;/strong&gt; — central control. SSO across all tools. Unified dashboard. Event bus connecting everything. Log Center with 10-day retention. Automated workflows.&lt;/p&gt;

&lt;p&gt;No vendor. No subscription. No data leaving my network. No dashboard I don't control.&lt;/p&gt;




&lt;h2&gt;
  
  
  The part that surprised me
&lt;/h2&gt;

&lt;p&gt;I expected building to be harder than using existing tools. It wasn't.&lt;/p&gt;

&lt;p&gt;The hard part of using existing tools is that they're built for everyone, which means they're optimized for no one in particular. You spend enormous amounts of time configuring things to fit your use case, fighting defaults that make sense for someone else, and working around limitations that exist for business reasons rather than technical ones.&lt;/p&gt;

&lt;p&gt;Building your own tool means you make exactly the decisions you need to make and none of the ones you don't. The scope is small because you define the scope.&lt;/p&gt;

&lt;p&gt;My first version of NEXUS took a weekend. It was rough. But it was mine, it worked, and I understood every line of it. When it broke — and it broke — I knew where to look.&lt;/p&gt;

&lt;p&gt;That's worth more than any amount of documentation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The AI inflection point
&lt;/h2&gt;

&lt;p&gt;I want to be honest about something: I built the early versions slowly, over months, in my spare time as a solo developer.&lt;/p&gt;

&lt;p&gt;The acceleration happened when I started using Claude Code seriously.&lt;/p&gt;

&lt;p&gt;Not because it writes perfect code — it doesn't. But because it lets me work at a different level of abstraction. Instead of writing every function, I describe systems and review implementations. Instead of debugging for hours, I describe the symptom and we narrow down the diagnosis together.&lt;/p&gt;

&lt;p&gt;The six-tool ecosystem I described above — with SSO, event bus, real-time logs, CVE scanning, automated workflows — was built in weeks of sessions, not months. A team of seven engineers would have taken nearly a year to build the equivalent.&lt;/p&gt;

&lt;p&gt;That's not an exaggeration. That's what changed.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I gave up
&lt;/h2&gt;

&lt;p&gt;I'm not going to pretend building your own tools is always better. There are real tradeoffs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Support&lt;/strong&gt;: When Portainer breaks, you file an issue. When my tool breaks, I fix it. That's fine when I have context. It's less fine at 2am when I've forgotten how something works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;: Portainer has years of community contributions. My tools have exactly the features I've needed so far. Some things I want don't exist yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt;: I don't need it. If you do, build on something that already supports it. Don't build Kubernetes support yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time&lt;/strong&gt;: There's always more to build. It's genuinely hard to stop adding features when you're also the user.&lt;/p&gt;

&lt;p&gt;These are real costs. For my use case, the benefits outweigh them. For yours, they might not.&lt;/p&gt;




&lt;h2&gt;
  
  
  The question worth asking
&lt;/h2&gt;

&lt;p&gt;Before you learn the next tool on your list, ask yourself: what problem am I actually trying to solve?&lt;/p&gt;

&lt;p&gt;If the answer is "I need to understand this technology for my job" — learn it. The knowledge compounds.&lt;/p&gt;

&lt;p&gt;If the answer is "I need to solve this specific problem" — consider whether the tool is actually the minimum solution or whether you're reaching for it out of habit.&lt;/p&gt;

&lt;p&gt;Sometimes the right tool is one you build yourself. Most of the time it isn't. The skill is knowing the difference.&lt;/p&gt;

&lt;p&gt;After 15 years, I'm still learning that difference.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where NEXUS is now
&lt;/h2&gt;

&lt;p&gt;NEXUS Ecosystem is open source. Six tools, unified Hub, Docker-native, self-hosted.&lt;/p&gt;

&lt;p&gt;I'm running it in my homelab and on an internal test server right now — 24/7, watching it fail in interesting ways and fixing what breaks. The plan is to publish the Hub integration formally once it's stable.&lt;/p&gt;

&lt;p&gt;If you're tired of the tool treadmill and want to see what a self-hosted alternative looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: github.com/Alvarito1983&lt;/li&gt;
&lt;li&gt;Docker Hub: hub.docker.com/u/afraguas1983&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;#devops #docker #selfhosted #opensource #career #programming #sysadmin #productivity&lt;/p&gt;

</description>
      <category>devops</category>
      <category>docker</category>
      <category>selfhosted</category>
      <category>career</category>
    </item>
    <item>
      <title>I scanned my own Docker images. Here's what I found — and how I built the scanner.</title>
      <dc:creator>Alvarito1983</dc:creator>
      <pubDate>Sat, 11 Apr 2026 09:25:34 +0000</pubDate>
      <link>https://dev.to/alvarito1983/i-scanned-my-own-docker-images-heres-what-i-found-and-how-i-built-the-scanner-58cd</link>
      <guid>https://dev.to/alvarito1983/i-scanned-my-own-docker-images-heres-what-i-found-and-how-i-built-the-scanner-58cd</guid>
      <description>&lt;p&gt;I scanned my own Docker images today.&lt;/p&gt;

&lt;p&gt;I wasn't expecting much. These are images I built myself, running in my homelab, on a private network. I update them regularly. I'm a Senior Systems Administrator — I know what I'm doing.&lt;/p&gt;

&lt;p&gt;308 vulnerabilities. 10 Critical. 298 High.&lt;/p&gt;

&lt;p&gt;The Critical ones? Two axios vulnerabilities — SSRF bypass and Cloud Metadata Exfiltration — sitting in production code I wrote. Fixed in axios 1.15.0. I was running 1.14.0.&lt;/p&gt;

&lt;p&gt;This is the Docker security problem nobody talks about.&lt;/p&gt;




&lt;h2&gt;
  
  
  The invisible attack surface
&lt;/h2&gt;

&lt;p&gt;When you run a Docker container, you're not just running your code. You're running:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your application dependencies (npm packages, pip packages, gems)&lt;/li&gt;
&lt;li&gt;The base image (node:alpine, python:slim, ubuntu)&lt;/li&gt;
&lt;li&gt;Every library those packages pull in transitively&lt;/li&gt;
&lt;li&gt;The OS-level packages in the container&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each layer is a potential attack surface. And unlike your application code — which you read, review, and update — the dependencies update silently, the base images accumulate CVEs, and the transitive dependencies are invisible.&lt;/p&gt;

&lt;p&gt;Most developers have no idea what's actually running inside their containers.&lt;/p&gt;
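&lt;p&gt;To make that concrete, here's a toy illustration. The dependency tree below is invented, shaped like an npm lockfile's nested &lt;code&gt;dependencies&lt;/code&gt; map, and the point is the gap between what you asked for and what you're actually running:&lt;/p&gt;

```javascript
// Toy dependency tree in the shape of a package-lock "dependencies"
// map: each entry may carry its own nested "dependencies".
// The data is invented for illustration, not a real lockfile.
const tree = {
  dependencies: {
    express: {
      version: '4.18.2',
      dependencies: {
        'body-parser': { version: '1.20.1', dependencies: { qs: { version: '6.11.0' } } },
        qs: { version: '6.11.0' }
      }
    },
    axios: {
      version: '1.14.0',
      dependencies: { 'follow-redirects': { version: '1.15.0' } }
    }
  }
};

// Walk the tree and count every installed package, direct or not.
function countInstalled(node) {
  let count = 0;
  for (const name of Object.keys(node.dependencies || {})) {
    count += 1 + countInstalled(node.dependencies[name]);
  }
  return count;
}

const direct = Object.keys(tree.dependencies).length; // what you asked for
const total = countInstalled(tree);                   // what you actually run
// prints "direct: 2, installed: 6"
console.log(`direct: ${direct}, installed: ${total}`);
```

Two packages requested, six running. A real lockfile multiplies that gap by orders of magnitude.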




&lt;h2&gt;
  
  
  What I built: NEXUS Security
&lt;/h2&gt;

&lt;p&gt;NEXUS Security is a module of the NEXUS Ecosystem — a suite of self-hosted Docker management tools. Security's job is to give you visibility into what's actually running in your containers and flag what's dangerous.&lt;/p&gt;

&lt;p&gt;It does two things that most tools don't combine:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CVE scanning with Grype&lt;/strong&gt; — vulnerability detection against the container's actual content&lt;br&gt;
&lt;strong&gt;Hash analysis with VirusTotal&lt;/strong&gt; — malware detection at the image layer level&lt;/p&gt;

&lt;p&gt;These are fundamentally different approaches. Understanding why both matter requires understanding how Docker images actually work.&lt;/p&gt;


&lt;h2&gt;
  
  
  How Docker images work (the part that matters for security)
&lt;/h2&gt;

&lt;p&gt;A Docker image isn't a monolithic file. It's a stack of layers, each one a filesystem delta from the previous.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nexus-nexus-hub:latest
├── Layer 1: node:22-alpine (base OS + Node.js runtime)
├── Layer 2: npm install (your dependencies)
├── Layer 3: COPY . . (your application code)
└── Layer 4: entrypoint configuration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer has a SHA256 hash. The final image hash is derived from all layers combined.&lt;/p&gt;

&lt;p&gt;This structure matters for security in two ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CVE scanning&lt;/strong&gt; looks &lt;em&gt;inside&lt;/em&gt; the layers — it reads the package manifests, identifies installed software, and checks each one against vulnerability databases. Grype does this: it extracts the SBOM (Software Bill of Materials) from the image and cross-references against NVD, GitHub Advisory, and other databases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hash analysis&lt;/strong&gt; looks at the layer hashes themselves — it asks "has this specific layer ever been seen containing malware?" VirusTotal has seen billions of files. If a layer hash matches something previously flagged, it surfaces immediately.&lt;/p&gt;

&lt;p&gt;Neither approach alone is sufficient. Grype catches known CVEs in legitimate software. VirusTotal catches tampered or malicious images that might pass CVE scanning cleanly.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Grype integration
&lt;/h2&gt;

&lt;p&gt;Grype is an open-source vulnerability scanner from Anchore. It runs as a CLI tool, takes an image reference, and outputs a structured JSON report of every vulnerability found.&lt;/p&gt;

&lt;p&gt;Installing it in the Security container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;apk add &lt;span class="nt"&gt;--no-cache&lt;/span&gt; curl &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    curl &lt;span class="nt"&gt;-sSfL&lt;/span&gt; https://raw.githubusercontent.com/anchore/grype/main/install.sh &lt;span class="se"&gt;\
&lt;/span&gt;    | sh &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;-b&lt;/span&gt; /usr/local/bin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running a scan from Node.js:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;execSync&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;child_process&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;scanImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imageRef&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`grype &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;imageRef&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; -o json --quiet`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;maxBuffer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vulnerability&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vulnerability&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;package&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;artifact&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;artifact&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;fixedIn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vulnerability&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fix&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;versions&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vulnerability&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scan is asynchronous — for large images it takes 30-60 seconds. NEXUS Security returns a &lt;code&gt;scanId&lt;/code&gt; immediately and emits a Socket.io event when complete, updating the UI in real time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the scan actually found
&lt;/h2&gt;

&lt;p&gt;Running against my own ecosystem — six images from my stack, four built by me plus two third-party (nginx and the socket proxy):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Image&lt;/th&gt;
&lt;th&gt;Critical&lt;/th&gt;
&lt;th&gt;High&lt;/th&gt;
&lt;th&gt;Total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;nexus-nexus-hub&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nexus-nexus-watcher&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nexus-nexus-security&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nexus-nexus&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nginx:latest&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;209&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tecnativa/docker-socket-proxy&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The two Critical findings in Hub and Watcher were identical: &lt;code&gt;axios 1.14.0&lt;/code&gt;, carrying two separate CVEs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GHSA-3p68-rc4w-qgx5&lt;/strong&gt;: NO_PROXY Hostname Normalization Bypass → SSRF&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GHSA-fvcv-3m26-pcqx&lt;/strong&gt;: Unrestricted Cloud Metadata Exfiltration via Header Injection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both fixed in &lt;code&gt;axios 1.15.0&lt;/code&gt;. I updated all backends immediately.&lt;/p&gt;

&lt;p&gt;The nginx findings are worth noting: 209 vulnerabilities, none Critical. This is normal — nginx:latest carries a lot of system library CVEs that have no patches available. The signal-to-noise problem in Docker security is real. A tool that shows you 209 vulnerabilities without context is almost worse than no tool at all.&lt;/p&gt;
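&lt;p&gt;The context that matters is "is it severe, and does it have a fix?" A small triage pass over a Grype-shaped report separates the actionable findings from the noise. The report object here is invented, but the &lt;code&gt;matches[].vulnerability&lt;/code&gt; / &lt;code&gt;artifact&lt;/code&gt; shape mirrors Grype's JSON output:&lt;/p&gt;

```javascript
// Count findings by severity, but only surface the ones worth acting
// on: Critical or High severity with a fixed version available.
function triage(report) {
  const counts = {};
  const actionable = [];
  for (const m of report.matches) {
    const sev = m.vulnerability.severity || 'Unknown';
    counts[sev] = (counts[sev] || 0) + 1;
    const fix = m.vulnerability.fix?.versions?.[0];
    if (fix !== undefined) {
      if (sev === 'Critical' || sev === 'High') {
        actionable.push({ id: m.vulnerability.id, pkg: m.artifact.name, fix });
      }
    }
  }
  return { counts, actionable };
}

// Invented report: one fixable Critical, one High with no fix yet.
const report = {
  matches: [
    {
      vulnerability: { id: 'GHSA-3p68-rc4w-qgx5', severity: 'Critical', fix: { versions: ['1.15.0'] } },
      artifact: { name: 'axios', version: '1.14.0' }
    },
    {
      vulnerability: { id: 'CVE-2024-0000', severity: 'High', fix: { versions: [] } },
      artifact: { name: 'zlib', version: '1.2.13' }
    }
  ]
};

console.log(triage(report).actionable); // only the axios finding surfaces
```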




&lt;h2&gt;
  
  
  The VirusTotal integration
&lt;/h2&gt;

&lt;p&gt;VirusTotal's API v3 lets you check a file hash against its database of 70+ antivirus engines. For Docker images you can check the layer digests individually or, as in this sketch, the image's top-level content hash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;scanWithVirusTotal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imageRef&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Get image manifest to extract layer hashes&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inspect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nf"&gt;execSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`docker inspect &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;imageRef&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;imageId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;inspect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sha256:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Query VirusTotal&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`https://www.virustotal.com/api/v3/files/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;imageId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-apikey&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Not seen by VirusTotal&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;last_analysis_stats&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;malicious&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;malicious&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;suspicious&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;suspicious&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;undetected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;permalink&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`https://virustotal.com/gui/file/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;imageId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The free VirusTotal tier allows 500 requests/day — more than enough for a homelab or small team.&lt;/p&gt;
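&lt;p&gt;Scheduled scans can burn through that budget quickly, so it's worth guarding the call site. A minimal daily-quota guard that resets at UTC midnight (my own sketch, not part of the VirusTotal API):&lt;/p&gt;

```javascript
// Returns a function that spends one unit of a per-day budget and
// reports whether the call is allowed. State resets when the UTC
// date changes.
function makeDailyQuota(limitPerDay) {
  let day = new Date().toISOString().slice(0, 10);
  let used = 0;
  return function tryConsume() {
    const today = new Date().toISOString().slice(0, 10);
    if (today !== day) {
      day = today;
      used = 0; // new UTC day: budget refills
    }
    if (used >= limitPerDay) return false; // over budget, skip the API call
    used += 1;
    return true;
  };
}

// Guard the VirusTotal lookup with the free-tier limit.
const vtAllowed = makeDailyQuota(500);
if (vtAllowed()) {
  // ...call scanWithVirusTotal(imageRef, apiKey) here...
}
```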




&lt;h2&gt;
  
  
  The event-driven alert system
&lt;/h2&gt;

&lt;p&gt;Scanning is only useful if someone sees the results. NEXUS Security integrates with the Hub's event bus — when a Critical vulnerability is found, it emits an event that flows through the entire ecosystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Security emits when Critical found&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;emitEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;vulnerability.critical&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;critical&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;cve&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;vuln&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;package&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;vuln&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kr"&gt;package&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;vuln&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;imageRef&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;fixedIn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;vuln&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fixedIn&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Hub flow engine receives and routes to Notify&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;vulnerability.critical&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;eventType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;vulnerability.critical&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;notify&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;🔴 Critical vulnerability: {cve} in {package} ({image}) — fix: {fixedIn}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
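&lt;p&gt;The &lt;code&gt;{cve}&lt;/code&gt;-style placeholders in that message get filled from the event payload. A rendering helper might look like this (an assumed sketch of the idea, not the Hub's actual implementation):&lt;/p&gt;

```javascript
// Replace each {key} placeholder with the matching payload field;
// unknown keys are left untouched so typos stay visible.
function renderMessage(template, payload) {
  return template.replace(/\{(\w+)\}/g, (whole, key) =>
    key in payload ? String(payload[key]) : whole
  );
}

const msg = renderMessage(
  'Critical vulnerability: {cve} in {package} ({image}) fix: {fixedIn}',
  { cve: 'GHSA-3p68-rc4w-qgx5', package: 'axios', image: 'nexus-nexus-hub:latest', fixedIn: '1.15.0' }
);
console.log(msg);
```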





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Grype detects Critical CVE
        ↓
Security emits event to Hub
        ↓
Hub flow engine processes it
        ↓
Notify sends to all active channels
        ↓
Email / Telegram / Discord alert
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No polling. No manual checking. The moment a scan finds something Critical, it lands in your inbox.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's coming when Security ships as a Hub module
&lt;/h2&gt;

&lt;p&gt;The current implementation is functional but early. When NEXUS Security ships as a full Hub module, the plan includes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scheduled scanning&lt;/strong&gt; — automatic scans on a configurable schedule, not just on demand. Images get re-scanned when Watcher detects an update.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Baseline suppression&lt;/strong&gt; — mark known/accepted vulnerabilities so they don't keep triggering alerts. The noise problem is real; you need a way to say "I know about this one, it has no fix, stop telling me."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-host visibility&lt;/strong&gt; — Hub already manages multiple hosts. Security will aggregate vulnerability data across all of them. One dashboard, complete picture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SBOM export&lt;/strong&gt; — export the Software Bill of Materials for each image in standard formats (SPDX, CycloneDX) for compliance and audit trails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix suggestions&lt;/strong&gt; — when a vulnerability has a fixed version, Security will suggest the exact package.json or Dockerfile change needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The uncomfortable conclusion
&lt;/h2&gt;

&lt;p&gt;I built these tools. I ran them on a private network. I updated them regularly. And I still had two Critical vulnerabilities in production because I wasn't tracking transitive dependency updates in axios.&lt;/p&gt;

&lt;p&gt;The uncomfortable truth about Docker security is that you can do everything right — write good code, review PRs, keep your application logic clean — and still be exposed by a dependency you didn't know you were running.&lt;/p&gt;

&lt;p&gt;The answer isn't paranoia. It's visibility. Know what's running. Know what's vulnerable. Know what has a fix. Act on the Critical ones.&lt;/p&gt;

&lt;p&gt;Everything else is acceptable risk, consciously taken.&lt;/p&gt;




&lt;p&gt;NEXUS Security is part of NEXUS Ecosystem — open source, self-hosted Docker management.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: github.com/Alvarito1983&lt;/li&gt;
&lt;li&gt;Docker Hub: hub.docker.com/u/afraguas1983&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;#docker #security #opensource #selfhosted #devops #programming #devsecops #nodejs&lt;/p&gt;

</description>
      <category>docker</category>
      <category>security</category>
      <category>devsecops</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
