DEV Community: Alvarito1983

I asked Claude Code to refactor a feature it built last week. It hated its own code.

Alvarito1983 — Mon, 27 Apr 2026 08:50:42 +0000

Last week I opened a feature I'd built six days earlier and asked Claude Code to extend it.

I hadn't touched the file since. The code was working. Tests passing. Nothing was on fire. I just wanted to add a small capability — fifteen minutes of work, maybe.

What I got back wasn't an extension. It was a quiet roast.

"Before extending this, I'd recommend refactoring. The current implementation has a few issues: the error handling is inconsistent, the function is doing more than its name suggests, and there's a duplicated validation block that should be a helper. Want me to fix those first?"

Reader, that code was written by Claude Code. Same project. Same model tier. Six days earlier. By the same agent that was now telling me, with polite confidence, that it was a mess.

I sat with that for a minute. Then I started thinking about what it meant. Because the more I thought, the more I realized that this — this exact moment — is the thing nobody is writing about with AI coding agents.

The "memory problem" everyone complains about

If you've used Claude Code, Codex, Cursor in agent mode, or any of the rest, you've hit it: agents don't have continuity between sessions. Every new conversation starts cold. They don't remember what they decided last Tuesday. They don't remember why a particular abstraction exists. They don't remember that you already debated naming this function and settled on something specific.

Most of the discourse treats this as a problem to solve. We bolt on memory systems. We pile up CLAUDE.md files. We build context-loading scripts. The implicit assumption is that the agent should remember, and the work is to compensate for the fact that it doesn't.

I'm starting to think that framing is exactly backwards.

What actually happened in that session

When Claude Code criticized its own code from six days earlier, it wasn't being inconsistent. It wasn't malfunctioning. It was being honest — in a way no human collaborator can.

Think about what just happened:

A reviewer with no ego attachment to the code looked at it fresh.
It had no memory of why we wrote it that way, what we'd ruled out, what compromises we made under time pressure, or what I muttered at 11pm when the test finally passed.
It just looked at the code on its merits and said: this could be better.

That's not a bug. That's the cleanest code review you'll ever get.

A human reviewing their own code from a week ago has every cognitive bias working against them. They remember the constraints. They remember the deadline. They remember that they "knew" the helper was duplicated but didn't want to break the function signature. They protect their past decisions because their past decisions are theirs.

Claude Code, with no continuity, has none of that. Every session is a new pair of eyes that hasn't been bought off by yesterday's reasoning.

The reframe

Once I saw this, I started doing it on purpose.

When I finish a feature now, I don't immediately ship it. I commit, sleep on it, and the next morning I open a new Claude Code session and ask it to review the code as if it had never seen it. Because for that session, it hasn't.

The reviews are brutal. They are also, almost always, right.

It catches:

Functions that are doing two things and lying about it in their name
Error handling that's defensive in some places and absent in others
Helpers I should have extracted but didn't
Naming that made sense in context but doesn't survive cold reading
Tests that pass but verify the wrong thing

The same agent that wrote the code can't see these issues while it's writing, because it has the same tunnel vision a human does in flow. But strip its memory of the session, hand it the file, and it becomes the reviewer it could never be in real-time.

The "memory problem" is actually a feature gate to a different mode of useful behavior. We've been trying to remove it instead of using it.

Why this changes how I think about CLAUDE.md

I've written about CLAUDE.md before, and I want to refine what I said there, because this experience clarified something for me.

CLAUDE.md is not the agent's memory. That framing is wrong.

CLAUDE.md is the project's memory of decisions that shouldn't be relitigated every session. Things like:

"We use Tailwind v3, not v4. Don't migrate."
"The auth layer is intentionally simple. Don't add OAuth without asking."
"Tests live next to source files, not in /tests."

These are decisions, not style. CLAUDE.md exists so the agent doesn't waste a session arguing with itself about settled choices.

But CLAUDE.md should not contain things like:

"This function does X" (the code says that)
"Last week we refactored Y" (that's history, not a decision)
"The pattern we use is Z" (let the code be the source of truth)

The mistake is using CLAUDE.md as a memory dump. When you do that, you're trying to give the agent continuity — and you lose the fresh-eyes review benefit. You've turned your honest reviewer into a yes-man who already agrees with everything you've decided.

The right CLAUDE.md is short. It pins down decisions. It leaves judgment alone, so a new session can still tell you when your code is bad.

The mental model shift

Stop thinking of an AI coding agent as one consistent collaborator across time.

Start thinking of it as a high-quality reviewer you can summon, fresh, as many times as you want — provided you preserve the decisions, not the reasoning.

This changes the workflow:

Write code with the agent in a session.
Commit when it works.
Open a new session the next day. Ask it to review.
Take the criticism seriously. It's coming from someone who isn't defending yesterday's choices.
If the criticism is wrong, that's a signal that a decision is missing from CLAUDE.md.
If the criticism is right, fix it. The fact that the same agent wrote the code is irrelevant.

I've started shipping noticeably cleaner code since I adopted this. Not because Claude Code got smarter. Because I stopped trying to glue together its sessions and started using the gaps between them.

The closing thought

There's a lot of energy right now going into making AI agents more continuous, more memory-rich, more "aware" across sessions. Long-running agents. Persistent context. Memory layers.

I'm not against any of that. But I think the rush is hiding something valuable. The discontinuity between sessions is the thing that lets the agent be a real reviewer instead of a sympathetic teammate. Once you give it perfect memory, it stops being able to look at your code from outside, and it starts protecting your past decisions the way a human would.

Maybe the goal isn't to give agents continuity. Maybe it's to give them just enough of it — your settled decisions — and protect the rest. Let them forget how the code got that way. Let them tell you, fresh, that it's bad.

Mine did. It was right.

Has Claude Code ever roasted its own past output in your projects? I want to hear the worst one — drop it in the comments.

The AI Coding Tools Panorama in 2026: From Claude Code to the Free Alternatives That Actually Replace It

Alvarito1983 — Sat, 25 Apr 2026 16:18:39 +0000

The AI coding tool space in 2026 looks nothing like it did 18 months ago. Autocomplete is a solved problem. The interesting question now is: which agent do you trust to read your codebase, plan a refactor, run your tests, and not torch your API budget while doing it.

I've been using these tools daily for the last year on a real project — a Docker management suite I'm building solo, where the cost of a bad refactor is hours of cleanup. That context matters, because most "best AI coding tools" lists rank by benchmark scores. Benchmarks measure isolated tasks. They don't measure what happens at hour three of a complex migration when the agent forgets which files it already touched.

This is the honest panorama from someone shipping production code. From the ones everyone knows to the ones that quietly outperform them. Then two opinionated rankings: the five paid tools that justify their price, and the five free ones that make you question whether you need to pay at all.

Let's get into it.

The full panorama

The space has split into clear layers. Most developers running serious AI workflows use 2–3 tools, not one — and once you understand the layers, the picking gets easier.

Layer 1: Inline assistants (autocomplete + chat in your editor)

These are what most people still think "AI coding" means. They suggest the next line, complete a function, answer a quick question. Low cognitive overhead, high frequency, low ambition.

GitHub Copilot is the default. 76% of developers have heard of it, 29% use it at work, and at $10/month for Pro it's the cheapest entry point that doesn't feel like a compromise. It ships in every editor that matters and just works. The growth has stalled — adoption has plateaued — but it's still the safest choice for someone who wants AI in their editor and doesn't want to think about it.

JetBrains AI Assistant + Junie is the equivalent for the JetBrains crowd. 11% adoption combined. If you live in IntelliJ, PyCharm, or WebStorm, it's the natural fit because it understands JetBrains' code intelligence in ways Copilot doesn't.

Tabnine still exists and still gets recommended for teams that need on-prem or air-gapped deployment. Outside that niche, it's been overtaken.

Continue.dev is the open-source Copilot-shaped option. Lives inside VS Code or JetBrains as an extension, brings your own API key, 31K GitHub stars. Less polished than Copilot, but you're not on a per-seat license and you choose your model.

Layer 2: AI-native IDEs (the editor itself is the product)

These are full editor replacements where the AI is woven into every keystroke, not bolted on as an extension.

Cursor is the reference. A VS Code fork that took "AI in the editor" further than anyone else. Composer mode handles multi-file edits with visual diffs, autocomplete is supernaturally fast (Supermaven under the hood), and the agent can take on actual tasks. It's the most productive IDE-based AI experience right now, with one giant asterisk: pricing trust took a hit earlier in 2026 when Anthropic's pricing changes cascaded through Cursor's billing model and a lot of people got surprise bills. The product is still excellent. The pricing is the part you have to watch.

Windsurf is the value alternative. Same category as Cursor, $15/month base, free tier with full IDE features and the Cascade agent. It's been climbing the rankings precisely because it's what Cursor was before the pricing got messy.

Antigravity is Google's entry, launched November 2025. Free during preview. Supports Claude, Gemini, GPT-OSS — the most diverse model lineup of any free tool right now. 6% adoption already, which is fast for a tool this new.

Layer 3: Terminal agents (the new center of gravity)

This is where serious work is happening in 2026. You point an agent at your repo from the terminal, and it reads, edits, runs tests, iterates. The terminal-first approach composes with everything: git, your shell, your CI, your existing scripts.

Claude Code is the one I use daily and the one most experienced developers have settled on. Anthropic's official terminal agent, runs Opus and Sonnet, scores 80.8% on SWE-bench Verified — meaning it actually solves real GitHub issues, not toy problems. It's the best at multi-file reasoning and the best at not losing context on hour three of a complex task. Costs $20/month for Pro, but heavy use can run $100–200/month on the Max plan or via API. That's the elephant in the room.

OpenAI Codex CLI re-entered the conversation in early 2026 with parallel sandboxed execution and automatic PR creation. 3% adoption (data from before its desktop app launched), but climbing. Strong choice if you're already in the OpenAI ecosystem.

Gemini CLI is the underrated one. Google's terminal agent. 1,000 free requests/day with Gemini 2.5 Pro and a 1M context window. Less consistent than Claude on complex refactors, but for the price (zero) and the context size (massive), it's the best free terminal option right now. Don't skip it.

Aider is the elder statesman. 41K GitHub stars, terminal-first, and its defining feature is that every AI edit is a git commit. You get a complete audit trail of what the AI changed and why. Bring your own API key. If you live in git, this is the one to try first.

OpenCode is the most popular open-source terminal agent. 95K stars. Provider-agnostic — supports 75+ models. Free models included, plus you can plug in any API key. Cleaner TUI than Aider, less opinionated about git.

Cline (58K stars) and its forks Roo Code (22K) and Kilo Code (16K) live as VS Code extensions but operate as full agents — they edit files, run commands, and ask for approval at each step. BYOK with zero markup. If you want Cursor-style agentic work but inside vanilla VS Code without the subscription, this is the path.

Layer 4: Cloud agents (autonomous, parallel, expensive)

These run in the cloud, often handling multiple tasks in parallel sandboxes, opening PRs against your repo while you do something else.

Devin is the original autonomous agent. 67% PR merge rate on well-defined tasks. Treats coding tasks like Jira tickets it picks up and ships. $20/month base plus unpredictable per-task costs. Useful as an accelerator for narrow, well-scoped work; treat anything it ships as a draft.

Codex (cloud version) is OpenAI's hosted agent that runs in sandboxed environments and pushes PRs to GitHub. Strong if your stack is OpenAI-aligned.

OpenHands (formerly OpenDevin) is the open-source version of this category. 68K stars, MIT-licensed, BYOK. If you want a cloud-style autonomous agent without the SaaS dependency, this is it.

Layer 5: Specialized tools (the rest)

Replit still owns the "build an app from a prompt in your browser" niche, especially for prototyping and education. Costs accumulate fast at scale.

Bolt.new and Lovable.dev are similar — browser-based, AI-first, great for MVPs and demos. Lovable couples to Supabase tightly, which is convenient until you outgrow it.

v0 (Vercel) generates React components from prompts or Figma designs. Useful for design-to-code, less so for general work.

Tabby is self-hosted autocomplete you run on your own GPU. The privacy-first option for teams that can't send code to anyone.

Snyk Code and Qodo sit alongside the rest of the stack — security scanning and AI code review on PRs. Not coding tools strictly, but they're part of the modern AI workflow.

Top 5 paid tools that actually justify their price

I'm ranking these on a single criterion: would I be measurably less productive without them. Not "is the demo impressive." Not "did they raise a Series B." Would my output drop if I uninstalled it tomorrow.

1. Claude Code — $20–200/month

Best for: complex multi-file work, anything where the model losing context costs you an hour of cleanup.

Claude Code on Opus is the only tool I trust with a refactor that touches more than three files. It plans, executes, runs my tests, and recovers from its own mistakes. The 200K context window plus extended thinking actually changes how I architect things — I can describe a problem at a higher level than I used to.

The price is real. Heavy use can run $100–200/month on Max. The honest framing is: if you're a working developer billing for your time, $200/month for the tool that saves you a few hours a week is the cheapest line item on your invoice. If you're a hobbyist, this isn't the right tier.

2. Cursor — from $20/month

Best for: developers who think visually and want diffs they can scan instead of accept.

If you do most of your work in an editor and the terminal-first model doesn't fit your brain, Cursor is the most productive AI IDE in existence. Composer mode for multi-file changes, instant autocomplete, model orchestration. The pricing trust issue I mentioned is real — monitor your credits, especially if you flip into agent mode for big tasks. With that caveat, it earns its place here.

3. GitHub Copilot — $10/month

Best for: the developer who wants AI in their editor and wants to never think about it again.

Copilot earned its place not by being the best, but by being the floor. $10/month, works everywhere, never surprises you with a bill, integrates with the GitHub ecosystem (PR summaries, issue context, repo activity). It's the silent default. Stop overthinking.

4. Windsurf — from $15/month

Best for: people who want what Cursor was before the pricing drama.

Windsurf does what Cursor does, charges less, and the free tier is genuinely usable for daily work. Cascade agent, plan mode, parallel multi-agent sessions with git worktrees. It's the IDE I'd recommend to someone starting from scratch in 2026 if cost predictability matters.

5. Devin — $20/month + variable

Best for: well-scoped, repetitive tasks you can describe in a paragraph.

Devin shipping a 67% PR merge rate on defined tasks is the data point that matters. It's not autonomy in the science-fiction sense — it's a junior engineer who works on tickets in parallel sandboxes and submits drafts. For the right kind of work (boring but real: dependency upgrades, test coverage, scoped refactors) it earns its keep. For ambiguous work, you'll spend more time correcting it than you save.

Top 5 free tools that genuinely replace a paid one

The bar here: would I recommend this to someone who can't or won't pay, knowing they'll get within 80% of the paid experience.

1. Gemini CLI — free (1,000 requests/day)

Best free terminal agent in 2026, full stop. Gemini 2.5 Pro, 1M context window, runs in your terminal, BYOK with a generous free tier from Google. It's not as consistent as Claude Code on the hardest tasks, but for 90% of work, the gap doesn't matter. And the 1M context window means it can ingest a small codebase in a single prompt — something even paid Claude Code can't always match.

2. Aider — free (BYOK)

Best for terminal-native, git-centric work. Pair it with the DeepSeek API and you're paying $5–15/month total for AI coding that competes with $200/month tools. Every edit is a git commit. Reviewable, revertible, auditable. If you've ever lost track of what an agent changed across a session, Aider's commit history is the answer.

3. Cline — free (BYOK, runs in VS Code)

Best free agent inside an IDE. 5M+ installs, Apache-licensed, every action requires human approval. Plug in your Claude or OpenAI key and you have Cursor's agent capability inside vanilla VS Code without the subscription. The forks (Roo Code, Kilo Code) add structured modes and broader model support — pick whichever feels right; they're all good.

4. GitHub Copilot Free — free (2,000 completions + 50 chats/month)

The free tier of Copilot is real and surprisingly generous for casual or learning use. If you're not coding 8 hours a day, the free quota covers it. The path most people should take: start here, find your edges, then decide whether to pay or move to BYOK.

5. OpenCode — free (BYOK, free models included)

The open-source terminal agent that's quietly become the most popular on GitHub (95K stars). Ships with free models you can use immediately, supports 75+ providers when you bring keys, polished TUI. If you want to try terminal agents without making any decisions about API keys or pricing on day one, OpenCode is the lowest-friction starting point.

What I actually run

For full transparency, since I think it matters more than abstract rankings: I use Claude Code on Max for the heavy lifting, Cline in VS Code for in-editor work where I want approval gates, and Gemini CLI for anything where I want to throw a huge codebase at a model in one prompt. Three tools, three roles, no overlap.

That's the meta-point. In 2026, the question is no longer "which is the best AI coding tool." It's "which combination handles each layer of my workflow without breaking under pressure." Get that right and the cost question answers itself — because the right setup pays for itself in saved hours, and the wrong setup torches credits without shipping anything.

If your current setup is one tool doing everything, you're probably either overpaying for capability you don't use or underpowered for the work you actually do. Pick the layer that hurts most and start there.

What does your stack look like? Curious especially about people running fully BYOK setups — what's the monthly bill landing at versus the equivalent SaaS subscriptions?

Quadlet: The Podman Feature That Finally Makes Sense on a Homelab

Alvarito1983 — Thu, 23 Apr 2026 08:45:09 +0000

If you run a homelab on Docker Compose, you've probably accepted a quiet trade-off without noticing.

Your containers work. They start on boot. They restart if they crash. But they live outside the operating system's service model — orphaned from systemd, invisible to journalctl, dependent on a daemon that has to be alive before anything else in your stack can exist. Your 40 services are really 40 children of a single PID, not 40 first-class citizens of your Linux host.

This isn't a problem, exactly. It's just a ceiling. And over the last year a feature from the Podman side of the container world has quietly raised that ceiling in a way that's interesting enough to write about — even if you, like me, have zero intention of ripping out your Docker Compose setup tomorrow.

The feature is called Quadlet. It has been around since Podman 4.4, but 2026 is the year it's actually becoming the default way sysadmins on Red Hat-adjacent systems manage containers at home. And it does something I've not seen any other container tool do cleanly: it makes a container behave like a native systemd service.

Let's look at what that actually means, what it's good for, and — importantly — when you should probably ignore it.

What Quadlet actually is, in one paragraph

Quadlet is a systemd generator that reads declarative .container files (plus .network, .volume, .pod, .kube, .image and a few others) and converts them at boot time into real systemd service units. You don't run podman run and you don't write unit files by hand. You drop an INI-style file into a specific directory, run systemctl daemon-reload, and now you have a fully managed service with restart policies, dependency management, journal logging, healthchecks, and optional auto-updates — all driven by systemd, not by a background daemon.

That's the whole idea. The container is no longer something Podman runs on the side. It's something your operating system runs, the same way it runs sshd or nginx.

The simplest possible example

A minimal rootless Quadlet for something useful — let's say Vaultwarden — looks like this:

# ~/.config/containers/systemd/vaultwarden.container
[Unit]
Description=Vaultwarden password vault
After=network-online.target

[Container]
Image=docker.io/vaultwarden/server:latest
ContainerName=vaultwarden
PublishPort=8080:80
Volume=vaultwarden-data.volume:/data
AutoUpdate=registry

[Service]
Restart=always
TimeoutStartSec=300

[Install]
WantedBy=default.target

Then:

systemctl --user daemon-reload
systemctl --user start vaultwarden.service

That's it. There's no vaultwarden.service file on disk — Quadlet generates it on the fly. If you edit the .container file and reload, the generated unit updates. If a newer Podman version ships improvements to the generator, your service picks them up next reboot without you touching anything.

The volume referenced as vaultwarden-data.volume is another Quadlet file (a .volume), which generates a named volume managed by systemd the same way.

What you actually gain

This is the part that took me a while to appreciate. It's not "Podman has replaced Docker." It's that the integration layer is fundamentally different.

Unified logging. journalctl --user -u vaultwarden.service gives you the container's logs in the same place and same format as the rest of the system. No docker logs, no parallel logging story. When something breaks at 2am, you're searching one place.

Real dependency management. Want your Nextcloud container to only start after MariaDB is ready and network-online.target has fired? Write it in the [Unit] section like any other service. systemd handles the ordering, not a depends_on line that only waits for process start.

Restart policies and health-triggered recovery. Quadlet exposes HealthCmd, HealthInterval, HealthOnFailure, HealthRetries directly. Combined with HealthOnFailure=kill and systemd's Restart=always, you get a self-healing service loop that doesn't depend on anything watching from outside.

Auto-update with automatic rollback. Add AutoUpdate=registry and the podman-auto-update.timer checks daily for new image digests. If the new image fails the healthcheck after restart, Podman rolls back to the previous image automatically. No Watchtower, no cron hack, no custom script — and crucially, no blind latest tag chasing that silently deploys a broken image.

Rootless by default. The file above runs under your user account. A container breakout gets your UID, not root. For a homelab exposing services, this is a meaningful reduction in blast radius.

No daemon to keep alive. There's no dockerd that has to be up before your stack exists. Containers are direct children of systemd. If Podman itself has a bug tomorrow, the contract with your services is still "this is a systemd unit" — same tools, same lifecycle, same recovery procedures you already know.

The comparison that matters: Quadlet vs Docker Compose

I want to be honest here because most of what's written online about Podman/Quadlet treats Docker Compose as the enemy. It isn't. Compose and Quadlet solve overlapping problems with very different philosophies.

Docker Compose thinks in stacks. One docker-compose.yml describes a self-contained application: services, networks, volumes, dependencies, all in one file. You docker compose up -d and the stack exists. It's portable across any machine with Docker. You can commit it to a git repo and anyone can reproduce your environment. The mental model is: an application is a folder.

Quadlet thinks in system services. Each container is its own unit file. Networks and volumes are their own unit files. The relationships are expressed through systemd's dependency graph, not through a single manifest. The mental model is: a container is a service that happens to run in a container.

For development, portability, and throwaway environments, Compose wins. It's simpler, the tooling is mature, and the ecosystem is vast. For a long-lived single-host deployment where you want containers to behave like every other service on your Linux box, Quadlet wins.

The honest answer for most homelabbers is that both are valid, and which one fits depends on what you actually value.

When Quadlet is the wrong answer

Skip this feature if any of the following apply:

Your homelab runs on Windows / WSL2 / macOS. Quadlet is Linux + systemd. Period. If you're on Docker Desktop, this isn't for you.
You're deeply invested in Compose-specific features like profiles, extends, x-* anchors, or the Compose CLI's build workflows. podman-compose exists but compatibility isn't 100%, and Quadlet isn't trying to replicate Compose semantics.
You want GUI-first management. Portainer, Dockge, and the like are built around Docker. Cockpit has Quadlet integration now, but the ecosystem is thinner.
You share stacks across machines with non-Linux users. Compose files are more portable as artifacts than a folder of .container units tied to one host's systemd tree.
GPU passthrough for media transcoding is a hard requirement and you have it working on Docker. It's doable on Podman, but expect extra work. If your Jellyfin is happy today, don't break it for philosophy.

When Quadlet is genuinely the right answer

Reach for it when:

You run a Linux-native homelab on Fedora, Rocky, Alma, CentOS Stream, Debian, or openSUSE and your containers are long-lived services rather than dev environments.
You care about boot ordering. "Start reverse proxy after databases after network is up" is a one-liner in systemd, and a fragile convention in Compose.
You want auto-update with rollback without bolting on a third-party tool.
You're tired of the Docker daemon being a single point of failure for your entire container story.
You're already comfortable with systemd as an operator. This is the key one. If systemctl, journalctl, and unit file syntax feel natural to you, Quadlet is going to feel like containers finally speaking your language. If they don't, the learning curve is real and not worth it for a Plex install.

The hybrid setup that actually makes sense

This is where I've landed, and I think it's where most practical people will land:

Docker Compose for development and prototyping. Fast iteration, familiar tooling, easy to share. If I'm testing a new self-hosted tool for the first time, it's going into a Compose file in a scratch directory.
Quadlet for the services I've committed to. Once something graduates from "I'm trying this out" to "this runs my household," it's worth the 15 minutes to port it to a .container file and let systemd own its lifecycle.

You don't have to pick one. On a Linux homelab you can run both side by side. The Docker daemon and Podman's rootless stack don't fight each other — they just don't talk.

The honest closing thought

Quadlet isn't going to replace Docker Compose in most homelabs, and I don't think it should. Compose is good at what Compose is good for. What Quadlet does is close a gap I didn't realize I had: my containers were never really part of my operating system. They were tenants. With Quadlet, for the services where it matters, they become residents.

That's a small distinction until it isn't — until the night something breaks and you realize your recovery story is the same one you use for every other service on the box, instead of a separate container-shaped exception.

If you run Linux and you've been curious about Podman but couldn't articulate what the actual upgrade was, this is it. Not the daemon thing. Not the rootless thing in isolation. The integration.

Try it on one service. See how it feels. If it clicks, you'll know. If it doesn't, your Compose file is still there, and nothing has been lost.

Have you moved any part of your stack to Quadlet, or tried and bounced off it? I'd like to hear which services worked cleanly and which were more trouble than they were worth — drop it in the comments.

The WSL2 Guide I Wish I Had: 4 Gotchas That Will Eat Your Afternoon

Alvarito1983 — Wed, 22 Apr 2026 12:38:57 +0000

WSL2 is a fantastic development environment on Windows. It's also a system with sharp edges that the official docs rarely highlight — the kind you only discover after losing an afternoon to a process eating 300% CPU for no apparent reason.

This guide documents four specific problems I've hit repeatedly over the last year while using WSL2 as my main development environment for Docker-based projects. For each one: the root cause, why the obvious fix doesn't work, and what actually solves it.

This isn't an introduction to WSL2. If you're already using it daily and something feels off, keep reading.

1. Docker Desktop, cgroups, and processes that ignore resource limits

The symptom

You run a container on Docker Desktop for Windows (which uses WSL2 under the hood). The container executes a CPU-intensive process — a vulnerability scanner, a compiler, a batch job.

You watch htop and the process is consuming 300%+ CPU, dragging the entire system down.

You think: "no problem, I'll throttle it."

services:
  heavy-worker:
    image: my-scanner:latest
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
    cpu_count: 1

You restart the container.

Still 300% CPU.

You try nice, ionice, cpulimit... nothing works.

The root cause

Docker Desktop runs containers inside a WSL2-hosted VM using cgroup v2, often with limited controllers.

The result:

deploy.resources.limits → ignored
cpu_count → ignored
nice / ionice → ineffective

You can verify:

cat /sys/fs/cgroup/cgroup.controllers

You'll often see fewer controllers than on a native Linux system.

What actually works

Option A (best): Limit the WSL2 VM

[wsl2]
processors=4
memory=8GB
swap=2GB

Then:

wsl --shutdown

Now the VM is capped → containers behave predictably.

Option B: Use tool-level throttling

Examples:

--parallel=1
--low-mem

These bypass scheduler issues entirely.

Option C: Replace the tool

If it's designed to max all cores and can't be tuned → wrong tool for WSL2.

Key takeaway

Don't trust container limits on WSL2. Control the VM or use self-throttling tools.

2. Disk performance: `/mnt/c` vs native WSL filesystem

The symptom

Working in:

/mnt/c/Users/you/projects

npm install → 8 minutes
git status → 4 seconds

Move to:

~/projects

npm install → 25 seconds
git status → instant

The root cause

/mnt/c uses the 9P protocol → every filesystem call crosses the Windows ↔ Linux boundary.

Heavy IO workloads (Node, Git, Docker builds) get destroyed by latency.

Native WSL FS = ext4 inside VHDX → near-native Linux speed.

Real benchmark

Operation	`/mnt/c`	WSL native
`npm install`	~8 min	~25 sec
`git status` (10k files)	~4 sec	< 100 ms
`docker build` context	~90 sec	~3 sec

What actually works

Rule: keep code inside WSL.

~/projects/myapp

Access options:

VS Code + WSL extension ✅ (best)
\\wsl$\Ubuntu\home\you\projects (OK)

If you MUST use /mnt/c:

Move heavy dirs (node_modules, .git) to WSL
Use symlinks

Key takeaway

/mnt/c is for compatibility, not performance.

3. Networking: ports, shifting IPs, and host access

The symptom

localhost:3000 → sometimes works, sometimes not
WSL IP changes every reboot
LAN access → broken

The root cause

WSL2 networking is:

NATed via Hyper-V
Not bridged
Dynamic IP
Partial localhost forwarding

What actually works

Ensure localhost forwarding:

[wsl2]
localhostForwarding=true

Get WSL IP:

ip addr show eth0 | grep 'inet ' | awk '{print $2}' | cut -d/ -f1

Expose to LAN (port proxy):

$wslIP = (wsl hostname -I).Trim().Split(' ')[0]

netsh interface portproxy add v4tov4 `
  listenport=3000 listenaddress=0.0.0.0 `
  connectport=3000 connectaddress=$wslIP

New-NetFirewallRule -DisplayName "WSL 3000" `
  -Direction Inbound -LocalPort 3000 `
  -Protocol TCP -Action Allow

Best solution (modern WSL):

[wsl2]
networkingMode=mirrored

✔ Same network as host
✔ No NAT issues
✔ LAN works directly

Key takeaway

WSL2 networking = NAT. Use mirrored mode if available.

4. Memory: the `vmmem` problem

The symptom

Start day → 4 GB used
Work with Docker
Stop everything
vmmem still using 12 GB

Never released.

The root cause

WSL2:

Allocates memory dynamically
Does not release it back
Linux keeps cache (normal behavior)
Windows cannot reclaim it

What actually works

Cap memory:

[wsl2]
memory=8GB
swap=4GB

Enable auto reclaim:

[experimental]
autoMemoryReclaim=gradual
sparseVhd=true

Manual reclaim:

sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"

Last resort:

wsl --shutdown

Key takeaway

WSL will NOT give memory back unless you force it or configure it.

Minimal `.wslconfig` (recommended)

[wsl2]
memory=8GB
processors=4
swap=2GB

localhostForwarding=true
networkingMode=mirrored

[experimental]
autoMemoryReclaim=gradual
sparseVhd=true

📍 Path:

C:\Users\<you>\.wslconfig

Apply with:

wsl --shutdown

Closing thoughts

Most of these issues come from one fact:

WSL2 is a VM pretending to be native Linux.

And the cracks show when:

You push CPU
You do heavy IO
You rely on networking assumptions
You expect Linux memory behavior to match Windows

WSL2 is still excellent — but only if you understand:

cgroups quirks
/mnt/c performance trap
NAT networking
Memory ballooning

Once you do, most "random issues" become predictable.

If you've hit other WSL2 gotchas, drop them in the comments 👇

The one that surprised me most? Spending 3 days tuning container limits… that were being completely ignored.

💡 Did this save you an afternoon? A follow or reaction helps me write more of these.

Everyone is using Claude Code wrong.

Alvarito1983 — Mon, 20 Apr 2026 14:17:15 +0000

Not because they're bad developers. Because the mental model is wrong from the start.

Most people open Claude Code and treat it like a smarter autocomplete. They ask it to write a function. Fix a bug. Generate a component. Then they wonder why the output needs so much correction, why the context gets lost after a few sessions, why it feels like babysitting rather than collaborating.

The problem isn't Claude Code. It's the job description you gave it.

The wrong mental model

When you use Claude Code as a coding assistant, you're asking it to be a fast typist. It writes code, you review it, you correct it, you move on.

That works. It's faster than writing everything yourself. But it's not where the leverage is.

The leverage is in treating Claude Code as a contractor, not a typist. A contractor who can execute an entire feature end-to-end — backend, frontend, tests, documentation — if you give them the right briefing.

The difference between a typist and a contractor isn't skill. It's context.

What Claude Code actually needs

Claude Code has no persistent memory between sessions. Every time you start a new conversation, it starts completely fresh. It doesn't know your architecture decisions, your naming conventions, your known bugs, or where you left off last time.

Most people solve this by re-explaining everything at the start of each session. That's the wrong solution. It's slow, it's incomplete, and you always forget something important.

The right solution is a single file that does the re-explaining for you.

I call it CLAUDE.md. It lives at the root of every project. Claude Code reads it automatically at the start of every session.

It has four sections:

Current State — what's working, what's broken, where to pick up. Updated at the end of every session.

Architecture Decisions — not what you built, but why. "We use in-memory sessions because adding a persistence layer would complicate the Docker setup for homelab users." The reasoning that would otherwise live only in your head.

Conventions — specific rules Claude Code must follow. Not "write clean code." That's useless. Instead: exact patterns learned from real debugging sessions. Things that would take 30 minutes to rediscover without documentation.

Known Issues — bugs and limitations that are known but not yet fixed. Prevents Claude Code from "fixing" something intentionally left as-is.

Without this file, every session starts at zero. With it, Claude Code picks up exactly where you left off.

The second mistake: one task at a time

Most people use Claude Code sequentially. Finish one thing, start the next.

Claude Code supports sub-agents — multiple parallel workstreams running simultaneously. If you need to build five related components, you don't have to build them one by one.

The mental shift is from "what's the next task" to "what can run in parallel." A frontend and a backend for the same feature. Unit tests while the implementation is being written. Documentation while the code is being reviewed.

This is where the output multiplier kicks in. Not 2x faster — closer to 5x, because the bottleneck is no longer Claude Code's speed. It's your ability to architect work that can run in parallel.

The third mistake: no standards document

When you use Claude Code across multiple sessions or multiple parts of a project, consistency breaks down fast. The button style from session one doesn't match session three. The error handling pattern in one module contradicts another.

The fix is a standards document in CLAUDE.md. Not a style guide for humans — a set of rules Claude Code must follow exactly, every time, without being reminded.

Color values. Component patterns. API response shapes. Auth flows. Every decision that should be consistent across the codebase, written down once, enforced automatically.

Without it, you spend half your time correcting drift. With it, Claude Code enforces your standards better than you would yourself — because it doesn't get tired or forget.

The fourth mistake: no forcing function for updates

The obvious weakness of CLAUDE.md is staleness. If you don't update it, it becomes worse than useless — it confidently points Claude Code in the wrong direction.

Discipline doesn't work. The sessions where you forget to update are exactly the sessions where something important happened.

The fix is simple: at the end of every session, ask Claude Code to update the Current State section before closing.

"Before we finish, update the Current State section in CLAUDE.md to reflect what we did today, what's working, and where to pick up next time."

Thirty seconds. While the context is still fresh. The model is better at summarizing what just happened than you are at remembering to write it down.

It catches maybe 80% of sessions. The other 20% are interrupted sessions — closed terminal, crashed IDE, ran out of time. Those you can't fully solve. But 80% is enough to make the system work.

What changes when you get this right

The output doesn't feel like AI-assisted coding anymore. It feels like having a contractor who knows your codebase, follows your standards, picks up exactly where you left off, and can work on multiple things at once.

The work you actually do shifts. Less time writing boilerplate. Less time correcting style drift. Less time re-explaining context. More time on architecture, on product decisions, on the problems that actually require human judgment.

That's the job description Claude Code was built for. Not typist. Architect's executor.

Most people never get there because they never give it the right briefing.

The CLAUDE.md approach and sub-agent patterns came out of building a 15-tool Docker management platform over several weeks. If you want the specifics on how to structure the file, I wrote about it here: [link to previous article]

I let Claude AI decide whether to patch my Docker vulnerabilities — here's what it found

Alvarito1983 — Sat, 18 Apr 2026 18:02:54 +0000

Every security scanner will tell you what's vulnerable.

None of them will tell you what to actually do about it.

You get a list. CVE IDs, severity badges, affected packages. Then you're alone with the question that actually matters: is this exploitable in my setup, can I safely apply the fix, and does patching this break anything?

I've been building a self-hosted Docker management platform. Last week I wired up an AI layer on top of the vulnerability scanner — not to automate patching, but to automate the reasoning about whether to patch. Here's what happened.

The problem with "critical"

Critical doesn't mean the same thing in every context.

CVE-2025-58050 is a critical vulnerability in pcre2. On paper: patch immediately. In practice: it's in a socket proxy image maintained by a third party. I don't control that Dockerfile. I can pull the latest image and hope the maintainer already shipped the fix — or I can wait. Neither option is obvious from the CVE report alone.

CVE-2026-27143 is a critical vulnerability in Go stdlib. Fix available: upgrade to 1.25.9. Sounds straightforward. The complication: the binary that ships this version of stdlib is cloudflared, Cloudflare's tunnel client. I didn't write it. I can't easily recompile it. The fix depends on Cloudflare publishing an updated release.

Both are "critical." Neither has the same answer. A scanner can't tell you that. A rule can't tell you that. It requires judgment.

What Level 1 does

The first layer of automation is rule-based. No AI, no API calls, no external dependencies.

When a scan completes and finds a critical vulnerability, Level 1 fires an alert. It knows: this image has N critical CVEs, the threshold is 1, therefore alert. Fast, deterministic, always on.

This covers the obvious cases. A container with a known critical CVE should be flagged. That part doesn't need AI.

What Level 1 can't do is answer: should I patch this right now, and at what risk?

What Level 2 does

When Level 1 detects a critical vulnerability, Level 2 kicks in if an Anthropic API key is configured. It builds a structured prompt with everything the model needs to reason about the situation: the image name, the CVEs, the packages affected, the versions with fixes available, and whether those fixes represent a patch bump, a minor version change, or a major version jump.

Claude Haiku then returns a structured analysis:

Exploitability — is this remotely exploitable or does it require local access?
Urgency — how quickly does this need to be addressed?
Version Risk — what's the upgrade risk? Patch bump vs minor vs major?
Fix Impact — is the fix likely to break anything?
Recommended Action — a concrete next step with reasoning

This analysis goes into the notification. The email you receive isn't "critical CVE detected." It's a security brief.

What it found on my stack

Three critical vulnerabilities across two images. Here's what the AI analysis said about each:

CVE-2026-27143 and GHSA-p77j-4mvh-x3m3 — both in a tunnel manager image, both in third-party binaries (Go stdlib and gRPC shipped inside cloudflared). Exploitability: remote. Urgency: critical. Version risk: patch bump. Fix impact: low — but the fix depends on the upstream vendor shipping an updated binary. Recommended action: defer until the vendor publishes an updated release, monitor for new cloudflared versions.

CVE-2025-58050 — in a socket proxy image maintained by a third party. Package: pcre2. Fix available: 10.46-r0. Exploitability: remote. Urgency: critical. Fix impact: low. Recommended action: pull the latest image version to pick up the fix if the maintainer has already shipped it. If not, defer and accept the risk with documentation.

In both cases the AI correctly identified that I don't control the vulnerable code. It didn't tell me to patch something I can't patch. It told me what I actually needed to know: these are third-party dependencies, here's the risk profile, here's what to do while you wait.

The part I didn't expect

The most useful output wasn't the vulnerability analysis. It was the differentiation between "you can fix this" and "you're waiting on someone else."

That distinction is obvious to a human who investigates the CVE. It's not obvious to a scanner. It requires knowing what the affected binary is, who maintains it, and whether the fix is in your control.

The AI got this right without me telling it explicitly. It reasoned from the package name and context to the correct conclusion about ownership and fixability.

That's the gap that rules-based automation can't close. You can write a rule that says "alert on critical CVEs." You can't write a rule that says "if the vulnerable binary is a third-party dependency with no upstream fix available, recommend deferral with documentation."

What I learned about AI in security workflows

The value isn't automating the patch. It's automating the triage.

A human security engineer looking at these three CVEs would spend 20-30 minutes researching each one: checking exploitability databases, looking at the upstream project's changelog, assessing version risk, writing up a recommendation. The AI does this in seconds, for every scan, every time.

The output isn't always perfect. The model can misread version risk or miss context about your specific setup. Every decision is visible in the feed with the full reasoning attached — you can always override, and you should review anything the model flags as critical before acting.

But the triage is right often enough to dramatically reduce the cognitive load of managing vulnerabilities across a fleet of containers. You stop reading CVE lists and start reading executive summaries with recommended actions.

That's a different kind of tool.

Building this as part of an open source self-hosted Docker management platform. If you're working on similar infrastructure automation problems, drop a comment.

CLAUDE.md: the file that makes AI actually remember what you built and why

Alvarito1983 — Fri, 17 Apr 2026 14:51:25 +0000

Every AI coding session starts the same way.

You open a new chat. You explain the project. You explain the stack. You explain the decisions you made last week and why. You spend 15 minutes giving context before writing a single line of code.

Then the session ends. The context disappears. Next time, you start over.

I got tired of this. So I built a system around a single file called CLAUDE.md — and it changed how I work with AI completely.

What CLAUDE.md is

It's a plain text file that lives at the root of every project I work on. Claude Code reads it automatically at the start of every session.

Not a README. Not documentation for other developers. This file is written specifically for the AI — it contains everything the model needs to pick up exactly where we left off without me having to re-explain anything.

The difference sounds small. It isn't.

What goes in it

The file has four sections that I've refined over months of daily use.

Current State is the entry point. It's the first thing Claude Code reads and it answers three questions: what's working, what's broken, and where to pick up next session.

## Current State — last updated [date]
- What's working: Hub SSO, all agents, Log Center
- What's broken: standalone tool still on old version
- Next session: bump versions, publish release

This section gets updated at the end of every session. More on that later.

Architecture Decisions is where I explain the why, not the what. Not "we use JWT auth" but "we use JWT auth with in-memory sessions because adding a persistence layer would complicate the Docker setup for homelab users — revisit when user base grows." The reasoning, the trade-offs, the constraints that shaped the decision.

This is the section that saves me from re-litigating decisions. When Claude Code suggests something that contradicts a past decision, it sees the reasoning and understands why the alternative was rejected.

Conventions is a list of rules Claude Code must follow. Specific ones, not generic ones. Not "write clean code" — that's useless. Instead: things learned from actual debugging sessions that would take 30 minutes to rediscover. Exact patterns that must be followed. Edge cases that look fixable but aren't.

Known Issues is a list of bugs and limitations that are known but not yet fixed. This prevents Claude Code from "fixing" something that's intentionally left as-is, or spending time diagnosing something I already understand.

The forcing function problem

The obvious weakness: if you don't update the file, it goes stale. And stale context is worse than no context — it confidently points the AI in the wrong direction.

I tried discipline. It doesn't work reliably. The sessions where you forget to update are exactly the sessions where something important happened — a late-night debugging run, an interrupted session, a decision made in conversation that never touched the codebase.

The solution I landed on: ask Claude Code to update the Current State block at the end of every session before closing.

The prompt is simple:

"Before we finish, update the Current State section in CLAUDE.md to reflect what we did today, what's working, and where to pick up next time."

It takes 30 seconds. It happens while the context is still fresh. And the model is better at summarizing what just happened than I am at remembering to write it down.

This catches maybe 80% of sessions that would otherwise leave stale state. The other 20% are interrupted sessions — closed terminal, crashed IDE, ran out of time. Those you can't fully solve. But 80% is enough to make the system work.

The problem it doesn't solve

There's a class of architectural knowledge that CLAUDE.md can't easily capture: the implicit decisions.

"We avoid Y because of the incident in March" is the most important kind of architectural knowledge and the hardest to write down. It's not a pattern — it's a scar. The context that makes it meaningful lives in someone's memory, or in a post-mortem that nobody links to the codebase.

A model reading CLAUDE.md can only match against what got written. If the decision was implicit — understood by everyone who was there, never documented because it seemed obvious at the time — the model has no surface to match against.

My partial fix: at the end of sessions I ask Claude Code:

"Is there anything we decided today that we'd regret not documenting in six months?"

It catches some of it. Not all. But it asks in the right direction — not "what did we do" but "what would we wish we'd written down."

What it looks like in practice

I'm building a self-hosted Docker management platform — 13 tools, each with its own frontend, backend, agent, and central Hub integration. The kind of project where losing context between sessions would be catastrophic.

With CLAUDE.md, a new session starts like this: Claude Code reads the file, understands the current state of all 13 tools, knows the conventions for auth patterns and Docker socket connections and visual standards, and picks up exactly where we left off. No re-explanation. No re-litigating past decisions.

Today's session: 7 new tools built from scratch, all integrated into the ecosystem, all following the same design system and agent patterns. Claude Code maintained consistency across all of them because the standards were written down, not carried in my head.

Without CLAUDE.md, that consistency would have required constant correction. With it, the model enforces the standards itself.

The one-line version

CLAUDE.md is not a README. It's not documentation. It's the answer to the question the AI needs to ask at the start of every session:

"What do I need to know to be useful right now?"

Write it for that question. Update it every session. The compounding effect over months of development is hard to overstate.

Building an open source self-hosted Docker ecosystem. If you're interested in the project or in how I use Claude Code to build it, follow along.

I added autonomous AI agents to my self-hosted Docker platform — here's what Level 1 vs Level 2 autonomy actually means

Alvarito1983 — Thu, 16 Apr 2026 07:07:02 +0000

Managing a Docker ecosystem manually doesn't scale. Not because the tools are bad — Docker is fine, Compose is fine — but because the cognitive load of watching six services across multiple hosts adds up fast. You miss things. A container dies at 3am. A CVE sits unactioned for two weeks because you were heads-down on something else. An SSL cert expires because the renewal check was a cron job you forgot about.

I've been building a self-hosted Docker management platform for the past few months. Last week I added something I'd been putting off: autonomous agents. Not AI wrappers around Docker commands. Actual agents that monitor, decide, and act — with two distinct levels of autonomy depending on the situation.

Here's what I learned building them.

The problem with "smart" monitoring

Most monitoring tools solve the detection problem. They tell you something is wrong. Then you still have to act.

That's fine for a team with an on-call rotation. For a solo homelab operator or a small team without 24/7 coverage, detection without action means you're still waking up at 3am — just now with a notification in your hand.

What I wanted was a system that could handle the obvious cases automatically, and only escalate to me when the situation genuinely required human judgment.

That distinction — obvious cases vs judgment calls — is where the Level 1 / Level 2 split comes from.

Level 1 — Rule-based, no AI, always on

Level 1 agents operate on pure logic. No API calls, no model inference, no external dependencies. They run on a poll interval and apply deterministic rules.

Examples of what Level 1 handles:

A container exits unexpectedly → restart it
An image has a new digest available → flag it for review
A monitor has failed N consecutive checks → fire an alert
A notification channel has been disabled → warn that delivery is impossible
An SSL cert expires in less than X days → escalate

These cases have clear right answers. A container that died should come back up. A cert expiring in 7 days needs attention. No judgment required.

Level 1 is the baseline. It works without any API key, without any configuration beyond enabling it. The idea is that anyone who installs the platform gets meaningful automation out of the box.

One thing I got wrong initially: I had Level 1 acting too fast. A container that exits and restarts within Docker's own backoff window doesn't need the agent to intervene — Docker already handles it. The fix was adding a minimum dead time before the agent acts. If the container has been down for less than 60 seconds, wait. Docker might already be handling it.

That minDeadTime parameter turned out to be one of the most important tuning knobs in the whole system.

Level 2 — Claude Haiku decides when it's ambiguous

Level 2 activates when a situation falls outside the clear rules. The trigger is usually a pattern that looks bad but might have a legitimate explanation.

The clearest example: a container that keeps crashing.

Level 1 will restart a crashed container. But what if it crashes again? And again? At some point, restarting a container in a crash loop isn't helpful — you're just burning resources and masking a real problem. But stopping it entirely might break something that depends on it.

This is a judgment call. The right answer depends on context: what's in the logs, how many times it's restarted in the last hour, what the exit codes look like, whether this is a critical service or a background worker.

Level 2 passes that context to Claude Haiku and asks for a decision: restart, escalate, or force-stop.

The prompt includes:

Container name and image
Restart policy
Exit code history
Last 20 lines of logs (capped at 1500 chars)
Current restart count within the window

Claude returns a JSON decision with an action and a reason. That reason gets surfaced in the UI so the operator can see why the agent acted the way it did.

A few things I got right here:

Fallback to escalate on any error. If the API call fails, times out, or returns something unparseable, the agent escalates rather than retrying or doing nothing. Conservative failure mode.

Guard against decision spam. The agent tracks its Claude decisions per container. If it already asked Claude about this container at restart count N, it won't ask again until the count changes. Without this, a stuck container would generate an API call every 30 seconds.

The AI badge in the feed. When Claude made a decision, the action in the feed shows an AI badge and the model's reasoning inline. Operators can see "Claude decided to escalate this because the logs show a repeated OOM pattern" rather than just "escalated."

The demo that made it real

The moment the system clicked for me was this test:

docker stop my-service

Within 90 seconds:

The container agent detected the exit (exit code 137 — SIGKILL from docker stop)
Waited for minDeadTime
Restarted the container automatically
Logged the action: Auto-restarted · exitCode=137 · policy=unless-stopped · restart #1
The orchestrator detected the service coming back online

No intervention. No notification. It just handled it.

Exit code 137 is 128 + 9 — SIGKILL. The agent knows this isn't a crash, it's an external stop. In production you'd probably exclude manually-stopped containers from auto-restart. That's what the exclusion list is for.

What I built across the ecosystem

Six agents in total, one per tool:

Container Agent — watches running containers, restarts crashed ones, escalates crash loops to Level 2.

Update Agent — monitors image digests for changes, evaluates whether updates are safe to apply automatically or should be deferred for review.

Monitor Agent — analyzes uptime check results, detects three patterns: sustained failure, flapping (up-down-up-down), and SSL expiry. Level 2 handles ambiguous sustained failures.

Scan Agent — reads CVE scan results and evaluates severity. Critical vulnerabilities go to Level 2 — the model assesses whether the CVE is likely exploitable in the specific configuration. High vulnerabilities alert directly without AI.

Notify Agent — watches the notification system itself. Catches misconfiguration (disabled channels, no active rules) and anomalies (delivery failure spikes, notification volume spikes).

Master Agent — sits at the orchestrator level with visibility across the whole ecosystem. Detects cross-service patterns: ecosystem-wide degradation, event storms, correlated critical events across multiple services.

What Level 2 is not

It's not a replacement for alerting. The agents escalate to the notification system when they decide a human needs to know. Level 2 makes that escalation smarter — it filters out the cases that don't warrant waking someone up.

It's not always right. The model can make bad calls, especially with ambiguous logs. That's why every Level 2 decision is visible in the feed with the reasoning attached. You can always override, and you can always clear a decision and let the agent re-evaluate.

It's not required. The whole system works without an API key. Level 1 covers the clear cases. Level 2 is additive.

The thing I didn't expect

Building this forced me to think carefully about what "autonomous" actually means for infrastructure tooling.

Full autonomy — an agent that can do anything without asking — is the wrong target. The right target is calibrated autonomy: the agent acts confidently on clear cases, hesitates on ambiguous ones, and always leaves a paper trail explaining what it did and why.

The Level 1 / Level 2 split is really just a formalization of that. Some situations have obvious answers. Some don't. The system should know the difference.

Building this as an open-source self-hosted platform. If you're interested in following the progress, drop a comment.

The next phase of AI isn't smarter models. It's infrastructure.

Alvarito1983 — Wed, 15 Apr 2026 06:54:02 +0000

Everyone is talking about the models.

GPT-5. Claude Opus 4. Gemini Ultra. Which one scores higher on benchmarks. Which one writes better code. Which one is worth the subscription.

I think that's the wrong conversation. And I think in 18 months, most people will agree.

The next phase of AI isn't about smarter models. It's about infrastructure. And I say that as someone who has spent 15 years building infrastructure for a living.

Where we are now

The shift that already happened — and that most people haven't fully absorbed — is the move from conversational AI to agentic AI.

Conversational AI waits for you. You ask, it answers. You prompt, it responds. The human is the engine; the AI is the tool.

Agentic AI plans and executes. You give it a goal. It reads your codebase, breaks the problem into steps, executes them in sequence, checks the results, fixes what broke, and reports back. The AI is the engine; the human is the director.

This isn't future speculation. It's what Claude Code, GitHub Copilot's agent mode, and a dozen other tools are doing right now. I've been running multi-agent workflows for months — orchestrator agents coordinating specialist sub-agents building entire features in parallel while I review the output.

But here's what I think most people are missing: this transition creates massive unsolved infrastructure problems. And those problems are going to define the next 18-24 months.

What I think happens next

1. Agents are going to need their own infrastructure

Right now, agents run in your terminal, on your machine, inside someone else's cloud. That works at small scale. It breaks at large scale.

Think about what a production agent system actually needs:

Persistent state — an agent mid-task needs to survive a restart. Where does its working memory live?
Networking — agents calling other agents, agents calling external APIs, agents accessing internal services. Who manages that network?
Identity and auth — if an agent is making API calls, creating files, pushing commits, what identity does it have? How do you audit what it did?
Resource limits — a runaway agent can consume compute indefinitely. Who enforces the limits?
Observability — when something goes wrong in a multi-agent workflow, how do you trace which agent made which decision at which step?

None of this exists in a coherent form yet. We're running agents the way we ran web apps in 2003 — on single servers, with manual restarts, hoping nothing crashes.

Someone is going to build the Kubernetes of agents. Probably in the next 18 months. And when they do, someone is going to have to run it.

That someone is infrastructure engineers.

2. Prompt engineering is going to die

Not immediately. But the trend is clear.

Right now, there's an entire cottage industry around "prompt engineering" — the art of asking AI the right question in the right way to get the right answer. It's a real skill. It matters today.

But it's a transitional skill, not a permanent one.

As agentic systems mature, the question stops being "how do I write the perfect prompt?" and starts being "how do I design this system of agents so it reliably solves this class of problems?"

That's not prompt engineering. That's systems design.

It's the same shift that happened with databases. Early practitioners had to be experts at writing optimal SQL queries. Then query optimizers got good enough that you could trust the system to figure it out — and the skill that mattered became schema design, indexing strategy, and query planning at the architecture level.

The same thing is going to happen with AI. The people who will matter aren't the ones who can write clever prompts. They're the ones who can design reliable systems.

I've been building a 6-tool Docker management ecosystem with Claude Code. The prompts matter less than I expected. What matters almost entirely is: clear architecture, explicit scope boundaries, good context management, and knowing when something is wrong. Those are systems thinking skills, not prompt writing skills.

3. Agents are going to start modifying themselves

This one is further out, but the early signs are already there.

Right now, agents execute within the boundaries you set. They read your CLAUDE.md, follow your instructions, build what you ask.

But agents already write code. And some of that code is agent infrastructure — the scaffolding, the context management, the workflow definitions. The logical next step is agents that notice their own inefficiencies and propose modifications to their own workflows.

We're not there yet. But the gap between "agent that executes a workflow" and "agent that improves a workflow" is narrower than it looks.

When it closes, the role of the human changes again. Not from director to spectator — the human still needs to validate, approve, understand what changed and why. But the cycle time between "this workflow has a problem" and "the workflow is fixed" compresses dramatically.

The implication: the humans who stay in the loop effectively are the ones who understand systems well enough to evaluate a proposed change. Generalists who can prompt but can't reason about system behavior will struggle here.

4. Self-hosted AI is going to become serious infrastructure

This is the one I'm most confident about, because it's already starting.

Right now, most AI runs on someone else's infrastructure. Anthropic's servers. OpenAI's servers. Google's servers. You send your data there, get a response back, trust that the provider handles it responsibly.

For consumer use, this is fine. For enterprise use — especially in regulated industries, sensitive domains, or organizations that have genuinely learned the lesson about vendor dependency — this is becoming a problem.

The models are getting small enough to run on-premise. Llama, Mistral, Qwen — capable open-source models that you can run on hardware you control. The tooling around self-hosted inference is maturing fast.

And when organizations start running AI on their own infrastructure, someone has to manage it. GPUs don't configure themselves. Model updates need to be evaluated and deployed. Inference infrastructure needs to be monitored, scaled, and maintained.

That's not a developer job. That's an infrastructure job.

I manage 7,000 servers professionally. I can see exactly where the self-hosted AI infrastructure conversation is heading — it's heading toward the same conversations we had about on-premise databases and private cloud ten years ago. Same problems, new stack.

What this means in practice

I'm not making predictions about timelines with false precision. But the direction seems clear:

The model layer is becoming a commodity. When you have ten capable models competing for your subscription, the model itself stops being the differentiator. The system around it is.

The infrastructure layer is becoming critical. Agents need to run somewhere, persist state somewhere, authenticate somewhere, get monitored somewhere. That infrastructure doesn't exist yet in mature form.

The skills that matter are shifting. Prompt writing → systems design. Single-agent workflows → multi-agent orchestration. Cloud-hosted AI → self-hosted AI infrastructure.

I've been building on the leading edge of this — running agent workflows, building self-hosted tools, thinking about how multiple services coordinate and communicate. And what I keep noticing is that the problems I'm solving aren't AI problems. They're infrastructure problems wearing AI clothes.

The next chapter of AI isn't written by the people building smarter models. It's written by the people building the systems those models run in.

NEXUS Ecosystem is my attempt to build serious infrastructure for self-hosted Docker environments. Open source, 6 tools, unified control plane.

GitHub: github.com/Alvarito1983
Docker Hub: hub.docker.com/u/afraguas1983

ai #devops #infrastructure #claudecode #selfhosted #programming #discuss #career

Claude Code Part 2: How I use Sub-agents to build entire features in parallel

Alvarito1983 — Tue, 14 Apr 2026 11:26:27 +0000

This is Part 2 of my Claude Code series. Part 1 covers the fundamentals — installation, CLAUDE.md, prompts, and workflow patterns. Read that first if you're new to Claude Code.

This part is about something most guides don't cover: sub-agents. How to run multiple specialized AI agents in parallel, coordinate them, and use them to build entire features simultaneously.

This isn't theoretical. Everything here comes from building NEXUS Ecosystem — six self-hosted Docker tools — where sub-agents went from curiosity to core workflow.

What sub-agents actually are

When you run Claude Code normally, you have one agent doing one thing at a time. It reads, thinks, writes, verifies — sequentially.

Sub-agents change that. You can launch multiple specialized agents that work in parallel, each focused on a specific task, while the main agent orchestrates.

The mental model: instead of one developer doing everything, you have a tech lead (the main agent) coordinating a team of specialists (sub-agents).

Here's what it looks like in practice:

● 2 background agents launched (↓ to manage)
   ├─ nexus-security improvements: Check Now fix, sidebar logs, Report view
   └─ nexus-hub Log Center: backend routes, tool log helpers, frontend Logs panel
● Both agents are running in parallel.

That's two agents simultaneously building two different modules of the same project. One fixing and extending the Security tool. One building an entirely new Log Center in Hub. Neither blocking the other.

The work that would have taken 3-4 sequential hours took under 45 minutes.

Installing the agent marketplace

Claude Code has a built-in agent system, but the real power comes from the community marketplace. The one worth installing is wshobson/agents — 22 specialized agents covering everything from backend security to Kubernetes to incident response.

Install it:

# Add the marketplace
claude config set extraKnownMarketplaces.claude-code-workflows.source.source github
claude config set extraKnownMarketplaces.claude-code-workflows.source.repo wshobson/agents

# Install the agents you need
claude plugin install agent-teams@claude-code-workflows
claude plugin install backend-development@claude-code-workflows
claude plugin install security-scanning@claude-code-workflows
claude plugin install debugging-toolkit@claude-code-workflows
claude plugin install comprehensive-review@claude-code-workflows
claude plugin install incident-response@claude-code-workflows

Verify what's installed:

claude plugin list
# or inside a session:
/agents

The /agents command opens a panel showing all available agents with their model assignments (haiku, sonnet, or opus depending on complexity).

The agents worth knowing

Not all 22 agents are equally useful. These are the ones I actually use:

agent-teams:team-lead (opus) — Orchestrates other agents. Use this when you need someone to break down a complex task and delegate to specialists. It thinks slower but plans better.

agent-teams:team-implementer (opus) — Pure implementation. Give it a spec and it builds. Works best when the architecture is already decided.

agent-teams:team-reviewer (opus) — Code review. Run this after implementation to catch issues before you do.

backend-development:security-auditor (sonnet) — Scans backend code for vulnerabilities, auth issues, injection risks. Genuinely useful before deploying.

debugging-toolkit:debugger (sonnet) — Specialized in diagnosis. Give it a symptom and it traces through the code to find the cause. Better than the general agent for tricky bugs.

comprehensive-review:architect-review (opus) — High-level architecture review. Use this when you've built something significant and want a second opinion on the design decisions.

security-scanning:threat-modeling-expert — Thinks about what could go wrong in your system. Good for security-sensitive features.

How to launch sub-agents

The simplest way is to describe parallel work in your prompt and ask Claude Code to use sub-agents:

Implement two features in parallel using sub-agents:

## SUB-AGENT 1 — Security improvements
[detailed spec for security work]

## SUB-AGENT 2 — Log Center
[detailed spec for log center work]

Report when both complete.

Claude Code launches both agents and manages them. You see output like:

● Agent "Security improvements" completed
● Sub-agente 1 completado. Resumen:
  - Check Now fix: polling cada 3s...
  - Panel de logs: ring buffer 50 entradas...
  - Report view: deduplicación de CVEs...
  ---
  Sub-agent 2 sigue en ejecución.

The key insight: each sub-agent gets its own context window. They don't share state or interfere with each other. This is why parallel work is possible — there's no race condition on the context.

Checking on agents without interrupting them

This took me a while to figure out. When agents are running in background, you don't want to interrupt them — but you might want to know what's happening.

The /btw command is designed exactly for this:

/btw how are the two background agents doing?

You get a status report without the agents noticing:

  Los agentes siguen corriendo.
  Sub-agente 1 (nexus-security) — en progreso
  - Ya modificó containerMonitor.js
  - Queda: fix Check Now, panel de logs, vista Report

  Sub-agente 2 (nexus-hub Log Center) — en progreso  
  - Sin cambios confirmados aún
  - Tiene más trabajo: rutas logs.js, helper logToHub(),
    componente LogsPanel.jsx con 3 tabs

The agents keep running. You keep informed. No interruption.

To actually manage them, press ↓ to open the agent panel:

● 2 background agents launched (↓ to manage)
   ├─ nexus-security improvements ✓ completed
   └─ nexus-hub Log Center — still running

The orchestrator pattern

For complex multi-service projects, the most effective pattern is explicit orchestration: one agent that reads the full context and coordinates, specialists that implement.

Here's a real prompt structure I use:

You are the orchestrator for this task. 
Before launching any sub-agents:

1. Read the CLAUDE.md completely
2. Read these files to understand current state:
   - nexus-hub/backend/server.js
   - nexus-security/backend/src/routes/scan.js
   - nexus-hub/backend/src/services/flowEngine.js

3. Plan the work and split into parallel tracks
4. Launch sub-agents with specific, complete specs
5. Wait for all to complete
6. Run a final integration check

The goal: implement CVE scanning in Security with 
automatic alerts to Hub when Critical found.

Use security-scanning:security-auditor for the 
security implementation and backend-development:backend-architect 
for the Hub integration.

The orchestrator reads everything first, makes the architectural decisions, then delegates implementation to specialists who don't need to understand the full picture.

Real example: building Security and Log Center in parallel

Here's a condensed version of a real session from building NEXUS. I needed to:

Add a progress bar to Security's Check Now button
Add a sidebar activity log to Security
Build an entirely new Log Center in Hub with 3 tabs
Add logToHub() helper to all 5 tools

Instead of doing this sequentially (3-4 hours), I launched two sub-agents:

Dos tareas en paralelo con sub-agentes:

## SUB-AGENTE 1 — nexus-security
Lee nexus-security/frontend/src/components/Dashboard.jsx
y nexus-security/backend/src/services/containerMonitor.js

Implementa:
1. Check Now con progreso real — polling /api/scan/results
   cada 3s, timeout 5min, toast al terminar
2. Panel Activity en sidebar — ring buffer 50 entradas,
   logs tiempo real via Socket.io, color-coded por tipo

Rebuild: docker compose -f docker-compose.test.yml up --build -d nexus-security

## SUB-AGENTE 2 — nexus-hub Log Center
Lee nexus-hub/backend/server.js y src/routes/ (todos)

Implementa:
- POST /api/logs — ingest con X-Api-Key
- GET /api/logs/ecosystem, /app/:source, /docker/:source
- Retención 10 días en /app/data/logs/[source]/[YYYY-MM-DD].json
- LogsPanel.jsx con 3 tabs: Ecosystem Events / App Logs / Docker Logs
- logToHub() helper en los 5 tools

Rebuild: todos los servicios al terminar

Results:

● Agent "nexus-security improvements" completed  ✓  8m 12s
● Agent "nexus-hub Log Center" completed         ✓  19m 44s

Two significant features. Parallel. ~20 minutes total.

The Security agent finished first (simpler scope). The Log Center agent took longer (6 services to modify). But they didn't block each other.

What makes sub-agents faster

It's not magic. Here's the actual reason parallel agents save time:

No context switching cost. When one agent finishes reading Security's files, it doesn't have to context-switch to understand Hub's architecture. The Hub agent has its own fresh context, already loaded with the right files.

No sequential dependencies (when designed correctly). If you structure the work so agents aren't waiting on each other's output, they run fully in parallel. The Log Center doesn't need Security's code to be finished — it's a separate service.

Model specialization. A security-auditor agent running on sonnet is tuned differently than a general team-implementer. The right model for the right task.

The failure mode: creating false dependencies. If you ask Agent 2 to "use the pattern that Agent 1 will establish", Agent 2 has to wait. Design your parallel tasks to be genuinely independent.

Giving agents their own CLAUDE.md context

This is underused. You can pre-load agents with specific context using the prompt structure:

## CONTEXT FOR THIS AGENT
You are working on nexus-security only.
The project is at E:\Claude\NEXUS\nexus-security

Key facts:
- Accent color: #ef4444 (red)
- Backend port: 9093, container: nexus-security-test  
- Frontend uses CSS custom properties, no Tailwind
- Socket.io is already configured in server.js
- The scan result format is: { id, severity, package, 
  version, fixedIn, description }
- Do not touch nexus-hub — that's a separate agent's scope

## YOUR TASK
[specific instructions]

This context header tells the agent exactly what it needs to know without reading the entire CLAUDE.md. For sub-agents with narrow scope, targeted context is more efficient than full project context.

The mistakes I made

Launching agents with vague specs. Sub-agents don't have the conversational context you've built up in the main session. They start cold. If your spec says "fix the Check Now button", the agent doesn't know what's wrong with it, what the expected behavior is, or what files to look at. Be exhaustively specific.

Creating dependent parallel tasks. I once launched two agents where Agent 2 needed to "follow the pattern Agent 1 establishes". Agent 2 started before Agent 1 finished, made its own decisions, and the patterns were inconsistent. Now I either sequence dependent tasks or make both specs completely explicit.

Not including the rebuild step. Sub-agents will write perfect code and then stop. If you don't explicitly tell them to rebuild the Docker containers, you won't see the changes. Always end sub-agent specs with the rebuild command.

Overloading a single sub-agent. One agent doing 10 things is slower than two agents doing 5 things each. But one agent doing 30 things will run out of context before it finishes. Scope sub-agents to coherent, bounded tasks.

Ignoring the completion order. Agents complete when they're done, not when you expect them to. The faster one might complete while the slower one is still running. Check both before doing integration work.

When not to use sub-agents

Sub-agents are powerful but not always the right tool.

Don't use them for tasks with shared state. If both agents are writing to the same store.js or modifying the same Docker compose file, you'll get conflicts. One agent's changes will overwrite the other's.

Don't use them for highly sequential work. If Step B genuinely cannot start until Step A is complete, sub-agents just add overhead. Sequential is fine.

Don't use them for quick tasks. Launching a sub-agent has overhead. For a 5-minute task, just do it directly.

Don't use them when you need to watch the work. Sub-agents run in background. If you need to review and approve intermediate steps, sequential work with explicit phase breaks is better.

The real productivity shift

When I look back at building NEXUS Ecosystem — six tools, SSO, CVE scanning, real-time logs, event-driven alerts — the sub-agent pattern is what made the scope possible for a solo developer.

Not because it's magic. Because it matches how real work is structured: parallel tracks, specialized expertise, coordinated by someone who understands the full system.

That someone is still you. The orchestration decisions, the architectural choices, the review — those are yours. Sub-agents execute. You direct.

The ceiling on what one person can build has moved. Sub-agents are a significant part of why.

NEXUS Ecosystem is open source. All the patterns above come from real sessions building it.

GitHub: github.com/Alvarito1983
Docker Hub: hub.docker.com/u/afraguas1983
Part 1: Claude Code: The Complete Guide

claudecode #ai #programming #webdev #devtools #productivity #opensource #softwaredevelopment

I run 7,000 servers at work. My homelab taught me more about reliability than any of them.

Alvarito1983 — Tue, 14 Apr 2026 08:06:03 +0000

I manage more than 7,000 servers.

They're spread across data centers on multiple continents. They run critical telecom infrastructure for millions of users. When something breaks, it affects real people, real services, real money. There are escalation procedures, change management windows, runbooks, on-call rotations, SLAs.

And yet, the place where I've learned the most about reliability isn't work.

It's my homelab.

What enterprise infrastructure teaches you

At scale, infrastructure engineering becomes a discipline of process.

You don't just restart a service — you open a change ticket, get approval, schedule a maintenance window, notify stakeholders, execute the change with a rollback plan ready, and document what happened. You don't just deploy software — you go through staging, testing, canary deployments, gradual rollout.

This is good. This is necessary. When you're responsible for infrastructure that thousands of people depend on, process is what keeps you from making catastrophic mistakes at 3am.

But process also insulates you from consequences.

When something breaks at work, there's a team. There's an escalation path. There's a senior engineer who's seen this before. There's documentation. There's a vendor support contract. The blast radius of any single mistake is contained by layers of process designed specifically to contain it.

You learn a lot. But you learn it slowly, safely, with a net.

What a homelab teaches you instead

My homelab has no runbooks. No change management. No on-call rotation except me.

When something breaks, I broke it. When something is down, it stays down until I fix it. When I make a mistake at 11pm, I'm the one staying up until 1am undoing it. When I deploy something that kills my Docker networking, my wife can't use the media server until I sort it out.

That immediacy changes how you think.

At work, I know intellectually that persistent volumes matter. In my homelab, I learned it viscerally the first time I rebuilt a container and lost three months of monitoring data because I forgot to mount the volume. I've never forgotten it since. The lesson cost me an evening. It was worth it.

At work, I understand that networking configuration is consequential. In my homelab, I understand it in my hands — because I've misconfigured it, watched everything break, and traced the problem back through docker network ls and ip route until I found it. No ticket, no escalation, no one to ask. Just me and the problem.

The difference is skin in the game.

The specific things I've learned

Persistence is everything, and no one tells you until it's too late.

Enterprise storage is managed by a storage team. The persistence layer is abstracted away. In my homelab, I'm the storage team. The first time I rebuilt a service and lost its data because I didn't understand how Docker volumes worked with my compose configuration, I understood persistence at a level I never had before. Now I think about persistence first, before I think about almost anything else.

Monitoring you build yourself is monitoring you actually understand.

At work, we have enterprise monitoring tools. They work. But they were configured by someone else, they alert on thresholds someone else set, and when they fire, the first question is always "what does this alert actually mean?"

When I built Pulse — my own uptime monitoring tool — I wrote every check, set every threshold, decided what mattered and what didn't. When it alerts, I know exactly what it means. That understanding transfers back to how I think about monitoring at work.

Failure modes you've personally caused are failure modes you never forget.

I know theoretically what happens when a container can't reach its database. I've read the documentation. I've seen the symptoms described in runbooks.

I also know exactly what it looks like in practice, because I've caused it. I misconfigured the network, watched the application fail in a confusing way, spent forty minutes figuring out it was a DNS resolution problem, and fixed it. That forty minutes of confusion is what makes the knowledge stick.

At work, someone else usually causes the problems. The experience of debugging someone else's mistake is educational, but it's different from debugging your own.

Small-scale forces clarity that large-scale obscures.

When you're managing 7,000 servers, you think in abstractions. You have to — you can't think about individual machines. But sometimes those abstractions hide things.

In my homelab, I have maybe thirty containers. I know what every one of them does. I know why it exists. I know what it depends on and what depends on it. That level of understanding is impossible at scale, but practicing it at small scale sharpens the instincts you need at large scale.

The project that came out of it

All of this eventually pushed me to build something.

I kept finding that the tools I was using in my homelab — Portainer, various monitoring solutions, alerting systems — were built for someone else's use case. They were too heavy, too complex, too assumption-laden.

So I built NEXUS Ecosystem: six self-hosted Docker tools designed specifically for the homelab and small-team use case. Container management, image update detection, uptime monitoring, CVE scanning, alerts, and a central Hub with SSO.

Building it taught me more than using any existing tool would have. I had to make every design decision. I had to understand why the architecture worked, not just how to configure it. I broke it repeatedly and had to understand why it broke.

That's the homelab ethos applied to software development.

What enterprise infrastructure still teaches you that a homelab can't

I don't want to romanticize the homelab at the expense of the real thing.

There are things you only learn at scale:

Real failure modes. A homelab doesn't have Byzantine failures, network partitions across continents, or hardware that fails in ambiguous ways while staying technically online. The edge cases that happen at scale are qualitatively different from the edge cases that happen with thirty containers.

Operational discipline. The change management process I described earlier is genuinely valuable. The instinct to document, to plan rollback, to notify stakeholders — these habits save incidents. A homelab doesn't build them the same way because the stakes are too low.

Collaboration under pressure. Debugging a production incident with four engineers on a call, each with different information, trying to converge on a diagnosis in real time — that's a skill that only develops in real incidents.

The homelab and enterprise infrastructure teach different things. The engineers I've seen grow fastest are the ones who do both — who bring the experimental instincts of the homelab into their professional work, and the operational discipline of professional work back into their homelab.

The honest takeaway

I've been in infrastructure for 15 years. I manage more servers than most people will ever touch.

And the most important lessons I've internalized — about persistence, about failure modes, about what monitoring is actually for, about what it means to truly understand a system — came from a homelab running on hardware in my house, where the consequences of getting it wrong were an annoyed partner and a ruined evening.

The stakes were low. The learning was real.

If you're an infrastructure engineer who doesn't have a homelab: build one. Not because it will make you better at your job directly. But because the freedom to break things, fix them yourself, and understand why they broke is something you can't get anywhere else.

NEXUS Ecosystem is what I built in my homelab. Open source, self-hosted, Docker-native.

GitHub: github.com/Alvarito1983
Docker Hub: hub.docker.com/u/afraguas1983

devops #homelab #docker #career #sysadmin #selfhosted #programming #discuss

I'm not a developer. I used AI to build a 6-tool software ecosystem anyway.

Alvarito1983 — Mon, 13 Apr 2026 08:38:33 +0000

Let me be upfront about something.

I'm a Computer Science Engineer. I work at a global telecommunications company managing more than 7,000 servers spread across the world. I know infrastructure deeply — HPE, VMware, Linux, Docker, AWS. I know exactly what happens when a system fails at 2am because I'm the one who fixes it.

I understand systems. I understand networks. I understand why things break and how to prevent them from breaking.

What I had never done is build and launch a software product of my own. Not because I didn't understand how things work — I knew perfectly well how to design an architecture, how services need to communicate with each other, what data needs to persist and what doesn't, when to use WebSockets and when not to. That was clear to me from the start.

The gap was something else: time and implementation speed. Converting a clear architecture into working code, component by component, endpoint by endpoint, takes months when you're doing it alone in your spare time.

AI didn't teach me to design systems. It let me build them at the speed my mind designs them.

What I built

NEXUS Ecosystem is a suite of six self-hosted Docker management tools:

NEXUS — container management (start, stop, deploy, terminal, metrics)
Watcher — automatic image update detection
Pulse — uptime monitoring for HTTP, TCP, DNS, APIs
Security — CVE vulnerability scanning + VirusTotal integration
Notify — multi-channel alerts (email, Telegram, Discord)
Hub — central control with SSO, event bus, log center, automations

Stack: React 18, Node.js, Express, Socket.io, Docker. Published on GitHub and Docker Hub. Running 24/7 on my homelab.

A team of seven engineers would have taken close to a year to build this. I built it in weeks of evening sessions, as a side project, while working full time managing global infrastructure.

How it actually works with AI

Everybody talks about AI writing code. That's not what happened here — or at least, that's not the useful frame.

The useful frame is: AI changed what I could execute on at once.

As a systems engineer, I understand distributed systems deeply. I know how Docker networking works. I know what happens when a service can't reach its dependency. I know why you need persistent volumes. I know the difference between a health check and a readiness check. I know what an event bus is and why you'd use one.

What I didn't have was the bandwidth to express all of that understanding in code at the speed I was thinking it. Designing the architecture takes minutes. Implementing it correctly, with all the edge cases, across six interconnected services — that's where the time goes.

With Claude Code, I describe the system I want in terms I already understand deeply, and it produces the implementation. I review it, I understand it, I modify it when it's wrong — and I know when it's wrong because I understand the system. I'm the architect. AI is the contractor.

I know exactly what I want built. I can specify it precisely. I can inspect the work and know when it doesn't meet the spec. I just don't have to write every line myself.

The parts that were still hard

I want to be honest about this, because most AI hype glosses over it.

Debugging distributed systems is still hard. When Security was emitting events that Hub wasn't receiving, tracking down whether the problem was the event bus, the network, the auth middleware, or the Socket.io room configuration took hours. AI helped narrow it down, but it didn't eliminate the work.

Architecture decisions are still yours. AI will implement whatever you ask it to. It won't tell you that your data model is wrong until you've built three features on top of it and discovered the problem yourself. The decisions that matter — how services communicate, where state lives, what the failure modes are — those are entirely on you. This is where domain expertise is irreplaceable.

Context management is a real skill. A large codebase across six tools has more context than fits in any single session. Knowing which files to include, how to describe the current state, when to start fresh versus continue — this is something you have to learn. The CLAUDE.md file I maintain for the project is as important as any piece of code.

You have to know enough to know when it's wrong. This is critical. When Claude Code suggested using nice and ionice to limit Grype's CPU usage inside a Docker container on Windows/WSL2, I knew immediately that was wrong — those Linux process priority tools don't work the way you'd expect in that environment. Someone without infrastructure experience might have shipped that and spent weeks confused. That judgment — knowing when the implementation is subtly broken — comes from years of experience, not from AI.

What this means for the "AI will replace developers" debate

I've been watching this debate with particular interest, because I sit in an unusual position relative to it.

Here's what I actually think:

AI didn't replace a developer to build NEXUS. It enabled someone with deep infrastructure and systems expertise — who already understood exactly what needed to be built — to build it without needing a dedicated development team.

That's a different thing. And I think it's more interesting than the replacement narrative.

The engineers I'd worry about, if I were an engineer, aren't the senior people building complex distributed systems. Their judgment, architecture instincts, and debugging skills are more valuable with AI than without — they can move faster without moving sloppier.

The role that's genuinely under pressure is the one that's primarily about translation: taking a specification from a domain expert and converting it into working code. That translation work — from "I need the system to do X" to a working implementation — is exactly what AI is getting good at.

But here's the flip side: if you're a domain expert who already understands what needs to be built, AI is giving you superpowers you didn't have before. Infrastructure engineers, DevOps specialists, data analysts, scientists — people who understand their domain deeply and can design the right solution — are suddenly able to build things at a speed that wasn't previously possible.

That's not replacement. That's expansion.

The uncomfortable part

I built software that works, that solves real problems, that has real users. It has a design system. It has real-time communication. It has security scanning. It has an SSO system. It has an event-driven automation engine.

I designed every piece of it. I made every architecture decision. I knew what needed to be built and why.

AI let me build it at the speed I designed it.

What this tells me is that the gap between "I understand how this should work" and "I can ship this" is closing fast. For people who already have the domain knowledge and the systems thinking — and just needed the implementation bandwidth — that gap is already gone.

The value of deep expertise hasn't decreased. If anything, it's more valuable now — because the people who truly understand what needs to be built can now actually build it.

Where NEXUS is now

NEXUS Ecosystem is open source and running in my homelab right now — six services, unified Hub with SSO, 24/7. The plan is to publish the Hub integration formally once it's been running stably for a few weeks.

If you're an infrastructure engineer, DevOps specialist, or systems person who has the domain knowledge and the architecture vision but has always needed a development team to execute it — that constraint is gone.

GitHub: github.com/Alvarito1983
Docker Hub: hub.docker.com/u/afraguas1983

ai #career #devops #docker #selfhosted #programming #claudecode #discuss

DEV Community: Alvarito1983

I asked Claude Code to refactor a feature it built last week. It hated its own code.

The "memory problem" everyone complains about

What actually happened in that session

The reframe

Why this changes how I think about CLAUDE.md

The mental model shift

The closing thought

The AI Coding Tools Panorama in 2026: From Claude Code to the Free Alternatives That Actually Replace It

The full panorama

Layer 1: Inline assistants (autocomplete + chat in your editor)

Layer 2: AI-native IDEs (the editor itself is the product)

Layer 3: Terminal agents (the new center of gravity)

Layer 4: Cloud agents (autonomous, parallel, expensive)

Layer 5: Specialized tools (the rest)

Top 5 paid tools that actually justify their price

1. Claude Code — $20–200/month

2. Cursor — from $20/month

3. GitHub Copilot — $10/month

4. Windsurf — from $15/month

5. Devin — $20/month + variable

Top 5 free tools that genuinely replace a paid one

1. Gemini CLI — free (1,000 requests/day)

2. Aider — free (BYOK)

3. Cline — free (BYOK, runs in VS Code)

4. GitHub Copilot Free — free (2,000 completions + 50 chats/month)

5. OpenCode — free (BYOK, free models included)

What I actually run

Quadlet: The Podman Feature That Finally Makes Sense on a Homelab

What Quadlet actually is, in one paragraph

The simplest possible example

What you actually gain

The comparison that matters: Quadlet vs Docker Compose

When Quadlet is the wrong answer

When Quadlet is genuinely the right answer

The hybrid setup that actually makes sense

The honest closing thought

The WSL2 Guide I Wish I Had: 4 Gotchas That Will Eat Your Afternoon

1. Docker Desktop, cgroups, and processes that ignore resource limits

The symptom

The root cause

What actually works

Key takeaway

2. Disk performance: /mnt/c vs native WSL filesystem

The symptom

The root cause

Real benchmark

What actually works

Key takeaway

3. Networking: ports, shifting IPs, and host access

The symptom

The root cause

What actually works

Key takeaway

4. Memory: the vmmem problem

The symptom

The root cause

What actually works

Key takeaway

Minimal .wslconfig (recommended)

Closing thoughts

Everyone is using Claude Code wrong.

The wrong mental model

What Claude Code actually needs

The second mistake: one task at a time

The third mistake: no standards document

The fourth mistake: no forcing function for updates

What changes when you get this right

I let Claude AI decide whether to patch my Docker vulnerabilities — here's what it found

The problem with "critical"

What Level 1 does

What Level 2 does

What it found on my stack

The part I didn't expect

What I learned about AI in security workflows

CLAUDE.md: the file that makes AI actually remember what you built and why

What CLAUDE.md is

What goes in it

The forcing function problem

The problem it doesn't solve

2. Disk performance: `/mnt/c` vs native WSL filesystem

4. Memory: the `vmmem` problem

Minimal `.wslconfig` (recommended)