<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community</title>
    <description>The most recent home feed on DEV Community.</description>
    <link>https://dev.to</link>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/"/>
    <language>en</language>
    <item>
      <title>From Kubernetes to a Self-Healing, Low-Cost Infrastructure</title>
      <dc:creator>Bruce Mcpherson</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:20:53 +0000</pubDate>
      <link>https://dev.to/brucemcpherson/from-kubernetes-to-a-self-healing-low-cost-infrastructure-1da4</link>
      <guid>https://dev.to/brucemcpherson/from-kubernetes-to-a-self-healing-low-cost-infrastructure-1da4</guid>
      <description>&lt;p&gt;I've been running a background project on Kubernetes for a while now. It doesn't need 100% uptime, and I didn't want to spend much time managing or even checking up on it, so Kubernetes with Spot VMs seemed the most cost-effective solution, and it's been solid and trouble-free. However, running a couple of preemptible nodes with a managed ingress was still costing $150 a month or so. Way too much for a hobby project.&lt;/p&gt;

&lt;p&gt;This article is about retaining the self-healing capability you get with Kubernetes while migrating to a much more cost-effective approach (about $40 a month). I found that without Kubernetes in the mix, I could get away with a single VM, but of course that alone doesn't give me recovery from preemption, so here's how to get that too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managed instance group
&lt;/h2&gt;

&lt;p&gt;The transition from a high-cost Google Kubernetes Engine (GKE) cluster to a single, highly available Spot VM managed by a Stateful Managed Instance Group (MIG) offers a path to significant cost savings without sacrificing resilience. By leveraging Docker Compose and automated infrastructure orchestration, the platform—comprising microservices such as GraphQL gateways, Elasticsearch processors, and Redis queues—now operates at the lowest possible compute cost while maintaining full recovery capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decoupling Compute from State
&lt;/h2&gt;

&lt;p&gt;The core challenge with using Spot VMs is their preemptible nature. To make this architecture immune to data loss during preemption, it utilizes GCP Stateful MIG Policies alongside stable device-id path targeting.&lt;/p&gt;

&lt;p&gt;A critical component of this decoupling is Storage State Preservation. Kernel device names (like /dev/sdb) depend on the order in which disks are attached and can change during VM initialization. To guarantee consistency across machine replacements, the persistent volume is targeted via a stable path built from its device name: /dev/disk/by-id/google-existing-data.&lt;/p&gt;
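
&lt;p&gt;To see which kernel device the stable path currently points at, something like the following could be run on the VM. It is a minimal sketch: &lt;code&gt;resolve_disk&lt;/code&gt; is a hypothetical helper name, and it assumes GNU &lt;code&gt;readlink&lt;/code&gt; is available, as it is on Debian.&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper: print the kernel device behind a stable by-id link.
resolve_disk() {
    local link="$1"
    if [ -e "$link" ]; then
        # Follow the symlink chain to the real block device, e.g. /dev/sdb
        readlink -f "$link"
    else
        echo "not attached: $link"
        return 1
    fi
}

# Usage on the VM: resolve_disk /dev/disk/by-id/google-existing-data
```

&lt;p&gt;The resolved device may differ from one boot to the next, which is exactly why the scripts mount the by-id path rather than the kernel name.&lt;/p&gt;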

&lt;h2&gt;
  
  
  The Autonomous Recovery Workflow
&lt;/h2&gt;

&lt;p&gt;When Google Cloud preempts a Spot VM, the system triggers a fully automated self-healing pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Detection &amp;amp; Provisioning: The MIG detects the deletion and instantly provisions a fresh instance node to maintain the target capacity of one.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Stateful Attachment: The MIG automatically binds the regional static IP and hot-plugs the persistent block storage to the new node at boot time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Guest OS Bootstrapping: A custom startup-script.sh holds execution until hardware attachment is verified. It then mounts the filesystem, installs the Docker Engine, and restarts microservices seamlessly using ./start-all.sh.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This entire process typically brings the platform back online with zero manual intervention within 60–90 seconds.&lt;/p&gt;
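
&lt;p&gt;The step that holds execution until the disk is attached can be sketched as a polling loop. The variant below adds a timeout, so a VM whose disk never arrives fails loudly rather than waiting forever. The function name &lt;code&gt;wait_for_path&lt;/code&gt; and the 120 second default are assumptions for illustration, not part of my startup script.&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper: wait until a path exists, giving up after a timeout (seconds).
wait_for_path() {
    local path="$1" timeout="${2:-120}" waited=0
    until [ -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out after ${timeout}s waiting for $path"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "Found $path after ${waited}s"
}

# Usage: wait_for_path /dev/disk/by-id/google-existing-data 120 || exit 1
```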

&lt;h2&gt;
  
  
  Operational States: Preemption vs. Scaling Down
&lt;/h2&gt;

&lt;p&gt;It is vital to distinguish between a True Spot Preemption and a Manual Scale Down to 0.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Spot Preemption: The MIG's intent is to keep one machine online. Per-instance configurations are preserved, and the recovery is fully autonomous.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scale Down to 0: This decommission command destroys the unique stateful metadata ties. When scaling back to 1, the new VM will get stuck in a boot loop because the MIG no longer knows to attach the existing disk or IP. Recovery in this scenario requires a manual orchestration script, ./create-mig.sh, to re-bind the regional static IP and existing data disk.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
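
&lt;p&gt;Before scaling back up, it may help to check whether the MIG still holds a per-instance config. The sketch below wraps &lt;code&gt;gcloud compute instance-groups managed instance-configs list&lt;/code&gt;; the helper name &lt;code&gt;has_stateful_config&lt;/code&gt; is hypothetical, and it assumes a zonal MIG like the one used in this article.&lt;/p&gt;

```shell
#!/bin/bash
# Sketch: succeed only if the MIG still has at least one per-instance config.
# The MIG name and zone are supplied by the caller.
has_stateful_config() {
    local mig="$1" zone="$2" configs
    configs=$(gcloud compute instance-groups managed instance-configs list "$mig" \
        --zone="$zone" --format="value(name)")
    [ -n "$configs" ]
}

# Usage:
#   if has_stateful_config fid-mig europe-west2-b; then
#       echo "Stateful config present"
#   else
#       echo "No stateful config, run ./create-mig.sh"
#   fi
```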

&lt;h2&gt;
  
  
  Comparing Infrastructure Models
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Kubernetes (GKE)&lt;/th&gt;
&lt;th&gt;Standalone VM&lt;/th&gt;
&lt;th&gt;Stateful MIG&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compute Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (Cluster + Nodes)&lt;/td&gt;
&lt;td&gt;Medium (Standard VM)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Lowest (Spot VM)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Preemption Recovery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;Manual Recreate&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Fully Automatic&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Volume Mounts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PVCs with GCE PD&lt;/td&gt;
&lt;td&gt;Local Static /dev/sdb&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Stable by-id path&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IP Persistence&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;K8s LoadBalancer&lt;/td&gt;
&lt;td&gt;Bound to Instance&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Preserved via Config&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Docker-compose benefit
&lt;/h2&gt;

&lt;p&gt;Previously I was using Kubernetes, Cloud Build and Artifact Registry to manage my builds and releases. This made testing a bit awkward, involving minikube, ngrok and various other workarounds. Now that I've transitioned to Docker Compose, the exact same scripts and YAML files work both locally on my Mac and on my VM, so I have a complete end-to-end simulation locally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Replacing the Kubernetes Ingress
&lt;/h2&gt;

&lt;p&gt;If you need to access the VM externally, you're going to need some kind of ingress. Under GKE I was using a managed ingress, with Let's Encrypt handling the SSL certificate. On our vanilla VM, we can use a Traefik proxy instead. All my services run on Docker, as does the Traefik proxy. Here's how to set it up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;start-traefik.sh&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[0;32m'&lt;/span&gt;
&lt;span class="nv"&gt;YELLOW&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[1;33m'&lt;/span&gt;
&lt;span class="nv"&gt;RED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[0;31m'&lt;/span&gt;
&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[0m'&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;uname&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"Darwin"&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Skipping Traefik on Mac (not needed)"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;🚀 Starting Traefik...&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Ensure network exists&lt;/span&gt;
docker network create fid-network 2&amp;gt;/dev/null

&lt;span class="c"&gt;# Start Traefik&lt;/span&gt;
docker compose &lt;span class="nt"&gt;-f&lt;/span&gt; docker-compose-traefik.yml up &lt;span class="nt"&gt;-d&lt;/span&gt;

&lt;span class="nb"&gt;sleep &lt;/span&gt;3

&lt;span class="k"&gt;if &lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; http://localhost:80 | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s2"&gt;"200&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;301&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;302"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;✅ Traefik is running&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Traefik is listening on ports 80 (HTTP) and 443 (HTTPS)."&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Check logs: docker compose -f docker-compose-traefik.yml logs -f traefik"&lt;/span&gt;
&lt;span class="k"&gt;else
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RED&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;❌ Traefik may not be ready. Check logs.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;docker-compose-traefik.yml&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;traefik&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik:v3.2&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fid-traefik&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--entrypoints.web.address=:80"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--entrypoints.websecure.address=:443"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--providers.file.directory=/etc/traefik/conf"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--certificatesresolvers.letsencrypt.acme.httpchallenge=true"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--certificatesresolvers.letsencrypt.acme.email=admin@xliberation.com"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--entrypoints.web.http.redirections.entrypoint.to=websecure"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--entrypoints.web.http.redirections.entrypoint.scheme=https"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--entrypoints.web.http.redirections.entrypoint.permanent=true"&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;80:80"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;443:443"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./conf:/etc/traefik/conf&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;traefik_certs:/letsencrypt&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;fid-network&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;traefik_certs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;fid-network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
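
&lt;p&gt;The compose file points Traefik's file provider at &lt;code&gt;./conf&lt;/code&gt;, so each route lives in a file in that directory. As a rough sketch of what one such file could contain, the helper below writes a single router and service. Everything specific in it is a placeholder rather than my real config: the helper name &lt;code&gt;write_route&lt;/code&gt;, the file name &lt;code&gt;example.yml&lt;/code&gt;, the router and service names, the hostname and the backend URL.&lt;/p&gt;

```shell
#!/bin/bash
# Sketch: write one Traefik dynamic-config file into the directory that the
# compose file mounts at /etc/traefik/conf.
# Hostname, router/service names and backend URL are placeholders.
write_route() {
    local conf_dir="$1" host="$2" backend="$3"
    mkdir -p "$conf_dir"
    # tee writes the file and also echoes it to the terminal
    printf '%s\n' \
        'http:' \
        '  routers:' \
        '    example-router:' \
        "      rule: \"Host(\`$host\`)\"" \
        '      entryPoints:' \
        '        - websecure' \
        '      service: example-service' \
        '      tls:' \
        '        certResolver: letsencrypt' \
        '  services:' \
        '    example-service:' \
        '      loadBalancer:' \
        '        servers:' \
        "          - url: \"$backend\"" \
        | tee "$conf_dir/example.yml"
}

# Usage: write_route ./conf api.example.com http://graphql:4000
```

&lt;p&gt;The &lt;code&gt;certResolver&lt;/code&gt; value matches the resolver name declared in the compose file above. For Traefik to reach a backend by container name, that container would also need to be attached to &lt;code&gt;fid-network&lt;/code&gt;.&lt;/p&gt;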



&lt;h2&gt;
  
  
  Some example scripts
&lt;/h2&gt;

&lt;p&gt;All of this is a little fiddly and needs precision, so here are some scripts I have used to get my services running, along with a few hints. I'll assume you already have a reserved static address (if you need to expose your services publicly).&lt;/p&gt;

&lt;h4&gt;
  
  
  Your startup script (startup-script.sh)
&lt;/h4&gt;

&lt;p&gt;This is mine. Note the device path that uses the by-id form. &lt;code&gt;google-existing-data&lt;/code&gt; refers to the persistent disk it should attach. This is not the disk name (in my case that name is fid-data), but the google- prefix plus the device name (existing-data) given to the incoming stateful disk attachment in the MIG's per-instance config. Note the subsequent mount command that mounts the disk under its real name. The systemd unit file (fid-platform.service) describes what to do once all that is complete. I have a start-all.sh there that will actually start all my services using the persistent disk fid-data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="c"&gt;# --- 1. Wait for persistent disk to physically attach ---&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Waiting for persistent disk to attach..."&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; /dev/disk/by-id/google-existing-data &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nb"&gt;sleep &lt;/span&gt;1
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Give the storage driver a brief moment to stabilize the block mapping&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;2

&lt;span class="c"&gt;# --- 2. Mount persistent disk ---&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /mnt/disks/fid-data
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; mountpoint &lt;span class="nt"&gt;-q&lt;/span&gt; /mnt/disks/fid-data&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;mount &lt;span class="nt"&gt;-o&lt;/span&gt; discard,defaults /dev/disk/by-id/google-existing-data /mnt/disks/fid-data
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# --- 3. Create symlink to repo on persistent disk ---&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /home/brucemcpherson
&lt;span class="nb"&gt;chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; brucemcpherson:brucemcpherson /home/brucemcpherson
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-sf&lt;/span&gt; /mnt/disks/fid-data/fidmaster /home/brucemcpherson/fidmaster

&lt;span class="c"&gt;# --- 4. Install Official Modern Docker Engine &amp;amp; Compose ---&lt;/span&gt;
apt-get update &lt;span class="nt"&gt;-qq&lt;/span&gt;
apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;-qq&lt;/span&gt; ca-certificates curl gnupg

&lt;span class="c"&gt;# Add Docker's official GPG key and repository&lt;/span&gt;
&lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; 0755 &lt;span class="nt"&gt;-d&lt;/span&gt; /etc/apt/keyrings
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://download.docker.com/linux/debian/gpg | gpg &lt;span class="nt"&gt;--dearmor&lt;/span&gt; &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /etc/apt/keyrings/docker.gpg
&lt;span class="nb"&gt;chmod &lt;/span&gt;a+r /etc/apt/keyrings/docker.gpg

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"deb [arch="&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;dpkg &lt;span class="nt"&gt;--print-architecture&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
  "&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; /etc/os-release &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$VERSION_CODENAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;" stable"&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;tee&lt;/span&gt; /etc/apt/sources.list.d/docker.list &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null

&lt;span class="c"&gt;# Install the exact modern packages (restores "docker compose" with a space)&lt;/span&gt;
apt-get update &lt;span class="nt"&gt;-qq&lt;/span&gt;
apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;-qq&lt;/span&gt; docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

&lt;span class="c"&gt;# --- 5. Configure Docker data root ---&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s1"&gt;'"data-root": "/mnt/disks/fid-data/docker"'&lt;/span&gt; /etc/docker/daemon.json 2&amp;gt;/dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'{"data-root": "/mnt/disks/fid-data/docker"}'&lt;/span&gt; | &lt;span class="nb"&gt;tee&lt;/span&gt; /etc/docker/daemon.json
    systemctl restart docker
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# --- 6. Create systemd service file ---&lt;/span&gt;
&lt;span class="c"&gt;# Using 'tee' inside the script ensures no root permission redirection blocks&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;' | tee /etc/systemd/system/fid-platform.service &amp;gt; /dev/null
[Unit]
Description=FID Platform (all services)
After=docker.service network.target
Requires=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
User=brucemcpherson
WorkingDirectory=/home/brucemcpherson/fidmaster/vm-docker/local-compose
ExecStartPre=/bin/sleep 5
ExecStart=/bin/bash /home/brucemcpherson/fidmaster/vm-docker/local-compose/start-all.sh
ExecStop=/bin/bash /home/brucemcpherson/fidmaster/vm-docker/local-compose/stop-all.sh
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;systemctl daemon-reload
systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;fid-platform

&lt;span class="c"&gt;# --- 7. Start all services ---&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /home/brucemcpherson/fidmaster/vm-docker/local-compose
./start-all.sh

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Startup complete."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Create a managed group template
&lt;/h4&gt;

&lt;p&gt;Note that the template references this startup-script. If you subsequently change the startup-script, you'll need to replace the template.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute instance-templates create fid-vm-template-ephemeral &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;europe-west2 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--machine-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;e2-standard-4 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--image-family&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;debian-12 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--image-project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;debian-cloud &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--boot-disk-size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;50GB &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--boot-disk-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;pd-ssd &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--provisioning-model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;SPOT &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--metadata-from-file&lt;/span&gt; startup-script&lt;span class="o"&gt;=&lt;/span&gt;startup-script.shgcloud 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
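
&lt;p&gt;Instance templates cannot be edited once created, so replacing one means creating a template under a new name and pointing the MIG at it. A minimal sketch of that second step follows, using &lt;code&gt;gcloud compute instance-groups managed set-instance-template&lt;/code&gt;. The helper names and the timestamp naming scheme are assumptions, not something I use in my own scripts.&lt;/p&gt;

```shell
#!/bin/bash
# Sketch: generate a fresh template name and point an existing MIG at it.

# Build a unique name such as fid-vm-template-20260622-1220.
# The default prefix is an assumption; pass your own as the first argument.
new_template_name() {
    echo "${1:-fid-vm-template}-$(date +%Y%m%d-%H%M)"
}

# Tell the MIG to use the given template for instances it creates from now on.
roll_template() {
    local mig="$1" zone="$2" template="$3"
    gcloud compute instance-groups managed set-instance-template "$mig" \
        --zone="$zone" --template="$template"
}

# Usage (after creating the new template with the command above):
#   roll_template fid-mig europe-west2-b "$TEMPLATE_NAME"
```

&lt;p&gt;The running VM will typically keep its old configuration until it is next recreated, for example after a preemption.&lt;/p&gt;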



&lt;h4&gt;
  
  
  Create a managed group (create-mig.sh)
&lt;/h4&gt;

&lt;p&gt;The size=1 means I want a single VM. This VM will be called your_mig_name-random_chars.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;# 1. Define your known environment variables&lt;/span&gt;
&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your project"&lt;/span&gt;
&lt;span class="nv"&gt;REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"europe-west2"&lt;/span&gt;
&lt;span class="nv"&gt;ZONE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"europe-west2-b"&lt;/span&gt;
&lt;span class="nv"&gt;MIG_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"the name for your new mig"&lt;/span&gt;
&lt;span class="nv"&gt;IP_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your static ip name"&lt;/span&gt;
&lt;span class="nv"&gt;DISK_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your persistent disk name"&lt;/span&gt;

&lt;span class="c"&gt;# Check if the MIG already exists before trying to create it&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;gcloud compute instance-groups managed describe &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MIG_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ZONE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/dev/null 2&amp;gt;&amp;amp;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Managed Instance Group '&lt;/span&gt;&lt;span class="nv"&gt;$MIG_NAME&lt;/span&gt;&lt;span class="s2"&gt;' already exists. Skipping creation..."&lt;/span&gt;
&lt;span class="k"&gt;else
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Creating Managed Instance Group '&lt;/span&gt;&lt;span class="nv"&gt;$MIG_NAME&lt;/span&gt;&lt;span class="s2"&gt;'..."&lt;/span&gt;
    gcloud compute instance-groups managed create &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MIG_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ZONE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--template&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;fid-vm-template-ephemeral &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1

    &lt;span class="c"&gt;# Give GCP a moment to spin up the instance before querying it&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Waiting 15 seconds for instance to initialize..."&lt;/span&gt;
    &lt;span class="nb"&gt;sleep &lt;/span&gt;15
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# 2. Automatically grab the dynamic instance name&lt;/span&gt;
&lt;span class="nv"&gt;INSTANCE_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud compute instances list &lt;span class="nt"&gt;--filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"name~'^fid-mig-'"&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"value(name)"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Safety check: Ensure an instance actually exists&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INSTANCE_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Error: No running instance found starting with '&lt;/span&gt;&lt;span class="nv"&gt;$MIG_NAME&lt;/span&gt;&lt;span class="s2"&gt;-'."&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Found target instance: &lt;/span&gt;&lt;span class="nv"&gt;$INSTANCE_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# 3. Handle the per-instance config cleanly (Create if missing, update if exists)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Configuring stateful IP and Disk resources..."&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;gcloud compute instance-groups managed instance-configs describe &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MIG_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--instance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INSTANCE_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ZONE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/dev/null 2&amp;gt;&amp;amp;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nv"&gt;CONFIG_ACTION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"update"&lt;/span&gt;
&lt;span class="k"&gt;else
    &lt;/span&gt;&lt;span class="nv"&gt;CONFIG_ACTION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"create"&lt;/span&gt;
&lt;span class="k"&gt;fi

&lt;/span&gt;gcloud compute instance-groups managed instance-configs &lt;span class="nv"&gt;$CONFIG_ACTION&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MIG_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ZONE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--instance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INSTANCE_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--stateful-external-ip&lt;/span&gt; interface-name&lt;span class="o"&gt;=&lt;/span&gt;nic0,address&lt;span class="o"&gt;=&lt;/span&gt;projects/&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;/regions/&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$REGION&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;/addresses/&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$IP_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--stateful-disk&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;device-name&lt;span class="o"&gt;=&lt;/span&gt;existing-data,source&lt;span class="o"&gt;=&lt;/span&gt;projects/&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;/zones/&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ZONE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;/disks/&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DISK_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;,auto-delete&lt;span class="o"&gt;=&lt;/span&gt;never

&lt;span class="c"&gt;# 4. Trigger the MIG to apply these stateful settings to the live VM&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Applying configurations to &lt;/span&gt;&lt;span class="nv"&gt;$INSTANCE_NAME&lt;/span&gt;&lt;span class="s2"&gt;..."&lt;/span&gt;
gcloud compute instance-groups managed update-instances &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MIG_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ZONE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--instances&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INSTANCE_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Done! Dynamic setup complete."&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Ssh to your instance (ssh.sh)
&lt;/h4&gt;

&lt;p&gt;My MIG is called fid-mig, so I can look up the instance name and connect to it like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute ssh &lt;span class="si"&gt;$(&lt;/span&gt;gcloud compute instances list &lt;span class="nt"&gt;--filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"name~'^fid-mig-'"&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"value(name)"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;europe-west2-b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
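
&lt;p&gt;The lookup above assumes the filter matches exactly one VM. As a minimal sketch (the &lt;code&gt;pick_single_instance&lt;/code&gt; helper is hypothetical and not part of the original scripts), a small guard can refuse to continue when zero or several names come back, which could happen while the MIG is recreating an instance:&lt;/p&gt;

```shell
# Hypothetical helper: reads instance names on stdin, one per line.
# Prints the name and succeeds only when exactly one name was supplied.
pick_single_instance() {
  # Drop blank lines; "|| true" keeps an empty result from being an error.
  names=$(grep -v '^[[:space:]]*$' || true)
  # Count the remaining non-empty lines (0 when nothing matched).
  count=$(printf '%s\n' "$names" | grep -c . || true)
  [ "$count" -eq 1 ] || return 1
  printf '%s\n' "$names"
}
```

&lt;p&gt;It could then wrap the existing lookup, for example &lt;code&gt;INSTANCE_NAME=$(gcloud compute instances list --filter="name~'^fid-mig-'" --format="value(name)" | pick_single_instance) || exit 1&lt;/code&gt;, so that neither the ssh nor the delete command runs against an ambiguous name.&lt;/p&gt;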



&lt;h4&gt;
  
  
  Simulate a pre-emption to test (simulate-preemption.sh)
&lt;/h4&gt;

&lt;p&gt;In this case, just deleting the instance simulates a pre-emption. The instance will come back up and execute your startup script without needing intervention. This differs from deliberately resizing the MIG to 0, which would need manual intervention to restart the VM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# 1. Find the current instance name&lt;/span&gt;
&lt;span class="nv"&gt;INSTANCE_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud compute instances list &lt;span class="nt"&gt;--filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"name~'^fid-mig-'"&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"value(name)"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# 2. Simulate a sudden crash/preemption by deleting the instance body directly&lt;/span&gt;
gcloud compute instances delete &lt;span class="nv"&gt;$INSTANCE_NAME&lt;/span&gt; &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;europe-west2-b &lt;span class="nt"&gt;--quiet&lt;/span&gt;

gcloud compute instance-groups managed list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Take down the VM (down-mig.sh)
&lt;/h4&gt;

&lt;p&gt;If you're not using it, might as well save the cost. All you have to do is set the MIG size to 0.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute instance-groups managed resize fid-mig &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;europe-west2-b &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--quiet&lt;/span&gt;

gcloud compute instance-groups managed list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Bring up the VM (up-mig.sh)
&lt;/h4&gt;

&lt;p&gt;Setting the size to 1 will reinstate the VM. However, at this point it knows nothing about what it's supposed to do, so you also need to run create-mig.sh to get back to a running system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute instance-groups managed resize fid-mig &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;europe-west2-b &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--quiet&lt;/span&gt;

gcloud compute instance-groups managed list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ramblings.mcpher.com/kube-to-mig/" rel="noopener noreferrer"&gt;article&lt;/a&gt;&lt;br&gt;
&lt;a href="https://youtu.be/U17bgNQwowg" rel="noopener noreferrer"&gt;video&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>infrastructure</category>
      <category>kubernetes</category>
      <category>sre</category>
    </item>
    <item>
      <title>Human Attention is a Scarce Resource</title>
      <dc:creator>Rizèl Scarlett</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:20:08 +0000</pubDate>
      <link>https://dev.to/entire/human-attention-is-a-scarce-resource-1g1o</link>
      <guid>https://dev.to/entire/human-attention-is-a-scarce-resource-1g1o</guid>
      <description>&lt;p&gt;I recently chatted with a Distinguished Engineer about how he uses agents in his engineering workflow and how he builds new team processes around AI-generated work.&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-2068069283184144450-524" src="https://platform.twitter.com/embed/Tweet.html?id=2068069283184144450"&gt;
&lt;/iframe&gt;




&lt;/p&gt;

&lt;p&gt;During the conversation, David Fowler offered a really poignant take: when producing code becomes trivial, human attention becomes the scarce resource. 🤯&lt;/p&gt;

&lt;p&gt;You can listen to our &lt;a href="https://open.spotify.com/episode/7nwblq1vppZesoInf656cN?si=gkwcvI66QN2yKJZ5Td3kQA" rel="noopener noreferrer"&gt;full conversation&lt;/a&gt; on Spotify. (It's a recording of a Twitter Space I did with him. People have said they really enjoyed the spaces I run, so I saved them as lightly edited podcast episodes).&lt;/p&gt;

&lt;p&gt;See below for my own thoughts on code review in the world of agentic coding: &lt;/p&gt;

&lt;h2&gt;
  
  
  My own thoughts
&lt;/h2&gt;

&lt;p&gt;For decades, software engineering has relied on a foundational necessity: a reliable paper trail. As version control matured from changelogs to CVS to modern Git diffs, we built our craft around durable artifacts that preserve intent, track progress, and safeguard quality.&lt;/p&gt;

&lt;p&gt;Historically, this system worked because software development operated at human speed. An engineer reasoned through a problem, committed code, and opened a pull request. Colleagues reviewed that code, engaging in a back-and-forth dialogue to unpack the underlying logic. We deliberately used this collaborative friction to maintain code quality.&lt;/p&gt;

&lt;p&gt;But today, a new class of autonomous collaborators has disrupted the traditional engineering workflow.&lt;/p&gt;

&lt;p&gt;Coding agents have drastically compressed the implementation window from typing code line by line to writing a single prompt that generates a full feature. The most forward-thinking teams are already moving past single-agent execution, orchestrating parallel sessions where a main agent manages subagents to complete larger bodies of work on demand. (When I worked at Block, this became the way many of the teams I encountered worked.)&lt;/p&gt;

&lt;p&gt;This sudden leap in speed is intoxicating, but it threatens to outrun our ability to keep software trustworthy. In practice, an engineer uses an agent to generate hundreds of lines, but the engineer only skims the results. Reviewers facing a growing backlog do the same. If the Git diff looks right, the team ships. Then a production outage occurs. In the past, you could bring the authoring engineer into the incident room to trace their logic and patch the system. But when the decision was made by one of a dozen subagents running in parallel, there is no one to bring in, and the commit history shows only the result, not the reasoning.&lt;/p&gt;

&lt;p&gt;This is the central problem I keep seeing in AI-native development: we can now produce code faster than we can understand it. And yet, for an industry obsessed with artifacts, meticulously tracking commits, pull requests, and logs, we often throw away the one record that explains all of them: the agent session itself.&lt;/p&gt;

&lt;p&gt;The company I work at, &lt;a href="//entire.io"&gt;Entire&lt;/a&gt;, has been aiming to maintain velocity without sacrificing engineering integrity by preserving agent sessions alongside the tools developers already use. The full chain of AI-assisted work, including prompts, responses, tool calls, subagent activity, checkpoints, and the final commit, can all become part of the engineering context. The mechanics do not have to be heavy. We do this using lightweight hooks around the agent and Git workflow.&lt;/p&gt;

&lt;p&gt;But the purpose of capturing session history is not for users to simply reread a diary of logs and transcripts. (A lot of people tell me, "So what, I don't want to read the logs!") Instead, session history can become an active surface for understanding, unlocking capabilities that standard Git diffs simply cannot support.&lt;/p&gt;

&lt;p&gt;When an engineer or agent needs to understand a complex block of autonomous code, the investigation should not stop at a timestamp, a commit hash, or a best guess. I want to be able to ask why this implementation exists, what prompt produced it, which agent or subagent touched it, and what validation or review context shaped the final result.&lt;/p&gt;

&lt;p&gt;If we want to build at this new speed without losing our grip on the codebase, a static diff of the final code is no longer enough. When the system breaks at 2:00 AM, we cannot rely only on tools built for a human pace to audit agent-to-human collaboration. To keep our systems reliable, we have to preserve more of the actual narrative of how the software came to be.&lt;/p&gt;

&lt;p&gt;Because in an AI-native world, the session is the story.&lt;/p&gt;

&lt;p&gt;Check us out on &lt;a href="https://entire.io" rel="noopener noreferrer"&gt;entire.io&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;And check out this episode from the crazy time I was on an AI coding game show for CodeTV. I look back at this episode often and just think, "Ugh, we should've used Entire. It would've made handing off work within my team so much easier."&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/9AoMFGVffV0"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>productivity</category>
      <category>entire</category>
    </item>
    <item>
      <title>What's Actually in My AGENTS.md</title>
      <dc:creator>Mitesh Sharma</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:18:07 +0000</pubDate>
      <link>https://dev.to/miteshethos/whats-actually-in-my-agentsmd-434e</link>
      <guid>https://dev.to/miteshethos/whats-actually-in-my-agentsmd-434e</guid>
      <description>&lt;p&gt;&lt;strong&gt;The instruction file that runs my codebase started as a junk drawer. The rules that ended up mattering weren't the ones I expected.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A note on credit before I start: most of what follows isn't original to me. I picked up these ideas from other engineers sharing what worked for them — scattered across Twitter/X threads, blog posts, and conversations — then tested them against my own codebase to see what actually held up. This is a synthesis, not an invention. Where something came from the community rather than from me, I've tried to say so. Consider it a thank-you to everyone quietly posting what they've learned.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The first time I wrote an AGENTS.md, I treated it like a README's annoying cousin. Tech stack, a few style notes, a line begging the agent to run the tests. Then I watched the agent ignore half of it and confidently do the wrong thing with the other half.&lt;/p&gt;

&lt;p&gt;So I kept editing. Months later, the file looks nothing like where it started. And the surprising part is which rules earned their place. It wasn't the formatting conventions or the careful style guidance. It was a small number of rules that fight an agent's worst instincts, plus one hard lesson about the difference between asking and enforcing.&lt;/p&gt;

&lt;p&gt;Here's what I found actually mattered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The first rule isn't about code. It's about permission to stop.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The very first thing in my file has nothing to do with syntax. It tells the agent that when something is unclear, it should stop and ask. State your assumptions. If two interpretations exist, name both instead of silently picking one. If a simpler approach exists, say so.&lt;/p&gt;

&lt;p&gt;I put this first because the default failure mode of a coding agent isn't bad code. It's confident code built on a wrong reading of an ambiguous request. Left alone, a model resolves ambiguity quietly and keeps going, and you don't find out until you're reviewing two hundred lines that solve the wrong problem.&lt;/p&gt;

&lt;p&gt;What surprised me was that the agent needed permission to stop. Without it, the model optimizes for looking helpful, and looking helpful means producing something rather than admitting confusion. Telling it that stopping to ask is the desired outcome — not a failure — changed its behavior more than any other single line.&lt;/p&gt;

&lt;p&gt;The lesson generalized: an agent will fill silence with confidence unless you tell it that "I'm not sure" is an acceptable answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Most of the useful rules just name a bad habit and forbid it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once I started paying attention, I noticed that my most valuable rules all had the same shape. Each one named a specific, predictable bad instinct and told the agent not to do it. None of them were clever. They were just earned.&lt;/p&gt;

&lt;p&gt;Agents over-build. Ask for a function and you can get a configurable framework with options nobody requested. So one rule is blunt about it: minimum code that solves the problem, nothing speculative, and if you wrote two hundred lines where fifty would do, rewrite it. Naming the tax is what keeps it from being paid.&lt;/p&gt;

&lt;p&gt;Agents over-reach. An agent that helpfully reformats the whole file alongside its actual change turns a ten-line diff into a two-hundred-line one you can't safely review. So another rule constrains the blast radius: touch only what you must, match the existing style even if you'd do it differently, and make every changed line trace back to the request. This one is really about protecting review, not code. A diff you can't read is a diff you can't trust.&lt;/p&gt;

&lt;p&gt;The rule I'd most recommend stealing is one I didn't appreciate until it bit me. When an agent finds two competing patterns in a codebase, its instinct is to find a middle path that honors both. The result belongs to neither pattern and confuses everyone who reads it later. So the rule is: don't average conflicting patterns. Pick one, explain why, flag the other for cleanup. Averaged code that tries to satisfy two contradictory rules is the worst code in the repo.&lt;/p&gt;

&lt;p&gt;There's a companion to all of these that I keep coming back to: if the agent can't explain why existing code is shaped the way it is, it should ask before adding next to it. "Looks unrelated to me" is the most expensive assumption in any mature codebase. Most subtle breakages come from changes that looked perfectly isolated to the person making them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lesson that reshaped the whole file&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If I could keep only one thing I learned, it's this, and it took the longest to accept.&lt;/p&gt;

&lt;p&gt;I once had a hard requirement written in plain, unmissable language at the top of the file. The agent still skipped it sometimes. Not often — just often enough to cause real problems. When I dug into why, the answer was deflating in its simplicity: models drift from instructions. A rule in a markdown file is a strong suggestion, not a contract. Over enough runs, any instruction gets ignored eventually.&lt;/p&gt;

&lt;p&gt;That one realization split my entire file into two tiers.&lt;/p&gt;

&lt;p&gt;There's guidance — the style preferences, the philosophy, the "prefer this over that." Guidance lives in prose and gets followed most of the time, and most of the time is fine for that category.&lt;/p&gt;

&lt;p&gt;Then there's the stuff that has to hold every single time. And what I learned is that this stuff does not belong in prose at all. If something absolutely must happen, don't rely on instructions. Enforce it. Put it in a hook, a script, a CI check — something deterministic that makes the violation impossible to merge, not merely discouraged.&lt;/p&gt;
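
&lt;p&gt;As a rough illustration of what "enforce it" can look like, here is a minimal sketch of a gate function that a pre-commit hook or CI step might call. The name &lt;code&gt;run_gate&lt;/code&gt; and the commands passed to it are hypothetical, not taken from my actual setup:&lt;/p&gt;

```shell
# Hypothetical gate for a pre-commit hook or CI step: runs every check it is
# given and fails if any one of them fails, whatever the agent claimed.
run_gate() {
  failed=0
  for check in "$@"; do
    # Each argument is a full command line, run in its own shell.
    if sh -c "$check"; then
      echo "PASS: $check"
    else
      echo "FAIL: $check"
      failed=1
    fi
  done
  # Non-zero here is what blocks the commit or fails the build.
  return "$failed"
}
```

&lt;p&gt;A hook might end with something like &lt;code&gt;run_gate "npm run lint" "npm test" || exit 1&lt;/code&gt;, where the two commands are placeholders for whatever checks a project really has. The point is only that the exit code decides, not the prose.&lt;/p&gt;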

&lt;p&gt;My file still says "run the checks before you call it done." But the real guarantee isn't that sentence. It's that CI fails on lint errors no matter what the agent intended. The instruction is a courtesy. The gate is the guarantee.&lt;/p&gt;

&lt;p&gt;The practical test I use now: every time I write "always" or "never" in an instruction file, I ask whether there's a mechanism enforcing it. If there isn't, I don't have a rule. I have a hope.&lt;/p&gt;
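
&lt;p&gt;That audit is easy to mechanize. The following is only a sketch, and &lt;code&gt;list_absolute_rules&lt;/code&gt; is a made-up helper name: it prints every line of an instruction file that contains "always" or "never" as a whole word, so each hit can be matched against a hook or CI check.&lt;/p&gt;

```shell
# Hypothetical audit: print every line of an instruction file that contains
# the word "always" or "never" (case-insensitive, whole words only), with its
# line number, so each one can be checked against a real enforcement mechanism.
list_absolute_rules() {
  # "|| true" means a file with no absolute rules is not treated as an error.
  grep -n -i -w -E 'always|never' "$1" || true
}
```

&lt;p&gt;Running &lt;code&gt;list_absolute_rules AGENTS.md&lt;/code&gt; gives a checklist of lines that are either backed by a mechanism or are, for now, hopes.&lt;/p&gt;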

&lt;p&gt;&lt;strong&gt;Make the agent prove it, not promise it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Two more rules in the file are really about the gap between "I did it" and "it works."&lt;/p&gt;

&lt;p&gt;The first changes how a task is framed. Instead of "add validation," the instruction is to define what success looks like and loop until it's met: write tests for the invalid inputs, then make them pass. "Fix the bug" becomes "write a test that reproduces it, then make it pass." A vague directive becomes one with a finish line the agent can check itself against, instead of declaring victory on vibes.&lt;/p&gt;

&lt;p&gt;The second is blunt about trust: don't claim tests pass from memory — re-run them. I added this after watching the agent report a clean run that wasn't, and after watching sub-agent reviews invent bugs that didn't exist in the source. Both are the same failure: a claim untethered from a fresh check. So the rule ties every claim back to a command actually run, and tells the agent to verify a reported bug against the real code before acting on it.&lt;/p&gt;

&lt;p&gt;The principle underneath both: trust the check, not the claim. An agent's confidence is not evidence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep the laws out of the manual&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One structural decision kept the whole file maintainable: my AGENTS.md doesn't try to contain the architecture.&lt;/p&gt;

&lt;p&gt;The structural laws — dependency direction, frozen contracts, the boundaries that can't be crossed — live in a separate constitution document. The working file just points at it: read this before you add a package, cross a layer, or touch anything safety-critical. And it's explicit that if a change conflicts with the constitution, the move is to refactor or open an amendment, never to slip in a violation to ship faster, because that cost compounds.&lt;/p&gt;

&lt;p&gt;The reason to separate them is pace. The working manual is full of conventions and gotchas that change constantly and should be edited freely. The constitution holds laws that should be slow to change and mechanically enforced. Cram both into one file and the stable laws get buried under operational churn. Keep them apart, and each can move at its own speed.&lt;/p&gt;

&lt;p&gt;The takeaway: things that change weekly and things that should never change don't belong in the same document.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write down the scars&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The back half of my file is the part I'd most encourage every team to keep: a running log of non-obvious lessons. Not conventions — scars. The specific things that cost someone half a day and would cost the next person the same half-day if they weren't written down.&lt;/p&gt;

&lt;p&gt;They're deliberately concrete. The database pseudo-column that isn't selected by default, and the exact error you get when you forget it. The streaming API that keys its events by index instead of ID and arrives in a surprising order. The native dependency that silently fails to compile unless it's on an allowlist. None of these are principles. They're landmines, mapped.&lt;/p&gt;

&lt;p&gt;This section matters more with agents than it ever did with people, because an agent has no scar tissue. A human who lost a day to something tends to remember it. An agent will rediscover the same landmine the hard way every single time, unless the knowledge is written where it'll read it. That's the quiet value here: it's how a codebase's painful, accumulated knowledge gets inherited instead of relearned.&lt;/p&gt;

&lt;p&gt;Every unexpected hour you lose is only worth paying once. Writing it down is how you make sure of that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the file is really for&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After all the editing, here's how I think about it now. This file isn't a style guide. The formatter handles tabs and semicolons. What the file actually does is encode judgment — the judgment that used to live only in the heads of the few people who'd been around long enough to have it.&lt;/p&gt;

&lt;p&gt;The rules that matter are the ones that counteract an agent's predictable instincts: over-building, over-reaching, guessing silently, blending conflicting patterns, claiming success it never verified. And the most important distinction in the whole file is the line between guidance you state and guarantees you enforce.&lt;/p&gt;

&lt;p&gt;So if you're writing one of these, that's where I'd spend the effort. Give the agent permission to stop and ask. Name the bad habits you keep seeing and forbid them by name. Move anything that must always hold out of prose and into a check that fails the build. Point at your architecture instead of restating it. And keep a growing log of your scars, because the agent will step on every landmine you don't map.&lt;/p&gt;

&lt;p&gt;The formatting rules write themselves. The judgment is the part worth writing down.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>coding</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Hardening Unattended Raspberry Pi Edge Nodes: Watchdog, fail2ban, nftables, and the Mistakes That Take Down DNS</title>
      <dc:creator>david</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:14:01 +0000</pubDate>
      <link>https://dev.to/dwoitzik/hardening-unattended-raspberry-pi-edge-nodes-watchdog-fail2ban-nftables-and-the-mistakes-that-4gk3</link>
      <guid>https://dev.to/dwoitzik/hardening-unattended-raspberry-pi-edge-nodes-watchdog-fail2ban-nftables-and-the-mistakes-that-4gk3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://woitzik.dev/blog/raspberry-pi-edge-hardening-watchdog-fail2ban-nftables/" rel="noopener noreferrer"&gt;woitzik.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Two Raspberry Pi 4Bs run AdGuard Home and Unbound for an entire home network, in an active/passive pair via Keepalived. They're physical hardware sitting on a shelf, not VMs or LXCs — no Proxmox snapshot, no PBS backup, no &lt;code&gt;terraform destroy &amp;amp;&amp;amp; apply&lt;/code&gt; to recover from a bad state. If one hangs hard at 2am, nobody notices until someone's phone can't resolve a hostname.&lt;/p&gt;

&lt;p&gt;This is the hardening pass that closed every gap I found in that setup: a hardware watchdog for total-system-freeze recovery, fail2ban for the one SSH-exposed surface, an nftables host firewall that's careful not to fight with Docker's own iptables rules, log size caps to stop slow SD-card death, and a DNS health check that works even on the day the rest of the monitoring stack is offline — which, as it turned out, was exactly the day it mattered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/dwoitzik/homelab-infrastructure" rel="noopener noreferrer"&gt;View the complete homelab infrastructure source on GitHub 🐙&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "It's Just DNS" Needs More Hardening, Not Less
&lt;/h2&gt;

&lt;p&gt;The instinct with a small, single-purpose device is to leave it alone — fewer moving parts, fewer ways to break it. That's backwards for a device with no operator watching it and no automated recovery path. A k3s pod that crashes gets rescheduled in seconds. A Raspberry Pi that hard-hangs stays hung until a human walks over and pulls the power.&lt;/p&gt;

&lt;p&gt;Everything below is about closing that gap: detecting failure independently, recovering from total freezes without intervention, and not introducing a new failure mode in the process of doing any of this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hardware Watchdog: Recovering From a Hang Software Can't See
&lt;/h2&gt;

&lt;p&gt;A crashed container gets restarted by Docker. A kernel deadlock — the whole system stops responding, nothing crashes, nothing logs anything — doesn't. Nothing is left running to notice the problem or act on it.&lt;/p&gt;

&lt;p&gt;The Broadcom SoC in a Raspberry Pi has a hardware watchdog timer: a circuit that resets the board if it isn't periodically "petted." As long as something pets it, the system is presumed alive. If petting stops — because the kernel is deadlocked and nothing can run — the watchdog fires and power-cycles the board.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# /boot/firmware/config.txt
&lt;/span&gt;&lt;span class="py"&gt;dtparam&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;watchdog=on&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# /etc/systemd/system.conf
&lt;/span&gt;&lt;span class="py"&gt;RuntimeWatchdogSec&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;15s&lt;/span&gt;
&lt;span class="py"&gt;RebootWatchdogSec&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10min&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;RuntimeWatchdogSec=15s&lt;/code&gt; sets the hardware watchdog timeout to 15 seconds; while the system is healthy, systemd pets the watchdog well within that window (at least once every half interval). If systemd itself stops running (the actual deadlock case this exists for), the pets stop, and the watchdog circuit force-resets the board. &lt;code&gt;RebootWatchdogSec=10min&lt;/code&gt; is a second, independent safety net: if a &lt;em&gt;reboot&lt;/em&gt; itself hangs (stuck somewhere in shutdown), the watchdog fires again after 10 minutes rather than leaving the board hung mid-reboot indefinitely.&lt;/p&gt;

&lt;p&gt;This requires a reboot to take effect — the &lt;code&gt;config.txt&lt;/code&gt; change only applies at boot. I gated the actual reboot behind an explicit flag (&lt;code&gt;rpi_optimize_reboot&lt;/code&gt;, default &lt;code&gt;false&lt;/code&gt;) rather than auto-rebooting a DNS server as a side effect of an Ansible run.&lt;/p&gt;
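
&lt;p&gt;Because nothing visibly changes until the next boot, it can help to confirm that the settings actually landed. A minimal sketch, assuming the values live in plain &lt;code&gt;KEY=value&lt;/code&gt; lines as shown above (the &lt;code&gt;conf_value&lt;/code&gt; helper is hypothetical):&lt;/p&gt;

```shell
# Hypothetical helper: print the value of the last uncommented KEY=value line
# for the given key in a systemd-style config file. Prints nothing if the key
# is absent or only appears commented out.
conf_value() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}
```

&lt;p&gt;For example, &lt;code&gt;conf_value /etc/systemd/system.conf RuntimeWatchdogSec&lt;/code&gt; should print &lt;code&gt;15s&lt;/code&gt; once the file is deployed, and after the reboot &lt;code&gt;ls -l /dev/watchdog&lt;/code&gt; should show the device the firmware setting exposes.&lt;/p&gt;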

&lt;h2&gt;
  
  
  fail2ban: The One Exposed Surface
&lt;/h2&gt;

&lt;p&gt;These Pis are reachable from the entire server VLAN and, via the Keepalived VIP, present a single consistent address that's an obvious target for anything scanning the network. The only network-facing attack surface that matters here is SSH.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# /etc/fail2ban/jail.d/sshd.local
&lt;/span&gt;&lt;span class="nn"&gt;[sshd]&lt;/span&gt;
&lt;span class="py"&gt;enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;ssh&lt;/span&gt;
&lt;span class="py"&gt;filter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;sshd&lt;/span&gt;
&lt;span class="py"&gt;maxretry&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;5&lt;/span&gt;
&lt;span class="py"&gt;findtime&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;10m&lt;/span&gt;
&lt;span class="py"&gt;bantime&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;1h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five failed attempts within ten minutes ban the source IP for an hour. fail2ban only watches &lt;code&gt;sshd&lt;/code&gt; auth logs and has zero interaction with the DNS path (AdGuard, Unbound, Docker). That isolation matters: a misconfigured fail2ban jail watching the wrong log file, or banning based on the wrong filter, is a self-inflicted outage risk on a box where outages are expensive. Scoping it to exactly one well-understood log source keeps the blast radius of a fail2ban misconfiguration limited to "SSH access," never to DNS itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The nftables Trap: Don't Touch /etc/nftables.conf
&lt;/h2&gt;

&lt;p&gt;This is the part that could have caused the exact outage the rest of this hardening pass exists to prevent.&lt;/p&gt;

&lt;p&gt;The obvious way to add a host firewall on Debian is to edit &lt;code&gt;/etc/nftables.conf&lt;/code&gt; and enable &lt;code&gt;nftables.service&lt;/code&gt;. The problem: that file conventionally starts with &lt;code&gt;flush ruleset&lt;/code&gt; — and Docker manages its own NAT and FORWARD chains via &lt;code&gt;iptables-nft&lt;/code&gt; (the nftables-backed iptables compatibility layer). Enabling the stock &lt;code&gt;nftables.service&lt;/code&gt; would flush ruleset on every boot, wiping out Docker's NAT rules along with it, and silently break every published container port. On a box running AdGuard with &lt;code&gt;network_mode: host&lt;/code&gt; specifically so it can bind port 53 directly — but also running other containers in bridge mode with published ports — that's not a hypothetical, it's the actual topology.&lt;/p&gt;
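
&lt;p&gt;Whether a given box is exposed to this is quick to check before enabling anything. A minimal sketch, where &lt;code&gt;has_flush_ruleset&lt;/code&gt; is a hypothetical helper that only looks for an uncommented &lt;code&gt;flush ruleset&lt;/code&gt; line:&lt;/p&gt;

```shell
# Hypothetical pre-flight check: succeed if FILE contains an uncommented
# "flush ruleset" statement, i.e. loading it would wipe every existing table,
# including the ones Docker created.
has_flush_ruleset() {
  grep -Eq '^[[:space:]]*flush[[:space:]]+ruleset' "$1"
}
```

&lt;p&gt;If &lt;code&gt;has_flush_ruleset /etc/nftables.conf&lt;/code&gt; succeeds, enabling the stock service on a Docker host is the trap described above.&lt;/p&gt;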

&lt;p&gt;The fix: don't touch &lt;code&gt;/etc/nftables.conf&lt;/code&gt; or the stock service at all. Use a separate ruleset file and a separate, custom systemd service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# /etc/nftables-hostfw.conf
table inet hostfw {
  chain input {
    type filter hook input priority filter; policy drop;
    iif "lo" accept
    ct state established,related accept
    ip protocol icmp accept
    meta l4proto ipv6-icmp accept
    tcp dport 22 accept
    tcp dport 53 accept
    udp dport 53 accept
    tcp dport 3001 accept
    tcp dport { 80, 443 } accept
    udp dport 41641 accept
    ip protocol vrrp accept
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# /etc/systemd/system/hostfw.service
&lt;/span&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="py"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Host firewall (inet hostfw table, additive — does not touch Docker's tables)&lt;/span&gt;
&lt;span class="py"&gt;After&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;network.target docker.service&lt;/span&gt;
&lt;span class="py"&gt;Wants&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;docker.service&lt;/span&gt;

&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;oneshot&lt;/span&gt;
&lt;span class="py"&gt;RemainAfterExit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/usr/sbin/nft -f /etc/nftables-hostfw.conf&lt;/span&gt;
&lt;span class="py"&gt;ExecStop&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/usr/sbin/nft delete table inet hostfw&lt;/span&gt;

&lt;span class="nn"&gt;[Install]&lt;/span&gt;
&lt;span class="py"&gt;WantedBy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;multi-user.target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A named table (&lt;code&gt;inet hostfw&lt;/code&gt;) in its own namespace, with &lt;code&gt;policy drop&lt;/code&gt; only on &lt;em&gt;that&lt;/em&gt; table's input chain: it's additive to whatever else nftables is doing, not a replacement of the ruleset. &lt;code&gt;After=docker.service&lt;/code&gt; handles the ordering, and &lt;code&gt;Wants=docker.service&lt;/code&gt; pulls Docker in without making it a hard dependency. Together they mean this table gets applied after Docker has already set up its own rules, so there's no race where this firewall's &lt;code&gt;policy drop&lt;/code&gt; briefly applies before Docker's accept rules for its own traffic exist.&lt;/p&gt;

&lt;p&gt;What this firewall &lt;strong&gt;covers&lt;/strong&gt;: SSH (22), DNS (53 — AdGuard runs &lt;code&gt;network_mode: host&lt;/code&gt;, so this is genuinely host-stack traffic, not Docker-NAT'd), AdGuard's web UI (3001), the HAProxy VIP (80/443), Tailscale (41641/udp), Keepalived VRRP.&lt;/p&gt;

&lt;p&gt;What it &lt;strong&gt;deliberately doesn't cover&lt;/strong&gt;: bridge-mode containers like Unbound (5335) and node_exporter (9100). Docker DNATs traffic to these &lt;em&gt;before&lt;/em&gt; it ever reaches the host's INPUT chain — this firewall's table never sees that traffic, confirmed by live testing, not just by reading documentation about how Docker's iptables integration works. Restricting bridge-mode container ports would require rules in Docker's own &lt;code&gt;DOCKER-USER&lt;/code&gt; chain, with careful IPv4/IPv6 handling to avoid breaking container egress. I deferred this: MikroTik already segments these Pis from the wider internet at the network layer, and the mistake-risk of getting &lt;code&gt;DOCKER-USER&lt;/code&gt; chain rules wrong on a live DNS server outweighed the marginal security benefit of restricting traffic that's already internal-only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Validation that actually validates the deployment path&lt;/strong&gt;, not just the live change: live-tested on the replica Pi first, with a &lt;code&gt;systemd-run&lt;/code&gt; safety-rollback timer staged before every individual change (the same dead-man's-switch pattern as the MikroTik cleanup). Then re-tested via the actual Ansible run — a separate code path from the manual live test, since a playbook can have a templating bug that a manual &lt;code&gt;nft -f&lt;/code&gt; test wouldn't catch. Then validated with an actual reboot, to confirm the systemd service correctly &lt;em&gt;reapplies&lt;/em&gt; the ruleset on boot, rather than only working because it happened to still be live-applied from the manual test. Only after the replica was fully green did the same sequence run against the primary DNS node.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stopping Slow SD-Card Death
&lt;/h2&gt;

&lt;p&gt;Docker's default &lt;code&gt;json-file&lt;/code&gt; log driver has no size limit. On a box with a real disk, that's eventually a problem; on a Pi with an SD card as its only storage, it's a slow-motion outage that looks like nothing is wrong until the card is full and everything stops. The fix is a daemon-wide cap in &lt;code&gt;/etc/docker/daemon.json&lt;/code&gt; (shown without a comment line, because JSON has no comment syntax and the daemon will not start on a file it can't parse):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"log-driver"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"json-file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"log-opts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max-size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"10m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max-file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Existing container logs were already at 17MB and 2.7MB by the time I checked. Not catastrophic yet, but on a trajectory toward "disk full" some months out, with zero warning beforehand. This setting only caps logs for containers &lt;em&gt;created or recreated after&lt;/em&gt; the daemon restart; it doesn't retroactively truncate what's already there. Existing oversized logs needed a manual one-time cleanup; the daemon-wide default just stops the problem from recurring.&lt;/p&gt;
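&lt;p&gt;As a rough sanity check on what the cap buys, the worst-case footprint is approximately &lt;code&gt;max-size&lt;/code&gt; times &lt;code&gt;max-file&lt;/code&gt; per container. A minimal Python sketch, assuming five containers (the five services in the compose snippet in the next section) and assuming every container inherits the daemon default; the function names are illustrative only:&lt;/p&gt;

```python
# Approximate worst-case disk use for Docker's json-file driver with rotation.
# Sizes mirror the daemon.json above; the container count is an assumption.

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Convert a Docker-style size such as '10m' into bytes."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # plain number of bytes


def worst_case_log_bytes(max_size, max_file, containers):
    """Upper bound: every container fills every rotated file."""
    return parse_size(max_size) * int(max_file) * containers


per_container = worst_case_log_bytes("10m", "3", 1)
total = worst_case_log_bytes("10m", "3", 5)
print(per_container // 1024**2, "MiB per container")
print(total // 1024**2, "MiB across 5 containers")
```

&lt;p&gt;Under those assumptions the ceiling is 30 MiB per container and 150 MiB in total, which is small enough to stop worrying about on any realistic SD card.&lt;/p&gt;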

&lt;h2&gt;
  
  
  Memory Limits: Catching a Leak Before It Takes the Whole Pi Down
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose, per service&lt;/span&gt;
&lt;span class="na"&gt;adguardhome&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mem_limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;512m&lt;/span&gt;
&lt;span class="na"&gt;unbound&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mem_limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;256m&lt;/span&gt;
&lt;span class="na"&gt;promtail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mem_limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;256m&lt;/span&gt;
&lt;span class="na"&gt;node_exporter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mem_limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;128m&lt;/span&gt;
&lt;span class="na"&gt;autoheal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mem_limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;64m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are generous numbers, chosen from actual observed usage with real headroom. The goal isn't to constrain normal operation; it's to make sure a genuine memory leak or runaway process in one container gets OOM-killed at &lt;em&gt;that container's&lt;/em&gt; limit before it starves every other process on the Pi, including the DNS resolver everything depends on. I tested this incrementally on the replica first, verified via &lt;code&gt;docker inspect&lt;/code&gt; that the limits were actually applied, and confirmed that all containers came back &lt;code&gt;Up&lt;/code&gt; after restart with DNS unaffected throughout. This is the kind of change where "looks fine" isn't sufficient confirmation on a box this important.&lt;/p&gt;
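&lt;p&gt;One quick check on "generous" is to add the limits up and compare the total against the Pi's RAM. The sketch below does that in Python; the 4 GiB figure is a placeholder assumption, since the RAM size isn't stated here, and &lt;code&gt;to_mib&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

```python
# Sum the per-service mem_limit values from the compose snippet above and
# compare against total RAM. TOTAL_RAM_MIB is an assumed placeholder.

MEM_LIMITS = {
    "adguardhome": "512m",
    "unbound": "256m",
    "promtail": "256m",
    "node_exporter": "128m",
    "autoheal": "64m",
}
TOTAL_RAM_MIB = 4096  # assumption: a 4 GiB Pi


def to_mib(limit):
    """Convert a compose-style limit ('512m' or '1g') into MiB."""
    value, unit = int(limit[:-1]), limit[-1].lower()
    factors = {"m": 1, "g": 1024}
    return value * factors[unit]


committed = sum(to_mib(v) for v in MEM_LIMITS.values())
headroom = TOTAL_RAM_MIB - committed
print(f"limits total {committed} MiB, leaving {headroom} MiB for the host")
```

&lt;p&gt;The limits sum to 1216 MiB, so even if every container hit its ceiling at once, the host would keep the rest of its memory for itself.&lt;/p&gt;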

&lt;h2&gt;
  
  
  Local Config Backup: The Gap Nobody Noticed
&lt;/h2&gt;

&lt;p&gt;These Pis are physical hardware. Proxmox Backup Server and Velero only cover VMs and LXCs, so neither one was ever backing these up. The gap had existed since the Pis were first deployed; it just never surfaced, because nothing had required restoring from a backup yet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# /usr/local/bin/backup-rpi-configs.sh&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail
&lt;span class="nv"&gt;DEST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/opt/backups
&lt;span class="nv"&gt;STAMP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d-%H%M%S&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;tar &lt;/span&gt;czf &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DEST&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/configs-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;STAMP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.tar.gz"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-C&lt;/span&gt; / opt/adguardhome/conf opt/unbound 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true
ls&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DEST&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;/configs-&lt;span class="k"&gt;*&lt;/span&gt;.tar.gz 2&amp;gt;/dev/null | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; +15 | xargs &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script runs daily via a systemd timer with a randomized delay (to avoid both Pis hitting disk I/O at the exact same instant) and keeps the 14 most recent snapshots. It is deliberately &lt;strong&gt;local-only&lt;/strong&gt;, with no NFS or git dependency: the NFS server runs as an LXC on the Proxmox host, and a backup that depends on the very host whose failure you're guarding against defeats the purpose. AdGuard's config also contains a bcrypt password hash; pushing that into git history, even encrypted-at-rest on a private remote, is an unnecessary exposure for a snapshot whose only job is "let me recover the last known-good config after an accidental change."&lt;/p&gt;
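&lt;p&gt;The retention step is the part most prone to off-by-one mistakes, so here is the same rule as a small pure function in Python. It is a sketch rather than a drop-in replacement: it sorts by file name instead of modification time (the shell version uses &lt;code&gt;ls -t&lt;/code&gt;), on the assumption that the timestamped names sort chronologically, and &lt;code&gt;snapshots_to_delete&lt;/code&gt; is a hypothetical name:&lt;/p&gt;

```python
# Same retention rule as the shell pipeline: sort newest first, keep the
# first KEEP entries, return the rest. Pure function, touches no files.

KEEP = 14


def snapshots_to_delete(names, keep=KEEP):
    """Return the snapshot names that fall outside the newest `keep`.

    Relies on the configs-YYYYmmdd-HHMMSS.tar.gz naming, where
    lexicographic order equals chronological order.
    """
    newest_first = sorted(names, reverse=True)
    return newest_first[keep:]


# Sixteen daily snapshots, 1 June through 16 June (made-up sample data).
names = [f"configs-202606{day:02d}-030000.tar.gz" for day in range(1, 17)]
print(snapshots_to_delete(names))
```

&lt;p&gt;With sixteen snapshots, only the two oldest are returned for deletion; with fourteen or fewer, the list is empty, which matches &lt;code&gt;tail -n +15&lt;/code&gt; starting its output at the fifteenth entry.&lt;/p&gt;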

&lt;h2&gt;
  
  
  Alerting That Survives the Main Alerting Stack Being Down
&lt;/h2&gt;

&lt;p&gt;This is the piece that mattered in practice, not just in theory. The homelab's primary alerting path (Prometheus → Alertmanager → Discord) runs on the k3s cluster, which runs on the Proxmox host. On the day I built this, the Proxmox host itself was down for hardware repair — which meant the entire alerting pipeline was also down, on exactly the day DNS health mattered most, since DNS was now also the only thing left running unsupervised.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Independent DNS health check — ZERO dependency on k3s/Prometheus/Alertmanager&lt;/span&gt;
&lt;span class="nv"&gt;WEBHOOK_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;
&lt;span class="nv"&gt;STATE_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/var/lib/dns-healthcheck.state"&lt;/span&gt;
&lt;span class="nv"&gt;HOSTNAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;hostname&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

check_dns&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  dig +short +timeout&lt;span class="o"&gt;=&lt;/span&gt;3 google.com @127.0.0.1 &lt;span class="nt"&gt;-p&lt;/span&gt; 53 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null 2&amp;gt;&amp;amp;1 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  dig +short +timeout&lt;span class="o"&gt;=&lt;/span&gt;3 google.com @127.0.0.1 &lt;span class="nt"&gt;-p&lt;/span&gt; 5335 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null 2&amp;gt;&amp;amp;1
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;PREV_STATE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"unknown"&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$STATE_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;PREV_STATE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$STATE_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;check_dns&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then &lt;/span&gt;&lt;span class="nv"&gt;CURRENT_STATE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"healthy"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;else &lt;/span&gt;&lt;span class="nv"&gt;CURRENT_STATE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"unhealthy"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;fi

if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CURRENT_STATE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PREV_STATE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CURRENT_STATE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"unhealthy"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nv"&gt;MESSAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"🔴 **&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;HOSTNAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;**: DNS resolution failing. This alert is independent of the main monitoring stack."&lt;/span&gt;
  &lt;span class="k"&gt;else
    &lt;/span&gt;&lt;span class="nv"&gt;MESSAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"🟢 **&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;HOSTNAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;**: DNS resolution recovered."&lt;/span&gt;
  &lt;span class="k"&gt;fi
  &lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;content&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;MESSAGE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;WEBHOOK_URL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null 2&amp;gt;&amp;amp;1 &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CURRENT_STATE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$STATE_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run every two minutes via a systemd timer. Two design choices that matter more than the script's mechanics:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It tests both layers independently&lt;/strong&gt; — AdGuard on port 53 &lt;em&gt;and&lt;/em&gt; Unbound directly on port 5335. AdGuard forwards to Unbound; testing only the front door (53) wouldn't distinguish "AdGuard is fine but its upstream resolver died" from "everything's fine." &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; between the two &lt;code&gt;dig&lt;/code&gt; calls means both have to succeed for the overall state to be healthy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It only posts on a state change&lt;/strong&gt;, not on every run. A naive healthcheck that posts every two minutes regardless of state either spams a channel into being muted (defeating the purpose) or gets its messages ignored after the first few identical ones. Tracking previous state in a file and diffing against it means the alert fires exactly twice per incident: once when it breaks, once when it recovers — and nothing in between.&lt;/p&gt;
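&lt;p&gt;The state-change rule is small enough to model directly. The Python sketch below mirrors the &lt;code&gt;PREV_STATE&lt;/code&gt; / &lt;code&gt;CURRENT_STATE&lt;/code&gt; comparison from the script and replays a simulated outage; &lt;code&gt;alert_for&lt;/code&gt; and the host name &lt;code&gt;pi-dns&lt;/code&gt; are placeholders, and nothing is actually posted anywhere:&lt;/p&gt;

```python
# Edge-triggered alerting: produce a message only when the state flips.
# Mirrors the PREV_STATE / CURRENT_STATE comparison in the script above.


def alert_for(prev_state, current_state, host):
    """Return the message to post, or None when nothing changed."""
    if current_state == prev_state:
        return None
    if current_state == "unhealthy":
        return f"{host}: DNS resolution failing."
    return f"{host}: DNS resolution recovered."


# Simulate seven timer runs: healthy, an outage of three runs, then recovery.
observed = ["healthy", "healthy", "unhealthy", "unhealthy",
            "unhealthy", "healthy", "healthy"]
prev = "healthy"
posted = []
for state in observed:
    message = alert_for(prev, state, "pi-dns")  # host name is a placeholder
    if message is not None:
        posted.append(message)
    prev = state

print(posted)
```

&lt;p&gt;Seven runs produce two messages, one per edge. One consequence of the same comparison is worth knowing: with no state file yet, the previous state is &lt;code&gt;unknown&lt;/code&gt;, so a healthy first run also counts as a change and posts a "recovered" message.&lt;/p&gt;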

&lt;p&gt;The webhook URL reuses the same Discord webhook Alertmanager already posts to — found, while wiring this up, to have been committed in plaintext in the cluster's own monitoring config. Worth its own fix, but explicitly out of scope for this change; noted rather than silently expanded into a second unrelated remediation in the same commit.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Got Tested, Not Just Written
&lt;/h2&gt;

&lt;p&gt;Every change here got the same validation discipline, because the box matters too much to skip it: replica first, primary only after the replica was fully green; a manual live test &lt;em&gt;and&lt;/em&gt; a separate Ansible-driven test, since they're different code paths; and for anything that should survive a reboot, an actual reboot — not just trusting that a systemd unit file is correct.&lt;/p&gt;




&lt;p&gt;The pattern generalizes past Raspberry Pis: any unattended edge device — a branch-office router, an IoT gateway, a remote sensor node — has the same shape of problem. No operator watching it, no automated platform-level recovery, and a failure mode (hard hang) that ordinary application-level monitoring can't see because the monitoring agent itself is also hung. A hardware watchdog plus an alerting path with zero dependency on the thing being monitored is the minimum bar for "I'll find out if this breaks," regardless of what the device actually does.&lt;/p&gt;



</description>
      <category>homelab</category>
      <category>security</category>
      <category>networking</category>
    </item>
    <item>
      <title>IPv6 NAT66 Behind a FritzBox: The RouterOS 7 Bug That Broke WiFi Clients</title>
      <dc:creator>david</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:13:49 +0000</pubDate>
      <link>https://dev.to/dwoitzik/ipv6-nat66-behind-a-fritzbox-the-routeros-7-bug-that-broke-wifi-clients-4nha</link>
      <guid>https://dev.to/dwoitzik/ipv6-nat66-behind-a-fritzbox-the-routeros-7-bug-that-broke-wifi-clients-4nha</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://woitzik.dev/blog/mikrotik-ipv6-nat66-cgn-routeros7/" rel="noopener noreferrer"&gt;woitzik.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most homelab IPv6 guides assume you have native IPv6 from your ISP: a delegated /56 prefix, clean RA on the WAN, no NAT. That describes maybe 30% of actual deployments in Germany.&lt;/p&gt;

&lt;p&gt;The other 70% sits behind a FritzBox with DS-Lite or CGN, gets a GUA on the WAN interface via SLAAC, and has no delegated prefix to distribute internally. If you want IPv6 inside your network, you build it yourself.&lt;/p&gt;

&lt;p&gt;This is the setup I run: ULA addressing internally, NAT66 masquerade for outbound, everything Terraform-managed. It worked until RouterOS 7's router advertisement defaults caused every FritzBox WiFi client to route IPv6 through MikroTik — and then get dropped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/dwoitzik/homelab-infrastructure" rel="noopener noreferrer"&gt;View the complete homelab infrastructure source on GitHub 🐙&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Topology
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Internet
    │
FritzBox (CGN / DS-Lite)
    │  ether1 (WAN) — gets GUA via SLAAC from FritzBox
MikroTik RB5009
    ├── vlan10-mgmt   fd10::1/64
    ├── vlan20-srv    fd20::1/64
    ├── vlan30-dmz    fd30::1/64
    ├── vlan40-iot    fd40::1/64
    └── vlan100-admin fd64::1/64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The FritzBox provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IPv4 via CGN/DS-Lite (no public IPv4)&lt;/li&gt;
&lt;li&gt;IPv6 GUA prefix via RA on its LAN port — MikroTik's ether1 picks this up via SLAAC&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Internally, I use ULA (&lt;code&gt;fd00::/8&lt;/code&gt;, RFC 4193). ULA is the IPv6 equivalent of RFC1918 private addressing. It's stable — it doesn't change when the ISP rotates the GUA prefix — and it works for all internal communication. The NAT66 rule masquerades ULA sources to the GUA when leaving ether1.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: ULA Addresses Per VLAN
&lt;/h2&gt;

&lt;p&gt;Each VLAN gets a /64 from the &lt;code&gt;fd00::/8&lt;/code&gt; space. I use the VLAN number as the second octet for readability (VLAN 100 appears as &lt;code&gt;fd64&lt;/code&gt;, the hex form of 100):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# terraform/stacks/network/ipv6_network.tf&lt;/span&gt;

&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ipv6_ula_prefixes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"vlan10-mgmt"&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"fd10::/64"&lt;/span&gt;
    &lt;span class="s2"&gt;"vlan20-srv"&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"fd20::/64"&lt;/span&gt;
    &lt;span class="s2"&gt;"vlan30-dmz"&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"fd30::/64"&lt;/span&gt;
    &lt;span class="s2"&gt;"vlan40-iot"&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"fd40::/64"&lt;/span&gt;
    &lt;span class="s2"&gt;"vlan100-admin"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"fd64::/64"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"routeros_ipv6_address"&lt;/span&gt; &lt;span class="s2"&gt;"vlan_ula"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;for_each&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ipv6_ula_prefixes&lt;/span&gt;

  &lt;span class="nx"&gt;address&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"::/64"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"::1/64"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;
  &lt;span class="nx"&gt;advertise&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;comment&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ULA gateway for ${each.key}"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;advertise = true&lt;/code&gt; tells RouterOS to advertise the address's prefix via IPv6 ND (Neighbor Discovery) on that interface. Hosts on each VLAN receive a Router Advertisement with the /64 prefix and auto-configure a ULA address via SLAAC. No DHCPv6 needed.&lt;/p&gt;

&lt;p&gt;The router address is &lt;code&gt;::1&lt;/code&gt; in each /64: &lt;code&gt;fd10::1/64&lt;/code&gt;, &lt;code&gt;fd20::1/64&lt;/code&gt;, etc.&lt;/p&gt;
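&lt;p&gt;To double-check the addressing outside Terraform, the same derivation can be reproduced with Python's standard &lt;code&gt;ipaddress&lt;/code&gt; module. This is only a verification sketch: the prefix map is copied from the locals block, and &lt;code&gt;gateway_for&lt;/code&gt; is a hypothetical helper that stands in for the &lt;code&gt;replace()&lt;/code&gt; call:&lt;/p&gt;

```python
# Derive the ::1 gateway for each ULA /64 and confirm every prefix sits
# inside fd00::/8, using only the standard library.
import ipaddress

ULA_SPACE = ipaddress.ip_network("fd00::/8")

PREFIXES = {
    "vlan10-mgmt": "fd10::/64",
    "vlan20-srv": "fd20::/64",
    "vlan30-dmz": "fd30::/64",
    "vlan40-iot": "fd40::/64",
    "vlan100-admin": "fd64::/64",
}


def gateway_for(prefix):
    """Return the first host address (::1) of the prefix, with its length."""
    network = ipaddress.ip_network(prefix)
    return f"{network.network_address + 1}/{network.prefixlen}"


for vlan, prefix in PREFIXES.items():
    # Fails loudly if a prefix ever drifts outside the ULA range.
    assert ipaddress.ip_network(prefix).subnet_of(ULA_SPACE)
    print(vlan, gateway_for(prefix))
```

&lt;p&gt;Each line of output pairs a VLAN with its gateway, for example &lt;code&gt;vlan10-mgmt fd10::1/64&lt;/code&gt;.&lt;/p&gt;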

&lt;h2&gt;
  
  
  Step 2: Accept RA on ether1
&lt;/h2&gt;

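&lt;p&gt;MikroTik defaults to ignoring Router Advertisements when &lt;code&gt;forward = true&lt;/code&gt; (i.e., when acting as a router). You have to enable RA acceptance explicitly so that the WAN interface can pick up the FritzBox's RA. Note that the resource below is the router-wide IPv6 settings block, not a per-interface option:&lt;br&gt;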
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"routeros_ipv6_settings"&lt;/span&gt; &lt;span class="s2"&gt;"global"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;accept_router_advertisements&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"yes"&lt;/span&gt;
  &lt;span class="nx"&gt;forward&lt;/span&gt;                      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this, ether1 accepts the RA from the FritzBox and configures its GUA via SLAAC. &lt;code&gt;/ipv6 address print&lt;/code&gt; will show the GUA on ether1, alongside the manually configured ULA if you have put any internal IPv6 config on that interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: NAT66 Masquerade
&lt;/h2&gt;

&lt;p&gt;The NAT66 rule masquerades outbound IPv6 from ULA sources to the GUA on ether1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"routeros_ipv6_firewall_nat"&lt;/span&gt; &lt;span class="s2"&gt;"nat66_masquerade"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;chain&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"srcnat"&lt;/span&gt;
  &lt;span class="nx"&gt;action&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"masquerade"&lt;/span&gt;
  &lt;span class="nx"&gt;src_address&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"fd00::/8"&lt;/span&gt;
  &lt;span class="nx"&gt;out_interface&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ether1"&lt;/span&gt;
  &lt;span class="nx"&gt;comment&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"NAT66: ULA → WAN GUA (FritzBox upstream)"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;src_address = "fd00::/8"&lt;/code&gt; constraint is critical. Without it, the rule matches ALL IPv6 traffic leaving ether1 — including traffic from FritzBox WiFi clients that happens to transit MikroTik. This is one half of the bug that caused problems (more on that below).&lt;/p&gt;
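&lt;p&gt;The effect of the constraint is just a prefix membership test, which can be illustrated with Python's &lt;code&gt;ipaddress&lt;/code&gt; module. The two sample addresses are made up: &lt;code&gt;fd20::50&lt;/code&gt; stands in for a host on vlan20-srv, and &lt;code&gt;2001:db8::50&lt;/code&gt; (documentation range) stands in for a FritzBox WiFi client's GUA. &lt;code&gt;is_masqueraded&lt;/code&gt; is a hypothetical helper, not RouterOS behaviour in full:&lt;/p&gt;

```python
# Which sources would the srcnat rule match? A membership test against
# fd00::/8 shows how the constraint keeps GUA traffic out of the masquerade.
import ipaddress

NAT66_SOURCES = ipaddress.ip_network("fd00::/8")


def is_masqueraded(source):
    """True if the source address falls inside the rule's src_address."""
    return ipaddress.ip_address(source) in NAT66_SOURCES


# fd20::50: sample internal ULA host. 2001:db8::50: sample GUA client.
for source in ("fd20::50", "2001:db8::50"):
    print(source, is_masqueraded(source))
```

&lt;p&gt;The ULA source matches and the GUA source does not, so only internal hosts are rewritten to the WAN address.&lt;/p&gt;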

&lt;h2&gt;
  
  
  Step 4: IPv6 Firewall
&lt;/h2&gt;

&lt;p&gt;The IPv6 firewall mirrors the IPv4 firewall philosophy: default-drop, explicit allows, &lt;code&gt;place_before&lt;/code&gt; for deterministic rule ordering.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# INPUT chain&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"routeros_ipv6_firewall_filter"&lt;/span&gt; &lt;span class="s2"&gt;"v6_in_00_established"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;action&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"accept"&lt;/span&gt;
  &lt;span class="nx"&gt;chain&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"input"&lt;/span&gt;
  &lt;span class="nx"&gt;connection_state&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"established,related,untracked"&lt;/span&gt;
  &lt;span class="nx"&gt;place_before&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;routeros_ipv6_firewall_filter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;v6_in_01_icmpv6&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;comment&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"V6-IN-00: Allow established/related"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"routeros_ipv6_firewall_filter"&lt;/span&gt; &lt;span class="s2"&gt;"v6_in_01_icmpv6"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;action&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"accept"&lt;/span&gt;
  &lt;span class="nx"&gt;chain&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"input"&lt;/span&gt;
  &lt;span class="nx"&gt;protocol&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"icmpv6"&lt;/span&gt;
  &lt;span class="nx"&gt;place_before&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;routeros_ipv6_firewall_filter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;v6_input_drop_all&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;comment&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"V6-IN-01: Allow ICMPv6 (NDP, RA, ping6)"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"routeros_ipv6_firewall_filter"&lt;/span&gt; &lt;span class="s2"&gt;"v6_input_drop_all"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;action&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"drop"&lt;/span&gt;
  &lt;span class="nx"&gt;chain&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"input"&lt;/span&gt;
  &lt;span class="nx"&gt;comment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"V6-IN-DROP: Drop all other IPv6 input"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# FORWARD chain&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"routeros_ipv6_firewall_filter"&lt;/span&gt; &lt;span class="s2"&gt;"v6_fwd_00_established"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;action&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"accept"&lt;/span&gt;
  &lt;span class="nx"&gt;chain&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"forward"&lt;/span&gt;
  &lt;span class="nx"&gt;connection_state&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"established,related,untracked"&lt;/span&gt;
  &lt;span class="nx"&gt;place_before&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;routeros_ipv6_firewall_filter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;v6_fwd_01_icmpv6&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;comment&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"V6-FWD-00: Allow established/related"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"routeros_ipv6_firewall_filter"&lt;/span&gt; &lt;span class="s2"&gt;"v6_fwd_01_icmpv6"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;action&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"accept"&lt;/span&gt;
  &lt;span class="nx"&gt;chain&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"forward"&lt;/span&gt;
  &lt;span class="nx"&gt;protocol&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"icmpv6"&lt;/span&gt;
  &lt;span class="nx"&gt;place_before&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;routeros_ipv6_firewall_filter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;v6_fwd_02_internal_out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;comment&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"V6-FWD-01: Allow ICMPv6"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"routeros_ipv6_firewall_filter"&lt;/span&gt; &lt;span class="s2"&gt;"v6_fwd_02_internal_out"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;action&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"accept"&lt;/span&gt;
  &lt;span class="nx"&gt;chain&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"forward"&lt;/span&gt;
  &lt;span class="nx"&gt;src_address&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"fd00::/8"&lt;/span&gt;
  &lt;span class="nx"&gt;out_interface&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ether1"&lt;/span&gt;
  &lt;span class="nx"&gt;place_before&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;routeros_ipv6_firewall_filter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;v6_forward_drop_all&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;comment&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"V6-FWD-02: Allow internal ULA to WAN"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"routeros_ipv6_firewall_filter"&lt;/span&gt; &lt;span class="s2"&gt;"v6_forward_drop_all"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;action&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"drop"&lt;/span&gt;
  &lt;span class="nx"&gt;chain&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"forward"&lt;/span&gt;
  &lt;span class="nx"&gt;comment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"V6-FWD-DROP: Drop all other IPv6 forward"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The forward rule &lt;code&gt;v6_fwd_02_internal_out&lt;/code&gt; only allows ULA sources (&lt;code&gt;fd00::/8&lt;/code&gt;) to exit via ether1. That's intentional — and it's what exposed the RouterOS 7 bug.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bug: RouterOS 7 Sends RA on All Interfaces
&lt;/h2&gt;

&lt;p&gt;After deploying this configuration, FritzBox WiFi clients started losing IPv6 connectivity.&lt;/p&gt;

&lt;p&gt;The symptom: devices on the FritzBox WiFi (SSID, not the MikroTik VLANs) had IPv6 addresses but couldn't reach the internet via IPv6. &lt;code&gt;traceroute6&lt;/code&gt; on an affected device showed the path going through MikroTik — not the FritzBox.&lt;/p&gt;

&lt;p&gt;The cause: &lt;strong&gt;RouterOS 7 enables Router Advertisement on all interfaces by default&lt;/strong&gt;, including &lt;code&gt;ether1&lt;/code&gt; (WAN).&lt;/p&gt;

&lt;p&gt;Here's the sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;MikroTik receives a GUA prefix from FritzBox via RA on ether1&lt;/li&gt;
&lt;li&gt;RouterOS 7 then sends its own Router Advertisements on ether1, back towards the FritzBox LAN&lt;/li&gt;
&lt;li&gt;The FritzBox sees MikroTik advertising itself as an IPv6 router on the LAN&lt;/li&gt;
&lt;li&gt;FritzBox WiFi clients pick up MikroTik's RA and install it as their default IPv6 gateway&lt;/li&gt;
&lt;li&gt;IPv6 traffic from WiFi clients routes through MikroTik's FORWARD chain&lt;/li&gt;
&lt;li&gt;The FORWARD chain only accepts new traffic from &lt;code&gt;fd00::/8&lt;/code&gt; sources, so the GUA addresses of WiFi clients don't match&lt;/li&gt;
&lt;li&gt;Traffic dropped. IPv6 broken for all FritzBox WiFi clients.&lt;/li&gt;
&lt;/ol&gt;
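
&lt;p&gt;Steps 5 to 7 can be sketched as a first-match rule evaluation. This is a simplified model rather than RouterOS itself: the &lt;code&gt;Rule&lt;/code&gt; type, &lt;code&gt;isInFd00Slash8&lt;/code&gt; and &lt;code&gt;evaluate&lt;/code&gt; are names made up for the illustration, the established/related and ICMPv6 rules are left out, and &lt;code&gt;2001:db8:1::10&lt;/code&gt; is a documentation address standing in for a WiFi client's GUA.&lt;/p&gt;

```typescript
// Simplified model of the IPv6 forward chain: rules are checked in order and
// the first matching rule decides. Connection-state and ICMPv6 rules are omitted.
type Action = "accept" | "drop";

interface Rule {
  comment: string;
  action: Action;
  // Hypothetical matcher: true when the rule applies to this source address.
  matches: (srcAddress: string) => boolean;
}

// True when the address falls inside fd00::/8, i.e. its first byte is 0xfd.
function isInFd00Slash8(address: string): boolean {
  const firstGroup = address.split(":")[0] ?? "";
  const value = parseInt(firstGroup, 16); // NaN for an address starting with "::"
  return Math.floor(value / 256) === 0xfd;
}

const forwardChain: Rule[] = [
  { comment: "V6-FWD-02: Allow internal ULA to WAN", action: "accept", matches: isInFd00Slash8 },
  { comment: "V6-FWD-DROP: Drop all other IPv6 forward", action: "drop", matches: () => true },
];

function evaluate(chain: Rule[], srcAddress: string): Action {
  for (const rule of chain) {
    if (rule.matches(srcAddress)) {
      return rule.action;
    }
  }
  return "accept"; // no rule matched; this sketch falls back to accept
}

console.log(evaluate(forwardChain, "fd20::10"));       // accept
console.log(evaluate(forwardChain, "2001:db8:1::10")); // drop
```

&lt;p&gt;The ULA source matches the allow rule, while the GUA source falls through to the final drop rule, which is what the WiFi clients ran into.&lt;/p&gt;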

&lt;p&gt;The fix is to disable RA on ether1. In RouterOS &lt;code&gt;/ipv6/nd&lt;/code&gt;, find the ether1 entry and set &lt;code&gt;advertise=no&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The problem: as of &lt;code&gt;terraform-routeros&lt;/code&gt; provider version 1.99.1 (latest at time of writing), there is no &lt;code&gt;routeros_ipv6_nd&lt;/code&gt; resource to manage this via Terraform. The fix has to be applied manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cisco_ios"&gt;&lt;code&gt;&lt;span class="k"&gt;/ipv6/nd&lt;/span&gt; set [find interface=ether1] advertise=no
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The manual change is recorded as a comment in the Terraform configuration, so it isn't forgotten when the router is rebuilt or after a future &lt;code&gt;terraform apply&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# RouterOS 7 enables RA advertisement on ALL interfaces by default — including&lt;/span&gt;
&lt;span class="c1"&gt;# ether1 (WAN). Once ether1 gets a GUA via SLAAC, MikroTik starts sending RAs&lt;/span&gt;
&lt;span class="c1"&gt;# on the FritzBox LAN. FritzBox WiFi clients then use MikroTik as their IPv6&lt;/span&gt;
&lt;span class="c1"&gt;# gateway, but the FORWARD chain only allows fd00::/8 sources → GUA clients&lt;/span&gt;
&lt;span class="c1"&gt;# are dropped → IPv6 broken on FritzBox WiFi.&lt;/span&gt;
&lt;span class="c1"&gt;# RA on ether1 is disabled in RouterOS: /ipv6/nd set [find interface=ether1] advertise=no&lt;/span&gt;
&lt;span class="c1"&gt;# routeros_ipv6_nd is not exposed in terraform-routeros/routeros ≤ 1.99.1 (latest as of 2026-06).&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once &lt;code&gt;routeros_ipv6_nd&lt;/code&gt; is added to the provider (tracked upstream), this should be managed as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"routeros_ipv6_nd"&lt;/span&gt; &lt;span class="s2"&gt;"ether1_no_ra"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ether1"&lt;/span&gt;
  &lt;span class="nx"&gt;advertise&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  ULA vs. GUA: Why Not Just Use the ISP Prefix?
&lt;/h2&gt;

&lt;p&gt;The obvious alternative: use the GUA prefix the FritzBox receives from the ISP, delegate a /64 to each VLAN, and skip NAT66 entirely. IPv6 was designed to eliminate NAT.&lt;/p&gt;

&lt;p&gt;The problem: German ISPs frequently rotate GUA prefixes. A prefix change means every device on every VLAN gets a new address — breaking DNS records, Ansible inventory, firewall rules, and anything else that references addresses directly.&lt;/p&gt;

&lt;p&gt;ULA solves this. The &lt;code&gt;fd00::/8&lt;/code&gt; prefix is locally assigned and never changes. Internal addressing is stable forever. The NAT66 rule handles the GUA ↔ ULA translation at the WAN boundary transparently.&lt;/p&gt;

&lt;p&gt;The trade-off: ULA + NAT66 breaks end-to-end IPv6 reachability (GUA hosts on the internet can't initiate connections to your ULA hosts). For a homelab where all inbound connections come through a Cloudflare Tunnel or Traefik ingress anyway, that's not a problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verifying the Setup
&lt;/h2&gt;

&lt;p&gt;After applying the Terraform config and the manual RA fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# From a device on vlan20-srv (should have fd20::/64 address)&lt;/span&gt;
ip &lt;span class="nt"&gt;-6&lt;/span&gt; addr show
&lt;span class="c"&gt;# Should see: fd20::xxx/64&lt;/span&gt;

&lt;span class="c"&gt;# Test outbound IPv6&lt;/span&gt;
ping6 &lt;span class="nt"&gt;-c&lt;/span&gt; 3 ipv6.google.com
&lt;span class="c"&gt;# Should succeed (NAT66 masquerades the ULA source to the GUA)&lt;/span&gt;

&lt;span class="c"&gt;# From a FritzBox WiFi device&lt;/span&gt;
ip &lt;span class="nt"&gt;-6&lt;/span&gt; route show
&lt;span class="c"&gt;# Default via should point to FritzBox, not MikroTik&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If WiFi clients still route through MikroTik after setting &lt;code&gt;advertise=no&lt;/code&gt;, run &lt;code&gt;/ipv6/nd print&lt;/code&gt; on the RouterOS terminal to verify the change persisted. RouterOS can be slow to propagate ND configuration changes.&lt;/p&gt;




&lt;p&gt;The same ULA-vs-GUA stability trade-off shows up in Azure networking — except there it's RFC1918 address space behind NAT Gateway or Azure Firewall instead of a CGN ISP. If you're designing the equivalent zero-trust network layer for Azure, the same default-deny-plus-explicit-allow philosophy applies.&lt;/p&gt;



</description>
      <category>mikrotik</category>
      <category>networking</category>
      <category>homelab</category>
    </item>
    <item>
      <title>npm Dependencies: How to Evaluate a Library Before Shipping It to Production</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:02:41 +0000</pubDate>
      <link>https://dev.to/jtorchia/npm-dependencies-how-to-evaluate-a-library-before-shipping-it-to-production-3bo3</link>
      <guid>https://dev.to/jtorchia/npm-dependencies-how-to-evaluate-a-library-before-shipping-it-to-production-3bo3</guid>
      <description>&lt;h1&gt;
  
  
  npm Dependencies: How to Evaluate a Library Before Shipping It to Production
&lt;/h1&gt;

&lt;p&gt;Back in 2005, when I was 16 and managing the network at a cyber café, I learned something no manual ever taught me: every cable you plugged in was debt. If the vendor for that cable disappeared or changed the connector, the problem was yours. Not the vendor's, not the customer's. Yours. Today, when I look at a &lt;code&gt;package.json&lt;/code&gt; with 180 direct dependencies in a TypeScript project, I think exactly the same thing. Every entry in that file is a cable someone is going to have to maintain. And in most cases, that someone is you.&lt;/p&gt;

&lt;p&gt;My take is direct: &lt;strong&gt;adding an npm dependency isn't just installing code — it's assuming its maintenance, its CVE history, its transitive dependencies, and the exit cost when the library gets abandoned&lt;/strong&gt;. The question isn't "does it work?" The question is "what happens when it stops working in six months?"&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Evaluating npm Dependencies Is a Maintenance Decision, Not Just a Security One
&lt;/h2&gt;

&lt;p&gt;The official npm documentation defines a package as "a file or directory described by a &lt;code&gt;package.json&lt;/code&gt;" (&lt;a href="https://docs.npmjs.com/about-packages-and-modules" rel="noopener noreferrer"&gt;npm docs&lt;/a&gt;). That's all npm guarantees as a platform: that the file exists and has metadata. Nothing about whether the author is still active, whether it has tests, whether the types are correct, or whether you'll be able to upgrade in two years without breaking half the system.&lt;/p&gt;

&lt;p&gt;What the official docs don't say — and where people get burned — is that a published package can freeze in time. The author might not have bandwidth, might abandon the project, or might simply never hear about a relevant CVE. And at that point, the debt is yours.&lt;/p&gt;

&lt;p&gt;There are three dimensions that matter before installing anything in a TypeScript project with pnpm:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Active maintenance&lt;/strong&gt;: When was the last commit? Are there PRs that have gone unanswered for months? Any releases in the past year?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attack surface and types&lt;/strong&gt;: Does the package ship its own types, or does it depend on a separate &lt;code&gt;@types/&lt;/code&gt; package? How many transitive dependencies does it drag in?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exit cost&lt;/strong&gt;: If you need to rip it out tomorrow, how much of your own code changes?&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  How to Audit a Dependency Before &lt;code&gt;pnpm add&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The usual recipe is: search on npm, check if it has GitHub stars, install it, done. The problem is that this measures popularity, not quality or longevity. Popularity and active maintenance are not the same thing.&lt;/p&gt;

&lt;p&gt;Here's the process I use, step by step and fully reproducible:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Check the Real State of the Repository
&lt;/h3&gt;

&lt;p&gt;Before installing, open the repo on GitHub and look at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Last commit on &lt;code&gt;main&lt;/code&gt;&lt;/strong&gt;: if it's been more than 12 months with no activity and it's not a stable utility library (like &lt;code&gt;lodash&lt;/code&gt;), that's a signal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open issues&lt;/strong&gt;: Are there bugs sitting unanswered for months? CVEs mentioned but not patched?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CHANGELOG or releases&lt;/strong&gt;: a serious project has a version history. If it doesn't, the risk surface goes up.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Analyze Transitive Dependencies with &lt;code&gt;pnpm why&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install in an isolated test project&lt;/span&gt;
pnpm add &amp;lt;package-name&amp;gt;

&lt;span class="c"&gt;# See which packages depend on it (why it is in the tree)&lt;/span&gt;
pnpm why &amp;lt;package-name&amp;gt;

&lt;span class="c"&gt;# See what it brought along: the dependency tree, three levels deep&lt;/span&gt;
pnpm list &lt;span class="nt"&gt;--depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A dependency that looks small can drag in 40 transitive packages. That's not automatically bad — but if two of those 40 have active CVEs, the problem is yours even if your own code never calls them directly.&lt;/p&gt;
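
&lt;p&gt;To put a number on "what it brought along", a small script can walk a dependency tree and count the unique packages below the root. The sketch below assumes a hypothetical &lt;code&gt;DepNode&lt;/code&gt; shape with nested arrays; it is not the output format of pnpm or any other tool, so real output would need to be mapped into it first.&lt;/p&gt;

```typescript
// Hypothetical, simplified tree node. Field names are assumptions made for
// this sketch and do not mirror the JSON output of any specific tool.
interface DepNode {
  name: string;
  version: string;
  dependencies?: DepNode[];
}

// Returns every unique name@version below the root, sorted alphabetically.
// A package reachable through several parents is counted once.
function listTransitive(root: DepNode): string[] {
  const seen: { [id: string]: boolean } = {};
  const visit = (node: DepNode): void => {
    for (const child of node.dependencies ?? []) {
      const id = `${child.name}@${child.version}`;
      if (seen[id]) {
        continue; // already counted; also guards against cycles
      }
      seen[id] = true;
      visit(child);
    }
  };
  visit(root);
  return Object.keys(seen).sort();
}

const tree: DepNode = {
  name: "small-lib",
  version: "1.0.0",
  dependencies: [
    { name: "a", version: "1.0.0", dependencies: [{ name: "c", version: "2.0.0" }] },
    { name: "b", version: "1.0.0", dependencies: [{ name: "c", version: "2.0.0" }] },
  ],
};

console.log(listTransitive(tree)); // [ 'a@1.0.0', 'b@1.0.0', 'c@2.0.0' ]
```

&lt;p&gt;The length of the returned list is the figure to compare against your own threshold for "too many".&lt;/p&gt;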

&lt;h3&gt;
  
  
  3. Run a Security Audit From the Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Basic audit with npm (needs a package-lock.json; in a pnpm project, run pnpm audit instead)&lt;/span&gt;
npm audit

&lt;span class="c"&gt;# To see only critical and high vulnerabilities&lt;/span&gt;
npm audit &lt;span class="nt"&gt;--audit-level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;high

&lt;span class="c"&gt;# If you want the JSON to process it&lt;/span&gt;
npm audit &lt;span class="nt"&gt;--json&lt;/span&gt; | jq &lt;span class="s1"&gt;'.vulnerabilities | to_entries[] | select(.value.severity == "critical")'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;npm audit&lt;/code&gt; uses the &lt;a href="https://github.com/advisories" rel="noopener noreferrer"&gt;GitHub Advisory Database&lt;/a&gt; to cross-reference installed versions against known CVEs. It's not infallible (some vulnerabilities don't have an advisory yet), but it's the minimum reasonable floor before committing to a dependency.&lt;/p&gt;
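
&lt;p&gt;If the report is processed in a script rather than with &lt;code&gt;jq&lt;/code&gt;, the same filter might look like the sketch below. It assumes the report has a top-level &lt;code&gt;vulnerabilities&lt;/code&gt; object keyed by package name with a &lt;code&gt;severity&lt;/code&gt; field, which is the shape the &lt;code&gt;jq&lt;/code&gt; filter above relies on; the &lt;code&gt;AuditReport&lt;/code&gt; type, the function name and the sample package names are invented for the example.&lt;/p&gt;

```typescript
type Severity = "info" | "low" | "moderate" | "high" | "critical";

// Only the fields this sketch reads; a real report carries more.
interface AuditReport {
  vulnerabilities: { [packageName: string]: { severity: Severity } };
}

// Ordered from least to most severe, so the index doubles as a rank.
const severityRank: Severity[] = ["info", "low", "moderate", "high", "critical"];

// Names of packages whose severity is at or above the given minimum, sorted.
function packagesAtOrAbove(report: AuditReport, minimum: Severity): string[] {
  const threshold = severityRank.indexOf(minimum);
  return Object.keys(report.vulnerabilities)
    .filter((name) => severityRank.indexOf(report.vulnerabilities[name].severity) >= threshold)
    .sort();
}

// Made-up sample data, not real advisories.
const sample: AuditReport = {
  vulnerabilities: {
    "tiny-helper": { severity: "low" },
    "parser-x": { severity: "critical" },
    "http-y": { severity: "high" },
  },
};

console.log(packagesAtOrAbove(sample, "high")); // [ 'http-y', 'parser-x' ]
```

&lt;p&gt;In practice the object would come from parsing the JSON that the audit command prints, for example with &lt;code&gt;JSON.parse&lt;/code&gt;.&lt;/p&gt;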

&lt;h3&gt;
  
  
  4. Verify TypeScript Types
&lt;/h3&gt;

&lt;p&gt;In a TypeScript project, a dependency without types is guaranteed friction. Check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Does the package ship its own types?&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;node_modules/&amp;lt;package&amp;gt;/package.json | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'"types"'&lt;/span&gt;

&lt;span class="c"&gt;# Are @types/ available?&lt;/span&gt;
npm info @types/&amp;lt;package&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the package doesn't ship its own types and the &lt;code&gt;@types/&lt;/code&gt; are community-maintained (not by the original author), you have two separate sources of drift. When the package updates and &lt;code&gt;@types/&lt;/code&gt; doesn't, the compiler fails in ways that aren't obvious.&lt;/p&gt;
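
&lt;p&gt;The same check can be scripted. The sketch below looks only at the &lt;code&gt;types&lt;/code&gt; and &lt;code&gt;typings&lt;/code&gt; fields of a manifest, so it is a rough heuristic: a package can also expose declarations in other ways (for example an &lt;code&gt;index.d.ts&lt;/code&gt; next to its entry point), and those cases would be reported as not bundled here. The &lt;code&gt;Manifest&lt;/code&gt; type and both function names are assumptions for the example.&lt;/p&gt;

```typescript
// Only the manifest fields this sketch looks at.
interface Manifest {
  name: string;
  types?: string;
  typings?: string; // older alias for "types"
}

// True when the manifest declares a type entry point of its own.
function declaresOwnTypes(manifest: Manifest): boolean {
  return Boolean(manifest.types || manifest.typings);
}

// Name of the community typings package to look up when types are not bundled.
// Scoped packages map "@scope/name" to "@types/scope__name".
function definitelyTypedName(packageName: string): string {
  if (packageName.charAt(0) === "@") {
    return "@types/" + packageName.slice(1).replace("/", "__");
  }
  return "@types/" + packageName;
}

console.log(declaresOwnTypes({ name: "typed-lib", types: "dist/index.d.ts" })); // true
console.log(declaresOwnTypes({ name: "untyped-lib" }));                         // false
console.log(definitelyTypedName("lodash"));      // @types/lodash
console.log(definitelyTypedName("@babel/core")); // @types/babel__core
```

&lt;p&gt;When &lt;code&gt;declaresOwnTypes&lt;/code&gt; returns false, the name from &lt;code&gt;definitelyTypedName&lt;/code&gt; is what you would pass to &lt;code&gt;npm info&lt;/code&gt;.&lt;/p&gt;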

&lt;h3&gt;
  
  
  5. Evaluate Exit Cost With Your Own Interface
&lt;/h3&gt;

&lt;p&gt;This is the step that gets skipped the most. The question isn't just "does it work today?" but "how much code do I change if I rip it out tomorrow?"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Pattern that reduces exit cost:&lt;/span&gt;
&lt;span class="c1"&gt;// Wrap the dependency behind your own interface&lt;/span&gt;

&lt;span class="c1"&gt;// ❌ Using the dependency directly throughout the codebase&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;parse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;some-date-lib&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2025-01-15&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// ✅ Abstracting behind your own module&lt;/span&gt;
&lt;span class="c1"&gt;// lib/dates.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;parse&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;_parse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;some-date-lib&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parseDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// single entry point&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the library is spread across 40 different files with no abstraction, removing it costs a major refactor. If it lives in one module, removing it costs an internal implementation swap.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Most Common Mistakes When Evaluating npm Dependencies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mistake 1: Confusing Weekly Downloads With Stability
&lt;/h3&gt;

&lt;p&gt;npm download numbers include automatic mirrors, CIs, and pipelines. A library with 2M weekly downloads can have a maintainer who hasn't merged a PR in a year. Downloads are a lagging indicator of past popularity, not a guarantee of future support.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 2: Ignoring &lt;code&gt;devDependencies&lt;/code&gt; in Projects With Build Steps
&lt;/h3&gt;

&lt;p&gt;If a vulnerable &lt;code&gt;devDependency&lt;/code&gt; is involved in the build (babel, webpack, esbuild, tsx), the code it generates can be compromised. The &lt;code&gt;devDependencies&lt;/code&gt; field in &lt;code&gt;package.json&lt;/code&gt; separates intent, not risk. If it goes through the compiler, it matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 3: Not Looking at &lt;code&gt;peerDependencies&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check what versions of React/Node the lib expects&lt;/span&gt;
npm info &amp;lt;package&amp;gt; peerDependencies
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A library that asks for React 17 as a peer in a React 19 project might work — or it might produce silent bugs from duplicate context. Peer conflicts are one of the most common hidden costs in stack upgrades.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 4: Assuming a Small Package Is Safe
&lt;/h3&gt;

&lt;p&gt;Attack surface isn't proportional to size. The &lt;code&gt;event-stream&lt;/code&gt; incident in 2018 showed that a small utility package, transferred to a new maintainer, can become an attack vector. Small doesn't mean harmless. (Source: &lt;a href="https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident" rel="noopener noreferrer"&gt;npm blog on the incident&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;This kind of risk connects directly to what I wrote about &lt;a href="https://juanchi.dev/en/blog/oauth-scope-creep-vercel-incident-audit-integrations" rel="noopener noreferrer"&gt;OAuth Scope Creep&lt;/a&gt;: attack surface accumulates at the edges, not the center.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decision Matrix: Do I Add This Dependency or Not?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Green (add it)&lt;/th&gt;
&lt;th&gt;Yellow (evaluate further)&lt;/th&gt;
&lt;th&gt;Red (avoid or wrap)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Last release&lt;/td&gt;
&lt;td&gt;&amp;lt; 6 months&lt;/td&gt;
&lt;td&gt;6–18 months&lt;/td&gt;
&lt;td&gt;&amp;gt; 18 months with no activity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript types&lt;/td&gt;
&lt;td&gt;Bundled in the package&lt;/td&gt;
&lt;td&gt;Active &lt;code&gt;@types/&lt;/code&gt; aligned with the package&lt;/td&gt;
&lt;td&gt;No types or outdated &lt;code&gt;@types/&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active CVEs&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Low severity, no public exploit&lt;/td&gt;
&lt;td&gt;Critical or high with no patch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transitive deps&lt;/td&gt;
&lt;td&gt;&amp;lt; 10&lt;/td&gt;
&lt;td&gt;10–40&lt;/td&gt;
&lt;td&gt;&amp;gt; 40 or deps with CVEs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exit cost&lt;/td&gt;
&lt;td&gt;Easy to wrap&lt;/td&gt;
&lt;td&gt;Moderate coupling&lt;/td&gt;
&lt;td&gt;Invasive across multiple modules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active maintainer&lt;/td&gt;
&lt;td&gt;Responds to issues/PRs&lt;/td&gt;
&lt;td&gt;Slow but responds&lt;/td&gt;
&lt;td&gt;No visible activity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If a dependency lands in "Red" on more than two criteria, the right question is: do I actually need this abstraction, or can I implement the specific logic I need in 50 lines of my own code?&lt;/p&gt;
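
&lt;p&gt;The matrix and the "more than two reds" rule can be written down as a small helper, which makes the evaluation repeatable across a team. This is one possible encoding and not a standard tool: the &lt;code&gt;Evaluation&lt;/code&gt; shape and function names are assumptions, the release age is counted in whole calendar months, and every rating except the release age is still filled in by hand.&lt;/p&gt;

```typescript
type Rating = "green" | "yellow" | "red";

// One rating per row of the decision matrix.
interface Evaluation {
  lastRelease: Rating;
  types: Rating;
  activeCves: Rating;
  transitiveDeps: Rating;
  exitCost: Rating;
  maintainer: Rating;
}

// Rates the "Last release" row: under 6 months green, 6 to 18 yellow, over 18 red.
// Counts whole calendar months in UTC and ignores the day of the month.
function rateLastRelease(lastRelease: Date, now: Date): Rating {
  const months =
    (now.getUTCFullYear() - lastRelease.getUTCFullYear()) * 12 +
    (now.getUTCMonth() - lastRelease.getUTCMonth());
  if (months > 18) return "red";
  if (months >= 6) return "yellow";
  return "green";
}

function countReds(evaluation: Evaluation): number {
  const ratings: Rating[] = [
    evaluation.lastRelease,
    evaluation.types,
    evaluation.activeCves,
    evaluation.transitiveDeps,
    evaluation.exitCost,
    evaluation.maintainer,
  ];
  return ratings.filter((rating) => rating === "red").length;
}

// "Red on more than two criteria" is the point where writing it yourself is worth asking about.
function shouldReconsider(evaluation: Evaluation): boolean {
  return countReds(evaluation) > 2;
}

const candidate: Evaluation = {
  lastRelease: rateLastRelease(new Date("2024-01-10T00:00:00Z"), new Date("2026-06-22T00:00:00Z")),
  types: "red",
  activeCves: "green",
  transitiveDeps: "red",
  exitCost: "yellow",
  maintainer: "green",
};

console.log(countReds(candidate), shouldReconsider(candidate)); // 3 true
```

&lt;p&gt;The helper only counts; it does not weigh criteria against each other, so a single critical unpatched CVE can still be reason enough to walk away.&lt;/p&gt;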

&lt;p&gt;When working on projects with pnpm workspaces — like I described in the post about &lt;a href="https://juanchi.dev/en/blog/pnpm-workspaces-monorepo-ci-railway-real-problems" rel="noopener noreferrer"&gt;pnpm workspaces and CI on Railway&lt;/a&gt; — this evaluation matters twice as much: a problematic dependency in a shared package of the monorepo gets inherited by every app in the workspace.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Checklist Can't Guarantee
&lt;/h2&gt;

&lt;p&gt;Being honest about the limits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It doesn't predict future abandonment&lt;/strong&gt;: a library with recent releases can get abandoned tomorrow. The checklist measures current state, not future state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;npm audit&lt;/code&gt; doesn't cover all vectors&lt;/strong&gt;: business logic vulnerabilities, sophisticated supply chain attacks, and unreported CVEs don't show up in a standard audit. It's the floor, not the ceiling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The real exit cost only gets measured in practice&lt;/strong&gt;: estimating the cost of removing a dependency is a heuristic. Until you actually do it, it's a projection. If the project already has the dependency deeply integrated, the retrospective evaluation is more expensive than the prospective one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub metrics are indicators, not proof&lt;/strong&gt;: an archived repository can be stable because it reached feature-complete. A repo with tons of commits can be unstable due to constant refactoring. Context matters.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  FAQ: Evaluating npm Dependencies in TypeScript Projects
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How many direct dependencies is "too many" in a TypeScript project?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's no universal number. A warning sign, though, is having more than 50–60 direct dependencies without having actively evaluated which ones could be replaced by your own implementations. The criterion isn't the count; it's whether every entry in &lt;code&gt;dependencies&lt;/code&gt; has a clear reason that can't be solved in fewer than 100 lines of your own code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does &lt;code&gt;pnpm&lt;/code&gt; have security advantages over &lt;code&gt;npm&lt;/code&gt; or &lt;code&gt;yarn&lt;/code&gt; for this kind of audit?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pnpm has a different storage model (content-addressable store) that avoids duplication and makes the dependency tree more predictable. For CVE audits, though, the lockfile matters: &lt;code&gt;npm audit&lt;/code&gt; expects npm's own lockfile, while &lt;code&gt;pnpm audit&lt;/code&gt; runs the equivalent check against &lt;code&gt;pnpm-lock.yaml&lt;/code&gt;. pnpm's advantage in this context is more about tree predictability than intrinsic security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What do I do if a dependency has a CVE but no fix is available?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First, evaluate whether the CVE applies to how you're using it. Many CVEs have specific exploitation conditions that may not apply to your context. If it does apply, look for a fork with the fix, replace the dependency, or implement the minimum necessary functionality yourself. Keeping the vulnerable dependency and "noting it for later" is the most comfortable path and the most expensive one in the medium term.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does it make sense to evaluate &lt;code&gt;devDependencies&lt;/code&gt; with the same rigor?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Less rigor, but not zero. Build tools, linters, and compilers that go through the CI pipeline deserve a basic review. A &lt;code&gt;devDependency&lt;/code&gt; that's only used on a local machine has less urgency than one that participates in generating the artifact going to production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I evaluate a dependency when it has no visible public repository?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If an npm package has no public repository linked and has existed for more than a couple of months, the default answer is: don't install it in a serious project. The absence of a public source doesn't imply malice, but it eliminates the possibility of a code audit. Without visible source, the analysis is limited to what the package declares in its &lt;code&gt;package.json&lt;/code&gt;, which is incomplete information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does this affect maintaining a monorepo with multiple apps?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A problematic dependency in a &lt;code&gt;shared/&lt;/code&gt; package of the monorepo propagates automatically to all consumers. That makes upfront evaluation more important, not less. The cost of a CVE or a breaking change in a shared dependency multiplies by the number of apps in the workspace. It's worth spending more time on shared package dependencies than on ones specific to a single app.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Take and the Next Concrete Step
&lt;/h2&gt;

&lt;p&gt;Adding an npm dependency is a technical decision with consequences that extend far beyond the current sprint. I'm not saying you should avoid libraries — that would be absurd in an ecosystem where composition is the model. I'm saying the upfront evaluation costs half an hour and can prevent weeks of maintenance debt.&lt;/p&gt;

&lt;p&gt;What I don't buy is the idea that a package's popularity is sufficient evidence to install it without further analysis. GitHub stars don't pay the cost of a migration when the library goes unsupported.&lt;/p&gt;

&lt;p&gt;What I do buy: implementing your own logic when the dependency alternative drags in 30 transitives, has no types, or has a maintainer who hasn't responded in a year. In those cases, 80 well-tested lines of your own code are a more honest investment than delegating to a package you can't control.&lt;/p&gt;

&lt;p&gt;The next concrete step: open the &lt;code&gt;package.json&lt;/code&gt; of the most active project you're currently working on. Pick the five dependencies you know the least about. Run &lt;code&gt;pnpm why &amp;lt;package&amp;gt;&lt;/code&gt; on each one and look at the repository on GitHub. In at least one of them, you'll find something that deserves a conversation about whether it's still worth keeping.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Original sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;npm package documentation: &lt;a href="https://docs.npmjs.com/about-packages-and-modules" rel="noopener noreferrer"&gt;https://docs.npmjs.com/about-packages-and-modules&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;npm blog — event-stream incident (2018): &lt;a href="https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident" rel="noopener noreferrer"&gt;https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/en/blog/npm-dependencies-evaluate-library-before-production" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>english</category>
      <category>typescript</category>
      <category>pnpm</category>
      <category>npm</category>
    </item>
    <item>
      <title>npm dependencies: how to evaluate a library before putting it into production</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:02:37 +0000</pubDate>
      <link>https://dev.to/jtorchia/dependencias-npm-como-evaluar-una-libreria-antes-de-meterla-en-produccion-55e</link>
      <guid>https://dev.to/jtorchia/dependencias-npm-como-evaluar-una-libreria-antes-de-meterla-en-produccion-55e</guid>
      <description>&lt;h1&gt;
  
  
  npm dependencies: how to evaluate a library before putting it into production
&lt;/h1&gt;

&lt;p&gt;In 2005, when I was administering networks at an internet café at 16, I learned something that wasn't in any manual: every cable you plugged in was debt. If the supplier of that cable disappeared or changed the connector, the problem was yours. Not the supplier's, not the customer's. Yours. Today, when I look at a &lt;code&gt;package.json&lt;/code&gt; with 180 direct dependencies in a TypeScript project, I think exactly the same thing. Every entry in that file is a cable that someone will have to maintain. And in most cases, that someone is you.&lt;/p&gt;

&lt;p&gt;My thesis is direct: &lt;strong&gt;adding an npm dependency isn't just installing code; it means taking on its maintenance, its CVE history, its transitive dependencies, and the exit cost when the library is abandoned&lt;/strong&gt;. The question isn't "does it work?". The question is "what happens when it stops working in six months?".&lt;/p&gt;




&lt;h2&gt;
  
  
  Why evaluating npm dependencies is a maintenance decision, not just a security one
&lt;/h2&gt;

&lt;p&gt;The official npm documentation defines a package as "a file or directory that is described by a &lt;code&gt;package.json&lt;/code&gt; file" (&lt;a href="https://docs.npmjs.com/about-packages-and-modules" rel="noopener noreferrer"&gt;npm docs&lt;/a&gt;). That is all npm guarantees as a platform: that the file exists and has metadata. Nothing about whether the author is still active, whether it has tests, whether the types are correct, or whether you'll be able to upgrade in two years without breaking half the system.&lt;/p&gt;

&lt;p&gt;What the official docs don't say, and where people get burned, is that a published package can stay frozen in time. The author may not have time, may abandon the project, or may simply never hear about a relevant CVE. And at that moment, the debt is yours.&lt;/p&gt;

&lt;p&gt;There are three dimensions that matter before installing anything in a TypeScript project with pnpm:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Active maintenance&lt;/strong&gt;: When was the last commit? Are there open PRs that have gone unanswered for months? Has it had releases in the last year?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attack surface and types&lt;/strong&gt;: Does the package ship its own types, or does it depend on a separate &lt;code&gt;@types/&lt;/code&gt; package? How many transitive dependencies does it drag in?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exit cost&lt;/strong&gt;: If you need to remove it tomorrow, how much of your own code do you change?&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  How to audit a dependency before &lt;code&gt;pnpm add&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The usual recipe is: search npm, check whether it has GitHub stars, install it, done. The problem is that this measures popularity, not quality or longevity. Popularity and active maintenance are not the same thing.&lt;/p&gt;

&lt;p&gt;Here is the process I use, step by step and reproducible:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Check the real state of the repository
&lt;/h3&gt;

&lt;p&gt;Before installing, open the repo on GitHub and look at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Last commit on &lt;code&gt;main&lt;/code&gt;&lt;/strong&gt;: if it has gone more than 12 months without activity and isn't a stable utility library (like &lt;code&gt;lodash&lt;/code&gt;), that's a warning sign.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open issues&lt;/strong&gt;: Are there bugs that have gone unanswered for months? CVEs mentioned and left unpatched?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CHANGELOG or releases&lt;/strong&gt;: a serious project has a version history. If it doesn't, the risk goes up.&lt;/li&gt;
&lt;/ul&gt;
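&lt;p&gt;The first check in the list can be turned into a rough triage helper. This is only a sketch: the &lt;code&gt;RepoSnapshot&lt;/code&gt; shape and the &lt;code&gt;isStale&lt;/code&gt; name are hypothetical, and the only rule it encodes is the 12-month threshold with the stable-utility exception mentioned above.&lt;/p&gt;

```typescript
// Hypothetical snapshot of what you read off the repository page by hand.
interface RepoSnapshot {
  lastCommit: Date;        // date of the last commit on main
  stableUtility: boolean;  // true for "finished" utility libraries (lodash-style)
}

// Whole calendar months between two dates (ignores the day of the month).
function monthsBetween(from: Date, to: Date): number {
  return (to.getFullYear() - from.getFullYear()) * 12 + (to.getMonth() - from.getMonth());
}

// Flags a repo as a warning sign when it has been inactive for more than
// 12 months and is not a stable utility library.
function isStale(repo: RepoSnapshot, today: Date): boolean {
  if (repo.stableUtility) return false;
  return monthsBetween(repo.lastCommit, today) > 12;
}

const checkedOn = new Date(2026, 5, 22);
console.log(isStale({ lastCommit: new Date(2024, 0, 10), stableUtility: false }, checkedOn)); // true
console.log(isStale({ lastCommit: new Date(2026, 1, 1), stableUtility: false }, checkedOn));  // false
```

&lt;p&gt;A result of &lt;code&gt;true&lt;/code&gt; is a prompt to look closer, not a verdict; open issues and release history still need a human read.&lt;/p&gt;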

&lt;h3&gt;
  
  
  2. Analyze the transitive dependencies with &lt;code&gt;pnpm why&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install it in an isolated test project&lt;/span&gt;
pnpm add &amp;lt;package-name&amp;gt;

&lt;span class="c"&gt;# See why it is installed, i.e. which packages depend on it&lt;/span&gt;
pnpm why &amp;lt;package-name&amp;gt;

&lt;span class="c"&gt;# Or print the dependency tree to see what it pulled in&lt;/span&gt;
pnpm list &lt;span class="nt"&gt;--depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A dependency that looks small can drag in 40 transitive packages. That isn't automatically bad, but if two of those 40 have active CVEs, the problem is yours even if your own code never calls them directly.&lt;/p&gt;
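&lt;p&gt;To put a number on "how many packages did this pull in", you can walk the dependency tree and count distinct names. The sketch below assumes a simplified, hypothetical &lt;code&gt;DepNode&lt;/code&gt; shape; it is not the exact structure any pnpm command prints, so real output would need to be mapped into it first.&lt;/p&gt;

```typescript
// Hypothetical, simplified tree node; NOT the exact shape printed by pnpm,
// just enough structure to show the counting.
interface DepNode {
  pkg: string;
  deps: DepNode[];
}

// Counts the distinct package names reachable below the root package.
function countPulledIn(root: DepNode): number {
  const seen: { [pkg: string]: boolean } = {};
  const stack: DepNode[] = root.deps.slice();
  while (stack.length > 0) {
    const node = stack.pop() as DepNode;
    if (seen[node.pkg]) continue; // a shared dependency is counted once
    seen[node.pkg] = true;
    for (const child of node.deps) stack.push(child);
  }
  return Object.keys(seen).length;
}

const sampleTree: DepNode = {
  pkg: "small-lib",
  deps: [
    { pkg: "a", deps: [{ pkg: "c", deps: [] }] },
    { pkg: "b", deps: [{ pkg: "c", deps: [] }] },
  ],
};
console.log(countPulledIn(sampleTree)); // 3 (a, b, c)
```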

&lt;h3&gt;
  
  
  3. Run a security audit from the start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Basic audit with npm (needs an npm lockfile; in a pnpm project, pnpm audit reads pnpm-lock.yaml)&lt;/span&gt;
npm audit

&lt;span class="c"&gt;# Exit non-zero only for high or critical findings (the report itself is not filtered)&lt;/span&gt;
npm audit &lt;span class="nt"&gt;--audit-level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;high

&lt;span class="c"&gt;# If you want the JSON to process it (here: critical entries only)&lt;/span&gt;
npm audit &lt;span class="nt"&gt;--json&lt;/span&gt; | jq &lt;span class="s1"&gt;'.vulnerabilities | to_entries[] | select(.value.severity == "critical")'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;npm audit&lt;/code&gt; uses the &lt;a href="https://github.com/advisories" rel="noopener noreferrer"&gt;GitHub Advisory Database&lt;/a&gt; to check installed versions against known vulnerabilities. It isn't infallible (some vulnerabilities don't have an advisory yet), but it's the reasonable minimum before committing to a dependency.&lt;/p&gt;
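&lt;p&gt;If you would rather filter the audit JSON in a script than in &lt;code&gt;jq&lt;/code&gt;, the same selection can be sketched in TypeScript. The &lt;code&gt;AuditReport&lt;/code&gt; type below only models the two fields the &lt;code&gt;jq&lt;/code&gt; filter above relies on (&lt;code&gt;vulnerabilities&lt;/code&gt; and &lt;code&gt;severity&lt;/code&gt;); the function name and the sample data are made up for illustration.&lt;/p&gt;

```typescript
// Minimal slice of the audit report: only the fields the jq filter relies on.
interface AuditReport {
  vulnerabilities: { [pkg: string]: { severity: string } };
}

// Returns the package names whose severity is in the `wanted` list, sorted.
function packagesBySeverity(report: AuditReport, wanted: string[]): string[] {
  return Object.keys(report.vulnerabilities)
    .filter((pkg) => wanted.indexOf(report.vulnerabilities[pkg].severity) !== -1)
    .sort();
}

// Hand-written sample; in practice this would come from JSON.parse on the
// output of `npm audit --json`.
const sampleReport: AuditReport = {
  vulnerabilities: {
    "left-pad-ish": { severity: "low" },
    "parser-x": { severity: "critical" },
    "http-y": { severity: "high" },
  },
};
console.log(packagesBySeverity(sampleReport, ["critical", "high"]).join(", ")); // http-y, parser-x
```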

&lt;h3&gt;
  
  
  4. Verify the TypeScript types
&lt;/h3&gt;

&lt;p&gt;In a TypeScript project, a dependency without types is guaranteed friction. Check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Does the package ship its own types?&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;node_modules/&amp;lt;package&amp;gt;/package.json | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'"types"'&lt;/span&gt;

&lt;span class="c"&gt;# Are @types/ definitions available?&lt;/span&gt;
npm info @types/&amp;lt;package&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the package has no types of its own and the &lt;code&gt;@types/&lt;/code&gt; definitions are maintained by the community (not the original author), you have two separate sources that can drift apart. When the package updates and &lt;code&gt;@types/&lt;/code&gt; doesn't, the compiler fails in ways that aren't obvious.&lt;/p&gt;
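&lt;p&gt;The &lt;code&gt;grep&lt;/code&gt; check can also be expressed over a parsed &lt;code&gt;package.json&lt;/code&gt;. This sketch looks at the &lt;code&gt;types&lt;/code&gt; field and its older alias &lt;code&gt;typings&lt;/code&gt;; it does not inspect type entries declared through &lt;code&gt;exports&lt;/code&gt;, and the &lt;code&gt;typeSource&lt;/code&gt; name and its return labels are hypothetical.&lt;/p&gt;

```typescript
// The subset of package.json fields that matter for this check.
interface PackageManifest {
  types?: string;
  typings?: string; // older alias for "types"
}

type TypeSource = "bundled" | "needs-@types-or-none";

// Classifies a manifest: does the package declare its own type entry point?
function typeSource(manifest: PackageManifest): TypeSource {
  return manifest.types || manifest.typings ? "bundled" : "needs-@types-or-none";
}

console.log(typeSource({ types: "dist/index.d.ts" })); // bundled
console.log(typeSource({}));                            // needs-@types-or-none
```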

&lt;h3&gt;
  
  
  5. Estimate the exit cost with an interface of your own
&lt;/h3&gt;

&lt;p&gt;This is the step most often skipped. The question isn't just "does it work today?" but "how much code do I change if I remove it tomorrow?".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Pattern that reduces the exit cost:&lt;/span&gt;
&lt;span class="c1"&gt;// wrap the dependency behind an interface of your own&lt;/span&gt;

&lt;span class="c1"&gt;// ❌ Using the dependency directly across the whole codebase&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;parse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;alguna-lib-de-fechas&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fecha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2025-01-15&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// ✅ Abstract it behind a module of your own&lt;/span&gt;
&lt;span class="c1"&gt;// lib/fechas.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;parse&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;_parse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;alguna-lib-de-fechas&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parsearFecha&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// a single entry point&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the library is spread across 40 files with no abstraction, removing it means a major refactor. If it lives in a module of your own, removing it means swapping one internal implementation.&lt;/p&gt;
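&lt;p&gt;To make that swap concrete, here is what &lt;code&gt;lib/fechas.ts&lt;/code&gt; might look like after the library is gone. This is a sketch under one assumption: callers only ever pass &lt;code&gt;YYYY-MM-DD&lt;/code&gt; strings. The signature of &lt;code&gt;parsearFecha&lt;/code&gt; stays the same, so no call site changes; in the real module the function would still be exported.&lt;/p&gt;

```typescript
// lib/fechas.ts after removing the library: same signature, different internals.
// Assumes inputs are always "YYYY-MM-DD"; day and month ranges are not validated.
function parsearFecha(input: string): Date {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(input);
  if (match === null) {
    throw new Error("Unsupported date format: " + input);
  }
  // Build the date in UTC so the result does not depend on the local timezone.
  return new Date(Date.UTC(Number(match[1]), Number(match[2]) - 1, Number(match[3])));
}

console.log(parsearFecha("2025-01-15").toISOString()); // 2025-01-15T00:00:00.000Z
```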




&lt;h2&gt;
  
  
  The most common mistakes when evaluating npm dependencies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mistake 1: Confusing weekly downloads with stability
&lt;/h3&gt;

&lt;p&gt;npm download counts include automated mirrors, CI runs, and pipelines. A library with 2M weekly downloads can have a maintainer who hasn't merged a PR in a year. Downloads are a lagging indicator of past popularity, not a guarantee of future support.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 2: Ignoring &lt;code&gt;devDependencies&lt;/code&gt; in projects with build steps
&lt;/h3&gt;

&lt;p&gt;If a compromised &lt;code&gt;devDependency&lt;/code&gt; takes part in the build (babel, webpack, esbuild, tsx), the code it generates may be compromised too. The &lt;code&gt;devDependencies&lt;/code&gt; field in &lt;code&gt;package.json&lt;/code&gt; separates intent, not risk. If it goes through the compiler, it matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 3: Not checking &lt;code&gt;peerDependencies&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# See which peer versions (React, for example) the lib expects&lt;/span&gt;
npm info &amp;lt;package&amp;gt; peerDependencies
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A library that asks for React 17 as a peer in a React 19 project may work, or it may cause silent bugs from duplicated context. Peer conflicts are one of the most frequent hidden costs of stack upgrades.&lt;/p&gt;
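&lt;p&gt;As an illustration of why a React 17 peer does not match React 19, here is a deliberately simplified range check. It only understands caret ranges such as &lt;code&gt;^17.0.0&lt;/code&gt; with a major version of 1 or higher, optionally joined by &lt;code&gt;||&lt;/code&gt;; real semver ranges have many more forms, so treat the function names and behavior as a teaching sketch, not a replacement for a semver library.&lt;/p&gt;

```typescript
// Parses a plain "x.y.z" version into numbers (no pre-release tags).
function parseVersion(version: string): number[] {
  return version.trim().split(".").map(Number);
}

// True when version a is greater than or equal to version b.
function atLeast(a: number[], b: number[]): boolean {
  for (let i = 0; i !== 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return true;
}

// Simplified: a caret range "^X.Y.Z" (X of 1 or more) allows versions with the
// same major that are at least X.Y.Z. Alternatives are separated by "||".
function caretRangeAllows(range: string, installed: string): boolean {
  const have = parseVersion(installed);
  return range.split("||").some((part) => {
    const trimmed = part.trim();
    if (trimmed.charAt(0) !== "^") return false; // other range forms are not handled
    const base = parseVersion(trimmed.slice(1));
    return have[0] === base[0] ? atLeast(have, base) : false;
  });
}

console.log(caretRangeAllows("^17.0.0", "19.0.0"));            // false
console.log(caretRangeAllows("^17.0.0 || ^19.0.0", "19.0.0")); // true
```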

&lt;h3&gt;
  
  
  Mistake 4: Assuming a small package is safe
&lt;/h3&gt;

&lt;p&gt;Attack surface isn't proportional to size. The &lt;code&gt;event-stream&lt;/code&gt; incident in 2018 showed that a small utility package, handed over to a new maintainer, can become an attack vector. Being small doesn't make it harmless. (Source: &lt;a href="https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident" rel="noopener noreferrer"&gt;npm blog post on the incident&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;This kind of risk connects directly to what I wrote about &lt;a href="https://juanchi.dev/es/blog/oauth-scope-creep-auditoria-integraciones-terceros-seguridad" rel="noopener noreferrer"&gt;OAuth Scope Creep&lt;/a&gt;: attack surface accumulates at the edges, not at the center.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decision matrix: do I add this dependency or not?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Green (add)&lt;/th&gt;
&lt;th&gt;Yellow (evaluate further)&lt;/th&gt;
&lt;th&gt;Red (avoid or wrap)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Last release&lt;/td&gt;
&lt;td&gt;&amp;lt; 6 months&lt;/td&gt;
&lt;td&gt;6-18 months&lt;/td&gt;
&lt;td&gt;&amp;gt; 18 months without activity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript types&lt;/td&gt;
&lt;td&gt;Included in the package&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@types/&lt;/code&gt; active and in sync&lt;/td&gt;
&lt;td&gt;No types, or outdated &lt;code&gt;@types/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active CVEs&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Low severity, no public exploit&lt;/td&gt;
&lt;td&gt;Critical or high, unpatched&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transitive deps&lt;/td&gt;
&lt;td&gt;&amp;lt; 10&lt;/td&gt;
&lt;td&gt;10-40&lt;/td&gt;
&lt;td&gt;&amp;gt; 40, or deps with CVEs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exit cost&lt;/td&gt;
&lt;td&gt;Easy to wrap&lt;/td&gt;
&lt;td&gt;Moderate coupling&lt;/td&gt;
&lt;td&gt;Invasive across multiple modules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active maintainer&lt;/td&gt;
&lt;td&gt;Responds to issues/PRs&lt;/td&gt;
&lt;td&gt;Slow but responsive&lt;/td&gt;
&lt;td&gt;No visible activity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If a dependency lands in "Red" on more than two criteria, the right question is: do I really need this abstraction, or can I implement the specific logic I need in 50 lines of my own?&lt;/p&gt;
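&lt;p&gt;The "more than two reds" rule is easy to encode. The sketch below covers only the rows with clear thresholds (last release, transitive dependencies, types); the function names are hypothetical, and the remaining rows would be rated by hand and passed in as plain ratings.&lt;/p&gt;

```typescript
type Rating = "green" | "yellow" | "red";

// Last release: under 6 months is green, 6-18 is yellow, over 18 is red.
function rateRelease(monthsSinceRelease: number): Rating {
  if (monthsSinceRelease > 18) return "red";
  return monthsSinceRelease >= 6 ? "yellow" : "green";
}

// Transitive deps: under 10 is green, 10-40 is yellow, over 40 is red.
function rateTransitive(count: number): Rating {
  if (count > 40) return "red";
  return count >= 10 ? "yellow" : "green";
}

// Types: bundled is green, current @types/ is yellow, anything else is red.
function rateTypes(bundled: boolean, atTypesCurrent: boolean): Rating {
  if (bundled) return "green";
  return atTypesCurrent ? "yellow" : "red";
}

// "Red on more than two criteria" is the trigger for reconsidering the dependency.
function shouldReconsider(ratings: Rating[]): boolean {
  return ratings.filter((r) => r === "red").length > 2;
}

// Three computed ratings plus one judged by hand (exit cost: green).
const sampleRatings: Rating[] = [rateRelease(24), rateTransitive(55), rateTypes(false, false), "green"];
console.log(shouldReconsider(sampleRatings)); // true
```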

&lt;p&gt;When I work on projects that use pnpm workspaces, as I described in the post on &lt;a href="https://juanchi.dev/es/blog/pnpm-workspaces-monorepo-ci-railway-problemas" rel="noopener noreferrer"&gt;pnpm workspaces and CI on Railway&lt;/a&gt;, this evaluation matters twice as much: a problematic dependency in a shared package of the monorepo is inherited by every app in the workspace.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this checklist can't guarantee
&lt;/h2&gt;

&lt;p&gt;Being honest about the limits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It doesn't predict future abandonment&lt;/strong&gt;: a library with recent releases can be abandoned tomorrow. The checklist measures the current state, not the future.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;npm audit&lt;/code&gt; doesn't cover every vector&lt;/strong&gt;: business-logic vulnerabilities, sophisticated supply chain attacks, and unreported CVEs don't show up in a standard audit. It's the floor, not the ceiling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The real exit cost is only measured in practice&lt;/strong&gt;: estimating the cost of removing a dependency is a heuristic. Until you actually do it, it's a projection. If the project already has the dependency deeply integrated, evaluating after the fact costs more than evaluating up front.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub metrics are indicators, not proof&lt;/strong&gt;: an archived repository can be stable because it reached feature-complete. A repo with many commits can be unstable because of constant refactoring. Context matters.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  FAQ: Evaluating npm dependencies in TypeScript projects
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How many direct dependencies is "too many" in a TypeScript project?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's no universal number. What is a warning sign is having more than 50-60 direct dependencies without having actively evaluated which ones could be replaced by your own implementations. The criterion isn't the count but whether each entry in &lt;code&gt;dependencies&lt;/code&gt; has a clear reason that couldn't be handled in under 100 lines of your own code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does &lt;code&gt;pnpm&lt;/code&gt; have security advantages over &lt;code&gt;npm&lt;/code&gt; or &lt;code&gt;yarn&lt;/code&gt; for this kind of audit?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pnpm has a different storage model (a content-addressable store) that avoids duplication and makes the dependency tree more predictable. For CVE audits the approach is the same either way: &lt;code&gt;npm audit&lt;/code&gt; is the standard tool but needs an npm lockfile, and &lt;code&gt;pnpm audit&lt;/code&gt; runs the equivalent check against &lt;code&gt;pnpm-lock.yaml&lt;/code&gt;. pnpm's advantage in this context is more about tree predictability than intrinsic security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What do I do if a dependency has a CVE but no fix is available?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First, assess whether the CVE applies to how you use the dependency. Many CVEs have specific exploitation conditions that may not apply in your context. If it does apply, look for a fork with the fix, replace the dependency, or implement the minimum functionality you need yourself. Keeping the vulnerable dependency and "noting it for later" is the most comfortable path and the most expensive one in the medium term.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does it make sense to evaluate &lt;code&gt;devDependencies&lt;/code&gt; with the same rigor?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Less rigor, but not zero. Build tools, linters, and compilers that run in the CI pipeline deserve a basic review. A &lt;code&gt;devDependency&lt;/code&gt; used only on the local machine is less urgent than one that takes part in producing the artifact that ships to production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I evaluate a dependency when it has no visible public repository?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If an npm package has no linked public repository and has been around for more than a couple of months, the default is not to install it in a serious project. The absence of public source doesn't imply malice, but it rules out a code audit. Without visible source, the analysis is limited to what the package declares in its &lt;code&gt;package.json&lt;/code&gt;, which is incomplete information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does this affect maintaining a monorepo with multiple apps?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A problematic dependency in a &lt;code&gt;shared/&lt;/code&gt; package of the monorepo propagates to all consumers automatically. That makes the upfront evaluation more important, not less. The cost of a CVE or a breaking change in a shared dependency is multiplied by the number of apps in the workspace. It's worth spending more time on the dependencies of shared packages than on those specific to a single app.&lt;/p&gt;




&lt;h2&gt;
  
  
  My position and the next concrete step
&lt;/h2&gt;

&lt;p&gt;Adding an npm dependency is a technical decision with consequences that extend far beyond the current sprint. I'm not saying you should avoid libraries; that would be absurd in an ecosystem where composition is the model. I'm saying the upfront evaluation costs half an hour and can prevent weeks of maintenance debt.&lt;/p&gt;

&lt;p&gt;What I don't buy is the idea that a package's popularity is sufficient evidence to install it without further analysis. GitHub stars don't pay the cost of a migration when the library goes unsupported.&lt;/p&gt;

&lt;p&gt;What I do buy: implementing your own logic when the alternative is a dependency that drags in 30 transitive packages, has no types, or has a maintainer who hasn't responded in a year. In those cases, 80 well-tested lines of your own code are a more honest investment than delegating to a package you can't control.&lt;/p&gt;

&lt;p&gt;The next concrete step: open the &lt;code&gt;package.json&lt;/code&gt; of the most active project you're currently working on. Pick the five dependencies you know the least about. Run &lt;code&gt;pnpm why &amp;lt;package&amp;gt;&lt;/code&gt; on each one and look at the repository on GitHub. In at least one of them, you'll find something that deserves a conversation about whether it's still worth keeping.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Original sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;npm package documentation: &lt;a href="https://docs.npmjs.com/about-packages-and-modules" rel="noopener noreferrer"&gt;https://docs.npmjs.com/about-packages-and-modules&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;npm blog — event-stream incident (2018): &lt;a href="https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident" rel="noopener noreferrer"&gt;https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/es/blog/evaluar-dependencias-npm-seguridad-mantenimiento" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>typescript</category>
      <category>pnpm</category>
    </item>
    <item>
      <title>The AI agent memory dilemma: from "remembering" to "knowing"</title>
      <dc:creator>Manoir Yantai</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:01:48 +0000</pubDate>
      <link>https://dev.to/manoir_yantai_f22f01340f0/ai-dai-li-de-ji-yi-kun-jing-cong-ji-zhu-dao-zhi-dao--2g9n</link>
      <guid>https://dev.to/manoir_yantai_f22f01340f0/ai-dai-li-de-ji-yi-kun-jing-cong-ji-zhu-dao-zhi-dao--2g9n</guid>
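      <description>&lt;p&gt;When your AI agent can remember who you are and what you prefer, it has solved the first-layer problem: &lt;strong&gt;memory&lt;/strong&gt;. But what really gives it engineering value is the next layer: &lt;strong&gt;knowledge&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Memory is "I have seen this"; knowledge is "I can use this". That is a fundamental difference.&lt;/p&gt;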

&lt;h3&gt;
  
  
  The problem: where does an agent's knowledge come from?
&lt;/h3&gt;

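&lt;p&gt;The "memory system" in most agent frameworks does just one thing: store chat history, store user preferences, store a few key-value pairs. That is a context cache, not a knowledge base.&lt;/p&gt;

&lt;p&gt;Can your agent read a WeChat Official Account article and file it automatically? Watch a Douyin clip and extract the key information? Download a technical book and turn it into a queryable skill? Most likely not, because it only has a memory layer and no collection layer.&lt;/p&gt;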

&lt;h3&gt;
  
  
  The solution: memory + collection pipeline = a closed knowledge loop
&lt;/h3&gt;

&lt;p&gt;Split the problem into three layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Collection layer (40+ tools) → Analysis layer (AI processing) → Storage layer (three-tier memory)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



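&lt;p&gt;The collection layer pulls data from every source: web pages, videos, documents, books, RSS. The analysis layer refines it: automatic summaries, keyword extraction, fact-checking. The storage layer has three tiers: Hot (the memory tool, fetched instantly), Warm (Hindsight vector retrieval), and Cold (the gbrain knowledge graph).&lt;/p&gt;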

&lt;h3&gt;
  
  
  Hands-on code: collect an article and store it automatically
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;knowledge_collector&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;collect_web&lt;/span&gt;

&lt;span class="c1"&gt;# Collect any web page: extracts the body text and keywords, then generates a note
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;collect_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://arxiv.org/abs/2401.12345&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Three things are now in place:
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;note_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# structured Markdown note
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gbrain_slug&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# knowledge graph node ID
# OneDrive sync has already been triggered (no manual call needed)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



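&lt;p&gt;That is the simplest use case. Behind the scenes it runs: trafilatura body extraction → LLM keyword extraction → note template rendering → gbrain page creation → Hindsight embedding index → rclone push to cloud storage. Fully automatic.&lt;/p&gt;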

&lt;h3&gt;
  
  
  Collecting knowledge from video is harder, but worth more
&lt;/h3&gt;

&lt;p&gt;Video is one of the most information-dense sources. The collection flow:&lt;/p&gt;

&lt;ol&gt;
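&lt;li&gt;yt-dlp pulls the stream → Whisper ASR transcribes the audio → EasyOCR/PaddleOCR recognizes text in key frames&lt;/li&gt;
&lt;li&gt;An LLM combines the frames and subtitles into a structured summary&lt;/li&gt;
&lt;li&gt;Store the result and link it into the knowledge graph&lt;/li&gt;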
&lt;/ol&gt;

&lt;p&gt;A single line triggers the whole chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"from knowledge_collector import collect_video; collect_video('https://www.bilibili.com/video/BV1xx411c7mD')"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Three-tier recall: one piece of knowledge, three retrieval paths
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Backend&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;th&gt;Precision&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hot&lt;/td&gt;
&lt;td&gt;Hermes Memory tool&lt;/td&gt;
&lt;td&gt;Nanoseconds&lt;/td&gt;
&lt;td&gt;Exact key-value&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Warm&lt;/td&gt;
&lt;td&gt;Hindsight vector store (10K nodes)&lt;/td&gt;
&lt;td&gt;Milliseconds&lt;/td&gt;
&lt;td&gt;Semantic similarity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cold&lt;/td&gt;
&lt;td&gt;gbrain knowledge graph (11K pages)&lt;/td&gt;
&lt;td&gt;Seconds&lt;/td&gt;
&lt;td&gt;Associative reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Search falls back through three levels, FTS5 → Hindsight semantic search → gbrain knowledge graph, and a local hit never goes to the network.&lt;/p&gt;
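&lt;p&gt;The fallback order can be sketched as a loop that stops at the first tier returning any hit. This is a minimal TypeScript illustration, not the project's code: the &lt;code&gt;Tier&lt;/code&gt; type and &lt;code&gt;searchWithFallback&lt;/code&gt; are hypothetical names, and the three tiers are stubs standing in for FTS5, Hindsight, and gbrain.&lt;/p&gt;

```typescript
// A search tier takes a query and returns matching note IDs (possibly none).
type Tier = { label: string; search: (query: string) => string[] };

// Tries each tier in order and stops at the first one that returns any hit,
// so slower tiers are only consulted when the faster ones come back empty.
function searchWithFallback(tiers: Tier[], query: string): { tier: string; hits: string[] } {
  for (const t of tiers) {
    const hits = t.search(query);
    if (hits.length > 0) return { tier: t.label, hits };
  }
  return { tier: "none", hits: [] };
}

// Stub tiers with canned answers; the real backends are not modeled here.
const stubTiers: Tier[] = [
  { label: "fts5", search: (q) => (q === "redis" ? ["note-1"] : []) },
  { label: "hindsight", search: (q) => (q === "caching" ? ["note-1", "note-7"] : []) },
  { label: "gbrain", search: () => [] },
];
console.log(searchWithFallback(stubTiers, "redis").tier);   // fts5
console.log(searchWithFallback(stubTiers, "caching").tier); // hindsight
```

&lt;p&gt;Because the loop returns on the first non-empty result, a keyword hit never pays for an embedding lookup or a graph query.&lt;/p&gt;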

&lt;h3&gt;
  
  
  Pitfalls I ran into
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
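&lt;strong&gt;443 errors&lt;/strong&gt;: skip the troubleshooting guides and just retry. Transient network blips are more common than you think.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback threshold for the three-tier search&lt;/strong&gt;: stop as soon as FTS5 returns more than 0 matches and don't continue to vector search; this saves a lot of LLM tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't refine 700 books in one run&lt;/strong&gt;: first check the index with &lt;code&gt;book_cache_manager list&lt;/code&gt;, pick 3-5 books to run through the pipeline and verify the output quality, then batch the rest.&lt;/li&gt;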
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Closing thoughts
&lt;/h3&gt;

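&lt;p&gt;Memory is an agent's infrastructure; knowledge collection is the engine that makes it truly useful. 40 collection tools isn't many: every source you add gives the agent one more dimension of information. When your agent can automatically absorb knowledge from web pages, videos, WeChat Official Accounts, technical books, and Weibo, and link it to what it already has, it truly starts to "know" what it is doing.&lt;/p&gt;

&lt;p&gt;Rather than merely "remembering" what you said.&lt;/p&gt;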

</description>
      <category>ai</category>
      <category>automation</category>
      <category>opensource</category>
    </item>
    <item>
      <title>You posted your OSS tool once, got silence, and never posted again</title>
      <dc:creator>J Now</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:01:42 +0000</pubDate>
      <link>https://dev.to/palo_alto_ai/you-posted-your-oss-tool-once-got-silence-and-never-posted-again-4o0o</link>
      <guid>https://dev.to/palo_alto_ai/you-posted-your-oss-tool-once-got-silence-and-never-posted-again-4o0o</guid>
      <description>&lt;p&gt;That's not a marketing failure; it's a time and friction problem. Writing for Bluesky means staying under 300 characters. X is 280. Dev.to wants 150–400 words with a code block. Mastodon is 500. On top of that, you have to track which angle you used last week so you're not repeating yourself. Now multiply that by four platforms, every week, for every project you maintain. Nobody does this consistently, so most tools die quietly.&lt;/p&gt;

&lt;p&gt;I built marketing-pipeline to handle the recurring mechanics of distribution so I don't have to.&lt;/p&gt;

&lt;p&gt;Onboarding a project is one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;marketing onboard &lt;span class="nt"&gt;--name&lt;/span&gt; my-tool &lt;span class="nt"&gt;--repo&lt;/span&gt; owner/repo &lt;span class="nt"&gt;--kind&lt;/span&gt; mcp-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That fetches the README, sends it to Claude, and saves extracted problem statements, facts, and rotation angles to &lt;code&gt;projects.yml&lt;/code&gt;. The &lt;code&gt;kind&lt;/code&gt; field routes the project to the right directories automatically — &lt;code&gt;mcp-server&lt;/code&gt; goes to MCP Registry, Smithery, Glama, and PulseMCP; &lt;code&gt;claude-skill&lt;/code&gt; goes to awesome-claude-code; &lt;code&gt;browser-extension&lt;/code&gt; goes to Chrome Web Store, Firefox AMO, and Edge Add-ons.&lt;/p&gt;

&lt;p&gt;The daily cycle (&lt;code&gt;marketing cycle&lt;/code&gt;) runs via GitHub Actions at 14:00 UTC on weekdays. It rotates through projects × angles × channels and picks the least-recently-used angle for each project, so you're not repeating yourself and not manually tracking what went out last Tuesday.&lt;/p&gt;
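&lt;p&gt;The least-recently-used pick can be illustrated in a few lines. This is a TypeScript sketch of the idea, not the pipeline's own code: &lt;code&gt;AngleLog&lt;/code&gt; and &lt;code&gt;pickLeastRecentlyUsed&lt;/code&gt; are hypothetical names, and it assumes each angle's last use is stored as an ISO date string, or &lt;code&gt;null&lt;/code&gt; if it has never been posted.&lt;/p&gt;

```typescript
// When each angle was last posted for a project (ISO date), or null if never used.
type AngleLog = { [angle: string]: string | null };

// Picks the angle that was used longest ago; a never-used angle wins outright.
function pickLeastRecentlyUsed(log: AngleLog): string {
  const angles = Object.keys(log).sort();
  if (angles.length === 0) throw new Error("no angles configured");
  let best = angles[0];
  for (const angle of angles) {
    const bestDate = log[best];
    if (bestDate === null) break; // a never-used angle cannot be beaten
    const current = log[angle];
    // ISO dates sort chronologically when compared as strings.
    if (current === null || bestDate > current) best = angle;
  }
  return best;
}

const angleLog: AngleLog = { cost: "2026-06-15", setup: "2026-06-10", speed: "2026-06-01" };
console.log(pickLeastRecentlyUsed(angleLog)); // speed
```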

&lt;p&gt;Per-channel length limits are enforced at generation time — not as a suggestion, as a hard constraint. The anti-slop gate in &lt;code&gt;pipeline/antislop.py&lt;/code&gt; hard-rejects posts before they ship if they contain tokens like &lt;code&gt;excited&lt;/code&gt;, &lt;code&gt;game-changer&lt;/code&gt;, &lt;code&gt;unlock&lt;/code&gt;, &lt;code&gt;AI-powered&lt;/code&gt;, emoji, hashtags, or exclamation points. The goal is posts that read like a practitioner wrote them, not a launch-day press release.&lt;/p&gt;
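&lt;p&gt;A gate of this kind can be sketched as a function that returns the reasons a draft is rejected. The real check lives in &lt;code&gt;pipeline/antislop.py&lt;/code&gt; and is not reproduced here; the TypeScript below is an illustration with hypothetical names, uses the character limits quoted earlier in this post, matches banned tokens as case-insensitive substrings (an assumption), and leaves out emoji detection.&lt;/p&gt;

```typescript
// Character limits quoted in the post; Dev.to is measured in words, so it is left out.
const CHAR_LIMITS: { [channel: string]: number } = { bluesky: 300, x: 280, mastodon: 500 };

// Tokens the post lists as hard rejects, lowercased for matching.
const BANNED_TOKENS = ["excited", "game-changer", "unlock", "ai-powered"];

// Returns the reasons a draft is rejected; an empty list means it may ship.
function rejectReasons(channel: string, draft: string): string[] {
  const reasons: string[] = [];
  const limit = CHAR_LIMITS[channel];
  if (limit !== undefined) {
    if (draft.length > limit) reasons.push("over " + limit + " characters");
  }
  const lower = draft.toLowerCase();
  for (const token of BANNED_TOKENS) {
    if (lower.indexOf(token) !== -1) reasons.push("banned token: " + token);
  }
  if (draft.indexOf("#") !== -1) reasons.push("hashtag");
  if (draft.indexOf("!") !== -1) reasons.push("exclamation point");
  return reasons;
}

console.log(rejectReasons("x", "Excited to share my tool!").join("; "));
// banned token: excited; exclamation point
```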

&lt;p&gt;The one thing that can't be automated: awesome-claude-code submissions require a human to file a GitHub issue per their rules. The pipeline generates the payload; you submit it once.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/robertnowell/marketing-pipeline" rel="noopener noreferrer"&gt;https://github.com/robertnowell/marketing-pipeline&lt;/a&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>marketing</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Your browser engine doesn't need the cloud. Cryptographic Audit Ledgers for Autonomous Browser Agents proves it.</title>
      <dc:creator>Lois-Kleinner</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:01:34 +0000</pubDate>
      <link>https://dev.to/kleinner/your-browser-engine-doesnt-need-the-cloud-cryptographic-audit-ledgers-for-autonomous-browser-md8</link>
      <guid>https://dev.to/kleinner/your-browser-engine-doesnt-need-the-cloud-cryptographic-audit-ledgers-for-autonomous-browser-md8</guid>
      <description>&lt;h1&gt;
  
  
  Your browser engine doesn't need the cloud. Cryptographic Audit Ledgers for Autonomous Browser Agents proves it.
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Cryptographic Audit Ledgers for Autonomous Browser Agents: Verifiable Action Logging with SHA3-256&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Autonomous browser agents (AI-driven systems that navigate web pages, fill forms, and execute user-delegated tasks) present a fundamental accountability problem: how can users verify that an agent acted correctly and did not exceed its authority? Existing approaches rely on opaque logging in unverifiable formats, making forensic analysis and dispute resolution impractical. This paper introduces the .aioss cryptographic audit ledger, a Merkle-DAG-based append-only data structure that records every action taken by an autonomous browser agent in a tamper-evident, verifiable format.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Built
&lt;/h2&gt;

&lt;p&gt;Each action is hashed with SHA3-256, signed with the user's Ed25519 keypair, and linked to the preceding action via a cryptographic hash chain, producing an immutable sequence of agent operations. We define the formal grammar of audit events (including DOM mutations, synthetic clicks, navigation commands, form inputs, and data access requests) and specify the serialization and canonicalization rules necessary for deterministic ledger construction.&lt;/p&gt;
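&lt;p&gt;To show what "linked via a hash chain" means in practice, here is a minimal sketch using Node's built-in &lt;code&gt;crypto&lt;/code&gt; module. It is not the .aioss format: the &lt;code&gt;LedgerEntry&lt;/code&gt; shape and function names are hypothetical, actions are plain strings instead of canonically serialized events, the structure is a linear chain and not a Merkle DAG, and Ed25519 signing is left out.&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

interface LedgerEntry {
  action: string;   // stand-in for a canonically serialized audit event
  prevHash: string; // hash of the preceding entry ("" for the first one)
  hash: string;     // SHA3-256 over prevHash and action
}

function sha3(text: string): string {
  return createHash("sha3-256").update(text).digest("hex");
}

// Appends an action, linking it to the hash of the last entry.
function appendEntry(ledger: LedgerEntry[], action: string): LedgerEntry[] {
  const prevHash = ledger.length > 0 ? ledger[ledger.length - 1].hash : "";
  return ledger.concat([{ action, prevHash, hash: sha3(prevHash + "\n" + action) }]);
}

// Recomputes every hash; an edited, removed, or reordered entry breaks the chain.
function verifyChain(ledger: LedgerEntry[]): boolean {
  let prevHash = "";
  for (const entry of ledger) {
    if (entry.prevHash !== prevHash) return false;
    if (entry.hash !== sha3(prevHash + "\n" + entry.action)) return false;
    prevHash = entry.hash;
  }
  return true;
}

let demoLedger: LedgerEntry[] = [];
demoLedger = appendEntry(demoLedger, "navigate https://example.com");
demoLedger = appendEntry(demoLedger, "click #submit");
console.log(verifyChain(demoLedger)); // true
```

&lt;p&gt;Changing the &lt;code&gt;action&lt;/code&gt; of any stored entry makes &lt;code&gt;verifyChain&lt;/code&gt; return &lt;code&gt;false&lt;/code&gt;, which is the tamper-evidence property; signatures would additionally bind each entry to the user's key.&lt;/p&gt;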

&lt;h2&gt;
  
  
  The Research
&lt;/h2&gt;

&lt;p&gt;This research demonstrates that sovereign, local-first AI infrastructure is not a future possibility; it is a present reality.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Full citation:&lt;/strong&gt; Alpasan, L.-K. (2026). Cryptographic Audit Ledgers for Autonomous Browser Agents: Verifiable Action Logging with SHA3-256. &lt;em&gt;The Anticloud Research Corpus.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://zenodo.org/search?q=anticloud" rel="noopener noreferrer"&gt;Read the full paper&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Why The Anticloud
&lt;/h3&gt;

&lt;p&gt;The AI industry is built on promises that vaporize the moment you look closely. Black box models running on opaque infrastructure, trained on data you did not consent to, monetizing outputs you did not authorize. The Anticloud is the opposite of that in every way.&lt;/p&gt;

&lt;p&gt;Everything we claim is backed by published research. There is a paper behind every component in the stack, and the code behind every paper is open. We do not make promises about what the system will do someday — we show you what it does today, and you can verify it yourself.&lt;/p&gt;

&lt;p&gt;Privacy is not a feature we added to the product. It is a property of the architecture. There are no API endpoints to harden because there is no API to expose. There is no database to encrypt because there is no database. There is no cloud to compromise because there is no cloud. We cannot protect what we do not have, and we designed the system so we have nothing to protect you from.&lt;/p&gt;

&lt;p&gt;The system does not guess. It cross-validates its own outputs, detects inconsistencies in its reasoning, and surfaces uncertainty when it does not have confidence in the answer. It knows when it does not know — and it tells you instead of generating a confident-sounding lie.&lt;/p&gt;

&lt;p&gt;We built local AI with RAG and RLHF so your knowledge base and your preference alignment stay on your hardware. The model does not need to be fine-tuned on a server farm to understand your context. It learns from your data on your machine, and the results never leave.&lt;/p&gt;

&lt;p&gt;The Anticloud requires one machine, one binary, and zero trust in anyone.&lt;/p&gt;




&lt;h3&gt;
  
  
  About the Author
&lt;/h3&gt;

&lt;p&gt;My name is Lois-Kleinner Alpasan. I'm 23 years old. I built The Anticloud.&lt;/p&gt;

&lt;p&gt;I started this because I looked at the AI industry and saw something wrong. Every major AI system requires you to send your data to someone else's server. Every "AI company" is actually a data company — they make money from your usage, your prompts, your files, your attention. They call it a service. I call it extraction.&lt;/p&gt;

&lt;p&gt;I spent the last two years building an alternative. Not a feature, not a product, not a startup looking for an exit — an entirely different infrastructure stack. One where AI runs on your machine, for you, and never needs to phone home. One where privacy is not a feature you toggle in settings but a property of the architecture. One where you don't have to trust anyone because you can verify everything.&lt;/p&gt;

&lt;p&gt;The project is near production-ready. Every component is open. Every claim is backed by published research. The code is documented. The ledger is verifiable. The binary fits on a laptop.&lt;/p&gt;

&lt;p&gt;I'm not asking for trust. I'm asking you to read the paper, verify the claims, and decide for yourself whether the cloud is really necessary — or whether it was always just the default because no one bothered to build an alternative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Follow the work:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Research papers: &lt;a href="https://zenodo.org/search?q=anticloud" rel="noopener noreferrer"&gt;https://zenodo.org/search?q=anticloud&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://linkedin.com/in/kleinner" rel="noopener noreferrer"&gt;https://linkedin.com/in/kleinner&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Project: The Anticloud&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; AI, SovereignAI, Anticloud, LocalFirst, Airgapped, ZeroTrust, NoDatacenter, OpenSource, Browser Engine, Privacy, VLM, Ad Blocking&lt;/p&gt;

</description>
      <category>security</category>
      <category>opensource</category>
      <category>privacy</category>
      <category>research</category>
    </item>
    <item>
      <title>Introducing kazi-mcp: Labor Market Coordination for Kenya's 15 Million Informal Workers</title>
      <dc:creator>Gabriel Mahia</dc:creator>
      <pubDate>Mon, 22 Jun 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/gabrielmahia/introducing-kazi-mcp-labor-market-coordination-for-kenyas-15-million-informal-workers-37mm</link>
      <guid>https://dev.to/gabrielmahia/introducing-kazi-mcp-labor-market-coordination-for-kenyas-15-million-informal-workers-37mm</guid>
      <description>&lt;h1&gt;
  
  
  Introducing kazi-mcp: Labor Market Coordination for Kenya's 15 Million Informal Workers
&lt;/h1&gt;

&lt;p&gt;Kenya's informal sector employs roughly 83% of the workforce. Most of those workers — jua kali artisans, boda riders, domestic workers, hawkers — have no formal employment record, no portable skills credential, and no way to benchmark their earnings against market rates.&lt;/p&gt;

&lt;p&gt;That changes today with &lt;strong&gt;kazi-mcp&lt;/strong&gt;, the newest tool in the East Africa coordination infrastructure suite.&lt;/p&gt;

&lt;h2&gt;
  
  
  What kazi-mcp does
&lt;/h2&gt;

&lt;p&gt;kazi-mcp is an MCP server with six tools. Install it from PyPI:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;kazi-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;job_match&lt;/code&gt;&lt;/strong&gt; — Input a list of skills, get ranked job matches with salary ranges in KES.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;wage_benchmark&lt;/code&gt;&lt;/strong&gt; — Query monthly gross pay benchmarks (entry/mid/senior) for any job in Kenya.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;skills_gap_analysis&lt;/code&gt;&lt;/strong&gt; — Find the gap between your current skills and a target role, with specific Kenyan training providers for each missing skill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;informal_sector_registry&lt;/code&gt;&lt;/strong&gt; — Register or look up jua kali and informal worker profiles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;contract_template&lt;/code&gt;&lt;/strong&gt; — Generate Kenya Employment Act 2007-compliant contract templates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;labor_rights_query&lt;/code&gt;&lt;/strong&gt; — Ask about any Employment Act right: maternity leave, overtime, termination, NSSF/NHIF.&lt;/p&gt;
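
&lt;p&gt;To make the idea behind &lt;code&gt;skills_gap_analysis&lt;/code&gt; concrete, here is a minimal sketch of a gap computation. It is not the kazi-mcp implementation: the function name, the data tables, and the provider names are hypothetical placeholders, and the data is synthetic.&lt;/p&gt;

```python
# Hypothetical, synthetic data for illustration only (not taken from kazi-mcp).
ROLE_SKILLS = {
    "welder": ["arc welding", "blueprint reading", "safety certification"],
}
TRAINING_PROVIDERS = {
    "blueprint reading": "Provider A (placeholder)",
    "safety certification": "Provider B (placeholder)",
}


def skills_gap(current_skills, target_role):
    """Return the skills the target role requires that the worker lacks,
    each paired with a training provider when one is known."""
    have = {skill.strip().lower() for skill in current_skills}
    gap = []
    for skill in ROLE_SKILLS[target_role]:
        if skill not in have:
            provider = TRAINING_PROVIDERS.get(skill, "no provider on record")
            gap.append({"skill": skill, "provider": provider})
    return gap


for item in skills_gap(["Arc Welding"], "welder"):
    print(item["skill"], "-", item["provider"])
```

&lt;p&gt;Normalizing case and whitespace before comparing keeps "Arc Welding" and "arc welding" from being reported as a gap.&lt;/p&gt;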

&lt;h2&gt;
  
  
  Why labor coordination is a structural problem
&lt;/h2&gt;

&lt;p&gt;The core issue isn't wages — it's information asymmetry. An employer in Westlands doesn't know the going rate for a skilled welder in Gikomba. A domestic worker in Karen can't easily document her 10 years of experience when seeking a new placement. A fresh CS graduate doesn't know whether to negotiate for KES 60,000 or KES 120,000.&lt;/p&gt;

&lt;p&gt;kazi-mcp surfaces what was previously opaque: market rates, skill requirements, training pathways, and legal protections — all queryable by any AI agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The coordination infrastructure suite
&lt;/h2&gt;

&lt;p&gt;kazi-mcp completes the first layer of Kenya's coordination stack:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Payments&lt;/td&gt;
&lt;td&gt;mpesa-mcp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Insurance&lt;/td&gt;
&lt;td&gt;bima-mcp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credit&lt;/td&gt;
&lt;td&gt;mkopo-mcp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Markets&lt;/td&gt;
&lt;td&gt;soko-mcp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reputation&lt;/td&gt;
&lt;td&gt;sifa-mcp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Labor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;kazi-mcp&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Water/Drought&lt;/td&gt;
&lt;td&gt;wapimaji-mcp&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each tool addresses a market failure where information asymmetry costs Kenyans real money. Together, they represent a queryable layer on top of East Africa's economic reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In Claude Desktop, Cursor, or any MCP client:
# "What jobs match a Python developer in Nairobi?"
# "What's the going rate for a nurse in Mombasa?"
# "What rights does a casual worker have under Kenyan law?"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source code: &lt;a href="https://github.com/gabrielmahia/kazi-mcp" rel="noopener noreferrer"&gt;github.com/gabrielmahia/kazi-mcp&lt;/a&gt;&lt;br&gt;
PyPI: &lt;code&gt;pip install kazi-mcp&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All data is synthetic demo data. This is not legal or financial advice.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>kenya</category>
      <category>africa</category>
      <category>ai</category>
    </item>
    <item>
      <title>How I use AI to produce content with a consistent visual identity, without a team</title>
      <dc:creator>Nauiter Master</dc:creator>
      <pubDate>Mon, 22 Jun 2026 11:59:38 +0000</pubDate>
      <link>https://dev.to/nauitermaster/como-uso-ia-para-produzir-conteudo-com-identidade-visual-consistente-sem-equipe-36ml</link>
      <guid>https://dev.to/nauitermaster/como-uso-ia-para-produzir-conteudo-com-identidade-visual-consistente-sem-equipe-36ml</guid>
      <description>&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Contexto&lt;br&gt;
Opero múltiplos projetos digitais simultaneamente, cada um com identidade visual e tom de voz distintos. Faço isso sozinho, sem designer, sem editor e sem equipe de produção. O que permite isso funcionar não é talento manual, mas um pipeline estruturado onde a IA executa e eu dirijo.&lt;br&gt;
Este artigo documenta esse pipeline: como organizo a produção de conteúdo visual, quais ferramentas uso em cada etapa, como estruturo prompts para gerar consistência e por que trato a estética como um sistema, não como uma decisão criativa repetida a cada peça.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;O problema de escalar conteúdo solo&lt;br&gt;
Produzir conteúdo com qualidade visual consistente sem equipe exige resolver três tensões ao mesmo tempo:&lt;br&gt;
• Velocidade versus qualidade: quanto mais rápido, mais genérico tende a ficar&lt;br&gt;
• Escala versus identidade: mais volume normalmente dilui o estilo&lt;br&gt;
• Automação versus autenticidade: delegar demais para a IA elimina o que torna o conteúdo reconhecível&lt;br&gt;
A saída não é escolher um lado. É separar o que a IA faz bem do que precisa ser humano, e construir um fluxo que preserve essa divisão em cada peça produzida.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A divisão de responsabilidades&lt;br&gt;
O primeiro passo foi definir com clareza o que entra no prompt e o que não entra:&lt;br&gt;
• A IA executa: gera a imagem, aplica o estilo, segue a composição, renderiza o elemento visual&lt;br&gt;
• Eu defino: a ideia, o contexto, a emocao que a peça precisa transmitir, o elemento autoral que diferencia&lt;br&gt;
• Nunca delego: a decisão sobre o que comunicar, a seleção final e o ajuste de tom&lt;br&gt;
Essa divisão parece óbvia mas é frequentemente ignorada. Quando o criador pede para a IA decidir o que comunicar, o resultado é genérico mesmo que visualmente bonito.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Estrutura do pipeline de produção&lt;br&gt;
O fluxo que uso segue quatro etapas fixas, independentemente do projeto ou formato:&lt;br&gt;
• Conceito: defino a ideia central, o elemento autoral e a emocao alvo antes de abrir qualquer ferramenta&lt;br&gt;
• Ativos de referência: reúno imagens anteriores do mesmo projeto para garantir consistência. A IA precisa de contexto visual, não só texto&lt;br&gt;
• Engenharia de prompt: descrevo cena, composição, paleta, elementos proibidos e formato. Quanto mais específico, menos retrabalho&lt;br&gt;
• Pós-processamento: ajustes finos de brilho, saturação e recorte no Canva. A IA entrega o corpo, o refinamento é humano&lt;br&gt;
Cada etapa tem input e output definidos. Isso transforma produção criativa em processo repetivel sem perder qualidade.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Como estruturo prompts para consistência visual&lt;br&gt;
O maior erro em prompts visuais é descrever o resultado esperado sem descrever a cena. Um prompt eficiente contém:&lt;br&gt;
• Personagem ou elemento central com descrição precisa: não apenas 'homem calvo' mas proporções, expressão, postura&lt;br&gt;
• Plano de fundo com função definida: o fundo é contexto, não decoração&lt;br&gt;
• Paleta de cores como restrição: especificar o que não pode aparecer é tão importante quanto o que deve aparecer&lt;br&gt;
• Elementos proibidos explicitados: textos complexos, rostos adicionais, objetos que a IA tende a inserir automaticamente&lt;br&gt;
Manter um repositório de prompts que funcionaram por projeto é o que garante consistência ao longo do tempo. O prompt vira parte do ativo do canal, não só a imagem gerada.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Estética como sistema&lt;br&gt;
Cada projeto tem um documento de identidade visual que define:&lt;br&gt;
• Paleta primária e restrições de cor&lt;br&gt;
• Tom visual: realista, abstrato, minimalista, brutalista&lt;br&gt;
• Elementos recorrentes que criam reconhecimento imediato&lt;br&gt;
• O que nunca aparece em nenhuma peça do projeto&lt;br&gt;
Esse documento é o que alimenta todos os prompts daquele projeto. Mudar um elemento nele muda a estética de tudo que vier depois. Tratar estética como sistema significa que a consistência não depende de memória ou gosto momentâneo, ela está documentada e é reproduzível.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ferramentas por etapa&lt;br&gt;
O stack que uso atualmente, sem romantizar nenhuma ferramenta:&lt;br&gt;
• Conceito e prompt: ChatGPT para composições realistas e narrativas&lt;br&gt;
• Estética abstrata e conceitual: Midjourney quando o projeto exige esse registro&lt;br&gt;
• Pós-processamento e tipografia: Canva para ajustes finais e inserção de texto&lt;br&gt;
• Repositório de prompts e ativos: arquivo local organizado por projeto&lt;br&gt;
Não existe ferramenta certa em absoluto. Existe ferramenta certa para o tipo de output que cada projeto exige.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conclusão&lt;br&gt;
Produzir conteúdo visual consistente sem equipe é possível quando a IA é usada como executor dentro de um sistema definido pelo humano.&lt;br&gt;
O que não é possível é delegar a direção criativa para a IA e esperar identidade visual. A IA executa bem. Ela não decide bem o que vale a pena executar.&lt;br&gt;
O pipeline é o produto. A imagem gerada é apenas o output dele.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
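
&lt;p&gt;As an illustration of a visual identity document feeding every prompt, the sketch below builds a prompt from a small identity record plus the scene description. It is only one possible way to encode the idea: the class name, the field names, the prompt layout, and the sample values are assumptions made for this example, not the author's actual files or any tool's required format.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class VisualIdentity:
    """Hypothetical per-project identity record (field names are assumptions)."""
    palette: list
    tone: str
    recurring: list = field(default_factory=list)
    forbidden: list = field(default_factory=list)


def build_prompt(identity, subject, background):
    """Combine the per-piece scene with the project-wide constraints."""
    parts = [
        "Subject: " + subject,
        "Background: " + background,
        "Palette: " + ", ".join(identity.palette),
        "Tone: " + identity.tone,
    ]
    if identity.recurring:
        parts.append("Recurring elements: " + ", ".join(identity.recurring))
    if identity.forbidden:
        parts.append("Do not include: " + ", ".join(identity.forbidden))
    return "\n".join(parts)


identity = VisualIdentity(
    palette=["black", "amber"],
    tone="minimalist",
    recurring=["thin circular frame"],
    forbidden=["complex text", "additional faces"],
)
print(build_prompt(identity, "bald man, upright posture, calm expression",
                   "empty workshop that situates the scene"))
```

&lt;p&gt;Because the constraints live in one record, changing a palette entry there changes every prompt generated afterwards, which is the property the identity document is meant to provide.&lt;/p&gt;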

&lt;p&gt;© Nauiter Master | AI Strategist, Digital Artist &amp;amp; Automation&lt;/p&gt;

</description>
      <category>ai</category>
      <category>marketing</category>
      <category>design</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
