<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Guatu</title>
    <description>The latest articles on DEV Community by Guatu (@futhgar).</description>
    <link>https://dev.to/futhgar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3847021%2F5aa46faa-d8e6-4023-ad78-5a335f875d69.png</url>
      <title>DEV Community: Guatu</title>
      <link>https://dev.to/futhgar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/futhgar"/>
    <language>en</language>
    <item>
      <title>The 6-Layer Memory Architecture I Run for Claude Code</title>
      <dc:creator>Guatu</dc:creator>
      <pubDate>Fri, 17 Apr 2026 14:15:45 +0000</pubDate>
      <link>https://dev.to/futhgar/the-6-layer-memory-architecture-i-run-for-claude-code-5adc</link>
      <guid>https://dev.to/futhgar/the-6-layer-memory-architecture-i-run-for-claude-code-5adc</guid>
      <description>&lt;p&gt;I started where most people start: a single &lt;code&gt;CLAUDE.md&lt;/code&gt; at the root of every repo. It worked for a few weeks. Then it started failing in the same boring way every time. The file grew past 200 lines and instructions started getting ignored. The agent re-learned the same infrastructure facts every session. I'd find myself pasting the same context into the chat again, then opening a doc to copy a command I'd already told Claude about three times that week.&lt;/p&gt;

&lt;p&gt;So I kept adding layers. Six months later there are six of them. Last week I ripped out the two that didn't earn their keep, sanitized the rest, and pushed the whole thing as a public reference implementation at &lt;strong&gt;&lt;a href="https://github.com/futhgar/agent-memory-architecture" rel="noopener noreferrer"&gt;github.com/futhgar/agent-memory-architecture&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This post is the honest tour — what each layer does, what I got wrong, what I'd skip if I were starting fresh today.&lt;/p&gt;

&lt;h2&gt;
  
  
  The layers
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Session start (always loaded)
├── Layer 1  Auto-memory (tool-provided persistence)
├── Layer 2  System instructions (CLAUDE.md / .cursorrules)
└── Layer 3  Path-scoped rules (load conditionally on file path)

On-demand retrieval (lazy)
├── Layer 4  Wiki knowledge base (markdown + [[wikilinks]])
├── Layer 5  Semantic vector search (Qdrant + nomic-embed-text)
└── Layer 6  Cognitive memory with activation decay (MSAM / Zep / Letta)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The composition is deliberate. Layers 1-3 sit in context every session so the agent starts knowing how to behave. Layers 4-6 are called on demand when the agent needs a specific fact — cheap keyword lookup (Layer 4) first, semantic search fallback (Layer 5) when the keyword index misses, cognitive memory (Layer 6) only when you need temporal dynamics that flat files can't express.&lt;/p&gt;
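&lt;p&gt;The fallback order can be sketched in a few lines of shell; this is a hedged illustration, not code from the repo, and the wiki path and vector-store call are stand-ins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mkdir -p /tmp/wiki
echo "Longhorn RWX volumes are NFS-backed" | tee /tmp/wiki/longhorn.md

query="longhorn"
# Layer 4: cheap keyword lookup over the wiki first
hits=$(grep -rli "$query" /tmp/wiki)
if [ -n "$hits" ]; then
  echo "layer4: $hits"
else
  # Layer 5: fall back to semantic search only when keywords miss
  echo "layer5: would query the vector store here"
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;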

&lt;p&gt;Most teams stop at Layer 2. Some go to Layer 4 — Karpathy's &lt;a href="https://karpathy.bearblog.dev/" rel="noopener noreferrer"&gt;"LLM Wiki" insight&lt;/a&gt; that a disciplined wiki with good cross-references outperforms naive RAG at near-zero operational cost. I kept going because the homelab had enough breadth that even the wiki started missing on keyword lookups, and because session-specific learnings — "we tried X, it failed because Y" — didn't belong in permanent wiki articles but shouldn't evaporate either.&lt;/p&gt;

&lt;h2&gt;
  
  
  What went wrong along the way
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;CLAUDE.md bloat.&lt;/strong&gt; The first mistake. I kept adding "this one more thing" to &lt;code&gt;~/.claude/CLAUDE.md&lt;/code&gt; and watched the agent ignore the back half. Anthropic's &lt;a href="https://code.claude.com/docs/en/memory" rel="noopener noreferrer"&gt;Claude Code memory docs&lt;/a&gt; explicitly say under 200 lines per file. Take that seriously. Every line above the threshold is making the lines below it less effective.&lt;/p&gt;
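&lt;p&gt;This is worth automating. A crude line-count guardrail (throwaway paths here; point it at your real repos) flags any memory file drifting past the guidance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mkdir -p /tmp/memdemo
seq 1 250 | sed 's/^/- rule /' | tee /tmp/memdemo/CLAUDE.md | tail -n 1

# Flag any memory file past the ~200-line guidance
find /tmp/memdemo -name 'CLAUDE.md' | while read -r f; do
  lines=$(wc -l "$f" | awk '{print $1}')
  if [ "$lines" -gt 200 ]; then
    echo "OVER: $f ($lines lines)"
  fi
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;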

&lt;p&gt;&lt;strong&gt;Using a vector store for things a keyword index would have solved.&lt;/strong&gt; I set up a Qdrant &lt;code&gt;claude-code-memory&lt;/code&gt; collection early and started dumping session learnings into it. Six months in, it had 451 points and most of them were never retrieved. The wiki could have covered 95% of what I was using it for. I still use the vector store for session-to-session learnings — but I no longer recommend reaching for it before Layer 4 is properly curated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treating path-scoped rules as optional.&lt;/strong&gt; The &lt;code&gt;.claude/rules/*.md&lt;/code&gt; pattern lets you write "when editing &lt;code&gt;kubernetes/**&lt;/code&gt;, load these K8s conventions" rules that don't eat tokens when you're writing Python. Before I moved K8s conventions from the monolithic project CLAUDE.md into &lt;code&gt;.claude/rules/kubernetes.md&lt;/code&gt;, my baseline context load was ~500-800 tokens higher for every session, whether or not I was editing K8s. That's real money — not the dollar kind, the context-window-budget kind that directly reduces how much of the window is left for actual work.&lt;/p&gt;
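&lt;p&gt;For concreteness, here's the shape of such a rule file. The frontmatter trigger key is an assumption on my part; check your agent's docs for the exact syntax it expects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mkdir -p /tmp/demo/.claude/rules
# hypothetical frontmatter trigger; verify the key your agent actually reads
printf '%s\n' \
  '---' \
  'paths: ["kubernetes/**"]' \
  '---' \
  '- Use Kustomize overlays; never edit rendered manifests.' \
  '- Pin image tags; no :latest in committed YAML.' \
  | tee /tmp/demo/.claude/rules/kubernetes.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;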

&lt;p&gt;&lt;strong&gt;Cognitive memory before I needed it.&lt;/strong&gt; I set up MSAM (a custom ACT-R-inspired memory system with activation decay) for three months before I had a single use case that actually needed temporal dynamics. It was cool, but skipping to Layer 6 before Layers 4-5 are mature is the classic over-engineering trap. If I started fresh today, I'd stop at Layer 4 for at least a month and only add Layers 5-6 once the wiki's limitations were obvious from actual use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Forgetting to validate the "fixes."&lt;/strong&gt; Mid-project I realized my MSAM MCP integration was silently broken — the wrapper path in &lt;code&gt;.claude.json&lt;/code&gt; pointed to a file that didn't exist. Every "use MSAM for this" instruction in CLAUDE.md had been a dead letter for weeks. The lesson: when you configure a memory system, test the round-trip (store → recall) before trusting that it works. Configuration isn't validation.&lt;/p&gt;
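&lt;p&gt;A cheap first step toward that: verify every configured MCP command actually exists on disk. This is a sketch with an illustrative path and JSON shape; your real &lt;code&gt;.claude.json&lt;/code&gt; will differ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Illustrative config; substitute your real .claude.json
echo '{"mcpServers": {"msam": {"command": "/opt/msam/msam-mcp-wrapper.py"}}}' | tee /tmp/claude.json

# Pull out each configured command and confirm the file exists on disk
grep -o '"command": *"[^"]*"' /tmp/claude.json |
  sed 's/.*: *"//; s/"$//' |
  while read -r cmd; do
    [ -e "$cmd" ] || echo "MISSING: $cmd"
  done | tee /tmp/mcp-check.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

This only proves the wrapper is where the config says it is; the store → recall round-trip still needs a manual test.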

&lt;h2&gt;
  
  
  What's actually in the repo
&lt;/h2&gt;

&lt;p&gt;The repo is opinionated and template-heavy, not a framework. You fork it or cherry-pick pieces; you don't install it and "run" it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;templates/global/CLAUDE.md&lt;/code&gt; and &lt;code&gt;templates/project/CLAUDE.md&lt;/code&gt; — sanitized starting points, under 60 lines each&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;templates/rules/&lt;/code&gt; — path-scoped rule examples for kubernetes, terraform, dockerfiles, and wiki editing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;templates/memory-files/&lt;/code&gt; — YAML-frontmatter templates for project / reference / feedback memory files&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scripts/rebuild-memory-index.py&lt;/code&gt; — audits memory files for orphans, stale content, oversized files, and credential leaks&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scripts/build-wiki-graph.py&lt;/code&gt; — generates an interactive force-directed graph of your wiki's &lt;code&gt;[[wikilinks]]&lt;/code&gt; (or use &lt;a href="https://cosma.arthurperret.fr" rel="noopener noreferrer"&gt;Cosma&lt;/a&gt; for a more polished rendering)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scripts/msam-mcp-wrapper.py&lt;/code&gt; — a &lt;a href="https://github.com/jlowin/fastmcp" rel="noopener noreferrer"&gt;FastMCP&lt;/a&gt; wrapper for cognitive memory if you go that far&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scripts/check-sanitization.sh&lt;/code&gt; — pre-publish scanner for secrets, IPs, and personal data if you fork this&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's a one-line installer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sSL&lt;/span&gt; https://raw.githubusercontent.com/futhgar/agent-memory-architecture/main/bootstrap.sh | bash &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--layer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That auto-detects your agent (Claude Code, Cursor, or Aider), backs up any existing files, and drops in the templates. It has a &lt;code&gt;--dry-run&lt;/code&gt; flag because I wouldn't blind-trust someone else's curl-bash either. See &lt;code&gt;docs/getting-started.md&lt;/code&gt; for the decision tree on which layer to install when.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to skip
&lt;/h2&gt;

&lt;p&gt;Honestly: Layer 6 unless you already know you need it. The cognitive memory layer is the most opinionated and least-validated part of the system. MSAM is a research-grade tool. Zep and Letta are the production alternatives. All three require infrastructure, monitoring, and conceptual work. If your wiki (Layer 4) + vector search (Layer 5) aren't yet exhausting your team's patience, Layer 6 is premature.&lt;/p&gt;

&lt;p&gt;Also: don't cargo-cult the whole stack. The repo's &lt;code&gt;docs/getting-started.md&lt;/code&gt; has a decision tree — "is your CLAUDE.md over 200 lines? yes → try Layer 3; no → stay at Layer 2." Most teams should stop at Layer 4. The repo exists so you can see what the whole road looks like, not because everyone should walk it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why open-source it
&lt;/h2&gt;

&lt;p&gt;Two reasons. One is pure: the pattern is genuinely useful and I didn't see it written down anywhere in its full form. There are a lot of "here is my CLAUDE.md" repos and a lot of research on isolated pieces (RAG, knowledge graphs, vector stores), but the composition — which layers to use together, in what order, with what tradeoffs — was a pattern I had to piece together from running it.&lt;/p&gt;

&lt;p&gt;The other reason is less pure: I run &lt;a href="https://guatulabs.com" rel="noopener noreferrer"&gt;Guatu Labs&lt;/a&gt;, and we help companies implement AI agent infrastructure. A reference implementation that people can actually read is a better pitch than anything I'd write on a services page. If this saves you a week of research, and later you need help rolling something similar out in your org, you know where to find me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;strong&gt;&lt;a href="https://github.com/futhgar/agent-memory-architecture" rel="noopener noreferrer"&gt;github.com/futhgar/agent-memory-architecture&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Quick start: &lt;a href="https://github.com/futhgar/agent-memory-architecture/blob/main/docs/getting-started.md" rel="noopener noreferrer"&gt;docs/getting-started.md&lt;/a&gt; (decision tree)&lt;/li&gt;
&lt;li&gt;Deep dive: &lt;a href="https://github.com/futhgar/agent-memory-architecture/blob/main/docs/architecture.md" rel="noopener noreferrer"&gt;docs/architecture.md&lt;/a&gt; (each layer in detail)&lt;/li&gt;
&lt;li&gt;Cursor / Aider / custom agents: &lt;a href="https://github.com/futhgar/agent-memory-architecture/blob/main/docs/adapting-to-other-agents.md" rel="noopener noreferrer"&gt;docs/adapting-to-other-agents.md&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you build something on top of it, please ping me — that's the only "contribution" it asks for.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>claudecode</category>
      <category>memory</category>
      <category>rag</category>
    </item>
    <item>
      <title>Building Karpathy's LLM Wiki: A Production Homelab Implementation</title>
      <dc:creator>Guatu</dc:creator>
      <pubDate>Mon, 13 Apr 2026 18:15:03 +0000</pubDate>
      <link>https://dev.to/futhgar/building-karpathys-llm-wiki-a-production-homelab-implementation-585j</link>
      <guid>https://dev.to/futhgar/building-karpathys-llm-wiki-a-production-homelab-implementation-585j</guid>
      <description>&lt;p&gt;I tried to run Karpathy's LLM Wiki on my Proxmox homelab cluster and spent three days debugging why the front-end wouldn't load. The error log said &lt;code&gt;502 Bad Gateway&lt;/code&gt;, but the backend was running and the API was reachable. It turned out the problem was in the reverse proxy configuration. I'd missed a single line in the Nginx config that was required for WebSockets to work properly.&lt;/p&gt;

&lt;p&gt;This isn't the first time I've run into an issue where the documentation says one thing and the reality is something else. That's why I'm writing this — to show you exactly what I did, what went wrong, and how I fixed it.&lt;/p&gt;

&lt;p&gt;If you're running a multi-node Kubernetes cluster with Proxmox, or you're trying to set up a production-like AI agent environment in your homelab, this is for you. I'll walk you through the exact steps I used to deploy Karpathy's LLM Wiki in a way that mirrors real-world production setups, with the gotchas and workarounds that actually matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Tried First
&lt;/h2&gt;

&lt;p&gt;I started by cloning the LLM Wiki repository and running the demo setup with Docker Compose. It worked locally, but when I tried to scale it up to Kubernetes, I hit a wall. The first thing that broke was the persistent storage. I assumed the default Kubernetes emptyDir would work, but the wiki needed to persist data across restarts. I tried Longhorn, but the initial setup didn't account for the way the LLM Wiki uses SQLite — it required a specific file permission that wasn't set in the default PVC.&lt;/p&gt;

&lt;p&gt;Then I tried using the &lt;code&gt;initContainers&lt;/code&gt; approach to set the right permissions, but that didn't work either. I ended up having to modify the &lt;code&gt;StorageClass&lt;/code&gt; and &lt;code&gt;PVC&lt;/code&gt; definitions to include &lt;code&gt;fsGroup&lt;/code&gt; and &lt;code&gt;runAsUser&lt;/code&gt; settings that matched the SQLite process. That was a pain, but it was necessary.&lt;/p&gt;
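&lt;p&gt;The fragment that finally worked looked roughly like this (uid 1001 matches the SQLite process in my image; check what user yours runs as):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Pod-level securityContext so the mounted volume is writable by SQLite
printf '%s\n' \
  'securityContext:' \
  '  runAsUser: 1001' \
  '  fsGroup: 1001   # Kubernetes chowns the mounted volume to this gid' \
  | tee /tmp/wiki-securitycontext.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;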

&lt;p&gt;Next, I tried to set up the reverse proxy with Traefik, which is what I use in production. But again, the documentation didn't mention that the LLM Wiki's WebSockets needed a specific configuration. I spent a few hours trying to figure out why the chat interface was broken, until I found that the WebSocket upgrade headers weren't reaching the backend. The fix was a small Traefik &lt;code&gt;headers&lt;/code&gt; middleware that forwards the &lt;code&gt;Connection: Upgrade&lt;/code&gt; handshake (note that &lt;code&gt;proxy_set_header&lt;/code&gt; is Nginx vocabulary; Traefik has no such directive).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Actual Solution
&lt;/h2&gt;

&lt;p&gt;Here's what I ended up with. This is a production-ready setup that mirrors real-world environments, not just a demo.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;I'm assuming you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Proxmox cluster with Kubernetes installed&lt;/li&gt;
&lt;li&gt;A working Kubernetes cluster with at least 2 nodes&lt;/li&gt;
&lt;li&gt;A working Traefik ingress controller&lt;/li&gt;
&lt;li&gt;A working Longhorn storage system&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deploying the LLM Wiki
&lt;/h3&gt;

&lt;p&gt;I used Helm to deploy the LLM Wiki, but I had to modify the values file to match the specific needs of the application. Here's what my &lt;code&gt;values.yaml&lt;/code&gt; looked like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# values.yaml&lt;/span&gt;
&lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wiki.example.com"&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/"&lt;/span&gt;
          &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;traefik.ingress.kubernetes.io/router.middlewares&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default-websocket-middleware"&lt;/span&gt;
    &lt;span class="na"&gt;traefik.ingress.kubernetes.io/router.tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;

&lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;longhorn"&lt;/span&gt;
  &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReadWriteMany&lt;/span&gt;
  &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10Gi"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I created the middleware for WebSockets in Traefik:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# traefik-middleware.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik.containo.us/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Middleware&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default-websocket-middleware&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;webSocket&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;upGrade&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also had to attach that middleware to the Traefik IngressRoute, which I did by modifying the &lt;code&gt;ingressRoute&lt;/code&gt; spec in the Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ingressroute.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik.containo.us/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IngressRoute&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wiki-ingress&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;entryPoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;web&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;websecure&lt;/span&gt;
  &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Host(`wiki.example.com`)&lt;/span&gt;
      &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Rule&lt;/span&gt;
      &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wiki&lt;/span&gt;
          &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;middlewares&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default-websocket-middleware&lt;/span&gt;
  &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;passthrough&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the storage, I used Longhorn with a &lt;code&gt;ReadWriteMany&lt;/code&gt; access mode. Be careful here: Longhorn serves RWX volumes through an NFS share-manager, and SQLite's file locking is notoriously unreliable over NFS. It worked for me only because a single wiki replica mounts the volume; if you can, use &lt;code&gt;ReadWriteOnce&lt;/code&gt; instead, since SQLite is a single-writer database anyway. (The &lt;code&gt;Filesystem&lt;/code&gt; volume mode is just the Longhorn default, not a fix.) That's something the documentation didn't mention.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuring the Application
&lt;/h3&gt;

&lt;p&gt;The LLM Wiki itself had a few configuration options I had to set. I added the following environment variables to the deployment spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WIKI_DB_URL&lt;/span&gt;
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite:///wiki.db"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WIKI_HOST&lt;/span&gt;
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wiki.example.com"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WIKI_PORT&lt;/span&gt;
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;80"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also had to make sure the SQLite file had the correct permissions. I used an &lt;code&gt;initContainer&lt;/code&gt; to set the right ownership:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;initContainers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;init-sqlite&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sh"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;touch&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/wiki.db&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;chown&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1001:1001&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/wiki.db"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wiki-data&lt;/span&gt;
        &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/wiki.db&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was necessary because the SQLite process runs as user &lt;code&gt;1001&lt;/code&gt;, and without the right ownership, it wouldn't be able to write to the database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Works
&lt;/h2&gt;

&lt;p&gt;The key to getting this to work was understanding the specific requirements of the LLM Wiki and how they interacted with Kubernetes and Longhorn.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent Storage&lt;/strong&gt;: SQLite requires a specific file permission that wasn't set in the default PVC. I had to use an &lt;code&gt;initContainer&lt;/code&gt; to set the right ownership and make sure the volume was mounted correctly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reverse Proxy Configuration&lt;/strong&gt;: The WebSockets in the LLM Wiki required a specific Traefik middleware to work properly. That's something the documentation didn't mention, but it was necessary for the chat interface to function.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Longhorn Configuration&lt;/strong&gt;: The LLM Wiki uses SQLite, a single-writer database, while Longhorn serves &lt;code&gt;ReadWriteMany&lt;/code&gt; through an NFS share-manager, where SQLite's locking is unreliable. It only worked because a single replica mounts the volume.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the kinds of gotchas that don't appear in the documentation, but they're critical to getting the application running in a real-world environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;This was a learning experience, and here's what I'd do differently next time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start with a Minimal Setup&lt;/strong&gt;: I should have started with a minimal setup and then added complexity incrementally. That would have helped me identify the issues earlier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Real-World Tools&lt;/strong&gt;: I should have used the same tools I use in production — like Traefik and Longhorn — from the start. That would have saved me time debugging compatibility issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test the WebSockets Early&lt;/strong&gt;: I should have tested the WebSockets early on to make sure they worked before deploying the application. That would have saved me from a lot of frustration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also learned that the documentation doesn't always cover the edge cases. For example, the LLM Wiki's documentation didn't mention the need for a specific Traefik middleware for WebSockets. That's something that's critical to getting the application running, but it's not something you'd find in the documentation.&lt;/p&gt;

&lt;p&gt;If you're trying to run the LLM Wiki in a production-like environment, I'd recommend using the same tools you use in production — like Traefik and Longhorn — and testing the WebSockets early on. That will help you avoid a lot of the gotchas I ran into.&lt;/p&gt;

&lt;p&gt;Finally, I'd like to mention that if you're looking for help with AI agent orchestration, Kubernetes infrastructure, or industrial IoT systems, I'm available for consulting. You can find more information at &lt;a href="https://guatulabs.com/services" rel="noopener noreferrer"&gt;https://guatulabs.com/services&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>llmwiki</category>
      <category>homelab</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Proxmox API Tokens: Bash History Expansion and the ! Character</title>
      <dc:creator>Guatu</dc:creator>
      <pubDate>Mon, 13 Apr 2026 12:15:16 +0000</pubDate>
      <link>https://dev.to/futhgar/proxmox-api-tokens-bash-history-expansion-and-the-character-2k4p</link>
      <guid>https://dev.to/futhgar/proxmox-api-tokens-bash-history-expansion-and-the-character-2k4p</guid>
      <description>&lt;p&gt;I spent 45 minutes trying to figure out why my Proxmox API token kept getting exposed in logs. Turns out, it wasn't a security hole — it was Bash history expansion eating my token string and spitting it back in plaintext.&lt;/p&gt;

&lt;p&gt;The thing: interactive Bash interprets &lt;code&gt;!&lt;/code&gt; as a history reference, even inside double quotes. When you paste an API token on the command line, any &lt;code&gt;!&lt;/code&gt; in it can get expanded into a previous command. (Scripts are safe; history expansion is off in non-interactive shells.) This bites Proxmox users in particular, because every API token ID has the form &lt;code&gt;user@realm!tokenid&lt;/code&gt;, with the &lt;code&gt;!&lt;/code&gt; built in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# This will fail because Bash tries to expand the ! in the token&lt;/span&gt;
curl &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="s2"&gt;"user:my-token!123"&lt;/span&gt; https://proxmox.example.com/api2/json/nodes

&lt;span class="c"&gt;# Fix it by quoting the token or escaping the !&lt;/span&gt;
curl &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="s2"&gt;"user:my-token&lt;/span&gt;&lt;span class="se"&gt;\!&lt;/span&gt;&lt;span class="s2"&gt;123"&lt;/span&gt; https://proxmox.example.com/api2/json/nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix is simple but easy to miss: single-quote the token, or turn history expansion off with &lt;code&gt;set +H&lt;/code&gt;. Double quotes don't suppress history expansion, and escaping with a backslash inside double quotes leaves the literal backslash in the token. I learned this the hard way after seeing a mangled &lt;code&gt;my-token!123&lt;/code&gt; show up in my shell history and logs after a failed API call.&lt;/p&gt;
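&lt;p&gt;Non-interactive shells skip history expansion entirely, but the backslash-inside-double-quotes trap is easy to demonstrate anywhere:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Inside double quotes, Bash keeps the backslash before ! as a literal
printf '%s\n' "user:my-token\!123"   # prints user:my-token\!123, broken credentials
# Single quotes pass the token through untouched
printf '%s\n' 'user:my-token!123'    # prints user:my-token!123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;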

&lt;p&gt;The takeaway: Bash history expansion is a gotcha for any API token containing &lt;code&gt;!&lt;/code&gt; — single-quote the token or run &lt;code&gt;set +H&lt;/code&gt; to avoid surprises.&lt;/p&gt;

</description>
      <category>proxmox</category>
      <category>bash</category>
      <category>apitokens</category>
      <category>homelab</category>
    </item>
    <item>
      <title>AMD iGPU Stealing Your RAM: UMA Frame Buffer on Headless Servers</title>
      <dc:creator>Guatu</dc:creator>
      <pubDate>Mon, 13 Apr 2026 10:15:17 +0000</pubDate>
      <link>https://dev.to/futhgar/amd-igpu-stealing-your-ram-uma-frame-buffer-on-headless-servers-klg</link>
      <guid>https://dev.to/futhgar/amd-igpu-stealing-your-ram-uma-frame-buffer-on-headless-servers-klg</guid>
      <description>&lt;p&gt;I had 16 GB of RAM in my Proxmox host. I expected to see 16 GB. What I got was 11 GB.&lt;/p&gt;

&lt;p&gt;The first time I noticed it was during a reboot. The Proxmox web UI said 11 GB installed. The &lt;code&gt;free -h&lt;/code&gt; command showed 11 GB. The BIOS reported 16 GB. Five gigabytes had gone missing. Not a kernel module. Not a driver. Not a config. It was the UMA frame buffer, reserved by the AMD iGPU.&lt;/p&gt;
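&lt;p&gt;You can see the shortfall from the shell, because the kernel's own total already excludes firmware reservations like the UMA carve-out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# MemTotal is what the kernel was handed after firmware reservations,
# so a UMA carve-out never shows up in it
grep MemTotal /proc/meminfo
# To compare against what the firmware claims is installed (needs root):
#   dmidecode -t memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;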

&lt;p&gt;I had a headless server, no display, no GPU passthrough, no need for the integrated graphics. But the UMA frame buffer was still allocating memory. Every time the system booted, 5 GB was carved out for the iGPU, even though it had no use for it.&lt;/p&gt;

&lt;p&gt;The fix was in the BIOS. I had to disable UMA frame buffer allocation. But I didn't find that setting right away. I tried looking for "graphics" in the BIOS, found a section called "Advanced Chipset Configuration," and there it was: "UMA Frame Buffer Size." Set to "Auto" by default. I changed it to "Disabled" and rebooted. The RAM came back.&lt;/p&gt;

&lt;p&gt;I should have known this would happen. I've seen it before in other Proxmox nodes. The docs say nothing about it. The forums have a handful of posts about it, buried in threads about GPU passthrough or memory leaks. No one says, "Hey, if you're running a headless server with an AMD iGPU, you might be losing 5 GB of RAM."&lt;/p&gt;

&lt;p&gt;I've been running this setup for months now. I've got a couple of other Proxmox nodes with the same hardware. I've applied the same fix on all of them. It's not a major issue, but it's one of those things that feels like a waste of resources. 5 GB is a lot for a headless server that's not even using the GPU.&lt;/p&gt;

&lt;p&gt;If you're running Proxmox on hardware with an AMD iGPU and you're not using it for anything — no passthrough, no display, no headless GPU workloads — check the UMA frame buffer setting. It's easy to miss, but it's there. You'll save memory. You'll save confusion. And you'll avoid that moment when you think your system is broken, but it's just the iGPU eating your RAM.&lt;/p&gt;

&lt;p&gt;You can find the setting in the BIOS under "Advanced Chipset Configuration." It's called "UMA Frame Buffer Size." If you're not using the integrated graphics, set it to "Disabled." You can also add &lt;code&gt;nomodeset&lt;/code&gt; to your kernel command line if you want to avoid the iGPU altogether. But the BIOS setting is cleaner and more permanent.&lt;/p&gt;

&lt;p&gt;I've seen this come up in discussions about Proxmox, Kubernetes, and even in some homelab setups with bare metal. It's not a common issue, but it's a real one. And it's one of those gotchas that doesn't make it into the official documentation. You have to dig for it. You have to know what to look for.&lt;/p&gt;

&lt;p&gt;If you're running a headless server with an AMD iGPU and you're seeing unexplained memory shortages, check the UMA frame buffer. It's not a bug. It's a feature. And it's one you can turn off if you don't need it.&lt;/p&gt;

</description>
      <category>headlessservers</category>
      <category>amdigpu</category>
      <category>ramleak</category>
      <category>umaframebuffer</category>
    </item>
    <item>
      <title>Agent Credential Management: Two-Tier Service Accounts for Secure AI Agent Workflows</title>
      <dc:creator>Guatu</dc:creator>
      <pubDate>Fri, 10 Apr 2026 04:15:25 +0000</pubDate>
      <link>https://dev.to/futhgar/agent-credential-management-two-tier-service-accounts-for-secure-ai-agent-workflows-p72</link>
      <guid>https://dev.to/futhgar/agent-credential-management-two-tier-service-accounts-for-secure-ai-agent-workflows-p72</guid>
      <description>&lt;p&gt;I spent three days trying to get a multi-agent system to talk to a Kubernetes API endpoint. Every time I used the default service account, the agent would hit a 403 and lock out. I was using the right permissions, the right roles, the right RBAC rules. It wasn’t until I implemented a two-tier service account system that the agents finally stopped throwing errors. It’s not just about having the right permissions , it’s about structuring them in a way that isolates the agent’s access and limits its blast radius.&lt;/p&gt;

&lt;p&gt;If you're running AI agents in Kubernetes, especially ones that interact with external systems or sensitive data, this is a pattern you should consider. It isn't just about security; it's about making sure your agents fail safely and don't accidentally break your entire infrastructure if they're compromised.&lt;/p&gt;

&lt;p&gt;I first tried using the default service account for all my agents. It worked fine for a while, but as I scaled out to more agents and more workflows, I started seeing odd behavior. The agents would occasionally call APIs with incorrect credentials, or worse, they'd silently fail without any logs. I checked the RBAC policies, adjusted them, and kept getting the same issues. It was like the agents were leaking credentials or being intercepted by something in the cluster.&lt;/p&gt;

&lt;p&gt;At one point, I even tried using a single dedicated service account for all agents, which I thought would give them more consistent permissions. But that only made things worse. Now every agent had the same access, and when one got compromised, the whole system was at risk. I realized I needed a way to isolate each agent’s credentials while still maintaining some level of central control.&lt;/p&gt;

&lt;p&gt;The solution came in the form of a two-tier service account system. Instead of giving each agent its own service account with direct access to the cluster, I created a central service account with limited permissions that acted as a "proxy" for all the agent workloads. Each agent then had its own "child" service account, which was granted access only through this central account. This way, I could manage permissions at the central level without having to update every agent individually when something changed.&lt;/p&gt;

&lt;p&gt;Here’s how I set it up. First, I created the central service account with a role that had access to the necessary resources but nothing more:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-proxy&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-system&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-proxy-role&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-system&lt;/span&gt;
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pods"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;services"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configmaps"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RoleBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-proxy-binding&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-system&lt;/span&gt;
&lt;span class="na"&gt;subjects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-proxy&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-system&lt;/span&gt;
&lt;span class="na"&gt;roleRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Role&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-proxy-role&lt;/span&gt;
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, for each agent, I created a separate service account and bound it to the central role with its own &lt;code&gt;RoleBinding&lt;/code&gt;. This gave each agent its own identity while limiting its access to only what was necessary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-worker-1&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-system&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RoleBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-worker-1-binding&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-system&lt;/span&gt;
&lt;span class="na"&gt;subjects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-worker-1&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-system&lt;/span&gt;
&lt;span class="na"&gt;roleRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Role&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-proxy-role&lt;/span&gt;
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each agent’s deployment was configured to use its own service account, and I made sure &lt;code&gt;automountServiceAccountToken&lt;/code&gt; was set to &lt;code&gt;true&lt;/code&gt; (the default) so the agents could pick up their tokens automatically. This way, the agents had their own credentials but were still limited by the permissions defined in the central role.&lt;/p&gt;
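&lt;p&gt;To confirm a binding actually does what you think, &lt;code&gt;kubectl auth can-i&lt;/code&gt; can impersonate the child account. A sketch using the names from the manifests above (the guard just keeps it harmless on machines without &lt;code&gt;kubectl&lt;/code&gt;):&lt;/p&gt;

```shell
# Check the child account's effective permissions via impersonation.
subject="system:serviceaccount:agent-system:agent-worker-1"
if command -v kubectl >/dev/null 2>/dev/null; then
  # Given the Role above, "get" should come back yes and "delete" no.
  kubectl auth can-i get pods -n agent-system --as "$subject"
  kubectl auth can-i delete pods -n agent-system --as "$subject"
else
  echo "kubectl not found; would check permissions for $subject"
fi
```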

&lt;p&gt;This approach has a few key benefits. First, it isolates each agent's access, so a compromise of one agent doesn’t automatically expose all the others. Second, it simplifies permission management: you update the central role once and every agent inherits the change. And third, it makes auditing easier: if an agent does something unexpected, you can trace it back to its specific service account.&lt;/p&gt;

&lt;p&gt;This isn’t a perfect solution, though. There are tradeoffs. Setting this up requires more configuration than a single service account, and it adds complexity to the system. You also have to make sure the central account doesn’t have more permissions than it needs; otherwise, you end up with the same security risks you were trying to avoid.&lt;/p&gt;

&lt;p&gt;What I would do differently is automate more of this process. I had to create each service account and role binding by hand, which was tedious and error-prone. If I were doing it again, I’d set up a Kubernetes operator or a script that generates the necessary resources from a list of agent names. That would reduce the risk of mistakes and make it easier to scale.&lt;/p&gt;
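&lt;p&gt;The manual step is easy to script. A minimal sketch that stamps out one &lt;code&gt;ServiceAccount&lt;/code&gt; per agent name; the agent names are illustrative, and a real version would emit the matching &lt;code&gt;RoleBinding&lt;/code&gt;s too:&lt;/p&gt;

```shell
# Generate a ServiceAccount manifest per agent; pipe the output into
# `kubectl apply -f -` or commit it to your GitOps repo.
for agent in worker-1 worker-2 worker-3; do
  printf 'apiVersion: v1\nkind: ServiceAccount\nmetadata:\n  name: agent-%s\n  namespace: agent-system\n---\n' "$agent"
done
```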

&lt;p&gt;Another thing I’d consider is adding more fine-grained permissions. Right now, the central role gives access to pods, services, and configmaps, but that might be more than some agents need. Going forward, I’d look into creating more specific roles for different types of agents , for example, one role for agents that only need to read pods and another for agents that can modify services. This would make the system more secure and more efficient.&lt;/p&gt;

&lt;p&gt;I also wish I had implemented some kind of token rotation or short-lived credentials for the agent service accounts. Right now, the tokens are long-lived, which means if an agent’s credentials are ever compromised, the attacker has a long window to do damage. Adding a system that rotates the tokens periodically would help reduce that risk, even if it adds some complexity.&lt;/p&gt;

&lt;p&gt;In the end, the two-tier service account system worked well for my needs. It gave me the security I wanted without sacrificing the flexibility I needed. It wasn’t the easiest setup to get going, but the tradeoff was worth it , I now have a more secure and manageable agent system. If you're managing AI agents in Kubernetes, I'd recommend considering this approach for your credential management.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>credentialmanagement</category>
      <category>security</category>
      <category>serviceaccounts</category>
    </item>
    <item>
      <title>Pod Disruption Budgets: Why kubectl drain Gets Stuck on Longhorn</title>
      <dc:creator>Guatu</dc:creator>
      <pubDate>Thu, 09 Apr 2026 01:56:30 +0000</pubDate>
      <link>https://dev.to/futhgar/pod-disruption-budgets-why-kubectl-drain-gets-stuck-on-longhorn-23k9</link>
      <guid>https://dev.to/futhgar/pod-disruption-budgets-why-kubectl-drain-gets-stuck-on-longhorn-23k9</guid>
      <description>&lt;p&gt;I spent three hours trying to drain a node running Longhorn, only to watch &lt;code&gt;kubectl drain&lt;/code&gt; freeze with no progress or error. It felt like the command was waiting for something that would never come.&lt;/p&gt;

&lt;p&gt;What I expected was a smooth drain — pods evicted, node marked as unavailable, and everything moving on. No drama. No hanging. Just a clean maintenance operation.&lt;/p&gt;

&lt;p&gt;What actually happened was a silent stall. The drain command would not proceed past the point where it started trying to evict the Longhorn manager pod. It would sit there, stuck, with no message. After some digging, I realized it was a Pod Disruption Budget (PDB) blocking the eviction. Longhorn has a PDB in place that ensures at least one manager pod is always running, and that PDB was preventing the eviction from completing.&lt;/p&gt;
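&lt;p&gt;You can spot the culprit before draining: &lt;code&gt;kubectl get pdb&lt;/code&gt; shows an ALLOWED DISRUPTIONS column, and any PDB sitting at zero will block eviction. A sketch (the &lt;code&gt;awk&lt;/code&gt; column position assumes the current default output layout, and the guard keeps it harmless without &lt;code&gt;kubectl&lt;/code&gt;):&lt;/p&gt;

```shell
# List PDBs that currently allow zero disruptions; these are the ones
# that make kubectl drain hang. Column 5 is ALLOWED DISRUPTIONS in
# `kubectl get pdb -A --no-headers` output.
if command -v kubectl >/dev/null 2>/dev/null; then
  kubectl get pdb -A --no-headers | awk '$5 == 0 {print $1 "/" $2}'
fi
```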

&lt;p&gt;The fix came in two parts: first, I had to adjust the PDB to allow the drain to proceed. Second, I had to make sure the &lt;code&gt;kubectl drain&lt;/code&gt; command used the right flags. Here's what I did:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl drain &amp;lt;node-name&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ignore-daemonsets&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--delete-emptydir-data&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--grace-period&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;30 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;300s &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--force&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also updated the PDB so that only one manager pod has to stay available, which lets the remaining pods be evicted during the drain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;policy/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PodDisruptionBudget&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn-pdb&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;minAvailable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app.kubernetes.io/instance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This change allowed the drain to proceed without blocking. The key takeaway was that PDBs are not just a safety net — they're a potential roadblock if not configured with maintenance in mind.&lt;/p&gt;

&lt;p&gt;This matters a lot if you're managing a Kubernetes cluster with Longhorn. Every time you want to do maintenance, scale down, or upgrade, you're going to run into this. You need to know how PDBs interact with your storage layer, and how to configure them to allow safe operations. If you don't, you'll be stuck trying to figure out why &lt;code&gt;kubectl drain&lt;/code&gt; is hanging — and it won't be fun.&lt;/p&gt;

&lt;p&gt;I learned my lesson the hard way. I'm now making sure every PDB in my cluster is reviewed during maintenance planning. I've also added documentation to our team's Kubernetes guides about this specific scenario. If I had known earlier, I would have configured the PDB with more flexibility from the start.&lt;/p&gt;

&lt;p&gt;If you're running Longhorn and doing any kind of node maintenance, take a moment to check your PDBs. They might be the reason your &lt;code&gt;kubectl drain&lt;/code&gt; is getting stuck. And if you're not sure where to start, I recommend looking at the Longhorn PDBs and asking: "What's the worst that could happen if one of these pods goes away during a drain?" The answer will tell you what minAvailable should be.&lt;/p&gt;

&lt;p&gt;For more on how to handle storage in Kubernetes, check out my post on &lt;a href="https://dev.to/kubernetes-storage-on-bare-metal-longhorn-in-practice"&gt;Kubernetes Storage on Bare Metal: Longhorn in Practice&lt;/a&gt;. It dives deeper into how Longhorn fits into larger infrastructure setups.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>longhorn</category>
      <category>poddisruptionbudgets</category>
      <category>nodedrain</category>
    </item>
    <item>
      <title>Equipment Health Scoring: How One Number Made My Operators Stop Checking the Dashboard</title>
      <dc:creator>Guatu</dc:creator>
      <pubDate>Thu, 09 Apr 2026 01:54:34 +0000</pubDate>
      <link>https://dev.to/futhgar/equipment-health-scoring-how-one-number-made-my-operators-stop-checking-the-dashboard-3o1j</link>
      <guid>https://dev.to/futhgar/equipment-health-scoring-how-one-number-made-my-operators-stop-checking-the-dashboard-3o1j</guid>
      <description>&lt;p&gt;Operators in my last plant used to ignore dashboards — not because they didn’t care, but because the data was too noisy, too fragmented, and too abstract. I've seen this pattern again and again: a sea of metrics with no clear signal. But when I implemented a simple health scoring system that aggregated temperature, vibration, and pressure into a single number between 0 and 100, that changed everything. Now, operators check that number daily, and it’s the only metric they ask about during shift handovers.&lt;/p&gt;

&lt;p&gt;The key was making the score intuitive, real-time, and tied directly to the equipment’s state. I built it using a simple weighted formula that reflects the relative importance of each sensor. Temperature and vibration are more sensitive to wear and tear, so I gave them higher weight. Pressure, while important, was less of a red flag unless it spiked suddenly. This isn’t perfect — it’s not an ML model, it’s a rule-based system — but it works for my use case and it’s easy to maintain.&lt;/p&gt;

&lt;p&gt;Here’s how I implemented it in Python as part of a Node-RED flow that aggregates data from multiple sensors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# equipment_health.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonify&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Weights for each sensor type
&lt;/span&gt;&lt;span class="n"&gt;WEIGHTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vibration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pressure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_health_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Normalize each metric to a 0-1 scale
&lt;/span&gt;    &lt;span class="n"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sensor&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;WEIGHTS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Example: linear normalization between 0 and 100
&lt;/span&gt;            &lt;span class="c1"&gt;# This would be replaced with domain-specific logic
&lt;/span&gt;            &lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# 20-100 range
&lt;/span&gt;
    &lt;span class="c1"&gt;# Calculate weighted score
&lt;/span&gt;    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;WEIGHTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sensor&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;

    &lt;span class="c1"&gt;# Map to 0-100 range
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;health&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;calculate_health_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;health_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a basic example, but it shows the core idea: take raw data, normalize it, apply weights, and return a single score. The scoring function can be as simple or complex as needed. For example, I’ve used exponential decay for vibration data or applied thresholds based on historical failure data.&lt;/p&gt;
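&lt;p&gt;To make the arithmetic concrete, here’s the same formula traced by hand on one hypothetical reading; the weights and the 20–100 normalization match the code above:&lt;/p&gt;

```python
# Same weighted formula as above, traced on one sample reading.
WEIGHTS = {"vibration": 0.4, "temperature": 0.4, "pressure": 0.2}

def calculate_health_score(data):
    # Normalize each reading from the 20-100 input range onto 0-1.
    normalized = {s: max(0, min(1, (v - 20) / 80)) for s, v in data.items() if s in WEIGHTS}
    # Weighted sum, mapped back to 0-100.
    return round(sum(normalized[s] * w for s, w in WEIGHTS.items() if s in normalized) * 100, 2)

# vibration: (60-20)/80 = 0.50, temperature: 0.75, pressure: 0.25
# score = (0.50*0.4 + 0.75*0.4 + 0.25*0.2) * 100 = 55.0
print(calculate_health_score({"vibration": 60, "temperature": 80, "pressure": 40}))  # 55.0
```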

&lt;p&gt;Operators now see this score on a Grafana dashboard with color-coded thresholds (red for &amp;lt;40, yellow for 40–60, green for 60+), and it's the only metric they ask about. The real win isn’t the algorithm, though — it’s that the score is actionable. When it drops below 40, they know to investigate. When it hits 20, they know it’s time to schedule maintenance. It's not perfect, but it’s what they check every day.&lt;/p&gt;

</description>
      <category>iiot</category>
      <category>predictivemaintenance</category>
      <category>sensordata</category>
      <category>equipmenthealth</category>
    </item>
    <item>
      <title>Infrastructure as Code, but Automated: OpenTofu and GitHub Actions</title>
      <dc:creator>Guatu</dc:creator>
      <pubDate>Thu, 09 Apr 2026 01:54:21 +0000</pubDate>
      <link>https://dev.to/futhgar/infrastructure-as-code-but-automated-opentofu-and-github-actions-1g78</link>
      <guid>https://dev.to/futhgar/infrastructure-as-code-but-automated-opentofu-and-github-actions-1g78</guid>
      <description>&lt;p&gt;I once spent three hours debugging a "successful" pipeline that had actually failed to deploy a critical security group update because I had set &lt;code&gt;continue-on-error: true&lt;/code&gt; in a shell script step. The logs said green, the UI said green, but my actual infrastructure was still wide open to the internet. It is a specific type of dread that only hits when you realize your automation is lying to you.&lt;/p&gt;

&lt;p&gt;If you are managing even a modest cluster of VMs or a few bare-metal nodes, you eventually hit a wall where manual &lt;code&gt;tofu apply&lt;/code&gt; commands from your laptop become a liability. You start worrying about which version of the binary you're running, whether your local state is out of sync with the remote, and if you accidentally left a sensitive variable in your shell history.&lt;/p&gt;

&lt;p&gt;This is the problem for anyone moving from "I run scripts" to "I manage systems." Whether you are orchestrating Kubernetes nodes on Proxmox or managing cloud resources, the goal is to move the source of truth from your terminal to a version-controlled workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I Tried First
&lt;/h3&gt;

&lt;p&gt;My first instinct was the "Lazy Engineer" approach. I just wanted a GitHub Action that ran &lt;code&gt;tofu apply&lt;/code&gt; whenever I pushed to &lt;code&gt;main&lt;/code&gt;. I didn't bother with a plan phase, a pull request review, or even a remote backend. I just pointed a runner at my state file and hoped for the best.&lt;/p&gt;

&lt;p&gt;It was a disaster.&lt;/p&gt;

&lt;p&gt;I pushed a change to a networking module that had a small typo in a CIDR block. Because there was no "Plan" step to inspect, the runner immediately started destroying and recreating the primary network interface. My entire lab went dark. I couldn't even SSH into the nodes to fix the mistake because the automation had nuked the route I was using.&lt;/p&gt;

&lt;p&gt;I also learned the hard way that running &lt;code&gt;apply&lt;/code&gt; directly on a push is dangerous. Without a decoupled &lt;code&gt;plan&lt;/code&gt; step that attaches to a Pull Request, you lose the ability to peer-review the impact of your changes. You aren't reviewing code; you are reviewing side effects.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Actual Solution
&lt;/h3&gt;

&lt;p&gt;A real automation pipeline needs three distinct stages: Validation, Planning, and Deployment. You want the Plan to happen when a Pull Request is opened, and the Apply to happen only after that Plan has been merged into your main branch.&lt;/p&gt;

&lt;p&gt;Here is the architecture I use for my infrastructure projects.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. The Backend Configuration
&lt;/h4&gt;

&lt;p&gt;First, you cannot use local state. If the GitHub runner dies or the workspace is wiped, your infrastructure is orphaned. You need a remote backend—S3, GCS, or even a specialized state server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# backend.tf&lt;/span&gt;
&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;bucket&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-infra-state-bucket"&lt;/span&gt;
&lt;span class="nx"&gt;s&lt;/span&gt;    &lt;span class="nx"&gt;key&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"network/terraform.tfstate"&lt;/span&gt;
    &lt;span class="nx"&gt;region&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
    &lt;span class="nx"&gt;dynamodb_table&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-lock"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. The GitHub Actions Workflow
&lt;/h4&gt;

&lt;p&gt;The workflow is split into two jobs: &lt;code&gt;plan&lt;/code&gt; (triggered on PR) and &lt;code&gt;apply&lt;/code&gt; (triggered on push to main). I use the &lt;code&gt;opentofu/setup-opentofu&lt;/code&gt; action because it handles the binary installation cleanly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Infrastructure CI/CD&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;main&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;main&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;plan&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout Code&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Setup OpenTofu&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;opentofu/setup-opentofu@v1&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.6.0&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Configure AWS Credentials&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-actions/configure-aws-credentials@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;aws-access-key-id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.AWS_ACCESS_KEY_ID }}&lt;/span&gt;
          &lt;span class="na"&gt;aws-secret-access-key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.AWS_SECRET_ACCESS/ACCESS_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;aws-region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Tofu Init&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tofu init&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Tofu Plan&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;plan&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tofu plan -no-color &amp;gt; plan.txt&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Upload Plan Artifact&lt;/span&gt;
        &lt;span class="s"&gt;uses&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-artifact@v4&lt;/span&gt;
        &lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tfplan&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;plan.txt&lt;/span&gt;

  &lt;span class="na"&gt;apply&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;plan&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github.event_name == 'push' &amp;amp;&amp;amp; github.ref == 'refs/heads/main'&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout Code&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Setup OpenTofu&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;opentofu/setup-opentofu@v1&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.6.0&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Configure AWS Credentials&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-async/configure-aws-credentials@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;aws-access-key-id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.AWS_ACCESS_KEY_ID }}&lt;/span&gt;
          &lt;span class="na"&gt;aws-secret-access-key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.AWS_SECRET_ACCESS_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;aws-region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Tofu Init&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tofu init&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Tofu Apply&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tofu apply -auto-approve&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: In a production-grade setup, you would actually download the &lt;code&gt;tfplan&lt;/code&gt; artifact from the previous job and run &lt;code&gt;tofu apply tfplan&lt;/code&gt;. This ensures that the exact changes you reviewed in the PR are the ones being applied, rather than a fresh plan that might have drifted in the minutes since the PR was merged.&lt;/em&gt;&lt;/p&gt;
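&lt;p&gt;A minimal sketch of that artifact-based flow, assuming the plan step writes a binary plan with &lt;code&gt;-out=tfplan&lt;/code&gt; and uploads it under the artifact name &lt;code&gt;tfplan&lt;/code&gt;:&lt;/p&gt;

```yaml
# plan job: save the binary plan instead of (or alongside) the text output
- name: Tofu Plan
  run: tofu plan -no-color -out=tfplan

# apply job: fetch the reviewed plan and apply exactly those changes
- name: Download Plan Artifact
  uses: actions/download-artifact@v4
  with:
    name: tfplan

- name: Tofu Apply
  run: tofu apply tfplan
```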

&lt;h3&gt;
  
  
  Why It Works
&lt;/h3&gt;

&lt;p&gt;This setup works because it enforces a "Review-First" culture. When you open a PR, the &lt;code&gt;plan&lt;/code&gt; job runs. You can look at the GitHub Actions logs, see exactly which resources are being added, changed, or destroyed, and then leave comments on the PR.&lt;/p&gt;

&lt;p&gt;The separation of &lt;code&gt;plan&lt;/code&gt; and &lt;code&gt;apply&lt;/code&gt; creates a gate. The &lt;code&gt;apply&lt;/code&gt; job is guarded by a conditional check: &lt;code&gt;if: github.event_name == 'push'&lt;/code&gt;. This prevents accidental deployments from feature branches.&lt;/p&gt;

&lt;p&gt;Using a remote backend with locking (like DynamoDB) is the secret sauce for stability. If two developers try to run a pipeline at the same time, OpenTofu will see the lock and fail the second job rather than corrupting the state file. This is the difference between a professional deployment and a lucky one.&lt;/p&gt;

&lt;p&gt;If you are managing more complex environments, such as &lt;a href="https://guatulabs.com/services" rel="noopener noreferrer"&gt;AI agent deployments&lt;/a&gt; that require specific GPU-backed compute nodes, this level of automation is non-negotiable. You cannot afford to manually tweak instance types or disk sizes when your workload depends on specific hardware availability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lessons Learned
&lt;/h3&gt;

&lt;p&gt;If I could go back and rewrite my first few workflows, here is what I would change:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Never use &lt;code&gt;auto-approve&lt;/code&gt; on a PR.&lt;/strong&gt; It defeats the purpose of the plan. Only use it in the &lt;code&gt;apply&lt;/code&gt; job where the human review has already happened in the PR stage.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Validate your logic before you run it.&lt;/strong&gt; I have seen pipelines pass because a &lt;code&gt;grep&lt;/code&gt; command failed to find a string, but the actual deployment failed. Always check the exit codes of your custom scripts. If you are using an automation tool like n8n to trigger these workflows, ensure your error-handling logic is explicit. A failed check should result in an immediate &lt;code&gt;exit 1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Secrets management is a minefield.&lt;/strong&gt; Do not pass secrets as environment variables in your YAML if you can avoid it. Use the official provider actions (like &lt;code&gt;aws-actions/configure-aws-credentials&lt;/code&gt;) which handle the heavy lifting of credential injection securely.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The "Drift" problem is real.&lt;/strong&gt; Automation only works if you actually use it. The moment you manually change a setting in a web console or via the CLI, your OpenTofu state is out of sync. I have learned to treat any manual change as a "broken build" that needs to be codified immediately.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Automating infrastructure is not about making things "faster"—it is about making them predictable. When you can trust that a push to &lt;code&gt;main&lt;/code&gt; will do exactly what the PR promised, you stop being a firefighter and start being an engineer.&lt;/p&gt;

</description>
      <category>infrastructure</category>
      <category>opentofu</category>
      <category>githubactions</category>
      <category>gitops</category>
    </item>
    <item>
      <title>Attention Residuals: How Kimi Is Rethinking Transformer Depth</title>
      <dc:creator>Guatu</dc:creator>
      <pubDate>Tue, 07 Apr 2026 18:12:06 +0000</pubDate>
      <link>https://dev.to/futhgar/attention-residuals-how-kimi-is-rethinking-transformer-depth-4ii3</link>
      <guid>https://dev.to/futhgar/attention-residuals-how-kimi-is-rethinking-transformer-depth-4ii3</guid>
      <description>&lt;p&gt;Every transformer you've ever used stacks layers with a dead-simple formula: take the input, add the layer's output, move on. &lt;code&gt;x + layer(x)&lt;/code&gt;. Fixed weight of 1. No questions asked.&lt;/p&gt;

&lt;p&gt;The Kimi team at Moonshot AI just published a paper that asks: what if that's been wrong the whole time?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Standard residual connections accumulate layer outputs with equal weight. Layer 1 contributes the same as layer 47. The hidden state grows without bound as you stack more layers, and each individual layer's contribution gets diluted into the noise.&lt;/p&gt;

&lt;p&gt;This is called &lt;strong&gt;PreNorm dilution&lt;/strong&gt; — and it gets worse the deeper your model goes. At 100+ layers, the early layers are essentially screaming into a hurricane. Their signal is there, mathematically, but it's buried under the sum of everything that came after.&lt;/p&gt;

&lt;p&gt;For most of transformer history, we've papered over this with normalization tricks. RMSNorm, LayerNorm, various pre-norm and post-norm arrangements. They help. They don't solve the root cause.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Attention Residuals Actually Do
&lt;/h2&gt;

&lt;p&gt;The Kimi team's solution is elegant: replace the fixed &lt;code&gt;x + layer(x)&lt;/code&gt; with &lt;strong&gt;softmax attention over all preceding layer outputs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of blindly accumulating, each layer looks back at every layer before it and decides — with learned, input-dependent weights — how much of each previous layer to carry forward.&lt;/p&gt;

&lt;p&gt;Think of it like this: in standard residual connections, you're stuffing every letter you've ever received into one folder, in order, and hoping the important ones float to the top. Attention Residuals let each layer read through the folder and pick out exactly the letters that matter for the current task.&lt;/p&gt;

&lt;p&gt;The key properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input-dependent&lt;/strong&gt;: The aggregation weights change based on what the model is processing, not fixed at training time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Depth-selective&lt;/strong&gt;: Layer 50 might pull heavily from layer 3 and layer 48, but ignore everything in between&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learned&lt;/strong&gt;: The attention mechanism over layers is trained end-to-end with the rest of the model&lt;/li&gt;
&lt;/ul&gt;
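&lt;p&gt;A toy sketch of that aggregation step, with no claim to match the paper's actual implementation: each layer's carry-forward is a softmax-weighted sum over the history of previous layer outputs, where the scores stand in for the learned, input-dependent relevance described above.&lt;/p&gt;

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attn_residual(history, scores):
    """Aggregate preceding layer outputs with softmax weights.

    history: previous layer outputs, each a list of floats
    scores:  one relevance score per history entry (learned and
             input-dependent in the real model; plain numbers here)
    A standard residual stream is the degenerate case where every
    entry is summed with a fixed weight of 1.
    """
    weights = softmax(scores)
    dim = len(history[0])
    return [sum(w * h[i] for w, h in zip(weights, history))
            for i in range(dim)]
```

&lt;p&gt;With equal scores this degenerates to a plain average; with one dominant score, the layer effectively copies a single earlier representation forward.&lt;/p&gt;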

&lt;h2&gt;
  
  
  Block AttnRes: Making It Practical
&lt;/h2&gt;

&lt;p&gt;The naive version — attending over every single preceding layer — would be computationally brutal. O(n²) in the number of layers, which matters when you're stacking 80+ of them.&lt;/p&gt;

&lt;p&gt;Their practical solution is &lt;strong&gt;Block AttnRes&lt;/strong&gt;: partition layers into blocks and attend over block-level representations instead of individual layer outputs. Same principle, much lower overhead.&lt;/p&gt;

&lt;p&gt;They combine this with a two-phase computation strategy and cache-based pipeline communication. The result is a drop-in replacement for standard residual connections that adds minimal training cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;They integrated Attention Residuals into the Kimi Linear architecture — 48 billion total parameters with 3 billion activated (a sparse mixture-of-experts setup). Pre-trained on 1.4 trillion tokens.&lt;/p&gt;

&lt;p&gt;The improvements show up in two places:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training stability&lt;/strong&gt;: More uniform output magnitudes and gradient distribution across network depth. The deep layers aren't drowning in accumulated noise anymore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downstream performance&lt;/strong&gt;: Consistent improvements across standard benchmarks (MMLU, GSM8K, TriviaQA). Not dramatic jumps — this isn't a "we beat GPT" paper. It's a "we found a better foundation to build on" paper.&lt;/p&gt;

&lt;p&gt;The scaling law experiments are the most interesting part. The gains from Attention Residuals hold as you scale up model size. That's the signal that this is a fundamental architectural improvement, not a trick that works at one scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means For You
&lt;/h2&gt;

&lt;p&gt;If you're running inference on pre-trained models — which most of us are — you don't need to do anything. If Kimi or future models adopt AttnRes, you get the benefit automatically when those weights are published.&lt;/p&gt;

&lt;p&gt;If you're fine-tuning or training from scratch (at research scale), this is worth paying attention to. The technique is architecture-level — you'd need to modify the model definition, not just swap a config flag.&lt;/p&gt;

&lt;p&gt;The broader implication: we're still finding meaningful improvements in basic transformer plumbing. The residual connection hasn't changed since ResNet in 2015. Eleven years of "good enough" just got challenged with a clean, principled alternative.&lt;/p&gt;

&lt;p&gt;For the &lt;a href="https://guatulabs.com/services" rel="noopener noreferrer"&gt;AI agent systems&lt;/a&gt; we build, better base models mean better reasoning at every layer of the stack. An agent that can more effectively utilize deep network representations makes fewer mistakes in multi-step planning. The compounding effect is real.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Attention Residuals won't change your workflow tomorrow. But they represent the kind of foundational research that makes next year's models meaningfully better — not through scale alone, but through smarter architecture.&lt;/p&gt;

&lt;p&gt;The Kimi team showed that a component we've taken for granted since 2015 still had room for improvement. That's the kind of finding that ages well.&lt;/p&gt;

&lt;p&gt;Paper: &lt;a href="https://arxiv.org/abs/2603.15031" rel="noopener noreferrer"&gt;Attention Residuals&lt;/a&gt; — Kimi Team, Moonshot AI.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>transformers</category>
      <category>llmarchitecture</category>
      <category>attention</category>
    </item>
    <item>
      <title>NVIDIA Container Toolkit: Why the Default Runtime Matters</title>
      <dc:creator>Guatu</dc:creator>
      <pubDate>Thu, 02 Apr 2026 20:46:15 +0000</pubDate>
      <link>https://dev.to/futhgar/nvidia-container-toolkit-why-the-default-runtime-matters-4l3a</link>
      <guid>https://dev.to/futhgar/nvidia-container-toolkit-why-the-default-runtime-matters-4l3a</guid>
      <description>&lt;p&gt;I spent two hours trying to debug why my AI agent container couldn’t find the GPU. The error was cryptic, just a "device not found" in the logs, and I had no idea why. I had installed the NVIDIA Container Toolkit, configured containerd, and even tried running the container with &lt;code&gt;--gpus all&lt;/code&gt; — nothing worked.&lt;/p&gt;

&lt;p&gt;What I expected was for the container to launch with full GPU access, as I had done this before on a different setup. The container should have detected the GPU, initialized the libraries, and started training my model without a hitch.&lt;/p&gt;

&lt;p&gt;What actually happened was that the container was using the default OCI runtime, which wasn’t the NVIDIA runtime. As a result, the NVIDIA libraries weren’t loaded, and the GPU wasn’t accessible. The container didn’t fail outright — it just silently missed the GPU, and the AI agent couldn’t proceed.&lt;/p&gt;

&lt;p&gt;The fix came after I remembered to run &lt;code&gt;nvidia-ctk runtime configure --runtime=containerd --set-as-default&lt;/code&gt; and restart &lt;code&gt;containerd&lt;/code&gt;. That command sets the NVIDIA runtime as the default for all containers, not just those that explicitly request it. Without this step, even if you use &lt;code&gt;--gpus all&lt;/code&gt;, the container might still run on the default runtime, which is usually &lt;code&gt;runc&lt;/code&gt; or &lt;code&gt;crun&lt;/code&gt;, not &lt;code&gt;nvidia&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To make sure the configuration sticks, I added the following to my containerd config under &lt;code&gt;[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;nvidia&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;runtime_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I restarted containerd and verified with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nvidia-ctk runtime configure &lt;span class="nt"&gt;--runtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;containerd &lt;span class="nt"&gt;--set-as-default&lt;/span&gt;
systemctl restart containerd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also made sure that in Kubernetes, the NVIDIA device plugin was running with the correct runtime class. I updated the DaemonSet for the NVIDIA device plugin to explicitly set &lt;code&gt;runtimeClassName: nvidia&lt;/code&gt;, which is crucial in newer Kubernetes versions where the default runtime isn’t automatically set.&lt;/p&gt;
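&lt;p&gt;The relevant fragment of that DaemonSet spec looks something like the following. The names and image tag are illustrative, and it assumes a &lt;code&gt;RuntimeClass&lt;/code&gt; named &lt;code&gt;nvidia&lt;/code&gt; already exists in the cluster:&lt;/p&gt;

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin
  template:
    metadata:
      labels:
        name: nvidia-device-plugin
    spec:
      runtimeClassName: nvidia  # force the NVIDIA OCI runtime for this pod
      containers:
        - name: nvidia-device-plugin-ctr
          image: nvcr.io/nvidia/k8s-device-plugin:v0.14.1  # illustrative tag
```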

&lt;p&gt;Why does this matter? If you're running AI agents, LLMs, or any GPU-dependent workloads, and you forget to set the NVIDIA runtime as the default, you’ll run into silent failures. Containers might start, but they won’t see the GPU. Worse, you might not even get an error — just a model that doesn’t train or a container that exits with no clear reason.&lt;/p&gt;

&lt;p&gt;This is especially critical in Kubernetes environments where the NVIDIA device plugin relies on the runtime being set correctly. If it’s not, the device plugin won’t register the GPU, and your cluster will report zero GPU capacity.&lt;/p&gt;

&lt;p&gt;If you're using Proxmox and running containers with GPU access, or if you're deploying AI agents in a Kubernetes cluster, always make sure the NVIDIA runtime is set as the default. You can do this with the &lt;code&gt;nvidia-ctk&lt;/code&gt; tool, and it’s a simple but crucial step. Otherwise, you’ll be chasing cryptic errors and wasted compute time.&lt;/p&gt;

&lt;p&gt;For more on how to avoid GPU passthrough gotchas in VMs, check out &lt;a href="https://guatulabs.dev/posts/gpu-passthrough-on-proxmox-gotcha-guide" rel="noopener noreferrer"&gt;this post on GPU passthrough on Proxmox&lt;/a&gt;. If you're deploying AI agents in Kubernetes, &lt;a href="https://guatulabs.dev/posts/multi-agent-ai-systems-architecture-patterns" rel="noopener noreferrer"&gt;this guide on building multi-agent systems&lt;/a&gt; might also help.&lt;/p&gt;

</description>
      <category>nvidiaruntime</category>
      <category>containerd</category>
      <category>kubernetes</category>
      <category>aiagents</category>
    </item>
    <item>
      <title>Kubernetes Storage on Bare Metal: Longhorn in Practice</title>
      <dc:creator>Guatu</dc:creator>
      <pubDate>Thu, 02 Apr 2026 20:45:24 +0000</pubDate>
      <link>https://dev.to/futhgar/kubernetes-storage-on-bare-metal-longhorn-in-practice-25fa</link>
      <guid>https://dev.to/futhgar/kubernetes-storage-on-bare-metal-longhorn-in-practice-25fa</guid>
      <description>&lt;p&gt;I spent nearly a week trying to get Kubernetes storage working on my bare metal cluster before I finally figured out the right combination of Longhorn settings, node labels, and storage classes that made it stable. Turns out, most of the pain came from assumptions I made about how storage should work—not the tool itself.&lt;/p&gt;

&lt;p&gt;If you're running Kubernetes on physical hardware and need persistent storage—especially for databases, media servers, or AI agents that need state—this is for you. You’ll find the gotchas I hit, the config I used, and why it works better than most other bare-metal storage setups I've tried.&lt;/p&gt;

&lt;p&gt;My setup involved a small 3-node Kubernetes cluster, all running on bare metal. I wanted to run PostgreSQL, some AI agents, and a few stateful services like Nextcloud and MinIO. The cluster was deployed with Kubespray and Proxmox as the hypervisor. I tried using hostPath for a while, but that was a nightmare when nodes died or storage filled up.&lt;/p&gt;

&lt;p&gt;I tried a few other tools first—Rook-Ceph, RBD with Ceph, and even some DIY iSCSI solutions. None of them clicked the way I needed them to. Rook-Ceph was too heavy, and RBD required a full Ceph cluster. iSCSI was flaky and required a dedicated storage node, which I didn’t want to add. I needed something simple, self-hosted, and easy to manage—and that’s when I turned to Longhorn.&lt;/p&gt;

&lt;p&gt;I assumed that Longhorn would be a drop-in solution. I followed a few tutorials and tried to deploy it as-is. The first problem was that my nodes weren’t labeled properly for storage scheduling. I didn’t set &lt;code&gt;node-role.kubernetes.io/worker&lt;/code&gt; or &lt;code&gt;longhorn.io/host-attached-storage&lt;/code&gt;, so the storage manager wouldn’t assign volumes correctly.&lt;/p&gt;

&lt;p&gt;Then I tried to create a PVC and it failed with &lt;code&gt;ProvisioningFailed&lt;/code&gt;. I thought it was a storage class issue, but it turned out my disks weren’t labeled correctly in Longhorn. I had three drives on each node, but I forgot to set them to &lt;code&gt;used&lt;/code&gt;—Longhorn wasn’t picking them up for volume placement.&lt;/p&gt;

&lt;p&gt;I also tried using &lt;code&gt;numberOfReplicas: 3&lt;/code&gt; on a 3-node cluster, which sounded logical. But when I tried to write data to a volume, writes started failing because Longhorn couldn’t replicate across all three nodes. I had to adjust the replica count manually and found that a single replica was actually more stable in my setup.&lt;/p&gt;

&lt;p&gt;Here’s how I got Longhorn working reliably. This is based on a 3-node Kubernetes cluster with Proxmox VMs, each with a single storage disk. You can adjust the number of replicas and storage classes based on your cluster size and reliability needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install Longhorn via Helm
&lt;/h3&gt;

&lt;p&gt;First, install Longhorn with Helm. I used the official chart from the Longhorn repo.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add longhorn https://charts.longhorn.io
helm repo update
helm &lt;span class="nb"&gt;install &lt;/span&gt;longhorn longhorn/longhorn &lt;span class="nt"&gt;--namespace&lt;/span&gt; longhorn &lt;span class="nt"&gt;--create-namespace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will deploy the Longhorn manager, engine, and UI components.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Label Your Nodes
&lt;/h3&gt;

&lt;p&gt;Longhorn uses node labels to determine where to place volumes. Make sure your nodes are labeled properly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl label nodes &amp;lt;node-name&amp;gt; longhorn.io/host-attached-storage&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;kubectl label nodes &amp;lt;node-name&amp;gt; node-role.kubernetes.io/worker&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repeat for all worker nodes. This tells Longhorn that the node has storage available and is suitable for scheduling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Configure Storage Disks in Longhorn UI
&lt;/h3&gt;

&lt;p&gt;Open the Longhorn UI (usually at &lt;code&gt;http://&amp;lt;your-cluster-ip&amp;gt;:30000&lt;/code&gt;) and navigate to the &lt;strong&gt;Nodes&lt;/strong&gt; section. For each node, you'll see a list of disks. Make sure each disk you want to use is marked as &lt;code&gt;used&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can set the disk to &lt;code&gt;used&lt;/code&gt; directly in the UI. This tells Longhorn that the disk is available for volume creation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Create a Storage Class
&lt;/h3&gt;

&lt;p&gt;Now create a storage class in Kubernetes. I used the following for a single-replica setup (ideal for 3-node clusters with minimal redundancy but better performance):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;storage.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;StorageClass&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn-slow&lt;/span&gt;
&lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;numberOfReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;
  &lt;span class="na"&gt;staleReplicaTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2880"&lt;/span&gt; &lt;span class="c1"&gt;# 48 hours&lt;/span&gt;
  &lt;span class="na"&gt;fromBackup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="na"&gt;provisioner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;driver.longhorn.io&lt;/span&gt;
&lt;span class="na"&gt;reclaimPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Delete&lt;/span&gt;
&lt;span class="na"&gt;volumeBindingMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Immediate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; longhorn-slow.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also create a &lt;code&gt;longhorn-fast&lt;/code&gt; class with &lt;code&gt;numberOfReplicas: 2&lt;/code&gt; if you want more redundancy.&lt;/p&gt;
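&lt;p&gt;For completeness, here's a sketch of that &lt;code&gt;longhorn-fast&lt;/code&gt; variant; it differs from &lt;code&gt;longhorn-slow&lt;/code&gt; only in the name and replica count (a second replica buys redundancy at the cost of extra write traffic between nodes):&lt;/p&gt;

```yaml
# Two-replica StorageClass: every write is mirrored to a second node.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"   # 48 hours, same as longhorn-slow
reclaimPolicy: Delete
volumeBindingMode: Immediate
```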

&lt;h3&gt;
  
  
  Step 5: Create a PVC
&lt;/h3&gt;

&lt;p&gt;Now create a PersistentVolumeClaim that uses the storage class you just defined.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolumeClaim&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn-pvc&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReadWriteOnce&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;50Gi&lt;/span&gt;
  &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn-slow&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; longhorn-pvc.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then you can use it in a pod like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn-pod&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
      &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn-volume&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/usr/share/nginx/html&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn-volume&lt;/span&gt;
      &lt;span class="na"&gt;persistentVolumeClaim&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;claimName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn-pvc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply that and you should have a pod with a persistent volume backed by Longhorn.&lt;/p&gt;

&lt;p&gt;Longhorn works well on bare metal because it abstracts away the need for a full storage cluster. It doesn’t require a shared filesystem or a dedicated storage node—just one or more storage disks per node. The key to stability is labeling your nodes correctly and configuring the storage class with the right number of replicas.&lt;/p&gt;

&lt;p&gt;Longhorn uses a distributed engine to manage volume replication and snapshotting. When you write data to a volume, Longhorn creates a block device that mirrors the data across replicas. The more replicas you have, the more redundant the data is—but also the more resources it uses.&lt;/p&gt;

&lt;p&gt;By setting &lt;code&gt;numberOfReplicas: 1&lt;/code&gt;, I was able to get better performance and avoid the issues I had with multi-replica setups in a small cluster. Longhorn’s snapshot system is also great—it allows you to take snapshots and roll back to them if needed, which is especially useful for databases or stateful apps.&lt;/p&gt;

&lt;p&gt;Another thing that helps is the &lt;code&gt;staleReplicaTimeout&lt;/code&gt; setting. This tells Longhorn how long to wait before removing a stale replica that’s no longer in sync. I set it to 2880 (48 hours) to give myself time to investigate issues before cleanup.&lt;/p&gt;

&lt;p&gt;Here’s what I learned after running this setup for a few weeks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don’t assume all nodes can host storage&lt;/strong&gt;: I had one node that wasn’t labeled properly, and that caused volumes to fail to schedule. Make sure all nodes are labeled correctly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replicas = redundancy, not performance&lt;/strong&gt;: Setting &lt;code&gt;numberOfReplicas: 3&lt;/code&gt; in a 3-node cluster didn't improve performance—it just caused more overhead and failures. Stick to 1 or 2 replicas unless you really need it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage classes matter&lt;/strong&gt;: I had a few PVCs that failed because I didn’t use the right storage class. Make sure your pods are using the correct class for their use case.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Snapshots are powerful but not automatic&lt;/strong&gt;: Longhorn snapshots work well, but you have to manage them manually unless you set up Longhorn's recurring jobs or an external backup tool like Velero.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disk labels are important&lt;/strong&gt;: If your disks aren’t marked as &lt;code&gt;used&lt;/code&gt; in Longhorn, it won’t schedule volumes on them. That was a big gotcha I hit early on.&lt;/li&gt;
&lt;/ul&gt;
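&lt;p&gt;On the snapshot point above: Longhorn does ship a &lt;code&gt;RecurringJob&lt;/code&gt; CRD that automates snapshots on a cron schedule, so an external tool is mainly needed for off-cluster backups. A sketch, with an arbitrary schedule and retention:&lt;/p&gt;

```yaml
# Recurring snapshot job: snapshots volumes in the "default" group
# daily at 02:00 and keeps the last 7 snapshots per volume.
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-snapshot
  namespace: longhorn      # the namespace Longhorn was installed into
spec:
  cron: "0 2 * * *"
  task: snapshot
  groups:
    - default              # applies to volumes without an explicit group
  retain: 7
  concurrency: 2           # how many volumes to snapshot in parallel
```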

&lt;p&gt;Overall, I’m happy with how Longhorn works on bare metal. It’s lightweight, reliable, and doesn’t require a full storage cluster. It’s not perfect—there are some quirks regarding scheduling and snapshot management—but for a small to medium-sized cluster, it’s one of the best options I’ve found.&lt;/p&gt;

</description>
      <category>kubernetesstorage</category>
      <category>longhorn</category>
      <category>baremetal</category>
      <category>homelab</category>
    </item>
    <item>
      <title>Helm fullnameOverride: Naming Sanity in ArgoCD</title>
      <dc:creator>Guatu</dc:creator>
      <pubDate>Thu, 02 Apr 2026 20:45:19 +0000</pubDate>
      <link>https://dev.to/futhgar/helm-fullnameoverride-naming-sanity-in-argocd-5clo</link>
      <guid>https://dev.to/futhgar/helm-fullnameoverride-naming-sanity-in-argocd-5clo</guid>
      <description>&lt;p&gt;ArgoCD is great for GitOps, but when you start deploying multiple Helm charts, naming collisions become a real pain. I’ve spent hours digging through logs trying to figure out why a deployment failed because the wrong service was being referenced. That’s where &lt;code&gt;fullnameOverride&lt;/code&gt; comes in — it’s not just a feature, it’s a lifeline for keeping your Kubernetes resources uniquely identifiable.&lt;/p&gt;

&lt;p&gt;The key is to explicitly set &lt;code&gt;fullnameOverride&lt;/code&gt; in your ArgoCD app spec. This ensures that Helm uses a predictable name for your resources, preventing clashes between different apps or environments. Without it, Helm derives resource names from the release and chart (e.g. &lt;code&gt;my-release-redis&lt;/code&gt;), which is fine in isolation but becomes ambiguous fast once the same chart is deployed several times.&lt;/p&gt;

&lt;p&gt;Here’s how to do it in your ArgoCD app definition. This example sets a custom name for a Helm chart, ensuring it doesn’t clash with any other deployments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repoURL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://charts.bitnami.com/bitnami&lt;/span&gt;
    &lt;span class="na"&gt;chart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis&lt;/span&gt;
    &lt;span class="na"&gt;targetRevision&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;18.1.1&lt;/span&gt;
    &lt;span class="na"&gt;helm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;valueOverrides&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;fullnameOverride&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis-cache-prod"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This change ensures that every deployment of this chart gets the same consistent name, making it easier to manage, troubleshoot, and reference in other services or configurations. It also helps with resource discovery in ArgoCD, avoiding the confusion that comes with ambiguous names.&lt;/p&gt;
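&lt;p&gt;One compatibility note: structured Helm values in the Application spec need a reasonably recent ArgoCD (the &lt;code&gt;valuesObject&lt;/code&gt; field landed in 2.6). On older releases you can pass the same override as an inline &lt;code&gt;values&lt;/code&gt; string instead:&lt;/p&gt;

```yaml
# Same fullnameOverride, passed as ArgoCD's inline "values" string,
# which works on ArgoCD versions older than 2.6.
spec:
  source:
    repoURL: https://charts.bitnami.com/bitnami
    chart: redis
    targetRevision: 18.1.1
    helm:
      values: |
        fullnameOverride: "redis-cache-prod"
```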

&lt;p&gt;Don’t underestimate the power of a well-named resource — it’s the difference between a smooth deployment and a debugging nightmare.&lt;/p&gt;

</description>
      <category>helm</category>
      <category>argocd</category>
      <category>kubernetes</category>
      <category>naming</category>
    </item>
  </channel>
</rss>
