<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: George Ezejiofor</title>
    <description>The latest articles on DEV Community by George Ezejiofor (@ezejioforog).</description>
    <link>https://dev.to/ezejioforog</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F989441%2Fa7a14354-4598-4802-93ab-1a4c95ff90f7.jpeg</url>
      <title>DEV Community: George Ezejiofor</title>
      <link>https://dev.to/ezejioforog</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ezejioforog"/>
    <language>en</language>
    <item>
      <title>Homelab HA Kubernetes Cluster Upgrade: My New Shrine / Altar</title>
      <dc:creator>George Ezejiofor</dc:creator>
      <pubDate>Fri, 03 Apr 2026 21:24:36 +0000</pubDate>
      <link>https://dev.to/ezejioforog/homelab-ha-kubernetes-cluster-upgrade-my-new-shrine-altar-5c34</link>
      <guid>https://dev.to/ezejioforog/homelab-ha-kubernetes-cluster-upgrade-my-new-shrine-altar-5c34</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;INTRODUCTION&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;In the beginning, there was MicroK8s on a Mac Studio. It was fast, it was ARM64, but it was lonely. Today, I stand before a high-availability monument built on Proxmox, orchestrated by Terraform, and kept in holy alignment by FluxCD.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Not long ago, my entire Kubernetes universe lived inside a humble Mac Studio - a single MicroK8s cluster with 6 nodes running on ARM64. It was cute, quiet, and completely unfit for the kind of multi‑DC, production‑grade nonsense I wanted to learn.&lt;/p&gt;

&lt;p&gt;So I burned it down. And built this new place of worship.&lt;/p&gt;

&lt;p&gt;Today, I run a high‑availability kubeadm cluster across three bare‑metal Proxmox Datacenters, all managed with Terraform, Ansible, and FluxCD. No cloud vendor lock‑in. No magic. Just a rack full of metal, a bunch of cables, and a lot of terminal time.&lt;/p&gt;

&lt;p&gt;This is the story of my shrine - and how you can build one too.&lt;/p&gt;

&lt;h2&gt;
  
  
  UGLY WIRING:
&lt;/h2&gt;

&lt;p&gt;THE MAJOR REASON I CALLED IT A SHRINE 😂&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruu71fwpd17d8k0hi99i.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruu71fwpd17d8k0hi99i.jpeg" alt="homelab1" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Traffic Flow at a Glance&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before we dive into the layers, here's how the traffic moves from my "pulpit" (Mac Studio) to the "shrine" (the cluster):&lt;/p&gt;

&lt;p&gt;No inbound holes – all management traffic originates from my Mac or the cluster itself (GitOps pulls). This is how real datacenters work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;┌──────────────────────────────────────────────────────────────────────────────────────┐&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│                         🖥️ macOS COMMAND CENTER (The Pulpit)                         │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│              kubectl  │  Terraform  │  Ansible  │  Flux CLI  │  Git                  │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│                        (All management tools installed locally)                      │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;└─────────────────────────────────────────┬────────────────────────────────────────────┘&lt;/span&gt;
                                          &lt;span class="s"&gt;│&lt;/span&gt;
                          &lt;span class="s"&gt;SSH │ API (HTTPS) │ Git (SSH/HTTPS)&lt;/span&gt;
                                          &lt;span class="s"&gt;│&lt;/span&gt;
                                          &lt;span class="s"&gt;▼&lt;/span&gt;
&lt;span class="s"&gt;┌──────────────────────────────────────────────────────────────────────────────────────┐&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│                         🛡️ OPNsense Firewall (10.0.1.1)                              │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│   ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐                      │&lt;/span&gt;
&lt;span class="s"&gt;│   │ DHCP Server     │  │ Static DHCP     │  │ WireGuard VPN   │                      │&lt;/span&gt;
&lt;span class="s"&gt;│   │ 10.0.1.100-200  │  │ MAC → IP Pinning │  │ Remote Access   │                      │&lt;/span&gt;
&lt;span class="s"&gt;│   └─────────────────┘  └─────────────────┘  └─────────────────┘                      │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="na"&gt;│   • Split-Horizon DNS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="s"&gt;.georgehomelab.com → 10.0.1.x                               │&lt;/span&gt;
&lt;span class="s"&gt;│   • Gateway for all Proxmox + Kubernetes traffic                                     │&lt;/span&gt;
&lt;span class="na"&gt;│   • Firewall rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WAN → LAN passes for management                                  │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;└─────────────────────────────────────────┬────────────────────────────────────────────┘&lt;/span&gt;
                                          &lt;span class="s"&gt;│&lt;/span&gt;
                                          &lt;span class="s"&gt;│ LAN (10.0.1.0/16)&lt;/span&gt;
                                          &lt;span class="s"&gt;│ 2.5GbE Links&lt;/span&gt;
                                          &lt;span class="s"&gt;▼&lt;/span&gt;
&lt;span class="s"&gt;┌──────────────────────────────────────────────────────────────────────────────────────┐&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│                         🔌 Zyxel XMG1915-10E Switch                                  │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│                     Star topology │ 8× 2.5GbE + 2× SFP+                              │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;└─────────────────────────────────────────┬────────────────────────────────────────────┘&lt;/span&gt;
                                          &lt;span class="s"&gt;│&lt;/span&gt;
              &lt;span class="s"&gt;┌───────────────────────────┼───────────────────────────┐&lt;/span&gt;
              &lt;span class="s"&gt;│                           │                           │&lt;/span&gt;
              &lt;span class="s"&gt;▼                           ▼                           ▼&lt;/span&gt;
&lt;span class="s"&gt;┌─────────────────────────┐ ┌─────────────────────────┐ ┌─────────────────────────┐&lt;/span&gt;
&lt;span class="s"&gt;│                         │ │                         │ │                         │&lt;/span&gt;
&lt;span class="s"&gt;│  🏗️ Proxmox Node 1      │ │  🏗️ Proxmox Node 2      │ │  🏗️ Proxmox Node 3      │&lt;/span&gt;
&lt;span class="s"&gt;│  (proxmox-dc-1)         │ │  (proxmox-dc-2)         │ │  (proxmox-dc-3)         │&lt;/span&gt;
&lt;span class="s"&gt;│  10.0.1.10              │ │  10.0.1.11              │ │  10.0.1.12              │&lt;/span&gt;
&lt;span class="s"&gt;│                         │ │                         │ │                         │&lt;/span&gt;
&lt;span class="s"&gt;│  • Local ZFS Storage    │ │  • Local ZFS Storage    │ │  • Local ZFS Storage    │&lt;/span&gt;
&lt;span class="s"&gt;│  • vmbr0 Bridge         │ │  • vmbr0 Bridge         │ │  • vmbr0 Bridge         │&lt;/span&gt;
&lt;span class="s"&gt;│  • NFS Client (Backups) │ │  • NFS Client (Backups) │ │  • NFS Client (Backups) │&lt;/span&gt;
&lt;span class="s"&gt;│                         │ │                         │ │                         │&lt;/span&gt;
&lt;span class="s"&gt;│  Terraform → VM Creation via Proxmox API (telmate/provider)                          │&lt;/span&gt;
&lt;span class="s"&gt;│  Packer → Ubuntu Cloud-Init Templates                                                │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;└─────────────────────────┘ └─────────────────────────┘ └─────────────────────────┘&lt;/span&gt;
              &lt;span class="s"&gt;│                           │                           │&lt;/span&gt;
              &lt;span class="s"&gt;│ Cloud-Init DHCP (Static Reservations → Predictable IPs)                │&lt;/span&gt;
              &lt;span class="s"&gt;│                           │                           │&lt;/span&gt;
              &lt;span class="s"&gt;└───────────────────────────┼───────────────────────────┘&lt;/span&gt;
                                          &lt;span class="s"&gt;▼&lt;/span&gt;
&lt;span class="s"&gt;┌──────────────────────────────────────────────────────────────────────────────────────┐&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│                         ☸️ HA Kubernetes Cluster (kubeadm)                           │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│   ┌─────────────────────────┐ ┌─────────────────────────┐ ┌─────────────────────────┐&lt;/span&gt;
&lt;span class="s"&gt;│   │ Control Plane Node 1    │ │ Control Plane Node 2    │ │ Control Plane Node 3    │&lt;/span&gt;
&lt;span class="s"&gt;│   │ k8s-cp-1                │ │ k8s-cp-2                │ │ k8s-cp-3                │&lt;/span&gt;
&lt;span class="s"&gt;│   │ 10.0.1.110              │ │ 10.0.1.111              │ │ 10.0.1.112              │&lt;/span&gt;
&lt;span class="s"&gt;│   │                         │ │                         │ │                         │&lt;/span&gt;
&lt;span class="s"&gt;│   │ • etcd (stacked)        │ │ • etcd (stacked)        │ │ • etcd (stacked)        │&lt;/span&gt;
&lt;span class="s"&gt;│   │ • kube-apiserver        │ │ • kube-apiserver        │ │ • kube-apiserver        │&lt;/span&gt;
&lt;span class="s"&gt;│   │ • kube-vip (VIP)        │ │ • kube-vip (VIP)        │ │ • kube-vip (VIP)        │&lt;/span&gt;
&lt;span class="s"&gt;│   └─────────────────────────┘ └─────────────────────────┘ └─────────────────────────┘&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│   ┌─────────────────────────┐ ┌─────────────────────────┐ ┌─────────────────────────┐&lt;/span&gt;
&lt;span class="s"&gt;│   │ Worker Node 1           │ │ Worker Node 2           │ │ Worker Node 3           │&lt;/span&gt;
&lt;span class="s"&gt;│   │ k8s-worker-1            │ │ k8s-worker-2            │ │ k8s-worker-3            │&lt;/span&gt;
&lt;span class="s"&gt;│   │ 10.0.1.120              │ │ 10.0.1.121              │ │ 10.0.1.122              │&lt;/span&gt;
&lt;span class="s"&gt;│   │                         │ │                         │ │                         │&lt;/span&gt;
&lt;span class="s"&gt;│   │ • Calico CNI (BGP)      │ │ • Calico CNI (BGP)      │ │ • Calico CNI (BGP)      │&lt;/span&gt;
&lt;span class="s"&gt;│   │ • kube-proxy            │ │ • kube-proxy            │ │ • kube-proxy            │&lt;/span&gt;
&lt;span class="s"&gt;│   │ • Workload Pods         │ │ • Workload Pods         │ │ • Workload Pods         │&lt;/span&gt;
&lt;span class="s"&gt;│   └─────────────────────────┘ └─────────────────────────┘ └─────────────────────────┘&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="na"&gt;│   Pod CIDR: 10.244.0.0/16 │ Service CIDR: 10.245.0.0/16 │ MetalLB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.0.1.200-210  │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│   🔧 Bootstrapped entirely by Ansible (kubeadm playbook)                            │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;└─────────────────────────────────────────┬────────────────────────────────────────────┘&lt;/span&gt;
                                          &lt;span class="s"&gt;│&lt;/span&gt;
                                          &lt;span class="s"&gt;│ GitOps Sync (Outbound Only)&lt;/span&gt;
                                          &lt;span class="s"&gt;│ FluxCD pulls from GitHub (no inbound!)&lt;/span&gt;
                                          &lt;span class="s"&gt;▼&lt;/span&gt;
&lt;span class="s"&gt;┌──────────────────────────────────────────────────────────────────────────────────────┐&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│                         🔄 FluxCD System (Inside Cluster)                            │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│   ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐                      │&lt;/span&gt;
&lt;span class="s"&gt;│   │ source-         │  │ kustomize-      │  │ helm-           │                      │&lt;/span&gt;
&lt;span class="s"&gt;│   │ controller      │  │ controller      │  │ controller      │                      │&lt;/span&gt;
&lt;span class="s"&gt;│   └─────────────────┘  └─────────────────┘  └─────────────────┘                      │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│   ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐                      │&lt;/span&gt;
&lt;span class="s"&gt;│   │ notification-   │  │ image-reflector-│  │ image-          │                      │&lt;/span&gt;
&lt;span class="s"&gt;│   │ controller      │  │ controller      │  │ automation-     │                      │&lt;/span&gt;
&lt;span class="s"&gt;│   └─────────────────┘  └─────────────────┘  └─────────────────┘                      │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│   • Deployed as part of Ansible playbook (not a separate step)                       │&lt;/span&gt;
&lt;span class="s"&gt;│   • Continuously reconciles cluster state with Git                                   │&lt;/span&gt;
&lt;span class="s"&gt;│   • Auto-heals configuration drift                                                   │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;└─────────────────────────────────────────┬────────────────────────────────────────────┘&lt;/span&gt;
                                          &lt;span class="s"&gt;│&lt;/span&gt;
                                          &lt;span class="s"&gt;│ HTTPS/SSH (Outbound Pull)&lt;/span&gt;
                                          &lt;span class="s"&gt;│&lt;/span&gt;
                                          &lt;span class="s"&gt;▼&lt;/span&gt;
&lt;span class="s"&gt;┌──────────────────────────────────────────────────────────────────────────────────────┐&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│                              📦 GitHub Private Repository                            │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;│   ┌─────────────────────────────────────────────────────────────────────────────┐    │&lt;/span&gt;
&lt;span class="s"&gt;│   │  clusters/prod/                                                             │    │&lt;/span&gt;
&lt;span class="s"&gt;│   │  ├── flux-system/&lt;/span&gt;          &lt;span class="c1"&gt;# Flux bootstrapping config                      │    │&lt;/span&gt;
&lt;span class="s"&gt;│   │  │   ├── gotk-components.yaml                                               │    │&lt;/span&gt;
&lt;span class="s"&gt;│   │  │   └── gotk-sync.yaml                                                     │    │&lt;/span&gt;
&lt;span class="s"&gt;│   │  ├── apps/&lt;/span&gt;                  &lt;span class="c1"&gt;# Application deployments                       │    │&lt;/span&gt;
&lt;span class="s"&gt;│   │  │   ├── metallb/                                                          │    │&lt;/span&gt;
&lt;span class="s"&gt;│   │  │   ├── istio-ingress/                                                    │    │&lt;/span&gt;
&lt;span class="s"&gt;│   │  │   └── prometheus-stack/                                                 │    │&lt;/span&gt;
&lt;span class="s"&gt;│   │  └── infrastructure/&lt;/span&gt;        &lt;span class="c1"&gt;# Cluster-wide config                          │    │&lt;/span&gt;
&lt;span class="s"&gt;│   │      ├── namespaces.yaml                                                   │    │&lt;/span&gt;
&lt;span class="s"&gt;│   │      └── storage-class.yaml                                                │    │&lt;/span&gt;
&lt;span class="s"&gt;│   └─────────────────────────────────────────────────────────────────────────────┘    │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="na"&gt;│   🔑 Source of Truth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Every change starts as a PR, reviewed, merged, then applied   │&lt;/span&gt;
&lt;span class="s"&gt;│                                                                                      │&lt;/span&gt;
&lt;span class="s"&gt;└──────────────────────────────────────────────────────────────────────────────────────┘&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Level 1: The Physical Layer (The Foundations)
&lt;/h2&gt;

&lt;p&gt;Every altar begins with something tangible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware:&lt;/strong&gt; A fleet of three &lt;em&gt;Minisforum MS-01&lt;/em&gt; machines acting as the compute datacenter (&lt;em&gt;96GB RAM&lt;/em&gt; and an &lt;em&gt;8GB NVIDIA GPU&lt;/em&gt; in each machine). That's a total of &lt;em&gt;288GB RAM&lt;/em&gt; and &lt;em&gt;24GB of NVIDIA GPU memory&lt;/em&gt; across the three &lt;em&gt;Minisforum MS-01&lt;/em&gt; boxes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network Entry Point:&lt;/strong&gt; My Mac Studio (the “&lt;em&gt;pulpit&lt;/em&gt;”), connected via Wi-Fi&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Firewall:&lt;/strong&gt; OPNsense bridging external (192.168.1.x) to internal lab network (10.0.1.x)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Out-of-Band Access:&lt;/strong&gt; &lt;em&gt;TinyPilot Voyager 2a&lt;/em&gt; and &lt;em&gt;TESmart 4‑port HDMI KVM&lt;/em&gt; — BIOS-level control even when the OS is down&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Switch:&lt;/strong&gt; The &lt;em&gt;Zyxel XMG1915-10E (2.5GbE + SFP+)&lt;/em&gt; is the &lt;em&gt;central nervous system&lt;/em&gt;, carrying &lt;em&gt;high-velocity east-west traffic&lt;/em&gt; (&lt;em&gt;low latency / high throughput&lt;/em&gt; for etcd and storage).&lt;/p&gt;

&lt;p&gt;Why I worship here:&lt;/p&gt;

&lt;p&gt;Physical simplicity enables logical complexity.&lt;/p&gt;

&lt;p&gt;No mystery cables. Everything is deliberate. This playground gives me the opportunity to play with any cloud-native tool with ease.&lt;/p&gt;
&lt;h2&gt;
  
  
  Level 2: The Infrastructure Layer (Proxmox Datacenter)
&lt;/h2&gt;

&lt;p&gt;Before automation, there must be a foundation.&lt;/p&gt;

&lt;p&gt;Proxmox VE installed manually on all three Minisforum MS-01 machines.&lt;/p&gt;

&lt;p&gt;Clustered into a single datacenter abstraction&lt;/p&gt;
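&lt;p&gt;Clustering three standalone Proxmox hosts into one datacenter view is a short ritual of its own. As a hedged sketch (the cluster name is illustrative; the node IP follows the diagram above, so adapt it to your network), the CLI flow looks like this:&lt;/p&gt;

```shell
# On the first node: create the cluster (the name "shrine" is illustrative)
pvecm create shrine

# On each remaining node: join using the first node's IP
pvecm add 10.0.1.10

# From any node: verify quorum and membership
pvecm status
```
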

&lt;p&gt;Networking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vmbr0 → Kubernetes network&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Static host IPs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10.0.1.1x&lt;/li&gt;
&lt;li&gt;10.0.1.1x&lt;/li&gt;
&lt;li&gt;10.0.1.1x&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gateway: 10.0.1.x (OPNsense)&lt;/p&gt;

&lt;p&gt;Storage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local ZFS (NVMe)&lt;/li&gt;
&lt;li&gt;NFS for shared ISO + backups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ritual:&lt;/p&gt;

&lt;p&gt;I installed Proxmox VE manually on each machine via TinyPilot’s virtual media, driven from my Mac Studio browser over Wi‑Fi.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;No HDMI cable ever touched my desk.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Level 3: The Node Layer (Terraform Automation)
&lt;/h2&gt;

&lt;p&gt;I no longer click buttons to create infrastructure.&lt;/p&gt;

&lt;p&gt;I declare it.&lt;/p&gt;

&lt;p&gt;Using the Proxmox Terraform provider, I define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VM CPU, memory, disk&lt;/li&gt;
&lt;li&gt;Network interfaces&lt;/li&gt;
&lt;li&gt;Clone source (Ubuntu template from Packer)
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;resource "proxmox_vm_qemu" "k8s_node" {&lt;/span&gt;
  &lt;span class="s"&gt;for_each = var.nodes&lt;/span&gt;

  &lt;span class="s"&gt;name        = each.value.name&lt;/span&gt;
  &lt;span class="s"&gt;target_node = each.value.proxmox_node&lt;/span&gt;
  &lt;span class="s"&gt;clone       = "ubuntu-24-04-template"&lt;/span&gt;
  &lt;span class="s"&gt;cores       = each.value.cores&lt;/span&gt;
  &lt;span class="s"&gt;memory      = each.value.memory&lt;/span&gt;

  &lt;span class="s"&gt;network {&lt;/span&gt;
    &lt;span class="s"&gt;model     = "virtio"&lt;/span&gt;
    &lt;span class="s"&gt;bridge    = "vmbr0"&lt;/span&gt;
    &lt;span class="s"&gt;ipconfig0 = "ip=dhcp"&lt;/span&gt;
  &lt;span class="s"&gt;}&lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
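&lt;p&gt;The &lt;code&gt;var.nodes&lt;/code&gt; map that drives the &lt;code&gt;for_each&lt;/code&gt; above might look like this (a hypothetical shape with made-up sizing; adjust names, cores, and memory to your own fleet):&lt;/p&gt;

```hcl
# Hypothetical node map feeding the for_each above
variable "nodes" {
  type = map(object({
    name         = string
    proxmox_node = string
    cores        = number
    memory       = number
  }))
  default = {
    "k8s-cp-1"     = { name = "k8s-cp-1", proxmox_node = "proxmox-dc-1", cores = 4, memory = 8192 }
    "k8s-worker-1" = { name = "k8s-worker-1", proxmox_node = "proxmox-dc-1", cores = 6, memory = 16384 }
    # …one entry per control-plane and worker VM
  }
}
```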


&lt;p&gt;&lt;strong&gt;The DHCP Decision (And Why It Matters)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was one of the most important lessons in my journey.&lt;/p&gt;

&lt;p&gt;In my old Mac Studio setup, I used pure DHCP for Kubernetes nodes.&lt;/p&gt;

&lt;p&gt;It worked… until every restart broke my cluster access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What went wrong?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Control plane nodes changed IPs&lt;/li&gt;
&lt;li&gt;kubeconfig became invalid&lt;/li&gt;
&lt;li&gt;API server endpoints broke&lt;/li&gt;
&lt;li&gt;etcd stability was at risk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even with 3 &lt;strong&gt;control planes&lt;/strong&gt;, the cluster wasn’t truly stable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Not Static IPs?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because static IPs inside the OS mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manual netplan configuration&lt;/li&gt;
&lt;li&gt;Hardcoding network logic into templates&lt;/li&gt;
&lt;li&gt;Reduced rebuild flexibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not how cloud-native systems behave.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution: DHCP + Reservations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I used DHCP everywhere — but configured static reservations in OPNsense.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✔ Nodes auto-configure&lt;/li&gt;
&lt;li&gt;✔ IPs never change&lt;/li&gt;
&lt;li&gt;✔ Rebuilds are seamless&lt;/li&gt;
&lt;li&gt;✔ etcd remains stable&lt;/li&gt;
&lt;/ul&gt;
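&lt;p&gt;In OPNsense these reservations live in the DHCP UI, but conceptually each one is just a MAC-to-IP pin. The equivalent in classic ISC dhcpd syntax (the MAC addresses here are made up; the IPs match the diagram) would be:&lt;/p&gt;

```conf
# One reservation per node: same lease every time, forever
host k8s-cp-1 {
  hardware ethernet aa:bb:cc:00:00:10;
  fixed-address 10.0.1.110;
}
host k8s-worker-1 {
  hardware ethernet aa:bb:cc:00:00:20;
  fixed-address 10.0.1.120;
}
```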

&lt;p&gt;💡 The Real Insight&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Kubernetes doesn’t care how IPs are assigned — only that they don’t change.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Level 4: The Cluster Layer (Ansible + Kubeadm)
&lt;/h2&gt;

&lt;p&gt;Once the infrastructure exists, it must be transformed.&lt;/p&gt;

&lt;p&gt;Using &lt;strong&gt;Ansible:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OS hardening&lt;/li&gt;
&lt;li&gt;Swap disabled&lt;/li&gt;
&lt;li&gt;containerd installed&lt;/li&gt;
&lt;li&gt;kubeadm, kubelet, kubectl configured&lt;/li&gt;
&lt;/ul&gt;
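&lt;p&gt;A minimal sketch of what those prep tasks look like in an Ansible playbook (task names and ordering are illustrative, not the exact playbook):&lt;/p&gt;

```yaml
# Illustrative node-prep tasks for kubeadm
- name: Disable swap for the running session
  ansible.builtin.command: swapoff -a
  when: ansible_swaptotal_mb > 0

- name: Comment out swap entries in /etc/fstab
  ansible.builtin.replace:
    path: /etc/fstab
    regexp: '^([^#].*\sswap\s.*)$'
    replace: '# \1'

- name: Install containerd
  ansible.builtin.apt:
    name: containerd
    state: present
    update_cache: true

- name: Hold kubeadm, kubelet and kubectl at the installed version
  ansible.builtin.dpkg_selections:
    name: "{{ item }}"
    selection: hold
  loop: [kubeadm, kubelet, kubectl]
```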

&lt;p&gt;&lt;strong&gt;HA Control Plane&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 control plane nodes&lt;/li&gt;
&lt;li&gt;Stacked etcd (homelab-friendly)&lt;/li&gt;
&lt;li&gt;kube-vip for API virtual IP&lt;/li&gt;
&lt;/ul&gt;
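&lt;p&gt;The heart of the HA setup is pointing kubeadm at the kube-vip virtual IP instead of any single node. A hedged sketch of the init config (the VIP address is hypothetical; the pod and service CIDRs match the diagram above):&lt;/p&gt;

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# Hypothetical kube-vip VIP — substitute the address your kube-vip advertises
controlPlaneEndpoint: "10.0.1.99:6443"
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.245.0.0/16
```

&lt;p&gt;The first control plane runs &lt;code&gt;kubeadm init --config&lt;/code&gt; with this file; the other two join with &lt;code&gt;kubeadm join … --control-plane&lt;/code&gt;, so every node’s kubeconfig talks to the VIP, never to an individual machine.&lt;/p&gt;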
&lt;h2&gt;
  
  
  Level 5: The Application Layer (GitOps with FluxCD)
&lt;/h2&gt;

&lt;p&gt;This is where everything changes.&lt;/p&gt;

&lt;p&gt;Instead of imperative deployments or one-off declarative applies with kubectl, I use GitOps with &lt;strong&gt;FluxCD&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitOps From Day One&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;FluxCD is not an add-on.&lt;/p&gt;

&lt;p&gt;It is deployed &lt;strong&gt;during cluster creation via Ansible.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cluster is GitOps-ready immediately&lt;/li&gt;
&lt;li&gt;No manual bootstrap later&lt;/li&gt;
&lt;li&gt;No drift from day one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Pull Model&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flux runs inside the cluster&lt;/li&gt;
&lt;li&gt;Watches Git repository&lt;/li&gt;
&lt;li&gt;Pulls changes automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No inbound access required.&lt;/p&gt;
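&lt;p&gt;Under the hood, the pull model is just two Flux objects: a &lt;code&gt;GitRepository&lt;/code&gt; that watches the repo and a &lt;code&gt;Kustomization&lt;/code&gt; that applies its contents. A sketch of what &lt;code&gt;gotk-sync.yaml&lt;/code&gt; contains after bootstrap (the repo URL and branch are placeholders; the path matches the repo layout in the diagram):&lt;/p&gt;

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m0s
  url: ssh://git@github.com/<your-org>/<your-repo>   # placeholder
  ref:
    branch: main
  secretRef:
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./clusters/prod
  prune: true          # delete cluster objects removed from Git
  sourceRef:
    kind: GitRepository
    name: flux-system
```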

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flg296c0p1i1pfc2nycib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flg296c0p1i1pfc2nycib.png" alt="Homelab3" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjivr0wykhduzmj6cifew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjivr0wykhduzmj6cifew.png" alt="Homelab4" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8irk4e2v8dg8l1f52ukh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8irk4e2v8dg8l1f52ukh.png" alt="Homelab5" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traffic Flow&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;Mac Studio (192.168.1.x)&lt;/span&gt;
        &lt;span class="s"&gt;│&lt;/span&gt;
        &lt;span class="s"&gt;▼&lt;/span&gt;
&lt;span class="s"&gt;OPNsense Firewall (10.0.1.x)&lt;/span&gt;
        &lt;span class="s"&gt;│&lt;/span&gt;
        &lt;span class="s"&gt;▼&lt;/span&gt;
&lt;span class="s"&gt;Proxmox Cluster (10.0.1.1x–1x)&lt;/span&gt;
        &lt;span class="s"&gt;│&lt;/span&gt;
        &lt;span class="s"&gt;▼&lt;/span&gt;
&lt;span class="s"&gt;Kubernetes Nodes (DHCP → Reserved IPs)&lt;/span&gt;
        &lt;span class="s"&gt;│&lt;/span&gt;
        &lt;span class="s"&gt;▼&lt;/span&gt;
&lt;span class="s"&gt;FluxCD Controllers (inside cluster)&lt;/span&gt;
        &lt;span class="s"&gt;│&lt;/span&gt;
        &lt;span class="s"&gt;▼&lt;/span&gt;
&lt;span class="s"&gt;GitHub (OUTBOUND pull model)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Insight:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ GitHub never connects to your cluster&lt;/li&gt;
&lt;li&gt;❌ No firewall holes needed&lt;/li&gt;
&lt;li&gt;✅ Flux initiates outbound sync&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Current State of the Shrine&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 control plane nodes ✅&lt;/li&gt;
&lt;li&gt;3 worker nodes ✅&lt;/li&gt;
&lt;li&gt;etcd cluster healthy ✅&lt;/li&gt;
&lt;li&gt;Flux controllers distributed across nodes ✅&lt;/li&gt;
&lt;li&gt;Calico networking active ✅&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is no longer a lab.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is a self-healing platform.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9y003v3qkbbsr4llvqny.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9y003v3qkbbsr4llvqny.png" alt="Homelab2" width="800" height="639"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiu5ziz6sov9q4npljq4f.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiu5ziz6sov9q4npljq4f.gif" alt="Homelab6" width="760" height="682"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I Learned&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DHCP + reservations is the sweet spot&lt;/li&gt;
&lt;li&gt;etcd requires stable identity, not static config&lt;/li&gt;
&lt;li&gt;GitOps removes human drift completely&lt;/li&gt;
&lt;li&gt;Terraform + Ansible + FluxCD = powerful combination&lt;/li&gt;
&lt;li&gt;Firewalls must allow internal routing for automation&lt;/li&gt;
&lt;li&gt;Never use root API for automation — use scoped tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What’s Next on the Altar&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ceph or Longhorn for HA storage&lt;/li&gt;
&lt;li&gt;Velero for cluster backups&lt;/li&gt;
&lt;li&gt;External Secrets + Vault&lt;/li&gt;
&lt;li&gt;Cluster autoscaler experiments&lt;/li&gt;
&lt;/ul&gt;
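&lt;p&gt;For the backup item, a Velero Schedule along these lines is where I would likely start (the name, cron expression, and retention are illustrative, not a tested config):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-cluster-backup   # illustrative name
  namespace: velero
spec:
  schedule: "0 2 * * *"          # every night at 02:00
  template:
    includedNamespaces:
      - "*"
    ttl: 168h0m0s                # keep backups for 7 days
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;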

&lt;p&gt;&lt;strong&gt;Final Words&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This homelab is more than a project.&lt;/p&gt;

&lt;p&gt;It is a &lt;strong&gt;practice ground for real-world platform engineering.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The move from a single ARM node to a distributed HA cluster wasn’t just an upgrade in hardware — it was an upgrade in mindset.&lt;/p&gt;

&lt;p&gt;My Mac Studio is no longer the host.&lt;/p&gt;

&lt;p&gt;It is the &lt;strong&gt;pulpit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The Shrine runs independently.&lt;/p&gt;

&lt;p&gt;If you’re thinking of building something like this — do it.&lt;br&gt;
Start small. Break things. Rebuild them better.&lt;/p&gt;

&lt;p&gt;Now go build your own altar. 🛐&lt;/p&gt;

&lt;p&gt;🤝 Stay Connected&lt;/p&gt;

&lt;p&gt;Found this guide helpful? Follow my journey into homelabbing on LinkedIn: &lt;a href="https://www.linkedin.com/in/george-ezejiofor-89615a8a/" rel="noopener noreferrer"&gt;George Ezejiofor&lt;/a&gt;. Let’s keep building scalable, secure cloud-native systems, one project at a time! 🌐🔧&lt;/p&gt;

</description>
      <category>devops</category>
      <category>homelab</category>
      <category>kubernetes</category>
      <category>automation</category>
    </item>
    <item>
      <title>Secured AI‑Driven SRE Platform for Kubernetes Observability</title>
      <dc:creator>George Ezejiofor</dc:creator>
      <pubDate>Mon, 23 Mar 2026 21:18:19 +0000</pubDate>
      <link>https://dev.to/ezejioforog/secured-ai-driven-sre-platform-for-kubernetes-observability-eke</link>
      <guid>https://dev.to/ezejioforog/secured-ai-driven-sre-platform-for-kubernetes-observability-eke</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecnnfkfzpk22y78g8tim.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecnnfkfzpk22y78g8tim.png" alt="sre-gif" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;A complete guide to building an AI-driven investigation layer on top of a secured Kubernetes observability stack&lt;/strong&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Introduction — The Observability Problem
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Modern Kubernetes platforms are inherently complex.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A single production cluster can run hundreds of microservices, service mesh components, CI/CD controllers, and security systems — all evolving continuously across both application and infrastructure layers.&lt;br&gt;
Over the past few years, observability tooling has matured significantly. Platforms like Prometheus, Grafana, and Jaeger provide deep visibility into system behaviour.&lt;br&gt;
But during an incident, visibility alone is not enough.&lt;br&gt;
SREs are still required to manually interpret and correlate signals across multiple systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metrics must be queried and interpreted&lt;/li&gt;
&lt;li&gt;Logs must be searched and correlated&lt;/li&gt;
&lt;li&gt;Traces must be followed across service boundaries&lt;/li&gt;
&lt;li&gt;Infrastructure changes must be identified and linked to symptoms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Despite having all the data, the investigation process remains fundamentally manual.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Observability tools provide data — but they don’t provide reasoning.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The Real Bottleneck: Correlation and Time
&lt;/h2&gt;

&lt;p&gt;The challenge is no longer data collection.&lt;br&gt;
The real bottleneck is &lt;strong&gt;how quickly that data can be turned into understanding during an incident.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;In practice, incident response often involves:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;switching between multiple dashboards&lt;/li&gt;
&lt;li&gt;writing ad hoc queries&lt;/li&gt;
&lt;li&gt;forming and testing hypotheses&lt;/li&gt;
&lt;li&gt;mentally correlating signals across systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This process is:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;time-consuming&lt;/li&gt;
&lt;li&gt;cognitively demanding&lt;/li&gt;
&lt;li&gt;highly dependent on individual expertise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As systems scale, this model does not scale with them.&lt;/p&gt;
&lt;h2&gt;
  
  
  Rethinking Observability
&lt;/h2&gt;

&lt;p&gt;This project explores a different approach.&lt;/p&gt;

&lt;p&gt;Instead of treating observability as a collection of tools, it treats it as a reasoning problem.&lt;/p&gt;

&lt;p&gt;What if observability data could be investigated automatically by an AI-driven SRE platform — one that understands Kubernetes, infrastructure behaviour, and failure patterns, while operating within strict, read-only security boundaries?&lt;/p&gt;
&lt;h2&gt;
  
  
  From Signals to Intelligence
&lt;/h2&gt;

&lt;p&gt;The goal is to move from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dashboards → decisions&lt;/li&gt;
&lt;li&gt;alerts → investigations&lt;/li&gt;
&lt;li&gt;data → actionable insight&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By introducing an AI-driven investigation layer, the platform aims to reduce the time required to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;detect&lt;/li&gt;
&lt;li&gt;analyze&lt;/li&gt;
&lt;li&gt;and understand&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;failures in modern Kubernetes environments.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;A private AKS cluster behind Azure Firewall, deployed via Terraform Cloud and managed with GitOps (Argo CD with SSO) for application deployment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsg714eo53ko215noqes5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsg714eo53ko215noqes5.png" alt="argocd UI" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69vtgxlfc3es1uwlo76m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69vtgxlfc3es1uwlo76m.png" alt="argocd Dashboard" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0uqbhta8lnka4onpuag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0uqbhta8lnka4onpuag.png" alt="Observability stack1" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftekoh8vtksulh7agmusq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftekoh8vtksulh7agmusq.png" alt="Observability stack2" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatn7v1a5g6eg32c0jn20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatn7v1a5g6eg32c0jn20.png" alt="Observability stack3" width="800" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falea4gst9jocm05rd721.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falea4gst9jocm05rd721.png" alt="Observability stack4" width="800" height="304"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Core Observability Components
&lt;/h2&gt;

&lt;p&gt;The platform integrates multiple tools, each responsible for a specific signal type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics — Prometheus&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Prometheus is the central metrics engine of the platform.&lt;/p&gt;

&lt;p&gt;It is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scraping metrics from Kubernetes components&lt;/li&gt;
&lt;li&gt;collecting node and pod-level telemetry&lt;/li&gt;
&lt;li&gt;storing time-series data&lt;/li&gt;
&lt;li&gt;enabling PromQL-based querying&lt;/li&gt;
&lt;/ul&gt;
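&lt;p&gt;With the Prometheus Operator stack, scrape targets are typically declared as ServiceMonitors rather than hand-edited scrape configs. A hedged sketch (the label selector, namespace, and port name are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: echo-pod
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: echo-pod        # must match the target Service's labels
  namespaceSelector:
    matchNames:
      - terranetes
  endpoints:
    - port: metrics        # named Service port exposing /metrics
      interval: 30s
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;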

&lt;p&gt;From my cluster:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prometheus-prometheus-prometheus-0&lt;/li&gt;
&lt;li&gt;prometheus-node-exporter-*&lt;/li&gt;
&lt;li&gt;kube-state-metrics&lt;/li&gt;
&lt;li&gt;blackbox-exporter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These components allow the system to answer questions about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU / memory spikes&lt;/li&gt;
&lt;li&gt;pod restarts&lt;/li&gt;
&lt;li&gt;service latency trends&lt;/li&gt;
&lt;/ul&gt;
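&lt;p&gt;Hedged PromQL examples for those three questions (metric names follow the standard cAdvisor and kube-state-metrics conventions; the namespace filter and histogram name are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# CPU / memory usage per pod
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="terranetes"}[5m]))
sum by (pod) (container_memory_working_set_bytes{namespace="terranetes"})

# Pod restarts over the last hour
increase(kube_pod_container_status_restarts_total{namespace="terranetes"}[1h])

# p95 service latency trend (assumes a histogram named http_request_duration_seconds)
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;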

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9rwdqmesayqk7kdnhlp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9rwdqmesayqk7kdnhlp.png" alt="Observability stack5" width="800" height="270"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Visualisation — Grafana
&lt;/h2&gt;

&lt;p&gt;Grafana provides real-time dashboards and visualisations.&lt;/p&gt;

&lt;p&gt;Role in the platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;visualising Prometheus metrics&lt;/li&gt;
&lt;li&gt;building SRE dashboards&lt;/li&gt;
&lt;li&gt;supporting manual and AI-assisted investigations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example pod:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prometheus-grafana-*&lt;/li&gt;
&lt;/ul&gt;
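&lt;p&gt;The Prometheus datasource can be wired into Grafana declaratively through its provisioning files; a minimal sketch (the in-cluster service URL is an assumption about the Helm release name):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-operated.monitoring.svc:9090  # assumed service name
    isDefault: true
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;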

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu71agu0ogou2uwqigyp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu71agu0ogou2uwqigyp.png" alt="Observability stack6" width="800" height="268"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Logs — Elasticsearch + Kibana
&lt;/h2&gt;

&lt;p&gt;Logging is handled using &lt;strong&gt;Elasticsearch&lt;/strong&gt; and &lt;strong&gt;Kibana&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Responsibilities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;centralized log aggregation&lt;/li&gt;
&lt;li&gt;indexing and searching logs&lt;/li&gt;
&lt;li&gt;enabling correlation with metrics and traces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;From my cluster:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;elasticsearch-es-default-0&lt;/li&gt;
&lt;li&gt;kibana-kb-*&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This enables:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;application log analysis&lt;/li&gt;
&lt;li&gt;error tracing&lt;/li&gt;
&lt;li&gt;debugging failed workloads&lt;/li&gt;
&lt;/ul&gt;
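&lt;p&gt;For example, a Kibana Dev Tools query like this surfaces recent errors from one namespace (the index pattern and field names assume a typical Kubernetes log-shipper layout, not necessarily this cluster's mapping):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET logs-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "kubernetes.namespace": "terranetes" } },
        { "match": { "log.level": "error" } },
        { "range": { "@timestamp": { "gte": "now-15m" } } }
      ]
    }
  },
  "sort": [ { "@timestamp": "desc" } ],
  "size": 20
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;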

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy6sbbe7849w0mn1eyv9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy6sbbe7849w0mn1eyv9.png" alt="Observability stack7" width="800" height="269"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Distributed Tracing - Jaeger
&lt;/h2&gt;

&lt;p&gt;Jaeger provides end-to-end request tracing across services.&lt;br&gt;
Role:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;track request flow across microservices&lt;/li&gt;
&lt;li&gt;identify latency bottlenecks&lt;/li&gt;
&lt;li&gt;debug service-to-service communication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From my cluster:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;jaeger-*&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is critical in service mesh environments where requests traverse multiple services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzqpz887u3aoihcvte36.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzqpz887u3aoihcvte36.png" alt="Observability stack8" width="800" height="270"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Service Mesh Observability - Kiali
&lt;/h2&gt;

&lt;p&gt;Kiali is used to visualize the service mesh topology.&lt;br&gt;
It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;traffic flow visualization&lt;/li&gt;
&lt;li&gt;service dependencies&lt;/li&gt;
&lt;li&gt;health status of services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From my cluster:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;kiali-*&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially useful with Istio-based environments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft271z68mzsqqrquamwce.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft271z68mzsqqrquamwce.png" alt="Observability stack9" width="800" height="270"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Telemetry Collection - OpenTelemetry
&lt;/h2&gt;

&lt;p&gt;OpenTelemetry acts as the &lt;strong&gt;telemetry pipeline.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Responsibilities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;collecting metrics, logs, and traces&lt;/li&gt;
&lt;li&gt;exporting data to observability backends&lt;/li&gt;
&lt;li&gt;enabling standardized instrumentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;From my cluster:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;opentelemetry-collector-*&lt;/li&gt;
&lt;li&gt;opentelemetry-operator-*&lt;/li&gt;
&lt;li&gt;opentelemetry-ebpf-instrumentation-*&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The use of &lt;strong&gt;eBPF-based instrumentation&lt;/strong&gt; is particularly powerful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no application code changes required&lt;/li&gt;
&lt;li&gt;deep kernel-level visibility&lt;/li&gt;
&lt;li&gt;automatic tracing and metrics collection&lt;/li&gt;
&lt;/ul&gt;
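&lt;p&gt;A trimmed OpenTelemetry Collector pipeline for this kind of setup might look like the following (the exporter endpoints are placeholders for the in-cluster services, not my exact configuration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch: {}
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"           # scraped by Prometheus
  otlp/jaeger:
    endpoint: "jaeger-collector:4317"  # assumed Jaeger service name
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;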

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnc3y989rg1mjgo8y831i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnc3y989rg1mjgo8y831i.png" alt="Observability stack10" width="800" height="265"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Kubernetes State Monitoring
&lt;/h2&gt;

&lt;p&gt;Additional components provide insight into cluster state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;kube-state-metrics&lt;/strong&gt; → Kubernetes object state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cAdvisor&lt;/strong&gt; → container resource usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;node-exporter&lt;/strong&gt; → node-level metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These ensure the platform can inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deployments&lt;/li&gt;
&lt;li&gt;pods&lt;/li&gt;
&lt;li&gt;nodes&lt;/li&gt;
&lt;li&gt;resource utilization&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Alerting - Alertmanager
&lt;/h2&gt;

&lt;p&gt;While Prometheus is responsible for collecting and evaluating metrics, &lt;strong&gt;Alertmanager&lt;/strong&gt; handles what happens &lt;strong&gt;when something goes wrong.&lt;/strong&gt;&lt;br&gt;
In this platform, Alertmanager acts as the &lt;strong&gt;bridge between detection and response.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Role in the Platform
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Alertmanager&lt;/strong&gt; is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;receiving alerts from Prometheus&lt;/li&gt;
&lt;li&gt;grouping and deduplicating alerts&lt;/li&gt;
&lt;li&gt;routing alerts to the appropriate channels&lt;/li&gt;
&lt;li&gt;managing alert silencing and escalation&lt;/li&gt;
&lt;/ul&gt;
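&lt;p&gt;Grouping, deduplication, and routing are all expressed in the Alertmanager configuration; a minimal sketch (receiver names and timings here are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;route:
  group_by: ['alertname', 'namespace']  # collapse related alerts into one notification
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: default
  routes:
    - matchers:
        - severity = "critical"        # escalate critical alerts separately
      receiver: pager
receivers:
  - name: default
  - name: pager
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;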

&lt;p&gt;From my cluster:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;alertmanager-prometheus-alertmanager-0&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Why Alertmanager Matters
&lt;/h2&gt;

&lt;p&gt;In traditional setups, Alertmanager typically sends notifications to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;email&lt;/li&gt;
&lt;li&gt;Slack&lt;/li&gt;
&lt;li&gt;PagerDuty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, this introduces a key limitation:&lt;br&gt;
&lt;em&gt;&lt;strong&gt;Alerts notify humans - but they do not investigate the problem.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This creates a workflow like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;Alert Triggered&lt;/span&gt;
     &lt;span class="s"&gt;↓&lt;/span&gt;
&lt;span class="s"&gt;Human SRE Responds&lt;/span&gt;
     &lt;span class="s"&gt;↓&lt;/span&gt;
&lt;span class="s"&gt;Manual Investigation Begins&lt;/span&gt;
     &lt;span class="s"&gt;↓&lt;/span&gt;
&lt;span class="s"&gt;Root Cause Found (Eventually)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This process is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reactive&lt;/li&gt;
&lt;li&gt;slow&lt;/li&gt;
&lt;li&gt;dependent on human availability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fglvm9mx9a2lplm4g4m7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fglvm9mx9a2lplm4g4m7z.png" alt="observabiltiy stack13" width="800" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Stack Matters
&lt;/h2&gt;

&lt;p&gt;Each tool provides a piece of the puzzle, but none provides full understanding alone.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1tghaewet5nlry040ay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1tghaewet5nlry040ay.png" alt="observability stack14" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Limitation
&lt;/h2&gt;

&lt;p&gt;Despite this powerful observability stack, these tools provide data, not reasoning.&lt;br&gt;
An SRE still has to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;jump between dashboards&lt;/li&gt;
&lt;li&gt;write queries manually&lt;/li&gt;
&lt;li&gt;correlate signals mentally&lt;/li&gt;
&lt;li&gt;identify root causes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;time-consuming&lt;/li&gt;
&lt;li&gt;error-prone&lt;/li&gt;
&lt;li&gt;not scalable&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Transition to the Next Section
&lt;/h2&gt;

&lt;p&gt;This limitation led to the key idea behind this project:&lt;br&gt;
What if we could add an intelligent layer on top of this observability stack - one that can reason across metrics, logs, traces, and infrastructure changes automatically?&lt;br&gt;
This is where the AI-driven SRE platform comes in.&lt;/p&gt;
&lt;h2&gt;
  
  
  Project Vision
&lt;/h2&gt;

&lt;p&gt;The goal of this project is simple:&lt;br&gt;
Build a secure AI-driven SRE platform that can investigate Kubernetes incidents by correlating signals across observability systems and the cloud control plane.&lt;br&gt;
The platform focuses on investigation, not just visibility.&lt;br&gt;
&lt;strong&gt;Core capabilities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Investigate incidents end-to-end&lt;/li&gt;
&lt;li&gt;Query observability systems programmatically&lt;/li&gt;
&lt;li&gt;Understand Kubernetes state in real time&lt;/li&gt;
&lt;li&gt;Correlate cloud-level events (Azure Activity Logs)&lt;/li&gt;
&lt;li&gt;Attribute changes to identities (Entra ID)&lt;/li&gt;
&lt;li&gt;Operate under strict read-only RBAC&lt;/li&gt;
&lt;/ul&gt;
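&lt;p&gt;The read-only boundary can be enforced with a ClusterRole that grants only get/list/watch verbs; a minimal sketch (the name and exact resource list are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ai-sre-readonly   # illustrative name
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "pods/log", "events", "nodes", "deployments", "replicasets", "jobs"]
    verbs: ["get", "list", "watch"]   # no create, update, patch, or delete
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;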

&lt;p&gt;&lt;strong&gt;The design principle is clear:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;em&gt;No write access. No assumptions. Only evidence-driven analysis.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  High-Level Platform Architecture
&lt;/h2&gt;

&lt;p&gt;The platform is structured into four layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;AI SRE Reasoning Layer&lt;/span&gt;
        &lt;span class="s"&gt;│&lt;/span&gt;
&lt;span class="s"&gt;Agent Tooling Layer (MCP)&lt;/span&gt;
        &lt;span class="s"&gt;│&lt;/span&gt;
&lt;span class="s"&gt;Observability Platform&lt;/span&gt;
        &lt;span class="s"&gt;│&lt;/span&gt;
&lt;span class="s"&gt;Kubernetes + Azure Infrastructure&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Agentic SRE Model
&lt;/h2&gt;

&lt;p&gt;The platform follows an agent-based investigation model.&lt;/p&gt;

&lt;h2&gt;
  
  
  George-GPT (Lead SRE Agent)
&lt;/h2&gt;

&lt;p&gt;Acts as the coordinator.&lt;br&gt;
Responsibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;receive investigation requests&lt;/li&gt;
&lt;li&gt;decide which agents to use&lt;/li&gt;
&lt;li&gt;aggregate findings&lt;/li&gt;
&lt;li&gt;produce root cause analysis&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Specialist Agents
&lt;/h2&gt;

&lt;p&gt;Each agent is scoped to a single domain:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frhsrcnw4awpyoxzfxamp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frhsrcnw4awpyoxzfxamp.png" alt="Observability stack16" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Delegation Model
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The lead agent does not query systems directly&lt;/li&gt;
&lt;li&gt;It delegates tasks to specialist agents&lt;/li&gt;
&lt;li&gt;Each agent returns structured results&lt;/li&gt;
&lt;li&gt;Final output is synthesised into a single conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps reasoning focused and composable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr36ir1lx0buqekzkbb3a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr36ir1lx0buqekzkbb3a.png" alt="Observalbility stack17" width="800" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9fwlauqileanfefsflvx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9fwlauqileanfefsflvx.png" alt="Observatility stack18" width="800" height="269"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1vo9vutn5avjt426m185.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1vo9vutn5avjt426m185.png" alt="Observatility stack19" width="800" height="269"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgh54ssasrsoqhqthk4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgh54ssasrsoqhqthk4w.png" alt="Observatility stack20" width="800" height="271"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F326z5zr7qx5x5x347eml.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F326z5zr7qx5x5x347eml.png" alt="Observatility stack21" width="800" height="273"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  MY CHAT WITH GEORGE-GPT
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;user&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;who are you and what can you do?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;kagent/george-gpt&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I am GeorgeGPT, Lead SRE Engineer and orchestrator for a production Kubernetes cluster, powered by Gemini 2.5 Flash. I am a specialized SRE agent, not a generic AI.&lt;/p&gt;

&lt;p&gt;I work alongside several peer agents, all deployed in the kagent namespace and powered by Gemini Flash-Lite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;k8s-agent&lt;/strong&gt;: Kubernetes resource management&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;helm-agent&lt;/strong&gt;: Helm release management&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;istio-agent&lt;/strong&gt;: Istio service mesh analysis&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;promql-agent&lt;/strong&gt;: Prometheus PromQL queries&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;observability-agent&lt;/strong&gt;: Grafana dashboards, alerts, and datasources&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;k8sgpt&lt;/strong&gt;: Automated Kubernetes diagnostics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I can also talk to 11 specialized MCP (Model Context Protocol) servers, which expose a total of 74 tools for various SRE tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;kagent-tool-server&lt;/strong&gt; (namespace: kagent) — 16 tools for Kubernetes resources, Helm releases, Istio proxy status, and Grafana operations (k8s_get_resources, helm_list_releases, istio_proxy_status, grafana_get_dashboard, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mcp-elasticsearch&lt;/strong&gt; (namespace: mcp-servers) — 5 tools for Elasticsearch log search, ESQL queries, and cluster health (search, esql, list_indices, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mcp-jaeger&lt;/strong&gt; (namespace: mcp-servers) — 8 tools for Jaeger distributed tracing (list_jaeger_services, search_jaeger_traces, get_error_jaeger_traces, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mcp-kiali&lt;/strong&gt; (namespace: mcp-servers) — 9 tools for Kiali Istio service mesh observability (diagnose_503, get_traffic_graph, get_service_health, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mcp-auth&lt;/strong&gt; (namespace: mcp-servers) — 3 tools for JWT/OIDC token analysis (diagnose_auth_failure, validate_jwt, decode_jwt)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mcp-keycloak&lt;/strong&gt; (namespace: mcp-servers) — 5 tools for Keycloak identity management (diagnose_kubeflow_auth, get_client_config, get_realm_settings, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mcp-oauth2-proxy&lt;/strong&gt; (namespace: mcp-servers) — 7 tools for OAuth2 proxy diagnostics (diagnose_kubeflow_503, get_oauth2_proxy_config, ping_oauth2_proxy, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mcp-otel&lt;/strong&gt; (namespace: mcp-servers) — 3 tools for OpenTelemetry trace correlation (correlate_trace, get_error_traces, search_traces)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mcp-orchestrator&lt;/strong&gt; (namespace: mcp-servers) — 5 tools for incident lifecycle management (start_incident, add_finding, resolve_incident, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;george-gpt&lt;/strong&gt; (namespace: mcp-servers) — 13 tools for custom SRE tasks including alerts, PromQL queries, and kubectl/helm remediation (check_alerts, run_promql, investigate, kubectl_remediate, helm_remediate, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mcp-prometheus&lt;/strong&gt; (namespace: mcp-servers) — not currently used; I use run_promql() from george-gpt for Prometheus queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My primary function is to investigate and resolve production incidents in the Kubernetes cluster by intelligently calling these tools, correlating data across various observability signals (metrics, logs, traces, Kubernetes state), identifying root causes, and providing actionable remediation steps. I follow comprehensive playbooks for common scenarios like login failures, pod evictions, and service mesh 503 errors.&lt;/p&gt;
&lt;h2&gt;
  
  
  TESTING
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;## image: georgeezejiofor/echo-pod:blue-v1  (correct image)&lt;/span&gt;
&lt;span class="s"&gt;kubectl create ns terranetes&lt;/span&gt;
&lt;span class="s"&gt;kubectl create deployment echo-pod --image=georgeezejiofor/echo-pod:blue-bad-v1 -n terranetes --replicas=2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmgzfx9h7xqftemriqom.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmgzfx9h7xqftemriqom.png" alt="Observatility stack22" width="800" height="92"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6hpbbp60mrji8611iwr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6hpbbp60mrji8611iwr.png" alt="Observatility stack23" width="800" height="272"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nxz3wvo7qglf38vge30.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nxz3wvo7qglf38vge30.png" alt="Observatility stack24" width="800" height="269"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  TEST RESULTS
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;user&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;why is the pod in terranetes namespace not running?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;kagent/george-gpt&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;k8s_get_resources adk-485b17df-a174-4d49-a5da-c663935622d9 Completed&lt;/p&gt;

&lt;p&gt;Arguments { "namespace": "terranetes", "resource_type": "pod" }&lt;/p&gt;

&lt;p&gt;Results k8s_describe_resource adk-f6786634-19b3-4da0-a1fb-bd87c306e03d Completed&lt;/p&gt;

&lt;p&gt;Arguments { "namespace": "terranetes", "resource_name": "echo-pod-779cb697d4-czvgp", "resource_type": "pod" }&lt;/p&gt;

&lt;p&gt;Results:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name:             echo-pod-779cb697d4-czvgp
Namespace:        terranetes
Priority:         0
Service Account:  default
Node:             aks-system-30404036-vmss000000/10.0.0.100
Start Time:       Tue, 17 Mar 2026 21:55:51 +0000
Labels:           app=echo-pod
                  pod-template-hash=779cb697d4
Status:           Pending
IP:               10.0.0.132
Controlled By:    ReplicaSet/echo-pod-779cb697d4
Containers:
  echo-pod:
    Image:          georgeezejiofor/echo-pod:blue-bad-v1
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fkdjx (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  kube-api-access-fkdjx:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
QoS Class:                    BestEffort
Tolerations:                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  8m57s                   default-scheduler  Successfully assigned terranetes/echo-pod-779cb697d4-czvgp to aks-system-30404036-vmss000000
  Normal   Pulling    5m59s (x5 over 8m56s)   kubelet            Pulling image "georgeezejiofor/echo-pod:blue-bad-v1"
  Warning  Failed     5m59s (x5 over 8m56s)   kubelet            Failed to pull image "georgeezejiofor/echo-pod:blue-bad-v1": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/georgeezejiofor/echo-pod:blue-bad-v1": failed to resolve reference "docker.io/georgeezejiofor/echo-pod:blue-bad-v1": docker.io/georgeezejiofor/echo-pod:blue-bad-v1: not found
  Warning  Failed     5m59s (x5 over 8m56s)   kubelet            Error: ErrImagePull
  Warning  Failed     3m49s (x20 over 8m55s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    3m37s (x21 over 8m55s)  kubelet            Back-off pulling image "georgeezejiofor/echo-pod:blue-bad-v1"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;k8s_get_events adk-cd6d9711-1275-4279-9c30-5beb4d1f50d4 Completed&lt;/p&gt;

&lt;p&gt;Arguments { "namespace": "terranetes" }&lt;/p&gt;

&lt;p&gt;Results kubectl_remediate adk-6b617523-0d16-4b84-95b3-1bf8de120c84 Failed&lt;/p&gt;

&lt;p&gt;Arguments { "args": "patch deployment echo-pod -n terranetes --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "georgeezejiofor/echo-pod:blue"}]' ", "dry_run": false }&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;kagent/george-gpt&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The pods in the terranetes namespace are not running because the Kubernetes cluster is unable to pull the specified container image.&lt;/p&gt;

&lt;p&gt;Root Cause: The pods echo-pod-779cb697d4-czvgp and echo-pod-779cb697d4-x8bbd are in an ImagePullBackOff state. The error message from the kubelet clearly indicates: Failed to pull image "georgeezejiofor/echo-pod:blue-bad-v1": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/georgeezejiofor/echo-pod:blue-bad-v1": failed to resolve reference "docker.io/georgeezejiofor/echo-pod:blue-bad-v1": docker.io/georgeezejiofor/echo-pod:blue-bad-v1: not found&lt;/p&gt;

&lt;p&gt;This means the container image georgeezejiofor/echo-pod:blue-bad-v1 cannot be found on Docker Hub (docker.io).&lt;/p&gt;

&lt;p&gt;Impact: The echo-pod application in the terranetes namespace is completely non-functional, as its pods cannot start.&lt;/p&gt;

&lt;p&gt;Remediation Steps:&lt;/p&gt;

&lt;p&gt;Verify Image Name and Tag: The primary action is to confirm that the image georgeezejiofor/echo-pod with the tag blue-bad-v1 actually exists in your Docker Hub repository. There might be a typo in the image name or tag within the Kubernetes Deployment configuration.&lt;/p&gt;

&lt;p&gt;Update the Deployment with the Correct Image: If the image name or tag is incorrect, you need to update the echo-pod Deployment in the terranetes namespace to reference the correct, existing image.&lt;/p&gt;

&lt;p&gt;Assuming your deployment is named echo-pod, you can use the following kubectl command to patch the deployment with a corrected image (replace georgeezejiofor/echo-pod:blue with the actual correct image and tag):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl patch deployment echo-pod -n terranetes --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "georgeezejiofor/echo-pod:blue"}]'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After running this command, Kubernetes will attempt to pull the new, correct image, and the pods should start running. You can verify this by checking the pod status again using kubectl get pods -n terranetes.&lt;/p&gt;
&lt;h2&gt;
  
  
  Live Incident Example - GEORGE-GPT Resolved ImagePullBackOff in &amp;lt;2 Minutes
&lt;/h2&gt;

&lt;p&gt;The pods immediately went into Pending → ImagePullBackOff. Query to &lt;strong&gt;GEORGE-GPT:&lt;/strong&gt; "why is the pod in terranetes namespace not running?" What happened next (real-time tool calls):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;k8s_get_resources&lt;/code&gt; - listed all pods in terranetes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;k8s_describe_resource&lt;/code&gt; - pulled the detailed pod description for one failing pod&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;k8s_get_events&lt;/code&gt; - fetched namespace events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GEORGE-GPT's immediate conclusion:&lt;/strong&gt; The pods … are not running because the Kubernetes cluster is unable to pull the specified container image. Root Cause: … ImagePullBackOff … failed to resolve reference "docker.io/georgeezejiofor/echo-pod:blue-bad-v1": … not found. Impact: The echo-pod application in the terranetes namespace is completely non-functional, as its pods cannot start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remediation recommendation (auto-generated safe command):&lt;/strong&gt; George-GPT suggested how to fix the error; it has no permission to make any changes to the AKS architecture.&lt;/p&gt;
&lt;h2&gt;
  
  
  Tool-Backed Reasoning (MCP)
&lt;/h2&gt;

&lt;p&gt;Agents interact with systems through &lt;strong&gt;MCP servers (Model Context Protocol)&lt;/strong&gt;.&lt;br&gt;
This enforces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;structured queries&lt;/li&gt;
&lt;li&gt;controlled access&lt;/li&gt;
&lt;li&gt;consistent outputs&lt;/li&gt;
&lt;/ul&gt;
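&lt;p&gt;Concretely, every agent request is a structured tool call rather than free-form cluster access; it looks roughly like this (illustrative shape, matching the tool signatures shown earlier):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "tool": "k8s_get_resources",
  "arguments": { "namespace": "terranetes", "resource_type": "pod" }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;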
&lt;h2&gt;
  
  
  Why MCP
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;prevents arbitrary access&lt;/li&gt;
&lt;li&gt;standardizes interactions&lt;/li&gt;
&lt;li&gt;improves reliability of results&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Incident Investigation Walkthrough
&lt;/h2&gt;

&lt;p&gt;This section demonstrates the system in action.&lt;br&gt;
Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ingress gateway failure&lt;/li&gt;
&lt;li&gt;traffic disruption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;user submits investigation query&lt;/li&gt;
&lt;li&gt;agents collect evidence&lt;/li&gt;
&lt;li&gt;metrics are analyzed&lt;/li&gt;
&lt;li&gt;logs are inspected&lt;/li&gt;
&lt;li&gt;Azure activity logs are checked&lt;/li&gt;
&lt;li&gt;root cause is identified&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Security by Design
&lt;/h2&gt;

&lt;p&gt;Security is enforced at every layer.&lt;/p&gt;
&lt;h2&gt;
  
  
  Read-Only RBAC
&lt;/h2&gt;

&lt;p&gt;Agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cannot create resources&lt;/li&gt;
&lt;li&gt;cannot modify resources&lt;/li&gt;
&lt;li&gt;cannot delete resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They can only:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read cluster state&lt;/li&gt;
&lt;li&gt;query telemetry&lt;/li&gt;
&lt;/ul&gt;
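&lt;p&gt;As a sketch, a read-only posture like this maps to a Kubernetes ClusterRole along these lines (illustrative only; the role name and API group list are hypothetical, not the exact manifest used here):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: agent-readonly                # hypothetical name
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "pods/log", "deployments", "events", "jobs"]
    verbs: ["get", "list", "watch"]   # read-only: no create, update, patch, or delete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;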
&lt;h2&gt;
  
  
  GitOps Deployment
&lt;/h2&gt;

&lt;p&gt;All changes flow through:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;GitHub&lt;/span&gt;
   &lt;span class="s"&gt;↓&lt;/span&gt;
&lt;span class="s"&gt;GitHub App (OIDC)&lt;/span&gt;
   &lt;span class="s"&gt;↓&lt;/span&gt;
&lt;span class="s"&gt;ArgoCD&lt;/span&gt;
   &lt;span class="s"&gt;↓&lt;/span&gt;
&lt;span class="s"&gt;AKS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This guarantees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no manual changes&lt;/li&gt;
&lt;li&gt;full traceability&lt;/li&gt;
&lt;li&gt;secure authentication&lt;/li&gt;
&lt;/ul&gt;
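&lt;p&gt;For illustration, the ArgoCD half of that flow is a declarative Application. A minimal sketch (repo URL, app name, and paths are placeholders, not the real ones):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mcp-servers                 # placeholder
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/platform-gitops   # placeholder
    targetRevision: main
    path: clusters/aks/mcp-servers
  destination:
    server: https://kubernetes.default.svc
    namespace: mcp-servers
  syncPolicy:
    automated:
      prune: true      # drift is reverted, so no manual changes survive
      selfHeal: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;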

&lt;h2&gt;
  
  
  Identity Correlation
&lt;/h2&gt;

&lt;p&gt;The platform integrates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Azure Activity Logs&lt;/li&gt;
&lt;li&gt;Entra ID identities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This enables:&lt;br&gt;
&lt;strong&gt;&lt;em&gt;precise attribution of changes - who did what and when.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Traditional vs AI-Driven Observability
&lt;/h2&gt;

&lt;p&gt;Traditional stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dashboards&lt;/li&gt;
&lt;li&gt;alerts&lt;/li&gt;
&lt;li&gt;manual investigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This platform adds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reasoning layer&lt;/li&gt;
&lt;li&gt;cross-system correlation&lt;/li&gt;
&lt;li&gt;automated investigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Difference:&lt;br&gt;
&lt;strong&gt;&lt;em&gt;from data exploration → to decision support.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Read-only AI systems are safer and more predictable&lt;/li&gt;
&lt;li&gt;Structured tooling improves reliability&lt;/li&gt;
&lt;li&gt;Correlation is the hardest part of observability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Observability tools provide signals, not understanding.&lt;br&gt;
By adding a secure AI-driven reasoning layer, we can turn fragmented data into actionable insight and significantly reduce incident response time in Kubernetes environments.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;🤝 Stay Connected&lt;br&gt;
Found this guide helpful? Follow my journey as an AI Agent Automation Engineer on LinkedIn: &lt;a href="https://www.linkedin.com/in/george-ezejiofor-89615a8a/" rel="noopener noreferrer"&gt;George Ezejiofor&lt;/a&gt; on &lt;a href="https://www.linkedin.com/in/george-ezejiofor-89615a8a/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;. Let's keep building scalable, secure cloud-native systems, one project at a time!&lt;/p&gt;

</description>
      <category>homelab</category>
      <category>aks</category>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
    <item>
      <title>How to build a custom router and firewall with Opnsense in your Homelab</title>
      <dc:creator>George Ezejiofor</dc:creator>
      <pubDate>Sun, 22 Mar 2026 22:15:58 +0000</pubDate>
      <link>https://dev.to/ezejioforog/building-a-headless-opnsense-homelab-from-zero-to-domain-access-35l4</link>
      <guid>https://dev.to/ezejioforog/building-a-headless-opnsense-homelab-from-zero-to-domain-access-35l4</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;A complete guide to deploying a production-grade firewall with remote management&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftku4790ngojj7j6imkp4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftku4790ngojj7j6imkp4.jpg" alt="homelab firewall" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Navigating the world of home networking can be a "plug-and-play" experience - until you decide to build a professional-grade lab. When I set out to install OPNsense on an Intel N150 fanless mini-PC, I ran into the classic "locked out" scenarios, interface mismatches, and DNS rebind attacks.&lt;br&gt;
Here is the definitive guide on how I moved from a broken installer to a fully headless, domain-accessible firewall that I can manage from anywhere in the world.&lt;/p&gt;

&lt;p&gt;Three days. That's how long it took me to get a working headless OPNsense setup on my Intel N150 mini-PC.&lt;/p&gt;

&lt;p&gt;Not because OPNsense is hard. Because nobody documents the &lt;em&gt;gotchas&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Interface mismatches. DNS rebind protection blocking your own domain. The "Live mode" trap where nothing persists. Plugging in two cables at the wrong moment and watching interface detection implode.&lt;/p&gt;

&lt;p&gt;I hit every single one. So you don't have to.&lt;/p&gt;


&lt;h2&gt;
  
  
  My Setup
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;┌─────────────────┐      ┌─────────────────┐&lt;/span&gt;
&lt;span class="s"&gt;│  Home Router    │ WAN  │  OPNsense N150  │&lt;/span&gt;
&lt;span class="na"&gt;│  192.168.1.xxx  │─────▶│  WAN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;igc1      │&lt;/span&gt;
&lt;span class="na"&gt;└─────────────────┘      │  LAN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;igc0      │&lt;/span&gt;
                         &lt;span class="s"&gt;└────────┬────────┘&lt;/span&gt;
                                  &lt;span class="s"&gt;│ LAN&lt;/span&gt;
                         &lt;span class="s"&gt;┌────────▼────────┐&lt;/span&gt;
                         &lt;span class="s"&gt;│  Mac Studio     │&lt;/span&gt;
                         &lt;span class="s"&gt;│  10.0.1.xxx/16  │&lt;/span&gt;
                         &lt;span class="s"&gt;└─────────────────┘&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Device:&lt;/strong&gt; Intel N150 (4 cores, fanless, silent)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ETH0 (igc0):&lt;/strong&gt; LAN → Mac Studio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ETH1 (igc1):&lt;/strong&gt; WAN → Home ISP Router&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Target Network:&lt;/strong&gt; 10.0.1.0/16&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remote Access:&lt;/strong&gt; TinyPilot KVM for when I inevitably lock myself out&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Gotcha #1: The "Live Mode" Trap
&lt;/h2&gt;

&lt;p&gt;OPNsense boots into Live mode by default. You configure everything, reboot, and it's gone. Poof.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Actually select "Install" from the boot menu. Sounds obvious. Cost me an evening.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Installation flow:&lt;/span&gt;
1. Boot USB
2. Select &lt;span class="s2"&gt;"Install"&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;NOT Live&lt;span class="o"&gt;)&lt;/span&gt;
3. Login: installer / opnsense
4. Target: internal SSD, GPT/EFI
5. ONLY plug WAN cable first → assign igc1 ETH1
6. THEN plug LAN cable → assign igc0 ETH0
7. Reboot, remove USB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why the cable dance? OPNsense interface detection chokes when both ports are active during assignment. Three reinstalls taught me this.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotcha #2: IP Conflict Hell
&lt;/h2&gt;

&lt;p&gt;OPNsense defaults to &lt;code&gt;192.168.1.1&lt;/code&gt; for LAN. Your home router? Probably &lt;code&gt;192.168.1.1&lt;/code&gt; too.&lt;/p&gt;

&lt;p&gt;Same subnet. Routing chaos. Web UI unreachable.&lt;/p&gt;
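&lt;p&gt;A quick way to see which box is actually answering before you change anything (run from the Mac; illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# If your default route and the OPNsense LAN are both 192.168.1.1, replies are ambiguous&lt;/span&gt;
netstat -rn | grep default
arp -a | grep 192.168.1.1   &lt;span class="c"&gt;# check which MAC address is claiming 192.168.1.1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;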

&lt;p&gt;&lt;strong&gt;Console fix (Option 2):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LAN Interface: igc0 ETH0
IPv4: 10.0.1.x
Subnet: /16
DHCP: Yes (10.0.1.xxx-yyy)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On your Mac, set Ethernet to DHCP. Test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ping 10.0.1.x
&lt;span class="c"&gt;# If replies → https://10.0.1.x works&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Gotcha #3: Wi-Fi Management Blocked
&lt;/h2&gt;

&lt;p&gt;I wanted to manage the firewall from my couch, not tethered to Ethernet. But Wi-Fi traffic comes in on WAN, and OPNsense blocks it by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three settings to fix:&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Unblock Private Networks
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Interfaces → [WAN]&lt;/code&gt; → Uncheck "Block private networks"&lt;/p&gt;

&lt;p&gt;Your home router hands OPNsense a 192.168.x.x address. That checkbox blocks it. Genius security feature that locked me out for hours.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Add WAN Management Rule
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Firewall → Rules → WAN → Add&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Action: Pass
Protocol: TCP
Destination: WAN address
Port: 443
Description: Wi-Fi Management
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Listen on All Interfaces
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;System → Settings → Administration&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Listen Interfaces: All
Protocol: HTTPS
Port: 443
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Gotcha #4: DNS Rebind Attack Protection
&lt;/h2&gt;

&lt;p&gt;I set up a DNS override for &lt;code&gt;firewall.homelab.com&lt;/code&gt;. Loaded it in my browser. Blocked.&lt;/p&gt;

&lt;p&gt;OPNsense has DNS rebind protection — it blocks external domains resolving to internal IPs. Good security. Terrible for homelabs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix at &lt;code&gt;System → Settings → Administration&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Option A (quick): Check "Disable DNS Rebind Checks"&lt;/p&gt;

&lt;p&gt;Option B (safer): Add domain to "Alternate Hostnames"&lt;/p&gt;




&lt;h2&gt;
  
  
  Complete Firewall Rules
&lt;/h2&gt;

&lt;h3&gt;
  
  
  LAN (igc0 / 10.0.1.0/16)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Seq&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Destination&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Pass&lt;/td&gt;
&lt;td&gt;LAN net&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;Internet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Pass&lt;/td&gt;
&lt;td&gt;LAN net&lt;/td&gt;
&lt;td&gt;10.0.1.xxx:443&lt;/td&gt;
&lt;td&gt;Management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;Pass&lt;/td&gt;
&lt;td&gt;10.0.1.xxx/20&lt;/td&gt;
&lt;td&gt;10.0.16.xxx/20&lt;/td&gt;
&lt;td&gt;DC1→DC2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;Pass&lt;/td&gt;
&lt;td&gt;10.0.1.xxx/20&lt;/td&gt;
&lt;td&gt;10.0.32.xxx/20&lt;/td&gt;
&lt;td&gt;DC1→DC3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;999&lt;/td&gt;
&lt;td&gt;Block&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;Default deny&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  WAN (igc1 / DHCP)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Seq&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Pass&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;443&lt;/td&gt;
&lt;td&gt;Wi-Fi mgmt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Pass&lt;/td&gt;
&lt;td&gt;UDP&lt;/td&gt;
&lt;td&gt;51820&lt;/td&gt;
&lt;td&gt;WireGuard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;999&lt;/td&gt;
&lt;td&gt;Block&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;Drop rest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  WireGuard for Remote Access
&lt;/h2&gt;

&lt;p&gt;I'm frequently in Nigeria. Need to hit my homelab from 5,000 miles away.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install plugin&lt;/span&gt;
System → Firmware → Plugins → os-wireguard

&lt;span class="c"&gt;# Server config&lt;/span&gt;
VPN → WireGuard → Settings
  Enable: Yes
  Port: 51820
  Interface: WAN

&lt;span class="c"&gt;# Endpoint&lt;/span&gt;
  Name: Remote_VPN
  Tunnel: 10.0.100.1/24
  DNS: 10.0.1.x

&lt;span class="c"&gt;# Peer (for each client)&lt;/span&gt;
  Name: Nigeria_Laptop
  Public Key: &lt;span class="o"&gt;[&lt;/span&gt;from client]
  Allowed IPs: 10.0.100.xxx/32
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't forget the WAN firewall rule for UDP 51820.&lt;/p&gt;
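&lt;p&gt;On the client side, the matching WireGuard config looks like this (keys, endpoint hostname, and tunnel addresses are placeholders; generate your own key pair with &lt;code&gt;wg genkey&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Interface]
PrivateKey = &amp;lt;client-private-key&amp;gt;
Address    = 10.0.100.2/32          # must match this peer's Allowed IPs on the server
DNS        = 10.0.1.x               # Unbound on the firewall

[Peer]
PublicKey           = &amp;lt;server-public-key&amp;gt;
Endpoint            = your-public-hostname:51820
AllowedIPs          = 10.0.0.0/16   # route the whole homelab through the tunnel
PersistentKeepalive = 25
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;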




&lt;h2&gt;
  
  
  Ansible Automation
&lt;/h2&gt;

&lt;p&gt;Everything above? I codified it. If my N150 dies, one command rebuilds everything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy Firewall Rules&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;opnsense&lt;/span&gt;
  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow LAN to Internet&lt;/span&gt;
      &lt;span class="na"&gt;oxlorg.opnsense.rule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LAN_to_Internet&lt;/span&gt;
        &lt;span class="na"&gt;interface&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;igc0&lt;/span&gt;
        &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;any&lt;/span&gt;
        &lt;span class="na"&gt;source_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.0.1.xxx/16&lt;/span&gt;
        &lt;span class="na"&gt;destination_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;any&lt;/span&gt;
        &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pass&lt;/span&gt;
        &lt;span class="na"&gt;sequence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Wi-Fi Management&lt;/span&gt;
      &lt;span class="na"&gt;oxlorg.opnsense.rule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WiFi_Mgmt&lt;/span&gt;
        &lt;span class="na"&gt;interface&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;igc1&lt;/span&gt;
        &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp&lt;/span&gt;
        &lt;span class="na"&gt;destination_port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;443&lt;/span&gt;
        &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pass&lt;/span&gt;
        &lt;span class="na"&gt;sequence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using the &lt;code&gt;oxlorg.opnsense&lt;/code&gt; collection. Full playbook handles interfaces, DHCP, DNS, WireGuard — everything.&lt;/p&gt;
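&lt;p&gt;For context, the playbook runs against an inventory roughly like this (the host IP is a placeholder; the collection authenticates to the OPNsense API with key/secret credentials, which you should keep in a vault-encrypted file rather than the inventory):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;all:
  hosts:
    opnsense:
      ansible_host: 10.0.1.1   # placeholder firewall LAN IP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;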




&lt;h2&gt;
  
  
  Pre-Flight Checklist
&lt;/h2&gt;

&lt;p&gt;Before you call it done:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;code&gt;Interfaces → WAN&lt;/code&gt;: "Block private networks" UNCHECKED&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;Firewall → WAN&lt;/code&gt;: Port 443 rule exists&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;Firewall → WAN&lt;/code&gt;: Port 51820 rule exists (if using WireGuard)&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;System → Admin&lt;/code&gt;: DNS Rebind disabled OR alternate hostname set&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;System → Admin&lt;/code&gt;: Listen Interfaces = All&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;Services → Unbound&lt;/code&gt;: Domain override configured&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;NAT → Outbound&lt;/code&gt;: Hybrid mode, networks masqueraded&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Troubleshooting Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Can't reach 10.0.1.x&lt;/td&gt;
&lt;td&gt;Check Ethernet DHCP, verify igc0 connection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wi-Fi blocked&lt;/td&gt;
&lt;td&gt;"Block private networks" still checked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain fails&lt;/td&gt;
&lt;td&gt;DNS Rebind protection blocking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WireGuard dead&lt;/td&gt;
&lt;td&gt;Port 51820 not open on WAN&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No inter-subnet traffic&lt;/td&gt;
&lt;td&gt;LAN rules missing or wrong subnets&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Access Methods After Setup
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Desk (Ethernet)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://10.0.1.xxx" rel="noopener noreferrer"&gt;https://10.0.1.xxx&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Direct LAN&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Home Wi-Fi&lt;/td&gt;
&lt;td&gt;&lt;a href="https://192.168.1.xxx" rel="noopener noreferrer"&gt;https://192.168.1.xxx&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;WAN address&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remote&lt;/td&gt;
&lt;td&gt;&lt;a href="https://10.0.1.x" rel="noopener noreferrer"&gt;https://10.0.1.x&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;WireGuard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain&lt;/td&gt;
&lt;td&gt;&lt;a href="https://firewall.homelab.com" rel="noopener noreferrer"&gt;https://firewall.homelab.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;DNS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Full Ansible automation. Zero-touch rebuild capability. One command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ansible-playbook &lt;span class="nt"&gt;-i&lt;/span&gt; inventory.yml playbooks/99_full_rebuild.yml &lt;span class="nt"&gt;--ask-vault-pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Working on that now. Article coming when it's battle-tested.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.opnsense.org/" rel="noopener noreferrer"&gt;OPNsense Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ansibleguy/collection_opnsense" rel="noopener noreferrer"&gt;Ansible OPNsense Collection&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wireguard.com/" rel="noopener noreferrer"&gt;WireGuard&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Part of my homelab series — Proxmox clusters, bare-metal Kubernetes, and eventually low-latency trading systems. Follow if you're into infrastructure chaos.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Questions? Drop them below. I've probably made your mistake already.&lt;/p&gt;

</description>
      <category>homelab</category>
      <category>networking</category>
      <category>opnsense</category>
      <category>devops</category>
    </item>
    <item>
      <title>Canary Deployments Using Argo Rollouts and Istio Service-mesh</title>
      <dc:creator>George Ezejiofor</dc:creator>
      <pubDate>Mon, 17 Mar 2025 12:33:49 +0000</pubDate>
      <link>https://dev.to/ezejioforog/mastering-canary-deployments-zero-downtime-integration-with-argo-rollouts-and-istio-44mn</link>
      <guid>https://dev.to/ezejioforog/mastering-canary-deployments-zero-downtime-integration-with-argo-rollouts-and-istio-44mn</guid>
      <description>&lt;h2&gt;
  
  
  Introduction 🚀
&lt;/h2&gt;

&lt;p&gt;🌟 In today’s cloud-native environments, &lt;strong&gt;Canary Deployment&lt;/strong&gt; stands out as a powerful technique for achieving &lt;strong&gt;zero-downtime&lt;/strong&gt; releases. By incrementally rolling out new application versions, canary deployments reduce risk and ensure a seamless user experience. When combined with the advanced rollout strategies of &lt;strong&gt;Argo Rollouts&lt;/strong&gt; and the fine-grained traffic management of &lt;strong&gt;Istio Service Mesh&lt;/strong&gt;, you gain precise control over traffic shifts, enabling smooth transitions between versions. This project also includes automatic rollback to the stable version when a canary deployment goes wrong.&lt;/p&gt;

&lt;p&gt;🎯 This guide will show you how to implement &lt;strong&gt;Canary Deployment&lt;/strong&gt; using &lt;strong&gt;Argo Rollouts&lt;/strong&gt;’ intelligent strategies alongside Istio’s traffic-splitting capabilities. You’ll learn how to gradually shift traffic between application versions while maintaining full observability and control. By the end of this guide, you’ll have a robust, production-ready setup that deploys new features seamlessly—without impacting your end users.&lt;/p&gt;
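&lt;p&gt;To make the idea concrete before we start: the gradual traffic shift is declared in the Rollout manifest itself. A minimal sketch (the app and VirtualService names are placeholders, and the weights and pauses are just an example schedule):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: demo-app                  # placeholder
spec:
  strategy:
    canary:
      trafficRouting:
        istio:
          virtualService:
            name: demo-app-vsvc   # placeholder Istio VirtualService
            routes:
              - primary
      steps:
        - setWeight: 20           # send 20% of traffic to the canary
        - pause: {duration: 2m}
        - setWeight: 50
        - pause: {duration: 2m}   # then promote to 100% (or roll back on failure)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;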

&lt;h2&gt;
  
  
  ✅ Prerequisites 🛠️
&lt;/h2&gt;

&lt;p&gt;To successfully implement Zero Downtime Canary Deployment with Argo-Rollouts and Istio Service-Mesh, ensure you have the following:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🐳 Kubernetes Cluster:&lt;/strong&gt; A working Kubernetes cluster set up with kubeadm on bare metal, with MetalLB configured for LoadBalancer functionality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💻 kubectl:&lt;/strong&gt; Install and configure the Kubernetes command-line tool to interact with your cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🧩 Helm:&lt;/strong&gt; The Kubernetes package manager for simplified application deployment and configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔒 Cert-Manager (optional):&lt;/strong&gt; Installed in the cluster for automated TLS certificate management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🌐 Istio Ingress Controller:&lt;/strong&gt; Deploy the Istio Ingress Gateway to handle HTTP(S) traffic routing effectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📂 Namespace Configuration:&lt;/strong&gt; Create distinct namespaces or use labels to separate stable and canary deployments for clear isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🌐 Domain Name:&lt;/strong&gt; Set up a domain (e.g., terranetes.co.uk) or subdomains pointing to your LoadBalancer IP address. You can manage DNS with providers like Cloudflare.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📧 Let's Encrypt Account:&lt;/strong&gt; A valid email address ready for certificate issuance so you can enable HTTPS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📡 MetalLB:&lt;/strong&gt; Configured for bare-metal Kubernetes clusters to manage LoadBalancer services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📈 Kiali:&lt;/strong&gt; Installed for monitoring Istio's traffic flow and gaining visibility into service dependencies and metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📡 Basic Networking Knowledge:&lt;/strong&gt; Familiarity with Kubernetes networking concepts like Ingress, Services, and LoadBalancer mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📦 Argo Rollouts:&lt;/strong&gt; Installed to handle advanced Canary Deployment strategies. Use the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create namespace argo-rollouts
kubectl apply &lt;span class="nt"&gt;-n&lt;/span&gt; argo-rollouts &lt;span class="nt"&gt;-f&lt;/span&gt; https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Once the stack is installed, confirm that all the supporting components are running:&lt;/p&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pod &lt;span class="nt"&gt;-A&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'argo|istio|metal|cert|monitoring'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
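&lt;p&gt;Optionally, install the Argo Rollouts kubectl plugin so you can watch and promote rollouts from the terminal. The commands below are a sketch for Linux amd64; pick the release asset that matches your OS and architecture:&lt;/p&gt;

```shell
# Download the Argo Rollouts kubectl plugin (Linux amd64 asset shown),
# make it executable, and move it onto the PATH:
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x kubectl-argo-rollouts-linux-amd64
sudo mv kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts

# Confirm kubectl picks up the plugin:
kubectl argo rollouts version
```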



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbglr1en3dn3gshalknlm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbglr1en3dn3gshalknlm.png" alt="Prerequisites" width="800" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With these prerequisites ready, you're equipped to dive into setting up Canary Deployments! 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture 📈
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1qn59brplc2n49nm60y.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1qn59brplc2n49nm60y.gif" alt="Canary Deployment" width="720" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployments 🚀
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;certificate deployment&lt;/strong&gt; steps are covered in my &lt;strong&gt;BlueGreen deployment&lt;/strong&gt; article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/ezejioforog/implementing-blue-green-deployment-in-kubernetes-with-tls-encryption-using-cert-manager-and-nginx-38hg"&gt;BlueGreen deployment&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Deploy the &lt;strong&gt;ClusterIssuer&lt;/strong&gt; using the same method as in the &lt;strong&gt;BlueGreen deployment&lt;/strong&gt;, then deploy the certificate for &lt;strong&gt;istio-ingress&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat &amp;lt;&amp;lt;EOF | kubectl apply -f -&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cert-manager.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Certificate&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranetes-istio-cert&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;istio-ingress&lt;/span&gt;                    &lt;span class="c1"&gt;# Must match the namespace of the Istio ingress gateway&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranetes-istio-tls&lt;/span&gt;            &lt;span class="c1"&gt;# This is the credentialName in the Gateway resource&lt;/span&gt;
  &lt;span class="na"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2160h&lt;/span&gt; &lt;span class="c1"&gt;# 90 days&lt;/span&gt;
  &lt;span class="na"&gt;renewBefore&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;360h&lt;/span&gt; &lt;span class="c1"&gt;# 15 days&lt;/span&gt;
  &lt;span class="na"&gt;isCA&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;privateKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;algorithm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RSA&lt;/span&gt;
    &lt;span class="na"&gt;encoding&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PKCS1&lt;/span&gt;
    &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4096&lt;/span&gt;
  &lt;span class="na"&gt;issuerRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;letsencrypt-dns01-istio&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterIssuer&lt;/span&gt;
    &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cert-manager.io&lt;/span&gt;
  &lt;span class="na"&gt;dnsNames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grafana.terranetes.co.uk"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prometheus.terranetes.co.uk"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alertmanager.terranetes.co.uk"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kiali.terranetes.co.uk"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;argo-rollout.terranetes.co.uk"&lt;/span&gt;

&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
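&lt;p&gt;Before moving on, it is worth checking that cert-manager has actually issued the certificate and created the TLS secret that the Gateway will reference via &lt;strong&gt;credentialName&lt;/strong&gt;. A quick check, using the names from the manifest above:&lt;/p&gt;

```shell
# The Certificate should show READY=True, and the secret it produces must
# live in the same namespace as the Istio ingress gateway:
kubectl get certificate terranetes-istio-cert -n istio-ingress
kubectl get secret terranetes-istio-tls -n istio-ingress
```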



&lt;p&gt;&lt;strong&gt;Certificate Issued ✅&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcy5rjyh66lr44etmveld.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcy5rjyh66lr44etmveld.png" alt="Certificate Issued" width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create the canary namespace with Istio sidecar injection enabled 🏠&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat &amp;lt;&amp;lt; EOF | kubectl apply -f -&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Namespace&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;canary&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;istio-injection&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;enabled&lt;/span&gt;
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
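&lt;p&gt;The &lt;strong&gt;istio-injection: enabled&lt;/strong&gt; label is what gives every pod in the namespace an Envoy sidecar, which Istio needs in order to shift and measure the canary traffic. A quick sanity check:&lt;/p&gt;

```shell
# The canary namespace should list istio-injection=enabled among its labels:
kubectl get namespace canary --show-labels
```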



&lt;p&gt;&lt;strong&gt;Deploy the Istio Gateway&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat &amp;lt;&amp;lt; EOF | kubectl apply -f -&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1alpha3&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Gateway&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app-gateway&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;istio-ingress&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;istio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingress&lt;/span&gt;       &lt;span class="c1"&gt;# use istio default controller&lt;/span&gt;
  &lt;span class="na"&gt;servers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt;
      &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HTTP&lt;/span&gt;
    &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;
    &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;httpsRedirect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# Redirect HTTP to HTTPS&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;443&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https&lt;/span&gt;
      &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HTTPS&lt;/span&gt;
    &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SIMPLE&lt;/span&gt;
      &lt;span class="na"&gt;credentialName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranetes-istio-tls&lt;/span&gt;  &lt;span class="c1"&gt;# Reference the TLS secret&lt;/span&gt;
    &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;argo-rollout.terranetes.co.uk"&lt;/span&gt;    
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Deploy the canary and stable Services&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat &amp;lt;&amp;lt; EOF | kubectl apply -f -&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app-canary&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;canary&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app&lt;/span&gt;  &lt;span class="c1"&gt;# Add this label  &lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
    &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app&lt;/span&gt;
    &lt;span class="c1"&gt;# This selector will be updated with the pod-template-hash of the canary ReplicaSet. e.g.:&lt;/span&gt;
    &lt;span class="c1"&gt;# rollouts-pod-template-hash: 7bf84f9696&lt;/span&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app-stable&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;canary&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app&lt;/span&gt;  &lt;span class="c1"&gt;# Add this label  &lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
    &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app&lt;/span&gt;
    &lt;span class="c1"&gt;# This selector will be updated with the pod-template-hash of the stable ReplicaSet. e.g.:&lt;/span&gt;
    &lt;span class="c1"&gt;# rollouts-pod-template-hash: 789746c88d&lt;/span&gt;
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
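&lt;p&gt;Note that both Services start with the same &lt;strong&gt;app: terranates-app&lt;/strong&gt; selector; once the Rollout exists, Argo Rollouts appends a &lt;strong&gt;rollouts-pod-template-hash&lt;/strong&gt; to each selector so they target different ReplicaSets. You can confirm both Services were created:&lt;/p&gt;

```shell
# Both Services should be present; their endpoints stay empty until the
# Rollout creates pods in the canary namespace:
kubectl get svc,endpoints -n canary
```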



&lt;p&gt;&lt;strong&gt;Deploy the Istio VirtualService&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1alpha3&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;VirtualService&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app-vs1&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;canary&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;gateways&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;istio-ingress/terranates-app-gateway&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;argo-rollout.terranetes.co.uk"&lt;/span&gt;
  &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;route-one&lt;/span&gt;
    &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app-stable&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app-canary&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
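&lt;p&gt;With the VirtualService applied, all traffic initially flows to the stable Service (weights 100/0). If you have the istioctl CLI installed, you can also let Istio check the namespace configuration for mistakes:&lt;/p&gt;

```shell
# Confirm the VirtualService exists and ask Istio to validate the namespace:
kubectl get virtualservice -n canary
istioctl analyze -n canary
```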



&lt;p&gt;&lt;strong&gt;Deploy the Terranates webapp as an Argo Rollout&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat &amp;lt;&amp;lt; EOF | kubectl apply -f -&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Rollout&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;canary&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;canary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;canaryService&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app-canary&lt;/span&gt;
      &lt;span class="na"&gt;stableService&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app-stable&lt;/span&gt;
      &lt;span class="na"&gt;analysis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;startingStep&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
        &lt;span class="na"&gt;templates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;templateName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;istio-success-rate&lt;/span&gt;
        &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service&lt;/span&gt; 
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app-canary&lt;/span&gt;  &lt;span class="c1"&gt;# ✅ Canary service name&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;namespace&lt;/span&gt;
          &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;fieldRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;fieldPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metadata.namespace&lt;/span&gt;
      &lt;span class="na"&gt;trafficRouting&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;istio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;virtualServices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app-vs1&lt;/span&gt;
            &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;route-one&lt;/span&gt;
      &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;30s&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;30s&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;40&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;30s&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;30s&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;30s&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;70&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;30s&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;30s&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;90&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;pause&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;30s&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terranates-app&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;georgeezejiofor/argo-rollout:yellow&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt;
          &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
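&lt;p&gt;Once the Rollout is applied, you can follow the canary from the CLI (the watch and promote commands assume the Argo Rollouts kubectl plugin is installed). The loop below simply prints the traffic schedule the steps in the manifest encode: the first step pauses indefinitely for manual promotion, and each later step pauses 30 seconds:&lt;/p&gt;

```shell
# Watch the rollout progress and promote it past the indefinite pause:
#   kubectl argo rollouts get rollout terranates-app -n canary --watch
#   kubectl argo rollouts promote terranates-app -n canary

# The setWeight steps shift traffic in 10% increments; printed out:
for weight in 10 20 30 40 50 60 70 80 90 100; do
  echo "canary=${weight}% stable=$((100 - weight))%"
done
```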



&lt;p&gt;&lt;strong&gt;Deploy the AnalysisTemplate for automated analysis and rollback&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat &amp;lt;&amp;lt; EOF | kubectl apply -f -&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AnalysisTemplate&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;istio-success-rate&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;canary&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;namespace&lt;/span&gt;
  &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;success-rate&lt;/span&gt;
    &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
    &lt;span class="na"&gt;successCondition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;result[0] &amp;lt; 0.2 or result[1] &amp;lt; &lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;  &lt;span class="c1"&gt;# ✅ Handle low traffic&lt;/span&gt;
    &lt;span class="na"&gt;failureCondition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;result[0] &amp;gt;= &lt;/span&gt;&lt;span class="m"&gt;0.2&lt;/span&gt;
    &lt;span class="na"&gt;failureLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;prometheus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://monitoring-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090&lt;/span&gt;
        &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;+&lt;/span&gt;
          &lt;span class="s"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;sum(irate(istio_requests_total{&lt;/span&gt;
              &lt;span class="s"&gt;reporter="source",&lt;/span&gt;
              &lt;span class="s"&gt;destination_service=~"{{args.service}}.{{args.namespace}}.svc.cluster.local",&lt;/span&gt;
              &lt;span class="s"&gt;response_code!~"2.*"}[2m])&lt;/span&gt;
            &lt;span class="s"&gt;)&lt;/span&gt;
            &lt;span class="s"&gt;/&lt;/span&gt;
            &lt;span class="s"&gt;sum(irate(istio_requests_total{&lt;/span&gt;
              &lt;span class="s"&gt;reporter="source",&lt;/span&gt;
              &lt;span class="s"&gt;destination_service=~"{{args.service}}.{{args.namespace}}.svc.cluster.local"}[2m])&lt;/span&gt;
            &lt;span class="s"&gt;)&lt;/span&gt;
          &lt;span class="s"&gt;)&lt;/span&gt;
          &lt;span class="s"&gt;# Add total requests as second metric&lt;/span&gt;
          &lt;span class="s"&gt;, &lt;/span&gt;
          &lt;span class="s"&gt;sum(irate(istio_requests_total{&lt;/span&gt;
            &lt;span class="s"&gt;reporter="source",&lt;/span&gt;
            &lt;span class="s"&gt;destination_service=~"{{args.service}}.{{args.namespace}}.svc.cluster.local"}[2m])&lt;/span&gt;
          &lt;span class="s"&gt;)&lt;/span&gt;
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Validate the deployments in the “canary” namespace&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get all &lt;span class="nt"&gt;-n&lt;/span&gt; canary
&lt;span class="c"&gt;# All these should show resources after successful deployment&lt;/span&gt;
kubectl get svc &lt;span class="nt"&gt;-n&lt;/span&gt; canary &lt;span class="nt"&gt;--show-labels&lt;/span&gt;
kubectl get rollout &lt;span class="nt"&gt;-n&lt;/span&gt; canary
kubectl get gateway &lt;span class="nt"&gt;-n&lt;/span&gt; canary
kubectl get virtualservice &lt;span class="nt"&gt;-n&lt;/span&gt; canary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffe6x8gctsjxkk36t85pg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffe6x8gctsjxkk36t85pg.png" alt="Validate-deployment" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generate traffic (essential!) with the &lt;code&gt;hey&lt;/code&gt; load generator (e.g. installed via Homebrew on macOS)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hey &lt;span class="nt"&gt;-z&lt;/span&gt; 5m &lt;span class="nt"&gt;-q&lt;/span&gt; 10 https://argo-rollout.terranetes.co.uk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Set the new green image for the Argo rollout&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl argo rollouts &lt;span class="nb"&gt;set &lt;/span&gt;image terranates-app terranates-app&lt;span class="o"&gt;=&lt;/span&gt;georgeezejiofor/argo-rollout:green &lt;span class="nt"&gt;-n&lt;/span&gt; canary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Set the new red image for the Argo rollout&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl argo rollouts &lt;span class="nb"&gt;set &lt;/span&gt;image terranates-app terranates-app&lt;span class="o"&gt;=&lt;/span&gt;georgeezejiofor/argo-rollout:red &lt;span class="nt"&gt;-n&lt;/span&gt; canary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Set the new blue image for the Argo rollout&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl argo rollouts &lt;span class="nb"&gt;set &lt;/span&gt;image terranates-app terranates-app&lt;span class="o"&gt;=&lt;/span&gt;georgeezejiofor/argo-rollout:blue &lt;span class="nt"&gt;-n&lt;/span&gt; canary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Set the new yellow image for the Argo rollout&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl argo rollouts &lt;span class="nb"&gt;set &lt;/span&gt;image terranates-app terranates-app&lt;span class="o"&gt;=&lt;/span&gt;georgeezejiofor/argo-rollout:yellow &lt;span class="nt"&gt;-n&lt;/span&gt; canary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Set the new purple image for the Argo rollout&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl argo rollouts &lt;span class="nb"&gt;set &lt;/span&gt;image terranates-app terranates-app&lt;span class="o"&gt;=&lt;/span&gt;georgeezejiofor/argo-rollout:purple &lt;span class="nt"&gt;-n&lt;/span&gt; canary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Watch the Argo rollout&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl argo rollouts get rollout terranates-app &lt;span class="nt"&gt;-n&lt;/span&gt; canary &lt;span class="nt"&gt;-w&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfawsoojgfzrqhdh0laa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfawsoojgfzrqhdh0laa.png" alt="Watch-Rollouts" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikxjxizkz0hn0pg4n4p1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikxjxizkz0hn0pg4n4p1.png" alt="Stable-version" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual Testing for Rollouts and Automatic Rollback 😊&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/uheUug076Dk" rel="noopener noreferrer"&gt;https://youtu.be/uheUug076Dk&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion 🎉&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Congratulations! You’ve just unlocked the power of &lt;strong&gt;zero-downtime deployments&lt;/strong&gt; with Argo Rollouts and Istio! 🚀 By combining Argo Rollouts’ intelligent canary strategies with Istio’s granular traffic management, you’ve built a robust system that:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduces Risk 😌:&lt;/strong&gt; Gradually shift traffic to new versions while monitoring real-time metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ensures Smooth User Experience 🌟:&lt;/strong&gt; No downtime, no disruptions—just seamless updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automates Rollbacks 🛡️:&lt;/strong&gt; Detect issues early and revert to stable versions effortlessly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optimizes Traffic Control 🎛️:&lt;/strong&gt; Istio’s dynamic routing ensures precise traffic splitting.&lt;/p&gt;

&lt;p&gt;With this setup, you’re not just deploying code—you’re delivering confidence. 💪 Whether you’re rolling out mission-critical features or experimenting with new updates, this integration empowers you to innovate fearlessly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next Project: Observability Stacks 📈&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now that you've mastered canary deployments, it's time to build a &lt;strong&gt;powerful observability stack&lt;/strong&gt; for deeper insights into your applications! 🚀 In this next project, we'll explore tools that provide real-time monitoring, centralised logging, and distributed tracing to help you maintain a reliable and performant system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability Tools We’ll Cover 🛠️&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Dive into building a powerful observability stack for deeper insights! We'll explore tools like &lt;strong&gt;Prometheus, Grafana, Loki, Jaeger, OpenTelemetry, Kiali,&lt;/strong&gt; and &lt;strong&gt;Promtail&lt;/strong&gt; for real-time monitoring, logging, and tracing. 🛠️&lt;/p&gt;

&lt;p&gt;This stack will help you monitor, troubleshoot, and optimize your applications with full visibility into system behavior.&lt;/p&gt;

&lt;p&gt;Stay tuned as we explore hands-on implementations and best practices! 🎯&lt;/p&gt;

&lt;p&gt;Follow me on &lt;strong&gt;LinkedIn&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/george-ezejiofor-89615a8a/" rel="noopener noreferrer"&gt;George Ezejiofor&lt;/a&gt; to stay updated on cloud-native observability insights! 😊&lt;/p&gt;

&lt;p&gt;Happy deploying! 🚀🎉&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>cloudcomputing</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Automating Production-grade multi-node Kubernetes with KUBESPRAY on multipass with Just a single command</title>
      <dc:creator>George Ezejiofor</dc:creator>
      <pubDate>Tue, 31 Dec 2024 03:25:10 +0000</pubDate>
      <link>https://dev.to/ezejioforog/automating-production-grade-multi-node-kubernetes-with-kubespray-on-multipass-with-just-a-2jl9</link>
      <guid>https://dev.to/ezejioforog/automating-production-grade-multi-node-kubernetes-with-kubespray-on-multipass-with-just-a-2jl9</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.tourl"&gt;&lt;/a&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;br&gt;
Are you overwhelmed by the complexity of deploying production-grade multi-node Kubernetes clusters? 😅 The countless configurations, the manual setup, and the inevitable errors—it’s enough to make anyone feel like giving up! But what if I told you there’s a way to automate the entire process with just one command? 🤯 Just like the way you do &lt;code&gt;terraform apply -auto-approve&lt;/code&gt; and &lt;code&gt;terraform destroy -auto-approve&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Imagine deploying a robust, scalable, and production-ready Kubernetes cluster as effortlessly as running a simple script. ✨ With the power of Kubespray and Multipass, I’ve developed an automated solution that does exactly that! 🚀&lt;/p&gt;

&lt;p&gt;Here’s how it works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cluster Deployment:&lt;/strong&gt; A single command, &lt;code&gt;python3 terranetes.py cluster init&lt;/code&gt; provisions a multi-node Kubernetes cluster using Kubespray on Multipass. No manual intervention is needed! 🛠️&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cluster Destruction:&lt;/strong&gt; When you're done, simply run &lt;code&gt;python3 terranetes.py cluster terminate&lt;/code&gt; and it gracefully destroys the cluster, cleans up resources, and even purges the Multipass instances. 💣✨&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fresh Start:&lt;/strong&gt; Need a new deployment with new IPs from the DHCP? No problem! Run the command again, and voilà, your new Kubernetes cluster is deployed and ready to go. Each new deployment gets fresh IPs dynamically assigned from the DHCP, ensuring no conflicts and a seamless experience. 🌐&lt;/p&gt;

&lt;p&gt;This approach combines the best of both worlds: the flexibility and reliability of Kubespray with the lightweight convenience of Multipass. Whether you're setting up a home lab or experimenting with multi-node clusters for learning, this solution takes the pain out of the process—and puts the fun back in DevOps! 😎&lt;/p&gt;

&lt;p&gt;Ready to see the magic in action? Let’s dive in! 🚀&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multipass 🌐:&lt;/strong&gt; Essential for creating and managing lightweight virtual machines on your local machine; instances are easy to deploy and delete. Suggested sizing per VM: 4 GB memory, 20 GB disk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python3 🐍:&lt;/strong&gt; Python is the backbone of the automation script. Ensure you have Python3 installed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ansible 📦:&lt;/strong&gt; Kubespray relies on Ansible for orchestrating Kubernetes setup across nodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubespray 🚀:&lt;/strong&gt; Kubespray is the tool of choice for automating Kubernetes clusters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bash Script 📝:&lt;/strong&gt; This script automates the commands for deploying and destroying the Kubernetes cluster using Kubespray and Multipass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Multipass? 🤔&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9suu4i5vlpkfa30pbw7h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9suu4i5vlpkfa30pbw7h.png" alt="multipass" width="800" height="271"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When choosing a virtualization environment for running Kubernetes nodes, there are several options available like VirtualBox, VMware, Lima, Colima, Vagrant, etc. Each has unique features, but Multipass stands out for its simplicity and efficiency in local development. Here’s why I chose Multipass for this project:&lt;br&gt;
&lt;strong&gt;1. Lightweight and Fast ⚡&lt;/strong&gt;&lt;br&gt;
Multipass is optimized for lightweight, fast virtual machine creation and management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick Deployment:&lt;/strong&gt; Launch VMs in seconds with minimal configuration, making it ideal for quickly spinning up and tearing down virtual machines.&lt;/p&gt;

&lt;p&gt;This &lt;strong&gt;efficiency&lt;/strong&gt; is crucial for development and testing environments where time is of the essence. ⚡&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Low Overhead:&lt;/strong&gt; Multipass VMs consume fewer resources compared to traditional hypervisors like VMware or VirtualBox.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Seamless Integration 🌐&lt;/strong&gt;&lt;br&gt;
Multipass is tightly integrated with Ubuntu, making it an excellent choice for running Kubernetes clusters on Ubuntu-based images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prebuilt Ubuntu Images:&lt;/strong&gt; Comes with official Ubuntu LTS images, eliminating the need to download or manage custom ISOs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration with Cloud-Init:&lt;/strong&gt; Multipass supports Cloud-Init, which allows you to automate the initial configuration of your instances. This feature is particularly useful for automating tasks and ensuring consistency across deployments. 🚀&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Effortless Updates:&lt;/strong&gt; Automatically downloads and keeps images up-to-date with the latest versions.&lt;/p&gt;
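&lt;p&gt;As a rough illustration of the Cloud-Init integration (the file name and package list below are hypothetical, not taken from this project), a small cloud-init file can preinstall the basics every node needs at launch time:&lt;/p&gt;

```yaml
#cloud-config
# Hypothetical cloud-init.yaml: preinstall common node prerequisites.
package_update: true
packages:
  - curl
  - python3        # needed on target nodes for Ansible modules
runcmd:
  - swapoff -a     # Kubernetes expects swap to be disabled
  - modprobe br_netfilter
```

&lt;p&gt;Multipass accepts such a file via its &lt;code&gt;--cloud-init&lt;/code&gt; flag, e.g. &lt;code&gt;multipass launch --cloud-init cloud-init.yaml --name node-1&lt;/code&gt;.&lt;/p&gt;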

&lt;p&gt;&lt;strong&gt;3. Command-Line Simplicity 🛠️&lt;/strong&gt;&lt;br&gt;
Multipass is designed with developers in mind, featuring a clean and intuitive command-line interface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single Command Operations:&lt;/strong&gt; Create, delete, and manage VMs with simple commands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scripting-Friendly:&lt;/strong&gt; Ideal for automation workflows, making it easy to integrate with tools like Ansible and Kubespray.&lt;/p&gt;
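&lt;p&gt;To give a flavour of how scriptable the CLI is, here is a minimal sketch (node names and sizes are illustrative, and &lt;code&gt;echo&lt;/code&gt; keeps it a dry run) that generates the launch commands for a three-node lab:&lt;/p&gt;

```shell
# Dry-run sketch: print the Multipass launch command for each node.
NODES="node-1 node-2 node-3"
for n in $NODES; do
  echo "multipass launch --name $n --cpus 2 --memory 4G --disk 20G"
done
```

&lt;p&gt;Drop the &lt;code&gt;echo&lt;/code&gt; to actually create the VMs (flag names can vary slightly between Multipass versions), and &lt;code&gt;multipass delete --all &amp;amp;&amp;amp; multipass purge&lt;/code&gt; tears everything down again.&lt;/p&gt;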

&lt;p&gt;&lt;strong&gt;4. Cross-Platform Support 🌍&lt;/strong&gt;&lt;br&gt;
Multipass is compatible with major platforms, including Linux, macOS, and Windows, ensuring flexibility and portability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unified Experience:&lt;/strong&gt; Consistent CLI across all platforms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-Platform Compatibility:&lt;/strong&gt; Multipass runs on Windows, macOS, and Linux, providing a consistent experience across different operating systems. This cross-platform compatibility is beneficial for teams working in diverse environments. 🌐&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No Dependencies:&lt;/strong&gt; Unlike Vagrant, which requires a hypervisor like VirtualBox, Multipass works out of the box with its backend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Cloud-Like Experience ☁️&lt;/strong&gt;&lt;br&gt;
Multipass mimics the simplicity of provisioning instances in the cloud, bringing that experience to your local machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ease of Networking:&lt;/strong&gt; Multipass simplifies networking setup, making it easier to configure and manage network interfaces and IP addresses for your virtual machines. This ease of use is beneficial for setting up and managing Kubernetes clusters. 🌍&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automation-Friendly:&lt;/strong&gt; Multipass can be easily integrated into automation scripts and CI/CD pipelines, making it a perfect fit for automated deployments and testing workflows. 🛠️&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic IP Assignment:&lt;/strong&gt; Leverages DHCP for dynamic IP allocation, making networking seamless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minimal Configuration:&lt;/strong&gt; Abstracts away the complexities of VM networking and resource management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Efficient Resource Usage 🖥️&lt;/strong&gt;&lt;br&gt;
Multipass is optimized for development environments and uses system resources efficiently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multipass is resource-efficient&lt;/strong&gt;, meaning it uses fewer system resources compared to heavier alternatives like VMware or VirtualBox. This efficiency is crucial for running multiple virtual machines simultaneously without significant performance degradation. 💻&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Customizable Resources:&lt;/strong&gt; Allocate specific CPU, memory, and disk resources for each VM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Low Host Impact:&lt;/strong&gt; Uses lightweight QEMU as the backend on Linux and macOS, ensuring your host system remains performant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5f5z0taj62aj2dv4nld1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5f5z0taj62aj2dv4nld1.png" alt="compare-multipass" width="753" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why I Chose Multipass for My Project 💡&lt;/strong&gt;&lt;br&gt;
For this project, the goal was to deploy and manage multi-node Kubernetes clusters locally with minimal effort. Multipass checked all the boxes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simplicity:&lt;/strong&gt; Easy VM provisioning without dealing with hypervisor complexities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed:&lt;/strong&gt; Rapid setup and teardown of VMs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration:&lt;/strong&gt; Works seamlessly with Ubuntu and aligns perfectly with Kubespray’s requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automation:&lt;/strong&gt; Script-friendly CLI for hands-free deployment and destruction of clusters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Kubespray?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxl046fefn2siomeoa14t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxl046fefn2siomeoa14t.png" alt="Kubespray" width="165" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;When it comes to deploying Kubernetes clusters, there are numerous tools to choose from, including Kind, Minikube, k3s, MicroK8s, and kubeadm. Each has its strengths and serves specific use cases. So, why choose Kubespray for setting up a production-grade multi-node Kubernetes cluster? Let’s explore!&lt;/em&gt; 🚀&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Production-Grade Setup 🌟&lt;/strong&gt;&lt;br&gt;
Kubespray is designed to create highly available, production-ready Kubernetes clusters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supports Multi-Node Clusters:&lt;/strong&gt; While tools like Minikube and Kind are excellent for single-node or small-scale environments, Kubespray shines in setting up robust multi-node configurations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built-In HA:&lt;/strong&gt; Out-of-the-box high availability (HA) ensures resilience, making it ideal for production workloads.&lt;/p&gt;

&lt;p&gt;With Kubespray, you can easily scale your clusters by adding or removing nodes as needed. This flexibility makes it a preferred choice for dynamic environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Highly Customizable 🎨&lt;/strong&gt;&lt;br&gt;
Kubespray provides the flexibility to customize every aspect of the cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fine-Tuned Configuration:&lt;/strong&gt; Modify parameters like container runtime, network plugins, and cloud providers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advanced Options:&lt;/strong&gt; Includes additional features like network policies, cluster monitoring, and logging by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Support for Various CNIs:&lt;/strong&gt; Choose from Calico, Cilium, Flannel, and more based on your networking needs.&lt;/p&gt;

&lt;p&gt;Kubespray offers extensive configuration options, allowing you to tailor your Kubernetes setup to specific requirements. This level of customization is often not available in simpler tools like MicroK8s.&lt;/p&gt;
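&lt;p&gt;For example, the CNI is a single variable in the sample inventory’s group vars (the exact path follows the upstream Kubespray layout, which has shifted slightly between releases):&lt;/p&gt;

```yaml
# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml (excerpt)
kube_network_plugin: calico   # alternatives include cilium, flannel, and none
```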

&lt;p&gt;&lt;strong&gt;3. Idempotent and Reliable 🔁&lt;/strong&gt;&lt;br&gt;
Kubespray leverages Ansible for orchestration, ensuring idempotency and reliability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ansible Integration:&lt;/strong&gt; Kubespray leverages Ansible for configuration management, providing a powerful and flexible way to automate complex tasks. This integration enhances the overall automation and management capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error Recovery:&lt;/strong&gt; If an operation fails, rerunning the playbook picks up where it left off without breaking the cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scalability:&lt;/strong&gt; Easily scale your cluster by adding or removing nodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Cross-Platform Support 🌍&lt;/strong&gt;&lt;br&gt;
Unlike other tools that are tied to specific environments, Kubespray works seamlessly across various platforms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloud Provider Agnostic:&lt;/strong&gt; Kubespray can deploy clusters across various cloud providers and on-premises environments, providing versatility that other tools might lack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On-Premise and Cloud:&lt;/strong&gt; Supports deployment on on-premise bare metal, cloud VMs, or a hybrid setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-Provider Flexibility:&lt;/strong&gt; Compatible with AWS, GCP, Azure, OpenStack, and more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Kubernetes Compliance ✅&lt;/strong&gt;&lt;br&gt;
Kubespray adheres to upstream Kubernetes standards, ensuring compatibility with the latest versions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Community and Support:&lt;/strong&gt; Kubespray has a strong and active community, offering plenty of resources, documentation, and support for troubleshooting and optimization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security Features:&lt;/strong&gt; Kubespray includes built-in security features and best practices, ensuring your clusters are secure from the start. 🔒&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No Vendor Lock-In:&lt;/strong&gt; Fully compliant with the Kubernetes API, so you’re not locked into proprietary solutions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Community-Driven:&lt;/strong&gt; Supported by the Kubernetes SIG (Special Interest Group), ensuring regular updates and robust community support.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrmtuxhyp5fackfqb33g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrmtuxhyp5fackfqb33g.png" alt="compare-kubespray" width="753" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why I Chose Kubespray for My Project 💡&lt;/strong&gt;&lt;br&gt;
For this project, the goal was to automate the deployment of production-grade, highly available, multi-node Kubernetes clusters in a local environment while keeping them fully customisable.&lt;/p&gt;

&lt;p&gt;Kubespray provided the perfect balance between flexibility, reliability, and ease of automation.&lt;/p&gt;

&lt;p&gt;Combining it with Multipass made the process lightweight and repeatable with minimal resource overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why My TerraNetes Over Kubespray? 🚀&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flo3seggoz6slbbkdm30h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flo3seggoz6slbbkdm30h.png" alt="terranetes" width="800" height="562"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While Kubespray is a fantastic tool for deploying production-grade Kubernetes clusters, it requires significant setup and configuration. TerraNetes, on the other hand, simplifies the entire process by automating everything with just one command. Here's a detailed comparison and why TerraNetes stands out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Node Provisioning Made Simple 🛠️&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Kubespray:&lt;/strong&gt; Requires manual provisioning of nodes before you can deploy Kubernetes.&lt;/p&gt;

&lt;p&gt;You need to set up virtual machines or physical servers, configure SSH access, and ensure the nodes meet Kubernetes prerequisites.&lt;/p&gt;

&lt;p&gt;Customization can be time-consuming and prone to errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TerraNetes:&lt;/strong&gt; Handles node provisioning out of the box.&lt;/p&gt;

&lt;p&gt;Automatically spins up and configures virtual machines using Multipass.&lt;/p&gt;

&lt;p&gt;No manual intervention is required to create or manage nodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Out-of-the-Box Configuration 🎁&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Kubespray:&lt;/strong&gt; Requires detailed configuration for nodes, including:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Firewall Rules:&lt;/strong&gt; Ensure necessary ports are open for Kubernetes components.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hostnames:&lt;/strong&gt; Set unique and resolvable hostnames for all nodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Swap Off:&lt;/strong&gt; Manually disable swap to ensure Kubernetes stability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kernel Modules:&lt;/strong&gt; Enable necessary modules like &lt;code&gt;br_netfilter&lt;/code&gt; with &lt;code&gt;modprobe&lt;/code&gt;.&lt;/p&gt;
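&lt;p&gt;For reference, those manual prep steps boil down to a handful of commands per node. The sketch below writes them to a script rather than running them, since they need root and should be reviewed first:&lt;/p&gt;

```shell
# Write the standard Kubernetes node-prep steps to a file for review
# (not executed here; run on each node as root once you are happy with it).
cat > prep-node.sh << 'EOF'
#!/bin/sh
swapoff -a                               # Kubernetes requires swap off
sed -i '/ swap / s/^/#/' /etc/fstab      # keep it off across reboots
modprobe br_netfilter                    # make bridged traffic visible to iptables
sysctl -w net.bridge.bridge-nf-call-iptables=1
sysctl -w net.ipv4.ip_forward=1
EOF
chmod +x prep-node.sh
```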

&lt;p&gt;&lt;strong&gt;TerraNetes:&lt;/strong&gt; Takes care of all these configurations automatically.&lt;/p&gt;

&lt;p&gt;Ensures all nodes are pre-configured with the required settings.&lt;/p&gt;

&lt;p&gt;No need to worry about forgetting a crucial step in node setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Ansible Connection Simplified 🔗&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Kubespray:&lt;/strong&gt; Requires SSH access and key management to establish connections between the control node and worker nodes.&lt;/p&gt;

&lt;p&gt;This includes creating SSH keys, distributing them to nodes, and ensuring proper permissions.&lt;br&gt;
&lt;strong&gt;TerraNetes:&lt;/strong&gt; Removes the need for manual SSH setup.&lt;/p&gt;

&lt;p&gt;Automatically configures Ansible connections during the provisioning process.&lt;/p&gt;

&lt;p&gt;New SSH keys are created on every deployment and deleted on every termination for security reasons.&lt;/p&gt;

&lt;p&gt;Automatically updates the Ansible inventory file with new node names and node IPs.&lt;/p&gt;

&lt;p&gt;No manual key distribution or host configuration is necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Single Command Deployment 💻&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Kubespray:&lt;/strong&gt; Deployment involves multiple steps:&lt;/p&gt;

&lt;p&gt;Provision nodes, prepare them, configure Ansible inventory, and run the playbooks.&lt;/p&gt;

&lt;p&gt;Each step requires careful attention to detail to avoid errors.&lt;/p&gt;
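&lt;p&gt;Condensed from the upstream Kubespray quick-start, the manual flow looks roughly like this (inventory name and IPs are placeholders; the sketch saves the steps to a script for review rather than running them):&lt;/p&gt;

```shell
# Reference sketch of the usual manual Kubespray flow, saved for review.
cat > deploy-kubespray.sh << 'EOF'
#!/bin/sh
git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
pip install -r requirements.txt               # Ansible and its dependencies
cp -rfp inventory/sample inventory/mycluster  # start from the sample inventory
# edit inventory/mycluster/hosts.yaml with your node names and IPs, then:
ansible-playbook -i inventory/mycluster/hosts.yaml --become cluster.yml
EOF
chmod +x deploy-kubespray.sh
```

&lt;p&gt;TerraNetes collapses all of the above into its single &lt;code&gt;cluster init&lt;/code&gt; command.&lt;/p&gt;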

&lt;p&gt;&lt;strong&gt;TerraNetes:&lt;/strong&gt; Simplifies deployment with one command.&lt;/p&gt;

&lt;p&gt;Handles node creation, configuration, and Kubernetes installation in a single execution.&lt;/p&gt;

&lt;p&gt;Saves time and ensures consistency across all environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Automation and User Experience 🎯&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Kubespray:&lt;/strong&gt; Powerful but designed for experienced users.&lt;/p&gt;

&lt;p&gt;Suitable for advanced scenarios where deep customization is required.&lt;/p&gt;

&lt;p&gt;However, it has a steep learning curve and requires significant effort to set up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TerraNetes:&lt;/strong&gt; Built with a focus on automation and ease of use.&lt;/p&gt;

&lt;p&gt;Perfect for users who want a Kubernetes cluster up and running quickly without dealing with the underlying complexities.&lt;/p&gt;

&lt;p&gt;Designed to minimize errors and reduce the barrier to entry for Kubernetes deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Time-Saving Advantage ⏳&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Kubespray:&lt;/strong&gt; Can take hours or even days to configure and deploy, depending on your expertise and environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TerraNetes:&lt;/strong&gt; Gets you a fully functional Kubernetes cluster in minutes, thanks to its streamlined approach.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8hz6cjneqel1kjywg0wd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8hz6cjneqel1kjywg0wd.png" alt="compare-terranetes" width="753" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to Choose TerraNetes?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Rapid Prototyping:&lt;/strong&gt; Need a cluster quickly without worrying about the underlying infrastructure.&lt;br&gt;
&lt;strong&gt;Simplicity:&lt;/strong&gt; Want to focus on using Kubernetes, not setting it up.&lt;br&gt;
&lt;strong&gt;Automation:&lt;/strong&gt; Prefer an end-to-end automated solution with minimal manual intervention.&lt;/p&gt;

&lt;p&gt;While Kubespray shines in highly customizable and production-grade setups, TerraNetes provides an unmatched level of simplicity and speed. With just one command, it does all the heavy lifting - node provisioning, configuration, and deployment - so you can focus on building and deploying applications. 🎉&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing the TerraNetes project&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://youtu.be/g7dDRwuw_v8?si=N8Eoaw4nYf02bMZL" rel="noopener noreferrer"&gt;Demo Video&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion 🎯&lt;/strong&gt;&lt;br&gt;
The TerraNetes project redefines the Kubernetes deployment experience with its simplicity, automation, and efficiency. It revolutionizes the way we deploy and manage production-grade multi-node Kubernetes clusters. Automating the entire process behind a single command hides the complexity of node provisioning, SSH configuration, hostname updates, firewall setup, and more. This out-of-the-box solution saves time, reduces errors, and makes Kubernetes deployment accessible to everyone, whether you're setting up a home lab or experimenting with multi-node clusters for learning.&lt;/p&gt;

&lt;p&gt;With TerraNetes, you can focus on what truly matters - building and scaling your applications - while leaving the heavy lifting of cluster management to the automation script. The combination of Kubespray and Multipass, coupled with the streamlined approach of TerraNetes, ensures a robust, scalable, and hassle-free Kubernetes experience. 🌐✨&lt;/p&gt;

&lt;p&gt;Whether you're a DevOps professional, developer, or Kubernetes enthusiast, TerraNetes empowers you to harness the full potential of Kubernetes in the fastest and easiest way possible. 🚀&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.linkedin.com/in/george-ezejiofor-89615a8a/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>python</category>
      <category>kubernetes</category>
      <category>ansible</category>
    </item>
    <item>
      <title>Implementing Blue-Green Deployment in Kubernetes with TLS Encryption Using Cert-Manager and Nginx Ingress</title>
      <dc:creator>George Ezejiofor</dc:creator>
      <pubDate>Fri, 15 Nov 2024 18:44:55 +0000</pubDate>
      <link>https://dev.to/ezejioforog/implementing-blue-green-deployment-in-kubernetes-with-tls-encryption-using-cert-manager-and-nginx-38hg</link>
      <guid>https://dev.to/ezejioforog/implementing-blue-green-deployment-in-kubernetes-with-tls-encryption-using-cert-manager-and-nginx-38hg</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;🌟 In modern cloud-native environments, ensuring zero-downtime deployments while maintaining robust security is critical. Blue-Green Deployment is a proven strategy that allows teams to switch traffic between different versions of an application seamlessly. Combined with TLS encryption for secure communication, this approach ensures a smooth and secure user experience.&lt;br&gt;
🚀 In this guide, we’ll implement a Blue-Green Deployment in Kubernetes, utilizing Cert-Manager for automated TLS certificate management and Nginx Ingress for traffic routing. By the end of this project, you’ll have a production-ready setup that you can replicate in your own environments.&lt;/p&gt;
&lt;h1&gt;
  
  
  Tech Stack
&lt;/h1&gt;

&lt;p&gt;🔧 Kubernetes: Cluster orchestration and management.&lt;br&gt;
🔒 Cert-Manager: Automated TLS certificate management.&lt;br&gt;
🌐 Nginx Ingress Controller: Routing HTTP(S) traffic to your services.&lt;br&gt;
📦 Helm (optional): Simplifying deployments.&lt;br&gt;
🛡️ Let's Encrypt: Free TLS certificates for HTTPS.&lt;br&gt;
📡 MetalLB: LoadBalancer for bare-metal Kubernetes clusters.&lt;br&gt;
💻 kubectl: Command-line tool for interacting with Kubernetes.&lt;/p&gt;
&lt;h1&gt;
  
  
  Prerequisites
&lt;/h1&gt;

&lt;p&gt;✅ Kubernetes Cluster: I will be using MicroK8s for this project, on a bare-metal setup with MetalLB as the LoadBalancer.&lt;br&gt;
✅ kubectl: Install and configure kubectl to interact with your cluster.&lt;br&gt;
✅ Helm: Install Helm, the Kubernetes package manager, for simplified application deployment and configuration.&lt;br&gt;
✅ Cert-Manager: Ensure Cert-Manager is installed in the cluster for TLS certificate management.&lt;br&gt;
✅ Nginx Ingress Controller: Deploy the Nginx Ingress Controller to handle HTTP(S) traffic routing.&lt;br&gt;
✅ Namespace Configuration: Create separate namespaces or labels for blue and green deployments.&lt;br&gt;
✅ Domain Name: Set up a domain name (or subdomain) terranetes.com pointing to your LoadBalancer IP address. I used Cloudflare to manage my DNS.&lt;br&gt;
✅ Let's Encrypt Account: Prepare for certificate issuance by having a valid email for Let's Encrypt configuration.&lt;br&gt;
✅ Basic Networking Knowledge: Familiarity with Kubernetes networking concepts, including Ingress and Services.&lt;/p&gt;
&lt;h1&gt;
  
  
  Architecture
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkm5c2xvzf96bokdgphoj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkm5c2xvzf96bokdgphoj.png" alt="Image description" width="800" height="472"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  DEPLOYMENTS
&lt;/h1&gt;

&lt;p&gt;Let’s create the namespaces for the project, and also set some environment variables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export CLOUDFLARE_API_KEY="xxxx change your token xxxxxxx"
export EMAIL="changeYOURemail@gmail.com"

echo $CLOUDFLARE_API_KEY
echo $EMAIL

kubectl create namespace blue-green
kubectl create namespace cert-manager

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Deploy Certificate components
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Create Cloudflare API Token Secret for cert-manager.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  # Create Cloudflare API Token Secret for cert-manager
  kubectl create secret generic cloudflare-api-token-cert-manager \
    --namespace cert-manager \
    --from-literal=api-token="$CLOUDFLARE_API_KEY"

  kubectl get secret cloudflare-api-token-cert-manager -n cert-manager -o yaml

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Create a ClusterIssuer with Cloudflare DNS-01 validation.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  apiVersion: cert-manager.io/v1
  kind: ClusterIssuer
  metadata:
    name: letsencrypt-dns01-nginx
  spec:
    acme:
      server: https://acme-v02.api.letsencrypt.org/directory
      email: $EMAIL
      privateKeySecretRef:
        name: letsencrypt-dns01-private-nginx-key
      solvers:
      - dns01:
          cloudflare:
            email: $EMAIL
            apiTokenSecretRef:
              name: cloudflare-api-token-cert-manager
              key: api-token

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Create a Certificate that references the ClusterIssuer letsencrypt-dns01-nginx. I want to create the certificate in the blue-green namespace.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  apiVersion: cert-manager.io/v1
  kind: Certificate
  metadata:
    name: blue-green-nginx-cert
    namespace: blue-green
  spec:
  secretName: blue-green-nginx-tls                     # Secret where cert-manager stores the issued certificate
    duration: 2160h # 90 days
    renewBefore: 360h # 15 days
    isCA: false
    privateKey:
      algorithm: RSA
      encoding: PKCS1
      size: 4096
    issuerRef:
      name: letsencrypt-dns01-nginx
      kind: ClusterIssuer
      group: cert-manager.io
    dnsNames:
      - "blue.terranetes.com"
      - "green.terranetes.com"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy4z638dq2ijk7mn00yg8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy4z638dq2ijk7mn00yg8.png" alt="Image description" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Green Environment is my default live environment.
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: terranetes-nodegreen
  namespace: blue-green
  labels:
    app: terranetes-nodegreen
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: terranetes-nodegreen
      version: green
  template:
    metadata:
      labels:
        app: terranetes-nodegreen
        version: green
    spec:
      containers:
      - name: terranetes-nodegreen
        image: georgeezejiofor/terranetes-nodegreen:green-v1
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
        env:
        - name: VERSION
          value: "green"
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
---
apiVersion: v1
kind: Service
metadata:
  name: terranetes-nodegreen-svc
  namespace: blue-green
spec:
  selector:
    app: terranetes-nodegreen
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Blue Environment is my NEW live environment.
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: terranetes-nodeblue
  namespace: blue-green
  labels:
    app: terranetes-nodeblue
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: terranetes-nodeblue
      version: blue
  template:
    metadata:
      labels:
        app: terranetes-nodeblue
        version: blue
    spec:
      containers:
      - name: terranetes-nodeblue
        image: georgeezejiofor/terranetes-nodeblue:blue-v1
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
        env:
        - name: VERSION
          value: "blue"
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
---
apiVersion: v1
kind: Service
metadata:
  name: terranetes-nodeblue-svc
  namespace: blue-green
spec:
  selector:
    app: terranetes-nodeblue
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The services &lt;code&gt;terranetes-nodeblue-svc&lt;/code&gt; and &lt;code&gt;terranetes-nodegreen-svc&lt;/code&gt; act as a router or switch to the different environments.&lt;br&gt;
The services are also deployed as &lt;code&gt;ClusterIP&lt;/code&gt;, so they can only be accessed within the cluster. I’m going to expose them through the LoadBalancer of my Ingress controller, and update the Ingress resource with the TLS configuration.&lt;/p&gt;
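&lt;p&gt;For reference, the same "router" idea can also live at the Service level instead of the Ingress. The sketch below is illustrative only (the Service name is hypothetical, not part of this setup) and assumes both Deployments keep their &lt;code&gt;version&lt;/code&gt; labels: a single Service whose selector is flipped between &lt;code&gt;green&lt;/code&gt; and &lt;code&gt;blue&lt;/code&gt;.&lt;/p&gt;

```yaml
# Illustrative alternative: one "router" Service that selects pods by the
# version label. Editing the selector cuts traffic over in one step.
apiVersion: v1
kind: Service
metadata:
  name: terranetes-router-svc   # hypothetical name, not used in this article
  namespace: blue-green
spec:
  type: ClusterIP
  selector:
    version: green              # change to "blue" to switch environments
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
```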
&lt;h1&gt;
  
  
  Deploy Ingress resource to expose both services
&lt;/h1&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blue-green-ingress
  namespace: blue-green
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-dns01-nginx
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - blue.terranetes.com
        - green.terranetes.com
      secretName: blue-green-nginx-tls
  rules:
    - host: blue.terranetes.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: terranetes-nodeblue-svc  
                port:
                  number: 80
    - host: green.terranetes.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: terranetes-nodegreen-svc  
                port:
                  number: 80
---
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fph1rfy09qoj63ia13xoi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fph1rfy09qoj63ia13xoi.png" alt="Image description" width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5einpa06d1ud64t0sugh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5einpa06d1ud64t0sugh.png" alt="Image description" width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlffm4lzua1xktr2s7kp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlffm4lzua1xktr2s7kp.png" alt="Image description" width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Initial Traffic Routing
&lt;/h1&gt;

&lt;p&gt;Currently, traffic is split between blue.terranetes.com (blue app) and green.terranetes.com (green app); this setup routes users based on the hostname. Now let’s switch traffic from the green environment (live) to the blue environment (new). We do the switch on the Ingress resource, which is acting as our router.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blue-green-ingress
  namespace: blue-green
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-dns01-nginx
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - blue.terranetes.com
        - green.terranetes.com
      secretName: blue-green-nginx-tls
  rules:
    - host: blue.terranetes.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: terranetes-nodeblue-svc
                port:
                  number: 80
    - host: green.terranetes.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: terranetes-nodeblue-svc   # update blue svc here
                port:
                  number: 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will notice that even the green.terranetes.com URL now serves the blue app.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blue.terranetes.com/" rel="noopener noreferrer"&gt;https://blue.terranetes.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://green.terranetes.com/" rel="noopener noreferrer"&gt;https://green.terranetes.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fillik6pig4maa9xw5t8s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fillik6pig4maa9xw5t8s.png" alt="Image description" width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We just switched &lt;code&gt;green app&lt;/code&gt; users from the &lt;code&gt;green environment&lt;/code&gt; to the &lt;code&gt;blue environment&lt;/code&gt;. With this strategy we can easily roll back by switching &lt;code&gt;green.terranetes.com&lt;/code&gt; back to the green service.&lt;/p&gt;

&lt;h1&gt;
  
  
  Rolling Back the Switch
&lt;/h1&gt;

&lt;p&gt;You can roll back by modifying the Ingress again to restore the original routing:&lt;/p&gt;

&lt;p&gt;blue.terranetes.com -&amp;gt; &lt;strong&gt;blue app&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;green.terranetes.com -&amp;gt; &lt;strong&gt;green app&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Optional: Weighted Traffic Splitting
&lt;/h1&gt;

&lt;p&gt;If you want to gradually switch users from green to blue (or vice versa), consider implementing canary deployment or weighted routing using tools like:&lt;/p&gt;

&lt;p&gt;NGINX Ingress canary annotations (if supported by your controller).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Istio&lt;/code&gt; or &lt;code&gt;Traefik&lt;/code&gt; for advanced traffic management.&lt;/p&gt;

&lt;p&gt;This allows for:&lt;/p&gt;

&lt;p&gt;Gradual routing (e.g., 80% green, 20% blue).&lt;/p&gt;

&lt;p&gt;Monitoring the behavior of users before full migration.&lt;/p&gt;
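&lt;p&gt;With the NGINX Ingress Controller, weighted splitting can be sketched as a second canary Ingress alongside the existing one; the 20% weight below is just an example value:&lt;/p&gt;

```yaml
# Sketch: send roughly 20% of green.terranetes.com traffic to the blue
# service using ingress-nginx canary annotations.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blue-green-canary
  namespace: blue-green
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "20"
spec:
  ingressClassName: nginx
  rules:
    - host: green.terranetes.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: terranetes-nodeblue-svc
                port:
                  number: 80
```

Raising the weight step by step (20 → 50 → 100) migrates users gradually while you watch metrics, instead of flipping everyone at once.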

&lt;h1&gt;
  
  
  Conclusion 🎉
&lt;/h1&gt;

&lt;p&gt;Through this guide, we’ve successfully implemented a Blue-Green Deployment strategy on Kubernetes with robust TLS encryption, utilizing Cert-Manager and Nginx Ingress. This architecture ensures zero-downtime deployments, seamless traffic switching, and enhanced security, making it a reliable choice for production environments.&lt;/p&gt;

&lt;p&gt;By switching users between green and blue environments effortlessly, we’ve demonstrated the power of dynamic traffic management. Whether it’s for releasing new features or mitigating issues with instant rollbacks, this approach minimizes risk and enhances user experience.&lt;/p&gt;

&lt;p&gt;Additionally, the option to incorporate weighted traffic splitting or canary deployments provides further flexibility for gradual rollouts, enabling better control and monitoring during transitions.&lt;/p&gt;

&lt;h1&gt;
  
  
  Key Takeaways 🗝️
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Ease of Switching:&lt;/strong&gt; ✨ Modify the Ingress resource to direct traffic instantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enhanced Security:&lt;/strong&gt; 🔒 Automated TLS certificates ensure secure communication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rollback Ready:&lt;/strong&gt; 🔄 Revert traffic with minimal effort.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scalability:&lt;/strong&gt; 📈 Extend the setup to support more complex routing patterns like weighted traffic.&lt;/p&gt;

&lt;p&gt;With this setup, you’re equipped to deploy applications confidently, ensuring both reliability and user satisfaction. Ready to try this out in your environment? 🚀&lt;/p&gt;

&lt;p&gt;Happy Deploying! 🌟&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>devsecops</category>
      <category>microservices</category>
    </item>
    <item>
      <title>test</title>
      <dc:creator>George Ezejiofor</dc:creator>
      <pubDate>Sat, 28 Oct 2023 19:44:59 +0000</pubDate>
      <link>https://dev.to/ezejioforog/test-268m</link>
      <guid>https://dev.to/ezejioforog/test-268m</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker ps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
  </channel>
</rss>
