<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: NTCTech</title>
    <description>The latest articles on DEV Community by NTCTech (@ntctech).</description>
    <link>https://dev.to/ntctech</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3784059%2Fc609d531-fdab-47ac-bb17-37fd1ecc3d71.jpg</url>
      <title>DEV Community: NTCTech</title>
      <link>https://dev.to/ntctech</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ntctech"/>
    <language>en</language>
    <item>
      <title>The "Lift-and-Shift to KVM" Fallacy</title>
      <dc:creator>NTCTech</dc:creator>
      <pubDate>Mon, 04 May 2026 12:42:14 +0000</pubDate>
      <link>https://dev.to/ntctech/the-lift-and-shift-to-kvm-fallacy-3i9d</link>
      <guid>https://dev.to/ntctech/the-lift-and-shift-to-kvm-fallacy-3i9d</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsgokglxx71xvepd0dsx5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsgokglxx71xvepd0dsx5.jpg" alt="lift-and-shift KVM migration operating model gap — VMware integrated control plane vs unbundled KVM stack" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
The VM conversion completed without errors. Every workload made it across. The migration dashboard showed green, the project lead closed the ticket, and the consultants left the building.&lt;/p&gt;

&lt;p&gt;Three weeks later, backup verification jobs are silently failing. Monitoring dashboards are dark. The on-call team is operating without baselines. Nobody knows what normal looks like on the new platform.&lt;/p&gt;

&lt;p&gt;The VM conversion worked. The migration did not.&lt;/p&gt;

&lt;p&gt;This is the lift-and-shift KVM fallacy — and it isn't a KVM problem. It's a scoping problem. Most VMware-to-KVM migration plans capture the visible dependency — the hypervisor — and treat everything built around it as someone else's project. The Operating Model Gap is what that assumption leaves behind.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Lift-and-Shift Actually Moves
&lt;/h2&gt;

&lt;p&gt;Lift-and-shift KVM moves compute. Disk images transfer. Network definitions port. VM configurations are recreated on the other side. From a data-plane perspective, the migration looks complete because the workloads are running.&lt;/p&gt;

&lt;p&gt;What does not move:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Operational runbooks referencing vCenter constructs&lt;/li&gt;
&lt;li&gt;Backup architecture built against VADP APIs&lt;/li&gt;
&lt;li&gt;Monitoring thresholds calibrated to vSphere metrics&lt;/li&gt;
&lt;li&gt;Provisioning workflows targeting vCenter endpoints&lt;/li&gt;
&lt;li&gt;Snapshot behavior assumptions encoded in recovery procedures&lt;/li&gt;
&lt;li&gt;Storage policy logic tied to vSAN semantics&lt;/li&gt;
&lt;li&gt;Identity and access models mapped to vCenter RBAC&lt;/li&gt;
&lt;li&gt;Operator muscle memory built over years of vCenter navigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this appears in the migration plan. All of it breaks after cutover.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Operating Model Gap&lt;/strong&gt; is the distance between what the migration plan captured and what the platform actually required to function. Every item in that list is a component of the operating model. The hypervisor conversion touches none of them.&lt;/p&gt;




&lt;h2&gt;
  
  
  VMware Was Never Just the Hypervisor
&lt;/h2&gt;

&lt;p&gt;The framing that produces lift-and-shift KVM plans is this: VMware equals ESXi. Replace ESXi with KVM. Migration complete.&lt;/p&gt;

&lt;p&gt;That framing is wrong. VMware was never ESXi. VMware was the control plane your entire operating model was built around.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqihyf1ya4mpehr343djb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqihyf1ya4mpehr343djb.jpg" alt="lift-and-shift KVM — VMware control plane stack showing vCenter, vSAN, NSX, vROps, and VADP as integrated layers" width="800" height="447"&gt;&lt;/a&gt; &lt;br&gt;
&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;What the plan says&lt;/th&gt;&lt;th&gt;What actually changes&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;ESXi → KVM&lt;/td&gt;&lt;td&gt;vCenter (lifecycle and provisioning control)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;vMotion semantics (live migration behavior)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;vSAN (storage abstraction and policy model)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;NSX (network policy and microsegmentation)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;vROps / vRealize (observability and alerting logic)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;VADP (backup API framework)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;DRS (scheduling and placement policy)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;Snapshot behavior (application-consistent logic)&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;A VMware environment is not a hypervisor with add-ons. It is an integrated control surface where compute scheduling, storage policy, network segmentation, observability, and recovery operations all converge. When you replace ESXi with KVM, every one of those layers needs a replacement or a rebuild — and unlike ESXi, KVM does not ship them included.&lt;/p&gt;

&lt;p&gt;KVM is a kernel module. The management plane, storage architecture, network abstraction, and observability stack are your responsibility to assemble, integrate, and operate. That assembly is the migration work most lift-and-shift plans never scope.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Operating Model Test:&lt;/strong&gt; If vCenter disappeared tomorrow, what percentage of your operating model disappears with it?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For most VMware shops, the honest answer is somewhere between 60 and 90 percent. That percentage is the scope of what a lift-and-shift to KVM does not address.&lt;/p&gt;
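&lt;p&gt;As a rough way to put a number on the Operating Model Test, you can inventory the operating-model components and tag the ones that would vanish with vCenter. A minimal sketch; the component list and dependency tags below are illustrative placeholders, not a real inventory:&lt;/p&gt;

```python
# Illustrative operating-model inventory. Each entry maps a component to
# whether it depends on the vCenter control plane (tags are hypothetical).
INVENTORY = {
    "provisioning workflows": True,   # target vCenter endpoints
    "backup jobs": True,              # built against VADP
    "monitoring thresholds": True,    # calibrated to vSphere metrics
    "runbooks": True,                 # reference vCenter constructs
    "identity model": True,           # mapped to vCenter RBAC
    "application configs": False,     # live inside the guests
    "ci pipelines": False,            # deploy into the guests
}

def operating_model_gap(inventory):
    """Percentage of the operating model that disappears with vCenter."""
    dependent = sum(1 for dep in inventory.values() if dep)
    return round(100 * dependent / len(inventory))

if __name__ == "__main__":
    print(f"{operating_model_gap(INVENTORY)}% of the operating model depends on vCenter")
```

&lt;p&gt;The exact percentage matters less than forcing every component onto the list before the plan is signed.&lt;/p&gt;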




&lt;h2&gt;
  
  
  The Three Failure Surfaces After Cutover
&lt;/h2&gt;

&lt;p&gt;Lift-and-shift KVM migrations do not fail at cutover. They fail in operations. The failure surfaces are predictable, they appear in sequence, and they are almost never in the migration plan.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbcgavtra6ldhh9qw4q1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbcgavtra6ldhh9qw4q1.jpg" alt="lift-and-shift KVM three failure surfaces — control plane replacement, storage semantics collapse, and operational signal loss after cutover" width="800" height="437"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Failure Surface 1: Control Plane Replacement (Day 1–7)
&lt;/h3&gt;

&lt;p&gt;You did not replace ESXi. You replaced vCenter.&lt;/p&gt;

&lt;p&gt;vCenter was the operational control surface for provisioning new workloads, managing VM lifecycle, enforcing placement policy, controlling access, and targeting automation. When you move to KVM, vCenter is gone — and everything that pointed at it needs a new target.&lt;/p&gt;

&lt;p&gt;The KVM ecosystem offers options: &lt;a href="https://libvirt.org/docs.html" rel="noopener noreferrer"&gt;libvirt&lt;/a&gt; for direct management, Proxmox VE for a GUI-centric model, oVirt for a closer-to-vCenter experience, OpenStack for cloud-scale orchestration. Each is a different operating model. None is a drop-in replacement. The team that executed a lift-and-shift KVM migration and operated vCenter for a decade does not automatically know how to operate any of them under pressure at 2am.&lt;/p&gt;

&lt;p&gt;This is the first stall point. Not because the management plane doesn't exist — it does — but because the operating model loses its control surface and the team has to rebuild operational confidence from scratch.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure Surface 2: Storage Semantics Collapse (Day 7–30)
&lt;/h3&gt;

&lt;p&gt;You did not lose shared storage. You lost the storage abstraction your platform behavior depended on.&lt;/p&gt;

&lt;p&gt;vSAN provided a distributed storage fabric with defined behavior around replication, failure domains, snapshot consistency, and policy-based placement. That abstraction encoded a set of assumptions your entire backup architecture, recovery procedures, and performance baselines were built against.&lt;/p&gt;

&lt;p&gt;In a KVM environment, that abstraction is gone. You are now operating raw storage — whether &lt;a href="https://docs.ceph.com/en/latest/architecture/" rel="noopener noreferrer"&gt;Ceph&lt;/a&gt;, NFS, iSCSI, or local — and the behavior is different in ways that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Snapshot behavior&lt;/strong&gt; — application-consistent snapshot mechanics differ by storage backend; VADP is gone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backup assumptions&lt;/strong&gt; — protection jobs built against VADP APIs break immediately; rebuild is required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance characteristics&lt;/strong&gt; — latency, IOPS, and throughput profiles differ between vSAN and Ceph under the same load pattern&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replication semantics&lt;/strong&gt; — storage replication behavior and consistency guarantees are not equivalent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure domain logic&lt;/strong&gt; — how the platform handles node loss differs from vSAN's policy model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where migrations pass validation and fail under load. Workloads run. The environment looks healthy. The gaps appear during the first backup verification window, the first storage-intensive workload spike, or the first incident that requires a restore from a snapshot taken after cutover.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure Surface 3: Operational Signal Loss (Day 30+)
&lt;/h3&gt;

&lt;p&gt;The workloads moved. The signals didn't.&lt;/p&gt;

&lt;p&gt;VMware environments accumulate operational signal over years — dashboards calibrated to vROps metrics, alert thresholds tuned against vSphere counters, runbooks that reference specific vCenter constructs, capacity models built on historical data from the VMware telemetry stack. That signal is institutional knowledge encoded in tooling.&lt;/p&gt;

&lt;p&gt;After a KVM migration, all of it is wrong. The old dashboards are meaningless because the metrics don't exist. The alert thresholds don't map because the counters are different. The runbooks reference objects that no longer exist. The on-call team is operating blind against a platform they don't have baselines for yet.&lt;/p&gt;

&lt;p&gt;This is where Day 30 failure begins. Not a dramatic incident — a slow erosion of operational confidence, a growing number of "we're not sure what normal looks like" moments, and a steady accumulation of unresolved alerts the team has stopped trusting.&lt;/p&gt;

&lt;p&gt;The observability rebuild is not a migration task. It is a post-migration operational project that takes weeks. It is almost never in the original migration scope.&lt;/p&gt;




&lt;h2&gt;
  
  
  When KVM Actually Fits
&lt;/h2&gt;

&lt;p&gt;This is not a post about KVM being unsuitable for enterprise infrastructure. KVM is a legitimate hypervisor running production workloads at scale across some of the largest environments in the world. The question is not whether a lift-and-shift KVM approach works — it's whether your operating model is positioned for it.&lt;/p&gt;

&lt;p&gt;KVM fits when the operating model already lives below VMware's abstraction layer. KVM is a &lt;a href="https://www.linux-kvm.org/page/Main_Page" rel="noopener noreferrer"&gt;Linux kernel module&lt;/a&gt;; operating it well means operating Linux well, at depth, under production pressure.&lt;/p&gt;

&lt;p&gt;The signal that KVM is the right call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux is already the operational center — the team thinks in hosts, not abstractions&lt;/li&gt;
&lt;li&gt;Automation already targets infrastructure primitives directly, not vCenter APIs&lt;/li&gt;
&lt;li&gt;The team has operated without VMware's abstraction layer under pressure — not in theory, in production&lt;/li&gt;
&lt;li&gt;Sovereignty or cost physics make open-source the architectural requirement, not just the preference&lt;/li&gt;
&lt;li&gt;Greenfield or container-adjacent workloads where VMware's abstraction was overhead, not operating leverage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The distinction that matters is not "does the team know Linux." It is whether the team has operated infrastructure at the primitive layer under production pressure. A team with deep vCenter muscle memory that also has Linux skills is not the same as a team that has always operated below the abstraction. The former needs a longer runway and an explicit skills transition plan. The latter is ready.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scope the Operating Model Before the Hypervisor
&lt;/h2&gt;

&lt;p&gt;The correct sequencing for a lift-and-shift KVM migration is not: pick hypervisor, convert VMs, go live. It is: audit the operating model, scope the rebuild, then pick the hypervisor.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshki15f9dq5t1tp7yid2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshki15f9dq5t1tp7yid2.jpg" alt="lift-and-shift KVM migration scope checklist — management plane, storage semantics, observability rebuild, and skills audit" width="800" height="437"&gt;&lt;/a&gt; &lt;br&gt;
Four things to scope before the hypervisor decision is final:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;01 — Management Plane Decision&lt;/strong&gt;&lt;br&gt;
Pick the management plane before the hypervisor. libvirt, Proxmox, oVirt, and OpenStack are not equivalent choices — each implies a different operational model, skill requirement, and automation target. The management plane decision determines the operating model. The hypervisor follows from it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;02 — Storage Semantics Audit&lt;/strong&gt;&lt;br&gt;
Map every storage dependency in the current environment — snapshot behavior, backup integration points, replication architecture, performance baselines. Document what the new storage backend provides and where the semantics differ. The delta is the rebuild scope. Treat it as a parallel workstream, not a migration task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;03 — Observability Rebuild Plan&lt;/strong&gt;&lt;br&gt;
Plan for zero operational signal on Day 1. The old dashboards are dead. The alert thresholds don't transfer. Build the observability stack against the new platform before workloads arrive — or accept that the first weeks post-cutover will be operationally blind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;04 — Skills Audit (Honest Version)&lt;/strong&gt;&lt;br&gt;
Not certifications. Not training course completions. Operational depth under pressure. Has the team operated storage at the Ceph or NFS primitive level during an incident? Have they managed KVM scheduling behavior under resource contention? Knowing how something works is not the same as having operated it when it breaks.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architect's Verdict
&lt;/h2&gt;

&lt;p&gt;KVM is not the problem. Treating the hypervisor as the platform is.&lt;/p&gt;

&lt;p&gt;VMware was a control plane your entire operating model was built around. A lift-and-shift KVM project moves the compute layer and leaves the operating model — management plane, storage semantics, observability stack, backup architecture, and operational muscle memory — orphaned on the other side of the migration window.&lt;/p&gt;

&lt;p&gt;The fallacy is not that KVM is harder than expected. The fallacy is scoping a lift-and-shift KVM project as a hypervisor migration when what you actually triggered is an operating model rewrite. Name it correctly before the project starts. Scope the rebuild explicitly. Run the Operating Model Test before you sign the migration plan.&lt;/p&gt;

&lt;p&gt;If vCenter disappeared tomorrow and 70 percent of your operating model went with it, that 70 percent is the migration. The hypervisor swap is the easy part.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.rack2cloud.com/lift-and-shift-kvm-migration-fallacy/" rel="noopener noreferrer"&gt;rack2cloud.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vmware</category>
      <category>kvm</category>
      <category>devops</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Google Just Moved the Control Plane Boundary</title>
      <dc:creator>NTCTech</dc:creator>
      <pubDate>Fri, 01 May 2026 12:10:18 +0000</pubDate>
      <link>https://dev.to/ntctech/google-just-moved-the-control-plane-boundary-1fk8</link>
      <guid>https://dev.to/ntctech/google-just-moved-the-control-plane-boundary-1fk8</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbf0l1pj5zmu77jubf6i4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbf0l1pj5zmu77jubf6i4.jpg" alt="Control plane boundary shift — Kubernetes scaling from cluster multiplication to control plane unification" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
For a decade, the Kubernetes scaling playbook had one move: add another cluster.&lt;/p&gt;

&lt;p&gt;Need more capacity? Add a cluster. Need workload isolation? Add a cluster. Need regional separation? Add a cluster. Need a dedicated GPU pool? Add a cluster. The cluster became the unit of scale because the control plane could not scale far enough to avoid making it one.&lt;/p&gt;

&lt;p&gt;At Google Cloud Next '26, Google made the opposite bet. A single Kubernetes-conformant control plane spanning 256,000 nodes across multiple regions, managing a million accelerators as a unified capacity reserve. Not bigger Kubernetes. A different architectural claim entirely.&lt;/p&gt;

&lt;p&gt;The claim is this: the control plane is now the unit of scale. The cluster is not.&lt;/p&gt;

&lt;p&gt;Most platform architectures were not built around that assumption. They are still operating the old boundary — and that mismatch is what this post is actually about.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Old Scaling Model Was Cluster Multiplication
&lt;/h2&gt;

&lt;p&gt;The cluster-as-boundary model made sense when it emerged. Kubernetes control planes had real scale limits. Policy enforcement was cluster-scoped. Observability was cluster-local. Capacity pools were physically tied to the node groups a given control plane could manage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F31zqer257rhjb6rbuu4w.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F31zqer257rhjb6rbuu4w.jpg" alt="Cluster multiplication model — Kubernetes scaling by adding clusters creates fragmented capacity pools" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
So teams multiplied. A cluster per environment. A cluster per region. A cluster per team. A cluster per workload class. A cluster per GPU type. The operational pattern became: when you hit a boundary, add another cluster.&lt;/p&gt;

&lt;p&gt;That solved the immediate problem. It also created a different class of problem that compounded silently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fragmented capacity.&lt;/strong&gt; Idle capacity in one cluster could not be claimed by a workload running out of headroom in another.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duplicated policy.&lt;/strong&gt; Every cluster needed its own RBAC, network policy, and admission control. Changes had to propagate across every cluster. Drift was structural.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disconnected observability.&lt;/strong&gt; Metrics and logs were cluster-local. Understanding system-wide state required stitching together signals from dozens of independent sources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compounding operational overhead.&lt;/strong&gt; Each cluster was a discrete object requiring lifecycle management, upgrades, and failure response.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The industry normalized cluster multiplication because the alternative — scaling the control plane itself — was not a credible option. Until now.&lt;/p&gt;
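&lt;p&gt;The fragmented-capacity problem is easy to make concrete: a fleet can hold plenty of idle GPUs in aggregate while no single cluster can place the job. A sketch with hypothetical numbers:&lt;/p&gt;

```python
def schedulable(job_gpus, free_by_cluster, fleet_scoped):
    """Under cluster-scoped scheduling, some single cluster must fit the
    job; under a fleet-scoped control plane the whole reserve counts."""
    if fleet_scoped:
        return sum(free_by_cluster.values()) >= job_gpus
    return max(free_by_cluster.values()) >= job_gpus

if __name__ == "__main__":
    # Hypothetical fleet: 18 idle GPUs in total, none in one place.
    free = {"cluster-a": 6, "cluster-b": 5, "cluster-c": 7}
    print("cluster-scoped:", schedulable(16, free, fleet_scoped=False))
    print("fleet-scoped:  ", schedulable(16, free, fleet_scoped=True))
```

&lt;p&gt;The 18 idle GPUs exist either way; only the boundary decides whether they are usable capacity or stranded capacity.&lt;/p&gt;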




&lt;h2&gt;
  
  
  Google Just Moved the Boundary
&lt;/h2&gt;

&lt;p&gt;GKE Hypercluster is not a capacity announcement. It is an architectural boundary announcement.&lt;/p&gt;

&lt;p&gt;A single, Kubernetes-conformant control plane managing 256,000 nodes across multiple Google Cloud regions, treating distributed infrastructure as a unified capacity reserve — that is a claim about where the boundary should sit. Not at the cluster. At the control plane.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Control Plane Boundary&lt;/strong&gt; is the logical boundary at which scheduling authority, policy enforcement, and capacity governance are unified. For a decade, that boundary was the cluster by necessity. Hypercluster is Google's signal that it does not have to be.&lt;/p&gt;

&lt;p&gt;When the control plane boundary moves outward — from cluster-scope to fleet-scope:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capacity planning becomes global&lt;/li&gt;
&lt;li&gt;Policy becomes a control plane concern, not a cluster concern&lt;/li&gt;
&lt;li&gt;Scheduling becomes capacity orchestration across a unified multi-region pool&lt;/li&gt;
&lt;li&gt;Failure domains get redefined&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a GKE-specific development. It is a signal about where the architectural center of gravity is moving.&lt;/p&gt;




&lt;h2&gt;
  
  
  Most Teams Still Operate the Old Boundary
&lt;/h2&gt;

&lt;p&gt;Most platform architectures today are still built around four cluster-scoped assumptions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cluster as operational boundary.&lt;/strong&gt; Runbooks, upgrade cycles, certificate rotation — all scoped to the cluster. This made sense when each cluster was the largest coherent unit. It becomes overhead when the control plane boundary moves outward.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cluster as policy boundary.&lt;/strong&gt; RBAC, network policy, admission webhooks — all applied at cluster scope, duplicated across every cluster in the fleet, drifting over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cluster as capacity boundary.&lt;/strong&gt; Cluster autoscaler, node pools, resource quotas — all defined within a cluster. Cross-cluster capacity awareness requires external tooling or manual coordination.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cluster as failure boundary.&lt;/strong&gt; Blast radius assumptions and availability zone mapping built around the cluster as the natural unit of failure.&lt;/p&gt;

&lt;p&gt;These assumptions were correct architectural choices when the control plane could not scale past them. They become architectural debt when the control plane boundary moves.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Breaks When the Boundary Moves
&lt;/h2&gt;

&lt;p&gt;When the control plane boundary shifts, the old cluster-scoped assumptions do not just become inefficient — some of them break operationally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkq2wmodvj6t8nnt4d1a.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkq2wmodvj6t8nnt4d1a.jpg" alt="Four cluster boundary assumptions that break when the control plane boundary shifts" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Capacity planning stops being cluster-local.&lt;/strong&gt; The question "how much headroom does this cluster have" becomes wrong. The right question is "what is the available capacity in this scheduling domain" — which may span regions and node types. GPU idle is already a capacity forecasting failure in cluster-local models. It compounds in fleet-scale models without the right abstraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Policy can no longer be cluster-scoped by default.&lt;/strong&gt; Policy duplication that was an accepted operational cost becomes a design inconsistency across the unified scheduling domain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure domains stop aligning cleanly to cluster boundaries.&lt;/strong&gt; Blast radius design at control-plane-boundary scale is an explicit architectural decision, not a cluster-topology default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability must model control-plane-wide state.&lt;/strong&gt; Cluster-local metrics describe local state. Fleet-wide scheduling decisions require fleet-wide visibility. The gap between what dashboards show and what the system is actually doing does not shrink when the scheduling domain expands without deliberate instrumentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scheduling becomes capacity orchestration, not node placement.&lt;/strong&gt; Kubernetes scheduling at cluster scope is a bin-packing problem. At control-plane-boundary scope it is a capacity allocation problem. Different mental model, different tooling, different operational discipline.&lt;/p&gt;

&lt;p&gt;This is where Kubernetes operations becomes distributed control plane design. That is the actual shift — not the chip count.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Million-Chip Problem Is Not About Chips
&lt;/h2&gt;

&lt;p&gt;The headline number from Hypercluster is a million chips. That is the wrong thing to pay attention to.&lt;/p&gt;

&lt;p&gt;Google is not telling you that you need to manage a million chips. Google is telling you that the next infrastructure bottleneck is not compute — it is the control plane that governs compute.&lt;/p&gt;

&lt;p&gt;The teams still scaling by multiplying clusters are solving yesterday's bottleneck. Every cluster added under the old model is a migration conversation waiting to happen under the new one. The cost of a cluster-multiplication architecture is not just operational overhead. It is the structural cost of a boundary assumption that the industry is moving past.&lt;/p&gt;

&lt;p&gt;The control plane boundary is not a GKE feature. It is the next architectural forcing function in distributed infrastructure. The architectural question for everyone else is not whether to adopt Hypercluster. It is whether your platform design is built around a boundary assumption that is already changing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architect's Verdict
&lt;/h2&gt;

&lt;p&gt;Kubernetes cluster multiplication was not a mistake. It was the correct architectural response to a real constraint: the control plane could not scale far enough to make it unnecessary.&lt;/p&gt;

&lt;p&gt;That constraint has now been challenged directly. The Control Plane Boundary — the logical boundary at which scheduling authority, policy enforcement, and capacity governance are unified — belongs at fleet scope, not cluster scope. Google made that bet publicly at Next '26.&lt;/p&gt;

&lt;p&gt;Most platform architectures are still designed around the cluster as that boundary. The four assumptions — cluster as operational boundary, policy boundary, capacity boundary, and failure boundary — were correct when the ceiling was low. They become architectural debt when the ceiling moves.&lt;/p&gt;

&lt;p&gt;The million-chip number is not the story. The story is what it signals about where the bottleneck is moving. For a decade, teams added clusters to avoid hitting the control plane ceiling. The ceiling just moved. The question is whether your architecture was designed for the constraint, or for the problem the constraint was preventing you from solving.&lt;/p&gt;

&lt;p&gt;The Control Plane Boundary has shifted. Most architectures have not.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.rack2cloud.com/control-plane-boundary-kubernetes-scale/" rel="noopener noreferrer"&gt;rack2cloud.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>architecture</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>GPU Scheduling in Kubernetes: Start Before the Scheduler</title>
      <dc:creator>NTCTech</dc:creator>
      <pubDate>Thu, 30 Apr 2026 12:55:20 +0000</pubDate>
      <link>https://dev.to/ntctech/gpu-scheduling-in-kubernetes-start-before-the-scheduler-1pd7</link>
      <guid>https://dev.to/ntctech/gpu-scheduling-in-kubernetes-start-before-the-scheduler-1pd7</guid>
      <description>&lt;p&gt;Most teams think GPU scheduling starts with the scheduler.&lt;/p&gt;

&lt;p&gt;It starts with demand modeling.&lt;/p&gt;

&lt;p&gt;By the time Volcano, Kueue, or KEDA enters the conversation, the expensive mistake has usually already been made. The cluster was provisioned against a theoretical peak that rarely materializes. The demand curve was never drawn. The concurrency profile was assumed rather than measured.&lt;/p&gt;

&lt;p&gt;The core argument: &lt;strong&gt;GPU scheduling is not a capacity solution. It is a capacity enforcement layer.&lt;/strong&gt; If you provisioned against the wrong demand curve, the scheduler cannot save you.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Demand Model Preflight
&lt;/h3&gt;

&lt;p&gt;Before you talk about schedulers, answer four questions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. What is your real concurrency floor?&lt;/strong&gt; Not peak theoretical demand. The minimum sustained parallel work your cluster must support without queue collapse. If you cannot answer this from measurement, you don't have a demand model — you have an assumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. What is burst, and what is noise?&lt;/strong&gt; If demand spikes for ninety seconds, does that justify permanent GPU allocation — or should it queue? Burst shorter than your cold-start window is noise. Noise should not drive provisioning decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. How long does work stay resident?&lt;/strong&gt; A model loaded in VRAM is not active work. If memory stays hot longer than compute stays busy, utilization is already overstated before the scheduler runs a single job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. What can wait, and for how long?&lt;/strong&gt; Scheduling starts with tolerated latency. If every workload is marked urgent, none of them are schedulable efficiently.&lt;/p&gt;

&lt;p&gt;If you cannot answer all four from data rather than assumption, the scheduler conversation is premature.&lt;/p&gt;
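&lt;p&gt;Questions 1 and 2 reduce to two small measurements. A sketch of what answering them from data could look like (every number and threshold below is an illustrative assumption, not a tool):&lt;/p&gt;

```python
# Sketch for Preflight Questions 1 and 2: derive the concurrency floor
# from measurement, and classify spikes against the cold-start window.
# All sample values here are illustrative assumptions.

def concurrency_floor(samples: list[int], percentile: float = 10.0) -> int:
    """Minimum sustained parallel work: a low percentile of measured
    concurrency, not the theoretical peak."""
    ordered = sorted(samples)
    idx = int(len(ordered) * percentile / 100)
    return ordered[min(idx, len(ordered) - 1)]

def classify_spike(spike_duration_s: float, cold_start_s: float) -> str:
    """Burst shorter than the cold-start window is noise: a new replica
    would arrive after the spike has already drained."""
    return "burst" if spike_duration_s >= cold_start_s else "noise"

# A day of per-minute concurrency samples would go here; a stub:
floor = concurrency_floor([4, 5, 5, 6, 8, 9, 12, 30])
verdict = classify_spike(90, cold_start_s=120)  # 90 s spike, 120 s cold start
```

&lt;p&gt;If the floor comes from a stub list rather than your own telemetry, you are still on the assumption side of the line.&lt;/p&gt;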




&lt;h3&gt;
  
  
  What Correct GPU Demand Modeling Looks Like
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3reeyrcskpvrrm8ld41.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3reeyrcskpvrrm8ld41.png" alt="GPU scheduling demand modeling inputs Kubernetes architecture diagram" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
Seven inputs. Each one has a consequence if you get it wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Request concurrency&lt;/strong&gt; — If you modeled single-thread throughput, your cluster is sized for a workload that never actually runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Queue depth&lt;/strong&gt; — How many jobs can wait before it becomes a latency problem? Most teams buy hardware when they should be designing queue behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Burst profile&lt;/strong&gt; — Short demand spikes get priced into permanent capacity. A correct burst profile separates the spike duration from the allocation decision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency tolerance&lt;/strong&gt; — Batch training tolerates queuing. Real-time inference does not. Sizing uniformly across both is a guaranteed waste pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Batch vs inference mix&lt;/strong&gt; — These are distinct provisioning decisions. A cluster optimized for training batch jobs has a different shape than one optimized for sustained inference throughput.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VRAM residency time&lt;/strong&gt; — How long does a model stay loaded relative to how long it is actively processing requests? High residency-to-compute ratio means memory is doing the work of availability, not throughput.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Job duration variance&lt;/strong&gt; — High variance creates scheduling fragmentation regardless of how well the scheduler is configured. Understanding variance at p50/p90/p99 determines whether gang scheduling or preemption policies are necessary.&lt;/p&gt;
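&lt;p&gt;That last input is the easiest to compute and the most often skipped. A sketch of the p50/p90/p99 check (nearest-rank percentiles; the 10× spread threshold is an illustrative assumption):&lt;/p&gt;

```python
# Sketch: job-duration variance at p50/p90/p99, used to decide whether
# gang scheduling or preemption policies are worth configuring.

def percentile(durations: list[float], q: float) -> float:
    """Nearest-rank percentile over measured job durations (seconds)."""
    ordered = sorted(durations)
    rank = max(1, round(q / 100 * len(ordered)))
    return ordered[rank - 1]

def variance_profile(durations: list[float]) -> dict:
    p50 = percentile(durations, 50)
    p99 = percentile(durations, 99)
    return {
        "p50": p50,
        "p90": percentile(durations, 90),
        "p99": p99,
        # A wide p99/p50 spread fragments the schedule regardless of
        # scheduler tuning (the 10x threshold is illustrative).
        "high_variance": p99 / p50 > 10,
    }

# Eight one-minute jobs plus two long training runs (illustrative):
profile = variance_profile([60.0] * 8 + [3600.0, 7200.0])
```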




&lt;h3&gt;
  
  
  Provision for Shape, Not Peak
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzipmrxs2jggqp2c70px.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzipmrxs2jggqp2c70px.png" alt="GPU provisioning demand shape vs peak architecture diagram Kubernetes" width="800" height="437"&gt;&lt;/a&gt; &lt;br&gt;
The corrective action is a provisioning philosophy shift.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Wrong Target&lt;/th&gt;
&lt;th&gt;Correct Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Peak demand&lt;/td&gt;
&lt;td&gt;Concurrency bands&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max model size&lt;/td&gt;
&lt;td&gt;Queue tolerance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Future scale&lt;/td&gt;
&lt;td&gt;Sustained demand windows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worst-case headroom&lt;/td&gt;
&lt;td&gt;Known burst ceilings&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Concurrency bands come from request concurrency measurement. Queue tolerance comes from latency tolerance modeling. Burst ceilings come from burst profile analysis. The provisioning decision is downstream of the model — not upstream of it.&lt;/p&gt;
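&lt;p&gt;In code, the difference between the two columns is a single line. A sketch (the sample numbers and the choice of a p90 band are illustrative assumptions):&lt;/p&gt;

```python
# Sketch: provision for shape (sustained band plus known burst ceiling)
# versus provisioning for the single worst sample ever observed.

def provision_for_shape(concurrency_samples: list[int],
                        burst_ceiling: int, gpus_per_job: int = 1) -> int:
    """Size for the sustained p90 concurrency band plus a bounded
    burst ceiling taken from the demand model."""
    ordered = sorted(concurrency_samples)
    p90_band = ordered[int(0.9 * (len(ordered) - 1))]
    return (p90_band + burst_ceiling) * gpus_per_job

samples = [3, 4, 4, 5, 5, 6, 6, 7, 8, 40]  # one freak peak of 40
shape_sized = provision_for_shape(samples, burst_ceiling=4)  # 12 GPUs
peak_sized = max(samples)                                    # 40 GPUs
```

&lt;p&gt;The freak peak of 40 queues briefly instead of defining the hardware bill.&lt;/p&gt;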




&lt;h3&gt;
  
  
  Where the Scheduler Actually Fits
&lt;/h3&gt;

&lt;p&gt;The right evaluation criterion for a scheduler is not its feature set. It is whether the scheduler enforces the constraints your demand model defined.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftpk6vo03alvcbwqfzbg2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftpk6vo03alvcbwqfzbg2.png" alt="GPU scheduling Kubernetes enforcement layer Volcano Kueue KEDA architecture diagram" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
Three tools, three enforcement roles:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Volcano&lt;/strong&gt; → batch fairness / queue discipline. Implements fair-share scheduling and gang scheduling for distributed training. Enforces concurrency band design across workload classes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kueue&lt;/strong&gt; → admission control / workload gating. Answers Preflight Question 4 directly — what can wait. Prevents jobs from entering the scheduling queue until capacity exists to run them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;KEDA&lt;/strong&gt; → event-driven scale behavior. Answers Preflight Question 2 — burst vs noise. Scales to the burst ceiling the demand model defined, not to unbounded demand signals.&lt;/p&gt;

&lt;p&gt;These are not alternatives. They are complementary enforcement layers at different points in the scheduling stack.&lt;/p&gt;
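&lt;p&gt;The admission-control role is the easiest to picture. A toy gate in the same spirit (a conceptual sketch, not Kueue's API or data model):&lt;/p&gt;

```python
# Toy admission gate: jobs do not enter the scheduling queue until
# quota exists to run them. Conceptual sketch only, not Kueue's API.
from collections import deque

class AdmissionGate:
    def __init__(self, gpu_quota: int):
        self.gpu_quota = gpu_quota
        self.in_use = 0
        self.gated = deque()       # workloads waiting for admission

    def submit(self, job: str, gpus: int) -> str:
        if self.gpu_quota - self.in_use >= gpus:
            self.in_use += gpus
            return "admitted"      # enters the scheduling queue
        self.gated.append((job, gpus))
        return "gated"             # waits outside the scheduler entirely

    def release(self, gpus: int) -> None:
        self.in_use -= gpus        # a gated workload can now retry
```

&lt;p&gt;The quota it enforces is exactly the concurrency band the demand model produced; the gate has no opinion of its own.&lt;/p&gt;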




&lt;h3&gt;
  
  
  What Good GPU Scheduling Actually Looks Like
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbfdcoqhkohgw9kr29mvm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbfdcoqhkohgw9kr29mvm.png" alt="GPU scheduling success state operational definition Kubernetes architecture diagram" width="800" height="437"&gt;&lt;/a&gt; &lt;br&gt;
Not which scheduler. What the outcome looks like when the demand model is correct:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jobs wait intentionally — queue latency exists by design, not by accident&lt;/li&gt;
&lt;li&gt;Inference scales on bounded demand — KEDA scales to the burst ceiling, not beyond it&lt;/li&gt;
&lt;li&gt;VRAM stays loaded for active work — residency-to-compute ratio is enforced operationally&lt;/li&gt;
&lt;li&gt;Queue latency is tolerated by design — the latency tolerance input becomes an SLA&lt;/li&gt;
&lt;li&gt;Expensive accelerators do not sit hot without work — the loaded ≠ active gap is closed&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Architect's Verdict
&lt;/h3&gt;

&lt;p&gt;The scheduler is not where GPU efficiency begins. It is where good capacity decisions are enforced — or bad ones become permanent.&lt;/p&gt;

&lt;p&gt;Build the demand model first. Provision to its shape. Then configure the enforcement layer. In that order, and no other.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.rack2cloud.com/gpu-scheduling-kubernetes/" rel="noopener noreferrer"&gt;rack2cloud.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>infrastructure</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>Cost Visibility Is Not Cost Control</title>
      <dc:creator>NTCTech</dc:creator>
      <pubDate>Wed, 29 Apr 2026 12:05:37 +0000</pubDate>
      <link>https://dev.to/ntctech/cost-visibility-is-not-cost-control-e1i</link>
      <guid>https://dev.to/ntctech/cost-visibility-is-not-cost-control-e1i</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ixc5hy2z3813020ejfo.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ixc5hy2z3813020ejfo.jpg" alt="The Spend Decision Horizon — cost control vs cost visibility in cloud architecture" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
Cost visibility tells you what your architecture costs. Cost control determines whether that architecture should have existed in the first place.&lt;/p&gt;

&lt;p&gt;These are not the same discipline. Most organizations treat them as if they are — and the FinOps data proves they have been doing so for years without fixing the underlying problem.&lt;/p&gt;

&lt;p&gt;The State of FinOps 2026 report found that 98% of organizations are now actively managing AI spend. Tooling investment has increased. Executive ownership has expanded. Reporting has become more granular. And yet organizations without structured cost governance still waste 32–40% of their cloud budgets on idle resources, oversized instances, and structural inefficiencies that dashboards surface but cannot remove.&lt;/p&gt;

&lt;p&gt;More visibility. Same waste. That is the signal worth paying attention to.&lt;/p&gt;




&lt;h2&gt;
  
  
  Visibility Is a Reporting Layer, Not a Control Layer
&lt;/h2&gt;

&lt;p&gt;FinOps tools do several things well. They surface spend. They expose waste. They identify anomalies. They allocate costs across teams and workloads. These are genuinely useful capabilities — the problem is that none of them can prevent the architecture decision that created the bill.&lt;/p&gt;

&lt;p&gt;This distinction matters because most cost governance programs are built around observation, not prevention:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dashboards show you where money went&lt;/li&gt;
&lt;li&gt;Alerts tell you spend has increased&lt;/li&gt;
&lt;li&gt;Tagging lets you attribute cost to a team&lt;/li&gt;
&lt;li&gt;Optimization recommendations identify inefficiency&lt;/li&gt;
&lt;li&gt;Monthly reviews give you a structured moment to react&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every one of those mechanisms operates after the decision. The commitment — the topology choice, the platform selection, the replication model, the egress dependency — was made upstream. By the time FinOps sees the number, the architecture has already answered the cost question.&lt;/p&gt;

&lt;p&gt;Cloud cost is now an architectural constraint — but that constraint only bites when you treat cost as a design variable rather than a reporting output. Visibility is lagging telemetry. It tells you what happened. It does not determine what was allowed to happen.&lt;/p&gt;

&lt;p&gt;That distinction is the entire argument.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Spend Decision Horizon
&lt;/h2&gt;

&lt;p&gt;There is a point in the architecture lifecycle where cost becomes structurally committed and no longer meaningfully adjustable through reporting. Call it the Spend Decision Horizon.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81ow81qgi23mmvn0tmsw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81ow81qgi23mmvn0tmsw.jpg" alt="Spend Decision Horizon diagram — before and after cost commitment in cloud architecture" width="800" height="437"&gt;&lt;/a&gt; &lt;br&gt;
Before that horizon, cost is a design variable. Service topology, data movement paths, replication models, control plane placement, GPU sizing, retention architecture, egress dependencies, idle capacity policy — these decisions are live. The architect is in the room. The cost outcome is still shapeable.&lt;/p&gt;

&lt;p&gt;After that horizon, cost is an observation. Dashboards appear. Tagging spreads. Allocation reports get generated. Anomaly alerts fire. Monthly optimization reviews happen. None of those activities change the architecture that produced the number.&lt;/p&gt;

&lt;p&gt;The Spend Decision Horizon is not a concept. It is a handoff. Before it, the architect owns cost. After it, FinOps has the receipt.&lt;/p&gt;

&lt;p&gt;The reason most cost governance programs underperform is that they are built entirely on the right side of that horizon. They are sophisticated receipt-reading operations with no authority over what gets ordered.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Cost Actually Gets Locked In
&lt;/h2&gt;

&lt;p&gt;The Spend Decision Horizon is defined by five commitment points — the moments where spend transitions from negotiable to structural.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsqri19afly3zvcr9pgvy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsqri19afly3zvcr9pgvy.jpg" alt="Five cost commitment points in cloud architecture — where spend becomes structural" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;1. Data path design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;How data moves through your architecture determines a significant portion of your recurring cost before a single workload runs. Cross-region reads, replication, egress, archive retrieval — these are not line items you optimize after deployment. They are the outcome of topology decisions made during design. Once the data path is established, the cost model follows it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Control plane decisions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Always-on orchestration, management overhead, idle infrastructure, and operational tooling all carry a cost that compounds at scale. The control plane was placed before FinOps arrived.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Capacity forecasting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Peak-sized clusters, overprovisioned GPU infrastructure, and statically allocated compute are the loudest signals in any cost audit. But the overprovisioning was a forecast decision, not a utilization decision. GPU idle is a capacity forecasting failure, not a scheduler problem — and the same logic applies across all compute layers. You cannot optimize your way out of a demand model that was wrong at provisioning time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Platform abstraction choices&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Managed services, proprietary data layers, and convenience abstractions trade operational simplicity for structural spend commitment. Data gravity is the mechanism: once data accumulates around a managed platform, movement cost locks in. Vendor lock-in happens through the networking layer, not through APIs — and by the time the cost is visible in a dashboard, the dependency chain is already load-bearing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Recovery architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Standby duplication, replication tax, and restore-path cost are a function of how recovery was designed. The replication model, the standby footprint, and the recovery tier placement all commit spend at design time. FinOps sees the storage and compute bill. It does not redesign the recovery architecture.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why FinOps Can See Waste But Not Remove It
&lt;/h2&gt;

&lt;p&gt;This is not a criticism of FinOps. It is a description of its structural position in the decision chain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffr569t96hiuxwnanjtq1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffr569t96hiuxwnanjtq1.jpg" alt="FinOps visibility gap — what FinOps can see vs what it can change in cloud architecture" width="800" height="447"&gt;&lt;/a&gt;&lt;br&gt;
FinOps can identify unused resources, overprovisioned instances, bad commitment purchases, idle capacity, and untagged spend. That visibility is real and valuable. The problem is that identifying the consequence is not the same as owning the cause.&lt;/p&gt;

&lt;p&gt;FinOps typically cannot change:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The service topology&lt;/li&gt;
&lt;li&gt;The platform selection&lt;/li&gt;
&lt;li&gt;The replication model&lt;/li&gt;
&lt;li&gt;The dependency chain&lt;/li&gt;
&lt;li&gt;The control plane footprint&lt;/li&gt;
&lt;li&gt;The egress architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those decisions were made by architects, platform teams, and engineering leads — usually without cost explicitly modeled as a design constraint. AI inference cost is the clearest current example: the decision to use a particular model, route to a particular endpoint, or replicate across a particular region commits spend that observability tooling can surface but not prevent.&lt;/p&gt;

&lt;p&gt;There is a pattern that has emerged as FinOps has scaled into larger organizations: shared ownership becoming no ownership. When cost accountability is distributed across engineering, finance, and platform teams without clear authority over architectural decisions, the observation layer grows while the control layer stays frozen. More people watching the dashboard. Nobody with authority to change what the dashboard is measuring.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cost Control Starts Before Deployment
&lt;/h2&gt;

&lt;p&gt;The corrective framing is not a checklist. It is a single shift in where cost enters the architecture conversation.&lt;/p&gt;

&lt;p&gt;Cost control starts at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architecture review, where topology and data path decisions are still live&lt;/li&gt;
&lt;li&gt;Workload placement, where capacity forecasting is still a design input&lt;/li&gt;
&lt;li&gt;Control plane design, where operational overhead is still negotiable&lt;/li&gt;
&lt;li&gt;Dependency design, where platform abstraction tradeoffs are still explicit&lt;/li&gt;
&lt;li&gt;Demand modeling, where GPU scheduling and capacity shape are still open&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not after the bill arrives.&lt;/p&gt;

&lt;p&gt;The teams that consistently achieve meaningful cost efficiency are not the ones with the best dashboards. They are the ones that treat cost as a first-class architectural constraint — alongside reliability, security, and performance — before the first resource is provisioned.&lt;/p&gt;

&lt;p&gt;Cost visibility is not the problem. Visibility is useful. The problem is treating it as the solution.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architect's Verdict
&lt;/h2&gt;

&lt;p&gt;The FinOps stack has never been more sophisticated. Spend is visible. Allocation is granular. Anomalies are caught faster. Optimization recommendations are automated. And organizations are still wasting a third of their cloud budgets on structural decisions that no amount of dashboard sophistication can undo.&lt;/p&gt;

&lt;p&gt;Visibility is lagging telemetry. It describes the cost of decisions already made. It cannot reach back across the Spend Decision Horizon and change the topology, the platform choice, the replication model, or the capacity forecast that produced the number.&lt;/p&gt;

&lt;p&gt;Cost control is not a reporting discipline. It is an architecture discipline. The five commitment points — data path, control plane, capacity forecasting, platform abstraction, and recovery architecture — are where spend is decided, not observed. Governance programs built entirely after those decisions are sophisticated receipt-reading operations with no authority over what gets ordered.&lt;/p&gt;

&lt;p&gt;The Spend Decision Horizon is not a concept. It is a handoff. Before it, the architect owns cost. After it, FinOps has the receipt. The question is not whether your dashboards are good enough. The question is how much of your cost structure was already committed before FinOps was ever in the room.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.rack2cloud.com/cost-visibility-cost-control/" rel="noopener noreferrer"&gt;rack2cloud.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>architecture</category>
      <category>devops</category>
      <category>finops</category>
    </item>
    <item>
      <title>Your AI Cluster Is Idle 95% of the Time</title>
      <dc:creator>NTCTech</dc:creator>
      <pubDate>Tue, 28 Apr 2026 11:58:10 +0000</pubDate>
      <link>https://dev.to/ntctech/your-ai-cluster-is-idle-95-of-the-time-485g</link>
      <guid>https://dev.to/ntctech/your-ai-cluster-is-idle-95-of-the-time-485g</guid>
      <description>&lt;p&gt;Your GPU utilization dashboard reads 40%. The cluster is healthy. The GPUs are loaded.&lt;/p&gt;

&lt;p&gt;Except they're not working.&lt;/p&gt;

&lt;p&gt;That 40% is a window average propped up by brief peaks. It doesn't show the forty minutes after the spike when the inference queue drained and the cluster sat fully provisioned against a trickle of requests two nodes could have handled.&lt;/p&gt;
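&lt;p&gt;A toy calculation (minute values are illustrative, not from a real cluster) shows how that average comes out looking healthy:&lt;/p&gt;

```python
# A short burst at 100% averaged with a long drain still reports a
# comfortable-looking utilization figure.

minutes = [100] * 24 + [0] * 36   # 24 hot minutes, then a long drain
window_average = sum(minutes) / len(minutes)

print(f"dashboard says: {window_average:.0f}%")            # reads 40%
print(f"minutes with zero compute: {minutes.count(0)}")    # reads 36
```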

&lt;p&gt;The cluster isn't underutilized. It's mispriced against actual demand.&lt;/p&gt;

&lt;p&gt;That's a different problem with a different root cause — and the mistake that created it didn't happen in your scheduler. It happened at design time.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why GPU Utilization Numbers Lie
&lt;/h3&gt;

&lt;p&gt;Most monitoring platforms conflate two things with almost nothing in common: &lt;strong&gt;memory residency&lt;/strong&gt; and &lt;strong&gt;compute activity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A GPU can be fully loaded — model weights resident, tensors staged, inference engine warm — and simultaneously producing zero output. The Kubernetes GPU resource model treats GPU allocation as binary: assigned or not. There's no native distinction between memory-resident and compute-active states.&lt;/p&gt;

&lt;p&gt;The hardware is occupied. No work is being done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Loaded ≠ Active.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A model resident in VRAM is not a GPU doing work. It's a GPU holding a reservation. Most teams treat model-loaded status as GPU-in-use status and provision accordingly. That single assumption is responsible for more mispriced AI capacity than any scheduling inefficiency or orchestration gap.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Three GPU Utilization Idle Modes
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwfsas0gm5fsbw05zcy1e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwfsas0gm5fsbw05zcy1e.png" alt="The Three GPU utilization Idle Modes — Batch Idle, Inference Idle, Provisioning Idle architecture diagram" width="800" height="437"&gt;&lt;/a&gt; &lt;br&gt;
Not all idle compute is the same problem. Before you can fix the architecture, you need to name which mode you're in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Batch Idle&lt;/strong&gt; — The gap between training runs. The cluster stays hot between jobs because cold startup costs are high. That gap, multiplied across a training schedule, is pure idle compute priced at full cluster cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inference Idle&lt;/strong&gt; — The model is loaded. The inference engine is warm. Requests are arriving — just not at the rate the cluster was sized for. GPU utilization metrics show the GPUs as occupied. The memory utilization is real. The compute utilization is not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provisioning Idle&lt;/strong&gt; — The earliest failure and the most expensive one over time. The cluster was sized for a workload that hasn't arrived yet. Peak inference demand for Q3. The large model run that's six weeks out. The hardware is live, the cost is running, and the demand it was priced against exists only in a planning document.&lt;/p&gt;

&lt;p&gt;All three modes share one root cause: &lt;strong&gt;the demand curve was never modeled correctly.&lt;/strong&gt;&lt;/p&gt;
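&lt;p&gt;Naming the mode can be mechanical. A sketch that reduces the three modes to three coarse signals (the signals and their mapping are a deliberate simplification for diagnosis, not a monitoring schema):&lt;/p&gt;

```python
# Sketch: name the idle mode from three coarse signals.
# Deliberately simplified; real diagnosis needs per-GPU telemetry.

def idle_mode(model_loaded: bool, compute_busy: bool,
              demand_exists: bool) -> str:
    if compute_busy:
        return "active"
    if model_loaded and demand_exists:
        return "inference_idle"    # loaded and warm, but under-driven
    if model_loaded:
        return "batch_idle"        # kept hot between jobs
    return "provisioning_idle"     # capacity live, workload not arrived
```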


&lt;h3&gt;
  
  
  This Was a Forecasting Failure
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l2uelh5x09xat15cr7p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l2uelh5x09xat15cr7p.png" alt="AI GPU provisioning forecasting failure — demand curve never modeled architecture diagram" width="800" height="437"&gt;&lt;/a&gt; &lt;br&gt;
The framing that usually gets applied to this problem is utilization: the fix must be better scheduling, better bin-packing, better autoscaling. That framing is wrong.&lt;/p&gt;

&lt;p&gt;Low utilization is an output. The input was a provisioning decision made without adequate demand modeling.&lt;/p&gt;

&lt;p&gt;Here's what the forecasting actually missed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The demand curve was never modeled.&lt;/strong&gt; Teams provisioned for theoretical peak without modeling actual request distribution across a typical operating window. Peak is real. It is also rare.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency was assumed, not measured.&lt;/strong&gt; Most provisioning decisions are made against a single-request mental model — how fast can the cluster serve one request — rather than against a concurrent request distribution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Residency was mistaken for throughput.&lt;/strong&gt; A GPU holding a 70B parameter model in VRAM is not a GPU running at capacity. It's a GPU with a very expensive reservation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime limits were never set.&lt;/strong&gt; Without execution budgets, the cluster expands to fill whatever headroom exists — and headroom was built in generously because the demand model was peak-anchored.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams never modeled the demand curve. They sized for theoretical peak, provisioned for future concurrency, and treated loaded memory as active work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Did you model request concurrency before you provisioned — or did you just size for the busiest hour you could imagine?&lt;/strong&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  What the Math Actually Looks Like
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4najc9xo51tyhugeufk5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4najc9xo51tyhugeufk5.png" alt="GPU cluster mispriced capacity six-figure forecasting error math example" width="800" height="437"&gt;&lt;/a&gt; &lt;br&gt;
An 8× A100 cluster runs approximately $38,000/month in total cost of ownership. At 5% sustained utilization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Monthly cluster cost:     $38,000
Sustained utilization:        5%
Productive compute/month:  $1,900
Idle compute/month:       $36,100

Annual forecasting error: $433,200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not a slightly inefficient cluster. It's a six-figure architecture constraint that compounds every month the provisioning assumption goes uncorrected.&lt;/p&gt;
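&lt;p&gt;The same arithmetic as a reusable helper, so the inputs can be whatever your own cluster actually measures (the helper itself is a sketch):&lt;/p&gt;

```python
# The forecasting-error arithmetic above, generalized. Inputs are your
# measured monthly TCO and sustained (not peak-window) utilization.

def idle_cost(monthly_tco: float, sustained_utilization: float) -> dict:
    productive = monthly_tco * sustained_utilization
    idle = monthly_tco - productive
    return {
        "productive_per_month": productive,
        "idle_per_month": idle,
        "annual_forecasting_error": idle * 12,
    }

figures = idle_cost(38_000, 0.05)   # the 8x A100 example above
```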




&lt;h3&gt;
  
  
  This Is an Architecture Problem, Not a Scheduling Problem
&lt;/h3&gt;

&lt;p&gt;The standard response to low GPU utilization is a scheduling intervention: deploy Volcano, tune KEDA, implement DCGM-based autoscaling.&lt;/p&gt;

&lt;p&gt;These are real tools. They solve real problems. They do not fix this one.&lt;/p&gt;

&lt;p&gt;Schedulers optimize execution of work that has been correctly provisioned for. What they cannot do is retroactively correct a demand model that was wrong at design time. If the cluster was provisioned for 10× the actual sustained request rate, a better scheduler produces a more efficiently idle cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schedulers can distribute work. They cannot fix demand you modeled incorrectly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That fix happens before the cluster exists. It happens at design time, against a demand curve someone actually drew.&lt;/p&gt;




&lt;h3&gt;
  
  
  Architect's Verdict
&lt;/h3&gt;

&lt;p&gt;The GPU utilization problem is not a utilization problem. It's a forecasting problem that manifests as GPU utilization data, gets diagnosed as a scheduling problem, and gets treated with tooling that addresses the symptom while the root cause compounds every billing cycle.&lt;/p&gt;

&lt;p&gt;The central mistake is a category error: treating memory residency as compute activity. Every GPU idle mode — batch, inference, provisioning — traces back to a demand curve that was never drawn or was drawn incorrectly against theoretical maximums that rarely materialize in production.&lt;/p&gt;

&lt;p&gt;The teams that solve this aren't running more sophisticated schedulers. They're provisioning against actual request distributions, modeling concurrency from measurement rather than assumption, and treating loaded memory as exactly what it is: an expensive placeholder.&lt;/p&gt;

&lt;p&gt;Fix the demand model first. Everything else is optimization on top of a correctly sized foundation.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.rack2cloud.com/ai-cluster-gpu-utilization/" rel="noopener noreferrer"&gt;rack2cloud.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>kubernetes</category>
      <category>infrastructure</category>
      <category>devops</category>
    </item>
    <item>
      <title>etcd Is Your Kubernetes Database: What Breaks and What to Watch</title>
      <dc:creator>NTCTech</dc:creator>
      <pubDate>Sat, 25 Apr 2026 12:26:50 +0000</pubDate>
      <link>https://dev.to/ntctech/etcd-is-your-kubernetes-database-what-breaks-and-what-to-watch-50i3</link>
      <guid>https://dev.to/ntctech/etcd-is-your-kubernetes-database-what-breaks-and-what-to-watch-50i3</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgo2cstpb2n7jtpzwz1wo.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgo2cstpb2n7jtpzwz1wo.jpg" alt="etcd kubernetes state layer — API server as stateless translation layer over etcd key-value store" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
etcd is the only component in your Kubernetes control plane that holds state.&lt;/p&gt;

&lt;p&gt;Not your API server. Not your scheduler. Not your controller manager. &lt;strong&gt;etcd.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If etcd is slow, your cluster is slow. If etcd is inconsistent, your cluster is inconsistent. If etcd fails, your control plane doesn't degrade — it stops.&lt;/p&gt;

&lt;p&gt;Most teams don't think about this until the cluster starts behaving in ways they can't explain.&lt;/p&gt;




&lt;h2&gt;
  
  
  What etcd Actually Does
&lt;/h2&gt;

&lt;p&gt;The API server is stateless. It validates your request, writes desired state to etcd, and returns. The scheduler watches etcd. The controller manager watches etcd. Every pod definition, secret, ConfigMap, lease, and node registration — written to etcd first, read from etcd later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes is a state machine. etcd is the state.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What Breaks (And Why It Doesn't Look Like etcd)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9h85ucad4unmsha9pxk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9h85ucad4unmsha9pxk.jpg" alt="etcd kubernetes failure cascade showing disk latency causing API server lag, controller drift, and stuck pods" width="800" height="437"&gt;&lt;/a&gt; &lt;br&gt;
etcd failures don't surface as "database errors." They surface as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kubectl get pods&lt;/code&gt; hanging for seconds&lt;/li&gt;
&lt;li&gt;Pods stuck in &lt;code&gt;Pending&lt;/code&gt; or &lt;code&gt;Terminating&lt;/code&gt; indefinitely&lt;/li&gt;
&lt;li&gt;Deployments not rolling, ReplicaSets not scaling&lt;/li&gt;
&lt;li&gt;Leader election flapping and log storms across control plane components&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these point at etcd in your dashboard. They look like scheduler bugs, kubelet problems, or network weirdness. The actual cause is one layer below everything you're checking.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 4 Failure Modes Nobody Monitors
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1 — Disk Latency
&lt;/h3&gt;

&lt;p&gt;etcd is disk-bound, not CPU-bound. Every write requires an fsync before it acknowledges. Slow IOPS = slow writes = slow API server = slow cluster. The entire call chain collapses to the speed of your disk.&lt;/p&gt;

&lt;p&gt;This is why etcd requires SSD or NVMe. NFS and gp2 EBS will quietly degrade your control plane under load.&lt;/p&gt;
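&lt;p&gt;You can feel the fsync dependency on any machine. This is a hypothetical micro-benchmark, not etcd tooling (the etcd docs recommend an fio-based test for real disk validation): it times the write-plus-fsync cycle that every WAL append must finish before etcd acknowledges.&lt;/p&gt;

```python
import os
import statistics
import tempfile
import time

def fsync_latencies(path, iterations=200, payload=b"x" * 4096):
    """Time write+fsync cycles, the operation every etcd WAL append pays."""
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for _ in range(iterations):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # etcd acknowledges a write only after this returns
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies

with tempfile.NamedTemporaryFile(delete=False) as f:
    target = f.name
lat = fsync_latencies(target)
os.unlink(target)
p99 = statistics.quantiles(lat, n=100)[98]
print(f"p99 write+fsync: {p99 * 1000:.2f} ms")
```

&lt;p&gt;Run it on the volume that would host the etcd data directory and compare the result against the 10ms warning line.&lt;/p&gt;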

&lt;h3&gt;
  
  
  2 — Quorum Instability
&lt;/h3&gt;

&lt;p&gt;3-node cluster: needs 2 to agree. 5-node: needs 3. Lose quorum and the cluster goes &lt;strong&gt;read-only&lt;/strong&gt; — no writes, no scheduling, no reconciliation.&lt;/p&gt;

&lt;p&gt;Common mistakes: 2-node clusters (zero quorum tolerance), 4-node clusters (same tolerance as 3, more cost), etcd members stretched across high-latency zones. Raft heartbeat timeouts are tuned for &amp;lt;10ms inter-member latency. Exceed that under normal load and you'll see leader elections fire.&lt;/p&gt;
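&lt;p&gt;The arithmetic behind those sizing rules is short enough to write down:&lt;/p&gt;

```python
def quorum(members: int) -> int:
    """Raft quorum: a strict majority of members."""
    return members // 2 + 1

def failure_tolerance(members: int) -> int:
    """Members you can lose while still forming a quorum."""
    return members - quorum(members)

# Note: 4 members tolerate exactly 1 failure, the same as 3, at higher cost.
for n in (2, 3, 4, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```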

&lt;h3&gt;
  
  
  3 — Large Object Writes
&lt;/h3&gt;

&lt;p&gt;etcd has a 1.5MB per-value default limit and a 2GB total DB limit (8GB max). Both are reachable.&lt;/p&gt;

&lt;p&gt;Usual offenders: CRDs storing runtime state, secrets used as blob storage, ConfigMaps holding multi-MB files. etcd is not an object store. Every oversized write slows the cluster and causes fragmentation.&lt;/p&gt;
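&lt;p&gt;A cheap guardrail is to measure serialized size before an object ever reaches the API server. A hypothetical pre-apply check against etcd's default 1.5MB request limit (the &lt;code&gt;--max-request-bytes&lt;/code&gt; flag); the 80% headroom factor is an assumption, since managed fields and annotations add overhead on top of what you serialize:&lt;/p&gt;

```python
import json

ETCD_DEFAULT_MAX_REQUEST = int(1.5 * 1024 * 1024)  # etcd --max-request-bytes default

def oversized(manifest: dict, limit: int = ETCD_DEFAULT_MAX_REQUEST,
              headroom: float = 0.8) -> bool:
    """Flag objects whose serialized size approaches etcd's per-request limit.

    JSON size approximates what the API server stores; keep headroom
    because server-side metadata adds bytes you didn't write.
    """
    size = len(json.dumps(manifest).encode("utf-8"))
    return size > limit * headroom

configmap = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "model-weights"},       # the classic offender
    "data": {"blob": "A" * (2 * 1024 * 1024)},   # 2MB payload
}
print(oversized(configmap))  # → True: a ConfigMap used as blob storage trips the check
```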

&lt;h3&gt;
  
  
  4 — Compaction and Fragmentation
&lt;/h3&gt;

&lt;p&gt;etcd keeps a history of every key revision. Without compaction, the DB grows unbounded. Without defrag after compaction, the on-disk footprint doesn't shrink.&lt;/p&gt;

&lt;p&gt;The pattern: DB grows quietly to several hundred MB, performance softens, nobody connects it to etcd because nothing is explicitly broken. Then a large write event pushes toward the size limit and you have an incident.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 5 Metrics That Actually Matter
&lt;/h2&gt;

&lt;p&gt;If you're only watching CPU and memory on your control plane nodes, you are not monitoring etcd.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;What It Tells You&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;etcd_disk_wal_fsync_duration_seconds&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;P99 &amp;gt;10ms = warning. P99 &amp;gt;25ms = problem. Most important etcd metric.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;etcd_server_leader_changes_seen_total&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Should be near zero. Frequent changes = instability.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;etcd_mvcc_db_total_size_in_bytes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Track growth rate. Growing faster than your cluster = something over-writing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;etcd_mvcc_db_total_size_in_use_in_bytes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Large gap vs total size = fragmentation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;etcd_server_slow_apply_total&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Nonzero and growing = investigate before it becomes an incident.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
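&lt;p&gt;The last two size gauges are most useful together. A sketch of the fragmentation check they imply; the 100MB floor and 50% ratio are illustrative thresholds, not official etcd guidance:&lt;/p&gt;

```python
def needs_defrag(total_bytes: int, in_use_bytes: int,
                 min_fragmented_bytes: int = 100 * 1024 * 1024,
                 max_fragmented_ratio: float = 0.5) -> bool:
    """Flag fragmentation from the two MVCC size gauges.

    total_bytes:  etcd_mvcc_db_total_size_in_bytes (on-disk footprint)
    in_use_bytes: etcd_mvcc_db_total_size_in_use_in_bytes (live data)
    """
    fragmented = total_bytes - in_use_bytes
    if fragmented < min_fragmented_bytes:
        return False  # gap too small to matter yet
    return fragmented / total_bytes > max_fragmented_ratio

# 800MB on disk, 300MB live: compaction ran, defrag never reclaimed the space.
print(needs_defrag(800 * 1024**2, 300 * 1024**2))  # → True
```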




&lt;h2&gt;
  
  
  The Rules
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DO:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Dedicated local SSD/NVMe for etcd data directories&lt;/li&gt;
&lt;li&gt;✅ 3 or 5 members — always odd, never 2 or 4&lt;/li&gt;
&lt;li&gt;✅ Monitor fsync latency as your primary health signal&lt;/li&gt;
&lt;li&gt;✅ Automate compaction and defragmentation&lt;/li&gt;
&lt;li&gt;✅ Snapshot etcd — treat it like a production database backup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;DON'T:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ Co-locate etcd with noisy high-I/O workloads&lt;/li&gt;
&lt;li&gt;❌ Store large payloads in ConfigMaps or Secrets&lt;/li&gt;
&lt;li&gt;❌ Ignore fragmentation growth&lt;/li&gt;
&lt;li&gt;❌ Assume managed etcd (EKS/GKE/AKS) needs no visibility&lt;/li&gt;
&lt;li&gt;❌ Treat etcd as a transparent implementation detail&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Part Most Architectures Skip
&lt;/h2&gt;

&lt;p&gt;Your pods can fail and reschedule. Your nodes can fail and drain. etcd loses quorum and your cluster stops accepting writes — full stop. No automatic recovery, no clever failover, no workload that routes around it.&lt;/p&gt;

&lt;p&gt;Most Kubernetes architectures are designed assuming etcd works. Very few are designed for when it doesn't.&lt;/p&gt;

&lt;p&gt;Treat etcd like the database it is — because it's the most important one in your cluster.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If etcd is slow, Kubernetes lies to you. If etcd is unavailable, Kubernetes stops. If etcd is corrupted, recovery becomes a rebuild problem — not a restart.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is part of the Modern Infrastructure &amp;amp; IaC series at &lt;a href="https://rack2cloud.com" rel="noopener noreferrer"&gt;rack2cloud.com&lt;/a&gt;. Full post with architecture diagrams and HTML signal cards at &lt;a href="https://rack2cloud.com/etcd-kubernetes-database/" rel="noopener noreferrer"&gt;rack2cloud.com/etcd-kubernetes-database&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>infrastructure</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Operating Gateway API in Production: What the Migration Guides Don't Cover</title>
      <dc:creator>NTCTech</dc:creator>
      <pubDate>Thu, 23 Apr 2026 13:14:15 +0000</pubDate>
      <link>https://dev.to/ntctech/operating-gateway-api-in-production-what-the-migration-guides-dont-cover-2526</link>
      <guid>https://dev.to/ntctech/operating-gateway-api-in-production-what-the-migration-guides-dont-cover-2526</guid>
      <description>&lt;p&gt;You migrated. Traffic is flowing. ReferenceGrants are in place. The controller reconciliation loop is clean. And then — quietly, without a single alert firing — things start breaking in ways your observability stack was never built to see.&lt;/p&gt;

&lt;p&gt;Most Gateway API migration guides end at cutover. That is the wrong place to stop. The real operational surface of Gateway API production begins exactly where those guides close — and it is governed by a different set of failure physics than anything Ingress introduced.&lt;/p&gt;

&lt;p&gt;The thesis is explicit: &lt;strong&gt;Gateway API doesn't just change how traffic is routed. It changes where routing failures live — and how invisible they become.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Gap Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Part 0 was the decision. Part 1 was the shift. Part 2 was the migration. Part 3 is the reality.&lt;/p&gt;

&lt;p&gt;When you ran Ingress, failures were infrastructure-visible. A misconfigured annotation broke routing and your logs showed it. A missing backend returned a 502 and your alerting fired. The failure surface was shallow and legible.&lt;/p&gt;

&lt;p&gt;Gateway API moves routing failures into the decision layer. HTTPRoutes can be accepted by the controller — syntactically valid, status condition green — while silently misrouting traffic. ReferenceGrants can be deleted during a routine namespace cleanup with no downstream alert. Header matching logic from the annotation era doesn't translate 1:1, and the mismatch produces no error. It just routes incorrectly.&lt;/p&gt;

&lt;p&gt;This is not a tooling gap. It is an architectural one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Observability: What Changes After Gateway API
&lt;/h2&gt;

&lt;p&gt;Ingress failures were infrastructure-visible. Gateway API failures are decision-layer invisible.&lt;/p&gt;

&lt;p&gt;Understanding what your monitoring stack actually covers requires mapping it against three distinct layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1 — Controller Metrics (What You Get)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Standard Prometheus scraping covers the controller layer. Reconciliation loop latency, controller health, memory and CPU. This is the layer most teams think of as "Gateway API observability" — and it is the least useful layer for diagnosing production routing failures. A healthy controller reconciliation loop tells you nothing about whether the routing decision it produced is correct.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2 — Spec State (What You Miss)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HTTPRoute status fields are not surfaced by default in most monitoring stacks. The conditions you need to be watching — &lt;code&gt;Accepted&lt;/code&gt;, &lt;code&gt;ResolvedRefs&lt;/code&gt;, &lt;code&gt;Parents&lt;/code&gt; — exist in the Kubernetes API but require explicit instrumentation. A route in &lt;code&gt;Accepted: True&lt;/code&gt; with a backend in &lt;code&gt;ResolvedRefs: False&lt;/code&gt; will route requests to nothing — and your controller metrics will show green the entire time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3 — Runtime Behavior (What Actually Matters)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Routing outcomes, backend selection, header and path matching decisions. 200 OK is the new 500: a request that returns a success status from the wrong backend is operationally identical to a silent outage. Runtime behavior requires traffic-level instrumentation — service mesh telemetry, eBPF-based flow data, or access log enrichment — to become visible.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Your monitoring stack sees the controller. It does not see the routing decision.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug3mh7rza1zspz1hprf9.jpg" alt="Diagram showing Prometheus monitoring reaching controller layer but not Gateway API routing decision layer" width="800" height="387"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Policy Enforcement at the Gateway Layer
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6yceejf5pm028wbdvtw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6yceejf5pm028wbdvtw.jpg" alt="Kubernetes policy enforcement stack diagram showing NetworkPolicy packet level OPA admission time and Gateway API runtime routing authorization" width="800" height="387"&gt;&lt;/a&gt; &lt;br&gt;
Gateway API introduces routing-level trust boundaries, not just network boundaries. The real shift is temporal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NetworkPolicy&lt;/strong&gt; → Packet-level, always-on&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OPA / Gatekeeper / Kyverno&lt;/strong&gt; → Admission-time, pre-deploy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway API&lt;/strong&gt; → Runtime routing authorization, request-time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ReferenceGrant is not configuration. It is a security boundary.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A ReferenceGrant deletion — which can happen silently during namespace cleanup, RBAC rotation, or automated resource pruning — immediately collapses cross-namespace routing trust. There is no deprecation window. Traffic stops reaching its backend, and the only signal is a &lt;code&gt;ResolvedRefs: False&lt;/code&gt; condition that most teams aren't alerting on yet.&lt;/p&gt;
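&lt;p&gt;For scale, the object whose deletion causes that collapse is small enough to miss in a cleanup diff. A minimal sketch, with placeholder names:&lt;/p&gt;

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-routes-to-checkout   # placeholder name
  namespace: checkout              # lives in the *target* namespace
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: gateway-system    # routes here may reference backends below
  to:
    - group: ""                    # core API group: Services
      kind: Service
```

&lt;p&gt;Deleting this manifest revokes a cross-namespace authorization. Review it like an RBAC change, not like cleanup.&lt;/p&gt;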




&lt;h2&gt;
  
  
  The Day-2 Failure Patterns
&lt;/h2&gt;

&lt;p&gt;These are not edge cases. These are the failures teams discover in the first 30–60 days of production.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1o2c0adfnskb3jcjl67.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1o2c0adfnskb3jcjl67.jpg" alt="Gateway API production failure modes timeline showing discovery windows for five failure patterns in first 60 days" width="800" height="322"&gt;&lt;/a&gt; &lt;br&gt;
&lt;strong&gt;Failure Mode 01 — Route Accepted, Traffic Misrouted&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Accepted: True&lt;/code&gt; means valid configuration — not correct behavior. Backend weight misconfiguration, path prefix overlap, or header match ordering errors produce accepted routes that route to the wrong destination. No alerts fire. Traffic just goes somewhere wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure Mode 02 — Cross-Namespace Trust Collapse&lt;/strong&gt;&lt;br&gt;
ReferenceGrant deleted during routine cleanup. Cross-namespace routing immediately fails. The backend is healthy, the controller is healthy, the HTTPRoute status goes &lt;code&gt;ResolvedRefs: False&lt;/code&gt; and traffic stops. Recovery requires manual ReferenceGrant reconstruction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure Mode 03 — Header Routing Regression&lt;/strong&gt;&lt;br&gt;
Annotation-era header logic doesn't translate 1:1 to HTTPRoute match semantics. The route is accepted, the match appears correct in the spec, and the wrong backend receives traffic silently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure Mode 04 — Controller Version Skew&lt;/strong&gt;&lt;br&gt;
Gateway API evolves faster than most controller upgrade cycles. HTTPRoutes that reference unsupported features are accepted but silently not enforced — the spec says it should work, the controller says nothing, and behavior is undefined.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure Mode 05 — TLS Cert Rotation Gap&lt;/strong&gt;&lt;br&gt;
cert-manager and Gateway API have different mental models of certificate binding. Rotation timing mismatches produce TLS termination failures that appear as backend connectivity issues — not certificate errors — in most monitoring stacks.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Cluster and Multi-Tenant Considerations
&lt;/h2&gt;

&lt;p&gt;Gateway API simplifies single-cluster routing. It complicates multi-cluster ownership.&lt;/p&gt;

&lt;p&gt;The fundamental shift at multi-tenant scale: the problem is no longer routing. The problem is &lt;strong&gt;who is allowed to define routes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Gateway-per-team is the operationally cleaner model for most enterprises — blast radius is contained, ReferenceGrant surface is minimal. The shared Gateway model reduces resource overhead but introduces a ReferenceGrant audit problem at scale that platform engineering needs to own, not application teams.&lt;/p&gt;

&lt;p&gt;Cross-cluster route federation remains experimental. Model it as beta operationally, regardless of what the controller documentation claims.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem
&lt;/h2&gt;

&lt;p&gt;Teams think they migrated an ingress layer. What they actually introduced is a new control plane.&lt;/p&gt;

&lt;p&gt;This is the thread that runs through the entire series. The control plane shift isn't a Gateway API phenomenon — it is the defining architectural pattern of this infrastructure era. Every layer that used to be configuration is now a control plane: service meshes, policy engines, GitOps operators, and now routing.&lt;/p&gt;

&lt;p&gt;The teams that operate Gateway API well in production are not the ones with the best controllers. They are the ones that rebuilt their observability model before they needed it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gateway API doesn't fail loudly. It fails in decisions your tooling doesn't see.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Architect's Verdict
&lt;/h2&gt;

&lt;p&gt;Part 0 was the decision. Part 1 was the shift. Part 2 was the migration. Part 3 is the reality — and the reality is that Gateway API production operations require a fundamentally different observability model, a new policy enforcement layer, and an audit discipline that didn't exist when you were running Ingress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DO:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Treat Gateway API as a control plane layer — instrument routing decisions, not just traffic&lt;/li&gt;
&lt;li&gt;Alert on HTTPRoute status conditions — &lt;code&gt;ResolvedRefs: False&lt;/code&gt; is a production incident&lt;/li&gt;
&lt;li&gt;Audit ReferenceGrants continuously — treat deletions as security boundary changes, not cleanup&lt;/li&gt;
&lt;li&gt;Pin controller versions to the Gateway API channel they implement — track skew explicitly&lt;/li&gt;
&lt;li&gt;Own the ReferenceGrant audit function at the platform engineering layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;DON'T:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assume &lt;code&gt;Accepted: True&lt;/code&gt; means working — it means syntactically valid configuration&lt;/li&gt;
&lt;li&gt;Treat migration as completion — cutover is the start of the operational surface, not the end&lt;/li&gt;
&lt;li&gt;Let controller behavior drift from spec assumptions&lt;/li&gt;
&lt;li&gt;Port Ingress annotation logic directly to HTTPRoute without verifying match semantics&lt;/li&gt;
&lt;li&gt;Trust cross-cluster Gateway API federation claims without verifying your controller's implementation channel&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Architecture diagrams and full failure mode breakdown at rack2cloud.com&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Series:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 0: &lt;a href="https://www.rack2cloud.com/ingress-nginx-deprecation-what-to-do/" rel="noopener noreferrer"&gt;Ingress-NGINX Deprecation: What to Do Next&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 1: &lt;a href="https://www.rack2cloud.com/gateway-api-kubernetes-controller-decision/" rel="noopener noreferrer"&gt;Gateway API Is the Direction. Your Controller Choice Is the Risk.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 1.5: &lt;a href="https://www.rack2cloud.com/control-plane-shift-infrastructure-decisions-2026/" rel="noopener noreferrer"&gt;The Control Plane Shift&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://www.rack2cloud.com/migrate-ingress-to-gateway-api-production/" rel="noopener noreferrer"&gt;Kubernetes Ingress to Gateway API Migration&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 3: Operating Gateway API in Production ← You Are Here&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>cloudnative</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>Kubernetes Is Not an LLM Security Boundary</title>
      <dc:creator>NTCTech</dc:creator>
      <pubDate>Wed, 22 Apr 2026 12:50:44 +0000</pubDate>
      <link>https://dev.to/ntctech/kubernetes-is-not-an-llm-security-boundary-48d1</link>
      <guid>https://dev.to/ntctech/kubernetes-is-not-an-llm-security-boundary-48d1</guid>
      <description>&lt;p&gt;The CNCF flagged it three days ago. Most teams haven't processed what it actually means.&lt;/p&gt;

&lt;p&gt;Kubernetes lacks built-in mechanisms to enforce application-level or semantic controls over AI systems. That's not a bug. It's not a misconfiguration. It's a category error in how we're thinking about AI workload security.&lt;/p&gt;

&lt;p&gt;Kubernetes isolates containers. It does not isolate decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfb6urf2v8jajo7rokae.jpg" alt="LLM Security Boundary Model — three layers: Infrastructure Boundary, Application Boundary, and LLM Boundary showing where Kubernetes visibility ends" width="800" height="437"&gt; 
&lt;/h2&gt;

&lt;h2&gt;
  
  
  What Kubernetes Actually Controls
&lt;/h2&gt;

&lt;p&gt;To be clear about the problem, you need to be precise about the scope.&lt;/p&gt;

&lt;p&gt;Kubernetes enforces pod isolation, RBAC, network policy, resource limits, and admission control. A well-configured cluster with Cilium, Kyverno, and Falco is genuinely hardened.&lt;/p&gt;

&lt;p&gt;All of those controls operate at the infrastructure layer. None of them understand what an LLM is doing inside that boundary.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three-Layer Problem
&lt;/h2&gt;

&lt;p&gt;Think of it as three distinct boundaries:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure Boundary (Kubernetes):&lt;/strong&gt; Controls compute, network, identity. Cannot see model behavior, prompts, or outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application Boundary:&lt;/strong&gt; Controls API access and service logic. Cannot see model reasoning or semantic intent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM Boundary — the actual risk layer:&lt;/strong&gt; Controls prompts, outputs, tool usage. This is the layer your current tooling doesn't reach.&lt;/p&gt;

&lt;p&gt;Most teams have the first two layers covered. The third is largely unaddressed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Failure Mode Kubernetes Will Never Catch
&lt;/h2&gt;

&lt;p&gt;Here's the production scenario that matters:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User submits a prompt with a hidden injection instruction&lt;/li&gt;
&lt;li&gt;Model retrieves internal context via RAG&lt;/li&gt;
&lt;li&gt;Model outputs sensitive internal data in its response&lt;/li&gt;
&lt;li&gt;Response returns &lt;strong&gt;HTTP 200&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No alerts fire. No logs capture what the model decided.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;From Kubernetes' perspective: successful request. Pod healthy. RBAC respected. Latency within SLA.&lt;/p&gt;

&lt;p&gt;From a security perspective: complete boundary failure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvzgvr4dcgnriwd2sapgq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvzgvr4dcgnriwd2sapgq.jpg" alt="LLM security boundary failure — five-step scenario showing how a prompt injection attack returns 200 OK with no Kubernetes alerts" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the observability inversion. Traditional monitoring asks: &lt;em&gt;did it run? was it fast? did it error?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;LLM observability needs to ask: &lt;em&gt;was it correct? was it safe? was it allowed?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Infrastructure observability measures execution. LLM observability measures outcomes.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Actual Boundary Requires
&lt;/h2&gt;

&lt;p&gt;Four control layers need to exist &lt;strong&gt;above&lt;/strong&gt; Kubernetes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ingress Control&lt;/strong&gt; — prompt validation and injection filtering before the model sees the request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Egress Control&lt;/strong&gt; — output scanning and PII detection before the response leaves the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action Control&lt;/strong&gt; — for agentic systems with tool access, explicit allow-lists scoped per model and context. RBAC governs which service account can call which API. This governs which model, in which context, is permitted to trigger which action. Not the same constraint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit Control&lt;/strong&gt; — sovereign, immutable inference logging. If your inference logs live in a vendor's platform, you don't fully own the audit trail.&lt;/p&gt;
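&lt;p&gt;Of the four, action control is the simplest to sketch and the most often skipped. A minimal, hypothetical gate: every tool invocation passes through a deny-by-default allow-list keyed by model and context before it executes. All names here are invented.&lt;/p&gt;

```python
# Allow-list keyed by (model, context): which tools each pairing may invoke.
# RBAC answers "can this service account call this API"; this answers
# "may this model, in this context, trigger this action". Names are invented.
ALLOWED_ACTIONS = {
    ("support-bot", "customer-chat"): {"search_kb", "create_ticket"},
    ("support-bot", "internal-ops"): {"search_kb", "create_ticket", "issue_refund"},
}

class ActionDenied(Exception):
    pass

def authorize_tool_call(model: str, context: str, tool: str) -> None:
    allowed = ALLOWED_ACTIONS.get((model, context), set())
    if tool not in allowed:
        # Deny by default: an unlisted (model, context, tool) triple never runs.
        raise ActionDenied(f"{model}/{context} may not invoke {tool}")

authorize_tool_call("support-bot", "customer-chat", "create_ticket")  # permitted
try:
    authorize_tool_call("support-bot", "customer-chat", "issue_refund")
except ActionDenied as e:
    print(e)  # refunds are only permitted from the internal-ops context
```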

&lt;p&gt;Emerging implementations like Kong AI Gateway and Portkey are building toward this pattern — but the pattern matters more than the product. These four components need to exist regardless of what implements them.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxulbpg6kcwk320bfi6qq.jpg" alt="LLM Control Plane Pattern — four enforcement components: Ingress Control, Egress Control, Action Control, Audit Control" width="800" height="437"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  When Kubernetes Is Enough
&lt;/h2&gt;

&lt;p&gt;To be honest: there are AI workloads where infrastructure controls are sufficient.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stateless, isolated LLM — no persistent context&lt;/li&gt;
&lt;li&gt;No tool access — text output only&lt;/li&gt;
&lt;li&gt;No sensitive context in scope&lt;/li&gt;
&lt;li&gt;No external system impact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your workload meets all four conditions, your infrastructure boundary largely holds.&lt;/p&gt;

&lt;p&gt;The moment you add RAG retrieval, tool use, memory, or agentic orchestration — any one of them — you're operating at the LLM Boundary layer, and Kubernetes alone isn't sufficient.&lt;/p&gt;

&lt;p&gt;Most enterprise AI workloads don't meet those conditions.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Practical Takeaway
&lt;/h2&gt;

&lt;p&gt;Your Kubernetes security posture is necessary. It is not sufficient for LLM workloads.&lt;/p&gt;

&lt;p&gt;The cluster can be hardened. The model is still non-deterministic. Those are two different problems requiring two different control layers.&lt;/p&gt;

&lt;p&gt;If you're running LLMs on Kubernetes with only infrastructure-layer controls, you have a boundary problem you haven't measured yet. The absence of alerts isn't evidence of safety — it's evidence that your observability doesn't reach the layer where LLM risk lives.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Full architecture breakdown including the LLM Security Boundary Model and LLM Control Plane Pattern framework at &lt;a href="https://www.rack2cloud.com/kubernetes-llm-security-boundary/" rel="noopener noreferrer"&gt;rack2cloud.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>security</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>AVS Is a Migration Strategy. Treating It as a Destination Is the Mistake.</title>
      <dc:creator>NTCTech</dc:creator>
      <pubDate>Tue, 21 Apr 2026 12:20:25 +0000</pubDate>
      <link>https://dev.to/ntctech/avs-is-a-migration-strategy-treating-it-as-a-destination-is-the-mistake-2i6d</link>
      <guid>https://dev.to/ntctech/avs-is-a-migration-strategy-treating-it-as-a-destination-is-the-mistake-2i6d</guid>
      <description>&lt;p&gt;Most teams evaluating Azure VMware Solution frame it as an architecture decision.&lt;/p&gt;

&lt;p&gt;It isn't. AVS is a migration strategy — and the moment you start treating it as a destination, the financial and architectural consequences start compounding.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Framing Problem
&lt;/h2&gt;

&lt;p&gt;AVS looks like the safe path out of a Broadcom licensing conversation. Your team knows vSphere. Your tooling maps to VMware constructs. You move workloads without retraining anyone or rearchitecting anything.&lt;/p&gt;

&lt;p&gt;What you're not choosing is where to run workloads. You're choosing how hard it will be to leave later.&lt;/p&gt;

&lt;p&gt;AVS feels like staying on-prem — just relocated into Azure's billing model. That's the trap: you're not escaping VMware. You're relocating it into a metered, provider-controlled environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AVS doesn't remove lock-in. It changes where the lock-in lives.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh0quydzipxgw7y5v7ps3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh0quydzipxgw7y5v7ps3.jpg" alt="Azure VMware Solution architecture — VMware relocated not escaped" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changes When You Land on AVS
&lt;/h2&gt;

&lt;p&gt;The familiar operational surface is real. vSphere, vSAN, NSX-T — your ops team recognizes everything they're looking at. Microsoft operates the hardware layer. You operate the guests.&lt;/p&gt;

&lt;p&gt;What you lose is the exit path you had on-prem.&lt;/p&gt;

&lt;p&gt;On-prem exit cost is physical and operational. AVS exit cost is financial, architectural, and contractual — simultaneously. When you eventually leave AVS, you're not executing a migration. You're executing a second transformation: translating VMware constructs to a target platform while simultaneously unwinding a managed service relationship and absorbing Azure egress costs at scale.&lt;/p&gt;

&lt;p&gt;AVS exit is not a migration. It's a second transformation.&lt;/p&gt;

&lt;h2&gt;
  
  
  When AVS Is Correct
&lt;/h2&gt;

&lt;p&gt;There are legitimate use cases — but they're narrower than the sales motion suggests.&lt;/p&gt;

&lt;p&gt;AVS makes sense when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compliance requirements are written around vSphere-specific behaviors and can't be renegotiated&lt;/li&gt;
&lt;li&gt;Your team has deep VMware expertise and no capacity to absorb an operational model shift during migration&lt;/li&gt;
&lt;li&gt;You have a defined, dated exit plan to move off AVS onto native Azure within 3–5 years&lt;/li&gt;
&lt;li&gt;You have specific application workloads with hard VMware dependencies that have no near-term abstraction path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key phrase is &lt;strong&gt;defined exit plan&lt;/strong&gt;. If you don't have one, AVS becomes your destination by default.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Cost Layer
&lt;/h2&gt;

&lt;p&gt;The published price is for compute. The real cost is in everything around it.&lt;/p&gt;

&lt;p&gt;Dedicated bare metal at a three-node minimum floor. vSAN storage overhead that materially reduces usable capacity. NSX-T licensing embedded in the bill whether you use the full capability stack or not. And the one most teams miss: traffic between AVS and native Azure services isn't always free. At scale, that adds up fast — and it almost never appears in the initial cost modeling.&lt;/p&gt;
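
&lt;p&gt;To make that concrete, here is a back-of-envelope sketch; the traffic volume and rate below are illustrative placeholders, not Azure's published pricing, so substitute current rates and your own measured flows before this goes anywhere near a cost model:&lt;/p&gt;

```shell
# Back-of-envelope AVS-to-native-Azure traffic cost estimate.
# Both numbers are assumptions for illustration only.
MONTHLY_EGRESS_GB=50000        # assumed monthly AVS <-> native Azure volume
RATE_CENTS_PER_GB=2            # assumed effective per-GB rate, in cents
MONTHLY_USD=$(( MONTHLY_EGRESS_GB * RATE_CENTS_PER_GB / 100 ))
echo "Estimated traffic cost: ~\$${MONTHLY_USD}/month"
```

&lt;p&gt;Even at placeholder rates, a line item of this size deserves a row in the initial model rather than a discovery in month three.&lt;/p&gt;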

&lt;h2&gt;
  
  
  The AVS Decision Test
&lt;/h2&gt;

&lt;p&gt;Before finalizing the architecture decision, run one check.&lt;/p&gt;

&lt;p&gt;Are you using AVS to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Buy time for a defined migration?&lt;/strong&gt; — Valid.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid retraining your team?&lt;/strong&gt; — Risky deferral.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delay re-architecting legacy workloads?&lt;/strong&gt; — Expensive later.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only one of these is a strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;AVS as a deliberate bridge with a committed exit timeline is a rational use of the platform. AVS without a defined exit path is deferred lock-in — you've traded Broadcom's licensing model for Microsoft's managed service model, paid for the familiar operational surface, and left yourself with an exit that's more expensive and more complex than what you started with.&lt;/p&gt;

&lt;p&gt;Model the exit before you commit to the entry.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Full architectural breakdown — including the trade-off comparison table, exit cost analysis, and native Azure contrast — is on Rack2Cloud: &lt;a href="https://www.rack2cloud.com/azure-vmware-solution-vs-native-azure/" rel="noopener noreferrer"&gt;Azure VMware Solution vs Native Azure: Architecture Trade-offs, Costs, and Exit Risk&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>azure</category>
      <category>vmware</category>
      <category>cloudarchitecture</category>
      <category>devops</category>
    </item>
    <item>
      <title>The Restore Path Is the Most Neglected Part of Backup Design</title>
      <dc:creator>NTCTech</dc:creator>
      <pubDate>Sun, 19 Apr 2026 13:37:47 +0000</pubDate>
      <link>https://dev.to/ntctech/the-restore-path-is-the-most-neglected-part-of-backup-design-la2</link>
      <guid>https://dev.to/ntctech/the-restore-path-is-the-most-neglected-part-of-backup-design-la2</guid>
      <description>&lt;p&gt;The restore path is where backup architectures fail — not the backup job, not the retention policy, not the storage tier.&lt;/p&gt;

&lt;p&gt;This is not an operations failure. It is a design omission.&lt;/p&gt;

&lt;p&gt;Most architectures are designed to write data — not to get it back.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Backup Job Is Not the Goal
&lt;/h2&gt;

&lt;p&gt;Most backup architectures are designed around the protection plane — backup jobs complete, retention windows are enforced, replication targets are confirmed. Dashboards go green. SLA reports are generated. The architecture is declared healthy.&lt;/p&gt;

&lt;p&gt;None of that measures whether recovery actually works.&lt;/p&gt;

&lt;p&gt;A backup job confirms that data was written to a target at a point in time. It tells you nothing about whether that data can be read back under load, whether the application stack can be reconstructed in the correct sequence, whether identity dependencies survive the restore, or whether the recovered state is consistent at the application layer rather than just bootable at the VM layer.&lt;/p&gt;

&lt;p&gt;The restore path is the sequence of operations, dependencies, and decision points between a backup completion event and a verified, production-usable recovered state. It is not a single operation. It is an architecture — and most teams have never designed it.&lt;/p&gt;

&lt;p&gt;A successful backup proves nothing about your ability to recover.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Restore Path Actually Contains
&lt;/h2&gt;

&lt;p&gt;Recovery doesn't fail in one place. It fails across layers that were never designed together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgumrktyjd0q37mzicac4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgumrktyjd0q37mzicac4.jpg" alt="Four-layer restore path model: data retrieval, dependency sequencing, identity bootstrap, and application-layer validation" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A functional restore path has four layers that must be explicitly designed, not assumed:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data retrieval.&lt;/strong&gt; Where does the backup live, how long does retrieval take, and what are the network and hydration constraints at scale? Object storage restore speeds differ from on-premises targets by orders of magnitude. Cloud archive tiers introduce retrieval latency that can turn a four-hour RTO into a 48-hour one. The rehydration bottleneck is real — and it belongs in the design, not the postmortem.&lt;/p&gt;
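
&lt;p&gt;A quick sketch makes the constraint concrete; the dataset size and throughput below are illustrative, so substitute the numbers you have actually measured against your own backup tier:&lt;/p&gt;

```shell
# Rough retrieval-time estimate. Numbers are illustrative, not benchmarks.
DATASET_GB=20000          # protected data to rehydrate (20 TB assumed)
THROUGHPUT_MBPS=400       # measured restore throughput in MB/s, not the datasheet figure
SECONDS_NEEDED=$(( DATASET_GB * 1024 / THROUGHPUT_MBPS ))
echo "Estimated retrieval: ~$(( SECONDS_NEEDED / 3600 ))h of pure data movement"
```

&lt;p&gt;Fourteen hours of pure data movement, before sequencing, identity, or validation, is the kind of output that belongs in the RTO conversation at design time.&lt;/p&gt;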

&lt;p&gt;&lt;strong&gt;Dependency sequencing.&lt;/strong&gt; What order do workloads need to come back online? Databases before application tiers. Identity before anything that authenticates. DNS before anything that resolves. Most organizations have never documented this sequence. The engineers who know it are the ones who happen to be on call during an incident — and that is not an architecture. That is institutional knowledge waiting to walk out the door.&lt;/p&gt;
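
&lt;p&gt;Once documented, the sequence can be encoded rather than remembered. A minimal sketch, where the tier names and the &lt;code&gt;restore_tier&lt;/code&gt; / &lt;code&gt;verify_tier&lt;/code&gt; functions are hypothetical stand-ins for your real runbook steps:&lt;/p&gt;

```shell
# Dependency-ordered restore: identity and DNS come back before anything
# that authenticates or resolves. Both functions are placeholders that
# echo their step; wire them to real tooling in practice.
restore_tier() { echo "restoring $1"; }
verify_tier()  { echo "verified $1"; }

for tier in dns identity database app-backend app-frontend; do
  restore_tier "$tier"
  verify_tier "$tier"
done
```

&lt;p&gt;The point is not the loop. It is that the ordering now lives in a reviewable artifact instead of in whoever happens to be on call.&lt;/p&gt;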

&lt;p&gt;&lt;strong&gt;Identity bootstrap.&lt;/strong&gt; If the production identity plane is compromised or unavailable, what does the recovery environment authenticate against? This is the question that stops most recoveries cold. Ransomware operators understand this — they target the identity plane specifically because a workload that cannot authenticate is not a recovered workload. It is a running VM with no access path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application-layer validation.&lt;/strong&gt; A restored VM that boots is not a recovered application. Application-consistent recovery requires more than a successful backup job — it requires that the restored state is usable at the application layer, not just reachable over the network. Hash validation, restore pipelines, and application-layer health checks must be defined before an incident, not improvised during one.&lt;/p&gt;
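
&lt;p&gt;The hash-validation piece can start as small as a manifest written at backup time and checked after restore. A sketch with illustrative paths:&lt;/p&gt;

```shell
# Write an integrity manifest at backup time, verify it after restore.
# /tmp/restore-demo stands in for your real data and manifest locations.
mkdir -p /tmp/restore-demo
echo "application-state" > /tmp/restore-demo/data.bin
cd /tmp/restore-demo
sha256sum data.bin > manifest.sha256     # recorded at backup time
sha256sum -c manifest.sha256             # run after the restore completes
```

&lt;p&gt;This proves the bytes survived the round trip. Application-layer health checks on top of it prove the system did.&lt;/p&gt;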

&lt;h2&gt;
  
  
  Why Teams Skip It
&lt;/h2&gt;

&lt;p&gt;The restore path is ignored because it doesn't produce visible success.&lt;/p&gt;

&lt;p&gt;There is no dashboard for "can we actually recover."&lt;/p&gt;

&lt;p&gt;Backup vendors measure protection-plane health because that is what they can instrument. Job completion rates, storage utilization, replication lag — these are real signals about a system that is working as designed. Recovery-plane health requires the organization to design and test it independently. No vendor ships a product that validates your dependency sequencing documentation or your identity bootstrap runbook. That work belongs to the architect.&lt;/p&gt;

&lt;p&gt;The result is a discipline where the visible work gets done and the invisible work gets skipped. Recovery drills exist precisely to surface this gap — but most teams treat them as a compliance exercise rather than an architectural stress test. A drill that confirms the backup is readable is not a recovery test. A recovery test proves the entire restore path — retrieval, sequencing, identity, application validation — executes within the declared RTO under realistic conditions.&lt;/p&gt;

&lt;p&gt;Backup success is easy to measure. Recovery success requires you to prove your assumptions wrong.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F153lyfh422dt3r9r4v4p.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F153lyfh422dt3r9r4v4p.jpg" alt="Protection plane vs recovery plane comparison showing what backup vendors measure versus what architects must design" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Restore Path as a Design Constraint
&lt;/h2&gt;

&lt;p&gt;Recovery is not a procedure problem. It is a constraint problem.&lt;/p&gt;

&lt;p&gt;Your RTO is not a target. It is the output of constraints you probably haven't modeled.&lt;/p&gt;

&lt;p&gt;Those constraints include retrieval throughput ceilings at your backup target tier, hydration time at scale, network path availability between the recovery environment and the backup source, identity availability in an isolated recovery context, and application dependency ordering that cannot be parallelized. Each constraint has a measurable impact on recovery time. Most organizations have modeled none of them.&lt;/p&gt;

&lt;p&gt;The RTO in most DR documentation is not derived from constraint analysis. It is a number someone wrote down during a compliance exercise — unchallenged, untested, and disconnected from the actual physics of the restore path. When the incident arrives, the gap between the documented RTO and the real recovery time is not a surprise. It is the predictable output of skipping the constraint modeling.&lt;/p&gt;

&lt;p&gt;The Three-Layer Resilience Model treats recovery as a distinct architectural layer — Layer 3, with its own design requirements and failure modes, separate from backup and DR. The restore path is the operational expression of that layer. If it has not been designed, Layer 3 does not exist regardless of how many backup jobs are completing successfully.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architect's Verdict
&lt;/h2&gt;

&lt;p&gt;If your organization has a documented backup architecture and no documented restore path, you have half a data protection design. The backup plane tells you that data exists somewhere. The restore path determines whether you can use it when it matters. Teams that invest in protection-plane completeness without modeling restore-path constraints are not protected — they are insured against a risk they have not actually priced.&lt;/p&gt;

&lt;p&gt;Design the restore path with the same rigor you applied to the backup architecture. If you haven't tested your restore path against real constraints, your RTO isn't a commitment. It's a guess.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.rack2cloud.com/restore-path-backup-design/" rel="noopener noreferrer"&gt;rack2cloud.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>dataprotection</category>
      <category>backups</category>
      <category>disasterrecovery</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Agentic AI Has a Control Plane Problem — Because It Became the Control Plane</title>
      <dc:creator>NTCTech</dc:creator>
      <pubDate>Fri, 17 Apr 2026 13:04:36 +0000</pubDate>
      <link>https://dev.to/ntctech/agentic-ai-has-a-control-plane-problem-because-it-became-the-control-plane-dp3</link>
      <guid>https://dev.to/ntctech/agentic-ai-has-a-control-plane-problem-because-it-became-the-control-plane-dp3</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnx4g5nvwuw1jep23fsae.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnx4g5nvwuw1jep23fsae.jpg" alt="agentic AI control plane architecture diagram showing agent operating across multiple infrastructure systems without isolation boundary" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agentic AI control plane governance is the architecture problem most teams are not modeling — and the one that will produce the most expensive failures in 2026.&lt;/p&gt;

&lt;p&gt;The control plane became the most sensitive layer in modern infrastructure. So we locked it down.&lt;/p&gt;

&lt;p&gt;Kubernetes gave us control plane isolation — the API server, etcd, and the scheduler separated from the workloads they govern. IAM gave us least-privilege scoping — execution authority bounded to the minimum required. Cloud architecture gave us blast radius containment — failure domains designed to limit the lateral spread of a single misconfiguration or breach.&lt;/p&gt;

&lt;p&gt;We spent a decade building these constraints. They are not theoretical. They are the operational lessons of every infrastructure failure that taught us what happens when execution authority goes ungoverned.&lt;/p&gt;

&lt;p&gt;Agentic AI reintroduces the same problem — without the controls.&lt;/p&gt;




&lt;h2&gt;
  
  
  We Rebuilt an Agentic AI Control Plane and Skipped Every Safeguard
&lt;/h2&gt;

&lt;p&gt;The mapping is direct. Every infrastructure concept that governs how control planes operate has an agentic equivalent. None of them carry the governance model forward.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Infrastructure Concept&lt;/th&gt;
&lt;th&gt;Agentic Equivalent&lt;/th&gt;
&lt;th&gt;What's Missing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Control plane API&lt;/td&gt;
&lt;td&gt;Tool / API invocation&lt;/td&gt;
&lt;td&gt;Policy enforcement layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IAM roles&lt;/td&gt;
&lt;td&gt;Agent credentials&lt;/td&gt;
&lt;td&gt;Scope boundaries, auditability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;etcd / state store&lt;/td&gt;
&lt;td&gt;Memory / vector store&lt;/td&gt;
&lt;td&gt;Versioning, governance, access control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestrator&lt;/td&gt;
&lt;td&gt;Agent runtime&lt;/td&gt;
&lt;td&gt;Isolation boundary&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgxr3vpv3fzw2xia44llp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgxr3vpv3fzw2xia44llp.jpg" alt="diagram comparing infrastructure control plane governance model to agentic AI equivalent showing missing policy enforcement and isolation layers" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every column on the right exists in agentic systems today. None of them carry the operational discipline that made the left column safe to run in production.&lt;/p&gt;

&lt;p&gt;We spent a decade separating execution from control. Agentic AI collapses that boundary again.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Agent Is No Longer an Application
&lt;/h2&gt;

&lt;p&gt;This is where the architecture regression becomes a structural risk.&lt;/p&gt;

&lt;p&gt;An application calls an API. An agent invokes tools, persists state, chains actions across systems, and makes decisions that trigger further actions — autonomously, at machine speed, across infrastructure it does not own.&lt;/p&gt;

&lt;p&gt;That is not an application. That is a control plane with execution authority.&lt;/p&gt;

&lt;p&gt;The distinction matters because the entire governance model for applications assumes bounded execution. An application has a defined scope. It calls what it is told to call. It does not decide. An agent decides — and those decisions have downstream effects across every system it can reach.&lt;/p&gt;

&lt;p&gt;Most teams are treating agentic AI as a new class of application. They are deploying it inside the application layer, scoping its credentials like a service account, and monitoring it with the same observability stack they use for stateless workloads.&lt;/p&gt;

&lt;p&gt;This is the architectural mistake. The agent is not operating at application scope. It is operating at control plane scope. And when a control plane runs without isolation, without enforced policy, and without bounded execution authority — you already know how that ends. You've seen it at the infrastructure layer.&lt;/p&gt;

&lt;p&gt;This class of risk has a name: &lt;strong&gt;Unbounded Control Planes&lt;/strong&gt; — a control plane that can initiate actions, without enforced policy, across systems it does not own.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Kubernetes fails closed. Agentic systems fail open.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faa2xesnls0b8gqtv81uw.jpg" alt="diagram showing unbounded control plane execution scope with agent operating across application layer and infrastructure layer without boundary enforcement" width="800" height="437"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four Failure Modes That Only Surface in Production
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Failure Mode 01 — Credential Amplification&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agents aggregate permissions across every tool they can invoke. The effective access scope is broader than any single IAM role you reviewed at deployment. Blast radius is not the agent's scope — it is the union of every system it can reach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure Mode 02 — Unbounded Execution Chains&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One prompt becomes twelve API calls across three systems before a human sees any output. Each step can trigger the next. There is no circuit breaker, no step boundary, no re-evaluation gate. The execution chain is only visible after the damage is already distributed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure Mode 03 — State Persistence Without Governance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agent memory is not a cache. It is a state layer that shapes every future decision. It is not versioned, not scoped, not audited. When it influences a cross-system action six interactions later, the dependency is invisible — until a failure event forces the trace.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure Mode 04 — No Control Plane Isolation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent runtime lives inside the application layer. Its credential scope operates at infrastructure authority. There is no isolation boundary between where the agent executes and what it can modify. The application perimeter does not contain infra-level execution authority.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3matr3a3ywnh2purrj9f.jpg" alt="diagram showing agentic AI blast radius from credential amplification across connected systems without scope boundary" width="800" height="437"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Architects Need to Get Right (Before This Breaks in Production)
&lt;/h2&gt;

&lt;p&gt;The answer is not a new security framework. It is the governance model you already built for infrastructure — applied deliberately to a layer that is behaving like infrastructure whether you designed it that way or not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Treat Agent Credentials as Control Plane Credentials&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If an agent can invoke APIs, it holds infrastructure authority — not application scope. No shared tokens. No implicit trust. Scoped, auditable, revocable — the same standard you apply to anything that can modify state at the infrastructure layer.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Agent identity is not app identity. It is control plane identity.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Isolate the Agent Runtime from the Systems It Controls&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An agent should not operate inside the same blast radius as the resources it can modify. The execution boundary needs to be explicit — separate runtime, no direct lateral access, mediation layer between the agent and the systems it reaches.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If the agent lives inside your application layer, your control plane is already compromised.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid4h5sy43yri5yljl3z0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid4h5sy43yri5yljl3z0.jpg" alt="architecture diagram showing correct agent runtime isolation with explicit execution boundary and mediation layer between agent and controlled systems" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Govern Memory as State — Not as a Feature&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Persistent memory is not context. It is a state layer that influences future actions across systems. Version it. Scope it. Audit it. Apply the same governance you would apply to any state store that participates in cross-system decision-making.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Unbounded memory creates untraceable behavior.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrziix28u9m10646ipfx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrziix28u9m10646ipfx.jpg" alt="diagram showing agent memory as governed state layer with versioning audit and scope controls versus uncontrolled memory influencing cross-system decisions" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Constrain Execution — Agents Should Not Chain Without Boundaries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The risk is not a single action. It is the accumulation of actions across systems without re-evaluation gates. Limit tool chaining. Enforce step boundaries. Require explicit re-evaluation before an agent proceeds across a system boundary.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Unbounded execution is how small decisions become systemic failures.&lt;/em&gt;&lt;/p&gt;
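
&lt;p&gt;A step boundary does not need to be sophisticated to be real. A minimal sketch of a chain-depth gate, with a hypothetical &lt;code&gt;invoke_tool&lt;/code&gt; standing in for an actual tool call:&lt;/p&gt;

```shell
# Enforce a chain-depth ceiling: after MAX_CHAIN autonomous steps, stop
# and require re-evaluation instead of letting the chain keep executing.
MAX_CHAIN=3
invoke_tool() { echo "tool:$1"; }    # placeholder for a real tool invocation
chain=0
for step in read-metrics propose-fix apply-fix notify; do
  chain=$(( chain + 1 ))
  if [ "$chain" -gt "$MAX_CHAIN" ]; then
    echo "chain limit reached: pausing before '$step' for re-evaluation"
    break
  fi
  invoke_tool "$step"
done
```

&lt;p&gt;Real implementations gate on system boundaries and action risk, not just step count, but the principle is the same: the chain stops by design, not by accident.&lt;/p&gt;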

&lt;p&gt;&lt;strong&gt;5. Reintroduce the Control Plane Boundary — Explicitly&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Define where the agent's authority begins and ends before deployment, not after the first production incident. If you do not define the boundary, the agent will — and it will define it as broadly as its credentials allow.&lt;/p&gt;

&lt;p&gt;We did not lose control of infrastructure because systems became complex. We lost control when we stopped enforcing boundaries. Agentic AI removes those boundaries by default. Architects need to put them back — deliberately.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbffcyj161eskr5ydknk.jpg" alt="architecture diagram showing explicitly defined agentic AI control plane boundary with enforced policy gates at system crossing points" width="800" height="437"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Architect's Verdict
&lt;/h2&gt;

&lt;p&gt;The agent is your agentic AI control plane.&lt;/p&gt;

&lt;p&gt;If your agent can take action across systems, it is part of your control plane — whether you designed it that way or not. The governance model, the isolation requirements, the credential discipline — none of that is optional at control plane scope. You already know this. You built it once. The only question is whether you apply it again before production forces the lesson.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Architecture diagrams and full failure mode breakdown at &lt;a href="https://www.rack2cloud.com/agentic-ai-control-plane-problem/" rel="noopener noreferrer"&gt;rack2cloud.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>kubernetes</category>
      <category>security</category>
    </item>
    <item>
      <title>Kubernetes Ingress to Gateway API Migration: How to Move Without Breaking Production</title>
      <dc:creator>NTCTech</dc:creator>
      <pubDate>Wed, 15 Apr 2026 12:37:36 +0000</pubDate>
      <link>https://dev.to/ntctech/kubernetes-ingress-to-gateway-api-migration-how-to-move-without-breaking-production-67m</link>
      <guid>https://dev.to/ntctech/kubernetes-ingress-to-gateway-api-migration-how-to-move-without-breaking-production-67m</guid>
      <description>&lt;p&gt;Most Gateway API migrations don't fail during the cutover.&lt;/p&gt;

&lt;p&gt;They fail in the translation layer — quietly, before traffic ever moves. The annotation audit skipped. The ingress2gateway output treated as deployment-ready. The staging environment that shared none of the complexity of production. By the time the failure surfaces, it looks like a Gateway API problem. It isn't. It's a migration preparation problem.&lt;/p&gt;

&lt;p&gt;Ingress-NGINX hit EOL on March 24 — the repository is read-only, no patches, no CVE fixes. Kubernetes 1.36 drops April 22 with Gateway API as the centerpiece. The window where this was a future consideration closed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9orr6nrfgxq8mncraebl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9orr6nrfgxq8mncraebl.jpg" alt="migrate ingress to gateway api architecture diagram showing translation layer between flat ingress annotation model and three-tier gateway api resource hierarchy" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Before You Migrate — The Annotation Audit
&lt;/h2&gt;

&lt;p&gt;The annotation count per Ingress resource is the number that determines which migration path is actually viable. Run this before anything else:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszyn8finf0ibhekykgoz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszyn8finf0ibhekykgoz.jpg" alt="Kubernetes ingress annotation complexity audit chart showing three migration risk tiers from simple to high-risk annotation surfaces" width="800" height="437"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Count annotations per ingress resource across all namespaces&lt;/span&gt;
kubectl get ingress &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | &lt;span class="se"&gt;\&lt;/span&gt;
  jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.items[] | "\(.metadata.namespace)/\(.metadata.name): \(.metadata.annotations | length) annotations"'&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt;: &lt;span class="nt"&gt;-k2&lt;/span&gt; &lt;span class="nt"&gt;-rn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three tiers, three different migration realities:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;0–5 annotations&lt;/strong&gt; — ingress2gateway 1.0 handles 80–90% of the translation. Most of what lands in your HTTPRoute manifests will be correct. Manual review still required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6–20 annotations&lt;/strong&gt; — partial translation. Common annotations (CORS, backend TLS, path rewrite, regex) are covered. Less common ones — &lt;code&gt;configuration-snippet&lt;/code&gt;, &lt;code&gt;auth-url&lt;/code&gt;, &lt;code&gt;server-snippet&lt;/code&gt; — require architectural decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;20+ annotations&lt;/strong&gt; — the tool cannot help you. What those annotations are collectively doing needs to be understood and redesigned before a single manifest is written.&lt;/p&gt;
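
&lt;p&gt;To see which annotation keys are pushing resources into the harder tiers, inventory them cluster-wide; snippet and auth annotations surface immediately:&lt;/p&gt;

```shell
# List every distinct ingress-nginx annotation key in use, most common first.
kubectl get ingress -A -o json | \
  jq -r '.items[].metadata.annotations // {} | keys[]' | \
  grep 'nginx.ingress.kubernetes.io' | sort | uniq -c | sort -rn
```

&lt;p&gt;Anything in this output under &lt;code&gt;configuration-snippet&lt;/code&gt; or &lt;code&gt;server-snippet&lt;/code&gt; is a redesign item, not a translation item.&lt;/p&gt;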

&lt;p&gt;Also find shared Ingress resources — single Ingress objects routing 40+ hostnames for multiple teams. These are coordination problems, not migration targets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get ingress &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | &lt;span class="se"&gt;\&lt;/span&gt;
  jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.items[] | select(.spec.rules | length &amp;gt; 5) |
  "\(.metadata.namespace)/\(.metadata.name): \(.spec.rules | length) host rules"'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ingress2gateway 1.0 — Syntax Translator, Not Architecture Translator
&lt;/h2&gt;

&lt;p&gt;ingress2gateway 1.0 is a genuine improvement: it supports 30+ common Ingress-NGINX annotations, with behavioral equivalence tests that verify runtime behavior in live clusters, not just YAML structure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ingress2gateway print &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--providers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ingress-nginx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Translates cleanly:&lt;/strong&gt; host/path routing, TLS referencing existing Secrets, CORS headers, backend TLS, path rewrites, regex matching.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does not translate:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;nginx.ingress.kubernetes.io/configuration-snippet&lt;/code&gt; — custom Lua, no Gateway API equivalent&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;nginx.ingress.kubernetes.io/server-snippet&lt;/code&gt; — server-level config, no direct equivalent&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;nginx.ingress.kubernetes.io/auth-url&lt;/code&gt; / &lt;code&gt;auth-signin&lt;/code&gt; — external auth, requires HTTPRoute filter or extension&lt;/li&gt;
&lt;li&gt;ConfigMap global defaults — proxy buffer sizes, upstream keepalive, timeout values don't transfer automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implicit defaults that disappear:&lt;/strong&gt; Ingress-NGINX's ConfigMap applies global defaults that never appear in your Ingress manifests, and they don't transfer. Document your ConfigMap before migration.&lt;/p&gt;
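&lt;p&gt;As a concrete illustration of what to snapshot: the key names below are real Ingress-NGINX ConfigMap options, but the values are invented examples, not recommendations.&lt;/p&gt;

```yaml
# Illustrative ingress-nginx ConfigMap. Every key here is a global default
# that will NOT follow your workloads to Gateway API. Snapshot it first.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  proxy-read-timeout: "600"             # long-polling services depend on this
  proxy-body-size: "50m"                # upload endpoints depend on this
  upstream-keepalive-connections: "320"
  ssl-protocols: "TLSv1.2 TLSv1.3"      # legacy-client TLS posture
```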




&lt;h2&gt;
  
  
  What to Migrate First
&lt;/h2&gt;

&lt;p&gt;Migration sequence matters more than migration speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migrate first:&lt;/strong&gt; New services with no Ingress config. Internal services with 2–3 host rules and no custom annotations. These establish the operational pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migrate second:&lt;/strong&gt; Services with standard CORS, TLS, and path rewrite annotations ingress2gateway handles cleanly. Validate behavioral equivalence before decommissioning each Ingress resource.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migrate last:&lt;/strong&gt; &lt;code&gt;configuration-snippet&lt;/code&gt; services, external auth integrations, shared Ingress resources, anything with a P1 incident in the last 90 days.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Side-by-Side Pattern — The Only Safe Model
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxz2hcjrzztraipkrady9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxz2hcjrzztraipkrady9.jpg" alt="Side-by-side Kubernetes ingress and gateway api deployment pattern showing shared load balancer IP with parallel traffic paths during migration" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cutover-first is an anti-pattern. Instead, run both controllers simultaneously against the same cluster, sharing the same external load balancer IP.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Gateway API CRDs&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml

&lt;span class="c"&gt;# Deploy Gateway API controller alongside existing Ingress controller&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://github.com/nginx/nginx-gateway-fabric/releases/download/v1.5.0/nginx-gateway-fabric.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gateway.networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Gateway&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production-gateway&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-gateway&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;gatewayClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-gateway&lt;/span&gt;
  &lt;span class="na"&gt;listeners&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;443&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HTTPS&lt;/span&gt;
    &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Terminate&lt;/span&gt;
      &lt;span class="na"&gt;certificateRefs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production-tls&lt;/span&gt;
        &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-gateway&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The one rule:&lt;/strong&gt; Never configure both an Ingress resource and an HTTPRoute for the same hostname and path simultaneously. The two controllers compete for the same traffic.&lt;/p&gt;
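&lt;p&gt;This rule is checkable. The rough collision audit below (assuming &lt;code&gt;kubectl&lt;/code&gt; and &lt;code&gt;jq&lt;/code&gt;) prints any hostname currently claimed by both an Ingress and an HTTPRoute:&lt;/p&gt;

```shell
# Collect hostnames from both resource types, then intersect them.
# Any line this prints is a hostname two controllers are competing for.
kubectl get ingress -A -o json | jq -r '.items[].spec.rules[]? | .host // empty' | sort -u > /tmp/ingress-hosts
kubectl get httproutes -A -o json | jq -r '.items[].spec.hostnames[]?' | sort -u > /tmp/route-hosts
comm -12 /tmp/ingress-hosts /tmp/route-hosts   # any output is a conflict
```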




&lt;h2&gt;
  
  
  HTTPRoute Translation — Before and After
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before — Ingress&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-ingress&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/rewrite-target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app.example.com&lt;/span&gt;
    &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/api&lt;/span&gt;
        &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
        &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-service&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# After — HTTPRoute&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gateway.networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HTTPRoute&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-route&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;parentRefs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production-gateway&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-gateway&lt;/span&gt;
  &lt;span class="na"&gt;hostnames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app.example.com"&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;matches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PathPrefix&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/api&lt;/span&gt;
    &lt;span class="na"&gt;filters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;URLRewrite&lt;/span&gt;
      &lt;span class="na"&gt;urlRewrite&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ReplacePrefixMatch&lt;/span&gt;
          &lt;span class="na"&gt;replacePrefixMatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
    &lt;span class="na"&gt;backendRefs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-service&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Traffic splitting — native in HTTPRoute, no annotations needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;backendRefs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-stable&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
    &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;90&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-canary&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
    &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Adjacent Dependencies — Address Before First HTTPRoute
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;cert-manager:&lt;/strong&gt; Requires v1.14.0+ for Gateway API support. Configuration moves from Ingress annotations to Gateway resource annotations.&lt;/p&gt;
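&lt;p&gt;Once cert-manager's Gateway API support is enabled (it sits behind a feature flag in some versions), the issuer annotation moves onto the Gateway itself. In this sketch the issuer name is a placeholder:&lt;/p&gt;

```yaml
# cert-manager watches annotated Gateways and provisions the Secret named
# in certificateRefs. "letsencrypt-prod" is a placeholder ClusterIssuer.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: production-gateway
  namespace: nginx-gateway
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  gatewayClassName: nginx-gateway
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    hostname: "app.example.com"
    tls:
      mode: Terminate
      certificateRefs:
      - name: production-tls   # cert-manager creates and renews this Secret
```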

&lt;p&gt;&lt;strong&gt;ExternalDNS:&lt;/strong&gt; Requires v0.14.0+ for Gateway API support. DNS records for HTTPRoute hostnames won't be created automatically on older versions — DNS resolution fails silently.&lt;/p&gt;
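&lt;p&gt;ExternalDNS likewise has to be told to watch the new resource types. A deployment-args sketch, where the provider is a placeholder and flag availability depends on your version:&lt;/p&gt;

```yaml
# ExternalDNS only creates records for sources it watches. Add the Gateway
# API route source alongside (or instead of) the ingress source.
spec:
  containers:
  - name: external-dns
    args:
    - --source=ingress            # keep during the side-by-side phase
    - --source=gateway-httproute  # new: publishes HTTPRoute hostnames
    - --provider=aws              # placeholder provider
```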

&lt;p&gt;&lt;strong&gt;Prometheus/alerting:&lt;/strong&gt; Gateway API controllers expose different metric structures than Ingress-NGINX. Dashboards keyed to Ingress-NGINX metric names won't work without updates.&lt;/p&gt;




&lt;h2&gt;
  
  
  DNS Cutover Sequence
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;All services validated under load via HTTPRoutes in side-by-side state&lt;/li&gt;
&lt;li&gt;Keep Ingress resources — rollback safety&lt;/li&gt;
&lt;li&gt;Reduce DNS TTL to 60 seconds — 24 hours before cutover&lt;/li&gt;
&lt;li&gt;Update external DNS record&lt;/li&gt;
&lt;li&gt;Monitor error rates for 30 minutes&lt;/li&gt;
&lt;li&gt;Remove the superseded Ingress resources after 24 hours of clean traffic — not before&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Production Failure Modes — Works in Staging, Breaks in Prod
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3b4ozwaluiq4ary8h8g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3b4ozwaluiq4ary8h8g.jpg" alt="Four Gateway API migration production failure modes — header routing mismatch, ReferenceGrant missing, TLS handshake surprise, and implicit defaults disappearing" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Header routing mismatch&lt;/strong&gt; — HTTPRoute header matching is exact by default. Ingress-NGINX treats some header matching case-insensitively. Verify your Gateway implementation's behavior explicitly.&lt;/p&gt;
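&lt;p&gt;Making the match explicit is the safest response. In this illustrative HTTPRoute rule, the header name, value, and backend are invented for the example:&lt;/p&gt;

```yaml
# HTTPRoute header matching: header NAMES are case-insensitive per HTTP,
# but an Exact match on the VALUE is case-sensitive. A client sending
# "Canary" will not match a rule expecting "canary".
rules:
- matches:
  - path:
      type: PathPrefix
      value: /api
    headers:
    - type: Exact
      name: x-release-track
      value: canary
  backendRefs:
  - name: api-canary
    port: 8080
```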

&lt;p&gt;&lt;strong&gt;ReferenceGrant missing&lt;/strong&gt; — the most common failure in multi-team clusters. An HTTPRoute in namespace &lt;code&gt;frontend&lt;/code&gt; referencing a Service in namespace &lt;code&gt;api&lt;/code&gt; requires a ReferenceGrant. Without it: accepted status, 500 response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gateway.networking.k8s.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ReferenceGrant&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allow-frontend-routes&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gateway.networking.k8s.io&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HTTPRoute&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;
  &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TLS handshake surprise&lt;/strong&gt; — Ingress-NGINX's TLS defaults (cipher suites, protocol versions) live in the ConfigMap. Gateway API controllers start from their own defaults. Validate TLS behavior against legacy clients explicitly before cutover.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implicit defaults disappearing&lt;/strong&gt; — proxy timeouts, upstream keepalive, buffer sizes set in the Ingress-NGINX ConfigMap don't transfer. A service relying on a 600-second proxy timeout reverts to the controller's default silently. Audit the ConfigMap before any service migrates.&lt;/p&gt;
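&lt;p&gt;Gateway API does define a per-route replacement: the &lt;code&gt;timeouts&lt;/code&gt; field on an HTTPRoute rule. Support varies by implementation and API version, so verify your controller honors it. A sketch with placeholder names:&lt;/p&gt;

```yaml
# Replaces a ConfigMap-level proxy-read-timeout of 600s with an explicit,
# per-route setting. Confirm your Gateway implementation supports
# HTTPRoute timeouts before relying on this.
rules:
- matches:
  - path:
      type: PathPrefix
      value: /reports
  timeouts:
    request: 600s
  backendRefs:
  - name: reporting-service   # placeholder service
    port: 8080
```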




&lt;h2&gt;
  
  
  Architect's Verdict
&lt;/h2&gt;

&lt;p&gt;ingress2gateway 1.0 handles straightforward migrations cleanly. The gap it cannot close is between syntax translation and architectural translation. Find the untranslatable annotations during the audit — not during the rollback.&lt;/p&gt;

&lt;p&gt;The side-by-side pattern is the correct one. Both controllers running against the same load balancer IP costs nothing and eliminates the primary risk vector: the all-at-once cutover that discovers production failure modes under incident conditions.&lt;/p&gt;

&lt;p&gt;The migration doesn't fail where you think it will. It fails in everything you assumed would just translate.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of the Rack2Cloud Kubernetes Ingress Architecture Series. Full post with interactive examples at rack2cloud.com.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>cloudnative</category>
      <category>platformengineering</category>
    </item>
  </channel>
</rss>
