<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shani Shoham</title>
    <description>The latest articles on DEV Community by Shani Shoham (@shohams).</description>
    <link>https://dev.to/shohams</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F912018%2F6059bb7d-378f-429d-b8c7-40a71df7579e.jpg</url>
      <title>DEV Community: Shani Shoham</title>
      <link>https://dev.to/shohams</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shohams"/>
    <language>en</language>
    <item>
      <title>[Boost]</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Mon, 02 Feb 2026 16:14:13 +0000</pubDate>
      <link>https://dev.to/shohams/-48fj</link>
      <guid>https://dev.to/shohams/-48fj</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/manas_sharma" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3739096%2F64c63567-d504-47de-b304-1cd488cc2906.jpeg" alt="manas_sharma"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/manas_sharma/nvidia-gpu-monitoring-with-dcgm-exporter-and-openobserve-complete-setup-guide-34k6" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;NVIDIA GPU Monitoring: Catch Thermal Throttling Before It Costs You $50k/Year&lt;/h2&gt;
      &lt;h3&gt;Manas Sharma ・ Feb 1&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#devops&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#monitoring&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#gpu&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#observability&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>devops</category>
      <category>monitoring</category>
      <category>gpu</category>
      <category>observability</category>
    </item>
    <item>
      <title>Kubernetes v1.34: Top 5 Game-Changing Updates That Will Transform Your Container Strategy</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Wed, 17 Sep 2025 18:02:00 +0000</pubDate>
      <link>https://dev.to/shohams/kubernetes-v134-top-5-game-changing-updates-that-will-transform-your-container-strategy-1h59</link>
      <guid>https://dev.to/shohams/kubernetes-v134-top-5-game-changing-updates-that-will-transform-your-container-strategy-1h59</guid>
      <description>&lt;p&gt;Kubernetes v1.34 "Of Wind &amp;amp; Will" has officially launched with 58 enhancements, marking one of the most significant releases in recent memory. While previous versions have focused on incremental improvements, v1.34 delivers transformational changes that address long-standing pain points for platform engineers, DevOps teams, and application developers. Let's dive into the five most exciting updates that will reshape how you manage containerized workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Node Swap Support Finally Graduates to Stable: A Resource Management Revolution&lt;/li&gt;
&lt;li&gt;Pod-Level Resource Requests and Limits: Simplifying Multi-Container Management&lt;/li&gt;
&lt;li&gt;In-Place Pod Resize Memory Reduction: The Final Piece of the Puzzle&lt;/li&gt;
&lt;li&gt;Dynamic Resource Allocation Reaches Maturity: GPU and Specialized Hardware Management&lt;/li&gt;
&lt;li&gt;End-to-End Observability: Kubelet and API Server Tracing Graduate to Stable&lt;/li&gt;
&lt;li&gt;Implementation Roadmap: Getting Ready for v1.34&lt;/li&gt;
&lt;li&gt;Conclusion: A New Chapter in Container Orchestration&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  1. Node Swap Support Finally Graduates to Stable: A Resource Management Revolution
&lt;/h2&gt;

&lt;p&gt;After years of evolution from alpha to beta, node swap support graduates to stable in Kubernetes v1.34, fundamentally changing how we think about memory management in containerized environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Matters
&lt;/h3&gt;

&lt;p&gt;For years, Kubernetes administrators had to disable swap memory entirely, forcing a binary choice: either over-provision memory (expensive) or risk out-of-memory kills (disruptive). Prior to version 1.22, Kubernetes did not provide support for swap memory on Linux systems due to the inherent difficulty in guaranteeing and accounting for pod memory utilization when swap memory was involved.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Technical Breakthrough
&lt;/h3&gt;

&lt;p&gt;The stable release introduces sophisticated swap management with three key modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NoSwap: Kubelet runs on swap-enabled nodes, but Pods don't use swap&lt;/li&gt;
&lt;li&gt;LimitedSwap: Automatic swap limits calculated for containers (cgroups v2 only)&lt;/li&gt;
&lt;li&gt;UnlimitedSwap: Removed for stability reasons&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The performance and stability of your nodes under memory pressure depend critically on a set of Linux kernel parameters. The stable release includes comprehensive tuning guidelines for the swappiness, min_free_kbytes, and watermark_scale_factor parameters.&lt;/p&gt;
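
&lt;p&gt;As a concrete sketch (not the only way to configure this), opting a node into LimitedSwap is done through the kubelet configuration; the fields below come from the KubeletConfiguration API, so verify them against your cluster's Kubernetes version:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# KubeletConfiguration fragment (sketch): enable LimitedSwap on a cgroup v2 node
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false            # let the kubelet start on a swap-enabled host
memorySwap:
  swapBehavior: LimitedSwap  # containers get automatic, bounded swap limits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;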

&lt;h3&gt;
  
  
  Real-World Impact
&lt;/h3&gt;

&lt;p&gt;Consider a machine learning workload that needs 32GB during model training but only 4GB during inference. With stable swap support, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with higher memory allocation for training&lt;/li&gt;
&lt;li&gt;Gracefully reduce memory post-training without pod restarts&lt;/li&gt;
&lt;li&gt;Use swap as a safety buffer during unexpected memory spikes&lt;/li&gt;
&lt;li&gt;Achieve better resource utilization across your cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implementation Considerations
&lt;/h3&gt;

&lt;p&gt;On Linux nodes, Kubernetes only supports running with swap enabled on hosts that use cgroup v2. Ensure your nodes are running cgroup v2, and consider the security implications: secret content protection against swapping has been introduced to prevent sensitive data from being written to disk.&lt;/p&gt;
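
&lt;p&gt;Before enabling swap, it's worth confirming the prerequisites on each node. A quick check, assuming a standard Linux host, might look like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Verify the node uses cgroup v2 (prints "cgroup2fs" on v2 hosts)
stat -fc %T /sys/fs/cgroup

# Confirm swap is actually provisioned on the host
swapon --show
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;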

&lt;h2&gt;
  
  
  2. Pod-Level Resource Requests and Limits: Simplifying Multi-Container Management
&lt;/h2&gt;

&lt;p&gt;Defining resource needs for Pods with multiple containers has been challenging, as requests and limits could only be set on a per-container basis. This forced developers to either over-provision resources for each container or meticulously divide the total desired resources.&lt;/p&gt;

&lt;p&gt;Kubernetes v1.34 addresses this with pod-level resource specifications now graduating to beta.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Container Orchestra Problem
&lt;/h3&gt;

&lt;p&gt;Traditional per-container resource management created several challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resource Mathematics: Dividing 2 CPU cores among 5 containers required complex calculations&lt;/li&gt;
&lt;li&gt;Dynamic Workloads: Containers with varying resource needs throughout their lifecycle&lt;/li&gt;
&lt;li&gt;Operational Complexity: Managing dozens of container resource specifications in multi-container pods&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Elegant Solution
&lt;/h3&gt;

&lt;p&gt;With the PodLevelResources feature gate enabled, you can specify resource requests and limits at the Pod level. Kubernetes v1.34 supports pod-level requests and limits for a specific set of resource types: cpu, memory, and hugepages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: microservices-pod
spec:
  resources:
    requests:
      cpu: "2"
      memory: "4Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
  containers:
  - name: api-gateway
    image: nginx:latest
    resources:
      requests:
        cpu: "0.5"
        memory: "1Gi"
  - name: cache-service
    image: redis:latest
    # No individual limits - shares pod-level budget
  - name: worker-service
    image: python:3.9
    # Dynamically uses remaining pod resources
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  HPA Integration Enhancement
&lt;/h3&gt;

&lt;p&gt;This feature was introduced as alpha in v1.32 and has graduated to beta in v1.34, with HPA now supporting pod-level resource specifications. This means your Horizontal Pod Autoscaler can now make scaling decisions based on aggregate pod resource usage rather than individual container metrics. For teams looking to optimize their &lt;a href="https://www.devzero.io/blog/kubernetes-autoscaling" rel="noopener noreferrer"&gt;Kubernetes autoscaling strategies&lt;/a&gt;, this integration represents a significant step forward in workload efficiency.&lt;/p&gt;
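
&lt;p&gt;As an illustrative sketch, an autoscaling/v2 HPA that scales a multi-container workload on CPU utilization might look like this; the Deployment name and thresholds are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: microservices-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: microservices-deployment   # placeholder workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70       # scale out when aggregate CPU passes 70%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;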

&lt;h3&gt;
  
  
  Best Practices for Implementation
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Start with Pod-Level: Define overall resource budget first&lt;/li&gt;
&lt;li&gt;Container Specificity: Only specify container-level resources for critical services&lt;/li&gt;
&lt;li&gt;Monitor and Adjust: Use metrics to understand actual resource distribution patterns&lt;/li&gt;
&lt;li&gt;HPA Configuration: Update autoscaling policies to leverage pod-level metrics&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Understanding &lt;a href="https://www.devzero.io/blog/kubernetes-workload-types" rel="noopener noreferrer"&gt;which Kubernetes workload types&lt;/a&gt; benefit most from pod-level resource management will help you prioritize your implementation strategy. Deployments with multiple sidecar containers and StatefulSets with complex resource patterns see the greatest improvements.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. In-Place Pod Resize Memory Reduction: The Final Piece of the Puzzle
&lt;/h2&gt;

&lt;p&gt;While in-place pod resizing graduated to beta in v1.33, v1.34 receives further improvements, including support for decreasing memory usage and integration with Pod-level resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Breaking the Memory Barrier
&lt;/h3&gt;

&lt;p&gt;Memory Decrease: If the memory resize restart policy is NotRequired (or unspecified), the kubelet will make a best-effort attempt to prevent OOM kills when decreasing memory limits, but it doesn't provide any guarantees. This cautious approach reflects the complexity of memory management but opens new possibilities.&lt;/p&gt;
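
&lt;p&gt;The restart policy mentioned above is declared per container in the pod spec. A minimal sketch, with the image and names as placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: resizable-app
spec:
  containers:
  - name: app
    image: eclipse-temurin:21-jre    # placeholder image
    resizePolicy:
    - resourceName: memory
      restartPolicy: NotRequired     # best-effort in-place memory changes
    - resourceName: cpu
      restartPolicy: NotRequired
    resources:
      requests:
        memory: "2Gi"
      limits:
        memory: "4Gi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;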

&lt;h3&gt;
  
  
  The Memory Reduction Algorithm
&lt;/h3&gt;

&lt;p&gt;v1.34 introduces sophisticated memory reduction logic:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Usage Validation: Check if current memory usage exceeds the new limit&lt;/li&gt;
&lt;li&gt;Safety Protocols: Skip the resize if a memory spike risk is detected&lt;/li&gt;
&lt;li&gt;Graceful Degradation: Best-effort prevention of OOM kills&lt;/li&gt;
&lt;li&gt;State Tracking: Enhanced monitoring of resize progress&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Practical Applications
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Scale down Java app after JVM warmup
kubectl patch pod java-app --subresource=resize -p '{
  "spec": {
    "containers": [{
      "name": "java-app",
      "resources": {
        "requests": {"memory": "2Gi"},
        "limits": {"memory": "4Gi"}
      }
    }]
  }
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Integration with Pod-Level Resources
&lt;/h3&gt;

&lt;p&gt;The combination of pod-level resources and memory reduction creates powerful optimization patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Batch Jobs: High memory during processing, low memory during idle&lt;/li&gt;
&lt;li&gt;ML Training: Large memory for data loading, reduced memory for inference&lt;/li&gt;
&lt;li&gt;Development Environments: Dynamic resource allocation based on activity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For organizations currently using &lt;a href="https://www.devzero.io/blog/kubernetes-vpa" rel="noopener noreferrer"&gt;Kubernetes VPA for rightsizing&lt;/a&gt;, the new in-place memory reduction capabilities provide a more seamless alternative to VPA's disruptive recreation approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Dynamic Resource Allocation Reaches Maturity: GPU and Specialized Hardware Management
&lt;/h2&gt;

&lt;p&gt;The core of Dynamic Resource Allocation (DRA) graduates to stable in Kubernetes v1.34, representing a major leap in how Kubernetes handles GPUs, FPGAs, and other specialized hardware.&lt;/p&gt;

&lt;h3&gt;
  
  
  Beyond Device Plugins
&lt;/h3&gt;

&lt;p&gt;Traditional device plugin architecture had significant limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All-or-Nothing: Entire device allocation only&lt;/li&gt;
&lt;li&gt;No Sharing: Single pod per device&lt;/li&gt;
&lt;li&gt;Limited Metadata: Minimal device information&lt;/li&gt;
&lt;li&gt;Static Allocation: No dynamic resource adjustment&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The DRA Revolution
&lt;/h3&gt;

&lt;p&gt;DRA provides a flexible way to categorize, request, and use devices in your cluster. Among its benefits is flexible device selection using the Common Expression Language (CEL) to perform fine-grained filtering.&lt;/p&gt;

&lt;h3&gt;
  
  
  New API Resources
&lt;/h3&gt;

&lt;p&gt;DRA introduces four key resource types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ResourceClaim: Specific device access requests&lt;/li&gt;
&lt;li&gt;DeviceClass: Categories of available devices&lt;/li&gt;
&lt;li&gt;ResourceClaimTemplate: Template-based device provisioning&lt;/li&gt;
&lt;li&gt;ResourceSlice: Device inventory and availability&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Advanced Features in v1.34
&lt;/h3&gt;

&lt;p&gt;Enabling the DRAConsumableCapacity feature gate (introduced as alpha in v1.34) allows resource drivers to share the same device, or even a slice of a device, across multiple ResourceClaims.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu-a100-large
spec:
  selectors:
  - cel:
      expression: 'device.driver == "nvidia.com/gpu" &amp;amp;&amp;amp; device.attributes["memory"] &amp;gt;= "40Gi"'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The sophisticated resource allocation capabilities of DRA work hand-in-hand with &lt;a href="https://www.devzero.io/blog/kubernetes-cluster-autoscaler" rel="noopener noreferrer"&gt;Kubernetes cluster autoscaling&lt;/a&gt; to ensure that both nodes and specialized hardware resources scale efficiently based on demand.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-World GPU Sharing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: ml-training-gpu
spec:
  devices:
    requests:
    - name: training-gpu
      deviceClassName: gpu-a100-large
      allocationMode: ExactCount
      count: 1
      constraints:
      - matchAttribute: "topology.pcie.slot"
        in: ["slot-1", "slot-2"]  # Prefer specific slots for performance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For teams deploying GPU-intensive workloads at scale, DRA's intelligent device allocation pairs perfectly with modern autoscaling solutions like &lt;a href="https://www.devzero.io/blog/karpenter-guide" rel="noopener noreferrer"&gt;Karpenter for optimal node provisioning&lt;/a&gt;, ensuring the right hardware is available exactly when and where it's needed.&lt;/p&gt;
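
&lt;p&gt;For completeness, a pod consumes such a claim through spec.resourceClaims. A hedged sketch that builds on the ml-training-gpu claim above (image name is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: ml-training-pod
spec:
  resourceClaims:
  - name: gpu                        # local name used by containers below
    resourceClaimName: ml-training-gpu
  containers:
  - name: trainer
    image: pytorch/pytorch:latest    # placeholder image
    resources:
      claims:
      - name: gpu                    # bind the claimed device to this container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;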

&lt;h2&gt;
  
  
  5. End-to-End Observability: Kubelet and API Server Tracing Graduate to Stable
&lt;/h2&gt;

&lt;p&gt;Kubelet Tracing (KEP-2831) and API Server Tracing (KEP-647) graduate to stable in the v1.34 release, providing unprecedented visibility into Kubernetes internal operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Observability Challenge
&lt;/h3&gt;

&lt;p&gt;Debugging Kubernetes issues often felt like archaeology - piecing together fragmented logs from different components to understand what happened. Performance bottlenecks, failed pod starts, and scheduling delays were mysteries wrapped in distributed system complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unified Tracing Architecture
&lt;/h3&gt;

&lt;p&gt;Together, these enhancements provide a more unified, end-to-end view of events, simplifying the process of pinpointing latency and errors from the control plane down to the node.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Request Flow Tracking: Follow a pod creation from API server to kubelet to container runtime&lt;/li&gt;
&lt;li&gt;Performance Bottleneck Identification: Pinpoint exactly where delays occur&lt;/li&gt;
&lt;li&gt;Error Correlation: Connect failures across component boundaries&lt;/li&gt;
&lt;li&gt;Capacity Planning: Understand resource utilization patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  OpenTelemetry Integration
&lt;/h3&gt;

&lt;p&gt;The stable release uses industry-standard OpenTelemetry, enabling integration with existing observability stacks like Jaeger, Zipkin, or commercial APM solutions.&lt;/p&gt;

&lt;h4&gt;
  
  
  Configuration Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: kubelet-tracing-config
data:
  config.yaml: |
    tracing:
      endpoint: "jaeger-collector:14268"
      samplingRatePerMillion: 1000000  # 100% sampling for debugging
      samplingGroups:
      - name: "pod-lifecycle"
        samplingRatePerMillion: 100000  # 10% for production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
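
&lt;p&gt;The API server side is configured separately through a file passed via the --tracing-config-file flag. A minimal sketch; the endpoint is a placeholder, and the config API version may differ by release:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Passed to kube-apiserver via --tracing-config-file
apiVersion: apiserver.config.k8s.io/v1alpha1
kind: TracingConfiguration
endpoint: otel-collector.observability.svc:4317  # OTLP/gRPC collector (placeholder)
samplingRatePerMillion: 10000                    # sample roughly 1% of requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;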



&lt;h2&gt;
  
  
  Implementation Roadmap: Getting Ready for v1.34
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase 1: Assessment (Weeks 1-2)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Audit current resource management patterns&lt;/li&gt;
&lt;li&gt;Identify containers suitable for pod-level resource management&lt;/li&gt;
&lt;li&gt;Evaluate swap requirements and cgroup v2 readiness&lt;/li&gt;
&lt;li&gt;Plan DRA migration for GPU/specialized hardware workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 2: Testing (Weeks 3-6)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Set up non-production clusters with v1.34&lt;/li&gt;
&lt;li&gt;Test in-place pod resize with memory reduction scenarios&lt;/li&gt;
&lt;li&gt;Validate pod-level resource specifications with existing workloads&lt;/li&gt;
&lt;li&gt;Configure tracing for critical application paths&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 3: Gradual Rollout (Weeks 7-12)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enable features with conservative configurations&lt;/li&gt;
&lt;li&gt;Monitor performance and stability metrics&lt;/li&gt;
&lt;li&gt;Gradually expand feature usage based on confidence levels&lt;/li&gt;
&lt;li&gt;Update monitoring and alerting for new resource patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: A New Chapter in Container Orchestration
&lt;/h2&gt;

&lt;p&gt;Kubernetes v1.34 represents more than incremental progress - it's a fundamental shift toward more intelligent, flexible, and observable container orchestration. The consistent delivery of high-quality releases underscores the strength of the project's development cycle and the vibrant support of its community.&lt;/p&gt;

&lt;p&gt;The convergence of stable swap support, pod-level resource management, enhanced in-place resizing, mature DRA, and comprehensive tracing creates unprecedented opportunities for optimization. Organizations can now achieve better resource utilization, reduced operational complexity, and improved application performance simultaneously.&lt;/p&gt;

&lt;p&gt;As these features stabilize and integrate, we're witnessing the emergence of truly adaptive infrastructure - Kubernetes clusters that can dynamically adjust to workload demands while providing deep insights into their behavior. The future of container orchestration isn't just about managing containers; it's about intelligent resource orchestration that adapts, optimizes, and evolves with your applications.&lt;/p&gt;

&lt;p&gt;Modern platforms like &lt;a href="https://www.devzero.io/" rel="noopener noreferrer"&gt;DevZero's live rightsizing solution&lt;/a&gt; exemplify how these new Kubernetes capabilities can be leveraged to achieve significant cost savings while maintaining performance - representing the next evolution in cloud-native resource optimization.&lt;/p&gt;

&lt;p&gt;What's your experience with these new features? Share your implementation stories and challenges in the comments below.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>containers</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Karpenter vs. Cluster Autoscaler: How They Compare in 2025</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Tue, 16 Sep 2025 19:44:27 +0000</pubDate>
      <link>https://dev.to/shohams/karpenter-vs-cluster-autoscaler-how-they-compare-in-2025-3m72</link>
      <guid>https://dev.to/shohams/karpenter-vs-cluster-autoscaler-how-they-compare-in-2025-3m72</guid>
      <description>&lt;p&gt;Every few years, a new project shows up that causes the Kubernetes ecosystem to rethink the operations mental model. &lt;/p&gt;

&lt;p&gt;In 2018, I was helping a company tame a three-hundred-node cluster with Cluster Autoscaler (CA). Just by using CA, the company saved thousands of dollars a month by pruning idle nodes. &lt;/p&gt;

&lt;p&gt;CA was helping a lot of customers, but it had a few challenges. Then, in 2021, Karpenter was released — the new kid on the block for &lt;a href="https://www.devzero.io/docs/k8s-automation" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; autoscaling. Suddenly, CA wasn’t the only option.&lt;/p&gt;

&lt;p&gt;Fast-forward to today, and both projects are mature enough to run production traffic. They just solve the scaling puzzle from two different perspectives. While CA optimizes inside the constraints of predefined node groups, Karpenter is more flexible and redraws the picture every scheduling cycle. &lt;/p&gt;

&lt;p&gt;With all this in mind, let’s walk through what that means in practice, where each one shines, and how DevZero plugs the gaps none of the open-source tools even attempt to address.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s0g5odv0pyg5jar5pby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s0g5odv0pyg5jar5pby.png" alt=" " width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Horizontal Pod Autoscaling Is the Foundation
&lt;/h2&gt;

&lt;p&gt;Before diving into where Karpenter and Cluster Autoscaler fit, it’s important to understand where the &lt;a href="https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/" rel="noopener noreferrer"&gt;Horizontal Pod Autoscaler&lt;/a&gt; (HPA) fits into the picture, since it works hand in hand with both tools. &lt;/p&gt;

&lt;p&gt;HPA operates at the workload level, automatically adjusting the number of pod replicas in a deployment, replica set, or stateful set based on observed metrics like CPU utilization, memory usage, or custom metrics.&lt;/p&gt;

&lt;p&gt;However, HPA only manages the number of pods; it doesn't provision new nodes. If there isn't enough capacity in your cluster to schedule the additional pods that HPA wants to create, those pods remain in a "Pending" state.&lt;/p&gt;

&lt;p&gt;This is where node-level &lt;a href="https://www.devzero.io/blog/kubernetes-autoscaling" rel="noopener noreferrer"&gt;autoscaling&lt;/a&gt; becomes essential. HPA creates more pods, while Cluster Autoscaler or Karpenter responds by provisioning the underlying infrastructure to host those pods. HPA handles application scaling, while node autoscalers handle infrastructure scaling. So, with that in mind, let’s continue.&lt;/p&gt;
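
&lt;p&gt;A quick way to see this hand-off in practice is to look for the pods HPA created but the scheduler couldn’t place:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pods stuck in Pending are the signal both node autoscalers react to
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# The Events section typically shows "FailedScheduling ... Insufficient cpu"
kubectl describe pod &amp;lt;pending-pod-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;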

&lt;h2&gt;
  
  
  What Is Cluster Autoscaler, and How Does It Work?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/best-practices/cas.html" rel="noopener noreferrer"&gt;Cluster Autoscaler&lt;/a&gt; has been the default answer for node fleet management since 2016. It watches for pods that the scheduler marks unschedulable, then resizes the underlying node group (an AWS Auto Scaling Group, a GKE managed instance group, an Azure VM scale set, or whatever the plain cloud-provider API exposes) to fit the demand.&lt;/p&gt;

&lt;p&gt;CA’s design choices feel conservative in the best sense of the word. Why? Because it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trusts the cloud provider to know which instance type to launch&lt;/li&gt;
&lt;li&gt;Works only inside node groups you describe up front&lt;/li&gt;
&lt;li&gt;Waits for configurable cool-down timers before scaling down &lt;/li&gt;
&lt;li&gt;Has knobs for every corner case imaginable to tune the balance between cost, performance, and availability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That caution kept many companies afloat through pandemic traffic spikes, but it also hardcodes yesterday’s assumptions: a node group is homogeneous, nodes launch slowly, and the price you pay per core is predictable (as long as you aren’t mixing too many different instance types; we’ll talk about this later).&lt;/p&gt;
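
&lt;p&gt;Those knobs surface as flags on the Cluster Autoscaler deployment. A few representative ones, with illustrative values and a hypothetical node group name:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Representative cluster-autoscaler flags (illustrative values)
--nodes=1:10:my-node-group         # min:max:name of a managed node group
--expander=least-waste             # which group to grow when several would fit
--scale-down-delay-after-add=10m   # cool-down before considering scale-down
--scale-down-unneeded-time=10m     # how long a node must be idle first
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;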

&lt;h2&gt;
  
  
  What Is Karpenter and How Does It Work?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.devzero.io/blog/karpenter-guide" rel="noopener noreferrer"&gt;Karpenter&lt;/a&gt; rips out CA’s assumptions. Instead of stretching or shrinking groups, it opens the entire EC2 catalogue (or its equivalent on other cloud providers) on every pass. The controller batches pending Pods, solves their collective constraints (CPU, memory, taints, topology spreads, and capacity type), and fires a single API call to grab the cheapest instance that fits. When the batch drains, Karpenter re-evaluates the cluster, picks off empty or under-utilized nodes, and terminates them via its consolidation logic.&lt;/p&gt;

&lt;p&gt;The payoff is speed and thrift. Nodes often appear in 30 to 45 seconds and disappear minutes after the last workload drains. Since its release, I’ve seen customers report 25 to 40 percent savings just from bin-packing, and double that when Spot capacity is fair game.&lt;/p&gt;
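
&lt;p&gt;To make the application-first model concrete, here is a hedged sketch of a Karpenter NodePool (karpenter.sh/v1) that draws from both Spot and On-Demand capacity and consolidates aggressively; names and values are illustrative, and the required nodeClassRef is omitted for brevity:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]   # let Karpenter chase cheaper capacity
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m               # reclaim under-utilized nodes quickly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;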

&lt;h2&gt;
  
  
  Cluster Autoscaler Benefits
&lt;/h2&gt;

&lt;p&gt;I’m not here to crown a winner, but I do want to highlight a few points about CA to help you avoid rushing into Karpenter if you’re not ready yet or don’t need it right now.&lt;/p&gt;

&lt;p&gt;First, let’s consider that — in reality — CA takes an infrastructure-first approach to scaling, which means you define your node infrastructure upfront and the autoscaler works within those predefined boundaries. This approach offers several distinct advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Granular control over scaling behavior: CA provides extensive configuration options that let you fine-tune exactly how scaling decisions are made. You can set different scale-down delay timers for different node groups, configure estimator types to optimize for bin-packing or least-waste strategies, and use expander policies to control which node groups scale first during high-demand periods (or mixing on-demand with Spot instances). This level of control is particularly valuable for those with strict change management processes.&lt;/li&gt;
&lt;li&gt;Battle-tested reliability: Let’s face it: Having been in production since 2016, CA has encountered and solved countless edge cases. Its conservative approach to scaling — i.e., waiting for configurable cooldown periods before making decisions — prevents the volatility that can occur when scaling too aggressively. &lt;/li&gt;
&lt;li&gt;Multi-cloud compatibility: CA's infrastructure-first design makes it naturally compatible with any cloud provider that supports node groups or auto-scaling groups. Whether you're running on AWS, GCP, Azure, or even on-premises Kubernetes distributions, CA can manage your scaling needs using the same familiar node group abstractions.&lt;/li&gt;
&lt;li&gt;Resource budget enforcement: By defining node groups with specific minimum and maximum sizes, CA provides hard limits on resource consumption. This makes it easier to enforce budget constraints or reserve capacity to access better compute prices. It also prevents runaway scaling scenarios that could lead to unexpected cloud bills.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, let’s turn our attention to Karpenter to see where it shines and to better understand how it’s different from CA.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvuylwjj1ody0sqlxpzap.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvuylwjj1ody0sqlxpzap.png" alt=" " width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Karpenter Benefits
&lt;/h2&gt;

&lt;p&gt;Karpenter takes an application-first approach where the workload constraints drive infrastructure decisions rather than the other way around. This fundamental shift in philosophy unlocks several powerful capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic infrastructure selection: Instead of being constrained by predefined node groups, Karpenter evaluates each pod's resource requests, node selectors, affinity rules, and topology constraints then selects candidate instance types from the entire cloud catalog. When a pod requests 4 vCPUs and 8GB of memory, Karpenter might end up provisioning a c5.xlarge, m5.xlarge, or even an m6i.xlarge instance — depending on pricing and availability — all without requiring separate node groups for each possibility.&lt;/li&gt;
&lt;li&gt;Scheduler integration: Karpenter works in tandem with the Kubernetes scheduler, receiving unschedulable pods and using the same constraint-solving logic to determine which nodes to provision (not only how many, as CA does). This tight integration means that Karpenter understands not just resource requirements but also complex scheduling constraints like pod anti-affinity rules, topology spread constraints, and volume node affinity requirements, which leads to more efficient node launches. So, rather than scaling each node group independently, Karpenter can provision a single diverse node that accommodates multiple different workload types, leading to higher overall cluster efficiency.&lt;/li&gt;
&lt;li&gt;Real-time optimization: Because Karpenter doesn't rely on pre-provisioned node groups, it can reconsider nodes based on allocated resources. It simulates what would happen if pods are evicted to find out if better instance types could be launched. In other words, it could launch smaller or cheaper nodes or remove empty nodes by consolidating workloads onto more cost-effective instances as conditions change. Karpenter continuously evaluates whether existing nodes are optimally utilized and can automatically consolidate workloads onto fewer, more efficient nodes.&lt;/li&gt;
&lt;li&gt;Simplified operational model: The application-first approach means developers focus on defining their workload requirements in pod specifications while Karpenter handles the translation to infrastructure. Teams don't need to understand the intricacies of node group management; they simply specify CPU, memory, and scheduling constraints, and Karpenter provisions appropriate nodes.&lt;/li&gt;
&lt;/ul&gt;
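
&lt;p&gt;As a minimal sketch of the 4 vCPU / 8GB example above (the pod and container names are hypothetical), this is all Karpenter needs to see: the requests themselves define the candidate instance types, with no node group required:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: api-server       # hypothetical workload
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          cpu: "4"       # matched against the full instance catalog
          memory: 8Gi    # c5.xlarge, m5.xlarge, m6i.xlarge all qualify
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;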

&lt;p&gt;The core difference lies in the mental model: CA asks, "What infrastructure do I want to manage?" while Karpenter asks, "What do my applications need?" This distinction makes CA ideal for organizations that prioritize infrastructure control and (some) predictability while Karpenter excels in environments where application agility and cost optimization are crucial.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7emejqmliq5gnwpkj1i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7emejqmliq5gnwpkj1i.png" alt=" " width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where DevZero Fits in the Autoscaling Landscape
&lt;/h2&gt;

&lt;p&gt;So far, we’ve looked at how Karpenter and CA approach the challenge of scaling Kubernetes clusters. But what if your scaling challenges go beyond just node or pod scaling? That’s where &lt;a href="https://www.devzero.io/" rel="noopener noreferrer"&gt;DevZero&lt;/a&gt; enters the picture, offering a broader, more flexible approach to resource &lt;a href="https://www.devzero.io/blog/orchestration-basics-tool-functionality-devops-teams-need" rel="noopener noreferrer"&gt;orchestration&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;DevZero doesn’t just react to workload demands; it orchestrates resources across your entire stack, making it possible to scale at the cluster, node, and workload levels. For instance, DevZero uses machine learning to predict future workload needs and then dynamically adjusts CPU, memory, and even GPU allocations for individual containers without restarts. Your applications get exactly what they need, when they need it, in real time. And unlike traditional scaling, which restarts workloads when moving them between nodes, DevZero snapshots running processes, preserving memory state, TCP connections, and filesystem state.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxrtmewa6wa348j13x27.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxrtmewa6wa348j13x27.gif" alt=" " width="600" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What really sets DevZero apart is its multi-cloud support and cost visibility. You’re not tied to a single cloud provider; DevZero orchestrates environments across AWS, Azure, GCP, and on-premises clusters — &lt;a href="https://www.devzero.io/docs/platform/getting-started/platform" rel="noopener noreferrer"&gt;all from a single platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here's more about &lt;a href="https://www.devzero.io/blog/what-makes-devzero-different" rel="noopener noreferrer"&gt;what makes DevZero unique from other autoscalers&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Karpenter and CA both deliver on the promise of efficient, automated scaling for Kubernetes clusters, but they approach the problem from fundamentally different angles. &lt;/p&gt;

&lt;p&gt;CA is a good choice if you value predictable infrastructure, granular control, and multi-cloud compatibility. Its infrastructure-first model works well for organizations with strict requirements around node types and change management. &lt;/p&gt;

&lt;p&gt;Karpenter, on the other hand, is built for teams that want to move fast and let application needs drive infrastructure decisions. Its application-first approach means less upfront configuration, more flexibility, and the ability to optimize for cost and performance in real time. &lt;/p&gt;

&lt;p&gt;DevZero sits above both, orchestrating resources at the cluster, node, and workload levels. It brings multi-cloud support and live migration, enabling teams to seamlessly shift workloads between environments.&lt;/p&gt;

&lt;p&gt;Ultimately, the best tool depends on your priorities: control and predictability, flexibility and efficiency, or broad orchestration and visibility. &lt;/p&gt;

&lt;p&gt;To help you visualize these differences and give you a better idea of which tools might work best for your unique circumstances, here’s a comparison table:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcng9vopr9qm92paesp1n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcng9vopr9qm92paesp1n.png" alt=" " width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>karpenter</category>
      <category>autoscaler</category>
      <category>cloud</category>
    </item>
    <item>
      <title>A Complete Guide to Karpenter: Everything You Need to Know</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Tue, 16 Sep 2025 19:22:53 +0000</pubDate>
      <link>https://dev.to/shohams/a-complete-guide-to-karpenter-everything-you-need-to-know-453g</link>
      <guid>https://dev.to/shohams/a-complete-guide-to-karpenter-everything-you-need-to-know-453g</guid>
      <description>&lt;p&gt;Modern Kubernetes workloads need elasticity. Static node groups often waste resources or introduce bottlenecks. That’s where Karpenter steps in. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/blogs/aws/introducing-karpenter-an-open-source-high-performance-kubernetes-cluster-autoscaler/" rel="noopener noreferrer"&gt;Karpenter&lt;/a&gt; is an open-source autoscaler built by AWS. It dynamically provisions the right-sized compute capacity for your Kubernetes clusters based on real-time demands. Whether you’re running workloads on AWS, Azure, or GKE, Karpenter simplifies cluster scaling while reducing costs and operational overhead.&lt;/p&gt;

&lt;p&gt;In this comprehensive guide, we’ll cover how Karpenter works, walk through real-world setup steps, share best practices, highlight limitations, and explore alternatives. We’ll also discuss how DevZero can simplify your development environments by integrating seamlessly with Karpenter-backed infrastructure.&lt;/p&gt;

&lt;p&gt;Let’s dive in.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Karpenter?
&lt;/h2&gt;

&lt;p&gt;Karpenter is an open-source &lt;a href="https://www.devzero.io/blog/kubernetes-autoscaling" rel="noopener noreferrer"&gt;Kubernetes autoscaler&lt;/a&gt; created by AWS. It automatically provisions compute capacity in response to unschedulable pods. This ensures workloads always receive the resources they need without manual intervention or complex node group configurations.&lt;/p&gt;

&lt;p&gt;Key Features&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic Node Provisioning: Instantly launches nodes tailored to pending pod requirements.&lt;/li&gt;
&lt;li&gt;No Predefined Node Groups: Simplifies infrastructure setup by eliminating the need for manual node group definitions.&lt;/li&gt;
&lt;li&gt;Intelligent Scheduling: Selects optimal instance types, zones, and capacity types (e.g., Spot and On-Demand).&lt;/li&gt;
&lt;li&gt;Cloud-Native: Currently, AWS (via Amazon EKS) is the only cloud provider officially supported by the Karpenter maintainers. While support for Azure and GKE exists through community-driven or experimental CRDs, these are not officially stable or maintained.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike the &lt;a href="https://www.devzero.io/blog/kubernetes-cluster-autoscaler" rel="noopener noreferrer"&gt;Cluster Autoscaler&lt;/a&gt;, Karpenter does not require predefining node groups, making it faster, simpler, and more efficient. It intelligently selects instance types, zones, and capacity types like Spot and On-Demand to meet your pod's needs in a cost-effective way.&lt;/p&gt;

&lt;p&gt;Karpenter is particularly effective in environments where workloads are unpredictable or highly variable. Its ability to provision nodes quickly without needing rigid node group definitions allows developers and SREs to reduce infrastructure toil and focus more on building applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jfyauklapq8566iahhm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jfyauklapq8566iahhm.png" alt=" " width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;
Karpenter’s NodePools define the constraints and behavior for the nodes that Karpenter can provision.



&lt;h2&gt;
  
  
  How Does Karpenter Work?
&lt;/h2&gt;

&lt;p&gt;Karpenter works by monitoring the Kubernetes scheduler for pods stuck in a pending state — typically because there aren’t enough available resources to schedule them. It then analyzes each pod’s resource requests, affinity rules, and taints to determine the optimal compute resources needed, allowing it to dynamically provision the right infrastructure at the right time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Karpenter Lifecycle
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Detection: Karpenter monitors unscheduled pods in real time.&lt;/li&gt;
&lt;li&gt;Constraint Evaluation: It evaluates the pod's resource requirements, including CPU, memory, tolerations, affinity rules, and labels.&lt;/li&gt;
&lt;li&gt;Instance Matching: Using these constraints, Karpenter selects optimal instance types across availability zones, capacity types (e.g., On-Demand and Spot), and architectures.&lt;/li&gt;
&lt;li&gt;Provisioning: It provisions nodes using the cloud provider’s API (such as EC2 for AWS).&lt;/li&gt;
&lt;li&gt;Node Bootstrapping: Nodes are initialized with the appropriate configurations and join the cluster.&lt;/li&gt;
&lt;li&gt;Scheduling: Pods are scheduled onto the new node as soon as it's ready.&lt;/li&gt;
&lt;li&gt;Deprovisioning: Idle nodes are removed after a defined TTL (time-to-live) to reduce costs.&lt;/li&gt;
&lt;/ol&gt;
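
&lt;p&gt;To illustrate steps 1–3, here’s a sketch of a pending pod (all names are hypothetical) whose requests, toleration, and zone affinity Karpenter would evaluate before matching an instance:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: batch-worker        # hypothetical example
spec:
  containers:
    - name: worker
      image: busybox
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "2"          # step 2: resource constraints
          memory: 4Gi
  tolerations:              # step 2: tolerations narrow eligible nodes
    - key: "dedicated"
      operator: "Equal"
      value: "batch"
      effect: "NoSchedule"
  affinity:                 # step 3: zone constraint narrows instance matching
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: ["us-west-2a"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;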

&lt;h2&gt;
  
  
  How to Get Started With Karpenter: How to Install and Configure Karpenter on AWS EKS (Step-by-Step Guide)
&lt;/h2&gt;

&lt;p&gt;Getting started with Karpenter requires a combination of infrastructure preparation, permission management, and deploying Karpenter into your cluster. The following steps walk you through the process on AWS with EKS (you can easily replicate the same steps with other cloud providers). Each step includes the command, explanation, and reasoning behind it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A running EKS cluster (with access to its name and endpoint).&lt;/li&gt;
&lt;li&gt;IAM OIDC provider enabled for your EKS cluster.&lt;/li&gt;
&lt;li&gt;CLI tools installed: kubectl, awscli, eksctl, and helm.&lt;/li&gt;
&lt;li&gt;Sufficient IAM permissions to create roles, policies, and service accounts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Tag Your Subnets for Discovery
&lt;/h3&gt;

&lt;p&gt;Karpenter uses tagged subnets to know where it can provision compute resources.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 subnet-0fedcba9876543210 \
  --tags Key=karpenter.sh/discovery,Value=my-cluster
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the code snippet above, replace the subnet IDs and my-cluster with your actual cluster name. This tag signals to Karpenter which subnets are eligible for node provisioning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create an IAM Role for Karpenter
&lt;/h3&gt;

&lt;p&gt;Create a service account that Karpenter will use to provision compute nodes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl create iamserviceaccount \
    --cluster my-cluster \
    --namespace karpenter \
    --name karpenter \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess \
    --approve \
    --override-existing-serviceaccounts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;This IAM role grants Karpenter the necessary permissions to interact with EC2, provision instances, and pull images.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Add the Karpenter Helm Repository
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add karpenter https://charts.karpenter.sh
helm repo update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the Karpenter Helm chart available to your cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Install Karpenter Using Helm
&lt;/h3&gt;

&lt;p&gt;Install Karpenter into your Kubernetes cluster with the appropriate values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install karpenter karpenter/karpenter \
    --namespace karpenter \
    --create-namespace \
    --set controller.clusterName=my-cluster \
    --set controller.clusterEndpoint=https://XYZ.gr7.us-west-2.eks.amazonaws.com \
    --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::1234567890:role/karpenter-role
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h4&gt;
  
  
  Explanation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Replace my-cluster with your actual cluster name.&lt;/li&gt;
&lt;li&gt;Use your actual API server endpoint.&lt;/li&gt;
&lt;li&gt;The IAM role should match the one created in Step 2.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 5: Create a Karpenter NodePool and NodeClass
&lt;/h3&gt;

&lt;p&gt;Karpenter requires both a NodePool and a NodeClass resource. Here’s an example of each:&lt;/p&gt;

&lt;h4&gt;
  
  
  NodeClass YAML
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        aws:eks:cluster-name: my-cluster
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h4&gt;
  
  
  NodePool YAML
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["m5", "m6a"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-west-2a", "us-west-2b"]
  limits:
    cpu: 1000
    memory: 2000Gi
  ttlSecondsAfterEmpty: 300
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h4&gt;
  
  
  Explanation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;The EC2NodeClass configures subnet and security group selectors.&lt;/li&gt;
&lt;li&gt;The NodePool defines the constraints for which types of instances can be provisioned, including instance family, capacity type, and availability zones.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 6: Apply the Resources
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f nodeclass.yaml
kubectl apply -f nodepool.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h3&gt;
  
  
  Step 7: Test the Autoscaler with a Deployment
&lt;/h3&gt;

&lt;p&gt;Deploy a workload that requires more capacity than your current cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create deployment large-app --image=nginx --replicas=30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Karpenter will detect that the current cluster does not have enough resources, and it will provision new nodes based on the NodePool constraints.&lt;/p&gt;

&lt;p&gt;With these steps, you’ll have Karpenter up and running on AWS, automatically scaling your workloads with flexible, intelligent compute provisioning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7bwx4qatt8c50ijz18m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk7bwx4qatt8c50ijz18m.png" alt=" " width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;
Karpenter’s NodePools define the constraints and behavior for the nodes that Karpenter can provision.



&lt;h2&gt;
  
  
  What Are Karpenter NodePools &amp;amp; How Do You Set Them Up?
&lt;/h2&gt;

&lt;p&gt;Karpenter’s NodePools define the constraints and behavior for the nodes that Karpenter can provision. Each NodePool acts as a template for provisioning nodes tailored to specific workload types.&lt;/p&gt;

&lt;p&gt;NodePools control attributes such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instance types&lt;/li&gt;
&lt;li&gt;Availability zones&lt;/li&gt;
&lt;li&gt;Architecture (e.g., amd64 and arm64)&lt;/li&gt;
&lt;li&gt;Taints and labels&lt;/li&gt;
&lt;li&gt;Limits for total CPU and memory usage&lt;/li&gt;
&lt;li&gt;Node expiration behavior using TTL values&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prerequisites for Creating a NodePool
&lt;/h3&gt;

&lt;p&gt;Before configuring a NodePool with Karpenter, ensure the following prerequisites are met:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Karpenter is installed and running in your Kubernetes cluster.&lt;/li&gt;
&lt;li&gt;A functioning Kubernetes cluster (e.g., EKS, GKE, or AKS) with workload pods that require Kubernetes autoscaling.&lt;/li&gt;
&lt;li&gt;IAM roles and permissions are properly set up (especially on AWS) to allow Karpenter to provision and terminate compute resources.&lt;/li&gt;
&lt;li&gt;Networking components such as subnets and security groups are tagged and available for use by the autoscaler.&lt;/li&gt;
&lt;li&gt;kubectl is configured to interact with your cluster and has sufficient RBAC privileges to apply custom resource definitions like NodePool.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the environment is prepared, you can begin defining NodePools to manage how and when nodes are provisioned.&lt;/p&gt;

&lt;h3&gt;
  
  
  NodePool Configuration
&lt;/h3&gt;

&lt;p&gt;Let’s walk through a real-world example of how to define a NodePool in Karpenter. This configuration file sets the rules for provisioning nodes that support your application workloads. It includes criteria like instance types, zones, taints, and resource limits.&lt;/p&gt;

&lt;p&gt;Once this YAML file is applied, Karpenter will use it as a blueprint when deciding how and where to spin up new nodes to satisfy your cluster’s computing demands.&lt;/p&gt;

&lt;p&gt;To set up your Karpenter NodePool, use this YAML file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: dev-workloads
spec:
  template:
    spec:
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["m5.large", "m5.xlarge"]
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-west-2a", "us-west-2b"]
      labels:
        env: dev
      taints:
        - key: "env"
          value: "dev"
          effect: "NoSchedule"
  limits:
    cpu: 500
    memory: 1000Gi
  ttlSecondsAfterEmpty: 300
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  YAML File Explanation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;requirements: This section allows you to specify which instance types and availability zones Karpenter should consider when provisioning nodes. In the example above, it restricts provisioning to &lt;code&gt;m5.large&lt;/code&gt; and &lt;code&gt;m5.xlarge&lt;/code&gt; instances within the &lt;code&gt;us-west-2a&lt;/code&gt; and &lt;code&gt;us-west-2b&lt;/code&gt; zones. This gives you control over cost, performance, and regional redundancy.&lt;/li&gt;
&lt;li&gt;labels: These are applied to all nodes that Karpenter provisions using this &lt;code&gt;NodePool&lt;/code&gt;. Labels like &lt;code&gt;env: dev&lt;/code&gt; help in categorizing and selecting nodes for specific workloads.&lt;/li&gt;
&lt;li&gt;taints: Taints prevent pods from being scheduled on a node unless the pod explicitly tolerates them. The &lt;code&gt;NoSchedule&lt;/code&gt; effect means that only pods with matching tolerations for env=dev can be placed on these nodes. This allows for fine-grained placement control.&lt;/li&gt;
&lt;li&gt;limits: Sets the maximum cumulative resources (CPU and memory) that can be provisioned by this NodePool. In this case, it restricts Karpenter to spinning up nodes that total no more than 500 vCPUs and 1000Gi of RAM.&lt;/li&gt;
&lt;li&gt;ttlSecondsAfterEmpty: Defines how long a node should stay alive after it becomes empty (i.e., has no pods running). Here, it’s set to 300 seconds (5 minutes), helping you reduce cloud costs by removing idle nodes promptly.&lt;/li&gt;
&lt;/ul&gt;
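
&lt;p&gt;For completeness, a pod targeting these dev nodes would need a toleration matching the taint above, and optionally a node selector for the &lt;code&gt;env: dev&lt;/code&gt; label. A minimal pod spec fragment might look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec:
  nodeSelector:
    env: dev            # matches the NodePool's label
  tolerations:
    - key: "env"        # tolerates the NodePool's NoSchedule taint
      operator: "Equal"
      value: "dev"
      effect: "NoSchedule"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;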

&lt;p&gt;Apply the YAML file using this &lt;code&gt;kubectl&lt;/code&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f nodepool.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;NodePools allow you to design infrastructure that matches your workload patterns, cost goals, and reliability needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhq49lf3rgs01nw3ghs4a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhq49lf3rgs01nw3ghs4a.png" alt=" " width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;
Autoscaling NodePools are ideal when you're running stateless applications, batch jobs, or services with variable demand.



&lt;h2&gt;
  
  
  Creating a NodePool for Autoscaling (Step-by-Step Guide)
&lt;/h2&gt;

&lt;p&gt;Autoscaling NodePools are ideal when you're running stateless applications, batch jobs, or services with variable demand. These pools help manage workloads without manual intervention, making your Kubernetes cluster more cost-effective and responsive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Choose Instance Types and Define Resource Limits
&lt;/h3&gt;

&lt;p&gt;Start by selecting a few instance types that meet your workload requirements. Use common families like t3, m5, or c5 for general-purpose workloads.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: autoscaling-pool
spec:
  template:
    spec:
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["t3.medium", "t3.large", "m5.large"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-west-2a", "us-west-2b"]
      labels:
        env: autoscale
      taints:
        - key: "autoscale"
          value: "true"
          effect: "NoSchedule"
  limits:
    cpu: 400
    memory: 800Gi
  ttlSecondsAfterEmpty: 120
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Explanation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;requirements&lt;/code&gt; block ensures that Karpenter only selects supported zones and instance types.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;taints&lt;/code&gt; and &lt;code&gt;labels&lt;/code&gt; help direct eligible pods to these autoscaling nodes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;limits&lt;/code&gt; cap how many vCPUs and GiB of memory this pool is allowed to provision.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ttlSecondsAfterEmpty&lt;/code&gt; defines how long idle nodes will persist before being terminated.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 2: Apply the NodePool Manifest
&lt;/h3&gt;

&lt;p&gt;Save the YAML as &lt;code&gt;autoscaling-pool.yaml&lt;/code&gt; and apply it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f autoscaling-pool.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify it has been created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get nodepools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Create a Workload to Trigger Autoscaling
&lt;/h3&gt;

&lt;p&gt;Deploy a workload that requires more capacity than is currently available in your cluster. This simulates real autoscaling behavior.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create deployment web --image=nginx --replicas=20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With 20 replicas, the Kubernetes scheduler will place pods until capacity is full. Karpenter detects the pending pods and provisions new nodes according to the autoscaling pool's rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Add Tolerations to Your Pods
&lt;/h3&gt;

&lt;p&gt;To allow your pods to run on nodes with specific taints, define tolerations in your deployment spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec:
  template:
    spec:
      tolerations:
        - key: "autoscale"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Update your deployment using &lt;code&gt;kubectl apply -f&lt;/code&gt; with the updated spec.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Monitor Node Provisioning and Scheduling
&lt;/h3&gt;

&lt;p&gt;Use the following commands to monitor the results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -o wide
kubectl get nodes -l env=autoscale
kubectl describe node &amp;lt;node-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also, monitor Karpenter logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=100 -f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures that nodes are provisioned and your workloads are scheduled as expected.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Clean Up Resources (Optional)
&lt;/h3&gt;

&lt;p&gt;Once testing is complete, you may want to delete the deployment and NodePool to prevent resource consumption.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl delete deployment web
kubectl delete -f autoscaling-pool.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This step-by-step setup gives you fine-grained control over how Kubernetes scales under dynamic workloads using Karpenter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices for Using Karpenter
&lt;/h2&gt;

&lt;p&gt;To get the most from Karpenter, consider the following practices:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tag Subnets and Security Groups Correctly
&lt;/h3&gt;

&lt;p&gt;Karpenter relies on discovery tags to identify which subnets and security groups to use for provisioning. On AWS, make sure your private subnets are tagged appropriately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 create-tags \
  --resources &amp;lt;subnet-id&amp;gt; \
  --tags Key=karpenter.sh/discovery,Value=&amp;lt;cluster-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the tag &lt;code&gt;karpenter.sh/discovery&lt;/code&gt; is essential. Otherwise, Karpenter won’t recognize the subnet as eligible.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Use Workload-Specific NodePools
&lt;/h3&gt;

&lt;p&gt;Segment workloads based on their requirements (e.g., GPU workloads, batch jobs, production, and staging, among others). Define separate NodePools for each workload type, applying appropriate taints and labels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu-workloads
spec:
  template:
    spec:
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["p3.2xlarge"]
      labels:
        workload: gpu
      taints:
        - key: "workload"
          value: "gpu"
          effect: "NoSchedule"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pods targeting GPU workloads should include matching tolerations and node selectors.&lt;/p&gt;
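
&lt;p&gt;For example, the pod template of a GPU deployment might carry this fragment (a sketch matching the taints and labels defined above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec:
  nodeSelector:
    workload: gpu         # matches the gpu-workloads NodePool label
  tolerations:
    - key: "workload"     # tolerates the pool's NoSchedule taint
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;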

&lt;h3&gt;
  
  
  3. Enable Spot Instance Flexibility
&lt;/h3&gt;

&lt;p&gt;Use Spot capacity for cost-sensitive or interruptible workloads. Add Spot capacity type to NodePool requirements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- key: karpenter.sh/capacity-type
  operator: In
  values: ["spot"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;ttlSecondsUntilExpired&lt;/code&gt; in combination with &lt;code&gt;ttlSecondsAfterEmpty&lt;/code&gt; to balance cost and availability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ttlSecondsUntilExpired: 21600  # 6 hours
  ttlSecondsAfterEmpty: 300      # 5 minutes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While TTLs are useful for basic lifecycle management, newer versions of Karpenter support more advanced consolidation strategies, such as &lt;code&gt;consolidationPolicy: WhenUnderutilized&lt;/code&gt;. This approach intelligently removes underutilized nodes based on real-time usage, making it more suitable for production environments where cost efficiency and resource optimization are critical. Consider using &lt;code&gt;consolidationPolicy&lt;/code&gt; instead of, or in addition to, TTLs for more intelligent scaling.&lt;/p&gt;

&lt;p&gt;Sample YAML code to implement these two strategies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: karpenter.sh/v1beta1
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsUntilExpired: 21600  # 6 hours
  ttlSecondsAfterEmpty: 300      # 5 minutes
  consolidationPolicy: WhenUnderutilized
  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["t3.medium", "t3.large"]
  providerRef:
    name: default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Set TTLs Strategically
&lt;/h3&gt;

&lt;p&gt;TTLs determine how long empty or expired nodes should remain in the cluster. Setting these values helps reduce idle compute waste:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ttlSecondsAfterEmpty: 180  # Automatically deletes idle nodes after 3 minutes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Choose longer TTLs for workloads that experience frequent short-lived spikes to prevent churn.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Avoid Node Drift with Taints, Labels, and Affinity
&lt;/h3&gt;

&lt;p&gt;Without guardrails, workloads may land on unintended nodes. Use labels and taints to prevent drift:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;labels:
  workload: "batch"
taints:
  - key: "workload"
    value: "batch"
    effect: "NoSchedule"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And ensure your pods specify matching tolerations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec:
  tolerations:
    - key: "workload"
      operator: "Equal"
      value: "batch"
      effect: "NoSchedule"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
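&lt;p&gt;You can also pin the pods to the labeled nodes with a matching &lt;code&gt;nodeSelector&lt;/code&gt; (a minimal sketch using the label above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec:
  nodeSelector:
    workload: "batch"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;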



&lt;h3&gt;
  
  
  6. Use Limits to Control Costs
&lt;/h3&gt;

&lt;p&gt;To avoid runaway provisioning and the exorbitant costs that come with it, define limits for CPU and memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;limits:
  cpu: 1000       # 1000 vCPU
  memory: 1000Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Karpenter will not provision nodes that push the total above these limits for the NodePool.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Monitor Logs and Events
&lt;/h3&gt;

&lt;p&gt;Track autoscaling decisions through the Karpenter controller logs (on AWS, these can also be shipped to CloudWatch Logs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter-controller
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
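&lt;p&gt;Karpenter also emits Kubernetes events when it provisions or disrupts nodes; one way to surface them (an illustrative command):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get events -A --sort-by='.lastTimestamp' | grep -i karpenter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;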



&lt;h2&gt;
  
  
  Disadvantages and Limitations of Karpenter
&lt;/h2&gt;

&lt;p&gt;While Karpenter simplifies autoscaling, it’s not without trade-offs, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Still Maturing: As a newer tool, it lacks the long-standing stability of Cluster Autoscaler.&lt;/li&gt;
&lt;li&gt;Cloud Provider Limitations: Non-AWS environments may face bugs or require custom configurations.&lt;/li&gt;
&lt;li&gt;IAM Complexity: AWS integration demands fine-tuned IAM permissions.&lt;/li&gt;
&lt;li&gt;Reactive Scaling: It doesn’t support predictive or scheduled autoscaling.&lt;/li&gt;
&lt;li&gt;Learning Curve: YAML-based configuration is flexible but introduces complexity.&lt;/li&gt;
&lt;li&gt;Over-Provisioning Risk: Misconfigured constraints can lead to unnecessary resource usage.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cross-Platform Support: AWS, Azure, and GKE
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Karpenter on AWS (EKS)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Fully supported with mature Helm charts and documentation.&lt;/li&gt;
&lt;li&gt;Utilizes IAM roles for service accounts.&lt;/li&gt;
&lt;li&gt;Can provision Spot and On-Demand EC2 instances.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Karpenter on Azure (AKS)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Requires workload identity setup.&lt;/li&gt;
&lt;li&gt;Must manually configure custom resource definitions.&lt;/li&gt;
&lt;li&gt;Some features (like Spot fallback) are in the early stages.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Karpenter on Google Kubernetes Engine (GKE)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Less official support than AWS.&lt;/li&gt;
&lt;li&gt;Requires workload identity federation.&lt;/li&gt;
&lt;li&gt;Custom bootstrap scripts are often necessary.&lt;/li&gt;
&lt;li&gt;Still a work in progress for production environments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For critical workloads, Karpenter on AWS is currently the most reliable and well-supported option.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Do I Use Karpenter with Google Kubernetes Engine? (Step-by-Step Guide)
&lt;/h2&gt;

&lt;p&gt;Karpenter has native support for AWS, but it can also be configured to work with Google Kubernetes Engine (GKE). Though not officially supported to the same level as AWS, you can still get it working with some setup steps. GKE users benefit from using Karpenter for flexible, dynamic autoscaling that goes beyond the capabilities of GKE’s built-in node autoscaling.&lt;/p&gt;

&lt;p&gt;Here’s how to set up Karpenter on GKE with detailed steps, configuration, and sample YAML files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A Google Cloud project with billing enabled.&lt;/li&gt;
&lt;li&gt;The gcloud CLI installed and authenticated.&lt;/li&gt;
&lt;li&gt;Kubernetes CLI (kubectl) configured to interact with your GKE cluster.&lt;/li&gt;
&lt;li&gt;Helm installed for managing Kubernetes applications.&lt;/li&gt;
&lt;li&gt;GKE cluster created with Workload Identity enabled.&lt;/li&gt;
&lt;li&gt;Sufficient IAM permissions to create service accounts, bindings, and roles.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Create a GKE Cluster with Workload Identity Enabled
&lt;/h3&gt;

&lt;p&gt;This enables Karpenter to use a Kubernetes service account that impersonates a Google service account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud container clusters create karpenter-gke \
  --workload-pool="my-project.svc.id.goog" \
  --zone=us-central1-a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the code snippet above, replace &lt;code&gt;my-project&lt;/code&gt; with your actual GCP project ID. This step sets up a GKE cluster with Workload Identity, which allows secure communication between Kubernetes workloads and Google Cloud services without long-lived credentials.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create a Google Service Account (GSA) and Bind IAM Roles
&lt;/h3&gt;

&lt;p&gt;Karpenter needs permissions to create and delete VMs, manage networking, and access metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud iam service-accounts create karpenter-sa

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:karpenter-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/compute.instanceAdmin.v1"

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:karpenter-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountUser"

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:karpenter-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/container.nodeServiceAccount"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This grants the GSA the permissions needed to provision and manage VM instances that serve as Kubernetes nodes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Bind the Google Service Account to a Kubernetes Service Account
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create namespace karpenter

kubectl create serviceaccount karpenter \
  --namespace karpenter

gcloud iam service-accounts add-iam-policy-binding karpenter-sa@my-project.iam.gserviceaccount.com \
  --member="serviceAccount:my-project.svc.id.goog[karpenter/karpenter]" \
  --role="roles/iam.workloadIdentityUser"

kubectl annotate serviceaccount karpenter \
  --namespace karpenter \
  iam.gke.io/gcp-service-account=karpenter-sa@my-project.iam.gserviceaccount.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This binds the GSA to the KSA via Workload Identity, allowing Karpenter pods to assume GCP roles.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Install Karpenter with Helm
&lt;/h3&gt;

&lt;p&gt;Create a custom &lt;code&gt;values.yaml&lt;/code&gt; file tailored for GKE:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;controller:
  clusterName: karpenter-gke
  clusterEndpoint: https://&amp;lt;API-SERVER&amp;gt;
  serviceAccount:
    name: karpenter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add karpenter https://charts.karpenter.sh

helm repo update

helm install karpenter karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  -f values.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;&amp;lt;API-SERVER&amp;gt;&lt;/code&gt; with your GKE API server’s endpoint. This deploys the Karpenter controller using the Workload Identity-aware service account.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Create a NodePool and NodeClass for GKE
&lt;/h3&gt;

&lt;h4&gt;
  
  
  NodeClass Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: karpenter.k8s.gcp/v1beta1
kind: GCPNodeClass
metadata:
  name: gke-default
spec:
  projectID: my-project
  subnetwork: default
  serviceAccount: karpenter-sa@my-project.iam.gserviceaccount.com
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gke-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        name: gke-default
      requirements:
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-central1-a"]
  limits:
    cpu: 500
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 300s   # replaces ttlSecondsAfterEmpty
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f gcp-nodeclass.yaml
kubectl apply -f gcp-nodepool.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These custom resources tell Karpenter how to provision GCE instances that will join your GKE cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Deploy a Workload to Trigger Scaling
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create deployment gke-load --image=nginx --replicas=15
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your current cluster lacks enough resources, Karpenter will provision GCE nodes using the NodePool configuration.&lt;/p&gt;

&lt;p&gt;With these steps completed, Karpenter should now be dynamically provisioning and scaling nodes in your GKE cluster based on real-time application demand. Use the following &lt;code&gt;kubectl&lt;/code&gt; commands to monitor activity and validate the setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get nodes
kubectl get pods
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Are Some Karpenter Alternatives?
&lt;/h2&gt;

&lt;p&gt;If Karpenter doesn’t meet your needs, consider these alternatives.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cluster Autoscaler
&lt;/h3&gt;

&lt;p&gt;This is a &lt;a href="https://www.devzero.io/blog/kubernetes-cluster-autoscaler" rel="noopener noreferrer"&gt;Kubernetes component&lt;/a&gt; that automatically adjusts the number of nodes in your cluster based on the resource needs of your pods.&lt;/p&gt;

&lt;p&gt;It scales up when there are pending pods that can’t be scheduled due to insufficient resources and scales down when nodes are underutilized. It's a stable, mature choice for general-purpose autoscaling and integrates well with managed Kubernetes platforms like EKS, GKE, and AKS.&lt;/p&gt;

&lt;p&gt;It requires predefined node groups and isn’t as flexible as Karpenter in choosing instance types. This tool is ideal for teams using managed Kubernetes services that need predictable scaling behavior and don’t require dynamic provisioning logic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzvyfgjqyfx8rvnhjcdd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzvyfgjqyfx8rvnhjcdd.png" alt=" " width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;h3&gt;
  
  
  KEDA (Kubernetes Event-Driven Autoscaler)
&lt;/h3&gt;

&lt;p&gt;An open-source autoscaler designed for event-driven applications that need to scale based on custom metrics or external triggers, KEDA supports over 50 built-in scalers like Kafka lag, queue length, Prometheus queries, and more.&lt;/p&gt;

&lt;p&gt;It works alongside the &lt;a href="https://www.devzero.io/blog/kubernetes-hpa" rel="noopener noreferrer"&gt;Horizontal Pod Autoscaler&lt;/a&gt; (HPA) to scale workloads on demand but doesn’t provision infrastructure itself. So it needs to be paired with Karpenter or Cluster Autoscaler for node scaling.&lt;/p&gt;

&lt;p&gt;KEDA is ideal for event-driven systems like queue consumers, batch jobs, or microservices responding to system metrics.&lt;/p&gt;
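&lt;p&gt;A minimal KEDA &lt;code&gt;ScaledObject&lt;/code&gt; sketch (the Prometheus address, query, and workload name are illustrative placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-consumer
spec:
  scaleTargetRef:
    name: queue-consumer      # Deployment to scale
  minReplicaCount: 0          # scale to zero when idle
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(requests_processed_total[2m]))
        threshold: "100"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;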

&lt;h3&gt;
  
  
  GKE Autopilot
&lt;/h3&gt;

&lt;p&gt;GKE Autopilot is a fully managed Kubernetes mode where Google handles both the control plane and node management.&lt;/p&gt;

&lt;p&gt;You simply deploy your workloads and GKE Autopilot automatically provisions, scales, and secures the nodes they run on.&lt;/p&gt;

&lt;p&gt;The tool enforces best practices for resource requests and security and charges you based on actual pod resource usage. However, it's GCP-only and may restrict low-level customizations required by certain workloads. &lt;/p&gt;

&lt;p&gt;GKE Autopilot is best for GCP-first teams looking to reduce operational burden while benefiting from fully managed Kubernetes scaling.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS Fargate
&lt;/h3&gt;

&lt;p&gt;AWS Fargate is a serverless compute engine for containers that allows you to run pods without managing EC2 instances or Kubernetes nodes.&lt;/p&gt;

&lt;p&gt;It automatically provisions resources per pod and scales based on demand, eliminating the need to size and manage infrastructure.&lt;/p&gt;

&lt;p&gt;Fargate simplifies operations for stateless or ephemeral workloads, though it may not support certain use cases like DaemonSets or privileged workloads.&lt;/p&gt;

&lt;p&gt;AWS Fargate is tightly integrated into the AWS ecosystem and is best suited for stateless apps, &lt;a href="https://www.devzero.io/burstable-workloads" rel="noopener noreferrer"&gt;bursty workloads&lt;/a&gt;, or dev environments that prioritize simplicity over configurability.&lt;/p&gt;

&lt;h2&gt;
  
  
  How DevZero Can Help
&lt;/h2&gt;

&lt;p&gt;Many customers run DevZero alongside Karpenter, KEDA, and other autoscalers.&lt;/p&gt;

&lt;p&gt;Karpenter is specifically a Kubernetes cluster autoscaler, focused on node provisioning and optimization. But as the &lt;a href="https://www.datadoghq.com/state-of-cloud-costs/" rel="noopener noreferrer"&gt;Datadog State of Cloud Costs&lt;/a&gt; report highlighted, over a third of cloud compute waste is the result of idle workloads, and memory waste adds to that. GPU waste compounds the problem: many workloads provision 8 or 12 GPUs while actual utilization is less than 1 GPU.&lt;/p&gt;

&lt;p&gt;DevZero takes a broader approach to optimization, focusing on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bin packing to reduce the total number of nodes needed&lt;/li&gt;
&lt;li&gt;Request optimization at the workload level to reduce the number of workloads&lt;/li&gt;
&lt;li&gt;Specialized optimization for different workload types (GPU vs CPU)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Benefits of DevZero
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Single multi-cloud platform: supports EKS, AKS, GKE, and any other flavor of Kubernetes.&lt;/li&gt;
&lt;li&gt;Goes beyond scheduling and Spot instances: live migration and bin packing optimize the number of nodes as well as the rightsizing of workloads.&lt;/li&gt;
&lt;li&gt;Live rightsizing for both memory and compute.&lt;/li&gt;
&lt;li&gt;Support for any type of compute: CPU and &lt;a href="https://www.devzero.io/blog/how-to-measure-gpu-utilization" rel="noopener noreferrer"&gt;GPU measurement&lt;/a&gt; and optimization.&lt;/li&gt;
&lt;li&gt;Flexible policy management: users can exclude workloads and nodes from optimization, apply changes manually, or use a read-write operator for automated optimization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In summary, while Karpenter focuses specifically on node provisioning and scaling, DevZero takes a more comprehensive approach across the entire Kubernetes cluster, adding further layers of optimization and cost savings.&lt;/p&gt;

&lt;p&gt;Bottom line? &lt;a href="https://www.devzero.io/kubernetes-cost-optimization" rel="noopener noreferrer"&gt;DevZero&lt;/a&gt; can help you cut your Kubernetes costs by as much as 80%.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqo1fia2afuzfux6ck1ng.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqo1fia2afuzfux6ck1ng.png" alt=" " width="800" height="396"&gt;&lt;/a&gt;DevZero Dashboards for cost and utilization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Karpenter is redefining how Kubernetes clusters scale. With its real-time, right-sized provisioning and growing multi-cloud support, it’s a compelling autoscaler for teams seeking agility and efficiency. When combined with developer platforms like DevZero, you unlock both operational excellence and developer productivity. What’s not to like?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.devzero.io/auth/signin" rel="noopener noreferrer"&gt;Explore how DevZero and Karpenter can transform your Kubernetes workflows today.&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>karpenter</category>
      <category>cloud</category>
      <category>gpu</category>
    </item>
    <item>
      <title>The Cost of Kubernetes: Which Workloads Waste the Most Resources</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Fri, 12 Sep 2025 14:27:00 +0000</pubDate>
      <link>https://dev.to/shohams/the-cost-of-kubernetes-which-workloads-waste-the-most-resources-2514</link>
      <guid>https://dev.to/shohams/the-cost-of-kubernetes-which-workloads-waste-the-most-resources-2514</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Kubernetes has revolutionized how we deploy and manage applications, but it has also introduced a massive resource waste problem that most organizations don't fully understand. According to the &lt;a href="https://www.cncf.io/reports/cncf-annual-survey-2023/" rel="noopener noreferrer"&gt;CNCF's 2023 State of Cloud Native Development report&lt;/a&gt; and analysis from cloud cost management platforms like Spot.io and Cast.ai, the average Kubernetes cluster runs at only 13-25% CPU utilization and 18-35% memory utilization, representing billions of dollars in wasted cloud infrastructure costs annually.&lt;/p&gt;

&lt;p&gt;This isn't just about unused capacity -- it's about systematic overprovisioning patterns that vary dramatically by workload type. Some Kubernetes workloads waste 60-80% of their allocated resources, while others are relatively well-optimized. Understanding these patterns is crucial for any organization serious about cloud cost optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Big Is Kubernetes Waste?
&lt;/h2&gt;

&lt;p&gt;Before diving into specific workload patterns, let's establish the magnitude of Kubernetes resource waste:&lt;/p&gt;

&lt;h3&gt;
  
  
  Industry Benchmarks
&lt;/h3&gt;

&lt;p&gt;Based on data from multiple sources including the CNCF Annual Survey, Flexera's State of the Cloud Report, and cloud optimization platforms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average cluster utilization: 13-25% CPU, 18-35% memory (CNCF 2023, Cast.ai analysis)&lt;/li&gt;
&lt;li&gt;Typical overprovisioning factor: 2-5x actual resource needs (Spot.io 2023 Kubernetes Cost Report)&lt;/li&gt;
&lt;li&gt;Annual waste per cluster: $50,000-$500,000 depending on cluster size (based on AWS/GCP/Azure pricing analysis)&lt;/li&gt;
&lt;li&gt;Time to optimization payback: Usually 30-90 days (industry case studies)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Traditional Monitoring Misses This
&lt;/h3&gt;

&lt;p&gt;Most monitoring focuses on pod-level metrics, but overprovisioning happens at the resource request/limit level. A pod might be "healthy" while consuming only 20% of its allocated resources—the other 80% is simply wasted capacity that could be running other workloads.&lt;/p&gt;
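&lt;p&gt;One way to surface this gap is to compare actual usage against requests with a Prometheus query (a sketch; assumes cAdvisor and kube-state-metrics metrics are available):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Fraction of requested CPU actually used, per namespace
sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)
  /
sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;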

&lt;h3&gt;
  
  
  Why Does This Happen?
&lt;/h3&gt;

&lt;p&gt;Behaviorally, Kubernetes manifests (helm charts, deployment.ymls, etc.) are first written for production environments and optimized for that purpose. Even so, the configurations tend to be optimized for times of peak utilization, rather than stable operations. While a workload may run properly at peak utilization time, it remains drastically overprovisioned at other times.  &lt;/p&gt;

&lt;p&gt;In reality, these manifests are more often copied wholesale than edited to fit each environment in which they run. The result is rampant overprovisioning, not just in production, but across lower environments as well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Average Waste by Workload Type
&lt;/h2&gt;

&lt;p&gt;Based on analysis of production clusters across multiple industries and data from cloud cost optimization platforms, here's how different Kubernetes workload types rank for resource waste:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: The following percentages are based on aggregated data from various cloud cost management platforms (Cast.ai, Spot.io, Densify), customer case studies, and our own analysis of production clusters. Individual results may vary significantly based on workload characteristics and optimization maturity.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Jobs and CronJobs (60-80% average overprovisioning)
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Source: Analysis of 200+ production clusters via cloud cost optimization platforms&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Why they're the worst offenders:
&lt;/h4&gt;

&lt;p&gt;Unpredictable Input Sizes: Batch processing jobs often handle variable data volumes, leading to "worst-case scenario" resource allocation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Typical overprovisioned Job
resources:
  requests:
    cpu: "4"
    memory: "8Gi"        # Sized for largest possible dataset
  limits:
    cpu: "8"
    memory: "16Gi"       # Double the requests "just in case"

# Reality: 90% of runs use &amp;lt;2 CPU cores and &amp;lt;3Gi memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Conservative Failure Prevention: Since job failures can be expensive (data reprocessing, missed SLAs), teams err heavily on the side of overprovisioning rather than risk failure.&lt;/p&gt;

&lt;p&gt;Lack of Historical Data: Unlike long-running services, batch jobs often lack comprehensive resource usage history, making right-sizing difficult.&lt;/p&gt;

&lt;p&gt;"Set and Forget" Mentality: Jobs are often configured once and rarely revisited for optimization, even as data patterns change.&lt;/p&gt;

&lt;p&gt;Real-World Example: A financial services company was running nightly ETL jobs with 8 CPU cores and 32GB RAM. After monitoring actual usage, they discovered average utilization was 1.2 CPU cores and 4GB RAM—an 85% overprovisioning rate costing $180,000 annually across their job workloads.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This example is representative of patterns observed across multiple customer engagements in the financial services sector.&lt;/em&gt;&lt;/p&gt;
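&lt;p&gt;Right-sizing the ETL example above against observed usage, with headroom for occasional large runs, might look like (a sketch; numbers are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resources:
  requests:
    cpu: "2"          # headroom over the observed average of 1.2 cores
    memory: "6Gi"     # headroom over the observed 4Gi
  limits:
    cpu: "4"
    memory: "8Gi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;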

&lt;h3&gt;
  
  
  2. StatefulSets (40-60% average overprovisioning)
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Source: Database workload analysis from Densify and internal customer studies&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Why databases and stateful apps waste resources:
&lt;/h4&gt;

&lt;p&gt;Database Buffer Pool Overallocation: Database administrators often allocate large buffer pools based on available memory rather than working set size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Common database overprovisioning pattern
resources:
  requests:
    memory: "16Gi"       # Conservative baseline
  limits:
    memory: "32Gi"       # "Room for growth"

# Actual working set: Often &amp;lt;8Gi for typical workloads
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Storage Overprovisioning: Persistent volumes are often sized for projected 2-3 year growth rather than current needs, leading to immediate overprovisioning of both storage and the compute resources to manage it.&lt;/p&gt;

&lt;p&gt;Cache Layer Conservatism: Applications like Redis, Memcached, and Elasticsearch often receive memory allocations based on peak theoretical usage rather than actual cache hit patterns and working set sizes.&lt;/p&gt;

&lt;p&gt;Growth Planning Gone Wrong: Teams allocate resources for anticipated scale that may never materialize, or arrives much later than expected.&lt;/p&gt;

&lt;p&gt;Real-World Example: An e-commerce platform allocated 64GB RAM to their PostgreSQL StatefulSet based on total database size. Monitoring revealed their working set was only 18GB, with buffer pool utilization averaging 28%. Right-sizing saved $8,000/month per database instance.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Based on a composite of multiple e-commerce customer optimizations.&lt;/em&gt;&lt;/p&gt;
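&lt;p&gt;For the PostgreSQL case above, a right-sized memory allocation anchored to the ~18Gi working set might look like (a sketch; numbers are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resources:
  requests:
    memory: "20Gi"    # working set plus headroom
  limits:
    memory: "24Gi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;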

&lt;h3&gt;
  
  
  3. Deployments (30-50% average overprovisioning)
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Source: CNCF FinOps for Kubernetes report and Spot.io cost optimization data&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Why even stateless apps waste resources:
&lt;/h4&gt;

&lt;p&gt;Development vs. Production Gap: Resource requirements determined during development often don't reflect production workload patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Development-based sizing
resources:
  requests:
    cpu: "500m"          # Based on single-user testing
    memory: "1Gi"        # Conservative development allocation
  limits:
    cpu: "2"             # "Better safe than sorry"
    memory: "4Gi"        # 4x requests "for bursts"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Missing Autoscaling: Many Deployments run with static replica counts and no horizontal pod autoscaling (HPA) or vertical pod autoscaling (VPA), leading to overprovisioning for peak traffic that rarely occurs.&lt;/p&gt;
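&lt;p&gt;A minimal &lt;code&gt;autoscaling/v2&lt;/code&gt; HPA for such a Deployment (a sketch; the target name is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;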

&lt;p&gt;Generic Resource Templates: Organizations often use standard resource templates across different applications without customization for specific workload characteristics.&lt;/p&gt;

&lt;p&gt;Fear of Performance Issues: Teams overprovision to avoid any possibility of performance degradation, especially for customer-facing services.&lt;/p&gt;

&lt;p&gt;Real-World Example: A SaaS company's API services were allocated 2 CPU cores and 4GB RAM per pod. Performance monitoring showed 95th percentile usage at 400m CPU and 800MB RAM. Implementing HPA and right-sizing reduced costs by 60% while improving performance through better resource density.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Represents a typical pattern observed in SaaS application optimization projects.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4. DaemonSets (20-40% average overprovisioning)
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Source: System workload analysis from Cast.ai and internal cluster audits&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Why system services accumulate waste:
&lt;/h4&gt;

&lt;p&gt;One-Size-Fits-All Approach: DaemonSets often use the same resource allocation across heterogeneous node types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Problematic uniform allocation
resources:
  requests:
    cpu: "200m"          # Too much for small nodes, too little for large
    memory: "512Mi"      # Doesn't scale with node capacity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cumulative Impact: Individual overprovisioning seems small but multiplies across every node in the cluster:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100-node cluster&lt;/li&gt;
&lt;li&gt;5 DaemonSets per node&lt;/li&gt;
&lt;li&gt;100m CPU overprovisioning per DaemonSet&lt;/li&gt;
&lt;li&gt;Total waste: 50 CPU cores cluster-wide&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;System Resource Competition: DaemonSets compete with kubelet and container runtime for resources, leading to conservative overprovisioning to ensure system stability.&lt;/p&gt;

&lt;p&gt;Lack of Visibility: System-level workloads often receive less monitoring attention than application workloads, making optimization less visible to teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Calculating the Cost of Waste
&lt;/h2&gt;

&lt;p&gt;Let's quantify what these overprovisioning patterns cost:&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Calculation Examples
&lt;/h3&gt;

&lt;p&gt;Medium-sized cluster (50 nodes, mix of workload types): &lt;em&gt;Based on typical AWS EKS pricing in us-east-1 as of 2024&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jobs/CronJobs: 20 workloads × 70% overprovisioning × $200/month = $2,800/month waste&lt;/li&gt;
&lt;li&gt;StatefulSets: 10 workloads × 50% overprovisioning × $400/month = $2,000/month waste&lt;/li&gt;
&lt;li&gt;Deployments: 100 workloads × 40% overprovisioning × $100/month = $4,000/month waste&lt;/li&gt;
&lt;li&gt;DaemonSets: 5 workloads × 30% overprovisioning × $50/month = $75/month waste&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total monthly waste: $8,875. Annual waste: $106,500.&lt;/p&gt;

&lt;p&gt;Note: Actual costs vary significantly based on cloud provider, region, instance types, and reserved instance usage.&lt;/p&gt;

&lt;h3&gt;
  
  
  ROI of Optimization
&lt;/h3&gt;

&lt;p&gt;Most optimization efforts show (based on aggregated customer case studies):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implementation time: 2-4 weeks for comprehensive optimization&lt;/li&gt;
&lt;li&gt;Payback period: 30-60 days&lt;/li&gt;
&lt;li&gt;Ongoing savings: 40-70% reduction in compute costs&lt;/li&gt;
&lt;li&gt;Performance improvements: Better resource density often improves performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Results based on analysis of 50+ optimization projects across various industries.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Root Causes: Why Overprovisioning Happens
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Psychological Factors
&lt;/h3&gt;

&lt;p&gt;Loss Aversion: The fear of application failure outweighs the "invisible" cost of wasted resources. A $10,000/month overprovisioning cost feels less painful than a single outage.&lt;/p&gt;

&lt;p&gt;Optimization Debt: Teams focus on shipping features rather than optimizing existing infrastructure, treating resource costs, typically a shared concern across the company, as "someone else's problem."&lt;/p&gt;

&lt;p&gt;Lack of Feedback Loops: Most developers never see the cost impact of their resource allocation decisions. Moreover, most organizations have a sharp disconnect between the people who provision resources and the people who track the associated finances (billing, invoicing, chargebacks, etc.).&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical Factors
&lt;/h3&gt;

&lt;p&gt;Inadequate Monitoring: Many organizations monitor application health but not resource efficiency, missing optimization opportunities.&lt;/p&gt;

&lt;p&gt;Complex Resource Relationships: Understanding the relationship between resource requests, limits, quality of service classes, and actual usage requires deep Kubernetes knowledge.&lt;/p&gt;

&lt;p&gt;Environment Inconsistencies: Resource requirements often differ significantly between development, staging, and production environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Organizational Factors
&lt;/h3&gt;

&lt;p&gt;Siloed Responsibilities: Development teams set resource requirements, but platform/operations teams pay the bills, creating misaligned incentives.&lt;/p&gt;

&lt;p&gt;Missing Governance: Lack of resource quotas, limits, and approval processes for resource allocation changes.&lt;/p&gt;

&lt;p&gt;Optimization Skills Gap: Many teams lack the expertise to effectively and dynamically right-size Kubernetes workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization Strategies by Workload Type
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Jobs and CronJobs Optimization
&lt;/h3&gt;

&lt;p&gt;Resource Profiling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run jobs with representative datasets and monitor actual resource usage&lt;/li&gt;
&lt;li&gt;Create resource profiles for different input size categories&lt;/li&gt;
&lt;li&gt;Implement dynamic resource allocation based on input characteristics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Smart Scheduling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Use resource quotas to prevent waste
apiVersion: v1
kind: ResourceQuota
metadata:
  name: batch-quota
spec:
  hard:
    requests.cpu: "50"
    requests.memory: "100Gi"
    count/jobs.batch: "10"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Monitoring and Alerting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track job completion times vs. resource allocation&lt;/li&gt;
&lt;li&gt;Alert on jobs with &amp;lt;30% resource utilization&lt;/li&gt;
&lt;li&gt;Implement cost tracking per job execution&lt;/li&gt;
&lt;/ul&gt;
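&lt;p&gt;The utilization alert above can be prototyped in a few lines. A minimal sketch; the job names, usage numbers, and metric source are hypothetical (in practice they would come from your metrics backend):&lt;/p&gt;

```python
# Hypothetical per-job samples: CPU requested vs. CPU actually used (cores).
job_usage = {
    "nightly-etl": {"requested_cpu": 4.0, "used_cpu": 0.9},
    "report-gen": {"requested_cpu": 2.0, "used_cpu": 1.6},
}

UTILIZATION_FLOOR = 0.30  # alert when a job uses under 30% of its request

def underutilized_jobs(usage, floor=UTILIZATION_FLOOR):
    """Return names of jobs whose CPU utilization falls below the floor."""
    flagged = []
    for name, stats in usage.items():
        ratio = stats["used_cpu"] / stats["requested_cpu"]
        if floor > ratio:
            flagged.append(name)
    return flagged

print(underutilized_jobs(job_usage))  # ['nightly-etl']
```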

&lt;h3&gt;
  
  
  StatefulSets Optimization
&lt;/h3&gt;

&lt;p&gt;Database-Specific Monitoring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitor buffer pool hit rates and working set sizes&lt;/li&gt;
&lt;li&gt;Track query performance vs. resource allocation&lt;/li&gt;
&lt;li&gt;Implement alerts for underutilized database resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Vertical Pod Autoscaling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: database-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: postgres
        maxAllowed:
          memory: "32Gi"
        minAllowed:
          memory: "4Gi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Storage Optimization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement storage classes with volume expansion&lt;/li&gt;
&lt;li&gt;Use storage tiering for hot/warm/cold data&lt;/li&gt;
&lt;li&gt;Monitor actual vs. provisioned storage usage&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deployments Optimization
&lt;/h3&gt;

&lt;p&gt;Horizontal Pod Autoscaling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Custom Metrics Scaling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scale based on request rate, queue depth, or business metrics&lt;/li&gt;
&lt;li&gt;Implement predictive scaling for known traffic patterns&lt;/li&gt;
&lt;li&gt;Use multiple metrics for more accurate scaling decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  DaemonSets Optimization
&lt;/h3&gt;

&lt;p&gt;Node-Specific Allocation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Different resource allocation per node type
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector-small
spec:
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: "t3.small"
      containers:
        - name: collector
          resources:
            requests:
              cpu: "50m"
              memory: "128Mi"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector-large
spec:
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: "c5.4xlarge"
      containers:
        - name: collector
          resources:
            requests:
              cpu: "200m"
              memory: "512Mi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced Optimization Techniques
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Resource Quotas and Governance
&lt;/h3&gt;

&lt;p&gt;Implement namespace-level controls to prevent overprovisioning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: ResourceQuota
metadata:
  name: development-quota
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quality of Service Classes
&lt;/h3&gt;

&lt;p&gt;Optimize QoS classes for different workload patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guaranteed: Critical services with predictable resource needs&lt;/li&gt;
&lt;li&gt;Burstable: Services with variable but bounded resource usage&lt;/li&gt;
&lt;li&gt;BestEffort: Non-critical batch workloads&lt;/li&gt;
&lt;/ul&gt;
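&lt;p&gt;Kubernetes assigns these classes directly from a pod's requests and limits. A simplified sketch of the classification rules (it omits edge cases such as requests defaulting to limits when only limits are set):&lt;/p&gt;

```python
def qos_class(containers):
    """Classify a pod's QoS class from its containers' requests/limits.

    Each container is a dict like {"requests": {...}, "limits": {...}}.
    """
    resources = ("cpu", "memory")
    # BestEffort: no container sets any request or limit.
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"
    # Guaranteed: every container sets both resources, with requests == limits.
    guaranteed = all(
        all(r in c.get("requests", {}) and r in c.get("limits", {})
            and c["requests"][r] == c["limits"][r] for r in resources)
        for c in containers
    )
    return "Guaranteed" if guaranteed else "Burstable"

print(qos_class([{"requests": {"cpu": "500m", "memory": "1Gi"},
                  "limits": {"cpu": "500m", "memory": "1Gi"}}]))  # Guaranteed
print(qos_class([{"requests": {"cpu": "100m"}, "limits": {}}]))   # Burstable
print(qos_class([{}]))                                            # BestEffort
```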

&lt;h3&gt;
  
  
  Cluster Autoscaling
&lt;/h3&gt;

&lt;p&gt;Configure cluster autoscaling to match resource provisioning with actual demand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Cluster Autoscaler configuration
spec:
  scaleDownDelayAfterAdd: "10m"
  scaleDownUnneededTime: "10m"
  scaleDownUtilizationThreshold: 0.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cost Monitoring and Chargeback
&lt;/h3&gt;

&lt;p&gt;Implement comprehensive cost tracking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tag resources with cost centers and projects&lt;/li&gt;
&lt;li&gt;Monitor cost per service/team/environment&lt;/li&gt;
&lt;li&gt;Implement monthly cost reviews and optimization targets&lt;/li&gt;
&lt;li&gt;Create dashboards showing resource efficiency metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementation Roadmap
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Without DevZero
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Phase 1: Assessment (Week 1-2)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Deploy resource monitoring across all workload types&lt;/li&gt;
&lt;li&gt;Identify the most overprovisioned workloads&lt;/li&gt;
&lt;li&gt;Calculate current waste and potential savings&lt;/li&gt;
&lt;li&gt;Prioritize optimization efforts by impact&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Phase 2: Quick Wins (Week 3-4)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Implement HPA for suitable Deployments&lt;/li&gt;
&lt;li&gt;Right-size obviously overprovisioned Jobs and CronJobs&lt;/li&gt;
&lt;li&gt;Configure resource quotas to prevent future waste&lt;/li&gt;
&lt;li&gt;Deploy VPA in recommendation mode for StatefulSets&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Phase 3: Advanced Optimization (Week 5-8)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Implement custom metrics scaling&lt;/li&gt;
&lt;li&gt;Optimize DaemonSet resource allocation&lt;/li&gt;
&lt;li&gt;Deploy comprehensive cost monitoring&lt;/li&gt;
&lt;li&gt;Establish ongoing optimization processes&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Phase 4: Governance and Culture (Ongoing)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Create resource allocation guidelines&lt;/li&gt;
&lt;li&gt;Implement approval processes for resource changes&lt;/li&gt;
&lt;li&gt;Train teams on optimization best practices&lt;/li&gt;
&lt;li&gt;Establish regular optimization reviews&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Option 2: With DevZero
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Phase 1: Visualization (Week 1)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Deploy DevZero’s resource monitoring across all workload types&lt;/li&gt;
&lt;li&gt;Identify the most overprovisioned workloads&lt;/li&gt;
&lt;li&gt;Calculate current waste and potential savings&lt;/li&gt;
&lt;li&gt;Prioritize optimization efforts by impact&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Phase 2: Optimization &amp;amp; Automation (Week 2)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Apply manual recommendations&lt;/li&gt;
&lt;li&gt;Start applying automated recommendations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Measuring Success
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Performance Indicators
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Cluster utilization: Target &amp;gt;60% CPU, &amp;gt;70% memory&lt;/li&gt;
&lt;li&gt;Cost per workload: Track monthly spend per service&lt;/li&gt;
&lt;li&gt;Resource efficiency ratio: Actual usage / allocated resources&lt;/li&gt;
&lt;li&gt;Optimization coverage: Percentage of workloads with proper sizing&lt;/li&gt;
&lt;/ul&gt;
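&lt;p&gt;The resource efficiency ratio KPI is simple to compute and track. A sketch with hypothetical numbers:&lt;/p&gt;

```python
def efficiency_ratio(used, allocated):
    """Resource efficiency KPI: actual usage divided by allocated resources."""
    return used / allocated

# Hypothetical snapshot: 48 CPU cores in use out of 120 requested cluster-wide.
cpu_efficiency = efficiency_ratio(used=48, allocated=120)
meets_cpu_target = cpu_efficiency >= 0.60  # the 60% CPU utilization target
print(round(cpu_efficiency, 2), meets_cpu_target)  # 0.4 False
```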

&lt;h3&gt;
  
  
  Monitoring and Alerting
&lt;/h3&gt;

&lt;p&gt;Set up alerts for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workloads with &amp;lt;30% resource utilization for &amp;gt;7 days&lt;/li&gt;
&lt;li&gt;New deployments without resource requests/limits&lt;/li&gt;
&lt;li&gt;Cluster utilization dropping below targets&lt;/li&gt;
&lt;li&gt;Monthly cost increases &amp;gt;10%&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Kubernetes overprovisioning isn't just a cost problem—it's a systematic issue that varies dramatically by workload type. Jobs and CronJobs waste 60-80% of allocated resources, StatefulSets waste 40-60%, and even well-understood Deployments waste 30-50% on average.&lt;/p&gt;

&lt;p&gt;The good news is that this waste is largely preventable through proper monitoring, right-sizing, and governance. Organizations that implement comprehensive optimization strategies typically see (based on documented case studies and platform telemetry):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;40-70% reduction in compute costs&lt;/li&gt;
&lt;li&gt;Improved application performance through better resource density&lt;/li&gt;
&lt;li&gt;Better resource planning and capacity management&lt;/li&gt;
&lt;li&gt;Enhanced cost visibility and accountability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key is treating resource optimization as an ongoing practice, not a one-time project. With the right monitoring, processes, and tooling in place, you can eliminate the majority of Kubernetes resource waste while improving application performance and reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources and References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.cncf.io/reports/cncf-annual-survey-2023/" rel="noopener noreferrer"&gt;CNCF Annual Survey 2023: Cloud Native Computing Foundation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cncf.io/wp-content/uploads/2023/12/CNCF_Finops-Microsurvey-2023.pdf" rel="noopener noreferrer"&gt;Cloud Native and Kubernetes FinOps Microsurvey&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cncf.io/wp-content/uploads/2025/04/Blue-DN29-State-of-Cloud-Native-Development.pdf" rel="noopener noreferrer"&gt;State of Cloud Native Development Report 2025: CNCF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://spot.io/blog/state-of-cloudops-2023-cloud-operations-challenges/" rel="noopener noreferrer"&gt;Kubernetes Cost Optimization Report 2023: Spot.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://info.flexera.com/CM-REPORT-State-of-the-Cloud-2025-Thanks" rel="noopener noreferrer"&gt;State of the Cloud Report 2025: Flexera&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cncf.io/blog/2024/04/29/finops-for-kubernetes-engineering-cost-optimization/" rel="noopener noreferrer"&gt;FinOps for Kubernetes: CNCF FinOps Working Group&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Disclaimer: Overprovisioning percentages represent aggregated trends across multiple production environments. Individual results will vary based on workload characteristics, optimization maturity, and operational practices. All cost examples are illustrative and based on typical cloud provider pricing as of 2024.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>cloudnative</category>
      <category>cpu</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Which Kubernetes Workloads to Use and When</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Wed, 10 Sep 2025 15:05:00 +0000</pubDate>
      <link>https://dev.to/shohams/which-kubernetes-workloads-to-use-and-when-1cga</link>
      <guid>https://dev.to/shohams/which-kubernetes-workloads-to-use-and-when-1cga</guid>
      <description>&lt;p&gt;Kubernetes has become the backbone of modern cloud-native infrastructure, powering everything from stateless web apps to complex machine learning pipelines. Yet, as organizations scale their clusters and diversify their workloads, many are confronted with a hidden challenge: choosing the right workload type and optimizing resource allocation to avoid massive, often invisible, waste. We previously discussed the &lt;a href="https://www.devzero.io/blog/kubernetes-workload-types" rel="noopener noreferrer"&gt;types of Kubernetes workloads&lt;/a&gt;. This blog will give a step by step guide to choosing the right workload type, while exposing the surprising patterns of overprovisioning that silently drain cloud budgets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start Here: Is your application stateful or stateless?
&lt;/h3&gt;

&lt;p&gt;Stateless Application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No persistent data stored locally&lt;/li&gt;
&lt;li&gt;Can be easily replaced or restarted&lt;/li&gt;
&lt;li&gt;Multiple instances are identical → Use Deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stateful Application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires persistent storage&lt;/li&gt;
&lt;li&gt;Needs stable network identity&lt;/li&gt;
&lt;li&gt;Data locality is important → Use StatefulSet&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Special Cases:
&lt;/h3&gt;

&lt;p&gt;Need to run on every node?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System-level services&lt;/li&gt;
&lt;li&gt;Node monitoring or logging&lt;/li&gt;
&lt;li&gt;Network or storage drivers → Use DaemonSet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One-time task?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data migration&lt;/li&gt;
&lt;li&gt;Batch processing&lt;/li&gt;
&lt;li&gt;Backup operation → Use Job&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recurring scheduled task?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regular backups&lt;/li&gt;
&lt;li&gt;Periodic maintenance&lt;/li&gt;
&lt;li&gt;Scheduled reports → Use CronJob&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Complex multi-component application?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom business logic for deployment&lt;/li&gt;
&lt;li&gt;Complex dependencies&lt;/li&gt;
&lt;li&gt;Specialized update strategies → Consider Custom Resources/Operators&lt;/li&gt;
&lt;/ul&gt;
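&lt;p&gt;The decision tree above can be written down as a small function. A sketch only; real choices also weigh the operational factors discussed below:&lt;/p&gt;

```python
def choose_workload(app):
    """Map application traits onto a Kubernetes workload controller."""
    if app.get("runs_on_every_node"):
        return "DaemonSet"
    if app.get("recurring_scheduled_task"):
        return "CronJob"
    if app.get("one_time_task"):
        return "Job"
    if app.get("needs_custom_lifecycle"):
        return "Custom Resource / Operator"
    if app.get("stateful"):
        return "StatefulSet"
    return "Deployment"  # stateless default

print(choose_workload({"stateful": True}))                  # StatefulSet
print(choose_workload({"one_time_task": True}))             # Job
print(choose_workload({"recurring_scheduled_task": True}))  # CronJob
print(choose_workload({}))                                  # Deployment
```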

&lt;h2&gt;
  
  
  Workload Popularity and Usage Patterns
&lt;/h2&gt;

&lt;p&gt;Based on analysis of production Kubernetes clusters:&lt;/p&gt;

&lt;p&gt;1. Deployments (60-70% of workloads)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most common workload type&lt;/li&gt;
&lt;li&gt;Well-understood and documented&lt;/li&gt;
&lt;li&gt;Suitable for majority of cloud-native applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;2. StatefulSets (15-20% of workloads)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Essential for data-tier applications&lt;/li&gt;
&lt;li&gt;Growing with cloud-native database adoption&lt;/li&gt;
&lt;li&gt;Require more operational expertise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3. DaemonSets (5-10% of workloads)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consistent across most clusters&lt;/li&gt;
&lt;li&gt;System-level services and infrastructure&lt;/li&gt;
&lt;li&gt;Often deployed by platform teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;4. Jobs/CronJobs (10-15% of workloads)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Critical for automation and batch processing&lt;/li&gt;
&lt;li&gt;Highly variable resource requirements&lt;/li&gt;
&lt;li&gt;Often overlooked in resource planning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;5. Custom Resources (2-5% of workloads)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Growing adoption with operator pattern&lt;/li&gt;
&lt;li&gt;Specialized use cases and complex applications&lt;/li&gt;
&lt;li&gt;Require significant Kubernetes expertise&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best Practices for Workload Selection
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Consider Your Application Characteristics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Data persistence requirements: StatefulSet vs Deployment&lt;/li&gt;
&lt;li&gt;Scaling patterns: Horizontal vs vertical scaling needs&lt;/li&gt;
&lt;li&gt;Update frequency: Rolling updates vs recreate strategies&lt;/li&gt;
&lt;li&gt;Resource requirements: Consistent vs variable resource needs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Think About Operational Complexity
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Monitoring and observability requirements&lt;/li&gt;
&lt;li&gt;Backup and disaster recovery needs&lt;/li&gt;
&lt;li&gt;Security and compliance considerations&lt;/li&gt;
&lt;li&gt;Team expertise and operational overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Plan for the Future
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Growth and scaling requirements&lt;/li&gt;
&lt;li&gt;Integration with other services&lt;/li&gt;
&lt;li&gt;Migration and portability needs&lt;/li&gt;
&lt;li&gt;Cost optimization opportunities&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Selecting the appropriate Kubernetes workload type is a critical architectural decision that impacts application performance, operational complexity, and resource efficiency. While Deployments handle the majority of use cases for stateless applications, understanding when to use StatefulSets, DaemonSets, Jobs, and CronJobs ensures you're building on the right foundation.&lt;/p&gt;

&lt;p&gt;The key is matching your application's characteristics—stateful vs stateless, batch vs long-running, system-level vs application-level—with the appropriate workload controller. This foundation becomes even more important when optimizing for cost and resource efficiency, which we'll explore in depth in our next post on Kubernetes overprovisioning patterns.&lt;/p&gt;

&lt;p&gt;Remember: choosing the right workload type upfront can save significant operational overhead and optimize costs down the line. Take time to understand your application's requirements and choose accordingly.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>cloudnative</category>
      <category>containers</category>
    </item>
    <item>
      <title>Kubernetes Workload Types: When to Use What</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Mon, 08 Sep 2025 07:00:00 +0000</pubDate>
      <link>https://dev.to/shohams/kubernetes-workload-types-when-to-use-what-292h</link>
      <guid>https://dev.to/shohams/kubernetes-workload-types-when-to-use-what-292h</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Choosing the right Kubernetes workload type is crucial to building efficient and scalable applications. Each workload controller is designed for a specific use case, and understanding these differences is vital for both optimal application performance and resource optimization. This guide examines all major Kubernetes workload types, when to use each one, and provides real-world examples to help you make informed architectural decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Workload Types
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Deployments
&lt;/h3&gt;

&lt;p&gt;Purpose: Manage stateless applications with rolling updates and replica management.&lt;/p&gt;

&lt;p&gt;When to Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web applications and APIs that don't store state locally&lt;/li&gt;
&lt;li&gt;Microservices without persistent data requirements&lt;/li&gt;
&lt;li&gt;Applications requiring high availability through multiple replicas&lt;/li&gt;
&lt;li&gt;Workloads needing frequent updates with zero downtime&lt;/li&gt;
&lt;li&gt;Services that can be easily replaced or restarted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frontend web servers (nginx, Apache, React/Angular apps)&lt;/li&gt;
&lt;li&gt;REST API services and GraphQL endpoints&lt;/li&gt;
&lt;li&gt;Load balancers and reverse proxies&lt;/li&gt;
&lt;li&gt;Stateless backend services (authentication, notification services)&lt;/li&gt;
&lt;li&gt;Content delivery and caching layers (Redis for sessions, not persistence)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key Characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pods are interchangeable and can be created/destroyed freely&lt;/li&gt;
&lt;li&gt;Rolling updates ensure zero-downtime deployments&lt;/li&gt;
&lt;li&gt;Horizontal scaling is straightforward&lt;/li&gt;
&lt;li&gt;No persistent storage is attached to individual pods&lt;/li&gt;
&lt;li&gt;Network identity is not important&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Configuration Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      containers:
        - name: api
          image: mycompany/web-api:v1.2.3
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  StatefulSets
&lt;/h3&gt;

&lt;p&gt;Purpose: Manage stateful applications requiring stable network identities and persistent storage.&lt;/p&gt;

&lt;p&gt;When to Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Databases requiring persistent storage and stable identities&lt;/li&gt;
&lt;li&gt;Applications with master-slave or leader-follower architectures&lt;/li&gt;
&lt;li&gt;Services requiring ordered deployment and scaling&lt;/li&gt;
&lt;li&gt;Applications that store data locally and need consistent network identities&lt;/li&gt;
&lt;li&gt;Clustered applications with peer discovery requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database clusters (PostgreSQL, MySQL, MongoDB)&lt;/li&gt;
&lt;li&gt;Message brokers (RabbitMQ, Apache Kafka)&lt;/li&gt;
&lt;li&gt;Distributed storage systems (Cassandra, Elasticsearch)&lt;/li&gt;
&lt;li&gt;Consensus-based systems (etcd, Consul, Zookeeper)&lt;/li&gt;
&lt;li&gt;Analytics platforms requiring data locality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key Characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pods have stable, unique network identities (pod-0, pod-1, pod-2)&lt;/li&gt;
&lt;li&gt;Persistent storage follows pods during rescheduling&lt;/li&gt;
&lt;li&gt;Ordered deployment and scaling (pod-0 before pod-1, etc.)&lt;/li&gt;
&lt;li&gt;Stable DNS names for service discovery&lt;/li&gt;
&lt;li&gt;Graceful termination and ordered updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Configuration Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-cluster
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:14
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: postgres-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  DaemonSets
&lt;/h3&gt;

&lt;p&gt;Purpose: Run exactly one pod per node for system-level services.&lt;/p&gt;

&lt;p&gt;When to Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node-level monitoring and logging&lt;/li&gt;
&lt;li&gt;Network plugins and system services&lt;/li&gt;
&lt;li&gt;Security agents and compliance tools&lt;/li&gt;
&lt;li&gt;Hardware management and device plugins&lt;/li&gt;
&lt;li&gt;Any service that needs to run on every node&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log collection agents (Fluentd, Filebeat, Logstash)&lt;/li&gt;
&lt;li&gt;Monitoring agents (Prometheus Node Exporter, Datadog agent)&lt;/li&gt;
&lt;li&gt;Network overlay components (Calico, Flannel)&lt;/li&gt;
&lt;li&gt;Security and compliance tools (Falco, Twistlock)&lt;/li&gt;
&lt;li&gt;Storage drivers and CSI plugins&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key Characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically schedules pods on new nodes&lt;/li&gt;
&lt;li&gt;Ensures exactly one pod per node (unless node selectors are used)&lt;/li&gt;
&lt;li&gt;Typically requires elevated privileges&lt;/li&gt;
&lt;li&gt;Often uses host networking and file system access&lt;/li&gt;
&lt;li&gt;Survives node reboots and maintenance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Configuration Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      name: log-collector
  template:
    metadata:
      labels:
        name: log-collector
    spec:
      containers:
        - name: fluentd
          image: fluentd:v1.14
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: containers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Jobs
&lt;/h3&gt;

&lt;p&gt;Purpose: Run batch workloads to completion with guaranteed execution.&lt;/p&gt;

&lt;p&gt;When to Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One-time data processing tasks&lt;/li&gt;
&lt;li&gt;Database migrations and schema updates&lt;/li&gt;
&lt;li&gt;Backup and restore operations&lt;/li&gt;
&lt;li&gt;Batch analytics and reporting&lt;/li&gt;
&lt;li&gt;Image or video processing pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ETL (Extract, Transform, Load) processes&lt;/li&gt;
&lt;li&gt;Database migrations and maintenance scripts&lt;/li&gt;
&lt;li&gt;Report generation and data exports&lt;/li&gt;
&lt;li&gt;Machine learning model training&lt;/li&gt;
&lt;li&gt;File processing and format conversion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key Characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs until successful completion&lt;/li&gt;
&lt;li&gt;Can run multiple pods for parallel processing&lt;/li&gt;
&lt;li&gt;Automatically retries failed pods (configurable)&lt;/li&gt;
&lt;li&gt;Cleans up completed pods based on retention policy&lt;/li&gt;
&lt;li&gt;Supports different completion modes (parallel, indexed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Configuration Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  parallelism: 4
  completions: 1
  backoffLimit: 3
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: migrator
          image: mycompany/data-migrator:v2.1.0
          env:
            - name: SOURCE_DB
              value: "postgresql://old-db:5432/data"
            - name: TARGET_DB
              value: "postgresql://new-db:5432/data"
          resources:
            requests:
              cpu: 1
              memory: 2Gi
            limits:
              cpu: 2
              memory: 4Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  CronJobs
&lt;/h3&gt;

&lt;p&gt;Purpose: Schedule recurring batch workloads.&lt;/p&gt;

&lt;p&gt;When to Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scheduled backups and maintenance&lt;/li&gt;
&lt;li&gt;Periodic data synchronization&lt;/li&gt;
&lt;li&gt;Regular cleanup and housekeeping tasks&lt;/li&gt;
&lt;li&gt;Time-based report generation&lt;/li&gt;
&lt;li&gt;Health checks and monitoring tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database backups and archiving&lt;/li&gt;
&lt;li&gt;Log rotation and cleanup&lt;/li&gt;
&lt;li&gt;Data synchronization between systems&lt;/li&gt;
&lt;li&gt;Periodic health checks and system maintenance&lt;/li&gt;
&lt;li&gt;Scheduled report generation and delivery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key Characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses cron syntax for scheduling&lt;/li&gt;
&lt;li&gt;Creates Jobs on schedule&lt;/li&gt;
&lt;li&gt;Configurable concurrency policies&lt;/li&gt;
&lt;li&gt;Can handle missed schedules&lt;/li&gt;
&lt;li&gt;Automatic cleanup of old jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Configuration Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: postgres:14
              command:
                - /bin/bash
                - -c
                - pg_dump $DATABASE_URL &amp;gt; /backup/$(date +%Y%m%d_%H%M).sql
              env:
                - name: DATABASE_URL
                  valueFrom:
                    secretKeyRef:
                      name: db-credentials
                      key: url
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced Workload Types
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ReplicaSets
&lt;/h3&gt;

&lt;p&gt;Purpose: Low-level replica management (typically managed by Deployments).&lt;/p&gt;

&lt;p&gt;ReplicaSets are rarely used directly in modern Kubernetes deployments. Deployments provide a higher-level abstraction that handles ReplicaSet management automatically, including rolling updates and rollback capabilities.&lt;/p&gt;

&lt;p&gt;When you might use ReplicaSets directly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building custom controllers&lt;/li&gt;
&lt;li&gt;Very specific scaling requirements not met by Deployments&lt;/li&gt;
&lt;li&gt;Legacy applications with unique update patterns&lt;/li&gt;
&lt;/ul&gt;
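&lt;p&gt;For reference, a minimal ReplicaSet manifest looks like the following; in practice, the same pod template is usually embedded in a Deployment, which then creates and manages the ReplicaSet for you:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: nginx:1.25
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;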

&lt;h3&gt;
  
  
  Custom Resources and Operators
&lt;/h3&gt;

&lt;p&gt;Purpose: Application-specific workload management through custom controllers.&lt;/p&gt;

&lt;p&gt;When to Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex applications requiring custom lifecycle management&lt;/li&gt;
&lt;li&gt;Multi-component applications with interdependencies&lt;/li&gt;
&lt;li&gt;Applications needing specialized scaling or update strategies&lt;/li&gt;
&lt;li&gt;When existing workload types don't fit your use case&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database operators (PostgreSQL Operator, MongoDB Operator)&lt;/li&gt;
&lt;li&gt;Application platforms (Istio, Knative)&lt;/li&gt;
&lt;li&gt;ML/AI workload managers (Kubeflow, Seldon)&lt;/li&gt;
&lt;li&gt;Backup and disaster recovery operators&lt;/li&gt;
&lt;/ul&gt;
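&lt;p&gt;Manifest shapes are operator-specific, but most follow the same pattern: a high-level custom resource that the operator reconciles into Deployments, StatefulSets, and Jobs. The sketch below is schematic only; the API group, kind, and field names are placeholders rather than any real operator's schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: example.com/v1   # placeholder API group
kind: PostgresCluster        # placeholder kind
metadata:
  name: analytics-db
spec:
  replicas: 3
  version: "14"
  storage:
    size: 100Gi
  backup:
    schedule: "0 3 * * *"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;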

</description>
      <category>kubernetes</category>
      <category>application</category>
      <category>performance</category>
    </item>
    <item>
      <title>Part 5: Tips for Optimizing GPU Utilization in Kubernetes</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Fri, 05 Sep 2025 14:03:00 +0000</pubDate>
      <link>https://dev.to/shohams/part-5-tips-for-optimizing-gpu-utilization-in-kubernetes-2of5</link>
      <guid>https://dev.to/shohams/part-5-tips-for-optimizing-gpu-utilization-in-kubernetes-2of5</guid>
<description>&lt;p&gt;&lt;em&gt;&lt;a href="https://www.linkedin.com/events/7370971876916883456/" rel="noopener noreferrer"&gt;Sign up for this free workshop hosted by NVIDIA and DevZero on October 23 to learn more about optimizing GPU utilization in Kubernetes.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tips for optimizing GPU utilization in Kubernetes
&lt;/h2&gt;

&lt;p&gt;Optimizing GPU utilization in Kubernetes requires a systematic approach that addresses monitoring, optimization, and governance simultaneously.&lt;/p&gt;

&lt;h3&gt;
  
  
  Assessment and Baseline Establishment
&lt;/h3&gt;

&lt;p&gt;Current state analysis should focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Measuring actual GPU utilization across different workload types&lt;/li&gt;
&lt;li&gt;Identifying the most underutilized resources and workloads&lt;/li&gt;
&lt;li&gt;Calculating current costs and waste patterns&lt;/li&gt;
&lt;li&gt;Understanding team usage patterns and requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Baseline metrics should include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average GPU utilization by workload type&lt;/li&gt;
&lt;li&gt;Cost per GPU-hour by team and project&lt;/li&gt;
&lt;li&gt;Frequency and duration of cold starts&lt;/li&gt;
&lt;li&gt;Resource sharing opportunities and constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following section provides a brief walkthrough of how overprovisioning and underutilization can be examined, and how automation can then be applied to maintain workload efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x5g035v9p9hhyxg2b9s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x5g035v9p9hhyxg2b9s.png" alt="CPU Usage for the last 30 days" width="800" height="129"&gt;&lt;/a&gt;After observing this workload's usage patterns (it runs as a Kubernetes Deployment), the team reduced its replica count.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd9io0wd8wgklrk6do4v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd9io0wd8wgklrk6do4v.png" alt="Memory Usage for the last 30 days" width="800" height="131"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The reduction in replica count led to a corresponding reduction in the memory this workload consumed.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3y5imhpi3d1dprlg1zy7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3y5imhpi3d1dprlg1zy7.png" alt="GPU Usage for the last 30 days" width="800" height="129"&gt;&lt;/a&gt;One of the containers in the pod hosted an inference service; the team scaled it down to validate whether it was still needed, then reintroduced it at a significantly reduced capacity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyylpbfzp45e9ijinuz9r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyylpbfzp45e9ijinuz9r.png" alt="vRAM Usage for the last 30 days" width="800" height="122"&gt;&lt;/a&gt;Relatedly, this was the observed GPU VRAM utilization for the container using a GPU device.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimization Prioritization
&lt;/h3&gt;

&lt;p&gt;High-impact optimization opportunities typically include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Research workflows with long idle periods&lt;/li&gt;
&lt;li&gt;Inference workloads with frequent cold starts&lt;/li&gt;
&lt;li&gt;Training workloads running on on-demand instances without checkpointing&lt;/li&gt;
&lt;li&gt;Underutilized dedicated GPU nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fceukhhrum994kswex5t8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fceukhhrum994kswex5t8.png" alt="Pod create, Ctr Started, Model Loading, Ready for inference requests" width="800" height="137"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Quick wins that provide immediate ROI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implementing basic monitoring and alerting&lt;/li&gt;
&lt;li&gt;Right-sizing obviously overprovisioned workloads&lt;/li&gt;
&lt;li&gt;Enabling spot instances for training workloads&lt;/li&gt;
&lt;li&gt;Consolidating underutilized resources&lt;/li&gt;
&lt;/ul&gt;
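&lt;p&gt;Of these, right-sizing is usually the fastest to apply: bring requests in line with observed usage rather than initial guesses. An illustrative before/after for a GPU workload's container spec (the numbers are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resources:
  requests:
    cpu: "500m"        # was 4 - observed peak CPU was well under 1 core
    memory: 2Gi        # was 16Gi - observed peak memory was ~1.5Gi
    nvidia.com/gpu: 1
  limits:
    nvidia.com/gpu: 1  # GPU requests and limits must match
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;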

&lt;h3&gt;
  
  
  Governance and Continuous Improvement
&lt;/h3&gt;

&lt;p&gt;Resource governance frameworks should include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Approval processes for GPU resource allocation&lt;/li&gt;
&lt;li&gt;Regular usage reviews and optimization assessments&lt;/li&gt;
&lt;li&gt;Cost allocation and chargeback mechanisms&lt;/li&gt;
&lt;li&gt;Training and best practices for development teams&lt;/li&gt;
&lt;/ul&gt;
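&lt;p&gt;Allocation policies can be enforced rather than just documented: Kubernetes ResourceQuota supports extended resources such as GPUs, so each team's namespace can carry a hard cap that reflects its approved allocation. For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ml-team-a   # hypothetical team namespace
spec:
  hard:
    requests.nvidia.com/gpu: "8"   # the team may request at most 8 GPUs in total
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;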

&lt;p&gt;Continuous improvement processes should focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regular monitoring and optimization reviews&lt;/li&gt;
&lt;li&gt;Technology adoption (checkpoint/restore, MIG, etc.)&lt;/li&gt;
&lt;li&gt;Workload pattern analysis and optimization&lt;/li&gt;
&lt;li&gt;Cost efficiency benchmarking and targets&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: The Path to GPU Efficiency
&lt;/h2&gt;

&lt;p&gt;GPU underutilization in Kubernetes represents one of the most expensive infrastructure optimization opportunities in modern cloud environments. Unlike CPU and memory optimization, which might save thousands monthly, GPU optimization typically saves tens or hundreds of thousands of dollars while improving application performance and reliability.&lt;/p&gt;

&lt;p&gt;The path to GPU efficiency requires understanding the unique characteristics of different ML workload types, implementing comprehensive monitoring beyond basic utilization metrics, and adopting workload-specific optimization strategies. Technologies like checkpoint/restore and CRIU-GPU are transforming the economics of GPU infrastructure by enabling more aggressive use of cost-effective compute options while maintaining reliability.&lt;/p&gt;

&lt;p&gt;Organizations that take a strategic approach to GPU optimization—focusing on workload-specific strategies, comprehensive monitoring, and systematic governance—typically achieve cost reductions of 40-70% while improving application performance and developer productivity. The key is treating GPU optimization as a strategic initiative rather than a tactical cost-cutting exercise.&lt;/p&gt;

&lt;p&gt;As AI/ML workloads continue to grow in importance and scale, GPU efficiency will become a critical competitive advantage. Organizations that master these optimization strategies today will be better positioned to scale their AI infrastructure cost-effectively tomorrow.&lt;/p&gt;


</description>
      <category>kubernetes</category>
      <category>gpu</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Part 4: GPU Security and Isolation</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Wed, 03 Sep 2025 17:02:00 +0000</pubDate>
      <link>https://dev.to/shohams/part-4-gpu-security-and-isolation-4bmd</link>
      <guid>https://dev.to/shohams/part-4-gpu-security-and-isolation-4bmd</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="https://www.linkedin.com/events/7370971876916883456/" rel="noopener noreferrer"&gt;Sign up for this free workshop hosted by NVIDIA and DevZero on October 23 to learn more about GPU utilization, security, and isolation.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  GPU Security and Isolation
&lt;/h2&gt;

&lt;p&gt;Effective GPU resource management provides significant security and isolation benefits beyond simple cost optimization. These benefits become increasingly important as organizations deploy GPU workloads across multiple teams and projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware-Level Isolation with MIG
&lt;/h3&gt;

&lt;p&gt;Multi-Instance GPU (MIG) technology provides hardware-level isolation, enabling secure multi-tenancy on expensive GPU hardware. MIG partitions create isolated GPU instances with dedicated memory and compute resources, thereby preventing workloads from interfering with each other.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zelk2ysxjd75qxif8zd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zelk2ysxjd75qxif8zd.png" alt="traditional allocation vs optimized sharing" width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIG partitioning strategies depend on workload requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Development and testing: Smaller MIG instances for multiple concurrent experiments&lt;/li&gt;
&lt;li&gt;Production inference: Larger MIG instances for performance-critical workloads&lt;/li&gt;
&lt;li&gt;Multi-tenant environments: Balanced partitioning for different teams or projects&lt;/li&gt;
&lt;/ul&gt;
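&lt;p&gt;When MIG is enabled through the NVIDIA GPU Operator, workloads request a specific partition as an extended resource. A sketch of a pod requesting a small A100 slice (the available profile names depend on your GPU model and configured MIG strategy, and the image is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: experiment-notebook
spec:
  containers:
    - name: notebook
      image: my-notebook:latest        # placeholder image
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1     # one 1g.5gb MIG slice of an A100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;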

&lt;h3&gt;
  
  
  Multi-Tenancy Patterns
&lt;/h3&gt;

&lt;p&gt;Different organizational contexts require different multi-tenancy approaches:&lt;/p&gt;

&lt;p&gt;Department-level isolation: When multiple departments share GPU infrastructure, hardware-level isolation through MIG or dedicated nodes may be necessary to prevent resource conflicts and ensure security boundaries.&lt;/p&gt;

&lt;p&gt;Team-level sharing: Within engineering organizations, memory-based sharing may be acceptable when teams work on related projects with compatible security requirements.&lt;/p&gt;

&lt;p&gt;Project-level optimization: Short-term projects may benefit from time-multiplexed sharing that maximizes utilization while maintaining project isolation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Considerations
&lt;/h3&gt;

&lt;p&gt;GPU workloads often process sensitive data or proprietary models that require additional security measures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model protection: Preventing unauthorized access to trained models&lt;/li&gt;
&lt;li&gt;Data isolation: Ensuring training data doesn't leak between workloads&lt;/li&gt;
&lt;li&gt;Access controls: Managing who can deploy and access GPU resources&lt;/li&gt;
&lt;li&gt;Audit trails: Tracking GPU usage for compliance and security monitoring&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>gpu</category>
      <category>security</category>
      <category>microvm</category>
    </item>
    <item>
      <title>Part 3: How to Fix Your GPU Utilization</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Tue, 02 Sep 2025 15:07:00 +0000</pubDate>
      <link>https://dev.to/shohams/part-3-how-to-fix-your-gpu-utilization-52d4</link>
      <guid>https://dev.to/shohams/part-3-how-to-fix-your-gpu-utilization-52d4</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="https://www.linkedin.com/events/7370971876916883456/" rel="noopener noreferrer"&gt;Sign up for this free workshop hosted by NVIDIA and DevZero on October 23 to learn more about improving GPU utilization.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to fix your GPU utilization
&lt;/h2&gt;

&lt;p&gt;Different ML workload types require fundamentally different optimization approaches. A strategy that works well for training workloads may be counterproductive for real-time inference, and vice versa.&lt;/p&gt;

&lt;h3&gt;
  
  
  Training Workload Optimization
&lt;/h3&gt;

&lt;p&gt;Training workloads benefit from checkpoint/restore strategies that enable more aggressive use of cost-effective compute options. By implementing robust checkpointing, organizations can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use spot instances for training workloads, reducing costs by 60-80%&lt;/li&gt;
&lt;li&gt;Implement automatic job migration during node maintenance&lt;/li&gt;
&lt;li&gt;Enable faster recovery from hardware failures&lt;/li&gt;
&lt;li&gt;Support more efficient cluster scheduling through workload mobility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Node selection strategies for training workloads should prioritize cost-effectiveness over availability. Training can tolerate interruptions with proper checkpointing, making spot instances and preemptible nodes attractive options.&lt;/p&gt;
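&lt;p&gt;Putting those pieces together, a checkpointed training Job can target spot capacity explicitly. The sketch below uses GKE's spot label and taint; other providers use different keys, and the image and checkpoint layout are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: batch/v1
kind: Job
metadata:
  name: train-model
spec:
  backoffLimit: 10            # retries resume from the latest checkpoint
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        cloud.google.com/gke-spot: "true"    # provider-specific spot label
      tolerations:
        - key: cloud.google.com/gke-spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      containers:
        - name: trainer
          image: my-trainer:latest           # placeholder image
          args: ["--resume-from", "/ckpt/latest"]
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: ckpt
              mountPath: /ckpt
      volumes:
        - name: ckpt
          persistentVolumeClaim:
            claimName: training-checkpoints  # durable checkpoint storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;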

&lt;h3&gt;
  
  
  Real-Time Inference Optimization
&lt;/h3&gt;

&lt;p&gt;Inference workloads require right-sizing strategies that balance resource efficiency with performance requirements. Key optimization principles include:&lt;/p&gt;

&lt;p&gt;Memory-based right-sizing: Match GPU memory capacity to model requirements rather than defaulting to the largest available instances. An 80GB model doesn't require a 141GB GPU unless you plan to utilize specific optimization techniques or anticipate future model growth.&lt;/p&gt;

&lt;p&gt;Replica optimization: Determine the optimal number of inference replicas based on request patterns, cold start costs, and resource utilization. More replicas reduce individual utilization but may improve overall efficiency by minimizing the number of cold starts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4n78dzdrbm1wpupf87hz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4n78dzdrbm1wpupf87hz.png" alt="GPU Usage and vRAM Usage for last 7 days" width="800" height="245"&gt;&lt;/a&gt;While still not fully optimized, horizontal autoscaling keeps this workload from overprovisioning for its sparse peaks.&lt;/p&gt;

&lt;p&gt;Resource sharing for compatible workloads: When multiple inference workloads have complementary usage patterns, GPU resources can be shared effectively. Two inference services, each requiring 60GB of GPU memory but with sparse actual utilization, can potentially share a single H100 with 141GB of memory.&lt;/p&gt;
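&lt;p&gt;GPU memory is not itself a schedulable Kubernetes resource, so memory-based right-sizing is typically expressed by steering pods onto an appropriately sized GPU model. With NVIDIA GPU feature discovery installed, nodes carry product labels that a nodeSelector can target (the exact label value depends on your hardware, and the image is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec:
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A10   # 24GB card instead of a larger default pool
  containers:
    - name: inference
      image: my-inference:latest         # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;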

&lt;h3&gt;
  
  
  Advanced Resource Sharing Strategies
&lt;/h3&gt;

&lt;p&gt;Modern GPU architectures enable sophisticated resource-sharing strategies that can dramatically improve utilization:&lt;/p&gt;

&lt;p&gt;Multi-Instance GPU (MIG) technology allows hardware-level partitioning of NVIDIA A100 and H100 GPUs into smaller instances. This enables multiple workloads to share a single physical GPU with hardware-level isolation, improving utilization while maintaining security boundaries. More about &lt;a href="https://www.devzero.io/blog/gpu-multi-tenancy" rel="noopener noreferrer"&gt;MIG and GPU multi-tenancy here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Time-multiplexed sharing works well for workloads with different usage patterns. A training workload that runs overnight can share GPU resources with inference workloads that peak during business hours.&lt;/p&gt;

&lt;p&gt;Memory-based sharing enables multiple workloads to coexist on the same GPU when their combined memory requirements fit within available GPU memory and their compute usage patterns don't conflict.&lt;/p&gt;
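&lt;p&gt;Time-multiplexed sharing can be configured through the NVIDIA device plugin's time-slicing support, which advertises each physical GPU as several schedulable replicas. Note that time-slicing provides no memory isolation, so it suits trusted, complementary workloads rather than hard multi-tenancy. An illustrative device plugin configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4   # each physical GPU is advertised as 4 slices
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;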

&lt;h2&gt;
  
  
  The Hidden Costs: Ancillary Workload Optimization
&lt;/h2&gt;

&lt;p&gt;GPU workloads rarely operate in isolation. They depend on CPU-intensive preprocessing, network data transfer, and various supporting services that can create bottlenecks and reduce overall GPU utilization efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkuvlj287b4ipbtn8yxvh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkuvlj287b4ipbtn8yxvh.png" alt="Total cost for the last hour after optimization automation" width="800" height="224"&gt;&lt;/a&gt;Impact of optimization automation on cost&lt;/p&gt;

&lt;h3&gt;
  
  
  CPU Preprocessing Bottlenecks
&lt;/h3&gt;

&lt;p&gt;Many ML workloads include significant CPU-intensive preprocessing steps that can starve GPU resources. Data loading, image preprocessing, and feature engineering tasks often run on CPU cores while GPUs wait for processed data.&lt;/p&gt;

&lt;p&gt;Strategic CPU allocation for GPU workloads involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Right-sizing CPU resources to match GPU processing capacity&lt;/li&gt;
&lt;li&gt;Implementing preprocessing pipelines that minimize GPU idle time&lt;/li&gt;
&lt;li&gt;Using CPU-optimized preprocessing libraries that maximize throughput&lt;/li&gt;
&lt;li&gt;Considering preprocessing acceleration through specialized hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Network and Storage Considerations
&lt;/h3&gt;

&lt;p&gt;GPU workloads often involve substantial data movement that can impact utilization efficiency. Model loading, dataset transfer, and result output can create I/O bottlenecks that reduce GPU efficiency.&lt;/p&gt;

&lt;p&gt;Network optimization strategies include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Selecting nodes with appropriate network interface capabilities&lt;/li&gt;
&lt;li&gt;Implementing efficient data pipeline architectures&lt;/li&gt;
&lt;li&gt;Using content delivery networks for model and dataset distribution&lt;/li&gt;
&lt;li&gt;Optimizing data formats and compression for faster transfer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Storage optimization involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using high-performance storage for model and dataset access&lt;/li&gt;
&lt;li&gt;Implementing caching strategies that reduce repeated data loading&lt;/li&gt;
&lt;li&gt;Considering local storage for frequently accessed models&lt;/li&gt;
&lt;li&gt;Optimizing model serialization formats for faster loading&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sidecar Container Optimization
&lt;/h3&gt;

&lt;p&gt;GPU workloads often include supporting containers that handle API endpoints, networking, monitoring, and other auxiliary functions. These sidecar containers can consume significant CPU and memory resources if not properly optimized.&lt;/p&gt;

&lt;p&gt;Common sidecar patterns include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI containers serving inference endpoints&lt;/li&gt;
&lt;li&gt;Istio service mesh components for networking and security&lt;/li&gt;
&lt;li&gt;Monitoring and logging agents for observability&lt;/li&gt;
&lt;li&gt;Authentication and authorization services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sidecar optimization strategies focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Right-sizing sidecar resources based on actual usage patterns&lt;/li&gt;
&lt;li&gt;Consolidating multiple sidecar functions where possible&lt;/li&gt;
&lt;li&gt;Using lightweight alternatives for non-critical functionality&lt;/li&gt;
&lt;li&gt;Implementing resource sharing between primary and sidecar containers&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>gpu</category>
      <category>machinelearning</category>
      <category>nvidia</category>
    </item>
    <item>
      <title>Part 2: How to Measure Your GPU Utilization</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Fri, 29 Aug 2025 14:55:00 +0000</pubDate>
      <link>https://dev.to/shohams/part-2-how-to-measure-your-gpu-utilization-nfd</link>
      <guid>https://dev.to/shohams/part-2-how-to-measure-your-gpu-utilization-nfd</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="https://www.linkedin.com/events/7370971876916883456/" rel="noopener noreferrer"&gt;Sign up for this free workshop hosted by NVIDIA and DevZero on October 23 to learn more about measuring GPU utilization.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to measure your GPU utilization
&lt;/h2&gt;

&lt;p&gt;Traditional GPU monitoring approaches, such as nvidia-smi, provide point-in-time utilization snapshots but fail to capture the strategic insights needed for optimization. Effective GPU utilization monitoring requires a multidimensional approach that integrates with Kubernetes orchestration and provides workload-specific insights.&lt;/p&gt;

&lt;h3&gt;
  
  
  DCGM Integration with Kubernetes
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://developer.nvidia.com/dcgm" rel="noopener noreferrer"&gt;NVIDIA Data Center GPU Manager (DCGM)&lt;/a&gt; provides the foundation for comprehensive GPU monitoring in Kubernetes environments. When integrated with &lt;a href="https://github.com/google/cadvisor" rel="noopener noreferrer"&gt;cAdvisor&lt;/a&gt; and Kubernetes metrics, DCGM enables cluster-wide visibility into GPU utilization patterns across different workload types.&lt;/p&gt;

&lt;p&gt;The NVIDIA GPU Operator simplifies DCGM deployment and management in Kubernetes clusters, providing automated installation and configuration of GPU monitoring components. This operator-based approach ensures consistent monitoring across nodes while integrating with existing Kubernetes observability infrastructure.&lt;/p&gt;

&lt;p&gt;Key metrics for strategic GPU monitoring include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPU utilization percentage: Actual compute utilization vs. allocated capacity&lt;/li&gt;
&lt;li&gt;Memory utilization: GPU memory usage vs. available GPU memory&lt;/li&gt;
&lt;li&gt;Tensor throughput: The rate of useful computational work being performed&lt;/li&gt;
&lt;li&gt;Request-level tracking: Whether GPUs are receiving active inference requests or sitting idle&lt;/li&gt;
&lt;/ul&gt;
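&lt;p&gt;When dcgm-exporter ships these metrics into Prometheus, the first two dimensions map onto concrete series such as DCGM_FI_DEV_GPU_UTIL and DCGM_FI_DEV_FB_USED. A pair of illustrative recording rules (metric availability depends on your exporter configuration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;groups:
  - name: gpu-utilization
    rules:
      - record: gpu:compute_util:avg_1h   # smoothed compute utilization
        expr: avg_over_time(DCGM_FI_DEV_GPU_UTIL[1h])
      - record: gpu:memory_used:ratio     # fraction of GPU memory in use
        expr: DCGM_FI_DEV_FB_USED / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;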

&lt;h3&gt;
  
  
  Multi-Dimensional Utilization Analysis
&lt;/h3&gt;

&lt;p&gt;Effective GPU optimization requires understanding the relationship between different utilization dimensions. A GPU might show 90% memory utilization while achieving only 30% compute utilization, indicating potential for resource sharing or workload optimization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajge6revg9wfcs3ivq68.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajge6revg9wfcs3ivq68.png" alt="GPU Usage and vRAM Usage for Last 7 days" width="800" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While loading a model into GPU memory makes it consume VRAM, investigating the GPU utilization shows that the workload never receives requests; workloads like these can safely be scaled down to one or two replicas (each replica using one GPU device).&lt;/p&gt;

&lt;p&gt;Memory vs. Compute Utilization Patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High memory, low compute: Large models with infrequent inference requests&lt;/li&gt;
&lt;li&gt;High compute, low memory: Small models with high request throughput&lt;/li&gt;
&lt;li&gt;Low memory, low compute: Idle or poorly optimized workloads&lt;/li&gt;
&lt;li&gt;High memory, high compute: Well-optimized workloads operating at capacity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4oc1duik5f2l3tuhifx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4oc1duik5f2l3tuhifx.png" alt="Compute utilization vs memory utilization" width="800" height="787"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This multi-dimensional analysis enables strategic decisions about workload placement, resource sharing opportunities, and optimization priorities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cluster-Wide Visibility and Trends
&lt;/h3&gt;

&lt;p&gt;Strategic GPU monitoring must extend beyond individual workloads to provide cluster-wide insights into utilization patterns, trends, and optimization opportunities. This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Utilization distribution: Which workloads and teams are driving GPU consumption&lt;/li&gt;
&lt;li&gt;Temporal patterns: Peak usage times and idle periods that enable better scheduling&lt;/li&gt;
&lt;li&gt;Cost attribution: Mapping GPU usage to specific teams, projects, or cost centers&lt;/li&gt;
&lt;li&gt;Optimization opportunities: Identifying underutilized resources and sharing possibilities&lt;/li&gt;
&lt;/ul&gt;
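&lt;p&gt;Cost attribution follows naturally from the same metrics: dcgm-exporter attaches pod and namespace labels to its series (exact label names depend on your scrape configuration), so per-team aggregation reduces to a recording rule along these lines:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- record: namespace:gpu_compute_util:avg   # average GPU utilization per namespace
  expr: avg by (namespace) (DCGM_FI_DEV_GPU_UTIL)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;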

</description>
      <category>gpu</category>
      <category>kubernetes</category>
      <category>nvidia</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Part 1: Why Your Million-Dollar GPU Cluster is 80% Idle and how to fix it</title>
      <dc:creator>Shani Shoham</dc:creator>
      <pubDate>Thu, 28 Aug 2025 16:06:00 +0000</pubDate>
      <link>https://dev.to/shohams/part-1-why-your-million-dollar-gpu-cluster-is-80-idle-and-how-to-fix-it-ij0</link>
      <guid>https://dev.to/shohams/part-1-why-your-million-dollar-gpu-cluster-is-80-idle-and-how-to-fix-it-ij0</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="https://www.linkedin.com/events/7370971876916883456/" rel="noopener noreferrer"&gt;Sign up for this free workshop hosted by NVIDIA and DevZero on October 23 to learn more about GPU utilization and how to fix it.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is your GPU cluster idle
&lt;/h2&gt;

&lt;p&gt;While organizations obsess over CPU and memory optimization in their Kubernetes clusters, a far more expensive problem is quietly destroying budgets: GPU underutilization. The average GPU-enabled Kubernetes cluster runs at 15-25% utilization, but unlike CPU overprovisioning, which can waste thousands of dollars per month, GPU underutilization can burn through tens or hundreds of thousands.&lt;/p&gt;

&lt;p&gt;Consider this: a single NVIDIA H100 instance costs $30-50 per hour across major cloud providers. At those rates, one GPU running at 20% utilization leaves roughly $200,000 per year of idle capacity on the table, and a 20-GPU cluster multiplies that waste twentyfold. Yet most organizations lack the monitoring, processes, and architectural strategies to address this systematic waste.&lt;/p&gt;

&lt;p&gt;The challenge isn't just about resource efficiency—it's about the fundamental economics of AI/ML infrastructure. GPU resources are 10-50x more expensive than traditional compute, making optimization not just beneficial but business-critical. This post examines how various ML workload types lead to overprovisioning, strategies for monitoring actual GPU utilization, and architectural approaches that can significantly enhance ROI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding GPU Workload Patterns: The Foundation of Optimization
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmjzyxwopqk1mosi934w7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmjzyxwopqk1mosi934w7.png" alt="current cost vs optimized cost for cloud compute over 1 month period" width="800" height="118"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Since cloud compute is billed by the hour (vCPU cores/hr, GB of RAM/hr, GPU/hr, and so on), optimizing an overprovisioned workload can have a massive impact on the monthly cloud invoice.&lt;/p&gt;




&lt;p&gt;GPU utilization challenges stem from the diverse and often unpredictable nature of machine learning workloads. Unlike traditional applications with relatively consistent resource patterns, ML workloads exhibit dramatically different utilization characteristics that require workload-specific optimization strategies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmp5tv1er234qovz6ca5t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmp5tv1er234qovz6ca5t.png" alt="Example of idle time and utilization time of a Kubernetes GPU cluster" width="800" height="665"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Training Workloads: The Interruption Cost Problem
&lt;/h3&gt;

&lt;p&gt;Training workloads represent the most resource-intensive and potentially wasteful category of GPU usage. These workloads typically run for hours or days, consuming substantial GPU memory and compute resources. However, they're particularly vulnerable to interruption costs that can multiply resource waste.&lt;/p&gt;

&lt;p&gt;When a training job is interrupted without proper checkpointing, the entire computational investment is lost. A 12-hour training run that gets interrupted at hour 10 without checkpoints requires restarting from scratch, effectively wasting 10 hours of expensive GPU time. This creates a perverse incentive for teams to overprovision resources to minimize interruption risk, leading to systematic underutilization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.devzero.io/blog/checkpoint-restore-with-criu" rel="noopener noreferrer"&gt;Checkpoint/Restore&lt;/a&gt; technology fundamentally changes this equation. By capturing the complete state of training processes—including GPU memory, model weights, and optimizer states—checkpointing enables training workloads to resume from interruption points rather than having to restart. This resilience allows organizations to utilize more cost-effective, interruption-prone instances (such as spot instances) while maintaining training reliability.&lt;/p&gt;
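&lt;p&gt;The checkpointing idea can be sketched at the application level, before reaching for process-level tools like CRIU: save model and step state periodically, and on restart resume from the latest checkpoint instead of step zero. The sketch below is framework-agnostic and uses only the standard library (a real training loop would use something like torch.save/torch.load); the file path and the "gradient update" are stand-ins:&lt;/p&gt;

```python
import os
import pickle
import tempfile

CKPT = os.path.join(tempfile.gettempdir(), "train_ckpt.pkl")  # hypothetical path
if os.path.exists(CKPT):
    os.remove(CKPT)  # start the demo from a clean slate

def save_checkpoint(step, weights, path=CKPT):
    """Persist training state atomically so a crash mid-write can't corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "weights": weights}, f)
    os.replace(tmp, path)  # atomic rename

def load_checkpoint(path=CKPT):
    """Return (step, weights); (0, fresh weights) when no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0, {"w": 0.0}
    with open(path, "rb") as f:
        state = pickle.load(f)
    return state["step"], state["weights"]

def train(total_steps):
    """Run (or resume) training up to total_steps, checkpointing every step."""
    step, weights = load_checkpoint()
    while step < total_steps:
        weights["w"] += 0.1          # stand-in for a real gradient update
        step += 1
        save_checkpoint(step, weights)
    return step, weights

train(3)                  # first run gets "interrupted" after step 3
step, weights = train(5)  # restart resumes from step 3, not from scratch
```

&lt;p&gt;The atomic-rename detail matters in practice: a job killed mid-save must not leave a half-written checkpoint as its only recovery point.&lt;/p&gt;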

&lt;p&gt;&lt;a href="https://www.devzero.io/blog/gpu-container-checkpoint-restore" rel="noopener noreferrer"&gt;CRIU-GPU&lt;/a&gt;, an emerging technology that extends checkpoint/restore capabilities to GPU-accelerated workloads, represents a significant advancement in training efficiency. By capturing GPU state alongside CPU state, CRIU-GPU enables seamless migration of training workloads between nodes, more aggressive use of spot instances, and faster recovery from failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-Time Inference: The Cold Start Challenge
&lt;/h3&gt;

&lt;p&gt;Real-time inference workloads, typically deployed as Kubernetes Deployments, face different optimization challenges centered around responsiveness and resource efficiency. These workloads must maintain low latency while efficiently utilizing expensive GPU resources.&lt;/p&gt;

&lt;p&gt;The primary efficiency killer in inference workloads is the cold start problem. When inference pods restart or scale up, they must reload large models into GPU memory. This process can take 30 seconds to several minutes for large language models or computer vision models. During this loading period, the GPU is partially utilized while the system prepares for inference requests.&lt;/p&gt;

&lt;p&gt;Consider a scenario where you're running an 80GB language model on an H100 with 141GB of GPU memory. While the model fits comfortably in memory, the loading process creates a significant gap in utilization. If pods restart frequently due to deployment updates or node maintenance, these cold starts accumulate substantial waste.&lt;/p&gt;
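&lt;p&gt;A back-of-the-envelope calculation shows why cold starts land in that 30-seconds-to-minutes range: load time is bounded below by model size divided by transfer bandwidth. The bandwidth figures below are rough assumptions for typical storage tiers:&lt;/p&gt;

```python
def load_seconds(model_gb: float, bandwidth_gb_per_s: float) -> float:
    """Lower bound on cold-start load time: model size / transfer bandwidth."""
    return model_gb / bandwidth_gb_per_s

MODEL_GB = 80.0  # the 80 GB model from the example above

# Assumed, illustrative bandwidths depending on where the weights live:
print(f"local NVMe   (~2.5 GB/s): {load_seconds(MODEL_GB, 2.5):.0f}s")
print(f"10Gb network (~1.2 GB/s): {load_seconds(MODEL_GB, 1.2):.0f}s")
print(f"object store (~0.5 GB/s): {load_seconds(MODEL_GB, 0.5):.0f}s")
```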

&lt;p&gt;Strategic right-sizing becomes critical for inference workloads. Rather than defaulting to the largest available GPU instance, teams should match GPU memory requirements to model sizes while considering replica strategies that minimize cold start frequency.&lt;/p&gt;
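&lt;p&gt;A rough right-sizing heuristic can be sketched numerically. The 1.2x overhead factor is a simplifying assumption; real serving memory also depends on KV cache, batch size, and framework overhead:&lt;/p&gt;

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

def inference_memory_gb(params_billions: float, dtype: str,
                        overhead: float = 1.2) -> float:
    """Approximate GPU memory (GB) to serve a model: weights * overhead factor."""
    weights_gb = params_billions * BYTES_PER_PARAM[dtype]
    return weights_gb * overhead

H100_GB = 141  # HBM capacity of a single H100 (as cited above)

for dtype in ("fp16", "int8", "int4"):
    need = inference_memory_gb(70, dtype)
    verdict = "fits" if need <= H100_GB else "does not fit"
    print(f"70B @ {dtype}: ~{need:.0f} GB -> {verdict} on a {H100_GB} GB H100")
```

&lt;p&gt;By this estimate a 70B-parameter model in fp16 needs a second GPU, while the int8-quantized version serves comfortably on one, which is exactly the kind of decision right-sizing is meant to surface.&lt;/p&gt;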

&lt;p&gt;CRIUgpu (and GPU checkpointing more generally) can serialize the contents already loaded in GPU memory, so a restarted pod restores that state directly instead of re-downloading and reloading the weights. This makes checkpointing a critical tool for reducing cold start times on pod restart.&lt;/p&gt;

&lt;h3&gt;
  
  
  Batch Inference: Throughput vs. Utilization Trade-offs
&lt;/h3&gt;

&lt;p&gt;Batch inference workloads process large volumes of data asynchronously, typically using Kubernetes Jobs or CronJobs. These workloads offer the most significant opportunity for optimization because they can tolerate higher latency in exchange for better resource efficiency.&lt;/p&gt;

&lt;p&gt;The key optimization principle for batch inference is utilization density—maximizing the amount of useful work performed per GPU-hour. This often involves batching strategies that fully utilize GPU memory and compute capabilities, even if individual request latency increases.&lt;/p&gt;
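&lt;p&gt;The batching idea reduces to: accumulate requests until the batch is full, then issue one large GPU call instead of many small ones. A minimal, framework-agnostic sketch in plain Python; in practice the batch size would be tuned to GPU memory and latency budget:&lt;/p&gt;

```python
from typing import Iterable, Iterator, List

def batches(items: Iterable, batch_size: int) -> Iterator[List]:
    """Group a stream of inference requests into fixed-size batches.

    Larger batches raise per-request latency but improve GPU utilization
    density (useful work per GPU-hour)."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:            # flush the final partial batch
        yield batch

# 10 requests with batch_size=4 -> three GPU calls instead of ten
calls = list(batches(range(10), 4))
print(calls)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```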

&lt;h3&gt;
  
  
  Research Workflows: The Utilization Killers
&lt;/h3&gt;

&lt;p&gt;Research and experimentation workflows represent the most challenging category for GPU utilization optimization. These workloads, often running in Jupyter notebooks or interactive development environments, exhibit highly irregular usage patterns with long idle periods.&lt;/p&gt;

&lt;p&gt;A typical research workflow might involve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loading a large dataset into GPU memory&lt;/li&gt;
&lt;li&gt;Running short experiments with high GPU utilization&lt;/li&gt;
&lt;li&gt;Long periods of analysis and code modification with zero GPU usage&lt;/li&gt;
&lt;li&gt;Abandoned experiments that continue consuming resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Research workflows often receive priority access to GPU resources due to their exploratory nature; however, this priority frequently results in poor utilization. A data scientist might reserve an H100 instance for a week-long research project but only actively use the GPU for 10-15% of that time.&lt;/p&gt;
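&lt;p&gt;Detecting this idle pattern is straightforward once you have periodic utilization samples (e.g. scraped from DCGM or nvidia-smi). A minimal sketch, assuming one sample per interval and an illustrative 10% busy threshold; the sample values below are hypothetical:&lt;/p&gt;

```python
def idle_fraction(samples, busy_threshold=10):
    """Fraction of samples where GPU utilization (%) was below the threshold."""
    idle = sum(1 for u in samples if u < busy_threshold)
    return idle / len(samples)

# A notebook session: a burst of experiments, then a long stretch of
# analysis and code editing with the GPU sitting idle.
session = [95, 90, 85, 0, 0, 0, 0, 0, 0, 0]   # hypothetical per-minute samples
print(f"idle {idle_fraction(session):.0%} of the session")  # idle 70% of the session
```

&lt;p&gt;Feeding a signal like this into alerting (or automated reclamation of notebooks idle past some cutoff) is a low-effort way to recover the 85-90% of reserved time that research workloads often leave on the table.&lt;/p&gt;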

</description>
      <category>gpu</category>
      <category>kubernetes</category>
      <category>nvidia</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
