<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anuj Tyagi</title>
    <description>The latest articles on DEV Community by Anuj Tyagi (@sudo_anuj).</description>
    <link>https://dev.to/sudo_anuj</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F549060%2F1a5bb9b8-7bdd-499c-9b95-b664d65ffb26.jpg</url>
      <title>DEV Community: Anuj Tyagi</title>
      <link>https://dev.to/sudo_anuj</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sudo_anuj"/>
    <language>en</language>
    <item>
      <title>Canary Deployments with Flagger</title>
      <dc:creator>Anuj Tyagi</dc:creator>
      <pubDate>Tue, 01 Jul 2025 03:59:04 +0000</pubDate>
      <link>https://dev.to/sudo_anuj/canary-deployments-with-flagger-ag3</link>
      <guid>https://dev.to/sudo_anuj/canary-deployments-with-flagger-ag3</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the fast-paced world of software deployment, the ability to release new features safely and efficiently can make or break your application's reliability. Canary deployments have emerged as a critical strategy for minimizing risk while maintaining continuous delivery. In this comprehensive guide, we'll explore how to implement robust canary deployments using Flagger, a progressive delivery operator for Kubernetes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Canary Deployment?
&lt;/h2&gt;

&lt;p&gt;Canary deployment is a technique for rolling out new features or changes to a small subset of users before releasing the update to the entire system. Named after the "canary in a coal mine" practice, this approach allows you to detect issues early and rollback quickly if problems arise.&lt;/p&gt;

&lt;p&gt;Instead of replacing your entire application at once, canary deployments gradually shift traffic from the stable version (primary) to the new version (canary), monitoring key metrics throughout the process. If the metrics indicate problems, the deployment automatically rolls back to the stable version.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Choose Flagger?
&lt;/h2&gt;

&lt;p&gt;Flagger is a progressive delivery operator that automates the promotion or rollback of canary deployments based on metrics analysis. Here's why it stands out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automated Traffic Management&lt;/strong&gt;: Gradually shifts traffic between versions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics-Driven Decisions&lt;/strong&gt;: Uses Prometheus metrics to determine deployment success&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple Ingress Support&lt;/strong&gt;: Works with NGINX, Istio, Linkerd, and more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Webhook Integration&lt;/strong&gt;: Supports custom testing and validation hooks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HPA Integration&lt;/strong&gt;: Seamlessly works with Horizontal Pod Autoscaler&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites and Setup
&lt;/h2&gt;

&lt;p&gt;As noted above, Flagger supports several traffic-management integrations; this guide uses the NGINX ingress controller for routing and Prometheus for metrics.&lt;/p&gt;

&lt;h3&gt;
  
  
  Required Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NGINX Ingress Controller&lt;/strong&gt; (v1.0.2 or newer)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horizontal Pod Autoscaler&lt;/strong&gt; (HPA) enabled&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus&lt;/strong&gt; for metrics collection and analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flagger&lt;/strong&gt; deployed in your cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Verification Commands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check NGINX ingress controller&lt;/span&gt;
kubectl get service &lt;span class="nt"&gt;--all-namespaces&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;nginx

&lt;span class="c"&gt;# Verify HPA is enabled&lt;/span&gt;
kubectl get hpa &lt;span class="nt"&gt;--all-namespaces&lt;/span&gt;

&lt;span class="c"&gt;# Confirm Flagger installation&lt;/span&gt;
kubectl get all &lt;span class="nt"&gt;-n&lt;/span&gt; flagger
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 1: Installing Flagger
&lt;/h2&gt;

&lt;p&gt;Flagger can be deployed using Helm or ArgoCD. Once installed, it creates several Custom Resource Definitions (CRDs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get crds | &lt;span class="nb"&gt;grep &lt;/span&gt;flagger
&lt;span class="c"&gt;# Expected output:&lt;/span&gt;
&lt;span class="c"&gt;# alertproviders.flagger.app&lt;/span&gt;
&lt;span class="c"&gt;# canaries.flagger.app  &lt;/span&gt;
&lt;span class="c"&gt;# metrictemplates.flagger.app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
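If you go the Helm route, a typical invocation for the NGINX-plus-Prometheus setup used in this guide looks roughly like the following. Treat the chart values as a sketch: flag names such as meshProvider and metricsServer reflect the Flagger chart at the time of writing and may differ between releases, and the Prometheus address must match your cluster.

```shell
# Add the Flagger chart repository and install the operator into its own namespace
helm repo add flagger https://flagger.app
helm repo update

# meshProvider=nginx tells Flagger to drive the NGINX ingress controller;
# metricsServer points Flagger at the Prometheus instance it should query
helm upgrade -i flagger flagger/flagger \
  --namespace flagger-system \
  --create-namespace \
  --set meshProvider=nginx \
  --set metricsServer=http://prometheus:9090
```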



&lt;h2&gt;
  
  
  Step 2: Understanding Flagger's Architecture
&lt;/h2&gt;

&lt;p&gt;When you deploy a canary with Flagger, it automatically creates and manages several Kubernetes objects:&lt;/p&gt;

&lt;h3&gt;
  
  
  Original Objects (You Provide)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;deployment.apps/your-app&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;horizontalpodautoscaler.autoscaling/your-app&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ingresses.extensions/your-app&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;canary.flagger.app/your-app&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Generated Objects (Flagger Creates)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;deployment.apps/your-app-primary&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;horizontalpodautoscaler.autoscaling/your-app-primary&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;service/your-app&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;service/your-app-canary&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;service/your-app-primary&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ingresses.extensions/your-app-canary&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 3: Creating Your First Canary Configuration
&lt;/h2&gt;

&lt;p&gt;Here's a comprehensive canary configuration example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flagger.app/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Canary&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;

  &lt;span class="c1"&gt;# Reference to your deployment&lt;/span&gt;
  &lt;span class="na"&gt;targetRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;

  &lt;span class="c1"&gt;# Reference to your ingress&lt;/span&gt;
  &lt;span class="na"&gt;ingressRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;

  &lt;span class="c1"&gt;# Optional HPA reference&lt;/span&gt;
  &lt;span class="na"&gt;autoscalerRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;autoscaling/v2&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HorizontalPodAutoscaler&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app&lt;/span&gt;

  &lt;span class="c1"&gt;# Maximum time for canary to make progress before rollback&lt;/span&gt;
  &lt;span class="na"&gt;progressDeadlineSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;600&lt;/span&gt;

  &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
    &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
    &lt;span class="na"&gt;portDiscovery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="na"&gt;analysis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Analysis runs every minute&lt;/span&gt;
    &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1m&lt;/span&gt;

    &lt;span class="c1"&gt;# Maximum failed checks before rollback&lt;/span&gt;
    &lt;span class="na"&gt;threshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;

    &lt;span class="c1"&gt;# Maximum traffic percentage to canary&lt;/span&gt;
    &lt;span class="na"&gt;maxWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;

    &lt;span class="c1"&gt;# Traffic increment step&lt;/span&gt;
    &lt;span class="na"&gt;stepWeight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;

    &lt;span class="c1"&gt;# Metrics to monitor&lt;/span&gt;
    &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error-rate"&lt;/span&gt;
      &lt;span class="na"&gt;templateRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;error-rate&lt;/span&gt;
      &lt;span class="na"&gt;thresholdRange&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.02&lt;/span&gt;  &lt;span class="c1"&gt;# 2% error rate threshold&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1m&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency"&lt;/span&gt;
      &lt;span class="na"&gt;templateRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;latency&lt;/span&gt;
      &lt;span class="na"&gt;thresholdRange&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt;  &lt;span class="c1"&gt;# 500ms latency threshold&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1m&lt;/span&gt;

    &lt;span class="c1"&gt;# Optional webhooks for testing&lt;/span&gt;
    &lt;span class="na"&gt;webhooks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;load-test&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://flagger-loadtester.test/&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;
      &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;cmd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hey&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-z&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1m&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-q&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-c&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;http://my-app-canary:8080/"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Setting Up Service Monitors
&lt;/h2&gt;

&lt;p&gt;For Prometheus to collect metrics from both primary and canary services, you need to create separate ServiceMonitor resources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Canary ServiceMonitor&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring.coreos.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceMonitor&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app-canary&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;endpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metrics&lt;/span&gt;
      &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/metrics&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app.kubernetes.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app-canary&lt;/span&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# Primary ServiceMonitor  &lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring.coreos.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceMonitor&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app-primary&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;endpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metrics&lt;/span&gt;
      &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/metrics&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app.kubernetes.io/name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app-primary&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, you should see the canary and primary targets discovered in Prometheus:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs03en8pz734ya5mrfwis.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs03en8pz734ya5mrfwis.png" alt=" " width="698" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Creating Custom Metric Templates
&lt;/h2&gt;

&lt;p&gt;Flagger uses MetricTemplate resources to define how metrics are calculated. Here's an example for error rate comparison:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flagger.app/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MetricTemplate&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;error-rate&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prometheus&lt;/span&gt;
    &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://prometheus:9090&lt;/span&gt;
  &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;sum(&lt;/span&gt;
      &lt;span class="s"&gt;rate(&lt;/span&gt;
        &lt;span class="s"&gt;http_requests_total{&lt;/span&gt;
              &lt;span class="s"&gt;service="my-app-canary",&lt;/span&gt;
              &lt;span class="s"&gt;status=~"5.*"&lt;/span&gt;
          &lt;span class="s"&gt;}[1m]&lt;/span&gt;
      &lt;span class="s"&gt;) or on() vector(0))/sum(rate(&lt;/span&gt;
          &lt;span class="s"&gt;http_requests_total{&lt;/span&gt;
              &lt;span class="s"&gt;service="my-app-canary"&lt;/span&gt;
          &lt;span class="s"&gt;}[1m]&lt;/span&gt;
      &lt;span class="s"&gt;))&lt;/span&gt;
    &lt;span class="s"&gt;- sum(&lt;/span&gt;
      &lt;span class="s"&gt;rate(&lt;/span&gt;
        &lt;span class="s"&gt;http_requests_total{&lt;/span&gt;
              &lt;span class="s"&gt;service="my-app-primary",&lt;/span&gt;
              &lt;span class="s"&gt;status=~"5.*"&lt;/span&gt;
          &lt;span class="s"&gt;}[1m]&lt;/span&gt;
      &lt;span class="s"&gt;) or on() vector(0))/sum(rate(&lt;/span&gt;
          &lt;span class="s"&gt;http_requests_total{&lt;/span&gt;
              &lt;span class="s"&gt;service="my-app-primary"&lt;/span&gt;
          &lt;span class="s"&gt;}[1m]&lt;/span&gt;
      &lt;span class="s"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query calculates the difference in error rates between canary and primary versions. The &lt;code&gt;or on() vector(0)&lt;/code&gt; ensures the query returns 0 when no metrics are available instead of failing.&lt;/p&gt;
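The canary analysis earlier also references a latency MetricTemplate that is not shown in this post. A minimal sketch could look like the following; the histogram metric name (http_request_duration_seconds_bucket) and the Prometheus address are assumptions about your setup, and the result is multiplied by 1000 so it compares against the 500ms threshold.

```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: latency
spec:
  provider:
    type: prometheus
    address: http://prometheus:9090   # adjust to your Prometheus service
  query: |
    histogram_quantile(0.99,
      sum(
        rate(http_request_duration_seconds_bucket{service="my-app-canary"}[1m])
      ) by (le)
    ) * 1000
```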

&lt;h2&gt;
  
  
  Understanding the Canary Analysis Process
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Promotion Flow
&lt;/h3&gt;

&lt;p&gt;When Flagger detects a new deployment, it follows this process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Initialization&lt;/strong&gt;: Scale up canary deployment alongside primary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-rollout Checks&lt;/strong&gt;: Execute pre-rollout webhooks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traffic Shifting&lt;/strong&gt;: Gradually increase traffic to canary (10% → 20% → 30% → 40% → 50%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics Analysis&lt;/strong&gt;: Check error rates, latency, and custom metrics at each step&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Promotion Decision&lt;/strong&gt;: If all checks pass, promote canary to primary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cleanup&lt;/strong&gt;: Scale down old primary, update primary with canary spec&lt;/li&gt;
&lt;/ol&gt;
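With the example values from the canary configuration (stepWeight=10, maxWeight=50), the traffic-shifting schedule in step 3 can be reproduced with a quick sketch:

```shell
# Enumerate the canary traffic weights Flagger steps through
stepWeight=10
maxWeight=50

weights=""
w=$stepWeight
while [ "$w" -le "$maxWeight" ]; do
  weights="$weights$w% "
  w=$((w + stepWeight))
done

echo "$weights"   # 10% 20% 30% 40% 50%
```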

&lt;h3&gt;
  
  
  Rollback Scenarios
&lt;/h3&gt;

&lt;p&gt;Flagger automatically rolls back when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Error rate exceeds threshold&lt;/li&gt;
&lt;li&gt;Latency exceeds threshold&lt;/li&gt;
&lt;li&gt;Custom metric checks fail&lt;/li&gt;
&lt;li&gt;Webhook tests fail&lt;/li&gt;
&lt;li&gt;Failed checks counter reaches threshold&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Monitoring Canary Progress
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Watch all canaries in real-time&lt;/span&gt;
watch kubectl get canaries &lt;span class="nt"&gt;--all-namespaces&lt;/span&gt;

&lt;span class="c"&gt;# Get detailed canary status&lt;/span&gt;
kubectl describe canary/my-app &lt;span class="nt"&gt;-n&lt;/span&gt; production

&lt;span class="c"&gt;# View Flagger logs&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-f&lt;/span&gt; deployment/flagger &lt;span class="nt"&gt;-n&lt;/span&gt; flagger-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Webhooks for Enhanced Testing
&lt;/h3&gt;

&lt;p&gt;Flagger supports multiple webhook types for comprehensive testing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;webhooks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# Manual approval before rollout&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirm-rollout"&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;confirm-rollout&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://approval-service/gate/approve&lt;/span&gt;

  &lt;span class="c1"&gt;# Pre-deployment testing&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integration-test"&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pre-rollout&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://test-service/&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bash&lt;/span&gt;
      &lt;span class="na"&gt;cmd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run-integration-tests.sh"&lt;/span&gt;

  &lt;span class="c1"&gt;# Load testing during rollout&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;load-test"&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rollout&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://loadtester/&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cmd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hey&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-z&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;2m&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-q&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-c&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;http://my-app-canary/"&lt;/span&gt;

  &lt;span class="c1"&gt;# Manual promotion approval&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirm-promotion"&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;confirm-promotion&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://approval-service/gate/approve&lt;/span&gt;

  &lt;span class="c1"&gt;# Post-deployment notifications&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slack-notification"&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;post-rollout&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://notification-service/slack&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  HPA Integration
&lt;/h3&gt;

&lt;p&gt;When using HPA with canary deployments, Flagger pauses traffic increases while scaling operations are in progress:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;autoscalerRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;autoscaling/v2&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HorizontalPodAutoscaler&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app-primary&lt;/span&gt;
  &lt;span class="na"&gt;primaryScalerReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;minReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
    &lt;span class="na"&gt;maxReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Alerting and Notifications
&lt;/h3&gt;

&lt;p&gt;Configure alerts to be notified of canary deployment status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;analysis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;alerts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;canary-status"&lt;/span&gt;
      &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;info&lt;/span&gt;
      &lt;span class="na"&gt;providerRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;slack-alert&lt;/span&gt;
        &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flagger-system&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
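The providerRef above points at an AlertProvider resource that must exist separately. A minimal Slack provider sketch might look like this; the channel name and webhook URL are placeholders you would replace with your own.

```yaml
apiVersion: flagger.app/v1beta1
kind: AlertProvider
metadata:
  name: slack-alert
  namespace: flagger-system
spec:
  type: slack
  channel: deployments        # placeholder channel name
  username: flagger
  # Slack incoming-webhook URL; replace with your own
  address: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
```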



&lt;h2&gt;
  
  
  Production Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Traffic Requirements
&lt;/h3&gt;

&lt;p&gt;For effective canary analysis, you need sufficient traffic to generate meaningful metrics. If your production traffic is low:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consider using load testing webhooks&lt;/li&gt;
&lt;li&gt;Implement synthetic traffic generation&lt;/li&gt;
&lt;li&gt;Adjust analysis intervals and thresholds accordingly&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Metrics Selection
&lt;/h3&gt;

&lt;p&gt;Choose metrics that accurately reflect your application's health:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Error Rate&lt;/strong&gt;: Monitor 5xx responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: Track P95 or P99 response times&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Business Metrics&lt;/strong&gt;: Application-specific indicators&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deployment Timing
&lt;/h3&gt;

&lt;p&gt;Calculate your deployment duration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Minimum time = interval × (maxWeight / stepWeight)
Rollback time = interval × threshold
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, with interval=1m, maxWeight=50%, stepWeight=10%, threshold=5:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimum deployment time: 1m × (50/10) = 5 minutes&lt;/li&gt;
&lt;li&gt;Rollback time: 1m × 5 = 5 minutes&lt;/li&gt;
&lt;/ul&gt;
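The same arithmetic is easy to sanity-check in a shell, using the values from the example above:

```shell
# Deployment timing from the example: interval=1m, maxWeight=50, stepWeight=10, threshold=5
interval_min=1
maxWeight=50
stepWeight=10
threshold=5

# Minutes of stepping needed to reach maxWeight
min_deploy=$(( interval_min * maxWeight / stepWeight ))
# Minutes of consecutive failed checks before rollback triggers
rollback=$(( interval_min * threshold ))

echo "minimum deployment: ${min_deploy}m, rollback: ${rollback}m"
```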

&lt;h2&gt;
  
  
  Troubleshooting Common Issues
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Missing Metrics
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Canary fails due to missing metrics&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Verify ServiceMonitor selectors match service labels&lt;/p&gt;

&lt;h3&gt;
  
  
  Webhook Failures
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Load testing webhooks time out&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Increase webhook timeout and verify load tester accessibility&lt;/p&gt;

&lt;h3&gt;
  
  
  HPA Conflicts
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Scaling issues during canary deployment&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Ensure HPA references are correctly configured for both primary and canary&lt;/p&gt;

&lt;h3&gt;
  
  
  Network Policies
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Traffic routing issues&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Verify network policies allow communication between services&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start Small&lt;/strong&gt;: Begin with low traffic percentages and gradual increases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor Actively&lt;/strong&gt;: Set up comprehensive alerting for canary deployments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Thoroughly&lt;/strong&gt;: Use webhooks for automated testing at each stage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan for Rollback&lt;/strong&gt;: Ensure your rollback process is well-tested&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document Everything&lt;/strong&gt;: Maintain clear documentation of your canary processes&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Flagger provides a robust, automated solution for implementing canary deployments in Kubernetes environments. By gradually shifting traffic while monitoring key metrics, it enables safe deployments with automatic rollback capabilities.&lt;/p&gt;

&lt;p&gt;The combination of metrics-driven analysis, webhook integration, and seamless traffic management makes Flagger an excellent choice for teams looking to implement progressive delivery practices. Start with simple configurations and gradually add more sophisticated monitoring and testing as your confidence grows.&lt;/p&gt;

&lt;p&gt;Remember that successful canary deployments depend not just on the tooling, but also on having appropriate metrics, sufficient traffic, and well-defined success criteria. With proper implementation, Flagger can significantly reduce deployment risks while maintaining the agility your development teams need.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.flagger.app/" rel="noopener noreferrer"&gt;Flagger Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.flagger.app/tutorials/nginx-progressive-delivery" rel="noopener noreferrer"&gt;NGINX Progressive Delivery Tutorial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.flagger.app/tutorials/prometheus-operator" rel="noopener noreferrer"&gt;Prometheus Operator Integration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.flagger.app/usage/webhooks" rel="noopener noreferrer"&gt;Webhook Configuration Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>flagger</category>
      <category>canary</category>
      <category>kubernetes</category>
      <category>deployment</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Anuj Tyagi</dc:creator>
      <pubDate>Sun, 22 Jun 2025 19:53:14 +0000</pubDate>
      <link>https://dev.to/sudo_anuj/-g14</link>
      <guid>https://dev.to/sudo_anuj/-g14</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/sudo_anuj/keda-upgrade-debugging-when-empty-triggers-break-your-scaling-5c6c" class="crayons-story__hidden-navigation-link"&gt;KEDA Upgrade Debugging: When Empty Triggers Break Your Scaling&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/sudo_anuj" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F549060%2F1a5bb9b8-7bdd-499c-9b95-b664d65ffb26.jpg" alt="sudo_anuj profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/sudo_anuj" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Anuj Tyagi
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Anuj Tyagi
                
              
              &lt;div id="story-author-preview-content-2607768" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/sudo_anuj" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F549060%2F1a5bb9b8-7bdd-499c-9b95-b664d65ffb26.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Anuj Tyagi&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/sudo_anuj/keda-upgrade-debugging-when-empty-triggers-break-your-scaling-5c6c" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jun 20 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/sudo_anuj/keda-upgrade-debugging-when-empty-triggers-break-your-scaling-5c6c" id="article-link-2607768"&gt;
          KEDA Upgrade Debugging: When Empty Triggers Break Your Scaling
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/keda"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;keda&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/eventdriven"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;eventdriven&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/kubernetes"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;kubernetes&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/debugging"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;debugging&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
            &lt;a href="https://dev.to/sudo_anuj/keda-upgrade-debugging-when-empty-triggers-break-your-scaling-5c6c#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            4 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>keda</category>
      <category>eventdriven</category>
      <category>kubernetes</category>
      <category>debugging</category>
    </item>
    <item>
      <title>Collect AWS Lambda@Edge metrics with Prometheus</title>
      <dc:creator>Anuj Tyagi</dc:creator>
      <pubDate>Fri, 20 Jun 2025 05:25:29 +0000</pubDate>
      <link>https://dev.to/aws-builders/collect-aws-lambda-edge-metrics-with-prometheus-12ah</link>
      <guid>https://dev.to/aws-builders/collect-aws-lambda-edge-metrics-with-prometheus-12ah</guid>
      <description>&lt;p&gt;This post is about the problem I worked 2 years ago but should be still valid. Why? As I solved the problem internally back in past but forgot to create PR in the official public YACE github repo. If you don't undestand what I am talking about. I will expand this blog in future. &lt;/p&gt;

&lt;p&gt;Let me explain from the beginning. &lt;/p&gt;

&lt;p&gt;I was working on implementing monitoring for an enterprise infrastructure. I was using Prometheus with &lt;a href="https://github.com/prometheus-community/yet-another-cloudwatch-exporter" rel="noopener noreferrer"&gt;YACE&lt;/a&gt; (Yet Another CloudWatch Exporter) to collect metrics. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the YACE exporter?&lt;/strong&gt;&lt;br&gt;
It's an exporter used with Prometheus to collect metrics from AWS CloudWatch. There is another option, the official CloudWatch exporter, for the same use case, but I went ahead with YACE.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft1s53jkx9612n8mg16as.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft1s53jkx9612n8mg16as.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following the &lt;a href="https://github.com/prometheus-community/yet-another-cloudwatch-exporter/tree/master/examples" rel="noopener noreferrer"&gt;examples&lt;/a&gt;, collecting metrics was straightforward, but I got stuck when I had to collect metrics from Lambda@Edge: unlike the other services, YACE did not support metrics discovery for AWS Lambda@Edge. &lt;/p&gt;

&lt;p&gt;So, I created a Github Issue in YACE repo: &lt;a href="https://github.com/prometheus-community/yet-another-cloudwatch-exporter/issues/876" rel="noopener noreferrer"&gt;https://github.com/prometheus-community/yet-another-cloudwatch-exporter/issues/876&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I received a &lt;a href="https://github.com/prometheus-community/yet-another-cloudwatch-exporter/issues/876#issuecomment-1528833324" rel="noopener noreferrer"&gt;response&lt;/a&gt;: Lambda@Edge doesn't support tags, so its metrics can't be collected via service discovery. This was blocking my project, so I had to solve the problem some other way. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How did I solve this problem?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I figured out another approach: collecting metrics through static configuration, which works if you know which regions your Lambda@Edge functions run in. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to collect metrics with the static approach?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; apiVersion: v1alpha1
  static:
    - name: us-east-1.&amp;lt;edge_lambda_function_name&amp;gt;
      namespace: AWS/Lambda
      regions:
        - eu-central-1
        - us-east-1
        - us-west-2
        - ap-southeast-1
      period: 600
      length: 600
      metrics:
        - name: Invocations
          statistics: [Sum]
        - name: Errors
          statistics: [Sum]
        - name: Throttles
          statistics: [Sum]
        - name: Duration
          statistics: [Average, Maximum, Minimum, p90]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, I listed every region my Lambda@Edge functions run in. I also created a &lt;a href="https://github.com/prometheus-community/yet-another-cloudwatch-exporter/pull/1628" rel="noopener noreferrer"&gt;PR&lt;/a&gt; for this in the YACE repo. &lt;/p&gt;

&lt;p&gt;Hope this helps someone. &lt;/p&gt;

</description>
      <category>aws</category>
      <category>prometheus</category>
      <category>lambda</category>
      <category>edge</category>
    </item>
    <item>
      <title>KEDA Upgrade Debugging: When Empty Triggers Break Your Scaling</title>
      <dc:creator>Anuj Tyagi</dc:creator>
      <pubDate>Fri, 20 Jun 2025 04:20:39 +0000</pubDate>
      <link>https://dev.to/sudo_anuj/keda-upgrade-debugging-when-empty-triggers-break-your-scaling-5c6c</link>
      <guid>https://dev.to/sudo_anuj/keda-upgrade-debugging-when-empty-triggers-break-your-scaling-5c6c</guid>
      <description>&lt;p&gt;This is one of the past use case to troubleshooting KEDA, Kubernetes based event driven autoscaler during upgrade in a non production environment.&lt;br&gt;&lt;br&gt;
So, I was working to upgrade KEDA from v2.10 to v2.15 for a infra unfamiliar to me. It was my first hands on experience with KEDA. I quickly understood purpose of KEDA, I worked more with HPA before that.&lt;br&gt;
If you're not aware of the difference between all pod scaling options, you can read my last post &lt;br&gt;
on &lt;a href="https://dev.to/sudo_anuj/scaling-patterns-in-kubernetes-vpa-hpa-and-keda-3mgd"&gt;Kubernetes pod scaling patterns&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My goal was to upgrade KEDA from v2.10 to v2.15 and ensure all existing &lt;code&gt;ScaledObjects&lt;/code&gt; continued to function properly. The environment had been running with KEDA v2.10 for months, and all configurations appeared to be working correctly.&lt;/p&gt;
&lt;h3&gt;
  
  
  Initial Error Analysis
&lt;/h3&gt;

&lt;p&gt;After the upgrade, the KEDA operator logs showed concerning errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2024/11/04 17:57:49 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
{"level":"info","ts":"2024-11-04T17:57:49.765Z","logger":"setup","msg":"KEDA Version: 2.15.1"}
{"level":"info","ts":"2024-11-04T17:57:49.765Z","logger":"setup","msg":"Git Commit: 123543fnerfin4fcw3d23d23b"}
I1104 17:57:49.866460    1 leaderelection.go:250] attempting to acquire leader lease keda/operator.keda.sh...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key concern: if the last line shows only &lt;code&gt;attempting to acquire leader lease&lt;/code&gt; without the follow-up &lt;code&gt;successfully acquired lease&lt;/code&gt;, the pod may be blocked from becoming leader. That isn't necessarily a problem, though: it can also simply mean another pod is already acting as the leader. &lt;/p&gt;
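&lt;p&gt;A quick way to check which pod currently holds the lease (assuming KEDA is installed in the &lt;code&gt;keda&lt;/code&gt; namespace, matching the log line above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# print the identity of the current leader
kubectl get lease operator.keda.sh -n keda -o jsonpath='{.spec.holderIdentity}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;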

&lt;p&gt;So I dug into KEDA's leader election process, and understanding it turned out to be crucial. A healthy startup sequence looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I1106 21:42:09.498384       1 leaderelection.go:254] attempting to acquire leader lease keda/operator.keda.sh...
I1106 21:42:55.066863       1 leaderelection.go:268] successfully acquired lease keda/operator.keda.sh
2024-11-06T21:42:55Z    INFO    Starting EventSource    {"controller": "scaledobject"}
2024-11-06T21:42:55Z    INFO    Starting Controller {"controller": "scaledobject"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The sequence should include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Attempting to acquire lease&lt;/li&gt;
&lt;li&gt;Successfully acquiring lease
&lt;/li&gt;
&lt;li&gt;Multiple controller initialization messages&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Configuration Investigation
&lt;/h3&gt;

&lt;p&gt;Examining the failing ScaledObject revealed the root cause:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get scaledobject app &lt;span class="nt"&gt;-n&lt;/span&gt; test-app &lt;span class="nt"&gt;-o&lt;/span&gt; yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;keda.sh/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ScaledObject&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;webapp&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-app&lt;/span&gt;
  &lt;span class="na"&gt;creationTimestamp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2024-05-10T13:16:22Z"&lt;/span&gt;  &lt;span class="c1"&gt;# Created months ago&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scaleTargetRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;webapp&lt;/span&gt;
  &lt;span class="na"&gt;minReplicaCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;maxReplicaCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;triggers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[]&lt;/span&gt;  
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;conditions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ScaledObject doesn't have correct triggers specification&lt;/span&gt;
    &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ScaledObjectCheckFailed&lt;/span&gt;
    &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;False"&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ready&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Real Issue Discovery
&lt;/h3&gt;

&lt;p&gt;When I checked another KEDA operator pod, I found the root cause: &lt;/p&gt;

&lt;p&gt;&lt;code&gt;error":"no triggers defined in the ScaledObject/ScaledJob"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I spent more time searching for why KEDA was complaining about empty triggers in v2.15 but not in v2.10. It turned out a release after v2.10 had added this validation and log message.  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;KEDA v2.10 behavior&lt;/strong&gt;: Silently accepted empty triggers (&lt;code&gt;triggers: []&lt;/code&gt;) and created a default HPA with 80% CPU utilization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KEDA v2.15 behavior&lt;/strong&gt;: Validates triggers and throws errors for empty arrays&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timeline&lt;/strong&gt;: This ScaledObject had been running incorrectly for the past six months, but v2.10 hid the problem.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Fix Implementation
&lt;/h3&gt;

&lt;p&gt;I found the specific GitHub issue and PR: &lt;/p&gt;

&lt;p&gt;The empty triggers validation was introduced in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Issue&lt;/strong&gt;: &lt;a href="https://github.com/kedacore/keda/issues/5520" rel="noopener noreferrer"&gt;#5520&lt;/a&gt; - "KEDA doesn't validate empty array of triggers"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pull Request&lt;/strong&gt;: &lt;a href="https://github.com/kedacore/keda/pull/5524" rel="noopener noreferrer"&gt;#5524&lt;/a&gt; - "fix: Validate empty array value of triggers in ScaledObject/ScaledJob creation"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KEDA Version&lt;/strong&gt;: Introduced in v2.14, refined in v2.15&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Merge Date&lt;/strong&gt;: February 2024&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Configuration Fix
&lt;/h3&gt;

&lt;p&gt;The solution was to add proper triggers to the ScaledObject:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;keda.sh/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ScaledObject&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;webapp&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-app&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scaleTargetRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;webapp&lt;/span&gt;
  &lt;span class="na"&gt;minReplicaCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;maxReplicaCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;triggers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prometheus&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;serverAddress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://prometheus:9090&lt;/span&gt;
      &lt;span class="na"&gt;metricName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http_requests_per_second&lt;/span&gt;
      &lt;span class="na"&gt;threshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;100"&lt;/span&gt;
      &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sum(rate(http_requests_total[1m]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Validation Commands
&lt;/h3&gt;

&lt;p&gt;To identify similar issues across the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Find ScaledObjects with empty triggers&lt;/span&gt;
kubectl get scaledobjects &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{range .items[?(@.spec.triggers[0] == null)]}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}'&lt;/span&gt;

&lt;span class="c"&gt;# Check ScaledObject status&lt;/span&gt;
kubectl get scaledobjects &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; custom-columns&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[?(@.type=='Ready')].status"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Silent Failures Are Dangerous&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;KEDA v2.10's behavior of silently creating default HPAs masked configuration errors for months. The application had been using basic CPU scaling instead of the intended event-driven scaling.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Validation Improvements&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The upgrade didn't break anything - it revealed existing problems. KEDA v2.15's strict validation prevents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Misleading functionality (thinking event-driven scaling is active when it's not)&lt;/li&gt;
&lt;li&gt;Resource waste from inappropriate scaling decisions&lt;/li&gt;
&lt;li&gt;Configuration drift&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Understanding Version Changes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Breaking changes often fix underlying issues. The validation was introduced because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Empty triggers create meaningless ScaledObjects&lt;/li&gt;
&lt;li&gt;Default CPU-based scaling defeats KEDA's event-driven purpose&lt;/li&gt;
&lt;li&gt;Silent failures violate "fail fast, fail loud" principles&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Debugging Best Practices&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When investigating KEDA issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check leader election sequence completion&lt;/li&gt;
&lt;li&gt;Examine ScaledObject status conditions&lt;/li&gt;
&lt;li&gt;Validate trigger configurations before upgrades&lt;/li&gt;
&lt;li&gt;Test in non-production environments first&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;Prevention Strategies&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Implement CI/CD validation for empty triggers&lt;/li&gt;
&lt;li&gt;Monitor ScaledObject health status&lt;/li&gt;
&lt;li&gt;Set up alerts for configuration failures&lt;/li&gt;
&lt;li&gt;Review configurations before major upgrades&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;What initially appeared to be a breaking change in KEDA v2.15 was actually a long-overdue fix for silent configuration failures. The ScaledObject had been misconfigured since May 2024, but v2.10 had been hiding the problem by falling back to default CPU-based scaling.&lt;/p&gt;

&lt;p&gt;This experience reinforces that sometimes "breaking" changes reveal existing problems rather than creating new ones. The improved validation in KEDA v2.15 ensures that event-driven autoscaling works as intended, making the system more reliable and preventing future silent failures.&lt;/p&gt;

&lt;p&gt;Understanding the difference between a tool breaking and a tool revealing existing breakage is crucial for effective debugging and system maintenance.&lt;/p&gt;

</description>
      <category>keda</category>
      <category>eventdriven</category>
      <category>kubernetes</category>
      <category>debugging</category>
    </item>
    <item>
      <title>Scaling patterns in Kubernetes: VPA, HPA and KEDA</title>
      <dc:creator>Anuj Tyagi</dc:creator>
      <pubDate>Fri, 20 Jun 2025 02:26:52 +0000</pubDate>
      <link>https://dev.to/sudo_anuj/scaling-patterns-in-kubernetes-vpa-hpa-and-keda-3mgd</link>
      <guid>https://dev.to/sudo_anuj/scaling-patterns-in-kubernetes-vpa-hpa-and-keda-3mgd</guid>
      <description>&lt;p&gt;I've been working with a mostly HPA as a scaling options in past but last year I started working with KEDA. So, I thought to write post to explain possible options in pod autoscaling. On the other side, manually adjusting parameters is not only slow but also inefficient. If you decide to allocate too little resource and you'll deliver subpar user experience or can experience application outages. If you over-provision resources "just in case" and you'll waste money and resources. That's where Kubernetes autoscaling comes to the rescue and deliver the right resources when required. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding Kubernetes Pod Autoscaling Fundamentals&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Autoscaling in Kubernetes means dynamically allocating cluster resources like CPU and memory to your applications based on real-time demand. This ensures applications have the right amount of resources to handle varying levels of load, directly improving application performance and availability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Benefits of Autoscaling:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost Efficiency&lt;/strong&gt;: Pay only for the resources you need instead of over-provisioning&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Environmental Impact&lt;/strong&gt;: Reduced power consumption and carbon emissions through better resource alignment&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time Savings&lt;/strong&gt;: Automates manual resource adjustment tasks, freeing up valuable DevOps time&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance Optimization&lt;/strong&gt;: Ensures applications maintain optimal performance under varying loads&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Three Pillars of Kubernetes Autoscaling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes offers three primary autoscaling mechanisms, each serving different purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Vertical Pod Autoscaler (VPA)&lt;/strong&gt; - Adjusts resource requests and limits within individual pods&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Horizontal Pod Autoscaler (HPA)&lt;/strong&gt; - Scales the number of pod replicas up or down&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kubernetes Event-Driven Autoscaler&lt;/strong&gt; (KEDA) - Scales based on external events and custom metrics&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's explore each of them one by one. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertical Pod Autoscaler (VPA): Right-sizing Your Pods&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is VPA?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Vertical Pod Autoscaler automatically adjusts the CPU and memory requests and limits of individual containers within pods based on historical usage patterns. Instead of scaling the number of pods, VPA makes your existing pods "beefier" or "leaner" based on their actual resource needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How VPA Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;VPA operates through three core components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Recommender&lt;/strong&gt;: Calculates optimal resource values based on historical metrics from the Kubernetes Metrics Server, analyzing up to 8 days of data to generate recommendations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Updater&lt;/strong&gt;: Monitors recommendation changes and evicts pods when resource adjustments are needed, forcing replacement with updated allocations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Admission Webhook&lt;/strong&gt;: Intercepts new pod deployments and injects updated resource values based on VPA recommendations.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;When to Use VPA&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;VPA is ideal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stateful applications that can't be easily scaled horizontally&lt;/li&gt;
&lt;li&gt;Resource optimization scenarios where you need to fine-tune individual pod resources&lt;/li&gt;
&lt;li&gt;Applications with unpredictable resource patterns that traditional static allocation can't handle&lt;/li&gt;
&lt;li&gt;Cost optimization efforts to eliminate resource waste&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;VPA configuration example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: "my-app"
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      maxAllowed:
        cpu: 1
        memory: 500Mi
      minAllowed:
        cpu: 100m
        memory: 50Mi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
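&lt;p&gt;After the VPA has observed the workload for a while, you can inspect its recommendations without waiting for an eviction (the object name matches the example above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# shows target, lower-bound, and upper-bound recommendations per container
kubectl describe vpa my-app-vpa
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;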



&lt;p&gt;&lt;strong&gt;Challenges with VPA&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Despite its benefits, VPA has several limitations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incompatibility with HPA&lt;/strong&gt;: Cannot run both tools together for CPU/memory-based scaling&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limited historical data&lt;/strong&gt;: Only stores 8 days of metrics, losing data on pod restarts&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Service disruption&lt;/strong&gt;: Pod evictions cause momentary service interruptions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No time-based controls&lt;/strong&gt;: Pod evictions can happen at any time, including peak hours&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cluster-wide configuration&lt;/strong&gt;: Limited per-workload customization options&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Horizontal Pod Autoscaler (HPA): Scaling Out Your Application&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is HPA?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HPA automatically scales the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed metrics like CPU utilization, memory usage, or custom metrics. It's the most fundamental and widely-used autoscaling pattern in Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyupmri2k6dd33r3lrgo4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyupmri2k6dd33r3lrgo4.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How HPA Overcomes VPA Challenges&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While VPA adjusts resources within pods, HPA takes a different approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No service disruption&lt;/strong&gt;: Scaling replicas doesn't require pod eviction&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Works with stateless applications&lt;/strong&gt;: Perfect for horizontally scalable workloads&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Predictable scaling&lt;/strong&gt;: Based on well-understood metrics like CPU and memory&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mature and stable&lt;/strong&gt;: Built-in Kubernetes feature with extensive community support&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to Use HPA&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HPA is perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stateless applications where pods are interchangeable&lt;/li&gt;
&lt;li&gt;Predictable workloads with clear load patterns&lt;/li&gt;
&lt;li&gt;Web applications that experience traffic variations&lt;/li&gt;
&lt;li&gt;Microservices that can benefit from horizontal scaling
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
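&lt;p&gt;To make the &lt;code&gt;averageUtilization: 50&lt;/code&gt; target above concrete: the HPA controller computes the desired replica count as &lt;code&gt;ceil(currentReplicas * currentMetric / targetMetric)&lt;/code&gt;, clamped between &lt;code&gt;minReplicas&lt;/code&gt; and &lt;code&gt;maxReplicas&lt;/code&gt;. A minimal Python sketch of that rule (the function name and example numbers are illustrative):&lt;/p&gt;

```python
import math

def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Sketch of the HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to [minReplicas, maxReplicas]."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(desired, max_replicas))

# With the example HPA above (target 50% CPU, 2-10 replicas):
print(desired_replicas(4, 90, 50))   # spike: 4 pods at 90% CPU -> 8 replicas
print(desired_replicas(4, 20, 50))   # light load -> scales down to minReplicas (2)
```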



&lt;p&gt;&lt;strong&gt;HPA Limitations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While HPA is powerful, it has constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Limited to resource metrics&lt;/strong&gt;: Basic HPA only works with CPU/memory metrics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not suitable for event-driven workloads&lt;/strong&gt;: Can't scale based on queue lengths or custom events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reactive scaling&lt;/strong&gt;: Only responds after metrics breach thresholds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No scale-to-zero&lt;/strong&gt;: Cannot scale down to zero replicas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;KEDA: Event-Driven Autoscaling for Modern Applications&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is KEDA?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes Event-Driven Autoscaling (KEDA) extends Kubernetes' native autoscaling capabilities to allow applications to scale based on events from various sources like message queues, databases, or custom metrics. KEDA graduated as a CNCF project, highlighting its importance in the cloud-native ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How KEDA Overcomes HPA Limitations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;KEDA addresses several HPA shortcomings:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Event-driven scaling&lt;/strong&gt;: Scales based on queue lengths, database records, HTTP requests, and more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale-to-zero capability&lt;/strong&gt;: Can scale applications down to zero when no events are present&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rich ecosystem&lt;/strong&gt;: Supports 50+ event sources including Kafka, RabbitMQ, Azure Service Bus, AWS SQS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom metrics&lt;/strong&gt;: Works with any metric source through external scalers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pleoumpti7u2pqwwm8e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pleoumpti7u2pqwwm8e.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;When to Use KEDA&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;KEDA excels in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event-driven architectures with message queues and event buses&lt;/li&gt;
&lt;li&gt;Serverless-style workloads that benefit from scale-to-zero&lt;/li&gt;
&lt;li&gt;Batch processing jobs triggered by data availability&lt;/li&gt;
&lt;li&gt;IoT applications processing sensor data streams&lt;/li&gt;
&lt;li&gt;Machine learning pipelines processing inference requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;KEDA configuration example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-scaler
spec:
  scaleTargetRef:
    name: message-processor
  triggers:
  - type: rabbitmq
    metadata:
      protocol: amqp
      queueName: work-queue
      mode: QueueLength
      value: "5"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
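&lt;p&gt;Before a &lt;code&gt;ScaledObject&lt;/code&gt; like the one above can work, KEDA itself must be installed in the cluster. One common approach, sketched here assuming Helm is available, uses the official KEDA Helm chart:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;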



&lt;p&gt;&lt;strong&gt;KEDA vs HPA: Key Differences&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjnmgxinz8mrz5v396k0n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjnmgxinz8mrz5v396k0n.png" alt=" " width="800" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choosing the Right Autoscaling Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use VPA When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have stateful applications that can't scale horizontally&lt;/li&gt;
&lt;li&gt;Resource optimization is your primary concern&lt;/li&gt;
&lt;li&gt;You need to fine-tune individual pod resources&lt;/li&gt;
&lt;li&gt;Applications have unpredictable resource usage patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use HPA When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have stateless, horizontally scalable applications&lt;/li&gt;
&lt;li&gt;Traditional web applications with predictable load patterns&lt;/li&gt;
&lt;li&gt;Simple microservices that scale based on CPU/memory&lt;/li&gt;
&lt;li&gt;You need a proven, stable autoscaling solution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use KEDA When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building event-driven or serverless-style applications&lt;/li&gt;
&lt;li&gt;Processing messages from queues or streams&lt;/li&gt;
&lt;li&gt;Need to scale based on custom or external metrics&lt;/li&gt;
&lt;li&gt;Cost optimization through scale-to-zero is important&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-World Implementation Scenarios&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: E-commerce Platform&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend services&lt;/strong&gt;: HPA for web servers based on CPU utilization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Order processing&lt;/strong&gt;: KEDA for scaling based on order queue length&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database connections&lt;/strong&gt;: VPA for optimizing connection pool resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2: IoT Data Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data ingestion&lt;/strong&gt;: KEDA scaling based on message queue depth&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream processing&lt;/strong&gt;: HPA for consistent throughput requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics services&lt;/strong&gt;: VPA for memory-intensive data processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenario 3: Machine Learning Platform&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model serving&lt;/strong&gt;: HPA for inference API endpoints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training jobs&lt;/strong&gt;: KEDA triggered by training request queues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature processing&lt;/strong&gt;: VPA for compute-intensive transformations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best Practices and Recommendations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start Simple&lt;/strong&gt;: Begin with HPA for basic scaling needs, then add KEDA for event-driven requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor and Adjust&lt;/strong&gt;: Continuously monitor scaling behavior and adjust thresholds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Combine Strategies&lt;/strong&gt;: Use different autoscalers for different components of your application&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set Resource Limits&lt;/strong&gt;: Always define appropriate resource limits to prevent runaway scaling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Thoroughly&lt;/strong&gt;: Validate autoscaling behavior under various load conditions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmbfc4uj7m974x6tkcx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmbfc4uj7m974x6tkcx0.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Kubernetes autoscaling is not a one-size-fits-all solution. The choice between VPA, HPA, and KEDA depends on your specific application requirements, architecture patterns, and operational needs. VPA optimizes resource utilization within pods, HPA provides reliable horizontal scaling for traditional workloads, and KEDA enables sophisticated event-driven scaling for modern cloud-native applications.&lt;br&gt;
By understanding the strengths and limitations of each approach, you can design a comprehensive autoscaling strategy that optimizes both performance and cost while maintaining the reliability your applications demand.&lt;/p&gt;

&lt;p&gt;If you are looking for a deeper-dive course with hands-on labs on &lt;a href="https://trainingportal.linuxfoundation.org/courses/scaling-cloud-native-applications-with-keda-lfel1014" rel="noopener noreferrer"&gt;Kubernetes Autoscaling and KEDA&lt;/a&gt;, you can check out the official Linux Foundation course at no cost.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>keda</category>
      <category>hpa</category>
      <category>autoscaling</category>
    </item>
    <item>
      <title>Collect Aurora audit logs in Firehose</title>
      <dc:creator>Anuj Tyagi</dc:creator>
      <pubDate>Mon, 14 Apr 2025 04:55:47 +0000</pubDate>
      <link>https://dev.to/aws-builders/collect-aurora-audit-logs-in-firehose-29jg</link>
      <guid>https://dev.to/aws-builders/collect-aurora-audit-logs-in-firehose-29jg</guid>
      <description>&lt;p&gt;In our last post, we &lt;a href="https://dev.to/aws-builders/enable-aurora-logs-for-security-audits-587g"&gt;enabled audit logs using parameter groups in Aurora Postgres&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Now we are collecting our required Aurora logs in CloudWatch, but we need to filter those logs and send them to S3 for archiving, analysis, and long-term storage. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is this useful?&lt;/strong&gt;&lt;br&gt;
We can set a short retention on the CloudWatch logs and keep the audit logs in S3, which helps save cost. For other use cases, we could instead send the logs to an external destination for audit or analysis. &lt;/p&gt;

&lt;p&gt;At this point, I am assuming you already have your application logs in CloudWatch. For our use case, I am collecting &lt;a href="https://dev.to/aws-builders/enable-aurora-logs-for-security-audits-587g"&gt;Aurora logs in CloudWatch&lt;/a&gt; as explained earlier in this series, but the steps below should work for any logs in CloudWatch.&lt;/p&gt;

&lt;p&gt;To send logs from CloudWatch to S3, we will create a subscription filter, which streams log data in near real time to a destination. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a Subscription Filter in CloudWatch?&lt;/strong&gt;&lt;br&gt;
A CloudWatch subscription filter provides filter patterns and options to deliver log events to other AWS services, and it can deliver events to multiple destinations. &lt;/p&gt;

&lt;p&gt;CloudWatch offers four destination options when creating a subscription filter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenSearch&lt;/li&gt;
&lt;li&gt;Kinesis&lt;/li&gt;
&lt;li&gt;Data Firehose&lt;/li&gt;
&lt;li&gt;Lambda&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9697f45bse37h5xjf8k7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9697f45bse37h5xjf8k7.png" alt=" " width="800" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will go with Firehose: given our log volume it is cost-effective, and it is comparatively easy to deploy for our goal of streaming logs to S3. &lt;br&gt;
Firehose can also transform records or convert their format before delivery to S3.&lt;/p&gt;

&lt;p&gt;To begin with, we need to follow these steps. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create S3 bucket&lt;/li&gt;
&lt;li&gt;Create Firehose Stream&lt;/li&gt;
&lt;li&gt;Create IAM role for Firehose &lt;/li&gt;
&lt;li&gt;Create CloudWatch subscription filter&lt;/li&gt;
&lt;li&gt;Validation &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We follow this order because the Firehose stream requires the S3 bucket at creation time, and the CloudWatch subscription filter in turn requires an existing Firehose stream. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step1: Create S3 bucket&lt;/strong&gt;&lt;br&gt;
Creating an S3 bucket is straightforward: search for the S3 service and create a bucket with the default settings. &lt;/p&gt;
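&lt;p&gt;If you prefer the CLI, the bucket can also be created with a single command. A sketch with placeholder names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Outside us-east-1, the location constraint is required
aws s3api create-bucket --bucket &amp;lt;your-bucket-name&amp;gt; --region &amp;lt;region&amp;gt; \
  --create-bucket-configuration LocationConstraint=&amp;lt;region&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;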

&lt;p&gt;&lt;strong&gt;Step2: Create Firehose Stream&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdqzdzoe5ppdshcg57ilx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdqzdzoe5ppdshcg57ilx.png" alt=" " width="800" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can keep the option that lets Firehose create the required IAM role by itself. &lt;/p&gt;

&lt;p&gt;It can take a few minutes for the Firehose stream to be created and show &lt;code&gt;active&lt;/code&gt; status.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzunulcjowvo851hdwzzg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzunulcjowvo851hdwzzg.png" alt=" " width="618" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: We can't change the destination of a Firehose stream after it is created. &lt;/p&gt;
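&lt;p&gt;For reference, the equivalent stream can be created from the CLI. A sketch with placeholder ARNs (the console wizard fills these in for you):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws firehose create-delivery-stream \
  --delivery-stream-name &amp;lt;your-firehose-name&amp;gt; \
  --delivery-stream-type DirectPut \
  --extended-s3-destination-configuration \
    RoleARN=arn:aws:iam::&amp;lt;account-id&amp;gt;:role/&amp;lt;firehose-role&amp;gt;,BucketARN=arn:aws:s3:::&amp;lt;your-bucket-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;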

&lt;p&gt;&lt;strong&gt;Step3: Create IAM role to allow CloudWatch logs -&amp;gt; Firehose&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create IAM policy&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPutToFirehose",
      "Effect": "Allow",
      "Action": [
        "firehose:PutRecord",
        "firehose:PutRecordBatch"
      ],
      "Resource": "arn:aws:firehose:&amp;lt;region&amp;gt;:&amp;lt;account-id&amp;gt;:deliverystream/&amp;lt;your-firehose-name&amp;gt;"
    }
  ]
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create IAM role &lt;code&gt;LogsToFirehose&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Update its trust policy as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "logs.&amp;lt;region&amp;gt;.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step4: Create CloudWatch Subscription Filter&lt;/strong&gt;&lt;br&gt;
Now, switch back to our log group in CloudWatch. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvdvdja17w0b2ijvqi7bb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvdvdja17w0b2ijvqi7bb.png" alt=" " width="800" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on &lt;code&gt;Create Amazon Data Firehose subscription filter&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyl28qgx00njvr1zk3syl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyl28qgx00njvr1zk3syl.png" alt=" " width="800" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, after entering a filter name, select the Firehose stream (in the current account) that we created in Step2. &lt;/p&gt;

&lt;p&gt;We can also add a filter pattern if we want to narrow the logs further before sending them to Firehose, and optionally set a prefix.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgketi6ekb872mmojjt0c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgketi6ekb872mmojjt0c.png" alt=" " width="800" height="942"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also, assign the IAM role we created in Step3, which grants CloudWatch permission to send logs to Firehose. Then click the Create Subscription button. &lt;/p&gt;
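&lt;p&gt;The console steps above map to a single CLI call. A sketch using the role from Step3 and placeholder names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# An empty filter pattern forwards every log event in the group
aws logs put-subscription-filter \
  --log-group-name &amp;lt;your-log-group&amp;gt; \
  --filter-name aurora-audit-to-firehose \
  --filter-pattern "" \
  --destination-arn arn:aws:firehose:&amp;lt;region&amp;gt;:&amp;lt;account-id&amp;gt;:deliverystream/&amp;lt;your-firehose-name&amp;gt; \
  --role-arn arn:aws:iam::&amp;lt;account-id&amp;gt;:role/LogsToFirehose
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;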

&lt;p&gt;We should see subscription filter for our logs added like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbi9ztftd9bl80k3kvzdc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbi9ztftd9bl80k3kvzdc.png" alt=" " width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step5: Validate logs&lt;/strong&gt;&lt;br&gt;
After creating the subscription filter, check the Firehose stream's monitoring metrics to confirm data is being collected. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4saaodeywglgjctqx9rv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4saaodeywglgjctqx9rv.png" alt=" " width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The metrics confirm that logs are being collected. &lt;/p&gt;

&lt;p&gt;Next, we go to S3, our final destination, to confirm the logs are arriving in the bucket. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21mtpc0i1m6x3tbeztbz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21mtpc0i1m6x3tbeztbz.png" alt=" " width="800" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We should see the logs organized in the bucket by year, month, and day. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9itqh8frxyxld8w7nytc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9itqh8frxyxld8w7nytc.png" alt=" " width="800" height="330"&gt;&lt;/a&gt;&lt;br&gt;
That concludes our goal. &lt;/p&gt;

</description>
      <category>aws</category>
      <category>firehose</category>
      <category>cloudwatch</category>
      <category>logging</category>
    </item>
    <item>
      <title>Enable Aurora logs for security audits</title>
      <dc:creator>Anuj Tyagi</dc:creator>
      <pubDate>Sun, 06 Apr 2025 03:32:31 +0000</pubDate>
      <link>https://dev.to/aws-builders/enable-aurora-logs-for-security-audits-587g</link>
      <guid>https://dev.to/aws-builders/enable-aurora-logs-for-security-audits-587g</guid>
      <description>&lt;p&gt;AWS Aurora provides serverless database capability with enhanced features that you can find here. By default, AWS Aurora enables error logs but audit logs are disabled. When running a database in production, one can collect logs from application and other aws services. For Aurora monitoring, we can check metrics from CloudWatch and more granular metrics by enabling performance insight. However, under certain requirements, we may need to analyze database transactions for which we want to enable audit logs. As an SRE/DevOps or DB admin, one use case I came across is when the analytics team wants to analyze those audit logs. In another situation, in order to follow PCI, PII, SOC2 and more security compliance we need to enable audit logs. &lt;/p&gt;

&lt;p&gt;I am also including the steps to create a cluster, assuming you want to test this in a lab or test account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step1: Creating the Aurora cluster&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Search for Aurora service from the search bar.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjr3dlgx14jnso6dk7p6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjr3dlgx14jnso6dk7p6.png" alt=" " width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: As shared in the above screenshot, I am using the PostgreSQL engine from the available options. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2x001h79ajy06f8k8vc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2x001h79ajy06f8k8vc.png" alt=" " width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, I have selected the dev/test template option. You can change the master username; I am keeping the default. After the cluster is created, the credentials are automatically saved in AWS Secrets Manager, where you can retrieve the password to connect to the database. &lt;/p&gt;

&lt;p&gt;Before clicking &lt;code&gt;create cluster&lt;/code&gt; at the bottom, you will see a Log exports section with two options: Instance log and PostgreSQL log. If you are wondering whether either of these enables audit logs, the answer is no. Let's look at these two options first. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59wn1cypvru3h3zijp79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59wn1cypvru3h3zijp79.png" alt=" " width="592" height="156"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: To send PostgreSQL logs to CloudWatch, you must enable the PostgreSQL log option; otherwise, you won't even see a CloudWatch log group. &lt;/p&gt;

&lt;p&gt;By default, Aurora enables error logs and stores them in the &lt;code&gt;log/postgresql.log&lt;/code&gt; file. This captures errors such as query failures, server errors, and login failures. &lt;/p&gt;

&lt;p&gt;If you select the PostgreSQL log option, the cluster will keep the error logs and, in addition, enable general PostgreSQL logs such as &lt;code&gt;log_statement&lt;/code&gt; and &lt;code&gt;log_duration&lt;/code&gt; and export them to CloudWatch. &lt;/p&gt;

&lt;p&gt;If you enable the instance log option, a separate file named instance.log will be created. You will find it in the Logs and events tab. &lt;/p&gt;

&lt;p&gt;Now we understand the basics of the log options available when creating an Aurora cluster. &lt;/p&gt;

&lt;p&gt;Now, click the Create cluster button at the bottom right.&lt;/p&gt;

&lt;p&gt;Cluster creation will be in progress, and the status will change to Available in a short time. &lt;/p&gt;
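
&lt;p&gt;If you prefer the CLI over the console, a minimal sketch of the same cluster creation with the AWS CLI might look like the following. The identifiers, instance class, and region defaults are placeholder assumptions, not values from this walkthrough.&lt;/p&gt;

```shell
# Hypothetical identifiers; --manage-master-user-password stores the generated
# credentials in Secrets Manager, matching the console behavior described above.
aws rds create-db-cluster \
  --db-cluster-identifier demo-aurora-cluster \
  --engine aurora-postgresql \
  --master-username postgres \
  --manage-master-user-password

# An Aurora cluster also needs at least one instance to serve queries.
aws rds create-db-instance \
  --db-instance-identifier demo-aurora-instance-1 \
  --db-cluster-identifier demo-aurora-cluster \
  --db-instance-class db.t4g.medium \
  --engine aurora-postgresql
```

&lt;p&gt;Both commands return immediately; the cluster and instance then move through a creating state before becoming available, just as in the console.&lt;/p&gt;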

&lt;p&gt;&lt;strong&gt;Step 2: Enable access logs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now that we have a cluster running, open the cluster page.&lt;/p&gt;

&lt;p&gt;Click Parameter groups in the left sidebar, as highlighted in the screenshot. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xddg7qrsnpdmqna431l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xddg7qrsnpdmqna431l.png" alt=" " width="800" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you click on the &lt;code&gt;parameter groups&lt;/code&gt; link, you will see options &lt;code&gt;custom&lt;/code&gt; and &lt;code&gt;Default&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;Default&lt;/code&gt; option will show groups that were created by default during cluster creation. We don't have permission to edit parameters in Default parameter groups.&lt;/p&gt;

&lt;p&gt;So, we will switch back to the &lt;code&gt;custom&lt;/code&gt; option and click on the &lt;code&gt;Create Parameter group&lt;/code&gt; button on the right. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2okwi347kut2l09ikxbk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2okwi347kut2l09ikxbk.png" alt=" " width="800" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, we open our &lt;code&gt;custom&lt;/code&gt; parameter group and click &lt;code&gt;Edit&lt;/code&gt; on the right. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5y3omh173yjkjit9zm0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5y3omh173yjkjit9zm0.png" alt=" " width="800" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my case, I only had to enable &lt;code&gt;log_connections&lt;/code&gt; and &lt;code&gt;log_disconnections&lt;/code&gt;, so I updated them to the binary value 1. This logs every connection and disconnection, keeping a record of user logins and logouts, which is useful for security audits. A connection entry looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOG:  connection authorized: user=app_user database=mydb application=psql host=10.0.1.5

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and a disconnection entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOG:  disconnection: session time: 0:00:05.233 user=app_user database=mydb host=10.0.1.5

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To improve the log line structure, update the &lt;code&gt;log_line_prefix&lt;/code&gt; parameter, changing its default value from&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;%t:%r:%u@%d:[%p]:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h 

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This improves the log line structure and produces output such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025-04-02 15:35:12 [19456]: [3-1] user=app_user,db=finance_db,app=psql,client=10.1.2.3 AUDIT: SESSION,1,1,READ,SELECT,,,SELECT * FROM accounts;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we save these changes and switch back to our cluster page. &lt;/p&gt;

&lt;p&gt;Click on the &lt;code&gt;Modify&lt;/code&gt; option. &lt;/p&gt;

&lt;p&gt;Click the instance and go to the Configuration tab. &lt;br&gt;
Search for the DB parameter group option and change it from the &lt;code&gt;default&lt;/code&gt; parameter group to our &lt;code&gt;custom&lt;/code&gt; one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frbii961zlokwyveywq84.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frbii961zlokwyveywq84.png" alt=" " width="772" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It will take some time for the cluster changes to take effect. &lt;br&gt;
Now check CloudWatch; you will find a log group for Aurora PostgreSQL. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcnqfa399mmcpsmklnq0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcnqfa399mmcpsmklnq0.png" alt=" " width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That concludes this article. We will take it further in the next post. &lt;/p&gt;

&lt;p&gt;For more granular logging, you can use the &lt;strong&gt;pgaudit&lt;/strong&gt; extension, for which there is a detailed guide in the &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.pgaudit.html" rel="noopener noreferrer"&gt;AWS documentation&lt;/a&gt;. &lt;/p&gt;
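
&lt;p&gt;As a teaser for that route, pgaudit is also enabled through the cluster parameter group. A hedged sketch follows; the group name is a placeholder, &lt;code&gt;shared_preload_libraries&lt;/code&gt; is a static parameter requiring a reboot, and &lt;code&gt;pgaudit.log=all&lt;/code&gt; is deliberately broad for a lab setup.&lt;/p&gt;

```shell
# shared_preload_libraries is static, so it only takes effect after a reboot;
# pgaudit.log=all captures every statement class -- narrow it in production.
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name custom-aurora-cluster-pg \
  --parameters "ParameterName=shared_preload_libraries,ParameterValue=pgaudit,ApplyMethod=pending-reboot" \
               "ParameterName=pgaudit.log,ParameterValue=all,ApplyMethod=immediate"
```

&lt;p&gt;After the reboot, run &lt;code&gt;CREATE EXTENSION pgaudit;&lt;/code&gt; in the database; the linked AWS guide covers the full procedure.&lt;/p&gt;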

</description>
      <category>aurora</category>
      <category>aws</category>
      <category>logging</category>
      <category>security</category>
    </item>
    <item>
      <title>Learn how to use OIDC token instead of access+secret keys.</title>
      <dc:creator>Anuj Tyagi</dc:creator>
      <pubDate>Wed, 19 Mar 2025 17:29:08 +0000</pubDate>
      <link>https://dev.to/sudo_anuj/learn-how-to-use-oidc-token-instead-of-accesssecret-keys-9hk</link>
      <guid>https://dev.to/sudo_anuj/learn-how-to-use-oidc-token-instead-of-accesssecret-keys-9hk</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/sudo_anuj" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F549060%2F1a5bb9b8-7bdd-499c-9b95-b664d65ffb26.jpg" alt="sudo_anuj"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/sudo_anuj/cicd-with-secure-authentication-using-github-actions-and-aws-ecr-4d7d" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Secure continuous Integration with Dockerfile, Github Actions and AWS ECR&lt;/h2&gt;
      &lt;h3&gt;Anuj Tyagi ・ Mar 3 '25&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#docker&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#awslambda&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#githubactions&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#awsecr&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>docker</category>
      <category>awslambda</category>
      <category>githubactions</category>
      <category>awsecr</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Anuj Tyagi</dc:creator>
      <pubDate>Wed, 19 Mar 2025 17:27:14 +0000</pubDate>
      <link>https://dev.to/sudo_anuj/-2bo7</link>
      <guid>https://dev.to/sudo_anuj/-2bo7</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/sudo_anuj" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F549060%2F1a5bb9b8-7bdd-499c-9b95-b664d65ffb26.jpg" alt="sudo_anuj"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/sudo_anuj/docker-cmd-vs-entrypoint-understanding-the-differences-apc" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Docker CMD vs ENTRYPOINT: Understanding the Differences&lt;/h2&gt;
      &lt;h3&gt;Anuj Tyagi ・ Mar 3 '25&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#docker&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#dockerfile&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#cmd&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#container&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>docker</category>
      <category>dockerfile</category>
      <category>cmd</category>
      <category>container</category>
    </item>
    <item>
      <title>Docker CMD vs ENTRYPOINT: Understanding the Differences</title>
      <dc:creator>Anuj Tyagi</dc:creator>
      <pubDate>Mon, 03 Mar 2025 02:06:48 +0000</pubDate>
      <link>https://dev.to/sudo_anuj/docker-cmd-vs-entrypoint-understanding-the-differences-apc</link>
      <guid>https://dev.to/sudo_anuj/docker-cmd-vs-entrypoint-understanding-the-differences-apc</guid>
      <description>&lt;p&gt;Docker is a platform used to manage applications within containers. To run an application in a container, a Docker image needs to be created first, which is done using a Dockerfile. A Dockerfile is a text document containing one or more instructions that Docker executes to build the image. Two important instructions are CMD and ENTRYPOINT, which define how a container operates.&lt;/p&gt;

&lt;p&gt;In this blog post, we will explore CMD and ENTRYPOINT, their differences, and when to use each.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You will need a code editor and Docker Desktop application on your device to follow through with the practical examples demonstrated in this guide.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is CMD in Dockerfile?
&lt;/h2&gt;

&lt;p&gt;CMD specifies the instruction that takes effect by default. It is executed when no command is supplied while running the container.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example of CMD in a Dockerfile
&lt;/h3&gt;

&lt;p&gt;Make a directory called docker-demo, and in there, a new file called Dockerfile with this content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["echo", "Welcome to AItechNav!"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this Dockerfile:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;FROM alpine&lt;/code&gt; sets the base image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CMD ["echo", "Welcome to AItechNav!"]&lt;/code&gt; defines the default command to run inside the container.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Build Image and Run the container
&lt;/h3&gt;

&lt;p&gt;To create a Docker image, execute the following command inside the &lt;code&gt;docker-demo&lt;/code&gt; directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; custom-image:v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the image is built, verify it using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker image list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, run a container using the image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker container run custom-image:v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Welcome to AItechNav!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Overriding CMD at Runtime
&lt;/h3&gt;

&lt;p&gt;CMD allows users to override the default command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker container run custom-image:v1 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Empowering AI Enthusiasts!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Empowering AI Enthusiasts!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows that CMD acts as a default but can be replaced when running a container.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is ENTRYPOINT in Dockerfile?
&lt;/h2&gt;

&lt;p&gt;ENTRYPOINT, like CMD, specifies the command to execute when running a container. However, unlike CMD, the ENTRYPOINT directive is not replaced by arguments passed at runtime; instead, those arguments are appended to it. (It can still be replaced explicitly with the &lt;code&gt;--entrypoint&lt;/code&gt; flag of &lt;code&gt;docker run&lt;/code&gt;.)&lt;/p&gt;

&lt;h3&gt;
  
  
  Example of ENTRYPOINT in a Dockerfile
&lt;/h3&gt;

&lt;p&gt;Modify the &lt;code&gt;Dockerfile&lt;/code&gt; to use ENTRYPOINT instead of CMD:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["echo", "Welcome to AItechNav!"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Building and Running the Docker Image
&lt;/h3&gt;

&lt;p&gt;Build the new image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; custom-image:v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the image and run a container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker container run custom-image:v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Welcome to AItechNav!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Attempting to Override ENTRYPOINT at Runtime
&lt;/h3&gt;

&lt;p&gt;Unlike CMD, if we attempt to override ENTRYPOINT:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker container run custom-image:v2 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"AI and Cloud Learning Hub!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Welcome to AItechNav! AI and Cloud Learning Hub!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The argument is appended instead of replacing the default command.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using CMD &amp;amp; ENTRYPOINT Together
&lt;/h2&gt;

&lt;p&gt;You can combine CMD and ENTRYPOINT to define a fixed command while allowing users to provide different arguments.&lt;/p&gt;

&lt;p&gt;Modify the &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["echo", "Welcome to"]&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["AItechNav!"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Building and Running the Image
&lt;/h3&gt;

&lt;p&gt;Build the image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; custom-image:v3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker container run custom-image:v3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Welcome to AItechNav!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Overriding CMD but Not ENTRYPOINT
&lt;/h3&gt;

&lt;p&gt;Run the container with a custom argument:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker container run custom-image:v3 &lt;span class="s2"&gt;"the future of AI!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Welcome to the future of AI!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ENTRYPOINT (&lt;code&gt;echo Welcome to&lt;/code&gt;) remains unchanged.&lt;/li&gt;
&lt;li&gt;CMD (&lt;code&gt;AItechNav!&lt;/code&gt;) is overridden by &lt;code&gt;the future of AI!&lt;/code&gt; at runtime.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Difference Between CMD &amp;amp; ENTRYPOINT
&lt;/h2&gt;

&lt;p&gt;The table below highlights their differences:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;CMD&lt;/th&gt;
&lt;th&gt;ENTRYPOINT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Provides a default command&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Allows command override at runtime&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (arguments are appended)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When to Use CMD vs. ENTRYPOINT
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use CMD&lt;/strong&gt; when you want to provide a default command but allow users to override it at runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use ENTRYPOINT&lt;/strong&gt; when you want to enforce a specific command while allowing additional parameters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use both CMD and ENTRYPOINT together&lt;/strong&gt; when you want a fixed command but allow users to modify arguments.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;CMD and ENTRYPOINT are essential Dockerfile instructions that define how a container runs. CMD provides flexibility by allowing users to override the default command, whereas ENTRYPOINT enforces a fixed command and appends additional arguments. By understanding their differences and best use cases, you can structure your Dockerfiles efficiently for various application needs.&lt;/p&gt;

&lt;p&gt;Happy containerizing!&lt;/p&gt;

</description>
      <category>docker</category>
      <category>dockerfile</category>
      <category>cmd</category>
      <category>container</category>
    </item>
    <item>
      <title>Understanding Dockerfile: A Guide to Building Efficient Docker Images</title>
      <dc:creator>Anuj Tyagi</dc:creator>
      <pubDate>Mon, 03 Mar 2025 01:57:30 +0000</pubDate>
      <link>https://dev.to/sudo_anuj/understanding-dockerfile-a-guide-to-building-efficient-docker-images-2npg</link>
      <guid>https://dev.to/sudo_anuj/understanding-dockerfile-a-guide-to-building-efficient-docker-images-2npg</guid>
      <description>&lt;p&gt;At the core of Docker's containerization process lies the Dockerfile, a powerful tool that automates the creation of Docker images. In this blog post, we will explore what a Dockerfile is, how it works, and best practices to optimize your builds. Let's dive in!&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Dockerfile?
&lt;/h2&gt;

&lt;p&gt;A Dockerfile is a script-like text file containing instructions that define how a Docker image should be built. Each line represents a specific command followed by arguments, forming a sequential process that constructs the final image. By convention, commands are written in uppercase to improve readability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example Dockerfile for a Python Application
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.10&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt /app&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . /app&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "app.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How the Build Process Works
&lt;/h3&gt;

&lt;p&gt;When you build an image from this Dockerfile, the following steps occur:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Base Image Selection&lt;/strong&gt;: Docker searches for the specified base image (&lt;code&gt;python:3.10&lt;/code&gt;). If it's not available locally, it fetches it from Docker Hub.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Setting Up the Working Directory&lt;/strong&gt;: The &lt;code&gt;WORKDIR /app&lt;/code&gt; command creates a directory inside the container where subsequent commands will execute.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Copying Dependencies File&lt;/strong&gt;: The &lt;code&gt;COPY requirements.txt /app&lt;/code&gt; instruction transfers the dependencies file to the container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Installing Dependencies&lt;/strong&gt;: The &lt;code&gt;RUN pip install --no-cache-dir -r requirements.txt&lt;/code&gt; command installs all required Python packages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Copying Application Files&lt;/strong&gt;: The &lt;code&gt;COPY . /app&lt;/code&gt; command copies all remaining application files into the container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Defining the Default Command&lt;/strong&gt;: The &lt;code&gt;CMD&lt;/code&gt; instruction specifies the default command to run inside the container, starting the application with &lt;code&gt;python app.py&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Common Dockerfile Instructions
&lt;/h2&gt;

&lt;p&gt;Here are some key Dockerfile commands and their purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FROM&lt;/strong&gt;: Specifies the base image for the build process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ADD / COPY&lt;/strong&gt;: Transfers files from the host to the container. &lt;code&gt;ADD&lt;/code&gt; can handle remote URLs and extract compressed files, but &lt;code&gt;COPY&lt;/code&gt; is recommended for local file transfers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WORKDIR&lt;/strong&gt;: Defines the working directory for subsequent commands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RUN&lt;/strong&gt;: Executes commands during the image build process, such as installing software packages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CMD / ENTRYPOINT&lt;/strong&gt;: Determines the default command executed when the container starts. &lt;code&gt;ENTRYPOINT&lt;/code&gt; is not replaced by runtime arguments, while &lt;code&gt;CMD&lt;/code&gt; can be overridden.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Understanding Dockerfile Layers
&lt;/h2&gt;

&lt;p&gt;Each command in a Dockerfile creates a new layer in the final image. These layers are stacked, and Docker efficiently caches them to speed up future builds. You can inspect the image layers using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;history&lt;/span&gt; &amp;lt;IMAGE_NAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or check the number of layers with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker inspect &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s1"&gt;'{{json .RootFS.Layers}}'&lt;/span&gt; &amp;lt;IMAGE_NAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Leveraging Docker's Build Cache
&lt;/h2&gt;

&lt;p&gt;Docker optimizes image builds using a caching mechanism. When a layer remains unchanged, Docker reuses the cached version instead of rebuilding it. However, if an instruction is modified, all subsequent layers are rebuilt. This behavior impacts how Dockerfiles should be structured to minimize unnecessary rebuilds.&lt;/p&gt;

&lt;p&gt;For example, consider a build process where the initial build takes &lt;strong&gt;1244.2 seconds&lt;/strong&gt;, but subsequent builds (without modifications) reduce the time to &lt;strong&gt;6.9 seconds&lt;/strong&gt; due to caching.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices for Writing Dockerfiles
&lt;/h2&gt;

&lt;p&gt;To enhance efficiency, follow these best practices:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Use a &lt;code&gt;.dockerignore&lt;/code&gt; File
&lt;/h3&gt;

&lt;p&gt;Similar to &lt;code&gt;.gitignore&lt;/code&gt;, a &lt;code&gt;.dockerignore&lt;/code&gt; file helps exclude unnecessary files from the build context, reducing image size and improving performance.&lt;/p&gt;
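
&lt;p&gt;For the Python image in this guide, a minimal .dockerignore might look like the following; the entries are typical examples, not a prescribed list.&lt;/p&gt;

```plaintext
# Keep VCS data, caches, local env files, and virtualenvs out of the build context
.git
.gitignore
__pycache__/
*.pyc
.env
venv/
```

&lt;p&gt;Anything matched here is never sent to the Docker daemon, so it can neither bloat the image nor invalidate the build cache.&lt;/p&gt;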

&lt;h3&gt;
  
  
  2. Minimize Image Layers
&lt;/h3&gt;

&lt;p&gt;Fewer layers result in faster builds. Consolidating multiple &lt;code&gt;RUN&lt;/code&gt; commands into a single command reduces the number of layers. Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nginx
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get clean
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nginx &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt-get clean
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach maintains readability while optimizing build efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Optimize Layer Order for Caching
&lt;/h3&gt;

&lt;p&gt;Since Docker rebuilds layers sequentially, placing frequently changing instructions at the end improves cache utilization. Consider this inefficient order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.10&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . /app&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "app.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, any code change invalidates the cache, leading to unnecessary reinstallation of dependencies. Instead, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.10&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt /app&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . /app&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "app.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By copying &lt;code&gt;requirements.txt&lt;/code&gt; first, dependencies are installed before the entire codebase is copied. This ensures that dependency installations are only re-run when &lt;code&gt;requirements.txt&lt;/code&gt; changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this guide, we explored Dockerfiles, their core commands, how layers affect builds, and best practices for optimizing Docker images. By structuring Dockerfiles efficiently, you can improve build speed, reduce image size, and streamline the containerization process. Happy coding!&lt;/p&gt;

</description>
      <category>docker</category>
      <category>dockerfile</category>
      <category>python</category>
    </item>
  </channel>
</rss>
