<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rodrigo Fernandes</title>
    <description>The latest articles on DEV Community by Rodrigo Fernandes (@rodrigofrs13).</description>
    <link>https://dev.to/rodrigofrs13</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F948172%2F684c5d6a-06f1-4c57-8904-dc7fa8383758.png</url>
      <title>DEV Community: Rodrigo Fernandes</title>
      <link>https://dev.to/rodrigofrs13</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rodrigofrs13"/>
    <language>en</language>
    <item>
      <title>Observability for Resilience on Amazon EKS with OpenTelemetry + Datadog (Free Tier)</title>
      <dc:creator>Rodrigo Fernandes</dc:creator>
      <pubDate>Wed, 24 Dec 2025 15:21:31 +0000</pubDate>
      <link>https://dev.to/aws-builders/observability-for-resilience-on-amazon-eks-with-opentelemetry-datadog-free-tier-4a6c</link>
      <guid>https://dev.to/aws-builders/observability-for-resilience-on-amazon-eks-with-opentelemetry-datadog-free-tier-4a6c</guid>
      <description>&lt;p&gt;&lt;strong&gt;Building Dashboards That Truly Matter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Resilience in cloud-native applications is not just about restarting pods or running across multiple Availability Zones.&lt;/p&gt;

&lt;p&gt;Without deep observability, you don’t know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;where latency increases&lt;/li&gt;
&lt;li&gt;which service degrades first&lt;/li&gt;
&lt;li&gt;whether autoscaling actually works&lt;/li&gt;
&lt;li&gt;how long the system takes to recover&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words: &lt;strong&gt;without observability, you test resilience in the dark.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this article, you will learn how to build &lt;strong&gt;a complete observability platform for resilience on Amazon EKS&lt;/strong&gt;, using only &lt;strong&gt;open-source tools&lt;/strong&gt; and the &lt;strong&gt;Datadog free tier&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;🧭 &lt;strong&gt;What You Will Build&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By the end of this article, you will have:&lt;/p&gt;

&lt;p&gt;✅ An EKS cluster ready for testing&lt;br&gt;
✅ OpenTelemetry Collector deployed via Helm&lt;br&gt;
✅ Metrics, logs, and traces exported&lt;br&gt;
✅ Datadog configured (free tier)&lt;br&gt;
✅ Dashboards focused on real resilience&lt;br&gt;
✅ A foundation ready for Chaos Engineering&lt;/p&gt;

&lt;p&gt;🧠 &lt;strong&gt;High-Level Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The architecture follows the modern cloud-native observability pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instrumented (or auto-instrumented) applications&lt;/li&gt;
&lt;li&gt;OpenTelemetry Collector as the central layer&lt;/li&gt;
&lt;li&gt;Datadog as the visualization and APM backend&lt;/li&gt;
&lt;li&gt;CloudWatch as a native AWS complement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 &lt;strong&gt;Metrics, logs, and traces flow in a unified way&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45caqdt8b76duiw81rts.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45caqdt8b76duiw81rts.png" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;1️⃣ &lt;strong&gt;Why Observability Is Fundamental to Resilience&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Resilience is not just about staying up.&lt;br&gt;
It is about &lt;strong&gt;understanding system behavior under failure.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With proper observability, you can answer questions such as:&lt;/p&gt;

&lt;p&gt;✔ &lt;strong&gt;Does latency increase during failures?&lt;/strong&gt;&lt;br&gt;
Chaos tests almost always impact response time. Without metrics, this goes unnoticed.&lt;/p&gt;

&lt;p&gt;✔ &lt;strong&gt;Does the system fail gracefully?&lt;/strong&gt;&lt;br&gt;
5xx and 4xx errors show whether the application degrades correctly or completely breaks.&lt;/p&gt;

&lt;p&gt;✔ &lt;strong&gt;Is the bottleneck code or infrastructure?&lt;/strong&gt;&lt;br&gt;
CPU, memory, I/O, and network saturation tell the truth.&lt;/p&gt;

&lt;p&gt;✔ &lt;strong&gt;Where is the bottleneck between microservices?&lt;/strong&gt;&lt;br&gt;
Distributed traces show exactly where time is spent.&lt;/p&gt;

&lt;p&gt;✔ &lt;strong&gt;Is Kubernetes reacting properly?&lt;/strong&gt;&lt;br&gt;
Events, restarts, and scheduling behavior reveal a lot about resilience.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You cannot improve what you cannot observe.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;2️⃣ &lt;strong&gt;Creating the EKS Cluster with eksctl&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For labs, testing, and technical articles, eksctl is fast and efficient:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl create cluster \
  --name observability-eks \
  --region us-east-1 \
  --version 1.30 \
  --nodegroup-name ng-default \
  --node-type t3.medium \
  --nodes 2 \
  --nodes-min 2 \
  --nodes-max 4 \
  --managed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A functional EKS cluster&lt;/li&gt;
&lt;li&gt;A managed node group&lt;/li&gt;
&lt;li&gt;IAM automatically configured&lt;/li&gt;
&lt;li&gt;kubeconfig ready to use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3️⃣ &lt;strong&gt;Minimal Application Instrumentation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Even without deep instrumentation, it is already possible to extract significant value.&lt;/p&gt;

&lt;p&gt;📌 &lt;strong&gt;Automatic Kubernetes Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Collected via kubelet and cAdvisor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU and memory per pod&lt;/li&gt;
&lt;li&gt;Restarts&lt;/li&gt;
&lt;li&gt;Network usage&lt;/li&gt;
&lt;li&gt;Scheduling latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📌 &lt;strong&gt;Automatic Tracing (Auto-Instrumentation)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenTelemetry supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Java&lt;/li&gt;
&lt;li&gt;Node.js&lt;/li&gt;
&lt;li&gt;Python&lt;/li&gt;
&lt;li&gt;Go (partial)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without changing the code, you already get distributed traces.&lt;/p&gt;
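&lt;p&gt;As a sketch of what this can look like in practice (assuming the OpenTelemetry Operator is installed in the cluster; the &lt;code&gt;checkout&lt;/code&gt; service, image, and Collector endpoint below are hypothetical):&lt;/p&gt;

```yaml
# Instrumentation resource telling the operator where to send telemetry.
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: default-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector.observability:4317
---
# Opt a Deployment's pods into Node.js auto-instrumentation via annotation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
spec:
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
      annotations:
        instrumentation.opentelemetry.io/inject-nodejs: "true"
    spec:
      containers:
        - name: checkout
          image: example/checkout:latest   # hypothetical image
```

&lt;p&gt;The operator injects the instrumentation agent at pod startup, so traces start flowing without touching application code.&lt;/p&gt;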

&lt;p&gt;📌 &lt;strong&gt;Structured Logs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Recommended format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "timestamp": "2025-01-01T12:34:56Z",
  "message": "Order created",
  "trace_id": "abc123",
  "span_id": "def456",
  "service": "checkout"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This enables direct correlation between logs and traces.&lt;/p&gt;

&lt;p&gt;4️⃣ &lt;strong&gt;Deploying the OpenTelemetry Collector with Helm&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The OpenTelemetry Collector acts as the central observability layer.&lt;/p&gt;

&lt;p&gt;It receives data via OTLP, processes it, and exports it to Datadog.&lt;/p&gt;
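&lt;p&gt;The exact values depend on your chart, but a minimal Collector configuration for this pipeline could look like the sketch below (the &lt;code&gt;datadog&lt;/code&gt; exporter ships with the Collector &lt;em&gt;contrib&lt;/em&gt; distribution; the API key is read from an environment variable):&lt;/p&gt;

```yaml
receivers:
  otlp:                    # receive metrics, logs, and traces over OTLP
    protocols:
      grpc:
      http:

processors:
  batch:                   # batch telemetry before export

exporters:
  datadog:
    api:
      site: datadoghq.com
      key: ${env:DD_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]
```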

&lt;p&gt;&lt;strong&gt;Installation via Helm&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install otel-collector ./otel-datadog \
  --namespace observability \
  --create-namespace \
  --set datadog.apiKey=&amp;lt;YOUR_API_KEY&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Collector starts collecting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metrics (Prometheus / Kubernetes)&lt;/li&gt;
&lt;li&gt;Logs&lt;/li&gt;
&lt;li&gt;Traces&lt;/li&gt;
&lt;li&gt;Cluster events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;5️⃣ &lt;strong&gt;Datadog Free Tier&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Datadog Free Tier is surprisingly powerful:&lt;/p&gt;

&lt;p&gt;✔ Up to 5 free hosts&lt;br&gt;
✔ APM included&lt;br&gt;
✔ Unlimited dashboards&lt;br&gt;
✔ Automatic Service Map&lt;br&gt;
✔ Basic alerts&lt;/p&gt;

&lt;p&gt;This is more than enough for resilience and chaos testing.&lt;/p&gt;

&lt;p&gt;6️⃣ &lt;strong&gt;Dashboards That Truly Matter for Resilience&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the key point of the article: what to monitor.&lt;/p&gt;

&lt;p&gt;📊 &lt;strong&gt;6.1 Service Latency (APM)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Metric:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;trace.&amp;lt;service&amp;gt;.request.duration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Helps identify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;failure impact&lt;/li&gt;
&lt;li&gt;progressive degradation&lt;/li&gt;
&lt;li&gt;bottlenecks between services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚨 &lt;strong&gt;6.2 5xx and 4xx Errors&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http.server.request.error.count
trace.&amp;lt;service&amp;gt;.errors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A direct indicator of user-perceived failure.&lt;/p&gt;

&lt;p&gt;🔥 &lt;strong&gt;6.3 CPU and Memory Saturation per Pod&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubernetes.pod.cpu.usage.total
kubernetes.pod.memory.usage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Essential for validating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HPA&lt;/li&gt;
&lt;li&gt;Karpenter&lt;/li&gt;
&lt;li&gt;poorly sized requests and limits&lt;/li&gt;
&lt;/ul&gt;
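&lt;p&gt;To make autoscaling observable, you need something to observe. A minimal HPA sketch for a hypothetical &lt;code&gt;checkout&lt;/code&gt; Deployment, whose reactions you can then watch on the CPU and memory dashboards:&lt;/p&gt;

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout        # hypothetical deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```

&lt;p&gt;Under load, the per-pod CPU panels should show utilization dropping back as replicas are added; if they don't, the HPA or the requests are misconfigured.&lt;/p&gt;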

&lt;p&gt;⚙️ &lt;strong&gt;6.4 Event Loop (Node.js)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Custom metric:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;event_loop_delay
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Shows when the application is alive but unusable.&lt;/p&gt;

&lt;p&gt;🗺️ &lt;strong&gt;6.5 Service Map&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Automatically visualizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;broken dependencies&lt;/li&gt;
&lt;li&gt;increased latency&lt;/li&gt;
&lt;li&gt;critical services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of Datadog’s most powerful features.&lt;/p&gt;

&lt;p&gt;🔄 &lt;strong&gt;6.6 Kubernetes Events and Restarts&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubernetes.pod.restart.count
kubernetes.event.count
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Detects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CrashLoopBackOff&lt;/li&gt;
&lt;li&gt;OOMKilled&lt;/li&gt;
&lt;li&gt;readiness failures&lt;/li&gt;
&lt;li&gt;scheduling issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📊 &lt;strong&gt;Datadog Dashboard — EKS Resilience Observability&lt;br&gt;
How to Use&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Datadog → Dashboards&lt;/li&gt;
&lt;li&gt;New Dashboard&lt;/li&gt;
&lt;li&gt;Import JSON&lt;/li&gt;
&lt;li&gt;Paste the content below&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;✅ &lt;strong&gt;Covered Items (Section 6 Checklist)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✔ 6.1 Service latency (APM)&lt;br&gt;
✔ 6.2 5xx and 4xx errors&lt;br&gt;
✔ 6.3 CPU per pod&lt;br&gt;
✔ 6.3 Memory per pod&lt;br&gt;
✔ 6.4 Event Loop (Node.js)&lt;br&gt;
✔ 6.5 Service Map (operational reference)&lt;br&gt;
✔ 6.6 Pod restarts&lt;br&gt;
✔ 6.6 Kubernetes events&lt;/p&gt;

&lt;p&gt;🧩 &lt;strong&gt;Dashboard JSON&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "title": "EKS Resilience Observability",
  "description": "Dashboards focused on EKS resilience using OpenTelemetry + Datadog",
  "layout_type": "ordered",
  "widgets": [
    {
      "definition": {
        "type": "timeseries",
        "title": "APM - Latency by Service",
        "requests": [
          {
            "q": "avg:trace.*.request.duration{*} by {service}",
            "display_type": "line"
          }
        ]
      }
    },
    {
      "definition": {
        "type": "timeseries",
        "title": "APM - 5xx Error Rate",
        "requests": [
          {
            "q": "sum:http.server.request.error.count{status:5xx} by {service}",
            "display_type": "line"
          }
        ]
      }
    },
    {
      "definition": {
        "type": "timeseries",
        "title": "APM - 4xx Error Rate",
        "requests": [
          {
            "q": "sum:http.server.request.error.count{status:4xx} by {service}",
            "display_type": "line"
          }
        ]
      }
    },
    {
      "definition": {
        "type": "timeseries",
        "title": "Kubernetes - CPU per Pod",
        "requests": [
          {
            "q": "avg:kubernetes.pod.cpu.usage.total{*} by {pod_name,namespace}",
            "display_type": "line"
          }
        ]
      }
    },
    {
      "definition": {
        "type": "timeseries",
        "title": "Kubernetes - Memory per Pod",
        "requests": [
          {
            "q": "avg:kubernetes.pod.memory.usage{*} by {pod_name,namespace}",
            "display_type": "line"
          }
        ]
      }
    },
    {
      "definition": {
        "type": "timeseries",
        "title": "Node.js - Event Loop Delay",
        "requests": [
          {
            "q": "avg:event_loop_delay{*} by {service}",
            "display_type": "line"
          }
        ]
      }
    },
    {
      "definition": {
        "type": "timeseries",
        "title": "Kubernetes - Pod Restarts",
        "requests": [
          {
            "q": "sum:kubernetes.pod.restart.count{*} by {pod_name,namespace}",
            "display_type": "bars"
          }
        ]
      }
    },
    {
      "definition": {
        "type": "timeseries",
        "title": "Kubernetes - Events by Type",
        "requests": [
          {
            "q": "sum:kubernetes.event.count{*} by {reason}",
            "display_type": "bars"
          }
        ]
      }
    },
    {
      "definition": {
        "type": "note",
        "content": "🔎 Use the **Datadog Service Map (APM → Service Map)** to visualize dependencies, bottlenecks, and communication failures between microservices.",
        "background_color": "blue",
        "font_size": "16"
      }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🧠 &lt;strong&gt;How This Dashboard Helps with RESILIENCE&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Signal&lt;/th&gt;&lt;th&gt;What It Validates&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Real impact of failures&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;5xx Errors&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;User-perceived failure&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;4xx Errors&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Controlled degradation&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;CPU / Memory&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Bottlenecks and autoscaling&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Event Loop&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;App alive but degraded&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Restarts&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Pod stability&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Kubernetes Events&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Root cause&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Service Map&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Critical dependencies&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;7️⃣ &lt;strong&gt;Complementing with CloudWatch&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Even when using Datadog, CloudWatch remains useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;control plane logs&lt;/li&gt;
&lt;li&gt;VPC CNI&lt;/li&gt;
&lt;li&gt;EKS events&lt;/li&gt;
&lt;li&gt;cluster scaling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a hybrid and complete observability approach.&lt;/p&gt;

&lt;p&gt;8️⃣ &lt;strong&gt;Validating Resilience in Practice&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With everything observable, you can test:&lt;/p&gt;

&lt;p&gt;✔ &lt;strong&gt;Node failure&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pod redistribution&lt;/li&gt;
&lt;li&gt;latency impact&lt;/li&gt;
&lt;li&gt;recovery time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✔ &lt;strong&gt;Pod failure&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;perceived errors&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;fallback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✔ &lt;strong&gt;Network failures&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inter-service timeouts&lt;/li&gt;
&lt;li&gt;artificial latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✔ &lt;strong&gt;Traffic spikes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;saturation&lt;/li&gt;
&lt;li&gt;autoscaling behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now you measure, rather than assume.&lt;/p&gt;
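&lt;p&gt;The article does not prescribe a chaos tool, but as one possible starting point, a pod-kill experiment with Chaos Mesh (assuming Chaos Mesh is installed in the cluster; the label selector targets a hypothetical &lt;code&gt;ui&lt;/code&gt; service) might look like:&lt;/p&gt;

```yaml
# Kill one random pod matching the selector, then watch the dashboards:
# error rate, latency, restarts, and recovery time.
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: ui-pod-kill
  namespace: chaos-testing
spec:
  action: pod-kill
  mode: one                      # affect a single matching pod
  selector:
    namespaces:
      - default
    labelSelectors:
      app.kubernetes.io/name: ui   # hypothetical label
```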

&lt;p&gt;9️⃣ &lt;strong&gt;Conclusion — Observability Is a Pillar of Resilience&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Resilience is not luck.&lt;br&gt;
It is &lt;strong&gt;data-driven engineering.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With OpenTelemetry + Datadog, even on the free tier, you get:&lt;/p&gt;

&lt;p&gt;✅ deep system visibility&lt;br&gt;
✅ correlation between metrics, logs, and traces&lt;br&gt;
✅ actionable dashboards&lt;br&gt;
✅ a solid foundation for Chaos Engineering&lt;br&gt;
✅ continuous feedback for improvement&lt;/p&gt;

&lt;p&gt;If you want to &lt;strong&gt;build real resilience on Amazon EKS&lt;/strong&gt;, the journey starts with observability.&lt;/p&gt;

</description>
      <category>eks</category>
      <category>kubernetes</category>
      <category>datadog</category>
      <category>aws</category>
    </item>
    <item>
      <title>🚀 EKS Auto Mode in Practice</title>
      <dc:creator>Rodrigo Fernandes</dc:creator>
      <pubDate>Mon, 15 Dec 2025 11:29:01 +0000</pubDate>
      <link>https://dev.to/aws-builders/eks-auto-mode-na-pratica-2c6f</link>
      <guid>https://dev.to/aws-builders/eks-auto-mode-na-pratica-2c6f</guid>
      <description>&lt;p&gt;&lt;strong&gt;Amazon EKS Auto Mode&lt;/strong&gt; represents a significant step forward in Kubernetes cluster operations, removing manual infrastructure management, provisioning compute on demand, and allowing you to focus exclusively on workloads.&lt;/p&gt;

&lt;p&gt;In this &lt;em&gt;hands-on&lt;/em&gt; guide, you will learn how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a cluster with Auto Mode already enabled using &lt;code&gt;eksctl&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Deploy the &lt;strong&gt;Retail Store Sample App&lt;/strong&gt;, the official AWS application for testing real workloads&lt;/li&gt;
&lt;li&gt;Analyze how Auto Mode works, including provisioning, scaling, and Pod distribution&lt;/li&gt;
&lt;li&gt;Run hands-on tests of the cluster's resilience and automatic behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📌 Configuration files for this article:&lt;br&gt;
👉 &lt;a href="https://github.com/rodrigofrs13/eks-auto-mode-na-pratica" rel="noopener noreferrer"&gt;https://github.com/rodrigofrs13/eks-auto-mode-na-pratica&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🧩 1. What Is EKS Auto Mode?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;EKS Auto Mode automatically performs cluster operation tasks, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Selecting and provisioning compute on demand&lt;/li&gt;
&lt;li&gt;Autoscaling and rescheduling Pods&lt;/li&gt;
&lt;li&gt;Managing AMIs, patches, and updates&lt;/li&gt;
&lt;li&gt;Automatically choosing the best instance types&lt;/li&gt;
&lt;li&gt;Reducing costs through intelligent optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You submit Pods → Auto Mode provisions the infrastructure → the workload runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⚙️ 2. Creating the Cluster with Auto Mode Using eksctl&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;📄 File conf-cluster-eks.yaml&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
    name: cluster-eks
    region: us-east-1
    version: '1.32'
    tags:
      auto-mode: "enabled"
      graviton-enabled: "true"
      spot-instances-enabled: "true"
      cost-optimization: "enabled"
      architecture: "multi-arch"
      environment: "dev"
      owner: "devops-team"
      provisioned-by: "eksctl"


availabilityZones: ["us-east-1a", "us-east-1b", "us-east-1c"]    

vpc:
  cidr: "10.0.0.0/16"
  nat:
    gateway: Single
  clusterEndpoints:
    publicAccess: true
    privateAccess: true    

cloudWatch:
  clusterLogging:
    enableTypes: ["api", "audit", "authenticator", "controllerManager", "scheduler"]
    logRetentionInDays: 1

autoModeConfig:
    enabled: true
    nodePools: 
      - general-purpose
      - system    

iam:
  withOIDC: true



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;▶️ Creating the cluster&lt;br&gt;
&lt;strong&gt;📄 File 01-setup-cluster-eks.sh&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sh 01-setup-cluster-eks.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;🏬 3. Deploying the Retail Store Sample App&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Retail Store Sample App simulates an online store composed of multiple microservices, ideal for testing automatic provisioning.&lt;/p&gt;

&lt;p&gt;▶️ Install the application&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f https://github.com/aws-containers/retail-store-sample-app/releases/latest/download/kubernetes.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;▶️ Wait for the deployments to become available&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl wait --for=condition=available deployments --all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;▶️ Access the application via port-forward&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward $(kubectl get pods \
 --selector=app.kubernetes.io/name=ui -o jsonpath='{.items[0].metadata.name}') 8080:8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The application will be available at:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="http://localhost:8080/" rel="noopener noreferrer"&gt;http://localhost:8080/&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftxf4kltctnc5o2oj0925.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftxf4kltctnc5o2oj0925.png" alt=" " width="800" height="676"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✅ 4. Testing and Analyzing the EKS Auto Mode Configuration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now let's validate how Auto Mode is provisioning, scaling, and distributing Pods and compute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔍 4.1. Inspecting the EKS Auto Mode Node Pools&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Auto Mode organizes the infrastructure into pools similar to Karpenter's:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;general-purpose&lt;/strong&gt; → for application workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;system&lt;/strong&gt; → reserved for system workloads&lt;/li&gt;
&lt;/ul&gt;
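&lt;p&gt;Beyond the built-in pools, EKS Auto Mode also accepts custom NodePools through the Karpenter API. A sketch (the pool name and requirements below are illustrative, echoing the Spot/Graviton tags used in the cluster config):&lt;/p&gt;

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-arm64          # hypothetical custom pool
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com   # Auto Mode's built-in NodeClass
        kind: NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]         # prefer Spot capacity
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]        # Graviton instances
```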

&lt;p&gt;▶️ List the general-purpose worker nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get nodes -l karpenter.sh/nodepool=general-purpose
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;▶️ List the system worker nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get nodes -l karpenter.sh/nodepool=system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;🔍 4.2. Viewing Pod distribution across the NodePools&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for node in $(kubectl get nodes -l karpenter.sh/nodepool=general-purpose -o custom-columns=NAME:.metadata.name --no-headers); do
  echo "Pods on $node:"
  kubectl get pods --all-namespaces --field-selector spec.nodeName=$node
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets you analyze:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load balancing across nodes&lt;/li&gt;
&lt;li&gt;Number of Pods per node&lt;/li&gt;
&lt;li&gt;Allocation zones&lt;/li&gt;
&lt;li&gt;How Auto Mode reacts to the workload&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🔍 4.3. Analyzing Pod scheduling&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -o wide -A

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will be able to evaluate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which node each Pod is running on&lt;/li&gt;
&lt;li&gt;IP, node, state, and restarts&lt;/li&gt;
&lt;li&gt;Automatic allocation patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;📈 4.4. Simulating a load increase&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Increase the replicas to trigger automatic provisioning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl scale deployment ui --replicas=10
kubectl scale deployment carts --replicas=10
kubectl scale deployment catalogue --replicas=10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Monitor in real time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -A -w
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;🔍 Checking Pod distribution across the NodePools again&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for node in $(kubectl get nodes -l karpenter.sh/nodepool=general-purpose -o custom-columns=NAME:.metadata.name --no-headers); do
  echo "Pods on $node:"
  kubectl get pods --all-namespaces --field-selector spec.nodeName=$node
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Auto Mode should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create new instances&lt;/li&gt;
&lt;li&gt;Reschedule Pods&lt;/li&gt;
&lt;li&gt;Adjust compute capacity&lt;/li&gt;
&lt;li&gt;Scale nodes back down when the load decreases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;📊 4.5. Analyzing native cluster metrics&lt;/strong&gt;&lt;br&gt;
Node usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl top nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pod usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl top pods -A
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Events:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get events -A --sort-by=.metadata.creationTimestamp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;🧪 4.6. Validating rescheduling by changing resource requests&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl patch deployment ui \
  -p '{"spec": {"template": {"spec": {"containers": [{"name": "ui", "resources": {"requests": {"cpu": "700m", "memory": "700Mi"}}}]}}}}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The expected behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto Mode provisions more capable compute&lt;/li&gt;
&lt;li&gt;Pods are redistributed&lt;/li&gt;
&lt;li&gt;New nodes may appear&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🧹 5. Clean Up: Removing all resources&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;▶️ Remove the application&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl delete -f https://github.com/aws-containers/retail-store-sample-app/releases/latest/download/kubernetes.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;▶️ Delete the EKS cluster with Auto Mode&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sh 02-cleanup-all.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;🛡️ 6. Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;EKS Auto Mode&lt;/strong&gt;, operating Kubernetes clusters becomes simpler, more efficient, and more automatic.&lt;br&gt;
In this article, we explored:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating clusters ready for Auto Mode&lt;/li&gt;
&lt;li&gt;Deploying a real AWS sample application&lt;/li&gt;
&lt;li&gt;Analyzing its intelligent provisioning behavior&lt;/li&gt;
&lt;li&gt;Hands-on resilience and scalability tests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This mode considerably reduces operational effort and allows you to focus fully on application development.&lt;/p&gt;

&lt;p&gt;📌 All files used in this article are in the repository:&lt;br&gt;
👉 &lt;a href="https://github.com/rodrigofrs13/eks-auto-mode-na-pratica" rel="noopener noreferrer"&gt;https://github.com/rodrigofrs13/eks-auto-mode-na-pratica&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>eks</category>
      <category>devops</category>
    </item>
    <item>
      <title>Resilience Fundamentals on Amazon EKS: How to Design Fault-Tolerant Workloads in Production</title>
      <dc:creator>Rodrigo Fernandes</dc:creator>
      <pubDate>Tue, 09 Dec 2025 21:43:52 +0000</pubDate>
      <link>https://dev.to/aws-builders/fundamentos-de-resiliencia-no-amazon-eks-como-projetar-workloads-tolerantes-a-falhas-em-producao-2e53</link>
      <guid>https://dev.to/aws-builders/fundamentos-de-resiliencia-no-amazon-eks-como-projetar-workloads-tolerantes-a-falhas-em-producao-2e53</guid>
      <description>&lt;p&gt;Resilience is one of the fundamental pillars of modern cloud architecture. In distributed environments, failures are inevitable: nodes go down, Pods crash, networks suffer latency, and load spikes happen unpredictably.&lt;br&gt;
That is why, for critical applications running on Kubernetes, it is essential to design for &lt;strong&gt;fault tolerance, self-healing, observability, and automation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Amazon Elastic Kubernetes Service (EKS), by combining the flexibility of Kubernetes with the robustness of AWS infrastructure, offers a powerful ecosystem for building resilient systems.&lt;br&gt;
But resilience is &lt;strong&gt;not automatic&lt;/strong&gt;: it has to be designed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🎯 1. What Is Resilience in the Context of Kubernetes and EKS?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Resilience is a system's ability to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep operating even in the face of failures&lt;/li&gt;
&lt;li&gt;recover automatically&lt;/li&gt;
&lt;li&gt;degrade in a controlled way&lt;/li&gt;
&lt;li&gt;maintain reliability and availability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No Kubernetes/EKS, isso se traduz em:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-AZ&lt;/li&gt;
&lt;li&gt;autoscaling&lt;/li&gt;
&lt;li&gt;readiness e liveness probes&lt;/li&gt;
&lt;li&gt;limites de recursos&lt;/li&gt;
&lt;li&gt;rollouts seguros&lt;/li&gt;
&lt;li&gt;automação de autoscaling da infraestrutura&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Resiliência não significa não falhar, mas falhar com graça.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🏗️ 2. Multi-AZ Architecture and Auto-Healing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;EKS makes it simple to create clusters distributed across multiple Availability Zones, drastically reducing the risk of an outage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does this matter?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An AZ can fail → your Pods keep running in the others.&lt;/li&gt;
&lt;li&gt;Node interruptions are handled automatically via:
&lt;strong&gt;- Managed Node Groups auto-recovery&lt;/strong&gt;
&lt;strong&gt;- Kubernetes auto-healing&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best practices&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;2 or 3 AZs&lt;/strong&gt; in the cluster.&lt;/li&gt;
&lt;li&gt;Prefer Managed Node Groups or &lt;strong&gt;EKS Auto Mode.&lt;/strong&gt;
(&lt;a href="https://dev.to/aws-builders/reducing-kubernetes-costs-using-aws-eks-auto-mode-1jl1"&gt;I have an article that covers EKS Auto Mode in more depth&lt;/a&gt;)
&lt;/li&gt;
&lt;li&gt;Configure &lt;strong&gt;Pod Anti-Affinity&lt;/strong&gt; to spread Pods across nodes/AZs.&lt;/li&gt;
&lt;/ul&gt;
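
&lt;p&gt;As a minimal sketch of that last practice, a Deployment can prefer spreading its replicas across zones with Pod Anti-Affinity (the &lt;code&gt;app: my-app&lt;/code&gt; label is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: my-app
        topologyKey: topology.kubernetes.io/zone  # spread across AZs; use kubernetes.io/hostname to spread across nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;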

&lt;p&gt;&lt;strong&gt;🔧 3. Probes: Keeping the Application Healthy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Probes are essential for resilience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Liveness Probe&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Detects hangs.&lt;br&gt;
If it fails → Kubernetes restarts the Pod.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Readiness Probe&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Defines when the Pod is ready to receive traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Startup Probe&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Prevents false liveness failures in applications that are slow to start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best practices&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always define appropriate health checks&lt;/li&gt;
&lt;li&gt;Never use the same URL for readiness and liveness&lt;/li&gt;
&lt;li&gt;Tune the timings: initialDelay, timeout, period&lt;/li&gt;
&lt;/ul&gt;
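
&lt;p&gt;A minimal sketch of the three probes in a container spec, assuming hypothetical &lt;code&gt;/healthz&lt;/code&gt; and &lt;code&gt;/ready&lt;/code&gt; endpoints and illustrative timings:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;containers:
- name: app
  image: my-app:1.0   # illustrative image
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 2
  readinessProbe:
    httpGet:
      path: /ready    # a different URL from liveness, as recommended above
      port: 8080
    periodSeconds: 5
  startupProbe:
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30  # up to 30 x 5s = 150s for slow starts
    periodSeconds: 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;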

&lt;p&gt;&lt;strong&gt;📦 4. Requests, Limits, and QoS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A large share of cluster incidents comes from incorrect resource usage, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;excessive memory consumption&lt;/li&gt;
&lt;li&gt;CPU-intensive usage&lt;/li&gt;
&lt;li&gt;OOMKills&lt;/li&gt;
&lt;li&gt;throttling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Requests&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The minimum amount the Pod needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limits&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The maximum the Pod is allowed to use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;QoS&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guaranteed&lt;/li&gt;
&lt;li&gt;Burstable&lt;/li&gt;
&lt;li&gt;BestEffort&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best practices&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always define requests and limits&lt;/li&gt;
&lt;li&gt;Monitor OOMKills and throttling&lt;/li&gt;
&lt;li&gt;Consider the Vertical Pod Autoscaler in mature clusters&lt;/li&gt;
&lt;/ul&gt;
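
&lt;p&gt;As a sketch (the values are illustrative): setting requests equal to limits places the Pod in the Guaranteed QoS class, while requests lower than limits make it Burstable:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "250m"       # equal to requests → Guaranteed QoS
    memory: "256Mi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;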

&lt;p&gt;&lt;strong&gt;📈 5. Autoscaling: HPA, Karpenter, and EKS Auto Mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Resilience also involves automatic adaptation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HPA (Horizontal Pod Autoscaler)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scales Pods based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU&lt;/li&gt;
&lt;li&gt;Memory&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Custom metrics (Prometheus)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure: Karpenter or EKS Auto Mode&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Karpenter&lt;/strong&gt; provides intelligent node provisioning.&lt;br&gt;
&lt;strong&gt;EKS Auto Mode&lt;/strong&gt; takes this to the next level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatic provisioning driven by the Pods&lt;/li&gt;
&lt;li&gt;Multi-AZ&lt;/li&gt;
&lt;li&gt;Zero node group configuration&lt;/li&gt;
&lt;li&gt;High resilience + lower cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best practices&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use HPA together with Auto Mode/Karpenter&lt;/li&gt;
&lt;li&gt;Configure Pod Disruption Budgets&lt;/li&gt;
&lt;li&gt;Ensure readiness before receiving traffic&lt;/li&gt;
&lt;/ul&gt;
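
&lt;p&gt;A minimal HPA sketch targeting CPU utilization (the Deployment name and thresholds are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add Pods when average CPU exceeds 70% of requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;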

&lt;p&gt;&lt;strong&gt;🔄 6. Resilient Deployments: Rolling, Blue/Green, and Canary&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Rolling Update&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Gradual update with no downtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blue/Green&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The new version only receives traffic once it has been validated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Canary&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traffic shifts gradually to the new version based on metrics.&lt;/p&gt;

&lt;p&gt;Recommended tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Argo Rollouts&lt;/li&gt;
&lt;li&gt;AWS App Mesh&lt;/li&gt;
&lt;li&gt;NGINX Ingress Controller&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best practices&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avoid breaking changes&lt;/li&gt;
&lt;li&gt;Use feature flags&lt;/li&gt;
&lt;li&gt;Monitor every stage of the rollout&lt;/li&gt;
&lt;/ul&gt;
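
&lt;p&gt;With Argo Rollouts, a canary strategy can be sketched like this (the steps, names, and image are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: my-app:2.0
  strategy:
    canary:
      steps:
      - setWeight: 20          # shift 20% of traffic to the new version
      - pause: {duration: 5m}  # observe metrics before continuing
      - setWeight: 50
      - pause: {duration: 5m}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;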

&lt;p&gt;&lt;strong&gt;🧪 7. Resilience Testing: Chaos, Load, and Functional&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Chaos Engineering&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chaos Mesh&lt;/li&gt;
&lt;li&gt;LitmusChaos&lt;/li&gt;
&lt;li&gt;AWS Fault Injection Simulator&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node failure&lt;/li&gt;
&lt;li&gt;Pod failure&lt;/li&gt;
&lt;li&gt;Network loss&lt;/li&gt;
&lt;li&gt;Artificial latency&lt;/li&gt;
&lt;/ul&gt;
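
&lt;p&gt;For example, a Chaos Mesh experiment that kills one matching Pod can be sketched as follows (the namespace and labels are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-demo
spec:
  action: pod-kill
  mode: one             # affect a single matching Pod
  selector:
    namespaces:
    - demo
    labelSelectors:
      app: my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;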

&lt;p&gt;&lt;strong&gt;Load Testing&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;k6&lt;/li&gt;
&lt;li&gt;Locust&lt;/li&gt;
&lt;li&gt;Artillery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Functional Testing&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Robot Framework&lt;/li&gt;
&lt;li&gt;Postman/Newman&lt;/li&gt;
&lt;li&gt;Cypress (front end)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why does this matter?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It reveals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bottlenecks&lt;/li&gt;
&lt;li&gt;unexpected behavior&lt;/li&gt;
&lt;li&gt;missing fault tolerance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;📊 8. Observability for Resilience&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without visibility, there is no resilience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus&lt;/li&gt;
&lt;li&gt;CloudWatch&lt;/li&gt;
&lt;li&gt;OpenTelemetry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Logs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fluent Bit&lt;/li&gt;
&lt;li&gt;CloudWatch Logs&lt;/li&gt;
&lt;li&gt;OpenSearch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Traces&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;X-Ray&lt;/li&gt;
&lt;li&gt;Jaeger&lt;/li&gt;
&lt;li&gt;Tempo (Grafana)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best practices&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create SLO metrics (latency, errors)&lt;/li&gt;
&lt;li&gt;Dedicated dashboards for Pods, Nodes, and Deployments&lt;/li&gt;
&lt;li&gt;Automated alerts with CloudWatch or Alertmanager&lt;/li&gt;
&lt;/ul&gt;
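
&lt;p&gt;As a sketch, an SLO-style Prometheus alerting rule for error rate could look like this (the metric name and threshold are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;groups:
- name: slo-alerts
  rules:
  - alert: HighErrorRate
    # share of 5xx responses over the last 5 minutes
    expr: |
      sum(rate(http_requests_total{status=~"5.."}[5m]))
        / sum(rate(http_requests_total[5m])) &gt; 0.01
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: "Error rate above 1% for 10 minutes"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;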

&lt;p&gt;&lt;strong&gt;🛣️ 9. Core Patterns for Resilience in Kubernetes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Pod Disruption Budget (PDB)&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;- Pod Affinity/Anti-Affinity&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;- Topology Spread Constraints&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;- Retry + Exponential Backoff&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;- Circuit Breaker&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;- Idempotency&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;- Well-defined Timeouts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These patterns prevent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cascading failures&lt;/li&gt;
&lt;li&gt;resource saturation&lt;/li&gt;
&lt;li&gt;global service degradation&lt;/li&gt;
&lt;/ul&gt;
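
&lt;p&gt;As one concrete example, Topology Spread Constraints keep replicas evenly distributed across zones (the label is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway  # degrade gracefully instead of blocking scheduling
  labelSelector:
    matchLabels:
      app: my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;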

&lt;p&gt;&lt;strong&gt;🎯 10. Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;EKS provides a robust foundation, but resilience depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;architectural patterns&lt;/li&gt;
&lt;li&gt;operational practices&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;continuous testing&lt;/li&gt;
&lt;li&gt;a DevOps culture&lt;/li&gt;
&lt;li&gt;intelligent automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By applying these fundamentals, you get applications that:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- tolerate failures&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;- scale automatically&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;- recover without human intervention&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;- deliver reliability in production&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Resilience is a discipline, not a configuration setting.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>eks</category>
      <category>kubernetes</category>
      <category>devops</category>
      <category>aws</category>
    </item>
    <item>
      <title>Reducing Kubernetes costs using AWS EKS Auto Mode</title>
      <dc:creator>Rodrigo Fernandes</dc:creator>
      <pubDate>Tue, 05 Aug 2025 20:05:39 +0000</pubDate>
      <link>https://dev.to/aws-builders/reducing-kubernetes-costs-using-aws-eks-auto-mode-1jl1</link>
      <guid>https://dev.to/aws-builders/reducing-kubernetes-costs-using-aws-eks-auto-mode-1jl1</guid>
      <description>&lt;p&gt;🚀 &lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Efficiently managing Kubernetes clusters can be challenging, especially when it comes to &lt;strong&gt;cost optimization&lt;/strong&gt;. Maintaining underutilized instances, manually configuring scalability, and managing node groups all require time and specialized knowledge.&lt;/p&gt;

&lt;p&gt;According to recent studies, organizations often overspend on Kubernetes infrastructure due to over-provisioning and poor resource management.&lt;br&gt;
AWS introduced Amazon EKS Auto Mode, a simpler and more cost-effective way to operate Kubernetes clusters.&lt;/p&gt;

&lt;p&gt;In this article, you’ll learn how EKS Auto Mode works, why it helps reduce costs, and how to implement it in your environment with practical examples.&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;What is EKS Auto Mode?&lt;/strong&gt;&lt;br&gt;
EKS Auto Mode is a new operational mode for Amazon EKS that completely abstracts away the infrastructure management of Kubernetes nodes.&lt;br&gt;
Launched in November 2024, it is a natural evolution of Karpenter, offering an even more simplified experience.&lt;/p&gt;

&lt;p&gt;🧩 &lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Automatic Provisioning:&lt;/strong&gt; Nodes are created and removed automatically based on pod demand&lt;br&gt;
&lt;strong&gt;- Smart Optimization:&lt;/strong&gt; Automatically selects instance types, availability zones, and pricing models&lt;br&gt;
&lt;strong&gt;- Zero Management:&lt;/strong&gt; Eliminates the need to create Node Groups or Launch Templates&lt;/p&gt;

&lt;p&gt;💰 &lt;strong&gt;How EKS Auto Mode Reduces Costs&lt;/strong&gt;&lt;br&gt;
The main goal of Auto Mode is to avoid over-provisioning and maximize EC2 usage efficiency.&lt;/p&gt;

&lt;p&gt;🎯 &lt;strong&gt;Pod-Based Intelligent Scaling&lt;/strong&gt;&lt;br&gt;
Unlike Node Groups that scale based on CPU/Memory metrics, Auto Mode scales based on pending pods. This removes the need for pre-allocated "buffer" resources.&lt;/p&gt;

&lt;p&gt;💸 &lt;strong&gt;Automatic Spot Instance Optimization&lt;/strong&gt;&lt;br&gt;
The system intelligently mixes Spot and On-Demand instances, potentially saving up to 90% on interruption-tolerant workloads.&lt;/p&gt;

&lt;p&gt;🔁 &lt;strong&gt;Elimination of Idle Nodes&lt;/strong&gt;&lt;br&gt;
Instances are automatically terminated when no pods are running, with a default grace period of 30 seconds.&lt;/p&gt;

&lt;p&gt;🤖 &lt;strong&gt;Smart Instance Selection&lt;/strong&gt;&lt;br&gt;
Auto Mode takes multiple factors into account simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pod CPU/Memory requirements&lt;/li&gt;
&lt;li&gt;Instance pricing&lt;/li&gt;
&lt;li&gt;Availability across AZs&lt;/li&gt;
&lt;li&gt;Architecture (AMD64/ARM64)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⚙️ &lt;strong&gt;EKS Auto Mode Setup&lt;/strong&gt;&lt;br&gt;
✅ Prerequisites&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS CLI configured&lt;/li&gt;
&lt;li&gt;kubectl installed&lt;/li&gt;
&lt;li&gt;Helm 3.x&lt;/li&gt;
&lt;li&gt;IAM permissions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🧾 &lt;strong&gt;Step 1: Clone the Repository&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/rodrigofrs13/eks-auto-mode.git
cd eks-auto-mode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⚙️ &lt;strong&gt;Step 2: Configure the Cluster&lt;/strong&gt;&lt;br&gt;
Edit the &lt;code&gt;conf-cluster-eks-auto-mode.yaml&lt;/code&gt; file with your desired settings for testing.&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Step 3: Create the Cluster&lt;/strong&gt;&lt;br&gt;
Run the script to set up and enable Auto Mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sh setup-cluster-eks-auto-mode.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔌 &lt;strong&gt;Step 4: Connect to the Cluster&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws eks --region &amp;lt;region&amp;gt; update-kubeconfig --name &amp;lt;cluster-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔍 &lt;strong&gt;Step 5: Verify Resources&lt;/strong&gt;&lt;br&gt;
Check if Karpenter-related resources were created:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NodePool&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get nodepool
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NodeClass&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get nodeclass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;EC2NodeClass: Defines EC2 configurations (AMI, security groups, subnets, user data)&lt;/li&gt;
&lt;li&gt;NodePool: Defines scaling policies (instance types, taints, resource limits)&lt;/li&gt;
&lt;/ul&gt;
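
&lt;p&gt;To illustrate how the two fit together, a custom NodePool can constrain what Auto Mode is allowed to provision. This is only a sketch: the name and requirement values are illustrative, and it assumes the built-in &lt;code&gt;default&lt;/code&gt; NodeClass.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: mixed-arch
spec:
  template:
    spec:
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64", "arm64"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;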

&lt;p&gt;🧠 &lt;strong&gt;Advanced Configuration - NodePool for AMD64&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For workloads requiring AMD64 architecture, apply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f NodePool-AMD.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🧪 &lt;strong&gt;Scalability Test&lt;/strong&gt;&lt;br&gt;
🧱 &lt;strong&gt;Apply the Test Deployment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Apply deployment&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f deploy-scaling-SPOT.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📈 Scale to 50 replicas&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl scale deployment nginx-arm64-spot --replicas=50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔎 &lt;strong&gt;Monitor in Real Time&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pods&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -w
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NodeClaims&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get nodeclaim
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Worker Nodes&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get nodes --show-labels
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ &lt;strong&gt;Expected Results&lt;/strong&gt;&lt;br&gt;
⏱️ Response Time: New nodes within 30–45 seconds&lt;/p&gt;

&lt;p&gt;💡 Optimization: Auto Spot/On-Demand mix&lt;/p&gt;

&lt;p&gt;🧹 Cleanup: Idle nodes deleted after 30s grace period&lt;/p&gt;

&lt;p&gt;📊 &lt;strong&gt;Cost Monitoring with Kubecost&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Install Kubecost&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set persistentVolume.enabled=false \
  --set prometheus.server.persistentVolume.enabled=false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check Installation&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -n kubecost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Access the dashboard at: &lt;a href="http://localhost:9090" rel="noopener noreferrer"&gt;http://localhost:9090&lt;/a&gt;&lt;br&gt;
🕒 Wait ~25 minutes for full metric collection.&lt;/p&gt;

&lt;p&gt;🔍 &lt;strong&gt;Key Metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost per namespace&lt;/li&gt;
&lt;li&gt;Efficiency metrics: Usage vs. requests&lt;/li&gt;
&lt;li&gt;Spot vs On-Demand ratio&lt;/li&gt;
&lt;li&gt;Hourly cost trends&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe5usp75tgv9jfgtckkq2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe5usp75tgv9jfgtckkq2.png" alt=" " width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🧠 &lt;strong&gt;Best Practices for Maximum Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Accurate Resource Requests and Limits
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;Toleration Separation by Interruption Tolerance
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tolerations:
- key: "spot"
  operator: "Equal"
  value: "false"
  effect: "NoSchedule"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;For interruption-tolerant workloads&lt;/strong&gt;&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tolerations:
- key: "spot"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="3"&gt;
&lt;li&gt;Node Affinity for Spot Optimization
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="4"&gt;
&lt;li&gt;Use PodDisruptionBudgets
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="5"&gt;
&lt;li&gt;Cost Monitoring and Alerts&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Custom CloudWatch Log Group&lt;/strong&gt;&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws logs create-log-group --log-group-name /aws/eks/auto-mode/costs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Cost Explorer automation&lt;/strong&gt;&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-01-31 \
  --granularity MONTHLY \
  --metrics BlendedCost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;⚠️ &lt;strong&gt;Limitations and When Not to Use&lt;/strong&gt;&lt;br&gt;
Auto Mode is not suitable for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom AMIs: It uses AWS-optimized AMIs&lt;/li&gt;
&lt;li&gt;Specialized Hardware: GPU, Inferentia, or other special instance types&lt;/li&gt;
&lt;li&gt;Granular Control Needs: Deeply customized configurations&lt;/li&gt;
&lt;li&gt;Strict Compliance Environments: Instance type constraints&lt;/li&gt;
&lt;li&gt;Stateful Workloads: Databases needing persistent local storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ &lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Amazon EKS Auto Mode represents a paradigm shift in Kubernetes cluster operations, offering:&lt;/p&gt;

&lt;p&gt;🛠️ &lt;strong&gt;Operational Simplicity:&lt;/strong&gt; Up to 80% management time reduction&lt;/p&gt;

&lt;p&gt;💸 &lt;strong&gt;Cost Savings:&lt;/strong&gt; 60–70% lower infrastructure costs&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Instant Scalability:&lt;/strong&gt; Rapid response to demand changes&lt;/p&gt;

&lt;p&gt;🧠 &lt;strong&gt;Continuous Optimization:&lt;/strong&gt; Machine learning–based decisions&lt;/p&gt;

&lt;p&gt;For organizations seeking to reduce complexity and costs, Auto Mode is a proven and mature solution.&lt;br&gt;
The combination of smart provisioning, Spot instance optimization, and zero over-provisioning leads to substantial savings.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;Additional Resources&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/best-practices/automode.html" rel="noopener noreferrer"&gt;Amazon EKS Auto Mode Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://karpenter.sh/docs/" rel="noopener noreferrer"&gt;Karpenter Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cost-management/latest/userguide/ce-what-is.html" rel="noopener noreferrer"&gt;AWS Cost Explorer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ibm.com/docs/en/kubecost/self-hosted/2.x" rel="noopener noreferrer"&gt;Kubecost Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🧹 &lt;strong&gt;Environment Cleanup&lt;/strong&gt;&lt;br&gt;
⚠️ &lt;strong&gt;Important:&lt;/strong&gt; Always clean up unused resources to avoid unwanted charges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chmod +x cleanup-all.sh
./cleanup-all.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>aws</category>
      <category>cloud</category>
      <category>kubernetes</category>
      <category>finops</category>
    </item>
    <item>
      <title>Creating a simple and fast EKS Cluster</title>
      <dc:creator>Rodrigo Fernandes</dc:creator>
      <pubDate>Sun, 03 Nov 2024 13:20:17 +0000</pubDate>
      <link>https://dev.to/aws-builders/creating-a-simple-and-fast-eks-cluster-3p63</link>
      <guid>https://dev.to/aws-builders/creating-a-simple-and-fast-eks-cluster-3p63</guid>
      <description>&lt;p&gt;As a DevOps engineer, having the ability to quickly create and manage Kubernetes clusters is essential. In this article, I'll show you three different ways to create an EKS cluster on AWS, from the simplest to the most complete, helping you be more efficient in your daily tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How This Will Help DevOps Engineers&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quick Environment Setup:&lt;/strong&gt; Create dev/test environments in minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure as Code:&lt;/strong&gt; Maintain cluster configurations in version control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation Ready:&lt;/strong&gt; Easy to integrate with CI/CD pipelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalable Approach:&lt;/strong&gt; Start simple and evolve as needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choose the one that best suits your needs!&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 1: EKS Cluster in One Command
&lt;/h2&gt;

&lt;p&gt;For those who want to start quickly, there's a super simple way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl create cluster --name simple-cluster
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Done! With this single command you already have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2 m5.large nodes (the eksctl default)&lt;/li&gt;
&lt;li&gt;New VPC&lt;/li&gt;
&lt;li&gt;Public subnets&lt;/li&gt;
&lt;li&gt;Basic security groups&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Option 2: Cluster with Custom Settings
&lt;/h2&gt;

&lt;p&gt;If you need more control, you can use this command with parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl create cluster \
    --name middle-cluster \
    --region us-east-1 \
    --version 1.28 \
    --nodegroup-name workers \
    --node-type t3.medium \
    --nodes 2 \
    --nodes-min 2 \
    --nodes-max 4 \
    --managed \
    --asg-access \
    --external-dns-access \
    --full-ecr-access \
    --tags "Environment=development" \
    --zones us-east-1a,us-east-1b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Done! This command gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;t3.medium nodes with auto-scaling&lt;/li&gt;
&lt;li&gt;ECR and external DNS access&lt;/li&gt;
&lt;li&gt;Organization tags&lt;/li&gt;
&lt;li&gt;Specific availability zones&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Option 3: Cluster via Configuration File
&lt;/h2&gt;

&lt;p&gt;For more robust environments, use a cluster.yaml file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: hard-cluster
  region: us-east-1
  version: "1.28"
  tags:
    karpenter.sh/discovery: cluster-with-karpenter

availabilityZones: ["us-east-1a", "us-east-1b", "us-east-1c"]  

vpc:
  cidr: "10.0.0.0/16"
  nat:
    gateway: Single

iam:
  withOIDC: true

karpenter:
  version: 'v0.20.0'
  createServiceAccount: true
  withSpotInterruptionQueue: true

nodeGroups:
  - name: apps
    availabilityZones: ["us-east-1a", "us-east-1b", "us-east-1c"]
    instanceType: t3.medium
    desiredCapacity: 2
    minSize: 2
    maxSize: 4
    labels:
      role: apps
    tags:
      Environment: production
    iam:
      withAddonPolicies:
        autoScaler: true
        albIngress: true

  - name: system
    instanceType: t3.small
    availabilityZones: ["us-east-1a", "us-east-1b", "us-east-1c"]
    desiredCapacity: 2
    minSize: 2
    maxSize: 3
    labels:
      role: system

cloudWatch:
    clusterLogging:
        enableTypes: ["api", "audit", "authenticator", "controllerManager", "scheduler"]
        logRetentionInDays: 1 

addons:
  - name: vpc-cni
    version: latest
  - name: coredns
    version: latest

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Execute with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl create cluster -f cluster.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Done! This configuration gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom VPC with specific CIDR&lt;/li&gt;
&lt;li&gt;Two node groups with different purposes:&lt;/li&gt;
&lt;li&gt;Apps group: t3.medium nodes with auto-scaling&lt;/li&gt;
&lt;li&gt;System group: t3.small nodes for system components&lt;/li&gt;
&lt;li&gt;Auto Scaler and ALB Ingress enabled&lt;/li&gt;
&lt;li&gt;Latest versions of core addons&lt;/li&gt;
&lt;li&gt;Production-ready setup with proper labeling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Which Option to Choose?&lt;br&gt;
Option 1 (Simple Command)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick tests&lt;/li&gt;
&lt;li&gt;POCs&lt;/li&gt;
&lt;li&gt;Learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Option 2 (Command with parameters)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Development environment&lt;/li&gt;
&lt;li&gt;Specific configurations&lt;/li&gt;
&lt;li&gt;Script automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Option 3 (Configuration file)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production environment&lt;/li&gt;
&lt;li&gt;Version-controlled configuration&lt;/li&gt;
&lt;li&gt;Multiple node groups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cleaning Up Resources&lt;/strong&gt;&lt;br&gt;
Don't forget to delete the cluster when you no longer need it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# For any of the options
eksctl delete cluster --name CLUSTER_NAME
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# For cluster created with file
eksctl delete cluster -f cluster.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Creating EKS clusters quickly and efficiently brings several key benefits for DevOps professionals:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time and Productivity Benefits&lt;/strong&gt;&lt;br&gt;
Rapid Development Cycles: Create new environments in minutes instead of hours&lt;br&gt;
Quick Testing: Validate changes and configurations without long setup times&lt;br&gt;
Fast Disaster Recovery: Quickly spin up new clusters if needed&lt;br&gt;
Efficient Experimentation: Test new configurations and settings without lengthy processes&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost Benefits&lt;/strong&gt;&lt;br&gt;
Pay Only What You Need: Create clusters only when needed&lt;br&gt;
Environment Control: Easily spin up and tear down environments&lt;br&gt;
Resource Optimization: Scale environments based on actual needs&lt;br&gt;
Development Cost Reduction: Use temporary clusters for testing instead of maintaining permanent ones&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Benefits&lt;/strong&gt;&lt;br&gt;
Infrastructure as Code: Maintain consistent environments across teams&lt;br&gt;
Version Control: Track all cluster configurations in Git&lt;br&gt;
Automation Ready: Easily integrate with CI/CD pipelines&lt;br&gt;
Environment Parity: Ensure development matches production&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team Benefits&lt;/strong&gt;&lt;br&gt;
Self-Service Infrastructure: Teams can create their own environments&lt;br&gt;
Reduced Dependencies: Less reliance on infrastructure teams&lt;br&gt;
Better Learning: Quick feedback loop for learning Kubernetes&lt;br&gt;
Increased Confidence: More testing and validation opportunities&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business Benefits&lt;/strong&gt;&lt;br&gt;
Faster Time to Market: Reduce environment setup time&lt;br&gt;
Improved Quality: More thorough testing in production-like environments&lt;br&gt;
Risk Reduction: Test changes in isolated environments&lt;br&gt;
Better Resource Utilization: Create and destroy environments as needed&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;By mastering these cluster creation methods, you'll be able to:&lt;/strong&gt;&lt;br&gt;
Support development teams more effectively&lt;br&gt;
Respond to incidents faster&lt;br&gt;
Manage resources more efficiently&lt;br&gt;
Implement better testing practices&lt;br&gt;
Improve your infrastructure automation&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remember:&lt;/strong&gt; The ability to quickly create and manage EKS clusters isn't just about technical capability - it's about enabling your organization to move faster, work more efficiently, and deliver better results.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
    <item>
      <title>Hands-On: Escalonamento automático com EKS e Cluster Autoscaler utilizando Terraform e Helm</title>
      <dc:creator>Rodrigo Fernandes</dc:creator>
      <pubDate>Thu, 20 Jun 2024 14:59:05 +0000</pubDate>
      <link>https://dev.to/aws-builders/hands-on-escalonamento-automatico-com-eks-e-cluster-autoscaler-utilizando-terraform-e-helm-51ki</link>
      <guid>https://dev.to/aws-builders/hands-on-escalonamento-automatico-com-eks-e-cluster-autoscaler-utilizando-terraform-e-helm-51ki</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introdução&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;O escalonamento automático de clusters é uma funcionalidade essencial em ambientes de computação em nuvem, especialmente quando se trata de gerenciar recursos de forma eficiente e econômica. &lt;/p&gt;

&lt;p&gt;Nesse contexto o Cluster Autoscaler (CA) é uma ferramenta vital para ajustar dinamicamente o número de instâncias de nó em um cluster Kubernetes, garantindo que as cargas de trabalho tenham recursos suficientes enquanto minimiza os custos. &lt;/p&gt;

&lt;p&gt;Este artigo técnico explora o processo de configuração e uso do Amazon EKS e do Cluster Autoscaler utilizando Terraform e Helm para implementar o escalonamento automático.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;General Information&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;The settings below are intended for test environments, workshops, and demos. Do not use them in production.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you already know the Cluster Autoscaler and just want to experiment, use the complete repository at this &lt;a href="https://github.com/rodrigofrs13/cluster-autoscaler-terraform-helm"&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you prefer a step-by-step walkthrough to understand the details, follow the instructions below.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Cluster Setup&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For the cluster setup we will use a Terraform repository with the code for a ready-made basic cluster.&lt;/p&gt;

&lt;p&gt;Access the repository by clicking &lt;a href="https://github.com/rodrigofrs13/basic-cluster-eks-workshop"&gt;here&lt;/a&gt;; the README has the step-by-step for the full cluster setup.&lt;/p&gt;

&lt;p&gt;After running the steps, wait for completion; the output will look like the image below:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s1m5kutc1f1fg0lnw2n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s1m5kutc1f1fg0lnw2n.png" alt="Image description" width="664" height="110"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it, the cluster setup is complete. Let's access the cluster and run some initial checks to verify its health.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Accessing the Cluster&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To access the cluster we will use AWS Cloud9; for the configuration, follow the article &lt;strong&gt;Boosting AWS Cloud9 to Simplify Amazon EKS Administration&lt;/strong&gt; by clicking &lt;a href="https://medium.com/@rodrigofrs13/boosting-aws-cloud9-to-simplify-amazon-eks-administration-4c0044dff017"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After following the steps in that article, you will have Cloud9 and the Kubernetes tooling script configured.&lt;/p&gt;

&lt;p&gt;Copy the command below, change the region and cluster name, and run it to access the EKS cluster.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$ aws eks --region &amp;lt;sua-região&amp;gt; update-kubeconfig --name &amp;lt;nome-do-cluster&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Let's run a few initial checks to verify the cluster's health.&lt;/p&gt;

&lt;p&gt;Gathering some information.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl cluster-info&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiy6d5zfldtt0xgs4j9k9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiy6d5zfldtt0xgs4j9k9.png" alt="Image description" width="700" height="73"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Checking the worker nodes.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get nodes -o wide&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe954ml8c45s0piea530c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe954ml8c45s0piea530c.png" alt="Image description" width="700" height="64"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Listing every resource created.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get all -A&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4tei9ssjxyrv2qxyyr3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4tei9ssjxyrv2qxyyr3.png" alt="Image description" width="700" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With that, we can conclude the cluster is working correctly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm6n74d2mkgaa022furl9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm6n74d2mkgaa022furl9.png" alt="Image description" width="249" height="203"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;With the cluster configured, let's move on to the Cluster Autoscaler.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Is the Cluster Autoscaler&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Cluster Autoscaler is a tool for automatic resource management in Kubernetes clusters.&lt;/p&gt;

&lt;p&gt;It automatically adjusts the size of a Kubernetes cluster, increasing or decreasing the number of worker nodes according to what the workloads need in order to run.&lt;/p&gt;

&lt;p&gt;The Cluster Autoscaler makes its decisions based on the number of pods scheduled (or pending) and their respective resource requests.&lt;/p&gt;

&lt;p&gt;To learn more about the Cluster Autoscaler, see the official documentation by clicking &lt;a href="https://github.com/kubernetes/autoscaler"&gt;here&lt;/a&gt;.&lt;/p&gt;
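As a quick illustration of the decision process described above, a deployment whose replicas request more CPU than the current nodes can provide leaves pods in Pending, which is exactly the signal the Cluster Autoscaler reacts to by adding nodes. A hypothetical test manifest (the name, replica count, and requests are illustrative, not part of this article's repositories):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ca-scale-test   # hypothetical name, for scale testing only
spec:
  replicas: 10
  selector:
    matchLabels:
      app: ca-scale-test
  template:
    metadata:
      labels:
        app: ca-scale-test
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "500m"    # sized to exceed the spare capacity of a small node group
              memory: 128Mi
```

Applying this with `kubectl apply -f` and watching `kubectl get nodes -w` should show new worker nodes joining once pods go Pending, assuming the Auto Scaling group's maximum size allows it.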



&lt;p&gt;&lt;strong&gt;Installing the Cluster Autoscaler&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We will split the installation and configuration files into three parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;cluster_autoscaler_iam.tf&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;cluster_autoscaler_chart.tf&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;cluster_autoscaler_values.yaml&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's start by configuring the permissions.&lt;/p&gt;

&lt;p&gt;First we need the AWS account ID and the ID of the OIDC provider created by the EKS cluster.&lt;/p&gt;

&lt;p&gt;To get the OIDC provider ID, run the command below, changing the &lt;em&gt;cluster_name&lt;/em&gt; placeholder.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;aws eks describe-cluster --name &amp;lt;cluster_name&amp;gt; --query "cluster.identity.oidc.issuer" --output text&lt;/code&gt;&lt;/p&gt;
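The issuer URL returned by that command ends with the OIDC provider ID. A minimal sketch of extracting it in the shell, using a sample issuer URL for illustration (with real values, the issuer comes from the describe-cluster command above, and the account ID can come from `aws sts get-caller-identity`):

```shell
# Sample issuer URL, in the shape returned by `aws eks describe-cluster`
# (the ID shown here is an illustrative placeholder value)
ISSUER="https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"

# The OIDC provider ID is the last path segment of the issuer URL
OIDC_ID=${ISSUER##*/}
echo "$OIDC_ID"
```

Both values can then be pasted into the placeholders in the Terraform file below.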

&lt;p&gt;With the AWS account ID and the OIDC provider ID in hand, create the file &lt;em&gt;cluster_autoscaler_iam.tf&lt;/em&gt; and paste the code snippet below.&lt;/p&gt;

&lt;p&gt;Remember to change the id-da-conta-aws and oidc placeholders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Criação da política IAM para o Cluster Autoscaler
resource "aws_iam_policy" "cluster_autoscaler_policy" {
  name        = "ClusterAutoscalerPolicy"
  description = "Policy for Kubernetes Cluster Autoscaler"
  policy      = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = [
          "autoscaling:DescribeAutoScalingGroups",
          "autoscaling:DescribeAutoScalingInstances",
          "autoscaling:DescribeLaunchConfigurations",
          "autoscaling:DescribeTags",
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup",
          "ec2:DescribeInstances",
          "ec2:DescribeLaunchTemplateVersions",
          "ec2:DescribeTags"
        ],
        Resource = "*"
      }
    ]
  })
}

# IAM role assumed by the Cluster Autoscaler via the cluster's OIDC provider
resource "aws_iam_role" "cluster_autoscaler" {
  name = "eks-cluster-autoscaler-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Federated = "arn:aws:iam::&amp;lt;id-da-conta-aws&amp;gt;:oidc-provider/oidc.eks.${var.region}.amazonaws.com/id/&amp;lt;oidc&amp;gt;"
         },
        Action = "sts:AssumeRoleWithWebIdentity",
        Condition = {
          StringEquals = {
            "oidc.eks.${var.region}.amazonaws.com/id/&amp;lt;oidc&amp;gt;:aud" = "sts.amazonaws.com"
            "oidc.eks.${var.region}.amazonaws.com/id/&amp;lt;oidc&amp;gt;:sub" = "system:serviceaccount:kube-system:cluster-autoscaler"
          }
        }
      },
    ],
  })
}

# Service account annotated with the role ARN (IRSA)
resource "kubernetes_service_account" "cluster_autoscaler" {
  metadata {
    name      = "cluster-autoscaler"
    namespace = "kube-system"
    annotations = {
      "eks.amazonaws.com/role-arn" = aws_iam_role.cluster_autoscaler.arn
    }
  }
}

# Attach the policy to the role
resource "aws_iam_role_policy_attachment" "cluster_autoscaler_policy_attachment" {
  policy_arn = aws_iam_policy.cluster_autoscaler_policy.arn  #"arn:aws:iam::${data.aws_caller_identity.current.account_id}:policy/ClusterAutoscalerPolicy"
  role       = aws_iam_role.cluster_autoscaler.name
}

# (Optional) If you run the Cluster Autoscaler on an EC2 instance, create an instance profile for it
resource "aws_iam_instance_profile" "cluster_autoscaler_instance_profile" {
  name = "ClusterAutoscalerInstanceProfile"
  role = aws_iam_role.cluster_autoscaler.name
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We created an IAM policy named ClusterAutoscalerPolicy with the permissions the Cluster Autoscaler needs to work.&lt;/p&gt;

&lt;p&gt;We created an IAM role trusted by the OIDC provider.&lt;br&gt;
We created a service account and attached the role to it.&lt;/p&gt;

&lt;p&gt;Optionally, if you are running the Cluster Autoscaler on an EC2 instance, create an instance profile.&lt;/p&gt;

&lt;p&gt;Now let's configure the Helm chart. Create a file named &lt;em&gt;cluster_autoscaler_chart.tf&lt;/em&gt; and paste the code snippet below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  namespace  = "kube-system"
  timeout    = 300
  version = "9.34.1"

  values = [
    "${file("cluster_autoscaler_values.yaml")}"
  ]

  set {
    name  = "autoDiscovery.clusterName"
    value = data.aws_eks_cluster.cluster.name
  }

  set {
    name  = "awsRegion"
    value = var.region
  }

  set {
    name  = "rbac.serviceAccount.create"
    value = "false"
  }

  set {
    name  = "rbac.serviceAccount.name"
    value = "cluster-autoscaler"
  }

}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To configure the Cluster Autoscaler with the Helm chart's advanced options, you can tune several parameters that control the autoscaler's behavior.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;values.yaml&lt;/em&gt; file lets you configure options such as minimum and maximum worker node counts, tolerations, metrics, check intervals, and much more.&lt;/p&gt;

&lt;p&gt;Now create the file &lt;em&gt;cluster_autoscaler_values.yaml&lt;/em&gt; and paste the snippet below.&lt;/p&gt;

&lt;p&gt;We need to adjust a few parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;clusterName&lt;/em&gt; - set the name of the EKS cluster&lt;/li&gt;
&lt;li&gt;&lt;em&gt;awsRegion&lt;/em&gt; - set the AWS region&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
# affinity -- Affinity for pod assignment
affinity: {}

# additionalLabels -- Labels to add to each object of the chart.
additionalLabels: {}

autoDiscovery:
  # cloudProviders `aws`, `gce`, `azure`, `magnum`, `clusterapi` and `oci` are supported by auto-discovery at this time
  # AWS: Set tags as described in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup

  # autoDiscovery.clusterName -- Enable autodiscovery for `cloudProvider=aws`, for groups matching `autoDiscovery.tags`.
  # autoDiscovery.clusterName -- Enable autodiscovery for `cloudProvider=azure`, using tags defined in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/README.md#auto-discovery-setup.
  # Enable autodiscovery for `cloudProvider=clusterapi`, for groups matching `autoDiscovery.labels`.
  # Enable autodiscovery for `cloudProvider=gce`, but no MIG tagging required.
  # Enable autodiscovery for `cloudProvider=magnum`, for groups matching `autoDiscovery.roles`.
  clusterName: cluster-workshop

  # autoDiscovery.namespace -- Enable autodiscovery via cluster namespace for `cloudProvider=clusterapi`
  namespace:  # default

  # autoDiscovery.tags -- ASG tags to match, run through `tpl`.
  tags:
    - k8s.io/cluster-autoscaler/enabled
    - k8s.io/cluster-autoscaler/{{ .Values.autoDiscovery.clusterName }}
  # - kubernetes.io/cluster/{{ .Values.autoDiscovery.clusterName }}

  # autoDiscovery.roles -- Magnum node group roles to match.
  roles:
    - worker

  # autoDiscovery.labels -- Cluster-API labels to match  https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#configuring-node-group-auto-discovery
  labels: []
    # - color: green
    # - shape: circle
# autoscalingGroups -- For AWS, Azure AKS or Magnum. At least one element is required if not using `autoDiscovery`. For example:
# &amp;lt;pre&amp;gt;
# - name: asg1&amp;lt;br /&amp;gt;
#   maxSize: 2&amp;lt;br /&amp;gt;
#   minSize: 1
# &amp;lt;/pre&amp;gt;
# For Hetzner Cloud, the `instanceType` and `region` keys are also required.
# &amp;lt;pre&amp;gt;
# - name: mypool&amp;lt;br /&amp;gt;
#   maxSize: 2&amp;lt;br /&amp;gt;
#   minSize: 1&amp;lt;br /&amp;gt;
#   instanceType: CPX21&amp;lt;br /&amp;gt;
#   region: FSN1
# &amp;lt;/pre&amp;gt;
autoscalingGroups: []
# - name: asg1
#   maxSize: 2
#   minSize: 1
# - name: asg2
#   maxSize: 2
#   minSize: 1

# autoscalingGroupsnamePrefix -- For GCE. At least one element is required if not using `autoDiscovery`. For example:
# &amp;lt;pre&amp;gt;
# - name: ig01&amp;lt;br /&amp;gt;
#   maxSize: 10&amp;lt;br /&amp;gt;
#   minSize: 0
# &amp;lt;/pre&amp;gt;
autoscalingGroupsnamePrefix: []
# - name: ig01
#   maxSize: 10
#   minSize: 0
# - name: ig02
#   maxSize: 10
#   minSize: 0

# awsAccessKeyID -- AWS access key ID ([if AWS user keys used](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#using-aws-credentials))
awsAccessKeyID: ""

# awsRegion -- AWS region (required if `cloudProvider=aws`)
awsRegion: us-east-1

# awsSecretAccessKey -- AWS access secret key ([if AWS user keys used](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#using-aws-credentials))
awsSecretAccessKey: ""

# azureClientID -- Service Principal ClientID with contributor permission to Cluster and Node ResourceGroup.
# Required if `cloudProvider=azure`
azureClientID: ""

# azureClientSecret -- Service Principal ClientSecret with contributor permission to Cluster and Node ResourceGroup.
# Required if `cloudProvider=azure`
azureClientSecret: ""

# azureResourceGroup -- Azure resource group that the cluster is located.
# Required if `cloudProvider=azure`
azureResourceGroup: ""

# azureSubscriptionID -- Azure subscription where the resources are located.
# Required if `cloudProvider=azure`
azureSubscriptionID: ""

# azureTenantID -- Azure tenant where the resources are located.
# Required if `cloudProvider=azure`
azureTenantID: ""

# azureUseManagedIdentityExtension -- Whether to use Azure's managed identity extension for credentials. If using MSI, ensure subscription ID, resource group, and azure AKS cluster name are set. You can only use one authentication method at a time, either azureUseWorkloadIdentityExtension or azureUseManagedIdentityExtension should be set.
azureUseManagedIdentityExtension: false

# azureUseWorkloadIdentityExtension -- Whether to use Azure's workload identity extension for credentials. See the project here: https://github.com/Azure/azure-workload-identity for more details. You can only use one authentication method at a time, either azureUseWorkloadIdentityExtension or azureUseManagedIdentityExtension should be set.
azureUseWorkloadIdentityExtension: false

# azureVMType -- Azure VM type.
azureVMType: "vmss"

# azureEnableForceDelete -- Whether to force delete VMs or VMSS instances when scaling down.
azureEnableForceDelete: false

# cloudConfigPath -- Configuration file for cloud provider.
cloudConfigPath: ""

# cloudProvider -- The cloud provider where the autoscaler runs.
# Currently only `gce`, `aws`, `azure`, `magnum` and `clusterapi` are supported.
# `aws` supported for AWS. `gce` for GCE. `azure` for Azure AKS.
# `magnum` for OpenStack Magnum, `clusterapi` for Cluster API.
cloudProvider: aws

# clusterAPICloudConfigPath -- Path to kubeconfig for connecting to Cluster API Management Cluster, only used if `clusterAPIMode=kubeconfig-kubeconfig or incluster-kubeconfig`
clusterAPICloudConfigPath: /etc/kubernetes/mgmt-kubeconfig

# clusterAPIConfigMapsNamespace -- Namespace on the workload cluster to store Leader election and status configmaps
clusterAPIConfigMapsNamespace: ""

# clusterAPIKubeconfigSecret -- Secret containing kubeconfig for connecting to Cluster API managed workloadcluster
# Required if `cloudProvider=clusterapi` and `clusterAPIMode=kubeconfig-kubeconfig,kubeconfig-incluster or incluster-kubeconfig`
clusterAPIKubeconfigSecret: ""

# clusterAPIMode --  Cluster API mode, see https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#connecting-cluster-autoscaler-to-cluster-api-management-and-workload-clusters
# Syntax: workloadClusterMode-ManagementClusterMode
# for `kubeconfig-kubeconfig`, `incluster-kubeconfig` and `single-kubeconfig` you always must mount the external kubeconfig using either `extraVolumeSecrets` or `extraMounts` and `extraVolumes`
# if you dont set `clusterAPIKubeconfigSecret`and thus use an in-cluster config or want to use a non capi generated kubeconfig you must do so for the workload kubeconfig as well
clusterAPIMode: incluster-incluster  # incluster-incluster, incluster-kubeconfig, kubeconfig-incluster, kubeconfig-kubeconfig, single-kubeconfig

# clusterAPIWorkloadKubeconfigPath -- Path to kubeconfig for connecting to Cluster API managed workloadcluster, only used if `clusterAPIMode=kubeconfig-kubeconfig or kubeconfig-incluster`
clusterAPIWorkloadKubeconfigPath: /etc/kubernetes/value

# containerSecurityContext -- [Security context for container](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/)
containerSecurityContext: {}
  # capabilities:
  #   drop:
  #   - ALL

deployment:
  # deployment.annotations -- Annotations to add to the Deployment object.
  annotations: {}

# dnsPolicy -- Defaults to `ClusterFirst`. Valid values are:
# `ClusterFirstWithHostNet`, `ClusterFirst`, `Default` or `None`.
# If autoscaler does not depend on cluster DNS, recommended to set this to `Default`.
dnsPolicy: ClusterFirst

# envFromConfigMap -- ConfigMap name to use as envFrom.
envFromConfigMap: ""

# envFromSecret -- Secret name to use as envFrom.
envFromSecret: ""

## Priorities Expander
# expanderPriorities -- The expanderPriorities is used if `extraArgs.expander` contains `priority` and expanderPriorities is also set with the priorities.
# If `extraArgs.expander` contains `priority`, then expanderPriorities is used to define cluster-autoscaler-priority-expander priorities.
# See: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/expander/priority/readme.md
expanderPriorities: {}

# extraArgs -- Additional container arguments.
# Refer to https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca for the full list of cluster autoscaler
# parameters and their default values.
# Everything after the first _ will be ignored allowing the use of multi-string arguments.
extraArgs:
  logtostderr: true
  stderrthreshold: info
  v: 4
  # write-status-configmap: true
  # status-config-map-name: cluster-autoscaler-status
  # leader-elect: true
  # leader-elect-resource-lock: endpoints
  # skip-nodes-with-local-storage: true
  # expander: random
  # scale-down-enabled: true
  # balance-similar-node-groups: true
  # min-replica-count: 0
  # scale-down-utilization-threshold: 0.5
  # scale-down-non-empty-candidates-count: 30
  # max-node-provision-time: 15m0s
  # scan-interval: 10s
  # scale-down-delay-after-add: 10m
  # scale-down-delay-after-delete: 0s
  # scale-down-delay-after-failure: 3m
  # scale-down-unneeded-time: 10m
  # skip-nodes-with-system-pods: true
  # balancing-ignore-label_1: first-label-to-ignore
  # balancing-ignore-label_2: second-label-to-ignore

# extraEnv -- Additional container environment variables.
extraEnv: {}

# extraEnvConfigMaps -- Additional container environment variables from ConfigMaps.
extraEnvConfigMaps: {}

# extraEnvSecrets -- Additional container environment variables from Secrets.
extraEnvSecrets: {}

# extraVolumeMounts -- Additional volumes to mount.
extraVolumeMounts: []
  # - name: ssl-certs
  #   mountPath: /etc/ssl/certs/ca-certificates.crt
  #   readOnly: true

# extraVolumes -- Additional volumes.
extraVolumes: []
  # - name: ssl-certs
  #   hostPath:
  #     path: /etc/ssl/certs/ca-bundle.crt

# extraVolumeSecrets -- Additional volumes to mount from Secrets.
extraVolumeSecrets: {}
  # autoscaler-vol:
  #   mountPath: /data/autoscaler/
  # custom-vol:
  #   name: custom-secret
  #   mountPath: /data/custom/
  #   items:
  #     - key: subkey
  #       path: mypath

# fullnameOverride -- String to fully override `cluster-autoscaler.fullname` template.
fullnameOverride: ""

# hostNetwork -- Whether to expose network interfaces of the host machine to pods.
hostNetwork: false

image:
  # image.repository -- Image repository
  repository: registry.k8s.io/autoscaling/cluster-autoscaler
  # image.tag -- Image tag
  tag: v1.30.0
  # image.pullPolicy -- Image pull policy
  pullPolicy: IfNotPresent
  ## Optionally specify an array of imagePullSecrets.
  ## Secrets must be manually created in the namespace.
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
  ##
  # image.pullSecrets -- Image pull secrets
  pullSecrets: []
  # - myRegistrKeySecretName

# kubeTargetVersionOverride -- Allow overriding the `.Capabilities.KubeVersion.GitVersion` check. Useful for `helm template` commands.
kubeTargetVersionOverride: ""

# kwokConfigMapName -- configmap for configuring kwok provider
kwokConfigMapName: "kwok-provider-config"

# magnumCABundlePath -- Path to the host's CA bundle, from `ca-file` in the cloud-config file.
magnumCABundlePath: "/etc/kubernetes/ca-bundle.crt"

# magnumClusterName -- Cluster name or ID in Magnum.
# Required if `cloudProvider=magnum` and not setting `autoDiscovery.clusterName`.
magnumClusterName: ""

# nameOverride -- String to partially override `cluster-autoscaler.fullname` template (will maintain the release name)
nameOverride: ""

# nodeSelector -- Node labels for pod assignment. Ref: https://kubernetes.io/docs/user-guide/node-selection/.
nodeSelector: {}

# podAnnotations -- Annotations to add to each pod.
podAnnotations:
  cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

# podDisruptionBudget -- Pod disruption budget.
podDisruptionBudget:
  maxUnavailable: 1
  # minAvailable: 2

# podLabels -- Labels to add to each pod.
podLabels: {}

# priorityClassName -- priorityClassName
priorityClassName: "system-cluster-critical"

# priorityConfigMapAnnotations -- Annotations to add to `cluster-autoscaler-priority-expander` ConfigMap.
priorityConfigMapAnnotations: {}
  # key1: "value1"
  # key2: "value2"

## Custom PrometheusRule to be defined
## The value is evaluated as a template, so, for example, the value can depend on .Release or .Chart
## ref: https://github.com/coreos/prometheus-operator#customresourcedefinitions
prometheusRule:
  # prometheusRule.enabled -- If true, creates a Prometheus Operator PrometheusRule.
  enabled: false
  # prometheusRule.additionalLabels -- Additional labels to be set in metadata.
  additionalLabels: {}
  # prometheusRule.namespace -- Namespace which Prometheus is running in.
  namespace: monitoring
  # prometheusRule.interval -- How often rules in the group are evaluated (falls back to `global.evaluation_interval` if not set).
  interval: null
  # prometheusRule.rules -- Rules spec template (see https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#rule).
  rules: []

rbac:
  # rbac.create -- If `true`, create and use RBAC resources.
  create: true
  # rbac.pspEnabled -- If `true`, creates and uses RBAC resources required in the cluster with [Pod Security Policies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) enabled.
  # Must be used with `rbac.create` set to `true`.
  pspEnabled: false
  # rbac.clusterScoped -- if set to false will only provision RBAC to alter resources in the current namespace. Most useful for Cluster-API
  clusterScoped: true
  serviceAccount:
    # rbac.serviceAccount.annotations -- Additional Service Account annotations.
    annotations: {}
    # rbac.serviceAccount.create -- If `true` and `rbac.create` is also true, a Service Account will be created.
    create: true
    # rbac.serviceAccount.name -- The name of the ServiceAccount to use. If not set and create is `true`, a name is generated using the fullname template.
    name: ""
    # rbac.serviceAccount.automountServiceAccountToken -- Automount API credentials for a Service Account.
    automountServiceAccountToken: true

# replicaCount -- Desired number of pods
replicaCount: 1

# resources -- Pod resource requests and limits.
resources: {}
  # limits:
  #   cpu: 100m
  #   memory: 300Mi
  # requests:
  #   cpu: 100m
  #   memory: 300Mi

# revisionHistoryLimit -- The number of revisions to keep.
revisionHistoryLimit: 10

# securityContext -- [Security context for pod](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/)
securityContext: {}
  # runAsNonRoot: true
  # runAsUser: 1001
  # runAsGroup: 1001

service:
  # service.create -- If `true`, a Service will be created.
  create: true
  # service.annotations -- Annotations to add to service
  annotations: {}
  # service.labels -- Labels to add to service
  labels: {}
  # service.externalIPs -- List of IP addresses at which the service is available. Ref: https://kubernetes.io/docs/user-guide/services/#external-ips.
  externalIPs: []

  # service.loadBalancerIP -- IP address to assign to load balancer (if supported).
  loadBalancerIP: ""
  # service.loadBalancerSourceRanges -- List of IP CIDRs allowed access to load balancer (if supported).
  loadBalancerSourceRanges: []
  # service.servicePort -- Service port to expose.
  servicePort: 8085
  # service.portName -- Name for service port.
  portName: http
  # service.type -- Type of service to create.
  type: ClusterIP

## Are you using Prometheus Operator?
serviceMonitor:
  # serviceMonitor.enabled -- If true, creates a Prometheus Operator ServiceMonitor.
  enabled: false
  # serviceMonitor.interval -- Interval that Prometheus scrapes Cluster Autoscaler metrics.
  interval: 10s
  # serviceMonitor.namespace -- Namespace which Prometheus is running in.
  namespace: monitoring
  ## [Prometheus Selector Label](https://github.com/helm/charts/tree/master/stable/prometheus-operator#prometheus-operator-1)
  ## [Kube Prometheus Selector Label](https://github.com/helm/charts/tree/master/stable/prometheus-operator#exporters)
  # serviceMonitor.selector -- Default to kube-prometheus install (CoreOS recommended), but should be set according to Prometheus install.
  selector:
    release: prometheus-operator
  # serviceMonitor.path -- The path to scrape for metrics; autoscaler exposes `/metrics` (this is standard)
  path: /metrics
  # serviceMonitor.annotations -- Annotations to add to service monitor
  annotations: {}
  ## [RelabelConfig](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.RelabelConfig)
  # serviceMonitor.metricRelabelings -- MetricRelabelConfigs to apply to samples before ingestion.
  metricRelabelings: {}

# tolerations -- List of node taints to tolerate (requires Kubernetes &amp;gt;= 1.6).
tolerations: []

# topologySpreadConstraints -- You can use topology spread constraints to control how Pods are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. (requires Kubernetes &amp;gt;= 1.19).
topologySpreadConstraints: []
  # - maxSkew: 1
  #   topologyKey: topology.kubernetes.io/zone
  #   whenUnsatisfiable: DoNotSchedule
  #   labelSelector:
  #     matchLabels:
  #       app.kubernetes.io/instance: cluster-autoscaler

# updateStrategy -- [Deployment update strategy](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy)
updateStrategy: {}
  # rollingUpdate:
  #   maxSurge: 1
  #   maxUnavailable: 0
  # type: RollingUpdate

# vpa -- Configure a VerticalPodAutoscaler for the cluster-autoscaler Deployment.
vpa:
  # vpa.enabled -- If true, creates a VerticalPodAutoscaler.
  enabled: false
  # vpa.updateMode -- [UpdateMode](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler/v0.13.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L124)
  updateMode: "Auto"
  # vpa.containerPolicy -- [ContainerResourcePolicy](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler/v0.13.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L159). The containerName is always set to the deployment's container name. This value is required if VPA is enabled.
  containerPolicy: {}

# secretKeyRefNameOverride -- Overrides the name of the Secret to use when loading the secretKeyRef for AWS and Azure env variables
secretKeyRefNameOverride: ""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Some settings that can be customized in the values file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;autoDiscovery&lt;/em&gt;: Sets the cluster name for automatic discovery of Auto Scaling groups.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;extraArgs&lt;/em&gt;: Defines additional arguments for the Cluster Autoscaler, such as scaling policies and thresholds.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;rbac&lt;/em&gt;: Configures the service account and RBAC permissions.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;image&lt;/em&gt;: Sets the Cluster Autoscaler image version.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;resources&lt;/em&gt;: Specifies the resource requests and limits for the Cluster Autoscaler pod.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;nodeSelector, tolerations, affinity&lt;/em&gt;: Settings that control where the Cluster Autoscaler pods can be scheduled.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;replicaCount&lt;/em&gt;: Sets the number of Cluster Autoscaler replicas.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;podAnnotations&lt;/em&gt;: Adds annotations to the Cluster Autoscaler pod.&lt;/li&gt;
&lt;/ul&gt;
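&lt;p&gt;As a hedged sketch, a minimal override file touching the settings above might look like the following. The cluster name, image tag, and resource figures are placeholders, not values taken from this article:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# my-values.yaml -- illustrative overrides for the cluster-autoscaler chart
autoDiscovery:
  clusterName: my-eks-cluster   # placeholder cluster name
extraArgs:
  expander: least-waste
  scale-down-unneeded-time: 5m
image:
  tag: v1.29.0                  # match your Kubernetes minor version
replicaCount: 1
resources:
  requests:
    cpu: 100m
    memory: 300Mi
  limits:
    cpu: 100m
    memory: 300Mi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Pass it to Helm with &lt;code&gt;-f my-values.yaml&lt;/code&gt;, or manage it through Terraform's Helm provider as done in this article.&lt;/p&gt;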

&lt;p&gt;After creating all the files above, let's apply them with Terraform by running the command below:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;terraform apply --auto-approve&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Follow the Cluster Autoscaler logs to confirm that the deployment succeeded.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl -n kube-system logs -f deployment/cluster-autoscaler-aws-cluster-autoscaler&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If everything looks good, the Cluster Autoscaler is operational and ready for scaling tests.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Test automatic scaling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's start the autoscaling tests. To do that, we will gather some information, create a few resources, and follow the results.&lt;/p&gt;

&lt;p&gt;Check the current number of worker nodes with the command below:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get nodes&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslel7yayerkhjgotkwu9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslel7yayerkhjgotkwu9.png" alt="Image description" width="516" height="134"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that at this point we have only 1 worker node available.&lt;/p&gt;

&lt;p&gt;Let's create a deployment for the stress tests.&lt;/p&gt;

&lt;p&gt;Create a file named &lt;em&gt;cpu-stress-deployment.yaml&lt;/em&gt; and paste the code below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
spec:
  replicas: 5
  selector:
    matchLabels:
      app: cpu-stress
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      containers:
      - name: cpu-stress
        image: vish/stress
        resources:
          requests:
            cpu: "1"
        args:
        - -cpus
        - "1"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply the deployment with the command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl apply -f cpu-stress-deployment.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Watch the Cluster Autoscaler's behavior; it should increase the number of worker nodes to accommodate the additional workload.&lt;/p&gt;

&lt;p&gt;Follow the Cluster Autoscaler logs.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl -n kube-system logs -f deployment/cluster-autoscaler-aws-cluster-autoscaler&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Watch the worker nodes scale out; note that several worker nodes were added to accommodate the new workload.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl get nodes&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Let's simulate the removal of the excess workload, returning the environment to normal.&lt;/p&gt;

&lt;p&gt;Scale the deployment down to zero pods:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl scale deployment cpu-stress --replicas=0&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Then watch the worker nodes being deprovisioned from the infrastructure, returning the cluster to its original state.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using Terraform and Helm to set up an EKS cluster and the Cluster Autoscaler provides a robust, automated solution for managing the scalability of Kubernetes clusters.&lt;br&gt;
This article walked through the steps needed to implement and manage automatic scaling, ensuring that resources are used efficiently and economically.&lt;br&gt;
With these tools, you can optimize costs and improve the performance of your applications in an AWS-managed Kubernetes environment.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>aws</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Implementing highly scalable applications with Amazon EKS and Karpenter using Terraform.</title>
      <dc:creator>Rodrigo Fernandes</dc:creator>
      <pubDate>Mon, 25 Mar 2024 21:16:08 +0000</pubDate>
      <link>https://dev.to/rodrigofrs13/implementando-aplicacoes-altamente-escalaveis-com-amazon-eks-e-karpenter-utilizando-terraform-16cp</link>
      <guid>https://dev.to/rodrigofrs13/implementando-aplicacoes-altamente-escalaveis-com-amazon-eks-e-karpenter-utilizando-terraform-16cp</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Automation and scalability are crucial elements of infrastructure management in cloud environments, especially for Kubernetes clusters. In this article, we will walk through the technical process of deploying Karpenter, an automatic scaling tool for Kubernetes, using Terraform on an Amazon EKS cluster on AWS.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Karpenter's role in scalability&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Karpenter uses resource-usage metrics and scaling policies to automatically provision new compute nodes in an optimized way when needed, ensuring that applications have enough resources to handle demand.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Challenges and Solutions&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Manually implementing automatic scaling policies can be complex and error-prone. Using Terraform simplifies this process, allowing the infrastructure to be defined as code (IaC) and ensuring consistency and repeatability across deployments.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Benefits of this Approach&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;By adopting this approach, you automate the deployment and management of Karpenter in a Kubernetes environment on AWS, ensuring a scalable, reliable, and consistent infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is Karpenter&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Karpenter is an autoscaling system designed to automate resource provisioning in Kubernetes-based environments.&lt;/p&gt;

&lt;p&gt;Its main goal is to optimize resource usage, ensuring that applications have the capacity needed to meet demand while avoiding wasted resources when demand drops.&lt;/p&gt;

&lt;p&gt;Karpenter is a useful tool for teams managing Kubernetes clusters, because it eliminates the need for manual adjustments to resource sizing.&lt;/p&gt;

&lt;p&gt;It monitors resource utilization and makes automated decisions based on the metrics and policies configured by users.&lt;/p&gt;

&lt;p&gt;This helps ensure efficient, cost-effective performance for applications deployed in Kubernetes environments.&lt;/p&gt;

&lt;p&gt;For more information about Karpenter, see the official documentation by clicking &lt;a href="https://karpenter.sh/docs/getting-started/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How it works&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Karpenter works as a custom controller for Kubernetes. It extends the standard Kubernetes capabilities, enabling smarter, more efficient autoscaling.&lt;/p&gt;

&lt;p&gt;Below is an overview of how Karpenter operates:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpm6q7u914necjwyxln7e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpm6q7u914necjwyxln7e.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics monitoring&lt;/strong&gt;&lt;br&gt;
It constantly monitors metrics such as CPU usage, memory, and other relevant metrics for the pods and nodes of the Kubernetes cluster, and can integrate with monitoring systems such as Prometheus to collect them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics and demand analysis&lt;/strong&gt;&lt;br&gt;
Based on the collected metrics, Karpenter analyzes the cluster's current resource demand, identifying usage patterns and trends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Autoscaling policies&lt;/strong&gt;&lt;br&gt;
Users can define autoscaling policies that specify how Karpenter should adjust the amount of resources available in the cluster. This includes specifying minimum and maximum limits for the number of pods and nodes, as well as configuring resource-allocation strategies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling decisions&lt;/strong&gt;&lt;br&gt;
Based on the collected metrics and the defined policies, Karpenter makes automated decisions about creating, destroying, or resizing pods and nodes in the Kubernetes cluster. It can scale up (increasing the number of pods or nodes) or down (reducing the number of pods or nodes) as needed to meet demand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intelligent resource allocation&lt;/strong&gt;&lt;br&gt;
It allocates resources intelligently, distributing pods efficiently across the available nodes while taking load-balancing policies and each pod's resource requirements into account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In summary&lt;/strong&gt;&lt;br&gt;
Karpenter automates resource scaling in Kubernetes clusters, ensuring that applications have the capacity needed to handle current demand while optimizing the utilization of available resources and avoiding waste.&lt;/p&gt;

&lt;p&gt;This simplifies the operation and management of Kubernetes environments, improving the efficiency and performance of deployed applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Key points&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Provision worker nodes based on workload requirements.&lt;/li&gt;
&lt;li&gt;Cover diverse worker-node configurations by instance type using flexible NodePool options. Instead of managing many specific custom worker node groups, Karpenter lets you manage diverse workload capacity with a single, flexible NodePool.&lt;/li&gt;
&lt;li&gt;Get better pod scheduling at scale by quickly launching worker nodes and scheduling pods.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Key concepts&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;AWS NodeTemplate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the context of Karpenter, this refers to a specific feature of the autoscaler for Kubernetes environments on AWS.&lt;/p&gt;

&lt;p&gt;The AWS NodeTemplate lets you define detailed settings for worker node instances on AWS EKS, including the instance type, CPU capacity, memory capacity, operating system, network settings, attached volumes, and other options. These settings are essential for creating new worker nodes in the AWS Kubernetes cluster according to each application's specific needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SubnetSelector&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A feature that lets users select specific subnets within a VPC to deploy worker node instances when using Karpenter together with Amazon EKS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SecurityGroupSelector&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A feature that lets users select specific AWS Security Groups for worker nodes being deployed in a Kubernetes cluster managed by Karpenter, especially when used together with Amazon EKS.&lt;/p&gt;

&lt;p&gt;This feature is useful for several reasons:&lt;/p&gt;

&lt;p&gt;Security: It enables granular security policies, ensuring that worker node instances are associated with the appropriate security groups and the correct firewall rules.&lt;/p&gt;

&lt;p&gt;Compliance: It makes it easier to comply with security standards and regulations, because you can guarantee that only the appropriate worker node instances have access to the required network resources.&lt;/p&gt;

&lt;p&gt;Isolation: It helps isolate network traffic between different components of your Kubernetes environment on AWS, increasing security and reducing risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Best practices&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Exclude instance types that do not fit your workload&lt;/strong&gt;&lt;br&gt;
Consider excluding specific instance types with the node.kubernetes.io/instance-type key if they are not required by the workloads running in your cluster.&lt;/p&gt;

&lt;p&gt;The following example shows how to avoid provisioning large Graviton instances.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- key: node.kubernetes.io/instance-type
  operator: NotIn
  values:
    - m6g.16xlarge
    - m6gd.16xlarge
    - r6g.16xlarge
    - r6gd.16xlarge
    - c6g.16xlarge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Enable interruption handling when using Spot&lt;/strong&gt;&lt;br&gt;
Karpenter supports native interruption handling, enabled through the --interruption-queue-name CLI argument with the name of an SQS queue. Interruption handling watches for upcoming involuntary interruption events that would disrupt your workloads, such as:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spot interruption warnings&lt;/strong&gt;&lt;br&gt;
Scheduled change health events (maintenance events)&lt;br&gt;
Instance terminating events&lt;br&gt;
Instance stopping events&lt;br&gt;
When Karpenter detects that one of these events will occur on your worker nodes, it automatically cordons, drains, and terminates the worker nodes ahead of the interruption event, allowing the maximum time for workload cleanup before the interruption.&lt;/p&gt;
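&lt;p&gt;As a hedged sketch, the SQS queue is usually wired in through the Karpenter Helm chart values rather than by passing the CLI flag directly; the exact key has changed across chart versions, and the queue name below is hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Helm values sketch -- key names vary by Karpenter chart version.
# Older charts used settings.aws.interruptionQueueName; newer ones
# use settings.interruptionQueue.
settings:
  aws:
    interruptionQueueName: my-cluster-karpenter-interruptions  # hypothetical queue name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;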

&lt;p&gt;&lt;strong&gt;Creating NodePools&lt;/strong&gt;&lt;br&gt;
The following best practices cover topics related to creating NodePools.&lt;/p&gt;

&lt;p&gt;Create multiple NodePools when...&lt;br&gt;
When different teams share a cluster and need to run their workloads on different worker nodes, or have different operating-system or instance-type requirements, create multiple NodePools. For example, one team may want to use Bottlerocket, while another may want Amazon Linux. Likewise, one team may have access to expensive GPU hardware that another team does not need. Using multiple NodePools ensures that the most appropriate assets are available to each team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create NodePools that are mutually exclusive or weighted&lt;/strong&gt;&lt;br&gt;
It is recommended to create NodePools that are either mutually exclusive or weighted, to provide consistent scheduling behavior. If they are not, and multiple NodePools match, Karpenter will randomly choose which one to use, causing unexpected results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use timers (TTL) to automatically remove worker nodes from the cluster&lt;/strong&gt;&lt;br&gt;
You can use timers on provisioned worker nodes to define when to delete worker nodes that have no workload pods or that have reached an expiration time. Node expiration can be used as an upgrade mechanism, so that worker nodes are retired and replaced with updated versions. See Expiration in the Karpenter documentation for information on configuring node expiration with spec.disruption.expireAfter.&lt;/p&gt;
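&lt;p&gt;A minimal sketch of that setting, assuming the newer NodePool API that exposes spec.disruption.expireAfter (the Provisioner example later in this article uses the older ttlSecondsUntilExpired field instead):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    expireAfter: 720h  # replace nodes after 30 days
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;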

&lt;h2&gt;
  
  
  &lt;strong&gt;Creating the EKS cluster environment&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To create an environment with an EKS cluster for our tests, we will use the repository available at this &lt;a href="https://github.com/rodrigofrs13/workshop-aws-eks-karpenter" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EKS Blueprint&lt;/strong&gt;&lt;br&gt;
The source code in the repository above is based on the Blueprints provided by AWS.&lt;br&gt;
AWS offers several ready-made templates, called Blueprints. They abstract away the complexities of the infrastructure, allowing workloads to be deployed in a simple, repeatable way.&lt;br&gt;
The various Blueprint templates can be accessed by clicking &lt;a href="https://github.com/aws-ia/terraform-aws-eks-blueprints?source=post_page-----5294088425e2--------------------------------" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloning the repository&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;git clone https://github.com/rodrigofrs13/workshop-aws-eks-karpenter&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Go to the /workshop-aws-eks-karpenter/enviroment directory&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Editing the variables&lt;/strong&gt;&lt;br&gt;
Edit the terraform.tfvars file with the variables that can be adjusted before starting the installation, for example:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;- AWS region&lt;/em&gt;&lt;br&gt;
&lt;em&gt;- EKS cluster name&lt;/em&gt;&lt;br&gt;
&lt;em&gt;- EKS cluster version&lt;/em&gt;&lt;br&gt;
&lt;em&gt;- Addon versions&lt;/em&gt;&lt;br&gt;
&lt;em&gt;- Admin role name&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Environment setup&lt;/strong&gt;&lt;br&gt;
With the command below we initialize Terraform, create the plan, and perform the setup.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;terraform init &amp;amp;&amp;amp; terraform plan &amp;amp;&amp;amp; terraform apply --auto-approve&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Go to the /workshop-aws-eks-karpenter/eks-blue directory&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Editing the variables&lt;/strong&gt;&lt;br&gt;
Edit the terraform.tfvars file with the variables that can be adjusted before starting the installation, for example:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;- AWS region&lt;/em&gt;&lt;br&gt;
&lt;em&gt;- EKS cluster name&lt;/em&gt;&lt;br&gt;
&lt;em&gt;- EKS cluster version&lt;/em&gt;&lt;br&gt;
&lt;em&gt;- Addon versions&lt;/em&gt;&lt;br&gt;
&lt;em&gt;- Admin role name&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Environment setup&lt;/strong&gt;&lt;br&gt;
With the command below we initialize Terraform, create the plan, and perform the setup.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;terraform init &amp;amp;&amp;amp; terraform plan &amp;amp;&amp;amp; terraform apply --auto-approve&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Accessing the Cluster&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's access the EKS cluster with the following command, which configures kubectl to point to the desired cluster:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;$ aws eks --region &amp;lt;region&amp;gt; update-kubeconfig --name &amp;lt;cluster-name&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you lost the output, use the &lt;code&gt;terraform output&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;Let's run a few initial tests to verify that everything is fine with our cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gathering some information.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;kubectl cluster-info&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbb8swcqsvvfcp811scrg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbb8swcqsvvfcp811scrg.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Checking the worker nodes&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;kubectl get nodes -o wide&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9t3c2cm2un81x4592u6z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9t3c2cm2un81x4592u6z.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reviewing all the created resources&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;kubectl get all -A&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj97npdbu5o55a8mc1k6t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj97npdbu5o55a8mc1k6t.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this we can conclude that our cluster is working correctly and we are ready for the next step: Karpenter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feehrmaarfs1wrf4m7pli.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feehrmaarfs1wrf4m7pli.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Validating Karpenter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because we used an AWS Blueprint, Karpenter is already installed and configured in our cluster.&lt;/p&gt;

&lt;p&gt;Let's run a few checks to confirm that everything is fine with Karpenter.&lt;/p&gt;

&lt;p&gt;Validating the pods.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;kubectl get pods -n karpenter&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkrj26eusecqx0h7qhcof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkrj26eusecqx0h7qhcof.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Validating the Provisioner&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;kubectl get provisioner&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcv3tyrpbv18u2exz2k38.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcv3tyrpbv18u2exz2k38.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this, Karpenter is installed and running successfully.&lt;/p&gt;

&lt;p&gt;Let's create a NodePool with a few rules so we can start using Karpenter and scale our environment intelligently.&lt;/p&gt;

&lt;p&gt;Since we are using the Blueprints, the GitOps and CI/CD pieces are already configured. In our case study we will use part of these tools.&lt;/p&gt;

&lt;p&gt;The repository that can be accessed through this &lt;a href="https://github.com/rodrigofrs13/eks-blueprints-workloads" rel="noopener noreferrer"&gt;link&lt;/a&gt; is a fork of a blueprint we will use.&lt;br&gt;
Clone the repository with the command &lt;code&gt;git clone https://github.com/rodrigofrs13/eks-blueprints-workloads.git&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In the app deployment there is a nodeSelector indicating that all pods of that deployment will be scheduled onto the Karpenter worker nodes.&lt;/p&gt;

&lt;p&gt;We will use Team Riker for our tests; open the Karpenter file at this path: teams/team-riker/dev/templates/karpenter.yaml.&lt;/p&gt;

&lt;p&gt;Let's configure our AWSNodeTemplate and Provisioner to define some settings for Karpenter.&lt;/p&gt;

&lt;p&gt;Below is an example of our NodePool:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

{{ if .Values.spec.karpenterInstanceProfile }}
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: karpenter-default
  labels:
    {{- toYaml .Values.labels | nindent 4 }}  
spec:
  instanceProfile: '{{ .Values.spec.karpenterInstanceProfile }}'
  subnetSelector:
    kubernetes.io/cluster/{{ .Values.spec.clusterName }}: '*'
    kubernetes.io/role/internal-elb: '1' # to select only private subnets
  securityGroupSelector:
    aws:eks:cluster-name: '{{ .Values.spec.clusterName }}' # Choose only security groups of nodes
  tags:
    karpenter.sh/cluster_name: {{.Values.spec.clusterName}}
    karpenter.sh/provisioner: default
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
---
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
  labels:
    {{- toYaml .Values.labels | nindent 4 }}
spec:
  consolidation:
    enabled: true
  #ttlSecondsAfterEmpty: 60 # mutually exclusive with consolidation
  requirements:
    - key: "karpenter.k8s.aws/instance-category"
      operator: In
      values: ["c", "m"]
    - key: karpenter.k8s.aws/instance-cpu
      operator: Lt
      values:
        - '33'    
    - key: 'kubernetes.io/arch'
      operator: In
      values: ['amd64']
    - key: karpenter.sh/capacity-type
      operator: In
      values: ['on-demand']
    - key: kubernetes.io/os
      operator: In
      values:
        - linux
  providerRef:
    name: karpenter-default

  ttlSecondsUntilExpired: 2592000 # 30 Days = 60 * 60 * 24 * 30 Seconds;

  # Priority given to the provisioner when the scheduler considers which provisioner
  # to select. Higher weights indicate higher priority when comparing provisioners.
  # Specifying no weight is equivalent to specifying a weight of 0.
  weight: 1
  limits:
    resources:
      cpu: '2k'
  labels:
    billing-team: default
    team: default
    type: karpenter

  # Do we want to apply some taints on the nodes ?  
  # taints:
  #   - key: karpenter
  #     value: 'true'
  #     effect: NoSchedule

  # Karpenter provides the ability to specify a few additional Kubelet args.
  # These are all optional and provide support for additional customization and use cases.
  kubeletConfiguration:
    containerRuntime: containerd
    maxPods: 110     
    systemReserved:
      cpu: '1'
      memory: 5Gi
      ephemeral-storage: 2Gi
{{ end }}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Some of the items defined:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Labels&lt;/strong&gt;&lt;br&gt;
We define dedicated labels that pods can use as nodeSelectors.&lt;/p&gt;
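&lt;p&gt;For instance, a workload can target these Karpenter nodes with a nodeSelector matching the labels set in the Provisioner above (the pod name and image here are only illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: sample-app   # illustrative name
spec:
  nodeSelector:
    type: karpenter  # labels defined in the Provisioner
    team: default
  containers:
  - name: app
    image: nginx     # illustrative image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;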

&lt;p&gt;&lt;strong&gt;Taints&lt;/strong&gt;&lt;br&gt;
We can add taints to the worker nodes so that workloads must tolerate those taints in order to be scheduled on the Karpenter worker nodes.&lt;/p&gt;

&lt;p&gt;We specified some requirements around instance types, capacity, and architecture; each provisioner is highly customizable.&lt;/p&gt;

&lt;p&gt;After the adjustments, commit and push:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;git add teams/team-riker/dev/templates/karpenter.yaml&lt;br&gt;
git commit -m "Add Karpenter provisioner"&lt;br&gt;
git push&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We can check with eks-node-view (to learn more, click here) below, and we can already see that some Karpenter worker nodes have been provisioned, each in a different AZ because of the settings we made.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphn347s31w0lgtlnlnec.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphn347s31w0lgtlnlnec.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffif1qq79xn5ndgr8l0uv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffif1qq79xn5ndgr8l0uv.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next step is to increase our Workload so that Karpenter starts scaling the Worker Nodes.&lt;/p&gt;

&lt;p&gt;Let's scale Team Riker's skiapp-deployment app with the command below:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;kubectl scale deployment -n team-riker skiapp-deployment --replicas 30&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We can see that Karpenter starts scaling up new Worker Nodes according to the requirements, always favoring the smallest instance that fits.&lt;/p&gt;

&lt;p&gt;At first, Karpenter added 3 new, smaller Worker Nodes (meeting the Multi-AZ requirement) to handle the increased Workload.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmugndlzctb0awvptfqjt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmugndlzctb0awvptfqjt.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a second step, Karpenter added 3 more smaller Worker Nodes (again meeting the Multi-AZ requirement) to handle the increased Workload.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4r80xj2ewsou7mlkyn3d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4r80xj2ewsou7mlkyn3d.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's scale Team Riker's skiapp-deployment app with the command below to simulate heavy Workload consumption:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;kubectl scale deployment -n team-riker skiapp-deployment --replicas 100&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We can see that, at first, Karpenter added new instances.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foaio31o7fv1tt2nn6hko.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foaio31o7fv1tt2nn6hko.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a second step, Karpenter changed the instance type to improve the efficiency of the environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3s9d6ye4s92y7w2tunnt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3s9d6ye4s92y7w2tunnt.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Karpenter continually looks for opportunities to reduce the cost of your cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Tearing down the environment&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's wind the infrastructure down by reducing the Workload to 1:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;kubectl scale deployment -n team-riker skiapp-deployment --replicas 1&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
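
&lt;p&gt;With the workload reduced, Karpenter drains and removes the now-idle Worker Nodes on its own. You can watch them disappear with:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;kubectl get nodes -w&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;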

&lt;p&gt;&lt;strong&gt;Removing the cluster&lt;/strong&gt;&lt;br&gt;
Go to the /workshop-aws-eks-karpenter/enviroment directory&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;terraform destroy -auto-approve&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When it finishes, go to the /workshop-aws-eks-karpenter/eks-blue directory and run the command below&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;terraform destroy -auto-approve&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We can now add capacity to our cluster to scale our Workload, and Karpenter intelligently balances instance choice against cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;References&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.github.io/aws-eks-best-practices/karpenter/" rel="noopener noreferrer"&gt;https://aws.github.io/aws-eks-best-practices/karpenter/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>eks</category>
      <category>devops</category>
      <category>awscommunitybuilders</category>
      <category>terraform</category>
    </item>
    <item>
      <title>AWS Artifact</title>
      <dc:creator>Rodrigo Fernandes</dc:creator>
      <pubDate>Fri, 19 May 2023 14:35:17 +0000</pubDate>
      <link>https://dev.to/rodrigofrs13/aws-artifact-1ll7</link>
      <guid>https://dev.to/rodrigofrs13/aws-artifact-1ll7</guid>
      <description>&lt;p&gt;&lt;strong&gt;Anotações sobre AWS Artifact para ajudar na preparação das certificações AWS.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Até o momento as anotações são para as certificações abaixo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fse0sqtegn424nami6rz8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fse0sqtegn424nami6rz8.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/pt/artifact/" rel="noopener noreferrer"&gt;Link oficial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/pt/artifact/faq/" rel="noopener noreferrer"&gt;FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;General notes&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Access to reports issued by AWS auditors&lt;/li&gt;
&lt;li&gt;ISO 9001:2015 Certification&lt;/li&gt;
&lt;li&gt;HIPAA&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PCI - Payment Card Industry&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SOC - Service Organization Control&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;It is not an AWS service per se&lt;/li&gt;
&lt;li&gt;You must have permissions to access the reports&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Identity and access management in AWS Artifact&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/artifact/latest/ug/security-iam.html#report-permissions" rel="noopener noreferrer"&gt;Documentação oficial&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example policy granting access to all reports&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "artifact:Get"
            ],
            "Resource": [
                "arn:aws:artifact:::report-package/*"
            ]
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example policy granting access to the SOC, PCI, and ISO reports&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "artifact:Get"
            ],
            "Resource": [
                "arn:aws:artifact:::report-package/Certifications and Attestations/SOC/*",
                "arn:aws:artifact:::report-package/Certifications and Attestations/PCI/*",
                "arn:aws:artifact:::report-package/Certifications and Attestations/ISO/*"
            ]
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  &lt;strong&gt;PCI DSS&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/pt/compliance/pci-dss-level-1-faqs/" rel="noopener noreferrer"&gt;Documentação oficial&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>devops</category>
      <category>security</category>
      <category>aws</category>
    </item>
    <item>
      <title>Amazon Route53</title>
      <dc:creator>Rodrigo Fernandes</dc:creator>
      <pubDate>Fri, 19 May 2023 14:35:04 +0000</pubDate>
      <link>https://dev.to/rodrigofrs13/amazon-route53-3gj4</link>
      <guid>https://dev.to/rodrigofrs13/amazon-route53-3gj4</guid>
      <description>&lt;p&gt;&lt;strong&gt;Anotações sobre o Amazon Route53 para ajudar na preparação das certificações AWS.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Até o momento as anotações são para as certificações abaixo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KCHawoHT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/se0sqtegn424nami6rz8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KCHawoHT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/se0sqtegn424nami6rz8.png" alt="Image description" width="168" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/pt/route53/"&gt;Link oficial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/pt/route53/faqs/"&gt;FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;General notes&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;DNS service&lt;/li&gt;
&lt;li&gt;Sends logs to CloudWatch Logs&lt;/li&gt;
&lt;li&gt;You must add the VPC to the query logging configuration&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Public DNS query logging&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/query-logs.html"&gt;Documentação oficial&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Information logged&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Domain or subdomain that was requested&lt;/li&gt;
&lt;li&gt;Date and time of the request&lt;/li&gt;
&lt;li&gt;DNS record type (such as A or AAAA)&lt;/li&gt;
&lt;li&gt;Route 53 edge location that responded to the DNS query&lt;/li&gt;
&lt;li&gt;DNS response code, such as &lt;code&gt;NoError&lt;/code&gt; or &lt;code&gt;ServFail&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Logs can be sent to&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CloudWatch Logs &lt;/li&gt;
&lt;li&gt;S3 bucket&lt;/li&gt;
&lt;li&gt;Kinesis Data Firehose&lt;/li&gt;
&lt;/ul&gt;
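
&lt;p&gt;As a sketch, public DNS query logging to CloudWatch Logs can be enabled with the AWS CLI; the hosted zone ID and log group ARN below are placeholders (note that the log group must live in us-east-1):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;aws route53 create-query-logging-config --hosted-zone-id Z1D633PJN98FT9 --cloud-watch-logs-log-group-arn arn:aws:logs:us-east-1:111122223333:log-group:/aws/route53/example.com&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;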

&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver-query-logs.html"&gt;Resolver query logging&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/pt_br/Route53/latest/DeveloperGuide/resolver-query-logs-format.html"&gt;Valores que aparecem em logs de consultas do Resolver&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>AWS Network Firewall</title>
      <dc:creator>Rodrigo Fernandes</dc:creator>
      <pubDate>Fri, 19 May 2023 14:34:48 +0000</pubDate>
      <link>https://dev.to/rodrigofrs13/aws-network-firewall-4129</link>
      <guid>https://dev.to/rodrigofrs13/aws-network-firewall-4129</guid>
      <description>&lt;p&gt;&lt;strong&gt;Anotações sobre o AWS Network Firewall para ajudar na preparação das certificações AWS.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Até o momento as anotações são para as certificações abaixo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KCHawoHT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/se0sqtegn424nami6rz8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KCHawoHT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/se0sqtegn424nami6rz8.png" alt="Image description" width="168" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/pt/network-firewall/?whats-new-cards.sort-by=item.additionalFields.postDateTime&amp;amp;whats-new-cards.sort-order=desc"&gt;Link oficial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/pt/network-firewall/faqs/"&gt;FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Vendor definition&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;AWS Network Firewall is a managed service that makes it easy to deploy essential network protections for all of your Amazon Virtual Private Clouds (VPCs).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8qzcN4Gc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/udqvjz0gzr3lmaqywud2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8qzcN4Gc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/udqvjz0gzr3lmaqywud2.png" alt="Image description" width="597" height="386"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;General notes&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Always associated with a VPC&lt;/li&gt;
&lt;li&gt;Stateful/Stateless firewall&lt;/li&gt;
&lt;li&gt;Intrusion Prevention System (IPS)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Does not protect against DDoS&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Manages allowing or blocking of URLs&lt;/li&gt;
&lt;li&gt;HTTP/HTTPS&lt;/li&gt;
&lt;li&gt;Can import Suricata rules&lt;/li&gt;
&lt;li&gt;It cannot, by itself, allow EC2 instances in a private subnet to reach the Internet while blocking inbound connections.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Delete protection&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Protects the firewall against deletion. Use this setting to guard against accidentally deleting a firewall that is in use.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enabled by default&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Subnet change protection&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Protects the firewall against changes to its subnet associations. Use this setting to guard against accidentally modifying the subnet associations of a firewall that is in use.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enabled by default&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Network Firewall Policy&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Adds multiple Rule Groups and other settings&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Network Firewall Rule Group&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Stateful or Stateless&lt;/li&gt;
&lt;li&gt;Creates the blocking rules in the VPC&lt;/li&gt;
&lt;li&gt;Can target domains such as "facebook.com"&lt;/li&gt;
&lt;li&gt;Can target direct IPs such as "8.8.8.8"&lt;/li&gt;
&lt;li&gt;Can block by protocol (HTTP, HTTPS, ICMP)&lt;/li&gt;
&lt;/ul&gt;
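
&lt;p&gt;For instance, a stateful rule group can carry Suricata-compatible rules. A minimal, illustrative rule that drops plain-HTTP requests to a given domain (the sid is arbitrary):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;drop http any any -&amp;gt; any any (msg:"blocked domain"; http.host; content:"facebook.com"; sid:1000001; rev:1;)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;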

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>AWS Firewall Manager</title>
      <dc:creator>Rodrigo Fernandes</dc:creator>
      <pubDate>Fri, 19 May 2023 14:34:32 +0000</pubDate>
      <link>https://dev.to/rodrigofrs13/aws-firewall-manager-204g</link>
      <guid>https://dev.to/rodrigofrs13/aws-firewall-manager-204g</guid>
      <description>&lt;p&gt;&lt;strong&gt;Anotações sobre o AWS Firewall Manager para ajudar na preparação das certificações AWS.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Até o momento as anotações são para as certificações abaixo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KCHawoHT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/se0sqtegn424nami6rz8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KCHawoHT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/se0sqtegn424nami6rz8.png" alt="Image description" width="168" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/pt/firewall-manager/"&gt;Link oficial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/pt/firewall-manager/faqs/"&gt;FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Vendor definition&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;AWS Firewall Manager is a security management service that lets you centrally configure and manage firewall rules across all accounts and applications in AWS Organizations.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;General notes&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Prerequisites -&amp;gt; &lt;em&gt;AWS Organizations&lt;/em&gt; and &lt;em&gt;AWS Config&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;High cost&lt;/li&gt;
&lt;li&gt;Simplifies administration and maintenance of AWS WAF, AWS Shield Advanced, and Amazon VPC security groups across multiple accounts and resources.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Supported services&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AWS WAF&lt;/li&gt;
&lt;li&gt;Security Groups&lt;/li&gt;
&lt;li&gt;AWS Network Firewall&lt;/li&gt;
&lt;li&gt;Route53 DNS Firewall&lt;/li&gt;
&lt;li&gt;AWS Shield Advanced&lt;/li&gt;
&lt;li&gt;Palo Alto Cloud Next-generation Firewalls&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;References&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/pt/firewall-manager/pricing/"&gt;AWS Firewall Manager pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tutorialsdojo.com/aws-firewall-manager/"&gt;Tutorials Dojo - AWS Firewall Manager&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>AWS Organizations</title>
      <dc:creator>Rodrigo Fernandes</dc:creator>
      <pubDate>Fri, 19 May 2023 14:34:16 +0000</pubDate>
      <link>https://dev.to/rodrigofrs13/aws-organizations-13ih</link>
      <guid>https://dev.to/rodrigofrs13/aws-organizations-13ih</guid>
      <description>&lt;p&gt;&lt;strong&gt;Anotações sobre o AWS Organizations para ajudar na preparação das certificações AWS.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Até o momento as anotações são para as certificações abaixo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fse0sqtegn424nami6rz8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fse0sqtegn424nami6rz8.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/pt/organizations/" rel="noopener noreferrer"&gt;Link oficial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/pt/organizations/faqs/" rel="noopener noreferrer"&gt;FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Vendor definition&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;AWS Organizations helps you centrally manage and govern your environment as your business and your AWS resources grow. &lt;br&gt;
Using AWS Organizations, you can create new AWS accounts and allocate resources, group accounts to organize your workflows, apply policies to accounts or groups for governance, and simplify billing by using a single payment method for all of your accounts.&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  &lt;strong&gt;General notes&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Global service&lt;/li&gt;
&lt;li&gt;Automates the creation of AWS accounts&lt;/li&gt;
&lt;li&gt;Centralized management of all accounts&lt;/li&gt;
&lt;li&gt;Grouping - OU&lt;/li&gt;
&lt;li&gt;Per-account control of services/APIs&lt;/li&gt;
&lt;li&gt;Enables CloudTrail in all accounts to send the logs to a central S3 bucket&lt;/li&gt;
&lt;li&gt;Sends all CloudWatch Logs to a central account&lt;/li&gt;
&lt;li&gt;To remove an account, the AWS account must be able to operate as a standalone account; only then can it be removed from AWS Organizations&lt;/li&gt;
&lt;/ul&gt;
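
&lt;p&gt;Account creation can also be scripted. A sketch with the AWS CLI (the e-mail and account name below are placeholders):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;aws organizations create-account --email dev-team@example.com --account-name "Dev Account"&lt;br&gt;
aws organizations list-accounts&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;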


&lt;h2&gt;
  
  
  &lt;strong&gt;Consolidated Billing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_getting-started_concepts.html?icmpid=docs_orgs_console#feature-set-cb-only" rel="noopener noreferrer"&gt;Documentação oficial&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Discounts&lt;/li&gt;
&lt;li&gt;Volume&lt;/li&gt;
&lt;li&gt;Reserved Instances&lt;/li&gt;
&lt;li&gt;Discounts are only shared if the EC2 instances are in the same AZ&lt;/li&gt;
&lt;li&gt;Savings Plans&lt;/li&gt;
&lt;li&gt;A single invoice&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  &lt;strong&gt;Service Control Policies (SCP)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html" rel="noopener noreferrer"&gt;Documentação oficial&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An organization policy that you can use to manage permissions in your organization.&lt;/li&gt;
&lt;li&gt;An SCP covers all IAM users, groups, and roles, including the AWS account root user.&lt;/li&gt;
&lt;li&gt;Allow lists and deny lists&lt;/li&gt;
&lt;li&gt;Applied at the OU or account level&lt;/li&gt;
&lt;li&gt;Not applied to the management (master) account&lt;/li&gt;
&lt;li&gt;Applies to all users and roles, including root&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;An explicit Allow is still required (an SCP does not grant permissions by itself)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
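
&lt;p&gt;As a concrete sketch, an SCP that prevents member accounts from leaving the organization (a common baseline policy) looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyLeavingTheOrg",
            "Effect": "Deny",
            "Action": "organizations:LeaveOrganization",
            "Resource": "*"
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;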


&lt;h2&gt;
  
  
  &lt;strong&gt;Organizational unit (OU)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_ous.html" rel="noopener noreferrer"&gt;Documentação oficial&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  &lt;strong&gt;An easier way to control access to AWS resources by using the AWS organization of IAM principals&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/pt/blogs/security/control-access-to-aws-resources-by-using-the-aws-organization-of-iam-principals/" rel="noopener noreferrer"&gt;Documentação oficial&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For some services, you grant permissions using resource-based policies that specify the accounts and principals that can access the resource and the actions they can perform on it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;aws:PrincipalOrgID&lt;/strong&gt; - nessas políticas para exigir que todos os principais que acessam o recurso sejam de uma conta (incluindo a conta mestra) na organização. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowGetObject",
            "Effect": "Allow",
            "Principal": {
                "AWS":[
                        "arn:aws:iam::094697565664:user/Casey",
                        "arn:aws:iam::094697565664:user/David",
                        "arn:aws:iam::094697565664:user/Tom",
                        "arn:aws:iam::094697565664:user/Michael",
                        "arn:aws:iam::094697565664:user/Brenda",
                        "arn:aws:iam::094697565664:user/Lisa",
                        "arn:aws:iam::094697565664:user/Norman",
                        "arn:aws:iam::094697565646:user/Steve",
                        "arn:aws:iam::087695765465:user/Douglas",
                        "arn:aws:iam::087695765465:user/Michelle"
                ]
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::2018-Financial-Data/*",
            "Condition": {
                "StringEquals": {"aws:PrincipalOrgID": ["o-yyyyyyyyyy"]}
            }
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>devops</category>
      <category>aws</category>
      <category>cloud</category>
      <category>security</category>
    </item>
  </channel>
</rss>
