DEV Community

Soushi Hiruta


[AWS] EKS Auto Mode Node lifecycle [EKS]

Introduction


Node Lifecycle

Nodes launched by EKS Auto Mode have a maximum lifetime of 21 days (which you can reduce), after which they are automatically replaced with new nodes.

By default, instances are terminated after 336 hours (14 days):
https://docs.aws.amazon.com/eks/latest/userguide/create-node-pool.html

      spec:
        expireAfter: 336h

Node expiration is handled by Karpenter's node disruption mechanism:
https://karpenter.sh/docs/concepts/disruption/

Karpenter automatically discovers disruptable nodes and spins up replacements when needed.

How the disruption controller works

  1. Prioritize the candidate nodes to disrupt.

  2. For each candidate node, check the NodePool's disruption budget.

Budgets are defined in spec.disruption.budgets. If they are undefined, Karpenter defaults to a single budget of nodes: 10%.

  spec:
    disruption:
      budgets:
      - nodes: 10%
  3. Determine whether replacement nodes are needed.

        taints:
        - effect: NoSchedule
          key: CriticalAddonsOnly
        terminationGracePeriod: 24h0m0s

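By assigning the CriticalAddonsOnly taint with the NoSchedule effect to a node, you prevent Pods other than system Pods (that is, Pods without a matching toleration) from being scheduled onto that node. terminationGracePeriod (24h here) sets the maximum time Karpenter waits, once deletion starts, before forcefully deleting the Pods still on the node.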

  4. Wait until the replacement nodes start up.

  5. Delete the node(s) and wait for the Termination Controller to gracefully shut down the node(s).
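To illustrate the CriticalAddonsOnly taint from step 3: a Pod can only be scheduled onto such a node if it carries a matching toleration. The manifest below is a minimal, hypothetical sketch (the Pod name and image are placeholders, not part of the original setup). operator: Exists matches the taint regardless of its value, which fits here because the taint has no value.

```yaml
# Hypothetical Pod that tolerates the CriticalAddonsOnly:NoSchedule taint
apiVersion: v1
kind: Pod
metadata:
  name: critical-addon-example   # placeholder name
spec:
  tolerations:
  - key: CriticalAddonsOnly      # must match the taint key on the node
    operator: Exists             # match any value (the taint above has none)
    effect: NoSchedule           # must match the taint effect
  containers:
  - name: app
    image: busybox:1.36          # placeholder image
    command: ["sleep", "3600"]
```

A toleration only permits scheduling onto the tainted node; it does not attract the Pod there. Pinning the Pod to those nodes would additionally need a nodeSelector or node affinity.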

Consolidation is configured by consolidationPolicy and consolidateAfter.

  spec:
    disruption:
      budgets:
      - nodes: 10%
      consolidateAfter: 30s

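consolidateAfter sets how long Karpenter waits after a Pod is added to or removed from a node before consolidating that node. Increasing it can help when applications are slow to start, because it delays node replacement to some extent.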
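The snippet above sets consolidateAfter but not consolidationPolicy. The fragment below is a sketch that sets both, assuming EKS Auto Mode honors the same Karpenter v1 disruption fields as upstream Karpenter. The 5m value and the business-hours budget are illustrative, not recommendations. When several budgets are active at once, Karpenter applies the most restrictive one, so the second budget blocks consolidation of underutilized nodes during the scheduled window (cron schedules are evaluated in UTC).

```yaml
# Fragment of a NodePool spec (Karpenter v1 API); values are illustrative
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # or WhenEmpty
    consolidateAfter: 5m        # wait 5 minutes after Pod changes before consolidating
    budgets:
    - nodes: "10%"              # same as the default budget
    - nodes: "0"                # allow no disruptions...
      schedule: "0 9 * * mon-fri"   # ...starting 09:00 UTC on weekdays
      duration: 8h              # ...for 8 hours
      reasons:
      - Underutilized           # only blocks consolidation of underutilized nodes
```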

Multi-node consolidation: Karpenter tries to delete two or more nodes in parallel, possibly launching a single replacement whose price is lower than the combined price of the nodes being removed.

Consolidation can also replace a single node with a cheaper instance type, so node resource efficiency is adjusted automatically.

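Using preferred anti-affinity and topology spread constraints can reduce the effectiveness of consolidation. Karpenter also tries to honor these preferences when consolidating, so it may keep a node rather than violate them, even when kube-scheduler could fit the Pods elsewhere.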
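For reference, "preferred" here means soft scheduling constraints like the ones in this hypothetical Deployment Pod template fragment (the app: web label is a placeholder). Both constraints are soft: the scheduler may violate them, but as described above, they can still make consolidation more conservative.

```yaml
# Fragment of a Deployment spec; the app: web label is a placeholder
spec:
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule: prefer not to co-locate replicas on the same node
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
      # Soft rule: ScheduleAnyway makes the zone spread a preference
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: web
```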

If interruption-handling is enabled, Karpenter will watch for upcoming involuntary interruption events that would cause disruption to your workloads.

It is advisable to monitor interruption events.

Node Auto Repair is a feature that automatically identifies and replaces unhealthy nodes in your cluster, but it is still an alpha feature.

Because EKS does not let you enable non-GA feature gates, I believe this feature cannot be used.

Trying a custom NodePool

Export the NodePools to a file:

kubectl get nodepools -o yaml > nodepools.yaml

Edit the expireAfter value in nodepools.yaml, then apply the file:

kubectl apply -f nodepools.yaml

The settings will be reflected immediately.
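As an alternative to editing the exported file, a separate custom NodePool with a shorter lifetime could look like the sketch below. It is modeled on the NodePool examples in the EKS documentation linked earlier, but the name custom-pool and the requirement values are hypothetical, and it assumes the default NodeClass (named default) that EKS Auto Mode creates is present in the cluster.

```yaml
# Hypothetical custom NodePool for EKS Auto Mode with a 7-day node lifetime
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: custom-pool                  # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default                # assumes Auto Mode's default NodeClass
      requirements:
      - key: eks.amazonaws.com/instance-category
        operator: In
        values: ["c", "m", "r"]      # illustrative instance categories
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]
      expireAfter: 168h              # 7 days, below the 21-day (504h) maximum
      terminationGracePeriod: 24h0m0s
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    budgets:
    - nodes: "10%"
```

Save it under a name of your choice (for example, the hypothetical custom-nodepool.yaml) and apply it with kubectl apply -f custom-nodepool.yaml.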
