Disruption budgets were introduced in Karpenter 0.36, and they look like a very useful tool for limiting when Karpenter is allowed to recreate WorkerNodes.
For example, in my case we don't want EC2 instances to be killed during US business hours, because we have customers there, so we currently run with consolidationPolicy: WhenEmpty
to prevent "unnecessary" deletion of servers and the Pods on them.
Instead, with Disruption budgets we can configure policies so that operations with WhenEmpty are allowed in one period of time, and WhenEmptyOrUnderutilized in another.
See also Kubernetes: ensuring High Availability for Pods, because when using Karpenter, even with Disruption budgets configured, you still need your Pods to have Topology Spread Constraints and a PodDisruptionBudget configured accordingly.
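For example, a minimal PodDisruptionBudget sketch for a hypothetical backend-api Deployment (the name and labels here are just placeholders):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: backend-api
spec:
  minAvailable: 1          # keep at least one Pod running during voluntary disruptions
  selector:
    matchLabels:
      app: backend-api     # must match the labels of the Pods to protect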
Karpenter Disruption types
Documentation — Automated Graceful Methods.
First, let’s see in which cases Disruption occurs at all:
- Drift: occurs when there is a difference between the created NodePool or EC2NodeClass configurations and the existing WorkerNodes; Karpenter will then start recreating the EC2 instances to bring them in line with the specified parameters
- Interruption: occurs when Karpenter receives an AWS event that an instance is about to be terminated, for example if it is a Spot Instance
- Consolidation: occurs if we have Consolidation set to WhenEmptyOrUnderutilized or WhenEmpty, and Karpenter moves our Pods to other WorkerNodes. Note: we have Karpenter v1.0, so the policy is called WhenEmptyOrUnderutilized; for v0.36 or v0.37 it's WhenUnderutilized
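A quick way to check what Karpenter thinks about a particular node is to look at its NodeClaim, the Karpenter v1 resource that backs each WorkerNode (the claim name below is a placeholder):
$ kubectl get nodeclaims
$ kubectl describe nodeclaim backend1a-abc12
In the describe output, the Status conditions should show why a node is a disruption candidate, for example a Drifted condition.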
Karpenter Disruption Budgets
With Disruption budgets, we can very flexibly configure which operations Karpenter is allowed to perform and when, and set a limit on how many WorkerNodes can be deleted at the same time.
Documentation — NodePool Disruption Budgets.
The configuration format is quite simple:
budgets:
  - nodes: "20%"
    reasons:
      - "Empty"
    schedule: "@daily"
    duration: 10m
Here we set:
- allow deletion of up to 20% of the total number of WorkerNodes
- for operations where the Disruption is triggered by the WhenEmpty condition
- every day
- for 10 minutes
The parameters here can take the following values:
- nodes: a percentage or a number of nodes
- reasons: Drifted, Underutilized, or Empty
- schedule: the schedule by which the rule is applied, in UTC (other time zones are not supported yet), see Kubernetes Schedule syntax
- duration: how long the rule stays in effect, for example 1h15m
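For example, a budget that uses all four fields at once might look like this (the values are purely illustrative):
budgets:
  - nodes: "2"                    # no more than 2 nodes disrupted at a time
    reasons:
      - "Drifted"
    schedule: "0 22 * * mon-fri"  # every weekday at 22:00 UTC
    duration: 1h15m               # the rule stays active for 1 hour 15 minutes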
Also, it is not necessary to set all the parameters.
For example, we can describe two such budgets:
- nodes: "25%"
- nodes: "10"
Then both rules will be in effect all the time: the first limits disruptions to 25% of the total number of nodes, and the second to no more than 10 instances at a time; the second becomes the effective limit once we have more than 40 servers.
Budgets can also be combined: if several of them are active at the same time, the most restrictive limit is applied.
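For illustration, assuming a cluster of 60 WorkerNodes, the two budgets above would combine like this:
"25%" budget: 0.25 * 60 = 15 nodes may be disrupted at once
"10" budget:  10 nodes may be disrupted at once
effective limit: min(15, 10) = 10 nodes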
In the first example, we apply the rule for 20% of nodes and the WhenEmpty condition, and the rest of the time the default disruption rules will work, that is, 10% of the total number of servers with the specified consolidationPolicy.
Therefore, we can write the rule as follows:
budgets:
  - nodes: "20%"
    reasons:
      - "Empty"
    schedule: "@daily"
    duration: 10m
  - nodes: "0"
Here, the last rule works all the time and acts as a kind of fuse: we prohibit everything, but allow disruptions to be executed according to the WhenEmpty policy for 10 minutes once a day, starting from 00:00 UTC.
Disruption Budgets example
Going back to my task:
- we have a Backend API in Kubernetes on a dedicated NodePool, and our customers are mostly from the USA, so we want to minimize the down-scaling of WorkerNodes during US business hours
- to do this, we want to block all WhenUnderutilized operations during working hours in US Central Time
- Karpenter's schedule uses the UTC zone, so the start of the working day in US Central Time, 9:00, is 15:00 UTC (see the note right after this list)
- operations with WhenEmpty are allowed at any time, but only 1 WorkerNode at a time
- Drift: similarly allowed at any time, because when I deploy changes, I want to see the result immediately
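The conversion itself (note this assumes Central Standard Time; during daylight saving time Central is UTC-5, so 9:00 would be 14:00 UTC):
09:00 US Central Standard Time (UTC-6) = 15:00 UTC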
So, in fact, we need to set two budgets:
- for Underutilized: prohibit everything from Monday to Friday for 9 hours, starting from 15:00 UTC
- for Empty and Drifted: allow at any time, but only 1 node at a time instead of the default 10%
Then our NodePool will look like this:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: backend1a
spec:
  template:
    metadata:
      labels:
        created-by: karpenter
        component: devops
    spec:
      taints:
        - key: BackendOnly
          operator: Exists
          effect: NoSchedule
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: defaultv1a
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c5"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["large", "xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  # total cluster limits
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 600s
    budgets:
      - nodes: "0" # block all
        reasons:
          - "Underutilized" # if reason == underutilized
        schedule: "0 15 * * mon-fri" # starting at 15:00 UTC during weekdays
        duration: 9h # during 9 hours
      - nodes: "1" # allow by 1 WorkerNode at a time
        reasons:
          - "Empty"
          - "Drifted"
Deploy it and check the NodePool:
$ kk describe nodepool backend1a
Name:         backend1a
...
API Version:  karpenter.sh/v1
Kind:         NodePool
...
Spec:
  Disruption:
    Budgets:
      Duration:  9h
      Nodes:     0
      Reasons:
        Underutilized
      Schedule:  0 15 * * mon-fri
      Nodes:     1
      Reasons:
        Empty
        Drifted
    Consolidate After:     600s
    Consolidation Policy:  WhenEmptyOrUnderutilized
...
And we can see in Karpenter's logs that a Disruption was triggered by WhenUnderutilized:
karpenter-55b845dd4c-tlrdr:controller {"level":"INFO","time":"2024-09-16T10:48:26.777Z","logger":"controller","message":"disrupting nodeclaim(s) via delete, terminating 1 nodes (2 pods) ip-10-0-42-250.ec2.internal/t3.small/spot","commit":"62a726c","controller":"disruption","namespace":"","name":"","reconcileID":"db2233c3-c64b-41f2-a656-d6a5addeda8a","command-id":"1cd3a8d8-57e9-4107-a701-bd167ed23686","reason":"underutilized"}
karpenter-55b845dd4c-tlrdr:controller {"level":"INFO","time":"2024-09-16T10:48:27.016Z","logger":"controller","message":"tainted node","commit":"62a726c","controller":"node.termination","controllerGroup":"","controllerKind":"Node","Node":{"name":"ip-10-0-42-250.ec2.internal"},"namespace":"","name":"ip-10-0-42-250.ec2.internal","reconcileID":"f0815e43-94fb-4546-9663-377441677028","taint.Key":"karpenter.sh/disrupted","taint.Value":"","taint.Effect":"NoSchedule"}
karpenter-55b845dd4c-tlrdr:controller {"level":"INFO","time":"2024-09-16T10:50:35.212Z","logger":"controller","message":"deleted node","commit":"62a726c","controller":"node.termination","controllerGroup":"","controllerKind":"Node","Node":{"name":"ip-10-0-42-250.ec2.internal"},"namespace":"","name":"ip-10-0-42-250.ec2.internal","reconcileID":"208e5ff7-8371-442a-9c02-919e3525001b"}
Done.
Originally published at RTFM: Linux, DevOps, and system administration.