<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ant(on) Weiss</title>
    <description>The latest articles on DEV Community by Ant(on) Weiss (@antweiss).</description>
    <link>https://dev.to/antweiss</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F106191%2F997cd74b-4346-4332-81f5-4ec7e4785416.jpeg</url>
      <title>DEV Community: Ant(on) Weiss</title>
      <link>https://dev.to/antweiss</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/antweiss"/>
    <language>en</language>
    <item>
      <title>Truly Reactive Cloud Native AI Agents with Kagent and Khook</title>
      <dc:creator>Ant(on) Weiss</dc:creator>
      <pubDate>Tue, 09 Sep 2025 13:43:54 +0000</pubDate>
      <link>https://dev.to/antweiss/truly-reactive-cloud-native-ai-agents-with-kagent-and-khook-4knj</link>
      <guid>https://dev.to/antweiss/truly-reactive-cloud-native-ai-agents-with-kagent-and-khook-4knj</guid>
      <description>&lt;h1&gt;
  
  
  Agent Cloud-Native in the House!
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftd7uwirbs595y5zzsk2b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftd7uwirbs595y5zzsk2b.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Excited by the vision of smart AI agents watching over your Kubernetes cluster? &lt;br&gt;
Want an easy, fully cloud-native way to run your agentic software? &lt;br&gt;
Time to discover &lt;a href="https://kagent.dev" rel="noopener noreferrer"&gt;Kagent&lt;/a&gt;! &lt;br&gt;
It’s a fairly young OSS project started by the folks at &lt;a href="https://Solo.io" rel="noopener noreferrer"&gt;Solo.io&lt;/a&gt; that aims to make building and running AI agents on Kubernetes easy and fun. With all the bells and whistles one would expect - authz and authn, security, visualization, governance and audit, optimization, you name it. Some of it on the roadmap, some of it being built as we speak.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0tw6oe2omamz4u4x5ty.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0tw6oe2omamz4u4x5ty.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Joining the Community
&lt;/h1&gt;

&lt;p&gt;I started playing with Kagent a couple of months ago, loved its ergonomics and decided to build a workshop around it - meanwhile fixing some docs, submitting PRs and joining the community meetings. If you’re looking for a great OSS community to join - look no further - it’s a warm and welcoming bunch of very smart folks. And there’s a lot of work to be done!&lt;/p&gt;
&lt;h1&gt;
  
  
  The Need for Reactivity
&lt;/h1&gt;

&lt;p&gt;So after working with Kagent for a while I realized something was missing. For me, anyway. You see - conceptually the difference between an agent and a tool is that an agent acts on your behalf, making decisions in alignment with the declared goals and guidelines. (With a tool - you have to make all the decisions yourself and spell out every wish as a command.) In this respect - much of the cloud native software is already agentic. Anything built on KRM (the &lt;a href="https://www.geeksforgeeks.org/linux-unix/kubernetes-resource-model-krm-and-how-to-make-use-of-yaml/" rel="noopener noreferrer"&gt;Kubernetes Resource Model&lt;/a&gt;) is declarative by nature - the user declares the desired state and then the operators or controllers make sure it becomes the actual state. And in that respect - they are definitely our agents: acting on our behalf, recreating missing pods, reconciling broken states. True agents are reactive by nature - they listen for events and correct course accordingly, with state or goal declarations being just one type of event.&lt;/p&gt;

&lt;p&gt;And that’s exactly what Kagent didn’t have. Until now, in order to summon an agent, one needed to chat with it - either in Kagent’s sleek Web UI, over CLI or via the API. But what good is an agent if it just sits there waiting for instructions? I wanted a way to make my agents reactive.&lt;/p&gt;
&lt;h1&gt;
  
  
  Khook
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn8mzxcg00nd8jaybew8o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn8mzxcg00nd8jaybew8o.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enter Khook - a Kubernetes controller that allows defining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes events to listen to&lt;/li&gt;
&lt;li&gt;the agent to call&lt;/li&gt;
&lt;li&gt;the templated prompt to pass to the agent &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Khook enables autonomous remediation and incident response. Finally - the ops person's dream come true!&lt;/p&gt;
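&lt;p&gt;To give a feel for it - a hook definition might look roughly like this (a sketch only; the field names here are illustrative, check the khook repo for the actual CRD schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: kagent.dev/v1alpha1    # illustrative API group/version
kind: Hook
metadata:
  name: oom-hook
spec:
  eventConfigurations:
    - eventType: oom-kill            # the Kubernetes event to listen to
      agentRef: k8s-troubleshooter   # the agent to call
      prompt: |                      # the templated prompt to pass to the agent
        Pod {{ .PodName }} in namespace {{ .Namespace }} was OOMKilled.
        Investigate and suggest a remediation.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;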

&lt;p&gt;The following diagram shows how Khook works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F74c60roop2xdyui3skl5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F74c60roop2xdyui3skl5.png" alt=" "&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;It’s been a while since I’ve developed a full-blown Kubernetes controller from scratch and I have to thank Kiro and Cursor (and specifically Sonnet 3.7) for taking care of all the boilerplate for me. This definitely made me more productive.&lt;/p&gt;
&lt;h1&gt;
  
  
  A Hook to the Future
&lt;/h1&gt;

&lt;p&gt;Last Monday I presented Khook to the Kagent community and it makes me happy that it was met with a lot of excitement. In fact - since Friday - the &lt;a href="https://github.com/kagent-dev/khook" rel="noopener noreferrer"&gt;khook repository&lt;/a&gt; (with the gracious help of Eitan Yarmush) has been transferred into the kagent-dev org on GitHub and we’re actively looking for early users and contributors to take the hooks for test drives and find interesting bugs. Oh, and if you clicked on the link - make sure to star the repo. It will make my day brighter 🙏&lt;/p&gt;

&lt;p&gt;While building this project I realized it can be of much wider use - becoming the connective tissue between virtually any type of event (think task queues, DB transactions, webhooks) and any type of A2A-compatible agent (all agent communication in Kagent is based on the &lt;a href="https://github.com/a2aproject/A2A" rel="noopener noreferrer"&gt;A2A protocol&lt;/a&gt;). Especially now with &lt;a href="https://kagent.dev/docs/kagent/examples/a2a-byo" rel="noopener noreferrer"&gt;Kagent supporting BYO agents&lt;/a&gt;. So yes - a lot of space for improvement, innovation and experimentation. &lt;/p&gt;

&lt;p&gt;Find this exciting too? Drop me a line, &lt;a href="https://discord.com/invite/Fu3k65f2k3" rel="noopener noreferrer"&gt;join the Kagent community&lt;/a&gt; and contribute to Kagent or Khook or both.&lt;/p&gt;

&lt;p&gt;And may our future be agentic.&lt;/p&gt;

&lt;p&gt;Here's a demo of Khook triggering a Kagent agent:&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/cYXLKAXZnso"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>kubernetes</category>
      <category>eventdriven</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Improve App Availability with Preemptible Pods and PriorityClasses</title>
      <dc:creator>Ant(on) Weiss</dc:creator>
      <pubDate>Tue, 20 Aug 2024 09:37:12 +0000</pubDate>
      <link>https://dev.to/antweiss/improve-app-availability-with-preemptible-pods-and-priorityclasses-3gh2</link>
      <guid>https://dev.to/antweiss/improve-app-availability-with-preemptible-pods-and-priorityclasses-3gh2</guid>
      <description>&lt;p&gt;Multiple apps are competing for resources in your cluster? Want to optimize resource allocation and application uptime?&lt;br&gt;
Look into configuring PriorityClasses and preemptible pods. &lt;/p&gt;

&lt;p&gt;Here's how it works: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdnjpanslz7eorxq1s9bp.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdnjpanslz7eorxq1s9bp.gif" alt="Image description" width="800" height="941"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the full overview and a practical walkthrough - read &lt;a href="https://www.perfectscale.io/blog/preemptible-pods" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;
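&lt;p&gt;As a minimal sketch (names and values here are mine) - a low-priority, preemptible workload could be marked like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 1000              # lower value = preempted first when resources run out
preemptionPolicy: Never  # pods of this class never preempt others themselves
globalDefault: false
description: "Best-effort batch workloads"
---
apiVersion: v1
kind: Pod
metadata:
  name: batch-job
spec:
  priorityClassName: low-priority
  containers:
    - name: worker
      image: busybox
      command: ["sleep", "3600"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;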

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>sre</category>
    </item>
    <item>
      <title>Karpenter moving to 1.0.0 - with the new stability guarantees</title>
      <dc:creator>Ant(on) Weiss</dc:creator>
      <pubDate>Wed, 14 Aug 2024 08:44:05 +0000</pubDate>
      <link>https://dev.to/aws-builders/karpenter-moving-to-100-with-the-new-stability-guarantees-2ohd</link>
      <guid>https://dev.to/aws-builders/karpenter-moving-to-100-with-the-new-stability-guarantees-2ohd</guid>
      <description>&lt;p&gt;Karpenter is slowly but surely becoming the de-facto standard node autoscaler for Kubernetes. It started at AWS and is now getting &lt;a href="https://github.com/Azure/karpenter-provider-azure" rel="noopener noreferrer"&gt;adopted for AKS on Azure&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Many organizations have already switched to Karpenter from whatever they were using - whether it's the good old cluster-autoscaler or a commercial pay-to-scale solution from a 3rd party vendor. &lt;/p&gt;

&lt;p&gt;Now Karpenter is also a part of the Kubernetes autoscaling SIG. And that's why the Karpenter team decided it's a great time to promote &lt;code&gt;sigs.k8s.io/karpenter&lt;/code&gt; to package version &lt;code&gt;v1.0.0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As &lt;a href="https://github.com/kubernetes-sigs/karpenter/issues/1570" rel="noopener noreferrer"&gt;the official proposal&lt;/a&gt; says: "The sigs.k8s.io/karpenter package has long-been in a production-ready state &lt;strong&gt;for users&lt;/strong&gt;, but has not reflected this production-ready state through its versioning scheme. Given the first initial release of v1 APIs within Karpenter, the maintainer team feels this is the best time to make the bump to v1.0.0."&lt;/p&gt;

&lt;h2&gt;
  
  
  New Stability Guarantees
&lt;/h2&gt;

&lt;p&gt;The linked issue goes on to outline the &lt;em&gt;new stability guarantees&lt;/em&gt; - but these are of course not referring to the stability of Karpenter as a product. Instead, it's the Karpenter APIs that are now subject to the &lt;a href="https://kubernetes.io/docs/reference/using-api/" rel="noopener noreferrer"&gt;standard Kubernetes stability guarantees&lt;/a&gt;. The package itself, however, may still see breaking changes within the v1.x.y major version without a bump to v2.x.y.&lt;/p&gt;

&lt;h2&gt;
  
  
  Karpenter is Maturing
&lt;/h2&gt;

&lt;p&gt;All in all - this is great news. Karpenter has been reliable and cost-effective for quite some time but now it's also maturing as an OSS project and a part of the Kubernetes ecosystem.&lt;/p&gt;

&lt;p&gt;Interested in how to get the most out of your Karpenter when combined with pod optimization? Read this &lt;a href="https://www.perfectscale.io/blog/getting-the-most-out-of-karpenter-with-perfectscale" rel="noopener noreferrer"&gt;post&lt;/a&gt; I wrote for PerfectScale a while ago.&lt;/p&gt;

&lt;p&gt;Have you made the switch to Karpenter? Did it provide the optimization you expected? Share in comments!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>karpenter</category>
    </item>
    <item>
      <title>9 Ways to Spin Up an EKS Cluster - Way 4 - CloudFormation</title>
      <dc:creator>Ant(on) Weiss</dc:creator>
      <pubDate>Sun, 04 Aug 2024 14:54:09 +0000</pubDate>
      <link>https://dev.to/aws-builders/9-ways-to-spin-up-an-eks-cluster-way-4-cloudformation-3len</link>
      <guid>https://dev.to/aws-builders/9-ways-to-spin-up-an-eks-cluster-way-4-cloudformation-3len</guid>
      <description>&lt;p&gt;I cheated for this one! Read on to see how: &lt;/p&gt;

&lt;p&gt;AWS CloudFormation is a robust Infrastructure-as-Code tool. It's well-supported and allows us to write templates in JSON or YAML. And it is also used behind the scenes by a number of tools in the cloud native ecosystem - for example, &lt;a href="https://kops.sigs.k8s.io/#what-is-kops" rel="noopener noreferrer"&gt;kops&lt;/a&gt;. But more importantly for this post - it's used by &lt;code&gt;eksctl&lt;/code&gt;!&lt;/p&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/aws-builders/9-ways-to-spin-up-an-eks-cluster-way-3-eksctl-2op9"&gt;previous post&lt;/a&gt; I used &lt;code&gt;eksctl&lt;/code&gt; to spin up a cluster complete with Karpenter and a few more add-ons. &lt;/p&gt;

&lt;p&gt;So for this post - instead of starting from scratch I decided to just reuse the stack template created by &lt;code&gt;eksctl&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Yes, I could have used one of the quickstart stacks &lt;a href="https://github.com/aws-ia/cloudformation-base-eks" rel="noopener noreferrer"&gt;provided by AWS&lt;/a&gt; but this repo has so many options that I got lost reading the docs. &lt;/p&gt;

&lt;p&gt;So instead I just opted to export the template from the existing stack. That's the cheating part :)&lt;/p&gt;

&lt;h2&gt;
  
  
  Exporting the CloudFormation template
&lt;/h2&gt;

&lt;p&gt;But how does one export a CloudFormation template from &lt;code&gt;eksctl&lt;/code&gt;? As I found out - this was requested repeatedly, but never implemented. See &lt;a href="https://github.com/eksctl-io/eksctl/issues/5291" rel="noopener noreferrer"&gt;here&lt;/a&gt; for example. &lt;br&gt;
So instead I went to the CloudFormation console in AWS, clicked on the stacks that I wanted (the ones &lt;code&gt;eksctl&lt;/code&gt; generated) and went to the 'Template' tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fosylhxqyye7ifdcz9uqh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fosylhxqyye7ifdcz9uqh.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But as you can notice - this only gives us JSON. Now, I'm a YAML engineer, so I wanted YAML. To get it, I clicked on 'View in Application Composer', then on 'Template', and switched the toggle to 'YAML'. Voila - I can now copy the template text and continue editing it on my laptop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugxafnms88rjl6edja67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugxafnms88rjl6edja67.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In its basic form &lt;code&gt;eksctl&lt;/code&gt; creates 2 stacks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one for the EKS control plane&lt;/li&gt;
&lt;li&gt;another one for the managed nodegroup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I could've bundled both stacks into one file but this separation actually makes a lot of sense. We may want more than one node group in our EKS, or we may decide to let go of node groups and opt to manage nodes with Karpenter. (You should!)&lt;/p&gt;

&lt;p&gt;So I created 2 template files - &lt;a href="https://github.com/antweiss/9-ways-2-EKS/blob/main/way-4-cloudformation/eks.yaml" rel="noopener noreferrer"&gt;eks.yaml&lt;/a&gt; (for the control plane) and &lt;a href="https://github.com/antweiss/9-ways-2-EKS/blob/main/way-4-cloudformation/ng.yaml" rel="noopener noreferrer"&gt;ng.yaml&lt;/a&gt; (for the node group). &lt;br&gt;
I've edited them both so they have no hard-coded resource names. Everything is based on the stack name you choose.&lt;/p&gt;

&lt;p&gt;The second stack receives the name of the first stack as a parameter and uses some of its exports, like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="c1"&gt;# the parameter:&lt;/span&gt;
&lt;span class="na"&gt;Parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ClusterStack&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Name of the ClusterStack&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;String&lt;/span&gt;
    &lt;span class="na"&gt;Default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eks-way4&lt;/span&gt;
&lt;span class="c1"&gt;# and the reference:&lt;/span&gt;
&lt;span class="na"&gt;SecurityGroupIds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Fn::ImportValue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;${ClusterStack}::ClusterSecurityGroupId'&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I'm also defining SSH access to the nodes with an imported SSH key.&lt;br&gt;
You could of course skip the whole SSH stuff, but I tend to believe it's important - especially when bringing up clusters for learning purposes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Spinning Up EKS with CloudFormation
&lt;/h2&gt;

&lt;p&gt;Here's how to use this:&lt;/p&gt;

&lt;p&gt;1. Clone the example repo:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/antweiss/9-ways-2-EKS.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2. Change into the cloudformation folder:
```bash


cd 9-ways-2-EKS/way-4-cloudformation


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;3. Generate an ssh key:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh-keygen -f ./id_rsa -N '' -C eks-way4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;4. Insert the public key into the node group template:
I'm saving the original file with *bak* extension.
```bash


sed -ibak "s/SSH_KEY/$(cat id_rsa.pub)/g" ng.yaml


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The result should look something like (ng.yaml line 15):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="na"&gt;ImportedKeyPair&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::EC2::KeyPair&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;KeyName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;AWS::StackName&lt;/span&gt;
     &lt;span class="c1"&gt;# this was PublicKeyMaterial: SSH_KEY&lt;/span&gt;
      &lt;span class="na"&gt;PublicKeyMaterial&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMgmzvgvz7NENY5X25QFLFlMHVCp7U98ykm1s3+JYftI eks-way4&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;5. And finally run the deploy.sh script with 2 parameters - the desired name of the cluster and the AWS region:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./deploy.sh eks-way4 eu-central-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Inside the script this is translated into the following 2 CloudFormation invocations:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws cloudformation create-stack --stack-name $1 \
                                --region $2 \
                                --template-body file://eks.yaml \
                                --capabilities CAPABILITY_NAMED_IAM
aws cloudformation create-stack --stack-name $1-ng \
                                --parameters ParameterKey=ClusterStack,ParameterValue=$1 \
                                --region $2 \
                                --template-body file://ng.yaml \
                                --capabilities CAPABILITY_NAMED_IAM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Things to note here: the necessary &lt;code&gt;CAPABILITY_NAMED_IAM&lt;/code&gt; capability&lt;br&gt;
and the &lt;code&gt;ClusterStack&lt;/code&gt; parameter that I'm passing to the second stack.&lt;/p&gt;
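&lt;p&gt;For reference - the &lt;code&gt;Fn::ImportValue&lt;/code&gt; in the node group template only works because the cluster stack exports the value under that name. In eks.yaml this looks something like the following (the &lt;code&gt;ControlPlane&lt;/code&gt; logical name is an assumption based on eksctl's output):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Outputs:
  ClusterSecurityGroupId:
    Value: !GetAtt ControlPlane.ClusterSecurityGroupId
    Export:
      Name: !Sub '${AWS::StackName}::ClusterSecurityGroupId'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;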

&lt;p&gt;After a short while we can verify the state of our stacks:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

aws cloudformation list-stacks &lt;span class="nt"&gt;--stack-status-filter&lt;/span&gt; CREATE_COMPLETE &lt;span class="nt"&gt;--region&lt;/span&gt; eu-central-1 &lt;span class="nt"&gt;--max-items&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2 &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"StackSummaries[*].StackName"&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If this returns the names of our 2 stacks:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

[
    "eks-way4-ng",
    "eks-way4"
]


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;then we're great! If not - proceed to the CloudFormation UI in the AWS console to find out why your stack creation failed.&lt;/p&gt;
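&lt;p&gt;Alternatively - the failure reasons can be pulled from the CLI with something like this (the JMESPath query is mine - adjust the stack name and region as needed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws cloudformation describe-stack-events \
    --stack-name eks-way4-ng \
    --region eu-central-1 \
    --query "StackEvents[?contains(ResourceStatus,'FAILED')].[LogicalResourceId,ResourceStatusReason]" \
    --output table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;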

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;CloudFormation is a great IaC tool if you're fine with AWS vendor lock-in. It's definitely possible to create an EKS cluster with CloudFormation and that's what tools such as &lt;code&gt;kops&lt;/code&gt; and &lt;code&gt;eksctl&lt;/code&gt; do under the hood. &lt;br&gt;
While writing CloudFormation from scratch is no fun - we can reuse the templates generated by &lt;code&gt;eksctl&lt;/code&gt; - as I did. Or we can build the templates ourselves in the &lt;a href="https://aws.amazon.com/application-composer/" rel="noopener noreferrer"&gt;AWS Application Composer&lt;/a&gt;. But then we need to take into account everything necessary for the cluster operation - security groups, gateways, routing rules, IAM policies. And that's a lot to take care of.&lt;/p&gt;

&lt;p&gt;Are you using pure CloudFormation as your IaC tool?&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>eks</category>
      <category>aws</category>
    </item>
    <item>
      <title>We Can Resize Pods without Restarts! Or Can't We?</title>
      <dc:creator>Ant(on) Weiss</dc:creator>
      <pubDate>Thu, 01 Aug 2024 09:22:56 +0000</pubDate>
      <link>https://dev.to/antweiss/we-can-resize-pods-without-restarts-or-cant-we-18a2</link>
      <guid>https://dev.to/antweiss/we-can-resize-pods-without-restarts-or-cant-we-18a2</guid>
      <description>&lt;p&gt;Kubernetes v1.27 released in April 2023 came with an exciting announcement - we can now resize pod CPU and memory requests and limits in-place! Without deleting the pod or even restarting the containers!&lt;/p&gt;

&lt;p&gt;This happened more than a year ago and since then a lot of folks seem to think this feature is already publicly available or is due to become so tomorrow.&lt;/p&gt;

&lt;p&gt;But the reality is that this was originally released as an Alpha feature and has since had no success moving to Beta due to a number of unresolved issues.&lt;/p&gt;

&lt;p&gt;The latest status, as of June 2024, is that it has been pushed back to v1.32:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuo0em9jlene8n6ki26wm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuo0em9jlene8n6ki26wm.png" alt="Image description" width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the &lt;a href="https://github.com/kubernetes/enhancements/issues/1287#issuecomment-2155234389" rel="noopener noreferrer"&gt;link&lt;/a&gt; to that comment on Github.&lt;/p&gt;

&lt;p&gt;So first of all - this isn't coming tomorrow. But we can still play with the feature and understand its advantages and shortcomings. Which is exactly what I'm planning to do in this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get a Cluster with Alpha Features
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://k3d.io/" rel="noopener noreferrer"&gt;k3d&lt;/a&gt; is irreplaceable when we want quickly and cheaply test Kubernetes Alpha features. All we need to do is to pass the correct feature gate to the correct control plane component.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install k3d
&lt;/h2&gt;

&lt;p&gt;If you still haven't done so - install k3d with curl and bash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or with another method of your choice listed &lt;a href="https://k3d.io/v5.7.2/#installation" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our case the component is the API server and the feature gate is called &lt;code&gt;InPlacePodVerticalScaling&lt;/code&gt; as can be seen &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm spinning up a single-node cluster with the following config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat &amp;lt;&amp;lt;'EOF' | k3d cluster create -c -&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;k3d.io/v1alpha3&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Simple&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pod-resize&lt;/span&gt;
&lt;span class="na"&gt;servers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rancher/k3s:v1.30.2-k3s2&lt;/span&gt;
&lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;k3d&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;disableLoadbalancer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;k3s&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;extraArgs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# the feature gate is passed here&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;arg&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;--kube-apiserver-arg=feature-gates=InPlacePodVerticalScaling=true&lt;/span&gt;
        &lt;span class="na"&gt;nodeFilters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;server:*&lt;/span&gt;
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  The Happy Path - Updating the CPU
&lt;/h1&gt;

&lt;p&gt;Now let's create a pod with one container defining resource requests and limits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stress&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;progrium/stress&lt;/span&gt;
    &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--cpu"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--vm"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--vm-bytes"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;128M"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--vm-hang"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stress&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;150M&lt;/span&gt;
        &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100m&lt;/span&gt;
      &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;150M&lt;/span&gt;
        &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can create the pod with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/perfectscale-io/inplace-pod-resize/main/guaranteed.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm using &lt;code&gt;progrium/stress&lt;/code&gt; and setting it up for guaranteed CPU starvation by requesting only a tenth of the CPU it needs, and just enough memory. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;stress --vm 1 --vm-bytes 128M --vm-hang 3&lt;/code&gt; tells stress to spawn one worker that allocates 128 MB of memory, holds it for 3 seconds, and then releases it. &lt;br&gt;
My pod is currently allowed only 150M of memory, so I expect it to run fine.&lt;/p&gt;

&lt;p&gt;Meanwhile, &lt;code&gt;stress --cpu 1&lt;/code&gt; tells the container to use one whole CPU, while it's actually allowed to use only 0.1 CPU. So it'll surely get throttled.&lt;/p&gt;
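&lt;p&gt;Back-of-the-envelope, that means the container will get roughly a tenth of the CPU time it wants. A tiny sketch of the arithmetic:&lt;/p&gt;

```shell
# stress tries to burn a full CPU (1000m) but the limit only grants 100m,
# so the cgroup CPU controller will let it run ~10% of the time.
wanted=1000   # millicores stress wants (--cpu 1)
limit=100     # millicores allowed by the pod's CPU limit
echo "$(( limit * 100 / wanted ))% duty cycle"
```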

&lt;p&gt;The container starts just fine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod
NAME     READY   STATUS      RESTARTS   AGE
stress   1/1     Running     0          7s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a few minutes I can also check its resource consumption by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl top pod stress
NAME     CPU(cores)   MEMORY(bytes)
stress   101m         131Mi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's running happily, consuming 101m of CPU and 131Mi of memory - all within the limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pod QoS Matters
&lt;/h2&gt;

&lt;p&gt;Now let's try to increase our container's limits in-place to give it more resources and see what happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl patch pod stress -p '{"spec" : { "containers" : [{"name" : "stress", "resources": { "limits": {"cpu":"300m","memory":"250M"}}}]}}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Oops! That didn't work!&lt;br&gt;
We're getting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The Pod "stress" is invalid: metadata: Invalid value: "Guaranteed": Pod QoS is immutable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So what we now know is that while we can change the values of limits and requests, we can't change the &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/" rel="noopener noreferrer"&gt;pod QoS class&lt;/a&gt;. I.e. the relationship between the requests and the limits has to stay the same.&lt;/p&gt;
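&lt;p&gt;For intuition, here's a simplified sketch (not Kubernetes' actual source) of how the QoS class falls out of the requests/limits relationship - the rejected patch above would have turned our Guaranteed pod into a Burstable one:&lt;/p&gt;

```shell
# Simplified QoS derivation for a single-container pod:
#   Guaranteed - requests == limits for both CPU and memory
#   BestEffort - no requests or limits at all
#   Burstable  - anything in between
qos_class() {
  req_cpu=$1; lim_cpu=$2; req_mem=$3; lim_mem=$4
  if [ -z "$req_cpu$lim_cpu$req_mem$lim_mem" ]; then
    echo "BestEffort"
  elif [ -n "$req_cpu" ] && [ "$req_cpu" = "$lim_cpu" ] \
    && [ -n "$req_mem" ] && [ "$req_mem" = "$lim_mem" ]; then
    echo "Guaranteed"
  else
    echo "Burstable"
  fi
}

qos_class 100m 100m 150M 150M   # Guaranteed - our stress pod as created
qos_class 100m 300m 150M 250M   # Burstable - what the rejected patch implied
```

&lt;p&gt;You can check the actual class with &lt;code&gt;kubectl get pod stress -ojsonpath='{.status.qosClass}'&lt;/code&gt;.&lt;/p&gt;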

&lt;h2&gt;
  
  
  Updating the Resources
&lt;/h2&gt;

&lt;p&gt;Let's try to update both the requests and the limits while staying within the Guaranteed QoS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl patch pod stress -p '{"spec" : { "containers" : [{"name" : "stress", "resources": {"requests": {"cpu":"300m","memory": "250M"}, "limits": {"cpu":"300m","memory":"250M"}}}]}}'
pod/stress patched
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we now watch &lt;code&gt;kubectl top pod stress&lt;/code&gt; we will see how the container gradually gets the additional CPU time:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgzt7kds7vxuoje1lnaei.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgzt7kds7vxuoje1lnaei.gif" alt="Image description" width="1920" height="1080"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The CGroups Behind the Scenes
&lt;/h2&gt;

&lt;p&gt;Now, being the curious cat that I am, I wanted to check how this works behind the scenes. I know cgroups are involved in setting container resource restrictions, but I like checking for myself how stuff works.&lt;br&gt;
The great thing about k3d is that it's very easy to get into your nodes with a simple &lt;code&gt;docker exec&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker exec -it k3d-pod-resize-server-0 sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now I want to find my container and identify the path to its cgroup definition.&lt;br&gt;
First, find the container ID using &lt;code&gt;ctr&lt;/code&gt; - the containerd command-line utility:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ctr c ls | grep stress
a4ad15ff9c7a71a0f1c34cdce9d1ae9d18ebd4e7b01f3c92ee796e5180729460    docker.io/progrium/stress:latest                       io.containerd.runc.v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and then find the cgroup information for my container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ctr c info a4ad15ff9c7a71a0f1c34cdce9d1ae9d18ebd4e7b01f3c92ee796e5180729460 | grep cgroup

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which will give me something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"destination": "/sys/fs/cgroup",
                "type": "cgroup",
                "source": "cgroup",
            "cgroupsPath": "/kubepods/podaa80f5b5-d68b-4ab6-ac38-df493310068b/a4ad15ff9c7a71a0f1c34cdce9d1ae9d18ebd4e7b01f3c92ee796e5180729460",
                    "type": "cgroup"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important parts here are &lt;code&gt;/sys/fs/cgroup&lt;/code&gt; where all the cgroup definitions are found and the &lt;code&gt;cgroupsPath&lt;/code&gt; - where the specific constraints for this container are defined. &lt;/p&gt;

&lt;p&gt;You'll notice there's a hierarchy here: first the &lt;code&gt;pod...&lt;/code&gt; directory, and then a directory named after the container ID. Since this is a single-container pod, all the cgroup values are mirrored in the parent directory - so that's where we're going to look.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat /sys/fs/cgroup/kubepods/podaa80f5b5-d68b-4ab6-ac38-df493310068b/memory.max

249999360
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's right - our 250M memory limit in bytes (rounded down by the kernel to a page-size multiple)!&lt;br&gt;
&lt;/p&gt;
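&lt;p&gt;The rounding down to a page-size multiple is easy to verify (assuming the common 4096-byte page size):&lt;/p&gt;

```shell
# The kernel rounds memory.max down to a multiple of the page size.
# 250M = 250000000 bytes; with 4096-byte pages:
PAGE_SIZE=4096
LIMIT=250000000
echo $(( LIMIT / PAGE_SIZE * PAGE_SIZE ))   # 249999360 - what memory.max shows
```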

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat /sys/fs/cgroup/kubepods/podaa80f5b5-d68b-4ab6-ac38-df493310068b/cpu.max

30000 100000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that's correct too! According to the Red Hat documentation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The first value is the allowed time quota in microseconds for which all processes collectively in a child group can run during one period. The second value specifies the length of the period.&lt;br&gt;
During a single period, when processes in a control group collectively exhaust the time specified by this quota, they are throttled for the remainder of the period and not allowed to run until the next period.&lt;/p&gt;
&lt;/blockquote&gt;
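&lt;p&gt;In other words, the two numbers translate straight into the CPU limit in millicores - quota divided by period:&lt;/p&gt;

```shell
# cpu.max holds "<quota> <period>" in microseconds.
# millicores = quota * 1000 / period
quota=30000
period=100000
echo "$(( quota * 1000 / period ))m"   # 300m - our patched CPU limit
```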

&lt;h2&gt;
  
  
  Impact on Scheduling
&lt;/h2&gt;

&lt;p&gt;Another thing I wanted to try is update the requests to more than my node can give and check if the scheduler will try to reschedule my pod to another node because the current one doesn't have the needed capacity.&lt;/p&gt;

&lt;p&gt;Let's check how many cpus my node has access to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get node -ojsonpath="{ .items[].status.allocatable.cpu } cpus"
8 cpus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I got 8. So let's try to request 10 and see what happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl patch pod stress -p '{"spec" : { "containers" : [{"name" : "stress", "resources": {"requests": {"cpu": "10"}, "limits": {"cpu":"10"}}}]}}'
pod/stress patched
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alas, while the requests got updated - nothing else happens. The pod doesn't get rescheduled or evicted. Why? No idea. Had I created it with a 10-CPU request from the beginning, it would have stayed Pending because there aren't any nodes large enough. So I would expect a pod with requests higher than any node can satisfy to get evicted. But maybe my thinking is flawed?&lt;/p&gt;

&lt;h2&gt;
  
  
  Negating Resources
&lt;/h2&gt;

&lt;p&gt;Until now all worked fine because we were only adding resources. Everybody likes having more stuff, nobody likes when stuff is taken away from them. &lt;/p&gt;

&lt;p&gt;Let's start by taking back the CPU time we granted in the previous section:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl patch pod stress -p '{"spec" : { "containers" : [{"name" : "stress", "resources": {"requests": {"cpu":"100m"}, "limits": {"cpu":"100m"}}}]}}'
pod/stress patched
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm bringing the CPU requests back to 100m. Quite expectedly, in a couple of seconds &lt;code&gt;kubectl top&lt;/code&gt; shows me that the pod's CPU consumption went down to 100m.&lt;br&gt;
And the cgroup &lt;code&gt;cpu.max&lt;/code&gt; file gets updated as expected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat /sys/fs/cgroup/kubepods/podaa80f5b5-d68b-4ab6-ac38-df493310068b/cpu.max
10000 100000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But what if I try to reduce memory?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl patch pod stress -p '{"spec" : { "containers" : [{"name" : "stress", "resources": {"requests": {"memory": "150M"}, "limits": {"memory":"150M"}}}]}}
pod/stress patched
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seems to work fine. Checking the cgroups I see the config has been updated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat /sys/fs/cgroup/kubepods/podaa80f5b5-d68b-4ab6-ac38-df493310068b/memory.max
149999616
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And what if I need to free even more memory?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl patch pod stress -p '{"spec" : { "containers" : [{"name" : "stress", "resources": {"requests": {"memory": "100M"}, "limits": {"memory":"100M"}}}]}}
pod/stress patched
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that I'm reducing memory to 100M, which should cause my container to get OOMKilled. And it seems to work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod stress -ojsonpath="{ .spec.containers[0].resources }"

{"limits":{"cpu":"100m","memory":"100M"},"requests":{"cpu":"100m","memory":"100M"}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But I see that the pod continues running!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod
NAME     READY   STATUS    RESTARTS   AGE
stress   1/1     Running   0          21m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And checking the cgroup &lt;code&gt;memory.max&lt;/code&gt; file shows why:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat /sys/fs/cgroup/kubepods/podaa80f5b5-d68b-4ab6-ac38-df493310068b/memory.max
149999616
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cgroup wasn't updated! Something is getting in our way, protecting the container from being given less memory than it's already using. This makes sense as a precaution - taking memory away from a running process may lead to irreversible corruption - but it also means the container limits now hold an incorrect value, which will surely puzzle anyone trying to understand why the container isn't getting OOMKilled.&lt;/p&gt;

&lt;p&gt;I would expect some validating admission hook to tell me that memory can't be reduced. Looks like a bug to me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Saving Hungry Pods
&lt;/h2&gt;

&lt;p&gt;Ok, we found out that, memory being an incompressible resource, we can't really reduce it in-place to a value lower than what the container is already using.&lt;/p&gt;

&lt;p&gt;But can we save an OOMing container by giving it more memory?&lt;/p&gt;

&lt;p&gt;Let's try that with a similar pod - but one that gets only 100M of memory from the get-go (while trying to allocate 128M):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hungry&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;progrium/stress&lt;/span&gt;
    &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--cpu"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--vm"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--vm-bytes"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;128M"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--vm-hang"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stress&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100M&lt;/span&gt;
      &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100M&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create -f https://raw.githubusercontent.com/perfectscale-io/inplace-pod-resize/main/hungry.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Quite expectedly the container gets OOMKilled almost instantly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod hungry
NAME     READY   STATUS      RESTARTS     AGE
hungry   0/1     OOMKilled   1 (5s ago)   8s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it will continue restarting and getting OOMKilled until we update its memory limits. So let's save it from this misery by giving it the memory it needs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl patch pod hungry -p '{"spec" : { "containers" : [{"name" : "stress", "resources": {"requests": {"memory": "200M"}, "limits": {"memory":"200M"}}}]}}'
pod/hungry patched
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This seems to work fine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod hungry -ojsonpath="{ .spec.containers[0].resources }"
{"limits":{"memory":"200M"},"requests":{"memory":"200M"}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the pod continues getting killed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pod hungry
NAME     READY   STATUS      RESTARTS      AGE
hungry   0/1     OOMKilled   4 (33s ago)   60s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And if we check the cgroup &lt;code&gt;memory.max&lt;/code&gt; file we'll see why:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat /sys/fs/cgroup/kubepods/burstable/pod708b8195-0ca0-45e0-9f2b-015f679c98da/memory.max
99999744
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Its memory limit never actually got updated! &lt;br&gt;
Why? I wasn't able to find an answer for this one. Why disallow saving containers from getting killed by giving them the memory they need? I'm not aware of any technical limitation that would prevent this, and I also didn't find anything in the &lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources#in-place-update-of-pod-resources" rel="noopener noreferrer"&gt;KEP docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So it looks like the only way to fix the OOMKill is still by deleting the pod and creating a new one with more memory.&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;In-place pod resizing is a long-awaited feature. In alpha since v1.27, it will hopefully make it to beta by v1.32 - if the drawbacks and bugs get fixed.&lt;br&gt;
Here are some of the ones I found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory can't be reduced below what's currently used - and there's no notification about that.&lt;/li&gt;
&lt;li&gt;Giving more resources than available on the node doesn't lead to pod eviction (true for both CPU and Memory)&lt;/li&gt;
&lt;li&gt;If a pod is getting OOMKilled - it's not possible to give it more memory to save it from getting killed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Will these eventually get fixed? I certainly hope so. Will the feature make it to beta by v1.32? Let's keep our fingers crossed.&lt;/p&gt;

&lt;p&gt;Something in this post isn't clear or correct? Let me know in the comments. &lt;/p&gt;

&lt;p&gt;Thanks for reading and may your pods keep running!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
    <item>
      <title>Fixing ko local image publishing on MacOs</title>
      <dc:creator>Ant(on) Weiss</dc:creator>
      <pubDate>Mon, 22 Jul 2024 13:37:55 +0000</pubDate>
      <link>https://dev.to/antweiss/fixing-ko-local-image-publishing-on-macos-2p0d</link>
      <guid>https://dev.to/antweiss/fixing-ko-local-image-publishing-on-macos-2p0d</guid>
      <description>&lt;h1&gt;
  
  
  Preamble:
&lt;/h1&gt;

&lt;p&gt;I still use Docker Desktop to run containers on my MacBook Air. I know there's &lt;a href="https://github.com/abiosoft/colima" rel="noopener noreferrer"&gt;Colima&lt;/a&gt; but I have no time to switch and deal with the consequences. &lt;br&gt;
I also recently started using &lt;a href="https://github.com/ko-build/ko/tree/main" rel="noopener noreferrer"&gt;ko&lt;/a&gt; for containerizing my Go apps. &lt;/p&gt;
&lt;h1&gt;
  
  
  ko is Great but...
&lt;/h1&gt;

&lt;p&gt;I love &lt;strong&gt;ko&lt;/strong&gt; - it builds secure, slim, distroless images. But there's one issue: by default, &lt;code&gt;ko build&lt;/code&gt; pushes the resulting image to the remote registry. &lt;br&gt;
That's kinda fine for continuous delivery, but I do a lot of experiments and I don't always want to publish all the garbage I create to a remote registry - trying to be considerate of network bandwidth and image storage.&lt;/p&gt;

&lt;p&gt;So instead I want to build my images into the local image storage. &lt;br&gt;
It's possible to do that with &lt;code&gt;ko build . -L&lt;/code&gt;.&lt;br&gt;
On macOS, though, this was failing for me with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2024/07/22 15:52:50 Loading otomato/myapp:717e6196339c956bc878bd58f5ab8244a709dc0510051f9e6df72620f28a2aaa
2024/07/22 15:52:50 daemon.Write response:
Error: failed to publish images: error publishing ko://github.com/otomato/myapp: error loading image: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Calling the Docker daemon
&lt;/h1&gt;

&lt;p&gt;Clearly, the Docker client inside &lt;code&gt;ko&lt;/code&gt; is trying to contact the Docker daemon on the standard socket and failing.&lt;/p&gt;

&lt;p&gt;I tried googling for this error but didn't find anything, so I decided to solve it myself.&lt;br&gt;
Here's the thing - on macOS the Docker socket isn't at the standard &lt;code&gt;/var/run/docker.sock&lt;/code&gt;; instead it's at &lt;code&gt;~/Library/Containers/com.docker.docker/Data/docker.raw.sock&lt;/code&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  The Solution
&lt;/h1&gt;

&lt;p&gt;To fix this, I needed to create a symlink from the actual Docker socket to the path where the standard Docker client expects to find it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; ~/Library/Containers/com.docker.docker/Data/docker.raw.sock /var/run/docker.sock
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that the Docker daemon can be contacted via the standard socket address, ko can load images into it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ko build . -B -L --platform linux/arm64
2024/07/22 16:04:04 Building github.com/otomato/myapp for linux/arm64
2024/07/22 16:04:04 Loading otomato/myapp:717e6196339c956bc878bd58f5ab8244a709dc0510051f9e6df72620f28a2aaa
2024/07/22 16:04:05 Loaded otomato/myapp:717e6196339c956bc878bd58f5ab8244a709dc0510051f9e6df72620f28a2aaa
2024/07/22 16:04:05 Adding tag latest
2024/07/22 16:04:05 Added tag latest
otomato/myapp:717e6196339c956bc878bd58f5ab8244a709dc0510051f9e6df72620f28a2aaa
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Meanwhile I also opened an issue on the &lt;code&gt;ko&lt;/code&gt; repo. But until it's fixed, this hack works like a charm.&lt;/p&gt;
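&lt;p&gt;An alternative worth trying (I haven't verified it with &lt;code&gt;ko&lt;/code&gt; myself, so treat it as a sketch) is pointing Docker clients at the non-standard socket via the &lt;code&gt;DOCKER_HOST&lt;/code&gt; environment variable instead of symlinking:&lt;/p&gt;

```shell
# Hypothetical alternative to the symlink: tell Docker clients where the
# Docker Desktop socket actually lives via the DOCKER_HOST variable.
export DOCKER_HOST="unix://$HOME/Library/Containers/com.docker.docker/Data/docker.raw.sock"
echo "$DOCKER_HOST"
```

&lt;p&gt;The symlink still has the advantage of working for tools that ignore &lt;code&gt;DOCKER_HOST&lt;/code&gt;.&lt;/p&gt;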

&lt;p&gt;Hope this helps you too.&lt;/p&gt;

</description>
      <category>go</category>
      <category>containers</category>
      <category>cloudnative</category>
      <category>devops</category>
    </item>
    <item>
      <title>9 Ways to Spin Up an EKS Cluster - Way 3 - eksctl</title>
      <dc:creator>Ant(on) Weiss</dc:creator>
      <pubDate>Thu, 27 Jun 2024 11:00:29 +0000</pubDate>
      <link>https://dev.to/aws-builders/9-ways-to-spin-up-an-eks-cluster-way-3-eksctl-2op9</link>
      <guid>https://dev.to/aws-builders/9-ways-to-spin-up-an-eks-cluster-way-3-eksctl-2op9</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/aws-builders/9-ways-to-an-eks-cluster-way-2-aws-cli-3g94"&gt;previous post&lt;/a&gt; I showed how to spin up an EKS cluster with pure shell and AWS CLI. (All the links to other posts in this series will be &lt;a href="https://dev.to/aws-builders/8-ways-to-spin-up-an-eks-cluster-210b"&gt;here&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;This used to be the easiest way of getting to a cluster without leaving your terminal. But pretty early in EKS history (2017) some smart folks from a company named Weaveworks(RIP) realized it was too cumbersome to do this using the &lt;code&gt;aws cli&lt;/code&gt; subcommand and that EKS is complex enough to deserve a command-line client of its own.  That's how &lt;code&gt;eksctl&lt;/code&gt; was born. &lt;/p&gt;

&lt;p&gt;A few months ago Weaveworks (who brought us a plethora of great OSS tools like Flux, Flagger and Weave) was shut down. But AWS had announced full support for eksctl back in 2019, so &lt;code&gt;eksctl&lt;/code&gt; is now the de-facto standard EKS CLI tool.&lt;/p&gt;

&lt;p&gt;The great thing about &lt;code&gt;eksctl&lt;/code&gt; is that it allows one to create and manage clusters not only using one-off commands with arguments but also with YAML configuration files - in a true and familiar IaC way.&lt;/p&gt;

&lt;p&gt;We'll check out both options but first let's install eksctl and generate an SSH key so we can connect to the nodes in the clusters we create if needed. Please note - I'm not endorsing SSH connections to your EKS nodes. Do avoid this if possible - so as not to cause inadvertent configuration drift. But sometimes we still need this for troubleshooting, especially in training environments. So let's have the SSH key handy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install eksctl
&lt;/h2&gt;

&lt;p&gt;If you're on Linux - here are the official instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# for ARM systems, set ARCH to: `arm64`, `armv6` or `armv7`&lt;/span&gt;
&lt;span class="nv"&gt;ARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;amd64
&lt;span class="nv"&gt;PLATFORM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;_&lt;span class="nv"&gt;$ARCH&lt;/span&gt;

curl &lt;span class="nt"&gt;-sLO&lt;/span&gt; &lt;span class="s2"&gt;"https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_&lt;/span&gt;&lt;span class="nv"&gt;$PLATFORM&lt;/span&gt;&lt;span class="s2"&gt;.tar.gz"&lt;/span&gt;

&lt;span class="c"&gt;# (Optional) Verify checksum&lt;/span&gt;
curl &lt;span class="nt"&gt;-sL&lt;/span&gt; &lt;span class="s2"&gt;"https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_checksums.txt"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nv"&gt;$PLATFORM&lt;/span&gt; | &lt;span class="nb"&gt;sha256sum&lt;/span&gt; &lt;span class="nt"&gt;--check&lt;/span&gt;

&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xzf&lt;/span&gt; eksctl_&lt;span class="nv"&gt;$PLATFORM&lt;/span&gt;.tar.gz &lt;span class="nt"&gt;-C&lt;/span&gt; /tmp &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm &lt;/span&gt;eksctl_&lt;span class="nv"&gt;$PLATFORM&lt;/span&gt;.tar.gz

&lt;span class="nb"&gt;sudo mv&lt;/span&gt; /tmp/eksctl /usr/local/bin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please note this doesn't install such eksctl prerequisites as &lt;code&gt;kubectl&lt;/code&gt; and &lt;code&gt;aws-iam-authenticator&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And if, like me, you're on a Mac - definitely use &lt;code&gt;brew&lt;/code&gt;, as it takes care of all the dependencies (even though the official &lt;code&gt;eksctl&lt;/code&gt; docs don't recommend it):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew tap weaveworks/tap
brew install weaveworks/tap/eksctl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now - let's generate that ssh key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh-keygen  -f ./id_rsa -N ''
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create an &lt;code&gt;id_rsa&lt;/code&gt; and &lt;code&gt;id_rsa.pub&lt;/code&gt; in your current directory. Make sure to run the following &lt;code&gt;eksctl&lt;/code&gt; commands from the same directory and it will pick up this key by default.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sidenote - the VPC
&lt;/h3&gt;

&lt;p&gt;If you've read the previous post in this series (where we created an EKS cluster using the AWS CLI), you'll have noticed that creating the VPC was a separate step. The added value of &lt;code&gt;eksctl&lt;/code&gt; is that it takes care of most dependencies and add-ons for us without the need to run additional commands. The same is true for VPC creation: a new VPC with a default subnet configuration is created each time we spin up a new cluster, unless we explicitly specify that we want to reuse an existing VPC.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Create an EKS cluster - eksctl with arguments
&lt;/h3&gt;

&lt;p&gt;The most straightforward way of creating an EKS cluster with &lt;code&gt;eksctl&lt;/code&gt; is providing all the arguments on the command line and letting the tool take care of the defaults. This approach, while limited and not very repeatable, can definitely give us a cluster. &lt;/p&gt;

&lt;p&gt;The command I provide here defines quite a number of settings I personally find important even for small toy clusters I spin up for fun and games. But &lt;code&gt;eksctl&lt;/code&gt; can do its job even with less stuff defined. Look in the official "Getting Started" docs if you want just the bare bones.&lt;/p&gt;

&lt;p&gt;So here's what I decided to use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# First - define the environment. &lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;way3
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eu-central-1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;K8S_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1.30
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;NODE_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;t2.medium
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MIN_NODES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MAX_NODES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm starting out with small nodes and already preparing the cluster for auto-scaling with min and max node definitions. It's important to note that &lt;code&gt;eksctl&lt;/code&gt; allows us to enable the IAM policy for ASG access and define the auto-scaling range, but it doesn't take care of installing &lt;code&gt;cluster-autoscaler&lt;/code&gt; - we'd need to do that separately, if we wanted to. On the other hand - these days it makes total sense to start out with Karpenter, which &lt;code&gt;eksctl&lt;/code&gt; does support, but not on the command line. We'll see how to configure Karpenter in the next section.&lt;/p&gt;
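&lt;p&gt;For the record, installing &lt;code&gt;cluster-autoscaler&lt;/code&gt; separately is just a couple of commands with Helm. Here's a sketch, assuming the community autoscaler chart - the release name and values are illustrative, and the commands are only printed here, to be run once &lt;code&gt;kubectl&lt;/code&gt; points at the new cluster:&lt;/p&gt;

```shell
# Sketch: install cluster-autoscaler with Helm after eksctl created the
# cluster with --asg-access. We only print the commands here; run them
# against a live cluster.
CLUSTER_NAME=way3
AWS_REGION=eu-central-1

CMD="helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=$CLUSTER_NAME \
  --set awsRegion=$AWS_REGION"

echo "$CMD"
```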

&lt;p&gt;And now - time to spin up the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eksctl create cluster &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                      &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                      &lt;span class="nt"&gt;--with-oidc&lt;/span&gt; &lt;span class="nt"&gt;--version&lt;/span&gt; &lt;span class="nv"&gt;$K8S_VERSION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                      &lt;span class="nt"&gt;--nodegroup-name&lt;/span&gt; ng-&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                      &lt;span class="nt"&gt;--node-type&lt;/span&gt; t2.medium &lt;span class="se"&gt;\&lt;/span&gt;
                      &lt;span class="nt"&gt;--nodes&lt;/span&gt; 1 &lt;span class="nt"&gt;--nodes-min&lt;/span&gt; 1 &lt;span class="nt"&gt;--nodes-max&lt;/span&gt; 3 &lt;span class="se"&gt;\ &lt;/span&gt;
                      &lt;span class="nt"&gt;--spot&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                      &lt;span class="nt"&gt;--ssh-access&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                      &lt;span class="nt"&gt;--asg-access&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                      &lt;span class="nt"&gt;--external-dns-access&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                      &lt;span class="nt"&gt;--full-ecr-access&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                      &lt;span class="nt"&gt;--alb-ingress-access&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command gives us a full-featured cluster with IAM policies for ECR access (&lt;code&gt;--full-ecr-access&lt;/code&gt;), the external DNS controller (&lt;code&gt;--external-dns-access&lt;/code&gt;), the ALB ingress controller (&lt;code&gt;--alb-ingress-access&lt;/code&gt;), OIDC support and more. It also runs its nodes on spot instances for cost optimization - which is totally fine for a toy cluster but may not be appropriate if the application you're planning to deploy isn't disruption-tolerant.&lt;/p&gt;

&lt;p&gt;From the command output we learn that in the background our command is converted into a couple of CloudFormation stacks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
2024-06-27 12:51:47 [ℹ]  will create a CloudFormation stack for cluster itself and 0 nodegroup stack(s)
2024-06-27 12:51:47 [ℹ]  will create a CloudFormation stack for cluster itself and 1 managed nodegroup stack(s)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After about 15 minutes (depending on the weather and the region you've decided to use) CloudFormation returns and we can access our cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get node
NAME                                             STATUS   ROLES    AGE   VERSION
ip-192-168-56-76.eu-central-1.compute.internal   Ready    &amp;lt;none&amp;gt;   35m   v1.29.3-eks-ae9a62a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that the new cluster context is added to your &lt;code&gt;kubeconfig&lt;/code&gt; automatically. &lt;br&gt;
If you want to update the &lt;code&gt;kubeconfig&lt;/code&gt; at a later time you can use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eksctl utils write-kubeconfig &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But, as we already said - the CLI approach is limited. To do real IaC we want to put the cluster definitions in a YAML config file. This gives us a lot more capabilities and allows us to commit the config file to source control for further collaboration, change tracking and automation.&lt;/p&gt;

&lt;p&gt;But first - let's remove the cluster we just created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl delete cluster --region=$AWS_REGION --name=$CLUSTER_NAME
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Create an EKS cluster - eksctl with a config file
&lt;/h3&gt;

&lt;p&gt;The config file I provide here gives us everything we defined at the command line and more. As mentioned - it also allows us to install Karpenter in the same &lt;code&gt;eksctl&lt;/code&gt; execution - thus giving us an industry-standard auto-scaling EKS cluster with just-in-time node provisioning. You can grab this file on &lt;a href="https://github.com/antweiss/9-ways-2-EKS/tree/main/way-3-eksctl" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; too.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eksctl.io/v1alpha5&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterConfig&lt;/span&gt;

&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;way3&lt;/span&gt;
  &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eu-central-1&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.30"&lt;/span&gt;
  &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;karpenter.sh/discovery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;way3&lt;/span&gt;
&lt;span class="na"&gt;iam&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;withOIDC&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;managedNodeGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ng-way3-1&lt;/span&gt;
    &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;worker&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="na"&gt;instanceType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;t2.medium&lt;/span&gt;
    &lt;span class="na"&gt;desiredCapacity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
    &lt;span class="na"&gt;minSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;maxSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;nodegrouprole&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;way3&lt;/span&gt;
    &lt;span class="na"&gt;volumeSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
    &lt;span class="na"&gt;iam&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;withAddonPolicies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;externalDNS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;certManager&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;awsLoadBalancerController&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;albIngress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;ebs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;efs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;imageBuilder&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;cloudWatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;ssh&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;allow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="c1"&gt;# will use ~/.ssh/id_rsa.pub as the default ssh key&lt;/span&gt;

&lt;span class="na"&gt;karpenter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.37.0'&lt;/span&gt;
  &lt;span class="na"&gt;createServiceAccount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;withSpotInterruptionQueue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An attentive eye will notice I've also defined some additional stuff such as &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html" rel="noopener noreferrer"&gt;CloudWatch logging of the control plane&lt;/a&gt;, plus EBS and EFS access. Consider removing these lines if you don't need them. &lt;br&gt;
You'll also notice that not only does it install Karpenter, it also takes care of setting up the SpotInterruptionQueue, which allows Karpenter to replace spot instances before they get reclaimed.&lt;br&gt;
And there are many additional options available.&lt;br&gt;
So yes - this is a very scalable approach, which takes care of more or less everything one might need in an EKS cluster.&lt;/p&gt;

&lt;p&gt;Execute this plan with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl create cluster -f cluster.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This again creates a CloudFormation execution that, provided we have all the necessary permissions, should complete successfully.&lt;/p&gt;

&lt;p&gt;Let's check that Karpenter got installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pod &lt;span class="nt"&gt;-A&lt;/span&gt;
NAMESPACE     NAME                         READY   STATUS    RESTARTS   AGE
karpenter     karpenter-79db484bbf-flzzq   1/1     Running   0          32s
karpenter     karpenter-79db484bbf-nfhsp   1/1     Running   0          32s
kube-system   aws-node-8h4ln               2/2     Running   0          17m
kube-system   aws-node-vq8wj               2/2     Running   0          18m
kube-system   coredns-6f6d89bcc9-qx497     1/1     Running   0          24m
kube-system   coredns-6f6d89bcc9-wwjtp     1/1     Running   0          24m
kube-system   kube-proxy-8mnd2             1/1     Running   0          18m
kube-system   kube-proxy-c5zkp             1/1     Running   0          17m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yup, here it is! &lt;/p&gt;
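&lt;p&gt;One caveat worth remembering: Karpenter won't actually provision anything until we give it a NodePool (and a matching EC2NodeClass). Here's a minimal sketch of such a NodePool, assuming the v1beta1 API that Karpenter 0.37 uses - treat all values as illustrative:&lt;/p&gt;

```shell
# Sketch: a minimal Karpenter NodePool manifest (v1beta1 API, Karpenter 0.37).
# Values are illustrative; it references an EC2NodeClass named "default"
# that you'd need to create as well.
NODEPOOL='apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
  limits:
    cpu: "16"'

# Write the manifest to a file for review before applying
printf '%s\n' "$NODEPOOL" > nodepool.yaml
# kubectl apply -f nodepool.yaml   # run against the live cluster
```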

&lt;p&gt;The upside of using the config file is of course the ability to manage things in a somewhat idempotent way. So, for example, if we want to change our node group config - we can update the following lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ng-1&lt;/span&gt;
    &lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;worker&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="na"&gt;instanceType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;t2.medium&lt;/span&gt;
    &lt;span class="na"&gt;desiredCapacity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;minSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;maxSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and then run &lt;code&gt;eksctl update nodegroup -f cluster.yaml&lt;/code&gt; - this will update our NodeGroup autoscaling range.&lt;/p&gt;
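&lt;p&gt;For a quick one-off change we can also scale the nodegroup straight from the command line, without editing the config file. A sketch, reusing the names from above (only printed here - run it against a live cluster):&lt;/p&gt;

```shell
# Sketch: one-off scaling of a managed nodegroup from the CLI.
# We print the command; eval it against a live cluster.
CLUSTER_NAME=way3
CMD="eksctl scale nodegroup --cluster=$CLUSTER_NAME \
  --name=ng-$CLUSTER_NAME-1 \
  --nodes=2 --nodes-min=1 --nodes-max=5"
echo "$CMD"
# eval "$CMD"
```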

&lt;p&gt;And of course eksctl provides us with a plethora of additional commands that come in very handy for ongoing management of EKS clusters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eksctl &lt;span class="nt"&gt;-h&lt;/span&gt;
The official CLI &lt;span class="k"&gt;for &lt;/span&gt;Amazon EKS

Usage: eksctl &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;command&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;flags]

Commands:
  eksctl anywhere                        EKS anywhere
  eksctl associate                       Associate resources with a cluster
  eksctl completion                      Generates shell completion scripts &lt;span class="k"&gt;for &lt;/span&gt;bash, zsh or fish
  eksctl create                          Create resource&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt;
  eksctl delete                          Delete resource&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt;
  eksctl deregister                      Deregister a non-EKS cluster
  eksctl disassociate                    Disassociate resources from a cluster
  eksctl drain                           Drain resource&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt;
  eksctl &lt;span class="nb"&gt;enable                          &lt;/span&gt;Enable features &lt;span class="k"&gt;in &lt;/span&gt;a cluster
  eksctl get                             Get resource&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt;
  eksctl &lt;span class="nb"&gt;help                            &lt;/span&gt;Help about any &lt;span class="nb"&gt;command
  &lt;/span&gt;eksctl info                            Output the version of eksctl, kubectl and OS info
  eksctl register                        Register a non-EKS cluster
  eksctl scale                           Scale resources&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt;
  eksctl &lt;span class="nb"&gt;set                             &lt;/span&gt;Set values
  eksctl &lt;span class="nb"&gt;unset                           &lt;/span&gt;Unset values
  eksctl update                          Update resource&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt;
  eksctl upgrade                         Upgrade resource&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt;
  eksctl utils                           Various utils
  eksctl version                         Output the version of eksctl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All in all - eksctl is the go-to tool for EKS management if you haven't already standardized your cloud platform on another IaC solution such as Terraform, Pulumi, CDK or others, which we'll look into in the following posts.&lt;/p&gt;

&lt;p&gt;Thanks for reading and may your clusters be lean!&lt;/p&gt;

&lt;p&gt;P.S. now you've got a cluster - why not start managing its cost and performance for free with &lt;a href="https://perfectscale.io" rel="noopener noreferrer"&gt;PerfectScale&lt;/a&gt; - the leading Kubernetes cost optimization solution? &lt;/p&gt;

&lt;p&gt;Join now to build clusters you can be proud of: &lt;a href="https://perfectscale.io" rel="noopener noreferrer"&gt;https://perfectscale.io&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>eks</category>
      <category>iac</category>
    </item>
    <item>
      <title>DevOps Shorts 028 - Peter Guagenti</title>
      <dc:creator>Ant(on) Weiss</dc:creator>
      <pubDate>Thu, 11 Apr 2024 13:29:40 +0000</pubDate>
      <link>https://dev.to/antweiss/devops-shorts-028-peter-guagenti-10pj</link>
      <guid>https://dev.to/antweiss/devops-shorts-028-peter-guagenti-10pj</guid>
      <description>&lt;p&gt;I haven't posted about DevOps Shorts episodes here yet, I think. Mainly publishing them on my homepage at &lt;a href="https://antweiss.com" rel="noopener noreferrer"&gt;https://antweiss.com&lt;/a&gt;. But now I intend to re-post them here too for wider exposure.&lt;/p&gt;

&lt;p&gt;So here goes:&lt;/p&gt;

&lt;h2&gt;
  
  
  Peter Guagenti - The AI is an Iron Man Suit for the Mind
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcouncils.forbes.com%2Fprofile%2F_next%2Fimage%3Furl%3Dhttps%253A%252F%252Fs3.amazonaws.com%252Fcco-avatars%252F49668400-f321-45cf-9e7b-7b82383a3dd4.png%26w%3D256%26q%3D75" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcouncils.forbes.com%2Fprofile%2F_next%2Fimage%3Furl%3Dhttps%253A%252F%252Fs3.amazonaws.com%252Fcco-avatars%252F49668400-f321-45cf-9e7b-7b82383a3dd4.png%26w%3D256%26q%3D75" alt="Peter Guagenti"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Today's episode is a bit of a departure from my regular format. And it's symbolic. The evolution of GenAI is definitely changing how we work in IT. The change may not be very evident yet, but we all know it's coming. And we still need to understand what exactly changes - besides the drop in Stack Overflow's popularity, that is.&lt;/p&gt;

&lt;p&gt;That's why my guest this time is Peter Guagenti - the President and CMO at &lt;a href="https://www.tabnine.com/" rel="noopener noreferrer"&gt;Tabnine&lt;/a&gt; - the AI coding assistant. Peter has worked at Nginx, CockroachDB and SingleStore, so he has a deep understanding of platform tooling and open source. And today he's bringing the message of AI-assisted coding. And together we're trying to understand how that changes platform and DevOps work.&lt;/p&gt;

&lt;p&gt;Listen to the episode to learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why AI changes everything about how we work (in DevOps too)&lt;/li&gt;
&lt;li&gt;Where AI extends beyond code completion/generation&lt;/li&gt;
&lt;li&gt;What's the role of context awareness&lt;/li&gt;
&lt;li&gt;How it changes our creativity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The episode is live on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://open.spotify.com/episode/3fOYs0gHHHr29mnOArq1Fq?si=9510ebedf86047ec" rel="noopener noreferrer"&gt;Spotify&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://youtu.be/MViM52Bgcgo" rel="noopener noreferrer"&gt;Youtube&lt;/a&gt; - also embedded at the bottom of this post.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Watch out for new episodes!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This episode is brought to you by &lt;a href="https://perfectscale.io" rel="noopener noreferrer"&gt;PerfectScale&lt;/a&gt; - the automated K8s optimization and management platform&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To watch DevOps Shorts 028:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/MViM52Bgcgo"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>devops</category>
      <category>platformengineering</category>
      <category>podcast</category>
      <category>ai</category>
    </item>
    <item>
      <title>Adding a canonical url to dev.to posts (in basic markdown editor)</title>
      <dc:creator>Ant(on) Weiss</dc:creator>
      <pubDate>Tue, 02 Apr 2024 15:33:47 +0000</pubDate>
      <link>https://dev.to/antweiss/adding-a-canonical-url-to-devto-posts-in-basic-markdown-editor-1enn</link>
      <guid>https://dev.to/antweiss/adding-a-canonical-url-to-devto-posts-in-basic-markdown-editor-1enn</guid>
      <description>&lt;p&gt;TLDR: &lt;br&gt;
add a markdown header: &lt;code&gt;canonical_url: https://your.url.here&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Took me some time to find this. The preamble is that I'm using the basic markdown editor on dev.to. Or at least I was until now. Somehow I wasn't even aware of the Rich+Markdown option arriving. Trying it for the first time right now - writing this little blurb.&lt;br&gt;
Anyway - I needed to update some of my older posts with the canonical links that were missing.&lt;br&gt;
Almost all the guides out there showed how to do it in the new editor - through the cog menu at the bottom of the editor. &lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xumo0bsczx5h94gyglq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xumo0bsczx5h94gyglq.png" alt="The cog menu" width="366" height="624"&gt;&lt;/a&gt;&lt;br&gt;
But I needed to add it to older posts which were edited with basic.&lt;br&gt;
I guessed it would come down to adding a header, but it took me a few googlings to find out which one.&lt;br&gt;
So yes - just add this header:&lt;br&gt;
&lt;code&gt;canonical_url: https://your.url.here&lt;/code&gt; &lt;br&gt;
And you're good.&lt;/p&gt;
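&lt;p&gt;For context - in the basic editor the header lives in the front matter block at the very top of the post. Something like this (title and URL are placeholders):&lt;/p&gt;

```yaml
# dev.to front matter in the basic markdown editor - placeholder values
---
title: My Older Post
published: true
canonical_url: https://your.url.here
---
```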

&lt;p&gt;Happy blogging!&lt;/p&gt;

</description>
      <category>blogging</category>
      <category>seo</category>
      <category>howtodevto</category>
      <category>canonical</category>
    </item>
    <item>
      <title>9 Ways to an EKS Cluster - Way 2 - AWS CLI</title>
      <dc:creator>Ant(on) Weiss</dc:creator>
      <pubDate>Sun, 25 Feb 2024 17:08:10 +0000</pubDate>
      <link>https://dev.to/aws-builders/9-ways-to-an-eks-cluster-way-2-aws-cli-3g94</link>
      <guid>https://dev.to/aws-builders/9-ways-to-an-eks-cluster-way-2-aws-cli-3g94</guid>
      <description>&lt;p&gt;In my previous post I started out with &lt;a href="https://dev.to/aws-builders/8-ways-to-spin-up-an-eks-cluster-210b"&gt;&lt;strong&gt;Way 1 - Create an EKS Cluster in AWS Management Console&lt;/strong&gt;&lt;/a&gt;. (All the links to other posts in this series will be &lt;a href="https://dev.to/aws-builders/8-ways-to-spin-up-an-eks-cluster-210b"&gt;here&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Using the management console is quick and intuitive. But, as discussed - real platform engineers don't click on buttons. Instead they manage their Infra as Code. And - not all code was created the same. We can identify 3 layers of IaC - each one increasing in complexity and, therefore - flexibility:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Command line (CLI)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DSL + Interpreter (e.g. HCL + Terraform, YAML + Ansible, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pure programming language (with Boto3, CDK, Pulumi, etc)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So today we will start from the most accessible layer - the AWS CLI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;The AWS Command Line Interface (AWS CLI) is a unified tool to manage all your AWS services (not just EKS). With just one tool to download and configure, you can control multiple AWS services from the command line and automate them through scripts.&lt;/p&gt;

&lt;p&gt;In case you still don't have AWS CLI v2 - please follow the &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html#getting-started-install-instructions" rel="noopener noreferrer"&gt;official instructions&lt;/a&gt; to get it installed.&lt;/p&gt;

&lt;p&gt;While at it - I heartily recommend installing &lt;a href="https://github.com/awslabs/aws-shell" rel="noopener noreferrer"&gt;aws-shell&lt;/a&gt;, which boosts your AWS CLI productivity by providing graphical autocompletion, hints and shortcuts as shown in the image below. I only discovered it recently myself and it's definitely a game changer!&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fimdzeguhqjcpo3qk0j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fimdzeguhqjcpo3qk0j.png" alt="aws-shell" width="800" height="168"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course once you get the cluster running it's highly desirable that you also have &lt;code&gt;kubectl&lt;/code&gt; and/or &lt;a href="https://flathub.org/apps/dev.k8slens.OpenLens" rel="noopener noreferrer"&gt;OpenLens&lt;/a&gt; installed to interact with the cluster. But that's not specific to this EKS provisioning method.&lt;/p&gt;
&lt;h2&gt;
  
  
  Creating a VPC
&lt;/h2&gt;

&lt;p&gt;I skipped this part in the description of &lt;strong&gt;Way 1&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;But when working with the CLI - it makes sense to wrap everything into one script (provided in the accompanying &lt;a href="https://github.com/antweiss/9-ways-2-EKS/blob/main/way-2-aws-cli/eks.sh" rel="noopener noreferrer"&gt;repo&lt;/a&gt;) so let's create the VPC right here. &lt;/p&gt;

&lt;p&gt;The required VPC config is non-trivial - with 2 public and 2 private subnets and all the associated gateways, route tables and security groups. Luckily, AWS provides a CloudFormation template to make this easier, so we'll just use that. &lt;/p&gt;

&lt;p&gt;The template defines a default IPv4 CIDR range for your VPC. Each node, Pod, and load balancer that you deploy is assigned an IPv4 address from this block. It provides enough IP addresses for most implementations, but if it doesn't, then you can change it. For more information, see &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Subnets.html#VPC_Sizing" rel="noopener noreferrer"&gt;VPC and subnet sizing&lt;/a&gt; in the Amazon VPC User Guide.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create the VPC CloudFormation stack:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPORT AWS_REGION=eu-central-1
aws cloudformation create-stack --stack-name Way2VPC \
    --region $AWS_REGION \
    --template-url https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/amazon-eks-vpc-private-subnets.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Stack creation takes a few minutes but the CLI prompt returns immediately. In order to check the stack status please run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws cloudformation describe-stacks --stack-name Way2VPC \
    --region $AWS_REGION \
    --query 'Stacks[*].StackStatus'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
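&lt;p&gt;Alternatively, instead of polling, we can let the CLI block until the stack is ready. A sketch using the same stack name - the command requires AWS credentials and returns only when creation completes, hence it's only printed here:&lt;/p&gt;

```shell
# Sketch: block until the VPC stack reaches CREATE_COMPLETE instead of
# polling describe-stacks. Printed only - eval against a real account.
STACK_NAME=Way2VPC
AWS_REGION=eu-central-1
CMD="aws cloudformation wait stack-create-complete \
  --stack-name $STACK_NAME --region $AWS_REGION"
echo "$CMD"
# eval "$CMD"
```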



&lt;p&gt;Once it returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
    "CREATE_COMPLETE"
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we can continue to -&amp;gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating the Cluster Role
&lt;/h2&gt;

&lt;p&gt;As mentioned in the previous post - we have to create an IAM role that will allow the control plane of our EKS to manage its nodes. We will name our role for this blog post &lt;strong&gt;Way2EKSClusterRole&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The following comes from the &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html" rel="noopener noreferrer"&gt;official guide&lt;/a&gt;, but as all of these commands are executed from the CLI - it makes sense to put them here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create the policy file:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt;eks-cluster-role-trust-policy.json &amp;lt;&amp;lt;EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Create the IAM role:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws iam create-role --role-name Way2EKSClusterRole --assume-role-policy-document file://"eks-cluster-role-trust-policy.json"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Attach the Amazon EKS managed policy named &lt;a href="https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonEKSClusterPolicy.html#AmazonEKSClusterPolicy-json" rel="noopener noreferrer"&gt;AmazonEKSClusterPolicy&lt;/a&gt; to the role:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy --role-name Way2EKSClusterRole
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Finally - Let's Create the Cluster
&lt;/h2&gt;

&lt;p&gt;Now that we have the VPC and the role -  we can create the cluster.&lt;br&gt;
First - define the environment variables. Feel free to modify these as appropriate for your environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export REGION=eu-central-1
export CLUSTERNAME=way2
export K8S_VERSION=1.29
export ROLE_ARN=$(aws iam get-role --role-name Way2EKSClusterRole --query 'Role.Arn' --output text)
export SECURITY_GROUP=$(aws cloudformation describe-stacks --stack-name Way2VPC --region $REGION --query 'Stacks[*].Outputs[?OutputKey==`SecurityGroups`].OutputValue | [0] | [0]' --output text)
export SUBNET_IDS=$(aws cloudformation describe-stacks --stack-name Way2VPC --region $REGION --query 'Stacks[*].Outputs[?OutputKey==`SubnetIds`].OutputValue | [0] | [0]' --output text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note I'm using the &lt;code&gt;--query&lt;/code&gt; option to retrieve the necessary resource properties and &lt;code&gt;--output text&lt;/code&gt; to make sure they are not quoted. This is needed to use them as env vars in the following command that finally creates our cluster!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws eks create-cluster --region $REGION \
  --name $CLUSTERNAME \
  --kubernetes-version $K8S_VERSION \
  --role-arn $ROLE_ARN \
  --resources-vpc-config subnetIds=$SUBNET_IDS,securityGroupIds=$SECURITY_GROUP

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It takes several minutes to create the cluster.&lt;br&gt;
You can verify it's been created by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws eks describe-cluster --region $REGION --name $CLUSTERNAME --query "cluster.status"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
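&lt;p&gt;Rather than re-running that check by hand, we can poll until the status flips (the AWS CLI also ships a built-in waiter, &lt;code&gt;aws eks wait cluster-active&lt;/code&gt;). Here's a minimal polling sketch - &lt;code&gt;check_status&lt;/code&gt; and &lt;code&gt;wait_until_active&lt;/code&gt; are illustrative names; on a real cluster &lt;code&gt;check_status&lt;/code&gt; would wrap the &lt;code&gt;describe-cluster&lt;/code&gt; call above:&lt;/p&gt;

```shell
# Minimal polling sketch. check_status is a stand-in for:
#   aws eks describe-cluster --region $REGION --name $CLUSTERNAME \
#     --query "cluster.status" --output text
wait_until_active() {
  local tries=0
  until [ "$(check_status)" = "ACTIVE" ]; do
    tries=$((tries + 1))
    [ "$tries" -ge 60 ] && return 1      # give up after ~60 polls
    sleep "${POLL_INTERVAL:-30}"
  done
}
```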



&lt;p&gt;If the response is "ACTIVE" - we're good to go and connect to the cluster by generating a &lt;code&gt;kubeconfig&lt;/code&gt; definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws eks update-kubeconfig --region $REGION --name  $CLUSTERNAME
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Adding a NodeGroup
&lt;/h2&gt;

&lt;p&gt;Once this works and we can successfully run &lt;code&gt;kubectl get nodes&lt;/code&gt; - we recall we still need to add nodes.&lt;br&gt;
The options here are abound - we can choose between managed and unmanaged node groups (or even AWS Fargate), we can define which AMIs and instance types to choose and if the resulting machines will Spot or On-Demand. This is really beyond the scope of my post. Right here we'll opt for a minimum viable nodegroup - with defaults defined by AWS.&lt;/p&gt;
&lt;h2&gt;
  
  
  Creating the Node Role
&lt;/h2&gt;

&lt;p&gt;Our nodes also need an IAM Role - to pull container images from ECR, to assign IPs for the AWS CNI and a bunch of other stuff.&lt;/p&gt;

&lt;p&gt;Let's create that role:&lt;/p&gt;

&lt;p&gt;Define the trust relationship:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt;node-role-trust-relationship.json &amp;lt;&amp;lt;EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now create the role and attach all the necessary policies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export NODE_ROLE_NAME=Way2EKSNodeRole
aws iam create-role \
  --role-name $NODE_ROLE_NAME \
  --assume-role-policy-document file://"node-role-trust-relationship.json"
aws iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy \
  --role-name $NODE_ROLE_NAME
aws iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly \
  --role-name $NODE_ROLE_NAME
aws iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy \
  --role-name $NODE_ROLE_NAME
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  A side note about IPv6
&lt;/h2&gt;

&lt;p&gt;All the commands I give only provision a cluster with IPv4 support, because that's what the majority of us need. Should you need IPv6 support - please refer to the official docs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finally Create the NodeGroup
&lt;/h2&gt;

&lt;p&gt;There's a funny quirk in the way subnet ids are passed to this command. In &lt;code&gt;aws eks create-cluster&lt;/code&gt; subnet ids need to be comma-separated, but in &lt;code&gt;create-nodegroup&lt;/code&gt; they are for some reason expected to be separated by spaces... Go figure :))&lt;/p&gt;

&lt;p&gt;So first do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export SUBNET_IDS=$(echo $SUBNET_IDS | tr ',' ' ')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
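&lt;p&gt;To see what this does, here's the same &lt;code&gt;tr&lt;/code&gt; conversion applied to a made-up sample value (the subnet ids below are placeholders, not real ones):&lt;/p&gt;

```shell
# hypothetical subnet ids, for illustration only
SAMPLE="subnet-aaa111,subnet-bbb222,subnet-ccc333"
echo "$SAMPLE" | tr ',' ' '
# prints: subnet-aaa111 subnet-bbb222 subnet-ccc333
```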



&lt;p&gt;And then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export NODE_ROLE_ARN=$(aws iam get-role --role-name $NODE_ROLE_NAME --query 'Role.Arn' --output text)

aws eks create-nodegroup --cluster-name $CLUSTERNAME \
--nodegroup-name Way2NodeGroup \
--subnets $SUBNET_IDS \
--node-role $NODE_ROLE_ARN \
--region $REGION

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a NodeGroup with the default scaling params of&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; "minSize": 1,
 "maxSize": 2,
 "desiredSize": 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you need a different scaling config - modify this accordingly.&lt;/p&gt;

&lt;p&gt;After a few minutes we can recheck our nodes by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get node
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which should give us something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME                                               STATUS   ROLES    AGE     VERSION
ip-192-168-177-103.eu-central-1.compute.internal   Ready    &amp;lt;none&amp;gt;   5m57s   v1.29.0-eks-5e0fdde
ip-192-168-98-28.eu-central-1.compute.internal     Ready    &amp;lt;none&amp;gt;   5m58s   v1.29.0-eks-5e0fdde
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we now have an EKS cluster all created from the CLI!&lt;br&gt;
And you can even connect it to PerfectScale to start monitoring and optimizing your Kubernetes resource usage right from the start - &lt;a href="https://app.perfectscale.io/account/sign-up" rel="noopener noreferrer"&gt;sign up here&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Creating an EKS cluster can be done with (relatively) simple AWS CLI commands. &lt;br&gt;
There are a lot of commands to run, and while they can be &lt;a href="https://github.com/antweiss/9-ways-2-EKS/blob/main/way-2-aws-cli/eks.sh" rel="noopener noreferrer"&gt;wrapped in a script&lt;/a&gt; and parameterized - it's still not a very good solution. The good thing is that we don't need anything besides the AWS CLI. Well, some CloudFormation, but it's AWS-provided. &lt;br&gt;
The worst part is that such a script isn't idempotent. Once we create all (or some of) these resources - the script won't run cleanly a second time. And removing all the resources we've created means a lot of manual work.&lt;/p&gt;

&lt;p&gt;And that's why we're going to explore &lt;a href="https://dev.to/aws-builders/8-ways-to-spin-up-an-eks-cluster-210b"&gt;additional ways of provisioning an EKS cluster&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;See you in the next installment of this series.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>eks</category>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
    <item>
      <title>Exploring cgroups v2 and MemoryQoS With EKS and Bottlerocket</title>
      <dc:creator>Ant(on) Weiss</dc:creator>
      <pubDate>Mon, 19 Feb 2024 14:49:05 +0000</pubDate>
      <link>https://dev.to/aws-builders/exploring-cgroups-v2-and-memoryqos-with-eks-and-bottlerocket-a7g</link>
      <guid>https://dev.to/aws-builders/exploring-cgroups-v2-and-memoryqos-with-eks-and-bottlerocket-a7g</guid>
      <description>&lt;p&gt;Bottlerocket is a Linux-based operating system optimized for hosting containers. It was originally developed at AWS specifically for runnning secure and performant Kubernetes nodes. It's minimal, secure and supports atomic updates.&lt;/p&gt;

&lt;p&gt;According to this &lt;a href="https://github.com/Bottlerocket-os/Bottlerocket/discussions/2874" rel="noopener noreferrer"&gt;discussion&lt;/a&gt; - starting with Bottlerocket 1.13.0 (Mar 2023) new distributions will default to using Cgroups v2 interface for process organization and enforcing resource limits.&lt;/p&gt;

&lt;p&gt;In this post I intend to explore how this works for EKS clusters running Kubernetes 1.26+ and what this change means for EKS users.&lt;/p&gt;

&lt;h1&gt;
  
  
  Cgroups - An Intro
&lt;/h1&gt;

&lt;p&gt;Cgroups (short for Control Groups) is a Linux kernel feature that lies at the foundation of what we now know as Linux containers.&lt;/p&gt;

&lt;p&gt;The feature makes it possible to limit, account for and isolate resource usage for a collection of processes.&lt;/p&gt;

&lt;p&gt;It was developed at Google circa 2007 and merged into Linux kernel mainline in 2008.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwizardzines.com%2Fimages%2Fuploads%2Fcgroups.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwizardzines.com%2Fimages%2Fuploads%2Fcgroups.png" alt="Julia Evans' wonderful cgroups comic"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Cgroups and Kubernetes
&lt;/h2&gt;

&lt;p&gt;Kubernetes allows us to define resource usage for containers via the &lt;a href="https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources" rel="noopener noreferrer"&gt;resources&lt;/a&gt; map in the Pod API spec.&lt;br&gt;
These definitions are then passed by the kubelet on to the container runtime on the node and translated into Cgroups configuration. &lt;/p&gt;

&lt;p&gt;Up until version 1.25 Kubernetes only supported Cgroups v1 by default. In 1.25, stable support for Cgroups v2 was added. Now, when running on a node with Cgroups v2, the kubelet automatically identifies this and performs accordingly. But what does this mean for our workload configuration? In order to understand that we need to explain what Cgroups v2 is.&lt;/p&gt;
&lt;h2&gt;
  
  
  Cgroups V2
&lt;/h2&gt;

&lt;p&gt;Cgroups v2 was released in 2015, introducing an API redesign - mainly a unified hierarchy and improved consistency. The following diagram shows the change in how Cgroup controllers are ordered in v2 vs. v1:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A640%2Fformat%3Awebp%2F1%2AP7ZLLF_F4TMgGfaJ2XIfuQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A640%2Fformat%3Awebp%2F1%2AP7ZLLF_F4TMgGfaJ2XIfuQ.png" alt="cgroup hierarchy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;According to &lt;a href="https://kubernetes.io/docs/concepts/architecture/cgroups/#:~:text=Some%20Kubernetes%20features%20exclusively%20use%20cgroup%20v2%20for%20enhanced%20resource%20management%20and%20isolation.%20For%20example%2C%20the%20MemoryQoS%20feature%20improves%20memory%20QoS%20and%20relies%20on%20cgroup%20v2%20primitives." rel="noopener noreferrer"&gt;this architecture document&lt;/a&gt; : &lt;em&gt;"Some Kubernetes features exclusively use cgroup v2 for enhanced resource management and isolation. For example, the MemoryQoS feature improves memory QoS and relies on cgroup v2 primitives."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And when we look at the description of the aforementioned &lt;em&gt;MemoryQoS&lt;/em&gt; feature we find out that "In cgroup v1, and prior to this feature, the container runtime never took into account and effectively ignored spec.containers[].resources.requests["memory"]." and that "Fortunately, cgroup v2 brings a new design and implementation to achieve full protection on memory... With this experimental feature, quality-of-service for pods and containers extends to cover not just CPU time but memory as well."&lt;/p&gt;

&lt;p&gt;Well, first of all - &lt;strong&gt;it's a bit shocking and even insulting to learn that container runtimes ignored our settings&lt;/strong&gt;! But I was also very curious to learn how this changes now that cgroups v2 support is introduced.&lt;/p&gt;
&lt;h2&gt;
  
  
  MemoryQoS and Cgroups v2
&lt;/h2&gt;

&lt;p&gt;According to this &lt;a href="https://kubernetes.io/blog/2021/11/26/qos-memory-resources/" rel="noopener noreferrer"&gt;page&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;Memory QoS uses the memory controller of cgroup v2 to guarantee memory resources in Kubernetes. Memory requests and limits of containers in a pod are used to set the interfaces &lt;code&gt;memory.min&lt;/code&gt; and &lt;code&gt;memory.high&lt;/code&gt; provided by the memory controller. When &lt;code&gt;memory.min&lt;/code&gt; is set to memory requests, memory resources are reserved and never reclaimed by the kernel; this is how Memory QoS ensures the availability of memory for Kubernetes pods. And if memory limits are set in the container, meaning the system needs to limit container memory usage, Memory QoS uses &lt;code&gt;memory.high&lt;/code&gt; to throttle a workload approaching its memory limit, ensuring that the system is not overwhelmed by instantaneous memory allocation.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkubernetes.io%2Fblog%2F2021%2F11%2F26%2Fqos-memory-resources%2Fmemory-qos-cal.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fkubernetes.io%2Fblog%2F2021%2F11%2F26%2Fqos-memory-resources%2Fmemory-qos-cal.svg" alt="container memory in cgroup v2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is all great! Let's now provision an EKS cluster with some Bottlerocket nodes and see how this works in practice.&lt;/p&gt;

&lt;p&gt;To easily spin up a cluster - use the &lt;a href="https://github.com/antweiss/botllerocket-cgroupv2/blob/main/cluster.yaml" rel="noopener noreferrer"&gt;cluster.yaml&lt;/a&gt; in the attached github repository:&lt;/p&gt;

&lt;p&gt;generate ssh keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh-keygen &lt;span class="nt"&gt;-f&lt;/span&gt; ./mykey
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and create the cluster&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eksctl create cluster &lt;span class="nt"&gt;-f&lt;/span&gt; cluster.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create a cluster with one Bottlerocket node. It also configures ssh access to the nodes by running the &lt;a href="https://github.com/Bottlerocket-os/Bottlerocket-admin-container" rel="noopener noreferrer"&gt;Bottlerocket admin container&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This means we can now access the node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;NODE_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get node &lt;span class="nt"&gt;-oyaml&lt;/span&gt; | yq  &lt;span class="s1"&gt;'.items[].status.addresses[] | select(.type=="ExternalIP") | .address'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
ssh &lt;span class="nt"&gt;-i&lt;/span&gt; mykey ec2-user@&lt;span class="nv"&gt;$NODE_IP&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get greeted with the following screen:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkz1iy6w20r9g1seljoyw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkz1iy6w20r9g1seljoyw.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As this says - we can get admin access to the Bottlerocket filesystem by running &lt;code&gt;sudo sheltie&lt;/code&gt;. So let's do that!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;ec2-user@admin]&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;sheltie
&lt;span class="o"&gt;[&lt;/span&gt;bash-5.1]&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;whoami
&lt;/span&gt;root
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can check if we in fact have &lt;code&gt;cgroupv2&lt;/code&gt; enabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;bash-5.1]&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;stat&lt;/span&gt; &lt;span class="nt"&gt;-fc&lt;/span&gt; %T /sys/fs/cgroup/
cgroup2fs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yup! This is cgroupv2! Were this &lt;code&gt;cgroupv1&lt;/code&gt;, the output would've been &lt;code&gt;tmpfs&lt;/code&gt;.&lt;/p&gt;
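&lt;p&gt;If you check this often, the mapping is easy to wrap in a tiny helper (a sketch - the function name is mine):&lt;/p&gt;

```shell
# Maps the filesystem type reported by `stat -fc %T /sys/fs/cgroup/`
# to a human-readable cgroup version. Helper name is illustrative.
cgroup_version_from_fstype() {
  case "$1" in
    cgroup2fs) echo "cgroup v2" ;;
    tmpfs)     echo "cgroup v1" ;;
    *)         echo "unknown" ;;
  esac
}

# usage on a live node:
# cgroup_version_from_fstype "$(stat -fc %T /sys/fs/cgroup/)"
```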

&lt;h2&gt;
  
  
  Let's Deploy a Pod
&lt;/h2&gt;

&lt;p&gt;Ok, now let's deploy a pod to our node. We'll do that by creating a deployment based on the following &lt;code&gt;yaml&lt;/code&gt; spec. This deploys &lt;a href="https://github.com/antweiss/busyhttp" rel="noopener noreferrer"&gt;antweiss/busyhttp&lt;/a&gt;, which I forked from &lt;a href="https://github.com/jpetazzo/busyhttp" rel="noopener noreferrer"&gt;jpetazzo/busyhttp&lt;/a&gt; and added memory load and release endpoints to.&lt;br&gt;
You'll notice that the pod runs a container with Guaranteed QoS - i.e memory and CPU limits are equal to requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busyhttp&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busyhttp&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busyhttp&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busyhttp&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otomato/busyhttp&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busyhttp&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;200Mi"&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;250m"&lt;/span&gt;
          &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;200Mi"&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;250m"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This spec is found in &lt;a href="https://github.com/antweiss/botllerocket-cgroupv2/blob/main/dep.yaml" rel="noopener noreferrer"&gt;dep.yaml&lt;/a&gt; and we can deploy it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; dep.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Check the Cgroup Impact
&lt;/h2&gt;

&lt;p&gt;Now let's go back to our node and see how our resource definitions are reflected in the &lt;code&gt;cgroup&lt;/code&gt; config.&lt;/p&gt;

&lt;p&gt;Back inside the &lt;code&gt;sheltie&lt;/code&gt; prompt, let's explore the containers running on Bottlerocket. Bottlerocket uses the &lt;code&gt;containerd&lt;/code&gt; container runtime. In order to interact with it we'll need &lt;a href="https://github.com/projectatomic/containerd/blob/master/docs/cli.md" rel="noopener noreferrer"&gt;&lt;code&gt;ctr&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When we run &lt;code&gt;ctr help&lt;/code&gt; - we get the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpo39grtb6hof9mcdzgee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpo39grtb6hof9mcdzgee.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So &lt;code&gt;ctr&lt;/code&gt; is unsupported. A bit discouraging, but well, it's working. Let's try to look at our containers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bash-5.1&lt;span class="nv"&gt;$ &lt;/span&gt;ctr containers &lt;span class="nb"&gt;ls
&lt;/span&gt;CONTAINER    IMAGE    RUNTIME
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No containers?! But I do see my pod running on the node! Where is my container? Well, the answer to that is &lt;code&gt;namespaces&lt;/code&gt;. Yup, just like Kubernetes or the Linux kernel - containerd has namespaces. And all the containers executed by the kubelet live in a namespace called "k8s.io". We can see it by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bash-5.1&lt;span class="nv"&gt;$ &lt;/span&gt;ctr ns &lt;span class="nb"&gt;ls
&lt;/span&gt;NAME   LABELS
k8s.io
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ok, let's check the containers in the "k8s.io" namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bash-5.1&lt;span class="nv"&gt;$ &lt;/span&gt;ctr &lt;span class="nt"&gt;-n&lt;/span&gt; k8s.io containers &lt;span class="nb"&gt;ls
&lt;/span&gt;CONTAINER                                                           IMAGE                                                                                                RUNTIME
0ed99eae66803896504d1853859d8866e00669b2610ba65cba6a17aa1300da48    112233445566.dkr.ecr.eu-central-1.amazonaws.com/eks/pause:3.1-eksbuild.1                             io.containerd.runc.v2
154d9b7b3a83e4db6e3e4ac4ac1f836321337c604c3b590b5188b7a0773bdae1    docker.io/otomato/busyhttp:latest                                                                    io.containerd.runc.v2
3b63efe56d15e9c315c668a5913e17ade420cf7fb5ff7fa62b3c9b0e1574eab4    112233445566.dkr.ecr.eu-central-1.amazonaws.com/amazon/aws-network-policy-agent:v1.0.7-eksbuild.1    io.containerd.runc.v2
3c85fa3829f59f517db1c766e490a014357a760ce12e2859004cdfb8ea3d7cc6    112233445566.dkr.ecr.eu-central-1.amazonaws.com/eks/pause:3.1-eksbuild.1                             io.containerd.runc.v2
420b62c7b2bdf2e7aa10baf8e4afd1ebda0cfff66300a23846758a029ad31222    112233445566.dkr.ecr.eu-central-1.amazonaws.com/eks/pause:3.1-eksbuild.1                             io.containerd.runc.v2
4e21008f8fc70580906990fb95bed91f9155495270fbac1efb043f81e62a1c51    112233445566.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.4                       io.containerd.runc.v2
4f9d20de851414160a5003eb6988f2b0df81dfe3d72d4ba3705db01a4571b515    112233445566.dkr.ecr.eu-central-1.amazonaws.com/eks/kube-proxy:v1.29.0-minimal-eksbuild.1            io.containerd.runc.v2
5f8924422d852d09ad44f5d8579d9abaa78304d303007d566300db8f61978ee5    112233445566.dkr.ecr.eu-central-1.amazonaws.com/amazon-k8s-cni:v1.16.0-eksbuild.1                    io.containerd.runc.v2
7f5afdfbb9a8599c3c5888664f0df349aab8740be21d87e629ff7390e0524c2a    112233445566.dkr.ecr.eu-central-1.amazonaws.com/eks/pause:3.1-eksbuild.1                             io.containerd.runc.v2
80ed8ba3fd4624770eb17087c1a046c90be28e9fb2e31630c82e67b4c0ae19dd    112233445566.dkr.ecr.eu-central-1.amazonaws.com/eks/pause:3.1-eksbuild.1                             io.containerd.runc.v2
a0929884ccdd72de6bf848a037e80b206a4fb4e2f9b77be568bac8f51787cccb    112233445566.dkr.ecr.eu-central-1.amazonaws.com/eks/pause:3.1-eksbuild.1                             io.containerd.runc.v2
a0dd24ecb0aee4eb645d25c75a3eade0c8c35fb09127db5ff7d8136d7bb86efe    112233445566.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.4                       io.containerd.runc.v2
c3a3a9214b11b808a88c0293312d8877497f06f87435af3a7334717a13588c26    112233445566.dkr.ecr.eu-central-1.amazonaws.com/amazon-k8s-cni-init:v1.16.0-eksbuild.1               io.containerd.runc.v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we're talking! We have all the usual suspects here - coredns, kube-proxy, the omnipresent &lt;a href="https://docs.mirantis.com/mke/3.4/ref-arch/pause-containers.html" rel="noopener noreferrer"&gt;pause&lt;/a&gt; containers. But right now we're interested in the container based on the &lt;code&gt;docker.io/otomato/busyhttp:latest&lt;/code&gt; image.&lt;/p&gt;

&lt;p&gt;Let's look for its cgroup definition in the cgroup filesystem we discovered previously. First we need to extract the container id. &lt;code&gt;ctr&lt;/code&gt; supports filters for its listing function. So the way to parse out the container id by image name is the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export CONTAINER_ID=$(ctr -n k8s.io containers ls -q image==docker.io/otomato/busyhttp:latest)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the &lt;code&gt;-q&lt;/code&gt; that tells &lt;code&gt;ctr&lt;/code&gt; to only output the id.&lt;/p&gt;
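&lt;p&gt;One thing to watch out for: if the image isn't actually running, &lt;code&gt;ctr ... -q&lt;/code&gt; prints nothing, &lt;code&gt;CONTAINER_ID&lt;/code&gt; ends up empty, and the &lt;code&gt;find&lt;/code&gt; below would then match far too much. A small guard sketch (the helper name is mine):&lt;/p&gt;

```shell
# Fail early if a variable we depend on came back empty.
# $1 - the value to check, $2 - a name for the error message.
require_nonempty() {
  [ -n "$1" ] || { echo "error: $2 is empty" >&2; return 1; }
}

# usage:
# require_nonempty "$CONTAINER_ID" CONTAINER_ID || exit 1
```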

&lt;p&gt;Now we can find the container's cgroup config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;find /sys/fs/cgroup/ &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="nv"&gt;$CONTAINER_ID&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;
/sys/fs/cgroup/kubepods.slice/kubepods-pod5be5d94a_cbfe_416f_9010_6338003af666.slice/cri-containerd-154d9b7b3a83e4db6e3e4ac4ac1f836321337c604c3b590b5188b7a0773bdae1.scope
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us a long path somewhere inside a folder called &lt;code&gt;kubepods.slice&lt;/code&gt;. Let's wrap this path in an environment variable and look around:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export MY_CGROUP_DIR=$(find /sys/fs/cgroup/ -name *$CONTAINER_ID*)
ls ${MY_CGROUP_DIR}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whew! That's a lot of files! Now according to &lt;a href="https://kubernetes.io/blog/2021/11/26/qos-memory-resources/" rel="noopener noreferrer"&gt;this page on Memory QoS&lt;/a&gt; - our &lt;code&gt;requests.memory&lt;/code&gt; should be translated to &lt;code&gt;memory.min&lt;/code&gt; while &lt;code&gt;memory.high&lt;/code&gt; is calculated the following way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;memory,high = (pod.spec.containers[i].resources.limits[memory] or nodeAllocatableMemory) * throttlingFactor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
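&lt;p&gt;Plugging in our pod's 200Mi limit: if Memory QoS were active, with the 0.8 throttling factor used as the default in the alpha implementation described on that page (later releases changed the default, so verify against your kubelet version), we'd expect roughly:&lt;/p&gt;

```shell
# illustrative arithmetic only - 0.8 is the alpha-era default
# memoryThrottlingFactor; check your kubelet version's default
LIMIT=$((200 * 1024 * 1024))   # our 200Mi limit = 209715200 bytes
awk "BEGIN { printf \"%d\n\", $LIMIT * 0.8 }"
# prints: 167772160
```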



&lt;p&gt;Let's look at the limit first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat ${MY_CGROUP_DIR}/memory.high
max
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hmm. That's not a number. But we can also notice that there's a file called &lt;code&gt;memory.max&lt;/code&gt;. Let's look inside that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat ${MY_CGROUP_DIR}/memory.high
209715200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ok, here's our limit! 209715200 bytes is exactly the 200Mi we defined in the &lt;code&gt;resources&lt;/code&gt; section of our pod spec.&lt;/p&gt;

&lt;p&gt;Now what about the requests? Let's look at &lt;code&gt;memory.min&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat ${MY_CGROUP_DIR}/memory.min
0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;0 is not the request we've defined. And that makes sense. &lt;a href="https://kubernetes.io/blog/2021/11/26/qos-memory-resources/" rel="noopener noreferrer"&gt;Memory QoS&lt;/a&gt; has been in alpha since Kubernetes 1.22 (August 2021) and according to the &lt;a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2570-memory-qos/kep.yaml" rel="noopener noreferrer"&gt;KEP data&lt;/a&gt; was still in alpha as of 1.27.&lt;/p&gt;

&lt;p&gt;In order to see the actual request values for memory reflected in cgroup config one needs to enable the Memory QoS &lt;a href="https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/#kubelet-config-k8s-io-v1beta1-KubeletConfiguration:~:text=effect.%20Default%3A%2015-,featureGates,-map%5Bstring%5Dbool" rel="noopener noreferrer"&gt;feature gate in kubelet config&lt;/a&gt; as defined &lt;a href="https://github.com/kubernetes/kubernetes/blob/master/pkg/features/kube_features.go#L492" rel="noopener noreferrer"&gt;here&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kubelet.config.k8s.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KubeletConfiguration&lt;/span&gt;
&lt;span class="na"&gt;featureGates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;MamoryQoS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Trouble is - due to the atomic nature of Bottlerocket OS - we can't change its KubeletConfiguration file (found at /etc/kubernetes/kubelet/config) directly. We can only pass settings through &lt;code&gt;settings.kubernetes&lt;/code&gt; via the API or a config file - and these currently don't support setting feature gates. So it looks like the only way to modify the kubelet to support Memory QoS on EKS Bottlerocket nodes is to build our own Bottlerocket images. Which is a subject for a whole other blog post.&lt;/p&gt;
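
&lt;p&gt;For reference - this is roughly how Bottlerocket settings get passed in an eksctl nodegroup spec (a sketch with illustrative values - note there's no key for kubelet feature gates here):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;managedNodeGroups:
  - name: bottlerocket-ng
    amiFamily: Bottlerocket
    bottlerocket:
      settings:
        kubernetes:
          max-pods: 20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
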

&lt;p&gt;And for now - let's shrug our shoulders, scratch our heads and bring down our EKS cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl delete cluster -f cluster.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summing it All Up
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;cgroup v2&lt;/code&gt; is enabled by default in current Bottlerocket EKS instances. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;this allows better-organized resource management on the nodes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;an important Kubernetes feature based on &lt;code&gt;cgroup v2&lt;/code&gt; is Memory QoS, which ensures that memory requests are actually allocated by the container runtime and not merely checked by the Kubernetes scheduler&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Memory QoS is still in &lt;code&gt;alpha&lt;/code&gt; after 2 years&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There's no easy way to enable Memory QoS on Bottlerocket nodes without building the AMIs ourselves.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
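
&lt;p&gt;By the way - a quick way to verify which cgroup version a node is running is checking the filesystem type mounted at /sys/fs/cgroup (on cgroup v2 it's &lt;code&gt;cgroup2fs&lt;/code&gt;, on v1 it's &lt;code&gt;tmpfs&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stat -fc %T /sys/fs/cgroup
cgroup2fs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
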

&lt;p&gt;Anyway - this was an interesting exploration. And if there's anything I got wrong or didn't make clear - please let me know in the comments.&lt;/p&gt;

&lt;p&gt;May all your containers run smoothly!&lt;/p&gt;

&lt;p&gt;The config files used in the blog post can be found &lt;a href="https://github.com/antweiss/botllerocket-cgroupv2" rel="noopener noreferrer"&gt;in this github repo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was originally published &lt;a href="https://antweiss.com/blog/exploring-cgroups-v2-and-memoryqos-with-eks-and-bottlerocket/" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>9 Ways to Spin Up an EKS Cluster - Way 1 - the Console</title>
      <dc:creator>Ant(on) Weiss</dc:creator>
      <pubDate>Mon, 22 Jan 2024 18:27:43 +0000</pubDate>
      <link>https://dev.to/aws-builders/8-ways-to-spin-up-an-eks-cluster-210b</link>
      <guid>https://dev.to/aws-builders/8-ways-to-spin-up-an-eks-cluster-210b</guid>
      <description>&lt;h1&gt;
  
  
  We love EKS!
&lt;/h1&gt;

&lt;p&gt;If you're running on AWS - the best, most hassle-free way to get a Kubernetes cluster is EKS - the Elastic Kubernetes Service. The control plane of EKS clusters is fully managed by AWS, while the data plane - i.e. the worker nodes - can be defined and managed by the user in various available configurations. &lt;/p&gt;

&lt;p&gt;As with anything in modern cloud services - there are a number of ways to create and manage EKS clusters. Organizations just starting to build out their delivery platform need to choose a provisioning and management method. This choice has a significant impact on how their platform evolves - yet the criteria for making it often aren't clear. &lt;/p&gt;

&lt;p&gt;In this series I intend to give an overview of all the different options and provide a rundown of the benefits and downsides of each method.&lt;/p&gt;

&lt;p&gt;And here's our list:&lt;/p&gt;

&lt;h3&gt;
  
  
  Way 1 - Create an EKS Cluster in AWS Management Console
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://dev.to/aws-builders/9-ways-to-an-eks-cluster-way-2-aws-cli-3g94"&gt;Way 2 - Create an EKS Cluster in AWS cli&lt;/a&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://dev.to/aws-builders/9-ways-to-spin-up-an-eks-cluster-way-3-eksctl-2op9"&gt;Way 3 - Create an EKS Cluster with eksctl&lt;/a&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://dev.to/aws-builders/9-ways-to-spin-up-an-eks-cluster-way-4-cloudformation-3len"&gt;Way 4 - Create an EKS Cluster with CloudFormation&lt;/a&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Way 5 - Create an EKS Cluster with python and boto3
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Way 6 - Create an EKS Cluster with AWS CDK
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Way 7 - Create an EKS Cluster with Terraform
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Way 8 - Create an EKS Cluster with Pulumi
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Way 9 - Create an EKS Cluster with Crossplane
&lt;/h3&gt;

&lt;p&gt;In fact - the first 3 ways listed here (Management Console, AWS CLI and eksctl) are all laid out in &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html" rel="noopener noreferrer"&gt;this AWS guide&lt;/a&gt;, so I won't go into too much technical detail. But some things are still worth noting.&lt;/p&gt;

&lt;p&gt;So, without further ado - let's start!&lt;/p&gt;

&lt;h2&gt;
  
  
  Way 1 - Create an EKS Cluster in AWS Management Console
&lt;/h2&gt;

&lt;p&gt;So the fastest, most straightforward way of provisioning any AWS service is of course by going to the console and clicking your way through. No need to install anything on your computer, no need to learn new tools and languages.&lt;/p&gt;

&lt;p&gt;And it's actually so easy! Just go to your AWS Management Console, find EKS in the list of available services and proceed to "Add Cluster -&amp;gt; Create":&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcui751kyqnk5ewvu75zh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcui751kyqnk5ewvu75zh.png" alt="Add Cluster" width="800" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Right? Wrong!&lt;br&gt;
In fact - before clicking your way to a cluster you need to:&lt;/p&gt;

&lt;p&gt;a) Create a VPC and subnets that meet &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html" rel="noopener noreferrer"&gt;Amazon EKS requirements&lt;/a&gt;. &lt;br&gt;
b) Create a Cluster Role in AWS IAM by following this &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/service_IAM_role.html#create-service-role" rel="noopener noreferrer"&gt;guide&lt;/a&gt;.&lt;/p&gt;
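
&lt;p&gt;If you prefer, prerequisite (b) can also be done from the CLI - a rough sketch (the role name is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws iam create-role \
  --role-name myEKSClusterRole \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"eks.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

aws iam attach-role-policy \
  --role-name myEKSClusterRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
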

&lt;p&gt;And then you can click your way through!&lt;/p&gt;
&lt;h2&gt;
  
  
  On choosing the Kubernetes version
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0w3lrdz9vq9nvsmf9pk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0w3lrdz9vq9nvsmf9pk.png" alt="Choose k8s version" width="800" height="148"&gt;&lt;/a&gt;&lt;br&gt;
This is something we need to consider for all the methods listed. Unless some specific limitation prevents you - always choose the latest version (currently it's 1.29). AWS makes sure to test the versions they provide and regularly deprecates older versions. Each Kubernetes version gets &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html" rel="noopener noreferrer"&gt;14 months of standard support&lt;/a&gt; and upgrading your production cluster can get nerve-racking and time-consuming. So again - make sure to always choose the latest one.&lt;/p&gt;
&lt;h2&gt;
  
  
  A note on observability
&lt;/h2&gt;

&lt;p&gt;The third screen you need to click through when creating EKS from the console is the Observability one. This currently allows you to enable EKS monitoring using &lt;a href="https://aws.amazon.com/prometheus/" rel="noopener noreferrer"&gt;Amazon Managed Service for Prometheus&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;You only need this if you're not using a 3rd-party observability service (like Datadog or New Relic) - all of the major ones support monitoring EKS today, and you can set that up at a later stage.&lt;/p&gt;
&lt;h2&gt;
  
  
  Creating some nodes
&lt;/h2&gt;

&lt;p&gt;After you've successfully clicked through, waited a while and finally saw the cluster state in the console change from "Creating" to "Active" - it's time to connect to the control plane from your kubectl client.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4eny6ukx3t8sg412vxhp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4eny6ukx3t8sg412vxhp.png" alt="Cluster active" width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's where you'll need the AWS CLI, even if you've used the console for everything else until now. Get the kubeconfig:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws eks update-kubeconfig --name mycluster --region eu-central-1

Added new context arn:aws:eks:eu-central-1:XXXXXXXXXXX:cluster/mycluster to /Users/antweiss/.kube/config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Try to look at the nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(⎈ | mycluster:default)➜  kubectl get node
No resources found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that's where we realize we still need to create the nodes!&lt;br&gt;
This can be done by going to EKS-&amp;gt;Clusters-&amp;gt;mycluster-&amp;gt;Compute and choosing either to use self-managed nodes, create a managed &lt;em&gt;Node Group&lt;/em&gt; or utilize a &lt;em&gt;Fargate Profile&lt;/em&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7vw97grysztfvzd5wah4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7vw97grysztfvzd5wah4.png" alt="Add some nodes" width="800" height="543"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Which option to use for your EKS nodes is a topic for a whole separate post, so I won't go into it here. You can consult &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/eks-compute.html" rel="noopener noreferrer"&gt;this page&lt;/a&gt; for a basic comparison of all these options. Or drop me a note in the comments if you'd like my advice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Provisioning EKS from the Management Console - the Bottom Line
&lt;/h2&gt;

&lt;p&gt;As we saw in this post - the manual method is kinda straightforward, but it still leaves a lot of details for us to take care of.&lt;br&gt;
In addition - this method doesn't scale well. It can work ok for a couple of small clusters, but once we are in production - running at scale, across multiple geographical regions - managing things by hand becomes too slow and error-prone. Professional platform engineers manage their &lt;em&gt;infrastructure as code&lt;/em&gt;. &lt;br&gt;
And that will be shown in the upcoming installments of this series.&lt;/p&gt;

&lt;p&gt;Subscribe for updates and leave your comments if there's anything unclear or plain wrong! ;)&lt;/p&gt;

</description>
      <category>eks</category>
      <category>aws</category>
      <category>kubernetes</category>
    </item>
  </channel>
</rss>
