<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ole Markus With</title>
    <description>The latest articles on DEV Community by Ole Markus With (@olemarkus).</description>
    <link>https://dev.to/olemarkus</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F251423%2Ff0cb7237-526a-466c-927a-88e77768bf66.jpg</url>
      <title>DEV Community: Ole Markus With</title>
      <link>https://dev.to/olemarkus</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/olemarkus"/>
    <language>en</language>
    <item>
      <title>Zero-configuration IRSA on kOps</title>
      <dc:creator>Ole Markus With</dc:creator>
      <pubDate>Mon, 21 Mar 2022 15:48:12 +0000</pubDate>
      <link>https://dev.to/olemarkus/zero-configuration-irsa-on-kops-1po1</link>
      <guid>https://dev.to/olemarkus/zero-configuration-irsa-on-kops-1po1</guid>
      <description>&lt;p&gt;A while ago, I wrote about &lt;a href="https://dev.to/olemarkus/irsa-support-for-kops-1doe"&gt;using IAM Roles for ServiceAccounts on kOps&lt;/a&gt;.&lt;br&gt;
In short, this feature lets you define an AWS IAM Policy for a given ServiceAccount, and kOps will create the respective AWS IAM Role,&lt;br&gt;
assign the policy and establish a trust relationship allowing the ServiceAccount to assume the IAM Role.&lt;/p&gt;
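&lt;p&gt;Concretely, the trust relationship is an IAM trust policy on the role that federates to the cluster's OIDC provider. A hedged sketch of its usual shape (OIDC_PROVIDER_ARN, OIDC_ISSUER, NAMESPACE, and SA_NAME are placeholders, not values kOps emits verbatim):&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Federated": "OIDC_PROVIDER_ARN" },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "OIDC_ISSUER:sub": "system:serviceaccount:NAMESPACE:SA_NAME"
        }
      }
    }
  ]
}
```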
&lt;h2&gt;
  
  
  Challenge of configuring workloads
&lt;/h2&gt;

&lt;p&gt;While kOps elegantly handles what happens on the AWS side, we had not implemented anything that configures Pods to actually make&lt;br&gt;
use of the IAM Role. Indeed, some of the more frequently asked support questions&lt;br&gt;
in the kOps Slack channels have been around how to configure applications to assume roles. &lt;/p&gt;

&lt;p&gt;The &lt;a href="https://kops.sigs.k8s.io/cluster_spec/#service-account-issuer-discovery-and-aws-iam-roles-for-service-accounts-irsa"&gt;kOps documentation&lt;/a&gt;&lt;br&gt;
recommended directly adding the volumes and environment variables to the Pod spec,&lt;br&gt;
but it is not obvious exactly what needs to be added, and you have to manually fetch the actual role ARN that kOps creates from the AWS API or console.&lt;/p&gt;
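&lt;p&gt;For reference, the manual approach amounts to adding something like the following to the Pod spec. This is a sketch of the conventional IRSA projected-token setup, not kOps output; ROLE_ARN is the placeholder you would have to fetch yourself:&lt;/p&gt;

```yaml
# Pod spec fragment (sketch); ROLE_ARN is a placeholder.
env:
- name: AWS_ROLE_ARN
  value: ROLE_ARN
- name: AWS_WEB_IDENTITY_TOKEN_FILE
  value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
volumeMounts:
- name: aws-iam-token
  mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
  readOnly: true
volumes:
- name: aws-iam-token
  projected:
    sources:
    - serviceAccountToken:
        audience: amazonaws.com
        expirationSeconds: 86400
        path: token
```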
&lt;h2&gt;
  
  
  The pod identity webhook
&lt;/h2&gt;

&lt;p&gt;On EKS, the &lt;a href="https://github.com/aws/amazon-eks-pod-identity-webhook"&gt;pod identity webhook&lt;/a&gt; is commonly used as the mechanism for adding the necessary parts of the Pod spec.&lt;br&gt;
This webhook looks for ServiceAccounts with a specific set of annotations telling it what ARN it can assume and various other settings. When a Pod is created that uses one of&lt;br&gt;
these ServiceAccounts, the webhook mutates the Pod using information found in the ServiceAccount annotations.&lt;/p&gt;
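&lt;p&gt;The annotation-based flow looks roughly like this (a minimal sketch; ROLE_ARN and the names are placeholders):&lt;/p&gt;

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: default
  annotations:
    # Tells the webhook which IAM Role Pods using this ServiceAccount assume.
    eks.amazonaws.com/role-arn: ROLE_ARN
```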

&lt;p&gt;Configuring these annotations is a lot simpler than directly configuring the Pod spec. &lt;br&gt;
Typically, &lt;a href="https://eksctl.io/usage/iamserviceaccounts/"&gt;EKS-specific tooling "owns" the ServiceAccount&lt;/a&gt;, which makes linking the role/ServiceAccount pair simpler, but also means that&lt;br&gt;
ServiceAccounts cannot be managed together with the application using them.&lt;/p&gt;

&lt;p&gt;For various reasons, installing the webhook on kOps was not that straightforward. For example, one could not tell the webhook to use mounted TLS secrets; it could only use the &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/"&gt;CSR API&lt;/a&gt;.&lt;br&gt;
And even when the webhook was installed, you had to manually annotate ServiceAccounts with the role ARN that the Pods should try to assume.&lt;br&gt;
kOps could have "owned" the ServiceAccounts configured in the Cluster spec as well, but I feel the ownership of ServiceAccounts should be with the application and not the cluster.&lt;/p&gt;
&lt;h2&gt;
  
  
  Webhook the kOps way
&lt;/h2&gt;

&lt;p&gt;As mentioned towards the end of &lt;a href="https://dev.to/olemarkus/irsa-support-for-kops-1doe"&gt;my previous article&lt;/a&gt;,&lt;br&gt;
because kOps already knows the mapping between ServiceAccounts and IAM roles, there shouldn't be any need for&lt;br&gt;
users to copy the ARN from AWS into the ServiceAccount annotation. &lt;em&gt;Something&lt;/em&gt; should be able to just read the mapping in the Cluster spec&lt;br&gt;
and configure workloads accordingly. &lt;/p&gt;

&lt;p&gt;I wrote that this could be a webhook similar to the pod identity webhook. But why not just implement this as a feature &lt;em&gt;in&lt;/em&gt; the pod identity webhook?&lt;br&gt;
The EKS team was very open to the idea, and a &lt;a href="https://github.com/aws/amazon-eks-pod-identity-webhook/pull/142"&gt;PR later&lt;/a&gt;, the &lt;a href="https://github.com/aws/amazon-eks-pod-identity-webhook#pod-identity-webhook-configmap"&gt;webhook can be configured&lt;/a&gt; to look for additional Pods to mutate.&lt;/p&gt;

&lt;p&gt;After this PR, the webhook will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First look for annotations on the ServiceAccount as before.&lt;/li&gt;
&lt;li&gt;If no annotations are found on the ServiceAccount, the webhook will look for a mapping configured in the pod-identity-webhook ConfigMap.&lt;/li&gt;
&lt;/ul&gt;
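&lt;p&gt;A sketch of what such a ConfigMap entry can look like, mirroring the format kOps writes (ROLE_ARN and the names are illustrative):&lt;/p&gt;

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-identity-webhook
  namespace: kube-system
data:
  # Keys are the namespace/name of the ServiceAccounts to act on.
  config: |-
    {"default/pod-identity-webhook-test":{"RoleARN":"ROLE_ARN","Audience":"amazonaws.com","UseRegionalSTS":true,"TokenExpiration":0}}
```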
&lt;h2&gt;
  
  
  Using the pod identity webhook addon
&lt;/h2&gt;

&lt;p&gt;As of kOps 1.23, kOps supports the webhook as &lt;a href="https://kops.sigs.k8s.io/addons/#pod-identity-webhook"&gt;a managed addon&lt;/a&gt;. When installed, kOps will populate the webhook ConfigMap based on the &lt;code&gt;spec.iam.serviceAccountExternalPermissions&lt;/code&gt; struct.&lt;/p&gt;
&lt;h3&gt;
  
  
  Installing
&lt;/h3&gt;

&lt;p&gt;Before continuing, make sure you already have a kOps 1.23 cluster with an AWS OIDC provider enabled.&lt;br&gt;
See &lt;a href="https://dev.to/olemarkus/irsa-support-for-kops-1doe"&gt;my previous article&lt;/a&gt; on how to go about that.&lt;/p&gt;

&lt;p&gt;Once your cluster is running 1.23, you can enable the webhook by adding the following to your cluster spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;certManager&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;podIdentityWebhook&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cert-manager addon is required to establish the trust between the webhook and the API server.&lt;/p&gt;

&lt;p&gt;Now run &lt;code&gt;kops update cluster --yes&lt;/code&gt; and wait a minute or so for the control plane to deploy the addon(s).&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding a ServiceAccount mapping
&lt;/h3&gt;

&lt;p&gt;Start by granting a set of AWS privileges to a ServiceAccount:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;iam&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;serviceAccountExternalPermissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;aws&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;policyARNs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pod-identity-webhook-test&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;kops update cluster&lt;/code&gt;, you will see something like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  IAMRole/pod-identity-webhook-test.default.sa.&amp;lt;cluster&amp;gt;
        Tags                    {Name: pod-identity-webhook-test.default.sa.&amp;lt;cluster&amp;gt;, KubernetesCluster: &amp;lt;cluster&amp;gt;, kubernetes.io/cluster/&amp;lt;cluster&amp;gt;: owned}
        ExportWithID            default-pod-identity-webhook-test

  IAMRolePolicy/external-pod-identity-webhook-test.default.sa.test.&amp;lt;cluster&amp;gt;
        Role                    name:pod-identity-webhook-test.default.sa.test.&amp;lt;cluster&amp;gt;
        ExternalPolicies        [arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess]
        Managed                 true
...
  +   config: '{"default/pod-identity-webhook-test":{"RoleARN":"arn:aws:iam::&amp;lt;account&amp;gt;:role/pod-identity-webhook-test.default.sa.&amp;lt;cluster&amp;gt;","Audience":"amazonaws.com","UseRegionalSTS":true,"TokenExpiration":0}}'
  -   config: '{}'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;kOps wants to create an IAM role for the ServiceAccount and assign it the &lt;code&gt;AmazonEC2ReadOnlyAccess&lt;/code&gt; policy.&lt;/p&gt;

&lt;p&gt;You can also see that it populates the mapping information into the pod-identity-webhook ConfigMap.&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;kops update cluster --yes&lt;/code&gt; to apply the changes. Then run &lt;code&gt;kubectl logs -n kube-system -l app=pod-identity-webhook -f&lt;/code&gt; and observe the webhook picking up the mapping.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I0319 07:10:28.312786       1 cache.go:186] Adding SA default/pod-identity-webhook-test to CM cache: &amp;amp;{RoleARN:arn:aws:iam::&amp;lt;account&amp;gt;:role/pod-identity-webhook-test.default.sa.&amp;lt;cluster&amp;gt; Audience:amazonaws.com UseRegionalSTS:true TokenExpiration:86400}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deploying the workload
&lt;/h3&gt;

&lt;p&gt;Once the mapping is in place, we can deploy the ServiceAccount and a Pod using that ServiceAccount. It's important to remember that the webhook will only mutate Pods on creation, so it &lt;em&gt;must&lt;/em&gt; be aware of the mapping before the Pod is created.&lt;/p&gt;

&lt;p&gt;Deploy the following to the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pod-identity-webhook-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pod-identity-webhook-test&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-cli&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;amazon/aws-cli:latest&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sleep&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;300"&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pod-identity-webhook-test"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should now see the following in the webhook logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I0319 07:39:33.373273       1 cache.go:80] Fetching sa default/pod-identity-webhook-test from cache
I0319 07:39:33.373346       1 handler.go:423] Pod was mutated. Pod=pod-identity-webhook-test, ServiceAccount=pod-identity-webhook-test, Namespace=default
I0319 07:39:33.373522       1 middleware.go:132] path=/mutate method=POST status=200 user_agent=kube-apiserver-admission body_bytes=1441
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And running &lt;code&gt;kubectl get pod pod-identity-webhook-test -o yaml&lt;/code&gt; you should see that the Pod has been mutated and now contains the expected volumes and environment variables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing that it works
&lt;/h3&gt;

&lt;p&gt;To confirm everything is good, you can run the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; default pod-identity-webhook-test &lt;span class="nt"&gt;--&lt;/span&gt; aws sts get-caller-identity
&lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"UserId"&lt;/span&gt;: &lt;span class="s2"&gt;"AROAV6PNU2XQTMAZ64FBK:botocore-session-1647675906"&lt;/span&gt;,
    &lt;span class="s2"&gt;"Account"&lt;/span&gt;: &lt;span class="s2"&gt;"&amp;lt;account&amp;gt;"&lt;/span&gt;,
    &lt;span class="s2"&gt;"Arn"&lt;/span&gt;: &lt;span class="s2"&gt;"arn:aws:sts::409057154529:assumed-role/pod-identity-webhook-test.default.sa.&amp;lt;cluster&amp;gt;/botocore-session-1647675906"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also check that the Pod is allowed to use the granted privileges by running something like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; default pod-identity-webhook-test &lt;span class="nt"&gt;--&lt;/span&gt; aws ec2 describe-instances &lt;span class="nt"&gt;--region&lt;/span&gt; eu-central-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Hopefully this makes using IRSA on kOps-based clusters much simpler, and I hope this post explains how things work under the hood.&lt;/p&gt;

&lt;p&gt;As always, I appreciate feedback on this feature and on whether it is useful for you.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>aws</category>
      <category>kops</category>
      <category>devops</category>
    </item>
    <item>
      <title>Kubernetes with IPv6 on AWS</title>
      <dc:creator>Ole Markus With</dc:creator>
      <pubDate>Wed, 13 Oct 2021 18:27:12 +0000</pubDate>
      <link>https://dev.to/olemarkus/kubernetes-with-ipv6-on-aws-290d</link>
      <guid>https://dev.to/olemarkus/kubernetes-with-ipv6-on-aws-290d</guid>
      <description>&lt;p&gt;The Kubernetes ecosystem has been working hard on supporting IPv6 the last few years, and kOps is no different.&lt;br&gt;
There are two ways we have been exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Running with a private subnet with Pods IPs behind NAT.&lt;/li&gt;
&lt;li&gt;Running with a public subnet with fully routable Pod IPs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both of these modes sort of work on AWS, but not without caveats.&lt;/p&gt;
&lt;h1&gt;
  
  
  Configuring the cluster
&lt;/h1&gt;

&lt;p&gt;Regardless of what mode is used, the VPC needs IPv6 enabled, and each instance needs an allocated IPv6 address that is added to its respective Node object. This is all handled by kOps and the Cloud Controller Manager.&lt;/p&gt;
&lt;h2&gt;
  
  
  Private IPs
&lt;/h2&gt;

&lt;p&gt;A cluster with private IPv6 addresses is relatively simple to set up. As with IPv4, the cluster is configured with one flat IPv6 CIDR, and the CNI takes care of configuring routes and tunnelling between the instances, masquerading traffic destined for external IPs, and so on.&lt;/p&gt;

&lt;p&gt;You can configure the Cluster spec directly to use IPv6, but kOps also provides the &lt;code&gt;--ipv6&lt;/code&gt; flag to simplify the configuration.&lt;/p&gt;
&lt;h2&gt;
  
  
  Public IPs
&lt;/h2&gt;

&lt;p&gt;Running with private IPv6 addresses is nice for testing how well K8s and K8s components work with IPv6, but the true advantages come when the IPs are publicly routable. Doing away with NAT, tunnelling, and overlay networking is in itself a performance boost, but you can also do things such as having cloud load balancers directly target Pods instead of going through NodePorts and bouncing off kube-proxy.&lt;/p&gt;

&lt;p&gt;kOps supports public IPs on AWS by assigning an IPv6 prefix to each Node's primary interface and using this prefix as the Node's Pod CIDR.&lt;/p&gt;

&lt;p&gt;This means any CNI that supports Kubernetes IPAM (and most do) can support publicly routable IPv6 addresses.&lt;/p&gt;

&lt;p&gt;In order to run in this mode, just add &lt;code&gt;spec.podCIDRFromCloud: true&lt;/code&gt; to the Cluster spec.&lt;br&gt;
&lt;/p&gt;
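&lt;p&gt;A minimal sketch of the relevant part of the Cluster spec:&lt;/p&gt;

```yaml
spec:
  # Assign each Node an IPv6 prefix from the cloud and use it as the Pod CIDR.
  podCIDRFromCloud: true
```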

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kgp -o wide
NAME                                                                  READY   STATUS    RESTARTS   AGE   IP                                       NODE                                          NOMINATED NODE   READINESS GATES
aws-cloud-controller-manager-rm9bf                                    1/1     Running   0          16h   172.20.52.202                            ip-172-20-52-202.eu-west-1.compute.internal   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
cert-manager-58c7f89d46-5ttmx                                         1/1     Running   0          16h   2a05:d018:4ea:8101:ba62::f4c8            ip-172-20-52-202.eu-west-1.compute.internal   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
cert-manager-cainjector-5998558479-lvvsr                              1/1     Running   0          16h   2a05:d018:4ea:8101:ba62::6d33            ip-172-20-52-202.eu-west-1.compute.internal   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
cert-manager-webhook-756bb49f7d-f4pfh                                 1/1     Running   0          16h   2a05:d018:4ea:8101:ba62::2cdc            ip-172-20-52-202.eu-west-1.compute.internal   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
cilium-7mjbl                                                          1/1     Running   0          16h   2a05:d018:4ea:8103:6f5a:dc57:f7b7:b73a   ip-172-20-97-249.eu-west-1.compute.internal   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
cilium-operator-677b9469b7-8pndm                                      1/1     Running   0          16h   172.20.52.202                            ip-172-20-52-202.eu-west-1.compute.internal   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
cilium-psxfs                                                          1/1     Running   0          16h   2a05:d018:4ea:8101:2cc1:f30c:f885:6e6f   ip-172-20-54-232.eu-west-1.compute.internal   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
cilium-wq6xg                                                          1/1     Running   0          16h   2a05:d018:4ea:8102:ccc:bcce:24de:4840    ip-172-20-81-228.eu-west-1.compute.internal   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Yes, some Pods with &lt;code&gt;hostNetwork: true&lt;/code&gt; have IPv4 addresses here. Pods receive the IP the Node had at the time they were created, which for the control plane was IPv4, as those Nodes came up before the Cloud Controller Manager assigned them IPv6 addresses.)&lt;/p&gt;

&lt;h1&gt;
  
  
  Can I use this in production?
&lt;/h1&gt;

&lt;p&gt;So the big question is: how mature is running IPv6 clusters on AWS?&lt;/p&gt;

&lt;p&gt;Not very. Yet.&lt;/p&gt;

&lt;p&gt;Taking the simpler private IP mode first, we found various issues with how components decide which IP to use. For example, metrics-server will pick the first IP on the Node object regardless of what the Pod IP is, so the ordering of Node IPs matters. CNIs also show behavior suggesting IPv6 is not that well tested yet; for example, &lt;a href="https://github.com/cilium/cilium/issues/11263"&gt;Cilium has struggled with routing issues in this 18-month-old issue&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For public IPs, there are some additional problems. On most Linux distros, the &lt;code&gt;accept_ra=2&lt;/code&gt; sysctl must be set on the correct interfaces, and since the interface name depends on the distro and instance type, this is a bit tricky. On Ubuntu, this is not needed because systemd has taken over a lot of the kernel responsibilities in this area. systemd is not without bugs though: when IPv6 single-address DHCPv6 is mixed with prefix delegation, &lt;a href="https://github.com/systemd/systemd/issues/20803"&gt;DHCPv6 breaks&lt;/a&gt;. Hopefully the fix will make it into Ubuntu soon. Cilium works around this issue, but with all other CNIs, Nodes lose connectivity about 5 minutes after kOps configuration has finished.&lt;/p&gt;

&lt;p&gt;Then there are various important apps that do not understand IPv6 well. Many will try to talk to the IPv4 metadata API, for example. If you are lucky, the application uses a new enough version of the AWS SDK that you can set &lt;code&gt;AWS_EC2_METADATA_SERVICE_ENDPOINT_MODE=IPv6&lt;/code&gt;.&lt;/p&gt;
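&lt;p&gt;Setting that variable on a container is straightforward (a container-spec fragment, assuming a recent enough SDK):&lt;/p&gt;

```yaml
# Container fragment: point the AWS SDK at the IPv6 metadata endpoint.
env:
- name: AWS_EC2_METADATA_SERVICE_ENDPOINT_MODE
  value: IPv6
```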

&lt;p&gt;One of the benefits I mentioned above was using Pods as targets for load balancers, a feature that the &lt;a href="https://kops.sigs.k8s.io/addons/#aws-load-balancer-controller"&gt;AWS Load Balancer Controller&lt;/a&gt; supports. But alas! AWS has two endpoints for the EC2 API: a single-stack IPv4 endpoint at &lt;code&gt;ec2.&amp;lt;region&amp;gt;.amazonaws.com&lt;/code&gt; and a dual-stack one at &lt;code&gt;api.ec2.eu-west-1.aws&lt;/code&gt;. The SDK will use the former unless configured in code to use something else, which is not currently possible. There is a &lt;a href="https://github.com/kubernetes-sigs/aws-load-balancer-controller/pull/2179"&gt;pull request&lt;/a&gt; for this, but that only brings you to the next component. And if you want to use &lt;a href="https://kops.sigs.k8s.io/addons/#cluster-autoscaler"&gt;Cluster Autoscaler&lt;/a&gt;, you are also out of luck, because AWS doesn't provide a dual-stack endpoint for the autoscaling API at all.&lt;/p&gt;

&lt;p&gt;Even if IPv6 worked perfectly at the cluster level and AWS provided dual-stack endpoints for all of their APIs, you would probably still need to talk to other resources that only provide IPv4 addresses. To reach those, AWS would have to provide DNS64/NAT64, which allows resources with single-stack IPv6 addresses to talk to resources with single-stack IPv4 addresses.&lt;/p&gt;

&lt;p&gt;Hopefully support for this will be available soon.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ipv6</category>
      <category>kubernetes</category>
      <category>kops</category>
    </item>
    <item>
      <title>Using IAM Roles for ServiceAccounts on kOps</title>
      <dc:creator>Ole Markus With</dc:creator>
      <pubDate>Wed, 19 May 2021 06:00:37 +0000</pubDate>
      <link>https://dev.to/olemarkus/irsa-support-for-kops-1doe</link>
      <guid>https://dev.to/olemarkus/irsa-support-for-kops-1doe</guid>
      <description>&lt;p&gt;&lt;em&gt;This feature has now been implemented and available for some time. See &lt;a href="https://kops.sigs.k8s.io/cluster_spec/#service-account-issuer-discovery-and-aws-iam-roles-for-service-accounts-irsa"&gt;the official docs&lt;/a&gt;. Note that the feature flag mentioned below has been replaced with: &lt;code&gt;spec.iam.    useServiceAccountExternalPermissions: true&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Until recently, the only way for a Pod to use the AWS API was to either provision static credentials or assign additional IAM Policies to the Nodes the Pods were running on. kOps addons rely on the latter, which has several issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All other Pods running on the same Node would have the same permissions.&lt;/li&gt;
&lt;li&gt;EC2 Instances cannot enforce IMDSv2 with &lt;code&gt;http-put-response-hop-limit: 1&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;kOps mitigates these concerns by letting addons run on the Control Plane (CP) Nodes. Unfortunately, out of the box, kOps only protects the CP Nodes with &lt;em&gt;Taints&lt;/em&gt;, and any cluster user can add &lt;em&gt;Tolerations&lt;/em&gt; to Pods and schedule them on the CP Nodes.&lt;/p&gt;

&lt;p&gt;The solution to this is to create dedicated IAM Roles for each of the addon Pods, and reduce the privileges given to the IAM Roles assigned to the EC2 instances.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kops.sigs.k8s.io/releases/1.21-notes/"&gt;kOps 1.21&lt;/a&gt; introduces a set of features that in sum enables &lt;a href="https://kops.sigs.k8s.io/cluster_spec/#service-account-issuer-discovery-and-aws-iam-roles-for-service-accounts-irsa"&gt;IAM Roles for ServiceAccounts&lt;/a&gt; (IRSA).&lt;/p&gt;

&lt;p&gt;Let us have a look at how to enable support for IRSA.&lt;/p&gt;

&lt;h2&gt;
  
  
  ServiceAccount Issuer Discovery
&lt;/h2&gt;

&lt;p&gt;The first feature needed to support IRSA is what Kubernetes refers to as &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-issuer-discovery"&gt;Service Account Issuer Discovery&lt;/a&gt;. Essentially it means publishing the OIDC issuer discovery metadata, which contains things like the public keys used to sign ServiceAccount tokens. By default, the Kubernetes API Server publishes this metadata itself, but this doesn't work out of the box on kOps clusters. AWS also requires the documents to be published in a globally readable location. It is technically possible to expose the API Server on a public IP and allow anonymous access to the OIDC Discovery metadata, but many would be uncomfortable doing so. When this feature is configured, kOps will instead publish these documents to a &lt;em&gt;VFS path&lt;/em&gt;.&lt;/p&gt;
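&lt;p&gt;For context, the discovery metadata is a small JSON document served at &lt;code&gt;/.well-known/openid-configuration&lt;/code&gt; under the issuer URL. Roughly (a sketch with ISSUER_URL as a placeholder; the exact &lt;code&gt;jwks_uri&lt;/code&gt; path depends on where the keys document is published):&lt;/p&gt;

```json
{
  "issuer": "ISSUER_URL",
  "jwks_uri": "ISSUER_URL/openid/v1/jwks",
  "response_types_supported": ["id_token"],
  "subject_types_supported": ["public"],
  "id_token_signing_alg_values_supported": ["RS256"]
}
```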

&lt;p&gt;&lt;em&gt;VFS path&lt;/em&gt; is a &lt;em&gt;Virtual File System&lt;/em&gt; path that kOps also uses for storing configurations, secrets, and keys, e.g. the path pointing to the kOps state store is a &lt;em&gt;VFS path&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Right now, only S3 is supported, as we need to implement support for converting a VFS path to the corresponding HTTPS endpoint, e.g. from &lt;code&gt;s3://&amp;lt;bucket&amp;gt;/&amp;lt;path&amp;gt;&lt;/code&gt; to &lt;code&gt;https://&amp;lt;bucket&amp;gt;.s3.&amp;lt;region&amp;gt;.amazonaws.com/&amp;lt;path&amp;gt;&lt;/code&gt;.&lt;/p&gt;
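&lt;p&gt;The conversion itself is mechanical. A minimal Python sketch (not kOps code, which is written in Go) of the mapping described above:&lt;/p&gt;

```python
def s3_vfs_to_https(vfs_path: str, region: str) -> str:
    """Map an s3://bucket/path VFS path to its regional HTTPS endpoint."""
    if not vfs_path.startswith("s3://"):
        raise ValueError("not an S3 VFS path: " + vfs_path)
    # Split "bucket/key" on the first slash.
    bucket, _, key = vfs_path[len("s3://"):].partition("/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

# s3_vfs_to_https("s3://my-bucket/discovery/cluster", "eu-west-1")
# -> "https://my-bucket.s3.eu-west-1.amazonaws.com/discovery/cluster"
```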

&lt;p&gt;In order to enable this feature, you only need to add the following to the cluster spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountIssuerDiscovery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;discoveryStore&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3://&amp;lt;my bucket&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to use this with AWS, take care that there is no policy preventing public access to the objects stored therein.&lt;/p&gt;

&lt;p&gt;Once you have OIDC discovery metadata published, you can configure any OIDC consumer that supports OIDC issuer discovery to establish trust with your service accounts. This is not limited to AWS, but can be used if you want your ServiceAccounts to authenticate natively to &lt;a href="https://www.vaultproject.io/docs/auth/jwt"&gt;Hashicorp Vault&lt;/a&gt; or any other OIDC consumer that supports OIDC issuer discovery.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS OIDC Provider
&lt;/h2&gt;

&lt;p&gt;The purpose of this feature is to make AWS trust the Kubernetes ServiceAccounts so that the ServiceAccounts can assume AWS IAM Roles. kOps will do this for you if you add the following to the spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountIssuerDiscovery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enableAWSOIDCProvider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Using IAM Roles for ServiceAccounts belonging to kOps addons
&lt;/h1&gt;

&lt;p&gt;All addons that require access to the AWS API currently run on the Control Plane (CP) Nodes and assume the instance role in order to access AWS services. This is problematic because any other Pod running on CP Nodes can assume the instance role as well. And we cannot use &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html"&gt;IMDSv2&lt;/a&gt; with &lt;code&gt;http-put-response-hop-limit: 1&lt;/code&gt; as that would block addons, too.&lt;/p&gt;

&lt;p&gt;With the features above in place, each addon will be ported to use IRSA instead. Each addon gets a dedicated role it can assume that has exactly the privileges it needs, and kOps will automatically configure the Pods to use IRSA as well. Enabling IRSA for kOps addons is thus entirely transparent. The corresponding privileges are also removed from the CP Nodes.&lt;/p&gt;

&lt;p&gt;At the moment, using IRSA for kOps addons requires the &lt;code&gt;UseServiceAccountIAM&lt;/code&gt; feature flag to be enabled, as we feel we have not tested the functionality enough. We are also missing the ability to override/augment the IAM Policy that the ServiceAccount uses, which can be necessary, e.g. if you want to use &lt;a href="https://kops.sigs.k8s.io/addons/#cert-manager"&gt;cert-manager&lt;/a&gt; DNS validation for your own domains.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating IAM Roles for your own workloads
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Provision the IAM Roles
&lt;/h3&gt;

&lt;p&gt;kOps can provision IAM Roles for your &lt;em&gt;workloads&lt;/em&gt; (Deployments, StatefulSets, Jobs, etc.), including the trust relationship that allows the workload's ServiceAccount to assume the IAM Role, and grant the role the privileges you want.&lt;/p&gt;

&lt;p&gt;You can attach existing policies to the role, or you can define the policy inline like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;iam&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;serviceAccountExternalPermissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;someServiceAccount&lt;/span&gt;
        &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;someNamespace&lt;/span&gt;
        &lt;span class="na"&gt;aws&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;policyARNs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::000000000000:policy/somePolicy&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;anotherServiceAccount&lt;/span&gt;
        &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;anotherNamespace&lt;/span&gt;
        &lt;span class="na"&gt;aws&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;inlinePolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|-&lt;/span&gt;
            &lt;span class="s"&gt;[&lt;/span&gt;
              &lt;span class="s"&gt;{&lt;/span&gt;
                &lt;span class="s"&gt;"Effect": "Allow",&lt;/span&gt;
                &lt;span class="s"&gt;"Action": "s3:ListAllMyBuckets",&lt;/span&gt;
                &lt;span class="s"&gt;"Resource": "*"&lt;/span&gt;
              &lt;span class="s"&gt;}&lt;/span&gt;
            &lt;span class="s"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuring Pods to use IRSA
&lt;/h3&gt;

&lt;p&gt;One thing to bear in mind is that kOps will not "own" ServiceAccounts the way EKS does when using IRSA. So you have to modify your workloads yourself as appropriate.&lt;/p&gt;

&lt;p&gt;Typically, you will use environment variables to configure the AWS SDK to use IRSA. The following shows the changes you have to make to the Pod spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_DEFAULT_REGION&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;region&amp;gt;&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_REGION&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;region&amp;gt;&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_ROLE_ARN&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:iam::&amp;lt;account&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;number&amp;gt;:role/&amp;lt;role&amp;gt;"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_WEB_IDENTITY_TOKEN_FILE&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/var/run/secrets/amazonaws.com/serviceaccount/token"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_STS_REGIONAL_ENDPOINTS&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;regional"&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/var/run/secrets/amazonaws.com/serviceaccount/"&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-token&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-token&lt;/span&gt;
    &lt;span class="na"&gt;projected&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;serviceAccountToken&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amazonaws.com"&lt;/span&gt;
          &lt;span class="na"&gt;expirationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;86400&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;token&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you prefer, you could create ServiceAccounts with these details and use the &lt;a href="https://github.com/aws/amazon-eks-pod-identity-webhook"&gt;EKS identity webhook&lt;/a&gt;, but I don't see kOps supporting that webhook as a native addon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Zero-configuration IRSA
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;This feature is now available. Read more in &lt;a href="https://dev.to/olemarkus/zero-configuration-irsa-on-kops-1po1"&gt;this post&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You don't have to do anything at all for kOps addons to use IRSA. I would really like this to be the case for your own workloads as well.&lt;/p&gt;

&lt;p&gt;Since you define the relationship between AWS IAM and ServiceAccount in the Cluster spec, and the changes you have to make to your Pod spec just mirror that relationship, &lt;em&gt;something&lt;/em&gt; could automatically read the Cluster spec and configure workloads for you.&lt;/p&gt;

&lt;p&gt;This would have to be an addon that either provides a webhook similar to the EKS identity webhook, or acts as a controller that watches all workloads in the cluster. It is debatable whether such an addon should be part of the kOps project or standalone. &lt;/p&gt;

&lt;p&gt;I would really love to hear how &lt;em&gt;you&lt;/em&gt; would want this to behave. If you have any ideas, comment here or reach out in #kops-users on the &lt;a href="https://slack.k8s.io/"&gt;Kubernetes Slack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>kops</category>
      <category>aws</category>
    </item>
    <item>
      <title>Blazing fast Kubernetes scaling with ASG warm pools</title>
      <dc:creator>Ole Markus With</dc:creator>
      <pubDate>Mon, 19 Apr 2021 15:46:56 +0000</pubDate>
      <link>https://dev.to/olemarkus/blazing-fast-kubernetes-scaling-with-asg-warm-pools-53bd</link>
      <guid>https://dev.to/olemarkus/blazing-fast-kubernetes-scaling-with-asg-warm-pools-53bd</guid>
      <description>&lt;p&gt;Last week, AWS launched &lt;a href="https://aws.amazon.com/blogs/compute/scaling-your-applications-faster-with-ec2-auto-scaling-warm-pools/" rel="noopener noreferrer"&gt;warm pools for auto scaling groups&lt;/a&gt; (ASG). In short, this feature allows you to create a pool of pre-initialised EC2 instances. When the ASG needs to scale out, it will pull in Nodes from the warm pool if available. Since these are already pre-initialised, the scale-out time is reduced significantly.&lt;/p&gt;

&lt;p&gt;The warm pool is also virtually for free. You pay for the warm pool instances as you would for any other stopped instance. You also have to pay for the time the instances spend on initialising when entering the warm pool, but this is more or less cancelled out by the reduced time spent on initialising when entering the ASG itself.&lt;/p&gt;

&lt;p&gt;What does &lt;em&gt;pre-initialised&lt;/em&gt; mean, you wonder? It took me a while to understand, too. What happens is that the EC2 instance boots, runs for a while, and then shuts down. We'll get back to this in a bit.&lt;/p&gt;

&lt;p&gt;Now, what intrigued me is this bit from the &lt;a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-warm-pools.html" rel="noopener noreferrer"&gt;warm pool documentation&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4o1fme1c8aa5tbvlrjh3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4o1fme1c8aa5tbvlrjh3.png" alt="alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a very interesting claim. Why on earth would this feature not be possible for Kubernetes, regardless of what shape and form self-managed Kubernetes comes in?&lt;/p&gt;

&lt;p&gt;AWS may have made interesting choices with their &lt;em&gt;Elastic Kubernetes Service&lt;/em&gt; precluding them from taking advantage of warm pools, but there are alternatives!&lt;/p&gt;

&lt;p&gt;Over the last year or so I have regularly contributed to &lt;a href="https://kops.sigs.k8s.io/" rel="noopener noreferrer"&gt;kOps&lt;/a&gt;, which is my preferred way of deploying and maintaining &lt;em&gt;production-ready&lt;/em&gt; clusters on AWS. I know well how it boots a plain Ubuntu &lt;em&gt;instance&lt;/em&gt; and configures it to become a Kubernetes &lt;em&gt;node&lt;/em&gt;, and I could not imagine that implementing warm pool support would be much of a challenge. And it turns out it was not.&lt;/p&gt;

&lt;h1&gt;
  
  
  The results
&lt;/h1&gt;

&lt;p&gt;This post will describe in detail some of the inner workings of kOps, how ASG behaves, and how to observe various time spans between when a scale-out is triggered and the node is ready for Kubernetes workloads.&lt;/p&gt;

&lt;p&gt;If you are here only for the results, here is the TL;DR:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The time between a scale-out being triggered and Pods starting improved by &lt;em&gt;at least 50%&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;It should be possible to improve this even further.&lt;/li&gt;
&lt;li&gt;Most, if not all, of the functionality below &lt;em&gt;will be available in kOps 1.21&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;This table shows the number of seconds between CAS being triggered and Pods starting.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;First Pod started&lt;/th&gt;
&lt;th&gt;Last Pod started&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No warm pool&lt;/td&gt;
&lt;td&gt;149&lt;/td&gt;
&lt;td&gt;190&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Warm pool&lt;/td&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;td&gt;149&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Warm pool + lifecycle hook&lt;/td&gt;
&lt;td&gt;76&lt;/td&gt;
&lt;td&gt;98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Warm pool + lifecycle hook + pre-pulled images&lt;/td&gt;
&lt;td&gt;70&lt;/td&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h1&gt;
  
  
  And now for the details!
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Initialising a Kubernetes node
&lt;/h2&gt;

&lt;p&gt;First, let us have a look at what the process of converting a brand new EC2 &lt;em&gt;instance&lt;/em&gt; to a Kubernetes &lt;em&gt;Node&lt;/em&gt; means.&lt;br&gt;
On a new EC2 instance, this happens on first boot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cloud-init installs a configuration service called &lt;code&gt;nodeup&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;nodeup&lt;/code&gt; takes the cluster configuration and installs &lt;code&gt;containerd&lt;/code&gt;, &lt;code&gt;kubelet&lt;/code&gt;, and the necessary distro packages with their correct configurations.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;nodeup&lt;/code&gt; establishes trust with the API server (control plane).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;nodeup&lt;/code&gt; creates and installs a &lt;code&gt;systemd&lt;/code&gt; service for &lt;code&gt;kubelet&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;nodeup&lt;/code&gt; starts the &lt;code&gt;kubelet&lt;/code&gt; service, which is the process on each node that manages the Kubernetes workloads.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kubelet&lt;/code&gt; pulls down all the images the control plane tells it to run and starts them as defined by Pod specs and similar.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a kOps-provisioned instance reboots, &lt;code&gt;nodeup&lt;/code&gt; runs through all of the above again to ensure the instance is in the expected state. &lt;code&gt;nodeup&lt;/code&gt; is smart enough not to redo already performed tasks though, so the second run is quite fast.&lt;/p&gt;
&lt;h2&gt;
  
  
  Doing nothing at all
&lt;/h2&gt;

&lt;p&gt;The most naïve way of implementing support for warm pools is to do nothing more than create the warm pool. Unfortunately, this would start &lt;code&gt;kubelet&lt;/code&gt;, which will register the Node with the cluster. Since the &lt;a href="https://github.com/kubernetes/cloud-provider-aws" rel="noopener noreferrer"&gt;AWS cloud provider&lt;/a&gt; does not remove instances in the &lt;code&gt;stopped&lt;/code&gt; state, the control plane marks the Node &lt;code&gt;NotReady&lt;/code&gt;, but keeps it around in case it comes back up.&lt;/p&gt;

&lt;p&gt;It is not a catastrophe to have a large number of &lt;code&gt;NotReady&lt;/code&gt; Nodes in the cluster, but any sane monitoring would not be too happy, and &lt;em&gt;detecting actual bad Nodes would be harder&lt;/em&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Making &lt;code&gt;nodeup&lt;/code&gt; aware of the warm pool
&lt;/h2&gt;

&lt;p&gt;The only thing I had to do to support warm pools gracefully was to make &lt;code&gt;nodeup&lt;/code&gt; conscious of its &lt;em&gt;ASG lifecycle state&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Instances entering the warm pool have a &lt;a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/warm-pool-instance-lifecycle.html" rel="noopener noreferrer"&gt;slightly different lifecycle&lt;/a&gt; than instances that go directly into the ASG. What &lt;code&gt;nodeup&lt;/code&gt; needs to do is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;check if the current instance ASG lifecycle state has a &lt;code&gt;warming:&lt;/code&gt; prefix.&lt;/li&gt;
&lt;li&gt;if it does &lt;em&gt;not&lt;/em&gt;, install and start the &lt;code&gt;kubelet&lt;/code&gt; service.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This way &lt;code&gt;kubelet&lt;/code&gt; does not start and join the cluster on first boot, but since we enabled the service, &lt;code&gt;systemd&lt;/code&gt; will start it on the second boot.&lt;/p&gt;
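&lt;p&gt;Sketched in shell (illustrative only; &lt;code&gt;nodeup&lt;/code&gt; itself implements this in Go, and the metadata path shown is an assumption about where the lifecycle state can be read from inside the instance):&lt;br&gt;&lt;/p&gt;

```shell
# Illustrative sketch of the check described above; names are hypothetical.
# The ASG lifecycle state is assumed to be readable from instance metadata.
lifecycle_state() {
  curl -s http://169.254.169.254/latest/meta-data/autoscaling/target-lifecycle-state
}

# Decide whether kubelet should be started for a given lifecycle state.
# A state with the warming: prefix means the instance is entering the warm
# pool, so the kubelet service is enabled but not started.
should_start_kubelet() {
  case "$1" in
    warming:*) return 1 ;;
    *)         return 0 ;;
  esac
}
```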
&lt;h1&gt;
  
  
  Comparing the difference
&lt;/h1&gt;

&lt;p&gt;In this section I will take you through comparing the time it takes to scale out a Kubernetes Deployment with and without a warm pool enabled. The &lt;em&gt;acid test&lt;/em&gt; is the interval between &lt;a href="https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler" rel="noopener noreferrer"&gt;Cluster Autoscaler&lt;/a&gt; (CAS) reacting to the scale-out demand and all the Pods starting. &lt;/p&gt;
&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;In the kOps Cluster spec I ensured I had the following snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;clusterAutoscaler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;balanceSimilarNodeGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This enables the &lt;a href="https://kops.sigs.k8s.io/addons/#cluster-autoscaler" rel="noopener noreferrer"&gt;cluster autoscaler addon&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The number of new instances per ASG influences the scale-out time of the ASG itself. Adding 9 instances to one ASG is significantly slower than adding 3 instances to 3 ASGs. So, to ensure fair comparisons, we tell CAS to balance the ASGs.&lt;/p&gt;

&lt;p&gt;On each of the InstanceGroups with the &lt;code&gt;Node&lt;/code&gt; role, I set the following capacity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;machineType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;t3.medium&lt;/span&gt;
  &lt;span class="na"&gt;maxSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;minSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cluster will launch with the minimum capacity.&lt;/p&gt;

&lt;p&gt;I then created a Deployment that has &lt;code&gt;resource.requirement.cpu&lt;/code&gt; set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-deployment&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:stable-alpine&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I used the &lt;code&gt;nginx:stable-alpine&lt;/code&gt; image here as it is fairly small. I did not want image pull time to significantly impact the scale-out time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling out
&lt;/h2&gt;

&lt;p&gt;To scale the Deployment, I executed the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl scale deployment.v1.apps/nginx-deployment &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;code&gt;t3.medium&lt;/code&gt; instance only has 2 CPUs, some of which are already reserved by other Pods. So increasing the replicas as above causes CAS to scale out one instance per &lt;em&gt;replica&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Obtaining the test results
&lt;/h2&gt;

&lt;p&gt;A good way of getting the details is to list the events for a Pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl get events &lt;span class="nt"&gt;-o&lt;/span&gt; custom-columns&lt;span class="o"&gt;=&lt;/span&gt;FirstSeen:.firstTimestamp,LastSeen:.lastTimestamp,Count:.count,From:.source.component,Type:.type,Reason:.reason,Message:.message &lt;span class="nt"&gt;--field-selector&lt;/span&gt; involvedObject.kind&lt;span class="o"&gt;=&lt;/span&gt;Pod,involvedObject.name&lt;span class="o"&gt;=&lt;/span&gt;nginx-deployment-123abc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each configuration in this post, I ran the command above for the first and last Pod to enter the &lt;code&gt;Running&lt;/code&gt; state.&lt;/p&gt;

&lt;p&gt;For the first Pod, I got:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FirstSeen              LastSeen               Count   From                 Type      Reason             Message
2021-04-17T08:56:30Z   2021-04-17T08:56:30Z   1       cluster-autoscaler   Normal    TriggeredScaleUp   Pod triggered scale-up: [{Nodes-eu-central-1a 1-&amp;gt;5 (max: 10)} {Nodes-eu-central-1b 1-&amp;gt;4 (max: 10)}]
&amp;lt;snip&amp;gt;
2021-04-17T08:58:59Z   2021-04-17T08:58:59Z   1       kubelet              Normal    Started            Started container nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the last Pod I got:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FirstSeen              LastSeen               Count   From                 Type      Reason             Message
2021-04-17T08:56:30Z   2021-04-17T08:56:30Z   1       cluster-autoscaler   Normal    TriggeredScaleUp   Pod triggered scale-up: [{Nodes-eu-central-1a 1-&amp;gt;5 (max: 10)} {Nodes-eu-central-1b 1-&amp;gt;4 (max: 10)}]
&amp;lt;snip&amp;gt;
2021-04-17T08:59:40Z   2021-04-17T08:59:40Z   1       kubelet              Normal    Started            Started container nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means the first Pod started after 149s and the last Pod after 190s. These are the numbers I'll be comparing across all configurations. I also found it interesting to compare the &lt;em&gt;difference&lt;/em&gt; between the first and last Pod start time.&lt;/p&gt;
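&lt;p&gt;The elapsed times follow directly from the event timestamps; a quick way to compute them (assuming GNU &lt;code&gt;date&lt;/code&gt;):&lt;br&gt;&lt;/p&gt;

```shell
# Seconds between two RFC3339 timestamps (GNU date assumed).
elapsed() {
  echo $(( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ))
}

elapsed "2021-04-17T08:56:30Z" "2021-04-17T08:58:59Z"   # first Pod: 149
elapsed "2021-04-17T08:56:30Z" "2021-04-17T08:59:40Z"   # last Pod: 190
```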

&lt;h2&gt;
  
  
  Hunting for delays
&lt;/h2&gt;

&lt;p&gt;This part may not be all that interesting. Here I try to show which time spans are candidates for improving the warming process, including what may cause a 41-second difference between those two Pods.&lt;/p&gt;

&lt;h3&gt;
  
  
  ASG reaction time
&lt;/h3&gt;

&lt;p&gt;If I look at the ASG activity, I see the following message on all instances the ASG launched: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;At 2021-04-17T08:56:30Z a user request explicitly set group desired capacity changing the desired capacity from 1 to 5. At 2021-04-17T08:56:32Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 1 to 5.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;However, if I look at the actual boot of the instances, I see these two lines for the first and last instance to boot respectively:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Logs begin at Sat 2021-04-17 08:56:51 UTC, end at Sat 2021-04-17 09:10:23 UTC. --
-- Logs begin at Sat 2021-04-17 08:57:13 UTC, end at Sat 2021-04-17 09:09:19 UTC. --
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So even if the ASG &lt;em&gt;launched&lt;/em&gt; the instances at the same time, they do not actually &lt;em&gt;boot&lt;/em&gt; at the same time. &lt;/p&gt;

&lt;p&gt;It looks like we can generally assume 20-30 seconds response time on an ASG scale-out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nodeup run time
&lt;/h3&gt;

&lt;p&gt;We can see that &lt;code&gt;nodeup&lt;/code&gt; spends a fairly consistent amount of time initialising the node. The times below show when &lt;code&gt;nodeup&lt;/code&gt; finished on the two Nodes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Apr 17 08:58:27 ip-172-20-42-5 `systemd`[1]: kops-configuration.service: Succeeded.
Apr 17 08:58:50 ip-172-20-90-140 `systemd`[1]: kops-configuration.service: Succeeded.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gap between the two Nodes stays almost the same. From boot until the instance has been configured takes about 95-100 seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kubelet becoming ready
&lt;/h3&gt;

&lt;p&gt;The last part is the time it takes before the Node enters the &lt;code&gt;Ready&lt;/code&gt; state.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubelet&lt;/code&gt; becomes ready once it has registered with the control plane and verified that storage, CPU, memory, and networking are working properly.&lt;/p&gt;

&lt;p&gt;Interestingly, this part further skewed the difference between our two Nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Ready                True    Sat, 17 Apr 2021 11:19:08 +0200   Sat, 17 Apr 2021 10:58:53 +0200   KubeletReady                 `kubelet` is posting ready status. AppArmor enabled
  Ready                True    Sat, 17 Apr 2021 11:14:56 +0200   Sat, 17 Apr 2021 10:59:25 +0200   KubeletReady                 `kubelet` is posting ready status. AppArmor enabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This last leg took 26s and 35s for those two instances.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enter the warm pool
&lt;/h2&gt;

&lt;p&gt;So how much does adding a warm pool improve the scale-out time?&lt;/p&gt;

&lt;p&gt;Wrap up this test by scaling the deployment back to 0 so CAS can scale down the ASGs again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl scale deployment.v1.apps/nginx-deployment &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Adding a warm pool
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;This feature has not been released to a kOps beta at the time of the writing; the field names below may change.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;The following will make kOps create warm pools for our InstanceGroups.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;warmPool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
  &lt;span class="na"&gt;minSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;maxSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Specifically, this will configure a warm pool with 10 Nodes. I only do this to ensure I have a known number of warm instances, to make the tests comparable. When using warm pools under normal operations, I would just use the AWS defaults.&lt;/p&gt;

&lt;p&gt;Apply the configuration and watch the warm pool instances appear:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kops get instances
ID                      NODE-NAME                                       STATUS       ROLES   STATE           INTERNAL-IP     INSTANCE-GROUP               MACHINE-TYPE
i-01ade8dad4c7ce0cd     ip-172-20-114-104.eu-central-1.compute.internal UpToDate     node                    172.20.114.104  Nodes-eu-central-1c          t3.medium
i-01f169e730f88e016     ip-172-20-118-255.eu-central-1.compute.internal UpToDate     master                  172.20.118.255  master-eu-central-1c.masters t3.medium
i-069e09e5873042cd7     ip-172-20-93-29.eu-central-1.compute.internal   UpToDate     master                  172.20.93.29    master-eu-central-1b.masters t3.medium
i-09b42c88fbd3399ab     ip-172-20-58-68.eu-central-1.compute.internal   UpToDate     node                    172.20.58.68    Nodes-eu-central-1a          t3.medium
i-0a85aed8869a30432     ip-172-20-59-147.eu-central-1.compute.internal  UpToDate     master                  172.20.59.147   master-eu-central-1a.masters t3.medium
i-0b37f6a258d9c7775     ip-172-20-75-150.eu-central-1.compute.internal  UpToDate     node                    172.20.75.150   Nodes-eu-central-1b          t3.medium
i-0c16b3c668615f259                                                     UpToDate     node    WarmPool        172.20.50.226   Nodes-eu-central-1a          t3.medium
i-0cdf1c334d452c9a6                                                     UpToDate     node    WarmPool        172.20.126.45   Nodes-eu-central-1c          t3.medium
i-0d00f4debb586d17f                                                     UpToDate     node    WarmPool        172.20.116.222  Nodes-eu-central-1c          t3.medium
i-0d04d04cb7f4be2d7                                                     UpToDate     node    WarmPool        172.20.58.177   Nodes-eu-central-1a          t3.medium
i-0d49db4113702292c                                                     UpToDate     node    WarmPool        172.20.59.66    Nodes-eu-central-1a          t3.medium
i-0e07b6cc361d7a909                                                     UpToDate     node    WarmPool        172.20.89.193   Nodes-eu-central-1b          t3.medium
i-0f3451adef13005e7                                                     UpToDate     node    WarmPool        172.20.114.9    Nodes-eu-central-1c          t3.medium
&amp;lt;snip&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The kOps output only shows that the instance is in the warm pool, not whether it has finished pre-initialisation. But if you go into the &lt;em&gt;Instance Management&lt;/em&gt; part of the ASG in the AWS console, you can see something like this:&lt;/p&gt;
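&lt;p&gt;The same state can also be inspected from the AWS CLI with &lt;code&gt;aws autoscaling describe-warm-pool&lt;/code&gt;. The ASG name below is illustrative; use the ASG backing your InstanceGroup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws autoscaling describe-warm-pool &lt;span class="nt"&gt;--auto-scaling-group-name&lt;/span&gt; nodes-eu-central-1a.example.k8s.local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;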

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxl2dg4f26nyn6krdqfe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxl2dg4f26nyn6krdqfe.png" alt="ASG Instance Management view showing warm pool instances, some not yet in warmed:stopped state"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, not all instances have entered &lt;code&gt;warmed:stopped&lt;/code&gt; state yet. But after waiting a bit longer, they are all ready and I can scale out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl scale deployment.v1.apps/nginx-deployment &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once all the Pods have entered the &lt;code&gt;Running&lt;/code&gt; state, let's see whether this has improved startup times.&lt;/p&gt;

&lt;p&gt;Again, find the first and last Pod that entered the &lt;code&gt;Running&lt;/code&gt; state and list their events. The method is the same as last time, and it shows that the first Pod starts after 79s and the last one starts after 149s. &lt;/p&gt;
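&lt;p&gt;One way to pull out those timestamps is to sort the events by creation time and filter on the Deployment from the earlier example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get events &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;.metadata.creationTimestamp | &lt;span class="nb"&gt;grep&lt;/span&gt; nginx-deployment
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;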

&lt;p&gt;The first Pod starts 70 seconds faster than the first node without a warm pool. The last Pod starts 41 seconds faster than the last Pod without warm instances. That is a pretty decent improvement.&lt;/p&gt;

&lt;p&gt;Without the warm pool, the difference between the first Pod and last Pod was 41s. This time it was a whopping 70s; half a minute more. We cannot seem to blame this on ASG response time or the time between the &lt;code&gt;kubelet&lt;/code&gt; starting and becoming ready. &lt;/p&gt;

&lt;p&gt;So, what is happening here?&lt;/p&gt;

&lt;p&gt;Turns out the time an instance runs before it is shut down is &lt;em&gt;completely arbitrary&lt;/em&gt;. Some instances stay running for seconds, others for minutes. On the Node the first Pod is running on, &lt;code&gt;nodeup&lt;/code&gt; was allowed to run to completion, while on the Node of the last Pod, it was barely allowed to run at all. Luckily, &lt;code&gt;nodeup&lt;/code&gt; is fairly good at knowing the current state of the Node and can pick up from where it left off, regardless of when it was interrupted.&lt;/p&gt;

&lt;h1&gt;
  
  
  Enter lifecycle hooks
&lt;/h1&gt;

&lt;p&gt;But we do not want &lt;code&gt;nodeup&lt;/code&gt; to be interrupted. Nor do we want the instance to stay running long after &lt;code&gt;nodeup&lt;/code&gt; has finished.&lt;/p&gt;

&lt;p&gt;What AWS did to solve this is to let warming instances run through the same lifecycle as other instances. So, if you have a lifecycle hook for &lt;code&gt;EC2_INSTANCE_LAUNCHING&lt;/code&gt; it will trigger on warming Nodes too.&lt;/p&gt;
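&lt;p&gt;Conceptually, once the instance has finished configuring itself, it completes the lifecycle action so the ASG can proceed with the transition. With the AWS CLI that signal looks roughly like this (hook and group names are illustrative, not the exact names kOps uses):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws autoscaling complete-lifecycle-action \
  &lt;span class="nt"&gt;--lifecycle-hook-name&lt;/span&gt; kops-warmpool \
  &lt;span class="nt"&gt;--auto-scaling-group-name&lt;/span&gt; nodes-eu-central-1a.example.k8s.local \
  &lt;span class="nt"&gt;--lifecycle-action-result&lt;/span&gt; CONTINUE \
  &lt;span class="nt"&gt;--instance-id&lt;/span&gt; i-0c16b3c668615f259
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;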

&lt;p&gt;Amend the InstanceGroup spec as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;warmPool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;useLifecycleHook&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will make kOps provision a lifecycle hook that can be used by &lt;code&gt;nodeup&lt;/code&gt; to signal that it has completed its configuration.&lt;/p&gt;

&lt;p&gt;Do the scale down/scale up dance again and observe the first and last Pod creation time.&lt;/p&gt;
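&lt;p&gt;The dance is the same &lt;code&gt;kubectl scale&lt;/code&gt; command as before, first down to zero and then back up once the cluster autoscaler has removed the nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl scale deployment.v1.apps/nginx-deployment &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;span class="c"&gt;# wait for scale-in to finish, then scale out again&lt;/span&gt;
kubectl scale deployment.v1.apps/nginx-deployment &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;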

&lt;p&gt;First Pod 76s; last Pod 98s. That's a 22s difference between the first and last Pod. Down from 41s without the warm pool, and certainly down from the 70s for the warm pool without the instance lifecycle hook.&lt;/p&gt;

&lt;h1&gt;
  
  
  The effect of warm pool alone
&lt;/h1&gt;

&lt;p&gt;With warm pool and some fairly easy changes to kOps, we did the impossible. We created warm pool support for self-managed Kubernetes.&lt;/p&gt;

&lt;p&gt;The result is significantly faster response times to Pod scale-out. From up to 190 seconds without a warm pool, to up to 98 seconds with a warm pool and lifecycle hooks. The best case went from 149 seconds to 76 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That is 50% faster&lt;/strong&gt;. Not bad! &lt;/p&gt;

&lt;h1&gt;
  
  
  Exploiting warm pools further
&lt;/h1&gt;

&lt;p&gt;So far we have focused just on running a regular &lt;code&gt;nodeup&lt;/code&gt; run. But can we do better? Can we exploit warm pool to make Pod scale out even faster?&lt;/p&gt;

&lt;p&gt;Most likely we cannot do much about what happens before &lt;code&gt;nodeup&lt;/code&gt; runs. Nodeup already runs fairly fast; roughly 10 seconds. There could be some optimisations there, but we would not be able to shave off many seconds.&lt;/p&gt;

&lt;p&gt;But what about those 26-35 seconds between &lt;code&gt;nodeup&lt;/code&gt; completing and the Pods becoming ready?&lt;/p&gt;

&lt;p&gt;All Nodes are running additional system containers such as &lt;code&gt;kube-proxy&lt;/code&gt; and a CNI, in this case Cilium. All of these need to be present on the machine before much else can happen. And those images are not necessarily small. &lt;/p&gt;

&lt;p&gt;In my case, the time it took was the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FirstSeen              LastSeen               Count   From      Type     Reason    Message
2021-04-18T06:10:34Z   2021-04-18T06:10:34Z   1       kubelet   Normal   Pulled    Successfully pulled image "k8s.gcr.io/kube-proxy:v1.20.1" in 19.120641887s
2021-04-18T06:10:55Z   2021-04-18T06:10:55Z   1       kubelet   Normal   Pulled    Successfully pulled image "docker.io/cilium/cilium:v1.9.4" in 8.270050914s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So not only did it take a while for each of these images to be pulled, but they are pulled in sequence, adding up to 25+ seconds.&lt;/p&gt;

&lt;p&gt;But during warming, &lt;code&gt;nodeup&lt;/code&gt; already knows about these images. What if it could just pull them so they were already present? Let's try this and see if this changes anything.&lt;/p&gt;
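&lt;p&gt;On a containerd-based Node, pre-pulling boils down to something like the following (a sketch of the idea, not the exact &lt;code&gt;nodeup&lt;/code&gt; implementation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# pull into the k8s.io namespace so kubelet finds the images already present&lt;/span&gt;
ctr &lt;span class="nt"&gt;-n&lt;/span&gt; k8s.io images pull k8s.gcr.io/kube-proxy:v1.20.1
ctr &lt;span class="nt"&gt;-n&lt;/span&gt; k8s.io images pull docker.io/cilium/cilium:v1.9.4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;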

&lt;p&gt;After another round of the scale-in/scale-out dance, describing the node shows that the images have indeed been pulled during warming.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2021-04-18T06:35:37Z   2021-04-18T06:35:37Z   1       `kubelet`   Normal   Pulled    Container image "k8s.gcr.io/kube-proxy:v1.20.1" already present on machine
2021-04-18T06:35:37Z   2021-04-18T06:35:37Z   1       `kubelet`   Normal   Started   Started container kube-proxy
2021-04-18T06:35:37Z   2021-04-18T06:35:37Z   1       `kubelet`             Normal   Pulled      Container image "docker.io/cilium/cilium:v1.9.4" already present on machine
2021-04-18T06:35:38Z   2021-04-18T06:35:38Z   1       `kubelet`             Normal   Started     Started container cilium-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also see that the containers have started at roughly the same time. &lt;/p&gt;

&lt;p&gt;Comparing this configuration with the previous one shows that it did not bring down the starting time of the first Pod by much; this time the first Pod started 70 seconds after CAS triggered. But the last Pod now consistently trails only a few seconds behind, at about 79 seconds, which makes the improvement worthwhile.&lt;/p&gt;

&lt;p&gt;Pre-pulling images during warming could also be done for any other containers that may run on a given Node. This is certainly valuable for any DaemonSet, but also for any other Pod that has a chance of being deployed to the Node. The worst case is that you waste a bit of disk space should those Pods never be scheduled on the Node.&lt;/p&gt;

&lt;h1&gt;
  
  
  Wrap up
&lt;/h1&gt;

&lt;p&gt;The feature is not even two weeks old, so I certainly have not had the time to explore all the ways it can be exploited. &lt;code&gt;nodeup&lt;/code&gt; typically does not expect instances to reboot, so there may be optimisations to be done there as well. For example, on the second boot, &lt;code&gt;kubelet&lt;/code&gt; is again triggered by &lt;code&gt;nodeup&lt;/code&gt;, which may not be necessary: if &lt;code&gt;nodeup&lt;/code&gt; successfully created the &lt;code&gt;kubelet&lt;/code&gt; service on the first run, there should be zero changes to the system on the second run. The only reason there would be a change is if the cluster configuration changed, and such changes should only be applied through an instance rotation.&lt;/p&gt;

&lt;p&gt;I hope you are as excited about this as I am. And if you wonder when all of this will be available to you, the answer is "shortly". Some of the functionality has already been merged into &lt;code&gt;kubernetes/kops&lt;/code&gt;, and I have the code for the rest more or less ready. I hope for all of this to be available in kOps 1.21, expected sometime in May.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>aws</category>
      <category>kops</category>
    </item>
  </channel>
</rss>
