DEV Community

Ole Markus With

Zero-configuration IRSA on kOps

A while ago, I wrote about using IAM Roles for ServiceAccounts on kOps.
In short, this feature lets you define an AWS IAM Policy for a given ServiceAccount, and kOps will create the respective AWS IAM Role,
assign the policy and establish a trust relationship allowing the ServiceAccount to assume the IAM Role.
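The trust relationship is an IAM trust policy on the role that allows tokens issued by the cluster's OIDC provider to assume it. A sketch of what such a policy looks like, with the account, provider host, and ServiceAccount details as placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account>:oidc-provider/<oidc-provider-host>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "<oidc-provider-host>:sub": "system:serviceaccount:<namespace>:<serviceaccount-name>"
        }
      }
    }
  ]
}
```

The `sub` condition is what pins the role to one specific ServiceAccount rather than to anything running in the cluster.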

Challenge of configuring workloads

While kOps elegantly handles what happens on the AWS side, we had not implemented anything that configures Pods to actually make
use of the IAM Role. Indeed, some of the more frequently asked support questions
in the kOps Slack channels have been around how to configure applications to assume roles.

The kOps documentation
recommended directly adding the volumes and environment variables to the Pod spec,
but it is not obvious exactly what needs to be added, and you have to manually fetch the actual role ARN that kOps creates from the AWS API or console.
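For reference, the manual approach meant adding something like the following to every Pod spec. This is a sketch of the standard IRSA projected-token pattern; the role ARN is a placeholder, and the audience must match whatever your OIDC provider was registered with (commonly `sts.amazonaws.com`):

```yaml
spec:
  serviceAccountName: my-app
  containers:
  - name: app
    image: my-app:latest
    env:
    # Tells the AWS SDK which role to assume ...
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::<account>:role/<role-name>
    # ... and where to find the web identity token
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    volumeMounts:
    - name: aws-iam-token
      mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      readOnly: true
  volumes:
  - name: aws-iam-token
    projected:
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com  # must match the OIDC provider's audience
          expirationSeconds: 86400
          path: token
```

Getting every detail right here, for every workload, is exactly the kind of boilerplate that invites mistakes.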

The pod identity webhook

On EKS, the pod identity webhook is commonly used as the mechanism for adding the necessary parts of the Pod spec.
This webhook looks for ServiceAccounts with a specific set of annotations telling it what ARN it can assume and various other settings. When a Pod is created that uses one of
these ServiceAccounts, the webhook mutates the Pod using information found in the ServiceAccount annotations.
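On EKS, such an annotated ServiceAccount typically looks like this (the role ARN is a placeholder):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<account>:role/my-app-role
```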

Configuring these annotations is a lot simpler than directly configuring the Pod spec.
Typically, EKS-specific tooling "owns" the ServiceAccount, which makes linking the role/ServiceAccount pair simpler, but also means that
ServiceAccounts cannot be managed together with the application using them.

For various reasons, installing the webhook on kOps was not that straightforward. For example, one could not tell the webhook to use mounted TLS secrets; it could only use the CSR API.
And even once the webhook was installed, you had to manually annotate ServiceAccounts with the role ARN that the Pods should try to assume.
kOps could have "owned" the ServiceAccounts configured in the Cluster spec as well, but I feel the ownership of ServiceAccounts should be with the application and not the cluster.

Webhook the kOps way

As mentioned towards the end of my previous article,
because kOps already knows the mapping between ServiceAccounts and IAM roles, there shouldn't be any need for
users to copy the ARN from AWS into the ServiceAccount annotation. Something should be able to just read the mapping in the Cluster spec
and configure workloads accordingly.

I wrote that this could be a webhook similar to the pod identity webhook. But why not just implement this as a feature in the pod identity webhook itself?
The EKS team was very open to the idea, and a PR later, the webhook can be configured to look for additional Pods to mutate.

After this PR, the webhook will:

  • First look for annotations on the ServiceAccount as before.
  • If no annotations are found on the ServiceAccount, the webhook will look for a mapping configured in the pod-identity-webhook ConfigMap.
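The ConfigMap keys entries by `namespace/name` and stores the same settings the annotations would carry. A sketch of what it looks like, assuming the JSON field names the webhook logs when loading its cache:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-identity-webhook
  namespace: kube-system
data:
  config: |
    {
      "default/pod-identity-webhook-test": {
        "RoleARN": "arn:aws:iam::<account>:role/<role-name>",
        "Audience": "",
        "UseRegionalSTS": true,
        "TokenExpiration": 0
      }
    }
```

With kOps managing this ConfigMap, the ServiceAccount itself stays annotation-free and can be shipped with the application.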

Using the pod identity webhook addon

As of kOps 1.23, kOps supports the webhook as a managed addon. When installed, kOps will populate the webhook ConfigMap based on the spec.iam.serviceAccountExternalPermissions struct.


Before continuing, make sure you already have a kOps 1.23 cluster with an AWS OIDC provider enabled.
See my previous article on how to go about that.

Once your cluster is running 1.23, you can enable the webhook by adding the following to your cluster spec:

    certManager:
      enabled: true
    podIdentityWebhook:
      enabled: true

The cert manager addon is required to establish the trust between the webhook and the API server.

Now run kops update cluster --yes and wait a minute or so for the control plane to deploy the addon(s).

Adding a ServiceAccount mapping

Start by granting a set of AWS privileges to a ServiceAccount:

      iam:
        serviceAccountExternalPermissions:
        - aws:
            policyARNs:
            - arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess
          name: pod-identity-webhook-test
          namespace: default

Running kops update cluster, you will see something like the following:

        Tags                    {Name:<cluster>, KubernetesCluster:<cluster>, <cluster>: owned}
        ExportWithID            default-pod-identity-webhook-test

        Role          <cluster>
        ExternalPolicies        [arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess]
        Managed                 true
  +   config: '{"default/pod-identity-webhook-test":{"RoleARN":"arn:aws:iam::<account>:role/<cluster>","Audience":"","UseRegionalSTS":true,"TokenExpiration":0}}'
  -   config: '{}'


kOps wants to create an IAM role for the ServiceAccount and assign it the AmazonEC2ReadOnlyAccess policy.

You can also see that it populates the mapping information into the pod-identity-webhook ConfigMap.

Run kops update cluster --yes to apply the changes. Then run kubectl logs -n kube-system -l app=pod-identity-webhook -f and observe the webhook picking up the mapping.

I0319 07:10:28.312786       1 cache.go:186] Adding SA default/pod-identity-webhook-test to CM cache: &{RoleARN:arn:aws:iam::<account>:role/<cluster> UseRegionalSTS:true TokenExpiration:86400}

Deploying the workload

Once the mapping is in place, we can deploy the ServiceAccount and a Pod using that ServiceAccount. It's important to remember that the webhook will only mutate Pods on creation, so it must be aware of the mapping before the Pod is created.

Deploy the following to the cluster:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-identity-webhook-test
  namespace: default
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-identity-webhook-test
  namespace: default
spec:
  containers:
  - name: aws-cli
    image: amazon/aws-cli:latest
    command:
    - sleep
    - "300"
  serviceAccountName: "pod-identity-webhook-test"

You should now see the following in the webhook logs:

I0319 07:39:33.373273       1 cache.go:80] Fetching sa default/pod-identity-webhook-test from cache
I0319 07:39:33.373346       1 handler.go:423] Pod was mutated. Pod=pod-identity-webhook-test, ServiceAccount=pod-identity-webhook-test, Namespace=default
I0319 07:39:33.373522       1 middleware.go:132] path=/mutate method=POST status=200 user_agent=kube-apiserver-admission body_bytes=1441

And running kubectl get pod pod-identity-webhook-test -o yaml you should see that the Pod has been mutated and now contains the expected volumes and environment variables.

Testing that it works

To confirm everything is working, you can run the following:

$ kubectl exec -it -n default pod-identity-webhook-test -- aws sts get-caller-identity
{
    "UserId": "AROAV6PNU2XQTMAZ64FBK:botocore-session-1647675906",
    "Account": "<account>",
    "Arn": "arn:aws:sts::<account>:assumed-role/<cluster>/botocore-session-1647675906"
}

You can also check that the Pod is allowed to use the granted privileges by running something like the following:

`kubectl exec -it -n default pod-identity-webhook-test -- aws ec2 describe-instances --region eu-central-1`


Hopefully this makes using IRSA on kOps-based clusters much simpler, and I hope this post has explained how things work under the hood.

As always, I appreciate feedback on this feature and on whether it is useful for you.
