Nicolas Fränkel

Posted on Sep 21, 2022 • Originally published at blog.frankel.ch

Introduction to Kubernetes extensibility


Kubernetes offers a lot of benefits: an enormous ecosystem with plenty of actors, self-healing capabilities, etc. There's no free lunch, though. It also comes with downsides, chief among them its complexity and operating costs.

However, the more I work with Kubernetes, the more I think its most significant asset is extensibility. If you need something that the platform doesn't provide by default, there's an option to develop it yourself and integrate it. In this post, I'd like to list such extension points.

Kubernetes 101

A lot of explanations on Kubernetes focus on the architecture. I believe they go into too many details and miss the big picture. Here, I only want to highlight the basic concepts.

At its most basic level, Kubernetes is just a platform able to run container images. It stores its configuration in a distributed storage engine, etcd. The most significant part of this configuration is dedicated to the desired state for objects. For example, when you schedule a pod using the kubectl command line, all you do is update this desired state.

Other components, called controllers, watch configuration changes and read the desired state. Then, they try to reconcile the actual state with the desired state. It's nothing revolutionary: Puppet is based on the same control-loop approach and, as far as I know, so is Chef. Generally, a controller manages a single type of object, e.g., the DeploymentController manages deployments.
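The control loop can be sketched in a few lines. This is a toy model, not Kubernetes code: the Cluster class, its two dictionaries, and the reconcile function are all hypothetical names that stand in for the desired state stored in etcd and the actual state observed on the nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a cluster: desired state (what etcd holds) vs. actual state."""
    desired_replicas: dict = field(default_factory=dict)
    running_pods: dict = field(default_factory=dict)


def reconcile(cluster):
    """Run one pass of the control loop and return the actions it took."""
    actions = []
    for name, desired in cluster.desired_replicas.items():
        actual = cluster.running_pods.get(name, 0)
        if actual < desired:
            actions.append(f"start {desired - actual} pod(s) for {name}")
        elif actual > desired:
            actions.append(f"stop {actual - desired} pod(s) for {name}")
        # After acting, the actual state matches the desired state.
        cluster.running_pods[name] = desired
    return actions


cluster = Cluster(desired_replicas={"web": 3}, running_pods={"web": 1})
first_pass = reconcile(cluster)
second_pass = reconcile(cluster)
print(first_pass)
print(second_pass)
```

The second pass has nothing to do because the first one already converged. A real controller runs this loop continuously and reacts to watch events instead of polling a dictionary.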

The idea behind making a generic tool is to follow Pareto's Law: solve 80% of the problems with 20% of the effort. Unfortunately, the more generic the tool and the wider the user base, the more effort to customize the remaining 20%.

Kubernetes designers saw this issue as the most critical obstacle to widespread adoption. Hence, Kubernetes offers many extension points.

Extensible model

In the section above, I mentioned scheduling a pod. A pod is one of the many objects available in Kubernetes out-of-the-box. Other objects include: deployments, jobs, services, etc.

Some solutions easily fit this model. For example, one can easily create a deployment of three Hazelcast pods. It works out-of-the-box: the pods will multicast over the network, find each other, and form a cluster.

Other solutions are not so homogeneous. Before KIP-500, Kafka relied on ZooKeeper. A cluster consists of at least three ZooKeeper nodes and as many Kafka nodes as desired. Kubernetes makes it possible to run multiple containers in the same pod. Yet, if all required components are in the same pod and the pod fails, the redundancy amounts to nothing. We should map one regular component to one pod.

In this case, we need a fully-featured Kubernetes manifest that describes the architecture. Because of different requirements, we will need to make it configurable. Kubernetes' ecosystem offers several alternatives to manage this problem: Kustomize and Helm count among the most popular solutions. But neither works at the desired level of abstraction, the Kafka cluster.

Therefore, Kubernetes allows designing a new Kafka object. You declare this kind of custom object with a CustomResourceDefinition, or CRD. Here's a sample for a simplistic arbitrary Foo object:

apiVersion: apiextensions.k8s.io/v1       #1
kind: CustomResourceDefinition
metadata:
 name: foos.frankel.ch                    #2
spec:
 group: frankel.ch                        #3
 names:
   plural: foos                           #4
   singular: foo                          #5
   kind: Foo                              #6
 scope: Namespaced                        #7
 versions:
   - name: v1alpha1
     served: true                         #8
     storage: true                        #9
     schema:
       openAPIV3Schema:
         type: object
         properties:
           spec:
             type: object
             properties:
               bar:
                 type: string
             required: ["bar"]
         required: ["spec"]

#1 Required header
#2 Must match the format <plural>.<group>
#3 Group name for the REST API: /apis/<group>/<version>
#4 Plural name for the REST API: /apis/<group>/<version>/<plural>
#5 Singular name to be used on the CLI and for display
#6 Kind used in manifests
#7 Can be either Cluster or Namespaced. A Cluster resource is declared cluster-wide, so an object name is unique across the whole cluster; Namespaced resources live under a namespace (default when none is specified), so an object name only needs to be unique within its namespace
#8 A version can be enabled/disabled
#9 Exactly one version must be marked as the storage version

Once you've applied this manifest, you can manage your Foo. Let's create a manifest to create a new Foo object.

apiVersion: frankel.ch/v1alpha1
kind: Foo
metadata:
  name: myfoo
spec:
  bar: "whatever"

kubectl apply -f foo.yml
kubectl get foo

The above commands have updated the data model with a new Foo type and created a Foo object. But under the covers, we've only stored data in etcd via the Kubernetes API. Nothing will happen until we start a controller that watches for new objects and acts upon them. Note that a controller that manages custom resources is called an operator.
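To make "stored data via the Kubernetes API" more concrete, here's a small sketch of the REST paths under which the API server exposes custom resources. The custom_resource_path helper is a hypothetical name of mine; the path layout follows the /apis/<group>/<version>/<plural> pattern from the CRD, with a namespaces/<namespace> segment for namespaced resources.

```python
def custom_resource_path(group, version, plural, namespace=None, name=None):
    """Build the API server path for a custom resource collection or object."""
    path = f"/apis/{group}/{version}"
    if namespace is not None:
        # Namespaced resources are nested under their namespace.
        path += f"/namespaces/{namespace}"
    path += f"/{plural}"
    if name is not None:
        path += f"/{name}"
    return path


collection = custom_resource_path("frankel.ch", "v1alpha1", "foos", namespace="default")
single = custom_resource_path("frankel.ch", "v1alpha1", "foos", namespace="default", name="myfoo")
print(collection)
print(single)
```

An operator for Foo would watch the collection path and react to every change it sees there.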

Extensible validation

A common concern with a platform that can run third-party workloads is allowing only vetted ones. Some workloads may consume too many resources; others may be malicious.

Here are two concrete scenarios:

As the cluster operator, you want to manage your cluster's limited physical resources (CPU/memory) and share them among all pods. For this, you want to enforce that each pod describes its resource requirements. Developers achieve this by setting the requests and limits attributes. You want to disallow pods that don't have them.
As a security-minded operator, you want to prevent privilege escalation. Since disabling it shouldn't change the final behavior of the pod, you want to add allowPrivilegeEscalation=false to every pod.

While one can manage both cases through a "build" pipeline, Kubernetes provides a solution out-of-the-box.

As I explained above, Kubernetes stores configuration in etcd while controllers watch changes and act upon them. To prevent unwanted behavior, the safest way is to validate payloads that change configuration; it's the role of admission controllers.

An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object, but after the request is authenticated and authorized. The controllers consist of the list below, are compiled into the kube-apiserver binary, and may only be configured by the cluster administrator. In that list, there are two special controllers: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. These execute the mutating and validating (respectively) admission control webhooks which are configured in the API.

-- Using Admission Controllers

In short, two kinds of admission controllers are available:

The validating admission webhook allows/prevents a request from changing the state
The mutating admission webhook changes the request

They run in turn, mutating webhooks first and validating webhooks second, as per the following diagram:

From A Guide to Kubernetes Admission Controllers

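Each solves one of the scenarios highlighted above: a validating webhook can reject pods without resource requirements, while a mutating webhook can add allowPrivilegeEscalation=false.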
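Here's a minimal sketch of the decision logic behind both webhooks. It assumes the admission.k8s.io/v1 AdmissionReview format and leaves out everything around it: the HTTPS server, the webhook registration, and init containers. The validate, mutate, and _review names are mine, not part of any Kubernetes library.

```python
import base64
import json


def _review(uid, **response):
    """Wrap a response in the AdmissionReview envelope sent back to the API server."""
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {"uid": uid, **response},
    }


def validate(review):
    """Validating logic: reject pods whose containers lack requests or limits."""
    request = review["request"]
    for container in request["object"]["spec"]["containers"]:
        resources = container.get("resources", {})
        if not resources.get("requests") or not resources.get("limits"):
            message = f"container {container['name']} must set resources.requests and resources.limits"
            return _review(request["uid"], allowed=False, status={"message": message})
    return _review(request["uid"], allowed=True)


def mutate(review):
    """Mutating logic: set allowPrivilegeEscalation=false through a JSON Patch."""
    request = review["request"]
    patch = []
    for index, container in enumerate(request["object"]["spec"]["containers"]):
        base = f"/spec/containers/{index}/securityContext"
        if "securityContext" in container:
            patch.append({"op": "add", "path": f"{base}/allowPrivilegeEscalation", "value": False})
        else:
            patch.append({"op": "add", "path": base, "value": {"allowPrivilegeEscalation": False}})
    # The patch travels base64-encoded inside the response.
    encoded = base64.b64encode(json.dumps(patch).encode()).decode()
    return _review(request["uid"], allowed=True, patchType="JSONPatch", patch=encoded)


pod = {"spec": {"containers": [{"name": "app", "image": "nginx"}]}}
review = {"request": {"uid": "1234", "object": pod}}
rejected = validate(review)
patched = mutate(review)
print(rejected["response"]["allowed"])
print(json.loads(base64.b64decode(patched["response"]["patch"])))
```

The validating function only answers yes or no, whereas the mutating one always allows the request and describes the change it wants as a patch. That difference is why the two scenarios map to two different kinds of webhooks.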

Extensible client capabilities

At its most basic level, the kubectl command line is a high-level abstraction over a REST client. You can verify it by setting the verbose option:

kubectl get pods --v=8

loader.go:372] Config loaded from file:  /Users/nico/.kube/config
round_trippers.go:463] GET https://127.0.0.1:61378/api/v1/namespaces/default/pods?limit=500
round_trippers.go:469] Request Headers:
round_trippers.go:473]     Accept: application/json;as=Table;v=v1;g=meta.k8s.io,application/json;as=Table;v=v1beta1;g=meta.k8s.io,application/json
round_trippers.go:473]     User-Agent: kubectl/v1.24.2 (darwin/arm64) kubernetes/f66044f
round_trippers.go:574] Response Status: 200 OK in 8 milliseconds
round_trippers.go:577] Response Headers:
round_trippers.go:580]     Cache-Control: no-cache, private
round_trippers.go:580]     Content-Type: application/json
round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: 479e2d49-7b9f-4e4c-8fca-63c273cfb525
round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: 4787583d-e7d4-4679-a474-ebb66919a43c
round_trippers.go:580]     Date: Sun, 04 Sep 2022 09:32:39 GMT
round_trippers.go:580]     Audit-Id: 2f2f163d-fb6d-4149-ba44-ecf4395028aa
request.go:1073] Response Body: {"kind":"Table","apiVersion":"meta.k8s.io/v1","metadata":{"resourceVersion":"263411"},"columnDefinitions":[{"name":"Name","type":"string","format":"name","description":"Name must be unique within a namespace. Is required when creating resources, although some resources may allow a client to request the generation of an appropriate name automatically. Name is primarily intended for creation idempotence and configuration definition. Cannot be updated. More info: http://kubernetes.io/docs/user-guide/identifiers#names","priority":0},{"name":"Ready","type":"string","format":"","description":"The aggregate readiness state of this pod for accepting traffic.","priority":0},{"name":"Status","type":"string","format":"","description":"The aggregate status of the containers in this pod.","priority":0},{"name":"Restarts","type":"string","format":"","description":"The number of times the containers in this pod have been restarted and when the last container in this pod has restarted.","priority":0},{"name":"Age","type":"st [truncated 6465 chars]

Kubernetes' REST API is (mostly?) based on CRUD operations. Sometimes, you need to run several commands to achieve the desired results. For example, we would like to query which subjects can execute an action.

kubectl includes a mechanism to write code to orchestrate these calls. The mechanism is pretty similar to Git's:

You write your code according to a specific format, a plugin: an executable file whose name starts with kubectl-
You put it in a directory listed in your PATH variable

From this point, kubectl can discover it.
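As an illustration, here's a sketch of a trivial plugin. The kubectl-hello file name and the greeting function are hypothetical; a plugin can be written in any language as long as the file is executable and its name starts with kubectl-.

```python
#!/usr/bin/env python3
"""Hypothetical plugin: save as an executable file named kubectl-hello on your PATH."""
import sys


def greeting(args):
    """Build the message from the arguments kubectl passes through to the plugin."""
    target = args[0] if args else "world"
    return f"Hello, {target}, from a kubectl plugin"


if __name__ == "__main__":
    # kubectl strips its own name and the plugin name; the rest arrives in sys.argv.
    print(greeting(sys.argv[1:]))
```

After making the file executable, kubectl hello runs it, and kubectl plugin list shows it among the discovered plugins.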

You can manage your plugins on your machine, but this approach is not scalable to a whole organization. The solution is a plugin manager. Meet Krew:

Krew is the plugin manager for kubectl command-line tool.

Krew helps you:

discover kubectl plugins,

install them on your machine,

and keep the installed plugins up-to-date.

-- What is Krew?

Regarding which subjects can execute an action, here's how to do it:

brew install krew                              #1 
kubectl krew completion                        #2
# follow instructions to update your shell
kubectl krew update                            #3
kubectl krew install who-can                   #4
kubectl who-can watch pod                      #5

#1 Install Krew on Mac with Homebrew
#2 Display the auto-completion instructions
#3 Update the cached list of plugins
#4 Install the who-can Krew plugin
#5 Enjoy!

No subjects found with permissions to watch pod assigned through RoleBindings

CLUSTERROLEBINDING                             SUBJECT                                 TYPE            SA-NAMESPACE
apisix-clusterrolebinding                      apisix-ingress-controller               ServiceAccount  ingress-apisix
cluster-admin                                  system:masters                          Group
local-path-provisioner-bind                    local-path-provisioner-service-account  ServiceAccount  local-path-storage
system:controller:attachdetach-controller      attachdetach-controller                 ServiceAccount  kube-system
system:controller:daemon-set-controller        daemon-set-controller                   ServiceAccount  kube-system
system:controller:deployment-controller        deployment-controller                   ServiceAccount  kube-system
system:controller:endpoint-controller          endpoint-controller                     ServiceAccount  kube-system
system:controller:endpointslice-controller     endpointslice-controller                ServiceAccount  kube-system
system:controller:ephemeral-volume-controller  ephemeral-volume-controller             ServiceAccount  kube-system
system:controller:generic-garbage-collector    generic-garbage-collector               ServiceAccount  kube-system
system:controller:job-controller               job-controller                          ServiceAccount  kube-system
system:controller:persistent-volume-binder     persistent-volume-binder                ServiceAccount  kube-system
system:controller:pod-garbage-collector        pod-garbage-collector                   ServiceAccount  kube-system
system:controller:pvc-protection-controller    pvc-protection-controller               ServiceAccount  kube-system
system:controller:replicaset-controller        replicaset-controller                   ServiceAccount  kube-system
system:controller:replication-controller       replication-controller                  ServiceAccount  kube-system
system:controller:resourcequota-controller     resourcequota-controller                ServiceAccount  kube-system
system:controller:statefulset-controller       statefulset-controller                  ServiceAccount  kube-system
system:coredns                                 coredns                                 ServiceAccount  kube-system
system:kube-controller-manager                 system:kube-controller-manager          User
system:kube-scheduler                          system:kube-scheduler                   User

Conclusion

In this post, we browsed through several extension points in Kubernetes: the data model, admission controllers, and the client side. It was a very brief introduction, both in breadth and depth. Yet, I hope that it gives a good entry point into further research.

Originally published at A Java Geek on September 18th, 2022