Ishan Khare

Writing a Kubernetes controller in Go with kubebuilder

In this post we'll implement a simple Kubernetes controller using kubebuilder.

Installing kubebuilder

The installation instructions can be found in the kubebuilder installation docs.

Install kustomize

Since kubebuilder relies on kustomize internally, we also need to install it, following the kustomize installation instructions.

Now we're all set to start building our controller.


Start the project

I started by initializing an empty Go module named cnat:

go mod init cnat

Next, we'll initialize the controller project:

kubebuilder init --domain ishankhare.dev

Now we'll ask kubebuilder to set up all the necessary scaffolding for our At API:

kubebuilder create api --group cnat --version v1alpha1 --kind At

Kubebuilder will ask whether to create the Resource (the API types under api/v1alpha1) and the Controller (under controllers/). We answer y to both, since we want both scaffolded:

Create Resource [y/n]
y
Create Controller [y/n]
y
Writing scaffold for you to edit...

Test install scaffold

If we now run make install, kubebuilder generates the base CRDs under config/crd/bases (plus a few other files) and installs them into the cluster in the current kube context.

Running make run launches the operator locally, against the cluster in the current kube context. This is really helpful for local testing while we implement and debug the operator's business logic: the live logs show how our code changes affect what the operator actually does.

Enable status subresource

In api/v1alpha1/at_types.go, add the +kubebuilder:subresource:status marker comment above the At struct:

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// At is the Schema for the ats API
type At struct {
.
.
.

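This enables the status subresource, which gives us privilege separation between the spec and status fields: ordinary updates to an At ignore changes to its status, while updates through the status subresource (r.Status().Update in controller-runtime) ignore changes to everything except the status. RBAC can then grant access to the status separately, through the ats/status resource.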
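To make the split concrete, here is a toy model in plain Go of how writes are treated once the status subresource is enabled: a regular update keeps the stored status, and an update through /status keeps the stored spec. The types and the updateMain/updateStatus functions are hypothetical simplifications for illustration, not API server code.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the At types (not the generated API types).
type AtSpec struct {
	Schedule string
	Command  string
}

type AtStatus struct {
	Phase string
}

type At struct {
	Spec   AtSpec
	Status AtStatus
}

// updateMain (hypothetical) models a regular update such as kubectl apply or r.Update:
// the incoming spec is accepted, any change to the status is ignored.
func updateMain(stored, incoming At) At {
	return At{Spec: incoming.Spec, Status: stored.Status}
}

// updateStatus (hypothetical) models an update through the /status subresource,
// such as r.Status().Update: the incoming status is accepted, spec changes are ignored.
func updateStatus(stored, incoming At) At {
	return At{Spec: stored.Spec, Status: incoming.Status}
}

func main() {
	stored := At{Spec: AtSpec{Schedule: "2020-11-16T10:12:00Z", Command: "echo hello world!"}}

	// A user tries to change both the command and the phase with a normal update.
	edit := At{
		Spec:   AtSpec{Schedule: stored.Spec.Schedule, Command: "echo bye"},
		Status: AtStatus{Phase: "DONE"},
	}
	stored = updateMain(stored, edit)
	fmt.Println(stored.Spec.Command, stored.Status.Phase == "") // echo bye true

	// The controller writes the phase through /status; the spec is untouched.
	stored = updateStatus(stored, At{Status: AtStatus{Phase: "PENDING"}})
	fmt.Println(stored.Spec.Command, stored.Status.Phase) // echo bye PENDING
}
```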

Background

A little background on what we're going to build. Our controller is a 'Cloud Native At' (cnat for short), a cloud-native version of the Unix at command. We give it a custom resource containing two main things:

  1. A command to execute
  2. Time to execute the command

Our controller will wait until the scheduled time, then spawn a Pod and run the given command in that Pod. Throughout, it also keeps updating the status of the At resource, which we can view with kubectl.


So let's begin.

CRD outline

This section describes what we want our custom resource to look like. Given the two points listed above, we can represent it as:

apiVersion: cnat.ishankhare.dev/v1alpha1
kind: At
metadata:
  name: at-sample
spec:
  schedule: "2020-11-16T10:12:00Z"
  command: "echo hello world!"

This is enough information for our controller to decide 'when' and 'what' to execute.

Defining the CRD

Now that we know what we expect from the CRD, let's implement it. Go to api/v1alpha1/at_types.go and add the following constants for the phases an At goes through:

const (
    PhasePending = "PENDING"
    PhaseRunning = "RUNNING"
    PhaseDone    = "DONE"
)

We also edit the existing AtSpec and AtStatus structs as follows:

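type AtSpec struct {
  // Schedule is when to run the command, as a UTC timestamp
  // such as "2020-11-16T10:12:00Z"
  Schedule string `json:"schedule,omitempty"`
  // Command is the command to run in the Pod, e.g. "echo hello world!"
  Command  string `json:"command,omitempty"`
}

type AtStatus struct {
  // Phase is one of PENDING, RUNNING or DONE
  Phase string `json:"phase,omitempty"`
}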

We save the file and run:

$ make manifests
go: creating new go.mod: module tmp
go: found sigs.k8s.io/controller-tools/cmd/controller-gen in sigs.k8s.io/controller-tools v0.2.5
/Users/ishankhare/godev/bin/controller-gen "crd:trivialVersions=true" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases

$ make install    # this will install the generated CRD into k8s

We should now see a full CustomResourceDefinition generated for us at config/crd/bases/cnat.ishankhare.dev_ats.yaml. For those new to CRDs: a CRD is what lets Kubernetes understand what our kind: At means. Unlike built-in resource types such as Deployments and Pods, custom types have to be registered with Kubernetes through a CustomResourceDefinition. Let's try creating an At resource:

cat >sample-at.yaml<<EOF
apiVersion: cnat.ishankhare.dev/v1alpha1
kind: At
metadata:
  name: at-sample
spec:
  schedule: "2020-11-16T10:12:00Z"
  command: "echo hello world!"
EOF

If we now apply this manifest with kubectl apply -f and list our At resources:

kubectl apply -f sample-at.yaml
kubectl get at
NAME        AGE
at-sample   124m

So now Kubernetes recognizes our custom type, but it still does NOT know what to do with it. The controller logic is still missing, but we have the skeleton ready to build that logic around.


Implementing the Controller logic

Now we come to the fun part: implementing the logic of our operator. This is what tells Kubernetes what to do with resources of our new type. As some say, it's like "making Kubernetes do tricks!"

Most of the scaffolding has already been generated for us by kubebuilder in main.go and controllers/. Inside controllers/at_controller.go we have the Reconcile function. Think of it as the body of the reconcile loop.

A quick refresher: a controller watches the objects it is responsible for. When one of them changes (and also on periodic resyncs), the controller queues a request for that object and runs its reconcile logic. Reconcile is that logic, the entry point for everything our controller does. It receives the namespaced name of the object rather than the change itself, and works from the object's current state.
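One consequence of this design is that the queue feeding Reconcile holds object keys (namespace/name) rather than individual events, so several changes to the same object that arrive before it is processed collapse into a single Reconcile call. The stdlib-only toy below illustrates just that deduplication. keyQueue and drain are hypothetical names; client-go's real work queue additionally handles rate limiting, delayed adds and concurrent workers.

```go
package main

import "fmt"

// keyQueue is a hypothetical toy FIFO of object keys that ignores a key
// already waiting, mimicking (very loosely) client-go's work queue dedup.
type keyQueue struct {
	order   []string
	pending map[string]bool
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: map[string]bool{}}
}

// Add enqueues key unless it is already waiting to be processed.
func (q *keyQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the next key; ok is false when the queue is empty.
func (q *keyQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key, q.order = q.order[0], q.order[1:]
	delete(q.pending, key)
	return key, true
}

// drain (hypothetical) calls reconcile once per queued key and
// returns how many calls were made.
func drain(q *keyQueue, reconcile func(key string)) int {
	calls := 0
	for key, ok := q.Get(); ok; key, ok = q.Get() {
		reconcile(key)
		calls++
	}
	return calls
}

func main() {
	q := newKeyQueue()
	// Three watch events for the same At and one for another object.
	for _, key := range []string{"cnat/at-sample", "cnat/at-sample", "cnat/other", "cnat/at-sample"} {
		q.Add(key)
	}
	n := drain(q, func(key string) { fmt.Println("reconciling", key) })
	fmt.Println(n) // 2
}
```

Because Reconcile always re-reads the object, it does not matter which of the collapsed events triggered it.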

Let's start by adding some basic structure to it:

func (r *AtReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
    reqLogger := r.Log.WithValues("at", req.NamespacedName)
    reqLogger.Info("=== Reconciling At")

    instance := &cnatv1alpha1.At{}
    err := r.Get(context.TODO(), req.NamespacedName, instance)
    if err != nil {
        if errors.IsNotFound(err) {
            // object not found, could have been deleted after
            // reconcile request, hence don't requeue
            return ctrl.Result{}, nil
        }

        // error reading the object, requeue the request
        return ctrl.Result{}, err
    }

    // if no phase set, default to Pending
    if instance.Status.Phase == "" {
        instance.Status.Phase = cnatv1alpha1.PhasePending
    }

    // state transition PENDING -> RUNNING -> DONE


    return ctrl.Result{}, nil
}

Here we create a logger and an empty At instance, which r.Get fills with the At object named in the request; through it we can read the object and update its status. The function returns the result of the reconciliation and an error. The returned ctrl.Result can optionally ask for the request to be requeued, either via Requeue (a bool; the request goes back through the rate-limited queue) or via RequeueAfter (a time.Duration). Both fields are optional, and an empty ctrl.Result{} means no requeue.

Now let's come to the important part: the state diagram. The image below shows what happens when we create a new At:

[Image: state diagram of an At resource]
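Before writing the switch, it can help to pin the transitions down as a pure function. This is a simplified sketch: nextPhase is a hypothetical helper, and its scheduleReached and podTerminal inputs stand in for what the controller actually checks (the time left until spec.schedule, and whether the Pod has Succeeded or Failed). The phase constants are copied from at_types.go.

```go
package main

import "fmt"

const (
	PhasePending = "PENDING"
	PhaseRunning = "RUNNING"
	PhaseDone    = "DONE"
)

// nextPhase (hypothetical) returns the phase an At should move to.
// An empty phase is treated as PENDING, as in the Reconcile skeleton.
//   scheduleReached: the scheduled time has arrived
//   podTerminal:     the Pod running the command has Succeeded or Failed
func nextPhase(current string, scheduleReached, podTerminal bool) string {
	switch current {
	case "", PhasePending:
		if scheduleReached {
			return PhaseRunning
		}
		return PhasePending
	case PhaseRunning:
		if podTerminal {
			return PhaseDone
		}
		return PhaseRunning
	default:
		// DONE (or anything unknown) is left as it is: nothing more to do.
		return current
	}
}

func main() {
	fmt.Println(nextPhase("", false, false))          // PENDING
	fmt.Println(nextPhase(PhasePending, true, false)) // RUNNING
	fmt.Println(nextPhase(PhaseRunning, true, true))  // DONE
}
```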

Now, let's start implementing this logic in the function:

...
// state transition PENDING -> RUNNING -> DONE
switch instance.Status.Phase {
    case cnatv1alpha1.PhasePending:
        reqLogger.Info("Phase: PENDING")

        diff, err := schedule.TimeUntilSchedule(instance.Spec.Schedule)
        if err != nil {
            reqLogger.Error(err, "Schedule parsing failure")

            return ctrl.Result{}, err
        }

        reqLogger.Info("Schedule parsing done", "Result", fmt.Sprintf("%v", diff))

        if diff > 0 {
            // not yet time to execute; diff is already a time.Duration,
            // so requeue after exactly that long
            return ctrl.Result{RequeueAfter: diff}, nil
        }

        reqLogger.Info("It's time!", "Ready to execute", instance.Spec.Command)
        // change state
        instance.Status.Phase = cnatv1alpha1.PhaseRunning

The schedule.TimeUntilSchedule function is simple to implement:

package schedule

import "time"

func TimeUntilSchedule(schedule string) (time.Duration, error) {
    now := time.Now().UTC()
    layout := "2006-01-02T15:04:05Z"
    scheduledTime, err := time.Parse(layout, schedule)
    if err != nil {
        return time.Duration(0), err
    }

    return scheduledTime.Sub(now), nil
}

The next case in our switch handles the RUNNING phase. This is the most complicated one, so I've added comments wherever I could to explain it.

case cnatv1alpha1.PhaseRunning:
        reqLogger.Info("Phase: RUNNING")

        pod := spawn.NewPodForCR(instance)
        err := ctrl.SetControllerReference(instance, pod, r.Scheme)
        if err != nil {
            // requeue with error
            return ctrl.Result{}, err
        }

        query := &corev1.Pod{}
        // try to see if the pod already exists
        err = r.Get(context.TODO(), req.NamespacedName, query)
        if err != nil && errors.IsNotFound(err) {
            // does not exist, create a pod
            err = r.Create(context.TODO(), pod)
            if err != nil {
                return ctrl.Result{}, err
            }

            // Successfully created a Pod
            reqLogger.Info("Pod Created successfully", "name", pod.Name)
            return ctrl.Result{}, nil
        } else if err != nil {
            // requeue with err
            reqLogger.Error(err, "cannot get pod")
            return ctrl.Result{}, err
        } else if query.Status.Phase == corev1.PodFailed ||
            query.Status.Phase == corev1.PodSucceeded {
            // pod already finished or errored out
            reqLogger.Info("Container terminated", "reason", query.Status.Reason,
                "message", query.Status.Message)
            instance.Status.Phase = cnatv1alpha1.PhaseDone
        } else {
            // don't requeue, it will happen automatically when
            // pod status changes
            return ctrl.Result{}, nil
        }

We call ctrl.SetControllerReference, which tells the Kubernetes runtime that the created Pod is "owned" by this At instance (it records the At as the Pod's controlling owner reference). This will come in handy later, when we watch resources created by our controller.
We also use another helper here, spawn.NewPodForCR, which builds the Pod object for our At custom resource; the Pod itself is created by the r.Create call. It is implemented as follows:

package spawn

import (
    cnatv1alpha1 "cnat/api/v1alpha1"
    "strings"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func NewPodForCR(cr *cnatv1alpha1.At) *corev1.Pod {
    labels := map[string]string{
        "app": cr.Name,
    }

    return &corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{
            Name:      cr.Name,
            Namespace: cr.Namespace,
            Labels:    labels,
        },
        Spec: corev1.PodSpec{
            Containers: []corev1.Container{
                {
                    Name:    "busybox",
                    Image:   "busybox",
                    Command: strings.Split(cr.Spec.Command, " "),
                },
            },
            RestartPolicy: corev1.RestartPolicyOnFailure,
        },
    }
}

The last bits of code handle the DONE phase and the default case of the switch. The status subresource update sits after the switch, as code shared by the phases that fall through to it:

case cnatv1alpha1.PhaseDone:
        reqLogger.Info("Phase: DONE")
        // reconcile without requeuing
        return ctrl.Result{}, nil
    default:
        reqLogger.Info("NOP")
        return ctrl.Result{}, nil
    }

    // update status
    err = r.Status().Update(context.TODO(), instance)
    if err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil

Lastly, we put the controller reference to use by updating the SetupWithManager function in the same file:

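// corev1 is k8s.io/api/core/v1
func (r *AtReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&cnatv1alpha1.At{}).
        Owns(&corev1.Pod{}). // also watch Pods whose controller owner is an At
        Complete(r)
}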

The Owns(&corev1.Pod{}) part is important: it tells the controller manager that Pods created (and owned) by this controller also need to be watched for changes, and that a change to such a Pod should trigger a reconcile of the At that owns it.
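Under the hood, SetControllerReference adds an entry to the Pod's metadata.ownerReferences that points at the At and is marked as the controller, and the watch registered by Owns uses that entry to map a Pod event back to the At to reconcile. The stdlib-only sketch below imitates that lookup with hypothetical, trimmed-down types; the real metav1.OwnerReference also carries APIVersion, UID and BlockOwnerDeletion, and its Controller field is a *bool.

```go
package main

import "fmt"

// OwnerReference is a hypothetical, trimmed-down stand-in for metav1.OwnerReference.
type OwnerReference struct {
	Kind       string
	Name       string
	Controller bool // a *bool in the real type
}

// Pod keeps only the metadata this sketch needs (hypothetical).
type Pod struct {
	Namespace       string
	Name            string
	OwnerReferences []OwnerReference
}

// ownerKey (hypothetical) returns the "namespace/name" key of the controlling
// owner of the given kind, if any. An owner shares its Pod's namespace.
func ownerKey(p Pod, ownerKind string) (string, bool) {
	for _, ref := range p.OwnerReferences {
		if ref.Controller && ref.Kind == ownerKind {
			return p.Namespace + "/" + ref.Name, true
		}
	}
	return "", false
}

func main() {
	pod := Pod{
		Namespace: "cnat",
		Name:      "at-sample",
		OwnerReferences: []OwnerReference{
			{Kind: "At", Name: "at-sample", Controller: true},
		},
	}
	// A status change on this Pod would enqueue a reconcile for its At.
	fmt.Println(ownerKey(pod, "At")) // cnat/at-sample true

	// A Pod without a controller reference maps to nothing.
	_, ok := ownerKey(Pod{Namespace: "cnat", Name: "stray"}, "At")
	fmt.Println(ok) // false
}
```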

If you want to look at the entire final implementation, head over to this GitHub repo: ishankhare07/kubebuilder-controller.
With this in place, our controller is complete. We can run and test it against the cluster in our current kube context using make run:

go: creating new go.mod: module tmp
go: found sigs.k8s.io/controller-tools/cmd/controller-gen in sigs.k8s.io/controller-tools v0.2.5
/Users/ishankhare/godev/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
/Users/ishankhare/godev/bin/controller-gen "crd:trivialVersions=true" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases
go run ./main.go
2020-11-18T05:05:45.531+0530    INFO    controller-runtime.metrics      metrics server is starting to listen    {"addr": ":8080"}
2020-11-18T05:05:45.531+0530    INFO    setup   starting manager
2020-11-18T05:05:45.531+0530    INFO    controller-runtime.manager      starting metrics server {"path": "/metrics"}
2020-11-18T05:05:45.531+0530    INFO    controller-runtime.controller   Starting EventSource    {"controller": "at", "source": "kind source: /, Kind="}
2020-11-18T05:05:45.732+0530    INFO    controller-runtime.controller   Starting EventSource    {"controller": "at", "source": "kind source: /, Kind="}
2020-11-18T05:05:46.232+0530    INFO    controller-runtime.controller   Starting Controller     {"controller": "at"}
2020-11-18T05:05:46.232+0530    INFO    controller-runtime.controller   Starting workers        {"controller": "at", "worker count": 1}
2020-11-18T05:05:46.232+0530    INFO    controllers.At  === Reconciling At      {"at": "cnat/at-sample"}
2020-11-18T05:05:46.232+0530    INFO    controllers.At  Phase: PENDING  {"at": "cnat/at-sample"}
2020-11-18T05:05:46.232+0530    INFO    controllers.At  Schedule parsing done   {"at": "cnat/at-sample", "Result": "-14053h23m46.232658s"}
2020-11-18T05:05:46.232+0530    INFO    controllers.At  It's time!      {"at": "cnat/at-sample", "Ready to execute": "echo YAY!"}
2020-11-18T05:05:46.436+0530    DEBUG   controller-runtime.controller   Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
2020-11-18T05:05:46.443+0530    INFO    controllers.At  === Reconciling At      {"at": "cnat/at-sample"}
2020-11-18T05:05:46.443+0530    INFO    controllers.At  Phase: RUNNING  {"at": "cnat/at-sample"}
2020-11-18T05:05:46.718+0530    INFO    controllers.At  Pod Created successfully        {"at": "cnat/at-sample", "name": "at-sample"}
2020-11-18T05:05:46.718+0530    DEBUG   controller-runtime.controller   Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
2020-11-18T05:05:46.722+0530    INFO    controllers.At  === Reconciling At      {"at": "cnat/at-sample"}
2020-11-18T05:05:46.722+0530    INFO    controllers.At  Phase: RUNNING  {"at": "cnat/at-sample"}
2020-11-18T05:05:46.722+0530    DEBUG   controller-runtime.controller   Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
2020-11-18T05:05:46.778+0530    INFO    controllers.At  === Reconciling At      {"at": "cnat/at-sample"}
2020-11-18T05:05:46.778+0530    INFO    controllers.At  Phase: RUNNING  {"at": "cnat/at-sample"}
2020-11-18T05:05:46.778+0530    DEBUG   controller-runtime.controller   Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
2020-11-18T05:05:46.846+0530    INFO    controllers.At  === Reconciling At      {"at": "cnat/at-sample"}
2020-11-18T05:05:46.847+0530    INFO    controllers.At  Phase: RUNNING  {"at": "cnat/at-sample"}
2020-11-18T05:05:46.847+0530    DEBUG   controller-runtime.controller   Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
2020-11-18T05:05:49.831+0530    INFO    controllers.At  === Reconciling At      {"at": "cnat/at-sample"}
2020-11-18T05:05:49.831+0530    INFO    controllers.At  Phase: RUNNING  {"at": "cnat/at-sample"}
2020-11-18T05:05:49.831+0530    INFO    controllers.At  Container terminated    {"at": "cnat/at-sample", "reason": "", "message": ""}
2020-11-18T05:05:50.054+0530    DEBUG   controller-runtime.controller   Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
2020-11-18T05:05:50.056+0530    INFO    controllers.At  === Reconciling At      {"at": "cnat/at-sample"}
2020-11-18T05:05:50.056+0530    INFO    controllers.At  Phase: DONE     {"at": "cnat/at-sample"}
2020-11-18T05:05:50.056+0530    DEBUG   controller-runtime.controller   Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}

If you're interested in seeing the command's output, we can run:

$ kubectl logs -f at-sample
hello world!

This post was first published on my personal blog ishankhare.dev
