mark mwendia

Posted on Oct 4, 2024

Building a Custom Kubernetes Operator in Go: A Step-by-Step Guide

Kubernetes has become a critical tool for managing containerized applications. However, as applications and workloads scale, manual operations like managing stateful applications or complex workflows can become challenging. Kubernetes Operators address these complexities by automating the management of custom applications in a Kubernetes-native way.

An operator extends Kubernetes with domain-specific knowledge and best practices to automate tasks like backups, scaling, upgrades, and failovers. Operators do this by continuously monitoring the state of a system and taking automated actions based on the current and desired state, making them an excellent tool for managing complex workloads like databases, monitoring systems, or any domain-specific tasks that require complex logic.
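The idea of comparing current and desired state can be sketched in a few lines of plain Go. This is only an illustration: the State type and the reconcile function are hypothetical names, not part of Kubernetes or Kubebuilder, and a real operator would act on the cluster instead of returning a string.

```go
package main

import "fmt"

// State is a hypothetical, simplified view of a workload: just a replica count.
type State struct {
	Replicas int
}

// reconcile compares the desired state with the observed state and returns
// a description of the action needed to converge. It does not modify its inputs.
func reconcile(desired, observed State) string {
	switch diff := desired.Replicas - observed.Replicas; {
	case diff > 0:
		return fmt.Sprintf("scale up by %d", diff)
	case diff < 0:
		return fmt.Sprintf("scale down by %d", -diff)
	default:
		return "no action"
	}
}

func main() {
	desired := State{Replicas: 3}
	observed := State{Replicas: 1}
	fmt.Println(reconcile(desired, observed))
}
```

Because the function only looks at the two states and not at what happened in between, it can be called repeatedly and still produce the right action.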

In this tutorial, you’ll learn how to build a custom Kubernetes operator using Go. We’ll create a simple operator that automates a custom resource type, which will allow us to manage a specialized resource via Kubernetes. The steps covered will show you how to define Custom Resource Definitions (CRDs), implement control loops, and handle resource reconciliation with real-life code examples. By the end, you'll have built an operator that automates the creation and management of a custom resource in Kubernetes.

Prerequisites

Before we jump into building the operator, ensure that you have the following tools and skills at your disposal. Operators interact deeply with Kubernetes internals, so some prior knowledge of Kubernetes and Go is required.

Kubernetes Cluster: You will need a working Kubernetes cluster. You can use a cloud provider like Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS), or Azure Kubernetes Service (AKS), or you can use Minikube to create a local cluster. Familiarity with Kubernetes objects like Pods, Deployments, Services, and how the Kubernetes API works is essential.
kubectl CLI: kubectl is the command-line tool used to interact with the Kubernetes API server. It allows you to inspect the cluster's state, manipulate resources, and apply configurations. You can install it from the Kubernetes documentation. Once installed, verify your cluster connection with:

kubectl get nodes

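Programming Language: Operators can be written in any language, but Go is the most common choice because the Kubernetes client libraries and tooling such as Kubebuilder are written in it, and its concurrency model suits long-running controllers. You will need Go installed (version 1.18+ recommended; newer Kubebuilder releases may require a more recent Go version). Install Go from Go's official website and verify the installation: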

go version

Kubebuilder: Kubebuilder is a powerful framework that scaffolds operator projects. It reduces the complexity of building Kubernetes APIs and operators by providing a structured, opinionated way to build these tools. You can install Kubebuilder by running:

curl -L -o kubebuilder "https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)"
chmod +x kubebuilder && sudo mv kubebuilder /usr/local/bin/

Docker: Operators run inside Kubernetes as containerized applications, so you will need Docker to build the operator's container image and push it to a registry. Install Docker from the official Docker website, and confirm the installation:

docker --version

controller-runtime Library: Kubebuilder relies on the controller-runtime library, which simplifies building Kubernetes controllers by abstracting common operations. Kubebuilder automatically adds this dependency to your project, but it’s worth familiarizing yourself with how this library works as it will be central to your operator’s functionality.

Once you’ve ensured you have all the tools installed, we can move on to building the operator.

Step 1: Setting up the Environment

The first step in creating a Kubernetes operator is setting up the project environment. We will use Kubebuilder to scaffold the initial project structure.

Create a New Project Directory: Start by creating a project directory to house your operator code. Inside this directory, you will have various files, such as the CRD definitions, the controller logic, and the necessary Kubernetes configurations.

mkdir custom-operator && cd custom-operator

Initialize the Operator Project: The kubebuilder init command scaffolds the base structure of an operator project. This structure includes directories for API definitions, controller logic, and configuration files. The command also sets up Go modules for dependency management.

kubebuilder init --domain mydomain.com --repo github.com/your-username/custom-operator

This command initializes the project with your specified domain (mydomain.com in this case) and sets the repository path for the operator. After this command and the create api command in the next step, the project contains the following important directories:

api/: This directory will store the custom resource definitions (CRDs) and related Go types.
controllers/: This will hold the controller logic, where you’ll implement how the operator manages resources (recent Kubebuilder releases scaffold this as internal/controller/ instead).
config/: Here, you’ll find Kubernetes manifest files for deploying the operator and other necessary configurations.

Step 2: Defining a Custom Resource (CRD)

Custom Resource Definitions (CRDs) allow you to define new resource types in Kubernetes. These are used by the operator to create and manage custom objects, which are not natively supported by Kubernetes. In this step, we’ll create a custom resource named CustomResource.

Create an API Resource: To create a custom resource, you need to scaffold an API with Kubebuilder. The create api command automatically generates the Go code that defines the API group, version, and kind of your custom resource.

kubebuilder create api --group sample --version v1 --kind CustomResource

When prompted, answer "yes" to both creating the resource and the controller. This will generate the necessary code for the CRD and the controller to manage it.
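The --domain, --group, and --version flags together determine the apiVersion that manifests for this resource must use. A small sketch of how the pieces combine (the apiVersion helper is a hypothetical name used only for illustration):

```go
package main

import "fmt"

// apiVersion joins group, domain, and version the way they appear in the
// apiVersion field of a manifest: <group>.<domain>/<version>.
func apiVersion(group, domain, version string) string {
	return fmt.Sprintf("%s.%s/%s", group, domain, version)
}

func main() {
	// Values taken from the kubebuilder init and create api commands above.
	fmt.Println(apiVersion("sample", "mydomain.com", "v1"))
}
```

With the flags used in this tutorial, the result is sample.mydomain.com/v1, and the kind is CustomResource.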

Define the Custom Resource Structure: Navigate to the file api/v1/customresource_types.go. Here, you will define the structure of your custom resource, which describes the fields Kubernetes will expect when the user creates an instance of the resource. You’ll define the Spec (desired state) and Status (current state).

type CustomResourceSpec struct {
    // Schedule defines when the job should run
    Schedule string `json:"schedule,omitempty"`

    // Replicas specifies how many replicas should be created
    Replicas int32 `json:"replicas,omitempty"`
}

type CustomResourceStatus struct {
    // AvailableReplicas is the number of replicas that are currently running
    AvailableReplicas int32 `json:"availableReplicas,omitempty"`
}

Schedule: The schedule field allows users to specify when the custom resource should take some action, like a cron job.
Replicas: This field specifies how many replicas should be managed by the operator. It’s a simple yet powerful mechanism to scale applications.
Status: This field allows the operator to track the current state of the resource, such as the number of replicas currently running.
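The json struct tags decide how these fields are named when the resource is serialized. The sketch below copies the spec type so it compiles on its own and uses only the standard library; the encode helper is a hypothetical name. It also shows one consequence of omitempty: a zero value is dropped, so replicas: 0 looks the same as replicas not being set.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CustomResourceSpec is a copy of the spec type from the tutorial.
type CustomResourceSpec struct {
	Schedule string `json:"schedule,omitempty"`
	Replicas int32  `json:"replicas,omitempty"`
}

// encode returns the JSON form of a spec, or the error text if encoding fails.
func encode(spec CustomResourceSpec) string {
	b, err := json.Marshal(spec)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Both fields set: both appear under their tag names.
	fmt.Println(encode(CustomResourceSpec{Schedule: "* * * * *", Replicas: 3}))
	// Replicas left at zero: omitempty removes the field from the output.
	fmt.Println(encode(CustomResourceSpec{Schedule: "* * * * *"}))
}
```

If zero replicas needs to be a meaningful setting, one common option is a pointer field (*int32), so that unset and zero can be told apart.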

Generate the CRD Manifests: After defining the custom resource structure, run the following commands to generate the CRD YAML manifests:

make generate
make manifests

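make generate regenerates the Go helper code (such as DeepCopy methods) for your types, and make manifests creates the CRD manifests under the config/crd/ directory, which Kubernetes uses to register the custom resource type in the cluster.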

Step 3: Implementing the Controller Logic

The controller is the core logic of your operator. It is responsible for continuously monitoring the state of custom resources and ensuring that the actual state matches the desired state described in the CRD. In this section, we will implement the controller for our custom resource.

Open the Controller File: Open the file controllers/customresource_controller.go. Inside this file, you’ll find the Reconcile function, which is the heart of the controller logic. This function is executed whenever Kubernetes detects a change in the custom resource or its associated resources.
Understanding the Reconcile Function: The Reconcile function retrieves the current state of a custom resource and compares it to the desired state defined in the Spec. Based on this comparison, the operator takes corrective actions to align the actual state with the desired state. Here’s a simple example of how the reconciliation process can be implemented:

func (r *CustomResourceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Named logger rather than log so the variable does not shadow the log package
    logger := log.FromContext(ctx)

    // Fetch the CustomResource instance
    customResource := &samplev1.CustomResource{}
    if err := r.Get(ctx, req.NamespacedName, customResource); err != nil {
        // A NotFound error means the resource was deleted, which is not a failure
        if client.IgnoreNotFound(err) != nil {
            logger.Error(err, "unable to fetch CustomResource")
        }
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Get the desired number of replicas from the Spec
    desiredReplicas := customResource.Spec.Replicas

    // Log the schedule and replicas
    logger.Info("Reconciling CustomResource", "Replicas", desiredReplicas, "Schedule", customResource.Spec.Schedule)

    // Placeholder: mirror the desired count into the status.
    // A real operator would record the replicas it actually observes.
    customResource.Status.AvailableReplicas = desiredReplicas
    if err := r.Status().Update(ctx, customResource); err != nil {
        logger.Error(err, "unable to update CustomResource status")
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil
}

In this example:

The controller fetches the current instance of the CustomResource.
It logs the desired Replicas and Schedule from the custom resource's Spec.
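It updates the Status of the resource. In this simplified example the status is set to the desired replica count as a placeholder; a real operator would record the number of replicas it actually observes.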
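Reconcile can run many times for the same object, so it helps to write to the API server only when something actually changed. The following standard-library sketch shows that check in isolation; the Status type and the syncStatus helper are hypothetical stand-ins for the generated types, and the bool result is where a real controller would decide whether to call r.Status().Update.

```go
package main

import "fmt"

// Status is a simplified stand-in for CustomResourceStatus.
type Status struct {
	AvailableReplicas int32
}

// syncStatus records the observed replica count and reports whether the
// status changed, so the caller can skip the update call when it did not.
func syncStatus(status *Status, observed int32) bool {
	if status.AvailableReplicas == observed {
		return false
	}
	status.AvailableReplicas = observed
	return true
}

func main() {
	status := &Status{}
	fmt.Println(syncStatus(status, 3)) // first pass: status changes
	fmt.Println(syncStatus(status, 3)) // second pass: nothing to write
}
```

Keeping the comparison in a small function like this also makes the reconcile logic easy to unit test without a cluster.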

Registering the Controller with the Manager: In the controllers/customresource_controller.go file, register the controller with the manager by ensuring that the following code is present:

func (r *CustomResourceReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&samplev1.CustomResource{}).
        Complete(r)
}

The manager is responsible for initializing and running the controller. It listens for changes to custom resources and triggers the Reconcile function when necessary.
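Behind the scenes, controller-runtime turns change events into namespace/name keys on a work queue, and keys that are already waiting are not queued twice. The sketch below is a much simplified, hypothetical model of that behavior in plain Go (the queue type and its methods are invented for illustration and are not the library's API). It shows why Reconcile receives only a key, not the event that caused it.

```go
package main

import "fmt"

// queue is a simplified stand-in for a controller work queue. It stores
// object keys and ignores keys that are already pending.
type queue struct {
	pending map[string]bool
	order   []string
}

func newQueue() *queue {
	return &queue{pending: map[string]bool{}}
}

// add enqueues a key unless the same key is already waiting.
func (q *queue) add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// drain calls reconcile once per pending key, in arrival order, and
// returns how many times reconcile was called.
func (q *queue) drain(reconcile func(key string)) int {
	calls := 0
	for len(q.order) > 0 {
		key := q.order[0]
		q.order = q.order[1:]
		delete(q.pending, key)
		reconcile(key)
		calls++
	}
	return calls
}

func main() {
	q := newQueue()
	// Three rapid changes to the same object and one to another object.
	q.add("default/example-cr")
	q.add("default/example-cr")
	q.add("default/example-cr")
	q.add("default/other-cr")

	calls := q.drain(func(key string) {
		fmt.Println("reconciling", key)
	})
	fmt.Println("reconcile calls:", calls)
}
```

Since several events can collapse into one call, Reconcile should always read the latest state of the object rather than assume what changed.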

Step 4: Packaging the Operator in a Container

Now that you’ve written the controller logic, the next step is to package your operator as a Docker container. The operator will run as a Kubernetes deployment, so it needs to be containerized.

Building the Docker Image: First, build a Docker image for your operator. You will find a Dockerfile in the project directory, which defines how the image should be built. Use the following command to build the image locally:

make docker-build IMG=<your-docker-image>

Replace <your-docker-image> with the repository and tag where the image should be stored (e.g., dockerhubuser/custom-operator:v1).

Push the Image to a Registry: Once the image is built, push it to a container registry like Docker Hub or Google Container Registry (GCR):

docker push <your-docker-image>

This step is crucial because Kubernetes will pull the image from this registry when deploying the operator.

Deploying the Operator: With the image pushed, deploy the operator to the Kubernetes cluster by running:

make deploy IMG=<your-docker-image>

This command applies the manifests under config/, which install the CRD, the RBAC rules, and a Kubernetes deployment for the operator. The deployment includes the operator’s container, which continuously runs the reconciliation loop to monitor and manage the custom resource.

Step 5: Testing the Operator

With the operator running in your Kubernetes cluster, you can test its functionality by creating instances of the custom resource and observing the operator’s behavior.

Create a Custom Resource Instance: Write a YAML file, customresource.yaml, that defines an instance of the custom resource:

apiVersion: sample.mydomain.com/v1
kind: CustomResource
metadata:
  name: example-cr
spec:
  schedule: "* * * * *"
  replicas: 3

This YAML file creates a custom resource with a schedule that runs every minute and specifies three replicas.
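The operator in this tutorial accepts whatever values the manifest contains, so a typo in schedule would go unnoticed. A minimal validation sketch, assuming the standard five-field cron format (minute, hour, day of month, month, day of week); validateSpec is a hypothetical helper, and it checks only the number of fields, not their contents:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateSpec performs two basic checks on values taken from a manifest:
// the schedule must have five whitespace-separated fields, and replicas
// must not be negative.
func validateSpec(schedule string, replicas int32) error {
	if n := len(strings.Fields(schedule)); n != 5 {
		return fmt.Errorf("schedule %q has %d fields, want 5", schedule, n)
	}
	if replicas < 0 {
		return errors.New("replicas must not be negative")
	}
	return nil
}

func main() {
	// The values from customresource.yaml pass both checks.
	fmt.Println(validateSpec("* * * * *", 3))
	// A schedule with too few fields is rejected.
	fmt.Println(validateSpec("* * *", 3))
}
```

A check like this could be called at the start of Reconcile; Kubebuilder also supports validation markers and webhooks, which reject bad input before it is stored.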

Apply the Custom Resource: Use kubectl to apply the custom resource to your cluster:

kubectl apply -f customresource.yaml

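Verifying the Operator’s Behavior: Once the custom resource is created, the operator should pick it up, reconcile the state, and log the desired number of replicas and the schedule. You can check the logs of the operator to verify this. With Kubebuilder's default configuration, the operator runs as a Deployment named <project>-controller-manager in the <project>-system namespace, so for this project the command is typically:

kubectl logs -n custom-operator-system deployment/custom-operator-controller-manager

If your names differ, kubectl get deployments -A will show where the operator was deployed.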

In the logs, you should see messages that indicate the desired number of replicas and the schedule, as defined in the custom resource.

Step 6: Extending the Operator

At this point, you have a working but minimal operator: it logs the spec and mirrors the desired replica count into the status. Its functionality can be extended in numerous ways. Below are some potential enhancements you can implement to create a more robust operator:

Job Scheduling: Implement job scheduling using Go's cron libraries (such as robfig/cron). This could be used to automate scheduled tasks, such as periodic backups, scaling operations, or database maintenance.

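c := cron.New()
// AddFunc reports an invalid schedule through its error return value,
// which is ignored here for brevity.
c.AddFunc("@every 1h", func() {
    fmt.Println("Executing scheduled task")
})
// Start runs the scheduler in its own goroutine and returns immediately.
c.Start()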
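An alternative that stays inside the reconcile loop is to compute how long to wait until the next run and return that delay from Reconcile through the RequeueAfter field of ctrl.Result. The sketch below shows only the delay calculation, using the standard library. The at and nextDelay helpers and the hourly constant are hypothetical names, and it assumes the time of the last run is stored somewhere the operator can read it, such as a status field.

```go
package main

import (
	"fmt"
	"time"
)

// hourly mirrors the "@every 1h" schedule used in the cron snippet.
const hourly = time.Hour

// at builds a fixed UTC timestamp so the example is deterministic.
func at(hour, minute int) time.Time {
	return time.Date(2024, time.October, 4, hour, minute, 0, 0, time.UTC)
}

// nextDelay returns how long to wait until the task is due again.
// It returns 0 when the task is already due or overdue.
func nextDelay(lastRun, now time.Time, interval time.Duration) time.Duration {
	due := lastRun.Add(interval)
	if !due.After(now) {
		return 0
	}
	return due.Sub(now)
}

func main() {
	// Last run at 10:00, now 10:30: wait another 30 minutes.
	fmt.Println(nextDelay(at(10, 0), at(10, 30), hourly))
	// Last run at 10:00, now 11:30: overdue, run immediately.
	fmt.Println(nextDelay(at(10, 0), at(11, 30), hourly))
}
```

Compared with an in-process cron scheduler, this approach keeps no timer state in memory, so the schedule survives operator restarts as long as the last run time is persisted.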

Handling External Services: Your operator could manage external services, like databases, by automatically provisioning resources based on the custom resource’s specifications. For example, if the custom resource defines a database, the operator could trigger a Helm chart installation to deploy that database automatically.
Advanced Reconciliation Logic: Instead of a simple reconciliation loop, you could implement more advanced reconciliation logic, such as monitoring the health of external services, scaling based on external metrics, or even handling failure recovery.
Operator Metrics: Expose custom Prometheus metrics for your operator to track performance, resource usage, and error rates. This is particularly useful in production environments where monitoring the health of the operator itself is essential.
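As a starting point for the metrics idea, the sketch below counts reconcile calls and failures by wrapping the reconcile function. It is an in-memory stand-in written with the standard library only: the metrics type, instrument, snapshot, and errBoom are hypothetical names, and in a real operator the two counters would be Prometheus counters exposed through the operator's metrics endpoint.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errBoom is a sample error used to simulate a failed reconcile.
var errBoom = errors.New("boom")

// metrics holds two counters that are safe for concurrent use.
type metrics struct {
	total  int64
	failed int64
}

// instrument wraps a reconcile function so that every call is counted
// and every returned error increments the failure counter.
func (m *metrics) instrument(reconcile func() error) func() error {
	return func() error {
		atomic.AddInt64(&m.total, 1)
		err := reconcile()
		if err != nil {
			atomic.AddInt64(&m.failed, 1)
		}
		return err
	}
}

// snapshot returns the current counter values.
func (m *metrics) snapshot() (total, failed int64) {
	return atomic.LoadInt64(&m.total), atomic.LoadInt64(&m.failed)
}

func main() {
	m := &metrics{}
	succeed := m.instrument(func() error { return nil })
	fail := m.instrument(func() error { return errBoom })

	_ = succeed()
	_ = fail()
	_ = succeed()

	total, failed := m.snapshot()
	fmt.Printf("reconciles=%d errors=%d\n", total, failed)
}
```

Wrapping the function keeps the counting separate from the reconcile logic, so the same pattern can later record durations or other measurements.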

Conclusion

Building a Kubernetes operator in Go using Kubebuilder is a powerful way to extend Kubernetes’ functionality and automate complex operations for your custom applications. In this tutorial, you learned how to define a custom resource, implement a controller, and deploy an operator in Kubernetes. With this foundation, you can extend your operator to automate more complex workflows, integrate external services, and make your Kubernetes environment even more dynamic and responsive.

As you grow your operator, consider adding automated testing, CI/CD pipelines, and monitoring to ensure that it functions reliably in production environments. The operator pattern is a core component of the Kubernetes ecosystem and provides an efficient way to encapsulate human knowledge into automated operations that work seamlessly in cloud-native environments.