
Ankur Sinha

From client-go to controller-runtime: Rebuilding a Kubernetes Controller

In my previous article, I built a Kubernetes controller from scratch using client-go, informers, and workqueues.

If you haven't read it yet, you can check it here:

👉 https://dev.to/ankrsinha/from-crds-to-controllers-building-a-kubernetes-custom-controller-from-scratch-3ibk

In that project, I built Mini Task Runner, a simplified Tekton-like system where a Task defines container steps and a TaskRun triggers their execution. The controller watches TaskRun resources and creates a Pod that runs those steps.

While building the controller with raw client-go primitives helped me understand how Kubernetes controllers work internally, most real-world controllers are built on a higher-level framework called controller-runtime, usually scaffolded with tools such as Kubebuilder or the Operator SDK. Other systems like Tekton use similar abstractions built on top of client-go.

In this post, I rebuild the same Mini Task Runner controller using controller-runtime and explore how the architecture changes compared to the manual client-go implementation.


1. Migration Motivation

Why move from client-go → controller-runtime

The client-go library provides the fundamental building blocks required to interact with the Kubernetes API:

  • Informers
  • Listers
  • Workqueues
  • Typed clients

These primitives are powerful, but they are also low-level. When writing controllers directly with client-go, developers must assemble all of these components manually.

In my first controller implementation, I had to:

  • Create informer factories
  • Attach event handlers
  • Maintain a rate-limited workqueue
  • Implement worker goroutines
  • Handle retry logic
  • Manage relationships between resources

All of this plumbing has to be in place before the actual reconciliation logic even begins.
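To make the pattern concrete, here is a deliberately simplified, dependency-free sketch of the worker loop a manual controller assembles by hand. None of these types are the real client-go ones (those are `workqueue.RateLimitingInterface` and the informer factories); this only illustrates the pop-reconcile-requeue cycle and retry limit the developer must write themselves:

```go
package main

import "fmt"

// workQueue is a stand-in for client-go's rate-limited workqueue,
// for illustration only.
type workQueue struct {
	items   []string
	retries map[string]int
}

func (q *workQueue) add(key string) { q.items = append(q.items, key) }

func (q *workQueue) get() (string, bool) {
	if len(q.items) == 0 {
		return "", false
	}
	key := q.items[0]
	q.items = q.items[1:]
	return key, true
}

// runWorker pops keys and "reconciles" them, requeueing failures up to
// maxRetries — the retry logic a manual controller must implement itself.
// failUntil simulates a reconcile that fails its first N attempts.
func runWorker(failUntil, maxRetries int) (attempts int) {
	q := &workQueue{retries: map[string]int{}}
	q.add("default/taskrun-1")
	for {
		key, ok := q.get()
		if !ok {
			return attempts
		}
		attempts++
		if q.retries[key] < failUntil { // reconcile "failed"
			q.retries[key]++
			if q.retries[key] <= maxRetries {
				q.add(key) // requeue, as AddRateLimited would
			}
		}
	}
}

func main() {
	// Two simulated failures followed by a success: three attempts total.
	fmt.Println("attempts:", runWorker(2, 5))
}
```

Even this toy version shows how much mechanism sits around the one line that matters: the call that actually reconciles the object.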

controller-runtime was introduced to simplify this process by providing higher-level abstractions. Instead of wiring controller infrastructure manually, developers can focus primarily on reconciliation logic.

This is why many production Kubernetes controllers use controller-runtime as their foundation.

Problems in Manual Controllers

Writing the controller manually exposed several pain points.

A large portion of the code was dedicated to controller infrastructure rather than business logic. Informers had to be initialized, caches needed to be synced, event handlers registered, and worker routines had to continuously process items from a workqueue.

Handling relationships between resources also required additional logic. For example, when a Pod changed state, the controller had to explicitly map that update back to the corresponding TaskRun. This often required adding labels or writing custom mapping logic.

Workqueue management was another responsibility. If reconciliation failed, the key had to be requeued using rate limiting to avoid overwhelming the API server.

None of this logic was directly related to the actual goal of the controller, which was simply:

Observe a TaskRun and ensure a Pod exists that executes the Task.

This made the controller more complex than necessary.


2. Architecture Changes

Migrating to controller-runtime significantly simplified the architecture. The controller now revolves around four core ideas:

  • Manager
  • Reconciler
  • Cached Client
  • Automatic Workqueue

Manager

In controller-runtime, everything begins with the controller manager.

The manager acts as the central runtime for controllers. It is responsible for:

  • starting controllers
  • maintaining shared caches
  • providing Kubernetes clients
  • coordinating controller lifecycle

The controller registers itself with the manager, which ensures it runs continuously.

A typical initialization looks like this:

mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
    Scheme: scheme,
})
if err != nil {
    log.Fatalf("unable to create manager: %v", err)
}

Once started, the manager runs all registered controllers.

Controller Builder Pattern

In controller-runtime, controllers are typically registered using the Controller Builder Pattern. This pattern connects the controller with the manager and defines which resources should trigger reconciliation.

ctrl.NewControllerManagedBy(mgr).
    For(&miniv1.TaskRun{}).
    Owns(&corev1.Pod{}).
    Complete(&TaskRunReconciler{
        Client: mgr.GetClient(),
        Scheme: mgr.GetScheme(),
    })

Here, TaskRun is the primary resource being watched by the controller. The Owns(&corev1.Pod{}) declaration tells controller-runtime to also watch Pods created by the controller. Whenever a Pod owned by a TaskRun changes state, the corresponding TaskRun is automatically enqueued for reconciliation.

The Complete() call registers the TaskRunReconciler, which contains the reconciliation logic executed for each event.

Reconciler

Instead of manually implementing worker loops, controller-runtime uses the Reconciler pattern.

The framework automatically calls a Reconcile() function whenever a relevant resource event occurs.

The reconciler receives a request containing the resource's namespace and name. Its responsibility is straightforward:

  1. Fetch the current state of the resource
  2. Compare it with the desired state
  3. Apply changes to move the system toward that desired state

In the Mini Task Runner controller, reconciliation follows a simple state machine based on the TaskRun phase:

  • If the TaskRun is new → create a Pod
  • If the Pod is running → update status
  • If the Pod finishes → mark success or failure

The reconciler focuses purely on this logic.
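That state machine can be sketched as a pure function over observed state. The phase names and the `podExists`/`podDone`/`podOK` inputs are assumptions for illustration, not the controller's actual field names:

```go
package main

import "fmt"

// Hypothetical phase names for illustration; the real controller's
// TaskRun phases may be named differently.
type phase string

const (
	phasePending   phase = "Pending"
	phaseRunning   phase = "Running"
	phaseSucceeded phase = "Succeeded"
	phaseFailed    phase = "Failed"
)

// reconcile returns the next phase and the action the controller would
// take. podExists, podDone, and podOK stand in for what the real
// reconciler reads from the Pod owned by the TaskRun.
func reconcile(current phase, podExists, podDone, podOK bool) (phase, string) {
	switch current {
	case "", phasePending:
		if !podExists {
			return phasePending, "create pod" // new TaskRun: create its Pod
		}
		return phaseRunning, "pod observed, update status"
	case phaseRunning:
		if !podDone {
			return phaseRunning, "requeue to recheck pod"
		}
		if podOK {
			return phaseSucceeded, "mark success"
		}
		return phaseFailed, "mark failure"
	default:
		return current, "nothing to do" // terminal phases are left alone
	}
}

func main() {
	next, action := reconcile("", false, false, false)
	fmt.Println(next, "->", action)
	next, action = reconcile(phaseRunning, true, true, true)
	fmt.Println(next, "->", action)
}
```

Keeping the decision logic this flat is what makes reconcilers easy to test: the function can be exercised with every state combination without touching a cluster.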

Cached Client

controller-runtime provides a cached Kubernetes client.

Reads from the cluster are served from a local cache maintained by shared informers rather than directly hitting the API server.

This provides two advantages:

  • Reduced API server load
  • Faster read operations

Fetching a resource therefore looks very simple:

var tr miniv1.TaskRun
if err := r.Get(ctx, req.NamespacedName, &tr); err != nil {
    // The TaskRun may have been deleted; nothing to do in that case.
    return ctrl.Result{}, client.IgnoreNotFound(err)
}

The framework manages cache synchronization internally.
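As a toy model of that read path (every name here is invented; the real cache lives inside the manager and is filled by watches):

```go
package main

import "fmt"

// cachedClient models controller-runtime's cached reads: Get is answered
// from a local store kept fresh by watches instead of a live API call.
type cachedClient struct {
	cache    map[string]string // informer-maintained local copy of the cluster
	apiCalls int               // live API server round trips we had to make
}

func (c *cachedClient) get(key string) (string, bool) {
	if v, ok := c.cache[key]; ok {
		return v, true // cache hit: no API server traffic
	}
	c.apiCalls++ // a miss would fall back to (or fail without) a live call
	return "", false
}

func main() {
	c := &cachedClient{cache: map[string]string{"default/taskrun-1": "Running"}}
	v, _ := c.get("default/taskrun-1")
	fmt.Println(v, "api calls:", c.apiCalls)
}
```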

Automatic Workqueue

Another major difference from the client-go implementation is that the workqueue is no longer explicitly managed in the code.

controller-runtime automatically creates and manages the workqueue.

Events such as:

  • TaskRun creation
  • TaskRun updates
  • Pod updates

automatically enqueue reconciliation requests.

The framework also handles:

  • retry behavior
  • rate limiting
  • worker execution

This removes a significant amount of boilerplate code.


3. Benefits Observed

After migrating the controller, several improvements became clear.

Reduced Boilerplate

Most of the setup code required for informers, queues, and workers disappeared.

In the previous controller, a large portion of the code was dedicated to wiring infrastructure components. With controller-runtime, the controller setup became much shorter and easier to read.

Built-in Retries

controller-runtime automatically retries reconciliation when errors occur.

If the Reconcile function returns an error, the request is requeued with exponential backoff.

This ensures the controller remains resilient without explicitly writing retry logic.

Cleaner Design

The controller structure becomes easier to reason about.

Instead of thinking about informers, worker threads, and workqueues, the developer focuses on reconciliation — comparing desired state with actual state and applying the necessary changes.

Easier Ownership Handling

controller-runtime provides utilities for managing ownership relationships between resources.

For example, when creating a Pod for a TaskRun, the Pod can be set as a child resource using:

if err := ctrl.SetControllerReference(&tr, pod, r.Scheme); err != nil {
    return ctrl.Result{}, err
}

This automatically enables:

  • garbage collection of Pods when the TaskRun is deleted
  • reconciliation when the Pod status changes

Maintainability

With less infrastructure code and clearer separation of responsibilities, the controller becomes easier to extend and maintain.

Future improvements can be implemented by modifying the reconciliation logic rather than changing controller wiring.


4. Challenges Faced

Although controller-runtime simplifies controller development, there were still a few areas that required careful attention.

Status Updates

Updating the status field of a custom resource must be done using the status subresource API.

Instead of using the normal client update, the controller must call:

if err := r.Status().Update(ctx, &tr); err != nil {
    return ctrl.Result{}, err
}

This distinction is important because Kubernetes treats spec and status updates differently.

Cache Behavior

Because the client reads from a local cache, the result of a write operation might not immediately appear in subsequent reads.

For example, right after creating a Pod, the cache might not yet contain that object.

To handle this safely, reconciliation logic must be idempotent, meaning the controller should behave correctly even if the same operation is attempted multiple times.
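Idempotency can be illustrated with a toy ensure-style helper. `fakeStore` and `ensurePod` are invented names; real code would `r.Get` the Pod first and treat an AlreadyExists error from `r.Create` as success rather than a failure:

```go
package main

import "fmt"

// fakeStore stands in for the cluster state visible to the controller.
type fakeStore map[string]bool

// ensurePod creates the Pod only if it is not already present, so running
// the same reconcile twice (e.g. on a stale cache read) is harmless.
func ensurePod(store fakeStore, name string) (created bool) {
	if store[name] {
		return false // already exists: reconcile is a no-op
	}
	store[name] = true
	return true
}

func main() {
	store := fakeStore{}
	fmt.Println(ensurePod(store, "taskrun-1-pod")) // first reconcile creates it
	fmt.Println(ensurePod(store, "taskrun-1-pod")) // repeat reconcile does nothing
}
```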

Requeue Logic

Some situations require the controller to revisit an object after a short delay.

In the Mini Task Runner, while a Pod is running, the controller periodically rechecks its status.

controller-runtime supports this using:

return ctrl.Result{RequeueAfter: 5 * time.Second}, nil

Choosing when to rely on events versus explicit requeueing required some experimentation.

Debugging Reconciliation

Because reconciliation is event-driven and asynchronous, debugging sometimes requires adding detailed logs to understand when and why reconciliation is triggered.

Structured logging helped trace the lifecycle of a TaskRun.


5. Final Outcome

After completing the migration, the controller showed several improvements compared to the original implementation.

Performance

Using cached clients reduces the number of direct API server calls. This improves scalability and ensures the controller behaves efficiently as the number of resources grows.

Simplicity

The controller code became significantly shorter and easier to understand.

Most of the complexity related to informers and workqueues is now handled by controller-runtime, allowing the code to focus primarily on business logic.

Production Readiness

controller-runtime follows patterns widely used in modern Kubernetes controllers.

After rebuilding the Mini Task Runner controller using this framework, the system aligns more closely with how real-world Kubernetes operators are implemented.

The controller was also containerized and pushed to GitHub Container Registry (GHCR). The image was then deployed inside the cluster using a standard Kubernetes Deployment.

When running inside the cluster, the controller uses Kubernetes in-cluster configuration instead of a local kubeconfig file. This allows the controller to communicate with the API server using the service account mounted inside the Pod.

Because the controller needs to create Pods and update TaskRun resources, appropriate RBAC permissions were defined using a ServiceAccount, ClusterRole, and ClusterRoleBinding. This ensures the controller has only the permissions required to reconcile resources inside the cluster.
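As a rough illustration, a ClusterRole granting only those permissions might look like the following. The API group `mini.example.com` and all names here are assumptions; the manifests in the repository are the source of truth:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: mini-task-runner
rules:
  - apiGroups: [""]                    # core API group, for Pods
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create"]
  - apiGroups: ["mini.example.com"]    # assumed API group of the TaskRun CRD
    resources: ["taskruns"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["mini.example.com"]
    resources: ["taskruns/status"]     # status subresource needs its own rule
    verbs: ["get", "update", "patch"]
```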


Want to Try the controller-runtime Version?

This article focuses on the architectural changes introduced by controller-runtime. If you want to explore the implementation or run it yourself, you can find the code in the same repository.

The controller-runtime version of Mini Task Runner is available in the fork2 branch:

👉 https://github.com/ankrsinha/mini-task/tree/fork2

The repository contains instructions for:

  • building the controller
  • containerizing it with Docker
  • pushing the image to GitHub Container Registry (GHCR)
  • deploying the controller in a Kubernetes cluster

Closing Thoughts

Building the first controller with raw client-go helped me understand the mechanics behind Kubernetes controllers: informers, caches, workqueues, and worker loops.

Migrating the same controller to controller-runtime showed how those mechanics can be abstracted into a cleaner and more maintainable framework.

Both approaches are valuable learning experiences. But if the goal is to build controllers that resemble those used in production Kubernetes projects, controller-runtime provides a much more practical starting point.


Authors

  • Ankur Sinha
  • Aditya Shinde
