Ankur Sinha

Posted on Feb 24

From CRDs to Controllers: Building a Kubernetes Custom Controller from Scratch

#devops #opensource #kubernetes #go

If you’ve spent any time in the cloud-native ecosystem, you’ve likely used tools like Tekton, Argo Workflows, or Crossplane. They feel like magic: you define some custom configurations, apply them, and suddenly Kubernetes is orchestrating complex CI/CD pipelines or provisioning cloud infrastructure.

But have you ever wondered how that magic actually works?

Recently, I decided to peel back the curtain and build Mini Task Runner, a simplified, Tekton-like task execution system built from scratch. In this post, we are going to dive deep into Kubernetes extensibility. We’ll explore why Custom Resources exist, how they interact with etcd and the Kube API Server, why code generation does so much of the heavy lifting, and how to write a production-grade, event-driven Controller.

You can find the full source code for this project on my GitHub: ankrsinha/mini-task.

1. The Extensibility Problem: Going Beyond Pods and Deployments

Out of the box, Kubernetes understands a specific set of "Standard Resources": Pods, Deployments, Services, ReplicaSets, etc. When you submit a Deployment to the API server, it stores that configuration in etcd (the cluster's key-value datastore). Built-in controllers (like the Deployment Controller) never talk to etcd directly; they watch the API server for these changes and react, in this case by creating ReplicaSets, which in turn spin up Pods.

But what if you want Kubernetes to understand a CI/CD Pipeline? Or a custom Database Cluster?

Kubernetes provides an extensibility mechanism called Custom Resource Definitions (CRDs). A CRD allows you to define your own API schema.

For the Mini Task Runner, I needed two CRDs:

Task (The Template): A declarative list of steps, where each step defines a specific container image and a script to run.
TaskRun (The Invocation): A trigger that tells Kubernetes to execute a specific Task. It includes a reference to the Task and a status section to track its current phase.

Here is the catch: If you just apply a CRD to a cluster, Kubernetes will happily accept your custom definitions, validate them, and save them in etcd. But nothing will happen. CRDs are just inert data. To bring them to life, you need a Custom Controller. A controller is the active "brain" that watches the API Server for changes to your Custom Resources and takes action to reconcile the desired state with the actual state.

2. The Code Generation Hurdle: Why We Need It

Before we can write a controller in Go, our application needs to understand our custom API types. Since our Task and TaskRun types don't exist in the standard Kubernetes libraries, we have to generate the plumbing.

In Kubernetes, every API object must implement the runtime.Object interface, which requires a DeepCopyObject method that safely clones the object in memory. Writing these deep copy functions by hand for every nested struct is tedious and error-prone. We also need a way to communicate with the Kubernetes API server using strictly typed clients.

To solve this, we rely on the Kubernetes code-generation toolkit (k8s.io/code-generator). By defining our Go structs and adding marker comments such as // +genclient above them, the generator automatically produces:

DeepCopy Methods: Safely clones our custom objects.
A Typed Clientset: A custom client strictly typed for our newly created API group.
Informers and Listers: Essential components for building an event-driven architecture.
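To see why deep copies matter, consider what happens when a struct containing a slice is copied with plain assignment. The sketch below uses a hand-written DeepCopy on a simplified, hypothetical TaskSpec type; it only mirrors the general shape of what the generator emits and is not the generated code itself.

```go
package main

import "fmt"

// Step and TaskSpec are simplified, hypothetical versions of the custom types.
type Step struct {
	Image  string
	Script string
}

type TaskSpec struct {
	Steps []Step
}

// DeepCopy returns a clone whose Steps slice has its own backing array.
func (in *TaskSpec) DeepCopy() *TaskSpec {
	if in == nil {
		return nil
	}
	out := new(TaskSpec)
	if in.Steps != nil {
		out.Steps = make([]Step, len(in.Steps))
		copy(out.Steps, in.Steps)
	}
	return out
}

func main() {
	cached := &TaskSpec{Steps: []Step{{Image: "alpine", Script: "echo hi"}}}

	// Plain assignment copies the slice header only, so both values
	// share the same underlying Step elements.
	shallow := *cached
	shallow.Steps[0].Image = "busybox"
	fmt.Println("after shallow edit:", cached.Steps[0].Image) // busybox

	// Reset, then edit a deep copy: the original is untouched.
	cached.Steps[0].Image = "alpine"
	deep := cached.DeepCopy()
	deep.Steps[0].Image = "busybox"
	fmt.Println("after deep edit:", cached.Steps[0].Image) // alpine
}
```

This matters most for objects read from a shared cache: a shallow edit would silently change what every other reader sees.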

Now, our program is equipped with two clients: a core client to talk to the Kube API Server for standard resources (like creating Pods), and a custom client to manage our Task and TaskRun resources.

3. Building the Brains: The Controller Architecture

The Naive Approach: Polling

A naive way to build a controller is through polling. You could write a loop that queries the API Server every 5 seconds, essentially asking: "Give me all TaskRuns. Now give me all Pods. Do they match?"

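Polling is an anti-pattern here. Every cycle issues full List requests, so load on the API server (and on etcd behind it) grows with the number of objects and controllers, and a change can go unnoticed for up to a full polling interval. The control plane is designed around watches, not repeated List calls.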

The Idiomatic Approach: Event-Driven Architecture

Instead, production-grade Kubernetes controllers use an Event-Driven Architecture built on four core components: Informers, Listers, Workqueues, and Workers.

How It Works:

Informers: Instead of polling, an Informer performs one initial List and then opens a long-lived HTTP connection (a WATCH request) to the Kube API Server. Whenever a TaskRun or Pod is Added, Updated, or Deleted, the API server streams an event to the Informer with low latency.
Listers (Local Cache): The Informer maintains a local, in-memory cache of the cluster state. When our controller needs to read a TaskRun, it queries the Lister's local cache instead of hitting the API server over the network, so reads are fast and add no API server load. The cache is shared and may lag the server slightly, so objects read from it must be deep-copied before they are modified.
Workqueue: When an Informer delivers an event, our event handler extracts the object's identifier (its namespace/name key) and pushes it into a Rate-Limited Workqueue. The queue acts as a buffer and deduplicates keys, so a burst of updates to one resource collapses into a single pending item instead of a backlog.
Workers: These are background goroutines that run concurrently. Each one blocks on the Workqueue until a key is available, then hands it to the reconciliation logic. Having multiple workers lets the controller process different resources in parallel, while the queue ensures the same key is never handled by two workers at once.

In our code, we simply wire up event handlers so that whenever a resource is modified, the relevant function pushes the item's key into our queue, ready for the next available worker to pick it up.
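The queue-and-workers part of this design can be sketched with nothing but the standard library. The keyQueue type below is a toy stand-in written for this post, not client-go's workqueue: it has no rate limiting and no in-flight tracking. It only illustrates two ideas from the list above: the queue holds keys rather than objects, and duplicate keys that are still waiting collapse into one.

```go
package main

import (
	"fmt"
	"sync"
)

// keyQueue is a toy, hypothetical queue of resource keys.
type keyQueue struct {
	mu      sync.Mutex
	cond    *sync.Cond
	pending map[string]bool // keys currently waiting in order
	order   []string        // FIFO order of waiting keys
	closed  bool
}

func newKeyQueue() *keyQueue {
	q := &keyQueue{pending: map[string]bool{}}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Add enqueues a key unless it is already waiting or the queue is closed.
func (q *keyQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed || q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
	q.cond.Signal()
}

// Get blocks until a key is available. It returns false once the queue
// has been shut down and fully drained.
func (q *keyQueue) Get() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.order) == 0 && !q.closed {
		q.cond.Wait()
	}
	if len(q.order) == 0 {
		return "", false
	}
	key := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// ShutDown stops new additions and wakes all blocked workers.
func (q *keyQueue) ShutDown() {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.closed = true
	q.cond.Broadcast()
}

func main() {
	q := newKeyQueue()
	// Three rapid events for the same TaskRun collapse into one key.
	for i := 0; i < 3; i++ {
		q.Add("default/build-run-1")
	}
	q.Add("default/test-run-1")
	q.ShutDown()

	var wg sync.WaitGroup
	var mu sync.Mutex
	processed := map[string]int{}
	for w := 0; w < 2; w++ { // two concurrent workers
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				key, ok := q.Get()
				if !ok {
					return
				}
				mu.Lock()
				processed[key]++ // stand-in for the reconcile call
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(len(processed), processed["default/build-run-1"]) // 2 1
}
```

Queuing keys instead of full objects is what keeps reconciliation level-triggered: by the time a worker picks up the key, it reads the latest state from the cache rather than a stale snapshot attached to the event.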

4. The Heartbeat: The Reconciliation Loop

The core processing of our controller is driven by the Workers. A worker constantly pulls a key off the Workqueue and passes it to our reconciliation function. Reconciliation is level-triggered, meaning it doesn't care what specific event just happened; it only looks at the current state of the cluster and decides what to do next to reach the desired state.

Here is the step-by-step logic of the Mini Task Runner reconciliation loop:

Fetch the State
The worker retrieves the TaskRun key (namespace/name) from the queue and fetches the full object from the local Lister cache.
Check the Phase
- If the Phase is Succeeded or Failed, no action is taken.
- If the Phase is empty (indicating a new run):
  - The referenced Task is fetched.
  - Task steps are converted into a list of container definitions.
  - A Pod is created with restartPolicy: Never.
  - The Phase is set to Pending.
- If the Phase is Pending or Running:
  - The execution Pod is checked.
Sync Pod Status to TaskRun Status
- The Pod is fetched from the cache.
- If the Pod is running:
  - The Phase is set to Running.
  - The start time is recorded.
- If the Pod has succeeded:
  - The Phase is set to Succeeded.
  - The finish time is recorded.
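The phase logic above can be sketched as a small pure function. This is a simplified model for illustration, not the project's reconcile code: the function name nextPhase is hypothetical, Pod creation and timestamps are left to the caller, and mapping a failed Pod to the Failed phase is an assumption, since the steps above only spell out the running and succeeded cases.

```go
package main

import "fmt"

// Phase is the TaskRun status phase.
type Phase string

const (
	PhaseNew       Phase = "" // an empty phase marks a brand-new TaskRun
	PhasePending   Phase = "Pending"
	PhaseRunning   Phase = "Running"
	PhaseSucceeded Phase = "Succeeded"
	PhaseFailed    Phase = "Failed"
)

// nextPhase decides what the TaskRun phase should be, given its current
// phase and the observed Pod phase ("Pending", "Running", "Succeeded",
// "Failed", or "" when no Pod exists yet).
func nextPhase(current Phase, podPhase string) Phase {
	switch current {
	case PhaseSucceeded, PhaseFailed:
		return current // terminal: nothing left to do
	case PhaseNew:
		return PhasePending // caller creates the Pod at this point
	}
	// Pending or Running: mirror what the Pod reports.
	switch podPhase {
	case "Running":
		return PhaseRunning
	case "Succeeded":
		return PhaseSucceeded
	case "Failed":
		return PhaseFailed // assumption: not spelled out in the steps above
	}
	return current // Pod not scheduled yet or phase unknown: no change
}

func main() {
	fmt.Println(nextPhase(PhaseNew, ""))                 // Pending
	fmt.Println(nextPhase(PhasePending, "Running"))      // Running
	fmt.Println(nextPhase(PhaseRunning, "Succeeded"))    // Succeeded
	fmt.Println(nextPhase(PhaseSucceeded, "Running"))    // Succeeded
	fmt.Println(nextPhase(PhaseRunning, "Running"))      // Running
}
```

Because the function depends only on current state and never on which event fired, calling it twice with the same inputs gives the same answer. That is what makes duplicate or missed events harmless.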

Once the worker completes this reconciliation pass, it signals the queue that the item is done. If an error occurs (like a network blip), the worker does not crash or drop the item; it places the key back into the Workqueue with exponential backoff, so each retry of that key waits longer than the last. This ensures we retry later without flooding an already struggling API server.

By updating the TaskRun status using our custom client, we push the new state back to the API Server and etcd, completing the loop.

5. The Developer Experience: A Custom Kubectl Plugin

Writing YAML files manually every time you want to execute a script is tedious. Great tools like Tekton come with great CLI companions.

Because kubectl is extensible by design, any executable named kubectl-<command> on your PATH can be invoked as kubectl <command>, just like a built-in command.

To streamline the user experience, I built a small binary that takes a Task name as an argument, generates a TaskRun object with a unique identifier, and pushes it directly to the cluster.

By compiling this and adding it to the system path, triggering a pipeline becomes as simple as running a standard kubectl command. Shortly after the command is executed, our custom controller's Informer receives the new object, a worker pulls its key from the queue, creates the Pod, and begins tracking its execution.
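As a rough sketch of what such a plugin does, the program below takes a Task name, derives a unique TaskRun name, and prints the resulting manifest. Several things here are assumptions for illustration: the API group example.com/v1alpha1, the taskRef field, the "-run-" naming scheme, and the helper names runName and manifest. It also prints YAML instead of pushing the object to the cluster with the generated clientset, which is what the plugin described above does.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
)

// runName appends a random 6-character hex suffix so that repeated
// invocations for the same Task produce distinct TaskRun names.
func runName(task string) (string, error) {
	buf := make([]byte, 3)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return task + "-run-" + hex.EncodeToString(buf), nil
}

// manifest renders a TaskRun as YAML. The apiVersion and field names
// are hypothetical placeholders.
func manifest(task, name string) string {
	return fmt.Sprintf(`apiVersion: example.com/v1alpha1
kind: TaskRun
metadata:
  name: %s
spec:
  taskRef: %s
`, name, task)
}

func main() {
	// When installed as kubectl-task-start, `kubectl task start hello`
	// passes "hello" as the first argument. The default keeps this
	// sketch runnable without arguments.
	task := "hello"
	if len(os.Args) > 1 {
		task = os.Args[1]
	}

	name, err := runName(task)
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not generate name:", err)
		os.Exit(1)
	}
	fmt.Print(manifest(task, name))
}
```

With this sketch, the output could be piped into kubectl apply -f - to create the object; a plugin that talks to the API server directly removes that extra step.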

Want to Build or Run This Yourself?

This post covers the high-level architecture, but if you want to get your hands dirty, I've written a comprehensive step-by-step guide in the project repository.

Head over to the ankrsinha/mini-task README where you will find:

Complete installation instructions for the CRDs.
Commands to run the code-generator.
A quickstart guide to running the custom kubectl task start plugin on your own cluster.

Final Thoughts

Building Mini Task Runner completely demystified the Kubernetes control plane for me. Here are the major takeaways:

CRDs are just databases without an engine: Extensibility requires both the data schema (the CRD) and the logic (the Custom Controller).
Code Generation is a superpower: Generating clients, informers, and deep-copies saves thousands of lines of boilerplate and ensures your components talk to the API Server safely.
Informers & Workers > Polling: The event-driven architecture using Informers, local caches, workqueues, and concurrent workers is what allows Kubernetes to scale massively without bringing etcd to a halt.

If you want to transition from a Kubernetes user to a Kubernetes developer, I highly recommend building your own operator or controller.

Authors

Ankur Sinha

Aditya Shinde

DEV Community