<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Steven Sklar</title>
    <description>The latest articles on DEV Community by Steven Sklar (@sklarsa).</description>
    <link>https://dev.to/sklarsa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F10838%2Fb37e9361-f18b-45e8-aa4d-30c80118f548.jpeg</url>
      <title>DEV Community: Steven Sklar</title>
      <link>https://dev.to/sklarsa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sklarsa"/>
    <language>en</language>
    <item>
      <title>How do Kubernetes Operators Handle Concurrency?</title>
      <dc:creator>Steven Sklar</dc:creator>
      <pubDate>Wed, 09 Oct 2024 15:17:08 +0000</pubDate>
      <link>https://dev.to/sklarsa/how-do-kubernetes-operators-handle-concurrency-47n5</link>
      <guid>https://dev.to/sklarsa/how-do-kubernetes-operators-handle-concurrency-47n5</guid>
      <description>&lt;p&gt;Originally posted &lt;a href="https://sklar.rocks/kubernetes-operator-concurrency/" rel="noopener noreferrer"&gt;on my blog&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By default, operators built using &lt;a href="https://book.kubebuilder.io/" rel="noopener noreferrer"&gt;Kubebuilder&lt;/a&gt; and &lt;a href="https://github.com/kubernetes-sigs/controller-runtime" rel="noopener noreferrer"&gt;controller-runtime&lt;/a&gt; process a single reconcile request at a time. This is a sensible setting, since it's easier for operator developers to reason about and debug the logic in their applications. It also constrains throughput from the controller to core Kubernetes resources like ectd and the API server.&lt;/p&gt;

&lt;p&gt;But what if your work queue starts backing up and average reconciliation times increase due to requests that are left sitting in the queue, waiting to be processed? Luckily for us, a controller-runtime &lt;code&gt;Controller&lt;/code&gt; struct includes a &lt;code&gt;MaxConcurrentReconciles&lt;/code&gt; field (as I previously mentioned in my &lt;a href="//@/kubebuilder-tips.md"&gt;Kubebuilder Tips article&lt;/a&gt;). This option allows you to set the number of concurrent reconcile loops that are running in a single controller. So with a value above 1, you can reconcile multiple Kubernetes resources simultaneously.&lt;/p&gt;

&lt;p&gt;Early in my operator journey, one question that I had was how could we guarantee that the same resource isn't being reconciled at the same time by 2 or more goroutines? With &lt;code&gt;MaxConcurrentReconciles&lt;/code&gt; set above 1, this could lead to all sorts of race conditions and undesirable behavior, as the state of an object inside a reconciliation loop could change via a side-effect from an external source (a reconciliation loop running in a different thread).&lt;/p&gt;

&lt;p&gt;I thought about this for a while, and even implemented a &lt;code&gt;sync.Map&lt;/code&gt;-based approach that would allow a goroutine to acquire a lock for a given resource (based on its namespace/name).&lt;/p&gt;

&lt;p&gt;It turns out that all of this effort was for naught, since I recently learned (in a k8s slack channel) that the controller workqueue already includes this feature! Albeit with a simpler implementation.&lt;/p&gt;

&lt;p&gt;This is a quick story about how a k8s controller's workqueue guarantees that unique resources are reconciled sequentially. So even if &lt;code&gt;MaxConcurrentReconciles&lt;/code&gt; is set above 1, you can be confident that only a single reconciliation function is acting on any given resource at a time.&lt;/p&gt;

&lt;h2&gt;
  
  
  client-go/util
&lt;/h2&gt;

&lt;p&gt;Controller-runtime uses the &lt;a href="https://github.com/kubernetes/client-go/tree/master/util/workqueue" rel="noopener noreferrer"&gt;client-go/util/workqueue&lt;/a&gt; library to implement its underlying reconciliation queue. In the package's &lt;a href="https://github.com/kubernetes/client-go/blob/master/util/workqueue/doc.go" rel="noopener noreferrer"&gt;doc.go&lt;/a&gt; file, a comment states that the workqueue supports these properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fair: items processed in the order in which they are added.&lt;/li&gt;
&lt;li&gt;Stingy: a single item will not be processed multiple times concurrently, and if an item is added multiple times before it can be processed, it will only be processed once.&lt;/li&gt;
&lt;li&gt;Multiple consumers and producers. In particular, it is allowed for an item to be re-enqueued while it is being processed.&lt;/li&gt;
&lt;li&gt;Shutdown notifications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Wait a second... My answer is right here in the second bullet, the "Stingy" property! According to these docs, the queue will automatically handle this concurrency issue for me, without having to write a single line of code. Let's run through the implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does the workqueue work?
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;workqueue&lt;/code&gt; struct has 3 main methods, &lt;code&gt;Add&lt;/code&gt;, &lt;code&gt;Get&lt;/code&gt;, and &lt;code&gt;Done&lt;/code&gt;. Inside a controller, an informer would &lt;code&gt;Add&lt;/code&gt; reconcile requests (namespaced-names of generic k8s resources) to the workqueue. A reconcile loop running in a separate goroutine would then &lt;code&gt;Get&lt;/code&gt; the next request from the queue (blocking if it is empty). The loop would perform whatever custom logic is written in the controller, and then the controller would call &lt;code&gt;Done&lt;/code&gt; on the queue, passing in the reconcile request as an argument. This would start the process over again, and the reconcile loop would call &lt;code&gt;Get&lt;/code&gt; to retrieve the next work item.&lt;/p&gt;

&lt;p&gt;This is similar to processing messages in RabbitMQ, where a worker pops an item off the queue, processes it, and then sends an "Ack" back to the message broker indicating that processing has completed and it's safe to remove the item from the queue.&lt;/p&gt;

&lt;p&gt;Still, I have an operator running in production that powers &lt;a href="https://cloud.questdb.com/" rel="noopener noreferrer"&gt;QuestDB Cloud's&lt;/a&gt; infrastructure, and wanted to be sure that the workqueue works as advertised. So a wrote a quick test to validate its behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  A little test
&lt;/h2&gt;

&lt;p&gt;Here is a simple test that validates the "Stingy" property:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main_test&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"testing"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/stretchr/testify/assert"&lt;/span&gt;

    &lt;span class="s"&gt;"k8s.io/client-go/util/workqueue"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestWorkqueueStingyProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;

    &lt;span class="c"&gt;// Create a new workqueue and add a request&lt;/span&gt;
    &lt;span class="n"&gt;wq&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;workqueue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Len&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// Subsequent adds of an identical object&lt;/span&gt;
    &lt;span class="c"&gt;// should still result in a single queued one&lt;/span&gt;
    &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Len&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// Getting the object should remove it from the queue&lt;/span&gt;
    &lt;span class="c"&gt;// At this point, the controller is processing the request&lt;/span&gt;
    &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Len&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// But re-adding an identical request before it is marked as "Done"&lt;/span&gt;
    &lt;span class="c"&gt;// should be a no-op, since we don't want to process it simultaneously&lt;/span&gt;
    &lt;span class="c"&gt;// with the first one&lt;/span&gt;
    &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Len&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// Once the original request is marked as Done, the second&lt;/span&gt;
    &lt;span class="c"&gt;// instance of the object will be now available for processing&lt;/span&gt;
    &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Len&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// And since it is available for processing, it will be&lt;/span&gt;
    &lt;span class="c"&gt;// returned by a Get call&lt;/span&gt;
    &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Len&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since the workqueue uses a mutex under the hood, this behavior is threadsafe. So even if I wrote more tests that used multiple goroutines simultaneously reading and writing from the queue at high speeds in an attempt to break it, the workqueue's actual behavior would be the same as that of our single-threaded test.&lt;/p&gt;

&lt;h2&gt;
  
  
  All is not lost
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsklar.rocks%2Fimg%2Fblog%2Fkubernetes-operator-concurrency%2Fkubernetes-did-it.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsklar.rocks%2Fimg%2Fblog%2Fkubernetes-operator-concurrency%2Fkubernetes-did-it.webp" alt="Kubernetes did it" width="485" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are a lot of little gems like this hiding in the Kubernetes standard libraries, some of which are in not-so-obvious places (like a controller-runtime workqueue found in the go client package). Despite this discovery, and others like it that I've made in the past, I still feel that my previous attempts at solving these issues are not complete time-wasters. They force you to think critically about fundamental problems in distributed systems computing, and help you to understand more of what is going on under the hood. So that by the time I've discovered that "Kubernetes did it", I'm relieved that I can simplify my codebase and perhaps remove some unnecessary unit tests.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>k8s</category>
      <category>go</category>
      <category>kubebuilder</category>
    </item>
    <item>
      <title>How to add Kubernetes-powered leader election to your Go apps</title>
      <dc:creator>Steven Sklar</dc:creator>
      <pubDate>Fri, 19 Jul 2024 20:03:36 +0000</pubDate>
      <link>https://dev.to/sklarsa/how-to-add-kubernetes-powered-leader-election-to-your-go-apps-57jh</link>
      <guid>https://dev.to/sklarsa/how-to-add-kubernetes-powered-leader-election-to-your-go-apps-57jh</guid>
      <description>&lt;p&gt;Originally published &lt;a href="https://sklar.rocks/kubernetes-leader-election/" rel="noopener noreferrer"&gt;on by blog&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Kubernetes standard library is full of gems, hidden away in many of the various subpackages that are part of the ecosystem. One such example that I discovered recently &lt;a href="https://pkg.go.dev/k8s.io/client-go/tools/leaderelection" rel="noopener noreferrer"&gt;k8s.io/client-go/tools/leaderelection&lt;/a&gt;, which can be used to add a leader election protocol to any application running inside a Kubernetes cluster. This article will discuss what leader election is, how it's implemented in this Kubernetes package, and provide an example of how we can use this library in our own applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Leader Election
&lt;/h2&gt;

&lt;p&gt;Leader election is a distributed systems concept that is a core building block of highly-available software. It allows for multiple concurrent processes to coordinate amongst each other and elect a single "leader" process, which is then responsible for performing synchronous actions like writing to a data store.&lt;/p&gt;

&lt;p&gt;This is useful in systems like distributed databases or caches, where multiple processes are running to create redundancy against hardware or network failures, but can't write to storage simultaneously to ensure data consistency. If the leader process becomes unresponsive at some point in the future, the remaining processes will kick off a new leader election, eventually picking a new process to act as the leader.&lt;/p&gt;

&lt;p&gt;Using this concept, we can create highly-available software with a single leader and multiple standby replicas.&lt;/p&gt;

&lt;p&gt;In Kubernetes, the &lt;a href="https://github.com/kubernetes-sigs/controller-runtime" rel="noopener noreferrer"&gt;controller-runtime&lt;/a&gt; package uses &lt;a href="https://sdk.operatorframework.io/docs/building-operators/golang/advanced-topics/#leader-election" rel="noopener noreferrer"&gt;leader election&lt;/a&gt; to make controllers highly-available. In a controller deployment, resource reconciliation only occurs when a process is the leader, and other replicas are waiting on standby. If the leader pod becomes unresponsive, the remaining replicas will elect a new leader to perform subsequent reconciliations and resume normal operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kubernetes Leases
&lt;/h2&gt;

&lt;p&gt;This library uses a &lt;a href="https://kubernetes.io/docs/concepts/architecture/leases/" rel="noopener noreferrer"&gt;Kubernetes Lease&lt;/a&gt;, or distributed lock, that can be obtained by a process. Leases are native Kubernetes resources that are held by a single identity, for a given duration, with a renewal option.  Here's an example spec from the docs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coordination.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Lease&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;apiserver.kubernetes.io/identity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-apiserver&lt;/span&gt;
    &lt;span class="na"&gt;kubernetes.io/hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;master-1&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apiserver-07a5ea9b9b072c4a5f3d1c3702&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;holderIdentity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apiserver-07a5ea9b9b072c4a5f3d1c3702_0c8914f7-0f35-440e-8676-7844977d3a05&lt;/span&gt;
  &lt;span class="na"&gt;leaseDurationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;
  &lt;span class="na"&gt;renewTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2023-07-04T21:58:48.065888Z"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Leases are used by the k8s ecosystem in three ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Node Heartbeats&lt;/strong&gt;: Every Node has a corresponding Lease resource and updates its &lt;code&gt;renewTime&lt;/code&gt; field on an ongoing basis. If a Lease's &lt;code&gt;renewTime&lt;/code&gt; hasn't been updated in a while, the Node will be tainted as not available and no more Pods will be scheduled to it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leader Election&lt;/strong&gt;: In this case, a Lease is used to coordinate among multiple processes by having a leader update the Lease's &lt;code&gt;holderIdentity&lt;/code&gt;. Standby replicas, with different identities, are stuck waiting for the Lease to expire. If the Lease does expire, and is not renewed by the leader, a new election takes place in which the remaining replicas attempt to take ownership of the Lease by updating its &lt;code&gt;holderIdentity&lt;/code&gt; with their own. Since the Kubernetes API server disallows updates to stale objects, only a single standby node will successfully be able to update the Lease, at which point it will continue execution as the new leader.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Server Identity&lt;/strong&gt;: Starting in v1.26, as a beta feature, each &lt;code&gt;kube-apiserver&lt;/code&gt; replica will publish its identity by creating a dedicated Lease. Since this is a relatively slim, new feature, there's not much else that can be derived from the Lease object aside from how many API servers are running. But this does leave room to add more metadata to these Leases in future k8s versions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now let's explore this second use case of Leases by writing a sample program to demonstrate how you can use them in leader election scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example Program
&lt;/h2&gt;

&lt;p&gt;In this code example, we are using the &lt;code&gt;leaderelection&lt;/code&gt; package to handle the leader election and Lease manipulation specifics.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;

    &lt;span class="s"&gt;"k8s.io/client-go/tools/leaderelection"&lt;/span&gt;
    &lt;span class="n"&gt;rl&lt;/span&gt; &lt;span class="s"&gt;"k8s.io/client-go/tools/leaderelection/resourcelock"&lt;/span&gt;
    &lt;span class="n"&gt;ctrl&lt;/span&gt; &lt;span class="s"&gt;"sigs.k8s.io/controller-runtime"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c"&gt;// lockName and lockNamespace need to be shared across all running instances&lt;/span&gt;
    &lt;span class="n"&gt;lockName&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"my-lock"&lt;/span&gt;
    &lt;span class="n"&gt;lockNamespace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"default"&lt;/span&gt;

    &lt;span class="c"&gt;// identity is unique to the individual process. This will not work for anything,&lt;/span&gt;
    &lt;span class="c"&gt;// outside of a toy example, since processes running in different containers or&lt;/span&gt;
    &lt;span class="c"&gt;// computers can share the same pid.&lt;/span&gt;
    &lt;span class="n"&gt;identity&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getpid&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// Get the active kubernetes context&lt;/span&gt;
    &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// Create a new lock. This will be used to create a Lease resource in the cluster.&lt;/span&gt;
    &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;rl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewFromKubeconfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;rl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LeasesResourceLock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;lockNamespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;lockName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;rl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResourceLockConfig&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;Identity&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;identity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// Create a new leader election configuration with a 15 second lease duration.&lt;/span&gt;
    &lt;span class="c"&gt;// Visit https://pkg.go.dev/k8s.io/client-go/tools/leaderelection#LeaderElectionConfig&lt;/span&gt;
    &lt;span class="c"&gt;// for more information on the LeaderElectionConfig struct fields&lt;/span&gt;
    &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;leaderelection&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewLeaderElector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;leaderelection&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LeaderElectionConfig&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;          &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;LeaseDuration&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;RenewDeadline&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;RetryPeriod&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;          &lt;span class="n"&gt;lockName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Callbacks&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;leaderelection&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LeaderCallbacks&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;OnStartedLeading&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nb"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I am the leader!"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;OnStoppedLeading&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nb"&gt;println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I am not the leader anymore!"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;OnNewLeader&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;      &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;identity&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"the leader is %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// Begin the leader election process. This will block.&lt;/span&gt;
    &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What's nice about the &lt;code&gt;leaderelection&lt;/code&gt; package is that it provides a callback-based framework for handling leader elections. This way, you can act on specific state changes in a granular way and properly release resources when a new leader is elected. By running these callbacks in separate goroutines, the package takes advantage of Go's strong concurrency support to efficiently utilize machine resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing it out
&lt;/h3&gt;

&lt;p&gt;To test this, lets spin up a test cluster using &lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;kind&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kind create cluster
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy the sample code into &lt;code&gt;main.go&lt;/code&gt;, create a new module (&lt;code&gt;go mod init leaderelectiontest&lt;/code&gt;) and tidy it (&lt;code&gt;go mod tidy&lt;/code&gt;) to install its dependencies. Once you run &lt;code&gt;go run main.go&lt;/code&gt;, you should see output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ go run main.go
I0716 11:43:50.337947     138 leaderelection.go:250] attempting to acquire leader lease default/my-lock...
I0716 11:43:50.351264     138 leaderelection.go:260] successfully acquired lease default/my-lock
the leader is 138
I am the leader!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The exact leader identity will be different from what's in the example (138), since this is just the PID of the process that was running on my computer at the time of writing.&lt;/p&gt;

&lt;p&gt;And here's the Lease that was created in the test cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ kubectl describe lease/my-lock
Name:         my-lock
Namespace:    default
Labels:       &amp;lt;none&amp;gt;
Annotations:  &amp;lt;none&amp;gt;
API Version:  coordination.k8s.io/v1
Kind:         Lease
Metadata:
  Creation Timestamp:  2024-07-16T15:43:50Z
  Resource Version:    613
  UID:                 1d978362-69c5-43e9-af13-7b319dd452a6
Spec:
  Acquire Time:            2024-07-16T15:43:50.338049Z
  Holder Identity:         138
  Lease Duration Seconds:  15
  Lease Transitions:       0
  Renew Time:              2024-07-16T15:45:31.122956Z
Events:                    &amp;lt;none&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See that the "Holder Identity" is the same as the process's PID, 138.&lt;/p&gt;

&lt;p&gt;Now, let's open up another terminal and run the same &lt;code&gt;main.go&lt;/code&gt; file in a separate process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ go run main.go
I0716 11:48:34.489953     604 leaderelection.go:250] attempting to acquire leader lease default/my-lock...
the leader is 138
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This second process will wait forever, until the first one is not responsive. Let's kill the first process and wait around 15 seconds. Now that the first process is not renewing its claim on the Lease, the &lt;code&gt;.spec.renewTime&lt;/code&gt; field won't be updated anymore. This will eventually cause the second process to trigger a new leader election, since the Lease's renew time is older than its duration. Because this process is the only one now running, it will elect itself as the new leader.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the leader is 604
I0716 11:48:51.904732     604 leaderelection.go:260] successfully acquired lease default/my-lock
I am the leader!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If there were multiple processes still running after the initial leader exited, the first process to acquire the Lease would be the new leader, and the rest would continue to be on standby.&lt;/p&gt;

&lt;h2&gt;
  
  
  No single-leader guarantees
&lt;/h2&gt;

&lt;p&gt;This package is not foolproof, in that it &lt;a href="https://pkg.go.dev/k8s.io/client-go/tools/leaderelection" rel="noopener noreferrer"&gt;"does not guarantee that only one client is acting as a leader (a.k.a. fencing)"&lt;/a&gt;. For example, if a leader is paused and lets its Lease expire, another standby replica will acquire the Lease. Then, once the original leader resumes execution, it will think that it's still the leader and continue doing work alongside the newly-elected leader. In this way, you can end up with two leaders running simultaneously.&lt;/p&gt;

&lt;p&gt;To fix this, a &lt;a href="https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html" rel="noopener noreferrer"&gt;fencing token&lt;/a&gt; which references the Lease needs to be included in each request to the server. A fencing token is effectively an integer that increases by 1 every time a Lease changes hands. So a client with an old fencing token will have its requests rejected by the server. In this scenario, if an old leader wakes up from sleep and a new leader has already incremented the fencing token, all of the old leader's requests would be rejected because it is sending an older (smaller) token than what the server has seen from the newer leader.&lt;/p&gt;

&lt;p&gt;Implementing fencing in Kubernetes would be difficult without modifying the core API server to account for corresponding fencing tokens for each Lease. However, the risk of having multiple leader controllers is somewhat mitigated by the k8s API server itself. Because updates to stale objects are rejected, only controllers with the most up-to-date version of an object can modify it. So while we could have multiple controller leaders running, a resource's state would never regress to older versions if a controller misses a change made by another leader. Instead, reconciliation time would increase as both leaders need to refresh their own internal states of resources to ensure that they are acting on the most recent versions.&lt;/p&gt;

&lt;p&gt;Still, if you're using this package to implement leader election using a different data store, this is an important caveat to be aware of.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Leader election and distributed locking are critical building blocks of distributed systems. When trying to build fault-tolerant and highly-available applications, having tools like these at your disposal is critical. The Kubernetes standard library gives us a battle-tested wrapper around its primitives to allow application developers to easily build leader election into their own applications.&lt;/p&gt;

&lt;p&gt;While use of this particular library does limit you to deploying your application on Kubernetes, that seems to be the way the world is going recently. If in fact that is a dealbreaker, you can of course fork the library and modify it to work against any ACID-compliant and highly-available datastore.&lt;/p&gt;

&lt;p&gt;Stay tuned for more k8s source deep dives!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>go</category>
    </item>
    <item>
      <title>K3s Traefik Ingress - configured for your homelab!</title>
      <dc:creator>Steven Sklar</dc:creator>
      <pubDate>Fri, 15 Dec 2023 17:20:19 +0000</pubDate>
      <link>https://dev.to/sklarsa/k3s-traefik-ingress-configured-for-your-homelab-58lc</link>
      <guid>https://dev.to/sklarsa/k3s-traefik-ingress-configured-for-your-homelab-58lc</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally posted on &lt;a href="https://sklar.rocks/k3s-traefik-ingress/" rel="noopener noreferrer"&gt;my blog&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I recently purchased a used Lenovo M900 Think Centre (i7 with 32GB RAM) from eBay to expand my mini-homelab, which was just a single Synology DS218+ plugged into my ISP's router (yuck!). Since I've been spending a big chunk of time at work playing around with Kubernetes, I figured that I'd put my skills to the test and run a &lt;a href="https://k3s.io/" rel="noopener noreferrer"&gt;k3s&lt;/a&gt; node on the new server. While I was familiar with k3s before starting this project, I'd never actually run it before, opting for tools like &lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;kind&lt;/a&gt; (and &lt;a href="https://minikube.sigs.k8s.io/docs/start/" rel="noopener noreferrer"&gt;minikube&lt;/a&gt; before that) to run small test clusters for my local development work.&lt;/p&gt;

&lt;p&gt;K3s comes "batteries-included" and is packaged with all of the core components that are needed to run a cluster. For ingress, this component is Traefik, something which I also never had any hands-on experience with.&lt;/p&gt;

&lt;p&gt;After a bit of trial-and-error, along with some help from the docs, I was able to figure out how to configure the Traefik ingress to expose a service on the new node to the rest of my LAN. This post will go into detail about how I did it, in case you've also installed k3s on a node in your own lab and want to know how to use the pre-bundled Traefik ingress.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing k3s
&lt;/h2&gt;

&lt;p&gt;The k3s installation process is fairly straightforward. It just requires &lt;code&gt;curl&lt;/code&gt;ing &lt;code&gt;https://get.k3s.io&lt;/code&gt; and executing the script. The default settings all make sense to me, except for the need to &lt;code&gt;chmod&lt;/code&gt; the bundled kubeconfig since its default filemode is inaccessible to non-root users. Luckly, k3s has a solution for that! All you need to do is &lt;code&gt;export K3S_KUBECONFIG_MODE="644"&lt;/code&gt; before you run the installation, and the k3s install script does the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring Traefik
&lt;/h2&gt;

&lt;p&gt;By default, host ports 80 and 443 are exposed by the bundled Traefik ingress controller. This will let you create HTTP and HTTPS ingresses on their standard ports.&lt;/p&gt;

&lt;p&gt;But of course, I want to run a &lt;a href="https://questdb.io/" rel="noopener noreferrer"&gt;QuestDB&lt;/a&gt; instance on my node, which uses two additional TCP ports for Influx Line Protocol (ILP) and Pgwire communication with the database. So how can I expose these extra ports on my node and route traffic to the QuestDB container running inside of k3s?&lt;/p&gt;

&lt;p&gt;K3s deploys Traefik via a Helm chart and allows you to modify that chart's &lt;code&gt;values.yaml&lt;/code&gt; through a &lt;a href="https://docs.k3s.io/helm#customizing-packaged-components-with-helmchartconfig" rel="noopener noreferrer"&gt;HelmChartConfig&lt;/a&gt; resource. To further customize &lt;code&gt;values.yaml&lt;/code&gt; files for installed charts, you can place extra HelmChartConfig manifests in &lt;code&gt;/var/lib/rancher/k3s/server/manifests&lt;/code&gt;. Here is my &lt;code&gt;/var/lib/rancher/k3s/server/manifests/traefik-config.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;helm.cattle.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HelmChartConfig&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;valuesContent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|-&lt;/span&gt;
    &lt;span class="s"&gt;ports:&lt;/span&gt;
      &lt;span class="s"&gt;psql:&lt;/span&gt;
        &lt;span class="s"&gt;port: 8812&lt;/span&gt;
        &lt;span class="s"&gt;expose: true&lt;/span&gt;
        &lt;span class="s"&gt;exposedPort: 8812&lt;/span&gt;
        &lt;span class="s"&gt;protocol: TCP&lt;/span&gt;
      &lt;span class="s"&gt;ilp:&lt;/span&gt;
        &lt;span class="s"&gt;port: 9009&lt;/span&gt;
        &lt;span class="s"&gt;expose: true&lt;/span&gt;
        &lt;span class="s"&gt;exposedPort: 9009&lt;/span&gt;
        &lt;span class="s"&gt;protocol: TCP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This manifest modifies the Traefik deployment to serve two extra TCP ports on the host: 8812 for Pgwire and 9009 for ILP. Along with these new ports, Traefik also ships with its default "web" and "websecure" ports that route HTTP and HTTPS traffic respectively. But this is already preconfigured for you in the Helm chart, so there's no extra work involved.&lt;/p&gt;

&lt;p&gt;So now that we have our ports exposed, lets figure out how to route traffic from them to our QuestDB instance running inside the cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  IngressRoutes
&lt;/h2&gt;

&lt;p&gt;While you can use traditional k8s &lt;a href="https://kubernetes.io/docs/concepts/services-networking/ingress/" rel="noopener noreferrer"&gt;Ingresses&lt;/a&gt; to configure external access to cluster resources, Traefik v2 also includes new, more flexible types of ingress that coordinate directly with the Traefik deployment. These can be configured by using Traefik-specific Custom Resources, which allow users to specify cluster ingress routes using &lt;a href="https://doc.traefik.io/traefik/routing/routers/" rel="noopener noreferrer"&gt;Traefik's custom routing rules&lt;/a&gt; instead of the standard URI-based routing traditionally found in k8s. Using these rules, you can route requests not just based on hostnames and paths, but also by request headers, querystrings, and source IPs, with regex matching support for many of these options. This unlocks significantly more flexibility when routing traffic into your cluster as opposed to using a standard k8s ingress.&lt;/p&gt;

&lt;p&gt;Here's an example of an IngressRoute that I use to expose the QuestDB web console.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IngressRoute&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;questdb&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;questdb&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;entryPoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;web&lt;/span&gt;
  &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Rule&lt;/span&gt;
      &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Host(`my-hostname`)&lt;/span&gt;
      &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;questdb&lt;/span&gt;
          &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This CRD maps the QuestDB service port 9000 to host port 80 through the &lt;code&gt;web&lt;/code&gt; entrypoint, which as I mentioned above, comes pre-installed in the Traefik Helm chart. With this config, I can now access the console by navigating to &lt;a href="http://my-hostname/" rel="noopener noreferrer"&gt;http://my-hostname/&lt;/a&gt; in my web browser and Traefik will route the request to my QuestDB HTTP service.&lt;/p&gt;

&lt;h3&gt;
  
  
  IngressRouteTCP
&lt;/h3&gt;

&lt;p&gt;It's important to note that IngressRoutes are used solely for HTTP ingress routing. Raw TCP routing is done using IngressRouteTCPs, and there are also IngressRouteUDPs available for you to use as well.&lt;/p&gt;

&lt;p&gt;To support the TCP-based ILP and Pgwire protocols, I created 2 &lt;code&gt;IngressRouteTCP&lt;/code&gt; resources to handle traffic on host ports 9009 and 8812.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik.containo.us/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IngressRouteTCP&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;questdb-ilp&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;questdb&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;entryPoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ilp&lt;/span&gt;
  &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HostSNI(`*`)&lt;/span&gt;
      &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;questdb&lt;/span&gt;
          &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9009&lt;/span&gt;
          &lt;span class="na"&gt;terminationDelay&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-1&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik.containo.us/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IngressRouteTCP&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;questdb-psql&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;questdb&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;entryPoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;psql&lt;/span&gt;
  &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HostSNI(`*`)&lt;/span&gt;
      &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;questdb&lt;/span&gt;
          &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8812&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we use the TCP-specific &lt;code&gt;HostSNI&lt;/code&gt; matcher to route all node traffic on ports 9009 (&lt;code&gt;ilp&lt;/code&gt;) and 8812 (&lt;code&gt;psql&lt;/code&gt;) to the questdb service. The &lt;code&gt;ilp&lt;/code&gt; and &lt;code&gt;psql&lt;/code&gt; entrypoints correspond to the &lt;code&gt;ilp&lt;/code&gt; and &lt;code&gt;psql&lt;/code&gt; ports that we exposed in the &lt;code&gt;traefik-config.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now we're able to access QuestDB on all 3 supported ports on my k3s node from the rest of my home network.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I hope this example gives you a bit more confidence when configuring Traefik on a single-node k3s "cluster". It may be a bit confusing at first, but once you have the basics down, you'll be exposing all of your hosted services in no time!&lt;/p&gt;

&lt;p&gt;Since we're only working with one node, there's a limited amount of routing flexibility that I could include in the Traefik CRD configurations. In a more complex environment, you can really go wild with all of the possibilities that are provided by Traefik routing rules! Still, the benefit of this simple example is that you can learn the basics in a small and controlled environment, and then add complexity once it's needed.&lt;/p&gt;

&lt;p&gt;Remember, if you're using k3s clustering mode and running multiple nodes, you'll need to route traffic using an &lt;a href="https://docs.k3s.io/datastore/cluster-loadbalancer" rel="noopener noreferrer"&gt;external load balancer&lt;/a&gt; like Haproxy. Maybe I'll play around with this if I ever pick up a second M900 or other mini-desktop. But until then, I'll stick with this ingress configuration.&lt;/p&gt;

</description>
      <category>traefik</category>
      <category>k3s</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>How I Built a Terminal UI to Display K8s Operator Metrics</title>
      <dc:creator>Steven Sklar</dc:creator>
      <pubDate>Thu, 07 Dec 2023 02:40:26 +0000</pubDate>
      <link>https://dev.to/sklarsa/how-i-built-a-terminal-ui-to-display-k8s-operator-metrics-2m27</link>
      <guid>https://dev.to/sklarsa/how-i-built-a-terminal-ui-to-display-k8s-operator-metrics-2m27</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally posted &lt;a href="https://sklar.rocks/operator-prom-metrics-viewer/" rel="noopener noreferrer"&gt;on my blog&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As you hopefully know from reading some of my recent articles, I really enjoy writing Kubernetes operators! I've written &lt;a href="https://dev.to/tags/kubernetes/"&gt;a bit about them on this blog&lt;/a&gt;, and also just recently &lt;a href="https://2023.fossy.us/schedule/presentation/51/" rel="noopener noreferrer"&gt;presented a talk&lt;/a&gt; about how to build them at the inaugural &lt;a href="https://2023.fossy.us/" rel="noopener noreferrer"&gt;FOSSY conference&lt;/a&gt; in Portland, Oregon. So here goes another operator-related article, although this one is about one of the later stages of operator development: performance tuning.&lt;/p&gt;

&lt;p&gt;Since I've been working on an operator that includes multiple controllers, each with 4 reconcilers running concurrently that make API calls to resources outside of the cluster, I wanted to start profiling my reconciliation loop performance to ensure that I could handle real-world throughput requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prometheus Metrics
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/kubernetes-sigs/kubebuilder" rel="noopener noreferrer"&gt;Kubebuilder&lt;/a&gt; offers &lt;a href="https://book.kubebuilder.io/reference/metrics.html#exporting-metrics-for-prometheus" rel="noopener noreferrer"&gt;the ability to export operator-specific Prometheus metrics&lt;/a&gt; as part of its default scaffolding. This provides a great way to profile and monitor your operator performance under both testing and live conditions.&lt;/p&gt;

&lt;p&gt;Some of these metrics (many on a per-controller basis) include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A histogram of reconcile times&lt;/li&gt;
&lt;li&gt;Number of active workers&lt;/li&gt;
&lt;li&gt;Workqueue depth&lt;/li&gt;
&lt;li&gt;Total reconcile time&lt;/li&gt;
&lt;li&gt;Total reconcile errors&lt;/li&gt;
&lt;li&gt;API server request method &amp;amp; response counts&lt;/li&gt;
&lt;li&gt;Process memory usage &amp;amp; GC performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted a way to monitor these metrics for controllers that are under development on my local machine. This means that they are running as processes outside of any cluster context, which makes it difficult to scrape their exposed metrics using a standard prometheus/grafana setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  operator-prom-metrics-viewer
&lt;/h2&gt;

&lt;p&gt;So I thought, why not craft a little GUI to display this information? Inspired by the absolutely incredible &lt;a href="https://k9scli.io/" rel="noopener noreferrer"&gt;k9s project&lt;/a&gt;, I decided to make a terminal-driven GUI using the &lt;a href="https://github.com/rivo/tview" rel="noopener noreferrer"&gt;tview library&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Since I'm not (yet) storing this information for further analysis, I decided to only display the latest scraped data point for each metric. But I also didn't want to implement my own prometheus metric parser to do this, so I imported pieces of the prometheus &lt;a href="https://github.com/prometheus/prometheus/tree/main/scrape" rel="noopener noreferrer"&gt;scrape library itself&lt;/a&gt; to handle the metric parsing. All I had to do was implement a few interfaces for storing and querying my metrics to interop with the scrape library.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prometheus in-memory, transient storage
&lt;/h2&gt;

&lt;p&gt;The prometheus storage and query interfaces are relatively simple, and it helps that we actually don't need to implement all of their methods!&lt;/p&gt;

&lt;h3&gt;
  
  
  Storage
&lt;/h3&gt;

&lt;p&gt;The underlying storage mechanism of my &lt;code&gt;InMemoryMetricStorage&lt;/code&gt; class is a simple map along with a mutex for locking it during multithreaded IO. In this map, we are only storing the latest value for each metric, so there's no problem overwriting a key's value if it already exists. And we also don't need to worry about out-of-order writes or any other time-series database problems.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;InMemoryAppender&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;DataPoint&lt;/span&gt;
    &lt;span class="n"&gt;mu&lt;/span&gt;   &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Mutex&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;DataPoint&lt;/code&gt; struct maps directly to a prometheus metric's structure. If you've worked with prometheus metrics before, this should look pretty familiar to you.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;DataPoint&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Labels&lt;/span&gt;    &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Labels&lt;/span&gt;
    &lt;span class="n"&gt;Timestamp&lt;/span&gt; &lt;span class="kt"&gt;int64&lt;/span&gt;
    &lt;span class="n"&gt;Value&lt;/span&gt;     &lt;span class="kt"&gt;float64&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To populate each DataPoint's &lt;code&gt;uint64&lt;/code&gt; key in the map, we can simply use the prometheus builtin &lt;code&gt;labels.Hash()&lt;/code&gt; function to generate a unique key for each DataPoint, without having to do any extra work on our end.&lt;/p&gt;

&lt;p&gt;To demonstrate, here's an example of a prometheus metric:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# HELP http_requests_total Total number of http api requests
# TYPE http_requests_total counter
http_requests_total{api="add_product"} 4633433
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And how it would be represented by the &lt;code&gt;DataPoint&lt;/code&gt; struct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;DataPoint&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Labels&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"add_product"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4633433&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we have our data model, we need a way to store &lt;code&gt;DataPoint&lt;/code&gt;s for further retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  Appender
&lt;/h3&gt;

&lt;p&gt;Here's the interface of the storage backend appender:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Appender&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;SeriesRef&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Labels&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SeriesRef&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;Rollback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;

    &lt;span class="n"&gt;ExemplarAppender&lt;/span&gt;
    &lt;span class="n"&gt;HistogramAppender&lt;/span&gt;
    &lt;span class="n"&gt;MetadataUpdater&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because we're building a simple in-memory tool (with no persistence or availability guarantees), we can ignore the &lt;code&gt;Commit()&lt;/code&gt; and &lt;code&gt;Rollback()&lt;/code&gt; methods (by turning them into no-ops). Furthermore, we can avoid implementing the bottom three interfaces, since the metrics that are exposed by kubebuilder are not exemplars, metadata, or histograms (the framework actually uses gauges to represent the histogram values, more on this below!). So this only leaves the &lt;code&gt;Append()&lt;/code&gt; function to implement, which just writes to the &lt;code&gt;InMemoryAppender&lt;/code&gt;'s underlying map storage in a threadsafe manner.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;InMemoryAppender&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SeriesRef&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Labels&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SeriesRef&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Hash&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DataPoint&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Labels&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Querier
&lt;/h3&gt;

&lt;p&gt;The querier implementation is also fairly straightforward. We just need to implement a single &lt;code&gt;Query&lt;/code&gt; method against our datastore to extract metrics from our store.&lt;/p&gt;

&lt;p&gt;My solution is far from an optimized inner-loop; it has awful exponential performance in the worst case. But for looping through a few metrics at a rate of at most once-per-second, I'm not overly concerned with the time-complexity of this function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;InMemoryAppender&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metric&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Labels&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;DataPoint&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;dataToReturn&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;DataPoint&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Labels&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"__name__"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;metric&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;isMatch&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Labels&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;isMatch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;isMatch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;dataToReturn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataToReturn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dataToReturn&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Histogram implementation
&lt;/h2&gt;

&lt;p&gt;Since the controller reconcile time histogram data is stored in prometheus gauges, I did need to write some additional code to transform these into a workable schema to actually generate a histogram. To aid in the construction of the histogram buckets, I created 2 structs, one to represent the histogram itself, and the other to represent each of its buckets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;HistogramData&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;buckets&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;
    &lt;span class="n"&gt;curIdx&lt;/span&gt;  &lt;span class="kt"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Bucket&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Label&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Value&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each controller, we can run a query against our storage backend for the &lt;code&gt;workqueue_queue_duration_seconds_bucket&lt;/code&gt; metric with the &lt;code&gt;name=&amp;lt;controller-name&amp;gt;&lt;/code&gt; label. This will return all histogram buckets for a specific controller, each with a separate label (&lt;code&gt;le&lt;/code&gt;) that denotes the bucket's limit. We can iterate through these metrics to create &lt;code&gt;Bucket&lt;/code&gt; objects, append them to the &lt;code&gt;HistogramData.buckets&lt;/code&gt; slice, and eventually sort them before rendering the histogram to keep the display consistent and in the correct order.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;curIdx&lt;/code&gt; field on the &lt;code&gt;HistogramData&lt;/code&gt; struct is only used internally when rendering the histogram, to keep track of the current index of the &lt;code&gt;buckets&lt;/code&gt; slice that we're on. This way, we can consume buckets using a &lt;code&gt;for&lt;/code&gt; loop and exit when we return the final bucket. It's a bit clunky, but it works!&lt;/p&gt;

&lt;h2&gt;
  
  
  v1.0
&lt;/h2&gt;

&lt;p&gt;After all of this hacking, I ended up with something that looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsklar.rocks%2Fimg%2Fblog%2Foperator-prom-metrics-viewer%2Fscreenshot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsklar.rocks%2Fimg%2Fblog%2Foperator-prom-metrics-viewer%2Fscreenshot.png" alt="Screenshot"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's far from perfect; since I don't manipulate the histogram values (yet) or have any display scrolling, it's very likely that the histogram will spill off the right of the screen. The UX can also be improved, since you can only switch the controller that is being displayed by pressing the &lt;code&gt;Up&lt;/code&gt; and &lt;code&gt;Down&lt;/code&gt; buttons.&lt;/p&gt;

&lt;p&gt;But it's been a fantastic tool for me when debugging slow reconciles due to API rate limiting, and has also highlighted errors for me when I miss them in the logs. Give it a whirl and let me know what you think!&lt;/p&gt;

&lt;p&gt;Link to the Github: &lt;a href="https://github.com/sklarsa/operator-prom-metrics-viewer" rel="noopener noreferrer"&gt;https://github.com/sklarsa/operator-prom-metrics-viewer&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Annotations in Kubernetes Operator Design</title>
      <dc:creator>Steven Sklar</dc:creator>
      <pubDate>Sat, 25 Nov 2023 23:22:16 +0000</pubDate>
      <link>https://dev.to/sklarsa/annotations-in-kubernetes-operator-design-1ffh</link>
      <guid>https://dev.to/sklarsa/annotations-in-kubernetes-operator-design-1ffh</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published on &lt;a href="https://sklar.rocks/kubernetes-operators-annotations/" rel="noopener noreferrer"&gt;my blog&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It seems that annotations are everywhere in the Kubernetes (k8s) ecosystem. Ingress controllers, cloud providers, and operators of all kinds use the metadata stored in annotations to perform targeted actions inside of a cluster. So how can we leverage these when developing a new k8s operator?&lt;/p&gt;

&lt;h2&gt;
  
  
  To the Docs
&lt;/h2&gt;

&lt;p&gt;Despite their widespread use, the &lt;a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/" rel="noopener noreferrer"&gt;official documentation of annotations&lt;/a&gt; is actually quite brief. In fact, it only takes two short sentences at the top of the page to define an annotation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can use Kubernetes annotations to attach arbitrary non-identifying metadata to objects. Clients such as tools and libraries can retrieve this metadata.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While technically accurate, this definition is still pretty vague and not entirely helpful.&lt;/p&gt;

&lt;p&gt;The docs expand on this by providing a few examples of the types of metadata that can be stored in an annotation. But these samples range from build information all the way to individuals' "phone or pager numbers" (who still carries a pager these days anyway?).&lt;/p&gt;

&lt;p&gt;Somewhere within their ambiguity lies the true power of k8s annotations; they grant the ability to tag any cluster resource with structured data in almost any format. It's like having a dedicated key-value store attached to every resource in your cluster! So how can we harness this power in an operator?&lt;/p&gt;

&lt;p&gt;In this post, I will detail a way in which I recently used annotations while writing an operator for my company's product, &lt;a href="https://questdb.io/" rel="noopener noreferrer"&gt;QuestDB&lt;/a&gt;. Hopefully this will give you an idea of how you can incorporate annotations into your own operators to harness their full potential.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;The operator that I've been working on is designed to manage the full lifecycle of a QuestDB database instance, including version and hardware upgrades, config changes, backups, and (eventually) recovery from node failure. I used the &lt;a href="https://sdk.operatorframework.io/" rel="noopener noreferrer"&gt;Operator SDK&lt;/a&gt; and &lt;a href="https://github.com/kubernetes-sigs/kubebuilder" rel="noopener noreferrer"&gt;kubebuilder&lt;/a&gt; frameworks to provide scaffolding and API support.&lt;/p&gt;

&lt;h2&gt;
  
  
  It always comes back to a JWK
&lt;/h2&gt;

&lt;p&gt;In order to take advantage of the database's many performance optimizations (such as &lt;a href="https://questdb.io/blog/2022/09/12/importing-300k-rows-with-io-uring/" rel="noopener noreferrer"&gt;importing over 300k rows/sec with io_uring&lt;/a&gt;), we recommend that users ingest data over &lt;a href="https://questdb.io/docs/reference/api/ilp/overview" rel="noopener noreferrer"&gt;InfluxDB Line Protocol&lt;/a&gt;. One of the features that we offer, which is not part of the original protocol, is authentication over TCP using a JSON Web Key (JWK).&lt;/p&gt;

&lt;p&gt;This feature can be configured in a file that is referenced by the main server config on launch. You just need to add your JWK's key id and public data to the file in this format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;testUser1 ec-p-256-sha256 fLKYEaoEb9lrn3nkwLDA-M_xnuFOdSt9y0Z7_vWSHLU Dt5tbS1dEDMSYfym3fgMv0B99szno-dFc1rYF9t0aac
# [key/user id] [key type] {keyX keyY}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's say that you have your private key stored elsewhere in a k8s cluster as a &lt;a href="https://kubernetes.io/docs/concepts/configuration/secret/" rel="noopener noreferrer"&gt;Secret&lt;/a&gt;, so your client application can securely push data to your QuestDB instance. The JWK secret data would look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "kty": "EC",
  "d": "5UjEMuA0Pj5pjK8a-fa24dyIf-Es5mYny3oE_Wmus48",
  "crv": "P-256",
  "kid": "testUser1",
  "x": "fLKYEaoEb9lrn3nkwLDA-M_xnuFOdSt9y0Z7_vWSHLU",
  "y": "Dt5tbS1dEDMSYfym3fgMv0B99szno-dFc1rYF9t0aac"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a user creates a QuestDB Custom Resource (CR) in the cluster, we want to be able to point our operator to this private key and reformat the public values ("kid", "x", and "y") so that it can create a valid &lt;code&gt;auth.conf&lt;/code&gt; &lt;a href="https://kubernetes.io/docs/concepts/configuration/configmap/" rel="noopener noreferrer"&gt;ConfigMap&lt;/a&gt; value to mount to the &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/" rel="noopener noreferrer"&gt;Pod&lt;/a&gt; running our QuestDB instance. The operator can then add &lt;code&gt;line.tcp.auth.db.path=auth.conf&lt;/code&gt; to the main server config to make it aware of the new authentication file, and the client application can communicate to QuestDB securely over ILP using the private key.&lt;/p&gt;

&lt;p&gt;How can we let the operator know which Secret to use?&lt;/p&gt;

&lt;h2&gt;
  
  
  Using the Spec
&lt;/h2&gt;

&lt;p&gt;One approach is to simply create a field on the QuestDB Custom Resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;QuestDBSpec&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
    &lt;span class="n"&gt;IlpSecretName&lt;/span&gt;      &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"ilpSecretName,omitempty"`&lt;/span&gt;
    &lt;span class="n"&gt;IlpSecretNamespace&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"ilpSecretNamespace,omitempty"`&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With these fields, a user can now set their values to the name and namespace of the secret that contains the JWK's private key, like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;crd.questdb/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;QuestDB&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ilpSecretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-private-key&lt;/span&gt;
  &lt;span class="na"&gt;ilpSecretNamespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After applying the above yaml to the cluster, the operator will kick off a reconciliation loop of the newly created (or updated) QuestDB CR. Inside this loop, the operator will query the k8s API for the Secret &lt;code&gt;default/my-private-key&lt;/code&gt;, obtain the "kid", "x", and "y" values from the Secret's data, modify the &lt;code&gt;ConfigMap&lt;/code&gt; that is holding the QuestDB configuration, and continue the process as described above.&lt;/p&gt;

&lt;p&gt;Even though this technically works, the approach is fairly naive and can lead to some issues down the line. For example, if you want to rotate your JWK, how will the operator know to update the public key in the QuestDB auth ConfigMap? Or, what will happen if the secret does not even exist? Let's use some kubebuilder primitives to help answer these questions and improve the solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kubebuilder Watches
&lt;/h2&gt;

&lt;p&gt;Kubebuilder has built-in support for watching resources that are managed both by the operator and also externally by another component. A &lt;strong&gt;watch&lt;/strong&gt; is a function that registers the controller with the k8s API server, so that the controller is notified when a "watched" resource has changed. This allows the operator to kick off a reconciliation loop against the changed object, to ensure that the actual resource spec matches the desired spec (through operator's custom logic).&lt;/p&gt;

&lt;p&gt;Using kubebuilder, resource watches can be configured in a function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;QuestDBReconciler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;SetupWithManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Manager&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewControllerManagedBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;For&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;questdbv1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QuestDB&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Owns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;corev1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConfigMap&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Watches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Kind&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;corev1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Secret&lt;/span&gt;&lt;span class="p"&gt;{}},&lt;/span&gt;
            &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EnqueueRequestsFromMapFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;secretToQuestDB&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithPredicates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predicate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResourceVersionChangedPredicate&lt;/span&gt;&lt;span class="p"&gt;{}),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this function, we register our reconciler with a controller manager and set up 3 different types of watches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;For(&amp;amp;questdbv1.QuestDB{})&lt;/code&gt; instructs the manager that the controller's primary managed resource is a &lt;code&gt;questdbv1.QuestDB&lt;/code&gt;. This watch function registers the manager with the k8s API so it will be notified about any changes that happen to a QuestDB CR. When a change has been identified, the manager will kick off a reconcile of that object, calling the &lt;code&gt;QuestDBReconciler.Reconcile()&lt;/code&gt; function to migrate the resource status to its desired state. Only one &lt;code&gt;For&lt;/code&gt; clause can be used when registering a new controller, which goes along hand-in-hand with the recommendation that a controller should be responsible for &lt;a href="https://sdk.operatorframework.io/docs/best-practices/best-practices/" rel="noopener noreferrer"&gt;a single CR&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Owns(&amp;amp;corev1.ConfigMap{})&lt;/code&gt; will kick off a reconcile of any QuestDB CR when a ConfigMap that is &lt;em&gt;owned&lt;/em&gt; by a QuestDB changes. To own an object, you can use the &lt;code&gt;controllerutil.SetControllerReference&lt;/code&gt; function to create a &lt;a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/" rel="noopener noreferrer"&gt;parent-child relationship&lt;/a&gt; between the QuestDB parent and ConfigMap child. So then, changes to that ConfigMap will trigger a reconcile of the parent QuestDB in the controller.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Based on the function signature alone, the &lt;code&gt;Watches&lt;/code&gt; block is clearly very different than the previous types. In this case, we are listening for changes to &lt;em&gt;any&lt;/em&gt; &lt;code&gt;corev1.Secret&lt;/code&gt; inside the entire cluster, &lt;strong&gt;regardless of ownership constraints&lt;/strong&gt;. The watch is also set up with a specific predicate to filter out some events (&lt;code&gt;predicate.ResourceVersionChangedPredicate&lt;/code&gt;). This predicate will match cluster events when a Secret's version is incremented (as the result of a Spec or Status change). So when a &lt;code&gt;corev1.Secret&lt;/code&gt; change is found anywhere in the cluster, the manager will run the &lt;code&gt;secretToQuestDB&lt;/code&gt; function to map that Secret to zero-or-more QuestDB &lt;code&gt;NamespacedName&lt;/code&gt; references, based on its characteristics.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below, we will use this function to update a QuestDB's config if a JWK value has changed. To do this, we need to map from a Secret to any QuestDBs that are using that Secret's value for ILP authentication.&lt;/p&gt;

&lt;p&gt;Let's take a deeper look at this mapper function to see how to accomplish this.&lt;/p&gt;

&lt;h2&gt;
  
  
  EnqueueRequestsFromMapFunc
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;sigs.k8s.io/controller-runtime&lt;/code&gt; package defines a &lt;code&gt;MapFunc&lt;/code&gt; that is an input to the &lt;code&gt;Watches&lt;/code&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;MapFunc&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;reconcile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function accepts a generic API object and returns a list of reconcile requests, which are simple wrappers on top of namespaced names (usually seen in the form &lt;code&gt;"namespace/name"&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c"&gt;// NamespacedName is the name and namespace of the object to reconcile.&lt;/span&gt;
  &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NamespacedName&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So how can we turn a generic &lt;code&gt;client.Object&lt;/code&gt; (that is a generic abstraction on top of a Secret) into the name and namespace of a QuestDB object that we want to reconcile?&lt;/p&gt;

&lt;p&gt;There are many possible answers to this question!&lt;/p&gt;

&lt;p&gt;One idea is to create a naming convention that somehow encodes the name and namespace of the target QuestDB into the Secret's name, so we could use the &lt;code&gt;client.Object.GetName()&lt;/code&gt; and &lt;code&gt;client.Object.GetNamespace()&lt;/code&gt; to build a &lt;code&gt;NamespacedName&lt;/code&gt; to reconcile. Perhaps something like &lt;code&gt;questdb-${DB_NAME}-ilp&lt;/code&gt;. But this would limit what we could name Secrets, which might not interop well if something like &lt;a href="https://external-secrets.io/" rel="noopener noreferrer"&gt;external secrets controller&lt;/a&gt; is syncing the secret from an external source like Vault. Or if a developer simply forgets the naming convention, and needs to debug &lt;em&gt;why&lt;/em&gt; their QuestDB's ILP auth isn't working.&lt;/p&gt;

&lt;p&gt;Maybe we could reuse the &lt;code&gt;IlpSecretName&lt;/code&gt; and &lt;code&gt;IlpSecretNamespace&lt;/code&gt; spec fields from the previous section? We could query for a QuestDB that has a &lt;code&gt;Spec.IlpSecretName == client.Object.GetName()&lt;/code&gt; (and likewise for namespace) inside our mapper function. But this doesn't work for a few reasons.&lt;/p&gt;

&lt;p&gt;The first is that you are unable to use &lt;a href="https://github.com/kubernetes/kubernetes/issues/51046" rel="noopener noreferrer"&gt;field selectors with CRDs&lt;/a&gt;, so this query is literally impossible in the current version of k8s!&lt;/p&gt;

&lt;p&gt;Secondly, lets say you try to bypass this restriction by storing the secret name on the QuestDB object in something that &lt;em&gt;could&lt;/em&gt; queried against, like resource labels. Since the function only accepts a &lt;code&gt;client.Object&lt;/code&gt; and does not return an &lt;code&gt;error&lt;/code&gt; along with its &lt;code&gt;[]reconcile.Request&lt;/code&gt;, there's no clean place to instantiate a new client inside a &lt;code&gt;MapFunc&lt;/code&gt;. To do that, you would need a cancelable context and a standardized way to handle API errors. You can create all of this inside a &lt;code&gt;MapFunc&lt;/code&gt;, but you wouldn't be able to use the rest of kubebuilder's built-in error handling capabilities and its context that is attached to every other API request in the system. So based on the signature of &lt;code&gt;MapFunc&lt;/code&gt;, it's clear that the designers don't want you making any queries inside of them!&lt;/p&gt;

&lt;p&gt;Then how can we only use the data found in the &lt;code&gt;client.Object&lt;/code&gt; to create a list of QuestDBs to reconcile?&lt;/p&gt;

&lt;h3&gt;
  
  
  Annotations to the rescue!
&lt;/h3&gt;

&lt;p&gt;To solve this issue, I decided to create a new annotation: &lt;code&gt;"crd.questdb.io/name"&lt;/code&gt;. This annotation will be attached to a Secret and points to the name of the QuestDB CR that will use its data to construct an ILP auth config file. For simplicity, I will assume that the Secret will only be used by a single QuestDB, and that both the Secret and QuestDB will reside in the same namespace.&lt;/p&gt;

&lt;p&gt;This allows us to create a very simple mapper function that looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;CheckSecretForQdbs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;reconcile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;requests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;reconcile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="c"&gt;// Exit if the object is not a Secret&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;v1core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Secret&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c"&gt;// Extract the target QuestDB from the annotation&lt;/span&gt;
  &lt;span class="n"&gt;qdbName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetAnnotations&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="s"&gt;"crd.questdb.io/name"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;requests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reconcile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;NamespacedName&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ObjectKey&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;      &lt;span class="n"&gt;qdbName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="c"&gt;// The Secret and QuestDB must reside in&lt;/span&gt;
      &lt;span class="c"&gt;// the same namespace for this to work&lt;/span&gt;
      &lt;span class="n"&gt;Namespace&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetNamespace&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Reconciliation logic
&lt;/h3&gt;

&lt;p&gt;But we're not done yet! The controller still needs to find this Secret and use its data to construct the auth config.&lt;/p&gt;

&lt;p&gt;Inside our QuestDB reconciliation loop, we can query for all Secrets in a QuestDB's namespace and iterate over them until we find the one we're looking for, based on our new annotation. Here's a small code sample of that, without any additional error-checking.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;QuestDBReconciler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Reconcile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;questdbv1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QuestDB&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

  &lt;span class="c"&gt;// Assumes that the QuestDB exists (for simplicity)&lt;/span&gt;
  &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NamespacedName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;allSecrets&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;v1core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SecretList&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
  &lt;span class="n"&gt;authSecret&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;v1core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Secret&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;

  &lt;span class="c"&gt;// Get a list of all secrets in the namespace&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;allSecrets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InNamespace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Namespace&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c"&gt;// Iterate over them to find the secret with the desired annotation&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;allSecrets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Items&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;secret&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Annotations&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"crd.questdb.io/name"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;authSecret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;secret&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;authSecret&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"auth secret not found"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;authSecret&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;y&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;authSecret&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;kid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;authSecret&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"kid"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="c"&gt;// Construct the ILP auth string to add to the QuestDB config&lt;/span&gt;
  &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;constructIlpAuthConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="c"&gt;// Add this auth string to a ConfigMap value and update...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the new annotation allows us to fully decouple the Secret from the QuestDB operator, since there are no domain-specific naming requirements for the Secret. You don't even need to change the QuestDB CR spec to update the config. All you need to do is add the annotation to any Secret in the QuestDB's namespace, set the value to the name of the QuestDB resource, and the operator will be automatically be notified of the change and update your QuestDB's config to use the Secret's public key data.&lt;/p&gt;

&lt;p&gt;Note that this is a golden-path solution; we still need to handle cases where more than 1 Secret has the annotation or a matching Secret does not have the required keys that are needed to generate the JWK public key.&lt;/p&gt;

&lt;h2&gt;
  
  
  No limits
&lt;/h2&gt;

&lt;p&gt;The beauty of annotations is that you can store anything in them, and with a custom operator, use that data to perform any cluster automation that you can dream of! K8s doesn't even prescribe the format of an annotation's value, as long as it can be represented in a YAML string. This means you can use simple strings, JSON, or even base64-encoded binary blobs as annotation values for an operator to use! Still, since k8s is a young-ish and constantly evolving system, I would probably stick with simple annotation values to abide by KISS as much as possible.&lt;/p&gt;

&lt;p&gt;After using annotations in my operator code, I've started to gain more of an appreciation for &lt;em&gt;why&lt;/em&gt; the k8s annotation docs are so vague; because they can be used for any custom action, it's not really possible to define all of their capabilities. It's up to the operator developer to use annotations in his or her own way.&lt;/p&gt;

&lt;p&gt;I hope this example has sparked some of your own ideas about how to use annotations in your own operators. Let me know if it has!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>operators</category>
      <category>go</category>
    </item>
    <item>
      <title>Hacking in kind (Kubernetes in Docker)</title>
      <dc:creator>Steven Sklar</dc:creator>
      <pubDate>Fri, 17 Nov 2023 19:17:15 +0000</pubDate>
      <link>https://dev.to/sklarsa/hacking-in-kind-kubernetes-in-docker-2o9g</link>
      <guid>https://dev.to/sklarsa/hacking-in-kind-kubernetes-in-docker-2o9g</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published &lt;a href="https://sklar.rocks/kind-hacking/" rel="noopener noreferrer"&gt;on my blog&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  How to dynamically add nodes to a kind cluster
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;Kind&lt;/a&gt; allows you to run a Kubernetes cluster inside Docker.  This is incredibly useful for developing Helm charts, Operators, or even just testing out different k8s features in a safe way.&lt;/p&gt;

&lt;p&gt;I've recently been working on an operator (built using the &lt;a href="https://github.com/operator-framework/operator-sdk" rel="noopener noreferrer"&gt;operator-sdk&lt;/a&gt;) that manages cluster node lifecycles.  Kind allows you to &lt;a href="https://kind.sigs.k8s.io/docs/user/configuration/#nodes" rel="noopener noreferrer"&gt;spin up clusters with multiple nodes&lt;/a&gt;, using a Docker container per-node and joining them using a common Docker network.  However, the &lt;code&gt;kind&lt;/code&gt; executable does not allow you to modify an existing cluster by adding or removing a node.&lt;/p&gt;

&lt;p&gt;I wanted to see if this was possible using a simple shell script, and it turns out that it's actually not too difficult!&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating the node
&lt;/h2&gt;

&lt;p&gt;Using my favorite diff tool, &lt;a href="https://sourcegear.com/diffmerge/" rel="noopener noreferrer"&gt;DiffMerge&lt;/a&gt;, and &lt;code&gt;docker inspect&lt;/code&gt; to compare an existing kind node's state to a new container's, I experimented with various &lt;code&gt;docker run&lt;/code&gt; flags until I got something that's close enough to the kind node.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--restart&lt;/span&gt; on-failure &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-v&lt;/span&gt; /lib/modules:/lib/modules:ro &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--privileged&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-h&lt;/span&gt; &lt;span class="nv"&gt;$NODE_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--network&lt;/span&gt; kind &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--network-alias&lt;/span&gt; &lt;span class="nv"&gt;$NODE_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--tmpfs&lt;/span&gt; /run &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--tmpfs&lt;/span&gt; /tmp &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--security-opt&lt;/span&gt; &lt;span class="nv"&gt;seccomp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;unconfined &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--security-opt&lt;/span&gt; &lt;span class="nv"&gt;apparmor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;unconfined &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--security-opt&lt;/span&gt; &lt;span class="nv"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;disable &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;-v&lt;/span&gt; /var &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$NODE_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--label&lt;/span&gt; io.x-k8s.kind.cluster&lt;span class="o"&gt;=&lt;/span&gt;kind &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--label&lt;/span&gt; io.x-k8s.kind.role&lt;span class="o"&gt;=&lt;/span&gt;worker &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--env&lt;/span&gt; KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER &lt;span class="se"&gt;\&lt;/span&gt;
kindest/node:v1.25.2@sha256:9be91e9e9cdf116809841fc77ebdb8845443c4c72fe5218f3ae9eb57fdb4bace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Joining to the cluster
&lt;/h2&gt;

&lt;p&gt;You can join new nodes to a k8s cluster by using the &lt;code&gt;kubeadm join&lt;/code&gt; command.  In this case, we can use &lt;code&gt;docker exec&lt;/code&gt; to execute this command on our node after its container has started up.&lt;/p&gt;

&lt;p&gt;This command won't work out of the box because kind uses a &lt;code&gt;kubeadm.conf&lt;/code&gt; that does not exist in the node docker image.  It is injected into the container by the kind executable.&lt;/p&gt;

&lt;p&gt;Again, using my trusty DiffMerge tool, I compared two &lt;code&gt;/kind/kubeadm.conf&lt;/code&gt; files in existing kind nodes and found very few differences.  This allowed me to just grab one from any worker node to use as a template.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--privileged&lt;/span&gt; kind-worker &lt;span class="nb"&gt;cat&lt;/span&gt; /kind/kubeadm.conf &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$LOCAL_KUBEADM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From here, I needed to set the node's unique IP in its &lt;code&gt;kubeadm.conf&lt;/code&gt;.  We can use &lt;code&gt;docker inspect&lt;/code&gt; to grab any node IP address we need.  Since I'm working in bash, I just decided to use a simple sed replacement to replace the template node's IP address with my new node's IP in my local copy of &lt;code&gt;kubeadm.conf&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;TEMPLATE_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;docker inspect kind-worker | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.[0].NetworkSettings.Networks.kind.IPAddress'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;NODE_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;docker inspect &lt;span class="nv"&gt;$NODE_NAME&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.[0].NetworkSettings.Networks.kind.IPAddress'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;ESCAPED_TEMPLATE_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$TEMPLATE_IP&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s/\./\\./g'&lt;/span&gt; &lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;ESCAPED_NODE_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$NODE_IP&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s/\./\\./g'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt;.bkp &lt;span class="s2"&gt;"s/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ESCAPED_TEMPLATE_IP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ESCAPED_NODE_IP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/g"&lt;/span&gt; &lt;span class="nv"&gt;$LOCAL_KUBEADM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that our &lt;code&gt;kubeadm.conf&lt;/code&gt; is prepared, we need to copy it to the new node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--privileged&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nv"&gt;$NODE_NAME&lt;/span&gt; &lt;span class="nb"&gt;cp&lt;/span&gt; /dev/stdin /kind/kubeadm.conf &amp;lt; &lt;span class="nv"&gt;$LOCAL_KUBEADM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we can join our node to the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--privileged&lt;/span&gt; &lt;span class="nv"&gt;$NODE_NAME&lt;/span&gt; kubeadm &lt;span class="nb"&gt;join&lt;/span&gt; &lt;span class="nt"&gt;--config&lt;/span&gt; /kind/kubeadm.conf &lt;span class="nt"&gt;--skip-phases&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;preflight &lt;span class="nt"&gt;--v&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Node Tags
&lt;/h2&gt;

&lt;p&gt;Since you have complete control of the new node's &lt;code&gt;kubeadm.conf&lt;/code&gt;, it is possible to configure many of its properties for further testing.  For example, to add additional labels to the new node, you can run something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt;.bkp &lt;span class="s2"&gt;"s/node-labels: &lt;/span&gt;&lt;span class="se"&gt;\"\"&lt;/span&gt;&lt;span class="s2"&gt;/node-labels: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;my-label-key=my-label-value&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;/g"&lt;/span&gt; &lt;span class="nv"&gt;$LOCAL_KUBEADM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will add the &lt;code&gt;my-label-key=my-label-value&lt;/code&gt; label to the node once it joins the cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Work
&lt;/h2&gt;

&lt;p&gt;Based on this script, I believe it's possible to add a &lt;code&gt;kind create node&lt;/code&gt; subcommand to add a node to an existing cluster.  Stay tuned for that...&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>kind</category>
    </item>
    <item>
      <title>How are Kubernetes VolumeAttachments Named?</title>
      <dc:creator>Steven Sklar</dc:creator>
      <pubDate>Tue, 29 Aug 2023 15:33:30 +0000</pubDate>
      <link>https://dev.to/sklarsa/how-are-kubernetes-volumeattachments-named-5o8</link>
      <guid>https://dev.to/sklarsa/how-are-kubernetes-volumeattachments-named-5o8</guid>
      <description>&lt;p&gt;This is a short article with a little k8s persistent storage nugget that I recently uncovered. FWIW, I've only checked this while using the AWS EBS &lt;a href="https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/" rel="noopener noreferrer"&gt;Container Storage Interface (CSI)&lt;/a&gt; driver, but the article should apply to any cluster using CSI drivers to mount volumes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;After a &lt;a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/" rel="noopener noreferrer"&gt;PersistentVolume (PV)&lt;/a&gt; is created and bound to a Persistent Volume Claim (PVC), a Pod can mount the claim as a volume to access its data. While the Pod is being provisioned, the k8s storage controller creates a &lt;a href="https://kubernetes.io/docs/reference/kubernetes-api/config-and-storage-resources/volume-attachment-v1/" rel="noopener noreferrer"&gt;VolumeAttachment&lt;/a&gt;, which declares its intent to physically attach the volume to the Pod's Node. The storage driver running on the Node will be notified when this VolumeAttachment is created, recognize that it needs to mount a volume to its host, and perform the volume mount, thus making the volume available for the Pod to use.&lt;/p&gt;

&lt;p&gt;Lets say that you want to check if a particular volume is actually attached to a Node. For example, maybe you want to ensure that a volume is unmounted before destroying a Node to ensure data integrity of a high-throughput database? So how can you find a PV's corresponding VolumeAttachment to determine if its backing volume is unmounted?&lt;/p&gt;

&lt;h2&gt;
  
  
  Querying VolumeAttachments
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://kubernetes.io/docs/reference/kubernetes-api/config-and-storage-resources/volume-attachment-v1/" rel="noopener noreferrer"&gt;VolumeAttachment Spec&lt;/a&gt; contains &lt;code&gt;nodeName&lt;/code&gt; and &lt;code&gt;source&lt;/code&gt; fields, which as expected, correspond to the Node's name and PV's name respectively. But these are not valid &lt;code&gt;fieldSelector&lt;/code&gt; fields, so you cannot query the API server for a VolumeAttachment with a matching Node or PV name. Instead, you need to &lt;code&gt;LIST&lt;/code&gt; all VolumeAttachments in the cluster and iterate over them to find the one that you want. In a large cluster, there could be hundreds (or even thousands) of VolumeAttachments, so this operation could take quite a while to run inside of a controller reconcile loop.&lt;/p&gt;

&lt;p&gt;Maybe we could somehow craft a &lt;code&gt;GET&lt;/code&gt; request to find a PV's matching VolumeAttachment? Unfortunately, the "name" field, at least when using a supported k8s CSI driver, is a vague string that looks something like &lt;code&gt;csi-e52f481e06b9cf4dc9d95a56d1788026b4707cdce2e105d4444f0be9d6206a09&lt;/code&gt;. Where does this name come from? Can we use this to check if a PV is mounted to a particular Node?&lt;/p&gt;

&lt;h2&gt;
  
  
  VolumeAttachment Naming
&lt;/h2&gt;

&lt;p&gt;After a bit of digging, I found the answer in the &lt;a href="https://github.com/kubernetes/kubernetes/blob/78833e1b3385ea3485d38aa46586d39195377ec9/pkg/volume/csi/csi_attacher.go#L652:" rel="noopener noreferrer"&gt;k8s csi source code&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// getAttachmentName returns csi-&amp;lt;sha256(volName,csiDriverName,NodeName)&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;getAttachmentName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;volName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;csiDriverName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nodeName&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sha256&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sum256&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s%s%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;volName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;csiDriverName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nodeName&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"csi-%x"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The attachment suffix is a sha256 sum of the names of the following components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The volume handle (part of the CSI Spec)&lt;/li&gt;
&lt;li&gt;The driver doing the mounting, and&lt;/li&gt;
&lt;li&gt;The target Node&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With this discovery, we can now determine if a specific PV is mounted to a Node by using a &lt;code&gt;GET&lt;/code&gt; request based on the imputed VolumeAttachment name.&lt;/p&gt;

&lt;p&gt;Here's a small function that I wrote to generate a VolumeAttachment name based on a PV and Node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;csi&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"crypto/sha256"&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;

    &lt;span class="n"&gt;v1core&lt;/span&gt; &lt;span class="s"&gt;"k8s.io/api/core/v1"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;GetVolumeAttachmentName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pv&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;v1core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PersistentVolume&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;v1core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Spec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CSI&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;volName&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Spec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CSI&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VolumeHandle&lt;/span&gt;
        &lt;span class="n"&gt;csiDriverName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Spec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CSI&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Driver&lt;/span&gt;
        &lt;span class="n"&gt;nodeName&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sha256&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sum256&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s%s%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;volName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;csiDriverName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nodeName&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"csi-%x"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function accepts a PV and a Node, and sha256sums pieces of their metadata to construct a valid VolumeAttachment name. Now, you can now use this function to generate an input to check if a PV is mounted to a Node, all in a single API request!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;IsPvAttached&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pv&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;v1core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PersistentVolume&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;v1core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;va&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;v1storage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VolumeAttachment&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;apierrors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsNotFound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NamespacedName&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GetVolumeAttachmentName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;va&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The storage controller will delete a VolumeAttachment when a PV is unmounted from a Node, so we can simply check for the existence of a VolumeAttachment with the PV's name &amp;amp; driver and the Node's name by using our &lt;code&gt;GetVolumeAttachmentName&lt;/code&gt; function from above. If the VolumeAttachment does not exist, we can definitively say that the volume is unmounted from the Node, and safely delete the node without any data corruption on the volume.&lt;/p&gt;

&lt;p&gt;Hopefully this demystifies those strange VolumeAttachment names, and unlocks some new ideas when writing your next k8s controller.&lt;/p&gt;




&lt;p&gt;Originally posted on my blog: &lt;a href="https://sklar.rocks/k8s-volumeattachment-names/" rel="noopener noreferrer"&gt;https://sklar.rocks/k8s-volumeattachment-names/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>go</category>
    </item>
    <item>
      <title>Kubebuilder Tips and Tricks</title>
      <dc:creator>Steven Sklar</dc:creator>
      <pubDate>Tue, 22 Aug 2023 14:23:58 +0000</pubDate>
      <link>https://dev.to/sklarsa/kubebuilder-tips-and-tricks-28jj</link>
      <guid>https://dev.to/sklarsa/kubebuilder-tips-and-tricks-28jj</guid>
      <description>&lt;p&gt;Recently, I've been spending a lot of time writing a Kubernetes operator using the &lt;a href="https://sdk.operatorframework.io/docs/building-operators/golang/quickstart/" rel="noopener noreferrer"&gt;go operator-sdk&lt;/a&gt;, which is built on top of the &lt;a href="https://github.com/kubernetes-sigs/kubebuilder" rel="noopener noreferrer"&gt;Kubebuilder&lt;/a&gt; framework.  This is a list of a few tips and tricks that I've compiled over the past few months working with these frameworks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Log Formatting
&lt;/h2&gt;

&lt;p&gt;Kubebuilder, like much of the k8s ecosystem, utilizes &lt;a href="https://github.com/uber-go/zap" rel="noopener noreferrer"&gt;zap&lt;/a&gt; for logging.  Out of the box, the Kubebuilder zap configuration outputs a timestamp for each log, which gets formatted using scientific notation.  This makes it difficult for me to read the time of an event just by glancing at it.  Personally, I prefer ISO 8601, so let's change it!&lt;/p&gt;

&lt;p&gt;In your scaffolding's &lt;code&gt;main.go&lt;/code&gt;, you can configure your current logger format by modifying the &lt;code&gt;zap.Options&lt;/code&gt; struct and calling &lt;code&gt;ctrl.SetLogger&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;zap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Development&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TimeEncoder&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;zapcore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ISO8601TimeEncoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;zap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;zap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UseFlagOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, I added the &lt;code&gt;zapcore.ISO8601TimeEncoder&lt;/code&gt;, which encodes timestamps to human-readable ISO 8601-formatted strings.  It took a bit of digging, along with a bit of help from the Kubernetes Slack org, to figure this one out.  But it's been a huge quality-of-life improvement when debugging complex reconcile loops, especially in a multithreaded environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  MaxConcurrentReconciles
&lt;/h2&gt;

&lt;p&gt;Speaking of multithreaded environments, by default, an operator will only run a single reconcile loop per-controller.  However, in practice, especially when running a globally-scoped controller, it's useful to run multiple concurrent reconcile loops to simultaneously handle many resource changes at once.  Luckily, the Operator SDK makes this incredibly easy with the &lt;code&gt;MaxConcurrentReconciles&lt;/code&gt; setting.  We can set this up in a new controller's &lt;code&gt;SetupWithManager&lt;/code&gt; func:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;CustomReconciler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;SetupWithManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Manager&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewControllerManagedBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;WithOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MaxConcurrentReconciles&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;
        &lt;span class="n"&gt;Complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've created a command line arg in my &lt;code&gt;main.go&lt;/code&gt; file that allows the user to set this value to any integer value, since this will likely be a tweaked over time depending on how the controller performs in a production cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parent-Child Relationships
&lt;/h2&gt;

&lt;p&gt;One of the basic functions of a controller is to act as a parent to Kubernetes resources.  This allows the controller to "own"&lt;br&gt;
these objects such that when it is deleted, all child objects are automatically garbage collected by the Kubernetes runtime.&lt;/p&gt;

&lt;p&gt;I like this small function that can be called for any &lt;code&gt;client.Object&lt;/code&gt; to add a parent reference to the controller&lt;br&gt;
that you're writing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;CustomReconciler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ownObject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cr&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;myapiv1alpha1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CustomResource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetControllerReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Scheme&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then add &lt;code&gt;Owns&lt;/code&gt; watches for these resources in your &lt;code&gt;SetupWithManager&lt;/code&gt; func.  These will instruct your controller to listen for changes in child resources of the specified types, triggering a reconcile loop on each change.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;CustomReconciler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;SetupWithManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Manager&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewControllerManagedBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Owns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;v1apps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Deployment&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Owns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;v1core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConfigMap&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Owns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;v1core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Watches
&lt;/h2&gt;

&lt;p&gt;Your controller can also watch resources that it &lt;em&gt;doesn't&lt;/em&gt; own.  This is useful for when you need to watch for changes in globally-scoped resources like PersistentVolumes or Nodes.  Here's an example of how you would register this watch in your &lt;code&gt;SetupWithManager&lt;/code&gt; func.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;CustomReconciler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;SetupWithManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Manager&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewControllerManagedBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Watches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Kind&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;v1core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;{}},&lt;/span&gt;
            &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EnqueueRequestsFromMapFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myNodeFilterFunc&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithPredicates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predicate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResourceVersionChangedPredicate&lt;/span&gt;&lt;span class="p"&gt;{}),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, you need to implement &lt;code&gt;myNodeFilterFunc&lt;/code&gt; to accept&lt;br&gt;
an &lt;code&gt;obj client.Object&lt;/code&gt; and return &lt;code&gt;[]reconcile.Request&lt;/code&gt;.  Using the&lt;br&gt;
&lt;code&gt;ResourceVersionChangedPredicate&lt;/code&gt; triggers the filter function for every change on that resource type, so it's important to&lt;br&gt;
write your filter function to be as efficient as possible, since there is a chance that it could be called quite a bit, especially&lt;br&gt;
if your controller is globally-scoped.&lt;/p&gt;
&lt;h2&gt;
  
  
  Field Indexers
&lt;/h2&gt;

&lt;p&gt;One gotcha that I encountered happened when trying to query for a list of Pods that are running on a particular Node.  This query uses a FieldSelector filter, as&lt;br&gt;
seen here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Get a list of all pods on the node&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;pods&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListOptions&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Namespace&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;FieldSelector&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseSelectorOrDie&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"spec.nodeName=%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This codepath led to the following error: &lt;code&gt;Index with name field:spec.nodeName does not exist&lt;/code&gt;.  After some googling around, I&lt;br&gt;
found &lt;a href="https://github.com/kubernetes-sigs/kubebuilder/issues/547" rel="noopener noreferrer"&gt;this GitHub issue&lt;/a&gt; that referenced&lt;br&gt;
a &lt;a href="https://book.kubebuilder.io/cronjob-tutorial/controller-implementation.html#setup" rel="noopener noreferrer"&gt;Kubebuilder docs page&lt;/a&gt; which contained the answer.&lt;/p&gt;

&lt;p&gt;Controllers created using operator-sdk and Kubebuilder use a built-in caching mechanism to store results of API requests.  This is to prevent&lt;br&gt;
spamming the K8s API, as well as improve reconciliation performance.&lt;/p&gt;

&lt;p&gt;When performing resource lookups using FieldSelectors, you first need to add your desired search field to an index&lt;br&gt;
that the cache can use for lookups.  Here's an example that will build this index for a Pod's &lt;code&gt;nodeName&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetFieldIndexer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IndexField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TODO&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;v1core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pod&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="s"&gt;"spec.nodeName"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rawObj&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;pod&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;rawObj&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;v1core&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pod&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pod&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Spec&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NodeName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we can run the &lt;code&gt;List&lt;/code&gt; function from above with the FieldSelector with no issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retries on Conflicts
&lt;/h2&gt;

&lt;p&gt;If you've ever written controllers, you're probably very familiar with the error &lt;code&gt;Operation cannot be fulfilled on ...: the object has been modified; please apply your changes to the latest version and try again&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This occurs when the version of the resource that you're currently reconciling in your controller is out-of-date with what's in latest version of the K8s cluster state.  If you're retrying your reconciliation loop on any errors, your controller will eventually reconcile the resource, but this can really pollute your logs and make it difficult to spot more important errors.&lt;/p&gt;

&lt;p&gt;After reading through the k8s source, I found the solution to this: &lt;code&gt;RetryOnConflict&lt;/code&gt;.  It's a &lt;a href="https://github.com/kubernetes/client-go/blob/648d82831a4db2e9d07c0dad53603aa15c45a968/util/retry/util.go#L103" rel="noopener noreferrer"&gt;utility function&lt;/a&gt; in the &lt;code&gt;client-go&lt;/code&gt; package that runs a function and automatically retries on conflict, up to a certain point.&lt;/p&gt;

&lt;p&gt;Now, you can just wrap your logic inside this function argument, and never have to worry about this issue again!  And the added benefit is that you just get to &lt;code&gt;return err&lt;/code&gt; instead of &lt;code&gt;return ctrl.Result{}, err&lt;/code&gt;, which makes your code that much easier to read.&lt;/p&gt;

&lt;h2&gt;
  
  
  Useful Kubebuilder Markers
&lt;/h2&gt;

&lt;p&gt;Here are some useful code markers that I've found while developing my operator.&lt;/p&gt;

&lt;p&gt;To add custom columns to your custom resource's description (when running &lt;code&gt;kubectl get&lt;/code&gt;), you can add annotations to your API object like these:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;//+kubebuilder:printcolumn:name="NodeReady",type="boolean",JSONPath=".status.nodeReady"&lt;/span&gt;
&lt;span class="c"&gt;//+kubebuilder:printcolumn:name="NodeIp",type="string",JSONPath=".status.nodeIp"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To add a shortname to your custom resource (like &lt;code&gt;pvc&lt;/code&gt; for &lt;code&gt;PersistentVolumeClaim&lt;/code&gt; for example), you can add this annotation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;//+kubebuilder:resource:shortName=mycr;mycrs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;More docs on kubebuilder markers can be found here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://book.kubebuilder.io/reference/markers/crd.html" rel="noopener noreferrer"&gt;https://book.kubebuilder.io/reference/markers/crd.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Originally published on my blog: &lt;a href="https://sklar.rocks/kubebuilder-tips/" rel="noopener noreferrer"&gt;https://sklar.rocks/kubebuilder-tips/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>go</category>
      <category>kubebuilder</category>
    </item>
    <item>
      <title>We moved our Cloud operations to a Kubernetes Operator</title>
      <dc:creator>Steven Sklar</dc:creator>
      <pubDate>Tue, 15 Aug 2023 16:16:22 +0000</pubDate>
      <link>https://dev.to/sklarsa/we-moved-our-cloud-operations-to-a-kubernetes-operator-392n</link>
      <guid>https://dev.to/sklarsa/we-moved-our-cloud-operations-to-a-kubernetes-operator-392n</guid>
      <description>&lt;p&gt;Kubernetes operators seem to be everywhere these days. There are hundreds of them available to install in your cluster. Operators manage everything from X.509 certificates to enterprise-y database deployments, service meshes, and monitoring stacks. But why add even more complexity to a system that many say is already convoluted? With an operator controlling cluster resources, if you have a bug, you now have two problems: figuring out which of the myriad of Kubernetes primitives is causing the issue, as well as understanding how that dang operator is contributing to the problem!&lt;/p&gt;

&lt;p&gt;So when I proposed the idea of introducing a custom-built operator into our Cloud infrastructure stack, I knew that I had to really sell it as a true benefit to the company, so people wouldn't think that I was spending my time on a piece of RDD vaporware. After the go-ahead to begin the project, I spent my days learning about Kubernetes internals, scouring the internet far and wide to find content about operator development, and writing (and rewriting) the base of what was to become our QuestDB Cloud operator.&lt;/p&gt;

&lt;p&gt;After many months of hard work, we've finally deployed this Kubernetes operator into our production Cloud. Building on top of our existing 'one-shot-with-rollback' provisioning model, the operator adds new functionality that allows us to perform complex operations like automatically recovering from node failure, and will also orchestrate upcoming QuestDB Enterprise features such as High Availability and Cold Storage.&lt;/p&gt;

&lt;p&gt;If we've done my job correctly, you, as a current or potential end user will continue to enjoy the stability and easy management of your growing QuestDB deployments, without noticing a single difference. This is the story of how we accomplished this engineering feat and why we even did it in the first place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Original System Architecture
&lt;/h2&gt;

&lt;p&gt;To manage customer instances in our cloud, we use a queue-based system that is able to provision databases across different AWS accounts and regions using a centralized control plane known simply as "the provisioner."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fze7nsy888hmpkhrpkxmp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fze7nsy888hmpkhrpkxmp.jpg" alt="Provisioner Architecture" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a user creates, modifies, or deletes a database in the QuestDB Cloud, an application backend server asynchronously sends a message that describes the change to a backend worker, which then appends that message to a Kafka log. Each message contains information about the target database (its unique identifier, resident cluster, and information about its config, runtime, and storage) as well as the specific set of operations that are to be performed.&lt;/p&gt;

&lt;p&gt;These instructional messages get picked up by one of many provisioner workers, each of which subscribes to a particular Kafka log partition. Once a provisioner worker receives a message, it is decoded and its instructions are carried out inside the same process. Instructions vary based on the specific operation that needs to be performed. These range from simple tasks like an application-level restart, to more complex ones such as tearing down an underlying database's node if the user wants to pause an active instance to save costs on an unused resource.&lt;/p&gt;

&lt;p&gt;Each high-level instruction is broken down into smaller, composable granular tasks. Thus, the provisioner coordinates resources at a relatively low level, explicitly managing k8s primitives like StatefulSets, Deployments, Ingresses, Services, and PersistentVolumes as well as AWS resources like EC2 Instances, Security Groups, Route53 Records, and EBS Volumes.&lt;/p&gt;

&lt;p&gt;Provisioning progress is reported back to the application through a separate Kafka queue (we call it the "gossip" queue). Backend workers are listening to this queue for new messages, and update a customer database's status in our backend database when new information has been received.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrcztck8ub90ja91kul0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrcztck8ub90ja91kul0.jpg" alt="Provisioner Flowchart" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each granular operation also has a corresponding rollback procedure in case the provisioner experiences an error while running a set of commands. These rollbacks are designed to leave the target database's infrastructure in a consistent state, although it's important to note that this is not the desired state as intended by the end user's actions. Once an error has been encountered and the rollback executed, the backend will be notified about the original error through the gossip Kafka queue in order to surface any relevant information back to the end user. Our internal monitoring system will also trigger alerts on these errors so an on-call engineer can investigate the issue and kick off the appropriate remediation steps.&lt;/p&gt;

&lt;p&gt;While this system has proven to be reliable over the past several months since we &lt;a href="https://twitter.com/nicolas_hrd/status/1641770344468946944" rel="noopener noreferrer"&gt;first launched our public cloud offering&lt;/a&gt;, the framework still leaves room for improvement. Much of our recent database development work around High Availability and Cold Storage requires adding more infrastructure components to our existing system. And adding new components to any distributed system tends to increase the system's overall error rate.&lt;/p&gt;

&lt;p&gt;As we continue to add complexity to our infrastructure, there is a risk that the above rollback strategy won't be sufficient enough to recover from new classes of errors. We need a more responsive and dynamic provisioning system that can automatically respond to faults and take proactive remediation action, instead of triggering alerts and potentially waking up one of our globally-distributed engineers.&lt;/p&gt;

&lt;p&gt;This is where a Kubernetes operator can pick up the slack and improve our existing infrastructure automation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Introducing a Kubernetes Operator
&lt;/h3&gt;

&lt;p&gt;Now that stateful workloads have started to mature in the Kubernetes ecosystem, practitioners are writing operators to automate the management of these workloads. Operators that handle full-lifecycle tasks like provisioning, snapshotting, migrating, and upgrading clustered database systems have started to become mainstream across the industry.&lt;/p&gt;

&lt;p&gt;These operators include Custom Resource Definitions (CRDs) that define new types of Kubernetes API resources which describe how to manage the intricacies of each particular database system. Custom Resources (CRs) are then created from these CRDs that effectively define the expected state of a particular database resource. CRs are managed by operator controller deployments that orchestrate various Kubernetes primitives to migrate the cluster state to its desired state based on the CR manifests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwd4latenbpn6mix0umn5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwd4latenbpn6mix0umn5.jpg" alt="Operator Terminology" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By writing our own Kubernetes operator, I believed, we could take advantage of this model by allowing the provisioner to only focus on higher-level tasks, while an operator handles the nitty-gritty details of managing each individual customer database. Since an operator is directly hooked in to the Kubernetes API server through Resource Watches, it can immediately react to changes in cluster state and constantly reconcile cluster resources to achieve the desired state described by a Custom Resource spec. This allows us to automate even more of our infrastructure management, since the control plane is effectively running all of the time, instead of only during explicit provisioning actions.&lt;/p&gt;

&lt;p&gt;With an operator deployed in one of our clusters, we can now interact with a customer database by executing API calls and CRUD operations against a single resource, a new Custom Resource Definition that represents an entire QuestDB Cloud deployment. This CRD allows us to maintain a central Kubernetes object that represents the canonical version of the entire state of a customer's QuestDB database deployment in our cluster. It includes fields like the database version, instance type, volume size, server config, and the underlying node image, as well as key pieces of status like a node's IP address and DNS record information. Based on these spec field values, as well as changes to them, the operator performs a continuous reconciliation of all the relevant Kubernetes and AWS primitives to match the expected state of the higher-level QuestDB construct.&lt;/p&gt;

&lt;h3&gt;
  
  
  New Architecture
&lt;/h3&gt;

&lt;p&gt;This leads us to our new architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczmyogn5lw8zjxz3jmuq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczmyogn5lw8zjxz3jmuq.jpg" alt="Operator Architecture" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While this diagram resembles the old one, there is a key difference here; a QuestDB operator instance is now running inside of each cluster! Instead of the provisioner being responsible for all of the low-level Kubernetes and AWS building blocks, it is now only responsible for modifying a single object: a QuestDB Custom Resource. Any modifications to the state of a QuestDB deployment, large or small, can be made by changing a single manifest that describes the desired state of the entire deployment.&lt;/p&gt;

&lt;p&gt;This solution still uses the provisioner, but in a much more limited capacity than before. The provisioner continues to act as the communication link between the application and each tenant cluster, and it is still responsible for coordinating high-level operations. But, it can now delegate low-level database management responsibilities to the operator's constantly-running reconciliation loops.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operator Benefits
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Auto-Healing
&lt;/h3&gt;

&lt;p&gt;Since the operator model is eventually-consistent, we no longer need to rely on rollbacks in the face of errors. Instead, if the operator encounters an error in its reconciliation loop, it re-attempts that reconciliation at a later time, using an exponential backoff by default.&lt;/p&gt;

&lt;p&gt;The operator is registered to Resource Watches for all QuestDBs in the cluster, as well as for their subcomponents. So, a change in any of these components, or the QuestDB spec itself, will trigger a reconciliation loop inside the operator. If a component is accidentally mutated or experiences some state drift, the operator will be immediately notified of this change and automatically attempt to bring the rest of the infrastructure to its desired state.&lt;/p&gt;

&lt;p&gt;This property is incredibly useful to us, since we provision a dedicated node for each database and use a combination of taints and NodeSelectors to ensure that all of a deployment's Pods are running on this single node. Due to these design choices, we are now exposed to the risk of a node hardware failure that renders it inaccessible. Typical Kubernetes recovery patterns can not save us.&lt;/p&gt;

&lt;p&gt;So instead of using an alerting-first remediation paradigm, where on-call engineers would follow a runbook after receiving an alert that a database is down, the operator allows us to write custom node failover logic that is triggered as soon as the cluster is aware of an unresponsive node. Now, our engineers can sleep soundly through the night while the operator automatically performs the remediation steps that they would previously be responsible for.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simpler Provisioning
&lt;/h3&gt;

&lt;p&gt;Instead of having to coordinate tens of individual objects in a typical operation, the provisioner now only has to modify a single Kubernetes object while executing higher level instructions. This dramatically reduces the code complexity inside of the provisioner codebase and makes it far easier for us to write unit and integration tests. It is also easier for on-call engineers to quickly diagnose and solve issues with customer databases, since all of the relevant information about a specific database is contained inside of a single Kubernetes spec manifest, instead of spread across various resource types inside each namespace. Although, as expressed in the previous point, hopefully there will be fewer problems for our on-call engineers to solve in the first place!&lt;/p&gt;

&lt;h3&gt;
  
  
  Easier Upgrades
&lt;/h3&gt;

&lt;p&gt;The 15-week Kubernetes release cadence can be a burden for smaller infrastructure teams to keep on top of. This relatively rapid schedule, combined with the short lifetime of each release, means that it is critical for organisations to keep their clusters up-to-date. The importance of this is magnified if you're not running your own control plane, since vendors tend to drop support for old k8s versions fairly quickly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuymyi9n475krb9107g2f.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuymyi9n475krb9107g2f.jpg" alt="EKS End Of Life Schedule" width="800" height="528"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since we're running each database on its own node, and there are only two of us to manage the entire QuestDB Cloud infrastructure, we could easily spend most of our time just upgrading customer database nodes to new Kubernetes versions. Instead, our operator now allows us to automatically perform a near-zero-downtime node upgrade with a single field change on our CRD.&lt;/p&gt;

&lt;p&gt;Once the operator detects a change in a database's underlying node image, it will handle all of the orchestration around the upgrade, including keeping the customer database online for as long as possible and cleaning up stale resources once the upgrade is complete. With this automation in place, we can now perform partial upgrades against subsets of our infrastructure to test new versions before rolling them out to the entire cluster, and also seamlessly roll back to an older node version if required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Control Plane Locality
&lt;/h3&gt;

&lt;p&gt;By deploying an operator-per-cluster, we can improve the time it takes for provisioning operations to complete by avoiding most cross-region traffic that takes place from the provisioner to remote tenant clusters. We only need to make a few cross-region API calls to kickstart the provisioning process by performing a CRUD operation on a QuestDB spec, and the local cluster operator will handle the rest. This improves our API call network latency and reliability, reducing the number of retries for any given operation as well as the overall database provisioning time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resource Coordination
&lt;/h3&gt;

&lt;p&gt;Now that we have a single Kubernetes resource to represent an entire QuestDB deployment, we can use this resource as a primitive to build even larger abstractions. For example, now we can easily create a massive (but temporary) instance to crunch large amounts of data, save that data to a PersistentVolume in the cluster, then spin up another smaller database to act as the query interface. Due to the inherent composability of Kubernetes resources, the possibilities with a QuestDB Custom Resource are endless!&lt;/p&gt;

&lt;h2&gt;
  
  
  The DevOps Stuff
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Testing the Operator
&lt;/h3&gt;

&lt;p&gt;Before deploying such a significant piece of software in our stack, I wanted to be as confident as possible in its behavior and runtime characteristics before letting it run wild in our clusters. So we wrote and ran a large series of unit and integration tests before we felt confident enough to deploy the new infrastructure to development, and eventually to production.&lt;/p&gt;

&lt;p&gt;Unit tests were written against an in-memory Kubernetes API server using the &lt;a href="https://book.kubebuilder.io/reference/envtest.html" rel="noopener noreferrer"&gt;controller-runtime/pkg/envtest&lt;/a&gt; library. Envtest allowed us to iterate quickly since we could run tests against a fresh API cluster that started up in around 5 seconds instead of having to spin up a new cluster every time we wanted to run a test suite. Even existing micro-cluster tools like &lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;Kind&lt;/a&gt; could not get us that level of performance. Since envtest is also not packaged with any controllers, we could also set our test cluster to a specific state and be sure that this state would not be modified unless we explicitly did so in our test code. This allowed us to fully test specific edge-cases without having to worry about control plane-level controllers modifying various objects out from underneath us.&lt;/p&gt;

&lt;p&gt;But unit testing is not enough, especially since many of the primitives that we are manipulating are fairly high-level abstractions in and of themselves. So we also spun up a test EKS cluster to run live tests against, built with the same manifests as our development and production environments. This cluster allowed us run tests using live AWS and K8s APIs, and was integral when testing features like node recovery and upgrades.&lt;/p&gt;

&lt;p&gt;We were also able to leverage &lt;a href="https://onsi.github.io/ginkgo/" rel="noopener noreferrer"&gt;Ginkgo&lt;/a&gt;'s parallel testing runtime to run our integration tests on multiple concurrent processes. This provided multiple benefits: we could run our entire integration test suite in under 10 minutes and also reuse the same suite to load test the operator in a production-like environment. Using these tests, we were able to identify hot spots in the code that needed further optimization and experimented with ways to save API calls to ease the load on our own Kubernetes API server while also staying under various AWS rate limits. It was only after running these tests over and over again that I felt confident enough to deploy the operator to our dev and prod clusters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitoring
&lt;/h3&gt;

&lt;p&gt;Since we built our operator using the &lt;a href="https://book.kubebuilder.io/" rel="noopener noreferrer"&gt;Kubebuilder framework&lt;/a&gt;, most standard monitoring tasks were handled for us out-of-the-box. Our operator automatically exposes a rich set of Prometheus metrics that measure reconciliation performance, the number of k8s API calls, workqueue statistics, and memory-related metrics. We we were able to ingest these metrics into pre-built dashboards by leveraging the &lt;a href="https://book.kubebuilder.io/plugins/grafana-v1-alpha" rel="noopener noreferrer"&gt;grafana/v1-alpha plugin&lt;/a&gt;, which scaffolds two Grafana dashboards to monitor Operator resource usage and performance. All we had to do was add these to our existing Grafana manifests and we were good to go!&lt;/p&gt;

&lt;h3&gt;
  
  
  Migrating to the New Paradigm
&lt;/h3&gt;

&lt;p&gt;No article about a large infrastructure change would be complete without a discussion about the migration strategy from old to the new. In our case, we designed the operator to be a drop-in-replacement for the database instances managed by the provisioner. This involved many tedious rounds of diffing yaml manifests to ensure that the resources coordinated by the operator would match the ones that were already in our cluster.&lt;/p&gt;

&lt;p&gt;Once this resource parity was reached, we utilized Kubernetes owner-dependent relationships to create an Owner Reference on each sub-resource that pointed to the new QuestDB Custom Resource. Part of the reconciliation loop involves ensuring that these ownership references are in place, and if not, are created automatically. This way, the operator can seamlessly "inherit" legacy resources through its main reconciliation codepath.&lt;/p&gt;

&lt;p&gt;We were able to roll this out incrementally due to the guarantee that the operator cannot make any changes to resources that are not owned by a QuestDB CR. Because of this property, we could create a single QuestDB CR and monitor the operator's behavior on only the cluster resources associated with that database. Once everything behaved as expected, we started to roll out additional Custom Resources. This could be easily controlled on the provisioner side; by checking for the existence of a QuestDB CR for a given database. If the CR existed, then we could be sure that we were dealing with an operator-managed instance, otherwise the provisioner would behave as it did before the operator existed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I'm extremely proud of the hard work that we've put into this project. Starting from an empty git repo, it's been a long journey to get here. There have been several times over the past few months when I questioned if all of this work was even worth the trouble, especially since we already had a provisioning system that was battle-tested in production. But with the support of my colleagues, I powered through to the end. And I can definitely say that it's been worth it, as we've already seen immediate benefits of moving to this new paradigm!&lt;/p&gt;

&lt;p&gt;Using our QuestDB CR as a building block has paved the way for new features like adding a large-format bulk CSV upload with our &lt;a href="https://questdb.io/docs/guides/importing-data/" rel="noopener noreferrer"&gt;SQL COPY command&lt;/a&gt;. The new node upgrade process will save us countless hours with every new k8s release. And even though we haven't experienced a hardware failure yet, I'm confident that our new operator will seamlessly handle that situation as well, without paging up an on-call engineer. Especially since we are a small team, it is critical for us to use automation as a force-multiplier that assists us in managing our Cloud infrastructure, and a Kubernetes operator provides us with next-level automation that was previously unattainable.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;The best way to determine the best fit is to make it real with your own data. &lt;a href="https://questdb.io/cloud" rel="noopener noreferrer"&gt;QuestDB Cloud&lt;/a&gt; offers $200 in free credit. Or, if you would rather play around with sample data first, check out our &lt;a href="https://demo.questdb.io" rel="noopener noreferrer"&gt;live Web Console&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




</description>
      <category>kubernetes</category>
      <category>questdb</category>
      <category>infrastructure</category>
      <category>go</category>
    </item>
  </channel>
</rss>
