<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joel Takvorian</title>
    <description>The latest articles on DEV Community by Joel Takvorian (@jotak).</description>
    <link>https://dev.to/jotak</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F508598%2F73887a04-fa6f-42d6-bc96-7f57b3301f50.png</url>
      <title>DEV Community: Joel Takvorian</title>
      <link>https://dev.to/jotak</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jotak"/>
    <language>en</language>
    <item>
      <title>Kubernetes operators: avoiding the memory pitfall</title>
      <dc:creator>Joel Takvorian</dc:creator>
      <pubDate>Fri, 19 Jul 2024 08:43:55 +0000</pubDate>
      <link>https://dev.to/jotak/kubernetes-operators-avoiding-the-memory-pitfall-10le</link>
      <guid>https://dev.to/jotak/kubernetes-operators-avoiding-the-memory-pitfall-10le</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;sup&gt;Cover image by Blake Patterson - CC BY 2.0 - &lt;a href="https://flickr.com/photos/blakespot/6173837649" rel="noopener noreferrer"&gt;https://flickr.com/photos/blakespot/6173837649&lt;/a&gt;&lt;/sup&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Previously, in the tribulations of a Kubernetes operator developer: &lt;a href="https://dev.to/jotak/kubernetes-crd-the-versioning-joy-6g0"&gt;Kubernetes CRD: the versioning joy&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A while ago, in the NetObserv team, we heard a user complaining about the memory consumption of our operator. Sure, memory and resource footprint are always a concern, but wait, what? Did they really mean the &lt;strong&gt;operator&lt;/strong&gt;? They must have been mistaken: the operator itself doesn't do anything memory-intensive. Or does it...&lt;/p&gt;

&lt;p&gt;What does the operator do, by the way? It is responsible for keeping the other NetObserv components afloat. It reads a config (a custom resource), and makes sure all the underlying components (for us: some eBPF agents, &lt;code&gt;flowlogs-pipeline&lt;/code&gt;, and an OpenShift console plugin) are well configured and running according to that global config. To do so, it needs to fetch, create, or update a few Kubernetes resources (deployments, config maps, secrets, etc.), and watch them. It does a few other things, but really nothing that could explain such high memory usage.&lt;/p&gt;

&lt;p&gt;And we were told the user had to increase the operator's memory limit to 4 GB. &lt;strong&gt;4 GB&lt;/strong&gt;. That is fishy.&lt;/p&gt;

&lt;h3&gt;The problems&lt;/h3&gt;

&lt;p&gt;Just to clarify: &lt;a href="https://github.com/netobserv/network-observability-operator/pull/476" rel="noopener noreferrer"&gt;we fixed the problem&lt;/a&gt; back in November 2023. For the initial investigation, we asked our friends from the Operator Framework: were we doing anything wrong? It turned out that yes, we had made a couple of wrong assumptions. To begin with, we were given some pointers such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://sdk.operatorframework.io/docs/best-practices/managing-resources/#how-to-compute-default-values" rel="noopener noreferrer"&gt;Managing resource requests/limits with operators&lt;/a&gt; (we already did that, this wasn't the root problem here, although it helps avoiding OOM-kills)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://groups.google.com/g/operator-framework/c/AIiDgRPJc00" rel="noopener noreferrer"&gt;This email thread&lt;/a&gt; (getting closer to the problem!)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://master.sdk.operatorframework.io/docs/best-practices/designing-lean-operators/" rel="noopener noreferrer"&gt;Good practices around cache management&lt;/a&gt; (problem found! Solution? Not yet...)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quoting some excerpts from that last link:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;One of the pitfalls that many operators are failing into is that they watch resources with high cardinality like secrets possibly in all namespaces. This has a massive impact on the memory used by the controller on big clusters. Such resources can be filtered by label or fields.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But also:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Requests to a client backed by a filtered cache for objects that do not match the filter will never return anything. In other words, filtered caches make the filtered-out objects invisible to the client.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So yes, that was us:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewControllerManagedBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;For&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;flowslatest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FlowCollector&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Owns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;corev1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConfigMap&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Owns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;appsv1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Deployment&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="c"&gt;// etc.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was the former definition of our controller builder. You can see that we declare "owning" ConfigMaps. &lt;code&gt;Owns&lt;/code&gt; is documented as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Owns defines types of Objects being *generated* by the ControllerManagedBy, and configures the ControllerManagedBy to respond to&lt;/span&gt;
&lt;span class="c"&gt;// create / delete / update events by *reconciling the owner object*.&lt;/span&gt;
&lt;span class="c"&gt;//&lt;/span&gt;
&lt;span class="c"&gt;// The default behavior reconciles only the first controller-type OwnerReference of the given type.&lt;/span&gt;
&lt;span class="c"&gt;// Use Owns(object, builder.MatchEveryOwner) to reconcile all owners.&lt;/span&gt;
&lt;span class="c"&gt;//&lt;/span&gt;
&lt;span class="c"&gt;// By default, this is the equivalent of calling&lt;/span&gt;
&lt;span class="c"&gt;// Watches(object, handler.EnqueueRequestForOwner([...], ownerType, OnlyControllerOwner())).&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means that reconcile requests are generated when watched resources of the given kinds are created/updated/deleted. &lt;strong&gt;It doesn't mean that only these owned resources are watched by the underlying informers&lt;/strong&gt;. In fact, if you don't properly configure the cache, the underlying informers end up watching &lt;strong&gt;all&lt;/strong&gt; resources of the given kinds in the cluster. That was our first wrong assumption, about what &lt;code&gt;Owns&lt;/code&gt; means. On large clusters with many ConfigMaps or Secrets, this has a massive impact, not only on memory, but also on bandwidth usage with the API server. Other kinds, such as Deployments or DaemonSets, are generally less numerous and less heavy, so it might be OK to keep them globally watched, even though the same problem fundamentally applies.&lt;/p&gt;

&lt;p&gt;But there's more.&lt;/p&gt;

&lt;p&gt;We noticed that removing the &lt;code&gt;Owns(&amp;amp;corev1.ConfigMap{}).&lt;/code&gt; line didn't change the memory consumption at all. There was something else. We found that we still had all of the cluster's config maps in memory, yet we don't declare any watch or informer on ConfigMaps. At least not intentionally. What we do, however, is call things like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NamespacedName&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"my-cm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Namespace&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="s"&gt;"my-ns"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;configmap&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sounds pretty harmless, doesn't it?&lt;/p&gt;

&lt;p&gt;Let's check the doc, from &lt;code&gt;Get&lt;/code&gt; in the &lt;code&gt;Reader&lt;/code&gt; interface (&lt;a href="https://github.com/kubernetes-sigs/controller-runtime/blob/1ed345090869edc4bd94fe220386cb7fa5df745f/pkg/client/interfaces.go#L50" rel="noopener noreferrer"&gt;permalink&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;    &lt;span class="c"&gt;// Get retrieves an obj for the given object key from the Kubernetes Cluster.&lt;/span&gt;
    &lt;span class="c"&gt;// obj must be a struct pointer so that obj can be updated with the response&lt;/span&gt;
    &lt;span class="c"&gt;// returned by the Server.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Okay, nothing fancy here. But just in case: what about the interface implementations? First, it's interesting to notice that there are several implementations: from &lt;code&gt;client&lt;/code&gt;, &lt;code&gt;typed_client&lt;/code&gt;, &lt;code&gt;unstructured_client&lt;/code&gt;, &lt;code&gt;namespaced_client&lt;/code&gt;, &lt;code&gt;metadata_client&lt;/code&gt;. The &lt;a href="https://github.com/kubernetes-sigs/controller-runtime/blob/1ed345090869edc4bd94fe220386cb7fa5df745f/pkg/client/client.go#L351" rel="noopener noreferrer"&gt;main implementation&lt;/a&gt;, in &lt;code&gt;client.go&lt;/code&gt;, attempts to read from a cache, or falls back to doing a live query with one of the other implementations (typed, metadata or unstructured client). The namespaced client is a wrapper on top of another client.&lt;/p&gt;

&lt;p&gt;The package-level doc mentions that a cache is used, but without telling much about the details:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// It is a common pattern in Kubernetes to read from a cache and write to the API&lt;/span&gt;
&lt;span class="c"&gt;// server.  This pattern is covered by the creating the Client with a Cache.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What cache are we talking about? Is it for lazy-loading the requested resources?&lt;br&gt;
Not really: it's again &lt;a href="https://github.com/kubernetes-sigs/controller-runtime/blob/1ed345090869edc4bd94fe220386cb7fa5df745f/pkg/cache/cache.go#L386-L403" rel="noopener noreferrer"&gt;an informers cache&lt;/a&gt;. Informers don't load resources lazily: they prefetch everything. So when the first request comes in for a resource of a given kind, an informer for that kind is started, filling up with data from the whole cluster. This is why we still had high memory consumption, and it was our second wrong assumption: that a simple &lt;code&gt;Get&lt;/code&gt; could not be harmful. I find this quite pernicious: since it's all done implicitly, it's easy to shoot yourself in the foot.&lt;/p&gt;

&lt;p&gt;Removing this &lt;code&gt;Get&lt;/code&gt; call finally resulted in a much smaller memory footprint for the operator: from gigabytes down to less than 100 MB. OK then, how do we really fix it, while still fetching the resources that we need? There are several options, depending on the use case.&lt;/p&gt;
&lt;h3&gt;Non-solutions&lt;/h3&gt;

&lt;p&gt;Regarding the manager cache, you may think that using custom predicates in the controller builder would solve this issue. For instance, writing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;MyReconciler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;SetupWithManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Manager&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewControllerManagedBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mgr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;For&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MyResource&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Owns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;corev1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConfigMap&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithPredicates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myPredicate&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where &lt;code&gt;myPredicate&lt;/code&gt; would narrow down the watched config maps.&lt;/p&gt;

&lt;p&gt;But it doesn't. These predicates filter which changes generate a reconcile request, and that filtering only happens &lt;em&gt;after&lt;/em&gt; the informers have been updated with the created/updated/deleted config maps. In other words, it avoids triggering unnecessary reconcile loops (thus saving CPU), but it has no impact on what the informers keep in cache under the covers.&lt;/p&gt;

&lt;h3&gt;The solutions&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;controller-runtime&lt;/code&gt; offers several mitigations for the issue:&lt;/p&gt;

&lt;p&gt;&lt;span&gt;1.&lt;/span&gt; As mentioned above, &lt;a href="https://master.sdk.operatorframework.io/docs/best-practices/designing-lean-operators/" rel="noopener noreferrer"&gt;this page on good practices&lt;/a&gt; provides two examples of restricting the informers' scope: by filtering the cached resources on a label, or on a field such as the resource name or namespace. Read the &lt;a href="https://github.com/kubernetes-sigs/controller-runtime/blob/1ed345090869edc4bd94fe220386cb7fa5df745f/pkg/cache/cache.go#L135" rel="noopener noreferrer"&gt;cache options documentation&lt;/a&gt; for more information. As mentioned, you need to be careful with this solution, as it will prevent you from accessing resources outside the defined scope. But this is definitely the way to go, if it works for you.&lt;/p&gt;
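&lt;p&gt;As an illustration, here is a minimal sketch of such a scope restriction (assuming a recent controller-runtime exposing &lt;code&gt;cache.Options.ByObject&lt;/code&gt;; the label and namespace values are made up):&lt;/p&gt;

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newManager builds a manager whose informers only cache ConfigMaps
// carrying a given label, and Secrets from a single namespace.
func newManager() (ctrl.Manager, error) {
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			ByObject: map[client.Object]cache.ByObject{
				&corev1.ConfigMap{}: {
					Label: labels.SelectorFromSet(labels.Set{"app": "my-operator"}),
				},
				&corev1.Secret{}: {
					Field: fields.OneTermEqualSelector("metadata.namespace", "my-namespace"),
				},
			},
		},
	})
}
```

&lt;p&gt;Remember the caveat quoted earlier: with such a configuration, a &lt;code&gt;Get&lt;/code&gt; on a ConfigMap that doesn't match the label selector returns a not-found error, even if the object exists in the cluster.&lt;/p&gt;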

&lt;p&gt;However, this solution is only useful when you have some control, or prior knowledge, over the resources that you want to fetch. This is not always the case. For instance, you may expose an API allowing users to reference any config map or secret; &lt;a href="https://github.com/netobserv/network-observability-operator/blob/f569dbb5b476578ec0c57284f7f43e1abccfc939/apis/flowcollector/v1beta2/flowcollector_types.go#L932-L948" rel="noopener noreferrer"&gt;this is what we do in NetObserv&lt;/a&gt; when we need to load certificates for communicating with other systems. In that case, it is not possible to make any assumption about the config map's name, namespace, labels, etc., as we only get this information in the reconcile loops, when the manager and the controllers are already started, hence their cache is already configured. We could, maybe, consider restarting the whole manager and controllers when we detect a change in the required resource names/namespaces, but… meh, that sounds very complicated for an apparently simple problem, doesn't it?&lt;/p&gt;

&lt;p&gt;&lt;span&gt;2.&lt;/span&gt; Another option is to simply &lt;a href="https://github.com/kubernetes-sigs/controller-runtime/blob/1ed345090869edc4bd94fe220386cb7fa5df745f/pkg/client/client.go#L82" rel="noopener noreferrer"&gt;disable the cache&lt;/a&gt; for some group-version-kinds (GVKs). Here, we're talking about the client cache, which is not the same as the controller-runtime manager cache. This is a bit of a brutal solution: caches exist for a good reason; used correctly, they minimize traffic with the API server, especially when the fetched resources aren't expected to change often.&lt;/p&gt;
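&lt;p&gt;For instance, opting ConfigMaps and Secrets out of the client cache might look like this (a sketch assuming a controller-runtime version that exposes &lt;code&gt;client.CacheOptions.DisableFor&lt;/code&gt;):&lt;/p&gt;

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newManager builds a manager whose client bypasses the cache for
// ConfigMaps and Secrets: every Get/List on these kinds is a live call
// to the API server, and no informer is started for them.
func newManager() (ctrl.Manager, error) {
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Client: client.Options{
			Cache: &client.CacheOptions{
				DisableFor: []client.Object{&corev1.ConfigMap{}, &corev1.Secret{}},
			},
		},
	})
}
```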

&lt;p&gt;&lt;span&gt;3.&lt;/span&gt; Or we can implement our own cache layer. This is the solution we opted for in NetObserv, given that option 1 didn't work for us and option 2 would lose all the benefits of a cache.&lt;/p&gt;

&lt;p&gt;So we have this &lt;a href="https://github.com/netobserv/network-observability-operator/blob/f569dbb5b476578ec0c57284f7f43e1abccfc939/pkg/narrowcache/doc.go" rel="noopener noreferrer"&gt;narrowcache&lt;/a&gt; package that:&lt;br&gt;
      - provides a client that is a wrapper on top of &lt;code&gt;sigs.k8s.io/controller-runtime/pkg/client&lt;/code&gt;&lt;br&gt;
      - is configured with an explicit list of GVKs to manage (requests for other GVKs are redirected to the wrapped client)&lt;br&gt;
      - also provides a &lt;code&gt;Source&lt;/code&gt; interface, allowing it to be used for enqueuing reconcile requests with watches defined on the controllers&lt;/p&gt;

&lt;p&gt;The resources of managed GVKs are then lazy-loaded: on the first request, they are fetched using a live client and added to a local cache, and a watch is created under the covers to track changes to that specific resource (NOT to the full GVK, like informers do). Subsequent calls just return the cached object.&lt;/p&gt;
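&lt;p&gt;To illustrate the lazy-loading idea, here is a toy, dependency-free sketch (this is &lt;em&gt;not&lt;/em&gt; the actual &lt;code&gt;narrowcache&lt;/code&gt; code: the real package fetches through a live Kubernetes client and refreshes or evicts entries from a per-object watch):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// Key identifies a single namespaced object.
type Key struct{ Namespace, Name string }

// LazyCache fetches an object on first request and serves it from memory
// afterwards. A real implementation would also start a watch scoped to that
// single object, to refresh or evict the cached copy when it changes.
type LazyCache struct {
	mu      sync.Mutex
	store   map[Key]string
	fetch   func(Key) (string, error) // stand-in for a live API client
	fetches int                       // counts live calls, for demonstration
}

func NewLazyCache(fetch func(Key) (string, error)) *LazyCache {
	return &LazyCache{store: map[Key]string{}, fetch: fetch}
}

func (c *LazyCache) Get(k Key) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.store[k]; ok {
		return v, nil // cache hit: no API traffic at all
	}
	v, err := c.fetch(k)
	if err != nil {
		return "", err
	}
	c.fetches++
	c.store[k] = v
	return v, nil
}

func main() {
	c := NewLazyCache(func(k Key) (string, error) {
		return "content-of-" + k.Name, nil
	})
	k := Key{Namespace: "my-ns", Name: "my-cm"}
	v1, _ := c.Get(k)
	v2, _ := c.Get(k)
	fmt.Println(v1, v2, c.fetches) // the second Get is served from the cache
}
```

&lt;p&gt;Only the objects actually requested end up in memory, as opposed to an informer prefetching every object of the kind in the cluster.&lt;/p&gt;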

&lt;p&gt;The narrowcache client is used as follows in NetObserv:&lt;/p&gt;

&lt;p&gt;&lt;span&gt;1.&lt;/span&gt; In manager initialisation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;NewManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;kcfg&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// ...&lt;/span&gt;
    &lt;span class="n"&gt;narrowCache&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;narrowcache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kcfg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;narrowcache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConfigMaps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;narrowcache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Secrets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;narrowCache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ControllerRuntimeClientCacheOptions&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;

    &lt;span class="n"&gt;internalManager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kcfg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;narrowCache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreateClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;internalManager&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetClient&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"unable to create narrow cache client: %w"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c"&gt;// ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;span&gt;2.&lt;/span&gt; Subsequent calls to &lt;code&gt;Get&lt;/code&gt; are done transparently, as it implements the client &lt;code&gt;Reader&lt;/code&gt; interface.&lt;/p&gt;

&lt;p&gt;&lt;span&gt;3.&lt;/span&gt; Watching, to enqueue reconcile requests, is done like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;watch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt; &lt;span class="n"&gt;controller&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cl&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;narrowcache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;cl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Watch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EnqueueRequestsFromMapFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;reconcile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c"&gt;// enqueuing logic / filtering here&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To be honest, this is not a panacea: we're rewriting logic that also exists in controller-runtime, which is almost certainly better at it than we are (except that it doesn't cover the use case we need). There's also the drawback of having to deal with potential breaking changes in the controller-runtime interfaces, especially around &lt;code&gt;Source&lt;/code&gt; watching. I wish this were addressed directly upstream, but &lt;a href="https://github.com/kubernetes-sigs/controller-runtime/issues/2570" rel="noopener noreferrer"&gt;this proposal&lt;/a&gt; was rejected: it's considered an edge case.&lt;/p&gt;

&lt;h3&gt;By the way: an edge case, or a broad one?&lt;/h3&gt;

&lt;p&gt;I played this little game:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install some operators&lt;/li&gt;
&lt;li&gt;Monitor memory consumption and ingress traffic on these operators&lt;/li&gt;
&lt;li&gt;Create many config maps and secrets&lt;/li&gt;
&lt;li&gt;Rinse and repeat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of the ~60 random operators I tested, picked from OperatorHub, 13 showed a memory increase and a traffic spike to the API server correlated with the unrelated resources that I created. This is far from negligible. (I'm contacting the authors to let them know; no intent to shame, of course, it's so easy to get trapped.) You can also play this little game yourself, with the operators that you're using. Of course, there is a small chance that this is done on purpose, i.e. some operators may actually &lt;em&gt;need&lt;/em&gt; to watch all config maps or secrets in the cluster for their normal operation, but I bet this would be a very small minority, if any.&lt;/p&gt;

&lt;p&gt;To create many config maps and secrets, simply run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create namespace &lt;span class="nb"&gt;test
&lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;0..200&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;kubectl create cm test-cm-&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--from-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./large_file &lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done
for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;0..200&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;kubectl create secret generic test-secret-&lt;span class="nv"&gt;$i&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--from-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./large_file &lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where &lt;code&gt;large_file&lt;/code&gt; is a local file of ~ 500KB (for instance).&lt;/p&gt;
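&lt;p&gt;If you need to generate such a file, one way among others (the exact size doesn't matter much):&lt;/p&gt;

```shell
# generate ~500KB of printable data to use as the ConfigMap/Secret payload
base64 /dev/urandom | head -c 500000 > large_file
```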

&lt;p&gt;An operator tests positive if the memory metric increases and the network shows a spike during the operation, and negative if these metrics stay flat.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahmpcfh95vkyepbhdt4p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahmpcfh95vkyepbhdt4p.png" alt="Memory usage and bandwidth during ConfigMaps creation" width="800" height="454"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Here we see the tested operator reacting to unrelated ConfigMap creation, with memory increasing from 72MB to 350MB, and receive bandwidth showing spikes above 1MBps downloads.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I did not dig deep enough to check whether, for each operator that tested positive, a simple cache configuration would be sufficient. For sure, some of them don't need more than that.&lt;/p&gt;
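&lt;p&gt;For reference, in a controller-runtime based operator, the usual way to avoid caching every ConfigMap and Secret of the cluster is to narrow the manager's cache with a selector. This is only a hedged configuration sketch: the &lt;code&gt;cache.ByObject&lt;/code&gt; option exists in recent controller-runtime releases (v0.15+), and the &lt;code&gt;app=netobserv&lt;/code&gt; label is a made-up example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
	// Only cache ConfigMaps and Secrets carrying our label, instead of
	// mirroring all of them in the operator's memory.
	sel := labels.SelectorFromSet(labels.Set{"app": "netobserv"})
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			ByObject: map[client.Object]cache.ByObject{
				&amp;corev1.ConfigMap{}: {Label: sel},
				&amp;corev1.Secret{}:    {Label: sel},
			},
		},
	})
	if err != nil {
		panic(err)
	}
	_ = mgr // register reconcilers and start the manager as usual
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;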

&lt;p&gt;This blog is my small contribution to help raise awareness of the waste of resources still often seen in the software industry.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>softwareengineering</category>
      <category>theycoded</category>
    </item>
    <item>
      <title>Kubernetes CRD: the versioning joy</title>
      <dc:creator>Joel Takvorian</dc:creator>
      <pubDate>Thu, 04 Jul 2024 15:44:09 +0000</pubDate>
      <link>https://dev.to/jotak/kubernetes-crd-the-versioning-joy-6g0</link>
      <guid>https://dev.to/jotak/kubernetes-crd-the-versioning-joy-6g0</guid>
      <description>&lt;p&gt;&lt;em&gt;(The tribulations of a Kubernetes operator developer)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I am a developer of the &lt;a href="https://operatorhub.io/operator/netobserv-operator" rel="noopener noreferrer"&gt;Network Observability operator&lt;/a&gt;, for Kubernetes / OpenShift.&lt;/p&gt;

&lt;p&gt;A few days ago, we released our 1.6 version -- which I hope you will try and appreciate, but this isn't the point here. I want to talk about an issue that was reported to us soon after the release.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F783rekrxmjva1sus3lo3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F783rekrxmjva1sus3lo3.png" alt="OLM Console page in OpenShift showing an error during the operator upgrade" width="800" height="230"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The error says: risk of data loss updating "flowcollectors.flows.netobserv.io": new CRD removes version v1alpha1 that is listed as a stored version on the existing CRD&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What's that? It was a first for the team. This is an error reported by &lt;a href="https://olm.operatorframework.io/" rel="noopener noreferrer"&gt;OLM&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Investigating
&lt;/h2&gt;

&lt;p&gt;Indeed, we used to serve a &lt;code&gt;v1alpha1&lt;/code&gt; version of our CRD. And indeed, we are now removing it. But we didn't do it abruptly. We thought we followed all the guidelines of an API versioning lifecycle. I think we did, except for one detail.&lt;/p&gt;

&lt;p&gt;Let's rewind and recap the timeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;v1alpha1&lt;/code&gt; was the first version, introduced in our operator 1.0&lt;/li&gt;
&lt;li&gt;in 1.2, we introduced a new &lt;code&gt;v1beta1&lt;/code&gt;. It was the new preferred version, but the storage version was still &lt;code&gt;v1alpha1&lt;/code&gt;. Both versions were still served, and a conversion webhook converted between them.&lt;/li&gt;
&lt;li&gt;in 1.3, &lt;code&gt;v1beta1&lt;/code&gt; became the stored version. At this point, after an upgrade, every instance of our resource in &lt;em&gt;etcd&lt;/em&gt; is in version &lt;code&gt;v1beta1&lt;/code&gt;, right? (spoiler: it's more complicated).&lt;/li&gt;
&lt;li&gt;in 1.5 we introduced a &lt;code&gt;v1beta2&lt;/code&gt;, and we flagged &lt;code&gt;v1alpha1&lt;/code&gt; as deprecated.&lt;/li&gt;
&lt;li&gt;in 1.6, we made &lt;code&gt;v1beta2&lt;/code&gt; the storage version and removed &lt;code&gt;v1alpha1&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And &lt;strong&gt;BOOM&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;A few users complained about the error message mentioned above:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;risk of data loss updating "flowcollectors.flows.netobserv.io": new CRD removes version v1alpha1 that is listed as a stored version on the existing CRD&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And they are stuck: OLM won't allow them to proceed any further, short of entirely removing the operator and the CRD, then reinstalling.&lt;/p&gt;

&lt;p&gt;In fact, only some early adopters of NetObserv have been seeing this, and we didn't see it when testing the upgrade prior to release. So what happened? I spent the last couple of days trying to clear the fog.&lt;/p&gt;

&lt;p&gt;When users installed an old version (&amp;lt;= 1.2), the CRD kept track of the storage version in its status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get crd flowcollectors.flows.netobserv.io &lt;span class="nt"&gt;-ojsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.storedVersions}'&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"v1alpha1"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Later on, when users upgrade to 1.3, the new storage version becomes &lt;code&gt;v1beta1&lt;/code&gt;. So, this is certainly what now appears in the CRD status. This is certainly what now appears in the CRD status? (Padme style)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get crd flowcollectors.flows.netobserv.io &lt;span class="nt"&gt;-ojsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.storedVersions}'&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"v1alpha1"&lt;/span&gt;,&lt;span class="s2"&gt;"v1beta1"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why is it keeping &lt;code&gt;v1alpha1&lt;/code&gt;? Oh, I know! Upgrading the operator did not necessarily &lt;em&gt;change&lt;/em&gt; anything in the custom resources. Only resources that changed post-install would have made the &lt;em&gt;apiserver&lt;/em&gt; write them to &lt;em&gt;etcd&lt;/em&gt; in the new storage version; different versions may coexist in &lt;em&gt;etcd&lt;/em&gt;, hence the &lt;code&gt;status.storedVersions&lt;/code&gt; field being an array and not a single string. That makes sense.&lt;/p&gt;

&lt;p&gt;Certainly, I can do some dummy edit of my custom resources to make sure they are in the new storage version: the &lt;em&gt;apiserver&lt;/em&gt; will replace each old object with a new one, written in the updated storage version. Let's do this. Then check again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get crd flowcollectors.flows.netobserv.io &lt;span class="nt"&gt;-ojsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.storedVersions}'&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"v1alpha1"&lt;/span&gt;,&lt;span class="s2"&gt;"v1beta1"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hmm...&lt;br&gt;
So, I am now &lt;em&gt;almost&lt;/em&gt; sure I don't have any &lt;code&gt;v1alpha1&lt;/code&gt; remaining in my cluster, but the CRD doesn't tell me that. What I learned is that the CRD status &lt;strong&gt;is not a source of truth&lt;/strong&gt; for what's in &lt;em&gt;etcd&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Here's what the doc says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;storedVersions&lt;/code&gt; lists all versions of CustomResources that were ever persisted. Tracking these versions allows a migration path for stored versions in &lt;em&gt;etcd&lt;/em&gt;. The field is mutable so a migration controller can finish a migration to another version (ensuring no old objects are left in storage), and then remove the rest of the versions from this list. Versions may not be removed from &lt;code&gt;spec.versions&lt;/code&gt; while they exist in this list.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But how to ensure no old objects are left in storage? While poking around, I haven't found any simple way to inspect which custom resources are in &lt;em&gt;etcd&lt;/em&gt;, and in which version. It seems like no one in the core kube ecosystem wants to be responsible for that. It is like a black box.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Apiserver&lt;/em&gt;? It deals with incoming requests, but it doesn't actively keep track of what's in &lt;em&gt;etcd&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is actually a metric (gauge) showing which objects the &lt;em&gt;apiserver&lt;/em&gt; stored. It is called &lt;code&gt;apiserver_storage_objects&lt;/code&gt;:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjh71kkrc3l6d0r8u2r7g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjh71kkrc3l6d0r8u2r7g.png" alt="Graph showing the Prometheus metric " width="800" height="413"&gt;&lt;/a&gt;&lt;br&gt;
But it tells nothing about the version -- and even if it did, it would probably not be reliable: it's generated from the &lt;em&gt;requests&lt;/em&gt; that the &lt;em&gt;apiserver&lt;/em&gt; handles, not from an active view of what's in &lt;em&gt;etcd&lt;/em&gt;, as far as I understand.&lt;/p&gt;
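&lt;p&gt;If you want to eyeball that gauge without a Prometheus setup, you can scrape the &lt;em&gt;apiserver&lt;/em&gt; metrics endpoint directly; the &lt;code&gt;resource&lt;/code&gt; label value below matches our CRD and is just an example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Scrape the apiserver metrics and keep only our CRD's storage gauge
kubectl get --raw /metrics | grep 'apiserver_storage_objects{resource="flowcollectors.flows.netobserv.io"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;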

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;etcd&lt;/em&gt; itself? It is a binary store, it knows nothing about the business meaning of what comes in and out.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Not to mention &lt;em&gt;OLM&lt;/em&gt;, which is probably even further from knowing that.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you, reader, can shed some light on how you would do that, i.e. how you would ensure that no deprecated version of a custom resource is still lying around somewhere in a cluster, I would love to hear from you -- don't hesitate to let me know!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Update from October 8th, 2024:&lt;/em&gt;&lt;br&gt;
&lt;a href="https://github.com/gmeghnag/koff" rel="noopener noreferrer"&gt;koff&lt;/a&gt; allows to do so! You first need to dump your &lt;em&gt;etcd&lt;/em&gt; database by &lt;a href="https://docs.openshift.com/container-platform/4.16/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html#backing-up-etcd-data_backup-etcd" rel="noopener noreferrer"&gt;creating a snapshot&lt;/a&gt;. Then you can use &lt;em&gt;koff&lt;/em&gt; to get the versions of your custom resources, for instance:&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;koff use etcd.db
koff get myresource -ojson | jq '.items.[].apiVersion'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's the &lt;a href="https://github.com/etcd-io/etcd/blob/main/etcdctl/README.md" rel="noopener noreferrer"&gt;etcdctl&lt;/a&gt; tool that lets you interact with &lt;em&gt;etcd&lt;/em&gt;, if you know exactly what you're looking for, how it is stored in &lt;em&gt;etcd&lt;/em&gt;, etc. But expecting our users to do this just to upgrade? Meh...&lt;/p&gt;

&lt;h2&gt;
  
  
  Kube Storage Version Migrator
&lt;/h2&gt;

&lt;p&gt;Actually, it turns out the kube community has a go-to option for the whole issue. It's called the &lt;a href="https://kubernetes.io/docs/tasks/manage-kubernetes-objects/storage-version-migration/" rel="noopener noreferrer"&gt;Kube Storage Version Migrator&lt;/a&gt; (SVM). I guess in some flavours of Kubernetes it might be enabled by default and trigger for any custom resource. In OpenShift, &lt;a href="https://github.com/openshift/cluster-kube-storage-version-migrator-operator?tab=readme-ov-file#kube-storage-version-migrator-operator" rel="noopener noreferrer"&gt;the trigger for automatic migration is not enabled&lt;/a&gt;, so it is up to the operator developers (or the users) to create the migration requests.&lt;/p&gt;

&lt;p&gt;In our case, this is what the migration request looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;migration.k8s.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;StorageVersionMigration&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;migrate-flowcollector-v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flows.netobserv.io&lt;/span&gt;
    &lt;span class="na"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;flowcollectors&lt;/span&gt; 
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1alpha1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, SVM just rewrites the custom resources without any modification, making the &lt;em&gt;apiserver&lt;/em&gt; trigger a conversion (possibly via your webhooks, if you have some) and store them in the new storage version.&lt;/p&gt;

&lt;p&gt;To make sure the resources have really been modified, we can check their &lt;code&gt;resourceVersion&lt;/code&gt; before and after applying the &lt;code&gt;StorageVersionMigration&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Before&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl get flowcollector cluster &lt;span class="nt"&gt;-ojsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.metadata.resourceVersion}'&lt;/span&gt;
53114

&lt;span class="c"&gt;# Apply&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; ./migrate-flowcollector-v1alpha1.yaml

&lt;span class="c"&gt;# After&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl get flowcollector cluster &lt;span class="nt"&gt;-ojsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.metadata.resourceVersion}'&lt;/span&gt;
55111

&lt;span class="c"&gt;# Did it succeed?&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl get storageversionmigration.migration.k8s.io/migrate-flowcollector-v1alpha1 &lt;span class="nt"&gt;-o&lt;/span&gt; yaml
&lt;span class="c"&gt;# [...]&lt;/span&gt;
  conditions:
  - lastUpdateTime: &lt;span class="s2"&gt;"2024-07-04T07:53:12Z"&lt;/span&gt;
    status: &lt;span class="s2"&gt;"True"&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;: Succeeded
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, all you have to do is trust SVM and the &lt;em&gt;apiserver&lt;/em&gt; to have effectively rewritten every resource from the deprecated version into the new one.&lt;/p&gt;

&lt;p&gt;Unfortunately, we're not entirely done yet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get crd flowcollectors.flows.netobserv.io &lt;span class="nt"&gt;-ojsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.storedVersions}'&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"v1alpha1"&lt;/span&gt;,&lt;span class="s2"&gt;"v1beta1"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yes, the CRD status isn't updated: apparently that's not something SVM does for us. So OLM will still block the upgrade. We need to manually edit the CRD status and remove the deprecated version -- now that we're 99.9% sure it's not there (I don't like the remaining 0.1% much).&lt;/p&gt;
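&lt;p&gt;Here is a hedged sketch of that manual edit, assuming &lt;code&gt;jq&lt;/code&gt; is available and kubectl is recent enough (v1.24+) to support &lt;code&gt;--subresource&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Compute the storedVersions list without the deprecated v1alpha1
NEW=$(kubectl get crd flowcollectors.flows.netobserv.io -ojson \
  | jq -c '.status.storedVersions | map(select(. != "v1alpha1"))')
# Rewrite the status subresource with the filtered list
kubectl patch crd flowcollectors.flows.netobserv.io \
  --subresource=status --type=merge \
  -p "{\"status\":{\"storedVersions\":$NEW}}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;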

&lt;h2&gt;
  
  
  Revisited lifecycle
&lt;/h2&gt;

&lt;p&gt;To revisit the versioning timeline, here is what we should have done:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;v1alpha1&lt;/code&gt; was the first version, introduced in our operator 1.0&lt;/li&gt;
&lt;li&gt;in 1.2, we introduced a new &lt;code&gt;v1beta1&lt;/code&gt;. Storage version is still &lt;code&gt;v1alpha1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;in 1.3, &lt;code&gt;v1beta1&lt;/code&gt; becomes the stored version.

&lt;ul&gt;
&lt;li&gt;⚠️ &lt;strong&gt;The operator should check the CRD status and, if needed, create a &lt;code&gt;StorageVersionMigration&lt;/code&gt;, and then update the CRD status to remove the old storage version&lt;/strong&gt; ⚠️&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;in 1.5 &lt;code&gt;v1beta2&lt;/code&gt; is introduced, and we flag &lt;code&gt;v1alpha1&lt;/code&gt; as deprecated&lt;/li&gt;

&lt;li&gt;in 1.6, &lt;code&gt;v1beta2&lt;/code&gt; is the new storage version, and &lt;strong&gt;we run again through the &lt;code&gt;StorageVersionMigration&lt;/code&gt; steps&lt;/strong&gt; (so we're safe when &lt;code&gt;v1beta1&lt;/code&gt; is removed later). We remove &lt;code&gt;v1alpha1&lt;/code&gt;.
&lt;/li&gt;

&lt;li&gt;Everything works like a charm, hopefully.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;As an anecdote, in our case with NetObserv, this whole convoluted scenario probably resulted from a false alarm, the initial OLM error being a false positive: our FlowCollector resource manages workload installation, and it has a status that reports the deployments' readiness. On upgrade, new images are used and pods are redeployed, so the FlowCollector status changes; hence it had to be rewritten in the new storage version, &lt;code&gt;v1beta1&lt;/code&gt;, prior to the removal of the deprecated version. The users who hit this issue could simply have removed &lt;code&gt;v1alpha1&lt;/code&gt; from the CRD status manually, and that's it.&lt;/p&gt;

&lt;p&gt;One could argue that OLM is too conservative here, blocking an upgrade that should pass since all the resources in storage are most likely fine; but in its defense, it probably has no simple way to know that. And ending up with resources made inaccessible in &lt;em&gt;etcd&lt;/em&gt; is certainly a scenario we really don't want to run into. This is something that operator developers have to deal with.&lt;/p&gt;

&lt;p&gt;I hope this article helps others avoid the same mistake. This error is quite tricky to spot, as it can reveal itself long after the fact.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update: examples of implementations have been given in the comments below (thanks Jeeva):&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://github.com/tektoncd/operator/blob/v0.72.0/pkg/reconciler/shared/tektonconfig/upgrade/helper/migrator.go" rel="noopener noreferrer"&gt;migrator from TektonCD&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://github.com/knative/pkg/blob/2783cd8cfad9ba907e6f31cafeef3eb2943424ee/apiextensions/storageversion/migrator.go" rel="noopener noreferrer"&gt;migrator from Knative&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>softwareengineering</category>
      <category>theycoded</category>
    </item>
  </channel>
</rss>
