(The tribulations of a Kubernetes operator developer)
I am a developer of the Network Observability operator, for Kubernetes / OpenShift.
A few days ago, we released our 1.6 version -- which I hope you will try and appreciate, but this isn't the point here. I want to talk about an issue that was reported to us soon after the release.
The error says: risk of data loss updating "flowcollectors.flows.netobserv.io": new CRD removes version v1alpha1 that is listed as a stored version on the existing CRD
What's that? It was a first for the team. This is an error reported by OLM (the Operator Lifecycle Manager).
Investigating
Indeed, we used to serve a v1alpha1 version of our CRD. And indeed, we are now removing it. But we didn't do it abruptly. We thought we followed all the guidelines of an API versioning lifecycle. I think we did, except for one detail.
Let's rewind and recap the timeline:
- v1alpha1 was the first version, introduced in our operator 1.0
- in 1.2, we introduced a new v1beta1. It was the new preferred version, but the storage version was still v1alpha1. Both versions were still served, and a conversion webhook allowed converting from one to the other.
- in 1.3, v1beta1 became the stored version. At this point, after an upgrade, every instance of our resource in etcd is in version v1beta1, right? (spoiler: it's more complicated)
- in 1.5, we introduced a v1beta2, and we flagged v1alpha1 as deprecated.
- in 1.6, we made v1beta2 the storage version, and removed v1alpha1.
And BOOM!
A few users complained about the error message mentioned above:
risk of data loss updating "flowcollectors.flows.netobserv.io": new CRD removes version v1alpha1 that is listed as a stored version on the existing CRD
And they are stuck: OLM won't allow them to proceed further. Their only alternative is to entirely remove the operator and the CRD, and reinstall.
In fact, only some early adopters of NetObserv have been seeing this, and we didn't see it when testing the upgrade prior to the release. So what happened? I spent the last couple of days trying to clear the fog.
When users install an old version (1.2 or earlier), the CRD keeps track of the storage version in its status:
kubectl get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1alpha1"]
Later on, when users upgrade to 1.3, the new storage version becomes v1beta1. So, this is certainly what now appears in the CRD status. This is certainly what now appears in the CRD status? (Padme style)
kubectl get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1alpha1","v1beta1"]
Why is it keeping v1alpha1? Oh, I know! Upgrading the operator did not necessarily change anything in the custom resources. Only resources that have been changed post-install would have made the apiserver write them to etcd in the new storage version; but different versions may coexist in etcd, hence the status.storedVersions field being an array and not a single string. That makes sense.
Surely, I can make some dummy edit on my custom resources to make sure they are stored in the new version: the apiserver will replace the old stored object with a new one, written in the updated storage version. Let's do this.
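For instance, something like this should trigger a rewrite (the annotation key is a made-up, throwaway one; any no-op change that causes a write to the resource should do):
# Add, then remove, a dummy annotation to force the apiserver to rewrite the resource
kubectl annotate flowcollector cluster test.netobserv.io/storage-refresh=now --overwrite
kubectl annotate flowcollector cluster test.netobserv.io/storage-refresh-
Then check again: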
kubectl get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1alpha1","v1beta1"]
Hmm...
So, I am now almost sure I don't have any v1alpha1 remaining in my cluster, but the CRD doesn't tell me that. What I learned is that the CRD status is not a source of truth for what's in etcd.
Here's what the doc says:
storedVersions lists all versions of CustomResources that were ever persisted. Tracking these versions allows a migration path for stored versions in etcd. The field is mutable so a migration controller can finish a migration to another version (ensuring no old objects are left in storage), and then remove the rest of the versions from this list. Versions may not be removed from spec.versions while they exist in this list.
But how to ensure no old objects are left in storage? While poking around, I haven't found any simple way to inspect what custom resources are in etcd, and in which version. It seems like no one wants to be responsible for that, in the core kube ecosystem. It is like a black box.
- The apiserver? It deals with incoming requests, but it doesn't actively keep track of, or stats on, what's in etcd. There is actually a metric (gauge) showing how many objects the apiserver stored, called apiserver_storage_objects (see the query example after this list). But it tells nothing about the version -- and even if it did, it would probably not be reliable, as it is generated from the requests that the apiserver handles; it does not maintain an active view of what's in etcd, as far as I understand.
- etcd itself? It is a binary store: it knows nothing about the business meaning of what comes in and out.
- And I'm not even talking about OLM, which is probably even further from knowing that.
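For what it's worth, here is one way to look at that metric, assuming you have the permissions to read the apiserver's raw metrics endpoint (the grep filter is just an illustration): it gives a count of stored objects per resource, but indeed no version label.
# Count of stored objects per resource -- no version information here
kubectl get --raw /metrics | grep apiserver_storage_objects | grep flowcollectors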
If you, reader, can shed some light on how you would do that, i.e. how you would ensure that no deprecated version of a custom resource is still lying around somewhere in a cluster, I would love to hear from you, so don't hesitate to let me know!
Update from October 8th, 2024:
koff makes this possible! You first need to dump your etcd database by creating a snapshot. Then you can use koff to get the versions of your custom resources, for instance:
koff use etcd.db
koff get myresource -ojson | jq '.items.[].apiVersion'
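For the snapshot part, something along these lines should work from a host that can reach etcd (the endpoint and certificate paths below are placeholders to adapt to your cluster):
# Dump the etcd database into a local snapshot file, to be consumed by koff
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/path/to/ca.crt --cert=/path/to/client.crt --key=/path/to/client.key \
  snapshot save ./etcd.db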
There's also the etcdctl tool that lets you interact with etcd directly, if you know exactly what you're looking for and how it is stored in etcd, for instance:
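Custom resources are stored as JSON under the /registry prefix, so -- assuming the usual key layout, and with the same connection flags as above omitted for brevity -- something like this should show them, including their apiVersion:
# List the keys for our custom resources, then dump one to inspect its stored version
etcdctl get /registry/flows.netobserv.io/flowcollectors --prefix --keys-only
etcdctl get /registry/flows.netobserv.io/flowcollectors/cluster
But expecting our users to do this for upgrading? Meh...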
Kube Storage Version Migrator
Actually, it turns out the kube community has a go-to option for this whole issue. It's called the Kube Storage Version Migrator (SVM). I guess in some flavours of Kubernetes, it might be enabled by default and trigger migrations for any custom resource. In OpenShift, the trigger for automatic migration is not enabled, so it is up to the operator developers (or the users) to create the migration requests.
In our case, this is what the migration request looks like:
apiVersion: migration.k8s.io/v1alpha1
kind: StorageVersionMigration
metadata:
  name: migrate-flowcollector-v1alpha1
spec:
  resource:
    group: flows.netobserv.io
    resource: flowcollectors
    version: v1alpha1
Under the hood, the SVM just rewrites the custom resources without any modification, to make the apiserver trigger a conversion (possibly via your webhooks, if you have some) and store them in the new storage version.
To make sure the resources have really been modified, we can check their resourceVersion before and after applying the StorageVersionMigration:
# Before
$ kubectl get flowcollector cluster -ojsonpath='{.metadata.resourceVersion}'
53114
# Apply
$ kubectl apply -f ./migrate-flowcollector-v1alpha1.yaml
# After
$ kubectl get flowcollector cluster -ojsonpath='{.metadata.resourceVersion}'
55111
# Did it succeed?
$ kubectl get storageversionmigration.migration.k8s.io/migrate-flowcollector-v1alpha1 -o yaml
# [...]
  conditions:
  - lastUpdateTime: "2024-07-04T07:53:12Z"
    status: "True"
    type: Succeeded
Then, all you have to do is trust SVM and the apiserver to have effectively rewritten all the resources stored in a deprecated version into the new one.
Unfortunately, we're not entirely done yet.
kubectl get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1alpha1","v1beta1"]
Yes, the CRD status isn't updated. It seems like it's not something SVM would do for us. So OLM will still block the upgrade. We need to manually edit the CRD status, and remove the deprecated version -- now that we're 99.9% sure it's not there (I don't like the other 0.1% much).
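As a sketch, assuming a kubectl recent enough to support the --subresource flag, a patch of this kind should do it (the list of versions to keep is, of course, specific to our CRD):
# Drop v1alpha1 from the stored versions recorded in the CRD status
kubectl patch crd flowcollectors.flows.netobserv.io --subresource=status --type=merge \
  -p '{"status":{"storedVersions":["v1beta1"]}}'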
Revisited lifecycle
To repeat the versioning timeline, here is what it seems we should have done:
- v1alpha1 was the first version, introduced in our operator 1.0
- in 1.2, we introduced a new v1beta1. Storage version is still v1alpha1
- in 1.3, v1beta1 becomes the stored version
  - ⚠️ The operator should check the CRD status and, if needed, create a StorageVersionMigration, and then update the CRD status to remove the old storage version ⚠️ (a sketch of these steps follows the list)
- in 1.5, v1beta2 is introduced, and we flag v1alpha1 as deprecated
- in 1.6, v1beta2 is the new storage version; we run again through the StorageVersionMigration steps (so we're safe when v1beta1 is removed later), and we remove v1alpha1
- Everything works like a charm, hopefully.
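Expressed as the equivalent manual commands, here is a minimal sketch of what the operator (or an upgrade script) would automate; the resource names are ours and the wait timeout is arbitrary:
# 1. Check whether the old version is still recorded as a stored version
kubectl get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
# 2. If it is, request a storage migration (requires SVM in the cluster) and wait for it
kubectl apply -f ./migrate-flowcollector-v1alpha1.yaml
kubectl wait --for=condition=Succeeded --timeout=120s \
  storageversionmigration.migration.k8s.io/migrate-flowcollector-v1alpha1
# 3. Finally, remove the old version from the CRD status
kubectl patch crd flowcollectors.flows.netobserv.io --subresource=status --type=merge \
  -p '{"status":{"storedVersions":["v1beta1"]}}'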
As an anecdote, in our case with NetObserv, this whole convoluted scenario probably just results from a false alarm, the initial OLM error being a false positive: our FlowCollector resource manages workload installation, and has a status that reports the deployments' readiness. On upgrade, new images are used and pods are redeployed, so the FlowCollector status changes. Hence, it had to be rewritten in the new storage version, v1beta1, prior to the removal of the deprecated version. The users who have seen this issue could simply have manually removed v1alpha1 from the CRD status, and that's it.
While one could argue that OLM is too conservative here, blocking an upgrade that should pass since all the resources in storage are most likely fine, in its defense, it probably has no simple way to know that. And ending up with resources made inaccessible in etcd is certainly a scenario we really don't want to run into. This is something that operator developers have to deal with.
I hope this article will help prevent future mistakes for others. This error is quite tricky to spot, as it can reveal itself long after the fact.
Update: examples of implementations have been given in the comments below (thanks Jeeva).
Top comments (3)
Thanks for the nice write-up @jotak. StorageVersionMigration is new to me, I learned about it from this post. We are doing similar work in the Tekton operator, where we have to deal with multiple CRDs: we apply an empty patch to update the etcd entries, and finally we update the storage version in the CRDs.
thanks Jeeva! I've added these links to our jira ticket :-)