How do you ensure the stability of your Kubernetes clusters? How do you know that your manifests are syntactically valid? Are you sure you don’t have any invalid data types? Are any mandatory fields missing?
Most often, we only become aware of these misconfigurations at the worst possible time: when trying to deploy the new manifests.
Specialized tools and a “shift-left” approach make it possible to verify Kubernetes manifests against the schema before they’re applied to a cluster. In this article, I'll address how you can avoid misconfigurations and which tools are best to use.
Running schema validation tests is important, and the sooner the better.
If all machines (local developer environments, CI, etc.) have access to your Kubernetes cluster, run `kubectl --dry-run=server` on every code change. If this isn’t possible and you want to perform schema validation tests offline, use kubeconform together with a policy enforcement tool for optimal validation coverage.
Verifying the state of Kubernetes manifests may seem like a trivial task, because the Kubernetes CLI (kubectl) can verify resources before they’re applied to a cluster. You can verify the schema by adding the dry-run flag (`--dry-run=client/server`) to the `kubectl create` or `kubectl apply` commands, which then perform the validation without applying Kubernetes resources to the cluster.
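As a minimal sketch of how such a check could be wired into a script, the helper below builds the dry-run command and reports whether kubectl accepted the manifest. The function names and the `deployment.yaml` path are my own placeholders, and the sketch assumes a kubectl version that accepts `--dry-run=client` and `--dry-run=server`.

```python
import shlex
import subprocess
from typing import List


def dry_run_command(manifest_path: str, mode: str = "server") -> List[str]:
    """Build the kubectl command that validates a manifest without applying it."""
    if mode not in ("client", "server"):
        raise ValueError(f"unsupported dry-run mode: {mode!r}")
    return ["kubectl", "apply", f"--dry-run={mode}", "-f", manifest_path]


def validate(manifest_path: str, mode: str = "server") -> bool:
    """Run the dry-run; True means kubectl exited with code 0."""
    result = subprocess.run(
        dry_run_command(manifest_path, mode),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# "deployment.yaml" is a placeholder path, not a file from this article.
print(shlex.join(dry_run_command("deployment.yaml", "server")))
```

Calling `validate(...)` requires kubectl on the PATH and, as discussed next, a reachable cluster.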
But I can assure you that it’s actually more complex. A running Kubernetes cluster is required to obtain the schema for the set of resources being validated. So, when incorporating manifest verification into a CI process, you must also manage connectivity and credentials to perform the validation. This becomes even more challenging when dealing with multiple microservices in several environments (prod, dev, etc.).
Kubeval and kubeconform are command-line tools that were developed with the intent to validate Kubernetes manifests without the requirement of having a running Kubernetes environment. Because kubeconform was inspired by kubeval, they operate similarly — verification is performed against pre-generated JSON schemas that are created from the OpenAPI specifications (swagger.json) for each particular Kubernetes version. All that remains to run the schema validation tests is to point the tool executable to a single manifest, directory or pattern.
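Here is a rough sketch of what "pointing the tool" at manifests can look like from a script, using kubeconform. The helper name and the example paths are hypothetical. The `-kubernetes-version` and `-summary` flags are the ones I know from the kubeconform README, so double-check them against the version you install.

```python
import subprocess
from typing import List, Sequence


def kubeconform_command(targets: Sequence[str],
                        kubernetes_version: str = "1.18.0") -> List[str]:
    """Build a kubeconform call for one or more manifest files or directories."""
    if not targets:
        raise ValueError("at least one file or directory is required")
    return [
        "kubeconform",
        "-kubernetes-version", kubernetes_version,  # schema version to validate against
        "-summary",                                 # print a summary when done
        *targets,
    ]


def run_offline_validation(targets: Sequence[str],
                           kubernetes_version: str = "1.18.0") -> int:
    """Run kubeconform (no cluster needed) and return its exit code."""
    return subprocess.run(kubeconform_command(targets, kubernetes_version)).returncode


# Placeholder paths: a single manifest and a directory of manifests.
print(kubeconform_command(["service.yaml", "manifests/"]))
```

A glob pattern such as `manifests/*.yaml` is normally expanded by your shell before the tool sees it, so in a script like this you would expand it yourself (for example with `glob.glob`) and pass the resulting list.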
The tools compared in this article:
- kubeval
- kubeconform
- kubectl dry-run in ‘client’ mode
- kubectl dry-run in ‘server’ mode
Now that we've covered the tools that are available for Kubernetes schema validation, let’s compare some core abilities (misconfiguration coverage, speed, support for different Kubernetes versions, CRD support and docs).
|Misconfig/Tool|kubeval / kubeconform|kubectl dry-run in ‘client’ mode|kubectl dry-run in ‘server’ mode|
|---|---|---|---|
|API deprecation|✅ Caught|✅ Caught|✅ Caught|
|Invalid kind value|✅ Caught|❌ Didn't catch|🚧 Caught [3]|
|Invalid label value|❌ Didn't catch|❌ Didn't catch|✅ Caught|
|Invalid protocol type|✅ Caught|❌ Didn't catch|✅ Caught|
|Invalid spec key|✅ Caught|✅ Caught|✅ Caught|
|Missing image|❌ Didn't catch|❌ Didn't catch|✅ Caught|
|Wrong K8s indentation|✅ Caught|✅ Caught|✅ Caught|
Conclusion: Running kubectl dry-run in ‘server’ mode caught all seven misconfigurations, while kubeval/kubeconform missed two of them (the invalid label value and the missing image). It’s also interesting to see that running kubectl dry-run in ‘client’ mode is almost useless: it missed four of the seven misconfigurations, including some obvious ones, and it still requires a connection to a running Kubernetes environment.
I used hyperfine to benchmark the execution time of each tool [4]. First I ran it against (1) all the files with misconfigurations (seven files in total), and then I ran it against (2) 100 Kubernetes files (all the files contain the same config).
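The exact hyperfine invocation isn't reproduced in this article, so the following is only a sketch of how such a benchmark could be assembled. The `manifests/` directory, the warmup count and the helper name are assumptions. `--ignore-failure` is included because validating misconfigured files makes the tools exit with a non-zero code, which hyperfine would otherwise treat as an error.

```python
import shlex
from typing import List


def hyperfine_command(tool_commands: List[List[str]], warmup: int = 3) -> List[str]:
    """Build one hyperfine invocation that times every tool command."""
    cmd = ["hyperfine", "--warmup", str(warmup), "--ignore-failure"]
    # hyperfine takes each benchmarked command as a single shell string.
    cmd += [shlex.join(tool) for tool in tool_commands]
    return cmd


# "manifests/" is a placeholder directory holding the files under test.
tools = [
    ["kubeval", "-d", "manifests/"],
    ["kubeconform", "manifests/"],
    ["kubectl", "apply", "--dry-run=client", "-f", "manifests/"],
    ["kubectl", "apply", "--dry-run=server", "-f", "manifests/"],
]
print(shlex.join(hyperfine_command(tools)))
```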
(1) Results for running the tools against seven files with different Kubernetes schema misconfigurations:
(2) Results for running the tools against 100 files with valid Kubernetes schemas:
Conclusion: We can see that kubeval (#2) and `kubectl --dry-run=client` (#3) provide fast results on both tests, while `kubectl --dry-run=server` (#4) is slower, especially when it needs to evaluate 100 files. Still, 60 seconds to generate a result is a good outcome in my opinion.
Both kubeval and kubeconform accept the Kubernetes schema version as a flag. Although both tools are similar (as mentioned, kubeconform is based on kubeval), one of the key differences between them is that each tool relies on its own set of pre-generated JSON schemas:
- Kubeval - instrumenta/kubernetes-json-schema (last commit: 133f848 on April 29, 2020)
- Kubeconform - yannh/kubernetes-json-schema (last commit: a660f03 on May 15, 2021)
As of today (May 2021), kubeval only supports Kubernetes schema versions up to 1.18.1, while kubeconform supports the latest Kubernetes schema available today — 1.21.0. With kubectl, it’s a little bit trickier. I don’t know which version of kubectl introduced the dry-run, but I tried it with Kubernetes version 1.16.0 and it still worked, so I know it’s available in Kubernetes versions 1.16.0-1.18.0.
Support for a range of Kubernetes schema versions is especially important if you want to migrate to a new Kubernetes version. With kubeval and kubeconform you can set the version and start evaluating which configurations must be changed to support the cluster upgrade.
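To make the upgrade scenario concrete, here is a small sketch that prepares one kubeconform run per Kubernetes version and then lists the manifests that pass today but would fail after the upgrade. The helper names, the file names and the pass/fail results are all invented for illustration. In practice you would collect the results from the tool's output.

```python
from typing import Dict, List


def commands_per_version(manifest_dir: str, versions: List[str]) -> Dict[str, List[str]]:
    """Map each Kubernetes version to the kubeconform command that checks against it."""
    return {
        version: ["kubeconform", "-kubernetes-version", version, "-summary", manifest_dir]
        for version in versions
    }


def newly_failing(current: Dict[str, bool], target: Dict[str, bool]) -> List[str]:
    """Files that are valid on the current version but invalid on the target version."""
    return sorted(
        name for name, ok in current.items()
        if ok and not target.get(name, False)
    )


# Hypothetical validation results (file name -> is valid), not real measurements.
on_current = {"ingress.yaml": True, "deployment.yaml": True, "broken.yaml": False}
on_target = {"ingress.yaml": False, "deployment.yaml": True, "broken.yaml": False}

print(commands_per_version("manifests/", ["1.18.0", "1.21.0"]))
print(newly_failing(on_current, on_target))
```

Only the files returned by `newly_failing` need attention for the upgrade itself. Files that already fail on the current version are a separate problem.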
Conclusion: The fact that kubeconform has all the schemas for all the different Kubernetes versions available — and also doesn’t require minikube setup (as kubectl does) — makes it a superior tool when comparing these capabilities to its alternatives.
Custom Resource Definition (CRD) support
Both kubectl dry-run and kubeconform support custom resources (CRDs), while kubeval does not. According to the kubeval docs, you can pass a flag to kubeval to ignore missing schemas, so it will not fail when testing a set of manifests in which only some are custom resources.
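As a sketch, this is how that kubeval flag could be added from a script. I'm assuming the flag is spelled `--ignore-missing-schemas`, as in the kubeval docs. The helper name and file names are placeholders. Keep in mind that resources without a schema are skipped rather than validated.

```python
from typing import List


def kubeval_command(paths: List[str], ignore_missing_schemas: bool = True) -> List[str]:
    """Build a kubeval call that optionally skips resources without a known schema."""
    cmd = ["kubeval"]
    if ignore_missing_schemas:
        # Without this flag, kubeval fails on kinds it has no schema for (e.g. CRDs).
        cmd.append("--ignore-missing-schemas")
    return cmd + list(paths)


# Placeholder files: a built-in resource and a custom resource.
print(kubeval_command(["deployment.yaml", "my-custom-resource.yaml"]))
```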
Kubeval is a more popular project than kubeconform, and therefore its community and documentation are more extensive. Kubeconform doesn't have official docs, but it does have a well-written README file that explains its capabilities pretty well. Interestingly, although Kubernetes native tools like kubectl are usually well documented, it was really hard to find the information needed to understand how the `dry-run` flag actually works and what its limitations are.
Conclusion: Although it’s not as famous as kubeval, the CRD support and good-enough documentation make kubeconform the winner in my opinion.
|Item/Tool|kubeval|kubeconform|dry-run client|dry-run server|
|---|---|---|---|---|
|Benchmark speed test|+/-|+|+/-|-|
|Kubernetes versions support|-|+|+/-|+/-|
Now that you know the pros and cons of each tool, here are some best practices for leveraging them within your Kubernetes production-scale development flow.
- ⬅️ Shift-left: When possible, the best setup is to run `kubectl --dry-run=server` on every code change, but you probably can’t, because you can’t allow every developer or CI machine in your organization to connect to your cluster. So, the second-best option is to run kubeconform.
- 🚔 Because kubeconform doesn’t cover all common misconfigurations, it’s recommended to run it with a policy enforcement tool on every code change to fill the coverage gap.
- 💸 Buy vs. build: If you enjoy the engineering overhead, then kubeconform + conftest is a great combination of tools to get good coverage. Alternatively, there are tools that can provide you with an out-of-the-box experience to help you save time and resources, such as Datree [5] (whose schema validation is powered by kubeconform).
- 🚀 During the CD step, it shouldn’t be a problem to have a connection with your cluster, so you should always run `kubectl --dry-run=server` before deploying your new code changes.
- 👯 Another option for using kubectl dry-run in server mode without a connection to your Kubernetes environment is to run minikube + `kubectl --dry-run=server`. The downside of this hack is that you also need to set up the minikube cluster like prod (same volumes, namespace, etc.), or you’ll encounter errors when trying to validate your Kubernetes manifests.
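To illustrate the kind of gap a policy check fills, here is a toy sketch in plain Python that flags the two misconfigurations kubeval/kubeconform missed in my tests: an invalid label value and a missing image. It is only a stand-in for a real policy tool such as conftest, not how those tools are implemented. The function name and the sample manifest are hypothetical, and the manifest is assumed to be already parsed into a dictionary.

```python
import re
from typing import Any, Dict, List

# Kubernetes label values: at most 63 characters, empty or starting and ending
# with an alphanumeric character, with '-', '_' and '.' allowed in between.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def find_gaps(manifest: Dict[str, Any]) -> List[str]:
    """Return problems that pure schema validation may not report."""
    problems = []

    labels = (manifest.get("metadata") or {}).get("labels") or {}
    for key, value in labels.items():
        value = str(value)
        if len(value) > 63 or not LABEL_VALUE_RE.fullmatch(value):
            problems.append(f"invalid label value for {key!r}: {value!r}")

    # Pods keep containers under spec; Deployments and similar under spec.template.spec.
    spec = manifest.get("spec") or {}
    pod_spec = (spec.get("template") or {}).get("spec", spec)
    for container in pod_spec.get("containers", []):
        if not container.get("image"):
            problems.append(f"container {container.get('name', '?')!r} has no image")

    return problems


# Hypothetical manifest containing both misconfigurations.
bad_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo", "labels": {"owner": "--"}},
    "spec": {"containers": [{"name": "web"}]},
}
for problem in find_gaps(bad_pod):
    print(problem)
```

In a real pipeline you would express these rules in the policy tool's own language and run them right after the schema validation step.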
Thank you to Yann Hamon for creating kubeconform - it’s awesome!
This article wouldn’t be possible without you. Thank you for all of your guidance.
1. All schema validation tests were performed against Kubernetes version 1.18.0 ↩
2. Because kubeconform is based on kubeval, they provide the same results when run against the files with the misconfigurations. kubectl is one tool, but each mode (client or server) produces a different result, as you can see from the table ↩
3. Server mode didn’t mark the file as valid (exit code 1), but the error message is wrong: `Kind=pod doesn't support dry-run` ↩
4. All benchmark tests were performed on my MacBook Pro with a 2.3 GHz Quad-Core Intel Core i7 processor ↩
5. Disclaimer - self-promotion here :) ↩