Please note that this post was originally published by me on medium.com; I've since decided to move over to dev.to instead!
This post assumes that you have some experience with operating/running Kubernetes and are familiar with some Go programming language concepts.
For the last year and a half we've been working a lot with OpenStack and Kubernetes at Etraveli Group. We're our own cloud provider, and the end users are the developers and the surrounding teams in the development organization.
One of the key components of running Kubernetes on top of OpenStack is the Cloud Controller Manager. It's one of the pieces that glue the two platforms together. Those of you who work with managed Kubernetes services in public clouds (or private ones, for that matter) will have this sorted for you already; the Kubernetes control plane will be more or less out of reach for you.
So why write something about the Cloud Controller Manager then? The main driver has been the simplest of questions, asked by one of my colleagues:
"What does the (OpenStack) Cloud Controller Manager do anyways?"
I tried to answer the question, but I basically had no idea what I was talking about. I only knew that it was responsible for creating a load balancer in the underlying cloud when a Service object of type LoadBalancer is created.
I immediately decided to dig into this a bit, and I ended up digging pretty deep, trying to understand every piece of the puzzle. In this post I'll summarize my findings.
Throughout this post I'll refer to:
Cloud Controller Manager as "CCM"
The Kubernetes main source code repository as "k/k"
Kubernetes as "k8s"
As I started to write this post I created a small project and repository where I've built my own Cloud Controller Manager by the book and compare two tiny Kubernetes (v1.18.2) clusters to each other, one running the CCM and one without it. It's a proof of concept that aims to show you exactly what you'll need to run your own CCM, everything from the code that makes up the CCM to the k8s manifests you'll need to deploy it.
Before we begin, it's good to know that this post links to various parts of the k8s source code repositories; I have used the v1.18.0 tag.
All illustrations are my own.
Enjoy!
Please leave a comment or two (!), any feedback is highly appreciated!
The Cloud Controller Manager
Looked at from a high level, the Cloud Controller Manager can be described as three different things:
A binary
A number of control loops
Part of the glue between k8s and your cloud
Code-wise the CCM is part of the k/k repo; take a look here. As mentioned in the official k8s documentation, the code for the CCM in k/k can be used as a skeleton for your own implementation. The difference will be the code (packages) you provide and import for interacting with your cloud.
The CCM will most often be deployed through a k8s manifest, with the binary built into a (Docker) container image pulled from a well-known container registry.
Worth noting is that the CCM will be assigned port 10258, which you can expose if needed. Out of the box the CCM serves a /healthz endpoint for checking the health of the service.
When looking through the k8s organization on GitHub you'll find a number of different CCM implementations. They're also called external CCMs since they live outside of the k/k repository, and they build on a set of well-crafted Golang interfaces as well as the bare minimum needed for anyone to create their own CCM.
We’ll have a detailed look at all of this later on.
The core of the CCM consists of four (cloud) controllers running control loops; optionally you can run your own controller(s) alongside them.
We'll have a more in-depth look at each of the cloud controllers running in the CCM in the coming sections.
The Node controller
The Node controller makes sure that your cloud nodes (e.g. VMs) are labelled, tainted and updated with other relevant information from your cloud provider. The controller will periodically do the following (in no particular order):
Initialize new nodes added to the cloud provider with the taint node.cloudprovider.kubernetes.io/uninitialized set to true and the taint effect NoSchedule. When the node is initialized by the Node controller, this taint is removed, allowing workloads to be scheduled on the node. Pods that are critical for e.g. running the cluster will of course have the toleration needed to be scheduled on nodes that are still tainted.
Update node IP address by comparing the IP address in the cloud provider with the one stored in the Node object in the k8s API.
Add or update node labels with information provided by the cloud; these include instance type, zone failure domain and zone region. Zone-specific information is not mandatory, as we'll see later on.
As for how the instance information mentioned above is fetched and added to the Node objects: to give you an example, the OpenStack external CCM does this by either reading the metadata from disk (config drive) or using the metadata service endpoint reachable from within each node.
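To make this a bit more concrete, here's a minimal and purely hypothetical Go sketch of the kind of Instances methods (defined by the k8s.io/cloud-provider package, which we'll look at in detail later) that the Node controller calls to get hold of this information. The myCloudClient type and its ServerByName call are made-up stand-ins for a real cloud API client:

```go
package mycloud

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

// myCloudClient and server are hypothetical stand-ins for your cloud's API
// client and its server/VM representation.
type myCloudClient struct{}

type server struct {
	PrivateIP string
	PublicIP  string
	Flavor    string
}

func (c *myCloudClient) ServerByName(ctx context.Context, name string) (*server, error) {
	// In a real provider this would call the cloud's compute API.
	return &server{PrivateIP: "10.0.0.4", PublicIP: "203.0.113.10", Flavor: "m1.small"}, nil
}

// instances is a hypothetical, partial implementation of the
// cloudprovider.Instances interface used by the Node controller.
type instances struct {
	client *myCloudClient
}

// NodeAddresses returns the addresses the cloud knows about for a node; the
// Node controller compares these with what's stored on the Node object.
func (i *instances) NodeAddresses(ctx context.Context, name types.NodeName) ([]v1.NodeAddress, error) {
	srv, err := i.client.ServerByName(ctx, string(name))
	if err != nil {
		return nil, err
	}
	return []v1.NodeAddress{
		{Type: v1.NodeInternalIP, Address: srv.PrivateIP},
		{Type: v1.NodeExternalIP, Address: srv.PublicIP},
	}, nil
}

// InstanceType ends up as the instance type label on the Node object.
func (i *instances) InstanceType(ctx context.Context, name types.NodeName) (string, error) {
	srv, err := i.client.ServerByName(ctx, string(name))
	if err != nil {
		return "", err
	}
	return srv.Flavor, nil
}
```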
Service controller
The Service controller handles everything related to the life cycle of the cloud load balancer backing a Service object. A Service provides a way of exposing your application internally and/or externally from the perspective of the k8s cluster.
This particular controller will only handle the Service objects of type LoadBalancer. This means, from a cloud provider perspective, that it’ll ensure that a load balancer of some sort is created, deleted and updated in your cloud.
Depending on how your CCM has implemented the load balancer logic, you can create a cloud load balancer that only balances network traffic internally in the cloud. This is usually done by defining a set of annotations in the metadata section of the Service object.
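The annotation key itself is up to each cloud provider; as a hedged sketch, a load balancer implementation could check for such an annotation roughly like this (the annotation name below is made up for illustration):

```go
package mycloud

import v1 "k8s.io/api/core/v1"

// internalLBAnnotation is a hypothetical annotation key; the real key
// depends on the cloud provider (OpenStack, AWS, GCP all use their own).
const internalLBAnnotation = "mycloud.example.com/internal-load-balancer"

// isInternalLB is the kind of check a cloud provider's load balancer code
// might do to decide between an internal and an external cloud load balancer.
func isInternalLB(svc *v1.Service) bool {
	return svc.Annotations[internalLBAnnotation] == "true"
}
```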
Please note that the default behavior, as I've observed in e.g. OpenStack, is that the following objects will be created when applying a simple Service object (of type LoadBalancer) manifest:
A cloud load balancer with a populated list of nodes that the traffic will be balanced across. The load balancer will point at every node using the randomly generated NodePort
A Service object of type NodePort
As seen above, there's actually a number of things that make up the Service object in both k8s and the cloud provider. For the users this means that the EXTERNAL-IP column will be filled with the cloud provider IP of the cloud load balancer, as seen when listing the Service objects using kubectl:
$> kubectl get service my-app-svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
my-app-svc LoadBalancer 172.20.10.100 123.123.123.123 80:31147/TCP
Route controller
Out of the four controllers there's one that's a bit special, and that's the (Cloud) Route controller. It will not be started unless you provide the --allocate-node-cidrs or the --configure-cloud-routes flags to the CCM. Also, if you haven't implemented any route handling logic, this controller won't start; more on this later on.
This controller will periodically try to do the following:
List all routes associated with the k8s cluster; this is done by querying the cloud provider API(s).
List all Nodes by querying the k8s API.
Loop through every Node and the podCIDRs field of the Node spec. The Pod CIDR and node name will be used to create routes via the cloud provider API(s).
Also, during the control loop, the controller will delete unused routes. When all routes have been created the Node will be considered ready, which means that the NetworkUnavailable node condition will be set to false. If the node hasn't got any routes associated with it, the NetworkUnavailable condition will be set to true. These conditions are translated into taints by the NodeLifecycle controller, not to be confused with the one the CCM is responsible for.
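To give you a feel for what the controller actually calls into, here's a hedged, do-nothing sketch of the Routes interface from k8s.io/cloud-provider (covered in more detail later); a real implementation would talk to your cloud's network/router API in each method:

```go
package mycloud

import (
	"context"

	cloudprovider "k8s.io/cloud-provider"
)

// routes is a hypothetical, partial implementation of the
// cloudprovider.Routes interface that the Route controller drives.
type routes struct{}

// ListRoutes returns the routes the cloud currently has for the cluster;
// the controller diffs these against the Nodes' pod CIDRs.
func (r *routes) ListRoutes(ctx context.Context, clusterName string) ([]*cloudprovider.Route, error) {
	// In a real provider: query the cloud's router/network API.
	return nil, nil
}

// CreateRoute is called for every Node whose pod CIDR has no matching route.
// route.TargetNode and route.DestinationCIDR carry the node name and pod
// CIDR respectively.
func (r *routes) CreateRoute(ctx context.Context, clusterName string, nameHint string, route *cloudprovider.Route) error {
	// In a real provider: create the route in the cloud.
	return nil
}

// DeleteRoute removes routes that no longer map to any Node.
func (r *routes) DeleteRoute(ctx context.Context, clusterName string, route *cloudprovider.Route) error {
	// In a real provider: delete the route in the cloud.
	return nil
}
```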
Lifecycle controller
The (Cloud Node) Lifecycle controller will make sure that your nodes, represented as Kubernetes API Node objects, are removed if the corresponding instances are removed from your cloud.
Also, if a node is in a cloud provider specified shutdown state, the node gets tainted accordingly with node.cloudprovider.kubernetes.io/shutdown and the taint effect NoSchedule.
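Continuing the hypothetical instances type from the Node controller sketch above, these are the two Instances methods this controller leans on; again a hedged sketch, not a working implementation:

```go
package mycloud

import "context"

// InstanceExistsByProviderID tells the controller whether the Node object
// should be removed from the k8s API because the instance is gone.
func (i *instances) InstanceExistsByProviderID(ctx context.Context, providerID string) (bool, error) {
	// In a real provider: look the instance up via the cloud API.
	return true, nil
}

// InstanceShutdownByProviderID lets the controller taint the node with
// node.cloudprovider.kubernetes.io/shutdown when the instance is stopped.
func (i *instances) InstanceShutdownByProviderID(ctx context.Context, providerID string) (bool, error) {
	// In a real provider: check the instance power state via the cloud API.
	return false, nil
}
```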
That's about all of the functionality you'll get from a CCM. Most of the time your cloud might have its own controller (provided as a separate binary from a separate repository) for handling k8s objects of type Ingress, or for helping you integrate natively with the underlying network infrastructure.
The CCM will not be a one-stop-shop solution; it's just one piece of a bigger puzzle.
If you've followed along, and perhaps have taken a look at the source code of the different controllers that the CCM handles, you might have noticed that there's no cloud provider specific code anywhere to be seen. There's merely a bunch of calls to somewhat mysterious methods on various objects.
The cloud provider specific code and the wiring of it will be the missing piece of this CCM puzzle.
A look in the rear view mirror
Before moving on to the cloud provider package (k8s.io/cloud-provider), the missing piece of the CCM puzzle, we'll go through some of the history behind the CCM and the cloud provider(s): how they've evolved over the years and how everything came to be.
From the very beginning the cloud integrations have been a foundational part of k8s, for obvious reasons. Let's take a look at the cloud providers that had their provider-specific code within the k8s repository (k/k) at v1.0, which was released in July of 2015:
- AWS
- GCE
- Mesos
- OpenStack
- Ovirt
- Rackspace
- Vagrant
You might recognize all of the above "cloud providers"; some of them are virtualization technologies and do not exactly fit into what we generally view as a cloud provider, to say the least.
Back in July of 2015 all of the cloud provider specific code was imported and used by the kubelet, one of the critical components that make up a k8s cluster node.
There's a number of problems with the original implementation of cloud provider specific code that the k8s community has recognized and is still working on.
Here are some of the problems that have surfaced over the years:
The kubelet shall not run cloud provider specific control loops. These have now been moved to the CCM.
Cloud providers shall not be part of k/k (in-tree); the reasons for this are that the cloud provider code would be bound to the k8s release cycle and that committing code to k/k can be a tedious task.
Support external (out-of-tree) cloud providers by providing a separate package with a pluggable way of integrating your cloud with k8s. This became the k8s.io/cloud-provider package.
All of the cloud controller code used by the CCM shall be moved to the k8s.io/cloud-provider package; there are still remnants of code in-tree that will be moved.
For backwards compatibility reasons the code that was in-tree will be around for a while, but it's now its own package (k8s.io/legacy-cloud-providers).
I've tried to trace how the CCM and k8s.io/cloud-provider came to be by digging around in the various repositories in the k8s organization, almost like digital archaeology. Here are some of the highlights:
In September of 2016 the enhancement issue #88 (KEP) was created to support out-of-tree (pluggable) cloud providers.
The start of breaking the (kube) controller manager into two pieces, October 2016. Please note that this PR mentions the volumeController; this was before CSI. That controller has since been removed from the CCM.
Cloud Controller Manager discussions, July 2017. It made it to beta in v1.11.
Here's a good explanation of how tightly coupled the kubelet and cloud provider once were, along with three approaches to decoupling them.
The cloud-provider package
The cloud provider package is imported as k8s.io/cloud-provider in your CCM and defines a number of (Golang) interfaces. The main one is the Interface interface which is what makes this package pluggable for cloud providers.
The Interface defines a set of methods, some of which return other interfaces. These returned interfaces are also defined in the cloud.go file of the cloud provider package:
LoadBalancer() (LoadBalancer, bool)
Instances() (Instances, bool)
Zones() (Zones, bool)
Clusters() (Clusters, bool)
Routes() (Routes, bool)
As you can see above, the method signatures specify a bool return value alongside the returned interface; this means that you can disable functionality that can't or shouldn't be implemented by your cloud provider. The bool is checked during initialization of the controller that implements the functionality defined by the interface.
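As an illustration, a hypothetical provider that doesn't support Clusters or Zones could simply opt out like this (a minimal sketch, not a complete cloudprovider.Interface implementation):

```go
package mycloud

import cloudprovider "k8s.io/cloud-provider"

// myCloud is a hypothetical type that would satisfy cloudprovider.Interface.
type myCloud struct{}

// Clusters isn't implemented by this provider; returning false simply
// disables the related functionality in the CCM instead of breaking it.
func (c *myCloud) Clusters() (cloudprovider.Clusters, bool) {
	return nil, false
}

// Zones is optional too, as mentioned in the Node controller section.
func (c *myCloud) Zones() (cloudprovider.Zones, bool) {
	return nil, false
}
```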
Here's a quick overview of which k8s.io/cloud-provider interface methods are used by which controller:
The Instances interface methods will be called from the Node and the Lifecycle controllers.
The Zones interface methods will be called from the Node and the Lifecycle controllers.
The Routes interface methods will be called from the Route controller.
The LoadBalancer interface methods will be called from the Service controller.
The Clusters interface is only used by the GCP external cloud provider.
Note that there are a number of places in the k/k repository with call-outs to various methods in the interfaces above, e.g. in both the kubelet and the API server.
Besides the interfaces above, the k8s.io/cloud-provider package also includes everything needed to register and initialize your cloud provider with the CCM.
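The usual pattern is an init function that calls cloudprovider.RegisterCloudProvider. Here's a hedged sketch of what that registration can look like; the provider name and package layout are made up, and the factory deliberately stops short of returning a real provider:

```go
package mycloud

import (
	"errors"
	"io"

	cloudprovider "k8s.io/cloud-provider"
)

// ProviderName is a hypothetical provider name; the CCM selects it via its
// --cloud-provider flag.
const ProviderName = "mycloud"

func init() {
	// RegisterCloudProvider makes the provider known to the CCM. The config
	// reader is the file passed via --cloud-config, and parsing it
	// (credentials, API endpoints and so on) is entirely up to us.
	cloudprovider.RegisterCloudProvider(ProviderName, newCloud)
}

// newCloud would construct and return a type satisfying
// cloudprovider.Interface; this sketch deliberately doesn't.
func newCloud(config io.Reader) (cloudprovider.Interface, error) {
	return nil, errors.New("mycloud: not implemented in this sketch")
}
```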
Let's take a look at the LoadBalancer interface; you'll see a bunch of methods that you'll need to implement:
GetLoadBalancer(...)
GetLoadBalancerName(...)
EnsureLoadBalancer(...)
UpdateLoadBalancer(...)
EnsureLoadBalancerDeleted(...)
These methods will be called by the Service controller that runs in the CCM. I'm showing them as an example because you'll see them being called in the source code of the Service controller.
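To show what one of them looks like in practice, here's a hedged, hypothetical sketch of EnsureLoadBalancer, using the method signature defined by k8s.io/cloud-provider at v1.18; the body is a stand-in for real cloud API calls:

```go
package mycloud

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// loadBalancers is a hypothetical, partial cloudprovider.LoadBalancer
// implementation.
type loadBalancers struct{}

// EnsureLoadBalancer is called by the Service controller whenever a Service
// of type LoadBalancer needs a cloud load balancer. It receives the Service
// and the current set of Nodes, and returns the status (typically the
// external IP) that ends up on the Service object.
func (l *loadBalancers) EnsureLoadBalancer(ctx context.Context, clusterName string, service *v1.Service, nodes []*v1.Node) (*v1.LoadBalancerStatus, error) {
	// In a real provider: create or update the cloud load balancer here and
	// point its members at the nodes' NodePort for this Service.
	return &v1.LoadBalancerStatus{
		Ingress: []v1.LoadBalancerIngress{{IP: "203.0.113.10"}}, // example IP
	}, nil
}
```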
What’s actually being passed around throughout the CCM is your instantiated object that will behave as a cloud provider (satisfying all cloud-provider interfaces).
This is how the cloud controllers maintained in the CCM are able to create, update and delete resources in your cloud.
The k8s.io/cloud-provider package does not define any way of connecting to, and e.g. authenticating with, your cloud. That kind of logic is something you'll have to build into your CCM.
When you've satisfied all of the interfaces defined in the k8s.io/cloud-provider package and wired everything together, you've successfully become a cloud provider. The only thing left is to build and package your CCM binary into a container and deploy it to k8s!
Looking ahead, there's actually a lot going on in the cloud provider parts of the k/k repository. As of writing this there's an ongoing initiative to restructure k8s.io/cloud-provider and make it more or less independent, meaning that e.g. the cloud controllers will become part of the k8s.io/cloud-provider package. In the end, you as a cloud provider would then import a single package to build and implement your own CCM and external cloud provider.
From a Kubernetes perspective
To be able to run your newly assembled and Docker-packaged CCM, there are a couple of things that need to be configured when bringing the k8s control plane up:
- The kubelets shall be started with the --cloud-provider=external flag; this signals to the kubelet that there's another controller initializing the nodes.
As mentioned in the beginning of this post, there's this repository where I show and explain the technical side of running your own external cloud provider CCM.
If you were, let's say, on AWS right now, spinning up your own k8s cluster on EC2 instances and wanting a more native integration with AWS, you would deploy the cloud provider AWS CCM. You could also, although it's not recommended, just specify --cloud-provider=aws on the kubelets. This is how you signal to k8s that you want to use an in-tree cloud provider; there's only a handful of them implemented. Any "newer" private/public clouds out there will have an external cloud provider CCM.
The code for the in-tree cloud providers is imported through k8s.io/legacy-cloud-providers; please note that when you use the CCM skeleton code from k/k you'll be importing this package from the start.
Resources
To follow what's going on with everything related to cloud providers in the context of k8s, please see these resources:
Channel sig-cloud-provider in the k8s Slack space
All the k8s enhancement issues labeled with sig/cloud-provider
Here's a list of the external cloud providers, great to use as a reference or if you're just curious about how others have done it: