When you are designing your Kubernetes clusters, there are a few important decisions to make. One decision with significant implications for your applications is the size of your clusters.
You can either use a few clusters that are larger in size or you can choose to use multiple clusters that are smaller in size.
- Which one should you choose?
- What are the advantages of using one strategy over the other?
- And are there other more efficient ways to architect your clusters?
Let’s find out.
In this article, we’ll look at:
- The pros and cons of choosing a large cluster size, which inevitably means fewer clusters.
- The pros and cons of choosing a smaller cluster size, which increases the number of clusters required.
- Two other cluster configurations you should know about.
The first option is to run all your workloads in the same cluster:
With this approach, the cluster is used like a general-purpose infrastructure platform — whatever you need to run, you deploy it to your existing Kubernetes cluster.
Kubernetes provides namespaces to logically separate portions of a cluster from each other, and in the above case, you could use a separate namespace for each application instance.
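As a minimal sketch of this idea, the following manifest creates two namespaces, one per application instance (the names `app1-prod` and `app2-prod` are illustrative, not from any particular setup):

```yaml
# Two namespaces separating two application instances in the same cluster.
# Namespace names are illustrative.
apiVersion: v1
kind: Namespace
metadata:
  name: app1-prod
---
apiVersion: v1
kind: Namespace
metadata:
  name: app2-prod
```

You would then deploy each application's resources into its own namespace, for example with `kubectl apply -n app1-prod -f app1/`.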
Let's look at the pros and cons of this approach.
If you have only one Kubernetes cluster, you need to have only one copy of all the resources that are needed to run and manage a Kubernetes cluster.
This includes, for example, the master nodes — a Kubernetes cluster typically has 3 master nodes, and if you have only a single cluster, you need only 3 master nodes in total (compared to 30 master nodes if you have 10 Kubernetes clusters).
But this also includes other cluster-wide services, such as load balancers, Ingress controllers, authentication, logging, and monitoring.
If you have only a single cluster, you can reuse these services for all your workloads, and you don't need to have multiple copies of them for multiple clusters.
As a consequence of the above point, fewer clusters are usually cheaper, because the resource overhead of running many clusters costs money.
This is particularly true for the master nodes, which may cost you substantial amounts — be it on-premises or in the cloud.
Some managed Kubernetes services provide the Kubernetes control plane for free, such as Google Kubernetes Engine (GKE) or Azure Kubernetes Service (AKS). In these cases, cost-efficiency is less of an issue.
However, there are also managed Kubernetes services that charge a fixed amount for running a Kubernetes cluster, such as Amazon Elastic Kubernetes Service (EKS).
Administrating a single cluster is easier than administrating many clusters.
This may include tasks like:
- Upgrading the Kubernetes version
- Setting up a CI/CD pipeline
- Installing a CNI plugin
- Setting up the user authentication system
- Installing an admission controller
And many more...
If you have only a single cluster, you need to do all of this only once.
If you have many clusters, then you need to apply everything multiple times, which probably requires you to develop some automated processes and tools for being able to do this consistently.
Now, to the cons.
If you have only one cluster and if that cluster breaks, then all your workloads are down!
There are many ways that something can go wrong:
- A Kubernetes upgrade produces unexpected side effects
- A cluster-wide component (such as a CNI plugin) doesn't work as expected
- An erroneous configuration is made to one of the cluster components
- An outage occurs in the underlying infrastructure
A single incident like this can produce major damage across all your workloads if you have only a single shared cluster.
If multiple apps run in the same Kubernetes cluster, this means that these apps share the hardware, network, and operating system on the nodes of the cluster.
Concretely, two containers of two different apps running on the same node are technically two processes running on the same hardware and operating system kernel.
Linux containers provide some form of isolation, but this isolation is not as strong as the one provided by, for example, virtual machines (VMs). Under the hood, a process in a container is still just a process running on the host's operating system.
This may be an issue from a security point of view — it theoretically allows unrelated apps to interact with each other in undesired ways (intentionally and unintentionally).
Furthermore, all the workloads in a Kubernetes cluster share certain cluster-wide services, such as DNS — this allows apps to discover the Services of other apps in the cluster.
All these may or may not be issues for you, depending on the security requirements for your applications.
Kubernetes provides various means to prevent security breaches, such as PodSecurityPolicies and NetworkPolicies — however, it requires experience to tweak these tools in exactly the right way, and they can't prevent every security breach either.
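As one example of what such a policy looks like, the following NetworkPolicy denies all incoming traffic to the Pods of one namespace unless another policy explicitly allows it (the namespace name is illustrative):

```yaml
# A "default deny" NetworkPolicy: selects all Pods in the namespace
# (empty podSelector) and allows no ingress traffic to them, since no
# ingress rules are listed. Namespace name is illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: app1-prod
spec:
  podSelector: {}
  policyTypes:
    - Ingress
```

Note that NetworkPolicies only take effect if the cluster's CNI plugin supports them, which is one more detail that has to be tweaked in exactly the right way.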
It's important to keep in mind that Kubernetes is designed for sharing, and not for isolation and security.
Given the many shared resources in a Kubernetes cluster, there are many ways that different apps can "step on each other's toes".
For example, an app may monopolise a certain shared resource, such as the CPU or memory, and thus starve other apps running on the same node.
Kubernetes provides various ways to control this behaviour, such as resource requests and limits, ResourceQuotas, and LimitRanges — however, again, it's not trivial to tweak these tools in exactly the right way, and they cannot prevent every unwanted side effect either.
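As a sketch of one of these tools, the following ResourceQuota caps the total CPU and memory that all Pods in one namespace may request and consume, so a single app can't starve its neighbours (the namespace name and the numbers are illustrative):

```yaml
# A ResourceQuota limiting the aggregate compute resources of all Pods
# in the "app1-prod" namespace. Values are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: app1-prod
spec:
  hard:
    requests.cpu: "4"       # sum of all CPU requests may not exceed 4 cores
    requests.memory: 8Gi    # sum of all memory requests
    limits.cpu: "8"         # sum of all CPU limits
    limits.memory: 16Gi     # sum of all memory limits
```

Once such a quota exists, every Pod in the namespace must declare resource requests and limits, otherwise the API server rejects it.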
If you have only a single cluster, then many people in your organisation must have access to this cluster.
The more people have access to a system, the higher the risk that they break something.
Within the cluster, you can control who can do what with role-based access control (RBAC). However, this still can't prevent users from breaking something within their area of authorisation.
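As a minimal RBAC sketch, the following Role grants read-only access to Pods in a single namespace and binds it to one user (the user and namespace names are illustrative):

```yaml
# A Role allowing read-only access to Pods in one namespace, bound to a
# single user. The names "jane" and "app1-prod" are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: app1-prod
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: app1-prod
subjects:
  - kind: User
    name: jane
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Even with such fine-grained rules in place, a user who legitimately holds write access to a namespace can still misconfigure the resources within it.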
If you use a single cluster for all your workloads, this cluster will probably be rather large (in terms of nodes and Pods).
However, Kubernetes clusters can't grow infinitely large.
There are theoretical upper limits for how big a cluster can be, which Kubernetes defines as about 5,000 nodes, 150,000 Pods, and 300,000 containers.
However, in practice, challenges may show up already with much smaller cluster sizes, such as 500 nodes.
The reason is that larger clusters put a higher strain on the Kubernetes control plane, which requires careful planning to keep the cluster functional and efficient.
This issue is also discussed in a related article on this blog, Architecting Kubernetes clusters — choosing a worker node size.
Let's look at the opposite approach — many small clusters.
With this approach, you use a separate Kubernetes cluster for every deployment unit:
For this article, a deployment unit is an application instance — such as the dev version of a single app.
With this strategy, Kubernetes is used as a specialised application runtime for individual application instances.
Let's see what the pros and cons of this approach are.
If a cluster breaks, the damage is limited to only the workloads that run on this cluster — all the other workloads are unaffected.
The workloads running in the individual clusters don't share any resources, such as CPU, memory, the operating system, network, or other services.
This provides strong isolation between unrelated applications, which may be a big plus for the security of these applications.
If every cluster runs only a small set of workloads, then fewer people need to have access to this cluster.
The fewer people have access to a cluster, the lower the risk that something breaks.
Let's look at the cons.
As already mentioned, each Kubernetes cluster requires a set of management resources, such as the master nodes, control plane components, monitoring, and logging solutions.
If you have many small clusters, you have to sacrifice a higher percentage of the total resources for these management functions.
Inefficient resource usage automatically results in higher costs.
For example, if you have to run 30 master nodes instead of 3 for the same compute power, you will see this in your monthly bill.
Administrating many Kubernetes clusters is more complex than administrating a single Kubernetes cluster.
For example, you need to set up authentication and authorisation for each cluster, and if you want to upgrade the Kubernetes version, you need to repeat the upgrade for every cluster too.
You probably need to develop some automated processes for being able to do this efficiently.
Can you design your clusters based on factors other than size? Absolutely.
You can also architect your Kubernetes clusters
- Based on the application, i.e. one cluster for all the instances of a given application.
- Based on the environment, i.e. one cluster for each environment, such as prod, dev, and test.
To learn more about these configurations and to read a more detailed article (along with visual diagrams) on how to ideally architect your Kubernetes clusters, click here.