In Kubernetes, what should I use as CPU requests and limits?
Popular answers include:
- Always use limits!
- NEVER use limits, only requests!
- I don't use either; is it OK?
Let's dive into it.
In Kubernetes, you have two ways to specify how much CPU a pod can use:
- Requests usually describe how much CPU the container is expected to use on average.
- Limits set the maximum amount of CPU the container is allowed to use.
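As a rough illustration, here is where the two settings live in a container spec, written as a Python dictionary that mirrors the `resources.requests.cpu` and `resources.limits.cpu` fields. The container name, image, and values are made up for the example, and `cpu_to_millicores` is a hypothetical helper that only handles the two simple notations shown.

```python
# A container spec fragment as a plain dict.
# The name, image, and CPU values are hypothetical placeholders.
container = {
    "name": "app",
    "image": "example/app:1.0",
    "resources": {
        "requests": {"cpu": "100m"},  # hint for the scheduler
        "limits": {"cpu": "200m"},    # hard ceiling
    },
}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "100m" or "0.5" to millicores.

    Hypothetical helper: it only covers the millicore suffix and
    plain decimal numbers, which is enough for this example.
    """
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


request = cpu_to_millicores(container["resources"]["requests"]["cpu"])
limit = cpu_to_millicores(container["resources"]["limits"]["cpu"])
print(f"request={request}m limit={limit}m")
```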
The Kubernetes scheduler uses requests to determine where the pod should be allocated in the cluster.
Since the scheduler doesn't know the consumption (the pod hasn't started yet), it needs a hint.
But it doesn't end there.
CPU requests are also used to share out the CPU among your containers when they compete for it.
Let's have a look at an example:
- A node has a single CPU.
- Container A has requests equal to 0.1 vCPU.
- Container B has requests equal to 0.2 vCPU.
What happens when both containers try to use 100% of the available CPU?
Since the CPU request doesn't limit consumption, both containers will use all available CPUs.
However, since container B's request is double container A's, the final CPU distribution is proportional: container A gets about 0.33 vCPU and container B about 0.67 vCPU (double the amount).
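The arithmetic behind that split can be sketched in a few lines. This is a simplified model, assuming every container is trying to use as much CPU as it can and that the node's CPU is divided strictly in proportion to the requests. `share_cpu` is a hypothetical helper, not a Kubernetes API.

```python
def share_cpu(node_cpus: float, requests_millicores: dict) -> dict:
    """Split the node's CPU in proportion to each container's request.

    Simplified model: assumes all containers are CPU-bound at the
    same time, so the requests act purely as relative weights.
    """
    total = sum(requests_millicores.values())
    return {
        name: node_cpus * request / total
        for name, request in requests_millicores.items()
    }


# One CPU, container A requests 100m (0.1 vCPU), container B 200m (0.2 vCPU).
shares = share_cpu(1.0, {"A": 100, "B": 200})
for name, cpus in shares.items():
    print(f"container {name}: {cpus:.2f} vCPU")
```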
Requests are suitable for:
- Setting a baseline (give me at least X amount of CPU).
- Setting relationships between pods (pod A uses twice as much CPU as pod B).
But they don't help you set hard limits.
For that, you need CPU limits.
When you set a CPU limit, you define a period and quota.
Example:
- period: 100000 microseconds (0.1s).
- quota: 10000 microseconds (0.01s).
With these values, the container can use the CPU for only 0.01 seconds in every 0.1-second period.
That's 10% of one CPU, written as "100m" (100 millicores).
If your container has a hard limit and wants more CPU, it has to wait for the next period.
Your process is throttled.
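To get a feel for what throttling costs, the sketch below converts a limit into a quota and estimates how long a burst of work takes in wall-clock time. It is a deliberately simple model: it assumes a single-threaded task that starts at the beginning of a period and runs without interruption until the quota is exhausted. Both function names are hypothetical.

```python
PERIOD_US = 100_000  # the 0.1s period from the example above


def quota_us(limit_millicores: int, period_us: int = PERIOD_US) -> int:
    """CPU time (in microseconds) the container may use per period."""
    return limit_millicores * period_us // 1000


def wall_clock_us(cpu_needed_us: int, quota: int, period_us: int = PERIOD_US) -> int:
    """Estimate the elapsed time for a task that needs `cpu_needed_us` of CPU.

    Assumption: the task starts at the beginning of a period, runs until
    the quota is used up, then waits (throttled) for the next period.
    """
    if cpu_needed_us <= 0:
        return 0
    full_periods, remainder = divmod(cpu_needed_us, quota)
    if remainder == 0:
        # The task finishes exactly when the last quota is consumed.
        return (full_periods - 1) * period_us + quota
    return full_periods * period_us + remainder


quota = quota_us(100)  # a 100m limit
elapsed = wall_clock_us(30_000, quota)  # 30ms of CPU work
print(f"quota per period: {quota}us, elapsed: {elapsed / 1000:.0f}ms")
```

In this model, 30ms of CPU work under a 100m limit takes 210ms to complete, because the task spends most of each period waiting.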
So what should you use as CPU requests and limits in your Pods?
A simple (but not accurate) way is to calculate the smallest CPU unit as:
REQUEST = NODE_CORES * 1000 / MAX_NUM_PODS_PER_NODE
For a 1 vCPU node and a limit of 10 Pods, that's a 1 * 1000 / 10 = 100m request.
Assign the smallest unit or a multiple of it to your containers.
For example, if you don't know how much CPU Pod A needs, but you know Pod B needs twice as much, you could set:
- Request A: 1 unit
- Request B: 2 units
If the containers use 100% CPU, they share the CPU according to their weights (1:2).
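The formula and the unit-based assignment can be put together as a small sketch. `smallest_unit_millicores` is a hypothetical helper, and the 1:2 weights are the ones from the example above.

```python
def smallest_unit_millicores(node_cores: int, max_pods_per_node: int) -> int:
    """REQUEST = NODE_CORES * 1000 / MAX_NUM_PODS_PER_NODE, in millicores."""
    return node_cores * 1000 // max_pods_per_node


unit = smallest_unit_millicores(node_cores=1, max_pods_per_node=10)

# Relative weights from the example: A gets 1 unit, B gets 2 units.
weights = {"A": 1, "B": 2}
requests = {name: f"{weight * unit}m" for name, weight in weights.items()}
print(requests)
```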
A better approach is to monitor the app and derive the average CPU utilization.
You can do this with your existing monitoring infrastructure or use the Vertical Pod Autoscaler to monitor and report the average request value.
How should you set the limits?
- Your app might already have "hard" limits (for example, Node.js runs your JavaScript on a single thread, so it uses roughly 1 core even if you assign 2).
- You could have: limit = 99th percentile + 30-50%.
You should profile the app (or use the VPA) for a more detailed answer.
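If you already collect CPU usage samples, the two rules of thumb (request from the average, limit from the 99th percentile plus headroom) could be computed along these lines. This is only a sketch: `suggest_cpu` is a hypothetical helper, the samples are synthetic, and the nearest-rank percentile is one of several reasonable definitions.

```python
import math
from statistics import mean


def suggest_cpu(samples_millicores: list, headroom_percent: int = 30) -> tuple:
    """Return (request, limit) in millicores from observed CPU usage.

    request = average usage, rounded up
    limit   = 99th percentile (nearest-rank) + headroom, rounded up
    """
    if not samples_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_millicores)
    # Nearest-rank percentile: ceil(0.99 * n), done in integer arithmetic.
    rank = -(-99 * len(ordered) // 100)
    p99 = ordered[rank - 1]
    request = math.ceil(mean(ordered))
    limit = -(-p99 * (100 + headroom_percent) // 100)
    return request, limit


# Synthetic samples: usage climbing from 1m to 100m.
samples = list(range(1, 101))
request, limit = suggest_cpu(samples, headroom_percent=30)
print(f"request={request}m limit={limit}m")
```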
Should you always set the CPU request?
Absolutely, yes.
This is a standard good practice in Kubernetes and helps the scheduler allocate pods more efficiently.
Should you always set the CPU limit?
This is a bit more controversial, but, in general, I think so.
You can find a deeper dive here: https://dnastacio.medium.com/why-you-should-keep-using-cpu-limits-on-kubernetes-60c4e50dfc61
Also, if you want to dig deeper, here are a few relevant links:
- https://learnk8s.io/setting-cpu-memory-limits-requests
- https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b
- https://nodramadevops.com/2019/10/docker-cpu-resource-limits/
And finally, if you've enjoyed this thread, you might also like:
- The Kubernetes workshops that we run at Learnk8s https://learnk8s.io/training
- This collection of past threads https://twitter.com/danielepolencic/status/1298543151901155330
- The Kubernetes newsletter I publish every week https://learnk8s.io/learn-kubernetes-weekly