DEV Community

Chris Noring for Microsoft Azure


Learn Kubernetes, part III scaling my app

Follow me on Twitter, happy to take your suggestions on topics or improvements /Chris

This third part shows how to scale your application. We can set the number of replicas we want of an application and let Kubernetes figure out how to achieve it. In doing so, we are defining a so-called desired state.

When traffic increases, we need to scale the application to keep up with user demand. We've talked about deployments and services; now let's talk about scaling.

What does scaling mean in the context of Kubernetes?

We get more Pods, and those Pods are scheduled onto nodes.

Now it's time to talk about desired state again, which we mentioned in the previous parts.

This is where we relinquish control to Kubernetes. All we need to do is tell Kubernetes how many Pods we want and Kubernetes does the rest.
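Telling Kubernetes how many Pods we want can also be done declaratively, by writing the desired state into a manifest. The sketch below is a hypothetical helper, render_deployment, that prints a minimal Deployment manifest whose spec.replicas field holds the number of Pods we want. Piping its output into kubectl apply -f - would hand that desired state to the cluster. The image name and the app label are placeholders chosen for illustration, not values from this series.

```shell
# Hypothetical helper: print a minimal Deployment manifest.
# Arguments: name, container image (placeholder), replica count.
render_deployment() {
  local name="$1"
  local image="$2"
  local replicas="$3"
  # spec.replicas is the desired state; the Deployment's ReplicaSet
  # keeps that many Pods matching the selector running.
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: ${replicas}
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
        - name: ${name}
          image: ${image}
          ports:
            - containerPort: 8080
EOF
}

# Usage (needs a cluster; <your-image> is a placeholder):
#   render_deployment kubernetes-first-app <your-image> 4 | kubectl apply -f -
```

Changing only the replica count and applying the manifest again updates the same spec.replicas field that the scale command used later in this article changes.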

So when we tell Kubernetes how many Pods we want, what does that mean? What does Kubernetes do for us?

It means we get multiple instances of our application. It also means traffic is distributed across all of our Pods, i.e. load balancing.

Furthermore, Kubernetes, or more specifically a Service within Kubernetes, keeps track of which Pods are available and sends traffic only to those Pods.


Scaling demo Lab

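If you haven't followed the first two parts, I recommend going back and reading them. For the following to work you need at least a deployment, so if you haven't created one, here is how (replace <your-image> with the container image from the earlier parts):

kubectl run kubernetes-first-app --image=<your-image> --port=8080

Note that since kubectl 1.18, kubectl run creates only a Pod, not a Deployment. If you are on kubectl 1.18 or newer, create the deployment with this command instead:

kubectl create deployment kubernetes-first-app --image=<your-image>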

Let's have a look at our deployments:

kubectl get deployments

Let's take a closer look at the response:

Three columns are important to us. First, the READY column, whose value we read as CURRENT STATE/DESIRED STATE, that is, the number of ready replicas out of the number we asked for. Next is the UP-TO-DATE column, which shows how many replicas have been updated to match the desired state. Lastly, the AVAILABLE column shows how many replicas are available to do work.
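Because the READY value is just two counts separated by a slash, it is easy to pull apart. The hypothetical parse_ready function below splits it into the current (ready) count and the desired count. The commented kubectl line shows one way to print the same pair directly from the Deployment's status.readyReplicas and spec.replicas fields using JSONPath output.

```shell
# Hypothetical helper: split a READY value such as "2/4" into the
# current (ready) count and the desired count.
parse_ready() {
  local current="${1%%/*}"   # everything before the slash
  local desired="${1##*/}"   # everything after the slash
  # status.readyReplicas is omitted while it is 0, which would leave
  # an empty current count, so default it to 0.
  echo "current=${current:-0} desired=${desired}"
}

# The same pair straight from the API object:
#   kubectl get deployments/kubernetes-first-app \
#     -o jsonpath='{.status.readyReplicas}/{.spec.replicas}'
#
# parse_ready 2/4   -> current=2 desired=4
```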

Let's scale

Now, let's do some scaling. For that we will use the scale command like so:

kubectl scale deployments/kubernetes-first-app --replicas=4 

As we can see, the number of replicas was increased to 4, and Kubernetes is now ready to load balance incoming requests across them.
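kubectl scale returns as soon as the new desired state has been recorded. The additional Pods are started afterwards. If a script needs to wait until they are actually ready, a minimal polling sketch could look like the one below. The function name wait_for_replicas, the 30 attempts and the two-second interval are assumptions for illustration. The kubectl get ... -o jsonpath calls read the Deployment's spec.replicas and status.readyReplicas fields.

```shell
# Hypothetical helper: poll a Deployment until its ready replica count
# equals spec.replicas, giving up after a number of attempts.
wait_for_replicas() {
  local deploy="$1"
  local attempts="${2:-30}"   # assumed default: 30 tries
  local i=0
  local desired ready
  while [ "$i" -lt "$attempts" ]; do
    desired=$(kubectl get "deployments/$deploy" -o jsonpath='{.spec.replicas}')
    ready=$(kubectl get "deployments/$deploy" -o jsonpath='{.status.readyReplicas}')
    ready=${ready:-0}         # readyReplicas is omitted while it is 0
    echo "ready ${ready}/${desired}"
    if [ -n "$desired" ] && [ "$ready" -eq "$desired" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 2                   # assumed polling interval
  done
  return 1
}

# Usage:
#   kubectl scale deployments/kubernetes-first-app --replicas=4
#   wait_for_replicas kubernetes-first-app
```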

Let's have a look at our Pods next:

When we asked for 4 replicas we got 4 Pods.

We can confirm that this scaling operation took place with the describe command, like so:

kubectl describe deployments/kubernetes-first-app

The output gives us quite a lot of information. One example is the Replicas line, which shows how many replicas are desired, updated, total, available and unavailable. Some of the other information in there we will explain later on.

Does it load balance?

The whole point of scaling was to balance the load of incoming requests. Instead of the same Pod handling every request, different Pods should handle them.
We can easily try this out now that we have scaled our app to 4 replicas of itself.

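So far we have used the describe command on the deployment, but we can also use it on the service to find out its IP and port. Once we have those, we can hit the service with HTTP requests. This assumes the kubernetes-first-app service of type NodePort from the previous part. If you don't have it, kubectl expose deployment kubernetes-first-app --type=NodePort --port=8080 creates one.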

kubectl describe services/kubernetes-first-app

Look especially at NodePort and Endpoints. NodePort is the port on the node that we want to hit with our HTTP requests. Endpoints lists the address (IP:port) of each Pod that the service forwards traffic to.
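The Endpoints field shows scaling from the service's point of view, because each ready Pod appears there as one address. The hypothetical count_endpoints helper below is a sketch that counts those addresses. It runs a JSONPath query against the Endpoints object that Kubernetes maintains under the same name as the service. Newer Kubernetes versions favour EndpointSlices, so this query may print a deprecation warning there.

```shell
# Hypothetical helper: count the Pod addresses listed in the Endpoints
# object that Kubernetes keeps for a service (it has the service's name).
count_endpoints() {
  local svc="$1"
  local ips
  # JSONPath prints the ready Pod IPs separated by spaces.
  ips=$(kubectl get "endpoints/$svc" -o jsonpath='{.subsets[*].addresses[*].ip}')
  # Let the shell split the list into positional parameters and count them.
  set -- $ips
  echo "$#"
}

# Usage (after scaling to 4 replicas, 4 is what we would hope to see):
#   count_endpoints kubernetes-first-app
```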

Now we will invoke cURL a few times and check that different Pods answer, which shows that our load balancing is working. Let's do the following:

export NODE_PORT=$(kubectl get services/kubernetes-first-app -o go-template='{{(index .spec.ports 0).nodePort}}')

Next up, the cURL call:

curl $(minikube ip):$NODE_PORT

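In our run we made the call 4 times. Judging by the output, which includes the name of the instance, a different Pod handled each request, so load balancing is working. Keep in mind that kube-proxy does not strictly take turns between Pods. In its default iptables mode it picks a Pod at random for each new connection, so seeing the same Pod twice in a row is normal. What matters is that several different Pods answer over a series of requests.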
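Instead of typing the cURL call several times, a small loop can repeat it and count how many distinct answers come back. This sketch makes two assumptions: that each response is a single line, and that it contains the name of the Pod that served it. count_distinct_responses is a hypothetical helper, not a kubectl or minikube command.

```shell
# Hypothetical helper: call a URL n times and print how many distinct
# response lines came back. Assumes one-line responses that name the Pod.
count_distinct_responses() {
  local url="$1"
  local n="${2:-4}"
  local i=0
  while [ "$i" -lt "$n" ]; do
    curl -s "$url"
    echo               # ensure each response ends with a newline
    i=$((i + 1))
  done | sort -u | grep -c .   # de-duplicate, then count non-empty lines
}

# Usage:
#   count_distinct_responses "$(minikube ip):$NODE_PORT" 8
```

A result above 1 means more than one Pod answered. Because Pods are selected at random, the count can be lower than the number of requests even when everything works.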

Scaling down

So far we have scaled up. We managed to go from one Pod to 4 Pods thanks to the scale command. We can use the same command to scale down, like so:

kubectl scale deployments/kubernetes-first-app --replicas=2 

If we run kubectl get pods quickly enough after that, we can see the Pods being removed as Kubernetes adjusts to the new desired state. (kubectl get pods --watch keeps printing updates, so you don't need to be fast.)

Two of the 4 Pods show the status Terminating, as only 2 Pods are needed to maintain the new desired state.

Running the command again, we see that only 2 Pods remain, so our new desired state has been reached.

We can also look at our deployment with kubectl get deployments to check that our scale instruction has been applied. The READY column should now read 2/2.


Self-healing

Self-healing is Kubernetes' way of ensuring that the desired state is maintained. Pods don't heal themselves; a Pod can simply die. What happens instead is that Kubernetes starts a new Pod in its place.
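Self-healing rests on the idea of a control loop. Kubernetes keeps comparing the desired number of Pods with the number that actually exist and acts on the difference. The toy reconcile function below illustrates only that comparison. It is a simplified sketch, not how the Deployment or ReplicaSet controllers are actually implemented.

```shell
# Toy illustration of a reconcile step (not Kubernetes' real controller
# code): given desired and current Pod counts, say what would be done.
reconcile() {
  local desired="$1"
  local current="$2"
  if [ "$current" -lt "$desired" ]; then
    echo "create $((desired - current))"
  elif [ "$current" -gt "$desired" ]; then
    echo "delete $((current - desired))"
  else
    echo "in sync"
  fi
}

# One of two Pods was deleted, so one replacement is needed:
#   reconcile 2 1   -> create 1
# Scaling down from 4 to 2:
#   reconcile 2 4   -> delete 2
```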

So how do we test this?

Glad you asked: we can delete a Pod and see what happens. For that we use the delete command, but first we need to know the name of our Pod, so we call get pods. Let's start with that:

kubectl get pods

Then let's pick one of our two Pods, kubernetes-first-app-669789f4f8-6glpx (your Pod names will differ), and assign its name to a variable:

export POD_NAME=kubernetes-first-app-669789f4f8-6glpx
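Copying the name by hand works fine, but a script can also look it up. This minimal sketch uses a hypothetical first_pod_name helper that asks kubectl get pods for the name of the first Pod in the list through JSONPath. Extra arguments are passed on to kubectl, so a label selector (-l) can narrow the list. Which label to use depends on how the deployment was created, so none is assumed here.

```shell
# Hypothetical helper: print the name of the first Pod that kubectl lists.
# Extra arguments (for example -l <selector>) are forwarded to kubectl.
first_pod_name() {
  kubectl get pods "$@" -o jsonpath='{.items[0].metadata.name}'
}

# Usage:
#   POD_NAME=$(first_pod_name)
#   echo "$POD_NAME"
```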

Now remove it:

kubectl delete pods $POD_NAME

Let's be quick about it and check our Pod status with kubectl get pods. The Pod we deleted should show the status Terminating.

Wait a little while, then echo out our variable $POD_NAME and run kubectl get pods again. The name stored in $POD_NAME should no longer be in the list, and a new Pod with a different name should have taken its place.

So what does this tell us? It tells us that the Pod we deleted is truly deleted. It also tells us that the desired state of two replicas has been restored by spinning up a new Pod. What we are seeing is self-healing at work.
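To see exactly which Pod was created as the replacement, you can capture the Pod names before and after the delete and compare the two lists. The hypothetical new_pods helper below prints names that appear only in the second list. It assumes space-separated names, which is what kubectl get pods -o jsonpath='{.items[*].metadata.name}' prints.

```shell
# Hypothetical helper: print names present in the second list but not in
# the first. Both arguments are space-separated lists of Pod names.
new_pods() {
  local before="$1"
  local after="$2"
  local pod
  for pod in $after; do
    case " $before " in
      *" $pod "*) ;;          # already existed before the delete
      *) echo "$pod" ;;       # newly created, e.g. by self-healing
    esac
  done
}

# Usage:
#   before=$(kubectl get pods -o jsonpath='{.items[*].metadata.name}')
#   kubectl delete pods "$POD_NAME"
#   after=$(kubectl get pods -o jsonpath='{.items[*].metadata.name}')
#   new_pods "$before" "$after"
```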

Different ways to scale

Ok, so far we have scaled by explicitly saying how many replicas we want of a certain deployment. Sometimes, however, we might want a different way to scale, namely auto-scaling. With auto-scaling you don't set the exact number of replicas you want. Instead you rely on Kubernetes to create the number of replicas it thinks it needs. So how would Kubernetes know that? It can look at more than one metric, but a common one is CPU utilization. Say you run a booking site and Bruce Springsteen tickets are suddenly released. You are likely to want to rely on auto-scaling, because the next day, when the tickets are all sold out, you want the number of Pods to go back to normal, and you wouldn't want to do that manually.
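For the CPU example, the Kubernetes documentation for the Horizontal Pod Autoscaler gives the core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The hypothetical hpa_desired_replicas function below reproduces only that arithmetic, using whole-number percentages. The real autoscaler also applies a tolerance, the configured minimum and maximum replica counts, and a stabilization window. This sketch models none of them.

```shell
# Hypothetical helper: the core HPA formula
#   desired = ceil(current_replicas * current_cpu / target_cpu)
# using integer arithmetic (CPU values as whole-number percentages).
hpa_desired_replicas() {
  local replicas="$1"
  local current_cpu="$2"
  local target_cpu="$3"
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (replicas * current_cpu + target_cpu - 1) / target_cpu ))
}

# 4 Pods averaging 90% CPU against a 50% target:
#   hpa_desired_replicas 4 90 50   -> 8
# An autoscaler can be created with, for example:
#   kubectl autoscale deployment kubernetes-first-app --min=2 --max=10 --cpu-percent=50
```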

Auto-scaling is a topic I plan to cover in more detail in a future article, so if you are really curious about how it is done, I recommend you have a look here


Summary

Ok, so we did it. We managed to scale an app by creating replicas of it, and it wasn't that hard to accomplish. We showed that we only need to give Kubernetes a desired state, and it will do its utmost to preserve that state. This is called self-healing. Furthermore, we mentioned another way to scale, namely auto-scaling, but decided to leave that topic for another article. Hopefully, you are now even more in awe of how amazing Kubernetes is and how easy it is to scale your app.

Top comments (10)

Yehonatan Devorkin

Hi there all!
Chris, I really liked your articles about K8s, I do learn a lot from them!

I have one question though: when I create a new "deployment" per your articles, using the command kubectl run kubernetes-first-app...
I eventually get a new Pod, but no new Deployment.
How come? Am I missing anything?

I'm using in my lab both Minikube (with VirtualBox driver) and K8s lab (based on Vagrant, 2 Ubuntu VMs [means 2 nodes]) - both using Kubectl version 1.19

Thanks and "sorry" for jumping this article back to the active ones ;-)

Joe Hobot

Nice tutorial , on point and many will enjoy.

Chris Noring

Thank you Joe :)

Alexandru Simandi

Great tutorial first of all. Really catches the essence
I am sorry to say but for us AKS is totally not production ready. Everything felt unprofessional from azure's side. Problems with version upgrade, problems with networking, expired certificate on service principal that got set to null, missing pvs, their support had no idea what was going on half of the time we were asking for support.
Overall it just didn't feel like a self managed service provided by a serious player in the cloud industry.

Chris Noring

Hi Alexandru. Really appreciate this feedback. Are you able to tell me how long ago this was?

Happy to talk this through with you. It's pretty much my job to ensure that the product teams get this information so we can improve the product. :)

In short.
What would make you give it another try? What are the features that need to work great for you to use it?

Vinícius Pacheco Vieira

Amazing content, I'm sharing to all my team here.
Just to update: at least on my computer, when running Kubernetes version 1.15, kubectl delete $POD_NAME will not work; you have to specify a resource (pod, service, etc.).
The final command will be "kubectl delete pods $POD_NAME".

Chris Noring

Thank you for that :) Hmm interesting.. Might be version thing, I'll update the command to ensure it works for all versions :)

Goran Paunović

I tried to test load balancing feature but it looks like it is not working in minikube.

wisdom sambian

So far the best Kubernetes tutorial I have come across. Easy to digest. Keep it up. I'm following you on Twitter ASAP.

Chris Noring

Thank you :)