CiCube for CICube

Posted on Oct 31 • Originally published at cicube.io

StatefulSets in Kubernetes

#kubernetes #devops #security

Introduction

Stateful applications bring some unique challenges in the dynamic world of Kubernetes. Luckily, Kubernetes has a solution tailored to such workloads: the StatefulSet. This post walks through creating, scaling, updating, and deleting StatefulSets, in which each pod retains a stable network and storage identity. By the end, you will have practical insight into running stateful systems on Kubernetes.

Getting Started with StatefulSets

StatefulSets are one of the most important Kubernetes features for managing stateful applications. They guarantee that each pod in a set has a unique, stable identity and its own persistent storage. This matters when an application needs a stable network identity and storage volumes that are not shared with other replicas. Before going further, it helps to have some background on a few key Kubernetes resources, such as Pods and PersistentVolumes, and on basic kubectl usage.

Hands-on practice requires a cluster. If you are a beginner, pick a playground environment such as Minikube, Killercoda, or Play with Kubernetes so that you cannot damage any real workloads. You also need kubectl configured to communicate with your cluster. The examples in this post use the default namespace, which is fine on a disposable cluster. With this in place, you are ready to create, scale, and update StatefulSets.

Creating a StatefulSet takes a few more steps than other workloads, so that networking works and the pods can be managed predictably. First, I create a YAML file that defines the StatefulSet and its headless Service. The headless Service controls the network domain and provides the DNS records that give my pods their stable network identities. The StatefulSet itself assigns each pod an ordinal index and a stable hostname, which is what stateful applications depend on.
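A minimal sketch of such a manifest, written to a local file with a shell heredoc. It is modeled on the upstream web.yaml example; the file name web-sketch.yaml, the image tag, and the 1Gi storage request are assumptions for illustration, so treat the upstream file as the source of truth.

```shell
# Write a headless Service plus a two-replica StatefulSet to a local file.
# File name, image tag, and storage size are illustrative assumptions.
cat > web-sketch.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  clusterIP: None        # "None" makes the Service headless
  selector:
    app: nginx
  ports:
  - port: 80
    name: web
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx     # ties pod DNS names to the headless Service
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.21
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:  # one PersistentVolumeClaim per pod
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
EOF
```

Running kubectl apply --dry-run=client -f web-sketch.yaml should validate the file without creating anything in the cluster.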

Then I create the StatefulSet and associated Service with the following kubectl command:

kubectl apply -f https://k8s.io/examples/application/web/web.yaml

The command creates the headless Service and a StatefulSet with two pods, each running an NGINX server. The pods are brought up one after another in ordinal order: web-1 is not started until web-0 is Running and Ready. Pod IP addresses are not guaranteed to stay the same, so I rely on the stable pod names and DNS entries for connectivity rather than on IPs.

After applying the manifest, I check that the StatefulSet was created by running:

kubectl get statefulset web

Once both pods are up, the READY column of the output shows 2/2.

Understanding Pod Identity in StatefulSets

Pods in a StatefulSet are special because they keep the same identity for their whole lifetime. Each pod is assigned an ordinal index that is unique within the set and forms part of the pod name, such as web-0 and web-1. This is more than a naming convention: it lets me reference each pod predictably.

The ordinal index also determines order: pods are created from lowest to highest and terminated from highest to lowest. If a pod is rescheduled or restarted, its replacement gets the same name, so the application can keep its state consistent. This behavior is important for stateful applications where the identity of a pod determines how data is managed or exchanged.

For instance, with two pods, web-0 and web-1, the following loop runs hostname in each one:

for i in 0 1; do kubectl exec web-$i -- sh -c 'hostname'; done

This prints the respective hostnames:

web-0
web-1

Applications can use these stable identities through DNS lookups. Consider the following command, which starts a temporary busybox pod and resolves the name of web-0:

kubectl run -i --tty --image busybox:1.28 dns-test --restart=Never --rm -- nslookup web-0.nginx

This returns the current address of the pod, so IPs never have to be hard-coded in network requests. That is critical because the IP addresses might change when a pod is rescheduled, while the hostnames remain stable. Stable identities make stateful applications more reliable and consistent, which is crucial when scaling, updating, or troubleshooting a distributed system.
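The short name web-0.nginx resolves from inside the same namespace. The fully qualified form can be sketched with a small helper; pod_fqdn is a hypothetical function name, and cluster.local is assumed as the cluster domain, which is the common default but may differ on your cluster.

```shell
# Build the DNS name a StatefulSet pod gets through its headless Service.
# Pattern: <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>
# The namespace defaults to "default" and the domain to "cluster.local";
# both defaults are assumptions about the cluster.
pod_fqdn() {
  sts="$1"; ordinal="$2"; svc="$3"
  ns="${4:-default}"
  domain="${5:-cluster.local}"
  printf '%s-%s.%s.%s.svc.%s\n' "$sts" "$ordinal" "$svc" "$ns" "$domain"
}

# Print the names for the two pods used in this post.
for i in 0 1; do
  pod_fqdn web "$i" nginx
done
```

The fully qualified name is what a client in another namespace would use, since the short form only resolves within the pod's own namespace.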

Scaling StatefulSets Up and Down

Scaling is one of the key operations when managing stateful applications in Kubernetes. StatefulSets scale horizontally, which means increasing or decreasing the number of pod replicas, and this is important for workloads that fluctuate.

I scale a StatefulSet by changing its replica count, using either kubectl scale or kubectl patch. For example, to increase the replicas from 2 to 5, the command is:

kubectl scale sts web --replicas=5

After running this command, three new pods (web-2, web-3, and web-4) are created. They are deployed one by one in ordinal order, and each pod starts only after its predecessor is Running and Ready. This ordering matters for stateful applications that need continuity and stability.

To scale down, for example to three replicas, I'd execute:

kubectl patch sts web -p '{"spec":{"replicas":3}}'

In that case, Kubernetes deletes the pods in reverse ordinal order: web-4 is terminated first, then web-3. Note that by default the PersistentVolumeClaims and PersistentVolumes bound to StatefulSet pods are not deleted when the set is scaled down. This differs from workloads whose pods use only ephemeral storage, where deleting a pod discards its data. The volumes persist so that data is not lost when a replica is removed, and a later scale-up reattaches them to pods with the same ordinals. Understanding how StatefulSets scale, and what that implies for storage, therefore has a real influence on application design and data management.
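The scale-down order and the claims that stay behind can be sketched in plain shell, without touching a cluster. scale_down_plan is a hypothetical helper, and the claim template name www is an assumption taken from the upstream example manifest; claims are named after the pattern claim-template, StatefulSet name, ordinal.

```shell
# Print the pods removed when scaling from $2 replicas down to $3,
# highest ordinal first, plus the PVC each one leaves behind.
# PVC names follow <claim-template>-<statefulset>-<ordinal>; "www" as the
# default claim template name is an assumption from the upstream example.
scale_down_plan() {
  sts="$1"; from="$2"; to="$3"; claim="${4:-www}"
  i=$((from - 1))
  while [ "$i" -ge "$to" ]; do
    printf 'terminate %s-%s (PVC %s-%s-%s is kept)\n' \
      "$sts" "$i" "$claim" "$sts" "$i"
    i=$((i - 1))
  done
}

# Scaling web from 5 replicas to 3, as in the patch command above.
scale_down_plan web 5 3
```

On a real cluster, kubectl get pvc should still list the claims for web-3 and web-4 after the scale-down finishes.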

Updating StatefulSets: Rolling Updates and Strategies

Updating a StatefulSet is a significant concern when dealing with stateful workloads. The main update strategy is RollingUpdate, which upgrades pods gracefully while keeping the service available.

I select this strategy by declaring an updateStrategy of type RollingUpdate in the StatefulSet spec (it is also the default). Kubernetes then updates the pods one at a time in reverse ordinal order, waiting for each updated pod to be Running and Ready before moving on to the next, which keeps the application consistent and healthy through each transition.

Partitioning Updates

Partitioning lets me control which pods a rolling update touches. When I set a partition in the update strategy, only pods whose ordinal is greater than or equal to that value are updated when the pod template changes; pods below it stay on the previous revision, even if they are deleted and recreated. This is useful for staged deployments such as canaries and phased rollouts.

kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":1}}}}'
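The partition rule can be checked with a few lines of shell before touching the cluster. pods_to_update is a hypothetical helper that only does arithmetic on ordinals; it assumes ordinals start at 0, which is the default.

```shell
# List the pods a RollingUpdate will touch for a given replica count and
# partition: every ordinal >= partition, processed from highest to lowest.
# Assumes ordinals start at 0 (the default).
pods_to_update() {
  sts="$1"; replicas="$2"; partition="$3"
  i=$((replicas - 1))
  while [ "$i" -ge "$partition" ]; do
    printf '%s-%s\n' "$sts" "$i"
    i=$((i - 1))
  done
}

# Three replicas with partition 1: web-2 and web-1 update, web-0 does not.
pods_to_update web 3 1
```

A partition equal to or larger than the replica count prints nothing, which matches the behavior of staging a template change without updating any pod.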

Canary Deployments

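A canary deployment updates only a small subset of pods so that I can observe their behavior before rolling the change out to the rest. This keeps the risk contained, and I can revert if there are problems.

With the partition set to 1 and the pod template changed, the controller updates the pods at or above the partition on its own, while web-0 stays on the previous revision. A manual delete shows which revision a pod comes back with:

kubectl delete pod web-1

This command deletes web-1, and because its ordinal is not below the partition, Kubernetes recreates it with the new configuration.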

Phased Rollouts

Phased rollouts work the same way: I lower the partition step by step so that each step brings more pods into the update while the remaining ones stay on the stable revision. Setting the partition to 0 updates every remaining pod and completes the rollout:

kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'

Rollback Mechanisms

Kubernetes does not automatically roll back a failed StatefulSet update. With the default ordered pod management, the controller stops the rollout and waits when an updated pod fails to become Running and Ready, which keeps the remaining pods on the previous revision and limits the damage. To recover, I revert the pod template (for example with kubectl rollout undo statefulset web) and then delete any pod that is stuck on the broken revision, so that it is recreated from the reverted template.

OnDelete Update Strategy

With the OnDelete update strategy, the controller does not update pods automatically; I must delete each pod manually to have it recreated from the new template. This gives more control but needs more careful handling, so it suits cases that require precise rollout timing. With all these techniques (rolling updates, partitioning, canary deployments, phased rollouts, and OnDelete), I am better equipped to update StatefulSets without compromising application stability.
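A manual OnDelete rollout could be sketched as the two functions below. The names wait_ready and ondelete_rollout are hypothetical, the retry counts and timeouts are arbitrary, and the sketch assumes kubectl points at a disposable cluster where the pod template has already been changed. The patch also clears rollingUpdate, on the assumption that the API rejects a leftover partition once the type is OnDelete.

```shell
# Wait until a recreated pod reports Ready. The replacement pod may not
# exist for a moment after deletion, so retry a few times before giving up.
wait_ready() {
  tries=0
  until kubectl wait --for=condition=Ready "pod/$1" --timeout=60s; do
    tries=$((tries + 1))
    if [ "$tries" -ge 5 ]; then return 1; fi
    sleep 2
  done
}

# Switch the StatefulSet to OnDelete, then replace pods one at a time,
# highest ordinal first, stopping at the first failure.
ondelete_rollout() {
  sts="$1"; replicas="$2"
  kubectl patch statefulset "$sts" \
    -p '{"spec":{"updateStrategy":{"type":"OnDelete","rollingUpdate":null}}}' \
    || return 1
  i=$((replicas - 1))
  while [ "$i" -ge 0 ]; do
    kubectl delete pod "$sts-$i" || return 1
    wait_ready "$sts-$i" || return 1   # stop if a pod never becomes Ready
    i=$((i - 1))
  done
}
```

Calling ondelete_rollout web 3 would replace web-2, then web-1, then web-0. Nothing runs until the function is called, so the definitions can be pasted into a shell and inspected first.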

Deletion Strategies: Non-cascading vs. Cascading

Deleting resources, especially StatefulSets, needs to be done with regard for the implications on applications and data. A StatefulSet can be deleted in two ways: non-cascading and cascading.

Non-Cascading Deletion

In a non-cascading delete, only the StatefulSet object itself is deleted, while its pods keep running as orphans. That is useful when you want to preserve the state and configuration of the pods while removing the StatefulSet that manages them. Keep in mind that nothing recreates an orphaned pod if it is deleted later. A non-cascading delete is done with the following command:

kubectl delete statefulset web --cascade=orphan

After running this you should be able to verify that the pods are still running by executing:

kubectl get pods -l app=nginx

Cascading Deletion

By contrast, a cascading delete removes the StatefulSet together with its pods. If you want to discard the workload and make a fresh start, this is likely what you need. It is the default behavior of kubectl delete:

kubectl delete statefulset web

This command removes the StatefulSet and all of its pods. Note that it does not remove everything: the headless Service stays in place, and the PersistentVolumeClaims created for the pods, along with the PersistentVolumes bound to them, are not deleted by default. This provides an opportunity to retain valuable data or configurations, and to manage storage cleanup more deterministically.

Impact on Persistent Storage and Services

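The choice between the two modes matters less for storage than it might seem. In both cases, the PersistentVolumeClaims and the PersistentVolumes bound to them are left in place by default, so the data is preserved. After a non-cascading deletion the orphaned pods keep running and keep their volumes mounted. After a cascading deletion the claims remain but are no longer in use; cleanup happens out of band by deleting them, at which point each volume's reclaim policy decides whether the underlying storage is retained or deleted. The headless Service is untouched by either mode and must be deleted separately.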
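Out-of-band cleanup after a cascading deletion could be sketched as follows. cleanup_web is a hypothetical helper, the claim template name www is an assumption taken from the upstream example, and the replica count passed in should be the highest count the set was ever scaled to, since claims from earlier scale-ups stay behind. Deleting the claims destroys data wherever the reclaim policy is Delete, so this is meant for playground clusters only.

```shell
# Remove the StatefulSet, its headless Service, and its claims.
# Deleting a PVC hands the bound PV over to its reclaim policy.
# --ignore-not-found keeps the function going if something is already gone.
cleanup_web() {
  sts="$1"; svc="$2"; replicas="$3"; claim="${4:-www}"
  kubectl delete statefulset "$sts" --ignore-not-found
  kubectl delete service "$svc" --ignore-not-found
  i=0
  while [ "$i" -lt "$replicas" ]; do
    kubectl delete pvc "$claim-$sts-$i" --ignore-not-found
    i=$((i + 1))
  done
}
```

For the set used in this post, which was scaled up to five replicas at one point, the call would be cleanup_web web nginx 5.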

Best Practices and Considerations

Which deletion method to choose depends on what the application needs. If data continuity or ongoing state management matters, a non-cascading deletion allows a graceful transition. If you want a clean slate, for example when testing, a cascading deletion probably fits the bill. By choosing the appropriate deletion strategy, I can manage data, service continuity, and resource utilization more cleanly.

Conclusion

StatefulSets are the central object for orchestrating stateful applications in Kubernetes, combining stable network identities with persistent storage for pods. As we have seen, they provide strong tools to manage pod creation order, scaling, and updates. Understanding these elements is essential for any developer or operator who wants to use Kubernetes productively for stateful workloads. Used with care, StatefulSets can sit at the core of modern application infrastructure, balancing flexibility with reliability.
