Arsh Sharma

Using Volumes in Kubernetes

When working with K8s, we want to ensure that the data created by our application (which runs in containers) survives even if something happens to one of those containers. This is the problem that Volumes aim to solve.

In the last post of this series we got our hands dirty with K8s and ran a simple Node.js app on our cluster. In an earlier post, part of my series on Docker, I also talked about Volumes. Here we are going to combine those two concepts and look at how we can use Volumes when working with K8s, so I highly recommend checking those two posts out before reading this one.

When working with Docker we used Anonymous Volumes, Named Volumes, and Bind Mounts. K8s, in contrast to Docker, supports a large number of volume types. Apart from regular volumes, it also has something called Persistent Volumes (which we will look at in detail).

Explaining all of these types is simply not possible here, so instead I am going to show you three common problems you might face and how to solve each one using a different type of volume.

I'll be using the deployment.yaml file we discussed in the previous post and building on top of that.

apiVersion: apps/v1 
kind: Deployment 
metadata:
  name: node-app-deployment
spec:
  replicas: 1 
  selector: 
    matchLabels: 
      anything: node-app 
  template: 
    metadata:
      labels:
        anything: node-app
    spec: 
      containers:
        - name: node-app-container 
          image: YourDockerHubName/node-image

So let's get started!

The First Problem

Let's say we have a container running inside a pod, and the application it runs stores some user data. If something were to happen to this container and it had to restart, all of the data we stored previously would be lost. This is not at all desirable, and it is the first thing we will fix.

The simplest volume type we can use to fix this is the emptyDir type.

In our deployment.yaml file, in the pod specification, next to containers, we will add a volumes key where we list all the volumes that should be part of this pod. Doing this sets up the volume, but we also need to make it accessible inside the container, so we will add a volumeMounts key to the container's configuration.

The nice part is that the emptyDir type is fairly simple to use. Let's see what the deployment.yaml file will look like, and then I'll explain what we've done.

apiVersion: apps/v1 
kind: Deployment 
metadata:
  name: node-app-deployment
spec:
  replicas: 1 
  selector: 
    matchLabels: 
      anything: node-app 
  template: 
    metadata:
      labels:
        anything: node-app
    spec: 
      containers:
        - name: node-app-container 
          image: YourDockerHubName/node-image
          volumeMounts:
            - mountPath: /app/userData
              name: userdata-volume
      volumes:
        - name: userdata-volume
          emptyDir: {}

We first list the volumes we want to mount under the volumeMounts key, where we specify the mountPath (the location in our container where the files are being stored) and the name of the volume.

Then, under the volumes key, we configure each volume by first specifying its name and then its config. The config depends on the volume type, which we have to specify. Here the type is emptyDir, and the empty object ({}) means we want to use the emptyDir volume with its default settings.

And voila! That solves our problem. If for some reason our container now shuts down (assuming only one pod is present), then when it restarts (something K8s handles for us by default) it will still have access to the data that was created earlier, because the data is no longer stored inside the container. Instead, an empty directory is created on the pod whenever the pod starts. Containers can write to this directory, and if they restart or are removed, the data survives.
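
If you want to poke at this yourself, here's a rough sketch of how a check could look. The pod name is a placeholder you'd copy from kubectl get pods, the test file is just something I'm making up for illustration, and it assumes the image ships with a shell:

kubectl apply -f=deployment.yaml
kubectl get pods
# write a file into the mounted emptyDir (replace <pod-name> with the actual pod name)
kubectl exec <pod-name> -- sh -c 'echo hello > /app/userData/test.txt'
# if the app ever crashes, K8s restarts the container and the RESTARTS column goes up
kubectl get pods
# the file survives container restarts because it lives in the volume, not the container
kubectl exec <pod-name> -- cat /app/userData/test.txt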

The Second Problem

You might have gotten a hint of what the second problem is based on our solution to the first one. To solve the first problem we stored the data on the pod so that even if the container restarts the data is still present. But what if the pod restarts?

If there is a single pod and it restarts, our data is lost and the app won't work while the pod is down. If we have multiple pods and one of them shuts down for some reason, the data stored in its volume is lost, but our app still functions because the other pods are running (and K8s automatically redirects incoming traffic to them). Either way, we lose the data the shut-down pod had. In short, our app would still work but would be missing some user data.

So what is the solution?
If you remember the K8s architecture then you might have guessed it already. We could store the data on the node running these pods (assuming it is a single node cluster).

This is where the hostPath type comes in. It allows us to set a path on the host machine (the node running the pods) and then the data from that path gets exposed to different pods. So multiple pods can share the same path on the host machine (the node).

Once again let's first have a look at our deployment.yaml file and then I will explain what we have done.

apiVersion: apps/v1 
kind: Deployment 
metadata:
  name: node-app-deployment
spec:
  replicas: 1 
  selector: 
    matchLabels: 
      anything: node-app 
  template: 
    metadata:
      labels:
        anything: node-app
    spec: 
      containers:
        - name: node-app-container 
          image: YourDockerHubName/node-image
          volumeMounts:
            - mountPath: /app/userData
              name: userdata-volume
      volumes:
        - name: userdata-volume
          hostPath:
            path: /data
            type: DirectoryOrCreate

Most of it is the same, except that while configuring the volume we have now used hostPath instead of emptyDir and provided some configuration for it. The first option is path, which is the folder on our host machine (the node) where we want to save the data. The second is type, where the value DirectoryOrCreate means: if the folder we specified exists, use it; if not, create it on the host machine.

The hostPath type can be thought of a bit like the Bind Mounts I talked about in the Docker series. Using it solves the problem of our data being lost when pods shut down.
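
To convince yourself that the data now lives on the node rather than inside any single pod, you can (with minikube) open a shell on the node and look at the folder from our config. A quick sketch:

# open a shell on the (single) minikube node
minikube ssh
# /data is the hostPath folder we set in deployment.yaml
ls /data
exit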

The Third Problem

The third problem, too, stems from our solution to the second one. What if the pods are not all on the same node? Then pods running on different nodes will not have access to all of the user data our app stores.

Here Persistent Volumes will come to the rescue.

Persistent Volumes (PVs) are pod and node independent volumes. The idea is that instead of storing the data in the pod or a node, we have entirely separate entities in our K8s cluster that are detached from our nodes (and hence pods).

Each individual pod then will have something called a Persistent Volume Claim (PVC) and it will use this to access the standalone entities we created.

The following diagram should provide some clarity:

[Diagram: pods use Persistent Volume Claims to get access to a Persistent Volume, which exists independently of any particular pod or node]

Persistent Volumes like regular volumes also have types. The hostPath type we talked about is common to both and is perfect for experimenting with persistent volumes (PVs) when working locally.

This is because the cluster minikube gives us is a single node cluster. While you would typically not be using a single node cluster (or the hostPath type) in production, the workflow I'll be explaining stays more or less the same for other PV types.

If this is getting confusing, just remember: we can use the hostPath type of Persistent Volume because we are working with the single node cluster that minikube set up for us.
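
You can confirm that yourself; minikube's cluster has exactly one node:

kubectl get nodes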

Setting up the PV

The first step is to set up the persistent volume. We are doing so using the hostPath type. For this, create a host-pv.yaml file which would look something like this:

apiVersion: v1 
kind: PersistentVolume
metadata:
  name: host-pv
spec: 
  capacity: 
    storage: 1Gi 
  volumeMode: Filesystem
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data
    type: DirectoryOrCreate
  1. In the specification of this PV we first mention the capacity. The goal here is to control how much storage can be used by the pods that later get executed in our cluster. Note that this is the total capacity we want to make available; pods, when they claim this persistent volume, define how much of it they require. 1Gi stands for one gibibyte (roughly one gigabyte).

  2. We have to specify the volumeMode key, where we have two modes to choose from: Filesystem and Block. These are the two types of storage available to us. Since we are going to have a folder in the filesystem of the virtual machine running the cluster, we chose the Filesystem mode.

  3. K8s has a concept called storage classes. A storage class exists by default, which we can see using the kubectl get sc command (see the commands right after this list). In the output you'll see that the name of the default storage class is standard, which is what we have specified here. Storage classes are an advanced concept that isn't in the scope of this article, but in brief, they give administrators fine-grained control over how storage is managed.

  4. accessModes tells how the PV can be accessed. Here we list all the modes we want to support. The ReadWriteOnce mode allows the volume to be mounted as read-write by a single node only (which is perfectly fine here since our cluster is a single node cluster). For a multi-node cluster you might want to look into the other modes, like ReadOnlyMany and ReadWriteMany.

  5. After the accessModes we simply mention the type of persistent volume (hostPath here) along with its configuration, just like we did earlier.
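
As mentioned in point 3, you can apply this file right away and have a quick look around. Nothing is using the PV yet, so this is just a sanity check:

kubectl apply -f=host-pv.yaml
# host-pv should be listed with a capacity of 1Gi (and a status of Available until something claims it)
kubectl get pv
# the default storage class we referenced above
kubectl get sc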

Setting up the PVC

Simply defining the PV is not enough. We also need to define the PV Claim which the pods will use later.
For this create a host-pvc.yaml file which looks something like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: host-pvc
spec: 
  volumeName: host-pv 
  accessModes: 
    - ReadWriteOnce
  storageClassName: standard
  resources: 
    requests: 
      storage: 1Gi
  1. In the specification for this PV Claim we first mention the name of the PV this claim is for.
  2. Then we choose the accessModes from the ones we listed in the host-pv.yaml file. Since we listed only one we have no other choice but to go with that one here.
  3. After that we again mention the storage class we want to use like before.
  4. The resources key can be thought of as the counterpart of the capacity we mentioned in the host-pv.yaml file. Here we choose how much storage we want to request. You would generally not request the entire amount of storage available, but here it doesn't matter since we are just testing things out. (The sketch after this list shows how to check that the claim actually binds to our PV.)
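
Once both files exist, a quick way to check that the claim and the volume found each other (assuming you have already applied host-pv.yaml):

kubectl apply -f=host-pvc.yaml
# the STATUS column for both should read Bound
kubectl get pvc host-pvc
kubectl get pv host-pv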

Final Configuration

Now that we have our PV and PVC set up, all that needs to be done is make changes in the deployment.yaml file so that we use this PV instead of the hostPath we were using earlier.

Our deployment.yaml would look like this now:

apiVersion: apps/v1 
kind: Deployment 
metadata:
  name: node-app-deployment
spec:
  replicas: 1 
  selector: 
    matchLabels: 
      anything: node-app 
  template: 
    metadata:
      labels:
        anything: node-app
    spec: 
      containers:
        - name: node-app-container 
          image: YourDockerHubName/node-image
          volumeMounts:
            - mountPath: /app/userData
              name: userdata-volume
      volumes:
        - name: userdata-volume
          persistentVolumeClaim:
            claimName: host-pvc 

The entire file is like it was before except that we have replaced the hostPath key with persistentVolumeClaim and under that specified the claim we want to use.

With this, we're good to go. To see our persistent volume in action, apply the files using this command like we did in the previous post:

kubectl apply -f=host-pv.yaml -f=host-pvc.yaml -f=deployment.yaml -f=service.yaml

Make sure you have your minikube cluster up before running these commands.
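
Once everything is applied, a rough way to confirm the whole chain is wired up (the pod name is again a placeholder from kubectl get pods):

# the claim and the volume should both show up as Bound
kubectl get pv,pvc
# the pod should be Running, and its description lists the PVC under Volumes
kubectl get pods
kubectl describe pod <pod-name>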

Now we've finally set up a persistent volume. This PV is both node and pod independent. I'll say it again: even though we used the hostPath type while setting up this PV (because the cluster minikube sets up is a single node cluster), the overall process is similar for other types of PVs.

Thanks a lot for reading! :D

If you have any feedback for me or just want to talk feel free to connect with me on Twitter. I'll be more than happy to hear from you!
