Kubernetes-101: Volumes, part 2

In the previous article we spent some time learning about the basics of Volumes. We saw how to create Volumes backed by a directory on our host machine (in my case the host machine is the Minikube VM), as well as from a ConfigMap in our cluster.

In this article we will look at PersistentVolumes and PersistentVolumeClaims. In the previous article we saw that a Volume is part of the Pod specification and not a separate Kubernetes object. Although the data in our Volume survived when we deleted our Pod, the Volume itself was deleted along with the Pod. A PersistentVolume is a separate Kubernetes object, and its lifecycle is completely independent of the Pods that use it. This is the kind of storage resource you want to use for your databases or for any other data that should persist well beyond the Pods that use the data. A PersistentVolumeClaim is a separate Kubernetes resource that is used to request storage with certain properties, and it is the resource you connect to your Pods.

The following image illustrates the relationship between a Pod, a PersistentVolumeClaim, and a PersistentVolume.

PersistentVolume

A PersistentVolume is a cluster resource, similar to how a node (a bare-metal or virtual machine) is a cluster resource. Typically your cluster administrator will create PersistentVolumes for you, and you as a humble Kubernetes application developer would consume the PersistentVolumes in your application (through a PersistentVolumeClaim, more on that in the next section). A cluster resource does not belong to a specific Namespace.

Like a regular Volume, a PersistentVolume is backed by a storage medium of some sort, since the data has to be stored somewhere. This would typically be storage offered by a cloud provider, e.g. Elastic Block Store (EBS) on AWS.

I said that your cluster administrator might create PersistentVolumes for you to use; this is called static provisioning. There is also dynamic provisioning, where PersistentVolumes are created on the fly as they are requested by an application. In this article we will only consider the static approach.
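
For contrast, here is a rough sketch of what a claim relying on dynamic provisioning might look like. It assumes the cluster has a StorageClass named standard that is backed by a provisioner (I believe Minikube ships one by default, but check with kubectl get storageclass). The claim name dynamic-pvc and the file name are made up for illustration. No PersistentVolume is written by hand in this case, since the provisioner is expected to create one when the claim is submitted.

```yaml
# dynamic-pvc.yaml (hypothetical file name)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  # assumption: a StorageClass called "standard" with a provisioner exists
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi
```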

To create a PersistentVolume with a declarative approach we can use the following manifest¹:

# pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: database-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/data"

The manifest in pv.yaml contains the usual fields of .apiVersion, .kind, .metadata.name and .spec. In the specification I say the following:

I want to create a PersistentVolume of size 1Gi (1 Gibibyte)
I want the access mode to be ReadWriteOnce which means that only a single node in my cluster can read and write to this PersistentVolume
I provide a storageClassName and set it to a custom name of local-storage
I say that when a PersistentVolumeClaim associated with this PersistentVolume is deleted I want to retain the data (persistentVolumeReclaimPolicy: Retain)
Similar to what I did in the previous article on Volumes, I use a hostPath type of Volume because that is the easiest to use from Minikube (the cluster I am currently using). See the documentation for explanations of all settings¹.
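
For static provisioning the storageClassName is, as far as I can tell, only used as a matching label: a claim asking for local-storage can bind to this PersistentVolume even if no StorageClass object with that name exists. If you prefer to make the class explicit, a sketch along the lines of the local volume example in the Kubernetes documentation could look like this. The file name is an assumption, and note that with WaitForFirstConsumer a claim stays Pending until a Pod uses it.

```yaml
# storageclass.yaml (hypothetical file name)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
# no dynamic provisioner: PersistentVolumes of this class are created by hand
provisioner: kubernetes.io/no-provisioner
# delay binding until a Pod that uses the claim is scheduled
volumeBindingMode: WaitForFirstConsumer
```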

We can create our PersistentVolume using kubectl apply:

$ kubectl apply -f pv.yaml

persistentvolume/database-pv created

We can run kubectl get to see details about our PersistentVolume:

$ kubectl get persistentvolume database-pv

NAME          CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
database-pv   1Gi        RWO            Retain           Available           local-storage           10s

In this output RWO is short for ReadWriteOnce. The column named CLAIM is empty, which means that this PersistentVolume has not yet been bound to any PersistentVolumeClaim (as I said, more on this in the next section). We will re-run this command later in this article.

We can run kubectl describe to get additional details about our PersistentVolume:

$ kubectl describe persistentvolume database-pv

Name:            database-pv
Labels:          <none>
Annotations:     <none>
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    local-storage
Status:          Available
Claim:
Reclaim Policy:  Retain
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        1Gi
Node Affinity:   <none>
Message:
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /data
    HostPathType:
Events:            <none>

If we want to shorten the previous two commands we can replace persistentvolume by its short form pv, leading to kubectl get pv ... and kubectl describe pv .... Now the short-hand version really pays off!

Before we can use our PersistentVolume we need to create a PersistentVolumeClaim, which is the topic of the next section.

PersistentVolumeClaims

Now we have to imagine that we are Kubernetes application developers and our Kubernetes cluster administrator has created a number of PersistentVolumes for us to use. Luckily we created a PersistentVolume ourselves in the previous section, so it is not hard to imagine we have one available. What we need to do now is to claim some storage from the pool of available PersistentVolumes. We do this through the resource known as a PersistentVolumeClaim. In a PersistentVolumeClaim we specify the properties of the storage we need, and Kubernetes binds the claim to a matching PersistentVolume from the pool of available PersistentVolumes.

Let's create a PersistentVolumeClaim manifest:

# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: "1Gi"

Here I say that I request storage space of 1Gi with access mode ReadWriteOnce, and that the storage class should be local-storage. This fits with the PersistentVolume we created in the previous section! The next step is to write a manifest for a Pod that uses our PersistentVolumeClaim:

# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: database-app
spec:
  # define the volumes available in this pod
  volumes:
    - name: database-storage
      persistentVolumeClaim:
        claimName: database-pvc
  containers:
    - name: db
      image: alpine
      command: ["sleep", "3600"] # hack to make the container stay alive
      # mount the volume defined above
      volumeMounts:
        - name: database-storage
          mountPath: "/mnt/data"

The important part of this manifest is .spec.volumes[*].persistentVolumeClaim where I specify the name of a PersistentVolumeClaim resource. The rest of the manifest is similar to how we did it for regular Volumes in the previous article.
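
If a container only needs to read the data, the same claim can be mounted read-only. The following variation is a sketch, with a hypothetical Pod name database-reader and a hypothetical file name. Both readOnly fields are optional and default to false.

```yaml
# pod-readonly.yaml (hypothetical file name)
apiVersion: v1
kind: Pod
metadata:
  name: database-reader # hypothetical name
spec:
  volumes:
    - name: database-storage
      persistentVolumeClaim:
        claimName: database-pvc
        readOnly: true # expose the claim as read-only to this Pod
  containers:
    - name: reader
      image: alpine
      command: ["sleep", "3600"] # keep the container alive
      volumeMounts:
        - name: database-storage
          mountPath: "/mnt/data"
          readOnly: true # mount the volume read-only in this container
```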

I place both my manifests into a directory named pvc and I run kubectl apply on the whole directory:

$ kubectl apply -f ./pvc

pod/database-app created
persistentvolumeclaim/database-pvc created
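
Instead of a directory, the two manifests could also be kept in a single file, with the documents separated by a line containing only ---. This is just a sketch of that layout using the same two objects. The file name app.yaml is an assumption, and the file would be applied with kubectl apply -f app.yaml.

```yaml
# app.yaml (hypothetical file name)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: "1Gi"
---
# second document in the same file: the Pod that uses the claim
apiVersion: v1
kind: Pod
metadata:
  name: database-app
spec:
  volumes:
    - name: database-storage
      persistentVolumeClaim:
        claimName: database-pvc
  containers:
    - name: db
      image: alpine
      command: ["sleep", "3600"]
      volumeMounts:
        - name: database-storage
          mountPath: "/mnt/data"
```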

If I run kubectl describe pod on my new Pod I can see that it has an attached Volume from a PersistentVolumeClaim:

$ kubectl describe pod database-app

Name:             database-app
Namespace:        default
...
Containers:
  db:
    ...
    Mounts:
      /mnt/data from database-storage (rw)
Volumes:
  database-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  database-pvc
    ReadOnly:   false
...

Similarly, I can check the details of my PersistentVolumeClaim:

$ kubectl describe persistentvolumeclaim database-pvc

Name:          database-pvc
Namespace:     default
StorageClass:  local-storage
Status:        Bound
Volume:        database-pv
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       database-app

I see that the PersistentVolumeClaim is used by the database-app Pod. The previous command was very long because persistentvolumeclaim is one of the longer resource type names in Kubernetes. Luckily it can be shortened to pvc, so the previous command could be changed to kubectl describe pvc database-pvc. Did I say that we really saved time on writing pv instead of persistentvolume? Now we took this time-saving to an even higher level.

In the previous section I said that we would check on our PersistentVolume again once we have done something with it, so let's run kubectl get to see details about our PersistentVolume again:

$ kubectl get persistentvolume database-pv

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                  STORAGECLASS    REASON   AGE
database-pv                                1Gi        RWO            Retain           Bound      default/database-pvc   local-storage            7m15s

We can now see that the CLAIM column has a value of default/database-pvc, the name of our PersistentVolumeClaim. We also see that the STATUS is now Bound.
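
Kubernetes picked database-pv for us because its storage class, access mode, and capacity satisfy the claim. If you want a claim to target one specific PersistentVolume rather than any matching one, the claim can name the volume directly through .spec.volumeName. Here is a sketch of that variation of pvc.yaml, under the assumption that database-pv is still Available when the claim is created:

```yaml
# pvc.yaml, variant that names the PersistentVolume explicitly
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  volumeName: database-pv # bind to this PersistentVolume specifically
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: "1Gi"
```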

Now we have a Pod with a Volume that refers to the PersistentVolumeClaim we created together with the Pod, and that claim is in turn bound to the PersistentVolume that we (or our Kubernetes cluster administrator) set up. We can use the Volume in our Pod just like we did in the previous article.
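
To see that data written through the claim outlives a Pod, the container command could append a line to a file on the mounted path each time it starts. This is a sketch of such a variation of pod.yaml, with a hypothetical file name started.log. After deleting and recreating the Pod, the file should contain one line per start.

```yaml
# pod.yaml, variant that writes to the volume
apiVersion: v1
kind: Pod
metadata:
  name: database-app
spec:
  volumes:
    - name: database-storage
      persistentVolumeClaim:
        claimName: database-pvc
  containers:
    - name: db
      image: alpine
      # append a timestamp on every start, then keep the container alive
      command: ["sh", "-c", "date >> /mnt/data/started.log && sleep 3600"]
      volumeMounts:
        - name: database-storage
          mountPath: "/mnt/data"
```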

Summary

In this article we encountered PersistentVolumes and PersistentVolumeClaims. The main point of these resources is to separate the lifecycle of the data stored in our volumes from the lifecycle of the Pods that use it. It is a good practice!

The next article is the last in this mini-series of articles on Volumes. We will learn about Container Storage Interface (CSI) drivers.