Updated on December 17th 2021.
Managing storage for stateful workloads is a distinct problem from managing compute instances.
🎿 0a. What's under the hood
There's no emoji for "only one lost ski", so the author will use a common icon for this theoretical part of the article. ☃️
In Kubernetes terms, the PersistentVolume subsystem provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed. To do this, the K8s team introduced two API resources: PersistentVolume and PersistentVolumeClaim.
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster, just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system. Since Kubernetes version 1.11, PersistentVolumes can even be configured to be expandable.
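Expansion is switched on at the StorageClass level. Below is a minimal sketch of such a class; the expandable-ssd name and the choice of the GKE Persistent Disk CSI provisioner are illustrative assumptions, not something mandated by the article's setup:

# StorageClass whose PVCs can later be resized in place
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-ssd               # hypothetical name
provisioner: pd.csi.storage.gke.io   # GKE PD CSI driver (assumed environment)
parameters:
  type: pd-ssd
allowVolumeExpansion: true           # the flag that makes bound volumes expandable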
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod: Pods consume node resources and PVCs consume PV resources.
Let's look at three methods to tie PV and PVC pairs to each other robustly.
🎿 I. Volume Name: yes, we need a hack for GKE, again
And the easiest one relies on volumeName!
There are situations when your GKE environment already has a persistent disk (say, one named pd-name), and you want to use exactly this disk rather than create a new one.
The manifest file below describes a corresponding PersistentVolume and PersistentVolumeClaim that your cluster can use.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  storageClassName: ""
  capacity:
    storage: 50G
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: pd-name
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim-demo
spec:
  storageClassName: ""
  volumeName: pv-demo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50G
⚠️ Note: it is necessary to specify "" as the storageClassName so that the default storage class won't be used.
Now you can use the kubectl apply -f existing-pd.yaml command to create the PersistentVolume and PersistentVolumeClaim.
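Assuming the two manifests above are saved in a file named existing-pd.yaml, the sequence looks roughly like this:

# Create the PV/PVC pair and check that they bind to each other
kubectl apply -f existing-pd.yaml
kubectl get pv pv-demo          # STATUS should become Bound
kubectl get pvc pv-claim-demo   # the VOLUME column should show pv-demo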
Pods access storage by using the claim as a volume. Claims must exist in the same namespace as the Pod using the claim. The cluster finds the claim in the Pod's namespace and uses it to get the PersistentVolume backing the claim. The volume is then mounted to the host and into the Pod.
After the PersistentVolume and PersistentVolumeClaim exist in the cluster, you can give a Pod's containers access to the volume by specifying values for the container's volumeMounts.mountPath and volumeMounts.name, as shown in the following example:
kind: Pod
apiVersion: v1
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: pv-claim-demo
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage
This way, when you apply this manifest to a cluster, the Pod is created, and the task-pv-container container has access to the volume in its /usr/share/nginx/html/ directory.
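A quick way to double-check the mount from inside the Pod (assuming the manifest above is saved as task-pv-pod.yaml):

kubectl apply -f task-pv-pod.yaml
kubectl wait --for=condition=Ready pod/task-pv-pod
# The persistent disk should appear as the filesystem backing the mount path:
kubectl exec task-pv-pod -- df -h /usr/share/nginx/html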
🎿 0b. Formalities matter! We have to know what we want to do
In GKE, for a container to access your pre-existing persistent disk, you'll need to do the following:
- Provision the existing persistent disk as a PersistentVolume.
- Bind the PersistentVolume to a PersistentVolumeClaim.
- Give the containers in the Pod access to the PersistentVolume.
As you may have guessed already, there are several ways to bind a PersistentVolumeClaim to a specific PersistentVolume. For example, a YAML manifest can create a new PersistentVolume and PersistentVolumeClaim and then bind the claim to the volume using a claimRef (see section II below), which ensures that the PersistentVolume can only be bound to that PersistentVolumeClaim.
☑️ Claims will remain unbound indefinitely if a matching volume does not exist. Claims will be bound as matching volumes become available. For example, a cluster provisioned with many 50Gi PVs would not match a PVC requesting 100Gi. The PVC can be bound when a 100Gi PV is added to the cluster.
☑️ If you know exactly what PersistentVolume you want your PersistentVolumeClaim to bind to, you can specify the PV in your PVC using the volumeName field. This method skips the normal matching and binding process. The PVC will only be able to bind to a PV that has the same name specified in volumeName. If such a PV exists and is Available, the PV and PVC will be bound regardless of whether the PV satisfies the PVC's label selector, access modes, and resource requests.
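A quick sanity check before creating such a claim (reusing the pv-demo name from the earlier example) is to confirm the PV really is Available:

kubectl get pv pv-demo -o jsonpath='{.status.phase}'
# expected output: Available (or Bound once the claim has been created)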
To bind a PersistentVolume to a PersistentVolumeClaim, the storageClassName of the two resources must match, as well as capacity, accessModes, and volumeMode.
⚠️ Please note again: you cannot simply leave storageClassName out; you must specify "" to prevent Kubernetes from using the default StorageClass.
The storageClassName does not need to refer to an existing StorageClass object. If all you need is to bind the claim to a volume, you can use any name you want. However, if you require extra functionality configured by a StorageClass, like volume resizing, then storageClassName must refer to an existing StorageClass object.
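For instance, here is a sketch of a pair bound through an arbitrary class name; the manual name and the hostPath backing below are purely illustrative assumptions, and no StorageClass object called manual has to exist:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-manual
spec:
  storageClassName: manual      # arbitrary label, only has to match the claim
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-manual
spec:
  storageClassName: manual      # must match the PV above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi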
🎿 II. Roulette or poker? A magic claim reference
It’s important to understand that the application developer can create the manifests for the Pod and the PersistentVolumeClaim objects without knowing anything about the infrastructure on which the application will run. Similarly, the cluster administrator can provision a set of storage volumes of varying sizes in advance without knowing much about the applications that will use them. You may see a schematic of the internals of this process below.
Kubernetes can operate with different network file systems, not only with NFS, of course.
You may also want your cluster administrator to "reserve" the volume for only your claim so that nobody else's claim can bind to it before yours does. In this case, the administrator can specify the PVC in the PV using the claimRef field, while on the user's side the claim is pinned to that PV via volumeName, as in the example below:
apiVersion: "v1"
kind: "PersistentVolumeClaim"
metadata:
name: "claim1"
spec:
accessModes:
- "ReadWriteOnce"
resources:
requests:
storage: "1Gi"
volumeName: "pv0001"
The PV will only be able to bind to a PVC that has the same name and namespace specified in claimRef. The PVC's access modes and resource requests must still be satisfied in order for the PV and PVC to be bound, though the label selector is ignored. An example Persistent Volume object definition with claimRef is shown below:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /tmp
    server: 172.17.0.2
  persistentVolumeReclaimPolicy: Recycle
  claimRef:
    name: claim1
    namespace: default
Specifying a volumeName in your PVC does not prevent a different PVC from binding to the specified PV before yours does. Your claim will remain Pending until the PV is Available.
☑️ It should be kept in mind that specifying a claimRef in a PV does not prevent the specified PVC from being bound to a different PV! The PVC is free to choose another PV to bind to according to the normal binding process. Therefore, to avoid these scenarios and ensure your claim gets bound to the volume you want, you must ensure that both volumeName and claimRef are specified.
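Putting both halves together, a minimal sketch of such a mutually pinned pair (reusing the pv0001 and claim1 names from the examples above) could look like this:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /tmp
    server: 172.17.0.2
  persistentVolumeReclaimPolicy: Retain
  claimRef:                # reserves the PV for exactly this claim
    name: claim1
    namespace: default
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  volumeName: pv0001       # pins the claim to exactly this PV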
You can tell whether your setting of volumeName and/or claimRef influenced the matching and binding process by inspecting a Bound PV and PVC pair for the pv.kubernetes.io/bound-by-controller annotation. The PVs and PVCs where you set the volumeName and/or claimRef yourself will have no such annotation, but ordinary PVs and PVCs will have it set to "yes". See GitLab's article about such a situation.
When a PV has its claimRef set to some PVC name and namespace, and is reclaimed according to a Retain or Recycle reclaim policy, its claimRef will remain set to the same PVC name and namespace even if the PVC or the whole namespace no longer exists.
⚠️ The ability to set claimRefs is a temporary workaround for the described use cases. A long-term solution for limiting who can claim a volume is in development. The cluster administrator should first consider configuring selector-label volume binding before resorting to setting claimRefs on behalf of users.
🎿 III. Selectors & labels, or Wait, I have everything written in advance!
So, there is a more elegant approach to enable binding of persistent volume claims (PVCs) to persistent volumes (PVs) via selector and label attributes. By implementing selectors and labels, regular users are able to target provisioned storage by identifiers defined by a cluster administrator.
Why might this be needed? In cases of statically provisioned storage, developers seeking persistent storage are required to know a handful of identifying attributes of a PV in order to deploy and bind a PVC. This creates several problematic situations. Regular users might have to contact a cluster administrator to either deploy the PVC or provide the PV values. PV attributes alone do not convey the intended use of the storage volumes, nor do they provide methods by which volumes can be grouped.
Selector and label attributes can be used to abstract away PV details from the user while providing cluster administrators a way of identifying volumes by a descriptive and customizable tag. Through the selector-label method of binding, users are only required to know which labels are defined by the administrator. Let's look at a Persistent Volume with labels example:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gluster-volume
  labels:
    storage-tier: gold
    aws-availability-zone: us-east-1
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteMany
  glusterfs:
    endpoints: glusterfs-cluster
    path: myVol1
    readOnly: false
  persistentVolumeReclaimPolicy: Retain
Use labels to identify common attributes or characteristics shared among volumes. In this case, we defined the Gluster volume to have a custom attribute (key) named storage-tier with a value of gold assigned. A claim will be able to select a PV with storage-tier=gold to match this PV.
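Because these are ordinary Kubernetes labels, the administrator (or a curious user) can list all volumes of a given tier with a plain label selector:

kubectl get pv -l storage-tier=gold
kubectl get pv -l storage-tier=gold,aws-availability-zone=us-east-1   # both labels must match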
And as a next step we need to compose a claim: a Persistent Volume Claim with selectors. A claim with a selector stanza (see the example below) attempts to match existing, unclaimed, and non-prebound PVs. When a PVC has a selector, the PV's capacity is ignored during matching. However, accessModes are still considered in the matching criteria.
⚠️ It is important to note that a claim must match all of the key-value pairs included in its selector stanza. If no PV matches the claim, then the PVC will remain unbound (Pending). A PV can subsequently be created and the claim will automatically check for a label match.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-claim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      storage-tier: gold
      aws-availability-zone: us-east-1
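Once both objects exist, a quick check (using the names above) confirms that the label-based binding worked:

kubectl get pvc gluster-claim        # STATUS should be Bound, VOLUME should be gluster-volume
kubectl describe pvc gluster-claim   # the Events section will explain a Pending claim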
🎿 IV. What if we don't want to produce useless entities?
Real life often comes with surprises. What if you can't recover data from a Pod that was using a PersistentVolume, or you're looking for ways to safely delete Pods without affecting the data stored in a PV?
⚠️ Make sure the PersistentVolume's ReclaimPolicy is set to Retain. For dynamically provisioned PersistentVolumes, the default reclaim policy is Delete. This means that a dynamically provisioned volume is automatically deleted when a user deletes the corresponding PersistentVolumeClaim. This automatic behavior might be inappropriate if the volume contains precious data.
If ReclaimPolicy is currently set to Delete, which is the common default applied in GKE for dynamically provisioned volumes (look at spec.persistentVolumeReclaimPolicy: Delete in the YAML), you can patch the PV by issuing the command:
kubectl patch pv <your-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
Next, proceed to delete the resources that were using that PV. The STATUS of the PV will then become Released after the PVC and the Pod holding it have been deleted.
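As a sketch, using the example names from section I, that clean-up step could look like this:

kubectl delete pod task-pv-pod
kubectl delete pvc pv-claim-demo
kubectl get pv pv-demo -w    # watch until STATUS changes from Bound to Released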
Once the PV is in the Released state, edit the PV and remove the spec.claimRef block:
apiVersion: v1
kind: PersistentVolume
...
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 20Gi
  claimRef:
    ...
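If you prefer not to edit the object interactively, the same clean-up can be done with a JSON patch (shown here for a hypothetical PV named pv-demo):

# Remove the stale claimRef so the PV goes from Released back to Available
kubectl patch pv pv-demo --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'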
After removing the spec.claimRef block, the PersistentVolume will become Available again. You'll then be able to proceed with re-adding your application involving the PV we discussed above.
🎿 V. How to check what we've done
The kubectl get pv and kubectl get pvc commands can be used to see which PersistentVolume and PersistentVolumeClaim have been defined for the application. If the STATUS is Bound, the application has access to the necessary storage. The kubectl describe pv command can then be used to see detailed information about the Persistent Volume used by the application. An example (thanks, IBM) is shown below:
# kubectl get pv
NAME                  CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                              STORAGECLASS   REASON   AGE
powerai-vision-data   40Gi       RWX            Retain           Bound    default/powerai-vision-data-pvc                            48d

[root@dlf01 ~]# /opt/powerai-vision/bin/kubectl.sh describe pv
Name:            powerai-vision-data
Labels:          assign-to=powerai-vision-data
                 type=local
Annotations:     pv.kubernetes.io/bound-by-controller=yes
StorageClass:
Status:          Bound
Claim:           default/powerai-vision-data-pvc
Reclaim Policy:  Retain
Access Modes:    RWX
Capacity:        40Gi
Message:
Source:
    Type:  HostPath (bare host directory volume)
    Path:  /opt/powerai-vision/volume/
Events:  <none>
The kubectl describe pvc command is used to see detailed information about the Persistent Volume Claim for the application.
# kubectl describe pvc
Name:           powerai-vision-data-pvc
Namespace:      default
StorageClass:
Status:         Bound
Volume:         powerai-vision-data
Labels:         app=powerai-vision
                chart=ibm-powerai-vision-prod-1.1.0
                heritage=Tiller
                release=vision
Annotations:    pv.kubernetes.io/bind-completed=yes
                pv.kubernetes.io/bound-by-controller=yes
Capacity:       40Gi
Access Modes:   RWX
Events:         <none>
Again, the above output shows more details about the Persistent Volume Claim used by the application. The Volume field references the underlying Persistent Volume, and the STATUS should be Bound if it has been successfully allocated to the application. The Events section will show whether there were issues with the Persistent Volume Claim.
Also, the author is really excited about Utku Özdemir's pv-migrate, a CLI tool/kubectl plugin to easily migrate the contents of one Kubernetes PersistentVolume[Claim] to another.
YAML examples of GKE Pod with Persistent Volume (+ WebUI screenshots) may be found here (thanks to DevOpsCube team).
Sources: RedHat, loft.sh, K8s docs, GitLab, GitHub, letsdoclod.com by Erick Gubi
📖 "Kubernetes In Action" by Marko Lukša, ISBN 9781617293726