David Oyewole

Posted on Mar 21

Mastering Kubernetes Persistent Volumes: A Simplified Beginner Guide

#devops #kubernetes #cloudcomputing #docker

Introduction

In a traditional containerized environment, storage is ephemeral—meaning that when a pod is restarted, all its data is lost. This happens because container file systems are temporary and tied to the pod's lifecycle. This is a major challenge for stateful applications like databases and file storage systems that hold important data and information.

Kubernetes solves this issue by providing Persistent Volumes (PV) and Persistent Volume Claims (PVC), ensuring data persistence even when pods restart or move between nodes. This ensures that data remains persistent and secure.. In this article, I will guide you through how Persistent Volumes work, how to configure them, and best practices for managing persistent storage in Kubernetes, with a complete practical guide.

Understanding Persistent Volumes in Kubernetes

Why Do We Need Persistent Volumes?

By default, Kubernetes pods use ephemeral storage, which means:

When a pod restarts or gets rescheduled to another node, its data is lost.
Stateful applications (e.g., databases like MySQL, MongoDB) require persistent storage.
Persistent Volumes provide a way to separate storage from pods, allowing data to persist beyond the lifecycle of a pod.

What is a Persistent Volume (PV) and a Persistent Volume Claim (PVC)?

Persistent Volume (PV): A physical storage resource in the cluster, provisioned by administrators or dynamically created using Storage Classes.
Persistent Volume Claim (PVC): A request from a pod to use a Persistent Volume.
StorageClass: Defines how storage should be dynamically provisioned by the cloud provider or local storage.

How Persistent Volumes Work

Administrator creates a Persistent Volume (PV) with a specified storage size and access mode.
A pod requests storage by creating a Persistent Volume Claim (PVC).
Kubernetes binds the PVC to an available PV.
The pod mounts the Persistent Volume and uses it for storage.

Persistent Volume Provisioning: Static vs Dynamic

Static Provisioning: The cluster administrator manually creates a Persistent Volume (PV) before any pod requests storage.

Dynamic Provisioning: Kubernetes automatically provisions storage when a pod creates a Persistent Volume Claim (PVC) using a StorageClass.

Creating a Persistent Volume and PVC (Hands-on Guide)

In this practical guide, I will be using MongoDB and Mongo Express (a UI for MongoDB) to demonstrate how to use Persistent Volumes. This example is tested on a Minikube cluster, so ensure your Minikube is running.

Since we will be using local storage to persist data, we need to create a storage path in Minikube. This storage path will be used to create our Persistent Volume (PV).

minikube ssh
sudo mkdir -p /data/mongodb
sudo chmod 777 /data/mongodb
exit

This will create the directory and grant the necessary permissions. Now, we can proceed to create the PV, PVC, and deployments.

Step 1: Create a Persistent Volume (PV)

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongodb-pv
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/data/mongodb"

Ensure that the hostPath matches the directory created in your Minikube setup.

Step 2: Create a Persistent Volume Claim (PVC)

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Ensure the storageClassName, accessModes, and storage values match those defined in the PV

After applying these configurations, Kubernetes will bind the PVC to an available PV, allowing the pod to use persistent storage.
Run kubectl get pv and kubectl get pvc after applying the configuration to confirm the binding. The status should look like this:

Step 3: Deploy MongoDB with Persistent Volume

For a real-world use case, let’s deploy MongoDB with a Persistent Volume.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo-deployment
  labels:
    app: mongodb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      volumes:
      - name: mongodb-storage
        persistentVolumeClaim:
          claimName: mongodb-pvc
      containers:
        - name: mongo
          image: mongo
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongodb-storage
              mountPath: /data/db
          resources:
            limits:
              cpu: "500m"
              memory: "256Mi"
            requests:
              cpu: "250m"
              memory: "128Mi"
          env:
            - name: MONGO_INITDB_ROOT_USERNAME
              valueFrom:
                secretKeyRef:
                  name: mongo-secret
                  key: username
            - name: MONGO_INITDB_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mongo-secret
                  key: password

Your MongoDB deployment file should look like this but take note of the following.

In the volumes section, the claimName must match the name of your PVC.
In The volumeMounts section, the mountPath is the default mountPath for MongoDB
The env which is the Environment Variables i.e the MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD is stored in a secret file for security purposes, I will write about that in my next article.

Now apply the configuration

Step 4: Deploy Mongo Express as a UI for MongoDB

Now let's deploy Mongo Express

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongoexpress-deployment
  labels:
    app: mongoexpress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongoexpress
  template:
    metadata:
      labels:
        app: mongoexpress
    spec:
      containers:
        - name: mongoexpress
          image: mongo-express
          ports:
            - containerPort: 8081
          resources:
            limits:
              cpu: "500m"
              memory: "256Mi"
            requests:
              cpu: "250m"
              memory: "128Mi"
          env:
            - name: ME_CONFIG_MONGODB_ADMINUSERNAME
              valueFrom:
                secretKeyRef:
                  name: mongo-secret
                  key: username
            - name: ME_CONFIG_MONGODB_ADMINPASSWORD
              valueFrom:
                secretKeyRef:
                  name: mongo-secret
                  key: password
            - name: ME_CONFIG_MONGODB_SERVER
              valueFrom:
                configMapKeyRef:
                  name: mongo-config
                  key: database_url

This is what your Mongo Express deployment should look like, the environment variables are contained in a secret and configMap file as I mentioned earlier.

Step 5: Configure services for communication

For internal and external communication, we need to configure services, MongoDB will use ClusterIP because there will be no need for external communication while Mongo Express will use a Nodeport to give external access.

apiVersion: v1
kind: Service
metadata:
  name: mongo-service
spec:
  selector:
    app: mongodb
  ports:
    - protocol: TCP
      port: 27017
      targetPort: 27017
  type: ClusterIP


---
apiVersion: v1
kind: Service
metadata:
  name: mongoexpress-service
spec:
  selector:
    app: mongoexpress
  ports:
    - protocol: TCP
      port: 8081
      targetPort: 8081
      nodePort: 30000
  type: NodePort

Run kubectl get svc to check your services, it should look like the image below.

Step 6: Minikube Service

Next, we will use Minikube to expose the Mongo Express service, making it accessible from the browser. This will enable us to interact with MongoDB through the UI. Run the command:
minikube service minikube service mongoexpress-service

You can now access Mongo Express from the given IP address. The web page should look like this:

To test this, create databases and collections from the Mongo Express UI. Restart the pods, and you will still have the data intact. This demonstrates how Persistent Volumes work.

Common Issues & Troubleshooting

PVC stuck in "Pending" state?
- Run kubectl describe pvc mongodb-pvc to check events.
- Ensure that a matching PV is available.
- Check if the storageClassName matches between the PV and PVC.
Pod cannot write to storage?
- Ensure the permissions on the host path are correct.
Pod stuck in "ContainerCreating"?
- Verify that the Persistent Volume is properly mounted.
- Run kubectl describe pod <pod-name> for detailed logs.
PV and PVC not bound?
- Check kubectl get pv and kubectl get pvc for status.
Data loss after pod restart?
- Ensure that the volume is mounted correctly.
- Check the reclaim policy (Retain, Delete, Recycle).

Best Practices for Using Persistent Volumes

Use Storage Classes for dynamic provisioning when working with cloud storage providers.
Choose the right access mode (ReadWriteOnce, ReadWriteMany) based on the application's needs.
Set the appropriate reclaim policy (Retain, Delete, Recycle) depending on data retention needs.
Monitor storage usage using Kubernetes metrics and alerts.

Conclusion

Persistent Volumes are essential for running stateful applications in Kubernetes. By using PV, PVC, and Storage Classes, you can ensure that your application data persists beyond pod restarts and rescheduling.

Now that you understand how Persistent Volumes work, try deploying them in your own Kubernetes cluster and see how they enhance data persistence! 🚀