We continue our "Kubernetes in a Nutshell" journey and this part will cover Kubernetes Volumes! You will learn about:
- Overview of Volumes and why they are needed
- How to use a Volume
- A hands-on example to help explore Volumes practically
The code is available on GitHub
Happy to get your feedback via Twitter or just drop a comment!
Pre-requisites:
You are going to need minikube and kubectl.
Install minikube as a single-node Kubernetes cluster in a virtual machine on your computer. On a Mac, you can simply:
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-amd64 \
&& chmod +x minikube
sudo mv minikube /usr/local/bin
Install kubectl to interact with your Kubernetes cluster. On a Mac, you can simply:
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/darwin/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
Overview
Data stored in Docker containers is ephemeral, i.e. it only exists as long as the container is alive. Kubernetes can restart a failed or crashed container (in the same Pod), but you will still end up losing any data which you might have stored in the container filesystem. Kubernetes solves this problem with the help of Volumes. It supports many types of Volumes, including external cloud storage (e.g. Azure Disk, Amazon EBS, GCE Persistent Disk etc.), networked file systems such as Ceph, GlusterFS etc., and other options like emptyDir, hostPath, local, downwardAPI, secret, configMap etc.
How are Volumes used?
Using a Volume is relatively straightforward - look at this partial Pod spec as an example
```yaml
spec:
  containers:
  - name: kvstore
    image: abhirockzz/kvstore:latest
    volumeMounts:
    - mountPath: /data
      name: data-volume
    ports:
    - containerPort: 8080
  volumes:
  - name: data-volume
    emptyDir: {}
```
Notice the following:
- `spec.volumes` - declares the available volume(s), their `name` (e.g. `data-volume`) and other volume-specific characteristics - in this case, an `emptyDir` volume
- `spec.containers.volumeMounts` - points to a volume declared in `spec.volumes` (e.g. `data-volume`) and specifies exactly where to mount that volume within the container file system (e.g. `/data`)
A Pod can have more than one Volume declared in `spec.volumes`. Each of these Volumes is accessible to all containers in the Pod, but it's not mandatory for every container to mount or make use of all of them. If needed, a container within the Pod can mount more than one volume into different paths in its file system. Also, different containers can mount the same volume at the same time.
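For instance, here is a sketch of a Pod with two containers sharing a single `emptyDir` volume, each mounting it at a different path (the container names and images below are made up for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-pod
spec:
  volumes:
  - name: shared-data
    emptyDir: {}
  containers:
  - name: writer
    image: my-writer-image    # hypothetical image
    volumeMounts:
    - mountPath: /producer/out
      name: shared-data
  - name: reader
    image: my-reader-image    # hypothetical image
    volumeMounts:
    - mountPath: /consumer/in
      name: shared-data
```

Anything the `writer` container puts in `/producer/out` shows up in the `reader` container's `/consumer/in` - a common pattern for sidecar containers.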
Another way of categorizing Volumes
I like to divide them as:
- Ephemeral - Volumes which are tightly coupled with the `Pod` lifetime (e.g. an `emptyDir` volume), i.e. they are deleted if the `Pod` is removed (for any reason).
- Persistent - Volumes which are meant for long-term storage and are independent of the `Pod` or `Node` lifecycle. This could be NFS or cloud-based storage in the case of managed Kubernetes offerings such as Azure Kubernetes Service, Google Kubernetes Engine etc.
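As a sketch of what the persistent flavor can look like in a Pod spec, a volume can reference a PersistentVolumeClaim instead of an `emptyDir` (the claim name below is hypothetical; persistent volumes and claims will come up again later in the series):

```yaml
spec:
  volumes:
  - name: data-volume
    persistentVolumeClaim:
      claimName: my-data-claim    # hypothetical claim name
  containers:
  - name: kvstore
    image: abhirockzz/kvstore:latest
    volumeMounts:
    - mountPath: /data
      name: data-volume
```

Note that the container mounts it exactly the same way - only the `spec.volumes` entry changes.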
Let's look at emptyDir as an example
emptyDir volume in action
An emptyDir volume starts out empty (hence the name!) and is ephemeral in nature i.e. exists only as long as the Pod is alive. Once the Pod is deleted, so is the emptyDir data. It is quite useful in some scenarios/requirements such as a temporary cache, shared storage for multiple containers in a Pod etc.
To run this example, we will use a naive, over-simplified key-value store that exposes REST APIs for
- adding key value pairs
- reading the value for a key
Here is the code if you're interested
Initial deployment
Start minikube if it's not already running
minikube start
Deploy the kvstore application. This will simply create a Deployment with one instance (Pod) of the application along with a NodePort service
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kubernetes-in-a-nutshell/master/volumes-1/kvstore.yaml
To keep things simple, the YAML file is being referenced directly from the GitHub repo, but you can also download the file to your local machine and use it in the same way.
Confirm they have been created
kubectl get deployments kvstore
NAME READY UP-TO-DATE AVAILABLE AGE
kvstore 1/1 1 1 28s
kubectl get pods -l app=kvstore
NAME READY STATUS RESTARTS AGE
kvstore-6c94877886-gzq25 1/1 Running 0 40s
It's ok if you do not know what a `NodePort` service is - it will be covered in a subsequent blog post. For the time being, just understand that it is a way to access our app (a REST endpoint in this case)
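For reference, a `NodePort` service definition generally looks like the sketch below (not necessarily identical to the one in kvstore.yaml):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kvstore-service
spec:
  type: NodePort
  selector:
    app: kvstore        # routes traffic to Pods with this label
  ports:
  - port: 8080          # service port inside the cluster
    targetPort: 8080    # container port exposed by the app
    # nodePort is auto-assigned from the 30000-32767 range unless specified
```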
Check the value of the random port generated by the NodePort service. You might see a result similar to this (with different IPs and ports)
kubectl get service kvstore-service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kvstore-service NodePort 10.106.144.48 <none> 8080:32598/TCP 5m
Check the PORT(S) column to find out the random port e.g. it is 32598 in this case (8080 is the internal port within the container exposed by our app - ignore it)
Now, you just need the IP of your minikube node using minikube ip
This might return something like `192.168.99.100` if you're using a VirtualBox VM
In the commands that follow, replace `[host]` with the minikube VM IP and `[port]` with the random port value
Create a couple of new key-value pair entries
curl http://[host]:[port]/save -d 'foo=bar'
curl http://[host]:[port]/save -d 'mac=cheese'
e.g.
curl http://192.168.99.100:32598/save -d 'foo=bar'
curl http://192.168.99.100:32598/save -d 'mac=cheese'
Access the value for key foo
curl http://[host]:[port]/read/foo
You should get the value you had saved for foo - bar. Same applies for mac i.e. you'll get cheese as its value. The program saves the key-value data in /data - let's confirm that by peeking directly into the Docker container inside the Pod
kubectl exec <pod name> -- ls /data/
foo
mac
foo and mac are individual files named after the keys. If we dig in further, we should be able to confirm their respective values as well
To confirm value for the key mac
kubectl exec <pod name> -- cat /data/mac
cheese
As expected, you got cheese as the answer since that's what you had stored earlier. If you try to look up a key which you haven't stored yet, you'll get an error
cat: can't open '/data/moo': No such file or directory
command terminated with exit code 1
Kill the container ;-)
Alright, so far so good! Using a Volume ensures that the data will be preserved across container restarts/crashes. Let's 'cheat' a bit and manually kill the Docker container.
kubectl exec [pod name] -- ps
PID USER TIME COMMAND
1 root 0:00 /kvstore
31 root 0:00 ps
Notice the process ID of the `kvstore` application (it should be `1`)
In a different terminal, set a watch on the Pods
kubectl get pods -l app=kvstore --watch
We kill our app process
kubectl exec [pod name] -- kill 1
You will notice that the Pod will transition through a few phases (like Error etc.) before going back to Running state (re-started by Kubernetes).
NAME READY STATUS RESTARTS AGE
kvstore-6c94877886-gzq25 1/1 Running 0 15m
kvstore-6c94877886-gzq25 0/1 Error 0 15m
kvstore-6c94877886-gzq25 1/1 Running 1 15m
Execute `kubectl exec <pod name> -- ls /data` to confirm that the data did in fact survive despite the container restart.
Delete the Pod!
But the data will not survive beyond the Pod's lifetime. To confirm this, let's delete the Pod manually
kubectl delete pod -l app=kvstore
You should see a confirmation such as below
pod "kvstore-6c94877886-gzq25" deleted
Kubernetes will restart the Pod again. You can confirm the same after a few seconds
kubectl get pods -l app=kvstore
You should see a new `Pod` in `Running` state
Get the Pod name and peek into the directory again
kubectl get pods -l app=kvstore
kubectl exec [pod name] -- ls /data/
As expected, the /data/ directory will be empty!
The need for persistent storage
Simple (ephemeral) Volumes live and die with the Pod - but this is not going to suffice for a majority of applications. In order to be resilient, reliable, available and scalable, Kubernetes applications need to be able to run as multiple instances across Pods and these Pods themselves might be scheduled or placed across different Nodes in your Kubernetes cluster. What we need is a stable, persistent store that outlasts the Pod or even the Node on which the Pod is running.
As mentioned in the beginning of this blog, it's simple to use a Volume - not just temporary ones like the one we just saw, but even long term persistent stores.
Here is a (contrived) example of how to use Azure Disk as a storage medium for your apps deployed to Azure Kubernetes Service.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod
spec:
  volumes:
  - name: logs-volume
    azureDisk:
      kind: Managed
      diskName: myAKSDiskName
      diskURI: myAKSDiskURI
  containers:
  - image: myapp-docker-image
    name: myapp
    volumeMounts:
    - mountPath: /app/logs
      name: logs-volume
```
So that's it? Not quite! 😉 There are limitations to this approach. This and much more will be discussed in the next part of the series - so stay tuned!
I really hope you enjoyed and learned something from this article 😃😃 Please like and follow if you did!



