Stateful workloads need storage that outlives pods. In Kubernetes, that means Persistent Volumes (PV) and Persistent Volume Claims (PVC) — a PV is the actual storage, a PVC is a pod's request for it. Kubernetes matches them and handles the binding. The interesting question is what backs those PVs.
I started with Longhorn, realized it was too heavy for my cluster, benchmarked alternatives, and switched to OpenEBS. Here's the full story with numbers.
## Longhorn: Good, But Overkill
Longhorn is easy to install and comes with a solid UI, snapshots, backups, and synchronous replication across nodes. I installed it with Helm:
```shell
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --version 1.7.1
```
It worked. But on a 3-node cluster with limited resources, Longhorn consumes around 1.5GB of memory just for its own components — Instance Manager, CSI plugins, Longhorn Manager, and the UI.
The bigger issue: my stateful apps (PostgreSQL, ScyllaDB) already handle their own replication. ScyllaDB replicates across nodes at the application level. PostgreSQL does the same. Adding storage-level replication on top is redundant — double the replication overhead, double the latency, for no benefit.
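To put a number on the overhead: the total physical copies per logical write is the product of the two replication factors. A quick sketch, where both factors of 3 are illustrative assumptions (a typical ScyllaDB keyspace RF and Longhorn's default `numberOfReplicas`):

```shell
# Copies written per logical write when replication is stacked.
# app_rf: application-level replication factor (assumed 3 here).
# storage_rf: Longhorn's numberOfReplicas (default 3).
app_rf=3
storage_rf=3
echo "copies per write: $((app_rf * storage_rf))"
```

Nine copies of every write, when three already gave the durability the application was designed for.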
I set replicas to 1 to avoid redundant replication:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-single-replica
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"
  dataLocality: "best-effort"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```
Even with single replica, the 1.5GB memory overhead remained. For a small cluster where every GB matters, that's hard to justify.
## Benchmarking the Options
Before switching, I ran proper benchmarks using FIO on my actual cluster — 3-node CentOS VMs, the same hardware running everything else.
### FIO Pod
Same pod spec used across all three storage options, just swapping the PVC:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fio-test
spec:
  restartPolicy: Never
  containers:
    - name: fio
      image: ljishen/fio
      command: ["fio"]
      args:
        - --name=pg-test
        - --filename=/data/testfile
        - --size=200M
        - --bs=8k
        - --rw=randrw
        - --rwmixread=70
        - --ioengine=libaio
        - --iodepth=16
        - --runtime=60
        - --numjobs=1
        - --time_based
        - --group_reporting
      resources:
        requests:
          cpu: "1"
          memory: "256Mi"
        limits:
          cpu: "2"
          memory: "512Mi"
      volumeMounts:
        - mountPath: /data
          name: testvol
  volumes:
    - name: testvol
      persistentVolumeClaim:
        claimName: longhorn-pvc # swap for local-pvc or openebs-pvc
```
The FIO config simulates a database-like workload — 8k block size, 70/30 read/write mix, random I/O, 16 queue depth.
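The same profile can also be written as a fio job file, which is handy for running an identical baseline directly on a node, outside Kubernetes. A sketch (the file name and run command are my own, not from the pod spec):

```shell
# Write the pod's fio arguments out as an equivalent job file.
# Run it on a node with: fio pg-test.fio (requires fio installed).
cat > pg-test.fio <<'EOF'
[pg-test]
filename=/data/testfile
size=200M
bs=8k
rw=randrw
rwmixread=70
ioengine=libaio
iodepth=16
runtime=60
numjobs=1
time_based
group_reporting
EOF
```

Running it on the bare node first gives you a ceiling to compare the in-cluster numbers against.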
### Longhorn Setup
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-single-replica
  resources:
    requests:
      storage: 1Gi
```
### Local PV Setup
More manual — create the directory on the node first:
```shell
sudo mkdir -p /mnt/disks/localdisk1
sudo chmod 777 /mnt/disks/localdisk1
```
Then create the PV and PVC manually:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /mnt/disks/localdisk1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - k8s3
  persistentVolumeReclaimPolicy: Delete
  volumeMode: Filesystem
```
When using local PVs, the pod also needs node affinity to land on the right node:
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - k8s3
```
This is the problem with local PVs at scale — every new volume needs manual directory creation, a manually written PV manifest, and node affinity on every pod that uses it. No dynamic provisioning. Painful to manage.
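You can script the toil, but that only underlines the point: every volume still needs its own directory and its own hand-written PV manifest. A minimal sketch of that per-volume ritual (node name and path are illustrative, matching the example above):

```shell
# Generate one local PV manifest per node/directory pair.
# Repeat for every volume on every node -- this is the part that
# a dynamic provisioner does for you.
node=k8s3
dir=/mnt/disks/localdisk1
cat > local-pv-${node}.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-${node}
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: ${dir}
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - ${node}
EOF
```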
### Results
| Metric | Longhorn | Local PV | OpenEBS |
|---|---|---|---|
| Read IOPS | 811 | 7757 | 7401 |
| Read Bandwidth | 6.3 MiB/s | 60.6 MiB/s | 57.8 MiB/s |
| Read Latency (avg) | 14,189 µs | 1,467 µs | 1,539 µs |
| Write IOPS | 346 | 3328 | 3177 |
| Write Bandwidth | 2.7 MiB/s | 26.0 MiB/s | 24.8 MiB/s |
| Write Latency (avg) | 12,913 µs | 1,377 µs | 1,440 µs |
| CPU Usage (sys) | 4.71% | 26.25% | 26.05% |
| Memory Overhead | ~1.5 GB | none | ~180 MB |
| Backend | User-space | Kernel block device | Kernel block device |
Longhorn's numbers are significantly worse — 10x higher latency, ~10x lower IOPS. That's the cost of going through a user-space storage layer for every I/O operation. Local PV and OpenEBS both go through the kernel block device directly, which is why they're close to each other.
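The table is also internally consistent, which is a useful sanity check on any fio run: bandwidth should equal IOPS times the 8 KiB block size, and the read share of total IOPS should track `rwmixread=70`. Checking the Local PV row:

```shell
# Bandwidth check: 7757 read IOPS * 8 KiB = 62056 KiB/s ~= 60.6 MiB/s,
# matching the table (integer math rounds down).
read_iops=7757
bs_kib=8
echo "read bw: $(( read_iops * bs_kib / 1024 )) MiB/s (approx)"

# Mix check: reads as a share of total IOPS should be ~70%.
write_iops=3328
echo "read share: $(( 100 * read_iops / (read_iops + write_iops) ))%"
```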
Local PV wins on raw performance but loses on everything else — no dynamic provisioning, manual node affinity management, manual directory creation on each node. It doesn't scale.
## OpenEBS: The Sweet Spot
OpenEBS with the hostpath provisioner gives performance close to a local PV with actual automation: it handles provisioning, metrics, and lifecycle. Memory overhead is ~180MB for the whole stack — roughly 8x less than Longhorn.
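That "8x" is just the ratio of the two footprints, in round numbers:

```shell
# ~1.5 GiB (Longhorn components) vs ~180 MiB (OpenEBS hostpath stack).
echo "ratio: $(( 1536 / 180 ))x"
```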
k3s has a built-in local-path provisioner that's similar, but it also requires manually creating directories on each node and gives less control over the storage lifecycle. OpenEBS handles that automatically.
Install:
```shell
helm repo add openebs https://openebs.github.io/openebs
helm repo update
helm install openebs --namespace openebs openebs/openebs \
  --set engines.replicated.mayastor.enabled=false \
  --create-namespace
```
`--set engines.replicated.mayastor.enabled=false` disables Mayastor, OpenEBS's replicated storage engine. I don't need it — my apps handle their own replication. Disabling it keeps the footprint small.
Create the base directory once on each node:
```shell
sudo mkdir -p /var/openebs/local
```
Then PVCs just reference the `openebs-hostpath` storage class:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openebs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: openebs-hostpath
  resources:
    requests:
      storage: 1Gi
```
No manual PV creation, no node affinity on pods, no directory management per volume. OpenEBS handles it.
## Current State
Everything stateful on the cluster — PostgreSQL, ScyllaDB, Redis, NATS — uses OpenEBS with openebs-hostpath. Longhorn is gone.