EKS + Longhorn — Dancing into Dynamic Storage

Ashish Gajjar

Introduction

This comprehensive guide walks you through the complete process of deploying and configuring Longhorn as the default storage class in your Amazon Elastic Kubernetes Service (EKS) cluster. Longhorn provides a robust, cloud-native storage solution that addresses the limitations of traditional cloud-based block storage solutions like Amazon EBS.
By following this guide, you will learn how to:

  • Understand why EBS falls short for modern Kubernetes workloads
  • Prepare your EKS cluster for Longhorn
  • Install open-iSCSI prerequisites on every worker node
  • Deploy Longhorn using Helm as the default StorageClass
  • Configure AWS Load Balancer Controller for external access
  • Access the Longhorn management UI securely
  • Automate the entire setup using Atmos + Terraform

What is an EBS Volume?

EBS (Elastic Block Store) is a network-attached storage volume for your EC2 instance — think of it as a hard disk in the cloud. It lives in the same Availability Zone as the instance it serves.


Step 1 — Create Volume
You create an EBS volume with a specific size, type, and Availability Zone.

Create an EBS volume
aws ec2 create-volume \
  --size 20 \
  --volume-type gp3 \
  --availability-zone us-east-1a \
  --region us-east-1

Step 2 — Attach to EC2
Attach the volume to an EC2 instance. The volume and instance must be in the same Availability Zone.

Attach volume to EC2
aws ec2 attach-volume \
  --volume-id vol-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 \
  --device /dev/xvdf

Step 3 — Format the Volume
Format the volume with a file system before use.

  Format with ext4
sudo mkfs -t ext4 /dev/xvdf
# All clean and ready!

Step 4 — Mount the Volume
Mount the volume to a directory so your application can use it.

Mount volume to /data
sudo mount /dev/xvdf /data
# Make mount persistent across reboots
echo '/dev/xvdf /data ext4 defaults 0 2' | sudo tee -a /etc/fstab

Step 5 — Use It
Your application can now read and write data to the mounted volume.
Step 6 — Data Persists
Even if the EC2 instance stops or restarts, your data is safe in EBS. The volume lifecycle is independent from the instance.
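A quick way to see this from the CLI, reusing the example volume, mount, and instance IDs from the steps above (a sketch; adjust the IDs to your environment):

# Write a file to the mounted volume
echo "hello from ebs" | sudo tee /data/test.txt

# Stop and start the instance from your workstation
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 start-instances --instance-ids i-0123456789abcdef0

# SSH back in: thanks to the fstab entry from Step 4 the volume is remounted
cat /data/test.txt   # hello from ebs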
Step 7 — Snapshot (Backup)
Create snapshots of your EBS volume to back up data to Amazon S3.

Create a snapshot
aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "Daily backup"
# Snapshot saved in Amazon S3

In Simple Words
EBS = Virtual Hard Drive
EBS is like a virtual hard drive that you attach to your EC2. You create it, attach it, mount it, use it, and keep your data safe. EBS and EC2 are Best Buddies — as long as they stay in the same Availability Zone!

EBS Limitations — Why Longhorn is Needed

  • Cannot move across AZ — if the node is in another AZ, EBS cannot follow. The pod gets stuck and manual intervention is needed.
  • Single node access only — not designed for multi-node ReadWriteMany; only one EC2 instance at a time.
  • No auto failover — if the node fails, manual intervention is required to recover and reattach the volume.
  • Not ideal for Kubernetes HA — EBS works, but it was not built for dynamic pod scheduling across multiple nodes and zones.
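The first limitation is easy to demonstrate: attaching a volume to an instance running in a different Availability Zone is rejected outright. A sketch using the example volume from above and a hypothetical instance in another zone:

# Volume was created in us-east-1a, instance runs in us-east-1b
aws ec2 attach-volume \
  --volume-id vol-0123456789abcdef0 \
  --instance-id i-0fedcba9876543210 \
  --device /dev/xvdf
# The call fails with a zone-mismatch error:
# the volume and the instance must be in the same Availability Zone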

The Problem: Why EBS is Not Enough

When deploying stateful applications on Amazon EKS, most teams start with the AWS EBS CSI Driver — the default storage provider. While EBS works well for simple use cases, it has fundamental architectural limitations that become painful at scale.
EBS Limitations

  • Bound to a single Availability Zone — volumes cannot cross AZ boundaries
  • If a pod migrates to a node in a different AZ during failure or rebalancing, the EBS volume cannot follow it
  • Requires complex strategies: volume snapshots, restoration, or restricting pod scheduling to specific AZs
  • Read-Write-Once (RWO) access only — one pod can mount at a time
  • Tied to AWS infrastructure — no portability to on-prem or other clouds
  • Single-instance attachment creates a potential I/O bottleneck

Real-World Impact
These limitations create real operational pain:
  • High availability gaps during node failure — volumes cannot automatically migrate
  • Manual failover processes that introduce downtime
  • Cannot rebalance storage across Availability Zones
  • No built-in cross-cluster disaster recovery
  • Pod scheduling constraints that limit cluster flexibility

What is Longhorn?

Longhorn is a lightweight, reliable, and feature-rich distributed block storage system designed specifically for Kubernetes environments. Originally developed by Rancher Labs and now maintained as a CNCF Incubating project, Longhorn has become a production-ready solution trusted by organizations worldwide.

Architecture and Design Philosophy

Longhorn operates as a cloud-native storage orchestrator that runs entirely within your Kubernetes cluster. Unlike traditional storage solutions that require dedicated hardware or cloud provider-specific integrations, Longhorn leverages the local storage available on your Kubernetes nodes and manages it as a unified storage pool.
The system uses a microservices architecture where each volume is managed by its own controller, ensuring isolation and resilience. Storage replicas are distributed across multiple nodes, providing data redundancy and high availability without requiring external storage systems.
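Once Longhorn is installed (Step 4 below), this design is visible directly in the cluster: every volume, its dedicated engine, and each replica exist as custom resources. A quick look, assuming the default longhorn-system namespace:

# One engine per volume, one replica object per data copy
kubectl get volumes.longhorn.io -n longhorn-system
kubectl get engines.longhorn.io -n longhorn-system
kubectl get replicas.longhorn.io -n longhorn-system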

Key Capabilities

  • Persistent Storage for stateful applications — databases, message queues, and more
  • Cloud-Agnostic Storage — works consistently across any Kubernetes environment
  • Multi-Node Replication — automatically replicates across nodes and AZs
  • External Backup Integration — snapshots to S3, NFS, Azure Blob, or MinIO
  • Disaster Recovery — cross-cluster DR volumes for business continuity
  • Automated Snapshot and Backup Scheduling — recurring cron-based jobs
  • Non-Disruptive Upgrades — upgrade components without disrupting running PVs
  • RWO + RWX Access Modes — both ReadWriteOnce and ReadWriteMany supported

Longhorn vs EBS CSI Driver

Understanding the differences between Longhorn and the AWS EBS CSI Driver is crucial for making informed infrastructure choices. Here is a direct comparison:
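Each row below is drawn from the limitations and capabilities described in this guide:

| | AWS EBS CSI Driver | Longhorn |
|---|---|---|
| Zone scope | Volume bound to one AZ | Replicas spread across nodes and AZs |
| Access modes | RWO only | RWO and RWX |
| Node failure | Manual detach / reattach | Pod reschedules, data served from another replica |
| Backups | EBS snapshots to S3 | Built-in snapshots, backups to S3 / NFS / Azure Blob / MinIO |
| Portability | AWS only | Any Kubernetes cluster, cloud or on-prem |
| Data location | EBS service | Local disks on the worker nodes |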

EKS Cluster Requirements
Minimum Node Count
Longhorn uses a replica-based architecture to ensure data durability. The default configuration creates three replicas for each volume, distributing them across different nodes.
Critical Requirement
Deploy at least 3 worker nodes in your EKS cluster. With fewer than 3 nodes, Longhorn cannot maintain its default 3-replica configuration, which compromises data redundancy.
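A quick sanity check before installing:

# Should report 3 or more Ready worker nodes
kubectl get nodes
kubectl get nodes --no-headers | grep -cw Ready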

Recommended Instance Types
For production deployments, choose EC2 instance types with NVMe SSD instance store volumes (the 'd' suffix). These provide high IOPS and low latency that Longhorn can directly utilize:

  • Compute Optimized: c5d, c5ad, c6gd — ideal for CPU-intensive workloads
  • Memory Optimized: r5d, r5ad, r6gd — for memory-intensive applications
  • General Purpose: m5d, m5ad, m6gd — balanced compute, memory and storage
  • Storage Optimized: i3en, z1d — maximum NVMe throughput

For development/testing, t3.xlarge (4 vCPU, 16 GB RAM) is acceptable. Never use t3.medium — it lacks the CPU for Longhorn engine processes.

Important: Do NOT install the EBS CSI Driver alongside Longhorn.

Step-by-Step Installation Guide

Step 1 — Create EKS Cluster

 Create key pair first
aws ec2 create-key-pair \
  --key-name ashish-key \
  --region us-east-1 \
  --query 'KeyMaterial' \
  --output text > ashish-key.pem
chmod 400 ashish-key.pem

Full cluster create command (takes 15-20 minutes)

eksctl create cluster \
  --name ashish --version 1.33 --region us-east-1 \
  --nodegroup-name ashish-workers \
  --node-type t3.xlarge --nodes 3 --nodes-min 2 --nodes-max 5 \
  --managed --with-oidc \
  --node-private-networking \
  --ssh-access --ssh-public-key ashish-key

Watch progress

eksctl get cluster --name ashish --region us-east-1

Step 2 — Enable OIDC + Install Addons
Enable OIDC (if not done at cluster creation)

eksctl utils associate-iam-oidc-provider \
  --cluster ashish --region us-east-1 --approve

Verify OIDC

aws eks describe-cluster --name ashish \
  --query "cluster.identity.oidc.issuer" --output text

Install all EKS addons


# aws-ebs-csi-driver is intentionally skipped: Longhorn will be the storage layer
eksctl create addon --name vpc-cni --cluster ashish --region us-east-1 --force
eksctl create addon --name coredns --cluster ashish --region us-east-1 --force
eksctl create addon --name kube-proxy --cluster ashish --region us-east-1 --force


Verify all addons are ACTIVE
eksctl get addon --cluster ashish --region us-east-1

Step 3 — Install Longhorn Prerequisites
Longhorn has specific system-level dependencies that must be installed on each Kubernetes worker node.
Install open-iSCSI on every node (SSH in first)

# Install the iSCSI initiator package
sudo dnf install -y iscsi-initiator-utils

# Enable the iscsid service to start on boot
sudo systemctl enable iscsid

# Start the iscsid service immediately
sudo systemctl start iscsid

# Verify
sudo systemctl status iscsid

Install open-iSCSI via DaemonSet (alternative)

kubectl apply -f \
  https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/deploy/prerequisite/longhorn-iscsi-installation.yaml

# Verify iscsi running on all nodes
kubectl get pods -n longhorn-system | grep iscsi

Install NFS client (for RWX volumes)

kubectl apply -f \
  https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/deploy/prerequisite/longhorn-nfs-installation.yaml

# Verify NFS running
kubectl get pods -n longhorn-system | grep nfs

Run Preflight Check — do not skip this!

curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/scripts/environment_check.sh | bash

# All nodes must show: [  OK  ]
# Fix any FAIL items before proceeding
kubectl get pods -n longhorn-system   # should all be Running

Step 4 — Install Longhorn via Helm

Create custom-values.yaml
cat > custom-values.yaml <<EOF
preUpgradeChecker:
  jobEnabled: false
EOF

# preUpgradeChecker.jobEnabled: false
# Disables the pre-upgrade checker on first install.
# Re-enable for future upgrades.
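When you upgrade later, the checker can simply be switched back on, for example (a sketch; other chart values stay as they are):

helm upgrade longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --reuse-values \
  --set preUpgradeChecker.jobEnabled=true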

Add Helm repo and install Longhorn
helm repo add longhorn https://charts.longhorn.io
helm repo update

Install Longhorn

helm install longhorn \
  --repo https://charts.longhorn.io \
  longhorn \
  --namespace longhorn-system \
  --create-namespace \
  -f custom-values.yaml
# Watch pods come up (takes 2-3 mins)
kubectl get pods -n longhorn-system --watch

The installation deploys: DaemonSets (node agents), Deployments (Longhorn manager + UI), Services, and CustomResourceDefinitions (CRDs).
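To confirm those pieces landed, a few quick checks (assuming the default longhorn-system namespace):

# CRDs registered by Longhorn
kubectl get crd | grep longhorn.io

# DaemonSets, Deployments and Services created by the chart
kubectl get daemonset,deployment,svc -n longhorn-system

# The StorageClass created by the chart
kubectl get storageclass longhorn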
Step 5 — StorageClass, PV and PVC
Set Longhorn as default StorageClass

# Check existing storage classes
kubectl get storageclasses

# Remove default from gp2
kubectl patch storageclass gp2 \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

# Set Longhorn as default
kubectl patch storageclass longhorn \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

# Verify — longhorn should show (default)
kubectl get storageclass


Storage Flow: PVC (asks for storage) → StorageClass (decides how to create it) → PV (actual storage provided)
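To watch that flow end to end, a throwaway PVC is enough. The name demo-pvc is only an example, and no storageClassName is needed because longhorn is now the default:

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
EOF

# The PVC goes Bound and a Longhorn-backed PV appears automatically
kubectl get pvc demo-pvc
kubectl get pv

# Clean up the test claim
kubectl delete pvc demo-pvc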
EBS StorageClass vs Longhorn StorageClass — Side by Side
EBS StorageClass (gp3)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer

EBS Limitations:

  • Pod stuck if node fails in different AZ — EBS cannot reattach across zones
  • WaitForFirstConsumer means pod must schedule before volume is created
  • One EC2 instance only — no multi-node access (RWO only)
  • Every volume = a separate EBS bill + snapshot storage cost

Longhorn StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
reclaimPolicy: Delete

Longhorn Advantages:

  • Pod restarts on any node in any AZ — Longhorn serves data from nearest replica
  • 3 replicas = no single point of failure — cluster continues if one node dies
  • Built-in UI, snapshot, backup to S3 — no extra AWS services needed
  • Same YAML works on AWS, GCP, on-prem — no cloud-specific provisioner
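One thing EBS simply cannot offer: the same longhorn StorageClass also serves ReadWriteMany volumes (backed by the NFS components installed in Step 3), so pods on different nodes can share one volume. A minimal sketch, with an example claim name:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data          # example name
spec:
  accessModes:
    - ReadWriteMany          # multiple pods across multiple nodes
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi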

Step 6 — UI & Security Setup
Open Longhorn Dashboard
Method 1: Port Forward (quickest)

  Port-forward to local machine
kubectl port-forward \
  -n longhorn-system \
  svc/longhorn-frontend 8080:80
# Open browser at:
# http://localhost:8080

Method 2: LoadBalancer (team access)
Create an AWS Network Load Balancer for the UI. This relies on the AWS Load Balancer Controller being installed in the cluster, since the annotations below are handled by that controller.

kubectl get svc longhorn-frontend -n longhorn-system

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: longhorn-frontend-lb
  namespace: longhorn-system
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "instance"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: longhorn-ui
  ports:
    - port: 80
      targetPort: 8000
EOF

kubectl get svc longhorn-frontend-lb -n longhorn-system --watch


Method 3: NodePort
Patch to NodePort

kubectl patch svc longhorn-frontend \
  -n longhorn-system \
  -p '{"spec":{"type":"NodePort"}}'

kubectl get svc longhorn-frontend -n longhorn-system
# Access via: http://<NODE-IP>:<NODE-PORT>

Debug if Pods Not Running
Troubleshooting commands

# Check all Longhorn pods are Running
kubectl get pods -n longhorn-system

# If any pod is not Running
kubectl describe pod <pod-name> -n longhorn-system
kubectl logs <pod-name> -n longhorn-system

# Check node health in Longhorn
kubectl get nodes.longhorn.io -n longhorn-system

Traefik Gateway — SSO with HTTPRoute
Install Gateway API CRDs + Traefik

# Traefik CRDs (includes the Middleware CRD used below)
kubectl apply -f https://raw.githubusercontent.com/traefik/traefik/v3.0/docs/content/reference/dynamic-configuration/kubernetes-crd-definition-v1.yml

# Gateway API CRDs (needed for HTTPRoute)
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/standard-install.yaml

# Install Traefik with Gateway API enabled
helm repo add traefik https://traefik.github.io/charts
helm install traefik traefik/traefik \
  --namespace traefik \
  --create-namespace \
  --set providers.kubernetesGateway.enabled=true \
  --set gateway.enabled=true

traefik-middlewares.yaml

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: forward-auth-delegate
  namespace: longhorn-system
spec:
  chain:
    middlewares:
      - name: forward-auth
        namespace: service-foundry

Configuration explanation:

  • apiVersion traefik.io/v1alpha1 — uses the Traefik CRD API version for defining middleware
  • kind Middleware — defines this resource as a Traefik middleware
  • forward-auth-delegate — name referenced in the HTTPRoute (see the sketch after this list)
  • namespace longhorn-system — placed in the same namespace as Longhorn
  • name forward-auth / namespace service-foundry — references the existing auth middleware in the service-foundry namespace
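The HTTPRoute that references this middleware is not shown above; a sketch of what it could look like is below. The hostname and Gateway name are placeholders, and attaching a Traefik Middleware through an ExtensionRef filter depends on your Traefik version supporting it:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: longhorn-ui
  namespace: longhorn-system
spec:
  parentRefs:
    - name: traefik-gateway          # placeholder: the Gateway created by the Traefik chart
      namespace: traefik
  hostnames:
    - longhorn.example.com           # placeholder hostname
  rules:
    - filters:
        - type: ExtensionRef
          extensionRef:
            group: traefik.io
            kind: Middleware
            name: forward-auth-delegate
      backendRefs:
        - name: longhorn-frontend
          port: 80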

Service Foundry Deployment Flow

A real-world deployment of PostgreSQL and Redis using Longhorn storage in the service-foundry namespace:

  1. Create Namespace → service-foundry (isolated environment)
  2. Create Secrets → store PostgreSQL & Redis credentials securely
  3. Create PVCs → PostgreSQL (8Gi) + Redis (8Gi) — uses default Longhorn StorageClass automatically
  4. Deploy PostgreSQL → mounted persistent storage + internal ClusterIP service
  5. Deploy Redis → ConfigMap + Secret + mounted persistent storage + internal ClusterIP service
  6. Persistent Storage Flow → PVC → StorageClass (Longhorn) → PV auto-created

Verify PVCs and PVs
kubectl get pvc -A
# NAMESPACE         NAME                          STATUS  STORAGECLASS
# service-foundry   data-postgresql-0             Bound   longhorn
# service-foundry   redis-data-redis-master-0     Bound   longhorn

kubectl get pv
# NAME                   CAPACITY  STATUS  STORAGECLASS
# pvc-4f9baf6b-...       8Gi       Bound   longhorn

UI Dashboard

Screenshots of the Longhorn UI: the main dashboard, node details, volume details, snapshots and backups, and recurring (cron) job schedules.

Conclusion
Longhorn transforms how you think about storage in Kubernetes. By moving from the rigid, zone-bound architecture of EBS to a distributed, software-defined storage layer, you gain true pod mobility, automatic failover, built-in backup, and cloud portability — all from within your cluster.
