
Welcome to Container Harbour! 🚢 Ep.14

Episode 14: Reserved Berths for Divas — StatefulSets 🎭

The Database That Would NOT Behave Like a Normal Deployment 😤

We were young. We were naive. We thought: "It's just another container. Deploy it as a Deployment. It'll be fine."

We deployed PostgreSQL as a Deployment with 3 replicas.

What we got was three separate PostgreSQL instances, each with their own data, none of them aware the others existed, all happily accepting writes, none of them replicating anything.

Users were writing to replica 1. Reading from replica 2. Seeing different data. Going insane. Filing bug reports. Questioning reality.

It was not fine.

Databases are not stateless web servers. They are DIVAS. They need their own reserved berth, their own storage, their own persistent identity, and they need to start up in the RIGHT ORDER.

StatefulSets give them exactly that. 🎭


The SIPOC of StatefulSets 🗂️

Supplier   Who manages the StatefulSet?   The StatefulSet controller in the Controller Manager
Input      What goes in?                  StatefulSet spec with ordinal naming, volumeClaimTemplates, headless Service
Process    What happens?                  Pods created in order (0, 1, 2...), each gets stable identity + dedicated storage
Output     What comes out?                Stable, individually addressable Pods with persistent storage
Consumer   Who uses it?                   Databases, message queues, distributed systems that need identity

StatefulSet vs Deployment: The Core Difference 🔄

DEPLOYMENT:                          STATEFULSET:

web-app-abc123   (Pod 1)             postgres-0   (Pod 0)
web-app-def456   (Pod 2)             postgres-1   (Pod 1)
web-app-ghi789   (Pod 3)             postgres-2   (Pod 2)

- Random names ✗                     - Ordered names (0,1,2...) ✓
- Random scheduling ✗                - Stable network identity ✓
- Shared or no storage ✗             - Each gets own dedicated PVC ✓
- Start/stop in any order ✗          - Start in order, stop in reverse ✓
- Replaceable, anonymous ✓           - Each Pod has a persistent identity ✓

The StatefulSet guarantees:

  1. Stable Pod names: postgres-0, postgres-1, postgres-2
  2. Stable DNS names: postgres-0.postgres.production.svc.cluster.local
  3. Dedicated storage: postgres-0 always gets data-postgres-0 PVC
  4. Ordered operations: Pods start in order 0→1→2, scale down in reverse 2→1→0
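
Guarantee 4 is tunable, by the way: if your app keeps identity and storage but doesn't care about startup order, the StatefulSet spec's podManagementPolicy field (a real apps/v1 field) relaxes the ordering while keeping guarantees 1-3:

```yaml
spec:
  podManagementPolicy: Parallel   # default is OrderedReady (start 0 -> 1 -> 2)
```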

A Minimal StatefulSet: Three-Node PostgreSQL 🐘

First, the headless Service — required for stable DNS names:

# headless-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres            # This name becomes part of the DNS
  namespace: production
spec:
  clusterIP: None           # "None" = headless! No virtual IP.
  selector:
    app: postgres
  ports:
  - port: 5432
    name: postgres

Now the StatefulSet:

# postgres-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  serviceName: postgres       # Must match the headless Service name!
  replicas: 3
  selector:
    matchLabels:
      app: postgres

  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        ports:
        - containerPort: 5432
          name: postgres

        env:
        - name: POSTGRES_DB
          value: "harbour_db"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name    # Inject "postgres-0", "postgres-1", etc.
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secrets
              key: POSTGRES_PASSWORD

        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data

  # THIS is the magic: each Pod gets its OWN PVC!
  volumeClaimTemplates:
  - metadata:
      name: data                 # Creates PVCs named: data-postgres-0, data-postgres-1, data-postgres-2
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: managed-premium
      resources:
        requests:
          storage: 20Gi
kubectl apply -f headless-service.yaml
kubectl apply -f postgres-statefulset.yaml

# Watch Pods start IN ORDER:
kubectl get pods -l app=postgres --watch
# NAME         READY   STATUS    RESTARTS
# postgres-0   0/1     Pending   0          <- 0 first
# postgres-0   1/1     Running   0          <- 0 ready before 1 starts
# postgres-1   0/1     Pending   0          <- 1 starts after 0 is ready
# postgres-1   1/1     Running   0
# postgres-2   0/1     Pending   0          <- 2 starts after 1 is ready
# postgres-2   1/1     Running   0

# See the individual PVCs created for each Pod:
kubectl get pvc -n production
# NAME              STATUS   VOLUME                  CAPACITY
# data-postgres-0   Bound    pvc-abc123...           20Gi    <- postgres-0's private storage
# data-postgres-1   Bound    pvc-def456...           20Gi    <- postgres-1's private storage
# data-postgres-2   Bound    pvc-ghi789...           20Gi    <- postgres-2's private storage

Stable DNS: Addressing Individual Pods 📞

With a headless Service and a StatefulSet, EACH Pod gets a stable DNS name:

Pattern: <pod-name>.<service-name>.<namespace>.svc.cluster.local

postgres-0.postgres.production.svc.cluster.local   -> postgres-0's IP (ALWAYS)
postgres-1.postgres.production.svc.cluster.local   -> postgres-1's IP (ALWAYS)
postgres-2.postgres.production.svc.cluster.local   -> postgres-2's IP (ALWAYS)
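Because the pattern is purely mechanical, an app's config can derive every replica's address from nothing but the replica count. A quick sketch (plain shell, no cluster required — names match the postgres example above):

```shell
# Derive the stable DNS name of each replica of a 3-node StatefulSet
# named "postgres" behind a headless Service "postgres" in "production".
service=postgres
namespace=production
for ordinal in 0 1 2; do
  echo "postgres-${ordinal}.${service}.${namespace}.svc.cluster.local"
done
```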
# From inside the cluster, prove it works:
kubectl run dns-test --image=busybox --rm -it --restart=Never -- sh

# Inside the pod:
nslookup postgres-0.postgres.production.svc.cluster.local
# Address: 10.244.1.5   <- Always this Pod's IP, regardless of restarts

nslookup postgres.production.svc.cluster.local
# Address: 10.244.1.5   <- All Pod IPs returned
# Address: 10.244.2.3
# Address: 10.244.3.7

If postgres-1 is killed and recreated, it comes back as postgres-1 at the SAME DNS name with the SAME storage. The identity survives the death of the Pod. ♻️


Scaling StatefulSets: Ordered and Careful 📈

# Scale up (adds postgres-3 after postgres-2 is healthy)
kubectl scale statefulset postgres --replicas=4

# Scale DOWN (removes postgres-3 FIRST, then postgres-2, etc.)
kubectl scale statefulset postgres --replicas=2

# WHY ordered scale-down? Predictability! The highest ordinals are the
# newest replicas and least likely to be the primary (usually ordinal 0),
# so removing them first is safest.

# Watch ordered scale-down:
kubectl get pods -l app=postgres --watch
# postgres-0   1/1   Running    # stays
# postgres-1   1/1   Running    # stays
# postgres-2   1/1   Running    # last in, first out
# postgres-2   1/1   Terminating
# (postgres-2 gone)
# postgres-1   1/1   Running    # still here (we only went to 2 replicas)

Update Strategies for StatefulSets 🔄

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2    # Only update Pods with ordinal >= 2 (canary strategy!)
# Canary update: only update the highest ordinal Pods first
# Set partition=2: only postgres-2 gets the new version
kubectl patch statefulset postgres -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":2}}}}'
kubectl set image statefulset/postgres postgres=postgres:16

# postgres-2 gets postgres:16. postgres-0 and postgres-1 stay on postgres:15.
# Test on postgres-2. Happy? Lower the partition.
kubectl patch statefulset postgres -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":0}}}}'
# Now all Pods update in reverse order: 1, then 0.
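The partition rule itself is just an ordinal comparison: Pods with ordinal >= partition get the new revision, everything below keeps the old one. A plain-shell sketch of that decision for the example above:

```shell
# Which Pods does partition=2 touch in a 3-replica StatefulSet?
partition=2
for ordinal in 2 1 0; do              # updates proceed from highest ordinal down
  if [ "$ordinal" -ge "$partition" ]; then
    echo "postgres-$ordinal: updated to the new revision"
  else
    echo "postgres-$ordinal: left on the old revision"
  fi
done
```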

StatefulSet Lifecycle: What Happens When a Pod Dies 💀

# Delete postgres-1 (simulate crash)
kubectl delete pod postgres-1

# Watch what happens:
kubectl get pods -l app=postgres --watch
# postgres-1   1/1   Terminating
# postgres-1   0/1   Pending       <- NEW postgres-1 starting (same name!)
# postgres-1   1/1   Running       <- Back, same identity

# The new postgres-1:
# - Has the SAME name: postgres-1
# - Has the SAME DNS: postgres-1.postgres.production.svc.cluster.local
# - Has the SAME storage: data-postgres-1 PVC
# - May land on a DIFFERENT node (the scheduler chooses)

# What did NOT survive:
# - In-memory state (obviously)
# - Anything not in /var/lib/postgresql/data
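By default those PVCs also outlive the StatefulSet itself. On newer clusters (the field reached beta around Kubernetes 1.27), the real persistentVolumeClaimRetentionPolicy field lets you opt into automatic cleanup instead:

```yaml
# Optional (Kubernetes 1.27+): control what happens to the per-Pod PVCs.
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # keep PVCs when the StatefulSet is deleted (default)
    whenScaled: Delete    # delete the PVC of a Pod removed by scale-down
```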

Real World: Cassandra Cluster with StatefulSet 🏗️

StatefulSets shine with distributed databases like Cassandra, where each node needs to know the identity of the seed nodes:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
      - name: cassandra
        image: cassandra:4.1
        ports:
        - containerPort: 9042
          name: cql
        - containerPort: 7000
          name: intra-node
        env:
        - name: CASSANDRA_SEEDS
          # Seeds are the stable DNS names of the first two nodes
          value: "cassandra-0.cassandra.default.svc.cluster.local,cassandra-1.cassandra.default.svc.cluster.local"
        - name: CASSANDRA_CLUSTER_NAME
          value: "HarbourCluster"
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        volumeMounts:
        - name: cassandra-data
          mountPath: /var/lib/cassandra/data
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: managed-premium
      resources:
        requests:
          storage: 50Gi
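The spec above references serviceName: cassandra, so it needs a matching headless Service too (not shown in the snippet — same shape as the postgres one, sketched here for completeness):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cassandra           # must match serviceName in the StatefulSet
spec:
  clusterIP: None           # headless: per-Pod DNS, no virtual IP
  selector:
    app: cassandra
  ports:
  - port: 9042
    name: cql
```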

Cassandra nodes know each other by stable DNS names. When node cassandra-0 dies and comes back, it rejoins the cluster at the same address. The cluster gossip protocol reconnects it. No manual intervention. 🎯


When to Use StatefulSets vs Deployments 🤔

Use DEPLOYMENT for:                  Use STATEFULSET for:
- Web servers                        - Relational databases (PostgreSQL, MySQL)
- API services                       - NoSQL databases (MongoDB, Cassandra, Redis)
- Stateless microservices            - Message queues (Kafka, RabbitMQ)
- Background workers (no storage)    - Distributed coordination (ZooKeeper, etcd)
- Batch processors                   - Any app needing stable network identity
- Anything that doesn't need         - Any app needing per-instance storage
  individual identity

When in doubt: if your app keeps data that matters, and it talks to other instances of itself — use StatefulSet. If it's just serving HTTP and doesn't care where it runs — use Deployment. 🎯


The Harbourmaster's Log — Entry 14 📋

The database team wanted to run a 3-node PostgreSQL cluster. "Just use a Deployment," someone said.

I did not say that. I knew better by now.

We deployed it as a StatefulSet. postgres-0 started first and became the primary. postgres-1 and postgres-2 started in order and became replicas, connecting to postgres-0 via its stable DNS name.

Six weeks later, postgres-1's node failed. The Pod was evicted. A new postgres-1 started on a different node. It reconnected to its persistent volume. It rejoined the replication stream. It caught up. It was ready in four minutes.

The database team didn't notice. The monitoring team sent a "Resolved" notification.

The users noticed nothing.

Databases as divas: confirmed. Worth the reserved berth: absolutely. 🎩


Your Mission 🎯

  1. Deploy a 3-node StatefulSet using nginx (for simplicity), adding this volumeClaimTemplates section:
volumeClaimTemplates:
- metadata:
    name: webroot
  spec:
    accessModes: ["ReadWriteOnce"]
    resources:
      requests:
        storage: 1Gi
  2. Exec into each Pod and write a unique file to /usr/share/nginx/html/:
kubectl exec -it nginx-0 -- sh -c 'echo "I am nginx-0" > /usr/share/nginx/html/id.txt'
kubectl exec -it nginx-1 -- sh -c 'echo "I am nginx-1" > /usr/share/nginx/html/id.txt'

  3. Delete nginx-0

  4. Wait for it to come back

  5. Exec into the new nginx-0 and read id.txt — the file should still be there!

  6. Bonus: Verify that deleting nginx-0 does NOT delete the PVC webroot-nginx-0. The data survives.


Next Time 🎬

Episode 15: Leaving Harbour Gracefully — Rolling Updates and Zero Downtime Deployments. The grand finale. Ships don't all leave at once. Neither do your Pods. 🌅


🎯 Key Takeaways:

  • StatefulSets give Pods stable identities: stable names, stable DNS, stable storage.
  • Each Pod gets its own PVC via volumeClaimTemplates. Shared storage is NOT what you want.
  • Pods start in order (0→1→2) and stop in reverse (2→1→0). This is intentional.
  • A headless Service (clusterIP: None) is required to enable per-Pod DNS names.
  • pod-name.service-name.namespace.svc.cluster.local = the stable DNS pattern.
  • Deleting a Pod from a StatefulSet creates a replacement with the same name, same storage.
  • partition in updateStrategy enables canary rollouts for StatefulSets.
  • Use StatefulSets for: databases, message queues, distributed coordination systems. Nothing else. 🗄️
