DEV Community

Aisalkyn Aidarova

Headless Services, StatefulSets, Volumes, PV/PVC, Databases, and Production Architecture

(What Kubernetes really does, what it does NOT do, and what DevOps engineers must own)


PART 1 — THE MOST IMPORTANT QUESTION

“What should live inside a pod, and what should NOT?”

This single question separates junior from senior DevOps engineers.

A pod is:

  • a runtime environment
  • temporary
  • replaceable
  • stateless by default

Kubernetes was built assuming:

Pods WILL die.
Pods WILL move.
Pods WILL be recreated.

So Kubernetes intentionally discourages putting anything precious inside a pod.


PART 2 — WHY “JUST PUT THE DATABASE IN A POD” IS DANGEROUS

The container filesystem is EPHEMERAL

When a pod:

  • restarts
  • is rescheduled
  • is recreated during rollout

👉 its containers start fresh from the image, and anything written to the container filesystem is lost

If your database stores data inside the container filesystem, then:

kubectl delete pod mysql
= DATA LOSS

That is unacceptable in production.

This is why Kubernetes separates compute from storage.


PART 3 — STATELESS vs STATEFUL (FOUNDATIONAL CONCEPT)

Stateless workloads

Examples:

  • Frontend
  • API
  • Nginx
  • Auth services

Properties:

  • Any instance can serve any request
  • No user data stored locally
  • Horizontal scaling is safe

✅ Use:

  • Deployment
  • ClusterIP / Ingress
  • No persistent storage
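
Here is a minimal sketch of that stateless recipe. The name web, the label app: web, the replica count, and the image tag are all assumptions picked for illustration, not values from a real cluster.

```yaml
# Stateless workload: a Deployment with interchangeable replicas and no volumes.
# The name "web", the label app: web, and the image tag are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                # any replica can serve any request
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          ports:
            - containerPort: 80
      # no volumes and no volumeClaimTemplates: nothing here has to survive a restart
```

Because no replica holds unique data, changing the replicas field up or down is safe.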

Stateful workloads

Examples:

  • MySQL
  • PostgreSQL
  • MongoDB
  • Kafka
  • Elasticsearch
  • ZooKeeper

Properties:

  • Data must survive restarts
  • Writes must be consistent
  • Identity matters
  • Replication needs stable peers

❌ Deployment alone is NOT enough
❌ ClusterIP alone is NOT enough


PART 4 — SERVICES: WHAT THEY REALLY DO

ClusterIP Service (default)

  • Creates one virtual IP
  • kube-proxy load balances traffic
  • Pod identity is hidden

App → ClusterIP → random pod

This is perfect for stateless apps.
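
A ClusterIP Service in front of such an app could look like the sketch below. The Service name and the selector app: web are assumptions; they only need to match the labels on your own pods.

```yaml
# ClusterIP is the default Service type, so "type" could also be omitted.
# The name "web" and the selector app: web are assumptions for illustration.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: ClusterIP
  selector:
    app: web          # every Ready pod with this label becomes a backend
  ports:
    - port: 80        # port exposed on the virtual IP
      targetPort: 80  # port the container listens on
```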

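But put several independent database pods behind one ClusterIP and you get:

  • A write goes to pod A
  • The next read goes to pod B, which never saw that write
  • Data looks inconsistent from one request to the next
  • Logins fail intermittently
  • Corruption risk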

PART 5 — HEADLESS SERVICE (WHAT IT ACTUALLY MEANS)

A Headless Service is just a Service with:

clusterIP: None

That one line tells Kubernetes:

“Do NOT give me a virtual IP.
Give clients the real pod endpoints.”
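
In context, that line sits in an otherwise ordinary Service manifest. The sketch below reuses the mysql-headless name from this article; the selector app: mysql is an assumption about how the database pods are labelled.

```yaml
# Headless Service: same shape as a normal Service, but with clusterIP: None.
# The selector app: mysql is an assumption; 3306 is MySQL's default port.
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  clusterIP: None     # no virtual IP; DNS returns the pod IPs directly
  selector:
    app: mysql
  ports:
    - name: mysql
      port: 3306
```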

What changes:

  • No kube-proxy load balancing
  • DNS returns pod IPs directly
  • Identity is exposed

DNS behavior:

nslookup mysql-headless
→ pod-ip-1
→ pod-ip-2
→ pod-ip-3

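This is DNS round-robin: the client receives every pod IP and chooses one itself. It is not service-level load balancing.

⚠️ Important:
Headless does NOT mean traffic can never be spread across pods; clients may still rotate through the returned IPs.
It means Kubernetes stops hiding pods behind a virtual IP.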


PART 6 — WHY HEADLESS ALONE IS NOT ENOUGH

Pod IPs:

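  • change when a pod is deleted and recreated (a container restart inside the same pod keeps the IP)
  • change when a pod is rescheduled to another node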
  • are NOT stable identifiers

So how does a replica always find the primary?

This is why StatefulSet exists.


PART 7 — STATEFULSET (THE DATABASE CONTROLLER)

StatefulSet is designed for workloads that need:

  1. Stable identity
  2. Stable network names
  3. Stable storage

What StatefulSet guarantees:

Feature           Deployment   StatefulSet
Pod name          random       stable
Pod order         none         ordered
Identity          ❌           ✅
Per-pod storage   ❌           ✅

Pods are named:

mysql-0
mysql-1
mysql-2

These names:

  • NEVER change
  • Survive restarts
  • Survive rescheduling
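
A StatefulSet that produces exactly those three names might look like this sketch. It assumes a headless Service called mysql-headless already exists and that a Secret named mysql-secret (hypothetical) holds the root password. Storage is deliberately left out so the focus stays on identity.

```yaml
# StatefulSet sketch that yields pods named mysql-0, mysql-1, mysql-2.
# Assumptions: the headless Service mysql-headless exists, and a Secret
# named mysql-secret (hypothetical) provides the key root-password.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql                  # pod names are <name>-<ordinal>
spec:
  serviceName: mysql-headless  # governing Service used for per-pod DNS
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          ports:
            - containerPort: 3306
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: root-password
```

By default the pods are created in order (0, then 1, then 2) and removed in reverse order.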

PART 8 — STATEFULSET + HEADLESS (THE DATABASE PATTERN)

When you combine:

  • StatefulSet
  • Headless Service

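and point the StatefulSet's serviceName at that headless Service, Kubernetes automatically creates one stable DNS record per pod, in the form <pod-name>.<service-name>.<namespace>.svc.cluster.local: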

mysql-0.mysql-headless.default.svc.cluster.local
mysql-1.mysql-headless.default.svc.cluster.local

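Once replication is configured with mysql-0 as the primary (Kubernetes gives you the names, not the replication), you can: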

  • Write → mysql-0
  • Read → mysql-1 / mysql-2
  • Replicate reliably

This is the canonical Kubernetes database pattern.
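
One way an application could consume those names is through plain configuration. The ConfigMap below is a hypothetical sketch: the ConfigMap name and the key names are made up, and it assumes mysql-0 has been set up as the primary by you or by an operator.

```yaml
# Hypothetical application config that uses the stable per-pod DNS names.
# The ConfigMap name and the key names are assumptions; the hostnames follow
# the pattern for a StatefulSet "mysql" in the namespace "default".
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-db-config
  namespace: default
data:
  DB_WRITE_HOST: mysql-0.mysql-headless.default.svc.cluster.local
  DB_READ_HOSTS: >-
    mysql-1.mysql-headless.default.svc.cluster.local,mysql-2.mysql-headless.default.svc.cluster.local
  DB_PORT: "3306"
```

The hostnames stay valid when a pod is rescheduled, so the application config does not have to change when pod IPs do.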


PART 9 — STORAGE: WHY VOLUMES EXIST

Pods die.
Data must not.

Kubernetes solves this by decoupling storage from pods.


PART 10 — TYPES OF VOLUMES (WHAT A DEVOPS MUST KNOW)

1️⃣ emptyDir

  • Lives as long as the pod lives (it survives container restarts)
  • Deleted when the pod is removed from its node

✅ Use for:

  • cache
  • temp files

❌ NEVER for databases
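
A minimal sketch of emptyDir used the right way, as scratch space. The pod name, image, and mount path are assumptions for illustration.

```yaml
# Pod with an emptyDir used as a cache directory.
# Pod name, image, and mount path are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: cache-demo
spec:
  containers:
    - name: app
      image: nginx:1.27
      volumeMounts:
        - name: cache
          mountPath: /var/cache/nginx   # starts empty for every new pod
  volumes:
    - name: cache
      emptyDir: {}                      # removed together with the pod
```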


2️⃣ hostPath

  • Mounts node filesystem

❌ Dangerous in production
❌ Breaks portability
❌ Ties pod to a node

Used only for:

  • demos
  • debugging
  • very specific system agents
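
For the "system agent" case, hostPath usually shows up inside a DaemonSet, mounted read-only. The sketch below is a placeholder, not a real log shipper: the name, image, and command are assumptions.

```yaml
# Node-level agent sketch: one pod per node reading that node's logs.
# The name, image, and command are placeholders, not a real log shipper.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-log-reader
spec:
  selector:
    matchLabels:
      app: node-log-reader
  template:
    metadata:
      labels:
        app: node-log-reader
    spec:
      containers:
        - name: reader
          image: busybox:1.36
          command: ["sh", "-c", "ls /host/var/log && sleep 3600"]
          volumeMounts:
            - name: varlog
              mountPath: /host/var/log
              readOnly: true          # limit what the pod can do to the node
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
            type: Directory           # fail if the path is missing on the node
```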

3️⃣ Persistent Volumes (PV)

A PV represents real storage:

  • EBS (AWS)
  • PD (GCP)
  • Azure Disk
  • NFS
  • Ceph

A PV is a cluster-scoped resource: it does not belong to any namespace.


4️⃣ Persistent Volume Claim (PVC)

A PVC is:

  • a request for storage
  • namespace-scoped
  • bound to a PV

Pods use PVCs — not PVs directly.

Mental model (MEMORIZE THIS):

Pod → PVC → PV → Physical Disk
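
The same chain as manifests, as a sketch. The claim name, size, and the Secret mysql-secret are assumptions. With no storageClassName set, the cluster's default StorageClass is used if one exists.

```yaml
# Pod → PVC → PV: the pod only names the claim; Kubernetes binds the claim to a PV.
# Names, size, and the Secret are assumptions for illustration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql   # MySQL's data directory
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mysql-data         # the pod references the PVC, never the PV
```

Deleting this pod leaves the claim and its disk in place; a replacement pod that names the same claim sees the same data.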

PART 11 — STORAGECLASS (PRODUCTION ESSENTIAL)

A StorageClass defines:

  • disk type (SSD / HDD)
  • IOPS
  • replication
  • reclaim policy

Dynamic provisioning means:

“Create the disk on demand when a PVC requests it, instead of pre-creating PVs by hand”

Production clusters ALWAYS define StorageClasses.
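
As a sketch, a StorageClass for SSD-backed disks could look like this. It assumes an AWS cluster with the EBS CSI driver installed; the class name and parameter values are illustrative, and other clouds use a different provisioner and different parameters.

```yaml
# StorageClass sketch, assuming an AWS cluster with the EBS CSI driver installed.
# The class name and parameter values are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3                  # SSD-backed volume type
  iops: "3000"
  encrypted: "true"
reclaimPolicy: Retain        # keep the disk when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer  # create the disk once a pod is scheduled
allowVolumeExpansion: true
```

For databases, Retain is the safer reclaim policy, since deleting a claim by mistake does not delete the disk behind it.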


PART 12 — STATEFULSET + PVC (PER-POD DISKS)

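Through volumeClaimTemplates, a StatefulSet creates one PVC per pod, named <template-name>-<pod-name>. With a template named pvc, that gives: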

pvc-mysql-0
pvc-mysql-1
pvc-mysql-2

Each pod gets:

  • its own disk
  • its own data
  • no overlap

This is REQUIRED for:

  • databases
  • Kafka brokers
  • Elasticsearch nodes
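
Per-pod disks come from the volumeClaimTemplates field of the StatefulSet. In the sketch below the template is named pvc so the resulting claims match the names shown above; the size, the Secret mysql-secret, and the use of the default StorageClass are assumptions.

```yaml
# Per-pod disks via volumeClaimTemplates.
# The template is named "pvc" so the claims come out as pvc-mysql-0,
# pvc-mysql-1, pvc-mysql-2. Size and Secret are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: root-password
          volumeMounts:
            - name: pvc
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: pvc              # PVC name = <template name>-<pod name>
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

By default these claims are kept when the StatefulSet is scaled down or deleted, so a pod that comes back with the same ordinal reattaches to its old disk.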

PART 13 — SHOULD DATABASES RUN IN KUBERNETES?

The senior DevOps answer:

👉 It depends on maturity and risk

When YES:

  • Strong SRE team
  • Backup automation
  • Monitoring in place
  • Storage tuned
  • Operators (MySQL Operator, etc.)

When NO:

  • Small team
  • No DB expertise
  • High SLA requirements

Many companies:

  • Use RDS in production
  • Use StatefulSet DBs in dev/test

This is a business decision, not a technical limitation.


PART 14 — PRODUCTION CLUSTER ARCHITECTURE (BIG PICTURE)

User
 ↓
Load Balancer / Ingress
 ↓
Stateless App Pods (Deployment)
 ↓
Stateful DB (StatefulSet + Headless)
 ↓
Persistent Storage (PVC → PV)
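
The top of that picture, the step from Ingress to the stateless app, can be sketched as follows. The host, the names, and the ingress class are assumptions, and an ingress controller must already be running in the cluster.

```yaml
# Entry point of the architecture: Ingress → Service → stateless pods.
# Host, names, and ingress class are assumptions for illustration.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web        # ClusterIP Service in front of the Deployment
                port:
                  number: 80
```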


PART 15 — WHAT GOES INSIDE A POD (AND WHAT DOES NOT)

Put INSIDE the pod:

  • Application code
  • Runtime dependencies
  • ConfigMaps
  • Secrets (mounted as files)
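
As a sketch, a pod that carries only code, configuration, and credentials could look like this. The ConfigMap app-config, the Secret app-secret, the image, and the mount paths are all hypothetical.

```yaml
# Pod that gets its configuration and credentials mounted as files.
# The ConfigMap app-config, the Secret app-secret, and the image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.0   # application code + runtime dependencies
      volumeMounts:
        - name: config
          mountPath: /etc/app/config
          readOnly: true
        - name: credentials
          mountPath: /etc/app/secrets
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: app-config
    - name: credentials
      secret:
        secretName: app-secret
```

Everything here can be recreated from the image and the cluster objects, so losing the pod loses nothing.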

NEVER put inside the pod:

  • User data
  • Database files
  • Anything you can’t lose

PART 16 — HOW DEVOPS ENGINEERS TROUBLESHOOT (REAL LIFE)

App cannot connect to DB

Checklist:

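  1. Is the DB pod running and Ready? (kubectl get pods)
  2. Does the Service have endpoints? (kubectl get endpoints <service-name>)
  3. Does DNS resolve from inside the cluster? (nslookup <service-name> from another pod)
  4. Is the port reachable from the app pod?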

PVC Pending

Checklist:

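  • Does the StorageClass named in the PVC exist? (kubectl get storageclass)
  • If the PVC names none, is there a default StorageClass?
  • Does the provisioner have the cloud permissions to create disks? (kubectl describe pvc <pvc-name> shows the events)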

Data lost after restart

Red flags:

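  • emptyDir used for the data directory
  • no PVC attached
  • wrong mount path (the volume is mounted, but the database writes somewhere else)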

PART 17 — WHAT A 6+ YEAR DEVOPS ENGINEER MUST KNOW

Architecture

  • Stateless vs Stateful
  • Deployment vs StatefulSet
  • Headless vs ClusterIP

Storage

  • PV / PVC / StorageClass
  • Reclaim policies
  • Per-pod volumes

Networking

  • DNS behavior
  • Service selectors
  • Pod FQDNs

Operations

  • Backups
  • Restores
  • Safe upgrades
  • Scaling rules

PART 18 — INTERVIEW-READY SUMMARY (MEMORIZE)

“Databases in Kubernetes require StatefulSets for stable identity, headless services for pod-level DNS, and PVCs for persistent storage. Pods are ephemeral and must never own critical data. Kubernetes manages lifecycle, not database correctness.”


FINAL ANSWER TO YOUR CORE QUESTION

“If we don’t put everything in the pod, how do users get data?”

Users NEVER talk to pods directly.

Users talk to:

  • Ingress
  • Services

Pods talk to:

  • Stable DB endpoints
  • Persistent storage

Pods are delivery mechanisms, not data owners.
