
Guatu

Posted on • Originally published at guatulabs.dev

Pod Disruption Budgets: Why kubectl drain Gets Stuck on Longhorn

I spent three hours trying to drain a node running Longhorn, only to watch kubectl drain freeze with no progress or error. It felt like the command was waiting for something that would never come.

What I expected was a smooth drain — pods evicted, node marked as unavailable, and everything moving on. No drama. No hanging. Just a clean maintenance operation.

What actually happened was a silent stall. The drain command would not proceed past the point where it started trying to evict the Longhorn manager pod. It would sit there, stuck, with no message. After some digging, I realized it was a Pod Disruption Budget (PDB) blocking the eviction. Longhorn has a PDB in place that ensures at least one manager pod is always running, and that PDB was preventing the eviction from completing.
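The quickest way to confirm a PDB is the culprit is to list the budgets in the Longhorn namespace and check the ALLOWED DISRUPTIONS column. This sketch assumes Longhorn is installed in its default `longhorn-system` namespace; `<pdb-name>` is a placeholder:

```shell
# List every PDB in the Longhorn namespace; a value of 0 under
# ALLOWED DISRUPTIONS means the eviction API will refuse requests,
# and kubectl drain will wait indefinitely.
kubectl get pdb -n longhorn-system

# Inspect a specific PDB to see its selector, minAvailable,
# and how many healthy pods currently match it
kubectl describe pdb <pdb-name> -n longhorn-system
```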

The fix came in two parts: first, I had to adjust the PDB to allow the drain to proceed. Second, I had to make sure the kubectl drain command used the right flags. Here's what I did:

```shell
kubectl drain <node-name> \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=30 \
  --timeout=300s \
  --force
```

I also updated the PDB so that, as long as more than one manager pod was running, one of them could be evicted during the drain while at least one remained available:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: longhorn-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: longhorn
```
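Before re-running the drain, it's worth confirming the updated budget actually permits an eviction. A rough check, assuming the manifest above is saved as `longhorn-pdb.yaml` (a hypothetical filename) and applied to the `longhorn-system` namespace:

```shell
# Apply the updated budget
kubectl apply -f longhorn-pdb.yaml -n longhorn-system

# ALLOWED DISRUPTIONS should now read 1 or more while multiple
# manager pods are healthy; if it is still 0, the drain will hang again.
kubectl get pdb longhorn-pdb -n longhorn-system
```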

This change allowed the drain to proceed without blocking. The key takeaway was that PDBs are not just a safety net — they're a potential roadblock if not configured with maintenance in mind.

This matters a lot if you're managing a Kubernetes cluster with Longhorn. Every time you want to do maintenance, scale down, or upgrade, you're going to run into this. You need to know how PDBs interact with your storage layer, and how to configure them to allow safe operations. If you don't, you'll be stuck trying to figure out why kubectl drain is hanging — and it won't be fun.

I learned my lesson the hard way. I'm now making sure every PDB in my cluster is reviewed during maintenance planning. I've also added documentation to our team's Kubernetes guides about this specific scenario. If I had known earlier, I would have configured the PDB with more flexibility from the start.

If you're running Longhorn and doing any kind of node maintenance, take a moment to check your PDBs. They might be the reason your kubectl drain is getting stuck. And if you're not sure where to start, I recommend looking at the Longhorn PDBs and asking: "What's the worst that could happen if one of these pods goes away during a drain?" The answer will tell you what minAvailable should be.
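A quick way to run that audit across the whole cluster is to list every PDB whose status currently allows zero disruptions — those are the ones that will stall a drain. This is a sketch using kubectl's jsonpath output; the `disruptionsAllowed` field comes from the policy/v1 PDB status:

```shell
# Print namespace/name of every PDB that would block an eviction right now
kubectl get pdb --all-namespaces \
  -o jsonpath='{range .items[?(@.status.disruptionsAllowed==0)]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
```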

For more on how to handle storage in Kubernetes, check out my post on Kubernetes Storage on Bare Metal: Longhorn in Practice. It dives deeper into how Longhorn fits into larger infrastructure setups.
