In Hindu mythology, even an ant needs Shiva's aagna (permission) to move. Kubernetes has a Shiva too - etcd, granting permission for every change in the cluster. Except etcd isn't a god. It's a datastore with a disk. And that's the whole problem.
Every change needs its aagna, and all of it goes through one consensus ledger. As the cluster's population grows, so do the requests for permission, and the one approver can't keep up. You can make it faster (better disk, more IOPS), but that's where it ends: adding etcd members buys availability, not write throughput, because every write still has to go through the same Raft log. One ledger, no horizontal scale. My instinct was to split it by namespace; turns out you split it by resource type, and events get their own etcd. The instinct was right, the axis was different.
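For the curious, the split by resource type is what kube-apiserver's `--etcd-servers-overrides` flag is for: each override has the form `group/resource#servers`, with servers separated by semicolons and overrides separated by commas. Here is a minimal sketch that builds such a flag value. The helper name `etcd_overrides_flag` and the `etcd-events-*.example` hostnames are made up for illustration, not taken from any real cluster.

```python
from __future__ import annotations


def etcd_overrides_flag(overrides: dict[str, list[str]]) -> str:
    """Build a value for kube-apiserver's --etcd-servers-overrides flag.

    Keys are "group/resource" strings (core-group resources start with "/",
    e.g. "/events"); values are the etcd endpoint URLs for that resource.
    """
    parts = []
    for resource, servers in overrides.items():
        if not servers:
            raise ValueError(f"no etcd servers given for {resource!r}")
        # Servers for one resource are joined with ";"
        parts.append(f"{resource}#{';'.join(servers)}")
    # Separate overrides are joined with ","
    return "--etcd-servers-overrides=" + ",".join(parts)


# Hypothetical endpoints: a dedicated etcd cluster that stores only events.
flag = etcd_overrides_flag({
    "/events": [
        "https://etcd-events-0.example:2379",
        "https://etcd-events-1.example:2379",
    ],
})
print(flag)
```

Everything not listed in an override keeps going to the main `--etcd-servers` cluster, so the noisy, high-churn events stop competing with the rest of the ledger.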
Your cluster's ceiling was never node count. It's how fast one not-quite-divine approver can stamp the ledger. etcd cosplays as Shiva. It's a disk.