Running Kafka on Kubernetes is usually smooth—but we hit a tricky problem when using KRaft mode.
For context, KRaft (Kafka Raft metadata mode) is Kafka’s newer way of managing cluster metadata without ZooKeeper. Each cluster has a unique Cluster ID, which every node records in a meta.properties file in its log directory when the storage is formatted. All nodes must present the same ID to join the cluster.
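For reference, the on-disk record of that ID is just a small properties file inside the Kafka log directory. The path and values below are illustrative, not taken from our cluster:

# Example meta.properties in the Kafka log directory (illustrative values)
version=1
cluster.id=MkU3OEVBNTcwNTJENDM2Qk
node.id=0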
The Problem
We noticed that every time a Kafka pod restarted, it would generate a new Cluster ID. Since our pods used persistent storage, the metadata on disk still had the old Cluster ID. This caused Kafka to fail on startup with a clear error:
Cluster ID mismatch: Expected <old-id>, Found <new-id>
In short, the pod thought it was a new cluster, but the storage said otherwise.
How We Fixed It
For testing, we solved it by manually specifying the Cluster ID in the Kubernetes deployment. This ensured that every pod picked up the same ID that was already recorded on the persistent volume. After that, the pods started without errors and rejoined the cluster seamlessly. Below is the sample Kubernetes environment variable we set:
- name: KAFKA_KRAFT_CLUSTER_ID
  value: "kraft-local-cluster"
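To show where that variable sits, here is a minimal StatefulSet sketch. The image, names, mount path, and storage size are assumptions for illustration, not our exact manifest:

# Minimal StatefulSet excerpt (illustrative; image, names, and sizes are assumptions)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 1
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: bitnami/kafka:3.7          # assumed image; any KRaft-capable image works similarly
          env:
            - name: KAFKA_KRAFT_CLUSTER_ID
              value: "kraft-local-cluster"  # same ID on every pod, matching the volume's metadata
          volumeMounts:
            - name: data
              mountPath: /bitnami/kafka     # assumed mount path for the persistent log directory
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi

The key point is that the ID comes from the pod template, so every replacement pod starts with the same value instead of generating a fresh one.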
Lessons Learned
- In KRaft mode, the Cluster ID must persist across pod restarts when running on Kubernetes with stateful pods and persistent volumes.
- Reusing old volumes can cause mismatches if the Cluster ID changes.
- For production, automate cluster ID propagation or initialize nodes with a pre-set ID to avoid manual fixes (see the sketch after this list).
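One way to do that, as a rough sketch rather than a production recipe, is an init container that formats the KRaft storage with a pre-set Cluster ID only on first start. The image, paths, and the example ID below are assumptions; the ID could equally come from a ConfigMap or Secret shared by all nodes:

# Illustrative pod-spec fragment: format KRaft storage once, with a pre-set Cluster ID
initContainers:
  - name: format-storage
    image: bitnami/kafka:3.7                # assumed KRaft-capable image that ships kafka-storage.sh
    command: ["/bin/sh", "-c"]
    args:
      - |
        # An existing meta.properties means the volume was already formatted; skip in that case.
        # The -t value must be a base64-encoded UUID (e.g. from `kafka-storage.sh random-uuid`);
        # the one below is only an example, and the config path is an assumption for this image.
        if [ ! -f /bitnami/kafka/data/meta.properties ]; then
          kafka-storage.sh format -t "MkU3OEVBNTcwNTJENDM2Qk" -c /opt/bitnami/kafka/config/kraft/server.properties
        fi
    volumeMounts:
      - name: data
        mountPath: /bitnami/kafka           # assumed mount path, matching the main container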
KRaft mode is promising, but small details like the Cluster ID can trip you up. Once you know what to look for, fixing it is straightforward—and now our Kafka cluster is more stable than ever.