Overview
This guide walks through deploying a production-grade Apache Kafka cluster on Kubernetes using KRaft mode (no ZooKeeper), SASL/SCRAM-SHA-512 authentication, and a 3-node StatefulSet for high availability. It covers every manifest file required, the reasoning behind each configuration decision, the SCRAM credential bootstrap process, common pitfalls encountered in practice, and the steps needed to take the cluster from running to production-ready.
Prerequisites
Tools
- kubectl configured against your target cluster
- A Kubernetes cluster with at least 3 nodes (one per Kafka pod) with sufficient resources
- Persistent volume provisioner available (e.g. local-path, Longhorn, Ceph, AWS EBS)
- keytool (part of the JDK) if you plan to add TLS later
Cluster Resources
Each Kafka pod in this guide requests 500m CPU and 1Gi RAM, with limits of 2 CPU and 4Gi RAM. For production you should size these based on your throughput requirements. A minimum of 10Gi persistent storage per broker is configured.
Kubernetes Version
Kubernetes 1.21 or later is recommended. StatefulSets, headless Services, and ConfigMaps used in this guide are stable APIs available in all modern versions.
Architecture
The cluster consists of three pods (kafka-0, kafka-1, kafka-2) each running in combined broker+controller mode using KRaft. This means each pod participates in the Raft quorum for metadata management as well as serving producer and consumer traffic.
┌─────────────────────────────────────────┐
│ Kafka Namespace │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ kafka-0 │ │ kafka-1 │ │ kafka-2 │ │
│ │ broker │ │ broker │ │ broker │ │
│ │ + ctrl │ │ + ctrl │ │ + ctrl │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │
│ port 9092 (SASL_PLAINTEXT — broker) │
│ port 9093 (PLAINTEXT — controller) │
│ │
│ ┌─────────────────────────────────┐ │
│ │ kafka-headless (ClusterIP: │ │
│ │ None) — DNS per pod │ │
│ └─────────────────────────────────┘ │
└─────────────────────────────────────────┘
KRaft requires a majority quorum (2 out of 3) to elect a leader and commit metadata. The cluster can tolerate the loss of one node.
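That arithmetic generalizes: a quorum of N voters needs floor(N/2) + 1 votes for a majority. The shell sketch below (a hypothetical helper, not part of Kafka's tooling) makes the tolerance calculation concrete:

```shell
# Sketch: failures tolerated by a KRaft quorum of N voters.
# A majority is floor(N/2) + 1, so the quorum survives N - majority losses.
quorum_tolerance() {
  n=$1
  majority=$(( n / 2 + 1 ))
  echo $(( n - majority ))
}

quorum_tolerance 3   # prints 1: this guide's 3-node cluster survives one lost node
quorum_tolerance 5   # prints 2: a 5-voter quorum survives two
```

Note that even numbers of voters add no tolerance over the next-lower odd number (4 voters still only tolerate 1 failure), which is why 3 or 5 controllers are the usual choices.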
File Structure
You need five manifest files plus one temporary modification during the SCRAM bootstrap process:
- namespace.yaml — Kafka namespace
- kafka-svc.yaml — Headless service for pod DNS
- kafka-jaas.yaml — JAAS config for inter-broker SCRAM auth
- kafka-sasl.yaml — Secret holding admin credentials for client apps
- kafka-stateful-set.yaml — Brokers, init containers, volumes, config
Step 1 — Namespace
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: kafka
Apply first so all subsequent resources land in the correct namespace:
kubectl apply -f namespace.yaml
Step 2 — Headless Service
# kafka-svc.yaml
apiVersion: v1
kind: Service
metadata:
name: kafka-headless
namespace: kafka
spec:
clusterIP: None
selector:
app: kafka
ports:
- name: internal
port: 9092
- name: controller
port: 9093
A headless service (clusterIP: None) causes Kubernetes DNS to create per-pod A records:
kafka-0.kafka-headless.kafka.svc.cluster.local
kafka-1.kafka-headless.kafka.svc.cluster.local
kafka-2.kafka-headless.kafka.svc.cluster.local
These stable DNS names are what the KRaft quorum voters list and the advertised listeners are built from. They survive pod restarts and rescheduling because they are tied to the StatefulSet ordinal, not the pod IP.
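As a sanity sketch outside the cluster, the names follow the fixed pattern pod.service.namespace.svc.cluster.local and can be generated purely from the StatefulSet ordinals (the names below assume this guide's resources):

```shell
# Sketch: the per-pod DNS names a 3-replica StatefulSet yields, built from
# <statefulset>-<ordinal>.<headless-service>.<namespace>.svc.cluster.local
STS=kafka; SVC=kafka-headless; NS=kafka; REPLICAS=3

NAMES=$(for i in $(seq 0 $(( REPLICAS - 1 ))); do
  echo "${STS}-${i}.${SVC}.${NS}.svc.cluster.local"
done)
echo "$NAMES"
```

Any config that needs the full broker list (quorum voters, client bootstrap servers) can be derived the same way, which keeps the names consistent across manifests.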
Step 3 — JAAS Configuration
# kafka-jaas.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: kafka-jaas
namespace: kafka
data:
jaas.conf: |
KafkaServer {
org.apache.kafka.common.security.scram.ScramLoginModule required
username="admin"
password="supersecret";
};
This file configures the Java Authentication and Authorization Service (JAAS) for the Kafka broker process itself. The username and password here are the inter-broker credentials — what each broker uses when authenticating to other brokers on the INTERNAL listener.
Important: JAAS alone does not create the SCRAM user in Kafka's metadata store. That is a separate bootstrap step performed after the cluster is running (see Step 7). The JAAS file tells the broker process what credentials to use, but those credentials must also exist in Kafka's internal metadata before inter-broker authentication will succeed.
Step 4 — SASL Secret
# kafka-sasl.yaml
apiVersion: v1
kind: Secret
metadata:
name: kafka-sasl
namespace: kafka
type: Opaque
stringData:
username: admin
password: supersecret
This secret can be mounted into client pods or referenced by applications that need to connect to Kafka. It is separate from the JAAS ConfigMap so that client credentials can be managed independently of the broker configuration.
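As an illustration, a client pod could render its client.properties from credentials injected out of this secret. The env var names and file path below are hypothetical choices, not anything Kafka or Kubernetes mandates; you would wire them up with secretKeyRef entries in the client pod spec:

```shell
# Sketch: render a SASL client config from credentials a pod could pull
# out of the kafka-sasl secret. KAFKA_USERNAME / KAFKA_PASSWORD are
# hypothetical env var names mapped from the secret's keys.
KAFKA_USERNAME=admin
KAFKA_PASSWORD=supersecret

cat > /tmp/client.properties <<EOF
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="${KAFKA_USERNAME}" password="${KAFKA_PASSWORD}";
EOF

cat /tmp/client.properties
```

Rendering the file at startup (rather than baking credentials into an image) means a password rotation only requires updating the secret and restarting the client.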
Step 5 — StatefulSet
This is the core manifest. It contains two init containers and one main broker container.
Why Two Init Containers?
init-node-id runs a tiny busybox container to extract the pod ordinal (0, 1, or 2) from the hostname and writes it to a shared emptyDir volume. This is necessary because Kubernetes environment variable substitution does not support string manipulation, so you cannot derive the integer 0 from the pod name kafka-0 using env vars alone.
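The extraction logic itself can be exercised outside the cluster; the hostnames below are sample inputs standing in for the live pod hostname:

```shell
# Sketch: the ordinal extraction the init container performs, run against
# sample hostnames instead of the real pod's hostname.
extract_ordinal() {
  echo "$1" | awk -F'-' '{print $NF}'
}

extract_ordinal kafka-0              # prints 0
extract_ordinal kafka-2              # prints 2
# awk keeps only the last '-'-separated field, so extra hyphens are harmless:
extract_ordinal my-kafka-cluster-1   # prints 1
```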
format-storage runs the actual Kafka image to format the KRaft log directory using kafka-storage.sh. It only runs if /data/meta.properties does not already exist, making it safe on pod restarts. The temporary kraft.properties file used during formatting includes a dummy plaintext broker listener — this is required because Kafka's config validator refuses to proceed if the only listener defined is a controller listener. These values are only used during formatting and are not used by the running broker.
Why /etc/kafka/docker/run Instead of kafka-server-start.sh?
The official apache/kafka Docker image ships an entrypoint at /etc/kafka/docker/run that translates KAFKA_* environment variables into server.properties entries before starting the broker. Calling kafka-server-start.sh directly bypasses this translation entirely — env vars like KAFKA_LOG_DIRS are ignored and the broker falls back to compiled-in defaults, causing it to look for data in the wrong directory.
Why CLUSTER_ID as an Env Var?
If CLUSTER_ID is not set, the official entrypoint generates a new random cluster ID on every pod start and attempts to re-format storage. This fails because the existing meta.properties already has a different ID written by the init container. Always set CLUSTER_ID to match the value used during the format step.
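The failure mode is easy to reason about in isolation: the entrypoint effectively compares the env var against the ID recorded on disk. The sketch below simulates that comparison with a locally written meta.properties (path and values are illustrative):

```shell
# Sketch: reproduce the entrypoint's cluster-ID comparison locally.
CLUSTER_ID="q1Sh-9_ISia_zwGINzRvyQ"   # what the StatefulSet env var carries

# Simulate what the format step wrote into the data directory:
cat > /tmp/meta.properties <<EOF
version=1
cluster.id=q1Sh-9_ISia_zwGINzRvyQ
node.id=0
EOF

WRITTEN=$(grep '^cluster.id=' /tmp/meta.properties | cut -d= -f2)
if [ "$WRITTEN" = "$CLUSTER_ID" ]; then
  echo "cluster IDs match; broker would start"
else
  echo "MISMATCH: env=$CLUSTER_ID meta.properties=$WRITTEN"
fi
```

If you hit the mismatch in a real cluster, check the pod's actual meta.properties the same way: grep cluster.id from the file under the data volume and compare it to the env var.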
Generating Your Own Cluster ID
The cluster ID q1Sh-9_ISia_zwGINzRvyQ used in this guide was generated with:
/opt/kafka/bin/kafka-storage.sh random-uuid
Generate your own unique ID for each cluster you deploy. Do not reuse IDs across clusters.
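kafka-storage.sh random-uuid is the canonical generator. If you only need an ID of the same shape for scripting or validation without a Kafka image handy, a KRaft cluster ID is 16 random bytes rendered as 22 characters of URL-safe base64 without padding; the sketch below approximates that, but prefer the Kafka tool for real deployments:

```shell
# Sketch: produce a 22-character URL-safe base64 ID with the same shape as
# kafka-storage.sh random-uuid output. Use the Kafka tool for real clusters.
ID=$(head -c 16 /dev/urandom | base64 | tr '+/' '-_' | tr -d '=')
echo "$ID"
```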
# kafka-stateful-set.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: kafka
namespace: kafka
spec:
serviceName: kafka-headless
replicas: 3
selector:
matchLabels:
app: kafka
template:
metadata:
labels:
app: kafka
spec:
securityContext:
fsGroup: 1001
volumes:
- name: kafka-config
emptyDir: {}
- name: kafka-jaas
configMap:
name: kafka-jaas
initContainers:
- name: init-node-id
image: busybox:1.36
command:
- sh
- -c
- |
ORDINAL=$(hostname | awk -F'-' '{print $NF}')
echo "$ORDINAL" > /config/node-id
volumeMounts:
- name: kafka-config
mountPath: /config
- name: format-storage
image: apache/kafka:4.2.0
command:
- sh
- -c
- |
NODE_ID=$(cat /config/node-id)
if [ ! -f "/data/meta.properties" ]; then
echo "Formatting storage for node $NODE_ID..."
echo "node.id=$NODE_ID" > /tmp/kraft.properties
echo "process.roles=broker,controller" >> /tmp/kraft.properties
echo "controller.quorum.voters=0@kafka-0.kafka-headless.kafka.svc.cluster.local:9093,1@kafka-1.kafka-headless.kafka.svc.cluster.local:9093,2@kafka-2.kafka-headless.kafka.svc.cluster.local:9093" >> /tmp/kraft.properties
echo "listeners=PLAINTEXT://:9092,CONTROLLER://:9093" >> /tmp/kraft.properties
echo "advertised.listeners=PLAINTEXT://localhost:9092" >> /tmp/kraft.properties
echo "listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT" >> /tmp/kraft.properties
echo "inter.broker.listener.name=PLAINTEXT" >> /tmp/kraft.properties
echo "controller.listener.names=CONTROLLER" >> /tmp/kraft.properties
echo "log.dirs=/data" >> /tmp/kraft.properties
/opt/kafka/bin/kafka-storage.sh format \
--ignore-formatted \
--cluster-id q1Sh-9_ISia_zwGINzRvyQ \
--config /tmp/kraft.properties
else
echo "Already formatted, skipping."
fi
volumeMounts:
- name: kafka-data
mountPath: /data
- name: kafka-config
mountPath: /config
containers:
- name: kafka
image: apache/kafka:4.2.0
command:
- sh
- -c
- |
export KAFKA_NODE_ID=$(cat /config/node-id)
exec /etc/kafka/docker/run
ports:
- containerPort: 9092
- containerPort: 9093
env:
- name: CLUSTER_ID
value: "q1Sh-9_ISia_zwGINzRvyQ"
- name: KAFKA_PROCESS_ROLES
value: "broker,controller"
- name: KAFKA_CONTROLLER_LISTENER_NAMES
value: "CONTROLLER"
- name: KAFKA_CONTROLLER_QUORUM_VOTERS
value: "0@kafka-0.kafka-headless.kafka.svc.cluster.local:9093,1@kafka-1.kafka-headless.kafka.svc.cluster.local:9093,2@kafka-2.kafka-headless.kafka.svc.cluster.local:9093"
- name: KAFKA_LISTENERS
value: "INTERNAL://:9092,CONTROLLER://:9093"
- name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
value: "INTERNAL:SASL_PLAINTEXT,CONTROLLER:PLAINTEXT"
- name: KAFKA_INTER_BROKER_LISTENER_NAME
value: "INTERNAL"
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: KAFKA_ADVERTISED_LISTENERS
value: "INTERNAL://$(POD_NAME).kafka-headless.kafka.svc.cluster.local:9092"
- name: KAFKA_SASL_ENABLED_MECHANISMS
value: SCRAM-SHA-512
- name: KAFKA_SASL_MECHANISM_INTER_BROKER_PROTOCOL
value: SCRAM-SHA-512
- name: KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR
value: "3"
- name: KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR
value: "3"
- name: KAFKA_TRANSACTION_STATE_LOG_MIN_ISR
value: "2"
- name: KAFKA_MIN_INSYNC_REPLICAS
value: "2"
- name: KAFKA_LOG_DIRS
value: /data
- name: KAFKA_OPTS
value: "-Djava.security.auth.login.config=/opt/kafka/config/jaas/jaas.conf"
volumeMounts:
- name: kafka-data
mountPath: /data
- name: kafka-config
mountPath: /config
- name: kafka-jaas
mountPath: /opt/kafka/config/jaas
resources:
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: "2"
memory: 4Gi
volumeClaimTemplates:
- metadata:
name: kafka-data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
Step 6 — Initial Deployment
Apply all manifests:
kubectl apply -f namespace.yaml
kubectl apply -f kafka-svc.yaml
kubectl apply -f kafka-jaas.yaml
kubectl apply -f kafka-sasl.yaml
kubectl apply -f kafka-stateful-set.yaml
Watch the pods come up:
kubectl get pods -n kafka -w
All three pods should reach Running status within a minute or two. Check kafka-0 logs to confirm a clean startup:
kubectl logs -n kafka kafka-0
A healthy startup ends with:
[KafkaRaftServer nodeId=0] Kafka Server started
Step 7 — Bootstrap SCRAM Credentials
This is the most involved post-deployment step, and also the most commonly misunderstood. Here is why it requires a special process.
The Chicken-and-Egg Problem
The broker's INTERNAL listener (port 9092) requires SASL/SCRAM-SHA-512 authentication. The kafka-configs.sh admin tool needs to connect to a broker to write SCRAM credentials into Kafka's metadata. But to connect to the broker, you need valid SCRAM credentials — which don't exist yet because you haven't written them.
There are two apparent escape hatches that do not actually work:
- Using the JAAS file credentials directly against port 9092 — the JAAS file tells the broker process what credentials to use for its own inter-broker connections, but those credentials must also exist in Kafka's metadata store before the SCRAM handshake will succeed. The JAAS file does not pre-populate metadata.
- Using --bootstrap-controller against port 9093 — Kafka 4.x does not support the alterUserScramCredentials admin API via the controller quorum endpoint. It must go through a broker.
The solution is to temporarily open an unauthenticated PLAINTEXT listener on a separate port (9094), use it to write the credentials, then close it again.
7a — Add the Temporary Listener to the Service
Edit kafka-svc.yaml to add port 9094:
# kafka-svc.yaml (temporary — port 9094 will be removed after bootstrap)
apiVersion: v1
kind: Service
metadata:
name: kafka-headless
namespace: kafka
spec:
clusterIP: None
selector:
app: kafka
ports:
- name: internal
port: 9092
- name: controller
port: 9093
- name: plaintext-bootstrap
port: 9094
7b — Add the Temporary Listener to the StatefulSet
Edit kafka-stateful-set.yaml and make these changes to the broker container:
Add containerPort: 9094 to the ports list.
Change KAFKA_LISTENERS to:
- name: KAFKA_LISTENERS
value: "INTERNAL://:9092,CONTROLLER://:9093,PLAINTEXT://:9094"
Change KAFKA_LISTENER_SECURITY_PROTOCOL_MAP to:
- name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
value: "INTERNAL:SASL_PLAINTEXT,CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT"
Change KAFKA_ADVERTISED_LISTENERS to:
- name: KAFKA_ADVERTISED_LISTENERS
value: "INTERNAL://$(POD_NAME).kafka-headless.kafka.svc.cluster.local:9092,PLAINTEXT://$(POD_NAME).kafka-headless.kafka.svc.cluster.local:9094"
7c — Apply and Wait for Rollout
kubectl apply -f kafka-svc.yaml
kubectl apply -f kafka-stateful-set.yaml
kubectl rollout status statefulset/kafka -n kafka
7d — Register the SCRAM Credentials
kubectl exec -n kafka kafka-0 -- \
/opt/kafka/bin/kafka-configs.sh \
--bootstrap-server kafka-0.kafka-headless.kafka.svc.cluster.local:9094 \
--alter \
--add-config 'SCRAM-SHA-512=[password=supersecret]' \
--entity-type users \
--entity-name admin
Expected output:
Completed updating config for user admin.
If you need additional users (application service accounts), register them now while port 9094 is still open:
kubectl exec -n kafka kafka-0 -- \
/opt/kafka/bin/kafka-configs.sh \
--bootstrap-server kafka-0.kafka-headless.kafka.svc.cluster.local:9094 \
--alter \
--add-config 'SCRAM-SHA-512=[password=apppassword]' \
--entity-type users \
--entity-name myapp
7e — Remove the Temporary Listener
Revert kafka-svc.yaml to remove port 9094:
# kafka-svc.yaml (final)
apiVersion: v1
kind: Service
metadata:
name: kafka-headless
namespace: kafka
spec:
clusterIP: None
selector:
app: kafka
ports:
- name: internal
port: 9092
- name: controller
port: 9093
Revert the StatefulSet changes — remove the PLAINTEXT entries from KAFKA_LISTENERS, KAFKA_LISTENER_SECURITY_PROTOCOL_MAP, and KAFKA_ADVERTISED_LISTENERS, and remove the containerPort: 9094 line.
Apply and wait:
kubectl apply -f kafka-svc.yaml
kubectl apply -f kafka-stateful-set.yaml
kubectl rollout status statefulset/kafka -n kafka
Step 8 — Verify the Cluster
Check the KRaft quorum status:
kubectl exec -n kafka kafka-0 -- \
/opt/kafka/bin/kafka-metadata-quorum.sh \
--bootstrap-controller kafka-0.kafka-headless.kafka.svc.cluster.local:9093 \
describe --status
A healthy cluster shows:
LeaderId: 0
LeaderEpoch: 1
HighWatermark: 303
MaxFollowerLag: 0
MaxFollowerLagTimeMs: 0
CurrentVoters: [{"id": 0, ...}, {"id": 1, ...}, {"id": 2, ...}]
CurrentObservers: []
MaxFollowerLag: 0 confirms all three nodes are fully in sync. Three voters and no observers confirm that all nodes are full participants in the quorum.
Verify SASL authentication works on port 9092:
kubectl exec -n kafka kafka-0 -- bash -c '
cat > /tmp/client.properties << EOF
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="admin" password="supersecret";
EOF
/opt/kafka/bin/kafka-topics.sh \
--bootstrap-server kafka-0.kafka-headless.kafka.svc.cluster.local:9092 \
--command-config /tmp/client.properties \
--list
'
Create a test topic and produce/consume a message:
kubectl exec -n kafka kafka-0 -- bash -c '
cat > /tmp/client.properties << EOF
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="admin" password="supersecret";
EOF
/opt/kafka/bin/kafka-topics.sh \
--bootstrap-server kafka-0.kafka-headless.kafka.svc.cluster.local:9092 \
--command-config /tmp/client.properties \
--create --topic test --partitions 3 --replication-factor 3
echo "hello kafka" | /opt/kafka/bin/kafka-console-producer.sh \
--bootstrap-server kafka-0.kafka-headless.kafka.svc.cluster.local:9092 \
--producer.config /tmp/client.properties \
--topic test
/opt/kafka/bin/kafka-console-consumer.sh \
--bootstrap-server kafka-0.kafka-headless.kafka.svc.cluster.local:9092 \
--consumer.config /tmp/client.properties \
--topic test --from-beginning --max-messages 1
'
Production Hardening Checklist
The cluster at this point is functional and secured with SASL. The following additional steps are needed before handling real production traffic.
Add TLS Encryption
The current setup uses SASL_PLAINTEXT, meaning credentials and data are transmitted unencrypted within the cluster. For any environment where network traffic could be intercepted, upgrade to SASL_SSL.
Generate a keystore and truststore using keytool. The Subject Alternative Names (SANs) must cover all broker hostnames — this is the most common mistake when setting up TLS for Kafka on Kubernetes.
PASS=yourpassword
SAN="dns:kafka-0.kafka-headless.kafka.svc.cluster.local,\
dns:kafka-1.kafka-headless.kafka.svc.cluster.local,\
dns:kafka-2.kafka-headless.kafka.svc.cluster.local,\
dns:kafka-headless.kafka.svc.cluster.local"
# Generate CA
keytool -genkeypair -alias ca -keyalg RSA -keysize 2048 \
-dname "CN=kafka-ca" -validity 3650 \
-keystore ca.jks -storepass $PASS -storetype JKS
keytool -exportcert -alias ca -keystore ca.jks \
-storepass $PASS -file ca.crt
# Generate broker keypair and CSR
keytool -genkeypair -alias kafka -keyalg RSA -keysize 2048 \
-dname "CN=kafka" -validity 3650 \
-keystore keystore.jks -storepass $PASS -storetype JKS
keytool -certreq -alias kafka -keystore keystore.jks \
-storepass $PASS -file kafka.csr
# Sign the CSR with the CA, including all broker SANs
keytool -gencert -alias ca -keystore ca.jks -storepass $PASS \
-infile kafka.csr -outfile kafka.crt -validity 3650 \
-ext "SAN=$SAN"
# Import CA + signed cert into the keystore
keytool -importcert -alias ca -file ca.crt \
-keystore keystore.jks -storepass $PASS -noprompt
keytool -importcert -alias kafka -file kafka.crt \
-keystore keystore.jks -storepass $PASS -noprompt
# Build truststore containing only the CA
keytool -importcert -alias ca -file ca.crt \
-keystore truststore.jks -storepass $PASS -noprompt
# Store in Kubernetes
kubectl create secret generic kafka-tls -n kafka \
--from-file=keystore.jks=keystore.jks \
--from-file=truststore.jks=truststore.jks \
--from-literal=password=$PASS \
--dry-run=client -o yaml | kubectl apply -f -
Then update the StatefulSet to mount the secret and change the listener protocols:
- INTERNAL:SASL_PLAINTEXT → INTERNAL:SASL_SSL
- CONTROLLER:PLAINTEXT → CONTROLLER:SSL
- Add KAFKA_SSL_KEYSTORE_LOCATION, KAFKA_SSL_TRUSTSTORE_LOCATION, KAFKA_SSL_KEYSTORE_PASSWORD, and KAFKA_SSL_TRUSTSTORE_PASSWORD env vars
- Mount the kafka-tls secret as a volume at /tls
Change Default Credentials
The admin/supersecret credentials used in this guide are for demonstration only. Before going to production:
- Update kafka-jaas.yaml with a strong randomly generated password
- Update kafka-sasl.yaml with the same password
- Re-register the SCRAM credentials using the bootstrap process in Step 7
- Rotate any client configurations that reference the old credentials
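One way to produce such a password (any CSPRNG-backed generator works; this one reads /dev/urandom):

```shell
# Sketch: 24 random bytes -> a 32-character URL-safe password for SCRAM users.
NEW_PASSWORD=$(head -c 24 /dev/urandom | base64 | tr '+/' '-_')
echo "$NEW_PASSWORD"
```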
Network Policies
Add a NetworkPolicy to restrict which pods can reach Kafka:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: kafka-network-policy
namespace: kafka
spec:
podSelector:
matchLabels:
app: kafka
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: kafka
ports:
- port: 9092
- port: 9093
- from:
- namespaceSelector: {}
ports:
- port: 9092
egress:
- to:
- podSelector:
matchLabels:
app: kafka
Pod Anti-Affinity
Prevent Kubernetes from scheduling multiple Kafka pods on the same node:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- kafka
topologyKey: kubernetes.io/hostname
Add this under spec.template.spec in the StatefulSet.
Pod Disruption Budget
Ensure Kubernetes never takes down more than one Kafka pod at a time during voluntary disruptions such as node drains or rolling updates:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: kafka-pdb
namespace: kafka
spec:
minAvailable: 2
selector:
matchLabels:
app: kafka
Liveness and Readiness Probes
Add probes so Kubernetes can detect and restart unhealthy pods:
livenessProbe:
tcpSocket:
port: 9092
initialDelaySeconds: 30
periodSeconds: 10
failureThreshold: 6
readinessProbe:
tcpSocket:
port: 9092
initialDelaySeconds: 20
periodSeconds: 5
failureThreshold: 3
Resource Tuning
| Workload | CPU Request | Memory Request | Storage |
|---|---|---|---|
| Development | 250m | 512Mi | 5Gi |
| Low traffic | 500m | 1Gi | 20Gi |
| Medium traffic | 1 | 2Gi | 50Gi |
| High traffic | 2–4 | 4–8Gi | 100Gi+ |
JVM Heap Tuning
Set the JVM heap via the KAFKA_HEAP_OPTS env var. A common rule of thumb is 50% of the container memory limit, not exceeding 6Gi:
- name: KAFKA_HEAP_OPTS
value: "-Xms2g -Xmx2g"
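The rule of thumb can be made concrete with a small helper (hypothetical, purely to show the arithmetic):

```shell
# Sketch: heap = min(container memory limit / 2, 6), in Gi.
heap_gi() {
  half=$(( $1 / 2 ))
  if [ "$half" -gt 6 ]; then half=6; fi
  echo "$half"
}

heap_gi 4    # prints 2: matches the -Xms2g -Xmx2g example for this guide's 4Gi limit
heap_gi 16   # prints 6: capped, since larger heaps mostly lengthen GC pauses
```

Kafka relies heavily on the OS page cache for log segments, so memory beyond the heap is not wasted; it is used by the kernel to cache reads and writes.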
Monitoring
The official apache/kafka image exposes JMX metrics. Enable them with:
- name: KAFKA_JMX_PORT
value: "9999"
- name: KAFKA_JMX_HOSTNAME
value: "0.0.0.0"
Key metrics to monitor:
- kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions — should always be 0
- kafka.controller:type=KafkaController,name=ActiveControllerCount — should be 1 across the cluster
- kafka.network:type=RequestMetrics,name=TotalTimeMs — producer/consumer latency
- JVM GC pause times — long pauses indicate heap needs tuning
Common Troubleshooting
Pod stuck in Init state
Check the init container logs:
kubectl logs -n kafka kafka-0 -c format-storage
kubectl logs -n kafka kafka-0 -c init-node-id
No readable meta.properties error
The broker cannot find formatted storage. Ensure you are using /etc/kafka/docker/run as the entrypoint, not kafka-server-start.sh. The latter ignores KAFKA_* env vars and the broker looks in the wrong directory.
Invalid cluster.id error
The CLUSTER_ID env var does not match what was written to meta.properties during the format step. Delete the PVCs to force a reformat:
kubectl delete pvc -n kafka -l app=kafka
There must be at least one broker advertised listener during format
The temp kraft.properties used in the format-storage init container must include a non-controller listener. Ensure listeners, advertised.listeners, listener.security.protocol.map, and inter.broker.listener.name are all present in the temp properties file.
SASL authentication timeout on port 9092
The SCRAM credentials have not been registered. The JAAS file alone is not enough — run the bootstrap process in Step 7 to write the credentials into Kafka's metadata store.
UnsupportedEndpointTypeException when using --bootstrap-controller
The alterUserScramCredentials admin API is not supported via the controller quorum endpoint in Kafka 4.x. You must connect through a broker (port 9092). Use the temporary PLAINTEXT listener on port 9094 as described in Step 7.
Port 9094 connection refused
The headless service does not expose port 9094 by default. You must explicitly add it to kafka-svc.yaml during the bootstrap process (Step 7a) and remove it afterwards.
Brokers cannot reach each other
Verify the headless service exists and pod DNS resolves from within a pod:
kubectl exec -n kafka kafka-0 -- \
nslookup kafka-1.kafka-headless.kafka.svc.cluster.local
MaxFollowerLag is non-zero
One broker is behind. Check its logs for errors. Common causes are disk I/O pressure, a recent pod restart, or network instability.
Upgrading Kafka
To upgrade to a new Kafka version, update the image field in the StatefulSet. Kubernetes performs a rolling restart one pod at a time. Since KRaft requires a majority, the cluster stays available as long as at least 2 of 3 pods are running.
kubectl set image statefulset/kafka \
kafka=apache/kafka:4.3.0 \
-n kafka
kubectl rollout status statefulset/kafka -n kafka
Summary
| Step | What it does |
|---|---|
| namespace.yaml | Isolates all Kafka resources |
| kafka-svc.yaml | Headless service for stable pod DNS |
| kafka-jaas.yaml | JAAS config for inter-broker SCRAM auth |
| kafka-sasl.yaml | Secret for client applications |
| kafka-stateful-set.yaml | Brokers, init containers, storage, config |
| Step 7 — SCRAM bootstrap | Temporarily open port 9094, write credentials to metadata, close port 9094 |
The cluster is production-ready when TLS is enabled, credentials are rotated to strong values, pod anti-affinity and a PodDisruptionBudget are in place, resource limits are tuned for your workload, and monitoring is active.