Adil Ansari

Setup Multi Node Kafka Cluster (KRaft)

In this guide I will walk you through running a 3‑node Apache Kafka cluster on AWS EC2 using KRaft (Kafka’s built-in metadata quorum). By the end of this tutorial you’ll have:

  • 3 EC2 instances: kafka-01, kafka-02, kafka-03
  • Each node acts as broker + controller
  • Ports:
    • 9092 for broker/client traffic
    • 9093 for controller quorum traffic

This setup uses PLAINTEXT for simplicity. For production, add TLS + SASL, and proper monitoring.


Prerequisites

1) 3 EC2 instances in the same VPC (different AZs for HA)

  • Recommended: t3a.small or bigger to start (Kafka loves RAM and disk IOPS)
  • Disk: at least 20–50 GB per node (more if you plan to retain data longer)

2) Each node must be able to reach the other nodes on 9092 and 9093
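
To put a rough number on the disk recommendation above, here is a minimal back-of-the-envelope sketch. The function name estimate_disk_gb and the 1 MB/s ingest rate are my own illustrative assumptions. It assumes data is spread evenly across brokers and ignores compression, index files, and free-space headroom.

```shell
# Hypothetical helper (not part of Kafka): rough per-broker disk need in GB.
# $1 = ingest rate in MB/s (whole number), $2 = retention in hours,
# $3 = replication factor, $4 = number of brokers
estimate_disk_gb() {
  local total_mb=$(( $1 * 3600 * $2 * $3 ))   # MB kept across the whole cluster
  local divisor=$(( 1024 * $4 ))              # MB -> GB, then split across brokers
  echo $(( (total_mb + divisor - 1) / divisor ))  # round up
}

# Example: 1 MB/s, 168 h retention, replication factor 2, 3 brokers
estimate_disk_gb 1 168 2 3
```

With the 168-hour retention and replication factor 2 used later in this guide, this prints 394, so even a steady 1 MB/s of traffic needs far more than the 20–50 GB starting point.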


Now let's set up our own Kafka cluster from scratch.

Step 1: Set hostnames (run on all 3 nodes)

On each node, set the hostname to match:

On kafka-01:

sudo hostnamectl set-hostname kafka-01

On kafka-02:

sudo hostnamectl set-hostname kafka-02

On kafka-03:

sudo hostnamectl set-hostname kafka-03

Then add the same /etc/hosts block on all nodes:

<KAFKA_01_PRIVATE_IP> kafka-01
<KAFKA_02_PRIVATE_IP> kafka-02
<KAFKA_03_PRIVATE_IP> kafka-03
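
If you would rather not paste the block by hand on every node, the append can be sketched as a small idempotent helper. The function names and the 10.0.x.10 addresses below are placeholders I made up. Writing to /etc/hosts needs root, so run it from a root shell.

```shell
# Hypothetical helpers: build the hosts block and append it only once.
render_hosts_block() {
  # $1..$3 = private IPs of kafka-01, kafka-02, kafka-03
  printf '%s kafka-01\n%s kafka-02\n%s kafka-03\n' "$1" "$2" "$3"
}

append_hosts_block() {
  # $1 = hosts file, $2..$4 = private IPs
  local file="$1"; shift
  if grep -qw 'kafka-01' "$file"; then
    echo "kafka entries already present in $file"
  else
    render_hosts_block "$@" >> "$file"
  fi
}

# Usage (example IPs are placeholders, substitute your own):
#   append_hosts_block /etc/hosts 10.0.1.10 10.0.2.10 10.0.3.10
```

Running it a second time leaves the file unchanged, which makes it safe to include in a bootstrap script.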

Quick sanity check (from each node):

getent hosts kafka-01 kafka-02 kafka-03

Step 2: Install Java (run on all 3 nodes)

Kafka runs on the JVM, so Java is non‑negotiable.

On Ubuntu:

sudo apt-get update
sudo apt-get install -y openjdk-17-jre-headless
java -version

I used Ubuntu for this tutorial, but other Linux distributions work too; most steps are similar.

On Amazon Linux 2023:

sudo dnf install -y java-17-amazon-corretto-headless
java -version

Step 3: Download and install Kafka (run on all 3 nodes)

We’ll install Kafka under /opt and run it as a dedicated kafka user.

sudo useradd --system --create-home --home-dir /opt/kafka --shell /usr/sbin/nologin kafka || true
sudo mkdir -p /opt/kafka
sudo chown -R kafka:kafka /opt/kafka

If curl isn’t installed, install it first:

# Ubuntu
sudo apt-get install -y curl

# Amazon Linux 2023
sudo dnf install -y curl

Download and extract the Kafka 4.3.0 binary (run in /opt):

cd /opt
sudo curl -fL -o kafka_2.13-4.3.0.tgz "https://www.apache.org/dyn/closer.lua/kafka/4.3.0/kafka_2.13-4.3.0.tgz?action=download"
sudo tar -xzf kafka_2.13-4.3.0.tgz
sudo chown -R kafka:kafka /opt/kafka_2.13-4.3.0
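
It is worth checking the downloaded tarball against the checksum Apache publishes, ideally before extracting it. Here is a hedged sketch: verify_sha512 is a helper name I made up, and it assumes the .sha512 file uses the "filename: HEX HEX ..." layout (uppercase hex groups, possibly wrapped over several lines). If your file looks different, adjust the parsing.

```shell
# Hypothetical helper: compare a file's SHA-512 with a published digest file.
# $1 = downloaded tarball, $2 = matching .sha512 file
verify_sha512() {
  local actual expected
  actual="$(sha512sum "$1" | cut -d' ' -f1)"
  # Drop the optional "filename:" prefix, strip whitespace, lowercase the hex.
  expected="$(sed 's/^[^:]*://' "$2" | tr -d ' \t\r\n' | tr 'A-F' 'a-f')"
  [ "$actual" = "$expected" ]
}

# Usage (the .sha512 URL is an assumption based on the usual Apache layout):
#   curl -fLO https://downloads.apache.org/kafka/4.3.0/kafka_2.13-4.3.0.tgz.sha512
#   verify_sha512 kafka_2.13-4.3.0.tgz kafka_2.13-4.3.0.tgz.sha512 && echo "checksum OK"
```

The function returns a non-zero status on mismatch, so it can gate the tar step in a script.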

Create a log directory (this is where Kafka stores data):

sudo mkdir -p /opt/kafka/logDir
sudo chown -R kafka:kafka /opt/kafka/logDir

Step 4: Create the Kafka config (per node)

We’ll use a custom config file: /opt/kafka_2.13-4.3.0/config/custom-server.properties.

Common config (same on all nodes)

Create the file and paste this base (we’ll change the node-specific lines next):

# ==================== Node Roles & Identity ====================
process.roles=broker,controller
node.id=1

# ==================== Network & Listeners ====================
listeners=PLAINTEXT://kafka-01:9092,CONTROLLER://kafka-01:9093
advertised.listeners=PLAINTEXT://kafka-01:9092

listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER

# ==================== KRaft Quorum Configuration ====================
controller.quorum.voters=1@kafka-01:9093,2@kafka-02:9093,3@kafka-03:9093

# ==================== Storage Layout ====================
log.dirs=/opt/kafka/logDir

# ==================== Topic Defaults & System Replication ====================
num.partitions=6
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1

# ==================== Log Retention Policies ====================
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000

Now edit the node-specific values on each node:

  • On kafka-01

    • node.id=1
    • listeners=...kafka-01...
    • advertised.listeners=...kafka-01...
  • On kafka-02

    • node.id=2
    • listeners=...kafka-02...
    • advertised.listeners=...kafka-02...
  • On kafka-03

    • node.id=3
    • listeners=...kafka-03...
    • advertised.listeners=...kafka-03...
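
The three per-node edits listed above can also be applied with sed instead of by hand. This is a minimal sketch, assuming the base config was pasted unchanged and the hostnames follow the kafka-0N pattern. set_node_identity is a name I made up, and sed -i here is the GNU sed form found on Ubuntu and Amazon Linux.

```shell
# Hypothetical helper: rewrite the node-specific lines in place.
# $1 = path to custom-server.properties, $2 = node number (1, 2 or 3)
set_node_identity() {
  local file="$1" n="$2"
  local host="kafka-0$n"
  sed -i \
    -e "s|^node\.id=.*|node.id=$n|" \
    -e "s|^listeners=.*|listeners=PLAINTEXT://$host:9092,CONTROLLER://$host:9093|" \
    -e "s|^advertised\.listeners=.*|advertised.listeners=PLAINTEXT://$host:9092|" \
    "$file"
}

# Usage on kafka-02 (needs write access to the file, e.g. from a root shell):
#   set_node_identity /opt/kafka_2.13-4.3.0/config/custom-server.properties 2
```

All other lines, including controller.quorum.voters, are left untouched because they are identical on every node.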

Use your editor of choice:

sudo vi /opt/kafka_2.13-4.3.0/config/custom-server.properties

A note on advertised.listeners

Whatever you set here is what clients will use to connect. If you plan to connect from your laptop:

  • advertised.listeners must be reachable from your laptop (security group + routing + correct hostname/IP)

If you only connect from inside the VPC:

  • keep it on private hostnames/IPs (recommended)
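
To see exactly which addresses a broker will hand out to clients, you can pull them from the config file. This is a small sketch; advertised_endpoints is a hypothetical helper name, and it assumes advertised.listeners sits on a single line as in the config above.

```shell
# Hypothetical helper: print one host:port per advertised listener.
# $1 = path to the properties file
advertised_endpoints() {
  grep '^advertised\.listeners=' "$1" \
    | cut -d= -f2- \
    | tr ',' '\n' \
    | sed 's|^[A-Za-z0-9_]*://||'
}

# Usage:
#   advertised_endpoints /opt/kafka_2.13-4.3.0/config/custom-server.properties
```

Each printed host:port has to resolve and be reachable from wherever your client runs. If it is not, the client will reach the bootstrap address and then fail on the follow-up connection.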

Step 5: Generate a cluster ID (run once, then reuse on all nodes)

On any one node (e.g. kafka-01):

cd /opt/kafka_2.13-4.3.0
sudo -u kafka bin/kafka-storage.sh random-uuid

Example output:

CPRRAdoxRDaL5L1S2nwuJw

Save it somewhere safe. I like keeping it in a file so it’s not lost:

echo "<PASTE_YOUR_CLUSTER_ID_HERE>" | sudo tee /opt/kafka/cluster.id >/dev/null

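Now copy that value to all three nodes: run the same echo ... | sudo tee command on kafka-02 and kafka-03 so that /opt/kafka/cluster.id holds the exact same cluster ID everywhere (Step 6 reads it from that file).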
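
A quick guard against saving the placeholder text or a truncated paste: the sketch below checks that a value looks like the example output above. It assumes cluster IDs from random-uuid are 22 characters of URL-safe base64, which matches the sample. is_valid_cluster_id is a name I made up, not a Kafka tool.

```shell
# Hypothetical helper: succeed only if $1 looks like a generated cluster ID.
# Assumption: 22 characters from A-Z, a-z, 0-9, "_" and "-".
is_valid_cluster_id() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_-]{22}$'
}

# Usage:
#   is_valid_cluster_id "$(sudo cat /opt/kafka/cluster.id)" && echo "looks fine"
```

This only checks the shape of the value. It cannot tell you whether the three nodes hold the same ID.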


Step 6: Format the log directory (run on all 3 nodes)

This step initializes the KRaft metadata in your configured storage.

On each node:

cd /opt/kafka_2.13-4.3.0
# sudo cat: the kafka user's home directory may not be readable by your login user
KAFKA_CLUSTER_ID="$(sudo cat /opt/kafka/cluster.id)"
sudo -u kafka bin/kafka-storage.sh format \
  --cluster-id "$KAFKA_CLUSTER_ID" \
  --config config/custom-server.properties

If you re-run the command later and it complains the directory is already formatted, that’s normal (and a good sign).
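
To confirm a node was formatted with the ID you expect, you can read it back from the meta.properties file that the format step writes into the log directory. This is a minimal sketch; formatted_cluster_id is a hypothetical helper name.

```shell
# Hypothetical helper: print the cluster ID a log directory was formatted with.
# $1 = log directory (log.dirs in the config)
formatted_cluster_id() {
  local meta="$1/meta.properties"
  [ -f "$meta" ] || { echo "not formatted: $1" >&2; return 1; }
  grep '^cluster\.id=' "$meta" | cut -d= -f2
}

# Usage (run from a shell that can read the directory, e.g. as root):
#   formatted_cluster_id /opt/kafka/logDir
```

The value should be identical on all three nodes. If you script the format step, kafka-storage.sh format also accepts --ignore-formatted, which skips directories that are already formatted instead of failing.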


Step 7: Create systemd service on all 3 nodes

Create /etc/systemd/system/kafka.service:

[Unit]
Description=Apache Kafka (KRaft)
After=network.target

[Service]
Type=simple
User=kafka
Group=kafka
WorkingDirectory=/opt/kafka_2.13-4.3.0

# Quotes are required: systemd splits unquoted Environment= values on whitespace
Environment="KAFKA_HEAP_OPTS=-Xmx1G -Xms1G"
Environment="KAFKA_JVM_PERFORMANCE_OPTS=-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent"

ExecStart=/opt/kafka_2.13-4.3.0/bin/kafka-server-start.sh /opt/kafka_2.13-4.3.0/config/custom-server.properties
Restart=on-failure
RestartSec=3
LimitNOFILE=100000

[Install]
WantedBy=multi-user.target

Start and enable it:

sudo systemctl daemon-reload
sudo systemctl enable --now kafka.service
sudo systemctl status kafka.service


Step 8: Verify the KRaft quorum is healthy

From any node:

cd /opt/kafka_2.13-4.3.0
sudo -u kafka bin/kafka-metadata-quorum.sh \
  --bootstrap-controller kafka-01:9093 \
  describe --status

You should see all voters listed and one node acting as leader.

If this fails, it’s usually one of these:

  • 9093 blocked between nodes (security group issue)
  • wrong hostnames (bad /etc/hosts)
  • node.id mismatch vs controller.quorum.voters
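
The first two causes can be checked quickly from each node once the services are running. The sketch below uses bash's /dev/tcp feature and the coreutils timeout command; check_port and check_cluster_ports are names I made up. A FAIL can mean a blocked port, a bad hostname, or simply a broker that is not running yet.

```shell
# Hypothetical helper: succeed only if a TCP connection opens within 3 seconds.
# $1 = host, $2 = port
check_port() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Try both Kafka ports on all three nodes and report each result.
check_cluster_ports() {
  local host port failed=0
  for host in kafka-01 kafka-02 kafka-03; do
    for port in 9092 9093; do
      if check_port "$host" "$port"; then
        echo "OK   $host:$port"
      else
        echo "FAIL $host:$port"
        failed=1
      fi
    done
  done
  return "$failed"
}

# Usage: check_cluster_ports
```

Run it on every node, since security group rules can be asymmetric.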

Step 9: Create and test your first topic

Create a topic (replication factor 2 for 3 brokers is OK for a lab):

cd /opt/kafka_2.13-4.3.0
sudo -u kafka ./bin/kafka-topics.sh --create \
  --topic first-topic \
  --bootstrap-server kafka-01:9092 \
  --replication-factor 2 \
  --partitions 3

Describe the topic:

sudo -u kafka ./bin/kafka-topics.sh --describe \
  --bootstrap-server kafka-01:9092 \
  --topic first-topic

Start a producer:

sudo -u kafka ./bin/kafka-console-producer.sh \
  --topic first-topic \
  --bootstrap-server kafka-01:9092

In another terminal, start a consumer:

sudo -u kafka ./bin/kafka-console-consumer.sh \
  --topic first-topic \
  --from-beginning \
  --bootstrap-server kafka-01:9092

Type a few messages in the producer and confirm they show up in the consumer.
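
The same round trip can be run non-interactively, which is handy after a restart. This is a rough sketch, not an official tool: smoke_test is a name I made up, and the 15-second consumer timeout is an arbitrary choice. It pipes one unique message into the console producer, then reads the topic from the beginning and looks for that message.

```shell
# Hypothetical helper: produce one unique message and confirm it can be consumed.
# $1 = Kafka bin directory, $2 = bootstrap server, $3 = topic
smoke_test() {
  local bin_dir="$1" bootstrap="$2" topic="$3" msg
  msg="smoke-$(date +%s)-$$"
  printf '%s\n' "$msg" \
    | "$bin_dir/kafka-console-producer.sh" \
        --topic "$topic" --bootstrap-server "$bootstrap" || return 1
  # The consumer exits after 15 s without new messages; grep decides the result.
  "$bin_dir/kafka-console-consumer.sh" \
      --topic "$topic" --from-beginning --timeout-ms 15000 \
      --bootstrap-server "$bootstrap" 2>/dev/null \
    | grep -qxF "$msg"
}

# Usage:
#   smoke_test /opt/kafka_2.13-4.3.0/bin kafka-01:9092 first-topic && echo "round trip OK"
```

Pointing the bootstrap server at kafka-02 or kafka-03 in turn is a cheap way to confirm every broker accepts client traffic.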


Troubleshooting (quick hits)

  • Clients can’t connect

    • Check advertised.listeners (clients connect to that, not listeners)
    • Confirm 9092 inbound rules for the client source IP/SG
  • Nodes don’t form a quorum

    • Confirm 9093 is open between the Kafka nodes only
    • Verify controller.quorum.voters matches the correct hostnames and IDs
    • Ensure each node’s node.id is unique
  • Permission errors under /opt/kafka/logDir

    • Fix ownership: sudo chown -R kafka:kafka /opt/kafka/logDir
  • Service “starts” but immediately stops

    • Use sudo journalctl -u kafka.service -n 200 --no-pager to see the real error
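
For the quorum checks above, a small config lint can catch a node.id that is missing from the voter list before you start digging through logs. This is a minimal sketch; check_node_in_voters is a hypothetical name, and it only compares IDs, not hostnames.

```shell
# Hypothetical helper: verify that node.id appears in controller.quorum.voters.
# $1 = path to custom-server.properties
check_node_in_voters() {
  local file="$1" id voters
  id="$(grep '^node\.id=' "$file" | cut -d= -f2)"
  voters="$(grep '^controller\.quorum\.voters=' "$file" | cut -d= -f2)"
  # Prefix a comma so every voter entry starts with ",<id>@".
  case ",$voters" in
    *",$id@"*) echo "node.id=$id is listed in controller.quorum.voters" ;;
    *) echo "node.id=$id is NOT in controller.quorum.voters ($voters)"; return 1 ;;
  esac
}

# Usage:
#   check_node_in_voters /opt/kafka_2.13-4.3.0/config/custom-server.properties
```

Run it on each node. Three passing checks with three different IDs also confirm that the IDs are unique.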

Automation Script

You can automate the Linux-specific steps on your Ubuntu or Amazon Linux 2023 VM with the shell script in the GitHub repository adilansari488/kafka-multi-node-cluster-setup. Setup steps outside the VM, such as launching the EC2 instances and configuring security groups, must still be done manually.


Summary

If you have followed the above steps correctly, you should now have a Kafka cluster running with three nodes.


Feel free to give your feedback and suggestions.


HAPPY KAFKA 😊
