Ajinkya Singh

🧠 Kafka Broker vs Controller - Complete Guide

Understanding the Two Critical Roles in Kafka's Architecture

The Big Picture

In KRaft mode (production-ready since Kafka 3.3 and the only mode available in Kafka 4.0), a server can perform one or both of two distinct roles:

Role          | Analogy                | Primary Function
Broker 📦     | Library Shelf Manager  | Handles data storage and delivery
Controller 🎮 | Library Head Librarian | Manages catalog and coordinates operations

Quick Tip: Think of Kafka as a digital library system. Brokers are the staff who shelve and retrieve books, while Controllers are the head librarians who maintain the catalog and coordinate everything.


Evolution: Before and After

❌ The Old Way (ZooKeeper Mode, Removed in Kafka 4.0)

Problem: Two separate systems to manage!

┌─────────────────────┐
│  ZooKeeper Cluster  │ ← External dependency
│   (The Brain 🧠)    │    Must maintain separately
│                     │    Additional complexity
└──────────┬──────────┘
           │
           ↓ Manages metadata
┌─────────────────────┐
│   Kafka Brokers     │
│ (Data handlers only)│
│  • Store data       │
│  • Serve clients    │
└─────────────────────┘

Challenges:

  • Two systems to deploy, monitor, and maintain
  • ZooKeeper expertise required
  • Additional infrastructure costs
  • Complex failure scenarios

✅ The New Way (KRaft Mode, the Only Option in Kafka 4.0+)

Solution: Self-contained, all-in-one system!

┌───────────────────────────────────────┐
│      KAFKA CLUSTER (Self-Managed)     │
│                                       │
│  CONTROLLERS (Built-in Brain 🧠)      │
│  ┌──────┐  ┌──────┐  ┌──────┐         │
│  │Ctrl-1│  │Ctrl-2│  │Ctrl-3│         │
│  │Leader│  │Follow│  │Follow│         │
│  └──┬───┘  └──────┘  └──────┘         │
│     │                                 │
│     ↓ Manages metadata                │
│  ┌──────┐  ┌──────┐  ┌──────┐         │
│  │Brkr-1│  │Brkr-2│  │Brkr-3│         │
│  │ 📦   │  │ 📦   │  │ 📦   │         │
│  └──────┘  └──────┘  └──────┘         │
└───────────────────────────────────────┘

Benefits:

  • ✅ Single system to manage
  • ✅ No external dependencies
  • ✅ Faster metadata operations
  • ✅ Simpler deployment

Role 1: The Broker (Library Shelf Manager 📦)

What It Does

The Broker is the data handler - it stores and serves data to producers and consumers.

Real-World Analogy

Imagine a library shelf manager who:

  • Receives new books from publishers (messages from producers)
  • Organizes them on specific shelves (partitions)
  • Retrieves books when patrons request them (serves consumers)
  • Maintains backup copies in storage rooms (replication)

Key Responsibilities

1️⃣ Storing Data 💾

Each broker stores its topic partitions on disk as directories of log segment files:

/var/kafka/data/
├── product-catalog-0/
│   ├── 00000000000000000000.log  ← Actual message data (segment starting at offset 0)
│   ├── 00000000000000001000.log  ← Next segment (starting at offset 1000)
│   └── (latest offset: 1250)
│
├── product-catalog-2/
│   └── Follower replica (backup copy of the leader on Broker-3)
│
└── customer-events-1/
    └── (latest offset: 450)

2️⃣ Handling Producer Requests 📤

  • Receives messages from producers
  • Appends to partition logs
  • Assigns unique offsets
  • Sends acknowledgments back
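
The four steps above can be sketched with a tiny in-memory model. This is an illustration only, not Kafka's implementation: `PartitionLog` and its `append` method are hypothetical names, and a real broker writes to segment files on disk and honors the producer's `acks` setting before acknowledging.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Hypothetical in-memory stand-in for one partition's append-only log."""
    records: list = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append a message and return the offset assigned to it."""
        offset = len(self.records)  # offsets are sequential, starting at 0
        self.records.append(message)
        return offset  # handed back to the producer as the acknowledgment


log = PartitionLog()
first = log.append(b"order-created")
second = log.append(b"order-paid")
print(first, second)  # 0 1
```

Because the log is append-only, an offset never changes once assigned, which is what lets consumers use it as a stable position.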

3️⃣ Handling Consumer Requests 📥

  • Serves read (fetch) requests from consumers
  • Fetches data from partitions
  • Stores the offsets that consumer groups commit (in the internal __consumer_offsets topic)
  • Acts as group coordinator for consumer groups
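
A minimal sketch of the read path, assuming a plain Python list as the partition log and a dictionary as the committed-offset store. The `fetch` and `commit` helpers are hypothetical; in a real cluster the consumer tracks its own position and the broker persists committed offsets in `__consumer_offsets`.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins: a partition log and a table of committed offsets.
log: List[bytes] = [b"m0", b"m1", b"m2", b"m3"]
committed: Dict[Tuple[str, str, int], int] = {}


def fetch(records: List[bytes], offset: int, max_records: int) -> List[bytes]:
    """Return up to max_records messages starting at the requested offset."""
    return records[offset:offset + max_records]


def commit(group: str, topic: str, partition: int, next_offset: int) -> None:
    """Remember where the group should resume reading after a restart."""
    committed[(group, topic, partition)] = next_offset


batch = fetch(log, offset=0, max_records=2)
commit("data-analytics", "customer-events", 1, next_offset=len(batch))
resumed = fetch(log, committed[("data-analytics", "customer-events", 1)], 10)
print(batch, resumed)  # [b'm0', b'm1'] [b'm2', b'm3']
```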

4️⃣ Replication 🔄

  • Follower replicas copy data by fetching from the partition leader
  • Ensures data redundancy
  • Maintains in-sync replicas (ISR)
  • Handles failover scenarios
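
One way to see why the ISR matters is the high watermark: the offset below which every in-sync replica has the data, and up to which consumers may read. The sketch below is a simplified model with a hypothetical `high_watermark` helper and made-up log end offsets, not broker code.

```python
from typing import Dict, Set


def high_watermark(log_end_offsets: Dict[str, int], isr: Set[str]) -> int:
    """Offset below which records exist on every in-sync replica."""
    return min(log_end_offsets[replica] for replica in isr)


# Hypothetical state for one partition: the next offset each replica would write.
log_end_offsets = {"Broker-1": 1251, "Broker-2": 1251, "Broker-3": 1100}

# Broker-3 has fallen behind and been dropped from the ISR.
print(high_watermark(log_end_offsets, {"Broker-1", "Broker-2"}))  # 1251
# If Broker-3 were still in the ISR, it would hold the watermark back.
print(high_watermark(log_end_offsets, {"Broker-1", "Broker-2", "Broker-3"}))  # 1100
```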

5️⃣ Providing Metadata 📋

  • Tells clients about cluster topology
  • Shares partition locations
  • Provides leader information
  • Responds to bootstrap requests

Visual: Broker in Action

        Producers                  Consumers
            │                          │
            ↓ Write                    ↑ Read
    ┌───────────────────────────────────┐
    │         BROKER-1 (Server)         │
    ├───────────────────────────────────┤
    │                                   │
    │  product-catalog-0/ (Leader)      │
    │  ├─ Messages: 1-1250              │
    │  └─ Actively serving clients      │
    │                                   │
    │  product-catalog-2/ (Follower)    │
    │  └─ Syncing from Broker-3         │
    │                                   │
    │  customer-events-1/ (Leader)      │
    │  └─ Messages: 1-450               │
    └───────────────────────────────────┘

Role 2: The Controller (Head Librarian 🎮)

What It Does

The Controller is the brain/orchestrator - it manages cluster state and coordinates operations.

Real-World Analogy

Imagine a head librarian who:

  • Doesn't shelve books personally (no data handling)
  • Maintains the master catalog (metadata)
  • Decides which staff manages which sections (partition assignment)
  • Tracks all library locations and staff availability (broker health)
  • Coordinates responses when staff call in sick (leader election)
  • If the head librarian is unavailable, an assistant takes over immediately

Key Responsibilities

1️⃣ Cluster State Management 🗺️

The Controller maintains the single source of truth for cluster metadata:

Topic Registry:
  - Topic: "transaction-stream"
    Partitions: 6
    Replication Factor: 3
    Leaders:
      - Partition-0: Broker-1
      - Partition-1: Broker-2
      - Partition-2: Broker-3
      - Partition-3: Broker-1
      - Partition-4: Broker-2
      - Partition-5: Broker-3

Broker Registry:
  - Broker-1: ✅ Online, 15 partitions
  - Broker-2: ✅ Online, 18 partitions
  - Broker-3: ✅ Online, 17 partitions

Consumer group membership and committed offsets are not part of this metadata. They are handled by a group coordinator that runs on a broker (for example, Broker-1 coordinating the group "data-analytics" with members Consumer-A, Consumer-B, and Consumer-C).

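2️⃣ Leader Election ⭐

When a partition leader fails, the Controller:

  1. Detects the failure when the broker's heartbeats stop arriving (bounded by the broker session timeout)
  2. Selects a new leader from the in-sync replicas (ISR)
  3. Records the change in the cluster metadata log
  4. Makes the change available to all brokers, which replicate the metadata log
  5. Clients refresh their metadata and redirect to the new leader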
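
Step 2 can be sketched as a simple selection rule: take the first assigned replica that is both in the ISR and still alive. The `elect_leader` function is a hypothetical simplification that assumes unclean leader election is disabled, so a replica outside the ISR is never chosen.

```python
from typing import List, Optional, Set


def elect_leader(replicas: List[int], isr: Set[int], live: Set[int]) -> Optional[int]:
    """Pick the first assigned replica that is both in sync and alive."""
    for broker_id in replicas:
        if broker_id in isr and broker_id in live:
            return broker_id
    return None  # no eligible replica: the partition stays offline


# transaction-stream-0 is replicated on brokers 1, 2, 3; broker 1 led it and crashed.
replicas = [1, 2, 3]
isr = {1, 2, 3}
live = {2, 3}
print(elect_leader(replicas, isr, live))  # 2
```

Restricting the choice to the ISR is what keeps acknowledged data safe: only replicas known to be caught up can take over.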

Example Scenario:

Before:  transaction-stream-0 Leader = Broker-1 ✅
         Broker-1 crashes! 💥
After:   transaction-stream-0 Leader = Broker-2 ⭐ (promoted!)
         Time taken: typically seconds (failure detection depends on the broker session timeout)

3️⃣ Cluster Change Notification 📢

The Controller records every change in the metadata log, which all brokers replicate:

  • 🆕 New topic created → brokers learn about the new partitions
  • ⚠️ Broker goes down → new leaders are elected for its partitions
  • ⭐ New leader elected → update routing
  • 🔧 Configuration changed → apply updates

4️⃣ Broker Lifecycle Management 🔄

  • Manages broker registration
  • Handles broker join/leave events
  • Smooth handoff during shutdowns
  • Updates cluster membership

5️⃣ Administrative Operations ⚙️

  • Topic creation/deletion
  • Partition reassignment
  • Configuration changes
  • Quota management

Visual: Controller Quorum

    CONTROLLER QUORUM (High Availability)

┌──────────┐  ┌──────────┐  ┌──────────┐
│  Ctrl-1  │  │  Ctrl-2  │  │  Ctrl-3  │
│ (LEADER) │◄─┤(Follower)│◄─┤(Follower)│
│    ⭐    │  │          │  │          │
├──────────┤  ├──────────┤  ├──────────┤
│ • Makes  │  │ • Standby│  │ • Standby│
│   all    │  │ • Ready  │  │ • Ready  │
│   decis- │  │   to     │  │   to     │
│   ions   │  │   take   │  │   take   │
│ • Notif- │  │   over   │  │   over   │
│   ies    │  │ • Syncs  │  │ • Syncs  │
│   brokers│  │   data   │  │   data   │
└────┬─────┘  └──────────┘  └──────────┘
     │
     ↓ Commands & notifications
┌────────────────────────────────────┐
│            BROKERS                 │
│  ┌────┐    ┌────┐    ┌────┐        │
│  │Br-1│    │Br-2│    │Br-3│        │
│  └────┘    └────┘    └────┘        │
└────────────────────────────────────┘

Important Notes:

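  • Use an odd number of controllers (3 or 5 are typical); an even count adds a node without tolerating any more failures
  • Uses a Raft-based consensus protocol (KRaft)
  • Requires a majority of the configured controllers to function (e.g., 2 out of 3)
  • If a majority is lost, the cluster cannot make metadata decisions (no new topics, no leader elections) until it is restored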
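
The arithmetic behind these notes is short enough to check directly. The helper names below are illustrative, not part of any Kafka API.

```python
def majority(voters: int) -> int:
    """Smallest number of controllers that forms a majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many controllers can be lost while a majority remains."""
    return voters - majority(voters)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

Four controllers tolerate one failure, the same as three, which is why odd sizes are preferred. The majority is counted against the configured voter set, not the nodes currently alive: a 3-controller quorum that loses one node still needs 2 votes, and the two survivors can supply them.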

Combined vs Dedicated Roles

Option 1: Combined Role (Development/Testing)

Setup: Each node runs BOTH broker + controller

┌─────────────────────┐
│     NODE-1          │
│ ┌─────────────────┐ │
│ │   Controller    │ │
│ │   (Leader) ⭐   │ │
│ └─────────────────┘ │
│         +           │
│ ┌─────────────────┐ │
│ │     Broker      │ │
│ │   (Data 📦)     │ │
│ └─────────────────┘ │
└─────────────────────┘

Same for NODE-2 and NODE-3
(with follower controllers)

Pros:

  • ✅ Simple setup
  • ✅ Fewer machines (cost-effective)
  • ✅ Good for development/testing

Cons:

  • ❌ Resource contention (metadata + data compete)
  • ❌ Less stable under high load
  • ❌ Harder to scale independently
  • ❌ "Noisy neighbor" problem
  • ❌ Not recommended by the Kafka documentation for critical deployments

Best For:

  • Local development
  • Testing environments
  • Small, non-critical deployments
  • Low-traffic applications

Option 2: Dedicated Roles (Production)

Setup: Separate controller nodes from broker nodes

DEDICATED CONTROLLERS (Metadata Only)
┌──────────┐  ┌──────────┐  ┌──────────┐
│  Ctrl-1  │  │  Ctrl-2  │  │  Ctrl-3  │
│ (Leader) │  │(Follower)│  │(Follower)│
├──────────┤  ├──────────┤  ├──────────┤
│ 4GB RAM  │  │ 4GB RAM  │  │ 4GB RAM  │
│ 2 CPU    │  │ 2 CPU    │  │ 2 CPU    │
│ Small VM │  │ Small VM │  │ Small VM │
└────┬─────┘  └──────────┘  └──────────┘
     │
     ↓ Manages

DEDICATED BROKERS (Data Only)
┌──────────┐  ┌──────────┐  ┌──────────┐
│  Brkr-1  │  │  Brkr-2  │  │  Brkr-3  │
│   📦     │  │   📦     │  │   📦     │
├──────────┤  ├──────────┤  ├──────────┤
│ 64GB RAM │  │ 64GB RAM │  │ 64GB RAM │
│ 16 CPU   │  │ 16 CPU   │  │ 16 CPU   │
│ TB disk  │  │ TB disk  │  │ TB disk  │
└──────────┘  └──────────┘  └──────────┘

... scale to 100+ brokers as needed

Pros:

  • ✅ Maximum stability (isolated operations)
  • ✅ Independent scaling
  • ✅ Optimized resources per role
  • ✅ Better fault tolerance
  • ✅ Industry standard for production
  • ✅ Can upgrade independently

Cons:

  • ❌ More machines (higher cost)
  • ❌ More complex setup
  • ❌ Overkill for small deployments

Best For:

  • Production environments
  • High-traffic applications
  • Enterprise deployments
  • Systems requiring 24/7 uptime
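
In configuration terms, the difference between the two options comes down to the `process.roles` setting. The sketch below renders only the role-related lines of a `server.properties` file; the `render_server_properties` helper, the host names, the port, and the node IDs are all assumptions for illustration, and a real file also needs settings such as `listeners`, `controller.listener.names`, and `log.dirs`.

```python
from typing import List


def render_server_properties(node_id: int, roles: List[str], voters: List[str]) -> str:
    """Render the role-related lines of a KRaft server.properties file."""
    lines = [
        f"process.roles={','.join(roles)}",
        f"node.id={node_id}",
        f"controller.quorum.voters={','.join(voters)}",
    ]
    return "\n".join(lines)


# Hypothetical hosts; each voter entry has the form id@host:port.
voters = ["1@ctrl-1:9093", "2@ctrl-2:9093", "3@ctrl-3:9093"]

print(render_server_properties(1, ["broker", "controller"], voters))  # combined node
print(render_server_properties(1, ["controller"], voters))  # dedicated controller
print(render_server_properties(101, ["broker"], voters))  # dedicated broker
```

Keeping broker IDs in a separate range from controller IDs (101 here) is a common convention rather than a requirement; it makes logs and metrics easier to read.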

Real-World Examples

Example: Controller Leader Failover

Scenario: Main controller experiences hardware failure

BEFORE (Normal Operations):
Controller-1 (LEADER) ⭐ → Managing all metadata
Controller-2 (Follower) → Standby backup
Controller-3 (Follower) → Standby backup

━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Hardware failure on Controller-1! 💥
━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Detection (within seconds):
Controller-2: "Leader timeout detected!"
Controller-3: "Leader timeout detected!"
        │
        ↓ Raft Consensus Election

AFTER (2-3 seconds):
Controller-1 (OFFLINE) 💀
Controller-2 (LEADER) ⭐ → PROMOTED! Takes over all duties
Controller-3 (Follower) → Continues standby

✅ Produce and consume traffic continues; only metadata changes pause during the election
✅ No data lost!
✅ Brokers still serving all requests!
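
The election step can be sketched as a vote count. This is a simplified model of Raft-style voting, not KRaft's actual code: `run_election` is a hypothetical helper, and it assumes each voter grants at most one vote per epoch, which is why two candidates can never both reach a majority in the same epoch.

```python
from typing import Dict, Optional, Set


def run_election(epoch_votes: Dict[str, Optional[str]], voters: Set[str]) -> Optional[str]:
    """Return the candidate backed by a majority of configured voters, if any.

    epoch_votes maps each voter to the single candidate it voted for in this
    epoch, or None if the voter is offline.
    """
    needed = len(voters) // 2 + 1
    tally: Dict[str, int] = {}
    for voter in voters:
        choice = epoch_votes.get(voter)
        if choice is not None:
            tally[choice] = tally.get(choice, 0) + 1
    winners = [c for c, count in tally.items() if count >= needed]
    return winners[0] if winners else None


voters = {"Controller-1", "Controller-2", "Controller-3"}

# Controller-1 is down; Controller-3 votes for Controller-2, which votes for itself.
print(run_election(
    {"Controller-1": None, "Controller-2": "Controller-2", "Controller-3": "Controller-2"},
    voters,
))  # Controller-2

# Split vote: each survivor votes for itself, so nobody reaches 2 of 3.
print(run_election(
    {"Controller-1": None, "Controller-2": "Controller-2", "Controller-3": "Controller-3"},
    voters,
))  # None
```

A split vote elects nobody for that epoch; the candidates back off and retry in a later epoch. An even number of surviving nodes can therefore delay an election, but it cannot produce two leaders.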

🔍 Question on KRaft's Leader Election Algorithm

In KRaft's leader election algorithm, correctness proofs often assume an odd number of nodes to avoid symmetry and tie-breaking issues.

But in real distributed systems, nodes can fail at any time.

👉 If a node fails mid-execution and the system is left with an even number of active nodes, how does the algorithm still guarantee that a unique leader is elected?

Would love to hear your views and interpretations on this!
