Please read the last two articles in my Kafka series before this one; this part gets serious.
The Big Picture: Two Brains Working Together
Think of Kafka as a well-organized company with two main components:
┌─────────────────────────────────────────────┐
│                KAFKA CLUSTER                │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │  CONTROLLERS (The Brain 🧠)           │  │
│  │  • Manage who does what               │  │
│  │  • Track what's happening             │  │
│  │  • Make decisions                     │  │
│  └───────────────────┬───────────────────┘  │
│                      │ Commands & Updates   │
│  ┌───────────────────▼───────────────────┐  │
│  │  BROKERS (The Workers 💪)             │  │
│  │  • Store the actual data              │  │
│  │  • Serve producers and consumers      │  │
│  │  • Follow controller's instructions   │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘
💡 Simple Analogy: Controllers are like managers who plan and coordinate, while Brokers are employees who do the actual work.
Part 1: SETUP - Controllers Organize Everything
Step 1: Controllers Start and Elect Leader
When Kafka starts in KRaft mode (Kafka without ZooKeeper), the controller nodes form a quorum and use a Raft-based election to choose one leader, called the active controller:
Controller-1     Controller-2     Controller-3
     │                │                │
     └──────── Raft Election ──────────┘
                      │
             One becomes LEADER
                      ↓
              ┌────────────────┐
              │  Controller-1  │
              │    (LEADER)    │ ← Makes all decisions!
              └────────────────┘
💡 Simple: Like choosing a class monitor who manages everything.
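Raft-style elections are decided by majority vote, which is also why controller quorums usually have an odd number of members. The snippet below is only a sketch of that arithmetic; the function names are made up for illustration and nothing here is Kafka's own code.

```python
def majority(voters: int) -> int:
    """Smallest number of votes that is more than half of the voters."""
    return voters // 2 + 1


def wins_election(votes_received: int, voters: int) -> bool:
    """A candidate becomes leader only with a majority of all voters."""
    return votes_received >= majority(voters)


def tolerated_failures(voters: int) -> int:
    """How many controllers can be down while a majority is still reachable."""
    return voters - majority(voters)


# With the 3 controllers from the diagram: 2 votes win, 1 failure is survivable.
print(majority(3), tolerated_failures(3))
```

Going from 3 to 4 controllers does not help here: a majority of 4 is 3, so the quorum still survives only one failure.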
Step 2: Controllers Create Metadata Registry
The Controller Leader creates a comprehensive "notebook" of everything happening in the cluster:
┌──────────────────────────────────────────────┐
│              METADATA REGISTRY               │
├──────────────────────────────────────────────┤
│                                              │
│  TOPICS:                                     │
│  • "orders" → 3 partitions                   │
│  • "payments" → 2 partitions                 │
│                                              │
│  BROKERS:                                    │
│  • Broker-1 → Alive ✅ (IP: 192.168.1.10)    │
│  • Broker-2 → Alive ✅ (IP: 192.168.1.11)    │
│  • Broker-3 → Alive ✅ (IP: 192.168.1.12)    │
│                                              │
│  WHO'S IN CHARGE (Leaders):                  │
│  • orders-partition-0 → Broker-1             │
│  • orders-partition-1 → Broker-2             │
│  • orders-partition-2 → Broker-3             │
│                                              │
│  BACKUPS (Followers):                        │
│  • orders-partition-0 → Broker-2, Broker-3   │
│  • orders-partition-1 → Broker-3, Broker-1   │
│  • orders-partition-2 → Broker-1, Broker-2   │
└──────────────────────────────────────────────┘
💡 Simple: Like a school register showing which classes exist, which teachers are present, who teaches which subject, and who are the substitute teachers.
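To make the registry less abstract, here is one way the same information could be modelled as plain data. This is a teaching sketch only: `ClusterMetadata`, `PartitionInfo`, and `leader_for` are hypothetical names, and Kafka's real metadata records look different.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionInfo:
    leader: int            # broker id of the partition leader
    followers: list[int]   # broker ids holding backup copies


@dataclass
class ClusterMetadata:
    # broker id -> host address
    brokers: dict[int, str] = field(default_factory=dict)
    # topic name -> partition number -> leader/follower info
    topics: dict[str, dict[int, PartitionInfo]] = field(default_factory=dict)

    def leader_for(self, topic: str, partition: int) -> str:
        """Return the address of the broker leading the given partition."""
        info = self.topics[topic][partition]
        return self.brokers[info.leader]


# Values copied from the registry diagram above.
registry = ClusterMetadata(
    brokers={1: "192.168.1.10", 2: "192.168.1.11", 3: "192.168.1.12"},
    topics={
        "orders": {
            0: PartitionInfo(leader=1, followers=[2, 3]),
            1: PartitionInfo(leader=2, followers=[3, 1]),
            2: PartitionInfo(leader=3, followers=[1, 2]),
        }
    },
)

print(registry.leader_for("orders", 1))
```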
Step 3: Controller Tells Brokers Their Jobs
The controller assigns specific roles to each broker. With three replicas per partition, each broker leads one partition and backs up the other two:
Controller → Broker-1: "You are the LEADER for orders-partition-0"
                       "You are a BACKUP for orders-partition-1 and orders-partition-2"
Controller → Broker-2: "You are the LEADER for orders-partition-1"
                       "You are a BACKUP for orders-partition-0 and orders-partition-2"
Controller → Broker-3: "You are the LEADER for orders-partition-2"
                       "You are a BACKUP for orders-partition-0 and orders-partition-1"
💡 Simple: Like a manager assigning tasks to employees.
📤 Part 2: PRODUCER SENDS DATA
Step 4: Producer Wants to Send Message
Your application (Producer) has data to send:
┌──────────────────────────┐
│ I have a message:        │
│ • Topic: "orders"        │
│ • Key: "user_123"        │
│ • Value: {order data}    │
│                          │
│ "Where do I send this?"  │
└──────────────────────────┘
💡 Simple: Like having a letter but not knowing which post office to use.
Step 5: Producer Asks ANY Broker for Information
The producer can connect to any broker to get routing information:
Producer ──────────► Broker-1
"Where do I send     "Let me check
 messages for         my copy of
 'orders' topic?"     the registry..."
Key Point: Every broker has a copy of the controller's metadata!
┌─────────────────────────────────────┐
│ Broker-1's Copy of Metadata:        │
│ • orders-partition-0 → Broker-1     │
│ • orders-partition-1 → Broker-2     │
│ • orders-partition-2 → Broker-3     │
└─────────────────────────────────────┘
💡 Simple: Like asking a postman for directions - he has a map (copy of registry).
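Step 6: Producer Calculates Which Partition
When a message has a key, the producer's client library automatically determines the target partition:
1. Hash the key: hash("user_123") = 456789 (an illustrative value; the Java client's default partitioner uses a murmur2 hash)
2. Take the remainder after dividing by the partition count: 456789 % 3 = 0
3. Result: Goes to Partition 0
Because the same key always produces the same hash, all messages for "user_123" land in the same partition and keep their order.
💡 Simple: Like a sorting machine that knows exactly which box to put each item in.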
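The hash-then-modulo idea can be tried in a few lines. Note the assumption: this sketch uses `zlib.crc32` as a stand-in hash because it ships with Python, so the partition numbers it prints will not match what a real Kafka client computes. `choose_partition` is a hypothetical helper name.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: hash the key, then take the remainder.

    ASSUMPTION: zlib.crc32 is only a stand-in; it is not the hash Kafka uses.
    """
    return zlib.crc32(key) % num_partitions


partition = choose_partition(b"user_123", 3)
print(f"user_123 -> partition {partition}")

# The same key always maps to the same partition while the count stays at 3.
assert partition == choose_partition(b"user_123", 3)

# The arithmetic from the steps above:
print(456789 % 3)
```

One consequence worth knowing: if the number of partitions changes, the remainder changes too, so existing keys may start landing in different partitions.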
Step 7: Producer Sends to Correct Broker
Now the producer sends directly to the partition leader:
Producer ──────────► Broker-1 (Leader for P0)
Broker-1 receives:
1. ✅ Validates it's the leader for P0
2. 💾 Writes to disk: /kafka/data/orders-0/
3. Assigns offset: 1251
4. ✅ Sends "OK!" back to producer
💡 Simple: Like mailing a letter to the correct post office that handles your area.
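A partition behaves like an append-only list where each new record gets the next number. The toy class below sketches that idea in memory; `PartitionLog` is a hypothetical name, and a real broker writes segment files to disk rather than keeping a Python list.

```python
class PartitionLog:
    """Toy in-memory model of one partition's append-only log."""

    def __init__(self, next_offset: int = 0):
        self.records = []               # list of (offset, key, value)
        self.next_offset = next_offset  # offset the next record will receive

    def append(self, key: str, value: str) -> int:
        """Store a record at the end of the log and return its offset."""
        offset = self.next_offset
        self.records.append((offset, key, value))
        self.next_offset += 1
        return offset

    def read(self, from_offset: int, max_records: int = 100):
        """Return up to max_records records starting at from_offset."""
        matching = [r for r in self.records if r[0] >= from_offset]
        return matching[:max_records]


# Start where the example above is: the next free offset is 1251.
log = PartitionLog(next_offset=1251)
first = log.append("user_123", "order-A")
second = log.append("user_456", "order-B")
print(first, second)
```

Offsets only ever grow, and existing records are never rewritten, which is what lets consumers remember their place with a single number per partition.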
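Step 8: Followers Replicate from the Leader (Background)
The leader does not push copies. Instead, each follower continuously fetches new messages from the leader, much like a consumer does:
Broker-2 (Follower P0) ──FETCH "anything from 1251?"──► Broker-1 (Leader P0)
Broker-2 (Follower P0) ◄────── message 1251 ─────────── Broker-1 (Leader P0)
   Copied! ✅
Broker-3 (Follower P0) ──FETCH "anything from 1251?"──► Broker-1 (Leader P0)
Broker-3 (Follower P0) ◄────── message 1251 ─────────── Broker-1 (Leader P0)
   Copied! ✅
When the producer's "OK!" from Step 7 is sent depends on its acks setting: with acks=1 the leader answers right after its own write, with acks=all it waits until the in-sync followers have the message too.
Result: Data now exists on 3 brokers! 💪
💡 Simple: Like making photocopies of important documents and storing them in different safes.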
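One detail behind replication is the "high watermark": consumers are only shown messages that every in-sync replica already has. The sketch below illustrates the calculation with made-up broker names; `high_watermark` is a hypothetical helper, and the real broker logic has more moving parts.

```python
def high_watermark(log_end_offsets: dict[str, int], in_sync: set[str]) -> int:
    """Lowest log end offset among the in-sync replicas.

    A log end offset is the offset the NEXT record will get, so records
    with offsets below the high watermark exist on every in-sync replica.
    """
    return min(log_end_offsets[broker] for broker in in_sync)


in_sync = {"broker-1", "broker-2", "broker-3"}

# Leader and Broker-2 already hold offset 1251; Broker-3 has not fetched it yet.
log_end = {"broker-1": 1252, "broker-2": 1252, "broker-3": 1251}
print(high_watermark(log_end, in_sync))  # offset 1251 is not yet visible

# Broker-3 catches up.
log_end["broker-3"] = 1252
print(high_watermark(log_end, in_sync))  # now offset 1251 is visible
```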
📥 Part 3: CONSUMER READS DATA
Step 9: Consumer Wants to Read Messages
Your application (Consumer) wants to read data:
┌──────────────────────────┐
│ I want to read from:     │
│ • Topic: "orders"        │
│ • Group: "my-group"      │
│                          │
│ "How do I start?"        │
└──────────────────────────┘
💡 Simple: Like wanting to read a book but not knowing which library has it.
Step 10: Consumer Connects and Gets Metadata
The consumer connects to any broker to get cluster information:
Consumer ──────────► Broker-2
"Tell me everything  "Here's the full
 about 'orders'"      cluster map!"
Broker returns:
┌─────────────────────────────────────┐
│ Topic "orders" info:                │
│ • Partition 0 → Leader: Broker-1    │
│ • Partition 1 → Leader: Broker-2    │
│ • Partition 2 → Leader: Broker-3    │
└─────────────────────────────────────┘
💡 Simple: Like getting a mall directory showing which stores are on which floor.
Step 11: Consumer Joins Group (Group Coordinator Coordinates)
The consumer joins a consumer group for coordinated reading. Group membership is not handled by the controller; each group is managed by a group coordinator, which is one of the brokers:
Consumer ──────────► Group Coordinator (a broker)
"I want to join      "Let me get the
 group 'my-group'"    partitions assigned..."
Assignment for the group:
┌─────────────────────────────────────┐
│ Group "my-group" has 1 consumer     │
│ Topic "orders" has 3 partitions     │
│                                     │
│ Assignment:                         │
│ Consumer-A → [P0, P1, P2]           │
│ (gets all 3 partitions)             │
└─────────────────────────────────────┘
💡 Simple: Like a teacher assigning homework to students.
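How partitions get split across group members depends on the configured assignment strategy. The sketch below shows a simple range-style split for a single topic; `assign_range` is a hypothetical function written for this article, not Kafka's assignor code.

```python
def assign_range(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Give each consumer a contiguous block of partitions.

    Consumers are sorted by name; the first ones receive one extra
    partition when the split is not even.
    """
    members = sorted(consumers)
    base, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for index, member in enumerate(members):
        count = base + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


# One consumer gets everything, as in the box above.
print(assign_range(3, ["Consumer-A"]))

# A second consumer joins: the partitions are split between the two.
print(assign_range(3, ["Consumer-A", "Consumer-B"]))
```

With more consumers than partitions, the extra consumers receive an empty list and sit idle, which is why adding consumers beyond the partition count does not speed anything up.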
Step 12: Consumer Reads and Tracks Progress
How Consumer Manages Reading Position:
STARTUP (once, and again after every rebalance):
Consumer → Broker: "Where did I leave off?"
Broker → Consumer: "P0: offset 1250, P1: offset 890, P2: offset 2100"
Consumer stores these positions in memory ✅
CONTINUOUS READING:
Consumer fetches using the positions in its own memory (no offset lookups on the broker!)
• Fetch from P0: offset 1250 → 1300 → 1350 (tracked in memory)
• Fetch from P1: offset 890 → 940 → 990 (tracked in memory)
• Fetch from P2: offset 2100 → 2150 → 2200 (tracked in memory)
PERIODIC COMMIT (every 5 seconds with auto-commit, or after each batch if you commit manually):
Consumer → Broker: "Save progress: P0=1350, P1=990, P2=2200"
🔥 KEY POINTS:
- Consumer reads its saved position from the broker at startup (and after a rebalance), not on every fetch
- Tracks current position IN MEMORY while reading
- Saves progress back to the broker PERIODICALLY (every 5 seconds by default when auto-commit is enabled)
💡 Simple: Like checking your bookmark when opening a book, remembering your page while reading, and updating the bookmark occasionally.
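The difference between the in-memory position and the committed position is easy to miss, so here is a small simulation of it. `OffsetTracker` is a hypothetical class for illustration; it involves no real broker and is not the consumer client's API.

```python
class OffsetTracker:
    """Toy model of one consumer's reading positions."""

    def __init__(self, committed: dict[int, int]):
        self.committed = dict(committed)  # what the broker has saved
        self.position = dict(committed)   # what the consumer holds in memory

    def advance(self, partition: int, records_read: int) -> None:
        """Move the in-memory position forward after a fetch."""
        self.position[partition] += records_read

    def commit(self) -> None:
        """Save the in-memory positions back to the 'broker'."""
        self.committed = dict(self.position)

    def uncommitted(self) -> dict[int, int]:
        """Records read since the last commit, per partition."""
        return {p: self.position[p] - self.committed[p] for p in self.position}


# Starting positions from the example above.
tracker = OffsetTracker({0: 1250, 1: 890, 2: 2100})
tracker.advance(0, 50)
tracker.advance(0, 50)
print(tracker.position[0], tracker.committed[0])
print(tracker.uncommitted())

tracker.commit()
print(tracker.committed[0])
```

If the consumer crashed before the `commit()` call, whoever takes over partition 0 would start again from 1250 and see those 100 records a second time.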
Step 13: Consumer Pulls Data from Brokers
Consumer fetches data in parallel from all partition leaders:
Consumer ─────────► Broker-1 (P0) → Returns 50 events
         ─────────► Broker-2 (P1) → Returns 50 events
         ─────────► Broker-3 (P2) → Returns 50 events
Total: 150 events fetched!
        ↓
Process all events
(Your business logic)
        ↓
All done! ✅
💡 Simple: Like reading from multiple books at the same time, keeping track of your progress in each.
🔧 Part 4: FAILURE HANDLING
SCENARIO A: Broker Fails (Controller Handles)
Broker-1 crashes! 💥
STEP 1: Controller Detects Failure
Controller: "Broker-1 stopped sending heartbeats!"
STEP 2: Controller Checks Metadata
┌─────────────────────────────────────────┐
│ Partition 0 (orders):                   │
│ • Leader: Broker-1 💀                   │
│ • Followers: Broker-2 ✅, Broker-3 ✅   │
│                                         │
│ Need new leader for P0!                 │
└─────────────────────────────────────────┘
STEP 3: Controller Elects New Leader
The new leader is picked from the followers that are fully caught up (the in-sync replicas):
Controller → Broker-2: "You are now LEADER for P0!"
Broker-2: "OK! I'm the new leader!"
STEP 4: Controller Updates Metadata
┌─────────────────────────────────────────┐
│ Partition 0 (orders):                   │
│ • Leader: Broker-2 ✅ (NEW!)            │
│ • Followers: Broker-3 ✅                │
│ • Broker-1: 💀 (removed)                │
└─────────────────────────────────────────┘
STEP 5: Controller Notifies Everyone
Controller → All: "Metadata changed! P0 leader is now Broker-2!"
Total time: usually a few seconds, most of it spent detecting the failure ⚡
💡 Simple: Like when a teacher is absent, the principal quickly assigns a substitute.
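The election in Step 3 can be pictured as "take the first replica in the partition's replica list that is both alive and in sync". The function below is a sketch of that rule with hypothetical names; it leaves out options such as unclean leader election.

```python
from typing import Optional


def elect_leader(replicas: list[int], in_sync: set[int], alive: set[int]) -> Optional[int]:
    """Pick the first replica that is both in sync and alive.

    Returns None when no such replica exists, in which case the
    partition stays without a leader until one comes back.
    """
    for broker_id in replicas:
        if broker_id in in_sync and broker_id in alive:
            return broker_id
    return None


# Partition 0 of "orders": replicas on brokers 1, 2, 3; Broker-1 just died.
replicas_p0 = [1, 2, 3]
new_leader = elect_leader(replicas_p0, in_sync={1, 2, 3}, alive={2, 3})
print(new_leader)
```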
SCENARIO B: Consumer Fails (Group Coordinator Rebalances)
Consumer-A crashes! 💥
The controller is not involved here. Consumer groups are managed by the group coordinator, a broker responsible for that group.
STEP 1: Group Coordinator Detects
Coordinator: "No heartbeat from Consumer-A within the session timeout!"
STEP 2: Coordinator Triggers Rebalance
Coordinator → Other Consumers: "Group 'my-group' is rebalancing, please rejoin!"
STEP 3: Partitions Are Reassigned
Before:
Consumer-A: [P0, P1] 💀 (dead)
Consumer-B: [P2] ✅
After:
Consumer-B: [P0, P1, P2] ✅ (takes over all!)
STEP 4: Coordinator Notifies Consumer-B
Coordinator → Consumer-B: "You now read P0, P1, P2"
Consumer-B resumes reading:
✅ Loads last saved positions
✅ Continues from where Consumer-A last committed
✅ No messages lost! (Messages Consumer-A had read but not yet committed may be delivered again.)
💡 Simple: Like when one waiter is sick, another takes over their tables.
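Failure detection in Step 1 boils down to comparing "time since the last heartbeat" with a timeout. The sketch below shows only that comparison; `expired_members` is a hypothetical helper, and the timestamps and the 45-second timeout are illustrative values rather than settings read from a cluster.

```python
def expired_members(last_heartbeat: dict[str, float], now: float, session_timeout: float) -> list[str]:
    """Return the members whose last heartbeat is older than the timeout."""
    return sorted(
        member
        for member, seen_at in last_heartbeat.items()
        if now - seen_at > session_timeout
    )


# Seconds on an arbitrary clock: Consumer-A went quiet, Consumer-B is fine.
last_seen = {"Consumer-A": 100.0, "Consumer-B": 141.0}
dead = expired_members(last_seen, now=146.0, session_timeout=45.0)
print(dead)
```

A shorter timeout detects crashes faster but also makes the group more likely to rebalance when a healthy consumer is merely slow.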
🎨 The Complete Picture
┌────────────────────────────────────────────────────────┐
│                     KAFKA CLUSTER                      │
│                                                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │  CONTROLLERS (The Managers 🧠)                   │  │
│  │                                                  │  │
│  │  Controller-1    Controller-2    Controller-3    │  │
│  │  (Leader)        (Follower)      (Follower)      │  │
│  │                                                  │  │
│  │  Maintains METADATA REGISTRY                     │  │
│  │  • Who's alive?                                  │  │
│  │  • Who's the leader?                             │  │
│  │  • What topics exist?                            │  │
│  └────────────────────────┬─────────────────────────┘  │
│                           │ Commands & Notifications   │
│  ┌────────────────────────▼─────────────────────────┐  │
│  │  BROKERS (The Workers 💪)                        │  │
│  │                                                  │  │
│  │  Broker-1        Broker-2        Broker-3        │  │
│  │  P0 (L)          P1 (L)          P2 (L)          │  │
│  │  P1 (F)          P2 (F)          P0 (F)          │  │
│  │  P2 (F)          P0 (F)          P1 (F)          │  │
│  │                                                  │  │
│  │  Stores & Serves Data                            │  │
│  └──────────────────────────────────────────────────┘  │
│         ▲               ▲               ▲              │
└─────────┼───────────────┼───────────────┼──────────────┘
          │               │               │
     Producer-1      Producer-2      Producer-3
          │               │               │
          └───────────────┴───────────────┘
                          │
         All query metadata from any broker
                          │
               ┌──────────┴──────────┐
               │                     │
          Consumer-A            Consumer-B
          Group: g1             Group: g1
          Reads: P0,P1          Reads: P2
Key Roles Summary
Controller (The Boss)
MANAGES:
- ✅ Who's alive? (broker health)
- ✅ Who's in charge? (partition leaders)
- ✅ What exists? (topics, partitions)
DECIDES:
- ✅ New leader when broker fails
- ✅ Where new partitions go
NOTIFIES:
- ✅ Tells brokers their jobs
- ✅ Updates everyone on changes
Broker (The Worker 👷)
STORES:
- ✅ Actual data on disk
- ✅ Log files for partitions
- ✅ Copy of metadata (from controller)
SERVES:
- ✅ Producer write requests
- ✅ Consumer read requests
- ✅ Metadata queries
COORDINATES (as group coordinator):
- ✅ Who reads what? (consumer group membership and assignments)
- ✅ Rebalancing when consumers join or leave
- ✅ Saved consumer offsets
REPLICATES:
- ✅ Serves data to followers (when leader)
- ✅ Syncs with leader (when follower)
- ✅ Reports status to controller
Producer (The Sender 📤)
DOES:
- ✅ Creates messages
- ✅ Queries metadata (from any broker)
- ✅ Calculates partition (key hash)
- ✅ Sends to correct broker
DOESN'T CARE ABOUT:
- ❌ Controllers (transparent)
- ❌ Followers (writes only to leader)
- ❌ Other producers
Consumer (The Receiver 📥)
DOES:
- ✅ Joins consumer group
- ✅ Gets partition assignment
- ✅ Tracks reading position in memory
- ✅ Pulls from leader brokers
- ✅ Saves progress periodically
DOESN'T CARE ABOUT:
- ❌ How the group coordinator assigns partitions
- ❌ Follower replicas
- ❌ Other consumer groups
🎯 The Magic: Why This Works So Well
1. Separation of Concerns
- CONTROLLERS think 🧠
- BROKERS work 💪
- Controllers don't handle data
- Brokers don't make cluster-wide decisions
- Like: Managers plan, workers execute
2. Everything Has a Backup
- Controllers: 3 copies (1 leader + 2 followers)
- Partitions: 3 copies (1 leader + 2 followers)
- Metadata: All brokers have a copy
- Result: If anything fails, backups take over!
3. Distributed = Fast + Reliable
- Multiple brokers = Parallel processing
- Multiple partitions = Load distribution
- Multiple replicas = Data survives a broker failure
- Like: Many checkout lanes at a store
4. Automatic Recovery
- Broker fails → Controller elects new leader (seconds)
- Consumer fails → Group coordinator reassigns partitions (seconds)
- Controller fails → Another controller becomes leader (seconds)
- All automatic! No human intervention needed!
I also took some help from Claude to understand and make a few concepts more visual.