Please read the last two articles in my Kafka series before this one; this part gets serious.
The Big Picture: Two Brains Working Together
Think of Kafka as a well-organized company with two main components:
┌─────────────────────────────────────────────┐
│                KAFKA CLUSTER                │
│                                             │
│  ┌─────────────────────────────────────┐    │
│  │ CONTROLLERS (The Brain 🧠)          │    │
│  │ • Manage who does what              │    │
│  │ • Track what's happening            │    │
│  │ • Make decisions                    │    │
│  └──────────────┬──────────────────────┘    │
│                 │ Commands & Updates        │
│  ┌──────────────┴──────────────────────┐    │
│  │ BROKERS (The Workers 💪)            │    │
│  │ • Store the actual data             │    │
│  │ • Serve producers and consumers     │    │
│  │ • Follow controller's instructions  │    │
│  └─────────────────────────────────────┘    │
└─────────────────────────────────────────────┘
💡 Simple Analogy: Controllers are like managers who plan and coordinate, while Brokers are employees who do the actual work.
Part 1: SETUP - Controllers Organize Everything
Step 1: Controllers Start and Elect Leader
When Kafka starts in KRaft mode, the controller nodes run a Raft-based election to choose one active leader. The other controllers follow it and stay ready to take over:
Controller-1    Controller-2    Controller-3
     │               │               │
     └───────── Raft Election ───────┘
                     │
           One becomes LEADER ⭐
                     ↓
            ┌────────────────┐
            │  Controller-1  │
            │  (LEADER) ⭐   │ ← Makes all decisions!
            └────────────────┘
💡 Simple: Like choosing a class monitor who manages everything.
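To make the "majority wins" idea concrete, here is a minimal Python sketch of a single voting round. It is a toy model, not Kafka's KRaft implementation: the names `ControllerNode`, `majority`, and `run_election` are made up for illustration, and real Raft also tracks terms (epochs) and randomized timeouts, which are left out here.

```python
from dataclasses import dataclass


@dataclass
class ControllerNode:
    node_id: int
    log_end_offset: int  # how much of the metadata log this node holds (hypothetical field)
    alive: bool = True


def majority(voter_count: int) -> int:
    """Smallest number of votes that is more than half of the voters."""
    return voter_count // 2 + 1


def run_election(candidate: ControllerNode, voters: list[ControllerNode]) -> bool:
    """Return True if the candidate wins one simplified voting round.

    `voters` includes the candidate itself, which votes for itself.
    """
    votes = 0
    for voter in voters:
        if not voter.alive:
            continue  # crashed nodes cannot vote
        # A voter only supports a candidate whose log is at least as complete as its own.
        if candidate.log_end_offset >= voter.log_end_offset:
            votes += 1
    return votes >= majority(len(voters))


controllers = [ControllerNode(1, 10), ControllerNode(2, 10), ControllerNode(3, 8)]
leader_elected = run_election(controllers[0], controllers)
```

With three controllers the majority is two, so the cluster can still elect a leader when one controller is down, but not when two are.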
Step 2: Controllers Create Metadata Registry
The active controller maintains a comprehensive "notebook" of everything happening in the cluster (in KRaft mode this is kept as a replicated metadata log):
┌────────────────────────────────────────────┐
│             METADATA REGISTRY              │
├────────────────────────────────────────────┤
│                                            │
│ TOPICS:                                    │
│  • "orders" → 3 partitions                 │
│  • "payments" → 2 partitions               │
│                                            │
│ BROKERS:                                   │
│  • Broker-1 → Alive ✅ (IP: 192.168.1.10)  │
│  • Broker-2 → Alive ✅ (IP: 192.168.1.11)  │
│  • Broker-3 → Alive ✅ (IP: 192.168.1.12)  │
│                                            │
│ WHO'S IN CHARGE (Leaders):                 │
│  • orders-partition-0 → Broker-1           │
│  • orders-partition-1 → Broker-2           │
│  • orders-partition-2 → Broker-3           │
│                                            │
│ BACKUPS (Followers):                       │
│  • orders-partition-0 → Broker-2, Broker-3 │
│  • orders-partition-1 → Broker-3, Broker-1 │
│  • orders-partition-2 → Broker-1, Broker-2 │
└────────────────────────────────────────────┘
💡 Simple: Like a school register showing which classes exist, which teachers are present, who teaches which subject, and who are the substitute teachers.
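If you prefer to think in data structures, the registry above can be pictured roughly like this. This is only a sketch for intuition: `PartitionInfo` and `MetadataRegistry` are hypothetical names, and the real metadata log stores far more than this.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionInfo:
    leader: str
    followers: list[str]


@dataclass
class MetadataRegistry:
    # broker name -> IP address
    brokers: dict[str, str] = field(default_factory=dict)
    # (topic, partition number) -> who leads it and who backs it up
    partitions: dict[tuple[str, int], PartitionInfo] = field(default_factory=dict)

    def leader_for(self, topic: str, partition: int) -> str:
        return self.partitions[(topic, partition)].leader

    def partition_count(self, topic: str) -> int:
        return sum(1 for (name, _) in self.partitions if name == topic)


registry = MetadataRegistry(
    brokers={
        "Broker-1": "192.168.1.10",
        "Broker-2": "192.168.1.11",
        "Broker-3": "192.168.1.12",
    }
)
registry.partitions[("orders", 0)] = PartitionInfo("Broker-1", ["Broker-2", "Broker-3"])
registry.partitions[("orders", 1)] = PartitionInfo("Broker-2", ["Broker-3", "Broker-1"])
registry.partitions[("orders", 2)] = PartitionInfo("Broker-3", ["Broker-1", "Broker-2"])
```

A lookup such as `registry.leader_for("orders", 1)` is essentially the question producers and consumers ask later in this article.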
Step 3: Controller Tells Brokers Their Jobs
The controller assigns specific roles to each broker, matching the registry above:
Controller ⭐ → Broker-1: "You are the LEADER for orders-partition-0"
                          "You are a BACKUP for orders-partition-1 and orders-partition-2"
Controller ⭐ → Broker-2: "You are the LEADER for orders-partition-1"
                          "You are a BACKUP for orders-partition-0 and orders-partition-2"
Controller ⭐ → Broker-3: "You are the LEADER for orders-partition-2"
                          "You are a BACKUP for orders-partition-0 and orders-partition-1"
💡 Simple: Like a manager assigning tasks to employees.
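The layout in this example (each broker leads one partition and backs up the others) is what you get from a simple round-robin rotation. The sketch below shows that idea only; `assign_replicas` is a made-up helper, and Kafka's real placement logic also considers things like racks and a randomized starting broker.

```python
def assign_replicas(
    brokers: list[str], partition_count: int, replication_factor: int
) -> dict[int, list[str]]:
    """Rotate through the brokers; the first replica in each list acts as leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment: dict[int, list[str]] = {}
    for partition in range(partition_count):
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)] for i in range(replication_factor)
        ]
    return assignment


layout = assign_replicas(["Broker-1", "Broker-2", "Broker-3"], 3, 3)
for partition, replicas in layout.items():
    leader, followers = replicas[0], replicas[1:]
    print(f"orders-partition-{partition}: leader={leader}, followers={followers}")
```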
📤 Part 2: PRODUCER SENDS DATA
Step 4: Producer Wants to Send Message
Your application (Producer) has data to send:
┌──────────────────────────┐
│ I have a message:        │
│  • Topic: "orders"       │
│  • Key: "user_123"       │
│  • Value: {order data}   │
│                          │
│ "Where do I send this?"  │
└──────────────────────────┘
💡 Simple: Like having a letter but not knowing which post office to use.
Step 5: Producer Asks ANY Broker for Information
The producer can connect to any broker to get routing information:
Producer ──────────► Broker-1
"Where do I send     "Let me check
 messages for         my copy of
 'orders' topic?"     the registry..."
Key Point: Every broker has a copy of the controller's metadata!
┌───────────────────────────────────┐
│ Broker-1's Copy of Metadata:      │
│  • orders-partition-0 → Broker-1  │
│  • orders-partition-1 → Broker-2  │
│  • orders-partition-2 → Broker-3  │
└───────────────────────────────────┘
💡 Simple: Like asking a postman for directions - he has a map (copy of registry).
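Step 6: Producer Calculates Which Partition
The producer's client library automatically determines the target partition from the message key:
1. Hash the key: hash("user_123") = 456789 (an illustrative value)
2. Take the remainder after dividing by the partition count: 456789 % 3 = 0
3. Result: Goes to Partition 0
Because the same key always produces the same hash, every message for "user_123" lands in the same partition and stays in order.
💡 Simple: Like a sorting machine that knows exactly which box to put each item in.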
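Here is the hash-then-modulo idea as a few lines of Python. One caveat: Kafka's Java client hashes the key bytes with murmur2, while this sketch uses `zlib.crc32` from the standard library purely as a stand-in, so the partition numbers it produces will not match a real cluster. `choose_partition` is a hypothetical helper name.

```python
import zlib


def choose_partition(key: str, partition_count: int) -> int:
    """Map a message key to a partition with hash(key) % partition_count."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    # crc32 is a stand-in hash here; the real client uses murmur2 on the key bytes.
    key_hash = zlib.crc32(key.encode("utf-8"))
    return key_hash % partition_count


first = choose_partition("user_123", 3)
second = choose_partition("user_123", 3)
# The same key always maps to the same partition, which preserves per-key ordering.
same_partition_every_time = first == second
```

Note the flip side of this design: if you later change the number of partitions, the remainder changes, so existing keys can start landing in different partitions.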
Step 7: Producer Sends to Correct Broker
Now the producer sends directly to the partition leader:
Producer ───────────► Broker-1 (Leader for P0)
Broker-1 receives:
1. ✅ Validates it's the leader for P0
2. 💾 Writes to disk: /kafka/data/orders-0/
3. Assigns offset: 1251
4. ✅ Sends "OK!" back to producer
💡 Simple: Like mailing a letter to the correct post office that handles your area.
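The four things Broker-1 does can be sketched as a tiny append-only log. This is an in-memory toy under my own assumptions: `PartitionLog` and `NotLeaderError` are invented names, and a real broker writes batches to segment files rather than a Python list.

```python
class NotLeaderError(Exception):
    """Raised when a write reaches a broker that does not lead the partition."""


class PartitionLog:
    def __init__(self, topic: str, partition: int, is_leader: bool, next_offset: int = 0) -> None:
        self.topic = topic
        self.partition = partition
        self.is_leader = is_leader
        self.next_offset = next_offset
        self.records: list[tuple[int, bytes]] = []  # (offset, value) pairs

    def append(self, value: bytes) -> int:
        # 1. Validate leadership: only the leader accepts writes.
        if not self.is_leader:
            raise NotLeaderError(f"not the leader for {self.topic}-{self.partition}")
        # 2 and 3. Store the record and give it the next sequential offset.
        offset = self.next_offset
        self.records.append((offset, value))
        self.next_offset += 1
        # 4. Returning the offset plays the role of the "OK!" acknowledgement.
        return offset


orders_p0 = PartitionLog("orders", 0, is_leader=True, next_offset=1251)
acked_offset = orders_p0.append(b'{"order": "example"}')
```

Offsets only ever grow by one per record within a partition, which is what lets consumers later say "give me everything from offset N".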
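Step 8: Broker Replicates to Followers (Background)
After the leader writes the message, the followers copy it. Strictly speaking, followers pull new messages from the leader with fetch requests rather than the leader pushing them, but the result is the same:
Broker-1 (Leader P0) ───COPY MESSAGE───► Broker-2 (Follower P0)
                                              │
                                           Copied!
                     ◄────ACK (Copied!)───────┤
Broker-1 (Leader P0) ───COPY MESSAGE───► Broker-3 (Follower P0)
                                              │
                                           Copied!
Result: Data now exists on 3 brokers! 💪
With acks=all, the "OK!" from Step 7 is sent only after the in-sync followers have the message. With acks=1, the leader replies right after its own write.
💡 Simple: Like making photocopies of important documents and storing them in different safes.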
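To see why "copied to every in-sync replica" matters, here is a small sketch of followers fetching from the leader and of the high watermark, meaning the offset up to which all in-sync replicas have the data. `Replica`, `follower_fetch`, and `high_watermark` are illustrative names, and the in-memory lists stand in for real log files.

```python
class Replica:
    """One copy of a partition: a broker name plus an in-memory list of records."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: list[str] = []

    @property
    def log_end_offset(self) -> int:
        return len(self.log)


def follower_fetch(leader: Replica, follower: Replica, max_records: int = 100) -> int:
    """The follower pulls whatever it is missing from the leader; returns records copied."""
    start = follower.log_end_offset
    batch = leader.log[start:start + max_records]
    follower.log.extend(batch)
    return len(batch)


def high_watermark(in_sync_replicas: list[Replica]) -> int:
    """Offset below which every in-sync replica holds the data."""
    return min(replica.log_end_offset for replica in in_sync_replicas)


leader = Replica("Broker-1")
follower_2 = Replica("Broker-2")
follower_3 = Replica("Broker-3")
leader.log.extend(["order-a", "order-b", "order-c"])

before = high_watermark([leader, follower_2, follower_3])
follower_fetch(leader, follower_2)
follower_fetch(leader, follower_3)
after = high_watermark([leader, follower_2, follower_3])
```

Consumers are only shown records below the high watermark, so they never read something that could disappear if the leader crashed a moment later.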
📥 Part 3: CONSUMER READS DATA
Step 9: Consumer Wants to Read Messages
Your application (Consumer) wants to read data:
┌──────────────────────────┐
│ I want to read from:     │
│  • Topic: "orders"       │
│  • Group: "my-group"     │
│                          │
│ "How do I start?"        │
└──────────────────────────┘
💡 Simple: Like wanting to read a book but not knowing which library has it.
Step 10: Consumer Connects and Gets Metadata
The consumer connects to any broker to get cluster information:
Consumer ───────────► Broker-2
"Tell me everything   "Here's the full
 about 'orders'"       cluster map!"
Broker returns:
┌───────────────────────────────────┐
│ Topic "orders" info:              │
│  • Partition 0 → Leader: Broker-1 │
│  • Partition 1 → Leader: Broker-2 │
│  • Partition 2 → Leader: Broker-3 │
└───────────────────────────────────┘
💡 Simple: Like getting a mall directory showing which stores are on which floor.
Step 11: Consumer Joins Group (Group Coordinator Coordinates)
The consumer joins a consumer group for coordinated reading. This part is handled by the group coordinator, which is one of the brokers, not by the KRaft controller:
Consumer      →      Group Coordinator (a broker)
"I want to join      "Let me get the
 group 'my-group'"    partitions assigned..."
Resulting assignment:
┌───────────────────────────────────┐
│ Group "my-group" has 1 consumer   │
│ Topic "orders" has 3 partitions   │
│                                   │
│ Assignment:                       │
│  Consumer-A → [P0, P1, P2]        │
│  (gets all 3 partitions)          │
└───────────────────────────────────┘
💡 Simple: Like a teacher assigning homework to students.
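How partitions get split across the members of a group can be sketched as handing out contiguous chunks, in the spirit of Kafka's range assignment strategy. Treat this as a simplified illustration: `range_assign` is a hypothetical function, and the real assignors handle multiple topics and other details.

```python
def range_assign(consumers: list[str], partitions: list[int]) -> dict[str, list[int]]:
    """Give each consumer a contiguous chunk; earlier consumers absorb any remainder."""
    members = sorted(consumers)
    ordered = sorted(partitions)
    assignment: dict[str, list[int]] = {member: [] for member in members}
    if not members:
        return assignment
    base, extra = divmod(len(ordered), len(members))
    start = 0
    for index, member in enumerate(members):
        size = base + (1 if index < extra else 0)
        assignment[member] = ordered[start:start + size]
        start += size
    return assignment


one_consumer = range_assign(["Consumer-A"], [0, 1, 2])
two_consumers = range_assign(["Consumer-A", "Consumer-B"], [0, 1, 2])
```

With one consumer it gets all three partitions, as in the box above. Adding a fourth consumer to a three-partition topic would leave one consumer idle, since a partition is read by at most one member of a group.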
Step 12: Consumer Reads and Tracks Progress
How Consumer Manages Reading Position:
STARTUP (once, when partitions are assigned):
Consumer → Broker: "Where did I leave off?"
Broker → Consumer: "P0: offset 1250, P1: offset 890, P2: offset 2100"
Consumer stores these positions in memory ✓
CONTINUOUS READING:
Consumer fetches using the positions held in memory (no offset lookups on the broker!)
 • Fetch from P0: offset 1250 → 1300 → 1350 (tracked in memory)
 • Fetch from P1: offset 890 → 940 → 990 (tracked in memory)
 • Fetch from P2: offset 2100 → 2150 → 2200 (tracked in memory)
PERIODIC COMMIT (every 5 seconds with auto-commit, or after each batch if you commit manually):
Consumer → Broker: "Save progress: P0=1350, P1=990, P2=2200"
🔥 KEY POINTS:
- Consumer reads its committed position from the broker ONCE, when partitions are assigned
- Tracks current position IN MEMORY while reading
- Saves progress back to the broker PERIODICALLY (every 5 seconds by default when auto-commit is enabled)
💡 Simple: Like checking your bookmark when opening a book, remembering your page while reading, and updating the bookmark occasionally.
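The bookmark idea can be written down as a small class: positions live in memory and are copied to the "saved" store only once the commit interval has passed. This is a sketch with invented names (`OffsetTracker`, `advance`, `maybe_commit`); in a real consumer the saved offsets live on the broker, and the 5-second interval corresponds to `auto.commit.interval.ms` when `enable.auto.commit` is on.

```python
class OffsetTracker:
    def __init__(self, committed: dict[int, int], commit_interval_s: float = 5.0) -> None:
        self.committed = dict(committed)  # stands in for offsets saved on the broker
        self.position = dict(committed)   # in-memory read position per partition
        self.commit_interval_s = commit_interval_s
        self.last_commit_time = 0.0

    def advance(self, partition: int, records_read: int) -> None:
        """Move the in-memory position forward after a fetch."""
        self.position[partition] += records_read

    def maybe_commit(self, now: float) -> bool:
        """Save positions only if the commit interval has elapsed."""
        if now - self.last_commit_time < self.commit_interval_s:
            return False
        self.committed = dict(self.position)
        self.last_commit_time = now
        return True


tracker = OffsetTracker({0: 1250, 1: 890, 2: 2100})
tracker.advance(0, 50)  # P0: 1250 -> 1300
tracker.advance(0, 50)  # P0: 1300 -> 1350
committed_early = tracker.maybe_commit(now=2.0)  # too soon, nothing saved
committed_later = tracker.maybe_commit(now=5.0)  # interval reached, progress saved
```

The gap between `position` and `committed` is exactly what gets read again if the consumer crashes before the next commit.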
Step 13: Consumer Pulls Data from Brokers
Consumer fetches data in parallel from all partition leaders:
Consumer ───────► Broker-1 (P0) → Returns 50 events
         ───────► Broker-2 (P1) → Returns 50 events
         ───────► Broker-3 (P2) → Returns 50 events
Total: 150 events fetched!
        ↓
Process all events
(Your business logic)
        ↓
All done! ✅
💡 Simple: Like reading from multiple books at the same time, keeping track of your progress in each.
🔧 Part 4: FAILURE HANDLING
SCENARIO A: Broker Fails (Controller Handles)
Broker-1 crashes! 💥
STEP 1: Controller Detects Failure
Controller ⭐: "Broker-1 stopped responding!"
STEP 2: Controller Checks Metadata
┌────────────────────────────────────────┐
│ Partition 0 (orders):                  │
│  • Leader: Broker-1 💀                 │
│  • Followers: Broker-2 ✅, Broker-3 ✅ │
│                                        │
│ Need new leader for P0!                │
└────────────────────────────────────────┘
STEP 3: Controller Elects New Leader
Controller ⭐ → Broker-2: "You are now LEADER for P0!"
Broker-2: "OK! I'm the new leader!"
(The new leader is picked from the in-sync followers, so it already holds the fully replicated messages.)
STEP 4: Controller Updates Metadata
┌────────────────────────────────────────┐
│ Partition 0 (orders):                  │
│  • Leader: Broker-2 ⭐ (NEW!)          │
│  • Followers: Broker-3 ✅              │
│  • Broker-1: 💀 (removed)              │
└────────────────────────────────────────┘
STEP 5: Controller Notifies Everyone
Controller → All: "Metadata changed! P0 leader is now Broker-2!"
Total time: typically a few seconds, most of it spent noticing that the broker's heartbeats have stopped ⚡
💡 Simple: Like when a teacher is absent, the principal quickly assigns a substitute.
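Steps 2 to 4 boil down to "drop the dead broker from the in-sync set and promote the next in-sync replica". Here is that logic as a sketch. `PartitionState` and `handle_broker_failure` are hypothetical names, and the sketch ignores what happens when no in-sync replica is left, beyond leaving the partition without a leader.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionState:
    leader: Optional[str]
    replicas: list[str]  # all brokers assigned to the partition, in preferred order
    isr: list[str]       # in-sync replicas: brokers that are fully caught up


def handle_broker_failure(state: PartitionState, failed_broker: str) -> PartitionState:
    """Remove the failed broker from the ISR and promote a new leader if needed."""
    state.isr = [broker for broker in state.isr if broker != failed_broker]
    if state.leader == failed_broker:
        # Only in-sync replicas are eligible, so the new leader has the replicated data.
        candidates = [broker for broker in state.replicas if broker in state.isr]
        state.leader = candidates[0] if candidates else None
    return state


orders_p0_state = PartitionState(
    leader="Broker-1",
    replicas=["Broker-1", "Broker-2", "Broker-3"],
    isr=["Broker-1", "Broker-2", "Broker-3"],
)
handle_broker_failure(orders_p0_state, "Broker-1")
```

After the call, Broker-2 leads the partition and Broker-3 is the only remaining follower, matching the updated metadata box above.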
SCENARIO B: Consumer Fails (Group Coordinator Rebalances)
Consumer-A crashes! 💥
STEP 1: Group Coordinator Detects
Coordinator: "No heartbeat from Consumer-A within the session timeout!"
STEP 2: Coordinator Triggers Rebalance
Coordinator: "Group 'my-group' needs rebalancing!"
Coordinator → Other Consumers: "Stop! Rebalancing..."
STEP 3: Partitions Are Reassigned
Before:
Consumer-A: [P0, P1] 💀 (dead)
Consumer-B: [P2] ✅
After:
Consumer-B: [P0, P1, P2] ✅ (takes over all!)
STEP 4: Coordinator Notifies Consumer-B
Coordinator → Consumer-B: "You now read P0, P1, P2"
Consumer-B resumes reading:
✅ Loads last saved positions
✅ Continues from where Consumer-A last committed
✅ No messages lost! (Anything Consumer-A read after its last commit is simply read again.)
💡 Simple: Like when one waiter is sick, another takes over their tables.
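The detection and takeover can be sketched in two small functions: one spots members whose last heartbeat is older than the session timeout, the other hands their partitions to the survivors. Both function names are made up, the 45-second timeout is just an assumed value for the example (the real setting is `session.timeout.ms`), and actual rebalancing goes through a join and sync protocol that is skipped here.

```python
def find_expired_members(
    last_heartbeat: dict[str, float], now: float, session_timeout_s: float
) -> list[str]:
    """Members whose most recent heartbeat is older than the session timeout."""
    return sorted(
        member for member, seen in last_heartbeat.items() if now - seen > session_timeout_s
    )


def rebalance(assignment: dict[str, list[int]], dead: list[str]) -> dict[str, list[int]]:
    """Spread the partitions of dead members across the surviving members."""
    survivors = sorted(member for member in assignment if member not in dead)
    if not survivors:
        return {}
    new_assignment = {member: list(assignment[member]) for member in survivors}
    orphaned = sorted(p for member in dead for p in assignment.get(member, []))
    for index, partition in enumerate(orphaned):
        new_assignment[survivors[index % len(survivors)]].append(partition)
    for member in new_assignment:
        new_assignment[member].sort()
    return new_assignment


heartbeats = {"Consumer-A": 0.0, "Consumer-B": 40.0}  # seconds since some start time
expired = find_expired_members(heartbeats, now=50.0, session_timeout_s=45.0)
after_rebalance = rebalance({"Consumer-A": [0, 1], "Consumer-B": [2]}, expired)
```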
🎨 The Complete Picture
┌────────────────────────────────────────────────────────┐
│                     KAFKA CLUSTER                      │
│                                                        │
│  ┌─────────────────────────────────────────────────┐   │
│  │  CONTROLLERS (The Managers 🧠)                  │   │
│  │                                                 │   │
│  │  Controller-1 ⭐  Controller-2   Controller-3   │   │
│  │  (Leader)         (Follower)     (Follower)     │   │
│  │                                                 │   │
│  │  Maintains METADATA REGISTRY                    │   │
│  │  • Who's alive?                                 │   │
│  │  • Who's the leader?                            │   │
│  │  • What topics exist?                           │   │
│  └──────────────────┬──────────────────────────────┘   │
│                     │ Commands & Notifications         │
│  ┌──────────────────┴──────────────────────────────┐   │
│  │  BROKERS (The Workers 💪)                       │   │
│  │                                                 │   │
│  │  Broker-1        Broker-2        Broker-3       │   │
│  │  P0 (L)⭐        P1 (L)⭐        P2 (L)⭐       │   │
│  │  P1 (F)          P2 (F)          P0 (F)         │   │
│  │  P2 (F)          P0 (F)          P1 (F)         │   │
│  │                                                 │   │
│  │  Stores & Serves Data                           │   │
│  │  (one broker also acts as group coordinator)    │   │
│  └─────────────────────────────────────────────────┘   │
│         │              │              │                │
└─────────┼──────────────┼──────────────┼────────────────┘
          │              │              │
          │              │              │
     Producer-1     Producer-2     Producer-3
          │              │              │
          └──────────────┴──────────────┘
                         │
        All query metadata from any broker
                         │
          ┌──────────────┴──────────────┐
          │                             │
     Consumer-A                    Consumer-B
     Group: g1                     Group: g1
     Reads: P0,P1                  Reads: P2
Key Roles Summary
Controller (The Boss)
MANAGES:
- ✓ Who's alive? (broker health)
- ✓ Who's in charge? (partition leaders)
- ✓ What exists? (topics, partitions)
DECIDES:
- ✓ New leader when broker fails
- ✓ Where new partitions go
NOTIFIES:
- ✓ Tells brokers their jobs
- ✓ Updates everyone on changes
NOTE: Consumer group membership, partition assignments for consumers, and rebalancing are handled by the group coordinator, which runs on a broker, not by the controller.
Broker (The Worker 👷)
STORES:
- ✓ Actual data on disk
- ✓ Log files for partitions
- ✓ Copy of metadata (from controller)
SERVES:
- ✓ Producer write requests
- ✓ Consumer read requests
- ✓ Metadata queries
REPLICATES:
- ✓ Copies data to followers
- ✓ Syncs with leader
- ✓ Reports status to controller
Producer (The Sender 📤)
DOES:
- ✓ Creates messages
- ✓ Queries metadata (from any broker)
- ✓ Calculates partition (key hash)
- ✓ Sends to correct broker
DOESN'T CARE ABOUT:
- ❌ Controllers (transparent)
- ❌ Followers (writes only to leader)
- ❌ Other producers
Consumer (The Receiver 📥)
DOES:
- ✓ Joins consumer group
- ✓ Gets partition assignment
- ✓ Tracks reading position in memory
- ✓ Pulls from leader brokers
- ✓ Saves progress periodically
DOESN'T CARE ABOUT:
- ❌ Controllers (it only talks to brokers, including the group coordinator)
- ❌ Follower replicas
- ❌ Other consumer groups
🎯 The Magic: Why This Works So Well
1. Separation of Concerns
- CONTROLLERS think 🧠
- BROKERS work 💪
- Controllers don't handle data
- Brokers don't make cluster-wide decisions
- Like: Managers plan, workers execute
2. Everything Has a Backup
- Controllers: 3 copies (1 leader + 2 followers)
- Partitions: 3 copies (1 leader + 2 followers)
- Metadata: All brokers have a copy
- Result: If anything fails, backups take over!
3. Distributed = Fast + Reliable
- Multiple brokers = Parallel processing
- Multiple partitions = Load distribution
- Multiple replicas = Data survives a broker failure
- Like: Many checkout lanes at a store
4. Automatic Recovery
- Broker fails → Controller elects new leader (seconds)
- Consumer fails → Group coordinator reassigns partitions (once the session timeout expires)
- Controller fails → Another controller becomes leader (seconds)
- All automatic! No human intervention needed!
I also took some help from Claude to understand and make a few concepts more visual.