DEV Community

loading...
Cover image for Notes on Kafka: Distributed Systems

Notes on Kafka: Distributed Systems

Eden Jose
A cloud enthusiast, an IT Professional, and a problem-solver. I am either learning something new, running a mile, or planning my next 100 days.
Updated on ・3 min read

Now that we've learned what Kafka is and it's capability to scale out massively, we'll go over what a distributed system is and how Kafka is considered as one.

A system is a collection of resources that have instructions to achieve a specific goal. A distributed system is a system that consist of multiple workers or nodes, or sometimes as worker nodes. These are the Kafka brokers.

Alt Text

These nodes must coordinate with each other to ensure that the goal is achieved. Without coordination, it'll be hard to divide the work, and even to track who's working on what.


Controller Election

Like any organization, each workers have their own roles and responsibilities. There's also a general hierarchy where there's a supervisor that acts as the head. In Kafka, the supervisor is the controller.

Alt Text

The controller is a worker node that is elected to serve as the admin, and it is usually the node that's been around the longest. The controller's responsibility includes:

  • inventory of the available workers to take on the load
  • maintain the item list of work assigned to the workers
  • ensure active status of each worker

Getting Work Done

The controller decides which worker will take on a task that comes in. To do this, the controller must know:

  • Who's available to take on the task?
  • What risk policy should be considered in its decision?
  • How much redundancy should be implemented in case the assigned worker fails?

Alt Text

If the controller determines that it needs more than one worker for a task, it will promote the chosen worker as the leader. The promoted leaders will then each recruit two more peers to participate in the replication. The number of recruited peers is called the replication factor

A quorum is then formed when the committed peers take on a new role as a follower.

In the event that the assigned worker becomes unavailable, the work already done should not be lost. The controller must ensure the task given to a worker must at least be given to another.

Alt Text

In case one leader is not able to establish a quorum with the available peers, the controller reassigns the task to another worker which will become a leader and the new leader will retry to form a quorum with two peers again.


The succeeding notes will discuss what's the role of the Apache Zookeeper. If you'd like to know more, please proceed to the next note in the series.

Similarly, you can check out the following resources:


If you've enjoyed this short but concise article, I'll be glad to connect with you on Twitter!. You can also hit the Follow below to stay updated when there's new awesome contents! 😃

jeden image

Discussion (0)