DEV Community

Eden Jose


Notes on Kafka: The Zookeeper

As in any distributed system, the nodes in a Kafka cluster must constantly communicate with one another to agree on the state of the cluster. This chatter is sometimes loosely called gossip, although Kafka does not rely on a true gossip protocol and coordinates through ZooKeeper instead. Besides the actual message payloads, the nodes also exchange other information:

  • Nodes becoming available and requesting cluster membership
  • Configuration settings and management
  • Controller election events
  • Updates on the health status of workers

Now, the Zookeeper

ZooKeeper maintains the cluster's metadata. It is responsible for the following:

  • managing the brokers and keeping a list of them
  • configuration information
  • health and sync status
  • cluster membership
  • helping with leader election
  • sending notifications to Kafka when something changes

Two things to keep in mind:

  • the number of ZooKeeper servers launched should be odd (3, 5, 7), so that a majority can still be formed when some of them fail
  • ZooKeeper does not store consumer offsets in Kafka v0.10 and later; offsets are kept in an internal Kafka topic instead
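
The odd-number recommendation is easier to see with a bit of arithmetic. The sketch below assumes the usual majority rule (more than half of the servers must be reachable); the function names are made up for this note and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(3, 8):
        print(f"{n} servers: quorum of {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under this rule, going from 3 to 4 servers (or from 5 to 6) does not let the ensemble survive any additional failure, which is why even sizes are usually avoided.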

ZooKeeper is itself a distributed system made up of multiple nodes, collectively called an ensemble. An ensemble has:

  • a leader, which handles the writes
  • followers, which are the rest of the servers and handle the reads
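
To make the leader and follower roles concrete, here is a toy in-memory model. It is only a sketch: `ToyEnsemble` is a hypothetical class invented for this note, and it copies every write to the followers immediately, whereas real ZooKeeper replicates writes with its own atomic broadcast protocol and majority acknowledgement.

```python
from typing import Dict, List, Optional


class ToyEnsemble:
    """In-memory stand-in for an ensemble (hypothetical, not a real client)."""

    def __init__(self, size: int = 3) -> None:
        if size < 2:
            raise ValueError("need one leader and at least one follower")
        # Replica 0 plays the leader; the rest are followers.
        self.replicas: List[Dict[str, str]] = [{} for _ in range(size)]
        self._next_reader = 0

    def write(self, path: str, value: str) -> None:
        # Every write goes through the leader first ...
        leader, *followers = self.replicas
        leader[path] = value
        # ... and is then copied to each follower.
        for follower in followers:
            follower[path] = value

    def read(self, path: str) -> Optional[str]:
        # Reads are served by the followers, chosen round-robin.
        followers = self.replicas[1:]
        replica = followers[self._next_reader % len(followers)]
        self._next_reader += 1
        return replica.get(path)


if __name__ == "__main__":
    ensemble = ToyEnsemble(size=3)
    ensemble.write("/config/topics/demo", "retention=1d")
    print(ensemble.read("/config/topics/demo"))
```

Sending all writes through a single leader keeps the order of updates the same on every replica, while spreading reads across followers lets the ensemble serve more readers.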

NOTE:
Clients (producers and consumers) talk only to the Kafka brokers: producers write to them and consumers read from them.
Kafka brokers read from and write to the ZooKeeper nodes.
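
What the brokers read and write can be pictured with a small simulation. The sketch below is a toy, not a real ZooKeeper client: `ToyZooKeeper` and `start_broker` are hypothetical names, and only the paths `/brokers/ids/<id>` and `/controller` mirror the znodes Kafka actually uses. It assumes the usual pattern where each broker registers an ephemeral node and the first broker to create `/controller` becomes the controller.

```python
from typing import Callable, Dict, List


class ToyZooKeeper:
    """Tiny in-memory model of ephemeral znodes and watches."""

    def __init__(self) -> None:
        self.znodes: Dict[str, int] = {}  # path -> id of the owning session
        self.watchers: List[Callable[[str, str], None]] = []

    def watch(self, callback: Callable[[str, str], None]) -> None:
        self.watchers.append(callback)

    def _notify(self, event: str, path: str) -> None:
        for callback in self.watchers:
            callback(event, path)

    def create_ephemeral(self, path: str, session: int) -> bool:
        # Creation fails if the node already exists (someone else got there first).
        if path in self.znodes:
            return False
        self.znodes[path] = session
        self._notify("created", path)
        return True

    def close_session(self, session: int) -> None:
        # Ephemeral nodes vanish together with the session that created them.
        for path in [p for p, s in self.znodes.items() if s == session]:
            del self.znodes[path]
            self._notify("deleted", path)

    def children(self, prefix: str) -> List[str]:
        return sorted(p for p in self.znodes if p.startswith(prefix + "/"))


def start_broker(zk: ToyZooKeeper, broker_id: int) -> bool:
    """Register the broker; return True if it also became the controller."""
    zk.create_ephemeral(f"/brokers/ids/{broker_id}", broker_id)
    return zk.create_ephemeral("/controller", broker_id)


if __name__ == "__main__":
    zk = ToyZooKeeper()
    zk.watch(lambda event, path: print(event, path))
    for broker_id in (1, 2, 3):
        print("broker", broker_id, "is controller:", start_broker(zk, broker_id))
    zk.close_session(1)  # broker 1 goes away, its nodes are deleted
    print("broker 2 takes over:", zk.create_ephemeral("/controller", 2))
```

In this toy, membership is just the list of children under `/brokers/ids`, and a controller election is a race to create a single node, which is roughly how the list of responsibilities above maps onto simple operations.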


Originally, Kafka could not work without ZooKeeper. However, Apache Kafka 2.8.0, released in April 2021, came with a lot of features and improvements, chief of which is the option to run Kafka without Apache ZooKeeper.

Starting with version 2.8.0, Kafka brokers can instead lean on an internal implementation of the Raft consensus algorithm (KRaft). In 2.8.0 this is still in the works, so production use is not yet recommended. For more information on this, you can check out Kafka needs No Keeper.

The subject of Zookeeper certainly deserves its own series if we are to dig deeper into it. I might create a separate post or series that's entirely dedicated to it but for now, this explanation is sufficient. We will see Zookeeper again in the succeeding topics.



Bringing it all together

In the complete Apache Kafka distributed architecture, we have a Kafka cluster made up of multiple independent brokers.

Associated with the cluster is the ZooKeeper environment, which provides the metadata that the cluster needs to operate reliably. Because this metadata is constantly changing, the cluster members and ZooKeeper communicate continuously.
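
One reason for the continuous communication is liveness tracking. The sketch below assumes a simple heartbeat-and-timeout scheme, similar in spirit to ZooKeeper sessions; `SessionTracker` and the 6-second timeout are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


class SessionTracker:
    """Tracks which brokers have sent a heartbeat recently enough."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[int, float] = {}  # broker id -> time of last heartbeat

    def heartbeat(self, broker_id: int, now: float) -> None:
        self.last_seen[broker_id] = now

    def live_brokers(self, now: float) -> List[int]:
        # A broker counts as alive if its last heartbeat is within the timeout.
        return sorted(
            broker_id
            for broker_id, seen in self.last_seen.items()
            if now - seen <= self.timeout_s
        )


if __name__ == "__main__":
    tracker = SessionTracker(timeout_s=6.0)
    tracker.heartbeat(1, now=0.0)
    tracker.heartbeat(2, now=0.0)
    tracker.heartbeat(1, now=5.0)  # broker 2 stays silent
    print(tracker.live_brokers(now=8.0))
```

Timestamps are passed in explicitly so the example stays deterministic; a real system would read a clock instead.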


These components scale out to handle the demand coming from both producers and consumers, which helps ensure reliability and availability.


The succeeding notes will discuss the messaging internals of Apache Kafka, specifically topics and messages. If you'd like to know more, please proceed to the next note in the series.



If you've enjoyed this short and concise article, I'll be glad to connect with you on Twitter! You can also hit the Follow button below to stay updated when there's new content! 😃

