
Muhammad Ibrahim Anis

Learning Kafka Part Two: Core Components of Kafka

Welcome to the second part of Learning Kafka.

Previously, we were introduced to Kafka and briefly touched on its origins and some of its features. We also learned some vocabulary that will make our journey much easier: distributed systems, nodes, durability, scalability, and so on.

This time, we take a look at the core components of Kafka, that is to say, the components that reside within Kafka itself as opposed to those that interact with it.
By the end of this section, we will have a better understanding of brokers, topics, partitions, messages, and offsets. We will also touch briefly on Zookeeper and KRaft.

Cluster/Brokers

A cluster is a group of systems working together to achieve a shared goal, and each system in a cluster is called a server or a node. Likewise, a Kafka cluster is a system that consists of several nodes running Kafka. A single node is referred to as a broker.
A broker is responsible for hosting topics and partitions (more on topics and partitions later) and writing messages to storage. Each broker must have a unique identifier. Among these brokers, one is elected as the controller, while the others are designated as followers.

A Kafka Cluster with three brokers

In addition to the usual broker responsibilities, the controller is responsible for managing partitions and other administrative tasks across the cluster. Any broker in a cluster can become the controller, but a cluster can only have one active controller at any given time.
One of the main functions of Zookeeper in Kafka is to handle controller election.
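
To make this concrete, here is a minimal sketch (not from the original post) that uses the Java AdminClient shipped with Kafka to list the brokers in a cluster and report which one is currently the active controller. The bootstrap address is an assumption; point it at any broker in your cluster.

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;
import org.apache.kafka.common.Node;

public class DescribeCluster {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            // Every broker in the cluster, each with its unique id
            for (Node broker : cluster.nodes().get()) {
                System.out.printf("Broker %d at %s:%d%n", broker.id(), broker.host(), broker.port());
            }
            // The single active controller at this moment
            System.out.println("Active controller: broker " + cluster.controller().get().id());
        }
    }
}
```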

Topics/Partitions

In Kafka, a topic is a logical grouping of events (also called messages). We can think of a topic as a folder in a filesystem. Each topic name must be unique.

Topics are further divided into partitions. When we write messages to Kafka, we are actually writing them to partitions. The number of partitions for a topic is set when the topic is created; it can be increased later, but it can never be decreased. A topic can have as many partitions as needed. In a cluster, partitions of the same topic are distributed across multiple brokers, which is what gives Kafka its scalability and high throughput.
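
As an illustration, here is a hedged sketch using the Java AdminClient to create a topic with a fixed number of partitions and later increase that number. The topic name, partition counts, and replication factor are made up for the example.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (AdminClient admin = AdminClient.create(props)) {
            // Create a topic with 3 partitions, each kept on a single broker (replication factor 1)
            NewTopic topic = new NewTopic("sensor-readings", 3, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();

            // The partition count can be increased later...
            admin.createPartitions(
                    Collections.singletonMap("sensor-readings", NewPartitions.increaseTo(6))
            ).all().get();
            // ...but there is no operation to decrease it.
        }
    }
}
```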

A broker with topics and partitions

Messages/offsets

Remember events? Well, in Kafka, events are referred to as messages. A message is similar to a file in a folder or a row in a table, where the folder and the table are comparable to a Kafka topic. A message contains a key (optional), a value, and a timestamp. Example of a message:

Key: Thermostat 1 (optional)
Value: Temperature reading 40 °C
Timestamp: 2022-12-24 at 01:48 a.m.
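
In the Java client, that thermostat reading maps onto a ProducerRecord. The following is a minimal, hedged sketch of a producer sending it; the topic name sensor-readings, the broker address, and the string-encoded value are assumptions made for the example.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ThermostatProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // topic, partition (null = let Kafka choose), timestamp, key, value
            ProducerRecord<String, String> reading = new ProducerRecord<>(
                    "sensor-readings",   // hypothetical topic name
                    null,                // no explicit partition
                    1671846480000L,      // timestamp: 2022-12-24 01:48 a.m. UTC in epoch millis
                    "thermostat-1",      // optional key
                    "40"                 // value: the temperature reading in °C
            );
            producer.send(reading);
        }
    }
}
```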

Sending messages to Kafka one at a time would result in excessive overhead, so for efficiency, messages are written to Kafka in batches. A batch is a group of messages that are all sent to the same topic and partition.
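
Batching is tuned on the producer side. Building on the producer sketch above, these two settings (with illustrative values, not recommendations) control how batches are formed:

```java
// Added to the producer Properties from the previous sketch:
// collect up to 32 KB of messages per partition into a single batch...
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);
// ...and wait up to 10 ms for a batch to fill before sending it
props.put(ProducerConfig.LINGER_MS_CONFIG, 10);
```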

When writing a message to Kafka, we can explicitly specify which partition it should go to. If no partition is specified, messages with the same key are sent to the same partition; for example, all messages from Thermostat 1 will end up in the same partition. If the partition is not specified and the message has no key, the application sending the message (called a producer) uses the sticky partitioning strategy: it picks a partition, sends a batch of messages to it, then picks another partition for the next batch, and so on. That way, over time, messages are evenly distributed among all partitions.
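
These three cases correspond to three ways of constructing a ProducerRecord in the Java client. A short sketch, continuing with the hypothetical sensor-readings topic from above:

```java
// 1. Explicit partition: the message goes to partition 2, no questions asked
ProducerRecord<String, String> explicit =
        new ProducerRecord<>("sensor-readings", 2, "thermostat-1", "40");

// 2. Key only: all messages with key "thermostat-1" land in the same partition
ProducerRecord<String, String> keyed =
        new ProducerRecord<>("sensor-readings", "thermostat-1", "40");

// 3. No key, no partition: the producer's sticky partitioning spreads batches
//    across partitions, evening out the load over time
ProducerRecord<String, String> keyless =
        new ProducerRecord<>("sensor-readings", "40");
```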

An offset is a unique integer identifier assigned to each message in a partition. A Kafka broker assigns offsets sequentially as messages are written to it. Offsets are also used by applications reading messages from Kafka (called consumers) to keep track of which messages they have already consumed, which prevents them from consuming the same message twice.
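
Here is a minimal, hedged consumer sketch that reads from the hypothetical sensor-readings topic, prints each message's partition and offset, and commits the consumed offsets back to Kafka so it will not re-read those messages on the next run. The group id and broker address are assumptions.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ThermostatConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumed address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "thermostat-readers");         // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");            // commit offsets manually

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("sensor-readings"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                        record.partition(), record.offset(), record.key(), record.value());
            }
            // Store the position we have read up to, so the same messages are not consumed twice
            consumer.commitSync();
        }
    }
}
```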

A Kafka broker with topics and partitions; each partition shows the messages in it and their offsets

Even though Zookeeper does not reside within a Kafka cluster, KRaft does. So, to understand what KRaft is and why we need it, we first need to know the role Zookeeper plays in Kafka.

Apache Zookeeper

“Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.”

So, in plain English, Zookeeper is a distributed system (yes, another one) that is used to manage and coordinate other distributed systems.
Kafka needs Zookeeper to run; in fact, Kafka will not start without Zookeeper already running.
In Kafka, Zookeeper is responsible for metadata and configuration about the Kafka cluster, such as:

  • Cluster Membership: which brokers belong to the cluster and whether each one is available.
  • Access Control: which client applications have read and/or write permissions on the Kafka cluster.
  • Topics Configuration: list of existing topics and the number of partitions for each topic.
  • Controller Election: keeps track of which broker is currently the controller and handles reelection when the controller shuts down.

KRaft

Even though Zookeeper comes bundled with Kafka, it is a full-fledged Apache project in its own right. In production, we are most likely going to run a Zookeeper cluster (called an ensemble) separately from our Kafka cluster. Zookeeper is lightweight, fast, and easy to set up, but using it with Kafka comes with a few limitations.

Running Zookeeper alongside Kafka adds another layer of complexity for tuning, maintenance, and monitoring. Instead of maintaining and monitoring a single system, we now have to monitor both our Kafka cluster and our Zookeeper ensemble. Also, as the cluster grows, the whole system becomes cumbersome and there is a noticeable lag in performance, especially when brokers in the cluster are restarting.

Kafka Raft, or KRaft for short, aims to replace Apache Zookeeper with Kafka topics and the Raft consensus protocol, making Kafka self-managed. (Read more on the Raft consensus here). The Kafka cluster metadata is now stored inside Kafka itself, in a topic. This makes metadata operations faster and more scalable, as the cluster no longer needs to communicate with an external system (i.e., Zookeeper).
KRaft has been available for testing since Kafka version 2.8 (released April 2021). In version 3.3, it was marked production-ready, and Zookeeper is set to be removed entirely as a dependency in Kafka 4.0.

This brings us to the end of this part. We have discussed Kafka clusters, brokers, topics, partitions, messages, and offsets, along with a brief overview of Apache Zookeeper and Kafka Raft.
Next, Kafka’s architectural design.
