Rishabh Agarwal

A Gentle Introduction to Event Streaming: Apache Kafka

Apache Kafka is a powerful technology for building event-driven platforms. It has become very popular among developers in recent years, and chances are you will find Kafka at work in several of the applications you use yourself. Kafka has revolutionized how we design software architectures and handle communication among the independent components of an application.

As the world around us becomes more and more intertwined with software, we are experiencing a paradigm shift: a shift from a snapshot-based world to an event-driven one. The transformation is evident when we use social media apps or read the news on applications such as Google News. When an event occurs in the real world, it is reflected almost immediately as a state change in the application. Compare that to a newspaper, which takes snapshots of the world at a fixed interval. With the ever-growing penetration of technology, such event-driven systems have become a necessity.

So What is Apache Kafka?

Let us take a look at the definition Confluent provides for Apache Kafka:

Apache Kafka is a real-time event-streaming platform that collects, stores, and processes messages. It provides great performance, even at scale. On top of that, it provides capabilities such as stream processing, distributed logging, and pub-sub messaging.

While this statement neatly summarizes all the magic of Apache Kafka, it does not give a beginner much to hold on to. To understand it better, we will break the definition apart and examine each piece.

What is event streaming?

Before talking about an event stream, we need to know what an event is.

An event (or message) is something that “happened” in your world or business. This event can be a reading from an IoT device, a notification of a stock price hitting a threshold, the completion of a process, or anything else that fits the definition of “happened.”

Equipped with the knowledge of an event, we can now make efforts to understand what an event stream is.

An event (or message) stream is an endless queue of events. The events enter the queue from one end and are consumed from the other. Literally, an event stream is just a stream of events.

Now the first line of the definition starts to make sense. Apache Kafka is a piece of technology that lets us manage these event streams. It collects event streams at their place of generation and stores them for future use, and it gives us the ability to process the messages in a stream.

(Figure: depiction of events and an event stream)

An event (or message) in Kafka consists of two parts: a key and a value. While both the key and the value can be complex objects, it is general practice to keep the key a primitive value.
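To make this concrete, here is a minimal sketch of how such a key/value event is constructed with Kafka's Java client. The topic name, key, and value are made up for illustration:

```java
import org.apache.kafka.clients.producer.ProducerRecord;

// An event destined for a hypothetical "movie-ratings" topic.
// Key: a primitive value identifying the movie; value: the event payload.
ProducerRecord<String, String> event =
        new ProducerRecord<>("movie-ratings", "movie-42", "user=alice,rating=4");
```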

What are Topics in Kafka?

Topics are how Kafka organizes its various event streams. Each topic in Kafka must have a unique name. As developers, we always work at the abstraction of topics: events are published to a topic and read from a topic. For starters, a Kafka topic can be thought of as the analogue of a table in a relational database.

A single Kafka topic can have several consumers, each consuming from a different position, independently of the others. Kafka partitions each topic across various Kafka nodes. Partitions are important because they allow a topic to be read and written in parallel, which enables high message throughput.

(Figure: topics are partitioned across several brokers/nodes.)
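As a rough sketch, this is how a partitioned topic might be created with Kafka's Java AdminClient. The topic name, partition count, and broker address are assumptions for a local single-node setup:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions so consumption can be parallelized;
            // replication factor 1 is only suitable for a single-node cluster.
            NewTopic topic = new NewTopic("movie-ratings", 3, (short) 1);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```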

Messages from one topic can be processed and then published to a new topic. For example, consider a Kafka topic to which an event is published whenever a user rates a movie. Now suppose we need to process only those messages where users rated a movie poorly. We can achieve this by reading messages from the first topic, filtering them, and publishing the results to a new topic, as sketched below.
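Here is a sketch of this filter-and-republish pattern using the Kafka Streams API. The topic names and the rating encoding (a plain integer string, with 2 stars or fewer counting as "poor") are assumptions made for the example:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class PoorRatingsFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "poor-ratings-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read every rating event, keep only the poor ratings,
        // and publish them to a second topic.
        KStream<String, String> ratings = builder.stream("movie-ratings");
        ratings.filter((movieId, rating) -> Integer.parseInt(rating) <= 2)
               .to("poor-movie-ratings");

        new KafkaStreams(builder.build(), props).start();
    }
}
```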

Producers & Consumers in Kafka

Producers are client applications that publish (write) events to Kafka, while consumers are the applications that subscribe to (read and process) these events. Producers and consumers are fully decoupled and agnostic of each other in Kafka, which is a critical design aspect behind the tremendous scalability Kafka is known for: producers never have to wait for consumers. Kafka also provides several delivery guarantees, including the ability to process events exactly once.
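To illustrate the consuming side, here is a minimal consumer loop using Kafka's Java client; the topic, group id, and broker address are assumptions carried over from the earlier examples:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class RatingsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "ratings-readers"); // consumers in a group share partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("movie-ratings"));
            while (true) {
                // Poll for new events; each consumer tracks its own position (offset).
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```

Note that the producer never learns who, if anyone, consumes its events; the two sides agree only on the topic.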

Source: https://kafka.apache.org/intro

Conclusion

The potential of Kafka is vast, and this article has only touched the tip of a huge iceberg. Continue your journey with Apache Kafka by trying it yourself here.

A cross-post from my Medium page.
