Intro
For decades, databases have been our default method of data storage, and they have served us well.
However, in certain cases, it might be more beneficial to focus on events rather than static data.
Events
So, what exactly is an event? An event is a timestamped indication that a specific action has occurred.
Storing events as rows in a traditional database is an awkward fit: each update overwrites the previous state, so the history of what happened is lost. Instead, we use a structure known as a log — an ordered, append-only sequence of events, each carrying some state and a description of what occurred.
This structure is intuitive and scalable.
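To make this concrete, here is a minimal sketch — plain Python, no Kafka involved — of an event as a timestamped record and a log as an append-only sequence that consumers can replay in order:

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """A timestamped indication that a specific action occurred."""
    timestamp: float  # when the action happened
    key: str          # what the event is about, e.g. a user or order id
    payload: dict     # state describing the event

class Log:
    """An ordered, append-only sequence of events."""
    def __init__(self):
        self._events = []

    def append(self, key, payload):
        self._events.append(Event(time.time(), key, payload))

    def read_from(self, offset=0):
        """Replay events in order, starting at a chosen offset."""
        return self._events[offset:]

log = Log()
log.append("user-42", {"action": "login"})
log.append("user-42", {"action": "add_to_cart", "item": "book"})
# Consumers read the log in order, from any offset they choose.
assert [e.payload["action"] for e in log.read_from()] == ["login", "add_to_cart"]
```

Nothing is ever updated in place: new facts are only ever appended to the end, which is exactly what makes the structure both intuitive and scalable.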
Topics
Enter Kafka — a robust system designed to manage these logs. In the Kafka ecosystem, these logs are grouped under categories called "topics".
Kafka ensures durability by replicating topics across multiple disks and servers, eliminating the risk of a single point of failure.
The retention of topics can range from brief periods to years, or even indefinitely, and their sizes can vary from minuscule to massive.
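Retention is configurable per topic. As an illustration, these are real topic-level settings (the values here are examples, shown in properties form):

```properties
# Keep events for 7 days; a value of -1 retains them indefinitely.
retention.ms=604800000
# Optionally cap a partition's size; -1 means unbounded.
retention.bytes=-1
```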
Each event captures a specific business occurrence.
Thinking in Kafka
Kafka prompts developers to prioritize events over static entities.
Gone are the days when the norm was building a monolithic application with a singular, expansive database.
Now, whether you're running a single service or many, each of them can both consume from and produce to Kafka topics.
The Possibilities
With data residing in these topics, it becomes feasible to establish services that execute real-time analysis by consuming messages directly from topics, as opposed to relying on overnight batch processes.
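As a sketch of the idea — plain Python, with an in-memory list standing in for a topic — a consuming service can keep its analytics up to date the moment each message arrives, rather than waiting for a nightly batch job:

```python
from collections import Counter

# An in-memory stand-in for a Kafka topic: an ordered sequence of events.
topic = [
    {"type": "page_view", "user": "u1"},
    {"type": "purchase", "user": "u1", "amount": 30},
    {"type": "page_view", "user": "u2"},
]

stats = Counter()

def consume(event):
    """Update running analytics as soon as an event arrives."""
    stats[event["type"]] += 1

for event in topic:  # a real consumer would poll the broker instead
    consume(event)

assert stats["page_view"] == 2 and stats["purchase"] == 1
```

A real service would use a Kafka client library's consumer loop in place of the `for` loop, but the shape of the computation is the same: process each event as it arrives, keeping derived state continuously fresh.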
Kafka Connect
Kafka offers tools like Kafka Connect, which captures database changes as events and streams them into a topic — ready for any use case you envision.
Kafka Connect can also work in the other direction, exporting events from a topic into a database or another external system.
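Connectors are configured declaratively rather than coded by hand. As an illustration, this sketch uses Confluent's JDBC source connector class with hypothetical connection details (the database URL, table, and connector name are made up for the example):

```json
{
  "name": "orders-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://localhost:5432/shop",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "table.whitelist": "orders",
    "topic.prefix": "db-"
  }
}
```

With a configuration like this, new rows in the `orders` table would appear as events on a `db-orders` topic, with no application code written at all.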
Kafka Streams
When processing topics, events can be grouped by their key and then aggregated or counted before being written to another topic.
Kafka simplifies this process through its Kafka Streams API. It's a Java-based API that encompasses all the foundational elements and infrastructure, ensuring scalability and reliability without the developer needing to build these from scratch.
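The grouping-and-counting pattern itself is simple. Here is a plain-Python model of what a group-by-key-then-count topology computes — no Kafka Streams involved, just the underlying idea:

```python
from collections import defaultdict

# Input topic: (key, value) pairs, e.g. page views keyed by user id.
input_topic = [("u1", "/home"), ("u2", "/pricing"), ("u1", "/docs")]

# Group by key, then count: aggregate all events that share a key.
counts = defaultdict(int)
for key, _value in input_topic:
    counts[key] += 1

# Write the aggregated results to another topic as (key, count) records.
output_topic = sorted(counts.items())
assert output_topic == [("u1", 2), ("u2", 1)]
```

What Kafka Streams adds on top of this handful of lines is everything the sketch leaves out: partitioned parallelism, fault-tolerant state stores, and continuous updates as new events arrive.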
Outro
Hopefully, this provides a foundational, MVP-level understanding of Kafka. Your next step? Dive in and experiment firsthand with the Kafka quick start guide.
Comments
Nice Kafka overview. In my experience, there is no absolute advantage of using Kafka over a DB or vice versa — it all depends on your use case and architecture. That said, from a data engineering POV, Kafka is often used as a stream-based decoupler that separates data producers from data consumers in a data pipeline.