What is Kafka exactly?
Imagine a high-speed highway for data. That's essentially what Kafka is! It's a powerful tool for handling real-time data streams in large-scale systems. Think of it as a system that captures continuous data flows from various sources like applications, user interfaces, and servers, and stores them for further analysis and processing.
When to use Kafka?
Kafka is built for large-scale systems that require high throughput. It's ideal for situations where real-time data transfer is crucial, such as managing payment transactions, Internet of Things (IoT) data flows, and monitoring systems. It's a popular choice in data platforms, event-driven architectures, and microservices environments.
Example: let's take a payment service where, after a successful payment, we need to generate an invoice, send an email, and add a database entry, all at the same time. Add a subscription event where we also have to capture analytics and enable user access, and Kafka can streamline this entire setup at scale and in real time.
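To make that concrete: the payment flow could publish a single event that the invoice, email, and database services each consume independently. Here is a minimal producer sketch using the Java client; the topic name `payment-completed`, the broker address, and the payload are my own assumptions for illustration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PaymentEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One event is published; the invoice, email, and database
            // services each read it from the topic independently.
            producer.send(new ProducerRecord<>("payment-completed", "order-123", "{\"amount\": 49.99}"));
        }
    }
}
```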
How does Kafka work?
Kafka works on the Kafka protocol, a binary protocol built on top of TCP. It processes data through a distributed system of servers and clients.
Kafka-protocol
Kafka employs a binary protocol over TCP for efficient communication between clients and servers. This binary protocol is designed to minimize overhead and maximize performance. Unlike protocols that open a fresh connection (and handshake) for every request, Kafka clients maintain long-lived connections to the brokers, avoiding the cost of repeated handshakes.
Kafka-server
The Kafka server side is a cluster of one or more brokers that handles distributed processing of events, potentially spanning multiple data centers.
- Brokers are individual Kafka servers responsible for storing event streams and serving them to clients.
- A Kafka cluster is highly scalable and fault-tolerant: if any of its servers fails, the other servers take over its work to ensure continuous operation without any data loss.
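To see the broker/cluster relationship in practice, here is a small sketch that asks a cluster which brokers it contains, using the Java AdminClient (the bootstrap address is an assumption; any reachable broker works as an entry point):

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.Node;

public class ClusterInspector {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed entry-point broker
        try (AdminClient admin = AdminClient.create(props)) {
            // List every broker currently in the cluster.
            for (Node broker : admin.describeCluster().nodes().get()) {
                System.out.printf("Broker %d at %s:%d%n", broker.id(), broker.host(), broker.port());
            }
        }
    }
}
```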
Kafka-client
Kafka clients are applications that interact with Kafka clusters to produce and consume events. Kafka has client libraries available for many languages, including Node.js, Python, C, C++, and Java.
- Producers generate and send events to specific Kafka topics. They can operate asynchronously for high throughput or synchronously for guaranteed delivery.
- Consumers subscribe to topics of interest and actively poll the brokers for new events. They can be part of consumer groups, allowing multiple consumers to collaborate and share the load of processing events from a single topic.
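The producer side was sketched in the payment example above; here is a matching consumer sketch in Java (the topic and group names are my own assumptions). Running several copies of this program with the same `group.id` spreads the topic's partitions across them, which is how a consumer group shares the load:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PaymentEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "invoice-service"); // consumers in the same group share the load
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("payment-completed"));
            while (true) {
                // Actively poll the broker for new events.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```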
Kafka-Terminologies
Event
We have been talking about events for quite some time; an event is simply a record of an action that has happened in the application. Each event belongs to a topic and carries data (a message) with it to help in further processing. Unlike traditional messaging systems, Kafka does not delete events after consumption; instead, we define how long an event is stored in the Kafka broker or cluster.
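Retention can be set per topic when it is created. A sketch with the Java AdminClient; the topic name, partition/replication counts, and the seven-day retention are assumptions for illustration:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicWithRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 1, events kept for 7 days.
            NewTopic topic = new NewTopic("payment-completed", 3, (short) 1)
                    .configs(Map.of("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```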
Topic
Topics are the identification for events. Similar to folders in a file system, a topic helps organize related events together. Topics in Kafka are always multi-producer and multi-subscriber.
Partition
Imagine a topic as a book with multiple chapters (partitions). Each chapter (partition) can hold a sequence of events. When you produce an event, you essentially decide which chapter (partition) it belongs to based on its key.
- For each partition, one broker is designated as the leader, while others are followers. The leader handles reads and writes for the partition, while followers replicate data from the leader. If the leader fails, one of the followers is promoted to become the new leader.
- Kafka brokers maintain metadata about the cluster's state, including the topics, partitions, and their respective leaders. Clients fetch this metadata to determine which broker to send requests to.
As an example of topic partitioning, each event is assigned to a specific partition based on its partitioning key (see the Kafka docs for a diagram); a minimal sketch follows below.
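Because the default partitioner hashes the key, all events with the same key land in the same partition, which preserves their relative order. A sketch reusing the producer from the earlier example (the user-ID keys are hypothetical):

```java
// Both events carry the key "user-42", so the default partitioner
// hashes them to the same partition and their order is preserved.
producer.send(new ProducerRecord<>("payment-completed", "user-42", "payment-1"));
producer.send(new ProducerRecord<>("payment-completed", "user-42", "payment-2"));

// An event with a different key may land in a different partition.
producer.send(new ProducerRecord<>("payment-completed", "user-7", "payment-3"));
```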
Kafka-APIs
All of these APIs are provided by client-side packages for interacting with Kafka brokers/clusters:
- The Admin API to manage and inspect topics, brokers, and other Kafka objects.
- The Producer API to publish (write) a stream of events to one or more Kafka topics.
- The Consumer API to subscribe to (read) one or more topics and to process the stream of events produced to them.
- The Kafka Streams API to implement stream processing applications and microservices. It is helpful for aggregating and pre-processing data with operations like joins and windowing (see the sketch after this list).
- The Kafka Connect API to build and run reusable data import/export connectors that consume (read) or produce (write) streams of events from and to external systems and applications so they can integrate with Kafka.
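As a taste of the Streams API, here is a minimal sketch that counts payment events per user and writes the running totals to another topic; the topic names and the application ID are my own assumptions:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class PaymentCounts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payment-analytics");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> payments = builder.stream("payment-completed");
        // Group by key (the user), count events per user, and emit the totals.
        payments.groupByKey()
                .count()
                .toStream()
                .to("payment-counts-per-user", Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```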
More info on installing Kafka can be found here.
Hope this gives you a basic idea of Kafka, how it works, and when to use it. I will keep updating this series as I learn more about Kafka.