Why Kafka?
Apache Kafka is a distributed event-streaming platform built for high throughput, low latency, and real-time data processing. It provides a reliable way to publish, subscribe to, store, and process streams of records, scaling horizontally while tolerating broker failures.
Kafka is widely used for event-driven systems, microservices communication, log aggregation, real-time analytics, and data pipelines across industries like finance, ecommerce, and social media.
Key Concepts
Topic
A topic is a named feed/category to which data is published. Records in a topic are immutable: data is only appended to the log, never modified in place.
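The append-only behavior can be sketched with a minimal in-memory log. This is purely illustrative; the class and method names here are hypothetical, not part of any Kafka client API:

```python
class TopicLog:
    """Minimal in-memory sketch of an append-only log (one partition's worth)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Records are only ever appended; the returned offset is the
        # record's fixed position in the log.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        # Reading never removes or mutates data; any consumer can
        # re-read the same offset later.
        return self._records[offset]

log = TopicLog()
off = log.append("order-created")
print(off, log.read(off))  # 0 order-created
```

In real Kafka, each offset is scoped to a single partition of the topic, which is what the next section covers.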
Partition
- A topic is divided into partitions for parallel processing.
- Each partition is an ordered log (append-only).
- Replication is per partition: each partition's copies live on different brokers, NOT on other partitions.
- More partitions generally mean more parallelism and higher throughput.
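When a record has a key, the producer hashes the key to pick a partition, so all records with the same key land in the same partition and stay ordered. A minimal sketch of that idea (Kafka's default partitioner actually uses a murmur2 hash; `crc32` is used here only so the example is deterministic and self-contained):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index (illustrative sketch)."""
    return zlib.crc32(key) % num_partitions

# Same key -> same partition, which preserves per-key ordering.
p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)
assert p1 == p2
print(p1)
```

Records with no key are typically spread across partitions (e.g. round-robin or sticky batching), trading per-key ordering for balance.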
Broker
- A broker is a Kafka server.
- A cluster is made up of many brokers.
- Each broker stores multiple partitions.
- Brokers also handle requests from producers and consumers.
Cluster
A Kafka cluster is a group of brokers working together. Data is distributed across brokers for fault tolerance and scalability. If one broker fails, another can serve the data thanks to replication.
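The failover idea can be sketched as follows. This is a toy model, not Kafka's actual controller logic: each partition gets replicas on consecutive brokers, one replica acts as leader, and if the leader's broker dies the next live replica takes over:

```python
NUM_BROKERS = 3
REPLICATION_FACTOR = 2

def assign_replicas(partition: int) -> list:
    # Spread each partition's replicas across distinct brokers.
    return [(partition + i) % NUM_BROKERS for i in range(REPLICATION_FACTOR)]

replicas = {p: assign_replicas(p) for p in range(4)}

def leader_for(partition: int, dead_brokers: set) -> int:
    # The first live replica serves as leader; the rest are followers.
    for broker in replicas[partition]:
        if broker not in dead_brokers:
            return broker
    raise RuntimeError("no live replica for partition %d" % partition)

print(leader_for(0, dead_brokers=set()))  # broker 0 leads partition 0
print(leader_for(0, dead_brokers={0}))    # broker 1 takes over after failure
```

With a replication factor of 2, the cluster survives one broker failure per partition; real deployments commonly use a factor of 3.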
Consumer Group
A set of consumers that work together to read data from a topic.
- Each partition is consumed by only one consumer within a group.
- Enables load balancing and parallel processing.
- Allows scaling of consumers horizontally.
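The rules above can be sketched with a simple round-robin assignment. Kafka actually ships several assignment strategies (range, round-robin, sticky); this hypothetical helper just shows that every partition goes to exactly one consumer in the group:

```python
def assign_partitions(partitions: list, consumers: list) -> dict:
    """Round-robin partition assignment within one consumer group (sketch)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        # Each partition is handed to exactly one consumer in the group.
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 6 partitions shared by 2 consumers in the same group:
print(assign_partitions(list(range(6)), ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
```

Note the implication: with more consumers than partitions, the extra consumers sit idle, which is why partition count caps a group's parallelism.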
Producer – Publishes messages to Kafka topics.
Consumer – Reads messages from topics.
Offset – Position of a message inside a partition. Used to track how much data has been consumed.
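The offset mechanism can be sketched like this. It is a toy model of one partition and one consumer, with an invented `poll` helper rather than the real client API: the consumer reads a batch starting at its offset, then remembers the new offset so it can resume from the right place after a restart:

```python
partition_log = ["evt-0", "evt-1", "evt-2"]  # append-only record log
committed_offset = 0                          # next record to read

def poll(log, offset, max_records=10):
    """Return up to max_records starting at offset, plus the new offset."""
    batch = log[offset:offset + max_records]
    return batch, offset + len(batch)

batch, committed_offset = poll(partition_log, committed_offset)
print(batch)             # ['evt-0', 'evt-1', 'evt-2']
print(committed_offset)  # 3 -> the consumer resumes here after a restart
```

Real consumers periodically commit this offset back to Kafka, so progress survives crashes and group rebalances.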