Why Kafka?
Apache Kafka is a distributed event-streaming platform built for high throughput, low latency, and real-time data processing. It provides a reliable way to publish, subscribe to, store, and process streams of records, scaling horizontally while tolerating broker failures.
Kafka is widely used for event-driven systems, microservices communication, log aggregation, real-time analytics, and data pipelines across industries like finance, ecommerce, and social media.
Key Concepts
Topic
A topic is a named feed/category to which data is published. Records in a topic are immutable: data is only appended to the log, never modified in place.
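The append-only behavior can be sketched with a minimal in-memory log. This is purely illustrative; the class and method names here are hypothetical, not part of any Kafka client API:

```python
class TopicLog:
    """Minimal in-memory sketch of an append-only log (one partition's worth)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Records are only ever appended; the returned offset is the
        # record's fixed position in the log.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        # Reading never removes or mutates data; any consumer can
        # re-read the same offset later.
        return self._records[offset]

log = TopicLog()
off = log.append("order-created")
print(off, log.read(off))  # 0 order-created
```

In real Kafka, each offset is scoped to a single partition of the topic, which is what the next section covers.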
Partition
- A topic is divided into partitions for parallel processing.
- Each partition is an ordered log (append-only).
- Replication is per partition: each partition's copies live on different brokers, NOT on other partitions.
- More partitions generally mean more parallelism and higher throughput.
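When a record has a key, the producer hashes the key to pick a partition, so all records with the same key land in the same partition and stay ordered. A minimal sketch of that idea (Kafka's default partitioner actually uses a murmur2 hash; `crc32` is used here only so the example is deterministic and self-contained):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index (illustrative sketch)."""
    return zlib.crc32(key) % num_partitions

# Same key -> same partition, which preserves per-key ordering.
p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)
assert p1 == p2
print(p1)
```

Records with no key are typically spread across partitions (e.g. round-robin or sticky batching), trading per-key ordering for balance.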
Broker
- A broker is a Kafka server.
- A cluster is made up of many brokers.
- Each broker stores multiple partitions.
- Brokers also handle requests from producers and consumers.
Cluster
A Kafka cluster is a group of brokers working together. Data is distributed across brokers for fault tolerance and scalability. If one broker fails, another can serve the data thanks to replication.
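The failover idea can be sketched as follows. This is a toy model, not Kafka's actual controller logic: each partition gets replicas on consecutive brokers, one replica acts as leader, and if the leader's broker dies the next live replica takes over:

```python
NUM_BROKERS = 3
REPLICATION_FACTOR = 2

def assign_replicas(partition: int) -> list:
    # Spread each partition's replicas across distinct brokers.
    return [(partition + i) % NUM_BROKERS for i in range(REPLICATION_FACTOR)]

replicas = {p: assign_replicas(p) for p in range(4)}

def leader_for(partition: int, dead_brokers: set) -> int:
    # The first live replica serves as leader; the rest are followers.
    for broker in replicas[partition]:
        if broker not in dead_brokers:
            return broker
    raise RuntimeError("no live replica for partition %d" % partition)

print(leader_for(0, dead_brokers=set()))  # broker 0 leads partition 0
print(leader_for(0, dead_brokers={0}))    # broker 1 takes over after failure
```

With a replication factor of 2, the cluster survives one broker failure per partition; real deployments commonly use a factor of 3.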
Consumer Group
A set of consumers that work together to read data from a topic.
- Each partition is consumed by only one consumer within a group.
- Enables load balancing and parallel processing.
- Allows scaling of consumers horizontally.
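The rules above can be sketched with a simple round-robin assignment. Kafka actually ships several assignment strategies (range, round-robin, sticky); this hypothetical helper just shows that every partition goes to exactly one consumer in the group:

```python
def assign_partitions(partitions: list, consumers: list) -> dict:
    """Round-robin partition assignment within one consumer group (sketch)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        # Each partition is handed to exactly one consumer in the group.
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 6 partitions shared by 2 consumers in the same group:
print(assign_partitions(list(range(6)), ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
```

Note the implication: with more consumers than partitions, the extra consumers sit idle, which is why partition count caps a group's parallelism.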
Producer – Publishes messages to Kafka topics.
Consumer – Reads messages from topics.
Offset – Position of a message inside a partition. Used to track how much data has been consumed.
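The offset mechanism can be sketched like this. It is a toy model of one partition and one consumer, with an invented `poll` helper rather than the real client API: the consumer reads a batch starting at its offset, then remembers the new offset so it can resume from the right place after a restart:

```python
partition_log = ["evt-0", "evt-1", "evt-2"]  # append-only record log
committed_offset = 0                          # next record to read

def poll(log, offset, max_records=10):
    """Return up to max_records starting at offset, plus the new offset."""
    batch = log[offset:offset + max_records]
    return batch, offset + len(batch)

batch, committed_offset = poll(partition_log, committed_offset)
print(batch)             # ['evt-0', 'evt-1', 'evt-2']
print(committed_offset)  # 3 -> the consumer resumes here after a restart
```

Real consumers periodically commit this offset back to Kafka, so progress survives crashes and group rebalances.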