DEV Community

shantanu mahakale

Quick Recap: Kafka

Why Kafka?

Apache Kafka is a distributed event-streaming platform built for high throughput, low latency, and real-time data processing. It lets applications publish, consume, store, and process streams of records, and it scales horizontally while tolerating broker failures.

Kafka is widely used for event-driven systems, microservices communication, log aggregation, real-time analytics, and data pipelines in industries such as finance, e-commerce, and social media.

Key Concepts

Topic

A topic is a named feed or category to which records are published. A topic is append-only: records are immutable once written, so new data is added at the end and existing data is never modified in place.

Partition

  • A topic is divided into partitions for parallel processing.
  • Each partition is an ordered log (append-only).
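  • Each partition is replicated across brokers: the copies of one partition live on different brokers, not in other partitions.
  • More partitions allow more consumers to read in parallel, but ordering is guaranteed only within a single partition.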
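To make the partition idea concrete, here is a minimal in-memory sketch, not real Kafka client code. The `Topic` class and its methods are hypothetical names, and CRC32 is only a stand-in hash; actual Kafka clients ship their own partitioner. It shows that records with the same key land in the same partition, and that each partition is an append-only log with increasing offsets.

```python
import zlib


class Topic:
    """Toy topic: a fixed number of append-only partition logs (illustrative only)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stand-in hash (CRC32) chosen for this sketch; real clients use their own partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append a record to the partition chosen by its key; return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


topic = Topic("orders", num_partitions=3)
first = topic.append("user-1", "created")
second = topic.append("user-1", "paid")

# Same key, same partition, so the two events keep their relative order.
print(first, second)
```

Because both records share the key `user-1`, they end up in one partition, which is why per-key ordering is preserved even though the topic as a whole has no global order.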

Broker

  • A broker is a Kafka server.
  • A cluster is made up of many brokers.
  • Each broker stores multiple partitions.
  • Brokers also handle requests from producers and consumers.

Cluster

A Kafka cluster is a group of brokers working together. Partitions and their replicas are distributed across brokers for fault tolerance and scalability. If one broker fails, another broker that holds a replica can serve the data.
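The effect of replication can be sketched with a simplified placement rule. The functions `assign_replicas` and `survivors` are hypothetical, and the round-robin placement is an assumption made for illustration; Kafka's real replica assignment is more involved. The point is only that each partition's replicas sit on different brokers, so losing one broker still leaves a copy.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on distinct brokers (simplified round-robin)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def survivors(assignment, failed_broker):
    """Return the replicas that remain for each partition after one broker fails."""
    return {
        p: [b for b in replicas if b != failed_broker]
        for p, replicas in assignment.items()
    }


brokers = ["broker-0", "broker-1", "broker-2"]
assignment = assign_replicas(num_partitions=4, brokers=brokers, replication_factor=2)
after_failure = survivors(assignment, "broker-1")

for partition, replicas in after_failure.items():
    print(partition, replicas)
```

With a replication factor of 2, every partition still has at least one replica after `broker-1` goes down.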

Consumer Group

A consumer group is a set of consumers that work together to read data from a topic.

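  • Each partition is consumed by only one consumer within a group at a time; one consumer may read several partitions.
  • Enables load balancing and parallel processing.
  • Consumers scale horizontally up to the number of partitions; any consumers beyond that sit idle.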
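The rules above can be sketched as a small assignment function. This is a simplified round-robin strategy written for illustration; `assign_partitions` is a hypothetical helper, and Kafka's actual assignors are configurable and handle rebalancing, which this sketch ignores.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions shared by two consumers: each consumer reads two partitions.
balanced = assign_partitions([0, 1, 2, 3], ["c1", "c2"])

# Two partitions but three consumers: one consumer gets nothing to read.
oversized = assign_partitions([0, 1], ["c1", "c2", "c3"])

print(balanced)
print(oversized)
```

The second call shows why adding consumers beyond the partition count does not increase throughput: the extra consumer has no partition to own.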

Producer – Publishes messages to Kafka topics.

Consumer – Reads messages from topics.

Offset – Sequential position of a message within a partition. Consumers use it to track how far they have read.
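As a rough sketch of how offsets track progress, the toy log below lets a consumer read a batch, record the next offset to read, and resume from there. `PartitionLog` and the `committed` variable are hypothetical stand-ins, not Kafka client APIs; the one convention borrowed from Kafka is that a committed offset points at the next record to consume.

```python
class PartitionLog:
    """Toy append-only log for a single partition (illustrative only)."""

    def __init__(self):
        self.records = []

    def append(self, value):
        """Append a record and return its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset, max_records):
        """Return up to max_records records starting at the given offset."""
        return self.records[offset:offset + max_records]


log = PartitionLog()
for event in ["a", "b", "c", "d", "e"]:
    log.append(event)

committed = 0  # next offset this consumer group should read

first_batch = log.read_from(committed, max_records=2)
committed += len(first_batch)  # remember progress after processing the batch

# After a restart, a consumer resumes from the committed offset instead of the start.
second_batch = log.read_from(committed, max_records=2)

print(first_batch, second_batch, committed)
```

The log itself never changes when records are read; only the consumer's offset moves forward.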
