
Apache Kafka Fundamentals — A Complete Technical Guide

Table of Contents

  1. Introduction
  2. Kafka Core Concepts
  3. Producers
  4. Consumers
  5. Kafka Scaling Concepts
  6. Kafka Streams
  7. Kafka Delivery Guarantees
  8. Hands-On Tips
  9. Conclusion


Introduction

Apache Kafka is a distributed, fault-tolerant streaming platform built to handle high-throughput, real-time data pipelines and messaging systems. Originally developed by LinkedIn, Kafka has become a backbone for event-driven architectures across companies like Netflix, PayPal, Tesla, and Pinterest.

Kafka allows applications to publish, subscribe, store, and process streams of records in a scalable and reliable way. It’s used for real-time analytics, log aggregation, messaging, and event sourcing.

In this guide, we’ll explore Kafka’s core concepts, architecture, producers, consumers, streams, delivery guarantees, scaling strategies, and hands-on tips with examples.


Kafka Core Concepts

Topics and Partitions

  • Topic: A category or feed name where messages are stored. Think of it like a folder for messages.
  • Partition: A topic is split into one or more partitions. Partitions enable parallel reads and writes; message ordering is guaranteed only within a single partition.

Example:

Topic: orders
Partitions: 3
Partition 0 → Messages for US
Partition 1 → Messages for Europe
Partition 2 → Messages for Asia

Best Practices:

  • Topic names should reflect business semantics, e.g., orders, payments, user-signups.
  • Partitions control throughput & scaling, not semantics.
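
To make the region-to-partition mapping above concrete, here is a minimal producer sketch using keyed messages. By default, Kafka hashes the record key (murmur2) to choose a partition, so all records with the same key land in the same partition and stay ordered; the topic name orders matches the example above.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// Records sharing a key always hash to the same partition,
// so per-region ordering is preserved.
producer.send(new ProducerRecord<>("orders", "US", "order-1"));
producer.send(new ProducerRecord<>("orders", "US", "order-2"));  // same partition as order-1
producer.send(new ProducerRecord<>("orders", "EU", "order-3"));  // possibly a different partition
producer.close();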

Brokers and Clusters

  • Broker: A Kafka server responsible for storing data and serving clients.
  • Cluster: Multiple brokers form a cluster. One broker acts as the controller, handling partition leadership and cluster metadata.

Replication ensures fault tolerance:

  • If a broker fails, a follower replica on another broker is promoted to leader.
  • Example: replication-factor=3 → each partition stored on 3 brokers.

Zookeeper and kraft

  • Legacy Kafka relies on ZooKeeper for cluster metadata, leader elections, and broker coordination.
  • Modern Kafka (2.8+) can run in KRaft mode, which removes the ZooKeeper dependency for simpler cluster management (production-ready since 3.3).

Producers

Producers send data to Kafka topics. They handle serialization, partitioning, and delivery guarantees.

Producer Responsibilities

  • Publish messages to topics.
  • Ensure ordering per partition.
  • Serialize messages for storage.

Sending Strategies

| Strategy | Description | Use Case |
| --- | --- | --- |
| Fire-and-forget | Send without waiting for ack | High-speed logging |
| Synchronous | Waits for broker acknowledgment | Critical transactional data |
| Asynchronous | Callback-based confirmation | Balanced performance & reliability |
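
A minimal sketch of the three strategies, assuming a configured KafkaProducer<String, String> named producer (the topic names are illustrative; the synchronous get() also throws checked exceptions you would handle in real code):

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Fire-and-forget: send and move on; failures only show up in logs/metrics
producer.send(new ProducerRecord<>("logs", "debug-event"));

// Synchronous: block on the returned Future until the broker acknowledges
RecordMetadata meta = producer.send(new ProducerRecord<>("payments", "tx-1")).get();

// Asynchronous: register a callback and keep producing
producer.send(new ProducerRecord<>("orders", "order-1"), (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace();  // handle or retry in real code
    }
});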

Acknowledgment Levels

  • acks=0 → No confirmation, fastest but unsafe
  • acks=1 → Leader confirms, faster but risk of data loss if leader fails
  • acks=all/-1 → All in-sync replicas confirm, safest option
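
These map directly onto producer configuration. A small sketch using the ProducerConfig constants (the values shown are the common safe choice):

import org.apache.kafka.clients.producer.ProducerConfig;

props.put(ProducerConfig.ACKS_CONFIG, "all");                 // all in-sync replicas must confirm
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);  // retry transient failures
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);    // avoid duplicates from retries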

Serialization

Kafka stores and transports only byte arrays, so every key and value must be serialized on the way in and deserialized on the way out.

  • Built-in serializers: String, Integer, ByteArray
  • Custom serializers: JSON, Avro, Protobuf

Example:

// Assumes a KafkaProducer<String, String> named "producer",
// configured with StringSerializer for both key and value
ProducerRecord<String, String> record =
    new ProducerRecord<>("orders", "order123", "{ \"user\": \"John\", \"amount\": 99.99 }");
producer.send(record);
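
For custom formats you implement Kafka's Serializer interface. A minimal JSON sketch, assuming the Jackson library is on the classpath and Order is a hypothetical POJO:

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serializer;

public class OrderJsonSerializer implements Serializer<Order> {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public byte[] serialize(String topic, Order order) {
        try {
            return mapper.writeValueAsBytes(order);  // POJO → JSON bytes
        } catch (Exception e) {
            throw new RuntimeException("Serialization failed", e);
        }
    }
}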

Consumers

Consumers read messages from Kafka topics and process them.

Consumer Groups

  • Multiple consumers can share partitions in a group.
  • Each partition is read by only one consumer in a group.
  • Multiple groups consume the same topic independently, each tracking its own offsets.

Example:

Topic: orders (3 partitions)
Consumer Group: order-processors (2 consumers)
Partition 0 → Consumer 1
Partition 1 → Consumer 2
Partition 2 → Consumer 1
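
A minimal consumer-group sketch: every instance started with the same group.id joins order-processors and receives a share of the partitions, matching the assignment above.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "order-processors");  // same group → partitions are shared
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("orders"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}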

Offset Management

  • Offset: tracks which messages a consumer has already read, per partition.
  • Offsets can be committed automatically (enable.auto.commit, on by default) or manually.

Warning: mismanaged offsets cause duplicate processing (committing too late → reprocessing after a crash) or message loss (committing too early → skipping unprocessed messages).
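
A common at-least-once pattern is to disable auto-commit and commit only after processing succeeds. A sketch reusing the consumer above (process() is a hypothetical processing step):

props.put("enable.auto.commit", "false");  // take control of commits

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        process(record);        // hypothetical business logic
    }
    consumer.commitSync();      // commit only after the whole batch succeeded
}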


Partition Rebalancing

  • Triggered when a consumer joins/leaves or partitions change.
  • Temporarily pauses consumption while rebalancing.
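
You can hook into rebalances with a ConsumerRebalanceListener, for example to commit progress before partitions are taken away. A sketch reusing the consumer above:

import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        consumer.commitSync();  // flush progress before losing these partitions
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        System.out.println("Assigned: " + partitions);
    }
});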

Kafka Scaling Concepts

Kafka can handle millions of messages/sec using:

  • Partitioning → parallelism for production & consumption
  • Consumer Groups → horizontal scaling
  • Replication → fault tolerance
  • Retention & Compaction → storage management
  • Compression → reduce network and disk load (gzip, snappy, lz4, zstd)
  • Multi-Cluster & MirrorMaker → cross-cluster replication
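
Compression, for example, is a single producer setting (zstd requires Kafka 2.1+):

props.put("compression.type", "lz4");  // or gzip, snappy, zstd

Retention can be tuned per topic via the CLI; a sketch setting 7 days (604800000 ms) on the orders topic:

kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name orders --add-config retention.ms=604800000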



Kafka Streams

Kafka Streams is a library for real-time stream processing:

  • Supports stateless & stateful operations
  • Windowing → aggregate over time windows
  • Joins → join streams & tables
  • Stream ↔ Table duality (KStream ↔ KTable)

Example: Count orders per minute:

import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> orders = builder.stream("orders");
KTable<Windowed<String>, Long> count = orders
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))  // tumbling 1-minute windows
    .count();
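
To make the counts consumable downstream, you can convert the table back to a stream and write it to an output topic. A sketch continuing the example above (the topic name orders-per-minute is illustrative):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;

count.toStream()
     .map((windowedKey, total) -> KeyValue.pair(windowedKey.key(), total))  // unwrap the window
     .to("orders-per-minute", Produced.with(Serdes.String(), Serdes.Long()));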

Kafka Delivery Guarantees

Kafka supports three levels of message delivery:

  1. At-most-once → Messages may be lost, but are never redelivered
  2. At-least-once → Messages are never lost, but may be duplicated
  3. Exactly-once → No duplicates, no loss

Exactly-once guarantees require:

  • Idempotent producers
  • Transactions for atomic writes across partitions
  • Atomic offset commits
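
A minimal transactional-producer sketch (the transactional.id and topic names are illustrative, and error handling is simplified):

props.put("enable.idempotence", "true");
props.put("transactional.id", "order-service-1");  // unique and stable per producer instance

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("orders", "order123", "created"));
    producer.send(new ProducerRecord<>("payments", "order123", "charged"));
    producer.commitTransaction();   // both records become visible atomically
} catch (Exception e) {
    producer.abortTransaction();    // read_committed consumers never see either record
}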

Configuration Example:

# Producer configs
enable.idempotence=true
acks=all
retries=2147483647
# must stay <= 5 when idempotence is enabled; 1 is the strictest ordering
max.in.flight.requests.per.connection=1
# Consumer config
isolation.level=read_committed
# Kafka Streams config (exactly_once_v2 requires Kafka 3.0+)
processing.guarantee=exactly_once_v2

Hands-On Tips

  • Use Docker / Docker Compose to run Kafka locally (see the example after this list).
  • Useful CLI tools:
    • kafka-topics.sh → manage topics
    • kafka-console-producer.sh → produce messages
    • kafka-console-consumer.sh → consume messages
  • Start small: produce/consume messages, explore partitions and offsets.
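
For the first tip, a one-liner sketch assuming Docker and the official apache/kafka image (available for Kafka 3.7+), which starts a single-node KRaft broker:

# Run a single-node Kafka broker (KRaft mode) on localhost:9092
docker run -d --name kafka -p 9092:9092 apache/kafka:latest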

Example CLI Commands:

# Create a topic
kafka-topics.sh --create --topic orders --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1

# Produce messages
kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092

# Consume messages
kafka-console-consumer.sh --topic orders --from-beginning --bootstrap-server localhost:9092

Conclusion

Kafka is a high-performance, distributed streaming platform ideal for real-time messaging, analytics, and event-driven systems. Understanding topics, partitions, brokers, producers, consumers, and delivery guarantees is essential for building reliable Kafka architectures.

By practicing with topics, partitions, producers, consumers, and streams, developers can unlock Kafka’s true power in high-throughput, fault-tolerant, and scalable applications.


More Details:

Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli

GitHub: https://github.com/ZeeshanAli-0704/SystemDesignWithZeeshanAli
