ZeeshanAli-0704

Posted on May 6 • Edited on Sep 13

Apache Kafka Fundamentals — A Complete Technical Guide

#systemdesign #systemdesignwithzeeshanali

Introduction
Kafka Core Concepts
Producers
Consumers
Kafka Scaling Concepts
Kafka Streams
Kafka Delivery Guarantees
Hands On Tips
Conclusion


Introduction

Apache Kafka is a distributed, fault-tolerant streaming platform built to handle high-throughput, real-time data pipelines and messaging systems. Originally developed by LinkedIn, Kafka has become a backbone for event-driven architectures across companies like Netflix, PayPal, Tesla, and Pinterest.

Kafka allows applications to publish, subscribe, store, and process streams of records in a scalable and reliable way. It’s used for real-time analytics, log aggregation, messaging, and event sourcing.

In this guide, we’ll explore Kafka’s core concepts, architecture, producers, consumers, streams, delivery guarantees, scaling strategies, and hands-on tips with examples.

Kafka Core Concepts

Topics and Partitions

Topic: A category or feed name where messages are stored. Think of it like a folder for messages.
Partition: A topic is split into one or more partitions. Partitions allow parallel reads and writes, and Kafka guarantees message ordering only within a single partition, not across the whole topic.

Example:

Topic: orders
Partitions: 3
Partition 0 → Messages for US
Partition 1 → Messages for Europe
Partition 2 → Messages for Asia

Best Practices:

Topic names should reflect business semantics, e.g., orders, payments, user-signups.
Partitions control throughput & scaling, not semantics.
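
To make the key-to-partition idea concrete, here is a minimal sketch of how a record key could be mapped to one of a topic's partitions. It is a simplified model, not Kafka's partitioner: the class name KeyPartitionerSketch is made up for illustration, and Arrays.hashCode stands in for the murmur2 hash that the Java client applies to keyed records. The point it shows is that the same key always lands on the same partition, which is what preserves per-key ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified model of key-based partitioning; not Kafka's actual partitioner.
final class KeyPartitionerSketch {

    // Maps a record key to a partition index in [0, numPartitions).
    // Arrays.hashCode is an assumed stand-in for the client's murmur2 hash.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        // The repeated "US" key maps to the same partition both times.
        for (String key : new String[] {"US", "Europe", "Asia", "US"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 3));
        }
    }
}
```

Because the mapping depends on the hash, a tidy layout such as "US on partition 0" is not guaranteed by keys alone; pinning regions to specific partitions would need a custom partitioner.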

Brokers and Clusters

Broker: A Kafka server responsible for storing data and serving clients.
Cluster: Multiple brokers form a cluster. One broker acts as the controller, handling partition leader election and cluster metadata (in KRaft mode, a quorum of controller nodes fills this role).

Replication ensures fault tolerance:

If the broker hosting a partition's leader fails, an in-sync replica on another broker is elected as the new leader.
Example: replication-factor=3 → each partition stored on 3 brokers.
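
The failover behaviour described above can be sketched with a toy model of a single partition. This is an illustration under simplifying assumptions, not broker code: the class ReplicaFailoverSketch and the broker ids 1, 2, 3 are hypothetical, and the "promote the first remaining in-sync replica" rule is a stand-in for the controller's real election logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of one partition with replication-factor=3; broker ids are hypothetical.
final class ReplicaFailoverSketch {
    private final List<Integer> inSyncReplicas; // broker ids holding an up-to-date copy
    private int leader;

    ReplicaFailoverSketch(List<Integer> replicas) {
        this.inSyncReplicas = new ArrayList<>(replicas);
        this.leader = replicas.get(0); // the first replica starts as leader
    }

    int leader() {
        return leader;
    }

    // Drops a failed broker and, if it was the leader, promotes another in-sync replica.
    void brokerFailed(int brokerId) {
        inSyncReplicas.remove(Integer.valueOf(brokerId));
        if (brokerId == leader) {
            if (inSyncReplicas.isEmpty()) {
                throw new IllegalStateException("no in-sync replica left; partition is offline");
            }
            leader = inSyncReplicas.get(0);
        }
    }

    public static void main(String[] args) {
        ReplicaFailoverSketch partition = new ReplicaFailoverSketch(Arrays.asList(1, 2, 3));
        System.out.println("leader before failure: " + partition.leader());
        partition.brokerFailed(1);
        System.out.println("leader after broker 1 fails: " + partition.leader());
    }
}
```

With three replicas the partition survives two broker failures in this model; a third failure leaves no replica to promote.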

ZooKeeper and KRaft

Legacy Kafka relies on ZooKeeper for cluster metadata, controller election, and broker coordination.
KRaft mode arrived as early access in Kafka 2.8, became production-ready in 3.3, and is the only supported mode from Kafka 4.0 onward. It removes the ZooKeeper dependency for simpler cluster management.

Producers

Producers send data to Kafka topics. They handle serialization, partitioning, and delivery guarantees.

Producer Responsibilities

Publish messages to topics.
Ensure ordering per partition.
Serialize messages for storage.

Sending Strategies

Strategy	Description	Use Case
Fire-and-forget	Send without waiting for ack	High-speed logging
Synchronous	Blocks on the send result until the broker acknowledges	Critical transactional data
Asynchronous	Callback-based confirmation	Balanced performance & reliability
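
The three strategies in the table differ only in what the caller does with the result of a send. The sketch below models that with a hypothetical FakeProducer whose send() completes asynchronously with an offset; it is an assumption made so the example runs without a broker. In the real Java client, send() returns a Future<RecordMetadata> and optionally takes a callback, but the calling patterns look much the same.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Models the three sending strategies against a hypothetical in-memory producer.
final class SendStrategiesSketch {

    // Stand-in for a producer: send() completes later with the assigned offset.
    static final class FakeProducer {
        private final AtomicLong nextOffset = new AtomicLong();

        CompletableFuture<Long> send(String value) {
            return CompletableFuture.supplyAsync(nextOffset::getAndIncrement);
        }
    }

    // Fire-and-forget: the returned future is ignored, so failures go unnoticed.
    static void fireAndForget(FakeProducer producer, String value) {
        producer.send(value);
    }

    // Synchronous: block the calling thread until the acknowledgment arrives.
    static long sendSync(FakeProducer producer, String value) {
        return producer.send(value).join();
    }

    // Asynchronous: register a callback and return without blocking.
    static CompletableFuture<Void> sendAsync(FakeProducer producer, String value, StringBuilder log) {
        return producer.send(value).thenAccept(offset -> log.append("acked@").append(offset));
    }

    public static void main(String[] args) {
        FakeProducer producer = new FakeProducer();
        System.out.println("sync send acked at offset " + sendSync(producer, "order-1"));
        StringBuilder log = new StringBuilder();
        sendAsync(producer, "order-2", log).join(); // join only so the demo can print the result
        System.out.println("async callback logged: " + log);
        fireAndForget(producer, "order-3"); // no confirmation is ever observed
    }
}
```

The synchronous variant trades throughput for certainty, since each send waits a full round trip; the callback variant keeps the pipeline full while still surfacing the outcome.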

Acknowledgment Levels

acks=0 → No confirmation, fastest but unsafe
acks=1 → Leader confirms; fast, but data can be lost if the leader fails before followers replicate the message
acks=all (or -1) → All in-sync replicas confirm; safest option
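
As a rough illustration of where the acks setting lives, the sketch below builds producer settings with plain java.util.Properties. The configuration keys are standard producer settings; the helper name producerProps, the class name, and the localhost:9092 address are assumptions for a local single-broker setup.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Builds producer settings for a chosen acknowledgment level.
final class ProducerAcksConfigSketch {
    private static final List<String> VALID_ACKS = Arrays.asList("0", "1", "all", "-1");

    static Properties producerProps(String acks) {
        if (!VALID_ACKS.contains(acks)) {
            throw new IllegalArgumentException("acks must be one of " + VALID_ACKS);
        }
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", acks); // "0" = no wait, "1" = leader only, "all" = all in-sync replicas
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps("all").getProperty("acks"));
    }
}
```

With the real client on the classpath, the returned Properties would typically be passed to the KafkaProducer constructor.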

Serialization

Kafka stores byte arrays.

Built-in serializers: String, Integer, ByteArray
Custom serializers: JSON, Avro, Protobuf

Example:

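// Assumes `producer` is an already-configured KafkaProducer<String, String>.
ProducerRecord<String, String> record =
    new ProducerRecord<>("orders", "order123", "{ \"user\": \"John\", \"amount\": 99.99 }");
producer.send(record); // asynchronous; returns a Future<RecordMetadata>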
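
Since Kafka only stores bytes, a custom serializer boils down to "object in, byte array out". The sketch below shows that step for a hypothetical Order type, producing the same JSON shape as the record above. It deliberately does not implement the client's Serializer interface so that it runs with the standard library alone; the method merely mirrors the serialize(topic, data) shape. It also does no JSON escaping, so treat it as a demonstration rather than something to ship.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hand-rolled JSON serialization for a hypothetical Order value type.
final class OrderJsonSerializerSketch {

    // Hypothetical value type; field names follow the JSON in the example above.
    static final class Order {
        final String user;
        final double amount;

        Order(String user, double amount) {
            this.user = user;
            this.amount = amount;
        }
    }

    // Mirrors the serialize(topic, data) shape: returns UTF-8 JSON bytes, or null for null input.
    // The topic argument is unused here. No escaping is applied to the user field.
    static byte[] serialize(String topic, Order order) {
        if (order == null) {
            return null;
        }
        String json = String.format(Locale.ROOT,
                "{\"user\":\"%s\",\"amount\":%.2f}", order.user, order.amount);
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("orders", new Order("John", 99.99));
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

In practice a JSON library or a schema-based format such as Avro or Protobuf is usually a better fit, because it handles escaping and schema evolution.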

Consumers

Consumers read messages from Kafka topics and process them.

Consumer Groups

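Consumers in the same group divide a topic's partitions among themselves.
Each partition is read by exactly one consumer in a group, so consumers beyond the partition count sit idle.
Multiple groups consume independently: each group receives every message and tracks its own offsets.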

Example:

Topic: orders (3 partitions)
Consumer Group: order-processors (2 consumers)
Partition 0 → Consumer 1
Partition 1 → Consumer 2
Partition 2 → Consumer 1
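
The assignment in the example above can be reproduced with a simple round-robin deal. This is a sketch under assumptions: the class and method names are invented, and Kafka's built-in assignors (range, round-robin, sticky variants) are more involved and may produce a different split for the same inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Round-robin model of dividing partitions among the consumers of one group.
final class GroupAssignmentSketch {

    // Deals partitions 0..numPartitions-1 to the consumers in turn.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Prints {consumer-1=[0, 2], consumer-2=[1]}
        System.out.println(assign(Arrays.asList("consumer-1", "consumer-2"), 3));
    }
}
```

Calling assign with four consumers and three partitions leaves the fourth consumer with an empty list, which is the "idle consumer" case: adding consumers beyond the partition count does not add throughput.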

Offset Management

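An offset is a record's position within a partition; the committed offset marks where a consumer group resumes after a restart or rebalance.
Offsets can be committed automatically at an interval or manually by the application.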

Warning: Mismanaged offsets → duplicate processing or message loss.
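
The warning above comes down to the order of "process" and "commit". The sketch below simulates a consumer that crashes once at a chosen offset and then restarts from its last committed offset. It is a single-threaded toy model with invented names, not consumer client code, but it shows why committing before processing can lose a message and committing after processing can duplicate one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulates one crash-and-restart to compare commit-before vs commit-after processing.
final class OffsetCommitSketch {

    // Returns the records actually processed. The consumer crashes once when it
    // reaches crashAt (between its two steps), then resumes from the committed offset.
    static List<String> run(List<String> log, boolean commitBeforeProcessing, int crashAt) {
        List<String> processed = new ArrayList<>();
        int committed = 0;
        int offset = 0;
        boolean crashed = false;
        while (offset < log.size()) {
            boolean crashNow = !crashed && offset == crashAt;
            if (commitBeforeProcessing) {
                committed = offset + 1;          // commit first
                if (crashNow) {                  // crash before processing: record is skipped
                    crashed = true;
                    offset = committed;
                    continue;
                }
                processed.add(log.get(offset));
            } else {
                processed.add(log.get(offset));  // process first
                if (crashNow) {                  // crash before commit: record is replayed
                    crashed = true;
                    offset = committed;
                    continue;
                }
                committed = offset + 1;
            }
            offset++;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("a", "b", "c");
        System.out.println("commit first: " + run(log, true, 1));  // prints [a, c]
        System.out.println("commit last:  " + run(log, false, 1)); // prints [a, b, b, c]
    }
}
```

Committing after processing is the usual choice when losing a message is worse than handling it twice, provided the processing step is idempotent.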

Partition Rebalancing

Triggered when a consumer joins or leaves the group, or when the number of partitions in a subscribed topic changes.
Consumption of the affected partitions pauses temporarily while the rebalance completes.

Kafka Scaling Concepts

Kafka can handle millions of messages/sec using:

Partitioning → parallelism for production & consumption
Consumer Groups → horizontal scaling
Replication → fault tolerance
Retention & Compaction → storage management
Compression → reduce network load and disk usage (gzip, snappy, lz4, zstd)
Multi-Cluster & MirrorMaker → cross-cluster replication
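
Of the mechanisms listed above, log compaction is the least intuitive, so here is a small sketch of its effect: for each key, only the most recent value survives. The class name and the order-status records are made up for illustration, and real compaction runs in the background on log segments rather than on an in-memory list.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Models the end result of log compaction: latest value per key, in log order.
final class LogCompactionSketch {

    // Each record is a {key, value} pair; later records override earlier ones.
    static Map<String, String> compact(List<String[]> records) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : records) {
            latest.remove(record[0]);            // drop the older entry for this key
            latest.put(record[0], record[1]);    // re-insert at the position of the newest write
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String[]> log = Arrays.asList(
                new String[] {"order1", "CREATED"},
                new String[] {"order2", "CREATED"},
                new String[] {"order1", "PAID"});
        // Prints {order2=CREATED, order1=PAID}
        System.out.println(compact(log));
    }
}
```

Time- or size-based retention, by contrast, simply drops the oldest data regardless of key.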

Illustration Idea: Partitioned topic across multiple brokers with replication.

Kafka Streams

Kafka Streams is a library for real-time stream processing:

Supports stateless & stateful operations
Windowing → aggregate over time windows
Joins → join streams & tables
Stream ↔ Table duality (KStream ↔ KTable)

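Example: Count orders per key in one-minute windows:

// Assumes `builder` is a StreamsBuilder and String serdes are the configured defaults.
KStream<String, String> orders = builder.stream("orders");
KTable<Windowed<String>, Long> count = orders
    .groupByKey()
    // Kafka 3.0+; older versions use the now-deprecated TimeWindows.of(...)
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
    .count();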
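
To see what the windowed count is doing underneath, the sketch below buckets event timestamps into fixed, non-overlapping one-minute windows using only the standard library. It is a simplified model with invented names: it ignores keys, late-arriving events, and state stores, all of which Kafka Streams handles for you, and it assumes non-negative timestamps in milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

// Counts events per tumbling window, keyed by the window's start time in milliseconds.
final class TumblingWindowCountSketch {

    static Map<Long, Long> countPerWindow(long[] timestampsMs, long windowSizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            long windowStart = ts - (ts % windowSizeMs); // round down to the window boundary
            counts.merge(windowStart, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] orderTimes = {5_000L, 59_999L, 60_000L, 125_000L};
        // Prints {0=2, 60000=1, 120000=1}
        System.out.println(countPerWindow(orderTimes, 60_000L));
    }
}
```

An event at exactly 60,000 ms falls into the second window, because each window includes its start and excludes its end.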

Kafka Delivery Guarantees

Kafka supports three levels of message delivery:

At-most-once → Messages may be lost
At-least-once → Messages may be duplicated
Exactly-once → No duplicates, no loss

Exactly-once guarantees require:

Idempotent producers
Transactions for atomic writes across partitions
Consumer offsets committed atomically as part of the same transaction

Configuration Example:

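# Producer
enable.idempotence=true
acks=all
retries=2147483647
# Values up to 5 also preserve ordering when idempotence is enabled
max.in.flight.requests.per.connection=1
# Consumer
isolation.level=read_committed
# Kafka Streams (exactly_once_v2 requires brokers on 2.5 or newer)
processing.guarantee=exactly_once_v2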
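
The idempotent producer requirement is easier to reason about with a small model of the broker side. In the sketch below, each write carries a producer id and a sequence number, and a retry that repeats an already-written sequence is dropped. The class and method names are hypothetical, and the real broker tracks sequences per producer and per partition with stricter ordering checks, so read this as an outline of the idea only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy broker log that discards duplicate writes caused by producer retries.
final class IdempotentAppendSketch {
    private final List<String> log = new ArrayList<>();
    private final Map<Long, Integer> lastSequence = new HashMap<>(); // producerId -> last appended sequence

    // Appends the value unless this producer already wrote this sequence number.
    // Returns true if the record was appended, false if it was a duplicate.
    boolean append(long producerId, int sequence, String value) {
        int last = lastSequence.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // retry of a record that is already in the log
        }
        log.add(value);
        lastSequence.put(producerId, sequence);
        return true;
    }

    List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        IdempotentAppendSketch broker = new IdempotentAppendSketch();
        broker.append(7L, 0, "order-1");
        broker.append(7L, 0, "order-1"); // retried after a lost acknowledgment
        broker.append(7L, 1, "order-2");
        System.out.println(broker.log()); // prints [order-1, order-2]
    }
}
```

This is why retries can be set very high without producing duplicates: the retry is safe to send because the log ignores what it has already stored.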

Hands On Tips

Use Docker / Docker Compose to run Kafka locally.
Useful CLI tools:
- kafka-topics.sh → manage topics
- kafka-console-producer.sh → produce messages
- kafka-console-consumer.sh → consume messages
Start small: produce/consume messages, explore partitions and offsets.

Example CLI Commands:

# Create a topic
kafka-topics.sh --create --topic orders --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1

# Produce messages
kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092

# Consume messages
kafka-console-consumer.sh --topic orders --from-beginning --bootstrap-server localhost:9092

Conclusion

Kafka is a high-performance, distributed streaming platform ideal for real-time messaging, analytics, and event-driven systems. Understanding topics, partitions, brokers, producers, consumers, and delivery guarantees is essential for building reliable Kafka architectures.

By practicing with topics, partitions, producers, consumers, and streams, developers can unlock Kafka’s true power in high-throughput, fault-tolerant, and scalable applications.

More Details:

Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli

GitHub: https://github.com/ZeeshanAli-0704/SystemDesignWithZeeshanAli
