Introduction
As part of my journey through the "Apache Kafka Series - Learn Apache Kafka for Beginners v3" Udemy course, I needed to set up a local Kafka environment using Docker. What started as a simple container setup evolved into a deeper understanding of Kafka architecture, Docker best practices, and the transition from Zookeeper to KRaft mode.
In this post, I'll share my learning process, the decisions I made, and the final production-ready configuration I arrived at.
Starting Point: Finding the Right Docker Configuration
Initial Research - Confluent vs Bitnami vs Apache Official
When searching for Kafka Docker setups, I encountered three main options:
- Confluent's official tutorial - https://developer.confluent.io/confluent-tutorials/kafka-on-docker/
- Bitnami Kafka image - Popular for its ease of use
- Apache Kafka official image - The source of truth
Initially, I was torn between Bitnami (known for simplified configuration) and Apache's official image (more control but steeper learning curve).
The Confluent Tutorial Discovery
The Confluent tutorial provided an excellent starting point with this configuration:
```yaml
services:
  broker:
    image: apache/kafka:latest
    hostname: broker
    container_name: broker
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_NODE_ID: 1
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@broker:29093
      KAFKA_LISTENERS: PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LOG_DIRS: /tmp/kraft-combined-logs
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
```
Why I chose the Apache official image:
- ✅ Direct from the source (Apache Foundation)
- ✅ Production-ready and enterprise-grade
- ✅ Better alignment with official documentation
- ✅ Latest features and security updates
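With the compose file saved, the broker can be started and sanity-checked from the host. A minimal sketch, assuming Docker Compose v2 and that the file above sits in the current directory:

```shell
# Start the broker in the background
docker compose up -d

# Watch the logs until the broker reports it has started
docker compose logs broker

# Smoke test: list topics through the host-facing listener
docker exec broker /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 --list
```

An empty topic list (no error) from the last command means the broker is up and reachable on port 9092.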
Key Learning #1: Understanding KRaft Mode
One of the biggest revelations was learning about KRaft (Kafka Raft) - Kafka's replacement for Zookeeper.
What is KRaft?
- Kafka Raft: Kafka's built-in Raft-based consensus protocol for managing cluster metadata
- Eliminates Zookeeper dependency
- Single process can handle both broker and controller roles
- Faster startup and simpler architecture
Configuration Breakdown:
```yaml
KAFKA_PROCESS_ROLES: broker,controller          # Single node handles both roles
KAFKA_NODE_ID: 1                                # Unique node identifier
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@broker:29093  # Controller election
KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER     # Controller communication
CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk              # Unique cluster identifier
```
Benefits of KRaft mode:
- ⚡ Faster startup (no Zookeeper coordination)
- 🏗️ Simpler architecture
- 📈 Better scalability
- 🔄 Single point of configuration
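The `CLUSTER_ID` above is the one from the Confluent tutorial. For your own cluster you can generate a fresh base64-encoded UUID with the `kafka-storage.sh` tool that ships inside the image (a sketch, assuming the `apache/kafka:4.0.1` image is available locally):

```shell
# Print a new random cluster ID suitable for CLUSTER_ID
docker run --rm apache/kafka:4.0.1 /opt/kafka/bin/kafka-storage.sh random-uuid
```

Paste the output into the `CLUSTER_ID` environment variable before the first startup; changing it after data has been written will make the broker reject its existing log directory.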
Key Learning #2: Docker Image Layering and Customization
The Command Path Problem
Initially, running Kafka commands required full paths:
```shell
docker exec broker /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
```
This was verbose and error-prone. I learned about Docker image layering and decided to create a custom image.
Solution: Custom Dockerfile
```dockerfile
FROM apache/kafka:4.0.1

# Add Kafka bin directory to PATH for convenient command usage
ENV PATH="/opt/kafka/bin:${PATH}"

# Set working directory
WORKDIR /opt/kafka
```
Updated docker-compose.yml
```yaml
services:
  broker:
    build:
      context: .
      dockerfile: Dockerfile
    image: my-kafka-kraft:4.0.1  # Custom image name
    # ... rest of configuration
```
Key insights:
- ✅ Original Apache image remains unchanged
- ✅ New layer adds convenience without bloat
- ✅ Commands now work directly:
```shell
kafka-topics.sh --list --bootstrap-server localhost:9092
```
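With the `PATH` baked into the image, the everyday lifecycle commands become much shorter. A few examples of the kind I ran constantly (assuming the `broker` container from the compose file above is running; topic names are illustrative):

```shell
# Create a topic with 3 partitions
docker exec broker kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic my-topic --partitions 3 --replication-factor 1

# Describe it: partitions, leader, replicas
docker exec broker kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic my-topic

# Produce messages interactively (Ctrl+C to exit)
docker exec -it broker kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic my-topic

# Consume everything from the beginning
docker exec -it broker kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 --topic my-topic --from-beginning
```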
Key Learning #3: Understanding Kafka Listeners
The listener configuration was initially confusing but crucial for proper networking:
```yaml
KAFKA_LISTENERS: PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
```
LISTENERS vs ADVERTISED_LISTENERS:
- `KAFKA_LISTENERS` = "where Kafka actually listens" (server-side binding)
- `KAFKA_ADVERTISED_LISTENERS` = "how clients should connect" (client-side addressing)
Listener Breakdown:
| Listener | Binding Address | Port | Purpose | Access From |
|---|---|---|---|---|
| `PLAINTEXT://broker:29092` | Container hostname | 29092 | Inter-service communication | Docker network |
| `CONTROLLER://broker:29093` | Container hostname | 29093 | KRaft metadata operations | Internal only |
| `PLAINTEXT_HOST://0.0.0.0:9092` | All interfaces | 9092 | External client access | Host machine |
Why Different Addresses?
1. Internal communication (`broker:29092`)
   - Other Docker services connect using the container hostname
   - Fast, low-latency container-to-container networking
2. Controller operations (`broker:29093`)
   - KRaft protocol for cluster coordination
   - Replaces Zookeeper functionality
3. External access (`localhost:9092`)
   - Host machine applications connect via port forwarding
   - Docker maps the container port to the host port
Network Flow Diagram:
```text
External app     → localhost:9092 → Docker port mapping → PLAINTEXT_HOST://0.0.0.0:9092 → Kafka broker
Internal service → broker:29092   → PLAINTEXT://broker:29092                            → Kafka broker
KRaft system     → broker:29093   → CONTROLLER://broker:29093                           → Kafka broker
```
Key Insight: The address Kafka binds to (0.0.0.0) differs from what it advertises to clients (localhost) because clients can't connect to 0.0.0.0 directly.
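One way to convince yourself the listeners behave as described is to probe the broker from both network contexts. A sketch using the `kafka-broker-api-versions.sh` tool bundled in the image (assuming the `broker` container above is running):

```shell
# From inside the Docker network: the internal listener answers on broker:29092
docker exec broker kafka-broker-api-versions.sh --bootstrap-server broker:29092

# From the host side of the port mapping: the advertised PLAINTEXT_HOST address works
docker exec broker kafka-broker-api-versions.sh --bootstrap-server localhost:9092
```

Both calls should print the broker's supported API versions; trying `broker:29092` from the host (outside Docker's DNS) would fail, which is exactly why the external listener advertises `localhost:9092` instead.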
Key Learning #4: Data Persistence Strategy
The Problem: Data Loss on Container Restart
Initially, running docker-compose down would delete all topics and data. This happened because:
```yaml
environment:
  KAFKA_LOG_DIRS: /tmp/kraft-combined-logs  # Data stored in the container's temp directory
  # No volumes configured = data loss on container deletion
```
Solution: Docker Volumes
```yaml
services:
  broker:
    # ... other configuration
    volumes:
      - ./data:/tmp/kraft-combined-logs  # Bind mount for data persistence
```
What Gets Persisted:
```text
./data/
├── __cluster_metadata-0/             # KRaft metadata (replaces Zookeeper)
├── __consumer_offsets-*/             # Consumer group offsets
├── my-topic-0/                       # Topic partition data
│   ├── 00000000000000000000.log      # Actual messages
│   ├── 00000000000000000000.index    # Message index
│   └── partition.metadata            # Partition metadata
├── meta.properties                   # Broker metadata
└── ...                               # Other Kafka state files
```
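A quick way to verify the bind mount is doing its job: create a topic, destroy and recreate the container, and check that the topic survives (a sketch, assuming the compose file with the `./data` volume above; the topic name is illustrative):

```shell
# Create a marker topic
docker exec broker kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic persistence-test

# Tear the container down and bring it back up (./data survives on the host)
docker compose down
docker compose up -d

# persistence-test should still appear in the list
docker exec broker kafka-topics.sh --bootstrap-server localhost:9092 --list
```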
Volume strategy comparison:
| Method | Pros | Cons | Use Case |
|---|---|---|---|
| No volumes | Simple setup | Data loss on restart | Learning/testing only |
| Named volumes | Docker managed | Hidden location | Development |
| Bind mounts | Full control, easy backup | Manual directory management | Production |
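For completeness, the named-volume variant from the middle row looks like this (a hypothetical alternative, not the setup I ended up using):

```yaml
services:
  broker:
    # ... other configuration
    volumes:
      - kafka-data:/tmp/kraft-combined-logs

volumes:
  kafka-data:  # Docker-managed named volume
```

Docker stores the volume under its own data root (e.g. `/var/lib/docker/volumes/` on Linux), which is why the table lists "hidden location" as a drawback.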
Final Local Development Configuration
Here's my complete local development setup:
docker-compose.yml
```yaml
services:
  broker:
    build:
      context: .                          # Build context
      dockerfile: Dockerfile              # Custom Dockerfile
    image: my-kafka-kraft:4.0.1           # Explicit image name
    hostname: broker
    container_name: broker
    ports:
      - 9092:9092                         # External client port
    volumes:
      - ./data:/tmp/kraft-combined-logs   # Data persistence
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_PROCESS_ROLES: broker,controller  # KRaft mode
      KAFKA_NODE_ID: 1
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@broker:29093
      KAFKA_LISTENERS: PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LOG_DIRS: /tmp/kraft-combined-logs
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
```
Dockerfile
```dockerfile
FROM apache/kafka:4.0.1

# Add Kafka bin directory to PATH for convenient command usage
ENV PATH="/opt/kafka/bin:${PATH}"

# Set working directory
WORKDIR /opt/kafka
```
Project Structure
```text
kafka-docker/
├── docker-compose.yml    # Main configuration
├── Dockerfile            # Custom image definition
├── data/                 # Kafka data (auto-created)
│   ├── __cluster_metadata-0/
│   ├── my-topic-0/
│   └── ...
└── docker_commands       # Command reference file
```
Key Takeaways and Best Practices
1. Choose the Right Base Image
- Use official Apache Kafka for production environments
- Bitnami is excellent for quick prototyping
- Always pin specific versions: use `apache/kafka:4.0.1`, not `latest`
2. Embrace KRaft Mode
- Simpler than Zookeeper-based setups
- Better performance and reliability
- Future-proof (Zookeeper support is removed entirely as of Kafka 4.0)
3. Layer Docker Images Thoughtfully
- Keep customizations minimal and purpose-driven
- Document why each layer exists
- Use multi-stage builds for complex setups
4. Plan for Data Persistence
- Always use volumes in production
- Bind mounts offer better control than named volumes
- Backup strategy should include volume data
5. Network Configuration Matters
- Understand internal vs external listeners
- Plan port allocation carefully
- Test connectivity from both inside and outside containers
Conclusion
This journey from a simple Docker container to a well-configured local Kafka setup taught me valuable lessons about:
- Modern Kafka architecture (KRaft vs Zookeeper)
- Docker best practices (layering, volumes, networking)
- Configuration decisions (persistence, networking, image customization)
- Development environment setup (network restrictions, data management)
The final configuration is suitable for local development and learning, with a solid foundation that could be enhanced for production use when needed.