Byron Hsieh
Setting Up Kafka 4.0 Locally with Docker: A Learning Journey

Introduction

As part of my journey through the "Apache Kafka Series - Learn Apache Kafka for Beginners v3" Udemy course, I needed to set up a local Kafka environment using Docker. What started as a simple container setup evolved into a deeper understanding of Kafka architecture, Docker best practices, and the transition from Zookeeper to KRaft mode.

In this post, I'll share my learning process, the decisions I made, and the final production-ready configuration I arrived at.

Starting Point: Finding the Right Docker Configuration

Initial Research - Confluent vs Bitnami vs Apache Official

When searching for Kafka Docker setups, I encountered three main options:

  1. Confluent's official tutorial - https://developer.confluent.io/confluent-tutorials/kafka-on-docker/
  2. Bitnami Kafka image - Popular for its ease of use
  3. Apache Kafka official image - The source of truth

Initially, I was torn between Bitnami (known for simplified configuration) and Apache's official image (more control but steeper learning curve).

The Confluent Tutorial Discovery

The Confluent tutorial provided an excellent starting point with this configuration:

services:
  broker:
    image: apache/kafka:latest
    hostname: broker
    container_name: broker
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_NODE_ID: 1
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@broker:29093
      KAFKA_LISTENERS: PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LOG_DIRS: /tmp/kraft-combined-logs
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk

Why I chose the Apache official image:

  • ✅ Direct from the source (Apache Foundation)
  • ✅ Production-ready and enterprise-grade
  • ✅ Better alignment with official documentation
  • ✅ Latest features and security updates

Key Learning #1: Understanding KRaft Mode

One of the biggest revelations was learning about KRaft (Kafka Raft) - Kafka's replacement for Zookeeper.

What is KRaft?

  • Kafka Raft: Kafka's built-in consensus protocol for managing cluster metadata
  • Eliminates Zookeeper dependency
  • Single process can handle both broker and controller roles
  • Faster startup and simpler architecture

Configuration Breakdown:

KAFKA_PROCESS_ROLES: broker,controller        # Single node handles both roles
KAFKA_NODE_ID: 1                             # Unique node identifier
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@broker:29093  # Controller election
KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER   # Controller communication
CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk            # Unique cluster identifier
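The CLUSTER_ID above is a random UUID encoded as unpadded URL-safe base64 (22 characters). The official way to generate one is kafka-storage.sh random-uuid inside the container; as an illustration of the format only, here's a Python sketch of the same encoding:

```python
import base64
import uuid

def random_cluster_id() -> str:
    """Generate a Kafka-style cluster ID: a random UUID encoded as
    unpadded URL-safe base64, yielding a 22-character string like
    the CLUSTER_ID value above."""
    raw = uuid.uuid4().bytes  # 16 random bytes
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

print(random_cluster_id())  # a 22-char string like the CLUSTER_ID above
```

Any 22-character value of this form works; the important part is that it stays stable across restarts so the broker recognizes its own storage.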

Benefits of KRaft mode:

  • ⚡ Faster startup (no Zookeeper coordination)
  • 🏗️ Simpler architecture
  • 📈 Better scalability
  • 🔄 Single point of configuration

Key Learning #2: Docker Image Layering and Customization

The Command Path Problem

Initially, running Kafka commands required full paths:

docker exec broker /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092

This was verbose and error-prone. I learned about Docker image layering and decided to create a custom image.

Solution: Custom Dockerfile

FROM apache/kafka:4.0.1

# Add Kafka bin directory to PATH for convenient command usage
ENV PATH="/opt/kafka/bin:${PATH}"

# Set working directory
WORKDIR /opt/kafka

Updated docker-compose.yml

services:
  broker:
    build:
      context: .
      dockerfile: Dockerfile
    image: my-kafka-kraft:4.0.1               # Custom image name
    # ... rest of configuration

Key insights:

  • ✅ Original Apache image remains unchanged
  • ✅ New layer adds convenience without bloat
  • ✅ Commands now work directly: kafka-topics.sh --list --bootstrap-server localhost:9092

Key Learning #3: Understanding Kafka Listeners

The listener configuration was initially confusing but crucial for proper networking:

KAFKA_LISTENERS: PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092

LISTENERS vs ADVERTISED_LISTENERS:

LISTENERS = "Where Kafka actually listens" (Server-side binding)

ADVERTISED_LISTENERS = "How clients should connect" (Client-side addressing)

Listener Breakdown:

| Listener | Binding Address | Port | Purpose | Access From |
|---|---|---|---|---|
| PLAINTEXT://broker:29092 | Container hostname | 29092 | Inter-service communication | Docker network |
| CONTROLLER://broker:29093 | Container hostname | 29093 | KRaft metadata operations | Internal only |
| PLAINTEXT_HOST://0.0.0.0:9092 | All interfaces | 9092 | External client access | Host machine |

Why Different Addresses?

  1. Internal Communication: broker:29092

    • Other Docker services connect using container hostname
    • Fast, low-latency container-to-container networking
  2. Controller Operations: broker:29093

    • KRaft protocol for cluster coordination
    • Replaces Zookeeper functionality
  3. External Access: localhost:9092

    • Host machine applications connect via port forwarding
    • Docker maps container port to host port

Network Flow Diagram:

External App → localhost:9092 → Docker Port Mapping → PLAINTEXT_HOST://0.0.0.0:9092
                                                           ↓
Internal Service → broker:29092 → PLAINTEXT://broker:29092 → Kafka Broker
                                                           ↓
KRaft System → broker:29093 → CONTROLLER://broker:29093 ↗

Key Insight: The address Kafka binds to (0.0.0.0) differs from what it advertises to clients (localhost) because clients can't connect to 0.0.0.0 directly.
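To make the LISTENERS vs ADVERTISED_LISTENERS distinction concrete, here's a small Python sketch (an illustrative helper of my own, not part of Kafka) that parses both strings from the configuration above and pairs each listener's bind address with the address it advertises:

```python
def parse_listeners(spec: str) -> dict:
    """Parse a Kafka listener string like
    'PLAINTEXT://broker:29092,PLAINTEXT_HOST://0.0.0.0:9092'
    into {listener_name: (host, port)}."""
    result = {}
    for entry in spec.split(","):
        name, address = entry.split("://", 1)
        host, port = address.rsplit(":", 1)
        result[name] = (host, int(port))
    return result

listeners = parse_listeners(
    "PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092"
)
advertised = parse_listeners(
    "PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092"
)

# Only listeners meant for clients are advertised; CONTROLLER stays internal.
for name in advertised:
    print(f"{name}: binds {listeners[name]}, advertises {advertised[name]}")
```

Running this shows that PLAINTEXT_HOST binds to 0.0.0.0:9092 but advertises localhost:9092, while the CONTROLLER listener is never advertised at all.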

Key Learning #4: Data Persistence Strategy

The Problem: Data Loss on Container Restart

Initially, running docker-compose down would delete all topics and data. This happened because:

environment:
  KAFKA_LOG_DIRS: /tmp/kraft-combined-logs  # Data stored in container's temp directory
# No volumes configured = data loss on container deletion

Solution: Docker Volumes

services:
  broker:
    # ... other configuration
    volumes:
      - ./data:/tmp/kraft-combined-logs  # Bind mount for data persistence

What Gets Persisted:

./data/
├── __cluster_metadata-0/              # KRaft metadata (replaces Zookeeper)
├── __consumer_offsets-*/              # Consumer group offsets
├── my-topic-0/                        # Topic partition data
│   ├── 00000000000000000000.log       # Actual messages
│   ├── 00000000000000000000.index     # Message index
│   └── partition.metadata             # Partition metadata
├── meta.properties                    # Broker metadata
└── ...                               # Other Kafka state files
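Note the directory naming: each partition directory is <topic>-<partition>. Since topic names can themselves contain dashes (like my-topic above), the partition number is whatever follows the last dash. A Python sketch of that rule (my own helper, useful when scripting against the data directory):

```python
def split_partition_dir(dirname: str):
    """Split a Kafka partition directory name into (topic, partition).
    The partition number always follows the LAST dash, because topic
    names may contain dashes themselves."""
    topic, _, partition = dirname.rpartition("-")
    return topic, int(partition)

print(split_partition_dir("my-topic-0"))             # ('my-topic', 0)
print(split_partition_dir("__consumer_offsets-42"))  # ('__consumer_offsets', 42)
```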

Volume strategy comparison:

| Method | Pros | Cons | Use Case |
|---|---|---|---|
| No volumes | Simple setup | Data loss on restart | Learning/testing only |
| Named volumes | Docker managed | Hidden location | Development |
| Bind mounts | Full control, easy backup | Manual directory management | Production |

Final Local Development Configuration

Here's my complete local development setup:

docker-compose.yml

services:
  broker:
    build:
      context: .                              # Build context
      dockerfile: Dockerfile                  # Custom Dockerfile
    image: my-kafka-kraft:4.0.1              # Explicit image name
    hostname: broker
    container_name: broker
    ports:
      - 9092:9092                            # External client port
    volumes:
      - ./data:/tmp/kraft-combined-logs      # Data persistence
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_PROCESS_ROLES: broker,controller  # KRaft mode
      KAFKA_NODE_ID: 1
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@broker:29093
      KAFKA_LISTENERS: PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LOG_DIRS: /tmp/kraft-combined-logs
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk

Dockerfile

FROM apache/kafka:4.0.1

# Add Kafka bin directory to PATH for convenient command usage
ENV PATH="/opt/kafka/bin:${PATH}"

# Set working directory
WORKDIR /opt/kafka

Project Structure

kafka-docker/
├── docker-compose.yml      # Main configuration
├── Dockerfile              # Custom image definition
├── data/                   # Kafka data (auto-created)
│   ├── __cluster_metadata-0/
│   ├── my-topic-0/
│   └── ...
└── docker_commands         # Command reference file

Key Takeaways and Best Practices

1. Choose the Right Base Image

  • Use official Apache Kafka for production environments
  • Bitnami is excellent for quick prototyping
  • Always pin a specific version (apache/kafka:4.0.1) instead of latest

2. Embrace KRaft Mode

  • Simpler than Zookeeper-based setups
  • Better performance and reliability
  • Future-proof (Zookeeper support is removed entirely in Kafka 4.0)

3. Layer Docker Images Thoughtfully

  • Keep customizations minimal and purpose-driven
  • Document why each layer exists
  • Use multi-stage builds for complex setups

4. Plan for Data Persistence

  • Always use volumes in production
  • Bind mounts offer better control than named volumes
  • Backup strategy should include volume data

5. Network Configuration Matters

  • Understand internal vs external listeners
  • Plan port allocation carefully
  • Test connectivity from both inside and outside containers
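For that last point, a plain TCP reachability check is often enough to confirm the Docker port mapping works before reaching for a full Kafka client. A Python sketch (a simple socket probe, not a Kafka protocol handshake, so it verifies the mapping but not broker health):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within
    `timeout` seconds -- enough to confirm Docker's port mapping,
    though not that Kafka itself is healthy."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From the host machine, probe the advertised external listener:
print("broker reachable:", can_connect("localhost", 9092))
```

Run the same check with broker:29092 from inside another container on the Docker network to verify the internal listener.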

Conclusion

This journey from a simple Docker container to a well-configured local Kafka setup taught me valuable lessons about:

  • Modern Kafka architecture (KRaft vs Zookeeper)
  • Docker best practices (layering, volumes, networking)
  • Configuration decisions (persistence, networking, image customization)
  • Development environment setup (network restrictions, data management)

The final configuration is suitable for local development and learning, with a solid foundation that could be enhanced for production use when needed.
