Introduction
As part of my journey through the "Apache Kafka Series - Learn Apache Kafka for Beginners v3" Udemy course, I needed to set up a local Kafka environment using Docker. What started as a simple container setup evolved into a deeper understanding of Kafka architecture, Docker best practices, and the transition from Zookeeper to KRaft mode.
In this post, I'll share my learning process, the decisions I made, and the final production-ready configuration I arrived at.
Starting Point: Finding the Right Docker Configuration
Initial Research - Confluent vs Bitnami vs Apache Official
When searching for Kafka Docker setups, I encountered three main options:
- Confluent's official tutorial - https://developer.confluent.io/confluent-tutorials/kafka-on-docker/
- Bitnami Kafka image - Popular for its ease of use
- Apache Kafka official image - The source of truth
Initially, I was torn between Bitnami (known for simplified configuration) and Apache's official image (more control but steeper learning curve).
The Confluent Tutorial Discovery
The Confluent tutorial provided an excellent starting point with this configuration:
```yaml
services:
  broker:
    image: apache/kafka:latest
    hostname: broker
    container_name: broker
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_NODE_ID: 1
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@broker:29093
      KAFKA_LISTENERS: PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LOG_DIRS: /tmp/kraft-combined-logs
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
```
Why I chose the Apache official image:
- ✅ Direct from the source (Apache Foundation)
- ✅ Production-ready and enterprise-grade
- ✅ Better alignment with official documentation
- ✅ Latest features and security updates
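With the compose file saved, the broker can be started and sanity-checked from the host. A minimal sketch, assuming Docker Compose v2 and that the file above sits in the current directory:

```shell
# Start the broker in the background
docker compose up -d

# Watch the logs until the broker reports it has started
docker compose logs broker

# Smoke test: list topics through the host-facing listener
docker exec broker /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 --list
```

An empty topic list (no error) from the last command means the broker is up and reachable on port 9092.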
Key Learning #1: Understanding KRaft Mode
One of the biggest revelations was learning about KRaft (Kafka Raft) - Kafka's replacement for Zookeeper.
What is KRaft?
- Kafka Raft: Kafka's built-in Raft-based consensus protocol for managing cluster metadata
- Eliminates Zookeeper dependency
- Single process can handle both broker and controller roles
- Faster startup and simpler architecture
Configuration Breakdown:
```yaml
KAFKA_PROCESS_ROLES: broker,controller          # Single node handles both roles
KAFKA_NODE_ID: 1                                # Unique node identifier
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@broker:29093  # Controller election
KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER     # Controller communication
CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk              # Unique cluster identifier
```
Benefits of KRaft mode:
- ⚡ Faster startup (no Zookeeper coordination)
- 🏗️ Simpler architecture
- 📈 Better scalability
- 🔄 Single point of configuration
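The `CLUSTER_ID` above is the one from the Confluent tutorial. For your own cluster you can generate a fresh base64-encoded UUID with the `kafka-storage.sh` tool that ships inside the image (a sketch, assuming the `apache/kafka:4.0.1` image is available locally):

```shell
# Print a new random cluster ID suitable for CLUSTER_ID
docker run --rm apache/kafka:4.0.1 /opt/kafka/bin/kafka-storage.sh random-uuid
```

Paste the output into the `CLUSTER_ID` environment variable before the first startup; changing it after data has been written will make the broker reject its existing log directory.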
Key Learning #2: Docker Image Layering and Customization
The Command Path Problem
Initially, running Kafka commands required full paths:
```shell
docker exec broker /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
```
This was verbose and error-prone. I learned about Docker image layering and decided to create a custom image.
Solution: Custom Dockerfile
```dockerfile
FROM apache/kafka:4.0.1

# Add Kafka bin directory to PATH for convenient command usage
ENV PATH="/opt/kafka/bin:${PATH}"

# Set working directory
WORKDIR /opt/kafka
```
Updated docker-compose.yml
```yaml
services:
  broker:
    build:
      context: .
      dockerfile: Dockerfile
    image: my-kafka-kraft:4.0.1  # Custom image name
    # ... rest of configuration
```
Key insights:
- ✅ Original Apache image remains unchanged
- ✅ New layer adds convenience without bloat
- ✅ Commands now work directly:
```shell
kafka-topics.sh --list --bootstrap-server localhost:9092
```
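With the `PATH` baked into the image, the everyday lifecycle commands become much shorter. A few examples of the kind I ran constantly (assuming the `broker` container from the compose file above is running; topic names are illustrative):

```shell
# Create a topic with 3 partitions
docker exec broker kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic my-topic --partitions 3 --replication-factor 1

# Describe it: partitions, leader, replicas
docker exec broker kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic my-topic

# Produce messages interactively (Ctrl+C to exit)
docker exec -it broker kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic my-topic

# Consume everything from the beginning
docker exec -it broker kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 --topic my-topic --from-beginning
```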
Key Learning #3: Understanding Kafka Listeners
The listener configuration was initially confusing but crucial for proper networking:
```yaml
KAFKA_LISTENERS: PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
```
LISTENERS vs ADVERTISED_LISTENERS:
- `KAFKA_LISTENERS` = "where Kafka actually listens" (server-side binding)
- `KAFKA_ADVERTISED_LISTENERS` = "how clients should connect" (client-side addressing)
Listener Breakdown:
| Listener | Binding Address | Port | Purpose | Access From |
|---|---|---|---|---|
| `PLAINTEXT://broker:29092` | Container hostname | 29092 | Inter-service communication | Docker network |
| `CONTROLLER://broker:29093` | Container hostname | 29093 | KRaft metadata operations | Internal only |
| `PLAINTEXT_HOST://0.0.0.0:9092` | All interfaces | 9092 | External client access | Host machine |
Why Different Addresses?
1. Internal communication (`broker:29092`)
   - Other Docker services connect using the container hostname
   - Fast, low-latency container-to-container networking
2. Controller operations (`broker:29093`)
   - KRaft protocol for cluster coordination
   - Replaces Zookeeper functionality
3. External access (`localhost:9092`)
   - Host machine applications connect via port forwarding
   - Docker maps the container port to the host port
Network Flow Diagram:
```text
External app     → localhost:9092 → Docker port mapping → PLAINTEXT_HOST://0.0.0.0:9092 → Kafka broker
Internal service → broker:29092   → PLAINTEXT://broker:29092                            → Kafka broker
KRaft system     → broker:29093   → CONTROLLER://broker:29093                           → Kafka broker
```
Key Insight: The address Kafka binds to (0.0.0.0) differs from what it advertises to clients (localhost) because clients can't connect to 0.0.0.0 directly.
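One way to convince yourself the listeners behave as described is to probe the broker from both network contexts. A sketch using the `kafka-broker-api-versions.sh` tool bundled in the image (assuming the `broker` container above is running):

```shell
# From inside the Docker network: the internal listener answers on broker:29092
docker exec broker kafka-broker-api-versions.sh --bootstrap-server broker:29092

# From the host side of the port mapping: the advertised PLAINTEXT_HOST address works
docker exec broker kafka-broker-api-versions.sh --bootstrap-server localhost:9092
```

Both calls should print the broker's supported API versions; trying `broker:29092` from the host (outside Docker's DNS) would fail, which is exactly why the external listener advertises `localhost:9092` instead.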
Key Learning #4: Data Persistence Strategy
The Problem: Data Loss on Container Restart
Initially, running docker-compose down would delete all topics and data. This happened because:
```yaml
environment:
  KAFKA_LOG_DIRS: /tmp/kraft-combined-logs  # Data stored in the container's temp directory
  # No volumes configured = data loss on container deletion
```
Solution: Docker Volumes
```yaml
services:
  broker:
    # ... other configuration
    volumes:
      - ./data:/tmp/kraft-combined-logs  # Bind mount for data persistence
```
What Gets Persisted:
```text
./data/
├── __cluster_metadata-0/             # KRaft metadata (replaces Zookeeper)
├── __consumer_offsets-*/             # Consumer group offsets
├── my-topic-0/                       # Topic partition data
│   ├── 00000000000000000000.log      # Actual messages
│   ├── 00000000000000000000.index    # Message index
│   └── partition.metadata            # Partition metadata
├── meta.properties                   # Broker metadata
└── ...                               # Other Kafka state files
```
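A quick way to verify the bind mount is doing its job: create a topic, destroy and recreate the container, and check that the topic survives (a sketch, assuming the compose file with the `./data` volume above; the topic name is illustrative):

```shell
# Create a marker topic
docker exec broker kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic persistence-test

# Tear the container down and bring it back up (./data survives on the host)
docker compose down
docker compose up -d

# persistence-test should still appear in the list
docker exec broker kafka-topics.sh --bootstrap-server localhost:9092 --list
```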
Volume strategy comparison:
| Method | Pros | Cons | Use Case |
|---|---|---|---|
| No volumes | Simple setup | Data loss on restart | Learning/testing only |
| Named volumes | Docker managed | Hidden location | Development |
| Bind mounts | Full control, easy backup | Manual directory management | Production |
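For completeness, the named-volume variant from the middle row looks like this (a hypothetical alternative, not the setup I ended up using):

```yaml
services:
  broker:
    # ... other configuration
    volumes:
      - kafka-data:/tmp/kraft-combined-logs

volumes:
  kafka-data:  # Docker-managed named volume
```

Docker stores the volume under its own data root (e.g. `/var/lib/docker/volumes/` on Linux), which is why the table lists "hidden location" as a drawback.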
Final Local Development Configuration
Here's my complete local development setup:
docker-compose.yml
```yaml
services:
  broker:
    build:
      context: .                          # Build context
      dockerfile: Dockerfile              # Custom Dockerfile
    image: my-kafka-kraft:4.0.1           # Explicit image name
    hostname: broker
    container_name: broker
    ports:
      - 9092:9092                         # External client port
    volumes:
      - ./data:/tmp/kraft-combined-logs   # Data persistence
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_PROCESS_ROLES: broker,controller  # KRaft mode
      KAFKA_NODE_ID: 1
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@broker:29093
      KAFKA_LISTENERS: PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LOG_DIRS: /tmp/kraft-combined-logs
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
```
Dockerfile
```dockerfile
FROM apache/kafka:4.0.1

# Add Kafka bin directory to PATH for convenient command usage
ENV PATH="/opt/kafka/bin:${PATH}"

# Set working directory
WORKDIR /opt/kafka
```
Project Structure
```text
kafka-docker/
├── docker-compose.yml    # Main configuration
├── Dockerfile            # Custom image definition
├── data/                 # Kafka data (auto-created)
│   ├── __cluster_metadata-0/
│   ├── my-topic-0/
│   └── ...
└── docker_commands       # Command reference file
```
Key Takeaways and Best Practices
1. Choose the Right Base Image
- Use official Apache Kafka for production environments
- Bitnami is excellent for quick prototyping
- Always pin specific versions: use `apache/kafka:4.0.1`, not `latest`
2. Embrace KRaft Mode
- Simpler than Zookeeper-based setups
- Better performance and reliability
- Future-proof (Zookeeper support is removed entirely as of Kafka 4.0)
3. Layer Docker Images Thoughtfully
- Keep customizations minimal and purpose-driven
- Document why each layer exists
- Use multi-stage builds for complex setups
4. Plan for Data Persistence
- Always use volumes in production
- Bind mounts offer better control than named volumes
- Backup strategy should include volume data
5. Network Configuration Matters
- Understand internal vs external listeners
- Plan port allocation carefully
- Test connectivity from both inside and outside containers
Conclusion
This journey from a simple Docker container to a well-configured local Kafka setup taught me valuable lessons about:
- Modern Kafka architecture (KRaft vs Zookeeper)
- Docker best practices (layering, volumes, networking)
- Configuration decisions (persistence, networking, image customization)
- Development environment setup (network restrictions, data management)
The final configuration is suitable for local development and learning, with a solid foundation that could be enhanced for production use when needed.