I Built a Free Apache Kafka Course from Scratch — Here's the Full Curriculum (and What I Got Wrong)
I spent months building a free Apache Kafka course covering everything from first principles to a real-time analytics platform final project.
No paywall. No "premium tier." 9 modules, 470 minutes of content, completely free.
Here's the full syllabus, the Python code that actually works, and the honest mistakes I made building the curriculum — so you don't repeat them.
Why I Built This
Every time someone asked me "how do I learn Kafka?", I sent them to the same 3 places:
- The official Confluent docs (dense, assumes you already know what you're doing)
- A $15 Udemy course that spends Module 1 explaining what a computer is
- A YouTube playlist where half the videos are deleted
None of them answered the real question beginners have: why does Kafka exist, and what problem does it actually solve before I write a single line of code?
That's the gap I built for.
The Problem With Most Kafka Tutorials
Most tutorials start with: "Kafka is a distributed event streaming platform..."
And then they immediately show you a Docker Compose file with 6 services.
Beginners copy-paste it, something breaks, they don't know why, they quit.
The real problem is that Kafka is an answer to a specific architectural problem — and if you don't understand the problem first, the solution makes no sense.
So Modules 1 and 2 of this course don't touch Kafka at all. They build the problem statement from scratch.
The Full Syllabus (9 Modules, 470 Minutes)
Module 1: Introduction to Kafka — 35 min
Not "what is Kafka", but why event streaming exists at all, and what breaks in traditional request-response architectures at scale.
Module 2: The Problem Statement — 30 min
A real-world scenario: you're building an e-commerce platform. Orders, inventory, notifications, analytics — all tightly coupled. What happens when one service goes down? This module makes the pain visceral before Kafka enters the picture.
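To make that coupling concrete, here is a toy sketch in plain Python. It is illustrative only and not taken from the course; every service function below is hypothetical. The order handler calls each downstream service synchronously and in sequence, so an outage in a non-essential service (notifications) fails the entire order.

```python
class ServiceDown(Exception):
    """Raised by a downstream service that is unavailable."""


def reserve_inventory(order: dict) -> None:
    # Hypothetical inventory service call
    pass


def send_notification(order: dict, healthy: bool) -> None:
    # Hypothetical notification service; simulate an outage when unhealthy
    if not healthy:
        raise ServiceDown("notification service is down")


def record_analytics(order: dict) -> None:
    # Hypothetical analytics service call
    pass


def place_order(order: dict, notifications_healthy: bool = True) -> str:
    """Tightly coupled flow: every downstream call must succeed, in order."""
    try:
        reserve_inventory(order)
        send_notification(order, notifications_healthy)
        record_analytics(order)  # never reached if notifications fail
    except ServiceDown as e:
        return f"order {order['order_id']} FAILED: {e}"
    return f"order {order['order_id']} placed"


print(place_order({"order_id": "ORD-1234"}))
print(place_order({"order_id": "ORD-1235"}, notifications_healthy=False))
```

The second call fails even though inventory was already reserved, so a nice-to-have email has become a hard dependency of checkout and left the system in an inconsistent state.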
Module 3: How Kafka Solves the Problem — 35 min
Now Kafka enters. Topics, producers, consumers — introduced through the same e-commerce scenario from Module 2. The mental model clicks because the problem is already familiar.
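The core mental model can be sketched without Kafka at all. The class below is a hypothetical in-memory stand-in, not Kafka's API: a topic is an append-only log, and each consumer tracks its own read position (its offset), so the producer never needs to know who is reading or whether they are currently up.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a single-partition topic (illustration only)."""

    def __init__(self) -> None:
        self.log: list[dict] = []                        # append-only event log
        self.offsets: dict[str, int] = defaultdict(int)  # consumer -> next offset to read

    def produce(self, event: dict) -> int:
        """Append an event and return the offset it was written at."""
        self.log.append(event)
        return len(self.log) - 1

    def consume(self, consumer: str) -> list[dict]:
        """Return every event this consumer has not read yet, then advance its offset."""
        start = self.offsets[consumer]
        new_events = self.log[start:]
        self.offsets[consumer] = len(self.log)
        return new_events


orders = ToyTopic()
orders.produce({"order_id": "ORD-1"})
orders.produce({"order_id": "ORD-2"})

print(orders.consume("inventory"))      # both events
orders.produce({"order_id": "ORD-3"})
print(orders.consume("inventory"))      # only ORD-3
print(orders.consume("notifications"))  # was "down" until now, still gets all three
```

In real Kafka the offsets belong to consumer groups and are stored by the cluster, and a topic is split into partitions, but the decoupling works the same way: a slow or crashed consumer simply resumes from its offset.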
Module 4: Kafka Architecture Deep Dive — 45 min
Brokers, partitions, replication, offsets, ZooKeeper vs KRaft. This is where most tutorials either go too shallow or too deep. The goal here was: deep enough to make architecture decisions, not deep enough to need a PhD.
Key concepts covered:
- Partition strategy and why it matters for throughput
- Replication factor trade-offs
- Consumer group coordination
- Exactly-once vs at-least-once semantics (and when you actually need each)
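On partition strategy: when a record has a key, Kafka hashes the key to choose a partition, so every event with the same key (for example, the same user) lands in the same partition and stays in order. The sketch below uses zlib.crc32 purely as a stand-in hash; the default partitioner in the Java client (and in kafka-python) uses murmur2, so these partition numbers will not match a real cluster.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition (crc32 stand-in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 6
for user_id in ["USR-5678", "USR-1111", "USR-5678"]:
    # The same key always maps to the same partition
    print(user_id, "->", choose_partition(user_id, NUM_PARTITIONS))
```

The trade-off: ordering is only guaranteed per key within one partition, a few very hot keys can overload a single partition, and changing the partition count remaps keys. That is why the key choice matters for throughput.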
Module 5: Consumer Groups in Kafka — 40 min
The concept most beginners get wrong. Consumer groups are not just "multiple consumers" — they're a load balancing mechanism with specific partition assignment rules. This module covers offset management and consumer lag, which is what you actually debug in production.
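To make "partition assignment rules" concrete, here is a simplified, single-topic sketch in the spirit of Kafka's range assignor (one of several built-in strategies; round-robin and sticky variants also exist). Members are sorted, each gets a contiguous block of partitions, and the first few get one extra when the count doesn't divide evenly. The function and member names are illustrative, not Kafka's code.

```python
def range_assign(partitions: list[int], members: list[str]) -> dict[str, list[int]]:
    """Simplified single-topic range assignment (illustration only)."""
    if not members:
        return {}
    members = sorted(members)
    partitions = sorted(partitions)
    per_member, extra = divmod(len(partitions), len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members each take one additional partition
        count = per_member + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


print(range_assign(list(range(6)), ["c1", "c2", "c3", "c4"]))
# {'c1': [0, 1], 'c2': [2, 3], 'c3': [4], 'c4': [5]}
print(range_assign(list(range(2)), ["c1", "c2", "c3"]))
# {'c1': [0], 'c2': [1], 'c3': []}
```

The second call shows why adding consumers beyond the partition count buys nothing: the extra member sits idle. Whenever a member joins or leaves, the group rebalances and this mapping is recomputed.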
Module 6: Kafka Setup & Hands-On — 50 min
Docker Compose setup that actually works. Complete topic management examples. This is where you stop reading and start running commands.
```yaml
# The Docker Compose we use: minimal, no unnecessary services
version: '3.8'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Clients are told to connect to localhost:9092, so this works for
      # code running on the host machine, not from other containers
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      # Single broker, so the internal offsets topic can only have one replica
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```
No 6-service compose file. No Kafka UI, Schema Registry, or Connect until you actually need them.
Module 7: Kafka with Python — 60 min ⬅️ Most Popular
This is the module people ask about most. It uses the kafka-python library and walks through a complete producer-consumer example, with error handling that covers what actually breaks in real usage.
The producer:
```python
import json
import time

from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    # Retry on failure: critical for production
    retries=5,
    retry_backoff_ms=300,
)


def send_event(topic: str, event: dict) -> None:
    try:
        # send() is asynchronous; get() blocks until the broker
        # acknowledges the write or the timeout expires
        future = producer.send(topic, value=event)
        record_metadata = future.get(timeout=10)
        print(f"Sent to partition {record_metadata.partition}, offset {record_metadata.offset}")
    except KafkaError as e:
        print(f"Failed to send: {e}")


# Example: e-commerce order event
order_event = {
    "order_id": "ORD-1234",
    "user_id": "USR-5678",
    "items": ["laptop", "mouse"],
    "total": 1299.99,
    "timestamp": time.time(),
}

send_event("order-events", order_event)
producer.flush()  # deliver anything still sitting in the send buffer
producer.close()
```
The consumer:
```python
import json

from kafka import KafkaConsumer


def process_order(order: dict) -> None:
    # Placeholder for your business logic
    pass


consumer = KafkaConsumer(
    'order-events',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',  # Start from beginning if no committed offset
    enable_auto_commit=False,      # Manual commit: safer for production
    group_id='order-processing-service',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
)

print("Listening for order events...")
try:
    for message in consumer:
        order = message.value
        print(f"Processing order {order['order_id']} (total: ${order['total']})")
        # If this raises, commit() is never reached, so the message
        # is delivered again after the consumer restarts
        process_order(order)
        # Commit only after successful processing
        consumer.commit()
finally:
    consumer.close()
```
Why enable_auto_commit=False matters:
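With auto-commit on, the client periodically commits offsets (every 5 seconds by default in kafka-python, via auto_commit_interval_ms) for messages it has already handed to your code. If your consumer crashes after that commit but before it finishes processing a message, the message is never redelivered: it is lost. With manual commit, you mark a message as "done" only after your code has actually handled it. The trade-off is that a crash between processing and committing means the message is delivered again, so you get at-least-once delivery and your processing should tolerate duplicates. This is the difference between a toy consumer and a production one.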
Most tutorials don't mention this. This is why I wrote Module 7 myself instead of linking to documentation.
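Because manual commit gives at-least-once delivery, the same order can occasionally be processed twice. One common mitigation is idempotent processing: remember which IDs were already handled and skip repeats. A minimal sketch, assuming order_id is unique per order; the in-memory set is a stand-in for what would have to be a durable store (for example, a database table with a unique constraint, written in the same transaction as the business effect) in production.

```python
class IdempotentOrderProcessor:
    """Skips orders whose order_id was already processed (in-memory sketch)."""

    def __init__(self) -> None:
        self.processed_ids: set[str] = set()
        self.processed_count = 0

    def handle(self, order: dict) -> bool:
        """Process an order once; return False for a duplicate delivery."""
        order_id = order["order_id"]
        if order_id in self.processed_ids:
            return False  # redelivered after a crash or rebalance: skip it
        # ... real business logic would go here ...
        self.processed_count += 1
        self.processed_ids.add(order_id)
        return True


processor = IdempotentOrderProcessor()
deliveries = [{"order_id": "ORD-1234"}, {"order_id": "ORD-1234"}, {"order_id": "ORD-9"}]
print([processor.handle(o) for o in deliveries])  # [True, False, True]
print(processor.processed_count)                  # 2
```

In the consumer loop, handle(order) would take the place of process_order(order), still before consumer.commit().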
Module 8: Kafka Monitoring & Optimization — 55 min
Consumer lag, JMX metrics, and what to actually watch in production. Performance tuning with batch.size, linger.ms, and compression. The module also includes a complete monitoring setup using Prometheus + Grafana.
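Consumer lag is the easiest of these to reason about: per partition, it is the log-end offset minus the group's committed offset. A minimal pure-Python sketch follows. With kafka-python you could feed it from KafkaConsumer.end_offsets() and KafkaConsumer.committed() (which key by TopicPartition rather than by string); the partition labels and numbers below are made up for illustration.

```python
from typing import Optional


def consumer_lag(
    end_offsets: dict[str, int],
    committed: dict[str, Optional[int]],
) -> dict[str, int]:
    """Per-partition lag = log-end offset - committed offset.

    A partition with no committed offset (None or missing) is treated as
    fully unread. This is a simplification: with auto_offset_reset='latest'
    the group would start at the end instead, and if retention has already
    deleted old segments the real backlog is smaller.
    """
    lag = {}
    for partition, end in end_offsets.items():
        done = committed.get(partition)
        lag[partition] = end - (done if done is not None else 0)
    return lag


end = {"order-events-0": 1500, "order-events-1": 1420, "order-events-2": 980}
committed = {"order-events-0": 1500, "order-events-1": 1100, "order-events-2": None}
lag = consumer_lag(end, committed)
print(lag)                # {'order-events-0': 0, 'order-events-1': 320, 'order-events-2': 980}
print(sum(lag.values()))  # 1300
```

Watch the trend rather than a single reading: lag that keeps growing means consumers are falling behind producers.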
Module 9: Final Project — Real-Time Analytics Platform — 120 min
Build an end-to-end system:
- User activity producer (simulates clickstream data)
- Kafka as the event backbone
- Consumer that aggregates metrics in real time
- Output: live dashboard showing active users, top pages, conversion events
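The heart of the aggregating consumer is windowed counting. Here is a minimal sketch of a one-minute tumbling window, kept independent of Kafka so it can be tested in isolation. The event fields (user_id, page, event_type, ts) and the "purchase" conversion type are assumptions, not the course's actual schema; in the project, each decoded message.value would be passed to add().

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60


class TumblingWindowAggregator:
    """Aggregates clickstream events into fixed, non-overlapping time windows."""

    def __init__(self, window_seconds: int = WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        self.users: dict[int, set[str]] = defaultdict(set)
        self.pages: dict[int, Counter] = defaultdict(Counter)
        self.conversions: Counter = Counter()

    def window_start(self, ts: float) -> int:
        """Align an event timestamp to the start of its window."""
        return int(ts // self.window_seconds) * self.window_seconds

    def add(self, event: dict) -> None:
        window = self.window_start(event["ts"])
        self.users[window].add(event["user_id"])
        self.pages[window][event["page"]] += 1
        if event["event_type"] == "purchase":  # assumed conversion marker
            self.conversions[window] += 1

    def summary(self, window: int, top_n: int = 3) -> dict:
        return {
            "active_users": len(self.users[window]),
            "top_pages": self.pages[window].most_common(top_n),
            "conversions": self.conversions[window],
        }


agg = TumblingWindowAggregator()
events = [
    {"user_id": "u1", "page": "/home", "event_type": "view", "ts": 120.0},
    {"user_id": "u2", "page": "/laptop", "event_type": "view", "ts": 130.5},
    {"user_id": "u1", "page": "/laptop", "event_type": "view", "ts": 150.0},
    {"user_id": "u1", "page": "/checkout", "event_type": "purchase", "ts": 170.0},
    {"user_id": "u3", "page": "/home", "event_type": "view", "ts": 185.0},  # next window
]
for e in events:
    agg.add(e)

print(agg.summary(120))
print(agg.summary(180))
```

Bucketing on the event's own timestamp keeps the counts correct even if the consumer falls behind and works through a backlog later. The open question this sketch ignores is how long to keep a window open for late-arriving events.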
This is the project you put on your resume and can actually explain in an interview.
What I Got Wrong Building This Curriculum
Mistake 1: I underestimated how much the problem statement matters
I originally started the course with Kafka concepts. Every beta tester said the same thing: "I understand what Kafka does, but I don't understand when I'd use it."
Adding Module 2 (The Problem Statement) — which doesn't mention Kafka at all — fixed this completely. The concept-to-application gap is the hardest part of teaching distributed systems.
Mistake 2: Setup took too long
My original Module 6 started with building Kafka from source. Nobody needs that. Docker Compose solves setup in 3 commands, and I wasted two weeks of curriculum writing on something a short compose file fixes.
Mistake 3: I skipped consumer groups initially
I thought consumer groups were intermediate content. They're not — they're core to understanding how Kafka scales. When I added Module 5, learner comprehension of Module 7 improved significantly because they already understood why group_id exists.
Who This Is For
- Backend developers who keep seeing Kafka in job descriptions but haven't touched it
- CS students who want a practical distributed systems project for their portfolio
- Anyone who tried the official Confluent quickstart and got lost in the Docker output
The Honest Limitation
This course teaches Kafka for application developers — not Kafka administrators. If you need to tune broker configurations for 10M messages/second or manage multi-datacenter replication, this isn't that course. It's the course you take before that one.
Where to Find It
The full course — all 9 modules, all Python examples, the final project — is free at:
No account required to read. Create one if you want to track progress.
If you find it useful, the most helpful thing you can do is share it with someone who's been putting off learning Kafka because the existing resources felt too expensive or too intimidating.
Discussion
What's the Kafka concept that took you the longest to actually understand? For me it was consumer group rebalancing — specifically what happens to in-flight messages during a rebalance. Took me embarrassingly long to get it right.
Drop it in the comments — genuinely curious what trips people up most.