De' Clerke

Posted on Jun 2

Kafka for Data Engineers: Core Concepts, KRaft, and the Patterns That Actually Work

#dataengineering #kafka #airflow #streaming

If your Kafka Docker Compose still has a ZooKeeper service in it, your setup is already legacy. As of Kafka 4.0 (released March 2025), ZooKeeper is gone. The architecture changed, the config changed, and the setup you learned two years ago will not work with any Kafka 4.x image.

This guide covers Kafka from the ground up — what it is, how it works, how to run it locally in 2026 with KRaft, and the Python patterns that hold up in production. It also covers the gotchas that waste days when you're new to it.

What Kafka Is and Why Data Engineers Use It

Kafka is a distributed event streaming platform. The core abstraction is a log: an ordered, append-only sequence of records. Producers write to the log. Consumers read from it. The log is retained for a configurable period (default 7 days), so multiple consumers can independently read the same data at their own pace.

For data engineers, this matters in three scenarios:

Real-time ingestion. When data arrives faster than a batch pipeline can process it — flight events, sensor readings, stock ticks, user clicks — Kafka buffers it reliably. Your pipeline reads at whatever pace it can sustain without losing data.

Decoupling producers from consumers. Your ingestion pipeline doesn't need to know who consumes the data or how fast they process it. Add a new downstream system and it starts reading from whatever offset it needs. Nothing about the producer changes.

Change Data Capture (CDC). Kafka Connect + Debezium reads PostgreSQL's write-ahead log and streams every row-level insert, update, and delete to a Kafka topic. Downstream systems get a live feed of database changes without polling.

The Architecture: What You Actually Need to Know

Brokers, Topics, and Partitions

A Kafka broker is a server that stores messages and serves producers and consumers. A production cluster has multiple brokers for fault tolerance.

A topic is a named stream of messages — like a database table, but append-only. Topics are divided into partitions, which are the actual unit of storage and parallelism. Each partition is an ordered, immutable log on disk.

Topic: flight-events (3 partitions)

Partition 0: [offset 0] [offset 1] [offset 2] [offset 3] ...
Partition 1: [offset 0] [offset 1] [offset 2] ...
Partition 2: [offset 0] [offset 1] ...

Messages within a partition are strictly ordered. Messages across partitions are not. If you need strict global ordering, use one partition. If you need parallelism, use more.

Partition count rule of thumb: partitions = max(target throughput / single-partition throughput, number of consumers). You can increase partition count later, but you can never decrease it. Start conservative.

Offsets

An offset is the unique sequential ID of a message within a partition. Offset 0 is the first message. Offset 47 is the 48th. Consumers track their position by storing ("committing") the last offset they processed. This is how they resume after a restart without reprocessing everything or missing anything.

Consumer Groups

A consumer group is a set of consumers that split a topic's partitions between them. Each partition is assigned to exactly one consumer in the group at a time.

Topic: flight-events (4 partitions)
Consumer group: pipeline-processors (2 consumers)

Consumer A: handles Partition 0 + Partition 1
Consumer B: handles Partition 2 + Partition 3

This is the horizontal scaling model. To process faster, add consumers to the group — up to the number of partitions. Beyond that, extra consumers sit idle.

A rebalance happens when consumers join or leave the group. Partitions get redistributed. In older Kafka, rebalances were disruptive — all consumers stopped processing, all partitions were revoked, and they were reassigned from scratch. With the new rebalance protocol (KIP-848, GA in Kafka 4.0), rebalances are incremental. Only the partitions that need to move actually move. For large consumer groups, this is a significant operational improvement.

Delivery Semantics

Three guarantees, each with a different tradeoff:

Semantic	Description	Risk
At-most-once	Commit before processing	May lose messages on crash
At-least-once	Commit after processing	May process duplicates on retry
Exactly-once	Transactions + idempotent producer	Highest complexity and latency

For most data engineering pipelines, at-least-once is the right default. Make your processing idempotent (upsert into PostgreSQL with ON CONFLICT, for example) and duplicates become harmless.

The Big Change in 2025: KRaft Replaces ZooKeeper

Every Kafka deployment before Kafka 4.0 required ZooKeeper — a separate distributed coordination service that managed cluster metadata (which broker is the controller, partition leaders, topic configs, consumer group state).

ZooKeeper was the most operationally painful part of running Kafka. It required its own cluster, its own monitoring, its own tuning. It had its own failure modes that were separate from Kafka's.

KRaft (Kafka Raft Metadata) replaces ZooKeeper entirely. Kafka manages its own metadata using the Raft consensus algorithm, built directly into the broker. There is no separate service. The cluster self-manages.

Kafka 3.9.x was the last release to support ZooKeeper. If you're on Kafka 3.9 with ZooKeeper and want to upgrade to 4.0+, you must migrate to KRaft first using the migration tooling in 3.9. You cannot skip it.

Kafka 4.0+ is KRaft-only, no exceptions. The current stable release is 4.3.0 (May 2026).

The practical impact for a new project: your Docker Compose is simpler. No ZooKeeper service. No KAFKA_ZOOKEEPER_CONNECT. Just Kafka, configured to run as both broker and controller.

Running Kafka Locally: KRaft Docker Compose

The Confluent Platform Docker images are the easiest way to run Kafka locally. Confluent Platform 8.1 maps to Kafka 4.1.

# docker-compose.yml — Kafka 4.1 (Confluent Platform 8.1.2), KRaft mode
version: '3.8'
services:
  kafka:
    image: confluentinc/cp-kafka:8.1.2
    container_name: kafka
    hostname: kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
      KAFKA_LOG_RETENTION_HOURS: 168
      CLUSTER_ID: "MkU3OEVBNTcwNTJENDM2Qk"
    volumes:
      - kafka_data:/var/lib/kafka/data
    healthcheck:
      test: kafka-topics --bootstrap-server localhost:9092 --list || exit 1
      interval: 10s
      timeout: 10s
      retries: 5

  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    container_name: kafka-ui
    depends_on:
      kafka:
        condition: service_healthy
    ports:
      - "8080:8080"
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:9092

volumes:
  kafka_data:

Start it:

docker compose up -d
docker compose logs -f kafka    # watch for "started (kafka.server.KafkaServer)"

The Kafka UI at http://localhost:8080 shows topics, consumer groups, lag, and messages visually. Use it while you learn.

The ADVERTISED_LISTENERS gotcha (the most common Docker error):

KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092 works when your producer/consumer runs on the host machine. If your producer also runs inside Docker (a different container), use the service name instead:

KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092

The rule: localhost for host-to-container access, kafka (the service name) for container-to-container. Mixing them produces the error kafka: client has run out of available brokers to talk to with no further explanation.

For multi-service Docker setups (producer container + Kafka container + consumer container), add two listeners:

KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,PLAINTEXT_HOST://0.0.0.0:29092
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT

Internal containers connect on kafka:9092. Your host machine connects on localhost:29092.

CLI: The Commands You'll Use Constantly

Get into the container:

docker exec -it kafka bash

Topics:

# Create a topic
kafka-topics --bootstrap-server localhost:9092 \
  --create --topic flight-events \
  --partitions 3 \
  --replication-factor 1

# List all topics
kafka-topics --bootstrap-server localhost:9092 --list

# Describe a topic (partitions, leaders, replicas, ISR)
kafka-topics --bootstrap-server localhost:9092 --describe --topic flight-events

# Delete a topic
kafka-topics --bootstrap-server localhost:9092 --delete --topic flight-events

# Change retention to 1 day
kafka-configs --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name flight-events \
  --add-config retention.ms=86400000

Quick produce/consume for testing:

# Produce — type messages, Ctrl+C to stop
kafka-console-producer \
  --bootstrap-server localhost:9092 \
  --topic flight-events

# Consume from the beginning
kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic flight-events \
  --from-beginning

# Consume as part of a named group
kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic flight-events \
  --group my-test-group \
  --from-beginning

Consumer group inspection (critical for debugging lag):

# List all consumer groups
kafka-consumer-groups --bootstrap-server localhost:9092 --list

# Describe a group — shows current offset, log end offset, and lag per partition
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --describe --group flight-processor

# Output:
# TOPIC          PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID
# flight-events  0          1234            1236            2    ...
# flight-events  1          890             890             0    ...
# flight-events  2          445             450             5    ...

Resetting offsets:

# Reset to beginning (reprocess all messages)
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group flight-processor \
  --topic flight-events \
  --reset-offsets --to-earliest --execute

# Reset to latest (skip existing, process only new messages)
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group flight-processor \
  --topic flight-events \
  --reset-offsets --to-latest --execute

# Reset to a specific datetime
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group flight-processor \
  --topic flight-events \
  --reset-offsets --to-datetime 2025-01-01T00:00:00.000 --execute

Python: Producer Patterns with confluent-kafka

Install:

pip install confluent-kafka==2.14.0

Basic producer:

import json
from confluent_kafka import Producer, KafkaException

KAFKA_CONFIG = {
    'bootstrap.servers': 'localhost:9092',
    'acks': 'all',                           # wait for all in-sync replicas
    'enable.idempotence': True,              # exactly-once per partition, retries safe
    'max.in.flight.requests.per.connection': 5,
    'retries': 10,
    'retry.backoff.ms': 1000,
    'compression.type': 'snappy',
    'batch.size': 32768,                     # 32 KB batches
    'linger.ms': 10,                         # wait 10ms to fill a batch before sending
}

def delivery_callback(err, msg):
    if err:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] @ offset {msg.offset()}")

producer = Producer(KAFKA_CONFIG)

def produce_message(topic: str, key: str, value: dict):
    producer.produce(
        topic=topic,
        key=key.encode('utf-8'),
        value=json.dumps(value).encode('utf-8'),
        callback=delivery_callback,
    )
    producer.poll(0)    # trigger callbacks without blocking

def produce_batch(topic: str, records: list[dict], key_field: str):
    for record in records:
        key = str(record.get(key_field, ''))
        producer.produce(
            topic=topic,
            key=key.encode('utf-8'),
            value=json.dumps(record).encode('utf-8'),
            callback=delivery_callback,
        )
    producer.flush()    # block until all messages confirmed delivered
    print(f"Flushed {len(records)} messages to {topic}")

Why enable.idempotence=True: Without it, a retry after a network timeout might deliver the same message twice because the broker received it but the acknowledgment was lost. With idempotence enabled, Kafka assigns each message a sequence number and deduplicates retries at the broker level. Always enable it unless you have a specific reason not to.

Why linger.ms=10: By default, Kafka sends messages as soon as they're produced, one batch at a time. Setting linger.ms tells the producer to wait up to 10ms for more messages before sending, filling batches more efficiently. For high-throughput pipelines this meaningfully reduces network round trips.

Python: Consumer Patterns

from confluent_kafka import Consumer, KafkaError, KafkaException
import json
import signal

CONSUMER_CONFIG = {
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'flight-processor',
    'auto.offset.reset': 'earliest',     # start from the beginning if no committed offset
    'enable.auto.commit': False,         # manual commit — never let Kafka auto-commit
    'max.poll.interval.ms': 300000,      # 5 min max between polls before rebalance triggered
    'session.timeout.ms': 45000,
    'heartbeat.interval.ms': 3000,
}

consumer = Consumer(CONSUMER_CONFIG)
consumer.subscribe(['flight-events'])

running = True

def shutdown(signum, frame):
    global running
    running = False

signal.signal(signal.SIGTERM, shutdown)
signal.signal(signal.SIGINT, shutdown)

try:
    while running:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            if msg.error().code() == KafkaError._PARTITION_EOF:
                # Reached end of partition — not an error, just informational
                continue
            raise KafkaException(msg.error())

        key = msg.key().decode('utf-8') if msg.key() else None
        value = json.loads(msg.value().decode('utf-8'))

        # Process the message
        process_event(value)

        # Commit after processing — at-least-once delivery
        consumer.commit(message=msg)
finally:
    consumer.close()

The auto.offset.reset gotcha — the mistake everyone makes first:

auto.offset.reset only applies when a consumer group has no committed offsets — meaning it's brand new or its offsets were deleted. It does not override existing committed offsets.

earliest: start from the oldest available message (what you want for a new consumer seeing existing data)
latest: start from the newest message — everything before the consumer started is invisible

The common mistake: you write a new consumer, set auto.offset.reset='latest', start it, and wonder why it receives nothing. It's because there were no new messages after it started. All the existing messages are already "past" the latest offset. Change to earliest and reset the consumer group offsets, or use --to-earliest on the CLI.

Why enable.auto.commit=False: Auto-commit fires on a timer (every 5 seconds by default). If your process crashes after the commit but before processing finishes, messages are lost. If it crashes after processing but before the commit, you reprocess on restart. Manual commit after processing gives you at-least-once: possible duplicates, never lost. For idempotent sinks (PostgreSQL upserts), this is the right tradeoff.

Batch consumer (for loading into a database):

Instead of committing after every single message, accumulate a batch and commit once:

def consume_and_load(topic: str, batch_size: int = 500):
    consumer.subscribe([topic])
    batch = []
    try:
        while True:
            msg = consumer.poll(timeout=5.0)
            if msg and not msg.error():
                batch.append(json.loads(msg.value()))

            if len(batch) >= batch_size:
                load_to_database(batch)    # your upsert logic
                consumer.commit()
                batch = []
    finally:
        if batch:
            load_to_database(batch)
            consumer.commit()
        consumer.close()

This pattern is typical for Kafka-to-PostgreSQL pipelines: poll until you have 500 messages, bulk-insert with ON CONFLICT DO NOTHING, commit, repeat.

The Rebalance Problem and How KIP-848 Fixes It

The most disruptive thing in a Kafka consumer setup is a rebalance. Before Kafka 4.0, every time a consumer joined or left a group, the broker would:

Tell all consumers to stop processing and revoke their partitions
Wait for all consumers to acknowledge
Reassign all partitions from scratch
Tell consumers to resume

For a consumer group with 20 consumers and 60 partitions, a single consumer restart triggered a full stop-the-world for every consumer. At high message volume, this caused visible processing gaps.

The new rebalance protocol (KIP-848, GA in Kafka 4.0) works incrementally. Only the partitions that need to move actually move. Consumers that don't need to change keep processing. The impact of a single consumer restart is isolated to that consumer's partitions.

To use the new protocol, set the group.protocol consumer config:

CONSUMER_CONFIG = {
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'flight-processor',
    'group.protocol': 'consumer',      # new KIP-848 protocol; 'classic' is the old one
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,
}

The consumer protocol is the default in Kafka 4.0+. If you're connecting to a 4.0+ broker, it will be used automatically. If you're connecting to a 3.x broker, it falls back to classic.

Monitoring Consumer Lag

Lag is the difference between the latest message in a partition and the last message your consumer committed. Lag = 0 means you're fully caught up. Lag growing means your consumer is slower than the producer.

CLI:

kafka-consumer-groups --bootstrap-server localhost:9092 \
  --describe --group flight-processor

The LAG column is what you watch. If it's consistently growing, your consumer needs more threads, more instances, or a faster processing path.

Python:

from confluent_kafka.admin import AdminClient, ConsumerGroupTopicPartitions, TopicPartition
from confluent_kafka import Consumer

admin = AdminClient({'bootstrap.servers': 'localhost:9092'})

def check_lag(group_id: str, topic: str, num_partitions: int) -> dict:
    partitions = [TopicPartition(topic, p) for p in range(num_partitions)]
    fut = admin.list_consumer_group_offsets(
        [ConsumerGroupTopicPartitions(group_id, partitions)]
    )
    committed = fut[group_id].result().topic_partitions

    temp_consumer = Consumer({'bootstrap.servers': 'localhost:9092', 'group.id': '_lag-check'})
    lag_report = {}
    for tp in committed:
        low, high = temp_consumer.get_watermark_offsets(
            TopicPartition(tp.topic, tp.partition)
        )
        lag = high - (tp.offset if tp.offset >= 0 else low)
        lag_report[f"{tp.topic}[{tp.partition}]"] = lag
        if lag > 10000:
            print(f"WARNING: high lag on {tp.topic}[{tp.partition}]: {lag}")
    temp_consumer.close()
    return lag_report

Common Errors and What They Actually Mean

"LEADER_NOT_AVAILABLE"
The topic was just created and leader election is still in progress. Wait 2–3 seconds and retry. This is not a real error for newly created topics.

Consumer receives no messages despite messages existing in the topic
Two causes: (1) auto.offset.reset='latest' and the consumer started after those messages were written — reset offsets to earliest for the group; (2) another consumer in the same group already committed those offsets — use a different group.id or reset the group.

"Rebalancing too often"
Your consumer is taking too long between poll() calls. If processing one batch takes longer than max.poll.interval.ms (default 5 minutes), Kafka assumes the consumer is dead and triggers a rebalance. Fix: increase max.poll.interval.ms, process fewer records per poll, or move slow processing to a background thread.

"Message too large" (broker rejects)
message.max.bytes on the broker (default 1MB) is smaller than your message. Align message.max.bytes on the broker and max.request.size on the producer:

# In Docker Compose:
KAFKA_MESSAGE_MAX_BYTES: 10485760         # 10 MB broker limit

# In producer config:
'max.request.size': 10485760             # must match

Dead Letter Queue — when you can't fix a bad message:

Some messages will always fail processing — malformed JSON, missing fields, unexpected types. Don't let them block your pipeline. Route them to a DLQ topic:

def process_with_dlq(msg, dlq_producer):
    try:
        data = json.loads(msg.value())
        process_event(data)
        consumer.commit(message=msg)
    except Exception as e:
        dlq_producer.produce(
            'flight-events-dlq',
            key=msg.key(),
            value=msg.value(),
            headers={
                'error': str(e).encode(),
                'original-topic': msg.topic().encode(),
                'original-partition': str(msg.partition()).encode(),
                'original-offset': str(msg.offset()).encode(),
            },
        )
        dlq_producer.flush()
        consumer.commit(message=msg)    # commit anyway — bad message goes to DLQ, not back to queue

The DLQ lets you inspect failed messages later, fix the processing logic, and replay them.

Kafka with Airflow

The apache-airflow-providers-apache-kafka package adds native Kafka operators and sensors to Airflow:

pip install apache-airflow-providers-apache-kafka

Add a Kafka connection in the Airflow UI: Conn Type → Apache Kafka, Bootstrap Servers → kafka:9092.

from airflow.decorators import dag, task
from airflow.providers.apache.kafka.sensors.kafka import AwaitMessageSensor
from airflow.providers.apache.kafka.operators.consume import ConsumeFromTopicOperator
from datetime import datetime

@dag(schedule='@daily', start_date=datetime(2025, 1, 1))
def kafka_pipeline():

    # Wait for a message matching a condition before proceeding
    wait_for_data = AwaitMessageSensor(
        task_id='wait_for_flight_data',
        kafka_config_id='kafka_default',
        topics=['flight-events'],
        apply_function='my_module.check_message',
        poll_timeout=1.0,
        poll_interval=5,
    )

    # Consume and process a batch from the topic
    consume = ConsumeFromTopicOperator(
        task_id='consume_flights',
        kafka_config_id='kafka_default',
        topics=['flight-events'],
        apply_function='my_module.process_message',
        max_messages=1000,
        max_batch_size=100,
        commit_cadence='end_of_batch',
    )

    wait_for_data >> consume

kafka_pipeline()

# my_module.py
import json

def check_message(message, **context):
    data = json.loads(message.value())
    return data.get('status') == 'ready'

def process_message(message, **context):
    data = json.loads(message.value())
    # transform and return
    return {**data, 'processed_at': context['ts']}

For simpler use cases — just producing from a Python task — use the confluent_kafka library directly inside a @task function:

from confluent_kafka import Producer
import json

@task
def produce_events(records: list) -> int:
    p = Producer({'bootstrap.servers': 'kafka:9092', 'acks': 'all'})
    for record in records:
        p.produce('flight-events', value=json.dumps(record).encode())
    p.flush()
    return len(records)

Quick Reference: Key Config Decisions

Config	Recommended value	Why
`acks`	`all`	Safest — waits for all in-sync replicas
`enable.idempotence`	`True`	Deduplicates retries at broker level
`enable.auto.commit`	`False`	Manual commit for reliable at-least-once
`auto.offset.reset`	`earliest` (new consumers)	Don't silently miss existing messages
`compression.type`	`snappy`	Good ratio, fast encode/decode
`linger.ms`	`10`	Fill batches efficiently; minimal latency impact
`group.protocol`	`consumer` (Kafka 4.0+)	New incremental rebalance protocol

Version Reference (June 2026)

	Version	Notes
Apache Kafka	4.3.0	Latest stable (May 2026)
Confluent Platform	8.1.2	Maps to Kafka 4.1; use for Docker images
confluent-kafka (Python)	2.14.0	Preferred Python client
Last ZooKeeper release	3.9.2	Dead end — do not build new projects on it
Java requirement (brokers)	17+	From Kafka 4.0 onward

If you're migrating an existing ZooKeeper-based cluster, the path is: upgrade to Kafka 3.9 first (the bridge version with the best migration tooling), run the ZooKeeper-to-KRaft migration, then upgrade to 4.0+.

Follow me on dev.to for more data engineering content, or browse the project code at github.com/declerke.

DEV Community