DEV Community

Khadijah (Dana Ordalina)
Apache Kafka Explained Simply: Core Concepts, Best Practices, and Common Pitfalls for Developers

Apache Kafka has become the de‑facto standard for data streaming and event‑driven systems. Yet many developers still struggle to understand when Kafka is actually needed and how to avoid common pitfalls. This post is a concise, practical introduction to help you get productive faster.
🎯 When Kafka Truly Makes Sense

Kafka shines in scenarios where you need:

  • High throughput — tens or hundreds of thousands of messages per second
  • Horizontal scalability
  • Reliable delivery guarantees
  • Event storage and replay

Typical use cases:

  • Logs and telemetry
  • Event‑driven architecture
  • Microservice integration
  • Stream processing
  • Change Data Capture (CDC)

If you just need a simple task queue, Kafka may be overkill.

Kafka Core Concepts Explained Simply

Topic
A logical category of messages — like a folder for events.

Partition
A physical subdivision of a topic.
Enables scaling reads and writes.
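
To make the partition idea concrete, here is a toy sketch of how a message key could be mapped to a partition. The `pick_partition` helper and the three-partition topic are assumptions for illustration, and CRC32 is only a stand-in hash: Kafka's own default partitioner uses a different hash (murmur2), so the numbers printed here will not match a real cluster.

```python
import zlib

NUM_PARTITIONS = 3  # assumption: a topic with three partitions


def pick_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition index (illustrative stand-in hash)."""
    return zlib.crc32(key) % num_partitions


# The same key always maps to the same partition, which keeps per-key ordering.
for key in [b"user-1", b"user-2", b"user-1"]:
    print(key, "->", pick_partition(key))
```

Because the mapping depends only on the key, a key that carries most of the traffic sends most of the traffic to one partition.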

Producer
Sends messages into a topic.

Consumer
Reads messages from a topic.

Consumer Group
Multiple consumers that share a topic's partitions; each partition is read by only one consumer in the group at a time.
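
A rough way to picture the sharing is a round-robin split of partitions across group members. The `assign_partitions` function below is a hypothetical toy model; in a real cluster the assignment is negotiated through the group coordinator using a configurable assignor, so treat this only as an illustration of the outcome.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions across consumers round-robin (toy model of a group assignment)."""
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions shared by two consumers: each consumer owns two partitions.
print(assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"]))

# More consumers than partitions: the extra consumer owns nothing and sits idle.
print(assign_partitions([0, 1], ["consumer-a", "consumer-b", "consumer-c"]))
```

The second call shows why adding consumers beyond the partition count does not add read capacity.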

Offset
The position of a record inside a partition. A consumer's committed offset acts as a pointer to where it should resume reading.
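
Offsets are easiest to understand with a toy model where a partition is just a Python list and the offset is an index into it. The `read_from` helper and the sample events are made up for this sketch; a real consumer reads over the network and stores its committed offset in Kafka.

```python
partition_log = ["order-created", "order-paid", "order-shipped"]  # offsets 0, 1, 2


def read_from(log, offset, max_records=10):
    """Return the records starting at `offset` and the next offset to read."""
    records = log[offset:offset + max_records]
    return records, offset + len(records)


committed = 0
records, committed = read_from(partition_log, committed, max_records=2)
print(records, "next offset:", committed)

# Reading again resumes where the previous read stopped.
records, committed = read_from(partition_log, committed)
print(records, "next offset:", committed)

# Replay: start again from offset 0 and the same events come back.
replayed, _ = read_from(partition_log, 0)
print(replayed)
```

Reading does not remove anything from the log, which is what makes replay possible.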

Minimal Configuration You Should Understand

acks
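0: fastest, but messages may be lost because the producer does not wait for any broker acknowledgment

1: balanced; the partition leader confirms the write, but data can still be lost if the leader fails before followers copy it

all: safest; the leader waits until all in-sync replicas have the write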
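
As a sketch, the three levels can be written as keyword arguments for kafka-python's `KafkaProducer`, which accepts `acks` as `0`, `1`, or `"all"`. The profile names and the `producer_kwargs` helper are invented for this example, and the broker address is an assumption.

```python
# One set of KafkaProducer keyword arguments per durability level.
# The profile names are made up for this sketch.
ACKS_PROFILES = {
    "fire_and_forget": {"acks": 0},    # no broker acknowledgment
    "leader_only": {"acks": 1},        # the leader has written the record
    "all_in_sync": {"acks": "all"},    # all in-sync replicas have the record
}


def producer_kwargs(profile: str, bootstrap_servers: str = "localhost:9092") -> dict:
    """Build KafkaProducer keyword arguments for the chosen profile."""
    return {"bootstrap_servers": bootstrap_servers, **ACKS_PROFILES[profile]}


print(producer_kwargs("all_in_sync"))

# Usage (needs a running broker and the kafka-python package):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(**producer_kwargs("all_in_sync"))
```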

retention.ms / retention.bytes
How long (retention.ms) and how much data per partition (retention.bytes) Kafka keeps before it deletes old log segments.
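
Both settings take raw numbers (milliseconds and bytes), so it helps to compute them instead of typing long literals. This is a small sketch; the seven-day and 10 GiB values are example choices, not recommendations, and `retention_ms` is a helper made up for the illustration.

```python
from datetime import timedelta


def retention_ms(period: timedelta) -> int:
    """Convert a time span to the millisecond value that retention.ms expects."""
    return int(period.total_seconds() * 1000)


# Topic-level settings as string key/value pairs (example values only).
topic_config = {
    "retention.ms": str(retention_ms(timedelta(days=7))),
    "retention.bytes": str(10 * 1024**3),  # 10 GiB, applied per partition
}
print(topic_config)
```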

replication.factor
The number of copies of each partition, kept on different brokers. Use 3 for production.

min.insync.replicas
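With acks=all, a write succeeds only if at least N in-sync replicas (the leader included) have it; otherwise the producer receives an error. It has no effect with acks=0 or acks=1.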
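
The interaction between acks and min.insync.replicas can be pictured with a toy check. `write_accepted` below is a hypothetical model of the broker-side decision, not Kafka code, and it assumes a topic with replication.factor=3 and min.insync.replicas=2.

```python
def write_accepted(acks, in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """Toy model of whether a write would be accepted; not actual Kafka code."""
    if in_sync_replicas < 1:
        return False  # no leader available, nothing can be written
    if acks == "all":
        return in_sync_replicas >= min_insync_replicas
    return True  # acks=0 or acks=1: min.insync.replicas is not enforced


# Assumed topic: replication.factor=3, min.insync.replicas=2
for in_sync in (3, 2, 1):
    print(in_sync, "in sync ->", write_accepted("all", in_sync, 2))
```

With these assumed settings the topic tolerates one broker failure for writes; losing a second broker makes acks=all writes fail instead of silently weakening durability.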

Common Mistakes Developers Make

  • Using Kafka as a simple job queue
  • One topic with one partition → “Kafka is slow”
  • Manual offset handling instead of consumer groups
  • No monitoring of consumer lag
  • Poor key selection → partition imbalance

📊 How to Know Kafka Is Struggling

Watch for:

  • Growing consumer lag
  • Under‑replicated partitions
  • Long JVM GC pauses
  • Network saturation
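
Consumer lag, the first item in that list, is simple arithmetic: the latest offset in a partition minus the offset the group has committed. The sketch below uses made-up sample numbers and a hypothetical `consumer_lag` helper. With kafka-python the inputs could come from `KafkaConsumer.end_offsets()` and `KafkaConsumer.committed()`, though that wiring is not shown here.

```python
def consumer_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    """Lag per partition = log end offset minus the group's committed offset."""
    return {p: end_offsets[p] - committed_offsets.get(p, 0) for p in end_offsets}


# Made-up sample numbers, keyed by partition.
end_offsets = {0: 1500, 1: 1480, 2: 1510}
committed_offsets = {0: 1500, 1: 1200, 2: 1505}

lag = consumer_lag(end_offsets, committed_offsets)
print(lag, "total:", sum(lag.values()))
```

A single snapshot says little; what matters is whether the total keeps growing over time.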

🧪 Minimal Python Example

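```python
from kafka import KafkaProducer  # provided by the kafka-python package

# Connect to a local broker and send one message to the "events" topic.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", b"hello kafka")
producer.flush()  # block until buffered messages have been sent
```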

Consumer:

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the oldest message if the group has no committed offset
    group_id="demo-group",
)

# Iterating blocks and yields messages as they arrive.
for msg in consumer:
    print(msg.value)
```
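
Each message the consumer yields also carries its topic, partition, and offset, which ties the example back to the core concepts. The sketch below formats those fields. `Record` is a stand-in namedtuple with the same field names so the snippet runs without a broker, and `describe` is a helper invented for this illustration that assumes UTF-8 payloads.

```python
from collections import namedtuple

# Stand-in with the same field names as the records kafka-python yields.
Record = namedtuple("Record", ["topic", "partition", "offset", "value"])


def describe(msg) -> str:
    """Format where a message came from, assuming a UTF-8 encoded value."""
    return f"{msg.topic}[{msg.partition}]@{msg.offset}: {msg.value.decode('utf-8')}"


print(describe(Record("events", 0, 42, b"hello kafka")))
# prints: events[0]@42: hello kafka
```

In the consumer loop, `print(describe(msg))` could replace `print(msg.value)`.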

Final Thoughts
Kafka is powerful but not a silver bullet. Understanding its core concepts and configuring it properly lets you build scalable, reliable systems. Start small, monitor your metrics, and iterate.

