Apache Kafka is a distributed, fault-tolerant event streaming platform that stores events durably and can process millions of them per second. It's the backbone of event-driven architecture.

## Why Kafka?
RabbitMQ is great for task queues. But when you need to:
- Process 1M+ events/second
- Replay events from any point in time
- Fan out one event to 10 different consumers
- Keep events for days/weeks/forever
Kafka was built for exactly this.
## Core Concepts
- **Topics** — named streams of events (like database tables)
- **Producers** — write events to topics
- **Consumers** — read events from topics
- **Consumer groups** — parallel processing with automatic load balancing
- **Partitions** — topics split across brokers for parallelism
- **Retention** — events stored for a configured time (hours, days, forever)
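Under the hood, a topic partition is just an append-only log where every event keeps a stable offset. A toy model (not real Kafka, just the abstraction the concepts above describe):

```javascript
// Toy append-only log illustrating the topic/offset model.
class TopicLog {
  constructor() {
    this.events = []; // each event keeps its offset until retention expires
  }
  append(event) {
    this.events.push(event);
    return this.events.length - 1; // the event's offset
  }
  // Consumers read from any offset — this is what makes replay possible.
  readFrom(offset) {
    return this.events.slice(offset);
  }
}

const log = new TopicLog();
log.append({ action: 'signup' });   // offset 0
log.append({ action: 'purchase' }); // offset 1
console.log(log.readFrom(1)); // → [{ action: 'purchase' }]
```

Because offsets never change, a new consumer can start at offset 0 and see the full history, while another tails only new events.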
## Quick Start
```yaml
# Docker Compose (quickest start) — single-node KRaft broker
version: '3'
services:
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    ports: ['9092:9092']
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
```
Producer (Node.js):
```javascript
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ brokers: ['localhost:9092'] });
const producer = kafka.producer();

await producer.connect();
await producer.send({
  topic: 'user-events',
  messages: [{ key: 'user-123', value: JSON.stringify({ action: 'purchase', amount: 99.99 }) }],
});
await producer.disconnect();
```
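The `key` matters: Kafka's default partitioner hashes the key (murmur2) so all events with the same key land on the same partition, preserving per-key ordering. A simplified sketch of the idea — a toy hash, not Kafka's actual algorithm:

```javascript
// Simplified sketch of keyed partitioning: same key → same partition.
// (Real Kafka uses murmur2; this toy hash only illustrates the principle.)
function toPartition(key, numPartitions) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // unsigned 32-bit
  }
  return hash % numPartitions;
}

// 'user-123' always maps to the same partition, so that user's events stay ordered.
console.log(toPartition('user-123', 6) === toPartition('user-123', 6)); // → true
```

This is also why changing a topic's partition count remaps keys: the `hash % numPartitions` result changes, so ordering guarantees only hold within one partition layout.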
Consumer:
```javascript
const consumer = kafka.consumer({ groupId: 'analytics-group' });

await consumer.connect();
await consumer.subscribe({ topic: 'user-events', fromBeginning: true });
await consumer.run({
  eachMessage: async ({ message }) => {
    const event = JSON.parse(message.value.toString());
    console.log('Event:', event);
  },
});
```
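Run a second instance of this consumer with the same `groupId` and Kafka rebalances, splitting the topic's partitions across both. The assignment logic can be sketched roughly like this (real Kafka assignors are richer — range, round-robin, sticky):

```javascript
// Sketch of a consumer-group rebalance: spread partitions across
// group members round-robin.
function assignPartitions(partitions, consumers) {
  const assignment = Object.fromEntries(consumers.map((c) => [c, []]));
  partitions.forEach((p, i) => {
    assignment[consumers[i % consumers.length]].push(p);
  });
  return assignment;
}

// 6 partitions across 3 consumers → 2 each.
console.log(assignPartitions([0, 1, 2, 3, 4, 5], ['c1', 'c2', 'c3']));
// → { c1: [ 0, 3 ], c2: [ 1, 4 ], c3: [ 2, 5 ] }
```

Note the ceiling this implies: with 6 partitions, a 7th consumer in the group sits idle, so partition count caps your parallelism.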
## Real Use Cases
- Event sourcing — store every state change, rebuild state from events
- Real-time analytics — process clickstreams, transactions, IoT data
- Microservice communication — decouple services with event-driven messaging
- Change data capture — stream database changes to other systems
- Log aggregation — centralize logs from hundreds of services
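Event sourcing, the first use case above, fits Kafka naturally because retained events can be replayed from offset 0: current state is just a fold over the event stream. In miniature:

```javascript
// Event sourcing in miniature: state is a fold over the event stream,
// so replaying the topic from the beginning rebuilds state from scratch.
const events = [
  { type: 'deposit', amount: 100 },
  { type: 'withdraw', amount: 30 },
  { type: 'deposit', amount: 50 },
];

function rebuildBalance(stream) {
  return stream.reduce(
    (balance, e) => (e.type === 'deposit' ? balance + e.amount : balance - e.amount),
    0
  );
}

console.log(rebuildBalance(events)); // → 120
```

Because the fold is deterministic, you can also replay into a brand-new read model (say, a reporting database) without touching the write path.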
## Performance
- Millions of events/second per cluster
- Millisecond latency for produce and consume
- Horizontal scaling — add brokers for more throughput
- Data retention — keep events for days, weeks, or indefinitely
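Retention is configured per topic. These are standard topic-level configs; the values here are illustrative:

```properties
# Topic-level retention (example values)
retention.ms=604800000      # keep events for 7 days
retention.bytes=-1          # no size cap
cleanup.policy=delete       # or 'compact' to keep the latest value per key
```

`delete` drops old segments after the retention window; `compact` instead keeps the most recent event for each key indefinitely, which is what change-data-capture and state-rebuilding topics typically use.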
If your system processes more than a few thousand events per second, you'll likely outgrow simpler message queues, and that's when Kafka earns its operational complexity.
Need web scraping or data extraction? Check out my tools on Apify — get structured data from any website in minutes.
Custom solution? Email spinov001@gmail.com — quote in 2 hours.