Introduction
When I started learning Apache Kafka, the Producer seemed simple at first - just call send() and you're done, right?
Wrong.
As I dug deeper, I discovered a rich set of configurations that determine whether your messages are delivered reliably or potentially lost. Understanding these concepts transformed how I think about building data pipelines.
In this article, I'll walk you through everything I learned about Kafka Producers, from the basics to production-ready configurations.
This guide is based on the excellent course "Apache Kafka Series - Learn Apache Kafka for Beginners v3".
Table of Contents
- Producer Basics
- The Sticky Partitioner
- Producer Acknowledgements (acks)
- Producer Retries
- Idempotent Producer
- Production-Ready Configuration
1. Producer Basics
Setting Up a Producer
Every Kafka Producer starts with configuration:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.StringSerializer;

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "127.0.0.1:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());

KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
The Critical Trio: send(), flush(), close()
This is where things get interesting:
send() - Asynchronous Operation
producer.send(record);
- Asynchronous: The message goes into a buffer, NOT sent immediately
- If your program exits right after this, the message might never reach Kafka
flush() - Synchronous Operation
producer.flush();
- Synchronous: Forces all buffered messages to be sent and blocks until complete
- Useful for learning/demos, but rarely used in production (impacts performance)
close() - Cleanup
producer.close();
- Shuts down the Producer and releases resources
- Internally calls flush() - ensures all messages are sent
- MUST be called in production to prevent resource leaks
  send()            flush()            close()
    ↓                  ↓                  ↓
[Buffer]  ----->    [Send]  ----->   [Clean up]
 (async)            (sync)        (includes flush)
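Putting the trio together - here's a minimal sketch (the topic name "demo-topic" is just a placeholder):

import org.apache.kafka.clients.producer.ProducerRecord;

ProducerRecord<String, String> record =
        new ProducerRecord<>("demo-topic", "hello kafka");

producer.send(record);  // async: lands in the buffer, not yet on the wire
producer.flush();       // sync: blocks until the buffer is drained (demos only)
producer.close();       // flushes again, then releases resources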
Callbacks for Monitoring
Callbacks let you track message delivery:
producer.send(record, (metadata, exception) -> {
if (exception == null) {
log.info("Sent to topic: " + metadata.topic() +
" partition: " + metadata.partition() +
" offset: " + metadata.offset());
} else {
log.error("Error while producing", exception);
}
});
What you can get from metadata:
- topic: Which topic the message was sent to
- partition: Which partition number
- offset: The position of this message in the partition
- timestamp: When the message was created
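If you just want the metadata in a quick demo, you can also block on the Future that send() returns - fine for learning, but avoid it in hot paths (exception handling elided; get() throws checked exceptions):

import org.apache.kafka.clients.producer.RecordMetadata;

// get() blocks until the broker responds with the record's metadata
RecordMetadata metadata = producer.send(record).get();
log.info("topic: " + metadata.topic() +
        " partition: " + metadata.partition() +
        " offset: " + metadata.offset() +
        " timestamp: " + metadata.timestamp());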
2. The Sticky Partitioner
When I ran my producer sending 100 messages to a topic with 3 partitions, I noticed something odd: all messages went to partition 0.
Was my code broken? Not quite. This is the Sticky Partitioner at work.
How It Works
Introduced in Kafka 2.4, the Sticky Partitioner is the default when messages don't have a key:
- Messages "stick" to one partition until a batch is full
- When the batch is sent, switch to a different partition
- Goal: Reduce network requests by batching more efficiently
When Does a Batch Get Sent?
A batch is sent when ANY of these conditions is met:
| Trigger | Setting | Default |
|---|---|---|
| Size limit | batch.size | 16384 bytes (16 KB) |
| Time limit | linger.ms | 0 ms |
| Manual | flush() | - |
Common Misconception
batch.size is in BYTES, not message count!
// Make batches smaller to observe partition switching
properties.setProperty("batch.size", "400"); // 400 bytes
Why Sticky is Better Than Round-Robin
Old Way (Pre-Kafka 2.4):
- Each message goes to a different partition
- 100 messages = potentially 100 network requests
New Way (Sticky Partitioner):
- Messages batch together per partition
- 100 messages = maybe 5 network requests
- Fewer network calls = better throughput
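You can watch the switching yourself with a sketch like this - it assumes a local broker and a 3-partition topic named "demo-topic", and shrinks the batch size so partitions rotate quickly:

properties.setProperty("batch.size", "400"); // tiny batches -> frequent partition switches

KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
for (int i = 0; i < 100; i++) {
    // no key -> the Sticky Partitioner picks the partition
    producer.send(new ProducerRecord<>("demo-topic", "message " + i),
            (metadata, exception) -> {
                if (exception == null) {
                    log.info("partition: " + metadata.partition());
                }
            });
}
producer.close(); // flushes the remaining batches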
3. Producer Acknowledgements (acks)
This is one of the most important producer configurations.
The Three Levels
| Setting | Behavior | Data Loss Risk |
|---|---|---|
| acks=0 | Fire and forget - don't wait for any confirmation | High |
| acks=1 | Wait for leader broker only | Medium |
| acks=all | Wait for leader + all in-sync replicas | None (with proper configuration) |
acks=0 (Fire and Forget)
Producer → [send] → Broker
    ↓
(don't wait)
    ↓
[continue]
- Producer doesn't wait for any acknowledgement
- Highest throughput, but data loss is possible
- Use case: Metrics collection where some loss is acceptable
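If you deliberately choose fire-and-forget for a use case like this, it's clearest to disable idempotence explicitly as well, since idempotence requires acks=all (a sketch, not a recommendation):

properties.setProperty("enable.idempotence", "false"); // idempotence would force acks=all
properties.setProperty("acks", "0");                   // don't wait for any acknowledgement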
acks=1 (Leader Only)
Producer → [send] → Leader Broker → [commit]
                          ↓
                   [send ack back]
                          ↓
            Replicas sync later (background)
- Producer waits for leader to acknowledge
- Problem: If leader fails before replication, data is lost
- Was the default from Kafka 1.0 to 2.8
acks=all (Full Replication)
Producer → [send] → Leader → Replica 1 → [ack]
                       ↓          ↓
                   Replica 2 → [ack]
                       ↓
              [all acks received]
                       ↓
            [send ack to producer]
- Producer waits for leader AND all in-sync replicas (ISR)
- No data loss (with proper configuration)
- Default since Kafka 3.0
The min.insync.replicas Setting
acks=all works together with min.insync.replicas:
Replication Factor = 3
min.insync.replicas = 2
Scenario: Only 1 broker available
Result: Producer receives NOT_ENOUGH_REPLICAS exception
→ Better to fail than to lose data!
Best practice formula:
Brokers that can fail = Replication Factor - min.insync.replicas
Example: RF=3, min.insync=2 → Can tolerate 1 broker failure
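Keep in mind that min.insync.replicas is a topic/broker setting, not a producer setting. Here's a sketch of setting it at topic creation with the AdminClient (the topic name "payments" is just an example; error handling elided):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Map;
import java.util.Properties;

Properties adminProps = new Properties();
adminProps.setProperty("bootstrap.servers", "127.0.0.1:9092");

try (AdminClient admin = AdminClient.create(adminProps)) {
    NewTopic topic = new NewTopic("payments", 3, (short) 3) // 3 partitions, RF=3
            .configs(Map.of("min.insync.replicas", "2"));
    admin.createTopics(List.of(topic)).all().get();         // blocks until created
}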
4. Producer Retries
The Retry Mechanism
When sending fails, Kafka retries automatically:
|<------------ delivery.timeout.ms (default: 2 min) ------------>|
|                                                                |
send() → [batch] → [send] → [fail] → [wait] → [retry] → [success]
                               ↑        ↑
                           network   retry.backoff.ms
                            error    (default: 100ms)
Key Settings
| Setting | Default | Description |
|---|---|---|
| retries | 2147483647 (Kafka 2.1+) | Max retry attempts |
| retry.backoff.ms | 100 | Wait time between retries |
| delivery.timeout.ms | 120000 (2 min) | Upper bound for total delivery time |
Understanding delivery.timeout.ms
This is the most important timeout setting:
// Total time from send() to success or failure
properties.setProperty("delivery.timeout.ms", "120000"); // 2 minutes
What happens when timeout is reached:
// Note: this is Kafka's org.apache.kafka.common.errors.TimeoutException,
// not java.util.concurrent.TimeoutException
producer.send(record, (metadata, exception) -> {
    if (exception instanceof TimeoutException) {
        // delivery.timeout.ms exceeded
        // Message was NOT delivered - handle it!
    }
});
The Out-of-Order Problem
With retries enabled and max.in.flight.requests.per.connection > 1:
Timeline:
1. Send batch A (messages 1-10)
2. Send batch B (messages 11-20)
3. Batch A fails (network error)
4. Batch B succeeds ← commits first!
5. Batch A retries and succeeds
Result in Kafka: [11-20, 1-10] ← Out of order!
Solution: Use Idempotent Producer (next section)
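For context: before the Idempotent Producer existed, the usual workaround was to cap in-flight requests at one, which preserves ordering but gives up request pipelining:

// Pre-idempotence workaround: strict ordering at the cost of throughput
properties.setProperty("max.in.flight.requests.per.connection", "1");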
5. Idempotent Producer
The Duplicate Problem
Without idempotence, network errors can cause duplicates:
1. Producer sends message
2. Kafka commits message to log
3. Kafka sends ack
4. Ack is LOST (network error)
5. Producer thinks it failed → retries
6. Kafka commits AGAIN → Duplicate!
The Solution
properties.setProperty("enable.idempotence", "true");
How It Works
Each producer gets a Producer ID (PID) and each message batch gets a sequence number:
Producer (PID=1) sends:
Batch 1: seq=0 → Kafka commits
Batch 1 (retried after lost ack): seq=0 → Kafka says "already have seq=0, ignoring"
Result: No duplicates!
What Idempotence Automatically Sets
When you enable idempotence, Kafka automatically configures:
| Setting | Value | Why |
|---|---|---|
| acks | all | Need confirmation from all in-sync replicas |
| retries | MAX_VALUE | Retry until delivery.timeout.ms |
| max.in.flight.requests.per.connection | 5 | With ordering guaranteed! |
The Magic of Ordering with max.in.flight=5
You might wonder: "How can we have 5 in-flight requests AND maintain order?"
Answer: Kafka uses sequence numbers to detect out-of-order batches and rejects them, forcing the producer to retry in correct order.
Batch A (seq=0) fails, Batch B (seq=1) arrives first
Kafka: "I expected seq=0, got seq=1 - rejecting!"
Producer retries Batch A, then Batch B
Order maintained!
Default Since Kafka 3.0
Idempotent producer is enabled by default in Kafka 3.0+, but explicitly enable it for older versions or for clarity.
6. Production-Ready Configuration
Safe Producer Settings
Properties properties = new Properties();
// Connection
properties.setProperty("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());
// Reliability (Kafka 3.0+ defaults, but explicit is better)
properties.setProperty("enable.idempotence", "true");
properties.setProperty("acks", "all");
properties.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
properties.setProperty("max.in.flight.requests.per.connection", "5");
// Timeouts
properties.setProperty("delivery.timeout.ms", "120000"); // 2 minutes
properties.setProperty("request.timeout.ms", "30000"); // 30 seconds
High Throughput Settings (Optional)
// Batching - wait a bit to collect more messages
properties.setProperty("linger.ms", "20");
properties.setProperty("batch.size", String.valueOf(32 * 1024)); // 32KB
// Compression - reduce network bandwidth
properties.setProperty("compression.type", "snappy"); // or "lz4", "zstd"
Configuration Cheat Sheet
┌─────────────────────────────────────────────────────────────┐
│ PRODUCER SETTINGS │
├─────────────────────────────────────────────────────────────┤
│ RELIABILITY │
│ ├── enable.idempotence = true (prevent duplicates) │
│ ├── acks = all (full replication) │
│ └── retries = MAX_VALUE (bounded by timeout) │
├─────────────────────────────────────────────────────────────┤
│ TIMEOUTS │
│ ├── delivery.timeout.ms = 120000 (total time bound) │
│ ├── request.timeout.ms = 30000 (per request) │
│ └── retry.backoff.ms = 100 (between retries) │
├─────────────────────────────────────────────────────────────┤
│ THROUGHPUT │
│ ├── linger.ms = 20 (batch collection time) │
│ ├── batch.size = 32768 (batch size in bytes) │
│ └── compression.type = snappy (reduce network I/O) │
├─────────────────────────────────────────────────────────────┤
│ BROKER/TOPIC (not producer settings) │
│ ├── replication.factor = 3 │
│ └── min.insync.replicas = 2 │
└─────────────────────────────────────────────────────────────┘
Key Takeaways
- send() is asynchronous - always call close() to ensure delivery
- Sticky Partitioner batches messages by partition for better throughput
- acks=all + min.insync.replicas=2 = no data loss (with RF=3)
- delivery.timeout.ms is the upper bound for all retries
- Idempotent Producer prevents duplicates AND maintains ordering
- Default since Kafka 3.0 - idempotence is on, but be explicit
Conclusion
The Kafka Producer is deceptively simple on the surface but incredibly powerful when you understand its internals:
- Batching improves throughput
- Acknowledgements control durability
- Retries handle transient failures
- Idempotence prevents duplicates
The key insight: Kafka's defaults have gotten much better over time. In Kafka 3.0+, you get an idempotent producer with acks=all out of the box (full exactly-once semantics across consume-process-produce pipelines still require transactions). But understanding why these settings matter helps you tune them for your specific use case.
This article is part of my learning journey through Apache Kafka. If you found it helpful, please give it a like and follow for more tutorials!
Course Reference: Apache Kafka Series - Learn Apache Kafka for Beginners v3