Byron Hsieh

Posted on Jan 20

Kafka Producer Deep Dive: From Basics to Production-Ready Configuration

#java #kafka

Introduction

When I started learning Apache Kafka, the Producer seemed simple at first - just call send() and you're done, right?

Wrong.

As I dug deeper, I discovered a rich set of configurations that determine whether your messages are delivered reliably or potentially lost. Understanding these concepts transformed how I think about building data pipelines.

In this article, I'll walk you through everything I learned about Kafka Producers, from the basics to production-ready configurations.

This guide is based on the excellent course "Apache Kafka Series - Learn Apache Kafka for Beginners v3".

Producer Basics
The Sticky Partitioner
Producer Acknowledgements (acks)
Producer Retries
Idempotent Producer
Production-Ready Configuration

1. Producer Basics

Setting Up a Producer

Every Kafka Producer starts with configuration:

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "127.0.0.1:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());

KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

The Critical Trio: send(), flush(), close()

This is where things get interesting:

send() - Asynchronous Operation

producer.send(record);

Asynchronous: The message goes into a buffer, NOT sent immediately
If your program exits right after this, the message might never reach Kafka

flush() - Synchronous Operation

producer.flush();

Synchronous: Forces all buffered messages to be sent and blocks until complete
Useful for learning/demos, but rarely used in production (impacts performance)

close() - Cleanup

producer.close();

Shuts down the Producer and releases resources
Internally calls flush() - ensures all messages are sent
MUST be called in production to prevent resource leaks

send()          flush()         close()
  ↓               ↓               ↓
[Buffer] -----> [Send] -----> [Clean up]
(async)        (sync)         (includes flush)

Callbacks for Monitoring

Callbacks let you track message delivery:

producer.send(record, (metadata, exception) -> {
    if (exception == null) {
        log.info("Sent to topic: " + metadata.topic() +
                " partition: " + metadata.partition() +
                " offset: " + metadata.offset());
    } else {
        log.error("Error while producing", exception);
    }
});

What you can get from metadata:

topic: Which topic the message was sent to
partition: Which partition number
offset: The position of this message in the partition
timestamp: When the message was created

2. The Sticky Partitioner

When I ran my producer sending 100 messages to a topic with 3 partitions, I noticed something odd: all messages went to partition 0.

Was my code broken? Not quite. This is the Sticky Partitioner at work.

How It Works

Introduced in Kafka 2.4+, the Sticky Partitioner is the default when messages don't have a key:

Messages "stick" to one partition until a batch is full
When the batch is sent, switch to a different partition
Goal: Reduce network requests by batching more efficiently

When Does a Batch Get Sent?

A batch is sent when ANY of these conditions is met:

Trigger	Setting	Default
Size limit	`batch.size`	16384 bytes (16 KB)
Time limit	`linger.ms`	0 ms
Manual	`flush()`	-

Common Misconception

batch.size is in BYTES, not message count!

// Make batches smaller to observe partition switching
properties.setProperty("batch.size", "400");  // 400 bytes

Why Sticky is Better Than Round-Robin

Old Way (Pre-Kafka 2.4):

Each message goes to a different partition
100 messages = potentially 100 network requests

New Way (Sticky Partitioner):

Messages batch together per partition
100 messages = maybe 5 network requests
Fewer network calls = better throughput

3. Producer Acknowledgements (acks)

This is one of the most important producer configurations.

The Three Levels

Setting	Behavior	Data Loss Risk
`acks=0`	Fire and forget - don't wait for any confirmation	High
`acks=1`	Wait for leader broker only	Medium
`acks=all`	Wait for leader + all in-sync replicas	None

acks=0 (Fire and Forget)

Producer → [send] → Broker
              ↓
         (don't wait)
              ↓
         [continue]

Producer doesn't wait for any acknowledgement
Highest throughput, but data loss is possible
Use case: Metrics collection where some loss is acceptable

acks=1 (Leader Only)

Producer → [send] → Leader Broker → [commit]
                          ↓
                    [send ack back]
                          ↓
                 Replicas sync later (background)

Producer waits for leader to acknowledge
Problem: If leader fails before replication, data is lost
Was the default from Kafka 1.0 to 2.8

acks=all (Full Replication)

Producer → [send] → Leader → Replica 1 → [ack]
                      ↓           ↓
                  Replica 2 → [ack]
                      ↓
              [all acks received]
                      ↓
              [send ack to producer]

Producer waits for leader AND all in-sync replicas (ISR)
No data loss (with proper configuration)
Default since Kafka 3.0

The min.insync.replicas Setting

acks=all works together with min.insync.replicas:

Replication Factor = 3
min.insync.replicas = 2

Scenario: Only 1 broker available
Result: Producer receives NOT_ENOUGH_REPLICAS exception
        → Better to fail than to lose data!

Best practice formula:

Brokers that can fail = Replication Factor - min.insync.replicas

Example: RF=3, min.insync=2 → Can tolerate 1 broker failure

4. Producer Retries

The Retry Mechanism

When sending fails, Kafka retries automatically:

|<------------ delivery.timeout.ms (default: 2 min) ------------>|
|                                                                  |
send() → [batch] → [send] → [fail] → [wait] → [retry] → [success]
                              ↑         ↑
                         network    retry.backoff.ms
                          error      (default: 100ms)

Key Settings

Setting	Default	Description
`retries`	2147483647 (Kafka 2.1+)	Max retry attempts
`retry.backoff.ms`	100	Wait time between retries
`delivery.timeout.ms`	120000 (2 min)	Upper bound for total delivery time

Understanding delivery.timeout.ms

This is the most important timeout setting:

// Total time from send() to success or failure
properties.setProperty("delivery.timeout.ms", "120000"); // 2 minutes

What happens when timeout is reached:

producer.send(record, (metadata, exception) -> {
    if (exception instanceof TimeoutException) {
        // delivery.timeout.ms exceeded
        // Message was NOT delivered - handle it!
    }
});

The Out-of-Order Problem

With retries enabled and max.in.flight.requests.per.connection > 1:

Timeline:
1. Send batch A (messages 1-10)
2. Send batch B (messages 11-20)
3. Batch A fails (network error)
4. Batch B succeeds ← commits first!
5. Batch A retries and succeeds

Result in Kafka: [11-20, 1-10]  ← Out of order!

Solution: Use Idempotent Producer (next section)

5. Idempotent Producer

The Duplicate Problem

Without idempotence, network errors can cause duplicates:

1. Producer sends message
2. Kafka commits message to log
3. Kafka sends ack
4. Ack is LOST (network error)
5. Producer thinks it failed → retries
6. Kafka commits AGAIN → Duplicate!

The Solution

properties.setProperty("enable.idempotence", "true");

How It Works

Each producer gets a Producer ID (PID) and each message batch gets a sequence number:

Producer (PID=1) sends:
  Batch 1: seq=0 → Kafka commits
  Batch 1: seq=0 → Kafka says "already have seq=0, ignoring"

Result: No duplicates!

What Idempotence Automatically Sets

When you enable idempotence, Kafka automatically configures:

Setting	Value	Why
`acks`	all	Need confirmation from all replicas
`retries`	MAX_VALUE	Retry until delivery.timeout.ms
`max.in.flight.requests.per.connection`	5	With ordering guaranteed!

The Magic of Ordering with max.in.flight=5

You might wonder: "How can we have 5 in-flight requests AND maintain order?"

Answer: Kafka uses sequence numbers to detect out-of-order batches and rejects them, forcing the producer to retry in correct order.

Batch A (seq=0) fails, Batch B (seq=1) arrives first
Kafka: "I expected seq=0, got seq=1 - rejecting!"
Producer retries Batch A, then Batch B
Order maintained!

Default Since Kafka 3.0

Idempotent producer is enabled by default in Kafka 3.0+, but explicitly enable it for older versions or for clarity.

6. Production-Ready Configuration

Safe Producer Settings

Properties properties = new Properties();

// Connection
properties.setProperty("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());

// Reliability (Kafka 3.0+ defaults, but explicit is better)
properties.setProperty("enable.idempotence", "true");
properties.setProperty("acks", "all");
properties.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
properties.setProperty("max.in.flight.requests.per.connection", "5");

// Timeouts
properties.setProperty("delivery.timeout.ms", "120000");  // 2 minutes
properties.setProperty("request.timeout.ms", "30000");    // 30 seconds

High Throughput Settings (Optional)

// Batching - wait a bit to collect more messages
properties.setProperty("linger.ms", "20");
properties.setProperty("batch.size", String.valueOf(32 * 1024)); // 32KB

// Compression - reduce network bandwidth
properties.setProperty("compression.type", "snappy");  // or "lz4", "zstd"

Configuration Cheat Sheet

┌─────────────────────────────────────────────────────────────┐
│                    PRODUCER SETTINGS                        │
├─────────────────────────────────────────────────────────────┤
│  RELIABILITY                                                │
│  ├── enable.idempotence = true     (prevent duplicates)    │
│  ├── acks = all                    (full replication)      │
│  └── retries = MAX_VALUE           (bounded by timeout)    │
├─────────────────────────────────────────────────────────────┤
│  TIMEOUTS                                                   │
│  ├── delivery.timeout.ms = 120000  (total time bound)      │
│  ├── request.timeout.ms = 30000    (per request)           │
│  └── retry.backoff.ms = 100        (between retries)       │
├─────────────────────────────────────────────────────────────┤
│  THROUGHPUT                                                 │
│  ├── linger.ms = 20                (batch collection time) │
│  ├── batch.size = 32768            (batch size in bytes)   │
│  └── compression.type = snappy     (reduce network I/O)    │
├─────────────────────────────────────────────────────────────┤
│  BROKER/TOPIC (not producer settings)                       │
│  ├── replication.factor = 3                                 │
│  └── min.insync.replicas = 2                               │
└─────────────────────────────────────────────────────────────┘

Key Takeaways

send() is asynchronous - always call close() to ensure delivery
Sticky Partitioner batches messages by partition for better throughput
acks=all + min.insync.replicas=2 = no data loss (with RF=3)
delivery.timeout.ms is the upper bound for all retries
Idempotent Producer prevents duplicates AND maintains ordering
Default since Kafka 3.0 - idempotence is on, but be explicit

Conclusion

The Kafka Producer is deceptively simple on the surface but incredibly powerful when you understand its internals:

Batching improves throughput
Acknowledgements control durability
Retries handle transient failures
Idempotence prevents duplicates

The key insight: Kafka's defaults have gotten much better over time. In Kafka 3.0+, you get idempotent, exactly-once producer semantics out of the box. But understanding why these settings matter helps you tune them for your specific use case.

This article is part of my learning journey through Apache Kafka. If you found it helpful, please give it a like and follow for more tutorials!

Course Reference: Apache Kafka Series - Learn Apache Kafka for Beginners v3

Introduction

Table of Contents

1. Producer Basics

Setting Up a Producer

The Critical Trio: send(), flush(), close()

send() - Asynchronous Operation

flush() - Synchronous Operation

close() - Cleanup

Callbacks for Monitoring

2. The Sticky Partitioner

How It Works

When Does a Batch Get Sent?

Common Misconception

Why Sticky is Better Than Round-Robin

3. Producer Acknowledgements (acks)

The Three Levels

acks=0 (Fire and Forget)

acks=1 (Leader Only)

acks=all (Full Replication)

The min.insync.replicas Setting

4. Producer Retries

The Retry Mechanism

Key Settings

Understanding delivery.timeout.ms

The Out-of-Order Problem

5. Idempotent Producer

The Duplicate Problem

The Solution

How It Works

What Idempotence Automatically Sets

The Magic of Ordering with max.in.flight=5

Default Since Kafka 3.0

6. Production-Ready Configuration

Safe Producer Settings

High Throughput Settings (Optional)

Configuration Cheat Sheet

Key Takeaways

Conclusion