Table of Contents
- Topic Architecture: Building Your Data Highway
- Delivery Guarantees: Never Lose, Never Duplicate
- Replication & ISR: Zero Downtime Architecture
Topic Architecture: Building Your Data Highway
🎯 The Foundation: Understanding Topics
Think of a Kafka topic as a multi-lane highway where your data travels. Just like how more lanes allow more cars to drive simultaneously, more partitions allow more messages to be processed in parallel.
Creating Your First Topic
/opt/kafka/bin/kafka-topics.sh --create \
--topic order-events \
--partitions 4 \
--replication-factor 3 \
--bootstrap-server localhost:9092
What just happened?
- ✅ Created a topic called order-events
- ✅ Split it into 4 parallel lanes (partitions)
- ✅ Made 3 copies of each partition (for safety)
🛣️ Partitions: Your Performance Dial
Scenario: E-commerce Order Processing
You run an online store with these requirements:
- 5,000 orders per second during peak hours
- Each processing server handles 1,250 orders/second
Calculate partitions needed:
Partitions = Target Throughput ÷ Consumer Throughput
Partitions = 5,000 ÷ 1,250 = 4 partitions
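As a quick sanity check, here is the same back-of-the-envelope calculation in Java (the class and method names are purely illustrative, not part of any Kafka API):

public class PartitionSizing {
    // Round up so a fractional result still gets its own partition.
    static int requiredPartitions(double targetPerSec, double perConsumerPerSec) {
        return (int) Math.ceil(targetPerSec / perConsumerPerSec);
    }

    public static void main(String[] args) {
        // 5,000 orders/sec peak ÷ 1,250 orders/sec per consumer = 4 partitions
        System.out.println(requiredPartitions(5_000, 1_250));
    }
}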
The Highway Analogy Visualized
1 Partition Topic (Bottleneck):
[███████████████████████] Single lane - all traffic stuck
4 Partition Topic (Optimal):
[███████████████████████] Partition 0 → Consumer A
[███████████████████████] Partition 1 → Consumer B
[███████████████████████] Partition 2 → Consumer C
[███████████████████████] Partition 3 → Consumer D
Message Keys: The Secret to Ordering
The Problem
Without keys, messages for the same customer could process out of order, causing chaos!
❌ Without Keys (Random Distribution):
Customer #42 creates account → Partition 1
Customer #42 updates email → Partition 0 ❌ Different partition!
Customer #42 adds phone → Partition 2 ❌ Processed out of order!
✅ With Keys (Guaranteed Order):
Key: "customer-42" → hash → Partition 2
Customer #42 creates account → Partition 2
Customer #42 updates email → Partition 2 ✅ Same partition!
Customer #42 adds phone → Partition 2 ✅ Perfect order!
Real-World Example: Social Media Feed
Scenario: Building an Instagram-like feed processor
When users create posts, you need to ensure each user's posts are processed in order, but different users can be processed in parallel.
Key Strategy:
User ID as Key → All posts from same user → Same partition → Guaranteed order
Result: All of Sarah's posts stay in order, but different users' posts can be processed in parallel across partitions!
Key Strategy Decision Tree
Do you need message ordering?
├─ NO → Use NO KEY (round-robin for max throughput)
│       Perfect for: Metrics, logs, sensor data
│
└─ YES → Use MESSAGE KEY
   ├─ User activity? → key = user_id
   ├─ Order processing? → key = order_id
   ├─ IoT devices? → key = device_id
   └─ Multi-tenant app? → key = tenant_id
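To make the key strategy concrete, here is a minimal Java producer sketch that keys customer events by customer ID so they all land on the same partition. The serializer choices and event payloads are assumptions for illustration:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedOrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key -> same partition -> events for customer-42 stay in order.
            producer.send(new ProducerRecord<>("order-events", "customer-42", "account_created"));
            producer.send(new ProducerRecord<>("order-events", "customer-42", "email_updated"));
            producer.send(new ProducerRecord<>("order-events", "customer-42", "phone_added"));
        }
    }
}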
🛡️ Replication: Your Safety Net
Scenario: Banking Application
You're building a payment processor. One question matters most:
"If a server crashes at 2 AM, do we lose transaction records?"
Answer depends on replication factor:
Replication Factor = 1 (❌ NEVER IN PRODUCTION)
Partition 0:
└── Broker 1 (Leader) ← Single point of failure
If Broker 1 crashes → DATA PERMANENTLY LOST
Replication Factor = 3 (✅ Production Standard)
Partition 0:
├── Broker 1 (Leader) ← Handles all reads/writes
├── Broker 2 (Follower) ← Backup copy
└── Broker 3 (Follower) ← Another backup
Broker 1 crashes? → Broker 2 becomes leader → NO DATA LOSS ✅
Real Failure Scenario: Diwali Sale
Timeline of a broker failure:
11:59 PM - Diwali sale starts, 10,000 orders/second
12:03 AM - Broker 2 crashes (hardware failure)
12:03 AM - Controller detects failure (3 seconds)
12:03 AM - Broker 3 promoted to leader
12:03 AM - Orders continue processing
- Zero orders lost ✅
- 3-second interruption only
⚙️ Advanced Configuration: Retention & Durability
Configuration Example: Multi-Tier Data Strategy
Real-time Analytics Topic (High Volume, Short Retention)
/opt/kafka/bin/kafka-topics.sh --create \
--topic clickstream-events \
--partitions 16 \
--replication-factor 3 \
--bootstrap-server localhost:9092 \
--config retention.ms=86400000 \
--config min.insync.replicas=1
# retention.ms=86400000 → 1 day; min.insync.replicas=1 → lower durability for speed
Financial Transactions Topic (Critical, Long Retention)
/opt/kafka/bin/kafka-topics.sh --create \
--topic payment-transactions \
--partitions 6 \
--replication-factor 3 \
--bootstrap-server localhost:9092 \
--config retention.ms=7776000000 \
--config min.insync.replicas=2
# retention.ms=7776000000 → 90 days; min.insync.replicas=2 → high durability
Event Sourcing Topic (Forever, Maximum Safety)
/opt/kafka/bin/kafka-topics.sh --create \
--topic account-events \
--partitions 8 \
--replication-factor 5 \
--bootstrap-server localhost:9092 \
--config retention.ms=-1 \
--config min.insync.replicas=3
# retention.ms=-1 → retain forever; min.insync.replicas=3 → maximum durability
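If you prefer to manage topics from application code rather than the CLI, the same settings can be applied with Kafka's Java AdminClient. A minimal sketch using the payment-transactions example above (error handling omitted):

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTransactionsTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("payment-transactions", 6, (short) 3)
                    .configs(Map.of(
                            "retention.ms", "7776000000",   // 90 days
                            "min.insync.replicas", "2"));   // leader + 1 follower must ack
            // createTopics is asynchronous; .all().get() waits for the result.
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}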
Delivery Guarantees: Never Lose, Never Duplicate
The Central Dilemma
In distributed systems, failures are inevitable. The question is:
When your consumer crashes mid-processing, what happens to that message?
You have two choices, each with trade-offs:
At Most Once: "Fire and Forget"
Definition: Process each message 0 or 1 times (never more than once)
Priority: Prevent duplicates at all costs, even if it means losing messages
The Coffee Shop Analogy
You order coffee:
1. Barista writes order ✅
2. ✅ MARKS YOUR ORDER COMPLETE (commits offset)
3. 💥 Power outage before making coffee!
4. Power returns
5. Order marked complete → You never get your coffee
Result: Order processed ZERO times (at most once)
How It Works: Automatic Offset Commit
Consumer Configuration:
enable.auto.commit=true
auto.commit.interval.ms=5000 # Commits every 5 seconds
Processing Flow:
1. Fetch message from Kafka
2. ✅ Auto-commit saves offset (marks as done)
3. Start processing message
4. If crash here → Message LOST
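A minimal Java consumer sketch of this at-most-once pattern, with auto-commit enabled; the topic and group id are illustrative:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AtMostOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "clickstream-processors");   // illustrative group id
        props.put("enable.auto.commit", "true");            // offsets committed automatically
        props.put("auto.commit.interval.ms", "5000");       // every 5 seconds
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("clickstream-events"));
            while (true) {
                // If the app crashes after an auto-commit but before processing,
                // the already-committed messages are skipped on restart (possible loss).
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    process(record);
                }
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value());
    }
}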
Timeline of Message Loss
10:00:00 - Fetch message "user_clicked_buy_button"
10:00:01 - Auto-commit saves offset 1001 ✅
10:00:02 - Start processing...
10:00:03 - 💥 APPLICATION CRASHES
10:00:04 - Restart
10:00:05 - Ask Kafka: "Where was I?"
10:00:05 - Kafka: "You were at offset 1001"
10:00:06 - Skip to offset 1001
- Message 1000 NEVER PROCESSED
🎯 At Least Once: "Never Lose Anything"
Definition: Process each message 1 or more times (never zero times)
Priority: Ensure every message is processed, even if it means processing some twice
The Coffee Shop Analogy (Revised)
You order coffee:
1. Barista makes your coffee ✅
2. Serves it to you ✅
3. 💥 Power outage before marking order complete!
4. Power returns
5. Order NOT marked complete → Makes coffee AGAIN
6. You get TWO coffees (one extra)
Result: Order processed TWICE (at least once)
How It Works: Manual Offset Commit
Consumer Configuration:
enable.auto.commit=false # Manual control
Processing Flow:
1. Fetch message from Kafka
2. Process message successfully ✅
3. Write to database ✅
4. Commit offset manually ✅
5. If crash before step 4 → Message REPROCESSED
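A minimal Java sketch of the manual-commit flow above: the offset is committed only after processing and the database write succeed. The topic, group id, and saveToDatabase helper are illustrative:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-processors");     // illustrative group id
        props.put("enable.auto.commit", "false");       // manual control
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("order-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    saveToDatabase(record);             // steps 2-3: process + persist
                }
                // Step 4: commit only after everything succeeded. A crash before this
                // line means the batch is re-delivered (duplicates possible).
                consumer.commitSync();
            }
        }
    }

    static void saveToDatabase(ConsumerRecord<String, String> record) {
        System.out.println("persisted: " + record.value());   // placeholder for a real DB write
    }
}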
Timeline of Duplicate Processing
10:00:00 - Fetch message "order_12345: charge $100"
10:00:01 - Process order successfully ✅
10:00:02 - Charge customer $100 ✅
10:00:03 - Write to database ✅
10:00:04 - 💥 APPLICATION CRASHES (before committing offset)
10:00:05 - Restart
10:00:06 - Ask Kafka: "Where was I?"
10:00:06 - Kafka: "You were at offset 1000"
10:00:07 - Fetch message 1000 AGAIN
10:00:08 - Process order AGAIN
10:00:09 - Charge customer $100 AGAIN ⚠️ (DUPLICATE!)
10:00:10 - Write to database AGAIN
10:00:11 - Commit offset successfully
Decision Matrix: Which Guarantee to Use?
START HERE
  │
  Can you afford to lose messages?
  │
  ├─ YES → AT MOST ONCE
  │         Use cases:
  │         • Sensors
  │         • Metrics
  │         • Logs
  │
  └─ NO → AT LEAST ONCE
            │
            Can you handle duplicates?
            │
            ├─ YES → Implement idempotency
            │         • Txn IDs
            │         • DB keys
            │
            └─ NO → Add deduplication
                     • Transaction IDs
                     • DB constraints
                     • Processing log
Use Case Decision Table
| Scenario | Guarantee | Reasoning |
|---|---|---|
| Banking transactions | At Least Once | Cannot lose money transfers |
| User registrations | At Least Once | Cannot lose new users |
| E-commerce orders | At Least Once | Cannot lose customer orders |
| Stock trades | At Least Once | Cannot lose trade records |
| IoT sensor readings | At Most Once | Losing one reading is acceptable |
| Application logs | At Most Once | Missing one log entry is okay |
| Click analytics | At Most Once | Approximate counts are fine |
| System metrics | At Most Once | Slightly off counts acceptable |
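For the at-least-once rows above, a common way to make duplicates harmless is to track an idempotency key (such as a transaction or order ID) and skip records you've already handled. Here is a minimal in-memory Java sketch of that idea; in production the "already seen" check would usually be a unique database constraint rather than a set:

import java.util.HashSet;
import java.util.Set;

public class DeduplicatingProcessor {
    // In-memory stand-in for a durable store (e.g. a DB table with a unique key on the ID).
    private final Set<String> processedIds = new HashSet<>();

    /** Returns true if the payment was applied, false if it was a duplicate and skipped. */
    public boolean process(String transactionId, double amount) {
        if (!processedIds.add(transactionId)) {
            return false;                        // already handled: ignore the redelivery
        }
        chargeCustomer(transactionId, amount);   // the side effect happens only once
        return true;
    }

    private void chargeCustomer(String transactionId, double amount) {
        System.out.printf("charged $%.2f for %s%n", amount, transactionId);
    }

    public static void main(String[] args) {
        DeduplicatingProcessor p = new DeduplicatingProcessor();
        p.process("order_12345", 100.0);   // first delivery: charged
        p.process("order_12345", 100.0);   // redelivery after a crash: skipped
    }
}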
⚙️ Configuration Comparison
At Most Once Configuration:
# Consumer config
enable.auto.commit=true
auto.commit.interval.ms=5000
At Least Once Configuration:
# Consumer config
enable.auto.commit=false # Manual control
Replication & ISR: Zero Downtime Architecture
The Architecture: No Single Point of Failure
The Problem Visualized
Without Replication:
Topic: customer-orders
Partition 0 → Broker 1 ← Single copy
💥 Crashes
↓
🚨 DATA LOST FOREVER
🚨 Service DOWN
With Replication:
Topic: customer-orders
Partition 0:
├── Broker 1 (Leader) ← Primary copy
├── Broker 2 (Follower) ← Backup copy
└── Broker 3 (Follower) ← Another backup
Broker 1 💥 Crashes
↓
Broker 2 becomes Leader (3 seconds)
↓
✅ NO DATA LOST
✅ Service continues
Leaders and Followers
How Replication Works
Every partition has ONE leader and multiple followers:
Partition 0 (Replication Factor = 3):
          Producers write here
                  ↓
      ┌──────────────────────────────┐
      │      Broker 1 (LEADER)       │
      │  [msg1][msg2][msg3][msg4]    │
      └──────────────────────────────┘
            ↓                ↓
   ┌──────────────────┐  ┌──────────────────┐
   │     Broker 2     │  │     Broker 3     │
   │    (FOLLOWER)    │  │    (FOLLOWER)    │
   │  [msg1][msg2]    │  │  [msg1][msg2]    │
   │  [msg3][msg4]    │  │  [msg3][msg4]    │
   └──────────────────┘  └──────────────────┘
       Actively               Actively
      replicating            replicating
Key Rules:
- All writes go through the leader
- All reads go through the leader
- Followers continuously pull new data from leader
- Followers do NOT serve client requests
🎯 In-Sync Replicas (ISR): The Safety Net
ISR = the set of replicas (the leader plus any followers) that are fully caught up and healthy
What Makes a Replica "In-Sync"?
✅ IN-SYNC REPLICA:
• Has all committed messages
• Actively fetching from leader
• Not fallen behind (< replica.lag.time.max.ms)
• Ready to become leader at any moment
❌ OUT-OF-SYNC REPLICA:
• Missing recent messages
• Stopped fetching (network issue, crashed)
• Fallen too far behind
• CANNOT become leader
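You can inspect the current leader and ISR of each partition from code as well as from the CLI. A minimal Java sketch using the AdminClient, reusing the order-events topic from earlier:

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.TopicDescription;

public class ShowIsr {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            TopicDescription desc = admin.describeTopics(List.of("order-events"))
                    .allTopicNames().get()
                    .get("order-events");
            desc.partitions().forEach(p ->
                    // leader() is the broker handling reads/writes; isr() is the set of
                    // replicas eligible to take over if the leader fails.
                    System.out.printf("partition %d  leader=%s  isr=%s%n",
                            p.partition(), p.leader(), p.isr()));
        }
    }
}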
Real-World Analogy: Emergency Contact List
You have 3 emergency contacts:
Contact 1 (Leader): Mom
• Always answers immediately
• Has all current information
Contact 2 (ISR): Dad
• Answers within seconds
• Fully updated on family matters
• Can step in if Mom unavailable ✅
Contact 3 (Out-of-Sync): Cousin
• Takes hours to respond
• Out of the loop
• Can't be relied on in emergency ❌
Message Commitment: The Two-Phase Process
Phase 1: Leader Writes
Producer sends: "Transfer $500 from Alice to Bob"
↓
Leader writes to local log
↓
Message is "uncommitted" (not safe yet)
Phase 2: ISRs Acknowledge
Leader notifies followers:
↓
Follower 1 (ISR) replicates ✅
↓
Follower 2 (ISR) replicates ✅
↓
Message now "committed" (safe!)
↓
Producer receives ACK
Producer Configuration Impact
acks=1 (Fast but Risky):
Producer → Leader writes → ACK immediately
(Followers not yet replicated)
Leader crashes before replication?
→ Message LOST 💥
acks=all (Slow but Safe):
Producer → Leader writes → Wait for ALL ISRs → ACK
Takes longer, but:
✅ Message GUARANTEED safe
✅ Zero data loss
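A minimal Java producer configuration for the safe path: acks=all plus idempotence (the idempotence and retry settings are extra safeguards added here for illustration, not something the text above requires):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SafeProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");                    // wait for all in-sync replicas
        props.put("enable.idempotence", "true");     // broker de-duplicates producer retries
        props.put("retries", Integer.MAX_VALUE);     // keep retrying transient failures
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payment-transactions",
                    "alice", "Transfer $500 from Alice to Bob"),
                    (metadata, exception) -> {
                        // The callback fires only after the message is committed (or has failed).
                        if (exception != null) {
                            exception.printStackTrace();
                        } else {
                            System.out.println("committed at offset " + metadata.offset());
                        }
                    });
            producer.flush();
        }
    }
}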
🚨 Failure Scenario: Leader Election
Real-World Example: Diwali Sale at 2 AM
Initial State:
Topic: flash-sale-orders
Partition 0:
├── Broker 1 (Leader) ← 50,000 orders/sec
│   ISR: [1, 2, 3]
├── Broker 2 (Follower) ← In-Sync
└── Broker 3 (Follower) ← In-Sync
Failure Timeline:
02:00:00 - Broker 1 crashes (power supply failure)
02:00:01 - Producer attempts write → Connection refused
02:00:02 - Zookeeper detects missing heartbeat
02:00:03 - Controller initiates leader election
Candidates: Broker 2, Broker 3 (both ISR)
02:00:03 - Broker 2 elected as new leader
02:00:04 - Controller notifies all brokers
02:00:04 - Producer auto-discovers new leader
02:00:05 - Orders resume processing ✅
Total downtime: 5 seconds
Orders lost: ZERO (because acks=all)
Why Was Broker 2 Chosen?
- ✅ Was in ISR (fully caught up)
- ✅ Had all committed messages
- ✅ First in preferred replica list
What If Broker 2 Was Out-of-Sync?
02:00:03 - Controller checks ISR: [1, 3]
02:00:03 - Broker 2 NOT in ISR → SKIPPED
02:00:03 - Broker 3 elected instead
⚙️ Configuration Deep Dive
1. Replication Factor (Topic Level)
# Development: Fast, no safety
/opt/kafka/bin/kafka-topics.sh --create \
--topic dev-logs \
--bootstrap-server localhost:9092 \
--replication-factor 1    # ❌ Never in production!
# Staging: Moderate safety
/opt/kafka/bin/kafka-topics.sh --create \
--topic staging-events \
--bootstrap-server localhost:9092 \
--replication-factor 2
# Production: Standard safety
/opt/kafka/bin/kafka-topics.sh --create \
--topic production-orders \
--bootstrap-server localhost:9092 \
--replication-factor 3    # ✅ Industry standard
# Mission-Critical: Maximum safety
/opt/kafka/bin/kafka-topics.sh --create \
--topic financial-transactions \
--bootstrap-server localhost:9092 \
--replication-factor 5
2. Min In-Sync Replicas (Topic Level)
Works with acks=all on producer side:
# Weak durability (fast but risky)
--config min.insync.replicas=1
# Leader only needs to write
# If leader crashes before followers replicate → DATA LOST
# Strong durability (recommended)
--config min.insync.replicas=2
# Leader + at least 1 follower must acknowledge
# Can survive 1 broker failure
# Maximum durability (mission-critical)
--config min.insync.replicas=3
# Leader + at least 2 followers must acknowledge
# Can survive 2 broker failures
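One consequence worth knowing: with acks=all, if the number of in-sync replicas drops below min.insync.replicas, the broker refuses the write rather than risking data loss. A minimal Java sketch of catching that case on the producer side; the "alert and retry later" reaction is purely illustrative:

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurabilityAwareProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            try {
                producer.send(new ProducerRecord<>("payment-transactions", "txn-1", "charge $100"))
                        .get();   // block so broker-side errors surface here
            } catch (ExecutionException e) {
                if (e.getCause() instanceof NotEnoughReplicasException) {
                    // Too few in-sync replicas: the write was refused, nothing was lost.
                    // Illustrative reaction: alert and retry later instead of downgrading acks.
                    System.err.println("ISR below min.insync.replicas, retry later: " + e.getCause());
                } else {
                    throw new RuntimeException(e);
                }
            }
        }
    }
}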
3. Replica Lag Time (Broker Level)
# In server.properties
replica.lag.time.max.ms=10000 # 10 seconds
# If a follower doesn't fetch (and catch up) within 10 seconds:
# → Removed from ISR
# → Cannot become leader
# → Logs warning