Ajinkya Singh

⚡ How Kafka Stores Billions of Messages: The Storage Architecture Nobody Explains

🎯 Introduction: Beyond the Message Queue

Imagine you're running a global streaming platform processing millions of viewer events every second: play, pause, skip, like, and watch history. How do you store this tsunami of data efficiently while keeping it instantly accessible? This is where Kafka's storage architecture becomes a masterpiece of engineering.


πŸ—οΈ The Three Core Responsibilities

Every Kafka broker juggles three critical tasks simultaneously:

1. 📥 Producer Gateway

Accepts incoming streams of events from applications across your network

2. 💾 Storage Engine

Writes messages to disk durably and efficiently; this is where the magic happens

3. 📤 Consumer Server

Rapidly locates and delivers data to consumers while replicating to other brokers


📊 The Storage Hierarchy

Topic: "viewer-activity"
│
├── Partition 0 (folder: /data/viewer-activity-0/)
│   ├── 00000000000000000000.log
│   ├── 00000000000000000000.index
│   ├── 00000000000000000000.timeindex
│   ├── 00000000000000850000.log
│   ├── 00000000000000850000.index
│   ├── 00000000000000850000.timeindex
│   └── 00000000000001700000.log (active segment)
│
├── Partition 1 (folder: /data/viewer-activity-1/)
│   └── [similar segment files...]
│
└── Partition 2 (folder: /data/viewer-activity-2/)
    └── [similar segment files...]

🧩 Why Segments? The Problem with Giant Files

Imagine storing all viewer events in one massive file:

❌ A 15TB file is impossible to manage

❌ Deleting old watch history requires rewriting everything

❌ Finding "what did this user watch at 3 PM yesterday?" means scanning terabytes

❌ File corruption destroys all your data

The Segment Solution:

Kafka divides each partition's log into segments: smaller, manageable chunks (typically 1GB each).


🔄 When Does Kafka Create a New Segment?

A new segment rolls when either condition is met:

Condition 1: Size Threshold

Current segment reaches 1GB (default)
→ Close it, start fresh segment

Condition 2: Time Threshold

7 days have passed (default)
→ Roll to new segment regardless of size

📅 Real-World Timeline Example:

Monday 9:00 AM  → Users start watching shows
                → Messages written to segment 0

Monday 2:30 PM  → Segment 0 hits 1GB
                → Kafka creates segment 1
                → New events go to segment 1

Tuesday 3:00 PM → Segment 1 hits 1GB
                → Kafka creates segment 2 (now active)
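
To make the roll rule concrete, here is a tiny Python sketch of the decision the broker effectively makes for its active segment. The helper and its arguments are illustrative; the real knobs are the broker/topic settings log.segment.bytes (default 1GB) and log.roll.hours (default 168 hours).

ONE_GB = 1_073_741_824                       # log.segment.bytes default
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000      # log.roll.hours default (168h)

def should_roll(active_bytes: int, active_age_ms: int,
                max_bytes: int = ONE_GB, max_age_ms: int = SEVEN_DAYS_MS) -> bool:
    # A new segment starts when EITHER threshold is crossed
    return active_bytes >= max_bytes or active_age_ms >= max_age_ms

# A 1.2GB segment that is only 5 hours old still rolls (size threshold wins)
print(should_roll(1_200_000_000, 5 * 60 * 60 * 1000))   # True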

🗂️ Segment File Anatomy

For each segment, Kafka maintains three files:

Partition Folder: viewer-activity-0/
│
├── 00000000000000850000.log        (1GB - actual messages)
├── 00000000000000850000.index      (10MB - offset lookup map)
└── 00000000000000850000.timeindex  (10MB - timestamp lookup map)

File Naming Convention

The number 00000000000000850000 is the base offset: the first message's offset in this segment.
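
Since the name is just the base offset zero-padded to 20 digits, you can derive a segment's three file names with a couple of lines of Python (a hypothetical helper, shown only to illustrate the convention):

def segment_file_names(base_offset: int):
    # Kafka pads the base offset to 20 digits: 850000 -> "00000000000000850000"
    stem = f"{base_offset:020d}"
    return f"{stem}.log", f"{stem}.index", f"{stem}.timeindex"

print(segment_file_names(850_000))
# ('00000000000000850000.log', '00000000000000850000.index', '00000000000000850000.timeindex')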


⚡ Index Files: Kafka's Speed Secret

🎯 The Two-Step Lookup Process

When Kafka needs to find a message, it uses TWO files in sequence:

  1. First: Check the segment file name (base offset) to find the RIGHT segment
  2. Second: Use the .index file to find the EXACT location within that segment

Let's see this in action!


πŸ” 1. Offset Index (.index)

Maps message offset β†’ byte position in the log file

How It Works:

Consumer Request: "Give me messages starting from offset 850,125"

Step 1: FIND THE RIGHT SEGMENT (using base offsets)
        Available segments:
        • 00000000000000000000.log (base: 0)
        • 00000000000000850000.log (base: 850,000) ← THIS ONE!
        • 00000000000001700000.log (base: 1,700,000)

        Offset 850,125 falls between 850,000 and 1,700,000
        → Select segment: 00000000000000850000.log

Step 2: FIND EXACT POSITION (using .index file)
        Broker loads 00000000000000850000.index into memory
        Binary search finds:
        • offset 850,100 → byte 3072
        • offset 850,150 → byte 6144

        Message 850,125 must be between bytes 3072-6144

Step 3: READ THE MESSAGE
        Read from byte 3072 in 00000000000000850000.log
        Scan forward until offset 850,125 is found

Index Structure:

┌──────────┬─────────────┐
│  Offset  │  Byte Pos   │
├──────────┼─────────────┤
│  850000  │      0      │
│  850100  │   3072      │
│  850150  │   6144      │
│  850200  │   9216      │
│   ...    │    ...      │
└──────────┴─────────────┘

Result: ✨ No scanning millions of messages; instant lookup!
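
Here is a minimal Python simulation of that two-step lookup, assuming the sorted base offsets and the sparse in-memory index are already available as plain lists; it mirrors the idea, not the broker's actual implementation:

import bisect

# Step 1 data: base offsets parsed from the segment file names
base_offsets = [0, 850_000, 1_700_000]

# Step 2 data: sparse offset index of segment 850000 (offset -> byte position)
offset_index = [(850_000, 0), (850_100, 3072), (850_150, 6144), (850_200, 9216)]

def locate(target_offset: int):
    # Step 1: the right segment is the one with the largest base offset <= target
    segment = base_offsets[bisect.bisect_right(base_offsets, target_offset) - 1]
    # Step 2: binary search for the largest indexed offset <= target
    indexed = [o for o, _ in offset_index]
    byte_pos = offset_index[bisect.bisect_right(indexed, target_offset) - 1][1]
    return segment, byte_pos     # then scan the .log forward from byte_pos

print(locate(850_125))   # (850000, 3072): open ...850000.log and read from byte 3072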


📋 Visual: The Two-File Lookup System

🎯 FINDING MESSAGE BY OFFSET (850,125)

File System View:
├── 00000000000000000000.log      (offsets: 0 - 849,999)
├── 00000000000000000000.index
├── 00000000000000850000.log      (offsets: 850,000 - 1,699,999) ⭐
├── 00000000000000850000.index    ⭐ USE THIS!
├── 00000000000001700000.log      (offsets: 1,700,000+)
└── 00000000000001700000.index

Step 1: Match offset to segment name (base offset)
        850,125 is >= 850,000 and < 1,700,000
        → Open segment: 00000000000000850000

Step 2: Use that segment's .index file
        → Open: 00000000000000850000.index
        → Find byte position: 3072
        → Read from .log file at byte 3072

🔑 Key Insight: Why Two Files?

Without the base offset in the segment file name:

  • ❌ Would need to check EVERY index file
  • ❌ Open thousands of files to find the right one
  • ❌ Very slow!

With the base offset in the segment file name:

  • ✅ The filename tells you the offset range instantly
  • ✅ Only open ONE .index file
  • ✅ Lightning fast!

The Two-Step Magic:

  1. Segment filename (base offset) = Coarse filter (which file?)
  2. Index file (.index) = Fine filter (which byte position?)

⏰ 2. Time Index (.timeindex)

Maps timestamp → corresponding offset (also uses two-step lookup!)

How It Works:

Consumer Request: "Show me all viewer activity from the last hour"
                  (timestamp: 2025-11-25 13:00:00)

Step 1: FIND THE RIGHT SEGMENT (using .timeindex files)
        Check all segments' time ranges:
        • Segment 0: timestamps 10:00 - 11:59
        • Segment 850000: timestamps 12:00 - 13:59 ← THIS ONE!
        • Segment 1700000: timestamps 14:00 onwards

        Timestamp 13:00:00 falls in segment 850000

Step 2: FIND THE OFFSET (using .timeindex)
        Broker checks 00000000000000850000.timeindex
        Binary search finds:
        2025-11-25 13:00:00 → offset 850,750

Step 3: USE OFFSET INDEX (now it's a regular offset lookup!)
        Uses 00000000000000850000.index
        Finds: offset 850,750 → byte position 225,280

Step 4: READ THE MESSAGES
        Reads from byte 225,280 in .log file
        Returns all messages from that point forward

Time Index Structure:

┌──────────────────────┬──────────┐
│     Timestamp        │  Offset  │
├──────────────────────┼──────────┤
│ 2025-11-25 12:00:00  │  850000  │
│ 2025-11-25 12:30:00  │  850300  │
│ 2025-11-25 13:00:00  │  850750  │
│ 2025-11-25 13:45:00  │  851000  │
│        ...           │   ...    │
└──────────────────────┴──────────┘
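
The same simulation can be extended to the time index: first resolve the timestamp to an offset, then fall back to the regular offset lookup. The in-memory structures below are made up for illustration (the epoch-millisecond timestamps correspond to the UTC wall-clock times in the comments):

import bisect

# Sparse time index of segment 850000: (timestamp in ms, offset)
time_index = [
    (1764072000000, 850_000),   # 2025-11-25 12:00:00
    (1764073800000, 850_300),   # 2025-11-25 12:30:00
    (1764075600000, 850_750),   # 2025-11-25 13:00:00
]
# Offset index of the same segment: (offset, byte position)
offset_index = [(850_000, 0), (850_700, 220_160), (850_750, 225_280)]

def offset_for_timestamp(target_ms: int) -> int:
    # First indexed entry at or after the requested timestamp
    stamps = [t for t, _ in time_index]
    return time_index[bisect.bisect_left(stamps, target_ms)][1]

def byte_position(offset: int) -> int:
    offsets = [o for o, _ in offset_index]
    return offset_index[bisect.bisect_right(offsets, offset) - 1][1]

start = offset_for_timestamp(1764075600000)   # 13:00:00 query -> offset 850750
print(start, byte_position(start))            # 850750 225280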

🧠 The Memory Advantage

Why are index lookups so blazing fast?

Disk Storage (Slow):
├── segment.log (1GB - rarely loaded fully)
├── segment.index (10MB - loaded into RAM!)
└── segment.timeindex (10MB - loaded into RAM!)

Memory (Lightning Fast):
└── Index files cached → Microsecond lookups

Because index files are tiny (10-20MB), they fit entirely in memory:

  • ✅ No disk reads for lookups
  • ✅ Binary search through in-memory structures
  • ✅ Microseconds instead of seconds
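
To make the "index in RAM" point concrete, the sketch below memory-maps an offset index file and binary-searches it. It assumes the index is a flat array of 8-byte entries (a 4-byte offset relative to the segment's base offset followed by a 4-byte file position, big-endian); treat that layout and the example path as assumptions for illustration, not a spec:

import mmap
import struct

ENTRY = struct.Struct(">ii")   # assumed: relative offset (int32) + byte position (int32)

def lookup(index_path: str, base_offset: int, target_offset: int) -> int:
    """Byte position of the last indexed entry at or before target_offset."""
    rel_target = target_offset - base_offset
    with open(index_path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        lo, hi, best = 0, len(mm) // ENTRY.size - 1, 0
        while lo <= hi:                      # binary search over fixed-size entries
            mid = (lo + hi) // 2
            rel, pos = ENTRY.unpack_from(mm, mid * ENTRY.size)
            if rel <= rel_target:
                best, lo = pos, mid + 1
            else:
                hi = mid - 1
        return best

# lookup("/data/viewer-activity-0/00000000000000850000.index", 850_000, 850_125)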

πŸ—‘οΈ Retention Policies: Managing Disk Space

Without cleanup, your disks would eventually overflow. Kafka automatically deletes old segments based on policies.

1. Time-Based Retention

Configuration:
log.retention.hours = 168  (7 days)

Timeline:
Day 1  → Segment 0 created (Nov 18 viewer data)
Day 2  → Segment 1 created (Nov 19 viewer data)
...
Day 8  → Segment 0 is now 7 days old
       → Kafka deletes:
         - segment-0.log
         - segment-0.index
         - segment-0.timeindex

Visual:

Before Cleanup (Day 8):
[Seg0: Nov 18] [Seg1: Nov 19] [Seg2: Nov 20] ... [Seg7: Nov 25]
      ↓ 7 days old - DELETE

After Cleanup:
[Seg1: Nov 19] [Seg2: Nov 20] ... [Seg7: Nov 25]

2. Size-Based Retention

Configuration:
log.retention.bytes = 107374182400  (100GB per partition)

Current State:
Partition total size = 98GB ✓ OK

New Segment Added:
Partition total size = 103GB ✗ EXCEEDS LIMIT

Action:
→ Kafka deletes the oldest segments until the total is back under the limit
→ New total = 100GB ✓

3. Combined Policies

You can use both simultaneously; Kafka deletes when either condition is met:

log.retention.hours = 168     (7 days)
log.retention.bytes = 100GB

Segment deleted if:
✓ Age > 7 days  OR
✓ Total size > 100GB
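
Conceptually, the cleanup pass looks something like the sketch below: walk segments oldest-first and drop whole segments while either limit is violated. Everything here (the helper, using file mtime as the segment's age, skipping the newest segment) is a simplification for illustration, not Kafka's internal logic:

import os
import time

RETENTION_MS = 168 * 60 * 60 * 1000       # log.retention.hours = 168
RETENTION_BYTES = 100 * 1024 ** 3         # log.retention.bytes = 100GB

def cleanup(partition_dir: str) -> None:
    # Oldest first: the base offsets in the file names sort chronologically
    logs = sorted(f for f in os.listdir(partition_dir) if f.endswith(".log"))
    total = sum(os.path.getsize(os.path.join(partition_dir, f)) for f in logs)
    now_ms = time.time() * 1000

    for name in logs[:-1]:                 # never delete the active (newest) segment
        path = os.path.join(partition_dir, name)
        too_old = now_ms - os.path.getmtime(path) * 1000 > RETENTION_MS
        too_big = total > RETENTION_BYTES
        if not (too_old or too_big):
            break
        total -= os.path.getsize(path)
        for ext in (".log", ".index", ".timeindex"):   # drop the whole segment at once
            os.remove(path[:-len(".log")] + ext)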

💨 Why Deletion is Lightning Fast

Segment-Level Operations

Kafka never deletes individual messages. It operates on entire segment files.

❌ Inefficient (What Kafka DOESN'T do):
Read file → Skip old messages → Write remaining → Rebuild indexes
(Hours of processing)

✅ Efficient (What Kafka DOES):
Delete segment-000.log
Delete segment-000.index
Delete segment-000.timeindex
(Milliseconds)

Benefits:

  • 🚀 One filesystem operation
  • 🚀 No data rewriting
  • 🚀 No index rebuilding
  • 🚀 Extremely low CPU usage

🎬 Real-World Message Journey

Let's trace a viewer event through the entire system:

Step 1: Producer Sends Event

Producer → Kafka Broker

Message: {
  user_id: "viewer-789",
  action: "started_watching",
  show_id: "breaking-code-s1e3",
  timestamp: 2025-11-25 14:45:00
}

Destination:
Topic: viewer-activity
Partition: 2 (based on user_id hash)
Offset: 1,700,082 (next available)

Step 2: Write to Active Segment

Active Segment: 00000000000001700000.log
Position: Append at byte 245,760

Step 3: Update Indexes

Offset Index:
1,700,082 → byte 245,760

Time Index:
2025-11-25 14:45:00 → offset 1,700,082

Step 4: Acknowledge Producer

Broker → Producer:
"✓ Message written at offset 1,700,082"
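
If you want to drive Steps 1-4 from the client side, a minimal producer using the kafka-python library would look roughly like this (broker address, topic, and payload are the ones from the example; the library choice is just one option):

import json
from kafka import KafkaProducer            # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",                            # wait for the broker to confirm the write
)

event = {"user_id": "viewer-789", "action": "started_watching",
         "show_id": "breaking-code-s1e3"}

# Keying by user_id keeps all of this viewer's events in the same partition
future = producer.send("viewer-activity", key="viewer-789", value=event)
meta = future.get(timeout=10)              # blocks until the broker acknowledges
print(f"written to partition {meta.partition} at offset {meta.offset}")
producer.flush()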

Step 5: Consumer Reads Message

Consumer Request: "Give me messages from offset 1,700,082"

Kafka Process:
1. Which segment? → 00000000000001700000.log
2. Load offset index into memory
3. Find: offset 1,700,082 → byte 245,760
4. Read from disk starting at byte 245,760
5. Return message to consumer
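
And the consumer side of Step 5, again sketched with kafka-python, seeking straight to that offset (partition number and offset are taken from the walkthrough; error handling omitted):

from kafka import KafkaConsumer, TopicPartition   # pip install kafka-python

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
tp = TopicPartition("viewer-activity", 2)

consumer.assign([tp])           # manual assignment instead of a consumer group
consumer.seek(tp, 1_700_082)    # the broker resolves this via segment name + .index

for record in consumer:
    print(record.offset, record.value)
    break                       # just the first message for the demo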

🧹 Log Compaction: Latest State Only

The Use Case

Scenario: Storing user profile updates

User ID: viewer-789

Messages Over Time:
1. viewer-789 → {email: "old@email.com", plan: "basic"}
2. viewer-789 → {email: "new@email.com", plan: "basic"}
3. viewer-789 → {email: "new@email.com", plan: "premium"}

Question: Do we need messages 1 and 2?
Answer: NO! Only the latest state matters.

How Compaction Works

Configuration:
log.cleanup.policy = compact

Result:
Kafka guarantees: For each key, retain at least the last known value

Background Process:
1. Log cleaner reads old segments
2. Identifies duplicate keys
3. Keeps only latest message per key
4. Creates cleaned segments
5. Replaces old segments

💀 Tombstones: Deleting Keys

To remove a key entirely:

Send message: {key: "viewer-789", value: null}

Effect:
→ The log cleaner removes the key, including its last valid value
→ This happens after a configurable grace period (delete.retention.ms)
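
A toy, Kafka-free simulation of what the log cleaner effectively computes, including tombstone handling, to make the "latest value per key" guarantee concrete:

# Each entry is (key, value); a value of None is a tombstone
log = [
    ("viewer-789", {"email": "old@email.com", "plan": "basic"}),
    ("viewer-789", {"email": "new@email.com", "plan": "basic"}),
    ("viewer-123", {"email": "a@email.com", "plan": "basic"}),
    ("viewer-789", {"email": "new@email.com", "plan": "premium"}),
    ("viewer-123", None),                  # tombstone: delete viewer-123
]

def compact(entries):
    latest = {}
    for key, value in entries:             # later entries overwrite earlier ones
        latest[key] = value
    # After the grace period, tombstoned keys disappear entirely
    return {k: v for k, v in latest.items() if v is not None}

print(compact(log))
# {'viewer-789': {'email': 'new@email.com', 'plan': 'premium'}}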

Compaction vs. Retention Comparison

Aspect          | Retention (Delete)                | Compaction
----------------|-----------------------------------|---------------------------------
What's Kept     | Time/size window of all messages  | Latest value per key
What's Deleted  | Old segments entirely             | Old values for the same key
Use Case        | Time-series events, logs          | User profiles, state management
Configuration   | log.cleanup.policy=delete         | log.cleanup.policy=compact
History         | Limited time/size window          | No history, only latest

🚀 Performance Benefits Summary

Why Kafka is Blazing Fast

  1. Sequential Writes → Always append to end (disk-friendly)
  2. Memory-Mapped Indexes → Entire index files in RAM
  3. Zero-Copy Transfer → Direct disk → network socket
  4. Batch Processing → Multiple messages written together
  5. Segment-Level Operations → No expensive per-message work

📊 Real-World Numbers

A single Kafka broker can handle:

  • ✅ Millions of messages per second
  • ✅ Hundreds of MB/s throughput
  • ✅ Sub-millisecond latencies
  • ✅ All while maintaining durability
