🎯 Introduction: Beyond the Message Queue
Imagine you're running a global streaming platform processing millions of viewer events every second: play, pause, skip, like, and watch history. How do you store this tsunami of data efficiently while keeping it instantly accessible? This is where Kafka's storage architecture becomes a masterpiece of engineering.
🏗️ The Three Core Responsibilities
Every Kafka broker juggles three critical tasks simultaneously:
1. 📥 Producer Gateway
Accepts incoming streams of events from applications across your network
2. 💾 Storage Engine
Writes messages to disk durably and efficiently; this is where the magic happens
3. 📤 Consumer Server
Rapidly locates and delivers data to consumers while replicating to other brokers
📂 The Storage Hierarchy
Topic: "viewer-activity"
│
├── Partition 0 (folder: /data/viewer-activity-0/)
│   ├── 00000000000000000000.log
│   ├── 00000000000000000000.index
│   ├── 00000000000000000000.timeindex
│   ├── 00000000000000850000.log
│   ├── 00000000000000850000.index
│   ├── 00000000000000850000.timeindex
│   └── 00000000000001700000.log (active segment)
│
├── Partition 1 (folder: /data/viewer-activity-1/)
│   └── [similar segment files...]
│
└── Partition 2 (folder: /data/viewer-activity-2/)
    └── [similar segment files...]
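If you want to see this hierarchy on a real broker, a few lines of Python are enough. This is a minimal sketch, assuming your broker's `log.dirs` points at `/data` (adjust the path and topic name for your setup):

```python
# Minimal sketch (not Kafka code): list the segment files that make up each
# partition directory of a topic.
from pathlib import Path

DATA_DIR = Path("/data")  # assumption: matches the broker's log.dirs setting

def list_segments(topic: str) -> None:
    """Print the .log/.index/.timeindex files for every partition of a topic."""
    for partition_dir in sorted(DATA_DIR.glob(f"{topic}-*")):
        print(partition_dir.name)
        for f in sorted(partition_dir.iterdir()):
            if f.suffix in (".log", ".index", ".timeindex"):
                print(f"  {f.name}  ({f.stat().st_size:,} bytes)")

if __name__ == "__main__":
    list_segments("viewer-activity")
```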
🧩 Why Segments? The Problem with Giant Files
Imagine storing all viewer events in one massive file:
❌ A 15TB file is nearly impossible to manage
❌ Deleting old watch history requires rewriting everything
❌ Finding "what did a user watch at 3 PM yesterday" means scanning terabytes
❌ Corruption in one file puts all of your data at risk
The Segment Solution:
Kafka divides each partition's log into segments: smaller, manageable chunks (1GB each by default).
🔄 When Does Kafka Create a New Segment?
A new segment rolls when either condition is met:
Condition 1: Size Threshold
Current segment reaches 1GB (default)
→ Close it, start a fresh segment
Condition 2: Time Threshold
7 days have passed (default)
→ Roll to a new segment regardless of size
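The roll decision itself is easy to model. The sketch below is an illustrative check, not Kafka's actual code; the constants mirror the `log.segment.bytes` and `log.roll.ms` defaults mentioned above:

```python
# Illustrative model of the segment-roll decision.
import time

SEGMENT_BYTES = 1 * 1024**3          # 1GB size threshold (log.segment.bytes default)
ROLL_MS = 7 * 24 * 60 * 60 * 1000    # 7-day time threshold (log.roll.ms default)

def should_roll(segment_size_bytes: int, segment_created_ms: int,
                now_ms: int | None = None) -> bool:
    """Return True when either the size or the time threshold is reached."""
    now_ms = now_ms if now_ms is not None else int(time.time() * 1000)
    too_big = segment_size_bytes >= SEGMENT_BYTES
    too_old = (now_ms - segment_created_ms) >= ROLL_MS
    return too_big or too_old
```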
📅 Real-World Timeline Example:
Monday 9:00 AM  → Users start watching shows
                → Messages written to segment 0
Monday 2:30 PM  → Segment 0 hits 1GB
                → Kafka creates segment 1
                → New events go to segment 1
Tuesday 3:00 PM → Segment 1 hits 1GB
                → Kafka creates segment 2 (now active)
🗂️ Segment File Anatomy
For each segment, Kafka maintains three files:
Partition Folder: viewer-activity-0/
│
├── 00000000000000850000.log       (1GB - actual messages)
├── 00000000000000850000.index     (10MB - offset lookup map)
└── 00000000000000850000.timeindex (10MB - timestamp lookup map)
File Naming Convention
The number 00000000000000850000 is the base offset: the offset of the first message in this segment.
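You can reproduce the naming yourself: the base offset is simply zero-padded to 20 digits. A tiny sketch:

```python
# Kafka names segment files after the base offset, zero-padded to 20 digits.
def segment_file_names(base_offset: int) -> list[str]:
    stem = f"{base_offset:020d}"   # 850000 -> "00000000000000850000"
    return [f"{stem}.log", f"{stem}.index", f"{stem}.timeindex"]

print(segment_file_names(850_000))
# ['00000000000000850000.log', '00000000000000850000.index', '00000000000000850000.timeindex']
```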
⚡ Index Files: Kafka's Speed Secret
🎯 The Two-Step Lookup Process
When Kafka needs to find a message, it uses TWO files in sequence:
- First: check the segment file name (base offset) to find the RIGHT segment
- Second: use the .index file to find the EXACT location within that segment
Let's see this in action!
📍 1. Offset Index (.index)
Maps message offset → byte position in the log file
How It Works:
Consumer Request: "Give me messages starting from offset 850,125"
Step 1: FIND THE RIGHT SEGMENT (using base offsets)
Available segments:
• 00000000000000000000.log (base: 0)
• 00000000000000850000.log (base: 850,000) ← THIS ONE!
• 00000000000001700000.log (base: 1,700,000)
Offset 850,125 falls between 850,000 and 1,700,000
→ Select segment: 00000000000000850000.log
Step 2: FIND EXACT POSITION (using .index file)
Broker loads 00000000000000850000.index into memory
Binary search finds:
• offset 850,100 → byte 3072
• offset 850,150 → byte 6144
Message 850,125 must be between bytes 3072-6144
Step 3: READ THE MESSAGE
Read from byte 3072 in 00000000000000850000.log
Scan forward until offset 850,125 is found
Index Structure:
┌──────────┬────────────┐
│  Offset  │  Byte Pos  │
├──────────┼────────────┤
│  850000  │     0      │
│  850100  │    3072    │
│  850150  │    6144    │
│  850200  │    9216    │
│   ...    │    ...     │
└──────────┴────────────┘
Result: ✨ No scanning through millions of messages; the lookup is nearly instant!
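Here is the in-segment lookup as a small, self-contained sketch. It models the idea (binary-search the sparse index, then scan forward), not Kafka's actual implementation, and the index entries are the sample values from the table above. The index is sparse (roughly one entry per `log.index.interval.bytes` of data, 4KB by default), which is why the final forward scan is needed:

```python
# Model of the in-segment lookup: find the last index entry at or below the
# target offset, then scan forward from that byte position in the .log file.
import bisect

# Sparse index for segment 00000000000000850000: (offset, byte_position)
offset_index = [(850_000, 0), (850_100, 3072), (850_150, 6144), (850_200, 9216)]

def index_lookup(target_offset: int) -> int:
    """Return the byte position to start scanning from for target_offset."""
    offsets = [off for off, _ in offset_index]
    i = bisect.bisect_right(offsets, target_offset) - 1  # last entry <= target
    if i < 0:
        return 0  # target precedes the first index entry: start at byte 0
    return offset_index[i][1]

print(index_lookup(850_125))  # -> 3072: scan forward in the .log from byte 3072
```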
🔍 Visual: The Two-File Lookup System
🎯 FINDING MESSAGE BY OFFSET (850,125)
File System View:
├── 00000000000000000000.log   (offsets: 0 - 849,999)
├── 00000000000000000000.index
├── 00000000000000850000.log   (offsets: 850,000 - 1,699,999) ←
├── 00000000000000850000.index ← USE THIS!
├── 00000000000001700000.log   (offsets: 1,700,000+)
└── 00000000000001700000.index
Step 1: Match offset to segment name (base offset)
850,125 is >= 850,000 and < 1,700,000
→ Open segment: 00000000000000850000
Step 2: Use that segment's .index file
→ Open: 00000000000000850000.index
→ Find byte position: 3072
→ Read from the .log file at byte 3072
🔑 Key Insight: Why Two Files?
Without segment files (base offset in filename):
- ❌ Would need to check EVERY index file
- ❌ Open thousands of files to find the right one
- ❌ Very slow!
With segment files (base offset in filename):
- ✅ The filename tells you the offset range instantly
- ✅ Only open ONE .index file
- ✅ Lightning fast!
The Two-Step Magic:
- Segment filename (base offset) = Coarse filter (which file?)
- Index file (.index) = Fine filter (which byte position?)
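The coarse filter is just as easy to sketch. Assuming the file names from the diagram above, picking the right segment is a binary search over the sorted base offsets parsed from those names:

```python
# Model of the coarse filter: choose the segment whose base offset is the
# largest one that is still <= the requested offset.
import bisect

segment_files = [
    "00000000000000000000.log",
    "00000000000000850000.log",
    "00000000000001700000.log",
]

def find_segment(target_offset: int) -> str:
    base_offsets = sorted(int(name.split(".")[0]) for name in segment_files)
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError(f"offset {target_offset} is older than any segment")
    return f"{base_offsets[i]:020d}.log"

print(find_segment(850_125))  # -> 00000000000000850000.log
```

Combined with the previous sketch, that is the whole read path: one bisect over file names, one bisect inside a single index, and one short forward scan.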
⏰ 2. Time Index (.timeindex)
Maps timestamp → corresponding offset (also uses the two-step lookup!)
How It Works:
Consumer Request: "Show me all viewer activity from the last hour"
(timestamp: 2025-11-25 13:00:00)
Step 1: FIND THE RIGHT SEGMENT (using .timeindex files)
Check all segments' time ranges:
• Segment 0: timestamps 10:00 - 11:59
• Segment 850000: timestamps 12:00 - 13:59 ← THIS ONE!
• Segment 1700000: timestamps 14:00 onwards
Timestamp 13:00:00 falls in segment 850000
Step 2: FIND THE OFFSET (using .timeindex)
Broker checks 00000000000000850000.timeindex
Binary search finds:
2025-11-25 13:00:00 → offset 850,750
Step 3: USE OFFSET INDEX (now it's a regular offset lookup!)
Uses 00000000000000850000.index
Finds: offset 850,750 → byte position 225,280
Step 4: READ THE MESSAGES
Reads from byte 225,280 in .log file
Returns all messages from that point forward
Time Index Structure:
┌───────────────────────┬───────────┐
│       Timestamp       │  Offset   │
├───────────────────────┼───────────┤
│  2025-11-25 10:00:00  │  850000   │
│  2025-11-25 11:30:00  │  850300   │
│  2025-11-25 13:00:00  │  850750   │
│  2025-11-25 14:15:00  │  851000   │
│          ...          │    ...    │
└───────────────────────┴───────────┘
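The timestamp path can be modeled the same way: find the last time-index entry at or before the requested time, then hand the resulting offset to the regular offset-index lookup. A sketch using the sample entries above (a model, not broker code):

```python
# Model of the .timeindex step: timestamp -> starting offset.
import bisect
from datetime import datetime

# (timestamp, offset) pairs matching the sample table above
time_index = [
    (datetime(2025, 11, 25, 10, 0), 850_000),
    (datetime(2025, 11, 25, 11, 30), 850_300),
    (datetime(2025, 11, 25, 13, 0), 850_750),
    (datetime(2025, 11, 25, 14, 15), 851_000),
]

def start_offset_for_time(target: datetime) -> int:
    """Last indexed offset whose timestamp is <= target; the log is then
    scanned forward from there for the first message at or after target."""
    timestamps = [ts for ts, _ in time_index]
    i = bisect.bisect_right(timestamps, target) - 1
    return time_index[max(i, 0)][1]  # clamp: before the first entry, start there

print(start_offset_for_time(datetime(2025, 11, 25, 13, 0)))  # -> 850750
```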
🧠 The Memory Advantage
Why are index lookups so blazing fast?
Disk Storage (Slow):
├── segment.log       (1GB - rarely loaded fully)
├── segment.index     (10MB - loaded into RAM!)
└── segment.timeindex (10MB - loaded into RAM!)
Memory (Lightning Fast):
└── Index files cached → Microsecond lookups
Because index files are tiny (10-20MB), they fit entirely in memory:
- ✅ No disk reads for lookups
- ✅ Binary search through in-memory structures
- ✅ Microseconds instead of seconds
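To make the memory-mapping point concrete, here is a sketch that maps a real `.index` file and prints its entries. It assumes the 8-byte entry layout Kafka's offset index uses (a 4-byte offset relative to the base offset plus a 4-byte file position, big-endian), and the path below is an example to replace with a file from your own broker:

```python
# Sketch: memory-map a segment's offset index and decode its sparse entries.
import mmap
import struct

INDEX_PATH = "/data/viewer-activity-0/00000000000000850000.index"  # assumption
BASE_OFFSET = 850_000
ENTRY_SIZE = 8  # 4-byte relative offset + 4-byte byte position

def dump_index_entries(path: str, base_offset: int) -> None:
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for pos in range(0, len(mm) - len(mm) % ENTRY_SIZE, ENTRY_SIZE):
            rel_offset, byte_pos = struct.unpack(">ii", mm[pos:pos + ENTRY_SIZE])
            if rel_offset == 0 and byte_pos == 0 and pos > 0:
                break  # preallocated, still-unused tail of the index file
            print(f"offset {base_offset + rel_offset} -> byte {byte_pos}")

dump_index_entries(INDEX_PATH, BASE_OFFSET)
```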
🗑️ Retention Policies: Managing Disk Space
Without cleanup, your disks would eventually overflow. Kafka automatically deletes old segments based on policies.
1. Time-Based Retention
Configuration:
log.retention.hours = 168 (7 days)
Timeline:
Day 1 β Segment 0 created (Nov 18 viewer data)
Day 2 β Segment 1 created (Nov 19 viewer data)
...
Day 8 β Segment 0 is now 7 days old
→ Kafka deletes:
- segment-0.log
- segment-0.index
- segment-0.timeindex
Visual:
Before Cleanup (Day 8):
[Seg0: Nov 18] [Seg1: Nov 19] [Seg2: Nov 20] ... [Seg7: Nov 25]
  ↑ 7 days old - DELETE
After Cleanup:
[Seg1: Nov 19] [Seg2: Nov 20] ... [Seg7: Nov 25]
2. Size-Based Retention
Configuration:
log.retention.bytes = 107374182400 (100GB per partition)
Current State:
Partition total size = 98GB ✅ OK
New Segment Added:
Partition total size = 103GB ❌ EXCEEDS LIMIT
Action:
→ Kafka deletes oldest segment
→ New total = 102GB ✅
3. Combined Policies
You can use both simultaneously; Kafka deletes data when either condition is met:
log.retention.hours = 168 (7 days)
log.retention.bytes = 100GB
Segment deleted if:
• Age > 7 days, OR
• Total size > 100GB
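A compact way to picture the combined policy is a sweep over the segment list, oldest first. This is a simplified model, not broker code (a real broker compares against record timestamps and never deletes the active segment), with constants matching the example configuration above:

```python
# Illustrative retention sweep: mark segments for deletion when they are past
# the age limit OR while the partition still exceeds the size limit.
import time

RETENTION_MS = 168 * 60 * 60 * 1000   # log.retention.hours = 168 (7 days)
RETENTION_BYTES = 100 * 1024**3       # log.retention.bytes = 100GB

def segments_to_delete(segments: list[dict], now_ms: int | None = None) -> list[int]:
    """segments: oldest-first list of {'base_offset', 'size_bytes', 'last_modified_ms'}.
    Returns the base offsets of segments that should be removed."""
    now_ms = now_ms if now_ms is not None else int(time.time() * 1000)
    total = sum(s["size_bytes"] for s in segments)
    doomed = []
    for seg in segments[:-1]:  # never delete the active (newest) segment
        too_old = now_ms - seg["last_modified_ms"] > RETENTION_MS
        too_big = total > RETENTION_BYTES
        if too_old or too_big:
            doomed.append(seg["base_offset"])
            total -= seg["size_bytes"]
    return doomed
```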
💨 Why Deletion is Lightning Fast
Segment-Level Operations
For retention, Kafka never deletes individual messages. It operates on entire segment files.
❌ Inefficient (What Kafka DOESN'T do):
Read file → Skip old messages → Write remaining → Rebuild indexes
(Hours of processing)
✅ Efficient (What Kafka DOES):
Delete segment-000.log
Delete segment-000.index
Delete segment-000.timeindex
(Milliseconds)
Benefits:
- 🚀 One filesystem operation
- 🚀 No data rewriting
- 🚀 No index rebuilding
- 🚀 Extremely low CPU usage
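The deletion itself really is just a handful of file removals. A sketch complementing the retention sweep above; the path layout is an assumption, and a real broker first renames the files with a `.deleted` suffix and removes them asynchronously:

```python
# Dropping a whole segment is three file unlinks, which is why it is so cheap.
from pathlib import Path

def delete_segment(partition_dir: str, base_offset: int) -> None:
    stem = f"{base_offset:020d}"
    for suffix in (".log", ".index", ".timeindex"):
        path = Path(partition_dir) / f"{stem}{suffix}"
        path.unlink(missing_ok=True)  # one cheap filesystem operation per file

# Example: delete_segment("/data/viewer-activity-0", 0)
```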
🎬 Real-World Message Journey
Let's trace a viewer event through the entire system:
Step 1: Producer Sends Event
Producer → Kafka Broker
Message: {
user_id: "viewer-789",
action: "started_watching",
show_id: "breaking-code-s1e3",
timestamp: 2025-11-25 14:45:00
}
Destination:
Topic: viewer-activity
Partition: 2 (based on user_id hash)
Offset: 1,700,082 (next available)
Step 2: Write to Active Segment
Active Segment: 00000000000001700000.log
Position: Append at byte 245,760
Step 3: Update Indexes
Offset Index:
1,700,082 → byte 245,760
Time Index:
2025-11-25 14:45:00 → offset 1,700,082
Step 4: Acknowledge Producer
Broker → Producer:
"✅ Message written at offset 1,700,082"
Step 5: Consumer Reads Message
Consumer Request: "Give me messages from offset 1,700,082"
Kafka Process:
1. Which segment? → 00000000000001700000.log
2. Load offset index into memory
3. Find: offset 1,700,082 β byte 245,760
4. Read from disk starting at byte 245,760
5. Return message to consumer
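The same journey, driven from client code: the sketch below uses the third-party kafka-python library and assumes a broker at `localhost:9092` with an existing `viewer-activity` topic; both are assumptions to adapt to your environment.

```python
# Hedged end-to-end sketch with kafka-python: produce one event, then read it
# back from exactly the offset the broker acknowledged.
import json
from kafka import KafkaProducer, KafkaConsumer, TopicPartition

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode(),
    value_serializer=lambda v: json.dumps(v).encode(),
)

# Steps 1-4: send the event; the broker appends it and acknowledges the offset.
event = {"user_id": "viewer-789", "action": "started_watching",
         "show_id": "breaking-code-s1e3", "timestamp": "2025-11-25 14:45:00"}
metadata = producer.send("viewer-activity", key="viewer-789", value=event).get(timeout=10)
print(f"written to partition {metadata.partition} at offset {metadata.offset}")
producer.flush()

# Step 5: read back starting from that exact offset.
consumer = KafkaConsumer(bootstrap_servers="localhost:9092", enable_auto_commit=False)
tp = TopicPartition("viewer-activity", metadata.partition)
consumer.assign([tp])
consumer.seek(tp, metadata.offset)
for batch in consumer.poll(timeout_ms=5000).values():
    for record in batch:
        print(record.offset, json.loads(record.value))
consumer.close()
```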
🧹 Log Compaction: Latest State Only
The Use Case
Scenario: Storing user profile updates
User ID: viewer-789
Messages Over Time:
1. viewer-789 → {email: "old@email.com", plan: "basic"}
2. viewer-789 → {email: "new@email.com", plan: "basic"}
3. viewer-789 → {email: "new@email.com", plan: "premium"}
Question: Do we need messages 1 and 2?
Answer: NO! Only the latest state matters.
How Compaction Works
Configuration:
log.cleanup.policy = compact
Result:
Kafka guarantees: For each key, retain at least the last known value
Background Process:
1. Log cleaner reads old segments
2. Identifies duplicate keys
3. Keeps only latest message per key
4. Creates cleaned segments
5. Replaces old segments
💀 Tombstones: Deleting Keys
To remove a key entirely:
Send message: {key: "viewer-789", value: null}
Effect:
→ The log cleaner removes the key's earlier values during compaction
→ The tombstone itself is retained for a configurable grace period (delete.retention.ms)
→ After that, the tombstone is removed and the key disappears from the log entirely
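Conceptually, compaction plus tombstones boils down to "keep the newest value per key, drop keys whose newest value is null". A toy model of the outcome (not the log cleaner's real algorithm, and ignoring the delete.retention.ms grace period):

```python
# Toy model of what compaction achieves: replay the log oldest-to-newest,
# keep only the newest value per key, drop tombstoned keys (value=None).
log = [
    ("viewer-789", {"email": "old@email.com", "plan": "basic"}),
    ("viewer-789", {"email": "new@email.com", "plan": "basic"}),
    ("viewer-123", {"email": "someone@email.com", "plan": "basic"}),
    ("viewer-789", {"email": "new@email.com", "plan": "premium"}),
    ("viewer-123", None),  # tombstone: delete viewer-123 entirely
]

def compact(records):
    latest = {}
    for key, value in records:          # later records overwrite earlier ones
        latest[key] = value
    return {k: v for k, v in latest.items() if v is not None}  # drop tombstones

print(compact(log))
# {'viewer-789': {'email': 'new@email.com', 'plan': 'premium'}}
```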
Compaction vs. Retention Comparison
| Aspect | Retention (Delete) | Compaction |
|---|---|---|
| What's Kept | Time/size window of all messages | Latest value per key |
| What's Deleted | Old segments entirely | Old values for the same key |
| Use Case | Time-series events, logs | User profiles, state management |
| Configuration | log.cleanup.policy=delete | log.cleanup.policy=compact |
| History | Limited time/size window | No history, only latest |
🚀 Performance Benefits Summary
Why Kafka is Blazing Fast
- Sequential Writes → always append to the end of the log (disk-friendly)
- Memory-Mapped Indexes → entire index files live in RAM
- Zero-Copy Transfer → data goes directly from disk to the network socket
- Batch Processing → multiple messages written together
- Segment-Level Operations → no expensive per-message work
📊 Real-World Numbers
A single Kafka broker can handle:
- ✅ Millions of small, batched messages per second
- ✅ Hundreds of MB/s of throughput
- ✅ Latencies in the low single-digit milliseconds for typical workloads
- ✅ All while maintaining durability