A consumer needs an offset (a bookmark) to know where to start reading from a partition. Normally:
Kafka stores each group's last committed offset per partition in the internal `__consumer_offsets` topic. When you restart a consumer, it resumes from that committed offset.
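To make the "bookmark" idea concrete, here is a toy in-memory stand-in for `__consumer_offsets`. It is a simplified sketch, not how the broker stores data: the dict, `commit` and `resume_position` are hypothetical names. It does follow two real conventions: committed offsets are tracked per (group, topic, partition), and the stored value is the offset of the *next* record to read.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for __consumer_offsets:
# (group, topic, partition) -> offset of the next record the group should read
committed_offsets: Dict[Tuple[str, str, int], int] = {}


def commit(group: str, topic: str, partition: int, next_offset: int) -> None:
    """Record the group's bookmark for one partition."""
    committed_offsets[(group, topic, partition)] = next_offset


def resume_position(group: str, topic: str, partition: int) -> Optional[int]:
    """Where a restarted consumer would resume, or None if nothing was committed."""
    return committed_offsets.get((group, topic, partition))


# The "billing" group processed records 0..41 on orders-0, so it commits 42
commit("billing", "orders", 0, 42)
print(resume_position("billing", "orders", 0))    # 42
print(resume_position("analytics", "orders", 0))  # None: this group never committed
```

The `None` case for the second group is exactly the situation the rest of this post is about.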
But… what if there is no valid offset for a partition?
That’s where `auto.offset.reset` kicks in.
🚨 When Does “no valid offset” Happen?
- New consumer group (first time this group subscribes to a topic → no committed offsets exist yet).
- Offsets got deleted (Kafka expires committed offsets after a retention period, configured on the broker with `offsets.retention.minutes`).
- Offset is out of range (it points to data that log retention has already deleted, so the partition no longer contains that offset).
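The three cases above boil down to one check, sketched below under simplifying assumptions. `needs_reset` is a hypothetical helper, not a Kafka API; `log_start` stands for the oldest offset the partition still retains and `log_end` for the offset the next produced record will get. A committed offset equal to `log_end` is still valid: it simply means "nothing new to read yet".

```python
from typing import Optional


def needs_reset(committed: Optional[int], log_start: int, log_end: int) -> bool:
    """Return True when a partition has no usable committed offset (hypothetical helper)."""
    # New group, or the committed offset expired and was removed
    if committed is None:
        return True
    # Offset points at data already deleted by log retention (or past the end)
    return not (log_start <= committed <= log_end)


print(needs_reset(None, 0, 500))   # True: new group, nothing committed
print(needs_reset(120, 300, 500))  # True: retention moved the log start to 300
print(needs_reset(420, 300, 500))  # False: offset 420 is still retained
```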
⚙️ auto.offset.reset Options
1. earliest
- Start reading from the beginning of the log (the smallest offset still available, since retention may already have deleted older data).
- Consumer will replay all historical data that is still retained.
- Good for batch jobs, data pipelines, or when you really want everything (e.g., reindexing a search database).
2. latest
- Start reading from the end of the log (the log end offset, i.e. the offset the next produced message will get).
- Consumer ignores past data → only gets messages produced after its position was reset.
- Good for real-time dashboards or monitoring, where you don’t care about history.
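The decision the consumer makes can be modeled as a small function. This is a simplified sketch, not the client's actual code: `resolve_start_offset` is a hypothetical name, and `log_start`/`log_end` are assumed to be the partition's oldest retained offset and next-to-be-written offset. Note that a valid committed offset always wins; the reset policy is consulted only when there is no valid bookmark. Kafka also accepts a third value, `none`, which makes the consumer throw an exception instead of picking a position; the final `raise` below plays that role for any value other than the two options above.

```python
from typing import Optional


def resolve_start_offset(committed: Optional[int], log_start: int, log_end: int,
                         auto_offset_reset: str) -> int:
    """Pick where a consumer starts on one partition (simplified model)."""
    # A valid committed offset always wins; the reset policy is never consulted
    if committed is not None and log_start <= committed <= log_end:
        return committed
    if auto_offset_reset == "earliest":
        return log_start  # replay everything still retained
    if auto_offset_reset == "latest":
        return log_end    # skip history, wait for new records
    # No valid offset and no usable policy: fail instead of guessing
    raise ValueError(f"no valid offset and no reset policy: {auto_offset_reset!r}")


# Partition retains offsets 300..499; the next produced record gets offset 500
print(resolve_start_offset(None, 300, 500, "earliest"))  # 300
print(resolve_start_offset(None, 300, 500, "latest"))    # 500
print(resolve_start_offset(420, 300, 500, "latest"))     # 420: committed offset wins
```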
📌 Why is this Important?
- If you set it to earliest without meaning to, a new consumer group can replay millions of messages you didn’t need.
- Conversely, if you forget the setting, the default (latest) makes a new group skip everything already in the topic, so a system that needs history silently misses data.
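One way to avoid both mistakes is to make the choice explicit wherever consumer configs are built. The sketch below assumes the confluent-kafka Python client, whose config keys include `bootstrap.servers`, `group.id` and `auto.offset.reset`; the helper `build_consumer_config`, the group names and the broker address are hypothetical placeholders.

```python
def build_consumer_config(group_id: str, needs_history: bool) -> dict:
    """Hypothetical helper: force every caller to decide on the reset policy."""
    return {
        "bootstrap.servers": "localhost:9092",  # assumed broker address
        "group.id": group_id,
        # Explicit choice instead of silently inheriting the client default
        "auto.offset.reset": "earliest" if needs_history else "latest",
    }


reindex_conf = build_consumer_config("search-reindexer", needs_history=True)
dashboard_conf = build_consumer_config("live-dashboard", needs_history=False)
print(reindex_conf["auto.offset.reset"])    # earliest
print(dashboard_conf["auto.offset.reset"])  # latest
```

With confluent-kafka installed, either dict can be passed to `confluent_kafka.Consumer(...)`. Remember the policy only applies when the group has no valid committed offset; an existing group keeps resuming from its bookmark.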