Hitesh

Posted on Jan 9

Kafka interview questions for Data Engineer

#kafka #data #python #webdev

Here are Kafka interview questions tailored for AI / Backend / Data Engineer (0–3 yrs) roles, with clear, crisp answers you can revise quickly.
I’ve grouped them by difficulty & topic, exactly how interviews flow.

1️⃣ Kafka Basics (Must-Know)

Q1. What is Kafka?

Answer:
Kafka is a distributed event-streaming platform used for publishing, storing, and consuming streams of records in real time. It is designed for high throughput, fault tolerance, and scalability.

Q2. Kafka vs traditional message queue?

Answer:

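| Kafka | Traditional message queue |
| --- | --- |
| Pull-based (consumers poll) | Often push-based |
| Persistent log, kept until retention expires | Messages usually deleted after consumption |
| Many consumer groups can read the same data | Each message usually goes to a single consumer |
| Replay supported | Replay difficult |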

Q3. What is a topic?

Answer:
A topic is a logical stream of messages where data is written and read.

Q4. What is a partition?

Answer:
A partition is a unit of parallelism in Kafka. Each partition is an ordered, immutable log.

Q5. Why do partitions matter?

Answer:
They enable:

Parallel processing
Horizontal scalability
Ordered processing per partition

2️⃣ Producers & Consumers

Q6. What is a Kafka producer?

Answer:
A producer publishes records to Kafka topics.

Q7. How does Kafka decide which partition to write to?

Answer:

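Key provided → hash(key) % number of partitions (the default partitioner uses a murmur2 hash of the key)
No key → records are spread across partitions: round-robin in older clients, sticky batching (one partition per batch) since Kafka 2.4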
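
The key-based rule above can be sketched in a few lines. This is a simplified illustration, not Kafka's actual partitioner: the Java client hashes the serialized key with murmur2, while this sketch uses `zlib.crc32` as a stand-in stable hash. The function name `choose_partition` is hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition (stand-in for Kafka's murmur2-based rule)."""
    # crc32 is deterministic across runs, unlike Python's built-in hash() on bytes/str.
    return zlib.crc32(key) % num_partitions


# The same key always lands on the same partition, which preserves per-key order.
p1 = choose_partition(b"user-42", 6)
p2 = choose_partition(b"user-42", 6)
```

Note that the mapping depends on the partition count, so adding partitions to a topic can send an existing key to a different partition.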

Q8. What is a Kafka consumer?

Answer:
A consumer reads records from topics.

Q9. What is a consumer group?

Answer:
A consumer group is a set of consumers that share the load of reading partitions.

👉 Within a group, each partition is read by exactly one consumer; one consumer can own several partitions.

Q10. What happens if consumers > partitions?

Answer:
Extra consumers stay idle.
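
A toy assignment makes the idle-consumer case easy to see. This is a simplified round-robin model under the assumption of a single topic; Kafka's real assignors (range, round-robin, sticky, cooperative sticky) are more involved. The function name `assign_round_robin` is hypothetical.

```python
def assign_round_robin(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Deal partitions out to consumers one at a time, like cards."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


# 3 partitions, 4 consumers in the same group: one consumer gets nothing.
assignment = assign_round_robin([0, 1, 2], ["c1", "c2", "c3", "c4"])
idle = [c for c, parts in assignment.items() if not parts]
```

The idle consumer is not wasted entirely: it acts as a hot standby and picks up partitions after a rebalance if another consumer leaves the group.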

3️⃣ Offsets & Delivery Semantics

Q11. What is an offset?

Answer:
An offset is a unique position of a record in a partition.
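
One way to picture offsets is as list indexes into an append-only log. The class below is a minimal in-memory sketch under that assumption (the name `PartitionLog` is hypothetical); a real partition is stored as segment files on disk.

```python
class PartitionLog:
    """Append-only log: each record's offset is its position in the log."""

    def __init__(self) -> None:
        self._records: list[str] = []

    def append(self, value: str) -> int:
        """Store a record and return the offset it was assigned."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int) -> list[str]:
        """Return every record from the given offset onward."""
        return self._records[offset:]


log = PartitionLog()
first = log.append("order-created")
second = log.append("order-paid")
```

Because reading never removes anything, two consumers can read the same log from different offsets without affecting each other.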

Q12. How does Kafka track offsets?

Answer:
Consumers commit their offsets (per group and partition) to Kafka’s internal topic: __consumer_offsets.

Q13. At-least-once vs At-most-once?

Answer:

At-least-once: no data loss, duplicates possible
At-most-once: no duplicates, data loss possible
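
The difference comes down to whether the consumer commits the offset before or after processing. The simulation below is a sketch with a hypothetical `consume` function and an injected crash; it does not use a real Kafka client.

```python
def consume(records, start_offset, commit_first, crash_at=None):
    """Process records from start_offset; return (processed, committed_offset).

    crash_at is the offset at which the consumer dies between its two steps.
    """
    processed = []
    committed = start_offset
    for offset in range(start_offset, len(records)):
        if commit_first:
            committed = offset + 1            # at-most-once: commit, then process
            if offset == crash_at:
                break                         # crash before processing: record is lost
            processed.append(records[offset])
        else:
            processed.append(records[offset])  # at-least-once: process, then commit
            if offset == crash_at:
                break                         # crash before commit: record is redelivered
            committed = offset + 1
    return processed, committed


records = ["r0", "r1", "r2"]

# Crash at offset 1, then restart from the last committed offset.
done, committed = consume(records, 0, commit_first=True, crash_at=1)
at_most_once = done + consume(records, committed, commit_first=True)[0]

done, committed = consume(records, 0, commit_first=False, crash_at=1)
at_least_once = done + consume(records, committed, commit_first=False)[0]
```

With commit-first, `r1` is never processed; with process-first, `r1` is processed twice. That is why at-least-once pipelines usually pair with idempotent processing.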

Q14. Does Kafka support exactly-once?

Answer:
Yes, using:

Idempotent producers
Transactions

4️⃣ Fault Tolerance & Replication

Q15. What is a broker?

Answer:
A broker is a Kafka server that stores and serves data.

Q16. What is replication factor?

Answer:
Number of copies of a partition across brokers.

Q17. What are the leader and follower?

Answer:

Leader: handles all writes and, by default, all reads for the partition
Follower: replicates leader data

Q18. What happens if the leader fails?

Answer:
The controller automatically elects a new leader from the in-sync replicas (ISR), and clients redirect their requests to it.

5️⃣ Performance & Reliability

Q19. Why is Kafka fast?

Answer:

Sequential disk writes
Zero-copy transfer
Batching
Page cache usage

Q20. What is ISR?

Answer:
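ISR (In-Sync Replicas) is the set of replicas, including the leader, that are caught up with the leader. A follower that falls behind for longer than replica.lag.time.max.ms is removed from the ISR.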

Q21. What is `acks` in the producer?

Answer:

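acks=0 → producer does not wait for any acknowledgment (fastest, messages can be lost)
acks=1 → leader has written the record (can be lost if the leader fails before followers replicate it)
acks=all → leader and all in-sync replicas have the record (strongest guarantee; usually paired with min.insync.replicas)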
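
To make the three settings concrete, here is a small decision function. It is a sketch of the semantics only, with a hypothetical name (`write_acknowledged`) and hypothetical boolean inputs; it is not how the broker is implemented.

```python
def write_acknowledged(acks: str, leader_has_record: bool,
                       isr_followers_have_record: bool) -> bool:
    """Return True if the producer would treat the send as acknowledged right now."""
    if acks == "0":
        return True                      # fire and forget: no broker response awaited
    if acks == "1":
        return leader_has_record         # leader's write is enough
    if acks == "all":
        # every in-sync replica must have the record before the ack is sent
        return leader_has_record and isr_followers_have_record
    raise ValueError(f"unsupported acks value: {acks!r}")


# Moment in time: the leader has the record, followers have not replicated it yet.
results = {
    acks: write_acknowledged(acks, leader_has_record=True, isr_followers_have_record=False)
    for acks in ("0", "1", "all")
}
```

At that moment a producer using acks=1 already sees success, while one using acks=all is still waiting. If the leader died right then, only the acks=all producer would know the record was at risk.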

6️⃣ Real-World Scenarios (Very Important)

Q22. How do you ensure message ordering?

Answer:
Use the same key so messages go to the same partition.

Q23. How to handle duplicate messages?

Answer:

Idempotent consumers
Deduplication using unique IDs
Exactly-once semantics
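
Deduplication by unique ID can be sketched like this. It assumes every message carries an `id` field and keeps the seen IDs in an in-memory set; in production that set would need to live in a durable store (a database or cache) so it survives restarts. The function name `process_once` is hypothetical.

```python
def process_once(messages, seen_ids, handler):
    """Apply handler to each message at most once, keyed by message ID."""
    for msg in messages:
        if msg["id"] in seen_ids:
            continue                 # duplicate delivery: skip it
        handler(msg)
        seen_ids.add(msg["id"])      # remember the ID only after handling succeeds


applied = []
seen = set()
batch = [
    {"id": "a1", "amount": 10},
    {"id": "a2", "amount": 5},
    {"id": "a1", "amount": 10},      # redelivered duplicate
]
process_once(batch, seen, applied.append)
```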

Q24. How to reprocess old data?

Answer:
Reset consumer offsets to an earlier value.
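
The effect of a reset can be shown with a toy model where the log is a list and the group's committed offset is a number. The names here (`poll`, `reset_offset`, the `analytics` group) are hypothetical. On a real cluster the same thing is usually done with the `kafka-consumer-groups.sh --reset-offsets` tool while the group is stopped, or with the consumer's `seek` call.

```python
log = ["e0", "e1", "e2", "e3"]
committed = {"analytics": 4}         # the group has already consumed everything


def poll(group: str) -> list[str]:
    """Return records after the group's committed offset, then commit the new position."""
    batch = log[committed[group]:]
    committed[group] = len(log)
    return batch


def reset_offset(group: str, offset: int) -> None:
    """Move the group's committed offset back so older records are read again."""
    committed[group] = offset


nothing_new = poll("analytics")      # fully caught up, so nothing comes back
reset_offset("analytics", 1)
replayed = poll("analytics")         # records from offset 1 onward are delivered again
```

Replay only works for data that is still within the topic's retention window.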

Q25. Kafka vs RabbitMQ?

Answer:

Kafka → high throughput, replay, streaming
RabbitMQ → low latency, task queues

7️⃣ Kafka + Data Engineering / AI

Q26. Kafka in ETL pipelines?

Answer:
Kafka acts as a buffer and ingestion layer between producers and downstream ETL systems.

Q27. Kafka with Spark / Flink?

Answer:
Kafka provides real-time data streams; Spark/Flink process them.

Q28. Kafka for ML pipelines?

Answer:
Used for:

Real-time feature ingestion
Streaming inference
Online model updates

8️⃣ Configuration & Monitoring

Q29. How do you monitor Kafka?

Answer:

Lag
Throughput
Broker health
Consumer offsets

Tools:

Prometheus + Grafana
CloudWatch (MSK)

Q30. What is consumer lag?

Answer:
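The difference, per partition, between the latest offset in the log (log end offset) and the consumer group’s committed offset. Growing lag means consumers are falling behind producers.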
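
As a quick worked example, lag is a subtraction per partition. The offsets below are made-up numbers and `consumer_lag` is a hypothetical helper, not a Kafka API.

```python
def consumer_lag(log_end_offsets: dict[int, int],
                 committed_offsets: dict[int, int]) -> dict[int, int]:
    """Lag per partition = log end offset minus the group's committed offset."""
    return {p: log_end_offsets[p] - committed_offsets[p] for p in log_end_offsets}


lag = consumer_lag(
    log_end_offsets={0: 120, 1: 95},
    committed_offsets={0: 100, 1: 95},
)
total_lag = sum(lag.values())
```

Partition 0 is 20 records behind and partition 1 is fully caught up, so alerting on per-partition lag catches a stuck partition that a total might hide.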

9️⃣ Advanced (Bonus)

Q31. What is log compaction?

Answer:
With cleanup.policy=compact, Kafka keeps at least the latest record for each key and removes older records with the same key in the background.
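
The end result of compaction can be sketched as "last write wins per key". This toy version runs over an in-memory list of `(key, value)` pairs; real compaction runs in the background on closed log segments, keeps the original offsets, and also handles tombstones (null values), none of which is modeled here. The function name `compact` is hypothetical.

```python
def compact(log: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the most recent record for each key, in original log order."""
    latest: dict[str, tuple[int, str]] = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)        # later records overwrite earlier ones
    survivors = sorted(latest.items(), key=lambda item: item[1][0])
    return [(key, value) for key, (_offset, value) in survivors]


log = [("user1", "a"), ("user2", "b"), ("user1", "c")]
compacted = compact(log)
```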

Q32. What is retention policy?

Answer:

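Time-based: delete log segments older than retention.ms
Size-based: delete the oldest segments once a partition grows past retention.bytes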
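
Both policies can be sketched as a filter over log segments. This is a simplified model: the segment dictionaries, their field names, and `apply_retention` are hypothetical, and the only Kafka behavior it assumes is that retention removes whole segments and never the newest (active) one.

```python
def apply_retention(segments, now_ms, retention_ms=None, retention_bytes=None):
    """Return the segments that survive retention. segments is ordered oldest first."""
    if not segments:
        return []
    *closed, active = segments               # the newest segment is never deleted
    if retention_ms is not None:
        # time-based: drop closed segments whose newest record is too old
        closed = [s for s in closed if now_ms - s["last_ts_ms"] <= retention_ms]
    kept = closed + [active]
    if retention_bytes is not None:
        # size-based: drop oldest segments until the partition fits the limit
        while len(kept) > 1 and sum(s["size_bytes"] for s in kept) > retention_bytes:
            kept.pop(0)
    return kept


segments = [
    {"last_ts_ms": 1_000, "size_bytes": 100},
    {"last_ts_ms": 5_000, "size_bytes": 100},
    {"last_ts_ms": 9_000, "size_bytes": 100},
]
by_time = apply_retention(segments, now_ms=10_000, retention_ms=6_000)
by_size = apply_retention(segments, now_ms=10_000, retention_bytes=150)
```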

Q33. Schema Registry?

Answer:
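A separate service (for example Confluent Schema Registry) that stores message schemas (Avro, Protobuf, JSON Schema) and enforces compatibility rules so schemas can evolve without breaking consumers.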

🔥 5 One-Line Interview Killers

Memorize these:

“Kafka is a distributed commit log.”
“Partitions give scalability; keys give ordering.”
“Offsets enable replayability.”
“Consumer groups provide horizontal scaling.”
“Exactly-once requires idempotent producers and transactions.”

🎯 How to Answer Like a Pro

When stuck, say:

“In production, the choice depends on throughput, ordering, and replay requirements.”
