Here are 10 commonly asked interview questions on Apache Kafka:
1. What is Apache Kafka, and what are its key components?
- Answer: Kafka is a distributed event-streaming platform used for building real-time data pipelines and applications.
- Key components:
- Producer: Sends data to topics.
- Consumer: Reads data from topics.
- Broker: Kafka server storing messages.
- Topic: Categories where records are sent.
- ZooKeeper/Controller: Manages the Kafka cluster metadata (in newer versions ZooKeeper is replaced by KRaft, Kafka's built-in Raft-based controller quorum).
2. How does Kafka ensure message durability and fault tolerance?
- Answer: Kafka achieves durability and fault tolerance through:
- Replication: Topics are divided into partitions, and each partition is replicated across multiple brokers.
- Acknowledgments: Producers can request acknowledgments (acks set to 0, 1, or all) to control how many replicas must confirm a write before it counts as successful.
- Commit Log: Messages are appended to a log on disk, making them recoverable even after crashes.
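To make the interaction between replication and acknowledgments concrete, here is a minimal sketch of the arithmetic. It assumes the producer uses acks=all and that the topic sets min.insync.replicas; the function name and the returned keys are made up for illustration and are not part of any Kafka API.

```python
def durability_summary(replication_factor: int, min_insync_replicas: int) -> dict:
    """Summarize what acks=all guarantees for one partition.

    Assumes acks=all, so the leader acknowledges a write only once at
    least `min_insync_replicas` replicas (leader included) hold it.
    """
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min_insync_replicas <= replication_factor")
    return {
        # Every acknowledged record exists on at least this many brokers.
        "copies_of_acked_record": min_insync_replicas,
        # Replicas that can be offline while acks=all writes still succeed.
        "failures_tolerated_for_writes": replication_factor - min_insync_replicas,
    }


summary = durability_summary(replication_factor=3, min_insync_replicas=2)
print(summary)
```

With a replication factor of 3 and min.insync.replicas=2, one broker can be down while writes continue, and every acknowledged record lives on at least two brokers.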
3. What is a Kafka partition, and how does it work?
- Answer:
- Partitions are the basic unit of parallelism in Kafka.
- Each partition is an ordered sequence of records, and a topic can have one or more partitions.
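- Records with a key are assigned to a partition by hashing the key, so the same key lands in the same partition as long as the partition count does not change; records without a key are spread across partitions (round-robin in older clients, sticky batching by default since Kafka 2.4).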
- Consumers read data in order within a partition.
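Key-based partitioning can be sketched in a few lines. This is an illustration only: Kafka's Java client hashes keys with murmur2, while the sketch swaps in zlib.crc32 because it ships with Python and is deterministic. The function name and the sample records are hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    zlib.crc32 stands in for the murmur2 hash used by Kafka's Java client.
    """
    return zlib.crc32(key) % num_partitions


records = [(b"user-1", "login"), (b"user-2", "login"), (b"user-1", "logout")]
partitions = {p: [] for p in range(3)}
for key, value in records:
    # Appending preserves arrival order inside each partition.
    partitions[choose_partition(key, 3)].append((key, value))

print(partitions)
```

Both user-1 records end up in the same partition in the order they were sent, which is why per-key ordering holds even though there is no ordering across partitions.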
4. How does Kafka handle data retention?
- Answer:
- Kafka retains data for a configured period (log.retention.ms) or until a storage size limit (log.retention.bytes, applied per partition) is reached.
- Old log segments are deleted once they exceed these limits.
- The cleanup policy can also be set to compact to keep only the latest value for each key.
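The two cleanup policies can be contrasted with a small in-memory model. This is a simplified sketch: a real broker deletes whole log segments rather than individual records, and compaction runs in the background. The Record type and both function names are assumptions made for this example.

```python
from typing import NamedTuple


class Record(NamedTuple):
    offset: int
    timestamp_ms: int
    key: str
    value: str


def apply_time_retention(log, now_ms, retention_ms):
    """Drop records older than retention_ms (the delete policy, simplified)."""
    return [r for r in log if now_ms - r.timestamp_ms <= retention_ms]


def compact(log):
    """Keep only the newest record per key (the compact policy, simplified)."""
    latest = {}
    for r in log:
        latest[r.key] = r  # later offsets overwrite earlier ones
    return sorted(latest.values(), key=lambda r: r.offset)


log = [
    Record(0, 1_000, "a", "1"),
    Record(1, 2_000, "b", "1"),
    Record(2, 9_000, "a", "2"),
]

kept_by_time = apply_time_retention(log, now_ms=10_000, retention_ms=5_000)
kept_by_key = compact(log)
print([r.offset for r in kept_by_time])  # [2]
print([r.offset for r in kept_by_key])   # [1, 2]
```

Time-based retention discards anything too old regardless of key, while compaction keeps the last known value of every key no matter how old it is.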
5. What is the role of ZooKeeper in Kafka?
- Answer:
- ZooKeeper manages Kafka metadata, including:
- Tracking brokers in the cluster.
- Electing the controller broker, which in turn elects partition leaders.
- Storing configurations for topics and ACLs.
- KRaft (Kafka Raft) was introduced in Kafka 2.8 as an early-access replacement for ZooKeeper-based metadata management and became production-ready in 3.3; Kafka 4.0 removed ZooKeeper support entirely.
6. How is Kafka different from traditional message queues like RabbitMQ?
- Answer:
- Kafka is a distributed, partitioned commit log that focuses on scalability, durability, and high throughput.
- Unlike a traditional RabbitMQ queue, Kafka:
- Retains messages for a configurable period, even after they have been consumed.
- Lets each consumer track its own position (offset) and pull records at its own pace.
- Supports event replay from logs.
7. Explain the consumer group mechanism in Kafka.
- Answer:
- Consumers subscribe to topics as part of a consumer group.
- Within a group, each partition is assigned to exactly one consumer, so a topic's partitions are processed in parallel across the group's members.
- If a consumer fails, the group rebalances and its partitions are reassigned to the remaining members.
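The assignment and takeover behavior can be sketched as a plain function. This is a simplified round-robin strategy written for illustration; Kafka ships several assignors (range, round-robin, sticky, cooperative sticky) with more involved logic, and the consumer names here are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin."""
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    if not members:
        return assignment
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


partitions = [0, 1, 2, 3]

before = assign_partitions(partitions, ["c1", "c2", "c3"])
print(before)  # {'c1': [0, 3], 'c2': [1], 'c3': [2]}

# c3 fails; the group rebalances across the remaining members.
after = assign_partitions(partitions, ["c1", "c2"])
print(after)   # {'c1': [0, 2], 'c2': [1, 3]}
```

Every partition has exactly one owner before and after the failure. Adding a fifth consumer to a four-partition topic would leave that consumer with nothing to read.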
8. What is Kafka’s ISR (In-Sync Replica) mechanism?
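- Answer:
- The ISR is the set of replicas of a partition (including the leader) that are caught up with the leader, as bounded by replica.lag.time.max.ms.
- With acks=all, the leader acknowledges a write only after every ISR member has replicated it, and rejects the write if the ISR is smaller than min.insync.replicas.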
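A rough model of ISR membership and the resulting high watermark may help here. It is a sketch under simplifying assumptions: real brokers track follower fetch state internally, and the function names, broker ids, and offsets below are invented for the example.

```python
def in_sync_replicas(last_caught_up_ms, now_ms, max_lag_ms):
    """Return broker ids that caught up to the leader within max_lag_ms."""
    return sorted(
        broker for broker, t in last_caught_up_ms.items() if now_ms - t <= max_lag_ms
    )


def high_watermark(log_end_offsets, isr):
    """Smallest log end offset across the ISR.

    Records with offsets below this value are on every in-sync replica
    and are therefore considered committed.
    """
    return min(log_end_offsets[broker] for broker in isr)


# Broker 1 is the leader; broker 3 stopped fetching a while ago.
last_caught_up_ms = {1: 10_000, 2: 9_500, 3: 2_000}
log_end_offsets = {1: 120, 2: 118, 3: 90}

isr = in_sync_replicas(last_caught_up_ms, now_ms=10_000, max_lag_ms=5_000)
hw = high_watermark(log_end_offsets, isr)
print(isr, hw)  # [1, 2] 118
```

Broker 3 drops out of the ISR because it has fallen too far behind, so it no longer holds back the high watermark. Consumers can read only up to that watermark.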
9. How do you monitor Kafka performance?
- Answer: Common monitoring approaches include:
- Metrics: Kafka exposes JMX metrics, such as throughput, replication lag, and disk usage.
- Tools: Prometheus, Grafana, Confluent Control Center.
- Logs: Analyze broker, producer, and consumer logs for issues.
10. What are the common challenges with Kafka, and how can they be mitigated?
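- Answer:
- Data loss: Configure proper replication and acknowledgment (for example, a replication factor of 3 with min.insync.replicas=2 and acks=all).
- Consumer lag: Monitor lag and scale consumer groups if needed, up to one consumer per partition.
- Partition imbalance: Use partition rebalancing tools such as kafka-reassign-partitions or Cruise Control.
- High latency: Optimize producer/consumer configurations (e.g., batch size, compression).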
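Consumer lag, mentioned above, is simply the gap between what has been written and what the group has committed. The sketch below computes it from two hypothetical offset maps; in practice these numbers would come from a monitoring tool or the kafka-consumer-groups command, and the alert threshold of 100 is an arbitrary value chosen for the example.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: records written but not yet committed by the group."""
    return {p: end - committed_offsets[p] for p, end in log_end_offsets.items()}


log_end_offsets = {0: 1_500, 1: 1_480, 2: 1_510}
committed_offsets = {0: 1_500, 1: 1_200, 2: 1_505}

lag = consumer_lag(log_end_offsets, committed_offsets)
lagging = [p for p, n in lag.items() if n > 100]  # arbitrary alert threshold

print(lag)      # {0: 0, 1: 280, 2: 5}
print(lagging)  # [1]
```

A lag that keeps growing on one partition usually points at a slow or stuck consumer, while lag growing on all partitions suggests the group as a whole needs more capacity.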