DEV Community

Leanid Herasimau

Posted on • Originally published at suddo.io on

Apache Kafka: ZooKeeper vs. KRaft — A Complete Comparison of Approaches

Apache Kafka is one of the most popular distributed data streaming systems. Historically, Kafka used Apache ZooKeeper for cluster management, but version 2.8.0 introduced an alternative, initially as an early-access feature: KRaft (Kafka Raft metadata mode).

What is ZooKeeper and why is it needed in Kafka?

The Role of ZooKeeper in Traditional Kafka

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and offering group services. In traditional Kafka, ZooKeeper is responsible for:

  • Storing cluster metadata
  • Controller coordination
  • Tracking broker liveness
  • Managing ACLs (access control lists)
  • Storing topic and broker configurations
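
For orientation, these responsibilities correspond to well-known znode paths in a ZooKeeper-mode cluster. The snippet below is only an illustrative lookup table in Python. The dictionary and the helper name responsibilities_for are invented for this example and are not part of any Kafka or ZooKeeper API, and the mapping is a simplification (each area uses more znodes than shown).

```python
# Illustrative mapping from each responsibility to znode paths that
# ZooKeeper-mode Kafka uses for it. Simplified: not an exhaustive list.
ZK_RESPONSIBILITIES = {
    "cluster metadata": ["/brokers/topics"],
    "controller coordination": ["/controller", "/controller_epoch"],
    "broker liveness": ["/brokers/ids"],
    "ACLs": ["/kafka-acl"],
    "topic and broker configs": ["/config/topics", "/config/brokers"],
}


def responsibilities_for(path):
    """Return the responsibilities whose znodes contain the given path."""
    return [
        name
        for name, prefixes in ZK_RESPONSIBILITIES.items()
        if any(path == p or path.startswith(p + "/") for p in prefixes)
    ]


# A broker registers itself under /brokers/ids/<id>; when its session
# ends, the znode disappears, which is how liveness is tracked.
print(responsibilities_for("/brokers/ids/1"))
```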

What is KRaft (Kafka Raft Metadata mode)?

KRaft is short for Kafka Raft. It is an architecture in which Kafka manages its own metadata, using the Raft protocol to ensure consistency and fault tolerance. Raft, published in 2014, was designed as an easier-to-understand alternative to the more complex Paxos protocol and has become a standard for many modern distributed systems, including etcd, Consul, and others.

In KRaft mode, Kafka itself takes on the responsibility of managing metadata. Some nodes run as controllers and form a quorum that uses the Raft protocol to agree on metadata changes. Kafka can therefore manage itself without an external dependency, which simplifies the architecture and can improve performance.
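
The core idea of quorum agreement can be sketched in a few lines of Python. This is a toy model, not Kafka's implementation: the Controller class and the replicate function are hypothetical names, and terms, leader election, and log-consistency checks are all left out. In real KRaft the followers also fetch from the leader rather than being pushed to. The only point shown here is that a metadata change counts as committed once a majority of the controllers hold it.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical stand-in for one controller in the quorum."""
    node_id: int
    online: bool = True
    log: list = field(default_factory=list)


def replicate(leader, followers, record):
    """Append a record on the leader, copy it to reachable followers,
    and report whether a majority of the quorum now holds it."""
    leader.log.append(record)
    acks = 1  # the leader counts itself
    for follower in followers:
        if follower.online:
            follower.log.append(record)
            acks += 1
    quorum_size = 1 + len(followers)
    return acks >= quorum_size // 2 + 1


leader = Controller(1)
followers = [Controller(2), Controller(3, online=False)]

# 2 of 3 controllers hold the record, so it is committed.
print(replicate(leader, followers, "create topic orders"))  # True
```

With a second controller offline, the same call returns False: the leader alone is not a majority, so the change cannot be committed.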

The key difference with KRaft is that all metadata is stored in Kafka itself, as an internal replicated log (the __cluster_metadata topic). Brokers follow this log directly instead of depending on an external system. This removes an extra network hop and reduces latency, especially during frequent configuration changes or controller elections.
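
One way to picture "metadata stored in Kafka" is as an ordered log of change records that every node replays to rebuild its local view. The sketch below assumes invented record shapes ("create_topic", "delete_topic") and hypothetical function names. Kafka's real metadata records are different, so treat this as an illustration of the pattern only.

```python
def apply_record(state, record):
    """Apply one metadata change to the in-memory view."""
    if record["type"] == "create_topic":
        state[record["topic"]] = {"partitions": record["partitions"]}
    elif record["type"] == "delete_topic":
        state.pop(record["topic"], None)


def replay(log, state=None, from_offset=0):
    """Replay records from from_offset onward and return the resulting
    state together with the next offset to read."""
    state = {} if state is None else state
    for record in log[from_offset:]:
        apply_record(state, record)
    return state, len(log)


metadata_log = [
    {"type": "create_topic", "topic": "orders", "partitions": 6},
    {"type": "create_topic", "topic": "audit", "partitions": 1},
    {"type": "delete_topic", "topic": "audit"},
]

state, next_offset = replay(metadata_log)
print(state)        # {'orders': {'partitions': 6}}
print(next_offset)  # 3
```

Because the function returns the next offset, a node that falls behind only needs to replay the records it has not seen yet, by passing its current state and from_offset back in.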

Deployment and Management Complexity

The traditional architecture with ZooKeeper requires engineers to manage two separate systems, Kafka and ZooKeeper, which increases operational complexity. During deployment, both must be configured correctly and kept in a consistent state with each other.

In KRaft mode, there is only one system to manage: Kafka, which now handles its own metadata. This simplifies deployment, especially in containerized environments where each additional dependency complicates orchestration.
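
To make this concrete, here is a small Python helper that renders the core KRaft settings for one node. The property keys (process.roles, node.id, controller.quorum.voters, controller.listener.names) are real Kafka settings. The helper name render_kraft_properties, the host names, and the port are assumptions made for this example. The output is a fragment, not a complete server.properties: listeners, security, and storage settings are left out.

```python
def render_kraft_properties(node_id, roles, voters):
    """Render the core KRaft settings for one node.

    voters maps a controller's node id to its "host:port" address.
    """
    voter_list = ",".join(
        f"{vid}@{addr}" for vid, addr in sorted(voters.items())
    )
    return "\n".join([
        f"process.roles={','.join(roles)}",
        f"node.id={node_id}",
        f"controller.quorum.voters={voter_list}",
        "controller.listener.names=CONTROLLER",
    ])


# Hypothetical three-node cluster; host names are placeholders.
voters = {1: "kafka-1:9093", 2: "kafka-2:9093", 3: "kafka-3:9093"}
print(render_kraft_properties(1, ["broker", "controller"], voters))
```

There is no zookeeper.connect setting anywhere in this fragment, which is the point: the nodes only need to know about each other.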

Furthermore, in a ZooKeeper architecture, you need to carefully plan the size of the ZooKeeper cluster. Typically, 3 or 5 nodes are used so that a quorum survives the failure of one or two nodes. This requires additional resources and increases monitoring complexity. In KRaft mode, the same Kafka nodes can act as both brokers and controllers, making the architecture more compact, although the controller quorum is still sized the same way and larger deployments often run dedicated controller nodes.
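
The reason for choosing 3 or 5 nodes comes down to simple majority arithmetic, which applies equally to a ZooKeeper ensemble and a KRaft controller quorum. A minimal sketch (the function names are our own):

```python
def majority(n):
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many nodes can fail while a majority is still reachable."""
    return n - majority(n)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

A fourth node does not buy any extra fault tolerance over three, which is why odd sizes are the usual choice.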

Performance and Latency

One of the key advantages of KRaft is improved performance. In ZooKeeper mode, metadata reads and updates ultimately go through ZooKeeper, which means network calls to a separate system. This can add latency, especially if the ZooKeeper cluster is far from the brokers or if the network is experiencing issues.

In KRaft mode, metadata is stored directly in Kafka, and brokers can access it via internal Kafka protocols. This reduces latency and improves overall system performance. Controller failover is also faster: the process happens entirely within Kafka, and the new active controller already holds a replicated copy of the metadata instead of having to load it from an external system.

Another important aspect is scalability. ZooKeeper has limitations on the number of nodes and load, which can become a bottleneck when scaling a Kafka cluster. KRaft is designed to ease these limitations, as it handles metadata with the same replicated-log infrastructure as Kafka itself.

Reliability and Fault Tolerance

The ZooKeeper architecture has certain risks. ZooKeeper itself can become a single point of failure, especially if the ZooKeeper cluster is small or improperly configured. If ZooKeeper is unavailable, Kafka can continue to operate with known metadata but will not be able to handle configuration changes, controller elections, and other operations that require interaction with ZooKeeper.

KRaft addresses this problem by integrating metadata management into Kafka itself. Metadata is now managed by a replicated controller quorum built on the Raft protocol, the same family of consensus used by other distributed systems. This makes the system more resilient to failures, as there is no longer a separate external service whose outage affects the cluster.

However, it is worth noting that KRaft is a newer technology, and although it has undergone thorough testing, it does not have the same long-term production track record as ZooKeeper. This can be an important factor for mission-critical systems where stability and a proven track record are a priority.

Advantages and Disadvantages of Each Approach

ZooKeeper

One of the most significant advantages of ZooKeeper is its maturity and proven track record. ZooKeeper has been used in production for over 15 years, and during this time, a vast amount of knowledge, best practices, and problem solutions has been accumulated. This is especially important for large organizations where stability and reliability are paramount.

The maturity of ZooKeeper also means there is extensive documentation, numerous learning materials, active communities, and experienced engineers who can help in case of problems. This creates a certain confidence that any problems that may arise have already been solved by someone before.

The separation of responsibilities in the ZooKeeper architecture also has its advantages. Since ZooKeeper is responsible only for coordination and Kafka for data storage, each system can be optimized for its specific task. This can lead to better performance in certain scenarios, especially if you have specific metadata management requirements.

Another important advantage is the ability to use ZooKeeper for other services in your infrastructure. If you already have a ZooKeeper cluster used for other purposes, integrating Kafka with it can be a logical extension of your architecture.

ZooKeeper: Understanding the Disadvantages

Despite all its advantages, ZooKeeper also has its disadvantages, which become more apparent as system complexity grows. The main disadvantage is operational complexity. Managing two separate systems requires more time, knowledge, and resources. Each system has its own metrics, logs, configurations, and peculiarities, which increases the cognitive load on engineers.

The additional resources required for ZooKeeper are also an important factor. A ZooKeeper cluster requires separate servers, memory, CPU, and disk space. These resources are not used for direct Kafka data processing but only for coordination, which can be inefficient from a resource utilization perspective.

ZooKeeper's scalability limitations can also become a problem. ZooKeeper is not designed to store large volumes of data or handle high loads. When scaling a Kafka cluster, especially with a large number of topics and partitions, ZooKeeper can become a bottleneck.

KRaft

KRaft represents a modern approach to Kafka architecture, and its main advantage is a simplified architecture. Instead of managing two systems, you now only need to manage one. This significantly simplifies deployment, especially in containerized and cloud environments where each additional dependency complicates orchestration.

Improved performance is another key advantage of KRaft. Since metadata is now stored directly in Kafka, latency when accessing metadata is significantly reduced. This is especially important for scenarios with frequent configuration changes or a high frequency of controller elections.

Simplified management is also a significant advantage. We now have a single system for monitoring, logging, and debugging. This reduces the cognitive load on engineers and simplifies operations.

KRaft: Understanding the Disadvantages

Despite all its advantages, KRaft also has its disadvantages. The main one is the relative newness of the technology. Although KRaft has undergone thorough testing, it does not have the same long-term production track record as ZooKeeper. This can be an important factor for organizations that prefer proven solutions.

The migration process from ZooKeeper to KRaft can also be complex. Although Apache Kafka provides tools for migration, it still requires careful planning and testing, and it may involve risks. Some tools and integrations may not support KRaft immediately, requiring additional work to update the ecosystem.


Recommendations for Choosing an Approach

When choosing between ZooKeeper and KRaft, it is important to consider several factors. If you are starting a new project, KRaft is usually the best choice. It is a modern solution that is easier to manage and generally performs better. KRaft is also the future of Kafka: ZooKeeper mode was deprecated in Kafka 3.5 and removed entirely in Kafka 4.0.

If you already have a running Kafka cluster on ZooKeeper, the decision to migrate requires careful analysis. You need to assess the risks, the time required for migration, and the potential benefits. In some cases, it makes sense to stay on ZooKeeper, especially if you have mission-critical systems where stability is paramount.

For organizations that already use ZooKeeper for other services, it might be logical to continue using ZooKeeper for Kafka as well to avoid infrastructure duplication. However, in the long run, transitioning to KRaft can lead to a simpler and more efficient architecture.

Conclusion

Choosing between ZooKeeper and KRaft is not just a technical decision, but a strategic one that will affect your infrastructure for years to come. ZooKeeper offers proven stability and a mature ecosystem, but requires more complex management and additional resources. KRaft provides a modern, simplified architecture with better performance, but requires a cautious approach to migration and has less production experience.

For new projects, KRaft is usually the preferred choice, as it is the future of Kafka. For existing systems, the decision should be based on specific requirements, risks, and development strategy. In any case, understanding the differences between these approaches allows you to make more informed decisions and build more reliable and efficient data streaming systems.

#kafka #kraft
