The Challenge: Kafka Operations Are Hard
Operating Kafka at scale is not just about running a cluster. It involves:
- Careful resource planning, including CPU, memory, and disk usage.
- Managing partition assignments and rebalancing, which often lead to tricky operational edge cases.
- Ensuring high availability and fault tolerance while keeping latency low.
- Handling ever-growing storage needs and maintaining complex ZooKeeper or KRaft metadata.
All of this makes Kafka an incredibly powerful yet operationally heavy system. While it is widely recognized as the de facto standard for distributed messaging, its design doesn't always align with the principles of cloud-native systems, which emphasize simplicity, statelessness, and elasticity.
Of course, Confluent and other vendors offer managed solutions that handle these resource and operational challenges for you.
However, it's exciting to explore simpler, open-source alternatives that anyone can build and experiment with on their own.
Cloud-Native Needs vs Kafka's Design
In modern cloud-native environments, developers increasingly prefer systems that are:
- Stateless by design, so they can scale up and down easily.
- Composable, allowing infrastructure pieces to be swapped or upgraded independently.
- Lightweight, to reduce costs and operational overhead.
Kafka, despite its undeniable strengths, was designed before these paradigms became mainstream. As a result, deploying and managing Kafka clusters in ephemeral or serverless environments often becomes impractical.
Introducing a Stateless Kafka Broker
To address this challenge, I started an experimental project: stateless-kafka-broker.
The core idea is simple yet radical: remove internal state from the broker, delegate everything to external storage backends, and let the broker focus solely on protocol handling.
📄 Architecture Diagram
The architecture is designed to be as simple and minimal as possible.
It separates metadata, logs, and index management into pluggable external stores.
As a result, the broker itself remains stateless and easy to operate.
This architecture brings several benefits:
- You no longer need to maintain broker-local metadata or logs.
- Scaling horizontally becomes a matter of spinning up more stateless broker instances — no more complex state replication or rebalancing.
- You gain the flexibility to choose storage backends that best fit your environment.
Decoupling Through Pluggable Stores
In order to achieve this, the broker design is split into three distinct stores:
Meta Store
Handles topic partition metadata, consumer group information, and other critical broker-side state.
Backend options include file-based JSON for local simplicity, Redis for centralized ephemeral storage, or even SQL-based solutions like SQLite for small-scale deployments.
Log Store
Stores actual message data for ProduceRequest and FetchRequest.
This store can be backed by local file systems for dev environments, or object storage solutions like Amazon S3 or Google Cloud Storage for production environments where durability and scalability are key.
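As a rough illustration of what a Log Store boundary could look like, here is a minimal sketch with an in-memory backend. The trait name `LogStore`, its method signatures, and `MemoryLogStore` are assumptions made for this example and are not taken from the project's code. An S3 or file-system backend would implement the same trait with different internals.

```rust
use std::collections::HashMap;

/// Hypothetical trait for this sketch; the project's real trait may differ.
pub trait LogStore {
    /// Appends a record to a topic partition and returns the offset it was given.
    fn append(&mut self, topic: &str, partition: i32, record: Vec<u8>) -> u64;
    /// Returns up to `max` records starting at `offset`.
    fn fetch(&self, topic: &str, partition: i32, offset: u64, max: usize) -> Vec<Vec<u8>>;
}

/// In-memory backend (an assumption for illustration), useful only for
/// tests and local experiments since nothing is persisted.
#[derive(Default)]
pub struct MemoryLogStore {
    partitions: HashMap<(String, i32), Vec<Vec<u8>>>,
}

impl LogStore for MemoryLogStore {
    fn append(&mut self, topic: &str, partition: i32, record: Vec<u8>) -> u64 {
        let log = self
            .partitions
            .entry((topic.to_string(), partition))
            .or_default();
        log.push(record);
        // Offsets are simply positions in the per-partition vector.
        (log.len() - 1) as u64
    }

    fn fetch(&self, topic: &str, partition: i32, offset: u64, max: usize) -> Vec<Vec<u8>> {
        match self.partitions.get(&(topic.to_string(), partition)) {
            Some(log) => log.iter().skip(offset as usize).take(max).cloned().collect(),
            None => Vec::new(),
        }
    }
}
```

In this sketch a produce request maps to `append` and a fetch request maps to `fetch`. A real backend would also need error handling and batching, which are left out here.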
Index Store
Maintains efficient indexing information to quickly locate messages within log segments.
By separating the index logic, we enable faster reads and writes while allowing backend flexibility similar to the Meta Store.
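One way to picture the index is as a sparse map from message offset to byte position inside a log segment, similar in spirit to Kafka's own offset index. The sketch below assumes that layout. `SegmentIndex` and its methods are hypothetical names for illustration, not the project's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical sparse index for one log segment:
/// message offset -> byte position in the segment.
#[derive(Default)]
pub struct SegmentIndex {
    entries: BTreeMap<u64, u64>,
}

impl SegmentIndex {
    /// Records that the message at `offset` starts at byte `position`.
    pub fn insert(&mut self, offset: u64, position: u64) {
        self.entries.insert(offset, position);
    }

    /// Returns the closest indexed entry at or before `target`,
    /// which is where a reader would start scanning the segment.
    pub fn lookup(&self, target: u64) -> Option<(u64, u64)> {
        self.entries
            .range(..=target)
            .next_back()
            .map(|(offset, position)| (*offset, *position))
    }
}
```

With entries for offsets 0, 100, and 200, a lookup for offset 150 returns the entry for offset 100, so the reader only scans forward from that position instead of from the start of the segment.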
Each of these stores is implemented as a Rust trait, enabling fully pluggable implementations. Developers can create their own backends, experiment with different combinations, or optimize for specific workloads, all without changing the core broker logic.
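To make the trait-based design concrete, here is a small sketch of a broker that talks to its metadata only through a trait object. Everything named here (`MetaStore`, `MemoryMetaStore`, `Broker`, and the method names) is an assumption for illustration. The in-process `Arc<Mutex<...>>` stands in for what would be a shared external service such as Redis in a real deployment.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical trait for this sketch; the project's actual trait may differ.
pub trait MetaStore: Send {
    fn create_topic(&mut self, topic: &str, partitions: u32);
    fn partition_count(&self, topic: &str) -> Option<u32>;
}

/// One possible backend: a plain in-memory map (illustration only).
#[derive(Default)]
pub struct MemoryMetaStore {
    topics: HashMap<String, u32>,
}

impl MetaStore for MemoryMetaStore {
    fn create_topic(&mut self, topic: &str, partitions: u32) {
        self.topics.insert(topic.to_string(), partitions);
    }

    fn partition_count(&self, topic: &str) -> Option<u32> {
        self.topics.get(topic).copied()
    }
}

/// The broker keeps only a handle to the store, never the metadata itself.
pub struct Broker {
    meta: Arc<Mutex<dyn MetaStore>>,
}

impl Broker {
    pub fn new(meta: Arc<Mutex<dyn MetaStore>>) -> Self {
        Broker { meta }
    }

    /// Broker logic is written against the trait, not a concrete backend.
    pub fn handle_create_topic(&self, topic: &str, partitions: u32) {
        self.meta.lock().unwrap().create_topic(topic, partitions);
    }

    pub fn handle_metadata(&self, topic: &str) -> Option<u32> {
        self.meta.lock().unwrap().partition_count(topic)
    }
}
```

If two `Broker` values are built from clones of the same `Arc<Mutex<dyn MetaStore>>`, a topic created through one is immediately visible through the other, because neither holds any state of its own. Swapping in a different backend only means passing a different type that implements `MetaStore`, while `Broker` stays unchanged.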
Why Performance Isn't the Only Goal
While this stateless approach offers many architectural and operational benefits, it does come with trade-offs.
A traditional Kafka cluster can leverage local disk caches, in-memory optimizations, and tight coupling between broker components to achieve exceptional throughput. In contrast, a stateless broker depends on external storage latency and lacks these caching opportunities.
However, performance isn't always the most important requirement. For some applications, simplicity, flexibility, and operational ease outweigh raw throughput.
By decoupling storage, you can tune the system for different priorities, such as durability, cost efficiency, or minimal operational overhead.
Open Source, Let's Build Together!
This project is fully open source, and I'm hoping many people will join and contribute!
Whether you'd like to implement new backends, improve protocol support, optimize performance, or write better documentation — every contribution is welcome.
Let's learn, experiment, and build something fun together!
Call for Contributions
This project is still in its early stages and has a long way to go. That’s why I’m inviting contributors from around the world to join in:
- Add new backend implementations for Meta, Log, or Index stores.
- Improve protocol coverage and compatibility.
- Optimize performance in creative ways.
- Write tests, examples, and documentation to help others get started.
Whether you're a Rust enthusiast, a distributed systems geek, or just curious about Kafka internals, your contributions are welcome and highly appreciated.
Let’s explore the potential of a truly stateless, cloud-native Kafka broker together and bring back the joy of building in the open! 🌍🚀
⭐ GitHub
Check out the code and join us here: stateless-kafka-broker on GitHub