<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Awneesh Tiwari</title>
    <description>The latest articles on DEV Community by Awneesh Tiwari (@awneesh_tiwari_84445a8ceb).</description>
    <link>https://dev.to/awneesh_tiwari_84445a8ceb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3763466%2F722f91d0-d702-47f8-a99f-f097c23bc1ce.jpeg</url>
      <title>DEV Community: Awneesh Tiwari</title>
      <link>https://dev.to/awneesh_tiwari_84445a8ceb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/awneesh_tiwari_84445a8ceb"/>
    <language>en</language>
    <item>
      <title>StrikeMQ vs Kafka: Benchmarking a 735KB Broker Against a 200MB JVM Giant</title>
      <dc:creator>Awneesh Tiwari</dc:creator>
      <pubDate>Tue, 10 Feb 2026 10:44:57 +0000</pubDate>
      <link>https://dev.to/awneesh_tiwari_84445a8ceb/strikemq-vs-kafka-benchmarking-a-735kb-broker-against-a-200mb-jvm-giant-491f</link>
      <guid>https://dev.to/awneesh_tiwari_84445a8ceb/strikemq-vs-kafka-benchmarking-a-735kb-broker-against-a-200mb-jvm-giant-491f</guid>
      <description>&lt;p&gt;Kafka is the gold standard for production event streaming. But for local development and testing, it's like driving a semi truck to the grocery store. I built &lt;a href="https://github.com/awneesht/Strike-mq" rel="noopener noreferrer"&gt;StrikeMQ&lt;/a&gt; — a Kafka-compatible broker in C++20 — specifically for the &lt;code&gt;localhost:9092&lt;/code&gt; use case. Here's how they compare with real numbers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;StrikeMQ v0.1.4&lt;/strong&gt; — C++20, zero dependencies, single binary.&lt;br&gt;
&lt;strong&gt;Apache Kafka 3.7&lt;/strong&gt; — Running via &lt;code&gt;docker compose&lt;/code&gt; with KRaft (no ZooKeeper), default configuration.&lt;br&gt;
&lt;strong&gt;Hardware&lt;/strong&gt; — Apple M-series MacBook, 10 cores, 16GB RAM.&lt;/p&gt;

&lt;p&gt;All tests measure the same thing: a process listening on port 9092 that Kafka clients can produce to and consume from.&lt;/p&gt;


&lt;h2&gt;
  
  
  Binary Size
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runtime files&lt;/td&gt;
&lt;td&gt;~200MB (JVM + jars + config)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;735KB&lt;/strong&gt; (stripped, statically linked)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies&lt;/td&gt;
&lt;td&gt;JDK 11+, scripts, config dirs&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;StrikeMQ is &lt;strong&gt;272x smaller&lt;/strong&gt;. The entire binary — networking, Kafka protocol codec, storage engine, REST API, HTTP server — fits in less space than a single JPEG.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ls -lh strikemq
-rwxr-xr-x  1 user  staff  735K  strikemq

$ du -sh kafka_2.13-3.7.0/
207M    kafka_2.13-3.7.0/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Startup Time
&lt;/h2&gt;

&lt;p&gt;I measured the time from process start to the first successful produce (using &lt;code&gt;kcat&lt;/code&gt;):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cold start to ready&lt;/td&gt;
&lt;td&gt;~8-15 seconds&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 10ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;First produce accepted&lt;/td&gt;
&lt;td&gt;~10-20 seconds&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 50ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# StrikeMQ: instant&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;./strikemq &amp;amp; &lt;span class="nb"&gt;sleep &lt;/span&gt;0.1 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"test"&lt;/span&gt; | kcat &lt;span class="nt"&gt;-b&lt;/span&gt; 127.0.0.1:9092 &lt;span class="nt"&gt;-P&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; bench&lt;span class="o"&gt;)&lt;/span&gt;
real    0m0.112s

&lt;span class="c"&gt;# Kafka: wait for JVM warmup, controller election, log recovery...&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;until &lt;/span&gt;kcat &lt;span class="nt"&gt;-b&lt;/span&gt; 127.0.0.1:9092 &lt;span class="nt"&gt;-L&lt;/span&gt; 2&amp;gt;/dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;&lt;span class="nb"&gt;sleep &lt;/span&gt;0.5&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
real    0m12.438s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you're iterating on code and restarting your broker 50 times a day, those 12 seconds add up to &lt;strong&gt;10 minutes of daily waiting&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Memory Usage
&lt;/h2&gt;

&lt;p&gt;Measured after startup with no topics, then after producing 10,000 messages:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;State&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Idle (no topics)&lt;/td&gt;
&lt;td&gt;~350MB RSS&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~1.5MB&lt;/strong&gt; RSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;After 10K messages&lt;/td&gt;
&lt;td&gt;~400MB RSS&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~2MB&lt;/strong&gt; + mmap'd segments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Theoretical minimum&lt;/td&gt;
&lt;td&gt;~200MB (JVM heap floor)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;&amp;lt; 1MB&lt;/strong&gt; (code + stack)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;StrikeMQ uses &lt;code&gt;mmap&lt;/code&gt; for storage segments. The OS manages page residency — only pages being read or written are in physical memory. The broker itself barely allocates heap. Kafka, by contrast, needs a JVM with a minimum heap, GC metadata, thread stacks for 50+ threads, and page cache for its own log segments.&lt;/p&gt;
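
&lt;p&gt;As a rough sketch of that storage model (illustrative only: &lt;code&gt;Segment&lt;/code&gt; and its members are invented for this post, not taken from the StrikeMQ source), an append is just a &lt;code&gt;memcpy&lt;/code&gt; into a mapped file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;sys/mman.h&amp;gt;
#include &amp;lt;fcntl.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;lt;cstring&amp;gt;
#include &amp;lt;cstdint&amp;gt;

// Hypothetical segment: one pre-sized file, mapped read/write once.
struct Segment {
    uint8_t* base = nullptr;
    size_t   cap  = 0;
    size_t   end  = 0;   // append offset

    bool open_file(const char* path, size_t cap_bytes) {
        int fd = ::open(path, O_RDWR | O_CREAT, 0644);
        if (fd &amp;lt; 0) return false;
        if (ftruncate(fd, (off_t)cap_bytes) != 0) { close(fd); return false; }
        void* p = mmap(nullptr, cap_bytes, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);                      // the mapping keeps the file alive
        if (p == MAP_FAILED) return false;
        base = (uint8_t*)p;
        cap = cap_bytes;
        return true;
    }

    // The "disk write" is a memcpy; the kernel flushes dirty pages lazily.
    bool append(const void* msg, size_t len) {
        if (end + len &amp;gt; cap) return false;
        memcpy(base + end, msg, len);
        end += len;
        return true;
    }
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only the pages actually touched by &lt;code&gt;append&lt;/code&gt; become resident, which is why RSS stays in the low megabytes even with segments on disk.&lt;/p&gt;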




&lt;h2&gt;
  
  
  Idle CPU
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU at idle&lt;/td&gt;
&lt;td&gt;1-3% (GC cycles, thread scheduling)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;StrikeMQ uses &lt;code&gt;kqueue&lt;/code&gt; (macOS) / &lt;code&gt;epoll&lt;/code&gt; (Linux) event loops that block when there's nothing to do. No background GC, no periodic timers, no busy loops. The process is literally suspended by the kernel until a packet arrives.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# StrikeMQ idle for 60 seconds&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;top &lt;span class="nt"&gt;-pid&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;pgrep strikemq&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt; 1
PID    COMMAND  %CPU  MEM
12345  strikemq  0.0   1.5M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
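
&lt;p&gt;The core of such a loop in its Linux &lt;code&gt;epoll&lt;/code&gt; flavor (a minimal sketch, not StrikeMQ's actual code; macOS would use the equivalent &lt;code&gt;kqueue&lt;/code&gt; calls):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;sys/epoll.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

// Register fd for readability on a new epoll instance.
int make_watcher(int fd) {
    int ep = epoll_create1(0);
    epoll_event ev{};
    ev.events  = EPOLLIN;
    ev.data.fd = fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, fd, &amp;amp;ev);
    return ep;
}

// One turn of the event loop. With timeout = -1 and no timers
// registered, epoll_wait parks the thread in the kernel until a
// descriptor is ready: 0.0% CPU while idle, no busy loop.
int wait_for_events(int ep, epoll_event* ready, int cap) {
    return epoll_wait(ep, ready, cap, -1);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;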






&lt;h2&gt;
  
  
  Produce Latency — Microbenchmarks
&lt;/h2&gt;

&lt;p&gt;StrikeMQ's built-in benchmark suite measures the raw latency of core operations using TSC (Time Stamp Counter) for nanosecond-precision timing. 1 million samples each after a 10K warmup:&lt;/p&gt;

&lt;h3&gt;
  
  
  SPSC Ring Buffer (push + pop)
&lt;/h3&gt;

&lt;p&gt;The lock-free queue that passes connections from the acceptor thread to workers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Percentile&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;avg&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;19 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p50&lt;/td&gt;
&lt;td&gt;&amp;lt; 42 ns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99.9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;max&lt;/td&gt;
&lt;td&gt;13 us&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
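
&lt;p&gt;The general shape of such a queue (a textbook sketch with acquire/release atomics, not the StrikeMQ implementation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;atomic&amp;gt;
#include &amp;lt;cstddef&amp;gt;

// Minimal single-producer/single-consumer ring; capacity is a power
// of two so the index wrap is a mask. Exactly one thread pushes and
// exactly one pops, so acquire/release atomics suffice: no locks.
template &amp;lt;typename T, size_t N&amp;gt;
class SpscRing {
    static_assert((N &amp;amp; (N - 1)) == 0, "N must be a power of two");
    T buf_[N];
    std::atomic&amp;lt;size_t&amp;gt; head_{0};   // advanced by the consumer
    std::atomic&amp;lt;size_t&amp;gt; tail_{0};   // advanced by the producer
public:
    bool push(const T&amp;amp; v) {
        size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N) return false; // full
        buf_[t &amp;amp; (N - 1)] = v;
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }
    bool pop(T&amp;amp; out) {
        size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;    // empty
        out = buf_[h &amp;amp; (N - 1)];
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each index is written by exactly one thread, so no compare-and-swap is needed; the hot path is two atomic loads and one store.&lt;/p&gt;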

&lt;h3&gt;
  
  
  Memory Pool (alloc + free)
&lt;/h3&gt;

&lt;p&gt;Pre-allocated block pool with intrusive freelist:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Percentile&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;avg&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p50&lt;/td&gt;
&lt;td&gt;&amp;lt; 42 ns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99.9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;max&lt;/td&gt;
&lt;td&gt;7 us&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
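
&lt;p&gt;The trick in miniature (a hedged sketch; the block size and names are invented): each free block stores the pointer to the next free block inside itself, so allocation and release are each one pointer swap:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstddef&amp;gt;

// Fixed pool with an intrusive freelist: the "next" pointer lives in
// the free block itself, so bookkeeping costs no extra memory and no
// allocation ever touches the system heap after construction.
class BlockPool {
    union Block { Block* next; unsigned char bytes[256]; };
    Block* storage_;
    Block* free_;
public:
    explicit BlockPool(size_t n) : storage_(new Block[n]), free_(storage_) {
        for (size_t i = 0; i + 1 &amp;lt; n; ++i) storage_[i].next = &amp;amp;storage_[i + 1];
        storage_[n - 1].next = nullptr;
    }
    ~BlockPool() { delete[] storage_; }

    void* alloc() {                          // pop the freelist head
        Block* b = free_;
        if (b) free_ = b-&amp;gt;next;
        return b;
    }
    void release(void* p) {                  // push back onto the freelist
        Block* b = static_cast&amp;lt;Block*&amp;gt;(p);
        b-&amp;gt;next = free_;
        free_ = b;
    }
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;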

&lt;h3&gt;
  
  
  Log Append (1KB message)
&lt;/h3&gt;

&lt;p&gt;The full produce path — lock partition, memcpy into mmap'd segment, update offset index, unlock:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Percentile&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;avg&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;145 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p50&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;83 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;667 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99.9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.4 us&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;max&lt;/td&gt;
&lt;td&gt;15 us&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Kafka Header Decode
&lt;/h3&gt;

&lt;p&gt;Parsing a complete Kafka request header from raw bytes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Percentile&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;avg&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p50&lt;/td&gt;
&lt;td&gt;&amp;lt; 42 ns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99.9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;max&lt;/td&gt;
&lt;td&gt;15 us&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
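
&lt;p&gt;The fixed prefix of every Kafka request header is three big-endian integers (&lt;code&gt;api_key&lt;/code&gt;, &lt;code&gt;api_version&lt;/code&gt;, &lt;code&gt;correlation_id&lt;/code&gt;), so decoding it is a handful of byte loads and shifts. A sketch (the client-id string and tagged fields that follow are omitted):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;optional&amp;gt;

struct RequestHeader {
    int16_t api_key;          // e.g. 0 = Produce, 1 = Fetch
    int16_t api_version;
    int32_t correlation_id;   // echoed back in the response
};

static uint16_t be16(const uint8_t* p) {
    return (uint16_t)((p[0] &amp;lt;&amp;lt; 8) | p[1]);
}
static uint32_t be32(const uint8_t* p) {
    return ((uint32_t)p[0] &amp;lt;&amp;lt; 24) | ((uint32_t)p[1] &amp;lt;&amp;lt; 16) |
           ((uint32_t)p[2] &amp;lt;&amp;lt; 8)  |  (uint32_t)p[3];
}

// No allocation, nothing beyond a length check and a few shifts:
// this is why the decode sits in the tens-of-nanoseconds range.
std::optional&amp;lt;RequestHeader&amp;gt; decode_header(const uint8_t* buf, size_t len) {
    if (len &amp;lt; 8) return std::nullopt;
    return RequestHeader{
        (int16_t)be16(buf),       // api_key
        (int16_t)be16(buf + 2),   // api_version
        (int32_t)be32(buf + 4),   // correlation_id
    };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;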

&lt;p&gt;&lt;strong&gt;Every operation clears the sub-millisecond p99.9 threshold by orders of magnitude.&lt;/strong&gt; The log append — the actual disk write — averages 145 ns. That's because &lt;code&gt;mmap&lt;/code&gt; turns disk writes into memory copies; the OS flushes dirty pages to disk asynchronously.&lt;/p&gt;




&lt;h2&gt;
  
  
  End-to-End Produce Latency
&lt;/h2&gt;

&lt;p&gt;For the full network round-trip (client -&amp;gt; TCP -&amp;gt; parse -&amp;gt; store -&amp;gt; respond -&amp;gt; client), measured with &lt;code&gt;kcat&lt;/code&gt; producing 1,000 individual messages:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;p50&lt;/td&gt;
&lt;td&gt;~1-2ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 0.5ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99&lt;/td&gt;
&lt;td&gt;~5-10ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 1ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99.9&lt;/td&gt;
&lt;td&gt;~15-50ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 1ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;StrikeMQ's end-to-end produce stays under 1ms at p99.9. The path is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;recv() → parse Kafka header (16ns) → decode batch → lock partition mutex →
memcpy into mmap (145ns) → unlock → encode response → send()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No GC pauses. No thread context switches in the common case. No JIT warmup.&lt;/p&gt;




&lt;h2&gt;
  
  
  Consume Latency
&lt;/h2&gt;

&lt;p&gt;The fetch path is even faster because it's completely lock-free:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;recv() → parse header → binary search offset index → pointer into mmap → send()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Zero copies of actual message data. The kernel's &lt;code&gt;send()&lt;/code&gt; reads directly from the mmap'd file pages. No deserialization, no buffer allocation, no locking.&lt;/p&gt;
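
&lt;p&gt;A sketch of that index lookup (names invented for illustration): the sparse index maps message offsets to byte positions in the segment, and fetch binary-searches for the greatest entry at or below the requested offset:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;vector&amp;gt;
#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;algorithm&amp;gt;

// Sparse index entry: message offset -&amp;gt; byte position in the segment.
struct IndexEntry { int64_t offset; uint64_t file_pos; };

// Find the file position to start reading from for `target`:
// greatest indexed offset &amp;lt;= target. O(log n), no locks, no copies.
uint64_t lookup(const std::vector&amp;lt;IndexEntry&amp;gt;&amp;amp; idx, int64_t target) {
    auto it = std::upper_bound(idx.begin(), idx.end(), target,
        [](int64_t t, const IndexEntry&amp;amp; e) { return t &amp;lt; e.offset; });
    if (it == idx.begin()) return 0;   // before the first entry: segment start
    return (it - 1)-&amp;gt;file_pos;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The returned position is an offset into the mmap'd file, so the bytes go straight to &lt;code&gt;send()&lt;/code&gt; without ever being copied into broker buffers.&lt;/p&gt;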




&lt;h2&gt;
  
  
  Resource Comparison Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Binary size&lt;/td&gt;
&lt;td&gt;200MB&lt;/td&gt;
&lt;td&gt;735KB&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;272x&lt;/strong&gt; smaller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Startup time&lt;/td&gt;
&lt;td&gt;12s&lt;/td&gt;
&lt;td&gt;10ms&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1,200x&lt;/strong&gt; faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idle memory&lt;/td&gt;
&lt;td&gt;350MB&lt;/td&gt;
&lt;td&gt;1.5MB&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;233x&lt;/strong&gt; less&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idle CPU&lt;/td&gt;
&lt;td&gt;1-3%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Produce p99.9&lt;/td&gt;
&lt;td&gt;~15ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 1ms&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;15x+&lt;/strong&gt; faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies&lt;/td&gt;
&lt;td&gt;JDK, scripts&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Threads at idle&lt;/td&gt;
&lt;td&gt;50+&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;4x&lt;/strong&gt; fewer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What This Means For You
&lt;/h2&gt;

&lt;p&gt;If you're running Kafka in &lt;code&gt;docker-compose.yml&lt;/code&gt; for local development, you're paying a &lt;strong&gt;12-second startup tax&lt;/strong&gt; and a &lt;strong&gt;350MB memory overhead&lt;/strong&gt; on every restart. Multiply that across your team and your CI pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developer laptop:&lt;/strong&gt; Swap Kafka for StrikeMQ in docker-compose. Same port, same protocol, same client code. Free up 350MB for your IDE.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD integration tests:&lt;/strong&gt; Start StrikeMQ in 10ms instead of waiting 15 seconds for Kafka to boot. Your pipeline gets faster without changing a single test.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototyping:&lt;/strong&gt; Want to test if Kafka is right for your architecture? Try the idea with StrikeMQ in seconds, not minutes.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What StrikeMQ Doesn't Do
&lt;/h2&gt;

&lt;p&gt;This isn't a production Kafka replacement. It deliberately trades durability and fault tolerance for speed and simplicity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No replication (single broker)&lt;/li&gt;
&lt;li&gt;No authentication (no SASL/SSL)&lt;/li&gt;
&lt;li&gt;Consumer group offsets are in-memory (lost on restart)&lt;/li&gt;
&lt;li&gt;No log compaction or retention enforcement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a &lt;strong&gt;development tool&lt;/strong&gt;, like SQLite is to PostgreSQL or LocalStack is to AWS.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS&lt;/span&gt;
brew tap awneesht/strike-mq
brew &lt;span class="nb"&gt;install &lt;/span&gt;strikemq

&lt;span class="c"&gt;# Or build from source (any platform)&lt;/span&gt;
git clone https://github.com/awneesht/Strike-mq.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Strike-mq &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; cmake &lt;span class="nt"&gt;-B&lt;/span&gt; build &lt;span class="nt"&gt;-DCMAKE_BUILD_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Release &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; cmake &lt;span class="nt"&gt;--build&lt;/span&gt; build
./build/strikemq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then point any Kafka client at &lt;code&gt;127.0.0.1:9092&lt;/code&gt;. Or use the built-in REST API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Produce via curl&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST localhost:8080/v1/topics/demo/messages &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"messages":[{"value":"hello"},{"key":"user-1","value":"world"}]}'&lt;/span&gt;

&lt;span class="c"&gt;# Peek at messages&lt;/span&gt;
curl &lt;span class="s2"&gt;"localhost:8080/v1/topics/demo/messages?offset=0&amp;amp;limit=10"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the benchmarks yourself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./build/strikemq_bench
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/awneesht/Strike-mq" rel="noopener noreferrer"&gt;github.com/awneesht/Strike-mq&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All benchmarks run on Apple M-series, macOS, compiled with Clang -O2. Your numbers will vary. Kafka numbers are representative of default configurations — tuned Kafka will perform better, but will still carry the JVM baseline overhead. StrikeMQ numbers are from its built-in benchmark suite using TSC-based nanosecond timing.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>backend</category>
      <category>cpp</category>
      <category>performance</category>
      <category>kafka</category>
    </item>
    <item>
      <title>StrikeMQ vs Kafka: Benchmarking a 735KB Broker Against a 200MB JVM Giant</title>
      <dc:creator>Awneesh Tiwari</dc:creator>
      <pubDate>Tue, 10 Feb 2026 09:57:41 +0000</pubDate>
      <link>https://dev.to/awneesh_tiwari_84445a8ceb/blazemq-vs-kafka-benchmarking-a-735kb-broker-against-a-200mb-jvm-giant-1ma7</link>
      <guid>https://dev.to/awneesh_tiwari_84445a8ceb/blazemq-vs-kafka-benchmarking-a-735kb-broker-against-a-200mb-jvm-giant-1ma7</guid>
      <description>&lt;p&gt;Kafka is the gold standard for production event streaming. But for local development and testing, it's like driving a semi truck to the grocery store. I built &lt;a href="https://github.com/awneesht/Strike-mq" rel="noopener noreferrer"&gt;StrikeMQ&lt;/a&gt; — a Kafka-compatible broker in C++20 — specifically for the &lt;code&gt;localhost:9092&lt;/code&gt; use case. Here's how they compare with real numbers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;StrikeMQ v0.1.4&lt;/strong&gt; — C++20, zero dependencies, single binary.&lt;br&gt;
&lt;strong&gt;Apache Kafka 3.7&lt;/strong&gt; — Running via &lt;code&gt;docker compose&lt;/code&gt; with KRaft (no ZooKeeper), default configuration.&lt;br&gt;
&lt;strong&gt;Hardware&lt;/strong&gt; — Apple M-series MacBook, 10 cores, 16GB RAM.&lt;/p&gt;

&lt;p&gt;All tests measure the same thing: a process listening on port 9092 that Kafka clients can produce to and consume from.&lt;/p&gt;


&lt;h2&gt;
  
  
  Binary Size
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runtime files&lt;/td&gt;
&lt;td&gt;~200MB (JVM + jars + config)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;735KB&lt;/strong&gt; (stripped, statically linked)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies&lt;/td&gt;
&lt;td&gt;JDK 11+, scripts, config dirs&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;StrikeMQ is &lt;strong&gt;272x smaller&lt;/strong&gt;. The entire binary — networking, Kafka protocol codec, storage engine, REST API, HTTP server — fits in less space than a single JPEG.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ls -lh strikemq
-rwxr-xr-x  1 user  staff  735K  strikemq

$ du -sh kafka_2.13-3.7.0/
207M    kafka_2.13-3.7.0/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Startup Time
&lt;/h2&gt;

&lt;p&gt;I measured the time from process start to the first successful produce (using &lt;code&gt;kcat&lt;/code&gt;):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cold start to ready&lt;/td&gt;
&lt;td&gt;~8-15 seconds&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 10ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;First produce accepted&lt;/td&gt;
&lt;td&gt;~10-20 seconds&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 50ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# StrikeMQ: instant&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;./strikemq &amp;amp; &lt;span class="nb"&gt;sleep &lt;/span&gt;0.1 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"test"&lt;/span&gt; | kcat &lt;span class="nt"&gt;-b&lt;/span&gt; 127.0.0.1:9092 &lt;span class="nt"&gt;-P&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; bench&lt;span class="o"&gt;)&lt;/span&gt;
real    0m0.112s

&lt;span class="c"&gt;# Kafka: wait for JVM warmup, controller election, log recovery...&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;until &lt;/span&gt;kcat &lt;span class="nt"&gt;-b&lt;/span&gt; 127.0.0.1:9092 &lt;span class="nt"&gt;-L&lt;/span&gt; 2&amp;gt;/dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;&lt;span class="nb"&gt;sleep &lt;/span&gt;0.5&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
real    0m12.438s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you're iterating on code and restarting your broker 50 times a day, those 12 seconds add up to &lt;strong&gt;10 minutes of daily waiting&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Memory Usage
&lt;/h2&gt;

&lt;p&gt;Measured after startup with no topics, then after producing 10,000 messages:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;State&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Idle (no topics)&lt;/td&gt;
&lt;td&gt;~350MB RSS&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~1.5MB&lt;/strong&gt; RSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;After 10K messages&lt;/td&gt;
&lt;td&gt;~400MB RSS&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~2MB&lt;/strong&gt; + mmap'd segments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Theoretical minimum&lt;/td&gt;
&lt;td&gt;~200MB (JVM heap floor)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;&amp;lt; 1MB&lt;/strong&gt; (code + stack)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;StrikeMQ uses &lt;code&gt;mmap&lt;/code&gt; for storage segments. The OS manages page residency — only pages being read or written are in physical memory. The broker itself barely allocates heap. Kafka, by contrast, needs a JVM with a minimum heap, GC metadata, thread stacks for 50+ threads, and page cache for its own log segments.&lt;/p&gt;




&lt;h2&gt;
  
  
  Idle CPU
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU at idle&lt;/td&gt;
&lt;td&gt;1-3% (GC cycles, thread scheduling)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;StrikeMQ uses &lt;code&gt;kqueue&lt;/code&gt; (macOS) / &lt;code&gt;epoll&lt;/code&gt; (Linux) event loops that block when there's nothing to do. No background GC, no periodic timers, no busy loops. The process is literally suspended by the kernel until a packet arrives.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# StrikeMQ idle for 60 seconds&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;top &lt;span class="nt"&gt;-pid&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;pgrep strikemq&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt; 1
PID    COMMAND  %CPU  MEM
12345  strikemq  0.0   1.5M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Produce Latency — Microbenchmarks
&lt;/h2&gt;

&lt;p&gt;StrikeMQ's built-in benchmark suite measures the raw latency of core operations using TSC (Time Stamp Counter) for nanosecond-precision timing. 1 million samples each after a 10K warmup:&lt;/p&gt;

&lt;h3&gt;
  
  
  SPSC Ring Buffer (push + pop)
&lt;/h3&gt;

&lt;p&gt;The lock-free queue that passes connections from the acceptor thread to workers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Percentile&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;avg&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;19 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p50&lt;/td&gt;
&lt;td&gt;&amp;lt; 42 ns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99.9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;max&lt;/td&gt;
&lt;td&gt;13 us&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Memory Pool (alloc + free)
&lt;/h3&gt;

&lt;p&gt;Pre-allocated block pool with intrusive freelist:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Percentile&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;avg&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p50&lt;/td&gt;
&lt;td&gt;&amp;lt; 42 ns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99.9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;max&lt;/td&gt;
&lt;td&gt;7 us&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Log Append (1KB message)
&lt;/h3&gt;

&lt;p&gt;The full produce path — lock partition, memcpy into mmap'd segment, update offset index, unlock:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Percentile&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;avg&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;145 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p50&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;83 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;667 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99.9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.4 us&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;max&lt;/td&gt;
&lt;td&gt;15 us&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Kafka Header Decode
&lt;/h3&gt;

&lt;p&gt;Parsing a complete Kafka request header from raw bytes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Percentile&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;avg&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p50&lt;/td&gt;
&lt;td&gt;&amp;lt; 42 ns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99.9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;max&lt;/td&gt;
&lt;td&gt;15 us&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Every operation clears the sub-millisecond p99.9 threshold by orders of magnitude.&lt;/strong&gt; The log append — the actual disk write — averages 145 ns. That's because &lt;code&gt;mmap&lt;/code&gt; turns disk writes into memory copies; the OS flushes dirty pages to disk asynchronously.&lt;/p&gt;




&lt;h2&gt;
  
  
  End-to-End Produce Latency
&lt;/h2&gt;

&lt;p&gt;For the full network round-trip (client -&amp;gt; TCP -&amp;gt; parse -&amp;gt; store -&amp;gt; respond -&amp;gt; client), measured with &lt;code&gt;kcat&lt;/code&gt; producing 1,000 individual messages:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;p50&lt;/td&gt;
&lt;td&gt;~1-2ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 0.5ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99&lt;/td&gt;
&lt;td&gt;~5-10ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 1ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p99.9&lt;/td&gt;
&lt;td&gt;~15-50ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 1ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;StrikeMQ's end-to-end produce stays under 1ms at p99.9. The path is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;recv() → parse Kafka header (16ns) → decode batch → lock partition mutex →
memcpy into mmap (145ns) → unlock → encode response → send()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No GC pauses. No thread context switches in the common case. No JIT warmup.&lt;/p&gt;




&lt;h2&gt;
  
  
  Consume Latency
&lt;/h2&gt;

&lt;p&gt;The fetch path is even faster because it's completely lock-free:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;recv() → parse header → binary search offset index → pointer into mmap → send()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Zero copies of actual message data. The kernel's &lt;code&gt;send()&lt;/code&gt; reads directly from the mmap'd file pages. No deserialization, no buffer allocation, no locking.&lt;/p&gt;
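
&lt;p&gt;A quick way to see what "pointer into the mmap" means in practice (a hypothetical Python analogue, not the broker's code): slicing a &lt;code&gt;memoryview&lt;/code&gt; of a mapped file hands out a reference to the same file pages, and &lt;code&gt;socket.sendall()&lt;/code&gt; accepts that view directly via the buffer protocol, so no byte of message data is copied in user space:&lt;/p&gt;

```python
# A memoryview over an mmap'd segment references the mapped pages in place.
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "0.log")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

f = open(path, "r+b")
seg = mmap.mmap(f.fileno(), 4096)
seg[100:105] = b"hello"              # pretend this is a stored record batch

view = memoryview(seg)[100:105]      # what a fetch would hand to send()
```

Passing `view` to `socket.sendall()` would transmit straight from those pages; the bytes are only materialized if you explicitly ask, as `bytes(view)` does.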




&lt;h2&gt;
  
  
  Resource Comparison Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Binary size&lt;/td&gt;
&lt;td&gt;200MB&lt;/td&gt;
&lt;td&gt;735KB&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;272x&lt;/strong&gt; smaller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Startup time&lt;/td&gt;
&lt;td&gt;12s&lt;/td&gt;
&lt;td&gt;10ms&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1,200x&lt;/strong&gt; faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idle memory&lt;/td&gt;
&lt;td&gt;350MB&lt;/td&gt;
&lt;td&gt;1.5MB&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;233x&lt;/strong&gt; less&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idle CPU&lt;/td&gt;
&lt;td&gt;1-3%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Produce p99.9&lt;/td&gt;
&lt;td&gt;~15ms&lt;/td&gt;
&lt;td&gt;&amp;lt; 1ms&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;15x+&lt;/strong&gt; faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies&lt;/td&gt;
&lt;td&gt;JDK, scripts&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Threads at idle&lt;/td&gt;
&lt;td&gt;50+&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;4x&lt;/strong&gt; fewer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What This Means For You
&lt;/h2&gt;

&lt;p&gt;If you're running Kafka in &lt;code&gt;docker-compose.yml&lt;/code&gt; for local development, you're paying a &lt;strong&gt;12-second startup tax&lt;/strong&gt; and &lt;strong&gt;350MB memory overhead&lt;/strong&gt; every time. Multiply that across your team and your CI pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developer laptop:&lt;/strong&gt; Swap Kafka for StrikeMQ in docker-compose. Same port, same protocol, same client code. Free up 350MB for your IDE.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD integration tests:&lt;/strong&gt; Start StrikeMQ in 10ms instead of waiting 12 seconds for Kafka to boot. Your pipeline gets faster without changing a single test.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototyping:&lt;/strong&gt; Want to test if Kafka is right for your architecture? Try the idea with StrikeMQ in seconds, not minutes.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What StrikeMQ Doesn't Do
&lt;/h2&gt;

&lt;p&gt;This isn't a production Kafka replacement. It deliberately trades durability and fault tolerance for speed and simplicity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No replication (single broker)&lt;/li&gt;
&lt;li&gt;No authentication (no SASL/SSL)&lt;/li&gt;
&lt;li&gt;Consumer group offsets are in-memory (lost on restart)&lt;/li&gt;
&lt;li&gt;No log compaction or retention enforcement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a &lt;strong&gt;development tool&lt;/strong&gt;: what SQLite is to PostgreSQL, or what LocalStack is to AWS.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS&lt;/span&gt;
brew tap awneesht/strike-mq
brew &lt;span class="nb"&gt;install &lt;/span&gt;strikemq

&lt;span class="c"&gt;# Or build from source (any platform)&lt;/span&gt;
git clone https://github.com/awneesht/Strike-mq.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Strike-mq &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; cmake &lt;span class="nt"&gt;-B&lt;/span&gt; build &lt;span class="nt"&gt;-DCMAKE_BUILD_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Release &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; cmake &lt;span class="nt"&gt;--build&lt;/span&gt; build
./build/strikemq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then point any Kafka client at &lt;code&gt;127.0.0.1:9092&lt;/code&gt;. Or use the built-in REST API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Produce via curl&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST localhost:8080/v1/topics/demo/messages &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"messages":[{"value":"hello"},{"key":"user-1","value":"world"}]}'&lt;/span&gt;

&lt;span class="c"&gt;# Peek at messages&lt;/span&gt;
curl &lt;span class="s2"&gt;"localhost:8080/v1/topics/demo/messages?offset=0&amp;amp;limit=10"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the benchmarks yourself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./build/strikemq_bench
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/awneesht/Strike-mq" rel="noopener noreferrer"&gt;github.com/awneesht/Strike-mq&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All benchmarks run on Apple M-series, macOS, compiled with Clang -O2. Your numbers will vary. Kafka numbers are representative of default configurations — tuned Kafka will perform better, but will still carry the JVM baseline overhead. StrikeMQ numbers are from its built-in benchmark suite using TSC-based nanosecond timing.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>backend</category>
      <category>cpp</category>
      <category>performance</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I replaced a 200MB JVM process with a 52KB binary that speaks Kafka</title>
      <dc:creator>Awneesh Tiwari</dc:creator>
      <pubDate>Tue, 10 Feb 2026 06:14:14 +0000</pubDate>
      <link>https://dev.to/awneesh_tiwari_84445a8ceb/i-replaced-a-200mb-jvm-process-with-a-52kb-binary-that-speaks-kafka-5cm3</link>
      <guid>https://dev.to/awneesh_tiwari_84445a8ceb/i-replaced-a-200mb-jvm-process-with-a-52kb-binary-that-speaks-kafka-5cm3</guid>
      <description>&lt;p&gt;Every time I spin up Kafka for local development, the same ritual plays out: start ZooKeeper (or KRaft), wait for the JVM to warm up, watch 2GB of RAM disappear, and then finally — after 30 seconds — send my first message.&lt;/p&gt;

&lt;p&gt;I got tired of it. So I built &lt;strong&gt;StrikeMQ&lt;/strong&gt; — a 52KB message broker written in C++20 that speaks the Kafka wire protocol. Any Kafka client library works with it out of the box. No code changes. No JVM. No ZooKeeper. Start in milliseconds, 0% CPU when idle.&lt;/p&gt;

&lt;p&gt;Think of it like &lt;a href="https://localstack.cloud/" rel="noopener noreferrer"&gt;LocalStack&lt;/a&gt; for Kafka — develop locally against StrikeMQ, deploy to real Kafka in production.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; build &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;build
cmake &lt;span class="nt"&gt;-DCMAKE_BUILD_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Release ..
cmake &lt;span class="nt"&gt;--build&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Run&lt;/span&gt;
./strikemq

&lt;span class="c"&gt;# Produce and consume with any Kafka client&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"hello&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;world&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;strike"&lt;/span&gt; | kcat &lt;span class="nt"&gt;-b&lt;/span&gt; 127.0.0.1:9092 &lt;span class="nt"&gt;-P&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; my-topic
kcat &lt;span class="nt"&gt;-b&lt;/span&gt; 127.0.0.1:9092 &lt;span class="nt"&gt;-C&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; my-topic &lt;span class="nt"&gt;-e&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This post is about how I built it, the bugs that nearly broke me, and what I learned about implementing a real wire protocol from scratch.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Not Just Use Kafka?
&lt;/h2&gt;

&lt;p&gt;Kafka is incredible for production. But for local development and testing, it's overkill:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Kafka&lt;/th&gt;
&lt;th&gt;StrikeMQ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Binary size&lt;/td&gt;
&lt;td&gt;~200MB+ (JVM + libs)&lt;/td&gt;
&lt;td&gt;52KB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Startup time&lt;/td&gt;
&lt;td&gt;10-30 seconds&lt;/td&gt;
&lt;td&gt;&amp;lt; 10ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idle CPU&lt;/td&gt;
&lt;td&gt;1-5% (JVM GC, threads)&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;1-2GB minimum&lt;/td&gt;
&lt;td&gt;~1MB + mmap'd segments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies&lt;/td&gt;
&lt;td&gt;JVM, ZooKeeper/KRaft&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I didn't want to build a Kafka replacement. I wanted something that &lt;em&gt;pretends&lt;/em&gt; to be Kafka well enough that &lt;code&gt;kafka-python&lt;/code&gt;, &lt;code&gt;librdkafka&lt;/code&gt;, &lt;code&gt;kcat&lt;/code&gt;, and &lt;code&gt;confluent-kafka-go&lt;/code&gt; can't tell the difference.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;StrikeMQ has four layers, all in pure C++20 with zero third-party dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        Kafka Clients (kcat, librdkafka, kafka-python, ...)
                            |
                       TCP :9092
                            |
              +----------------------------+
              |     Acceptor Thread        |
              |  kqueue/epoll (accept only)|
              +----------------------------+
                  |     |     |        |
            SPSC ring buffers (lock-free)
                  |     |     |        |
          Worker 0  Worker 1  ...  Worker N-1
          (own kqueue/epoll per thread)
                  |     |     |        |
              +----------------------------+
              |      Protocol Layer        |
              |  Kafka wire protocol       |
              |  encode/decode/route       |
              +----------------------------+
                    |     |     |     |
              Produce  Fetch  List   Metadata
                              Offsets
                    |     |
              +----------------------------+
              |   Consumer Group Handlers  |
              |  JoinGroup, SyncGroup,     |
              |  Heartbeat, OffsetCommit   |
              +----------------------------+
                    |     |
              +----------------------------+
              |      Storage Layer         |
              |  mmap'd log segments       |
              |  sparse offset index       |
              |  (per-partition mutex)     |
              +----------------------------+
                            |
                    /tmp/strikemq/data/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multi-Threaded I/O
&lt;/h3&gt;

&lt;p&gt;The network layer uses an &lt;strong&gt;acceptor + N worker threads&lt;/strong&gt; architecture. The acceptor thread runs its own &lt;code&gt;kqueue&lt;/code&gt; (macOS) or &lt;code&gt;epoll&lt;/code&gt; (Linux) loop that does nothing but &lt;code&gt;accept()&lt;/code&gt; new connections and distribute them round-robin to worker threads via lock-free SPSC ring buffers. Each worker thread runs its own event loop with its own &lt;code&gt;kqueue&lt;/code&gt;/&lt;code&gt;epoll&lt;/code&gt; instance, its own connection map, and a pipe-based wakeup mechanism for cross-thread notification.&lt;/p&gt;

&lt;p&gt;N defaults to &lt;code&gt;std::thread::hardware_concurrency()&lt;/code&gt; — on a 10-core machine, that's 10 independent event loops processing requests in parallel. A slow consumer fetch on worker 3 no longer blocks a fast producer on worker 7.&lt;/p&gt;

&lt;p&gt;Every socket gets &lt;code&gt;TCP_NODELAY&lt;/code&gt; for minimum latency, and each worker processes up to 64 events per iteration. Frame extraction happens inline — we read the 4-byte big-endian size prefix, accumulate bytes until a full Kafka frame arrives, then route it to the protocol layer. Connection state is thread-local to each worker, so there's no locking on the I/O hot path.&lt;/p&gt;
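
&lt;p&gt;The frame-extraction step is easy to sketch. Here's an illustrative Python version (the broker does this in C++; names are mine): buffer incoming bytes, and peel off a complete frame whenever the 4-byte big-endian size prefix says one has fully arrived:&lt;/p&gt;

```python
# Incremental Kafka frame extraction: [4-byte big-endian size][body], repeated.
import struct

class FrameDecoder:
    def __init__(self):
        self.buf = bytearray()

    def feed(self, data: bytes):
        """Append freshly received bytes; yield each complete frame body."""
        self.buf += data
        while len(self.buf) >= 4:
            (size,) = struct.unpack_from(">i", self.buf, 0)
            if len(self.buf) - 4 < size:
                return                       # frame body still incomplete
            frame = bytes(self.buf[4:4 + size])
            del self.buf[:4 + size]          # consume it and keep scanning
            yield frame

dec = FrameDecoder()
frames = list(dec.feed(b"\x00\x00\x00\x03abc\x00\x00"))  # one frame + a partial
frames += list(dec.feed(b"\x00\x02xy"))                  # completes the second
```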

&lt;h3&gt;
  
  
  Zero-Copy Storage
&lt;/h3&gt;

&lt;p&gt;Messages are stored in memory-mapped log segments, pre-allocated to 1GB each. Writes are sequential &lt;code&gt;memcpy&lt;/code&gt; into the mapped region. Reads are zero-copy — the Fetch handler returns a raw pointer directly into the mmap'd segment. No serialization, no copying, no allocation on the read path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/tmp/strikemq/data/
  my-topic-0/
    0.log         # 1GB pre-allocated, mmap'd
  another-topic-0/
    0.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A sparse offset index (one entry per 4KB boundary) maps logical Kafka offsets to byte positions. Lookups use &lt;code&gt;std::lower_bound&lt;/code&gt; for O(log n) performance, then scan forward through batch headers to find the exact starting position.&lt;/p&gt;
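
&lt;p&gt;The lookup is easy to sketch in Python (&lt;code&gt;bisect&lt;/code&gt; plays the role of &lt;code&gt;std::lower_bound&lt;/code&gt;; the index values below are made up): binary-search the sparse entries for the last one at or before the requested offset, which is where the forward scan through batch headers begins:&lt;/p&gt;

```python
# Sparse offset index lookup: logical Kafka offset -> starting byte position.
import bisect

# Parallel arrays, sorted by logical offset (hypothetical entries ~4KB apart).
offsets   = [0, 120, 250, 400]       # logical Kafka offsets
positions = [0, 4096, 8192, 12288]   # byte positions in the segment file

def floor_position(target_offset: int) -> int:
    """Byte position of the last index entry at or before target_offset;
    the real code then scans forward through batch headers from here."""
    i = bisect.bisect_right(offsets, target_offset) - 1
    return positions[i]

assert floor_position(0) == 0
assert floor_position(130) == 4096   # covered by the entry for offset 120
```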

&lt;h3&gt;
  
  
  Lock-Free Data Structures
&lt;/h3&gt;

&lt;p&gt;The lock-free primitives aren't theoretical — they're load-bearing infrastructure for the multi-threaded architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SPSC ring buffer&lt;/strong&gt; — Used to pass accepted file descriptors from the acceptor thread to each worker. Wait-free, cache-line aligned (64 bytes) to prevent false sharing. Uses separate cached head/tail copies to minimize cross-core cache traffic. One ring buffer per worker (acceptor = producer, worker = consumer), so no contention between workers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MPSC ring buffer&lt;/strong&gt; — Compare-and-swap loop for multi-producer safety with a committed flag per slot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory pool&lt;/strong&gt; — Pre-allocated block pool with an intrusive freelist. On Linux, it tries &lt;code&gt;MAP_HUGETLB&lt;/code&gt; for 2MB pages, with automatic fallback to regular pages. The constructor touches every page to force materialization and prevent page faults on the hot path.&lt;/li&gt;
&lt;/ul&gt;
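
&lt;p&gt;The index arithmetic of the SPSC ring is worth seeing on its own. This is a toy Python model of the handoff (the real C++ version uses atomics, acquire/release ordering, and cache-line alignment, none of which Python expresses): monotonically increasing head/tail counters masked into a power-of-two slot array:&lt;/p&gt;

```python
# Toy single-producer/single-consumer ring, mirroring the fd handoff from the
# acceptor to one worker. Only the index logic is modeled here.
class SpscRing:
    def __init__(self, capacity_pow2: int = 8):
        self.mask = capacity_pow2 - 1     # capacity must be a power of two
        self.slots = [None] * capacity_pow2
        self.head = 0                     # next slot to read  (consumer side)
        self.tail = 0                     # next slot to write (producer side)

    def push(self, item) -> bool:         # called only by the producer
        if self.tail - self.head > self.mask:
            return False                  # ring full; caller retries
        self.slots[self.tail & self.mask] = item
        self.tail += 1                    # publish after the slot is written
        return True

    def pop(self):                        # called only by the consumer
        if self.head == self.tail:
            return None                   # ring empty
        item = self.slots[self.head & self.mask]
        self.head += 1
        return item

ring = SpscRing()
assert ring.push(42)                      # acceptor: hand off an accepted fd
assert ring.pop() == 42                   # worker: pick it up
```

In the C++ version the increment of &lt;code&gt;tail&lt;/code&gt; is a release store and the consumer reads it with an acquire load; the cached head/tail copies mentioned above exist so each side avoids re-reading the other's counter on every operation.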

&lt;p&gt;Storage is protected with per-partition mutexes — &lt;code&gt;PartitionLog::append()&lt;/code&gt; holds a lock only for its own partition, so concurrent writes to different topics never contend. The read path (&lt;code&gt;PartitionLog::read()&lt;/code&gt;) is completely lock-free, using only acquire loads on atomics to see committed data.&lt;/p&gt;




&lt;h2&gt;
  
  
  Implementing the Kafka Wire Protocol
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting. The Kafka protocol is a binary, big-endian, version-aware request/response protocol over TCP. Every request starts with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[4 bytes] message size
[2 bytes] API key (which operation)
[2 bytes] API version
[4 bytes] correlation ID
[variable] client ID string
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
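
&lt;p&gt;Decoding that fixed header is a few lines of &lt;code&gt;struct&lt;/code&gt; in Python (a sketch of the idea, assuming the older non-flexible header with an INT16-prefixed client ID; StrikeMQ's decoder is C++ and the frame below is synthetic):&lt;/p&gt;

```python
# Parse the fixed Kafka request header: all fields are big-endian ("&gt;").
import struct

def parse_request_header(frame: bytes):
    """frame = bytes after the 4-byte size prefix (non-flexible header)."""
    api_key, api_version, correlation_id = struct.unpack_from(">hhi", frame, 0)
    (client_id_len,) = struct.unpack_from(">h", frame, 8)  # INT16 length
    client_id = frame[10:10 + client_id_len].decode("utf-8")
    return api_key, api_version, correlation_id, client_id

# ApiVersions is API key 18; correlation 7; client id "kcat" (made-up frame).
frame = struct.pack(">hhih", 18, 3, 7, 4) + b"kcat"
```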



&lt;p&gt;I implemented five core APIs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;API&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ApiVersions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"What do you support?" — Client's first request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metadata&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"What topics exist? Where are the brokers?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Produce&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Store these messages"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fetch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Give me messages starting from offset X"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ListOffsets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"What's the earliest/latest offset for this partition?"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each API has multiple versions with different field layouts. Produce alone has versions 0 through 5, each adding fields like &lt;code&gt;transactional_id&lt;/code&gt; or changing how &lt;code&gt;acks&lt;/code&gt; works. The encoder and decoder are version-aware — they check the API version and include/skip fields accordingly.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bugs That Nearly Broke Me
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bug #1: The ~34GB Malloc
&lt;/h3&gt;

&lt;p&gt;When I first connected &lt;code&gt;librdkafka&lt;/code&gt;, the broker crashed immediately. Not a segfault in my code — a &lt;code&gt;malloc&lt;/code&gt; assertion failure &lt;em&gt;inside librdkafka&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Here's what happened: librdkafka sends &lt;code&gt;ApiVersions v3&lt;/code&gt;, which uses Kafka's "flexible versions" encoding. This means compact arrays (varint-prefixed instead of int32-prefixed) and tagged fields at the end of each section.&lt;/p&gt;

&lt;p&gt;My encoder dutifully added a &lt;code&gt;tagged_fields&lt;/code&gt; byte (0x00 = no tags) to the response header. But the Kafka protocol spec has a &lt;strong&gt;special exception&lt;/strong&gt;: ApiVersions responses must NOT include header tagged fields, for backwards compatibility with older clients.&lt;/p&gt;

&lt;p&gt;That one extra byte shifted every subsequent field by 1 position. When librdkafka parsed the "number of API entries" field, it read a garbage value that translated to approximately &lt;strong&gt;34 billion entries&lt;/strong&gt;. It tried to malloc ~34GB, the allocator returned NULL, and the process aborted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; One line removed — don't write the header tagged_fields byte for ApiVersions responses.&lt;/p&gt;
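
&lt;p&gt;The mechanics of the bug are easy to reproduce. Compact arrays prefix their length with an unsigned varint encoded as &lt;code&gt;count + 1&lt;/code&gt;; a sketch of the decoder (Python for brevity) shows how one stray &lt;code&gt;0x00&lt;/code&gt; shifts the interpretation of everything that follows:&lt;/p&gt;

```python
# Unsigned LEB128 varint decoding, as used by compact arrays in Kafka's
# flexible versions.
def read_uvarint(buf: bytes, pos: int):
    """Decode an unsigned varint; return (value, next_pos)."""
    value = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift       # low 7 bits per byte
        if not (b & 0x80):                 # high bit clear: varint ends
            return value, pos
        shift += 7

good = bytes([0x04])           # compact-array length: 3 entries, encoded as 4
bad = bytes([0x00]) + good     # same stream after a stray tagged_fields byte

assert read_uvarint(good, 0) == (4, 1)
assert read_uvarint(bad, 0) == (0, 1)   # parser now sees 0 where 4 belonged
```

Every later field then gets decoded one byte off, which is exactly how a misaligned read turned into a 34-billion-entry array length.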

&lt;h3&gt;
  
  
  Bug #2: The INT16 That Was an INT32
&lt;/h3&gt;

&lt;p&gt;After implementing Fetch, &lt;code&gt;kcat&lt;/code&gt; connected and tried to consume messages. Instead of data, I got:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rd_kafka_msgset_reader_msg_v2:764: expected 18446744073709551613 bytes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That number is &lt;code&gt;(uint64_t)-3&lt;/code&gt; — a clear sign that a signed value was being interpreted as unsigned, and something was off by a few bytes in the binary layout.&lt;/p&gt;

&lt;p&gt;The Kafka v2 record batch header has 49 bytes of fixed fields. Two of them — &lt;code&gt;attributes&lt;/code&gt; and &lt;code&gt;producerEpoch&lt;/code&gt; — are &lt;strong&gt;INT16&lt;/strong&gt; (2 bytes each). But my serializer was writing them as INT32 (4 bytes each):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// BEFORE (broken):&lt;/span&gt;
&lt;span class="n"&gt;w32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;      &lt;span class="c1"&gt;// wrote 4 bytes, should be 2&lt;/span&gt;
&lt;span class="n"&gt;w32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;producer_epoch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// wrote 4 bytes, should be 2&lt;/span&gt;

&lt;span class="c1"&gt;// AFTER (fixed):&lt;/span&gt;
&lt;span class="n"&gt;w16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;      &lt;span class="c1"&gt;// correct: 2 bytes&lt;/span&gt;
&lt;span class="n"&gt;w16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;producer_epoch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// correct: 2 bytes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those 4 extra bytes shifted every record in the batch. When librdkafka parsed with the correct field sizes, the varint decoder landed on garbage bytes and produced nonsensical lengths.&lt;/p&gt;

&lt;p&gt;This bug was particularly nasty because produces appeared to succeed — the broker accepted and stored the data. It only manifested on consume, when a client tried to parse the stored bytes with the correct field widths.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How I found it:&lt;/strong&gt; I wrote a Python script to hex-dump the raw &lt;code&gt;.log&lt;/code&gt; file and manually walked through each field of the Kafka v2 batch format, byte by byte, until I found the offset where reality diverged from the spec.&lt;/p&gt;
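
&lt;p&gt;That script isn't published, but a reconstruction in the same spirit is short: walk the fixed v2 batch-header fields with &lt;code&gt;struct&lt;/code&gt;, using the widths from the Kafka spec, and a field written 2 bytes too wide shows up as garbage immediately after it:&lt;/p&gt;

```python
# Walk the fixed fields of a Kafka v2 record batch header, in wire order.
import struct

V2_FIELDS = [                           # (name, big-endian struct format)
    ("base_offset",            ">q"),
    ("batch_length",           ">i"),
    ("partition_leader_epoch", ">i"),
    ("magic",                  ">b"),
    ("crc",                    ">i"),
    ("attributes",             ">h"),   # INT16, one of the Bug #2 culprits
    ("last_offset_delta",      ">i"),
    ("first_timestamp",        ">q"),
    ("max_timestamp",          ">q"),
    ("producer_id",            ">q"),
    ("producer_epoch",         ">h"),   # INT16, the other culprit
    ("base_sequence",          ">i"),
    ("record_count",           ">i"),
]

def walk_batch_header(raw: bytes, pos: int = 0):
    out = {}
    for name, fmt in V2_FIELDS:
        (out[name],) = struct.unpack_from(fmt, raw, pos)
        pos += struct.calcsize(fmt)
    return out, pos

# Synthetic header: all zeros except magic=2 and record_count=1.
raw = bytearray(61)                     # 61 bytes total: the 49 fixed bytes
struct.pack_into(">b", raw, 16, 2)      # plus the 12-byte base_offset +
struct.pack_into(">i", raw, 57, 1)      # batch_length preamble
fields, end = walk_batch_header(bytes(raw))
```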

&lt;h3&gt;
  
  
  Bug #3: librdkafka's Version Gate
&lt;/h3&gt;

&lt;p&gt;Even after fixing the serialization, &lt;code&gt;kcat&lt;/code&gt; refused to parse the response. Debug logs showed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Feature MsgVer2: Fetch (4..32767) NOT supported by broker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;librdkafka has a &lt;strong&gt;feature gate&lt;/strong&gt;: it only uses Kafka v2 record batches if the broker advertises Fetch v4 or higher. I was advertising a version range of v0-v0 — valid, but below the gate. The client fell back to an older message format that didn't match what was stored on disk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Advertise Fetch v0-v4 and update the response encoder to handle v1+ fields (&lt;code&gt;throttle_time_ms&lt;/code&gt;) and v4+ fields (&lt;code&gt;last_stable_offset&lt;/code&gt;, &lt;code&gt;aborted_transactions&lt;/code&gt;).&lt;/p&gt;




&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;On my M1 MacBook:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Produce latency (p99.9)&lt;/td&gt;
&lt;td&gt;&amp;lt; 1ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU when idle&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory footprint&lt;/td&gt;
&lt;td&gt;~1MB + mmap'd segments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Startup time&lt;/td&gt;
&lt;td&gt;&amp;lt; 10ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary size&lt;/td&gt;
&lt;td&gt;52KB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The produce path is: recv() → parse header → decode batch → lock partition mutex → memcpy into mmap → unlock → encode response → send(). The only synchronization is a per-partition mutex, so writes to different topics are fully parallel across worker threads.&lt;/p&gt;

&lt;p&gt;The consume path is even simpler: recv() → parse header → binary search the offset index → return a pointer into the mmap'd segment → send(). Zero copies of the actual message data, and completely lock-free.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Works Today
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;kafka&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KafkaProducer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KafkaConsumer&lt;/span&gt;

&lt;span class="c1"&gt;# Produce
&lt;/span&gt;&lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KafkaProducer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bootstrap_servers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1:9092&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;my-topic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hello from python&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Consume
&lt;/span&gt;&lt;span class="n"&gt;consumer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KafkaConsumer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;my-topic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;bootstrap_servers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1:9092&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;auto_offset_reset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;earliest&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;  &lt;span class="c1"&gt;# "hello from python"
&lt;/span&gt;    &lt;span class="k"&gt;break&lt;/span&gt;

&lt;span class="c1"&gt;# Consume with consumer group
&lt;/span&gt;&lt;span class="n"&gt;consumer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KafkaConsumer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;my-topic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;group_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;my-group&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;bootstrap_servers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1:9092&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;auto_offset_reset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;earliest&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;break&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supported APIs: ApiVersions (v0-v3), Metadata (v0), Produce (v0-v5), Fetch (v0-v4), ListOffsets (v0-v2), FindCoordinator (v0-v2), JoinGroup (v0-v3), SyncGroup (v0-v2), Heartbeat (v0-v2), LeaveGroup (v0-v1), OffsetCommit (v0-v3), OffsetFetch (v0-v3). Topics are auto-created on first produce or metadata request.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Log compaction and retention&lt;/strong&gt; — Segments accumulate indefinitely right now&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;StrikeMQ is MIT licensed and runs on macOS (Apple Silicon + Intel) and Linux.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/awneesht/Strike-mq" rel="noopener noreferrer"&gt;github.com/awneesht/Strike-mq&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker (easiest):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 9092:9092 strikemq/strikemq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Or build from source:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/awneesht/Strike-mq.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Strike-mq
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; build &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;build
cmake &lt;span class="nt"&gt;-DCMAKE_BUILD_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Release ..
cmake &lt;span class="nt"&gt;--build&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
./strikemq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're tired of waiting 30 seconds for Kafka to start during local development, give it a try. Stars and feedback welcome.&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>kafka</category>
      <category>performance</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
