Kafka is the gold standard for production event streaming. But for local development and testing, it's like driving a semi truck to the grocery store. I built StrikeMQ — a Kafka-compatible broker in C++20 — specifically for the localhost:9092 use case. Here's how they compare with real numbers.
The Setup
StrikeMQ v0.1.4 — C++20, zero dependencies, single binary.
Apache Kafka 3.7 — Running via docker compose with KRaft (no ZooKeeper), default configuration.
Hardware — Apple M-series MacBook, 10 cores, 16GB RAM.
All tests measure the same thing: a process listening on port 9092 that Kafka clients can produce to and consume from.
Binary Size
| | Kafka | StrikeMQ |
|---|---|---|
| Runtime files | ~200MB (JVM + jars + config) | 735KB (stripped, statically linked) |
| Dependencies | JDK 11+, scripts, config dirs | None |
StrikeMQ is 272x smaller. The entire binary — networking, Kafka protocol codec, storage engine, REST API, HTTP server — fits in less space than a single JPEG.
$ ls -lh strikemq
-rwxr-xr-x 1 user staff 735K strikemq
$ du -sh kafka_2.13-3.7.0/
207M kafka_2.13-3.7.0/
Startup Time
I measured time from process start to first successful produce (using kcat):
| | Kafka | StrikeMQ |
|---|---|---|
| Cold start to ready | ~8-15 seconds | < 10ms |
| First produce accepted | ~10-20 seconds | < 50ms |
# StrikeMQ: near-instant (the fixed `sleep 0.1` accounts for most of the 0.112s below)
$ time (./strikemq & sleep 0.1 && echo "test" | kcat -b 127.0.0.1:9092 -P -t bench)
real 0m0.112s
# Kafka: wait for JVM warmup, controller election, log recovery...
$ time (docker compose up -d && until kcat -b 127.0.0.1:9092 -L 2>/dev/null; do sleep 0.5; done)
real 0m12.438s
When you're iterating on code and restarting your broker 50 times a day, those 12 seconds add up to 10 minutes of daily waiting.
Memory Usage
Measured after startup with no topics, then after producing 10,000 messages:
| State | Kafka | StrikeMQ |
|---|---|---|
| Idle (no topics) | ~350MB RSS | ~1.5MB RSS |
| After 10K messages | ~400MB RSS | ~2MB + mmap'd segments |
| Theoretical minimum | ~200MB (JVM heap floor) | < 1MB (code + stack) |
StrikeMQ uses mmap for storage segments. The OS manages page residency — only pages being read or written are in physical memory. The broker itself barely allocates heap. Kafka, by contrast, needs a JVM with a minimum heap, GC metadata, thread stacks for 50+ threads, and page cache for its own log segments.
Idle CPU
| | Kafka | StrikeMQ |
|---|---|---|
| CPU at idle | 1-3% (GC cycles, thread scheduling) | 0.0% |
StrikeMQ uses kqueue (macOS) / epoll (Linux) event loops that block when there's nothing to do. No background GC, no periodic timers, no busy loops. The process is literally suspended by the kernel until a packet arrives.
# StrikeMQ idle for 60 seconds
$ top -pid $(pgrep strikemq) -l 1
PID COMMAND %CPU MEM
12345 strikemq 0.0 1.5M
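To make the "blocks when there's nothing to do" behavior concrete, here is a minimal sketch of a blocking readiness wait. It uses POSIX `poll()` as a portable stand-in for kqueue/epoll, and `wait_readable` is a hypothetical helper name, not StrikeMQ's API. The idea is the same: the thread sleeps inside the kernel until a descriptor becomes readable.

```cpp
#include <poll.h>
#include <unistd.h>

// Hypothetical helper: block until `fd` is readable or `timeout_ms` elapses.
// A timeout of -1 waits indefinitely. While blocked, the thread is descheduled
// by the kernel and consumes no CPU.
inline bool wait_readable(int fd, int timeout_ms) {
    pollfd p{};
    p.fd = fd;
    p.events = POLLIN;
    const int n = ::poll(&p, 1, timeout_ms);  // returns number of ready fds
    return n == 1 && (p.revents & POLLIN) != 0;
}
```

An event loop built this way calls the wait with no timeout and only runs when the kernel reports work. That matches the 0.0% idle reading above.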
Produce Latency — Microbenchmarks
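StrikeMQ's built-in benchmark suite measures the raw latency of core operations using TSC (Time Stamp Counter) timing, with 1 million samples each after a 10K warmup. Entries shown as < 42 ns appear to fall below a single counter tick on this machine, so 42 ns is effectively the measurement floor: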
SPSC Ring Buffer (push + pop)
The lock-free queue that passes connections from the acceptor thread to workers:
| Percentile | Latency |
|---|---|
| avg | 19 ns |
| p50 | < 42 ns |
| p99.9 | 42 ns |
| max | 13 us |
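For readers unfamiliar with the structure being measured, a single-producer/single-consumer ring can be sketched as below. This is an illustrative version under common assumptions (power-of-two capacity, acquire/release atomics). `SpscRing` is a hypothetical name, and the code is not taken from StrikeMQ's source.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Illustrative SPSC ring: exactly one thread calls push(), one calls pop().
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer side. Returns false when the ring is full.
    bool push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N) return false;
        buf_[t & (N - 1)] = v;
        tail_.store(t + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side. Returns nullopt when the ring is empty.
    std::optional<T> pop() {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return std::nullopt;
        T v = buf_[h & (N - 1)];
        head_.store(h + 1, std::memory_order_release);  // release the slot
        return v;
    }

private:
    std::array<T, N> buf_{};
    alignas(64) std::atomic<std::size_t> head_{0};  // next slot to read
    alignas(64) std::atomic<std::size_t> tail_{0};  // next slot to write
};
```

Because each index has a single writer, neither side needs a lock or a compare-and-swap loop, which is why a push plus pop can complete in tens of nanoseconds.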
Memory Pool (alloc + free)
Pre-allocated block pool with intrusive freelist:
| Percentile | Latency |
|---|---|
| avg | 3 ns |
| p50 | < 42 ns |
| p99.9 | 42 ns |
| max | 7 us |
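A pre-allocated pool with an intrusive freelist can be sketched as follows. The class and method names (`BlockPool`, `alloc`, `dealloc`) are hypothetical, and the sketch is single-threaded. It only illustrates why alloc and free cost a couple of pointer operations: each free block stores the "next" pointer inside its own memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Illustrative fixed-size block pool (not StrikeMQ's actual implementation).
class BlockPool {
    struct Node { Node* next; };  // lives inside a free block

public:
    BlockPool(std::size_t block_size, std::size_t count)
        : block_size_(round_up(block_size < sizeof(Node) ? sizeof(Node) : block_size)),
          storage_(std::make_unique<std::byte[]>(block_size_ * count)) {
        // Thread every block onto the freelist once, up front.
        for (std::size_t i = count; i-- > 0;) push(storage_.get() + i * block_size_);
    }

    // Pop the head of the freelist; nullptr when the pool is exhausted.
    void* alloc() {
        if (free_ == nullptr) return nullptr;
        Node* n = free_;
        free_ = n->next;
        return n;
    }

    // Push the block back; it becomes the next one handed out (LIFO).
    void dealloc(void* p) {
        assert(p != nullptr);
        push(p);
    }

private:
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }
    void push(void* p) { free_ = new (p) Node{free_}; }

    std::size_t block_size_;
    std::unique_ptr<std::byte[]> storage_;
    Node* free_ = nullptr;
};
```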
Log Append (1KB message)
The full produce path — lock partition, memcpy into mmap'd segment, update offset index, unlock:
| Percentile | Latency |
|---|---|
| avg | 145 ns |
| p50 | 83 ns |
| p99 | 667 ns |
| p99.9 | 4.4 us |
| max | 15 us |
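The append path described above (lock, memcpy into the mapping, update the index, unlock) can be sketched as below. `Segment` is a hypothetical class, and the sketch uses an anonymous mapping so it never touches disk. A real broker would presumably map a file, which this example does not attempt.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <stdexcept>
#include <vector>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  // older BSD/macOS spelling
#endif

// Illustrative single-partition segment (assumption, not StrikeMQ's class).
class Segment {
public:
    explicit Segment(std::size_t capacity) : cap_(capacity) {
        void* p = ::mmap(nullptr, cap_, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        base_ = static_cast<std::uint8_t*>(p);
    }
    ~Segment() { ::munmap(base_, cap_); }
    Segment(const Segment&) = delete;
    Segment& operator=(const Segment&) = delete;

    // Returns the offset assigned to the message, or -1 if the segment is full.
    std::int64_t append(const void* data, std::size_t len) {
        std::lock_guard<std::mutex> lock(mu_);   // lock partition
        if (len > cap_ - used_) return -1;
        std::memcpy(base_ + used_, data, len);   // copy into the mapping
        index_.push_back(used_);                 // offset -> byte position
        used_ += len;
        return static_cast<std::int64_t>(index_.size()) - 1;
    }                                            // lock released here

    // Pointer to the first byte of the message at `offset` (no bounds check).
    const std::uint8_t* at(std::int64_t offset) const {
        return base_ + index_[static_cast<std::size_t>(offset)];
    }

private:
    std::size_t cap_;
    std::uint8_t* base_ = nullptr;
    std::size_t used_ = 0;
    std::vector<std::size_t> index_;
    std::mutex mu_;
};
```

Nothing in `append` issues a system call, so its cost is an uncontended mutex plus a memory copy. That fits the sub-microsecond figures in the table.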
Kafka Header Decode
Parsing a complete Kafka request header from raw bytes:
| Percentile | Latency |
|---|---|
| avg | 16 ns |
| p50 | < 42 ns |
| p99.9 | 42 ns |
| max | 15 us |
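As an illustration of what header decoding involves, the sketch below parses the Kafka request header v1 layout (big-endian `api_key`, `api_version`, `correlation_id`, then a nullable `client_id` string with a 16-bit length), assuming the 4-byte size prefix has already been consumed. The struct and function names are hypothetical, and newer flexible header versions with tagged fields are not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

struct RequestHeader {
    std::int16_t api_key;
    std::int16_t api_version;
    std::int32_t correlation_id;
    std::string_view client_id;  // view into the input buffer; empty if null
};

inline std::uint16_t be16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint32_t be32(const std::uint8_t* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Returns nullopt if the buffer is too short for the header it claims to hold.
inline std::optional<RequestHeader> decode_header(const std::uint8_t* buf,
                                                  std::size_t len) {
    if (len < 10) return std::nullopt;  // fixed part: 2 + 2 + 4 + 2 bytes
    RequestHeader h{};
    h.api_key = static_cast<std::int16_t>(be16(buf));
    h.api_version = static_cast<std::int16_t>(be16(buf + 2));
    h.correlation_id = static_cast<std::int32_t>(be32(buf + 4));
    const auto n = static_cast<std::int16_t>(be16(buf + 8));
    if (n < 0) return h;  // length -1 encodes a null client_id
    if (len < 10 + static_cast<std::size_t>(n)) return std::nullopt;
    h.client_id = std::string_view(reinterpret_cast<const char*>(buf + 10),
                                   static_cast<std::size_t>(n));
    return h;
}
```

The decoder performs a handful of loads and shifts and allocates nothing, which is the kind of work that fits in a few tens of nanoseconds.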
Every operation passes the sub-millisecond p99.9 check. The log append, which is the storage write path, completes in 145 ns on average. That is possible because writing to an mmap'd segment is a memory copy; the OS flushes dirty pages to disk asynchronously, so the figure measures the copy and not a durable write.
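The measurement method behind these tables (warm up, take many samples, sort, read off percentiles) can be sketched as follows. This version uses `std::chrono::steady_clock` in place of a raw hardware counter. `sample_ns` and `percentile` are hypothetical helper names, and the nearest-rank percentile definition is an assumption about how the figures are computed.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile over ascending-sorted samples; p in (0, 100].
inline std::uint64_t percentile(const std::vector<std::uint64_t>& sorted, double p) {
    if (sorted.empty()) return 0;
    const auto rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(sorted.size())));
    return sorted[rank == 0 ? 0 : rank - 1];
}

// Run `op` `warmup` times untimed, then time `samples` individual calls.
// Returns the per-call durations in nanoseconds, sorted ascending.
template <typename Fn>
std::vector<std::uint64_t> sample_ns(Fn&& op, std::size_t warmup, std::size_t samples) {
    using clock = std::chrono::steady_clock;
    for (std::size_t i = 0; i < warmup; ++i) op();
    std::vector<std::uint64_t> out;
    out.reserve(samples);
    for (std::size_t i = 0; i < samples; ++i) {
        const auto t0 = clock::now();
        op();
        const auto t1 = clock::now();
        out.push_back(static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count()));
    }
    std::sort(out.begin(), out.end());
    return out;
}
```

Timing each call individually means no reading can be finer than the clock's tick, which is worth keeping in mind when reading the smallest entries in the tables.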
End-to-End Produce Latency
For the full network round-trip (client -> TCP -> parse -> store -> respond -> client), measured with kcat producing 1,000 individual messages:
| | Kafka | StrikeMQ |
|---|---|---|
| p50 | ~1-2ms | < 0.5ms |
| p99 | ~5-10ms | < 1ms |
| p99.9 | ~15-50ms | < 1ms |
StrikeMQ's end-to-end produce stays under 1ms at p99.9. The path is:
recv() → parse Kafka header (16ns) → decode batch → lock partition mutex →
memcpy into mmap (145ns) → unlock → encode response → send()
No GC pauses. No thread context switches for common cases. No JIT warmup.
Consume Latency
The fetch path is even faster because it's completely lock-free:
recv() → parse header → binary search offset index → pointer into mmap → send()
No user-space copies of message data. send() is handed a pointer into the mmap'd file pages, and the kernel copies from there into the socket buffer. No deserialization, no buffer allocation, no locking.
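The "binary search offset index" step in the fetch path can be sketched as below, assuming a sparse index of (message offset, byte position) pairs sorted by offset. `IndexEntry` and `lookup` are hypothetical names, and the actual index layout in StrikeMQ may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

struct IndexEntry {
    std::int64_t offset;     // message offset this entry points at
    std::uint32_t position;  // byte position of that message in the segment
};

// Find the last entry whose offset is <= target, i.e. the nearest place to
// start reading from. Returns nullopt if target precedes every entry.
inline std::optional<IndexEntry> lookup(const std::vector<IndexEntry>& index,
                                        std::int64_t target) {
    auto it = std::upper_bound(
        index.begin(), index.end(), target,
        [](std::int64_t t, const IndexEntry& e) { return t < e.offset; });
    if (it == index.begin()) return std::nullopt;
    return *std::prev(it);
}
```

The returned position is then added to the base of the mapped segment to get the pointer that is passed to send().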
Resource Comparison Summary
| Metric | Kafka | StrikeMQ | Factor |
|---|---|---|---|
| Binary size | 200MB | 735KB | 272x smaller |
| Startup time | 12s | 10ms | 1,200x faster |
| Idle memory | 350MB | 1.5MB | 233x less |
| Idle CPU | 1-3% | 0% | -- |
| Produce p99.9 | ~15ms | < 1ms | 15x+ faster |
| Dependencies | JDK, scripts | None | -- |
| Threads at idle | 50+ | 12 | 4x fewer |
What This Means For You
If you're running Kafka in docker-compose.yml for local development, you're paying a 12-second startup tax and 350MB memory overhead every time. Multiply that across your team and your CI pipeline:
- Developer laptop: Swap Kafka for StrikeMQ in docker-compose. Same port, same protocol, same client code. Free up 350MB for your IDE.
- CI/CD integration tests: Start StrikeMQ in 10ms instead of waiting 15 seconds for Kafka to boot. Your pipeline gets faster without changing a single test.
- Prototyping: Want to test if Kafka is right for your architecture? Try the idea with StrikeMQ in seconds, not minutes.
What StrikeMQ Doesn't Do
This isn't a production Kafka replacement. It deliberately trades durability and fault tolerance for speed and simplicity:
- No replication (single broker)
- No authentication (no SASL/SSL)
- Consumer group offsets are in-memory (lost on restart)
- No log compaction or retention enforcement
It's a development tool, like SQLite is to PostgreSQL or LocalStack is to AWS.
Try It
# macOS
brew tap awneesht/strike-mq
brew install strikemq
# Or build from source (any platform)
git clone https://github.com/awneesht/Strike-mq.git
cd Strike-mq && cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build
./build/strikemq
Then point any Kafka client at 127.0.0.1:9092. Or use the built-in REST API:
# Produce via curl
curl -X POST localhost:8080/v1/topics/demo/messages \
-d '{"messages":[{"value":"hello"},{"key":"user-1","value":"world"}]}'
# Peek at messages
curl "localhost:8080/v1/topics/demo/messages?offset=0&limit=10"
Run the benchmarks yourself:
./build/strikemq_bench
GitHub: github.com/awneesht/Strike-mq
License: MIT
All benchmarks run on Apple M-series, macOS, compiled with Clang -O2. Your numbers will vary. Kafka numbers are representative of default configurations — tuned Kafka will perform better, but will still carry the JVM baseline overhead. StrikeMQ numbers are from its built-in benchmark suite using TSC-based nanosecond timing.