A Bite-Sized Journey into Kafka, Arrow, and Go

CandyFlow isn’t an actual product. It’s a playful concept that shows what happens when you combine Apache Kafka (scalable streaming), Apache Arrow (a fast in-memory columnar format), and Go (efficient, concurrent microservices). Together, these three technologies make a lean data pipeline that, in the load test described below, served up to 10,000 requests per second with sub-millisecond p95 latency.

1. Why Kafka + Arrow + Go?

Kafka:
- A durable, fault-tolerant message broker that ingests large volumes of data and streams it in real time.
Arrow:
- A columnar in-memory format, well suited to zero-copy reads and fast analytical scans.
Go:
- Offers strong concurrency support and a lightweight way to build HTTP endpoints and Kafka consumers.
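To make the columnar idea concrete, here is a minimal sketch that uses plain Go slices as a stand-in for Arrow. `PriceColumns`, `PriceWindow`, and `MinIndex` are hypothetical names for illustration and are not part of the Arrow Go library. The sketch only shows the layout principle: one contiguous slice per field, and views that share memory instead of copying.

```go
package main

import "fmt"

// PriceColumns is a hypothetical stand-in for an Arrow record batch:
// one contiguous slice per field instead of one struct per row.
type PriceColumns struct {
	Names  []string
	Prices []float64
}

// Append adds one logical row by extending every column.
func (c *PriceColumns) Append(name string, price float64) {
	c.Names = append(c.Names, name)
	c.Prices = append(c.Prices, price)
}

// PriceWindow returns rows [from, to) of the price column as a view.
// Slicing shares the backing array, so no values are copied.
func (c *PriceColumns) PriceWindow(from, to int) []float64 {
	return c.Prices[from:to]
}

// MinIndex scans a single column and returns the index of the smallest
// value, or -1 when the column is empty.
func MinIndex(col []float64) int {
	if len(col) == 0 {
		return -1
	}
	best := 0
	for i, v := range col {
		if v < col[best] {
			best = i
		}
	}
	return best
}

func main() {
	var cols PriceColumns
	cols.Append("gummy bears", 2.49)
	cols.Append("lollipop", 0.99)
	cols.Append("chocolate bar", 1.79)

	i := MinIndex(cols.Prices)
	fmt.Printf("cheapest: %s at %.2f\n", cols.Names[i], cols.Prices[i])

	// The view points at the same memory as the original column.
	view := cols.PriceWindow(1, 3)
	fmt.Println("view shares memory:", &view[0] == &cols.Prices[1])
}
```

A query such as "cheapest price" touches only the `Prices` slice, which is the access pattern a columnar format is built for. Arrow adds typed buffers, validity bitmaps, and cross-language interoperability on top of this basic idea.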

Putting them in Docker Compose means you can spin up a working prototype with minimal overhead, then scale out if you need bigger volumes in production.

2. Under the Hood (Conceptually)

- Producer (Go) → Publishes JSON “candy price” updates to Kafka.
- Consumer (Go + Arrow) → Reads from Kafka, appends each message into an Arrow-based table in memory, then exposes an HTTP endpoint (/cheapest, etc.) to handle user queries instantly.
- topic-init Container → Creates the Kafka topic automatically on startup.
- Zookeeper & Kafka → Provide the robust messaging backbone.

CandyFlow is purely an illustrative name; the “candy price” angle is just for fun. In reality, you could track e-commerce prices, sensor data, or any streaming events that need real-time lookups.

3. The Performance Numbers

Using k6 load tests, we hammered the consumer endpoint (/cheapest):

- Load ramped from 1k RPS to 10k RPS.
- Achieved a p(95) latency of ~0.4–0.5 ms.
- Zero HTTP errors across millions of requests.
- Only rare outliers around 200 ms, likely due to minor GC/network blips.
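To see why a handful of 200 ms outliers barely moves the p(95) figure, here is a small sketch that computes percentiles with the nearest-rank method. The sample values are made up for illustration and are not the measured data. `Percentile` is a hypothetical helper, and k6's own percentile calculation may differ in detail.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Percentile returns the p-th percentile (0 < p <= 100) of the samples
// using the nearest-rank method. It returns 0 for an empty input and
// does not modify the caller's slice.
func Percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	// Made-up latencies in milliseconds: 99 fast requests and one outlier.
	samples := make([]float64, 0, 100)
	for i := 0; i < 99; i++ {
		samples = append(samples, 0.4)
	}
	samples = append(samples, 200)

	fmt.Printf("p95 = %.1f ms, max = %.1f ms\n",
		Percentile(samples, 95), Percentile(samples, 100))
}
```

With these samples the program prints `p95 = 0.4 ms, max = 200.0 ms`: the outlier dominates the maximum but sits well above the 95th-percentile rank, so the p(95) value is unaffected.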

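This combination of throughput and sub-millisecond p(95) latency shows how well the pieces fit together: Kafka handles ingestion, Arrow keeps the data in a scan-friendly columnar layout, and Go serves concurrent requests cheaply. Keep in mind that the k6 test measures the consumer’s HTTP query path (/cheapest), not the end-to-end delay from a producer publishing an update to that update appearing in query results.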

4. Not a Product, but a Teaching Tool

Remember: CandyFlow is not a real candy-price aggregator. It’s an example designed to:

- Demonstrate how Kafka (for ingestion), Arrow (for in-memory performance), and Go (for concurrency and HTTP) work together.
- Show that sub-millisecond p(95) query latency is achievable under heavy load (1k to 10k RPS in this test).
- Inspire you to apply this same concept to e-commerce price trackers, IoT sensor data streams, or real-time analytics.

5. Closing Thoughts

- Cost-Effective & Scalable: The Docker Compose approach is quick to launch and test. You can add Kafka partitions and replicas for bigger use cases.
- Minimal Complexity: A few containers, a small amount of Go code, and a straightforward Arrow schema are all it takes.
- Impressive Performance: Sub-millisecond p(95) latency at up to 10k RPS, with no specialized hardware or large clusters.

CandyFlow stands as a sweet demonstration of what’s possible with Kafka, Arrow, and Go—and hopefully sparks ideas for your own real-world streaming and analytics needs!

Top comments (2)

Chitral Verma • Feb 20

is there some sample code for this ?

Mr. 0x1 • Feb 28

github.com/copyleftdev/candyflow