DEV Community

Donald Johnson

A Bite-Sized Journey into Kafka, Arrow, and Go

CandyFlow isn’t an actual product. It’s a playful concept that shows what happens when you combine Apache Kafka (scalable streaming), Apache Arrow (a fast in-memory columnar format), and Go (efficient, concurrent microservices). Together, the three make a lean data pipeline that, in the load test described below, served up to 10k requests per second with sub-millisecond p(95) latency.

1. Why Kafka + Arrow + Go?

  1. Kafka:

    • A durable, distributed message broker that ingests high volumes of data and streams it to consumers in real time.
  2. Arrow:

    • A columnar in-memory format designed for zero-copy reads and fast analytical scans.
  3. Go:

    • Offers excellent concurrency performance and a lightweight approach for building HTTP endpoints and consumers.

Putting them in Docker Compose means you can spin up a working prototype with minimal overhead, then scale out if you need bigger volumes in production.
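
To make the “columnar” idea from the Arrow bullet concrete, here is a minimal standard-library sketch. It is not Arrow itself: plain Go slices stand in for Arrow’s typed column buffers, and the `PriceColumns` type and its fields (`Candy`, `Store`, `Price`) are hypothetical names chosen for illustration. The point is only that a query such as “find the cheapest” touches a single contiguous column instead of walking whole rows.

```go
package main

import "fmt"

// PriceColumns is a hypothetical column-oriented table: one slice per field,
// with row i stored at index i of every slice. Arrow arrays follow the same
// idea, using typed contiguous buffers instead of Go slices.
type PriceColumns struct {
	Candy []string
	Store []string
	Price []float64
}

// Append adds one row by pushing a value onto each column.
func (t *PriceColumns) Append(candy, store string, price float64) {
	t.Candy = append(t.Candy, candy)
	t.Store = append(t.Store, store)
	t.Price = append(t.Price, price)
}

// CheapestRow scans only the Price column and returns the index of the
// lowest price, or -1 if the table is empty.
func (t *PriceColumns) CheapestRow() int {
	best := -1
	for i, p := range t.Price {
		if best == -1 || p < t.Price[best] {
			best = i
		}
	}
	return best
}

func main() {
	var table PriceColumns
	table.Append("gummy bears", "A", 2.49)
	table.Append("gummy bears", "B", 1.99)
	table.Append("lollipop", "A", 0.79)

	i := table.CheapestRow()
	fmt.Printf("%s at store %s: %.2f\n", table.Candy[i], table.Store[i], table.Price[i])
}
```

A real consumer would presumably use the Arrow Go library’s builders and arrays in place of these slices; the access pattern stays the same.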


2. Under the Hood (Conceptually)

  • Producer (Go) → Publishes JSON “candy price” updates to Kafka.
  • Consumer (Go + Arrow) → Reads from Kafka, appends each message to an Arrow-based table in memory, then exposes an HTTP endpoint (/cheapest, etc.) that answers user queries directly from memory.
  • topic-init Container → Creates the Kafka topic automatically on startup.
  • Zookeeper & Kafka → Provide the robust messaging backbone.
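
The flow above can be sketched with the standard library alone. Everything here is an assumption made for illustration: a Go channel stands in for the Kafka topic, slices stand in for the Arrow table, and the `PriceUpdate` schema (`candy`, `price`) is invented because the article does not specify one. What the sketch does show is the shape of the consumer: one goroutine appends decoded messages under a write lock while HTTP requests read under a shared read lock.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// PriceUpdate is a hypothetical message schema for a "candy price" update.
type PriceUpdate struct {
	Candy string  `json:"candy"`
	Price float64 `json:"price"`
}

// Store keeps updates column-wise behind a RWMutex so many HTTP readers can
// run concurrently with a single appending consumer.
type Store struct {
	mu    sync.RWMutex
	candy []string
	price []float64
}

// Consume decodes JSON messages from the channel (standing in for a Kafka
// topic) and appends them until the channel is closed. Malformed messages
// are skipped.
func (s *Store) Consume(msgs <-chan []byte) {
	for raw := range msgs {
		var u PriceUpdate
		if err := json.Unmarshal(raw, &u); err != nil {
			continue
		}
		s.mu.Lock()
		s.candy = append(s.candy, u.Candy)
		s.price = append(s.price, u.Price)
		s.mu.Unlock()
	}
}

// Cheapest returns the lowest-priced update seen so far; ok is false when
// the store is empty.
func (s *Store) Cheapest() (u PriceUpdate, ok bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	best := -1
	for i, p := range s.price {
		if best == -1 || p < s.price[best] {
			best = i
		}
	}
	if best == -1 {
		return PriceUpdate{}, false
	}
	return PriceUpdate{Candy: s.candy[best], Price: s.price[best]}, true
}

// ServeHTTP answers /cheapest-style queries with JSON.
func (s *Store) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	u, ok := s.Cheapest()
	if !ok {
		http.Error(w, "no data yet", http.StatusNotFound)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(u)
}

func main() {
	store := &Store{}
	msgs := make(chan []byte)
	done := make(chan struct{})
	go func() { store.Consume(msgs); close(done) }()

	// Producer side: publish a few JSON updates, then close the "topic".
	for _, u := range []PriceUpdate{{"gummy bears", 2.49}, {"lollipop", 0.79}} {
		b, _ := json.Marshal(u)
		msgs <- b
	}
	close(msgs)
	<-done

	// Query the handler in-process instead of opening a real port.
	rec := httptest.NewRecorder()
	store.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/cheapest", nil))
	fmt.Print(rec.Body.String())
}
```

In a running service the handler would be registered with `http.Handle("/cheapest", store)` and served with `http.ListenAndServe`, and `Consume` would read from a Kafka client rather than a channel.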

CandyFlow is purely an illustrative name; the “candy price” angle is just for fun. In reality, you could track e-commerce prices, sensor data, or any streaming events that need real-time lookups.


3. The Performance Numbers

Using k6 load tests, we hammered the consumer endpoint (/cheapest):

  1. Load ramped from 1k RPS to 10k RPS.
  2. p(95) latency stayed at ~0.4–0.5 ms.
  3. Zero HTTP errors across millions of requests.
  4. Only rare outliers around 200 ms, likely due to minor GC/network blips.

This level of throughput and sub-millisecond latency is exceptional, and it follows from the design: /cheapest is answered from the in-memory Arrow table by Go’s concurrent HTTP handlers, while Kafka ingestion runs in the background and stays off the request path.
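
For readers unfamiliar with the p(95) figure, the sketch below shows one common way to compute a percentile, the nearest-rank method, and why a rare 200 ms outlier does not move it. The `Percentile` helper and the sample data are made up for illustration; k6 computes its own statistics and may use a different method (for example, interpolation), so treat this as an explanation of the metric rather than a reproduction of the test.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Percentile returns the p-th percentile (0 < p <= 100) of the samples using
// the nearest-rank method. It sorts a copy of the input and returns 0 when
// there are no samples.
func Percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	// Nearest rank: the smallest value with at least p% of samples at or below it.
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Made-up latencies in milliseconds: 99 fast requests plus one slow outlier.
	latencies := make([]float64, 0, 100)
	for i := 0; i < 99; i++ {
		latencies = append(latencies, 0.4)
	}
	latencies = append(latencies, 200)

	fmt.Println("p95 ms:", Percentile(latencies, 95))  // the outlier is not visible here
	fmt.Println("max ms:", Percentile(latencies, 100)) // the outlier shows up only at the top
}
```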


4. Not a Product, but a Teaching Tool

Remember: CandyFlow is not a real candy-price aggregator. It’s an example designed to:

  • Demonstrate the synergy of Kafka (for ingestion), Arrow (for in-memory performance), and Go (for concurrency and HTTP).
  • Show that sub-millisecond queries are achievable under heavy load (1k to 10k RPS in the test above).
  • Inspire you to apply this same concept to e-commerce price trackers, IoT sensor data streams, or real-time analytics.

5. Closing Thoughts

  • Cost-Effective & Scalable: The Docker Compose approach is quick to launch and test. You can expand partitions/replicas for bigger use cases.
  • Minimal Complexity: A few containers, a small amount of Go code, and a straightforward Arrow schema are all it takes.
  • Impressive Performance: Sub-millisecond p(95) latencies at up to 10k RPS, without specialized hardware or large clusters.

CandyFlow stands as a sweet demonstration of what’s possible with Kafka, Arrow, and Go—and hopefully sparks ideas for your own real-world streaming and analytics needs!

Top comments (2)

Chitral Verma

is there some sample code for this ?
