DEV Community

Johannes Lichtenberger



Using Apache Kafka for horizontal scaling of a temporal document store?

Hi all,

For about a year I've been reading a lot about distributed systems, especially database systems, for instance Martin Kleppmann's "Designing Data-Intensive Applications", and I just bought the Database Internals book. Kleppmann is a big proponent of storing data in a distributed log such as Kafka or BookKeeper and building (materialized) views from the stored data in various systems on top.

I really like the idea. However, instead of storing data in Kafka's backend (for that kind of use case I think BookKeeper is the better fit as a transaction commit log, and he suggested it), for my open source project SirixDB I'm thinking of replacing the Kafka backend and storing the data directly in my log-structured storage system. It does not need a transaction log, just an in-memory buffer that can be flushed to disk if it would otherwise use too much memory. One of the cool things is that resource-wide transactions (on documents, that is binary JSON or XML files) in a database are implemented using an atomic swap of an UberPage, which is the main entry point into a trie-based index structure (of indexes). So it would be great not to have to store the data in a second data store, so to speak.
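To make the atomic-swap idea a bit more concrete, here is a minimal sketch in Java. It is not SirixDB's actual code: the class `UberPageSwapSketch`, the record `UberPage` and the plain `Map` standing in for the trie-based index are all hypothetical names I made up for illustration, and it assumes a single writer per resource.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, NOT SirixDB's real classes: buffered changes become
// visible only when the root reference (the "UberPage") is swapped atomically.
final class UberPageSwapSketch {

    // Immutable root: a revision number plus a snapshot of the stand-in index.
    record UberPage(int revision, Map<String, String> index) {}

    private final AtomicReference<UberPage> root =
            new AtomicReference<>(new UberPage(0, Map.of()));

    // In-memory buffer of uncommitted changes (assumes a single writer).
    private final Map<String, String> buffer = new HashMap<>();

    // Records a change in the buffer; readers do not see it yet.
    void put(String key, String value) {
        buffer.put(key, value);
    }

    // Builds a new root from the current one plus the buffer and swaps it in.
    // Returns false if the root changed in the meantime, so nothing is published.
    boolean commit() {
        UberPage current = root.get();
        Map<String, String> merged = new HashMap<>(current.index());
        merged.putAll(buffer);
        UberPage next = new UberPage(current.revision() + 1, Map.copyOf(merged));
        boolean swapped = root.compareAndSet(current, next);
        if (swapped) {
            buffer.clear();
        }
        return swapped;
    }

    // Readers take whichever root is current; older snapshots stay untouched.
    UberPage snapshot() {
        return root.get();
    }
}
```

A reader that calls `snapshot()` before `commit()` keeps seeing the old revision, while every reader after the swap sees the new one. That is the property that makes a separate transaction log unnecessary in this simplified picture.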

There's a great article by Goetz Graefe, Caetano Sauer and Theo Härder about a similar approach.

I've read that Kafka recently made it possible to read from followers that are in sync with the leader of a partition, so what do you think about this idea? I think the backend could be replaced.

The cool thing is that a Kafka topic would probably correspond to a database in my case and a partition to a resource, which should be replicated to some nodes in a cluster.
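As a rough illustration of that mapping, here is a small Java sketch. Everything in it is an assumption on my side: `ResourcePlacementSketch` and `Placement` are hypothetical names, and the round-robin replica placement is only a stand-in, not how Kafka actually assigns replicas.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: database -> topic, resource -> partition, with every
// partition replicated to a fixed number of nodes (simple round-robin).
final class ResourcePlacementSketch {

    // Where a resource lives; the first replica plays the role of the leader.
    record Placement(String topic, int partition, List<String> replicas) {}

    private final List<String> nodes;
    private final int replicationFactor;
    private final Map<String, List<String>> resourcesByDatabase = new LinkedHashMap<>();

    ResourcePlacementSketch(List<String> nodes, int replicationFactor) {
        if (replicationFactor < 1 || replicationFactor > nodes.size()) {
            throw new IllegalArgumentException("replication factor must be between 1 and the node count");
        }
        this.nodes = List.copyOf(nodes);
        this.replicationFactor = replicationFactor;
    }

    // Registers the resource if it is new and returns its placement. The
    // partition number is the resource's position within its database.
    Placement place(String database, String resource) {
        List<String> resources =
                resourcesByDatabase.computeIfAbsent(database, d -> new ArrayList<>());
        int partition = resources.indexOf(resource);
        if (partition < 0) {
            resources.add(resource);
            partition = resources.size() - 1;
        }
        List<String> replicas = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++) {
            replicas.add(nodes.get((partition + i) % nodes.size()));
        }
        return new Placement(database, partition, List.copyOf(replicas));
    }
}
```

With three nodes and a replication factor of two, the first resource of a database lands on partition 0 with the first two nodes as replicas, the second on partition 1 with the next two, and so on. Asking for the same resource again returns the same placement.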

Kind regards, and have a great Christmas time
