DEV Community

Johannes Lichtenberger

Using Apache Kafka for horizontal scaling of a temporal document store?

Hi all,

For about a year I've been reading a lot about distributed systems, especially database systems, for instance Martin Kleppmann's "Designing Data-Intensive Applications", and I just bought the Database Internals book. Kleppmann is a big proponent of storing data in a distributed log such as Kafka or BookKeeper and building (materialized) views on top of it from the stored data in various systems.

I really like the idea. However, for my open source project SirixDB I'm thinking about replacing the Kafka storage backend instead of storing the data in Kafka itself (for that kind of use case I think BookKeeper is the better fit as a transaction commit log, and Kleppmann suggested it). The data would be stored directly in my log-structured storage system, which does not need a transaction log (just an in-memory buffer, which can be flushed to disk if it would otherwise use too much memory). One of the cool things is that resource-wide transactions in a database (a resource being a document, stored as binary JSON or XML) are implemented using an atomic swap of an UberPage, which is the main entry point into a trie-based index structure (an index of indexes). It would therefore be great not to have to store the data in a second data store.
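To make the atomic swap a bit more concrete, here is a minimal Java sketch. All class, field and method names are made up for illustration and are not SirixDB's actual API; the real UberPage points into a trie of index pages, which is reduced to a plain string here.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of committing a revision via an atomic root swap. */
final class UberPageSwapSketch {

    /** Immutable root of one revision (stand-in for the real UberPage). */
    static final class UberPage {
        final int revision;
        final String rootContent; // placeholder for the reference into the index trie

        UberPage(int revision, String rootContent) {
            this.revision = revision;
            this.rootContent = rootContent;
        }
    }

    // The single mutable reference: readers always see a complete revision.
    private final AtomicReference<UberPage> current =
            new AtomicReference<>(new UberPage(0, ""));

    /** Returns the most recently committed root. */
    UberPage read() {
        return current.get();
    }

    /**
     * Builds the next revision off to the side and publishes it with one
     * compare-and-set. Returns false if another commit happened since
     * {@code expected} was read, so nothing is published.
     */
    boolean commit(UberPage expected, String newContent) {
        UberPage next = new UberPage(expected.revision + 1, newContent);
        return current.compareAndSet(expected, next);
    }
}
```

A writer would call `read()`, prepare its changes, and then call `commit(...)` with the root it started from. Because the only shared mutation is the reference swap, a failed or aborted transaction leaves the previous revision untouched, which is why no separate transaction log is needed in this simplified picture.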

There's a great article by Goetz Graefe, Caetano Sauer and Theo Härder about a similar approach.

I've read that Kafka recently gained the ability to serve reads from followers that are in sync with the leader of a partition. So what do you think about this idea? I think the storage backend could be replaced.

The cool thing is that, in my case, a Kafka topic would probably correspond to a database and a partition to a resource, which should be replicated to some of the nodes in a cluster.
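As a rough illustration of that mapping, here is a small Java sketch that assigns a resource (the "partition") of a database (the "topic") to a few nodes. The placement rule (hash, then take the next nodes in the list) is purely an assumption for the example; it is not how Kafka assigns replicas, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch: database ~ topic, resource ~ partition. */
final class ResourcePlacementSketch {

    /**
     * Picks {@code replicationFactor} distinct nodes for one resource.
     * The first node in the result plays the role of the leader,
     * the remaining ones are followers.
     */
    static List<String> replicasFor(String database, String resource,
                                    List<String> nodes, int replicationFactor) {
        if (replicationFactor < 1 || replicationFactor > nodes.size()) {
            throw new IllegalArgumentException("replication factor out of range");
        }
        // Deterministic starting point derived from the database/resource name.
        int start = Math.floorMod(Objects.hash(database, resource), nodes.size());
        List<String> replicas = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++) {
            replicas.add(nodes.get((start + i) % nodes.size()));
        }
        return replicas;
    }
}
```

For example, `replicasFor("shop", "orders.json", List.of("node-a", "node-b", "node-c"), 2)` returns two distinct nodes, and every caller computes the same placement for the same names.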

Kind regards and have a great Christmas time
Johannes
