For about a year now I've been reading a lot about distributed systems, especially database systems, for instance Martin Kleppmann's "Designing Data-Intensive Applications", and I just bought the "Database Internals" book. Kleppmann is a big proponent of storing data in a distributed log such as Kafka or BookKeeper and building (materialized) views from the stored data in various systems on top.
I really like the idea. However, instead of storing data in Kafka's backend (for that kind of use case he suggested BookKeeper as a transaction commit log, which I think is better suited), for my open-source project SirixDB I'm envisioning replacing the Kafka backend and storing the data directly in my log-structured storage system, as it does not need a transaction log (just an in-memory buffer, which can be flushed to disk if it would otherwise use too much memory). One of the nice consequences is that resource-wide transactions (on documents, that is, binary JSON or XML files) in a database are committed with an atomic swap of an UberPage, which is the main entry point into a trie-based index structure (of indexes). So it would be great not to have to store the data in a second data store.
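As a rough illustration of that atomic-swap commit, here is a minimal sketch in Java. All names (`UberPage`, `Resource`, the map standing in for the trie of indexes) are my own illustrative stand-ins, not SirixDB's actual API; the point is only that readers always see an immutable snapshot and a commit is a single compare-and-swap of the root pointer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative stand-in for the main entry page into the index trie.
final class UberPage {
  final long revision;
  final Map<String, String> index; // stand-in for the trie of indexes

  UberPage(long revision, Map<String, String> index) {
    this.revision = revision;
    this.index = Map.copyOf(index); // immutable snapshot
  }
}

final class Resource {
  private final AtomicReference<UberPage> root =
      new AtomicReference<>(new UberPage(0L, Map.of()));

  // Readers always get a consistent, immutable snapshot.
  UberPage read() {
    return root.get();
  }

  // A write transaction builds new pages copy-on-write and commits
  // by atomically swapping the root pointer; a stale snapshot fails.
  boolean commit(UberPage expected, Map<String, String> newIndex) {
    return root.compareAndSet(
        expected, new UberPage(expected.revision + 1, newIndex));
  }
}
```

Because the old UberPage stays reachable until the swap, readers of an in-flight revision are never blocked by a concurrent commit.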
There's a great article by Goetz Graefe, Caetano Sauer, and Theo Härder about a similar approach.
I've read that Kafka recently added the ability to serve reads from followers that are in sync with the partition leader. What do you think about this idea? I think the backend could be replaced.
The nice thing is that in Kafka a topic would probably map to a database in my case, and a partition to a resource, which should be replicated to some nodes in a cluster.
Kind regards, and have a great Christmas time