DEV Community

Johannes Lichtenberger

Hacktoberfest: Contribute to our temporal database system

We are a (very) small team working on a database system in our spare time (https://sirix.io | https://github.com/sirixdb/sirix).

sirixdb / sirix

SirixDB is a temporal, evolutionary database system, which uses an accumulate-only approach. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach called sliding snapshot.

An Evolutionary, Accumulate-Only Database System

Stores small-sized, immutable snapshots of your data and facilitates querying the full history

Working on your first Pull Request? You can learn how from this free series How to Contribute to an Open Source Project on GitHub and another tutorial: How YOU can contribute to OSS, a beginners guide

"Remember that you're lucky, even if you don't think you are, because there's always something that you can be thankful for." - Esther Grace Earl (http://tswgo.org)

SirixDB uses a huge persistent (in the functional sense) tree of tries, wherein the committed snapshots share unchanged pages and even common records in changed pages. The system only stores page-fragments instead of full pages during a commit to reduce write-amplification. During read operations, the system reads the page-fragments in parallel to reconstruct an in-memory page.
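To illustrate what "persistent in the functional sense" means here, the following is a minimal sketch of a tiny trie with path copying. It is not SirixDB's code: `Node`, `put`, `get`, `FANOUT` and `HEIGHT` are hypothetical names chosen for this example, and real pages hold many records rather than a single value. The point is only that an update copies the nodes on the path to the changed leaf, while every other subtree is shared between revisions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FANOUT = 4  # hypothetical: children per inner node
HEIGHT = 3  # hypothetical: inner levels, so keys range over 0..FANOUT**HEIGHT - 1


@dataclass(frozen=True)
class Node:
    """Immutable node: inner nodes use `children`, leaves use `value`."""
    children: Tuple[Optional["Node"], ...] = (None,) * FANOUT
    value: Optional[str] = None


def slot_for(key: int, level: int) -> int:
    """Index of the child to follow at `level` (HEIGHT = root level)."""
    return (key // FANOUT ** (level - 1)) % FANOUT


def put(root: Optional[Node], key: int, value: str, level: int = HEIGHT) -> Node:
    """Return a new root; only the nodes on the path to `key` are copied."""
    if level == 0:
        return Node(value=value)
    node = root if root is not None else Node()
    slot = slot_for(key, level)
    children = list(node.children)  # shallow copy: untouched subtrees are shared
    children[slot] = put(node.children[slot], key, value, level - 1)
    return Node(children=tuple(children))


def get(root: Optional[Node], key: int) -> Optional[str]:
    """Look up `key` in the revision identified by `root`."""
    node, level = root, HEIGHT
    while node is not None and level > 0:
        node = node.children[slot_for(key, level)]
        level -= 1
    return node.value if node is not None else None


# Two revisions: the second one changes key 5 only.
rev1 = put(put(None, 5, "a"), 42, "b")
rev2 = put(rev1, 5, "a2")
```

Both roots stay readable after the update: `get(rev1, 5)` still returns the old value, and the subtree holding key 42 is the very same object in `rev1` and `rev2`.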

SirixDB currently…




SirixDB began as a research system at the University of Konstanz and was the main focus of two PhD theses and several bachelor's and master's theses.

Johannes, the current maintainer, worked on the system for both his bachelor's and his master's thesis. He also contributed as a research assistant.

First, the system builds a trie-based index over all currently stored revisions. To reconstruct a revision efficiently, the timestamps and the offsets into the log file, which is the main storage, are written to a revision file. Second, the main document index is referenced from the revision roots. Furthermore, the system stores a path summary as well as secondary indexes in subtrees of the revision root pages. During a commit, the tree of in-memory indexes is mapped to a sequential log file in a postorder traversal. The parent pages store hashes of their children in the references to the child pages, which can later be used to check whether the data has been stored correctly.
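The commit step can be sketched roughly as follows. This is an illustration under several assumptions, not SirixDB's on-disk format: pages are plain dicts, records are length-prefixed JSON, the hash is SHA-256, and `commit_page`, `commit_revision`, `read_record` and `verify` are hypothetical names. It shows why a postorder traversal fits: children are appended first, so the parent can embed their offsets and hashes in its references.

```python
import hashlib
import json
import time


def commit_page(page, log):
    """Append `page` and its subtree to `log` in postorder.

    Returns the (offset, hash) reference that the parent stores.
    """
    # Children are written first so their references are known to the parent.
    refs = [commit_page(child, log) for child in page.get("children", [])]
    record = json.dumps({"data": page["data"], "refs": refs}).encode("utf-8")
    offset = len(log)
    log.extend(len(record).to_bytes(4, "big") + record)  # length prefix + body
    return offset, hashlib.sha256(record).hexdigest()


def commit_revision(root, log, revisions):
    """Write the tree and record timestamp plus root offset in the revision list."""
    offset, digest = commit_page(root, log)
    revisions.append(
        {"timestamp": time.time(), "root_offset": offset, "root_hash": digest}
    )
    return len(revisions) - 1  # revision number


def read_record(log, offset):
    """Read one length-prefixed record from the log."""
    size = int.from_bytes(log[offset:offset + 4], "big")
    return bytes(log[offset + 4:offset + 4 + size])


def verify(log, offset, expected_hash):
    """Recompute hashes top-down and compare them with the stored references."""
    record = read_record(log, offset)
    if hashlib.sha256(record).hexdigest() != expected_hash:
        return False
    return all(verify(log, o, h) for o, h in json.loads(record)["refs"])


log, revisions = bytearray(), []
tree = {"data": "root", "children": [{"data": "left"}, {"data": "right"}]}
rev = commit_revision(tree, log, revisions)
entry = revisions[rev]
intact = verify(log, entry["root_offset"], entry["root_hash"])
```

Because each reference carries the hash of the page it points to, a change to any stored byte makes `verify` fail somewhere along the path from the root to the damaged page.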

Another idea is to version the data leaf pages so that the whole page does not have to be copied during a write. Instead, a clever sliding snapshot algorithm is used to avoid the read or write peaks that would occur whenever a full snapshot of a page is written after a series of increments.
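One way to picture a sliding window over page fragments is the simplified sketch below. It is an assumption-laden illustration rather than SirixDB's actual algorithm: fragments are plain dicts, deletions are ignored, and `WINDOW`, `write_fragment` and `read_page` are hypothetical names. Each new fragment holds the records changed in that revision plus any record whose latest version would otherwise slide out of the window, so a page can always be rebuilt from at most `WINDOW` fragments and no full snapshot is ever written.

```python
WINDOW = 3  # hypothetical: maximum number of fragments read to rebuild a page


def write_fragment(fragments, changes):
    """Append the fragment for a new revision of one page.

    fragments: list of dicts (record key -> value), oldest first.
    changes:   records modified in the new revision.
    """
    fragment = dict(changes)
    if len(fragments) >= WINDOW:
        # This fragment leaves the window once the new one is appended.
        leaving = fragments[len(fragments) - WINDOW]
        # These fragments stay inside the window.
        staying = fragments[len(fragments) - WINDOW + 1:]
        for key, value in leaving.items():
            # Carry forward records that no newer fragment contains.
            if key not in fragment and not any(key in f for f in staying):
                fragment[key] = value
    fragments.append(fragment)


def read_page(fragments, revision):
    """Rebuild the page as of `revision` (0-based) from at most WINDOW fragments."""
    page = {}
    start = max(0, revision - WINDOW + 1)
    for fragment in fragments[start:revision + 1]:  # oldest to newest
        page.update(fragment)
    return page


fragments = []
write_fragment(fragments, {"a": 1, "b": 1, "c": 1})  # revision 0
write_fragment(fragments, {"a": 2})                  # revision 1
write_fragment(fragments, {"b": 3})                  # revision 2
write_fragment(fragments, {"a": 4})                  # revision 3 also carries "c"
```

In this run, record `c` is only present in the fragment of revision 0, so the write for revision 3 copies it forward together with the change to `a`. The extra write cost is spread over revisions instead of arriving as one large snapshot, and reads never touch more than `WINDOW` fragments.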

We'd be very happy to get contributions during Hacktoberfest but also in the long run.
