Jinbase – Multi-model transactional embedded database

#showdev #database #python #productivity

Hi Dev !

I'm Alex, a tech enthusiast. I'm excited to show you Jinbase, my multi-model transactional embedded database.

Almost a year ago, I introduced Paradict, my take on multi-format streaming serialization. Given its readability, the Paradict text format appears de facto as an interesting data format for config files. But using Paradict to manage config files would end up cluttering its programming interface and making it confusing for users who still have choices of alternative libraries (TOML, INI File, etc.) dedicated to config files. So I used Paradict as a dependency for KvF (Key-value file format), a new project of mine that focuses on config files with sections.

With its compact binary format, I thought Paradict would be an efficient dependency for a new project that would rely on I/O functions (such as Open, Read, Write, Seek, Tell and Close) to implement a minimalistic yet reliable persistence solution. But that was before I learned that "files are hard". SQLite with its transactions, BLOB data type and incremental I/O for BLOBs seemed like the right giant to stand on for my new project.

Jinbase started small as a key-value store and ended up as a multi-model embedded database that pushes the boundaries of what we usually do with SQLite. The first transition to the second data model (the depot) happened when I realized that the key-value store was not well suited for cases where a unique identifier (UID) is supposed to be automatically generated for each new record, saving the user the burden of providing an identifier that could accidentally be subject to a collision and thus overwrite an existing record. After that, I implemented a search capability that accepts UID ranges for the depot store, timespans (records are automatically timestamped) for both the depot and key-value stores and GLOB patterns and number ranges for string and integer keys in the key-value store.

The queue and stack data models emerged as solutions for use cases where records must be consumed in a specific order. A typical record would be retrieved and deleted from the database in a single transaction unit.

Since SQLite is used as the storage engine, Jinbase supports the relational model de facto. For convenience, all tables related to Jinbase internals are prefixed with jinbase_, making Jinbase a useful tool for opening legacy SQLite files to add new data models that will safely coexist with the ad hoc relational model.

All four main data models (key-value, depot, queue, stack) support Paradict-compatible data types, such as dictionaries, strings, binary data, integers, booleans, datetimes, etc. Under the hood, when the user initiates a write operation, Jinbase serializes (except for binary data), chunks, and stores the data iteratively. A record can be accessed not only in bulk, but also with two levels of partial access granularity: the byte-level and the field-level.

While SQLite's incremental I/O for BLOBs is designed to target an individual BLOB column in a row, Jinbase extends this so that for each record, incremental reads cover all chunks as if they were a single unified BLOB. For dictionary records only, Jinbase automatically creates and maintains a lightweight index consisting of pointers to root fields, which then allows extracting from an arbitrary record the contents of a field automatically deserialized before being returned.

The most obvious use cases for Jinbase are storing user preferences, persisting session data before exit, order-based processing of data streams, exposing data for other processes, upgrading legacy SQLite files with new data models and bespoke data persistence solutions.

Jinbase is written in Python, is available on PyPI and you can play with the examples on the README.

Let me know what you think of this project.

Project Link: https://github.com/pyrustic/jinbase