What are the advantages/disadvantages of memory mapped files for a DBMS?

#discuss #help

Hi,

I'm developing SirixDB, a temporal, open-source data store [1], with features such as:

a storage engine written from scratch
completely isolated read-only transactions that run concurrently with one read/write transaction, with a single lock guarding the writer. Readers execute without any latches or locks and are never blocked by the read/write transaction. Likewise, the writer is never blocked by read-only transactions
variable-sized pages
lightweight buffer management with a "kind of" pointer swizzling
no need for a write-ahead log, thanks to atomic switching of an UberPage
an optional rolling Merkle hash tree of all nodes, built during updates
an ID-based diff algorithm that determines the differences between revisions, optionally taking the (secure) hashes into account
a non-blocking REST API, which also uses the hashes to throw an error if a subtree has been modified concurrently in the meantime
versioning through a huge persistent and durable, variable-sized page tree using copy-on-write
storing delta page-fragments using a patented sliding snapshot algorithm
a special trie that is especially good for storing records with dense, monotonically increasing 64-bit integer IDs. We make heavy use of bit shifting to calculate the path to fetch a record
auto-commit based on time or on a modification counter
versioned, user-defined secondary index structures
a versioned path summary
indexing of every revision, so that a timestamp is stored only once, in a RevisionRootPage. The resources stored in SirixDB are based on a huge, persistent (in the functional sense) and durable tree
sophisticated time travel queries
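
To illustrate what "bit shifting to calculate the path" means for the trie mentioned above, here is a minimal Java sketch. It is not SirixDB's actual code: the class name `TriePath`, the four levels, and the 10 bits (1024 slots) per level are assumptions chosen purely for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper: derives the slot to follow at each trie level from a record ID. */
final class TriePath {
    // Assumptions for illustration only: 4 levels with 2^10 = 1024 slots each.
    static final int BITS_PER_LEVEL = 10;
    static final int LEVELS = 4;
    static final long MASK = (1L << BITS_PER_LEVEL) - 1;

    /** Returns the slot index per level, from the root (index 0) down to the leaf. */
    static int[] offsets(long recordId) {
        if (recordId < 0 || recordId >= (1L << (BITS_PER_LEVEL * LEVELS))) {
            throw new IllegalArgumentException("record ID out of range: " + recordId);
        }
        int[] path = new int[LEVELS];
        for (int level = 0; level < LEVELS; level++) {
            // The highest-order bit group selects the slot in the root page.
            int shift = BITS_PER_LEVEL * (LEVELS - 1 - level);
            path[level] = (int) ((recordId >>> shift) & MASK);
        }
        return path;
    }

    public static void main(String[] args) {
        // 1_048_577 = 2^20 + 1
        System.out.println(Arrays.toString(offsets(1_048_577L))); // prints [0, 1, 0, 1]
    }
}
```

Because the IDs are dense and monotonically increasing, neighbouring records share almost the whole path, so no key comparisons are needed while descending.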

I've read quite a bit about memory-mapped files, but I'm still not sure in which cases a memory-mapped file would be the better choice, or whether I should map the whole file or only the (potentially rather small) page fragments.

It seems that MongoDB (with its original MMAPv1 storage engine), LMDB and other data stores are very fast partly because they use memory-mapped files.

I think memory mapping would also remove the need to cache page fragments altogether.

In my case the files can get very big, as I'm using an append-only paradigm without segment files.

So, besides wanting to better understand the impact, I wonder whether I should map the whole file or only the page-fragment regions (which could be rather small due to the fine-grained storage, in some cases perhaps only a few hundred bytes).
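
To make the two options concrete, here is a minimal Java sketch using `FileChannel.map`. It is only an illustration under stated assumptions, not SirixDB's code: the names `MmapSketch`, `mapRegion`, `mapWholeFile`, and `get` are hypothetical, and the chunk size is a free parameter. One relevant constraint is that a single `FileChannel.map(MapMode, long, long)` mapping cannot exceed `Integer.MAX_VALUE` bytes, so a very large append-only file has to be mapped as several chunks.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of the two mapping strategies discussed above. */
final class MmapSketch {

    /** Option 1: map only [offset, offset + length), e.g. a single page fragment. */
    static MappedByteBuffer mapRegion(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A mapping stays valid after the channel that created it is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
        }
    }

    /** Option 2: map the whole file as fixed-size chunks (the last one may be shorter). */
    static MappedByteBuffer[] mapWholeFile(Path file, long chunkSize) throws IOException {
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize must be in 1.." + Integer.MAX_VALUE);
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            int chunkCount = (int) ((size + chunkSize - 1) / chunkSize);
            MappedByteBuffer[] chunks = new MappedByteBuffer[chunkCount];
            for (int i = 0; i < chunkCount; i++) {
                long position = i * chunkSize;
                long length = Math.min(chunkSize, size - position);
                chunks[i] = channel.map(FileChannel.MapMode.READ_ONLY, position, length);
            }
            return chunks;
        }
    }

    /** Reads one byte at an absolute file position from the chunked mapping. */
    static byte get(MappedByteBuffer[] chunks, long chunkSize, long position) {
        return chunks[(int) (position / chunkSize)].get((int) (position % chunkSize));
    }
}
```

Mappings are established at the granularity of the operating system's memory pages (commonly 4 KiB), so mapping a fragment of only a few hundred bytes still involves at least one whole page plus the cost of creating the mapping. With the chunked variant, a fragment that straddles a chunk boundary has to be read from two buffers.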

Kind regards
Johannes

[1] http://sirix.io
