<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jonathan Johnson</title>
    <description>The latest articles on DEV Community by Jonathan Johnson (@ecton).</description>
    <link>https://dev.to/ecton</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F614508%2F139f19ad-c1b5-48d5-af22-be8f3d828f31.jpeg</url>
      <title>DEV Community: Jonathan Johnson</title>
      <link>https://dev.to/ecton</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ecton"/>
    <language>en</language>
    <item>
      <title>The Futility of Benchmarks</title>
      <dc:creator>Jonathan Johnson</dc:creator>
      <pubDate>Mon, 04 Oct 2021 04:00:00 +0000</pubDate>
      <link>https://dev.to/ecton/the-futility-of-benchmarks-3a78</link>
      <guid>https://dev.to/ecton/the-futility-of-benchmarks-3a78</guid>
      <description>&lt;p&gt;I'm normally someone to stress avoiding premature optimization. Unfortunately, when deciding whether to replace Sled as the storage layer for &lt;a href="https://github.com/khonsulabs/bonsaidb/" rel="noopener noreferrer"&gt;BonsaiDb&lt;/a&gt;, I needed to understand whether &lt;a href="https://github.com/khonsulabs/nebari" rel="noopener noreferrer"&gt;Nebari&lt;/a&gt; could even remotely compare to the speed of Sled. But, I also realized I didn't know how Sled compared to any other engine either. SQLite is one of those projects you always hear about being efficient, and rightfully so, so I wanted to compare Nebari against both of those projects.&lt;/p&gt;

&lt;p&gt;There &lt;a href="https://community.khonsulabs.com/t/introducing-nebari-a-key-value-data-store-written-using-an-append-only-file-format/81/2" rel="noopener noreferrer"&gt;are many other reasons&lt;/a&gt; I decided to keep developing Nebari, but today, I'm going to focus on the struggle I had getting Nebari to the point that I could write that last devlog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Initial stages of benchmarking Nebari
&lt;/h2&gt;

&lt;p&gt;From the outset of working on BonsaiDb, my only goal was to scale as well as CouchDB, as I had built my last business on it. One of the simplest things I should have done much sooner was set up a CouchDB benchmark. I had no idea how performant CouchDB was compared to any other database engine -- even after my extensive experience with it.&lt;/p&gt;

&lt;p&gt;Because CouchDB isn't as easy to set up, I started my &lt;a href="https://github.com/khonsulabs/nebari" rel="noopener noreferrer"&gt;Nebari&lt;/a&gt; benchmark suite comparing only against SQLite. After getting my initial suite working, I found that I could beat SQLite on single-row inserts and retrievals, but on larger operations, SQLite would easily beat me. I don't have good graphs, because at the time I was experimenting with making Nebari async and supporting io_uring. This is the best image I had prior to ditching async and switching to Criterion for benchmarks:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcommunity-uploads.khonsulabs.com%2Ffile%2Fkhonsulabs-community-uploads%2Foriginal%2F1X%2Ff6e851868b28d1919951a7c3fb146f4bdeff5a62.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcommunity-uploads.khonsulabs.com%2Ffile%2Fkhonsulabs-community-uploads%2Foriginal%2F1X%2Ff6e851868b28d1919951a7c3fb146f4bdeff5a62.png" alt="Nebari vs SQLite|564x320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This graph measures how long it takes to retrieve 100 records out of data sets of varying sizes. As you can see, SQLite is steady, and while I could beat it on small datasets, I wasn't happy with how this was turning out. I should have been satisfied given that the project was only two weeks old, but try as I might, I couldn't shake my disappointment with these results.&lt;/p&gt;

&lt;p&gt;After switching to Criterion, I decided it was time to benchmark Sled and CouchDB.&lt;/p&gt;

&lt;h2&gt;
  
  
  The new benchmark suite
&lt;/h2&gt;

&lt;p&gt;My &lt;a href="https://khonsulabs-storage.s3.us-west-000.backblazeb2.com/roots-bench-2021-09-07/report/index.html" rel="noopener noreferrer"&gt;initial report&lt;/a&gt; shows a pretty gruesome story for my beloved CouchDB. On every benchmark, Nebari, SQLite, and Sled each land on a different order of magnitude. For example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://khonsulabs-storage.s3.us-west-000.backblazeb2.com/roots-bench-2021-09-07/logs-gets/1000%20sequential%20elements/report/index.html" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcommunity-uploads.khonsulabs.com%2Ffile%2Fkhonsulabs-community-uploads%2Foriginal%2F1X%2F98c6f893a6787c6e1bd6cc5bcffaa3e90a60e437.png" alt="Retrieve 1 row out of 1,000 records|689x176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sled is so fast that its line doesn't even show up on the graph. Nebari is faster than SQLite in this particular benchmark, and CouchDB is off on its lonesome at just shy of a full millisecond. What was the operation? Requesting a row by its primary key.&lt;/p&gt;

&lt;p&gt;Was I happy now that I knew I was going to be able to beat CouchDB in performance? I should have been, but I knew I had a lot of performance left on the table.&lt;/p&gt;

&lt;p&gt;I continued working on Nebari, fleshing out its functionality, fixing its bugs, and eventually was able to hook up BonsaiDb atop it. It was at that point that &lt;a href="https://community.khonsulabs.com/t/introducing-nebari-a-key-value-data-store-written-using-an-append-only-file-format/81#what-caused-the-uncontrolled-memory-usage-5" rel="noopener noreferrer"&gt;I discovered that Sled wasn't the cause of the memory bug.&lt;/a&gt; If you read that post, you'll see me conclude that I'm going to keep writing Nebari for many reasons, but I didn't name speed as one of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  "I'm not a database engineer"
&lt;/h2&gt;

&lt;p&gt;Imposter syndrome is a fun thing to fight. If you read through my posts about Nebari and BonsaiDb, you'll see me asserting over and over: &lt;em&gt;I'm not a database engineer&lt;/em&gt;. A month ago, I arguably wasn't. But, I became one over the past month.&lt;/p&gt;

&lt;p&gt;A great one? Probably not, but instead of being nervous about showing people Nebari, I'm now feeling proud to have written it. What changed my mind? It all came down to the realization that benchmarks are futile.&lt;/p&gt;

&lt;p&gt;Every time I publish numbers, I make sure to reinforce something that everyone should already know: a benchmark suite is not a predictor of how your application will perform when built with the thing being benchmarked. You can pick the fastest libraries and still bring your application to a crawl with an O(n^2) algorithm.&lt;/p&gt;
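&lt;p&gt;As a quick illustration of that last point (my own toy example, not drawn from any of these benchmarks), here's the same deduplication task written two ways -- even the fastest storage engine underneath can't save the quadratic version:&lt;/p&gt;

```rust
use std::collections::HashSet;
use std::time::Instant;

// O(n^2): for each item, scan every item kept so far.
fn dedup_quadratic(items: &[u32]) -> Vec<u32> {
    let mut out: Vec<u32> = Vec::new();
    for &item in items {
        if !out.contains(&item) { // O(n) scan inside an O(n) loop
            out.push(item);
        }
    }
    out
}

// O(n): a HashSet makes each membership check O(1) on average.
fn dedup_linear(items: &[u32]) -> Vec<u32> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for &item in items {
        if seen.insert(item) { // true only the first time a value is seen
            out.push(item);
        }
    }
    out
}

fn main() {
    let data: Vec<u32> = (0..10_000).map(|i| i % 2_500).collect();

    let start = Instant::now();
    let a = dedup_quadratic(&data);
    let quadratic = start.elapsed();

    let start = Instant::now();
    let b = dedup_linear(&data);
    let linear = start.elapsed();

    assert_eq!(a, b);
    println!("quadratic: {:?}, linear: {:?}", quadratic, linear);
}
```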

&lt;p&gt;Yet, the true futility of benchmarking didn't start hitting me until I decided I wanted to set up an automated way to run benchmarks on a machine that could produce reliable results over time. I was shocked at some of the initial results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcommunity-uploads.khonsulabs.com%2Ffile%2Fkhonsulabs-community-uploads%2Foptimized%2F1X%2Fd26abe875427c4ee4d9dd11bdf0642f6a750ac25_2_690x355.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcommunity-uploads.khonsulabs.com%2Ffile%2Fkhonsulabs-community-uploads%2Foptimized%2F1X%2Fd26abe875427c4ee4d9dd11bdf0642f6a750ac25_2_690x355.png" alt="ecton's machine vs the cloud|690x355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The top graph shows &lt;a href="https://khonsulabs-storage.s3.us-west-000.backblazeb2.com/nebari-vultr-2c-4g/report/index.html" rel="noopener noreferrer"&gt;a dedicated Vultr VPS that might be a potential deployment target for us&lt;/a&gt;, and the bottom graph shows &lt;a href="https://khonsulabs-storage.s3.us-west-000.backblazeb2.com/nebari-ecton/report/index.html" rel="noopener noreferrer"&gt;results from my development machine&lt;/a&gt;. What's interesting to see is that on my local machine, all engines insert a row in less than 40 microseconds, with the quickest being Sled at around 16 microseconds.&lt;/p&gt;

&lt;p&gt;Compare that with the VPS: The only engine that completes in less than 40us is Nebari at 35.7us. Sled is 3x slower in this particular benchmark, and SQLite is really not happy running on that VPS.&lt;/p&gt;

&lt;p&gt;That moment was a turning point for me. If you clicked through the benchmarks at that stage as reported by my machine, you would most likely agree with me: I should be proud of what I pulled off in less than a month. But, if you then look at the benchmarks on the VPS, you see an even prettier picture for Nebari.&lt;/p&gt;

&lt;p&gt;For those wondering why Nebari is faster in these situations, I can only hypothesize, as I'm not that familiar with how storage works on a VPS host. My best guess is that appending to the end of a file is better optimized in these environments than whatever is needed for SQLite and Sled to update their databases (file locks? or just worse random write performance?).&lt;/p&gt;

&lt;p&gt;I'm not trying to say that these benchmarks are useless. On the contrary, they've helped me understand where I'm likely leaving performance on the table and identify some low hanging fruit already. But, no matter how good I make any of these benchmarks perform, the actual performance in the hosted environment will likely be much different than what I can simulate on my own developer machine. At the end of the day, the only way to optimize the shipping application is going to be to profile the application itself.&lt;/p&gt;

&lt;p&gt;The final nail in my imposter syndrome's coffin came yesterday when I finished &lt;a href="https://github.com/khonsulabs/bonsaidb/pull/74" rel="noopener noreferrer"&gt;switching BonsaiDb over to Nebari's transaction log&lt;/a&gt;. I measured the &lt;code&gt;save_documents&lt;/code&gt; benchmark locally, and saw that my new implementation landed slightly slower than Sled (but with full revision history supported). I then realized I never looked at the performance of &lt;code&gt;save_documents&lt;/code&gt; on a VPS before.&lt;/p&gt;

&lt;p&gt;I dug through GitHub Actions logs to see the benchmark results. After looking for the lowest numbers across several old runs, here are the fastest results compared:&lt;/p&gt;

&lt;p&gt;BonsaiDb on Sled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;save_documents/1024     time:   [388.02 us 395.56 us 403.93 us]
                        thrpt:  [2.4177 MiB/s 2.4688 MiB/s 2.5168 MiB/s]
save_documents/2048     time:   [510.57 us 523.38 us 535.91 us]
                        thrpt:  [3.6445 MiB/s 3.7317 MiB/s 3.8254 MiB/s]
save_documents/8192     time:   [578.55 us 588.99 us 599.17 us]
                        thrpt:  [13.039 MiB/s 13.264 MiB/s 13.504 MiB/s]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;BonsaiDb on Nebari:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;save_documents/1024     time:   [187.73 us 194.35 us 201.45 us]
                        thrpt:  [4.8477 MiB/s 5.0247 MiB/s 5.2020 MiB/s]
save_documents/2048     time:   [188.09 us 192.51 us 197.58 us]
                        thrpt:  [9.8850 MiB/s 10.146 MiB/s 10.384 MiB/s]
save_documents/8192     time:   [272.55 us 280.89 us 291.47 us]
                        thrpt:  [26.804 MiB/s 27.813 MiB/s 28.664 MiB/s]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It may sound silly, but seeing these results was cathartic. For a month, I was doing my best to sound confident in what I was doing, but at the end of each day, I found myself fearing that I would ultimately fail to build something that could eventually support my &lt;a href="https://github.com/khonsulabs/cosmicverge/" rel="noopener noreferrer"&gt;visions of grandeur&lt;/a&gt;. I'm confident if I had a more exhaustive benchmark suite for BonsaiDb there would be no clear winner across all measurements.&lt;/p&gt;

&lt;p&gt;But, for a project started a month ago to be in the same realm as SQLite and Sled? I'm very happy with that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Unveiling the hosted benchmark suite
&lt;/h2&gt;

&lt;p&gt;I moved on to finishing up a nice &lt;a href="https://khonsulabs-storage.s3.us-west-000.backblazeb2.com/nebari-scaleway-gp1-xs/index.html" rel="noopener noreferrer"&gt;hosted overview of benchmarks&lt;/a&gt;, which also describes what each benchmark is testing a little better than the Criterion reports do. These benchmarks are run on an instance that we've identified as a potential deployment target for Cosmic Verge, although it's still too early to know exactly what environment we'll ultimately call home.&lt;/p&gt;

&lt;p&gt;Despite the title of this post, benchmarks are still going to be a critical part of developing BonsaiDb and Nebari. It's just important to remember that benchmarks will always be limited in what they can tell you, unless the benchmark is specifically written for your particular use case and being run in exactly the same environment as your production environment.&lt;/p&gt;

&lt;p&gt;Nebari is shaping up into a neat library on its own, but I'm excited to start putting more time back into &lt;a href="https://github.com/khonsulabs/bonsaidb" rel="noopener noreferrer"&gt;BonsaiDb&lt;/a&gt; and &lt;a href="https://github.com/khonsulabs/gooey/" rel="noopener noreferrer"&gt;Gooey&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>database</category>
      <category>bonsaidb</category>
      <category>nebari</category>
    </item>
    <item>
      <title>Towards Stabilization: Serialization Format(s) for PliantDb</title>
      <dc:creator>Jonathan Johnson</dc:creator>
      <pubDate>Mon, 28 Jun 2021 19:46:57 +0000</pubDate>
      <link>https://dev.to/ecton/towards-stabilization-serialization-format-s-for-pliantdb-4c9</link>
      <guid>https://dev.to/ecton/towards-stabilization-serialization-format-s-for-pliantdb-4c9</guid>
      <description>&lt;p&gt;Last week someone interested in using &lt;a href="https://pliantdb.dev/"&gt;&lt;code&gt;PliantDb&lt;/code&gt;&lt;/a&gt; asked a question &lt;a href="https://discord.khonsulabs.com/"&gt;on our Discord server&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The current version is not yet at 1.0, and messages are everywhere to not use it, yet. What features aren't yet implemented or trusted?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because Discord isn't a great way to archive these answers publicly, I wanted to share &lt;a href="https://discord.com/channels/578968877866811403/833332909808025610/857624820337737739"&gt;my response&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In terms of what's trusted: everything. I feel confident in this implementation because of the code coverage: &lt;a href="https://pliantdb.dev/coverage"&gt;https://pliantdb.dev/coverage&lt;/a&gt; -- It's not perfect, and I'm sure there are some bugs, but the biggest concern to me is storage formats. I may replace cbor with something else, for many reasons that I'll leave outside of chat here (Dax doesn't even know that thought process yet lol). This sort of fundamental storage change would make a simple update incompatible, and that's why I'm not ready for people to adopt PliantDb aggressively yet.&lt;/p&gt;

&lt;p&gt;That being said, part of the unit tests do include testing backup/restore, and my intention is to ensure that the export format will always be able to bring you from a previous version to a current version in those situations. The gotcha right now for that feature is that the key-value store isn't backed up currently. &lt;a href="https://github.com/khonsulabs/pliantdb/issues/50"&gt;https://github.com/khonsulabs/pliantdb/issues/50&lt;/a&gt; (Discovered I overlooked that feature while hooking up those unit tests).&lt;/p&gt;

&lt;p&gt;Missing features that I'm aware of for local/embedded use: Collections don't have a List function. You can list by creating a view over the collection, but I need to add a separate List endpoint. I started this the other day, but I was hoping to do it by replacing get_multiple. I realized that approach was a bad idea from a permissions standpoint, so I reverted the changes to tackle it another day.&lt;/p&gt;

&lt;p&gt;For server/client: There isn't any multi-user support (yet). We're on the cusp of it. The certificate handling on the server portion for the QUIC protocol currently only supports pinned certificates -- the goal is for our HTTP + QUIC layers to eventually share the same certificate. For websockets, no TLS currently, and the websockets are mounted at root. Eventually they will be moved to a route on an HTTP layer that you will be able to extend with your own HTTP routes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That question spurred my brain into action, though. A few weeks ago, I had begun looking into &lt;a href="https://github.com/khonsulabs/pliantdb/issues/56"&gt;how to protect &lt;code&gt;PliantDb&lt;/code&gt; from memory exhaustion attacks&lt;/a&gt;. I knew &lt;code&gt;bincode&lt;/code&gt;'s approach, but my initial searches for mitigation strategies for &lt;code&gt;serde-cbor&lt;/code&gt; came up blank.&lt;/p&gt;

&lt;p&gt;The upshot of my searches is that there is a question about whether the current &lt;a href="https://lib.rs/crates/serde_cbor"&gt;&lt;code&gt;serde-cbor&lt;/code&gt;&lt;/a&gt; crate should be considered the mainline one, or if a newer one (&lt;a href="https://lib.rs/crates/ciborium"&gt;&lt;code&gt;Ciborium&lt;/code&gt;&lt;/a&gt;) should replace it. I should note, I haven't tested either crate against this attack, and it could be that one or both of them already mitigate it somehow. And, if either is susceptible, pull requests could address the issue. But, I wasn't sure where my efforts to investigate further should be spent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why &lt;code&gt;CBOR&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;At this point, I wanted to remind myself why I made the decisions I did. There are two types of data structures in use in &lt;code&gt;PliantDb&lt;/code&gt; that need to be serialized and deserialized: ones &lt;code&gt;PliantDb&lt;/code&gt; itself manages, and ones that users of &lt;code&gt;PliantDb&lt;/code&gt; will provide. This is where the power of &lt;code&gt;serde&lt;/code&gt; comes in: &lt;code&gt;PliantDb&lt;/code&gt; only needs the user types to implement &lt;code&gt;Serialize&lt;/code&gt; and &lt;code&gt;Deserialize&lt;/code&gt;, and they can then easily be stored in &lt;code&gt;PliantDb&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When considering storage formats for user types, it's important to think about one of the most important aspects of a database: storing and loading your data reliably. Generally speaking, a self-describing format is one that includes enough information that it can be loaded without having the original data structure for reference. &lt;a href="https://github.com/bincode-org/bincode#is-bincode-suitable-for-storage"&gt;&lt;code&gt;bincode&lt;/code&gt; has a note in its README&lt;/a&gt; discussing the limitations of using a non-self-describing format like it for storage.&lt;/p&gt;

&lt;p&gt;In short, if I want &lt;code&gt;PliantDb&lt;/code&gt; to be easy to use in a reliable fashion, user datatypes should be encoded using a self-describing format. With CouchDB, a major inspiration for &lt;code&gt;PliantDb&lt;/code&gt;, documents were stored as JSON. However, JSON isn't a particularly efficient format, and in my research, &lt;a href="https://cbor.io/"&gt;&lt;code&gt;CBOR&lt;/code&gt;&lt;/a&gt; stood out as an open-standard binary format with a reasonable amount of popularity in the Rust community.&lt;/p&gt;

&lt;p&gt;For the internal &lt;code&gt;PliantDb&lt;/code&gt; structures, I am willing to subject myself to limitations on how to manage migrating between versions of data structures. Those structures I want to serialize as quickly as possible while still providing me some flexibility. &lt;code&gt;bincode&lt;/code&gt; fits this bill perfectly. While a custom format technically could be faster, &lt;code&gt;bincode&lt;/code&gt; is very fast and well-tested.&lt;/p&gt;

&lt;p&gt;So, that's the reasoning behind picking &lt;code&gt;CBOR&lt;/code&gt; and &lt;code&gt;bincode&lt;/code&gt;. But, something rubbed me the wrong way about &lt;code&gt;CBOR&lt;/code&gt; and most other self-describing formats. Wanting to settle the only outstanding question for the storage of &lt;code&gt;PliantDb&lt;/code&gt;'s documents forced me to confront one of my only dislikes of the &lt;code&gt;CBOR&lt;/code&gt; format: its verbosity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why consider switching away from &lt;code&gt;CBOR&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;Imagine this data structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Logs&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;LogEntry&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;LogEntry&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DateTime&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Utc&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When encoding this data structure with 50 &lt;code&gt;entries&lt;/code&gt;, the identifiers &lt;code&gt;timestamp&lt;/code&gt;, &lt;code&gt;level&lt;/code&gt;, and &lt;code&gt;message&lt;/code&gt; will each appear in the resulting file 50 times.&lt;/p&gt;
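&lt;p&gt;A quick back-of-envelope (my own arithmetic, assuming each map key costs its UTF-8 length plus one framing byte, as with CBOR's short text strings) shows how quickly that adds up:&lt;/p&gt;

```rust
// Bytes spent re-encoding field names in a self-describing format,
// using the Logs example above. Assumes each map key costs its UTF-8
// length plus one framing byte (as with CBOR's short text strings).
fn repeated_name_bytes(names: &[&str], entries: usize) -> usize {
    names.iter().map(|n| n.len() + 1).sum::<usize>() * entries
}

fn main() {
    let names = ["timestamp", "level", "message"];
    // 50 entries re-encode the same ~24 bytes of identifiers each time.
    println!("{} bytes", repeated_name_bytes(&names, 50)); // prints "1200 bytes"
}
```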

&lt;p&gt;As someone who has worked on a compiler that targeted multiple executable formats, I know one of the tricks of the trade: executables include a string table that contains all of the static strings in your binary. If you use the string literal &lt;code&gt;"hello"&lt;/code&gt; in 30 files, the compiler stores it once and encodes the same address for each reference.&lt;/p&gt;

&lt;p&gt;My theory was that by generating a string table for all of the identifiers in the data structures, I could easily gain efficiency on storage while hopefully retaining similar performance to &lt;code&gt;CBOR&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;How beneficial would it be? Only one way to find out.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;code&gt;PBOR&lt;/code&gt; - a working name
&lt;/h2&gt;

&lt;p&gt;I started up a project the next day and, lacking creativity, named it &lt;code&gt;PliantDb Binary Object Representation&lt;/code&gt;, or &lt;code&gt;PBOR&lt;/code&gt;. While I named it after &lt;code&gt;CBOR&lt;/code&gt;, I genuinely came up with this format independently, and while it bears a resemblance, there are a few distinct features. First, let me state my goals explicitly upfront for this project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement a self-describing serialization format with "full" compatibility with &lt;code&gt;serde&lt;/code&gt;'s features. Essentially, design it to fit &lt;code&gt;serde&lt;/code&gt;'s design like a glove.&lt;/li&gt;
&lt;li&gt;Be safe to run in production: resilient against malicious payloads.&lt;/li&gt;
&lt;li&gt;Be compact: more compact than &lt;code&gt;CBOR&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Be efficient: roughly equivalent to the current performance of &lt;code&gt;CBOR&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tackling an &lt;code&gt;identifier&lt;/code&gt; table
&lt;/h2&gt;

&lt;p&gt;So first, a quick discussion about the practicalities of having a string/identifier table. One downside is that you don't know the size of the table until the entire data set has been scanned. This creates a conundrum: if you want the table at the start of the output, you either need to reserve an arbitrary amount of space and come back to patch it in, or you write the table after the data and include a 'jump' target in a header (requiring the ability to seek backward over your written data).&lt;/p&gt;

&lt;p&gt;The problem with both approaches is similar: the entire payload must be in memory to process the data efficiently, whether serializing or deserializing. So, as I began to think about how to design the format, I started thinking about a format that would allow me to output an identifier once and, from then on, refer to it by id.&lt;/p&gt;

&lt;p&gt;This highlighted a core design idea: each chunk of data was going to have a &lt;code&gt;kind&lt;/code&gt; and an optional &lt;code&gt;argument&lt;/code&gt;. This turns out to be another way that &lt;code&gt;CBOR&lt;/code&gt; and my format differ. In &lt;code&gt;CBOR&lt;/code&gt;, the argument is always output as a second byte (or more, depending on how large the integer value is). The way I tackled the problem requires slightly more work but appears to save storage space over time.&lt;/p&gt;

&lt;p&gt;Let's establish a new term: an Atom. In &lt;code&gt;PBOR&lt;/code&gt; an atom is an individual chunk of data. The first byte contains three pieces of information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upper nibble (&lt;code&gt;&amp;amp; 0b11110000&lt;/code&gt;): the Atom kind.&lt;/li&gt;
&lt;li&gt;Fifth bit (&lt;code&gt;&amp;amp; 0b1000&lt;/code&gt;): whether additional bytes are part of the argument.&lt;/li&gt;
&lt;li&gt;Last 3 bits (&lt;code&gt;&amp;amp; 0b111&lt;/code&gt;): the first 3 bits of the argument.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To parse the atom header: if the lower nibble is not 0, there is an argument. The last three bits are extracted, and then, if the fifth bit is set, an additional byte is read. That byte's lower 7 bits are shifted into the proper location, and if its highest bit is set, the loop continues with another byte. The maximum size for an argument is a &lt;code&gt;u64&lt;/code&gt;, which makes the maximum atom header weigh in at 10 bytes with this extra encoding.&lt;/p&gt;
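&lt;p&gt;To make that concrete, here's a sketch of that header scheme in Rust. The function names are mine, not Nebari's or &lt;code&gt;PBOR&lt;/code&gt;'s actual API, and this sketch treats an argument of 0 the same as "no argument" (a zero lower nibble):&lt;/p&gt;

```rust
/// Encodes an atom header: upper nibble = kind, fifth bit = continuation,
/// low 3 bits = first 3 bits of the argument. Remaining argument bits
/// follow 7 at a time, with each byte's high bit signalling continuation.
fn encode_atom_header(kind: u8, argument: u64, out: &mut Vec<u8>) {
    assert!(kind < 16, "kind must fit in a nibble");
    let mut remaining = argument >> 3;
    let mut first = (kind << 4) | (argument & 0b111) as u8;
    if remaining != 0 {
        first |= 0b1000; // more argument bytes follow
    }
    out.push(first);
    while remaining != 0 {
        let mut byte = (remaining & 0x7f) as u8;
        remaining >>= 7;
        if remaining != 0 {
            byte |= 0x80; // the decoding loop continues
        }
        out.push(byte);
    }
}

/// Decodes a header, returning (kind, argument, bytes consumed).
fn decode_atom_header(bytes: &[u8]) -> (u8, u64, usize) {
    let first = bytes[0];
    let kind = first >> 4;
    let mut argument = (first & 0b111) as u64;
    let mut offset = 3;
    let mut consumed = 1;
    if first & 0b1000 != 0 {
        loop {
            let byte = bytes[consumed];
            consumed += 1;
            // Shift this byte's lower 7 bits into position.
            argument |= ((byte & 0x7f) as u64) << offset;
            offset += 7;
            if byte & 0x80 == 0 {
                break;
            }
        }
    }
    (kind, argument, consumed)
}

fn main() {
    // An argument below 8 fits entirely in the single-byte header.
    let mut buf = Vec::new();
    encode_atom_header(2, 5, &mut buf);
    assert_eq!(buf.len(), 1);

    // Lengths under 1,024 (3 + 7 bits) fit in a two-byte header.
    buf.clear();
    encode_atom_header(2, 1023, &mut buf);
    assert_eq!(buf.len(), 2);

    // Larger arguments round-trip, growing one byte per 7 bits.
    buf.clear();
    encode_atom_header(2, 1_000_000, &mut buf);
    assert_eq!(decode_atom_header(&buf), (2, 1_000_000, buf.len()));

    // A full u64 argument needs at most 10 header bytes.
    buf.clear();
    encode_atom_header(2, u64::MAX, &mut buf);
    assert_eq!(buf.len(), 10);
}
```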

&lt;p&gt;However, this packing provides some interesting opportunities. The remaining three bits can hold a value of 0 through 7, and, if needed, the argument can scale up to a &lt;code&gt;u64&lt;/code&gt;. Crucially, values less than 8 can be stored in a single-byte atom header.&lt;/p&gt;

&lt;p&gt;Let's examine integers and floats. The most common sizes are all 8 bytes or less. So, if we subtract 1 from the byte count (and disallow a 0-byte integer), every integer that is a &lt;code&gt;u64&lt;/code&gt; or smaller requires only a single-byte header to denote that the atom is an integer of X bytes in size.&lt;/p&gt;

&lt;p&gt;With bytes and strings, the argument can be the length of the data. Small values still fit within a single byte, and string or byte sequences less than 1,024 bytes long fit within a two-byte header. Long story short, giving up that single bit to the continuation flag still allows most practical values to fit in an encoding one byte smaller.&lt;/p&gt;

&lt;p&gt;Finally, let's think about identifiers. In &lt;code&gt;PBOR&lt;/code&gt; there is an atom kind, Symbol. When the serializer first encounters a new identifier, it will write an atom &lt;code&gt;(Symbol, 0)&lt;/code&gt;, followed by a string atom containing the identifier. The deserializer will expect a string when it receives a 0 in the atom header. Both the serializer and deserializer assign each new identifier the next id, starting at 1 and counting upwards.&lt;/p&gt;

&lt;p&gt;When the serializer encounters an identifier it has already serialized, it will emit the symbol ID as the atom argument. The deserializer will not expect a string when it receives a non-zero argument and instead will look up the already-deserialized string.&lt;/p&gt;
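&lt;p&gt;In sketch form (again with my own names, not the real &lt;code&gt;PBOR&lt;/code&gt; internals), the two sides of that symbol table look something like this -- note that because ids are assigned in the same order on both sides, the table itself is never written to the file:&lt;/p&gt;

```rust
use std::collections::HashMap;

/// What the serializer would emit for one identifier.
#[derive(Debug, PartialEq)]
enum Emitted {
    NewSymbol(String), // atom (Symbol, 0) followed by a string atom
    KnownSymbol(u64),  // atom (Symbol, id), no string payload
}

/// Writer-side table: the first sighting of an identifier is written in
/// full; every later sighting is just its id.
#[derive(Default)]
struct SymbolWriter {
    ids: HashMap<String, u64>,
}

impl SymbolWriter {
    fn emit(&mut self, name: &str) -> Emitted {
        if let Some(&id) = self.ids.get(name) {
            Emitted::KnownSymbol(id)
        } else {
            // Ids start at 1; argument 0 is reserved for "new symbol follows".
            let id = self.ids.len() as u64 + 1;
            self.ids.insert(name.to_string(), id);
            Emitted::NewSymbol(name.to_string())
        }
    }
}

/// Reader-side table: rebuilt during deserialization by mirroring the
/// writer's assignment order.
#[derive(Default)]
struct SymbolReader {
    names: Vec<String>,
}

impl SymbolReader {
    fn resolve(&mut self, argument: u64, payload: Option<&str>) -> String {
        if argument == 0 {
            let name = payload.expect("a new symbol carries its string").to_owned();
            self.names.push(name.clone());
            name
        } else {
            self.names[(argument - 1) as usize].clone()
        }
    }
}

fn main() {
    let mut writer = SymbolWriter::default();
    assert_eq!(writer.emit("timestamp"), Emitted::NewSymbol("timestamp".into()));
    assert_eq!(writer.emit("level"), Emitted::NewSymbol("level".into()));
    // The second log entry reuses both identifiers by id.
    assert_eq!(writer.emit("timestamp"), Emitted::KnownSymbol(1));
    assert_eq!(writer.emit("level"), Emitted::KnownSymbol(2));

    let mut reader = SymbolReader::default();
    assert_eq!(reader.resolve(0, Some("timestamp")), "timestamp");
    assert_eq!(reader.resolve(0, Some("level")), "level");
    assert_eq!(reader.resolve(1, None), "timestamp");
    assert_eq!(reader.resolve(2, None), "level");
}
```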

&lt;h2&gt;
  
  
  How did the experiment go?
&lt;/h2&gt;

&lt;p&gt;Here's how the arbitrary benchmark I've chosen (&lt;a href="https://github.com/djkoloski/rust_serialization_benchmark"&gt;adapted from this project's log benchmark&lt;/a&gt;) turned out:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;Serialize (ms)&lt;/th&gt;
&lt;th&gt;Deserialize (ms)&lt;/th&gt;
&lt;th&gt;length&lt;/th&gt;
&lt;th&gt;gzip length&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;bincode&lt;/td&gt;
&lt;td&gt;0.5757&lt;/td&gt;
&lt;td&gt;2.3022&lt;/td&gt;
&lt;td&gt;741,295&lt;/td&gt;
&lt;td&gt;305,030&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pbor&lt;/td&gt;
&lt;td&gt;2.1235&lt;/td&gt;
&lt;td&gt;4.7786&lt;/td&gt;
&lt;td&gt;983,437&lt;/td&gt;
&lt;td&gt;373,654&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;serde-cbor&lt;/td&gt;
&lt;td&gt;1.4557&lt;/td&gt;
&lt;td&gt;4.7311&lt;/td&gt;
&lt;td&gt;1,407,835&lt;/td&gt;
&lt;td&gt;407,372&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;serde-json&lt;/td&gt;
&lt;td&gt;3.2774&lt;/td&gt;
&lt;td&gt;6.0356&lt;/td&gt;
&lt;td&gt;1,827,461&lt;/td&gt;
&lt;td&gt;474,358&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These numbers should be considered arbitrary by anyone reading this. &lt;code&gt;PBOR&lt;/code&gt; is not a clear winner on any given metric, but it did achieve my primary goals.&lt;/p&gt;

&lt;p&gt;Ultimately, going into the experiment I underestimated the cost of building and maintaining the symbol map during both serialization and deserialization. It took far too long to optimize it to roughly equivalent deserialization speed. I'm confident I can squeeze a little more performance out here or there, but I've stopped focusing on that for now. Instead, I wanted to openly ask: does this seem like a good idea, or should I just keep embracing &lt;code&gt;CBOR&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;Unfortunately, to give realistic practical numbers, I'll need to take this experiment further, so I'm taking this moment to pause and reflect and make sure this goal is something worth spending time on. &lt;/p&gt;

&lt;p&gt;One of the ways to prove its worth would be more benchmarks. But to benchmark the true impact on &lt;code&gt;PliantDb&lt;/code&gt;, we must consider how data flows through the database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding serialization in &lt;code&gt;PliantDb&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;At the core, the &lt;a href="https://pliantdb.dev/main/pliantdb/core/document/struct.Document.html"&gt;Document&lt;/a&gt; type contains the serialized bytes of the document being stored. This means that when saving to the database, the code connecting to the database -- not the server -- is responsible for serialization. Thus, the penalty for serialization cost lives wherever the documents are being saved from.&lt;/p&gt;

&lt;p&gt;If your View code deserializes the document on the server, the deserialization speed impacts that code's execution. However, this only affects the View updating processes and does not impact View queries themselves.&lt;/p&gt;

&lt;p&gt;The server doesn't deserialize documents for document fetches or view queries; it simply sends the serialized bytes across. Thus, the format of the data on disk directly impacts the amount of data transmitted across the network.&lt;/p&gt;
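To make that flow concrete, here's a toy sketch. These are not PliantDb's actual types or API, and the length-prefixed encoding is a stand-in for CBOR/PBOR; the point is only that the "server" handles opaque bytes while the client pays both serialization costs:

```rust
// The "server" side: stores and returns opaque bytes, never deserializing.
struct Server {
    stored: Vec<Vec<u8>>,
}

impl Server {
    fn new() -> Self {
        Server { stored: Vec::new() }
    }

    // Saving accepts already-serialized bytes from the client.
    fn save(&mut self, bytes: Vec<u8>) -> usize {
        self.stored.push(bytes);
        self.stored.len() - 1
    }

    // Fetching sends the serialized bytes back verbatim.
    fn fetch(&self, id: usize) -> &[u8] {
        &self.stored[id]
    }
}

// The client pays the serialization cost. A toy length-prefixed encoding
// stands in for a real format here.
fn serialize(name: &str) -> Vec<u8> {
    let mut bytes = vec![name.len() as u8];
    bytes.extend_from_slice(name.as_bytes());
    bytes
}

fn deserialize(bytes: &[u8]) -> String {
    let len = bytes[0] as usize;
    String::from_utf8(bytes[1..1 + len].to_vec()).unwrap()
}

fn main() {
    let mut server = Server::new();
    // Serialization happens client-side, before the save call.
    let id = server.save(serialize("Administrators"));
    // The server returns the exact bytes; deserialization is also client-side.
    assert_eq!(deserialize(server.fetch(id)), "Administrators");
}
```

Because the bytes are stored and transmitted verbatim, a more compact on-disk format shrinks network traffic for free, which is part of why the format choice matters beyond raw encode/decode speed.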

&lt;p&gt;The last thing I would find interesting to measure in real-world workloads is how often a document is serialized compared to deserialized. It seems reasonable to assume the average document is deserialized more than once for each time it's serialized. Yet not all data is the same -- many kinds of data, such as application logs, are written often and rarely read.&lt;/p&gt;

&lt;h2&gt;
  
  
  What should &lt;code&gt;PliantDb&lt;/code&gt; support?
&lt;/h2&gt;

&lt;p&gt;Because of this mixture of "who pays the cost", there may ultimately not be a single correct answer. My gut says &lt;code&gt;PBOR&lt;/code&gt; is an interesting option, but there are significant benefits to using an open standard like &lt;code&gt;CBOR&lt;/code&gt;. I don't believe either choice will significantly affect the performance of &lt;code&gt;PliantDb&lt;/code&gt; servers. Finishing up &lt;code&gt;PBOR&lt;/code&gt; would require several more days to flesh out unit tests and benchmarks and to smooth a few rough edges.&lt;/p&gt;

&lt;p&gt;As such, I'm seeking input! I'd love to hear your thoughts on self-describing format support. Here are the three options as I see them, but please leave a comment if you have other ideas.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stick with &lt;code&gt;CBOR&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PBOR&lt;/code&gt; sounds worth pursuing further&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PliantDb&lt;/code&gt; shouldn't have one enabled by default, and users should be able to pick via feature flags. Clients and servers should be able to support multiple formats at the same time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm running a poll &lt;a href="https://community.khonsulabs.com/t/towards-stabilization-serialization-format-s-for-pliantdb/71#what-should-pliantdb-support-7"&gt;on this post on our Discourse forums&lt;/a&gt;, but I would love feedback in whatever way is the easiest for you to provide it.&lt;/p&gt;

&lt;p&gt;Keep in mind that this only impacts the built-in deserialization methods. You can always interact with the document contents directly to use your own libraries.&lt;/p&gt;

&lt;p&gt;Thank you in advance for any feedback!&lt;/p&gt;

</description>
      <category>devlog</category>
      <category>rust</category>
      <category>pliantdb</category>
      <category>database</category>
    </item>
    <item>
      <title>Guaranteed unique; Or, why dogfooding can be taxing.</title>
      <dc:creator>Jonathan Johnson</dc:creator>
      <pubDate>Sun, 02 May 2021 00:00:00 +0000</pubDate>
      <link>https://dev.to/ecton/guaranteed-unique-or-why-dogfooding-can-be-taxing-2gcn</link>
      <guid>https://dev.to/ecton/guaranteed-unique-or-why-dogfooding-can-be-taxing-2gcn</guid>
      <description>&lt;p&gt;As I looked towards &lt;a href="https://dev.to/ecton/pliantdb-0-1-0-dev-3-updates-thoughts-on-the-vision-2p7l"&gt;the future of PliantDb&lt;/a&gt;, I thought my next step was to begin working on the permissions system. I've been setting a goal to try to have &lt;a href="https://github.com/khonsulabs/cosmicverge"&gt;Cosmic Verge&lt;/a&gt; running on &lt;a href="https://github.com/khonsulabs/pliantdb"&gt;PliantDb&lt;/a&gt; by Saturday so that when I give an update on the game, it will have had some meaningful progress. In reviewing my action plan, I wanted the native clients to talk to the PliantDb server directly over PubSub. To do that without fear of people doing something that could break the game, I wanted to restrict unauthenticated database connections to specific actions. For the demo, there wouldn't be any user accounts.&lt;/p&gt;

&lt;p&gt;I spent some time working on a permissions system design inspired by &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html"&gt;AWS's IAM policy system&lt;/a&gt;. I'm delighted with how that API is coming along, and I'm excited by our vision for building the permissions system in a way that makes applying it automatic and straightforward while still allowing flexibility for complicated logic as needed. But this post isn't about that -- I'll write up a summary once I've finished implementing the system. The reason for this post is a seemingly unrelated feature: unique views. As odd as it sounds, I couldn't bring myself to finish the permissions system until after I solved this problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Guaranteeing Uniqueness
&lt;/h2&gt;

&lt;p&gt;As I finished writing the lower-level part of the permissions system, I began looking at how the permissions would be managed -- through roles and groups. My approach to Cosmic Verge's development is the same as PliantDb's development: If I can design some chunk of code that can be reused over and over to build my project, I'm going to want to use that tool for the job. PliantDb's job is to store collections of data. These permission groups and roles should be implemented using the same schema objects that PliantDb users will be using: PliantDb needs to &lt;a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food"&gt;eat its own dogfood&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For these structures, I wanted a human-readable name as their unique identifier. When reading these permission structures, seeing "Administrators" as the group name instead of "21674831" is infinitely more useful. In a traditional database, the first tool I would reach for would be a &lt;code&gt;varchar&lt;/code&gt; primary key. In CouchDB, if you don't specify an ID when you create a document, it automatically generates a UUID-style ID. However, you can also specify an ID at insertion time, and it will use that ID -- and in CouchDB, that can be any JSON data type. In PliantDb, to keep things simple and efficient, I decided to restrict document IDs to &lt;code&gt;u64&lt;/code&gt;s.&lt;/p&gt;

&lt;p&gt;Another approach traditional databases offer is the "unique constraint": the ability to have the database check, before updating or inserting any data, that certain constraints hold true. For PliantDb, I had the idea of supporting "unique views," which would optionally allow any &lt;code&gt;View&lt;/code&gt; to restrict its entries to one per key. For example, in the &lt;code&gt;PermissionGroup&lt;/code&gt; collection, I could define this view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;View&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;PermissionGroupsByName&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PermissionGroup&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;'_&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nn"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MapResult&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;group&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="py"&gt;.contents&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PermissionGroup&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="nf"&gt;.emit_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;group&lt;/span&gt;&lt;span class="py"&gt;.name&lt;/span&gt;&lt;span class="nf"&gt;.to_ascii_lowercase&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whenever a new &lt;code&gt;PermissionGroup&lt;/code&gt; is inserted or updated with a key that already exists, a &lt;code&gt;UniqueKeyViolation&lt;/code&gt; error will be returned.&lt;/p&gt;

&lt;p&gt;The first approach solves a core desire of mine, but the second approach is much more versatile. Ideally, both would be supported by PliantDb.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supporting Arbitrary Primary Key Types
&lt;/h2&gt;

&lt;p&gt;I felt inspired to dive into the first approach: supporting arbitrary primary keys. I started with &lt;code&gt;Document&lt;/code&gt;, changing the &lt;code&gt;id: u64&lt;/code&gt; to &lt;code&gt;id: Vec&amp;lt;u8&amp;gt;&lt;/code&gt; to support an arbitrary number of bytes. I then added an &lt;code&gt;id()&lt;/code&gt; method, which attempts to decode the value using the &lt;code&gt;Key&lt;/code&gt; trait that Views already use. Unfortunately, this decoding can fail, so changing all of the &lt;code&gt;doc.header.id&lt;/code&gt; references to &lt;code&gt;doc.header.id()?&lt;/code&gt; started to take a toll on the readability of the code. Eventually, I decided it was too many question marks for a user to endure and backed out of this approach.&lt;/p&gt;
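A simplified sketch of why the ergonomics suffered, using invented stand-in types (not PliantDb's real ones; `decode_id` here stands in for the `Key` trait's decoding step):

```rust
use std::convert::TryInto;

// Decoding the raw ID bytes back into a u64 can fail, which is where the
// ergonomics problem comes from.
fn decode_id(bytes: &[u8]) -> Result<u64, String> {
    let bytes: [u8; 8] = bytes
        .try_into()
        .map_err(|_| String::from("id was not 8 bytes"))?;
    Ok(u64::from_be_bytes(bytes))
}

struct Header {
    id: Vec<u8>,
}

struct Document {
    header: Header,
}

impl Document {
    fn id(&self) -> Result<u64, String> {
        decode_id(&self.header.id)
    }
}

fn main() -> Result<(), String> {
    let doc = Document {
        header: Header { id: 42u64.to_be_bytes().to_vec() },
    };
    // What used to read `doc.header.id` now reads `doc.id()?` everywhere,
    // and every caller inherits the error handling.
    assert_eq!(doc.id()?, 42);
    // An ID that isn't 8 bytes surfaces as an error instead of a value.
    assert!(decode_id(&[1, 2, 3]).is_err());
    Ok(())
}
```

Every call site picking up a `?` (or an `unwrap()`) is exactly the "too many question marks" problem that made me back out.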

&lt;p&gt;Despite having rolled back my changes, I may still reattempt to support this feature using a different approach -- but it will need to involve a new Document type, one that already deserializes the ID into a generic type.&lt;/p&gt;

&lt;h2&gt;
  
  
  The challenges of dogfooding
&lt;/h2&gt;

&lt;p&gt;This moment is where the inspiration for this blog post came from: dogfooding a large project can be hard. As I stared at the back-to-square-zero code base, I was tempted to ignore the problem. After all, the only real problem arises in heavily concurrent situations, and will PliantDb users really be adjusting their permissions in contentious situations? Probably not.&lt;/p&gt;

&lt;p&gt;Each decision to push functionality down the line accrues some technical debt. PliantDb is the core of the architecture we're trying to build in Cosmic Verge. To me, the places where you want the least technical debt are the parts at the "core" of your codebase.&lt;/p&gt;

&lt;p&gt;This high amount of dogfooding will hopefully allow us to achieve these grandiose goals, but it does come at the cost of extra time spent on the core components to ensure the entire machine works. My break from the computer helped me remember a few important lessons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First lesson: don't set arbitrary deadlines when "passion" is in the project description.&lt;/strong&gt; This post started off discussing how I was trying to progress on the game itself before the next game dev meetup; yesterday was one week from that meetup. The motivation for the goal seemed innocuous: it's a game dev meetup, so I wanted to show progress on the game itself. But the reality is that there's no real pressure to do so. No one is seriously expecting an entire database engine to be done in less than two months. I knew I could hit that goal. Heck, I even think I might still hit it. But that's beside the point: setting goals is different from setting deadlines. My goal is to get Cosmic Verge on PliantDb. But if I force myself to meet that goal on a deadline, I might end up with technical debt, and worse, it might come at the cost of mental stress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second lesson: dream big, but take each day one step at a time.&lt;/strong&gt; With just PliantDb, my list of things I could tackle each day is immense. Add a game to it, and there's no way for a single person to cross the finish line for my lofty visions for both projects. The reality is that I can't make PliantDb or Cosmic Verge reach their fullest vision on my own. But, I can try my best to ensure I'm a step closer to that vision after each day I work. This works right now because I generally try to plan a few steps ahead of where I'm going. For example, right now the high-level PliantDb list is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Permissions&lt;/li&gt;
&lt;li&gt;Platform Trait&lt;/li&gt;
&lt;li&gt;Multi-user support&lt;/li&gt;
&lt;li&gt;Replication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I tend to take this approach to planning because I like to reflect on the remaining items each time I finish one. I want to be sure they still seem like the best next steps; otherwise, I might want to spend some time adjusting my plan. When I was in the moment and getting frustrated, I had lost sight of this process and was focusing on delivering an arbitrary feature set by an arbitrary date. While I could take on the stress of ensuring I have something by that date, the much healthier approach is to take each day in stride and evaluate where I'm at closer to the meetup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third lesson: Me-time is needed too.&lt;/strong&gt; If you've seen my Discord statuses over the last month, you'll know that my progress on &lt;code&gt;PliantDb&lt;/code&gt; has been made despite an increasing amount of time I've been playing Factorio. I've been having some regular gaming sessions with friends, spending time with my wife, and making progress on a re-listen of &lt;a href="https://en.wikipedia.org/wiki/The_Shadow_of_What_Was_Lost"&gt;The Licanius Trilogy&lt;/a&gt;. But, nearly every waking moment that wasn't a chore or hanging out with someone else was spent working on PliantDb.&lt;/p&gt;

&lt;p&gt;This realization hit me as I sat down to really enjoy the piano for the first time in several weeks. I play regularly, but lately it had been only for 20-30 minutes every few days. I still had fun playing, but I was usually playing to feel like I was practicing. I started the same way this weekend, and after a couple of songs, I came back and sat at the computer. I thought I was ready to tackle unique views. Yet the longer I stared at the monitor, the more I realized that playing the piano sounded better to me at that moment. I went back to the piano and played until my back was sore -- in a good way.&lt;/p&gt;

&lt;p&gt;Those moments enjoying the music for the escape it was providing made me realize I needed to relax for the rest of the day and take the artificial pressure off myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Unique Views
&lt;/h2&gt;

&lt;p&gt;Last night while enjoying my time away from the computer, I still found myself pondering the challenges of implementing unique views. It sounds simple at first glance, but it flips the responsibility of view updating on its head. Before this implementation, no views were updated immediately when you saved a document. Instead, when a view was queried, it was indexed at that time and the results returned. This means that if you update a document 5 times but only access the view once, the view's code is only evaluated once. However, for a unique view to work, document saving &lt;em&gt;must&lt;/em&gt; take on the responsibility of updating the view index.&lt;/p&gt;
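The eager check itself can be sketched with a plain in-memory map standing in for PliantDb's view storage. The types and names here are invented for illustration, not PliantDb's real implementation:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum SaveError {
    UniqueKeyViolation(String),
}

struct UniqueIndex {
    // view key -> document id
    entries: HashMap<String, u64>,
}

impl UniqueIndex {
    fn new() -> Self {
        UniqueIndex { entries: HashMap::new() }
    }

    // Unlike a lazily-updated view, this runs during every save so a
    // conflict is detected before the document is committed.
    fn check_and_insert(&mut self, key: String, doc_id: u64) -> Result<(), SaveError> {
        match self.entries.get(&key) {
            Some(existing) if *existing != doc_id => {
                Err(SaveError::UniqueKeyViolation(key))
            }
            _ => {
                self.entries.insert(key, doc_id);
                Ok(())
            }
        }
    }
}

fn main() {
    let mut index = UniqueIndex::new();
    assert!(index.check_and_insert("administrators".into(), 1).is_ok());
    // Updating the same document with the same key is fine...
    assert!(index.check_and_insert("administrators".into(), 1).is_ok());
    // ...but a different document claiming the key is rejected.
    assert_eq!(
        index.check_and_insert("administrators".into(), 2),
        Err(SaveError::UniqueKeyViolation("administrators".into()))
    );
}
```

The key difference from the lazy path is simply when this runs: at save time, inside the transaction, rather than at query time.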

&lt;p&gt;A little while ago, I noticed that you can define an associated constant in a trait, and implementors of that trait can be required to provide their own value. I haven't seen this used much in practice, but I immediately thought of using it for the view's unique flag. The inline example earlier in this post shows how this works for the &lt;code&gt;View&lt;/code&gt; trait in &lt;code&gt;PliantDb&lt;/code&gt; now. I'm not sure if I'll keep this approach or change it to a function, the way &lt;code&gt;version()&lt;/code&gt; is. For today, the constant makes more sense, but in the long term I also envision dynamic views that will need to be created at runtime.&lt;/p&gt;
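For anyone who hasn't run into the feature, here's a minimal, standalone illustration of associated constants in traits. It mirrors the shape of the earlier view example but is not PliantDb's actual trait:

```rust
// Implementors of the trait are required to supply a value for the
// associated constant, and generic code can read it at compile time.
trait View {
    const UNIQUE: bool;
    fn name(&self) -> String;
}

struct GroupsByName;

impl View for GroupsByName {
    const UNIQUE: bool = true;
    fn name(&self) -> String {
        String::from("groups-by-name")
    }
}

// Generic code can branch on the constant without a runtime method call.
fn describe<V: View>(view: &V) -> String {
    if V::UNIQUE {
        format!("{} (unique)", view.name())
    } else {
        view.name()
    }
}

fn main() {
    assert_eq!(describe(&GroupsByName), "groups-by-name (unique)");
}
```

The tradeoff versus a `fn unique(&self) -> bool` method is exactly the one mentioned above: a constant is available without an instance, but a method could return a value computed at runtime, which dynamic views would need.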

&lt;p&gt;Despite my fear of revisiting some of the first code I wrote for PliantDb, overall it was a pretty painless process. The view indexer already had the individual-document update logic in its own function, so it was easy to call it from the transaction executor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Lesson: Worrying isn't worth it
&lt;/h2&gt;

&lt;p&gt;I'm happy I got this feature done. Its journey to completion started weeks ago: when I was thinking of porting Cosmic Verge to PliantDb, I had identified this as something I would want. But each time I thought of it, I dreaded it. I worried about how annoying changing that logic was going to be. I felt like it was going to make already complex code much harder to understand.&lt;/p&gt;

&lt;p&gt;In the end, it was much easier than anticipated. And, now that it's done, I'm excited at how much less "blocked" I feel on the project. All of the worrying amounted to nothing except stress.&lt;/p&gt;

&lt;p&gt;So, instead of promising when the next update will happen, I'll just say I'm looking forward to giving an overview of the permissions system whenever I'm done. Until next time!&lt;/p&gt;

</description>
      <category>devlog</category>
      <category>rust</category>
      <category>pliantdb</category>
      <category>database</category>
    </item>
    <item>
      <title>PliantDb 0.1.0-dev.3: Updates + Thoughts on the Vision</title>
      <dc:creator>Jonathan Johnson</dc:creator>
      <pubDate>Mon, 26 Apr 2021 00:00:00 +0000</pubDate>
      <link>https://dev.to/ecton/pliantdb-0-1-0-dev-3-updates-thoughts-on-the-vision-2p7l</link>
      <guid>https://dev.to/ecton/pliantdb-0-1-0-dev-3-updates-thoughts-on-the-vision-2p7l</guid>
      <description>&lt;p&gt;When I last left off, I had reached a significant milestone for &lt;a href="https://github.com/khonsulabs/pliantdb"&gt;PliantDb&lt;/a&gt;: I had &lt;a href="https://dev.to/ecton/plaintdb-serves-another-milestone-reached-kl3"&gt;just released&lt;/a&gt; the first version of the client-server functionality. Today, I wanted to recap what I've done in the last two weeks, but more importantly, start painting a picture of where this project is going in my head. If you thought the goals in the project's README were already lofty, you're in for a journey today.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's new in PliantDb 0.1.0-dev.3
&lt;/h2&gt;

&lt;h3&gt;
  
  
  PubSub
&lt;/h3&gt;

&lt;p&gt;PliantDb now offers &lt;a href="https://pliantdb.dev/guide/about/concepts/pubsub.html"&gt;PubSub&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;subscriber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="nf"&gt;.create_subscriber&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c"&gt;// Subscribe for messages sent to the topic "ping"&lt;/span&gt;
&lt;span class="n"&gt;subscriber&lt;/span&gt;&lt;span class="nf"&gt;.subscribe_to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ping"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="nf"&gt;.publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ping"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="mi"&gt;1_u32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Got ping message: {:?}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;subscriber&lt;/span&gt;&lt;span class="nf"&gt;.receiver&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.recv_async&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key-Value Store
&lt;/h3&gt;

&lt;p&gt;I don't want any compromise on the ACID compliance of transactions in collections, yet that comes at a significant performance cost. Sometimes, you'd rather sacrifice data safety for high performance. The &lt;a href="https://pliantdb.dev/guide/traits/kv.html"&gt;Key-Value store&lt;/a&gt; aims to provide Redis-like speed and functionality in PliantDb. The current API is limited to basic set/get/delete key operations, but it supports enough atomic operations to enable using the key-value store as a synchronized lock provider. For example, executing this operation on multiple clients will result in only one client executing the isolated code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="nf"&gt;.set_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"lock-name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;my_process_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.only_if_vacant&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.expire_in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_millis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nn"&gt;KeyStatus&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Inserted&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Run the isolated code&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="mi"&gt;_&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Other client acquired the lock."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Improving the onboarding experience
&lt;/h3&gt;

&lt;p&gt;From the start of this project, I've enforced public APIs to have documentation. I've also tried to create &lt;a href="https://github.com/khonsulabs/pliantdb/tree/main/pliantdb/examples"&gt;reasonably simple examples&lt;/a&gt; of the basic functionality of PliantDb. However, for a project like this, there's a lot of use-case-specific topics that need to be covered. I decided there needed to be a book. It's still very early in progress, but it seems perfect to share at this stage of the project: &lt;a href="https://pliantdb.dev/guide/"&gt;pliantdb.dev/guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;At this stage, I wouldn't want anyone to use PliantDb in a real project yet. I think the storage mechanisms themselves are reliable and can be trusted, but I can't guarantee that the storage format will be stable between versions. Because of this harsh anti-recommendation, the guide is at a good stage for people interested in the project: it covers some high-level concepts. It also begins to explore some of the ideas I discuss later in this post -- writing those sections was an inspiration for this post.&lt;/p&gt;

&lt;p&gt;I want the user guide to cover the knowledge someone needs to feel confident being their own database administrator. It sounds daunting, but the goal of PliantDb is to make being a responsible database administrator as easy as it can be.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keeping PliantDb Modular
&lt;/h3&gt;

&lt;p&gt;While having many feature flags can be daunting, I think I've come up with a good approach to the feature flags in the "omnibus" crate. If you're just getting a project up and running, &lt;code&gt;full&lt;/code&gt; can be used to bring in everything. If you want to pick and choose, you can now enable each of these features independently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PubSub&lt;/li&gt;
&lt;li&gt;Key-Value store&lt;/li&gt;
&lt;li&gt;WebSockets&lt;/li&gt;
&lt;li&gt;trust-dns based DNS resolution on the client&lt;/li&gt;
&lt;li&gt;Command-Line structures/binary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a core goal of mine: while PliantDb is a reasonably large project, you will be able to pick and choose what you need in your database. Some functionality will need to be integrated to work optimally, but as much as possible will be kept modular.&lt;/p&gt;

&lt;p&gt;Fun fact: there are currently &lt;a href="https://github.com/khonsulabs/pliantdb/actions/runs/787744699"&gt;18 build jobs&lt;/a&gt; processed in CI to ensure each of the various valid feature flag combinations compile and pass unit tests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Redefining the Onion
&lt;/h3&gt;

&lt;p&gt;The design of PliantDb is meant to be layered, kind of like an onion. The &lt;code&gt;pliantdb-core&lt;/code&gt; is the core of the proverbial onion, and the first layer around it is &lt;code&gt;pliantdb-local&lt;/code&gt;. Before today, here's how I described the layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;local&lt;/code&gt;: Single-database storage mechanism, comparable to SQLite&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server&lt;/code&gt;: Multi-database networked server, comparable to CouchDB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In discussing the plans I'm about to unveil, I realized I had gone too far in mimicking CouchDB's design. I had implemented the multi-database abstraction as a server-level operation -- the server doesn't really care about the databases; it just organizes multiple databases together. But CouchDB is only accessible via HTTP, unlike PliantDb. In PliantDb, your code can run in the same executable as the database.&lt;/p&gt;

&lt;p&gt;Because of this, a completely offline multi-database storage mechanism is a very valid use case. Suppose you are running a single-machine setup and don't need any other access to the database. In that case, you should be able to utilize all of PliantDb's features that make sense: multiple databases, the key-value store, PubSub, and more to come. This realization had me commit a &lt;a href="https://github.com/khonsulabs/pliantdb/pull/49"&gt;massive refactoring&lt;/a&gt; defining the layers as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;local&lt;/code&gt;: The &lt;code&gt;Storage&lt;/code&gt; type provides multi-database management, and the &lt;code&gt;Database&lt;/code&gt; type provides access to a single database.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server&lt;/code&gt;: The &lt;code&gt;Server&lt;/code&gt; type uses a &lt;code&gt;Storage&lt;/code&gt; instance internally and allows accessing it over a network.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a testament to this being the correct design decision, I was able to remove many internal APIs that were needed to support the Server before. While it was a painstaking process, I'm pleased with the outcome.&lt;/p&gt;
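The new layering can be sketched with drastically simplified stand-in types. This is not the real PliantDb API; it only illustrates the relationship between the layers:

```rust
use std::collections::HashMap;

struct Database {
    documents: Vec<String>,
}

// `local` layer: multi-database management works entirely offline.
struct Storage {
    databases: HashMap<String, Database>,
}

impl Storage {
    fn new() -> Self {
        Storage { databases: HashMap::new() }
    }

    fn create_database(&mut self, name: &str) {
        self.databases.insert(
            name.to_string(),
            Database { documents: Vec::new() },
        );
    }

    fn database(&mut self, name: &str) -> Option<&mut Database> {
        self.databases.get_mut(name)
    }
}

// `server` layer: only adds networking around an ordinary Storage; it does
// not reimplement database management.
struct Server {
    storage: Storage,
}

fn main() {
    // Offline, multi-database use without any server in the picture.
    let mut storage = Storage::new();
    storage.create_database("default");
    storage
        .database("default")
        .unwrap()
        .documents
        .push(String::from("doc"));
    assert_eq!(storage.database("default").unwrap().documents.len(), 1);

    // The networked layer simply wraps the same storage type.
    let _server = Server { storage };
}
```

Pushing multi-database management down into the `local` layer is what let the server-specific internal APIs disappear.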

&lt;h3&gt;
  
  
  Fixed backup/restore
&lt;/h3&gt;

&lt;p&gt;As part of the previous pull request, there was a fix to the backup process. The bug wasn't related to the safety of the data but rather that I wasn't saving the executed transaction metadata. At the time, that was a design decision, but I didn't test well enough. It wasn't until the multi-database implementation used a view query under the hood that an &lt;code&gt;expect()&lt;/code&gt; failed in the view indexer, which had made an entirely reasonable assumption: if there were documents, there must be a transaction ID.&lt;/p&gt;

&lt;p&gt;As I thought about my original decision, I realized I was deeply mistaken. Not saving the transaction information breaks a restored database's ability to keep replication history. So, now that I've updated backup/restore to work across multiple databases (another side effect of this design decision) and included transaction information, here's what it looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nG7RG8AY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://ecton.dev/pliantdb-vision-backup.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nG7RG8AY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://ecton.dev/pliantdb-vision-backup.png" alt="Backup File Listing"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The top-level directories, &lt;code&gt;admin&lt;/code&gt; and &lt;code&gt;default&lt;/code&gt;, are the two databases exported in this example. The &lt;code&gt;admin&lt;/code&gt; database is the internal database used to track the databases that have been created. &lt;code&gt;default&lt;/code&gt; is the name given to a database created for you automatically during &lt;code&gt;Database::open_local&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Inside of each database folder is a &lt;code&gt;_transactions&lt;/code&gt; folder. Each file is a single &lt;a href="https://pliantdb.dev/main/pliantdb/core/transaction/struct.Executed.html"&gt;&lt;code&gt;Executed&lt;/code&gt;&lt;/a&gt; transaction.&lt;/p&gt;

&lt;p&gt;All of the remaining folders will be &lt;a href="https://pliantdb.dev/guide/about/concepts/collection.html"&gt;&lt;code&gt;Collections&lt;/code&gt;&lt;/a&gt; of documents. Each file is named using the document ID and the revision number. The contents of the file are the exact bytes that were stored in the document, which usually means it's encoded as &lt;a href="https://cbor.io/"&gt;CBOR&lt;/a&gt;. But, you can manage the document bytes directly if you desire.&lt;/p&gt;

&lt;h2&gt;What's the end goal of PliantDb?&lt;/h2&gt;

&lt;p&gt;As odd as it may sound, I'm writing PliantDb to power a &lt;a href="https://cosmicverge.com/"&gt;game I'm writing&lt;/a&gt;. As I mentioned in my last post, the game is currently using PostgreSQL and Redis, and the changes above were all inspired by thinking about what I need in order to update Cosmic Verge to use PliantDb instead of those two engines.&lt;/p&gt;

&lt;p&gt;Once I finished the key-value store, I found myself ready to start on that task! But as I tried to figure out where to begin the refactoring, I realized I had been having grandiose visions for PliantDb that I thought were unrelated to Cosmic Verge... only now, the more I thought about them, the more relevant they seemed.&lt;/p&gt;

&lt;p&gt;I'm going to start with the conclusion: PliantDb's ultimate form is a platform to help you build a modern Rust-y app. For Cosmic Verge, it will be what game clients connect to over the internet, and it's what our internal API will be powered by. To support this safely, a robust permissions model will be needed. But, rest assured, if all you want is a local database with minimal features, you'll be able to get just that and no more.&lt;/p&gt;

&lt;p&gt;To understand why this is the logical conclusion of multiple days of conversations on Discord, we first need to look at the goals of the Cosmic Verge architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We want to have a large number of "locations" with independent sets of data and regular-interval game loops.&lt;/li&gt;
&lt;li&gt;We want to have a cluster that can scale up and down as needed to meet demand. This means dynamically moving locations between servers as load changes.&lt;/li&gt;
&lt;li&gt;We want to have every location be configured in a highly-available setup. If one server fails, clients should barely notice a hiccup (the only hiccup being if they dropped their connection).&lt;/li&gt;
&lt;li&gt;Every server will have PliantDb data on it, but we want custom logic driving placement of data/tasks within the cluster. We want to be able to use metrics to balance load intelligently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of these basic facts, we concluded that every Cosmic Verge server was going to be part of the PliantDb cluster. And, if each server was going to be connected via PliantDb anyway, could we solve our networking problems better by implementing a separate protocol? In the end, we decided we couldn't. More importantly, as we reviewed the features Cosmic Verge needed for clustering against the features PliantDb required, we realized the overlap was too significant to ignore.&lt;/p&gt;

&lt;p&gt;Why is this better than using some other database cluster? It boils down to how PliantDb works in the server's executable. Each instance of the Cosmic Verge server will open a PliantDb server in cluster mode. When the server's code calls into the cluster, it will know what servers contain the data in question. For a PubSub message, for example, it knows precisely which servers have any subscribers listening to the topic of the message being published. Because of this knowledge, a PubSub message sent through the PliantDb cluster will be a direct message between two servers in the same cluster. The same knowledge also works for all database operations. If you need a quorum write to succeed, and you're one of the three servers in that particular database shard's cluster, only two network requests are sent. Or, if you ask for a cached view result, your local server instance will return the data without making a network request if it can.&lt;/p&gt;
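&lt;p&gt;To make the quorum-write arithmetic concrete, here's a minimal, hypothetical sketch (none of these names are PliantDb APIs): when the local node is one of a shard's replicas, its own write costs nothing over the network, so only the remaining replicas need to be contacted.&lt;/p&gt;

```rust
/// Illustrative only: given a shard's replica set and the identity of the
/// node executing the write, count the network requests required to reach
/// every replica. The local replica writes directly, with no round trip.
fn network_requests_for_write(replicas: &[&str], local_node: &str) -> usize {
    if replicas.contains(&local_node) {
        // We hold one of the replicas ourselves; contact only the others.
        replicas.len() - 1
    } else {
        // Not a replica: every write goes over the network.
        replicas.len()
    }
}
```

This is the "only two network requests" case from above: a three-server shard where the caller is one of the three.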

&lt;p&gt;But, what about the actual game API? How is PliantDb going to help with that? Let me introduce a project that I haven't updated in a little while: &lt;a href="https://github.com/khonsulabs/basws"&gt;basws&lt;/a&gt;. This is the project that Cosmic Verge is currently built on. The main idea is to provide a simple way to create API servers, abstracting the authentication/re-authentication logic as much as possible. As I started envisioning how I would integrate PliantDb with it, I realized that I wanted PliantDb itself to have some of this functionality. It wouldn't be hard to add it into PliantDb and give it direct support for the Users and Permissions models. A clear win for Cosmic Verge, and hopefully for a lot of other developers too.&lt;/p&gt;

&lt;h2&gt;What's next?&lt;/h2&gt;

&lt;p&gt;I'm hoping to demo a native-client version of Cosmic Verge, powered by PliantDb, at next month's &lt;a href="https://www.youtube.com/watch?v=gqCxt8XL92o"&gt;Rust Game Dev Meetup&lt;/a&gt;, but to do that I need a few more things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Permissions: I don't want to allow people to connect to a PliantDb server that has no concept of permissions.&lt;/li&gt;
&lt;li&gt;basws-like API layer: This layer will be defined as a trait that you will be able to optionally provide on the &lt;code&gt;Server&lt;/code&gt; and (eventually) &lt;code&gt;Cluster&lt;/code&gt; types.&lt;/li&gt;
&lt;li&gt;Users: needed if I want to support logging in, although for the demo I might simply give each player a unique random color.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The next meetup is on May 8. I'm hopeful, but there's a lot of work to do. And, I keep finding myself writing very long blog posts!&lt;/p&gt;

&lt;p&gt;As always, thank you for reading. I hope you're interested in &lt;a href="https://github.com/khonsulabs/pliantdb"&gt;PliantDb&lt;/a&gt;. If you'd like to join the development conversations, join our &lt;a href="https://discord.khonsulabs.com"&gt;Discord&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>pliantdb</category>
      <category>database</category>
    </item>
    <item>
      <title>PliantDB Serves - another milestone reached</title>
      <dc:creator>Jonathan Johnson</dc:creator>
      <pubDate>Wed, 14 Apr 2021 17:53:00 +0000</pubDate>
      <link>https://dev.to/ecton/plaintdb-serves-another-milestone-reached-kl3</link>
      <guid>https://dev.to/ecton/plaintdb-serves-another-milestone-reached-kl3</guid>
      <description>&lt;p&gt;It's been a productive couple of weeks since I &lt;a href="https://ecton.dev/introducing-pliantdb/"&gt;introduced PliantDB&lt;/a&gt;. I merged the &lt;a href="https://github.com/khonsulabs/pliantdb/pull/28"&gt;pull request enabling client/server communications&lt;/a&gt;. The journey took a little longer than I had anticipated, but that's for a few reasons. Ultimately, I want to stress something: You can be extremely productive in Rust.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/XE0lH0tlbBs?start=1647"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you just want to learn how PliantDB's engine works, &lt;a href="https://ecton.dev/introducing-pliantdb/"&gt;my previous post&lt;/a&gt; goes into more detail. Or, you can listen to my talk at last Saturday's Rust Game Dev Meetup, embedded above. Today, I'm going to talk more about the process of developing.&lt;/p&gt;

&lt;h2&gt;My journey from Rust-noob to when I began PliantDB&lt;/h2&gt;

&lt;p&gt;My Rust journey began a few years ago when I haphazardly threw together a small tool to wait for AWS CloudFormation stacks to reach a "complete" state. The official AWS CLI application allows you to wait for a single state, such as "UPDATE_COMPLETE," but not for one of many states (or any state matching a COMPLETE-like status). So, I wrote a simple tool using &lt;a href="https://lib.rs/rusoto"&gt;rusoto&lt;/a&gt;. I liked the idea of Rust, but it didn't click for me yet. Stubborn me didn't actually read the book.&lt;/p&gt;

&lt;p&gt;Fast forward to when I was daydreaming about quitting my day job to pursue game development. By that point, I firmly believed Rust was a big deal, but I still hadn't done anything beyond that simple tool. It wasn't until I quit my job in November 2019 that I started diving into Rust full time.&lt;/p&gt;

&lt;p&gt;PliantDB's initial commit was on &lt;a href="https://github.com/khonsulabs/pliantdb/commit/43bd3a25b61fc7841c9554422d7bb46ad4362e59"&gt;Friday, March 19&lt;/a&gt;. I know I began writing code that morning because I kicked off the day by having a conversation with one of my former business partners: "You'll never guess what I'm seriously thinking of doing after we end our call."&lt;/p&gt;

&lt;p&gt;When I told him, "I'm going to write my own CouchDB-like database," he protested in the fashion he always would as we debated ideas back when we ran our business together. Within a few minutes, I had sold him on the idea, which gave me the last boost of confidence I needed to embark on what most developers would consider a foolish endeavor.&lt;/p&gt;

&lt;h2&gt;Tackling async compatibility issues&lt;/h2&gt;

&lt;p&gt;I settled on &lt;a href="https://sled.rs"&gt;sled&lt;/a&gt; after evaluating the landscape of available BTree-like data storage layers. It's a complex project, but it's well-tested and fairly widely used. From the initial moments of designing this architecture, I was thinking of how to fit it within sled to utilize its transactions to ensure ACID compliance.&lt;/p&gt;

&lt;p&gt;This fundamental decision wasn't without downsides, the primary one being that sled isn't "compatible" with async/await in Rust. If you're trying to integrate it within an app that uses tokio, for example, you either need to operate sled in its own thread pool outside of tokio, or you need to use blocking wrappers such as &lt;code&gt;spawn_blocking&lt;/code&gt;. These come with their own downsides: a long-running blocking task can prevent other tasks scheduled on the blocked thread from executing.&lt;/p&gt;
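&lt;p&gt;To illustrate the blocking-wrapper idea without pulling in tokio, here's a std-only sketch of the pattern: the blocking call runs on a dedicated thread and the caller receives the result over a channel. In the real code this role is played by tokio's &lt;code&gt;spawn_blocking&lt;/code&gt;; &lt;code&gt;slow_get&lt;/code&gt; is a hypothetical stand-in for a blocking sled lookup.&lt;/p&gt;

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a blocking storage call (e.g. a sled lookup
// that does disk I/O).
fn slow_get(key: &str) -> Option<String> {
    if key == "hello" {
        Some("world".into())
    } else {
        None
    }
}

// Offload the blocking call to its own thread so the caller's (executor)
// thread stays free; the result comes back over a channel. Note the owned
// `String` parameter: moving owned data into the worker thread is the same
// reason `spawn_blocking` imposes a `'static` requirement.
fn get_off_thread(key: String) -> mpsc::Receiver<Option<String>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(slow_get(&key));
    });
    rx
}
```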

&lt;p&gt;For today, I've chosen my best guess as to the best type of blocking wrapper for each operation, but the long-term goal is utilizing a new async executor that &lt;a href="https://github.com/daxpedda"&gt;Daxpedda&lt;/a&gt; is working on. It's compatible with tokio, but it already has a concept named &lt;code&gt;block_on_blocking&lt;/code&gt;, an optimized version of blocking designed to block more fairly without imposing the &lt;code&gt;'static&lt;/code&gt; lifetime requirement that comes with using &lt;code&gt;spawn_blocking&lt;/code&gt;. He's responsible for the &lt;a href="https://github.com/khonsulabs/fabruic/"&gt;QUIC-based networking stack&lt;/a&gt; PliantDB is using and is wrapping up a few last requests there before resuming work on the executor.&lt;/p&gt;

&lt;h2&gt;Complexities of supporting a rich type system over a network&lt;/h2&gt;

&lt;p&gt;The second major battle was something I hadn't fully comprehended when I started: How do you deal with types in a safe way while only exchanging bytes between a client and server? In my head, I knew &lt;a href="https://serde.rs"&gt;serde&lt;/a&gt; was going to be a big part of the solution, but I didn't quite realize the levels of indirection I was going to need.&lt;/p&gt;

&lt;p&gt;Let's take a look at an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="n"&gt;db&lt;/span&gt;
 &lt;span class="py"&gt;.view&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ShapesByNumberOfSides&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
 &lt;span class="nf"&gt;.with_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="nf"&gt;.query_with_docs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
 &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code could be running on a client talking to a remote database, or against a local PliantDB in a form akin to SQLite. This is meant to be one of PliantDB's selling points, but making it work is rather tricky. Here's how it works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.rs/pliantdb/0.1.0-dev-2/pliantdb/core/connection/trait.Connection.html#method.view"&gt;&lt;code&gt;db.view::&amp;lt;ShapesByNumberOfSides&amp;gt;()&lt;/code&gt;&lt;/a&gt; returns a &lt;a href="https://docs.rs/pliantdb/0.1.0-dev-2/pliantdb/local/core/connection/struct.View.html"&gt;View&lt;/a&gt;, which acts as a &lt;a href="https://doc.rust-lang.org/1.0.0/style/ownership/builders.html"&gt;builder&lt;/a&gt; for accessing a view. &lt;a href="https://docs.rs/pliantdb/0.1.0-dev-2/pliantdb/local/core/connection/struct.View.html#method.with_key"&gt;&lt;code&gt;with_key(3)&lt;/code&gt;&lt;/a&gt; sets the &lt;code&gt;key&lt;/code&gt; field of the &lt;code&gt;View&lt;/code&gt; to &lt;a href="https://docs.rs/pliantdb/0.1.0-dev-2/pliantdb/core/connection/enum.QueryKey.html"&gt;&lt;code&gt;QueryKey::Matches(3_u32)&lt;/code&gt;&lt;/a&gt;. Finally, &lt;a href="https://docs.rs/pliantdb/0.1.0-dev-2/pliantdb/local/core/connection/struct.View.html#method.query_with_docs"&gt;&lt;code&gt;query_with_docs()&lt;/code&gt;&lt;/a&gt; simply calls &lt;a href="https://docs.rs/pliantdb/0.1.0-dev-2/pliantdb/core/connection/trait.Connection.html#tymethod.query_with_docs"&gt;&lt;code&gt;Connection::query_with_docs()&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's look at it from the Client's perspective. Following it along the route to the server will show the complexities I had to navigate and the power of Rust each step along the way.&lt;/p&gt;

&lt;p&gt;On the client, &lt;code&gt;db&lt;/code&gt; is a &lt;code&gt;RemoteDatabase&amp;lt;Schema&amp;gt;&lt;/code&gt;. This implements &lt;code&gt;Connection&lt;/code&gt; and converts the parameters &lt;code&gt;Option&amp;lt;QueryKey&amp;lt;u32&amp;gt;&amp;gt;&lt;/code&gt; and &lt;code&gt;AccessPolicy&lt;/code&gt; into &lt;a href="https://docs.rs/pliantdb-local/0.1.0-dev-2/pliantdb_local/core/networking/enum.Request.html#variant.Database"&gt;&lt;code&gt;Request::Database&lt;/code&gt;&lt;/a&gt; &lt;code&gt;{ database: "dbname", request:&lt;/code&gt; &lt;a href="https://docs.rs/pliantdb-local/0.1.0-dev-2/pliantdb_local/core/networking/enum.DatabaseRequest.html#variant.Query"&gt;&lt;code&gt;DatabaseRequest::Query&lt;/code&gt;&lt;/a&gt; &lt;code&gt;{ view: "view-name", key: Option&amp;lt;QueryKey&amp;lt;Vec&amp;lt;u8&amp;gt;&amp;gt;&amp;gt;, access_policy, with_docs: true } }&lt;/code&gt;.&lt;/p&gt;
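&lt;p&gt;The key-erasure step can be sketched with simplified stand-in types (these are not PliantDB's actual definitions): the typed key becomes plain bytes before it crosses the wire.&lt;/p&gt;

```rust
// Simplified stand-in for the QueryKey type discussed above. The real type
// is generic over any serializable key; this sketch handles only u32, using
// big-endian bytes to keep it dependency-free (PliantDB would go through
// serde-based serialization instead).
#[derive(Debug, PartialEq)]
enum QueryKey<T> {
    Matches(T),
}

impl QueryKey<u32> {
    /// Erase the key's type so the network layer only ever sees bytes.
    fn into_bytes(self) -> QueryKey<Vec<u8>> {
        match self {
            QueryKey::Matches(key) => QueryKey::Matches(key.to_be_bytes().to_vec()),
        }
    }
}
```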

&lt;p&gt;Once the request is in this enum, it can be sent across the wire via QUIC or WebSockets. The server receives it at a layer that has no generic types in its signatures. So, we must design a way to talk to our &lt;code&gt;Storage&amp;lt;Schema&amp;gt;&lt;/code&gt; without the &lt;code&gt;&amp;lt;Schema&amp;gt;&lt;/code&gt; part!&lt;/p&gt;

&lt;p&gt;This is done using an internal trait, &lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/server/src/server.rs#L704"&gt;&lt;code&gt;OpenDatabase&lt;/code&gt;&lt;/a&gt;, which the server implements for &lt;code&gt;Storage&amp;lt;Schema&amp;gt;&lt;/code&gt;. This is the first layer: it allows the network code to &lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/server/src/server.rs#L819"&gt;invoke &lt;code&gt;query_with_docs&lt;/code&gt;&lt;/a&gt; with the view's name rather than the view's type. It then looks up an &lt;a href="https://docs.rs/pliantdb-core/0.1.0-dev-2/pliantdb_core/schema/view/trait.Serialized.html"&gt;abstracted version&lt;/a&gt; of that view, which automatically serializes and deserializes at its access points. These are the same conversion mechanisms used when the view entries were initially created while indexing these views.&lt;/p&gt;
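&lt;p&gt;Here's a hedged, dependency-free sketch of that erasure trick, with invented names: the generic storage type implements a non-generic, object-safe trait, so the network layer can hold any schema behind &lt;code&gt;Box&amp;lt;dyn OpenDatabase&amp;gt;&lt;/code&gt; and route requests by view name instead of view type.&lt;/p&gt;

```rust
use std::collections::HashMap;

// Invented, non-generic trait: the network layer can call this without
// knowing the schema type, because requests carry the view's *name*.
trait OpenDatabase {
    fn query_by_view_name(&self, view: &str) -> Option<Vec<u8>>;
}

// Invented stand-ins for the generic storage and a schema type.
struct Storage<S> {
    _schema: S,
}

struct ShapesSchema;

impl OpenDatabase for Storage<ShapesSchema> {
    fn query_by_view_name(&self, view: &str) -> Option<Vec<u8>> {
        // A real implementation would look up the serialized view and run
        // the query; this stub just recognizes one view name.
        if view == "shapes-by-number-of-sides" {
            Some(vec![3])
        } else {
            None
        }
    }
}

// The schema type parameter is erased behind the trait object, so databases
// with different schemas can live in one map keyed by name.
fn open_databases() -> HashMap<String, Box<dyn OpenDatabase>> {
    let mut dbs: HashMap<String, Box<dyn OpenDatabase>> = HashMap::new();
    dbs.insert("dbname".into(), Box::new(Storage { _schema: ShapesSchema }));
    dbs
}
```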

&lt;p&gt;Finally, once the response is retrieved, the journey happens in reverse, going through &lt;a href="https://docs.rs/pliantdb-local/0.1.0-dev-2/pliantdb_local/core/networking/enum.Response.html#variant.Database"&gt;&lt;code&gt;Response::Database&lt;/code&gt;&lt;/a&gt;(&lt;a href="https://docs.rs/pliantdb-local/0.1.0-dev-2/pliantdb_local/core/networking/enum.DatabaseResponse.html#variant.ViewMappingsWithDocs"&gt;&lt;code&gt;DatabaseResponse::ViewMappingsWithDocs()&lt;/code&gt;&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;To me, it's incredible the lengths you can go to in Rust to transparently handle native types in user code. One of my initial goals for PliantDB has been achieved: writing local and remote code using async/await without needing to care whether the data is local or not.&lt;/p&gt;

&lt;h2&gt;Multi-tasking is challenging sometimes, even with Rust&lt;/h2&gt;

&lt;p&gt;Ultimately, providing a WebSocket implementation wasn't a goal for the first pass of the server. I had a goal to present at the Game Dev Meetup this past Saturday, and I really wanted to have a working client/server, but Daxpedda and I were having trouble with some of our code. It was becoming tough to isolate whether the networking code or PliantDB was to blame.&lt;/p&gt;

&lt;p&gt;That's when I decided to add WebSockets. I was pretty confident I wanted them long-term anyway, and they gave me a familiar, uncomplicated protocol with which to verify the server's functionality. I found bugs in my PliantDB code pretty quickly, but I was left with two peculiar issues.&lt;/p&gt;

&lt;p&gt;First, I was becoming more and more confident that the channel library Daxpedda and I fell in love with, &lt;a href="https://lib.rs/flume"&gt;flume&lt;/a&gt;, was misbehaving, but I couldn't seem to reproduce it outside of the massive PliantDB codebase. I finally called up Daxpedda on Discord and screen shared my debugging session, showing him how the tests succeeded if I retained a channel. If I allowed the sender to drop after successfully sending, sometimes the tests would fail. He agreed, something was odd. It took me a while, but I finally whittled it down to about 30 lines of code and &lt;a href="https://github.com/zesterer/flume/issues/78"&gt;reported the issue&lt;/a&gt;. In an amazingly quick fashion, the maintainer fixed the issue and released an update. And for the record, I still fully love and recommend this library if you're mixing async and non-async code using channels. It's a wonderful implementation.&lt;/p&gt;

&lt;p&gt;The second issue was that when I ran my unit test suite as a whole, it would sometimes succeed, but more often than not, after a random number of tests, all of the rest would fail. This ended up being my own stupidity. When I was writing the unit tests for the client, I thought to myself, "If I create one shared server, I can test the server differently by running each client test suite against its own database on a single server." I liked the idea, but I hadn't thought through how to achieve it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Pro-tip: &lt;code&gt;#[tokio::test]&lt;/code&gt; creates a unique tokio runtime for each test.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When spawning the server, I was spawning it in a runtime that would dutifully get destroyed once the test completed. Whatever other tests happened to finish before the server was destroyed would get green marks, and the rest would start getting connection refusals.&lt;/p&gt;

&lt;p&gt;Of course, this manifested itself in fun ways in my code -- channels suddenly disconnecting, often without any errors displaying anywhere!&lt;/p&gt;

&lt;p&gt;So, remember: when writing async tests, anything you &lt;code&gt;spawn&lt;/code&gt; into your async runtime will not outlive the current unit test. In this case, I decided to move that style of test into an integration-style test, to keep the "unit" nature more accurate.&lt;/p&gt;

&lt;h2&gt;Sharing Unit Tests&lt;/h2&gt;

&lt;p&gt;One of the neat results of using the same trait to implement the database interface for Client/Server/Local is that a common unit-testing suite could be written once and reused:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/core/src/test_util.rs#L284"&gt;&lt;code&gt;pliantdb-core::test_util::define_connection_test_suite!&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/local/src/tests.rs#L34"&gt;&lt;code&gt;pliantdb-local::tests&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/server/src/tests.rs#L38"&gt;&lt;code&gt;pliantdb-server::tests&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/client/src/tests.rs#L92"&gt;&lt;code&gt;pliantdb-client::tests::websockets&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/client/src/tests.rs#L136"&gt;&lt;code&gt;pliantdb-client::tests::pliant&lt;/code&gt;&lt;/a&gt; (the QUIC-based connection tests)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means that as more database functionality is added, it can be added to the common test suite and automatically tested across all layers of PliantDB. Once clustering support is added, the same suite will be tested there also.&lt;/p&gt;
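&lt;p&gt;The mechanics of a shared suite can be sketched in a few lines (illustrative names only; PliantDB's real suite is the macro linked above): write the tests once against the common trait, then run them against each implementation.&lt;/p&gt;

```rust
// Invented, minimal version of the shared-suite idea: every backend
// implements one trait, and the suite is written against the trait.
trait Connection {
    fn set(&mut self, key: &str, value: &str);
    fn get(&self, key: &str) -> Option<String>;
}

// One of potentially many backends (local, client, server...). Here, an
// in-memory map stands in for a real storage layer.
struct MemoryConnection(std::collections::HashMap<String, String>);

impl Connection for MemoryConnection {
    fn set(&mut self, key: &str, value: &str) {
        self.0.insert(key.into(), value.into());
    }
    fn get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
}

// The suite is written once; any implementation of `Connection` can run it.
// PliantDB achieves the same reuse with a declarative macro.
fn run_connection_suite<C: Connection>(conn: &mut C) {
    conn.set("hello", "world");
    assert_eq!(conn.get("hello"), Some("world".to_string()));
    assert_eq!(conn.get("missing"), None);
}
```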

&lt;h2&gt;Being Productive&lt;/h2&gt;

&lt;p&gt;This morning, I decided I wanted to write an example for the PliantDB server. At the end of the day, I wasn't happy with the interaction-less result I could build with the current functionality, so &lt;a href="https://github.com/khonsulabs/pliantdb/issues/22"&gt;I added &lt;code&gt;reduce_grouped()&lt;/code&gt;&lt;/a&gt;. I marveled with Daxpedda on Discord after looking at the diff: &lt;a href="https://github.com/khonsulabs/pliantdb/commit/c14c6d7bf1dfe4977a42a8f89c3275721c744114"&gt;19 files, +494, -90&lt;/a&gt;. It took me about an hour from introducing my first compilation error to getting it compiling. I added a couple of unit tests to the existing suite, and it all worked.&lt;/p&gt;

&lt;p&gt;This is a regular occurrence for me with Rust. Yes, I can tell you about my experiences debugging multithreading issues, and I can tell you they're just as painful as they are outside of Rust. However, the building blocks of the language encourage designs that eliminate many kinds of runtime issues before they happen. You can still have errors in your logic, but I'm finding that, more often than not, when it compiles, it works.&lt;/p&gt;

&lt;p&gt;Let's look at the stats of PliantDB as of tonight, using &lt;a href="https://github.com/XAMPPRocky/tokei"&gt;tokei&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;===============================================================================&lt;/span&gt;
 Language            Files        Lines         Code     Comments       Blanks
&lt;span class="o"&gt;===============================================================================&lt;/span&gt;
 Shell                   2           29           24            3            2
 TOML                    8          249          223            1           25
 YAML                    1            8            6            2            0
&lt;span class="nt"&gt;------------------------------------------------------------------------------------&lt;/span&gt;
 Markdown                1           77            0           49           28
 |- BASH                 1            1            1            0            0
 |- Rust                 1           54           40            3           11
 &lt;span class="o"&gt;(&lt;/span&gt;Total&lt;span class="o"&gt;)&lt;/span&gt;                            132           41           52           39
&lt;span class="nt"&gt;------------------------------------------------------------------------------------&lt;/span&gt;
 Rust                   61         7974         6699          221         1054
 |- Markdown            39          593            0          563           30
 &lt;span class="o"&gt;(&lt;/span&gt;Total&lt;span class="o"&gt;)&lt;/span&gt;                           8567         6699          784         1084
&lt;span class="o"&gt;===============================================================================&lt;/span&gt;
 Total                  73         8337         6952          276         1109
&lt;span class="o"&gt;===============================================================================&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;According to tokei, I've written 6699 lines of Rust code in this project. The first day of work was Friday, March 19, so those stats represent roughly 3.5 weeks of work.&lt;/p&gt;

&lt;p&gt;I have a &lt;a href="https://khonsulabs.github.io/pliantdb/coverage/"&gt;pretty-well-tested&lt;/a&gt; codebase that I'm almost ready to integrate into &lt;a href="https://github.com/khonsulabs/cosmicverge"&gt;Cosmic Verge&lt;/a&gt;. While I have plenty of work remaining on PliantDB, I'm excited at the prospect of replacing PostgreSQL and Redis in Cosmic Verge potentially next month.&lt;/p&gt;

&lt;h2&gt;Interested in PliantDB's development?&lt;/h2&gt;

&lt;p&gt;I'm always happy to have more people to talk about Rust with. I'd love to hear from you &lt;a href="https://discord.khonsulabs.com"&gt;on Discord&lt;/a&gt;, &lt;a href="https://twitter.com/ectonDev"&gt;Twitter&lt;/a&gt;, or &lt;a href="https://github.com/khonsulabs/pliantdb/discussions"&gt;GitHub Discussions&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>pliantdb</category>
      <category>database</category>
    </item>
  </channel>
</rss>
