<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jonathan Johnson</title>
    <description>The latest articles on DEV Community by Jonathan Johnson (@ecton).</description>
    <link>https://dev.to/ecton</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F614508%2F139f19ad-c1b5-48d5-af22-be8f3d828f31.jpeg</url>
      <title>DEV Community: Jonathan Johnson</title>
      <link>https://dev.to/ecton</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ecton"/>
    <language>en</language>
    <item>
      <title>The Futility of Benchmarks</title>
      <dc:creator>Jonathan Johnson</dc:creator>
      <pubDate>Mon, 04 Oct 2021 04:00:00 +0000</pubDate>
      <link>https://dev.to/ecton/the-futility-of-benchmarks-3a78</link>
      <guid>https://dev.to/ecton/the-futility-of-benchmarks-3a78</guid>
      <description>&lt;p&gt;I'm normally someone to stress avoiding premature optimization. Unfortunately, when deciding whether to replace Sled as the storage layer for &lt;a href="https://github.com/khonsulabs/bonsaidb/" rel="noopener noreferrer"&gt;BonsaiDb&lt;/a&gt;, I needed to understand whether &lt;a href="https://github.com/khonsulabs/nebari" rel="noopener noreferrer"&gt;Nebari&lt;/a&gt; could even remotely compare to the speed of Sled. But, I also realized I didn't know how Sled compared to any other engine either. SQLite is one of those projects you always hear about being efficient, and rightfully so, so I wanted to compare Nebari against both of those projects.&lt;/p&gt;

&lt;p&gt;There &lt;a href="https://community.khonsulabs.com/t/introducing-nebari-a-key-value-data-store-written-using-an-append-only-file-format/81/2" rel="noopener noreferrer"&gt;are many other reasons&lt;/a&gt; I decided to keep developing Nebari, but today, I'm going to focus on the struggle I had getting Nebari to the point that I could write that last devlog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Initial stages of benchmarking Nebari
&lt;/h2&gt;

&lt;p&gt;From the outset of working on BonsaiDb, my only goal was to scale as well as CouchDB, as I had built my last business on it. One of the simplest things I should have done much sooner was set up a CouchDB benchmark. I had no idea how performant CouchDB was compared to any other database engine -- even after my extensive experience with it.&lt;/p&gt;

&lt;p&gt;Because CouchDB isn't as easy to set up, I started my &lt;a href="https://github.com/khonsulabs/nebari" rel="noopener noreferrer"&gt;Nebari&lt;/a&gt; benchmark suite comparing only against SQLite. After getting my initial suite working, I found that I could beat SQLite on single-row inserts and retrievals, but on larger operations, SQLite would easily beat me. I don't have good graphs, because at the time I was experimenting with making Nebari async and supporting io_uring. This is the best image I had prior to ditching async and switching to Criterion for benchmarks:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcommunity-uploads.khonsulabs.com%2Ffile%2Fkhonsulabs-community-uploads%2Foriginal%2F1X%2Ff6e851868b28d1919951a7c3fb146f4bdeff5a62.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcommunity-uploads.khonsulabs.com%2Ffile%2Fkhonsulabs-community-uploads%2Foriginal%2F1X%2Ff6e851868b28d1919951a7c3fb146f4bdeff5a62.png" alt="Nebari vs SQLite|564x320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This graph measures how long it takes to retrieve 100 records out of data sets of varying sizes. As you can see, SQLite is steady, and while I could beat it on small datasets, I wasn't happy with how this was turning out. I should have been satisfied given that the project was only two weeks old, but try as I might, I couldn't shake my disappointment with these results.&lt;/p&gt;

&lt;p&gt;After switching to Criterion, I decided it was time to benchmark Sled and CouchDB.&lt;/p&gt;

&lt;h2&gt;
  
  
  The new benchmark suite
&lt;/h2&gt;

&lt;p&gt;My &lt;a href="https://khonsulabs-storage.s3.us-west-000.backblazeb2.com/roots-bench-2021-09-07/report/index.html" rel="noopener noreferrer"&gt;initial report&lt;/a&gt; shows a pretty gruesome story for my beloved CouchDB. On every benchmark, Nebari, SQLite, and Sled each land on a different order of magnitude. For example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://khonsulabs-storage.s3.us-west-000.backblazeb2.com/roots-bench-2021-09-07/logs-gets/1000%20sequential%20elements/report/index.html" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcommunity-uploads.khonsulabs.com%2Ffile%2Fkhonsulabs-community-uploads%2Foriginal%2F1X%2F98c6f893a6787c6e1bd6cc5bcffaa3e90a60e437.png" alt="Retrieve 1 row out of 1,000 records|689x176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sled is so fast that its line doesn't even show up on the graph. Nebari is faster than SQLite in this particular benchmark, and CouchDB is off on its lonesome at just shy of a full millisecond. What was the operation? Requesting a row by its primary key.&lt;/p&gt;

&lt;p&gt;Was I happy now that I knew I was going to be able to beat CouchDB in performance? I should have been, but I knew I had a lot of performance left on the table.&lt;/p&gt;

&lt;p&gt;I continued working on Nebari, fleshing out its functionality, fixing its bugs, and eventually was able to hook up BonsaiDb atop it. It was at that point that &lt;a href="https://community.khonsulabs.com/t/introducing-nebari-a-key-value-data-store-written-using-an-append-only-file-format/81#what-caused-the-uncontrolled-memory-usage-5" rel="noopener noreferrer"&gt;I discovered that Sled wasn't the cause of the memory bug.&lt;/a&gt; If you read that post, you'll see me conclude that I'm going to keep writing Nebari for many reasons, but I didn't name speed as one of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  "I'm not a database engineer"
&lt;/h2&gt;

&lt;p&gt;Imposter syndrome is a fun thing to fight. If you read through my posts about Nebari and BonsaiDb, you'll see me asserting over and over: &lt;em&gt;I'm not a database engineer&lt;/em&gt;. A month ago, I arguably wasn't. But, I became one over the past month.&lt;/p&gt;

&lt;p&gt;A great one? Probably not, but instead of being nervous about showing people Nebari, I'm now feeling proud to have written it. What changed my mind? It all came down to the realization that benchmarks are futile.&lt;/p&gt;

&lt;p&gt;Every time I publish numbers, I make sure to reinforce something that everyone should already know: a benchmark suite is not a predictor of how your application will perform when built with the thing being benchmarked. You can pick the fastest libraries and still bring your application to a crawl with an O(n^2) algorithm.&lt;/p&gt;
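&lt;p&gt;As a quick illustration of that last point (my own toy example, not drawn from any of these benchmarks), here's the same deduplication task written two ways -- even the fastest storage engine underneath can't save the quadratic version:&lt;/p&gt;

```rust
use std::collections::HashSet;
use std::time::Instant;

// O(n^2): for each item, scan every item kept so far.
fn dedup_quadratic(items: &[u32]) -> Vec<u32> {
    let mut out: Vec<u32> = Vec::new();
    for &item in items {
        if !out.contains(&item) { // O(n) scan inside an O(n) loop
            out.push(item);
        }
    }
    out
}

// O(n): a HashSet makes each membership check O(1) on average.
fn dedup_linear(items: &[u32]) -> Vec<u32> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for &item in items {
        if seen.insert(item) { // true only the first time a value is seen
            out.push(item);
        }
    }
    out
}

fn main() {
    let data: Vec<u32> = (0..10_000).map(|i| i % 2_500).collect();

    let start = Instant::now();
    let a = dedup_quadratic(&data);
    let quadratic = start.elapsed();

    let start = Instant::now();
    let b = dedup_linear(&data);
    let linear = start.elapsed();

    assert_eq!(a, b);
    println!("quadratic: {:?}, linear: {:?}", quadratic, linear);
}
```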

&lt;p&gt;Yet, the true futility of benchmarking didn't start hitting me until I decided I wanted to set up an automated way to run benchmarks on a machine that could produce reliable results over time. I was shocked at some of the initial results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcommunity-uploads.khonsulabs.com%2Ffile%2Fkhonsulabs-community-uploads%2Foptimized%2F1X%2Fd26abe875427c4ee4d9dd11bdf0642f6a750ac25_2_690x355.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcommunity-uploads.khonsulabs.com%2Ffile%2Fkhonsulabs-community-uploads%2Foptimized%2F1X%2Fd26abe875427c4ee4d9dd11bdf0642f6a750ac25_2_690x355.png" alt="ecton's machine vs the cloud|690x355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The top graph shows &lt;a href="https://khonsulabs-storage.s3.us-west-000.backblazeb2.com/nebari-vultr-2c-4g/report/index.html" rel="noopener noreferrer"&gt;a dedicated Vultr VPS that might be a potential deployment target for us&lt;/a&gt;, and the bottom graph shows &lt;a href="https://khonsulabs-storage.s3.us-west-000.backblazeb2.com/nebari-ecton/report/index.html" rel="noopener noreferrer"&gt;results from my development machine&lt;/a&gt;. What's interesting to see is that on my local machine, all engines insert a row in less than 40 microseconds, with the quickest being Sled at around 16 microseconds.&lt;/p&gt;

&lt;p&gt;Compare that with the VPS: The only engine that completes in less than 40us is Nebari at 35.7us. Sled is 3x slower in this particular benchmark, and SQLite is really not happy running on that VPS.&lt;/p&gt;

&lt;p&gt;That moment was a turning point for me. If you clicked through the benchmarks at that stage as reported by my machine, you would most likely agree with me: I should be proud of what I pulled off in less than a month. But, if you then look at the benchmarks on the VPS, you see an even prettier picture for Nebari.&lt;/p&gt;

&lt;p&gt;For those wondering why Nebari is faster in these situations, I can only hypothesize, as I'm not that familiar with how storage works on a VPS host. My best guess is that appending to the end of a file is better optimized in these environments than whatever is needed for SQLite and Sled to update their databases (file locks? or just worse random write performance?).&lt;/p&gt;

&lt;p&gt;I'm not trying to say that these benchmarks are useless. On the contrary, they've helped me understand where I'm likely leaving performance on the table and identify some low hanging fruit already. But, no matter how good I make any of these benchmarks perform, the actual performance in the hosted environment will likely be much different than what I can simulate on my own developer machine. At the end of the day, the only way to optimize the shipping application is going to be to profile the application itself.&lt;/p&gt;

&lt;p&gt;The final nail in my imposter syndrome's coffin came yesterday when I finished &lt;a href="https://github.com/khonsulabs/bonsaidb/pull/74" rel="noopener noreferrer"&gt;switching BonsaiDb over to Nebari's transaction log&lt;/a&gt;. I measured the &lt;code&gt;save_documents&lt;/code&gt; benchmark locally, and saw that my new implementation landed slightly slower than Sled (but with full revision history supported). I then realized I never looked at the performance of &lt;code&gt;save_documents&lt;/code&gt; on a VPS before.&lt;/p&gt;

&lt;p&gt;I dug through GitHub Actions logs to see the benchmark results. After looking for the lowest numbers across several old runs, here are the fastest results compared:&lt;/p&gt;

&lt;p&gt;BonsaiDb on Sled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;save_documents/1024     time:   [388.02 us 395.56 us 403.93 us]
                        thrpt:  [2.4177 MiB/s 2.4688 MiB/s 2.5168 MiB/s]
save_documents/2048     time:   [510.57 us 523.38 us 535.91 us]
                        thrpt:  [3.6445 MiB/s 3.7317 MiB/s 3.8254 MiB/s]
save_documents/8192     time:   [578.55 us 588.99 us 599.17 us]
                        thrpt:  [13.039 MiB/s 13.264 MiB/s 13.504 MiB/s]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;BonsaiDb on Nebari:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;save_documents/1024     time:   [187.73 us 194.35 us 201.45 us]
                        thrpt:  [4.8477 MiB/s 5.0247 MiB/s 5.2020 MiB/s]
save_documents/2048     time:   [188.09 us 192.51 us 197.58 us]
                        thrpt:  [9.8850 MiB/s 10.146 MiB/s 10.384 MiB/s]
save_documents/8192     time:   [272.55 us 280.89 us 291.47 us]
                        thrpt:  [26.804 MiB/s 27.813 MiB/s 28.664 MiB/s]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It may sound silly, but seeing these results was cathartic. For a month, I was doing my best to sound confident in what I was doing, but at the end of each day, I found myself fearing that I would ultimately fail to build something that could eventually support my &lt;a href="https://github.com/khonsulabs/cosmicverge/" rel="noopener noreferrer"&gt;visions of grandeur&lt;/a&gt;. I'm confident if I had a more exhaustive benchmark suite for BonsaiDb there would be no clear winner across all measurements.&lt;/p&gt;

&lt;p&gt;But, for a project started a month ago to be in the same realm as SQLite and Sled? I'm very happy with that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Unveiling the hosted benchmark suite
&lt;/h2&gt;

&lt;p&gt;I moved on to finishing up a nice &lt;a href="https://khonsulabs-storage.s3.us-west-000.backblazeb2.com/nebari-scaleway-gp1-xs/index.html" rel="noopener noreferrer"&gt;hosted overview of benchmarks&lt;/a&gt;, which also describes what each benchmark is testing a little better than the Criterion reports do. These benchmarks are run on an instance that we've identified as a potential deployment target for Cosmic Verge, although it's still too early to know exactly what environment we'll ultimately call home.&lt;/p&gt;

&lt;p&gt;Despite the title of this post, benchmarks are still going to be a critical part of developing BonsaiDb and Nebari. It's just important to remember that benchmarks will always be limited in what they can tell you, unless the benchmark is specifically written for your particular use case and being run in exactly the same environment as your production environment.&lt;/p&gt;

&lt;p&gt;Nebari is shaping up into a neat library on its own, but I'm excited to start putting more time back into &lt;a href="https://github.com/khonsulabs/bonsaidb" rel="noopener noreferrer"&gt;BonsaiDb&lt;/a&gt; and &lt;a href="https://github.com/khonsulabs/gooey/" rel="noopener noreferrer"&gt;Gooey&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>database</category>
      <category>bonsaidb</category>
      <category>nebari</category>
    </item>
    <item>
      <title>Towards Stabilization: Serialization Format(s) for PliantDb</title>
      <dc:creator>Jonathan Johnson</dc:creator>
      <pubDate>Mon, 28 Jun 2021 19:46:57 +0000</pubDate>
      <link>https://dev.to/ecton/towards-stabilization-serialization-format-s-for-pliantdb-4c9</link>
      <guid>https://dev.to/ecton/towards-stabilization-serialization-format-s-for-pliantdb-4c9</guid>
      <description>&lt;p&gt;Last week someone interested in using &lt;a href="https://pliantdb.dev/"&gt;&lt;code&gt;PliantDb&lt;/code&gt;&lt;/a&gt; asked a question &lt;a href="https://discord.khonsulabs.com/"&gt;on our Discord server&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The current version is not yet at 1.0, and messages are everywhere to not use it, yet. What features aren't yet implemented or trusted?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because Discord isn't a great way to archive these answers publicly, I wanted to share &lt;a href="https://discord.com/channels/578968877866811403/833332909808025610/857624820337737739"&gt;my response&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In terms of what's trusted: everything. I feel confident in this implementation because of the code coverage: &lt;a href="https://pliantdb.dev/coverage"&gt;https://pliantdb.dev/coverage&lt;/a&gt; -- It's not perfect, and I'm sure there are some bugs, but the biggest concern to me is storage formats. I may replace cbor with something else, for many reasons that I'll leave outside of chat here (Dax doesn't even know that thought process yet lol). This sort of fundamental storage change would make a simple update incompatible, and that's why I'm not ready for people to adopt PliantDb aggressively yet.&lt;/p&gt;

&lt;p&gt;That being said, part of the unit tests do include testing backup/restore, and my intention is to ensure that the export format will always be able to bring you from a previous version to a current version in those situations. The gotcha right now for that feature is that the key-value store isn't backed up currently. &lt;a href="https://github.com/khonsulabs/pliantdb/issues/50"&gt;https://github.com/khonsulabs/pliantdb/issues/50&lt;/a&gt; (Discovered I overlooked that feature while hooking up those unit tests).&lt;/p&gt;

&lt;p&gt;Missing features that I'm aware of for local/embedded use: Collections don't have a List function. You can list by creating a view over the collection, but I need to add a separate List endpoint. I started this the other day, but I was hoping to do it by replacing get_multiple. I realized that approach was a bad idea from a permissions standpoint, so I reverted the changes to tackle it another day.&lt;/p&gt;

&lt;p&gt;For server/client: There isn't any multi-user support (yet). We're on the cusp of it. The certificate handling on the server portion for the QUIC protocol currently only supports pinned certificates -- the goal is for our HTTP + QUIC layers to eventually share the same certificate. For websockets, no TLS currently, and the websockets are mounted at root. Eventually they will be moved to a route on an HTTP layer that you will be able to extend with your own HTTP routes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That question spurred my brain into action, though. A few weeks ago, I had begun looking into &lt;a href="https://github.com/khonsulabs/pliantdb/issues/56"&gt;how to protect &lt;code&gt;PliantDb&lt;/code&gt; from memory exhaustion attacks&lt;/a&gt;. I knew &lt;code&gt;bincode&lt;/code&gt;'s approach, but my initial searches for mitigation strategies for &lt;code&gt;serde-cbor&lt;/code&gt; came up blank.&lt;/p&gt;

&lt;p&gt;The upshot of my searches is that there is a question about whether the current &lt;a href="https://lib.rs/crates/serde_cbor"&gt;&lt;code&gt;serde-cbor&lt;/code&gt;&lt;/a&gt; crate should be considered the mainline one, or if a newer one (&lt;a href="https://lib.rs/crates/ciborium"&gt;&lt;code&gt;Ciborium&lt;/code&gt;&lt;/a&gt;) should replace it. I should note, I haven't tested either crate against this attack, and it could be that one or both of them already mitigate it somehow. And, if either is susceptible, pull requests could address the issue. But, I wasn't sure where my efforts to investigate further should be spent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why &lt;code&gt;CBOR&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;At this point, I wanted to remind myself why I made the decisions I did. There are two types of data structures in use in &lt;code&gt;PliantDb&lt;/code&gt; that need to be serialized and deserialized: ones &lt;code&gt;PliantDb&lt;/code&gt; itself manages, and ones that users of &lt;code&gt;PliantDb&lt;/code&gt; will provide. This is where the power of &lt;code&gt;serde&lt;/code&gt; comes in: &lt;code&gt;PliantDb&lt;/code&gt; only needs the user types to implement &lt;code&gt;Serialize&lt;/code&gt; and &lt;code&gt;Deserialize&lt;/code&gt;, and they can then easily be stored in &lt;code&gt;PliantDb&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When considering storage formats for user types, it's important to think about one of the most important aspects of a database: storing and loading your data reliably. Generally speaking, a self-describing format is one that includes enough information that it can be loaded without having the original data structure for reference. &lt;a href="https://github.com/bincode-org/bincode#is-bincode-suitable-for-storage"&gt;&lt;code&gt;bincode&lt;/code&gt; has a note in its README&lt;/a&gt; discussing the limitations of using a non-self-describing format like it for storage.&lt;/p&gt;

&lt;p&gt;In short, if I want &lt;code&gt;PliantDb&lt;/code&gt; to be easy to use in a reliable fashion, user datatypes should be encoded using a self-describing format. With CouchDB, a major inspiration for &lt;code&gt;PliantDb&lt;/code&gt;, documents were stored as JSON. However, JSON isn't a particularly efficient format, and in my research, &lt;a href="https://cbor.io/"&gt;&lt;code&gt;CBOR&lt;/code&gt;&lt;/a&gt; stood out as an open-standard binary format with a reasonable amount of popularity in the Rust community.&lt;/p&gt;

&lt;p&gt;For the internal &lt;code&gt;PliantDb&lt;/code&gt; structures, I am willing to subject myself to limitations on how to manage migrating between versions of data structures. Those structures I want to serialize as quickly as possible while still providing me some flexibility. &lt;code&gt;bincode&lt;/code&gt; fits this bill perfectly. While a custom format technically could be faster, &lt;code&gt;bincode&lt;/code&gt; is very fast and well-tested.&lt;/p&gt;

&lt;p&gt;So, that's the reasoning behind picking &lt;code&gt;CBOR&lt;/code&gt; and &lt;code&gt;bincode&lt;/code&gt;. But, something rubbed me the wrong way about &lt;code&gt;CBOR&lt;/code&gt; and most other self-describing formats. Wanting to settle the only outstanding question for the storage of &lt;code&gt;PliantDb&lt;/code&gt;'s documents forced me to confront one of my only dislikes of the &lt;code&gt;CBOR&lt;/code&gt; format: its verbosity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why consider switching away from &lt;code&gt;CBOR&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;Imagine this data structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Logs&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;LogEntry&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;LogEntry&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DateTime&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Utc&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When encoding this data structure with 50 &lt;code&gt;entries&lt;/code&gt;, the identifiers &lt;code&gt;timestamp&lt;/code&gt;, &lt;code&gt;level&lt;/code&gt;, and &lt;code&gt;message&lt;/code&gt; will each appear in the resulting file 50 times.&lt;/p&gt;
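&lt;p&gt;A quick back-of-envelope (my own arithmetic, assuming each map key costs its UTF-8 length plus one framing byte, as with CBOR's short text strings) shows how quickly that adds up:&lt;/p&gt;

```rust
// Bytes spent re-encoding field names in a self-describing format,
// using the Logs example above. Assumes each map key costs its UTF-8
// length plus one framing byte (as with CBOR's short text strings).
fn repeated_name_bytes(names: &[&str], entries: usize) -> usize {
    names.iter().map(|n| n.len() + 1).sum::<usize>() * entries
}

fn main() {
    let names = ["timestamp", "level", "message"];
    // 50 entries re-encode the same ~24 bytes of identifiers each time.
    println!("{} bytes", repeated_name_bytes(&names, 50)); // prints "1200 bytes"
}
```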

&lt;p&gt;As someone who has worked on a compiler that targeted multiple executable formats, I know one of the tricks of the trade: executables include a string table that contains all of the static strings in your binary. If you use the string literal &lt;code&gt;"hello"&lt;/code&gt; in 30 files, the compiler stores it once and encodes the same address for each reference.&lt;/p&gt;

&lt;p&gt;My theory was that by generating a string table for all of the identifiers in the data structures, I could easily gain efficiency on storage while hopefully retaining similar performance to &lt;code&gt;CBOR&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;How beneficial would it be? Only one way to find out.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;code&gt;PBOR&lt;/code&gt; - a working name
&lt;/h2&gt;

&lt;p&gt;I started up a project the next day and, lacking creativity, named it &lt;code&gt;PliantDb Binary Object Representation&lt;/code&gt;, or &lt;code&gt;PBOR&lt;/code&gt;. While I named it after &lt;code&gt;CBOR&lt;/code&gt;, I genuinely came up with this format independently, and while it bears a resemblance, there are a few distinct features. First, let me state my goals explicitly upfront for this project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement a self-describing serialization format with "full" compatibility with &lt;code&gt;serde&lt;/code&gt;'s features. Essentially, design it to fit &lt;code&gt;serde&lt;/code&gt;'s design like a glove.&lt;/li&gt;
&lt;li&gt;Be safe to run in production: resilient against malicious payloads.&lt;/li&gt;
&lt;li&gt;Be compact: more compact than &lt;code&gt;CBOR&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Be efficient: roughly equivalent to the current performance of &lt;code&gt;CBOR&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tackling an &lt;code&gt;identifier&lt;/code&gt; table
&lt;/h2&gt;

&lt;p&gt;So first, a quick discussion about the practicalities of having a string/identifier table. One downside is that you don't know the size of the table until the entire data set has been scanned. This creates a conundrum: if you want the table at the start of the output, you either need to reserve an arbitrary amount of space and come back to patch it in, or you write the table after the data and include a 'jump' target in a header (requiring the ability to seek backward over your written data).&lt;/p&gt;

&lt;p&gt;The problem with both approaches is similar: the entire payload must be in memory to process the data efficiently, whether serializing or deserializing. So, as I began to think about how to design the format, I started thinking about a format that would allow me to output an identifier once and, from then on, refer to it by id.&lt;/p&gt;

&lt;p&gt;This highlighted a core design idea: each chunk of data was going to have a &lt;code&gt;kind&lt;/code&gt; and an optional &lt;code&gt;argument&lt;/code&gt;. This turns out to be another way that &lt;code&gt;CBOR&lt;/code&gt; and my format differ. In &lt;code&gt;CBOR&lt;/code&gt;, the argument is always output as a second byte (or more, depending on how large the integer value is). The way I tackled the problem requires slightly more work but appears to save storage space over time.&lt;/p&gt;

&lt;p&gt;Let's establish a new term: an Atom. In &lt;code&gt;PBOR&lt;/code&gt; an atom is an individual chunk of data. The first byte contains three pieces of information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upper nibble (&lt;code&gt;&amp;amp; 0b11110000&lt;/code&gt;): the Atom kind.&lt;/li&gt;
&lt;li&gt;Fifth bit (&lt;code&gt;&amp;amp; 0b1000&lt;/code&gt;): whether additional bytes are part of the argument.&lt;/li&gt;
&lt;li&gt;Last 3 bits (&lt;code&gt;&amp;amp; 0b111&lt;/code&gt;): the first 3 bits of the argument.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To parse the atom header: if the lower nibble is not 0, there is an argument. The last three bits are extracted, and then, if the fifth bit is set, an additional byte is read. That byte's lower 7 bits are shifted into the proper location, and if its highest bit is set, the loop continues with another byte. The maximum size for an argument is a &lt;code&gt;u64&lt;/code&gt;, which makes the maximum atom header weigh in at 10 bytes with this extra encoding.&lt;/p&gt;
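&lt;p&gt;To make that concrete, here's a sketch of that header scheme in Rust. The function names are mine, not Nebari's or &lt;code&gt;PBOR&lt;/code&gt;'s actual API, and this sketch treats an argument of 0 the same as "no argument" (a zero lower nibble):&lt;/p&gt;

```rust
/// Encodes an atom header: upper nibble = kind, fifth bit = continuation,
/// low 3 bits = first 3 bits of the argument. Remaining argument bits
/// follow 7 at a time, with each byte's high bit signalling continuation.
fn encode_atom_header(kind: u8, argument: u64, out: &mut Vec<u8>) {
    assert!(kind < 16, "kind must fit in a nibble");
    let mut remaining = argument >> 3;
    let mut first = (kind << 4) | (argument & 0b111) as u8;
    if remaining != 0 {
        first |= 0b1000; // more argument bytes follow
    }
    out.push(first);
    while remaining != 0 {
        let mut byte = (remaining & 0x7f) as u8;
        remaining >>= 7;
        if remaining != 0 {
            byte |= 0x80; // the decoding loop continues
        }
        out.push(byte);
    }
}

/// Decodes a header, returning (kind, argument, bytes consumed).
fn decode_atom_header(bytes: &[u8]) -> (u8, u64, usize) {
    let first = bytes[0];
    let kind = first >> 4;
    let mut argument = (first & 0b111) as u64;
    let mut offset = 3;
    let mut consumed = 1;
    if first & 0b1000 != 0 {
        loop {
            let byte = bytes[consumed];
            consumed += 1;
            // Shift this byte's lower 7 bits into position.
            argument |= ((byte & 0x7f) as u64) << offset;
            offset += 7;
            if byte & 0x80 == 0 {
                break;
            }
        }
    }
    (kind, argument, consumed)
}

fn main() {
    // An argument below 8 fits entirely in the single-byte header.
    let mut buf = Vec::new();
    encode_atom_header(2, 5, &mut buf);
    assert_eq!(buf.len(), 1);

    // Lengths under 1,024 (3 + 7 bits) fit in a two-byte header.
    buf.clear();
    encode_atom_header(2, 1023, &mut buf);
    assert_eq!(buf.len(), 2);

    // Larger arguments round-trip, growing one byte per 7 bits.
    buf.clear();
    encode_atom_header(2, 1_000_000, &mut buf);
    assert_eq!(decode_atom_header(&buf), (2, 1_000_000, buf.len()));

    // A full u64 argument needs at most 10 header bytes.
    buf.clear();
    encode_atom_header(2, u64::MAX, &mut buf);
    assert_eq!(buf.len(), 10);
}
```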

&lt;p&gt;However, this packing provides some interesting opportunities. The remaining three bits can hold a value of 0 through 7, and, if needed, the argument can scale up to a &lt;code&gt;u64&lt;/code&gt;. Crucially, values less than 8 can be stored in a single-byte atom header.&lt;/p&gt;

&lt;p&gt;Let's examine integers and floats. The most common sizes are all 8 bytes or less. So, if we subtract 1 from the byte count (and disallow a 0-byte integer), every integer that is a &lt;code&gt;u64&lt;/code&gt; or smaller requires only a single-byte header to denote that the atom is an integer of X bytes in size.&lt;/p&gt;

&lt;p&gt;With bytes and strings, the argument can be the length of the data. Small values still fit within a single byte, and string or byte sequences less than 1,024 bytes long fit within a two-byte header. Long story short, giving up that single bit to the continuation flag still allows most practical values to fit in an encoding one byte smaller.&lt;/p&gt;

&lt;p&gt;Finally, let's think about identifiers. In &lt;code&gt;PBOR&lt;/code&gt; there is an atom kind, Symbol. When the serializer first encounters a new identifier, it will write an atom &lt;code&gt;(Symbol, 0)&lt;/code&gt;, followed by a string atom containing the identifier. The deserializer will expect a string when it receives a 0 in the atom header. Both the serializer and deserializer assign each new identifier the next id, starting at 1 and counting upwards.&lt;/p&gt;

&lt;p&gt;When the serializer encounters an identifier it has already serialized, it will emit the symbol ID as the atom argument. The deserializer will not expect a string when it receives a non-zero argument and instead will look up the already-deserialized string.&lt;/p&gt;
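&lt;p&gt;In sketch form (again with my own names, not the real &lt;code&gt;PBOR&lt;/code&gt; internals), the two sides of that symbol table look something like this -- note that because ids are assigned in the same order on both sides, the table itself is never written to the file:&lt;/p&gt;

```rust
use std::collections::HashMap;

/// What the serializer would emit for one identifier.
#[derive(Debug, PartialEq)]
enum Emitted {
    NewSymbol(String), // atom (Symbol, 0) followed by a string atom
    KnownSymbol(u64),  // atom (Symbol, id), no string payload
}

/// Writer-side table: the first sighting of an identifier is written in
/// full; every later sighting is just its id.
#[derive(Default)]
struct SymbolWriter {
    ids: HashMap<String, u64>,
}

impl SymbolWriter {
    fn emit(&mut self, name: &str) -> Emitted {
        if let Some(&id) = self.ids.get(name) {
            Emitted::KnownSymbol(id)
        } else {
            // Ids start at 1; argument 0 is reserved for "new symbol follows".
            let id = self.ids.len() as u64 + 1;
            self.ids.insert(name.to_string(), id);
            Emitted::NewSymbol(name.to_string())
        }
    }
}

/// Reader-side table: rebuilt during deserialization by mirroring the
/// writer's assignment order.
#[derive(Default)]
struct SymbolReader {
    names: Vec<String>,
}

impl SymbolReader {
    fn resolve(&mut self, argument: u64, payload: Option<&str>) -> String {
        if argument == 0 {
            let name = payload.expect("a new symbol carries its string").to_owned();
            self.names.push(name.clone());
            name
        } else {
            self.names[(argument - 1) as usize].clone()
        }
    }
}

fn main() {
    let mut writer = SymbolWriter::default();
    assert_eq!(writer.emit("timestamp"), Emitted::NewSymbol("timestamp".into()));
    assert_eq!(writer.emit("level"), Emitted::NewSymbol("level".into()));
    // The second log entry reuses both identifiers by id.
    assert_eq!(writer.emit("timestamp"), Emitted::KnownSymbol(1));
    assert_eq!(writer.emit("level"), Emitted::KnownSymbol(2));

    let mut reader = SymbolReader::default();
    assert_eq!(reader.resolve(0, Some("timestamp")), "timestamp");
    assert_eq!(reader.resolve(0, Some("level")), "level");
    assert_eq!(reader.resolve(1, None), "timestamp");
    assert_eq!(reader.resolve(2, None), "level");
}
```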

&lt;h2&gt;
  
  
  How did the experiment go?
&lt;/h2&gt;

&lt;p&gt;Here's how the arbitrary benchmark I've chosen (&lt;a href="https://github.com/djkoloski/rust_serialization_benchmark"&gt;adapted from this project's log benchmark&lt;/a&gt;) turned out:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;Serialize (ms)&lt;/th&gt;
&lt;th&gt;Deserialize (ms)&lt;/th&gt;
&lt;th&gt;length&lt;/th&gt;
&lt;th&gt;gzip length&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;bincode&lt;/td&gt;
&lt;td&gt;0.5757&lt;/td&gt;
&lt;td&gt;2.3022&lt;/td&gt;
&lt;td&gt;741,295&lt;/td&gt;
&lt;td&gt;305,030&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pbor&lt;/td&gt;
&lt;td&gt;2.1235&lt;/td&gt;
&lt;td&gt;4.7786&lt;/td&gt;
&lt;td&gt;983,437&lt;/td&gt;
&lt;td&gt;373,654&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;serde-cbor&lt;/td&gt;
&lt;td&gt;1.4557&lt;/td&gt;
&lt;td&gt;4.7311&lt;/td&gt;
&lt;td&gt;1,407,835&lt;/td&gt;
&lt;td&gt;407,372&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;serde-json&lt;/td&gt;
&lt;td&gt;3.2774&lt;/td&gt;
&lt;td&gt;6.0356&lt;/td&gt;
&lt;td&gt;1,827,461&lt;/td&gt;
&lt;td&gt;474,358&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These numbers should be considered arbitrary by anyone reading this. &lt;code&gt;PBOR&lt;/code&gt; is not a clear winner on any given metric, but it did achieve my primary goals.&lt;/p&gt;

&lt;p&gt;Ultimately, going into the experiment I underestimated the cost of building and maintaining the symbol map during both serialization and deserialization. It took far too long to optimize it to roughly equivalent deserialization speed. I'm confident I can squeeze a little more performance out here or there, but I've stopped focusing on that for now. Instead, I wanted to openly ask: does this seem like a good idea, or should I just keep embracing &lt;code&gt;CBOR&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;Unfortunately, to give realistic practical numbers, I'll need to take this experiment further, so I'm taking this moment to pause and reflect and make sure this goal is something worth spending time on. &lt;/p&gt;

&lt;p&gt;One of the ways to prove its worth would be more benchmarks. But to benchmark the true impact on &lt;code&gt;PliantDb&lt;/code&gt;, we must consider how data flows through the database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding serialization in &lt;code&gt;PliantDb&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;At the core, the &lt;a href="https://pliantdb.dev/main/pliantdb/core/document/struct.Document.html"&gt;Document&lt;/a&gt; type contains the serialized bytes of the document being stored. This means that when saving to the database, the code connecting to the database -- not the server -- is responsible for serialization. Thus, the penalty for serialization cost lives wherever the documents are being saved from.&lt;/p&gt;

&lt;p&gt;If your View code deserializes the document on the server, the deserialization speed impacts that code's execution. However, this only affects the View updating processes and does not impact View queries themselves.&lt;/p&gt;

&lt;p&gt;The server doesn't deserialize documents for document fetches or view queries; it simply sends the serialized bytes across. Thus, the format of the data on disk directly impacts the amount of data transmitted across the network.&lt;/p&gt;
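To make that flow concrete, here's a toy sketch. These are not PliantDb's actual types or API, and the length-prefixed encoding is a stand-in for CBOR/PBOR; the point is only that the "server" handles opaque bytes while the client pays both serialization costs:

```rust
// The "server" side: stores and returns opaque bytes, never deserializing.
struct Server {
    stored: Vec<Vec<u8>>,
}

impl Server {
    fn new() -> Self {
        Server { stored: Vec::new() }
    }

    // Saving accepts already-serialized bytes from the client.
    fn save(&mut self, bytes: Vec<u8>) -> usize {
        self.stored.push(bytes);
        self.stored.len() - 1
    }

    // Fetching sends the serialized bytes back verbatim.
    fn fetch(&self, id: usize) -> &[u8] {
        &self.stored[id]
    }
}

// The client pays the serialization cost. A toy length-prefixed encoding
// stands in for a real format here.
fn serialize(name: &str) -> Vec<u8> {
    let mut bytes = vec![name.len() as u8];
    bytes.extend_from_slice(name.as_bytes());
    bytes
}

fn deserialize(bytes: &[u8]) -> String {
    let len = bytes[0] as usize;
    String::from_utf8(bytes[1..1 + len].to_vec()).unwrap()
}

fn main() {
    let mut server = Server::new();
    // Serialization happens client-side, before the save call.
    let id = server.save(serialize("Administrators"));
    // The server returns the exact bytes; deserialization is also client-side.
    assert_eq!(deserialize(server.fetch(id)), "Administrators");
}
```

Because the bytes are stored and transmitted verbatim, a more compact on-disk format shrinks network traffic for free, which is part of why the format choice matters beyond raw encode/decode speed.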

&lt;p&gt;The last thing I would find interesting to measure in real-world workloads is how often a document is serialized compared to deserialized. It seems reasonable to assume the average document is deserialized more than once for each time it's serialized. Yet not all data is the same -- many kinds of data, such as application logs, are written often and rarely read.&lt;/p&gt;

&lt;h2&gt;
  
  
  What should &lt;code&gt;PliantDb&lt;/code&gt; support?
&lt;/h2&gt;

&lt;p&gt;Because of this mixture of "who pays the cost", there may ultimately not be a single correct answer. My gut says &lt;code&gt;PBOR&lt;/code&gt; is an interesting option, but there are significant benefits to using an open standard like &lt;code&gt;CBOR&lt;/code&gt;. I don't believe either choice will significantly affect the performance of &lt;code&gt;PliantDb&lt;/code&gt; servers. Finishing up &lt;code&gt;PBOR&lt;/code&gt; would require several more days to flesh out unit tests and benchmarks and to smooth a few rough edges.&lt;/p&gt;

&lt;p&gt;As such, I'm seeking input! I'd love to hear your thoughts on self-describing format support. Here are the three options as I see them, but please leave a comment if you have other ideas.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stick with &lt;code&gt;CBOR&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PBOR&lt;/code&gt; sounds worth pursuing further&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PliantDb&lt;/code&gt; shouldn't have one enabled by default, and users should be able to pick via feature flags. Clients and servers should be able to support multiple formats at the same time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm running a poll &lt;a href="https://community.khonsulabs.com/t/towards-stabilization-serialization-format-s-for-pliantdb/71#what-should-pliantdb-support-7"&gt;on this post on our Discourse forums&lt;/a&gt;, but I would love feedback in whatever way is the easiest for you to provide it.&lt;/p&gt;

&lt;p&gt;Keep in mind that this only impacts the built-in deserialization methods. You can always interact with the document contents directly to use your own libraries.&lt;/p&gt;

&lt;p&gt;Thank you in advance for any feedback!&lt;/p&gt;

</description>
      <category>devlog</category>
      <category>rust</category>
      <category>pliantdb</category>
      <category>database</category>
    </item>
    <item>
      <title>Guaranteed unique; Or, why dogfooding can be taxing.</title>
      <dc:creator>Jonathan Johnson</dc:creator>
      <pubDate>Sun, 02 May 2021 00:00:00 +0000</pubDate>
      <link>https://dev.to/ecton/guaranteed-unique-or-why-dogfooding-can-be-taxing-2gcn</link>
      <guid>https://dev.to/ecton/guaranteed-unique-or-why-dogfooding-can-be-taxing-2gcn</guid>
      <description>&lt;p&gt;As I looked towards &lt;a href="https://dev.to/ecton/pliantdb-0-1-0-dev-3-updates-thoughts-on-the-vision-2p7l"&gt;the future of PliantDb&lt;/a&gt;, I thought my next step was to begin working on the permissions system. I've been setting a goal to try to have &lt;a href="https://github.com/khonsulabs/cosmicverge"&gt;Cosmic Verge&lt;/a&gt; running on &lt;a href="https://github.com/khonsulabs/pliantdb"&gt;PliantDb&lt;/a&gt; by Saturday so that when I give an update on the game, it will have had some meaningful progress. In reviewing my action plan, I wanted the native clients to talk to the PliantDb server directly over PubSub. To do that without fear of people doing something that could break the game, I wanted to restrict unauthenticated database connections to specific actions. For the demo, there wouldn't be any user accounts.&lt;/p&gt;

&lt;p&gt;I spent some time working on a permissions system design inspired by &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html"&gt;AWS's IAM policy system&lt;/a&gt;. I'm delighted with how that API is coming along, and I'm excited by our vision for building the permissions system in a way that makes applying it automatic and straightforward while still allowing flexibility for complicated logic as needed. But this post isn't about that -- I'll write up a summary once I've finished implementing the system. The reason for this post is a seemingly unrelated feature: unique views. As odd as it sounds, I couldn't bring myself to finish the permissions system until after I solved this problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Guaranteeing Uniqueness
&lt;/h2&gt;

&lt;p&gt;As I finished writing the lower-level part of the permissions system, I began looking at how the permissions would be managed -- through roles and groups. My approach to Cosmic Verge's development is the same as PliantDb's development: If I can design some chunk of code that can be reused over and over to build my project, I'm going to want to use that tool for the job. PliantDb's job is to store collections of data. These permission groups and roles should be implemented using the same schema objects that PliantDb users will be using: PliantDb needs to &lt;a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food"&gt;eat its own dogfood&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For these structures, I wanted a human-readable name as their unique identifier. When reading these permission structures, seeing "Administrators" as the group name instead of "21674831" is infinitely more useful. In a traditional database, the first tool I would reach for would be a &lt;code&gt;varchar&lt;/code&gt; primary key. In CouchDB, if you don't specify an ID when you create a document, it automatically generates a UUID-style ID. However, you can also specify an ID at insertion time, and it will use that ID -- and in CouchDB, that can be any JSON data type. In PliantDb, to keep things simple and efficient, I decided to restrict document IDs to &lt;code&gt;u64&lt;/code&gt;s.&lt;/p&gt;

&lt;p&gt;Another approach traditional databases offer is the "unique constraint": the ability to have the database check, before updating or inserting any data, that certain constraints hold true. For PliantDb, I had the idea of supporting "unique views," which would optionally allow any &lt;code&gt;View&lt;/code&gt; to restrict its entries to one per key. For example, in the &lt;code&gt;PermissionGroup&lt;/code&gt; collection, I could define this view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;View&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;PermissionGroupsByName&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PermissionGroup&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;'_&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nn"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MapResult&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nn"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;group&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="py"&gt;.contents&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PermissionGroup&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="nf"&gt;.emit_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;group&lt;/span&gt;&lt;span class="py"&gt;.name&lt;/span&gt;&lt;span class="nf"&gt;.to_ascii_lowercase&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whenever a new &lt;code&gt;PermissionGroup&lt;/code&gt; is inserted or updated with a key that already exists, a &lt;code&gt;UniqueKeyViolation&lt;/code&gt; error will be returned.&lt;/p&gt;

&lt;p&gt;The first approach solves a core desire of mine, but the second approach is much more versatile. Ideally, both would be supported by PliantDb.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supporting Arbitrary Primary Key Types
&lt;/h2&gt;

&lt;p&gt;I felt inspired to dive into the first approach: supporting arbitrary primary keys. I started with &lt;code&gt;Document&lt;/code&gt;, changing the &lt;code&gt;id: u64&lt;/code&gt; to &lt;code&gt;id: Vec&amp;lt;u8&amp;gt;&lt;/code&gt; to support an arbitrary number of bytes. I then added an &lt;code&gt;id()&lt;/code&gt; method, which attempts to decode the value using the &lt;code&gt;Key&lt;/code&gt; trait that Views already use. Unfortunately, this decoding can fail, so changing all of the &lt;code&gt;doc.header.id&lt;/code&gt; references to &lt;code&gt;doc.header.id()?&lt;/code&gt; started to take a toll on the readability of the code. Eventually, I decided it was too many question marks for a user to endure and backed out of this approach.&lt;/p&gt;
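A simplified sketch of why the ergonomics suffered, using invented stand-in types (not PliantDb's real ones; `decode_id` here stands in for the `Key` trait's decoding step):

```rust
use std::convert::TryInto;

// Decoding the raw ID bytes back into a u64 can fail, which is where the
// ergonomics problem comes from.
fn decode_id(bytes: &[u8]) -> Result<u64, String> {
    let bytes: [u8; 8] = bytes
        .try_into()
        .map_err(|_| String::from("id was not 8 bytes"))?;
    Ok(u64::from_be_bytes(bytes))
}

struct Header {
    id: Vec<u8>,
}

struct Document {
    header: Header,
}

impl Document {
    fn id(&self) -> Result<u64, String> {
        decode_id(&self.header.id)
    }
}

fn main() -> Result<(), String> {
    let doc = Document {
        header: Header { id: 42u64.to_be_bytes().to_vec() },
    };
    // What used to read `doc.header.id` now reads `doc.id()?` everywhere,
    // and every caller inherits the error handling.
    assert_eq!(doc.id()?, 42);
    // An ID that isn't 8 bytes surfaces as an error instead of a value.
    assert!(decode_id(&[1, 2, 3]).is_err());
    Ok(())
}
```

Every call site picking up a `?` (or an `unwrap()`) is exactly the "too many question marks" problem that made me back out.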

&lt;p&gt;Despite having rolled back my changes, I may still reattempt to support this feature using a different approach -- but it will need to involve a new Document type, one that already deserializes the ID into a generic type.&lt;/p&gt;

&lt;h2&gt;
  
  
  The challenges of dogfooding
&lt;/h2&gt;

&lt;p&gt;This moment is where the inspiration for this blog post came from: dogfooding a large project can be hard. As I stared at the back-to-square-zero code base, I was tempted to ignore the problem. After all, the only real problem arises in heavily concurrent situations, and will PliantDb users really be adjusting their permissions in contentious situations? Probably not.&lt;/p&gt;

&lt;p&gt;Each decision to push functionality down the line accrues some technical debt. PliantDb is the core of the architecture we're trying to build in Cosmic Verge. To me, the places where you want the least technical debt are the parts at the "core" of your codebase.&lt;/p&gt;

&lt;p&gt;This high amount of dogfooding will hopefully allow us to achieve these grandiose goals, but it does come at the cost of extra time spent on the core components to ensure the entire machine works. My break from the computer helped me remember a few important lessons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First lesson: don't set arbitrary deadlines when "passion" is in the project description.&lt;/strong&gt; This post started off discussing how I was trying to progress on the game itself before the next game dev meetup; yesterday was one week from that meetup. The motivation for the goal seemed innocuous: it's a game dev meetup, so I wanted to show progress on the game itself. But the reality is that there's no real pressure to do so. No one is seriously expecting an entire database engine to be done in less than two months. I knew I could hit that goal. Heck, I even think I might still hit it. But that's beside the point: setting goals is different from setting deadlines. My goal is to get Cosmic Verge on PliantDb. But if I force myself to meet that goal on a deadline, I might end up with technical debt, and worse, it might come at the cost of mental stress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second lesson: dream big, but take each day one step at a time.&lt;/strong&gt; With just PliantDb, my list of things I could tackle each day is immense. Add a game to it, and there's no way for a single person to cross the finish line for my lofty visions for both projects. The reality is that I can't make PliantDb or Cosmic Verge reach their fullest vision on my own. But, I can try my best to ensure I'm a step closer to that vision after each day I work. This works right now because I generally try to plan a few steps ahead of where I'm going. For example, right now the high-level PliantDb list is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Permissions&lt;/li&gt;
&lt;li&gt;Platform Trait&lt;/li&gt;
&lt;li&gt;Multi-user support&lt;/li&gt;
&lt;li&gt;Replication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I tend to take this approach to planning because I like to reflect on the remaining items each time I finish one. I want to be sure they still seem like the best next steps; otherwise, I might want to spend some time adjusting my plan. When I was in the moment and getting frustrated, I had lost sight of this process and was focusing on delivering an arbitrary feature set by an arbitrary date. While I could take on the stress of ensuring I have something by that date, the much healthier approach is to take each day in stride and evaluate where I'm at closer to the meetup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third lesson: Me-time is needed too.&lt;/strong&gt; If you've seen my Discord statuses over the last month, you'll know that my progress on &lt;code&gt;PliantDb&lt;/code&gt; has been made despite an increasing amount of time I've been playing Factorio. I've been having some regular gaming sessions with friends, spending time with my wife, and making progress on a re-listen of &lt;a href="https://en.wikipedia.org/wiki/The_Shadow_of_What_Was_Lost"&gt;The Licanius Trilogy&lt;/a&gt;. But, nearly every waking moment that wasn't a chore or hanging out with someone else was spent working on PliantDb.&lt;/p&gt;

&lt;p&gt;This realization hit me as I sat down to really enjoy the piano for the first time in several weeks. I play regularly, but lately it had been only for 20-30 minutes every few days. I still had fun playing, but I was usually playing to feel like I was practicing. I started the same way this weekend, and after a couple of songs, I came back and sat at the computer. I thought I was ready to tackle unique views. Yet the longer I stared at the monitor, the more I realized that playing the piano sounded better to me at that moment. I went back to the piano and played until my back was sore -- in a good way.&lt;/p&gt;

&lt;p&gt;Those moments enjoying the music for the escape it was providing made me realize I needed to relax for the rest of the day and take the artificial pressure off myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Unique Views
&lt;/h2&gt;

&lt;p&gt;Last night while enjoying my time away from the computer, I still found myself pondering the challenges of implementing unique views. It sounds simple at first glance, but it flips the responsibility of view updating on its head. Before this implementation, no views were updated immediately when you saved a document. Instead, when a view was queried, it was indexed at that time and the results returned. This means that if you update a document 5 times but only access the view once, the view's code is only evaluated once. However, for a unique view to work, document saving &lt;em&gt;must&lt;/em&gt; take on the responsibility of updating the view index.&lt;/p&gt;
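The eager check itself can be sketched with a plain in-memory map standing in for PliantDb's view storage. The types and names here are invented for illustration, not PliantDb's real implementation:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum SaveError {
    UniqueKeyViolation(String),
}

struct UniqueIndex {
    // view key -> document id
    entries: HashMap<String, u64>,
}

impl UniqueIndex {
    fn new() -> Self {
        UniqueIndex { entries: HashMap::new() }
    }

    // Unlike a lazily-updated view, this runs during every save so a
    // conflict is detected before the document is committed.
    fn check_and_insert(&mut self, key: String, doc_id: u64) -> Result<(), SaveError> {
        match self.entries.get(&key) {
            Some(existing) if *existing != doc_id => {
                Err(SaveError::UniqueKeyViolation(key))
            }
            _ => {
                self.entries.insert(key, doc_id);
                Ok(())
            }
        }
    }
}

fn main() {
    let mut index = UniqueIndex::new();
    assert!(index.check_and_insert("administrators".into(), 1).is_ok());
    // Updating the same document with the same key is fine...
    assert!(index.check_and_insert("administrators".into(), 1).is_ok());
    // ...but a different document claiming the key is rejected.
    assert_eq!(
        index.check_and_insert("administrators".into(), 2),
        Err(SaveError::UniqueKeyViolation("administrators".into()))
    );
}
```

The key difference from the lazy path is simply when this runs: at save time, inside the transaction, rather than at query time.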

&lt;p&gt;A little while ago, I noticed that you can define an associated constant in a trait, and implementors of that trait can be required to provide their own value. I haven't seen this used much in practice, but I immediately thought of using it for the view's unique flag. The inline example earlier in this post shows how this works for the &lt;code&gt;View&lt;/code&gt; trait in &lt;code&gt;PliantDb&lt;/code&gt; now. I'm not sure if I'll keep this approach or change it to a function, the way &lt;code&gt;version()&lt;/code&gt; is. For today, the constant makes more sense, but in the long term I also envision dynamic views that will need to be created at runtime.&lt;/p&gt;
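For anyone who hasn't run into the feature, here's a minimal, standalone illustration of associated constants in traits. It mirrors the shape of the earlier view example but is not PliantDb's actual trait:

```rust
// Implementors of the trait are required to supply a value for the
// associated constant, and generic code can read it at compile time.
trait View {
    const UNIQUE: bool;
    fn name(&self) -> String;
}

struct GroupsByName;

impl View for GroupsByName {
    const UNIQUE: bool = true;
    fn name(&self) -> String {
        String::from("groups-by-name")
    }
}

// Generic code can branch on the constant without a runtime method call.
fn describe<V: View>(view: &V) -> String {
    if V::UNIQUE {
        format!("{} (unique)", view.name())
    } else {
        view.name()
    }
}

fn main() {
    assert_eq!(describe(&GroupsByName), "groups-by-name (unique)");
}
```

The tradeoff versus a `fn unique(&self) -> bool` method is exactly the one mentioned above: a constant is available without an instance, but a method could return a value computed at runtime, which dynamic views would need.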

&lt;p&gt;Despite my fear of revisiting some of the first code I wrote for PliantDb, overall it was a pretty painless process. The view indexer already had the individual-document update logic in its own function, so it was easy to call it from the transaction executor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Lesson: Worrying isn't worth it
&lt;/h2&gt;

&lt;p&gt;I'm happy I got this feature done. Its journey to completion started weeks ago: when I was thinking of porting Cosmic Verge to PliantDb, I had identified this as something I would want. But each time I thought of it, I dreaded it. I worried about how annoying changing that logic was going to be. I felt like it was going to make already complex code much harder to understand.&lt;/p&gt;

&lt;p&gt;In the end, it was much easier than anticipated. And, now that it's done, I'm excited at how much less "blocked" I feel on the project. All of the worrying amounted to nothing except stress.&lt;/p&gt;

&lt;p&gt;So, instead of promising when the next update will happen, I'll just say I'm looking forward to giving an overview of the permissions system whenever I'm done. Until next time!&lt;/p&gt;

</description>
      <category>devlog</category>
      <category>rust</category>
      <category>pliantdb</category>
      <category>database</category>
    </item>
    <item>
      <title>PliantDb 0.1.0-dev.3: Updates + Thoughts on the Vision</title>
      <dc:creator>Jonathan Johnson</dc:creator>
      <pubDate>Mon, 26 Apr 2021 00:00:00 +0000</pubDate>
      <link>https://dev.to/ecton/pliantdb-0-1-0-dev-3-updates-thoughts-on-the-vision-2p7l</link>
      <guid>https://dev.to/ecton/pliantdb-0-1-0-dev-3-updates-thoughts-on-the-vision-2p7l</guid>
      <description>&lt;p&gt;When I last left off, I had reached a significant milestone for &lt;a href="https://github.com/khonsulabs/pliantdb"&gt;PliantDb&lt;/a&gt;: I had &lt;a href="https://dev.to/ecton/plaintdb-serves-another-milestone-reached-kl3"&gt;just released&lt;/a&gt; the first version of the client-server functionality. Today, I wanted to recap what I've done in the last two weeks, but more importantly, start painting a picture of where this project is going in my head. If you thought the goals in the project's README were already lofty, you're in for a journey today.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's new in PliantDb 0.1.0-dev.3
&lt;/h2&gt;

&lt;h3&gt;
  
  
  PubSub
&lt;/h3&gt;

&lt;p&gt;PliantDb now offers &lt;a href="https://pliantdb.dev/guide/about/concepts/pubsub.html"&gt;PubSub&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;subscriber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="nf"&gt;.create_subscriber&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c"&gt;// Subscribe for messages sent to the topic "ping"&lt;/span&gt;
&lt;span class="n"&gt;subscriber&lt;/span&gt;&lt;span class="nf"&gt;.subscribe_to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ping"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="nf"&gt;.publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ping"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="mi"&gt;1_u32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Got ping message: {:?}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;subscriber&lt;/span&gt;&lt;span class="nf"&gt;.receiver&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.recv_async&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key-Value Store
&lt;/h3&gt;

&lt;p&gt;I don't want any compromise on the ACID compliance of transactions in collections, yet that comes at a significant performance cost. Sometimes, you'd rather sacrifice data safety for high performance. The &lt;a href="https://pliantdb.dev/guide/traits/kv.html"&gt;Key-Value store&lt;/a&gt; aims to provide Redis-like speed and functionality in PliantDb. The current API is limited to basic set/get/delete key operations, but it supports enough atomic operations to enable using the key-value store as a synchronized lock provider. For example, executing this operation on multiple clients will result in only one client executing the isolated code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="nf"&gt;.set_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"lock-name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;my_process_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.only_if_vacant&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.expire_in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_millis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nn"&gt;KeyStatus&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Inserted&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Run the isolated code&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="mi"&gt;_&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Other client acquired the lock."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Improving the onboarding experience
&lt;/h3&gt;

&lt;p&gt;From the start of this project, I've enforced public APIs to have documentation. I've also tried to create &lt;a href="https://github.com/khonsulabs/pliantdb/tree/main/pliantdb/examples"&gt;reasonably simple examples&lt;/a&gt; of the basic functionality of PliantDb. However, for a project like this, there's a lot of use-case-specific topics that need to be covered. I decided there needed to be a book. It's still very early in progress, but it seems perfect to share at this stage of the project: &lt;a href="https://pliantdb.dev/guide/"&gt;pliantdb.dev/guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;At this stage, I wouldn't want anyone to use PliantDb in a real project yet. I think the storage mechanisms themselves are reliable and can be trusted, but I can't guarantee that the storage format will be stable between versions. Because of this harsh anti-recommendation, the guide is at a good stage for people interested in the project: it covers some high-level concepts. It also begins to explore some of the ideas I discuss later in this post -- writing those sections was an inspiration for this post.&lt;/p&gt;

&lt;p&gt;I want the user guide to cover the knowledge someone needs to feel confident being their own database administrator. It sounds daunting, but the goal of PliantDb is to make being a responsible database administrator as easy as it can be.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keeping PliantDb Modular
&lt;/h3&gt;

&lt;p&gt;While having many feature flags can be daunting, I think I've come up with a good approach to the feature flags in the "omnibus" crate. If you're just getting a project up and running, &lt;code&gt;full&lt;/code&gt; can be used to bring in everything. If you want to pick and choose, you can now enable each of these features independently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PubSub&lt;/li&gt;
&lt;li&gt;Key-Value store&lt;/li&gt;
&lt;li&gt;WebSockets&lt;/li&gt;
&lt;li&gt;trust-dns based DNS resolution on the client&lt;/li&gt;
&lt;li&gt;Command-Line structures/binary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a core goal of mine: while PliantDb is a reasonably large project, you will be able to pick and choose what you need in your database. Some functionality will need to be integrated to work optimally, but as much as possible will be kept modular.&lt;/p&gt;

&lt;p&gt;Fun fact: there are currently &lt;a href="https://github.com/khonsulabs/pliantdb/actions/runs/787744699"&gt;18 build jobs&lt;/a&gt; processed in CI to ensure each of the various valid feature flag combinations compile and pass unit tests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Redefining the Onion
&lt;/h3&gt;

&lt;p&gt;The design of PliantDb is meant to be layered, kind of like an onion. The &lt;code&gt;pliantdb-core&lt;/code&gt; is the core of the proverbial onion, and the first layer around it is &lt;code&gt;pliantdb-local&lt;/code&gt;. Before today, here's how I described the layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;local&lt;/code&gt;: Single-database storage mechanism, comparable to SQLite&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server&lt;/code&gt;: Multi-database networked server, comparable to CouchDB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In discussing the plans I'm about to unveil, I realized I had gone too far in mimicking CouchDB's design. I had implemented the multi-database abstraction as a server-level operation -- the server doesn't really care about the databases; it just organizes multiple databases together. But CouchDB is only accessible via HTTP, unlike PliantDb. In PliantDb, your code can run in the same executable as the database.&lt;/p&gt;

&lt;p&gt;Because of this, a completely offline multi-database storage mechanism is a very valid use case. Suppose you are running a single-machine setup and don't need any other access to the database. In that case, you should be able to utilize all of PliantDb's features that make sense: multiple databases, the key-value store, PubSub, and more to come. This realization had me commit a &lt;a href="https://github.com/khonsulabs/pliantdb/pull/49"&gt;massive refactoring&lt;/a&gt; defining the layers as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;local&lt;/code&gt;: The &lt;code&gt;Storage&lt;/code&gt; type provides multi-database management, and the &lt;code&gt;Database&lt;/code&gt; type provides access to a single database.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server&lt;/code&gt;: The &lt;code&gt;Server&lt;/code&gt; type uses a &lt;code&gt;Storage&lt;/code&gt; instance internally and allows accessing it over a network.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a testament to this being the correct design decision, I was able to remove many internal APIs that were needed to support the Server before. While it was a painstaking process, I'm pleased with the outcome.&lt;/p&gt;
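The new layering can be sketched with drastically simplified stand-in types. This is not the real PliantDb API; it only illustrates the relationship between the layers:

```rust
use std::collections::HashMap;

struct Database {
    documents: Vec<String>,
}

// `local` layer: multi-database management works entirely offline.
struct Storage {
    databases: HashMap<String, Database>,
}

impl Storage {
    fn new() -> Self {
        Storage { databases: HashMap::new() }
    }

    fn create_database(&mut self, name: &str) {
        self.databases.insert(
            name.to_string(),
            Database { documents: Vec::new() },
        );
    }

    fn database(&mut self, name: &str) -> Option<&mut Database> {
        self.databases.get_mut(name)
    }
}

// `server` layer: only adds networking around an ordinary Storage; it does
// not reimplement database management.
struct Server {
    storage: Storage,
}

fn main() {
    // Offline, multi-database use without any server in the picture.
    let mut storage = Storage::new();
    storage.create_database("default");
    storage
        .database("default")
        .unwrap()
        .documents
        .push(String::from("doc"));
    assert_eq!(storage.database("default").unwrap().documents.len(), 1);

    // The networked layer simply wraps the same storage type.
    let _server = Server { storage };
}
```

Pushing multi-database management down into the `local` layer is what let the server-specific internal APIs disappear.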

&lt;h3&gt;
  
  
  Fixed backup/restore
&lt;/h3&gt;

&lt;p&gt;As part of the previous pull request, there was a fix to the backup process. The bug wasn't related to the safety of the data but rather that I wasn't saving the executed transaction metadata. At the time, that was a design decision, but I didn't test well enough. It wasn't until the multi-database implementation used a view query under the hood that an &lt;code&gt;expect()&lt;/code&gt; failed in the view indexer, which had made an entirely reasonable assumption: if there were documents, there must be a transaction ID.&lt;/p&gt;

&lt;p&gt;As I thought about my original decision, I realized I was deeply mistaken. Not saving the transaction information breaks a restored database's ability to keep replication history. So, now that I've updated backup/restore to work across multiple databases (another side effect of this design decision) and included transaction information, here's what it looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nG7RG8AY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://ecton.dev/pliantdb-vision-backup.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nG7RG8AY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://ecton.dev/pliantdb-vision-backup.png" alt="Backup File Listing"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The top-level directories, &lt;code&gt;admin&lt;/code&gt; and &lt;code&gt;default&lt;/code&gt;, are the two databases exported in this example. The &lt;code&gt;admin&lt;/code&gt; database is the internal database used to track the databases that have been created. &lt;code&gt;default&lt;/code&gt; is the name given to a database created for you automatically during &lt;code&gt;Database::open_local&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Inside of each database folder is a &lt;code&gt;_transactions&lt;/code&gt; folder. Each file is a single &lt;a href="https://pliantdb.dev/main/pliantdb/core/transaction/struct.Executed.html"&gt;&lt;code&gt;Executed&lt;/code&gt;&lt;/a&gt; transaction.&lt;/p&gt;

&lt;p&gt;All of the remaining folders will be &lt;a href="https://pliantdb.dev/guide/about/concepts/collection.html"&gt;&lt;code&gt;Collections&lt;/code&gt;&lt;/a&gt; of documents. Each file is named using the document ID and the revision number. The contents of the file are the exact bytes that were stored in the document, which usually means it's encoded as &lt;a href="https://cbor.io/"&gt;CBOR&lt;/a&gt;. But, you can manage the document bytes directly if you desire.&lt;/p&gt;

&lt;h2&gt;What's the end goal of PliantDb?&lt;/h2&gt;

&lt;p&gt;As odd as it may sound, I'm writing PliantDb to power a &lt;a href="https://cosmicverge.com/"&gt;game I'm writing&lt;/a&gt;. As I mentioned in my last post, the game is currently using PostgreSQL and Redis, and the changes above were all inspired by thinking about what I need in order to update Cosmic Verge to use PliantDb instead of those two engines.&lt;/p&gt;

&lt;p&gt;Once I finished the key-value store, I found myself ready to start on that task! But as I tried to figure out where to begin the refactoring, I realized I had been having grandiose visions for PliantDb that I thought were unrelated to Cosmic Verge... only now, the more I thought about them, the more relevant they seemed.&lt;/p&gt;

&lt;p&gt;I'm going to start with the conclusion: PliantDb's ultimate form is a platform to help you build a modern Rust-y app. For Cosmic Verge, it will be what game clients connect to over the internet, and it's what our internal API will be powered by. To support this safely, a robust permissions model will be needed. But, rest assured, if all you want is a local database with minimal features, you'll be able to get just that and no more.&lt;/p&gt;

&lt;p&gt;To understand why this is the logical conclusion of multiple days of conversations on Discord, we first need to look at the goals of the Cosmic Verge architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We want to have a large number of "locations" with independent sets of data and regular-interval game loops.&lt;/li&gt;
&lt;li&gt;We want to have a cluster that can scale up and down as needed to meet demand. This means dynamically moving locations between servers as load changes.&lt;/li&gt;
&lt;li&gt;We want to have every location be configured in a highly-available setup. If one server fails, clients should barely notice a hiccup (the only hiccup being if they dropped their connection).&lt;/li&gt;
&lt;li&gt;Every server will have PliantDb data on it, but we want custom logic driving placement of data/tasks within the cluster. We want to be able to use metrics to balance load intelligently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of these basic facts, we concluded that every Cosmic Verge server was going to be part of the PliantDb cluster. And, if each server was going to be connected via PliantDb anyway, could we solve our networking problems better by implementing a separate protocol? In the end, we decided we couldn't. More importantly, as we reviewed the features Cosmic Verge needed for clustering against the features PliantDb required, we realized the overlap was too significant to ignore.&lt;/p&gt;

&lt;p&gt;Why is this better than using some other database cluster? It boils down to how PliantDb works in the server's executable. Each instance of the Cosmic Verge server will open a PliantDb server in cluster mode. When the server's code calls into the cluster, it will know what servers contain the data in question. For a PubSub message, for example, it knows precisely which servers have any subscribers listening to the topic of the message being published. Because of this knowledge, a PubSub message sent through the PliantDb cluster will be a direct message between two servers in the same cluster. The same knowledge also works for all database operations. If you need a quorum write to succeed, and you're one of the three servers in that particular database shard's cluster, only two network requests are sent. Or, if you ask for a cached view result, your local server instance will return the data without making a network request if it can.&lt;/p&gt;
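&lt;p&gt;To make the quorum-write arithmetic concrete, here's a minimal, hypothetical sketch (none of these names are PliantDb APIs): when the local node is one of a shard's replicas, its own write costs nothing over the network, so only the remaining replicas need to be contacted.&lt;/p&gt;

```rust
/// Illustrative only: given a shard's replica set and the identity of the
/// node executing the write, count the network requests required to reach
/// every replica. The local replica writes directly, with no round trip.
fn network_requests_for_write(replicas: &[&str], local_node: &str) -> usize {
    if replicas.contains(&local_node) {
        // We hold one of the replicas ourselves; contact only the others.
        replicas.len() - 1
    } else {
        // Not a replica: every write goes over the network.
        replicas.len()
    }
}
```

This is the "only two network requests" case from above: a three-server shard where the caller is one of the three.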

&lt;p&gt;But, what about the actual game API? How is PliantDb going to help with that? Let me introduce a project that I haven't updated in a little while: &lt;a href="https://github.com/khonsulabs/basws"&gt;basws&lt;/a&gt;. This is the project that Cosmic Verge is currently built on. The main idea is to provide a simple way to create API servers, abstracting the authentication/re-authentication logic as much as possible. As I started envisioning how I would integrate PliantDb with it, I realized that I wanted PliantDb itself to have some of this functionality. It wouldn't be hard to add it into PliantDb and give it direct support for the Users and Permissions models. A clear win for Cosmic Verge, and hopefully for a lot of other developers too.&lt;/p&gt;

&lt;h2&gt;What's next?&lt;/h2&gt;

&lt;p&gt;I'm hoping to demo a native-client version of Cosmic Verge, powered by PliantDb, at next month's &lt;a href="https://www.youtube.com/watch?v=gqCxt8XL92o"&gt;Rust Game Dev Meetup&lt;/a&gt;, but to do that I need a few more things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Permissions: I don't want to allow people to connect to a PliantDb server that has no concept of permissions.&lt;/li&gt;
&lt;li&gt;basws-like API layer: This layer will be defined as a trait that you will be able to optionally provide on the &lt;code&gt;Server&lt;/code&gt; and (eventually) &lt;code&gt;Cluster&lt;/code&gt; types.&lt;/li&gt;
&lt;li&gt;Users: needed if I want to support logging in, although for the demo I might simply give each player a unique random color.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The next meetup is on May 8. I'm hopeful, but there's a lot of work to do. And, I keep finding myself writing very long blog posts!&lt;/p&gt;

&lt;p&gt;As always, thank you for reading. I hope you're interested in &lt;a href="https://github.com/khonsulabs/pliantdb"&gt;PliantDb&lt;/a&gt;. If you'd like to join the development conversations, join our &lt;a href="https://discord.khonsulabs.com"&gt;Discord&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>pliantdb</category>
      <category>database</category>
    </item>
    <item>
      <title>PliantDB Serves - another milestone reached</title>
      <dc:creator>Jonathan Johnson</dc:creator>
      <pubDate>Wed, 14 Apr 2021 17:53:00 +0000</pubDate>
      <link>https://dev.to/ecton/plaintdb-serves-another-milestone-reached-kl3</link>
      <guid>https://dev.to/ecton/plaintdb-serves-another-milestone-reached-kl3</guid>
      <description>&lt;p&gt;It's been a productive couple of weeks since I &lt;a href="https://ecton.dev/introducing-pliantdb/"&gt;introduced PliantDB&lt;/a&gt;. I merged the &lt;a href="https://github.com/khonsulabs/pliantdb/pull/28"&gt;pull request enabling client/server communications&lt;/a&gt;. The journey took a little longer than I had anticipated, but that's for a few reasons. Ultimately, I want to stress something: You can be extremely productive in Rust.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/XE0lH0tlbBs?start=1647"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you just want to learn how PliantDB's engine works, &lt;a href="https://ecton.dev/introducing-pliantdb/"&gt;my previous post&lt;/a&gt; goes into more detail. Or, you can listen to my talk at last Saturday's Rust Game Dev Meetup, embedded above. Today, I'm going to talk more about the process of developing.&lt;/p&gt;

&lt;h2&gt;My journey from Rust-noob to when I began PliantDB&lt;/h2&gt;

&lt;p&gt;My Rust journey began a few years ago when I haphazardly threw together a small tool to wait for AWS CloudFormation stacks to reach a "complete" state. The official AWS CLI application allows you to wait for a single state, such as "UPDATE_COMPLETE," but not for one of many states (or any state matching a COMPLETE-like status). So, I wrote a simple tool using &lt;a href="https://lib.rs/rusoto"&gt;rusoto&lt;/a&gt;. I liked the idea of Rust, but it didn't click for me yet. Stubborn me didn't actually read the book.&lt;/p&gt;

&lt;p&gt;Fast forward to when I was daydreaming about quitting my day job to pursue game development. By that point, I firmly believed Rust was a big deal, but I still hadn't done anything beyond that simple tool. It wasn't until I quit my job in November 2019 that I started diving into Rust full time.&lt;/p&gt;

&lt;p&gt;PliantDB's initial commit was on &lt;a href="https://github.com/khonsulabs/pliantdb/commit/43bd3a25b61fc7841c9554422d7bb46ad4362e59"&gt;Friday, March 19&lt;/a&gt;. I know I began writing code that morning because I kicked off the day by having a conversation with one of my former business partners: "You'll never guess what I'm seriously thinking of doing after we end our call."&lt;/p&gt;

&lt;p&gt;When I told him, "I'm going to write my own CouchDB-like database," he protested in the fashion he always would as we debated ideas back when we ran our business together. Within a few minutes, I had sold him on the idea, which gave me the last boost of confidence I needed to embark on what most developers would consider a foolish endeavor.&lt;/p&gt;

&lt;h2&gt;Tackling async compatibility issues&lt;/h2&gt;

&lt;p&gt;I settled on &lt;a href="https://sled.rs"&gt;sled&lt;/a&gt; after evaluating the landscape of available BTree-like data storage layers. It's a complex project, but it's well-tested and fairly widely used. From the initial moments of designing this architecture, I was thinking of how to fit it within sled to utilize its transactions to ensure ACID compliance.&lt;/p&gt;

&lt;p&gt;This fundamental decision wasn't without downsides, the primary one being that sled isn't "compatible" with async/await in Rust. If you're trying to integrate it within an app that uses tokio, for example, you either need to operate sled in its own thread pool outside of tokio, or you need to use blocking wrappers such as &lt;code&gt;spawn_blocking&lt;/code&gt;. These come with their own downsides: a long-running blocking task can prevent other tasks scheduled on the blocked thread from executing.&lt;/p&gt;
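&lt;p&gt;To illustrate the blocking-wrapper idea without pulling in tokio, here's a std-only sketch of the pattern: the blocking call runs on a dedicated thread and the caller receives the result over a channel. In the real code this role is played by tokio's &lt;code&gt;spawn_blocking&lt;/code&gt;; &lt;code&gt;slow_get&lt;/code&gt; is a hypothetical stand-in for a blocking sled lookup.&lt;/p&gt;

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a blocking storage call (e.g. a sled lookup
// that does disk I/O).
fn slow_get(key: &str) -> Option<String> {
    if key == "hello" {
        Some("world".into())
    } else {
        None
    }
}

// Offload the blocking call to its own thread so the caller's (executor)
// thread stays free; the result comes back over a channel. Note the owned
// `String` parameter: moving owned data into the worker thread is the same
// reason `spawn_blocking` imposes a `'static` requirement.
fn get_off_thread(key: String) -> mpsc::Receiver<Option<String>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(slow_get(&key));
    });
    rx
}
```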

&lt;p&gt;For today, I've chosen my best guess as to the best type of blocking wrapper for each operation, but the long-term goal is utilizing a new async executor that &lt;a href="https://github.com/daxpedda"&gt;Daxpedda&lt;/a&gt; is working on. It's compatible with tokio, but it already has a concept named &lt;code&gt;block_on_blocking&lt;/code&gt;, an optimized version of blocking designed to block more fairly without imposing the &lt;code&gt;'static&lt;/code&gt; lifetime requirement that comes with using &lt;code&gt;spawn_blocking&lt;/code&gt;. He's responsible for the &lt;a href="https://github.com/khonsulabs/fabruic/"&gt;QUIC-based networking stack&lt;/a&gt; PliantDB is using and is wrapping up a few last requests there before resuming work on the executor.&lt;/p&gt;

&lt;h2&gt;Complexities of supporting a rich type system over a network&lt;/h2&gt;

&lt;p&gt;The second major battle was something I hadn't fully comprehended when I started: How do you deal with types in a safe way while only exchanging bytes between a client and server? In my head, I knew &lt;a href="https://serde.rs"&gt;serde&lt;/a&gt; was going to be a big part of the solution, but I didn't quite realize the levels of indirection I was going to need.&lt;/p&gt;

&lt;p&gt;Let's take a look at an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="n"&gt;db&lt;/span&gt;
 &lt;span class="py"&gt;.view&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ShapesByNumberOfSides&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
 &lt;span class="nf"&gt;.with_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="nf"&gt;.query_with_docs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
 &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code could be running on a client talking to a remote database, or against a local PliantDB in a form akin to SQLite. This is meant to be one of PliantDB's selling points, but making it work is rather tricky. Here's how it works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.rs/pliantdb/0.1.0-dev-2/pliantdb/core/connection/trait.Connection.html#method.view"&gt;&lt;code&gt;db.view::&amp;lt;ShapesByNumberOfSides&amp;gt;()&lt;/code&gt;&lt;/a&gt; returns a &lt;a href="https://docs.rs/pliantdb/0.1.0-dev-2/pliantdb/local/core/connection/struct.View.html"&gt;View&lt;/a&gt;, which acts as a &lt;a href="https://doc.rust-lang.org/1.0.0/style/ownership/builders.html"&gt;builder&lt;/a&gt; for accessing a view. &lt;a href="https://docs.rs/pliantdb/0.1.0-dev-2/pliantdb/local/core/connection/struct.View.html#method.with_key"&gt;&lt;code&gt;with_key(3)&lt;/code&gt;&lt;/a&gt; sets the &lt;code&gt;key&lt;/code&gt; field of the &lt;code&gt;View&lt;/code&gt; to &lt;a href="https://docs.rs/pliantdb/0.1.0-dev-2/pliantdb/core/connection/enum.QueryKey.html"&gt;&lt;code&gt;QueryKey::Matches(3_u32)&lt;/code&gt;&lt;/a&gt;. Finally, &lt;a href="https://docs.rs/pliantdb/0.1.0-dev-2/pliantdb/local/core/connection/struct.View.html#method.query_with_docs"&gt;&lt;code&gt;query_with_docs()&lt;/code&gt;&lt;/a&gt; simply calls &lt;a href="https://docs.rs/pliantdb/0.1.0-dev-2/pliantdb/core/connection/trait.Connection.html#tymethod.query_with_docs"&gt;&lt;code&gt;Connection::query_with_docs()&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's look at it from the Client's perspective. Following it along the route to the server will show the complexities I had to navigate and the power of Rust each step along the way.&lt;/p&gt;

&lt;p&gt;On the client, &lt;code&gt;db&lt;/code&gt; is a &lt;code&gt;RemoteDatabase&amp;lt;Schema&amp;gt;&lt;/code&gt;. This implements &lt;code&gt;Connection&lt;/code&gt; and converts the parameters &lt;code&gt;Option&amp;lt;QueryKey&amp;lt;u32&amp;gt;&amp;gt;&lt;/code&gt; and &lt;code&gt;AccessPolicy&lt;/code&gt; into &lt;a href="https://docs.rs/pliantdb-local/0.1.0-dev-2/pliantdb_local/core/networking/enum.Request.html#variant.Database"&gt;&lt;code&gt;Request::Database&lt;/code&gt;&lt;/a&gt; &lt;code&gt;{ database: "dbname", request:&lt;/code&gt; &lt;a href="https://docs.rs/pliantdb-local/0.1.0-dev-2/pliantdb_local/core/networking/enum.DatabaseRequest.html#variant.Query"&gt;&lt;code&gt;DatabaseRequest::Query&lt;/code&gt;&lt;/a&gt; &lt;code&gt;{ view: "view-name", key: Option&amp;lt;QueryKey&amp;lt;Vec&amp;lt;u8&amp;gt;&amp;gt;&amp;gt;, access_policy, with_docs: true } }&lt;/code&gt;.&lt;/p&gt;
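&lt;p&gt;The key-erasure step can be sketched with simplified stand-in types (these are not PliantDB's actual definitions): the typed key becomes plain bytes before it crosses the wire.&lt;/p&gt;

```rust
// Simplified stand-in for the QueryKey type discussed above. The real type
// is generic over any serializable key; this sketch handles only u32, using
// big-endian bytes to keep it dependency-free (PliantDB would go through
// serde-based serialization instead).
#[derive(Debug, PartialEq)]
enum QueryKey<T> {
    Matches(T),
}

impl QueryKey<u32> {
    /// Erase the key's type so the network layer only ever sees bytes.
    fn into_bytes(self) -> QueryKey<Vec<u8>> {
        match self {
            QueryKey::Matches(key) => QueryKey::Matches(key.to_be_bytes().to_vec()),
        }
    }
}
```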

&lt;p&gt;Once the request is in this enum, it can be sent across the wire via QUIC or WebSockets. The server receives it at a layer that has no generic types in its signatures. So, we must design a way to talk to our &lt;code&gt;Storage&amp;lt;Schema&amp;gt;&lt;/code&gt; without the &lt;code&gt;&amp;lt;Schema&amp;gt;&lt;/code&gt; part!&lt;/p&gt;

&lt;p&gt;This is done using an internal trait, &lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/server/src/server.rs#L704"&gt;&lt;code&gt;OpenDatabase&lt;/code&gt;&lt;/a&gt;, which the server implements for &lt;code&gt;Storage&amp;lt;Schema&amp;gt;&lt;/code&gt;. This is the first layer: it allows the network code to &lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/server/src/server.rs#L819"&gt;invoke &lt;code&gt;query_with_docs&lt;/code&gt;&lt;/a&gt; with the view's name rather than the view's type. It then looks up an &lt;a href="https://docs.rs/pliantdb-core/0.1.0-dev-2/pliantdb_core/schema/view/trait.Serialized.html"&gt;abstracted version&lt;/a&gt; of that view, which automatically serializes and deserializes at its access points. These are the same conversion mechanisms used when the view entries were initially created while indexing these views.&lt;/p&gt;
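&lt;p&gt;Here's a hedged, dependency-free sketch of that erasure trick, with invented names: the generic storage type implements a non-generic, object-safe trait, so the network layer can hold any schema behind &lt;code&gt;Box&amp;lt;dyn OpenDatabase&amp;gt;&lt;/code&gt; and route requests by view name instead of view type.&lt;/p&gt;

```rust
use std::collections::HashMap;

// Invented, non-generic trait: the network layer can call this without
// knowing the schema type, because requests carry the view's *name*.
trait OpenDatabase {
    fn query_by_view_name(&self, view: &str) -> Option<Vec<u8>>;
}

// Invented stand-ins for the generic storage and a schema type.
struct Storage<S> {
    _schema: S,
}

struct ShapesSchema;

impl OpenDatabase for Storage<ShapesSchema> {
    fn query_by_view_name(&self, view: &str) -> Option<Vec<u8>> {
        // A real implementation would look up the serialized view and run
        // the query; this stub just recognizes one view name.
        if view == "shapes-by-number-of-sides" {
            Some(vec![3])
        } else {
            None
        }
    }
}

// The schema type parameter is erased behind the trait object, so databases
// with different schemas can live in one map keyed by name.
fn open_databases() -> HashMap<String, Box<dyn OpenDatabase>> {
    let mut dbs: HashMap<String, Box<dyn OpenDatabase>> = HashMap::new();
    dbs.insert("dbname".into(), Box::new(Storage { _schema: ShapesSchema }));
    dbs
}
```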

&lt;p&gt;Finally, once the response is retrieved, the journey happens in reverse, going through &lt;a href="https://docs.rs/pliantdb-local/0.1.0-dev-2/pliantdb_local/core/networking/enum.Response.html#variant.Database"&gt;&lt;code&gt;Response::Database&lt;/code&gt;&lt;/a&gt;(&lt;a href="https://docs.rs/pliantdb-local/0.1.0-dev-2/pliantdb_local/core/networking/enum.DatabaseResponse.html#variant.ViewMappingsWithDocs"&gt;&lt;code&gt;DatabaseResponse::ViewMappingsWithDocs()&lt;/code&gt;&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;To me, it's incredible the lengths you can go to in Rust to transparently handle native types in user code. One of my initial goals for PliantDB has been achieved: writing local and remote code using async/await without needing to care whether the data is local or not.&lt;/p&gt;

&lt;h2&gt;Multi-tasking is challenging sometimes, even with Rust&lt;/h2&gt;

&lt;p&gt;Ultimately, providing a WebSocket implementation wasn't a goal for the first pass of the server. I had a goal to present at the Game Dev Meetup this past Saturday, and I really wanted to have a working client/server, but Daxpedda and I were having trouble with some of our code. It was becoming tough to isolate whether the networking code or PliantDB was to blame.&lt;/p&gt;

&lt;p&gt;That's when I decided to add WebSockets. I was pretty confident I wanted them long-term anyway, and they gave me a familiar, uncomplicated protocol with which to verify the server's functionality. I found bugs in my PliantDB code pretty quickly, but I was left with two peculiar issues.&lt;/p&gt;

&lt;p&gt;First, I was becoming more and more confident that the channel library Daxpedda and I fell in love with, &lt;a href="https://lib.rs/flume"&gt;flume&lt;/a&gt;, was misbehaving, but I couldn't seem to reproduce it outside of the massive PliantDB codebase. I finally called up Daxpedda on Discord and screen shared my debugging session, showing him how the tests succeeded if I retained a channel. If I allowed the sender to drop after successfully sending, sometimes the tests would fail. He agreed, something was odd. It took me a while, but I finally whittled it down to about 30 lines of code and &lt;a href="https://github.com/zesterer/flume/issues/78"&gt;reported the issue&lt;/a&gt;. In an amazingly quick fashion, the maintainer fixed the issue and released an update. And for the record, I still fully love and recommend this library if you're mixing async and non-async code using channels. It's a wonderful implementation.&lt;/p&gt;

&lt;p&gt;The second issue was that when I ran my unit test suite as a whole, it would sometimes succeed, but more often than not, after a random number of tests, all of the rest would fail. This ended up being my own stupidity. When I was writing the unit tests for the client, I thought to myself, "If I create one shared server, I can test the server differently by running each client test suite against its own database on a single server." I liked the idea, but I hadn't thought through how to achieve it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Pro-tip: &lt;code&gt;#[tokio::test]&lt;/code&gt; creates a unique tokio runtime for each test.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When spawning the server, I was spawning it in a runtime that would dutifully get destroyed once the test completed. Whatever other tests happened to finish before the server was destroyed would get green marks, and the rest would start getting connection refusals.&lt;/p&gt;

&lt;p&gt;Of course, this manifested itself in fun ways in my code -- channels suddenly disconnecting, often without any errors displaying anywhere!&lt;/p&gt;

&lt;p&gt;So, remember: when writing async tests, anything you &lt;code&gt;spawn&lt;/code&gt; into your async runtime will not outlive the current unit test. In this case, I decided to move that style of test into an integration-style test, to keep the "unit" nature more accurate.&lt;/p&gt;

&lt;h2&gt;Sharing Unit Tests&lt;/h2&gt;

&lt;p&gt;One of the neat results of using the same trait to implement the database interface for Client/Server/Local is that a common unit-testing suite could be written once and reused:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/core/src/test_util.rs#L284"&gt;&lt;code&gt;pliantdb-core::test_util::define_connection_test_suite!&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/local/src/tests.rs#L34"&gt;&lt;code&gt;pliantdb-local::tests&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/server/src/tests.rs#L38"&gt;&lt;code&gt;pliantdb-server::tests&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/client/src/tests.rs#L92"&gt;&lt;code&gt;pliantdb-client::tests::websockets&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/khonsulabs/pliantdb/blob/40eb7a16af42c6e768c2824bfafbf2c89f908f49/client/src/tests.rs#L136"&gt;&lt;code&gt;pliantdb-client::tests::pliant&lt;/code&gt;&lt;/a&gt; (the QUIC-based connection tests)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means that as more database functionality is added, it can be added to the common test suite and automatically tested across all layers of PliantDB. Once clustering support is added, the same suite will be tested there also.&lt;/p&gt;
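&lt;p&gt;The mechanics of a shared suite can be sketched in a few lines (illustrative names only; PliantDB's real suite is the macro linked above): write the tests once against the common trait, then run them against each implementation.&lt;/p&gt;

```rust
// Invented, minimal version of the shared-suite idea: every backend
// implements one trait, and the suite is written against the trait.
trait Connection {
    fn set(&mut self, key: &str, value: &str);
    fn get(&self, key: &str) -> Option<String>;
}

// One of potentially many backends (local, client, server...). Here, an
// in-memory map stands in for a real storage layer.
struct MemoryConnection(std::collections::HashMap<String, String>);

impl Connection for MemoryConnection {
    fn set(&mut self, key: &str, value: &str) {
        self.0.insert(key.into(), value.into());
    }
    fn get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
}

// The suite is written once; any implementation of `Connection` can run it.
// PliantDB achieves the same reuse with a declarative macro.
fn run_connection_suite<C: Connection>(conn: &mut C) {
    conn.set("hello", "world");
    assert_eq!(conn.get("hello"), Some("world".to_string()));
    assert_eq!(conn.get("missing"), None);
}
```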

&lt;h2&gt;Being Productive&lt;/h2&gt;

&lt;p&gt;This morning, I decided I wanted to write an example for the PliantDB server. At the end of the day, I wasn't happy with the interaction-less result I could build with the current functionality, so &lt;a href="https://github.com/khonsulabs/pliantdb/issues/22"&gt;I added &lt;code&gt;reduce_grouped()&lt;/code&gt;&lt;/a&gt;. I marveled with Daxpedda on Discord after looking at the diff: &lt;a href="https://github.com/khonsulabs/pliantdb/commit/c14c6d7bf1dfe4977a42a8f89c3275721c744114"&gt;19 files, +494, -90&lt;/a&gt;. It took me about an hour from introducing my first compilation error to getting it compiling. I added a couple of unit tests to the existing suite, and it all worked.&lt;/p&gt;

&lt;p&gt;This is a regular occurrence for me with Rust. Yes, I can tell you about my experiences debugging multithreading issues, and I can tell you they're just as painful as they are outside of Rust. However, the building blocks of the language encourage designs that eliminate many kinds of runtime issues before they happen. You can still have errors in your logic, but I'm finding that, more often than not, when it compiles, it works.&lt;/p&gt;

&lt;p&gt;Let's look at the stats of PliantDB as of tonight, using &lt;a href="https://github.com/XAMPPRocky/tokei"&gt;tokei&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;===============================================================================&lt;/span&gt;
 Language            Files        Lines         Code     Comments       Blanks
&lt;span class="o"&gt;===============================================================================&lt;/span&gt;
 Shell                   2           29           24            3            2
 TOML                    8          249          223            1           25
 YAML                    1            8            6            2            0
&lt;span class="nt"&gt;------------------------------------------------------------------------------------&lt;/span&gt;
 Markdown                1           77            0           49           28
 |- BASH                 1            1            1            0            0
 |- Rust                 1           54           40            3           11
 &lt;span class="o"&gt;(&lt;/span&gt;Total&lt;span class="o"&gt;)&lt;/span&gt;                            132           41           52           39
&lt;span class="nt"&gt;------------------------------------------------------------------------------------&lt;/span&gt;
 Rust                   61         7974         6699          221         1054
 |- Markdown            39          593            0          563           30
 &lt;span class="o"&gt;(&lt;/span&gt;Total&lt;span class="o"&gt;)&lt;/span&gt;                           8567         6699          784         1084
&lt;span class="o"&gt;===============================================================================&lt;/span&gt;
 Total                  73         8337         6952          276         1109
&lt;span class="o"&gt;===============================================================================&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;According to tokei, I've written 6699 lines of Rust code in this project. The first day of work was Friday, March 19, so those stats represent roughly 3.5 weeks of work.&lt;/p&gt;

&lt;p&gt;I have a &lt;a href="https://khonsulabs.github.io/pliantdb/coverage/"&gt;pretty-well-tested&lt;/a&gt; codebase that I'm almost ready to integrate into &lt;a href="https://github.com/khonsulabs/cosmicverge"&gt;Cosmic Verge&lt;/a&gt;. While I have plenty of work remaining on PliantDB, I'm excited at the prospect of replacing PostgreSQL and Redis in Cosmic Verge potentially next month.&lt;/p&gt;

&lt;h2&gt;Interested in PliantDB's development?&lt;/h2&gt;

&lt;p&gt;I'm always happy to have more people to talk about Rust with. I'd love to hear from you &lt;a href="https://discord.khonsulabs.com"&gt;on Discord&lt;/a&gt;, &lt;a href="https://twitter.com/ectonDev"&gt;Twitter&lt;/a&gt;, or &lt;a href="https://github.com/khonsulabs/pliantdb/discussions"&gt;GitHub Discussions&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>pliantdb</category>
      <category>database</category>
    </item>
  </channel>
</rss>
