Franck Pachot


Mutable BSON and Oracle OSON

AskTom Live is a great source of information from Oracle developer advocates and product managers, but I recently came across a clickbait marketing title ("Not All Binary Protocols Are Created Equal: The Science Behind OSON's 529x Performance Advantage") that compares apples to oranges. It's a good opportunity to explain BSON, the binary JSON format used by MongoDB.

TL;DR: If you want to compare with OSON, the Oracle Database datatype for JSON, you should compare against the mutable BSON Document, the structure that MongoDB uses to access documents when reading and updating individual fields. Raw BSON is closer to protobuf: a compact serialization format for disk or network transfer, with access metadata removed and no blocks or headers.

I left the following comment on the YouTube video, but it doesn't seem to be publicly visible, so here it is.


Let me explain how Oracle Database and MongoDB handle disk-based data access, and you will understand the different design purposes of OSON and BSON, and why you are not testing the right thing to compare them.

Oracle Database, like many traditional databases, uses the same format on disk (blocks) and in memory (buffers), so it must persist, on disk, the access metadata it uses in memory. This applies to table blocks (which contain a table directory, a row directory, and even lock flags in the interested transaction list, ITL, that need to be cleaned up later), and the same idea was used for OSON (header, dictionary, sorted field IDs, offset arrays). Think of it as a mini database with its own catalog, just as the Oracle database has its dictionary and segment headers that map physical extents and blocks. So accessing the on-disk OSON structure directly makes sense: it's designed to be used through buffers that match the disk blocks.
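To make that concrete, here is a toy sketch of the kind of lookup that a dictionary plus sorted field IDs plus offset arrays enables. Every name and number below is invented for illustration; this is not Oracle's actual OSON layout or code:

```python
import bisect

# Invented toy data: a field dictionary mapping names to IDs,
# the IDs kept sorted, and a parallel array of value offsets.
field_dictionary = {"city": 0, "name": 1, "zip": 2}
sorted_field_ids = [0, 1, 2]
value_offsets = [40, 12, 57]                 # made-up byte offsets

def oson_style_lookup(field_name):
    """Resolve a field to its value offset with a dictionary lookup
    plus a binary search, instead of a sequential scan."""
    fid = field_dictionary[field_name]
    i = bisect.bisect_left(sorted_field_ids, fid)
    return value_offsets[i]

print(oson_style_lookup("zip"))              # 57
```

The point is only that the access metadata (dictionary, sorted IDs, offsets) is stored alongside the data, so the on-disk structure can be navigated directly.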

But MongoDB with WiredTiger uses a smarter cache, where the in-memory structures are optimized for RAM: pointers instead of disk offsets, an Elements Vector for O(1) field access, and skiplists to navigate fields, all built when data is loaded into the database cache. So there are two formats: the mutable BSON that the database actually works on in memory for query processing and updates, and the on-disk raw BSON that, on purpose, strips any unnecessary metadata and is compressed, to maximize OS filesystem cache usage. This fits the major advantage of MongoDB for documents: reading or writing a document in a single I/O.
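As a rough illustration of the idea behind an Elements Vector (not MongoDB's actual implementation), one sequential pass over the serialized bytes at load time can record each field's offset, after which every field access is O(1). The sketch below encodes a tiny subset of BSON, int32 fields only:

```python
import struct

def encode_bson_int32(doc):
    """Encode a dict of str -> int into BSON bytes.
    Toy subset of the spec: int32 elements (type 0x10) only."""
    body = b""
    for name, value in doc.items():
        body += b"\x10" + name.encode() + b"\x00" + struct.pack("<i", value)
    body += b"\x00"                          # end-of-document terminator
    return struct.pack("<i", 4 + len(body)) + body

def build_element_index(data):
    """One sequential pass at load time that records each field's value
    offset: loosely analogous to the Elements Vector mentioned above."""
    index, pos = {}, 4                       # skip the int32 length header
    while data[pos] != 0x00:                 # 0x00 terminates the element list
        type_byte = data[pos]
        end = data.index(b"\x00", pos + 1)   # field name is a cstring
        name = data[pos + 1:end].decode()
        pos = end + 1                        # value starts right after
        index[name] = pos
        if type_byte != 0x10:
            raise NotImplementedError("only int32 in this sketch")
        pos += 4                             # skip the int32 value
    return index

doc = {f"field{i}": i for i in range(1000)}
raw = encode_bson_int32(doc)
idx = build_element_index(raw)               # built once, when "loaded"
off = idx["field999"]
value = struct.unpack("<i", raw[off:off + 4])[0]
print(value)                                 # 999: O(1) once the index exists
```

The scan cost is paid once at load time; after that, any field, first or last, costs the same.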

Raw BSON is a serialization format for disk and network, not meant to be accessed partially, because MongoDB has a powerful mutable BSON format in memory with O(1) access through its Elements Vector indexing. The O(n) sequential scan, the "no partial updates" limitation, and the field position penalties you describe are properties of the serialization format, not of how MongoDB actually processes queries. By definition, a serialization format is read sequentially (BSON's length prefixes let a reader skip over values, but it still walks the elements one by one). Don't access raw BSON directly except when you need the full document: use the MongoDB server and drivers to access BSON, and learn how to use them correctly.
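A minimal sketch of why reading one field straight from raw BSON is O(n): every element before the target has to be walked past, so the cost grows with the field's position in the document. Again, this is a toy BSON subset (int32 fields only), not driver code:

```python
import struct

def encode(doc):
    """Toy BSON subset: int32 (type 0x10) elements only."""
    body = b"".join(b"\x10" + k.encode() + b"\x00" + struct.pack("<i", v)
                    for k, v in doc.items()) + b"\x00"
    return struct.pack("<i", 4 + len(body)) + body

def get_field_raw(data, name):
    """Read one field straight from the serialized bytes.
    Every element before the target must be skipped, so the cost
    grows linearly with the field's position."""
    pos, skipped = 4, 0                      # skip the length header
    while data[pos] != 0x00:
        end = data.index(b"\x00", pos + 1)   # cstring field name
        field = data[pos + 1:end].decode()
        pos = end + 1
        if field == name:
            return struct.unpack("<i", data[pos:pos + 4])[0], skipped
        pos += 4                             # skip the int32 value
        skipped += 1
    raise KeyError(name)

raw = encode({f"f{i}": i for i in range(1000)})
value, skipped = get_field_raw(raw, "f999")
print(value, skipped)    # the last field pays for skipping all 999 before it
```

This is exactly the access pattern the benchmark measured, and exactly the pattern the server-side mutable representation exists to avoid.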

With this understanding, you can see that the "529x performance" clickbait title comes from a mistake: you used raw BSON to access individual fields, bypassing everything MongoDB does when serving a query. It would be like using BBED to query Oracle Datafiles without going through the instance — no buffer cache, no row directory navigation, no dictionary lookups — and then concluding that Oracle's storage format is slow.

Notably, the original OSON VLDB paper (Liu et al., 2020) by Zhen Hua Liu doesn't make the claims this video does. That paper honestly compares OSON against Oracle's own JSON text storage, not against MongoDB's query processing. It compares encoding sizes with BSON, which is legitimate for a serialization format comparison (though it overlooks that BSON in MongoDB is compressed on disk and over the network). The paper authors understood they were comparing serialization formats and storage approaches within Oracle, not benchmarking MongoDB's actual runtime performance.

I believe OSON is the optimal format for Oracle because it integrates with the existing instance, cache, blocks, and SecureFiles, which were created a long time ago. Conversely, BSON is ideal for MongoDB, as it capitalizes on the document database's design and the WiredTiger storage engine.

Top comments (1)

Rick Houlihan

You are certainly correct on the cache comments: most traditional RDBMS use the same format on disk and in memory. I am putting this on my list to test, as it poses several interesting scenarios worth looking at. The benchmark I wrote, however, is not clickbait. It is verifiable in code. With 1K attributes at the root of a BSON document, accessing the last attribute is 529x slower than the same structure in OSON. Even with just 50 attributes it is 28x slower. Serializing to a Document class pays the cost one time, but the cost of serialization is the equivalent of 10 reads from the RawBsonDocument. Again, all demonstrable and provable in code. The claim in the benchmark I wrote was a linear sequential scan issue with client-side access. You are convoluting the issue by introducing a server-side data structure into the conversation when it does not belong there. We did run a server-side partial update test in the article; the results indicate that mutable BSON does not help in that scenario. It sounds like it might improve server-side performance when accessing attributes within a document multiple times, but that is not what the benchmark actually tests.