AskTom Live is a great source of information from Oracle developer advocates and product managers, but I recently came across one with a clickbait marketing title ("Not All Binary Protocols Are Created Equal: The Science Behind OSON's 529x Performance Advantage") that compares apples to oranges. It's an opportunity to explain what BSON, the binary JSON format used by MongoDB, actually is.
TL;DR: If you want a fair comparison with OSON, the Oracle Database datatype for JSON, you should compare it with the Mutable BSON Document, which is the structure that MongoDB uses to access documents when reading and updating individual fields. Raw BSON is closer to protobuf: a compact serialization format for disk or network transfer, with access metadata removed and no blocks or headers.
I left the following comment on the YouTube video, but it does not seem to be publicly visible, so here it is.
Let me explain how Oracle Database and MongoDB handle disk-based data access, and you will understand the different design purposes of OSON and BSON, and why you are not testing the right thing to compare them.
Oracle Database, like many traditional databases, uses the same format on disk (blocks) and in memory (buffers), so all the transient metadata that helps access the data in memory must also be stored on persistent storage. This applies to table blocks (which contain a table directory, a row directory, and even lock flags, ITLs, that need to be cleaned up later), and the same idea was used for OSON (header, dictionary, sorted field IDs, offset arrays). Think of it as a mini database with its catalog, like the Oracle database has its dictionary and segment headers, which map physical extents and blocks. Accessing the on-disk OSON structure directly then makes sense: it's designed to be used through buffers that match the disk blocks.
But MongoDB with WiredTiger uses a smarter cache where the in-memory structures are optimized for RAM: adding pointers instead of disk offsets, building an Elements Vector for O(1) field access, and adding skiplists to navigate fields, all when data is loaded into the database cache. So there are two formats. The mutable BSON is what the database actually works on in memory for query processing and updates. The on-disk raw BSON is deliberately stripped of any unnecessary metadata and compressed, to maximize OS filesystem cache usage. That fits the major advantage of MongoDB for documents: reading or writing a document in a single I/O.
The raw BSON is a serialization format for disk and network, not to be accessed partially, because MongoDB has a powerful mutable BSON format in memory with O(1) access through its Elements Vector indexing. The O(n) sequential scan, the "no partial updates" limitation, and the field position penalties you describe — those are properties of the serialization format, not how MongoDB actually processes queries. And by definition, the serialization format is read sequentially, even though BSON can jump between fields. Don't do that except when you need a full document. Use the MongoDB server and drivers to access BSON, and learn how to use it correctly.
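To make the distinction concrete, here is a minimal Python sketch, not MongoDB's actual implementation. It encodes a flat document into raw BSON following the public BSON specification (only the double, string, and int32 types), then looks up a field in two ways: by walking the raw bytes sequentially, and through an index built once when the document is "loaded". The function names (`encode_bson`, `get_sequential`, `build_index`, `get_indexed`) are hypothetical, and the plain `dict` is only a stand-in for the in-memory access structures described above.

```python
import struct

def encode_bson(doc):
    """Encode a flat dict of float/str/int32 values as raw BSON (small subset of the spec)."""
    body = b""
    for name, value in doc.items():
        key = name.encode("utf-8") + b"\x00"           # field name is a C string
        if isinstance(value, float):
            body += b"\x01" + key + struct.pack("<d", value)
        elif isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + key + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            body += b"\x10" + key + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type for field {name!r}")
    # Total size = 4-byte length prefix + elements + 1-byte terminator.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

def iter_elements(raw):
    """Walk raw BSON from the first element, yielding (name, type, value_offset)."""
    pos, end = 4, len(raw) - 1
    while pos < end:
        etype = raw[pos]
        name_end = raw.index(b"\x00", pos + 1)
        name = raw[pos + 1:name_end].decode("utf-8")
        value_pos = name_end + 1
        if etype == 0x01:                               # double
            size = 8
        elif etype == 0x02:                             # string: int32 length + bytes
            size = 4 + struct.unpack_from("<i", raw, value_pos)[0]
        elif etype == 0x10:                             # int32
            size = 4
        else:
            raise ValueError(f"unsupported BSON type 0x{etype:02x}")
        yield name, etype, value_pos
        pos = value_pos + size

def decode_value(raw, etype, value_pos):
    """Decode one value given its type byte and offset."""
    if etype == 0x01:
        return struct.unpack_from("<d", raw, value_pos)[0]
    if etype == 0x02:
        n = struct.unpack_from("<i", raw, value_pos)[0]
        return raw[value_pos + 4:value_pos + 4 + n - 1].decode("utf-8")
    return struct.unpack_from("<i", raw, value_pos)[0]

def get_sequential(raw, field):
    """Field lookup on raw BSON: O(n) in the number of preceding fields."""
    for steps, (name, etype, value_pos) in enumerate(iter_elements(raw), start=1):
        if name == field:
            return decode_value(raw, etype, value_pos), steps
    raise KeyError(field)

def build_index(raw):
    """One pass at load time; stand-in for an in-memory access structure."""
    return {name: (etype, pos) for name, etype, pos in iter_elements(raw)}

def get_indexed(raw, index, field):
    """Field lookup through the prebuilt index: no scan of preceding fields."""
    etype, value_pos = index[field]
    return decode_value(raw, etype, value_pos)

raw = encode_bson({"a": 1, "b": "hello", "c": 2.5, "z": 42})
index = build_index(raw)
print(get_sequential(raw, "z"))      # value plus number of elements visited
print(get_indexed(raw, index, "z"))  # same value, found through the index
```

Benchmarking `get_sequential` on the last field of a wide document measures the serialization format, not the database. The cost of `build_index` is paid once per load, which is roughly the trade-off described above between a compact on-disk format and a richer in-memory one.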
With this understanding, you can see that the "529x performance" clickbait title comes from a mistake: you used raw BSON to access individual fields, bypassing everything MongoDB does when serving a query. It would be like using BBED to query Oracle Datafiles without going through the instance — no buffer cache, no row directory navigation, no dictionary lookups — and then concluding that Oracle's storage format is slow.
Notably, the original OSON VLDB paper (Liu et al., 2020) by Zhen Hua Liu doesn't make the claims this video does. That paper honestly compares OSON against Oracle's own JSON text storage, not against MongoDB's query processing. It compares encoding sizes with BSON, which is legitimate for a serialization format comparison (though it overlooks that BSON in MongoDB is compressed on disk and over the network). The paper authors understood they were comparing serialization formats and storage approaches within Oracle, not benchmarking MongoDB's actual runtime performance.
I believe OSON is the optimal format for Oracle because it integrates with the existing instance, cache, blocks, and SecureFiles, which were designed a long time ago. Conversely, BSON is ideal for MongoDB, as it capitalizes on the document database's design and the WiredTiger storage engine.