Franck Pachot

Posted on May 30

BSON and OSON: documents are designed to be nested, not flat

#document #database #oracle #mongodb

Critiques vendor bias and missing comparisons

I like vendor benchmarks as they are often good illustrations of worst practice. Rather than focusing on real implementation trade-offs, they pick an extreme case that favors their own implementation by chance but is not optimized in the competitor's — because it is simply not how that technology is meant to be used. For example, Oracle published "2x" benchmark results using YCSB, a key-value benchmark (slide 14 here), and a "529x" test misusing the raw BSON purpose (here). Both compare a relational database against a document database by using neither technology as it was designed to be used — no normalized tables, no nested documents, just flat key-value fields. The latter test is even worse: it uses 1,000 top-level fields in a single document. Don't do that!

While having thousands of columns in an SQL table is usually undesirable, it's acceptable to have hundreds or even thousands of fields in a document, as they represent multiple entities and value objects. However, nobody creates a flat structure with 1,000 top-level fields. The advantage of JSON is its ability to organize entities into nested sub-documents. For instance, instead of storing first_name and last_name as separate fields, you can have a name field containing a sub-object with first and last. During queries, using dot notation to reference name.first and name.last makes no difference from first_name or last_name. It simplifies reading and displaying the document, and, as that is how it should be used, the binary JSON formats are optimized for it.

To illustrate this, I created two collections, flat with one thousand fields at the top level, and nest with ten top-level fields, each containing a sub-object with ten sub-objects.


{
echo field{0..9}{0..9}{0..9}
for i in {1..100000} ; do echo {0..9}{0..9}{0..9} ; done
} | sed 's/ /,/g' |
 mongoimport --collection "flat" --type=csv --headerline --drop

{
echo field{0..9}.sub{0..9}.sub{0..9}
for i in {1..100000} ; do echo {0..9}{0..9}{0..9} ; done
} | sed 's/ /,/g' |
 mongoimport --collection "nest" --type=csv --headerline --drop

The flat documents are like:

  { _id: ObjectId('6a1a1ed41d55219677a79c22'),
  field000: 0, field001: 1, field002: 2, field003: 3, field004: 4, field005: 5, field006: 6, field007: 7, field008: 8, field009: 9,
  field010: 10, field011: 11, field012: 12, field013: 13, field014: 14, field015: 15, field016: 16, field017: 17, field018: 18, field019: 19,
  field020: 20, field021: 21, field022: 22, field023: 23, field024: 24, field025: 25, field026: 26, field027: 27, field028: 28, field029: 29,
  field030: 30, field031: 31, field032: 32, field033: 33, field034: 34, field035: 35, field036: 36, field037: 37, field038: 38,
...

The nested documents are like:

  {                                                                                                                                                                                               
    _id: ObjectId('6a1a160041c5538a7e93a9c5'),
    field0: {
      sub0: { sub0: 0, sub1: 1, sub2: 2, sub3: 3, sub4: 4, sub5: 5, sub6: 6, sub7: 7, sub8: 8, sub9: 9 },
      sub1: { sub0: 10, sub1: 11, sub2: 12, sub3: 13, sub4: 14, sub5: 15, sub6: 16, sub7: 17, sub8: 18, sub9: 19 },
      sub2: { sub0: 20, sub1: 21, sub2: 22, sub3: 23, sub4: 24, sub5: 25, sub6: 26, sub7: 27, sub8: 28, sub9: 29 },
      sub3: { sub0: 30, sub1: 31, sub2: 32, sub3: 33, sub4: 34, sub5: 35,
...

They hold the same values, with a different field structure.

BSON serialization format optimization

BSON is designed for efficient sequential streaming through network or disk access. Each field is stored one after another, with a small header containing the field type and name, followed by the value. Because every value type has either a fixed size (like a 64-bit integer or a double) or an explicit length prefix (like a string or a binary blob), the parser can skip over any field it doesn't care about without reading its value at all — it simply jumps forward by the declared length.

Nested sub-documents and arrays are stored the same way: the type is object or array, followed by the total byte length of the entire nested structure, followed by the nested fields themselves (for arrays, the field names are simply "0", "1", "2", and so on). That length prefix is the key optimization: if the parser is looking for a top-level field and the current field is a sub-document that doesn't match, it can skip the entire sub-document in one jump — all its nested fields, however deep — without reading a single byte inside it.

When you query a field using dot notation, such as "name.first", the parser works level by level. It scans the top-level fields looking for name. Any top-level field that doesn't match name is skipped, including any sub-documents, in a single jump. Once name is found and confirmed to be of type object, the parser steps into that sub-document and begins scanning its fields for first, again skipping anything that doesn't match. It never needs to read the sibling sub-documents of name at all.

The worst case is a field that doesn't exist: the parser must scan every field at the relevant nesting level before concluding it isn't there. But even then, sub-documents and arrays that are clearly irrelevant — because their parent name doesn't match — are each skipped in a single operation regardless of how large or deeply nested they are.

This is precisely why organizing many fields into a nested hierarchy is not just a cosmetic preference. A flat document with 1,000 top-level fields forces the parser to read 1,000 field names in the worst case. A document with 10 top-level fields, each containing 10 sub-fields, each containing 10 values, holds the same 1,000 values but the parser only ever reads at most 10 + 10 + 10 = 30 field names to locate any one of them. The BSON length prefix turns the nested structure into an implicit index, making the format genuinely faster to navigate at scale.

With the collection flat that has one thousand top-level fields from "field000" to "field999", looking for the last field "field999" takes more than one second:


db.flat.find(
 { "field999" : {$lt:42} }
).explain("executionStats").executionStats
;

{
  executionSuccess: true,
  nReturned: 0,
  executionTimeMillis: 1081,
  totalKeysExamined: 0,
  totalDocsExamined: 100000,
  executionStages: {
    isCached: false,
    stage: 'COLLSCAN',
    filter: { field999: { '$lt': 42 } },
    nReturned: 0,
    executionTimeMillisEstimate: 1070,
    works: 100001,
    advanced: 0,
    needTime: 100000,
    needYield: 0,
    saveState: 64,
    restoreState: 64,
    isEOF: 1,
    direction: 'forward',
    docsExamined: 100000
  }
}

With the collection nest that has ten top-level fields from "field0" to "field9", and two levels of ten sub-objects from "sub0" to "sub9", looking for the last field "field9.sub9.sub9" is twice as fast:


db.nest.find(
 { "field9.sub9.sub9" : {$lt:42} }
).explain("executionStats").executionStats
;

{
  executionSuccess: true,
  nReturned: 0,
  executionTimeMillis: 424,
  totalKeysExamined: 0,
  totalDocsExamined: 100000,
  executionStages: {
    isCached: false,
    stage: 'COLLSCAN',
    filter: { 'field9.sub9.sub9': { '$lt': 42 } },
    nReturned: 0,
    executionTimeMillisEstimate: 420,
    works: 100001,
    advanced: 0,
    needTime: 100000,
    needYield: 0,
    saveState: 21,
    restoreState: 21,
    isEOF: 1,
    direction: 'forward',
    docsExamined: 100000
  }
}

A major part of the time is fetching the document from storage. Navigating into it is a smaller part, but the difference is still clearly visible: 1,081 ms versus 424 ms for 100,000 documents, with the same number of values, and the same worst-case field to find: the last one. The only difference is how those fields are arranged. Nesting them three levels deep, with ten branches at each level, makes the collection scan faster — with no index, no schema change, and no query rewrite beyond the dot notation that you would use anyway.

This is not a micro-optimization or an edge case. Any collection scan, whether triggered by a missing index, a low-selectivity filter, or a background analytics query, pays this cost on every document it reads. The deeper and wider your documents are, the more the BSON length-prefix trick pays off, because the parser can leap over entire branches of the document tree in a single bounds check.

What about Oracle's OSON format?

Oracle Database stores JSON documents in its proprietary binary format called OSON. At first glance, OSON appears to take a fundamentally different approach: rather than storing field names inline with each value as BSON does, OSON builds a field name dictionary at the beginning of the document. Think of it like a mini-datastore with its catalog and indexes, rather than a protocol buffer. Each field in the document body then refers to its name by a short numeric ID rather than repeating the full string. This makes individual field names cheaper to compare and reduces document size when the same field name appears many times.

Given that design, you might expect nesting to make no difference in Oracle: if every field is just a numeric ID anyway, scanning 1,000 flat fields should cost the same as scanning 10 + 10 + 10 fields spread across three levels. The dictionary lookup cost is the same either way.

In practice, however, the same experiment on Oracle Database shows a very similar result to MongoDB.

Querying field999 on the flat table takes about 1.6 seconds:


set autotrace traceonly

select /*+ gather_plan_statistics */ "DATA" from "ORA"."flat"
 where JSON_EXISTS("DATA",'$?(@.field999.numberOnly() < $B0)'
 passing 42 as "B0" type(strict))
;

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________
SQL_ID  1h98k751w7dws, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ "DATA" from "ORA"."flat"  where
JSON_EXISTS("DATA",'$?(@.field999.numberOnly() < $B0)'  passing 42 as
"B0" type(strict))

Plan hash value: 4073748891
---------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      0 |00:00:01.60 |     301K|    739 |
|*  1 |  TABLE ACCESS FULL| flat |      1 |     12 |      0 |00:00:01.60 |     301K|    739 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(JSON_EXISTS2( /*+ QJSNMD_USE_SHR_XCTX */ "DATA" /*+ LOB_BY_VALUE */
              FORMAT OSON , '$?(@.field999.numberOnly() < $B0)' /* json_path_str
              $?(@.field999.numberOnly() < $B0)  */  PASSING 42 AS "B0" FALSE ON ERROR
              TYPE(STRICT) )=1)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
-----------------------------------------------------------
             117  CPU used by this session
             117  CPU used when call started
             160  DB time
         1174164  RM usage by this session
               3  Requests to/from client
             738  Session total flash IO requests
         6053888  cell physical IO interconnect bytes
          301074  consistent gets
          301074  consistent gets from cache
          301074  consistent gets pin
            1068  consistent gets pin (fastpath)
             739  gcs data block access records
               1  messages sent
             747  non-idle wait count
              48  non-idle wait time
               2  opened cursors cumulative
               1  opened cursors current
             738  physical read total IO requests
         6053888  physical read total bytes
               2  process last non-idle time
          301074  session logical reads
              48  user I/O wait time
               4  user calls

Querying field9.sub9.sub9 on the nested table takes about 1.0 second:


select /*+ gather_plan_statistics */ "DATA" from "ORA"."nest"
 where JSON_EXISTS("DATA",'$?(@.field9.sub9.sub9.numberOnly() < $B0)'
 passing 42 as "B0" type(strict))
;

PLAN_TABLE_OUTPUT
__________________________________________________________________________________________________
SQL_ID  7yuwrup6rscrc, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ "DATA" from "ORA"."nest"   where
JSON_EXISTS("DATA",'$?(@.field9.sub9.sub9.numberOnly() < $B0)'
passing 42 as "B0" type(strict))

Plan hash value: 3225864993
---------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      0 |00:00:01.04 |     100K|   1143 |
|*  1 |  TABLE ACCESS FULL| nest |      1 |   1000 |      0 |00:00:01.04 |     100K|   1143 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(JSON_EXISTS2( /*+ QJSNMD_USE_SHR_XCTX */ "DATA" /*+ LOB_BY_VALUE */
              FORMAT OSON , '$?(@.field9.sub9.sub9.numberOnly() < $B0)' /* json_path_str
              $?(@.field9.sub9.sub9.numberOnly() < $B0)  */  PASSING 42 AS "B0" FALSE ON ERROR
              TYPE(STRICT) )=1)


Statistics
-----------------------------------------------------------
              40  CPU used by this session
              40  CPU used when call started
             106  DB time
          404934  RM usage by this session
               4  Requests to/from client
            1125  Session total flash IO requests
         9363456  cell physical IO interconnect bytes
          100221  consistent gets
          100221  consistent gets from cache
          100221  consistent gets pin
           99096  consistent gets pin (fastpath)
            1142  gcs data block access records
             158  global enqueue gets sync
             158  global enqueue releases
            1134  non-idle wait count
              71  non-idle wait time
               2  opened cursors cumulative
               1  opened cursors current
            1125  physical read total IO requests
         9363456  physical read total bytes
              61  process last non-idle time
          100221  session logical reads
              71  user I/O wait time
               4  user calls

The most telling number is not the execution time but the Buffers column: 301,074 consistent gets for the flat table versus 100,221 for the nested table. Oracle is reading three times as many data blocks for the flat documents. To understand why, I check the actual segment sizes:


with
table_seg (owner,table_name, table_blocks) as ( select owner, segment_name, sum(blocks) from dba_segments group by owner,segment_name),
lob_seg   (owner,table_name,   lob_blocks) as ( select owner, table_name,   sum(blocks) from dba_lobs join dba_segments using (owner, segment_name) group by owner,table_name),
tab_stats (owner,table_name,  avg_row_len) as ( select owner, table_name,   avg_row_len from dba_tables)
select owner, table_name, table_blocks, nvl(lob_blocks, 0) as lob_blocks, table_blocks + nvl(lob_blocks, 0) as total_blocks, avg_row_len
 from tab_stats natural join table_seg natural left join lob_seg
 where owner = 'ORA' and table_name in ('flat', 'nest')
;

OWNER    TABLE_NAME       TABLE_BLOCKS    LOB_BLOCKS    TOTAL_BLOCKS    AVG_ROW_LEN
________ _____________ _______________ _____________ _______________ ______________
ORA      flat                    1,152       317,608         318,760            166
ORA      nest                  100,992            32         101,024          7,724

The segment query reveals the full story. The nested documents have an average row length of 7,724 bytes and are stored entirely inline in the table segment's blocks, with almost no LOB blocks. The flat documents have an average row length of only 166 bytes in the table segment, but they spill into 317,608 LOB blocks. That tiny average row length for flat is not a sign of small documents — it is the sign of a pointer because large documents do not fit in fixed-sized table blocks. The actual OSON bytes have been pushed out of the row into a separate LOB segment managed by Oracle's securefile infrastructure, and every document access requires an additional pointer dereference to fetch them from there.

The root cause is the OSON dictionary. OSON stores each distinct field name only once in a per-document dictionary and replaces every occurrence in the document body with a short numeric ID. The dictionary itself is the mechanism that compresses the field names. A flat document with 1,000 entirely unique field names like field000 through field999 requires a dictionary with 1,000 entries, one per distinct string, with no repetition to exploit. The resulting OSON document is large enough to exceed Oracle's block size and gets stored out of row as a LOB. A nested document whose entire structure uses only 20 distinct names — field0 through field9 and sub0 through sub9 — has a dictionary with just 20 short entries. Furthermore, objects that share the same field structure can reuse each other's field ID array entirely rather than repeating it. The resulting document is compact enough to be stored inline, directly in the table block, with no LOB indirection at all.

The consequence is dramatic here. Scanning the flat collection means fetching a pointer from the table block, then chasing it to a LOB block, for every single one of the 100,000 documents. Scanning the nested collection means reading the document directly from the table block. That is the difference between 301,074 consistent gets and 100,221.

So the two formats reach the same conclusion by different paths. BSON benefits from nesting because the length-prefix on sub-documents lets the parser skip entire branches with a single jump, reducing the number of field names it reads. OSON benefits from nesting because the dictionary compresses repeated field names, shrinking the documents enough to keep them inline in the table and avoiding the cost of LOB storage entirely.

JSONB and BSON on PostgreSQL

JSONB takes a different approach from BSON. Rather than storing fields in insertion order with length-prefix skipping, JSONB sorts keys at each nesting level and adds a header describing the offset and type of every value at that level, enabling binary search instead of sequential scanning. Because this structure is built independently per level, a nested document gets a short sorted list at each level rather than one long list at the top. The benefit of nesting is the same as with BSON, just through a different mechanism.

I imported the same data into PostgreSQL:

CREATE TABLE flat (data JSONB);  
CREATE TABLE nest (data JSONB);
\copy flat (data) FROM PROGRAM 'mongoexport --collection flat'  
\copy nest (data) FROM PROGRAM 'mongoexport --collection nest' 

VACUUM ANALYZE flat, nest
;

The query on the flat structure takes more than two seconds:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, COSTS OFF)  
SELECT * FROM flat  
WHERE (data->>'field999')::int < 42;  

                                 QUERY PLAN
----------------------------------------------------------------------------
 Seq Scan on public.flat (actual time=2138.111..2138.111 rows=0.00 loops=1)
   Output: data
   Filter: (((flat.data ->> 'field999'::text))::integer < 42)
   Rows Removed by Filter: 100000
   Buffers: shared hit=400637
 Planning Time: 0.043 ms
 Execution Time: 2138.126 ms
(7 rows)

The query on the nested structure is twice as fast:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, COSTS OFF)  
SELECT * FROM nest  
WHERE (data->'field9'->'sub9'->>'sub9')::int < 42; 

                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on public.nest (actual time=1292.003..1292.003 rows=0.00 loops=1)
   Output: data
   Filter: (((((nest.data -> 'field9'::text) -> 'sub9'::text) ->> 'sub9'::text))::integer < 42)
   Rows Removed by Filter: 100000
   Buffers: shared hit=400637
 Planning Time: 0.041 ms
 Execution Time: 1292.017 ms
(7 rows)

2,138 ms versus 1,292 ms, same documents, same values, same worst-case field. The difference is purely the cost of navigating the JSONB structure. The nested document wins because each level presents the parser with a short sorted list of at most 10 keys rather than a single list of 1,000. It is interesting to see that both tables have the same values but different field names, due to nesting, finally have exactly the same number of pages (shared hit=400637) in JSONB/TOAST.

PostgreSQL can also store documents in BSON natively via the DocumentDB extension, which uses the same BSON library as MongoDB. I loaded the same data there to confirm the result holds.

DocumentDB is a PostgreSQL database, so I ran the JSONB queries above on the same instance (the PostgreSQL endpoint port is 9712 in DocumentDB docker container by default). Then I used the DocumentDB extension API to import the same data from JSONB tables to DocumentDB collections:

postgres=# \dx documentdb*
                                List of installed extensions
          Name           | Version | Schema |                  Description
-------------------------+---------+--------+-----------------------------------------------
 documentdb              | 0.111-0 | public | API surface for DocumentDB for PostgreSQL
 documentdb_core         | 0.111-0 | public | Core API surface for DocumentDB on PostgreSQL
 documentdb_extended_rum | 0.111-0 | public | DocumentDB Extended RUM index access method
(3 rows)

-- Insert from the PostgreSQL JSONB tables into DocumentDB collections

SELECT documentdb_api.insert_one('db', 'flat', document)
FROM (SELECT data::text::documentdb_core.bson AS document FROM flat) t;

SELECT documentdb_api.insert_one('db', 'nest', document)
FROM (SELECT data::text::documentdb_core.bson AS document FROM nest) t;

I connected to the same DocumentDB database with the MongoDB-compatible API (the MongoDB endpoint port is 10260 in DocumentDB docker container by default).

The query on the flat structure with one thousand fields took 3 seconds:

db.flat.find(
 { "field999" : {$lt:42} }
).explain("executionStats").executionStats
;

{
  nReturned: Long('0'),
  executionTimeMillis: 3285.758,
  executionStartAtTimeMillis: 3285.758,
  totalDocsExamined: Long('100000'),
  totalKeysExamined: Long('0'),
  executionStages: {
    stage: 'COLLSCAN',
    nReturned: Long('0'),
    executionTimeMillis: 3285.758,
    executionStartAtTimeMillis: 3285.758,
    totalDocsExamined: 100000,
    totalKeysExamined: 0,
    totalDocsRemovedByRuntimeFilter: 100000,
    numBlocksFromCache: 374177,
    numBlocksFromDisk: 76758
  }
}

The query on the nested structure is twice as fast:

db.nest.find(
 { "field9.sub9.sub9" : {$lt:42} }
).explain("executionStats").executionStats
;

{
  nReturned: Long('0'),
  executionTimeMillis: 1401.24,
  executionStartAtTimeMillis: 1401.24,
  totalDocsExamined: Long('100000'),
  totalKeysExamined: Long('0'),
  executionStages: {
    stage: 'COLLSCAN',
    nReturned: Long('0'),
    executionTimeMillis: 1401.24,
    executionStartAtTimeMillis: 1401.24,
    totalDocsExamined: 100000,
    totalKeysExamined: 0,
    totalDocsRemovedByRuntimeFilter: 100000,
    numBlocksFromCache: 349453,
    numBlocksFromDisk: 51482
  }
}

3,286 ms versus 1,401 ms — consistent with the MongoDB result and explained by the same BSON length-prefix mechanism. DocumentDB stores documents in BSON on top of PostgreSQL storage, so the nesting optimization is identical to native MongoDB when looking at a field value.

Conclusion

Format	Navigation mechanism	Why nesting helps
BSON	Sequential scan with length-prefix skip	Skips entire sub-documents in one jump
JSONB	Binary search on sorted keys per level	Keeps each sorted list short
OSON	Dictionary lookup	Reduces dictionary size

The underlying mechanisms differ, but the guidance is the same: documents are nested structures that aggregate multiple entities and value objects. Don’t chase performance with documents containing 1,000 top-level fields — fix the data model first.

Model your documents the way you naturally think about the data — as a hierarchy of related objects — and the binary JSON format your database uses will reward you for it. Whether that is BSON in MongoDB, BSON via the DocumentDB extension for PostgreSQL, JSONB, or OSON, flat documents with hundreds of top-level fields are not just harder to read — they are measurably slower to query and heavier on storage. Nesting is not just good document design. It is how these formats are engineered to be used.

Top comments (9)

Rick Houlihan • May 30 • Edited

Franck, I appreciate the writeup, but it argues my case for me.

Every benchmark in it compares flat-vs-nested within one engine — Mongo flat vs Mongo nested, Oracle flat vs Oracle nested. The comparison that actually tests the claim is OSON-nested vs BSON-nested at the same shape, and you skipped it. I didn't. It's in the article you're responding to (methodology detailed below):

1 level deep — BSON 598ns, OSON 110ns -> 5.4x
3 levels — BSON 1,363ns, OSON 115ns -> 11.9x
5 levels — BSON 2,144ns, OSON 170ns -> 12.6x

Nesting is your recommended fix, and in a head-to-head it makes BSON relatively worse, not better — the gap widens with depth, because every level is another potential full scan for BSON (the test above executes depth traversal on the first attribute, best case for BSON) and just another single jump for OSON. Your own example shows the same thing at micro scale: your 10x10x10 doc reads "at most 10+10+10 = 30 field names." On the identical document OSON does 3 dictionary-ID lookups and 3 offset jumps. 30 sequential string comparisons versus 3 integer lookups — on the structure you're telling people to adopt as the cure. You literally reproduced the O(n) vs O(1) result and relabeled it as a rebuttal.

That's the tell in the advice itself. "Nest your documents and BSON gets faster" is a workaround you only ever need for a format that pays per field. Nobody gives that advice to OSON users, because OSON's access cost doesn't move with shape — flat or five deep, hash and jump. Oracle's VLDB 2020 paper calls it exactly that: tree pointers as jump-navigation offsets. Having to reshape your data to dodge the scan is the scan tax. Not having the tax is the architecture.

Two cleanups. Your "flat OSON is 3x the buffer reads" is a 1000-unique-name dictionary crossing the out-of-row threshold and spilling to a LOB — a storage boundary, not field-access complexity; a nested doc with 1000 distinct names spills the same way. And the 1000-field windmill you keep tilting at is the X-axis endpoint of a scaling curve — the textbook way to demonstrate complexity by varying input. The gap is already 28.6x at position 50, a field count you'd call realistic. Splitting 50 fields into 5 levels of 10 doesn't remove the scan; BSON does up to 10 string compares per level, OSON does 1 jump. The tax just gets redistributed.

BSON was a reasonable 2009 design from a team shipping a PaaS that pivoted into a database. OSON was designed in 2017 by the people who wrote the SQL/JSON standard, with eight years of BSON's limitations in front of them, and they chose jump-navigation over sequential scan on purpose — to kill exactly this cost. "Model it as a hierarchy and the format rewards you" is true. It's just more true for the format that doesn't make you do it to stay fast.

Methodology (DocBench, BsonVsOsonClientSideTest.testNestedFieldAccess): each object carries 10 sibling "padding" fields plus one structural field; depths 1/3/5; identical documents and identical path-navigation code for both formats — BSON via RawBsonDocument, OSON via OracleJsonValue, no driver-side HashMap shortcut. The "target" leaf sits at position 1 (best case for BSON); only the structural "nested" descent links sit deep (last position, after the 10 padding fields), so BSON scans ~10 siblings before each hop while OSON does one hash lookup. That is a modest, realistic fan-out, not a 1000-field strawman.

Worked out: ~14 / ~36 / ~58 BSON string comparisons at depth 1/3/5 vs 3 / 5 / 7
OSON jumps — which is the 5.4x / 11.9x / 12.6x in the table.

Source: github.com/rhoulihan/DocBench

Franck Pachot • May 30

Yes, I’m comparing designs within the same engine. You shouldn’t compare timings across different engines running on different CPU and memory configurations—especially when the metric is microseconds per document.
I never said “Nest your documents and BSON gets faster”. You nest documents because they are documents — not key–value records or wide tables. The performance benefit is a consequence, not the goal: document databases are optimized for document-shaped data.

If OSON is designed and optimized for thousands of top-level fields, that’s really cool for some workloads, like storing massive spreadsheets. But I still wouldn’t recommend that model for document applications. You can doesn’t mean you should. The same applies to relational tables: you can create 4,096 columns in an Oracle table, it works, but the engine is not primarily optimized for that many row pieces.

Rick Houlihan • May 30 • Edited

I agree that tests should be grounded on the same hardware. That is why I did all my tests using Oracle 26ai Free version running on the exact same hardware as MongoDB 8.1.

The whole point of your article is nest the data in the document and it gets faster. The fact that you did not explicitly say the words is irrelevant, it's what you are demonstrating.

OSON is optimized for accessing individual values within a document. Because of this it does not matter where the field is. Field 1 or 10000000000 the timing is the same, that is the point. BSON requires a linear sequential scan across fields until it finds the attribute name that matches an equality test. That is extremely heavy and expensive.

This is the difference between a properly engineered protocol and a hack produced during a product pivot. BSON was built to store and serve up full documents for a content management platform called 10gen. It was folded into a database under pressure to produce a product that the company could sell which became MongoDB. OSON was designed by a team of data scientists and engineers with decades of experience in database technology. They are not equivalent in any way and even the simplest tests expose this.

When I was at MongoDB I used BSON like a club to beat up RDBMS platforms like Postgres that claimed to support documents with bolt on technologies like JSONB while burning through CPU on JSON serialization as if its free. And yes I said bolt-on, that is exactly what JSONB is. You call it "plugin" but that is just another name for a feature that was not an integral part of the original design, e.g. bolt-on.

The fact is OSON flipped the script on MongoDB and that is why I am here at Oracle now. A proven database with a real CBO and 40 years of optimized execution planning that supports a true binary document protocol all of MongoDB's internal warts are suddenly exposed. The fact is there was a ton of technical debt that BSON protocol efficiency papered over. Unfortunately that paper is now peeling and the mold underneath is becoming visible.

Franck Pachot • May 30

Bolt-on? PostgreSQL was explicitly designed in 86 to store complex objects in addition to basic datatypes found in previous RDBMS. Objects, vectors, XML, JSON... those followed the trends but the idea is the same: the hooks for extensible datatypes, indexing, and storage were there explicitly to avoid bolt-on solutions later. You said "40 years of CBO"? 40 years ago, DB2 had great CBO, but Oracle was rule based. CBO started to be used in data warehouses 25 years ago, and later in OLTP (but forcing index cost lower). By the way neither 10gen or Oracle have invented binary serialization formats with variable length content, headers, and indexes. Devices exchange such formats for decades (Telco call data records or whatever).

Rick Houlihan • May 30

You say tomato, I say tomahto.

Oracle has only 34 years of CBO history, so I guess that makes my entire argument incorrect. I must be a fool.

Tossing out Red Herring distractions in the comments does not make the article above any more correct.

BSON is inferior to OSON in every way that can be measured. This is fact.

Rick Houlihan • May 30 • Edited

Also, on the JSONB thing...while technically it is not a plugin it certainly has not been in the Postgres distribution since '86. It first appeared as part of the release in 9.4 which was released in '92. So I guess everything you say must also be wrong now that you stated an esoteric historical fact incorrectly as I did and now we are both just idiots talking at each other. (I of course do not believe that for one second...you are extremely sharp)

While JSONB is compiled natively into the Postgres engine today, the reality is that it was frankensteined out of a 10-year-old key-value extension called hstore. The development team openly admitted that after a year of trying to make nested data work in an extension format, they had to frantically pivot, rip out the code, and merge it into the core database engine under extreme fatigue to hit a release deadline. It is fully integrated, but it is built on old structural bones. That legacy foundation is exactly why it suffers from heavy write-amplification, lack of in-place updates, and terrible query planner selectivity estimates today.

Compiled in or not, JSONB is a bolt-on by any reasonable definition.

Franck Pachot • May 30

I didn’t say “the JSONB thing” — I was referring to the extensibility framework that made JSONB (and many other features) possible in the first place. The key idea is that these capabilities can be plugged directly into the storage layer and catalog, rather than bolted on top, and that the system supports custom data types and indexes at all layers. This was already described in 1986 in THE DESIGN OF POSTGRES as “access methods” (with the first design principle being to “provide better support for complex objects”). The closest thing in Oracle Database is Data Cartridge (around 8i).

Rick Houlihan • May 31 • Edited

This is why I will always enjoy our banter, Franck. Accuracy is what its all about.

We may often disagree, but we will always agree on that even when we have different opinions on what accuracy is.

In this case my assumption as the reader and yours as the author were inconsistent which generated hallucinations. :)

E Lion Reigns • Jun 3

Nested vs flat documents — I feel this in CSV/JSONL logging for chat+call logs. How do you version schema changes without breaking replay?