There was a discussion on LinkedIn comparing OSON (the binary JSON in Oracle Database) and BSON (the binary JSON in MongoDB). To be clear, BSON and OSON aren’t directly comparable because they serve different purposes:
- BSON is a general-purpose binary serialization format, similar to Protobuf, optimized for efficient transport, storage, and fast encode/decode. It uses a compact byte-stream representation to reduce disk usage and builds additional metadata in memory during deserialization for quick data access. This design allows specific fields to be retrieved efficiently through both the database server and the drivers.
- OSON is Oracle Database’s internal JSON storage format, designed for database operations. It embeds metadata that lets the database query fields and navigate the document efficiently without fully decoding it, much like a tiny datastore whose catalog and indexes are stored with the data. This lets applications access JSON fields directly, without needing a document database smart driver.
They share two objectives but make different trade-offs:
- Compactness Through Binary Encoding – Both formats achieve compactness without compression through binary encoding and type-specific representations. However, BSON prioritizes encoding speed with a straightforward binary representation, whereas OSON achieves greater compactness through local dictionary compression of field names within documents.
- Partial Navigation Without Full Scanning – Both enable efficient partial document traversal, but with different approaches. BSON uses simple size prefixes so a reader can jump between fields instead of scanning bytes proportionally to the document size. A BSON document is meant to be stored as a single variable-size block, which WiredTiger's B-tree supports. OSON implements richer metadata structures, such as a tree segment and jumpable offsets, to reduce random reads and avoid O(n) scans when the document is stored across multiple fixed-size blocks (the Oracle Database storage model).
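The size-prefix idea can be sketched in a few lines of Python. This is a toy layout loosely inspired by BSON string elements (type tag, name, length prefix, payload), not the actual BSON specification; all names here are illustrative. The length prefix is what lets a reader hop from field to field without scanning value bytes:

```python
import struct

# Toy length-prefixed layout (illustrative, not real BSON): each element is
# a type tag, a NUL-terminated field name, an int32 length, then the payload.
def encode(doc):
    out = bytearray()
    for name, value in doc.items():
        data = value.encode()
        out += b"\x02" + name.encode() + b"\x00"    # type tag + cstring name
        out += struct.pack("<i", len(data)) + data  # length prefix + payload
    return bytes(out)

def get_field(buf, wanted):
    pos = 0
    while pos < len(buf):
        pos += 1                                    # skip the type tag
        end = buf.index(b"\x00", pos)               # end of the field name
        name = buf[pos:end].decode()
        pos = end + 1
        (length,) = struct.unpack_from("<i", buf, pos)
        pos += 4
        if name == wanted:
            return buf[pos:pos + length].decode()
        pos += length                               # jump over the value in O(1)
    return None

buf = encode({"a": "x" * 1000, "b": "y" * 1000, "c": "hit"})
print(get_field(buf, "c"))
```

Finding `"c"` skips over the two 1000-byte values with two pointer jumps rather than reading them, which is the behavior the size prefixes buy; OSON's offset metadata pushes the same idea further so the jumps work across fixed-size blocks.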
They differ in one major design objective regarding updates:
- BSON follows MongoDB’s "store together what is accessed together" principle, favoring full-document reads and writes as a single contiguous block on disk over in-place modifications. Consequently, standard BSON is not optimized for partial updates, aside from an in-memory mutable representation that preserves consistency and transactional boundaries. This in-memory mutable representation is what the database server and the drivers use.
- OSON is designed to optimize database operations by allowing efficient in-place modifications without extra memory allocation. The trade-off is higher CPU usage on the database server to compute and write the metadata, but lower memory usage on the client, since that metadata is included in the query result set.
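To make the in-place-update contrast concrete, here is a minimal Python sketch. It is purely illustrative, not Oracle's actual layout: a buffer of fixed-size slots plus an offset table stands in for the embedded metadata. When the offsets are known and the new value fits the existing slot, an update touches only those bytes:

```python
# Illustrative sketch only, not OSON's real format: fixed-size value slots
# plus an offset table acting as the "metadata stored with the data".
buf = bytearray(b"AAAA" + b"BBBB" + b"CCCC")   # three fixed 4-byte value slots
offsets = {"a": 0, "b": 4, "c": 8}             # field name -> byte offset

def update_in_place(buf, offsets, field, value):
    """Patch one field's bytes in place; nothing after the slot moves."""
    if len(value) != 4:
        raise ValueError("new value must fit the existing 4-byte slot")
    off = offsets[field]
    buf[off:off + 4] = value

update_in_place(buf, offsets, "b", b"bbbb")
print(bytes(buf))
```

A contiguous full-document format would instead decode, mutate, and re-serialize the whole payload, because changing one value's length shifts every byte after it; this is why BSON's mutable representation lives in memory rather than on disk.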
In short, MongoDB’s BSON uses different optimized formats for disk and memory and follows an application-centric model that offloads some processing to smart drivers. Oracle’s OSON is database-centric, avoids application-side processing, and is optimized for a fixed block size, with the buffer pool using the same on-disk structure in memory. Neither is inherently better: each is best suited to its own database engine and its specific features and constraints.
I mentioned the Java driver in the previous post. The Oracle Java driver supports fast access via OracleJsonObject.get(), which avoids instantiating a new object and uses the internal metadata for navigation. I wanted to see how this works, but because the Oracle Database Java driver isn’t open source, I tried Python instead, where the Oracle driver is open source. However, it doesn’t provide an equivalent to OracleJsonObject.get(), so you must decode and encode to a Python dict, similar to BSON.
I ran the following program to measure the time and size to encode and decode BSON and OSON:
```python
import time

import bson
import oracledb
from bson.codec_options import CodecOptions
from bson.raw_bson import RawBSONDocument

# Prepare RawBSON codec options once for lazy BSON decoding
raw_codec_options = CodecOptions(document_class=RawBSONDocument)


def generate_large_document(num_fields, field_length):
    long_str = "a" * field_length
    return {f"field_{i+1}": long_str for i in range(num_fields)}


def compare_bson_oson(document, connection, iterations=100):
    """Compare BSON and OSON encode/decode, plus access after decode."""
    middle_field_name = f"field_{len(document)//2}"
    # Precompute sizes
    bson_data_sample = bson.encode(document)
    bson_size = len(bson_data_sample)
    oson_data_sample = connection.encode_oson(document)
    oson_size = len(oson_data_sample)
    # Timers
    bson_encode_total = 0.0
    bson_decode_total = 0.0
    bson_access_after_decode_total = 0.0
    bson_decode_raw_total = 0.0
    bson_access_raw_total = 0.0
    oson_encode_total = 0.0
    oson_decode_total = 0.0
    oson_access_after_decode_total = 0.0
    for _ in range(iterations):
        # BSON encode
        start = time.perf_counter()
        bson_data = bson.encode(document)
        bson_encode_total += (time.perf_counter() - start)
        # BSON decode raw (construct RawBSONDocument)
        start = time.perf_counter()
        raw_bson_doc = RawBSONDocument(bson_data, codec_options=raw_codec_options)
        bson_decode_raw_total += (time.perf_counter() - start)
        # BSON access single field from raw doc
        start = time.perf_counter()
        _ = raw_bson_doc[middle_field_name]
        bson_access_raw_total += (time.perf_counter() - start)
        # BSON full decode
        start = time.perf_counter()
        decoded_bson = bson.decode(bson_data)
        bson_decode_total += (time.perf_counter() - start)
        # BSON access after full decode
        start = time.perf_counter()
        _ = decoded_bson[middle_field_name]
        bson_access_after_decode_total += (time.perf_counter() - start)
        # OSON encode
        start = time.perf_counter()
        oson_data = connection.encode_oson(document)
        oson_encode_total += (time.perf_counter() - start)
        # OSON full decode
        start = time.perf_counter()
        decoded_oson = connection.decode_oson(oson_data)
        oson_decode_total += (time.perf_counter() - start)
        # OSON access after full decode
        start = time.perf_counter()
        _ = decoded_oson[middle_field_name]
        oson_access_after_decode_total += (time.perf_counter() - start)
    return (
        bson_encode_total,
        bson_decode_total,
        bson_access_after_decode_total,
        bson_size,
        bson_decode_raw_total,
        bson_access_raw_total
    ), (
        oson_encode_total,
        oson_decode_total,
        oson_access_after_decode_total,
        oson_size
    )


def run_multiple_comparisons():
    iterations = 100
    num_fields_list = [10, 100, 1000]
    field_sizes_list = [1000, 10000, 100000]  # small length → large length
    # Oracle functions need a connection even if all happens locally 🤷🏼‍♂️
    oracledb.init_oracle_client(config_dir="/home/opc/Wallet")
    connection = oracledb.connect(
        user="franck",
        password="My Strong P455w0rd",
        dsn="orcl_tp"
    )
    print(f"{'Format':<6} {'Fields':<8} {'FieldLen':<10} "
          f"{'Encode(s)':<12} {'Decode(s)':<12} {'Access(s)':<12} {'Size(bytes)':<12} "
          f"{'DecRaw(s)':<12} {'AccRaw(s)':<12} "
          f"{'EncRatio':<9} {'DecRatio':<9} {'SizeRatio':<9}")
    print("-" * 130)
    for field_length in field_sizes_list:
        for num_fields in num_fields_list:
            document = generate_large_document(num_fields, field_length)
            bson_res, oson_res = compare_bson_oson(document, connection, iterations)
            enc_ratio = oson_res[0] / bson_res[0] if bson_res[0] > 0 else 0
            dec_ratio = oson_res[1] / bson_res[1] if bson_res[1] > 0 else 0
            size_ratio = oson_res[3] / bson_res[3] if bson_res[3] > 0 else 0
            # BSON row
            print(f"{'BSON':<6} {num_fields:<8} {field_length:<10} "
                  f"{bson_res[0]:<12.4f} {bson_res[1]:<12.4f} {bson_res[2]:<12.4f} {bson_res[3]:<12} "
                  f"{bson_res[4]:<12.4f} {bson_res[5]:<12.4f} "
                  f"{'-':<9} {'-':<9} {'-':<9}")
            # OSON row
            print(f"{'OSON':<6} {num_fields:<8} {field_length:<10} "
                  f"{oson_res[0]:<12.4f} {oson_res[1]:<12.4f} {oson_res[2]:<12.4f} {oson_res[3]:<12} "
                  f"{'-':<12} {'-':<12} "
                  f"{enc_ratio:<9.2f} {dec_ratio:<9.2f} {size_ratio:<9.2f}")
    connection.close()


if __name__ == "__main__":
    run_multiple_comparisons()
```
I got the following results:
- Encode(s): total encode time over 100 iterations.
- Decode(s): full decode into Python objects (dict).
- Access(s): access to a field of the decoded Python dict.
- DecRaw(s): creation of a RawBSONDocument for BSON (no equivalent for OSON).
- AccRaw(s): single middle-field access from a raw document (lazy decode for BSON).
- Ratios: OSON time / BSON time.
```
$ TNS_ADMIN=/home/opc/Wallet python bson-oson.py
Format Fields   FieldLen   Encode(s)    Decode(s)    Access(s)    Size(bytes)  DecRaw(s)    AccRaw(s)    EncRatio  DecRatio  SizeRatio
----------------------------------------------------------------------------------------------------------------------------------
BSON   10       1000       0.0005       0.0005       0.0000       10146        0.0002       0.0011       -         -         -
OSON   10       1000       0.0013       0.0006       0.0000       10206        -            -            2.44      1.26      1.01
BSON   100      1000       0.0040       0.0043       0.0000       101497       0.0012       0.0101       -         -         -
OSON   100      1000       0.0103       0.0053       0.0000       102009       -            -            2.58      1.25      1.01
BSON   1000     1000       0.0422       0.0510       0.0000       1015898      0.0098       0.0990       -         -         -
OSON   1000     1000       0.1900       0.0637       0.0000       1021912      -            -            4.50      1.25      1.01
BSON   10       10000      0.0019       0.0017       0.0000       100146       0.0005       0.0025       -         -         -
OSON   10       10000      0.0045       0.0021       0.0000       100208       -            -            2.36      1.27      1.00
BSON   100      10000      0.0187       0.0177       0.0000       1001497      0.0026       0.0225       -         -         -
OSON   100      10000      0.1247       0.0241       0.0000       1002009      -            -            6.66      1.36      1.00
BSON   1000     10000      0.2709       0.2439       0.0001       10015898     0.0235       0.2861       -         -         -
OSON   1000     10000      14.4215      0.3185       0.0001       10021912     -            -            53.23     1.31      1.00
```
Important nuance: In Python, oracledb.decode_oson() yields a standard dict, so we cannot measure lazy access as we can with the Java driver’s OracleJsonObject.get() method, which can skip object instantiation. We measured access to one field from the raw BSON to show that there is a cost: it is higher than accessing the decoded dict, though still only a few milliseconds per access even for the largest documents. In general, since you store together what is accessed together, it often makes sense to decode to an application object.
Encoding OSON is slower than encoding BSON, especially for large documents, and this is by design: OSON computes navigation metadata for faster reads, whereas BSON writes a contiguous field stream. For the largest case, encoding takes ~14.4 seconds over 100 iterations, or roughly 144 milliseconds per operation.
Decoding BSON is marginally faster than OSON, but the difference is negligible since all decoding times are under a millisecond. OSON’s extra metadata helps mainly when reading a few fields from a large document, as it avoids instantiating an immutable object.
Raw BSON provides faster "decoding" (nothing is actually decoded up front), but slower field access. Still, the difference is a few milliseconds at most and negligible unless you access many fields, in which case you should decode once to a Python dict.
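As a back-of-the-envelope check, plugging the largest-document BSON row from the table above into a simple cost model shows where the break-even sits. The per-operation figures below come from this single run, so treat them as rough estimates, not constants:

```python
# Per-operation costs from the 1000-field x 10000-byte BSON row, divided by
# the 100 iterations; rough figures from one run, for illustration only.
raw_construct = 0.0235 / 100   # build RawBSONDocument (no decode)
raw_access    = 0.2861 / 100   # one lazy field lookup from the raw document
full_decode   = 0.2439 / 100   # decode the whole document into a dict
dict_access   = 0.0001 / 100   # one dict lookup afterwards

def total_cost(n_accesses, lazy):
    """Total time to read n_accesses fields with either strategy."""
    if lazy:
        return raw_construct + n_accesses * raw_access
    return full_decode + n_accesses * dict_access

print(total_cost(1, lazy=True), total_cost(1, lazy=False))
```

With these particular numbers, a single lazy lookup already costs more than a full decode, so decoding once wins as soon as any field is read; constructing the raw document only pays off when no field is accessed at all.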
This test illustrates the different design goals of BSON and OSON. I used the Python driver to illustrate what an application does: get a document from a query and manipulate it as an application object. On the server, things are different, and a query may modify a single field in a document. OSON can do this directly on the OSON datatype, since it carries all the metadata, whereas BSON is accessed through a mutable BSON object.