There was a discussion on LinkedIn comparing OSON (the binary JSON in Oracle Database) and BSON (the binary JSON in MongoDB). To be clear, BSON and OSON aren’t directly comparable because they serve different purposes:
- BSON is a general-purpose binary serialization format designed for transport and storage efficiency, similar to Protobuf. It prioritizes fast encode/decode operations for network transmission and disk storage across diverse applications.
- OSON is Oracle Database's internal JSON storage format, specifically engineered for database operations, with extensive metadata to enable efficient field querying and navigation without fully decoding the document.
They share two objectives but make different trade-offs:
- Compactness Through Binary Encoding – Both formats achieve compactness without compression through binary encoding and type-specific representations. However, BSON prioritizes encoding speed with a straightforward binary representation, whereas OSON achieves greater compactness through local dictionary compression of field names within documents.
- Partial Navigation Without Full Scanning – Both enable efficient partial document traversal, but with different approaches. BSON uses simple size prefixes to enable jumping between fields. A BSON document is meant to be stored as a single variable-size block (possible in the WiredTiger B-Tree). OSON implements comprehensive metadata structures—such as a tree segment and jumpable offsets—so that it can reduce random reads when stored in multiple fixed-size blocks (the Oracle Database storage model).
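To make the size-prefix idea concrete, here is a minimal sketch that hand-encodes a flat document of string fields following the public BSON layout, then looks up one field by jumping over the values that precede it. The helper names `encode_string_document` and `find_string_field` are hypothetical, the sketch only handles string elements, and it is not how any driver or server implements field access.

```python
import struct

def encode_string_document(fields):
    """Hand-encode a flat dict of str -> str as BSON (string elements only)."""
    body = b""
    for name, value in fields.items():
        data = value.encode("utf-8")
        # Element: type 0x02, field name as a C string,
        # int32 length (including the trailing NUL), bytes, NUL.
        body += b"\x02" + name.encode("utf-8") + b"\x00"
        body += struct.pack("<i", len(data) + 1) + data + b"\x00"
    # Document: int32 total size (including itself), elements, trailing NUL.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

def find_string_field(buf, wanted):
    """Walk top-level elements; return (value, number of elements visited)."""
    pos = 4  # skip the document size prefix
    visited = 0
    while buf[pos] != 0:  # a 0x00 byte terminates the element list
        element_type = buf[pos]
        name_end = buf.index(b"\x00", pos + 1)
        name = buf[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if element_type != 0x02:
            raise ValueError("this sketch only handles string elements")
        (length,) = struct.unpack_from("<i", buf, pos)
        visited += 1
        if name == wanted:
            return buf[pos + 4:pos + 4 + length - 1].decode("utf-8"), visited
        pos += 4 + length  # jump over the value without decoding it
    raise KeyError(wanted)

# Same shape as the smallest benchmark document: 10 fields of 1000 characters.
doc = encode_string_document({f"field_{i + 1}": "a" * 1000 for i in range(10)})
value, visited = find_string_field(doc, "field_5")
print(len(doc), visited, len(value))
```

The lookup still visits every preceding element sequentially, but it reads only each name and length prefix, not the value bytes. There is no index that would let it jump straight to a field by name.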
They differ in one major design objective regarding updates:
- BSON follows MongoDB's "store together what is accessed together" principle, optimizing for full-document read patterns and full-document writes to disk as a single contiguous block, rather than in-place modifications. Standard BSON documents are not designed for efficient partial updates outside of an in-memory mutable object representation.
- OSON is designed to optimize database operations, including in-place modifications without additional memory allocation.
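As an illustration of why a serialized BSON document does not lend itself to in-place changes, the sketch below hand-encodes two versions of a small document and compares their byte layout. The helper `encode_string_document` is hypothetical and only handles string fields. This shows the byte format only, not MongoDB's actual update path.

```python
import struct

def encode_string_document(fields):
    """Hand-encode a flat dict of str -> str as BSON (string elements only)."""
    body = b""
    for name, value in fields.items():
        data = value.encode("utf-8")
        body += b"\x02" + name.encode("utf-8") + b"\x00"
        body += struct.pack("<i", len(data) + 1) + data + b"\x00"
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

before = encode_string_document({"a": "short", "b": "tail"})
after = encode_string_document({"a": "a much longer value", "b": "tail"})

# Start of element "b": type byte 0x02, name "b", NUL terminator.
marker = b"\x02b\x00"
size_before = struct.unpack_from("<i", before)[0]
size_after = struct.unpack_from("<i", after)[0]

print("total size prefix:", size_before, "->", size_after)
print("offset of field b:", before.index(marker), "->", after.index(marker))
```

Growing the value of `a` changes the document's size prefix and shifts every byte that follows it, so the change amounts to rewriting the rest of the document rather than patching a few bytes.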
I mentioned the Java driver in the previous post. The Oracle Java driver supports fast access via OracleJsonObject.get(), which avoids instantiating a new object and uses the internal metadata for navigation. I wanted to see how this works, but because the Oracle Database Java driver isn’t open source, I tried Python instead, where the Oracle driver is open source. However, it doesn’t provide an equivalent to OracleJsonObject.get(), so you must decode OSON into a Python dict and encode it back from one, as you would with BSON.
I ran the following program to measure the time and size to encode and decode BSON and OSON:
import time
import bson
import oracledb
from bson.codec_options import CodecOptions
from bson.raw_bson import RawBSONDocument
# Prepare RawBSON codec options once for lazy BSON decoding
raw_codec_options = CodecOptions(document_class=RawBSONDocument)

def generate_large_document(num_fields, field_length):
    long_str = "a" * field_length
    return {f"field_{i+1}": long_str for i in range(num_fields)}

def compare_bson_oson(document, connection, iterations=100):
    """Compare BSON and OSON encode/decode, plus access after decode."""
    middle_field_name = f"field_{len(document)//2}"

    # Precompute sizes
    bson_data_sample = bson.encode(document)
    bson_size = len(bson_data_sample)
    oson_data_sample = connection.encode_oson(document)
    oson_size = len(oson_data_sample)

    # Timers
    bson_encode_total = 0.0
    bson_decode_total = 0.0
    bson_access_after_decode_total = 0.0
    bson_decode_raw_total = 0.0
    bson_access_raw_total = 0.0
    oson_encode_total = 0.0
    oson_decode_total = 0.0
    oson_access_after_decode_total = 0.0

    for _ in range(iterations):
        # BSON encode
        start = time.perf_counter()
        bson_data = bson.encode(document)
        bson_encode_total += (time.perf_counter() - start)

        # BSON decode raw (construct RawBSONDocument)
        start = time.perf_counter()
        raw_bson_doc = RawBSONDocument(bson_data, codec_options=raw_codec_options)
        bson_decode_raw_total += (time.perf_counter() - start)

        # BSON access single field from raw doc
        start = time.perf_counter()
        _ = raw_bson_doc[middle_field_name]
        bson_access_raw_total += (time.perf_counter() - start)

        # BSON full decode
        start = time.perf_counter()
        decoded_bson = bson.decode(bson_data)
        bson_decode_total += (time.perf_counter() - start)

        # BSON access after full decode
        start = time.perf_counter()
        _ = decoded_bson[middle_field_name]
        bson_access_after_decode_total += (time.perf_counter() - start)

        # OSON encode
        start = time.perf_counter()
        oson_data = connection.encode_oson(document)
        oson_encode_total += (time.perf_counter() - start)

        # OSON full decode
        start = time.perf_counter()
        decoded_oson = connection.decode_oson(oson_data)
        oson_decode_total += (time.perf_counter() - start)

        # OSON access after full decode
        start = time.perf_counter()
        _ = decoded_oson[middle_field_name]
        oson_access_after_decode_total += (time.perf_counter() - start)

    return (
        bson_encode_total,
        bson_decode_total,
        bson_access_after_decode_total,
        bson_size,
        bson_decode_raw_total,
        bson_access_raw_total
    ), (
        oson_encode_total,
        oson_decode_total,
        oson_access_after_decode_total,
        oson_size
    )

def run_multiple_comparisons():
    iterations = 100
    num_fields_list = [10, 100, 1000]
    field_sizes_list = [1000, 10000, 100000]  # small length → large length

    # Init Oracle client
    oracledb.init_oracle_client(config_dir="/home/opc/Wallet")
    connection = oracledb.connect(
        user="franck",
        password="My Strong P455w0rd",
        dsn="orcl_tp"
    )

    print(f"{'Format':<6} {'Fields':<8} {'FieldLen':<10} "
          f"{'Encode(s)':<12} {'Decode(s)':<12} {'Access(s)':<12} {'Size(bytes)':<12} "
          f"{'DecRaw(s)':<12} {'AccRaw(s)':<12} "
          f"{'EncRatio':<9} {'DecRatio':<9} {'SizeRatio':<9}")
    print("-" * 130)

    for field_length in field_sizes_list:
        for num_fields in num_fields_list:
            document = generate_large_document(num_fields, field_length)
            bson_res, oson_res = compare_bson_oson(document, connection, iterations)

            # Ratios are OSON / BSON; guard against a zero denominator
            enc_ratio = oson_res[0] / bson_res[0] if bson_res[0] > 0 else 0
            dec_ratio = oson_res[1] / bson_res[1] if bson_res[1] > 0 else 0
            size_ratio = oson_res[3] / bson_res[3] if bson_res[3] > 0 else 0

            # BSON row
            print(f"{'BSON':<6} {num_fields:<8} {field_length:<10} "
                  f"{bson_res[0]:<12.4f} {bson_res[1]:<12.4f} {bson_res[2]:<12.4f} {bson_res[3]:<12} "
                  f"{bson_res[4]:<12.4f} {bson_res[5]:<12.4f} "
                  f"{'-':<9} {'-':<9} {'-':<9}")

            # OSON row
            print(f"{'OSON':<6} {num_fields:<8} {field_length:<10} "
                  f"{oson_res[0]:<12.4f} {oson_res[1]:<12.4f} {oson_res[2]:<12.4f} {oson_res[3]:<12} "
                  f"{'-':<12} {'-':<12} "
                  f"{enc_ratio:<9.2f} {dec_ratio:<9.2f} {size_ratio:<9.2f}")

    connection.close()

if __name__ == "__main__":
    run_multiple_comparisons()
I got the following results:
- Encode(s): total encode time over 100 iterations.
- Decode(s): full decode into Python objects (dict).
- Access(s): access to a field from Python objects (dict).
- DecRaw(s): creation of a RawBSONDocument for BSON (no equivalent for OSON).
- AccRaw(s): single middle-field access from a raw document (lazy decode for BSON).
- Ratios: OSON value / BSON value (time for EncRatio and DecRatio, bytes for SizeRatio).
$ TNS_ADMIN=/home/opc/Wallet python bson-oson.py
Format Fields   FieldLen   Encode(s)    Decode(s)    Access(s)    Size(bytes)  DecRaw(s)    AccRaw(s)    EncRatio  DecRatio  SizeRatio
----------------------------------------------------------------------------------------------------------------------------------
BSON   10       1000       0.0005       0.0005       0.0000       10146        0.0002       0.0011       -         -         -
OSON   10       1000       0.0013       0.0006       0.0000       10206        -            -            2.44      1.26      1.01
BSON   100      1000       0.0040       0.0043       0.0000       101497       0.0012       0.0101       -         -         -
OSON   100      1000       0.0103       0.0053       0.0000       102009       -            -            2.58      1.25      1.01
BSON   1000     1000       0.0422       0.0510       0.0000       1015898      0.0098       0.0990       -         -         -
OSON   1000     1000       0.1900       0.0637       0.0000       1021912      -            -            4.50      1.25      1.01
BSON   10       10000      0.0019       0.0017       0.0000       100146       0.0005       0.0025       -         -         -
OSON   10       10000      0.0045       0.0021       0.0000       100208       -            -            2.36      1.27      1.00
BSON   100      10000      0.0187       0.0177       0.0000       1001497      0.0026       0.0225       -         -         -
OSON   100      10000      0.1247       0.0241       0.0000       1002009      -            -            6.66      1.36      1.00
BSON   1000     10000      0.2709       0.2439       0.0001       10015898     0.0235       0.2861       -         -         -
OSON   1000     10000      14.4215      0.3185       0.0001       10021912     -            -            53.23     1.31      1.00
Important nuance: in Python, the driver's Connection.decode_oson() yields a standard dict, so we cannot measure lazy access as we could with the Java driver’s OracleJsonObject.get() method, which can skip object instantiation. We measured it for one field from the raw BSON to show that there is a cost, which is higher than accessing the dict: the AccRaw(s) totals cover 100 iterations, so a single access takes from about 0.01 milliseconds for the smallest document to about 3 milliseconds for the largest. In general, since you store together what is accessed together, it often makes sense to decode to an application object.
Encoding OSON is slower than BSON, especially for large documents. This is by design: OSON computes navigation metadata for faster reads, whereas BSON writes a contiguous field stream. For the largest case (1000 fields of 10000 characters), OSON encoding takes about 14.4 seconds over 100 iterations, which is roughly 144 milliseconds per operation, compared with about 2.7 milliseconds for BSON.
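The totals in the table are sums over 100 iterations, so the per-operation cost is the total divided by 100. A small sketch of that arithmetic for the largest case (1000 fields of 10000 characters), using figures copied from the results above:

```python
ITERATIONS = 100

# Totals in seconds, copied from the last two rows of the results table.
totals_seconds = {
    "BSON encode": 0.2709,
    "OSON encode": 14.4215,
    "BSON decode": 0.2439,
    "OSON decode": 0.3185,
    "BSON raw access": 0.2861,
}

# Convert each total to milliseconds per single operation.
per_op_ms = {
    name: total / ITERATIONS * 1000
    for name, total in totals_seconds.items()
}

for name, ms in per_op_ms.items():
    print(f"{name:<16} {ms:8.3f} ms per operation")
```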
Decoding BSON is marginally faster than OSON (OSON takes about 1.25 to 1.36 times as long), but the difference is negligible in absolute terms: even for the largest document, a full decode takes about 3 milliseconds. OSON’s extra metadata helps mainly when reading a few fields from a large document, as it avoids instantiating an immutable object.
Raw BSON provides faster "decoding" (as it isn’t actually decoded), but slower field access. Still, this difference, at most a few milliseconds per access in this test, is negligible except when accessing many fields, in which case you should decode once to a Python dict.
This test illustrates the different design goals of BSON and OSON. I used the Python driver to illustrate what an application does: get a document from a query and manipulate it as an application object. On the server, it is different, and queries may modify a single field in a document. OSON will do it directly on the OSON datatype, as it has all the metadata, whereas BSON will be accessed through a mutable BSON object.