There was a discussion on LinkedIn comparing OSON (the binary JSON in Oracle Database) and BSON (the binary JSON in MongoDB). To be clear, BSON and OSON aren’t directly comparable because they serve different purposes:
- BSON is a general-purpose binary serialization format, similar to Protobuf, optimized for efficient transport, storage, and fast encode/decode. It uses a compact byte-stream representation to reduce disk usage and builds additional metadata in memory during deserialization for quick data access. This design allows specific fields to be retrieved efficiently through both the database server and the drivers.
- OSON is Oracle Database’s internal JSON storage format, designed for database operations. It embeds metadata that lets the database query fields and navigate the document efficiently without fully decoding it, much like a tiny datastore whose catalog and indexes are stored with the data. This lets applications access JSON fields directly, without needing a document database smart driver.
They share two objectives but make different trade-offs:
- Compactness Through Binary Encoding – Both formats achieve compactness without compression through binary encoding and type-specific representations. However, BSON prioritizes encoding speed with a straightforward binary representation, whereas OSON achieves greater compactness through local dictionary compression of field names within documents.
- Partial Navigation Without Full Scanning – Both enable efficient partial document traversal, but with different approaches. BSON uses simple size prefixes so a reader can jump between fields instead of scanning bytes proportionally to the document size. A BSON document is meant to be stored as a single variable-size block, which WiredTiger's B-tree supports. OSON implements richer metadata structures, such as a tree segment and jumpable offsets, to reduce random reads and avoid O(n) scans when the document is stored across multiple fixed-size blocks (the Oracle Database storage model).
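The size-prefix idea can be sketched in a few lines of Python. This is a toy layout loosely inspired by BSON string elements (type tag, name, length prefix, payload), not the actual BSON specification; all names here are illustrative. The length prefix is what lets a reader hop from field to field without scanning value bytes:

```python
import struct

# Toy length-prefixed layout (illustrative, not real BSON): each element is
# a type tag, a NUL-terminated field name, an int32 length, then the payload.
def encode(doc):
    out = bytearray()
    for name, value in doc.items():
        data = value.encode()
        out += b"\x02" + name.encode() + b"\x00"    # type tag + cstring name
        out += struct.pack("<i", len(data)) + data  # length prefix + payload
    return bytes(out)

def get_field(buf, wanted):
    pos = 0
    while pos < len(buf):
        pos += 1                                    # skip the type tag
        end = buf.index(b"\x00", pos)               # end of the field name
        name = buf[pos:end].decode()
        pos = end + 1
        (length,) = struct.unpack_from("<i", buf, pos)
        pos += 4
        if name == wanted:
            return buf[pos:pos + length].decode()
        pos += length                               # jump over the value in O(1)
    return None

buf = encode({"a": "x" * 1000, "b": "y" * 1000, "c": "hit"})
print(get_field(buf, "c"))
```

Finding `"c"` skips over the two 1000-byte values with two pointer jumps rather than reading them, which is the behavior the size prefixes buy; OSON's offset metadata pushes the same idea further so the jumps work across fixed-size blocks.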
They differ in one major design objective regarding updates:
- BSON follows MongoDB’s "store together what is accessed together" principle, favoring full-document reads and writes as a single contiguous block on disk over in-place modifications. Consequently, standard BSON is not optimized for partial updates, aside from an in-memory mutable representation that preserves consistency and transactional boundaries. This in-memory mutable representation is what the database server and the drivers use.
- OSON is designed to optimize database operations by allowing efficient in-place modifications without extra memory allocation. The trade-off is higher CPU usage on the database server to compute and write the metadata, but lower memory usage on the client, since that metadata is included in the query result set.
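To make the in-place-update contrast concrete, here is a minimal Python sketch. It is purely illustrative, not Oracle's actual layout: a buffer of fixed-size slots plus an offset table stands in for the embedded metadata. When the offsets are known and the new value fits the existing slot, an update touches only those bytes:

```python
# Illustrative sketch only, not OSON's real format: fixed-size value slots
# plus an offset table acting as the "metadata stored with the data".
buf = bytearray(b"AAAA" + b"BBBB" + b"CCCC")   # three fixed 4-byte value slots
offsets = {"a": 0, "b": 4, "c": 8}             # field name -> byte offset

def update_in_place(buf, offsets, field, value):
    """Patch one field's bytes in place; nothing after the slot moves."""
    if len(value) != 4:
        raise ValueError("new value must fit the existing 4-byte slot")
    off = offsets[field]
    buf[off:off + 4] = value

update_in_place(buf, offsets, "b", b"bbbb")
print(bytes(buf))
```

A contiguous full-document format would instead decode, mutate, and re-serialize the whole payload, because changing one value's length shifts every byte after it; this is why BSON's mutable representation lives in memory rather than on disk.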
In short, MongoDB’s BSON uses different optimized formats for disk and memory and follows an application-centric model that offloads some processing to smart drivers. Oracle’s OSON is database-centric, avoids application-side processing, and is optimized for a fixed block size, with the buffer pool using the same on-disk structure in memory. Neither is inherently better: each is best suited to its own database engine and its specific features and constraints.
I mentioned the Java driver in the previous post. The Oracle Java driver supports fast access via OracleJsonObject.get(), which avoids instantiating a new object and uses the internal metadata for navigation. I wanted to see how this works, but because the Oracle Database Java driver isn’t open source, I tried Python instead, where the Oracle driver is open source. However, it doesn’t provide an equivalent to OracleJsonObject.get(), so you must decode and encode to a Python dict, similar to BSON.
I ran the following program to measure the time and size to encode and decode BSON and OSON:
```python
import time

import bson
import oracledb
from bson.codec_options import CodecOptions
from bson.raw_bson import RawBSONDocument

# Prepare RawBSON codec options once for lazy BSON decoding
raw_codec_options = CodecOptions(document_class=RawBSONDocument)


def generate_large_document(num_fields, field_length):
    long_str = "a" * field_length
    return {f"field_{i+1}": long_str for i in range(num_fields)}


def compare_bson_oson(document, connection, iterations=100):
    """Compare BSON and OSON encode/decode, plus access after decode."""
    middle_field_name = f"field_{len(document)//2}"
    # Precompute sizes
    bson_data_sample = bson.encode(document)
    bson_size = len(bson_data_sample)
    oson_data_sample = connection.encode_oson(document)
    oson_size = len(oson_data_sample)
    # Timers
    bson_encode_total = 0.0
    bson_decode_total = 0.0
    bson_access_after_decode_total = 0.0
    bson_decode_raw_total = 0.0
    bson_access_raw_total = 0.0
    oson_encode_total = 0.0
    oson_decode_total = 0.0
    oson_access_after_decode_total = 0.0
    for _ in range(iterations):
        # BSON encode
        start = time.perf_counter()
        bson_data = bson.encode(document)
        bson_encode_total += (time.perf_counter() - start)
        # BSON decode raw (construct RawBSONDocument)
        start = time.perf_counter()
        raw_bson_doc = RawBSONDocument(bson_data, codec_options=raw_codec_options)
        bson_decode_raw_total += (time.perf_counter() - start)
        # BSON access single field from raw doc
        start = time.perf_counter()
        _ = raw_bson_doc[middle_field_name]
        bson_access_raw_total += (time.perf_counter() - start)
        # BSON full decode
        start = time.perf_counter()
        decoded_bson = bson.decode(bson_data)
        bson_decode_total += (time.perf_counter() - start)
        # BSON access after full decode
        start = time.perf_counter()
        _ = decoded_bson[middle_field_name]
        bson_access_after_decode_total += (time.perf_counter() - start)
        # OSON encode
        start = time.perf_counter()
        oson_data = connection.encode_oson(document)
        oson_encode_total += (time.perf_counter() - start)
        # OSON full decode
        start = time.perf_counter()
        decoded_oson = connection.decode_oson(oson_data)
        oson_decode_total += (time.perf_counter() - start)
        # OSON access after full decode
        start = time.perf_counter()
        _ = decoded_oson[middle_field_name]
        oson_access_after_decode_total += (time.perf_counter() - start)
    return (
        bson_encode_total,
        bson_decode_total,
        bson_access_after_decode_total,
        bson_size,
        bson_decode_raw_total,
        bson_access_raw_total
    ), (
        oson_encode_total,
        oson_decode_total,
        oson_access_after_decode_total,
        oson_size
    )


def run_multiple_comparisons():
    iterations = 100
    num_fields_list = [10, 100, 1000]
    field_sizes_list = [1000, 10000, 100000]  # small length → large length
    # Oracle functions need a connection even if all happens locally 🤷🏼‍♂️
    oracledb.init_oracle_client(config_dir="/home/opc/Wallet")
    connection = oracledb.connect(
        user="franck",
        password="My Strong P455w0rd",
        dsn="orcl_tp"
    )
    print(f"{'Format':<6} {'Fields':<8} {'FieldLen':<10} "
          f"{'Encode(s)':<12} {'Decode(s)':<12} {'Access(s)':<12} {'Size(bytes)':<12} "
          f"{'DecRaw(s)':<12} {'AccRaw(s)':<12} "
          f"{'EncRatio':<9} {'DecRatio':<9} {'SizeRatio':<9}")
    print("-" * 130)
    for field_length in field_sizes_list:
        for num_fields in num_fields_list:
            document = generate_large_document(num_fields, field_length)
            bson_res, oson_res = compare_bson_oson(document, connection, iterations)
            enc_ratio = oson_res[0] / bson_res[0] if bson_res[0] > 0 else 0
            dec_ratio = oson_res[1] / bson_res[1] if bson_res[1] > 0 else 0
            size_ratio = oson_res[3] / bson_res[3] if bson_res[3] > 0 else 0
            # BSON row
            print(f"{'BSON':<6} {num_fields:<8} {field_length:<10} "
                  f"{bson_res[0]:<12.4f} {bson_res[1]:<12.4f} {bson_res[2]:<12.4f} {bson_res[3]:<12} "
                  f"{bson_res[4]:<12.4f} {bson_res[5]:<12.4f} "
                  f"{'-':<9} {'-':<9} {'-':<9}")
            # OSON row
            print(f"{'OSON':<6} {num_fields:<8} {field_length:<10} "
                  f"{oson_res[0]:<12.4f} {oson_res[1]:<12.4f} {oson_res[2]:<12.4f} {oson_res[3]:<12} "
                  f"{'-':<12} {'-':<12} "
                  f"{enc_ratio:<9.2f} {dec_ratio:<9.2f} {size_ratio:<9.2f}")
    connection.close()


if __name__ == "__main__":
    run_multiple_comparisons()
```
I got the following results:
- Encode(s): total encode time over 100 iterations.
- Decode(s): full decode into Python objects (dict).
- Access(s): access to a field of the decoded Python dict.
- DecRaw(s): creation of a RawBSONDocument for BSON (no equivalent for OSON).
- AccRaw(s): single middle-field access from a raw document (lazy decode for BSON).
- Ratios: OSON time / BSON time.
```
$ TNS_ADMIN=/home/opc/Wallet python bson-oson.py
Format Fields   FieldLen   Encode(s)    Decode(s)    Access(s)    Size(bytes)  DecRaw(s)    AccRaw(s)    EncRatio  DecRatio  SizeRatio
----------------------------------------------------------------------------------------------------------------------------------
BSON   10       1000       0.0005       0.0005       0.0000       10146        0.0002       0.0011       -         -         -
OSON   10       1000       0.0013       0.0006       0.0000       10206        -            -            2.44      1.26      1.01
BSON   100      1000       0.0040       0.0043       0.0000       101497       0.0012       0.0101       -         -         -
OSON   100      1000       0.0103       0.0053       0.0000       102009       -            -            2.58      1.25      1.01
BSON   1000     1000       0.0422       0.0510       0.0000       1015898      0.0098       0.0990       -         -         -
OSON   1000     1000       0.1900       0.0637       0.0000       1021912      -            -            4.50      1.25      1.01
BSON   10       10000      0.0019       0.0017       0.0000       100146       0.0005       0.0025       -         -         -
OSON   10       10000      0.0045       0.0021       0.0000       100208       -            -            2.36      1.27      1.00
BSON   100      10000      0.0187       0.0177       0.0000       1001497      0.0026       0.0225       -         -         -
OSON   100      10000      0.1247       0.0241       0.0000       1002009      -            -            6.66      1.36      1.00
BSON   1000     10000      0.2709       0.2439       0.0001       10015898     0.0235       0.2861       -         -         -
OSON   1000     10000      14.4215      0.3185       0.0001       10021912     -            -            53.23     1.31      1.00
```
Important nuance: In Python, oracledb.decode_oson() yields a standard dict, so we cannot measure lazy access as we can with the Java driver’s OracleJsonObject.get() method, which can skip object instantiation. We measured access to one field from the raw BSON to show that there is a cost: it is higher than accessing the decoded dict, though still only a few milliseconds per access even for the largest documents. In general, since you store together what is accessed together, it often makes sense to decode to an application object.
Encoding OSON is slower than encoding BSON, especially for large documents, and this is by design: OSON computes navigation metadata for faster reads, whereas BSON writes a contiguous field stream. For the largest case, encoding takes ~14.4 seconds over 100 iterations, or roughly 144 milliseconds per operation.
Decoding BSON is marginally faster than OSON, but the difference is negligible since all decoding times are under a millisecond. OSON’s extra metadata helps mainly when reading a few fields from a large document, as it avoids instantiating an immutable object.
Raw BSON provides faster "decoding" (nothing is actually decoded up front), but slower field access. Still, the difference is a few milliseconds at most and negligible unless you access many fields, in which case you should decode once to a Python dict.
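As a back-of-the-envelope check, plugging the largest-document BSON row from the table above into a simple cost model shows where the break-even sits. The per-operation figures below come from this single run, so treat them as rough estimates, not constants:

```python
# Per-operation costs from the 1000-field x 10000-byte BSON row, divided by
# the 100 iterations; rough figures from one run, for illustration only.
raw_construct = 0.0235 / 100   # build RawBSONDocument (no decode)
raw_access    = 0.2861 / 100   # one lazy field lookup from the raw document
full_decode   = 0.2439 / 100   # decode the whole document into a dict
dict_access   = 0.0001 / 100   # one dict lookup afterwards

def total_cost(n_accesses, lazy):
    """Total time to read n_accesses fields with either strategy."""
    if lazy:
        return raw_construct + n_accesses * raw_access
    return full_decode + n_accesses * dict_access

print(total_cost(1, lazy=True), total_cost(1, lazy=False))
```

With these particular numbers, a single lazy lookup already costs more than a full decode, so decoding once wins as soon as any field is read; constructing the raw document only pays off when no field is accessed at all.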
This test illustrates the different design goals of BSON and OSON. I used the Python driver to illustrate what an application does: get a document from a query and manipulate it as an application object. On the server, things are different, and a query may modify a single field in a document. OSON can do this directly on the OSON datatype, since it carries all the metadata, whereas BSON is accessed through a mutable BSON object.