There is an essential difference between MongoDB’s BSON and PostgreSQL’s JSONB. Both are binary JSON formats, but they serve different roles. JSONB is purely an internal storage format for JSON data in PostgreSQL. BSON, on the other hand, is MongoDB’s native data format: it is used by application drivers, over the network, in memory, and on disk.
JSONB: PostgreSQL internal storage format
JSONB is a storage format, as defined by the PostgreSQL documentation:
PostgreSQL offers two types for storing JSON data: json and jsonb
PostgreSQL uses JSONB solely for internal storage. To access a single field, the server must read the entire stored structure, as observed in JSONB DeTOASTing (read amplification).
BSON: MongoDB storage and exchange format
BSON is used for storage and also as an exchange format between the application and the database, as defined in the BSON specification:
BSON can be compared to binary interchange formats, like Protocol Buffers. BSON is more "schema-less" than Protocol Buffers
On the application side, the MongoDB driver converts application objects to BSON, which supports more data types than JSON or JSONB, including datetime and binary. This BSON is sent and received over the network and stored and manipulated on the server as-is, with no extra parsing. Both the driver and the database can efficiently access fields via the binary structure because BSON includes metadata such as field length prefixes and explicit type information, even for large or nested documents.
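To see this structure concretely, here is a minimal sketch using the bson package that ships with PyMongo (the document is just an example): it encodes a document and prints the raw bytes, where the first four bytes are the little-endian total length and each element starts with a type byte and a NUL-terminated field name.
import bson  # PyMongo's bson package

raw = bson.encode({"name": "widget", "price": 9.99})
print(raw.hex(" "))
# 25 00 00 00            -> int32 total length (0x25 = 37 bytes), little-endian
# 02 6e 61 6d 65 00 ...  -> type 0x02 (string), "name\0", int32 length, "widget\0"
# 01 70 72 69 63 65 00.. -> type 0x01 (double), "price\0", 8-byte IEEE 754 value
# 00                     -> document terminator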
PostgreSQL protocol is JSON (text), not JSONB
To illustrate this, I've written a small Python program that inserts a document into a PostgreSQL table with a JSONB column, and queries that table to retrieve the document:
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Item(Base):
    __tablename__ = 'items'
    id = Column(Integer, primary_key=True)
    data = Column(JSONB)  # our JSONB column

# Connect to Postgres (connection settings come from the libpq environment)
engine = create_engine('postgresql+psycopg2://', echo=True)
Session = sessionmaker(bind=engine)
session = Session()

# Create table
Base.metadata.create_all(engine)

# Insert an object into the JSONB column
obj = {"name": "widget", "price": 9.99, "tags": ["new", "sale"]}
session.add(Item(data=obj))
session.commit()

# Read back the table
for row in session.query(Item).all():
    print(row.id, row.data)
The program uses SQLAlchemy to send and retrieve Python objects to and from PostgreSQL via the Psycopg2 driver. I saved it as demo.py.
When I run the program with python demo.py, it logs all SQL statements before displaying the final result:
2025-12-21 12:50:22,484 INFO sqlalchemy.engine.Engine select pg_catalog.version()
2025-12-21 12:50:22,485 INFO sqlalchemy.engine.Engine [raw sql] {}
2025-12-21 12:50:22,486 INFO sqlalchemy.engine.Engine select current_schema()
2025-12-21 12:50:22,486 INFO sqlalchemy.engine.Engine [raw sql] {}
2025-12-21 12:50:22,486 INFO sqlalchemy.engine.Engine show standard_conforming_strings
2025-12-21 12:50:22,486 INFO sqlalchemy.engine.Engine [raw sql] {}
2025-12-21 12:50:22,487 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-12-21 12:50:22,488 INFO sqlalchemy.engine.Engine select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where pg_catalog.pg_table_is_visible(c.oid) and relname=%(name)s
2025-12-21 12:50:22,488 INFO sqlalchemy.engine.Engine [generated in 0.00015s] {'name': 'items'}
2025-12-21 12:50:22,489 INFO sqlalchemy.engine.Engine
CREATE TABLE items (
id SERIAL NOT NULL,
data JSONB,
PRIMARY KEY (id)
)
2025-12-21 12:50:22,489 INFO sqlalchemy.engine.Engine [no key 0.00011s] {}
2025-12-21 12:50:22,491 INFO sqlalchemy.engine.Engine COMMIT
2025-12-21 12:50:22,493 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-12-21 12:50:22,494 INFO sqlalchemy.engine.Engine INSERT INTO items (data) VALUES (%(data)s) RETURNING items.id
2025-12-21 12:50:22,494 INFO sqlalchemy.engine.Engine [generated in 0.00018s] {'data': '{"name": "widget", "price": 9.99, "tags": ["new", "sale"]}'}
2025-12-21 12:50:22,495 INFO sqlalchemy.engine.Engine COMMIT
2025-12-21 12:50:22,497 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-12-21 12:50:22,498 INFO sqlalchemy.engine.Engine SELECT items.id AS items_id, items.data AS items_data
FROM items
2025-12-21 12:50:22,498 INFO sqlalchemy.engine.Engine [generated in 0.00013s] {}
1 {'name': 'widget', 'tags': ['new', 'sale'], 'price': 9.99}
To see what is sent and received over the network by the PostgreSQL protocol, I run the program under strace, tracing the sendto and recvfrom system calls with their arguments: strace -e trace=sendto,recvfrom -yy -s 1000 python demo.py.
Like most SQL databases, PostgreSQL uses a basic protocol: the client sends SQL commands as text and fetches a tabular result set. In the PostgreSQL protocol's messages, the first byte is the message type (Q for a Simple Query message, followed by the message length and the query text; X to terminate the session; C for the command completion status; T and D for the result set's row description and data rows).
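As a quick illustration of that framing, here is a minimal sketch that decodes the BEGIN message captured below (lengths are big-endian and include the length field itself):
import struct

raw = b"Q\x00\x00\x00\nBEGIN\x00"            # a Simple Query message
msg_type = chr(raw[0])                       # 'Q'
(length,) = struct.unpack("!I", raw[1:5])    # 10: the length field plus payload
payload = raw[5:1 + length].rstrip(b"\x00")  # the SQL text, NUL-terminated
print(msg_type, length, payload)             # Q 10 b'BEGIN'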
Here is the output. The lines starting with a timestamp are the logs from SQLAlchemy; those starting with sendto and recvfrom are the network system calls, showing the messages sent to the database and the results coming back.
Here is the trace when inserting one document:
2025-12-21 16:52:20,278 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-12-21 16:52:20,279 INFO sqlalchemy.engine.Engine INSERT INTO items (data) VALUES (%(data)s) RETURNING items.id
2025-12-21 16:52:20,279 INFO sqlalchemy.engine.Engine [generated in 0.00029s] {'data': '{"name": "widget", "price": 9.99, "tags": ["new", "sale"]}'}
sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "Q\0\0\0\nBEGIN\0", 11, MSG_NOSIGNAL, NULL, 0) = 11
recvfrom(3<TCPv6:[[::1]:41858->[::1]:5432]>, "C\0\0\0\nBEGIN\0Z\0\0\0\5T", 16384, 0, NULL, NULL) = 17
sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "Q\0\0\0vINSERT INTO items (data) VALUES ('{\"name\": \"widget\", \"price\": 9.99, \"tags\": [\"new\", \"sale\"]}') RETURNING items.id\0", 119, MSG_NOSIGNAL, NULL, 0) = 119
recvfrom(3<TCPv6:[[::1]:41858->[::1]:5432]>, "T\0\0\0\33\0\1id\0\0\0@\310\0\1\0\0\0\27\0\4\377\377\377\377\0\0D\0\0\0\v\0\1\0\0\0\0011C\0\0\0\17INSERT 0 1\0Z\0\0\0\5T", 16384, 0, NULL, NULL) = 62
2025-12-21 16:52:20,281 INFO sqlalchemy.engine.Engine COMMIT
sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "Q\0\0\0\vCOMMIT\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
recvfrom(3<TCPv6:[[::1]:41858->[::1]:5432]>, "C\0\0\0\vCOMMIT\0Z\0\0\0\5I", 16384, 0, NULL, NULL) = 18
It started a transaction (Q\0\0\0\nBEGIN), received command completion (C\0\0\0\nBEGIN), then sent the full text of the INSERT command, including the JSON payload (Q\0\0\0vINSERT INTO items (data) VALUES ('{\"name\": \"widget\", \"price\": 9.99, \"tags\": [\"new\", \"sale\"]}). It subsequently received command completion (INSERT 0 1) and the returned ID (T\0\0\0\33\0\1id, D\0\0\0\v\0\1\0\0\0\001).
Here is the trace when I query and fetch the document:
2025-12-21 16:52:20,283 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-12-21 16:52:20,285 INFO sqlalchemy.engine.Engine SELECT items.id AS items_id, items.data AS items_data
FROM items
2025-12-21 16:52:20,285 INFO sqlalchemy.engine.Engine [generated in 0.00024s] {}
sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "Q\0\0\0\nBEGIN\0", 11, MSG_NOSIGNAL, NULL, 0) = 11
recvfrom(3<TCPv6:[[::1]:41858->[::1]:5432]>, "C\0\0\0\nBEGIN\0Z\0\0\0\5T", 16384, 0, NULL, NULL) = 17
sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "Q\0\0\0FSELECT items.id AS items_id, items.data AS items_data \nFROM items\0", 71, MSG_NOSIGNAL, NULL, 0) = 71
recvfrom(3<TCPv6:[[::1]:41858->[::1]:5432]>, "T\0\0\0>\0\2items_id\0\0\0@\310\0\1\0\0\0\27\0\4\377\377\377\377\0\0items_data\0\0\0@\310\0\2\0\0\16\332\377\377\377\377\377\377\0\0D\0\0\0I\0\2\0\0\0\0011\0\0\0:{\"name\": \"widget\", \"tags\": [\"new\", \"sale\"], \"price\": 9.99}C\0\0\0\rSELECT 1\0Z\0\0\0\5T", 16384, 0, NULL, NULL) = 157
It started another transaction, sent the SELECT statement as text and received the result as JSON text (D\0\0\0I\0\2\0\0\0\0011\0\0\0:{\"name\": \"widget\", \"tags\": [\"new\", \"sale\"], \"price\": 9.99}).
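That DataRow message can be decoded in a few lines of Python; a minimal sketch, retyping the captured 'D' message as a bytes literal (column lengths are big-endian int32, where -1 would mean NULL):
import struct

raw = (b"D\x00\x00\x00I\x00\x02"     # type 'D', length 73, 2 columns
       b"\x00\x00\x00\x011"          # column 1: 1 byte, the id "1"
       b'\x00\x00\x00:{"name": "widget", "tags": ["new", "sale"], "price": 9.99}')
(ncols,) = struct.unpack("!H", raw[5:7])
pos, cols = 7, []
for _ in range(ncols):
    (vlen,) = struct.unpack("!i", raw[pos:pos + 4])
    pos += 4
    cols.append(raw[pos:pos + vlen])
    pos += vlen
print(cols)  # [b'1', b'{"name": ...}'] -- the JSONB column arrives as plain text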
Finally, the transaction ends, and the session is disconnected:
sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "Q\0\0\0\rROLLBACK\0", 14, MSG_NOSIGNAL, NULL, 0) = 14
recvfrom(3<TCPv6:[[::1]:41858->[::1]:5432]>, "C\0\0\0\rROLLBACK\0Z\0\0\0\5I", 16384, 0, NULL, NULL) = 20
sendto(3<TCPv6:[[::1]:41858->[::1]:5432]>, "X\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5
If you want to dig into the code, the server-side parsing is in jsonb_send and jsonb_recv ("The type is sent as text in binary mode"), and while it tests the version before converting to text, there is only one version. On the client side, Psycopg2 shows that register_default_jsonb is the same as register_default_json.
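You can observe this client-side parsing directly: psycopg2 lets you swap the JSON parser used for jsonb values. A minimal sketch, assuming the items table from above and libpq connection settings in the environment:
import json
import psycopg2
import psycopg2.extras

def loud_loads(s):
    print("client-side parse of:", s)  # the raw JSON text received from the wire
    return json.loads(s)

conn = psycopg2.connect("")  # connection settings from PG* environment variables
psycopg2.extras.register_default_jsonb(conn, loads=loud_loads)
with conn.cursor() as cur:
    cur.execute("SELECT data FROM items")
    print(cur.fetchone()[0])  # loud_loads ran: the driver parsed text into a dict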
Comparing with MongoDB (BSON from end-to-end)
To compare with MongoDB, I created the following demo-mongodb.py:
from pymongo import MongoClient

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.my_database

insert_result = db.items.insert_one({"name": "widget", "price": 9.99, "tags": ["new", "sale"]})
print("Inserted document ID:", insert_result.inserted_id)

for doc in db.items.find():
    print(doc["_id"], doc)
I used the same strace command, but with all characters displayed as hexadecimal so that the payloads can be decoded with bsondump:
$ strace -e trace=sendto,recvfrom -xx -yy -s 1000 python demo-mongodb.py 2>&1
Here is the network request for the insert:
sendto(5<TCP:[127.0.0.1:44570->127.0.0.1:27017]>, "\xd6\x00\x00\x00\x51\xdc\xb0\x74\x00\x00\x00\x00\xdd\x07\x00\x00\x00\x00\x00\x00\x00\x5a\x00\x00\x00\x02\x69\x6e\x73\x65\x72\x74\x00\x06\x00\x00\x00\x69\x74\x65\x6d\x73\x00\x08\x6f\x72\x64\x65\x72\x65\x64\x00\x01\x03\x6c\x73\x69\x64\x00\x1e\x00\x00\x00\x05\x69\x64\x00\x10\x00\x00\x00\x04\x31\xb8\x9a\x81\xfd\x35\x42\x1a\x88\x44\xa8\x69\xe8\xba\x6f\x30\x00\x02\x24\x64\x62\x00\x0c\x00\x00\x00\x6d\x79\x5f\x64\x61\x74\x61\x62\x61\x73\x65\x00\x00\x01\x66\x00\x00\x00\x64\x6f\x63\x75\x6d\x65\x6e\x74\x73\x00\x58\x00\x00\x00\x07\x5f\x69\x64\x00\x69\x48\x3f\x7f\x87\x46\xd5\x2e\xe2\x0b\xbc\x0b\x02\x6e\x61\x6d\x65\x00\x07\x00\x00\x00\x77\x69\x64\x67\x65\x74\x00\x01\x70\x72\x69\x63\x65\x00\x7b\x14\xae\x47\xe1\xfa\x23\x40\x04\x74\x61\x67\x73\x00\x1c\x00\x00\x00\x02\x30\x00\x04\x00\x00\x00\x6e\x65\x77\x00\x02\x31\x00\x05\x00\x00\x00\x73\x61\x6c\x65\x00\x00\x00", 214, 0, NULL, 0) = 214
recvfrom(5<TCP:[127.0.0.1:44570->127.0.0.1:27017]>, "\x2d\x00\x00\x00\x06\x00\x00\x00\x51\xdc\xb0\x74\xdd\x07\x00\x00", 16, 0, NULL, NULL) = 16
recvfrom(5<TCP:[127.0.0.1:44570->127.0.0.1:27017]>, "\x00\x00\x00\x00\x00\x18\x00\x00\x00\x10\x6e\x00\x01\x00\x00\x00\x01\x6f\x6b\x00\x00\x00\x00\x00\x00\x00\xf0\x3f\x00", 29, 0, NULL, NULL) = 29
Inserted document ID: 69483f7f8746d52ee20bbc0b
Here is the fetch query that receives the document:
sendto(5<TCP:[127.0.0.1:44570->127.0.0.1:27017]>, "\x70\x00\x00\x00\xff\x5c\x49\x19\x00\x00\x00\x00\xdd\x07\x00\x00\x00\x00\x00\x00\x00\x5b\x00\x00\x00\x02\x66\x69\x6e\x64\x00\x06\x00\x00\x00\x69\x74\x65\x6d\x73\x00\x03\x66\x69\x6c\x74\x65\x72\x00\x05\x00\x00\x00\x00\x03\x6c\x73\x69\x64\x00\x1e\x00\x00\x00\x05\x69\x64\x00\x10\x00\x00\x00\x04\x31\xb8\x9a\x81\xfd\x35\x42\x1a\x88\x44\xa8\x69\xe8\xba\x6f\x30\x00\x02\x24\x64\x62\x00\x0c\x00\x00\x00\x6d\x79\x5f\x64\x61\x74\x61\x62\x61\x73\x65\x00\x00", 112, 0, NULL, 0) = 112
recvfrom(5<TCP:[127.0.0.1:44570->127.0.0.1:27017]>, "\xc5\x00\x00\x00\x07\x00\x00\x00\xff\x5c\x49\x19\xdd\x07\x00\x00", 16, 0, NULL, NULL) = 16
recvfrom(5<TCP:[127.0.0.1:44570->127.0.0.1:27017]>, "\x00\x00\x00\x00\x00\xb0\x00\x00\x00\x03\x63\x75\x72\x73\x6f\x72\x00\x97\x00\x00\x00\x04\x66\x69\x72\x73\x74\x42\x61\x74\x63\x68\x00\x60\x00\x00\x00\x03\x30\x00\x58\x00\x00\x00\x07\x5f\x69\x64\x00\x69\x48\x3f\x7f\x87\x46\xd5\x2e\xe2\x0b\xbc\x0b\x02\x6e\x61\x6d\x65\x00\x07\x00\x00\x00\x77\x69\x64\x67\x65\x74\x00\x01\x70\x72\x69\x63\x65\x00\x7b\x14\xae\x47\xe1\xfa\x23\x40\x04\x74\x61\x67\x73\x00\x1c\x00\x00\x00\x02\x30\x00\x04\x00\x00\x00\x6e\x65\x77\x00\x02\x31\x00\x05\x00\x00\x00\x73\x61\x6c\x65\x00\x00\x00\x00\x12\x69\x64\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x6e\x73\x00\x12\x00\x00\x00\x6d\x79\x5f\x64\x61\x74\x61\x62\x61\x73\x65\x2e\x69\x74\x65\x6d\x73\x00\x00\x01\x6f\x6b\x00\x00\x00\x00\x00\x00\x00\xf0\x3f\x00", 181, 0, NULL, NULL) = 181
69483f7f8746d52ee20bbc0b {'_id': ObjectId('69483f7f8746d52ee20bbc0b'), 'name': 'widget', 'price': 9.99, 'tags': ['new', 'sale']}
I use bsondump, available in the MongoDB container, to decode the messages.
The insert starts with the standard 16-byte message header: total message size in little-endian = 0xd6 = 214 bytes (\xd6\x00\x00\x00), requestID (\x51\xdc\xb0\x74), responseTo (0 for client->server: \x00\x00\x00\x00), and opCode = 2013 for OP_MSG (\xdd\x07\x00\x00). It is followed by the OP_MSG flagBits (\x00\x00\x00\x00) and a section kind byte (\x00), and then the BSON payload starts.
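These header fields can be unpacked with Python's struct module; a minimal sketch using the bytes captured above (all fields are little-endian int32):
import struct

header = bytes.fromhex("d6000000" "51dcb074" "00000000" "dd070000")
length, request_id, response_to, op_code = struct.unpack("<iiii", header)
print(length, hex(request_id), response_to, op_code)  # 214 0x74b0dc51 0 2013
The first BSON section can then be fed to bsondump: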
root@9574ecd2d248:/# bsondump <(echo -ne '\x5a\x00\x00\x00\x02\x69\x6e\x73\x65\x72\x74\x00\x06\x00\x00\x00\x69\x74\x65\x6d\x73\x00\x08\x6f\x72\x64\x65\x72\x65\x64\x00\x01\x03\x6c\x73\x69\x64\x00\x1e\x00\x00\x00\x05\x69\x64\x00\x10\x00\x00\x00\x04\x31\xb8\x9a\x81\xfd\x35\x42\x1a\x88\x44\xa8\x69\xe8\xba\x6f\x30\x00\x02\x24\x64\x62\x00\x0c\x00\x00\x00\x6d\x79\x5f\x64\x61\x74\x61\x62\x61\x73\x65\x00\x00\x01\x66\x00\x00\x00\x64\x6f\x63\x75\x6d\x65\x6e\x74\x73\x00\x58\x00\x00\x00')
{
"insert": "items",
"ordered": true,
"lsid": {
"id": {
"$binary": {
"base64": "Mbiagf01QhqIRKhp6LpvMA==",
"subType": "04"
}
}
},
"$db": "my_database"
}
2025-12-21T19:09:39.214+0000 1 objects found
2025-12-21T19:09:39.214+0000 unexpected EOF
root@9574ecd2d248:/#
This shows unexpected EOF because the "documents" array is not embedded here: it is sent in a second OP_MSG section of kind 1 (a document sequence), whose header is visible right after the first BSON document: the section kind byte (\x01), the section size (\x66\x00\x00\x00 = 102 bytes), and the identifier (documents\x00). The document itself then starts with its own length field (\x58\x00\x00\x00 = 88 bytes):
root@9574ecd2d248:/# bsondump <(echo -ne '\x58\x00\x00\x00\x07\x5f\x69\x64\x00\x69\x48\x3f\x7f\x87\x46\xd5\x2e\xe2\x0b\xbc\x0b\x02\x6e\x61\x6d\x65\x00\x07\x00\x00\x00\x77\x69\x64\x67\x65\x74\x00\x01\x70\x72\x69\x63\x65\x00\x7b\x14\xae\x47\xe1\xfa\x23\x40\x04\x74\x61\x67\x73\x00\x1c\x00\x00\x00\x02\x30\x00\x04\x00\x00\x00\x6e\x65\x77\x00\x02\x31\x00\x05\x00\x00\x00\x73\x61\x6c\x65\x00\x00\x00')
{
"_id": {
"$oid": "69483f7f8746d52ee20bbc0b"
},
"name": "widget",
"price": {
"$numberDouble": "9.8"
},
"tags": [
"new",
"sale"
]
}
2025-12-21T19:09:49.278+0000 1 objects found
root@9574ecd2d248:/#
BSON holds the document in a flexible binary format, including all field names, datatypes, and values, which is what is exchanged between the application driver and the database server.
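Because every element carries a type byte and, for variable-size types, a length prefix, a reader can skip over fields it does not need. Here is a minimal sketch of that traversal, handling only the types present in this document (a real BSON parser covers all types):
import struct
import bson  # PyMongo's bson package

raw = bson.encode({"name": "widget", "price": 9.99, "tags": ["new", "sale"]})

pos = 4                                    # skip the int32 document length
while raw[pos] != 0x00:                    # 0x00 terminates the document
    etype = raw[pos]; pos += 1
    name_end = raw.index(0, pos)           # field names are NUL-terminated
    name = raw[pos:name_end].decode(); pos = name_end + 1
    if etype == 0x01:                      # double: fixed 8 bytes
        pos += 8
    elif etype == 0x02:                    # string: int32 length + bytes (incl. NUL)
        (slen,) = struct.unpack("<i", raw[pos:pos + 4]); pos += 4 + slen
    elif etype == 0x04:                    # array: int32 total length includes itself
        (alen,) = struct.unpack("<i", raw[pos:pos + 4]); pos += alen
    print(name, hex(etype))                # name 0x2, price 0x1, tags 0x4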
I can do the same with the query response:
root@9574ecd2d248:/# bsondump <(echo -ne '\xb0\x00\x00\x00\x03\x63\x75\x72\x73\x6f\x72\x00\x97\x00\x00\x00\x04\x66\x69\x72\x73\x74\x42\x61\x74\x63\x68\x00\x60\x00\x00\x00\x03\x30\x00\x58\x00\x00\x00\x07\x5f\x69\x64\x00\x69\x48\x3f\x7f\x87\x46\xd5\x2e\xe2\x0b\xbc\x0b\x02\x6e\x61\x6d\x65\x00\x07\x00\x00\x00\x77\x69\x64\x67\x65\x74\x00\x01\x70\x72\x69\x63\x65\x00\x7b\x14\xae\x47\xe1\xfa\x23\x40\x04\x74\x61\x67\x73\x00\x1c\x00\x00\x00\x02\x30\x00\x04\x00\x00\x00\x6e\x65\x77\x00\x02\x31\x00\x05\x00\x00\x00\x73\x61\x6c\x65\x00\x00\x00\x00\x12\x69\x64\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x6e\x73\x00\x12\x00\x00\x00\x6d\x79\x5f\x64\x61\x74\x61\x62\x61\x73\x65\x2e\x69\x74\x65\x6d\x73\x00\x00\x01\x6f\x6b\x00\x00\x00\x00\x00\x00\x00\xf0\x3f\x00')
{
"cursor": {
"firstBatch": [
{
"_id": {
"$oid": "69483f7f8746d52ee20bbc0b"
},
"name": "widget",
"price": {
"$numberDouble": "9.99"
},
"tags": [
"new",
"sale"
]
}
],
"id": {
"$numberLong": "0"
},
"ns": "my_database.items"
},
"ok": {
"$numberDouble": "1.0"
}
}
2025-12-21T18:44:08.110+0000 1 objects found
Again, the document is received in BSON format, which stores binary values with the correct datatypes.
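This type fidelity is easy to check from Python; a minimal sketch showing that BSON round-trips datetime and binary values natively, while the standard json module cannot even serialize them:
import datetime
import json
import bson

doc = {"ts": datetime.datetime(2025, 12, 21, 12, 0, 0), "blob": b"\x00\x01"}

print(bson.decode(bson.encode(doc)))  # datetime and bytes come back as-is

try:
    json.dumps(doc)
except TypeError as e:
    print("JSON:", e)                 # datetime is not JSON serializable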
Conclusion: no JSONB in the application
With PostgreSQL, the JSON text is visible in the network messages, even when it comes from a JSONB column, as the traces above show.
In PostgreSQL, storing as TEXT, JSON, or JSONB affects storage and indexing, but the wire protocol still sends and receives plain JSON text. Every query requires the client and server to serialize and parse that text, which adds CPU overhead for large or complex documents and risks losing type fidelity.
MongoDB uses BSON from end to end — in storage and on the wire. Drivers map BSON types directly to application objects, preserving types like dates and binary fields without extra parsing. This reduces CPU cost on both sides, improves scalability, and makes large‑document handling more efficient.
