Franck Pachot

Posted on Jan 10 • Edited on Jan 12

Which Document class is best to use in Java to read MongoDB documents?

#mongodb #java #oracle #postgres

TL;DR: the answer is in the title, use Document.

BSON is a serialization format, similar to protobuf, designed for efficient document storage on disk and transfer over the network as a byte stream. Instead of scanning and rewriting the entire byte sequence to access its contents (fields, arrays, subdocuments), you work with an in-memory object that exposes methods to read and write fields efficiently. On MongoDB's server side, that's the mutable BSON object. On the client side, the drivers provide a similar API. The Java driver has five classes that implement the Bson interface.

Document is the recommendation for most applications. It provides the best balance of flexibility, ease of use, and functionality.

Only consider the other classes when you have specific requirements:

BsonDocument: When you need strict BSON type safety rather than application types.
RawBsonDocument: When you need to access the raw bytes of the document rather than field values.
JsonObject: When working exclusively with JSON strings, as plain text.
BasicDBObject: Only for legacy code migration.

RawBsonDocument is used internally (for example, for client-side encryption and change streams), while the other classes are intended for direct use in MongoDB driver operations. It can also serve as a lazy accessor that reads BSON field values on demand when not all fields are used, though in that case I prefer to revisit the query's projection.

This article focuses exclusively on driver-provided document types rather than application POJOs mapped through the CodecRegistry. While both systems use codecs for BSON serialization, they operate at different abstraction levels: driver objects are generic containers provided by the MongoDB driver, whereas POJOs are application-specific domain objects that require custom codec mapping.

Your choice mainly impacts how you construct, manipulate, and access document data in your application code. The classes are described in the MongoDB documentation:

Documents - Java Sync Driver - MongoDB Docs (mongodb.com)

I reviewed the documentation and code to better understand their differences, especially in comparison with other databases that have different designs and APIs. In this post, I describe each class in detail: what it is, how it is implemented, what it provides, and when to use it.

Detailed description

While analyzing the performance of a program that scanned a RawBsonDocument field by field, I noticed performance issues and confirmed that this is not the right way to retrieve document values in MongoDB. Instead, documents should be modeled so that data accessed together is stored together, rather than being read back as small fragments, field by field. If you only need a subset of fields, rely on a projection in your find() query instead.

Understanding these distinctions helped me implement a fix that restored the expected timings for reading field values, and I'll show the relevant code throughout this article.

Document (org.bson)

Document is a flexible representation of a BSON document that implements Map<String, Object>. It uses a LinkedHashMap<String, Object> internally to maintain insertion order.

Key characteristics:

Loosely-typed: Values are stored as Object, allowing you to use standard Java types (String, Integer, Date, etc.)
Flexible: Easy to work with dynamically structured documents
Map interface: Provides all standard Map operations

Document is the recommended default choice for most applications. Use it when you want a flexible and concise data representation that's easy to work with.
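As a minimal sketch of the map-style access described above, assuming the org.bson library (the `bson` artifact shipped with the Java driver) is on the classpath. The class name `DocumentSketch` and the field names are hypothetical, chosen only for illustration:

```java
import java.util.Arrays;
import java.util.List;
import org.bson.Document;

final class DocumentSketch {
    // Builds a hypothetical order document; values are plain Java types.
    static Document buildOrder() {
        return new Document("_id", 1)
                .append("status", "new")
                .append("customer", new Document("name", "Ada").append("vip", true))
                .append("tags", Arrays.asList("a", "b"));
    }

    public static void main(String[] args) {
        Document order = buildOrder();
        String status = order.getString("status");                  // typed convenience getter
        Document customer = order.get("customer", Document.class);  // nested document
        List<String> tags = order.getList("tags", String.class);    // typed list access
        order.put("status", "shipped");                             // standard Map mutation
        System.out.println(status + " / " + customer.getString("name") + " / " + tags);
        System.out.println(order.toJson());
    }
}
```

Because values are stored as Object, the typed getters are only a convenience: a mismatch between the stored type and the requested one surfaces at runtime, not at compile time.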

BsonDocument (org.bson)

BsonDocument is a type-safe container for BSON documents that implements Map<String, BsonValue>. It also uses a LinkedHashMap, but stores BsonValue types.

Key characteristics:

Type-safe: All values must be wrapped in BSON library types (BsonString, BsonInt32, BsonDocument, etc.)
Stricter API: Provides compile-time type safety but requires more verbose code
Map interface: Implements Map<String, BsonValue>

Use BsonDocument when you need a type-safe API and want explicit control over BSON types. This is particularly useful when you need to ensure precise type handling or when working with APIs that require BsonDocument.
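For comparison, here is a hedged sketch of the same kind of document built with BsonDocument, again assuming the org.bson library is on the classpath. The class name `BsonDocumentSketch` and the field names are hypothetical:

```java
import org.bson.BsonDocument;
import org.bson.BsonInt32;
import org.bson.BsonString;
import org.bson.BsonValue;

final class BsonDocumentSketch {
    // Every value must be wrapped in a BsonValue subtype.
    static BsonDocument buildOrder() {
        return new BsonDocument("_id", new BsonInt32(1))
                .append("status", new BsonString("new"));
        // order.put("status", "new") would not compile: a String is not a BsonValue.
    }

    public static void main(String[] args) {
        BsonDocument order = buildOrder();
        String status = order.getString("status").getValue(); // unwrap BsonString
        int id = order.getInt32("_id").getValue();             // unwrap BsonInt32
        BsonValue value = order.get("status");                 // generic access keeps the BSON type
        System.out.println(id + " " + status + " " + value.getBsonType());
    }
}
```

The extra wrapping and unwrapping is the verbosity mentioned above; in exchange, the BSON type of each value is explicit in the code.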

RawBsonDocument (org.bson)

RawBsonDocument is an immutable BSON document represented using only raw bytes. It stores the BSON document as a byte array without parsing it.

Key characteristics:

Immutable: All mutation operations throw UnsupportedOperationException
Lazy parsing: Data is only parsed when accessed, which makes it very efficient for pass-through scenarios but not for reading each field individually
Memory efficient: Stores raw bytes, avoiding object allocation overhead
Can decode to other types: Provides decode() method to convert to other document types when needed

Use RawBsonDocument when you need maximum performance and memory efficiency for whole-document operations. It is particularly useful when reading documents from MongoDB and passing them to another system unchanged, when working with large documents that you don’t need to parse, when building high-performance data pipelines where parsing overhead matters, and when you need an immutable document representation.
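The following sketch illustrates the immutability and the decode() step, assuming the org.bson library is on the classpath. The class name `RawBsonDocumentSketch` and the sample JSON are hypothetical; in a real application the raw document would typically come from a query rather than from RawBsonDocument.parse():

```java
import org.bson.Document;
import org.bson.RawBsonDocument;
import org.bson.codecs.DocumentCodec;

final class RawBsonDocumentSketch {
    // Builds a raw document from JSON text, for illustration only.
    static RawBsonDocument sample() {
        return RawBsonDocument.parse("{\"_id\": 1, \"status\": \"new\"}");
    }

    // Returns true if the raw document refuses to be modified.
    static boolean rejectsMutation(RawBsonDocument raw) {
        try {
            raw.remove("status");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        RawBsonDocument raw = sample();
        int sizeInBytes = raw.getByteBuffer().remaining();  // the stored BSON bytes
        Document decoded = raw.decode(new DocumentCodec()); // parse everything once
        System.out.println(sizeInBytes + " bytes, immutable=" + rejectsMutation(raw));
        System.out.println(decoded.getString("status"));
    }
}
```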

JsonObject (org.bson.json)

JsonObject is a wrapper class that holds a JSON object string. It simply stores the JSON as a String.

Key characteristics:

Does not implement Map: It's just a string wrapper with validation
No parsing required: Avoids conversion to Map structure if you're just working with JSON
JSON-focused: Designed for applications that primarily work with JSON strings
Supports Extended JSON: Works with MongoDB Extended JSON format

Use JsonObject when you want to work directly with JSON strings and avoid the overhead of converting to and from Map objects. This is ideal for REST APIs that consume and produce JSON, for logging or persisting documents as JSON strings, and for applications that primarily handle JSON and do not require programmatic field-level access.

BasicDBObject (com.mongodb)

BasicDBObject is a legacy BSON object implementation that extends BasicBSONObject and implements the DBObject interface.

Key characteristics:

Legacy class: Exists for backward compatibility with older driver versions
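DBObject is not a Map: The DBObject interface does not extend Map, so code written against DBObject lacks the standard Map convenience methods (the BasicDBObject class itself inherits them from LinkedHashMap through BasicBSONObject)
Binary compatibility concerns: DBObject is an interface rather than a class, so its API cannot be extended without breaking binary compatibility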

Only use BasicDBObject when migrating from a legacy driver version (pre-3.0). The documentation explicitly recommends avoiding this class for new development due to its limitations.

Conversion Between Types

All of these classes implement the Bson interface, which allows them to be used interchangeably in MongoDB operations (but without the same performance). You can also convert between types:

BsonDocument.parse(json) to create from JSON
RawBsonDocument.decode(codec) to convert RawBsonDocument to another type
Document can be converted to BsonDocument through codec operations
JsonObject.getJson() to extract the JSON string
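The conversions listed above can be sketched as follows, assuming the org.bson library is on the classpath (JsonObject requires a 4.2 or later version of the library, as far as I know). The class name `ConversionSketch` and its method names are hypothetical. Going through RawBsonDocument is only one possible way to turn a Document into a BsonDocument with the codecs that ship with the library:

```java
import org.bson.BsonDocument;
import org.bson.Document;
import org.bson.RawBsonDocument;
import org.bson.codecs.BsonDocumentCodec;
import org.bson.codecs.DocumentCodec;
import org.bson.json.JsonObject;

final class ConversionSketch {
    // JSON text -> BsonDocument
    static BsonDocument fromJson(String json) {
        return BsonDocument.parse(json);
    }

    // Document -> BSON bytes -> BsonDocument, using two codecs
    static BsonDocument toBsonDocument(Document doc) {
        RawBsonDocument raw = new RawBsonDocument(doc, new DocumentCodec());
        return raw.decode(new BsonDocumentCodec());
    }

    // RawBsonDocument -> Document
    static Document toDocument(RawBsonDocument raw) {
        return raw.decode(new DocumentCodec());
    }

    // JsonObject -> JSON text (returns the wrapped string)
    static String jsonText(JsonObject obj) {
        return obj.getJson();
    }

    public static void main(String[] args) {
        System.out.println(fromJson("{\"a\": \"x\"}"));
        System.out.println(toBsonDocument(new Document("n", 5)));
        System.out.println(toDocument(RawBsonDocument.parse("{\"n\": 5}")));
        System.out.println(jsonText(new JsonObject("{\"a\": 1}")));
    }
}
```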

Comparing `RawBsonDocument` and `Document`

RawBsonDocument stores BSON as a raw byte array and does not deserialize it in advance. When you access a field, it creates a BsonBinaryReader and scans the document sequentially, reading each field’s type and name until it finds the requested key. Only the matching field is decoded using RawBsonValueHelper.decode, while all other fields are skipped without parsing. For nested documents and arrays, it reads only their sizes and wraps the corresponding byte ranges in new RawBsonDocument or RawBsonArray instances, keeping their contents as raw bytes. This approach provides fast access for a single-field lookup, is memory-efficient, and keeps the document immutable, which is ideal for large documents where only a few fields are needed or for documents that are mostly passed through without inspection.

In contrast, Document uses a fully deserialized LinkedHashMap<String, Object>. When a Document is created, all fields are eagerly parsed into Java objects. Field access and containsKey operations are simple HashMap lookups, and the document is fully mutable, supporting standard map operations such as put, remove, and clear. This design consumes more memory but is better suited to small or medium-sized documents, scenarios where many fields are accessed, or cases where the document needs to be modified frequently.

Finally, Document doesn’t rely on RawBsonDocument for parsing or field access, and the two classes serve different purposes.
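To make the two access strategies concrete, here is a self-contained sketch that uses only the JDK. It is not the driver's code: `BsonScanSketch` is a hypothetical class that encodes a flat document with only string and int32 fields in the BSON layout, then reads it either by scanning sequentially for one key (the RawBsonDocument approach) or by decoding every field into a LinkedHashMap up front (the Document approach).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class BsonScanSketch {
    static final byte STRING = 0x02; // BSON type code for UTF-8 string
    static final byte INT32 = 0x10;  // BSON type code for 32-bit integer

    // Encodes String/Integer fields: int32 length, elements, 0x00 terminator.
    static byte[] encode(Map<String, Object> fields) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            boolean isString = e.getValue() instanceof String;
            body.write(isString ? STRING : INT32);
            byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
            body.write(name, 0, name.length);
            body.write(0); // field names are NUL-terminated
            if (isString) {
                byte[] s = ((String) e.getValue()).getBytes(StandardCharsets.UTF_8);
                writeInt(body, s.length + 1); // length includes the trailing NUL
                body.write(s, 0, s.length);
                body.write(0);
            } else {
                writeInt(body, (Integer) e.getValue());
            }
        }
        body.write(0); // document terminator
        ByteBuffer out = ByteBuffer.allocate(4 + body.size()).order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(4 + body.size()).put(body.toByteArray());
        return out.array();
    }

    static void writeInt(ByteArrayOutputStream out, int v) { // little-endian int32
        out.write(v); out.write(v >>> 8); out.write(v >>> 16); out.write(v >>> 24);
    }

    static String readName(ByteBuffer buf) {
        int start = buf.position();
        while (buf.get() != 0) { /* advance to the NUL */ }
        return new String(buf.array(), start, buf.position() - 1 - start, StandardCharsets.UTF_8);
    }

    static Object readValue(ByteBuffer buf, byte type) {
        if (type == INT32) return buf.getInt();
        if (type == STRING) {
            byte[] s = new byte[buf.getInt() - 1];
            buf.get(s);
            buf.get(); // trailing NUL
            return new String(s, StandardCharsets.UTF_8);
        }
        throw new IllegalArgumentException("unsupported BSON type " + type);
    }

    static void skipValue(ByteBuffer buf, byte type) {
        if (type == INT32) buf.position(buf.position() + 4);
        else if (type == STRING) buf.position(buf.position() + 4 + buf.getInt(buf.position()));
        else throw new IllegalArgumentException("unsupported BSON type " + type);
    }

    // Lazy lookup: walk elements from the start, decode only the matching one.
    static Object get(byte[] bson, String key) {
        ByteBuffer buf = ByteBuffer.wrap(bson).order(ByteOrder.LITTLE_ENDIAN);
        buf.getInt(); // total document length, not needed here
        for (byte type = buf.get(); type != 0; type = buf.get()) {
            if (readName(buf).equals(key)) return readValue(buf, type);
            skipValue(buf, type);
        }
        return null;
    }

    // Eager parse: decode every field once, then serve lookups from the map.
    static Map<String, Object> decodeAll(byte[] bson) {
        ByteBuffer buf = ByteBuffer.wrap(bson).order(ByteOrder.LITTLE_ENDIAN);
        buf.getInt();
        Map<String, Object> map = new LinkedHashMap<>();
        for (byte type = buf.get(); type != 0; type = buf.get()) {
            map.put(readName(buf), readValue(buf, type));
        }
        return map;
    }
}
```

Each call to get() restarts from the first byte, so reading every field of a document this way costs one partial scan per field, whereas decodeAll() pays for a single full pass and then answers each lookup from the map.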

In my fix for the problem mentioned above, I converted the RawBsonDocument to a Document so that parsing time would be captured in the benchmark:

MongoCollection<RawBsonDocument> rawCollection = db.getCollection("benchmark_client_side", RawBsonDocument.class);
// Get a RawBsonDocument from the query
RawBsonDocument raw = rawCollection.find(new Document("_id", docId)).first();
// Later, decode it into a Document
Document doc = raw.decode(new DocumentCodec());

However, the correct code should simply query into a Document:

MongoCollection<Document> docCollection = db.getCollection("benchmark_client_side", Document.class);
Document doc = docCollection.find(new Document("_id", docId)).first();

Getting the result as a Document takes more time and memory than getting a RawBsonDocument, because the driver parses the BSON fields into Java objects up front. In return, field access is simpler. For example, instead of calling .asString().getValue() on the raw BSON value like this:

String res = raw.get(fieldName).asString().getValue();

my fix simply casts to a String:

String res = (String) doc.get(fieldName);

That's what your application uses.

Comparison with PostgreSQL (JSONB) and Oracle Database (OSON)

PostgreSQL’s JDBC driver provides no native Java JSON API for JSON or JSONB columns: values are always returned as text, so the application must parse the string into its own document model using a separate JSON library. Even when using PostgreSQL’s binary JSONB storage, none of the binary efficiency crosses the wire (see JSONB vs. BSON: Tracing PostgreSQL and MongoDB Wire Protocols), and client code still performs a full parse before accessing individual fields.

In Oracle’s JDBC driver, the closest equivalent to a Document is OracleJsonObject, one of the OracleJsonValue types, which can be exposed directly as a javax.json.JsonObject or mapped domain object. This API works directly on the underlying OSON bytes without fully parsing the entire document into an intermediate data structure. OSON is more than a raw serialization: it carries its own local dictionary of distinct field names, a sorted array of hash codes for those names, and compact field ID arrays and value offsets, enabling the driver to locate a field in place by binary‑searching the field ID array, and jumping to the right offset.
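The idea of locating a field through a sorted hash array can be sketched with the JDK alone. This is not Oracle's actual OSON layout or driver code: `DictionaryLookupSketch` is a hypothetical, heavily simplified model of a single flat object with string values, in which the position in the hash-sorted dictionary doubles as the field ID.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

final class DictionaryLookupSketch {
    private final String[] names; // field-name dictionary, ordered by hash code
    private final int[] hashes;   // sorted hash codes, parallel to names
    private final int[] offsets;  // offsets[i]..offsets[i + 1] delimit value i
    private final byte[] values;  // concatenated UTF-8 values

    // The lookup structures are built once, at write time.
    DictionaryLookupSketch(Map<String, String> fields) {
        names = fields.keySet().toArray(new String[0]);
        Arrays.sort(names, (a, b) -> a.hashCode() != b.hashCode()
                ? Integer.compare(a.hashCode(), b.hashCode())
                : a.compareTo(b));
        hashes = new int[names.length];
        offsets = new int[names.length + 1];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < names.length; i++) {
            hashes[i] = names[i].hashCode();
            offsets[i] = out.size();
            byte[] v = fields.get(names[i]).getBytes(StandardCharsets.UTF_8);
            out.write(v, 0, v.length);
        }
        offsets[names.length] = out.size();
        values = out.toByteArray();
    }

    // Read path: binary search on the hash, then jump straight to the value offset.
    String get(String key) {
        int h = key.hashCode();
        int i = Arrays.binarySearch(hashes, h);
        if (i < 0) return null;
        while (i > 0 && hashes[i - 1] == h) i--; // rewind to the first equal hash
        for (; i < hashes.length && hashes[i] == h; i++) {
            if (names[i].equals(key)) { // resolve hash collisions by name
                return new String(values, offsets[i], offsets[i + 1] - offsets[i],
                        StandardCharsets.UTF_8);
            }
        }
        return null;
    }
}
```

Unlike a sequential BSON scan, this lookup never reads the values of other fields: only the matching byte range is decoded.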

In short, BSON is a serialization format whose access map is built when the document is read into memory, while OSON is designed for direct application access and builds it on write. Even though OSON provides fast access to fields directly from the OSON structure, getting the value as a Java string still requires calling OracleJsonObject.get().asJsonString().getString(), which returns a String like the one we got with Document.get().

If JSON text is needed instead of the object model, the equivalent is simply to use ResultSet.getString() from the SQL resultset, which will convert the OSON image to JSON text.

Neither Oracle nor PostgreSQL provides BSON as they use OSON and JSONB, so there's no BsonDocument or RawBsonDocument equivalent.

Conclusion

MongoDB’s main advantage for modern applications—whatever the data types or workloads—is the ability to work with data directly through your domain objects, without an intermediate object-relational mapping layer or view. Use the Document class as your document object model (DOM). It offers flexibility, map-style access, and a natural Java interface, while the driver transparently converts BSON from the network into objects your application can use immediately.

DEV Community
