Franck Pachot for AWS Heroes

DocumentDB: Comparing Emulation Internals with MongoDB

MongoDB is the leading database for document data modeling, with its Atlas service available on AWS, Azure, and Google Cloud. Its popularity has led to the development of compatible APIs by other vendors, like Amazon DocumentDB (with MongoDB compatibility), highlighting MongoDB's importance in modern applications. Microsoft did the same for CosmosDB and developed a MongoDB emulation on PostgreSQL called DocumentDB, now part of the Linux Foundation.

AWS has joined the project. While today Amazon DocumentDB uses its own Aurora‑based proprietary engine, AWS’s participation opens the possibility that, in the future, the managed service could leverage this PostgreSQL‑based extension.

An emulation cannot truly replace MongoDB, which was designed to store, index, and process flexible-schema documents natively rather than on fixed-size blocks and relational tables, but it may help in a transition. This article tests a simple query across three options: native MongoDB, PostgreSQL with the DocumentDB extension, and Oracle Database's emulation, another emulation on top of another RDBMS. Both emulations face the same challenge: implementing document semantics on top of a row-based engine. The aim is to demonstrate an evaluation method, including execution plans, to assess the pros and cons of each platform in relation to relevant application patterns, rather than to benchmark specific use cases.

MongoDB

I create a simple collection with one field "nnn", indexed, and insert random values between 0 and 100:

db.franck.drop();

db.franck.createIndex({nnn:1});

void db.franck.insertMany(
 Array.from({ length: 100000 }, () => (
  { nnn: (Math.random() * 100)}
 ))
);

I query values between 20 and 80, displaying the first five for pagination:

db.franck.find(
  { nnn: { $gte: 20, $lt: 80 } }
).sort({ nnn: 1 }).limit(5)

[
  { _id: ObjectId('68b37f883c2e2550c0d51c0c'), nnn: 20.00087217392812 },
  { _id: ObjectId('68b37f883c2e2550c0d5dd3c'), nnn: 20.000927538131542 },
  { _id: ObjectId('68b37f883c2e2550c0d5f1e7'), nnn: 20.000979995906974 },
  { _id: ObjectId('68b37f883c2e2550c0d59dc4'), nnn: 20.001754428025208 },
  { _id: ObjectId('68b37f883c2e2550c0d66c4f'), nnn: 20.002357317589414 }
]

Here is the execution plan with execution statistics:

db.franck.find(
  { nnn: { $gte: 20, $lt: 80 } }
).sort({ nnn: 1 }).limit(5).explain("executionStats").executionStats

{
  executionSuccess: true,
  nReturned: 5,
  executionTimeMillis: 0,
  totalKeysExamined: 5,
  totalDocsExamined: 5,
  executionStages: {
    isCached: false,
    stage: 'LIMIT',
    nReturned: 5,
    executionTimeMillisEstimate: 0,
    works: 6,
    advanced: 5,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    limitAmount: 5,
    inputStage: {
      stage: 'FETCH',
      nReturned: 5,
      executionTimeMillisEstimate: 0,
      works: 5,
      advanced: 5,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 0,
      docsExamined: 5,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 5,
        executionTimeMillisEstimate: 0,
        works: 5,
        advanced: 5,
        needTime: 0,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 0,
        keyPattern: { nnn: 1 },
        indexName: 'nnn_1',
        isMultiKey: false,
        multiKeyPaths: { nnn: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { nnn: [ '[20, 80)' ] },
        keysExamined: 5,
        seeks: 1,
        dupsTested: 0,
        dupsDropped: 0
      }
    }
  }
}

MongoDB scanned the index (stage: 'IXSCAN') on "nnn" (keyPattern: { nnn: 1 }) for values between 20 and 80 (indexBounds: { nnn: [ '[20, 80)' ] }). It examined 5 index entries (keysExamined: 5) and fetched the corresponding documents, resulting in 5 documents read (docsExamined: 5). It stopped (LIMIT) after returning the documents for the result (nReturned: 5).

We achieved exactly what we needed without any unnecessary work, so no further tuning is required. We could go further, for example with a covering index to avoid the FETCH stage, but it's not needed here because the number of documents fetched is low and bounded.
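
For illustration, a covered variant of this query could look like the following sketch, using the existing { nnn: 1 } index: projecting only "nnn" and excluding "_id" lets MongoDB answer from the index alone, so the plan replaces FETCH with PROJECTION_COVERED.

db.franck.find(
  { nnn: { $gte: 20, $lt: 80 } },
  { _id: 0, nnn: 1 }   // every returned field is in the index
).sort({ nnn: 1 }).limit(5).explain("executionStats").executionStats
// expected: PROJECTION_COVERED above IXSCAN, with totalDocsExamined: 0
// (covering works here because "nnn" holds scalars; a multikey index cannot cover a query)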

PostgreSQL with DocumentDB

To gain a comprehensive understanding of the emulation, I examine both the execution plan reported by the emulation and the execution plan in the underlying PostgreSQL database.

I start a container for my lab using the DocumentDB image from the Microsoft repo, which will later move to the Linux Foundation, and I use the default ports.

docker run -d -p 10260:10260 -p 9712:9712 --name pgddb \
ghcr.io/microsoft/documentdb/documentdb-local:latest  \
--username ddb --password ddb

I add auto-explain to show all execution plans for my lab:

## add auto-explain extension to be loaded
docker exec -it pgddb sed -e '/shared_preload_libraries/s/,/, auto_explain,/' -i /home/documentdb/postgresql/data/postgresql.conf

## bounce the instance
docker restart -t 5 pgddb

# set auto-explain by default for the emulation gateway
psql -e 'postgres://documentdb@localhost:9712/postgres' <<'SQL'
\dconfig shared_preload_libraries
alter user ddb set auto_explain.log_analyze=on;
alter user ddb set auto_explain.log_buffers=on;
alter user ddb set auto_explain.log_format=text;
alter user ddb set auto_explain.log_min_duration=0;
alter user ddb set auto_explain.log_nested_statements=on;
alter user ddb set auto_explain.log_settings=on;
alter user ddb set auto_explain.log_timing=on;
alter user ddb set auto_explain.log_triggers=on;
alter user ddb set auto_explain.log_verbose=on;
alter user ddb set auto_explain.log_wal=on;
SQL

# tail the PostgreSQL log in the background to see the execution plan
docker exec -it pgddb tail -f /home/documentdb/postgresql/data/pglog.log | grep -v " LOG: cron job" &

# connect to the MongoDB emulation gateway
mongosh 'mongodb://ddb:ddb@localhost:10260/?tls=true&tlsAllowInvalidCertificates=true'


I create the same collection and index as in my MongoDB test.
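
For reference, the setup through the gateway is the same as in the MongoDB section:

db.franck.drop();

db.franck.createIndex({nnn:1});

void db.franck.insertMany(
 Array.from({ length: 100000 }, () => (
  { nnn: (Math.random() * 100)}
 ))
);

Then I run the same query and check the execution plan: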

db.franck.find(
  { nnn: { $gte: 20, $lt: 80 } }
).sort({ nnn: 1 }).limit(5).explain("executionStats").executionStats

{
  nReturned: Long('5'),
  executionTimeMillis: Long('154'),
  totalDocsExamined: Long('5'),
  totalKeysExamined: Long('5'),
  executionStages: {
    stage: 'LIMIT',
    nReturned: Long('5'),
    executionTimeMillis: Long('154'),
    totalKeysExamined: Long('5'),
    totalDocsExamined: 5,
    numBlocksFromCache: 4415,
    numBlocksFromDisk: 0,
    inputStage: {
      stage: 'SORT',
      nReturned: Long('5'),
      executionTimeMillis: Long('154'),
      totalKeysExamined: Long('5'),
      totalDocsExamined: 5,
      sortMethod: 'top-N heapsort',
      totalDataSizeSortedBytesEstimate: 26,
      numBlocksFromCache: 4415,
      numBlocksFromDisk: 0,
      inputStage: {
        stage: 'FETCH',
        nReturned: Long('59935'),
        executionTimeMillis: Long('133'),
        totalKeysExamined: Long('59935'),
        indexName: 'nnn_1',
        totalDocsRemovedByIndexRechecks: 0,
        numBlocksFromCache: 4415,
        numBlocksFromDisk: 0,
        inputStage: {
          stage: 'IXSCAN',
          nReturned: Long('59935'),
          executionTimeMillis: Long('133'),
          totalKeysExamined: Long('59935'),
          indexName: 'nnn_1',
          totalDocsRemovedByIndexRechecks: 0,
          numBlocksFromCache: 4415,
          numBlocksFromDisk: 0
        }
      }
    }
  }
}

DocumentDB scanned the index (stage: 'IXSCAN'), identifying the index name but lacking details on key patterns or index bounds. It appears to have read the correct range (from 20 to 80) but did not apply pagination efficiently, as evidenced by the high volume of index entries read (totalKeysExamined: Long('59935'), nReturned: Long('59935')). All documents were retrieved (stage: 'FETCH', nReturned: Long('59935')) and sorted for pagination (stage: 'SORT', sortMethod: 'top-N heapsort'). Ultimately, this process returned the final result of 5 documents (stage: 'LIMIT', nReturned: Long('5')), discarding the thousands of documents read.

While the query and result are similar to MongoDB, the execution differs significantly. MongoDB avoids reading all documents and sorting them because its index not only helps find a range but also returns results in order.

To grasp the underlying reasons for this difference, we need more than just the execution plan of the emulation. I installed auto-explain in my lab to analyze the execution plan in PostgreSQL:

2025-08-31 17:20:47.765 UTC [416] LOG:  duration: 160.621 ms  plan:
        Query Text: EXPLAIN (FORMAT JSON, ANALYZE True, VERBOSE True, BUFFERS True, TIMING True) SELECT document FROM documentdb_api_catalog.bson_aggregation_find($1, $2)
        Query Parameters: $1 = 'test', $2 = '\x5f0000000266696e6400070000006672616e636b000366696c7465720022000000036e6e6e00180000001024677465001400000010246c740050000000000003736f7274000e000000106e6e6e000100000000106c696d6974000500000000'
        Limit  (cost=517.87..517.88 rows=5 width=68) (actual time=160.617..160.618 rows=5 loops=1)
          Output: document, (bson_orderby(document, 'BSONHEX0e000000106e6e6e000100000000'::documentdb_core.bson))
          Buffers: shared hit=4447, temp read=1860 written=2002
          ->  Sort  (cost=517.87..580.37 rows=25000 width=68) (actual time=160.615..160.616 rows=5 loops=1)
                Output: document, (bson_orderby(document, 'BSONHEX0e000000106e6e6e000100000000'::documentdb_core.bson))
                Sort Key: (bson_orderby(collection.document, 'BSONHEX0e000000106e6e6e000100000000'::documentdb_core.bson)) NULLS FIRST
                Sort Method: top-N heapsort  Memory: 26kB
                Buffers: shared hit=4447, temp read=1860 written=2002
                ->  Index Scan using nnn_1 on documentdb_data.documents_2 collection  (cost=0.00..102.62 rows=25000 width=68) (actual time=98.698..138.723 rows=59973 loops=1)
                      Output: document, bson_orderby(document, 'BSONHEX0e000000106e6e6e000100000000'::documentdb_core.bson)
                      Index Cond: (collection.document @<> 'BSONHEX3f000000036e6e6e0035000000106d696e0014000000106d61780050000000086d696e496e636c75736976650001086d6178496e636c757369766500000000'::documentdb_core.bson)
                      Buffers: shared hit=4439, temp read=1860 written=2002
        Settings: search_path = 'documentdb_api_catalog, documentdb_api, public'

The steps are similar but more detailed at the PostgreSQL level. The index access uses the @<> operator, which checks whether a BSON value is within a specified range via a RUM index (DocumentDB uses an extended version of the RUM index, which provides more metadata than a GIN index). In this case, the index was applied solely for filtering, while a separate Sort step handled the final ordering. This method requires reading all matching documents before they can be ordered. Although the example is simple, indexed fields may contain arrays, which means a forward scan must return entries in order based on the smallest value in the array. This behavior is native to MongoDB but not to PostgreSQL, and it likely explains why entries can't be retrieved in the desired order in the current version of DocumentDB.
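
To illustrate the multi-key semantics an emulation has to reproduce, here is a minimal sketch in mongosh using a hypothetical throwaway collection (not part of this test): an ascending sort positions a document by the smallest element of its array, and a descending sort by the largest.

db.arraydemo.insertMany([ { nnn: [ 15, 95 ] }, { nnn: 30 } ]);

db.arraydemo.find({}, { _id: 0, nnn: 1 }).sort({ nnn: 1 })
// ascending: { nnn: [ 15, 95 ] } comes first, compared by its minimum (15)

db.arraydemo.find({}, { _id: 0, nnn: 1 }).sort({ nnn: -1 })
// descending: { nnn: [ 15, 95 ] } comes first again, compared by its maximum (95)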

Note that you might see a Bitmap Scan instead before auto-vacuum has run, but it's important to recognize that an Index Scan is also possible, which is a key distinction from GIN indexes (GIN only supports bitmap scans).
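
If you don't want to wait for autovacuum in the lab, a minimal sketch (assuming the psql connection from the setup and the table shown below) is to vacuum and analyze the collection's table manually, then run the find again and check the plan logged by auto_explain:

-- refresh the visibility map and planner statistics for the collection's table
VACUUM ANALYZE documentdb_data.documents_2;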

The definition of the table and index is visible from PostgreSQL:

postgres=# \d documentdb_data.documents_2

                     Table "documentdb_data.documents_2"
     Column      |           Type           | Collation | Nullable | Default
-----------------+--------------------------+-----------+----------+---------
 shard_key_value | bigint                   |           | not null |
 object_id       | documentdb_core.bson     |           | not null |
 document        | documentdb_core.bson     |           | not null |
 creation_time   | timestamp with time zone |           |          |
Indexes:
    "collection_pk_2" PRIMARY KEY, btree (shard_key_value, object_id)
    "documents_rum_index_3" documentdb_rum (document documentdb_api_catalog.bson_rum_single_path_ops (path=nnn, tl='2699'))
Check constraints:
    "shard_key_value_check" CHECK (shard_key_value = '2'::bigint)

DocumentDB uses Citus for sharding, and an extended version of RUM, documentdb_rum, as the index access method:

postgres=# select indexdef, tablename, indexname 
           from pg_indexes 
           where schemaname='documentdb_data'
;
                                                                                indexdef                                                                                 |  tablename  |       indexname
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-----------------------
 CREATE UNIQUE INDEX collection_pk_1 ON documentdb_data.documents_1 USING btree (shard_key_value, object_id)                                                             | documents_1 | collection_pk_1
 CREATE UNIQUE INDEX retry_1_pkey ON documentdb_data.retry_1 USING btree (shard_key_value, transaction_id)                                                               | retry_1     | retry_1_pkey
 CREATE INDEX retry_1_object_id_idx ON documentdb_data.retry_1 USING btree (object_id)                                                                                   | retry_1     | retry_1_object_id_idx
 CREATE UNIQUE INDEX collection_pk_2 ON documentdb_data.documents_2 USING btree (shard_key_value, object_id)                                                             | documents_2 | collection_pk_2
 CREATE UNIQUE INDEX retry_2_pkey ON documentdb_data.retry_2 USING btree (shard_key_value, transaction_id)                                                               | retry_2     | retry_2_pkey
 CREATE INDEX retry_2_object_id_idx ON documentdb_data.retry_2 USING btree (object_id)                                                                                   | retry_2     | retry_2_object_id_idx
 CREATE INDEX documents_rum_index_3 ON documentdb_data.documents_2 USING documentdb_rum (document documentdb_api_catalog.bson_rum_single_path_ops (path=nnn, tl='2699')) | documents_2 | documents_rum_index_3
(7 rows)


Here is the list of extensions installed:

postgres=# \dx
                                    List of installed extensions
      Name       | Version |   Schema   |                        Description
-----------------+---------+------------+------------------------------------------------------------
 documentdb      | 0.104-0 | public     | API surface for DocumentDB for PostgreSQL
 documentdb_core | 0.104-0 | public     | Core API surface for DocumentDB on PostgreSQL
 pg_cron         | 1.6     | pg_catalog | Job scheduler for PostgreSQL
 plpgsql         | 1.0     | pg_catalog | PL/pgSQL procedural language
 postgis         | 3.5.3   | public     | PostGIS geometry and geography spatial types and functions
 rum             | 1.3     | public     | RUM index access method
 tsm_system_rows | 1.0     | public     | TABLESAMPLE method which accepts number of rows as a limit
 vector          | 0.8.0   | public     | vector data type and ivfflat and hnsw access methods
(8 rows)

In my example, DocumentDB performed more work than MongoDB because the sort operation was not pushed down to the index scan. Achieving ordered results from a multi-key index is challenging because there may be multiple index entries per document. The scan must deduplicate these entries and arrange them correctly: the lowest array value for a forward scan and the greatest for a backward scan. MongoDB implemented this functionality from the get-go in its multi-key indexes. Emulations on top of SQL databases still require further development to match that performance and scalability, because RDBMS were not designed for multi-key indexes: one-to-many relationships are typically managed in separate tables according to the first normal form, as sketched below. This remains a TODO in the code for DocumentDB's RUM index access method.
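
As a minimal sketch of that first-normal-form alternative (hypothetical tables, not part of this lab): with the one-to-many relationship in a child table, a plain B-tree returns rows already ordered so the LIMIT can stop early, but it returns one row per item, and keeping one row per parent, as a document model does, pushes the deduplication back into the query.

-- a hypothetical relational equivalent of one document holding an array of "nnn" values
create table orders (order_id bigint primary key);
create table order_items (order_id bigint references orders, nnn numeric);
create index on order_items (nnn);

-- the B-tree returns items in order, so the LIMIT can stop after 5 rows...
select order_id, nnn
  from order_items
 where nnn >= 20 and nnn < 80
 order by nnn
 limit 5;

-- ...but keeping one row per order (one result per "document") needs deduplication,
-- and ordering those deduplicated rows still requires a sort: the same work a
-- multi-key index scan must do internally to push down the ORDER BY
select *
  from (select distinct on (order_id) order_id, nnn
          from order_items
         where nnn >= 20 and nnn < 80
         order by order_id, nnn) per_order
 order by nnn
 limit 5;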

The order-by pushdown is not the only limitation. If you cannot read only the five index entries needed for the query, you should at least try to avoid fetching thousands of documents. In MongoDB, a covering index replaces the FETCH stage with a PROJECTION_COVERED stage. I attempted the same in DocumentDB with a compound index on { nnn: 1, _id: 1 }, hinting it and omitting "_id" from the projection.
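
The compound index was presumably created beforehand (the plan below references 'nnn_1__id_1'); for completeness, it would be:

db.franck.createIndex({ nnn: 1, _id: 1 });

Here is the query with the hint and its execution plan: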

db.franck.find(
  { nnn: { $gte: 20, $lt: 80 } },
  { _id: 0, nnn: 1 }
).sort({ nnn: 1 }).limit(5).hint({nnn:1, _id:1 }).explain("executionStats").executionStats

{
  nReturned: Long('5'),
  executionTimeMillis: Long('170'),
  totalDocsExamined: Long('5'),
  totalKeysExamined: Long('5'),
  executionStages: {
    stage: 'LIMIT',
    nReturned: Long('5'),
    executionTimeMillis: Long('170'),
    totalKeysExamined: Long('5'),
    totalDocsExamined: 5,
    numBlocksFromCache: 4607,
    numBlocksFromDisk: 0,
    inputStage: {
      stage: 'PROJECT',
      nReturned: Long('5'),
      executionTimeMillis: Long('170'),
      totalKeysExamined: Long('5'),
      totalDocsExamined: 5,
      numBlocksFromCache: 4607,
      numBlocksFromDisk: 0,
      inputStage: {
        stage: 'SORT',
        nReturned: Long('5'),
        executionTimeMillis: Long('170'),
        totalKeysExamined: Long('5'),
        totalDocsExamined: 5,
        sortMethod: 'top-N heapsort',
        totalDataSizeSortedBytesEstimate: 25,
        numBlocksFromCache: 4607,
        numBlocksFromDisk: 0,
        inputStage: {
          stage: 'FETCH',
          nReturned: Long('59973'),
          executionTimeMillis: Long('149'),
          totalKeysExamined: Long('59973'),
          indexName: 'nnn_1__id_1',
          totalDocsRemovedByIndexRechecks: 0,
          numBlocksFromCache: 4599,
          numBlocksFromDisk: 0,
          inputStage: {
            stage: 'IXSCAN',
            nReturned: Long('59973'),
            executionTimeMillis: Long('149'),
            totalKeysExamined: Long('59973'),
            indexName: 'nnn_1__id_1',
            totalDocsRemovedByIndexRechecks: 0,
            numBlocksFromCache: 4599,
            numBlocksFromDisk: 0
          }
        }
      }
    }
  }
}

A PROJECT stage was added. However, it did not eliminate the FETCH of thousands of documents because DocumentDB cannot perform an Index Only Scan on the RUM index. This situation may evolve in the future, as this new index type was designed to address the limitations of GIN indexes. A commit refers to it as the Helio GIN index, which is named after HelioDB, the internal name for the "pgmongo" project when it was used exclusively within CosmosDB.

Oracle with ORDS

Oracle Database has also implemented an emulation for MongoDB, and it's interesting to see how it compares with DocumentDB on top of PostgreSQL. I use the Oracle Autonomous Database docker image to test it:

docker run -it --rm --entrypoint bash                       \
container-registry.oracle.com/database/adb-free:latest-23ai
# Start Oracle Autonomous Database
ADMIN_PASSWORD=OracleDatabase_23ai  \
WALLET_PASSWORD=OracleDatabase_23ai \
/u01/scripts/entrypoint.sh > /tmp/server.log 2>&1 </dev/null &

# Install MongoDB Shell
curl https://downloads.mongodb.com/compass/\
mongosh-2.5.1-linux-x64.tgz | tar -C ~ -zxvf -

# Wait until ORDS has started the MongoDB API (takes a few minutes)
until grep -iB1 mongodb /tmp/server.log ; do sleep 1 ; done

# To explain plan (to run after ORA schema is created by first connection)
sqlplus admin/OracleDatabase_23ai@//localhost:1521/MYATP_tp.adb.oraclecloud.com << 'SQL'
 grant select on sys.v_$session to ora;
 grant select on sys.v_$sql_plan_statistics_all to ora;
 grant select on sys.v_$sql_plan to ora;
 grant select on sys.v_$sql to ora;
 grant select on dba_segments to ora;
SQL

# Connect with MongoDB Shell
~/*/bin/mongosh  \
mongodb://admin:OracleDatabase_23ai@$HOSTNAME:27017/ora'?authMechanism=PLAIN&authSource=$external&ssl=true&retryWrites=false&loadBalanced=true&tlsAllowInvalidCertificates=true'

I run the same test as I did in MongoDB and DocumentDB and get the execution plan (there's no executionStats in this emulation, but we will see a workaround later):

db.franck.find(
      { nnn: { $gte: 20, $lt: 80 } }
    ).sort({ nnn: 1 }).limit(5).explain("executionStats")

{
  queryPlanner: {
    plannerVersion: 1,
    namespace: 'ora.franck',
    indexFilterSet: false,
    parsedQuery: {
      '$query': { nnn: { '$numberOnly': { '$gte': 20, '$lt': 80 } } },
      '$orderby': {
        '$fields': [ { path: 'nnn', order: 'asc', sortByMinMax: true } ],
        '$lax': true
      }
    },
    rewrittenQuery: {
      '$and': [
        {
          '$query': {
            '$value': { nnn: { '$numberOnly': { '$gte': 20, '$lt': 80 } } }
          }
        },
        {
          '$orderby': {
            '$fields': [ { path: 'nnn', order: 'asc', sortByMinMax: true } ],
            '$lax': true
          }
        }
      ]
    },
    winningPlan: {
      stage: 'SELECT STATEMENT',
      inputStage: {
        stage: 'COUNT',
        options: 'STOPKEY',
        columns: '(rowset=5) "from$_subquery$_002"."DATA"[JSON,8200], "from$_subquery$_002"."RAWTOHEX("RESID")"[VARCHAR2,4000], "from$_subquery$_002"."ETAG"[RAW,16]',
        filterType: 'filter',
        filter: 'ROWNUM<=5',
        inputStage: {
          stage: 'VIEW',
          columns: '(rowset=5) "from$_subquery$_002"."DATA"[JSON,8200], "from$_subquery$_002"."RAWTOHEX("RESID")"[VARCHAR2,4000], "from$_subquery$_002"."ETAG"[RAW,16]',
          inputStage: {
            stage: 'SORT',
            options: 'ORDER BY STOPKEY',
            columns: `(#keys=1; rowset=5) INTERNAL_FUNCTION(JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$."nnn"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE))[4000], "DATA" /*+ LOB_BY_VALUE */ [JSON,8200], JSON_VALUE("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$._id' RETURNING ANY ORA_RAWCOMPARE(2000) NO ARRAY ERROR ON ERROR TYPE(LAX) )[RAW,2000], "ETAG"[RAW,16]`,
            filterType: 'filter',
            filter: 'ROWNUM<=5',
            path: "$._id'",
            inputStage: {
              stage: 'FILTER',
              columns: `"DATA" /*+ LOB_BY_VALUE */ [JSON,8200], JSON_VALUE("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$._id' RETURNING ANY ORA_RAWCOMPARE(2000) NO ARRAY ERROR ON ERROR TYPE(LAX) )[RAW,2000], "ETAG"[RAW,16], JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$."nnn"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE)[RAW,4000]`,
              filterType: 'filter',
              filter: "HEXTORAW('04')>SYS_CONS_ANY_SCALAR(:1, 3)",
              path: "$._id'",
              inputStage: {
                stage: 'TABLE ACCESS',
                options: 'BY INDEX ROWID BATCHED',
                source: 'franck',
                columns: `"DATA" /*+ LOB_BY_VALUE */ [JSON,8200], JSON_VALUE("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$._id' RETURNING ANY ORA_RAWCOMPARE(2000) NO ARRAY ERROR ON ERROR TYPE(LAX) )[RAW,2000], "ETAG"[RAW,16], JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$."nnn"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE)[RAW,4000]`,
                filterType: 'filter',
                filter: `JSON_EXISTS2("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$?( exists(@.nnn?( (@.numberOnly() >= $B0) && (@.numberOnly() < $B1) )) )' PASSING :1 AS "B0" , :2 AS "B1" FALSE ON ERROR TYPE(STRICT) )=1`,
                path: "$._id'",
                query: "$?( exists(@.nnn?( (@.numberOnly() >= $B0) && (@.numberOnly() < $B1) )) )'",
                inputStage: {
                  stage: 'HASH',
                  options: 'UNIQUE',
                  columns: '(#keys=2) "franck".ROWID[ROWID,10], SYSVARCOL[8]',
                  inputStage: {
                    stage: 'INDEX',
                    options: 'RANGE SCAN (MULTI VALUE)',
                    source: '$ora:franck.nnn_1',
                    columns: `"franck".ROWID[ROWID,10], JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$."nnn"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE)[RAW,4000], SYSVARCOL[8]`,
                    filterType: 'access',
                    filter: `JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$."nnn"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE)>=SYS_CONS_ANY_SCALAR(:1, 3) AND JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$."nnn"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE)<HEXTORAW('04')`
                  }
                }
              }
            }
          }
        }
      }
    },
    rejectPlans: []
  },
  serverInfo: { host: 'localhost', port: 27017, version: '4.2.14' },
  ok: 1
}

Like DocumentDB on PostgreSQL, Oracle Database does not push down the sort operation, resulting in the reading of all documents. The execution plan shows that all documents were fetched (TABLE ACCESS BY INDEX ROWID BATCHED) and ordered (SORT ORDER BY STOPKEY) before finally retrieving the top 5 results (COUNT STOPKEY).

There's a mention of INDEX RANGE SCAN (MULTI VALUE), but it's not clear which range, and there's an additional HASH UNIQUE to deduplicate the multiple index entries. Given the following distribution, did the range scan read 59827 entries or more?

db.franck.aggregate([ {  
 $facet: { 
  " < 20": [ { $match: { nnn: { $lt: 20 } } },           { $count: "count" }    ],  
  "20-80": [ { $match: { nnn: { $gte: 20, $lt: 80 } } }, { $count: "count" }  ],  
  " > 80": [ { $match: { nnn: { $gte: 80 } } },          { $count: "count" }   ]  
    }  
  }  
])  

[
  {
    ' < 20': [ { count: 19946 } ],
    '20-80': [ { count: 59827 } ],
    ' > 80': [ { count: 20227 } ]
  }
]

Unlike DocumentDB, which converts PostgreSQL execution plans into a MongoDB-like format, Oracle's emulation exposes the raw operations, which illustrates the complex transformations needed to execute MongoDB queries in an RDBMS. Such a verbose format is preferable only when details are hard to obtain otherwise, and here they are not: the MongoDB emulation in Oracle Database allows executing SQL and retrieving the SQL execution plan with DBMS_XPLAN (as long as you executed the GRANT commands above):

db.aggregate([ { $sql : {
 statement: "alter session set statistics_level=all",
 resetSession  : false }} ]);

db.franck.find(
      { nnn: { $gte: 20, $lt: 80 } }
    ).sort({ nnn: 1 }).limit(5)

db.aggregate( [ { $sql : `
select * 
from dbms_xplan.display_cursor(format=>'ADVANCED ALLSTATS LAST')  
` } ] ).forEach(row => print(row.PLAN_TABLE_OUTPUT));

SQL_ID  f324cg78qnywa, child number 0
-------------------------------------
select /*+ FIRST_ROWS(5) */ "DATA",rawtohex("RESID"),"ETAG" from
"ORA"."franck" where JSON_EXISTS("DATA",'$?( exists(@.nnn?(
(@.numberOnly() >= $B0) && (@.numberOnly() < $B1) )) )' passing :1 as
"B0", :2 as "B1" type(strict)) order by JSON_QUERY("DATA",
'$.nnn[*].min()') asc nulls first fetch next 5 rows only

Plan hash value: 2372779666

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name              | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                   |      1 |        |       |     2 (100)|          |      5 |00:00:00.45 |   80403 |       |       |          |
|*  1 |  COUNT STOPKEY                          |                   |      1 |        |       |            |          |      5 |00:00:00.45 |   80403 |       |       |          |
|   2 |   VIEW                                  |                   |      1 |      1 |  6114 |     2  (50)| 00:00:01 |      5 |00:00:00.45 |   80403 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY                |                   |      1 |      1 |   122 |     2  (50)| 00:00:01 |      5 |00:00:00.45 |   80403 |  2048 |  2048 | 2048  (0)|
|*  4 |     FILTER                              |                   |      1 |        |       |            |          |  59827 |00:00:00.44 |   80403 |       |       |          |
|*  5 |      TABLE ACCESS BY INDEX ROWID BATCHED| franck            |      1 |      1 |   122 |     1   (0)| 00:00:01 |  59827 |00:00:00.43 |   80403 |       |       |          |
|   6 |       HASH UNIQUE                       |                   |      1 |      1 |   122 |            |          |  80054 |00:00:00.03 |     391 |   811K|   811K|          |
|*  7 |        INDEX RANGE SCAN (MULTI VALUE)   | $ora:franck.nnn_1 |      1 |      1 |       |     1   (0)| 00:00:01 |  80054 |00:00:00.02 |     391 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$2
   2 - SEL$1 / "from$_subquery$_002"@"SEL$2"
   3 - SEL$1
   5 - SEL$1 / "franck"@"SEL$1"
   7 - SEL$1 / "franck"@"SEL$1"

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('23.1.0')
      DB_VERSION('23.1.0')
      OPT_PARAM('_fix_control' '20648883:0 26552730:1 27175987:0 29972495:0 22387320:0 30195773:0 31945701:1 32108311:1 33659818:3 34092979:1 35495824:1 33792497:1 36554842:1
              36283175:1 31720959:1 36004220:1 36635255:1 36675198:1 36868551:1 37400112:1 37346200:0')
      FIRST_ROWS(5)
      FORCE_XML_QUERY_REWRITE
      FORCE_JSON_TABLE_TRANSFORM
      XML_DML_RWT_STMT
      XMLINDEX_REWRITE
      XMLINDEX_REWRITE_IN_SELECT
      NO_COST_XML_QUERY_REWRITE
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      NO_ACCESS(@"SEL$2" "from$_subquery$_002"@"SEL$2")
      INDEX_RS_ASC(@"SEL$1" "franck"@"SEL$1" "$ora:franck.nnn_1")
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$1" "franck"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1" UNIQUE)
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - filter(ROWNUM<=5)
   4 - filter(HEXTORAW('04')>SYS_CONS_ANY_SCALAR(:1, 3))
   5 - filter(JSON_EXISTS2("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$?( exists(@.nnn?( (@.numberOnly() >= $B0) && (@.numberOnly() < $B1) )) )' PASSING :1 AS "B0" , :2
              AS "B1" FALSE ON ERROR TYPE(STRICT) )=1)
   7 - access("franck"."SYS_NC00005$">=SYS_CONS_ANY_SCALAR(:1, 3) AND "franck"."SYS_NC00005$"<HEXTORAW('04'))

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (rowset=5) "from$_subquery$_002"."DATA"[JSON,8200], "from$_subquery$_002"."RAWTOHEX("RESID")"[VARCHAR2,4000], "from$_subquery$_002"."ETAG"[RAW,16]
   2 - (rowset=5) "from$_subquery$_002"."DATA"[JSON,8200], "from$_subquery$_002"."RAWTOHEX("RESID")"[VARCHAR2,4000], "from$_subquery$_002"."ETAG"[RAW,16]
   3 - (#keys=1; rowset=5) "franck"."SYS_NC00005$"[RAW,4000], "DATA" /*+ LOB_BY_VALUE */ [JSON,8200], "franck"."RESID"[RAW,2000], "ETAG"[RAW,16]
   4 - "DATA" /*+ LOB_BY_VALUE */ [JSON,8200], "franck"."RESID"[RAW,2000], "ETAG"[RAW,16], "franck"."SYS_NC00005$"[RAW,4000]
   5 - "DATA" /*+ LOB_BY_VALUE */ [JSON,8200], "franck"."RESID"[RAW,2000], "ETAG"[RAW,16], "franck"."SYS_NC00005$"[RAW,4000]
   6 - (#keys=2) "franck".ROWID[ROWID,10], SYSVARCOL[8]
   7 - "franck".ROWID[ROWID,10], "franck"."SYS_NC00005$"[RAW,4000], SYSVARCOL[8]

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------

   0 -  STATEMENT
           -  FIRST_ROWS(5)

Query Block Registry:
---------------------

  SEL$1 (PARSER) [FINAL]
  SEL$2 (PARSER) [FINAL]


From the execution statistics, it is clear that the index was not used efficiently: INDEX RANGE SCAN (MULTI VALUE) read 80054 A-Rows which, as we have seen above, accounts for all documents with "nnn" of 20 or more, including the 20227 above 80. They are filtered down to 59827 only after accessing the table with TABLE ACCESS BY INDEX ROWID BATCHED. There is a clue about this in the predicate section, where the index access effectively applies only the lower bound from the query (the HEXTORAW('04') upper bound does not correspond to 80): access("franck"."SYS_NC00005$">=SYS_CONS_ANY_SCALAR(:1, 3) AND "franck"."SYS_NC00005$"<HEXTORAW('04'))

Another important observation from the execution plan is the HASH UNIQUE step that deduplicates the multi-key entries. I have no arrays for "nnn" in my example, so the number of index entries equals the number of documents, but the execution plan must account for that possibility. I covered this in a previous post.

If you are used to Oracle Database, you may think that the additional sort is due to the batch table access by ROWID, which optimizes access to the table to reduce I/O rather than serve an ORDER BY. And given that the optimizer statistics are wrong and no dynamic sampling was used, it could be a bad choice by the optimizer. But it's different here. Between the index and the table, there's a deduplication step (HASH UNIQUE) on the ROWID ((#keys=2) "franck".ROWID[ROWID,10], SYSVARCOL[8]), which already discards the index order. As a consequence, the batch table access loses nothing, which I verify by disabling it with a hint:

db.aggregate( [ { $sql : {
statement: `
select /*+ 
      FIRST_ROWS(5) 
      NO_BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$1" "franck"@"SEL$1")
*/ "DATA",rawtohex("RESID"),"ETAG" from
"ORA"."franck" where JSON_EXISTS("DATA",'$?( exists(@.nnn?(
(@.numberOnly() >= $B0) && (@.numberOnly() < $B1) )) )' passing :1 as
"B0", :2 as "B1" type(strict)) order by JSON_QUERY("DATA",
'$.nnn[*].min()') asc nulls first fetch next 5 rows only
`,
binds: { "1": 20, "2": 80 }
} } ] ).forEach(row => print(row.DATA));

{ nnn: 20.000276167882134, _id: ObjectId('68b37f80dd1e7e1dedd54d7c') }
{ nnn: 20.00127178299522, _id: ObjectId('68b37f80dd1e7e1dedd59692') }
{ nnn: 20.001609281518974, _id: ObjectId('68b37f80dd1e7e1dedd589be') }
{ nnn: 20.00465615670095, _id: ObjectId('68b37f80dd1e7e1dedd516ba') }
{ nnn: 20.00547511267696, _id: ObjectId('68b37f80dd1e7e1dedd5b644') }

db.aggregate( [ { $sql : `
select * 
from dbms_xplan.display_cursor(null,null,format=>'ALLSTATS LAST')  ` } ] ).forEach(row => print(row.PLAN_TABLE_OUTPUT));

SQL_ID  235z2n0wtzck2, child number 0
-------------------------------------
 select /*+        FIRST_ROWS(5)
NO_USE_HASH_AGGREGATION(@"SEL$1" UNIQUE)
NO_BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$1" "franck"@"SEL$1") */
"DATA",rawtohex("RESID"),"ETAG" from "ORA"."franck" where
JSON_EXISTS("DATA",'$?( exists(@.nnn?( (@.numberOnly() >= $B0) &&
(@.numberOnly() < $B1) )) )' passing :1 as "B0", :2 as "B1"
type(strict)) order by JSON_QUERY("DATA", '$.nnn[*].min()') asc nulls
first fetch next 5 rows only

Plan hash value: 3334965968

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                   |      1 |        |      5 |00:00:00.43 |    2173 |       |       |          |
|*  1 |  COUNT STOPKEY                       |                   |      1 |        |      5 |00:00:00.43 |    2173 |       |       |          |
|   2 |   VIEW                               |                   |      1 |      1 |      5 |00:00:00.43 |    2173 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY             |                   |      1 |      1 |      5 |00:00:00.43 |    2173 |  2048 |  2048 | 2048  (0)|
|*  4 |     FILTER                           |                   |      1 |        |  59827 |00:00:00.42 |    2173 |       |       |          |
|*  5 |      TABLE ACCESS BY INDEX ROWID     | franck            |      1 |      1 |  59827 |00:00:00.41 |    2173 |       |       |          |
|   6 |       HASH UNIQUE                    |                   |      1 |      1 |  80054 |00:00:00.08 |     387 |  5510K|   960K| 4897K (0)|
|*  7 |        INDEX RANGE SCAN (MULTI VALUE)| $ora:franck.nnn_1 |      1 |      1 |  80054 |00:00:00.01 |     387 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - filter(ROWNUM<=5)
   4 - filter(HEXTORAW('04')>SYS_CONS_ANY_SCALAR(:1, 3))
   5 - filter(JSON_EXISTS2("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$?( exists(@.nnn?( (@.numberOnly() >= $B0) && (@.numberOnly() <
              $B1) )) )' PASSING :1 AS "B0" , :2 AS "B1" FALSE ON ERROR TYPE(STRICT) )=1)
   7 - access("franck"."SYS_NC00005$">=SYS_CONS_ANY_SCALAR(:1, 3) AND "franck"."SYS_NC00005$"<HEXTORAW('04'))

Deduplication is on ROWID, which is useless for the result pagination, so TABLE ACCESS BY INDEX ROWID BATCHED was indeed the best plan. Still, it lacks efficiency. Oracle reads 80054 index entries with a one-sided range, unlike MongoDB's 5 keys or DocumentDB's 59935 keys. The keys, which we would like sorted by the result field, are instead deduplicated by ROWID to keep one entry per document. All documents are fetched and filtered down to 59827, then sorted by the expected field for pagination to return the Top-5.

I use the raw SQL query and hints in a lab to understand how the emulation works and where it is limited, but be careful: there's a lot of hidden magic here. For example, I tried a SORT UNIQUE instead of HASH UNIQUE, with the NO_USE_HASH_AGGREGATION hint, and got a wrong result, as if the ORDER BY didn't apply:

db.aggregate( [ { $sql : {
statement: `
select /*+ 
      FIRST_ROWS(5) 
      NO_USE_HASH_AGGREGATION(@"SEL$1" UNIQUE)
*/ "DATA",rawtohex("RESID"),"ETAG" from
"ORA"."franck" where JSON_EXISTS("DATA",'$?( exists(@.nnn?(
(@.numberOnly() >= $B0) && (@.numberOnly() < $B1) )) )' passing :1 as
"B0", :2 as "B1" type(strict)) order by JSON_QUERY("DATA",
'$.nnn[*].min()') asc nulls first fetch next 5 rows only
`,
binds: { "1": 20, "2": 80 }
} } ] ).forEach(row => print(row.DATA));

{ nnn: 67.95991306665603, _id: ObjectId('68b37f80dd1e7e1dedd4f7f4') }
{ nnn: 70.91722435559275, _id: ObjectId('68b37f80dd1e7e1dedd4f7f6') }
{ nnn: 76.3751578913558, _id: ObjectId('68b37f80dd1e7e1dedd4f7f7') }
{ nnn: 24.260047454002255, _id: ObjectId('68b37f80dd1e7e1dedd4f7f8') }
{ nnn: 49.42197218105689, _id: ObjectId('68b37f80dd1e7e1dedd4f7f9') }

This illustrates the difficulty of replicating MongoDB functionality within a SQL database. Currently, DocumentDB on PostgreSQL performs better than Oracle since it applies the full filter on the index. However, it still falls short compared to vanilla MongoDB, as it cannot utilize the index to return sorted results.

Conclusion

MongoDB differs from traditional SQL databases in two main ways. First, its API and document model support application-driven development, aligning closely with in-app objects instead of a normalized relational schema. Second, this model is maintained at the storage layer, allowing business transactions to be single operations on one document, often on a single shard. This simplicity enhances developer experience, performance, and scalability.

Emulating MongoDB on top of an SQL database is challenging because it requires adding structures and access methods that RDBMS were never built for. In MongoDB, one‑to‑many relationships are embedded in documents, and multi‑key indexes are a core feature. In relational databases, the same data is typically split across multiple tables with single‑key indexes. Emulations introduce new indexes like Oracle’s multi‑value index or PostgreSQL’s extended RUM index to bridge that gap, but optimizations such as pushing down ORDER BY for efficient pagination are not yet implemented.

Finally, performance tuning in an emulation requires dual expertise: interpreting MongoDB execution plans while also understanding the underlying SQL engine's execution strategies. The simplicity of developing, deploying, and operating must be weighed as well: emulations are good as long as they make your life easier (you don't need all the features, you accept lower performance, but you stay on a database you know). However, it's crucial to recognize when they introduce more complexity, such as workarounds for feature limitations or deciphering intricate execution plans, than using a database that natively supports a document model.
