Franck Pachot for MongoDB

Posted on Aug 15 • Edited on Aug 20

Why Doesn't Oracle Multi-Value Index Optimize .sort() Like MongoDB Does With its Multi-Key Index?

#mongodb #oracle #index #json

Until recently, Oracle Database offered very limited options for creating inverted indexes, primarily restricted to special indexing for text and spatial data. Unlike PostgreSQL with Generalized Inverted Index (GIN), Oracle lacked a general solution. However, in response to the growing demand for document databases and to provide a MongoDB emulation, Oracle introduced the Multi-Value Index (MVI) in version 23ai. Let's create one and examine its internal structure and how it differs from MongoDB indexes.

Lab setup

I start a Docker container with Oracle 23ai running and connect with SQL*Plus:

docker run --name ora -d -e ORACLE_PWD=OracleDatabase_23ai container-registry.oracle.com/database/free:latest

docker exec -it ora bash -c 'until grep -B100 "Completed: ALTER DATABASE OPEN" /opt/oracle/diag/rdbms/free/FREE/trace/alert_FREE.log ; do sleep 1 ; done'

docker exec -it ora sqlplus / as sysdba

I create a table to store some JSON documents:

DROP TABLE if exists franck;
CREATE TABLE franck (data JSON);

INSERT INTO franck VALUES
  ('{ field1: "x", field2: [ "jkl" , "mno" , "jkl" ] }') -- 78 , [ 6a 6b 6c , 6d 6e 6f , 6a 6b 6c ]
 ,('{ field1: "y", field2: [ "mno" , "jkl"         ] }') -- 79 , [ 6d 6e 6f , 6a 6b 6c ]
 ,('{ field1: "z", field2:   "jkl"                   }') -- 7a ,   6a 6b 6c
);

The structure is flexible, and for the second field, I inserted scalar (in the third document) and arrays, some with redundant values, and with different orders.

Multi-Value Index

I create a multi-value index similar to what the MongoDB API emulation would create. It uses JSON_TABLE to extract fields to columns to be indexed. It uses JSON_MKMVI to make an index entry for the multi-value index:

CREATE MULTIVALUE INDEX FRANCK_MVI ON FRANCK (
 JSON_MKMVI(
  JSON_TABLE(
    "DATA", '$' COLUMNS (
     "SCALAR" PATH '$."field1"' ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH,
     NESTED PATH '$."field2"[*]' COLUMNS (
       "K0" ANY ORA_RAWCOMPARE PATH '$' ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH
     )
    )
  )
 )
);

ORA_RAWCOMPARE is a clue that the ordering of nested values is binary. It would be nice to run those functions and see how they make index entries from JSON, but we cannot use them directly:

  select JSON_TABLE(
         *
ERROR at line 2:
ORA-61712: Can't use json_table syntax to create non-MVI(MultiValue Index): .
Help: https://docs.oracle.com/error-help/db/ora-61712/

The index entries will have a compound key with values extracted from the DATA column:

field1 as a scalar value (not used by the MongoDB emulation as all fields can be an array)
field2[*] as the values within an array, and NESTED PATH unnests the values (like $unwind)

The indexes created by the MongoDB API emulation are all NESTED PATH to be compatible with MongoDB's flexible schema, where any field in a document can be a scalar or an array, but I created one scalar to see the behavior.

Index internals

The index is created, making it easy to dump what's inside. I get the block identifier from DBA_SEGMENTS and dump the block:

SQL> -- get the block offset in datafile 1 ( I run this as sysdba, it goes so SYSTEM tablespace )
SQL> column block new_value block
SQL> select header_block+1 as block from dba_segments where segment_name='FRANCK_MVI' and owner=user;

     BLOCK
----------
    114553

SQL> -- set tracefile identifier and dump the block
SQL> alter session set tracefile_identifier = 'franck&block';
old   1: alter session set tracefile_identifier = 'franck&block'
new   1: alter session set tracefile_identifier = 'franck    114553'

Session altered.

SQL> -- get tracefile name and print
SQL> column tracefile new_value tracefile
SQL> select tracefile from v$process where addr=(select paddr from v$session where sid=sys_context('userenv','sid'));

TRACEFILE
--------------------------------------------------------------------------------
/opt/oracle/diag/rdbms/free/FREE/trace/FREE_ora_3518_franck    114553.trc

SQL> host cat "&tracefile."
Trace file /opt/oracle/diag/rdbms/free/FREE/trace/FREE_ora_3518_franck    114553.trc
Oracle Database 23ai Free Release 23.0.0.0.0 - Develop, Learn, and Run for Free
Version 23.8.0.25.04
Build label:    RDBMS_23.8.0.25.04DBRU_LINUX.X64_250423
ORACLE_HOME:    /opt/oracle/product/23ai/dbhomeFree
System name:    Linux
Node name:      6fc22939038c
Release:        4.14.35-2047.540.4.1.el7uek.x86_64
Version:        #2 SMP Thu Sep 5 12:06:13 PDT 2024
Machine:        x86_64
VM name:        KVM CPUID feature flags: 0x01003afb
CLID:   P
Instance name: FREE
Instance number: 1
Database name: FREE
Database unique name: FREE
Database id: 1471336774
Database role: PRIMARY
Redo thread mounted by this instance: 1
Oracle binary inode:    1036542
Oracle process number: 63
Unix process pid:       3518
             NID:       4026532223
             created:   2025-08-15T09:12:36.-2739213311+00:00
             image:     oracle@6fc22939038c (TNS V1-V3)


*** 2025-08-15T09:13:47.601001+00:00 (CDB$ROOT(1))
*** SESSION ID:(213.17644) 2025-08-15T09:13:47.601014+00:00
*** CLIENT ID:() 2025-08-15T09:13:47.601024+00:00
*** SERVICE NAME:(SYS$USERS) 2025-08-15T09:13:47.601033+00:00
*** MODULE NAME:(sqlplus@6fc22939038c (TNS V1-V3)) 2025-08-15T09:13:47.601043+00:00
*** ACTION NAME:() 2025-08-15T09:13:47.601052+00:00
*** CLIENT DRIVER:(SQL*PLUS) 2025-08-15T09:13:47.601060+00:00
*** CONTAINER ID:(1) 2025-08-15T09:13:47.601069+00:00
*** CLIENT IP:(N/A) 2025-08-15T09:13:47.601078+00:00
*** CONNECTION ID:(PGTKhibdDbfgYwIAEaztFA==) 2025-08-15T09:13:47.601088+00:00

Start dump data blocks tsn: 0 file#:1 minblk 114553 maxblk 114553
Block dump from cache:

Dump of buffer cache at level 3 for pdb=1 tsn=0 rdba=114553

BH (0xbdf5b740) file#: 1 rdba: 0x0001bf79 (1024/114553) class: 1 ba: 0xbd134000
  set: 6 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,0
  dbwrid: 0 obj: 72614 objn: 72614 tsn: [1/0] afn: 1 hint: 0xf
  hash: [0xc3f52b50,0x77de4520] lru: [0xbdf5d2e0,0xbdf5b990]
  ckptq: [NULL] fileq: [NULL]
  objq: [0x76c0c790,0x76c0c790] objaq: [0x76c0c780,0x76c0c780] qhead: 0x76c0c770
  st: XCURRENT md: NULL fpin: 'ktswh23: ktsfbkl' fscn: 0x2f14be fcur_vldr: 0x3 tch: 1
  flags: block_written_once
  LRBA: [0x0.0.0] LSCN: [0x0] HSCN: [0x2f14c0] HSUB: [1]
  Printing buffer operation history (latest change first): (cnt 9)
  01. sid:11 L192:kcbbic2:bic:FBD     02. sid:11 L191:kcbbic2:bic:FBW
  03. sid:11 L602:bic1_int:bis:FWC    04. sid:11 L822:bic1_int:ent:rtn
  05. sid:11 L832:oswmqbg1:clr:WRT    06. sid:11 L930:kubc:sw:mq
  07. sid:11 L913:bxsv:sw:objq        08. sid:11 L608:bxsv:bis:FBW
  09. sid:11 L607:bxsv:bis:FFW        10. sid:05 L464:chg1_mn:bic:FMS
  11. sid:05 L778:chg1_mn:bis:FMS     12. sid:05 L552:chg_main:bic:CLN
  13. sid:05 L353:gcur:set:MEXCL      14. sid:05 L353:gcur:set:MEXCL

  buffer tsn: 0 rdba: 0x0001bf79 (1024/114553)
  scn: 0x2f14c0 seq: 0x01 flg: 0x04 tail: 0x14c00601
  frmt: 0x02 chkval: 0x2a1a type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dumping 'block' addr=0xbd134000 size=8192 bytes
Dump of memory from 0xbd134000 to 0xbd136000
0BD134000 0000A206 0001BF79 002F14C0 04010000  [....y...../.....]
0BD134010 00002A1A 00400102 00011BA6 002F14BE  [.*....@......./.]
0BD134020 00008000 00020002 00000000 00000000  [................]
0BD134030 00000000 00000000 00000000 00000000  [................]
0BD134040 00000000 0000FFFF 00000000 00000000  [................]
0BD134050 00000000 80008000 002F14BE 03800000  [........../.....]
0BD134060 00000000 002E0005 1ED31F01 00000000  [................]
0BD134070 00000000 00000000 00000002 00001F60  [............`...]
0BD134080 1F3A1F4D 1F141F27 00001F01 00000000  [M.:.'...........]
0BD134090 00000000 00000000 00000000 00000000  [................]
        Repeat 492 times
0BD135F60 7A040201 6B6A0404 0000066C 02004108  [...z..jkl....A..]
0BD135F70 02000000 04790402 6F6E6D04 08000006  [......y..mno....]
0BD135F80 00010041 02020000 04047904 066C6B6A  [A........y..jkl.]
0BD135F90 41080000 00000100 04020200 6D040478  [...A........x..m]
0BD135FA0 00066F6E 00410800 00000000 78040202  [no....A........x]
0BD135FB0 6B6A0404 0000066C 00004108 00000000  [..jkl....A......]
0BD135FC0 00000000 00000000 00000000 00000000  [................]
        Repeat 2 times
0BD135FF0 00000000 00000000 00000000 14C00601  [................]

Block scn: 0x2f14c0
Block header dump:  0x0001bf79
 Object id on Block? Y
 seg/obj: 0x11ba6  csc:  0x00000000002f14be  itc: 2  flg: --  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn  0x00000000002f14be
Leaf block dump
===============
header address 3172155484=0xbd13405c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 5
kdxcofbo 46=0x2e
kdxcofeo 7937=0x1f01
kdxcoavs 7891
kdxlespl 0
kdxlende 0
kdxlenxt 0=0x0
kdxleprv 0=0x0
kdxledsz 2
kdxlebksz 8032
row#0[8013] flag: -------, lock: 0, len=19, data:(2):  00 02
col 0; len 2; (2):  04 78
col 1; len 4; (4):  04 6a 6b 6c
col 2; len 6; (6):  00 00 08 41 00 00
row#1[7994] flag: -------, lock: 0, len=19, data:(2):  00 02
col 0; len 2; (2):  04 78
col 1; len 4; (4):  04 6d 6e 6f
col 2; len 6; (6):  00 00 08 41 00 00
row#2[7975] flag: -------, lock: 0, len=19, data:(2):  00 02
col 0; len 2; (2):  04 79
col 1; len 4; (4):  04 6a 6b 6c
col 2; len 6; (6):  00 00 08 41 00 01
row#3[7956] flag: -------, lock: 0, len=19, data:(2):  00 02
col 0; len 2; (2):  04 79
col 1; len 4; (4):  04 6d 6e 6f
col 2; len 6; (6):  00 00 08 41 00 01
row#4[7937] flag: -------, lock: 0, len=19, data:(2):  00 01
col 0; len 2; (2):  04 7a
col 1; len 4; (4):  04 6a 6b 6c
col 2; len 6; (6):  00 00 08 41 00 02
----- end of leaf block Logical dump -----
----- end of leaf block dump -----

There is one index entry per unique value in the array, so five entries in total:

entry	key 0 (hex)	field1	key 1 (hex)	field2	rowid	row
0	04 78	'x'	04 6a 6b 6c	'jkl'	00 00 08 41 00 00	1
1	04 78	'x'	04 6d 6e 6f	'mno'	00 00 08 41 00 00	1
2	04 79	'y'	04 6a 6b 6c	'jkl'	00 00 08 41 00 01	2
3	04 79	'y'	04 6d 6e 6f	'mno'	00 00 08 41 00 01	2
4	04 7a	'z'	04 6a 6b 6c	'jkl'	00 00 08 41 00 02	3

The index appears to be suitable for ordered entries in MongoDB, which sorts the documents in ascending order by the minimum value of an array. However, Oracle multi-value indexes do not preserve the sort as they add a deduplication step. We will see that in execution plans.

Execution plan in MongoDB

Here is an example, with result and execution plan, in MongoDB:

db.franck.drop();

db.franck.createIndex({ field1: 1, field2: 2});

db.franck.insertMany([
     { field1: "x", field2: [ "jkl" , "mno" , "jkl" ] },
     { field1: "y", field2: [ "mno" , "jkl"         ] },
     { field1: "z", field2:   "jkl"                   },
    ]);

db.franck.find({ field1:"x" }).sort({ field1: 1, field2: 1 });

[
  {
    _id: ObjectId('689f03ed160c7ec59cd4b0ce'),
    field1: 'x',
    field2: [ 'jkl', 'mno', 'jkl' ]
  }
]

db.franck.find(
 { field1:"x" }
).sort(
 { field1: 1, field2: 1 }
).explain("executionStats").executionStats;

{
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 0,
  totalKeysExamined: 2,
  totalDocsExamined: 1,
  executionStages: {
    isCached: false,
    stage: 'FETCH',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 3,
    advanced: 1,
    needTime: 1,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    docsExamined: 1,
    alreadyHasObj: 0,
    inputStage: {
      stage: 'IXSCAN',
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 3,
      advanced: 1,
      needTime: 1,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      keyPattern: { field1: 1, field2: 2 },
      indexName: 'field1_1_field2_2',
      isMultiKey: true,
      multiKeyPaths: { field1: [], field2: [ 'field2' ] },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: { field1: [ '["x", "x"]' ], field2: [ '[MinKey, MaxKey]' ] },
      keysExamined: 2,
      seeks: 1,
      dupsTested: 2,
      dupsDropped: 1
    }
  }
}

The execution plan shows an IXSCAN and a FETCH, but no SORT operation because the index entries are scanned in the expected order.

Important statistics to note are: dupsTested: 2 and dupsDropped: 1. A multi-key index contains multiple keys for one document, but only one key is needed to fetch the document, resulting in the dropping of duplicates. MongoDB IXSCAN can eliminate duplicate keys from the same document while maintaining key order, keeping the key that matters (the minimum for ascending key order) and eliminating the need for an additional sort.

Execution plan in MongoDB emulation

I run the same on the MongoDB API emulation of Oracle Autonomous Database 23ai:

ora> db.franck.drop();

true

ora> db.franck.createIndex({ field1: 1, field2: 2});

field1_1_field2_2

ora> db.franck.insertMany([
         { field1: "x", field2: [ "jkl" , "mno" , "jkl" ] },
         { field1: "y", field2: [ "mno" , "jkl"         ] },
         { field1: "z", field2:   "jkl"                   },
        ]);

{
  acknowledged: true,
  insertedIds: {
    '0': ObjectId('689f041f23cd9d33b9d4b0c5'),
    '1': ObjectId('689f041f23cd9d33b9d4b0c6'),
    '2': ObjectId('689f041f23cd9d33b9d4b0c7')
  }
}

ora> db.franck.find(
 { field1:"x" }
).sort(
 { field1: 1, field2: 1 }
).explain("executionStats")

{
  queryPlanner: {
    plannerVersion: 1,
    namespace: 'ora.franck',
    indexFilterSet: false,
    parsedQuery: {
      '$query': { field1: { '$stringOnly': 'x' } },
      '$orderby': {
        '$fields': [
          { path: 'field1', order: 'asc', sortByMinMax: true },
          { path: 'field2', order: 'asc', sortByMinMax: true }
        ],
        '$lax': true
      }
    },
    rewrittenQuery: {
      '$and': [
        {
          '$query': { '$value': { field1: { '$stringOnly': 'x' } } }
        },
        {
          '$orderby': {
            '$fields': [
              { path: 'field1', order: 'asc', sortByMinMax: true },
              { path: 'field2', order: 'asc', sortByMinMax: true }
            ],
            '$lax': true
          }
        }
      ]
    },
    winningPlan: {
      stage: 'SELECT STATEMENT',
      inputStage: {
        stage: 'SORT',
        options: 'ORDER BY',
        columns: `(#keys=2) JSON_VALUE(JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.field1[*].min()' RETURNING JSON WITHOUT ARRAY WRAPPER NULL ON ERROR TYPE(LAX) ) FORMAT OSON , '$' RETURNING ANY ORA_RAWCOMPARE(32767) ERROR ON ERROR TYPE(LAX) )[32767], JSON_VALUE(JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.field2[*].min()' RETURNING JSON WITHOUT ARRAY WRAPPER NULL ON ERROR TYPE(LAX) ) FORMAT OSON , '$' RETURNING ANY ORA_RAWCOMPARE(32767) ERROR ON ERROR TYPE(LAX) )[32767], "DATA" /*+ LOB_BY_VALUE */ [JSON,8200], RAWTOHEX(JSON_VALUE("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$._id' RETURNING ANY ORA_RAWCOMPARE(2000) NO ARRAY ERROR ON ERROR TYPE(LAX) ))[4000], "ETAG"[RAW,16]`,
        path: "$.field1[*].min()'",
        inputStage: {
          stage: 'TABLE ACCESS',
          options: 'BY INDEX ROWID BATCHED',
          source: 'franck',
          columns: `"DATA" /*+ LOB_BY_VALUE */ [JSON,8200], JSON_VALUE("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$._id' RETURNING ANY ORA_RAWCOMPARE(2000) NO ARRAY ERROR ON ERROR TYPE(LAX) )[RAW,2000], "ETAG"[RAW,16]`,
          path: "$._id'",
          inputStage: {
            stage: 'HASH',
            options: 'UNIQUE',
            columns: '(#keys=2) "franck".ROWID[ROWID,10], SYSVARCOL[8]',
            inputStage: {
              stage: 'INDEX',
              options: 'RANGE SCAN (MULTI VALUE)',
              source: '$ora:franck.field1_1_field2_2',
              columns: `"franck".ROWID[ROWID,10], JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$."field1"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE)[RAW,4000], SYSVARCOL[8]`,
              filterType: 'access',
              filter: `JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$."field1"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE)=SYS_CONS_ANY_SCALAR(:1, 4)`
            }
          }
        }
      }
    },
    rejectPlans: []
  },
  serverInfo: { host: 'localhost', port: 27017, version: '4.2.14' },
  ok: 1
}

There is a SORT ORDER BY operation, and even if the index entries were ordered from INDEX RANGE SCAN (MULTI VALUE), there's a HASH UNIQUE that shuffles them.

Reading this execution plan is not easy, but I can run SQL queries from an aggregation pipeline to call DBMS_XPLAN:

ora> db.franck.find({ field1:"x" }).sort({ field1: 1, field2: 1 });
[
  {
    _id: ObjectId('689f041f23cd9d33b9d4b0c5'),
    field1: 'x',
    field2: [ 'jkl', 'mno', 'jkl' ]
  }
]
ora> db.aggregate( [ { $sql : `
select * from 
 dbms_xplan.display_cursor(format=>'BASIC +OUTLINE +PROJECTION')
` } ] ).forEach(row => print(row.PLAN_TABLE_OUTPUT));

EXPLAINED SQL STATEMENT:
------------------------
select "DATA",rawtohex("RESID"),"ETAG" from "ORA"."franck" where
JSON_EXISTS("DATA",'$?(@.field1.stringOnly() == $B0)' passing :1 as
"B0" type(strict)) order by JSON_QUERY("DATA", '$.field1[*].min()') asc
nulls first, JSON_QUERY("DATA", '$.field2[*].min()') asc nulls first

Plan hash value: 3588138049

------------------------------------------------------------------------------
| Id  | Operation                            | Name                          |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                               |
|   1 |  SORT ORDER BY                       |                               |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| franck                        |
|   3 |    HASH UNIQUE                       |                               |
|   4 |     INDEX RANGE SCAN (MULTI VALUE)   | $ora:franck.field1_1_field2_2 |
------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('23.1.0')
      DB_VERSION('23.1.0')
      OPT_PARAM('_fix_control' '20648883:0 26552730:1 27175987:0 29972495:0
              22387320:0 30195773:0 31945701:1 32108311:1 33659818:3 34092979:1
              35495824:1 33792497:1 36554842:1 36283175:1 31720959:1 36004220:1
              36635255:1 36675198:1 36868551:1 37400112:1 37346200:0')
      ALL_ROWS
      FORCE_XML_QUERY_REWRITE
      FORCE_JSON_TABLE_TRANSFORM
      XML_DML_RWT_STMT
      XMLINDEX_REWRITE
      XMLINDEX_REWRITE_IN_SELECT
      NO_COST_XML_QUERY_REWRITE
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "franck"@"SEL$1" "$ora:franck.field1_1_field2_2")
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$1" "franck"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$1" UNIQUE)
      END_OUTLINE_DATA
  */

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=2) JSON_VALUE(JSON_QUERY("DATA" /*+ LOB_BY_VALUE */
       FORMAT OSON , '$.field1[*].min()' RETURNING JSON WITHOUT ARRAY WRAPPER
       NULL ON ERROR TYPE(LAX) ) FORMAT OSON , '$' RETURNING ANY
       ORA_RAWCOMPARE(32767) ERROR ON ERROR TYPE(LAX) )[32767],
       JSON_VALUE(JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON ,
       '$.field2[*].min()' RETURNING JSON WITHOUT ARRAY WRAPPER NULL ON ERROR
       TYPE(LAX) ) FORMAT OSON , '$' RETURNING ANY ORA_RAWCOMPARE(32767) ERROR
       ON ERROR TYPE(LAX) )[32767], "DATA" /*+ LOB_BY_VALUE */ [JSON,8200],
       RAWTOHEX("RESID")[4000], "ETAG"[RAW,16]
   2 - "DATA" /*+ LOB_BY_VALUE */ [JSON,8200],
       "franck"."RESID"[RAW,2000], "ETAG"[RAW,16]
   3 - (#keys=2) "franck".ROWID[ROWID,10], SYSVARCOL[8]
   4 - "franck".ROWID[ROWID,10], "franck"."SYS_NC00005$"[RAW,4000],
       SYSVARCOL[8]

This gives lots of information about how Oracle runs MongoDB queries transformed to complex SQL using JSON and even XML capabilities.

Reasons for the additional SORT

The HASH UNIQUE could be a query planner decision, as the sort is not preserved by TABLE ACCESS BY INDEX ROWID BATCHED. I can run the same with hints to disable these:

db.aggregate( [ { $sql : `
select /*+ gather_plan_statistics
      NO_BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$1" "franck"@"SEL$1")
      NO_USE_HASH_AGGREGATION(@"SEL$1" UNIQUE)
*/ "DATA",rawtohex("RESID"),"ETAG" from "ORA"."franck" where
JSON_EXISTS("DATA",'$?(@.field1.stringOnly() == $B0)' passing 'x' as
"B0" type(strict)) order by JSON_QUERY("DATA", '$.field1[*].min()') asc
nulls first, JSON_QUERY("DATA", '$.field2[*].min()') asc nulls first
`
 } ] );

db.aggregate( [ { $sql : `
select * from dbms_xplan.display_cursor(format=>'BASIC ALLSTATS LAST +PROJECTION +PREDICATE')
` } ] ).forEach(row => print(row.PLAN_TABLE_OUTPUT));

EXPLAINED SQL STATEMENT:
------------------------
 select /*+ gather_plan_statistics
NO_BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$1" "franck"@"SEL$1")
NO_USE_HASH_AGGREGATION(@"SEL$1" UNIQUE) */
"DATA",rawtohex("RESID"),"ETAG" from "ORA"."franck" where
JSON_EXISTS("DATA",'$?(@.field1.stringOnly() == $B0)' passing 'x' as
"B0" type(strict)) order by JSON_QUERY("DATA", '$.field1[*].min()') asc
nulls first, JSON_QUERY("DATA", '$.field2[*].min()') asc nulls first

Plan hash value: 1223435510

--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                               |      0 |        |      0 |00:00:00.01 |       0 |       |       |          |
|   1 |  SORT ORDER BY                    |                               |      1 |      1 |      1 |00:00:00.01 |       2 |  2048 |  2048 | 2048  (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID     | franck                        |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|   3 |    SORT UNIQUE                    |                               |      1 |      1 |      1 |00:00:00.01 |       1 |  2048 |  2048 | 2048  (0)|
|*  4 |     INDEX RANGE SCAN (MULTI VALUE)| $ora:franck.field1_1_field2_2 |      1 |      1 |      2 |00:00:00.01 |       1 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("franck"."SYS_NC00005$"=SYS_CONS_ANY_SCALAR('x', 4))

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=2) JSON_VALUE(JSON_QUERY("DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.field1[*].min()' RETURNING JSON WITHOUT ARRAY WRAPPER NULL
       ON ERROR TYPE(LAX) ) FORMAT OSON , '$' RETURNING ANY ORA_RAWCOMPARE(32767) ERROR ON ERROR TYPE(LAX) )[32767], JSON_VALUE(JSON_QUERY("DATA" /*+
       LOB_BY_VALUE */  FORMAT OSON , '$.field2[*].min()' RETURNING JSON WITHOUT ARRAY WRAPPER NULL ON ERROR TYPE(LAX) ) FORMAT OSON , '$' RETURNING
       ANY ORA_RAWCOMPARE(32767) ERROR ON ERROR TYPE(LAX) )[32767], "DATA" /*+ LOB_BY_VALUE */ [JSON,8200], RAWTOHEX("RESID")[4000], "ETAG"[RAW,16]
   2 - "DATA" /*+ LOB_BY_VALUE */ [JSON,8200], "franck"."RESID"[RAW,2000], "ETAG"[RAW,16]
   3 - (#keys=2) "franck".ROWID[ROWID,10], SYSVARCOL[8]
   4 - "franck".ROWID[ROWID,10], "franck"."SYS_NC00005$"[RAW,4000], SYSVARCOL[8]

Without batching the access to the heap table, this execution still sorts by ROWID, as the key for the SORT UNIQUE is: (#keys=2) "franck".ROWID[ROWID,10], SYSVARCOL[8]. This breaks the apparent ordering that we have seen in the index dump, and explains why Oracle has to execute an additional SORT ORDER BY operation before returning the results. The key for this sort is: '$.field1[*].min()' which is the correct behavior as MongoDB sorts on the minimum value in the array for an ascending sort.

Oracle multi-value indexes have been built on top of regular indexes (typ: 2 - INDEX in the dump) that have no built-in deduplication like MongoDB multi-key indexes. The deduplication is built on top of it with a UNIQUE operation on ROWID. Finally, Oracle multi-value indexes have the same limitation as PostgreSQL GIN indexes, which builds deduplication with bitmaps: They cannot be used to avoid a sort for efficient pagination queries.

Index created by the MongoDB emulation

If you want to check the index definition of my index, it is visible from the expression-based index definition, knowing the names from the execution plan:

select column_name, column_expression
from all_ind_expressions 
natural join all_ind_columns
where index_owner='ORA' 
  and index_name='$ora:franck.field1_1_field2_2'
  and column_name='SYS_NC00005$'
;

JSON_MKMVI(
 JSON_TABLE(
  "DATA", '$' PRESENT ON EMPTY MINIMAL CROSS PRODUCT WITH ERROR ON PARALLEL ARRAYS COLUMNS( 
    NESTED PATH '$."field1"[*]' COLUMNS( 
     "K0" ANY ORA_RAWCOMPARE PATH '$' ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH 
    )
  , NESTED PATH '$."field2"[*]' COLUMNS(
     "K1" ANY ORA_RAWCOMPARE PATH '$' ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH 
    ) 
  ) 
 )  AS "K0","K1"
)

I've found no easy way to run this from the $sql stage of an aggregation pipeline because the column expression is stored as LONG in the Oracle dictionary. I did this just to show that the multi-value index I've created initially follows the same definition.

Key deduplication in MongoDB

The extra sort operations in Oracle emulation are needed to deduplicate the ROWID after the index scan, and then get the original order of the keys back. The documentation doesn't mention it and its proprietary software. The following patent gives some ideas of how it has been implemented: Technique of comprehensively supporting multi-value, multi-field, multilevel, multi-position functional index over stored aggregately stored data in RDBMS

In contrast, MongoDB's IXSCAN includes built-in deduplication that maintains the streaming order of index scan, with source available (see RecordIdDeduplicator).

The algorithm optimizes memory usage based on the number of keys. Before MongoDB 8.0, IndexScan deduplication relied on a simple in-memory hash set of RecordIds, checking if a RecordId was present and either emitting it or skipping it. This method preserved streaming order but increased memory usage with more unique IDs.
Starting with MongoDB 8.0, the hash set was replaced by a hybrid HashRoaringSet, which checks membership row by row while maintaining stream order. It stores a limited number of IDs in a flat hash table, switching to a compressed Roaring Bitmap once it exceeds about 1 million unique IDs. This change minimizes memory usage during large scans while preserving deduplication.

Conclusion

Oracle is enhancing its JSON support and indexing capabilities with multi-value indexes (MVI), which are now on par with PostgreSQL's GIN indexes. This improvement facilitates partial emulation of the MongoDB API and its semantic with flexible schemas. However, MVI, like GIN, still requires post-index scan sorting to achieve complete ordering (because another sort must be used to deduplicate the multiple keys), which complicates efficient pagination for multivalued fields, frequent in OLTP workloads.

Oracle’s multi-value indexes (MVI) in 23ai combine several existing features enhanced for this purpose: function-based indexes, virtual columns for field extraction, and JSON functions like JSON_TABLE and NESTED PATH for unnesting arrays. It also incorporates hints originally created for XML, the previous attempt to process documents in the RDBMS. By leveraging these features, it increases complexity but helps DBAs familiar with the RDBMS in troubleshooting, compensating for the incomplete .explain() emulation with the traditional DBMS_XPLAN and dumping index internals.

MongoDB supports complex documents with arrays, removing the need for an emulation layer between nested structures and traditional RDBMS storage. What would you choose? The best database is the one you’re most comfortable with. When looking at those examples, which index definition is the easiest to understand for your team? And which execution plan can they interpret to understand the scalability of their queries?

DEV Community