This article examines PostgreSQL operations that benefit from GIN indexes, as listed in Built-in GIN Operator Classes.
I created a table containing one million rows, plus three additional rows relevant to my queries. The goal is to demonstrate the benefits of using indexes, allowing for efficient data retrieval without the need to read all rows.
Sample table in PostgreSQL
Here is a table that simulates a document collection, with all attributes in a JSONB column:
CREATE TABLE data (
id SERIAL PRIMARY KEY,
details JSONB
);
INSERT INTO data (details)
SELECT jsonb_build_object(
'stuff', md5(random()::text)
)
FROM generate_series(1, 1000000);
ANALYZE data;
INSERT INTO data (details) VALUES
('{"name": "Alice", "age": 30, "tags": ["developer", "engineer"]}'),
('{"name": "Bob", "age": 25, "tags": ["designer"]}'),
('{"name": "Carol", "age": 40, "tags": ["developer", "manager"]}');
Sample collection on MongoDB
I am creating the same data in a MongoDB collection:
const bulk = db.data.initializeUnorderedBulkOp();
for (let i = 0; i < 1000000; i++) {
bulk.insert({
stuff: Math.random().toString(36).substring(2, 15)
});
}
bulk.execute();
db.data.insertMany([
{ "name": "Alice", "age": 30, "tags": ["developer", "engineer"] },
{ "name": "Bob", "age": 25, "tags": ["designer"] },
{ "name": "Carol", "age": 40, "tags": ["developer", "manager"] }
]);
@> (jsonb, jsonb): Contains
The following queries search for data where the tag array contains a value:
PostgreSQL:
SELECT * FROM data WHERE details @> '{"tags": ["developer"]}'
;
MongoDB:
db.data.find({ tags: { $all: ["developer"] } })
;
Without an index, this runs a SeqScan on PostgreSQL:
postgres=# explain (analyze, costs off, buffers, serialize text)
SELECT * FROM data WHERE details @> '{"tags": ["developer"]}'
;
QUERY PLAN
----------------------------------------------------------------
Seq Scan on data (actual time=144.002..144.004 rows=2 loops=1)
Filter: (details @> '{"tags": ["developer"]}'::jsonb)
Rows Removed by Filter: 1000001
Buffers: shared hit=5715 read=4595
Planning Time: 0.065 ms
Serialization: time=0.008 ms output=1kB format=text
Execution Time: 144.236 ms
and a COLLSCAN on MongoDB:
test> print(
db.data.find({ tags: { $all: ["developer"] } })
.explain('executionStats').executionStats
);
{
executionSuccess: true,
nReturned: 2,
executionTimeMillis: 437,
totalKeysExamined: 0,
totalDocsExamined: 1000003,
executionStages: {
stage: 'COLLSCAN',
filter: { tags: { '$eq': 'developer' } },
nReturned: 2,
executionTimeMillisEstimate: 391,
works: 1000004,
advanced: 2,
direction: 'forward',
docsExamined: 1000003
}
}
I create a jsonb_path_ops GIN index on PostgreSQL:
CREATE INDEX idx_details ON data USING GIN (details jsonb_path_ops)
;
and a multikey, sparse index on MongoDB for the array of tags:
db.data.createIndex({ tags: 1 }, { sparse: true })
;
PostgreSQL uses the GIN index to find the two index entries through a Bitmap Index Scan, and then fetches the two rows; the Recheck Cond didn't have to filter anything further out:
postgres=# explain (analyze, costs off, buffers, serialize text)
SELECT * FROM data WHERE details @> '{"tags": ["developer"]}'
;
QUERY PLAN
----------------------------------------------------------------------------------
Bitmap Heap Scan on data (actual time=0.019..0.021 rows=2 loops=1)
Recheck Cond: (details @> '{"tags": ["developer"]}'::jsonb)
Heap Blocks: exact=1
Buffers: shared hit=5
-> Bitmap Index Scan on idx_details (actual time=0.010..0.010 rows=2 loops=1)
Index Cond: (details @> '{"tags": ["developer"]}'::jsonb)
Buffers: shared hit=4
Planning:
Buffers: shared hit=1
Planning Time: 0.079 ms
Serialization: time=0.005 ms output=1kB format=text
Execution Time: 0.041 ms
MongoDB efficiently reads two index entries and retrieves the two documents without the need for bitmaps. This method preserves the index order, which is advantageous when handling multiple rows.
test> print(
db.data.find({ tags: { $all: ["developer"] } })
.explain('executionStats').executionStats
);
{
executionSuccess: true,
nReturned: 2,
executionTimeMillis: 0,
totalKeysExamined: 2,
totalDocsExamined: 2,
executionStages: {
stage: 'FETCH',
nReturned: 2,
executionTimeMillisEstimate: 0,
works: 3,
advanced: 2,
docsExamined: 2,
inputStage: {
stage: 'IXSCAN',
nReturned: 2,
executionTimeMillisEstimate: 0,
works: 3,
advanced: 2,
keyPattern: { tags: 1 },
indexName: 'tags_1',
isMultiKey: true,
multiKeyPaths: { tags: [ 'tags' ] },
isSparse: true,
isPartial: false,
direction: 'forward',
indexBounds: { tags: [ '["developer", "developer"]' ] },
keysExamined: 2,
seeks: 1,
dupsTested: 2
}
}
}
@? (jsonb, jsonpath): JSON Path Match
The following query searches for developers, as well as non-developers younger than 35:
On PostgreSQL:
SELECT * FROM data
WHERE details @? '$?(@.tags[*] == "developer" || @.age < 35)'
;
On MongoDB:
db.data.find({
$or: [
{ tags: { $elemMatch: { $eq: "developer" } } },
{ age: { $lt: 35 } }
]
});
Without an additional index on "age", MongoDB chooses a COLLSCAN:
test> print(
db.data.find({
$or: [
{ tags: { $elemMatch: { $eq: "developer" } } },
{ age: { $lt: 35 } }
]
}).explain('executionStats').executionStats
);
{
executionSuccess: true,
nReturned: 3,
executionTimeMillis: 585,
totalKeysExamined: 0,
totalDocsExamined: 1000003,
executionStages: {
isCached: false,
stage: 'SUBPLAN',
nReturned: 3,
executionTimeMillisEstimate: 547,
works: 1000004,
advanced: 3,
needTime: 1000000,
inputStage: {
stage: 'COLLSCAN',
filter: {
'$or': [
{ tags: { '$elemMatch': [Object] } },
{ age: { '$lt': 35 } }
]
},
nReturned: 3,
executionTimeMillisEstimate: 517,
works: 1000004,
advanced: 3,
needTime: 1000000,
direction: 'forward',
docsExamined: 1000003
}
}
}
PostgreSQL uses the GIN index, but not efficiently: it reads all index entries and removes them later during the recheck:
postgres=# explain (analyze, costs off, buffers, serialize text)
SELECT * FROM data
WHERE details @? '$?(@.tags[*] == "developer" || @.age < 35)'
;
QUERY PLAN
---------------------------------------------------------------------------------------------
Bitmap Heap Scan on data (actual time=582.323..582.327 rows=3 loops=1)
Recheck Cond: (details @? '$?(@."tags"[*] == "developer" || @."age" < 35)'::jsonpath)
Rows Removed by Index Recheck: 1000000
Heap Blocks: exact=10310
Buffers: shared hit=14703
-> Bitmap Index Scan on idx_details (actual time=123.755..123.755 rows=1000003 loops=1)
Index Cond: (details @? '$?(@."tags"[*] == "developer" || @."age" < 35)'::jsonpath)
Buffers: shared hit=4393
Planning:
Buffers: shared hit=1
Planning Time: 0.117 ms
Serialization: time=0.009 ms output=1kB format=text
Execution Time: 582.575 ms
The solution, in both databases, is to add an index on "age".
An expression-based index in PostgreSQL:
CREATE INDEX idx_age ON data ( ((details->>'age')::int) )
;
A regular index on MongoDB, but sparse, as I don't need to index missing values:
db.data.createIndex({ age: 1 }, { sparse: true} );
Here is the new execution plan in MongoDB, which combines the multikey index on "tags" with the regular index on "age":
print(
db.data.find({
$or: [
{ tags: { $elemMatch: { $eq: "developer" } } },
{ age: { $lt: 35 } }
]
}).explain('executionStats').executionStats
);
{
executionSuccess: true,
nReturned: 3,
executionTimeMillis: 0,
totalKeysExamined: 4,
totalDocsExamined: 5,
executionStages: {
isCached: false,
stage: 'SUBPLAN',
nReturned: 3,
executionTimeMillisEstimate: 0,
works: 6,
advanced: 3,
inputStage: {
stage: 'FETCH',
nReturned: 3,
executionTimeMillisEstimate: 0,
works: 6,
advanced: 3,
docsExamined: 3,
alreadyHasObj: 2,
inputStage: {
stage: 'OR',
nReturned: 3,
executionTimeMillisEstimate: 0,
works: 6,
advanced: 3,
dupsTested: 4,
dupsDropped: 1,
inputStages: [
{
stage: 'FETCH',
filter: { '$or': [Array] },
nReturned: 2,
executionTimeMillisEstimate: 0,
works: 3,
advanced: 2,
docsExamined: 2,
alreadyHasObj: 0,
inputStage: {
stage: 'IXSCAN',
nReturned: 2,
executionTimeMillisEstimate: 0,
works: 3,
advanced: 2,
keyPattern: [Object],
indexName: 'tags_1',
isMultiKey: true,
multiKeyPaths: [Object],
isUnique: false,
isSparse: true,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: [Object],
keysExamined: 2,
seeks: 1,
dupsTested: 2,
dupsDropped: 0
}
},
{
stage: 'IXSCAN',
nReturned: 2,
executionTimeMillisEstimate: 0,
works: 3,
advanced: 2,
keyPattern: { age: 1 },
indexName: 'age_1',
isMultiKey: false,
multiKeyPaths: { age: [] },
isUnique: false,
isSparse: true,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { age: [Array] },
keysExamined: 2,
seeks: 1,
dupsTested: 0,
dupsDropped: 0
}
]
}
}
}
}
I've detailed this OR expansion in a previous blog post: "Order-preserving or-expansion in MongoDB and $in:[,,] treated as a range skip scan, or exploded to an equality for the ESR rule" (Franck Pachot for MongoDB, Feb 27).
PostgreSQL cannot do the same, so you need to write the query differently:
postgres=# explain (analyze, costs off, buffers, serialize text)
-- Query for using GIN index on "details" column
SELECT * FROM data
WHERE details @? '$?(@.tags[*] == "developer")'
UNION
-- Query for using B-tree index on "age" column
SELECT * FROM data
WHERE (details->>'age')::int < 35
;
QUERY PLAN
----------------------------------------------------------------------------------------------
HashAggregate (actual time=0.134..0.220 rows=3 loops=1)
Group Key: data.id, data.details
Batches: 1 Memory Usage: 1561kB
Buffers: shared hit=9
-> Append (actual time=0.023..0.035 rows=4 loops=1)
Buffers: shared hit=9
-> Bitmap Heap Scan on data (actual time=0.022..0.024 rows=2 loops=1)
Recheck Cond: (details @? '$?(@."tags"[*] == "developer")'::jsonpath)
Heap Blocks: exact=1
Buffers: shared hit=5
-> Bitmap Index Scan on idx_details (actual time=0.012..0.012 rows=2 loops=1)
Index Cond: (details @? '$?(@."tags"[*] == "developer")'::jsonpath)
Buffers: shared hit=4
-> Bitmap Heap Scan on data data_1 (actual time=0.009..0.010 rows=2 loops=1)
Recheck Cond: (((details ->> 'age'::text))::integer < 35)
Heap Blocks: exact=1
Buffers: shared hit=4
-> Bitmap Index Scan on idx_age (actual time=0.008..0.008 rows=2 loops=1)
Index Cond: (((details ->> 'age'::text))::integer < 35)
Buffers: shared hit=3
Planning:
Buffers: shared hit=1
Planning Time: 0.160 ms
Serialization: time=0.006 ms output=1kB format=text
Execution Time: 0.288 ms
Another possibility in PostgreSQL is to use a compound index rather than two indexes, with the btree_gin extension:
postgres=# create extension btree_gin;
CREATE EXTENSION
postgres=# CREATE INDEX idx_details_comp2 ON data USING GIN ( details , ((details->>'age')::int));
CREATE INDEX
postgres=# explain (analyze, costs off, buffers)
-- Query for using GIN index on `details` column
SELECT * FROM data
WHERE details @? '$?(@.tags[*] == "developer")'
UNION
-- Query for using B-tree index on `age` column
SELECT * FROM data
WHERE (details->>'age')::int < 35
;
QUERY PLAN
----------------------------------------------------------------------------------------------------
HashAggregate (actual time=0.153..0.238 rows=3 loops=1)
Group Key: data.id, data.details
Batches: 1 Memory Usage: 1561kB
Buffers: shared hit=16
-> Append (actual time=0.034..0.046 rows=4 loops=1)
Buffers: shared hit=16
-> Bitmap Heap Scan on data (actual time=0.033..0.035 rows=2 loops=1)
Recheck Cond: (details @? '$?(@."tags"[*] == "developer")'::jsonpath)
Heap Blocks: exact=1
Buffers: shared hit=11
-> Bitmap Index Scan on idx_details_comp2 (actual time=0.022..0.022 rows=2 loops=1)
Index Cond: (details @? '$?(@."tags"[*] == "developer")'::jsonpath)
Buffers: shared hit=10
-> Bitmap Heap Scan on data data_1 (actual time=0.009..0.009 rows=2 loops=1)
Recheck Cond: (((details ->> 'age'::text))::integer < 35)
Heap Blocks: exact=1
Buffers: shared hit=5
-> Bitmap Index Scan on idx_details_comp2 (actual time=0.008..0.008 rows=2 loops=1)
Index Cond: (((details ->> 'age'::text))::integer < 35)
Buffers: shared hit=4
Planning:
Buffers: shared hit=4
Planning Time: 0.180 ms
Execution Time: 0.303 ms
The same index is used twice, so it doesn't improve the performance, and the query still has to be rewritten.
@@ (jsonb, jsonpath): JSON Path Predicate
The following query finds all people younger than 35.
PostgreSQL:
SELECT * FROM data WHERE details @@ '$.age < 35';
MongoDB:
db.data.find({ age: { $lt: 35 } })
MongoDB doesn't use JSON Path syntax and can use the index on "age":
test> print(
db.data.find({ age: { $lt: 35 } })
.explain('executionStats').executionStats
);
{
executionSuccess: true,
nReturned: 2,
executionTimeMillis: 0,
totalKeysExamined: 2,
totalDocsExamined: 2,
executionStages: {
isCached: false,
stage: 'FETCH',
nReturned: 2,
executionTimeMillisEstimate: 0,
works: 3,
advanced: 2,
docsExamined: 2,
alreadyHasObj: 0,
inputStage: {
stage: 'IXSCAN',
nReturned: 2,
executionTimeMillisEstimate: 0,
works: 3,
advanced: 2,
keyPattern: { age: 1 },
indexName: 'age_1',
isMultiKey: false,
multiKeyPaths: { age: [] },
isUnique: false,
isSparse: true,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { age: [ '[-inf.0, 35)' ] },
keysExamined: 2,
seeks: 1,
dupsTested: 0,
dupsDropped: 0
}
}
}
With the JSON Path syntax, PostgreSQL doesn't use the index on "age" but the GIN index. However, it reads all index entries and needs to filter them later with a Recheck Cond:
postgres=# explain (analyze, costs off, buffers, serialize text)
SELECT * FROM data WHERE details @@ '$.age < 35'
;
QUERY PLAN
--------------------------------------------------------------------------------------------
Bitmap Heap Scan on data (actual time=438.601..438.604 rows=2 loops=1)
Recheck Cond: (details @@ '($."age" < 35)'::jsonpath)
Rows Removed by Index Recheck: 1000001
Heap Blocks: exact=10310
Buffers: shared hit=14703
-> Bitmap Index Scan on idx_details (actual time=122.889..122.889 rows=1000003 loops=1)
Index Cond: (details @@ '($."age" < 35)'::jsonpath)
Buffers: shared hit=4393
Planning:
Buffers: shared hit=1
Planning Time: 0.102 ms
Serialization: time=0.009 ms output=1kB format=text
Execution Time: 438.848 ms
It is more efficient to write the query so that it can use the expression-based index:
postgres=# explain (analyze, costs off, buffers, serialize text)
SELECT * FROM data WHERE (details->>'age')::int < 35
;
QUERY PLAN
------------------------------------------------------------------------------
Bitmap Heap Scan on data (actual time=0.018..0.019 rows=2 loops=1)
Recheck Cond: (((details ->> 'age'::text))::integer < 35)
Heap Blocks: exact=1
Buffers: shared hit=4
-> Bitmap Index Scan on idx_age (actual time=0.011..0.011 rows=2 loops=1)
Index Cond: (((details ->> 'age'::text))::integer < 35)
Buffers: shared hit=3
Planning Time: 0.063 ms
Serialization: time=0.006 ms output=1kB format=text
Execution Time: 0.041 ms
In conclusion, even though a GIN index can serve JSON Path queries, it may not be optimal, and a more traditional approach is preferable.
? (jsonb, text): Key Exists
I created a GIN index using jsonb_path_ops to hold index entries for each path's values. However, to test for key presence, an additional index with jsonb_ops is necessary. Although jsonb_ops is the default operator class, I recommend avoiding it unless you need the key-existence operators.
CREATE INDEX idx_details_all ON data USING GIN (details jsonb_ops)
;
This index allows quickly checking for the presence of "age" in rows:
postgres=# explain (analyze, costs off, buffers, serialize text)
SELECT * FROM data WHERE details ? 'age'
;
QUERY PLAN
--------------------------------------------------------------------------------------
Bitmap Heap Scan on data (actual time=0.021..0.023 rows=3 loops=1)
Recheck Cond: (details ? 'age'::text)
Heap Blocks: exact=1
Buffers: shared hit=5
-> Bitmap Index Scan on idx_details_all (actual time=0.012..0.012 rows=3 loops=1)
Index Cond: (details ? 'age'::text)
Buffers: shared hit=4
Planning:
Buffers: shared hit=1
Planning Time: 0.083 ms
Serialization: time=0.006 ms output=1kB format=text
Execution Time: 0.045 ms
An alternative to this is to create a partial index covering only the rows that contain an age.
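A sketch of that partial-index alternative (the index name is mine; the WHERE clause uses the key-existence operator, so only documents with an "age" key get an index entry, and the planner can use it for queries carrying the same predicate):

```sql
-- Hypothetical partial index: indexes the age value only where the key exists
CREATE INDEX idx_age_partial ON data ( ((details->>'age')::int) )
 WHERE details ? 'age';
-- Usable for existence checks with the matching predicate:
SELECT * FROM data WHERE details ? 'age';
```

This keeps the index small on a collection where most documents lack an "age", similar in spirit to MongoDB's sparse indexes.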
Similarly, in MongoDB, the sparse index created above can be used to test for the existence of the "age" field:
test> print(
db.data.find({ age: { $exists: true } })
.explain('executionStats').executionStats
);
{
executionSuccess: true,
nReturned: 3,
executionTimeMillis: 0,
totalKeysExamined: 3,
totalDocsExamined: 3,
executionStages: {
isCached: false,
stage: 'FETCH',
nReturned: 3,
executionTimeMillisEstimate: 0,
works: 4,
advanced: 3,
docsExamined: 3,
alreadyHasObj: 0,
inputStage: {
stage: 'IXSCAN',
nReturned: 3,
executionTimeMillisEstimate: 0,
works: 4,
advanced: 3,
keyPattern: { age: 1 },
indexName: 'age_1',
isMultiKey: false,
multiKeyPaths: { age: [] },
isUnique: false,
isSparse: true,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { age: [ '[MinKey, MaxKey]' ] },
keysExamined: 3,
seeks: 1,
dupsTested: 0,
dupsDropped: 0
}
}
}
?& (jsonb, text[]): All Keys Exist
The PostgreSQL GIN index, with jsonb_ops rather than jsonb_path_ops, can be used to test the existence of multiple keys:
postgres=# explain (analyze, costs off, buffers, serialize text)
SELECT * FROM data WHERE details ?& '{name, age}'
;
QUERY PLAN
--------------------------------------------------------------------------------------
Bitmap Heap Scan on data (actual time=0.022..0.023 rows=3 loops=1)
Recheck Cond: (details ?& '{name,age}'::text[])
Heap Blocks: exact=1
Buffers: shared hit=8
-> Bitmap Index Scan on idx_details_all (actual time=0.014..0.014 rows=3 loops=1)
Index Cond: (details ?& '{name,age}'::text[])
Buffers: shared hit=7
Planning:
Buffers: shared hit=1
Planning Time: 0.085 ms
Serialization: time=0.006 ms output=1kB format=text
Execution Time: 0.045 ms
In MongoDB, this uses the index on "age":
test> print( db.data.find({
$and: [
{ name: { $exists: true } },
{ age: { $exists: true } }
]
}).explain('executionStats').executionStats )
;
{
executionSuccess: true,
nReturned: 3,
executionTimeMillis: 0,
totalKeysExamined: 3,
totalDocsExamined: 3,
executionStages: {
isCached: false,
stage: 'FETCH',
filter: { name: { '$exists': true } },
nReturned: 3,
executionTimeMillisEstimate: 0,
works: 4,
advanced: 3,
docsExamined: 3,
alreadyHasObj: 0,
inputStage: {
stage: 'IXSCAN',
nReturned: 3,
executionTimeMillisEstimate: 0,
works: 4,
advanced: 3,
keyPattern: { age: 1 },
indexName: 'age_1',
isMultiKey: false,
multiKeyPaths: { age: [] },
isUnique: false,
isSparse: true,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { age: [ '[MinKey, MaxKey]' ] },
keysExamined: 3,
seeks: 1,
dupsTested: 0,
dupsDropped: 0
}
}
}
}
The index on "age" was used, and the presence of "name" was filtered on the fetched documents because I have no index on it.
?| (jsonb, text[]): Any Key Exists
The PostgreSQL GIN index can be used to test the existence of any key in a list:
postgres=# explain (analyze, costs off, buffers, serialize text)
SELECT * FROM data WHERE details ?| '{name, tags}'
;
QUERY PLAN
--------------------------------------------------------------------------------------
Bitmap Heap Scan on data (actual time=0.022..0.023 rows=3 loops=1)
Recheck Cond: (details ?| '{name,tags}'::text[])
Heap Blocks: exact=1
Buffers: shared hit=8
-> Bitmap Index Scan on idx_details_all (actual time=0.014..0.014 rows=3 loops=1)
Index Cond: (details ?| '{name,tags}'::text[])
Buffers: shared hit=7
Planning:
Buffers: shared hit=1
Planning Time: 0.088 ms
Serialization: time=0.007 ms output=1kB format=text
Execution Time: 0.046 ms
MongoDB must have an index for each key, and easily combines them:
test> db.data.createIndex({ name: 1 } , { sparse: true })
name_1
test> print(
db.data.find({ $or: [
{ name: { $exists: true } },
{ tags: { $exists: true } }
] }).explain('executionStats').executionStats
);
{
executionSuccess: true,
nReturned: 3,
executionTimeMillis: 0,
totalKeysExamined: 8,
totalDocsExamined: 3,
executionStages: {
isCached: false,
stage: 'SUBPLAN',
nReturned: 3,
executionTimeMillisEstimate: 0,
works: 10,
advanced: 3,
needTime: 6,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 1,
inputStage: {
stage: 'FETCH',
nReturned: 3,
executionTimeMillisEstimate: 0,
works: 10,
advanced: 3,
needTime: 6,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 1,
docsExamined: 3,
alreadyHasObj: 0,
inputStage: {
stage: 'OR',
nReturned: 3,
executionTimeMillisEstimate: 0,
works: 10,
advanced: 3,
dupsTested: 6,
dupsDropped: 3,
inputStages: [
{
stage: 'IXSCAN',
nReturned: 3,
executionTimeMillisEstimate: 0,
works: 6,
advanced: 3,
needTime: 2,
keyPattern: { tags: 1 },
indexName: 'tags_1',
isMultiKey: true,
multiKeyPaths: { tags: [Array] },
isUnique: false,
isSparse: true,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { tags: [Array] },
keysExamined: 5,
seeks: 1,
dupsTested: 5,
dupsDropped: 2
},
{
stage: 'IXSCAN',
nReturned: 3,
executionTimeMillisEstimate: 0,
works: 4,
advanced: 3,
keyPattern: { name: 1 },
indexName: 'name_1',
isMultiKey: false,
multiKeyPaths: { name: [] },
isUnique: false,
isSparse: true,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { name: [Array] },
keysExamined: 3,
seeks: 1,
dupsTested: 0,
dupsDropped: 0
}
]
}
}
}
}
MongoDB Wildcard Indexes for Ad-Hoc Queries
In MongoDB, multiple indexes can be created to manage nested structures like subdocuments and arrays. However, if access patterns are unknown, a single wildcard index can simplify the process:
test> db.data.createIndex({"$**": 1});
$**_1
test>
You can remove all indexes created in this blog post and create only this wildcard index. All execution plans will be the same and provide the same performance. The only variation is the overhead at insert time due to the increased number of index entries.
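For instance, dropping the earlier indexes and keeping only the wildcard index could look like this in mongosh (a sketch; the index names are the defaults generated by the earlier createIndex calls):

```javascript
// Drop the single-field indexes created earlier in this post
db.data.dropIndex("tags_1");
db.data.dropIndex("age_1");
db.data.dropIndex("name_1");
// A single wildcard index then covers all fields for ad-hoc queries
db.data.createIndex({ "$**": 1 });
```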
Comparison PostgreSQL JSONB vs MongoDB Documents
In conclusion, both PostgreSQL and MongoDB support JSON data storage without a predefined schema, but they differ in their approaches. PostgreSQL offers GIN indexing and JSON Path for ad-hoc queries, whereas MongoDB indexes all fields automatically if a wildcard is used during index creation, without the need for another index type.
PostgreSQL's JSONB is an additional datatype designed for unstructured data, utilizing GIN indexing for schema-on-read functionality. While expression indexes can be applied to known scalar fields, they do not extend to arrays. JSONB complements SQL relations rather than replacing document databases. Proper indexing, accurate cardinality estimates, and updates should target relational columns, which should remain normalized and joined at read time.
MongoDB supports varied schemas across documents and employs indexing similar to relational databases to optimize access patterns. Its indexing mechanism allows collections to be queried efficiently, with the query planner selecting the appropriate index. MongoDB does not require adherence to the first normal form and supports arrays, leading to multikey indexes similar to inverted indexes.
Notably, documents need not specify indexed fields, and sparse indexes can efficiently verify key presence. For ad-hoc queries where the schema is unknown in advance, MongoDB allows the creation of wildcard indexes. This results in a hybrid model combining schema-on-write for explicit fields and schema-on-read for others. The schema-on-write part can be enforced by Schema Validation.
While MongoDB documents and JSONB columns share JSON format, MongoDB documents resemble extended SQL rows with fewer normalization constraints and greater flexibility for arrays and subdocuments. In contrast, PostgreSQL JSONB allows the addition of subdocuments for storing unstructured data within a relational schema.