DEV Community

Cover image for Extended RUM in DocumentDB: B-tree-like ordered scans for flexible BSON in PostgreSQL
Franck Pachot
Franck Pachot

Posted on

Extended RUM in DocumentDB: B-tree-like ordered scans for flexible BSON in PostgreSQL

The main challenge in document database indexing is the flexibility of fields: the same path can be a scalar, an array, nested, or missing. Despite this, an index must specify what it covers and the order in which rows can be produced. B-tree indexes work well for fixed-scalar columns, enabling prefix filtering and returning sorted rows. GIN and RUM inverted indexes support flexible, repeated values, but traditional GIN doesn't support range queries or sorting, and RUM ordering relies on distance operators on attached values rather than standard document-style ORDER BY field LIMIT n.

DocumentDB's Extended RUM closes that gap. It extends the RUM access method for compound document indexes by generating composite index terms from the indexed paths and applying an ordering transform during the scan. The result is an inverted, multikey-style index that can filter, sort, and stop at LIMIT in a single Index Scan, while preserving document semantics for arrays and missing fields.

Here is the table I created for my previous blog post, RUM—Storing More in the Index:


postgres=# \d articles
                                                                   Table "public.articles"

  Column   |            Type             | Collation | Nullable |                                           Default                                         
-----------+-----------------------------+-----------+----------+---------------------------------------------------------------------------------------------
 id        | integer                     |           | not null | nextval('articles_id_seq'::regclass)
 title     | text                        |           | not null |
 body      | text                        |           | not null |
 category  | text                        |           | not null |
 published | timestamp without time zone |           | not null |
 score     | integer                     |           | not null |
 tsv       | tsvector                    |           |          | generated always as (to_tsvector('simple'::regconfig, (title || ' '::text) || body)) stored

Indexes:

    "articles_pkey" PRIMARY KEY, btree (id)
    "idx_gin_tsv" gin (tsv)
    "idx_rum_multi" rum (tsv rum_tsvector_addon_ops, category, published) WITH (attach=published, "to"=tsv)

Enter fullscreen mode Exit fullscreen mode

The RUM index supports filtering and ordering by distance:


--EXPLAIN (COSTS OFF, ANALYZE ON, BUFFERS ON, VERBOSE ON)  
SELECT id, published
 , published <=> '1970-01-01'::timestamp as " <=> 1970"
 , extract ( epoch from published )      as " epoch "
FROM articles
WHERE tsv @@ to_tsquery('english', 'postgresql')
  AND category = 'tech'
ORDER BY published <=> '1970-06-01'::timestamp
LIMIT 5
;

 id  |      published      |  <=> 1970  |       epoch
-----+---------------------+------------+-------------------
  20 | 2020-01-01 20:00:00 | 1577908800 | 1577908800.000000
  40 | 2020-01-02 16:00:00 | 1577980800 | 1577980800.000000
  60 | 2020-01-03 12:00:00 | 1578052800 | 1578052800.000000
  80 | 2020-01-04 08:00:00 | 1578124800 | 1578124800.000000
 100 | 2020-01-05 04:00:00 | 1578196800 | 1578196800.000000

(5 rows)

                                                                                                                                                                                                                                                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=70.167..70.177 rows=5 loops=1)
   Output: id, published, ((published <=> '1970-01-01 00:00:00'::timestamp without time zone)), (EXTRACT(epoch FROM published)), ((published <=> '1970-06-01 00:00:00'::timestamp without time zone))
   Buffers: shared hit=223, temp read=550 written=550
   ->  Index Scan using idx_rum_multi on public.articles (actual time=70.165..70.175 rows=5 loops=1)
         Output: id, published, (published <=> '1970-01-01 00:00:00'::timestamp without time zone), EXTRACT(epoch FROM published), (published <=> '1970-06-01 00:00:00'::timestamp without time zone)
         Index Cond: ((articles.tsv @@ '''postgresql'''::tsquery) AND (articles.category = 'tech'::text))
         Order By: (articles.published <=> '1970-06-01 00:00:00'::timestamp without time zone)
         Buffers: shared hit=223, temp read=550 written=550
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.124 ms
 Execution Time: 70.698 ms

(12 rows)

Enter fullscreen mode Exit fullscreen mode

Although Order By is integrated into the Index Scan, temp read reveals that it isn't a straightforward ordered index traversal, unlike a B-tree. Internal RUM scan processes spilled over to temporary storage. The key point is that there's no PostgreSQL Sort node involved. However, this is still distance-based ordering rather than simple key ordering.

I used the distance operator <=> with a date earlier than any date in this table, so the query effectively retrieves the first five articles sorted by published date. RUM allows ordering by its distance operators on attached values, such as published <=> constant. This can resemble chronological ordering when the constant is outside the data range, but it isn't the same as a simple ORDER BY published.

If I use a basic ORDER BY in my query without applying the distance operator, I obtain the same result, but it takes longer to execute:


--EXPLAIN (COSTS OFF, ANALYZE ON, BUFFERS ON, VERBOSE ON)  
SELECT id, published
 , published <=> '1970-01-01'::timestamp as " <=> 1970"
 , extract ( epoch from published )      as " epoch "
FROM articles
WHERE tsv @@ to_tsquery('english', 'postgresql')
  AND category = 'tech'
ORDER BY published
LIMIT 5
;

 id  |      published      |  <=> 1970  |       epoch
-----+---------------------+------------+-------------------
  20 | 2020-01-01 20:00:00 | 1577908800 | 1577908800.000000
  40 | 2020-01-02 16:00:00 | 1577980800 | 1577980800.000000
  60 | 2020-01-03 12:00:00 | 1578052800 | 1578052800.000000
  80 | 2020-01-04 08:00:00 | 1578124800 | 1578124800.000000
 100 | 2020-01-05 04:00:00 | 1578196800 | 1578196800.000000

(5 rows)

                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=21870.366..21870.368 rows=5 loops=1)
   Output: id, published, ((published <=> '1970-01-01 00:00:00'::timestamp without time zone)), (EXTRACT(epoch FROM published))
   Buffers: shared hit=3 read=50215 written=1
   ->  Sort (actual time=21870.364..21870.366 rows=5 loops=1)
         Output: id, published, ((published <=> '1970-01-01 00:00:00'::timestamp without time zone)), (EXTRACT(epoch FROM published))
         Sort Key: articles.published
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=3 read=50215 written=1
         ->  Bitmap Heap Scan on public.articles (actual time=85.266..21852.624 rows=50000 loops=1)
               Output: id, published, (published <=> '1970-01-01 00:00:00'::timestamp without time zone), EXTRACT(epoch FROM published)
               Recheck Cond: ((articles.tsv @@ '''postgresql'''::tsquery) AND (articles.category = 'tech'::text))
               Heap Blocks: exact=50000
               Buffers: shared hit=3 read=50215 written=1
               ->  Bitmap Index Scan on idx_rum_multi (actual time=75.951..75.951 rows=50000 loops=1)
                     Index Cond: ((articles.tsv @@ '''postgresql'''::tsquery) AND (articles.category = 'tech'::text))
                     Buffers: shared hit=3 read=215
 Planning:
   Buffers: shared read=2
 Planning Time: 2.276 ms
 Execution Time: 21870.406 ms

(20 rows)

Enter fullscreen mode Exit fullscreen mode

RUM's ordering mechanism uses the <=> distance operator, which measures distance from a reference point. This differs from simply using ORDER BY published. When you directly apply ORDER BY published, RUM defaults to a bitmap scan combined with sorting.

For a query involving 1 million articles filtered by words = 'postgresql' and category = 'tech' (which yields 50K matches here) and sorted by published, it reads all matching rows and sorts them, taking 21 seconds because it reads 50K heap blocks instead of stopping after finding just 5 rows like expected with ORDER BY ... LIMIT.

In tables with a strict schema and no arrays, the solution is straightforward. B-tree indexes store entries in sorted order, enabling efficient filtering and retrieval. A composite B-tree index on (category, published) can filter for category='tech' and return results already ordered by published, eliminating the need for a separate sort step:

postgres=# CREATE INDEX idx_articles_category_published
           ON articles (category, published)
;

postgres=# EXPLAIN (COSTS OFF, ANALYZE ON, BUFFERS ON, VERBOSE ON)  
 SELECT id, published
  , published <=> '1970-01-01'::timestamp as " <=> 1970"
  , extract ( epoch from published )      as " epoch "
 FROM articles
 WHERE tsv @@ to_tsquery('english', 'postgresql')
   AND category = 'tech'
 ORDER BY published
 LIMIT 5
;

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=0.036..0.050 rows=5 loops=1)
   Output: id, published, ((published <=> '1970-01-01 00:00:00'::timestamp without time zone)), (EXTRACT(epoch FROM published))
   Buffers: shared hit=9
   ->  Index Scan using idx_articles_category_published on public.articles (actual time=0.035..0.048 rows=5 loops=1)
         Output: id, published, (published <=> '1970-01-01 00:00:00'::timestamp without time zone), EXTRACT(epoch FROM published)
         Index Cond: (articles.category = 'tech'::text)
         Filter: (articles.tsv @@ '''postgresql'''::tsquery)
         Rows Removed by Filter: 20
         Buffers: shared hit=9
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.129 ms
 Execution Time: 0.065 ms

(13 rows)

Enter fullscreen mode Exit fullscreen mode

This is fast because the (category, published) B-tree walks category='tech' entries in timestamp order, and the filter predicate happens to reject only 20 rows before finding 5 matches. If the text predicate were much more selective or poorly correlated with published, this plan could also scan many rows.

B-trees index only columns with well-typed scalar values. A B-tree can index extracted scalar expressions from JSONB, but it does not naturally support document-database multikey semantics, where the same path may be scalar, an array, nested, or absent. For that, you need an inverted/multikey-style index. Here, I skipped indexing "words" because I know it can contain an array, and it relies on the fact that "category" can contain only one value. This is true in SQL, where the schema is declared for the table, but not for a polymorphic document collection.

To show the same access pattern with flexible documents, I use the DocumentDB extension for PostgreSQL. I create a collection to store the same data in a flexible schema:


SELECT documentdb_api.create_collection('db', 'articles');

SELECT documentdb_api.insert_one(
    'db',
    'articles',
    FORMAT(
        '{"_id": %s, "title": %s, "body": %s, "category": %s, "published": {"$date": {"$numberLong": "%s"}}, "score": %s, "words": %s}',
        to_json(id),
        to_json(title),
        to_json(body),
        to_json(category),
        (EXTRACT(EPOCH FROM published) * 1000)::bigint,
        to_json(score),
        to_json(tsvector_to_array(tsv))
    )::documentdb_core.bson
)
FROM articles;

SELECT documentdb_api_internal.create_indexes_non_concurrently('db',   
    '{ "createIndexes": "articles", "indexes": [ {   
        "key": { "words":1, "category": 1, "published": -1 },   
        "name": "idx_wrd_cat_pub"   
    } ] }',   
    true)
;
Enter fullscreen mode Exit fullscreen mode

This is similar to the "articles" table and index, but in a collection where the data type and cardinality don't have to be declared in advance. The index definition doesn't need to know that "words" contains an array and that there's only one "category" per article.

Here is a similar query and its execution plan:


postgres=# SET documentdb_core.bsonUseEJson to true;

SET

postgres=# SELECT documentdb_api_catalog.bson_dollar_unwind(
 cursorpage, '$cursor.firstBatch'
) FROM documentdb_api.find_cursor_first_page(
    'db', '{
         "find": "articles",
         "filter": { "words": "postgresql", "category": "tech" },
         "sort": { "published": 1 },
         "limit": 5,
         "projection": { "_id": 1, "title": 1, "published": 1 }
}'::documentdb_core.bson
)
;
                                                                                                                                   bson_dollar_unwind

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------
 { "cursor" : { "id" : { "$numberLong" : "0" }, "ns" : "db.articles", "firstBatch" : { "_id" : { "$numberInt" : "20" }, "title" : "Article about postgresql and postgresql", "published" : { "$date" : { "$numberLong" : "1577908800000" } } } }, "ok" : { "$numberDouble" : "
1.0" } }
 { "cursor" : { "id" : { "$numberLong" : "0" }, "ns" : "db.articles", "firstBatch" : { "_id" : { "$numberInt" : "40" }, "title" : "Article about postgresql and postgresql", "published" : { "$date" : { "$numberLong" : "1577980800000" } } } }, "ok" : { "$numberDouble" : "
1.0" } }
 { "cursor" : { "id" : { "$numberLong" : "0" }, "ns" : "db.articles", "firstBatch" : { "_id" : { "$numberInt" : "60" }, "title" : "Article about postgresql and postgresql", "published" : { "$date" : { "$numberLong" : "1578052800000" } } } }, "ok" : { "$numberDouble" : "
1.0" } }
 { "cursor" : { "id" : { "$numberLong" : "0" }, "ns" : "db.articles", "firstBatch" : { "_id" : { "$numberInt" : "80" }, "title" : "Article about postgresql and postgresql", "published" : { "$date" : { "$numberLong" : "1578124800000" } } } }, "ok" : { "$numberDouble" : "
1.0" } }
 { "cursor" : { "id" : { "$numberLong" : "0" }, "ns" : "db.articles", "firstBatch" : { "_id" : { "$numberInt" : "100" }, "title" : "Article about postgresql and postgresql", "published" : { "$date" : { "$numberLong" : "1578196800000" } } } }, "ok" : { "$numberDouble" :
"1.0" } }

(5 rows)

postgres=# EXPLAIN (COSTS OFF, ANALYZE ON, BUFFERS ON, VERBOSE ON)
SELECT documentdb_api_catalog.bson_dollar_unwind(
 cursorpage, '$cursor.firstBatch'
) FROM documentdb_api.find_cursor_first_page(
    'db', '{
         "find": "articles",
         "filter": { "words": "postgresql", "category": "tech" },
         "sort": { "published": 1 },
         "limit": 5,
         "projection": { "_id": 1, "title": 1, "published": 1 }
}'::documentdb_core.bson
)
;
                                                                                                                                                                                                                                                                                                                                                                                                     QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
 ProjectSet (actual time=0.294..0.295 rows=5 loops=1)
   Output: bson_dollar_unwind(cursorpage, '$cursor.firstBatch'::text)
   Buffers: shared hit=14
   ->  Function Scan on documentdb_api.find_cursor_first_page (actual time=0.284..0.284 rows=1 loops=1)
         Output: cursorpage, continuation, persistconnection, cursorid
         Function Call: documentdb_api.find_cursor_first_page('db'::text, '{ "find" : "articles", "filter" : { "words" : "postgresql", "category" : "tech" }, "sort" : { "published" : { "$numberInt" : "1" } }, "limit" : { "$numberInt" : "5" }, "projection" : { "_id" : {
"$numberInt" : "1" }, "title" : { "$numberInt" : "1" }, "published" : { "$numberInt" : "1" } } }'::bson, '0'::bigint)
         Buffers: shared hit=14
 Planning Time: 0.046 ms
 Execution Time: 0.308 ms

(9 rows)
Enter fullscreen mode Exit fullscreen mode

This is extremely fast, reading only 14 buffers, including additional lookups inside find_cursor_first_page, and is as efficient as the B-tree index on the SQL table, but on a flexible document collection.

Extended RUM is an extension of the RUM access method that shares the same on-disk page layout but overrides the scan, ordering, and cost-estimation entry points. The key addition is the ordered composite index, which matches the features of a multi-key index in MongoDB.

The find_cursor_first_page function executes everything inside a C function, so PostgreSQL's EXPLAIN only sees it as a black-box Function Scan node. To see the internal plan, I use bson_aggregation_pipeline, which generates inline SQL that the planner can optimize and expose:

postgres=# EXPLAIN (COSTS OFF, ANALYZE ON, BUFFERS ON, VERBOSE ON)
SELECT document FROM documentdb_api_catalog.bson_aggregation_pipeline(
    'db',
    '{"aggregate": "articles", "pipeline": [
        {"$match": {  "words": "postgresql", "category": "tech" }},
        {"$sort": { "published": 1 }},
        {"$limit": 5},
        {"$project": { "_id": 1, "title": 1, "published": 1 }}
    ], "cursor": {}}'::documentdb_core.bson
);
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------
 Subquery Scan on agg_stage_3 (actual time=0.058..0.075 rows=5 loops=1)
   Output: documentdb_api_internal.bson_dollar_project(agg_stage_3.document, '{ "_id" : { "$numberInt" : "1" }, "title" : { "$numberInt" : "1" }, "published" : { "$numberInt" : "1" } }'::bson, '{ "now" : { "$date" : { "$numberLong" : "1783073064776" } } }'::bson)
   Buffers: shared hit=10
   ->  Limit (actual time=0.051..0.064 rows=5 loops=1)
         Output: collection.document, (bson_orderby(collection.document, '{ "published" : { "$numberInt" : "1" } }'::bson))
         Buffers: shared hit=10
         ->  Index Scan using idx_wrd_cat_pub on documentdb_data.documents_28 collection (actual time=0.050..0.062 rows=5 loops=1)
               Output: collection.document, bson_orderby(collection.document, '{ "published" : { "$numberInt" : "1" } }'::bson)
               Index Cond: ((collection.document @= '{ "words" : "postgresql" }'::bson) AND (collection.document @= '{ "category" : "tech" }'::bson))
               Order By: (collection.document OPERATOR(documentdb_api_internal.<>-|) '{ "published" : { "$numberInt" : "1" } }'::bson)
               Buffers: shared hit=10
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.305 ms
 Execution Time: 0.107 ms

(15 rows)

Enter fullscreen mode Exit fullscreen mode

The plan shows:

  • Index Scan using idx_wrd_cat_pub, the DocumentDB Extended RUM compound index on pathspec='[ "words", "category", { "published": -1 } ]'
  • Index Cond filter on words: "postgresql" and category = "tech", which are in the index prefix and can be scalars or arrays
  • Order By using the ordering <>-| operator on published: 1, which pushes the ascending sort to the index by scanning the descending index in reverse
  • Limit that stops after 5 rows, with no extra work

The result is the desired access pattern: 5 matching rows are returned after examining only 10 cached blocks, with no sort and 0.1 ms execution time.

The sort key path is extracted from the index term at scan time, matching the sort field to the appropriate position in the compound index and returning a comparable value that PostgreSQL can use for ordered retrieval. The result is an Index Scan (not a Bitmap Index Scan) that returns rows in the requested order directly from the inverted index.

DocumentDB's extended RUM achieves B-tree-like performance while retaining the inverted index's ability to handle flexible document fields — values that can be scalars, arrays, missing, or nested objects.

Like B-tree indexes, it can be used for both forward and backward lookups. In this example, I've created the index in descending order ({ "words": 1, "category": 1, "published": -1 }) but it was scanned in ascending order ({"$sort": {"published": 1}}).

Like multi-key indexes in MongoDB, when a field contains an array, the index stores one entry per array element (not one entry per document). The index cannot reconstruct the full array from a single entry, so the document must be fetched. However, when the query doesn't need any data beyond what the index condition already validates, an Index Only Scan is possible:

postgres=# SELECT documentdb_api_internal.create_indexes_non_concurrently('db',
    '{ "createIndexes": "articles", "indexes": [ {
        "key": { "category": 1, "published": 1 },
        "name": "idx_cat_pub"
    } ] }',
    true);

postgres=# VACUUM;

VACUUM

postgres=# EXPLAIN (COSTS OFF, ANALYZE ON, BUFFERS ON, VERBOSE ON)
SELECT document
FROM documentdb_api_catalog.bson_aggregation_pipeline(
    'db',
    '{
        "aggregate": "articles",
        "pipeline": [
            { "$match": { "category": "tech" } },
            { "$count": "count" }
        ],
        "cursor": {}
    }'::documentdb_core.bson
);

                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (actual time=165.000..165.001 rows=1 loops=1)
   Output: bson_repath_and_build('count'::text, documentdb_api_internal.bsoncount(1))
   Buffers: shared hit=3172
   ->  Index Only Scan using idx_cat_pub on documentdb_data.documents_28 collection (actual time=0.032..149.004 rows=250000 loops=1)
         Output: collection.document
         Index Cond: (collection.document @= '{ "category" : "tech" }'::bson)
         Heap Fetches: 0
         Buffers: shared hit=3172
 Planning:
   Buffers: shared hit=10
 Planning Time: 0.197 ms
 Execution Time: 165.025 ms

(12 rows)

Enter fullscreen mode Exit fullscreen mode

The $count only needs to count matching entries, not return field values. The index can confirm that category = "tech" matches without fetching the heap row. Heap Fetches: 0 because the visibility map confirms all rows are visible, so no heap fetch is needed. This requires the table to have been vacuumed, which is usually done by autovacuum. Counting 250,000 documents out of one million takes 165 milliseconds with an Index Only Scan. It would take 30 seconds if it had to fetch all documents with a Bitmap Heap Scan.

Here's an overview of the internal process:

  1. Index term composition: When a document is added, DocumentDB creates a combined index term that encodes all relevant paths (such as category and published date) into a single serialized bytea entry within the RUM entry tree.

  2. Ordered scan: During query execution, if the sort order matches an indexed path, the RUM scan applies the ordering transform function to retrieve the sort key from each composite term. This allows PostgreSQL's Index Scan node to navigate the index in the specified order, reading only necessary entries to meet the LIMIT.

  3. Index-only scan support: If the query can be answered solely using index entries, PostgreSQL can bypass heap fetches when the visibility map indicates all heap pages are visible.

  4. Forward and reverse traversal: The index supports both ascending and descending orderings, aligning with MongoDB's 1 and -1 sort specifications.

The execution plan reveals the internal table name, and you can look at the definition:


postgres=# \d documentdb_data.documents_28

                  Table "documentdb_data.documents_28"

     Column      |         Type         | Collation | Nullable | Default
-----------------+----------------------+-----------+----------+---------
 shard_key_value | bigint               |           | not null |
 object_id       | documentdb_core.bson |           | not null |
 document        | documentdb_core.bson |           | not null |

Indexes:

    "collection_pk_28" PRIMARY KEY, btree (shard_key_value, object_id)
    "documents_rum_index_69" documentdb_extended_rum (document documentdb_extended_rum_catalog.bson_extended_rum_composite_path_ops (pathspec='[ "words", "category", { "published" : -1 } ]', tl='2691'))
    "documents_rum_index_70" documentdb_extended_rum (document documentdb_extended_rum_catalog.bson_extended_rum_composite_path_ops (pathspec='[ "category", "published" ]', tl='2691'))

Check constraints:

    "shard_key_value_check" CHECK (shard_key_value = '28'::bigint)

Enter fullscreen mode Exit fullscreen mode

The indexes created by the DocumentDB API are Extended RUM indexes: documentdb_extended_rum, available with the DocumentDB extension for PostgreSQL. They index a pathspec that includes a sort direction. tl='2691' is the truncation limit because PostgreSQL index entries are limited to about one-third of an 8KB block. Truncated terms still support filtering, with a recheck against the heap.

I used the DocumentDB API in PostgreSQL to view the internal execution plan, but I can also run the same query using the MongoDB-compatible endpoint:

$ mongosh 'mongodb://ddb:ddb@localhost:10260/?tls=true&tlsAllowInvalidCertificates=true'

Current Mongosh Log ID: 6a4780a124863da54dd4b0c1
Connecting to:          mongodb://<credentials>@localhost:10260/?tls=true&tlsAllowInvalidCertificates=true&directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.3.7
Using MongoDB:          7.0.0
Using Mongosh:          2.3.7
mongosh 2.9.2 is available for download: https://www.mongodb.com/try/download/shell

For mongosh info see: https://www.mongodb.com/docs/mongodb-shell/

test> use db;

switched to db db

db> db.articles.find(
 { "words": "postgresql", "category": "tech"}
).sort(
 { "published": 1 }
).limit(5);

[
  {
    _id: 20,
    title: 'Article about postgresql and postgresql',
    body: 'This is the body discussing postgresql in the context of postgresql. We also mention postgresql here.',
    category: 'tech',
    published: ISODate('2020-01-01T20:00:00.000Z'),
    score: 40,
    words: [ 'about', 'also', 'and', 'article', 'body', 'context', 'discussing', 'here', 'in', 'is', 'mention', 'of', 'postgresql', 'the', 'this', 'we' ]
  },
  {
    _id: 40,
    title: 'Article about postgresql and postgresql',
    body: 'This is the body discussing postgresql in the context of postgresql. We also mention postgresql here.',
    category: 'tech',
    published: ISODate('2020-01-02T16:00:00.000Z'),
    score: 80,
    words: [ 'about', 'also', 'and', 'article', 'body', 'context', 'discussing', 'here', 'in', 'is', 'mention', 'of', 'postgresql', 'the', 'this', 'we' ]
  },
  {
    _id: 60,
    title: 'Article about postgresql and postgresql',
    body: 'This is the body discussing postgresql in the context of postgresql. We also mention postgresql here.',
    category: 'tech',
    published: ISODate('2020-01-03T12:00:00.000Z'),
    score: 20,
    words: [ 'about', 'also', 'and', 'article', 'body', 'context', 'discussing', 'here', 'in', 'is', 'mention', 'of', 'postgresql', 'the', 'this', 'we' ]
  },
  {
    _id: 80,
    title: 'Article about postgresql and postgresql',
    body: 'This is the body discussing postgresql in the context of postgresql. We also mention postgresql here.',
    category: 'tech',
    published: ISODate('2020-01-04T08:00:00.000Z'),
    score: 60,
    words: [ 'about', 'also', 'and', 'article', 'body', 'context', 'discussing', 'here', 'in', 'is', 'mention', 'of', 'postgresql', 'the', 'this', 'we' ]
  },
  {
    _id: 100,
    title: 'Article about postgresql and postgresql',
    body: 'This is the body discussing postgresql in the context of postgresql. We also mention postgresql here.',
    category: 'tech',
    published: ISODate('2020-01-05T04:00:00.000Z'),
    score: 0,
    words: [ 'about', 'also', 'and', 'article', 'body', 'context', 'discussing', 'here', 'in', 'is', 'mention', 'of', 'postgresql', 'the', 'this', 'we' ]
  }
]

db> db.articles.find(
 { "words": "postgresql", "category": "tech"}
).sort(
 { "published": 1 }
).limit(5).explain("executionStats").executionStats;

{
  nReturned: Long('5'),
  executionTimeMillis: 0.067,
  executionStartAtTimeMillis: 0.054,
  totalDocsExamined: Long('5'),
  totalKeysExamined: Long('5'),
  executionStages: {
    stage: 'LIMIT',
    nReturned: Long('5'),
    executionTimeMillis: 0.067,
    executionStartAtTimeMillis: 0.054,
    totalDocsExamined: 5,
    totalKeysExamined: 5,
    numBlocksFromCache: 10,
    inputStage: {
      stage: 'FETCH',
      nReturned: Long('5'),
      executionTimeMillis: 0.065,
      executionStartAtTimeMillis: 0.053,
      totalKeysExamined: 5,
      numBlocksFromCache: 10,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: Long('5'),
        executionTimeMillis: 0.065,
        executionStartAtTimeMillis: 0.053,
        indexName: 'idx_wrd_cat_pub',
        totalKeysExamined: 5,
        numBlocksFromCache: 10
      }
    }
  }
}

Enter fullscreen mode Exit fullscreen mode

This query is a good example of an ESR pattern: Equality, Sort, Range, common in OLTP applications using MongoDB-style indexes. The MongoDB-compatible plan shows that we get the result from IXSCAN directly (totalKeysExamined: 5, nReturned: Long('5'),) in an inexpensive way (numBlocksFromCache: 10, executionTimeMillis: 0.065) without an additional sort operation, like in MongoDB.

In DocumentDB, the index supports equality on words and category, whether they are scalar or array items, ordered retrieval on published, and early termination at LIMIT 5, all in a single Index Scan.

The important contrast is with traditional PostgreSQL inverted indexes. GIN and classic RUM are excellent for matching flexible predicates, but they do not naturally provide index-ordered retrieval for document-style queries such as ORDER BY field LIMIT n. Without index-supported ordering, PostgreSQL must read all matching rows and sort them, even when only the first few rows are needed.

Extended RUM fills this gap. It keeps the multikey behavior expected from a document index while adding B-tree-like ordered traversal for compound document paths. This allows flexible document queries to efficiently combine filtering, ordering, and early LIMIT termination.

Among the MongoDB-compatible services I tested, I haven't yet found equivalent support for index-ordered retrieval with multikey predicates and sorting. For instance, Oracle multi-value indexes can't keep the order from the index because they need to deduplicate keys after scanning (see here). I also tested Amazon DocumentDB on AWS (see here), and Google Firestore on GCP (see here), and both require reading more and sorting afterward.

DocumentDB’s Extended RUM indexes are available through the DocumentDB extension for PostgreSQL. The project is open source at https://documentdb.io/, originally developed by Microsoft for Cosmos DB and now part of the Linux Foundation.

Top comments (0)