Franck Pachot

Posted on Jun 1

Filter on Children, Sort by Parent: One-to-Many Compound Index Strategies in PostgreSQL

#postgres #sql #documentdb #mongodb

Before looking at PostgreSQL, I'll first introduce the problem by showing how MongoDB's document model and multi-key indexes handle compound indexes across a one-to-many relationship. Then we will see how to approximate this in PostgreSQL by denormalizing on the "many" side or the "one" side, how to maintain consistency with cascade foreign keys or triggers, and how to accelerate filtering with sorted pagination using B-tree, GIN, or RUM indexes.

Multi-key indexes in MongoDB allow compound indexes on one-to-many relationships. For example, the following index covers fields from both the children (child_value) and the parent (parent_value) when children are embedded as an array (in a children field) within the parent document:

db.parent.createIndex(
 { "children.child_value": 1, parent_value: 1 }
);

Such an index can efficiently support an equality filter on child_value combined with a sort and pagination on parent_value. For instance, finding the top 10 parents, ordered by parent_value, where at least one child has a child_value of 0.9:

db.parent.find(
 { "children.child_value": 0.9 }
).sort({ "parent_value": 1 }).limit(10);

It is important to understand the semantics of predicates on embedded arrays. This query does not return the top 10 children with this value — a parent may have multiple matching children, but we get one result per parent. This differs from a relational join (or $unwind in a MongoDB aggregation pipeline), which would produce one row per child. Here, the predicate tests only for the existence of at least one child with the specified value. The result set contains distinct parents, not (parent, child) pairs.

A multi-key index stores multiple index entries per parent document — one for each distinct child_value in the array — paired with the parent's parent_value. The number of index entries per parent equals the number of distinct child values, which can be less than the total number of children when multiple children share the same value. For example, given these documents:

{
  _id: 24,
  parent_value: "Y",
  children: [
    { child_num: 1, child_value: 0.9 },
    { child_num: 2, child_value: 0.8 }
  ]
},
{
  _id: 42,
  parent_value: "X",
  children: [
    { child_num: 1, child_value: 0.5 },
    { child_num: 2, child_value: 0.9 },
    { child_num: 3, child_value: 0.9 },
    { child_num: 4, child_value: 0.7 }
  ]
}

The index contains three entries for the second document, because the two children with child_value: 0.9 produce only one index entry:

  (0.5, "X", 42) → record ID of { _id: 42 }
  (0.7, "X", 42) → record ID of { _id: 42 }
  (0.8, "Y", 24) → record ID of { _id: 24 }
  (0.9, "X", 42) → record ID of { _id: 42 }
  (0.9, "Y", 24) → record ID of { _id: 24 }

The index scan for {child_value: 0.9} finds two entries, already ordered by the second field of the compound key, parent_value. MongoDB can read them in order — "X" before "Y" — and fetch the corresponding documents without an additional sort. With a .limit(10), the scan stops after 10 distinct parents are found, making the query efficient regardless of how many total documents exist in the collection.

There is no direct equivalent of this in PostgreSQL. If the one-to-many relationship is stored as JSONB, GIN indexes can locate documents containing {child_value: 0.9} but do not preserve ordering on other fields. The query must fetch all matching documents, sort them on parent_value, and only then return the top 10. If the one-to-many relationship is normalized into two tables, no single index can span columns from both tables — a join is required first, and the sort optimization depends on the join strategy chosen by the planner.

Normalized One-to-Many

Now that the problem is defined, let's see how we can work around this limitation in a normalized model:

drop table if exists parent, child cascade;

create table parent (
 parent_id    bigserial primary key,
 parent_value float
);

create table child (
 child_id     bigserial primary key,
 parent_id    int references parent (parent_id),
 child_value  float
);

insert into parent (parent_value)
 select random() from generate_series(1, 1000)
;

insert into child (parent_id, child_value)
 select parent_id, round(random()::numeric, 4) from parent, generate_series(1, 1000)
;

create index on parent (parent_value);
create index on child (child_value);

vacuum analyze parent, child;

The following query finds the top 10 parents, by parent_value, that have at least one child with a child_value of 0.9999:

-- explain (analyze, buffers, verbose, costs off)
select *
from parent p
where exists (
  select 1 from child c
  where c.parent_id = p.parent_id
  and c.child_value = 0.9999
)
order by parent_value
limit 10;

 parent_id |     parent_value
-----------+----------------------
       956 | 0.013518349069161273
       789 | 0.022813950892476287
       307 | 0.024740344416860793
       833 | 0.041118650516225985
       493 |  0.07587347477845374
       402 |  0.08400127026866477
       256 |   0.0980166832909124
       342 |  0.11175505348300807
       846 |  0.12047992766758941
       590 |  0.16840580135419425
(10 rows)

The execution plan shows that:

The index on child_value was scanned to find all index entries with the expected value — here rows=99.0 — which are then used to locate the corresponding blocks in the child table (this is a Bitmap Index Scan).
The Bitmap Heap Scan on the child table returned all children with child_value = 0.9999 (there are 95 if we count the actual matches — a bitmap scan can have some false positives). The results were stored in memory (Materialize).
The index on parent_value was scanned to retrieve parents in the desired order, checking for each whether its parent_id appears in the materialized buffer. It had to read rows=169.00 parents before finding rows=10.00 that satisfied the condition, with lookups on the materialized result (loops=169).

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=0.366..2.272 rows=10.00 loops=1)
   Output: p.parent_id, p.parent_value
   Buffers: shared hit=245
   ->  Nested Loop Semi Join (actual time=0.365..2.270 rows=10.00 loops=1)
         Output: p.parent_id, p.parent_value
         Join Filter: (p.parent_id = c.parent_id)
         Rows Removed by Join Filter: 16246
         Buffers: shared hit=245
         ->  Index Scan using parent_parent_value_idx on public.parent p (actual time=0.012..0.080 rows=169.00 loops=1)
               Output: p.parent_id, p.parent_value
               Index Searches: 1
               Buffers: shared hit=142
         ->  Materialize (actual time=0.000..0.006 rows=96.19 loops=169)
               Output: c.parent_id
               Storage: Memory  Maximum Storage: 20kB
               Buffers: shared hit=103
               ->  Bitmap Heap Scan on public.child c (actual time=0.032..0.151 rows=99.00 loops=1)
                     Output: c.parent_id
                     Recheck Cond: (c.child_value = '0.9999'::double precision)
                     Heap Blocks: exact=99
                     Buffers: shared hit=103
                     ->  Bitmap Index Scan on child_child_value_idx (actual time=0.016..0.016 rows=99.00 loops=1)
                           Index Cond: (c.child_value = '0.9999'::double precision)
                           Index Searches: 1
                           Buffers: shared hit=4

The other index on parent_value can serve the sort but not the filter on child_value. In both cases, many more rows must be read to combine the two conditions after a Nested Loop Join. That is the price of normalization.

In a SQL database, if we want a single index to cover both fields, they must exist in the same table.

Denormalize into the "Many" Side (Children)

A common solution is to duplicate the parent value into the child table and create a compound index:

alter table child add parent_value float;

update child
  set parent_value = parent.parent_value
  from parent
  where child.parent_id = parent.parent_id;

alter table child alter parent_value set not null;

create index on child (child_value, parent_value);

The query can now operate on the child table only:

-- explain (analyze, buffers, verbose, costs off)
select distinct parent_id, parent_value
 from child
 where child_value = 0.9999
 order by parent_value
 limit 10;

 parent_id |     parent_value
-----------+----------------------
       956 | 0.013518349069161273
       789 | 0.022813950892476287
       307 | 0.024740344416860793
       833 | 0.041118650516225985
       493 |  0.07587347477845374
       402 |  0.08400127026866477
       256 |   0.0980166832909124
       342 |  0.11175505348300807
       846 |  0.12047992766758941
       590 |  0.16840580135419425
(10 rows)

The execution plan shows a single index scan which covers:

the filter: Index Cond: child_value = 0.9999
the order: Presorted Key: child.parent_value

                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=0.067..0.072 rows=10.00 loops=1)
   Output: parent_id, parent_value
   Buffers: shared hit=36
   ->  Unique (actual time=0.066..0.070 rows=10.00 loops=1)
         Output: parent_id, parent_value
         Buffers: shared hit=36
         ->  Incremental Sort (actual time=0.066..0.066 rows=11.00 loops=1)
               Output: parent_id, parent_value
               Sort Key: child.parent_value, child.parent_id
               Presorted Key: child.parent_value
               Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
               Buffers: shared hit=36
               ->  Index Scan using child_child_value_parent_value_idx on public.child (actual time=0.014..0.054 rows=33.00 loops=1)
                     Output: parent_id, parent_value
                     Index Cond: (child.child_value = '0.9999'::double precision)
                     Index Searches: 1
                     Buffers: shared hit=36
 Planning:
   Buffers: shared hit=3
 Planning Time: 0.132 ms
 Execution Time: 0.088 ms

There is an additional sort because the result may contain duplicate parents when they have multiple children with the same value, but it is an incremental sort so it does not have to read all children — here only rows=33.00 before producing rows=10.00. This gets the top ten parents by parent_value without reading more child rows than necessary.

This is nearly optimal. The new index resembles a multi-key index, except that it may contain duplicate entries for the same parent — unnecessary for this query, but potentially useful for others that need the detail of individual children. Like the multi-key index, it must be maintained when the parent value changes. We have two solutions: referential integrity constraint and triggers.

Update with Cascade Constraint (Recommended)

One way to keep this column automatically updated is declarative. We add an additional foreign key that enforces that the parent_value in the child table matches the one in the parent, and declare it on update cascade:

alter table parent add unique (parent_id, parent_value);

alter table child
 add constraint fk_child_parent
 foreign key (parent_id, parent_value)
 references parent (parent_id, parent_value)
 on update cascade;

Unfortunately, PostgreSQL cannot use the same index for two unique constraints that share a prefix, so two indexes are maintained on the parent table. This solution is intended for cases where updates are not frequent.

I test the performance overhead when updating one parent:

\timing on
explain (analyze, buffers, costs off, verbose on)
 update parent
 set parent_value = parent_value - 0.000000000000000042
 where parent_id = 789
;

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Update on public.parent (actual time=0.063..0.064 rows=0.00 loops=1)
   Buffers: shared hit=16
   ->  Index Scan using parent_parent_id_parent_value_key on public.parent (actual time=0.022..0.025 rows=1.00 loops=1)
         Output: (parent_value - '4.2e-17'::double precision), ctid
         Index Cond: (parent.parent_id = 789)
         Index Searches: 1
         Buffers: shared hit=5
 Planning Time: 0.080 ms
 Trigger RI_ConstraintTrigger_a_18059 for constraint fk_child_parent on parent: time=37.950 calls=1
 Trigger RI_ConstraintTrigger_c_18061 for constraint fk_child_parent on child: time=6.767 calls=1000
 Execution Time: 44.874 ms

There is an overhead: 37.9 milliseconds to cascade the update to a thousand of children (parent action), and 6.7 milliseconds to validate each _c_hild row (child check).

This is not problematic for occasional updates. Of course, if we update all parents, the row-by-row cascading constraints can be slow:

\timing on
explain (analyze, buffers, costs off, verbose on)
 update parent
 set parent_value = parent_value - 0.000000000000000042
;
vacuum parent, child;

                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Update on public.parent (actual time=5.944..5.945 rows=0.00 loops=1)
   Buffers: shared hit=10975 dirtied=1
   ->  Seq Scan on public.parent (actual time=0.010..0.156 rows=1000.00 loops=1)
         Output: (parent_value - '4.2e-17'::double precision), ctid
         Buffers: shared hit=11
 Planning:
   Buffers: shared hit=86
 Planning Time: 0.235 ms
 Trigger RI_ConstraintTrigger_a_18059 for constraint fk_child_parent on parent: time=42659.655 calls=506
 Trigger RI_ConstraintTrigger_c_18061 for constraint fk_child_parent on child: time=3398.155 calls=506000
 Execution Time: 46103.577 ms
(11 rows)

Time: 46105.778 ms (00:46.106)

Obviously this is not a solution for frequently updated columns. However, the parent side of a one-to-many relationship often consists of static references, slowly changing dimensions, or immutable event headers, making this overhead acceptable. I also tested with the constraint set as DEFERRABLE. The elapsed time is similar, but you do not see it in the EXPLAIN output because the validation occurs at commit time.

Update with Trigger (Not Recommended)

The cascade constraint has the advantage of being declarative, but it executes internally as a trigger. We can instead create a custom trigger that cascades the update without re-validating the foreign key — assuming that nobody updates the denormalized column directly, and that it is set correctly on insert.

-- function to update the parent value in the children
create or replace function sync_parent_value() returns trigger as $$
begin
  update child set parent_value = new.parent_value
  where parent_id = new.parent_id;
  return new;
end;
$$ language plpgsql;

-- trigger raised for each update
create trigger trg_sync_parent_value
after update of parent_value on parent
for each row
when (old.parent_value is distinct from new.parent_value)
execute function sync_parent_value();

-- drop the previous constraint
alter table child drop constraint fk_child_parent;
alter table parent drop constraint parent_parent_id_parent_value_key;

I test the update of all parents:

\timing on
explain (analyze, buffers, costs off, verbose on)
 update parent
 set parent_value = parent_value - 0.000000000000000042
;
vacuum parent, child;

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Update on public.parent (actual time=4.547..4.547 rows=0.00 loops=1)
   Buffers: shared hit=8453 dirtied=26
   ->  Seq Scan on public.parent (actual time=0.009..0.124 rows=1000.00 loops=1)
         Output: (parent_value - '4.2e-17'::double precision), ctid
         Buffers: shared hit=11
 Planning:
   Buffers: shared hit=8 dirtied=2
 Planning Time: 0.108 ms
 Trigger trg_sync_parent_value: time=42719.154 calls=506
 Execution Time: 42724.042 ms
(10 rows)

Time: 42729.150 ms (00:42.729)

postgres=# vacuum parent, child;

VACUUM
Time: 176.585 ms

The performance advantage over the cascade constraint is minimal, and the risk is higher because parent_value in the child table could be updated independently without any check that it still matches the parent. We would also need an insert trigger to set the correct value, and an update trigger to prevent direct modifications. Ultimately, the cascade constraint solution is strongly preferable.

Denormalize into the "One" Side (Parent)

If we want to more closely mimic a MongoDB multi-key index, denormalizing into the child table is not an exact equivalent because there is no deduplication. It has its advantages — it can serve queries that need individual child rows — but a structure analogous to a multi-key index should be stored on the parent side, where each parent holds a summary of its children's values.

One approach is to add an array of child values to the parent table so that we can filter on it:

-- add array column
alter table parent add child_values float[];

-- populate from child
update parent p set child_values = (
  select array_agg(distinct child_value)
  from child where parent_id = p.parent_id
);

A trigger must maintain the array when children change:

create or replace function sync_child_values_array() returns trigger as $$
begin
  update parent set child_values = (
    select array_agg(distinct child_value)
    from child where parent_id = coalesce(new.parent_id, old.parent_id)
  ) where parent_id = coalesce(new.parent_id, old.parent_id);
  return new;
end;
$$ language plpgsql;

create trigger trg_sync_child_values_array
after insert or update or delete on child
for each row
execute function sync_child_values_array();

The triggers are examples, but you must cover all possible cases in your environment. For example, here, we do not expect children moved to another parent by updating parent_id, which should update both parents.

Now that we have the child values in the parent, we can create a compound index. But we cannot use regular B-Tree on fields within an array, and need an inverted index: GIN or RUM.

GIN Index (and `btree_gin` Extension)

As an array is involved, we need an inverted index. PostgreSQL provides GIN indexes, and they can be combined with scalar columns using the btree_gin extension:

-- required extension
create extension if not exists btree_gin;

-- compound GIN index on the array and the parent value
create index on parent using gin (parent_value, child_values);

Now the query gives the same result, but the compound GIN index cannot avoid a sort:

explain (analyze, buffers, verbose, costs off)
select parent_id, parent_value
 from parent
 where child_values @> array[0.9999::float]
 order by parent_value
 limit 10;

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=0.079..0.081 rows=10.00 loops=1)
   Output: parent_id, parent_value
   Buffers: shared hit=14
   ->  Sort (actual time=0.079..0.080 rows=10.00 loops=1)
         Output: parent_id, parent_value
         Sort Key: parent.parent_value
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=14
         ->  Bitmap Heap Scan on public.parent (actual time=0.031..0.060 rows=93.00 loops=1)
               Output: parent_id, parent_value
               Recheck Cond: (parent.child_values @> '{0.9999}'::double precision[])
               Heap Blocks: exact=10
               Buffers: shared hit=14
               ->  Bitmap Index Scan on parent_parent_value_child_values_idx (actual time=0.018..0.018 rows=93.00 loops=1)
                     Index Cond: (parent.child_values @> '{0.9999}'::double precision[])
                     Index Searches: 1
                     Buffers: shared hit=4

Because GIN requires a bitmap scan, preserving the ordering on parent_value is not possible. Such an index can be used for queries with predicates on both fields, but our pagination query uses it less efficiently. It remains acceptable here (reading 93 rows to return 10) because the predicate is selective enough that few rows need sorting.

Partial B-tree Index for known predicates

If the predicate is not very selective—because the value being filtered on is popular—the previous index is not very efficient. If you have frequent filtering on a few specific values, you can create partial indexes:

create index on parent (parent_value)
 where child_values @> array[0.9999::float]
;

Then the execution that uses this exact predicate is optimal, reading only 10 index entries to produce the 10 result rows:

explain (analyze, buffers, verbose, costs off)
select parent_id, parent_value
 from parent
 where child_values @> array[0.9999::float]
 order by parent_value
 limit 10;

                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Limit (actual time=0.013..0.024 rows=10.00 loops=1)
   Output: parent_id, parent_value
   Buffers: shared hit=10 read=1
   ->  Index Scan using parent_parent_value_idx on public.parent (actual time=0.012..0.022 rows=10.00 loops=1)
         Output: parent_id, parent_value
         Index Searches: 1
         Buffers: shared hit=10 read=1
 Planning:
   Buffers: shared hit=15 read=1
 Planning Time: 0.314 ms
 Execution Time: 0.034 ms

This is the fastest possible execution for a known predicate. However, it requires creating a separate partial index for each distinct filter value, which is impractical when the set of possible predicates is large or unpredictable.

DocumentDB Multi-Key Index (PostgreSQL with extended RUM)

While standard PostgreSQL GIN indexes cannot return ordered results, the DocumentDB extension for PostgreSQL provides a true multi-key index implementation. DocumentDB stores documents as BSON and uses an enhanced RUM access method (documentdb_extended_rum) that maintains multi-key index entries—just like MongoDB does natively. This enables the query engine to push both filter and sort into a single index scan with early termination via LIMIT.

I start a DocumentDB container and connect with psql:

docker run -d --name documentdb-local \
  -p 10260:10260 \
  -p 9712:9712 \
  ghcr.io/documentdb/documentdb/documentdb-local:latest \
  --username postgres \
  --password Iapetus \
  --start-pg
docker exec -it documentdb-local psql -p 9712 postgres

After creating the same relational tables as before, I create a DocumentDB collection and its compound index:

-- create the index first (on empty collection)
select documentdb_api_internal.create_indexes_non_concurrently(
  'mydb',
  '{
    "createIndexes": "parents",
    "indexes": [{
      "key": { "children.child_value": 1, "parent_value": 1 },
      "name": "child_value_parent_value_idx"
    }]
  }'::documentdb_core.bson
);

-- populate from the relational tables, embedding children as an array
select documentdb_api.insert_one('mydb', 'parents',
  format('{"parent_id": %s, "parent_value": %s, "children": %s}',
    p.parent_id, p.parent_value,
    (select coalesce(json_agg(json_build_object('child_value', c.child_value)), '[]')
     from child c where c.parent_id = p.parent_id)
  )::documentdb_core.bson, null
) from parent p;

Now, I run the equivalent query. The filter and sort match the MongoDB query from the beginning of this article:

explain (analyze, buffers, costs off)
select document from bson_aggregation_find(
  'mydb',
  '{
    "find": "parents",
    "filter": { "children.child_value": 0.9999 },
    "sort": { "parent_value": 1 },
    "limit": 10,
    "projection": { "parent_id": 1, "parent_value": 1, "_id": 0 }
  }'::documentdb_core.bson
);

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Limit (actual time=0.179..0.890 rows=10 loops=1)
   Buffers: shared hit=95
   ->  Index Scan using child_value_parent_value_idx on documents_10 collection (actual time=0.178..0.888 rows=10 loops=1)
         Index Cond: (document @= '{ "children.child_value" : { "$numberDouble" : "0.99990000000000001101" } }'::bson)
         Order By: (document |-<> '{ "parent_value" : { "$numberInt" : "1" } }'::bson)
         Buffers: shared hit=95
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.274 ms
 Execution Time: 0.916 ms
(10 rows)

The plan shows a single index scan—no sort node, no bitmap scan. The index provides both the filter (Index Cond) and the ordering (Order By) simultaneously. With LIMIT 10, the executor stops after reading 10 matching documents from the index, regardless of how many total parents contain the value.

Looking at the underlying storage, we can see the extended RUM index that makes this possible:

\d documentdb_data.documents_10

postgres=# \d documentdb_data.documents_10
           Table "documentdb_data.documents_10"
     Column      |  Type  | Collation | Nullable | Default
-----------------+--------+-----------+----------+---------
 shard_key_value | bigint |           | not null |
 object_id       | bson   |           | not null |
 document        | bson   |           | not null |
Indexes:
    "collection_pk_10" PRIMARY KEY, btree (shard_key_value, object_id)
    "documents_rum_index_29" documentdb_extended_rum (document documentdb_extended_rum_catalog.bson_extended_rum_composite_path_ops (pathspec='[ "children.child_value", "parent_value" ]', tl='2691'))
Check constraints:
    "shard_key_value_check" CHECK (shard_key_value = '10'::bigint)

The documentdb_extended_rum index stores BSON paths as composite keys, exploding array elements into multiple index entries—the same structure as a MongoDB multi-key index, implemented in PostgreSQL's extensible index infrastructure.

As I used this collection to duplicate data, we need to keep it in sync with the relational tables and add triggers:

-- sync child changes to the DocumentDB collection  
create or replace function sync_child_to_documentdb() returns trigger as $$  
begin  
  -- rebuild the entire children array for the affected parent  
  perform documentdb_api.update(  
    'mydb',  
    format('{  
      "update": "parents",  
      "updates": [{  
        "q": { "parent_id": %s },  
        "u": { "$set": { "children": %s } }  
      }]  
    }',  
      coalesce(new.parent_id, old.parent_id),  
      (select coalesce(json_agg(json_build_object('child_value', c.child_value)), '[]')  
       from child c where c.parent_id = coalesce(new.parent_id, old.parent_id))  
    )::documentdb_core.bson  
  );  
  return new;  
end;  
$$ language plpgsql;  

create trigger trg_sync_child_to_documentdb  
after insert or update or delete on child  
for each row  
execute function sync_child_to_documentdb();  

-- sync parent_value updates to the DocumentDB collection  
create or replace function sync_parent_to_documentdb() returns trigger as $$  
begin  
  perform documentdb_api.update(  
    'mydb',  
    format('{  
      "update": "parents",  
      "updates": [{  
        "q": { "parent_id": %s },  
        "u": { "$set": { "parent_value": %s } }  
      }]  
    }', new.parent_id, new.parent_value)::documentdb_core.bson  
  );  
  return new;  
end;  
$$ language plpgsql;  

create trigger trg_sync_parent_to_documentdb  
after update of parent_value on parent  
for each row  
when (old.parent_value is distinct from new.parent_value)  
execute function sync_parent_to_documentdb();

Of course, if the DocumentDB collection is the primary data store rather than a synchronized copy, these triggers are unnecessary—you would simply use the DocumentDB API functions (or MongoDB wire protocol on port 10260) for all reads and writes, and the multi-key index is maintained automatically.

Conclusion

MongoDB's multi-key index is a B-tree that explodes array entries, creating one index entry per distinct array element per document. This allows a single index to efficiently serve equality filters on child values combined with sorts on parent values, with early termination via limit. PostgreSQL has no native equivalent: GIN indexes handle array containment but cannot produce ordered output, and B-tree indexes cannot span multiple tables. However, it is extensible.

We explored several strategies to approximate this behavior, filter with pagination, in PostgreSQL:

Approach	Read Performance	Write Cost	Complexity
Normalized with `EXISTS`	Moderate: two index scans, semi-join	Low	Low
Denormalized into child + cascade FK	Near-optimal (single index, incremental sort for dedup)	High (all children updated on parent change)	Moderate
Array on parent + GIN	Low (sort after bitmap scan)	Moderate (one parent row per child change)	Moderate (triggers)
Array on parent + partial B-tree index	Optimal (index scan, no sort, stops at 10)	Low (one parent row per child change)	Moderate (one index per predicate)
DocumentDB extended RUM	Optimal (ordered index scan, no sort, stops at 10)	Low (one document per parent change)	Low (native multi-key semantics)

The best choice depends on the workload:

If parent_value rarely changes and queries need individual child rows, denormalizing into the child table with a cascade constraint is simple and effective. The compound B-tree index on the child table provides near-optimal read performance, and the cascade foreign key maintains consistency declaratively.
If the query only needs parents and the child predicate is known in advance, a partial B-tree index on the parent table with an array column delivers optimal read performance with minimal write overhead—only one parent row is updated when a child is inserted or deleted. The limitation is that a separate partial index is needed for each predicate value.
If the predicates are varied and unpredictable but a sort after filtering is acceptable, the GIN index provides a general-purpose solution that works with any array containment predicate, at the cost of a post-filter sort.
If the predicates are varied and unpredictable and you need optimal read performance without sorting, the DocumentDB extension provides the closest general-purpose equivalent to MongoDB's multi-key index, supporting array filtering with ordered output from a single index scan—with no write amplification on parent value updates since children are embedded in the same document.

In all cases, the standard PostgreSQL solution requires explicit denormalization and maintenance logic—whether through cascade constraints, triggers, or application code—to achieve what MongoDB's document model and multi-key indexes provide natively. The DocumentDB extension bridges this gap by bringing the document model directly into PostgreSQL, aligning the data structure and the index structure by design and eliminating the need for synchronization entirely.