In SQL databases, we sometimes encounter an Entity-Attribute-Value (EAV) model, used to work around the rigidity of the relational model when attributes vary from one entity to another. In MongoDB, you can do the same with the Attribute Pattern and index the attribute name and value, but this is not needed: documents can simply include multiple fields, and wildcard indexes can index each attribute. You can think of it as an UNPIVOT applied only to the index entries.
The YouTube video statistics dataset imported in the first post of this series is a collection of one million videos. Each document embeds an "accessControl" sub-object that stores a list of attributes (like 'comment', 'rate', or 'syndicate'), each with a permission (such as 'allowed', 'moderated', or 'denied').
Here is an example document:
{
  "_id": "---ALs2MJb8",
  "accessControl": {
    "comment": { "permission": "allowed" },
    "list": { "permission": "allowed" },
    "videoRespond": { "permission": "moderated" },
    "rate": { "permission": "allowed" },
    "syndicate": { "permission": "allowed" },
    "embed": { "permission": "allowed" },
    "commentVote": { "permission": "allowed" },
    "autoPlay": { "permission": "allowed" }
  },
  "category": "Music",
  "author": "TriumphantPromotions",
  "publishedDate": "2013-06-04T05:14:58Z",
  ...
}
In this dataset, all attributes belong to a known list, but for the purpose of this example, we will treat them as unknown, refraining from creating individual indexes. The attribute pattern would have transformed it to the following, with a single field name that can be indexed:
{
  "_id": "---ALs2MJb8",
  "accessControl": [
    { "type": "comment", "permission": "allowed" },
    { "type": "list", "permission": "allowed" },
    { "type": "videoRespond", "permission": "moderated" },
    { "type": "rate", "permission": "allowed" },
    { "type": "syndicate", "permission": "allowed" },
    { "type": "embed", "permission": "allowed" },
    { "type": "commentVote", "permission": "allowed" },
    { "type": "autoPlay", "permission": "allowed" }
  ],
  "category": "Music",
  "author": "TriumphantPromotions",
  "publishedDate": "2013-06-04T05:14:58Z",
  ...
}
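As a rough illustration of that transformation, here is a minimal sketch in plain JavaScript. The helper name toAttributePattern is hypothetical (it is not a MongoDB function), and the sample document is a shortened copy of the one above. It only shows the reshaping that the Attribute Pattern would require the application to perform.

```javascript
// Hypothetical helper (not part of MongoDB): reshape the embedded
// "accessControl" sub-object into Attribute Pattern form, i.e. an array of
// { type, permission } items that share the same field names.
function toAttributePattern(doc) {
  const attributes = Object.entries(doc.accessControl ?? {}).map(
    ([type, fields]) => ({ type, ...fields })
  );
  // Return a copy; the input document is left untouched.
  return { ...doc, accessControl: attributes };
}

// Shortened version of the example document, for illustration only.
const video = {
  _id: "---ALs2MJb8",
  accessControl: {
    comment: { permission: "allowed" },
    videoRespond: { permission: "moderated" },
  },
  category: "Music",
};

console.log(JSON.stringify(toAttributePattern(video).accessControl, null, 2));
```

With this shape, every attribute name sits under the same "type" field, which is what makes a single regular index possible, at the price of rewriting every document.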
Wildcard indexes work similarly, with an index key that combines the field name and the value, without modifying the document itself.
A wildcard index is created like a regular index, except that its key can include a $** wildcard:
db.youstats.createIndex(
  { "author": 1, "accessControl.$**" : 1, "category": 1 }
)
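To make the "UNPIVOT applied to the index entries" idea more concrete, here is a conceptual sketch in plain JavaScript of the entries such an index could hold for one document. It is only a mental model: the helper wildcardIndexEntries is hypothetical, MongoDB's real key generation is internal, and this sketch deliberately does not descend into arrays.

```javascript
// Conceptual sketch only: MongoDB's real key generation is internal and also
// handles arrays, which this hypothetical helper does not descend into.
function wildcardIndexEntries(doc, root) {
  const entries = [];
  const walk = (value, path) => {
    const isSubDocument =
      value !== null && typeof value === "object" && !Array.isArray(value);
    if (isSubDocument) {
      // Descend: each nested field extends the dotted path.
      for (const [field, child] of Object.entries(value)) {
        walk(child, `${path}.${field}`);
      }
    } else {
      // Leaf: emit one entry shaped like [author, path, value, category, _id].
      entries.push([doc.author, path, value, doc.category, doc._id]);
    }
  };
  if (doc[root] !== undefined) walk(doc[root], root);
  return entries;
}

// Shortened sample document, for illustration only.
const sample = {
  _id: "---ALs2MJb8",
  author: "TriumphantPromotions",
  category: "Music",
  accessControl: {
    comment: { permission: "allowed" },
    videoRespond: { permission: "moderated" },
  },
};

for (const entry of wildcardIndexEntries(sample, "accessControl")) {
  console.log(JSON.stringify(entry));
}
```

One document yields one entry per attribute found under the wildcard path, each carrying the dotted field path next to its value, while the document itself keeps its original shape.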
In my dataset, I have 68 videos from "Paramount Movies", and 3 of them have the rate permission allowed:
db.youstats.aggregate([
  { $match: { author: "Paramount Movies" } },
  {
    $group: {
      _id: "$accessControl.rate.permission",
      count: { $sum: 1 }
    }
  }
])
[ { _id: 'allowed', count: 3 }, { _id: 'denied', count: 65 } ]
If I want to find only those with the rate permission allowed, I would normally have to create an index with "accessControl.rate.permission" in the key. Without it, the query would have to read all 68 documents for this author and then filter out 65 of them. Such an index would serve only the "rate" permission, so I would have to create one index for every permission I might query, and that could be a lot with a flexible schema.
With my wildcard index, all fields under a path are automatically indexed. This allows queries to access the three relevant documents directly, even without prior knowledge of which permissions will be in the query filter:
db.youstats.find({
  author: "Paramount Movies",
  "accessControl.rate.permission": "allowed"
}).explain("executionStats").executionStats;
{
  executionSuccess: true,
  nReturned: 3,
  executionTimeMillis: 0,
  totalKeysExamined: 3,
  totalDocsExamined: 3,
  executionStages: {
    isCached: false,
    stage: 'FETCH',
    nReturned: 3,
    executionTimeMillisEstimate: 0,
    works: 5,
    advanced: 3,
    ...
    docsExamined: 3,
    alreadyHasObj: 0,
    inputStage: {
      stage: 'IXSCAN',
      nReturned: 3,
      executionTimeMillisEstimate: 0,
      works: 4,
      advanced: 3,
      ...
      keyPattern: {
        author: 1,
        '$_path': 1,
        'accessControl.rate.permission': 1,
        category: 1
      },
      indexName: 'author_1_accessControl.$**_1_category_1',
      isMultiKey: false,
      multiKeyPaths: {
        author: [],
        '$_path': [],
        'accessControl.rate.permission': [],
        category: []
      },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: {
        author: [ '["Paramount Movies", "Paramount Movies"]' ],
        '$_path': [
          '["accessControl.rate.permission", "accessControl.rate.permission"]'
        ],
        'accessControl.rate.permission': [ '["allowed", "allowed"]' ],
        category: [ '[MinKey, MaxKey]' ]
      },
      keysExamined: 3,
      seeks: 1,
      dupsTested: 0,
      dupsDropped: 0
    }
  }
}
The number of index entries read, keysExamined: 3, matches the number of documents returned, nReturned: 3, indicating optimal access.
The index bounds reveal insights about the indexed keys:
- The value searched in the first field of the key, author, is [ '["Paramount Movies", "Paramount Movies"]' ].
- The value searched in $_path, which contains the field name, is [ '["accessControl.rate.permission", "accessControl.rate.permission"]' ].
- The value searched in this field, accessControl.rate.permission, is [ '["allowed", "allowed"]' ].
- The last field of the key, category, has no filter applied, resulting in a scan of all values, represented as [ '[MinKey, MaxKey]' ].
Examining the index bounds provides valuable insights into the access patterns it can serve efficiently. For instance, if you see category: [ '[MinKey, MaxKey]' ], you can confidently add .sort({category:1}) to your query without increasing costs, as the index entries will already be in the required order.
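The reasoning behind that sort can be sketched with a toy model in plain JavaScript: a compound index is treated as a sorted array of key tuples, and a scan with equality bounds on the leading fields returns entries already ordered by the trailing field. This is only a mental model, not MongoDB's implementation, and the tuples below are invented for illustration.

```javascript
// Toy model: index entries as [author, $_path, value, category, _id] tuples.
// The data is made up; only the ordering behaviour matters here.
const compareValues = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
const compareKeys = (x, y) => {
  for (let i = 0; i < 4; i++) {
    const c = compareValues(x[i], y[i]);
    if (c !== 0) return c;
  }
  return 0;
};

const toyIndex = [
  ["Paramount Movies", "accessControl.rate.permission", "denied", "Film", "v1"],
  ["Paramount Movies", "accessControl.rate.permission", "allowed", "Trailers", "v2"],
  ["Paramount Movies", "accessControl.comment.permission", "allowed", "Film", "v1"],
  ["Paramount Movies", "accessControl.rate.permission", "allowed", "Film", "v3"],
  ["Other Author", "accessControl.rate.permission", "allowed", "Music", "v4"],
].sort(compareKeys);

// Equality bounds on the first three key fields, [MinKey, MaxKey] on category.
function rangeScan(author, path, value) {
  const prefix = [author, path, value];
  const matches = (key) => prefix.every((v, i) => key[i] === v);
  // A B-tree would seek straight to the first matching key;
  // findIndex stands in for that single seek.
  let pos = toyIndex.findIndex(matches);
  const hits = [];
  while (pos !== -1 && pos < toyIndex.length && matches(toyIndex[pos])) {
    hits.push({ _id: toyIndex[pos][4], category: toyIndex[pos][3] });
    pos++;
  }
  return { keysExamined: hits.length, hits };
}

console.log(rangeScan("Paramount Movies", "accessControl.rate.permission", "allowed"));
```

Because the first three key fields are pinned to a single value each, the matching entries are contiguous and appear in category order, so no separate sort step is needed in this model.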
If you filter on multiple fields under the wildcard, the index scan might read more entries, but the filter is still covered before fetching the documents, as in these two queries:
db.youstats.find({
  author: "Paramount Movies",
  "accessControl.rate.permission": "allowed",
  "accessControl.comment.permission": "denied"
})
db.youstats.find({
  author: "Paramount Movies",
  $or: [
    { "accessControl.rate.permission": "allowed" },
    { "accessControl.comment.permission": "denied" }
  ]
})
Wildcard indexes offer significant flexibility when dealing with documents that have evolving, dynamic, or unpredictable attribute sets. They prove especially valuable in various scenarios beyond access control permissions:
- User-Defined Content & Metadata: In applications that allow users to add custom fields (like tagging, profile properties, or annotation systems), there’s no need to anticipate and index every potential custom attribute in advance.
- IoT and Telemetry Data: Devices frequently send sensor readings or status fields that may vary over time or between models. Wildcard indexes enable efficient indexing of any combination of measurements or state fields within the same collection, accommodating unforeseen future fields without needing schema changes.
- Catalogs and Product Data: E-commerce platforms often manage products with differing attribute sets based on category (e.g., size, color, voltage, brand, material). Wildcard indexes eliminate the necessity for separate indexes for each potential attribute.
- Multi-Tenant or Extensible Systems: SaaS platforms and extensible business applications allow tenants or partners to define their own custom fields. Wildcard indexes facilitate efficient querying, regardless of the unique attributes present in each tenant’s data.
- Audit Logs and Event Sourcing: Log entries may feature arbitrary keys based on event type or source system. Wildcard indexes permit efficient filtering and retrieval of records, even as event schemas evolve.
The wildcard index is a feature of MongoDB where the flexibility of documents is native. In contrast, MongoDB emulations built on top of SQL databases struggle to offer the same level of flexibility, as they are constrained by the limitations of RDBMS engines.
I tested on Oracle with the MongoDB-compatible API:
oracle> db.youstats.createIndex(
... { "author": 1, "accessControl.$**" : 1, "category": 1 }
... )
MongoServerError[MONGO-67]: Wildcard indexes are not supported.
I also tested on FerretDB, which uses PostgreSQL with the DocumentDB extension that powers CosmosDB in Azure:
ferretdb> db.youstats.createIndex(
... { "author": 1, "accessControl.$**" : 1, "category": 1 }
... )
MongoServerError[CannotCreateIndex]: Error in specification { "name" : "author_1_accessControl.$**_1_category_1", "key" : { "author" : 1, "accessControl.$**" : 1, "category" : 1 } }
:: caused by
:: wildcard indexes do not allow compounding
AWS has a service compatible with old versions of MongoDB, Amazon DocumentDB, and this fails:
docdb> db.youstats.createIndex(
... { "author": 1, "accessControl.$**" : 1, "category": 1 }
... )
MongoServerError: Index type not supported : $**
Google Firestore also provides a MongoDB emulation, which doesn't support wildcard indexes either. If you try other databases that claim to be MongoDB-compatible, please tell me whether it works.
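If you want to run the same check against another database, a small probe along these lines may help. It is a sketch, not an official API: the function name probeCompoundWildcard is hypothetical, and it assumes a collection object exposing createIndex() and dropIndex() in the style of the MongoDB Node.js driver. Run it only on a scratch collection, since it drops the index it creates.

```javascript
// Hypothetical probe, not an official API. `collection` is assumed to expose
// createIndex() and dropIndex() like the MongoDB Node.js driver's Collection.
// Use a scratch collection: the probe index is dropped afterwards.
async function probeCompoundWildcard(collection) {
  const keys = { author: 1, "accessControl.$**": 1, category: 1 };
  let indexName;
  try {
    indexName = await collection.createIndex(keys);
  } catch (err) {
    // The emulations tested above all fail here, each with its own message.
    return { supported: false, reason: err.message };
  }
  await collection.dropIndex(indexName); // clean up the probe index
  return { supported: true };
}

// Stand-in collections so the sketch runs without any server.
const acceptingCollection = {
  createIndex: async () => "author_1_accessControl.$**_1_category_1",
  dropIndex: async () => {},
};
const rejectingCollection = {
  createIndex: async () => {
    throw new Error("Wildcard indexes are not supported.");
  },
  dropIndex: async () => {},
};

probeCompoundWildcard(acceptingCollection).then((r) => console.log(r));
probeCompoundWildcard(rejectingCollection).then((r) => console.log(r));
```

The probe only reports whether the index definition is accepted; checking that queries actually use it still requires looking at the execution plan.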
Wildcard indexes significantly reduce operational complexity for developers and DBAs. Instead of creating more indexes or restructuring documents with the attribute pattern, a single wildcard index can accommodate various query patterns as requirements evolve.