Franck Pachot

Posted on Feb 14 • Edited on Feb 23

Cartesian product (CROSS JOIN) in MongoDB

#database #javascript #mongodb #tutorial

Relational database joins are, conceptually, a cartesian product followed by a filter (the join condition). Without a join condition, the result is every possible combination of rows. In SQL, you use CROSS JOIN or list multiple tables in the FROM clause without a join condition in the WHERE clause. In MongoDB, you can model the same behavior at read time using $lookup, or at write time by embedding documents.

A relational database uses a data model defined independently of applications and is accessed with SQL, which makes normalization necessary. In contrast, a document database typically uses a domain-driven model and is accessed through application services. This allows for multiple choices optimized by access patterns, and some duplication is acceptable as long as a single service is responsible for updates.

Example

Define two collections: one for clothing sizes and one for gender-specific fits:

db.sizes.insertMany([  
  { code: "XS", neckCm: { min: 31, max: 33 } },  
  { code: "S",  neckCm: { min: 34, max: 36 } },  
  { code: "M",  neckCm: { min: 37, max: 39 } },  
  { code: "L",  neckCm: { min: 40, max: 42 } },  
  { code: "XL", neckCm: { min: 43, max: 46 } }  
]);

db.fits.insertMany([
  {
    code: "MEN",
    description: "Straight cut, broader shoulders, narrower hips"
  },
  {
    code: "WOMEN",
    description: "Tapered waist, narrower shoulders, wider hips"
  }
]);

Each collection stores independent characteristics, and every size applies to every fit. The goal is to generate all valid product variants.

Cross join on read: $lookup + $unwind

In order to add all sizes to each body shape, use a $lookup without filter condition and, as it adds them as an embedded array, use $unwind to get one document per combination:

db.sizes.aggregate([
  {
    $lookup: {
      from: "fits",
      pipeline: [],
      as: "fit"
    }
  },
  { $unwind: "$fit" },
  { $sort: { "fit.code": 1, code: 1 } },
  {
    $project: {
      _id: 0,
      code: { $concat: ["$fit.code", "-", "$code"] }
    }
  }
]);

Here is the result:

Each input document produces its own Cartesian product, so the total number of results is the product of both collection sizes.

Application-side

For such small static reference collections, the application may simply read both and join with loops:

const sizes = db.sizes.find({}, { code: 1, _id: 0 }).sort({ code: 1 }).toArray();
const fits  = db.fits.find({},  { code: 1, _id: 0 }).sort({ code: 1 }).toArray();

for (const fit of fits) {
  for (const size of sizes) {
    print(`${fit.code}-${size.code}`);
  }
}

While it's good to keep the reference in a database, such static data can stay in cache in the application.

Cross join on write: embed the many-to-many

If, in your domain, sizes are only meaningful in the context of a specific fit, embedding them in the fits documents is often a better model:

db.fits.aggregate([
  {
    $lookup: {
      from: "sizes",
      pipeline: [
        { $project: { _id: 0, code: 1, neckCm:1 } },
        { $sort: { code: 1 } }
      ],
      as: "sizes"
    }
  },
  {
    $merge: {
      into: "fits",
      on: "_id",
      whenMatched: "merge",
      whenNotMatched: "discard"
    }
  }
]);

Here is the new shape of the single collection:

Once embedded, the query becomes straightforward, simply unwind the embedded array:

db.fits.aggregate([
  { $unwind: "$sizes" },
  {
    $project: {
      _id: 0,
      code: {
        $concat: ["$code", "-", "$sizes.code"]
      }
    }
  }
]);

You may embed only the fields required, like the size code, or all fields like I did here with the neck size, and then remove the size collection.

Although this may duplicate the values for each body shape, it only requires using updateMany() instead of updateOne() when updating it. For example, the following updates one size (the array item i identified by arrayFilters and updated with $set) in the whole collection (the empty {} as a document filter):

db.fits.updateMany(
  {},  
  { $set: { "sizes.$[i].neckCm": { min: 38, max: 40 } } }, 
  { arrayFilters: [ { "i.code": "M" }  ] }  
);

Duplication has the advantage of returning all required information in a single read, without joins or multiple queries, and it is not problematic for updates since it can be handled with a single bulk update operation. Relational databases often rely on database‑level constraints because multiple clients may issue ad‑hoc SQL and business rules must therefore be enforced at the database level—MongoDB applications are typically domain‑driven, with clear ownership of data and a single responsibility for performing updates.

In that context, consistency is maintained by the application's service rather than by cross‑table constraints.

This approach also lets business rules evolve, such as defining different neck sizes for men and women, without changing the data model. In Domain-Driven Design, The bounded context responsible for listing, taking orders, or shipping T-shirts needs access to their complete characteristics. Another domain is responsible for deciding whether all medium sizes share the same dimensions or vary by gender.

Conclusion

In a fully normalized relational model, all relationships use the same pattern: a one-to-many relationship between two tables, enforced by a primary (or unique) key on one side and a foreign key on the other. This holds regardless of cardinality (many can be three or one million), lifecycle rules (except a little control though cascade deletes or updates), ownership (shared or exclusive parent), navigation direction (according to access patterns), and association type (polymorphism must be normalize to one-to-many or additional nullable columns). Even many-to-many relationships are just two one-to-many relationships via a junction table.

MongoDB exposes these same concepts as modeling choices—handled per document at read time with $lookup, at write time through embedding, or in the application—instead of enforcing a single normalized representation. The choice depends on the domain data and access patterns.

DEV Community