oluseyeo

Posted on Sep 15, 2020 • Edited on Oct 7, 2021

How to Create Relationships with Mongoose and Node.JS

#node #mongodb #database #javascript

FOCUS: One-to-many Relationships

NoSQL databases, unlike SQL databases such as PostgreSQL and MySQL, which are traditionally built for managing relationships between data that is indexed and referenced across multiple tables, have poor or almost non-existent support for relationships in their JSON-like schemas. MongoDB, a popular NoSQL database, does however have built-in methods that developers can leverage to build relationships between multiple schemas.

Relationships in MongoDB are built on JOIN-like functionality, and with Mongoose, a popular NPM library, developers can harness its raw power to build complex relationships and, importantly, to design efficient databases that avoid slow, throttled queries, much as they would when working with an SQL database.

In this tutorial, I will cover the following in detail:

Types of relationships & object reference types in MongoDB
Mongoose Populate Method
Mongoose Virtuals

Prerequisites:

It is expected that readers have a good basic grasp of ExpressJS, Mongoose, ES6+ JS & Postman.

Also, the following should be available either as a service or installed and running locally on your PC:

MongoDB or you can choose Atlas, the cloud version of MongoDB.
The Mongoose NPM package. Simply run npm i mongoose at the root of your project folder.
Postman, to test the endpoints.

For the purpose of this write-up, I have built a small “Publishing House” project to walk you through each of the methods discussed. The project treats Publishers as registered users who can publish multiple books under their portfolio. It uses the following stack:

MongoDB as database.
Mongoose library, as the object document mapper (ODM).
ExpressJS to create our routes using async/await, since we shall be dealing with promises.
Postman will be used to test our endpoints for responses.

Data modeling in MongoDB represents relational data using two major design models, and the choice of model when planning the database collections of any project hinges predominantly on data size, data accuracy, and frequency of access. Nonetheless, the rule of thumb is that the larger the documents stored, the slower queries resolve, and ultimately, the less performant the database is.

The two models are as follows:

Embedded Data Model [Denormalization]: This is the least recommended form of relationship. Data is simply denormalized by embedding Child (related) documents right into the Parent (main) document. Using our “Publishing House” project as an example, this would mean Publishers store all published books and related information directly on each publisher’s object.
In a typical One-to-Few relationship, this works perfectly, as the expected number of Child documents is not more than about 20. However, with a larger number of Child documents, the size of the Parent document heavily impairs database performance, causing lag and difficulty in keeping data synced, and ultimately a poor user experience.
Referenced Data Model [Normalization]: When data is normalized, documents are separated into different collections and share references with each other. Because each piece of data lives in one place, a single update to a document is reflected everywhere that document is referenced. The rest of this tutorial focuses on the best use case of this method, and on how best to organize our database collections and documents efficiently.
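
To make the difference concrete, here is a rough sketch of how the same data might look under each model. Plain JavaScript objects stand in for MongoDB documents, and every id, title, and helper name below is made up for illustration.

```javascript
// Embedded (denormalized): the books live inside the publisher document.
const embeddedPublisher = {
   _id: 'pub1',
   name: 'Embedded Publishers',
   publishedBooks: [
      { name: 'Book A', publishYear: 2019 },
      { name: 'Book B', publishYear: 2020 },
   ],
};

// Referenced (normalized): books live in their own collection and
// point back to their publisher by id.
const publishers = [{ _id: 'pub1', name: 'Embedded Publishers' }];
const books = [
   { _id: 'book1', name: 'Book A', publishYear: 2019, publisher: 'pub1' },
   { _id: 'book2', name: 'Book B', publishYear: 2020, publisher: 'pub1' },
   { _id: 'book3', name: 'Book C', publishYear: 2021, publisher: 'pub2' },
];

// Embedded: a single read returns the publisher together with its books.
function booksFromEmbedded(publisher) {
   return publisher.publishedBooks;
}

// Referenced: a second lookup collects the books that point at the publisher.
function booksFromReferenced(publisherId, allBooks) {
   return allBooks.filter((book) => book.publisher === publisherId);
}

console.log(booksFromEmbedded(embeddedPublisher).length); // 2
console.log(booksFromReferenced(publishers[0]._id, books).length); // 2
```

The embedded document grows with every book added, while the referenced publisher document stays the same size no matter how many books point at it.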

Referencing documents between collections can be done in two ways:

Child Referencing: A document is considered Child referenced when the Parent document stores references to its Child documents, keeping their identifiers (in most situations, the _id) in an array on the Parent document. In our “Publishing House” project, this means having Publishers store the book._id of each book created in an array of book ids, predefined on the Publisher's Schema, and, when needed, fetching these Child documents using the populate method.

From our Project, see the Publisher's schema below:



const mongoose = require('mongoose');
const {Schema} = require('mongoose');

const publisherSchema = new Schema({
   name: String,
   location: String,
   publishedBooks: [{
      type: Schema.Types.ObjectId,
      ref: 'Book'
   }]
},
{timestamps: true});

module.exports = mongoose.model('Publisher', publisherSchema);

Publisher Schema [Notice published books is an array]

Here is our Book Schema:



const mongoose= require('mongoose');
const {Schema} = require('mongoose');

const bookSchema = new Schema({
   name: String,
   publishYear: Number,
   author: String,
   publisher: {
      type: Schema.Types.ObjectId,
      ref: 'Publisher',
      required: true
   }
},
{timestamps: true});

module.exports = mongoose.model('Book', bookSchema);

Book Schema

The Mongoose “populate” method loads the details of each referenced Child document and returns them alongside each Publisher document fetched from the DB. Let’s see an example of this using our project.

We start by creating a new Publisher below:



/***
 * @action ADD A NEW PUBLISHER
 * @route http://localhost:3000/addPublisher
 * @method POST
*/
app.post('/addPublisher', async (req, res) => {
   try {
      //validate req.body data before saving
      const publisher = new Publisher(req.body);
      await publisher.save();
      res.status(201).json({success:true, data: publisher });

   } catch (err) {
      res.status(400).json({success: false, message:err.message});
   }
});

Create a new publisher



{
    "success": true,
    "data": {
        "publishedBooks": [],
        "_id": "5f5f8ac71edcc2122cb341c7",
        "name": "Embedded Publishers",
        "location": "Lagos, Nigeria",
        "createdAt": "2020-09-14T15:22:47.183Z",
        "updatedAt": "2020-09-14T15:22:47.183Z",
        "__v": 0
    }
}

A new publisher

Next, the newly created Publisher adds a new book it is about to publish to its DB. The publisher’s _id is passed in as the value of the publisher key on the Book schema before saving. In the same request, right after calling the save method on the new book, the newly created book object returned from the Promise MUST be passed to the push method called on the Publisher’s publishedBooks array, and the Publisher must then be saved. This ensures that a reference to the book is stored on the Publisher's document.

Here's the magic breakdown:



/***
 * @action ADD A NEW BOOK
 * @route http://localhost:3000/addBook
 * @method POST
*/

app.post('/addBook', async (req, res)=>{

   /**
    * @tutorial: steps
    * 1. Authenticate publisher and get user _id.
    * 2. Assign user id from signed in publisher to publisher key.
    * 3. Call save method on Book.
   */

   try {
      //validate data as required

      const book = new Book(req.body);
      // book.publisher = publisher._id; <=== Assign user id from signed in publisher to publisher key
      await book.save();

      /**
       * @tutorial: steps
       * 1. Find the publishing house by Publisher ID.
       * 2. Call the push method on the publishedBooks key of Publisher.
       * 3. Pass the newly created book as value (Mongoose stores its _id).
       * 4. Call save method.
      */
      const publisher = await Publisher.findById(book.publisher); // findById takes the id itself, not a filter object
      publisher.publishedBooks.push(book);
      await publisher.save();

      //return new book object, after saving it to Publisher
      res.status(200).json({success:true, data: book })

   } catch (err) {
      res.status(400).json({success: false, message:err.message})
   }
})

A Publisher adding a new book to be published to her DB
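
As a possible alternative to the fetch, push, and save steps above, the same reference could be appended with a single update using MongoDB's $push operator. This is only a sketch, not the approach used in the project: addBookToPublisher is a hypothetical helper, and the Publisher model is passed in as an argument so that the function does not depend on a live database connection.

```javascript
// Sketch: append a book id to a publisher's publishedBooks array in one update.
// `Publisher` is expected to be the Mongoose model built from the Publisher schema.
function addBookToPublisher(Publisher, publisherId, bookId) {
   return Publisher.findByIdAndUpdate(
      publisherId,
      { $push: { publishedBooks: bookId } }, // $push appends the id to the array
      { new: true } // resolve with the updated publisher rather than the old one
   );
}

// Hypothetical usage inside the /addBook route, after `await book.save()`:
// const publisher = await addBookToPublisher(Publisher, book.publisher, book._id);
```

This avoids loading the whole Publisher document just to modify one array, at the cost of skipping the document-level save step.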

Pushing the book onto publishedBooks and saving the Publisher is the standard way of saving Child document references (ids) on the Publisher’s document. On successful creation, the following is returned when you query the Publisher by id and populate its publishedBooks.

PS: The Publisher below created 3 new books.



{
    "publishedBooks": [
        {
            "_id": "5f5f8ced4021061030b0ab68",
            "name": "Learn to Populate virtuals Mongoose",
            "publishYear": 2019,
            "author": "Devangelist"
        },
        {
            "_id": "5f5f8d144021061030b0ab6a",
            "name": "Why GoLang gaining traction",
            "publishYear": 2020,
            "author": "John Doe"
        },
        {
            "_id": "5f5f8d3c4021061030b0ab6b",
            "name": "Developer Impostor syndrome",
            "publishYear": 2021,
            "author": "John Mark"
        }
    ],
    "_id": "5f5f8ac71edcc2122cb341c7",
    "name": "Embedded Publishers",
    "location": "Lagos, Nigeria",
    "createdAt": "2020-09-14T15:22:47.183Z",
    "updatedAt": "2020-09-14T15:33:16.449Z",
    "__v": 3
}

Saved object returns Child array
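
The response above assumes the publisher was fetched with its publishedBooks populated. A minimal sketch of such a lookup could look like the following; findPublisherWithBooks and the route path in the comment are hypothetical, and the Publisher model is passed in as an argument so that the function can be read and exercised without a database connection.

```javascript
// Sketch: fetch one publisher and replace the ids in `publishedBooks`
// with selected fields of the Book documents they reference.
// `Publisher` is expected to be the Mongoose model built from the Publisher schema.
function findPublisherWithBooks(Publisher, publisherId) {
   return Publisher.findById(publisherId).populate({
      path: 'publishedBooks', // the array of Book ids on the publisher
      select: 'name publishYear author', // fields to load from each book
   });
}

// Hypothetical route using the helper:
// app.get('/publishers/:id', async (req, res) => {
//    const publisher = await findPublisherWithBooks(Publisher, req.params.id);
//    res.status(200).json(publisher);
// });
```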

However, should the push and save methods not be called on the Publisher's document, the Publisher, although it exists and the new Book has been created, will return an empty publishedBooks array when queried, as seen below.



{
    "success": true,
    "data": {
        "publishedBooks": [],
        "_id": "5f5f8ac71edcc2122cb341c7",
        "name": "Embedded Publishers",
        "location": "Lagos, Nigeria",
        "createdAt": "2020-09-14T15:22:47.183Z",
        "updatedAt": "2020-09-14T15:22:47.183Z",
        "__v": 0
    }
}

Empty Array, when object isn't pushed and saved

Despite the success of the Child Referencing method, its limitation is that the array of ids can grow very large very quickly, and the database loses efficiency and performance over time as the array grows. MongoDB's documentation lists unbounded arrays as a schema design anti-pattern, and strongly discourages them for document relationships run at scale.

Parent Referencing: Parent referencing, on the other hand, differs from Child referencing in that ONLY Child documents keep a reference to their Parent document. This reference is kept singly on each Child document created, defined as an ObjectId on the Schema. Parent documents, conversely, keep no direct reference but build one with the help of a Mongoose feature called virtuals.

A Mongoose virtual is a far more sophisticated approach to fetching referenced Child documents and, importantly, it takes up less storage, because the field the virtual creates whenever a query is run is not persisted on the Parent document. Populating a virtual is occasionally referred to as "reverse populate", so when you hear people mention that, don't fret!

Enough with the talk, let's jump into our project code.
First, let's see what our Book Schema looks like below:



const mongoose= require('mongoose');
const {Schema} = require('mongoose');

const bookSchema = new Schema({
   name: String,
   publishYear: Number,
   author: String,
   publisher: {
      type: Schema.Types.ObjectId,
      ref: 'Publisher',
      required: true
   }
},
{timestamps: true})

module.exports = mongoose.model('Book', bookSchema);

Next comes the tricky part: our Parent document. Pay attention to how the virtual is defined. A crucial part of this is the extra options we must set on the Schema, toJSON and toObject, without which the virtual field is left out of the response. By default, virtuals are not included when a document is converted with toJSON() or toObject(). Setting virtuals to true on both options ensures the populated virtual is included whenever the Parent document is serialized, for example when it is passed to the .json() method on the response.



const mongoose = require('mongoose');
const {Schema} = require('mongoose');

const publisherSchema = new Schema({
   name: String,
   location: String
},
   {timestamps: true}
);

/**
 * @action Define a Schema virtual
 * @keys
 *    1. The first parameter can be named anything.
 *       It is the name of the virtual field added to each Publisher.
 *
 *    2. Options object
 *       ref: Model name of the Child collection
 *       localField: Key on the Parent (Publisher) document whose value is matched
 *       foreignField: Key on the Child (Book) document that stores the Parent's localField value
 */
publisherSchema.virtual('booksPublished', {
   ref: 'Book', // The Model to use
   localField: '_id', // Find Books where the Publisher's localField
   foreignField: 'publisher', // is equal to the Book's foreignField
});

// Include virtuals in toObject() and toJSON() output. They are excluded by default.
publisherSchema.set('toObject', { virtuals: true });
publisherSchema.set('toJSON', { virtuals: true });


module.exports = mongoose.model('Publisher', publisherSchema);

Notice that we don’t have a publishedBooks array anymore on the Schema

The best way to remember how to define the virtual (much easier if you come from an SQL background) is:

SELECT “name for the virtual field” FROM “ref: the Child model name” WHERE “localField: the Parent key to match, mostly _id” EQUALS “foreignField: the Child schema key that stores the parent id as its value”.
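
The matching rule in that mnemonic can be imitated with plain arrays. The sketch below is an in-memory illustration only, not how Mongoose implements virtuals; attachVirtual is a hypothetical helper and the ids are made up.

```javascript
// For every parent, collect the children whose foreignField equals the
// parent's localField, and expose them under the virtual's name.
function attachVirtual(parents, children, { name, localField, foreignField }) {
   return parents.map((parent) => ({
      ...parent, // copy, so the stored parent is never modified
      [name]: children.filter(
         (child) => String(child[foreignField]) === String(parent[localField])
      ),
   }));
}

const publishers = [
   { _id: 'p1', name: 'Random Publishers' },
   { _id: 'p2', name: 'Embedded Publishers' },
];
const books = [
   { _id: 'b1', name: 'Mastering Mongoose with Javascript', publisher: 'p1' },
   { _id: 'b2', name: 'Learning Mongoose Populate method', publisher: 'p1' },
];

const result = attachVirtual(publishers, books, {
   name: 'booksPublished',
   localField: '_id',
   foreignField: 'publisher',
});

console.log(result[0].booksPublished.length); // 2
console.log(result[1].booksPublished.length); // 0
console.log('booksPublished' in publishers[0]); // false
```

The last line mirrors the point made earlier: the virtual field exists only on the query result and is never written back to the stored Parent.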

With the virtual and both schema options defined, whenever we populate our Publishers in a GET request, we retrieve all books published by each publisher. For further specificity, since not all the information about a book is needed, we select only the required keys from each book and return them in the response body.

See how it is done in our project below:



/***
 * @action GET ALL PUBLISHERS
 * @route http://localhost:3000/publishers
 * @method GET
 */
app.get('/publishers', async (req, res) => {
   try {
      const data = await Publisher.find()
                                 .populate({path: 'booksPublished', select: 'name publishYear author'});
      res.status(200).json({success: true, data});
   } catch (err) {
      res.status(400).json({success: false, message:err.message});
   }
})

Get all Publishers



{
    "success": true,
    "data": [
        {
            "_id": "5f5f546e190dff51041db304",
            "name": "Random Publishers",
            "location": "Kigali, Rwanda",
            "createdAt": "2020-09-14T11:30:54.768Z",
            "updatedAt": "2020-09-14T11:30:54.768Z",
            "__v": 0,
            "booksPublished": [
                {
                    "_id": "5f5f548e190dff51041db305",
                    "name": "Mastering Mongoose with Javascript",
                    "publishYear": 2020,
                    "author": "Devangelist",
                    "publisher": "5f5f546e190dff51041db304"
                },
                {
                    "_id": "5f5f55ca190dff51041db307",
                    "name": "Learning Mongoose Populate method",
                    "publishYear": 2019,
                    "author": "Devangelist",
                    "publisher": "5f5f546e190dff51041db304"
                }
            ],
            "id": "5f5f546e190dff51041db304"
        }
    ]
}

Query results from getting all publishers [Notice the booksPublished array]

In summary, Parent referencing is the best approach when using the Normalized model and dealing with a large dataset.

If you made it to this point, thank you for reading through, and I hope you’ve learnt something new. I’m happy to chat further about new knowledge, opportunities and possible corrections. I can be reached on Twitter at @oluseyeo_ or by email at sodevangelist@gmail.com.

Happy Hacking 💥 💥

TL;DR

There are two modelling approaches, Embedded and Referenced.
Embed only when your data will be accessed less frequently and you’re mostly only reading data.
For workloads with heavier read and write traffic (higher IOPS), use the referenced model.
Referencing can be done in two ways, Child and Parent referencing.
If the number of Child documents is small (under 100), use Child referencing. This stores each Child reference key directly on the Parent document using the push method.
If the number of Child documents is huge, use Parent referencing, reverse populating Parent documents using a Mongoose virtual.


Top comments (15)

Idiono-mfon Anthony • Jan 26 '21

Oh, this is a wonderful explanation. Thank you for spending time to do justice to this subject. I was researching ways of handling data modeling effectively in MongoDB and, behold, I landed on this, and it has cleared the air.

hemant-parmar • Apr 29 '21

Great article. I am about to implement relations for my MEAN project.

A Couple of questions.
Q-1. Can I have multiple paths in the find().populate() method?
I have a Log collection, and I was thinking of adding 4 ObjectId type fields, i.e. client, service, executive, manager, apart from its own fields of course. Each of these is just one object, i.e. one Log entry will be associated with one Client, Service, Executive and Manager.

So I will need to add multiple path with their own select keys. Is that recommended?

And when on frontend, when the Log is displayed, I am planning to populate relevant fields from all these 4 (i.e. Client Name, ClientCategory, ClientSubCategory, ClientRating), (ServiceName, ServiceFreq), ExecName and ManagerName.

The Log display on frontend has Search and filter options on various fields. So when a user searches or applies filters the backend Mongoose query will run again, fetch the data and display.

Q-2: What would be the performance impact if the number of entries in the Log collection is in the range of 5000 - 50,000, when I use Child Ref vs Parent Ref? Which one would you recommend in this case?

Thanks again
hemant