DEV Community

ilinieja
ilinieja

Posted on

MongoDB Materialized Views in NodeJS + Mongoose

What are materialized views?

Materialized views are database objects containing the result of a query. They're useful when we want to quickly retrieve the output of a heavy aggregate query without running it for each request.

In MongoDB, materialized views are just like any other collection. In our case, we want to create materialized views to store multiple time-bucketed analytical views from a large events collection. Depending on a use case, we can update views on a daily, weekly, monthly or any other timespan basis. We can also trigger updates on-demand.

Working example code for this article is available on GitHub.

Original collection

We're going to use an example collection of Orders - large collection of orders made in an online shop. Here's a simplified schema for it.

// Order.ts

const Order = new mongoose.Schema({
  customer: {
    fullName: { type: String, required: true },
    email: { type: String, required: true },
  },
  items: [
    {
      item: {
        name: { type: String, required: true },
        price: { type: Number, required: true },
      },
      quantity: { type: Number, required: true },
    },
  ],
  date: { type: Date, required: true },
});

// This prevents Mongoose from recompiling the model.
export default mongoose.models?.Order || mongoose.model("Order", Order);
Enter fullscreen mode Exit fullscreen mode

Materialized view

We're going to aggregate daily stats of all orders without grouping. That'll reduce out huge event collection to steady-growing 365-documents-a-year collection that's inexpensive to query and store.

This view is expensive to create though since we're aggregating from a whole Orders collection. One of optimizations here can be to only aggregate from the Orders that were created later than the last entry in this materialized view.

It's convenient to store aggregation pipeline for a materialized view model next to it to avoid updating them separately.

// DayStats.ts

const DAY_STATS_COLLECTION_NAME = "day_stats";

// Schema of the aggregate documents stored in materialized view.
const DayStats = new mongoose.Schema(
  {
    _id: { type: String, required: true },
    date: { type: Date, required: true },
    revenue: { type: Number, required: true },
    orders: { type: Number, required: true },
    averageOrderRevenue: { type: Number, required: true },
  },
  { collection: DAY_STATS_COLLECTION_NAME, versionKey: false }
);

// Aggregation pipeline for the materialized view.
export async function calculateDayStats() {
  await Order.aggregate([
    // Flattens items array to calculate sums later.
    { $unwind: { path: "$items" } },
    // Actual calculations for totals and averages.
    {
      $group: {
        _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
        revenue: {
          $sum: { $multiply: ["$items.quantity", "$items.item.price"] },
        },
        orders: { $sum: 1 },
        averageOrderRevenue: {
          $avg: { $multiply: ["$items.quantity", "$items.item.price"] },
        },
      },
    },
    // Adding date object after grouping for querying 
    // the materialized view in future (to use date comparison).
    {
      $addFields: {
        date: {
          $dateFromString: { dateString: "$_id", format: "%Y-%m-%d" },
        },
      },
    },
    // Rounding calculated values.
    {
      $set: {
        revenue: { $round: ["$revenue", 5] },
        orders: { $round: ["$orders", 5] }, 
        averageOrderRevenue: { $round: ["$averageOrderRevenue", 5] },
      },
    },
    // Saving to materialized view (writing into collection).
    { $merge: { into: DAY_STATS_COLLECTION_NAME, whenMatched: "replace" } },
  ]);
}

// This prevents Mongoose from recompiling the model.
export default mongoose.models?.DayStats ||
  mongoose.model("DayStats", DayStats);
Enter fullscreen mode Exit fullscreen mode

Updating the materialized view

Now we have the materialized view and we need to update it daily. For this, we'll use Agenda, a light-weight job scheduling library for Node.js with Mongo backed persistence layer.

We'll have a worker script that'll set up the queue, schedule updates and update the materialized view based on that schedule.

Worker script needs to be always running - it regularly checks the scheduled jobs and triggers them when needed.

// worker.ts

import * as dotenv from "dotenv";
dotenv.config();

import setupQueue from "./setup-queue";
import scheduleDbViewUpdates from "./db-view-updates";

(async function () {
  const queue = await setupQueue();
  const jobs = await scheduleDbViewUpdates(queue);

  console.info("Jobs worker started");
})();
Enter fullscreen mode Exit fullscreen mode

setup-queue is just going to initialize Agenda and point it to MongoDB.

// setup-queue.ts

import { Agenda } from "@hokify/agenda";

const DATABASE_URI = process.env.DATABASE_URI;
const DATABASE_COLLECTION = process.env.JOBS_DATABASE_COLLECTION;

if (!DATABASE_URI || !DATABASE_COLLECTION) {
  throw new Error(
    "DATABASE_URI and JOBS_DATABASE_COLLECTION are missing in env"
  );
}

export default async function setupQueue() {
  const agenda = new Agenda({
    db: {
      address: DATABASE_URI,
      collection: DATABASE_COLLECTION,
    },
    processEvery: "5 minutes",
  });

  agenda.on("start", (job) => {
    console.info(`Job ${job.attrs.name} started`);
  });

  agenda.on("complete", (job) => {
    console.info(`Job ${job.attrs.name} completed`);
  });

  agenda.on("fail", (err, job) => {
    console.error(`Job ${job.attrs.name} failed`, err);
  });

  await agenda.start();

  return agenda;
}
Enter fullscreen mode Exit fullscreen mode

And db-view-updates will schedule our view updates.

// db-view-updates.ts

import { Agenda } from "@hokify/agenda";

import dbConnection from "../shared/dbConnection";
import { calculateDayStats } from "../models/DayStats";

const JOB_NAMES = {
  CALCULATE_DAY_STATS: "CALCULATE_DAY_STATS",
  CALCULATE_DAY_GEO_BUCKET_STATS: "CALCULATE_DAY_GEO_BUCKET_STATS",
  CALCULATE_DAY_PLATFORM_STATS: "CALCULATE_DAY_PLATFORM_STATS",
};

export default async function scheduleDbViewUpdates(agenda: Agenda) {
  const jobs = [];

  agenda.define(JOB_NAMES.CALCULATE_DAY_STATS, async (job) => {
    const connection = await dbConnection();

    await calculateDayStats();

    connection.disconnect();
  });

  jobs.push(await agenda.every("1 day", JOB_NAMES.CALCULATE_DAY_STATS));

  return jobs;
}
Enter fullscreen mode Exit fullscreen mode

Conclusion

This simple setup allows adding arbitrary number of materialized views for performance-intensive MongoDB aggregations. It can also be used with aggregations joining multiple collections.

When extending this setup it's important to keep an eye on aggregations performance and optimize them using indexes, only recalculating for new documents and running aggregations on secondary replica.

Working example code is available on GitHub.

Top comments (1)

Collapse
 
datnguyen2001 profile image
dat nguyen • Edited

don't know why this post is so flop, but it saved my day, thanks ilinieja