DEV Community

Derek Binkley
Derek Binkley

Posted on

Take a Data Adventure with MongoDB Aggregations, Part 1

Do you have a lot of data in MongoDB that you want to explore and see what it contains? Do you want to create advance reports out of the documents you've stored? If so, Mongo aggregations are right for you.

When using a relational database such as Postgres we could slice and dice our data with a group by clause. In the MongoDB world we have aggregations. I have created a simple two collection database based upon the Internet Movie Database downloadable data. We have an Actors collection.

{  
  "_id": "nm0554880",
  "name": "Emilio Martínez Lázaro",
  "yearBorn": 1945,
  "yearDied": null,
  "genres": [
    "director",
    "writer",
    "producer"
  ],
  "movies": [
    "tt0067556",
    "tt0076520",
    "tt0081577",
    "tt0088285",
    "tt0089521",
    "tt0093320",
    "tt0095670"
  ]}
Enter fullscreen mode Exit fullscreen mode

We also have a Movies collection

{
  "type": "movie",
  "title": "Star Wars",
  "yearReleased": 1977,
  "genres": [
    "Action",
    "Adventure",
    "Fantasy"
  ],
  "cast": [
    "nm0000184"
    ...
  ]
}
Enter fullscreen mode Exit fullscreen mode

As you can see these collections are joined in a many-to-many relationship through the cast and movies arrays within each document. We could have stored all of an actor's information directly under each movie or vice versa, however, that would not have made it as easy to search from both perspectives, movies and actors.

We can write our aggregations using either MongoDB Compass or using through our language of choice. I will show examples in NodeJS and converting to any language will be straightforward. In fact, a wonderful feature of Compass is that you can build aggregations and then generate language specific code directly through the UI.

My examples assume we have a connection as shown in this MongoDB tutorial.

Query Actors with Over Ten Movies

Let's start by taking a look at prolific actors who are billed in more than ten movies. We will be aggregating using the actors collection by building a pipeline of steps that transforms, filters or summarizes our collection. Our first steps will be to add a count of movies to our data using the project aggregation.

const projectClause = {
  $project: {
    name: 1,
    movieCount: { $size: '$movies' }
  }
}
Enter fullscreen mode Exit fullscreen mode

The name: 1 line indicates that we want the actor's name in our returned data. Next we add movieCount as a field and set it's value to the size of the movies array. Note the dollar sign with movies. This tells the parser that you are referring to data and aren't just creating a string.

Next we want to filter our data using the match aggregation.

const matchClause = {
  $match: { roleCount: { $gte: 10 }},
}
Enter fullscreen mode Exit fullscreen mode

Using our the greater than or equal operator we filter out any one who doesn't have enough movies credited to them.

Running this aggregation can be done by creating an array of steps and passing that to the aggregate function from our mongo client.

const results = await db.collection('actors').aggregate(
  [
    projectClause,
    matchClause,
  ]
);
Enter fullscreen mode Exit fullscreen mode

This returns an AggregationCursor object over which we can iterate.

await results.forEach(actor => {
  console.log(actor)
});
Enter fullscreen mode Exit fullscreen mode

This will get us results like:

{ _id: 'nm0677951', name: 'Daniel Petrie', roleCount: 180 }
{ _id: 'nm0677953', name: 'Donald Petrie', roleCount: 46 }
{ _id: 'nm0678072', name: 'Boris Petroff', roleCount: 14 }
{ _id: 'nm0678101', name: 'Giulio Petroni', roleCount: 25 }
{ _id: 'nm0678203', name: 'Vladimir Petrov', roleCount: 35 }
{ _id: 'nm0678210', name: 'Pavel Petrov-Bytov', roleCount: 10 }
{ _id: 'nm0678249', name: 'Aleksandar Petrovic', roleCount: 26 }
{ _id: 'nm0678328', name: 'Jean Petrovici', roleCount: 16 }
{ _id: 'nm0678476', name: 'Hans-Klaus Petsch', roleCount: 11 }
{ _id: 'nm0678483', name: 'Erik A. Petschler', roleCount: 18 }
Enter fullscreen mode Exit fullscreen mode

Add in Movie Details

What if we want to see a list of movies for each actor? The lookup aggregation will allow us to perform a left outer join from the actors collection to the movies.

The first this we need to do is add the movies array to our project clause since each step of the pipeline passes data to the next and we will need it later in the pipeline.

const projectClause = {
    $project: {
      name: 1,
      movies: 1,
      roleCount: { $size: '$movies' }
    }
  }
Enter fullscreen mode Exit fullscreen mode

Our match clause stays the same but we need to add more stages to our aggregation. In this new clause below we first add a from element that references our movies collection. Next we name the field within the actor document which is our array names "movies." Since movies is an array of IDs we link each element of the movies array to the "_id" field within the movies collection using foriegnField. Lastly, we name our resulting array moviesNames.

const lookupClause = {
  $lookup: {
    from: "movies",
    localField: "movies",
    foreignField: "_id",
    as: "movieNames",
  }
}
Enter fullscreen mode Exit fullscreen mode

We could leave it there and we'll have our connection but what if we want to clean up the returned data a bit first? We can add a second project clause.

const projectClause2 = {
  $project: {
    name: 1,
    roleCount: 1,
    'movieNames.title': 1,
    'movieNames.yearReleased': 1,
  }
}
Enter fullscreen mode Exit fullscreen mode

Notice the dot notation that allows us to specify fields within our array of movie objects. You will also see that we left out the original array or movie IDs since we don't need it once the join is complete. Then we run like this:

const results = await db.collection('actors').aggregate(
  [
    projectClause,
    matchClause,
    lookupClause,
    projectClause2,
  ]
);

await results.forEach(actor => {
  console.log(actor)
});
Enter fullscreen mode Exit fullscreen mode

We get results that look like:

{
  _id: 'nm0240439',
  name: 'Peter Duffell',
  roleCount: 72,
  movieNames: [
    { title: 'The House That Dripped Blood', yearReleased: 1971 },
    { title: "From a Bird's Eye View", yearReleased: 1970 },
    { title: 'The Adventures of Black Beauty', yearReleased: 1972 },
    { title: 'England Made Me', yearReleased: 1973 },
    { title: 'Inside Out', yearReleased: 1975 },
...
Enter fullscreen mode Exit fullscreen mode

Aggregations can be used for so much more than what we've seen here but these steps will allow you to get started in your data journey.

Top comments (0)