Do you have a lot of data in MongoDB that you want to explore and see what it contains? Do you want to create advance reports out of the documents you've stored? If so, Mongo aggregations are right for you.
When using a relational database such as Postgres we could slice and dice our data with a group by
clause. In the MongoDB world we have aggregations. I have created a simple two collection database based upon the Internet Movie Database downloadable data. We have an Actors collection.
{
"_id": "nm0554880",
"name": "Emilio Martínez Lázaro",
"yearBorn": 1945,
"yearDied": null,
"genres": [
"director",
"writer",
"producer"
],
"movies": [
"tt0067556",
"tt0076520",
"tt0081577",
"tt0088285",
"tt0089521",
"tt0093320",
"tt0095670"
]}
We also have a Movies collection
{
"type": "movie",
"title": "Star Wars",
"yearReleased": 1977,
"genres": [
"Action",
"Adventure",
"Fantasy"
],
"cast": [
"nm0000184"
...
]
}
As you can see these collections are joined in a many-to-many relationship through the cast
and movies
arrays within each document. We could have stored all of an actor's information directly under each movie or vice versa, however, that would not have made it as easy to search from both perspectives, movies and actors.
We can write our aggregations using either MongoDB Compass or using through our language of choice. I will show examples in NodeJS and converting to any language will be straightforward. In fact, a wonderful feature of Compass is that you can build aggregations and then generate language specific code directly through the UI.
My examples assume we have a connection as shown in this MongoDB tutorial.
Query Actors with Over Ten Movies
Let's start by taking a look at prolific actors who are billed in more than ten movies. We will be aggregating using the actors
collection by building a pipeline of steps that transforms, filters or summarizes our collection. Our first steps will be to add a count of movies to our data using the project aggregation.
const projectClause = {
$project: {
name: 1,
movieCount: { $size: '$movies' }
}
}
The name: 1
line indicates that we want the actor's name in our returned data. Next we add movieCount
as a field and set it's value to the size of the movies array. Note the dollar sign with movies. This tells the parser that you are referring to data and aren't just creating a string.
Next we want to filter our data using the match aggregation.
const matchClause = {
$match: { roleCount: { $gte: 10 }},
}
Using our the greater than or equal operator we filter out any one who doesn't have enough movies credited to them.
Running this aggregation can be done by creating an array of steps and passing that to the aggregate function from our mongo client.
const results = await db.collection('actors').aggregate(
[
projectClause,
matchClause,
]
);
This returns an AggregationCursor
object over which we can iterate.
await results.forEach(actor => {
console.log(actor)
});
This will get us results like:
{ _id: 'nm0677951', name: 'Daniel Petrie', roleCount: 180 }
{ _id: 'nm0677953', name: 'Donald Petrie', roleCount: 46 }
{ _id: 'nm0678072', name: 'Boris Petroff', roleCount: 14 }
{ _id: 'nm0678101', name: 'Giulio Petroni', roleCount: 25 }
{ _id: 'nm0678203', name: 'Vladimir Petrov', roleCount: 35 }
{ _id: 'nm0678210', name: 'Pavel Petrov-Bytov', roleCount: 10 }
{ _id: 'nm0678249', name: 'Aleksandar Petrovic', roleCount: 26 }
{ _id: 'nm0678328', name: 'Jean Petrovici', roleCount: 16 }
{ _id: 'nm0678476', name: 'Hans-Klaus Petsch', roleCount: 11 }
{ _id: 'nm0678483', name: 'Erik A. Petschler', roleCount: 18 }
Add in Movie Details
What if we want to see a list of movies for each actor? The lookup aggregation will allow us to perform a left outer join from the actors collection to the movies.
The first this we need to do is add the movies array to our project clause since each step of the pipeline passes data to the next and we will need it later in the pipeline.
const projectClause = {
$project: {
name: 1,
movies: 1,
roleCount: { $size: '$movies' }
}
}
Our match clause stays the same but we need to add more stages to our aggregation. In this new clause below we first add a from element that references our movies collection. Next we name the field within the actor document which is our array names "movies." Since movies is an array of IDs we link each element of the movies array to the "_id" field within the movies collection using foriegnField
. Lastly, we name our resulting array moviesNames
.
const lookupClause = {
$lookup: {
from: "movies",
localField: "movies",
foreignField: "_id",
as: "movieNames",
}
}
We could leave it there and we'll have our connection but what if we want to clean up the returned data a bit first? We can add a second project clause.
const projectClause2 = {
$project: {
name: 1,
roleCount: 1,
'movieNames.title': 1,
'movieNames.yearReleased': 1,
}
}
Notice the dot notation that allows us to specify fields within our array of movie objects. You will also see that we left out the original array or movie IDs since we don't need it once the join is complete. Then we run like this:
const results = await db.collection('actors').aggregate(
[
projectClause,
matchClause,
lookupClause,
projectClause2,
]
);
await results.forEach(actor => {
console.log(actor)
});
We get results that look like:
{
_id: 'nm0240439',
name: 'Peter Duffell',
roleCount: 72,
movieNames: [
{ title: 'The House That Dripped Blood', yearReleased: 1971 },
{ title: "From a Bird's Eye View", yearReleased: 1970 },
{ title: 'The Adventures of Black Beauty', yearReleased: 1972 },
{ title: 'England Made Me', yearReleased: 1973 },
{ title: 'Inside Out', yearReleased: 1975 },
...
Aggregations can be used for so much more than what we've seen here but these steps will allow you to get started in your data journey.
Top comments (0)