MongoDB Guide | Aggregation
by Jacob Pelletier
Contact me for suggestions, comments, or just to say hello at jacob@yankeedo.io! 👋
Follow me on my journey in learning MongoDB.
I highly recommend checking out Mongo University!
What will we be covering in this guide
- What aggregation is.
- What the stages are.
Introduction.
What is aggregation?
Aggregation is the analysis and summarization of data. An aggregation runs as one or more stages that process documents. In a pipeline, documents can be filtered, sorted, grouped, and transformed, and the output of one stage becomes the input of the next. Aggregations do not modify the original source data.
db.collection.aggregate([
{ $stage_name: {<expression>} },
{ $stage_name: {<expression>} }
])
Each stage is a single operation on the data. Common operation stages are:
- $match - filters for data that match criteria.
- $group - groups documents based on criteria.
- $sort - puts documents in a specified order.
Each stage has its own syntax to carry out its operations.
A field name prefixed with a dollar sign is called a "field path". It lets us refer to the value stored in that field.
For example, the code below takes the values of the first_name and last_name fields and sets the concatenation of these values as defaultUsername.
{
  $set: {
    defaultUsername: {
      $concat: ["$first_name", "$last_name"]
    }
  }
}
An aggregation is a collection and summary of data. A stage is a built-in method that operates on the data without altering it. An aggregation pipeline is a series of stages applied to the data in a specific order.
db.collection.aggregate([
  {
    $stage1: { <expression1>, ... }
  },
  {
    $stage2: { <expression1>, ... }
  }
])
Stage Syntax:
{$match: {<query>}}
Query Syntax:
{$expr: {<expression>}}
Method Syntax:
db.collection.aggregate([])
Match and Group Stages
These are very common aggregation stages.
- $match: filters for documents matching some criteria.
a. takes one argument, a query document.
b. uses the same query syntax as a find command.
c. example: {$match: {"state": "CT"}}.
d. place it as early as possible in the pipeline so it can use indexes.
e. it reduces the number of documents, and thus the amount of processing required by later stages.
$match airbnb example
Find all airbnbs with 5 or more beds.
{ $match: { "beds": { $gte: 5 } } },
- $group: Creates a single document for each distinct value.
a. groups documents by a group key.
b. output is one document for each unique value of the group key.
c. requires a group key, specified by _id, the field to group by.
d. it may also include one or more fields with an accumulator.
e. the accumulator specifies how to aggregate the information for each of the groups.
$group generic example:
{
  $group: {
    _id: <expression>, // group key
    <field>: { <accumulator> : <expression> }
  }
}
$group airbnb example
Find all airbnbs with 5 or more beds and return the number of listings for each bed count, sorted from most to least common.
db.listingsAndReviews.aggregate([
{ $match: { "beds": { $gte: 5 } } },
{ $group: { _id: "$beds", count: { $count: {} } } },
{ $sort: { count: -1 } },
])
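The accumulator in the example above is $count, but other accumulators follow the same pattern. As a minimal sketch, the pipeline below groups the same listings by beds and also computes an average with $avg. It assumes each listing has a numeric price field, and the names bedStats, listings, and avgPrice are made up for illustration.

```javascript
// Pipeline written as a plain array so it can be passed to aggregate() in mongosh.
const bedStats = [
  { $match: { beds: { $gte: 5 } } },
  {
    $group: {
      _id: "$beds",                 // group key: one output document per bed count
      listings: { $sum: 1 },        // adds 1 per document, same result as { $count: {} }
      avgPrice: { $avg: "$price" }, // assumption: listings have a numeric price field
    },
  },
  { $sort: { _id: 1 } },            // ascending by bed count
];

// In mongosh: db.listingsAndReviews.aggregate(bedStats)
```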
Sort and Limit Stage
I had already used sort earlier to make the results more readable.
- $sort: as demonstrated in the $group example, the $sort stage sorts documents by some field, such as the count field above.
a. 1 represents ascending order, while -1 represents descending order.
- $limit: limits the number of documents that are passed to the next stage.
a. only takes a positive integer.
db.listingsAndReviews.aggregate([
{ $match: { "beds": { $gte: 5 } } },
{ $group: { _id: "$beds", count: { $count: {} } } },
{ $sort: { count: -1 } },
{ $limit: 3 }
])
As you can see, the order of the stages matters. If we were to limit before we sorted, then we would get different results.
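To make that concrete, here is a small sketch in plain JavaScript that mimics $sort and $limit with array methods. It does not talk to MongoDB, and the grouped counts are made-up data shaped like the output of the $group stage.

```javascript
// Hypothetical grouped results (made-up numbers, not real query output).
const grouped = [
  { _id: 5, count: 10 },
  { _id: 6, count: 40 },
  { _id: 7, count: 25 },
  { _id: 8, count: 30 },
];

// Stand-ins for { $sort: { count: -1 } } and { $limit: n }.
const sortByCountDesc = (docs) => [...docs].sort((a, b) => b.count - a.count);
const limit = (docs, n) => docs.slice(0, n);

// $sort then $limit: the two bed counts with the most listings.
const sortThenLimit = limit(sortByCountDesc(grouped), 2);

// $limit then $sort: only the first two documents are kept, then sorted.
const limitThenSort = sortByCountDesc(limit(grouped, 2));

console.log(sortThenLimit.map((d) => d._id)); // [ 6, 8 ]
console.log(limitThenSort.map((d) => d._id)); // [ 6, 5 ]
```

Sorting first answers "which bed counts are most common?", while limiting first only sorts whatever documents happened to arrive first.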
Project, Count, and Set Stage
- $project: determines resulting output shape.
a. specifies fields to return in aggregation.
b. similar to find operation.
c. should be used last if possible.
d. chooses which fields to keep by either inclusion or exclusion.
e. set the field to 1 to include it, <field> : 1
f. set the field to 0 to exclude it, <field> : 0
g. new fields can be added, and existing fields given a new value, with <field> : <new value>
Notice the differences in the project stages below.
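As a minimal sketch, the two pipelines here differ only in their $project stage: the first keeps fields by inclusion, the second drops a field by exclusion. They assume the same listingsAndReviews collection with name, beds, and reviews fields, and the variable names are made up for illustration.

```javascript
// Inclusion: return only name and beds.
// _id is included by default, so it is switched off explicitly.
const projectInclude = [
  { $match: { beds: { $gte: 5 } } },
  { $project: { _id: 0, name: 1, beds: 1 } },
];

// Exclusion: return every field except reviews
// (assumption: reviews is a large array we do not need here).
const projectExclude = [
  { $match: { beds: { $gte: 5 } } },
  { $project: { reviews: 0 } },
];

// In mongosh:
// db.listingsAndReviews.aggregate(projectInclude)
// db.listingsAndReviews.aggregate(projectExclude)
```

Apart from _id, a single $project stage should stick to either inclusion or exclusion rather than mixing 1 and 0.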
- $count: counts the number of documents in the pipeline.
For example, if we wanted to count how many distinct bed counts of 5 or more there are, we could perform the following aggregation.
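A minimal sketch of that aggregation, reusing the $match and $group stages from earlier. The output field name bedOptions and the variable name are made up for illustration.

```javascript
const countBedOptions = [
  { $match: { beds: { $gte: 5 } } }, // keep listings with 5 or more beds
  { $group: { _id: "$beds" } },      // one document per distinct bed count
  { $count: "bedOptions" },          // count the documents that reach this stage
];

// In mongosh: db.listingsAndReviews.aggregate(countBedOptions)
// The result is a single document shaped like { bedOptions: <number> }.
```

The $count stage takes a string, which becomes the name of the field that holds the total.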
- $set: adds or modifies fields in the pipeline.
In this example, we rename beds to _id.
The same pipeline without the $set stage.
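Neither pipeline is reproduced here, so the following is only a sketch of what the pair might look like. It assumes the same listingsAndReviews collection with name and beds fields, and the variable names are made up. Note that $set copies the value of beds into _id and leaves the original beds field in place.

```javascript
// Stages shared by both pipelines.
const baseStages = [
  { $match: { beds: { $gte: 5 } } },
  { $project: { name: 1, beds: 1 } },
];

// With $set: _id is overwritten with the value of the beds field.
const withSet = [...baseStages, { $set: { _id: "$beds" } }];

// Without $set: _id keeps each listing's original identifier.
const withoutSet = [...baseStages];

// In mongosh:
// db.listingsAndReviews.aggregate(withSet)
// db.listingsAndReviews.aggregate(withoutSet)
```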
Out Stage
- $out: writes the documents that are returned by an aggregation pipeline into a collection.
a. must be the last stage.
b. creates a new collection if it does not already exist.
c. if the collection exists, $out replaces the existing collection with the new data.
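As a sketch, the earlier bed-count pipeline could end with an $out stage to save its results. The target collection name bedCounts is hypothetical, and if a collection with that name already exists, its contents are replaced.

```javascript
const saveBedCounts = [
  { $match: { beds: { $gte: 5 } } },
  { $group: { _id: "$beds", count: { $count: {} } } },
  { $sort: { count: -1 } },
  { $out: "bedCounts" }, // must be last; hypothetical target collection name
];

// In mongosh:
// db.listingsAndReviews.aggregate(saveBedCounts)
// db.bedCounts.find() // read back the saved results
```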