Working with nested data in MongoDB simplifies mapping between application objects and database structures. However, challenges can arise when grouping or joining values within sub-document arrays, particularly for developers shifting from SQL databases with normalized data where the result is always flattened to tabular result.
I'll go through an example with the following collection, and link to MongoDB Playground for each explanation.
[
{
"projectName": "Troubleshooting PostgreSQL issues",
"team": [
{ "memberId": "cyclops", "name": "Cyclops", "role": "Postgres Expert" },
{ "memberId": "wolverine", "name": "Wolverine", "role": "Consultant" },
{ "memberId": "storm", "name": "Storm", "role": "DBA" },
{ "memberId": "beast", "name": "Beast", "role": "Developer" },
{ "memberId": "tony", "name": "Tony", "role": "Architect" }
],
"status": "active"
},
{
"projectName": "Build new apps with MongoDB",
"team": [
{ "memberId": "tony", "name": "Tony", "role": "Developer" }
],
"status": "planned"
}
]
Suppose you want a report of which projects each person was involved in, and their role(s) for each.
In PostgreSQL, you’d have normalized to three tables because of the Many-to-Many relationship. The join returns one row per project and team member, with information about projects and members duplicated. You can then aggregate them per team member to get the projects per person as an array:
SELECT
tm.member_id,
tm.name,
ARRAY_AGG(p.project_name ORDER BY p.project_id) AS projects
FROM
team_members tm
JOIN project_team pt ON tm.member_id = pt.member_id
JOIN projects p ON pt.project_id = p.project_id
GROUP BY
tm.member_id, tm.name
ORDER BY
tm.member_id;
Here's the example on db<>fiddle: https://dbfiddle.uk/FhsA9DpU
In MongoDB, relationships are embedded. I created a "team" array for project members. Let's explore how to aggregate it, starting with some common mistakes.
The Wrong Way: Grouping Directly on an Array
Suppose you try to group directly on the array field:
db.projects.aggregate([
{ $group: { _id: "$team.memberId", projects: { $push: "$projectName" } } }
])
What you get:
[
{
"_id": [
"cyclops", "wolverine", "storm", "beast", "tony"
],
"projects": ["Troubleshooting PostgreSQL issues"]
},
{
"_id": [
"tony"
],
"projects": ["Build new apps with MongoDB"]
}
]
Here's the example on MongoDB Playground: https://mongoplayground.net/p/bn7xsRMQhu2
What went wrong?
MongoDB used the whole array as the grouping key. Instead of grouping by each individual member, you get one group for each unique full-team combination.
Tip for SQL users:
In SQL, GROUP BY splits rows by scalar values, but in MongoDB, grouping on an array field groups by the entire array as a single value.
The Wrong Way: Lookup Directly on an Array
Similarly, you may try to enrich team data with $lookup
on the array field:
{
$lookup: {
from: "members",
localField: "team.memberId",
foreignField: "memberId",
as: "memberInfo"
}
}
Here's the example on MongoDB Playground:
https://mongoplayground.net/p/RVXoY5z7ke1
What you get:
- The entire array is used as the lookup key, so
memberInfo
will usually just be an empty array (no matches), unless a member’smemberId
somehow matches an array, which it never will.
Key for SQL folks:
Unlike SQL, where the JOIN is applied per row value, MongoDB $lookup
expects scalar fields, not arrays. If your join keys are inside an array, you need to flatten that array first.
The Right Way: Flatten with $unwind
, Then $group
First, flatten out your team
array so each person/project combination gets its own pipeline document:
{ $unwind: "$team" }
Here's how it looks like on MongoDB Playground:
https://mongoplayground.net/p/yGvvM-FZM5p
Then group by the member ID:
Don’t forget to include the role for each project. Here, we also collect all roles for completeness.
{
$group: {
_id: "$team.memberId",
memberName: { $first: "$team.name" },
roles: { $addToSet: "$team.role" },
projects: {
$push: {
name: "$projectName",
id: "$_id",
role: "$team.role"
}
}
}
}
The full working pipeline is like this:
db.projects.aggregate([
{ $unwind: "$team" },
{
$group: {
_id: "$team.memberId",
memberName: { $first: "$team.name" },
roles: { $addToSet: "$team.role" },
projects: {
$push: {
name: "$projectName",
id: "$_id",
role: "$team.role"
}
}
}
}
])
Here's the result on MongoDB Playground:
https://mongoplayground.net/p/oMOfvrZXa2a
This is the expected output, with one document per member and their project or array of projects:
[
{
"_id": "cyclops",
"memberName": "Cyclops",
"roles": ["Postgres Expert"],
"projects": [
{ "name": "Troubleshooting PostgreSQL issues", "id": ObjectId("60f...a1"), "role": "Postgres Expert" }
]
},
{
"_id": "wolverine",
"memberName": "Wolverine",
"roles": ["Consultant"],
"projects": [
{ "name": "Troubleshooting PostgreSQL issues", "id": ObjectId("60f...a1"), "role": "Consultant" }
]
},
{
"_id": "storm",
"memberName": "Storm",
"roles": ["DBA"],
"projects": [
{ "name": "Troubleshooting PostgreSQL issues", "id": ObjectId("60f...a1"), "role": "DBA" }
]
},
{
"_id": "beast",
"memberName": "Beast",
"roles": ["Developer"],
"projects": [
{ "name": "Troubleshooting PostgreSQL issues", "id": ObjectId("60f...a1"), "role": "Developer" }
]
},
{
"_id": "tony",
"memberName": "Tony",
"roles": ["Architect", "Developer"],
"projects": [
{ "name": "Troubleshooting PostgreSQL issues", "id": ObjectId("60f...a1"), "role": "Architect" },
{ "name": "Build new apps with MongoDB", "id": ObjectId("60f...a2"), "role": "Developer" }
]
}
]
Now you get the right result: Tony appears in both projects, and each is listed with the correct role.
Practical reminder:
MongoDB’s $unwind
makes array data behave more like SQL’s flattened rows, so aggregation and joins work as SQL users expect.
Here is an example with $lookup: https://mongoplayground.net/p/I3__p2EE9ps
What About Filtering Instead of Grouping?
If you just want to filter out all but the developers in each project, no unwinding is needed:
db.projects.aggregate([
{
$addFields: {
team: {
$filter: {
input: "$team",
as: "member",
cond: { $eq: ["$$member.role", "Developer"] }
}
}
}
}
])
Here's the MongoDB playground:
https://mongoplayground.net/p/bhDQJVKu7Rp
[
{
"_id": ObjectId("60f...a1"),
"projectName": "Troubleshooting PostgreSQL issues",
"team": [
{ "memberId": "beast", "name": "Beast", "role": "Developer" }
],
"status": "active"
},
{
"_id": ObjectId("60f...a2"),
"projectName": "Build new apps with MongoDB",
"team": [
{ "memberId": "tony", "name": "Tony", "role": "Developer" }
],
"status": "planned"
}
]
I used the following aggregation operators:
- $unwind: Flattens array fields, creating a separate document for each element in the array.
- $group: Similar to SQL’s GROUP BY, especially effective after arrays have been unwound.
- $lookup: Similar to a SQL LEFT OUTER JOIN, but by default MongoDB does not flatten joined results into multiple documents with repeated data. Instead, matched documents are returned normalized (0NF) as an array field.
SQL databases store their data normalized, often in third normal form, with multiple tables for one-to-many and many-to-many relationships to avoid duplicated data. The query result requires a join and is always in first normal form (1NF), as the result of a SQL query is a single flat table. Since 1NF cannot have arrays, SQL databases have to unnest groups, flattening relationships to multiple rows and introducing redundancy in a denormalized tabular result set.
MongoDB is not constrained by normal forms and supports rich document models, with arrays for repeating groups and nested objects directly in each document. When you use the aggregation pipeline, results can keep this nested structure. But if you want to group or join on values nested inside arrays, you’ll need to flatten the array to multiple documents using $unwind
, so further aggregation stages work as expected. In practice, $lookup
in MongoDB is often compared to JOINs in SQL, but if your fields live inside arrays, a join operation is really $unwind
followed by $lookup
.
Key takeaways for SQL users moving to MongoDB:
- Arrays are not automatically expanded or unnested.
- Whenever your “join key” or “group by value” is inside an array, always unwind first.
- In MongoDB, aggregation pipeline results can be nested and contain arrays, unlike the flat results of SQL queries.
-
$unwind is the document-model equivalent of SQL’s
UNNEST
: it’s your bridge from nested arrays to flat, row-like documents for further aggregation. - When joining with $lookup, always check whether your localField or foreignField is an array and flatten as needed.
Top comments (0)