This tutorial was written by Nancy Agarwal.
Understanding MongoDB Schema Design
If you have worked with MongoDB, you have probably heard someone say, "MongoDB is schema-less, so just store whatever JSON you want." This is one of the most common misconceptions about MongoDB.
MongoDB is not schema-less; it is schema-flexible. That flexibility is powerful, but it also means developers must take responsibility for good schema design.
When schema design is done correctly:
- Queries become extremely fast
- APIs stay simple
- Applications scale smoothly
When schema design is ignored:
- Queries become slow
- Documents grow bloated
- Updates become difficult
- Systems become harder to maintain
MongoDB Schema Design Do’s
Model for Access Patterns, Not Entities
Relational databases typically start with entities and relationships. MongoDB flips this approach. Instead of starting with tables, you should start with how your application accesses data.
Ask yourself:
- What are the most common queries?
- What data needs to be fetched together?
- What operations happen most frequently?
Example
In one HR system, employee data was always fetched along with department and manager information. Instead of performing joins with $lookup, we used the Extended Reference Pattern.
Key manager fields were embedded inside the employee document, while full manager details were stored separately.
This let most reads be served by a single-document query with no $lookup, which kept them fast.
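The sketch below shows the Extended Reference Pattern in plain Python. It is minimal and rests on assumptions: the copied manager fields (`_id`, `name`, `email`), the employee fields, and the sample data are hypothetical and do not come from the original HR system. The helper `manager_rename_update` is also hypothetical. It only builds the filter and update documents you might pass to PyMongo's `collection.update_many(filter, update)`.

```python
from typing import Any

# Hypothetical choice of manager fields to copy into each employee document.
# Pick the fields your most frequent employee reads actually display.
MANAGER_REF_FIELDS = ("_id", "name", "email")


def manager_ref(manager: dict[str, Any]) -> dict[str, Any]:
    """Return the small subset of manager fields embedded in employee documents."""
    return {f: manager[f] for f in MANAGER_REF_FIELDS if f in manager}


def build_employee(emp_id: str, name: str, department: str,
                   manager: dict[str, Any]) -> dict[str, Any]:
    """Build an employee document carrying an extended reference to its manager."""
    return {
        "_id": emp_id,
        "name": name,
        "department": department,
        # Duplicated manager fields: enough for the usual employee view without
        # a $lookup. The full manager record stays in its own document.
        "manager": manager_ref(manager),
    }


def manager_rename_update(manager_id: str, new_name: str) -> tuple[dict, dict]:
    """Filter and update that resync the duplicated name after a manager renames.

    Intended for PyMongo's collection.update_many(filter, update).
    """
    return {"manager._id": manager_id}, {"$set": {"manager.name": new_name}}


# Hypothetical sample data.
full_manager = {"_id": "E100", "name": "Priya Shah", "email": "priya@example.com",
                "phone": "+1-555-0100", "office": "Building 2"}
employee = build_employee("E200", "Sam Lee", "Engineering", full_manager)
print(employee["manager"])
```

The price of this pattern is duplication. When a copied field changes, every employee document that embeds it must be updated. That is why the pattern fits best with fields that rarely change.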
Patterns to Consider
- Extended Reference Pattern: duplicate important parent fields in child documents to avoid joins.
- Pre-Compute Pattern: store computed values such as totals, aggregates, or derived metrics so they do not need to be recalculated repeatedly.
Pro Tip: Before finalizing your schema, write down the top three queries your application performs. If your schema cannot serve them efficiently, redesign it.
Use the Pre-Compute Pattern for Expensive Aggregations
MongoDB’s aggregation framework is powerful, but running heavy aggregations repeatedly can become expensive at scale.
Example
In one analytics system, every dashboard load ran several $group aggregations over a 50-million-document collection.
The result: query latency of over 3 seconds.
We solved it with the Pre-Compute Pattern. The aggregations ran once a night, and their results were stored in a collection such as dailyUserMetrics.
Dashboards then simply read the pre-aggregated results, which cut latency to around 80 milliseconds.
Use this pattern for:
- Dashboards
- Leaderboards
- Analytics reports
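Below is a minimal in-memory sketch of a nightly roll-up. It assumes hypothetical raw event fields (`userId`, `ts`, `value`) and hypothetical metric names (`events`, `totalValue`). In MongoDB, the same job would typically be an aggregation that groups by user and day and writes its output into dailyUserMetrics, for example with a `$merge` stage. The plain-Python version only illustrates the shape of the pre-computed documents and how cheap the dashboard read becomes.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Iterable


def compute_daily_user_metrics(events: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Roll raw events up into one pre-computed document per (userId, UTC day)."""
    totals: dict[tuple[str, str], dict[str, Any]] = defaultdict(
        lambda: {"events": 0, "totalValue": 0.0}
    )
    for event in events:
        day = event["ts"].astimezone(timezone.utc).date().isoformat()
        metrics = totals[(event["userId"], day)]
        metrics["events"] += 1
        metrics["totalValue"] += event["value"]

    # Compound _id so a nightly re-run can replace each day's document in place.
    return [
        {"_id": {"userId": user_id, "day": day}, **metrics}
        for (user_id, day), metrics in sorted(totals.items())
    ]


def metrics_for_user(metrics: list[dict[str, Any]], user_id: str) -> list[dict[str, Any]]:
    """What a dashboard reads: small pre-computed documents, no aggregation at load time."""
    return [m for m in metrics if m["_id"]["userId"] == user_id]


# Hypothetical raw events.
events = [
    {"userId": "u1", "ts": datetime(2025, 10, 29, 9, 0, tzinfo=timezone.utc), "value": 2.0},
    {"userId": "u1", "ts": datetime(2025, 10, 29, 17, 30, tzinfo=timezone.utc), "value": 3.0},
    {"userId": "u2", "ts": datetime(2025, 10, 29, 12, 0, tzinfo=timezone.utc), "value": 1.5},
    {"userId": "u1", "ts": datetime(2025, 10, 30, 8, 0, tzinfo=timezone.utc), "value": 4.0},
]
daily = compute_daily_user_metrics(events)
```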
Use the Polymorphic Pattern for Flexible Data Types
Sometimes documents share a common structure but contain different fields depending on their type.
Example documents in a CRM activities collection:
{ type: "email", subject: "...", recipients: [...] }
{ type: "call", duration: 180, recordingUrl: "..." }
{ type: "meeting", participants: [...], location: "..." }
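Each document shares common metadata, such as timestamps or userId, but contains type-specific attributes. Keeping all activity types in one collection means a single query can return a user's whole activity history. The type field then tells the application how to interpret each document.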
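Here is a minimal sketch of how application code might handle polymorphic documents by branching on the `type` discriminator. The type-specific fields come from the example documents. The common fields (`userId`, `createdAt`) and the rule that every listed field is required are assumptions made for illustration. In MongoDB, queries for one kind of activity would simply filter on the discriminator, e.g. `{"type": "call"}`.

```python
from typing import Any

# Type-specific fields, taken from the example activity documents above.
# Treating all of them as required is an assumption made for this sketch.
TYPE_FIELDS: dict[str, set[str]] = {
    "email": {"subject", "recipients"},
    "call": {"duration", "recordingUrl"},
    "meeting": {"participants", "location"},
}

# Metadata every activity carries, whatever its type (hypothetical names).
COMMON_FIELDS = {"type", "userId", "createdAt"}


def validate_activity(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the document looks valid."""
    present = set(doc)
    problems = [f"missing common field: {f}" for f in sorted(COMMON_FIELDS - present)]
    kind = doc.get("type")
    if kind not in TYPE_FIELDS:
        problems.append(f"unknown activity type: {kind!r}")
    else:
        problems += [f"missing {kind} field: {f}" for f in sorted(TYPE_FIELDS[kind] - present)]
    return problems


def summarize(doc: dict[str, Any]) -> str:
    """Render one activity, branching on the `type` discriminator field."""
    kind = doc["type"]
    if kind == "email":
        return f"Email: {doc['subject']} ({len(doc['recipients'])} recipients)"
    if kind == "call":
        return f"Call, duration {doc['duration']}"
    if kind == "meeting":
        return f"Meeting at {doc['location']} with {len(doc['participants'])} participants"
    raise ValueError(f"unknown activity type: {kind!r}")
```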
Use the Bucket Pattern for Time-Series Data
Applications storing logs, telemetry, or IoT events often generate millions of small records.
The Bucket Pattern groups multiple readings into a single document, for example all of one device's readings within a fixed time window.
Example:
{
  deviceId: "D123",
  startTime: ISODate("2025-10-29T10:00:00Z"),
  readings: [
    { t: 0, value: 1.02 },
    { t: 1, value: 1.05 }
  ]
}
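This cuts the document count by roughly the number of readings per bucket, and the number of index entries falls with it. Range queries over a device's history become much cheaper as a result.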
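The following is a minimal sketch of bucketing in plain Python. It assumes hourly buckets and treats `t` as the reading's offset in whole seconds from `startTime`. The example document does not state either, so both are assumptions. The helper `bucket_upsert` is hypothetical. It only builds the filter and update you might pass to PyMongo's `collection.update_one(filter, update, upsert=True)` to append a reading and create the bucket on first use.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
BUCKET_SPAN = timedelta(hours=1)  # assumption: one bucket per device per hour


def bucket_start(ts: datetime) -> datetime:
    """Floor a timezone-aware UTC timestamp to the start of its bucket."""
    return EPOCH + ((ts - EPOCH) // BUCKET_SPAN) * BUCKET_SPAN


def build_buckets(readings: Iterable[tuple[str, datetime, float]]) -> list[dict[str, Any]]:
    """Group (deviceId, timestamp, value) readings into one document per device per bucket.

    Each reading's `t` is its offset in whole seconds from the bucket's startTime.
    """
    buckets: dict[tuple[str, datetime], dict[str, Any]] = {}
    for device_id, ts, value in sorted(readings, key=lambda r: (r[0], r[1])):
        start = bucket_start(ts)
        doc = buckets.setdefault(
            (device_id, start),
            {"deviceId": device_id, "startTime": start, "readings": []},
        )
        doc["readings"].append({"t": int((ts - start).total_seconds()), "value": value})
    return list(buckets.values())


def bucket_upsert(device_id: str, ts: datetime, value: float) -> tuple[dict, dict]:
    """Filter and update that append one reading to its bucket, creating it if needed.

    Intended for PyMongo's collection.update_one(filter, update, upsert=True).
    """
    start = bucket_start(ts)
    reading = {"t": int((ts - start).total_seconds()), "value": value}
    return {"deviceId": device_id, "startTime": start}, {"$push": {"readings": reading}}
```

Each bucket covers a fixed window, so its readings array stays bounded. Since MongoDB 5.0, time series collections also apply this kind of bucketing internally.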
Version and Evolve Your Schema
Schemas evolve as products grow. Ignoring schema evolution leads to inconsistent documents and fragile code.
An older invoice document:
{ amount: 100, currency: "USD" }
A newer invoice document, which records its version explicitly:
{ amount: 100, currency: "USD", tax: 10, discount: 5, schemaVersion: 2 }
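One common way to use the schemaVersion field is to treat documents that lack it as version 1 and upgrade them when they are read. The sketch below is minimal and assumes two things: a missing schemaVersion means the older shape above, and a missing tax or discount means 0. Both are hypothetical and should follow your own business rules. The filter and update at the end show a batch alternative for PyMongo's `collection.update_many(filter, update)`.

```python
from typing import Any

CURRENT_VERSION = 2


def upgrade_invoice(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an invoice upgraded to CURRENT_VERSION.

    Documents without schemaVersion are treated as version 1. Defaulting tax and
    discount to 0 is an assumption made for this sketch.
    """
    upgraded = dict(doc)
    version = upgraded.get("schemaVersion", 1)
    if version == 1:
        upgraded.setdefault("tax", 0)
        upgraded.setdefault("discount", 0)
        upgraded["schemaVersion"] = 2
        version = 2
    # Further steps (2 -> 3, ...) would be chained here as the schema evolves.
    if version != CURRENT_VERSION:
        raise ValueError(f"unsupported schemaVersion: {version!r}")
    return upgraded


# Batch alternative: migrate every version-1 invoice in one pass with
# collection.update_many(V1_FILTER, V1_TO_V2_UPDATE) in PyMongo.
V1_FILTER = {"schemaVersion": {"$exists": False}}
V1_TO_V2_UPDATE = {"$set": {"tax": 0, "discount": 0, "schemaVersion": 2}}
```

Upgrading on read lets old and new documents coexist while a background migration catches up. The batch update converts everything at once.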
MongoDB Schema Design Don’ts
Don’t Let Arrays Grow Unbounded
Unbounded arrays are a common MongoDB anti-pattern. When an array grows indefinitely, its document keeps growing toward MongoDB's 16 MB document size limit, and every read or update of that document gets more expensive.
In one system, each user's sessions were stored in an array inside the user document. Some users accumulated over 50,000 sessions, which slowed queries.
A better approach was to store sessions in a separate collection, with each session document referencing its user by userId.
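The sketch below illustrates that refactoring in plain Python over hypothetical documents. The embedded sessions array is moved out into standalone session documents, and each one references its user by userId. It is minimal, and field names such as `sessionId` and `startedAt` are assumptions. In MongoDB, a query like "this user's latest sessions" would then typically be backed by an index on the sessions collection, for example `create_index([("userId", 1), ("startedAt", -1)])` in PyMongo.

```python
from typing import Any


def split_sessions(user_doc: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Move an embedded `sessions` array out into standalone session documents.

    Returns the slimmed-down user document and one document per session,
    each carrying the owning user's _id as userId (the reference).
    """
    user = {k: v for k, v in user_doc.items() if k != "sessions"}
    sessions = [{**s, "userId": user_doc["_id"]} for s in user_doc.get("sessions", [])]
    return user, sessions


def recent_sessions(sessions: list[dict[str, Any]], user_id: str,
                    limit: int = 20) -> list[dict[str, Any]]:
    """In-memory equivalent of find({"userId": ...}).sort("startedAt", -1).limit(n)."""
    mine = [s for s in sessions if s["userId"] == user_id]
    return sorted(mine, key=lambda s: s["startedAt"], reverse=True)[:limit]
```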
Don’t Treat MongoDB Like a Relational Database
Over-normalizing data, by splitting it across many collections as you would split relational tables, leads to excessive joins with $lookup at read time.
Recommended approach:
- Embed when relationships are tight
- Reference when data is large or independent
- Denormalize intentionally
Don’t Over-Index or Over-Engineer
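Indexes speed up the queries that use them. However, each index consumes memory and disk space and must be maintained on every write, which slows inserts and updates.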
Best practices:
- Only index fields used by queries
- Periodically review unused indexes
- Avoid excessive collections
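One way to review unused indexes is the `$indexStats` aggregation stage. It returns one document per index, and the document's `accesses.ops` field counts how often the index has been used. The sketch below is minimal: it takes that output as plain dicts, for example from `list(collection.aggregate([{"$indexStats": {}}]))` in PyMongo, and the sample stats are hypothetical. The counters are kept per server and reset on restart, so a zero is a prompt to investigate, not proof the index is safe to drop.

```python
from typing import Any, Iterable


def unused_indexes(index_stats: Iterable[dict[str, Any]]) -> list[str]:
    """Return names of indexes with zero recorded accesses, excluding the _id index.

    Ops are summed by index name, since $indexStats can return one entry per
    server (e.g. per shard) for the same index.
    """
    ops_by_name: dict[str, int] = {}
    for stat in index_stats:
        name = stat["name"]
        ops_by_name[name] = ops_by_name.get(name, 0) + int(stat["accesses"]["ops"])
    return sorted(name for name, ops in ops_by_name.items() if name != "_id_" and ops == 0)


# Hypothetical $indexStats output for a users collection.
stats = [
    {"name": "_id_", "key": {"_id": 1}, "accesses": {"ops": 0}},
    {"name": "userId_1", "key": {"userId": 1}, "accesses": {"ops": 1523}},
    {"name": "email_1", "key": {"email": 1}, "accesses": {"ops": 0}},
]
```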
Final Takeaways
- Design around access patterns, not tables.
- Use the right schema patterns (Extended Reference, Pre-Compute, Polymorphic, Bucket) to shape data for performance.
- Never let arrays grow unbounded.
- Monitor document size and index efficiency.
- Version your schema; evolution is inevitable.
Conclusion
MongoDB’s schema flexibility is powerful but requires thoughtful design. By applying these patterns and avoiding common anti-patterns, teams can build MongoDB systems that remain performant as data grows.
If you found this useful, I'd love to hear how you're modeling your MongoDB schemas in production, or what anti-patterns you've encountered. Let's trade war stories in the comments.