This tutorial was written by Nancy Agarwal.
Understanding MongoDB Schema Design
If you have worked with MongoDB, you have probably heard someone say, "MongoDB is schema-less — just store whatever JSON you want." This is one of the biggest misconceptions.
MongoDB is not schema-less — it is schema-flexible. That flexibility is powerful, but it also means developers must take responsibility for good schema design.
When schema design is done correctly:
- Queries become extremely fast
- APIs stay simple
- Applications scale smoothly
When schema design is ignored:
- Queries become slow
- Documents grow bloated
- Updates become difficult
- Systems become harder to maintain
MongoDB Schema Design Do's
Model for Access Patterns, Not Entities
Relational databases typically start with entities and relationships. MongoDB flips this approach. Instead of starting with tables, you should start with how your application accesses data.
Ask yourself:
- What are the most common queries?
- What data needs to be fetched together?
- What operations happen most frequently?
Example
In one HR system, employee data was always fetched along with department and manager information. Instead of performing joins with $lookup, we used the Extended Reference Pattern.
Key manager fields were embedded inside the employee document, while full manager details were stored separately.
This allowed most reads to happen in a single document query, keeping performance extremely fast.
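The shape described above can be sketched as follows. This is a minimal, illustrative example — the collection and field names (managers, email, department) are assumptions, not the actual HR system's schema:

```javascript
// Sketch of the Extended Reference Pattern (field names are illustrative).
// Key manager fields are copied into the employee document so most reads
// need only a single document fetch; the full manager profile lives in
// its own "managers" collection.

// Full manager document (stored separately)
const manager = {
  _id: "M42",
  name: "Priya Shah",
  email: "priya@example.com",
  bio: "Long free-text profile that the employee read path never needs...",
};

// Copy only the fields the employee read path actually uses
function extendedRef(managerDoc) {
  const { _id, name, email } = managerDoc;
  return { _id, name, email };
}

const employee = {
  _id: "E7",
  name: "Arun Mehta",
  department: { _id: "D3", name: "Engineering" },
  manager: extendedRef(manager), // duplicated on purpose; refresh when the manager changes
};

console.log(employee.manager.name); // single-document read, no $lookup
```

The trade-off is deliberate duplication: when a manager's name or email changes, the application must update the copies in employee documents as well.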
Patterns to Consider
- Extended Reference Pattern
Duplicate important parent fields in child documents to avoid joins.
- Pre-Compute Pattern
Store computed values such as totals, aggregates, or derived metrics so they do not need to be recalculated repeatedly.
Pro Tip: Before finalizing your schema, write down the top three queries your application performs. If your schema cannot serve them efficiently, redesign it.
Use the Pre-Compute Pattern for Expensive Aggregations
MongoDB’s aggregation framework is powerful, but running heavy aggregations repeatedly can become expensive at scale.
Example
In one analytics system, every dashboard load executed several $group aggregations over a collection of 50 million records.
The result: query latency of over 3 seconds.
We solved it using the Pre-Compute Pattern. Aggregations were computed nightly and stored in a collection like dailyUserMetrics.
Dashboards then simply read the pre-aggregated results, reducing latency to around 80 milliseconds.
Use this pattern for:
- Dashboards
- Leaderboards
- Analytics reports
Use the Polymorphic Pattern for Flexible Data Types
Sometimes documents share a common structure but contain different fields depending on their type.
Example CRM activity documents:
{ type: "email", subject: "...", recipients: [...] }
{ type: "call", duration: 180, recordingUrl: "..." }
{ type: "meeting", participants: [...], location: "..." }
Each document shares common metadata like timestamps or userId, but contains type-specific attributes.
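Application code then branches on the shared `type` field. A minimal sketch, using the field names from the examples above (the summary strings are illustrative):

```javascript
// Reading polymorphic CRM activity documents: all share `type` plus common
// metadata; type-specific fields are handled per branch.

function summarize(activity) {
  switch (activity.type) {
    case "email":
      return `Email: ${activity.subject} to ${activity.recipients.length} recipient(s)`;
    case "call":
      return `Call: ${activity.duration}s`;
    case "meeting":
      return `Meeting at ${activity.location}`;
    default:
      return `Unknown activity type: ${activity.type}`;
  }
}

console.log(summarize({ type: "call", duration: 180, recordingUrl: "..." })); // "Call: 180s"
```

Because all variants live in one collection, a single index on shared fields (e.g. userId plus a timestamp) serves queries across every activity type.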
Use the Bucket Pattern for Time-Series Data
Applications storing logs, telemetry, or IoT events often generate millions of small records.
The Bucket Pattern groups multiple readings into a single document.
Example:
{
  deviceId: "D123",
  startTime: ISODate("2025-10-29T10:00:00Z"),
  readings: [
    { t: 0, value: 1.02 },
    { t: 1, value: 1.05 }
  ]
}
This dramatically reduces document count and improves query performance.
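The write path for this pattern can be sketched as follows. In MongoDB itself this maps to an `updateOne` with `$push` and `upsert: true`; here the logic is shown in plain JavaScript, and the cap of 60 readings per bucket is an illustrative choice, not a rule:

```javascript
// Bucket Pattern write path: append a reading to the current bucket,
// starting a new bucket document when the current one is full.

const MAX_READINGS = 60; // illustrative cap; tune for your workload

function addReading(bucket, deviceId, time, value) {
  if (!bucket || bucket.readings.length >= MAX_READINGS) {
    // start a fresh bucket document
    return { deviceId, startTime: time, readings: [{ t: 0, value }] };
  }
  bucket.readings.push({ t: bucket.readings.length, value });
  return bucket;
}

let bucket = null;
bucket = addReading(bucket, "D123", "2025-10-29T10:00:00Z", 1.02);
bucket = addReading(bucket, "D123", "2025-10-29T10:01:00Z", 1.05);
console.log(bucket.readings.length); // 2 readings stored in one document
```

Capping the bucket size is what keeps this pattern safe: without it, the readings array would itself become an unbounded array, the anti-pattern discussed below.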
Version and Evolve Your Schema
Schemas evolve as products grow. Ignoring schema evolution leads to inconsistent documents and fragile code.
Example older invoice:
{ amount: 100, currency: "USD" }
Example newer invoice:
{ amount: 100, currency: "USD", tax: 10, discount: 5, schemaVersion: 2 }
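One common way to handle both shapes is to normalize on the read path, so the rest of the application only ever sees the latest format. A minimal sketch — treating a missing schemaVersion as version 1, and the zero defaults for tax and discount are assumptions:

```javascript
// Upgrade older invoice documents in memory to the v2 shape.
// Documents without schemaVersion are assumed to be version 1.

function normalizeInvoice(doc) {
  const version = doc.schemaVersion ?? 1;
  if (version === 1) {
    return { ...doc, tax: 0, discount: 0, schemaVersion: 2 };
  }
  return doc;
}

console.log(normalizeInvoice({ amount: 100, currency: "USD" }).schemaVersion); // 2
```

The alternative is a one-off backfill migration; the read-path approach avoids rewriting millions of documents at once, at the cost of carrying the normalization code until old documents age out.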
MongoDB Schema Design Don’ts
Don’t Let Arrays Grow Unbounded
Unbounded arrays are a common MongoDB anti-pattern. When arrays grow indefinitely, documents become large and slow.
In one system, all user sessions were stored inside a user document. Some users accumulated over 50,000 sessions, causing slow queries.
A better approach was to store sessions in a separate collection referenced by userId.
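The fix can be sketched as a restructuring: each session becomes its own document carrying the userId (an index on userId is assumed). The field names here are illustrative:

```javascript
// Move sessions out of the user document into standalone documents
// keyed by userId, so the user document stops growing.

function splitSessions(user) {
  const sessionDocs = user.sessions.map((s) => ({ ...s, userId: user._id }));
  const { sessions, ...slimUser } = user; // user doc no longer carries the array
  return { user: slimUser, sessions: sessionDocs };
}

const { user, sessions } = splitSessions({
  _id: "U1",
  name: "Asha",
  sessions: [{ token: "a1" }, { token: "b2" }],
});
console.log(user.sessions === undefined, sessions.length); // true 2
```

With sessions in their own collection, a user's recent sessions are fetched with an indexed query on userId, and old sessions can be expired with a TTL index instead of rewriting a giant user document.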
Don’t Treat MongoDB Like a Relational Database
Over-normalizing data leads to excessive joins using $lookup.
Recommended approach:
- Embed when relationships are tight
- Reference when data is large or independent
- Denormalize intentionally
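The three choices above correspond to three document shapes. A quick illustrative sketch (all names are made up):

```javascript
// Embed: a post and its few, tightly coupled comments read together.
const post = {
  _id: "P1",
  title: "Hello",
  comments: [{ user: "u1", text: "Nice post" }],
};

// Reference: large or independently managed data gets its own document.
const order = { _id: "O1", customerId: "C9" };
const customer = { _id: "C9", name: "Acme Corp" };

// Intentional denormalization: copy a stable field to skip a join on reads.
const orderDenorm = { ...order, customerName: customer.name };

console.log(orderDenorm.customerName === customer.name); // true
```

The key word is intentionally: denormalized copies are a conscious trade of write complexity for read speed, not an accident of schema drift.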
Don’t Over-Index or Over-Engineer
Indexes improve reads but increase memory usage and slow writes.
Best practices:
- Only index fields used by queries
- Periodically review unused indexes
- Avoid excessive collections
Final Takeaways
- Design around access patterns, not tables.
- Use the right schema patterns - Extended Reference, Pre-Compute, Polymorphic, Bucket - to shape data for performance.
- Never let arrays grow unbounded.
- Monitor document size and index efficiency.
- Version your schema - evolution is inevitable.
Conclusion
MongoDB’s schema flexibility is powerful but requires thoughtful design. By applying these patterns and avoiding common anti-patterns, teams can build MongoDB systems that remain performant as data grows.
If you found this useful, I'd love to hear how you're modeling your MongoDB schemas in production - or what anti-patterns you've encountered. Let's trade war stories in the comments.