Abigail Afi Gbadago for MongoDB

Posted on Aug 18 • Edited on Sep 4

Document Modeling With the Django MongoDB Backend

In this tutorial, we will cover how the document model works using the Django MongoDB Backend with examples of data modeling, embedding and referencing, query designs, and design patterns.

A document database stores information in documents that are fast and easy for developers to work with. Document databases have a flexible schema that lets the data model evolve as the application changes, and they can scale out horizontally.

While Django was originally built for relational database management systems (RDBMS), MongoDB provides a flexible way of modeling data that suits applications with nested and/or dynamic data. It is a great option for applications that need fast reads, quick iteration, and JSON-like data handling, which makes it a good fit when your data model is dynamic, flexible, and nested.

Setting up the Django MongoDB Backend

Follow the steps outlined in the Getting Started section for the Django MongoDB Backend. This covers:

Installing the Django MongoDB Backend.
Configuring MongoDB as your database in settings.py.
Creating a sample project and app.

Data modeling in Django

In Django, data is modeled in the form of objects, called models, which represent tables in a relational database or collections in MongoDB. A Django model is the single, definitive source of information about your data. It contains the essential fields and behaviors of the data you’re storing. Generally, each model maps to a single database table or, with the Django MongoDB Backend, a single collection.

Example:

This example model defines a Human, which has a first_name and last_name:

from django.db import models

class Human(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)

Designing document models in MongoDB

In MongoDB, data modeling is the process of organizing data (which can include key-value pairs, relational data, objects, graph data, and geospatial data) within the collections of a database, along with the relationships between related entities. Document modeling supports a flexible schema for data, making it easily adaptable to your application’s needs. You can restructure your schema as many times as necessary, for example to optimize your queries.

A collection stores documents (data records) that typically share a similar structure.

Key differences to note when modeling data in RDBMS and MongoDB

Tables and collections

A collection in MongoDB is a group of documents. In an RDBMS, a table is a collection of related data that consists of columns and rows.

Rows and documents

A document is a single record in a MongoDB collection and can group related data together, including nested sub-documents and arrays. A row is a single record in a relational database table.

Predefined schema and schema-less but structured

With an RDBMS, data is strictly stored according to a predefined schema, so the data must match the required structure. In MongoDB, data is stored in a schema-less but structured format. This means that MongoDB does not enforce a schema by default. However, you can create schema validation rules to ensure that all documents in a collection share a similar structure.
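As a rough illustration of such rules, here is a validator written in MongoDB's $jsonSchema syntax for the Human fields used in this tutorial, together with a simplified plain-Python check. The passes_validator() helper is hypothetical and emulates only the required, bsonType, and minimum keywords; MongoDB itself applies the validator on the server once it is attached to a collection (for example through the validator option of the create or collMod command).

```python
# A validator in MongoDB's $jsonSchema syntax for a "human" collection.
human_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["first_name", "last_name"],
        "properties": {
            "first_name": {"bsonType": "string"},
            "last_name": {"bsonType": "string"},
            "age": {"bsonType": "int", "minimum": 0},
        },
    }
}

# Map the BSON type names used above to Python types for a local check.
BSON_TO_PY = {"string": str, "int": int}


def passes_validator(doc, validator):
    """Hypothetical, simplified emulation: checks only 'required', 'bsonType', and 'minimum'."""
    schema = validator["$jsonSchema"]
    # Every required field must be present.
    if any(field not in doc for field in schema.get("required", [])):
        return False
    for field, rules in schema.get("properties", {}).items():
        if field not in doc:
            continue  # optional fields may be absent
        value = doc[field]
        expected = BSON_TO_PY[rules["bsonType"]]
        # bool is a subclass of int in Python, so reject it explicitly for "int".
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            return False
        if "minimum" in rules and value < rules["minimum"]:
            return False
    return True


print(passes_validator({"first_name": "Jane", "last_name": "Doe", "age": 30}, human_validator))  # True
print(passes_validator({"first_name": "Jane"}, human_validator))  # False: last_name is missing
```

Documents that follow the same structure pass, while a document missing a required field or holding a value of the wrong type is rejected.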

Due to the flexible nature of data models in MongoDB, there are various ways to map relationships between different entities in your schema.

Main options when modeling data:

Embedding related data within the same document
Referencing to connect related data that exists in separate collections

Relationships between documents in MongoDB

A one-to-one embedded relationship is modeled by embedding a single related sub-document within a document, so both entities live in the same document.
A one-to-many embedded relationship is modeled by embedding an array of related sub-documents, linking one document to many embedded documents.
A one-to-many referenced relationship is modeled by storing references, linking one document to many documents in another collection.
A many-to-many relationship is modeled by referencing documents, typically with arrays of references, linking many documents to many other related documents.

MongoDB supports flexible data modeling by letting you either keep related entities together inside a document or link them across collections with references.
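To make the four relationship types concrete, here is a sketch of the document shapes as plain Python dictionaries. The collections, field names (such as bookId and courseIds), and values are hypothetical, and two-way reference arrays are only one common layout for many-to-many relationships.

```python
# One-to-one, embedded: a user document with a single embedded address.
user = {"_id": "u1", "name": "Jane", "address": {"street": "123 Main St", "city": "Accra"}}

# One-to-many, embedded: an order with an array of embedded line items.
order = {
    "_id": "o1",
    "items": [{"product": "Pen", "quantity": 2}, {"product": "Notepad", "quantity": 1}],
}

# One-to-many, referenced: each review stores the _id of the book it belongs to.
book = {"_id": "b1", "title": "Some Crime Novel"}
reviews = [
    {"_id": "r1", "bookId": "b1", "text": "Great"},
    {"_id": "r2", "bookId": "b1", "text": "Good"},
]

# Many-to-many, referenced: students and courses hold arrays of each other's _id values.
students = [{"_id": "s1", "courseIds": ["c1", "c2"]}, {"_id": "s2", "courseIds": ["c1"]}]
courses = [{"_id": "c1", "studentIds": ["s1", "s2"]}, {"_id": "c2", "studentIds": ["s1"]}]

# Reading an embedded relationship needs no second lookup...
print(user["address"]["city"])  # Accra

# ...while a referenced relationship is resolved with a second query (or a $lookup).
print([r["text"] for r in reviews if r["bookId"] == book["_id"]])  # ['Great', 'Good']
```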

The document design process

The document design process consists of the following steps:

Identify your workload: This helps determine the queries your application runs most frequently, which aids in creating effective indexes and minimizing the number of calls the application makes to the database.
Map relationships: Mapping relationships is crucial because it determines how related data is stored in your documents, which is instrumental for query performance and for effective use of indexes.
Apply design patterns: Schema design patterns help to optimize your data model based on your application's access patterns. This improves application performance and reduces schema complexity.
Create indexes: An index is a data structure that stores the values of a specific field or set of fields. Indexes support your query patterns and improve query performance; an index covers a query when it contains all of the fields the query filters on and returns.
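The idea behind an index can be sketched in a few lines of Python: build a mapping from each value of a field to the documents that hold it, so a query can jump straight to the matches instead of scanning the whole collection. This is a simplified illustration under stated assumptions: real MongoDB indexes are B-tree structures that point to record locations, and the _id values below only stand in for those pointers.

```python
from collections import defaultdict

# A tiny stand-in for a "sales" collection.
sales = [
    {"_id": 1, "store_location": "Accra", "coupon_used": True},
    {"_id": 2, "store_location": "London", "coupon_used": False},
    {"_id": 3, "store_location": "Accra", "coupon_used": False},
]


def build_index(docs, field):
    """Map each value of `field` to the _id values (stand-ins for record locations)."""
    index = defaultdict(list)
    for doc in docs:
        index[doc[field]].append(doc["_id"])
    return index


def collection_scan(docs, field, value):
    """Without an index, every document has to be examined."""
    return [doc["_id"] for doc in docs if doc[field] == value]


location_index = build_index(sales, "store_location")

# Same answer, but the index needs one lookup instead of a full scan.
print(location_index["Accra"])                            # [1, 3]
print(collection_scan(sales, "store_location", "Accra"))  # [1, 3]
```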

Define a model

To create a model that maps to a MongoDB collection with the Django MongoDB Backend, add your model class definitions to your application's models.py file. Specify the fields you want to store and include any model metadata in an inner Meta class. You can also define the __str__() method to set the string representation of your model.

Syntax to define a model

class <ModelName>(models.Model):
    <field_name> = models.<FieldType>(<field options>)
    # Include additional fields here

    class Meta:
        # Include metadata here
        ...

    def __str__(self):
        return <string representation>

For example:

from django.db import models

class Human(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    age = models.PositiveIntegerField()

    class Meta:
        db_table = "human"  # name of the MongoDB collection
        managed = False     # Django migrations won't create or alter this collection

    def __str__(self):
        return f"{self.first_name} {self.last_name}"

Knowing when to embed vs reference

When designing your data models, it’s essential to know the relationships between them in order to determine whether to embed or reference while mapping relationships.

As a rule of thumb, embed sub-documents or arrays when related data needs to be retrieved in a single database operation, and use references when the relationship is better stored as a link from one document to another.

When to embed documents

Embedded documents store related data in a single document structure. A document can contain arrays and sub-documents with related data. These denormalized data models allow applications to retrieve related data in a single database operation.

Why embed?

Embedding denormalizes data for read performance: related data is retrieved in a single operation, which reduces the number of joins ($lookup operations) needed during reads. The trade-off is that some data may be duplicated across documents.

Example: A product catalog in which each product embeds its category as a sub-document and its available types as an array.

Using the Django MongoDB Backend, this translates to:

from django.db import models
from django_mongodb_backend.models import EmbeddedModel
from django_mongodb_backend.fields import EmbeddedModelField, EmbeddedModelArrayField

class Category(EmbeddedModel):
    name = models.CharField(max_length=100)

class TypeOption(EmbeddedModel):
    color = models.CharField(max_length=50)
    size = models.CharField(max_length=10)

class Product(models.Model):
    name = models.CharField(max_length=200)
    price = models.FloatField()
    category = EmbeddedModelField(Category)
    types = EmbeddedModelArrayField(TypeOption)

    def __str__(self):
        return self.name

An example of the product object in Django:

# Embedded models
category = Category(name="Clothing")
types = [
    TypeOption(color="Black", size="L"),
    TypeOption(color="White", size="XL")
]

# Create the product; MongoDB generates its _id (an ObjectId) automatically
product = Product.objects.create(
    name="T‑Shirt",
    price=15.99,
    category=category,
    types=types
)

When to reference documents

References store relationships between data by including links, called references, from one document to another. For example, a customerId field in an orders collection indicates a reference to a document in the customers collection.

Applications can resolve these references to access the related data. Broadly, these are normalized data models.

Why reference?

Referencing allows you to reuse data and avoid bloated documents and deeply nested arrays.

Example (one-to-many):
Consider an orders collection in which each order embeds the customer's details:

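from datetime import datetime

from django.utils import timezone

# Assumes an Order model and CustomerInfo/OrderItem embedded models are defined
# in models.py, in the same way as Product, Category, and TypeOption above.

# Create embedded instances
customer = CustomerInfo(
    name="Alice",
    email="alice@example.com",
    address="123 Main St"
)

items = [
    OrderItem(product="T‑Shirt", quantity=2, price=19.99),
    OrderItem(product="Jeans", quantity=1, price=39.99)
]

# Create and save the order with the customer embedded in it
order = Order.objects.create(
    _id="o1001",
    customer=customer,
    items=items,
    total=79.97,
    orderDate=timezone.make_aware(datetime(2025, 6, 11, 11, 0))
)

print(order)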

If the customer data grows or needs to be updated, repeating it in every order bloats the documents, and every copy has to be updated. Instead, we can store a reference to the customer's _id (customerId) in the orders collection, which reduces data duplication and makes the collection easier to update as the data keeps growing and changing.

Example:

from datetime import datetime

from django.utils import timezone

# Create Customer instance
customer = Customer.objects.create(
    _id="a123",
    name="Afi",
    email="afi@example.com",
    address="123 MongoDB Street"
)

# Create Order instance that references the customer by its _id
order = Order.objects.create(
    _id="o1001",
    customerId="a123",
    items=[
        OrderItem(product="Tracksuit", quantity=1, price=29.99),
        OrderItem(product="MongoDB Beanie", quantity=1, price=9.99)
    ],
    total=39.98,  # 29.99 + 9.99
    orderDate=timezone.make_aware(datetime(2025, 8, 11, 10, 0))
)
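To show what resolving a reference involves, here is a plain-Python sketch with in-memory stand-ins for the customers and orders collections (the second order and the new email address are hypothetical). It attaches the referenced customer to each order, similar in spirit to a $lookup stage, although $lookup stores its matches in an array field.

```python
# Hypothetical in-memory stand-ins for the customers and orders collections.
customers = {
    "a123": {"_id": "a123", "name": "Afi", "email": "afi@example.com"},
}
orders = [
    {"_id": "o1001", "customerId": "a123", "total": 39.98},
    {"_id": "o1002", "customerId": "a123", "total": 15.00},
]


def orders_with_customer(orders, customers):
    """Attach the referenced customer document (or None if missing) to each order."""
    return [{**order, "customer": customers.get(order["customerId"])} for order in orders]


# The customer is stored once, so a single update is seen by every order that references it.
customers["a123"]["email"] = "afi@newmail.com"
joined = orders_with_customer(orders, customers)
print([o["customer"]["email"] for o in joined])  # ['afi@newmail.com', 'afi@newmail.com']
```

The cost of this flexibility is the extra lookup at read time, which is why referencing fits data that changes often or grows without bound.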

Design patterns in the document model

Schema design patterns optimize reads and writes for the document model. We will discuss the following patterns:

Bucket Pattern.
Outlier Pattern.
Subset Pattern.

Bucket Pattern

The Bucket Pattern groups related data, such as time-series or analytics data, into buckets. Organizing data into specific groups makes it easier to discover historical trends and provide future forecasting, and it optimizes our use of storage.

Example where a bank groups monthly transactions:

from datetime import date

# Embedded transaction instances
transactions = [
    Transaction(date=date(2025, 8, 1), amount=-50.0, description="Grocery"),
    Transaction(date=date(2025, 8, 3), amount=-120.0, description="E Energy"),
    Transaction(date=date(2025, 8, 5), amount=1000.0, description="Paycheck"),
]

# One MonthlyStatement document (bucket) per user per month
bank_statement = MonthlyStatement.objects.create(
    _id="a123_2025-08",
    userId="a123",
    month="2025-08",
    transactions=transactions
)
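The bucketing step itself can be sketched in plain Python: individual transactions are grouped into one document per (userId, month), with summary fields precomputed so common reports don't need to scan every transaction. The September transaction, the "<userId>_<month>" _id scheme, and the count/balance_change fields are hypothetical additions for illustration.

```python
from collections import defaultdict
from datetime import date

# Individual transactions, one per event (one hypothetical September entry added).
raw = [
    {"userId": "a123", "date": date(2025, 8, 1), "amount": -50.0, "description": "Grocery"},
    {"userId": "a123", "date": date(2025, 8, 3), "amount": -120.0, "description": "E Energy"},
    {"userId": "a123", "date": date(2025, 8, 5), "amount": 1000.0, "description": "Paycheck"},
    {"userId": "a123", "date": date(2025, 9, 2), "amount": -30.0, "description": "Transport"},
]


def bucket_by_month(transactions):
    """Group transactions into one bucket document per (userId, month)."""
    buckets = defaultdict(list)
    for t in transactions:
        month = t["date"].strftime("%Y-%m")
        buckets[(t["userId"], month)].append(
            {"date": t["date"], "amount": t["amount"], "description": t["description"]}
        )
    return [
        {
            "_id": f"{user}_{month}",  # hypothetical id scheme: one bucket per user per month
            "userId": user,
            "month": month,
            "transactions": txs,
            "count": len(txs),  # precomputed summary fields
            "balance_change": sum(t["amount"] for t in txs),
        }
        for (user, month), txs in buckets.items()
    ]


for bucket in bucket_by_month(raw):
    print(bucket["_id"], bucket["count"], bucket["balance_change"])
# a123_2025-08 3 830.0
# a123_2025-09 1 -30.0
```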

Outlier Pattern

The Outlier Pattern is useful for data that falls outside the "normal" pattern and needs to be tracked. Outliers differ significantly from the rest of the data, and redesigning the application to accommodate these edge cases can degrade performance for the more typical queries and documents. Instead, the outliers are flagged and handled separately.

Example:
Let’s say for a normal book entry, we have the following document:

book = Book.objects.create(
    _id="a123",
    title="Some Crime Novel",
    author="Jane Doe",
    copiesSold=1500
)

Then, consider this outlier example:

outlier_book = Book.objects.create(
    _id="b123",
    title="Goblet of Fire",
    author="John Doe",
    copiesSold=180000000,
    outlier=True
)

We see that “copiesSold” is much higher than in the previous entry, so an “outlier” field is added to flag the document as an outlier.
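A minimal sketch of the flagging logic, assuming a hypothetical OUTLIER_THRESHOLD of one million copies: only documents above the threshold get outlier=True, so typical documents keep their normal shape and only the code paths that care about outliers need to check the flag.

```python
# Hypothetical threshold: anything above it is treated as an outlier.
OUTLIER_THRESHOLD = 1_000_000

books = [
    {"_id": "a123", "title": "Some Crime Novel", "copiesSold": 1500},
    {"_id": "b123", "title": "Goblet of Fire", "copiesSold": 180_000_000},
]


def flag_outliers(docs, threshold=OUTLIER_THRESHOLD):
    """Set outlier=True only on documents whose copiesSold exceeds the threshold."""
    for doc in docs:
        if doc["copiesSold"] > threshold:
            doc["outlier"] = True
    return docs


flag_outliers(books)

# Typical documents are untouched; outliers can be routed to special handling.
typical = [b["title"] for b in books if not b.get("outlier", False)]
outliers = [b["title"] for b in books if b.get("outlier", False)]
print(typical)   # ['Some Crime Novel']
print(outliers)  # ['Goblet of Fire']
```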

Subset Pattern

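The Subset Pattern addresses the issues that arise when the working set (frequently accessed data) and indexes grow beyond the physical RAM allotted, which can result in information being removed from memory. It keeps only the most frequently accessed subset of the data in the main document and moves the rest to a separate collection.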

Example:
Let’s use the previous book example (from the Outlier Pattern) and add an embedded array for reviews, where a book can have multiple reviews but a review can belong to only one book. When the copies sold reach 180,000,000, the number of reviews will be extremely large and will create a bloated document, which affects querying and performance.

Instead, we can embed only the latest reviews in each book document and store the full set of reviews in a separate collection, as seen below.

from datetime import date

# Create review instances
reviews = [
    Review(reviewId=8212, author="Jess", text="Amazing read for a Potter Head!", date=date(2025, 8, 12)),
    Review(reviewId=8211, author="Alice", text="Same feeling as the movie!", date=date(2025, 8, 12))
]

# Create the book instance
book = Book.objects.create(
    _id="b123",
    title="Goblet of Fire",
    author="John Doe",
    copiesSold=180000000,
    outlier=True,
    recentReviews=reviews
)

Separate collection for reviews, where every review is stored as its own document that references its book:

from datetime import date

# Every review is stored as its own document in a separate collection.
# BookReview is a hypothetical model whose bookId field references the Book's _id.
BookReview.objects.create(
    bookId="b123",
    reviewId=8212,
    author="Afi",
    text="Amazing read for a Potter Head!",
    date=date(2025, 8, 12)
)
BookReview.objects.create(
    bookId="b123",
    reviewId=8211,
    author="Jane",
    text="Same feeling as the movie!",
    date=date(2025, 8, 12)
)
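Keeping the embedded subset small can be sketched in plain Python: every review goes into the separate reviews collection, but the book keeps only its newest reviews. MAX_EMBEDDED_REVIEWS, add_review(), and the extra review from "Kofi" are hypothetical. In MongoDB itself, a single update with $push combined with $each, $sort, and $slice can maintain such a capped, sorted array.

```python
from datetime import date

MAX_EMBEDDED_REVIEWS = 2  # hypothetical size of the embedded subset

book = {"_id": "b123", "title": "Goblet of Fire", "recentReviews": []}
reviews_collection = []  # stand-in for the separate reviews collection


def add_review(book, review, limit=MAX_EMBEDDED_REVIEWS):
    """Store every review in the reviews collection, but embed only the newest `limit`."""
    reviews_collection.append({**review, "bookId": book["_id"]})
    recent = [review] + book["recentReviews"]
    # Newest first; reviewId breaks ties between reviews posted on the same day.
    recent.sort(key=lambda r: (r["date"], r["reviewId"]), reverse=True)
    book["recentReviews"] = recent[:limit]


add_review(book, {"reviewId": 8210, "author": "Kofi", "date": date(2025, 8, 11)})
add_review(book, {"reviewId": 8211, "author": "Alice", "date": date(2025, 8, 12)})
add_review(book, {"reviewId": 8212, "author": "Jess", "date": date(2025, 8, 12)})

print([r["reviewId"] for r in book["recentReviews"]])  # [8212, 8211]
print(len(reviews_collection))                          # 3
```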

Using Django ORM equivalents with the document model

The Django MongoDB Backend doesn't provide a separate ORM for the document model. Rather, it integrates directly with Django's model layer and ORM, so your Django models.Model classes map to MongoDB collections and documents rather than SQL tables.

ORM operations such as filtering, lookups, and joins are converted into MongoDB operations: lookups run as optimized MongoDB aggregations, and joins are performed with the $lookup stage. Schema-level operations, such as index definitions, are translated as well.

In the next step, we will look at examples of ORM-equivalent queries using sample data from the sample_supplies dataset, whose sales collection contains documents that each represent a sale from a mock office supply company.

First, we model the data based on the documents in the sample_supplies dataset so that it matches what we want to use in our models.py file:

from django.db import models
from django_mongodb_backend.models import EmbeddedModel
from django_mongodb_backend.managers import MongoManager
from django_mongodb_backend.fields import (
    ArrayField,
    EmbeddedModelArrayField,
    EmbeddedModelField
)

class Item(EmbeddedModel):
    name = models.CharField(max_length=100)
    tags = ArrayField(
        models.CharField(max_length=50),
        null=True,
        blank=True
    )
    price = models.DecimalField(max_digits=10, decimal_places=2)
    quantity = models.IntegerField()

class CustomerInfo(EmbeddedModel):
    gender = models.CharField(max_length=1)
    age = models.IntegerField()
    email = models.EmailField()
    satisfaction_score = models.IntegerField(null=True, blank=True)

class Sale(models.Model):
    sale_date = models.DateTimeField()
    items = EmbeddedModelArrayField(Item, null=True, blank=True)
    store_location = models.CharField(max_length=100)
    customer = EmbeddedModelField(CustomerInfo)
    coupon_used = models.BooleanField()
    purchase_method = models.CharField(max_length=50)
    objects = MongoManager()

    class Meta:
        db_table = "sales"
        managed = False

    def __str__(self):
        return f"Sale at {self.store_location} on {self.sale_date}"

After saving this in models.py, run your migrations and insert some objects so that there is data to query in the sample_supplies database.

# Imports
from bson import ObjectId
from django.db.models import Q
from django.utils import timezone

from sample_supplies.models import Sale, Item, CustomerInfo

Example of an item object inserted:

items = [
    Item(name="notepad", tags=["office", "writing", "school"], price=35.29, quantity=2),
    Item(name="pens", tags=["writing", "office", "school", "stationary"], price=56.12, quantity=5),
    Item(name="envelopes", tags=["stationary", "office", "general"], price=19.95, quantity=8),
    Item(name="binder", tags=["school", "general", "organization"], price=14.16, quantity=3),
]

Example of a customer object inserted:

customer = CustomerInfo(gender="F", age=32, email="janedoe@fakemail.com", satisfaction_score=4)

Example of a sale object inserted:

# MongoDB generates the sale's _id (an ObjectId) automatically
sale = Sale.objects.create(
    sale_date=timezone.now(),
    items=items,
    store_location="Accra",
    customer=customer,
    coupon_used=True,
    purchase_method="Online"
)

Specifying a database query

To query the sample_supplies database with Django ORM queries, call QuerySet methods such as all(), filter(), get(), and exclude(), or the raw_aggregate() method, on the Sale model's manager.

Examples:

all()

To retrieve all documents from a collection, call the all() method on your model's manager.

Sale.objects.all()

Output:

<QuerySet [<Sale: Sale at Accra on 2025-08-12 14:50:33.600000+00:00>, 
<Sale: Sale at London on 2025-08-12 14:51:01.244000+00:00>, <Sale: Sale at Paris on 2025-08-12 14:51:08.573000+00:00>]>

filter()

To query a collection for documents that match a set of criteria, use filter().

Sale.objects.filter(coupon_used=True)

Output:

<QuerySet [<Sale: Sale at Accra on 2025-08-12 14:50:33.600000+00:00>, 
<Sale: Sale at Paris on 2025-08-12 14:51:08.573000+00:00>]>

get()

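To retrieve a single document that matches a query, use the get() method. get() raises Sale.DoesNotExist if no document matches and Sale.MultipleObjectsReturned if more than one does.
Alternatively, you can chain filter() with first(), which returns the first match or None.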

Sale.objects.filter(purchase_method="Online").first()

Output:

<Sale: Sale at Accra on 2025-08-12 14:50:33.600000+00:00>

Sale.objects.get(store_location="Accra")

Output:

<Sale: Sale at Accra on 2025-08-12 14:50:33.600000+00:00>

exclude()

To query a collection for documents that do not meet your search criteria, call the exclude() method on your model's manager. Pass the exclusion criteria as an argument to the exclude() method.

Sale.objects.exclude(coupon_used=True)

Output:

<QuerySet [<Sale: Sale at London on 2025-08-12 14:51:01.244000+00:00>]>

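Lookups within an EmbeddedModelArrayField

You can filter on the fields of embedded models stored in an EmbeddedModelArrayField with Django's double-underscore syntax. The following query returns sales in which at least one item has a price greater than 30.00: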

Sale.objects.filter(items__price__gt=30.00)

Output:

<QuerySet [<Sale: Sale at Accra on 2025-08-09 08:50:33.600000+00:00>, 
<Sale: Sale at London on 2025-08-10 14:51:01.244000+00:00>, 
<Sale: Sale at Paris on 2025-08-10 10:51:08.573000+00:00>, 
<Sale: Sale at Paris on 2025-08-10 17:47:13.089000+00:00>, 
<Sale: Sale at Lisbon on 2025-08-10 17:47:40.027000+00:00>]>

Query a primary key field

You can use the pk lookup shortcut to query primary key values, which MongoDB stores as ObjectId values.

Sale.objects.get(pk=ObjectId("593a1394f29313caabce0d37"))

Output:

<Sale: Sale at Sao Tome on 2025-08-12 17:53:11.343000+00:00>

Q object

You can use Q objects to run queries with multiple sets of matching criteria. To create a Q object, pass your query filter (keyword lookups) to Q(). You can combine multiple Q objects with the OR (|), AND (&), or XOR (^) operators and pass the result to your query method.

Sale.objects.filter(
    Q(store_location__startswith="Sao Tome") | Q(store_location__startswith="Accra")
)

Output:

<QuerySet [<Sale: Sale at Accra on 2025-08-12 14:50:33.600000+00:00>, <Sale: Sale at Sao Tome on 2025-08-12 17:53:11.343000+00:00>, <Sale: Sale at Accra on 2025-08-12 17:58:41.477000+00:00>]>

Grouping and aggregations

An aggregation pipeline consists of one or more stages that process documents. Each stage performs an operation on its input documents, such as filtering documents, grouping documents, or calculating values, and passes its output documents to the next stage. For example, a pipeline can return the total, average, maximum, and minimum values of a field.
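As an illustration, here is a pipeline in MongoDB's syntax that keeps sales where a coupon was used and groups them by store to compute the count and the average, maximum, and minimum customer satisfaction score, followed by a plain-Python emulation of what those two stages compute. The pipeline could be run directly against the collection (for example with PyMongo's Collection.aggregate()); run_pipeline() and the sample documents are hypothetical, and the emulation assumes every matched sale has a score (MongoDB's $avg, $max, and $min ignore missing or null values). MongoDB does not guarantee the order of $group output.

```python
# The pipeline, written in MongoDB's aggregation syntax.
pipeline = [
    {"$match": {"coupon_used": True}},
    {"$group": {
        "_id": "$store_location",
        "count": {"$sum": 1},
        "average": {"$avg": "$customer.satisfaction_score"},
        "maximum": {"$max": "$customer.satisfaction_score"},
        "minimum": {"$min": "$customer.satisfaction_score"},
    }},
]


def run_pipeline(docs):
    """Hypothetical plain-Python emulation of the $match and $group stages above."""
    matched = [d for d in docs if d["coupon_used"] is True]  # $match
    groups = {}
    for d in matched:  # $group by store_location
        groups.setdefault(d["store_location"], []).append(d["customer"]["satisfaction_score"])
    return [
        {"_id": loc, "count": len(s), "average": sum(s) / len(s), "maximum": max(s), "minimum": min(s)}
        for loc, s in groups.items()
    ]


sample = [
    {"store_location": "Accra", "coupon_used": True, "customer": {"satisfaction_score": 4}},
    {"store_location": "Accra", "coupon_used": True, "customer": {"satisfaction_score": 2}},
    {"store_location": "London", "coupon_used": False, "customer": {"satisfaction_score": 5}},
    {"store_location": "Paris", "coupon_used": True, "customer": {"satisfaction_score": 3}},
]
for group in run_pipeline(sample):
    print(group)
# {'_id': 'Accra', 'count': 2, 'average': 3.0, 'maximum': 4, 'minimum': 2}
# {'_id': 'Paris', 'count': 1, 'average': 3.0, 'maximum': 3, 'minimum': 3}
```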

Performing raw aggregate on a QuerySet

If you want to run complex queries that Django's query API does not support, you can use the raw_aggregate() method. This method allows you to specify your query criteria in a MongoDB aggregation pipeline, which you pass as an argument to raw_aggregate().

Example:

sales = Sale.objects.raw_aggregate([
    {"$match": {"store_location": "Accra"}},
    {"$project": {
        "store_location": 1,
        "sale_date": 1
    }},
])

for s in sales:
    print(f"Sales at the store location in {s.store_location} on the date: {s.sale_date}\n")

Output:

Sales at the store location in Accra on the date: 2025-08-12 14:50:33.600000+00:00
Sales at the store location in Accra on the date: 2025-08-12 17:58:41.477000+00:00
Sales at the store location in Accra on the date: 2025-08-12 20:00:50.157000+00:00

Best practices when modeling data in MongoDB

Avoid deeply nested documents—know when to embed or reference.
Understand your data and try to think ahead in terms of scaling so that your queries work efficiently using indexes.
Use pagination when working with large result sets so that you don't load or display all of the data at once.
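Offset-based pagination boils down to computing a slice for each page, as in the sketch below. The paginate() helper and the 23 stand-in results are hypothetical. With Django, slicing a QuerySet (for example Sale.objects.all()[20:30]) limits the database query itself rather than loading every document, and django.core.paginator.Paginator wraps the same idea.

```python
def paginate(items, page, per_page):
    """Return one page of results plus whether another page exists (1-based page numbers)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    chunk = items[start:start + per_page]
    has_next = start + per_page < len(items)
    return chunk, has_next


results = list(range(1, 24))  # stand-in for 23 query results
print(paginate(results, 1, 10))  # ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], True)
print(paginate(results, 3, 10))  # ([21, 22, 23], False)
```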

Conclusion

The document model provides an efficient way of flexibly structuring your data and helps improve query responses and efficiency when done correctly. Keep in mind that embedding isn’t always bad (it depends on the use case), and referencing isn’t always the bad guy just because you want to avoid JOINs or $lookups. They all serve a purpose to fit your application needs, which is why you need to understand your data in order to model data efficiently. Have you tried the document model yet? If yes, how was the experience? If no, give it a try and let’s discuss. Check the resources below for more information.
