MongoDB Guests for MongoDB

Posted on Apr 8

Best Practices Cheat Sheet for Flask-PyMongo

#database #python #webdev #tutorial

This tutorial was written by João Araújo.

Introduction

Starting a project is always a challenge: naming the first things, deciding on a project structure, choosing a stack, and even deciding where we want the project to go. While it is important to have most of it planned, if we are unsure or lack visibility into the project's direction, we should use tools that help us navigate the challenges that may arise. The ability to build applications that evolve over time is a strength of both MongoDB and Flask. If you are starting a project today and are not sure how fast it will grow, or if you just need quick prototyping, the combination of MongoDB and Flask is worth considering. If you are here just for the cheat sheet, it is at the end, but consider reading through to understand the “whys” behind it.

What is MongoDB?

MongoDB is an open-source NoSQL database that stores and manages data using flexible document-oriented structures rather than the traditional tables and rows of relational databases. In MongoDB, data is stored in JSON-like documents (BSON) organized into databases and collections, allowing schemas to be dynamic and easily changed over time, making it highly adaptable for handling diverse, structured, and unstructured data. It’s designed for scalability, flexibility, and ease of development, and is widely used in modern web and mobile applications where rapid iteration and large data volumes are common.

What is Flask?

Flask is a micro web framework written in Python that provides the core essentials for building web applications without imposing a lot of structure or extra components, making it lightweight and easy to use. Flask’s biggest strengths are its simplicity and flexibility. Because it doesn’t enforce a specific project layout or include unnecessary features by default, developers can structure applications as they see fit and choose the best tools for tasks such as database access, authentication, and form handling. Additionally, Flask has a large ecosystem of add-ons that provide features such as authentication, authorization, login, sessions, and more.

Why Choose MongoDB with Flask?

Choosing MongoDB with Flask is a great combination for Python web projects, as both tools are built around flexibility and simplicity. MongoDB’s document-oriented, schema-less storage means you don’t have to define rigid table structures ahead of time. Data can reside in flexible JSON-like documents, making it easy to get started and adapt as your app evolves. Another big plus of using MongoDB with Flask is that JSON-like data structures (e.g., Python dictionaries) map naturally to BSON documents in the database, so developers can work with familiar Python types without complex transformations. We could go on, but let's get this cheat sheet started.

Setup and Project Structure

Configure MongoDB connection parameters (URI, host, port, credentials) via environment variables or configuration files.

Always use configuration files or environment variables to define database connections and overall credentials. This ensures there is no hard-coded information across production and other environments.

There isn't much to see in the environment file. It holds the information on how we reach our database and which database we use, as well as the environment we run in.

.env
FLASK_ENV=development
FLASK_APP=app.py

MONGODB_URI=mongodb://localhost:27017
MONGODB_DB=my_database
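
One way to keep all environment access in a single place is a small settings helper. The following is a minimal sketch, not part of the original project: `MongoSettings` and `load_mongo_settings` are hypothetical names, and it assumes the variables are already in the process environment (for example, loaded by python-dotenv).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MongoSettings:
    """Connection settings read from the environment (hypothetical helper)."""
    uri: str
    db_name: str


def load_mongo_settings(env: Optional[Mapping[str, str]] = None) -> MongoSettings:
    # Default to the real process environment; tests can pass a plain dict.
    env = os.environ if env is None else env
    missing = [key for key in ("MONGODB_URI", "MONGODB_DB") if not env.get(key)]
    if missing:
        # Fail fast at startup instead of on the first query.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return MongoSettings(uri=env["MONGODB_URI"], db_name=env["MONGODB_DB"])
```

Accepting the mapping as a parameter keeps the helper easy to test, since no real environment variables need to be set.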

Use a single, globally shared `MongoClient` (or an appropriately scoped one) instead of creating a new connection per request.

Avoid using multiple database clients within the same application. Create a single client and reuse it. MongoDB drivers manage a connection pool that can be configured to allow more or fewer active connections. When an operation needs a connection, the driver reuses an idle one from the pool or establishes a new one. Because pooled connections are already authenticated, the time spent resolving a request goes to the actual query rather than to connection setup and authentication. With Flask's extension registry (app.extensions), we can instantiate the client once and reuse it later.

python
import os
from pymongo import MongoClient
from flask import current_app
from dotenv import load_dotenv

# load environment variables
load_dotenv()

# defining my mongo connection
class Mongo:
    def __init__(self):
        self.client = None
        self.db = None

    def init_app(self, app):
        uri = os.getenv("MONGODB_URI")
        db_name = os.getenv("MONGODB_DB")

        if not uri or not db_name:
            raise RuntimeError("MongoDB environment variables are not set")

        self.client = MongoClient(uri)
        self.db = self.client[db_name]

        # store reference inside Flask app as extension
        app.extensions["mongo"] = self

    @staticmethod
    def get_db():
        mongo = current_app.extensions.get("mongo")
        if not mongo:
            raise RuntimeError("Mongo extension not initialized")
        return mongo.db


mongo = Mongo()
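
The pool mentioned above is configured through keyword arguments on `MongoClient`. As a rough sketch, the helper below collects a few of them; `build_client_options` is a hypothetical name and the default values are illustrative assumptions, not recommendations from the original article.

```python
def build_client_options(max_pool_size: int = 50,
                         min_pool_size: int = 0,
                         server_selection_timeout_ms: int = 5000) -> dict:
    """Return keyword arguments for pymongo.MongoClient (illustrative values)."""
    if min_pool_size > max_pool_size:
        raise ValueError("min_pool_size cannot exceed max_pool_size")
    return {
        # Upper bound on pooled connections per server.
        "maxPoolSize": max_pool_size,
        # Connections kept open even when idle.
        "minPoolSize": min_pool_size,
        # How long to wait for a usable server before raising an error.
        "serverSelectionTimeoutMS": server_selection_timeout_ms,
    }


# Inside Mongo.init_app this could be used as:
#   self.client = MongoClient(uri, **build_client_options(max_pool_size=20))
```

Keeping the options in one function means the pool size can be tuned per environment without touching the code that creates the client.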

Keep your database access logic, application logic (Flask routes), and configuration separate. This helps with upgrades, testing, and debugging.

By keeping your database access logic, configuration, and business rules separate, you can work on each concern independently without having to search the whole project to identify the root cause of a problem. This is especially useful when you need to understand the application workload and determine which indexes to create in MongoDB.

The layout below shows an example project:

.md
project/
├── .env
├── app.py
├── extensions.py        # Mongo connection
├── routes/
│   └── example.py      # Route logic
├── repositories/
│   └── example_repo.py # Mongo collection queries
└── requirements.txt

extensions.py

In the following file, I define a small wrapper around the MongoDB client. The actual MongoClient is created later, when my Flask app calls the init_app function and passes the Flask instance. This way, the connection is created once, when my “main” is called.

python
import os
from pymongo import MongoClient
from flask import current_app
from dotenv import load_dotenv

# load environment variables
load_dotenv()

# defining my mongo connection
class Mongo:
    def __init__(self):
        self.client = None
        self.db = None

    def init_app(self, app):
        uri = os.getenv("MONGODB_URI")
        db_name = os.getenv("MONGODB_DB")

        if not uri or not db_name:
            raise RuntimeError("MongoDB environment variables are not set")

        self.client = MongoClient(uri)
        self.db = self.client[db_name]

        # store reference inside Flask app as extension
        app.extensions["mongo"] = self

    @staticmethod
    def get_db():
        mongo = current_app.extensions.get("mongo")
        if not mongo:
            raise RuntimeError("Mongo extension not initialized")
        return mongo.db


mongo = Mongo()

repositories/example_repo.py

Here, I am simply defining how I interact with my database. If required, I would also define indexes here (not creating them). Having both the indexes and your interactions with that collection in the same folder makes it easier to evaluate new indexes, remove unnecessary ones, or upgrade indexes as new requirements arise.

python
from extensions import Mongo

COLLECTION_NAME = "example"

def get_all():
    db = Mongo.get_db()
    return list(db[COLLECTION_NAME].find({}, {"_id": 0}))

def insert_message(message: str):
    db = Mongo.get_db()
    result = db[COLLECTION_NAME].insert_one({"message": message})
    return str(result.inserted_id)

routes/example.py

Finally, I'm using Flask Blueprints to define my routes. Blueprints are a powerful, built-in mechanism for structuring applications into modular, reusable components rather than placing all code in one file.

python
from flask import Blueprint, jsonify, request
from repositories.example_repo import get_all, insert_message

example_bp = Blueprint("example", __name__, url_prefix="/example")

@example_bp.route("/", methods=["GET"])
def list_messages():
    return jsonify(get_all())

@example_bp.route("/", methods=["POST"])
def create_message():
    payload = request.get_json(silent=True) or {}
    message = payload.get("message")

    if not message:
        return jsonify({"error": "message is required"}), 400

    inserted_id = insert_message(message)
    return jsonify({"inserted_id": inserted_id}), 201

app.py

python
from flask import Flask
from routes.example import example_bp
from extensions import mongo

def create_app():
    app = Flask(__name__)

    # init extensions
    mongo.init_app(app)

    # register routes
    app.register_blueprint(example_bp)

    return app

app = create_app()

if __name__ == "__main__":
    app.run(debug=True)

Data Modeling & Schema Design

Maintain consistent field names and types in the application.

Maintaining consistent field names and data types across the application is critical to avoid subtle bugs, simplify queries, and keep your data model predictable over time. When field names or types change inconsistently (for example, userId versus user_id, or a date stored sometimes as a string and sometimes as a Date), it increases cognitive load, makes indexes less effective, complicates aggregations, and can cause runtime errors that are hard to trace. Data consistency makes your application easier to maintain, safer to refactor, and more reliable when multiple services, developers, or analytics tools interact with the same data. If you need to have multiple versions of the same document, consider using the Document Versioning pattern.

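You can use MongoDB schema validation to define the expected shape of the documents in a collection. MongoDB validates documents inserted or updated after the validator is defined; existing documents are not checked until they are modified.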

javascript
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["userId", "email", "isActive", "createdAt"],
      properties: {
        userId: {
          bsonType: "string",
          description: "Must be a string and is required"
        },
        email: {
          bsonType: "string",
          pattern: "^.+@.+$",
          description: "Must be a valid email string."
        },
        isActive: {
          bsonType: "bool",
          description: "Must be a boolean."
        },
        createdAt: {
          bsonType: "date",
          description: "Must be a BSON date."
        },
        optionalField: {
          bsonType: "string",
          description: "This field is optional."
        }
      }
    }
  },
  validationLevel: "strict",
  validationAction: "error"
})
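
From a Flask application, the same validator can be applied through PyMongo. The sketch below assumes `db` is a PyMongo `Database` object (for example, the one returned by `Mongo.get_db()`); `ensure_users_collection` is a hypothetical helper name, and the `description` fields are omitted for brevity.

```python
USERS_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["userId", "email", "isActive", "createdAt"],
        "properties": {
            "userId": {"bsonType": "string"},
            "email": {"bsonType": "string", "pattern": "^.+@.+$"},
            "isActive": {"bsonType": "bool"},
            "createdAt": {"bsonType": "date"},
            "optionalField": {"bsonType": "string"},
        },
    }
}


def ensure_users_collection(db):
    """Create `users` with the validator, or update the validator if it exists."""
    options = {
        "validator": USERS_VALIDATOR,
        "validationLevel": "strict",
        "validationAction": "error",
    }
    if "users" in db.list_collection_names():
        # collMod changes the validation rules of an existing collection.
        db.command("collMod", "users", **options)
    else:
        db.create_collection("users", **options)
```

Calling this once at startup keeps the validation rules next to the application code that depends on them.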

Design your document model according to query and access patterns.

Designing your document model based on query and access patterns means shaping your MongoDB documents around how the application actually reads and writes data, not just how the data appears conceptually. By embedding frequently accessed data together and avoiding unnecessary joins, you reduce query complexity, minimize network round-trips, and improve performance. A model optimized for real access patterns is faster, simpler to query, and scales more predictably than one designed only from a relational mindset.

In MongoDB, data that is read together should usually live together. That’s the core idea behind modeling for access patterns.

Model not aligned with access patterns (reference-heavy):

javascript
// orders
{ "_id": 1, "customer_id": 10 }

// order_items
{ "order_id": 1, "product": "Book", "qty": 2 }

// customers
{ "_id": 10, "name": "Alice" }

Query (multiple round trips / $lookup):

javascript
db.orders.aggregate([
  { $match: { _id: 1 } },
  { $lookup: {
      from: "order_items",
      localField: "_id",
      foreignField: "order_id",
      as: "items"
  }},
  { $lookup: {
      from: "customers",
      localField: "customer_id",
      foreignField: "_id",
      as: "customer"
  }}
])

Model aligned with access patterns (embedded):

javascript
// orders
{
  "_id": 1,
  "customer": {
    "id": 10,
    "name": "Alice"
  },
  "items": [
    { "product": "Book", "qty": 2 },
    { "product": "Pen", "qty": 1 }
  ]
}

Query (single, simple read):

javascript
db.orders.findOne({ _id: 1 })
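
On the write side, the embedded shape has to be assembled by the application. The following is a minimal sketch of that step; `build_order_document` is a hypothetical helper, and it assumes the customer and item data are already available as plain dictionaries.

```python
def build_order_document(order_id, customer, items):
    """Embed a customer summary and the line items in one order document."""
    return {
        "_id": order_id,
        # Copy only the customer fields the order view actually reads.
        "customer": {"id": customer["_id"], "name": customer["name"]},
        # Keep each line item small: product and quantity only.
        "items": [{"product": i["product"], "qty": i["qty"]} for i in items],
    }
```

The resulting dictionary can be passed to `insert_one` on the orders collection, so a later read needs only a single lookup by `_id`.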

Performance & Query Optimization

Use indexes for fields that you frequently query, filter, or sort by.

Using indexes on frequently queried, filtered, or sorted fields is essential for efficient MongoDB queries. A well-designed compound index can support multiple query patterns when it follows the ESR rule (Equality → Sort → Range), meaning equality filters should come first, followed by sort fields, and then range conditions. By ordering fields correctly, a single index can satisfy multiple query patterns without creating multiple indexes:

“user”
“user” and “age”
“user”, “age”, and “workedHours”

Index creation:

javascript
db.example.createIndex({ user: 1, age: 1, workedHours: 1 })

Query examples:

javascript
db.example.find({ user: "joao" })

db.example.find({ user: "joao", age: 28 })
// or
db.example.find({ user: "joao" }).sort({ age: 1 })

db.example.find({
  user: "joao",
  workedHours: { $gte: 100, $lte: 200 }
}).sort({age: 1})
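
In PyMongo, a compound index is described as a list of (field, direction) pairs. The sketch below orders the fields by the ESR rule before creating the index; `esr_index_keys` and `create_example_index` are hypothetical helpers, and `collection` is assumed to be a PyMongo `Collection`.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def esr_index_keys(equality, sort, range_fields):
    """Order index fields as Equality, then Sort, then Range."""
    ordered = [*equality, *sort, *range_fields]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [(field, ASCENDING) for field in dict.fromkeys(ordered)]


def create_example_index(collection):
    # user is matched by equality, age is sorted on, workedHours is a range.
    keys = esr_index_keys(["user"], ["age"], ["workedHours"])
    return collection.create_index(keys)
```

Writing the three groups out explicitly makes it easier to review whether a new query pattern still fits the existing index.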

For potentially large result sets, use pagination (filter on _id plus limit, rather than large skip values).

For potentially large result sets, always use pagination to avoid loading too much data into memory and to keep response times predictable. In MongoDB, we can paginate on the _id field instead of using large skip values: skip still has to walk past every skipped document, while a filter on _id uses the default _id index and scales efficiently as collections grow. By filtering on _id and combining it with limit(), each page starts where the previous one ended, providing consistent performance. This works because _id values are unique and always indexed; default ObjectId values also start with a creation timestamp, so sorting by _id returns documents in roughly insertion order.

javascript
// First page
db.logs.find({})
  .sort({ _id: 1 })
  .limit(10)

// Next page (use the last _id from the previous page)
db.logs.find({ _id: { $gt: ObjectId("65a123...") } }) // change to $lt if using descending order
  .sort({ _id: 1 })
  .limit(10)
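
The same pattern can be wrapped in a repository function. This is a sketch under stated assumptions: `build_page_filter` and `fetch_page` are hypothetical names, and `collection` is assumed to be a PyMongo `Collection`.

```python
def build_page_filter(last_id=None):
    """Return the filter for the page that follows `last_id`."""
    # No last_id means the first page, so no filter is needed.
    return {} if last_id is None else {"_id": {"$gt": last_id}}


def fetch_page(collection, last_id=None, page_size=10):
    """Fetch one page ordered by _id and return (documents, next_last_id)."""
    cursor = (
        collection.find(build_page_filter(last_id))
        .sort("_id", 1)
        .limit(page_size)
    )
    docs = list(cursor)
    # The caller passes this value back to request the following page.
    next_last_id = docs[-1]["_id"] if docs else None
    return docs, next_last_id
```

A route would return `next_last_id` to the client (as a string) and convert it back to an `ObjectId` when the next page is requested.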

Security

Always validate and sanitize user input before inserting it into MongoDB.

Always validate and sanitize user input before inserting it into MongoDB, especially when the input comes from a public endpoint, since HTTP requests should be treated as untrusted by default. Proper validation ensures required fields are present and have the correct data types, while sanitization prevents malformed data, unexpected structures, or NoSQL injection (e.g. $gt, $ne) from reaching the database.

Additionally, if you store sensitive data, such as passwords, never store it in plaintext. Always hash passwords with a dedicated password-hashing algorithm, such as bcrypt. This reduces security risks and keeps credentials protected even if the database contents are exposed.

Another protection layer is using Binary subtype 8, which marks a value as sensitive. MongoDB replaces the content of such fields with ### in its log files, so even if someone gains access to your MongoDB logs, that specific field is redacted.

python
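from flask import Blueprint, jsonify, request
import bcrypt
from bson import Binary

from extensions import Mongo

# Register this blueprint in create_app(), just like example_bp
users_bp = Blueprint("users", __name__, url_prefix="/users")

@users_bp.route("/", methods=["POST"])
def create_user():
    payload = request.get_json(silent=True) or {}

    # Validation: requiring plain values also rejects operator
    # objects such as {"$ne": None}
    if not isinstance(payload.get("email"), str):
        return jsonify({"error": "email must be a string"}), 400
    if not isinstance(payload.get("password"), str):
        return jsonify({"error": "password must be a string"}), 400
    is_active = payload.get("is_active", True)
    if not isinstance(is_active, bool):
        # bool("false") would be True, so check the type instead of casting
        return jsonify({"error": "is_active must be a boolean"}), 400

    # Hash the password before storing it
    salt = bcrypt.gensalt()
    hashed_pwd = bcrypt.hashpw(payload["password"].encode("utf-8"), salt)

    # Sanitization
    user = {
        "email": payload["email"].strip().lower(),
        # Binary subtype 8 marks the value as sensitive
        "password": Binary(hashed_pwd, 8),
        "is_active": is_active,
    }

    Mongo.get_db().users.insert_one(user)
    return jsonify({"status": "created"}), 201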
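
For payloads with nested or free-form fields, type checks on individual keys may not be enough. One possible extra guard, sketched here with a hypothetical helper name, is to reject any payload whose keys look like MongoDB operators or dotted paths before it reaches a query or an insert.

```python
def contains_operator_keys(value):
    """Return True if any dict key, at any depth, starts with "$" or contains "."."""
    if isinstance(value, dict):
        for key, inner in value.items():
            # Keys such as "$gt" or "profile.role" are treated as suspicious.
            if key.startswith("$") or "." in key:
                return True
            if contains_operator_keys(inner):
                return True
        return False
    if isinstance(value, list):
        # Check every element of arrays as well.
        return any(contains_operator_keys(item) for item in value)
    # Scalars (str, int, bool, None, ...) carry no keys.
    return False
```

A route could call this right after parsing the JSON body and return a 400 response when it reports True. It assumes JSON input, where all keys are strings.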

Conclusion

This article outlines best practices for using MongoDB with Flask. This is not an exhaustive list, but it covers a few common mistakes I've seen over many consulting sessions. Hopefully, this has shed some light and will help you avoid those common mistakes. The best way to learn MongoDB is through its extensive documentation, which is comprehensive and kept up to date. If you are unsure or can't find what you need, there are always the forums and community to help out!

And as promised, here's the cheat sheet!

Area	Best Practice	Key Action / Code Example
Setup & Structure	Configure connection via environment variables.	Use environment variables for credentials.
Setup & Structure	Use a single, globally shared `MongoClient`.	Instantiate `MongoClient` once in an extension and reuse it.
Setup & Structure	Separate logic (routes, repo, config).	Use Flask Blueprints for routes and dedicated repository files for database interactions.
Data Modeling	Maintain consistent field names and types.	Enforce consistency with MongoDB's `$jsonSchema` validator when required.
Data Modeling	Design model for query & access patterns.	Embed data that is read together (e.g., customer info inside the order document).
Performance	Use indexes for frequent query, filter, or sort fields.	Follow the ESR Rule for compound indexes: Equality, Sort, Range.
Performance	Use _id field for pagination.	Use `find({ _id: { $gt: last_id } }).sort({ _id: 1 }).limit(N)` to avoid large `skip` values.
Security	Validate and sanitize all user input.	Check data types and strip/lower strings before insertion.
Security	Never store sensitive data in plaintext.	Always hash passwords before saving them to the database.