Puneet Gupta

Posted on Jul 4 • Edited on Jul 5 • Originally published at pg-blogs.netlify.app

Avoiding ORM Traps and the N+1 Problem in Python

#django #python #performance #database

Introduction

Python's ORMs — SQLAlchemy and the Django ORM — are a joy to write and dangerously easy to make slow. A loop over a queryset that reads one related field can silently fire hundreds of queries. It looks fine in development and falls over on real data.

This post covers the ORM traps Python teams hit most, starting with the N+1 query problem, in both SQLAlchemy and Django.

The N+1 Problem

You load authors and print each author's book count:

# Django
for author in Author.objects.all():        # 1 query
    print(author.books.count())             # 1 query PER author

# SQLAlchemy
for author in session.query(Author).all():  # 1 query
    print(len(author.books))                # 1 lazy-load query PER author

One query to fetch the authors, then N more — one per author — to fetch each author's books. A hundred authors is 101 round trips. It works on three test rows and collapses in production.

Fix It: Eager Loading

Django — `select_related` and `prefetch_related`

select_related — for forward ForeignKey/OneToOne. Uses a SQL join, one query.
prefetch_related — for reverse FKs and ManyToMany. Runs a second query and joins in Python.

# One query for authors + one for all their books, joined in Python
for author in Author.objects.prefetch_related("books"):
    print(len(author.books.all()))

# Forward FK: a single join query
for book in Book.objects.select_related("author"):
    print(book.author.name)

SQLAlchemy — `selectinload` and `joinedload`

from sqlalchemy.orm import selectinload, joinedload

# selectinload: a 2nd query with WHERE author_id IN (...) — best for collections
authors = session.query(Author).options(selectinload(Author.books)).all()

# joinedload: a single LEFT JOIN — best for many-to-one / one-to-one
books = session.query(Book).options(joinedload(Book.author)).all()

A good rule of thumb: selectinload for to-many collections (avoids row multiplication), joinedload for to-one relations.

Don't Fetch Whole Objects to Read a Few Columns

Hydrating full model instances to display two fields wastes memory and time. Select only what you need.

# Django: .values() / .only() skip full model construction
Author.objects.values("name", "email")            # dicts, no model overhead
Author.objects.only("name")                        # deferred other columns

# SQLAlchemy: select specific columns
session.query(Author.name, Author.email).all()

For aggregates, push the work into the database rather than counting in Python:

# Django: annotate a count in SQL — no N+1, no Python-side counting
from django.db.models import Count
Author.objects.annotate(num_books=Count("books"))

Pagination Pitfalls

Two things bite here.

1. Counting for pagination is a full second query. LIMIT/OFFSET pagination runs a COUNT(*) to compute total pages, which can be expensive on large tables. For big datasets prefer keyset (cursor) pagination — WHERE id > :last_seen ORDER BY id LIMIT n — which stays fast regardless of how deep you page.

2. Slicing after a to-many prefetch/join can multiply or mis-limit rows. As in Java's JPA, a LIMIT applied on top of a collection join doesn't page the way you expect. In Django, prefetch_related sidesteps this by issuing a separate query; slice the base queryset (Author.objects.all()[:20]) and let the prefetch fetch relations for that page.

Beware Implicit Lazy Loads Outside a Session

In SQLAlchemy, accessing a relationship after the Session is closed raises DetachedInstanceError. In Django, touching a related object in a template after the request's DB work is done can quietly re-query. The fix is the same in both: load everything the caller needs while the session/request is open, using eager loading — don't rely on lazy access leaking out of scope.

Always Look at the Queries

The ORM hides the SQL; make it visible in development.

# Django: inspect queries actually run
from django.db import connection
print(len(connection.queries))     # with DEBUG=True
# or use django-debug-toolbar to see counts and duplicates per request

# SQLAlchemy: echo generated SQL
engine = create_engine(url, echo=True)

Most ORM performance bugs are invisible in the Python and obvious in the query log. Watching the query count is how you catch an N+1 before your users do.

Practical Checklist

Trap	Django fix	SQLAlchemy fix
N+1 on forward FK	`select_related`	`joinedload`
N+1 on collection / M2M	`prefetch_related`	`selectinload`
Fetching whole objects to read	`.values()` / `.only()`	select columns
Counting in Python	`annotate(Count(...))`	`func.count()`
Deep `OFFSET` pagination	keyset pagination	keyset pagination
Lazy load out of scope	eager-load in request	eager-load in session

Final Thoughts

SQLAlchemy and the Django ORM are fast enough for almost anything — if you tell them how to load related data. Default to eager loading at the call sites that need relations, select only the columns you'll use, keep aggregation in the database, and watch the query count during development. The ORM's job is to hide SQL; your job is to make sure it isn't hiding an N+1.

DEV Community

Avoiding ORM Traps and the N+1 Problem in Python

Introduction

The N+1 Problem

Fix It: Eager Loading

Django — `select_related` and `prefetch_related`

SQLAlchemy — `selectinload` and `joinedload`

Don't Fetch Whole Objects to Read a Few Columns

Pagination Pitfalls

Beware Implicit Lazy Loads Outside a Session

Always Look at the Queries

Practical Checklist

Final Thoughts

Top comments (0)

Introduction

The N+1 Problem

Fix It: Eager Loading

Django — select_related and prefetch_related

SQLAlchemy — selectinload and joinedload

Don't Fetch Whole Objects to Read a Few Columns

Pagination Pitfalls

Beware Implicit Lazy Loads Outside a Session

Always Look at the Queries

Practical Checklist

Final Thoughts

Django — `select_related` and `prefetch_related`

SQLAlchemy — `selectinload` and `joinedload`