You don't need paid monitoring tools to find what's slow in your Django application. Two free, open-source tools cover most of it: Django Debug Toolbar for per-request profiling and snakeviz for visualizing Python's built-in cProfile data.
This post walks through how I use both tools to find and fix performance problems, based on patterns from my own projects. The examples are grounded in a Django API that handles 10,000+ scheduled messages per day with Celery workers and external API calls.
If you want the broader performance optimization workflow — including production monitoring, caching, and async offloading — I covered that in How to Optimise Backend Performance. This post goes deeper on the local profiling tools.
## Setting Up Django Debug Toolbar

Installation takes about two minutes.

```shell
pip install django-debug-toolbar
```
```python
# settings.py (development only)
INSTALLED_APPS = [
    ...
    'debug_toolbar',
]

MIDDLEWARE = [
    'debug_toolbar.middleware.DebugToolbarMiddleware',
    ...
]

DEBUG_TOOLBAR_PANELS = [
    'debug_toolbar.panels.sql.SQLPanel',
    'debug_toolbar.panels.profiling.ProfilingPanel',
    'debug_toolbar.panels.timer.TimerPanel',
    'debug_toolbar.panels.cache.CachePanel',
]

INTERNAL_IPS = ['127.0.0.1']
```
Add the URL configuration:

```python
# urls.py
from django.conf import settings
from django.urls import include, path

if settings.DEBUG:
    import debug_toolbar

    urlpatterns = [
        path('__debug__/', include(debug_toolbar.urls)),
    ] + urlpatterns
```
On Python 3.12+, the profiling panel needs the dev server running single-threaded:
```shell
python manage.py runserver --nothreading
```
## The SQL Panel: Your N+1 Detector
The SQL panel is where I spend most of my time in Debug Toolbar. It shows every database query fired during a request, with timing, SQL text, and stack traces.
### What to look for
**Query count.** A list endpoint returning 50 items should not fire 150 queries. If it does, you're missing `select_related` or `prefetch_related`.

**"Duplicated" and "Similar" badges.** Debug Toolbar groups identical query patterns and flags them. If you see a red "Duplicated" badge next to `SELECT * FROM customers WHERE id = ?` repeated 200 times, that's a textbook N+1.

**The stack trace.** Click any query to see which line of Python triggered it. This tells you whether the query came from the view, a serializer, a model method, or a template. Knowing where matters as much as knowing what.

**Query time distribution.** If one query takes 200 ms and the rest take 1 ms each, that query is your target. Often it's a missing index — the query is doing a sequential scan instead of using an index.
### Finding an N+1 in practice
On a project similar to my Message Scheduler, I hit this endpoint locally:
```
GET /api/messages/?status=pending
```
Debug Toolbar showed: 187 queries in 420 ms. Several queries had "Duplicated" badges — the same SELECT * FROM users WHERE id = ? pattern repeated for every message in the list.
The view was loading messages and then accessing message.user.email in the serializer. Each access triggered a separate query.
The fix:

```python
# Before: 187 queries
messages = Message.objects.filter(status='pending')

# After: 2 queries
messages = Message.objects.filter(status='pending').select_related('user')
```
After the change, Debug Toolbar showed 2 queries in 12 ms. One query for messages with a JOIN to users, one for the count.
## The Profiling Panel: Where Time Actually Goes
The SQL panel tells you about database time. The profiling panel tells you about everything else — serialization, template rendering, Python computation, middleware.
Enable it by clicking its checkbox in the toolbar. It shows a collapsible call tree for the request:
```
GET /api/messages/ — 1842 ms
├── MessageListView.get() — 1842 ms (cumtime)
│   ├── MessageQuerySet.all() — 12 ms
│   ├── MessageSerializer.to_representation() — 1650 ms
│   │   ├── UserField.to_representation() × 200 — 1580 ms
│   │   │   └── SQL: SELECT * FROM users WHERE id = %s × 200
│   │   └── ChannelSerializer.to_representation() × 200 — 60 ms
│   └── Paginator.paginate() — 180 ms
```
### Reading the call tree
Call count is the key signal. A function called 200 times inside a loop is almost always an N+1 or a missing batch operation. In the example above, UserField.to_representation() runs 200 times — once per message in the list.
Nesting shows the hierarchy. A slow parent with fast own-time means the parent is slow because of its children. MessageSerializer.to_representation() takes 1650 ms, but it's not doing anything slow itself — its child UserField.to_representation() is.
Start from the deepest nodes with the highest cumulative time and work upward. The actual bottleneck is usually at the bottom of the tree.
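The `tottime`/`cumtime` relationship is easy to demonstrate outside Django with plain `cProfile` and `pstats`. A self-contained sketch (the function names are made up for illustration):

```python
import cProfile
import io
import pstats

def slow_child():
    # All the real work happens here
    return sum(i * i for i in range(200_000))

def parent():
    # "Slow" only because it calls slow_child five times
    return [slow_child() for _ in range(5)]

profiler = cProfile.Profile()
profiler.enable()
parent()
profiler.disable()

# Sorted by cumulative time, parent and slow_child look similar;
# the tottime column reveals the work is inside slow_child
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats('slow_child|parent')
print(stream.getvalue())
```

In the printed table, `parent` has a large `cumtime` but near-zero `tottime`: exactly the "slow parent, fast own-time" pattern described above.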
You can adjust the profiling depth with DEBUG_TOOLBAR_CONFIG:
```python
DEBUG_TOOLBAR_CONFIG = {
    'PROFILER_MAX_DEPTH': 15,       # default: 10
    'PROFILER_THRESHOLD_RATIO': 5,  # default: 8
}
```
## snakeviz: Visual Profiling with cProfile
Django Debug Toolbar works great for web requests. But when you need to profile a management command, a Celery task, or a function in isolation, cProfile + snakeviz is the tool.
### Capturing a profile
```python
import cProfile

from django.test import RequestFactory

factory = RequestFactory()
request = factory.get('/api/messages/?status=pending')

profiler = cProfile.Profile()
profiler.enable()
response = message_list_view(request)  # the view function under test
profiler.disable()
profiler.dump_stats('message_list.prof')
```
Or from the command line:
```shell
python -m cProfile -o output.prof manage.py some_management_command
```
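Wrapping that enable/disable/dump boilerplate in a decorator makes it reusable for any function you want to profile. This is a generic helper I'm sketching here, not part of cProfile's API:

```python
import cProfile
import functools
import os

def profiled(path):
    """Decorator that dumps a cProfile .prof file on every call.

    A generic helper sketch, not part of cProfile's API.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                return func(*args, **kwargs)
            finally:
                profiler.disable()
                profiler.dump_stats(path)
        return wrapper
    return decorator

@profiled('work.prof')
def work():
    return sum(i * i for i in range(100_000))

work()
print(os.path.exists('work.prof'))  # True once work() has run
```

One caveat: cProfile doesn't nest, so don't stack this decorator on functions that call each other.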
### Viewing with snakeviz
```shell
pip install snakeviz
snakeviz message_list.prof
```
snakeviz opens a browser with an interactive visualization — either a sunburst chart or an icicle chart. Each block is a function. Wider blocks took more time. Blocks nested inside others were called by the parent.
### Reading snakeviz output
The text table below the chart shows the same data as cProfile's text output:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.842    1.842 views.py:45(message_list)
      200    0.003    0.000    1.650    0.008 serializers.py:12(get_user)
      200    1.580    0.008    1.580    0.008 base.py:330(execute)
        1    0.001    0.001    0.180    0.180 pagination.py:22(paginate)
```
| Column | What It Means |
|---|---|
| `ncalls` | How many times this function was called |
| `tottime` | Time inside this function, excluding sub-calls |
| `cumtime` | Total time including sub-calls |
| `percall` | Time per call |
How to read this: Start from the top (sorted by cumtime). message_list takes 1.84 seconds total. get_user is called 200 times and accounts for 1.65 seconds — 89% of the view's time. The actual time is in base.py:execute, which is Django's database query executor. Classic N+1.
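You don't need snakeviz to get at these columns; `pstats` exposes the same data programmatically. A runnable sketch with stand-in functions (`get_user` here just builds a dict, simulating the per-item call):

```python
import cProfile
import pstats

# Stand-ins for the real code paths in the table above
def get_user(i):
    return {'id': i}  # simulates the per-message lookup

def message_list():
    return [get_user(i) for i in range(200)]

profiler = cProfile.Profile()
profiler.enable()
message_list()
profiler.disable()
profiler.dump_stats('demo.prof')

# Same columns snakeviz shows, straight from pstats
stats = pstats.Stats('demo.prof')
stats.sort_stats('cumulative').print_stats(5)

# The raw numbers are also available programmatically:
# stats.stats maps (file, line, func) -> (ncalls, ..., tottime, cumtime, callers)
key = next(k for k in stats.stats if k[2] == 'get_user')
ncalls = stats.stats[key][0]
print(ncalls)  # 200: one call per simulated message
```

This is handy in scripts or CI, where opening a browser isn't an option.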
What patterns to look for:

- High `ncalls` on database functions → N+1 queries
- High `tottime` on a single function → CPU-bound bottleneck (serialization, computation)
- High `cumtime` with low `tottime` → the function itself is fast but calls something slow
### Profiling Celery tasks
For my Message Scheduler's delivery tasks, I profile individual task functions:
```python
import cProfile

from celery import shared_task

@shared_task
def send_message(message_id):
    # Normal task code...
    pass

# Profile the task function directly (not .delay()), in a shell or script
cProfile.run('send_message(42)', 'send_task.prof')
```
Then snakeviz send_task.prof shows exactly where delivery time goes — API calls to SES, Telegram latency, database reads for message content. This is how I discovered that loading the full message object (including a large metadata JSON field) was adding unnecessary overhead. Switching to .only('id', 'channel', 'recipient', 'body') cut the database portion by 60%.
## Deep Dives with EXPLAIN ANALYZE
When Debug Toolbar or snakeviz points to a slow query, EXPLAIN ANALYZE tells you why it's slow at the database level.
```sql
EXPLAIN ANALYZE
SELECT m.id, m.body, m.send_at, u.email
FROM messages m
JOIN users u ON m.user_id = u.id
WHERE m.status = 'pending'
  AND m.send_at > '2026-03-01'
ORDER BY m.send_at ASC
LIMIT 50;
```
### What the output tells you
| Indicator | Meaning | Fix |
|---|---|---|
| `Seq Scan` | Full table scan — no index used | Add index on filtered columns |
| `Nested Loop` + high rows | Looping join on large sets | Check join indexes |
| `Sort` with high cost | Sorting without index support | Add index matching `ORDER BY` |
| `Rows Removed by Filter` (high) | Index not selective enough | Use composite or partial index |
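If you want to catch these indicators automatically (say, in a CI job that runs `EXPLAIN` on known-hot queries), a small text scan goes a long way. This is a hypothetical helper, not a real plan parser, and the threshold is arbitrary:

```python
import re

def plan_red_flags(explain_output: str) -> list:
    """Scan EXPLAIN ANALYZE text for common trouble indicators.

    Heuristic sketch only: matches on plan-node names, not a parsed plan.
    """
    flags = []
    if 'Seq Scan' in explain_output:
        flags.append('sequential scan: check indexes on filtered columns')
    m = re.search(r'Rows Removed by Filter: (\d+)', explain_output)
    if m and int(m.group(1)) > 10_000:
        flags.append('filter discards many rows: index not selective enough')
    return flags

plan = """Seq Scan on messages  (cost=0.00..28453.00 rows=1250)
  Filter: ((status)::text = 'pending'::text)
  Rows Removed by Filter: 498753"""
print(plan_red_flags(plan))
```

Running it on the plan above reports both the sequential scan and the poorly selective filter.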
### Fixing a missing index
Debug Toolbar showed a query on the messages table taking 180 ms. EXPLAIN ANALYZE confirmed a sequential scan:
```
Seq Scan on messages  (cost=0.00..28453.00 rows=1250 actual time=0.028..178.403 rows=1247)
  Filter: ((status)::text = 'pending'::text AND (send_at > '2026-03-01'))
  Rows Removed by Filter: 498753
```
Scanning 500,000 rows to return 1,247. A partial index fixed it:
```sql
CREATE INDEX idx_messages_pending ON messages (send_at)
WHERE status = 'pending';
```
After the index:
```
Index Scan using idx_messages_pending on messages  (cost=0.29..42.15 rows=1250 actual time=0.015..1.203 rows=1247)
```
From 178 ms to 1.2 ms. The partial index is small because it only covers pending messages, so it stays fast even as the table grows.
In Django migrations:
```python
from django.db import migrations, models

class Migration(migrations.Migration):

    operations = [
        migrations.AddIndex(
            model_name='message',
            index=models.Index(
                fields=['send_at'],
                condition=models.Q(status='pending'),
                name='idx_messages_pending',
            ),
        ),
    ]
```
## The Full Workflow
Here's the process I follow for every slow endpoint:
1. **Hit the endpoint with Debug Toolbar enabled.** Check the SQL panel first. High query count with duplicate badges = N+1. Fix with `select_related` or `prefetch_related`.
2. **Check the profiling panel.** If query count is fine but the request is still slow, the profiling panel shows where time goes in Python code — serialization, computation, template rendering.
3. **Profile in isolation with cProfile + snakeviz.** For deeper analysis or non-web-request profiling (management commands, Celery tasks), capture a `.prof` file and visualize it.
4. **Run EXPLAIN ANALYZE on slow queries.** When a specific query is the bottleneck, check the execution plan. Look for `Seq Scan` and add targeted indexes.
5. **Verify the fix.** Hit the endpoint again with Debug Toolbar. Confirm query count dropped and execution time improved. Run `EXPLAIN ANALYZE` again to confirm the index is being used.
This loop — observe, identify, fix, verify — is the same one I follow across all my projects. I wrote about it in broader context (including production monitoring, caching strategies, and async offloading) in How to Optimise Backend Performance.
## Beyond Local Profiling
Debug Toolbar and snakeviz are local development tools. They catch problems before code ships. But some issues only appear under production load — connection pool exhaustion, cache stampedes, replication lag.
For my Message Scheduler, I use Celery Flower for worker monitoring and structured logging with structlog for production request tracing. On my portfolio's AI chatbot, the Cloudflare Worker proxy handles error states and I track response latency through server logs.
The HealthLab platform uses health check endpoints that verify database connectivity — simple but catches the most common production failure.
The tools change between local and production, but the principle stays: find where time goes, fix the biggest bottleneck, verify the improvement.
## Tools Reference
| Tool | What It Does | When to Use |
|---|---|---|
| Django Debug Toolbar (SQL panel) | Shows all queries per request with timing and stack traces | First check on any slow endpoint |
| Django Debug Toolbar (Profiling panel) | Call tree with cumulative time per function | When query count is fine but request is slow |
| cProfile + snakeviz | Python profiler with visual flame graph | Management commands, Celery tasks, isolated functions |
| `EXPLAIN ANALYZE` | PostgreSQL execution plan with actual timings | When a specific query is the bottleneck |
| QueryCountMiddleware | Logs query count per request in staging | Catching N+1 regressions before they hit production |
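The `QueryCountMiddleware` in the table is not a Django built-in; it's a small custom class. A minimal sketch, with the query-count source injected as a callable so the idea stands alone — in a real Django project you'd pass something like `lambda: len(connection.queries)` from `django.db`, which only records queries when `DEBUG=True` (or a staging equivalent):

```python
import logging

logger = logging.getLogger(__name__)

class QueryCountMiddleware:
    """Log requests that fire too many queries.

    Sketch only: `count_queries` is injected so the idea is testable
    on its own. Real Django middleware takes only `get_response` in
    its constructor and would read the count from django.db.connection
    inside __call__ instead.
    """

    def __init__(self, get_response, count_queries, threshold=50):
        self.get_response = get_response
        self.count_queries = count_queries
        self.threshold = threshold

    def __call__(self, request):
        before = self.count_queries()
        response = self.get_response(request)
        executed = self.count_queries() - before
        if executed > self.threshold:
            logger.warning(
                '%s fired %d queries (threshold %d)',
                getattr(request, 'path', '?'), executed, self.threshold,
            )
        return response
```

Wired into staging, this turns N+1 regressions into log warnings instead of production incidents.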
All my projects — including architecture diagrams, tradeoff analysis, and failure mode documentation — are at ankitjang.one/projects.
About me: I'm Ankit Jangwan, a Senior Software Engineer building backend systems with Django, PostgreSQL, Celery, and Go. See my case studies at ankitjang.one/case-studies.