You don't need paid monitoring tools to find what's slow in your Django application. Two free, open-source tools cover most of it: Django Debug Toolbar for per-request profiling and snakeviz for visualizing Python's built-in cProfile data.
This post walks through how I use both tools to find and fix performance problems, based on patterns from my own projects. The examples are grounded in a Django API that handles 10,000+ scheduled messages per day with Celery workers and external API calls.
If you want the broader performance optimization workflow — including production monitoring, caching, and async offloading — I covered that in How to Optimise Backend Performance. This post goes deeper on the local profiling tools.
## Setting Up Django Debug Toolbar

Installation takes about two minutes.

```shell
pip install django-debug-toolbar
```
```python
# settings.py (development only)
INSTALLED_APPS = [
    ...
    'debug_toolbar',
]

MIDDLEWARE = [
    'debug_toolbar.middleware.DebugToolbarMiddleware',
    ...
]

DEBUG_TOOLBAR_PANELS = [
    'debug_toolbar.panels.sql.SQLPanel',
    'debug_toolbar.panels.profiling.ProfilingPanel',
    'debug_toolbar.panels.timer.TimerPanel',
    'debug_toolbar.panels.cache.CachePanel',
]

INTERNAL_IPS = ['127.0.0.1']
```
Add the URL configuration:

```python
# urls.py
from django.conf import settings
from django.urls import include, path

if settings.DEBUG:
    import debug_toolbar

    urlpatterns = [
        path('__debug__/', include(debug_toolbar.urls)),
    ] + urlpatterns
```
On Python 3.12+, the profiling panel needs the dev server running single-threaded:
```shell
python manage.py runserver --nothreading
```
## The SQL Panel: Your N+1 Detector
The SQL panel is where I spend most of my time in Debug Toolbar. It shows every database query fired during a request, with timing, SQL text, and stack traces.
### What to look for
**Query count.** A list endpoint returning 50 items should not fire 150 queries. If it does, you're missing `select_related` or `prefetch_related`.

**"Duplicated" and "Similar" badges.** Debug Toolbar groups identical query patterns and flags them. If you see a red "Duplicated" badge next to `SELECT * FROM customers WHERE id = ?` repeated 200 times, that's a textbook N+1.

**The stack trace.** Click any query to see which line of Python triggered it. This tells you whether the query came from the view, a serializer, a model method, or a template. Knowing where matters as much as knowing what.

**Query time distribution.** If one query takes 200 ms and the rest take 1 ms each, that query is your target. Often it's a missing index — the query is doing a sequential scan instead of using an index.
### Finding an N+1 in practice
On a project similar to my Message Scheduler, I hit this endpoint locally:
```
GET /api/messages/?status=pending
```
Debug Toolbar showed: 187 queries in 420 ms. Several queries had "Duplicated" badges — the same SELECT * FROM users WHERE id = ? pattern repeated for every message in the list.
The view was loading messages and then accessing message.user.email in the serializer. Each access triggered a separate query.
The fix:

```python
# Before: 187 queries
messages = Message.objects.filter(status='pending')

# After: 2 queries
messages = Message.objects.filter(status='pending').select_related('user')
```
After the change, Debug Toolbar showed 2 queries in 12 ms. One query for messages with a JOIN to users, one for the count.
## The Profiling Panel: Where Time Actually Goes
The SQL panel tells you about database time. The profiling panel tells you about everything else — serialization, template rendering, Python computation, middleware.
Enable it by clicking its checkbox in the toolbar. It shows a collapsible call tree for the request:
```
GET /api/messages/ — 1842 ms
├── MessageListView.get() — 1842 ms (cumtime)
│   ├── MessageQuerySet.all() — 12 ms
│   ├── MessageSerializer.to_representation() — 1650 ms
│   │   ├── UserField.to_representation() × 200 — 1580 ms
│   │   │   └── SQL: SELECT * FROM users WHERE id = %s × 200
│   │   └── ChannelSerializer.to_representation() × 200 — 60 ms
│   └── Paginator.paginate() — 180 ms
```
### Reading the call tree
Call count is the key signal. A function called 200 times inside a loop is almost always an N+1 or a missing batch operation. In the example above, UserField.to_representation() runs 200 times — once per message in the list.
Nesting shows the hierarchy. A slow parent with fast own-time means the parent is slow because of its children. MessageSerializer.to_representation() takes 1650 ms, but it's not doing anything slow itself — its child UserField.to_representation() is.
Start from the deepest nodes with the highest cumulative time and work upward. The actual bottleneck is usually at the bottom of the tree.
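The `tottime`/`cumtime` relationship is easy to demonstrate outside Django with plain `cProfile` and `pstats`. A self-contained sketch (the function names are made up for illustration):

```python
import cProfile
import io
import pstats

def slow_child():
    # All the real work happens here
    return sum(i * i for i in range(200_000))

def parent():
    # "Slow" only because it calls slow_child five times
    return [slow_child() for _ in range(5)]

profiler = cProfile.Profile()
profiler.enable()
parent()
profiler.disable()

# Sorted by cumulative time, parent and slow_child look similar;
# the tottime column reveals the work is inside slow_child
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats('slow_child|parent')
print(stream.getvalue())
```

In the printed table, `parent` has a large `cumtime` but near-zero `tottime`: exactly the "slow parent, fast own-time" pattern described above.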
You can adjust the profiling depth with DEBUG_TOOLBAR_CONFIG:
```python
DEBUG_TOOLBAR_CONFIG = {
    'PROFILER_MAX_DEPTH': 15,       # default: 10
    'PROFILER_THRESHOLD_RATIO': 5,  # default: 8
}
```
## snakeviz: Visual Profiling with cProfile
Django Debug Toolbar works great for web requests. But when you need to profile a management command, a Celery task, or a function in isolation, cProfile + snakeviz is the tool.
### Capturing a profile
```python
import cProfile

from django.test import RequestFactory

factory = RequestFactory()
request = factory.get('/api/messages/?status=pending')

profiler = cProfile.Profile()
profiler.enable()
response = message_list_view(request)  # the view function under test
profiler.disable()
profiler.dump_stats('message_list.prof')
```
Or from the command line:
```shell
python -m cProfile -o output.prof manage.py some_management_command
```
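Wrapping that enable/disable/dump boilerplate in a decorator makes it reusable for any function you want to profile. This is a generic helper I'm sketching here, not part of cProfile's API:

```python
import cProfile
import functools
import os

def profiled(path):
    """Decorator that dumps a cProfile .prof file on every call.

    A generic helper sketch, not part of cProfile's API.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                return func(*args, **kwargs)
            finally:
                profiler.disable()
                profiler.dump_stats(path)
        return wrapper
    return decorator

@profiled('work.prof')
def work():
    return sum(i * i for i in range(100_000))

work()
print(os.path.exists('work.prof'))  # True once work() has run
```

One caveat: cProfile doesn't nest, so don't stack this decorator on functions that call each other.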
### Viewing with snakeviz
```shell
pip install snakeviz
snakeviz message_list.prof
```
snakeviz opens a browser with an interactive visualization — either a sunburst chart or an icicle chart. Each block is a function. Wider blocks took more time. Blocks nested inside others were called by the parent.
### Reading snakeviz output
The text table below the chart shows the same data as cProfile's text output:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.842    1.842 views.py:45(message_list)
      200    0.003    0.000    1.650    0.008 serializers.py:12(get_user)
      200    1.580    0.008    1.580    0.008 base.py:330(execute)
        1    0.001    0.001    0.180    0.180 pagination.py:22(paginate)
```
| Column | What It Means |
|---|---|
| `ncalls` | How many times this function was called |
| `tottime` | Time inside this function, excluding sub-calls |
| `cumtime` | Total time including sub-calls |
| `percall` | Time per call |
How to read this: Start from the top (sorted by cumtime). message_list takes 1.84 seconds total. get_user is called 200 times and accounts for 1.65 seconds — 89% of the view's time. The actual time is in base.py:execute, which is Django's database query executor. Classic N+1.
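You don't need snakeviz to get at these columns; `pstats` exposes the same data programmatically. A runnable sketch with stand-in functions (`get_user` here just builds a dict, simulating the per-item call):

```python
import cProfile
import pstats

# Stand-ins for the real code paths in the table above
def get_user(i):
    return {'id': i}  # simulates the per-message lookup

def message_list():
    return [get_user(i) for i in range(200)]

profiler = cProfile.Profile()
profiler.enable()
message_list()
profiler.disable()
profiler.dump_stats('demo.prof')

# Same columns snakeviz shows, straight from pstats
stats = pstats.Stats('demo.prof')
stats.sort_stats('cumulative').print_stats(5)

# The raw numbers are also available programmatically:
# stats.stats maps (file, line, func) -> (ncalls, ..., tottime, cumtime, callers)
key = next(k for k in stats.stats if k[2] == 'get_user')
ncalls = stats.stats[key][0]
print(ncalls)  # 200: one call per simulated message
```

This is handy in scripts or CI, where opening a browser isn't an option.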
What patterns to look for:

- High `ncalls` on database functions → N+1 queries
- High `tottime` on a single function → CPU-bound bottleneck (serialization, computation)
- High `cumtime` with low `tottime` → the function itself is fast but calls something slow
### Profiling Celery tasks
For my Message Scheduler's delivery tasks, I profile individual task functions:
```python
import cProfile

from celery import shared_task

@shared_task
def send_message(message_id):
    # Normal task code...
    pass

# Profile the task function directly (not .delay()), in a shell or script
cProfile.run('send_message(42)', 'send_task.prof')
```
Then snakeviz send_task.prof shows exactly where delivery time goes — API calls to SES, Telegram latency, database reads for message content. This is how I discovered that loading the full message object (including a large metadata JSON field) was adding unnecessary overhead. Switching to .only('id', 'channel', 'recipient', 'body') cut the database portion by 60%.
## Deep Dives with EXPLAIN ANALYZE
When Debug Toolbar or snakeviz points to a slow query, EXPLAIN ANALYZE tells you why it's slow at the database level.
```sql
EXPLAIN ANALYZE
SELECT m.id, m.body, m.send_at, u.email
FROM messages m
JOIN users u ON m.user_id = u.id
WHERE m.status = 'pending'
  AND m.send_at > '2026-03-01'
ORDER BY m.send_at ASC
LIMIT 50;
```
### What the output tells you
| Indicator | Meaning | Fix |
|---|---|---|
| `Seq Scan` | Full table scan — no index used | Add index on filtered columns |
| `Nested Loop` + high rows | Looping join on large sets | Check join indexes |
| `Sort` with high cost | Sorting without index support | Add index matching `ORDER BY` |
| `Rows Removed by Filter` (high) | Index not selective enough | Use composite or partial index |
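If you want to catch these indicators automatically (say, in a CI job that runs `EXPLAIN` on known-hot queries), a small text scan goes a long way. This is a hypothetical helper, not a real plan parser, and the threshold is arbitrary:

```python
import re

def plan_red_flags(explain_output: str) -> list:
    """Scan EXPLAIN ANALYZE text for common trouble indicators.

    Heuristic sketch only: matches on plan-node names, not a parsed plan.
    """
    flags = []
    if 'Seq Scan' in explain_output:
        flags.append('sequential scan: check indexes on filtered columns')
    m = re.search(r'Rows Removed by Filter: (\d+)', explain_output)
    if m and int(m.group(1)) > 10_000:
        flags.append('filter discards many rows: index not selective enough')
    return flags

plan = """Seq Scan on messages  (cost=0.00..28453.00 rows=1250)
  Filter: ((status)::text = 'pending'::text)
  Rows Removed by Filter: 498753"""
print(plan_red_flags(plan))
```

Running it on the plan above reports both the sequential scan and the poorly selective filter.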
### Fixing a missing index
Debug Toolbar showed a query on the messages table taking 180 ms. EXPLAIN ANALYZE confirmed a sequential scan:
```
Seq Scan on messages  (cost=0.00..28453.00 rows=1250 actual time=0.028..178.403 rows=1247)
  Filter: ((status)::text = 'pending'::text AND (send_at > '2026-03-01'))
  Rows Removed by Filter: 498753
```
Scanning 500,000 rows to return 1,247. A partial index fixed it:
```sql
CREATE INDEX idx_messages_pending ON messages (send_at)
WHERE status = 'pending';
```
After the index:
```
Index Scan using idx_messages_pending on messages  (cost=0.29..42.15 rows=1250 actual time=0.015..1.203 rows=1247)
```
From 178 ms to 1.2 ms. The partial index is small because it only covers pending messages, so it stays fast even as the table grows.
In Django migrations:
```python
from django.db import migrations, models

class Migration(migrations.Migration):

    operations = [
        migrations.AddIndex(
            model_name='message',
            index=models.Index(
                fields=['send_at'],
                condition=models.Q(status='pending'),
                name='idx_messages_pending',
            ),
        ),
    ]
```
## The Full Workflow
Here's the process I follow for every slow endpoint:
1. **Hit the endpoint with Debug Toolbar enabled.** Check the SQL panel first. High query count with duplicate badges = N+1. Fix with `select_related` or `prefetch_related`.
2. **Check the profiling panel.** If query count is fine but the request is still slow, the profiling panel shows where time goes in Python code — serialization, computation, template rendering.
3. **Profile in isolation with cProfile + snakeviz.** For deeper analysis or non-web-request profiling (management commands, Celery tasks), capture a `.prof` file and visualize it.
4. **Run EXPLAIN ANALYZE on slow queries.** When a specific query is the bottleneck, check the execution plan. Look for `Seq Scan` and add targeted indexes.
5. **Verify the fix.** Hit the endpoint again with Debug Toolbar. Confirm query count dropped and execution time improved. Run `EXPLAIN ANALYZE` again to confirm the index is being used.
This loop — observe, identify, fix, verify — is the same one I follow across all my projects. I wrote about it in broader context (including production monitoring, caching strategies, and async offloading) in How to Optimise Backend Performance.
## Beyond Local Profiling
Debug Toolbar and snakeviz are local development tools. They catch problems before code ships. But some issues only appear under production load — connection pool exhaustion, cache stampedes, replication lag.
For my Message Scheduler, I use Celery Flower for worker monitoring and structured logging with structlog for production request tracing. On my portfolio's AI chatbot, the Cloudflare Worker proxy handles error states and I track response latency through server logs.
The HealthLab platform uses health check endpoints that verify database connectivity — simple but catches the most common production failure.
The tools change between local and production, but the principle stays: find where time goes, fix the biggest bottleneck, verify the improvement.
## Tools Reference
| Tool | What It Does | When to Use |
|---|---|---|
| Django Debug Toolbar (SQL panel) | Shows all queries per request with timing and stack traces | First check on any slow endpoint |
| Django Debug Toolbar (Profiling panel) | Call tree with cumulative time per function | When query count is fine but request is slow |
| cProfile + snakeviz | Python profiler with visual flame graph | Management commands, Celery tasks, isolated functions |
| `EXPLAIN ANALYZE` | PostgreSQL execution plan with actual timings | When a specific query is the bottleneck |
| QueryCountMiddleware | Logs query count per request in staging | Catching N+1 regressions before they hit production |
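The `QueryCountMiddleware` in the table is not a Django built-in; it's a small custom class. A minimal sketch, with the query-count source injected as a callable so the idea stands alone — in a real Django project you'd pass something like `lambda: len(connection.queries)` from `django.db`, which only records queries when `DEBUG=True` (or a staging equivalent):

```python
import logging

logger = logging.getLogger(__name__)

class QueryCountMiddleware:
    """Log requests that fire too many queries.

    Sketch only: `count_queries` is injected so the idea is testable
    on its own. Real Django middleware takes only `get_response` in
    its constructor and would read the count from django.db.connection
    inside __call__ instead.
    """

    def __init__(self, get_response, count_queries, threshold=50):
        self.get_response = get_response
        self.count_queries = count_queries
        self.threshold = threshold

    def __call__(self, request):
        before = self.count_queries()
        response = self.get_response(request)
        executed = self.count_queries() - before
        if executed > self.threshold:
            logger.warning(
                '%s fired %d queries (threshold %d)',
                getattr(request, 'path', '?'), executed, self.threshold,
            )
        return response
```

Wired into staging, this turns N+1 regressions into log warnings instead of production incidents.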
All my projects — including architecture diagrams, tradeoff analysis, and failure mode documentation — are at ankitjang.one/projects.
About me: I'm Ankit Jangwan, a Senior Software Engineer building backend systems with Django, PostgreSQL, Celery, and Go. See my case studies at ankitjang.one/case-studies.