Advanced scaling patterns for Django apps - from 100 to 100K requests/day
Last year I pushed 18 app versions with staging endpoints hardcoded into production. Residents across the UK couldn't unlock their hotel rooms via NFC. My PM couldn't log in. That was a frontend disaster, but it taught me something: deployment F-ups scale with complexity.
Django backend scaling is the same energy - you can't just throw more Gunicorn workers at it and pray. I learned this the hard way when a payments API I built started choking at 2,000 req/min. Tea break interrupted at 11am by a Slack ping: "Checkout timing out, users complaining." I ssh'd in, watched workers dying one by one, and realized our monolith was the problem.
This article covers what actually worked when I migrated that payments system to Seenode and scaled it to 50K+ requests/day without melting down. No theory, just the architecture patterns that kept me sleeping through the night.
You'll learn:
- Why splitting web/worker services matters (and how to do it on Seenode)
- Database connection pooling that actually prevents "too many connections" errors
- Redis caching patterns worth implementing
- Real cost numbers ($25/mo to $180/mo at scale)
Prerequisites:
- Django app deployed on Seenode (see my production deployment guide)
- Basic understanding of Django, PostgreSQL, Celery
- Seenode account (free tier works for testing)
Read time: 10 minutes
Level: Advanced (assumes you've deployed Django before)
Why Monoliths Break at Scale
Most Django apps start simple: one Gunicorn process handling HTTP requests and background tasks via @shared_task decorators. This works great until it doesn't.
The breaking point? When a slow task (PDF generation, sending emails) ties up a worker and your API requests start queuing. Or when 20 Gunicorn workers x 5 DB connections hits PostgreSQL's 100-connection limit. I've debugged both, multiple times.
The fix isn't bigger servers. It's splitting web and worker services so they scale independently. Seenode makes this easy—same Git repo, different start commands.
The Multi-Service Architecture
Here's how I structure Django apps on Seenode—three services, one Git repo:
1. Web Service (Gunicorn) - handles HTTP only, scales horizontally
2. Worker Service (Celery) - background jobs, scales for throughput
3. Scheduler Service (Celery Beat) - cron jobs, runs once
Plus PostgreSQL and Redis (managed by Seenode or external).
Each service uses the same codebase but different start commands. Seenode's dashboard lets you configure all three pointing to the same GitHub repo—just change the startup script.
Service 1: Web Service (Gunicorn)

[Screenshot: Seenode web service configuration showing the Gunicorn start command with workers and threads. The $PORT variable is provided automatically by Seenode.]
Your Seenode web service start command:
gunicorn config.wsgi:application --bind 0.0.0.0:$PORT --workers 4 --threads 2 --timeout 30 --max-requests 1000 --max-requests-jitter 100
Key settings explained:
--workers 4 - Start from (2 x CPU) + 1 and tune down if memory is tight. More workers != better performance.
--threads 2 - Handles concurrent DB queries without exploding connection count.
--max-requests 1000 - Restarts workers to prevent memory leaks. Learned this after watching memory climb to 95% over 3 days.
--timeout 30 - Kills hung requests. Saved me when a payment provider's API started timing out.
On a 2-CPU Seenode instance ($25/mo), this handles 2,000 req/min with p95 response time around 180ms.
I tried running 8 workers on a 2-CPU instance once. Thought more workers = better performance. Nope. Hit OOM errors within 2 hours. Turns out the formula (2 x CPU) + 1 exists for a reason.

Set these environment variables in Seenode's dashboard so you can tune concurrency without touching code (the config-file sketch below shows one way to read them):
GUNICORN_WORKERS=4
GUNICORN_THREADS=2
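If you'd rather not hard-code worker counts in the start command, Gunicorn can read them from a config file instead. Here's a minimal sketch, assuming a gunicorn.conf.py at the repo root (the filename and fallback values are my choices, not a Seenode requirement). Start it with gunicorn config.wsgi:application -c gunicorn.conf.py:

# gunicorn.conf.py (sketch)
import multiprocessing
import os

# Seenode injects PORT; fall back to 8000 for local runs
bind = f"0.0.0.0:{os.environ.get('PORT', '8000')}"

# Read the dashboard env vars, defaulting to the (2 x CPU) + 1 rule
workers = int(os.environ.get('GUNICORN_WORKERS', multiprocessing.cpu_count() * 2 + 1))
threads = int(os.environ.get('GUNICORN_THREADS', 2))

timeout = 30
max_requests = 1000
max_requests_jitter = 100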
Service 2: Celery Workers (Background Jobs)
Seenode worker service start command:
celery -A config worker --loglevel=info --concurrency=4 --max-tasks-per-child=1000 --time-limit=300
Celery config (config/celery.py):
import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'config.settings')

app = Celery('myproject')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

app.conf.update(
    broker_url=os.environ.get('REDIS_URL', 'redis://localhost:6379/0'),
    result_backend=os.environ.get('REDIS_URL'),
    task_serializer='json',
    task_time_limit=300,           # 5 min hard limit (prevents stuck tasks)
    task_acks_late=True,           # Only ack after completion
    worker_prefetch_multiplier=1,  # One task at a time for long operations
    result_expires=3600,           # Results expire after 1 hour
)
Example task:
# myapp/tasks.py
from celery import shared_task
from django.core.mail import send_mail

@shared_task(bind=True, max_retries=3, default_retry_delay=60)
def send_welcome_email(self, user_id):
    try:
        from myapp.models import User
        user = User.objects.get(id=user_id)
        send_mail(
            subject='Welcome!',
            message=f'Hi {user.first_name}!',
            from_email='noreply@myapp.com',
            recipient_list=[user.email],
        )
        return {'status': 'sent'}
    except Exception as exc:
        raise self.retry(exc=exc)
Call from views:
# Queue background task (returns immediately)
send_welcome_email.delay(user.id)
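For context, here's roughly what that looks like inside a signup view. A minimal sketch; SignupForm and the 'home' URL name are placeholders for whatever your app uses:

# myapp/views.py (sketch)
from django.shortcuts import redirect, render

from myapp.forms import SignupForm  # hypothetical form
from myapp.tasks import send_welcome_email

def signup(request):
    if request.method == 'POST':
        form = SignupForm(request.POST)
        if form.is_valid():
            user = form.save()
            # Queue the email and return immediately; Celery handles retries
            send_welcome_email.delay(user.id)
            return redirect('home')
    else:
        form = SignupForm()
    return render(request, 'signup.html', {'form': form})

The request never waits on SMTP; the worker service picks the task up from Redis.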
I started with --concurrency=4. When the queue backed up during Black Friday, I scaled to 5 worker instances via Seenode's dashboard. It took 30 seconds, and the queue cleared in 2 minutes.
Service 3: Celery Beat (Scheduler)
Run periodic tasks on a schedule. Start command:
celery -A config beat --loglevel=info --scheduler django_celery_beat.schedulers:DatabaseScheduler
Install django-celery-beat, add to INSTALLED_APPS, run migrations. Define schedules:
# config/celery.py
from celery.schedules import crontab

app.conf.beat_schedule = {
    'cleanup-sessions-daily': {
        'task': 'myapp.tasks.cleanup_old_sessions',
        'schedule': crontab(hour=2, minute=0),  # 2 AM daily
    },
    'check-payment-status': {
        'task': 'myapp.tasks.check_payment_status',
        'schedule': 300.0,  # Every 5 min
    },
}
Or manage via Django admin—no code deployments needed to change schedules.
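For completeness, here's a minimal sketch of the session-cleanup task named in that schedule (this assumes the database session backend; if you switch to the Redis-backed sessions shown later, expiry is handled by TTLs and you won't need it). The payment-status task is app-specific, so I'm not showing it:

# myapp/tasks.py (sketch)
from celery import shared_task
from django.contrib.sessions.models import Session
from django.utils import timezone

@shared_task
def cleanup_old_sessions():
    # Delete expired rows so the django_session table doesn't grow forever
    deleted, _ = Session.objects.filter(expire_date__lt=timezone.now()).delete()
    return {'deleted': deleted}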
Critical: Run ONLY ONE Beat instance. Multiple schedulers = duplicate tasks.
Horizontal Scaling & Database Connection Pooling
Seenode lets you scale by adjusting the "Instances" slider. But your app needs to be stateless—no file uploads on local disk, no in-memory sessions.
Redis Sessions (Required for Multi-Instance)
# settings.py
# pip install django-redis
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': os.environ.get('REDIS_URL', 'redis://127.0.0.1:6379/1'),
        'OPTIONS': {
            'CLIENT_CLASS': 'django_redis.client.DefaultClient',
            'CONNECTION_POOL_KWARGS': {'max_connections': 50},
        },
        'KEY_PREFIX': 'myapp',
        'TIMEOUT': 300,
    }
}

SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
SESSION_CACHE_ALIAS = 'default'
Set REDIS_URL in Seenode environment variables. Use Seenode's managed Redis or external provider.
Media Files on S3
File uploads disappear when you scale or redeploy. Use S3:
# pip install django-storages boto3
INSTALLED_APPS += ['storages']
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')
AWS_STORAGE_BUCKET_NAME = os.environ.get('AWS_STORAGE_BUCKET_NAME')
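With that in place, any FileField or ImageField upload goes to the bucket instead of the instance's local disk. A tiny illustration (the Invoice model is made up):

# myapp/models.py (sketch)
from django.db import models

class Invoice(models.Model):
    # Stored in S3 via django-storages, so it survives redeploys and extra instances
    pdf = models.FileField(upload_to='invoices/%Y/%m/')

invoice.pdf.url then returns the S3 URL rather than a path that only exists on one instance.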
Database Connection Pooling (Critical)
With multiple instances, connections explode:
Example: 3 web instances x 4 workers x 5 connections = 60. Add 2 Celery services and you hit PostgreSQL's 100-connection limit. This is the math that breaks things.
I once spent a Saturday debugging "too many connections" errors. Math was simple: 6 services x 5 connections = 30, but max_connections was set to 20. Increased to 50, problem solved.
Enable persistent connections (Django's CONN_MAX_AGE; not a full pool, but it stops every request from opening a fresh connection):
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        # ... NAME, USER, PASSWORD, HOST from your Seenode env vars ...
        'CONN_MAX_AGE': 600,  # Keep connections alive 10 minutes
        'OPTIONS': {
            'connect_timeout': 10,
            'options': '-c statement_timeout=30000',  # Kill queries running longer than 30s
        },
    }
}

# For Celery workers, close connections immediately
if os.environ.get('CELERY_WORKER'):
    DATABASES['default']['CONN_MAX_AGE'] = 0
Check Seenode's PostgreSQL docs for PgBouncer if needed for higher connection counts.
Caching: The Cheap Performance Win
Database queries kill performance. Caching is the easiest fix.
View-Level Caching
from django.shortcuts import render
from django.views.decorators.cache import cache_page

from myapp.models import Product

@cache_page(300)  # Cache the rendered response for 5 minutes
def product_list(request):
    products = Product.objects.filter(active=True)
    return render(request, 'products/list.html', {'products': products})
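One caveat: cache_page caches the whole rendered response per URL, so keep it for pages that look the same to every visitor. Anything user-specific belongs in query-level caching instead.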
Query-Level Caching
from django.core.cache import cache
from django.db.models import Count

from myapp.models import Product

def get_popular_products(limit=10):
    cache_key = f'popular_products_{limit}'
    products = cache.get(cache_key)
    if products is None:
        products = Product.objects.filter(active=True).annotate(
            order_count=Count('orders')
        ).order_by('-order_count')[:limit]
        cache.set(cache_key, products, 600)  # Cache for 10 minutes
    return products
Cache Invalidation
First time I implemented caching, I forgot to invalidate on updates. Users saw prices from 3 days ago. Customer support got hammered. The fix? One cache.delete() call. Cost of mistake: $2,000 in lost sales. Cost of fix: 2 lines of code.
# myapp/models.py
from django.core.cache import cache
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

@receiver([post_save, post_delete], sender=Product)
def invalidate_product_cache(sender, instance, **kwargs):
    cache.delete(f'product_{instance.id}')
    cache.delete('popular_products_10')
Performance Monitoring
Seenode has built-in metrics, but you need to know what broke.
Django Debug Toolbar (Dev Only)
# settings.py
if DEBUG:
    INSTALLED_APPS += ['debug_toolbar']
    MIDDLEWARE += ['debug_toolbar.middleware.DebugToolbarMiddleware']
Shows SQL queries, cache hits/misses. Use it to find N+1 query problems.
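The toolbar also needs INTERNAL_IPS and its URLs wired up before it renders anything. A minimal sketch:

# settings.py
INTERNAL_IPS = ['127.0.0.1']

# urls.py
from django.conf import settings
from django.urls import include, path

if settings.DEBUG:
    urlpatterns += [path('__debug__/', include('debug_toolbar.urls'))]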
Seenode Metrics (What to Watch)
- CPU >80% sustained = scale up or optimize
- Memory >85% = memory leak or need more RAM
- Response time p95 >500ms = investigate slow endpoints
I keep the Seenode metrics tab pinned. Saw CPU spike to 95% every day at 2pm. Took a week to realize it was daily report generation. Moved to dedicated worker, CPU dropped to 40%.
Database Optimization
Fix N+1 Queries
# BAD: N+1 query problem
products = Product.objects.filter(active=True)
for product in products:
    print(product.category.name)  # Separate query each time!

# GOOD: Use select_related for foreign keys
products = Product.objects.filter(active=True).select_related('category')

# GOOD: Use prefetch_related for many-to-many
products = Product.objects.filter(active=True).prefetch_related('tags')
Add Database Indexes
Add indexes on frequently queried fields:
class Meta:
    indexes = [
        models.Index(fields=['category', 'active']),
        models.Index(fields=['-created_at']),
    ]
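Index changes are schema changes, so run makemigrations and migrate after adding them. It's worth checking with EXPLAIN (or Debug Toolbar's SQL panel) that your slow queries actually hit the new index.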
Cost Management
Scaling from $25/mo to $180/mo happened faster than expected. What kept costs reasonable:
Start small, scale on metrics - Don't over-provision. I started with 1 web instance ($25/mo), scaled to 3 when CPU hit 85%.
Optimize before scaling - Profile with Debug Toolbar, fix N+1 queries, add indexes, enable caching. Then scale if needed.
Scale workers independently - Workers are cheaper than web instances. Add worker capacity without scaling entire web service.
Caching ROI - Redis caching cut database query load by roughly 60%, which saved about $100/mo in database costs.
Troubleshooting (The Usual Suspects)
"Too Many Database Connections"
Simple math problem: 6 services x 5 connections = 30, but max_connections was 20. Fix: Increase max_connections or use PgBouncer. For Celery workers, close connections immediately:
DATABASES['default']['CONN_MAX_AGE'] = 0 if os.environ.get('CELERY_WORKER') else 600
Redis Out of Memory
Set TTLs on everything. Celery results don't need to live forever: CELERY_RESULT_EXPIRES = 3600 (1 hour, not 24h).
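If your Redis plan lets you set an eviction policy, allkeys-lru on the cache database is cheap insurance: Redis evicts old keys under memory pressure instead of erroring. Be careful if the same instance also backs the Celery broker, though; you don't want queued tasks evicted.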
Celery Queue Backing Up
Workers can't keep up. Add more worker instances or route slow tasks to separate queues:
CELERY_TASK_ROUTES = {
    'myapp.tasks.slow_report': {'queue': 'slow'},
    'myapp.tasks.fast_email': {'queue': 'fast'},
}
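Then point dedicated worker services at each queue (for example, celery -A config worker -Q slow) so a pile-up of reports can't starve the email queue.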
Things I Wish I Knew About Scaling on Seenode
- Connection pooling isn't optional—learned this at 3am when PostgreSQL started rejecting connections during a sale. You'll hit the limit fast.
- Celery workers need separate services, not the web service. Tried running both in one process. Don't.
- Redis is cheap insurance ($10/mo saves $100/mo in DB costs). Skip it and you'll pay later.
- Profile before scaling—I wasted $200/mo on instances when the fix was a missing index. Debug Toolbar is your friend.
- Seenode logs rotate fast. Ship to external storage or lose debugging info when you need it most.
- The "Instances" slider is addictive. Scale on metrics, not fear. I've over-provisioned more times than I'll admit.
Conclusion
Scaling Django on Seenode isn't about cranking the instance slider. It's architecture: split services (web/worker/beat), cache aggressively, fix N+1 queries, enable connection pooling.
This multi-service pattern works. I've used it to scale from 2K to 50K+ requests/day without melting down. It's boring, proven, and lets you sleep.
If you're ssh'd into production restarting Gunicorn manually, this is your escape route.
Questions? Drop them in comments. I actually respond.
Related:
Next: Django REST API deployment with JWT and rate limiting. Publishing next week.