Part 4: The Two Pillars — Fixing the Foundation Before It Breaks
In Part 1, we built the infrastructure. In Part 2, we planted performance problems into the database. In Part 3, we achieved 95% speed improvements with Redis — and discovered that speed without correctness is worse than no speed at all.
If you're jumping in here, you need the context. Part 1 gives you the containerized stack. Part 2 gives you the database. Part 3 gives you the cache. If you're continuing from Part 3, you already have an API that responds in 4ms when the cache is warm — and 80ms with 61 queries when the cache is cold.
Today we fix both problems. We turn 61 queries into 1. We make cache misses disappear by invalidating stale data the moment it changes. By the end of this post, your system will be both fast and correct — which is the only combination that matters in production.
The Two Problems We're Solving
Part 3 left us with a system that works beautifully under ideal conditions and breaks under real ones. Let's be precise about what "breaks" means.
Problem 1: The N+1 Query Trap
Every time the cache expires or is empty, the API fires 61 separate database queries to render 20 properties. This happens because our serializers are nested — Property → Agent → Office, Property → Location — and Django fetches each related object separately instead of using a JOIN.
Why this matters: A cache miss isn't rare. It happens every time the cache expires (every 60 seconds in Part 3's config). It happens when a user hits a URL for the first time. It happens when you deploy a new version and the cache resets. Under high traffic, cache misses are constant. If each miss costs 61 queries and 80ms, your database chokes.
The symptom: Response times spike from 4ms to 80ms during a cache miss. Under load, PostgreSQL's connection pool fills up. The 51st concurrent request waits. The 101st request times out.
The fix: SQL JOINs via select_related. One query. No matter how nested the serializers are.
Problem 2: Stale Data
When a property's price changes in the admin, the cached API continues serving the old price for up to 60 seconds. That's the cache TTL we set in Part 3. The cache has no awareness of the underlying data; it only knows how to count down seconds.
Why this matters: In real estate, a 60-second delay might be tolerable. In e-commerce during a flash sale, it's catastrophic. In stock trading, it's illegal. But even in real estate, there's a trust problem. A user sees a price, calls the agent, and the agent quotes a different number. The user blames the portal. The agent blames the portal. Both are right. The cache is wrong.
The symptom: You change data in the admin. The API shows the old data. You wait. Eventually it updates. You have no control over when.
The fix: Signal-based invalidation. When the data changes, Django tells Redis: delete this key. Immediately. Not in 60 seconds. Now.
Part A: The Diagnostic Tool — Django Debug Toolbar
Before we fix anything, we need to see the problem in a way that's impossible to ignore. Terminal logs show you SQL queries. Django Debug Toolbar shows you SQL queries and makes you feel the weight of them.
Step 1: Install Django Debug Toolbar
cd backend
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install django-debug-toolbar
pip freeze > requirements.txt
Step 2: Configure It for Docker
The toolbar only appears when Django thinks you're making requests from a trusted IP. Inside Docker, the request comes from the Docker network, not 127.0.0.1. We need to tell Django that Docker IPs are safe.
Add this to core/settings.py:
# At the top, with other imports
import socket
# In INSTALLED_APPS (add near the top, before your apps)
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    # Third-party
    'rest_framework',
    'corsheaders',
    'debug_toolbar',  # ← Add this

    # Our apps
    'housing',
]
# In MIDDLEWARE (add at the very top of the list)
MIDDLEWARE = [
    'debug_toolbar.middleware.DebugToolbarMiddleware',  # ← Add this FIRST
    'django.middleware.security.SecurityMiddleware',
    'corsheaders.middleware.CorsMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
# At the bottom of settings.py, add this block
# This allows the toolbar to show up when running inside Docker
INTERNAL_IPS = [
    '127.0.0.1',
]
# Docker-specific: detect the host machine's IP from inside the container
try:
    hostname, _, ips = socket.gethostbyname_ex(socket.gethostname())
    # Replace the last octet of each container IP with 1 to get the gateway
    INTERNAL_IPS += [ip.rsplit('.', 1)[0] + '.1' for ip in ips]
except OSError:
    pass
Step 3: Add the Toolbar URLs
Update core/urls.py:
from django.contrib import admin
from django.urls import path, include
from django.conf import settings # ← Add this import
urlpatterns = [
    path('admin/', admin.site.urls),
    path('api/', include('housing.urls')),
]

# Only add the debug toolbar URLs if DEBUG is True
if settings.DEBUG:
    urlpatterns += [
        path('__debug__/', include('debug_toolbar.urls')),
    ]
Step 4: Rebuild and Restart
docker compose build backend
docker compose restart backend
Step 5: Verify It Works
Open your browser and navigate to:
http://localhost:8000/api/properties/live/naive/
You should see a sidebar on the right side of the browser with tabs labeled "History", "Versions", "Time", "SQL", "Cache", etc. If you don't see it, the INTERNAL_IPS detection didn't work. Hard-code your Docker gateway IP instead:
# Find your Docker gateway IP
docker network inspect housing-caching-demo_default | grep Gateway
Add that IP to INTERNAL_IPS manually:
INTERNAL_IPS = [
    '127.0.0.1',
    '172.18.0.1',  # ← Your gateway IP here (yours will be different)
]
Restart again. The toolbar should appear.
What the Toolbar Shows You
Click the SQL tab in the toolbar. You'll see a list of every query Django fired to render the page. Scroll through it. Count them. You should see 61 queries for the naive endpoint.
Click any query. The toolbar shows you:
- The raw SQL
- The execution time
- The stack trace (where in your code this query was triggered)
- Whether it's a duplicate (same query fired multiple times)
This is the smoking gun. This is what we're about to fix.
📊 [Screenshot: Django Debug Toolbar - Naive Endpoint SQL Tab]
Part B: Pillar 1 — Fixing the N+1 Queries with select_related
The N+1 problem isn't a Django bug. It's the default behavior when you serialize nested relationships without telling Django to optimize. We fix it with select_related — a method that tells Django to perform SQL JOINs upfront instead of lazy-loading related objects one at a time.
How select_related Works
Here's what Django does without it:
-- Query 1: Get 20 properties
SELECT * FROM housing_property ORDER BY created_at DESC LIMIT 20;
-- Query 2-21: Get each property's location (20 separate queries)
SELECT * FROM housing_location WHERE id = 1;
SELECT * FROM housing_location WHERE id = 2;
...
SELECT * FROM housing_location WHERE id = 20;
-- Query 22-41: Get each property's agent (20 separate queries)
SELECT * FROM housing_agent WHERE id = 1;
SELECT * FROM housing_agent WHERE id = 2;
...
-- Query 42-61: Get each agent's office (20 separate queries)
SELECT * FROM housing_office WHERE id = 1;
SELECT * FROM housing_office WHERE id = 2;
...
Total: 61 queries. Each one is a round-trip to PostgreSQL. Even if each query is fast, the cumulative network latency kills you.
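A back-of-envelope sketch makes the round-trip math concrete. The per-query timings below are assumptions for illustration, not measurements:

```python
# Why 61 cheap queries lose to one heavier query: each query pays the
# network round-trip, and round-trips dominate. All numbers are assumed.
round_trip_ms = 1.0    # network hop to PostgreSQL and back
cheap_query_ms = 0.3   # executing one primary-key lookup
join_query_ms = 8.0    # executing the single JOIN query (heavier, but alone)

naive_total = 61 * (round_trip_ms + cheap_query_ms)
optimized_total = 1 * (round_trip_ms + join_query_ms)

print(f"naive: {naive_total:.1f} ms vs optimized: {optimized_total:.1f} ms")
# → naive: 79.3 ms vs optimized: 9.0 ms
```

Even with generous assumptions, the naive path spends most of its time waiting on the network, not executing SQL.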
Here's what Django does with select_related:
-- Query 1: Get 20 properties WITH their related data via JOINs
SELECT
    property.*,
    location.*,
    agent.*,
    office.*
FROM housing_property AS property
LEFT OUTER JOIN housing_location AS location ON property.location_id = location.id
LEFT OUTER JOIN housing_agent AS agent ON property.agent_id = agent.id
LEFT OUTER JOIN housing_office AS office ON agent.office_id = office.id
ORDER BY property.created_at DESC
LIMIT 20;
Total: 1 query. Everything comes back in a single round-trip. The database does the work of combining the tables. Python just deserializes the result.
The Code
We already teased this in Part 3's OptimizedPropertyListView. Now we're making it real.
Update housing/views.py. The OptimizedPropertyListView class should look like this:
class OptimizedPropertyListView(generics.ListAPIView):
    """
    The database-optimized version. Uses select_related to fetch
    Property + Location + Agent + Office in a single query with JOINs.

    This is what "fast without cache" looks like.
    61 queries → 1 query.
    80ms → 15-20ms.
    """
    serializer_class = PropertySerializer

    def get_queryset(self):
        return Property.objects.select_related(
            'location',      # JOIN on property.location_id = location.id
            'agent__office'  # Double underscore: JOIN agent, then JOIN office
        ).order_by('-created_at')
The double underscore (agent__office) is the key. It tells Django: "Follow the agent ForeignKey from Property to Agent, then follow the office ForeignKey from Agent to Office." Both JOINs happen in the same query.
Test It
Restart the backend:
docker compose restart backend
Open your browser and navigate to:
http://localhost:8000/api/properties/live/optimized/
Click the SQL tab in the Debug Toolbar. You should see 1 query. It's a long query — the SELECT statement spans multiple lines because it's pulling columns from four tables — but it's one query.
Click the Time tab. Check the total time. It should be 15-20ms. That's without a cache. Just fast SQL.
📊 [Screenshot: Django Debug Toolbar - Optimized Endpoint SQL Tab]
The Before-and-After Table
| Metric | Naive (No Optimization) | Optimized (select_related) |
Improvement |
|---|---|---|---|
| Query count | 61 | 1 | 98% reduction |
| Query time (PostgreSQL) | ~40ms | ~8ms | 5x faster |
| Total response time (no cache) | 80ms | 15-20ms | 4x faster |
| Database connections used | 61 | 1 | 98% reduction |
The database is now fast. But the cache is still dumb. Let's fix that.
Part C: Pillar 2 — Smart Invalidation with Django Signals
The cache from Part 3 is time-based. It expires after 60 seconds whether the data changed or not. This creates two problems:
- Stale data for up to 60 seconds after an update
- Unnecessary cache misses every 60 seconds even if the data hasn't changed in days
We're replacing time-based expiry with event-based invalidation. When the data changes, Django sends a signal. We listen for that signal and delete the cache key immediately.
How Django Signals Work
Django's ORM fires signals at specific moments in a model's lifecycle:
- `pre_save` — fired before a model is saved to the database
- `post_save` — fired after a model is saved
- `pre_delete` — fired before a model is deleted
- `post_delete` — fired after a model is deleted
We register a function to listen for these signals. When the signal fires, our function runs. That function can do anything — log the event, send an email, update a search index, or in our case, delete a Redis cache key.
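The mechanics can be sketched in a few lines of plain Python. This is an illustrative stand-in for the dispatch idea, not Django's actual implementation:

```python
from collections import defaultdict

# Handlers are stored per (signal, sender) pair, mirroring the shape of
# Django's @receiver registration. Everything here is a toy stand-in.
_receivers = defaultdict(list)

def receiver(signals, sender):
    """Decorator that registers a handler for one or more signals."""
    if not isinstance(signals, list):
        signals = [signals]
    def decorator(func):
        for sig in signals:
            _receivers[(sig, sender)].append(func)
        return func
    return decorator

def send(signal, sender, instance):
    """Fire a signal: call every handler registered for this sender."""
    for handler in _receivers[(signal, sender)]:
        handler(sender=sender, instance=instance)

class Property:  # stand-in for the real model
    pass

purged = []

@receiver(["post_save", "post_delete"], sender=Property)
def invalidate_property_cache(sender, instance, **kwargs):
    purged.append(instance)  # the real handler would call cache.clear()

send("post_save", Property, instance="property-1")
```

The real `django.dispatch` machinery adds weak references, thread safety, and more, but the registration-then-dispatch shape is the same.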
The Implementation
Create a new file: housing/signals.py
"""
housing/signals.py
Signal handlers for cache invalidation.
Whenever a Property is saved or deleted, we clear the cached API responses
immediately instead of waiting for the TTL to expire.
"""
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver
from django.core.cache import cache
from .models import Property
@receiver([post_save, post_delete], sender=Property)
def invalidate_property_cache(sender, instance, **kwargs):
"""
Automatic cache invalidation.
Fires whenever a Property is created, updated, or deleted.
Clears the entire cache to ensure fresh data on the next request.
In a production system, you'd invalidate specific cache keys
(e.g., only the listing page, only pages that include this property).
For this demo, we use cache.clear() for simplicity and guaranteed freshness.
"""
cache.clear()
print(f"[SIGNAL] Property {instance.id} changed. Redis cache purged.")
The @receiver decorator registers this function as a signal handler. The [post_save, post_delete] list means "fire this function after a save OR after a delete". The sender=Property parameter means "only fire this for the Property model, not all models".
Register the Signals
Signals don't auto-register. Django needs to import signals.py at startup. The correct place to do this is in the app's AppConfig class.
Update housing/apps.py:
from django.apps import AppConfig


class HousingConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'housing'

    def ready(self):
        """
        Called when Django starts up.
        This is where we import signals to register them.
        """
        import housing.signals  # noqa
The # noqa comment tells linters "this import has a side effect (registration), it's not unused."
Restart the Backend
docker compose restart backend
Watch the logs as it restarts:
docker compose logs -f backend
You should see Django start up. No errors. The signal is registered.
Part D: Testing the Smart Invalidation
This is the moment where the system behavior changes in a way you can see with your own eyes.
The Test
1. Prime the cache. Hit the cached endpoint once:

curl -s http://localhost:8000/api/properties/cached/ | jq '.results[0] | {id, title, price}'

You'll see output like:

{
  "id": 1,
  "title": "Modern Apartment in Seattle",
  "price": "450000.00"
}

Note the price.

2. Verify the cache is serving it. Open the cached endpoint in your browser:

http://localhost:8000/api/properties/cached/

Click the SQL tab in the Debug Toolbar. You should see 0 queries. The cache is serving it.

3. Change the data. Open the Django admin:

http://localhost:8000/admin/housing/property/

Find Property #1. Change the price from 450000 to 395000. Save.

Watch the terminal where docker compose logs -f backend is running. You should see:

[SIGNAL] Property 1 changed. Redis cache purged.

4. Verify the cache was purged. Refresh the browser on the cached endpoint.

Click the SQL tab. You should see 1 query (the optimized query with JOINs). The cache was empty, so Django queried the database with the optimized queryset.

Check the JSON response. The price should be 395000.00 — the new price. Immediately. No 60-second wait.

5. Verify the cache is repopulated. Refresh again.

Click the SQL tab. You should see 0 queries. The fresh data is now cached. The next request will serve from Redis at 4ms.
What Just Happened
The cache is no longer time-based. It's event-based. The data in Redis is always fresh because stale data is deleted the moment the source of truth (PostgreSQL) changes. There's no 60-second window where the API lies to users.
This is the pattern production systems use. E-commerce sites invalidate product caches when prices change. News sites invalidate article caches when headlines are edited. Real estate portals invalidate listing caches when prices or availability change. The TTL still exists as a safety net (in case a signal fails to fire, the cache expires eventually), but invalidation happens immediately, not eventually.
Part E: The Trade-Off — When cache.clear() Isn't Enough
Our signal uses cache.clear() — it deletes every key in Redis. This is simple and guarantees freshness, but it's blunt. If you have 100 different cached endpoints (different search filters, different pages, featured listings, agent profiles), changing one property blows away all 100 caches.
In a high-traffic system, that creates a new problem: the cache stampede returns. Everyone hits empty caches at the same moment. The database gets hammered.
The Better Approach: Targeted Invalidation
Instead of cache.clear(), delete specific keys:
@receiver([post_save, post_delete], sender=Property)
def invalidate_property_cache(sender, instance, **kwargs):
    """
    Invalidate only the cache keys related to this property.
    """
    # The cache key pattern that Django's cache_page decorator creates.
    # This is a simplified, illustrative pattern — the real key includes
    # query params, headers, and a hash of the URL.
    cache_key_pattern = "views.decorators.cache.cache_page.*.properties.*"

    # In production, you'd use a pattern-matching delete or maintain a registry
    # of cache keys per property. For now, we clear everything.
    cache.clear()
    print(f"[SIGNAL] Property {instance.id} changed. Cache purged.")
The challenge: Django's cache_page decorator auto-generates keys based on the URL, query parameters, request headers, and more. You can't easily predict the exact key unless you control the caching logic yourself (which we'll do in Part 5 with low-level caching).
For this post, cache.clear() is correct. It's simple, it works, and it guarantees correctness. The performance cost (everyone refills their cache simultaneously) is acceptable at this scale. When you hit 10,000 requests per second, you move to more sophisticated patterns like cache warming, versioned keys, or probabilistic invalidation. But you don't need those yet. Solve the problem you have, not the problem you imagine.
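If you do eventually control your own keys (as low-level caching allows), one workable pattern is a registry mapping each property to the cache keys whose payloads include it. A minimal sketch, with plain dicts standing in for Redis and every name hypothetical:

```python
# Targeted invalidation via a key registry. A dict stands in for Redis;
# keys_by_property records which cached responses mention each property.
cache = {}
keys_by_property = {}  # property_id -> set of cache keys that include it

def cache_listing(key, property_ids, payload):
    """Store a response and remember which properties it contains."""
    cache[key] = payload
    for pid in property_ids:
        keys_by_property.setdefault(pid, set()).add(key)

def invalidate_property(pid):
    """Delete only the keys that include this property, not the whole cache."""
    for key in keys_by_property.pop(pid, set()):
        cache.pop(key, None)

cache_listing("listing:page1", [1, 2, 3], {"results": "..."})
cache_listing("agent:7:profile", [3], {"agent": "..."})
invalidate_property(3)   # both entries include property 3 → both deleted
```

The trade-off is bookkeeping: every write path must register its keys, and the registry itself needs to live somewhere shared (in Redis, in practice).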
Part F: The Complete System — Fast and Correct
Let's map what we built across all four parts and see the final state.
The Request Lifecycle (Cached Endpoint)
Request 1 — Cache Miss:

- User requests `/api/properties/cached/`
- Django checks Redis: "Do you have this URL?"
- Redis: "No."
- Django queries PostgreSQL with `select_related` (1 query, 15ms)
- Django serializes the data to JSON
- Django saves the response to Redis with a 60-second TTL
- Django returns the JSON to the user
- Total time: ~20ms (query + serialization + network)

Requests 2-N — Cache Hit:

- User requests `/api/properties/cached/`
- Django checks Redis: "Do you have this URL?"
- Redis: "Yes, here it is."
- Django returns the cached JSON directly
- Total time: ~4ms (Redis read + network)

When Data Changes:

- Admin updates Property #42 in the admin panel
- Django saves the change to PostgreSQL
- Django fires the `post_save` signal
- The signal handler calls `cache.clear()`
- Redis deletes all cached responses
- The next request is a cache miss (it goes through the Request 1 flow above)
- Data is fresh immediately, not in 60 seconds
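The whole lifecycle above condenses into a toy simulation: a dict stands in for Redis, another for PostgreSQL, and a list counts round-trips. Purely illustrative, none of this is Django code:

```python
# Cache-aside lifecycle: miss → query → store; hit → serve; save → purge.
cache = {}
db = {"/api/properties/cached/": {"price": 450000}}
queries = []

def query_db(url):
    queries.append(url)      # one entry per database round-trip
    return db[url]

def get(url):
    if url in cache:         # cache hit: zero queries
        return cache[url]
    data = query_db(url)     # cache miss: one optimized query
    cache[url] = data        # in the real system, stored with a TTL
    return data

def on_save(url, new_data):
    db[url] = new_data       # PostgreSQL is the source of truth
    cache.clear()            # what the post_save handler does

get("/api/properties/cached/")             # miss → 1 query
get("/api/properties/cached/")             # hit → 0 queries
on_save("/api/properties/cached/", {"price": 395000})
fresh = get("/api/properties/cached/")     # miss again, but fresh data
```

Two misses, one hit, and the data after the save is the new price: the same sequence you just verified in the browser.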
The Performance Table — All Four Parts Combined
| Metric | Part 2 (Naive DB) | Part 3 (Basic Cache) | Part 4 (Optimized + Smart Invalidation) |
|---|---|---|---|
| Query count (cache miss) | 61 | 61 | 1 |
| Query count (cache hit) | 61 | 0 | 0 |
| Response time (cache miss) | 80ms | 80ms | 15-20ms |
| Response time (cache hit) | 80ms | 4ms | 4ms |
| Data freshness | Instant | Up to 60s stale | Instant |
| Cache invalidation | N/A | Time-based (dumb) | Event-based (smart) |
| Under load (50 concurrent users) | 500-1500ms | 4-10ms (warm cache) | 4-10ms (warm), 20-40ms (cold) |
Part G: Tools and Patterns — The Professional Toolkit
Here are the tools and patterns we used in Part 4, documented for reference.
Django Debug Toolbar
What it does: Shows you every SQL query, cache hit/miss, template render, and signal fired during a request. Lives in the browser as a sidebar.
When to use it: During development, whenever you're optimizing a view or debugging unexpected behavior.
Key panels:
- SQL — query count, query time, duplicate detection
- Cache — hits/misses, key names, sizes
- Time — breakdown of where time is spent (DB, template, Python, middleware)
- Signals — which signals fired and how long their handlers took
Installation:
pip install django-debug-toolbar
Configuration: See Part A above. The INTERNAL_IPS trick is critical for Docker.
select_related vs prefetch_related
select_related — for ForeignKey and OneToOne relationships. Uses SQL JOINs. One query.
Property.objects.select_related('location', 'agent__office')
Produces:
SELECT property.*, location.*, agent.*, office.*
FROM housing_property
LEFT JOIN housing_location ON ...
LEFT JOIN housing_agent ON ...
LEFT JOIN housing_office ON ...
prefetch_related — for ManyToMany and reverse ForeignKey relationships. Uses separate queries but fetches related objects in bulk instead of one at a time.
Property.objects.prefetch_related('images')
Produces:
-- Query 1
SELECT * FROM housing_property LIMIT 20;
-- Query 2 (one query for ALL images, not 20 separate ones)
SELECT * FROM housing_propertyimage WHERE property_id IN (1,2,3,...,20);
When to use which:
- `select_related` — when you're traversing ForeignKeys forward (Property → Agent)
- `prefetch_related` — when you're traversing reverse ForeignKeys or ManyToMany (Property → all its images)
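The difference between per-row lookups and one bulk lookup can be simulated in plain Python, with dicts standing in for tables and a list counting queries (illustrative only):

```python
# N+1 style vs bulk IN-style fetching, without a database.
images_by_property = {1: ["a.jpg"], 2: ["b.jpg", "c.jpg"], 3: []}
queries = []

def fetch_images_naive(ids):
    """One lookup per property — the N+1 pattern."""
    result = {}
    for pid in ids:
        queries.append(f"SELECT ... WHERE property_id = {pid}")
        result[pid] = images_by_property[pid]
    return result

def fetch_images_bulk(ids):
    """One lookup for all properties — what prefetch_related does."""
    queries.append(f"SELECT ... WHERE property_id IN {tuple(ids)}")
    return {pid: images_by_property[pid] for pid in ids}

naive = fetch_images_naive([1, 2, 3])  # records 3 queries
bulk = fetch_images_bulk([1, 2, 3])    # records 1 query
```

Both return identical results; only the number of round-trips differs, which is exactly the trade `prefetch_related` makes.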
Django Signals
Common signals:
- `pre_save` / `post_save` — before/after a model is saved
- `pre_delete` / `post_delete` — before/after a model is deleted
- `m2m_changed` — when a ManyToMany relationship changes
Registration pattern:
from django.db.models.signals import post_save
from django.dispatch import receiver
@receiver(post_save, sender=MyModel)
def my_handler(sender, instance, created, **kwargs):
    if created:
        # This is a new object
        pass
    else:
        # This is an update
        pass
Where to put signal handlers: In a signals.py file in your app. Import it in apps.py in the ready() method.
When to use signals: Cache invalidation, search index updates, audit logging, triggering async tasks. Don't use them for business logic that belongs in model methods or views.
Cache Invalidation Strategies
Time-based (Part 3):

- `@cache_page(60)` — cache expires after 60 seconds
- Simple, predictable, safe
- Guaranteed stale data for up to the TTL duration
- Thundering herd on expiry under high traffic

Event-based (Part 4):

- Invalidate on `post_save` / `post_delete` signals
- Data is always fresh
- More complex, requires testing
- Avoids stale data but creates cold-cache moments

Hybrid (Production):

- Event-based invalidation + long TTL (e.g., 3600 seconds) as a safety net
- If invalidation fails for any reason, the cache still expires eventually
- Best of both worlds
Advanced patterns (Part 5+):

- Cache warming — a background task refreshes the cache before expiry
- Versioned keys — `property:123:v2`; instead of deleting, increment the version
- Probabilistic early expiry — randomly refresh 5% of requests when the cache is 90% expired
- Cache locking — only one process refreshes an expired key; others wait or serve stale
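The versioned-keys idea is simple enough to sketch now, with a plain dict standing in for Redis (key names are hypothetical):

```python
# Versioned cache keys: instead of deleting cached payloads, bump a
# version number so old entries become unreachable and expire on their own.
cache = {}

def _version(pk):
    return cache.get(f"property:{pk}:version", 1)

def listing_key(pk):
    return f"property:{pk}:v{_version(pk)}"

def invalidate(pk):
    # O(1) invalidation: stale entries linger until their TTL expires,
    # but no reader can ever construct their key again.
    cache[f"property:{pk}:version"] = _version(pk) + 1

cache[listing_key(123)] = {"price": 450000}  # stored as property:123:v1
invalidate(123)                              # version becomes 2
stale = cache.get(listing_key(123))          # property:123:v2 → not cached
```

Readers always compute the current key, so a single counter write invalidates any number of cached payloads without a pattern-matching delete.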
Part H: Troubleshooting
Debug Toolbar Doesn't Appear
Symptom: You open the API endpoint in the browser. No sidebar. No toolbar.
Possible causes:
- `INTERNAL_IPS` is wrong. The toolbar only shows if Django thinks the request is coming from a trusted IP. Inside Docker, the request comes from the Docker bridge network, not `127.0.0.1`.

Fix: Add your Docker gateway IP to `INTERNAL_IPS`. Find it with:

docker network inspect housing-caching-demo_default | grep Gateway

Add that IP (e.g., `172.18.0.1`) to the `INTERNAL_IPS` list in settings.py.
- Middleware is in the wrong order. `DebugToolbarMiddleware` must be at the top of the `MIDDLEWARE` list.

Fix: Check settings.py. The toolbar middleware should be the first item in the list.
- DEBUG is False. The toolbar is disabled when `DEBUG = False`.

Fix: Check settings.py. `DEBUG` should be `True` in development.
- You're hitting the API via curl or Locust. The toolbar only works in a browser. It injects JavaScript and HTML into the response.
Fix: Open the URL in Chrome/Firefox.
Signal Fires But Cache Doesn't Clear
Symptom: You update a property in the admin. The terminal shows the "[SIGNAL] Property X changed" message. But the cached endpoint still shows old data.
Possible causes:
- You're hitting a different URL. Django's `cache_page` decorator creates different keys for different URLs. `/api/properties/cached/` and `/api/properties/cached` (no trailing slash) are different keys.

Fix: Check the exact URL you're testing. Match it exactly, including the trailing slash.
- The cache backend isn't Redis. If `CACHES` in `settings.py` is misconfigured, Django might be using `LocMemCache` (local memory) instead of Redis. `cache.clear()` would clear local memory, not Redis.

Fix: Verify the cache backend:
docker compose exec backend python manage.py shell

Then, inside the shell:

from django.core.cache import cache
print(cache.__class__)  # Should be a Redis-backed cache class
cache.set('test', 'value')
Then check Redis:
docker compose exec redis redis-cli keys "*"
You should see a key. If not, the cache isn't going to Redis.
- The signal isn't registered. If you didn't add the `ready()` method to `apps.py`, the signal handler never gets imported.
Fix: Verify the signal is registered. Add a print statement to signals.py at the module level (outside any function):
print("[DEBUG] signals.py imported")
Restart the backend. If you don't see that print statement in the logs, the file isn't being imported.
select_related Doesn't Reduce Query Count
Symptom: You added select_related to the view. The Debug Toolbar still shows 61 queries.
Possible causes:
- You're testing the wrong endpoint. `select_related` is only on the `OptimizedPropertyListView`, not the `PropertyListView`.

Fix: Make sure you're hitting `/api/properties/live/optimized/`, not `/api/properties/live/naive/`.
- The serializer is accessing related objects Django didn't fetch. If your serializer accesses a field you didn't include in `select_related`, Django makes an extra query.

Fix: Check which fields the serializer uses. Make sure every ForeignKey accessed in the serializer is in the `select_related()` call.
- You forgot to restart. Code changes don't apply until the container restarts.
Fix: docker compose restart backend
What We Built — And What's Next
We entered Part 4 with a system that was fast when the cache was warm and broken when the cache was cold. We leave Part 4 with a system that's fast all the time and correct all the time.
The database is optimized. 61 queries became 1. Cache misses that used to take 80ms now take 15ms. The database can handle 10x more traffic before it chokes.
The cache is smart. It no longer serves stale data. When a property changes, the cache updates immediately. Users see fresh data without sacrificing speed.
The system is observable. Django Debug Toolbar shows us exactly where time is spent. We can see the difference between a cache hit and a cache miss. We can see which queries are slow. We can prove the optimizations work.
But we're still serving data from Django to... nobody. The frontend is a blank Next.js welcome page. Part 5 changes that. We build the actual housing portal UI — the listing cards, the filters, the search. And we introduce a third caching layer: the browser. Client-side caching with Next.js revalidation. That's when the system becomes whole.
Checkpoint: Push to GitHub
git checkout -b part-4-full-cache
git add .
git commit -m "feat: add select_related optimization, signal-based cache invalidation, and Django Debug Toolbar"
git push origin part-4-full-cache
The repo now has four branches:
- `part-1-setup` — infrastructure
- `part-2-data` — database schema and seed
- `part-3-problem` — API layer and basic Redis cache
- `part-4-full-cache` — query optimization and smart invalidation
Diff any two branches to see what changed:
git diff part-3-problem..part-4-full-cache
Next: Part 5 — The Frontend Layer: Next.js, SWR, and Client-Side Caching. Stay tuned.