Ajit Kumar

From Zero to Cached: Building a High-Performance Housing Portal with Django, Next.js, and Redis- Part 3

Part 3: The Quick Win — Measuring the Baseline and Introducing Redis Cache

In Part 1, we built the infrastructure. In Part 2, we seeded 5,000 properties into an intentionally naive database schema. Today, we face the consequences — and we fix them. Or at least, we appear to.


If you're jumping in here, start with Part 1 and Part 2 — they set up the entire stack and data layer this post builds on. If you're continuing from Part 2, you already have 5,000 properties in PostgreSQL, zero indexes beyond primary keys, and a database schema that's about to reveal exactly why caching exists.

Today we build the API layer, measure how slow it is, and introduce Redis. By the end of this post, you'll see a 95% speed improvement — and understand exactly why that improvement comes with a hidden cost.


Part A: The Foundation — Building the API Layer

Before we can measure anything, we need endpoints that actually serve data. This is the DRF (Django REST Framework) layer — the thing we test, the thing we cache, and the thing that exposes every inefficiency we built into Part 2's database.

Step 1: The Serializers

Serializers turn Django models into JSON. We're using nested serializers here — which is a common pattern in REST APIs and also the primary trigger for the N+1 query problem we're about to demonstrate.

Create housing/serializers.py:

"""
housing/serializers.py

Nested serializers that expose the full object graph:
Property → Agent → Office
Property → Location

This is intentionally naive. The nesting triggers N+1 queries because
Django fetches each related object separately instead of in a single JOIN.
"""

from rest_framework import serializers
from .models import Office, Agent, Location, Property


class OfficeSerializer(serializers.ModelSerializer):
    class Meta:
        model = Office
        fields = ['id', 'name', 'city', 'phone']


class AgentSerializer(serializers.ModelSerializer):
    office = OfficeSerializer(read_only=True)

    class Meta:
        model = Agent
        fields = ['id', 'name', 'email', 'phone', 'office']


class LocationSerializer(serializers.ModelSerializer):
    class Meta:
        model = Location
        fields = ['id', 'city', 'state', 'zip_code']


class PropertySerializer(serializers.ModelSerializer):
    location = LocationSerializer(read_only=True)
    agent = AgentSerializer(read_only=True)

    class Meta:
        model = Property
        fields = [
            'id', 'title', 'description', 'property_type', 'price',
            'bedrooms', 'bathrooms', 'location', 'agent', 'status',
            'view_count', 'created_at',
        ]

The nesting is the key detail here. When the serializer renders a Property, it also renders the Agent, which also renders the Office. Each level of nesting is a separate database query — unless we do something about it. We won't. Not yet. That's Part 4's job.

Step 2: The Views — Three Endpoints, Three Stories

We're creating three different views of the same data. This isn't just for testing — it's the scientific method applied to caching. We need a control group (naive), an experimental group (cached), and a hint at what comes next (optimized).

Create housing/views.py:

"""
housing/views.py

Three views of the same data:
1. Naive — no optimization, triggers N+1 queries
2. Cached — naive view with @cache_page, serves from Redis
3. Optimized — uses select_related to fix N+1 at the database level (Part 4 preview)
"""

from django.utils.decorators import method_decorator
from django.views.decorators.cache import cache_page
from rest_framework import generics
from .models import Property
from .serializers import PropertySerializer


class PropertyListView(generics.ListAPIView):
    """
    The naive baseline. No caching. No query optimization.
    This is the "before" picture.
    """
    queryset = Property.objects.all().order_by('-created_at')
    serializer_class = PropertySerializer


class CachedPropertyListView(PropertyListView):
    """
    The cached version. Same queryset as PropertyListView, but with
    @cache_page(60) applied. This caches the entire HTTP response
    (headers + JSON body) in Redis for 60 seconds.

    First request: cache miss, hits the database, saves to Redis.
    Subsequent requests: cache hit, served from Redis, zero DB queries.
    """
    @method_decorator(cache_page(60))
    def dispatch(self, *args, **kwargs):
        return super().dispatch(*args, **kwargs)


class OptimizedPropertyListView(generics.ListAPIView):
    """
    The database-optimized version. No cache, but uses select_related
    to fetch Property + Agent + Office in a single query with JOINs
    instead of 61 separate queries.

    This is a preview of Part 4. We're including it here so you can
    compare "fast cache" vs "fast database" side by side.
    """
    queryset = Property.objects.select_related(
        'agent__office', 'location'
    ).all().order_by('-created_at')
    serializer_class = PropertySerializer

The @method_decorator(cache_page(60)) line is the entire cache implementation. One decorator. 60 seconds. That's the "quick win" — and the reason this post exists.

Step 3: The URLs

Create housing/urls.py:

"""
housing/urls.py

Three routes for three views.
We keep them under different paths so we can test them side-by-side
without redeploying code or toggling settings.
"""

from django.urls import path
from .views import PropertyListView, CachedPropertyListView, OptimizedPropertyListView

urlpatterns = [
    path('properties/live/naive/', PropertyListView.as_view(), name='property-naive'),
    path('properties/cached/', CachedPropertyListView.as_view(), name='property-cached'),
    path('properties/live/optimized/', OptimizedPropertyListView.as_view(), name='property-optimized'),
]

Update core/urls.py to include the housing app's routes:

"""
core/urls.py
"""

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('api/', include('housing.urls')),  # ← Add this line
]

Step 4: Enable SQL Logging

We need to see every query Django fires. Add this to core/settings.py:

# Add this anywhere in settings.py, typically near the bottom

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console'],
        },
    },
}

This prints every SQL query to the Docker logs. You'll see the N+1 problem in real time.

Step 5: Restart and Verify

docker compose restart backend

# Test that the endpoint exists
curl http://localhost:8000/api/properties/live/naive/ | jq '.results[0].title'

If you see a property title, the API is alive.


Part B: The Instrumentation — Define What We Measure

An engineer without metrics is just a person with an opinion. Before we optimize anything, we agree on what "fast" and "slow" mean in this context.

What We Measure

Response latency (ms) — How long the HTTP request takes, end to end. Measured from the moment curl sends the request to the moment it receives the full response.

Query count — How many SQL queries Django fires to assemble the JSON response. A well-optimized endpoint should use 1-3 queries. Our naive endpoint uses 61 (a quick way to count them yourself follows this list).

Query time (ms) — The total time PostgreSQL spends executing those queries. This is separate from serialization time, network time, and Python overhead.

Cache hits/misses — Did Redis serve this response, or did we go to the database? A cache hit means zero database queries. A cache miss means we pay the full cost.
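
If you want to count the queries yourself rather than eyeballing logs, Django's CaptureQueriesContext can wrap a request made with the test client. A minimal sketch, run inside the backend container's manage.py shell (the HTTP_HOST override is only needed if ALLOWED_HOSTS rejects the test client's default host):

# docker compose exec backend python manage.py shell
from django.db import connection
from django.test import Client
from django.test.utils import CaptureQueriesContext

client = Client()
with CaptureQueriesContext(connection) as ctx:
    response = client.get("/api/properties/live/naive/", HTTP_HOST="localhost")

print(response.status_code)        # 200
print(len(ctx.captured_queries))   # 61 on the naive endpoint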

What Endpoint We Measure

We're testing GET /api/properties/live/naive/ as the baseline and GET /api/properties/cached/ as the cached version. Both return 20 results per page via DRF's page-number pagination. Both use the exact same serializer. The only difference is the @cache_page decorator.
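
Those 20 results per page assume DRF pagination is configured somewhere in core/settings.py. If your settings from the earlier parts don't already contain an equivalent block, this is the shape the numbers in this post assume (an assumption on my part; match whatever your project actually uses):

# core/settings.py (assumed pagination config; adjust to your own settings)
REST_FRAMEWORK = {
    "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.PageNumberPagination",
    "PAGE_SIZE": 20,
}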

What Tools We Use

curl with --write-out — Terminal-based response timing. One command, one number. No install needed.

curl -o /dev/null -s -w "Total time: %{time_total}s\n" http://localhost:8000/api/properties/live/naive/

Django SQL logging — We enabled this in Step 4. Watch the Docker logs during a request:

docker compose logs -f backend | grep "SELECT"

You'll see every query scroll past.

redis-cli monitor — Real-time stream of every command Redis receives. Open this in a separate terminal and leave it running:

docker compose exec redis redis-cli monitor

When you hit the cached endpoint, you'll see GET and SET commands appear.

EXPLAIN ANALYZE — PostgreSQL's query planner. Shows exactly what the database does with a query — sequential scan vs index scan, estimated cost, actual time.

docker compose exec db psql -U user -d housing_db
EXPLAIN ANALYZE 
SELECT * FROM housing_property 
ORDER BY created_at DESC 
LIMIT 20;

Look for Seq Scan on housing_property. That's a full table scan. No index. PostgreSQL reads every single row to find the 20 most recent ones.

Locust — Load testing and visualization. This is the tool that turns numbers into graphs. We'll install it shortly.


Part C: The Baseline — Measure the Slow Path

This is the "before" picture. We hit the naive endpoint, we watch it work, and we document every inefficiency.

Test 1: The Single Request

curl -o /dev/null -s -w "Total time: %{time_total}s\n" http://localhost:8000/api/properties/live/naive/

Expected output:

Total time: 0.068s

Your number will vary depending on your machine, but it should be somewhere between 0.060s and 0.080s. That's 60-80 milliseconds for 20 rows of JSON.

Test 2: Count the Queries

Watch the Docker logs during the request:

docker compose logs -f backend

Hit the endpoint again. Scroll through the logs. Count the SELECT statements. You should see:

SELECT ... FROM housing_property ORDER BY created_at DESC LIMIT 20
SELECT ... FROM housing_agent WHERE id = 1
SELECT ... FROM housing_office WHERE id = 1
SELECT ... FROM housing_location WHERE id = 1
SELECT ... FROM housing_agent WHERE id = 2
SELECT ... FROM housing_office WHERE id = 2
SELECT ... FROM housing_location WHERE id = 2
... (repeat 20 times)

Total: 1 query for properties + 20 queries for agents + 20 queries for offices + 20 queries for locations = 61 queries to render 20 properties.

This is the N+1 problem in its purest form. The serializer asks for each property's agent. Django fetches each agent separately. The serializer asks for each agent's office. Django fetches each office separately. It's not a bug — it's the default behavior when you use nested serializers without query optimization.
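
You can watch the same thing happen at the ORM level, with no HTTP or serializers involved. A quick sketch from the Django shell; note that connection.queries is only populated when DEBUG=True (use CaptureQueriesContext otherwise):

# docker compose exec backend python manage.py shell
from django.db import connection, reset_queries
from housing.models import Property

reset_queries()
props = list(Property.objects.all().order_by('-created_at')[:20])  # 1 query
for p in props:
    _ = p.agent.name          # lazy load: +1 query per property
    _ = p.agent.office.name   # +1 query per property (agent is already cached on the instance)
    _ = p.location.city       # +1 query per property

print(len(connection.queries))  # 61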

Test 3: The Query Plan

Open a PostgreSQL shell:

docker compose exec db psql -U user -d housing_db

Run the main query with EXPLAIN ANALYZE:

EXPLAIN ANALYZE 
SELECT * FROM housing_property 
ORDER BY created_at DESC 
LIMIT 20;

You'll see output like this:

Limit  (cost=XXX..XXX rows=20 width=XXX) (actual time=X.XXX..X.XXX rows=20 loops=1)
  ->  Sort  (cost=XXX..XXX rows=5000 width=XXX) (actual time=X.XXX..X.XXX rows=20 loops=1)
        Sort Key: created_at DESC
        ->  Seq Scan on housing_property  (cost=0.00..XXX.XX rows=5000 width=XXX) (actual time=X.XXX..X.XXX rows=5000 loops=1)

The key line is Seq Scan on housing_property. That's a sequential scan — PostgreSQL is reading every single row from disk into memory, sorting them, and then taking the first 20. With 5,000 rows, this is tolerable. With 50,000 rows, it's slow. With 500,000 rows, it's a disaster.

Exit the PostgreSQL shell (\q).
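
If you want to confirm this is the exact SQL Django generates, you can print it from the ORM before pasting it into psql (a convenience check, not a replacement for EXPLAIN ANALYZE):

# docker compose exec backend python manage.py shell
from housing.models import Property

qs = Property.objects.all().order_by('-created_at')[:20]
print(qs.query)  # SELECT ... FROM "housing_property" ORDER BY "created_at" DESC LIMIT 20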

Test 4: The Bombardment — Simulating Load

A single request tells you latency. Multiple concurrent requests tell you scalability. Create a simple bash script to simulate 50 users hitting the endpoint at the same time.

Create bombardment_test.sh in your project root:

#!/bin/bash
# Fires 50 requests in parallel and records each one's response time

for i in {1..50}; do 
  curl -o /dev/null -s -w "Request $i: %{time_total}s\n" http://localhost:8000/api/properties/live/naive/ & 
done
wait

Make it executable and run it:

chmod +x bombardment_test.sh
./bombardment_test.sh

You'll see output like this:

Request 2: 0.522941s
Request 6: 0.555113s
Request 5: 0.559119s
Request 1: 0.561309s
...
Request 45: 1.261981s
Request 50: 1.261836s
...
Request 32: 1.467066s
Request 25: 1.469146s

Notice the pattern: the first few requests complete in ~0.5 seconds. The middle batch climbs to ~1.0 seconds. The final batch hits ~1.4 seconds. This is the multiplication effect. Each request has to wait for the previous ones to finish. PostgreSQL's connection pool is finite. When 50 requests arrive simultaneously, the 50th request waits in a queue while the first 49 execute.

The Baseline Results

| Metric                           | Value (No Cache)           |
| -------------------------------- | -------------------------- |
| Single request                   | 60-80ms                    |
| Query count                      | 61 queries                 |
| Query time (estimated)           | ~40ms                      |
| Under load (50 concurrent users) | 500ms - 1500ms             |
| Failure rate                     | 0% (slow, but functional)  |

This is the number we beat.


Part D: The Load Testing Tool — Installing Locust

The bash script gives us numbers. Locust gives us graphs. And graphs tell stories that tables can't.

Step 1: Install Locust

pip install locust
pip freeze > requirements.txt

Step 2: Create the Locust Test File

Create locustfile.py in your project root:

"""
locustfile.py

Locust test configuration for the housing portal API.
Simulates real users hitting both the naive and cached endpoints.

Run with:
    locust -f locustfile.py --host=http://localhost:8000

Then open http://localhost:8089 in your browser.
"""

from locust import HttpUser, task, between


class NaiveUser(HttpUser):
    """
    Simulates a user hitting the unoptimized endpoint.
    This is the baseline — no cache, no query optimization.
    """
    wait_time = between(1, 2)  # Wait 1-2 seconds between requests

    @task
    def get_properties(self):
        self.client.get("/api/properties/live/naive/", name="Naive (No Cache)")


class CachedUser(HttpUser):
    """
    Simulates a user hitting the cached endpoint.
    First request is a cache miss. Subsequent requests are cache hits.
    """
    wait_time = between(1, 2)

    @task
    def get_properties(self):
        self.client.get("/api/properties/cached/", name="Cached (Redis)")

Step 3: Run Locust

locust -f locustfile.py --host=http://localhost:8000

Open http://localhost:8089 in your browser. You'll see the Locust web UI.

Step 4: Configure the Test

In the Locust UI:

Number of users: 50
Spawn rate: 10 users per second
Host: http://localhost:8000 (already set via --host flag)

Click Start swarming.

Step 5: Understanding the Locust Interface

Locust shows you three tabs:

Statistics — A table showing median, average, min, max, and percentile response times. The columns that matter:

  • Median (50th percentile) — half of requests are faster than this, half are slower
  • 95th percentile — 95% of requests are faster than this, 5% are slower
  • 99th percentile — the slowest 1% of requests. This is the "worst case" your users experience.

Charts — Live graphs showing:

  • Total Requests per Second — throughput. Higher is better.
  • Response Times — latency over time. Lower is better. Watch for the line climbing as users increase.
  • Number of Users — shows the spawn rate as users ramp up.

Failures — Any HTTP errors (500, 404, timeouts). Should be empty for this test.

Step 6: What to Capture

Run the test twice — once for the naive endpoint, once for the cached endpoint. For each run, capture screenshots of:

The Statistics tab after the test stabilizes (after all 50 users have spawned and made at least 5-10 requests each). This gives you the median, 95th, and 99th percentile numbers.

The Charts tab showing the response time graph over the full duration of the test. You want to see the curve — flat for cached, climbing for naive.


Locust Screenshot: Naive Endpoint Statistics


Locust Screenshot: Naive Endpoint Response Time Chart


Step 7: Stop the Test

Click Stop in the Locust UI. The test stops immediately. The statistics remain on screen so you can review them.


Part E: The Cache — Introducing Redis

Now we flip the switch. Same data. Same serializer. One decorator. Everything changes.

How @cache_page Works

Django's cache_page decorator does one thing: it saves the entire HTTP response — headers, status code, JSON body, everything — as a single string in Redis, keyed by the request URL.

First request (cache miss):

  1. User requests /api/properties/cached/
  2. Django checks Redis: "Do you have a cached response for this URL?"
  3. Redis: "No."
  4. Django queries the database (61 queries), serializes the data to JSON, saves the response in Redis with a 60-second TTL, returns it to the user.

Second request (cache hit):

  1. User requests /api/properties/cached/
  2. Django checks Redis: "Do you have a cached response for this URL?"
  3. Redis: "Yes, here it is." (returns the pre-built string)
  4. Django skips the database entirely, skips serialization entirely, returns the cached string directly. Zero queries. Zero Python overhead beyond the Redis network call.

After 60 seconds:

The cache expires. The next request is a cache miss again. The cycle repeats.
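
If it helps to see the pattern in code, here is the same read-through logic written by hand against Django's low-level cache API. This is a conceptual sketch of what cache_page does, not its actual implementation; the real decorator derives its key from the URL and stores the full HttpResponse:

from django.core.cache import cache

def get_listing(build_response):
    """Read-through cache: try Redis first, fall back to the expensive path."""
    key = "properties:list:page1"            # illustrative key; cache_page builds its own
    payload = cache.get(key)                 # Redis GET
    if payload is None:                      # cache miss
        payload = build_response()           # 61 queries plus serialization
        cache.set(key, payload, timeout=60)  # Redis SETEX with a 60-second TTL
    return payload                           # cache hit: no database, no serialization

The decorator version is the same idea with the key derivation and response handling done for you.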

Test It Manually

Hit the cached endpoint once to prime the cache:

curl -o /dev/null -s -w "Total time: %{time_total}s\n" http://localhost:8000/api/properties/cached/

First request (cache miss):

Total time: 0.065s

Hit it again immediately:

curl -o /dev/null -s -w "Total time: %{time_total}s\n" http://localhost:8000/api/properties/cached/

Second request (cache hit):

Total time: 0.004s

That's 4 milliseconds. The database wasn't touched. Redis served a 50KB JSON string from RAM in 4ms.

Inspect the Cache Key

docker compose exec redis redis-cli keys "*"

You'll see something like:

1) ":1:views.decorators.cache.cache_page.GET./api/properties/cached/.d41d8cd98f00b204e9800998ecf8427e"

That's Django's auto-generated cache key. The components:

  • :1: — Django's cache key prefix and version (an empty KEY_PREFIX plus the default cache VERSION of 1; the Redis database number comes from the connection URL, not from the key)
  • views.decorators.cache.cache_page — the decorator that created this key
  • GET./api/properties/cached/ — the HTTP method and path
  • .d41d8cd98f00b204e9800998ecf8427e — a hash of the query parameters (empty in this case, but if the URL had ?page=2, the hash would be different)
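
If you prefer to stay in Python, the django-redis backend (the one configured in the troubleshooting section below) exposes these keys through the cache object. keys() and delete_pattern() are django-redis extensions rather than core Django cache API, so this sketch assumes that backend:

# docker compose exec backend python manage.py shell
from django.core.cache import cache

# List every key created by cache_page (glob-style pattern)
print(cache.keys("views.decorators.cache.cache_page*"))

# Delete them all, forcing the next request to be a cache miss
cache.delete_pattern("views.decorators.cache.cache_page*")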

Monitor Redis in Real Time

Open a second terminal and run:

docker compose exec redis redis-cli monitor

Leave this running. In your first terminal, hit the cached endpoint:

curl -o /dev/null -s http://localhost:8000/api/properties/cached/

In the monitor terminal, you'll see:

"GET" ":1:views.decorators.cache.cache_page.GET./api/properties/cached/..."

Hit it again. You'll see the same GET command. No SET — because the cache already has it.

Wait 60 seconds. Hit it again. You'll see:

"GET" ":1:views.decorators.cache.cache_page..."
"SETEX" ":1:views.decorators.cache.cache_page..." "60" "..."

The GET returned nothing (cache expired), so Django queried the database and wrote a new value with SETEX (set with expiry).

This is the cache in action. Every command is visible. This is your debugging tool when cache behavior gets weird.


Part F: The Comparison — Before vs After

Same test. Same endpoint pattern. Different results.

The Warm Cache Test

The bash bombardment script from earlier tests the "cold start" problem — what happens when the cache is empty and 50 users hit it simultaneously. Now we test the opposite: what happens when the cache is already warm?

Prime the cache:

curl -o /dev/null -s http://localhost:8000/api/properties/cached/

Now run the bombardment test against the cached endpoint:

#!/bin/bash
for i in {1..50}; do 
  curl -o /dev/null -s -w "Request $i: %{time_total}s\n" http://localhost:8000/api/properties/cached/ & 
done
wait

Expected output:

Request 1: 0.004s
Request 2: 0.003s
Request 3: 0.005s
Request 4: 0.004s
...
Request 48: 0.007s
Request 49: 0.006s
Request 50: 0.005s

All requests complete in under 10ms. No degradation. No queuing. Redis doesn't care how many requests hit it simultaneously — it's single-threaded and fast enough that even the 50th request feels instant.

The Locust Comparison: Mixed Users

Update locustfile.py to test both endpoints side by side:

from locust import HttpUser, task, between


class CachedUser(HttpUser):
    """
    Most of your traffic should be cached. We weight this 5x higher.
    """
    weight = 5  # 5x more likely to spawn than NaiveUser
    wait_time = between(1, 2)

    @task
    def get_cached(self):
        self.client.get("/api/properties/cached/", name="Cached (Redis)")


class NaiveUser(HttpUser):
    """
    A small amount of traffic hits the naive endpoint for comparison.
    """
    weight = 1
    wait_time = between(2, 5)

    @task
    def get_naive(self):
        self.client.get("/api/properties/live/naive/", name="Naive (No Cache)")

Run Locust again:

locust -f locustfile.py --host=http://localhost:8000

Prime the cache first by hitting the cached endpoint once manually before starting the test:

curl -o /dev/null -s http://localhost:8000/api/properties/cached/

Now start the test with 100 users total (the weighting means ~83 users will hit the cached endpoint and ~17 the naive one). Let Locust run for 2-3 minutes, then capture the results.


📊 [Locust Screenshot: Side-by-Side Statistics]

📊 [Locust Screenshot: Side-by-Side Response Time Chart]

The Locust Comparison: Cached Users Only

We also ran a test with only cached-endpoint users. It isn't a realistic traffic mix, but it gives an upper bound on what the cache can do. As expected, response times are extremely low.

📊 [Locust Screenshot: Only Cache Statistics]

📊 [Locust Screenshot: Cache Chart (Response Time)]


The Victory Table

| Metric                 | No Cache   | With Cache (Warm) | Improvement     |
| ---------------------- | ---------- | ----------------- | --------------- |
| Single request         | 60-80ms    | 4-7ms             | 93% faster      |
| Query count            | 61         | 0                 | 100% reduction  |
| Under load (50 users)  | 500-1500ms | 4-10ms            | 99% faster      |
| P50 latency (Locust)   | ~600ms     | ~5ms              | 120x faster     |
| P95 latency (Locust)   | ~1200ms    | ~8ms              | 150x faster     |
| P99 latency (Locust)   | ~1500ms    | ~12ms             | 125x faster     |
| Requests per second    | ~80        | ~1200             | 15x throughput  |

Your exact numbers will vary based on your machine, but the ratios should be similar. The cache is 100x+ faster under load.

Show Redis Memory Usage

docker compose exec redis redis-cli info memory

Look for the used_memory_human line:

used_memory_human:1.23M

The cache is now using space. Each cached response is roughly 50KB (20 properties × 2.5KB per serialized property). One cached endpoint × 50KB = negligible. A hundred different URLs (different pages, different filters) × 50KB each = 5MB. A thousand = 50MB. This is why Redis is measured in gigabytes in production — it's holding thousands of cached pages simultaneously.
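
That 50KB figure is an estimate. You can measure the real payload for your seeded data from the Django shell (a rough check; the cached entry is slightly larger because headers are stored alongside the body):

# docker compose exec backend python manage.py shell
from django.test import Client

resp = Client().get("/api/properties/live/naive/", HTTP_HOST="localhost")
print(f"{len(resp.content) / 1024:.1f} KB per page of 20 properties")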


Part G: The Hidden Cost — Stale Data and the Thundering Herd

We've achieved 2-7ms response times. The database is untouched. The system feels instant. But we've introduced two problems that most tutorials skip over. Let's confront them.

Problem 1: Cache Staleness

The cache has no awareness of the underlying data. It doesn't know when a property's price changes. It only knows how to count to 60.

Test it:

  1. Open the Django admin: http://localhost:8000/admin/
  2. Navigate to Housing → Properties
  3. Pick the property with ID 1 (or any property; just adjust the jq filter below to match). Change the price from $500,000 to $450,000. Save.
  4. Hit the naive endpoint:
curl -s http://localhost:8000/api/properties/live/naive/ | jq '.results[] | select(.id==1) | .price'

You'll see "450000.00" — the new price. The database was updated. The query reflects it immediately.

  5. Hit the cached endpoint:
curl -s http://localhost:8000/api/properties/cached/ | jq '.results[] | select(.id==1) | .price'

You'll see "500000.00" — the old price. The cache doesn't know the data changed. It will continue serving the stale price for up to 60 seconds.

Why this happens: The @cache_page decorator caches the entire response as a single blob. There's no mechanism to say "Property #1 changed, invalidate the cache." The only signal the cache understands is time.

When this is acceptable: If your data changes rarely and your users can tolerate a 60-second delay before seeing updates, this is fine. Many real estate portals operate this way — listings don't update by the second.

When this is unacceptable: If a price change needs to appear instantly (e.g., flash sales, stock prices, auction bids), you cannot use time-based caching alone. You need event-based invalidation. That's Part 4.
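
To make that concrete, here is the rough shape of event-based invalidation: a post_save signal that clears the cached listing whenever a Property is saved. This is a preview sketch only; it assumes the django-redis delete_pattern extension, and Part 4 builds the real version with more care:

# housing/signals.py (illustrative sketch; not wired into the project yet)
from django.core.cache import cache
from django.db.models.signals import post_save
from django.dispatch import receiver

from .models import Property


@receiver(post_save, sender=Property)
def invalidate_property_cache(sender, instance, **kwargs):
    # Drop every cached listing page so the next request rebuilds from fresh data
    cache.delete_pattern("views.decorators.cache.cache_page*")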

Problem 2: The Thundering Herd (Cache Stampede)

Run the bombardment script against the cached endpoint while the cache is empty (right after the 60-second TTL expires, for example) and something subtle shows up:

Request 2: 0.522941s
Request 6: 0.555113s
Request 1: 0.561309s
...
Request 45: 1.261981s

These are requests to the cached endpoint, yet the times aren't 4ms. They're 500ms+, essentially identical to the naive endpoint. What happened?

The answer: You hit an empty cache with 50 simultaneous requests. Here's the sequence of events:

  1. Request 1 arrives. Checks Redis. Cache is empty. Starts querying the database.
  2. Requests 2-50 arrive while Request 1 is still querying the database. They also check Redis. Cache is still empty (Request 1 hasn't finished saving yet). They all start querying the database.
  3. The database is now handling 50 concurrent queries. Same load as the naive endpoint.
  4. Eventually, all 50 requests finish. Request 1 saves its result to Redis. Requests 2-50 also try to save (depending on timing, some might see the cache by then, but most don't).

This is called a cache stampede or thundering herd. It happens when:

  • The cache expires (or is empty)
  • A burst of traffic hits the endpoint at the exact same moment
  • All requests see an empty cache and rush the database

The mitigation: Cache locking or probabilistic early recomputation. The idea: when the cache is about to expire (say, at 55 seconds out of 60), one request "locks" the cache key, refreshes it in the background, and extends the TTL. Other requests continue serving the slightly-stale cache while the refresh happens. This prevents the stampede.
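
For the curious, the core of the locking idea is small: cache.add() is atomic, so only one caller wins the lock and rebuilds, while everyone else waits briefly for the fresh value. A conceptual sketch only, under those assumptions; we don't wire this into the project in this part:

import time

from django.core.cache import cache

def get_or_rebuild(key, build, ttl=60, lock_ttl=10):
    """Stampede-resistant read: only one caller rebuilds a missing cache entry."""
    value = cache.get(key)
    if value is not None:
        return value                                   # warm cache: the normal fast path
    if cache.add(f"{key}:lock", 1, timeout=lock_ttl):  # atomic; the first caller wins
        try:
            value = build()                            # the expensive path (61 queries)
            cache.set(key, value, timeout=ttl)
            return value
        finally:
            cache.delete(f"{key}:lock")
    for _ in range(20):                                # lost the race: poll for the winner's result
        time.sleep(0.1)
        value = cache.get(key)
        if value is not None:
            return value
    return build()                                     # give up and pay the full cost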

Why we're not implementing it now: This is complexity layered on top of complexity. Part 3's goal is to show that caching works. Part 4's goal is to show that simple caching has limits. Cache locking is a Part 5 or Part 6 topic — after we've fixed the N+1 queries and introduced signal-based invalidation.

The takeaway: Caching doesn't eliminate load. It shifts it. A warm cache is near-perfect. A cold cache or an expired cache under high traffic behaves identically to no cache at all.


Part H: Tools Reference

Here are all the tools we used in this post, documented for future reference.

General Tools

curl --write-out — Terminal-based HTTP timing. Add -w "Total time: %{time_total}s\n" to any curl command to see the total response time. Add -o /dev/null -s to discard the body and silence progress output.

Locust — Load testing and visualization. Install with pip install locust. Run with locust -f locustfile.py --host=http://localhost:8000. Open http://localhost:8089 to see the web UI.

Browser DevTools (Network tab) — Visual inspection of requests. Right-click any request → Copy as cURL to reproduce it in the terminal.

Django-Specific

Django SQL logging — Prints every query to the console. Configure in settings.py with the LOGGING dict. Watch with docker compose logs -f backend | grep SELECT.

@cache_page decorator — Full-page caching. Syntax: @cache_page(timeout_in_seconds). Applied to views via @method_decorator for class-based views.

django-debug-toolbar — (Part 4) Shows query count, query time, cache hits/misses in a sidebar. We haven't installed it yet, but it's the next tool we add.

Database

EXPLAIN ANALYZE — PostgreSQL's query planner. Shows whether a query uses an index or does a sequential scan. Run inside psql:

EXPLAIN ANALYZE SELECT * FROM housing_property ORDER BY created_at DESC LIMIT 20;

pg_stat_user_tables — Live statistics on table activity. Check table sizes, scan counts, and row estimates:

docker compose exec db psql -U user -d housing_db -c "SELECT relname, n_live_tup FROM pg_stat_user_tables WHERE schemaname='public';"

Redis

redis-cli monitor — Real-time command stream. Every GET, SET, DEL appears as it happens. Critical for debugging cache behavior.

redis-cli keys * — List all keys in the current database. Shows what's cached right now.

redis-cli get — Retrieve the value of a specific key. Useful for inspecting cached data manually.

redis-cli del — Delete a key manually. Forces the next request to be a cache miss. Great for testing cache expiry behavior.

redis-cli info memory — Memory usage statistics. Check used_memory_human to see how much RAM the cache is consuming.


Part I: Troubleshooting

TypeError: CLIENT_CLASS got an unexpected keyword argument

The error:

TypeError: AbstractConnection.__init__() got an unexpected keyword argument 'CLIENT_CLASS'

The cause: You're using Django's built-in Redis backend (django.core.cache.backends.redis.RedisCache) with options designed for the third-party django-redis library. The two have different configuration signatures.

The fix: Use django-redis consistently. In settings.py:

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",  # ← Not the built-in backend
        "LOCATION": os.environ.get("REDIS_URL", "redis://redis:6379/1"),
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}

Make sure django-redis is in requirements.txt and installed in the container.

Cache Not Working (Always Slow)

The symptom: Every request to the cached endpoint is slow. redis-cli monitor shows no activity.

Possible causes:

  1. REDIS_URL is wrong or missing. Check docker-compose.yml:
  backend:
    environment:
      - REDIS_URL=redis://redis:6379/1
  2. The cache backend isn't configured. Check settings.py — the CACHES dict must exist and point to Redis.

  3. The decorator isn't applied. Check views.py — the @method_decorator(cache_page(60)) line must be on the dispatch method, not the class itself.

The test:

docker compose exec backend python manage.py shell
from django.core.cache import cache
cache.set('test', 'works')
print(cache.get('test'))  # Should print: works

If this fails, the cache backend isn't connected to Redis.

Stale Data Persists Beyond TTL

The symptom: You update a property in the admin. You wait 60 seconds. The cached endpoint still shows the old data.

The cause: The cache key doesn't match the URL exactly. Common culprits:

  • Trailing slash mismatch: /api/properties/cached vs /api/properties/cached/ are different keys
  • Query parameters: ?page=1 vs ?page=2 are different keys, but ? with no parameters is different from no ? at all
  • HTTP method: GET and POST to the same URL are different keys

The fix: Inspect the actual cache keys:

docker compose exec redis redis-cli keys "*properties*"

Match the exact URL pattern. If the key has .d41d8cd98f00b204e9800998ecf8427e at the end, that's the hash of the query string. If your URL has no query string, the hash should match that empty-string hash. If it doesn't, you're hitting a different URL than you think.

Cache Stampede on Expiry

The symptom: Every 60 seconds, response times spike to 500ms+ for a brief moment, then drop back to 5ms.

The cause: The cache expires. Multiple requests arrive during the expiration window. They all see an empty cache. They all query the database. Thundering herd.

The short-term fix: Increase the TTL. @cache_page(300) for 5 minutes instead of 60 seconds. This reduces the frequency of stampedes but doesn't eliminate them.

The long-term fix: Implement cache warming (a background task that refreshes the cache before it expires) or cache locking (only one process refreshes an expired cache while others wait or serve stale data). Both are beyond the scope of Part 3.


What We Built — And What Comes Next

Let's take stock. We went from a slow, query-heavy API to a system that responds in 4ms. We cut database queries to zero on every cache hit. We increased throughput by 15x. Those are real, measurable wins.

But we also introduced two new problems: stale data and cache stampedes. The cache is a tool, not a solution. It's extraordinarily effective when used correctly and catastrophic when misunderstood.

Here's the honest state of things:

The cache works. When warm, it's 100x faster than the database. Redis is purpose-built for this. It's battle-tested. It scales. For read-heavy workloads like a housing portal, it's the right tool.

The cache is fragile. It breaks the moment the cache expires under high load. It serves stale data for up to 60 seconds after an update. It's an all-or-nothing tool — you cache the entire response or you don't cache at all.

The database is still slow. The first request (the cache miss) still fires 61 queries. If your traffic spikes at the exact moment the cache expires, your database still chokes. The cache hides the problem. It doesn't fix it.

That's the setup for Part 4. We're going to fix the problem at the source — the N+1 queries. We'll use select_related and prefetch_related to turn 61 queries into 1 query. We'll introduce signal-based cache invalidation so that when a property's price changes, the cache updates immediately instead of waiting for a TTL. And we'll move from full-page caching to granular object caching so we can invalidate individual properties without throwing away the entire listing.

The cache gave us speed. Part 4 gives us correctness.


Checkpoint: Push to GitHub

git add .
git commit -m "feat: add API endpoints, caching layer, and Locust load tests"
git checkout -b part-3-problem
git push origin part-3-problem

The repo now has three branches:

  • part-1-setup — infrastructure only
  • part-2-data — database schema and seed data
  • part-3-problem — API layer and Redis cache

You can diff between any two branches to see exactly what changed:

git diff part-2-data..part-3-problem

Next: Part 4 — Smart Invalidation and Query Optimization. Stay tuned.
