The Problem
Tripvento's city-matrix endpoint returns every hotel in a city scored against 14 traveler personas. At 33 cities it felt fast. At 212 cities, with 24,000+ hotels, cold responses were creeping toward 700ms and the payload for a single city had quietly reached 1.8MB.
Nothing was broken. No alerts were firing. But I knew what that feeling meant. I was about to hit a wall.
Before touching a single line of code, I opened a Django shell and started measuring.
Profile First. Fix Second.
The biggest mistake I see in performance debugging is jumping straight to solutions. Add an index. Throw in a cache. Upgrade the droplet. All of these can mask the real problem while adding complexity.
I isolated each layer independently:
import json
import time

from django.db.models import Prefetch

from hotels.models import StagingHotel, StagingHotelIntent
from hotels.serializers import HotelIntentMatrixSerializer

# Layer 1: raw query
start = time.time()
qs = list(StagingHotel.objects.filter(
    destination__name__iexact='Savannah',
    permanently_closed=False
).select_related(
    'destination', 'destination__region', 'neighborhood_obj'
).prefetch_related(
    'metrics',
    Prefetch('intents',
             queryset=StagingHotelIntent.objects.filter(is_eligible=True),
             to_attr='prefetched_intents')
)[:15])
print(f'Query: {(time.time()-start)*1000:.0f}ms')

# Layer 2: serializer
start = time.time()
data = HotelIntentMatrixSerializer(qs, many=True, context={'demo_mode': True}).data
print(f'Serialize: {(time.time()-start)*1000:.0f}ms')

# Layer 3: JSON encoding
start = time.time()
json_str = json.dumps(data)
print(f'JSON dump: {(time.time()-start)*1000:.0f}ms, {len(json_str)//1024}KB')
Results:

Query: 176ms
Serialize: 19ms
JSON dump: 8ms
That's 203ms in Python. But cold curl requests were hitting 679ms. The missing ~470ms was Django middleware, DRF request parsing, and network. Knowing that told me where not to look.
The Python layer was the fixable part. I had 203ms to work with.
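Repeating the start/stop boilerplate for each layer gets tedious fast. A small context manager makes the same measurement reusable (a generic sketch, not Tripvento code):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print elapsed wall time for the wrapped block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f'{label}: {elapsed_ms:.0f}ms')

# Wrap each layer instead of juggling start variables
with timed('Query'):
    rows = list(range(1000))  # stand-in for the ORM call
```

`perf_counter()` is a better fit than `time.time()` for interval timing since it's monotonic and higher resolution.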
Fix 1: The 95ms Line
The single biggest win came from one method call I hadn't thought twice about.
In CityMatrixViewSet.list(), I was calling queryset.count() before slicing — to populate a total_available field in the response. Reasonable sounding. Completely wasteful.
# This ran on every single request
total_count = queryset.count()  # full COUNT(*) over the filtered set - 95ms
For demo tier users capped at 15 results, I didn't need the exact count. I just needed to know if there were more.
# Fetch 16, check if a 16th exists
limited_qs = list(queryset[:result_limit + 1])
has_more = len(limited_qs) > result_limit
queryset = limited_qs[:result_limit]
# total_count gone entirely
95ms eliminated. One change. No tradeoff.
The broader lesson: count() in Django triggers a SELECT COUNT(*) where Postgres must traverse the index and verify row visibility for every matching row due to MVCC. If you're calling it on a large filtered queryset just to display a number, ask whether that number is actually used.
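The limit-plus-one trick generalizes to any list endpoint. A minimal sketch of the pattern as a reusable helper (the helper name is mine, not Tripvento's):

```python
def fetch_with_has_more(queryset, limit):
    """Fetch limit+1 rows and report whether more exist, without COUNT(*).

    Works on any sliceable sequence, Django querysets included; slicing a
    queryset translates to LIMIT in SQL, so only limit+1 rows are fetched.
    """
    rows = list(queryset[:limit + 1])
    has_more = len(rows) > limit
    return rows[:limit], has_more

# Behaves identically on a plain list and on a queryset
page, more = fetch_with_has_more(range(100), 15)
```

The caller gets exactly what the UI needs (a page and a "show more" flag) for the cost of one extra row.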
Fix 2: The Prefetch Cache That Wasn't
Django's prefetch_related is supposed to batch related object lookups. You set it up in get_queryset(), and when the serializer accesses related objects, it reads from the in-memory cache instead of hitting the database again.
Except I wasn't actually using it.
In my serializer:
def _get_latest_metric(self, obj):
    if not hasattr(obj, '_latest_metric_cache'):
        # DB query per hotel
        obj._latest_metric_cache = obj.metrics.first()
    return obj._latest_metric_cache
.first() on a related manager bypasses the prefetch cache entirely. Django evaluates it as a fresh queryset. With 15 hotels per response, that's 15 extra queries silently, on every request.
The fix is reading from the prefetch cache directly:
def _get_latest_metric(self, obj):
    if not hasattr(obj, '_latest_metric_cache'):
        prefetched = getattr(obj, '_prefetched_objects_cache', {}).get('metrics')
        if prefetched is not None:
            obj._latest_metric_cache = prefetched[0] if prefetched else None
        else:
            obj._latest_metric_cache = obj.metrics.first()
    return obj._latest_metric_cache
Same result. Zero extra queries.
Note: If you used to_attr='prefetched_metrics' in your Prefetch object, you can skip the internal cache dict entirely and use getattr(obj, 'prefetched_metrics', None) instead.
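To see the to_attr variant's lookup logic in isolation, here it is against a stand-in object rather than a real Django model (attribute and field names are illustrative):

```python
def latest_metric(obj):
    """Read the newest metric from a to_attr prefetch list, if present.

    With to_attr, Django attaches the prefetched rows as a plain list
    directly on the instance, so this is a pure attribute read.
    """
    prefetched = getattr(obj, 'prefetched_metrics', None)
    if prefetched is not None:
        return prefetched[0] if prefetched else None
    return None  # stand-in for the obj.metrics.first() fallback

class FakeHotel:
    # What Prefetch('metrics', ..., to_attr='prefetched_metrics') would attach
    prefetched_metrics = [{'occupancy': 0.82}, {'occupancy': 0.75}]

print(latest_metric(FakeHotel()))
```

Because the prefetched data is a plain list, there is no related manager involved and therefore no way to accidentally trigger a fresh query.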
Fix 3: Index Hygiene
I had five indexes on StagingHotel. One of them was doing nothing.
indexes = [
    models.Index(fields=['destination']),  # redundant
    models.Index(fields=['destination', 'permanently_closed']),  # covers the above
    models.Index(fields=['latitude', 'longitude']),
    models.Index(fields=['provider_id']),
    models.Index(fields=['is_ready_for_production']),
]
A composite index on (destination, permanently_closed) already handles any query filtering on destination alone — PostgreSQL uses the leftmost columns of a composite index. The standalone destination index was dead weight: taking up space, slowing down writes, and contributing nothing to reads.
Dropped it.
The broader habit: audit your indexes the same way you audit your code. Redundant indexes aren't free — they cost write performance and memory.
Fix 4: PostgreSQL Wasn't Tuned At All
My docker-compose.yml had no PostgreSQL configuration. Out of the box, Postgres ships with conservative defaults designed for shared hosting environments circa 2005.
command: >
  postgres
  -c shared_buffers=256MB
  -c effective_cache_size=768MB
  -c work_mem=16MB
  -c maintenance_work_mem=128MB
  -c random_page_cost=1.1
shared_buffers tells Postgres how much RAM to use for caching data pages. The default is 128MB — laughably low for a modern server. random_page_cost=1.1 tells the query planner that random disk reads are almost as cheap as sequential ones, which is true on SSDs and pushes it toward index scans over sequential scans.
None of this requires code changes. It's configuration. It costs nothing and the gains are immediate.
Fix 5: Stop Sending Data Nobody Asked For
After fixing the query layer, I looked at what I was actually sending over the wire.
City-matrix at 111 hotels: 1.8MB. That's ~16KB per hotel, including story fields, full amenity lists, nearby POI breakdowns, images, and algorithm metadata duplicated on every single hotel object.
The algorithm metadata was the easiest fix. I was serializing the same engine config — version, fusion strategy, radius — on every hotel in the response. 111 copies of the same object.
But the structural fix was bigger: I added a ?thin=true mode that strips hotel detail fields entirely and returns only what's needed to render a ranked list: id, name, location, detail_url, and the scores.
# Fat — full hotel details
GET /rankings/?destination=miami_fl&intent=romantic
# ~466KB
# Thin — scores + identifiers only
GET /rankings/?destination=miami_fl&intent=romantic&thin=true
# ~88KB
466KB → 88KB. 81% reduction.
For city-matrix, which powers comparison UIs and pre-caching, thin is now the default. If a user selects a specific hotel, the frontend fetches the full detail via detail_url. You pay for the data when you need it, not on every list render.
The previous PostGIS article covers how I eliminated the 55M-row HotelPOI table from the database entirely. That's the same principle applied to storage: don't persist what you don't query.
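Conceptually, thin mode is just field projection. A hedged sketch of the idea (field names illustrative, not the actual serializer):

```python
# Fields a ranked-list UI actually renders (illustrative set)
THIN_FIELDS = {'id', 'name', 'location', 'detail_url', 'scores'}

def thin(hotel: dict) -> dict:
    """Project a full hotel payload down to list-rendering fields."""
    return {k: v for k, v in hotel.items() if k in THIN_FIELDS}

full = {
    'id': 42,
    'name': 'Hotel Example',
    'location': {'lat': 25.76, 'lng': -80.19},
    'detail_url': '/hotels/42/',
    'scores': {'romantic': 0.91},
    'story': 'A very long narrative field...',  # stripped in thin mode
    'amenities': ['pool', 'spa', 'gym'],        # stripped in thin mode
}
assert 'story' not in thin(full)
```

In DRF the same projection usually lives in a second, smaller serializer selected when the query parameter is present, so the heavy fields are never serialized at all rather than serialized and discarded.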
Fix 6: The Cache Was a Time Bomb
While auditing the query layer, I checked Redis and found this:
Maxmemory Policy: noeviction
Maxmemory: 25.00 MB
Used: 7.21 MB
RSS: 23.67 MB
noeviction is Redis's default policy. It means when memory fills up, Redis stops accepting writes and returns errors. Not "evict old keys." Not "evict least recently used." Just errors.
My RSS was already at 23.67MB on a 25MB plan. I was one traffic spike away from Redis silently breaking and every cache miss becoming a cold Postgres hit.
One command fixed it:
heroku redis:maxmemory -a your-app --policy allkeys-lru
# We've since migrated to DigitalOcean Valkey; the policy is set the same way via redis-cli or your provider's dashboard.
allkeys-lru evicts the least recently used keys when memory fills up. For a dedicated cache this is the right policy, because you'd rather lose stale data than have writes fail: the cache degrades gracefully under pressure instead of exploding. If you're also using Redis for Celery queues or persistent data, use volatile-lru instead.
This isn't a performance optimization. It's a correctness fix. The "optimization" is that your cache actually keeps working when traffic spikes instead of silently becoming a liability.
If you're running Redis as a cache and haven't checked your eviction policy, go check it right now.
Fix 7: The Easiest Win of All
After all the query and payload work, I realized I'd never enabled gzip compression.
Django ships with GZipMiddleware built in. It's not enabled by default. One line:
MIDDLEWARE = [
    'corsheaders.middleware.CorsMiddleware',
    'django.middleware.gzip.GZipMiddleware',
    'django.middleware.security.SecurityMiddleware',
    # ...
]
The result on city-matrix:
Before: 1.81MB
After: 261KB
7x reduction. One line of middleware.
Rankings followed the same pattern; compressed responses dropped to a fraction of their original size. Any client sending Accept-Encoding: gzip (every modern HTTP client does) gets the compressed version automatically. Django handles the negotiation.
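The ratio is easy to sanity-check offline with Python's gzip module on a repetitive JSON payload (illustrative data, not Tripvento's):

```python
import gzip
import json

# Repetitive structure (repeated keys, duplicated metadata) is exactly
# what gzip compresses best -- the same reason API JSON shrinks so much.
payload = json.dumps([
    {'id': i, 'engine': {'version': '2.1', 'fusion': 'weighted'}}
    for i in range(1000)
]).encode()

compressed = gzip.compress(payload)
print(f'{len(payload)} -> {len(compressed)} bytes '
      f'({len(payload) / len(compressed):.0f}x smaller)')
```

Real payloads with freeform text compress less than this synthetic one, but JSON's repeated keys alone usually buy a large multiple.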
I'd spent weeks optimizing queries and serializers. The biggest single payload reduction came from a middleware I'd simply forgotten to turn on.
Check your middleware stack right now. If GZipMiddleware isn't there, add it before you do anything else in this list.
The Results

| Metric | Before | After |
|---|---|---|
| Cold E2E response | ~679ms | ~188ms |
| Warm/cached response | ~500ms | 12ms |
| count() overhead | 95ms per request | 0ms |
| City-matrix payload (uncompressed) | 1.81MB | 261KB (gzip) |
| City-matrix payload (thin + gzip) | 1.81MB | ~88KB |
| Overall payload reduction | — | ~95% |
| Redis under pressure | Errors | Graceful eviction |
Same algorithm. Same infrastructure. Same DigitalOcean droplet.
The Pattern
Every fix here followed the same logic as the PostGIS migration: stop paying for things you don't need.
- count() — paying for a full table scan to return a number nobody used
- .first() bypassing prefetch — paying for N database queries when you already had the data in memory
- Redundant index — paying write overhead for an index that helped nothing
- 1.8MB payload — paying for CDN egress and client parsing on data that never got rendered
- noeviction Redis — paying with production errors when you could have paid with stale cache eviction
- Missing gzip — paying to transfer 7x more bytes than necessary, on every single request
The instinct when something is slow is to add: more cache, more indexes, more infrastructure. Sometimes the right move is to measure first and then remove the thing that shouldn't be there.
This is part 3 of the Building Tripvento series. Part 1 covered deleting 55M rows to scale the database. Part 2 covered the multi-LLM self-healing data pipeline. Next up: how I built a content factory that generates destination guides at scale.
I'm Ioan Istrate, founder of Tripvento — a hotel ranking API that scores properties against 14 traveler personas using geospatial intelligence and semantic AI. Previously worked on ranking systems at U.S. News & World Report. If you want to nerd out about Django performance or API design, let's connect on LinkedIn.