My property comparison app was working great in development. Then a user in Norwich searched for nearby sales and my API took 168 seconds to respond. nginx gave up at 60 seconds and returned a 504.
Here's how I fixed it -- and what I learned about SQLite performance at scale.
The Setup
I built a property comparables API that serves recent sales data from government open datasets. The UK version uses HM Land Registry Price Paid Data -- every residential property transaction in England and Wales since 1995.
That's 31 million rows in a single SQLite database. 8.7GB on disk.
The API is simple: give it a postcode, it finds nearby sales within a radius, filtered by property type and date range. The stack is Python + FastAPI + SQLite, running on a single $20/month VPS.
For most postcodes, it was fine. London postcodes, with a few hundred matching sales, came back in under a second. But the NR1 district in Norwich has 1,941 individual postcodes within its search radius -- and that's where everything broke.
The Query
SELECT *
FROM transactions
WHERE postcode IN (?, ?, ?, ... ) -- up to 1,941 postcodes
AND property_type = ?
AND date >= ?
ORDER BY date DESC
LIMIT 100
Nothing exotic. But SQLite was doing a full table scan on 31M rows for each query.
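You can see the scan directly with EXPLAIN QUERY PLAN. A minimal sketch on a toy version of the table (schema assumed from the query above, not the app's real DDL):

```python
import sqlite3

# Toy reproduction of the unindexed table -- column names assumed from the query.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        postcode TEXT, property_type TEXT, date TEXT, price INTEGER
    )
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM transactions
    WHERE postcode IN ('NR1 1AA', 'NR1 1AB')
      AND property_type = 'D'
      AND date >= '2020-01-01'
    ORDER BY date DESC
    LIMIT 100
""").fetchall()

for row in plan:
    print(row[3])  # with no index, the plan reports a SCAN of the whole table
```

On 31M rows, that SCAN line is the whole problem.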
The Wrong Fix I Tried First
My instinct was to add an index on postcode:
CREATE INDEX idx_postcode ON transactions(postcode);
This helped -- queries dropped to about 40 seconds. Still unusable. The problem is that after finding rows by postcode, SQLite still had to scan all matching rows to filter by property_type and date.
The Actual Fix: Compound Indexes
The key insight is that SQLite uses at most one index per table per query. If your WHERE clause filters on three columns, a single-column index narrows only that one column; every row it matches still has to be scanned to apply the other two filters.
The fix is a compound index that matches your query pattern:
CREATE INDEX idx_comp ON transactions(postcode, property_type, date DESC);
The column order matters: equality filters (postcode, property_type) go first, then the range column (date). (The order of terms in your WHERE clause is irrelevant -- what matters is the index's column order.) SQLite walks the B-tree left to right: first it narrows by postcode, then by property type, then by date range. All three filters are served by the same index lookup.
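The same EXPLAIN QUERY PLAN check confirms the compound index is picked up (again a sketch on a toy schema; the real table has 31M rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        postcode TEXT, property_type TEXT, date TEXT, price INTEGER
    )
""")
conn.execute(
    "CREATE INDEX idx_comp ON transactions(postcode, property_type, date DESC)"
)

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM transactions
    WHERE postcode IN ('NR1 1AA', 'NR1 1AB')
      AND property_type = 'D'
      AND date >= '2020-01-01'
    ORDER BY date DESC
    LIMIT 100
""").fetchall()

for row in plan:
    print(row[3])  # now a SEARCH ... USING INDEX idx_comp, not a SCAN
```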
After creating the index:
ANALYZE; -- update the query planner's statistics
Result: NR1 went from 168 seconds to 4.8 seconds. Every other postcode dropped to under 1 second.
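For completeness, here's a sketch of how a 1,941-element IN list can be bound safely from Python (nearby_sales is a hypothetical helper, not the app's real code). One limit to know about: SQLite caps bound variables at 999 before version 3.32 and 32,766 after, so ~2,000 postcodes fit in a single statement on any recent build.

```python
import sqlite3

def nearby_sales(conn, postcodes, property_type, since, limit=100):
    # One "?" placeholder per postcode, so values are bound, never interpolated.
    marks = ",".join("?" * len(postcodes))
    sql = f"""
        SELECT * FROM transactions
        WHERE postcode IN ({marks})
          AND property_type = ?
          AND date >= ?
        ORDER BY date DESC
        LIMIT ?
    """
    return conn.execute(sql, [*postcodes, property_type, since, limit]).fetchall()
```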
The Same Fix, Different Country
I had the same problem with the France version -- 8.3 million transactions from the DVF open dataset. Same fix:
CREATE INDEX idx_comp ON transactions(code_postal, type_local, date_mutation DESC);
ANALYZE;
All major French cities went to under 100ms.
What Else I Learned
1. ANALYZE is not optional. Without it, SQLite's query planner doesn't know about the index distribution and may choose a suboptimal plan. I spent an hour wondering why my new index wasn't helping before realizing I'd skipped this.
2. nginx timeout matters. The default proxy_read_timeout is 60 seconds. Dense queries that take 5-10 seconds are fine, but if performance ever regresses, nginx kills the connection at 60 seconds and the user gets an opaque 504 instead of a useful error from the API. I bumped it to 300 seconds as a safety net:
location /v1/ {
proxy_read_timeout 300s;
proxy_pass http://127.0.0.1:8060;
}
3. SQLite handles 31M rows better than you'd think. There's a bias in the industry that SQLite is "for small stuff." With proper indexing, it serves production traffic on a single VPS without connection pooling, replication, or any of the Postgres/MySQL infrastructure. My entire hosting cost is $20/month for 11 country databases.
4. Test with your densest data. I had been testing with London postcodes (small radius, few matches). The bug only appeared in Norwich (huge radius, 1,941 postcodes). Now I always test with the worst case after any database or query change.
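That last lesson is easy to automate. A sketch of a worst-case regression guard (the postcode list and the 10-second budget are placeholders, not the app's real values):

```python
import sqlite3
import time

def assert_query_budget(conn, postcodes, budget_s=10.0):
    """Fail loudly if the dense worst-case query exceeds its time budget."""
    marks = ",".join("?" * len(postcodes))
    start = time.perf_counter()
    conn.execute(
        f"SELECT * FROM transactions WHERE postcode IN ({marks}) "
        "AND property_type = ? AND date >= ? ORDER BY date DESC LIMIT 100",
        [*postcodes, "D", "2020-01-01"],
    ).fetchall()
    elapsed = time.perf_counter() - start
    assert elapsed < budget_s, f"worst case took {elapsed:.1f}s (budget {budget_s}s)"
```

Run it against a copy of the production database with the full NR1 list after every schema or query change.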
The Numbers
| Metric | Before | After |
|---|---|---|
| NR1 (worst case) | 168s | 4.8s |
| SW1A (London) | 12s | 0.3s |
| NR1 via HTTPS | 504 timeout | 5.1s |
| Index size overhead | -- | +1.2GB |
| Total DB size | 8.7GB | 9.9GB |
The 1.2GB index overhead is a good trade for a ~35x query speedup on the worst case.
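You can measure that overhead yourself: SQLite reports its size as page_count x page_size. A sketch on toy data (not the real 8.7GB file):

```python
import sqlite3

def db_size_bytes(conn):
    # Total database size = number of pages * bytes per page.
    (pages,) = conn.execute("PRAGMA page_count").fetchone()
    (page_size,) = conn.execute("PRAGMA page_size").fetchone()
    return pages * page_size

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (postcode TEXT, property_type TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(f"NR1 {i}", "D", "2020-01-01") for i in range(5000)],
)
before = db_size_bytes(conn)
conn.execute(
    "CREATE INDEX idx_comp ON transactions(postcode, property_type, date DESC)"
)
after = db_size_bytes(conn)
print(f"index overhead: {after - before} bytes")
```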
Live Demo
The API serves comparable sales across 11 countries, all using the same SQLite + compound index pattern. If you want to see the result: property.nwc-advisory.com for UK data.
The API is also available on RapidAPI (https://rapidapi.com) if you want to integrate property comps into your own tools, and there's an open-source MCP server if you use Claude or other AI agents.
Have you hit similar SQLite performance walls? I'm curious what scale other people are running SQLite at in production.