canonical_url: https://revcardata.com/how-i-built-a-sub-10ms-car-database-api-for-86835-vehicles-using-fastapi-and-supabase/
Every developer who has ever attempted to build an application in the automotive space—whether it is an EV routing calculator, a B2B insurance comparison matrix, or an e-commerce platform for replacement parts—inevitably hits a massive data infrastructure wall. The automotive data ecosystem is currently heavily gatekept by legacy monopolies. They trap you in multi-month negotiation cycles only to deliver archaic, bloated XML feeds or poorly documented flat files.
Tired of maintaining fragile, ad-hoc web scrapers that break on every frontend update of public vehicle registries, I decided to engineer a production-ready, highly predictable, and lightning-fast automotive data engine called RevCarData.
This technical breakdown covers the precise architectural decisions, database index optimizations, and custom security middleware required to serve exactly 86,835 global vehicles with a sub-10ms latency profile.
1. The Tech Stack: Why FastAPI + Supabase?
When building a high-throughput data API, the primary engineering constraints are connection pooling, rapid JSON serialization, and minimal I/O overhead. The chosen tech stack directly reflects these needs:
-
FastAPI (Python 3.11+): Chosen over Django or Flask primarily due to its native asynchronous execution support (
async/await), automatic OpenAPI/Swagger generation, and Pydantic v2 data serialization layer, which compiles to highly efficient Rust under the hood. - Supabase (PostgreSQL): Instead of relying on a standard heavy ORM layer which adds execution overhead, Supabase allows us to interface directly with a robust PostgreSQL engine, utilize advanced structural indexing, and execute raw database-level remote procedure calls (RPC).
💡 Architectural Insight: By bypassing traditional synchronous ORMs and writing asynchronous endpoints that leverage native PostgreSQL functions, we eliminated python-level loops for data formatting, dropping our computational overhead significantly.
2. Breaking the Speed Barrier: Achieving Sub-10ms Latency
Serving detailed vehicle specifications (including highly granular electric vehicle metrics, battery capacities, torque, and dimensions) across over 86,000 records can quickly bottleneck database memory. To keep queries blindingly fast, we implemented two critical optimizations:
A. PostgreSQL Trigram and Composite Indexing
Users searching for cars rarely type exact strings. They type partial keywords like "Audi Q3 Competition". Standard B-Tree indexes fail on partial string matching (ILIKE %keyword%). We deployed GIN (Generalized Inverted Index) combined with the pg_trgm extension directly on the search columns (Make, Model, trimLevel).
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_vehicles_search_trgm ON vehicles USING gin ((Make || ' ' || Model || ' ' || trimLevel) gin_trgm_ops);
B. Database-Level RPC (Remote Procedure Calls) for UI Dropdowns
Populating dynamic UI frontend selectors (e.g., getting all unique makes available for a specific year) by querying a raw table of 86,835 rows with a SELECT DISTINCT statement is completely inefficient. Instead, we wrote an optimization function directly in PostgreSQL and call it via an RPC route:
CREATE OR REPLACE FUNCTION get_unique_makes(p_year INT DEFAULT NULL)
RETURNS TABLE(make TEXT) AS $$
BEGIN
RETURN QUERY
SELECT DISTINCT vehicles."Make"::TEXT
FROM vehicles
WHERE (p_year IS NULL OR vehicles."Year" = p_year)
ORDER BY vehicles."Make"::TEXT;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
3. The Architecture Challenge: Designing the Hybrid Security & Billing Gateway
As a commercial B2B platform, the system must monetize effectively. However, developers have different consumption preferences. Some prefer integration via centralized global marketplaces like RapidAPI, while others require direct API keys via our corporate portal (Stripe checkout) or flat-file database dumps (raw CSV/JSON datasets) to train machine learning models.
To solve this without maintaining two separate server clusters, I engineered a custom hybrid security middleware within FastAPI. The gateway validates incoming traffic dynamically using two distinct validation tracks:
| Inbound Vector | Authentication Header | Validation Layer | Rate-Limiting Rule |
|---|---|---|---|
| RapidAPI Client | x-rapidapi-proxy-secret |
Upstream Marketplace Proxy Check | Managed via RapidAPI Tiers |
| Direct Portal Client | X-API-Key |
Supabase api_customers Query |
Hard Quota Limit + Dynamic Decrement |
Here is the production-grade Python implementation of our hybrid gateway middleware. It seamlessly processes marketplace proxy requests and direct token validations in a single asynchronous pass:
from fastapi import Security, HTTPException, Request
from fastapi.security import APIKeyHeader
from typing import Optional
# auto_error=False prevents FastAPI from blocking requests missing the direct header
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)
RAPIDAPI_SECRET = "8b0503f0-538a-11f1-b8ee-6b6a98cea25b"
async def get_api_key(request: Request, api_key: Optional[str] = Security(api_key_header)):
# TRACK 1: RAPIDAPI UPSTREAM GATEWAY CHECK
provided_secret = request.headers.get("x-rapidapi-proxy-secret")
if provided_secret == RAPIDAPI_SECRET:
return "rapidapi_gateway_user"
# TRACK 2: DIRECT PORTAL CHECK (revcardata.com / Stripe)
if not api_key:
raise HTTPException(status_code=401, detail="Missing X-API-Key or valid Proxy Secret")
if api_key == "rev_live_pro999": # Master testing key
return api_key
try:
# Dynamic token and database validation lookup via Supabase Client
res = supabase.table("api_customers").select("*").eq("api_key", api_key).execute()
if not res.data:
raise HTTPException(status_code=401, detail="Invalid API Key registration.")
customer = res.data[0]
usage = int(customer.get('current_usage', 0))
limit = int(customer.get('monthly_limit', 50000))
if usage >= limit:
raise HTTPException(status_code=429, detail="Monthly request quota exhausted.")
# Atomically increment resource utilization counters
supabase.table("api_customers").update({"current_usage": usage + 1}).eq("api_key", api_key).execute()
return api_key
except Exception as e:
if isinstance(e, HTTPException): raise e
raise HTTPException(status_code=500, detail="Internal Authentication Infrastructure Offline")
4. Standardizing the Schema: The YMME Model
Automotive manufacturers label data erratically across global regions. To make the JSON response perfectly predictable for front-end developers, we normalized all data around the industry standard YMME (Year, Make, Model, Engine) matrix.
The responses are split into logical, modular endpoints depending on data intensity. A primary search endpoint (/api/v1/vehicles) delivers lightweight identification arrays using smart structural pagination. When deep technical specifications are required, clients hit the premium specs endpoint, which responds with clean, nested JSON structures:
{
"id": 1042,
"pricing": { "base_msrp_usd": 89400.00, "base_msrp_eur": 82100.00 },
"dimensions": {
"length_mm": 4963,
"width_mm": 1966,
"height_mm": 1379,
"curb_weight_kg": 2295
},
"ev_specs": {
"battery_capacity_kwh": 93.4,
"real_world_range_km": 452
},
"acceleration_0_100": 3.9
}
5. Key Engineering Takeaways
Building a high-availability B2B data product requires a strict separation of concerns. By handling security routing natively at the framework middleware layer and transferring complex query distinctions down to indexed PostgreSQL operations, we created an engine capable of handling enterprise loads with close to zero infrastructure costs.
The entire live playground is open for testing. You can explore our interactive OpenAPI schema, query live vehicle endpoints, or grab sample raw database structures directly at revcardata.com.
Top comments (0)