What killed my vector database on a free-tier container

#python #fastapi #websockets #ai

I built a real-time insights dashboard that ingests live public weather
data, runs every reading through an anomaly detector and a similarity
search, and streams the results to the browser over WebSockets as they
happen. It's a small demonstration of a real-time data platform
architecture (ingestion, streaming, AI-driven pattern detection, vector
search) built on a free, no-auth public API instead of mocked data.

Live demo: https://realtime-weather-insights-production.up.railway.app
Code: https://github.com/leovasone/realtime-weather-insights
Case study: https://vasone.com.br/realtime-insights.html

Architecture

Open-Meteo API  →  poll loop (60s)  →  anomaly detector (z-score)
                                     →  vector store (in-memory)  →  similarity search
                                     →  narrator (Claude, optional)
                                     ↓
                              WebSocket broadcast  →  browser dashboard
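The fan-out stage of the diagram can be sketched with plain asyncio. This is a minimal illustration, not the project's code: the `Broadcaster` and `poll_loop` names are hypothetical, an `asyncio.Queue` per client stands in for a FastAPI WebSocket connection, and the fetch and processing steps are injected as callables so the sketch runs without network access.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


class Broadcaster:
    """Hypothetical fan-out hub: one asyncio.Queue per connected client."""

    def __init__(self) -> None:
        self._clients: set[asyncio.Queue] = set()

    def connect(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._clients.add(queue)
        return queue

    def disconnect(self, queue: asyncio.Queue) -> None:
        self._clients.discard(queue)

    async def broadcast(self, message: dict) -> None:
        # Push the same result to every client; clients never poll.
        for queue in list(self._clients):
            await queue.put(message)


async def poll_loop(
    fetch: Callable[[], Awaitable[list[dict]]],
    process: Callable[[dict], dict],
    hub: Broadcaster,
    interval_s: float = 60.0,
    cycles: int | None = None,
) -> None:
    """Fetch, process and broadcast every `interval_s` seconds (forever if cycles is None)."""
    done = 0
    while cycles is None or done < cycles:
        readings = await fetch()
        # `process` stands in for the anomaly, similarity and narrator stages.
        results = [process(reading) for reading in readings]
        await hub.broadcast({"type": "cycle", "results": results})
        done += 1
        if cycles is None or done < cycles:
            await asyncio.sleep(interval_s)
```

In a FastAPI app, each WebSocket handler would drain its own queue and call `send_json()`; keeping the poll loop unaware of individual sockets is what lets a single poll feed every client.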

An async loop polls a handful of cities every 60 seconds. Each reading
passes through a rolling z-score anomaly detector (per city, per metric —
temperature, humidity, wind, pressure, cloud cover) and a similarity search
that finds the closest historical match in a different city. The combined
result is broadcast to every connected client, so the browser never has to
poll.
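A rolling z-score detector like the one described above can be sketched in a few lines. The window size, threshold and minimum history below are assumptions chosen for illustration, and `RollingZScore` is a hypothetical name; the point is the shape: one bounded history per (city, metric) pair, with each new reading scored against that history before it is added.

```python
from __future__ import annotations

import math
from collections import defaultdict, deque


class RollingZScore:
    """Score each reading against a rolling window kept separately per (city, metric)."""

    def __init__(self, window: int = 30, threshold: float = 3.0, min_history: int = 5) -> None:
        # window, threshold and min_history are illustrative assumptions.
        self.threshold = threshold
        self.min_history = min_history
        self._history: dict[tuple[str, str], deque[float]] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def check(self, city: str, metric: str, value: float) -> float | None:
        """Return the z-score if `value` is anomalous, else None; then record the value."""
        history = self._history[(city, metric)]
        z_score = None
        if len(history) >= self.min_history:
            mean = sum(history) / len(history)
            std = math.sqrt(sum((x - mean) ** 2 for x in history) / len(history))
            # A perfectly flat window has no spread, so no z-score is defined.
            if std > 0:
                score = (value - mean) / std
                if abs(score) > self.threshold:
                    z_score = score
        history.append(value)
        return z_score
```

Scoring before appending keeps a spike from diluting its own baseline, and the per-pair histories mean a hot afternoon in one city never shifts another city's mean.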

Two production issues surfaced once this was running with real traffic on
a resource-constrained host, and neither would have shown up in a local
demo.

Issue 1: the vector database was solving a scale problem I didn't have

The first version used ChromaDB for the similarity search. In production on
Railway's free tier, the container was killed and restarted roughly every
60 to 70 seconds with no traceback, a pattern consistent with an
out-of-memory kill. Each restart dropped every open WebSocket connection.

The actual data volume here is tiny: a handful of 5-dimensional numeric
vectors per city. A full vector database with its native bindings was
solving a scale problem this app doesn't have. I replaced it with a
brute-force in-memory search behind the exact same interface: no external
dependency, no native bindings, and no memory overhead beyond a small
bounded list.
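A brute-force store of this kind fits in a few dozen lines. The sketch below is an illustration under assumptions, not the repository's implementation: the class and method names are hypothetical, the 5-dimensional vectors are assumed to be already scaled so that temperature, humidity, wind, pressure and cloud cover are comparable, and distance is plain Euclidean via `math.dist`.

```python
from __future__ import annotations

import math
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    city: str
    timestamp: float
    vector: tuple[float, ...]  # assumed: 5 pre-scaled features per reading


class BruteForceStore:
    """Bounded in-memory store with exact nearest-neighbour search (hypothetical API)."""

    def __init__(self, max_entries: int = 5000) -> None:
        # Once full, each append evicts the oldest entry, so memory stays bounded.
        self._entries: deque[Entry] = deque(maxlen=max_entries)

    def __len__(self) -> int:
        return len(self._entries)

    def add(self, city: str, timestamp: float, vector: tuple[float, ...]) -> None:
        self._entries.append(Entry(city, timestamp, vector))

    def nearest_other_city(self, city: str, vector: tuple[float, ...]) -> tuple[Entry, float] | None:
        """Scan every entry (O(N * d)) and return the closest one from a different city."""
        best: tuple[Entry, float] | None = None
        for entry in self._entries:
            if entry.city == city:
                continue
            dist = math.dist(vector, entry.vector)
            if best is None or dist < best[1]:
                best = (entry, dist)
        return best
```

Because the store is capped, query cost is a fixed linear scan no matter how long the container has been up, which is exactly the property a memory-constrained host cares about.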

The lesson generalizes: before reaching for a vector database on a
resource-limited host, it's worth checking whether brute-force search over
your actual data volume is already fast enough. For small N it usually is,
and dropping the database removes an entire class of deployment risk.
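One way to run that check is to time a brute-force scan at a few sizes around your real volume. This is a rough, hypothetical helper using random vectors and `time.perf_counter`; absolute numbers depend heavily on the host, so run it on hardware similar to the deployment target rather than trusting a laptop result.

```python
import math
import random
import time


def mean_query_seconds(n: int, dim: int = 5, queries: int = 100, seed: int = 0) -> float:
    """Average wall-clock time of one brute-force nearest-neighbour scan over n random vectors."""
    rng = random.Random(seed)
    data = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    probes = [tuple(rng.random() for _ in range(dim)) for _ in range(queries)]
    start = time.perf_counter()
    for probe in probes:
        # Exact search: compute the distance to every stored vector, keep the minimum.
        min(data, key=lambda v: math.dist(probe, v))
    return (time.perf_counter() - start) / queries


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"N={n:>6}: {mean_query_seconds(n) * 1e3:.3f} ms per query")
```

If the per-query time at your realistic N is far below your update interval (60 seconds here), an index buys nothing but dependencies.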

Issue 2: one bad CDN URL took down more than the chart

Chart.js was originally loaded via a single blocking <script> tag. In at
least one real browser session it silently failed to load because of a
case-sensitivity typo in the CDN URL. Because the entire page's script ran
as one block, that single failure prevented the WebSocket connection from
ever being established. The blank chart panel was the visible symptom; the
real bug was that an unrelated third-party script failure could take down
the entire live data feed.

I rewrote the loading to be decoupled: it tries one CDN, falls back to a
second if that fails, and shows a visible "chart unavailable" message
instead of empty space if both fail. None of this blocks the WebSocket
connection or the per-city cards, which never depended on Chart.js in the
first place. The fix wasn't really about the chart library; it was about
making sure that a failing non-critical dependency can't cascade into
critical functionality.

Being honest about what's actually "AI" here

Z-score anomaly detection and vector-distance similarity search are
legitimate, useful techniques, but they're statistics and linear algebra,
not machine learning models. The one genuinely generative-AI piece of this
pipeline is an optional narrator (Claude Haiku) that, at most once per
60-second cycle across all cities combined, turns the structured anomalies
and similarity matches into a single plain-language sentence. It's called
once per cycle rather than once per city, both to keep cost negligible and
because "something changed somewhere this minute" is a more useful unit of
narration than several separate one-line summaries every cycle.
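The once-per-cycle batching can be sketched as a function that folds every city's findings into a single prompt. The names and prompt wording are hypothetical, the model call is injected as a plain callable rather than any specific SDK, and skipping the call entirely on a quiet cycle is an assumption about how "at most once" might be enforced.

```python
from __future__ import annotations

from typing import Callable


def narrate_cycle(
    anomalies: dict[str, list[str]],
    matches: dict[str, str],
    narrator: Callable[[str], str],
) -> str | None:
    """Combine every city's findings into one prompt and call the narrator at most once.

    anomalies: city -> human-readable anomaly descriptions (hypothetical format)
    matches:   city -> precomputed similarity description (hypothetical format)
    """
    lines = []
    for city in sorted(set(anomalies) | set(matches)):
        for description in anomalies.get(city, []):
            lines.append(f"- {city}: anomaly: {description}")
        if city in matches:
            lines.append(f"- {city}: similar to {matches[city]}")
    if not lines:
        return None  # quiet cycle: no API call, no cost
    prompt = (
        "Summarize this minute's weather findings in one plain-language sentence.\n"
        + "\n".join(lines)
    )
    return narrator(prompt)
```

The cost bound then follows from the structure rather than from rate limiting: one cycle produces at most one call, regardless of how many cities are polled.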

One calibration bug is worth mentioning too: I initially asked the
narrator to apply a numeric similarity threshold itself (e.g. "only call
two cities near-identical if their distance is below 0.05"). A small, cheap
model doesn't reliably enforce a numeric rule buried in a prompt: it called
a Tokyo/Sydney pair "nearly identical" despite an 18 km/h wind gap that
should have disqualified it. The fix was to stop asking the LLM to make
that judgment at all. A closeness_label() function in code now computes
the qualitative phrase and flags any large single-metric gap
deterministically, and the narrator is instructed to use that exact phrase
verbatim rather than choosing the wording itself. If you're wiring an LLM
into a pipeline with quantitative thresholds, that logic almost always
belongs in code, not in the prompt.
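A deterministic labeler in that spirit might look like the sketch below. Only the name closeness_label() comes from the project; the signature, the 0.15 cut-off, and the per-metric gap limits (wind in km/h, matching the example above) are hypothetical values for illustration, with 0.05 taken from the original prompt rule.

```python
from __future__ import annotations

# Hypothetical thresholds for illustration; not the project's published values.
NEAR_IDENTICAL = 0.05
SIMILAR = 0.15
# Raw per-metric gaps large enough to veto "nearly identical", in each metric's own units.
MAX_GAP = {"temperature": 3.0, "humidity": 15.0, "wind": 10.0, "pressure": 8.0, "cloud_cover": 30.0}


def closeness_label(distance: float, a: dict[str, float], b: dict[str, float]) -> tuple[str, list[str]]:
    """Return (phrase, flagged_metrics), decided entirely in code."""
    flagged = [
        metric
        for metric, limit in MAX_GAP.items()
        if metric in a and metric in b and abs(a[metric] - b[metric]) > limit
    ]
    # A small overall distance is not enough if any single metric is far apart.
    if distance < NEAR_IDENTICAL and not flagged:
        phrase = "nearly identical"
    elif distance < SIMILAR:
        phrase = "broadly similar"
    else:
        phrase = "loosely related"
    if flagged:
        phrase += ", except for a large gap in " + " and ".join(flagged)
    return phrase, flagged
```

The narrator prompt then only has to carry the result, e.g. `Describe the pair using exactly this phrase: "broadly similar, except for a large gap in wind"`, leaving the model nothing numeric to decide.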

If you've hit similar failure modes running AI pipelines on constrained
infrastructure, I'd like to hear about them in the comments.