<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Leonardo Vasone</title>
    <description>The latest articles on DEV Community by Leonardo Vasone (@leovasone).</description>
    <link>https://dev.to/leovasone</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4014333%2F54688f89-a27c-4965-9ac9-b0849e00adfb.jpg</url>
      <title>DEV Community: Leonardo Vasone</title>
      <link>https://dev.to/leovasone</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/leovasone"/>
    <language>en</language>
    <item>
      <title>What killed my vector database on a free-tier container</title>
      <dc:creator>Leonardo Vasone</dc:creator>
      <pubDate>Sat, 04 Jul 2026 03:41:39 +0000</pubDate>
      <link>https://dev.to/leovasone/what-killed-my-vector-database-on-a-free-tier-container-10h1</link>
      <guid>https://dev.to/leovasone/what-killed-my-vector-database-on-a-free-tier-container-10h1</guid>
      <description>&lt;p&gt;I built a real-time insights dashboard that ingests live public weather&lt;br&gt;
data, runs every reading through an anomaly detector and a similarity&lt;br&gt;
search, and streams the results to the browser over WebSockets as they&lt;br&gt;
happen. It is a small demonstration of a real-time data platform architecture&lt;br&gt;
(ingestion, streaming, AI-driven pattern detection, vector search) that uses a&lt;br&gt;
free, no-auth public API instead of mocked data.&lt;/p&gt;

&lt;p&gt;Live demo: &lt;a href="https://realtime-weather-insights-production.up.railway.app" rel="noopener noreferrer"&gt;https://realtime-weather-insights-production.up.railway.app&lt;/a&gt;&lt;br&gt;
Code: &lt;a href="https://github.com/leovasone/realtime-weather-insights" rel="noopener noreferrer"&gt;https://github.com/leovasone/realtime-weather-insights&lt;/a&gt;&lt;br&gt;
Case study: &lt;a href="https://vasone.com.br/realtime-insights.html" rel="noopener noreferrer"&gt;https://vasone.com.br/realtime-insights.html&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Open-Meteo API  →  poll loop (60s)  →  anomaly detector (z-score)
                                     →  vector store (in-memory)  →  similarity search
                                     →  narrator (Claude, optional)
                                     ↓
                              WebSocket broadcast  →  browser dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An async loop polls a handful of cities every 60 seconds. Each reading&lt;br&gt;
passes through a rolling z-score anomaly detector (per city and per metric:&lt;br&gt;
temperature, humidity, wind, pressure, cloud cover) and a similarity search&lt;br&gt;
that finds the closest historical match in a &lt;em&gt;different&lt;/em&gt; city. The combined&lt;br&gt;
result is broadcast to every connected client, so the browser never has to&lt;br&gt;
poll.&lt;/p&gt;
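
&lt;p&gt;As a rough illustration of the rolling z-score step, here is a minimal Python sketch. It is not the code from the repository: the class name &lt;code&gt;RollingZScore&lt;/code&gt; and the &lt;code&gt;window&lt;/code&gt;, &lt;code&gt;threshold&lt;/code&gt; and &lt;code&gt;min_samples&lt;/code&gt; defaults are assumptions made for the example, and one instance is assumed per city/metric pair.&lt;/p&gt;

```python
from collections import deque
from statistics import mean, pstdev


class RollingZScore:
    """Rolling z-score over the last `window` readings of one city/metric pair.

    Hypothetical sketch: the window size, threshold and warm-up length are
    illustrative defaults, not values taken from the project.
    """

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        # deque(maxlen=...) discards the oldest reading once the window is full
        self.values = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def update(self, value):
        """Score `value` against the history so far, then add it to the window.

        Returns (z, is_anomaly).
        """
        z = 0.0
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            # A flat history has no spread to measure against, so z stays 0.0
            if sigma > 0:
                z = (value - mu) / sigma
        self.values.append(value)
        return z, abs(z) >= self.threshold
```

&lt;p&gt;Scoring the reading before appending it keeps a spike from inflating its own baseline. The poll loop would call &lt;code&gt;update()&lt;/code&gt; once per metric per city on every cycle.&lt;/p&gt;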

&lt;p&gt;Two real production issues surfaced once this was actually running with&lt;br&gt;
real traffic on a resource-constrained host, and neither would have shown&lt;br&gt;
up in a local demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Issue 1: the vector database was solving a scale problem I didn't have
&lt;/h2&gt;

&lt;p&gt;The first version used ChromaDB for the similarity search. In production on&lt;br&gt;
Railway's free tier, the container was killed and restarted roughly every&lt;br&gt;
60-70 seconds. There was no traceback, which is consistent with an out-of-memory&lt;br&gt;
kill, and each restart dropped every open WebSocket connection.&lt;/p&gt;

&lt;p&gt;The actual data volume here is tiny: a handful of 5-dimensional numeric&lt;br&gt;
vectors per city. A full vector database with its native bindings was&lt;br&gt;
solving a scale problem this app doesn't have. I replaced it with a&lt;br&gt;
brute-force in-memory search behind the exact same interface: no external&lt;br&gt;
dependency, no native bindings, and no memory overhead beyond a small bounded&lt;br&gt;
list.&lt;/p&gt;
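
&lt;p&gt;To make the idea concrete, a brute-force store of this kind can be sketched in a few lines of Python. The class and method names below (&lt;code&gt;InMemoryVectorStore&lt;/code&gt;, &lt;code&gt;nearest_other_city&lt;/code&gt;) and the &lt;code&gt;max_items&lt;/code&gt; cap are hypothetical, the choice of Euclidean distance is an assumption, and the sketch assumes the five metrics have already been scaled to comparable ranges.&lt;/p&gt;

```python
import math
from collections import deque


class InMemoryVectorStore:
    """Brute-force nearest-neighbour search over a small bounded history.

    Hypothetical sketch: names and the default cap are illustrative only.
    """

    def __init__(self, max_items=500):
        # deque(maxlen=...) drops the oldest entry once the cap is reached,
        # so memory use stays bounded no matter how long the process runs
        self.items = deque(maxlen=max_items)

    def add(self, city, vector):
        """Store one reading as a (city, vector) pair."""
        self.items.append((city, tuple(vector)))

    def nearest_other_city(self, city, vector):
        """Return (distance, city, vector) for the closest stored reading
        from a different city, or None if there is no such reading."""
        candidates = [
            (math.dist(vector, stored), other, stored)
            for other, stored in self.items
            if other != city
        ]
        return min(candidates, key=lambda c: c[0], default=None)
```

&lt;p&gt;Each query is a linear scan. With a few hundred 5-dimensional vectors that is a trivial amount of work per 60-second cycle, which is the point of the swap.&lt;/p&gt;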

&lt;p&gt;The lesson generalizes: before reaching for a vector database on a&lt;br&gt;
resource-limited host, it's worth checking whether brute-force search over&lt;br&gt;
your actual data volume is already fast enough. For small N, it usually is,&lt;br&gt;
and it removes an entire class of deployment risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Issue 2: one bad CDN URL took down more than the chart
&lt;/h2&gt;

&lt;p&gt;Chart.js was originally loaded via a single blocking &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag. In at&lt;br&gt;
least one real browser session it silently failed to load because of a&lt;br&gt;
case-sensitivity typo in the CDN URL. Because the entire page's script&lt;br&gt;
ran as one block, that single failure prevented the WebSocket connection&lt;br&gt;
from ever being established. The chart panel being blank was the visible&lt;br&gt;
symptom; the real bug was that an unrelated third-party script failure&lt;br&gt;
could take down the entire live data feed.&lt;/p&gt;

&lt;p&gt;I rewrote the loading to be decoupled: it tries one CDN, falls back to a&lt;br&gt;
second if that fails, and shows a visible "chart unavailable" message&lt;br&gt;
instead of empty space if both fail. None of this blocks the WebSocket&lt;br&gt;
connection or the per-city cards, which never actually depended on Chart.js&lt;br&gt;
in the first place. The fix wasn't really about the chart library; it was&lt;br&gt;
about making sure a non-critical dependency failing can't cascade into&lt;br&gt;
critical functionality failing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Being honest about what's actually "AI" here
&lt;/h2&gt;

&lt;p&gt;Z-score anomaly detection and vector-distance similarity search are&lt;br&gt;
legitimate, useful techniques, but they're statistics and linear algebra,&lt;br&gt;
not machine learning models. The one genuinely generative-AI piece of this&lt;br&gt;
pipeline is an optional narrator (Claude Haiku) that, at most once per&lt;br&gt;
60-second cycle across all cities combined, turns the structured anomalies&lt;br&gt;
and similarity matches into a single plain-language sentence. It's called&lt;br&gt;
once per cycle rather than once per city, both to keep cost negligible and&lt;br&gt;
because "something changed somewhere this minute" is a more useful unit of&lt;br&gt;
narration than several separate one-line summaries every cycle.&lt;/p&gt;

&lt;p&gt;One calibration bug is worth mentioning here too: I initially asked the&lt;br&gt;
narrator to apply a numeric similarity threshold itself (e.g. "only call&lt;br&gt;
two cities near-identical if their distance is below 0.05"). A small, cheap&lt;br&gt;
model doesn't reliably enforce a numeric rule buried in a prompt: it&lt;br&gt;
called a Tokyo/Sydney pair "quase idênticas" (Portuguese for "nearly identical") despite an 18 km/h wind gap&lt;br&gt;
that should have disqualified it. The fix was to stop asking the LLM to&lt;br&gt;
make that judgment at all: a &lt;code&gt;closeness_label()&lt;/code&gt; function in code now&lt;br&gt;
computes the qualitative phrase and flags any large single-metric gap&lt;br&gt;
deterministically, and the narrator is instructed to use that exact&lt;br&gt;
phrase verbatim rather than deciding the wording itself. If you're wiring&lt;br&gt;
an LLM into a pipeline with quantitative thresholds, that logic almost&lt;br&gt;
always belongs in code, not in the prompt.&lt;/p&gt;
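
&lt;p&gt;For illustration only, a deterministic labelling function along these lines might look like the sketch below. The function name &lt;code&gt;closeness_label()&lt;/code&gt; and the 0.05 cut-off come from the description above; the signature, the 0.20 cut-off, the per-metric limits and the exact phrases are assumptions, not the project's actual values.&lt;/p&gt;

```python
def closeness_label(distance, gaps, gap_limits):
    """Map a distance and per-metric gaps to a fixed phrase for the narrator.

    Hypothetical sketch, not the project's implementation.

    distance:   vector distance between the two readings
    gaps:       absolute per-metric differences, e.g. {"wind_kmh": 18.0}
    gap_limits: largest gap per metric that still counts as close
    """
    # Metrics without a configured limit are never flagged
    flagged = sorted(
        name for name, gap in gaps.items()
        if gap > gap_limits.get(name, float("inf"))
    )
    if flagged:
        # A single large gap overrides a small overall distance
        return "similar overall, but differing in " + ", ".join(flagged)
    if distance >= 0.20:  # assumed cut-off
        return "only loosely similar"
    if distance >= 0.05:  # cut-off quoted in the original prompt rule
        return "broadly similar"
    return "near-identical"
```

&lt;p&gt;With an assumed wind limit of 10 km/h, a pair with an 18 km/h wind gap would get the "differing in" phrase no matter how small the overall distance is. The model then only has to repeat the string it is given.&lt;/p&gt;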

&lt;p&gt;If you've hit similar failure modes running AI pipelines on constrained&lt;br&gt;
infrastructure, I'd like to hear about them in the comments.&lt;/p&gt;

</description>
      <category>python</category>
      <category>fastapi</category>
      <category>websockets</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
