<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Grigoriy</title>
    <description>The latest articles on DEV Community by Grigoriy (@grigoriyraze).</description>
    <link>https://dev.to/grigoriyraze</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3977692%2Fc8594755-0b8d-45ef-8c11-2eaed2ffac64.webp</url>
      <title>DEV Community: Grigoriy</title>
      <link>https://dev.to/grigoriyraze</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/grigoriyraze"/>
    <language>en</language>
    <item>
      <title>I built an AI-personalized news bot that runs on a single SQLite file — architecture and everything that broke</title>
      <dc:creator>Grigoriy</dc:creator>
      <pubDate>Wed, 10 Jun 2026 12:24:22 +0000</pubDate>
      <link>https://dev.to/grigoriyraze/i-built-an-ai-personalized-news-bot-that-runs-on-a-single-sqlite-file-architecture-and-everything-1o5n</link>
      <guid>https://dev.to/grigoriyraze/i-built-an-ai-personalized-news-bot-that-runs-on-a-single-sqlite-file-architecture-and-everything-1o5n</guid>
      <description>&lt;h1&gt;
  
  
  I built an AI-personalized news bot that runs on a single SQLite file — architecture and everything that broke
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;TL;DR — &lt;a href="https://t.me/futur_e_news_bot?start=devto" rel="noopener noreferrer"&gt;@futur_e_news_bot&lt;/a&gt;. A bilingual&lt;br&gt;
(EN/RU) Telegram news feed. AI removes duplicates, clusters one event into one card,&lt;br&gt;
translates, and tunes the feed to your reactions. Stack: aiogram, local ONNX&lt;br&gt;
embeddings, &lt;strong&gt;sqlite-vec instead of pgvector&lt;/strong&gt;, a chain of free LLMs via OpenRouter,&lt;br&gt;
one worker on Fly.io. It started at ~$5/month — and broke in three interesting ways&lt;br&gt;
I'll walk through.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I wanted a news feed without two things: &lt;strong&gt;duplicates&lt;/strong&gt; (the same event from five&lt;br&gt;
outlets with five headlines) and &lt;strong&gt;noise&lt;/strong&gt; (90% of the agenda I don't care about). So&lt;br&gt;
one weekend turned into a Telegram bot that de-dupes, learns what I like from 🔥/❤️/😢&lt;br&gt;
reactions, and shows everything in two languages.&lt;/p&gt;

&lt;p&gt;The interesting part isn't the idea — it's that the whole thing runs as &lt;strong&gt;one worker&lt;br&gt;
process against a single SQLite file&lt;/strong&gt;, no database server, no Redis, no queue. Here's&lt;br&gt;
how, and what it cost me (in dollars and in debugging).&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture: one worker, one SQLite, no DB server
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────┐                      ┌────────────────────┐
│ Fly machine #1           │                      │ Fly machine #2     │
│  ┌─────────────────────┐ │                      │ (private-only)     │
│  │ aiogram long-polling│ │                      │   RSSHub           │
│  ├─────────────────────┤ │                      │  (Telegram → RSS)  │
│  │ APScheduler jobs    │◄┼──── Fly 6PN ─────────┤                    │
│  │  pipeline / deliver │ │  *.internal:1200     │                    │
│  │  digests / watchdog │ │                      │                    │
│  └─────────────────────┘ │                      └────────────────────┘
│  ┌─────────────────────┐ │
│  │ SQLite + sqlite-vec │ │  ← all state, on a Fly volume
│  ├─────────────────────┤ │
│  │ fastembed (ONNX)    │ │  ← embeddings computed locally, no API
│  └─────────────────────┘ │
└──────────┬───────────────┘
           │ HTTPS
           ▼
    OpenRouter (free LLM chain → cheap paid fallback)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;No public HTTP, no load balancer, no separate DB machine. The bot polls Telegram,&lt;br&gt;
APScheduler runs the jobs, and &lt;strong&gt;all state lives in one SQLite file&lt;/strong&gt; on a volume.&lt;br&gt;
RSSHub (which bridges public Telegram channels into RSS) is a second app reachable&lt;br&gt;
only over Fly's private network.&lt;/p&gt;

&lt;p&gt;A hobby instance fits in 512 MB at roughly &lt;strong&gt;$5/month&lt;/strong&gt; (I'm now on 2 GB ≈ $15/mo after&lt;br&gt;
a traffic spike — more on why below). LLM cost runs &lt;strong&gt;$0–1/month&lt;/strong&gt; because most calls&lt;br&gt;
hit free models.&lt;/p&gt;
&lt;h2&gt;
  
  
  sqlite-vec instead of pgvector
&lt;/h2&gt;

&lt;p&gt;It started as Postgres + pgvector. That works great, but it needs a separate DB machine&lt;br&gt;
(another ~$5/mo on Fly), its own secrets, backups, migrations — a lot of infra just to&lt;br&gt;
store embeddings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/asg017/sqlite-vec" rel="noopener noreferrer"&gt;sqlite-vec&lt;/a&gt; is a native SQLite extension that&lt;br&gt;
adds &lt;code&gt;vec0&lt;/code&gt; virtual tables with cosine/L2/Hamming KNN right in SQL. It's basically&lt;br&gt;
pgvector, embeddable and serverless.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# create the vector table
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec_driver_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CREATE VIRTUAL TABLE IF NOT EXISTS story_vec &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USING vec0(embedding float[384] distance_metric=cosine)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# KNN is just a SELECT
&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT rowid, distance FROM story_vec &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WHERE embedding MATCH :v AND k = :k ORDER BY distance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sqlite_vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;serialize_float32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That powers three things: &lt;strong&gt;dedup at ingest&lt;/strong&gt; (nearest neighbor under a distance&lt;br&gt;
threshold → it's a duplicate), &lt;strong&gt;"more like this"&lt;/strong&gt;, and &lt;strong&gt;semantic inline search&lt;/strong&gt;.&lt;br&gt;
Result: one machine instead of two, zero cost to store vectors, near-zero infra. SQLite&lt;br&gt;
is single-writer, but for one worker that's a non-issue — WAL + &lt;code&gt;busy_timeout&lt;/code&gt; cover it.&lt;/p&gt;
&lt;h2&gt;
  
  
  A chain of free LLMs
&lt;/h2&gt;

&lt;p&gt;Each story gets one LLM pass: classify (category, tags, importance, "breaking"),&lt;br&gt;
summarize, translate EN↔RU, and score tone 0–3 (for a "good news only" toggle).&lt;br&gt;
OpenRouter lets you pass an array of models and falls back automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;models&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen3-next-80b-a3b-instruct:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/llama-3.3-70b-instruct:free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/mistral-nemo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# cheap paid fallback when the free ones 429
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A global rate limiter + exponential backoff on 429/5xx sit on top. Free models carry&lt;br&gt;
90%+ of traffic, so the monthly bill is basically rounding error.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local embeddings (fastembed / ONNX)
&lt;/h2&gt;

&lt;p&gt;Embeddings for every story and every search query. Going to an API for that is money,&lt;br&gt;
latency, and a privacy leak. &lt;a href="https://github.com/qdrant/fastembed" rel="noopener noreferrer"&gt;fastembed&lt;/a&gt; runs&lt;br&gt;
sentence-transformers on ONNX without PyTorch — &lt;code&gt;paraphrase-multilingual-MiniLM-L12-v2&lt;/code&gt;,&lt;br&gt;
384 dims, ~10–30 ms on CPU. Free and private.&lt;/p&gt;

&lt;h2&gt;
  
  
  The recommender: a taste vector + EWMA + anti-bubble
&lt;/h2&gt;

&lt;p&gt;Each user has a &lt;code&gt;taste_vec&lt;/code&gt; (same 384-dim space as stories). It starts null; the first&lt;br&gt;
🔥/❤️ copies the story's embedding, and every reaction after that nudges it with an&lt;br&gt;
exponential moving average. A link click is a stronger signal than a like, so it moves&lt;br&gt;
the vector more. 😢 pushes it away.&lt;/p&gt;

&lt;p&gt;Ranking = &lt;code&gt;cosine(taste, story) + category affinity + tag affinity + recency&lt;/code&gt;, plus a&lt;br&gt;
small "discovery" injection so the feed doesn't collapse into an echo chamber. The whole&lt;br&gt;
thing is brute-force numpy over the last few hundred candidates — milliseconds at this&lt;br&gt;
scale, and far simpler than wiring up a vector index for ranking (sqlite-vec handles the&lt;br&gt;
KNN-shaped problems; ranking wants a weighted score, not nearest-neighbor).&lt;/p&gt;

&lt;h2&gt;
  
  
  And then it broke — three times
&lt;/h2&gt;

&lt;p&gt;The honest part. None of these were the idea; all were operational.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. sqlite-vec 0.1.6 shipped a 32-bit binary for arm64.&lt;/strong&gt; First deploy:&lt;br&gt;
&lt;code&gt;OSError: wrong ELF class: ELFCLASS32&lt;/code&gt;. Looks like your code, is actually upstream. Pin&lt;br&gt;
&lt;code&gt;sqlite-vec==0.1.9&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. fastembed changed its default pooling (CLS → mean) between minor versions.&lt;/strong&gt; New&lt;br&gt;
embeddings became silently incompatible with the stored ones; KNN started returning&lt;br&gt;
garbage. The only fix is to re-embed the whole table and recompute every user's taste&lt;br&gt;
vector. Pin your embedding lib and read the changelog.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. OOM crash-loop that hid for days.&lt;/strong&gt; Loading the ONNX model pushes RSS to ~400 MB.&lt;br&gt;
On a 512 MB machine the kernel OOM-killed the process &lt;em&gt;during model load&lt;/em&gt;, Fly&lt;br&gt;
restarted it, it OOM'd again — a loop. The bot kept &lt;em&gt;polling&lt;/em&gt; (so it looked alive) but&lt;br&gt;
never created a story. Meanwhile RSS items piled up and every digest went out empty.&lt;br&gt;
Bumping to 1 GB (then 2 GB under load) fixed it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. A feed with no timeout froze the whole pipeline for ~4 days.&lt;/strong&gt; &lt;code&gt;feedparser.parse(url)&lt;/code&gt;&lt;br&gt;
does a blocking fetch with &lt;strong&gt;no timeout&lt;/strong&gt;. One dead/slow feed hung the run forever, and&lt;br&gt;
with &lt;code&gt;max_instances=1&lt;/code&gt; every subsequent run was skipped — a silent multi-day outage&lt;br&gt;
while the bot happily answered messages. Fixes that should've been there from day one:&lt;br&gt;
fetch with httpx and a hard timeout, and wrap each pipeline run in &lt;code&gt;asyncio.wait_for&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The lesson from #3 and #4 is the same: &lt;strong&gt;a worker that polls can look healthy while&lt;br&gt;
being functionally dead.&lt;/strong&gt; I added a watchdog that DMs me if no story has been created&lt;br&gt;
in N hours. It would have caught all of it in minutes instead of days.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;It's live: &lt;strong&gt;&lt;a href="https://t.me/futur_e_news_bot?start=devto" rel="noopener noreferrer"&gt;@futur_e_news_bot&lt;/a&gt;&lt;/strong&gt; — &lt;code&gt;/start&lt;/code&gt;,&lt;br&gt;
pick a language, react to a few stories, and the feed starts tuning itself. Good/neutral&lt;br&gt;
news by default; you can dial in negative news in four levels.&lt;/p&gt;

&lt;p&gt;I'd love feedback on the dedup/clustering quality and on the recommender — those are the&lt;br&gt;
parts still furthest from "great." Happy to go deeper on sqlite-vec, the free-LLM chain,&lt;br&gt;
or the recommender in the comments.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>sql</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
