<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Preetha</title>
    <description>The latest articles on DEV Community by Preetha (@rpreetha).</description>
    <link>https://dev.to/rpreetha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3944287%2F495be482-8283-4571-9f2a-a9d2afade6c2.png</url>
      <title>DEV Community: Preetha</title>
      <link>https://dev.to/rpreetha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rpreetha"/>
    <language>en</language>
    <item>
      <title>I built a self-hosted RAG system for Journalism — What Production Retrieval Taught Me</title>
      <dc:creator>Preetha</dc:creator>
      <pubDate>Fri, 22 May 2026 08:49:06 +0000</pubDate>
      <link>https://dev.to/rpreetha/i-built-a-self-hosted-rag-system-for-journalism-what-production-retrieval-taught-me-2cki</link>
      <guid>https://dev.to/rpreetha/i-built-a-self-hosted-rag-system-for-journalism-what-production-retrieval-taught-me-2cki</guid>
      <description>&lt;p&gt;Over the last few months, I built &lt;strong&gt;Atlas&lt;/strong&gt; — a fully self-hosted retrieval system designed for journalism workflows. No paid APIs. No hosted vector databases or AI infrastructure. Just local models, PostgreSQL, pgvector, Celery, and a retrieval pipeline built to survive production traffic.&lt;/p&gt;

&lt;p&gt;I originally thought this would mostly be an infrastructure project. It wasn't. The hardest lessons appeared after deployment — when assumptions broke, retrieval quality drifted, and tiny implementation decisions started affecting reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  What does Atlas do?
&lt;/h2&gt;

&lt;p&gt;Atlas ingests live RSS feeds from BBC, Guardian, NYT, NPR, Deutsche Welle and more every 15 minutes, embeds content locally using sentence-transformers, stores vectors in PostgreSQL with pgvector, and answers questions with source-grounded citations.&lt;/p&gt;

&lt;p&gt;Beyond search, Atlas offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Grounded Q&amp;amp;A&lt;/strong&gt; — every answer maps to an exact source passage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claim-level fact-checking&lt;/strong&gt; — splits text into claims, scores each against evidence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Story brief generation&lt;/strong&gt; — key facts, open questions, suggested angles for reporters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-format repurposing&lt;/strong&gt; — one topic becomes newsletter, social post, audio script, headline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A full story workspace&lt;/strong&gt; — source notebooks, drafts, editorial review, version diff, publish readiness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/PreethaRaj/atlas-editorial-intelligence/releases/download/v1.0.0/SearchAnswer.gif" rel="noopener noreferrer"&gt;https://github.com/PreethaRaj/atlas-editorial-intelligence/releases/download/v1.0.0/SearchAnswer.gif&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/PreethaRaj/atlas-editorial-intelligence/releases/download/v1.0.0/PartnerMode.gif" rel="noopener noreferrer"&gt;https://github.com/PreethaRaj/atlas-editorial-intelligence/releases/download/v1.0.0/PartnerMode.gif&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The retrieval pipeline
&lt;/h2&gt;

&lt;p&gt;Here is the full pipeline before I get into the lessons:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query string
    │
    ├── embed(query) → vector cosine &amp;gt; 0.30 → top 20 chunks
    ├── websearch_to_tsquery → PostgreSQL FTS → top 20 chunks
    └── Title FTS boost → top 10 articles
              │
              ▼
         RRF merge (k=60)
              │
         recency blend (85% relevance + 15% freshness)
              │
         post-cosine gate &amp;gt; 0.12
              │
         Policy engine (public / partner / paywall)
              │
         Response + inline citations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
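&lt;p&gt;The recency blend is the least self-explanatory step in the diagram, so here is a minimal sketch of one way to compute it. Only the 85/15 split comes from the pipeline above. The exponential decay, the 24-hour half-life, and the names &lt;code&gt;freshness&lt;/code&gt; and &lt;code&gt;blended_score&lt;/code&gt; are assumptions for illustration, and the sketch assumes the relevance score has already been scaled to the 0..1 range.&lt;/p&gt;

```python
import math

RELEVANCE_WEIGHT = 0.85  # from the pipeline diagram
FRESHNESS_WEIGHT = 0.15  # from the pipeline diagram


def freshness(age_hours, half_life_hours=24.0):
    """Exponential decay: 1.0 for a brand-new article, 0.5 after one half-life.

    The decay shape and the 24-hour half-life are assumptions, not Atlas settings.
    """
    return math.pow(0.5, age_hours / half_life_hours)


def blended_score(relevance, age_hours):
    """Blend a relevance score in the 0..1 range with freshness, 85/15."""
    return RELEVANCE_WEIGHT * relevance + FRESHNESS_WEIGHT * freshness(age_hours)


# A day-old article with relevance 0.5 keeps half of its freshness credit.
print(round(blended_score(0.5, 24.0), 3))
```

&lt;p&gt;With a split this lopsided, freshness acts mostly as a tie-breaker between articles of similar relevance rather than as a way for new but weak matches to overtake strong ones.&lt;/p&gt;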



&lt;h2&gt;
  
  
  Lesson 1 — Pure vector search fails for news
&lt;/h2&gt;

&lt;p&gt;This surprised me. I assumed a good embedding model would handle everything. It does not — at least not for current events journalism.&lt;/p&gt;

&lt;p&gt;The problem: &lt;strong&gt;proper nouns&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Words like &lt;code&gt;Philippines&lt;/code&gt;, &lt;code&gt;Kishida&lt;/code&gt;, &lt;code&gt;Rafah&lt;/code&gt;, &lt;code&gt;Starmer&lt;/code&gt; are rare in any model's training data relative to their importance in daily news. The cosine similarity between &lt;code&gt;"Japan missile exports Philippines"&lt;/code&gt; and an article titled &lt;code&gt;"Tokyo defence deal with Manila confirmed"&lt;/code&gt; was &lt;strong&gt;0.28&lt;/strong&gt; — just below my original threshold of 0.30.&lt;/p&gt;

&lt;p&gt;The article was clearly relevant. The vector search missed it completely.&lt;/p&gt;

&lt;p&gt;Full-text search caught it immediately because &lt;code&gt;Japan&lt;/code&gt;, &lt;code&gt;missile&lt;/code&gt;, &lt;code&gt;Philippines&lt;/code&gt; all appeared in the article text.&lt;/p&gt;

&lt;p&gt;The fix was hybrid search. Vector search catches semantic similarity; FTS catches proper nouns and exact terminology. Neither is sufficient alone for a news corpus.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Three search paths merged with RRF
# Path 1: vector cosine similarity
# Path 2: websearch_to_tsquery (handles "Japan Philippines" as two terms)
# Path 3: title-specific FTS (weighted 0.7x to avoid title-only noise)
&lt;/span&gt;
&lt;span class="c1"&gt;# RRF merge — no score normalisation needed because it only uses rank position
# final_score = Σ 1 / (60 + rank_i)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
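&lt;p&gt;A minimal sketch of the RRF merge described in the comments above, in plain Python. It is not the Atlas implementation: the document ids are hypothetical, and applying the 0.7 title weight as a multiplier on that path's RRF term is an assumption about how the weighting is wired in.&lt;/p&gt;

```python
from collections import defaultdict

RRF_K = 60  # constant from the pipeline diagram


def rrf_merge(ranked_lists, k=RRF_K, weights=None):
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores sum(weight / (k + rank)) over the lists it appears in,
    with rank starting at 1. Only rank positions are used, never raw scores.
    """
    weights = weights or [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for ids, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a3", "a1", "a7"]  # hypothetical ids from the vector path
fts_hits = ["a1", "a9", "a3"]     # hypothetical ids from the FTS path
title_hits = ["a9"]               # hypothetical ids from the title path

merged = rrf_merge([vector_hits, fts_hits, title_hits], weights=[1.0, 1.0, 0.7])
print(merged)
```

&lt;p&gt;In this toy input, &lt;code&gt;a1&lt;/code&gt; comes out first because it ranks highly on both the vector and FTS paths, while &lt;code&gt;a7&lt;/code&gt;, which only one path found, comes out last.&lt;/p&gt;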



&lt;h2&gt;
  
  
  Lesson 2 — Batch embedding is not a micro-optimisation
&lt;/h2&gt;

&lt;p&gt;I was calling &lt;code&gt;embed()&lt;/code&gt; once per article for the first two weeks. Here is what that looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;17 feeds × 30 articles × embed(1 article) × 100ms = 51 seconds per ingest cycle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After switching to batch embedding — collect all articles, call &lt;code&gt;embed([t1, t2, ..., tN])&lt;/code&gt; once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;17 feeds × 30 articles = 510 articles
embed(510 articles)    = ~3 seconds total
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;17× faster.&lt;/strong&gt; Much of the model inference overhead is a fixed cost per call, not per item, so one batched call amortises it across all 510 articles. This is obvious from the PyTorch documentation but I had not read it carefully enough.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before — slow
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;insert_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After — fast
&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;vecs&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# single call, returns (N, 384) array
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vecs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;insert_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
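&lt;p&gt;If a single call over the whole corpus ever becomes too large for memory, a middle ground is to embed in fixed-size chunks. The sketch below is illustrative only: &lt;code&gt;embed_in_batches&lt;/code&gt;, the batch size of 64, and the stand-in &lt;code&gt;fake_embed_batch&lt;/code&gt; are hypothetical, and a real model call would take the place of the stand-in.&lt;/p&gt;

```python
def embed_in_batches(texts, embed_batch, batch_size=64):
    """Embed texts in fixed-size batches; returns one vector per text, in order."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        vectors.extend(embed_batch(chunk))  # one model call per chunk
    return vectors


def fake_embed_batch(chunk):
    # Stand-in for a real model call: returns a 1-d "vector" per text.
    return [[float(len(text))] for text in chunk]


texts = ["article %d" % i for i in range(510)]
vecs = embed_in_batches(texts, fake_embed_batch, batch_size=64)
print(len(vecs))
```

&lt;p&gt;For 510 articles and a batch size of 64 this makes 8 model calls instead of 510, which keeps most of the speed-up while bounding the memory used per call.&lt;/p&gt;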



&lt;h2&gt;
  
  
  Lesson 3 — The cosine threshold is your precision-recall dial
&lt;/h2&gt;

&lt;p&gt;Atlas has two thresholds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;COSINE_MIN&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.30&lt;/span&gt;   &lt;span class="c1"&gt;# SQL WHERE — pre-filter before leaving DB
&lt;/span&gt;&lt;span class="n"&gt;POST_COSINE_MIN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.12&lt;/span&gt;   &lt;span class="c1"&gt;# post-RRF — sanity gate after merge
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What I learned tuning these:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threshold&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0.45&lt;/td&gt;
&lt;td&gt;Missed "Japan missile Philippines" — too restrictive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.30&lt;/td&gt;
&lt;td&gt;Good balance for a news corpus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.20&lt;/td&gt;
&lt;td&gt;Sports results started appearing for political queries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The intuition: news articles about related topics often use completely different vocabulary than the query. A threshold of 0.30 allows the model to bridge that vocabulary gap. A threshold of 0.45 requires the query and article to use nearly identical language — which defeats the purpose of semantic search.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;POST_COSINE_MIN = 0.12&lt;/code&gt; exists only to handle FTS-only hits. When an article is found by keyword search but has almost no semantic overlap with the query (cosine at or below 0.12), the keyword match was probably accidental. The post-filter removes those.&lt;/p&gt;
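&lt;p&gt;The two gates can be sketched as plain Python filters. This is a simplified illustration, not the Atlas code: in Atlas the first threshold lives in a SQL WHERE clause, and the dictionary shape, the ids, and all scores except the 0.28 from Lesson 1 are hypothetical.&lt;/p&gt;

```python
COSINE_MIN = 0.30       # pre-filter on the vector path
POST_COSINE_MIN = 0.12  # sanity gate applied after the merge


def vector_prefilter(hits, minimum=COSINE_MIN):
    """Keep vector-path hits whose cosine similarity exceeds the minimum."""
    return [h for h in hits if h["cosine"] > minimum]


def post_gate(merged, minimum=POST_COSINE_MIN):
    """Drop merged hits, FTS-only ones included, that do not exceed the minimum."""
    return [h for h in merged if h["cosine"] > minimum]


candidates = [
    {"id": "tokyo-manila", "cosine": 0.28},  # relevant, reachable only through FTS
    {"id": "match-report", "cosine": 0.05},  # accidental keyword match
    {"id": "export-rules", "cosine": 0.41},
]

print([h["id"] for h in vector_prefilter(candidates)])
print([h["id"] for h in post_gate(candidates)])
```

&lt;p&gt;The first filter keeps only &lt;code&gt;export-rules&lt;/code&gt;. The second keeps &lt;code&gt;tokyo-manila&lt;/code&gt; as well, which is the point of the looser gate: an article rescued by keyword search survives as long as it is not semantically unrelated.&lt;/p&gt;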

&lt;h2&gt;
  
  
  Lesson 4 — Celery beat scheduling has a startup timing problem
&lt;/h2&gt;

&lt;p&gt;The beat schedule runs &lt;code&gt;ingest_all_feeds&lt;/code&gt; every 15 minutes. But there is a subtle issue: on a fresh deploy, the first beat fires at the next &lt;code&gt;:00&lt;/code&gt;, &lt;code&gt;:15&lt;/code&gt;, &lt;code&gt;:30&lt;/code&gt;, or &lt;code&gt;:45&lt;/code&gt; UTC boundary — not 15 minutes from startup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Deploy at 14:01 → first ingest at 14:15  ✓ fine
Deploy at 14:14 → first ingest at 14:15  ✓ fine
Deploy at 14:00:01 → first ingest at 14:15  ✗ 15 minute corpus gap on first launch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix was &lt;code&gt;timedelta(minutes=15)&lt;/code&gt; instead of &lt;code&gt;crontab(minute='*/15')&lt;/code&gt;, so the interval is measured from beat startup rather than from the wall clock.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;startup_ingest&lt;/code&gt; task also checks corpus article count before honouring the Redis dedup flag. Empty corpus → ingest regardless. This handles &lt;code&gt;docker-compose down -v&lt;/code&gt; (fresh database) correctly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;beat_schedule&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ingest-every-15-min&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tasks.ingest_all_feeds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# from startup, not clock-aligned
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;startup-ingest-once&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tasks.startup_ingest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;# fires once, Redis dedup prevents repeats
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
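&lt;p&gt;To make the difference between the two schedule types concrete, here is a small standard-library sketch that computes the first fire time under each. It does not call Celery. The function names are hypothetical, and it assumes the first interval run happens one full interval after beat starts.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def first_crontab_fire(started_at, every_minutes=15):
    """Next wall-clock boundary strictly after startup (clock-aligned schedule)."""
    hour_start = started_at.replace(minute=0, second=0, microsecond=0)
    slots_passed = started_at.minute // every_minutes
    return hour_start + timedelta(minutes=(slots_passed + 1) * every_minutes)


def first_interval_fire(started_at, every_minutes=15):
    """First run of an interval schedule, measured from startup."""
    return started_at + timedelta(minutes=every_minutes)


for start in (datetime(2026, 5, 22, 14, 0, 1), datetime(2026, 5, 22, 14, 14, 0)):
    print(start.time(), first_crontab_fire(start).time(), first_interval_fire(start).time())
```

&lt;p&gt;The clock-aligned wait varies from a few seconds to almost 15 minutes depending on when the deploy lands, while the interval wait is always the same. Either way the corpus is not populated immediately, which is why the separate &lt;code&gt;startup_ingest&lt;/code&gt; task matters.&lt;/p&gt;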



&lt;h2&gt;
  
  
  Lesson 5 — The Docker healthcheck dependency chain matters
&lt;/h2&gt;

&lt;p&gt;This one took me an embarrassing amount of time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;celery-beat&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;celery-worker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;   &lt;span class="c1"&gt;# ← this line is critical&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without &lt;code&gt;service_healthy&lt;/code&gt;, beat starts immediately and dispatches tasks before any worker is ready to consume them. The tasks sit in the queue, beat fires again 15 minutes later, and the backlog grows.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;service_healthy&lt;/code&gt;, beat waits until a worker is confirmed ready. Clean startup every time.&lt;/p&gt;

&lt;p&gt;The worker healthcheck uses &lt;code&gt;celery inspect ping&lt;/code&gt;, which confirms the worker process is up and answering control commands, not just that the container started.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is next
&lt;/h2&gt;

&lt;p&gt;The infrastructure has a &lt;code&gt;warmup_reranker()&lt;/code&gt; stub in &lt;code&gt;main.py&lt;/code&gt; for a cross-encoder reranker. That is the highest-impact next upgrade: running &lt;code&gt;cross-encoder/ms-marco-MiniLM-L-6-v2&lt;/code&gt; over the top-20 RRF results before returning them to the user. I expect it to add roughly 100ms of latency in exchange for noticeably better ranking on ambiguous queries.&lt;/p&gt;
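&lt;p&gt;A rough sketch of what the rerank step could look like, kept independent of any model so it runs as-is. The function &lt;code&gt;rerank&lt;/code&gt;, the candidate dictionary shape, and the toy &lt;code&gt;overlap_scorer&lt;/code&gt; are hypothetical. In a real setup the scorer would be a cross-encoder; as far as I know, the &lt;code&gt;predict&lt;/code&gt; method of sentence-transformers' &lt;code&gt;CrossEncoder&lt;/code&gt; accepts a list of (query, text) pairs in this shape.&lt;/p&gt;

```python
def rerank(query, candidates, score_pairs, top_n=20):
    """Re-order the first top_n candidates by a pairwise relevance scorer.

    score_pairs takes a list of (query, text) pairs and returns one score each.
    Candidates beyond top_n keep their original order at the end.
    """
    head, tail = candidates[:top_n], candidates[top_n:]
    scores = score_pairs([(query, c["text"]) for c in head])
    order = sorted(range(len(head)), key=lambda i: -scores[i])
    return [head[i] for i in order] + tail


def overlap_scorer(pairs):
    # Toy stand-in for a cross-encoder: counts shared lowercase words.
    return [len(set(q.lower().split()).intersection(t.lower().split())) for q, t in pairs]


candidates = [
    {"id": 1, "text": "Football results from Manila"},
    {"id": 2, "text": "Japan approves missile exports to the Philippines"},
]
ranked = rerank("Japan missile exports Philippines", candidates, overlap_scorer)
print([c["id"] for c in ranked])
```

&lt;p&gt;Passing the scorer in as a function keeps the rerank step testable without loading a model, and lets the model be swapped without touching the merge logic.&lt;/p&gt;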

&lt;p&gt;I am also looking at adding a BM25 path via the &lt;code&gt;pg_bm25&lt;/code&gt; extension (ParadeDB) to replace the PostgreSQL FTS path. BM25 handles document length normalisation better than the built-in &lt;code&gt;ts_rank&lt;/code&gt; scoring over &lt;code&gt;tsvector&lt;/code&gt; columns, which matters for longer articles.&lt;/p&gt;

&lt;h2&gt;
  
  
  The project
&lt;/h2&gt;

&lt;p&gt;Atlas is built to be learned from and adapted. The README has a two-week tutorial walking through each layer of the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/PreethaRaj/atlas-editorial-intelligence" rel="noopener noreferrer"&gt;https://github.com/PreethaRaj/atlas-editorial-intelligence&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; FastAPI · PostgreSQL 16 · pgvector · Celery · Redis · sentence-transformers · Next.js 14 · Docker Compose&lt;/p&gt;

&lt;p&gt;Happy to answer questions on the retrieval architecture, the pgvector schema, or the Celery configuration in the comments.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>mcp</category>
      <category>postgresql</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
