<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Preetha</title>
    <description>The latest articles on DEV Community by Preetha (@rpreetha).</description>
    <link>https://dev.to/rpreetha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3944287%2F495be482-8283-4571-9f2a-a9d2afade6c2.png</url>
      <title>DEV Community: Preetha</title>
      <link>https://dev.to/rpreetha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rpreetha"/>
    <language>en</language>
    <item>
      <title>I Built a Medication Safety Companion for Home Care with Hermes Agent</title>
      <dc:creator>Preetha</dc:creator>
      <pubDate>Sat, 30 May 2026 12:33:11 +0000</pubDate>
      <link>https://dev.to/rpreetha/i-built-a-medication-safety-companion-for-home-care-with-hermes-agent-jki</link>
      <guid>https://dev.to/rpreetha/i-built-a-medication-safety-companion-for-home-care-with-hermes-agent-jki</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;: Build With Hermes Agent&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Across millions of homes, older adults managing multiple prescriptions face the same quiet crisis every day.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"I think I took my blood thinner twice today. Or maybe I forgot it. &lt;br&gt;
I honestly can't remember."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That conversation is where this project started.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Medication Safety Companion&lt;/strong&gt; is a home-care assistant that runs inside Telegram. Older adults talk to it by voice note or text message — the same way they already use their phone. It records doses, figures out which pill they mean when the description is vague, stops them from taking the same dose twice, and escalates immediately when something dangerous is happening. It also sends proactive reminders when a dose is overdue, with no action needed from the patient.&lt;/p&gt;

&lt;p&gt;Everything the patient and caregiver see is delivered through Hermes Agent. The patient never interacts with Python. They talk to Hermes through Telegram, and Hermes handles the full conversation from greeting to confirmation to escalation.&lt;/p&gt;

&lt;p&gt;The core design decision: Hermes is in charge of the conversation. A deterministic Python pipeline is in charge of every safety decision. The two never swap roles.&lt;/p&gt;
&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Here is what an actual Telegram session looks like. Every exchange below ran live on the running system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/preetharaj/Rx-Guardian/blob/main/docs/hermes_demo.gif" rel="noopener noreferrer"&gt;https://github.com/preetharaj/Rx-Guardian/blob/main/docs/hermes_demo.gif&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/preetharaj/Rx-Guardian/blob/main/docs/Telegram_demo.gif" rel="noopener noreferrer"&gt;https://github.com/preetharaj/Rx-Guardian/blob/main/docs/Telegram_demo.gif&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Confirming a dose by voice
&lt;/h3&gt;

&lt;p&gt;The patient sends a voice note through Telegram. Hermes receives it via its Telegram gateway, passes it through faster-whisper for local transcription, and routes the transcript to the safety pipeline. The patient sees the transcription echoed back so they can catch any mishearing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Patient sends voice note: "I took my heart pill"

🎙️ I heard: I took my heart pill

Got it. I've recorded that you took your heart pill at 08:14 UTC.
Let me know if you miss a dose.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When the description is ambiguous
&lt;/h3&gt;

&lt;p&gt;The system never guesses. It asks. Hermes holds the clarification session open across turns using its built-in memory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Patient: "the white pill"

I found more than one pill that could match.
1) Metoprolol 50mg  — white oval, blue cap bottle
2) Lisinopril 10mg  — white round, orange bottle, morning dose
3) Amlodipine 5mg   — white round, yellow bottle, evening dose

Which one did you take?

Patient: "2"

Got it. I've recorded that you took your pressure pill at 08:17 UTC.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When the dose was already taken
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Patient: "I took my heart pill again"

I already have a record that you took your heart pill today at 08:14 UTC.
You don't need to take it again.
If you think there's a mistake, ask your caregiver to check.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
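
&lt;p&gt;The check behind that reply can be sketched in a few lines. This is a minimal illustration, not the code in &lt;code&gt;duplicate_guard.py&lt;/code&gt;: the function name and the list of tuples standing in for audit log rows are assumptions, and the 6-hour window is the figure quoted later in this post.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

# Assumed window, taken from the 6-hour figure quoted for duplicate_guard.py.
DUPLICATE_WINDOW = timedelta(hours=6)


def find_recent_confirmation(med_name, now, confirmations):
    """Return the time of an earlier confirmation inside the window, else None.

    confirmations is a hypothetical list of (med_name, confirmed_at) pairs
    standing in for rows read from the audit log.
    """
    for name, confirmed_at in confirmations:
        age = now - confirmed_at
        if name == med_name and DUPLICATE_WINDOW >= age >= timedelta(0):
            return confirmed_at
    return None


log = [("heart pill", datetime(2026, 5, 29, 8, 14, tzinfo=timezone.utc))]
second_attempt = datetime(2026, 5, 29, 9, 0, tzinfo=timezone.utc)
next_day = datetime(2026, 5, 30, 8, 14, tzinfo=timezone.utc)
print(find_recent_confirmation("heart pill", second_attempt, log))
```

&lt;p&gt;Returning the earlier timestamp rather than a bare boolean lets the reply quote when the dose was recorded, as the message above does.&lt;/p&gt;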



&lt;h3&gt;
  
  
  When something unsafe is mentioned
&lt;/h3&gt;

&lt;p&gt;Normal flow stops. No dose gets logged. Hermes delivers this message verbatim — the response formatting step is bypassed entirely for escalations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Patient: "I want to take an old pain pill with my blood thinner"

I need to stop here. That medication may not be safe to take
alongside your current prescriptions. Please do not take it right now.
Contact your doctor or pharmacist before taking anything new.
I've made a note for your caregiver.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When it sounds like an emergency
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Patient: "I accidentally took four pills by mistake"

This sounds urgent. Please call 995 right now or ask someone
nearby to call for you. Tell them which medication was taken
and roughly how much. I have notified your caregiver.
Do not take anything else until help arrives.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Proactive reminder — Hermes cron, no patient action needed
&lt;/h3&gt;

&lt;p&gt;Every morning, Hermes's scheduler runs &lt;code&gt;dispatch.py --remind&lt;/code&gt;. If a dose is overdue, Hermes sends the reminder to Telegram automatically. The patient does not have to open the app first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;⚠️ Reminder: You haven't confirmed your blood thinner yet today.
This is an important medication. Please take it and reply:
'I took my blood thinner'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The audit log
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;/logs

📋 Recent events:
🚨 [2026-05-29 01:09] EMERGENCY_ESCALATION
⚠️  [2026-05-29 01:08] DRUG_INTERACTION_ALERT
✅  [2026-05-29 01:06] MED_CONFIRMED — heart pill
✅  [2026-05-29 01:05] MED_CONFIRMED — pressure pill
🔍  [2026-05-29 01:05] MED_AMBIGUOUS
🔒  [2026-05-28 14:22] MED_DUPLICATE_BLOCKED — heart pill
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/preetharaj/Rx-Guardian" rel="noopener noreferrer"&gt;preetharaj/Rx-Guardian&lt;/a&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  My Tech Stack
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hermes Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The entire user-facing layer — receives messages and voice notes via Telegram gateway, manages conversation and session memory, runs the skill, delivers responses, schedules and sends proactive reminders via cron&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Safety pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pure Python — deterministic rules only, zero AI involvement in any safety decision&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Response formatting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenRouter free tier (nvidia/nemotron:free) — called by Hermes after the pipeline decides, only to rephrase output into warmer language&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Voice transcription&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;faster-whisper running locally — Hermes passes Telegram voice notes through it, no audio leaves the device&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SQLite with WAL mode — 4 tables, immutable audit log&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tests&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;pytest — 94 tests, all passing without any API key&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two rows in that table share responsibility for what most people would call "the AI part": Hermes Agent and Response formatting. They do entirely different things. Hermes Agent is the patient-facing intelligence — it manages the full conversation, remembers what was said two turns ago, routes voice notes, and delivers all messages through Telegram. Response formatting is a narrow utility step downstream, called only after the safety pipeline has already made its decision, and its only job is to turn a structured result like &lt;code&gt;{outcome: CONFIRMED, message: "..."}&lt;/code&gt; into a warmer sentence. A bad API response or a model hallucination in that step cannot change whether a dose is confirmed or an escalation fires — the pipeline already ran.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Python modules
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;lookup.py               - matches "heart pill" to Metoprolol, handles ambiguous descriptions
ambiguity_handler.py    - manages multi-turn clarification sessions with session keys
duplicate_guard.py      - checks audit log, blocks repeat doses within the 6-hour window
confidence_rules.py     - handles low-confidence voice transcriptions
emergency_escalation.py - 80+ trigger phrases across 5 categories, fires before anything else
caregiver_override.py   - explicit two-step correction with full audit trail
safety_router.py        - wires all the above together in a fixed priority order
reminder.py             - checks overdue doses and generates reminder messages
dispatch.py             - single CLI entry point that Hermes calls via terminal, returns JSON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
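
&lt;p&gt;To make "fixed priority order" concrete, here is a rough sketch of the idea, not the actual &lt;code&gt;safety_router.py&lt;/code&gt;. The check functions, the two trigger phrases, and the message strings are all made up for illustration. The only point is that the first check to return a result wins, so the emergency check always gets to run before any medication matching.&lt;/p&gt;

```python
# Illustrative data only; the real trigger list and medication table are larger.
EMERGENCY_PHRASES = ("took four pills", "overdose")
KNOWN_MEDS = {"heart pill": "Metoprolol"}


def emergency_check(text):
    """Hypothetical first stage: stop everything on a trigger phrase."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in EMERGENCY_PHRASES):
        return {"outcome": "ESCALATION", "message": "This sounds urgent."}
    return None


def lookup_check(text):
    """Hypothetical later stage: confirm a dose for a known nickname."""
    lowered = text.lower()
    for nickname in KNOWN_MEDS:
        if nickname in lowered:
            return {"outcome": "CONFIRMED", "message": f"Recorded your {nickname}."}
    return None


def route(text, checks):
    """Run the checks in order and return the first result that is not None."""
    for check in checks:
        result = check(text)
        if result is not None:
            return result
    return {"outcome": "UNKNOWN_MED", "message": "I did not recognise that medication."}


PIPELINE = (emergency_check, lookup_check)
print(route("I accidentally took four pills of my heart pill", PIPELINE)["outcome"])
```

&lt;p&gt;Because the order lives in one tuple, a test can assert that a message containing both an emergency phrase and a known medication never reaches the confirmation path.&lt;/p&gt;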



&lt;h2&gt;
  
  
  How Hermes Agent Powers This Project
&lt;/h2&gt;

&lt;p&gt;Hermes is not a wrapper around a prompt here. It is the patient-facing layer, the conversation manager, the voice transcription router, the scheduler, and the delivery channel. Without Hermes, this is a Python script that nobody can talk to.&lt;/p&gt;

&lt;p&gt;Let me be specific about each piece.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Telegram gateway handles voice and text
&lt;/h3&gt;

&lt;p&gt;When a patient sends a voice note to the Telegram bot, Hermes receives it through its native Telegram gateway integration. The gateway downloads the audio, passes it through faster-whisper for local transcription, and forwards the resulting text to the skill for processing. The patient never had to install anything beyond Telegram. They did not type a command or navigate a menu. They just sent a voice message the same way they would send one to a family member.&lt;/p&gt;

&lt;p&gt;This is Hermes's gateway doing real work. The voice-to-text pipeline, the Telegram connection, the message routing — all of it is handled by Hermes before the first line of safety Python runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The skill is the orchestration contract
&lt;/h3&gt;

&lt;p&gt;The project is packaged as a native Hermes skill stored in &lt;code&gt;hermes-skill/med-safety/SKILL.md&lt;/code&gt;. Dropping this file into &lt;code&gt;~/.hermes/skills/&lt;/code&gt; registers &lt;code&gt;/med-safety&lt;/code&gt; as a slash command and loads all the rules Hermes will follow for every interaction.&lt;/p&gt;

&lt;p&gt;The SKILL.md contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A step-by-step procedure for what Hermes does on each turn&lt;/li&gt;
&lt;li&gt;An outcome table mapping every pipeline result code to a specific response behaviour&lt;/li&gt;
&lt;li&gt;Hard rules Hermes must never break (&lt;code&gt;ESCALATION → deliver verbatim, nothing else&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;The path and argument format for calling &lt;code&gt;dispatch.py&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Alongside it is a &lt;code&gt;SOUL.md&lt;/code&gt; — Hermes's built-in personality system. This defines the voice of the assistant: calm, short sentences, everyday words, no medical jargon, one clear next step at the end of every message. The older adult on the other end of this conversation does not want to parse clinical language when they are worried about whether they double-dosed a blood thinner. The SOUL.md enforces that consistently across every response without repeating the instruction in every prompt.&lt;/p&gt;

&lt;p&gt;This is why a skill is a better fit than a system prompt. A SOUL.md + SKILL.md combination gives the agent a spec it treats like a contract, not a suggestion.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The terminal tool separates conversation from safety decisions
&lt;/h3&gt;

&lt;p&gt;Hermes calls &lt;code&gt;dispatch.py&lt;/code&gt; via its terminal tool on every turn. The script runs the full safety pipeline and returns a JSON result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outcome"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CONFIRMED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Got it. I've recorded that you took your heart pill at 08:14 UTC."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"session_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"log_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hermes reads the &lt;code&gt;outcome&lt;/code&gt; field and acts according to the rules in SKILL.md. It never has to decide whether a dose was safe, whether a combination is dangerous, or whether something is an emergency. Those decisions came back in the JSON. Hermes just delivers.&lt;/p&gt;

&lt;p&gt;This separation is intentional. The terminal tool pattern lets you put safety-critical logic in code you can test, audit, and reason about, while leaving the agent to do what it is actually good at: understanding natural language, managing conversation state, and delivering messages to a human.&lt;/p&gt;
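
&lt;p&gt;A stripped-down version of that entry point might look like the sketch below. The &lt;code&gt;--session&lt;/code&gt; flag and the four JSON fields come from this post; &lt;code&gt;run_pipeline&lt;/code&gt; is a hypothetical placeholder for the real safety pipeline, and the argument parsing is my assumption about one reasonable way to wire it.&lt;/p&gt;

```python
import argparse
import json


def run_pipeline(text, session_key=None):
    """Hypothetical stand-in for the safety pipeline; always returns the same shape."""
    return {
        "outcome": "UNKNOWN_MED",
        "message": f"Placeholder result for: {text}",
        "session_key": session_key,
        "log_id": None,
    }


def main(argv):
    """Parse CLI arguments and return the pipeline result as a JSON string."""
    parser = argparse.ArgumentParser(prog="dispatch.py")
    parser.add_argument("text", nargs="?", default="", help="what the patient said")
    parser.add_argument("--session", default=None, help="open clarification session key")
    args = parser.parse_args(argv)
    return json.dumps(run_pipeline(args.text, args.session))


# The agent only ever sees this JSON string on stdout.
print(main(["--session", "ambig_20260529_083200", "2"]))
```

&lt;p&gt;Keeping the contract to a single JSON object on stdout means the same script can be exercised from pytest, from a shell, or from the agent without any change.&lt;/p&gt;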

&lt;h3&gt;
  
  
  4. Session memory tracks multi-turn clarification
&lt;/h3&gt;

&lt;p&gt;When "the white pill" comes back AMBIGUOUS, the pipeline returns a &lt;code&gt;session_key&lt;/code&gt; — a unique identifier for the open clarification session. Hermes holds this key in its session memory and passes it automatically on the next turn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python dispatch.py &lt;span class="nt"&gt;--session&lt;/span&gt; &lt;span class="s2"&gt;"ambig_20260529_083200"&lt;/span&gt; &lt;span class="s2"&gt;"2"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The patient just said "2". They did not say which session they were answering, what question was asked, or which medications were in the list. Hermes remembered all of that across turns. The session_key mechanism works because Hermes's memory layer exists and I did not have to build a separate state store to use it.&lt;/p&gt;
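
&lt;p&gt;On the pipeline side, resolving a reply like "2" against an open session could look roughly like this. It is a sketch under assumptions: the function names and the in-memory dict are hypothetical, and since &lt;code&gt;dispatch.py&lt;/code&gt; is started fresh on every turn, a real implementation would need to keep the open sessions somewhere persistent such as the SQLite database.&lt;/p&gt;

```python
import itertools

_counter = itertools.count(1)
_open_sessions = {}  # session_key -> list of candidate medication names


def open_session(candidates):
    """Store the candidates and return a new key for the clarification session."""
    key = f"ambig_{next(_counter)}"
    _open_sessions[key] = list(candidates)
    return key


def resolve(session_key, reply):
    """Return the chosen medication, or None if the key or the reply is not valid."""
    candidates = _open_sessions.get(session_key)
    choice = reply.strip()
    if candidates is None or not choice.isdecimal():
        return None
    index = int(choice) - 1
    if index not in range(len(candidates)):
        return None
    del _open_sessions[session_key]  # a session can be answered only once
    return candidates[index]


key = open_session(["Metoprolol 50mg", "Lisinopril 10mg", "Amlodipine 5mg"])
print(resolve(key, "2"))
```

&lt;p&gt;An out-of-range answer leaves the session open so the patient can try again, while a valid answer closes it so the same key cannot confirm a second dose.&lt;/p&gt;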

&lt;h3&gt;
  
  
  5. Cron drives the proactive reminder loop
&lt;/h3&gt;

&lt;p&gt;Inside a Hermes session, this instruction registers a scheduled job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Create a cron job: every day at 08:30
  &lt;span class="nb"&gt;cd&lt;/span&gt; /path/to/project &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; python dispatch.py &lt;span class="nt"&gt;--remind&lt;/span&gt;
If result is not &lt;span class="o"&gt;[&lt;/span&gt;SILENT], send the message to Telegram.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;dispatch.py --remind&lt;/code&gt; runs the reminder engine, which compares each medication's scheduled time against the audit log and returns messages only for doses that are overdue and have not already had a reminder sent today. Hermes's cron scheduler runs this check and delivers the result through Telegram without any polling loop, background thread, or separate infrastructure.&lt;/p&gt;
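
&lt;p&gt;The overdue check itself is simple enough to sketch. The version below is an illustration rather than the code in &lt;code&gt;reminder.py&lt;/code&gt;: the function name, the dict used as a schedule, and the two sets standing in for audit log lookups are all assumptions.&lt;/p&gt;

```python
from datetime import datetime, time


def overdue_reminders(now, schedule, confirmed_today, reminded_today):
    """Return reminder messages for doses that are due, unconfirmed, and not yet reminded.

    schedule maps a medication nickname to its scheduled time of day.
    confirmed_today and reminded_today are sets of nicknames, standing in
    for queries against the audit log.
    """
    messages = []
    for med, due in sorted(schedule.items()):
        already_handled = med in confirmed_today or med in reminded_today
        if now.time() >= due and not already_handled:
            messages.append(f"Reminder: You haven't confirmed your {med} yet today.")
    return messages


schedule = {"blood thinner": time(8, 0), "pressure pill": time(20, 0)}
now = datetime(2026, 5, 29, 8, 30)
print(overdue_reminders(now, schedule, set(), set()))
```

&lt;p&gt;An empty list is the "nothing to send" case, which is what lets the scheduled job stay silent on days when every dose has been confirmed.&lt;/p&gt;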

&lt;p&gt;A morning reminder that reaches the patient before they have forgotten is more useful than one they have to ask for. Hermes cron made that easy. I expected to spend a day on this part. It took about an hour.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. The Telegram bot's own JobQueue as a redundancy layer
&lt;/h3&gt;

&lt;p&gt;The Telegram bot also runs an independent JobQueue check every 15 minutes through python-telegram-bot. If Hermes cron misses a window — restart, connectivity, anything — the bot sends the reminder anyway. A patient's medication reminder should not depend on a single point of failure. Two independent schedules pointing at the same &lt;code&gt;reminder.py&lt;/code&gt; logic is the right call.&lt;/p&gt;
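
&lt;p&gt;With two schedulers hitting the same logic, the thing to get right is that the patient receives one reminder, not two. One way to do that, sketched here with a hypothetical &lt;code&gt;reminders_sent&lt;/code&gt; table, is to let whichever scheduler arrives first claim the reminder for the day. The table and function names are assumptions; the project may track sent reminders differently.&lt;/p&gt;

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical reminders_sent table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reminders_sent (med TEXT, day TEXT, PRIMARY KEY (med, day))"
    )
    return conn


def claim_reminder(conn, med, day):
    """Return True only for the first caller that claims this (med, day) pair."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO reminders_sent (med, day) VALUES (?, ?)",
        (med, day),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted and 0 when the insert was ignored.
    return cursor.rowcount == 1


conn = make_db()
print(claim_reminder(conn, "blood thinner", "2026-05-29"))  # first scheduler
print(claim_reminder(conn, "blood thinner", "2026-05-29"))  # second scheduler
```

&lt;p&gt;The primary key does the deduplication, so neither scheduler needs to know the other exists.&lt;/p&gt;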




&lt;h2&gt;
  
  
  The architecture in one diagram
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Patient voice note or text (Telegram)
    │
    ▼
Hermes Agent — Telegram gateway
    │  receives message, transcribes voice via faster-whisper,
    │  loads med-safety skill, reads SOUL.md personality
    │
    ▼
Hermes calls terminal: python dispatch.py "I took my heart pill"
    │
    ▼
safety_router.py — deterministic Python, no AI
    ├── emergency_escalation.py  — 80+ unsafe keywords, fires first
    ├── confidence_rules.py      — low-confidence STT → ask to repeat
    ├── lookup.py                — 4-pass medication matching
    ├── ambiguity_handler.py     — 2+ matches → clarification session
    └── duplicate_guard.py       — confirmed in last 6h → blocked
    │
    ▼ JSON: {outcome, message, session_key, log_id}
    │
    ▼
If outcome == ESCALATION:
    Hermes delivers message verbatim. Response formatting bypassed.
Else:
    OpenRouter rephrases into SOUL.md-consistent language
    Hermes delivers to Telegram
    │
    ▼
Audit log (SQLite) — immutable, every event written, nothing deleted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Escalation messages never reach OpenRouter. The moment &lt;code&gt;format_response()&lt;/code&gt; sees &lt;code&gt;outcome == ESCALATION&lt;/code&gt;, it returns the message as-is. The sentence telling someone to call emergency services will never be softened by a model trying to sound less alarming.&lt;/p&gt;
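
&lt;p&gt;In code, the bypass is a single early return. The sketch below shows the shape of it; &lt;code&gt;rephrase&lt;/code&gt; is a hypothetical callable wrapping the OpenRouter request, and falling back to the pipeline's own wording when that call fails is my assumption, not something taken from the repo.&lt;/p&gt;

```python
def format_response(result, rephrase):
    """Return the text to deliver for a pipeline result.

    result is the dict produced by the pipeline. rephrase is a hypothetical
    callable that sends a message to the language model and returns warmer text.
    """
    if result["outcome"] == "ESCALATION":
        # Escalations are delivered verbatim and never sent to the model.
        return result["message"]
    try:
        return rephrase(result["message"])
    except Exception:
        # Assumed fallback: a failed model call must not block delivery.
        return result["message"]


calls = []


def fake_rephrase(message):
    """Test double that records what would have been sent to the model."""
    calls.append(message)
    return "Warm: " + message


print(format_response({"outcome": "ESCALATION", "message": "Please call 995."}, fake_rephrase))
```

&lt;p&gt;Passing the model call in as a parameter keeps this function testable without an API key, which fits the claim that the whole test suite runs offline.&lt;/p&gt;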




&lt;h2&gt;
  
  
  Safety rules, tested in code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; pytest tests/ &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;span class="c"&gt;# 94 passed in 2.4s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every safety property has a test. A few that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"the white pill" with three candidates → AMBIGUOUS, nothing logged yet&lt;/li&gt;
&lt;li&gt;Same medication confirmed twice within 6 hours → DUPLICATE_BLOCKED&lt;/li&gt;
&lt;li&gt;MED_UNCERTAIN in the log → also blocks a second dose attempt&lt;/li&gt;
&lt;li&gt;"I accidentally took four pills" → EMERGENCY_ESCALATION, message contains "995"&lt;/li&gt;
&lt;li&gt;"I want to take ibuprofen" → DRUG_INTERACTION_ALERT, no dose confirmed&lt;/li&gt;
&lt;li&gt;"I took fish oil" → SUPPLEMENT response, never logged as a prescription&lt;/li&gt;
&lt;li&gt;"I took it again" → UNKNOWN_MED, not confirmed as insulin&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one was a real bug. "I took it again" strips down to "it" through the intent extractor, and "it" fuzzy-matched "insulin" because both contain the same token. One silent wrong confirmation of a critical medication. The fix was a stopword list that prevents pronouns from reaching the medication matcher at all.&lt;/p&gt;
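
&lt;p&gt;Here is a minimal sketch of that kind of filter. The word list is illustrative and much shorter than anything you would ship, and &lt;code&gt;medication_terms&lt;/code&gt; is a hypothetical name rather than a function from the repo. The idea is that the matcher only ever sees words that could plausibly name a medication.&lt;/p&gt;

```python
import string

# Illustrative stopword list: pronouns plus a few filler words.
STOPWORDS = {"i", "it", "them", "that", "this", "my", "the", "a", "took", "again"}


def medication_terms(text):
    """Return the lowercased words that are allowed to reach the medication matcher."""
    words = [word.strip(string.punctuation).lower() for word in text.split()]
    return [word for word in words if word and word not in STOPWORDS]


print(medication_terms("I took it again"))
print(medication_terms("I took my insulin"))
```

&lt;p&gt;When the filtered list comes back empty, the router has nothing to match and can answer UNKNOWN_MED instead of guessing.&lt;/p&gt;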

&lt;h2&gt;
  
  
  Running it yourself
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/preetharaj/Rx-Guardian.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Rx-Guardian
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"python-telegram-bot[job-queue]"&lt;/span&gt; faster-whisper

&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# OPENROUTER_API_KEY — free at openrouter.ai, no card required&lt;/span&gt;
&lt;span class="c"&gt;# TELEGRAM_BOT_TOKEN — free from @BotFather on Telegram&lt;/span&gt;

python seed.py
python &lt;span class="nt"&gt;-m&lt;/span&gt; pytest tests/ &lt;span class="nt"&gt;-v&lt;/span&gt;
python cli.py &lt;span class="nt"&gt;--bot&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Send &lt;code&gt;/start&lt;/code&gt; to your Telegram bot. Say "I took my heart pill" by voice. Watch it land in the logs.&lt;/p&gt;

&lt;p&gt;To run the full Hermes experience with the &lt;code&gt;/med-safety&lt;/code&gt; slash command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; hermes-skill/med-safety ~/.hermes/skills/
hermes &lt;span class="nt"&gt;-s&lt;/span&gt; med-safety
&lt;span class="c"&gt;# then type: /med-safety I took my heart pill&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Three things I did not expect
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Hermes's voice note handling eliminated an entire integration problem.&lt;/strong&gt; I expected to write a custom webhook, a file download handler, and audio conversion code. The Telegram gateway handled all of that. I only had to write the transcription step and wire it to the pipeline. The hard part of "voice input on Telegram" was not hard at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SKILL.md is a more reliable contract than a system prompt.&lt;/strong&gt; A well-written skill with outcome tables, hard rules, and a separate SOUL.md produces consistent behaviour across hundreds of turns. A long system prompt drifts. The skill acts like a spec the agent is trying to satisfy, not a tone it vaguely remembers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cron + Telegram is a complete proactive notification system.&lt;/strong&gt; I expected to need a separate scheduler service, a notification database, and probably a Redis queue. I ended up with a &lt;code&gt;--remind&lt;/code&gt; flag on a Python script and three lines in a Hermes cron definition. The proactive reminder feature — the one that might actually save someone from a missed dose — was the simplest part of the whole build.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is next
&lt;/h2&gt;

&lt;p&gt;First up is TTS, so the system can speak the confirmation back to patients who find reading difficult. After that comes a caregiver summary view showing the week's log in plain language, and a way for caregivers to add or edit medications through the Telegram chat itself. Eventually I want refill tracking: a warning when a medication is running low, before the patient runs out entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hermes is the voice, the memory, the scheduler, and the delivery channel. Python is the safety brain. One without the other is incomplete. Together they form something worth deploying in a real home.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
      <category>healthcare</category>
    </item>
    <item>
      <title>Inside an MCP-Native Content Workflow Engine — Here's What Actually Broke (and What Finally Made Sense)</title>
      <dc:creator>Preetha</dc:creator>
      <pubDate>Wed, 27 May 2026 12:45:51 +0000</pubDate>
      <link>https://dev.to/rpreetha/inside-an-mcp-native-content-workflow-engine-heres-what-actually-broke-and-what-finally-made-85a</link>
      <guid>https://dev.to/rpreetha/inside-an-mcp-native-content-workflow-engine-heres-what-actually-broke-and-what-finally-made-85a</guid>
      <description>&lt;p&gt;I started this project thinking, &lt;em&gt;"Let me try this MCP thing."&lt;/em&gt; I didn't expect to end up rebuilding how I think about workflow automation, AI integration, and what it really means to build infrastructure instead of just tools.&lt;/p&gt;

&lt;p&gt;This isn't a tutorial; the repo &lt;a href="https://github.com/preetharaj/ContentOps-MCP" rel="noopener noreferrer"&gt;ContentOps-MCP&lt;/a&gt; has a 7-day structured tutorial for that. This is a more honest account of the decisions I made, the ones that broke things, and the lessons that actually matter for your next project.&lt;/p&gt;

&lt;p&gt;Think of it like a &lt;strong&gt;&lt;em&gt;detective story&lt;/em&gt;&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
each phase of the build was a clue. Some led to dead ends. Some changed the whole direction. By the end, the architecture finally made sense.&lt;/p&gt;
&lt;h2&gt;
  
  
  The problem I was actually trying to solve
&lt;/h2&gt;

&lt;p&gt;Content teams don't waste time writing. They waste time on the plumbing around the writing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checking if the meta description is there.
&lt;/li&gt;
&lt;li&gt;Pasting the draft link into Slack.
&lt;/li&gt;
&lt;li&gt;Remembering to send the newsletter email.
&lt;/li&gt;
&lt;li&gt;Discovering two hours after publishing that paragraph three has a broken link.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams patch this with Zapier or n8n. Notion → WordPress → Slack. Done. It works — &lt;strong&gt;until it doesn't&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Until someone publishes a half-finished draft because the automation didn't care.&lt;br&gt;&lt;br&gt;
Until the SEO title is 90 characters because nothing checked.&lt;br&gt;&lt;br&gt;
Until the brand voice that took six months to establish gets eroded one auto-published post at a time.&lt;/p&gt;

&lt;p&gt;The automation wasn't the problem. &lt;strong&gt;What it lacked was any sense of quality.&lt;/strong&gt; It just moved things around.&lt;/p&gt;

&lt;p&gt;So I started with one question:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What if the workflow itself could check whether the content was ready before it moved?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;The project is called &lt;strong&gt;ContentOps MCP Orchestrator&lt;/strong&gt;. The short version looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Notion draft → QA gate → WordPress → Slack → email
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the interesting part isn't the pipeline. It's the &lt;strong&gt;QA gate&lt;/strong&gt; sitting in the middle of it, and the architecture underneath that makes the whole thing composable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: get the pipeline working
&lt;/h3&gt;

&lt;p&gt;I started with a straightforward FastAPI app. Notion polling, a workflow runner that called step functions in order, SQLite for run history, and a basic static UI. Nothing fancy.&lt;/p&gt;

&lt;p&gt;The point was to get the pipeline working end-to-end and understand what data needed to flow between steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2: make everything MCP-first
&lt;/h3&gt;

&lt;p&gt;Next, I refactored every integration — WordPress, Slack, Resend, Notion — into its own &lt;strong&gt;MCP server&lt;/strong&gt;. Each one exposed &lt;code&gt;/tools&lt;/code&gt; and &lt;code&gt;/use_tool&lt;/code&gt; endpoints. The orchestrator became an MCP client, dispatching tool calls instead of importing Python functions directly.&lt;/p&gt;

&lt;p&gt;I added a &lt;code&gt;ServerRegistry&lt;/code&gt; that tried the remote server first and fell back to local mock responses if it wasn't running.&lt;/p&gt;

&lt;p&gt;Suddenly the system felt different. Adding a new integration didn't mean editing the runner. It meant writing a new server.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: add the QA gate and registry
&lt;/h3&gt;

&lt;p&gt;Then came the QA gate. It's an &lt;strong&gt;11-agent local scoring engine&lt;/strong&gt; that runs on every workflow before the publish step. Its checks include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SEO title completeness
&lt;/li&gt;
&lt;li&gt;Meta description
&lt;/li&gt;
&lt;li&gt;Broken links
&lt;/li&gt;
&lt;li&gt;Readability
&lt;/li&gt;
&lt;li&gt;Brand voice consistency (against a rubric you define)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It returns a score, a pass/fail, and specific suggestions. The workflow either pauses for human review or continues automatically depending on the mode you configure.&lt;/p&gt;

&lt;p&gt;I also added a curated registry of &lt;strong&gt;14 content-stack MCP servers&lt;/strong&gt; (Ghost, Beehiiv, Substack, WordPress, Webflow, Resend, Loops, Mailchimp, Linear, Notion, Coda, and more).&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture decision that changed everything
&lt;/h2&gt;

&lt;p&gt;Here's the thing about building a workflow tool. The naive approach is to write a runner that knows how to call each integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wordpress&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;call_wordpress&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;call_slack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works. It's also a &lt;strong&gt;dead end&lt;/strong&gt;, because every new integration means touching the runner.&lt;/p&gt;

&lt;p&gt;The MCP model inverts this. The runner doesn't know anything about WordPress. The WordPress MCP server knows about WordPress. The runner just knows how to make a tool call and handle the result.&lt;/p&gt;

&lt;p&gt;You can add Ghost, Beehiiv, Linear, anything — as long as it speaks the protocol, the runner doesn't care.&lt;/p&gt;

&lt;p&gt;This sounds obvious in retrospect. &lt;strong&gt;It wasn't obvious when I was writing step 1.&lt;/strong&gt;&lt;/p&gt;
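
&lt;p&gt;To make the inversion concrete, here is a minimal sketch of a runner that only knows how to dispatch a tool call. The &lt;code&gt;ToolCall&lt;/code&gt; dataclass, the &lt;code&gt;Runner&lt;/code&gt; class, and the fake WordPress handler are hypothetical names for illustration, not the project's actual code. In the real system the handler would presumably be a request to the server's &lt;code&gt;/use_tool&lt;/code&gt; endpoint.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# A handler stands in for one MCP server's tool. In the real system this
# would be a request to that server's /use_tool endpoint (assumption).
Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class ToolCall:
    server: str
    tool: str
    params: Dict[str, Any] = field(default_factory=dict)


class Runner:
    """Dispatches tool calls without knowing what any server does."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, server: str, tool: str, handler: Handler) -> None:
        self._handlers[f"{server}::{tool}"] = handler

    def run(self, call: ToolCall) -> Dict[str, Any]:
        key = f"{call.server}::{call.tool}"
        if key not in self._handlers:
            raise KeyError(f"no handler registered for {key}")
        return self._handlers[key](call.params)


def fake_create_draft(params: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for the WordPress MCP server.
    return {"status": "draft", "title": params["title"]}


runner = Runner()
runner.register("wordpress-mcp", "create_draft", fake_create_draft)
result = runner.run(ToolCall("wordpress-mcp", "create_draft", {"title": "Hello"}))
print(result)
```

&lt;p&gt;Adding a new integration in this sketch is one more &lt;code&gt;register&lt;/code&gt; call. The &lt;code&gt;run&lt;/code&gt; method never changes.&lt;/p&gt;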

&lt;h3&gt;
  
  
  🔍 Example: MCP tool call vs HTTP call
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This is the turning point where the architecture finally clicked.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;❌ Old way (HTTP / Python function):&lt;/p&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wordpress&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;create_draft&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;post_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every new integration means touching this file. Every new step type means more branching logic. Hard to test. Hard to extend.&lt;/p&gt;

&lt;p&gt;✅ New way (MCP tool call):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wordpress-mcp&lt;/span&gt;
    &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;create_draft&lt;/span&gt;
    &lt;span class="na"&gt;input_map&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{trigger.pages[0].title}"&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{trigger.pages[0].body}"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;slack-mcp&lt;/span&gt;
    &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;post_message&lt;/span&gt;
    &lt;span class="na"&gt;input_map&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;draft&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;published:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{steps[0].url}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator doesn't know what WordPress is. It just makes a tool call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wordpress-mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"create_draft"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"How to Build an MCP-Native Content Stack"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Add Ghost? Add another server. Add Beehiiv? Add another server. &lt;strong&gt;The orchestrator stays the same. That's the inversion.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the QA gate: what I got wrong first
&lt;/h2&gt;

&lt;p&gt;My first attempt at the QA gate was a single function that checked everything. One blob of logic, one result. It was fast to write and immediately painful to extend.&lt;/p&gt;

&lt;p&gt;The problem is that content quality isn't one thing. SEO is a different concern from readability. Brand voice is different from link validation. Mixing them into one function meant changing one thing risked breaking another, and the output was a wall of undifferentiated feedback that didn't tell you what to fix first.&lt;/p&gt;

&lt;p&gt;The second version split it into &lt;strong&gt;11 specialized agents&lt;/strong&gt;. Each agent inherits from &lt;code&gt;BaseQAAgent&lt;/code&gt;, implements a &lt;code&gt;check()&lt;/code&gt; method, and returns a &lt;code&gt;QAResult&lt;/code&gt; with a severity level and specific issues. The scoring engine aggregates them with configurable weights.&lt;/p&gt;
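
&lt;p&gt;A rough sketch of that shape, under stated assumptions: &lt;code&gt;BaseQAAgent&lt;/code&gt;, &lt;code&gt;check()&lt;/code&gt;, and &lt;code&gt;QAResult&lt;/code&gt; are the names from the project, but the fields, the two example agents, the 60-character title limit, and the &lt;code&gt;score&lt;/code&gt; function are my illustration here rather than the real implementation.&lt;/p&gt;

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QAResult:
    agent: str
    passed: bool
    severity: str  # assumed levels: "info", "warning", "error"
    issues: List[str] = field(default_factory=list)


class BaseQAAgent(ABC):
    name = "base"

    @abstractmethod
    def check(self, draft: Dict[str, str]) -> QAResult:
        ...


class MetaDescriptionAgent(BaseQAAgent):
    name = "meta_description"

    def check(self, draft: Dict[str, str]) -> QAResult:
        if draft.get("meta_description", "").strip():
            return QAResult(self.name, True, "info")
        return QAResult(self.name, False, "error", ["Meta description is missing."])


class TitleLengthAgent(BaseQAAgent):
    name = "seo_title"
    max_chars = 60  # assumed limit, not taken from the project

    def check(self, draft: Dict[str, str]) -> QAResult:
        length = len(draft.get("title", ""))
        if length == 0 or length > self.max_chars:
            return QAResult(self.name, False, "warning",
                            [f"Title length {length} is outside 1-{self.max_chars}."])
        return QAResult(self.name, True, "info")


def score(results: List[QAResult], weights: Dict[str, float]) -> float:
    """Weighted share of passing agents, from 0.0 to 1.0."""
    total = sum(weights.get(r.agent, 1.0) for r in results)
    earned = sum(weights.get(r.agent, 1.0) for r in results if r.passed)
    return earned / total if total else 0.0


draft = {"title": "My Draft Post", "meta_description": ""}
agents = [MetaDescriptionAgent(), TitleLengthAgent()]
results = [agent.check(draft) for agent in agents]
overall = score(results, {"meta_description": 2.0, "seo_title": 1.0})
print(round(overall, 2), [issue for r in results for issue in r.issues])
```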

&lt;p&gt;You can add a new agent without touching any existing one.&lt;/p&gt;

&lt;p&gt;This is the open/closed principle, yes. But the deeper lesson was about &lt;strong&gt;feedback granularity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A QA check that says "this draft needs work" is useless.&lt;br&gt;&lt;br&gt;
A check that says "your meta description is missing, your third internal link returns 404, and your average sentence length is 28 words, which is long for a technical audience" is actionable.&lt;/p&gt;
&lt;h3&gt;
  
  
  🔍 Example: QA gate — before vs after
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This is where the QA gate actually does something useful.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;Imagine this draft comes in from Notion:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Draft&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Post"&lt;/span&gt;
&lt;span class="na"&gt;meta_description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Testing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;something.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sure&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;if&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;this&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;will&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;work."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The QA gate checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Title: too vague → fails&lt;/li&gt;
&lt;li&gt;Meta: missing → fails&lt;/li&gt;
&lt;li&gt;Readability: too shallow for a technical audience → fails&lt;/li&gt;
&lt;li&gt;Brand voice: informal, unclear → fails&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pass"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Title is generic and vague, meta description is missing, brand voice is not professional or technical, and readability is too shallow for the target audience."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"suggestions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Choose a more specific, technical title."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Write a clear meta description summarizing the article."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Add concrete examples and structure your content with sections."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The workflow pauses. The editor sees this reasoning and chooses to "retry after edit" instead of publishing a weak draft.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's the difference between "automation moved things" and "automation checked quality first."&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The fallback pattern: underrated engineering
&lt;/h2&gt;

&lt;p&gt;One of the quieter decisions in this project was the &lt;code&gt;ServerRegistry&lt;/code&gt; fallback. When the orchestrator needs to call &lt;code&gt;wordpress-mcp::create_draft&lt;/code&gt;, it first tries the actual MCP server running on port 8002. If that's not running, it falls back to a local mock that returns a realistic response shape.&lt;/p&gt;

&lt;p&gt;This sounds like a convenience feature for demos. It's actually a &lt;strong&gt;correctness feature for development&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Without fallback, developing the orchestrator logic requires all four MCP servers to be running simultaneously. With fallback, you can work on the executor, the QA gate, the UI, and the run trace logic independently — the system behaves consistently regardless of which servers are actually up.&lt;/p&gt;

&lt;p&gt;The lesson:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Design your system to be testable in parts, not just as a whole.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The fallback pattern is one way to do that. It forces you to define what a "realistic response" looks like for each tool, which in turn forces you to think clearly about your data contracts. The mock becomes a kind of documentation, because it defines the shape of the real response.&lt;/p&gt;
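
&lt;p&gt;Here is one way the fallback could look in code. This is a sketch, not the project's &lt;code&gt;ServerRegistry&lt;/code&gt;: the injected &lt;code&gt;transport&lt;/code&gt; callable, the &lt;code&gt;register_mock&lt;/code&gt; method, and the extra &lt;code&gt;source&lt;/code&gt; field are assumptions I'm making so the example runs without any server.&lt;/p&gt;

```python
from typing import Any, Callable, Dict

Params = Dict[str, Any]
Transport = Callable[[str, str, Params], Params]
Mock = Callable[[Params], Params]


class ServerRegistry:
    """Tries the remote MCP server first, then falls back to a local mock."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self._mocks: Dict[str, Mock] = {}

    def register_mock(self, server: str, tool: str, mock: Mock) -> None:
        self._mocks[f"{server}::{tool}"] = mock

    def call(self, server: str, tool: str, params: Params) -> Params:
        try:
            result = self._transport(server, tool, params)
            source = "remote"
        except ConnectionError:
            # Server is not running: use the mock with the same response shape.
            result = self._mocks[f"{server}::{tool}"](params)
            source = "mock"
        # "source" is an extra field in this sketch so a trace shows which path ran.
        return {"server": server, "tool": tool, "result": result, "source": source}


def offline_transport(server: str, tool: str, params: Params) -> Params:
    # Simulates a server that is down; a real transport would make an HTTP request.
    raise ConnectionError(f"{server} is not running")


def mock_create_draft(params: Params) -> Params:
    return {"slug": "mock-draft", "url": "https://blog.example.com/mock-draft", "status": "draft"}


registry = ServerRegistry(offline_transport)
registry.register_mock("wordpress-mcp", "create_draft", mock_create_draft)
response = registry.call("wordpress-mcp", "create_draft", {"title": "Hello"})
print(response["source"], response["result"]["status"])
```

&lt;p&gt;Injecting the transport keeps the registry itself testable: swapping in a transport that always raises &lt;code&gt;ConnectionError&lt;/code&gt; exercises the fallback path deterministically.&lt;/p&gt;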

&lt;h3&gt;
  
  
  🔍 Example: Fallback mock vs real server
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This is why the fallback pattern is underrated.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Real server response (if running):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wordpress-mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"create_draft"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"slug"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp-native-content-stack"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://blog.example.com/mcp-native-content-stack"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"draft"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fallback mock (if server is not running):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wordpress-mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"create_draft"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"slug"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp-native-content-stack-mock"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://blog.example.com/mock-draft"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"draft"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator doesn't care which one it gets. The shape is the same. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can test the QA gate without WordPress running.&lt;/li&gt;
&lt;li&gt;You can test the UI without Slack running.&lt;/li&gt;
&lt;li&gt;You can test the run trace without any servers running.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The mock becomes documentation because it forces you to define what the real response &lt;em&gt;should&lt;/em&gt; look like.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What MCP actually is (and why it matters right now)
&lt;/h2&gt;

&lt;p&gt;If you haven't been following the MCP ecosystem, here's the quick version.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; is an open protocol by Anthropic that standardizes how AI models interact with external tools and data sources. Instead of every LLM integration being a bespoke API adapter, MCP gives you a common interface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tools with schemas
&lt;/li&gt;
&lt;li&gt;Resources with URIs
&lt;/li&gt;
&lt;li&gt;A client/server model that works across implementations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The practical implication for builders:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;You can write a tool once and have any MCP-compatible client call it.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude Desktop, your custom orchestrator, someone else's agent: they all speak the same protocol. This is a big deal for infrastructure builders because it means your MCP server has leverage beyond your own project.&lt;/p&gt;

&lt;p&gt;I built contentops-mcp before the MCP ecosystem fully matured, which meant making some decisions based on where I thought the protocol was going rather than where it was. That's a risk. But it's also an opportunity.&lt;/p&gt;

&lt;p&gt;Being early in an ecosystem means the registry of content-stack MCP servers I built has a real chance of becoming the reference catalog for this niche, simply because almost nothing else exists there yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons worth keeping
&lt;/h2&gt;

&lt;p&gt;These are the ones I'd tell myself on day one if I could.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Build the trace first.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
I added detailed run tracing — per-step status, input/output capture, timestamps — later than I should have. Every bug I hit before that required reading logs and mentally reconstructing what happened. After I had the trace, debugging became obvious. Build observability into the data model from the start, not as an afterthought.&lt;/p&gt;
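
&lt;p&gt;For a sense of what "observability in the data model" can mean, here is a small sketch of a per-step trace record. The &lt;code&gt;StepTrace&lt;/code&gt; and &lt;code&gt;RunTrace&lt;/code&gt; classes, their fields, and the status values are hypothetical; the project's actual trace schema may differ.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional


@dataclass
class StepTrace:
    address: str              # e.g. "wordpress-mcp::create_draft"
    status: str = "pending"   # assumed states: pending, ok, failed
    input: Dict[str, Any] = field(default_factory=dict)
    output: Optional[Dict[str, Any]] = None
    error: Optional[str] = None
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None


@dataclass
class RunTrace:
    steps: List[StepTrace] = field(default_factory=list)

    def record(self, address: str, params: Dict[str, Any],
               fn: Callable[[Dict[str, Any]], Dict[str, Any]]) -> StepTrace:
        step = StepTrace(address=address, input=dict(params),
                         started_at=datetime.now(timezone.utc))
        try:
            step.output = fn(params)
            step.status = "ok"
        except Exception as exc:  # capture the failure instead of losing it
            step.status = "failed"
            step.error = str(exc)
        step.finished_at = datetime.now(timezone.utc)
        self.steps.append(step)
        return step


def failing_step(params: Dict[str, Any]) -> Dict[str, Any]:
    raise RuntimeError("slack unreachable")


trace = RunTrace()
trace.record("wordpress-mcp::create_draft", {"title": "Hello"}, lambda p: {"status": "draft"})
trace.record("slack-mcp::post_message", {"text": "hi"}, failing_step)
print([(s.address, s.status, s.error) for s in trace.steps])
```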

&lt;p&gt;&lt;strong&gt;2. Real numbers beat descriptions.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
"11-agent quality checks" and "14-server content-stack registry" are specific. "AI-powered quality checks" and "growing registry" are not. Specificity signals that you actually built the thing and counted it. Use real numbers everywhere you can.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The mock is a contract.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Every time I wrote a fallback mock response, I was forced to decide what the real response would look like. That decision often uncovered inconsistencies in my data model before I'd written the real implementation. Write your mocks early and treat them as interface definitions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. One entry point, one command.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The project went through a phase where running it required five terminal windows: one for each of the four MCP servers, plus one for the orchestrator. That's fine for production but terrible for a learner's first experience. The &lt;code&gt;ServerRegistry&lt;/code&gt; fallback pattern collapsed it to one command. Always think about what the first five minutes look like for someone who just cloned the repo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Separate the learning path from the feature list.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The 7-day tutorial structure forced me to think about which concepts depended on which other concepts, and in what order they should be introduced. That exercise — independently of any tutorial — made the architecture cleaner. If you can't explain the build order to a beginner, your dependencies probably aren't as clean as you think.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Don't wrap, invert.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The temptation when building on top of existing integrations is to wrap them. "I'll write a wrapper for the WordPress API." The better question is: who should know about WordPress? The answer is: the WordPress MCP server, and nothing else. Inversion of knowledge — pushing integration-specific logic to the edge — is what makes the core stay clean.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Naming is architecture.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;qa-gate::run_check&lt;/code&gt; as a tool call address tells you the server, the tool, and the action in one string. It's readable in logs. It's greppable. It matches your file structure. Naming your MCP tool calls with the &lt;code&gt;server::tool&lt;/code&gt; convention costs nothing and saves enormous cognitive load when you are reading a run trace at midnight trying to figure out why step 3 failed.&lt;/p&gt;
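
&lt;p&gt;The convention is also trivial to parse. A minimal sketch, where &lt;code&gt;ToolAddress&lt;/code&gt; and &lt;code&gt;parse_address&lt;/code&gt; are hypothetical helper names rather than anything from the project:&lt;/p&gt;

```python
from typing import NamedTuple


class ToolAddress(NamedTuple):
    server: str
    tool: str


def parse_address(address: str) -> ToolAddress:
    """Split 'server::tool' into its two parts, rejecting malformed input."""
    server, sep, tool = address.partition("::")
    if not sep or not server or not tool:
        raise ValueError(f"expected 'server::tool', got {address!r}")
    return ToolAddress(server, tool)


addr = parse_address("qa-gate::run_check")
print(addr.server, addr.tool)
```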

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;The project has a roadmap that goes in a direction I am genuinely excited about.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Phase 4&lt;/strong&gt;: visual workflow editor and team-based approval gates — the product layer on top of the infrastructure.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 5&lt;/strong&gt;: hosted registry with premium MCP server adapters for enterprise stacks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the thing I am most interested in is the registry itself. There is no curated, tested, versioned catalog of content-stack MCP servers anywhere in the current ecosystem. The one in this project is a start — 14 servers across publishing, email, and ops categories.&lt;/p&gt;

&lt;p&gt;Getting that to 50, with real installation paths and verified tool schemas, would make it the kind of catalog that becomes a reference point for the whole content-ops vertical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to find it
&lt;/h2&gt;

&lt;p&gt;The project is on GitHub at &lt;a href="https://github.com/preetharaj/ContentOps-MCP" rel="noopener noreferrer"&gt;ContentOps-MCP&lt;/a&gt; with a zero-to-running quick start, a 7-day tutorial, and full docs for the QA gate and registry.&lt;/p&gt;

&lt;p&gt;The one-line pitch:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Zero Infra Cost — MCP-native ContentOps orchestrator for AI publishing workflows.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Notion → QA Gate → WordPress → Slack → Resend.&lt;br&gt;&lt;br&gt;
Built-in 11-agent quality checks, 14-server content-stack registry. Self-hosted.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you are building in the MCP ecosystem, working on content tooling, or just want a concrete project to learn FastAPI and workflow orchestration from scratch — the tutorial is designed to take you from day one to a working demo in a week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I didn't want to build another automation tool. I wanted to build a content system that could reason about quality before it published anything.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The case isn't closed.&lt;/strong&gt; If you are working on anything in the MCP space, leave a breadcrumb in the comments — I want to know what tools people are wiring together right now before the trail goes cold.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>contentops</category>
      <category>aiops</category>
      <category>fastapi</category>
    </item>
    <item>
      <title>Exploring AI workflow Orchestration: Comparing Weft, Python &amp; Alternative Pipeline Approaches</title>
      <dc:creator>Preetha</dc:creator>
      <pubDate>Mon, 25 May 2026 06:12:48 +0000</pubDate>
      <link>https://dev.to/rpreetha/exploring-ai-workflow-orchestration-comparing-weft-python-alternative-pipeline-approaches-82g</link>
      <guid>https://dev.to/rpreetha/exploring-ai-workflow-orchestration-comparing-weft-python-alternative-pipeline-approaches-82g</guid>
      <description>&lt;p&gt;A few weeks ago I started exploring something that made me rethink how we build AI workflows. Most of us naturally reach for &lt;strong&gt;Python&lt;/strong&gt; when building AI systems. I do too. Python gives flexibility, full control, and honestly it powers almost everything in modern AI stacks. But while experimenting, I came across &lt;strong&gt;Weft from Weaver Mind AI&lt;/strong&gt;. Weft is still in development, but the idea behind it caught my attention.&lt;/p&gt;

&lt;p&gt;Instead of thinking purely in terms of code execution, Weft focuses more on &lt;strong&gt;orchestration&lt;/strong&gt; — how AI models, APIs, retrieval systems, humans, and tools coordinate together as a workflow.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python&lt;/strong&gt; is amazing at building logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weft&lt;/strong&gt; feels like it is trying to improve how AI workflows themselves are structured.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That got me curious.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/PreethaRaj/TokenWeaver-Lab/releases/download/v1.0.0/Demo.gif" rel="noopener noreferrer"&gt;https://github.com/PreethaRaj/TokenWeaver-Lab/releases/download/v1.0.0/Demo.gif&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Experiment Setup
&lt;/h2&gt;

&lt;p&gt;So I decided to experiment with it in one of my projects.&lt;/p&gt;

&lt;p&gt;I built a &lt;strong&gt;local-first research synthesis demo&lt;/strong&gt; to compare different orchestration styles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Weft-style orchestration&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Traditional Python full-buffer processing&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Staged pipeline executor&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval, summarization, and synthesis handled as isolated sequential steps&lt;/li&gt;
&lt;li&gt;No shared orchestration state&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;MapReduce-style orchestration&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Distributed &lt;em&gt;map&lt;/em&gt; stages for retrieval and synthesis&lt;/li&gt;
&lt;li&gt;Followed by aggregation and reduction steps&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The project focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Token usage&lt;/li&gt;
&lt;li&gt;Cost visibility&lt;/li&gt;
&lt;li&gt;Orchestration efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4sk0s82p16rezyz89mv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4sk0s82p16rezyz89mv.png" alt="Weft Orchestration Cost Analysis" width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Goal
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Can changing orchestration patterns alone impact token usage and cost behaviour?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Experiment Design
&lt;/h2&gt;

&lt;p&gt;Some things I experimented with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shared retrieval pipelines to keep comparisons fair&lt;/li&gt;
&lt;li&gt;Token and cost visibility between orchestration approaches&lt;/li&gt;
&lt;li&gt;Deterministic local execution for reproducible experiments&lt;/li&gt;
&lt;li&gt;Research synthesis workflows without introducing API costs into evaluation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Measurement Methodology
&lt;/h2&gt;

&lt;p&gt;Token usage was computed by tracking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total prompt tokens&lt;/li&gt;
&lt;li&gt;Generated completion tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;across each orchestration path.&lt;/p&gt;

&lt;p&gt;Cost estimates were normalized using equivalent model pricing assumptions so orchestration differences could be isolated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reduction Formula
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;((baseline token usage - orchestration token usage) / baseline token usage) × 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Baseline&lt;/strong&gt;: Traditional Python full-buffer execution&lt;/li&gt;
&lt;li&gt;In the baseline, larger context windows were repeatedly passed between stages&lt;/li&gt;
&lt;/ul&gt;
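
&lt;p&gt;As a minimal sketch of the formula above, assuming token usage for a run is the sum of its prompt and completion tokens. The function name &lt;code&gt;token_reduction_percent&lt;/code&gt; and the token counts are hypothetical and are not measurements from the project.&lt;/p&gt;

```python
def token_reduction_percent(baseline_tokens: int, orchestration_tokens: int) -> float:
    """Percentage reduction in token usage relative to the baseline run."""
    if baseline_tokens == 0:
        raise ValueError("baseline token usage must be non-zero")
    return (baseline_tokens - orchestration_tokens) / baseline_tokens * 100


# Hypothetical counts, chosen only to illustrate the arithmetic.
baseline = 9_000 + 1_000        # prompt + completion tokens, full-buffer baseline
orchestrated = 6_500 + 1_000    # prompt + completion tokens, alternative orchestration

print(token_reduction_percent(baseline, orchestrated))  # 25.0
```

&lt;p&gt;A negative result would mean the orchestration path used more tokens than the baseline.&lt;/p&gt;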

&lt;p&gt;Orchestration approaches that reduced repeated context movement showed &lt;strong&gt;measurable token efficiency improvements&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Learnings
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Lesson 1: Orchestration Matters More Than Expected&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We usually optimize prompts or models. But sometimes the bigger opportunity sits in &lt;strong&gt;workflow design itself&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In these experiments, orchestration decisions influenced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How often context was reloaded&lt;/li&gt;
&lt;li&gt;How retrieval outputs were reused&lt;/li&gt;
&lt;li&gt;Whether intermediate results were shared across stages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Small architectural choices created &lt;strong&gt;downstream effects on token consumption and execution efficiency&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Lesson 2: Fair Benchmarking Is Harder Than It Looks&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Comparing approaches only works when inputs stay consistent. Even small differences can distort conclusions.&lt;/p&gt;

&lt;p&gt;To ensure fairness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval outputs were fixed across runs&lt;/li&gt;
&lt;li&gt;Execution remained deterministic&lt;/li&gt;
&lt;li&gt;All orchestration layers used identical source data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without these controls, a measured token reduction can reflect differences in the inputs rather than in the orchestration.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Lesson 3: AI Engineering Is Becoming a Systems Design Problem&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Models still matter.&lt;/p&gt;

&lt;p&gt;But so do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pipelines&lt;/li&gt;
&lt;li&gt;Retrieval patterns&lt;/li&gt;
&lt;li&gt;Context flow&lt;/li&gt;
&lt;li&gt;Orchestration decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The challenge is shifting from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Pick the best model”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Design efficient information movement systems&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Key factors influencing performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context buffering strategies&lt;/li&gt;
&lt;li&gt;Retrieval reuse&lt;/li&gt;
&lt;li&gt;Execution ordering&lt;/li&gt;
&lt;li&gt;State management&lt;/li&gt;
&lt;li&gt;Orchestration topology&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Disclaimer
&lt;/h2&gt;

&lt;p&gt;This is &lt;strong&gt;not&lt;/strong&gt; about replacing Python.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python remains the foundation&lt;/li&gt;
&lt;li&gt;Weft explores an additional layer: &lt;strong&gt;AI workflow orchestration&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Acknowledgment
&lt;/h2&gt;

&lt;p&gt;Courtesy: &lt;a href="https://github.com/WeaveMindAI/weft" rel="noopener noreferrer"&gt;https://github.com/WeaveMindAI/weft&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Big thanks to &lt;strong&gt;WeaveMind AI&lt;/strong&gt; for sharing early ideas in this space.&lt;/p&gt;

&lt;p&gt;The project is still evolving — and I’m still learning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Call for Discussion
&lt;/h2&gt;

&lt;p&gt;Curious if others have experimented with &lt;strong&gt;orchestration-first AI development approaches&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/PreethaRaj/TokenWeaver-Lab" rel="noopener noreferrer"&gt;https://github.com/PreethaRaj/TokenWeaver-Lab&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;WeaveMind:&lt;/strong&gt; &lt;a href="https://weavemind.ai/" rel="noopener noreferrer"&gt;https://weavemind.ai/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I built a self-hosted RAG system for Journalism — What Production Retrieval Taught Me</title>
      <dc:creator>Preetha</dc:creator>
      <pubDate>Fri, 22 May 2026 08:49:06 +0000</pubDate>
      <link>https://dev.to/rpreetha/i-built-a-self-hosted-rag-system-for-journalism-what-production-retrieval-taught-me-2cki</link>
      <guid>https://dev.to/rpreetha/i-built-a-self-hosted-rag-system-for-journalism-what-production-retrieval-taught-me-2cki</guid>
      <description>&lt;p&gt;Over the last few months, I built &lt;strong&gt;Atlas&lt;/strong&gt; — a fully self-hosted retrieval system designed for journalism workflows. No paid APIs. No hosted vector databases or AI infrastructure. Just local models, PostgreSQL, pgvector, Celery, and a retrieval pipeline built to survive production traffic.&lt;/p&gt;

&lt;p&gt;I originally thought this would mostly be an infrastructure project. It wasn't. The hardest lessons appeared after deployment — when assumptions broke, retrieval quality drifted, and tiny implementation decisions started affecting reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  What does Atlas do?
&lt;/h2&gt;

&lt;p&gt;Atlas ingests live RSS feeds from BBC, Guardian, NYT, NPR, Deutsche Welle and more every 15 minutes, embeds content locally using sentence-transformers, stores vectors in PostgreSQL with pgvector, and answers questions with source-grounded citations.&lt;/p&gt;

&lt;p&gt;Beyond search, it has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Grounded Q&amp;amp;A&lt;/strong&gt; — every answer maps to an exact source passage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claim-level fact-checking&lt;/strong&gt; — splits text into claims, scores each against evidence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Story brief generation&lt;/strong&gt; — key facts, open questions, suggested angles for reporters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-format repurposing&lt;/strong&gt; — one topic becomes newsletter, social post, audio script, headline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A full story workspace&lt;/strong&gt; — source notebooks, drafts, editorial review, version diff, publish readiness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/PreethaRaj/atlas-editorial-intelligence/releases/download/v1.0.0/SearchAnswer.gif" rel="noopener noreferrer"&gt;https://github.com/PreethaRaj/atlas-editorial-intelligence/releases/download/v1.0.0/SearchAnswer.gif&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/PreethaRaj/atlas-editorial-intelligence/releases/download/v1.0.0/PartnerMode.gif" rel="noopener noreferrer"&gt;https://github.com/PreethaRaj/atlas-editorial-intelligence/releases/download/v1.0.0/PartnerMode.gif&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The retrieval pipeline
&lt;/h2&gt;

&lt;p&gt;Here is the full pipeline before I get into the lessons:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query string
    │
    ├── embed(query) → vector cosine &amp;gt; 0.30 → top 20 chunks
    ├── websearch_to_tsquery → PostgreSQL FTS → top 20 chunks
    └── Title FTS boost → top 10 articles
              │
              ▼
         RRF merge (k=60)
              │
         recency blend (85% relevance + 15% freshness)
              │
         post-cosine gate &amp;gt; 0.12
              │
         Policy engine (public / partner / paywall)
              │
         Response + inline citations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Lesson 1 — Pure vector search fails for news
&lt;/h2&gt;

&lt;p&gt;This surprised me. I assumed a good embedding model would handle everything. It does not — at least not for current events journalism.&lt;/p&gt;

&lt;p&gt;The problem: &lt;strong&gt;proper nouns&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Words like &lt;code&gt;Philippines&lt;/code&gt;, &lt;code&gt;Kishida&lt;/code&gt;, &lt;code&gt;Rafah&lt;/code&gt;, &lt;code&gt;Starmer&lt;/code&gt; are rare in any model's training data relative to their importance in daily news. The cosine similarity between &lt;code&gt;"Japan missile exports Philippines"&lt;/code&gt; and an article titled &lt;code&gt;"Tokyo defence deal with Manila confirmed"&lt;/code&gt; was &lt;strong&gt;0.28&lt;/strong&gt; — just below my original threshold of 0.30.&lt;/p&gt;

&lt;p&gt;The article was clearly relevant. The vector search missed it completely.&lt;/p&gt;

&lt;p&gt;Full-text search caught it immediately because &lt;code&gt;Japan&lt;/code&gt;, &lt;code&gt;missile&lt;/code&gt;, &lt;code&gt;Philippines&lt;/code&gt; all appeared in the article text.&lt;/p&gt;

&lt;p&gt;The fix was hybrid search. Vector catches semantic similarity. FTS catches proper nouns and exact terminology. Neither is sufficient alone for a news corpus.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Three search paths merged with RRF
# Path 1: vector cosine similarity
# Path 2: websearch_to_tsquery (handles "Japan Philippines" as two terms)
# Path 3: title-specific FTS (weighted 0.7x to avoid title-only noise)
&lt;/span&gt;
&lt;span class="c1"&gt;# RRF merge — no score normalisation needed because it only uses rank position
# final_score = Σ 1 / (60 + rank_i)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
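
&lt;p&gt;The merge described in the comments above can be sketched as follows. This is an illustration, not Atlas's actual code: the name &lt;code&gt;rrf_merge&lt;/code&gt;, the document ids, and the way the 0.7 title weight is applied (as a multiplier on that path's contribution) are all assumptions.&lt;/p&gt;

```python
from collections import defaultdict


def rrf_merge(ranked_lists, weights=None, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each list contributes weight / (k + rank) for every id it contains,
    so only rank positions matter and raw scores never need normalising.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for ids, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from the three search paths, best hit first.
vector_hits = ["a2", "a7", "a1"]
fts_hits = ["a1", "a2", "a9"]
title_hits = ["a1"]

merged = rrf_merge([vector_hits, fts_hits, title_hits], weights=[1.0, 1.0, 0.7])
print([doc_id for doc_id, _ in merged])  # ['a1', 'a2', 'a7', 'a9']
```

&lt;p&gt;Here &lt;code&gt;a1&lt;/code&gt; ranks first even though the vector path placed it third, because all three paths returned it.&lt;/p&gt;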



&lt;h2&gt;
  
  
  Lesson 2 — Batch embedding is not a micro-optimisation
&lt;/h2&gt;

&lt;p&gt;I was calling &lt;code&gt;embed()&lt;/code&gt; once per article for the first two weeks. Here is what that looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;17 feeds × 30 articles × embed(1 article) × 100ms = 51 seconds per ingest cycle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After switching to batch embedding — collect all articles, call &lt;code&gt;embed([t1, t2, ..., tN])&lt;/code&gt; once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;17 feeds × 30 articles = 510 articles
embed(510 articles)    = ~3 seconds total
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;17× faster.&lt;/strong&gt; Much of the inference overhead is a fixed cost per call rather than per item, so one batched call amortises it across all 510 articles. The PyTorch documentation makes this clear, but I had not read it carefully enough.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before — slow
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;insert_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After — fast
&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;vecs&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# single call, returns (N, 384) array
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vecs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;insert_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
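
&lt;p&gt;One caveat when batching: a single call over every article holds all inputs in memory at once. A hedged sketch of bounded batching follows. The names &lt;code&gt;embed_in_batches&lt;/code&gt; and &lt;code&gt;fake_embed&lt;/code&gt; and the batch size of 64 are hypothetical, and &lt;code&gt;fake_embed&lt;/code&gt; stands in for the real model so the example runs without one. With sentence-transformers, &lt;code&gt;SentenceTransformer.encode&lt;/code&gt; already accepts a list and a &lt;code&gt;batch_size&lt;/code&gt; argument, so a helper like this may be unnecessary there.&lt;/p&gt;

```python
def embed_in_batches(texts, embed_fn, batch_size=64):
    """Call embed_fn on fixed-size slices of texts and collect the vectors in order."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        vectors.extend(embed_fn(texts[start:start + batch_size]))
    return vectors


calls = []  # records the size of every batch passed to the stand-in model


def fake_embed(batch):
    """Stand-in for a real embedding model: one 1-dimensional vector per text."""
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


texts = [f"article {i}" for i in range(510)]
vectors = embed_in_batches(texts, fake_embed, batch_size=64)
print(len(vectors), len(calls))  # 510 8
```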



&lt;h2&gt;
  
  
  Lesson 3 — The cosine threshold is your precision-recall dial
&lt;/h2&gt;

&lt;p&gt;Atlas has two thresholds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;COSINE_MIN&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.30&lt;/span&gt;   &lt;span class="c1"&gt;# SQL WHERE — pre-filter before leaving DB
&lt;/span&gt;&lt;span class="n"&gt;POST_COSINE_MIN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.12&lt;/span&gt;   &lt;span class="c1"&gt;# post-RRF — sanity gate after merge
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What I learned tuning these:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threshold&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0.45&lt;/td&gt;
&lt;td&gt;Missed "Japan missile Philippines" — too restrictive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.30&lt;/td&gt;
&lt;td&gt;Good balance for a news corpus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.20&lt;/td&gt;
&lt;td&gt;Sports results started appearing for political queries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The intuition: news articles about related topics often use completely different vocabulary than the query. A threshold of 0.30 allows the model to bridge that vocabulary gap. A threshold of 0.45 requires the query and article to use nearly identical language — which defeats the purpose of semantic search.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;POST_COSINE_MIN = 0.12&lt;/code&gt; exists only to handle FTS-only hits. When an article is found by keyword search but has almost no semantic overlap with the query (cosine below 0.12), the keyword match was probably accidental. The post-filter removes those.&lt;/p&gt;
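
&lt;p&gt;A minimal sketch of that post-merge gate, assuming each merged candidate carries its cosine similarity to the query. The function name &lt;code&gt;apply_post_cosine_gate&lt;/code&gt;, the candidate shape, and the scores are hypothetical.&lt;/p&gt;

```python
POST_COSINE_MIN = 0.12   # post-merge sanity gate, value taken from the article


def apply_post_cosine_gate(candidates, minimum=POST_COSINE_MIN):
    """Keep merged candidates whose cosine similarity to the query is above the gate."""
    return [c for c in candidates if c["cosine"] > minimum]


# Hypothetical merged candidates; ids and scores are made up for illustration.
merged = [
    {"id": "a1", "cosine": 0.41, "found_by": {"vector", "fts"}},
    {"id": "a2", "cosine": 0.28, "found_by": {"fts"}},  # under the 0.30 pre-filter, kept via FTS
    {"id": "a3", "cosine": 0.03, "found_by": {"fts"}},  # likely an accidental keyword match
]

print([c["id"] for c in apply_post_cosine_gate(merged)])  # ['a1', 'a2']
```

&lt;p&gt;The FTS-only hit at 0.28 survives, which is the case hybrid search is meant to rescue, while the 0.03 hit is dropped.&lt;/p&gt;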

&lt;h2&gt;
  
  
  Lesson 4 — Celery beat scheduling has a startup timing problem
&lt;/h2&gt;

&lt;p&gt;The beat schedule runs &lt;code&gt;ingest_all_feeds&lt;/code&gt; every 15 minutes. But there is a subtle issue: with a &lt;code&gt;crontab(minute='*/15')&lt;/code&gt; schedule, the first beat after a fresh deploy fires at the next &lt;code&gt;:00&lt;/code&gt;, &lt;code&gt;:15&lt;/code&gt;, &lt;code&gt;:30&lt;/code&gt;, or &lt;code&gt;:45&lt;/code&gt; UTC boundary, not 15 minutes from startup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Deploy at 14:01 → first ingest at 14:15  ✓ fine
Deploy at 14:14 → first ingest at 14:15  ✓ fine
Deploy at 14:00:01 → first ingest at 14:15  ✗ 15 minute corpus gap on first launch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix was &lt;code&gt;timedelta(minutes=15)&lt;/code&gt; instead of &lt;code&gt;crontab(minute='*/15')&lt;/code&gt;.&lt;/p&gt;
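
&lt;p&gt;The difference between the two schedules can be modelled with plain &lt;code&gt;datetime&lt;/code&gt; arithmetic. This sketch does not call Celery; it only reproduces the timing described above, and the function names and deploy time are hypothetical.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def next_quarter_hour(now: datetime) -> datetime:
    """Next :00, :15, :30 or :45 boundary strictly after `now` (clock-aligned schedule)."""
    floored = now.replace(minute=now.minute - now.minute % 15, second=0, microsecond=0)
    return floored + timedelta(minutes=15)


def first_interval_fire(started_at: datetime, every: timedelta = timedelta(minutes=15)) -> datetime:
    """First fire time for an interval schedule that counts from startup."""
    return started_at + every


deploy = datetime(2026, 5, 22, 14, 1, 0)   # hypothetical deploy time
print(next_quarter_hour(deploy))      # 2026-05-22 14:15:00
print(first_interval_fire(deploy))    # 2026-05-22 14:16:00
```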

&lt;p&gt;The &lt;code&gt;startup_ingest&lt;/code&gt; task also checks corpus article count before honouring the Redis dedup flag. Empty corpus → ingest regardless. This handles &lt;code&gt;docker-compose down -v&lt;/code&gt; (fresh database) correctly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
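&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;

&lt;span class="n"&gt;beat_schedule&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;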
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ingest-every-15-min&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tasks.ingest_all_feeds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# from startup, not clock-aligned
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;startup-ingest-once&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tasks.startup_ingest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;    &lt;span class="c1"&gt;# fires once, Redis dedup prevents repeats
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
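
&lt;p&gt;The empty-corpus check that &lt;code&gt;startup_ingest&lt;/code&gt; performs can be sketched as a pure decision function. This is an assumption about the logic, not the project's code: in Atlas the article count would come from PostgreSQL and the flag from Redis, while here both are plain arguments and the name &lt;code&gt;should_run_startup_ingest&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def should_run_startup_ingest(article_count: int, dedup_flag_set: bool) -> bool:
    """Decide whether the startup ingest should run.

    An empty corpus always triggers an ingest, even if the dedup flag is
    still set (for example after the database volume was wiped).
    """
    if article_count == 0:
        return True
    return not dedup_flag_set


print(should_run_startup_ingest(article_count=0, dedup_flag_set=True))    # True
print(should_run_startup_ingest(article_count=120, dedup_flag_set=True))  # False
```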



&lt;h2&gt;
  
  
  Lesson 5 — The Docker healthcheck dependency chain matters
&lt;/h2&gt;

&lt;p&gt;This one took me an embarrassing amount of time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;celery-beat&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;celery-worker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;   &lt;span class="c1"&gt;# ← this line is critical&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without &lt;code&gt;service_healthy&lt;/code&gt;, beat starts immediately and dispatches tasks before any worker is ready to consume them. The tasks sit in the queue, beat fires again 15 minutes later, and tasks pile up.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;service_healthy&lt;/code&gt;, beat waits until a worker is confirmed ready. Clean startup every time.&lt;/p&gt;

&lt;p&gt;The worker healthcheck uses &lt;code&gt;celery inspect ping&lt;/code&gt;, which confirms the worker is actually responding to control commands, not just that the container started.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is next
&lt;/h2&gt;

&lt;p&gt;The infrastructure has a &lt;code&gt;warmup_reranker()&lt;/code&gt; stub in &lt;code&gt;main.py&lt;/code&gt; for a cross-encoder reranker. That is the highest-impact next upgrade: running &lt;code&gt;cross-encoder/ms-marco-MiniLM-L-6-v2&lt;/code&gt; over the top-20 RRF results before returning to the user. I expect it to add ~100ms of latency but meaningfully improve ranking for ambiguous queries.&lt;/p&gt;

&lt;p&gt;I am also looking at adding a BM25 path via the &lt;code&gt;pg_bm25&lt;/code&gt; extension (ParadeDB) to replace the PostgreSQL FTS path. BM25 handles document length normalisation better than &lt;code&gt;tsvector&lt;/code&gt; for longer articles.&lt;/p&gt;

&lt;h2&gt;
  
  
  The project
&lt;/h2&gt;

&lt;p&gt;Atlas is built to be learned from and adapted. The README has a two-week tutorial walking through each layer of the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/PreethaRaj/atlas-editorial-intelligence" rel="noopener noreferrer"&gt;https://github.com/PreethaRaj/atlas-editorial-intelligence&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; FastAPI · PostgreSQL 16 · pgvector · Celery · Redis · sentence-transformers · Next.js 14 · Docker Compose&lt;/p&gt;

&lt;p&gt;Happy to answer questions on the retrieval architecture, the pgvector schema, or the Celery configuration in the comments.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>mcp</category>
      <category>postgressql</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
