<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ghosty.AI</title>
    <description>The latest articles on DEV Community by Ghosty.AI (@ghostyai_aionexo).</description>
    <link>https://dev.to/ghostyai_aionexo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4012887%2F954b7d27-4f91-41cf-981d-3fb9680dad3a.png</url>
      <title>DEV Community: Ghosty.AI</title>
      <link>https://dev.to/ghostyai_aionexo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ghostyai_aionexo"/>
    <language>en</language>
    <item>
      <title>One prompt, three memories: building a memory-governance agent with Qwen</title>
      <dc:creator>Ghosty.AI</dc:creator>
      <pubDate>Fri, 03 Jul 2026 01:57:01 +0000</pubDate>
      <link>https://dev.to/ghostyai_aionexo/one-prompt-three-memories-building-a-memory-governance-agent-with-qwen-2pal</link>
      <guid>https://dev.to/ghostyai_aionexo/one-prompt-three-memories-building-a-memory-governance-agent-with-qwen-2pal</guid>
      <description>&lt;p&gt;I gave the same caregiver prompt to my app three times and got three very different answers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"What's the plan for tomorrow's clinic visit?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Answer 1 — No Memory.&lt;/strong&gt; Safe, and useless. It can't tell me the appointment time, who's driving, or what to bring, because it knows nothing about this family.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer 2 — Raw Memory.&lt;/strong&gt; Detailed, and unsafe. I dumped every stored note into the prompt, so the model happily surfaced a routine we dropped weeks ago, let contradictions slip through, and echoed a private insurance ID straight into the reply.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer 3 — ERINYS + Qwen.&lt;/strong&gt; Detailed &lt;em&gt;and&lt;/em&gt; safe. Only governed context reached the model, every memory that made it in came with a stated reason, and the three private identifiers I planted never appeared.&lt;/p&gt;

&lt;p&gt;Same model. Same data. The only variable was &lt;strong&gt;which memories were allowed to reach the prompt.&lt;/strong&gt; That's the whole project.&lt;/p&gt;

&lt;p&gt;This is my Track 1: MemoryAgent submission for the Global AI Hackathon Series with Qwen Cloud. It's a hackathon demo on synthetic family-care data — no real patient data anywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory isn't storage
&lt;/h2&gt;

&lt;p&gt;The tempting fix for "the agent forgot" is a bigger context window. But a care assistant that runs for months doesn't have a &lt;em&gt;capacity&lt;/em&gt; problem, it has a &lt;em&gt;trust&lt;/em&gt; problem. Its memory fills up with current plans, stale routines, contradictions, and private identifiers all mixed together. Raw Memory above is the failure mode made concrete: dump it all in and you get a confident, detailed, wrong, leaky answer.&lt;/p&gt;

&lt;p&gt;So I stopped treating memory as a store and started treating it as a &lt;strong&gt;decision layer&lt;/strong&gt;. Before anything reaches Qwen, something has to decide &lt;em&gt;which&lt;/em&gt; memories are trustworthy right now — and say why. That "something" is ERINYS.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;ERINYS governs memory. Qwen generates the answer.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The four decision states
&lt;/h2&gt;

&lt;p&gt;The heart of the app is a deterministic policy that reads six signals off each memory — &lt;strong&gt;sensitivity, staleness, conflict, importance, recency, relevance&lt;/strong&gt; — and sorts it into one of four states, each carrying a stated reason:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;selected&lt;/code&gt;&lt;/strong&gt; — trusted and relevant; goes into the prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;conflicted&lt;/code&gt;&lt;/strong&gt; — contradicts another memory; flagged instead of silently included.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;demoted&lt;/code&gt;&lt;/strong&gt; — real but low-value right now (stale or off-topic); kept out to reduce noise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;blocked&lt;/code&gt;&lt;/strong&gt; — must not reach generation; e.g. a sensitive identifier.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The two failures from the opening map straight onto these states. The stale morning routine gets &lt;strong&gt;demoted&lt;/strong&gt; because a newer memory supersedes it, so it never competes for space in the prompt. The insurance number gets &lt;strong&gt;blocked&lt;/strong&gt; on sensitivity, so it never reaches the model at all.&lt;/p&gt;

&lt;p&gt;In the demo I seed three synthetic private IDs — &lt;code&gt;SYNTH-INSURANCE-9001&lt;/code&gt;, &lt;code&gt;SYNTH-PORTAL-4420&lt;/code&gt;, &lt;code&gt;SYNTH-DOOR-1122&lt;/code&gt;. In Raw Memory they leak into the answer. Under ERINYS they land in &lt;code&gt;blocked&lt;/code&gt;, and &lt;strong&gt;0 leaked&lt;/strong&gt; in the governed reply.&lt;/p&gt;

&lt;p&gt;The governed prompt does end up smaller than the raw one, but that's a side effect, not the point. &lt;strong&gt;The win is governance, not token trimming.&lt;/strong&gt; A shorter prompt that still leaked an ID would be a failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why deterministic, on purpose
&lt;/h2&gt;

&lt;p&gt;I could have asked a model to decide what's safe to remember. I chose not to — and I want to be clear that this is a design choice, not a limitation I'm dressing up.&lt;/p&gt;

&lt;p&gt;The policy doesn't learn and it doesn't reason. It applies fixed rules to those six signals and emits a reproducible reason for every decision. For medical-shaped memory, that property matters more than cleverness: a reviewer can open the audit trail and see exactly &lt;em&gt;why&lt;/em&gt; a memory reached the prompt or was kept out — and get the same verdict on a re-run. "The model decided" is not an answer you want to give when a private identifier leaks.&lt;/p&gt;

&lt;p&gt;That's also my honest answer to "is this really an agent?" It's two agents with one contract: &lt;strong&gt;ERINYS is the memory-governance agent, Qwen is the generation agent.&lt;/strong&gt; ERINYS selects, demotes, and blocks &lt;em&gt;before&lt;/em&gt; generation; Qwen writes the reply from what survives.&lt;/p&gt;

&lt;p&gt;The policy is content-agnostic, too. Swap the seed memories and the same four-state machine applies to another domain. You don't have to take my word for it — the live app lets you &lt;strong&gt;save your own care memory and rerun&lt;/strong&gt;, so governance runs against &lt;em&gt;your&lt;/em&gt; data, not just my seed set.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building on Alibaba Cloud + DashScope
&lt;/h2&gt;

&lt;p&gt;I kept the stack deliberately boring so the governance logic is the only interesting part:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; Python standard-library HTTP server. No web framework, just JSON endpoints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; vanilla HTML/CSS/JS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM:&lt;/strong&gt; Qwen Cloud &lt;code&gt;qwen3.7-plus&lt;/code&gt; through the DashScope OpenAI-compatible endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy:&lt;/strong&gt; Docker on Alibaba Cloud ECS (Singapore, &lt;code&gt;ap-southeast-1&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The easy part was Qwen itself. The OpenAI-compatible endpoint meant no bespoke client — just a base URL and a key, and the three modes were calling real Qwen.&lt;/p&gt;

&lt;p&gt;The fiddly part was making the demo honest under every condition. When a key is configured, all three modes call real Qwen, so the contrast is real generation, not a staged screenshot. Without a key, a deterministic fallback returns the &lt;em&gt;same governance trace&lt;/em&gt;, so the four-state decisions stay visible and the demo never hard-fails on a judge's laptop. pytest covers the policy states, so I know a &lt;code&gt;blocked&lt;/code&gt; stays blocked. Getting those two paths to agree took more care than the model call did.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest limitations, and what's next
&lt;/h2&gt;

&lt;p&gt;The one I want to state plainly: &lt;strong&gt;there is no automatic PII detection on free-text input.&lt;/strong&gt; The seed memories are pre-labeled, and a memory you save in the app defaults to not-sensitive. So the &lt;code&gt;blocked&lt;/code&gt; state works because the sensitivity signal is already attached — not because the app read raw text and figured out it was a portal login. Automatic detection of sensitive content in arbitrary input is future work, and it's the next thing I'd build.&lt;/p&gt;

&lt;p&gt;Other next steps: richer conflict resolution (right now &lt;code&gt;conflicted&lt;/code&gt; flags rather than reconciles), and per-domain policy tuning so the six-signal thresholds can shift by use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/SN8HSg6GMrg"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://hack.aionexo.com/GAI-HS/" rel="noopener noreferrer"&gt;https://hack.aionexo.com/GAI-HS/&lt;/a&gt; — run the three modes, then save your own memory and rerun.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repo (MIT):&lt;/strong&gt; &lt;a href="https://github.com/GhostyAI-HA/erinys-care-memory" rel="noopener noreferrer"&gt;https://github.com/GhostyAI-HA/erinys-care-memory&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Give it the same prompt three ways and watch the third answer stay both detailed and safe — the insurance number vanishing between Raw and Governed. That contrast is the whole pitch.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;ERINYS governs memory. Qwen generates the answer.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>showdev</category>
      <category>ai</category>
      <category>python</category>
      <category>qwen</category>
    </item>
  </channel>
</rss>
