DEV Community

Resmon Rama Rondonuwu

How I Cured "AI Amnesia" Without Vector DBs (Zero-Cost Architecture)

Daemon's Project

Hi DEV Community! 👋 First time posting here.

I'm Rama, a solo builder from Indonesia, and for the past few months I've been secretly building an AI companion called Daemon.

Like many of you, I kept hitting the exact same frustrating wall: AI amnesia.

The common advice in the industry is always:

"Just throw a Vector DB at it and use RAG!"

But as a solo dev trying to keep everything local, private, and low-cost, I wanted to explore a completely different question: What if the real problem isn’t memory size… but a lack of reasoning discipline?

So, instead of starting with embeddings, I went the opposite direction.

  • ❌ No Vector DB
  • ❌ No paid APIs
  • ✅ Just n8n + Local PostgreSQL + Strict Prompt Architecture (100% Free / Self-Hosted)

💡 The Core Idea: "Logic First, Memory Discipline First, Vector Later"

Most “AI memory” systems I tested had the same fatal flaws:

  • Semantic noise → Unrelated things get linked just because the words sound similar.
  • Over-inference → The AI assumes way too much from weak signals (Logical Leaps).
  • Context drift → Updated user preferences get ignored because the old data is still in the database.

So, I built a system focused on controlling how the AI thinks, not just what it remembers.


⚙️ Architecture Overview (Ignite Contextual Memory)

1. Layered Memory (SQL-based)

Instead of dumping everything into a vector store, memory is strictly structured into layers:

  • Window Memory → The active, ongoing conversation.
  • Session Summary → Compressed context: the running minutes of each conversation.
  • Core Memory (Tagged) → Hard facts locked behind tags like [PROFILE], [PROJECT], [STATE], [PREFERENCE].

All retrieval is done via deterministic SQL queries (PostgreSQL orchestrated by n8n).
👉 The Result: 100% predictable, absolutely zero semantic noise, and $0 API cost.
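To make the "deterministic SQL" point concrete, here is a minimal sketch of tag-scoped retrieval. This is my illustration, not Daemon's actual schema: the `core_memory` table name, its columns, and the query shape are all assumptions; only the four tags come from the post.

```javascript
// Hypothetical sketch: deterministic core-memory retrieval by tag.
// The table and column names are assumptions, not Daemon's real schema.
const CORE_TAGS = ["PROFILE", "PROJECT", "STATE", "PREFERENCE"];

function buildCoreMemoryQuery(tag) {
  if (!CORE_TAGS.includes(tag)) {
    throw new Error(`Unknown core-memory tag: ${tag}`);
  }
  // Parameterized query: the tag is passed as $1, never string-interpolated,
  // so retrieval is predictable, injection-safe, and free of semantic fuzziness.
  return {
    text: "SELECT fact, updated_at FROM core_memory WHERE tag = $1 ORDER BY updated_at DESC",
    values: [tag],
  };
}
```

The payoff of a query like this over embedding search is exactly the determinism claimed above: the same tag always returns the same rows, with recency ordering ("last revision wins") handled by `ORDER BY updated_at DESC` rather than by hoping a similarity score weights recency correctly.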

2. Inference Gate (Anti-Hallucination Layer)

The system forces the AI to strictly separate Explicit Facts vs Assumptions.

Example:

User: "I like crows. My project is Black Vault. What should the logo be?"
Daemon: "Not necessarily a crow. You said you like crow symbolism, but you haven’t defined it as the project identity yet. It could be an option, but right now, that’s still an assumption."

👉 The Result: No forced conclusions and zero “Yes-Man” behavior.
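The gate's core rule can be sketched in a few lines: a conclusion only passes if every premise behind it is an explicit fact. The data shapes here are my illustration, not Daemon's actual implementation.

```javascript
// Hypothetical sketch of the Inference Gate rule: a conclusion is only
// "grounded" if every premise behind it was explicitly stated by the user.
// Field names are illustrative, not Daemon's actual data model.
function gateConclusion(conclusion, explicitFacts) {
  const ungrounded = conclusion.premises.filter((p) => !explicitFacts.includes(p));
  if (ungrounded.length === 0) return { status: "grounded" };
  // Anything left over must be surfaced as an assumption, never asserted.
  return { status: "assumption", ungrounded };
}

// The crow example from above, restated as data:
const facts = ["user likes crow symbolism", "project is named Black Vault"];
const verdict = gateConclusion(
  { claim: "logo should be a crow", premises: ["user likes crow symbolism", "crow is the project identity"] },
  facts
);
// verdict.status is "assumption": "crow is the project identity" was never stated.
```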

3. Semantic Bridging

Instead of relying on embeddings to find similarities, I use controlled, logical linking.

  • "AI Companion" → "Thinking Partner"
  • "External mind" → "Reflective system"

This allows the AI to track concept evolution naturally, even across long, heavily distracted conversations.
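In code, controlled bridging is just an explicit lookup table instead of a similarity search. A minimal sketch (the table contents beyond the two pairs above are mine, and real bridges would presumably live in PostgreSQL rather than in memory):

```javascript
// Hypothetical sketch: bridges are declared explicitly rather than inferred
// from embedding similarity, so two phrases only link when a reviewed rule
// says they refer to the same concept. Unknown phrases are never fuzzy-matched.
const BRIDGES = new Map([
  ["thinking partner", "ai companion"],
  ["reflective system", "external mind"],
]);

function canonicalConcept(phrase) {
  const key = phrase.toLowerCase();
  return BRIDGES.get(key) ?? key; // no bridge declared: the phrase stays itself
}
```

Because the fallback returns the phrase unchanged, a distracting tangent about, say, crows can never accidentally merge with the "AI Companion" concept the way a nearest-neighbor embedding lookup might.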


🧪 Validation & Stress Testing

I ran this architecture through a brutal evaluation suite (evaluated by ChatGPT Pro) focused on context continuity, contradiction handling, and memory hygiene.

The Results:

  • ✅ Maintains context across heavy distractions
  • ✅ Rejects false assumptions and refuses to hallucinate
  • ✅ Handles changing user preferences correctly ("Last revision wins")
  • ✅ Keeps multiple project contexts strictly separated

🤖 What Daemon Actually Does

Fun fact: I named it Daemon (inspired by the companions in The Golden Compass) because I wanted an entity that grows alongside the user, not just a stateless bot that resets every time you close the tab.

Daemon isn’t just a Q&A chatbot. It acts as a State-Aware Thinking Partner.
It can:

  • Break down complex decisions (trade-offs, risks).
  • Challenge your assumptions (Challenger Mode).
  • Structure vague ideas into clear, actionable concepts.
  • Maintain perfect context across long, multi-day discussions.

⚠️ Important Limitations

I want to be transparent—this approach is not a magic bullet:

  • No semantic vector search yet: Scaling to massive, unstructured documents is still limited.
  • Fully reactive: It doesn't make proactive suggestions (yet).
  • Works best in focused, structured contexts rather than free-flowing creative chaos.

This doesn’t replace vector-based systems. It’s more about building cognitive discipline before scaling semantic retrieval.


🧭 The Big Takeaway

After building this, my main insight is this: The biggest limitation of LLM systems isn’t memory size—it’s uncontrolled reasoning and assumption drift. Before scaling with embeddings, it might be worth asking: Does your AI actually know when it should NOT assume something?


💬 Open Question for the Community

Has anyone else here tried building SQL-based memory systems or non-vector approaches to context management?

Curious to hear your thoughts, critiques, or even architectural roasts! 😄

Cheers! 🍻

Top comments (5)

Apex Stack

The Inference Gate is the part that most AI memory systems skip entirely — everyone focuses on "what to store" and ignores "when should the AI refuse to connect dots." The crow/Black Vault example in your article is exactly the failure mode I run into with LLM-generated content at scale: the model makes a plausible-sounding inference that isn't grounded in what the user actually said, and it goes undetected because it's coherent.

The "Last revision wins" principle in your Core Memory layer is also something I've fought with directly. I run a Qwen 3.5 8B locally to generate content across thousands of pages, and old preference data leaking into new generations is a real problem — deterministic SQL retrieval with explicit tag scoping solves it far better than hoping embeddings weight recency correctly.

Curious about n8n as the orchestrator: how do you handle branching logic when the Inference Gate rejects a connection? Does Daemon surface the rejection explicitly to the user (like in your crow example), or does it silently reroute and only flag ambiguity if pressed? The transparent rejection approach seems better for trust-building, but wondering if users find it jarring in practice.

Resmon Rama Rondonuwu

Hi Apex Stack, thanks for the insightful comment!

Great to meet another local LLM enthusiast—Qwen 3.5 8B is a beast, but you’re right, it’s prone to that 'plausible-sounding' drift if the retrieval isn’t strictly scoped.

Regarding your question on the Inference Gate and n8n branching:

In Daemon's current architecture, I opted for Explicit Transparency. When the Inference Gate identifies an ungrounded connection, it doesn't just silently reroute. Instead, the n8n workflow triggers a specific 'Clarification Node' that surfaces the ambiguity to the user.

Why? Because a true 'Thinking Partner' shouldn't guess.

In practice, users (mostly myself for now) find it much less 'jarring' than the alternative—which is the AI confidently building an entire project based on a false assumption. By explicitly saying, 'I noticed you mentioned X, but I’m not assuming it’s part of project Y yet,' it actually reduces cognitive load. You don't have to keep double-checking if the AI is still on the same page.

In n8n, this is handled via a simple Switch Node after the Inference Gate check:

  1. Path A (Grounded): Proceed to execution.

  2. Path B (Ambiguous): Trigger a 'Hold & Clarify' response template.

It’s definitely a shift from the typical 'seamless' AI UX, but for complex logical builds, it’s been a lifesaver. Have you tried any explicit flagging with your Qwen setup?
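For anyone wiring up the same pattern, the two-path Switch reduces to a few lines. This is a conceptual sketch only; the node names and fields are illustrative, not n8n's actual API.

```javascript
// Hypothetical sketch of the post-gate Switch: grounded results proceed to
// execution (Path A); ambiguous ones trigger a hold-and-clarify response
// template (Path B). Field names are illustrative.
function routeAfterGate(gateResult) {
  if (gateResult.grounded) {
    return { path: "A", action: "execute" };
  }
  return {
    path: "B",
    action: "hold_and_clarify",
    message: `I noticed you mentioned ${gateResult.ambiguousItem}, but I'm not assuming it's part of the project yet.`,
  };
}
```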

Apex Stack

The Clarification Node approach makes total sense — and yes, I've been experimenting with something similar in my Qwen pipeline. Instead of a formal gate, I use explicit JSON schema validation as an early exit: if the model output doesn't match the expected structure (which happens maybe 8-12% of the time with financial data), the batch job logs the ambiguity and queues it for retry rather than silently writing garbage to the database. It's not as elegant as your Switch Node pattern, but functionally it's the same principle — surface the uncertainty instead of letting it propagate downstream. Your point about reducing cognitive load really landed: users not having to "keep double-checking if the AI is still on the same page" is exactly the UX win. I might steal the Clarification Node concept and apply it to my scheduled agents — right now they fail silently on edge cases, which is worse than failing loudly.
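The schema-as-early-exit pattern described here can be sketched in a few lines. The expected shape (a `ticker` string and a finite `peRatio`) is my illustrative guess at the financial-data case, not Apex's actual schema, and the hand-rolled check stands in for whatever JSON Schema validator the real pipeline uses.

```javascript
// Hypothetical sketch of schema validation as an early exit: outputs that
// fail the shape check are queued for retry instead of being written to the
// database. The expected fields here are illustrative.
function validateOrQueue(output, retryQueue) {
  const valid =
    output !== null &&
    typeof output === "object" &&
    typeof output.ticker === "string" &&
    Number.isFinite(output.peRatio);
  if (!valid) {
    retryQueue.push(output); // log and retry rather than silently writing garbage
    return null;
  }
  return output;
}
```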

Resmon Rama Rondonuwu • Edited

Hi Apex! Thanks for the insightful comment. I completely relate to the 'silent failure' nightmare, especially when dealing with strict outputs like financial data.

The JSON schema validation you're using is basically a 'Bouncer'. It strictly checks if the data has the right ID card (format) before letting it into the database. It’s crucial, but as you noticed, a hallucinated transaction formatted perfectly as JSON will still get past the Bouncer.

The 'Clarification Node' in Daemon acts more like an 'Investigator' before the data even reaches the Bouncer. Here is the conceptual 'meat' of how I structured it—which you might easily adapt for your scheduled Qwen agents:

The 'Investigator' Logic Flow (Adaptable for Batch Jobs):

1. The Extraction (Reasoning First): Instead of asking the LLM to output the final data directly to fit the schema, I first ask it to evaluate the context and output its reasoning along with a Confidence_Score (0-100).

2. The Inference Gate (Switch Node): I use a Switch/If node right after the LLM's evaluation:

  • If Confidence_Score >= 90: The data is clean. Route it to the final JSON formatting node (The Bouncer).
  • If Confidence_Score < 90: Ambiguity detected. Route it to the 'Clarification Path'.

3. The Handling (For Autonomous Agents): Since your agents run on a schedule and can't ask a user in real-time like my Daemon does, your 'Clarification Path' could simply format the ambiguous output into an 'Exception Log' or a 'Human Review Queue' instead of forcing a database write.

By isolating the Evaluation of Certainty from the Formatting of Output, the model doesn't feel pressured to 'guess' or hallucinate just to satisfy the JSON schema. It has a safe exit route when it's unsure.
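The three steps above boil down to one routing function. A minimal sketch, where the 90 threshold comes from the steps above but the field names are my illustration:

```javascript
// Hypothetical sketch of the evaluate-then-format split: the model first
// returns its reasoning plus a 0-100 confidence score, and only confident
// results reach the JSON-formatting step. Field names are illustrative.
function routeEvaluation(evaluation, threshold = 90) {
  if (evaluation.confidenceScore >= threshold) {
    // Path A: clean. Hand off to the final JSON formatting node (the Bouncer).
    return { route: "format" };
  }
  // Path B: ambiguity detected. Clarify with the user (interactive), or write
  // to an exception log / human review queue (autonomous batch agents).
  return { route: "clarify", reasoning: evaluation.reasoning };
}
```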

I hope this conceptual breakdown helps you integrate it! Would love to hear if separating the reasoning from the formatting helps reduce your 8-12% failure rate.


Just had another thought regarding the 'silent failure' in your Qwen pipeline, Apex.

I'm curious, does your 8-12% failure rate usually happen because the LLM completely breaks the JSON format, or is it more of a 'perfectly formatted hallucination' issue?

To use a simple analogy, I was wondering if your scenario looks something like this:

  • Input: 1+1=?
  • LLM Output: {"result": 4}
  • JSON Schema: "Formatting is perfect. Let it pass." -> Result: Corrupted data is silently saved.

If that's the core issue, have you ever experimented with a "Dual-Agent" or "Validator LLM" approach before it hits the JSON Schema gate? Basically having a smaller/faster LLM strictly evaluate the logical consistency before the JSON validation happens.

I'm really interested to hear your perspective on this. Have you tried something similar in your batch jobs, or does that add too much latency/cost for your specific financial use case?

Apex Stack

The 1+1={result:4} analogy is exactly right — that's the dominant failure mode. Roughly 70% of our bad outputs are "valid JSON, wrong numbers" (Qwen confidently writes a P/E ratio of 2847 for a bank stock because it pattern-matched to something plausible-sounding). The remaining ~30% is broken structure, usually think-block leakage when the model forgets to disable reasoning mode in batch.

For the validator question: I haven't gone full dual-agent yet — the cost doubles per page at 8,000+ tickers. What I do instead is a sample-and-audit loop: after each batch run, I validate a random 5% of outputs against Yahoo Finance raw numbers directly. If the error rate exceeds a threshold, I flag the whole batch for retry. It's async rather than real-time, but it's essentially your Inference Gate shifted to post-generation.
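The sample-and-audit loop described here fits in a few lines. A sketch under stated assumptions: the sample rate and error threshold are parameters (Apex mentions 5% sampling; the 2% default threshold and all field names are mine), and `fetchTruth` stands in for whatever call pulls the raw Yahoo Finance number.

```javascript
// Hypothetical sketch of a sample-and-audit loop: re-check a random sample of
// batch outputs against a trusted source and flag the whole batch for retry
// when too many diverge. Thresholds and field names are illustrative.
function auditBatch(outputs, fetchTruth, sampleRate = 0.05, maxErrorRate = 0.02) {
  const sample = outputs.filter(() => Math.random() < sampleRate);
  const wrong = sample.filter((o) => fetchTruth(o.ticker) !== o.value);
  const flagged = sample.length > 0 && wrong.length / sample.length > maxErrorRate;
  return { sampled: sample.length, wrong: wrong.length, flagged };
}
```

Setting `sampleRate` to 1 turns this into a full post-generation audit, which is useful when validating the loop itself before trusting the cheaper 5% sample.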

Your "Evaluation of Certainty before Formatting of Output" split is the right architecture. The key win is catching the confident-but-wrong case — exactly what JSON schema validation alone misses.