DEV Community

Resmon Rama Rondonuwu

How I Cured "AI Amnesia" Without Vector DBs (Zero Cost Architecture)

Daemon's Project

Hi DEV Community! 👋 First time posting here.

I'm Rama, a solo builder from Indonesia, and for the past few months I've been quietly building an AI companion called Daemon.

Like many of you, I kept hitting the exact same frustrating wall: AI amnesia.

The common advice in the industry is always:

"Just throw a Vector DB at it and use RAG!"

But as a solo dev trying to keep everything local, private, and low-cost, I wanted to explore a completely different question: What if the real problem isn’t memory size… but a lack of reasoning discipline?

So, instead of starting with embeddings, I went the opposite direction.

  • ❌ No Vector DB
  • ❌ No paid APIs
  • ✅ Just n8n + Local PostgreSQL + Strict Prompt Architecture (100% Free / Self-Hosted)

💡 The Core Idea: "Logic First, Memory Discipline First, Vector Later"

Most “AI memory” systems I tested had the same fatal flaws:

  • Semantic noise → Unrelated things get linked just because the words sound similar.
  • Over-inference → The AI assumes way too much from weak signals (Logical Leaps).
  • Context drift → Updated user preferences get ignored because the old data is still in the database.

So, I built a system focused on controlling how the AI thinks, not just what it remembers.


⚙️ Architecture Overview (Ignite Contextual Memory)

1. Layered Memory (SQL-based)

Instead of dumping everything into a vector store, memory is strictly structured into layers:

  • Window Memory → The active, ongoing conversation.
  • Session Summary → Compressed context: the "meeting minutes" of the conversation so far.
  • Core Memory (Tagged) → Hard facts locked behind tags like [PROFILE], [PROJECT], [STATE], [PREFERENCE].

All retrieval is done via deterministic SQL queries (PostgreSQL orchestrated by n8n).
👉 The Result: 100% predictable, absolutely zero semantic noise, and $0 API cost.
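To make the layered idea concrete, here is a minimal sketch of a tagged Core Memory table with deterministic, "last revision wins" retrieval. The post's stack is PostgreSQL orchestrated by n8n; I'm using sqlite3 here only to keep the example self-contained, and the table, column, and key names are illustrative assumptions, not Daemon's actual schema.

```python
import sqlite3

# Tagged Core Memory sketch (illustrative schema, not Daemon's internals).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE core_memory (
        id         INTEGER PRIMARY KEY,
        tag        TEXT NOT NULL,   -- e.g. PROFILE, PROJECT, STATE, PREFERENCE
        key        TEXT NOT NULL,
        value      TEXT NOT NULL,
        updated_at TEXT NOT NULL    -- ISO timestamp
    )
""")

rows = [
    ("PREFERENCE", "logo_style", "minimalist",                 "2024-01-10T09:00:00"),
    ("PREFERENCE", "logo_style", "crow symbolism (candidate)", "2024-02-01T14:30:00"),
    ("PROJECT",    "name",       "Black Vault",                "2024-01-05T08:00:00"),
]
conn.executemany(
    "INSERT INTO core_memory (tag, key, value, updated_at) VALUES (?, ?, ?, ?)",
    rows,
)

def recall(tag: str) -> dict:
    """Deterministic retrieval: for each key, the newest row wins."""
    cur = conn.execute(
        """SELECT key, value FROM core_memory m
           WHERE tag = ?
             AND updated_at = (SELECT MAX(updated_at) FROM core_memory
                               WHERE tag = m.tag AND key = m.key)""",
        (tag,),
    )
    return dict(cur.fetchall())

print(recall("PREFERENCE"))  # {'logo_style': 'crow symbolism (candidate)'}
```

The correlated subquery is plain SQL, so the same query runs unchanged on PostgreSQL, and there is no similarity scoring anywhere: the old "minimalist" preference is simply never returned once a newer revision exists.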

2. Inference Gate (Anti-Hallucination Layer)

The system forces the AI to strictly separate Explicit Facts vs Assumptions.

Example:

User: "I like crows. My project is Black Vault. What should the logo be?"
Daemon: "Not necessarily a crow. You said you like crow symbolism, but you haven’t defined it as the project identity yet. It could be an option, but right now, that’s still an assumption."

👉 The Result: No forced conclusions and zero “Yes-Man” behavior.
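A rough sketch of what such a gate can look like in code: a claim only counts as a fact if every one of its grounds was explicitly stated by the user; otherwise it is labeled an assumption. The class and field names here are my own illustration, not Daemon's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class InferenceGate:
    """Separates explicit facts from assumptions (illustrative sketch)."""
    explicit_facts: set = field(default_factory=set)

    def record(self, fact: str) -> None:
        # Only things the user actually said get recorded.
        self.explicit_facts.add(fact.lower())

    def check(self, claim: str, grounds: list) -> dict:
        """A claim is a FACT only if all its grounds were explicitly stated."""
        missing = [g for g in grounds if g.lower() not in self.explicit_facts]
        return {
            "claim": claim,
            "status": "FACT" if not missing else "ASSUMPTION",
            "missing_grounds": missing,
        }

gate = InferenceGate()
gate.record("user likes crows")
gate.record("project name is Black Vault")

# The crow-logo leap from the example above: liking crows is stated,
# "crow = project identity" is not.
verdict = gate.check(
    "the logo should be a crow",
    grounds=["user likes crows", "crow is the project identity"],
)
print(verdict["status"])           # ASSUMPTION
print(verdict["missing_grounds"])  # ['crow is the project identity']
```

In Daemon the separation is enforced through prompting rather than code, but the contract is the same: no ground, no conclusion.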

3. Semantic Bridging

Instead of relying on embeddings to find similarities, I use controlled, logical linking.

  • "AI Companion" → "Thinking Partner"
  • "External mind" → "Reflective system"

This allows the AI to track concept evolution naturally, even across long, heavily distracted conversations.
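The bridging idea can be sketched as an explicitly declared alias table: links between concepts exist only because someone declared them, never because two phrases happened to embed near each other. The mapping below is just the two examples from the list; in practice this table would grow as concepts evolve.

```python
# Controlled semantic bridges: declared links, not embedding similarity.
# (Illustrative sketch; not Daemon's actual bridge table.)
BRIDGES = {
    "ai companion": "thinking partner",
    "external mind": "reflective system",
}

def canonical(concept: str) -> str:
    """Follow declared bridges to the current canonical name for a concept."""
    seen = set()
    c = concept.lower()
    while c in BRIDGES and c not in seen:  # guard against accidental cycles
        seen.add(c)
        c = BRIDGES[c]
    return c

print(canonical("AI Companion"))  # thinking partner
print(canonical("unrelated"))     # unrelated (no bridge declared, no link made)
```

Because the table is explicit, "no entry" deterministically means "no connection": the failure mode where two unrelated things get linked just because the words sound similar cannot occur.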


🧪 Validation & Stress Testing

I ran this architecture through a demanding stress-test suite (scored by ChatGPT Pro) focused on context continuity, contradiction handling, and memory hygiene.

The Results:

  • ✅ Maintains context across heavy distractions
  • ✅ Rejects false assumptions and refuses to hallucinate
  • ✅ Handles changing user preferences correctly ("Last revision wins")
  • ✅ Keeps multiple project contexts strictly separated

🤖 What Daemon Actually Does

Fun fact: I named it Daemon (inspired by the companions in The Golden Compass) because I wanted an entity that grows alongside the user, not just a stateless bot that resets every time you close the tab.

Daemon isn’t just a Q&A chatbot. It acts as a State-Aware Thinking Partner.
It can:

  • Break down complex decisions (trade-offs, risks).
  • Challenge your assumptions (Challenger Mode).
  • Structure vague ideas into clear, actionable concepts.
  • Maintain perfect context across long, multi-day discussions.

⚠️ Important Limitations

I want to be transparent—this approach is not a magic bullet:

  • No semantic vector search yet: Scaling to massive, unstructured documents is still limited.
  • Fully reactive: It doesn't make proactive suggestions (yet).
  • Works best in focused, structured contexts rather than free-flowing creative chaos.

This doesn’t replace vector-based systems. It’s more about building cognitive discipline before scaling semantic retrieval.


🧭 The Big Takeaway

After building this, my main insight is this: The biggest limitation of LLM systems isn’t memory size—it’s uncontrolled reasoning and assumption drift. Before scaling with embeddings, it might be worth asking: Does your AI actually know when it should NOT assume something?


💬 Open Question for the Community

Has anyone else here tried building SQL-based memory systems or non-vector approaches to context management?

Curious to hear your thoughts, critiques, or even architectural roasts! 😄

Cheers! 🍻

Top comments (2)

Apex Stack

The Inference Gate is the part that most AI memory systems skip entirely — everyone focuses on "what to store" and ignores "when should the AI refuse to connect dots." The crow/Black Vault example in your article is exactly the failure mode I run into with LLM-generated content at scale: the model makes a plausible-sounding inference that isn't grounded in what the user actually said, and it goes undetected because it's coherent.

The "Last revision wins" principle in your Core Memory layer is also something I've fought with directly. I run a Qwen 3.5 8B locally to generate content across thousands of pages, and old preference data leaking into new generations is a real problem — deterministic SQL retrieval with explicit tag scoping solves it far better than hoping embeddings weight recency correctly.

Curious about n8n as the orchestrator: how do you handle branching logic when the Inference Gate rejects a connection? Does Daemon surface the rejection explicitly to the user (like in your crow example), or does it silently reroute and only flag ambiguity if pressed? The transparent rejection approach seems better for trust-building, but wondering if users find it jarring in practice.

Resmon Rama Rondonuwu

Hi Apex Stack, thanks for the insightful comment!

Great to meet another local LLM enthusiast—Qwen 3.5 8B is a beast, but you’re right, it’s prone to that 'plausible-sounding' drift if the retrieval isn’t strictly scoped.

Regarding your question on the Inference Gate and n8n branching:

In Daemon's current architecture, I opted for Explicit Transparency. When the Inference Gate identifies an ungrounded connection, it doesn't just silently reroute. Instead, the n8n workflow triggers a specific 'Clarification Node' that surfaces the ambiguity to the user.

Why? Because a true 'Thinking Partner' shouldn't guess.

In practice, users (mostly myself for now) find it much less 'jarring' than the alternative—which is the AI confidently building an entire project based on a false assumption. By explicitly saying, 'I noticed you mentioned X, but I’m not assuming it’s part of project Y yet,' it actually reduces cognitive load. You don't have to keep double-checking if the AI is still on the same page.

In n8n, this is handled via a simple Switch Node after the Inference Gate check:

  1. Path A (Grounded): Proceed to execution.

  2. Path B (Ambiguous): Trigger a 'Hold & Clarify' response template.
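In plain code, that branch is roughly the following (a sketch only; in Daemon the routing lives in an n8n Switch Node, and the response template here is made up):

```python
# Sketch of the post-gate Switch Node branching described above.
def route(gate_result: dict) -> str:
    if gate_result["status"] == "grounded":
        return "execute"  # Path A: proceed with the request
    # Path B: surface the ambiguity to the user instead of silently guessing
    missing = ", ".join(gate_result.get("missing_grounds", []))
    return (f"Hold & Clarify: I noticed this relies on [{missing}], "
            f"which you haven't confirmed yet.")

print(route({"status": "grounded"}))
print(route({"status": "ambiguous",
             "missing_grounds": ["crow = project identity"]}))
```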

It’s definitely a shift from the typical 'seamless' AI UX, but for complex logical builds, it’s been a lifesaver. Have you tried any explicit flagging with your Qwen setup?