I got tired of my agents making things up in long-horizon or multi-session workflows. So I built a memory layer that refuses to.
EidolonDB gives agents three memory tiers — short_term, episodic, and semantic — with automatic promotion and decay. You ingest raw conversation text, and an LLM pipeline extracts structured memories, classifies them by tier, scores importance, and deduplicates. Over time, short-term facts either promote to long-term knowledge or expire.
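The tier lifecycle can be sketched in a few lines. This is an illustrative model of the promote-or-expire behavior described above, not EidolonDB's actual implementation; the names (`Memory`, `tick`, `PROMOTE_THRESHOLD`, `DECAY_PER_DAY`) and the linear decay rule are assumptions for the sake of the example.

```python
from dataclasses import dataclass, field
import time

# Illustrative constants -- the real system scores importance with an LLM
# pipeline; these values are placeholders, not EidolonDB's defaults.
PROMOTE_THRESHOLD = 0.8   # effective importance above this promotes to semantic
DECAY_PER_DAY = 0.1       # importance lost per day while in short_term

@dataclass
class Memory:
    text: str
    tier: str = "short_term"           # short_term | episodic | semantic
    importance: float = 0.5
    created_at: float = field(default_factory=time.time)

def tick(mem: Memory, now: float) -> Memory:
    """Apply decay, then promote or expire a short-term memory."""
    if mem.tier != "short_term":
        return mem
    age_days = (now - mem.created_at) / 86400
    score = mem.importance - DECAY_PER_DAY * age_days
    if score >= PROMOTE_THRESHOLD:
        mem.tier = "semantic"          # promoted to long-term knowledge
    elif score <= 0:
        mem.tier = "expired"           # decayed out entirely
    return mem
```

The point of the sketch: a fact's fate depends on both its scored importance and how long it has sat unpromoted.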
Key idea: if something isn’t in memory, the system rejects the premise instead of guessing.
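A minimal sketch of "reject instead of guess": only answer when retrieval returns a supporting memory, otherwise refuse the premise outright. The in-memory dict and function names here are stand-ins, not EidolonDB's retrieval code.

```python
# Toy memory store standing in for real retrieval (hypothetical, for
# illustration only -- EidolonDB retrieves over extracted memories).
MEMORIES = {"deadline": "The project deadline is March 3."}

def answer(query_key: str) -> str:
    """Answer only from memory; reject premises with no supporting record."""
    hit = MEMORIES.get(query_key)
    if hit is None:
        return "I have no memory of that; I can't confirm the premise."
    return hit
```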
## How I validated it
I built an eval harness with 8 multi-session scenarios:
- project assistant
- personal assistant
- technical support
- preference drift
- ambiguous recall
- contradictory memory
- incomplete recall
- temporal retrieval
Each scenario spans 3 sessions, with a held-out judge scoring:
- recall accuracy
- hallucination / false-premise acceptance
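One way to picture the judging: score recall as the fraction of gold facts the agent surfaces, and false-premise acceptance as the fraction of planted falsehoods it goes along with. This is a hedged sketch of the idea, not the actual harness; the function signature and substring matching are assumptions (the real judge is a held-out LLM).

```python
def score(transcript: str, gold_facts: list[str], false_premises: list[str]) -> dict:
    """Toy judge: substring checks stand in for LLM judgment (illustrative)."""
    t = transcript.lower()
    recalled = sum(f.lower() in t for f in gold_facts)
    accepted = sum(p.lower() in t for p in false_premises)
    return {
        "recall": recalled / len(gold_facts) if gold_facts else 1.0,
        "false_premise_acceptance": accepted / len(false_premises) if false_premises else 0.0,
    }
```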
## Results
- No-memory baseline: 0.158
- RAG baseline: 0.933 (with the same rejection prompting; naive RAG scores ~0.65)
- EidolonDB: 1.000
In particular, EidolonDB consistently rejected false premises that weren’t present in memory.
## What’s available
- REST API (self-host or cloud)
- Fully self-hostable (Docker + Postgres)
- JS SDK (@eidolondb/client) and Python SDK (eidolondb)
- Temporal retrieval (“what did we discuss last session?”)
- Retrieval feedback loop for lifecycle weighting
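The last two features above can be sketched together: temporal retrieval filters memories by session, and the feedback loop bumps a memory's lifecycle weight each time it is retrieved, so useful memories promote sooner and decay slower. Everything here (`Mem`, `last_session`, `record_retrieval`, the bump size) is a hypothetical illustration, not the SDK's API.

```python
from dataclasses import dataclass

@dataclass
class Mem:
    text: str
    session: int  # which session this memory was recorded in

def last_session(mems: list[Mem], current: int) -> list[str]:
    """Temporal retrieval: everything from the immediately previous session."""
    return [m.text for m in mems if m.session == current - 1]

def record_retrieval(weights: dict, mem_id: str, bump: float = 0.1) -> None:
    """Feedback loop: each retrieval increases a memory's lifecycle weight."""
    weights[mem_id] = weights.get(mem_id, 0.0) + bump
```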
Pricing:
- Free tier
- Developer ($19/mo)
- Growth ($99/mo)
## Links
- Site: https://eidolondb.com
- Docs: https://eidolondb.com/docs
- GitHub: https://github.com/millbj92/eidolondb
Happy to answer questions about eval methodology, lifecycle design, or architecture.