I’m building Orion, a proactive AI companion. Not just a chatbot — something that actually remembers things over time and can act intelligently before being asked.
**My first attempt was naïve:**
“Let’s pass all past memory to the LLM!”
Result: 💸 Massive token usage, 🤯 polluted context, 😅 nonsense answers.
**Second attempt:**
“More retrieval = better answers”
Result: the LLM got distracted, important memories got buried, cost went up, and I learned that more isn’t always better.
**What finally started working: structured memory layers**
Short-term memory → current session, quick context
Long-term memory → structured facts like preferences, events
Vector memory → semantic recall for similar past situations
TTL + scoring → forgetting intentionally is a feature
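Roughly, the layering looks like this. This is a simplified Python sketch, not Orion’s actual code (that’s in the repo); the class names and the pluggable `embed` hook are illustrative assumptions:

```python
import math

class ShortTermMemory:
    """Rolling window over the current session: cheap, recent context."""
    def __init__(self, max_turns: int = 20):
        self.turns: list[str] = []
        self.max_turns = max_turns

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        self.turns = self.turns[-self.max_turns:]  # keep only the recent window

class LongTermMemory:
    """Structured facts keyed by topic: preferences, events, etc."""
    def __init__(self):
        self.facts: dict[str, str] = {}

    def upsert(self, key: str, fact: str) -> None:
        self.facts[key] = fact  # one fact per topic; updates overwrite in place

class VectorMemory:
    """Semantic recall: embed, store, retrieve by cosine similarity."""
    def __init__(self, embed):  # embed: str -> list[float], any embedding model works
        self.embed = embed
        self.entries: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        self.entries.append((self.embed(text), text))

    def recall(self, query: str, k: int = 3) -> list[str]:
        q = self.embed(query)
        ranked = sorted(self.entries, key=lambda e: self._cos(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

    @staticmethod
    def _cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
```

The point of splitting layers is that each one answers a different question: “what just happened?”, “what do I know for sure?”, and “what does this remind me of?”, so no single store has to do everything.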
The tricky part isn’t the LLM or the vector DB — it’s deciding what to remember, when to update, and what to forget.
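For the forgetting part, one simple policy is TTL plus exponential score decay: every memory gets an importance at write time, that importance decays with age, and a periodic sweep drops anything expired or decayed below a threshold. A minimal sketch (the names, half-life, and threshold here are placeholders I picked for illustration, not tuned values from Orion):

```python
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    importance: float                      # assigned at write time (heuristic or LLM-rated)
    created_at: float = field(default_factory=time.time)
    ttl: float = 24 * 3600                 # hard expiry, in seconds

def effective_score(m: Memory, now: float, half_life: float = 6 * 3600) -> float:
    """Importance halves every `half_life` seconds."""
    return m.importance * 0.5 ** ((now - m.created_at) / half_life)

def sweep(memories: list[Memory], now: float, threshold: float = 0.1) -> list[Memory]:
    """Keep only memories that are unexpired and still score above the threshold."""
    return [m for m in memories
            if now - m.created_at < m.ttl
            and effective_score(m, now) >= threshold]
```

Decay means a memory doesn’t have to be deleted the moment it expires; it just loses the competition for context space until the sweep finally drops it.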
I documented everything:
architecture diagrams, memory flows, mistakes, lessons learned. It’s messy, but real.
Repo (full system + docs + diagrams):
👉 https://github.com/vivek-1314/orion-py