So this week was spent on embedding memory in a GenAI app. Do you remember what your name is? Of course you do. Or do you? That's how memory works: anything that counts as important information gets stored. But we don't remember everything; for example, try recalling the first word of this article. Your brain stores information in two parts, long-term and short-term, which is self-explanatory. The same goes for an LLM: we can keep important things in long-term memory, while small, transient details belong in short-term memory. But the question is, how can we do it? And is it even necessary?
GitHub repo for the code: https://github.com/Ashdeep-Singh-97/GenAI-memory
Short-Term vs Long-Term Memory in LLMs
When humans have a conversation, we naturally remember things said a few moments ago (short-term memory), but we don’t retain every detail forever. Only meaningful events or facts are stored in long-term memory.
For an LLM (Large Language Model), the same principle applies:
Short-Term Memory: This is like the conversation window or context. The model remembers the current dialogue and the past few exchanges, but this memory disappears once the session ends (see the sketch just after this list).
Long-Term Memory: This is persistent memory, stored outside the model in databases, vector stores, or graph stores. It allows the AI to recall facts, preferences, and past conversations across multiple sessions.
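To make the short-term side concrete, here's a minimal sketch of a context window: the "memory" is just the list of recent messages resent with every request. This assumes the OpenAI Python SDK; `history`, `MAX_TURNS`, and the trimming policy are illustrative choices, not from the repo.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MAX_TURNS = 5  # keep only the last few exchanges (illustrative value)
history = []   # the running conversation -- this IS the short-term memory

def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    # Only the most recent turns are resent; anything older is simply "forgotten"
    recent = history[-(MAX_TURNS * 2):]
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=recent,
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```

Once the process exits, `history` is gone. That's exactly the amnesia problem long-term memory solves.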
Why is this important? Without long-term memory, an LLM feels like someone with amnesia: it can give smart responses in the moment, but it won't remember you tomorrow. With persistent memory, the AI becomes more human-like, able to build relationships and continuity over time.
Let's see it in code.
Refer to the GitHub link above for the code. Having read it, you can see it is a simple implementation of long-term + short-term memory using mem0ai, Neo4j (for graph-based storage), and Qdrant (for vector embeddings).
Here’s what’s happening:
Neo4j (Graph Store) → Stores relationships between pieces of information (like a mind map). This is great for knowledge graphs.
Qdrant (Vector Store) → Stores embeddings (numerical representations of text). This helps in semantic search so the model can recall similar memories (a config sketch wiring both stores together follows below).
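Here's roughly how the two stores get wired together with mem0. This is a sketch, not the repo's exact file: the URLs and credentials are placeholders for local Neo4j and Qdrant instances, and the exact config keys may differ slightly across mem0 versions.

```python
from mem0 import Memory

# Placeholder credentials -- substitute your own Neo4j/Qdrant details.
config = {
    "graph_store": {
        "provider": "neo4j",
        "config": {
            "url": "bolt://localhost:7687",
            "username": "neo4j",
            "password": "password",
        },
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333,
        },
    },
}

mem = Memory.from_config(config)
```

With this one `mem` object, every add goes to both stores: Qdrant gets the embedding for semantic recall, and Neo4j gets the extracted entities and relationships.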
How this works (sketched in code after these steps):
Search memory: Before responding, the code fetches relevant past memories (mem.search) for the user "piyush".
System prompt building: These memories are inserted into the system prompt, giving the AI context about past interactions.
Generate response: The query is sent to OpenAI (gpt-4.1-mini) with both the user query and context.
Store new memory: The conversation (both user’s question and AI’s response) is stored in the memory for future reference.
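Putting those four steps together, here is a sketch of the search → prompt → respond → store loop, assuming the mem0 and OpenAI Python SDKs. It reuses the `mem` instance from the config sketch above; the `answer` function name and prompt wording are illustrative, not the repo's exact code.

```python
from openai import OpenAI

client = OpenAI()

def answer(query: str, user_id: str = "piyush") -> str:
    # 1. Search memory: fetch past memories relevant to this query.
    hits = mem.search(query=query, user_id=user_id)
    # mem0's return shape has varied across versions (a list vs {"results": [...]}).
    items = hits.get("results", []) if isinstance(hits, dict) else hits
    memories = "\n".join(item["memory"] for item in items)

    # 2. Build the system prompt with those memories as context.
    system_prompt = (
        "You are a helpful assistant.\n"
        f"Relevant memories about this user:\n{memories}"
    )

    # 3. Generate a response with both the context and the query.
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
    )
    reply = response.choices[0].message.content

    # 4. Store the new exchange so future sessions can recall it.
    mem.add(
        [
            {"role": "user", "content": query},
            {"role": "assistant", "content": reply},
        ],
        user_id=user_id,
    )
    return reply
```

Each call both reads from and writes to memory, so facts the user mentions today ("my name is Piyush", "I prefer Python") come back as context tomorrow, even in a brand-new session.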
Why this matters
This approach gives your GenAI app a sense of continuity. With short-term memory, it handles the current session, and with long-term memory, it builds a persistent knowledge base about users. The blend of vector search (semantic recall) and graph storage (structured relationships) makes the memory powerful and human-like.
And with this, let's wrap today's article. Hopefully you'll store it in your long-term memory.
Keep following for more.
Peace.....