I’ve been building a lot of AI agents lately. The logic is usually fun to write, but I kept hitting the same wall: State Management.
LLMs are stateless. As soon as the session ends, the context is gone. Sure, you can pass the chat history, but once that hits the token limit, your agent gets amnesia.
The standard solution is "Just use a Vector Database."
But setting up a dedicated Pinecone or Weaviate instance, configuring the embedding pipeline (OpenAI/HuggingFace), and writing the chunking logic just to store a few user preferences felt like massive overkill for my side projects. I didn't want to manage infrastructure; I just wanted my agent to remember that my favorite color is blue.
So, I spent the last few weeks building a dedicated API to abstract all that boring stuff away.
It’s called MemVault. It’s open-source, built with Node.js/TypeScript, and runs on PostgreSQL with pgvector.
Here is how I built it and how you can use it.
The Architecture
I wanted a "fire and forget" solution. I send text, the API handles the rest.
- Backend: Node.js & Express (TypeScript)
- Database: PostgreSQL with the pgvector extension (via Prisma ORM)
- Embeddings: OpenAI text-embedding-3-small at 512 dimensions (see the sketch below)
- Validation: Zod (because trusting user input is a bad idea)
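For reference, here is roughly what that embedding step looks like with the official OpenAI Node SDK. This is a minimal sketch, not MemVault's actual internals; the dimensions parameter is what pins text-embedding-3-small to 512 dimensions:
JavaScript
const OpenAI = require("openai");

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function embed(text) {
  // text-embedding-3-small natively supports shortening its output vector
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
    dimensions: 512, // must match the vector(512) column in Postgres
  });
  return res.data[0].embedding; // number[] of length 512
}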
The "Secret Sauce": Hybrid Search
The biggest issue with standard vector search (Cosine Similarity) is that it sometimes retrieves old, irrelevant stuff just because it uses similar words.
To fix this, I implemented a hybrid scoring algorithm directly in the SQL query. Instead of looking only at vector similarity, it blends three signals:
- Semantic similarity: how close is the meaning?
- Recency: is this memory from today or last year? (The score decays over time.)
- Importance: I added an "Importance Score" based on the content (e.g., text containing money, dates, or proper nouns gets a boost).
This ensures that if I ask "What did I buy?", it prioritizes the iPhone I bought yesterday over the socks I bought three years ago, even if the vector match is similar.
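To make that concrete, here is a simplified sketch of what such a query can look like through Prisma's raw SQL escape hatch. The weights, column names, and decay constant are illustrative placeholders, not the exact values MemVault uses:
JavaScript
const { PrismaClient } = require("@prisma/client");

const prisma = new PrismaClient();

async function hybridSearch(sessionId, queryVector, limit = 5) {
  // pgvector expects the query vector as a literal like "[0.1,0.2,...]"
  const vec = `[${queryVector.join(",")}]`;
  return prisma.$queryRaw`
    SELECT id, text,
           0.60 * (1 - (embedding <=> ${vec}::vector))                     -- semantic similarity (cosine)
         + 0.25 * exp(-extract(epoch FROM now() - created_at) / 2592000.0) -- recency: decays as the memory ages
         + 0.15 * importance                                               -- precomputed importance score
             AS score
    FROM memories
    WHERE session_id = ${sessionId}
    ORDER BY score DESC
    LIMIT ${limit};
  `;
}
The weights sum to 1 so the combined score stays in a comparable range across queries.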
Seeing is believing (The Visualizer)
Debugging RAG (Retrieval-Augmented Generation) is a pain. You usually stare at JSON arrays trying to figure out why the AI retrieved the wrong context.
So I built a frontend visualizer to actually see what's happening.
(Screenshot: the Memory Nodes graph)
It visualizes the flow: Input → Vectorization → Storage → Retrieval. It helped me catch so many edge cases where the embeddings weren't acting the way I expected.
How to use it in 3 minutes
I published the SDK to NPM so you don't have to host the backend yourself if you don't want to.
- Install
Bash
npm install memvault-sdk-jakops88
- Store a memory. You don't need to chunk the text or call OpenAI yourself; just throw the string at the API.
JavaScript
const { storeMemory } = require('memvault-sdk-jakops88');

await storeMemory({
  sessionId: "user-123",
  text: "The user prefers dark mode and writes in Python.",
  metadata: { source: "onboarding_chat" }
});
- Retrieve context. When your user asks a question, query the memory first.
JavaScript
const { retrieveMemories } = require('memvault-sdk-jakops88');

const context = await retrieveMemories({
  sessionId: "user-123",
  query: "What coding language should I use for the snippet?",
  limit: 1
});

console.log(context[0].text);
// Output: "The user prefers dark mode and writes in Python."
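That context then just gets folded into the prompt. Here is a minimal sketch of the full loop, assuming the official OpenAI SDK (the model name is an arbitrary choice):
JavaScript
const OpenAI = require("openai");
const { retrieveMemories } = require('memvault-sdk-jakops88');

async function answerWithMemory(sessionId, question) {
  // Pull the most relevant memories for this user first
  const memories = await retrieveMemories({ sessionId, query: question, limit: 3 });
  const context = memories.map((m) => `- ${m.text}`).join("\n");

  // Then let the model answer with that context in view
  const openai = new OpenAI();
  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini", // swap in whatever chat model you use
    messages: [
      { role: "system", content: `Things you remember about this user:\n${context}` },
      { role: "user", content: question },
    ],
  });
  return res.choices[0].message.content;
}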
Open Source
I’m not trying to sell a SaaS here. The whole project is open source. You can fork the repo, deploy it to your own Railway/Docker instance, and run it entirely free (except for the OpenAI API costs).
I’d love some feedback on the scoring algorithm if anyone here is into vector math.
Links:
- Live Demo & Visualizer (Try storing a memory and watch the graph update)
- NPM Package
- RapidAPI Page
Thanks for reading.