Jay Bamroliya

Posted on Jul 5

I Built an AI That Never Forgets

#ai #python #opensource #hackathon

I Built an AI That Never Forgets — for $0 (Cognee Hackathon)

By Team MindVault — Jay Bamroliya & Kaushal Karkar

Every AI assistant has the same embarrassing problem.

You spend 20 minutes explaining your project. You close the tab. You come back tomorrow — and it has no idea who you are.

Your AI has amnesia. Every. Single. Time.

For the WeMakeDevs × Cognee Hackathon, we built MindVault to fix that — a personal "living memory" that builds a knowledge graph of your life as you talk to it. And we made it run on a completely free stack. Here's exactly how, including everything that broke along the way.

The Problem with Stateless AI

When you call an LLM, every request starts from zero. No memory of your last session, your preferences, your decisions, or your name.

The usual workarounds all fall short:

System prompts — token-limited, manually managed
Vector databases — semantic similarity only, no relational context
RAG pipelines — complex to build, no graph awareness

None of these give you real persistent memory.

Enter Cognee

Cognee is an open-source memory layer for AI agents. It turns text into a hybrid graph-vector knowledge store — two retrieval systems working together:

Vector search — "find things semantically similar to this query"
Graph traversal — "follow relationships between concepts"

That's the difference between a filing cabinet and an actual brain.
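To make the two retrieval modes concrete, here is a toy sketch in plain Python. It is not how Cognee implements either mode: the hand-written 3-dim vectors, the `EDGES` dictionary, and both function names are made up for illustration.

```python
import math

# Toy data: hand-written "embeddings" and a tiny relationship graph.
# Everything here is illustrative, not Cognee's storage format.
VECTORS = {
    "Jay is building MindVault": [0.9, 0.1, 0.0],
    "MindVault is powered by Cognee": [0.2, 0.9, 0.1],
    "Bananas are yellow": [0.0, 0.1, 0.9],
}
EDGES = {
    "Jay": ["MindVault"],
    "MindVault": ["Cognee"],
    "Cognee": [],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def vector_search(query_vec, top_k=1):
    """Rank stored texts by cosine similarity to the query vector."""
    ranked = sorted(VECTORS, key=lambda t: cosine(query_vec, VECTORS[t]), reverse=True)
    return ranked[:top_k]


def graph_reachable(start, max_hops=2):
    """Collect entities reachable from `start` within `max_hops` edges (breadth-first)."""
    seen, frontier = [], [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbour in EDGES.get(node, []):
                if neighbour not in seen:
                    seen.append(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen


print(vector_search([1.0, 0.0, 0.0]))  # the stored text closest to the query vector
print(graph_reachable("Jay"))          # entities linked to Jay within two hops
```

Vector search can only return texts that sit near the query, while the graph walk reaches "Cognee" from "Jay" even though no single stored sentence mentions both.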

Cognee 1.2's memory API is beautifully simple — four verbs that cover the whole memory lifecycle:

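import asyncio
import cognee

async def main():
    await cognee.remember("Jay is a developer from India building MindVault.")
    results = await cognee.recall("Who is building MindVault?")
    print(results)
    await cognee.improve()                 # enrich graph connections
    await cognee.forget(everything=True)   # GDPR-ready erasure

asyncio.run(main())  # a bare top-level await only works in a notebook or async REPL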

What We Built: MindVault

A chat interface where every message becomes structured memory:

Operation	What happens
💾 Remember	Text → embedded + mined into knowledge-graph entities
🔍 Recall	Question → hybrid graph+vector search → AI answer from YOUR memories
✨ Improve	Re-runs enrichment, strengthening graph connections
🗑️ Forget	Full erasure — complete data lifecycle

Plus the parts we're proud of:

A live force-directed knowledge graph rendered on Canvas — zero libraries, custom physics (repulsion, springs, gravity). You literally watch your memory grow as you type.
Voice input via the Web Speech API — speak your memories.
A live LOCAL ↔ CLOUD toggle: one click switches between open-source Cognee running on your machine and Cognee Cloud. No restart, and the same codebase: memory_engine.py abstracts both backends behind identical async functions.
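For readers curious about the graph physics, here is a minimal Python sketch of one simulation step with the same three forces (repulsion, springs, gravity). The MindVault version runs in the browser on Canvas, so this is a translation of the idea, not the project's code. The function name `layout_step` and every constant are assumptions chosen for illustration.

```python
import math


def layout_step(pos, edges, repulsion=500.0, spring=0.05,
                rest_len=50.0, gravity=0.01, dt=1.0):
    """Advance node positions by one step. `pos` maps node name -> (x, y)."""
    nodes = list(pos)
    force = {n: [0.0, 0.0] for n in nodes}

    # Repulsion: every pair of nodes pushes apart, falling off with distance squared.
    # Exactly coincident nodes get no push because their direction is undefined.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 0.01
            f = repulsion / (dist * dist)
            force[a][0] += f * dx / dist
            force[a][1] += f * dy / dist
            force[b][0] -= f * dx / dist
            force[b][1] -= f * dy / dist

    # Springs: connected nodes are pulled (or pushed) toward the rest length.
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        dist = math.hypot(dx, dy) or 0.01
        f = spring * (dist - rest_len)
        force[a][0] += f * dx / dist
        force[a][1] += f * dy / dist
        force[b][0] -= f * dx / dist
        force[b][1] -= f * dy / dist

    # Gravity: a weak pull toward the origin keeps the graph on screen.
    for n in nodes:
        force[n][0] -= gravity * pos[n][0]
        force[n][1] -= gravity * pos[n][1]

    return {n: (pos[n][0] + dt * force[n][0], pos[n][1] + dt * force[n][1])
            for n in nodes}


pos = {"Jay": (-100.0, 0.0), "MindVault": (100.0, 0.0), "Cognee": (0.0, 120.0)}
edges = [("Jay", "MindVault"), ("MindVault", "Cognee")]
for _ in range(200):
    pos = layout_step(pos, edges)
print(pos)
```

Calling the step once per animation frame and redrawing is what produces the "watch it grow" effect; a production version would usually add velocity damping as well.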

Browser (chat · voice · live graph · toggle)
        │
        ▼
FastAPI backend ── /remember /recall /improve /forget
        │
        ▼
memory_engine.py ── one interface, two backends
   ├── LOCAL:  open-source Cognee + Groq + fastembed
   └── CLOUD:  Cognee Cloud REST API
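The "one interface, two backends" idea can be sketched roughly like this. It is an illustration of the pattern, not the contents of memory_engine.py: `MemoryBackend`, `InMemoryBackend`, `MemoryEngine`, and `set_mode` are hypothetical names, and the stand-in backend does naive word matching where the real ones would call Cognee or the Cognee Cloud REST API.

```python
import asyncio
from typing import Protocol


class MemoryBackend(Protocol):
    """The shared shape both backends must have (hypothetical)."""

    async def remember(self, text: str) -> None: ...
    async def recall(self, question: str) -> list[str]: ...


class InMemoryBackend:
    """Stand-in backend: stores strings and matches on shared words."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.items: list[str] = []

    async def remember(self, text: str) -> None:
        self.items.append(text)

    async def recall(self, question: str) -> list[str]:
        words = set(question.lower().split())
        return [t for t in self.items if words & set(t.lower().split())]


class MemoryEngine:
    """Routes every call to the active backend; the mode can change at runtime."""

    def __init__(self, backends: dict[str, MemoryBackend], mode: str) -> None:
        self._backends = backends
        self.set_mode(mode)

    def set_mode(self, mode: str) -> None:
        if mode not in self._backends:
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode

    async def remember(self, text: str) -> None:
        await self._backends[self.mode].remember(text)

    async def recall(self, question: str) -> list[str]:
        return await self._backends[self.mode].recall(question)


async def demo() -> list[str]:
    engine = MemoryEngine(
        {"local": InMemoryBackend("local"), "cloud": InMemoryBackend("cloud")},
        mode="local",
    )
    await engine.remember("Jay is building MindVault")
    engine.set_mode("cloud")  # the toggle: no restart, same call sites
    await engine.remember("MindVault is powered by Cognee")
    return await engine.recall("Who is building MindVault")


print(asyncio.run(demo()))
```

Note that in this sketch each backend keeps its own data, so a memory stored in one mode is not visible after switching to the other.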

The Real Story: Making It Run for $0

This was the hardest and most educational part. We had no budget for APIs. Here's the free stack and every wall we hit:

Wall 1: LLM costs. Groq's free tier gives you llama-3.3-70b-versatile at 6,000 tokens/minute. Sounds fine — until you learn Cognee's cognify pipeline makes multiple concurrent LLM calls. Instant 429 rate-limit errors.

Fix: Cognee ships a built-in rate limiter (backed by aiolimiter). Three env vars:

LLM_RATE_LIMIT_ENABLED=true
LLM_RATE_LIMIT_REQUESTS=1
LLM_RATE_LIMIT_INTERVAL=15

Calls queue and space out automatically. remember() takes ~90 seconds on the free tier — a fair trade for $0.
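To see why queuing removes the 429s, here is a small stand-alone sketch of request spacing using only asyncio. It is a simplified fixed-interval spacer, not Cognee's limiter and not aiolimiter's leaky-bucket algorithm; `IntervalLimiter`, `fake_llm_call`, and `run_burst` are hypothetical names, and the demo uses a short interval so it finishes quickly.

```python
import asyncio
import time


class IntervalLimiter:
    """Allow at most one call per `interval` seconds; extra callers wait their turn."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._lock = asyncio.Lock()
        self._next_slot = 0.0  # monotonic time at which the next call may start

    async def wait(self) -> None:
        # Reserve a slot under the lock, then sleep outside it so that
        # other callers can reserve their own slots in the meantime.
        async with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_slot - now)
            self._next_slot = max(now, self._next_slot) + self.interval
        if delay:
            await asyncio.sleep(delay)


async def fake_llm_call(limiter: IntervalLimiter, started: list[float]) -> None:
    await limiter.wait()
    started.append(time.monotonic())  # a real call to the LLM would go here


async def run_burst(n_calls: int, interval: float) -> list[float]:
    """Fire `n_calls` concurrently and return the times they actually started."""
    limiter = IntervalLimiter(interval)
    started: list[float] = []
    await asyncio.gather(*(fake_llm_call(limiter, started) for _ in range(n_calls)))
    return started


starts = sorted(asyncio.run(run_burst(4, 0.05)))
print([round(t - starts[0], 2) for t in starts])
```

With this kind of spacing, N concurrent calls cannot finish sooner than (N - 1) × interval, which is where most of the wait in a slow remember() would come from at a 15-second interval.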

Wall 2: Embedding costs. Cognee defaults to OpenAI embeddings — which means an OpenAI key and a bill.

Fix: fastembed runs BAAI/bge-small-en-v1.5 locally. No API key, and no network calls once the model weights have been downloaded on first use:

EMBEDDING_PROVIDER=fastembed
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5

Wall 3: Vector dimension mismatch. Our LanceDB store had been created with OpenAI's 3072-dim vectors; fastembed produces 384-dim. Schema conflict, cryptic errors.

Fix: wipe .cognee_system/databases (which also deletes everything stored so far) and let it rebuild with the right schema. Lesson: embedding dimensions are part of your storage schema, so changing providers means migrating.
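A cheap way to turn the cryptic errors into a clear one is to compare vector lengths before writing. The guard below is a generic sketch, not part of Cognee or LanceDB; `DimensionMismatch` and `check_dimensions` are hypothetical names, and `stored_dim` stands for whatever dimension your store was created with.

```python
class DimensionMismatch(ValueError):
    """Raised when a new embedding does not fit the store's existing schema."""


def check_dimensions(stored_dim, embedding):
    """Return the embedding's length, or raise if it conflicts with the store.

    `stored_dim` is None for a brand-new store that has no schema yet.
    """
    if stored_dim is not None and len(embedding) != stored_dim:
        raise DimensionMismatch(
            f"store expects {stored_dim}-dim vectors, got {len(embedding)}; "
            "re-embed the existing data or rebuild the store"
        )
    return len(embedding)


# A store built with 3072-dim vectors rejects a 384-dim one up front.
try:
    check_dimensions(3072, [0.0] * 384)
except DimensionMismatch as err:
    print(err)
```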

Wall 4 (Cloud mode): the silent no-op. Cognee Cloud's /api/v1/add accepts multipart file uploads, not JSON. Our JSON POSTs returned plausible status codes while storing nothing. Recall answers were pure LLM hallucination — confidently wrong, cached per-question.

Fix: read the OpenAPI spec (/openapi.json), switch to multipart:

files={"data": ("memory.txt", text.encode(), "text/plain")},
data={"datasetName": dataset},
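To show what actually differs on the wire, here is a stdlib-only sketch that builds both request bodies without sending anything. The field names `data` and `datasetName` and the filename `memory.txt` come from the snippet above; the JSON key layout is only a guess at what such a POST might look like, and `encode_json` and `encode_multipart` are hypothetical helpers. In practice an HTTP client builds the multipart body for you.

```python
import json
import uuid


def encode_json(text, dataset):
    """Body and headers for a JSON POST (the shape that stored nothing for us)."""
    body = json.dumps({"data": text, "datasetName": dataset}).encode("utf-8")
    return {"Content-Type": "application/json"}, body


def encode_multipart(text, dataset):
    """Body and headers for a multipart/form-data upload with one file part."""
    boundary = uuid.uuid4().hex
    parts = [
        # Plain form field carrying the dataset name.
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="datasetName"\r\n\r\n'
        f"{dataset}\r\n",
        # File part: the memory text uploaded as memory.txt.
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="data"; filename="memory.txt"\r\n'
        "Content-Type: text/plain\r\n\r\n"
        f"{text}\r\n",
        # Closing boundary marks the end of the body.
        f"--{boundary}--\r\n",
    ]
    body = "".join(parts).encode("utf-8")
    return {"Content-Type": f"multipart/form-data; boundary={boundary}"}, body


headers, body = encode_multipart("Jay is building MindVault", "main_dataset")
print(headers["Content-Type"])
print(body.decode("utf-8"))
```

The two bodies carry the same text, but a server that parses only multipart file parts has nothing to read in the JSON version.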

Debugging lesson: when search says "no data found" but add says "success," trust the negative signal — verify what's actually stored (GET /api/v1/datasets/{id}/data) instead of trusting status codes.
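That lesson can be baked into the write path as a read-after-write check. This is a generic sketch: `add_and_verify` is a hypothetical helper, and `add` and `list_stored` are placeholder callables standing in for whatever functions wrap your add and dataset-listing requests.

```python
def add_and_verify(add, list_stored, text):
    """Write `text`, then confirm it can actually be read back.

    `add(text)` performs the write and returns a status code.
    `list_stored()` returns the texts currently in the dataset.
    Both are hypothetical callables supplied by the caller.
    """
    status = add(text)
    if text not in list_stored():
        raise RuntimeError(
            f"add returned {status} but the text is not in the dataset"
        )
    return status


# A backend that reports success while storing nothing is caught immediately.
try:
    add_and_verify(lambda text: 200, lambda: [], "Jay is building MindVault")
except RuntimeError as err:
    print(err)
```

Running a check like this once during development would have surfaced the silent no-op on the first write instead of after a round of hallucinated recall answers.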

What Surprised Us

Graph traversal is genuinely different from vector search. We stored "Jay is building MindVault" and "MindVault is powered by Cognee AI" as separate memories, then asked "What is Jay building?" — Cognee connected the dots through the graph, not by keyword overlap.

improve() is underrated. Most people stop at add-and-search. Re-running enrichment after accumulating memories visibly strengthens the graph — new edges appear between old nodes.

Try It

git clone https://github.com/jaybamroliya/mindvault
cd mindvault
pip install -r requirements.txt
cp .env.example .env   # add a free Groq key from console.groq.com
python -m uvicorn main:app --port 8000

Total cost: $0. No credit card anywhere in the stack.

Final Thoughts

"Stateless AI" is one of the most annoying unsolved UX problems in AI. Cognee solves it properly — not with a prompt hack, but with a real hybrid memory architecture that you can self-host for free or scale on their cloud.

If you're building agents, give your AI a memory. It changes everything.

Built for the WeMakeDevs × Cognee Hackathon by Team MindVault — Jay Bamroliya & Kaushal Karkar. Source: github.com/jaybamroliya/mindvault
