<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jay Bamroliya</title>
    <description>The latest articles on DEV Community by Jay Bamroliya (@jay_bamroliya_402b72cf784).</description>
    <link>https://dev.to/jay_bamroliya_402b72cf784</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4016133%2Fbac00b3d-85aa-4759-95d7-0ca27092b377.jpg</url>
      <title>DEV Community: Jay Bamroliya</title>
      <link>https://dev.to/jay_bamroliya_402b72cf784</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jay_bamroliya_402b72cf784"/>
    <language>en</language>
    <item>
      <title>I Built an AI That Never Forgets</title>
      <dc:creator>Jay Bamroliya</dc:creator>
      <pubDate>Sun, 05 Jul 2026 12:16:35 +0000</pubDate>
      <link>https://dev.to/jay_bamroliya_402b72cf784/i-built-an-ai-that-never-forgets-1c2a</link>
      <guid>https://dev.to/jay_bamroliya_402b72cf784/i-built-an-ai-that-never-forgets-1c2a</guid>
      <description>&lt;h1&gt;
  
  
  I Built an AI That Never Forgets — for $0 (Cognee Hackathon)
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;By Team MindVault — Jay Bamroliya &amp;amp; Kaushal Karkar&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Every AI assistant has the same embarrassing problem.&lt;/p&gt;

&lt;p&gt;You spend 20 minutes explaining your project. You close the tab. You come back tomorrow — and it has no idea who you are.&lt;/p&gt;

&lt;p&gt;Your AI has amnesia. Every. Single. Time.&lt;/p&gt;

&lt;p&gt;For the WeMakeDevs × Cognee Hackathon, we built &lt;strong&gt;MindVault&lt;/strong&gt; to fix that — a personal "living memory" that builds a knowledge graph of your life as you talk to it. And we made it run on a completely free stack. Here's exactly how, including everything that broke along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with Stateless AI
&lt;/h2&gt;

&lt;p&gt;When you call an LLM, every request starts from zero. No memory of your last session, your preferences, your decisions, or your name.&lt;/p&gt;

&lt;p&gt;The usual workarounds all fall short:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System prompts&lt;/strong&gt; — token-limited, manually managed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector databases&lt;/strong&gt; — semantic similarity only, no relational context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG pipelines&lt;/strong&gt; — complex to build, no graph awareness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these give you &lt;em&gt;real&lt;/em&gt; persistent memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enter Cognee
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/topoteretes/cognee" rel="noopener noreferrer"&gt;Cognee&lt;/a&gt; is an open-source memory layer for AI agents. It turns text into a &lt;strong&gt;hybrid graph-vector knowledge store&lt;/strong&gt; — two retrieval systems working together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Vector search&lt;/strong&gt; — "find things semantically similar to this query"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph traversal&lt;/strong&gt; — "follow relationships between concepts"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the difference between a filing cabinet and an actual brain.&lt;/p&gt;

&lt;p&gt;Cognee 1.2's memory API is beautifully simple — four verbs that cover the whole memory lifecycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cognee&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;cognee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remember&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Jay is a developer from India building MindVault.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;cognee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Who is building MindVault?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;cognee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;improve&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                 &lt;span class="c1"&gt;# enrich graph connections
&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;cognee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;everything&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# GDPR-ready erasure
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What We Built: MindVault
&lt;/h2&gt;

&lt;p&gt;A chat interface where every message becomes structured memory:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;💾 &lt;strong&gt;Remember&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Text → embedded + mined into knowledge-graph entities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🔍 &lt;strong&gt;Recall&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Question → hybrid graph+vector search → AI answer from YOUR memories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✨ &lt;strong&gt;Improve&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Re-runs enrichment, strengthening graph connections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🗑️ &lt;strong&gt;Forget&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Full erasure — complete data lifecycle&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Plus the parts we're proud of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A live force-directed knowledge graph&lt;/strong&gt; rendered on Canvas — zero libraries, custom physics (repulsion, springs, gravity). You literally watch your memory grow as you type.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice input&lt;/strong&gt; via the Web Speech API — speak your memories.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A live LOCAL ↔ CLOUD toggle&lt;/strong&gt;: one click switches between open-source Cognee running on your machine and Cognee Cloud, with no restart. It is the same codebase either way: &lt;code&gt;memory_engine.py&lt;/code&gt; abstracts both backends behind identical async functions.
&lt;/li&gt;
&lt;/ul&gt;
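&lt;p&gt;To make the "custom physics" in the first bullet concrete, here is a minimal sketch of one force-directed layout step. It is written in Python for readability, while the real graph runs on Canvas in the browser. The function name &lt;code&gt;layout_step&lt;/code&gt; and every constant are illustrative assumptions, not values from our code.&lt;/p&gt;

```python
import math


def layout_step(pos, edges, repulsion=500.0, spring=0.05, rest=80.0,
                gravity=0.01, dt=1.0):
    """Advance positions by one step.

    pos:   {node: (x, y)}
    edges: [(node_a, node_b), ...]
    All constants are illustrative assumptions.
    """
    nodes = list(pos)
    force = {n: [0.0, 0.0] for n in nodes}

    # Repulsion: every pair of nodes pushes apart, fading with distance squared.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 0.01  # avoid division by zero
            f = repulsion / (d * d)
            force[a][0] += f * dx / d
            force[a][1] += f * dy / d
            force[b][0] -= f * dx / d
            force[b][1] -= f * dy / d

    # Springs: connected nodes are pulled toward the rest length.
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        d = math.hypot(dx, dy) or 0.01
        f = spring * (d - rest)
        force[a][0] += f * dx / d
        force[a][1] += f * dy / d
        force[b][0] -= f * dx / d
        force[b][1] -= f * dy / d

    # Gravity: a weak pull toward the origin keeps the graph on screen.
    for n in nodes:
        force[n][0] -= gravity * pos[n][0]
        force[n][1] -= gravity * pos[n][1]

    return {n: (pos[n][0] + dt * force[n][0], pos[n][1] + dt * force[n][1])
            for n in nodes}
```

&lt;p&gt;Calling a step like this once per animation frame and redrawing is what makes new nodes drift into place as memories arrive.&lt;/p&gt;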

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser (chat · voice · live graph · toggle)
        │
        ▼
FastAPI backend ── /remember /recall /improve /forget
        │
        ▼
memory_engine.py ── one interface, two backends
   ├── LOCAL:  open-source Cognee + Groq + fastembed
   └── CLOUD:  Cognee Cloud REST API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
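&lt;p&gt;The "one interface, two backends" idea in the diagram can be sketched as follows. This is not the actual &lt;code&gt;memory_engine.py&lt;/code&gt;: the class names are hypothetical, and &lt;code&gt;InMemoryBackend&lt;/code&gt; is a stand-in so the example runs without Cognee installed.&lt;/p&gt;

```python
import asyncio
from typing import Protocol


class MemoryBackend(Protocol):
    """The shape both backends share (hypothetical, for illustration)."""

    async def remember(self, text: str) -> None: ...

    async def recall(self, query: str) -> list: ...


class InMemoryBackend:
    """Stand-in backend; a real one would call Cognee locally or over REST."""

    def __init__(self):
        self.items = []

    async def remember(self, text):
        self.items.append(text)

    async def recall(self, query):
        words = query.lower().split()
        return [t for t in self.items if any(w in t.lower() for w in words)]


class MemoryEngine:
    """Routes every call to whichever backend is currently selected."""

    def __init__(self, backends, mode):
        self.backends = backends
        self.mode = mode

    def switch(self, mode):
        # The toggle is only a dictionary key change, so no restart is needed.
        if mode not in self.backends:
            raise ValueError(f"unknown backend: {mode}")
        self.mode = mode

    async def remember(self, text):
        await self.backends[self.mode].remember(text)

    async def recall(self, query):
        return await self.backends[self.mode].recall(query)
```

&lt;p&gt;Because the API routes only ever talk to the engine, the front-end toggle never has to know which backend is live.&lt;/p&gt;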



&lt;h2&gt;
  
  
  The Real Story: Making It Run for $0
&lt;/h2&gt;

&lt;p&gt;This was the hardest and most educational part. We had &lt;strong&gt;no budget for APIs&lt;/strong&gt;. Here's the free stack and every wall we hit:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 1: LLM costs.&lt;/strong&gt; Groq's free tier gives you &lt;code&gt;llama-3.3-70b-versatile&lt;/code&gt; at 6,000 tokens/minute. Sounds fine — until you learn Cognee's cognify pipeline makes &lt;em&gt;multiple concurrent LLM calls&lt;/em&gt;. Instant 429 rate-limit errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Cognee ships a built-in rate limiter (backed by &lt;code&gt;aiolimiter&lt;/code&gt;). Three env vars:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM_RATE_LIMIT_ENABLED=true
LLM_RATE_LIMIT_REQUESTS=1
LLM_RATE_LIMIT_INTERVAL=15
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calls queue and space out automatically. &lt;code&gt;remember()&lt;/code&gt; takes ~90 seconds on the free tier, a fair trade for $0.&lt;/p&gt;
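&lt;p&gt;To see what "queue and space out" means in practice, here is a rough standard-library sketch of a one-call-per-interval limiter. It is not Cognee's implementation (which uses &lt;code&gt;aiolimiter&lt;/code&gt;); &lt;code&gt;IntervalLimiter&lt;/code&gt; and &lt;code&gt;run_spaced&lt;/code&gt; are hypothetical names used only for this illustration.&lt;/p&gt;

```python
import asyncio
import time


class IntervalLimiter:
    """Let at most one call start per `interval` seconds."""

    def __init__(self, interval):
        self.interval = interval
        self._lock = asyncio.Lock()
        self._next_start = 0.0

    async def wait(self):
        # Reserve the next free slot under the lock, then sleep outside it
        # so other callers can reserve their own slots in the meantime.
        async with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_start - now)
            self._next_start = max(now, self._next_start) + self.interval
        if delay > 0:
            await asyncio.sleep(delay)


async def run_spaced(n_calls, interval):
    """Fire n_calls concurrently and return when each one was allowed to start."""
    limiter = IntervalLimiter(interval)
    t0 = time.monotonic()

    async def call(_):
        await limiter.wait()
        return time.monotonic() - t0  # a real caller would hit the LLM here

    return await asyncio.gather(*(call(i) for i in range(n_calls)))
```

&lt;p&gt;With the settings above (1 request per 15 seconds), every extra LLM call in the pipeline adds 15 seconds of wall-clock time, which is why a single memory takes so long to store.&lt;/p&gt;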

&lt;p&gt;&lt;strong&gt;Wall 2: Embedding costs.&lt;/strong&gt; Cognee defaults to OpenAI embeddings — which means an OpenAI key and a bill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; &lt;code&gt;fastembed&lt;/code&gt; runs BAAI/bge-small-en-v1.5 &lt;strong&gt;locally&lt;/strong&gt;. No API key, no network calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EMBEDDING_PROVIDER=fastembed
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Wall 3: Vector dimension mismatch.&lt;/strong&gt; Our LanceDB store had been created with OpenAI's 3072-dim vectors; fastembed produces 384-dim. Schema conflict, cryptic errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; wipe &lt;code&gt;.cognee_system/databases&lt;/code&gt; and let it rebuild with the right schema. Lesson: embedding dimensions are part of your storage schema — changing providers means migrating.&lt;/p&gt;
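&lt;p&gt;A small guard would have turned the cryptic errors into a readable one. The sketch below is a hypothetical helper, not part of Cognee or LanceDB: it assumes you can read the dimension the store was created with and compare it against a freshly produced embedding.&lt;/p&gt;

```python
class DimensionMismatch(ValueError):
    """Raised when the store and the embedder disagree on vector size."""


def check_dimensions(stored_dim, embedding):
    """Return the embedding's dimension, or raise if the store expects another.

    stored_dim: dimension the vector store was created with, or None if empty.
    embedding:  one vector from the current embedding provider.
    """
    actual = len(embedding)
    if stored_dim is not None and stored_dim != actual:
        raise DimensionMismatch(
            f"store expects {stored_dim}-dim vectors but the embedder "
            f"produced {actual}-dim; rebuild or migrate the store"
        )
    return actual
```

&lt;p&gt;Running a check like this at startup, with 3072 as the stored dimension and a 384-dim vector from fastembed, fails fast instead of surfacing as a schema conflict mid-request.&lt;/p&gt;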

&lt;p&gt;&lt;strong&gt;Wall 4 (Cloud mode): the silent no-op.&lt;/strong&gt; Cognee Cloud's &lt;code&gt;/api/v1/add&lt;/code&gt; accepts &lt;strong&gt;multipart file uploads&lt;/strong&gt;, not JSON. Our JSON POSTs returned plausible status codes while storing &lt;em&gt;nothing&lt;/em&gt;. Recall answers were pure LLM hallucination — confidently wrong, cached per-question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; read the OpenAPI spec (&lt;code&gt;/openapi.json&lt;/code&gt;), switch to multipart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/plain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;datasetName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Debugging lesson: when search says "no data found" but add says "success," &lt;strong&gt;trust the negative signal&lt;/strong&gt; — verify what's actually stored (&lt;code&gt;GET /api/v1/datasets/{id}/data&lt;/code&gt;) instead of trusting status codes.&lt;/p&gt;
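&lt;p&gt;That lesson can be written down as a verify-after-write helper. In this sketch, &lt;code&gt;add&lt;/code&gt; and &lt;code&gt;list_items&lt;/code&gt; are hypothetical callables standing in for the upload request and the dataset listing request; the real response shapes may differ.&lt;/p&gt;

```python
def add_and_verify(add, list_items, text):
    """Store text, then confirm it is really there before reporting success.

    add:        callable that uploads the text and returns a status
    list_items: callable that returns what the store currently holds
    """
    status = add(text)
    stored = list_items()
    if text not in stored:
        # The write "succeeded" but nothing landed: treat it as a failure.
        raise RuntimeError(
            f"add returned {status!r} but the item is not in the store"
        )
    return status
```

&lt;p&gt;A check like this would have caught our silent no-op on the first request rather than after a round of hallucinated recall answers.&lt;/p&gt;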

&lt;h2&gt;
  
  
  What Surprised Us
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Graph traversal is genuinely different from vector search.&lt;/strong&gt; We stored "Jay is building MindVault" and "MindVault is powered by Cognee AI" as separate memories, then asked "What is Jay building?" — Cognee connected the dots &lt;em&gt;through the graph&lt;/em&gt;, not by keyword overlap.&lt;/p&gt;
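&lt;p&gt;A toy version of that behaviour, assuming the two memories have already been reduced to subject-relation-object triples. The triples and the &lt;code&gt;reachable&lt;/code&gt; function are illustrative only; Cognee's real extraction and search are far richer than this.&lt;/p&gt;

```python
from collections import deque


def reachable(triples, start, max_hops=2):
    """Breadth-first walk over (subject, relation, object) triples.

    Returns {entity: list of triples leading to it} for every entity
    within max_hops of start.
    """
    outgoing = {}
    for s, r, o in triples:
        outgoing.setdefault(s, []).append((r, o))

    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(paths[node]) >= max_hops:
            continue  # do not expand beyond the hop limit
        for r, o in outgoing.get(node, []):
            if o not in paths:
                paths[o] = paths[node] + [(node, r, o)]
                queue.append(o)
    return paths


# Two separate memories, stored as triples (hypothetical extraction result).
TRIPLES = [
    ("Jay", "builds", "MindVault"),
    ("MindVault", "powered_by", "Cognee"),
]
```

&lt;p&gt;Starting from "Jay", the walk reaches "Cognee" in two hops even though no single memory mentions both, which is the kind of link similarity search alone does not give you.&lt;/p&gt;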

&lt;p&gt;&lt;strong&gt;&lt;code&gt;improve()&lt;/code&gt; is underrated.&lt;/strong&gt; Most people stop at add-and-search. Re-running enrichment after accumulating memories visibly strengthens the graph — new edges appear between old nodes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/jaybamroliya/mindvault
&lt;span class="nb"&gt;cd &lt;/span&gt;mindvault
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env   &lt;span class="c"&gt;# add a free Groq key from console.groq.com&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; uvicorn main:app &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total cost: &lt;strong&gt;$0.&lt;/strong&gt; No credit card anywhere in the stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;"Stateless AI" is one of the most annoying unsolved UX problems in AI. Cognee solves it properly — not with a prompt hack, but with a real hybrid memory architecture that you can self-host for free or scale on their cloud.&lt;/p&gt;

&lt;p&gt;If you're building agents, give your AI a memory. It changes everything.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built for the WeMakeDevs × Cognee Hackathon by Team MindVault — Jay Bamroliya &amp;amp; Kaushal Karkar. Source: &lt;a href="https://github.com/jaybamroliya/mindvault" rel="noopener noreferrer"&gt;github.com/jaybamroliya/mindvault&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>hackathon</category>
    </item>
  </channel>
</rss>
