<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vinayak Sonthalia</title>
    <description>The latest articles on DEV Community by Vinayak Sonthalia (@vinayaksonthalia).</description>
    <link>https://dev.to/vinayaksonthalia</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4016272%2F06216a7a-b34b-4a55-8064-d54ff1f51642.jpg</url>
      <title>DEV Community: Vinayak Sonthalia</title>
      <link>https://dev.to/vinayaksonthalia</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vinayaksonthalia"/>
    <language>en</language>
    <item>
      <title>I Built an AI That Forgets on Purpose — and It Made It Smarter</title>
      <dc:creator>Vinayak Sonthalia</dc:creator>
      <pubDate>Sun, 05 Jul 2026 16:02:52 +0000</pubDate>
      <link>https://dev.to/vinayaksonthalia/i-built-an-ai-that-forgets-on-purpose-and-it-made-it-smarter-5ci</link>
      <guid>https://dev.to/vinayaksonthalia/i-built-an-ai-that-forgets-on-purpose-and-it-made-it-smarter-5ci</guid>
      <description>&lt;p&gt;&lt;em&gt;It's 3am, your site is down, and your AI confidently tells you to fix a server that was deleted a month ago. Here's why I spent a week building the delete key.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F9rzjd02xkf56k4gfipnh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F9rzjd02xkf56k4gfipnh.png" alt="The AI That Forgets on Purpose" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The night my own assistant lied to me&lt;/h2&gt;

&lt;p&gt;I was building an incident-triage assistant, the kind of tool an on-call engineer turns to at 3am when the site is down. Feed it your team's runbooks, ask it what to check, get a calm answer back. It worked. I was pretty happy with it.&lt;/p&gt;

&lt;p&gt;Then one evening, while testing, it told me to go fix a server we'd switched off a month earlier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcxmdgfuwuu3uei4mikzy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcxmdgfuwuu3uei4mikzy.png" alt="3am panic" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; auth-service is slow, what do I check?&lt;br&gt;
&lt;strong&gt;Assistant:&lt;/strong&gt; Check the legacy-cache — flush and resize the cluster!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Confident. Detailed. Completely wrong. The legacy-cache didn't exist anymore. And here's the thing that stuck with me: &lt;strong&gt;the assistant hadn't malfunctioned.&lt;/strong&gt; It did exactly what I built it to do — remember. It just remembered something it should have forgotten.&lt;/p&gt;

&lt;h2&gt;The bug wasn't a bug. It was the whole design.&lt;/h2&gt;

&lt;p&gt;Look at how everyone builds "AI memory" right now — they're all racing in one direction: &lt;strong&gt;remember more.&lt;/strong&gt; More documents, longer history, bigger context.&lt;/p&gt;

&lt;p&gt;It's genuinely useful — until you notice the quiet assumption holding it all together: &lt;em&gt;every fact we remember stays true forever.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Facts do not do that.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F527u83mdkwupj9kpfmjl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F527u83mdkwupj9kpfmjl.png" alt="The hoarder problem" width="799" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Servers get retired. People change teams. Last year's clever fix becomes this year's outage. A bigger memory just hands you &lt;strong&gt;more stale facts, with no way to tell "still true" from "used to be true."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think of your phone's contacts: if you only ever &lt;em&gt;add&lt;/em&gt; numbers, eventually you call an old friend and a stranger picks up. The bug was never a number you forgot — it was a number you &lt;strong&gt;kept when you shouldn't have.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So I stopped asking &lt;em&gt;"how much can it remember?"&lt;/em&gt; and started asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Can it stop remembering the things that have stopped being true?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I named the project &lt;strong&gt;Lethe&lt;/strong&gt;, after the river of forgetting. (I'm not subtle.)&lt;/p&gt;

&lt;h2&gt;Watch it forget, live&lt;/h2&gt;

&lt;p&gt;I retire the &lt;code&gt;legacy-cache&lt;/code&gt; — decommission it, gone — and ask the &lt;strong&gt;exact same question&lt;/strong&gt; as before, word for word:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fddsf91frjlt4l5o8kauc.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fddsf91frjlt4l5o8kauc.gif" alt="The hero flip — same question, one delete, different answer" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; "Check the legacy-cache, flush and resize the cluster." &lt;em&gt;(go fight the ghost)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After the delete, same question:&lt;/strong&gt; "Check the session-store connection pool and its hit rate…"&lt;/p&gt;

&lt;p&gt;The answer &lt;strong&gt;flips.&lt;/strong&gt; And when I pushed it — &lt;em&gt;"okay, then what IS the legacy-cache?"&lt;/em&gt; — it said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The legacy-cache is not documented in the runbooks."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not &lt;em&gt;"I remember more."&lt;/em&gt; Instead: &lt;strong&gt;"I stopped remembering the wrong thing"&lt;/strong&gt; — and it &lt;em&gt;admits it doesn't know&lt;/em&gt; rather than confidently inventing something. Which, for an AI, is basically emotional maturity.&lt;/p&gt;

&lt;h2&gt;How it works (the two-minute version)&lt;/h2&gt;

&lt;p&gt;Lethe is built on &lt;a href="https://www.cognee.ai/" rel="noopener noreferrer"&gt;Cognee&lt;/a&gt;, an open-source memory engine. You feed it runbooks as &lt;strong&gt;plain English&lt;/strong&gt; — no tags, no schema — and one &lt;code&gt;cognify()&lt;/code&gt; call builds &lt;strong&gt;two stores at once&lt;/strong&gt;: a knowledge graph (Kùzu) and a vector index (LanceDB).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fzaexvmfal70epx2ptd4a.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fzaexvmfal70epx2ptd4a.gif" alt="The live knowledge graph" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ask a question → vector search finds what's &lt;em&gt;relevant&lt;/em&gt;, the graph adds what's &lt;em&gt;connected&lt;/em&gt;, and the model writes one runbook-style answer — every claim cited to its source.&lt;/p&gt;
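
&lt;p&gt;To make that flow concrete, here is a minimal toy sketch in plain Python. It is not Lethe's code and it does not call Cognee: the &lt;code&gt;DOCS&lt;/code&gt; and &lt;code&gt;GRAPH&lt;/code&gt; data, the bag-of-words &lt;code&gt;embed()&lt;/code&gt; stand-in, and the &lt;code&gt;retrieve()&lt;/code&gt; helper are all assumptions made up for illustration. It only shows the shape of the idea: similarity picks what is relevant, then the graph contributes what is connected.&lt;/p&gt;

```python
import math
from collections import Counter

# Hypothetical toy data: runbook snippets keyed by the system they describe.
DOCS = {
    "auth-service": "auth-service is slow when the session-store connection pool is exhausted",
    "session-store": "session-store hit rate drops when the connection pool is too small",
    "billing-api": "billing-api retries failed invoices every hour",
}

# Hypothetical graph edges: which systems each system depends on.
GRAPH = {
    "auth-service": {"session-store"},
    "session-store": set(),
    "billing-api": set(),
}


def embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, top_k=1):
    """Return (relevant, connected): vector hits first, then their graph neighbours."""
    query = embed(question)
    ranked = sorted(DOCS, key=lambda name: cosine(query, embed(DOCS[name])), reverse=True)
    relevant = ranked[:top_k]
    connected = sorted({n for name in relevant for n in GRAPH[name]} - set(relevant))
    return relevant, connected


relevant, connected = retrieve("auth-service is slow, what do I check?")
print(relevant, connected)
```

&lt;p&gt;In this toy, the question matches the &lt;code&gt;auth-service&lt;/code&gt; snippet by similarity, and the graph edge pulls in &lt;code&gt;session-store&lt;/code&gt; even though the question never names it.&lt;/p&gt;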

&lt;p&gt;And when a system dies? &lt;code&gt;forget()&lt;/code&gt; — a &lt;strong&gt;real hard delete&lt;/strong&gt;: raw files, graph nodes, edges, embeddings. Gone. With a printed receipt and a live re-query proving it.&lt;/p&gt;
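
&lt;p&gt;Here is a rough sketch of what "hard delete with a receipt" means, using a toy in-memory store. The &lt;code&gt;ToyMemory&lt;/code&gt; class and its methods are hypothetical and exist only for this example; the real &lt;code&gt;forget()&lt;/code&gt; works against the actual graph and vector stores. The point is that a delete has to touch every store, count what it removed, and then be checked with a re-query.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical in-memory stand-in for the stores a hard delete has to cover."""
    raw_files: dict = field(default_factory=dict)    # name to runbook text
    nodes: set = field(default_factory=set)          # graph nodes
    edges: set = field(default_factory=set)          # (source, target) pairs
    embeddings: dict = field(default_factory=dict)   # name to vector

    def remember(self, name, text, depends_on=()):
        """Store one runbook in all four places."""
        self.raw_files[name] = text
        self.nodes.add(name)
        self.edges.update((name, dep) for dep in depends_on)
        self.embeddings[name] = [float(len(text))]   # placeholder vector

    def forget(self, name):
        """Remove every trace of `name` and return a receipt of what was deleted."""
        doomed_edges = {edge for edge in self.edges if name in edge}
        receipt = {
            "raw_files": int(self.raw_files.pop(name, None) is not None),
            "nodes": int(name in self.nodes),
            "edges": len(doomed_edges),
            "embeddings": int(self.embeddings.pop(name, None) is not None),
        }
        self.nodes.discard(name)
        self.edges -= doomed_edges
        return receipt

    def mentions(self, name):
        """Re-query: is `name` still present in any store?"""
        return (
            name in self.raw_files
            or name in self.nodes
            or any(name in edge for edge in self.edges)
            or name in self.embeddings
        )


memory = ToyMemory()
memory.remember("legacy-cache", "Flush and resize the cluster.")
memory.remember("auth-service", "Check the cache first.", depends_on=["legacy-cache"])

receipt = memory.forget("legacy-cache")
print(receipt)
print(memory.mentions("legacy-cache"))
```

&lt;p&gt;Note that the edge from &lt;code&gt;auth-service&lt;/code&gt; goes too. Deleting the node but leaving its edges behind is exactly how a ghost stays reachable.&lt;/p&gt;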

&lt;p&gt;There's a gentler layer too: a &lt;strong&gt;curation loop&lt;/strong&gt; that scores memory health, reversibly down-weights aging runbooks, and queues true deletions for a &lt;em&gt;human&lt;/em&gt; to approve. Nothing gets hard-deleted without a person saying yes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F76u8tf3147ba2g1ks07b.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F76u8tf3147ba2g1ks07b.gif" alt="The curation loop" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;
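
&lt;p&gt;The curation loop can be sketched in a few lines too. Everything below is an assumption for illustration: the &lt;code&gt;Runbook&lt;/code&gt; class, the 90-day and 365-day thresholds, the 0.5 weight, and the "average weight" health score are made up, not Lethe's real scoring. What the sketch keeps from the design is the split between a reversible down-weight and a deletion that only happens after a person approves it.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Runbook:
    name: str
    last_verified: date
    weight: float = 1.0  # retrieval weight; 1.0 means full strength


# Hypothetical thresholds, chosen only for illustration.
STALE_AFTER_DAYS = 90
DELETE_AFTER_DAYS = 365


def curate(runbooks, today):
    """Down-weight aging runbooks and return the names queued for human review."""
    review_queue = []
    for runbook in runbooks:
        age = (today - runbook.last_verified).days
        if age >= DELETE_AFTER_DAYS:
            review_queue.append(runbook.name)  # proposed only, never auto-deleted
        # Reversible: only the weight changes, the runbook itself is untouched.
        runbook.weight = 0.5 if age >= STALE_AFTER_DAYS else 1.0
    return review_queue


def health_score(runbooks):
    """Toy health metric: the average retrieval weight (1.0 means nothing is stale)."""
    return sum(r.weight for r in runbooks) / len(runbooks) if runbooks else 1.0


def apply_decisions(runbooks, review_queue, approved):
    """Hard-delete only the queued runbooks that a person explicitly approved."""
    doomed = set(review_queue).intersection(approved)
    return [runbook for runbook in runbooks if runbook.name not in doomed]


today = date(2026, 7, 5)
runbooks = [
    Runbook("session-store", date(2026, 6, 20)),
    Runbook("deploy-checklist", date(2026, 2, 1)),
    Runbook("legacy-cache", date(2025, 1, 10)),
]

queue = curate(runbooks, today)
kept = apply_decisions(runbooks, queue, approved={"legacy-cache"})
print(queue, [r.name for r in kept], round(health_score(runbooks), 2))
```

&lt;p&gt;If &lt;code&gt;approved&lt;/code&gt; is empty, &lt;code&gt;apply_decisions()&lt;/code&gt; returns every runbook unchanged, which is the "nothing gets hard-deleted without a person saying yes" rule in code form.&lt;/p&gt;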

&lt;h2&gt;Then I tried, on purpose, to prove myself wrong&lt;/h2&gt;

&lt;p&gt;Cognee advertises several superpowers beyond forgetting. Instead of listing them all as &lt;em&gt;my&lt;/em&gt; features, I raced them against plain, boring RAG — same docs, same model, temperature zero:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fny1lkp7ihofmw21m0skc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fny1lkp7ihofmw21m0skc.png" alt="The honest bake-off" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Multi-hop? &lt;strong&gt;Tied&lt;/strong&gt; at this scale. Learning from feedback? &lt;strong&gt;No clean win.&lt;/strong&gt; Blast-radius? &lt;strong&gt;Tied.&lt;/strong&gt; Only &lt;strong&gt;forget&lt;/strong&gt; produced something that RAG on the same stack fundamentally cannot do. So I dropped those three as headline claims and built the whole product around the one provable win.&lt;/p&gt;

&lt;p&gt;Then I measured forgetting itself — &lt;strong&gt;twice&lt;/strong&gt;, with a blind judge from a different model family scoring every answer:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Flen6ygsy5dymorq8cly1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Flen6ygsy5dymorq8cly1.png" alt="Forgetting, measured" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Fun confession: my &lt;em&gt;first&lt;/em&gt; benchmark got thrown out — it graded whether the forgotten word was absent, which is circular. The blind-judge version grades &lt;em&gt;correctness&lt;/em&gt;.)&lt;/p&gt;
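
&lt;p&gt;Why the first benchmark was circular is easier to see in code. This is a toy illustration, not the real harness: the actual second version used a blind LLM judge, which the hypothetical &lt;code&gt;correctness_grader()&lt;/code&gt; below only imitates with a simple expected-fact check. The two sample answers are invented for the example.&lt;/p&gt;

```python
def absence_grader(answer, forgotten_term):
    """The circular check: passes whenever the forgotten term is missing."""
    return forgotten_term.lower() not in answer.lower()


def correctness_grader(answer, forgotten_term, expected_fact):
    """Crude stand-in for a judge: the answer must also contain the right fact."""
    return absence_grader(answer, forgotten_term) and expected_fact.lower() in answer.lower()


useless = "I am not sure, try restarting something."
useful = "Check the session-store connection pool and its hit rate."

results = [
    (absence_grader(answer, "legacy-cache"),
     correctness_grader(answer, "legacy-cache", "session-store"))
    for answer in (useless, useful)
]
print(results)
```

&lt;p&gt;The useless answer passes the absence check simply because it never mentions the forgotten word. Only a check on correctness can tell the two answers apart.&lt;/p&gt;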

&lt;h2&gt;The bug that almost killed the demo&lt;/h2&gt;

&lt;p&gt;For a while, the demo was haunted: the same question usually returned a full sentence — but &lt;em&gt;sometimes&lt;/em&gt; it fired back a single word: &lt;strong&gt;"legacy-cache."&lt;/strong&gt; Just the word.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fhxk8goupzbr735it9vpy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fhxk8goupzbr735it9vpy.png" alt="The case of the one-word answer" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I cleared the suspects one by one: the build (the glitch happened on the &lt;em&gt;same&lt;/em&gt; saved graph), my phrasing, and retrieval (the assembled context was rich and correct &lt;em&gt;every time&lt;/em&gt;). The culprit? A line buried in the tool's &lt;strong&gt;default prompt&lt;/strong&gt;: &lt;em&gt;"answer as briefly as possible."&lt;/em&gt; On a vague question, that collapses to the single most relevant word. The model wasn't broken; it was obeying its instructions &lt;em&gt;a little too well.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;One custom prompt later, three separate bugs disappeared at once. &lt;strong&gt;When an LLM writes your final answer, the instruction you hand it is the most powerful lever you have.&lt;/strong&gt;&lt;/p&gt;
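
&lt;p&gt;For a sense of what "one custom prompt" can look like, here is a hedged sketch. The wording of &lt;code&gt;CUSTOM_INSTRUCTION&lt;/code&gt; and the &lt;code&gt;build_prompt()&lt;/code&gt; helper are my own illustration, not the prompt Lethe ships and not a Cognee API; only the "answer as briefly as possible" line comes from the story above.&lt;/p&gt;

```python
# The default instruction quoted in the post, and a hypothetical replacement.
DEFAULT_INSTRUCTION = "Answer as briefly as possible."
CUSTOM_INSTRUCTION = (
    "You are an incident-triage assistant. Answer in full sentences as short "
    "runbook steps, cite the source of every claim, and say so plainly when "
    "the runbooks do not cover the question."
)


def build_prompt(question, context_chunks, instruction=CUSTOM_INSTRUCTION):
    """Assemble the final prompt: instruction first, numbered context, then the question."""
    context = "\n".join(
        f"[{number}] {chunk}" for number, chunk in enumerate(context_chunks, start=1)
    )
    return f"{instruction}\n\nContext:\n{context}\n\nQuestion: {question}"


prompt = build_prompt(
    "auth-service is slow, what do I check?",
    ["session-store: check the connection pool and hit rate."],
)
print(prompt)
```

&lt;p&gt;Swapping &lt;code&gt;instruction=DEFAULT_INSTRUCTION&lt;/code&gt; back in is a quick way to compare how the same context and question behave under the two instructions.&lt;/p&gt;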

&lt;h2&gt;Go poke it yourself&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🌊 &lt;strong&gt;Try it live (no signup):&lt;/strong&gt; &lt;a href="https://vinayaksonthalia-lethe.hf.space" rel="noopener noreferrer"&gt;vinayaksonthalia-lethe.hf.space&lt;/a&gt; — ask, decommission, re-ask, watch it flip&lt;/li&gt;
&lt;li&gt;🎬 &lt;strong&gt;2-minute demo:&lt;/strong&gt; &lt;a href="https://youtu.be/3840gxTZWxY" rel="noopener noreferrer"&gt;youtu.be/3840gxTZWxY&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;Every design decision, documented:&lt;/strong&gt; &lt;a href="https://vinayaksonthalia-lethe.hf.space/learn" rel="noopener noreferrer"&gt;30 chapters at /learn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;💻 &lt;strong&gt;Code:&lt;/strong&gt; &lt;a href="https://github.com/vinayaksonthalia/lethe" rel="noopener noreferrer"&gt;github.com/vinayaksonthalia/lethe&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everyone's building memory that only grows. Lethe is the other half — &lt;strong&gt;memory that knows when to let go.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thanks for reading. Now go delete something.&lt;/em&gt; 🌊&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built solo in one week by Vinayak Sonthalia (final-year B.Tech) for the WeMakeDevs × Cognee hackathon.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devops</category>
      <category>hackathon</category>
    </item>
  </channel>
</rss>
