<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ravi kashyap</title>
    <description>The latest articles on DEV Community by ravi kashyap (@ravionite).</description>
    <link>https://dev.to/ravionite</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3863146%2F5d2d1fe9-7262-4269-aca6-f4ba72570444.png</url>
      <title>DEV Community: ravi kashyap</title>
      <link>https://dev.to/ravionite</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ravionite"/>
    <language>en</language>
    <item>
      <title>I built an open-source memory layer for LLMs — here's how it works</title>
      <dc:creator>ravi kashyap</dc:creator>
      <pubDate>Mon, 06 Apr 2026 05:30:10 +0000</pubDate>
      <link>https://dev.to/ravionite/i-built-an-open-source-memory-layer-for-llms-heres-how-it-works-3pb1</link>
      <guid>https://dev.to/ravionite/i-built-an-open-source-memory-layer-for-llms-heres-how-it-works-3pb1</guid>
      <description>&lt;p&gt;LLMs are stateless by design. You send a message, you get a reply, and the model instantly forgets everything. Every conversation starts cold.&lt;/p&gt;

&lt;p&gt;That's fine for one-off tasks. It's a real problem when you're building anything personal — a coding assistant that knows your stack, a writing tool that remembers your style, an agent that tracks what you've decided across sessions.&lt;/p&gt;

&lt;p&gt;The usual answers are: roll your own RAG pipeline, use a cloud memory service, or spend a weekend stitching together embeddings, a vector database, and prompt injection logic. None of those feel like &lt;em&gt;the&lt;/em&gt; answer.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;MemoryWeave&lt;/strong&gt; — an open-source Python library that gives any LLM long-term memory in three lines of code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;memoryweave&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemoryWeave&lt;/span&gt;

&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryWeave&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My name is Ravi. I prefer Python and FastAPI.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What stack should I recommend?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → Relevant memories:
# → - My name is Ravi. I prefer Python and FastAPI. (relevance: 0.94)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ctx.summary&lt;/code&gt; is a ready-to-inject string. Paste it into your system prompt. Done.&lt;/p&gt;
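&lt;p&gt;To make that concrete, here's a minimal sketch of the injection step. This is illustrative, not MemoryWeave API: &lt;code&gt;build_messages&lt;/code&gt; is a hypothetical helper, and the message shape assumes an OpenAI-style chat format:&lt;/p&gt;

```python
# Sketch: splicing a memory summary into a chat request. Illustrative only;
# build_messages is a hypothetical helper, not part of MemoryWeave.
def build_messages(base_prompt, memory_summary, user_message):
    """Prepend retrieved memories to the system prompt."""
    system = base_prompt
    if memory_summary:
        system = base_prompt + "\n\n" + memory_summary
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]

msgs = build_messages(
    "You are a helpful assistant.",
    "Relevant memories:\n- My name is Ravi. I prefer Python and FastAPI.",
    "What stack should I recommend?",
)
print(msgs[0]["content"])  # system prompt now carries the memories
```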




&lt;h2&gt;Why not just vector search?&lt;/h2&gt;

&lt;p&gt;Most memory libraries are thin wrappers around a vector database. You embed text, store vectors, and retrieve by cosine similarity. It works, but it has a blind spot.&lt;/p&gt;

&lt;p&gt;Vector search finds &lt;em&gt;similar&lt;/em&gt; text. It struggles with &lt;em&gt;related&lt;/em&gt; facts.&lt;/p&gt;

&lt;p&gt;Say you store: &lt;code&gt;"Ravi uses FastAPI"&lt;/code&gt; and &lt;code&gt;"FastAPI uses Uvicorn"&lt;/code&gt;. If you query &lt;code&gt;"What server does Ravi use?"&lt;/code&gt;, a pure vector search will miss the inference. The connection lives in the relationship between facts, not in any single embedding.&lt;/p&gt;
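&lt;p&gt;You can see why a graph recovers it: the answer is a two-hop walk over stored facts. Here's a toy stand-in for that walk, using plain dicts instead of the NetworkX graph the library actually uses:&lt;/p&gt;

```python
from collections import deque

# Toy fact graph for the example above: nodes are entities, edges carry
# the verb. (MemoryWeave stores these in NetworkX; this is a minimal stand-in.)
edges = {
    "Ravi": [("uses", "FastAPI")],
    "FastAPI": [("uses", "Uvicorn")],
}

def related(start, max_hops=2):
    """Breadth-first walk: collect facts reachable within max_hops."""
    found, frontier = [], deque([(start, 0)])
    seen = {start}
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for verb, target in edges.get(node, []):
            found.append((node, verb, target))
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return found

print(related("Ravi"))
# The two-hop walk from "Ravi" surfaces (FastAPI, uses, Uvicorn),
# the fact a pure similarity search would miss.
```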

&lt;p&gt;MemoryWeave solves this with a dual-retrieval architecture.&lt;/p&gt;




&lt;h2&gt;How it works&lt;/h2&gt;

&lt;p&gt;Here's the full pipeline — both &lt;code&gt;add()&lt;/code&gt; and &lt;code&gt;get()&lt;/code&gt; in one view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;memory.add(text)
  │
  ├── spaCy NLP          extract entities + subject-verb-object facts
  ├── sentence-transformers   embed text → 384-dim vector
  ├── Vector store        save embedding (InMemory or ChromaDB)
  └── Knowledge graph     add entities and facts as nodes/edges (NetworkX)

memory.get(query)
  │
  ├── Embed query
  ├── Vector search       top-k similar memories by cosine similarity
  ├── Graph query         related facts by keyword overlap
  └── Ranker              fuse scores → 0.6 × vector + 0.4 × graph → MemoryContext
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's walk through each part.&lt;/p&gt;

&lt;h3&gt;1. NLP extraction (spaCy)&lt;/h3&gt;

&lt;p&gt;When you call &lt;code&gt;memory.add(text)&lt;/code&gt;, the first thing that happens is a spaCy pass over the raw text. It extracts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Named entities&lt;/strong&gt; — people, places, organizations, tech names&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subject-verb-object triples&lt;/strong&gt; — structured facts like &lt;code&gt;(Ravi, prefers, Python)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These become nodes and edges in a knowledge graph (NetworkX under the hood). This is what makes relational queries possible later.&lt;/p&gt;
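&lt;p&gt;As a rough illustration of the triple extraction (the real implementation walks spaCy's dependency parse; this toy only matches a hard-coded verb list):&lt;/p&gt;

```python
# Naive stand-in for the spaCy step: pull (subject, verb, object) out of
# simple "X verb Y" sentences. The real extractor follows the dependency
# parse (nsubj -> verb -> dobj), which handles far more than this toy does.
VERBS = {"prefers", "uses", "likes"}

def extract_triples(text):
    triples = []
    for sentence in text.split("."):
        words = sentence.strip().split()
        for i, word in enumerate(words):
            if word in VERBS and i >= 1 and len(words) > i + 1:
                subj = words[i - 1]
                obj = " ".join(words[i + 1:])
                triples.append((subj, word, obj))
    return triples

print(extract_triples("Ravi prefers Python. Ravi uses FastAPI."))
# → [('Ravi', 'prefers', 'Python'), ('Ravi', 'uses', 'FastAPI')]
```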

&lt;h3&gt;2. Embedding (sentence-transformers)&lt;/h3&gt;

&lt;p&gt;In parallel, the same text is embedded using &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; — a compact, fast sentence-transformers model that produces 384-dimensional vectors. These go into either an in-memory store (great for development) or ChromaDB (for production, where memories persist across restarts).&lt;/p&gt;

&lt;p&gt;Everything runs locally. No API keys, no data sent to any external service.&lt;/p&gt;

&lt;h3&gt;3. Deduplication&lt;/h3&gt;

&lt;p&gt;Before storing anything, MemoryWeave checks cosine similarity against existing embeddings. If a new entry scores ≥ 0.98 against something already stored, it's silently dropped. This keeps memory clean when the same fact gets re-added across sessions.&lt;/p&gt;
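&lt;p&gt;The check itself is simple. Here's a sketch with short plain-Python lists standing in for the 384-dim vectors (not the library's actual code):&lt;/p&gt;

```python
import math

# Sketch of the dedup check, assuming embeddings are plain float lists.
# Real vectors are 384-dim; 3-dim here for brevity.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_duplicate(new_vec, stored_vecs, threshold=0.98):
    """Drop the new entry if it is near-identical to anything stored."""
    return any(cosine(new_vec, v) >= threshold for v in stored_vecs)

stored = [[0.1, 0.9, 0.4]]
print(is_duplicate([0.1, 0.9, 0.4], stored))   # identical vector: dropped
print(is_duplicate([0.9, 0.1, 0.2], stored))   # clearly different: kept
```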

&lt;h3&gt;4. Retrieval and fusion&lt;/h3&gt;

&lt;p&gt;When you call &lt;code&gt;memory.get(query)&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The query is embedded with the same model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector search&lt;/strong&gt; returns the top-k most similar memories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph query&lt;/strong&gt; does a keyword overlap walk across the knowledge graph, surfacing related facts that may not be textually similar to the query&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;weighted ranker&lt;/strong&gt; fuses both: &lt;code&gt;final_score = 0.6 × vector_score + 0.4 × graph_score&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The weights are configurable. If your use case is mostly factual (e.g., a personal knowledge base), bump &lt;code&gt;graph_weight&lt;/code&gt; up. If you're doing more semantic search over long-form text, keep vector weight dominant.&lt;/p&gt;
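&lt;p&gt;The fusion step can be sketched in a few lines. Assumptions: memory ids map to per-retriever scores, and a memory found by only one retriever contributes zero from the other:&lt;/p&gt;

```python
# Sketch of the score fusion step (illustrative, not the library's code).
def fuse(vector_scores, graph_scores, vector_weight=0.6, graph_weight=0.4):
    ids = set(vector_scores) | set(graph_scores)
    fused = {
        mid: vector_weight * vector_scores.get(mid, 0.0)
             + graph_weight * graph_scores.get(mid, 0.0)
        for mid in ids
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

ranked = fuse(
    {"m1": 0.91, "m2": 0.40},   # vector search hits
    {"m2": 0.80, "m3": 0.75},   # graph query hits
)
print(ranked)
# m1: 0.6*0.91 = 0.546; m2: 0.6*0.40 + 0.4*0.80 = 0.56; m3: 0.4*0.75 = 0.30
```

&lt;p&gt;Note how &lt;code&gt;m2&lt;/code&gt; overtakes &lt;code&gt;m1&lt;/code&gt;: a middling vector hit backed by strong graph evidence beats a pure similarity match.&lt;/p&gt;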

&lt;p&gt;The result is a &lt;code&gt;MemoryContext&lt;/code&gt; object:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;summary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ready-to-inject string for your system prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;entries&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Vector search hits with scores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;facts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Graph facts with scores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;has_results&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;False&lt;/code&gt; if nothing was found&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
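&lt;p&gt;In code, the shape is roughly this (field names from the table above; the types are illustrative guesses, not the library's definitions):&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Rough shape of the result object described in the table above. Field names
# match the docs; the types are my guesses for illustration.
@dataclass
class MemoryContext:
    summary: str = ""
    entries: list = field(default_factory=list)   # (text, score) vector hits
    facts: list = field(default_factory=list)     # (subj, verb, obj, score)
    has_results: bool = False

ctx = MemoryContext(
    summary="Relevant memories:\n- My name is Ravi. (relevance: 0.94)",
    entries=[("My name is Ravi.", 0.94)],
    has_results=True,
)
if ctx.has_results:
    print(ctx.summary)
```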




&lt;h2&gt;Plugging into OpenAI or Anthropic&lt;/h2&gt;

&lt;p&gt;MemoryWeave ships with first-class adapters for both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# OpenAI
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;memoryweave.adapters.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIAdapter&lt;/span&gt;

&lt;span class="n"&gt;adapter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIAdapter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;adapter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# injects memory into system prompt
# ... call OpenAI ...
&lt;/span&gt;&lt;span class="n"&gt;adapter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remember&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;# stores the turn for next time
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Anthropic
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;memoryweave.adapters.anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AnthropicAdapter&lt;/span&gt;

&lt;span class="n"&gt;adapter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AnthropicAdapter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;adapter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ... call Anthropic with system= ...
&lt;/span&gt;&lt;span class="n"&gt;adapter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remember&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The adapters handle memory injection automatically, so you never edit the system prompt by hand.&lt;/p&gt;




&lt;h2&gt;Multi-user sessions&lt;/h2&gt;

&lt;p&gt;Every &lt;code&gt;MemoryWeave&lt;/code&gt; instance is scoped to a &lt;code&gt;session_id&lt;/code&gt;. Sessions never bleed into each other:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;alice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryWeave&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MemoryConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;bob&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryWeave&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MemoryConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bob&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;alice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice likes TypeScript.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;bob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bob prefers Rust.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;language&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# → TypeScript
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;language&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# → Rust
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;REST API + TypeScript SDK&lt;/h2&gt;

&lt;p&gt;If your app isn't written in Python, MemoryWeave also ships a FastAPI server and a TypeScript SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start the server&lt;/span&gt;
uvicorn memoryweave.server:app &lt;span class="nt"&gt;--reload&lt;/span&gt;

&lt;span class="c"&gt;# Optional: lock it with an API key&lt;/span&gt;
&lt;span class="nv"&gt;MEMORYWEAVE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-secret uvicorn memoryweave.server:app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;MemoryWeave&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@memoryweave/sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MemoryWeave&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://localhost:8000&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user-1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Ravi prefers Python.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What language?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;Current state&lt;/h2&gt;

&lt;p&gt;The library is at &lt;strong&gt;v1.1.0&lt;/strong&gt;, with 248 tests, 91% coverage, and CI green across Python 3.10–3.12. The full phase list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✅ Phase 1 — Foundation
✅ Phase 2 — NLP extraction (spaCy)
✅ Phase 3 — Storage layer (vector + knowledge graph)
✅ Phase 4 — Core memory API v0.1.0
✅ Phase 5 — TypeScript SDK
✅ Phase 6 — FastAPI REST server
✅ Phase 7 — Documentation
✅ Phase 8 — Launch v1.0.0
✅ Phase 9 — Deduplication, async methods, LLM adapters, server auth v1.1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;What's next&lt;/h2&gt;

&lt;p&gt;A few things on the roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting strategies&lt;/strong&gt; — time-decay and relevance-decay so stale memories don't pollute retrieval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming support&lt;/strong&gt; — auto-extract and store from streamed LLM responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory summaries&lt;/strong&gt; — periodic compression of older memories into higher-level facts&lt;/li&gt;
&lt;/ul&gt;
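&lt;p&gt;For a sense of what time-decay could look like (purely a sketch of the roadmap item above; nothing like this ships yet): halve a memory's weight every fixed number of days since it was stored:&lt;/p&gt;

```python
import math

# Hypothetical time-decay weight: exponential decay with a configurable
# half-life. A 30-day half-life means a memory's score halves each month.
def decayed_score(base_score, age_days, half_life_days=30.0):
    return base_score * math.exp(-math.log(2) * age_days / half_life_days)

print(round(decayed_score(0.9, 0), 4))    # fresh memory keeps its score
print(round(decayed_score(0.9, 30), 4))   # one half-life: score halves
print(round(decayed_score(0.9, 90), 4))   # three half-lives: an eighth
```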




&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;memoryweave
python &lt;span class="nt"&gt;-m&lt;/span&gt; spacy download en_core_web_sm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/ravii-k/memoryweave" rel="noopener noreferrer"&gt;github.com/ravii-k/memoryweave&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're building something with it — or you've hit the same problem and solved it differently — I'd genuinely like to hear about it in the comments.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
      <category>python</category>
    </item>
  </channel>
</rss>
