<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gokul Jinu</title>
    <description>The latest articles on DEV Community by Gokul Jinu (@gokuljinu01).</description>
    <link>https://dev.to/gokuljinu01</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3894886%2F19cb34da-f738-4bbf-9991-4a49d73af164.png</url>
      <title>DEV Community: Gokul Jinu</title>
      <link>https://dev.to/gokuljinu01</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gokuljinu01"/>
    <language>en</language>
    <item>
      <title>Why we built tag-graph memory for AI agents — and shipped a Python SDK for it</title>
      <dc:creator>Gokul Jinu</dc:creator>
      <pubDate>Thu, 23 Apr 2026 20:31:29 +0000</pubDate>
      <link>https://dev.to/gokuljinu01/why-we-built-tag-graph-memory-for-ai-agents-and-shipped-a-python-sdk-for-it-25c1</link>
      <guid>https://dev.to/gokuljinu01/why-we-built-tag-graph-memory-for-ai-agents-and-shipped-a-python-sdk-for-it-25c1</guid>
      <description>&lt;p&gt;I spent most of last year trying to solve a deceptively narrow problem: how do you give an LLM agent persistent memory that's bounded, predictable, and doesn't blow your token bill?&lt;/p&gt;

&lt;p&gt;I tried a lot of things. Vector DBs gave me fuzzy results that were impossible to token-budget. Raw conversation history blew past context windows in 5 turns. "Summarize and re-inject" silently dropped the one fact the agent needed three turns later.&lt;/p&gt;

&lt;p&gt;Today I shipped the first Python SDK for what we ended up building — &lt;strong&gt;MME&lt;/strong&gt; (Memory Management Engine). It's a &lt;em&gt;bounded tag-graph&lt;/em&gt; memory engine, and it's a different shape of memory than the vector-DB-by-default story you hear everywhere.&lt;/p&gt;

&lt;p&gt;This is a writeup of the design choices, why each one matters in production, and what's in the SDK if you want to try it.&lt;/p&gt;

&lt;h2&gt;
  The three problems with vector-search agent memory
&lt;/h2&gt;

&lt;p&gt;Vector retrieval is the default answer to "how do I give my agent memory" because embeddings are universal and pgvector / Pinecone / Weaviate are easy to host. But for &lt;em&gt;agent&lt;/em&gt; memory specifically (as opposed to RAG over documents), three things keep biting you:&lt;/p&gt;

&lt;h3&gt;
  1. You can't token-budget the result
&lt;/h3&gt;

&lt;p&gt;You ask for top-K = 5 documents. You get five chunks back. Each chunk could be 80 tokens or 800 tokens. You don't know until you tokenize the response. So either you over-budget (waste money on every call) or under-budget (truncate mid-sentence and the LLM gets garbage).&lt;/p&gt;
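&lt;p&gt;A toy illustration of the problem (hypothetical chunks, with a crude word count standing in for real tokenization):&lt;/p&gt;

```python
# Hypothetical top-K results: chunk sizes vary wildly and are unknown
# until you tokenize each one.
chunks = [
    "Short note on chocolate.",
    "This chunk rambles about food preferences at great length. " * 15,
]

def count_tokens(text):
    # crude stand-in for the model's real tokenizer (e.g. tiktoken)
    return len(text.split())

sizes = [count_tokens(c) for c in chunks]
# a fixed budget of 64 tokens is already blown by the one long chunk
over_budget = sum(sizes) > 64
```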

&lt;h3&gt;
  2. Cosine similarity rewards the wrong things
&lt;/h3&gt;

&lt;p&gt;For a question like "what are my food preferences?", a chunk containing the literal phrase "food preferences" beats a chunk that says "I'm allergic to peanuts and I prefer dark chocolate" — even though the second chunk is what you actually want. Embeddings encode lexical similarity at least as much as semantic relevance.&lt;/p&gt;

&lt;h3&gt;
  3. There's no learning loop
&lt;/h3&gt;

&lt;p&gt;Every retrieval is independent. The system never improves from "this pack worked for that query" feedback. To improve, you re-train embeddings or re-chunk — both are heavy ops you don't run on every accepted pack.&lt;/p&gt;

&lt;h2&gt;
  The shape that worked: a bounded tag-graph
&lt;/h2&gt;

&lt;p&gt;The core idea: instead of embeddings, store memories as &lt;strong&gt;structured tag sets&lt;/strong&gt;, and retrieve by walking a graph from query tags to memory tags.&lt;/p&gt;

&lt;p&gt;When you save a memory like &lt;em&gt;"I prefer dark chocolate"&lt;/em&gt;, MME extracts a small set of structured tags — &lt;code&gt;food&lt;/code&gt;, &lt;code&gt;preference&lt;/code&gt;, &lt;code&gt;dark_chocolate&lt;/code&gt;, &lt;code&gt;food_item&lt;/code&gt; — with weights. These tags become nodes in a graph; their co-occurrence creates weighted edges.&lt;/p&gt;
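&lt;p&gt;A minimal sketch of the co-occurrence idea (my reading of the design, not MME's actual code): each pair of tags saved together strengthens a weighted edge.&lt;/p&gt;

```python
from collections import defaultdict
from itertools import combinations

edges = defaultdict(float)  # (tag_a, tag_b) -> edge weight

def add_memory(tags):
    """tags: {tag: weight}. Every co-occurring pair strengthens an edge."""
    for a, b in combinations(sorted(tags), 2):
        edges[(a, b)] += tags[a] * tags[b]

add_memory({"food": 1.0, "preference": 0.9, "dark_chocolate": 0.8})
add_memory({"food": 1.0, "allergy": 0.9, "peanuts": 0.8})
```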

&lt;p&gt;When you query &lt;em&gt;"what are my food preferences?"&lt;/em&gt;, MME does the same tag extraction on the query (yielding seed tags S like &lt;code&gt;food&lt;/code&gt;, &lt;code&gt;preference&lt;/code&gt;), then walks the graph:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From each seed, follow up to &lt;strong&gt;M = 32&lt;/strong&gt; highest-weight edges&lt;/li&gt;
&lt;li&gt;Repeat to depth &lt;strong&gt;D = 2&lt;/strong&gt; with a decay factor α applied per hop&lt;/li&gt;
&lt;li&gt;Trim the activated tag set to a &lt;strong&gt;beam width B = 128&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Find memories whose tags are in the activated set&lt;/li&gt;
&lt;li&gt;Score by activation × recency × importance − diversity penalty&lt;/li&gt;
&lt;li&gt;Pack greedily until the &lt;strong&gt;token budget&lt;/strong&gt; is hit (exact tiktoken count)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bound is mathematical: O(|S| · M^D) tags activated, hard cap at beam width. In practice this gives p95 latency of &lt;strong&gt;135 ms&lt;/strong&gt; across a 25-minute soak of 150K requests with 0% errors. (I obsessed over this; the bounds aren't decorative.)&lt;/p&gt;
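&lt;p&gt;The walk above can be sketched in a few lines. This is a toy reconstruction under the stated parameters (M, D, α, B), not MME's internal implementation:&lt;/p&gt;

```python
def activate(graph, seeds, M=32, D=2, alpha=0.5, B=128):
    """graph: {tag: [(neighbor, weight), ...]} sorted by weight, descending."""
    activation = {t: 1.0 for t in seeds}
    frontier = dict(activation)
    for _ in range(D):  # bounded depth
        nxt = {}
        for tag, act in frontier.items():
            for nb, w in graph.get(tag, [])[:M]:  # top-M edges only
                cand = act * w * alpha  # per-hop decay
                if cand > nxt.get(nb, 0.0):
                    nxt[nb] = cand
        for nb, a in nxt.items():
            if a > activation.get(nb, 0.0):
                activation[nb] = a
        frontier = nxt
    # trim the activated set to the beam width
    top = sorted(activation.items(), key=lambda kv: -kv[1])[:B]
    return dict(top)

g = {
    "food": [("preference", 0.9), ("dark_chocolate", 0.7)],
    "preference": [("dark_chocolate", 0.8)],
}
acts = activate(g, ["food"], M=2, D=2, alpha=0.5, B=4)
```

&lt;p&gt;At most |S| · M^D nodes are ever touched, and the beam trim caps what survives, which is where the latency predictability comes from.&lt;/p&gt;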

&lt;h2&gt;
  Why each piece matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bounded propagation.&lt;/strong&gt; Without a depth + beam cap, graph walks degenerate to "activate everything" on dense graphs. The cap means latency is predictable regardless of graph size. This is the single biggest reason MME is production-runnable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token-budgeted packs.&lt;/strong&gt; The packer is a hard constraint, not a soft target. You ask for 1024 tokens, you get ≤ 1024. Items that don't fit are skipped, not truncated. This means you can prompt-engineer with confidence: your context window allocation is real.&lt;/p&gt;
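&lt;p&gt;A sketch of what a skip-don't-truncate packer looks like (word count stands in for the exact tiktoken count here, and scoring is assumed to happen upstream):&lt;/p&gt;

```python
def pack(items, budget):
    """items: [(score, text)]. Greedy, best-first, never truncates."""
    chosen, used = [], 0
    for score, text in sorted(items, key=lambda it: -it[0]):
        cost = len(text.split())  # stand-in for an exact token count
        if used + cost <= budget:
            chosen.append(text)
            used += cost
        # items that don't fit are skipped, not truncated
    return chosen, used

items = [
    (0.9, "I prefer dark chocolate."),
    (0.8, "I am allergic to peanuts."),
    (0.7, "a very long memory " * 20),
]
chosen, used = pack(items, budget=12)
```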

&lt;p&gt;&lt;strong&gt;Online learning.&lt;/strong&gt; When the agent accepts a pack and the downstream call succeeds, MME updates edge weights via EMA from the feedback signal. After a few hundred accepted packs, the graph self-tunes to your usage patterns. No retraining, no embedding refreshes, no offline pipeline.&lt;/p&gt;
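&lt;p&gt;One plausible shape for that update (the learning rate and target mapping here are my assumptions, not the SDK's documented behavior):&lt;/p&gt;

```python
def ema_update(weight, accepted, lr=0.1):
    """Nudge an edge weight toward 1.0 on acceptance, 0.0 on rejection."""
    target = 1.0 if accepted else 0.0
    return (1 - lr) * weight + lr * target

w = 0.5
for _ in range(3):          # three accepted packs touching this edge
    w = ema_update(w, accepted=True)
```

&lt;p&gt;The point is the cost profile: one multiply-add per touched edge per feedback event, instead of an offline retraining job.&lt;/p&gt;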

&lt;p&gt;&lt;strong&gt;Online tagging.&lt;/strong&gt; New memories get tagged at write time by a small LLM-backed tagger that knows the existing tag vocabulary. Tags are reused where possible (the &lt;code&gt;dark_chocolate&lt;/code&gt; tag persists across users in the same scope), so the graph densifies as you save more memories — which is what you want.&lt;/p&gt;

&lt;h2&gt;
  The Python SDK (shipped today)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;railtech-mme
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three calls cover 90% of what you'll do:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;railtech_mme&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MME&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;MME&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mme&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Save a memory — auto-tagged on the server
&lt;/span&gt;    &lt;span class="n"&gt;mme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I prefer dark chocolate.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;mme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m allergic to peanuts.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Inject — get a token-budgeted pack
&lt;/span&gt;    &lt;span class="n"&gt;pack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are my food preferences and allergies?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;token_budget&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;excerpt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Feedback — close the learning loop
&lt;/span&gt;    &lt;span class="n"&gt;mme&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pack_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pack_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;accepted&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's a parallel &lt;code&gt;AsyncMME&lt;/code&gt; for async stacks, full Pydantic models on every request/response, and an exception taxonomy (&lt;code&gt;MMEAuthError&lt;/code&gt;, &lt;code&gt;MMERateLimitError&lt;/code&gt;, &lt;code&gt;MMETimeoutError&lt;/code&gt;, etc.) so you can write proper error handling.&lt;/p&gt;
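&lt;p&gt;For example, a retry wrapper over those exceptions might look like this (the exception names come from the SDK; the fallback stubs and the retry policy are illustrative, not prescribed):&lt;/p&gt;

```python
import time

# Stubs let this sketch run without the SDK installed.
try:
    from railtech_mme import MMEAuthError, MMERateLimitError, MMETimeoutError
except ImportError:
    class MMEAuthError(Exception): pass
    class MMERateLimitError(Exception): pass
    class MMETimeoutError(Exception): pass

def inject_with_retry(client, query, budget=1024, retries=3):
    for attempt in range(retries):
        try:
            return client.inject(query, token_budget=budget)
        except MMERateLimitError:
            time.sleep(min(2 ** attempt, 8))  # exponential backoff
        except MMETimeoutError:
            continue  # transient: retry immediately
        except MMEAuthError:
            raise  # misconfiguration: never retry
    return None

class _FlakyClient:
    """Test double: raises a rate-limit error once, then succeeds."""
    def __init__(self):
        self.calls = 0
    def inject(self, query, token_budget):
        self.calls += 1
        if self.calls == 1:
            raise MMERateLimitError()
        return {"query": query, "token_budget": token_budget}

result = inject_with_retry(_FlakyClient(), "What are my food preferences?")
```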

&lt;h2&gt;
  LangChain integration
&lt;/h2&gt;

&lt;p&gt;The integration is a first-class extra, not a wrapper:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'railtech-mme[langchain]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;railtech_mme.langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MMEInjectTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MMESaveTool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;MMEInjectTool&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;MMESaveTool&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both tools have proper Pydantic schemas, so the LLM sees clean parameter descriptions when deciding whether to call them. &lt;code&gt;MMEInjectTool&lt;/code&gt; returns a token-budgeted pack; &lt;code&gt;MMESaveTool&lt;/code&gt; lets the agent persist new memories with optional section/source tags.&lt;/p&gt;

&lt;h2&gt;
  What's not yet there (honest beat)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The SDK is &lt;strong&gt;one day old&lt;/strong&gt;. v0.1.0 shipped yesterday; v0.1.1 today after end-to-end verification surfaced two real bugs (&lt;code&gt;recent()&lt;/code&gt; was crashing on real responses, and the README quickstart was using a paraphrase that didn't activate the cold-start tag graph). Both fixed.&lt;/li&gt;
&lt;li&gt;Docs are minimal. The README has a quickstart and the dashboard at mme.railtech.io has a Python section, but you'll find missing pieces — please open issues.&lt;/li&gt;
&lt;li&gt;The backend has been in production for ~6 months, so the &lt;em&gt;server&lt;/em&gt; is mature. The &lt;em&gt;Python client&lt;/em&gt; is what's new and what I'd love feedback on.&lt;/li&gt;
&lt;li&gt;LangGraph examples beyond basic tool-binding aren't in the repo yet. They're next.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  Why I'm sharing this
&lt;/h2&gt;

&lt;p&gt;Two reasons:&lt;/p&gt;

&lt;p&gt;First, I think tag-graph memory is genuinely a different design point than vector search, and I'd like more people to push on it — find where it breaks, find where it shines. The math is in the README; the code is Apache-2.0 on GitHub.&lt;/p&gt;

&lt;p&gt;Second, this is launch day for the Python SDK. If you're building agents in Python and you've felt the pain of "my agent doesn't remember things well," I'd love it if you tried it and told me what's clunky about the API.&lt;/p&gt;

&lt;h2&gt;
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/gokulJinu01/railtech-mme-python" rel="noopener noreferrer"&gt;&lt;code&gt;gokulJinu01/railtech-mme-python&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;a href="https://pypi.org/project/railtech-mme/" rel="noopener noreferrer"&gt;&lt;code&gt;railtech-mme&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sign up for an API key:&lt;/strong&gt; &lt;a href="https://mme.railtech.io" rel="noopener noreferrer"&gt;mme.railtech.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python docs section:&lt;/strong&gt; &lt;a href="https://mme.railtech.io/#python" rel="noopener noreferrer"&gt;mme.railtech.io/#python&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Happy to answer technical questions in the comments — the bounded retrieval math, the LangChain tool design, why we chose tag-graph over hybrid vector+keyword, or anything about the prod observability stack.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>python</category>
      <category>ai</category>
      <category>langchain</category>
    </item>
  </channel>
</rss>
