<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Juan David Gómez</title>
    <description>The latest articles on DEV Community by Juan David Gómez (@juandastic).</description>
    <link>https://dev.to/juandastic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2387605%2F11f53f0b-023e-4c47-87db-00467fa8f7e1.jpeg</url>
      <title>DEV Community: Juan David Gómez</title>
      <link>https://dev.to/juandastic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/juandastic"/>
    <language>en</language>
    <item>
      <title>My AI Sends 30k Tokens Per Message. 80% of Them Were Wasted.</title>
      <dc:creator>Juan David Gómez</dc:creator>
      <pubDate>Sun, 19 Apr 2026 00:42:07 +0000</pubDate>
      <link>https://dev.to/juandastic/my-ai-sends-30k-tokens-per-message-80-of-them-were-wasted-1lmp</link>
      <guid>https://dev.to/juandastic/my-ai-sends-30k-tokens-per-message-80-of-them-were-wasted-1lmp</guid>
      <description>&lt;p&gt;Building AI side projects is fun until you have to pay for them.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://dev.to/juandastic/my-wife-sent-297-messages-in-15-days-not-to-me-to-the-ai-i-built-her-the-synapse-story-333o"&gt;Synapse&lt;/a&gt;, an AI companion with deep memory powered by a knowledge graph. My wife uses it daily for therapy, coaching, and reflection. The AI knows her life, her patterns, her goals, her emotional triggers. It remembers things across weeks and months.&lt;/p&gt;

&lt;p&gt;Two weeks ago, I connected PostHog to track LLM costs. Here is what I saw:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkchp9m62z3t83qto3tqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkchp9m62z3t83qto3tqr.png" alt="AI generations price" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;$24 in two weeks. Four users. One of her sessions hit $2.42 for 28 messages. A single conversation.&lt;/p&gt;

&lt;p&gt;I looked at the token breakdown and the problem was obvious. Every message sends roughly 30,000 tokens of system context. Her knowledge graph has grown rich after weeks of daily use. Entities, relationships, temporal facts, emotional patterns. All of it compiled into a structured text snapshot and injected into every single message.&lt;/p&gt;

&lt;p&gt;And 80 to 90% of those tokens are the exact same compiled knowledge repeated on every single turn.&lt;/p&gt;

&lt;p&gt;The memory quality is great. The cost structure is not. So I made two changes: I restructured how context is assembled, and I added an explicit cache layer using Gemini's CachedContent API. Together they cut the cost per message by more than half.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Context: The Hybrid Memory Architecture
&lt;/h2&gt;

&lt;p&gt;If you have been following this series, you know the backstory. If not, here is the short version. (The full technical deep dive is in &lt;a href="https://dev.to/juandastic/scaling-ai-memory-how-i-tamed-a-120k-token-prompt-with-deterministic-graphrag-4f85"&gt;Scaling AI Memory: How I Tamed a 120K Token Prompt with Deterministic GraphRAG&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;Synapse uses a two-layer approach to give the AI long-term memory:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Base Compilation (Working Memory).&lt;/strong&gt; When a session starts, &lt;a href="https://github.com/juandastic/synapse-cortex" rel="noopener noreferrer"&gt;Synapse Cortex&lt;/a&gt; compiles the knowledge graph into a structured text summary. Entities, relationships, temporal facts. The most connected nodes always make it in. A waterfill algorithm caps the budget at roughly 120,000 characters (~30K tokens). This is the "always-on" context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. GraphRAG (Episodic Recall).&lt;/strong&gt; When the graph is too large for the budget, a second layer retrieves long-tail memories per-turn using hybrid search. It uses the graph UUIDs from the compilation metadata to avoid duplicating what is already in the base. Zero-latency, deterministic, no agent loops.&lt;/p&gt;

&lt;p&gt;This works well for quality. The AI still feels like it knows everything about you. But the cost story has a gap: that 30K compilation is the same text for the entire session, and it gets billed as fresh input tokens on every single message.&lt;/p&gt;

&lt;p&gt;In a 28-message session, that is 28 x 30k = 840k tokens just from the base knowledge. Almost all of it identical.&lt;/p&gt;
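&lt;p&gt;To make the shape of the bill concrete, here is the same arithmetic as a sketch. The per-token price is an illustrative placeholder, not Gemini's actual rate:&lt;/p&gt;

```python
# Session-level cost of re-sending the base compilation on every turn.
# PRICE_PER_MTOK is an illustrative placeholder, not a real Gemini rate.
BASE_TOKENS = 30_000      # compiled knowledge injected into every message
PRICE_PER_MTOK = 0.30     # hypothetical dollars per million input tokens

def base_tokens_for_session(messages: int) -> int:
    """Tokens billed for the base compilation alone across one session."""
    return messages * BASE_TOKENS

def base_cost_for_session(messages: int) -> float:
    """Dollar cost of those tokens at the illustrative input price."""
    return base_tokens_for_session(messages) * PRICE_PER_MTOK / 1_000_000
```

&lt;p&gt;At those numbers a 28-message session re-bills 840K identical tokens before the conversation history is even counted.&lt;/p&gt;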

&lt;h2&gt;
  
  
  The Problem: One Blob, One Bill
&lt;/h2&gt;

&lt;p&gt;Before this change, context assembly on the Convex side (the backend that serves the frontend) looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// prepareContext: before&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;systemContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cachedSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;systemContent&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`\n\nCurrent date and time: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;currentDateTime&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userKnowledge&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;systemContent&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`\n\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userKnowledge&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apiMessages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;systemContent&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;conversationHistory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One string. Persona prompt, datetime, and the entire 30K compilation concatenated together. Sent as a single system message on every request.&lt;/p&gt;

&lt;p&gt;This design was simple and it worked fine when the graph was small. But it has two problems at scale:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You cannot cache part of a blob.&lt;/strong&gt; The major providers (OpenAI, Google, Anthropic) all offer some form of prompt caching, and some of it is automatic: when you send the same text prefix repeatedly, the provider may cache it behind the scenes and charge less. But implicit caching is unreliable. A small change anywhere in the prompt can break the prefix matching. The provider may choose not to cache for reasons you cannot see or control. And you have zero visibility into whether it is working.&lt;/p&gt;

&lt;p&gt;In my case, the &lt;code&gt;systemContent&lt;/code&gt; blob included the current datetime on every message. That single line changing every turn was enough to break any automatic prefix matching. Even though the other 25k tokens were identical.&lt;/p&gt;

&lt;p&gt;The persona prompt (~500 tokens) is lightweight and rarely changes. The datetime changes every turn. The knowledge compilation (~25-30K tokens) is heavy but stable for the entire session. Treating them as one string means the lightweight parts are hostage to the heavy part.&lt;/p&gt;
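&lt;p&gt;A toy model makes the prefix mechanics visible. &lt;code&gt;common_prefix_len&lt;/code&gt; below stands in for the provider's matcher; it is not a real API, just a way to see where a volatile line kills the shared prefix:&lt;/p&gt;

```python
# Implicit prefix caching can only match two prompts up to the first
# differing character. This toy matcher shows the effect of ordering.

def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared prefix between two prompt strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

PERSONA = "You are Synapse."                  # stable, tiny
KNOWLEDGE = "ENTITIES: Maria, ... " * 800     # stable per session, huge

def one_blob(now: str) -> str:
    # Before: the volatile datetime sits in front of the heavy knowledge.
    return PERSONA + "\nCurrent date and time: " + now + "\n" + KNOWLEDGE

def stable_first(now: str) -> str:
    # Stable parts first, volatile datetime last: now the whole knowledge
    # block is inside the shared prefix.
    return PERSONA + "\n" + KNOWLEDGE + "\nCurrent date and time: " + now

blob_shared = common_prefix_len(one_blob("10:00"), one_blob("10:03"))
split_shared = common_prefix_len(stable_first("10:00"), stable_first("10:03"))
```

&lt;p&gt;With the single blob the shared prefix dies at the timestamp, a few dozen characters in; with the stable parts first it covers the entire knowledge block. The actual fix in Synapse goes further than reordering, splitting the context into separate fields and adding an explicit cache, but this is the failure mode it removes.&lt;/p&gt;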

&lt;h2&gt;
  
  
  Change #1: Splitting the Context Snapshot
&lt;/h2&gt;

&lt;p&gt;The first change was structural. Instead of returning one &lt;code&gt;systemContent&lt;/code&gt; string, &lt;code&gt;prepareContext&lt;/code&gt; now returns three separate fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// prepareContext: after&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;systemInstruction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cachedSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\n\nCurrent date and time: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;currentDateTime&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cacheName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;knowledgeCache&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;cacheName&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;apiMessages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// user/assistant turns only, no system message&lt;/span&gt;
  &lt;span class="nx"&gt;systemInstruction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// lightweight persona + datetime (~500 tokens)&lt;/span&gt;
  &lt;span class="nx"&gt;compilation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// heavy knowledge (~25K tokens), stable per session&lt;/span&gt;
  &lt;span class="nx"&gt;cacheName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;// Gemini cache pointer (if available)&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The HTTP layer sends these as separate JSON parameters to Cortex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;system_instruction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;systemInstruction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;compilation&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;compilation&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;cacheName&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;cache_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cacheName&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apiMessages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This separation is what makes everything else possible. The compilation is now an independent unit that the server can handle differently from the volatile parts.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Episodic Section
&lt;/h3&gt;

&lt;p&gt;I also adjusted the compilation itself. I added a new section that summarizes the previous session. Instead of relying only on the graph's entity and relationship definitions, the model now gets a short episodic recap: "Last session you talked about X, explored Y, and mentioned Z."&lt;/p&gt;

&lt;p&gt;This serves two purposes. First, it gives the model easy session continuity without loading raw message history. Second, it let me trim the budget for less-connected facts and concepts. The total max tokens dropped from ~30K to ~25K. That is ~5,000 fewer tokens per message before caching even enters the picture.&lt;/p&gt;

&lt;p&gt;You can see the effect in the PostHog data. After April 10th, the average tokens per message for my wife's account dropped visibly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42kp5497boh1v3kz5yrx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42kp5497boh1v3kz5yrx.png" alt="posthog data avg tokens per message" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Change #2: Gemini Explicit Cache
&lt;/h2&gt;

&lt;p&gt;Here is where the real savings come from.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemini CachedContent: The Concept
&lt;/h3&gt;

&lt;p&gt;Gemini offers an explicit caching API. You create a cache resource by uploading content to &lt;code&gt;caches.create()&lt;/code&gt;. You get back a resource name like &lt;code&gt;cachedContents/abc123&lt;/code&gt;. On subsequent requests, you pass that name and Gemini uses the cached content as a prefix instead of re-processing the input.&lt;/p&gt;

&lt;p&gt;The economics: cached tokens cost roughly 75% less than regular input tokens. For a 25K token compilation, that means paying for about 6,250 tokens instead of 25,000. On every single turn.&lt;/p&gt;

&lt;p&gt;Gemini enforces a minimum cacheable size of around 1,024 tokens. I use 4,000 characters as a conservative threshold: anything smaller is skipped and inlined as usual.&lt;/p&gt;
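&lt;p&gt;The discount math, with the 75% figure from above (the helper is a sketch; actual billing also adds a small per-hour storage charge for the cache):&lt;/p&gt;

```python
# Effective tokens billed for the compilation with and without the cache.
# The ~75% discount is the figure quoted above; storage fees are ignored.
CACHE_DISCOUNT = 0.75   # cached tokens billed at roughly 25% of input rate

def billable_tokens(compilation_tokens: int, cached: bool) -> float:
    """Full-price-equivalent input tokens billed for the compilation."""
    if cached:
        return compilation_tokens * (1 - CACHE_DISCOUNT)
    return float(compilation_tokens)
```

&lt;p&gt;For the 25K compilation that is 6,250 effective tokens per turn instead of 25,000.&lt;/p&gt;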

&lt;h3&gt;
  
  
  How I Integrated It
&lt;/h3&gt;

&lt;p&gt;The cache lifecycle follows the existing Synapse pipeline. No new infrastructure. No new services. Just a new step after compilation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When a session starts&lt;/strong&gt; (hydration), Cortex compiles the knowledge and creates a Gemini cache from it. The &lt;code&gt;cacheName&lt;/code&gt; is returned to the client alongside the compilation. The client stores both in &lt;code&gt;user_knowledge_cache&lt;/code&gt; (Convex table).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;During the session&lt;/strong&gt;, the client sends &lt;code&gt;cache_name&lt;/code&gt; and &lt;code&gt;compilation&lt;/code&gt; on every chat request. If the cache is valid, Cortex passes it to Gemini via &lt;code&gt;cached_content&lt;/code&gt; and the compilation is served from cache. If there is no cache, Cortex inlines the compilation into the prompt. Same result, different price.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When the session closes&lt;/strong&gt; (ingestion), new messages are processed into the knowledge graph, a fresh compilation is generated, and a new cache is created. The cycle repeats.&lt;/p&gt;

&lt;p&gt;The entire cache manager is about 100 lines of Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CacheManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_compilation_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compilation_text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compilation_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;MIN_CHARS_FOR_CACHE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compilation_too_small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;caches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CreateCachedContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compilation_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;compilation_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3600s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invalidate_by_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;caches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cache_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;refresh_ttl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;caches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cache_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;UpdateCachedContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3600s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The backend is stateless. It does not track which user owns which cache. The client persists the &lt;code&gt;cacheName&lt;/code&gt; and forwards it. This keeps the architecture clean and avoids a new data store.&lt;/p&gt;

&lt;h3&gt;
  
  
  The TTL Problem
&lt;/h3&gt;

&lt;p&gt;Gemini caches expire by wall clock, not by usage. The default TTL is one hour. So a user in a long conversation would hit expiration at exactly the 60-minute mark regardless of how many messages they sent.&lt;/p&gt;

&lt;p&gt;My solution: after every successful cache hit, I spawn a fire-and-forget task that pushes the TTL forward by another hour. Active users never expire mid-session. The task runs in the background, does not block the response, and if it fails nothing breaks. The worst case is the cache expires and the fallback kicks in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cache_hit&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;active_cache_name&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;cache_manager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;_spawn_background&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;refresh_ttl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;active_cache_name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Fallback: Engineering for the Unhappy Path
&lt;/h2&gt;

&lt;p&gt;One thing I like about Gemini's implementation: if the cache is expired or not found, the request fails explicitly. It does not silently fall back to full-price tokens. It tells you. That gives you control to decide what to do next, whether that is retry with the full compilation or create a fresh cache.&lt;/p&gt;

&lt;p&gt;But that also means a stale cache is a new way for the app to break. Caches expire by wall clock. They can get deleted upstream. The model in the cache might not match the model in the request. If I built a system that only works when the cache is hot, it would only be a matter of time before my wife woke me up at midnight asking why her precious app had stopped working.&lt;/p&gt;

&lt;p&gt;The design principle is simple: &lt;strong&gt;the client always sends everything&lt;/strong&gt;. Both the &lt;code&gt;cache_name&lt;/code&gt; and the full &lt;code&gt;compilation&lt;/code&gt;. The server decides which to use.&lt;/p&gt;

&lt;p&gt;If the cache is valid, use it. 75% cheaper. If the cache is stale, fall back to inlining the compilation. Full price, but it works. The user never notices.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Peek Pattern
&lt;/h3&gt;

&lt;p&gt;There is a subtlety with the Gemini SDK. It is lazy. When you open a streaming request, the actual HTTP call does not happen until you pull the first chunk. That means cache errors do not surface when you create the stream. They surface when you iterate.&lt;/p&gt;

&lt;p&gt;So I peek the first chunk inside a &lt;code&gt;try/except&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;stream_iter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;first_chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;_open_and_peek&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;active_cache_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;use_cache&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;_looks_like_cache_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Cache is stale, invalidate it
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;cache_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invalidate_by_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;active_cache_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Rebuild with compilation inlined and retry
&lt;/span&gt;        &lt;span class="n"&gt;gemini_contents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_contents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inline_compilation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;stream_iter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;first_chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;_open_and_peek&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I match against a list of known error strings: "cache expired", "cache not found", "does not match the model in the cached content", and a few more. If the error matches, I invalidate the stale cache, rebuild the contents with the compilation inlined, and retry the stream. All before any bytes reach the client.&lt;/p&gt;

&lt;h3&gt;
  
  
  Auto Re-Hydration
&lt;/h3&gt;

&lt;p&gt;One more thing. When a fallback happens, I do not just serve the request and move on. The final SSE usage chunk includes &lt;code&gt;cache_fallback_triggered: true&lt;/code&gt;. The Convex client watches for this flag and schedules a background re-hydration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;cache_fallback_triggered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runAfter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cortex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hydrate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a fresh cache for the next message. So only one message per expiration window pays full price. Every subsequent message in that session gets the cache benefit again.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Here is what the data shows after both changes went live.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token Reduction
&lt;/h3&gt;

&lt;p&gt;The episodic restructuring (Change #1) dropped the average tokens per message from the ~40K range to the ~30K range. That is visible in the PostHog chart starting around April 10th.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Per Generation
&lt;/h3&gt;

&lt;p&gt;From the PostHog session tracking, here is the daily breakdown:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Generations&lt;/th&gt;
&lt;th&gt;Total Cost&lt;/th&gt;
&lt;th&gt;Cost/Generation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Apr 11&lt;/td&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;$1.14&lt;/td&gt;
&lt;td&gt;$0.0170&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apr 12&lt;/td&gt;
&lt;td&gt;86&lt;/td&gt;
&lt;td&gt;$1.18&lt;/td&gt;
&lt;td&gt;$0.0137&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apr 13&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;td&gt;$1.32&lt;/td&gt;
&lt;td&gt;$0.0169&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apr 14&lt;/td&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;td&gt;$3.08&lt;/td&gt;
&lt;td&gt;$0.0390&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apr 16&lt;/td&gt;
&lt;td&gt;109&lt;/td&gt;
&lt;td&gt;$1.56&lt;/td&gt;
&lt;td&gt;$0.0143&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apr 17&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;td&gt;$2.16&lt;/td&gt;
&lt;td&gt;$0.0399&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apr 18&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;td&gt;$0.55&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.0088&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;April 18 is when the explicit cache was fully active. The cost per generation dropped to $0.0088, less than half the cost of a typical day.&lt;/p&gt;

&lt;p&gt;The expensive days (Apr 14 and 17) included sessions with heavy ingestion: 28 and 39 generation calls respectively, counting the Gemini calls for graph processing during the Sleep Cycle. Those costs cover both the chat generation and the knowledge extraction, not just the user-facing messages.&lt;/p&gt;

&lt;h3&gt;
  
  
  The $2.42 Session Revisited
&lt;/h3&gt;

&lt;p&gt;That 28-message session that cost $2.42? With caching active at the April 18 rate, the same conversation would cost roughly $0.25. That is almost a 90% reduction.&lt;/p&gt;
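&lt;p&gt;A toy cost model shows the direction of that saving (the per-token prices below are illustrative placeholders, not Gemini's actual rate card; the full ~90% figure also reflects the token reduction from Change #1):&lt;/p&gt;

```python
# Hypothetical prices per 1M input tokens; cached tokens are billed
# at a steep discount (exact rates depend on model and provider).
FULL_PRICE = 0.30      # $ per 1M uncached input tokens (illustrative)
CACHED_PRICE = 0.075   # $ per 1M cached input tokens (illustrative)


def session_cost(messages: int, tokens_per_msg: int, cached_fraction: float) -> float:
    """Input cost of a session where `cached_fraction` of each
    message's prompt is served from the explicit cache."""
    cost = 0.0
    for _ in range(messages):
        cached = tokens_per_msg * cached_fraction
        fresh = tokens_per_msg - cached
        cost += fresh / 1e6 * FULL_PRICE + cached / 1e6 * CACHED_PRICE
    return cost


no_cache = session_cost(28, 30_000, 0.0)
with_cache = session_cost(28, 30_000, 0.8)   # ~80% of each prompt cached
reduction = 1 - with_cache / no_cache        # fraction of input cost saved
```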

&lt;p&gt;And the AI still has full access to the knowledge graph. The compilation is the same content. It is just served from cache instead of being re-tokenized on every turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observability
&lt;/h3&gt;

&lt;p&gt;Both changes are fully instrumented. On the Axiom side, every chat request logs cache attributes: &lt;code&gt;cache.hit&lt;/code&gt;, &lt;code&gt;cache.hit_ratio&lt;/code&gt;, &lt;code&gt;cache.fallback_triggered&lt;/code&gt;, &lt;code&gt;cache.skip_reason&lt;/code&gt;. On PostHog, I track &lt;code&gt;cache_enabled&lt;/code&gt;, &lt;code&gt;cache_hit&lt;/code&gt;, and &lt;code&gt;cached_tokens&lt;/code&gt; per generation.&lt;/p&gt;

&lt;p&gt;This means I can answer questions like: what percentage of requests hit the cache? How often does the fallback trigger? Is the TTL refresh working? No guessing.&lt;/p&gt;
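&lt;p&gt;Answering those questions is a simple aggregation over the logged attributes. A sketch (the event shape here is hypothetical, not Axiom's or PostHog's actual schema):&lt;/p&gt;

```python
# Hypothetical log events carrying the cache attributes described above.
events = [
    {"cache_enabled": True, "cache_hit": True, "cached_tokens": 24_000},
    {"cache_enabled": True, "cache_hit": True, "cached_tokens": 24_000},
    {"cache_enabled": True, "cache_hit": False, "cached_tokens": 0},
    {"cache_enabled": False, "cache_hit": False, "cached_tokens": 0},
]


def cache_stats(events: list[dict]) -> dict:
    """Aggregate hit rate and cached-token volume over cache-eligible requests."""
    eligible = [e for e in events if e["cache_enabled"]]
    hits = sum(1 for e in eligible if e["cache_hit"])
    return {
        "hit_rate": hits / len(eligible) if eligible else 0.0,
        "cached_tokens": sum(e["cached_tokens"] for e in eligible),
    }


stats = cache_stats(events)
```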

&lt;h2&gt;
  
  
  What is Next
&lt;/h2&gt;

&lt;p&gt;The cache layer is live and working. But there are a few things I want to explore:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tighter compilation budgets.&lt;/strong&gt; Now that the episodic summary provides session continuity, I think I can trim the base compilation further. Maybe from 25K to 20K tokens. The GraphRAG layer already handles the long tail. Less base knowledge means a smaller cache, which means even cheaper turns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-model flexibility.&lt;/strong&gt; Gemini caches are bound to a specific model. If I switch between Flash and Pro based on the task, each model needs its own cache. That is a limitation I have not solved yet.&lt;/p&gt;

&lt;p&gt;Building AI solutions is fun. Paying for them is the part that makes you think harder. And thinking harder usually leads to a better architecture.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is article #5 in the Synapse series. If you want the full backstory:&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/juandastic/my-wife-sent-297-messages-in-15-days-not-to-me-to-the-ai-i-built-her-the-synapse-story-333o"&gt;The Synapse Story&lt;/a&gt; - What Synapse is and why I built it&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/juandastic/beyond-rag-building-an-ai-companion-with-deep-memory-using-knowledge-graphs-2e6e"&gt;Beyond RAG&lt;/a&gt; - Knowledge graphs as the memory foundation&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/juandastic/scaling-ai-memory-how-i-tamed-a-120k-token-prompt-with-deterministic-graphrag-4f85"&gt;Scaling AI Memory&lt;/a&gt; - The hybrid approach with Hydration V2 and GraphRAG&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/juandastic/full-circle-giving-my-ais-knowledge-graph-a-notion-interface-using-mcp-2dmp"&gt;Full Circle&lt;/a&gt; - Giving the knowledge graph a Notion interface via MCP&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;The code is open source: &lt;a href="https://github.com/juandastic/synapse-chat-ai" rel="noopener noreferrer"&gt;synapse-chat-ai&lt;/a&gt; (frontend) and &lt;a href="https://github.com/juandastic/synapse-cortex" rel="noopener noreferrer"&gt;synapse-cortex&lt;/a&gt; (backend).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let's connect on &lt;a href="https://x.com/juandastic" rel="noopener noreferrer"&gt;X&lt;/a&gt; or &lt;a href="https://www.linkedin.com/in/juandastic/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>My Wife Sent 297 Messages in 15 Days. Not to Me. To the AI I Built Her. The Synapse Story</title>
      <dc:creator>Juan David Gómez</dc:creator>
      <pubDate>Fri, 03 Apr 2026 01:26:03 +0000</pubDate>
      <link>https://dev.to/juandastic/my-wife-sent-297-messages-in-15-days-not-to-me-to-the-ai-i-built-her-the-synapse-story-333o</link>
      <guid>https://dev.to/juandastic/my-wife-sent-297-messages-in-15-days-not-to-me-to-the-ai-i-built-her-the-synapse-story-333o</guid>
      <description>&lt;h2&gt;
  
  
  The Psychologist Who Couldn't Find the Right Therapist
&lt;/h2&gt;

&lt;p&gt;My wife is a professional psychologist. She is also a regular therapy patient. And for years, she struggled to find a therapist who could match her intelligence and clinical knowledge and use them in her favor, not against her.&lt;/p&gt;

&lt;p&gt;The problem was not the therapists. The problem was the format. She would walk into a session with a mental list (sometimes an actual list or even full presentations) of what she wanted to cover that week. Sometimes the session went deep into the right topics. Other times, something emotionally loud from that day would take over the entire hour. She would leave feeling lighter, sure, but frustrated. She had used her session to vent about something temporary instead of working on her core issues. And her therapist only saw her for one hour per week. There was no way to cover everything.&lt;/p&gt;

&lt;p&gt;When she discovered LLMs and understood their potential, something clicked. She started experimenting with Gemini as a daily companion for emotional exploration, not as a replacement for therapy, but as the &lt;strong&gt;missing piece&lt;/strong&gt; between sessions. The LLM handles the daily processing: the venting, the pattern recognition, the emotional sorting. The professional therapist acts as the safeguard of the process, keeping everything aligned with long-term goals.&lt;/p&gt;

&lt;p&gt;It worked. But Gemini has no memory. So she built a workaround.&lt;/p&gt;

&lt;p&gt;Over months, she crafted a massive "Master Prompt" in Notion. It contained her medical history, key life events, emotional triggers, therapeutic frameworks, and ongoing projects. Every time she started a new conversation, she had to manually copy-paste this just to get the AI up to speed. If she didn't, the advice was generic and useless.&lt;/p&gt;

&lt;p&gt;The prompt grew every week because life kept happening. She dreaded starting new threads because of the "context set up" tax. She felt like she was constantly repeating herself.&lt;/p&gt;

&lt;p&gt;She didn't need a search engine or a simple chat history. She needed a &lt;strong&gt;continuous brain.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So I built &lt;a href="https://synapse-chat.juandago.dev/" rel="noopener noreferrer"&gt;Synapse&lt;/a&gt;. An AI chat with deep memory, powered by a knowledge graph, designed for the kind of personal conversations that actually need the AI to know who you are.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Furhpv2al75xpw47iuy8b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Furhpv2al75xpw47iuy8b.png" alt="Side-by-side comparison: Regular AI vs. Synapse" width="781" height="510"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Articles, Three Walls, One User
&lt;/h2&gt;

&lt;p&gt;Synapse didn't start as what it is today. It evolved through four versions, and every single one broke when she actually used it. I documented each step on Dev.to. Here is the compressed timeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;V1: The Knowledge Graph.&lt;/strong&gt; I replaced the Notion page with a Neo4j knowledge graph powered by &lt;a href="https://github.com/getzep/graphiti" rel="noopener noreferrer"&gt;Graphiti&lt;/a&gt;. As she chatted, the AI quietly extracted entities and relationships in the background. I called it the "Sleep Cycle." No more copy-pasting. The compiled graph was about 10,000 tokens, down from her 35,000-token manual prompt. It worked. (&lt;a href="https://dev.to/juandastic/beyond-rag-building-an-ai-companion-with-deep-memory-using-knowledge-graphs-2e6e"&gt;Read the full origin story&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;V2: The Scaling Wall.&lt;/strong&gt; The graph grew. By day 21, every message carried over &lt;strong&gt;120,000 tokens&lt;/strong&gt; of system context. Costs climbed. Latency suffered. I built a budget-aware "waterfill" system (Hydration V2) that caps the prompt at ~30K tokens and retrieves the rest on demand with zero-latency GraphRAG. The AI didn't get dumber. It still felt like it knew everything about her. (&lt;a href="https://dev.to/juandastic/scaling-ai-memory-how-i-tamed-a-120k-token-prompt-with-deterministic-graphrag-4f85"&gt;How I tamed 120K tokens&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;V3: The UX Wall.&lt;/strong&gt; I built a graph visualizer so she could explore her AI's memory. I thought it was beautiful. To her, it was just overwhelming. She missed Notion. So I brought Notion back as the interface to her AI's brain, using MCP for bidirectional sync. She could review her AI's knowledge in structured tables, flag mistakes with a checkbox, and push corrections back to the graph. (&lt;a href="https://dev.to/juandastic/full-circle-giving-my-ais-knowledge-graph-a-notion-interface-using-mcp-2dmp"&gt;The full circle&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Every version looked perfect in my demo. Every version broke when she actually used it. That is what makes building for a real user different from building a tutorial.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhtw1unze40ixe4qs5se.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhtw1unze40ixe4qs5se.png" alt="Evolution diagram" width="276" height="630"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Synapse: A Chat AI That Builds a Map of Your Life
&lt;/h2&gt;

&lt;p&gt;Enough history. Let me show you what Synapse is today.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Pipeline: Converse, Ingest, Compile, Evolve
&lt;/h3&gt;

&lt;p&gt;Synapse works in a 4-step cycle. You don't configure anything. You just talk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Converse.&lt;/strong&gt; You open a chat and start talking. Pick a persona (more on that below). The AI already knows who you are because the knowledge from your previous sessions was compiled and injected before you sent your first message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Ingest.&lt;/strong&gt; When a conversation pauses (3 hours of inactivity) or you press the "Consolidate Memory" button, the system closes the session. It sends the transcript to &lt;a href="https://github.com/juandastic/synapse-cortex" rel="noopener noreferrer"&gt;Synapse Cortex&lt;/a&gt;, the Python backend that powers the brain. Cortex uses Graphiti and Gemini to extract entities, relationships, and patterns into a Neo4j knowledge graph. This is the "Sleep Cycle."&lt;/p&gt;
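&lt;p&gt;The trigger condition for closing a session is simple. A sketch (simplified; the real check runs inside the app backend, and only the 3-hour window is taken from the description above):&lt;/p&gt;

```python
INACTIVITY_LIMIT_S = 3 * 3600  # close the session after 3 hours of silence


def should_consolidate(last_message_at: float, now: float,
                       button_pressed: bool = False) -> bool:
    """A session closes (and goes to the Sleep Cycle) when the user
    presses "Consolidate Memory" or has been inactive for 3 hours."""
    return button_pressed or (now - last_message_at) >= INACTIVITY_LIMIT_S


still_open = should_consolidate(last_message_at=0, now=600)        # 10 minutes in
timed_out = should_consolidate(last_message_at=0, now=3 * 3600)    # 3 hours later
forced = should_consolidate(last_message_at=0, now=600, button_pressed=True)
```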

&lt;p&gt;&lt;strong&gt;3. Compile.&lt;/strong&gt; Next time you start a conversation, Cortex hydrates the session. It compiles the most important knowledge from your graph into a structured text snapshot (~30K tokens). The most connected entities, the "hubs" of your life, always make it in. If the graph is too large, a waterfill algorithm prioritizes what matters most and retrieves the rest on demand.&lt;/p&gt;
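&lt;p&gt;The waterfill idea can be sketched as a greedy budget fill (a simplified toy version; the real Hydration V2 budgeting described in the Scaling AI Memory article is more involved, and the entity data below is invented):&lt;/p&gt;

```python
def waterfill_compile(entities: list[tuple], budget_tokens: int):
    """Greedy budget fill: the most-connected entities (the 'hubs')
    go into the compiled snapshot first, until the token budget runs out.

    `entities` is a list of (name, connection_count, token_cost) tuples.
    Anything excluded is retrieved on demand via GraphRAG instead.
    """
    ranked = sorted(entities, key=lambda e: e[1], reverse=True)
    included, used = [], 0
    for name, _, cost in ranked:
        if used + cost <= budget_tokens:
            included.append(name)
            used += cost
    return included, used


entities = [
    ("Work", 42, 9_000),
    ("Partner", 35, 8_000),
    ("Insomnia", 20, 7_000),
    ("Old hobby", 3, 12_000),  # weakly connected: left out of the snapshot
]
snapshot, used = waterfill_compile(entities, budget_tokens=30_000)
```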

&lt;p&gt;&lt;strong&gt;4. Evolve.&lt;/strong&gt; With every conversation, the graph refines. Old facts get invalidated with timestamps, not deleted. New connections emerge. The AI's understanding of you grows over weeks and months.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1h9ppsk8x4mhed3791l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1h9ppsk8x4mhed3791l.png" alt="4-step pipeline" width="748" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof37uv9hdiqosr9rl99c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof37uv9hdiqosr9rl99c.png" alt=" " width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Personas, One Memory
&lt;/h3&gt;

&lt;p&gt;Most AI tools give you one generic chatbot. Synapse gives you three specialized lenses. All three share the same knowledge graph. &lt;strong&gt;Same memory, different perspective.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;🧭 &lt;strong&gt;Compass (Therapeutic).&lt;/strong&gt; Built on Acceptance and Commitment Therapy (ACT), Dialectical Behavior Therapy (DBT), and Polyvagal Theory. Neuroaffirmative by default. It tracks nervous system states. It remembers which grounding techniques worked before. One meaningful question at a time. For processing anxiety, grief, anger, and anything that needs deep emotional context.&lt;/p&gt;

&lt;p&gt;🌿 &lt;strong&gt;Solace (Wellbeing).&lt;/strong&gt; Built on Positive Psychology (PERMA model), Self-Compassion (Kristin Neff), and Mindfulness-Based Stress Reduction. Gentle, unhurried, reflective. For daily emotional check-ins, mood patterns, and self-compassion practices. It notices patterns in your energy, sleep, and stress over time.&lt;/p&gt;

&lt;p&gt;⚡ &lt;strong&gt;Momentum (Growth Coach).&lt;/strong&gt; Built on Motivational Interviewing and Implementation Intentions. Direct, action-oriented, no fluff. For goal tracking, overcoming procrastination, and building momentum. It remembers your commitments across sessions and calls you on it.&lt;/p&gt;

&lt;p&gt;She might process a hard week with Compass on Monday, do a gentle check-in with Solace on Wednesday, and set specific goals with Momentum on Friday. &lt;strong&gt;All three know what happened. No repetition.&lt;/strong&gt; The knowledge graph is shared across all personas.&lt;/p&gt;

&lt;p&gt;These are not toy system prompts. Each persona has carefully designed therapeutic frameworks, response styles, and boundaries. The Compass persona alone references four evidence-based frameworks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Facts vs. Relationships: What Makes Memory Actually Useful
&lt;/h3&gt;

&lt;p&gt;Let me be honest about what other AI tools offer. Gemini and ChatGPT both now have memory features. And they do show you what they remember. But look at what that memory looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqfzmcqor2mt5leld9la.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqfzmcqor2mt5leld9la.png" alt="ChatGPT Memory" width="800" height="592"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Luciana's favorite color is purple."&lt;br&gt;
"Luciana has a cat."&lt;br&gt;
"Is using Windows with WSL."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A flat list of disconnected facts. No relationships between them. No causality. No timeline. You can delete a memory, but you cannot tell the AI "actually, I left that job in March" and have it update everything connected to that fact.&lt;/p&gt;

&lt;p&gt;Now compare that to what Synapse stores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Work Stress → TRIGGERS → Insomnia
Insomnia → AFFECTS → Relationship with Partner
Therapist → RECOMMENDED → Grounding Techniques
Grounding Techniques → HELPED_WITH → Work Stress (since March)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference is not just visibility. It is &lt;strong&gt;structure.&lt;/strong&gt; One stores isolated data points. The other stores a connected model of your life. When she tells the AI "I'm feeling overwhelmed today," a flat memory might recall that she mentioned "overwhelm" three months ago. The knowledge graph knows the causal chain: which project caused the stress, how the stress affected her sleep, and what techniques helped her last time.&lt;/p&gt;
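&lt;p&gt;That difference can be made concrete with a tiny traversal (a toy adjacency list mirroring the example edges above, not Synapse's actual Neo4j schema):&lt;/p&gt;

```python
# Toy adjacency list mirroring the example edges above.
graph = {
    "Work Stress": [("TRIGGERS", "Insomnia")],
    "Insomnia": [("AFFECTS", "Relationship with Partner")],
    "Grounding Techniques": [("HELPED_WITH", "Work Stress")],
}


def causal_chain(graph: dict, start: str) -> str:
    """Follow outgoing edges from `start`, collecting the causal path.
    A flat fact list cannot answer this: it has no edges to follow."""
    chain, node, seen = [start], start, {start}
    while node in graph:
        relation, target = graph[node][0]
        if target in seen:
            break  # avoid cycles
        chain.append(f"-{relation}-> {target}")
        node = target
        seen.add(target)
    return " ".join(chain)


path = causal_chain(graph, "Work Stress")
```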

&lt;p&gt;For personal conversations, especially around mental health, this difference changes everything.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5qf9uewk8wpocsittg5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5qf9uewk8wpocsittg5.png" alt="Synapse Memory Explorer" width="800" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Your Memory Is Yours
&lt;/h3&gt;

&lt;p&gt;Beyond the graph structure, Synapse gives you full control over your AI's memory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Explore your graph.&lt;/strong&gt; An interactive force-directed visualization where you can click any entity and see its connections, descriptions, and relationships.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correct in plain English.&lt;/strong&gt; Type "I actually left that job in March, not April" and the graph updates. Graphiti handles temporal invalidation. Old facts are marked as outdated, not deleted. The AI knows what is history and what is current.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export to Notion.&lt;/strong&gt; Your full knowledge graph synced to Notion databases. The AI designs the schema based on your actual data (if you talk about health, it creates a Medications database; if you talk about work, it creates a Projects database). Review it in a tool you already know. Flag errors with a checkbox. Push corrections back.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fully open source.&lt;/strong&gt; Both &lt;a href="https://github.com/juandastic/synapse-chat-ai" rel="noopener noreferrer"&gt;the frontend&lt;/a&gt; and &lt;a href="https://github.com/juandastic/synapse-cortex" rel="noopener noreferrer"&gt;the backend&lt;/a&gt; are on GitHub. You can audit exactly how your data is processed. For something that touches mental health, this is not optional.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbsm99ixmlbtxlcqyn8xx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbsm99ixmlbtxlcqyn8xx.png" alt=" " width="800" height="270"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  297 Messages in 15 Days
&lt;/h2&gt;

&lt;p&gt;Enough about architecture. What happens when a real person uses this thing every day?&lt;/p&gt;

&lt;p&gt;On March 19, 2026, I started tracking product analytics with PostHog. Here is what the first 15 days looked like for my wife, User Zero.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The numbers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;297 messages&lt;/strong&gt; sent across 15 days&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;15.2 million tokens&lt;/strong&gt; processed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100% daily active usage.&lt;/strong&gt; She used it every single day. No exceptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Peak: 69 messages in one day&lt;/strong&gt; (March 22). That is roughly 5 million tokens in a single day.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Average: ~20 messages per day&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Average ~51K tokens per message.&lt;/strong&gt; These are not quick Q&amp;amp;A exchanges. These are deep, contextual conversations where the AI brings in compiled knowledge from weeks of previous sessions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fseui9itpa9mm3j0hwuyt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fseui9itpa9mm3j0hwuyt.png" alt="Messages sent" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1oicgr04dbhzd04wqalm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1oicgr04dbhzd04wqalm.png" alt="Tokens per day" width="800" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is not a novelty bounce. She didn't try it and forget. She integrated Synapse into her daily routine. For context, she was already a heavy Gemini user before Synapse existed. The difference is she stopped using Gemini for personal conversations. Synapse became &lt;strong&gt;the de facto daily companion.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Therapy Bridge
&lt;/h3&gt;

&lt;p&gt;I want to be clear about something. Synapse is not replacing her therapist or psychiatrist. It is &lt;strong&gt;bridging the gaps between sessions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Between weekly therapy appointments, life happens. Emotions surface. Patterns repeat. With Synapse, she can process them in real-time with an AI that knows her full context. The Compass persona tracks nervous system states. It remembers which coping strategies worked. It connects this week's anxiety to the same trigger from last month.&lt;/p&gt;

&lt;p&gt;When she goes to her next therapy session, the preliminary exploration is already done. She arrives with clearer language for what she is feeling and why. The session is more productive because she is not spending the first 20 minutes catching her therapist up on context.&lt;/p&gt;

&lt;p&gt;A human therapist sees you for 1 hour per week. Synapse fills the other 167 hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synapse is not a replacement for therapy. It is the journal that talks back, with memory.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9hfzrfc4trh0dpmljrmj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9hfzrfc4trh0dpmljrmj.png" alt=" " width="552" height="526"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned Building for an Audience of One
&lt;/h2&gt;

&lt;p&gt;My wife is the most demanding and picky user I know. She never holds back from telling me when something I built is useless. But when something is actually useful, she uses it every day. My hack for building things that matter is simple: build for her, and I know I will build something real.&lt;/p&gt;

&lt;p&gt;A few reflections from this journey.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory is not a feature. It IS the product.&lt;/strong&gt; For mental health and personal conversations, the smartest model in the world is useless if it doesn't know what made you cry last Tuesday. Gemini is brilliant. It can explain quantum physics. But ask it about YOUR life, and it starts from zero every time. The Master Prompt was her workaround. Synapse is the fix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transparency is not optional for personal AI.&lt;/strong&gt; She can see her graph, correct it, export it to Notion. She knows what the AI "thinks" about her. For something this personal, a black box is not acceptable. Open source is not a nice-to-have. It is a requirement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building in public works.&lt;/strong&gt; The community saw the 120K-token problem before I did. My first article got 26 comments. Someone signed up for Synapse and warned me about costs within hours of publishing Article #3. I shipped a plan system and a demo account the same day. That feedback loop is priceless.&lt;/p&gt;

&lt;p&gt;One comment from Victor Okefie on my third article stuck with me. About watching my wife ignore the graph visualizer I built for her:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"That's not feature development. That's listening."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the whole philosophy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It, Break It, Build With Me
&lt;/h2&gt;

&lt;p&gt;Synapse is live and open source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's available today (Free tier):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 messages per day, all three personas&lt;/li&gt;
&lt;li&gt;Knowledge graph visualization and memory corrections&lt;/li&gt;
&lt;li&gt;Full English and Spanish support&lt;/li&gt;
&lt;li&gt;Fully open source, both frontend and backend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's coming:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pro tier:&lt;/strong&gt; 50 messages per day, intelligent retrieval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Therapeutic tier:&lt;/strong&gt; Share graph insights with your therapist, crisis detection and alerts, session reminders, therapy homework integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔗 &lt;strong&gt;Try it:&lt;/strong&gt; &lt;a href="https://synapse-chat.juandago.dev/" rel="noopener noreferrer"&gt;synapse-chat.juandago.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 &lt;strong&gt;Frontend code:&lt;/strong&gt; &lt;a href="https://github.com/juandastic/synapse-chat-ai" rel="noopener noreferrer"&gt;github.com/juandastic/synapse-chat-ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🧠 &lt;strong&gt;Backend code:&lt;/strong&gt; &lt;a href="https://github.com/juandastic/synapse-cortex" rel="noopener noreferrer"&gt;github.com/juandastic/synapse-cortex&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The technical deep-dive series:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/juandastic/beyond-rag-building-an-ai-companion-with-deep-memory-using-knowledge-graphs-2e6e"&gt;Beyond RAG: Building an AI Companion with Deep Memory Using Knowledge Graphs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/juandastic/scaling-ai-memory-how-i-tamed-a-120k-token-prompt-with-deterministic-graphrag-4f85"&gt;Scaling AI Memory: How I Tamed a 120K-Token Prompt with Deterministic GraphRAG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/juandastic/full-circle-giving-my-ais-knowledge-graph-a-notion-interface-using-mcp-2dmp"&gt;Full Circle: Giving My AI's Knowledge Graph a Notion Interface Using MCP&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Building software is fun. But seeing it come alive and solve actual problems for someone you care about is something else entirely.&lt;/p&gt;

&lt;p&gt;Let's connect on &lt;a href="https://x.com/juandastic" rel="noopener noreferrer"&gt;X&lt;/a&gt; or &lt;a href="https://www.linkedin.com/in/juandastic/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>I Benchmarked Graphiti vs Mem0: The Hidden Cost of Context Blindness in AI Memory</title>
      <dc:creator>Juan David Gómez</dc:creator>
      <pubDate>Sun, 22 Mar 2026 07:08:06 +0000</pubDate>
      <link>https://dev.to/juandastic/i-benchmarked-graphiti-vs-mem0-the-hidden-cost-of-context-blindness-in-ai-memory-4le3</link>
      <guid>https://dev.to/juandastic/i-benchmarked-graphiti-vs-mem0-the-hidden-cost-of-context-blindness-in-ai-memory-4le3</guid>
      <description>&lt;p&gt;A few days ago, &lt;a href="https://x.com/taranjeetio" rel="noopener noreferrer"&gt;Taranjeet&lt;/a&gt;, the CEO of Mem0, reacted to one of my articles about building AI memory with knowledge graphs. That caught my attention.&lt;/p&gt;

&lt;p&gt;Mem0 is one of the most popular memory frameworks in the AI space. Thousands of developers use it. And here I was, running a heavier, more expensive architecture with Graphiti and Neo4j for my personal project.&lt;/p&gt;

&lt;p&gt;Was I over-engineering this?&lt;/p&gt;

&lt;p&gt;I had to find out. So I built a benchmark.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Context: Why I Care About AI Memory
&lt;/h2&gt;

&lt;p&gt;I've been building &lt;a href="https://github.com/juandastic/synapse-cortex" rel="noopener noreferrer"&gt;Synapse&lt;/a&gt;, an AI companion for my wife. Not a chatbot. A companion that remembers her life, her relationships, her emotional states, and how all of that connects over time.&lt;/p&gt;

&lt;p&gt;It started with a 35,000-token "Master Prompt" that she maintained manually in Notion. Every time something changed in her life, she updated it by hand. That obviously didn't scale. So I moved to &lt;a href="https://github.com/getzep/graphiti" rel="noopener noreferrer"&gt;Graphiti&lt;/a&gt;, a knowledge graph framework that extracts entities and relationships from conversations automatically.&lt;/p&gt;

&lt;p&gt;I wrote about this journey in two previous articles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/juandastic/beyond-rag-building-an-ai-companion-with-deep-memory-using-knowledge-graphs-2e6e"&gt;Beyond RAG: Building an AI Companion with Deep Memory Using Knowledge Graphs&lt;/a&gt; (how knowledge graphs replaced the manual prompt)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/juandastic/scaling-ai-memory-how-i-tamed-a-120k-token-prompt-with-deterministic-graphrag-4f85"&gt;Scaling AI Memory: How I Tamed a 120K-Token Prompt with Deterministic GraphRAG&lt;/a&gt; (how I kept the prompt under control as the graph grew)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system works well. But when I started looking at Mem0, I realized they solve some of the same problems (fact extraction, deduplication, contradiction handling) with a different architecture. They use a vector store as the primary brain and offer an optional graph layer on top. Fewer LLM calls per ingestion, and a fundamentally different take on how to combine vectors and graphs.&lt;/p&gt;

&lt;p&gt;I wanted to understand both approaches. What does storing everything in one graph give you? What does splitting vectors and graphs into independent stores give you? What do you lose in each case?&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Fundamentally Different Philosophies
&lt;/h2&gt;

&lt;p&gt;Before the benchmark, let me explain what each system actually does under the hood. They both ingest conversations and store memories. But the architecture is completely different.&lt;/p&gt;

&lt;h3&gt;
  
  
  Graphiti: The Unified Graph
&lt;/h3&gt;

&lt;p&gt;Graphiti puts everything in one place: a Neo4j graph database. Entities become nodes. Facts become edges. Embeddings live as properties on those nodes and edges.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsml56d8mpzoxjxz7fw7k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsml56d8mpzoxjxz7fw7k.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key detail: each edge carries a full natural-language fact, plus temporal fields. When a fact becomes outdated, Graphiti doesn't delete it. It marks it with an &lt;code&gt;invalid_at&lt;/code&gt; timestamp and creates the new fact alongside it.&lt;/p&gt;
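A minimal sketch of that invalidation pattern (the `valid_at`/`invalid_at` field names follow Graphiti's convention, but `supersede()` and the data are illustrative, not Graphiti's API):

```python
from datetime import datetime, timezone

# When a fact changes, keep the old edge and stamp it with invalid_at,
# then create the new fact alongside it. Nothing is deleted.
def supersede(edges, old_fact, new_fact, now=None):
    now = now or datetime.now(timezone.utc)
    for edge in edges:
        if edge["fact"] == old_fact and edge["invalid_at"] is None:
            edge["invalid_at"] = now  # mark outdated, but preserve as history
    edges.append({"fact": new_fact, "valid_at": now, "invalid_at": None})

edges = [{"fact": "Demy trains at Roots MMA",
          "valid_at": datetime(2025, 6, 1, tzinfo=timezone.utc),
          "invalid_at": None}]
supersede(edges, "Demy trains at Roots MMA", "Demy trains at Iron Flow")

current = [e["fact"] for e in edges if e["invalid_at"] is None]
# current == ['Demy trains at Iron Flow']; the Roots MMA edge survives as history
```

The payoff comes at retrieval time: an LLM shown both edges can distinguish "trains at Iron Flow" (current) from "trained at Roots MMA" (outdated) instead of seeing two contradictory facts.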

&lt;h3&gt;
  
  
  Mem0: The Split Architecture
&lt;/h3&gt;

&lt;p&gt;Mem0 takes a different approach. The primary brain is a vector store (Qdrant, Pinecone, etc.) holding atomic fact strings. It has an optional graph (Neo4j), but it runs as a completely independent parallel system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3ohkti1mhkk0eh5qyrh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3ohkti1mhkk0eh5qyrh.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The vector store holds rich text. The graph holds thin triples: &lt;code&gt;entity -&amp;gt; relationship_type -&amp;gt; entity&lt;/code&gt;. No natural-language facts on edges. No temporal fields. And critically: &lt;strong&gt;the two stores share no IDs and run independently&lt;/strong&gt;. They can drift out of sync.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Actually Stored on an Edge
&lt;/h3&gt;

&lt;p&gt;This is the single most important difference. Let me show it concretely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graphiti edge&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Demy"&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Maplewood"&lt;/span&gt;
&lt;span class="na"&gt;relation_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WORKS_AT"&lt;/span&gt;
&lt;span class="na"&gt;fact&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Demy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;started&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;working&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;startup&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Maplewood&lt;/span&gt;
       &lt;span class="s"&gt;doing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;full-stack&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;work,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;just&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;backend"&lt;/span&gt;
&lt;span class="na"&gt;valid_at&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2026-02-15&lt;/span&gt;
&lt;span class="na"&gt;invalid_at&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
&lt;span class="na"&gt;embedding&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;0.012&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;-0.034&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;...&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Mem0 graph edge&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;source:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"demy"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;target:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"maplewood"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;relationship:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"WORKS_AT"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;valid:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;mentions:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Graphiti stores the &lt;strong&gt;full story&lt;/strong&gt; on every edge. Mem0 stores the &lt;strong&gt;label&lt;/strong&gt; on the graph edge and puts the text in the vector store as a separate entry. For retrieval this means: Graphiti can give you structure AND semantics in one query. Mem0 needs two separate lookups and hopes they align.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Aha!" Moment: Context Blindness
&lt;/h2&gt;

&lt;p&gt;Before I show you the benchmark results, I need to explain the insight that made this comparison matter to me. Because the results only make sense once you understand what "context blindness" means in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem with Pure RAG
&lt;/h3&gt;

&lt;p&gt;Most AI memory systems work like this: user asks something, you do a similarity search, you inject the top-K results into the prompt. Simple and effective.&lt;/p&gt;

&lt;p&gt;But there's a hidden cost. The LLM only sees what the similarity search returns. If the user asks about work, and the search returns work facts, the model has no idea about the emotional context from childhood that might be relevant. It's blind to everything outside the search window.&lt;/p&gt;

&lt;p&gt;I call this &lt;strong&gt;context blindness&lt;/strong&gt;: the LLM's intelligence is limited by the narrow slice of memory that semantic similarity surfaces for each turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Matters for a Companion
&lt;/h3&gt;

&lt;p&gt;Modern models are incredible at reasoning over large contexts. Give them 50k tokens of well-organized information about a person's life, and they make connections you didn't explicitly ask for. They notice patterns. They bring up relevant history naturally.&lt;/p&gt;

&lt;p&gt;But you can't give them everything. That's expensive and noisy. So the question becomes: &lt;strong&gt;how do you decide what the model should always know vs what it should retrieve on demand?&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Synapse Approach: Base Context + RAG for the Long Tail
&lt;/h3&gt;

&lt;p&gt;This is the architecture I built for Synapse, which I call &lt;a href="https://dev.to/juandastic/scaling-ai-memory-how-i-tamed-a-120k-token-prompt-with-deterministic-graphrag-4f85"&gt;Hydration V2&lt;/a&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Base Context&lt;/strong&gt;: A budget-aware prompt (~30k tokens) that always includes the most important entities. I use the graph structure, specifically node degree (how many connections an entity has), to find the "hubs" of her life. Elena (mom), Noa (partner), Marco (tech lead). These always go in.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RAG for Long Tail&lt;/strong&gt;: Similarity search only kicks in for specific details that don't fit in the base context. And here's the trick: I track exactly which facts are already in the base prompt.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The metadata contract. Cortex sends this on every request
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compilationMetadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;is_partial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;included_node_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uuid-elena&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uuid-noa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uuid-marco&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;included_edge_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uuid-works-at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uuid-diagnosed-with&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When RAG retrieves results, I cross-reference against this list and &lt;strong&gt;drop any facts already in context&lt;/strong&gt;. No duplication. No wasted tokens.&lt;/p&gt;
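A minimal sketch of that cross-referencing step, assuming the metadata contract shown above (the function and field names are illustrative, not Synapse's actual API):

```python
# Drop RAG hits whose edge UUID is already in the compiled base context.
# "included_edge_ids" mirrors the metadata contract; everything else is a stand-in.
def filter_new_facts(rag_hits, compilation_metadata):
    in_context = set(compilation_metadata["included_edge_ids"])
    return [hit for hit in rag_hits if hit["edge_id"] not in in_context]

metadata = {"is_partial": True,
            "included_edge_ids": ["uuid-works-at", "uuid-diagnosed-with"]}
hits = [
    {"edge_id": "uuid-works-at", "fact": "Demy works at Maplewood"},  # already in context
    {"edge_id": "uuid-belt-purple", "fact": "Demy was promoted to purple belt"},
]
remaining = filter_new_facts(hits, metadata)
# remaining contains only the purple-belt fact; the Maplewood fact is dropped
```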

&lt;h3&gt;
  
  
  Why This Only Works with Co-located Semantics
&lt;/h3&gt;

&lt;p&gt;Here's the thing: this metadata contract requires that nodes and edges live in the same store with shared IDs. I go from "Elena has high degree" to "here are Elena's facts" in one database query.&lt;/p&gt;
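A toy illustration of what co-location buys (all data and helpers here are hypothetical, not Synapse or Mem0 code):

```python
# Co-located: one node ID carries structure (degree) AND semantics (facts).
graph = {
    "uuid-elena": {
        "name": "Elena",
        "degree": 14,  # hub entity: many connections
        "facts": ["Elena is Demy's mother",
                  "Elena enrolled Demy in swimming lessons"],
    }
}

def hydrate(node_id):
    """One lookup returns importance and the facts to put in the prompt."""
    node = graph[node_id]
    return {"name": node["name"], "degree": node["degree"], "facts": node["facts"]}

# Split: the graph knows Elena matters, but her facts live in a separate
# store under unrelated IDs, reachable only by a second similarity search.
split_graph = {"elena": {"degree": 14}}
vector_store = {"mem-481": "Elena is Demy's mother"}  # no shared ID with "elena"

elena = hydrate("uuid-elena")
# elena carries degree 14 and both facts from a single lookup
```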

&lt;p&gt;With Mem0's split architecture, this is impossible. The graph knows Elena is important (she has many connections). But Elena's actual facts live in the vector store under different IDs. There's no direct link between the graph entity "elena" and the vector memories about Elena. You'd need to search the vector store by text similarity to find Elena-related facts. Which is exactly the context blindness problem you're trying to avoid.&lt;/p&gt;

&lt;p&gt;Could you build a mapping table between vector IDs and graph entities? Sure. But at that point you're building a co-location layer on top of a split architecture. You're rebuilding what Graphiti gives you for free.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Benchmark: What I Actually Tested
&lt;/h2&gt;

&lt;p&gt;I built a 3-phase benchmark using a fictional user profile (Demy) with complex life situations: an ASD diagnosis, workplace dynamics, BJJ training, family trauma, and relationship changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important caveat&lt;/strong&gt;: Synapse doesn't use advanced graph features like BFS traversal or multi-hop queries. It does hybrid search: BM25 + cosine similarity + RRF reranking. So this benchmark doesn't test "graph retrieval" in the academic sense. It tests something more practical: &lt;strong&gt;what you gain or lose in retrieval quality when semantic context and graph entities live together vs apart&lt;/strong&gt;.&lt;/p&gt;
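For reference, the RRF reranking step mentioned above fits in a few lines (k=60 is the constant commonly used in the RRF literature; the ranked lists are toy data, not benchmark output):

```python
# Reciprocal Rank Fusion: each result list votes 1/(k + rank) for its hits,
# so documents that rank well across multiple lists rise to the top.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["fact-marco-feedback", "fact-bjj-belt", "fact-elena-mom"]
cosine_hits = ["fact-marco-feedback", "fact-elena-mom", "fact-asd-diagnosis"]
fused = rrf([bm25_hits, cosine_hits])
# fused[0] == "fact-marco-feedback": top-ranked in both lists, so it wins the fusion
```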

&lt;h3&gt;
  
  
  The Setup
&lt;/h3&gt;

&lt;p&gt;Both systems got the exact same data, same LLM (gpt-4.1-mini), same embedding model (text-embedding-3-small). Graphiti searched with the same &lt;code&gt;SearchConfig&lt;/code&gt; that Cortex uses in production (edge + node hybrid with RRF). Mem0 searched with both vector memories AND graph relations in parallel.&lt;/p&gt;

&lt;p&gt;Every phase ended with a &lt;code&gt;gemini-3-flash-preview&lt;/code&gt; assessment that scored both systems on relevant dimensions (1-5 scale).&lt;/p&gt;

&lt;p&gt;The full benchmark is &lt;a href="https://github.com/juandastic/graphiti-vs-mem0-benchmark" rel="noopener noreferrer"&gt;open source&lt;/a&gt;. You can run it yourself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Knowledge Extraction
&lt;/h3&gt;

&lt;p&gt;Four conversations ingested: an ASD Level 1 diagnosis, workplace feedback from tech lead Marco, a BJJ blue belt promotion, and childhood memories with mother Elena.&lt;/p&gt;

&lt;p&gt;Then I ran 5 knowledge probes: factual, relational, event-based, emotional, and workplace queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2: Contradiction Handling
&lt;/h3&gt;

&lt;p&gt;Six facts changed: new job (Maplewood startup), belt upgrade (blue to purple), gym switch (Roots MMA to Iron Flow), breakup (Noa), role change (backend to full-stack), and new pet (Pixel the cat).&lt;/p&gt;

&lt;p&gt;Both systems were probed before and after the updates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: Story Retention
&lt;/h3&gt;

&lt;p&gt;A rich 14-message narrative about a traumatic childhood event called "the forest event." A camping trip with family, sensory overload at a campfire, going nonverbal, the mother's reaction, a fight between parents that led to their divorce, and 20 years of guilt. Sensory triggers. EMDR therapy plans.&lt;/p&gt;

&lt;p&gt;This was the hardest test. Can atomic fact extraction preserve a story's connective tissue?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cost: Mem0 Wins
&lt;/h3&gt;

&lt;p&gt;No surprise here. Graphiti's richer pipeline costs more.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Graphiti&lt;/th&gt;
&lt;th&gt;Mem0&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Phase 1 (4 sessions)&lt;/td&gt;
&lt;td&gt;34,632 tokens&lt;/td&gt;
&lt;td&gt;25,394 tokens&lt;/td&gt;
&lt;td&gt;1.36x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 2 (2 sessions)&lt;/td&gt;
&lt;td&gt;25,601&lt;/td&gt;
&lt;td&gt;14,532&lt;/td&gt;
&lt;td&gt;1.76x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 3 (1 session, 14 msgs)&lt;/td&gt;
&lt;td&gt;26,900&lt;/td&gt;
&lt;td&gt;11,936&lt;/td&gt;
&lt;td&gt;2.25x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;87,133&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;51,862&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.68x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The ratio increases with narrative complexity. Phase 3's single story session cost 2.25x more with Graphiti, driven by its entity deduplication pipeline checking each new edge against the entire existing graph.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Knowledge Coverage
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Graphiti&lt;/th&gt;
&lt;th&gt;Mem0&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fact completeness&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Entity relations&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Specificity&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrievability&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.75&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.25&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Graphiti won 4 of 5 probes. Its entity summaries added context that Mem0 lacked. Marco's entity node included the specific date of the 1-on-1 and the feedback details, making retrieval sharper.&lt;/p&gt;

&lt;p&gt;But two problems showed up in Mem0's results that I didn't expect:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 1: Top-K crowding.&lt;/strong&gt; When I asked "What feedback did Marco give Demy?", Mem0's vector search returned childhood memories about Elena alongside the Marco results. The emotional weight of those embeddings dominated the similarity rankings and pushed relevant results down. The graph relations were even worse, returning &lt;code&gt;elena → enrolled → demy&lt;/code&gt; and &lt;code&gt;elena → is_mom_of → demy&lt;/code&gt; for a workplace query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 2: Graph retrieval noise.&lt;/strong&gt; Mem0's graph search returns structural neighbors without semantic awareness. It doesn't know that Elena triples are irrelevant to a Marco query. It just returns whatever is connected. This happened in 3 of 5 probes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2: Contradictions (The Split-Brain Problem)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Graphiti&lt;/th&gt;
&lt;th&gt;Mem0&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Temporal handling&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Current fact retrieval&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Additive facts&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Historical awareness&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.75&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Graphiti's temporal invalidation worked as expected. When I searched for "Who is Demy's partner?" after the breakup, the old Noa edge appeared clearly marked:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User is processing their neurodivergent experience
with the support of Noa. [OUTDATED]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An LLM reading this knows Noa is history, not present. It can say "I remember Noa" without confusing past and present.&lt;/p&gt;

&lt;p&gt;Mem0 had a different problem. After the purple belt update, both facts appeared as equally current:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt; Got a purple belt last month in martial arts
&lt;span class="p"&gt;-&lt;/span&gt; Got promoted to blue belt at Roots MMA
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No way to tell which is current. Both just exist side by side.&lt;/p&gt;

&lt;p&gt;But the most interesting finding was the &lt;strong&gt;split-brain&lt;/strong&gt;. When Demy switched gyms from Roots MMA to Iron Flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mem0's &lt;strong&gt;graph&lt;/strong&gt; correctly updated: &lt;code&gt;demy → trains_at → iron_flow_gym&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Mem0's &lt;strong&gt;vector store&lt;/strong&gt; still prominently featured: "Feels physically exhausted but mentally regulated after training" (from the Roots MMA era)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The two independent stores drifted out of sync. The graph knew one thing, the vectors said another. This is an architectural consequence, not a bug. The two stores process the same messages independently with no cross-referencing.&lt;/p&gt;

&lt;p&gt;Both systems handled purely additive facts well. Pixel the cat was correctly stored by both. Graphiti even caught a secondary effect: the improved relationship with Rodrigo who helped pick out the cat.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: Story Retention (The Surprise)
&lt;/h3&gt;

&lt;p&gt;This is where it got interesting. I expected Graphiti to dominate again. It didn't.&lt;/p&gt;

&lt;p&gt;Graphiti extracted 16 story-related edges. Clean entity connections:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt; [Elena -&amp;gt; Tomas] Elena and Tomas had a major fight
  during the camping trip, leading to...
&lt;span class="p"&gt;-&lt;/span&gt; [Tomas -&amp;gt; Elena] Tomas was married to Elena until
  their separation about a year after the forest event [OUTDATED]
&lt;span class="p"&gt;-&lt;/span&gt; [User -&amp;gt; Dr. Vega] User is being treated by Dr. Vega
  who helps them understand the forest event trauma
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mem0 extracted 12 story-related memories. Different kind of detail:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Has sensory triggers related to the event: smell of
  wood smoke, sound of running water, someone screaming a name
- Carried guilt for nearly 20 years believing the event
  caused parents' separation
- Experienced sensory overload on the second night due to
  noise, smoke, and flickering light
- Experienced a recent trigger in a park when someone
  yelled a name loudly, causing them to freeze
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern was clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Graphiti captured the causal structure&lt;/strong&gt;: who did what to whom, what led to what, entity connections. The skeleton of the story.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mem0 captured the lived experience&lt;/strong&gt;: sensory triggers, emotional weight, the 20-year guilt, the specific park incident. The flesh of the story.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I asked "What are Demy's sensory triggers?", Graphiti returned generic references to the forest event. Mem0 returned the exact three triggers: wood smoke, running water, someone screaming a name.&lt;/p&gt;

&lt;p&gt;When I asked "Why did Demy's parents separate?", Graphiti returned the direct causal chain: fight during camping → separation a year later. Mem0 returned the emotional aftermath but with weaker causation.&lt;/p&gt;

&lt;p&gt;For a companion that needs to both understand the story structure AND respond with emotional awareness, neither system alone was complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mem0 is genuinely good at vector retrieval
&lt;/h3&gt;

&lt;p&gt;Looking at the data fairly, Mem0's atomic fact extraction produces high-quality, well-crafted memories. "Feels anger about not knowing earlier, which might have prevented burnout." "At a cousin's birthday party, hide in the bathroom for 45 minutes due to the loud noise." These are clean, specific, and individually useful.&lt;/p&gt;

&lt;p&gt;For a standard RAG pipeline (similarity search against a query, inject top results), Mem0's memories are arguably better optimized than Graphiti's edge facts, which are structured around entity pairs rather than standalone readability.&lt;/p&gt;

&lt;h3&gt;
  
  
  But vector retrieval alone creates blind spots
&lt;/h3&gt;

&lt;p&gt;The top-k crowding problem is real. When all your memories are independent vectors with no structural awareness, emotionally heavy content dominates similarity rankings. Childhood trauma bleeds into workplace queries. The system has no way to say "these facts are about Elena, those are about Marco" without relying entirely on embedding distance.&lt;/p&gt;

&lt;p&gt;This is what I mean by context blindness. The LLM only sees what similarity search surfaces. And similarity search doesn't understand life categories. It understands embedding proximity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Co-located semantics are the key differentiator
&lt;/h3&gt;

&lt;p&gt;The practical advantage of Graphiti isn't graph traversal (I don't use it). It's that entities and their facts live together. This enables:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Knowing what matters&lt;/strong&gt;: node degree tells you Elena is a hub entity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Getting the full picture&lt;/strong&gt;: one query returns both the entity summary and all its facts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tracking what's in context&lt;/strong&gt;: the metadata contract prevents duplicate retrieval&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With Mem0, you can know Elena is structurally important (the graph tells you). But getting Elena's rich facts requires a separate vector search, and that search might return non-Elena results based on embedding similarity. The two stores don't talk to each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  The real architecture is base context + selective RAG
&lt;/h3&gt;

&lt;p&gt;After running this benchmark, I'm more convinced than ever: the future of AI memory isn't "retrieve everything via similarity search." It's:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pre-load the important stuff&lt;/strong&gt;: use the graph structure to identify key entities, put their facts in the base context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use RAG for the long tail&lt;/strong&gt;: specific memories, niche details, historical events that don't fit in the budget&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Track what's already in context&lt;/strong&gt;: so RAG doesn't waste tokens re-retrieving facts the model already knows&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This way, the model always has the structural backbone of the user's life. RAG extends it when needed. Latency stays low for the common case. And you avoid the top-k crowding problem because the important entities aren't competing in similarity search. They're already in context.&lt;/p&gt;
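A minimal sketch of step 1, the budget-aware pre-load (degree ranking as described above; the word-count token estimate and all data are crude stand-ins):

```python
# Rank entities by graph degree and pack their facts into the base context
# until the token budget runs out.
def build_base_context(entities, budget_tokens):
    context, used = [], 0
    for entity in sorted(entities, key=lambda e: e["degree"], reverse=True):
        cost = sum(len(fact.split()) for fact in entity["facts"])  # crude estimate
        if used + cost > budget_tokens:
            break  # everything past here is left to RAG (the long tail)
        context.extend(entity["facts"])
        used += cost
    return context

entities = [
    {"name": "Elena", "degree": 14, "facts": ["Elena is Demy's mother."]},
    {"name": "Marco", "degree": 9, "facts": ["Marco is Demy's tech lead."]},
    {"name": "Pixel", "degree": 2, "facts": ["Pixel is Demy's cat, adopted recently."]},
]
base = build_base_context(entities, budget_tokens=10)
# base holds Elena's and Marco's facts; Pixel falls to the long tail
```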

&lt;p&gt;I won't go into details today, but coding agents seem to do something similar: AGENTS.md is always in context alongside the tool definitions, while skills search and code discovery handle the long tail on demand.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Mem0 is the right choice for most AI agents.&lt;/strong&gt; If you need a reliable, mutable memory system with great fact extraction and you're doing standard similarity search, Mem0 is simpler, cheaper (40% fewer tokens), and well-maintained. For 90% of agents, the split architecture doesn't matter because you're not building base context from graph structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graphiti is worth the cost for deeply interconnected companions.&lt;/strong&gt; If you need to build a structural understanding of someone's life, know which entities are central, pre-load their context, track what's already known, and handle temporal evolution, Graphiti's unified architecture pays for itself. The extra tokens buy you co-located semantics that enable strategies Mem0's split stores can't support.&lt;/p&gt;

&lt;p&gt;The hidden cost of context blindness isn't in the retrieval scores. It's in the connections the model never makes because the right context wasn't there.&lt;/p&gt;




&lt;p&gt;The full benchmark (scripts, seed data, results, and the technical report) is &lt;a href="https://github.com/juandastic/graphiti-vs-mem0-benchmark" rel="noopener noreferrer"&gt;open source&lt;/a&gt;. You can run it yourself, swap models, and see if your results match mine.&lt;/p&gt;

&lt;p&gt;If you're building memory systems for AI agents, I'd love to hear how you approach this. What's working for you? What breaks at scale?&lt;/p&gt;

&lt;p&gt;Find me on &lt;a href="https://x.com/juandastic" rel="noopener noreferrer"&gt;X&lt;/a&gt; or &lt;a href="https://www.linkedin.com/in/juandastic/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Full Circle: Giving My AI's Knowledge Graph a Notion Interface using MCP</title>
      <dc:creator>Juan David Gómez</dc:creator>
      <pubDate>Tue, 17 Mar 2026 06:06:10 +0000</pubDate>
      <link>https://dev.to/juandastic/full-circle-giving-my-ais-knowledge-graph-a-notion-interface-using-mcp-2dmp</link>
      <guid>https://dev.to/juandastic/full-circle-giving-my-ais-knowledge-graph-a-notion-interface-using-mcp-2dmp</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/notion-2026-03-04"&gt;Notion MCP Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When I started building AI tools for my wife, it was because she had outgrown Notion. &lt;/p&gt;

&lt;p&gt;She uses LLMs as a life coach, therapist, and sounding board. To give the AI context, she maintained a massive 35,000-token "Master Prompt" in a Notion page detailing her life, medical history, and goals. She had to manually copy-paste this wall of text into every new chat. &lt;/p&gt;

&lt;p&gt;To automate this, I built &lt;a href="https://synapse-chat.juandago.dev/" rel="noopener noreferrer"&gt;&lt;strong&gt;Synapse&lt;/strong&gt;&lt;/a&gt;, a system that replaces that manual prompt with a Temporal Knowledge Graph (Neo4j + Graphiti). As she chats, the AI quietly extracts entities and relationships in the background, building a continuous memory.&lt;/p&gt;

&lt;p&gt;It worked perfectly. But then I hit a UX wall.&lt;/p&gt;

&lt;p&gt;I built a visualizer of the actual knowledge graph so she could explore her AI's memory. I thought it was beautiful. To me, it was fascinating to watch the graph grow and see new connections form over time. But to her, it was just overwhelming. The sheer amount of nodes and floating edges was too much to process, so she ended up completely ignoring that section of the app.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frajk3td11fe6mkekzasb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frajk3td11fe6mkekzasb.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It turns out that while the &lt;em&gt;concept&lt;/em&gt; of a graph is great for understanding relationships, navigating a massive raw graph view is for machines, not humans. She missed Notion. She missed structured tables, clear properties, and the simple ability to just click and type to fix a mistake.&lt;/p&gt;

&lt;p&gt;So, I brought the project full circle. I used the new Notion MCP to turn Notion back into the ultimate Human-Machine interface for her AI's brain.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built a &lt;strong&gt;bidirectional, human-in-the-loop sync&lt;/strong&gt; between a Neo4j Knowledge Graph and Notion.&lt;/p&gt;

&lt;p&gt;This isn't just a one-way "AI appending text to a page" script. It is a dynamic two-way pipeline:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysapyhjh7zqw8zkg9ny2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysapyhjh7zqw8zkg9ny2.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;The Export (AI Designs the UI):&lt;/strong&gt; Instead of using hardcoded Notion templates, Synapse compiles the user's graph and asks Gemini to &lt;em&gt;design&lt;/em&gt; a custom database schema. If the user talks a lot about their health, the AI creates a "Medications" database with "Active/Suspended" select tags. If they talk about code, it creates a "Projects" database with tech stacks. No two exports look the same.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Import (Human-in-the-Loop):&lt;/strong&gt; AI memory systems hallucinate. To fix this, every AI-generated Notion database gets a &lt;code&gt;Needs Review&lt;/code&gt; checkbox and a &lt;code&gt;Correction Notes&lt;/code&gt; column. If the AI misunderstood something, my wife just checks the box, types the correction in Notion, and hits sync. The system updates the Knowledge Graph (invalidating the old facts) and automatically patches the Notion row.&lt;/li&gt;
&lt;/ol&gt;
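&lt;p&gt;To make the export phase concrete, here is a minimal sketch of turning an LLM-designed schema into a &lt;code&gt;databases.create&lt;/code&gt; payload for the Notion API, with the review columns bolted onto every database. The schema spec shape and function name are hypothetical illustrations, not the actual Synapse code; the payload shape follows the public Notion API.&lt;/p&gt;

```python
def build_database_payload(parent_page_id: str, schema: dict) -> dict:
    """Translate a hypothetical LLM-designed schema spec into a Notion
    databases.create request body."""
    # Every Notion database needs exactly one title property.
    properties = {"Name": {"title": {}}}
    for name, spec in schema["properties"].items():
        if spec["type"] == "select":
            properties[name] = {
                "select": {"options": [{"name": opt} for opt in spec["options"]]}
            }
        elif spec["type"] == "checkbox":
            properties[name] = {"checkbox": {}}
        else:
            properties[name] = {"rich_text": {}}
    # Human-in-the-loop columns added to every exported database.
    properties["Needs Review"] = {"checkbox": {}}
    properties["Correction Notes"] = {"rich_text": {}}
    return {
        "parent": {"type": "page_id", "page_id": parent_page_id},
        "title": [{"type": "text", "text": {"content": schema["title"]}}],
        "properties": properties,
    }
```

&lt;p&gt;The payload would then be passed to the Notion SDK's database-creation call; only the schema design itself comes from the LLM.&lt;/p&gt;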

&lt;h2&gt;
  
  
  Video Demo
&lt;/h2&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/AXeioxrrht0"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  Show us the code
&lt;/h2&gt;

&lt;p&gt;The entire architecture is open source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Backend (Synapse Cortex):&lt;/strong&gt; &lt;a href="https://github.com/juandastic/synapse-cortex" rel="noopener noreferrer"&gt;https://github.com/juandastic/synapse-cortex&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Frontend (Synapse Chat):&lt;/strong&gt; &lt;a href="https://github.com/juandastic/synapse-chat-ai" rel="noopener noreferrer"&gt;https://github.com/juandastic/synapse-chat-ai&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/juandastic" rel="noopener noreferrer"&gt;
        juandastic
      &lt;/a&gt; / &lt;a href="https://github.com/juandastic/synapse-cortex" rel="noopener noreferrer"&gt;
        synapse-cortex
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Synapse Cortex&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Cognitive backend for the Synapse AI Chat application&lt;/strong&gt;. A stateless REST API that processes conversational data into a dynamic knowledge graph, enabling personalized long-term memory and intelligent context retrieval for AI assistants.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;📋 Table of Contents&lt;/h2&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/juandastic/synapse-cortex#overview" rel="noopener noreferrer"&gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/juandastic/synapse-cortex#core-features" rel="noopener noreferrer"&gt;Core Features&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/juandastic/synapse-cortex#technical-architecture" rel="noopener noreferrer"&gt;Technical Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/juandastic/synapse-cortex#backend-components" rel="noopener noreferrer"&gt;Backend Components&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/juandastic/synapse-cortex#api-endpoints" rel="noopener noreferrer"&gt;API Endpoints&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/juandastic/synapse-cortex#data-flow" rel="noopener noreferrer"&gt;Data Flow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/juandastic/synapse-cortex#technology-stack" rel="noopener noreferrer"&gt;Technology Stack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/juandastic/synapse-cortex#observability-in-axiom" rel="noopener noreferrer"&gt;Observability in Axiom&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/juandastic/synapse-cortex#notion-export" rel="noopener noreferrer"&gt;Notion Export&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/juandastic/synapse-cortex#notion-correction-import" rel="noopener noreferrer"&gt;Notion Correction Import&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/juandastic/synapse-cortex#setup--deployment" rel="noopener noreferrer"&gt;Setup &amp;amp; Deployment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/juandastic/synapse-cortex#demo-user-seeding" rel="noopener noreferrer"&gt;Demo User Seeding&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Overview&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;Synapse Cortex is a &lt;strong&gt;knowledge graph-powered backend&lt;/strong&gt; designed to give AI chat applications long-term memory capabilities. Instead of treating each conversation in isolation, Synapse Cortex:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ingests&lt;/strong&gt; conversational data from chat sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extracts&lt;/strong&gt; entities, relationships, and facts using LLMs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stores&lt;/strong&gt; them in a temporal knowledge graph (Neo4j)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieves&lt;/strong&gt; relevant context for future conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualizes&lt;/strong&gt; the knowledge graph for user exploration and debugging&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The system is built on &lt;a href="https://github.com/getzep/graphiti" rel="noopener noreferrer"&gt;Graphiti&lt;/a&gt;, a temporal knowledge graph framework that handles entity resolution, relationship extraction, and temporal invalidation of…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/juandastic/synapse-cortex" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;However, to see the actual backend code that implements the Notion integration, you can check the &lt;a href="https://github.com/juandastic/synapse-cortex/commit/7e61868c8e27b180f0e83ea18e13784d5266a8ab" rel="noopener noreferrer"&gt;Export feature commit&lt;/a&gt; and the &lt;a href="https://github.com/juandastic/synapse-cortex/commit/66cee1fdd48a55136af3b0e503b523e724e1e831" rel="noopener noreferrer"&gt;Correction commit&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/juandastic/synapse-chat-ai/commit/926d68050807d64bd3d5fecba9082d7bf199345e" rel="noopener noreferrer"&gt;UI work&lt;/a&gt; here was minimal, since Notion itself becomes the actual UI. I added a simple interface to set the Notion config (for simplicity, I did not implement a full OAuth flow) and to trigger exports and correction syncs.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Notion MCP
&lt;/h2&gt;

&lt;p&gt;Integrating AI with rigid APIs is usually a nightmare of mapping schemas, formatting JSON, and handling edge cases. MCP fundamentally changes this. I no longer write rigid ETL pipelines; I just give tools to reasoning engines.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. SDK for Structure, MCP for Intelligence
&lt;/h3&gt;

&lt;p&gt;I split my architecture into two phases. &lt;/p&gt;

&lt;p&gt;First, I use the standard Notion SDK to create the empty databases. This is a rigid, structural operation. &lt;/p&gt;

&lt;p&gt;Second, I use the &lt;code&gt;@notionhq/notion-mcp-server&lt;/code&gt; combined with &lt;strong&gt;LangGraph&lt;/strong&gt; (a ReAct agent) to actually populate the data and process corrections. &lt;/p&gt;

&lt;p&gt;When a row is flagged for correction, I don't write complex if/else logic to figure out how to update Notion. I just pass the user's correction and the updated graph data to the LangGraph agent equipped with the Notion MCP tools. &lt;strong&gt;The agent autonomously decides&lt;/strong&gt; whether to use &lt;code&gt;API-patch-page&lt;/code&gt; (to update the specific properties) or &lt;code&gt;API-delete-block&lt;/code&gt; (if the fact is completely invalidated and the row should be archived). &lt;/p&gt;
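&lt;p&gt;The instruction handed to that agent can stay simple, because the decision-making lives in the agent rather than in my code. A hedged sketch of assembling it (the function and field names here are illustrative, not the real Synapse code):&lt;/p&gt;

```python
def build_correction_prompt(page_id: str, correction_note: str,
                            updated_facts: list) -> str:
    """Assemble the instruction given to the LangGraph agent for one flagged row."""
    facts = "\n".join(f"- {fact}" for fact in updated_facts)
    return (
        f"The user flagged Notion page {page_id} for correction.\n"
        f"Correction note: {correction_note}\n"
        f"Updated facts from the knowledge graph:\n{facts}\n"
        "Use API-patch-page to fix the row's properties, or API-delete-block "
        "if the fact is fully invalidated and the row should be archived."
    )
```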

&lt;h3&gt;
  
  
  2. The Engineering Deep Dive: Node.js in a Python World
&lt;/h3&gt;

&lt;p&gt;My backend is written in Python (FastAPI). The official Notion MCP server is written in Node.js. &lt;/p&gt;

&lt;p&gt;Because Synapse is a multi-tenant system (each user has their own independent Notion OAuth token), I couldn't just leave a single global MCP server running. I needed a way to securely isolate connections and ensure low latency between my Python agent and the MCP tools.&lt;/p&gt;

&lt;p&gt;I decided to run the official Node.js MCP server as a &lt;strong&gt;subprocess (&lt;code&gt;stdio&lt;/code&gt;)&lt;/strong&gt; directly inside my FastAPI backend. &lt;/p&gt;

&lt;p&gt;This created some fun lifecycle management challenges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Docker adjustments:&lt;/strong&gt; I had to modify my Python &lt;code&gt;Dockerfile&lt;/code&gt; to install Node.js so the environment could execute &lt;code&gt;npx @notionhq/notion-mcp-server&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Context Management:&lt;/strong&gt; I built an asynchronous context manager (&lt;code&gt;_NotionAgentContext&lt;/code&gt;) in Python. When an export or correction job starts, it spins up the Node subprocess, passes the specific user's &lt;code&gt;NOTION_TOKEN&lt;/code&gt; securely via environment variables, initializes the LangGraph agent, processes the batches of data, and gracefully shuts down the subprocess when the job is done.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;_NotionAgentContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__aenter__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# 1. Start the Node.js MCP subprocess via stdio
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_stdio_cm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_stdio_cm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__aenter__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# 2. Initialize session and load Notion tools
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_cm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_cm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__aenter__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;load_mcp_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 3. Return a LangGraph autonomous agent equipped with Notion MCP
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__aexit__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_tb&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Gracefully shut down the subprocess to prevent zombie Node processes
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_cm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__aexit__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_tb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_stdio_cm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__aexit__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_tb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By running it via &lt;code&gt;stdio&lt;/code&gt; instead of SSE, the communication between the LangGraph reasoning loop and the Notion MCP server is lightning fast, localized, and securely scoped to the current user's job.&lt;/p&gt;

&lt;p&gt;Notion MCP allowed me to stop writing fragile API wrappers and focus on what actually matters: building a system that lets a human seamlessly collaborate with their AI's memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This project has been incredibly rewarding. My wife absolutely loves the result; she finally has her AI's brain in a format she can actually read, organize, and correct without feeling overwhelmed. &lt;/p&gt;

&lt;p&gt;I also have to acknowledge that this Notion MCP Challenge was perfectly timed. I already knew my graph visualizer wasn't working for her, but this contest provided the exact motivation and the right technology (MCP) to bring this bidirectional integration to life. It’s a great feeling when a new tool perfectly aligns with a real-world problem you are trying to solve.&lt;/p&gt;

&lt;p&gt;If you are curious about the rest of the Synapse architecture—like why I chose Knowledge Graphs over standard Vector RAG, or how I handled the backend scaling challenges of processing massive context windows—you can check out my previous articles on my DEV profile.&lt;/p&gt;

&lt;p&gt;Synapse is live at &lt;a href="https://synapse-chat.juandago.dev/" rel="noopener noreferrer"&gt;https://synapse-chat.juandago.dev/&lt;/a&gt; if you want to check it out.&lt;/p&gt;

&lt;p&gt;Building software is fun, but seeing it come alive and solve actual problems for the people you care about is magical. &lt;/p&gt;

&lt;p&gt;I'd love to hear your thoughts on this approach or how you are using MCP in your own projects. Let's continue the conversation on &lt;a href="https://x.com/juandastic" rel="noopener noreferrer"&gt;X&lt;/a&gt; or connect on &lt;a href="https://www.linkedin.com/in/juandastic/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>notionchallenge</category>
      <category>mcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>Scaling AI Memory: How I Tamed a 120k-Token Prompt with Deterministic GraphRAG</title>
      <dc:creator>Juan David Gómez</dc:creator>
      <pubDate>Sun, 01 Mar 2026 08:59:50 +0000</pubDate>
      <link>https://dev.to/juandastic/scaling-ai-memory-how-i-tamed-a-120k-token-prompt-with-deterministic-graphrag-4f85</link>
      <guid>https://dev.to/juandastic/scaling-ai-memory-how-i-tamed-a-120k-token-prompt-with-deterministic-graphrag-4f85</guid>
      <description>&lt;p&gt;In a past article, I wrote about &lt;strong&gt;Synapse&lt;/strong&gt;, an &lt;a href="https://dev.to/juandastic/beyond-rag-building-an-ai-companion-with-deep-memory-using-knowledge-graphs-2e6e"&gt;AI companion I built for my wife&lt;/a&gt;. To solve the problem of an LLM forgetting her past, I bypassed standard vector RAG entirely. Instead, I used a Knowledge Graph (via Graphiti and Neo4j) to map her life, compiled the &lt;em&gt;entire&lt;/em&gt; graph into text, and injected it straight into Gemini's massive context window.&lt;/p&gt;

&lt;p&gt;It worked beautifully. Until it didn't. &lt;/p&gt;

&lt;p&gt;When you build a prototype, you test it with a few messages. When your wife is the power user, she builds an entire world. By day 21 of her using the app daily for deep sessions, the system hit a wall. &lt;/p&gt;

&lt;p&gt;Here is the raw data of her input tokens per message over 18 days:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbpy2jr8nsx1vkk42jpo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbpy2jr8nsx1vkk42jpo.png" alt=" " width="700" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;She was sending &lt;strong&gt;over 120,000 tokens&lt;/strong&gt; of system context on &lt;em&gt;every single chat turn&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;Gemini handled it. Modern context windows are incredible, but the reality of production kicked in. My API costs were climbing, Convex bandwidth was getting chewed up storing and moving massive payloads, and latency was increasing. &lt;/p&gt;

&lt;p&gt;Dumping everything into the prompt is a great MVP, but it does not scale. I needed a new architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Community Was Right: Storage ≠ Retrieval
&lt;/h2&gt;

&lt;p&gt;When I published the first article, the Dev.to community called this exact scaling wall. &lt;/p&gt;

&lt;p&gt;Developers like &lt;a class="mentioned-user" href="https://dev.to/scottcjn"&gt;@scottcjn&lt;/a&gt; and &lt;a class="mentioned-user" href="https://dev.to/itskondrat"&gt;@itskondrat&lt;/a&gt; pointed out that while a Knowledge Graph is the perfect way to &lt;strong&gt;store&lt;/strong&gt; relationships and causality, you shouldn't retrieve the &lt;em&gt;whole&lt;/em&gt; thing every time. &lt;/p&gt;

&lt;p&gt;I didn't want to revert to standard Vector RAG, because standard RAG loses the plot. If she says "I'm stressed," a vector search retrieves a random journal entry about "stress." A graph knows the causality: &lt;code&gt;Project A -&amp;gt; CAUSED -&amp;gt; Stress&lt;/code&gt;. And for first sessions or smaller graphs, dumping the full graph into the context window is still the best option.&lt;/p&gt;

&lt;p&gt;I needed a hybrid approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;A Base Prompt (Working Memory):&lt;/strong&gt; The most critical structural info about her life, capped at a strict budget.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;GraphRAG (Episodic Recall):&lt;/strong&gt; Long-tail memories retrieved on-demand for the current chat turn.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here is how I built it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 1: Hydration V2 (The Budget-Aware Brain)
&lt;/h2&gt;

&lt;p&gt;My first API endpoint (Hydration V1) just ran a &lt;code&gt;SELECT *&lt;/code&gt; from the graph and formatted the results. &lt;/p&gt;

&lt;p&gt;I rewrote it as &lt;strong&gt;Hydration V2&lt;/strong&gt;: a cascading waterfill allocation system. I set a hard limit of roughly 120,000 characters (~30k tokens). The goal is to maximize the &lt;em&gt;usefulness&lt;/em&gt; of the prompt without blowing the budget.&lt;/p&gt;

&lt;p&gt;Here is how the waterfill logic allocates space:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7juzj1bligdpp44mrbaq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7juzj1bligdpp44mrbaq.png" alt=" " width="318" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Node Budget (40%):&lt;/strong&gt; &lt;br&gt;
Nodes are the entities (People, Projects, Concepts). I sort them by their "degree" (number of connections). The most connected nodes are included first. Because nodes are just short summaries, they rarely use the full 40%. The unused characters &lt;strong&gt;roll over&lt;/strong&gt; into the Edge budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The Edge Budget (60% + Rollover):&lt;/strong&gt; &lt;br&gt;
Edges are the relationships (the actual stories and facts). To prioritize them, I classify the top 30% of nodes by connection count as &lt;strong&gt;"Hubs."&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;P1 (Hub-to-Hub):&lt;/strong&gt; The structural backbone of her life. (e.g., User -&amp;gt; WORKS_ON -&amp;gt; Main Career). These are included first.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;P2 (Hub-Adjacent):&lt;/strong&gt; One node is a Hub, sorted by recency.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;P3 (Long-Tail):&lt;/strong&gt; Low-degree nodes. These are the first to get cut when the budget fills up.&lt;/li&gt;
&lt;/ul&gt;
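&lt;p&gt;The allocation logic above can be sketched in a few lines. This is a simplified illustration of the waterfill pass (the real implementation lives in &lt;code&gt;hydration_v2.py&lt;/code&gt; in the repo); the dict shapes are assumptions, and tie-breaking by recency is omitted:&lt;/p&gt;

```python
def waterfill(nodes, edges, budget_chars=120_000):
    """Budget-aware prompt compilation: 40% to nodes, 60% plus rollover to edges."""
    node_budget = int(budget_chars * 0.4)
    nodes = sorted(nodes, key=lambda n: n["degree"], reverse=True)
    included_nodes, used = [], 0
    for node in nodes:
        if used + len(node["summary"]) > node_budget:
            break  # most-connected nodes were admitted first
        included_nodes.append(node)
        used += len(node["summary"])
    # Hubs: the top 30% of nodes by connection count.
    k = max(1, int(len(nodes) * 0.3))
    hubs = {node["id"] for node in nodes[:k]}

    def priority(edge):
        # P1 hub-to-hub sorts first, then P2 hub-adjacent, then P3 long-tail.
        return -sum(end in hubs for end in (edge["source"], edge["target"]))

    edge_budget = budget_chars - used  # unused node budget rolls over
    included_edges = []
    for edge in sorted(edges, key=priority):
        if len(edge["fact"]) > edge_budget:
            break  # long-tail facts are the first to get cut
        included_edges.append(edge)
        edge_budget -= len(edge["fact"])
    is_partial = len(included_nodes) + len(included_edges) != len(nodes) + len(edges)
    return included_nodes, included_edges, is_partial
```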
&lt;h2&gt;
  
  
  The Bridge: The Metadata Contract
&lt;/h2&gt;

&lt;p&gt;Here was the hardest architectural problem: If Hydration V2 puts "Fact A" in the Base Prompt, and my RAG pipeline searches for "Fact A" on the next turn, I will inject duplicate data into the LLM.&lt;/p&gt;

&lt;p&gt;To fix this, Hydration V2 doesn't just return text. It returns a &lt;strong&gt;Metadata Contract&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"compilationMetadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"is_partial"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_estimated_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;29500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"included_node_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"uuid-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uuid-2"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"included_edge_ids"&lt;/span&gt;&lt;span class="p"&gt;:[&lt;/span&gt;&lt;span class="s2"&gt;"uuid-x"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uuid-y"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;is_partial&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;, it means the graph was too big and the waterfill algorithm had to cut things. It also returns the exact UUIDs of the nodes and edges that &lt;em&gt;did&lt;/em&gt; make it into the prompt. &lt;/p&gt;

&lt;p&gt;The React frontend stores this metadata and sends it back to the backend on every single chat request. Now, the backend knows exactly what the LLM already knows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 2: Deterministic GraphRAG (No Agents)
&lt;/h2&gt;

&lt;p&gt;Most RAG systems today use "Agents" or tool-calling loops. The LLM decides if it needs to search, writes a query, waits for the tool, and then answers. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I hate this pattern for chat UI.&lt;/strong&gt; For use cases that need no complex reasoning or multiple tools, it just adds 2 to 5 seconds of latency. I wanted my RAG pipeline to be deterministic and execute in under 1 second.&lt;/p&gt;

&lt;p&gt;Here is my straight-line GraphRAG pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Gate Check&lt;/strong&gt;&lt;br&gt;
Before doing any search, the backend checks &lt;code&gt;compilationMetadata.is_partial&lt;/code&gt;. If it is &lt;code&gt;false&lt;/code&gt;, that means her entire graph fits into the Base Prompt. &lt;strong&gt;The system skips RAG entirely.&lt;/strong&gt; Zero wasted compute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The Query&lt;/strong&gt;&lt;br&gt;
Instead of just taking her last message (which might just be "Why?"), I concatenate the &lt;strong&gt;last 3 non-system messages&lt;/strong&gt; to build a context-rich search query.&lt;/p&gt;
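&lt;p&gt;The query builder itself is tiny. A sketch, assuming messages are dicts with &lt;code&gt;role&lt;/code&gt; and &lt;code&gt;content&lt;/code&gt; keys:&lt;/p&gt;

```python
def build_search_query(messages, n=3):
    """Join the last n non-system messages so that a bare "Why?"
    still carries the context of the preceding exchange."""
    recent = [m["content"] for m in messages if m["role"] != "system"][-n:]
    return "\n".join(recent)
```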

&lt;p&gt;&lt;strong&gt;3. Hybrid Search&lt;/strong&gt;&lt;br&gt;
I use Graphiti to run a single hybrid search: Semantic Search (vector embeddings) + BM25 (exact keyword match), fused together using Reciprocal Rank Fusion (RRF). &lt;/p&gt;
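&lt;p&gt;Graphiti performs that fusion internally, but RRF itself is simple enough to show standalone: each document's score is the sum of &lt;code&gt;1 / (k + rank)&lt;/code&gt; over the rankings it appears in. This sketch is an illustration of the technique, not Graphiti's code:&lt;/p&gt;

```python
def rrf_fuse(semantic, keyword, k=60):
    """Reciprocal Rank Fusion of two ranked result lists (k=60 is the
    conventional damping constant)."""
    scores = {}
    for ranking in (semantic, keyword):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; documents in both lists float to the top.
    return sorted(scores, key=scores.get, reverse=True)
```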

&lt;p&gt;&lt;strong&gt;4. The Secret Sauce: Deduplication&lt;/strong&gt;&lt;br&gt;
Once I have the search results, I cross-reference them with the Metadata Contract from the frontend.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;deduplicate_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieved_edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Edge&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CompilationMetadata&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Drops any edges that are already present in the Base System Prompt.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;retrieved_edges&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;included_edge_ids&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This guarantees &lt;strong&gt;zero redundancy&lt;/strong&gt;. If the RAG pipeline finds a memory, but it's already in the Base Prompt, it gets silently dropped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Ephemeral Injection&lt;/strong&gt;&lt;br&gt;
The surviving edges and nodes are formatted and injected into the System Message right before hitting Gemini, under a clear header: &lt;code&gt;### RELEVANT EPISODIC MEMORY FOR THIS TURN ###&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Crucially, this injected context is &lt;strong&gt;ephemeral&lt;/strong&gt;. It is sent to the LLM for this specific turn, but it is &lt;em&gt;never&lt;/em&gt; saved to the persistent database chat history. This prevents the context window from bloating with old RAG results over time (context rot).&lt;/p&gt;
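&lt;p&gt;The injection step itself is deliberately boring: append the surviving facts to the system message for this one turn, and never write them back to the stored history. A sketch with a hypothetical function name:&lt;/p&gt;

```python
EPISODIC_HEADER = "### RELEVANT EPISODIC MEMORY FOR THIS TURN ###"

def inject_episodic_memory(base_prompt, surviving_facts):
    """Build this turn's system message; the result is sent to the LLM
    but never persisted, which avoids context rot."""
    if not surviving_facts:
        return base_prompt  # gate check skipped RAG, or dedup dropped everything
    facts = "\n".join(f"- {fact}" for fact in surviving_facts)
    return f"{base_prompt}\n\n{EPISODIC_HEADER}\n{facts}"
```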

&lt;h2&gt;
  
  
  Observability &amp;amp; The Results
&lt;/h2&gt;

&lt;p&gt;You can't improve what you don't measure. I added OpenTelemetry across the backend. Now, when I look at a trace, I can see exactly what the waterfill dropped (&lt;code&gt;hydrate.is_partial&lt;/code&gt;), how long the search took (&lt;code&gt;rag.search_duration_ms&lt;/code&gt;), and how many facts were actually injected (&lt;code&gt;rag.injected_edges_count&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Impact:&lt;/strong&gt;&lt;br&gt;
Look back at the chart at the start of this article. After Day 21, I deployed this architecture. &lt;br&gt;
The input tokens per message instantly collapsed from 120k back down to a stable ~40k tokens (the budget limit + chat history). &lt;/p&gt;

&lt;p&gt;The magic is that the AI didn't get dumber. It still feels like it knows &lt;em&gt;everything&lt;/em&gt; about her because the structural skeleton (the Hubs) is always there in the Base Prompt. But when she asks a specific question about a past event, the GraphRAG pipeline silently fetches the long-tail details in under a second.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;A massive 1 million token context window is an incredible luxury, but it is not a substitute for software architecture. &lt;/p&gt;

&lt;p&gt;Dumping everything into the prompt is the best way to validate an idea. But building real products eventually forces you to move from "what works theoretically" to "what works economically and efficiently." &lt;/p&gt;

&lt;p&gt;By separating &lt;strong&gt;Storage&lt;/strong&gt; (Knowledge Graphs) from &lt;strong&gt;Retrieval&lt;/strong&gt; (Budget-Aware Base Prompts + Deterministic RAG), Synapse is now fast, cheap to run, and infinitely scalable.&lt;/p&gt;

&lt;p&gt;The code for both of these systems is open source. You can check out exactly how I implemented the waterfill allocation (&lt;code&gt;hydration_v2.py&lt;/code&gt;) and the retrieval pipeline (&lt;code&gt;graph_rag.py&lt;/code&gt;) in the backend repository.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Frontend (Body):&lt;/strong&gt; &lt;a href="https://github.com/juandastic/synapse-chat-ai" rel="noopener noreferrer"&gt;synapse-chat-ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Backend (Cortex):&lt;/strong&gt; &lt;a href="https://github.com/juandastic/synapse-cortex" rel="noopener noreferrer"&gt;synapse-cortex&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I love sharing these real-world scaling problems. If you are building memory systems or working with AI in production, I'd love to hear your approach. Let's connect on &lt;a href="https://x.com/juandastic" rel="noopener noreferrer"&gt;X&lt;/a&gt; or &lt;a href="https://www.linkedin.com/in/juandastic/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>When 5 Minutes Isn't Enough: Moving AI Ingestion from Sync to Async (And Saving 99% Compute)</title>
      <dc:creator>Juan David Gómez</dc:creator>
      <pubDate>Fri, 13 Feb 2026 03:38:32 +0000</pubDate>
      <link>https://dev.to/juandastic/when-5-minutes-isnt-enough-moving-ai-ingestion-from-sync-to-async-and-saving-99-compute-59o8</link>
      <guid>https://dev.to/juandastic/when-5-minutes-isnt-enough-moving-ai-ingestion-from-sync-to-async-and-saving-99-compute-59o8</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/juandastic/beyond-rag-building-an-ai-companion-with-deep-memory-using-knowledge-graphs-2e6e"&gt;last post&lt;/a&gt;, I introduced &lt;strong&gt;Synapse&lt;/strong&gt;, the AI system I built for my wife that uses a Knowledge Graph to give her LLM a "Deep Memory."&lt;/p&gt;

&lt;p&gt;In the early demos and tests, it looked perfect. She ends a chat, the system processes it, and the graph updates in about 50 seconds.&lt;/p&gt;

&lt;p&gt;But demos are lies.&lt;/p&gt;

&lt;p&gt;When we started using it for real, with 45-minute chat sessions and dozens of messages, the system fell apart. The "End Session" button would spin for 5 minutes and then crash.&lt;/p&gt;

&lt;p&gt;I thought I had a simple timeout bug. It turned out I had a fundamental architecture problem.&lt;/p&gt;

&lt;p&gt;Here is how I went from crashing servers and wasting tokens to a &lt;strong&gt;99% reduction in Convex Actions compute time&lt;/strong&gt; by implementing the &lt;strong&gt;Async Request-Reply Pattern&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Happy Path" Trap
&lt;/h2&gt;

&lt;p&gt;My initial implementation was naive. I treated the heavy AI processing like a standard web request.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Convex (The Orchestrator)&lt;/strong&gt; triggers an HTTP POST to my Python backend.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;FastAPI (The Brain)&lt;/strong&gt; calls &lt;code&gt;Graphiti&lt;/code&gt; + &lt;code&gt;Gemini&lt;/code&gt; to process the text.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;FastAPI&lt;/strong&gt; waits for the result and returns it.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Convex&lt;/strong&gt; saves the result to the DB.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the standard &lt;strong&gt;Synchronous&lt;/strong&gt; pattern.&lt;/p&gt;

&lt;p&gt;The problem? &lt;strong&gt;Convex Actions have a hard execution limit&lt;/strong&gt; (usually 5 to 10 minutes depending on the plan).&lt;/p&gt;

&lt;p&gt;When my wife had a short conversation, processing took 1 or 2 minutes. Fine.&lt;br&gt;
But when she had a deep conversation, the Graph extraction logic (running on Gemini 3 Flash) took &lt;strong&gt;15 minutes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You cannot fit a 15-minute task into a 5-minute box.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt #1: The "Brute Force" Retry (And Why It Failed)
&lt;/h2&gt;

&lt;p&gt;At first, I didn't realize it was taking 15 minutes. I assumed the Gemini API was just being flaky or slow.&lt;/p&gt;

&lt;p&gt;So, I did what any engineer does when things fail: &lt;strong&gt;I added retries.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I configured Convex to retry the action with exponential backoff on failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here is the disaster that followed:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Convex sends the request.&lt;/li&gt;
&lt;li&gt;It waits 5 minutes. &lt;strong&gt;Timeout.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Convex thinks the request failed, so it schedules a &lt;strong&gt;Retry&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;It sends the request &lt;em&gt;again&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The Hidden Bug:&lt;/strong&gt;&lt;br&gt;
The Python backend didn't know Convex had timed out. The first process was &lt;em&gt;still running&lt;/em&gt; in the background, consuming LLM tokens and writing to the graph.&lt;/p&gt;

&lt;p&gt;Suddenly, I had &lt;strong&gt;two&lt;/strong&gt; heavy processes processing the &lt;em&gt;same&lt;/em&gt; chat log simultaneously. I was paying double the API costs, wasting bandwidth, and clogging my backend with "zombie" processes. And the user &lt;em&gt;still&lt;/em&gt; got an error message.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Turning Point: Observability
&lt;/h2&gt;

&lt;p&gt;I couldn't fix what I couldn't see. I installed &lt;strong&gt;OpenTelemetry&lt;/strong&gt; and connected it to &lt;strong&gt;Axiom&lt;/strong&gt; to trace the actual execution time on the Python backend.&lt;/p&gt;

&lt;p&gt;The trace was a slap in the face.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyjgqpwx4cum7ktoanmn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyjgqpwx4cum7ktoanmn.png" alt="Ingest trace screenshot" width="800" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The ingestion wasn't failing; it was just slow. It consistently took &lt;strong&gt;12 to 18 minutes&lt;/strong&gt; for large sessions.&lt;/p&gt;

&lt;p&gt;I realized this wasn't a bug I could "optimize" away. I needed to change the architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: The Async Request-Reply Pattern
&lt;/h2&gt;

&lt;p&gt;In software engineering, when a task takes longer than a user (or a server) is willing to wait, you decouple the &lt;strong&gt;Request&lt;/strong&gt; from the &lt;strong&gt;Response&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I switched to a &lt;strong&gt;Polling Architecture&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of Convex waiting for the answer, it just asks for a "ticket."&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Convex&lt;/strong&gt; sends a &lt;code&gt;POST /ingest&lt;/code&gt; request.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;FastAPI&lt;/strong&gt; immediately returns &lt;code&gt;202 Accepted&lt;/code&gt; with a &lt;code&gt;jobId&lt;/code&gt;. (Time taken: ~300ms).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;FastAPI&lt;/strong&gt; starts the heavy processing in a background task (&lt;code&gt;asyncio.create_task&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Convex&lt;/strong&gt; goes to sleep and wakes up every few minutes to check the status.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here is the flow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zbmoayxjnfbwz795ph9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zbmoayxjnfbwz795ph9.png" alt="flow diagram" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Linear Backoff?
&lt;/h3&gt;

&lt;p&gt;I switched from &lt;strong&gt;Exponential&lt;/strong&gt; to &lt;strong&gt;Linear&lt;/strong&gt; backoff for the polling.&lt;/p&gt;

&lt;p&gt;If I know a task takes &lt;em&gt;at least&lt;/em&gt; 5 minutes, checking after 10 seconds is a waste of resources. Checking after 2 minutes is also a waste.&lt;/p&gt;

&lt;p&gt;I set the scheduler to check after &lt;strong&gt;5 minutes&lt;/strong&gt;, then &lt;strong&gt;10 minutes&lt;/strong&gt;, then &lt;strong&gt;10 minutes&lt;/strong&gt; again. This reduces the noise on my server significantly.&lt;/p&gt;
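&lt;p&gt;Expressed as data, the schedule is just a table of delays (a sketch of the idea; the actual delays live in the Convex scheduler configuration):&lt;/p&gt;

```python
# First check after 5 minutes, then 10-minute steps; later attempts
# keep reusing the last delay instead of growing exponentially.
POLL_DELAYS_MINUTES = [5, 10, 10]

def next_poll_delay(attempt):
    """Minutes to wait before poll number `attempt` (0-indexed)."""
    return POLL_DELAYS_MINUTES[min(attempt, len(POLL_DELAYS_MINUTES) - 1)]
```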

&lt;h2&gt;
  
  
  The Results: 99% Efficiency Gain
&lt;/h2&gt;

&lt;p&gt;The difference in resource usage is massive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (Synchronous):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Convex Action running time: &lt;strong&gt;5 minutes&lt;/strong&gt; (blocking/waiting).&lt;/li&gt;
&lt;li&gt;  Result: Fail -&amp;gt; Retry -&amp;gt; &lt;strong&gt;5 more minutes&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Total "Billed" Compute: &lt;strong&gt;~10-15 minutes&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Token Waste:&lt;/strong&gt; High (re-processing the same data).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After (Async Polling):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Request 1 (Trigger): &lt;strong&gt;~300ms&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Request 2 (Poll at 5m): &lt;strong&gt;~300ms&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Request 3 (Final Fetch): &lt;strong&gt;~300ms&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Total "Billed" Compute: &lt;strong&gt;&amp;lt; 2 seconds&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We went from wasting 10 minutes of compute just "waiting" for a response, to using less than 2 seconds of active execution time to manage the same job.&lt;/p&gt;

&lt;p&gt;More importantly, the Python backend never processes the same job twice. If Convex asks for the status of a job that is already running, FastAPI just says "Still working on it," and the work continues undisturbed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This project taught me a valuable lesson about building "Vertical AI" apps: &lt;strong&gt;AI tasks are slow.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We are used to web requests taking 200ms. In the world of LLMs and Knowledge Graphs, a "fast" task might take 30 seconds, and a "deep" task might take 15 minutes.&lt;/p&gt;

&lt;p&gt;If your backend takes longer than your timeout limit, don't just increase the timeout. &lt;strong&gt;Decouple the request.&lt;/strong&gt; It makes your system more resilient, your bills lower, and your architecture cleaner.&lt;/p&gt;

&lt;p&gt;I'd love to hear how you handle long-running LLM tasks. Let me know on &lt;a href="https://x.com/juandastic" rel="noopener noreferrer"&gt;X&lt;/a&gt; or &lt;a href="https://www.linkedin.com/in/juandastic/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>performance</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Beyond RAG: Building an AI Companion with "Deep Memory" using Knowledge Graphs</title>
      <dc:creator>Juan David Gómez</dc:creator>
      <pubDate>Mon, 09 Feb 2026 00:07:19 +0000</pubDate>
      <link>https://dev.to/juandastic/beyond-rag-building-an-ai-companion-with-deep-memory-using-knowledge-graphs-2e6e</link>
      <guid>https://dev.to/juandastic/beyond-rag-building-an-ai-companion-with-deep-memory-using-knowledge-graphs-2e6e</guid>
      <description>&lt;p&gt;I build AI tools to solve my own problems. A while back, &lt;a href="https://dev.to/juandastic/i-ditched-myfitnesspal-and-built-an-ai-agent-to-track-my-food-3eia"&gt;I built NutriAgent to track my calories&lt;/a&gt; because I wanted to &lt;strong&gt;own my raw data&lt;/strong&gt;. But recently, the problem wasn't mine, it was my wife's.&lt;/p&gt;

&lt;p&gt;She uses LLMs differently than I do. While I use them for code or quick facts, she uses them as a therapist, a life coach, and a sounding board. Over the last year, she built a massive "Master Prompt" in Notion. It contained her medical history, key life events, emotional triggers, and ongoing projects.&lt;/p&gt;

&lt;p&gt;It was &lt;strong&gt;35,000 tokens&lt;/strong&gt; long.&lt;/p&gt;

&lt;p&gt;Every time she started a new chat, she had to manually copy-paste this wall of text just to get the AI up to speed. If she didn't, the advice was generic and useless.&lt;/p&gt;

&lt;p&gt;She didn't need a search engine or a simple chat history. She needed a &lt;strong&gt;continuous brain&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I realized that the standard way we build AI memory with RAG (Retrieval Augmented Generation) wouldn't be enough. So I built &lt;strong&gt;Synapse AI Chat&lt;/strong&gt;. It's an AI architecture that uses a Knowledge Graph to give an LLM "Deep Memory."&lt;/p&gt;

&lt;p&gt;Here is how I built it, why I chose Knowledge Graphs over Vectors (to be fair, I used both), and how I handled the engineering messiness of making it work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Standard RAG Wasn't Enough
&lt;/h2&gt;

&lt;p&gt;Most AI memory systems today use Vector RAG. You chunk text, turn it into numbers (vectors), and find "similar" chunks later.&lt;/p&gt;

&lt;p&gt;This works great for finding a specific policy in a PDF, but far less well for modeling human relationships and history.&lt;/p&gt;

&lt;p&gt;Vectors find &lt;strong&gt;similarity&lt;/strong&gt;, not &lt;strong&gt;structure&lt;/strong&gt;.&lt;br&gt;
If my wife tells the AI, "I'm feeling overwhelmed today," a Vector search might pull up a journal entry from three months ago where she mentioned "overwhelm."&lt;/p&gt;

&lt;p&gt;But a &lt;strong&gt;Knowledge Graph&lt;/strong&gt; understands the &lt;em&gt;story&lt;/em&gt;. It knows:&lt;br&gt;
&lt;code&gt;"Project A" -&amp;gt; CAUSED -&amp;gt; "Stress" -&amp;gt; RESULTED_IN -&amp;gt; "Overwhelm"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I needed the AI to understand &lt;em&gt;causality&lt;/em&gt;, not just keywords.&lt;/p&gt;
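&lt;p&gt;A toy example makes the difference concrete: in a graph, a causal chain is a path you can walk backwards, something a pure similarity index cannot give you (illustrative code, not how Graphiti stores its edges):&lt;/p&gt;

```python
# Toy causal graph: (source, RELATION, target) triples.
edges = [
    ("Project A", "CAUSED", "Stress"),
    ("Stress", "RESULTED_IN", "Overwhelm"),
]

def root_causes(node):
    """Walk edges backwards to find what originally led to `node`."""
    parents = [s for s, _, t in edges if t == node]
    roots = []
    for p in parents:
        deeper = root_causes(p)
        roots.extend(deeper if deeper else [p])
    return roots
```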
&lt;h3&gt;
  
  
  The Architecture Decision: Full Context Injection
&lt;/h3&gt;

&lt;p&gt;Because I was using Google's Gemini models (which have a massive context window), I didn't need to retrieve just 5 small chunks of text. I could inject the &lt;strong&gt;entire&lt;/strong&gt; compiled profile into the prompt.&lt;/p&gt;

&lt;p&gt;My goal was to turn the raw chat logs into a structured graph, then flatten it back into a comprehensive "User Manual" for the AI to read before every interaction.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.getzep.com/product/open-source/" rel="noopener noreferrer"&gt;Graphiti&lt;/a&gt;, the framework I used for the graph indexing, supports semantic search for a retrieval strategy. I decided to take advantage of the Gemini's big context windows. The compiled graph output ended up being smaller than the source, from almost 35k tokens to ~14k, just combining the entities with their descriptions and their relations in plain text, avoiding extra tokens to build a narrative prompt like her old master's prompt&lt;/p&gt;
&lt;h2&gt;
  
  
  Introducing Synapse: The Architecture
&lt;/h2&gt;

&lt;p&gt;I split the project into two parts: the &lt;strong&gt;Body&lt;/strong&gt; (the UI you talk to) and the &lt;strong&gt;Brain&lt;/strong&gt; (the API that processes memory).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;The Frontend (Body):&lt;/strong&gt; React 19 + &lt;strong&gt;Convex&lt;/strong&gt;. I chose Convex because it handles real-time database syncing effortlessly, which makes the chat feel snappy.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Cortex (Brain):&lt;/strong&gt; Python + FastAPI. This does the heavy data processing.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Memory Engine:&lt;/strong&gt; &lt;strong&gt;Graphiti&lt;/strong&gt; + &lt;strong&gt;Neo4j&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Models:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Gemini 3 Flash:&lt;/strong&gt; For the "heavy lifting" (building the graph).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Gemini 2.5 Flash:&lt;/strong&gt; For the actual chat (speed and cost).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is the high-level view:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzkxbivdlj2ycdiz1683.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzkxbivdlj2ycdiz1683.png" alt="high-level view" width="376" height="407"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  How It Works: The "Deep Memory" Pipeline
&lt;/h2&gt;

&lt;p&gt;The system operates in three distinct phases.&lt;/p&gt;
&lt;h3&gt;
  
  
  Phase A: Conversation (The Chat)
&lt;/h3&gt;

&lt;p&gt;When my wife chats with Synapse, she is talking to &lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt;. It’s fast and fluid.&lt;/p&gt;

&lt;p&gt;The trick is that the System Prompt isn't static. Before she sends her first message, I &lt;strong&gt;hydrate&lt;/strong&gt; the prompt with a text summary of her entire Knowledge Graph. The AI immediately knows who she is, what she's worried about, and who her friends are.&lt;/p&gt;
&lt;h3&gt;
  
  
  Phase B: Ingestion (The "Sleep" Cycle)
&lt;/h3&gt;

&lt;p&gt;This is where the magic happens. When she finishes a conversation, either by going quiet for 3 hours or by clicking a Consolidate button, I treat it like the AI taking a nap to consolidate memories.&lt;/p&gt;

&lt;p&gt;We send the chat transcript to the Python Cortex. Here, I switch to &lt;strong&gt;Gemini 3 Flash&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why the upgrade? Extracting entities from a messy human conversation is hard.&lt;br&gt;
If she says, "I stopped taking medication X and started Y," a weaker model might just add "Taking Y" to the graph. &lt;strong&gt;Gemini 3&lt;/strong&gt; is smart enough to model the full transition:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Find node "Medication X".&lt;/li&gt;
&lt;li&gt; Mark the relationship as &lt;code&gt;STOPPED&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Create node "Medication Y".&lt;/li&gt;
&lt;li&gt; Create relationship &lt;code&gt;STARTED&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
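&lt;p&gt;The steps above amount to a temporal update: invalidate the old edge without deleting it, then add the new one. Here is a toy sketch of that idea (illustrative only; Graphiti performs this kind of edge invalidation internally):&lt;/p&gt;

```python
class Graph:
    """Minimal edge store: each edge is [source, relation, target, is_valid]."""
    def __init__(self):
        self.edges = []

    def add(self, s, rel, t):
        self.edges.append([s, rel, t, True])

    def invalidate(self, s, rel, t):
        for e in self.edges:
            if e[:3] == [s, rel, t]:
                e[3] = False  # keep the history, mark it as no longer current

def apply_medication_change(graph, old_med, new_med):
    graph.invalidate("user", "TAKES", old_med)  # Medication X: STOPPED
    graph.add("user", "TAKES", new_med)         # Medication Y: STARTED
```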

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frudhrks1ss6tb87h6bn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frudhrks1ss6tb87h6bn4.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Phase C: Hydration (The Awakening)
&lt;/h3&gt;

&lt;p&gt;When she returns, the next session is already prepared with the new compiled graph summary. It doesn't just dump raw graph data into the prompt; it compiles the nodes and edges into a natural language narrative.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_format_compilation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;definitions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;relationships&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;sections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;definitions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;sections&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#### 1. CONCEPTUAL DEFINITIONS &amp;amp; IDENTITY ####&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;# (Understanding what these concepts mean specifically for this user)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;definitions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;relationships&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;sections&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#### 2. RELATIONAL DYNAMICS &amp;amp; CAUSALITY ####&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;# (How these concepts interact and evolve over time)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relationships&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;sections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The "Killer Feature": Memory Explorer
&lt;/h2&gt;

&lt;p&gt;AI memory is usually a "Black Box." Users don't trust what they can't see.&lt;/p&gt;

&lt;p&gt;I wanted my wife to be able to audit her own brain. I built a visualizer using &lt;code&gt;react-force-graph&lt;/code&gt;. She can see bubbles representing her life: "Work," "Health," "Family."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtrtl3gie08bglhdfs8p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtrtl3gie08bglhdfs8p.png" alt=" " width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If she sees a connection that is wrong (e.g., the AI thinks she likes a food she actually hates), she can edit the input and re-process the graph with new information like &lt;em&gt;"I actually hate mushrooms now."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The system then processes that new input and updates the graph, creating new nodes and relations or invalidating the existing ones. This "Human-in-the-loop" approach builds massive trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  Engineering Challenges
&lt;/h2&gt;

&lt;p&gt;Building this wasn't just about prompt engineering. There were real system challenges.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Handling Latency (The Job Queue)
&lt;/h3&gt;

&lt;p&gt;Graph ingestion is slow. It takes anywhere from &lt;strong&gt;60 to 200 seconds&lt;/strong&gt; for Graphiti and Gemini to process a long conversation and update Neo4j.&lt;/p&gt;

&lt;p&gt;I couldn't have the UI hang for 3 minutes.&lt;/p&gt;

&lt;p&gt;I used &lt;strong&gt;Convex&lt;/strong&gt; as a Job Queue. When the session ends, the UI returns immediately. Convex processes the job in the background, updating the UI state to "Processing..." and then "Memory Updated" when it's done.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Handling Flakiness (The Retry Logic)
&lt;/h3&gt;

&lt;p&gt;The Gemini API is powerful, but occasionally it throws &lt;strong&gt;503 Service Unavailable&lt;/strong&gt; errors, especially during heavy graph processing tasks.&lt;/p&gt;

&lt;p&gt;I implemented an "Event-Driven Retry" system. If the graph build fails, I don't just crash. I schedule a retry with exponential backoff.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;RETRY_DELAYS_MS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;// Attempt 1: Immediate&lt;/span&gt;
  &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// Attempt 2: +2 minutes (let the API cool down)&lt;/span&gt;
  &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Attempt 3: +10 minutes&lt;/span&gt;
  &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Attempt 4: +30 minutes&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;processJob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;internalAction&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cortex_jobs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cortexJobs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;get&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// 1. Do the heavy lifting (Call Gemini 3 Flash)&lt;/span&gt;
      &lt;span class="c1"&gt;// This is where 503 errors usually happen&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;ingestGraphData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="c1"&gt;// 2. Mark complete if successful&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runMutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cortexJobs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nextAttempt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;nextAttempt&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Stop the loop if we've tried too many times&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runMutation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cortexJobs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
            &lt;span class="na"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
        &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// 3. Schedule the retry using Convex's scheduler&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;RETRY_DELAYS_MS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;nextAttempt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;scheduler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;runAfter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;processJob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Snappy UX
&lt;/h3&gt;

&lt;p&gt;Convex's real-time sync was a lifesaver here. I didn't have to write complex WebSocket code. If the Python backend updates the status of a memory job in the database, the React UI updates instantly.&lt;/p&gt;

&lt;p&gt;Token streaming also works better with Convex in the middle, since the backend streams into Convex rather than directly to the client. If the user's browser closes or the connection drops, token generation continues: the answer lands in Convex, and it streams to the user as soon as they reconnect.&lt;/p&gt;

&lt;p&gt;The catch is that this can inflate function usage, since every update counts as a write. So the streaming updates are throttled to 100ms intervals to balance responsiveness with database write efficiency.&lt;/p&gt;
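&lt;p&gt;A minimal sketch of that throttling idea in Python (illustrative only; the real system writes through Convex mutations, and all names here are assumptions):&lt;/p&gt;

```python
import time

FLUSH_INTERVAL = 0.1  # flush to the database at most every 100ms

class ThrottledStream:
    """Buffer streamed tokens and write them out at a bounded rate."""

    def __init__(self, flush_fn):
        self.flush_fn = flush_fn          # stand-in for a Convex mutation call
        self.buffer = []
        self.last_flush = float("-inf")   # force an immediate first flush

    def on_token(self, token: str) -> None:
        self.buffer.append(token)
        if time.monotonic() - self.last_flush >= FLUSH_INTERVAL:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn("".join(self.buffer))
            self.buffer.clear()
        self.last_flush = time.monotonic()

chunks = []
stream = ThrottledStream(chunks.append)
for tok in ["Hel", "lo", " wor", "ld"]:
    stream.on_token(tok)
stream.flush()  # final flush so the tail of the answer is not lost
print("".join(chunks))  # → Hello world
```

&lt;p&gt;The first token flushes immediately for perceived responsiveness; everything arriving inside the 100ms window is batched into one write.&lt;/p&gt;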

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;The difference is night and day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; My wife dreaded starting a new thread because of the "context setup" tax. She felt like she was constantly repeating herself, and she carried the burden of regularly pausing to update the Master Prompt with new data before starting a new thread.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now:&lt;/strong&gt; She just talks. The system has a "Deep Memory" of about &lt;strong&gt;10,000 tokens&lt;/strong&gt; (compressed from months of chats) that is injected automatically.&lt;/p&gt;

&lt;p&gt;She has different threads for different topics, but they all share the same &lt;strong&gt;Cortex&lt;/strong&gt;. If she mentions a health issue in the "Work" thread (e.g., "My back hurts from sitting"), the "Health" thread knows about it the next time she logs in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This project taught me that we are moving from "Horizontal" AI platforms (like ChatGPT, which knows a little about everything) to "Vertical" AI stacks that know &lt;strong&gt;everything about you&lt;/strong&gt;. I’ve been watching how the ChatGPT and Gemini apps are starting to create user profiles and thread summaries to build this kind of memory. They are chasing the same goal: a truly personalized experience.&lt;/p&gt;

&lt;p&gt;The key takeaway for me is that &lt;strong&gt;Vectors are great for search, but Knowledge Graphs are essential for understanding.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I keep enjoying building solutions for real problems. Nowadays, we have powerful tools to build awesome software faster than ever, but I found that having a &lt;strong&gt;product vision&lt;/strong&gt; and the &lt;strong&gt;technical understanding&lt;/strong&gt; to architect a solution is still critical. That is the difference between building a quick prototype and solving a real problem.&lt;/p&gt;

&lt;p&gt;This project is being used for real by my wife and me, and honestly, this is my favorite part of building products. The fun doesn't end when the architecture is done; it begins when people actually use it. Watching the product evolve, finding bugs, pivoting features, or even realizing that an initial idea didn't make sense at all, that is the journey. Building software is fun, but seeing it come alive and solve actual problems is magical.&lt;/p&gt;

&lt;p&gt;The project is live at &lt;a href="https://synapse-chat.juandago.dev" rel="noopener noreferrer"&gt;synapse-chat.juandago.dev&lt;/a&gt; if you want to see it in action.&lt;/p&gt;

&lt;p&gt;The code is open source if you want to dig into the implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Frontend (Body):&lt;/strong&gt; &lt;a href="https://github.com/juandastic/synapse-chat-ai" rel="noopener noreferrer"&gt;synapse-chat-ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Backend (Cortex):&lt;/strong&gt; &lt;a href="https://github.com/juandastic/synapse-cortex" rel="noopener noreferrer"&gt;synapse-cortex&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love to hear your impressions and thoughts. Let's continue the conversation on &lt;a href="https://x.com/juandastic" rel="noopener noreferrer"&gt;X&lt;/a&gt; or connect on &lt;a href="https://www.linkedin.com/in/juandastic/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>I Used My AI Nutrition Agent Every Day for a Month. Here's What I Actually Had to Fix</title>
      <dc:creator>Juan David Gómez</dc:creator>
      <pubDate>Sat, 27 Dec 2025 06:19:17 +0000</pubDate>
      <link>https://dev.to/juandastic/i-used-my-ai-nutrition-coach-every-day-for-a-month-heres-what-i-actually-had-to-fix-1ej8</link>
      <guid>https://dev.to/juandastic/i-used-my-ai-nutrition-coach-every-day-for-a-month-heres-what-i-actually-had-to-fix-1ej8</guid>
      <description>&lt;p&gt;A month ago, I wrote about building NutriAgent, my AI nutrition tracker that logs meals from Telegram and the web into a Google Sheet I own (&lt;a href="https://dev.to/juandastic/i-ditched-myfitnesspal-and-built-an-ai-agent-to-track-my-food-3eia"&gt;you can read the original post here&lt;/a&gt;). I got it working, posted the article, and figured that was the end of the story.&lt;/p&gt;

&lt;p&gt;Then I started using it every single day. And that's when the real problems began to show up.&lt;/p&gt;

&lt;p&gt;Not bugs. Not crashes. Just... little things that made me think "wait, this is annoying" multiple times per day. Things you only notice when you're the actual user solving a real problem, not just demoing a cool idea.&lt;/p&gt;

&lt;p&gt;Two problems broke the experience completely.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Spreadsheets Problem (Why My Data Felt Broken)
&lt;/h2&gt;

&lt;p&gt;I'd log my breakfast quickly on Telegram from my phone. Then at lunch, I'd be at my computer and use the web interface because it was easier. But at the end of the day, when I wanted to see my full nutrition breakdown, I had my data split across two different accounts and two different spreadsheets. I had to manually copy rows and merge them just to get a simple daily total.&lt;/p&gt;

&lt;p&gt;The agent stored my Telegram meals under one user ID. My web chats were under another. When I asked "what did I eat this week?" the answer depended entirely on which platform I was using. My nutrition data was fragmented, making any real analysis impossible.&lt;/p&gt;

&lt;p&gt;I realized that "make it multi-user" wasn't enough. I needed one identity across both channels.&lt;/p&gt;

&lt;p&gt;Since both channels were useful in different scenarios, I decided to keep both while making my data integrated and easy to visualize and analyze.&lt;/p&gt;

&lt;h3&gt;
  
  
  How the Linking Actually Works
&lt;/h3&gt;

&lt;p&gt;I first considered building this into the main agent as a tool: "Send your email to link your account." But typing emails in chat felt clunky, and waiting for verification codes in Telegram felt slower than just clicking a button.&lt;/p&gt;

&lt;p&gt;Some features are just faster in a web interface. Account linking is one of them.&lt;/p&gt;

&lt;p&gt;So I built a Settings page in the web app that generates a short-lived linking code. You copy it, paste it into Telegram, and the bot connects your accounts. That's it.&lt;/p&gt;

&lt;p&gt;The flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get a code from the web Settings&lt;/li&gt;
&lt;li&gt;Send it to the Telegram bot&lt;/li&gt;
&lt;li&gt;Backend validates and binds your &lt;code&gt;telegram_user_id&lt;/code&gt; to your &lt;code&gt;clerk_user_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Merge the chat histories and nutrition logs to keep everything in a single user account&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faa50pjd22d8itug4m0h3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faa50pjd22d8itug4m0h3.png" alt="Screenshot of the web settings page" width="800" height="730"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Under the Hood: One User, Two Channels, One Source of Truth
&lt;/h3&gt;

&lt;p&gt;Under the hood, the core decision was to &lt;strong&gt;pick a single canonical user identity&lt;/strong&gt; and force everything else to align with it.&lt;/p&gt;

&lt;p&gt;On the web side, authentication is handled by Clerk, which gives me a stable &lt;code&gt;clerk_user_id&lt;/code&gt;. Instead of inventing a parallel identity system for Telegram, I decided to make &lt;code&gt;clerk_user_id&lt;/code&gt; the primary key everywhere.&lt;/p&gt;

&lt;p&gt;On the backend, the user model now looks roughly like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;clerk_user_id&lt;/code&gt; → primary identifier&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;telegram_user_id&lt;/code&gt; → optional, nullable&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;email&lt;/code&gt; → metadata and debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Telegram is no longer a “separate user”&lt;/li&gt;
&lt;li&gt;It’s just another interface attached to the same account&lt;/li&gt;
&lt;li&gt;All nutrition logs, chat history, and summaries are keyed off the same ID&lt;/li&gt;
&lt;/ul&gt;
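&lt;p&gt;As a rough sketch, the record described above could look like this (a plain dataclass for illustration, not the actual backend model):&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class User:
    clerk_user_id: str                       # primary identifier everywhere
    telegram_user_id: Optional[int] = None   # attached only after linking
    email: Optional[str] = None              # metadata and debugging

# Both channels resolve to the same record once linked:
user = User(clerk_user_id="user_abc123", email="me@example.com")
user.telegram_user_id = 987654321  # set when the linking code is redeemed
print(user.clerk_user_id)  # → user_abc123
```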

&lt;p&gt;The linking code flow is intentionally simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The web app generates a short-lived code bound to &lt;code&gt;clerk_user_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Telegram sends the code back to the backend&lt;/li&gt;
&lt;li&gt;If valid, the backend attaches &lt;code&gt;telegram_user_id&lt;/code&gt; to the existing user record&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No guessing. No heuristics. No email matching.&lt;br&gt;
If the code matches, the user explicitly intended to link the accounts.&lt;/p&gt;

&lt;p&gt;This small constraint eliminated an entire class of edge cases I didn’t want to debug later.&lt;/p&gt;
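&lt;p&gt;The whole flow fits in a few lines. A hedged sketch with in-memory storage (the real backend persists users in a database; &lt;code&gt;CODE_TTL&lt;/code&gt; and the function names are assumptions):&lt;/p&gt;

```python
import secrets
import time

CODE_TTL = 300  # seconds a linking code stays valid (assumed value)

pending: dict[str, tuple[str, float]] = {}        # code -> (clerk_user_id, issued_at)
users = {"user_abc": {"telegram_user_id": None}}  # stand-in for the user table

def generate_code(clerk_user_id: str) -> str:
    """Web app side: issue a short-lived code bound to the signed-in user."""
    code = secrets.token_hex(4)
    pending[code] = (clerk_user_id, time.time())
    return code

def redeem_code(code: str, telegram_user_id: int) -> bool:
    """Bot side: validate the code and attach the Telegram ID to the user."""
    entry = pending.pop(code, None)  # pop makes each code single-use
    if entry is None:
        return False
    clerk_user_id, issued_at = entry
    if time.time() - issued_at > CODE_TTL:
        return False
    users[clerk_user_id]["telegram_user_id"] = telegram_user_id
    return True

code = generate_code("user_abc")
print(redeem_code(code, 987654321))  # → True
print(redeem_code(code, 111))        # → False (already consumed)
```

&lt;p&gt;Because the code is bound to the Clerk user and consumed on first use, a successful redemption is itself proof of intent.&lt;/p&gt;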
&lt;h2&gt;
  
  
  The "One Meal, Three Messages" Telegram Headache
&lt;/h2&gt;

&lt;p&gt;Once I got both channels working smoothly, I started using them interchangeably. That's when I noticed something else. The web version lets me attach multiple images to a single message, for instance, a photo of my food plus a screenshot of the nutrition label. This made the AI estimates much more accurate.&lt;/p&gt;

&lt;p&gt;But when I tried the same thing on Telegram, it fired off three separate messages, and I got three separate AI responses with different calorie counts. Each photo arrived as its own webhook event and was processed in isolation, without the context of the others. The experience gap was frustrating: the agent felt smart on the web, broken on Telegram.&lt;/p&gt;
&lt;h3&gt;
  
  
  How I Fixed the Multiple Images Problem
&lt;/h3&gt;

&lt;p&gt;Telegram marks photos that are sent together with a shared media group ID, so I introduced a &lt;code&gt;MediaGroupHandler&lt;/code&gt; in the webhook handler for when you send multiple photos at once. It's a simple batching system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When the bot receives an image as part of a media group, it waits 1 second to start processing the request&lt;/li&gt;
&lt;li&gt;If more images arrive in that chat within the window, it groups them and resets the delay&lt;/li&gt;
&lt;li&gt;Sends them all as &lt;code&gt;list[bytes]&lt;/code&gt; to the agent in one call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent's &lt;code&gt;analyze()&lt;/code&gt; method already accepts &lt;code&gt;list[bytes]&lt;/code&gt;, so no changes needed there. The fix was purely in the Telegram handler.&lt;/p&gt;

&lt;p&gt;Now I can send three angles of my plate plus a nutrition label and get one smart response.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why This Fix Lives in the Telegram Layer (Not the Agent)
&lt;/h3&gt;

&lt;p&gt;One important detail: &lt;strong&gt;I didn’t change the agent at all&lt;/strong&gt; to support multiple images.&lt;/p&gt;

&lt;p&gt;The agent already accepts &lt;code&gt;list[bytes]&lt;/code&gt; for images. The real bug wasn’t model capability — it was &lt;strong&gt;message orchestration&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Telegram delivers images as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separate webhook events&lt;/li&gt;
&lt;li&gt;Sometimes grouped with a &lt;code&gt;media_group_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Sometimes arriving milliseconds apart, out of order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Originally, each webhook triggered an agent call immediately. That meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One image = one analysis&lt;/li&gt;
&lt;li&gt;Zero shared context&lt;/li&gt;
&lt;li&gt;Conflicting calorie estimates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix was to treat Telegram messages as &lt;strong&gt;signals&lt;/strong&gt;, not requests.&lt;/p&gt;

&lt;p&gt;I introduced a lightweight batching layer in the Telegram handler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Images with the same &lt;code&gt;media_group_id&lt;/code&gt; are buffered&lt;/li&gt;
&lt;li&gt;A short debounce window (1 second) waits for more images&lt;/li&gt;
&lt;li&gt;Each new image resets the timer&lt;/li&gt;
&lt;li&gt;When the window closes, all images are sent together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Conceptually, it’s:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Wait until the user is done talking, then think.”&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;media_groups&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="n"&gt;lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;media_group_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;media_groups&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setdefault&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;media_group_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]).&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;media_group_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;media_group_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;media_group_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;process_after_delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;media_group_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chat_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_after_delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;media_group_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chat_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;media_groups&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;media_group_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;analyze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chat_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chat_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By keeping this logic inside the Telegram adapter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent stays platform-agnostic&lt;/li&gt;
&lt;li&gt;The same analysis pipeline works for web uploads, Telegram albums, or future mobile clients&lt;/li&gt;
&lt;li&gt;Telegram quirks don’t leak into core business logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ended up being one of those fixes that made everything &lt;em&gt;feel&lt;/em&gt; smarter without making the system more complex.&lt;/p&gt;

&lt;p&gt;Another side effect of this implementation was that it forced me to go deeper into asynchronous programming with FastAPI and Uvicorn. I already had some exposure to asyncio, but this was the first time I had to reason explicitly about timing, cancellation, and shared state in a real user-facing flow.&lt;/p&gt;

&lt;p&gt;To keep the solution simple, I used in-memory storage combined with &lt;code&gt;asyncio.Lock()&lt;/code&gt; and cancellable &lt;code&gt;asyncio.Task&lt;/code&gt;s to implement the batching and debounce logic. This works well because the bot currently runs with a single worker, so I don’t need external coordination or persistence.&lt;/p&gt;

&lt;p&gt;The important part is that this wasn’t a shortcut — it was a conscious tradeoff. The same pattern would translate cleanly to Redis, a queue, or a background worker if I needed to scale horizontally. For now, the simpler solution keeps the system easier to reason about, test, and evolve.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Oh, That's Actually Smooth Now" Moment
&lt;/h2&gt;

&lt;p&gt;After the changes, I logged lunch on Telegram during a break, used the web chat when I was at the computer, and that evening, I opened the single spreadsheet with the whole picture of my day ready to analyze and compare with the rest of the week.&lt;/p&gt;

&lt;p&gt;I sent three images of dinner—no spam, just one clean response. The product finally feels intentional instead of held together with duct tape.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Dogfooding Actually Teaches You
&lt;/h2&gt;

&lt;p&gt;Building for yourself is different than building for a hypothetical user. You feel the pain immediately. You can't ignore bad UX because you're the one suffering.&lt;/p&gt;

&lt;p&gt;The gap between "it works" and "it works well enough to use daily" is massive—and only dogfooding reveals it.&lt;/p&gt;

&lt;p&gt;I learned that context engineering is more important than overloading prompts. I learned that some features belong in web UIs, not chat. And I learned that starting with a no-code tool is great for testing, but real usage demands real architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's a Real Product Now
&lt;/h2&gt;

&lt;p&gt;NutriAgent stopped being a toy project when I started needing it. These changes didn't just add features—they made it something I can share and scale.&lt;/p&gt;

&lt;p&gt;The project is live at &lt;a href="https://nutriagent.juandago.dev" rel="noopener noreferrer"&gt;https://nutriagent.juandago.dev&lt;/a&gt;. The code is open source for the &lt;a href="https://github.com/juandastic/nutri-agent-bot" rel="noopener noreferrer"&gt;Agent&lt;/a&gt; and &lt;a href="https://github.com/juandastic/nutri-agent-web" rel="noopener noreferrer"&gt;Web UI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This was my journey, but I'd love to hear your thoughts. Let's continue the conversation on &lt;a href="https://x.com/juandastic" rel="noopener noreferrer"&gt;X&lt;/a&gt; or &lt;a href="https://www.linkedin.com/in/juandastic/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ux</category>
      <category>ai</category>
      <category>agents</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Ditched MyFitnessPal and Built an AI Agent to Track My Food</title>
      <dc:creator>Juan David Gómez</dc:creator>
      <pubDate>Sat, 15 Nov 2025 03:28:36 +0000</pubDate>
      <link>https://dev.to/juandastic/i-ditched-myfitnesspal-and-built-an-ai-agent-to-track-my-food-3eia</link>
      <guid>https://dev.to/juandastic/i-ditched-myfitnesspal-and-built-an-ai-agent-to-track-my-food-3eia</guid>
      <description>&lt;p&gt;I wanted to track my calories and protein for my training goals, but I got tired of existing apps. They lock you into their pretty dashboards, make it hard to export your own data, and you can't cross-reference that nutrition data with your training logs easily. I just wanted to &lt;strong&gt;own my raw data&lt;/strong&gt; and build custom reports for myself.&lt;/p&gt;

&lt;p&gt;So I built NutriAgent. It's an AI nutrition tracker that understands text and photos of my meals, logs everything into a database and Google Sheets that I control, and I can chat with it on Telegram or the web. This post is about my journey of turning a simple "call GPT" prototype into a real tool-using agent with memory—for myself, but built with proper product decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  My First Agent Wasn't Code (And That's Why I Rewrote It)
&lt;/h2&gt;

&lt;p&gt;I didn't start with Python. My first version was actually a quick PoC in n8n (the self-hosted workflow tool). I set up a simple flow with an agent node, a few tools, and Telegram integration. It worked surprisingly well; I used it for several days, and it logged my meals fine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14d6c0tbz59r7i0g1153.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14d6c0tbz59r7i0g1153.png" alt=" " width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problem hit when I shared it with a friend. He wanted to try it, but I realized nothing was reusable. All my credentials for third-party services were hardcoded to my accounts. The whole flow was built around a single user: me. It couldn't support multiple people, and turning that n8n setup into a real product would have been a hack on top of a hack.&lt;br&gt;
That was the real push. I decided to rebuild it properly in Python—not just for me, but as a real multi-user system. It was more work, but it gave me the excuse to spend more time bringing a proper product to life, which is what I actually enjoy doing.&lt;/p&gt;
&lt;h2&gt;
  
  
  Building a Proper Agent in Python
&lt;/h2&gt;

&lt;p&gt;The n8n prototype proved the concept worked, but now I had to rebuild it from scratch; this time with proper architecture for multiple users. As I started writing the Python version, I realized I needed to be more intentional about the agent's design than I was in my quick n8n flow.&lt;/p&gt;

&lt;p&gt;In n8n, I had basic tools duct-taped together. For a real system, I needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A clean agent setup that could handle many users' conversations and data&lt;/li&gt;
&lt;li&gt;Well-designed tools that actually corresponded to product features&lt;/li&gt;
&lt;li&gt;Robust memory that wouldn't break when I scaled beyond just my own use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I used LangChain's &lt;code&gt;create_agent&lt;/code&gt; because it handles a lot of the heavy lifting. The core setup looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;PROMPT_FILE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;food_analysis_prompt.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FoodAnalysisAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_system_prompt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_create_system_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PROMPT_FILE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;current_datetime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d %H:%M:%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_datetime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;current_datetime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I keep the prompt in a separate file because I edit it a lot; it's easier to tweak the instructions without touching code. I inject the current datetime so the agent knows what day and time it is, which is important for queries like "today" or "this week" in my conversations.&lt;/p&gt;
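&lt;p&gt;For reference, the template is just a plain text file with a &lt;code&gt;{current_datetime}&lt;/code&gt; placeholder that &lt;code&gt;str.format&lt;/code&gt; fills in. A minimal self-contained sketch (the filename and prompt wording here are illustrative, not my actual prompt):&lt;/p&gt;

```python
from datetime import datetime
from pathlib import Path

# Hypothetical prompt file; the real prompt is much longer.
PROMPT_FILE = Path("food_agent_prompt.txt")
PROMPT_FILE.write_text(
    "You are a nutrition assistant.\n"
    "The current datetime is {current_datetime}.\n"
    "Resolve relative dates like 'today' against it.",
    encoding="utf-8",
)

def create_system_prompt() -> str:
    # Re-read on every call so prompt edits take effect without a restart
    template = PROMPT_FILE.read_text(encoding="utf-8")
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return template.format(current_datetime=now)

print(create_system_prompt())
```

&lt;p&gt;One gotcha with &lt;code&gt;str.format&lt;/code&gt;: any literal braces in the prompt file have to be doubled (&lt;code&gt;{{&lt;/code&gt; and &lt;code&gt;}}&lt;/code&gt;), or formatting will raise.&lt;/p&gt;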

&lt;h2&gt;
  
  
  Making It Understand Photos and My Chat History
&lt;/h2&gt;

&lt;p&gt;The agent needs to handle my messy real-world inputs: sometimes text, sometimes a photo, sometimes both. Plus, it needs to remember what we were just talking about.&lt;/p&gt;

&lt;p&gt;Here's how I normalize everything before sending it to the agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@traceable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FoodAnalysisAgent.analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;redirect_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Pull my past conversation from DB and convert to LangChain format
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;conversation_history&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

    &lt;span class="c1"&gt;# Add my current message (text + optional images)
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/jpeg;base64,&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redirect_uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;redirect_uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets me send a photo of fries and add context like "these were air-fried" to get a better estimate. The agent sees the image and text together, plus our conversation history, so it feels like a natural chat about my meals.&lt;/p&gt;

&lt;h2&gt;
  
  
  Designing Tools for My Own Use Cases
&lt;/h2&gt;

&lt;p&gt;Each tool maps to something I actually want to &lt;strong&gt;do&lt;/strong&gt;. I didn't want abstract functions; I wanted "register this meal" or "show me my data."&lt;/p&gt;

&lt;h3&gt;
  
  
  Saving My Meals to DB and Google Sheets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_register_nutritional_info_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@tool&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register_nutritional_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;calories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;proteins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;carbs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;fats&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;meal_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;extra_details&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;save_nutritional_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# This is me
&lt;/span&gt;            &lt;span class="n"&gt;calories&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;calories&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;proteins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;proteins&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;carbs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;carbs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;fats&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;meal_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;meal_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;extra_details&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;extra_details&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;spreadsheet_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_spreadsheet_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;spreadsheet_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;append_nutritional_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;calories&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;calories&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;proteins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;proteins&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;carbs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;carbs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;fats&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;meal_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;meal_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;extra_details&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;extra_details&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;record_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# DB is my source of truth; Sheets is best-effort
&lt;/span&gt;                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to append to my spreadsheet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Build a friendly summary for me
&lt;/span&gt;        &lt;span class="bp"&gt;...&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;register_nutritional_info&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;My database is the source of truth.&lt;/strong&gt; Google Sheets is a nice-to-have mirror. If Sheets fails, I don't lose my data; the meal is already saved in Supabase. This gives me peace of mind because I know my data is always safe.&lt;/p&gt;

&lt;h3&gt;
  
  
  Querying My Past Meals
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_query_nutritional_info_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@tool&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_nutritional_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_nutritional_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Querying my own history
&lt;/span&gt;            &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No nutritional records found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;T&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Date: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | Meal: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;meal_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calories: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;calories&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | Proteins: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;proteins&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;g | &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Carbs: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;carbs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;g | Fats: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fats&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;g&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query_nutritional_info&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I pre-format my records into simple text lines instead of dumping raw JSON. The model understands this better and can answer my questions like "what was my protein intake on Monday?" more reliably.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connecting My Google Sheets via OAuth
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_register_google_account_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redirect_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@tool&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register_google_account&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_spreadsheet_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your Google account is already connected. I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll keep saving meals there.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;redirect_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I need a valid redirect URL to start the Google authorization flow. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The server configuration seems incomplete.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;authorization_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_authorization_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redirect_uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;To enable Google Sheets integration, please authorize access using this link:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;authorization_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;register_google_account&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps all the OAuth complexity inside a tool. The agent just decides &lt;em&gt;when&lt;/em&gt; I need to connect my account and triggers the flow naturally in our conversation.&lt;/p&gt;
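&lt;p&gt;&lt;code&gt;get_authorization_url&lt;/code&gt; isn't shown here, but a Google OAuth2 consent URL is simple to build by hand, and this hypothetical stand-in sketches roughly what such a helper does (the client ID is a placeholder, and carrying &lt;code&gt;user_id&lt;/code&gt; in &lt;code&gt;state&lt;/code&gt; is my assumption):&lt;/p&gt;

```python
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
SHEETS_SCOPE = "https://www.googleapis.com/auth/spreadsheets"

def get_authorization_url(user_id: int, redirect_uri: str) -> str:
    # state round-trips the user_id, so the OAuth callback knows whose tokens arrived
    params = {
        "client_id": "YOUR_CLIENT_ID.apps.googleusercontent.com",  # placeholder
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": SHEETS_SCOPE,
        "access_type": "offline",  # required to receive a refresh token
        "prompt": "consent",
        "state": str(user_id),
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"

print(get_authorization_url(42, "https://example.com/oauth/callback"))
```

&lt;p&gt;&lt;code&gt;access_type=offline&lt;/code&gt; matters: without it Google won't issue a refresh token, and the Sheets mirror would stop working as soon as the first access token expired.&lt;/p&gt;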

&lt;h2&gt;
  
  
  My Memory System: Two Stores for Different Jobs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Supabase&lt;/strong&gt; is my core memory: my chats, messages, and nutritional records all live there. It's fast and reliable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google Sheets&lt;/strong&gt; is for me: I can see my data, build custom charts, and truly own it. But it's slower and sometimes fails, so it's a mirror, not the primary store.&lt;/p&gt;

&lt;p&gt;Here's how I ensure my spreadsheet exists before writing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ensure_spreadsheet_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Credentials&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_spreadsheet_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No spreadsheet config for my user_id=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;credentials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;ensure_valid_credentials&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;spreadsheet_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spreadsheet_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;spreadsheet_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;spreadsheet_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;create_spreadsheet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;verify_spreadsheet_has_headers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;spreadsheet_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;HttpError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;spreadsheet_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;create_spreadsheet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;spreadsheet_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;credentials&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This dual-store approach balances reliability with my need for ownership. I get a spreadsheet I control, but the app doesn't break if Google has issues.&lt;/p&gt;
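&lt;p&gt;That write path can be sketched like this. Everything here is an illustrative stub (the helper names and the in-memory "store" are not the project's actual API): the primary write must succeed, while the mirror write is wrapped so a Sheets outage only produces a log line.&lt;/p&gt;

```python
import asyncio
import logging

logger = logging.getLogger(__name__)
primary_store: list[dict] = []  # stands in for the Supabase table

async def save_meal_record(user_id: int, meal: dict) -> None:
    # Core memory: this write must succeed or the whole operation fails.
    primary_store.append({"user_id": user_id, **meal})

async def append_meal_to_sheet(user_id: int, meal: dict) -> None:
    # Simulate a flaky third-party service.
    raise ConnectionError("Google Sheets is unavailable")

async def save_meal(user_id: int, meal: dict) -> None:
    await save_meal_record(user_id, meal)
    # Best-effort mirror: failures are logged, never surfaced to the user.
    try:
        await append_meal_to_sheet(user_id, meal)
    except Exception:
        logger.exception("Sheets mirror write failed for user_id=%s", user_id)

asyncio.run(save_meal(1, {"food": "arepa", "calories": 320}))
print(len(primary_store))  # prints 1: the meal is saved despite the mirror failing
```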

&lt;h2&gt;Same Brain, Different Ways to Chat&lt;/h2&gt;

&lt;p&gt;The agent is just a class. I can talk to it however I want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Telegram&lt;/strong&gt;: I message my bot; it normalizes my messages (text, photos, documents), downloads media, and calls the agent. I use webhooks to keep it responsive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web UI&lt;/strong&gt;: I built a simple web interface that hits the same agent API. It creates chats with &lt;code&gt;chat_type="external"&lt;/code&gt; so the agent doesn't care if I'm using Telegram or the web.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent interface is stable. I could add WhatsApp, SMS, or anything else without changing the core AI logic.&lt;/p&gt;
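&lt;p&gt;The adapter idea above can be sketched as follows; the message shape, field names, and the &lt;code&gt;NutritionAgent&lt;/code&gt; stub are assumptions for illustration, not the real interface:&lt;/p&gt;

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class IncomingMessage:
    user_id: int
    text: str
    chat_type: str                        # "telegram" or "external"
    media_paths: list = field(default_factory=list)

class NutritionAgent:
    """Stub standing in for the real agent: one entry point, any channel."""
    async def analyze(self, message: IncomingMessage) -> str:
        return f"analyzed {message.text!r} from {message.chat_type}"

def from_telegram(update: dict) -> IncomingMessage:
    # A real Telegram adapter would also download photos/documents here.
    return IncomingMessage(
        user_id=update["from"]["id"],
        text=update.get("text", ""),
        chat_type="telegram",
    )

def from_web(payload: dict) -> IncomingMessage:
    return IncomingMessage(
        user_id=payload["user_id"],
        text=payload["text"],
        chat_type="external",
        media_paths=payload.get("media", []),
    )

tg = from_telegram({"from": {"id": 7}, "text": "two eggs"})
web = from_web({"user_id": 7, "text": "two eggs"})
agent = NutritionAgent()
print(asyncio.run(agent.analyze(tg)))   # analyzed 'two eggs' from telegram
print(asyncio.run(agent.analyze(web)))  # analyzed 'two eggs' from external
```

&lt;p&gt;Adding WhatsApp or SMS would mean writing one more adapter function; the agent itself never changes.&lt;/p&gt;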

&lt;h2&gt;Tracing and Logging Saved My Sanity&lt;/h2&gt;

&lt;p&gt;I added &lt;code&gt;@traceable&lt;/code&gt; from LangSmith around the main &lt;code&gt;analyze&lt;/code&gt; method. Suddenly I could see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exactly what the model received from me&lt;/li&gt;
&lt;li&gt;Every tool call and its arguments&lt;/li&gt;
&lt;li&gt;Where errors happened and how long things took&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also log my user ID, spreadsheet IDs, and macros to debug production issues.&lt;/p&gt;
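&lt;p&gt;The real &lt;code&gt;@traceable&lt;/code&gt; ships with the &lt;code&gt;langsmith&lt;/code&gt; package and sends traces to LangSmith's servers; the rough stdlib stand-in below only illustrates the kind of data such a decorator captures per call: inputs, output, errors, and timing.&lt;/p&gt;

```python
import functools
import time

TRACES: list[dict] = []  # LangSmith stores the real equivalent server-side

def traceable(fn):
    """Record inputs, output, errors, and timing for every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record = {"name": fn.__name__, "inputs": {"args": args, "kwargs": kwargs}}
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            record["output"] = result
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            record["duration_s"] = time.perf_counter() - start
            TRACES.append(record)
    return wrapper

@traceable
def analyze(text: str) -> str:
    # Stand-in for the agent's main entry point.
    return text.upper()

analyze("two eggs and coffee")
print(TRACES[0]["name"], TRACES[0]["output"])  # analyze TWO EGGS AND COFFEE
```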

&lt;p&gt;&lt;strong&gt;Real example&lt;/strong&gt;: When I built the Web UI, meal photos stopped working. In the traces, I saw the model wasn't receiving the images: the format was wrong. I fixed it in five minutes because the trace made it obvious.&lt;/p&gt;

&lt;h2&gt;What I Learned Building This for Myself&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Where agents are worth it&lt;/strong&gt;: When they orchestrate real tools and stateful systems (like a database, Sheets, and OAuth), not just when they chat. Each tool should map to a clear, real-world action I want to take.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What surprised me&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You don't need the most intelligent LLM to build a useful agent. A simple, well-written prompt and simple tools that capture the main features are often enough for a reliable, pleasant user experience.&lt;/li&gt;
&lt;li&gt;Context engineering is key. Understanding the tools and what information or context each tool provides is more important than loading the prompt with ultra-detailed instructions.&lt;/li&gt;
&lt;li&gt;Handling OAuth tokens, refresh flows, and "self-healing" spreadsheets (like recreating one if I accidentally delete it) was critical for making a reliable tool that depends on a third-party service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The main takeaway&lt;/strong&gt;: I've always loved building digital products that solve real problems; it's been my main career motivation. But this project was different. I had a personal problem, and I wasn't just building a "good enough" solution; I was able to build the perfect solution for my own needs. That gets me excited to build more and keep growing my skills with these new technologies.&lt;/p&gt;

&lt;p&gt;Starting with a no-code tool like n8n was great for testing ideas quickly. But for a product you might want to share or scale, investing in proper code architecture from the start saves you from rebuilding everything later.&lt;/p&gt;

&lt;p&gt;I can't say it was easy; I definitely leaned on my existing experience in software development. But it's a total game-changer. The way we can build products today is so different from even just a few years ago.&lt;/p&gt;

&lt;p&gt;The project is live at &lt;a href="https://nutriagent.juandago.dev" rel="noopener noreferrer"&gt;https://nutriagent.juandago.dev&lt;/a&gt; if you want to see what I built. The code is available on GitHub for the &lt;a href="https://github.com/juandastic/nutri-agent-bot" rel="noopener noreferrer"&gt;Agent&lt;/a&gt; and also for the &lt;a href="https://github.com/juandastic/nutri-agent-web" rel="noopener noreferrer"&gt;Web UI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Heads up&lt;/strong&gt;: Since this is a personal project, my Google Cloud account isn't verified. If you try connecting your Google account, you'll get a scary warning screen (Google's way of handling unverified apps). I don't store your credentials; it's just for writing to your own Sheets, but the warning looks dramatic.&lt;/p&gt;

&lt;p&gt;This was my journey, but I'd love to hear your thoughts. I'm excited to start sharing more updates on this project and other things I'm building. Let's continue the conversation on &lt;a href="https://x.com/juandastic" rel="noopener noreferrer"&gt;X&lt;/a&gt; or connect on &lt;a href="https://www.linkedin.com/in/juandastic/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>tooling</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
