<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Christian Alexander Nonis</title>
    <description>The latest articles on DEV Community by Christian Alexander Nonis (@christian_alexandernonis).</description>
    <link>https://dev.to/christian_alexandernonis</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2110029%2F7d94a8c9-4081-4b91-ac1f-23a6acb18307.JPG</url>
      <title>DEV Community: Christian Alexander Nonis</title>
      <link>https://dev.to/christian_alexandernonis</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/christian_alexandernonis"/>
    <language>en</language>
    <item>
      <title>From RAG to a “memory layer”: what building an AI assistant taught us</title>
      <dc:creator>Christian Alexander Nonis</dc:creator>
      <pubDate>Sun, 29 Mar 2026 16:36:07 +0000</pubDate>
      <link>https://dev.to/christian_alexandernonis/from-rag-to-a-memory-layer-what-building-an-ai-assistant-taught-us-4efm</link>
      <guid>https://dev.to/christian_alexandernonis/from-rag-to-a-memory-layer-what-building-an-ai-assistant-taught-us-4efm</guid>
      <description>&lt;p&gt;About a year and a half ago, we were building a proactive AI assistant.&lt;/p&gt;

&lt;p&gt;Not just a chatbot, but something that could actually act on your behalf.&lt;/p&gt;

&lt;p&gt;It could reply to emails in your tone, move calendar events, organize your inbox, and surface information based on what you actually care about.&lt;/p&gt;

&lt;p&gt;The goal was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;build something that feels like an extension of how you think.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;The part we didn’t expect&lt;/h2&gt;

&lt;p&gt;To make that work, we started with what most people use today: RAG.&lt;/p&gt;

&lt;p&gt;And to be fair, RAG works.&lt;/p&gt;

&lt;p&gt;You can go pretty far with chunking, embeddings, and retrieval.&lt;br&gt;
You can build systems that feel smart.&lt;/p&gt;

&lt;p&gt;But as the assistant got more complex, something started to break.&lt;/p&gt;

&lt;p&gt;Not in an obvious way.&lt;/p&gt;

&lt;p&gt;It was more subtle.&lt;/p&gt;

&lt;p&gt;The system could retrieve relevant information,&lt;br&gt;
but it didn’t really &lt;strong&gt;understand&lt;/strong&gt; how things were connected.&lt;/p&gt;

&lt;p&gt;Everything was based on similarity.&lt;/p&gt;

&lt;p&gt;And similarity is not structure.&lt;/p&gt;




&lt;h2&gt;Building a "brain"&lt;/h2&gt;

&lt;p&gt;To move forward, we needed something else.&lt;/p&gt;

&lt;p&gt;We started building what we internally called a "brain".&lt;/p&gt;

&lt;p&gt;A layer responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extracting meaning from data&lt;/li&gt;
&lt;li&gt;connecting concepts together&lt;/li&gt;
&lt;li&gt;maintaining a consistent structure over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the beginning, it was just a supporting component for the assistant.&lt;/p&gt;

&lt;p&gt;But the deeper we went, the clearer it became: this was the real problem.&lt;/p&gt;

&lt;p&gt;About 7 months ago, we made a decision:&lt;br&gt;
we stopped focusing on the assistant itself&lt;br&gt;
and went all-in on this layer.&lt;/p&gt;

&lt;p&gt;That became BrainAPI.&lt;/p&gt;




&lt;h2&gt;From retrieval to structure&lt;/h2&gt;

&lt;p&gt;The shift can be summarized like this.&lt;/p&gt;

&lt;p&gt;Typical RAG pipeline:&lt;br&gt;
chunk -&amp;gt; embed -&amp;gt; retrieve -&amp;gt; generate&lt;/p&gt;

&lt;p&gt;What we moved toward:&lt;br&gt;
ingest -&amp;gt; extract -&amp;gt; connect -&amp;gt; graph -&amp;gt; query&lt;/p&gt;

&lt;p&gt;Instead of treating data as independent chunks,&lt;br&gt;
we process it into a structured representation of entities and relationships.&lt;/p&gt;

&lt;p&gt;In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;documents are parsed into concepts&lt;/li&gt;
&lt;li&gt;relationships are extracted and normalized&lt;/li&gt;
&lt;li&gt;everything is stored in a graph + vector layer&lt;/li&gt;
&lt;/ul&gt;
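
&lt;p&gt;To make the difference concrete, here is a deliberately tiny sketch (toy data, naive scoring; not how BrainAPI is implemented): chunks ranked by similarity on one side, explicit triples you can walk on the other.&lt;/p&gt;

```python
# Toy contrast: similarity over chunks vs. structure over triples.
# (Illustration only; not BrainAPI's actual pipeline.)

chunks = [
    "Mary is getting married next year.",
    "The wedding will be in Rome.",
    "Rome is the capital of Italy.",
]

def retrieve(query_terms):
    """Rank chunks by naive term overlap (a stand-in for embedding similarity)."""
    scored = [(len(query_terms.intersection(c.lower().split())), c) for c in chunks]
    return [c for score, c in sorted(scored, reverse=True) if score]

triples = [
    ("Mary", "getting_married", "next year"),
    ("Mary_wedding", "located_in", "Rome"),
    ("Rome", "capital_of", "Italy"),
]

def neighbors(entity):
    """Everything directly connected to an entity, with the relation name."""
    out = [(p, o) for s, p, o in triples if s == entity]
    out += [(p, s) for s, p, o in triples if o == entity]
    return out

print(retrieve({"wedding"}))   # ranked text snippets
print(neighbors("Rome"))       # typed connections you can keep traversing
```

&lt;p&gt;The similarity side can only hand back text; the triple side answers "how is Rome connected?" directly.&lt;/p&gt;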

&lt;p&gt;Vectors are still useful,&lt;br&gt;
but they are no longer the primary abstraction.&lt;/p&gt;

&lt;p&gt;The graph is.&lt;/p&gt;




&lt;h2&gt;What changes in practice&lt;/h2&gt;

&lt;p&gt;This changes how you interact with data.&lt;/p&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"what text is similar to this query?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what entities are involved?&lt;/li&gt;
&lt;li&gt;how are they connected?&lt;/li&gt;
&lt;li&gt;what paths exist between concepts?&lt;/li&gt;
&lt;li&gt;what else is related in this context?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Retrieval becomes navigation.&lt;/p&gt;
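
&lt;p&gt;"Navigation" here is literal graph traversal. A breadth-first search over a toy concept graph (hypothetical data) answers the "what paths exist between concepts?" question above:&lt;/p&gt;

```python
from collections import deque

# A tiny undirected concept graph (hypothetical data, for illustration).
edges = [
    ("Mary", "wedding"), ("wedding", "Rome"),
    ("Rome", "Italy"), ("Mary", "guest list"),
]

def path(start, goal):
    """BFS: the shortest chain of concepts linking start to goal, or None."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    queue, seen = deque([[start]]), {start}
    while queue:
        p = queue.popleft()
        if p[-1] == goal:
            return p
        for nxt in adj.get(p[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(p + [nxt])
    return None

print(path("Mary", "Italy"))  # ['Mary', 'wedding', 'Rome', 'Italy']
```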




&lt;h2&gt;Where this approach helps&lt;/h2&gt;

&lt;p&gt;We found this particularly useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;context spans multiple sources and long stretches of time&lt;/li&gt;
&lt;li&gt;relationships matter more than keywords&lt;/li&gt;
&lt;li&gt;consistency is important (not just relevance)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some practical use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;recommendation systems (ecommerce, social)&lt;/li&gt;
&lt;li&gt;search systems that go beyond keyword matching&lt;/li&gt;
&lt;li&gt;persistent memory for agents and chatbots&lt;/li&gt;
&lt;li&gt;more reliable RAG setups in complex domains&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Exploring "polarities"&lt;/h2&gt;

&lt;p&gt;One interesting direction we’ve been exploring is something we call &lt;strong&gt;polarities&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of returning a single "best" answer,&lt;br&gt;
the system can surface a range of possible solutions around a problem,&lt;br&gt;
based on how concepts relate in the graph.&lt;/p&gt;

&lt;p&gt;It’s less about ranking results,&lt;br&gt;
and more about exploring a solution space.&lt;/p&gt;
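
&lt;p&gt;The post doesn't spell out the mechanics, but one way to picture "polarities" is: keep the best candidate from each distinct region of the graph instead of a single global winner. A toy sketch with made-up candidates and scores:&lt;/p&gt;

```python
# Hypothetical candidates, each tagged with the graph region it came from.
# (Made-up data and scoring; not BrainAPI's actual algorithm.)
candidates = [
    {"answer": "cache results",      "region": "performance", "score": 0.91},
    {"answer": "add an index",       "region": "performance", "score": 0.87},
    {"answer": "denormalize schema", "region": "data model",  "score": 0.80},
    {"answer": "shard by tenant",    "region": "scaling",     "score": 0.78},
]

def top_per_region(cands):
    """One strongest candidate per region: a solution space, not a single hit."""
    best = {}
    for c in cands:
        if c["score"] > best.get(c["region"], {"score": -1.0})["score"]:
            best[c["region"]] = c
    return sorted(best.values(), key=lambda c: -c["score"])

for c in top_per_region(candidates):
    print(c["region"], "::", c["answer"])
```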




&lt;h2&gt;Why this matters&lt;/h2&gt;

&lt;p&gt;At Lumen Labs (our startup), this direction came from a broader observation.&lt;/p&gt;

&lt;p&gt;AI systems today are powerful,&lt;br&gt;
but they are also fragile in how they represent knowledge.&lt;/p&gt;

&lt;p&gt;They retrieve well.&lt;br&gt;
They generate well.&lt;/p&gt;

&lt;p&gt;But they don’t really &lt;strong&gt;ground&lt;/strong&gt; information in a consistent structure.&lt;/p&gt;

&lt;p&gt;And that’s where a lot of issues come from,&lt;br&gt;
especially when accuracy actually matters.&lt;/p&gt;

&lt;p&gt;If we want systems that people can rely on,&lt;br&gt;
we need something closer to a structured memory layer.&lt;/p&gt;




&lt;h2&gt;Open sourcing it&lt;/h2&gt;

&lt;p&gt;We’ve been using this approach in production for a few B2B use cases,&lt;br&gt;
but never exposed it publicly.&lt;/p&gt;

&lt;p&gt;Now we’re opening it up.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the core is open source&lt;/li&gt;
&lt;li&gt;it can run fully locally (we’ve tested it with Ollama + offline setups)&lt;/li&gt;
&lt;li&gt;or be deployed as managed instances in the cloud&lt;/li&gt;
&lt;li&gt;it’s extensible via a plugin system&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Closing thoughts&lt;/h2&gt;

&lt;p&gt;We don’t think this replaces RAG.&lt;/p&gt;

&lt;p&gt;But it feels like RAG is one component of a bigger system,&lt;br&gt;
not the system itself.&lt;/p&gt;

&lt;p&gt;After spending the last year and a half building on top of AI systems,&lt;br&gt;
this "memory layer" is the piece that felt missing.&lt;/p&gt;

&lt;p&gt;Curious to hear how others are approaching this,&lt;br&gt;
especially if you’ve hit similar limitations with chunk-based retrieval.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/Lumen-Labs/brainapi2" rel="noopener noreferrer"&gt;https://github.com/Lumen-Labs/brainapi2&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Website / Video: &lt;a href="https://brain-api.dev" rel="noopener noreferrer"&gt;https://brain-api.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>rag</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Giving LLMs Real Memory: Why It’s Hard, and How BrainAPI Solves It</title>
      <dc:creator>Christian Alexander Nonis</dc:creator>
      <pubDate>Sat, 09 Aug 2025 07:27:19 +0000</pubDate>
      <link>https://dev.to/christian_alexandernonis/giving-llms-real-memory-why-its-hard-and-how-brainapi-solves-it-eod</link>
      <guid>https://dev.to/christian_alexandernonis/giving-llms-real-memory-why-its-hard-and-how-brainapi-solves-it-eod</guid>
      <description>&lt;p&gt;Large Language Models (LLMs) are impressive. They can write code, answer questions, and chat fluently on almost any topic.&lt;br&gt;&lt;br&gt;
But there’s one fundamental flaw:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;LLMs have no memory.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Unless you manually feed the model your entire conversation history each time, it will forget everything.&lt;br&gt;&lt;br&gt;
For developers, this creates a lot of friction when building anything that needs &lt;em&gt;persistence&lt;/em&gt; or &lt;em&gt;context&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In this article, we’ll:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explore &lt;strong&gt;why LLM memory is a hard problem&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;See &lt;strong&gt;how BrainAPI approaches it&lt;/strong&gt; with a structured, hybrid memory architecture&lt;/li&gt;
&lt;li&gt;Walk through a &lt;strong&gt;mini tutorial&lt;/strong&gt; to integrate it into your project&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;Why LLM Memory Is Hard&lt;/h2&gt;

&lt;p&gt;LLMs are &lt;strong&gt;stateless&lt;/strong&gt; by design. Each prompt is processed independently.&lt;br&gt;&lt;br&gt;
When you ask a follow-up question, the model doesn’t “remember” the previous answer — it only knows what you explicitly include in the input.&lt;/p&gt;

&lt;p&gt;Developers try to work around this with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt; — storing chunks of data in a vector DB and fetching relevant ones per query&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt stuffing&lt;/strong&gt; — appending conversation history to every prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual state tracking&lt;/strong&gt; — keeping facts in variables or databases and re-injecting them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RAG struggles with &lt;strong&gt;multi-turn continuity&lt;/strong&gt; (“What was the second method again?”)&lt;/li&gt;
&lt;li&gt;Prompt stuffing &lt;strong&gt;bloats tokens&lt;/strong&gt; and drives up cost&lt;/li&gt;
&lt;li&gt;Coreference issues — “it” and “she” become ambiguous without entity tracking&lt;/li&gt;
&lt;li&gt;No &lt;em&gt;high-level awareness&lt;/em&gt; — the bot can’t easily remember your goals, preferences, or evolving context&lt;/li&gt;
&lt;/ul&gt;
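
&lt;p&gt;The token-bloat problem is easy to see in code: with prompt stuffing, every turn resends the entire history, so per-turn cost grows linearly and total session cost grows quadratically. A minimal sketch (word count as a crude token proxy):&lt;/p&gt;

```python
# Prompt stuffing: the whole history rides along on every request.
history = []

def stuffed_prompt(user_msg):
    history.append("user: " + user_msg)
    return "\n".join(history)  # everything so far, resent each call

sizes = []
for i in range(1, 6):
    prompt = stuffed_prompt("message number " + str(i))
    sizes.append(len(prompt.split()))  # word count as a crude token proxy

print(sizes)  # [4, 8, 12, 16, 20] -- the prompt grows every single turn
```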

&lt;p&gt;We need something that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Understands&lt;/strong&gt; entities and relationships
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stores&lt;/strong&gt; facts and knowledge in a structured way
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieves&lt;/strong&gt; context intelligently, not just by keyword similarity
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tracks&lt;/strong&gt; conversation at both a &lt;em&gt;detail&lt;/em&gt; and &lt;em&gt;summary&lt;/em&gt; level&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;Introducing BrainAPI&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://brainapi.lumen-labs.ai/docs" rel="noopener noreferrer"&gt;&lt;strong&gt;BrainAPI&lt;/strong&gt;&lt;/a&gt; by Lumen Labs is an &lt;strong&gt;on-demand memory layer&lt;/strong&gt; for LLM applications.&lt;br&gt;&lt;br&gt;
It’s accessible via Python and Node.js SDKs, and it handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Storing conversation messages&lt;/li&gt;
&lt;li&gt;Injecting static or dynamic knowledge&lt;/li&gt;
&lt;li&gt;Retrieving relevant context for the current query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key differences vs. simple RAG:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coreference resolution&lt;/strong&gt; — normalizes references so “she” and “Mary” are connected&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Triplet-based knowledge graph&lt;/strong&gt; — facts are stored as &lt;code&gt;subject → predicate → object&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid retrieval&lt;/strong&gt; — combines graph traversal and vector similarity search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-level observation layer&lt;/strong&gt; — summaries of user goals, topics, and context slices&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;How It Works Under the Hood&lt;/h2&gt;

&lt;p&gt;The architecture has &lt;strong&gt;five layers&lt;/strong&gt;:&lt;/p&gt;
&lt;h3&gt;1. Coreference Resolution&lt;/h3&gt;

&lt;p&gt;Ensures entity consistency across messages.&lt;br&gt;&lt;br&gt;
Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Mary is getting married next year. She wants it in Rome."
→ "Mary is getting married next year. Mary wants the wedding in Rome."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Currently using &lt;a href="https://github.com/biu-nlp/fastcoref" rel="noopener noreferrer"&gt;&lt;code&gt;fastcoref&lt;/code&gt;&lt;/a&gt; in Python; exploring a faster C++ rule-based resolver.&lt;/p&gt;




&lt;h3&gt;2. Triplet Extraction &amp;amp; Embedding&lt;/h3&gt;

&lt;p&gt;From each message or knowledge chunk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extract &lt;strong&gt;subject-predicate-object&lt;/strong&gt; triples
&lt;/li&gt;
&lt;li&gt;Embed &lt;strong&gt;whole phrase&lt;/strong&gt; and &lt;strong&gt;individual entities&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wikify&lt;/strong&gt; entity names to avoid duplicates (e.g. "NYC" → "New York City")&lt;/li&gt;
&lt;/ul&gt;
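
&lt;p&gt;A real extractor uses ML models, but the shape of the output is easy to show. Here is a deliberately crude rule-based sketch (hypothetical patterns and alias table; BrainAPI's actual extraction and wikification are presumably far more robust):&lt;/p&gt;

```python
import re

# Crude subject/predicate/object matcher for simple declarative sentences.
# Group 1 = subject, group 2 = predicate, group 3 = object.
PATTERN = re.compile(r"^([A-Z][\w ]*?) (is|wants|lives in|works at) (.+?)\.?$")

# Toy "wikification" table: map surface forms to canonical entity names.
ALIASES = {"NYC": "New York City"}

def extract_triple(sentence):
    m = PATTERN.match(sentence)
    if m is None:
        return None
    subj, pred, obj = m.group(1), m.group(2), m.group(3)
    return (ALIASES.get(subj, subj), pred.replace(" ", "_"), ALIASES.get(obj, obj))

print(extract_triple("Mary lives in NYC."))  # ('Mary', 'lives_in', 'New York City')
```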




&lt;h3&gt;3. Storage Backend&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Neo4j&lt;/strong&gt; — the knowledge graph
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pinecone&lt;/strong&gt; — vector embeddings for semantic search
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MongoDB&lt;/strong&gt; — raw text chunks, logs, and metadata&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;4. Hybrid Retrieval&lt;/h3&gt;

&lt;p&gt;When asked &lt;em&gt;“Mary’s wedding date”&lt;/em&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract &lt;code&gt;(Mary)&lt;/code&gt; and &lt;code&gt;(wedding date)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Search Neo4j for subject = “Mary”
&lt;/li&gt;
&lt;li&gt;Traverse edges for exact match on object
&lt;/li&gt;
&lt;li&gt;If no exact match, run vector search on connected nodes
&lt;/li&gt;
&lt;li&gt;If no subject found, run vector search for closest parent entity&lt;/li&gt;
&lt;/ol&gt;
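
&lt;p&gt;The fallback chain above can be sketched over in-memory stand-ins (a dict for the graph, substring matching standing in for vector similarity; obviously not the production path):&lt;/p&gt;

```python
# Stand-ins: a dict for Neo4j, substring overlap for vector similarity.
graph = {
    "Mary": {"wedding date": "next May", "wedding city": "Rome"},
}

def query(subject, attribute):
    # Step 2: look up the subject node
    node = graph.get(subject)
    if node is None:
        # Step 5: no exact subject; fall back to fuzzy entity lookup
        matches = [s for s in graph if subject.lower() in s.lower()]
        node = graph[matches[0]] if matches else None
    if node is None:
        return None
    # Step 3: traverse edges for an exact match on the object
    if attribute in node:
        return node[attribute]
    # Step 4: no exact match; "semantic" search over connected edges
    for edge, value in node.items():
        if any(word in edge for word in attribute.split()):
            return value
    return None

print(query("Mary", "wedding date"))  # exact edge hit
print(query("mary", "wedding day"))   # fuzzy subject plus near-match edge
```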




&lt;h3&gt;5. High-Level LLM Observations&lt;/h3&gt;

&lt;p&gt;A summarization layer produces &lt;em&gt;structured observations&lt;/em&gt; every few turns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Topics discussed&lt;/li&gt;
&lt;li&gt;User goals&lt;/li&gt;
&lt;li&gt;Relevant constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These summaries give the bot &lt;strong&gt;bird’s-eye awareness&lt;/strong&gt; without flooding the context window.&lt;/p&gt;
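
&lt;p&gt;Mechanically, an observation layer can be as simple as a rolling window: every N turns, compress the window into one compact note. In a real system the compression step is an LLM call; this sketch fakes it with truncation:&lt;/p&gt;

```python
SUMMARIZE_EVERY = 3
turns, observations = [], []

def record_turn(role, text):
    turns.append((role, text))
    if len(turns) % SUMMARIZE_EVERY == 0:
        window = turns[-SUMMARIZE_EVERY:]
        # A real system would ask an LLM to summarize the window;
        # here we just join truncated snippets.
        observations.append({
            "covers_turns": (len(turns) - SUMMARIZE_EVERY + 1, len(turns)),
            "summary": " / ".join(text[:20] for _, text in window),
        })

for msg in ["plan a conference", "venue should be in Rome",
            "budget is 10k euros", "prefer dates in May"]:
    record_turn("user", msg)

print(observations)  # one compact note so far, covering turns 1-3
```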




&lt;h2&gt;When to Use BrainAPI&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Documentation Bots&lt;/strong&gt; — remember context between Q&amp;amp;A and follow-ups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goal-Oriented Assistants&lt;/strong&gt; — persist user preferences and constraints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Educational Tutors&lt;/strong&gt; — track student progress and personalize lessons&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal AI Companions&lt;/strong&gt; — maintain continuity across days or weeks&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Mini Tutorial: Adding Memory to Your Bot&lt;/h2&gt;

&lt;p&gt;Let’s add BrainAPI to a Python chatbot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Install the SDK&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;lumen-brain
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Save incoming messages&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from lumen_brain import LumenBrainDriver
driver = LumenBrainDriver("your-api-key")

driver.save_message(
    memory_uuid="project-chat-memory",
    content="I’m planning a conference in Rome next May.",
    role="user",
    conversation_id="conv-001"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Inject Knowledge&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;driver.inject_knowledge(
    memory_uuid="project-chat-memory",
    type="file",
    content="Our conference venue options include the Colosseum and Forum."
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Retrieve relevant context&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;result = driver.query_memory(
    text="When is the conference happening again?",
    memory_uuid="project-chat-memory",
    conversation_id="conv-001"
)

response = llm.invoke({ "input": "When is the conference happening again?" + result.context })
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
  </channel>
</rss>
