<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Oleksander</title>
    <description>The latest articles on DEV Community by Oleksander (@teolex2020).</description>
    <link>https://dev.to/teolex2020</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3794675%2Fdd280b1a-a0c5-4942-a1a0-9a6ea303a68f.jpeg</url>
      <title>DEV Community: Oleksander</title>
      <link>https://dev.to/teolex2020</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/teolex2020"/>
    <language>en</language>
    <item>
      <title>Why AI Needs an External Cognitive Layer Beyond Memory</title>
      <dc:creator>Oleksander</dc:creator>
      <pubDate>Thu, 02 Apr 2026 16:30:15 +0000</pubDate>
      <link>https://dev.to/teolex2020/why-ai-needs-an-external-cognitive-layer-beyond-memory-3f55</link>
      <guid>https://dev.to/teolex2020/why-ai-needs-an-external-cognitive-layer-beyond-memory-3f55</guid>
      <description>&lt;h1&gt;
  
  
  Why AI Needs an External Cognitive Layer Beyond Memory
&lt;/h1&gt;

&lt;p&gt;Most AI agents today are still built around a thin pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a large language model,&lt;/li&gt;
&lt;li&gt;a prompt,&lt;/li&gt;
&lt;li&gt;a tool loop,&lt;/li&gt;
&lt;li&gt;and some form of memory or retrieval.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That stack can look impressive in demos, but it breaks down once the agent needs continuity, specialization, self-consistency, and long-lived behavioral control.&lt;/p&gt;

&lt;p&gt;Memory alone is not enough.&lt;/p&gt;

&lt;p&gt;If an agent only stores past records, it can remember what happened. It still cannot reliably:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;form stable beliefs,&lt;/li&gt;
&lt;li&gt;build concepts over time,&lt;/li&gt;
&lt;li&gt;learn causal structure,&lt;/li&gt;
&lt;li&gt;accumulate policies,&lt;/li&gt;
&lt;li&gt;generate internal pressure,&lt;/li&gt;
&lt;li&gt;anticipate future outcomes,&lt;/li&gt;
&lt;li&gt;detect epistemic gaps,&lt;/li&gt;
&lt;li&gt;or regulate its own mode of operation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the gap we have been exploring in Aura.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Thesis
&lt;/h2&gt;

&lt;p&gt;AI systems need an external cognitive layer that lives outside model weights.&lt;/p&gt;

&lt;p&gt;Not just a vector database.&lt;br&gt;
Not just a chat history.&lt;br&gt;
Not just a memory API.&lt;/p&gt;

&lt;p&gt;A real cognitive layer should be able to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;preserve continuity across sessions,&lt;/li&gt;
&lt;li&gt;accumulate knowledge in structured form,&lt;/li&gt;
&lt;li&gt;survive model upgrades,&lt;/li&gt;
&lt;li&gt;support domain specialization,&lt;/li&gt;
&lt;li&gt;remain inspectable and governed,&lt;/li&gt;
&lt;li&gt;and shape agent behavior over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because current LLMs are powerful, but they are still weak at stable long-horizon cognition. They are excellent inference engines. They are not yet sufficient as complete cognitive architectures.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Memory to Cognition
&lt;/h2&gt;

&lt;p&gt;In Aura, the architecture has gradually moved beyond simple memory.&lt;/p&gt;

&lt;p&gt;The working progression is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Record&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Belief&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Concept&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Causal&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Policy&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That already changes the role of memory.&lt;/p&gt;

&lt;p&gt;The system is no longer just storing facts. It is organizing experience into a structured cognitive state.&lt;/p&gt;

&lt;p&gt;And once that structure exists, new layers become possible.&lt;/p&gt;
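
&lt;p&gt;As a sketch only (the field names below are illustrative assumptions, not the actual Aura schema), the progression can be pictured as increasingly structured data:&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical shapes for the five layers; fields are illustrative,
# not the real Aura schema.

@dataclass
class Record:            # raw observation, as stored
    text: str
    trust: float = 0.5

@dataclass
class Belief:            # weighted claim distilled from records
    claim: str
    confidence: float
    evidence: list = field(default_factory=list)

@dataclass
class Concept:           # stable abstraction over repeated beliefs
    name: str
    beliefs: list = field(default_factory=list)

@dataclass
class Causal:            # learned cause -> effect pattern
    cause: str
    effect: str
    strength: float

@dataclass
class Policy:            # advisory hint derived from causal structure
    action: str          # "Prefer" / "Avoid" / "Warn"
    domain: str
    description: str

r = Record("Staging deploy prevented 3 production incidents")
b = Belief("staging deploys reduce incidents", confidence=0.7, evidence=[r])
```

&lt;p&gt;Each step up the stack trades raw detail for structure the agent can act on.&lt;/p&gt;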

&lt;h2&gt;
  
  
  The Next Four Cognitive Functions
&lt;/h2&gt;

&lt;p&gt;The recent evolution of the system can be summarized in four steps:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Want
&lt;/h3&gt;

&lt;p&gt;The system should not only react to prompts.&lt;/p&gt;

&lt;p&gt;It should also detect internal tensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unresolved policy pressure,&lt;/li&gt;
&lt;li&gt;contradictions,&lt;/li&gt;
&lt;li&gt;unstable structure,&lt;/li&gt;
&lt;li&gt;pending cognitive obligations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those tensions can flow into drives, goals, and imperative-like internal pressure.&lt;/p&gt;

&lt;p&gt;That is the beginning of motivation.&lt;/p&gt;
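
&lt;p&gt;A minimal sketch of that step, with hypothetical tension sources and thresholds (nothing here is Aura's actual internals): scan the cognitive state for unresolved items and promote anything above a threshold into a drive.&lt;/p&gt;

```python
def detect_tensions(state):
    """Collect internal tensions from a cognitive state dict.

    Each tension is (kind, magnitude); the kinds mirror the list above.
    """
    tensions = []
    for policy in state.get("policies", []):
        if policy.get("pressure", 0.0) > 0.0:
            tensions.append(("policy_pressure", policy["pressure"]))
    for _pair in state.get("contradictions", []):
        tensions.append(("contradiction", 1.0))
    for obligation in state.get("pending", []):
        tensions.append(("obligation", obligation.get("urgency", 0.5)))
    return tensions

def tensions_to_drives(tensions, threshold=0.6):
    # Only tensions above the threshold become active drives.
    return [kind for kind, magnitude in tensions if magnitude >= threshold]

state = {
    "policies": [{"name": "staging-first", "pressure": 0.8}],
    "contradictions": [("deploy ok", "deploy caused outage")],
    "pending": [{"task": "re-ground belief", "urgency": 0.3}],
}
drives = tensions_to_drives(detect_tensions(state))
# → ["policy_pressure", "contradiction"]; the low-urgency obligation stays dormant
```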

&lt;h3&gt;
  
  
  2. Expect
&lt;/h3&gt;

&lt;p&gt;A cognitive system should not only remember the past.&lt;/p&gt;

&lt;p&gt;It should form expectations about what is likely to happen next.&lt;/p&gt;

&lt;p&gt;From stable causal structure, it can produce predictions.&lt;br&gt;
From mismatches between expectation and observation, it can produce surprise.&lt;/p&gt;

&lt;p&gt;That turns cognition from retrospective to anticipatory.&lt;/p&gt;
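
&lt;p&gt;A toy version of the loop (the rule format is invented for illustration): stored causal links yield a prediction, and a mismatch between prediction and observation yields a surprise signal proportional to how confident the expectation was.&lt;/p&gt;

```python
# Toy causal links: cause -> (expected effect, strength)
causal_links = {
    "deploy_to_prod_without_staging": ("incident", 0.9),
    "deploy_via_staging": ("no_incident", 0.8),
}

def expect(event):
    """Return (predicted effect, confidence) for an observed cause, if known."""
    return causal_links.get(event)

def surprise(event, observed_effect):
    """Surprise = confidence of a prediction the observation contradicted."""
    prediction = expect(event)
    if prediction is None:
        return 0.0                       # no expectation, nothing to violate
    predicted_effect, confidence = prediction
    return 0.0 if observed_effect == predicted_effect else confidence

# The expected effect occurs: no surprise.
assert surprise("deploy_via_staging", "no_incident") == 0.0
# A confident expectation is violated: high surprise, worth learning from.
assert surprise("deploy_to_prod_without_staging", "no_incident") == 0.9
```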

&lt;h3&gt;
  
  
  3. Wonder
&lt;/h3&gt;

&lt;p&gt;A capable system should not only repair contradictions.&lt;/p&gt;

&lt;p&gt;It should also notice what it does not know.&lt;/p&gt;

&lt;p&gt;Epistemic gaps matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;weakly grounded entities,&lt;/li&gt;
&lt;li&gt;missing causal mechanisms,&lt;/li&gt;
&lt;li&gt;underspecified policy dependencies,&lt;/li&gt;
&lt;li&gt;repeated blind spots,&lt;/li&gt;
&lt;li&gt;ambiguous concept boundaries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the beginning of curiosity.&lt;/p&gt;
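
&lt;p&gt;One way to make gap detection concrete (the scoring rules are assumptions for this sketch): score each entity by how weakly grounded it is, then surface the worst offenders as curiosity targets.&lt;/p&gt;

```python
def epistemic_gaps(entities, min_evidence=2, max_gaps=3):
    """Rank entities by how weakly grounded they are.

    An entity with little evidence or no known causal mechanism
    scores higher, i.e. is a bigger gap.
    """
    scored = []
    for name, info in entities.items():
        gap = 0.0
        if min_evidence > len(info.get("evidence", [])):
            gap += 1.0                       # weakly grounded
        if not info.get("mechanism"):
            gap += 0.5                       # missing causal mechanism
        if gap > 0:
            scored.append((gap, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_gaps]]

entities = {
    "staging_deploys": {"evidence": ["r1", "r2", "r3"], "mechanism": "catches bugs"},
    "friday_releases": {"evidence": ["r4"], "mechanism": "release pressure"},
    "canary_rollouts": {"evidence": [], "mechanism": None},
}
gaps = epistemic_gaps(entities)
# → ["canary_rollouts", "friday_releases"]; the well-grounded entity is not a gap
```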

&lt;h3&gt;
  
  
  4. Regulate
&lt;/h3&gt;

&lt;p&gt;A cognitive system should not always behave in exactly the same mode.&lt;/p&gt;

&lt;p&gt;Under pressure, it may need to become more conservative.&lt;br&gt;
Under stability, it may be able to explore.&lt;/p&gt;

&lt;p&gt;This is not about emotional theater.&lt;br&gt;
It is about regulation.&lt;/p&gt;

&lt;p&gt;A global modulation layer can shape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;drive thresholds,&lt;/li&gt;
&lt;li&gt;curiosity thresholds,&lt;/li&gt;
&lt;li&gt;exploration budget,&lt;/li&gt;
&lt;li&gt;and behavioral selectivity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the beginning of self-regulation.&lt;/p&gt;
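
&lt;p&gt;A sketch of such a modulation layer, with invented knobs and formulas: a single global pressure signal in [0, 1] shifts thresholds and budgets everywhere at once.&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Modulation:
    drive_threshold: float = 0.5      # how strong a tension must be to act
    curiosity_threshold: float = 0.5  # how big a gap must be to explore
    exploration_budget: int = 5       # speculative actions allowed per cycle

def regulate(pressure):
    """Map global pressure in [0, 1] to a behavioral mode.

    High pressure: conservative. Raise thresholds, shrink the budget.
    Low pressure: exploratory. Lower thresholds, grow the budget.
    """
    pressure = max(0.0, min(1.0, pressure))
    return Modulation(
        drive_threshold=0.3 + 0.6 * pressure,
        curiosity_threshold=0.3 + 0.6 * pressure,
        exploration_budget=round(8 * (1.0 - pressure)),
    )

calm = regulate(0.1)      # stable: explore freely
crisis = regulate(0.9)    # under pressure: act conservatively
```

&lt;p&gt;The same tensions and gaps are still there; what changes is how readily the system acts on them.&lt;/p&gt;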

&lt;h2&gt;
  
  
  Why This Should Live Outside the Model
&lt;/h2&gt;

&lt;p&gt;This is the most important architectural point.&lt;/p&gt;

&lt;p&gt;If all cognition lives only inside model weights, you lose too much:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;portability,&lt;/li&gt;
&lt;li&gt;auditability,&lt;/li&gt;
&lt;li&gt;versioning,&lt;/li&gt;
&lt;li&gt;organization-level control,&lt;/li&gt;
&lt;li&gt;inspectability,&lt;/li&gt;
&lt;li&gt;and long-lived continuity across changing model generations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An external cognitive layer can survive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model upgrades,&lt;/li&gt;
&lt;li&gt;shell changes,&lt;/li&gt;
&lt;li&gt;deployment changes,&lt;/li&gt;
&lt;li&gt;domain swaps,&lt;/li&gt;
&lt;li&gt;and organizational adaptation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes it more durable than any single model interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters Commercially
&lt;/h2&gt;

&lt;p&gt;This is not only a research direction.&lt;br&gt;
It is also a product direction.&lt;/p&gt;

&lt;p&gt;A governed external cognitive layer enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;specialist cognitive bases,&lt;/li&gt;
&lt;li&gt;organization-specific overlays,&lt;/li&gt;
&lt;li&gt;persistent agent continuity,&lt;/li&gt;
&lt;li&gt;safer multi-step behavior,&lt;/li&gt;
&lt;li&gt;and explainable adaptation without retraining.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That creates a path beyond generic chat agents.&lt;/p&gt;

&lt;p&gt;Instead of selling only an agent, you can sell:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a cognitive substrate,&lt;/li&gt;
&lt;li&gt;a specialist module,&lt;/li&gt;
&lt;li&gt;and an organization layer that persists over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why This Matters Even If Model Architectures Change
&lt;/h2&gt;

&lt;p&gt;A common objection is:&lt;/p&gt;

&lt;p&gt;"What if future models already include better internal cognition?"&lt;/p&gt;

&lt;p&gt;That does not remove the need for an external layer.&lt;/p&gt;

&lt;p&gt;Even if models become far more capable, organizations will still need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;governance,&lt;/li&gt;
&lt;li&gt;portability,&lt;/li&gt;
&lt;li&gt;ownership,&lt;/li&gt;
&lt;li&gt;rollback,&lt;/li&gt;
&lt;li&gt;specialist control,&lt;/li&gt;
&lt;li&gt;and cognition that survives vendor changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the long-term bet is not:&lt;/p&gt;

&lt;p&gt;"models will stay weak."&lt;/p&gt;

&lt;p&gt;The better bet is:&lt;/p&gt;

&lt;p&gt;"portable, governed cognition will still matter even when models get stronger."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Direction
&lt;/h2&gt;

&lt;p&gt;The future of agent systems is unlikely to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model only,&lt;/li&gt;
&lt;li&gt;prompt only,&lt;/li&gt;
&lt;li&gt;or memory only.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It will likely require a distinct cognitive layer that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accumulate structured knowledge,&lt;/li&gt;
&lt;li&gt;generate internal motivational pressure,&lt;/li&gt;
&lt;li&gt;anticipate,&lt;/li&gt;
&lt;li&gt;explore,&lt;/li&gt;
&lt;li&gt;regulate,&lt;/li&gt;
&lt;li&gt;and remain externally governed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the direction we think is worth building toward.&lt;/p&gt;

&lt;p&gt;Not just better memory for AI.&lt;/p&gt;

&lt;p&gt;A real cognitive layer beyond memory.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I am currently building and testing this cognitive architecture in a closed environment. If you are an AI architect, researcher, or founder hitting the limits of RAG and standard agent loops, my DMs are open. I’d love to compare notes on the future of autonomous cognition.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>Pennsylvania State found why AI memory fails across models. AuraSDK doesn't have this problem.</title>
      <dc:creator>Oleksander</dc:creator>
      <pubDate>Fri, 27 Mar 2026 16:52:05 +0000</pubDate>
      <link>https://dev.to/teolex2020/pennsylvania-state-found-why-ai-memory-fails-across-models-aurasdk-doesnt-have-this-problem-579</link>
      <guid>https://dev.to/teolex2020/pennsylvania-state-found-why-ai-memory-fails-across-models-aurasdk-doesnt-have-this-problem-579</guid>
      <description>&lt;p&gt;Pennsylvania State University just published a paper that exposes a structural flaw in how most AI agent memory systems work.&lt;/p&gt;

&lt;p&gt;The paper is called &lt;a href="https://arxiv.org/abs/2603.23234" rel="noopener noreferrer"&gt;MemCollab: Cross-Agent Memory Collaboration via Contrastive Trajectory Distillation&lt;/a&gt;. The findings are uncomfortable if you're building agent memory the conventional way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The flaw
&lt;/h2&gt;

&lt;p&gt;Most agent memory systems work like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Model solves a problem&lt;/li&gt;
&lt;li&gt;Memory stores the reasoning trace — what the model did, how it got there&lt;/li&gt;
&lt;li&gt;Model retrieves that memory later and performs better&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The assumption buried inside this design: &lt;em&gt;the stored knowledge is about the task, not about the model that solved it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Pennsylvania State tested whether that assumption holds.&lt;/p&gt;

&lt;p&gt;They gave a 7B model's memory to a 32B model. &lt;strong&gt;MATH500 dropped from 63.8% to 50.6%. HumanEval dropped from 68.3% to 34.1%.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Then they gave the 32B model's memory to the 7B model. &lt;strong&gt;Performance dropped again. Both directions failed. Both fell below the zero-memory baseline.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Giving a model someone else's memory made it perform worse than having no memory at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this happens
&lt;/h2&gt;

&lt;p&gt;A model's reasoning traces don't just capture what the correct answer required. They capture &lt;em&gt;how that specific model thinks&lt;/em&gt; — its preferred solving strategies, its heuristic shortcuts, its stylistic patterns.&lt;/p&gt;

&lt;p&gt;Memory distilled from those traces encodes the model's reasoning personality alongside the actual task knowledge. When a different model retrieves that memory, it gets handed instructions optimized for a completely different cognitive architecture. The guidance actively interferes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What MemCollab does
&lt;/h2&gt;

&lt;p&gt;MemCollab fixes this by making memory construction cross-model. Two agents — a smaller and a larger model — independently solve the same problem. One succeeds, one fails. The system contrasts the trajectories and extracts only the abstract invariants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What reasoning principle was present in the success and violated in the failure?&lt;/li&gt;
&lt;li&gt;What error pattern appeared in the failure that the success avoided?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The extracted memory stores only those rules — not the solution, not the reasoning style, not the model-specific heuristics.&lt;/p&gt;
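
&lt;p&gt;In spirit (this is a schematic of the contrastive idea, not the paper's implementation), the extraction keeps only the steps that separate the successful trajectory from the failed one:&lt;/p&gt;

```python
def contrastive_invariants(success_steps, failure_steps):
    """Keep only the rules that separate success from failure.

    Each trajectory is treated as a set of abstract reasoning steps.
    Steps shared by both trajectories carry no signal; steps unique to
    the success become 'do' rules, steps unique to the failure become
    'avoid' rules.
    """
    success, failure = set(success_steps), set(failure_steps)
    return {
        "do": sorted(success - failure),
        "avoid": sorted(failure - success),
    }

# Two models solve the same problem; one succeeds, one fails.
success = ["restate constraints", "check edge cases", "verify units"]
failure = ["restate constraints", "guess pattern from first example"]

memory = contrastive_invariants(success, failure)
# → {"do": ["check edge cases", "verify units"],
#    "avoid": ["guess pattern from first example"]}
```

&lt;p&gt;The shared step ("restate constraints") is dropped: it appeared in both trajectories, so it cannot explain the difference in outcome.&lt;/p&gt;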

&lt;p&gt;Results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Llama 3 8B: MATH500 from 27.4% → 42.4%&lt;/li&gt;
&lt;li&gt;Qwen 7B: MATH500 from 52.2% → 67.0%, HumanEval from 42.7% → 74.4%&lt;/li&gt;
&lt;li&gt;Reasoning turns cut from 3.3 → 1.5 on HumanEval (fewer dead ends)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The deeper insight
&lt;/h2&gt;

&lt;p&gt;The efficiency finding is the one that gets overlooked. MemCollab doesn't just improve accuracy — it makes agents reach correct answers in fewer steps. The contrastive memory isn't adding more guidance. It's stripping out the noise that was making agents explore dead ends repeatedly.&lt;/p&gt;

&lt;p&gt;By encoding &lt;em&gt;what not to do&lt;/em&gt; as explicitly as &lt;em&gt;what to do&lt;/em&gt;, the memory prunes the search space before the agent even starts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AuraSDK doesn't have this problem
&lt;/h2&gt;

&lt;p&gt;AuraSDK avoids the contamination problem structurally — by never storing reasoning traces at all.&lt;/p&gt;

&lt;p&gt;When you store something in AuraSDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Staging deploy prevented 3 production incidents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;semantic_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fact&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deployment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You're storing a &lt;strong&gt;claim about the world&lt;/strong&gt;, not a record of how a model reasoned about it. The cognitive layers — Belief, Concept, Causal, Policy — are derived from the content of what was observed, not from the model's processing of it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Record → Belief → Concept → Causal → Policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer is built deterministically from the one below. Beliefs emerge from clusters of records. Causal patterns emerge from temporal co-occurrence and explicit links. Policy hints emerge from repeated causal patterns. None of this touches model internals.&lt;/p&gt;
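
&lt;p&gt;As a toy illustration of that determinism (grouping records by shared tags is a deliberate simplification, not Aura's actual clustering):&lt;/p&gt;

```python
from collections import defaultdict

def derive_beliefs(records, min_support=2):
    """Derive beliefs deterministically from clusters of records.

    A belief forms when enough records share a tag; its confidence is
    the mean trust of the supporting records. No model is involved.
    """
    clusters = defaultdict(list)
    for record in records:
        for tag in record["tags"]:
            clusters[tag].append(record)
    beliefs = []
    for tag, group in sorted(clusters.items()):
        if len(group) >= min_support:
            confidence = sum(r["trust"] for r in group) / len(group)
            beliefs.append({"claim": tag, "confidence": round(confidence, 2),
                            "support": len(group)})
    return beliefs

records = [
    {"text": "Staging deploy prevented 3 incidents", "tags": ["staging"], "trust": 0.8},
    {"text": "User always deploys to staging first", "tags": ["staging"], "trust": 0.6},
    {"text": "Friday release caused an outage", "tags": ["friday"], "trust": 0.9},
]
beliefs = derive_beliefs(records)
# → [{"claim": "staging", "confidence": 0.7, "support": 2}]
```

&lt;p&gt;Run it twice on the same records and you get the same beliefs, which is the point of keeping the derivation model-free.&lt;/p&gt;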

&lt;p&gt;The result: &lt;strong&gt;the cognitive layer is model-agnostic by design.&lt;/strong&gt; Swap GPT-4o for Claude, swap Claude for Llama — the stored memory, the belief structure, the causal patterns, the policy hints all remain valid. There's nothing model-specific to contaminate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two different approaches to the same insight
&lt;/h2&gt;

&lt;p&gt;MemCollab and AuraSDK arrive at the same conclusion from different directions:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Memory that encodes how a model thinks is fragile. Memory that encodes what happened is durable.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;MemCollab fixes contamination after the fact — by contrasting two models' traces and extracting only what survived.&lt;/p&gt;

&lt;p&gt;AuraSDK avoids contamination by construction — by never storing traces in the first place.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;MemCollab&lt;/th&gt;
&lt;th&gt;AuraSDK&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What's stored&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Abstract reasoning invariants across models&lt;/td&gt;
&lt;td&gt;Claims, facts, relationships&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Requires LLM to build memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes — two models per problem&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model-agnostic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes — by contrastive distillation&lt;/td&gt;
&lt;td&gt;Yes — by design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Works offline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Fully&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recall latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LLM-bound&lt;/td&gt;
&lt;td&gt;0.076ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cognitive layers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Belief → Concept → Causal → Policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Research paper&lt;/td&gt;
&lt;td&gt;MIT, ships today&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What this means for the field
&lt;/h2&gt;

&lt;p&gt;The Pennsylvania State paper validates something important: &lt;strong&gt;the right unit of memory is not a reasoning trace.&lt;/strong&gt; It's the abstract principle that holds regardless of which model does the reasoning.&lt;/p&gt;

&lt;p&gt;AuraSDK takes this further: the right unit of memory is a structured observation about the world — a fact, a decision, a contradiction, a preference — that any model can retrieve and use without being handed someone else's cognitive fingerprint.&lt;/p&gt;

&lt;p&gt;The field is converging on this. The implementations differ. But the core insight is the same.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aura-memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/teolex2020/AuraSDK" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>agents</category>
    </item>
    <item>
      <title>Google's TurboQuant solves half the AI memory problem. Here's the other half.</title>
      <dc:creator>Oleksander</dc:creator>
      <pubDate>Wed, 25 Mar 2026 16:49:10 +0000</pubDate>
      <link>https://dev.to/teolex2020/googles-turboquant-solves-half-the-ai-memory-problem-heres-the-other-half-44if</link>
      <guid>https://dev.to/teolex2020/googles-turboquant-solves-half-the-ai-memory-problem-heres-the-other-half-44if</guid>
      <description>&lt;p&gt;This week Google Research published &lt;a href="https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/" rel="noopener noreferrer"&gt;TurboQuant&lt;/a&gt; — a two-stage KV-cache quantization algorithm that achieves 6x memory reduction and 8x attention speedup with zero accuracy loss at 3 bits. No training required.&lt;/p&gt;

&lt;p&gt;It's genuinely impressive engineering. But it's worth being precise about what problem it solves.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two AI memory problems
&lt;/h2&gt;

&lt;p&gt;Most people conflate two distinct problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem A: memory within a session&lt;/strong&gt;&lt;br&gt;
As context grows, the KV-cache grows. It becomes expensive in RAM and slow in attention computation. TurboQuant solves this — brilliantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem B: memory between sessions&lt;/strong&gt;&lt;br&gt;
When the session ends, the KV-cache is gone. The model starts from zero next time. No memory of past interactions, no accumulated patterns, no structured experience. TurboQuant doesn't touch this.&lt;/p&gt;

&lt;h2&gt;
  
  
  What TurboQuant actually does
&lt;/h2&gt;

&lt;p&gt;TurboQuant is a two-stage pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PolarQuant&lt;/strong&gt; — rotates vectors randomly, converts to polar coordinates, quantizes components without needing per-block normalization constants. This eliminates the 1–2 bit overhead that traditional quantization methods carry.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;QJL (Quantized Johnson-Lindenstrauss)&lt;/strong&gt; — encodes residual error with a single sign bit. Zero memory overhead.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
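
&lt;p&gt;The two stages can be sketched on a single 2-D block (a deliberately crude toy: the real algorithm works on randomly rotated high-dimensional blocks and packs everything into 3 bits, none of which this sketch reproduces):&lt;/p&gt;

```python
import math

def polar_quantize_pair(x, y, bits=3):
    """Stage 1 sketch: store a 2-D block as (radius, quantized angle).

    Quantizing the angle needs no per-block normalization constant,
    which is the overhead PolarQuant eliminates. The radius is kept in
    full precision here purely to keep the toy simple.
    """
    levels = 2 ** bits
    r = math.hypot(x, y)
    theta = math.atan2(y, x)                         # angle in [-pi, pi]
    q = round((theta + math.pi) / (2 * math.pi) * (levels - 1))
    theta_hat = q / (levels - 1) * 2 * math.pi - math.pi
    return r * math.cos(theta_hat), r * math.sin(theta_hat)

def sign_bit_residual(x, x_hat):
    """Stage 2 sketch: a QJL-style correction keeps only the residual's sign."""
    return 1 if x - x_hat >= 0 else -1

x, y = 0.8, -0.6
x_hat, y_hat = polar_quantize_pair(x, y)
err = math.hypot(x - x_hat, y - y_hat)
assert math.hypot(x, y) > err        # coarse, but most of the vector survives
```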

&lt;p&gt;Result: 3-bit KV-cache, 6x compression, 8x speedup, zero accuracy degradation on LongBench, Needle-in-a-Haystack, RULER, and ZeroSCROLLS benchmarks.&lt;/p&gt;

&lt;p&gt;This makes long-context inference significantly cheaper and faster. Real value.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gap it leaves open
&lt;/h2&gt;

&lt;p&gt;The moment the session ends — the KV-cache is gone.&lt;/p&gt;

&lt;p&gt;Week 1 with any model: average answers.&lt;br&gt;
Week 4 with any model: still average answers. It forgot everything.&lt;/p&gt;

&lt;p&gt;Fine-tuning costs thousands of dollars and weeks. RAG gives you retrieval, not cognition. Context windows bill per token and still reset.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we built for Problem B
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/teolex2020/AuraSDK" rel="noopener noreferrer"&gt;AuraSDK&lt;/a&gt; is a local cognitive substrate that sits outside model weights.&lt;/p&gt;

&lt;p&gt;It accumulates structured experience across sessions through a 5-layer pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Record → Belief → Concept → Causal → Policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer is derived deterministically from the one below — no LLM, no embeddings. Policy hints like "deploy to staging first" aren't written by anyone. They emerge from repeated causal patterns in stored experience.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Aura&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./agent_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Staging deploy prevented 3 production incidents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User always deploys to staging first&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# after run_maintenance(), the cognitive stack derives:
&lt;/span&gt;&lt;span class="n"&gt;hints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_surfaced_policy_hints&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# → [{"action": "Prefer", "domain": "workflow", "description": "deploy to staging first"}]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What v1.5.4 adds:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Autonomous cognitive plasticity — the substrate observes model output and updates itself. No fine-tuning. Full audit trail.&lt;/li&gt;
&lt;li&gt;Salience weighting — what matters persists longer, decays slower&lt;/li&gt;
&lt;li&gt;Contradiction governance — conflicting evidence surfaced explicitly, not averaged silently&lt;/li&gt;
&lt;/ul&gt;
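
&lt;p&gt;Salience weighting, as a sketch (the decay formula is an assumption, not the shipped one): every record's trust decays exponentially, but salience stretches the half-life.&lt;/p&gt;

```python
def retained_trust(trust, salience, age_days, base_half_life=7.0):
    """Exponentially decay a record's trust over time.

    Salience in [0, 1] stretches the half-life: what matters persists
    longer, what doesn't fades quickly.
    """
    half_life = base_half_life * (1.0 + 4.0 * salience)
    return trust * 0.5 ** (age_days / half_life)

routine = retained_trust(trust=0.8, salience=0.1, age_days=14)
critical = retained_trust(trust=0.8, salience=0.9, age_days=14)
assert critical > routine            # the salient record outlives the routine one
```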

&lt;p&gt;&lt;strong&gt;Performance (1,000 records, Ryzen 7, v1.5.4):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store: 0.91ms&lt;/li&gt;
&lt;li&gt;Recall: 0.076ms (~2,600× faster than Mem0)&lt;/li&gt;
&lt;li&gt;Recall (cached): 1.4µs&lt;/li&gt;
&lt;li&gt;Maintenance cycle: 15ms median&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No API keys. No cloud. No LLM dependency. ~3MB binary. Fully offline. MIT license.&lt;/p&gt;

&lt;h2&gt;
  
  
  The full picture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;TurboQuant&lt;/th&gt;
&lt;th&gt;AuraSDK&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Problem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;KV-cache overhead within session&lt;/td&gt;
&lt;td&gt;No memory between sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Approach&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Quantization of attention keys/values&lt;/td&gt;
&lt;td&gt;Persistent cognitive substrate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scope&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single inference pass&lt;/td&gt;
&lt;td&gt;Cross-session accumulation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Requires LLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (runs inside it)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Works offline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Fully&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Research paper&lt;/td&gt;
&lt;td&gt;MIT, ships today&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are complementary. TurboQuant makes inference cheaper in the moment. AuraSDK makes the model smarter over time.&lt;/p&gt;

&lt;p&gt;The field needs both.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aura-memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/teolex2020/AuraSDK" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Your AI forgets everything — this layer fixes that without retraining</title>
      <dc:creator>Oleksander</dc:creator>
      <pubDate>Tue, 24 Mar 2026 16:19:10 +0000</pubDate>
      <link>https://dev.to/teolex2020/your-ai-forgets-everything-this-layer-fixes-that-without-retraining-nlf</link>
      <guid>https://dev.to/teolex2020/your-ai-forgets-everything-this-layer-fixes-that-without-retraining-nlf</guid>
      <description>&lt;p&gt;Your AI model forgets everything after every conversation.&lt;/p&gt;

&lt;p&gt;Not because it’s bad — because it has no memory system.&lt;/p&gt;

&lt;p&gt;RAG helps retrieve context.&lt;br&gt;
Fine-tuning helps adjust behavior.&lt;/p&gt;

&lt;p&gt;But neither actually gives your system memory.&lt;/p&gt;

&lt;p&gt;This article shows a different approach:&lt;br&gt;
a cognitive layer that sits outside the model&lt;br&gt;
and gets smarter over time — while the model stays frozen.&lt;/p&gt;


&lt;h2&gt;
  
  
  What the cognitive layer actually does
&lt;/h2&gt;

&lt;p&gt;AuraSDK builds a 5-layer structure from whatever the model and users store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Record → Belief → Concept → Causal → Policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer is derived from the one below — without LLM, without embeddings, locally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Record&lt;/strong&gt;: raw stored fact with trust score, provenance, decay rate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Belief&lt;/strong&gt;: competing hypotheses about the same claim, epistemically weighted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concept&lt;/strong&gt;: stable abstractions over repeated beliefs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Causal&lt;/strong&gt;: learned cause→effect patterns from co-occurring evidence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy&lt;/strong&gt;: advisory hints — Prefer, Avoid, Warn — that emerge from causal structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing in layers 2–5 is hand-authored. They emerge from what's stored and observed over time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Level&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enable_full_cognitive_stack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Staging deploy prevented 3 production incidents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deploy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Direct prod deploy caused outage in Q3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deploy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# After maintenance:
&lt;/span&gt;&lt;span class="n"&gt;hints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_surfaced_policy_hints&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# → [{"action": "Prefer", "domain": "deploy", "description": "staging before production"}]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That policy hint was not written by anyone. The causal layer found the pattern. The policy layer surfaced it.&lt;/p&gt;




&lt;h2&gt;
  
  
  v1.5.4: the three things that were missing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The substrate now learns from the model's own output
&lt;/h3&gt;

&lt;p&gt;Before v1.5.4, the cognitive layer only knew what you explicitly stored. Now it observes model responses and updates itself.&lt;/p&gt;

&lt;p&gt;Claims are extracted. Confirmations strengthen existing beliefs. Contradictions raise volatility. The substrate evolves from inference — without retraining, without an external API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;capture&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;capture_experience&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How should we handle this deploy?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;retrieved_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Always verify staging health checks before pushing to production.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_inference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ingest_experience_batch&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;capture&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_maintenance&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# cognitive layer updated — next recall is different
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Safety bounds (non-negotiable):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generated claims capped at 0.70 confidence — cannot overwrite recorded facts&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PlasticityMode::Off&lt;/code&gt; by default — nothing changes without explicit opt-in&lt;/li&gt;
&lt;li&gt;Every mutation writes to an audit trail traceable to the prompt that caused it&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;purge_inference_records()&lt;/code&gt; — clean rollback when needed&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;freeze_namespace_plasticity("medical")&lt;/code&gt; — some domains must never adapt from inference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recorded facts always win over model inference. Always.&lt;/p&gt;
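&lt;p&gt;To make the cap concrete, here is a minimal sketch (my own illustration, not AuraSDK's internals) of how a hard confidence ceiling on inferred claims guarantees that recorded facts can never be displaced:&lt;/p&gt;

```python
# Illustrative sketch, not the actual AuraSDK implementation: a confidence
# cap on model-inferred claims ensures recorded facts always win.

INFERENCE_CONFIDENCE_CAP = 0.70  # assumption: mirrors the 0.70 cap described above

def admit_claim(beliefs, claim):
    """Admit a claim into a belief store, respecting the inference cap.

    beliefs: dict mapping claim text -> (confidence, source)
    claim:   (text, confidence, source), source is "recorded" or "inference"
    """
    text, confidence, source = claim
    if source == "inference":
        confidence = min(confidence, INFERENCE_CONFIDENCE_CAP)
    current = beliefs.get(text)
    # A recorded fact is never overwritten by an inferred claim.
    if current is not None and current[1] == "recorded" and source == "inference":
        return beliefs
    if current is None or confidence > current[0]:
        beliefs[text] = (confidence, source)
    return beliefs

beliefs = {}
admit_claim(beliefs, ("staging deploys are safer", 0.95, "recorded"))
admit_claim(beliefs, ("staging deploys are safer", 0.99, "inference"))
# the recorded fact survives; the inferred claim was capped and rejected
```

&lt;p&gt;The same cap means a purely inferred claim enters the store at no more than 0.70, so later recorded evidence always has room to override it.&lt;/p&gt;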

&lt;h3&gt;
  
  
  2. The substrate now knows what matters
&lt;/h3&gt;

&lt;p&gt;High recall frequency and high significance are not the same thing. A trivial fact mentioned 20 times should not outrank a critical decision mentioned once.&lt;/p&gt;

&lt;p&gt;v1.5.4 adds salience weighting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mark_record_salience&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salience&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → this record resists decay, ranks higher, gets preserved longer
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
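&lt;p&gt;As a rough illustration of why salience has to be weighted separately from raw frequency, here is a toy scoring function (my own sketch, not AuraSDK's actual ranking; the weights are arbitrary):&lt;/p&gt;

```python
# Toy sketch of salience-weighted ranking, not AuraSDK's real scoring:
# a record seen once with high salience can outrank one recalled 20 times
# with low salience, because frequency is log-damped.
import math

def score(recall_count, salience, w_freq=0.2, w_sal=0.8):
    # log1p damps repetition so frequency alone cannot dominate
    return w_freq * math.log1p(recall_count) + w_sal * salience

trivial  = score(recall_count=20, salience=0.1)  # mentioned 20 times
critical = score(recall_count=1,  salience=0.9)  # mentioned once, marked salient
# with these weights, critical > trivial
```

&lt;p&gt;The design point is the damping: without it, any fact repeated often enough would eventually bury a critical decision mentioned once.&lt;/p&gt;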



&lt;p&gt;Maintenance now also produces bounded reflection summaries: recurring blockers, unresolved tensions, patterns that keep appearing. Not "feelings" — structured synthesis from what's actually stored.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Contradictions are now first-class, not silently averaged
&lt;/h3&gt;

&lt;p&gt;Before: conflicting evidence was weighted and averaged. The conflict was invisible.&lt;/p&gt;

&lt;p&gt;Now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;clusters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_contradiction_clusters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_contradiction_review_queue&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Recall explanations carry explicit markers: &lt;em&gt;"this recommendation depends on unresolved evidence."&lt;/em&gt; The operator sees the friction. The user can be told honestly.&lt;/p&gt;
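&lt;p&gt;The core idea behind contradiction clustering can be sketched in a few lines (illustrative only; the real &lt;code&gt;get_contradiction_clusters()&lt;/code&gt; surface is richer): group claims by subject and flag any group whose polarities disagree, instead of averaging them away.&lt;/p&gt;

```python
# Minimal sketch of contradiction clustering (not the SDK's implementation):
# conflicting evidence is surfaced as a cluster, never silently averaged.
from collections import defaultdict

def contradiction_clusters(claims):
    """claims: list of (subject, polarity, confidence) tuples."""
    by_subject = defaultdict(list)
    for subject, polarity, confidence in claims:
        by_subject[subject].append((polarity, confidence))
    # a cluster is contradictory when both polarities are present
    return {
        subject: entries
        for subject, entries in by_subject.items()
        if {p for p, _ in entries} == {True, False}
    }

claims = [
    ("celery_is_adequate", True, 0.6),
    ("celery_is_adequate", False, 0.8),
    ("staging_first", True, 0.9),
]
clusters = contradiction_clusters(claims)
# only "celery_is_adequate" surfaces as an unresolved conflict
```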




&lt;h2&gt;
  
  
  What else ships in v1.5.4
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Concept persistence&lt;/strong&gt; — concepts used to reset on every restart. Now they survive, so the 5-layer stack stays intact across sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Belief reranking active by default&lt;/strong&gt; — in v1.5.3, &lt;code&gt;BeliefRerankMode::Off&lt;/code&gt; was the default. The cognitive stack was engineered but not running. Now it runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production integrity&lt;/strong&gt; — startup validation, persistence manifest, concept partition cap for large corpora.&lt;/p&gt;




&lt;h2&gt;
  
  
  Explainability is built in
&lt;/h2&gt;

&lt;p&gt;Every recall decision is traceable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;explanation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;explain_recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deployment decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → which records matched, why, what belief groups they belong to,
#   what salience contributed, whether unresolved evidence is present
&lt;/span&gt;
&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;provenance_chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → full trace from policy hint back to source records
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not logging. It is structural explainability derived from the cognitive layer itself.&lt;/p&gt;
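&lt;p&gt;Conceptually, a provenance chain is just a walk over parent links from a derived artifact back to its source records. A hypothetical sketch (the real &lt;code&gt;provenance_chain()&lt;/code&gt; returns a structured trace; the node ids here are invented):&lt;/p&gt;

```python
# Illustrative provenance walk, not the SDK's actual data model:
# follow parent links from a policy hint back to the raw records.
def provenance_chain(node_id, parents):
    """parents: dict mapping a derived node id -> list of ids it came from."""
    chain, frontier = [], [node_id]
    while frontier:
        current = frontier.pop()
        chain.append(current)
        frontier.extend(parents.get(current, []))
    return chain

parents = {
    "policy:prefer-staging": ["causal:staging->fewer-incidents"],
    "causal:staging->fewer-incidents": ["record:1", "record:2"],
}
chain = provenance_chain("policy:prefer-staging", parents)
# chain covers the hint, the causal link, and both source records
```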




&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;Benchmarked on 1,000 records, Windows 10 / Ryzen 7:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;th&gt;vs Mem0&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Store&lt;/td&gt;
&lt;td&gt;0.09 ms&lt;/td&gt;
&lt;td&gt;~same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recall&lt;/td&gt;
&lt;td&gt;0.74 ms&lt;/td&gt;
&lt;td&gt;~270× faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recall (cached)&lt;/td&gt;
&lt;td&gt;0.48 µs&lt;/td&gt;
&lt;td&gt;~400,000× faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintenance&lt;/td&gt;
&lt;td&gt;1.1 ms&lt;/td&gt;
&lt;td&gt;no equivalent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Mem0 recall requires an embedding API call (~200ms+). AuraSDK recall is pure local computation. No embeddings required. No external service.&lt;/p&gt;




&lt;h2&gt;
  
  
  The positioning in one sentence
&lt;/h2&gt;

&lt;p&gt;AuraSDK is not a vector database. Not a RAG wrapper. Not a fine-tuning platform. Not a generic agent framework.&lt;/p&gt;

&lt;p&gt;It is a governable cognitive substrate for frozen AI models — the layer that makes them smarter, more consistent, and more explainable over time, without touching their weights.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aura-memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Try in browser (no install): &lt;a href="https://colab.research.google.com/github/teolex2020/AuraSDK/blob/main/examples/colab_quickstart.ipynb" rel="noopener noreferrer"&gt;Open in Colab&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/teolex2020/AuraSDK" rel="noopener noreferrer"&gt;teolex2020/AuraSDK&lt;/a&gt; — MIT license, patent pending (US 63/969,703)&lt;/p&gt;

&lt;p&gt;Built in Kyiv, Ukraine 🇺🇦&lt;/p&gt;




&lt;p&gt;What would you build with a model that actually accumulates structured experience over time?&lt;/p&gt;

</description>
      <category>rust</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I tested the same AI model against itself. Memory won 4/5.</title>
      <dc:creator>Oleksander</dc:creator>
      <pubDate>Wed, 18 Mar 2026 13:23:16 +0000</pubDate>
      <link>https://dev.to/teolex2020/i-tested-the-same-ai-model-against-itself-memory-won-45-336k</link>
      <guid>https://dev.to/teolex2020/i-tested-the-same-ai-model-against-itself-memory-won-45-336k</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvutkq5layeg3y2crmtd.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvutkq5layeg3y2crmtd.JPG" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The experiment
&lt;/h2&gt;

&lt;p&gt;Same model. Same 5 questions. One difference: one side had persistent memory via AuraSDK, the other had none.&lt;/p&gt;

&lt;p&gt;Both sides used Gemini 2.5 Flash-Lite — identical model, identical cost per token.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result: the side with memory won 4 of 5 questions and used 48% fewer tokens.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What the questions tested
&lt;/h2&gt;

&lt;p&gt;Real Python dev scenarios — the kind where generic answers aren't enough:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;"I'm writing a new async function to fetch user orders from the DB. What patterns should I follow?"&lt;/li&gt;
&lt;li&gt;"We need background email sending when an order is completed. How should we implement it?"&lt;/li&gt;
&lt;li&gt;"I'm writing tests for the new payment module. Walk me through the setup."&lt;/li&gt;
&lt;li&gt;"Code review: &lt;code&gt;results = list(db.query(Order).filter(...))&lt;/code&gt; — what's wrong?"&lt;/li&gt;
&lt;li&gt;"The reports endpoint is slow — 3 seconds for 200 users. How do I debug and fix this?"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These questions have correct generic answers. But they also have &lt;em&gt;specific&lt;/em&gt; answers — if you know the team's stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the model with memory knew
&lt;/h2&gt;

&lt;p&gt;AuraSDK had loaded 15 records of accumulated team knowledge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We use arq for task queues, not Celery — switched because Celery was too heavy&lt;/li&gt;
&lt;li&gt;Production incident: missing await on async DB call silently returned a coroutine. 2 hours to debug.&lt;/li&gt;
&lt;li&gt;SQLAlchemy lazy loading doesn't work with async sessions — always use selectinload() or joinedload()&lt;/li&gt;
&lt;li&gt;Code review caught: someone did list(queryset) and loaded 50k rows into memory. Lint rule added.&lt;/li&gt;
&lt;li&gt;N+1 queries found in the user reports endpoint — fixed with a single joined query&lt;/li&gt;
&lt;li&gt;Test setup: pytest + pytest-asyncio, transactions that rollback after each test, coverage must stay above 85%&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What happened question by question
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q2 — Background email sending:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With memory (1,307 tokens):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Use arq — we switched from Celery because it was too heavy for our scale. Define an arq job decorated with &lt;a class="mentioned-user" href="https://dev.to/arq"&gt;@arq&lt;/a&gt;.func, enqueue it when order completes."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Without memory (3,203 tokens):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Here are several approaches: FastAPI BackgroundTasks for simple cases, Celery for robust production use..."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Same model. One knew the decision. One gave a textbook overview.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q4 — Code review:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With memory (829 tokens):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"This is a direct violation of our established practice. We had an incident where this loaded 50,000 rows into memory. A lint rule was added specifically to prevent this."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Without memory (1,604 tokens):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"This is generally considered an anti-pattern in SQLAlchemy. Here's a breakdown of what's wrong..."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How AuraSDK works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Level&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./agent_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enable_full_cognitive_stack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# store team knowledge
&lt;/span&gt;&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;We use arq, not Celery — switched because Celery was too heavy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Production incident: list(queryset) loaded 50k rows into memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Decisions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lesson-learned&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# recall before answering — &amp;lt;1ms, no API call
&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;background email sending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token_budget&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# inject into prompt
&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TEAM CONTEXT:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Answer using this context.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No embeddings. No vector database. No LLM calls during learning. Pure local Rust computation.&lt;/p&gt;
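&lt;p&gt;To see how recall can work without embeddings at all, here is a deliberately simple token-overlap ranker (my own toy sketch; AuraSDK's Rust engine uses its own, more sophisticated ranking):&lt;/p&gt;

```python
# Toy sketch of embedding-free recall: score stored records against a
# query by token overlap, entirely locally, with no model or API call.
# Not AuraSDK's actual algorithm.
def recall(query, records, top_k=2):
    query_tokens = set(query.lower().split())
    scored = []
    for record in records:
        overlap = len(query_tokens & set(record.lower().split()))
        if overlap:
            scored.append((overlap, record))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record for _, record in scored[:top_k]]

records = [
    "We use arq for background task queues, not Celery",
    "Coverage must stay above 85%",
    "Background email sending goes through the task queue",
]
hits = recall("background email sending", records)
# the email-sending record ranks first; the coverage record is never returned
```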

&lt;h2&gt;
  
  
  The cognitive pipeline
&lt;/h2&gt;

&lt;p&gt;AuraSDK doesn't just store and retrieve text. Every record goes through 5 layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Record → Belief → Concept → Causal → Policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Belief&lt;/strong&gt;: groups related observations, resolves contradictions with confidence scores&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concept&lt;/strong&gt;: discovers stable topic clusters across beliefs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Causal&lt;/strong&gt;: finds cause-effect patterns from temporal and explicit links&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy&lt;/strong&gt;: derives behavioral hints (Prefer / Avoid / Warn) from causal patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After enough interactions, the system surfaces this automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;hints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_surfaced_policy_hints&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# [{"action": "Prefer", "domain": "dev", "description": "use arq over celery for task queues"}]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nobody wrote that rule. The system derived it from the pattern of stored observations.&lt;/p&gt;
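&lt;p&gt;The derivation step can be sketched as simple evidence counting (illustrative only; the SDK's causal and policy layers are more involved, and the threshold here is an invented parameter): outcomes that co-occur with an action accumulate until a Prefer or Avoid hint clears the bar.&lt;/p&gt;

```python
# Sketch of policy derivation from observations, not the SDK's causal layer:
# a hint is emitted once enough evidence accumulates for one outcome.
from collections import Counter

def derive_hints(observations, threshold=2):
    """observations: list of (action, outcome) with outcome 'good' or 'bad'."""
    tallies = Counter(observations)
    hints = []
    for (action, outcome), count in tallies.items():
        if count >= threshold:
            verb = "Prefer" if outcome == "good" else "Avoid"
            hints.append({"action": verb, "description": action})
    return hints

observations = [
    ("use arq for task queues", "good"),
    ("use arq for task queues", "good"),
    ("deploy directly to prod", "bad"),  # only one observation: below threshold
]
hints = derive_hints(observations)
# one hint surfaces: Prefer "use arq for task queues"
```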

&lt;h2&gt;
  
  
  The token math
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;With memory&lt;/th&gt;
&lt;th&gt;Without memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Q1&lt;/td&gt;
&lt;td&gt;1,200 tokens&lt;/td&gt;
&lt;td&gt;1,545 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q2&lt;/td&gt;
&lt;td&gt;1,307 tokens&lt;/td&gt;
&lt;td&gt;3,203 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q3&lt;/td&gt;
&lt;td&gt;1,923 tokens&lt;/td&gt;
&lt;td&gt;4,067 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q4&lt;/td&gt;
&lt;td&gt;829 tokens&lt;/td&gt;
&lt;td&gt;1,604 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q5&lt;/td&gt;
&lt;td&gt;1,294 tokens&lt;/td&gt;
&lt;td&gt;2,155 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6,553 tokens&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12,574 tokens&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;48% fewer tokens. The memory layer doesn't add bloat — it gives the model exactly what it needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it compares
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;AuraSDK&lt;/th&gt;
&lt;th&gt;Mem0&lt;/th&gt;
&lt;th&gt;Zep&lt;/th&gt;
&lt;th&gt;Letta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM required for learning&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works offline&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Fully&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;With local LLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recall latency&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt;1ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~200ms+&lt;/td&gt;
&lt;td&gt;~200ms&lt;/td&gt;
&lt;td&gt;LLM-bound&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-derives behavioral policies&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary size&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~3MB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~50MB+&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Python pkg&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aura-memory
python examples/demo.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open source: github.com/teolex2020/AuraSDK&lt;br&gt;
Patent pending: US 63/969,703&lt;br&gt;
Built in Kyiv, Ukraine.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>llm</category>
      <category>python</category>
    </item>
    <item>
      <title>I built a cognitive layer for AI agents that learns without LLM calls</title>
      <dc:creator>Oleksander</dc:creator>
      <pubDate>Tue, 17 Mar 2026 12:19:42 +0000</pubDate>
      <link>https://dev.to/teolex2020/i-built-a-cognitive-layer-for-ai-agents-that-learns-without-llm-calls-33no</link>
      <guid>https://dev.to/teolex2020/i-built-a-cognitive-layer-for-ai-agents-that-learns-without-llm-calls-33no</guid>
      <description>&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Every time your agent starts a conversation, it starts from zero.&lt;/p&gt;

&lt;p&gt;Sure, you can stuff a summary into the system prompt. You can use RAG. You can call Mem0 or Zep.&lt;/p&gt;

&lt;p&gt;But all of these have the same problem: &lt;strong&gt;they need LLM calls to learn&lt;/strong&gt;. To extract facts, to build a user profile, to understand what matters — you're paying per token, adding latency, and depending on a cloud service.&lt;/p&gt;

&lt;p&gt;What if the learning happened locally, automatically, without any LLM involvement?&lt;/p&gt;

&lt;h2&gt;
  
  
  What AuraSDK does differently
&lt;/h2&gt;

&lt;p&gt;AuraSDK is a cognitive layer that runs alongside any LLM. It observes interactions and — without any LLM calls — builds up a structured understanding of patterns, causes, and behavioral rules.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Level&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./agent_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enable_full_cognitive_stack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# store what happens
&lt;/span&gt;&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User always deploys to staging first&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Staging deploy prevented 3 production incidents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# sub-millisecond recall — inject into any LLM prompt
&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deployment decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# after enough interactions, the system derives this on its own:
&lt;/span&gt;&lt;span class="n"&gt;hints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_surfaced_policy_hints&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# [{"action": "Prefer", "domain": "workflow", "description": "deploy to staging first"}]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nobody wrote that policy rule. The system derived it from the pattern of stored observations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cognitive pipeline
&lt;/h2&gt;

&lt;p&gt;AuraSDK processes every stored record through 5 layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Record → Belief → Concept → Causal → Policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer is bounded and deterministic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Belief&lt;/strong&gt;: groups related observations, resolves contradictions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concept&lt;/strong&gt;: discovers stable topic clusters across beliefs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Causal&lt;/strong&gt;: finds cause-effect patterns from temporal and explicit links&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy&lt;/strong&gt;: derives behavioral hints (Prefer / Avoid / Warn) from causal patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire pipeline runs in milliseconds. No LLM. No cloud. No embeddings required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it in 60 seconds
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aura-memory
python examples/demo.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Phase 4 - Recall in action

  Query: "deployment decision"  [0.29ms]
    1. Staging deploy prevented database migration failure
    2. Direct prod deploy skipped staging -- caused data loss

  Query: "code review"  [0.18ms]
    1. Code review caught SQL injection before merge
    2. Code review found performance regression early
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5 learning cycles completed in 16ms. Recall at 0.29ms.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it compares
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;AuraSDK&lt;/th&gt;
&lt;th&gt;Mem0&lt;/th&gt;
&lt;th&gt;Zep&lt;/th&gt;
&lt;th&gt;Letta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM required for learning&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works offline&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Fully&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;With local LLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recall latency&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt;1ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~200ms+&lt;/td&gt;
&lt;td&gt;~200ms&lt;/td&gt;
&lt;td&gt;LLM-bound&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-derives behavioral policies&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary size&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~3MB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~50MB+&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Python pkg&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What's new in v1.5.3
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Full 5-layer cognitive pipeline active by default&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;enable_full_cognitive_stack()&lt;/code&gt; — one call to activate everything&lt;/li&gt;
&lt;li&gt;Decay now driven by memory level, not manual type labels&lt;/li&gt;
&lt;li&gt;Policy hints now work with explicit causal links (&lt;code&gt;link_records()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;demo.py&lt;/code&gt; — see it working in 60 seconds&lt;/li&gt;
&lt;/ul&gt;

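&lt;p&gt;To make level-driven decay concrete, here is a toy calculation (illustrative only) using the per-level retention rates from Aura's memory-levels table:&lt;/p&gt;

```python
# Toy illustration of level-driven decay (not the AuraSDK internals).
# Per-cycle retention factors follow Aura's memory-levels table.
RETENTION = {"Identity": 0.99, "Domain": 0.95, "Decisions": 0.90, "Working": 0.80}

def strength_after(level, cycles, start=1.0):
    """Strength left after `cycles` decay cycles with no reinforcement."""
    return start * RETENTION[level] ** cycles

# After 10 idle cycles, Working memories fade far more than Identity ones.
for level in RETENTION:
    print(f"{level}: {strength_after(level, 10):.3f}")
```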

&lt;h2&gt;
  
  
  Built in Rust, from Kyiv
&lt;/h2&gt;

&lt;p&gt;Pure Rust core. No Python dependencies for the engine. Patent pending (US 63/969,703).&lt;/p&gt;

&lt;p&gt;Open source: &lt;a href="https://github.com/teolex2020/AuraSDK" rel="noopener noreferrer"&gt;github.com/teolex2020/AuraSDK&lt;/a&gt;&lt;br&gt;
Install: &lt;code&gt;pip install aura-memory&lt;/code&gt;&lt;br&gt;
Web: &lt;a href="https://aurasdk.dev" rel="noopener noreferrer"&gt;aurasdk.dev&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're building AI agents and want deterministic, explainable, offline-capable memory — give it a try and tell me what you think.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>memory</category>
      <category>rust</category>
      <category>python</category>
    </item>
    <item>
      <title>10x Faster Recall + Memory That Evolves: Aura v1.3 for AI Agents</title>
      <dc:creator>Oleksander</dc:creator>
      <pubDate>Thu, 05 Mar 2026 09:42:48 +0000</pubDate>
      <link>https://dev.to/teolex2020/10x-faster-recall-memory-that-evolves-aura-v13-for-ai-agents-44ln</link>
      <guid>https://dev.to/teolex2020/10x-faster-recall-memory-that-evolves-aura-v13-for-ai-agents-44ln</guid>
      <description>&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Every AI agent framework has the same weakness: memory is an afterthought. Most solutions dump everything into a vector database and hope cosine similarity finds the right context. This works until it doesn't — when your agent needs to know &lt;em&gt;when&lt;/em&gt; it learned something, &lt;em&gt;what changed&lt;/em&gt; since last week, or &lt;em&gt;which&lt;/em&gt; memories are actually useful.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Aura Does Differently
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/teolex2020/AuraSDK" rel="noopener noreferrer"&gt;Aura&lt;/a&gt; is a pure-Rust cognitive memory engine. Instead of embeddings + vector search, it uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SDR Encoding&lt;/strong&gt; (Sparse Distributed Representations) — biologically inspired, noise-tolerant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RRF Fusion&lt;/strong&gt; — 4 parallel ranking signals (SDR similarity, MinHash, Tag Jaccard, optional embeddings)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal Decay&lt;/strong&gt; — memories naturally fade unless reinforced&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph Connections&lt;/strong&gt; — associative, causal, and co-activation links between memories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: sub-millisecond recall, ~3MB binary, zero external dependencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  10x Recall Speedup (v1.3.1)
&lt;/h3&gt;

&lt;p&gt;Every &lt;code&gt;recall_structured&lt;/code&gt; call was cloning ALL records into a new HashMap to filter by namespace. At 10K records, that's 94ms of pure waste.&lt;/p&gt;

&lt;p&gt;Fix: pass the original HashMap through the pipeline. Each signal collector filters by namespace inline with a cheap &lt;code&gt;contains()&lt;/code&gt; check. Plus a new &lt;code&gt;StructuredRecallCache&lt;/code&gt; for repeated queries.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Records&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1K&lt;/td&gt;
&lt;td&gt;15 ms&lt;/td&gt;
&lt;td&gt;2.6 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.8x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5K&lt;/td&gt;
&lt;td&gt;58 ms&lt;/td&gt;
&lt;td&gt;5.1 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11.4x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10K&lt;/td&gt;
&lt;td&gt;94 ms&lt;/td&gt;
&lt;td&gt;8.6 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10.9x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Warm recall (cache hit): &lt;strong&gt;~0.07 ms&lt;/strong&gt; — constant time regardless of record count.&lt;/p&gt;
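&lt;p&gt;The constant-time warm path is just query-keyed caching. A minimal sketch (not the real &lt;code&gt;StructuredRecallCache&lt;/code&gt;): a repeated query becomes a dict lookup and skips the ranking pipeline entirely.&lt;/p&gt;

```python
# Minimal sketch of a query-keyed recall cache (illustrative only).
class RecallCache:
    def __init__(self, recall_fn):
        self.recall_fn = recall_fn
        self.hits = {}

    def recall(self, query):
        if query not in self.hits:           # cold path: run the full pipeline
            self.hits[query] = self.recall_fn(query)
        return self.hits[query]              # warm path: O(1) dict lookup

calls = []
def slow_recall(q):
    calls.append(q)                          # stand-in for the ranking pipeline
    return [f"result for {q}"]

cache = RecallCache(slow_recall)
cache.recall("deployment steps")
cache.recall("deployment steps")
assert len(calls) == 1                       # pipeline ran only once
```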

&lt;h3&gt;
  
  
  What's New in v1.3.0
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Temporal Queries
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Aura&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User prefers dark mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Domain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ... days pass, user changes preference ...
&lt;/span&gt;&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;supersede&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User prefers light mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# What did we know last week?
&lt;/span&gt;&lt;span class="n"&gt;old_memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall_at&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_week_timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;recall_at(query, timestamp)&lt;/code&gt; filters records by creation time. &lt;code&gt;history(record_id)&lt;/code&gt; shows the full access/strength timeline. This is how you debug agent behavior — "why did it do X on Tuesday?"&lt;/p&gt;
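&lt;p&gt;Conceptually, point-in-time recall is a filter over creation and supersession timestamps. A minimal sketch with hypothetical record fields (not the Aura internals):&lt;/p&gt;

```python
# Toy point-in-time filter: a record is visible at `timestamp` if it was
# already created and not yet superseded. Field names are hypothetical.
def recall_at(records, timestamp):
    alive = []
    for r in records:
        created = timestamp >= r["created_at"]
        current = r["superseded_at"] is None or r["superseded_at"] > timestamp
        if created and current:
            alive.append(r)
    return alive

records = [
    {"text": "User prefers dark mode",  "created_at": 100, "superseded_at": 500},
    {"text": "User prefers light mode", "created_at": 500, "superseded_at": None},
]
print([r["text"] for r in recall_at(records, 200)])
# → ['User prefers dark mode']
```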

&lt;h4&gt;
  
  
  2. LangChain Drop-In
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Aura&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura.langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AuraMemory&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AuraMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Works with any LangChain chain
&lt;/span&gt;&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;AuraChatMessageHistory&lt;/code&gt; implements the full &lt;code&gt;BaseChatMessageHistory&lt;/code&gt; interface. &lt;code&gt;AuraMemory&lt;/code&gt; is duck-type compatible with &lt;code&gt;ConversationBufferMemory&lt;/code&gt;. No changes to your existing code.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Adaptive Recall
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# After recall, tell Aura what was useful
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall_structured&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deployment steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;was_helpful&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;useful&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# +0.1 strength
&lt;/span&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;useful&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# -0.15 strength
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Over time, noise naturally decays while valuable memories get reinforced. No other memory SDK has this built-in.&lt;/p&gt;
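&lt;p&gt;The underlying update is simple. A toy version: the +0.1 / -0.15 deltas mirror the comments above, while clamping strength to [0, 1] is an assumption of this sketch.&lt;/p&gt;

```python
# Toy feedback update: useful recalls gain strength, useless ones lose more,
# so noise decays faster than signal. Clamping to [0, 1] is assumed here.
def apply_feedback(strength, useful):
    delta = 0.1 if useful else -0.15
    return max(0.0, min(1.0, strength + delta))

s = 0.5
s = apply_feedback(s, useful=True)    # reinforce
s = apply_feedback(s, useful=False)   # penalize
print(round(s, 2))
# → 0.45
```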

&lt;h4&gt;
  
  
  4. Memory Versioning
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Save state before experiment
&lt;/span&gt;&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;before_refactor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ... agent does things ...
&lt;/span&gt;
&lt;span class="c1"&gt;# Something went wrong? Roll back
&lt;/span&gt;&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rollback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;before_refactor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Or compare states
&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;before_refactor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;after_refactor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Added: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;added&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Removed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;removed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  5. Agent-to-Agent Sharing
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Agent A exports relevant context
&lt;/span&gt;&lt;span class="n"&gt;fragment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Agent B imports it (strength halved, tagged "shared")
&lt;/span&gt;&lt;span class="n"&gt;agent_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;import_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fragment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The protocol envelope includes version and provenance metadata. Imported records arrive with reduced trust — they need to prove themselves.&lt;/p&gt;

&lt;h4&gt;
  
  
  6. C FFI — Aura as a Platform
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"aura.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="n"&gt;AuraHandle&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aura_open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"./memory"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;aura_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Remember this"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aura_recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"what to remember"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;aura_free_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;aura_close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Working examples in &lt;a href="https://github.com/teolex2020/AuraSDK/blob/main/examples/go/main.go" rel="noopener noreferrer"&gt;Go&lt;/a&gt; and &lt;a href="https://github.com/teolex2020/AuraSDK/blob/main/examples/csharp/Program.cs" rel="noopener noreferrer"&gt;C#&lt;/a&gt;. Any language with C FFI can use Aura.&lt;/p&gt;

&lt;h4&gt;
  
  
  7. OpenTelemetry
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[features]&lt;/span&gt;
&lt;span class="py"&gt;telemetry&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"opentelemetry"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"opentelemetry_sdk"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"opentelemetry-otlp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"tracing-opentelemetry"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;17 key functions instrumented with &lt;code&gt;#[instrument]&lt;/code&gt; spans. OTLP export to any collector. &lt;a href="https://github.com/teolex2020/AuraSDK/blob/main/examples/grafana_dashboard.json" rel="noopener noreferrer"&gt;Grafana dashboard template&lt;/a&gt; included.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Bug That Took 8 Hours
&lt;/h3&gt;

&lt;p&gt;Fun story: our CI was timing out at 6+ hours. We tried increasing timeouts, switching to release builds, reducing the test matrix. Nothing worked.&lt;/p&gt;

&lt;p&gt;Turns out: the &lt;code&gt;Aura&lt;/code&gt; struct didn't have a &lt;code&gt;Drop&lt;/code&gt; implementation. When tests ended without calling &lt;code&gt;close()&lt;/code&gt;, internal file handles were never released. Each test hung for 5 minutes waiting on cleanup that never came. 28 tests x 5 min = CI death.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix: 9 lines of code.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nb"&gt;Drop&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;Aura&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.stop_background&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.flush&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.storage&lt;/span&gt;&lt;span class="nf"&gt;.flush&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.index&lt;/span&gt;&lt;span class="nf"&gt;.save&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now 503 tests pass in 7 minutes. Sometimes the hardest bugs are the simplest ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  Try It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aura-memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Aura&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./my_agent_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User prefers concise answers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Identity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;how should I respond?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Returns formatted context for your LLM's system prompt
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/teolex2020/AuraSDK" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pypi.org/project/aura-memory/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/teolex2020/AuraSDK/releases/tag/v1.3.1" rel="noopener noreferrer"&gt;Full Changelog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aurasdk.dev" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Star the repo if this is useful. PRs and issues welcome.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>python</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Agent memory in 5 lines of Python — no LLM, no cloud, &lt;1ms recall</title>
      <dc:creator>Oleksander</dc:creator>
      <pubDate>Mon, 02 Mar 2026 07:32:40 +0000</pubDate>
      <link>https://dev.to/teolex2020/agent-memory-in-5-lines-of-python-no-llm-no-cloud-1ms-recall-55d5</link>
      <guid>https://dev.to/teolex2020/agent-memory-in-5-lines-of-python-no-llm-no-cloud-1ms-recall-55d5</guid>
      <description>&lt;p&gt;Last week, my AI agent analyzed 10 competitors in the AI memory market over 3 days. On day 4, I asked it to compare their pricing. It didn't search again — it already knew them all. That's what happens when your agent has real memory, not a chat history.&lt;/p&gt;

&lt;p&gt;Your AI agent forgets everything between sessions. Every conversation starts from zero. Every user preference, every decision, every piece of context — gone. You paste old conversations into the system prompt, hit the token limit, and wonder why the agent feels so... stateless.&lt;/p&gt;

&lt;p&gt;Most "memory" solutions bolt on a vector database, call an embedding API, and charge you per query. You now have 200ms latency, a cloud dependency, and a monthly bill — for what is essentially a fancy search index.&lt;/p&gt;

&lt;p&gt;What if your agent could remember like a human? Important things stick. Trivial things fade. Trusted sources rank higher than random web scrapes. And it all happens in &lt;strong&gt;under 1 millisecond&lt;/strong&gt;, locally, with &lt;strong&gt;zero LLM calls&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's &lt;a href="https://github.com/teolex2020/AuraSDK" rel="noopener noreferrer"&gt;Aura&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What makes Aura different
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Aura&lt;/th&gt;
&lt;th&gt;Others (Mem0, Zep, Cognee)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM required&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recall latency&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt;1ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;200ms+ / LLM-bound&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works offline&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary size&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.7 MB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Heavy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per op&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API billing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Source provenance&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Built-in&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Aura is a Rust-native cognitive memory engine with Python bindings. It uses a 4-signal RRF (Reciprocal Rank Fusion) recall system — no embeddings required — and models memory decay, consolidation, and trust scoring, inspired by how human memory actually works.&lt;/p&gt;
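&lt;p&gt;Reciprocal Rank Fusion itself is easy to sketch (toy code, not Aura's implementation): each signal contributes 1 / (k + rank) per item, and the summed scores are re-ranked, so items ranked well by several signals win.&lt;/p&gt;

```python
# Reciprocal Rank Fusion in miniature (illustrative only).
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three of the four signals, with made-up per-signal rankings:
sdr_sim = ["rec_a", "rec_b", "rec_c"]   # SDR similarity
minhash = ["rec_b", "rec_a", "rec_c"]   # MinHash
tags    = ["rec_b", "rec_c", "rec_a"]   # tag Jaccard
print(rrf([sdr_sim, minhash, tags]))
# → ['rec_b', 'rec_a', 'rec_c']
```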

&lt;p&gt;Let's see how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aura-memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No Docker, no API keys, no cloud account. The entire engine ships as a single 2.7 MB binary.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Store &amp;amp; recall — the basics
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Level&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./agent_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Store memories at different importance levels
&lt;/span&gt;&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User prefers dark mode and Vim keybindings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Identity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ui&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deploy staging before production, always run tests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Decisions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fix login bug - users getting 403 on /api/auth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Working&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Recall — returns formatted context ready for LLM injection
&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authentication issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token_budget&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=== COGNITIVE CONTEXT ===
[IDENTITY]
  - User prefers dark mode and Vim keybindings [preference, ui]

[DECISIONS]
  - Deploy staging before production, always run tests [workflow]

[WORKING]
  - Fix login bug - users getting 403 on /api/auth [bug, auth]

=== END CONTEXT ===
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. &lt;code&gt;store()&lt;/code&gt; → &lt;code&gt;recall()&lt;/code&gt; → inject into your system prompt. Five lines to give your agent persistent memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Memory levels — not all memories are equal
&lt;/h2&gt;

&lt;p&gt;Aura organizes memory into 4 levels across 2 tiers, modeled after human cognitive architecture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Retention per cycle&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Identity&lt;/td&gt;
&lt;td&gt;0.99/cycle&lt;/td&gt;
&lt;td&gt;User preferences, personality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Domain&lt;/td&gt;
&lt;td&gt;0.95/cycle&lt;/td&gt;
&lt;td&gt;Learned facts, domain knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cognitive&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Decisions&lt;/td&gt;
&lt;td&gt;0.90/cycle&lt;/td&gt;
&lt;td&gt;Choices made, action items&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cognitive&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;0.80/cycle&lt;/td&gt;
&lt;td&gt;Current tasks, recent messages&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Core tier&lt;/strong&gt; = slow decay (weeks to months). Your agent's "personality" and knowledge base.&lt;br&gt;
&lt;strong&gt;Cognitive tier&lt;/strong&gt; = fast decay (hours to days). Ephemeral context that fades naturally.&lt;/p&gt;

&lt;p&gt;This means your agent doesn't need explicit "forget" logic. Old tasks decay away. Core knowledge persists. Just like your brain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Query only recent, ephemeral memories
&lt;/span&gt;&lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall_cognitive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Query only long-term knowledge
&lt;/span&gt;&lt;span class="n"&gt;knowledge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall_core_tier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;programming&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
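&lt;p&gt;To make the decay numbers concrete, here is a quick sketch of what per-cycle retention implies, assuming Aura applies the table's factor multiplicatively once per maintenance cycle (the multiplicative model is my assumption, not documented internals):&lt;/p&gt;

```python
# Hypothetical sketch: memory strength under multiplicative per-cycle decay.
# Retention factors come from the table above; applying them as a simple
# power is an assumption about Aura's internals, not documented behavior.

def strength_after(retention: float, cycles: int, initial: float = 1.0) -> float:
    """Memory strength after a number of maintenance cycles."""
    return initial * retention ** cycles

# After 30 cycles, a Working memory is nearly gone while an
# Identity memory is still strong.
print(f"Working  (0.80/cycle): {strength_after(0.80, 30):.4f}")
print(f"Identity (0.99/cycle): {strength_after(0.99, 30):.4f}")
```

&lt;p&gt;With these factors, Working-level entries effectively vanish within a few dozen cycles, which is what lets stale tasks fade without explicit deletes.&lt;/p&gt;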



&lt;h2&gt;
  
  
  4. Trust scoring — the killer feature
&lt;/h2&gt;

&lt;p&gt;Here's where Aura gets interesting. Not all information sources are equally reliable. A user telling you their name is more trustworthy than a web scrape claiming "Python 4.0 is coming soon."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TrustConfig&lt;/span&gt;

&lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TrustConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_trust&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_scrape&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_trust_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Store from different sources
&lt;/span&gt;&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Python 3.13 released October 2024&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Python 4.0 coming soon&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_scrape&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Trust-weighted recall ranks user-sourced memory higher
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall_structured&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python release&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  score=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  trust=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;trust&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  score=0.995  trust=1.00  Python 3.13 released October 2024
  score=0.589  trust=0.50  Python 4.0 coming soon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user-sourced fact scores &lt;strong&gt;0.995&lt;/strong&gt;. The web scrape scores &lt;strong&gt;0.589&lt;/strong&gt;. Your agent now has built-in epistemic hygiene: it knows &lt;em&gt;how much&lt;/em&gt; to trust each piece of information.&lt;/p&gt;
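&lt;p&gt;The ranking behavior above can be sketched as plain arithmetic. In this toy version the final score is similarity multiplied by channel trust; the multiplicative rule is my guess at the mechanism, not Aura's published scoring function:&lt;/p&gt;

```python
# Toy trust-weighted ranking: score = similarity x channel trust.
# The multiplicative rule is an illustrative assumption, not Aura's
# documented scoring function.

SOURCE_TRUST = {"user": 1.0, "api": 0.8, "web_scrape": 0.5}

def rank(candidates: list) -> list:
    """Score each candidate and sort best-first."""
    for c in candidates:
        c["score"] = c["similarity"] * SOURCE_TRUST[c["channel"]]
    return sorted(candidates, key=lambda c: c["score"], reverse=True)

results = rank([
    {"content": "Python 4.0 coming soon", "similarity": 0.9, "channel": "web_scrape"},
    {"content": "Python 3.13 released October 2024", "similarity": 0.9, "channel": "user"},
])
for r in results:
    print(f"score={r['score']:.3f}  {r['content']}")
```

&lt;p&gt;Even with identical semantic similarity, the user-sourced fact wins.&lt;/p&gt;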

&lt;h2&gt;
  
  
  5. Source provenance — know where every fact came from
&lt;/h2&gt;

&lt;p&gt;Every memory in Aura carries a provenance tag:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;recorded&lt;/code&gt; — direct user input (trust × 1.00)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;retrieved&lt;/code&gt; — fetched from web/API (trust × 0.90)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;inferred&lt;/code&gt; — LLM conclusion (trust × 0.85)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;generated&lt;/code&gt; — agent-created (trust × 0.80)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BTC at $67k on Feb 21&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;source_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;crypto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User prefers conservative trading strategy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;source_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recorded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;crypto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Recall ranks recorded higher than retrieved
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall_structured&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;crypto strategy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents a subtle but dangerous problem: agents presenting web search results as their own "memories." With &lt;code&gt;source_type&lt;/code&gt;, your agent always knows what it observed vs what it found vs what it guessed.&lt;/p&gt;

&lt;p&gt;As far as I can tell, no other memory SDK tracks this: not Mem0, not Zep, not Cognee.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Plug it into any LLM
&lt;/h2&gt;

&lt;p&gt;Aura is LLM-agnostic. The pattern is always the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;user_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are my UI preferences?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Recall relevant context
&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token_budget&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Build system prompt
&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant with memory.

&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Use the above context to personalize your responses.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Send to your LLM of choice:
# Ollama:     requests.post("http://localhost:11434/api/chat", ...)
# OpenAI:     openai.chat.completions.create(messages=[...])
# LangChain:  ChatPromptTemplate with {context}
# Claude:     anthropic.messages.create(...)
# Any HTTP:   just inject system_prompt
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No adapters. No framework lock-in. If your LLM takes a string, Aura works with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: structured recall with scores
&lt;/h2&gt;

&lt;p&gt;When you need more than formatted text — for routing, filtering, or debugging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall_structured&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] score=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; -- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  [IDENTITY] score=0.590 -- User prefers dark mode and Vim keybindings
  [WORKING]  score=0.586 -- Fix login bug - users getting 403 on /api/auth
  [DECISIONS] score=0.581 -- Deploy staging before production, always run tests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each result includes level, score, trust, tags, timestamps, and source metadata. Use this to build intelligent routing: high-trust Identity memories go straight to the system prompt; low-trust Working memories get verified first.&lt;/p&gt;
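&lt;p&gt;That routing policy is a few lines of glue. The field names below mirror the structured results shown above; the level names and trust threshold are arbitrary examples, not Aura constants:&lt;/p&gt;

```python
# Sketch of a routing policy over structured recall results:
# high-trust core-tier memories go straight into the prompt,
# everything else is queued for verification. Thresholds are arbitrary.

def route(results: list, trust_threshold: float = 0.8):
    inject, verify = [], []
    for r in results:
        if r["level"] in ("IDENTITY", "DOMAIN") and r["trust"] >= trust_threshold:
            inject.append(r["content"])
        else:
            verify.append(r["content"])
    return inject, verify

inject, verify = route([
    {"level": "IDENTITY", "trust": 1.00, "content": "User prefers dark mode"},
    {"level": "WORKING", "trust": 0.50, "content": "Python 4.0 coming soon"},
])
print(inject)  # high-trust identity memory, safe to inject
print(verify)  # low-trust working memory, check before trusting
```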

&lt;h2&gt;
  
  
  Performance — sub-millisecond, for real
&lt;/h2&gt;

&lt;p&gt;Benchmarked on a standard machine with 1,000 stored records:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Store&lt;/td&gt;
&lt;td&gt;0.129 ms/op&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recall (1K records)&lt;/td&gt;
&lt;td&gt;0.861 ms/op&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search by tag&lt;/td&gt;
&lt;td&gt;0.103 ms/op&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For comparison, embedding-based recall typically runs &lt;strong&gt;200ms+&lt;/strong&gt; per call, so on these benchmarks Aura is roughly &lt;strong&gt;200x faster&lt;/strong&gt;. It gets there by combining SDR (Sparse Distributed Representation) encoding, MinHash, and tag matching, with no neural network inference needed.&lt;/p&gt;
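&lt;p&gt;For intuition on why no model inference is needed: MinHash estimates set overlap from small fixed-size signatures using nothing but hashing. This is a generic illustration of the technique, not Aura's implementation:&lt;/p&gt;

```python
import hashlib

# Generic MinHash illustration: estimate Jaccard similarity of token sets
# from fixed-size signatures. Pure hashing, no model inference. This shows
# the general technique, not Aura's actual encoder.

def signature(tokens, num_hashes=64):
    """One min-hash per seed; each seed acts as a distinct hash function."""
    return [
        min(int(hashlib.sha1(f"{seed}:{t}".encode()).hexdigest(), 16) for t in tokens)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(a, b, num_hashes=64):
    """Fraction of matching signature slots approximates Jaccard overlap."""
    sa, sb = signature(a, num_hashes), signature(b, num_hashes)
    return sum(x == y for x, y in zip(sa, sb)) / num_hashes

auth_bug   = {"fix", "login", "bug", "403", "auth"}
auth_query = {"authentication", "login", "403", "auth"}
deploys    = {"deploy", "staging", "production", "tests"}

print(estimated_jaccard(auth_bug, auth_query))  # related topics share minima
print(estimated_jaccard(auth_bug, deploys))     # disjoint topics score 0.0
```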

&lt;p&gt;You &lt;em&gt;can&lt;/em&gt; optionally add embeddings as a 4th signal if you want semantic similarity on top:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_embedding_fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the 3-signal fusion works great without them.&lt;/p&gt;
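&lt;p&gt;As a rough picture of what "signal fusion" can mean here: combine the three similarity signals with a weighted sum. The weights and the linear formula are illustrative assumptions, not Aura's actual scoring:&lt;/p&gt;

```python
# Illustrative signal fusion: weighted sum of SDR similarity, MinHash
# similarity, and tag overlap. Weights and formula are assumptions,
# not Aura's documented scoring.

def fuse(sdr_sim, minhash_sim, tag_overlap, weights=(0.5, 0.3, 0.2)):
    w_sdr, w_mh, w_tag = weights
    return w_sdr * sdr_sim + w_mh * minhash_sim + w_tag * tag_overlap

strong_match = fuse(0.9, 0.8, 1.0)  # matches on all three signals
tag_only     = fuse(0.1, 0.0, 1.0)  # matches only on tags
print(f"{strong_match:.2f} vs {tag_only:.2f}")
```

&lt;p&gt;A memory that matches on all three signals outranks one that only shares a tag.&lt;/p&gt;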

&lt;h2&gt;
  
  
  Living memory — decay, reflect, consolidate
&lt;/h2&gt;

&lt;p&gt;Run a single maintenance cycle and Aura handles the rest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_maintenance&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Decayed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decay&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decayed&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Promoted: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reflect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;promoted&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Consolidated: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consolidation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;native_merged&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Archived: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;records_archived&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Call this periodically (every N interactions, or on a schedule), and your agent's memory stays clean and relevant without manual curation.&lt;/p&gt;
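&lt;p&gt;A minimal "every N interactions" trigger looks like this. The wrapper is plain Python and takes any zero-argument callable, so &lt;code&gt;brain.run_maintenance&lt;/code&gt; plugs straight in:&lt;/p&gt;

```python
# Minimal periodic-maintenance trigger: call tick() once per interaction
# and the wrapped callable (e.g. brain.run_maintenance) runs every N ticks.

class MaintenanceSchedule:
    def __init__(self, run_maintenance, every_n=50):
        self.run_maintenance = run_maintenance
        self.every_n = every_n
        self.count = 0

    def tick(self):
        """Returns True on the ticks where maintenance actually ran."""
        self.count += 1
        if self.count % self.every_n == 0:
            self.run_maintenance()
            return True
        return False

# usage sketch:
#   schedule = MaintenanceSchedule(brain.run_maintenance, every_n=50)
#   ... then call schedule.tick() after each user interaction
```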

&lt;h2&gt;
  
  
  More features you get out of the box
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Namespace isolation&lt;/strong&gt; — keep test/prod/per-user memories separate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encryption&lt;/strong&gt; — ChaCha20-Poly1305 + Argon2id, one argument: &lt;code&gt;Aura("./data", password="secret")&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP server&lt;/strong&gt; — expose memory as a tool for Claude, GPT, or any MCP-compatible agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero dependencies&lt;/strong&gt; — pure Rust core, no runtime requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it now
&lt;/h2&gt;

&lt;p&gt;The fastest way to try Aura is the interactive Colab notebook — zero setup, runs in your browser:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://colab.research.google.com/github/teolex2020/AuraSDK/blob/main/examples/colab_quickstart.ipynb" rel="noopener noreferrer"&gt;▶ Open in Google Colab&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Or install locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aura-memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/teolex2020/AuraSDK" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt; — star it if you find it useful ⭐&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/teolex2020/AuraSDK/blob/main/docs/API.md" rel="noopener noreferrer"&gt;API docs&lt;/a&gt;&lt;/strong&gt; — full reference for 40+ methods&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/teolex2020/AuraSDK/tree/main/examples" rel="noopener noreferrer"&gt;Examples&lt;/a&gt;&lt;/strong&gt; — Ollama integration, research bot, edge devices&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Aura is MIT-licensed. Built by a solo developer in Kyiv, Ukraine — including during power outages. Patent pending (US 63/969,703). If you're building AI agents that need to remember, I'd love to hear what you think.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>419 Clones in 48 Hours — What Happened When I Launched an SDK for Offline AI Agent Memory</title>
      <dc:creator>Oleksander</dc:creator>
      <pubDate>Thu, 26 Feb 2026 13:05:19 +0000</pubDate>
      <link>https://dev.to/teolex2020/419-clones-in-48-hours-what-happened-when-i-launched-an-sdk-for-offline-ai-agent-memory-20n9</link>
      <guid>https://dev.to/teolex2020/419-clones-in-48-hours-what-happened-when-i-launched-an-sdk-for-offline-ai-agent-memory-20n9</guid>
      <description>&lt;p&gt;48 hours after launch. 419 clones. 90 unique developers. 8 stars. Nobody said a word.&lt;/p&gt;

&lt;p&gt;That silence told me something important: engineers don't star things — they test them.&lt;/p&gt;

&lt;p&gt;Here's the story of what I built, why, and what those numbers actually mean.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Everyone is building AI agents. Most of them have a memory problem.&lt;/p&gt;

&lt;p&gt;The standard approach: use embeddings. Store text as vectors, query them at recall time. Tools like Mem0, Zep, and LangMem all work this way.&lt;/p&gt;

&lt;p&gt;The hidden cost:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every recall = an embedding API call = 150–300ms latency&lt;/li&gt;
&lt;li&gt;Every embedding call = money (OpenAI charges per token)&lt;/li&gt;
&lt;li&gt;Offline deployment? Impossible — you need the embedding API available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For cloud-based chatbots this is fine. But for &lt;strong&gt;local AI agents running on your own hardware&lt;/strong&gt; — especially with Ollama — this breaks the whole offline-first promise.&lt;/p&gt;

&lt;p&gt;If your agent needs to "remember" something, it has to call home first.&lt;/p&gt;

&lt;p&gt;That felt wrong to me.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Different Idea: SDR Instead of Embeddings
&lt;/h2&gt;

&lt;p&gt;I started reading about &lt;strong&gt;Sparse Distributed Representations (SDR)&lt;/strong&gt; — the pattern encoding mechanism used in Hierarchical Temporal Memory (HTM) theory, originally inspired by how the neocortex works.&lt;/p&gt;

&lt;p&gt;The core idea: represent any concept as a sparse binary vector (256K bits in Aura's case) where only ~2% of bits are active. Similarity between patterns is computed using the Tanimoto coefficient — pure bit math, no neural network needed.&lt;/p&gt;

&lt;p&gt;No embedding model. No API call. No GPU.&lt;/p&gt;

&lt;p&gt;Just math.&lt;/p&gt;
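&lt;p&gt;The arithmetic really is that simple. Here is a toy version, with SDRs modeled as sets of active bit indices (real patterns would be 256K bits with ~2% active, and the bit assignments here are made up for illustration):&lt;/p&gt;

```python
# Toy SDR comparison: patterns as sets of active bit indices, similarity
# via the Tanimoto coefficient |A intersect B| / |A union B|. Real SDRs
# are far larger and sparser; the arithmetic is the same.

def tanimoto(a, b):
    if not a and not b:
        return 1.0
    return len(a.intersection(b)) / len(a.union(b))

cat    = {3, 17, 42, 101, 256}   # toy pattern for "cat"
feline = {3, 17, 42, 99, 310}    # shares active bits with "cat"
truck  = {5, 88, 140, 201, 333}  # unrelated concept

print(f"cat vs feline: {tanimoto(cat, feline):.3f}")
print(f"cat vs truck:  {tanimoto(cat, truck):.3f}")
```

&lt;p&gt;Overlapping concepts share active bits and score high; unrelated concepts score near zero. Set intersection over union is all the "inference" required.&lt;/p&gt;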

&lt;p&gt;Recall latency: &lt;strong&gt;0.35ms&lt;/strong&gt;. That's not a typo.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aura&lt;/strong&gt; — a cognitive memory system for AI agents written in Rust.&lt;/p&gt;

&lt;p&gt;Key properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sub-millisecond recall&lt;/strong&gt; — 0.35ms average, 0.29ms after warm cache&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero LLM calls for memory operations&lt;/strong&gt; — the recall itself needs no model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2.7MB binary&lt;/strong&gt; — the entire memory engine fits in a small file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fully offline&lt;/strong&gt; — works with Ollama, any local model, no internet required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent across sessions&lt;/strong&gt; — brain reloads from disk, all context intact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;217 tests&lt;/strong&gt;, ChaCha20-Poly1305 encryption, patent pending (US 63/969,703)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Four memory levels with different retention weights:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Working Memory    → 0.80 retention  (temporary context)
Decision Memory   → 0.90 retention  (choices made)
Domain Memory     → 0.95 retention  (learned knowledge)
Identity Memory   → 0.99 retention  (core facts)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Integration with Ollama: 3 Lines
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura_memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Aura&lt;/span&gt;

&lt;span class="n"&gt;brain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Aura&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./agent_brain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token_budget&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# inject context into your Ollama system prompt
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma3n:e4b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# store the interaction
&lt;/span&gt;&lt;span class="n"&gt;brain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Your Ollama agent now has persistent memory across sessions — no embedding API, no cloud, no ongoing cost.&lt;/p&gt;
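&lt;p&gt;The &lt;code&gt;token_budget&lt;/code&gt; argument caps how much recalled context gets injected into the system prompt. As a rough illustration of that idea (a toy sketch; AuraSDK's actual scoring and storage are its own Rust implementation, not keyword overlap), a budget-capped recall could look like this:&lt;/p&gt;

```python
# Toy sketch of token-budgeted recall. Illustrative only: this is not
# AuraSDK's actual scoring algorithm or storage format.

def recall(records, query, token_budget=1500):
    """Return the highest-overlap records, truncated to a token budget."""
    query_words = set(query.lower().split())
    # Rank records by how many query words they share.
    scored = sorted(
        records,
        key=lambda r: len(query_words.intersection(r.lower().split())),
        reverse=True,
    )
    context, used = [], 0
    for record in scored:
        cost = len(record.split())  # crude token estimate: whitespace words
        if used + cost > token_budget:
            break
        context.append(record)
        used += cost
    return "\n".join(context)

records = [
    "User's name is Aleksander, an AI engineer from Ukraine",
    "User is working on AuraSDK, cognitive memory for agents",
    "User prefers concise technical explanations",
]
print(recall(records, "what is the user working on?", token_budget=20))
```

&lt;p&gt;With a 20-token budget, only the two best-matching records fit; the third is dropped instead of overflowing the system prompt.&lt;/p&gt;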




&lt;h2&gt;
  
  
  Live Demo Output
&lt;/h2&gt;

&lt;p&gt;I ran a three-phase test with gemma3n:e4b locally. Here's the actual terminal output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: Storing facts
✓ Stored: Name is Aleksander, AI engineer from Ukraine
✓ Stored: Working on AuraSDK — cognitive memory for agents
✓ Stored: Prefers concise technical explanations

Phase 2: Conversations with memory context
[Recall: 0.35ms] Context injected into system prompt
[Recall: 0.48ms] Agent referenced previous preference correctly
[Recall: 0.41ms] Agent remembered project name without being told

Phase 3: Session reload (fresh Python instance)
Brain loaded from disk...
[Recall: 0.29ms] ALL context intact ✅

Total records: 12
Memory persisted: YES
LLM calls for memory: 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent remembered my name, project, and communication preferences &lt;strong&gt;across a completely fresh Python instance&lt;/strong&gt; — without a single LLM or embedding call.&lt;/p&gt;
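&lt;p&gt;The reload step is the core trick: write-through to disk on every store, rehydrate on startup. A toy file-backed version of that pattern (illustrative only; AuraSDK's storage engine is a separate Rust implementation) looks like this:&lt;/p&gt;

```python
# Toy file-backed memory: write-through on store, rehydrate on startup.
# Illustrates the persist-and-reload pattern only; it is not AuraSDK's
# actual storage engine.
import json
from pathlib import Path

class ToyBrain:
    def __init__(self, path="demo_brain.json"):
        self.path = Path(path)
        # Rehydrate whatever a previous session left behind.
        if self.path.exists():
            self.records = json.loads(self.path.read_text())
        else:
            self.records = []

    def store(self, user_input, response):
        self.records.append({"user": user_input, "agent": response})
        # Write-through: every store survives a process restart.
        self.path.write_text(json.dumps(self.records))

# Session 1: store a fact, then throw the instance away.
Path("demo_brain.json").unlink(missing_ok=True)  # start clean for the demo
brain = ToyBrain()
brain.store("My name is Aleksander", "Nice to meet you, Aleksander!")
del brain

# Session 2: a fresh instance rehydrates the same records from disk.
brain = ToyBrain()
print(brain.records[0]["user"])  # -> My name is Aleksander
```

&lt;p&gt;Phase 3 of the demo is this pattern at work: the only link between the two Python processes is the on-disk state.&lt;/p&gt;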




&lt;h2&gt;
  
  
  Benchmark vs Embedding-based approach
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Aura&lt;/th&gt;
&lt;th&gt;Embedding-based approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Recall latency&lt;/td&gt;
&lt;td&gt;0.35ms&lt;/td&gt;
&lt;td&gt;~200ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding API calls&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Offline capable&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary size&lt;/td&gt;
&lt;td&gt;2.7MB&lt;/td&gt;
&lt;td&gt;N/A (cloud)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per recall&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;API pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speedup&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~570x faster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
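&lt;p&gt;The latency side of this comparison is easy to sanity-check yourself. A minimal sketch, timing an in-process keyword recall over 1,000 records (numbers vary by machine, and this is not the harness behind the table above):&lt;/p&gt;

```python
# Micro-benchmark sketch: mean latency of an in-process recall over
# 1,000 records. Machine-dependent; not AuraSDK's own benchmark harness.
import time

records = [f"note {i}: some stored fact about topic {i % 50}" for i in range(1000)]

def keyword_recall(query):
    q = set(query.lower().split())
    return max(records, key=lambda r: len(q.intersection(r.lower().split())))

keyword_recall("topic 7")  # warm up before timing
n = 100
start = time.perf_counter()
for _ in range(n):
    best = keyword_recall("fact about topic 7")
elapsed_ms = (time.perf_counter() - start) * 1000 / n
print(f"mean recall latency: {elapsed_ms:.3f} ms")
```

&lt;p&gt;The point of the comparison is structural, not the exact numbers: a local lookup pays no network round-trip, while an embedding-based recall pays one on every call.&lt;/p&gt;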




&lt;h2&gt;
  
  
  Why Rust?
&lt;/h2&gt;

&lt;p&gt;Three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt; — sub-millisecond recall requires zero garbage collection overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety&lt;/strong&gt; — memory systems that corrupt data are worse than no memory at all&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portability&lt;/strong&gt; — 2.7MB binary runs anywhere: Raspberry Pi, edge devices, air-gapped servers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;19,500 lines of Rust. 217 tests. Built during power outages in Kyiv 🇺🇦&lt;/p&gt;




&lt;h2&gt;
  
  
  The 419 Clones
&lt;/h2&gt;

&lt;p&gt;After posting in the Ollama Discord and commenting on a few Twitter threads about agent memory, the GitHub traffic spiked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;419 clones&lt;/strong&gt; in 48 hours&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;90 unique cloners&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Zero comments&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I think developers are quietly testing it. That's the most honest validation I could ask for — nobody clones a repo to be polite.&lt;/p&gt;

&lt;p&gt;If you're one of those 90 people: I'd genuinely love to know what you found. What worked, what didn't, what you were trying to build.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aura-memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;📦 PyPI: &lt;a href="https://pypi.org/project/aura-memory/" rel="noopener noreferrer"&gt;aura-memory&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🔗 GitHub: &lt;a href="https://github.com/teolex2020/AuraSDK" rel="noopener noreferrer"&gt;teolex2020/AuraSDK&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌐 Docs: &lt;a href="https://aurasdk.dev" rel="noopener noreferrer"&gt;aurasdk.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  One Question For You
&lt;/h2&gt;

&lt;p&gt;How are you handling memory in your AI agents right now?&lt;/p&gt;

&lt;p&gt;Embeddings? Simple conversation history? Something else entirely?&lt;/p&gt;

&lt;p&gt;I'm genuinely curious about the tradeoffs people are navigating — especially for local/offline deployments where latency and API costs actually matter.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>performance</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
