<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Wojtek Pluta</title>
    <description>The latest articles on DEV Community by Wojtek Pluta (@wspluta).</description>
    <link>https://dev.to/wspluta</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1219726%2F89cc78af-bbcd-44c8-97af-296c1af9fa96.png</url>
      <title>DEV Community: Wojtek Pluta</title>
      <link>https://dev.to/wspluta</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wspluta"/>
    <language>en</language>
    <item>
      <title>The Agent Loop Decoded: Three Levels Every Agent Engineer Must Know</title>
      <dc:creator>Wojtek Pluta</dc:creator>
      <pubDate>Fri, 19 Jun 2026 09:13:00 +0000</pubDate>
      <link>https://dev.to/oracledevs/the-agent-loop-decoded-three-levels-every-agent-engineer-must-know-33j0</link>
      <guid>https://dev.to/oracledevs/the-agent-loop-decoded-three-levels-every-agent-engineer-must-know-33j0</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article is syndicated from the original post on &lt;a href="https://blogs.oracle.com/developers/the-agent-loop-decoded-three-levels-every-agent-engineer-must-know" rel="noopener noreferrer"&gt;blogs.oracle.com&lt;/a&gt;. Read the canonical version there for the latest updates.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Chances are you have already run an agent loop today without naming it.&lt;/p&gt;

&lt;p&gt;Every session with a coding companion such as Claude Code, Codex, or Cursor is one: the model reads a request, inspects the repository, edits a file, runs the tests, observes the failures, and edits again until the build passes.&lt;/p&gt;

&lt;p&gt;That cycle of reasoning, acting, and observing the result is the agent loop at work, and it now sits at the centre of nearly every production agent system. &lt;strong&gt;The agent loop is the repeating cycle a harness runs within a single agent turn: assemble context, invoke the model to reason, act on its decision, and go again until a stop condition ends the run.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This piece unpacks that loop across three levels of understanding.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Level 1 is the minimal loop most developers meet first: an LLM, a handful of tools, and a response.&lt;/li&gt;
&lt;li&gt;Level 2 introduces a lifecycle inside the loop, where memory operations turn a stateless process into a reasoning engine with state.&lt;/li&gt;
&lt;li&gt;Level 3 pushes operations both inside and outside the loop, where the agent harness becomes a system in its own right.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you will know which level your system sits at, what breaks when the level and the task are mismatched, and what engineering work moves you up. Every pattern discussed is implemented in the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/agent_memory.ipynb" rel="noopener noreferrer"&gt;companion notebook&lt;/a&gt;, built on Oracle AI Database, so you can run the loop rather than just read about it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is an Agent
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Ffig1_agent_vertical-2-scaled.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Ffig1_agent_vertical-2-scaled.png" title="Figure 1: An agent perceives its environment, reasons with an LLM, acts, and remembers" alt="Diagram showing a basic AI agent architecture. The agent perceives an environment containing users, tools, and data, reasons using a large language model, and takes actions. The agent also reads from and writes to a memory system that stores state beyond the current message, enabling persistence across interactions." width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1: An agent perceives its environment, reasons with an LLM, acts, and remembers&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An agent is a computational system that perceives its environment, reasons about what it perceives, takes actions to achieve a goal, and has some form of memory.&lt;/strong&gt; That description applies to many things: a thermostat, a chess engine, a human professional. What makes an AI agent distinct is that the reasoning step is handled by a large language model, and the range of possible actions extends well beyond a binary output.&lt;/p&gt;

&lt;p&gt;An agent’s architecture consists of two separable layers. The first is the model: the inference engine that does the reasoning. The second is the harness: the code that prepares context, executes tool calls, enforces operational constraints, and persists state. Most agent engineering work happens in the harness, not the model. Understanding that boundary clarifies where failures originate and where interventions are effective.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2FAgent-Harness.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2FAgent-Harness.png" title="Figure 2: The two layers of an agent’s architecture: the model and the harness" alt="Diagram showing the two layers of an agent architecture: the model and the harness." width="799" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2: The two layers of an agent’s architecture: the model and the harness&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An agent needs at minimum four things to be useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instructions&lt;/strong&gt;: a system prompt or goal that tells it what it is trying to accomplish.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt;: access to information beyond the current message, including prior context, retrieved knowledge, and learned patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The ability to take actions&lt;/strong&gt;: tool calls, API requests, database writes, or any operation with an external effect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A reasoning engine&lt;/strong&gt;: an LLM that looks at context and decides what to do next.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What Is a Loop?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A loop is a control structure that repeats a block of execution until a condition is met. In programming you encounter this everywhere: iterating over a collection, running until a flag is set, calling recursively until a base case is reached.&lt;/p&gt;

&lt;p&gt;The agent loop applies that same structure to an LLM-powered system. Rather than processing a user message once and returning a static response, the agent feeds its output back into itself, reasoning, acting, observing the result, and reasoning again, until it determines the task is complete.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Ffig3_loop_vertical-1-1-scaled.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Ffig3_loop_vertical-1-1-scaled.png" title="Figure 3: The agent loop: assemble context, reason, act, and repeat until a stop condition ends the run" alt="Flow diagram showing the agent loop. Context is assembled from instructions, memory, and tool outputs, then passed to a reasoning step. The agent acts by responding, calling tools, or writing state. The cycle repeats until a stop condition is met, producing a final response." width="800" height="892"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 3: The agent loop: assemble context, reason, act, and repeat until a stop condition ends the run&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The necessity for loops in agent execution can be derived from the nature of the use cases and tasks agents are applied to. These common use cases can be referred to as &lt;strong&gt;application modes&lt;/strong&gt;: the expected interaction patterns between a user and an agent. There are three:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Assistant&lt;/li&gt;
&lt;li&gt;Deep Research&lt;/li&gt;
&lt;li&gt;Coding&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Take the deep research mode. An agent tasked with finding relevant sources, identifying contradictions across them, and producing a structured summary is not running a single-shot task. It requires the agent to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for relevant sources.&lt;/li&gt;
&lt;li&gt;Read and evaluate what it finds.&lt;/li&gt;
&lt;li&gt;Identify gaps and contradictions.&lt;/li&gt;
&lt;li&gt;Search again to fill in those gaps.&lt;/li&gt;
&lt;li&gt;Synthesise everything into a coherent output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Ffig4_research_vertical-1-1-scaled.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Ffig4_research_vertical-1-1-scaled.png" title="Figure 4: The deep research cycle: search, evaluate, identify gaps, and search again until coverage is sufficient" alt="Diagram showing an agentic research workflow. The process repeatedly searches for sources, reads and evaluates information, identifies gaps or contradictions, and performs additional searches until coverage is sufficient. The collected information is then synthesized into a structured summary." width="800" height="960"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 4: The deep research cycle: search, evaluate, identify gaps, and search again until coverage is sufficient&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;No single LLM call can do all of that. What is required is the mechanism and scaffolding that allows the model to reason, act, observe the result, reason again, and continue until the task is complete. That mechanism is the agent loop.&lt;/p&gt;

&lt;p&gt;Notably, implementations of agent frameworks and harnesses, however opinionated, have shared one thing in common: convergence on a minimal agent loop design. That convergence is arguably not much of a design choice, so much as a logical consequence of the task itself.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The agent loop exists because long-horizon tasks cannot be completed in a single forward pass.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The loop emerging as a design pattern draws a parallel to how humans operate in most organisations: structured cycles of work, review, and feedback that repeat until the objective is met.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Stop Conditions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Loops have to be exited eventually. The programmatic loops taught in computer science classes usually exit in one of two ways: the iteration count for the loop is reached, or a break statement inside the loop triggers an exit.&lt;/p&gt;

&lt;p&gt;A well-designed agent loop defines explicit exit criteria. Common examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model produces a final response with no pending tool calls.&lt;/li&gt;
&lt;li&gt;A goal-completion check returns true: an objective-specific predicate, not merely the absence of tool calls.&lt;/li&gt;
&lt;li&gt;A maximum number of iterations is reached.&lt;/li&gt;
&lt;li&gt;A wall-clock timeout expires.&lt;/li&gt;
&lt;li&gt;An error occurs that the agent cannot recover from.&lt;/li&gt;
&lt;li&gt;The harness identifies a failure mode, such as the agent repeating the same action without progress.&lt;/li&gt;
&lt;li&gt;The agent explicitly invokes an exit action or sets a completion flag.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the notebook accompanying this article, the stop conditions are implemented directly inside the harness:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_execution_time_s&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;60.0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_execution_time_s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;  &lt;span class="c1"&gt;# Wall-clock timeout
&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_openai_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;  &lt;span class="c1"&gt;# Model produced a terminal message; exit the loop
&lt;/span&gt;
        &lt;span class="c1"&gt;# Execute tools, append outputs, continue
&lt;/span&gt;        &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="c1"&gt;# Fallback if max iterations reached
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Max iterations reached; please refine the request.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The max iterations of the loop is set to 10 by default. This is a guard against the loop running indefinitely, which can incur high operational cost through the increase in token consumption across inference calls. There is also a max_execution_time_s parameter, which adds a temporal guard to the agent loop’s execution.&lt;/p&gt;

&lt;p&gt;It is worth noting that a terminal message from the model, one with no further tool calls, ends the agent’s turn. It does not mean the user’s goal has been satisfied. The model may return a clarifying question, a partial result, or a response that requires follow-up. The agent harness is responsible for checking whether the goal is actually complete, not simply whether the model has stopped emitting tool calls. This distinction becomes more consequential as tasks grow in length and complexity, and it is where domain expertise becomes paramount in agent harness engineering.&lt;/p&gt;

&lt;p&gt;Failure mode identification deserves its own mention as an exit path. A loop should break not only when work completes but when work stops progressing.&lt;/p&gt;

&lt;p&gt;The clearest example is tool call repetition: the agent invokes the same tool with identical arguments for a third consecutive iteration, a strong signal that it is stuck rather than working. A well-instrumented harness keeps a window of recent tool calls, detects the repetition, and exits with a diagnostic instead of spending the remaining iterations on a stalled run. Oscillation between two states belongs to the same family of detectable failures.&lt;/p&gt;




&lt;h2&gt;
  
  
  Defining the Agent Loop
&lt;/h2&gt;

&lt;p&gt;With the components and the exit criteria established, the definition can now be stated with precision:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Agent Loop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A cyclical, iterative execution pattern inside a single agent run where the harness repeatedly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Assembles execution context&lt;/strong&gt;: system instructions, conversation state, retrieved memory, tool outputs, and any relevant external data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invokes a reasoning model&lt;/strong&gt; to decide what to do next.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acts&lt;/strong&gt;: responds to the user, calls tools, writes memory or state, or updates its plan.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each cycle appends its trace (assistant messages, tool outputs, state updates) to the context and repeats until a termination check ends the run. Context-window pressure and operational safety (timeouts, iteration caps, budget guards) are first-class concerns, not afterthoughts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Levels of the Agent Loop
&lt;/h2&gt;

&lt;p&gt;The agent loop is not a fixed pattern. The simple design presented above evolves as memory, tooling, and opinionated scaffolding are added. The three levels below provide a framework for where a system currently sits and what engineering work lies ahead. Most production failures (agents that repeat themselves, lose context, or produce inconsistent results across sessions) trace back to a mismatch between task complexity and agent level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Fm_fig5_levels_vertical-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Fm_fig5_levels_vertical-1.png" title="Figure 5: The three levels of the agent loop" alt="Figure 5: The three levels of the agent loop" width="390" height="820"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 5: The three levels of the agent loop&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 1: LLM + Tools + Response
&lt;/h3&gt;

&lt;p&gt;At its simplest, the agent loop is an LLM that can call tools and return a response. There is no persistent memory, no external state, and no scaffolding beyond the loop itself. The loop iterates because tool results must be fed back to the model before it can produce a final answer.&lt;/p&gt;

&lt;p&gt;The code below demonstrates the pattern most developers encounter when building simple tool-calling agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;available_tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;  &lt;span class="c1"&gt;# Terminal message; exit
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Ffig6_level1_vertical-1-1-scaled.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Ffig6_level1_vertical-1-1-scaled.png" title="Figure 6: Level 1: the minimal tool-calling loop" alt="Diagram showing a Level 1 agent architecture. A user interacts with an agent loop containing a model and tools. The model issues tool calls, receives results, and repeats until the task is complete, after which a response is returned. No persistent memory is included." width="800" height="754"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 6: Level 1: the minimal tool-calling loop&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;LangChain’s ReAct agent provides this pattern out of the box. The agent receives an input query, selects a tool, calls it, observes the output, and reasons again, all within a single run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentExecutor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What are the latest AI papers on agent memory?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Level 1 is where most developers start, and it is genuinely useful for self-contained tasks. Its limitation is structural: the agent has no recollection of previous conversations. Every run starts cold, the context window is the only memory it has, and it resets completely when the run ends. On any multi-turn or long-horizon task, it will repeat work it already did, lose track of decisions made earlier in the session, and produce output that contradicts its own prior responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 2: Lifecycle Inside the Loop
&lt;/h3&gt;

&lt;p&gt;At Level 2, operations begin to appear inside the agent loop. Memory is read before the LLM is called, and memory is written after the agent acts. The loop now has a lifecycle. At Level 1, the loop can be seen as a transport mechanism for tool calls. At Level 2, the loop becomes a reasoning engine with state. This is also where the distinction between a memory-augmented agent and a memory-aware agent becomes consequential.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory-augmented agents&lt;/strong&gt; retrieve and inject information into context. They read from memory, but they do not actively manage it. Memory is something that happens to them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory-aware agents&lt;/strong&gt; treat memory as a first-class engineering concern. They encode, store, retrieve, inject, and forget, actively managing their cognitive state within each run and across sessions. Level 2 is where you begin building memory-aware agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This distinction, and the engineering it implies, is the subject of the DeepLearning.AI short course &lt;a href="https://www.deeplearning.ai/courses/agent-memory-building-memory-aware-agents" rel="noopener noreferrer"&gt;Agent Memory: Building Memory-Aware Agents,&lt;/a&gt; built with Oracle, if you want the full overview.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Ffig7_aware_vertical-1-1-scaled.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Ffig7_aware_vertical-1-1-scaled.png" title="Figure 7: Memory-augmented agents read from memory; memory-aware agents manage it" alt="Comparison of memory-augmented and memory-aware agents. In the memory-augmented approach, memory is retrieved and injected into the agent externally. In the memory-aware approach, the agent actively retrieves, stores, updates, and forgets information, directly managing its own memory state." width="800" height="918"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 7: Memory-augmented agents read from memory; memory-aware agents manage it&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Level 2 makes context assembly trade-offs immediately visible. Adding more memory types (conversation history, retrieved documents, entity records, workflow patterns) improves grounding and action selection. On the other hand, it also introduces cost: more tokens, higher latency, and a greater risk of injecting irrelevant or stale content that misleads the model rather than informing it.&lt;/p&gt;

&lt;p&gt;There are a few failure modes worth mentioning:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Noisy retrieval&lt;/strong&gt;: semantically similar documents that are not actually relevant to the current query. Mitigation approaches are implemented via relevance thresholds and precision-oriented retrieval strategies such as hybrid search and pre-, post-, and in-filtering methods in retrieval pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stale memory&lt;/strong&gt;: data can quickly become irrelevant in a fast-paced problem domain: cached facts, entity records, or summaries that are no longer accurate. Mitigate with TTL policies and update-on-write patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool schema overload&lt;/strong&gt;: context bloat is a common problem, and it is most prevalent in tool-calling agents with too many tool definitions passed to the model at once, degrading tool selection accuracy. Mitigate with semantic tool retrieval rather than exhaustive enumeration; this is shown in the companion notebook for this piece.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There are more failure modes, and in production these are not edge cases. They are predictable failures that any Level 2 agent will encounter as memory stores grow. Designing mitigation strategies at the start is cheaper than retrofitting fixes later.&lt;/p&gt;

&lt;p&gt;Memory operations are common in Level 2 agent loops, mainly because agents at this level are designed for continuity and adaptation. &lt;strong&gt;Memory operations are programmatic methods designed to modify data and information within the agent’s system boundary and across other system components such as databases and external stores.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;When It Runs&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Read conversational memory&lt;/td&gt;
&lt;td&gt;Before LLM call&lt;/td&gt;
&lt;td&gt;Load prior chat history into context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read knowledge base&lt;/td&gt;
&lt;td&gt;Before LLM call&lt;/td&gt;
&lt;td&gt;Inject relevant documents and facts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read workflow memory&lt;/td&gt;
&lt;td&gt;Before LLM call&lt;/td&gt;
&lt;td&gt;Surface known action patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read entity memory&lt;/td&gt;
&lt;td&gt;Before LLM call&lt;/td&gt;
&lt;td&gt;Resolve named references in the query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write conversational memory&lt;/td&gt;
&lt;td&gt;After user message received&lt;/td&gt;
&lt;td&gt;Persist the user turn&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write knowledge base&lt;/td&gt;
&lt;td&gt;After tool search&lt;/td&gt;
&lt;td&gt;Store retrieved results for future runs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write entity memory&lt;/td&gt;
&lt;td&gt;After LLM response&lt;/td&gt;
&lt;td&gt;Extract and persist people, places, systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write conversational memory&lt;/td&gt;
&lt;td&gt;After final response&lt;/td&gt;
&lt;td&gt;Persist the assistant turn&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In the accompanying notebook, these operations are centralised in a MemoryManager class backed by Oracle AI Database. Before each run, the harness calls all read operations to assemble context. After each run, write operations persist the new information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Reads: all run BEFORE the tool-call loop
&lt;/span&gt;&lt;span class="n"&gt;conv_mem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_conversational_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;knowledge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;entities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_entity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;summaries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_summary_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;conv_mem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;knowledge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;workflows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;summaries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Inner tool-call loop
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_tool_call_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Writes: all run AFTER the loop exits
&lt;/span&gt;&lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_conversational_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_entity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;extract_entities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The notebook uses six distinct memory types, each stored in Oracle AI Database and each serving a specific cognitive function:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Conversational memory&lt;/strong&gt;: episodic chat history retrieved by thread ID via a standard SQL table. Exact lookup, no similarity search required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge base memory&lt;/strong&gt;: semantic memory backed by a vector-enabled SQL table with HNSW indexing for similarity search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow memory&lt;/strong&gt;: procedural memory storing learned action patterns and tool sequences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Toolbox memory&lt;/strong&gt;: a vector-indexed registry of tool definitions enabling semantic discovery rather than exhaustive schema enumeration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entity memory&lt;/strong&gt;: LLM-extracted people, places, and systems, persisted across sessions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summary memory&lt;/strong&gt;: compressed context for long conversations, with just-in-time expansion when the agent needs the full content.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At Level 2, the loop is no longer just executing tools. It is actively managing its own cognitive state.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Level 3: Operations Inside and Outside the Loop&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At this point, developers understand not only which operations they require inside the loop; more opinionated scaffolding and harness begin to form around the agent loop itself.&lt;/p&gt;

&lt;p&gt;Operations now exist both within the loop and outside it, and there are deliberate architectural choices about which side of the boundary each operation belongs on. This is where agent engineering becomes opinionated, and where context engineering and memory engineering become distinct disciplines with separate concerns.&lt;/p&gt;

&lt;p&gt;In a Level 3 agent loop, some operations should be automatic. The agent should never have to decide whether to load its own conversation history. Others should be agent-triggered: the agent decides when to search the web, not the harness.&lt;/p&gt;

&lt;p&gt;Getting this boundary wrong produces either context bloat, when too much is loaded automatically, or missed context, when content that should always be present is left to the model’s discretion.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Programmatic&lt;/th&gt;
&lt;th&gt;Agent Triggered&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Read conversational memory&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;The agent always needs its history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read knowledge base&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Relevant documents always loaded at run start&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read workflow base&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Known patterns always surfaced before reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read entity memory&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Named references always resolved upfront&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read summary context&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Summary IDs always loaded; full content expanded on demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expand a summary&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Agent decides when it needs the full content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search the web (Tavily)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Agent decides when stored knowledge is insufficient&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summarise conversation&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Agent decides when context needs compaction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write tool log (offload)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Automatic after every tool call; keeps context lean&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Context engineering at Level 3
&lt;/h4&gt;

&lt;p&gt;Three techniques only become necessary at Level 3. Below Level 3, your context is manageable by construction. At Level 3, with memory reads, multiple tool calls, and iterated reasoning, it is not.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context window monitoring&lt;/strong&gt;: tracking token usage across iterations to detect when compaction is needed before the window fills and performance degrades.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation compaction&lt;/strong&gt;: replacing verbose chat history with compressed summaries while preserving originals in the database. The notebook marks messages with a summary_id rather than deleting them, keeping the full record available for audit and on-demand expansion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool output offloading&lt;/strong&gt;: persisting full tool outputs to a tool log table and replacing them in context with a compact one-line reference.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tool log pattern is worth examining in detail. A single web search can return three to four thousand tokens of raw results. Without offloading, every subsequent iteration in the same run carries those tokens. With offloading, the context receives only a reference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;raw_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Full output persisted to the database
&lt;/span&gt;    &lt;span class="n"&gt;log_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_tool_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tool_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;raw_output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Context receives only the compact reference
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[Tool Log ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;log_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] Results stored. Call read_tool_log to retrieve.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Semantic tool discovery
&lt;/h4&gt;

&lt;p&gt;At Level 3, the number of available tools is unlikely to stay small. Passing every tool schema to the model on every iteration is a known failure mode: tool selection accuracy drops as the schema list grows, and token costs climb regardless of how many tools are actually relevant.&lt;/p&gt;

&lt;p&gt;The notebook addresses this with a &lt;strong&gt;Toolbox&lt;/strong&gt;: a vector-indexed registry of tool definitions where only semantically relevant tools are retrieved and passed to the model for each query. Tools are registered with LLM-augmented metadata so that embeddings capture intent and use case, not just function signatures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@toolbox.register_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;augment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# LLM enriches description for retrieval
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_tavily&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the web and persist results in the knowledge base.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="c1"&gt;# At runtime: only semantically relevant tools passed to the model
&lt;/span&gt;&lt;span class="n"&gt;relevant_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_toolbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Idempotency and tool reliability
&lt;/h4&gt;

&lt;p&gt;Tool call failures are a production reality. Network errors, rate limits, and transient service issues occur regularly. If the harness retries a failed tool call naively, it risks executing a side-effecting operation twice: writing a record, sending a message, or triggering a payment more than once.&lt;/p&gt;

&lt;p&gt;The mitigation is idempotency: assigning each tool call a stable key before execution so that retries can be safely distinguished from duplicate calls. This is harness-level engineering, not model-level reasoning, and it belongs in the Level 3 design.&lt;/p&gt;

&lt;h4&gt;
  
  
  Prompt caching and message ordering
&lt;/h4&gt;

&lt;p&gt;At Level 3, the harness also starts to affect inference economics through prompt caching. Most LLM providers implement prefix-based caching: if the beginning of a prompt is identical to a recent request, the cached computation can be reused, reducing latency and cost.&lt;/p&gt;

&lt;p&gt;The implication for agent design is concrete. Rewriting earlier messages mid-conversation, to clean up history, reorder context, or inject new system instructions inline, breaks prefix stability and degrades cache hit rates. The correct pattern is to append new instructions rather than modifying existing message history. The &lt;a href="https://openai.com/index/unrolling-the-codex-agent-loop/" rel="noopener noreferrer"&gt;Codex implementation&lt;/a&gt; established this explicitly: old prompts are preserved as exact prefixes of new prompts specifically to maintain caching benefits across long multi-step runs.&lt;/p&gt;

&lt;p&gt;Level 3 is where the agent harness becomes a system in its own right. The inner loop, assembling context, invoking the model, and acting, has not changed. What has changed is everything around it: the scaffolding that feeds it, the operational constraints that govern it, and the persistence layer that gives it continuity across time and sessions.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Other Loops the Agent Engineer Should Know&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The agent loop does not run in isolation. It sits inside a wider system of loops, and the engineering decisions made inside the agent loop are shaped by what happens in the loops around it.&lt;/p&gt;

&lt;p&gt;Three matter most to agent engineers and memory engineers: the training loop that produced the model, the feedback loop that signals whether the system is working, and the human loop that bounds its authority.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Ffig8_loops_vertical-1-scaled.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Ffig8_loops_vertical-1-scaled.png" title="Figure 8: The loops interconnected: the training loop produces the model, the agent loop generates experience, and the memory layer routes that experience back as training signal" alt="Diagram showing an agent loop connected to an Oracle AI Database memory layer. The loop assembles context, reasons, and acts while reading and writing episodic, semantic, procedural, entity, summary, and tool-log memories. Human review and feedback loops provide corrections and evaluation signals, while accumulated experience can feed future model training." width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 8: The loops interconnected: the training loop produces the model, the agent loop generates experience, and the memory layer routes that experience back as training signal&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The training loop&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The training loop is the cycle that produced the model in the first place: data collection, gradient updates, evaluation, and release.&lt;/strong&gt; It operates offline, at a timescale of days or weeks, on curated datasets. The agent loop operates online, in real time, on live interactions.&lt;/p&gt;

&lt;p&gt;Today these two loops are largely decoupled. Training happens, weights are frozen, and the agent loop runs on top of those fixed weights. The apparent learning you observe within a session, an agent recalling prior context or adapting to corrections, is not weight updating. It is retrieval. The agent is not learning; it is reading from memory.&lt;/p&gt;

&lt;p&gt;This separation defines the boundary of what the agent loop can and cannot accomplish on its own. It can accumulate experience through memory operations. It cannot change the underlying model without a training cycle. Understanding this boundary tells you which problems belong to memory engineering and which require retraining.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The feedback loop&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Every action the agent takes produces feedback. Tool results are feedback. User corrections are feedback. Evaluation metrics (hallucination rate, task completion, citation accuracy) are feedback at a system level.&lt;/p&gt;

&lt;p&gt;At Level 3, the agent harness begins to make the feedback loop explicit and instrumentable. The notebook’s context window growth chart is a primitive example: watching whether token counts stabilize across runs tells you whether your context engineering is actually working. More sophisticated systems route evaluation signals back into memory stores, marking retrieved content as reliable or unreliable based on downstream outcomes, and gradually improving retrieval quality without retraining.&lt;/p&gt;

&lt;p&gt;The feedback loop is what turns an agent into a system that improves over time. Without it, every invocation starts from the same baseline regardless of what the agent has done before.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Human in the loop&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Long-horizon tasks regularly reach decision points where the agent lacks the information, authority, or confidence to proceed without human input. The human-in-the-loop pattern introduces a pause condition: the agent surfaces a question or proposed action, waits for review or correction, and then continues.&lt;/p&gt;

&lt;p&gt;This is a stop condition of a different kind. Rather than halting because the task is finished, the loop pauses because it has reached the boundary of its autonomous authority. Designing this well involves two things: knowing in advance where those boundaries should sit for a given workflow, and ensuring the agent communicates specifically when it reaches one. A generic request for help is insufficient. The agent must surface a precise description of what information or decision is blocking progress.&lt;/p&gt;

&lt;p&gt;Human-in-the-loop is not a safety net for when the agent fails. It is a deliberate architectural decision about where human judgment adds the most value in a system. The agent loop handles what can be reasoned about autonomously. The human loop handles what requires authority, context, or accountability that the agent does not have.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Where This Is Going&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The agent loop, the training loop, and the feedback loop are currently operated as separate engineering concerns. That separation is practical, not fundamental. As agents accumulate experience across millions of runs, the information they generate (episodic memories, entity graphs, workflow patterns, evaluation signals, context growth traces) becomes a training signal. The training loop will eventually consume the output of the agent loop, closing the circle.&lt;/p&gt;

&lt;p&gt;When that happens, the quality of the memory layer becomes the quality of the training data. Agents with well-engineered memory (clean episodic records, accurately extracted entities, reliable retrieval signals) produce better training signals than agents that let context accumulate without structure.&lt;/p&gt;

&lt;p&gt;This convergence has a name. &lt;strong&gt;Continual learning is the ability of a model to acquire new knowledge and capabilities from a stream of incoming data over time, without retraining from scratch and without catastrophically forgetting what it has already learned.&lt;/strong&gt; It is a formal machine learning discipline, not a metaphor, and it is the bridge between the two loops: the agent loop generates the experience, and continual learning is the process by which the training loop absorbs that experience into model weights.&lt;/p&gt;

&lt;p&gt;Continual learning in agentic systems is the capacity of an agent to improve over time through the accumulation of high-signal memory units, with the extracted signal applied across three optimization surfaces: token space, weight space, and latent space.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Union of the Agent Loop and the Training Loop&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;What connects them is the memory layer.&lt;/p&gt;

&lt;p&gt;Oracle AI Database serves as the agent memory core, providing vector search, relational storage, and graph capabilities in a single engine. Memory operations that run inside the agent loop (encoding, storing, retrieving, injecting, and forgetting) produce a durable record of agent experience.&lt;/p&gt;

&lt;p&gt;Oracle OCI provides the platform for continuous learning: the infrastructure to retrain models on that accumulated experience at scale, closing the loop from runtime behaviour back into model weights.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent loop and the training loop are converging. The memory layer is where they meet.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For engineers building agents today, this means the decisions made about memory architecture are not just operational decisions. They are decisions about what the system will be able to learn from tomorrow. A database that can serve low-latency semantic search at runtime can also serve as the data source for a continuous training pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design your memory layer accordingly.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;FAQ&lt;/strong&gt;
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What is the agent loop?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent loop is the repeating cycle a harness runs within a single agent turn: assemble context, invoke the model to reason, act on its decision, and repeat until a stop condition ends the run. It exists because long-horizon tasks cannot be completed in a single LLM call.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;How do you stop an agent loop from running forever?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Define explicit stop conditions in the harness: a terminal message with no pending tool calls, a goal-completion check, an iteration cap, a wall-clock timeout, unrecoverable errors, and failure mode detection such as the agent repeating the same tool call with identical arguments.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What is the difference between a memory-augmented agent and a memory-aware agent?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A memory-augmented agent retrieves and injects information into context but does not manage it; memory is something that happens to the agent. A memory-aware agent encodes, stores, retrieves, injects, and forgets, actively managing its cognitive state within each run and across sessions.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;How do I know which level my agent system sits at?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If there is no persistence beyond the context window, it is Level 1. If memory is read before the model call and written after the agent acts, it is Level 2. If there is a deliberate boundary between programmatic and agent-triggered operations, with techniques such as compaction, tool output offloading, and semantic tool discovery, it is Level 3.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What connects the agent loop to the training loop?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The memory layer. Agent runs generate experience: episodic records, entities, workflows, and evaluation signals. With continual learning, that experience becomes training signal. Oracle AI Database stores and serves it inside the agent loop; Oracle OCI provides the platform to retrain models on it. The patterns are implemented in the companion notebook.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>oracle</category>
      <category>database</category>
    </item>
    <item>
      <title>From RAG to Memory Systems: Building Stateful AI Architecture</title>
      <dc:creator>Wojtek Pluta</dc:creator>
      <pubDate>Fri, 19 Jun 2026 09:12:15 +0000</pubDate>
      <link>https://dev.to/oracledevs/from-rag-to-memory-systems-building-stateful-ai-architecture-32kn</link>
      <guid>https://dev.to/oracledevs/from-rag-to-memory-systems-building-stateful-ai-architecture-32kn</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article is syndicated from the original post on &lt;a href="https://blogs.oracle.com/developers/from-rag-to-memory-systems-building-stateful-ai-architecture" rel="noopener noreferrer"&gt;blogs.oracle.com&lt;/a&gt;. Read the canonical version there for the latest updates.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why retrieval is not memory, and what changes when you build for continuity instead of lookup.
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RAG is retrieval, not memory.&lt;/strong&gt; It helps AI find information, but it cannot reliably remember user preferences, past work, or ongoing conversations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory systems add continuity.&lt;/strong&gt; They store and reuse facts, preferences, policies, and summaries so AI can continue across sessions and interactions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Different memories need different treatment.&lt;/strong&gt; Policies, preferences, facts, episodes, and traces should each have their own storage rules, retrieval methods, and lifecycles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The memory manager is the core component.&lt;/strong&gt; It controls what gets stored, what gets retrieved, and how prompts are rebuilt each turn while maintaining governance, privacy, and efficiency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern most teams ship as &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/retrieval-augmented-generation1.html" rel="noopener noreferrer"&gt;RAG&lt;/a&gt; is roughly four lines of code: embed your documents, embed each user query, pull the top-k nearest neighbors, stuff them into the prompt. It works. It works well enough that it's the default for almost every internal knowledge assistant shipped in the last two years. It also explains why those assistants all feel the same after the first interesting conversation. They forget and they repeat themselves. They cannot learn from what they were just told, and they cannot carry on a meaningful series of conversations.&lt;/p&gt;

&lt;p&gt;RAG helps a model look things up. A memory system helps an application remember and continue across turns. The goal is not to keep packing more tokens into prompts. The goal is to build a reusable memory layer that ingests once, distills what's useful, and retrieves the right slice when it's needed. Most teams don't have agent memory, they have retrieval plus prompt inflation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try the companion demo:&lt;/strong&gt; explore the working RAG-to-memory implementation, schemas, and runnable code in the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/rag-to-memory-systems-demo" rel="noopener noreferrer"&gt;Oracle AI Developer Hub repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Check out &lt;a href="https://www.youtube.com/watch?v=lQLHMXd08xI" rel="noopener noreferrer"&gt;the walkthrough&lt;/a&gt; where we show what has to sit around retrieval to make AI agents actually stateful: typed memory, scoped records, promotion gates, fallback retrieval, prompt assembly, and traces.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to evolve RAG into a memory system
&lt;/h2&gt;

&lt;p&gt;The short version: keep your retrieval pipeline as one input to a larger loop, and add these five things around it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Type your memory.&lt;/strong&gt; Separate policy, preference, fact, episodic, and trace memory into distinct schemas with distinct lifecycles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scope every record.&lt;/strong&gt; Add a tenant_id, user_id, and agent_id, then enforce scope as a hard predicate before ranking, never after.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Put a promotion gate between observations and durable writes&lt;/strong&gt;. This prevents the store from poisoning itself with everything the model said.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reassemble the prompt on EVERY turn&lt;/strong&gt;. Use typed memory rather than accumulating context across turns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instrument the loop.&lt;/strong&gt; Add a trace envelope so you can replay, audit, and improve it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2Frag-vs-memory-system-1024x547.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2Frag-vs-memory-system-1024x547.png" title="RAG vs Memory comparison" alt="A side-by-side architecture diagram comparing basic RAG as a one-way document lookup pattern with a memory system as a loop that distills observations into typed, governed memory and reassembles prompts each turn." width="799" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;RAG vs Memory comparison&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The rest of this guide walks through each step, the schemas behind them, the queries, the order you should build things in, and the tradeoffs that decide whether it survives production.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why basic RAG stops short of stateful AI
&lt;/h2&gt;

&lt;p&gt;Pure retrieval architectures fail in five places, usually in the same order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-turn continuity.&lt;/strong&gt; The system has no memory of what was just said unless you re-stuff the transcript every turn. Stuff it and you pay tokens and lose model quality to context rot. Skip it and the agent forgets what the user asked three messages ago. Retrieval alone won't fix this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resumability.&lt;/strong&gt; A user closes the tab on Friday and reopens it on Monday. The agent should pick up where they left off. With RAG alone, the only thing that survives is the document corpus. Every preference the user expressed and every partial workflow they were in the middle of is gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long conversations.&lt;/strong&gt; After a few hundred turns you can't pack the transcript anymore. You need summarization and structured extraction. None of that is retrieval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User preferences and policy recall.&lt;/strong&gt; "This user wants JSON." "This tenant formats dates DD/MM/YYYY." "Refunds over $500 require approval." You don't get those by semantic similarity to a user's last message. They're durable rules and parameters that need exact lookup with stable visibility on every turn, scoped to the right user or tenant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt growth.&lt;/strong&gt; The default response to all of the above is to throw more into the prompt. More retrieval, more history, more context. The token bill compounds, the model gets lost-in-the-middle, and the system feels slower at exactly the moment it should feel more competent.&lt;/p&gt;

&lt;p&gt;If all you need is semantic lookup over documents, basic RAG is enough. If you need continuity and governed recall, you now have a memory system design problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  What changes when retrieval becomes memory
&lt;/h2&gt;

&lt;p&gt;The shift here isn't bolting a database next to the vector store. What changes is what the storage layer is for and how your agents interact with it.&lt;/p&gt;

&lt;p&gt;Retrieval is a query against a corpus you ingested once. The corpus is upstream and the query is downstream. Nothing the model says or does flows back into the corpus.&lt;/p&gt;

&lt;p&gt;Memory is a &lt;em&gt;write&lt;/em&gt; path. Anything the system observes during a run, or that a user explicitly confirms, can be promoted into a durable store. Each promoted record carries its own scope, its own provenance back to the run that created it, and a retention rule. The same record can be read later from a different turn, a different session, or by a different agent operating under the same access boundary.&lt;/p&gt;

&lt;p&gt;A useful frame for the rest of this guide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieve when you need knowledge&lt;/li&gt;
&lt;li&gt;Store when you need continuity&lt;/li&gt;
&lt;li&gt;Structure when you need precision&lt;/li&gt;
&lt;li&gt;Summarize when you need compression&lt;/li&gt;
&lt;li&gt;Govern when you need production trust&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's a popular metaphor for this kind of system: &lt;em&gt;the second brain&lt;/em&gt;. The framing matters because most second-brain implementations stop one step short. They give you searchable notes, which is closer to a better filing cabinet than to actual memory. A real second brain is accumulated, reusable memory. Notes get distilled into facts that attach to the entities they describe, and completed work gets summarized into reusable episodes. The same store then serves a chat session for one user, an autonomous agent running on their behalf, and a human searching their own work later, without any of them needing its own copy. The architectural leap is from file retrieval to memory formation. The optimization is from context stuffing to context reuse.&lt;/p&gt;

&lt;p&gt;A vector store is not a memory system. The rest of this guide is what's underneath that distinction.&lt;/p&gt;




&lt;h2&gt;
  
  
  What kinds of memory do AI systems need
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2Ffive-types-of-memory-1024x694.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2Ffive-types-of-memory-1024x694.png" title="Memory types for AI systems" alt="A memory manager fans out into policy, preference, fact, episodic, and trace memory, with exact lookup for policy and preferences, hybrid retrieval for facts and episodes, and replay or distillation from trace." width="800" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Memory types for AI systems&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;"Add memory" may sound like a single feature, but in practice it refers to several distinct systems. If you don't separate them, you end up with one catch-all store that handles every question poorly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anti-pattern: one memory store for every kind of memory.&lt;/strong&gt; A single vector index where everything goes in. Raw conversation logs sit next to extracted facts. A user preference looks like a passing remark from three weeks ago. A tenant policy is just another row to be ranked. Retrieval can't tell these apart at query time, and neither can the model when the results land in the prompt.&lt;/p&gt;

&lt;p&gt;There are five types worth distinguishing. Each one has its own schema and lifecycle, and each calls for a different retrieval strategy. Conflating any two of them produces specific, predictable failure modes.&lt;/p&gt;




&lt;h3&gt;
  
  
  Policy memory
&lt;/h3&gt;

&lt;p&gt;Procedural rules and constraints. Tenant-scoped or global, versioned, and tightly controlled. Retrieved by exact match, never similarity.&lt;/p&gt;

&lt;p&gt;Why it exists: the agent has to follow rules. Compliance constraints, brand guardrails, approval thresholds, security boundaries, etc. These change rarely and deliberately. They aren't suggestions and they aren't searchable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://json-schema.org/draft/2020-12/schema"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"policy_memory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"policy_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"policy_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"enum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"compliance"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"guardrail"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"approval"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"policy_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"policy_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minimum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"effective_from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"date-time"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"effective_until"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"date-time"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"created_by"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example records:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"policy_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"refund_threshold"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"policy_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"max_auto_approve_usd"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"policy_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"data_residency"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"policy_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"allowed_regions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"policy_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tone_guardrail"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"policy_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"forbidden_phrases"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"risk-free"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Retrieval is WHERE tenant_id = ? AND policy_key = ?. No vectors involved. Policy retrieval that uses similarity is a bug, because you'll silently drift away from the rule that's actually in force.&lt;/p&gt;




&lt;h3&gt;
  
  
  Preference memory
&lt;/h3&gt;

&lt;p&gt;Stable personalization parameters. User-scoped, sometimes tenant-scoped. Predictable lookup, no ranking.&lt;/p&gt;

&lt;p&gt;Why it exists: this is what makes the system feel tailored without the user having to retell it every session. "I want JSON." "Use DD/MM/YYYY." "Be terse." Skip preference retrieval once and the system feels generic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://json-schema.org/draft/2020-12/schema"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"preference_memory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"pref_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"pref_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"enum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"user_stated"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"inferred"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"admin_set"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"number"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minimum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maximum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"updated_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"date-time"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example records:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"pref_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"response_format"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"pref_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_stated"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"pref_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"date_format"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"pref_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DD/MM/YYYY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"inferred"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"pref_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"verbosity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"pref_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"terse"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_stated"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Retrieval is a single keyed lookup that runs every turn. Preferences feed the static prefix of the prompt: the part that should hit the cache and stay stable.&lt;/p&gt;




&lt;h3&gt;
  
  
  Fact memory
&lt;/h3&gt;

&lt;p&gt;Durable assertions the agent may reuse, with provenance. Where compounding advantage lives, and where design problems get hardest, because every fact you write is a bet about the future.&lt;/p&gt;

&lt;p&gt;Why it exists: this is the agent's working knowledge of the world. "The customer's production database is in us-east-1." "Their fiscal year starts April 1." Facts are what let the agent stop asking the same clarifying questions across sessions. They're also what poisons the system when promoted carelessly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://json-schema.org/draft/2020-12/schema"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fact_memory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"fact_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NULL for tenant-scoped facts"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"agent_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NULL for shared facts"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"entity the fact is about"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"predicate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kind of fact"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"the fact text"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"content_hash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SHA-256 for dedup"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"embedding"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"array"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"number"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minItems"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxItems"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nullable; rebuildable"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"partial-match filters"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"enum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"provisional"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"revoked"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"source_run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"source_turn_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"number"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minimum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maximum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"superseded_by"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fact_id of replacement"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"expires_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"date-time"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"date-time"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"x-indexes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"idx_fact_scope"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"columns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agent_id"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"idx_fact_status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"columns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"idx_fact_dedup"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"columns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"content_hash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few schema details deserve attention before moving on. The &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/create-tables-using-vector-data-type.html" rel="noopener noreferrer"&gt;embedding column&lt;/a&gt; lives in the same row as content and the scope columns, which lets one query plan filter by tenant and search semantically. The content_hash is the dedup primitive: a SHA-256 of normalized content, scoped by (tenant_id, user_id, agent_id), so the same assertion written twice resolves to the same row instead of two rows competing for retrieval. And subject and predicate work as metadata tags for retrieval and grouping, not as the structured pieces of an RDF-style triple; the assertion itself lives in content as prose, which is what the agent reads. (Embedding dimension is model-dependent. The 1024 shown here matches several modern embedding models, but set minItems/maxItems to match the model you actually use.)&lt;/p&gt;

&lt;p&gt;Example records:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customer:acme-corp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"predicate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"infrastructure"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Production database is in us-east-1, replicas in us-west-2."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"source_run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_a1b2c3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fact memory needs hybrid retrieval (lexical for "give me facts about subject X," semantic for "give me facts conceptually adjacent"). The retrieval section walks through the scoring.&lt;/p&gt;




&lt;h3&gt;
  
  
  Episodic memory
&lt;/h3&gt;

&lt;p&gt;Structured summaries of completed work. The summary is a reusable artifact distilled from the trace; the underlying trace is kept separately for replay.&lt;/p&gt;

&lt;p&gt;Why it exists: most production tasks are variations on tasks the agent has already completed. Episodic memory is what lets the system recognize "we've done this before" and retrieve the shape of the prior solution rather than re-deriving it from scratch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://json-schema.org/draft/2020-12/schema"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"episodic_memory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"episode_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"task_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"examples"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"support_case"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"migration"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"outcome"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"examples"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"resolved"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"escalated"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"key_steps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"array"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"artifacts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"file refs, ticket IDs"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"embedding"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"array"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"number"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minItems"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxItems"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"enum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"provisional"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"revoked"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"source_run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"completed_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"date-time"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"x-indexes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"idx_ep_scope"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"columns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"task_type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"completed_at"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example record:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"task_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"support_case"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Stripe webhook signature mismatch after secret rotation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User rotated webhook secret in dashboard but kept old secret in env var. Resolved by updating env var and redeploying."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"outcome"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"resolved"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"key_steps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="s2"&gt;"Confirmed signature verification was failing for all events"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="s2"&gt;"Compared dashboard secret to env var"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="s2"&gt;"Updated env var, redeployed, verified delivery"&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Retrieval is hybrid (lexical + vector, fused) over the summary, optionally filtered by task_type.&lt;/p&gt;




&lt;h3&gt;
  
  
  Trace memory
&lt;/h3&gt;

&lt;p&gt;Raw, append-only execution events. The flight recorder. High-volume, mostly useful for replay and forensics.&lt;/p&gt;

&lt;p&gt;Why it exists: you need to be able to reconstruct what happened in a run. What did the agent retrieve, what did it decide, what tools did it call, what did the user say, what did the model say back. Trace memory is the source from which episodic and fact memory get distilled.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://json-schema.org/draft/2020-12/schema"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trace_memory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"turn_index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minimum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"event_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"enum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"user_msg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_result"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"model_msg"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"payload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"token_cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minimum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"latency_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minimum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"date-time"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"x-indexes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"idx_trace_run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"columns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"run_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"turn_index"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"idx_trace_tenant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"columns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Trace memory is the wrong place to look when you want to know what a user is &lt;em&gt;like&lt;/em&gt;. It tells you what a user &lt;em&gt;did&lt;/em&gt;. Most teams conflate trace with fact and preference memory and end up doing semantic search across execution logs to infer how to behave. That's how systems start hallucinating their own personality.&lt;/p&gt;




&lt;h3&gt;
  
  
  How this maps to other taxonomies
&lt;/h3&gt;

&lt;p&gt;A lot of public writing slices memory differently. Cognitive-science framings use semantic, episodic, and procedural. Some product framings add conversational, summary, and entity. Those terms aren't wrong, they just mix structural categories (conversational, summary) with content categories (semantic, entity). For implementation I prefer this five-type split because each type behaves differently in storage and retrieval, and treating them as one thing produces predictable failures.&lt;/p&gt;

&lt;p&gt;If you're using the Oracle Agent Memory SDK, it records policy as a guideline, preferences as preference, facts as fact, episodic as memory, and traces as message. Pick a vocabulary and stick with it across your codebase. The categories matter; the names don't.&lt;/p&gt;




&lt;h2&gt;
  
  
  How should you store agent memory
&lt;/h2&gt;

&lt;p&gt;The shape of the memory drives the store. Four tradeoffs come up over and over.&lt;/p&gt;




&lt;h3&gt;
  
  
  Short-term (STM) vs Long-term (LTM) memory
&lt;/h3&gt;

&lt;p&gt;Short-term memory means the agent process keeps state in RAM (a Python dict, a Redis cache, a session object) and loses it when the process restarts. Fast, simple, zero friction. The right call for one specific case: ephemeral session state during a single run. Tool outputs, scratch reasoning, intermediate retrieval results, anything that shouldn't outlive the turn.&lt;/p&gt;

&lt;p&gt;For everything else it's the wrong call. State that lives only in RAM gives you no replay path when something goes wrong, no audit trail when a regulator asks how a decision got made, no way to enforce retention, and no scoping model to keep tenants apart. Yesterday's session is gone the moment the process restarts.&lt;/p&gt;

&lt;p&gt;Long-term memory means the durable layer is a real datastore. The structured schema, the transactional guarantees, and the query language are why it costs more to write to and far more to operate around. It's the only viable choice for policy, preference, fact, and episodic memory.&lt;/p&gt;

&lt;p&gt;A real system uses both. Working set in short-term memory during a run, long-term for anything that should survive the run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Ephemeral, discarded at end of run
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scratch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;  &lt;span class="c1"&gt;# tool outputs, intermediate state
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;turn_buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;  &lt;span class="c1"&gt;# current turn's events before flush
&lt;/span&gt;
        &lt;span class="c1"&gt;# Durable, read from / written to via the memory manager
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What changes when you move durable state into a database:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Replay becomes possible.&lt;/strong&gt; You can reproduce yesterday's session by replaying trace memory for that run_id. Without this, debugging is guesswork.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-process and multi-region work correctly.&lt;/strong&gt; A user starts a session on one server, comes back tomorrow on a different server, and the agent picks up where it left off.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Promotion becomes a real operation.&lt;/strong&gt; You can't promote a candidate fact to durable memory if there is no durable memory. Everything is equally ephemeral, which means nothing accumulates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit and deletion become tractable.&lt;/strong&gt; GDPR right-to-forget is a DELETE cascading across stores. With short-term memory it's fiction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost goes up at write time and down at read time.&lt;/strong&gt; Database writes are slower than dict writes, but database reads of structured memory are far cheaper than re-running extraction over a transcript every turn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Anti-pattern: no replay.&lt;/strong&gt; You can't reproduce yesterday's session because the memory store has moved on. This is the bug class that ruins post-mortems. Trace memory written on every turn is the cheapest insurance against it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anti-pattern: treating scratchpad as memory.&lt;/strong&gt; Tool call results, intermediate reasoning, and generated assets all leaking into long-term storage because nothing ever explicitly discards them. Your memory ends up mostly exhaust. The split above (ephemeral RAM for the run, durable database for what should survive) is the discipline that prevents this.&lt;/p&gt;




&lt;h3&gt;
  
  
  Filesystem vs database memory
&lt;/h3&gt;

&lt;p&gt;A class of memory storage lives on the filesystem: markdown files, AGENTS.md, project notes, structured directories of knowledge. The pattern is everywhere in coding agents (Claude Code, Cursor, Codex), and it works there for a specific reason: the filesystem is already the unit of work. The repo is the scope and the file path is the boundary. Models trained on internet-era developer workflows are unusually competent with developer-native interfaces, which is why filesystems keep showing up in modern agent stacks.&lt;/p&gt;

&lt;p&gt;There are two distinctions. The first is between the filesystem as an &lt;em&gt;interface&lt;/em&gt; (the small Unix-shaped tool surface of ls, cat, grep, tail, read_range) and as a &lt;em&gt;substrate&lt;/em&gt; (the place durable state actually lives). The interface argument is strong: agents are good at composing those primitives, and you don't need bespoke memory APIs when a folder of markdown will do. The substrate argument is much weaker once you leave a single-tenant prototype, because filesystems give you almost none of what shared, reliable memory needs.&lt;/p&gt;

&lt;p&gt;What breaks when filesystem becomes the substrate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No tenant isolation.&lt;/strong&gt; A misconfigured path traversal or a shared root directory exposes another tenant's data. There is no row-level access control; permissions are filesystem-level, which is a coarse instrument for a fine-grained problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No transactional guarantees.&lt;/strong&gt; Updating a fact and superseding the old one takes two file writes that have to succeed together. They can't. Crash in between and you have inconsistent memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No hybrid retrieval in one query.&lt;/strong&gt; You can grep, or you can embed and search vectors, but not both filtered by the same predicates in the same operation. Once you start maintaining your own indexes and metadata files, you are building a database with fewer guarantees.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No deletion cascade.&lt;/strong&gt; Removing a fact from a markdown file doesn't invalidate the embedding index that referenced it. GDPR right-to-forget is a much harder operation when truth is scattered across files and projections that point at them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency is your problem.&lt;/strong&gt; Concurrent writes can interleave or overwrite without locking. Locking semantics vary across platforms and network filesystems, and naive concurrent writes silently lose entries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When to choose which:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Filesystem&lt;/th&gt;
&lt;th&gt;Database&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single-tenant local agent (coding, personal)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Overkill&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-tenant SaaS agent&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Need transactional updates&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Need exact + semantic in one query&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Want users to edit memory by hand&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Possible but harder&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Need replay and audit trails&lt;/td&gt;
&lt;td&gt;Possible but painful&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Local files are not the end state. They're a useful interface for single-developer agents and a poor substrate for everything else.&lt;/p&gt;




&lt;h3&gt;
  
  
  Single table vs multiple typed tables
&lt;/h3&gt;

&lt;p&gt;Once you've chosen a database, the next decision is how to lay it out. Three patterns come up in practice, with real shipped examples for each.&lt;/p&gt;

&lt;p&gt;The simplest is a single store with a type discriminator (a memory_type column or a namespace path). &lt;a href="https://langchain-ai.github.io/langgraph/concepts/memory/" rel="noopener noreferrer"&gt;LangGraph's BaseStore&lt;/a&gt; takes this approach: a key-value store with hierarchical namespaces, where different memory categories live under different prefixes rather than in different tables. &lt;a href="https://docs.mem0.ai/overview" rel="noopener noreferrer"&gt;Mem0&lt;/a&gt; sits nearby in spirit, though its implementation spreads memory across a vector store, a graph store, and a key-value store rather than collapsing everything into one row shape. The wins of this pattern are real: schema simplicity, one storage interface for everything, no joins, easy to add a new category. For a single-tenant prototype or a low-volume internal tool, this is the right call.&lt;/p&gt;

&lt;p&gt;The middle pattern splits by access pattern. The most common shape is two stores: one for durable memory and a separate store for conversation history or traces. &lt;a href="https://langchain-ai.github.io/langgraph/concepts/persistence/" rel="noopener noreferrer"&gt;LangGraph does this at the framework level&lt;/a&gt; (BaseStore for cross-thread memory, checkpointer for thread-local conversation state). &lt;a href="https://docs.letta.com/guides/agents/memory" rel="noopener noreferrer"&gt;Letta splits further into core memory (in-context blocks), recall memory (conversation history), and archival memory (durable facts in a vector store)&lt;/a&gt;. &lt;a href="https://docs.oracle.com/en/database/oracle/agent-memory/" rel="noopener noreferrer"&gt;Oracle's AI Agent Memory SDK&lt;/a&gt; lands here too: its public API exposes three primitives (users and agents, memories, threads), with the underlying storage split across distinct tables for threads, messages, memory records, and actor profiles. The reasoning is consistent across these systems. Traces are append-only, high-volume, and queried by run_id for replay. They share almost nothing at the storage layer with durable memory. Letting traces grow inside the same table as facts means heavy trace writes bloat the working set for every other query.&lt;/p&gt;

&lt;p&gt;The third pattern is one table per memory type, the model this article describes. The cost is real: more DDL, more places to touch when something cross-cutting changes, more conceptual surface area for someone new to the codebase. The payoff is that the rules each type follows live in the schema itself. Per-type retention, per-type indexes (no wasted vector index on policy or preference), and per-type access patterns become enforceable without application code having to remember them. The shift from the middle pattern to this one is incremental: you're already splitting traces out, now you split policy and preference out from fact and episodic for the same reason.&lt;/p&gt;

&lt;p&gt;A reasonable working rule: start with a single store when the system is a prototype. Split traces or conversation history out as soon as their volume changes the characteristics of the operational store. Move to the typed model when you have more than one tenant, or when policy and preference start needing different retention rules than fact memory. Each split costs something at the time you do it and pays back across every retrieval after.&lt;/p&gt;




&lt;h3&gt;
  
  
  Exact retrieval vs semantic retrieval
&lt;/h3&gt;

&lt;p&gt;Pure semantic retrieval is a first draft. It works for "find me documents about this concept." It fails for "what is this user's timezone preference," because that's a key lookup and similarity ranking is the wrong tool for a key lookup.&lt;/p&gt;

&lt;p&gt;Memory needs both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exact retrieval handles preferences, policies, IDs, and scoped lookups&lt;/li&gt;
&lt;li&gt;Semantic retrieval handles facts, episodes, and recall over unstructured content&lt;/li&gt;
&lt;li&gt;Hybrid retrieval (lexical plus semantic, with reranking) is the production floor for fact and episodic memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architectural question is whether your store can do both in one query, or whether you have to compose them across systems. Composing across systems is workable until you need them filtered by the same access-control predicates, at which point a single query plan starts to look attractive.&lt;/p&gt;




&lt;h3&gt;
  
  
  Transcripts vs summaries
&lt;/h3&gt;

&lt;p&gt;Storing raw transcripts as memory is the most common anti-pattern in this space. Fast to ship, expensive to live with. Transcripts are noisy: false starts, retracted statements, recovered tool errors, intermediate reasoning the user never confirmed. Index them directly and you train your retrieval layer to surface noise.&lt;/p&gt;

&lt;p&gt;Summaries are durable memory. Transcripts are source material. Keep transcripts in trace memory for replay, promote structured summaries into episodic memory for reuse. The transcript records what happened. The summary records what should be remembered.&lt;/p&gt;




&lt;h3&gt;
  
  
  Storage tradeoff at a glance
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory type&lt;/th&gt;
&lt;th&gt;Store shape&lt;/th&gt;
&lt;th&gt;Retrieval pattern&lt;/th&gt;
&lt;th&gt;Lifecycle&lt;/th&gt;
&lt;th&gt;Risk if wrong&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Policy&lt;/td&gt;
&lt;td&gt;Relational, versioned&lt;/td&gt;
&lt;td&gt;Exact match by ID/version&lt;/td&gt;
&lt;td&gt;Immutable, deployment-controlled&lt;/td&gt;
&lt;td&gt;Silent guardrail drift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Preference&lt;/td&gt;
&lt;td&gt;Structured rows or JSON keyed to user&lt;/td&gt;
&lt;td&gt;Exact match by user ID&lt;/td&gt;
&lt;td&gt;TTL, user-controlled&lt;/td&gt;
&lt;td&gt;System feels generic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fact&lt;/td&gt;
&lt;td&gt;Hybrid: structured + vector&lt;/td&gt;
&lt;td&gt;Lexical + semantic with rerank&lt;/td&gt;
&lt;td&gt;Provenance, decay, revocation&lt;/td&gt;
&lt;td&gt;Memory poisoning, drift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Episodic&lt;/td&gt;
&lt;td&gt;Structured summaries with vector index&lt;/td&gt;
&lt;td&gt;Lexical + Semantic over summaries&lt;/td&gt;
&lt;td&gt;Long-lived, policy-gated&lt;/td&gt;
&lt;td&gt;Precedent becomes policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trace&lt;/td&gt;
&lt;td&gt;Append-only event log&lt;/td&gt;
&lt;td&gt;Replay by run ID; vector for forensics&lt;/td&gt;
&lt;td&gt;Retention-policy bound&lt;/td&gt;
&lt;td&gt;No replay, no debugging&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  How memory systems handle long conversations without exploding context
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Fper-turn-loop-1024x590.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Fper-turn-loop-1024x590.png" title="Bounded-prompt agent memory loop" alt="Each turn appends trace events, retrieves durable memory, assembles a bounded prompt, calls the model, records the response, extracts candidates, and sends them through a promotion gate." width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Bounded-prompt agent memory loop&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is the question that drives most teams toward a real memory system, and the question they usually try to solve in the wrong place. The instinct is to keep the entire conversation in the prompt and trust the model's growing context window to absorb it. That's the dump-the-transcript anti-pattern, and it fails for three compounding reasons.&lt;/p&gt;

&lt;p&gt;Context windows aren't free. Cache hits are cheap, cache misses on a million-token prompt aren't, and most production agent workloads have enough volatile content per turn that cache hit rates end up mediocre.&lt;/p&gt;

&lt;p&gt;More tokens often make the model worse. Lost-in-the-middle, attention dilution, and context rot all show up in production at long context lengths. The default instinct to give the model everything is the default wrong answer.&lt;/p&gt;

&lt;p&gt;Even if cost and quality were free, transcripts are the wrong shape for retrieval. You can't ask a transcript "what does this user prefer?" and get a clean answer. You can only ask the summary you extracted from it.&lt;/p&gt;

&lt;p&gt;The pattern that works keeps the raw record in trace memory, but doesn't retrieve from it directly. Periodically distill it into structured artifacts in episodic and fact memory and retrieve from those. The conversation gets shorter over time in the prompt, even as the underlying record gets longer.&lt;/p&gt;

&lt;p&gt;Each turn runs a fixed loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Append the user message to trace memory&lt;/li&gt;
&lt;li&gt;Reassemble the prompt from durable memory (policy + preferences + retrieved facts + retrieved episodes + a short summary of recent turns)&lt;/li&gt;
&lt;li&gt;Call the model&lt;/li&gt;
&lt;li&gt;Append the response back to trace memory&lt;/li&gt;
&lt;li&gt;Extract structured artifacts for the promotion gate&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key move is step 2. The prompt is rebuilt from scratch on every turn. Active policy goes in. Applicable preferences go in. The top-ranked facts and episodes for this query go in. A short summary of recent turns goes in at the tail. The transcript itself stays in trace memory. That is the difference between accumulating context and reassembling it: accumulating grows forever, reassembling stays bounded.&lt;/p&gt;

&lt;p&gt;For the extraction step, two patterns work in production. The first is a combined structured-output call: the model returns its reply AND any extraction candidates in one API request, so extraction adds no extra latency or cost to the user-visible turn. The second is a separate extractor LLM that runs as a follow-up call after the reply. That path should move off the request thread (a background task, a queue worker, a separate process) so the user doesn't wait on a second round-trip.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two retrieval paths, not one
&lt;/h2&gt;

&lt;p&gt;This is the most important structural decision in the system, and it's the one most often missed. Memory retrieval has two modes that look similar from the outside but have completely different requirements underneath.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Known-scope lookup.&lt;/strong&gt; "Give me all active policies and preferences that apply to this agent for this turn." The query knows exactly what it wants. This is enumeration. Every policy that applies must be returned, no top-k cutoff, no ranking. Cheap, deterministic, runs on every single turn. This is what feeds the model's static prefix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic discovery.&lt;/strong&gt; "Find facts and episodes conceptually relevant to this user message." The query doesn't know what it wants until it sees what comes back. Ranking matters. Top-k matters. Score thresholds matter. This is what feeds the volatile tail of the prompt, on demand.&lt;/p&gt;

&lt;p&gt;Build both into the manager from day one, because conflating them is the most common mistake in this space.&lt;/p&gt;




&lt;h3&gt;
  
  
  Path A: Known-scope lookup (startup injection)
&lt;/h3&gt;

&lt;p&gt;When a session starts or an agent spawns, you need every policy and preference that applies, in full. You're not searching by relevance; you're loading the rule book. The right tool is exact-match SQL with a UNION across scope buckets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- All policy and preference memory that applies to this turn.&lt;/span&gt;
&lt;span class="c1"&gt;-- Exhaustive, no ranking, no top-k cutoff.&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'policy'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;policy_key&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;policy_value&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
 &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;policy_memory&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;effective_until&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;effective_until&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'preference'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pref_key&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pref_value&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
 &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;preference_memory&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs on every turn. It's fast (indexed exact match), exhaustive (no top-k), and deterministic (same inputs, same output). It feeds the static prefix of the prompt: the part that ideally hits the prompt cache and doesn't change between turns.&lt;/p&gt;

&lt;p&gt;A couple of things to call out. The query filters by scope before doing anything else, so cross-tenant data is structurally invisible. And there's no embedding involved anywhere in the path. Policy retrieval that uses similarity drifts silently away from the rule that's actually in force, which is one of the worst failure modes in this space.&lt;/p&gt;




&lt;h3&gt;
  
  
  Path B: Semantic discovery (runtime recall)
&lt;/h3&gt;

&lt;p&gt;When the user sends a message, you need facts and episodes that might be relevant, ranked by how relevant they actually are. This is where hybrid retrieval lives: lexical search for term-exact recall, vector search for conceptual recall, fused into one ranked list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Hybrid: vector similarity + lexical match, scope-filtered before ranking.&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;vector_hits&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;fact_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;vector_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ALL_MINILM_L12_V2&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;DATA&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'COSINE'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;vector_distance&lt;/span&gt;
 &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;fact_memory&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;superseded_by&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;vector_distance&lt;/span&gt;
 &lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;lexical_hits&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;fact_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SCORE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lexical_score&lt;/span&gt;
 &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;fact_memory&lt;/span&gt; &lt;span class="c1"&gt;-- Oracle Text CONTEXT index on content&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;CONTAINS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
 &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;lexical_score&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
 &lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;fact_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;vector_distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lexical_score&lt;/span&gt;
 &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;vector_hits&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;
 &lt;span class="k"&gt;FULL&lt;/span&gt; &lt;span class="k"&gt;OUTER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;lexical_hits&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fact_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fact_id&lt;/span&gt; &lt;span class="c1"&gt;-- content is CLOB; can't appear in USING/equality joins;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lexical CTE above uses &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/ccref/oracle-text-SQL-statements-and-operators.html" rel="noopener noreferrer"&gt;Oracle Text syntax&lt;/a&gt; (CONTAINS, SCORE, the trailing index-id argument). On other engines, you’d swap in the equivalent such as &lt;a href="https://www.postgresql.org/docs/current/textsearch-controls.html" rel="noopener noreferrer"&gt;tsvector @@ to_tsquery on Postgres&lt;/a&gt;, &lt;a href="https://dev.mysql.com/doc/refman/8.4/en/fulltext-search.html" rel="noopener noreferrer"&gt;MATCH...AGAINST on MySQL&lt;/a&gt;, &lt;a href="https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-match-query-phrase" rel="noopener noreferrer"&gt;match_phrase on Elasticsearch&lt;/a&gt;. The hybrid pattern is the same with different operator names. The application layer fuses scores. A weighting that holds up well in production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fuse_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lexical_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_lexical&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Vector: convert distance to similarity in [0, 1]
&lt;/span&gt;    &lt;span class="n"&gt;v_sim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;vector_distance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;vector_distance&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;

    &lt;span class="c1"&gt;# Lexical: normalize against the max in this batch
&lt;/span&gt;    &lt;span class="n"&gt;l_sim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lexical_score&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;max_lexical&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lexical_score&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;max_lexical&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;

    &lt;span class="c1"&gt;# Weighted: lexical gets more weight when both signals exist
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;vector_distance&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;lexical_score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;v_sim&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;l_sim&lt;/span&gt;

    &lt;span class="c1"&gt;# Vector-only fallback (no lexical match)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;v_sim&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From running this in production, two settings matter most. Lexical hits get more weight (0.6) than vector similarity (0.4). When both signals fire on the same row, the row is genuinely relevant; when only the vector fires, it's plausibly relevant but should rank lower. And drop everything below a fused score of 0.4, because vector search has a long tail of weakly-related results that hurt more than they help.&lt;/p&gt;

&lt;p&gt;Calibrate the result set into tiers the agent can actually use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;relevance_tier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;standard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# lexical fallback, no vector signal
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;standard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Returning bare scores asks the agent to do calibration work it’s not good at. Returning relevance: "high" lets the model decide how much weight to give each result, which is something it does well.&lt;/p&gt;




&lt;h3&gt;
  
  
  Filter before ranking, always
&lt;/h3&gt;

&lt;p&gt;Both paths share one rule: scope filters run before ranking, never after. A query that ranks across all tenants and then filters is a leak waiting to happen. The filter itself works fine; the problem is that the embedding neighborhood was already shaped by data the user shouldn't have seen.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- WRONG: ranks across all tenants, then filters&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;fact_memory&lt;/span&gt;
 &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;vector_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'COSINE'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;current_tenant&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- RIGHT: filters by scope first, ranks within scope&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;fact_memory&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;current_tenant&lt;/span&gt;
 &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;vector_distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'COSINE'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the single most important pattern in this section. It's also the one most teams get wrong, because the ranking-first pattern is what most vector store quickstarts demonstrate.&lt;/p&gt;




&lt;h3&gt;
  
  
  Cascade fallback
&lt;/h3&gt;

&lt;p&gt;In production, every layer of this can fail. The embedder is offline. The vector index is rebuilding. The lexical index is missing. A real system degrades through tiers rather than throwing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hybrid (vector + lexical) -&amp;gt; vector-only -&amp;gt; lexical-only -&amp;gt; exact-match LIKE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each step is still useful, just at lower fidelity. The agent still gets a result. The system stays available. The failure shows up in the trace envelope as a degraded retrieval mode rather than user-visible breakage.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why a memory manager matters more than another vector store
&lt;/h2&gt;

&lt;p&gt;If you take memory typing seriously, you need something that decides what gets written and where, what gets retrieved when the model asks, and what gets assembled into the prompt on this turn. A vector store doesn't do any of that. The component that does is the memory manager.&lt;/p&gt;

&lt;p&gt;Five responsibilities, each of which gets glossed over until production exposes it.&lt;/p&gt;




&lt;h3&gt;
  
  
  Write: the promotion gate
&lt;/h3&gt;

&lt;p&gt;Decide what enters durable memory and what stays ephemeral. This is the highest-risk operation in the system. Promote everything and the memory store poisons itself; promote nothing and the agent feels amnesiac.&lt;br&gt;
The gate runs three operations in a single transaction:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Classify and scope.&lt;/strong&gt; Determine the candidate's type and resolve its scope tuple (tenant, user, agent, session).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dedup by content hash plus scope tuple.&lt;/strong&gt; If an existing record matches both, return a dedup result instead of writing. The same fact arriving from two runs in the same scope resolves to one row; the same fact in two different scopes ends up as two rows, because they live in different access boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type-specific verification.&lt;/strong&gt; Facts need a confidence above threshold and a source run ID. Contradicting facts route to the supersession path rather than a new write.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then compute status from scope and type — never from the caller — and write the record with provenance attached. The companion runtime package has the full implementation.&lt;/p&gt;

&lt;p&gt;The flow above hides two rules that do real work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Status assignment:&lt;/strong&gt; Agent-scoped memory, preferences, and episodes enter as active. Tenant-scoped facts and policies enter as provisional: durable enough to confirm later, but not retrievable by semantic discovery until status flips to active. That gap is what keeps a noisy promotion pipeline out of the recall layer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use both halves of the dedup key:&lt;/strong&gt; Without the scope half, you collapse rows across access boundaries that should stay separate. Without the hash half, the same fact creates competing duplicates in retrieval.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Scope assignment at promotion time is the underrated half of this. If user Jane discovers that Acme's production database is in us-east-1, the fact is logically about Acme, not about Jane. Writing it user-scoped means every other user at Acme has to rediscover it independently and the compounding-advantage story falls apart. Writing it tenant-scoped without a rule lets user-scoped facts leak across users the first time someone gets sloppy. The working heuristic: default to the narrowest scope that matches the candidate (usually user), and only promote to tenant scope when the subject of the fact is the tenant entity itself and the fact has been observed from two independent sources. The promotion gate is where that rule belongs because callers should never be trusted to pick scope on their own.&lt;/p&gt;

&lt;p&gt;Before you ship, write down six things for each promotion path: what gets promoted, the granularity, which memory type it lands in, the provenance attached, the decay rule that retires it, and the authority that signed off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anti-pattern: user-supplied status.&lt;/strong&gt; Letting the save call accept a status field guarantees that somebody will eventually pass status="active" from a place you didn't expect. Compute status from scope and type &lt;em&gt;inside&lt;/em&gt; the manager, and keep it out of the public API entirely.&lt;/p&gt;


&lt;h3&gt;
  
  
  Update: invalidation across projections
&lt;/h3&gt;

&lt;p&gt;Memory drifts. Tenant policy changes, user preferences update, a fact gets contradicted by a system of record. Updates need to be coordinated: write the new state, mark the old state superseded, and invalidate any derived projections that cached the old answer. This is one of the places where transactional guarantees stop being a nice-to-have.&lt;/p&gt;

&lt;p&gt;Inside a single transaction: write the new fact and get its new ID, mark the old fact superseded with a pointer to the replacement, invalidate any cached retrieval results that referenced the old fact, and emit a fact.superseded event so other downstream projections (search indexes, etc.) update.&lt;/p&gt;

&lt;p&gt;Without atomicity, that sequence can interleave with retrieval and produce a window where both the old and new fact are visible. With it, retrieval sees either the old fact or the new one, never both.&lt;/p&gt;


&lt;h3&gt;
  
  
  Summarize: compression after stabilization
&lt;/h3&gt;

&lt;p&gt;Compress trace memory into episodic and fact memory. This is where transcripts become reusable. It's also where most teams get hurt: summarizing before you stabilize meaning (resolve references, attach provenance, normalize entities) compresses noise faster than signal. Stabilization comes first; the summary comes after.&lt;/p&gt;

&lt;p&gt;Pull the full trace for the run, stabilize it (resolve pronouns, normalize entity references, drop retracted statements), then run two extractors against the stabilized events: one for candidate facts (subject, predicate, content, confidence), one for an episode summary (only if the run completed a coherent task). Each extracted candidate runs through the promotion gate, where most get rejected. High rejection rates are what you want here.&lt;/p&gt;


&lt;h3&gt;
  
  
  Retrieve by type: orchestration and per-type strategy
&lt;/h3&gt;

&lt;p&gt;Resolve a request into the right slice of memory. Not "top-k from the vector store," but a parallel fan-out across the right stores with the right strategy per type. Policy and preference lookups, fact hybrid search, episode semantic search, and a recent-trace summary fan out in parallel, either across stores, or as a single unified retrieval query when the data is co-located in one engine, then assemble into a token-budgeted prompt with reserved slots for policy and preferences (always included) and ranked slots for facts, episodes, and recent activity (filled until the budget runs out).&lt;/p&gt;

&lt;p&gt;A memory manager that treats every retrieval as a vector query will be wrong about half the time, in ways that look like "the agent feels off" rather than "the agent threw an error." The strategy that actually works depends on the type:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Policy&lt;/td&gt;
&lt;td&gt;Exact match: WHERE tenant_id = ? AND policy_key = ? AND effective_until IS NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Preference&lt;/td&gt;
&lt;td&gt;Exact match: WHERE user_id = ? AND tenant_id = ? (full set every turn)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fact&lt;/td&gt;
&lt;td&gt;Hybrid: lexical + vector, fused, filtered by tenant/user scope&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Episodic&lt;/td&gt;
&lt;td&gt;Hybrid: lexical + vector over summary, fused, with optional task_type filter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trace&lt;/td&gt;
&lt;td&gt;Replay by run_id; vector search only for forensic queries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h3&gt;
  
  
  Decide what enters the context window
&lt;/h3&gt;

&lt;p&gt;The most underappreciated job. Even with the right memory in hand, the manager still has to choose what gets surfaced on this turn and in what order. It enforces the token budget while doing it. When the budget is exceeded, the manager decides what to compact or drop entirely.&lt;/p&gt;

&lt;p&gt;The pattern that holds up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reserve fixed slots for policy and preference (they're cheap and they always apply)&lt;/li&gt;
&lt;li&gt;Bound retrieval payloads (top-k with a max-token cap per item)&lt;/li&gt;
&lt;li&gt;Order recent context last (recency near the answer position helps)&lt;/li&gt;
&lt;li&gt;Compact rather than truncate when over budget (drop the least-recent episode, keep all preferences)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The assembler walks the reserved slots first (policy, preferences) and adds them unconditionally, then walks the ranked slots (facts, episodes, recent) and adds items in order until the token budget is exhausted. Anything that doesn't fit gets dropped rather than truncated mid-record.&lt;/p&gt;

&lt;p&gt;None of these are vector store operations. They're coordination logic. The vector store is one component; the memory manager runs the loop.&lt;/p&gt;


&lt;h2&gt;
  
  
  What supports privacy and data security in memory systems
&lt;/h2&gt;

&lt;p&gt;Memory systems amplify privacy stakes. A logged conversation is one privacy boundary. A fact extracted from a hundred conversations and promoted into durable memory is something else. &lt;a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32016R0679#d1e2573-1-1" rel="noopener noreferrer"&gt;Right-to-forget under GDPR&lt;/a&gt; becomes nontrivial the moment "the thing that knows about the user" is a derived artifact rather than a raw record. &lt;a href="https://eur-lex.europa.eu/eli/reg/2024/1689/" rel="noopener noreferrer"&gt;The EU AI Act's high-risk obligations&lt;/a&gt; were originally scheduled for August 2026 and now look likely to be deferred to December 2027 under the Digital Omnibus revisions that reached political agreement in May 2026. Either way, the direction is the same: a higher bar on data governance, audit, and human oversight for systems classified as high-risk.&lt;/p&gt;

&lt;p&gt;Five supports that matter, in rough order of how often they fail in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scope as a structural primitive.&lt;/strong&gt; Every memory record carries scope columns (tenant_id, user_id, agent_id). Retrieval filters by scope as hard predicates, before ranking, never as a soft post-filter. The earlier WRONG/RIGHT example covers this. Scope is a hard access boundary, and resist the temptation to invent new scopes for relevance signals like "this repo only" or "this project only." A repo restriction is a filter; it belongs in metadata or a tag column rather than the access-control system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provenance on every durable record.&lt;/strong&gt; Where it came from, which run promoted it, under which policy version. Without provenance you can't honor a deletion request. You can't audit a decision later either, and you can't prove isolation when a regulator asks. Every promotion writes the source run, the source turn, the promoting policy version, and the timestamp.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tombstones with cascading invalidation.&lt;/strong&gt; Deletion in the canonical store has to cascade into every derived projection. Otherwise the embedding index keeps surfacing artifacts that have been removed from canonical truth:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- GDPR erasure: cascade across all memory types in one transaction&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;fact_memory&lt;/span&gt;
 &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'[erased]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'erased'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'revoked'&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;episodic_memory&lt;/span&gt;
 &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'[erased]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'revoked'&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;preference_memory&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;trace_memory&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;deletion_events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deleted_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(:&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'all'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SYSTIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'gdpr_erasure'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a federated architecture that's a coordination problem across four systems, with a window during which deletion is partial. With everything in one engine inside one transaction, it's atomic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retention as a property of the record.&lt;/strong&gt; TTL, retention policy ID, and sensitivity classification live alongside the record. The manager enforces them at write and read time so application code never has to. A nightly job sweeps expired records:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;fact_memory&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;expires_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;episodic_memory&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;expires_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;trace_memory&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'90 days'&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;retention_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'short'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tenant-scoped encryption domains.&lt;/strong&gt; Logical isolation is a software invariant; encryption isolation is a cryptographic one. Both matter. Encryption reinforces scope; it doesn't substitute for it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anti-pattern: memory that never forgets.&lt;/strong&gt; No TTL, no decay, no eviction. Six months in, the store becomes a liability rather than an asset because the irrelevant outweighs the useful and retrieval gets noisier over time. Aggressive forgetting policies are some of the best behaviors you can add.&lt;/p&gt;

&lt;p&gt;Privacy gets routed before any durable write occurs, or it doesn't get enforced at all. You can't bolt it on after the architecture is settled.&lt;/p&gt;




&lt;h2&gt;
  
  
  From file retrieval to reusable memory
&lt;/h2&gt;

&lt;p&gt;A useful way to see what we've built up so far is to walk through what happens when a stack of notes turns into agent memory.&lt;/p&gt;

&lt;p&gt;A user dumps a folder of meeting notes, customer reports, and design docs into the system. Classical RAG indexes them, runs nearest-neighbor search at query time, and stuffs the top matches into the prompt. The notes serve the chat. Useful, limited, exactly what RAG is for.&lt;/p&gt;

&lt;p&gt;A memory system does something different. The notes still get indexed, but indexing is only the starting point. As the user and the agent work over those notes, three kinds of distillation happen in the background:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stable preferences get extracted into preference memory ("this user always wants exec summaries first"). They survive every session and feed every prompt.&lt;/li&gt;
&lt;li&gt;Durable assertions get extracted into fact memory ("Acme runs production in us-east-1"). They get attached to the entities they're about, with provenance back to the source notes, and become available to any agent that should see them.&lt;/li&gt;
&lt;li&gt;Completed tasks get distilled into episodic memory ("migrated Acme's webhook to v2, here's how"). They become the shape the next similar task can reuse instead of re-deriving.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The same memory layer now serves a chat, an autonomous agent operating on the user's behalf, and a human searching their own notes. None of them needs a separate store. The notes were the raw material; the memory layer is the durable, reusable artifact distilled from them. Local files were the input. They're not the end state.&lt;/p&gt;

&lt;p&gt;This is what the brief at the top of this article meant by ingest once, distill, retrieve the right slice, reuse over time. The win isn't better prompt packing. The win is lower context overhead and better continuity, because the system is no longer reconstructing the same context from scratch every turn.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where the memory layer actually lives
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Fmemory-layer-co-location-1024x477.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F06%2Fmemory-layer-co-location-1024x477.png" title="Co-located versus split memory architectures for AI agents" alt="The memory manager uses Oracle AI Database as a converged memory substrate with relational tables, vector search, JSON metadata, text search, transactions, security, audit, and optional external stores for blobs or high-volume logs." width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Co-located versus split memory architectures for AI agents&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;By this point we've designed the typed memory model. We've worked through the storage tradeoffs. We've built two retrieval paths into the manager and pinned down what the promotion gate has to enforce. The remaining question is what to put it all on.&lt;/p&gt;

&lt;p&gt;The accidental architecture most teams arrive at splits the memory layer along the axis that hurts most. Relational data goes in Postgres. Hybrid retrieval goes in Elasticsearch, OpenSearch, or a dedicated vector store. Traces end up in S3 or a time-series database. Each of these is a strong choice for the job it was built for, and many production stacks use them well. The problem is what happens when context has to cross between them.&lt;/p&gt;

&lt;p&gt;Every meaningful retrieval becomes a join across systems:&lt;/p&gt;

&lt;p&gt;“Find facts relevant to this query, filtered by the policies that apply to this tenant, scoped to users this agent is allowed to see, joined with the user's current preferences.”&lt;/p&gt;

&lt;p&gt;That join crosses a trust boundary. It also crosses a transactional boundary and a latency boundary. Every time context crosses one of those, you re-introduce the consistency problem you were trying to avoid. This is the polyglot persistence trap: each component excels at a specific function, but you end up running multiple databases with multiple security models and multiple backup strategies, all coordinated by glue code that becomes its own failure surface.&lt;/p&gt;

&lt;p&gt;Hybrid retrieval as a feature is largely a solved problem. Elasticsearch, OpenSearch, LanceDB, Pinecone, Weaviate, and the vector extensions to Postgres all support some version of it, and the production-quality gap between them is narrower than vendor marketing suggests. If the memory layer were just embeddings and full-text search, the architecture question would be boring.&lt;/p&gt;

&lt;p&gt;Memory is more than retrieval. The capability that actually matters is hybrid retrieval co-located with the relational data that governs it. A single query has to enforce policy, resolve user preferences, apply access controls, and rank by semantic similarity, all under one query plan, one transactional boundary, one security model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/" rel="noopener noreferrer"&gt;Oracle AI Database&lt;/a&gt; is a strong fit for this pattern in production today. It gives you the things every memory system needs in a single converged engine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Durable storage.&lt;/strong&gt; Canonical truth lives in real tables with real constraints. The vector index is a cache for retrieval, never the system of record.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relational precision.&lt;/strong&gt; Policies and preferences are exact-match lookups; the database has done that for forty years.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector retrieval.&lt;/strong&gt; Embeddings live in VECTOR columns alongside the rows they describe, queryable through &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/vector_distance.html" rel="noopener noreferrer"&gt;VECTOR_DISTANCE&lt;/a&gt; and &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/vector_embedding.html" rel="noopener noreferrer"&gt;VECTOR_EMBEDDING&lt;/a&gt; in the same SQL statement as the structured filters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON and metadata handling.&lt;/strong&gt; JSON columns and &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/adjsn/sql-json-function-json_value.html" rel="noopener noreferrer"&gt;JSON_VALUE&lt;/a&gt; give you flexible schemas where you need them, without introducing a separate document store.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governed access patterns.&lt;/strong&gt; Tenant isolation lives in the data plane. So do &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/dbseg/using-oracle-vpd-to-control-data-access.html" rel="noopener noreferrer"&gt;row-level security&lt;/a&gt;, &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/dbseg/part_6.html" rel="noopener noreferrer"&gt;audit logging&lt;/a&gt;, and &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/dbseg/part_4.html" rel="noopener noreferrer"&gt;encryption&lt;/a&gt;. None of those are application-code concerns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A path from local workflow to production.&lt;/strong&gt; The same SQL works on a developer's laptop with &lt;a href="https://www.oracle.com/database/free/" rel="noopener noreferrer"&gt;Oracle Database Free&lt;/a&gt; as it does in a regulated multi-region production deployment. The memory layer doesn't need to be reimplemented when it grows up.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ACID transactions are the reason all of this collapses into one place rather than four. When a fact promotion supersedes an old fact, the same transaction can invalidate the cached projection and write the deletion event. Either all of it lands or none of it does. The same query that ranks facts by similarity can enforce tenant isolation, apply active-policy thresholds, and respect a user's personalization preference, inside one transaction boundary, with one security model.&lt;/p&gt;

&lt;p&gt;In practice, that's the difference between this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Retrieve facts relevant to this query, scoped to the user's tenant,&lt;/span&gt;
&lt;span class="c1"&gt;-- filtered by active policy, ranked by semantic similarity. One query plan.&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fact_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;fact_memory&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;preference_memory&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
 &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="k"&gt;current_user&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;policy_memory&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
 &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;policy_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'fact_retrieval'&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;effective_until&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;current_tenant&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;superseded_by&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expires_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expires_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SYSTIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;JSON_VALUE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pref_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.allow_personalization'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;
 &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;JSON_VALUE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;policy_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.min_confidence'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ALL_MINILM_L12_V2&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;DATA&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="n"&gt;COSINE&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And four round trips across three stores with a consistency model you have to defend yourself.&lt;/p&gt;

&lt;p&gt;Oracle has also recently introduced &lt;a href="https://docs.oracle.com/en/database/oracle/agent-memory/" rel="noopener noreferrer"&gt;&lt;strong&gt;Oracle AI Agent Memory&lt;/strong&gt;&lt;/a&gt;, an SDK on top of the same database that gives you a working memory system out of the box: typed records (guideline, preference, fact, memory, message), scope columns (thread_id, user_id, agent_id), exact and similarity search APIs, and managed schema policies. It is the most direct implementation of the architecture this article has been building toward, and starting from it skips a lot of the reinvention that teams otherwise pay for in the first year of a memory system. Everything in this article still applies on top of it. The SDK handles the substrate; the design choices that make memory good for your product are still yours to make.&lt;/p&gt;

&lt;p&gt;You'll still want specialized stores for some things. Trace memory at high volume has different durability and cost characteristics than your operational database. Blob storage for large documents, attachments, and tool outputs belongs somewhere built for that. A converged platform doesn't eliminate every other store. What it eliminates is the most expensive join: the one between the relational data that governs what the agent is allowed to do and the vector data that shapes what the agent actually does.&lt;/p&gt;

&lt;p&gt;When those two are co-located, the architectural problem shifts from "how do I keep these stores in sync" to "how do I design good memory policies," which is the problem that actually matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  How do you add memory to an AI agent
&lt;/h2&gt;

&lt;p&gt;If you have a working RAG pipeline today and want to evolve it into a memory system, the steps are sequential. Each one builds on the last, and none of them are optional if the system is going into production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1. Type your memory
&lt;/h3&gt;

&lt;p&gt;Inventory what your current system stores. Label each as policy, preference, fact, episodic, or trace. You'll discover that most of it is sitting in the same vector store and that some of it shouldn't be there at all.&lt;/p&gt;

&lt;p&gt;A practical exercise: take ten retrieval results from your current system and label them. If half come back as "trace memory I'm doing semantic search over," you've found the reason your agent feels noisy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2. Scope every record
&lt;/h3&gt;

&lt;p&gt;Tag every memory record with scope columns: tenant_id, user_id, agent_id. Make these required fields. Make scope a hard predicate at every retrieval site so a missing filter becomes a query error rather than a leak.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;fact_memory&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'default'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;fact_memory&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;fact_memory&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;agent_id&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_fact_scope&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;fact_memory&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Anti-pattern: scope as relevance signal.&lt;/strong&gt; Inventing new scopes (team_scope, repo_scope, project_scope) for what are actually filters. Scope is access control; filters are metadata. Mixing them turns retrieval into a permissions audit and makes the access boundary harder to reason about. If you find yourself adding a fifth or sixth scope dimension, what you actually need is a metadata column.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3. Separate truth from acceleration
&lt;/h3&gt;

&lt;p&gt;Move authoritative state into structured columns with provenance and retention. Every fact gets a row in fact_memory with the canonical content, provenance, status, and retention. The embedding column is a projection of that content; if the embedding model changes tomorrow, you re-embed from the structured rows. You should be able to drop the embedding column and rebuild it from the rows without losing any information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anti-pattern: the vector store is the system of record.&lt;/strong&gt; If a piece of data exists only in the embedding index, you've lost provenance, replayability, and deletion guarantees. The vector index is acceleration; the structured row is truth. You can re-derive embeddings from rows, but never the other direction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4. Build a promotion gate
&lt;/h3&gt;

&lt;p&gt;Before anything writes to durable memory, it has to pass through a gate that decides scope, type, provenance, retention, and verification. Promotion is a database write. Treat it like one.&lt;/p&gt;

&lt;p&gt;The simplest viable gate is a function with explicit rules per type, plus the computed-status rule from earlier.&lt;/p&gt;

&lt;p&gt;The rules per type, in plain English. Preferences accept candidates with a confidence floor (0.5 is reasonable) and a non-empty pref_key. Production systems often tighten this with explicit confirmation or repeated observation across multiple turns. Facts need both a source run ID for provenance and a confidence above a higher threshold (0.7 holds up well). Episodes promote on task completion. Policy never auto-promotes; it requires explicit authoring. Status is then computed from scope and type, never accepted from the caller.&lt;/p&gt;

&lt;p&gt;Tune the thresholds against your own data. The first version will be wrong; that's expected. The point is having a gate at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anti-pattern: promotion on user thumbs-up.&lt;/strong&gt; The strongest user signal is also the rarest and most biased. Systems that promote only on explicit positive feedback learn what makes users click the button, which is a different thing than what makes the agent good. Treat thumbs-up as one signal among several and combine it with confidence thresholds, repeated observation, and source provenance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5. Reassemble per turn
&lt;/h3&gt;

&lt;p&gt;Stop accumulating the prompt across turns. Reassemble it on every turn from policy, preferences, retrieved facts, retrieved episodes, and a short summary of recent activity. The transcript is source material; the prompt is a reconstruction.&lt;/p&gt;

&lt;p&gt;You'll know this step is working when the prompt token count stops growing with conversation length.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6. Instrument the loop
&lt;/h3&gt;

&lt;p&gt;Capture trace envelopes for every run: what was retrieved, what was promoted, what the model used, what it cost. Without this you can't evaluate, replay, or debug.&lt;/p&gt;

&lt;p&gt;Minimum trace envelope per turn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_a1b2c3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"turn_index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acme"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jane@acme.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"retrieval"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"facts_returned"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"fact_001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fact_042"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"episodes_returned"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ep_017"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"preferences_applied"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"response_format"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"verbosity"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"retrieval_mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hybrid"&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"model_call"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llm_model_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4231&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"output_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;412&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"promotions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fact"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"fact_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fact_103"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.82&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"provisional"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"outcome"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"written | deduplicated | rejected | superseded"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rejection reason or superseded:old_id"&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These six steps aren't separable. You can't meaningfully build the promotion gate before you have typed memory, and you can't reassemble per turn before you've separated truth from acceleration. The order matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  A final note on what counts as memory
&lt;/h2&gt;

&lt;p&gt;A lot of current memory talk is retrieval plus larger prompts with better branding. If a system only needs semantic lookup over documents, basic RAG is enough; calling it memory doesn't make it more useful. If a system has to maintain continuity across sessions, recall structured information accurately, govern reuse against real privacy boundaries, and learn what to retain over time, then it's a memory system design problem. A vector store on its own won't cover it.&lt;/p&gt;

&lt;p&gt;"Add memory" undersells what's actually happening. You're moving from a stateless API call to a stateful loop with a managed memory layer underneath it. Done well, the model layer becomes interchangeable. The memory layer becomes the thing nobody else can copy, because what's stored in it reflects choices that only your team could have made.&lt;/p&gt;

&lt;p&gt;Models are shared. The memory system is yours.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;FAQs&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why isn't a vector database the same as a memory system?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A vector database helps find similar information, but a memory system also decides what to remember, how to organize it, how to retrieve it, and how to maintain continuity across sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What kinds of memory do AI agents need?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The article identifies five types: policy memory (rules), preference memory (user settings), fact memory (durable knowledge), episodic memory (completed tasks), and trace memory (execution history).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do memory systems avoid huge prompts?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of continually adding conversation history, they rebuild prompts each turn using relevant memories, summaries, policies, and preferences while keeping raw transcripts stored separately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the biggest change when moving from RAG to memory?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system gains a durable write path, allowing it to store validated information from interactions and reuse it later, creating long-term continuity rather than one-time retrieval.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>database</category>
      <category>agents</category>
    </item>
    <item>
      <title>Oracle AI Agent Memory: A Governed, Unified Memory Core for Enterprise AI Agents</title>
      <dc:creator>Wojtek Pluta</dc:creator>
      <pubDate>Mon, 04 May 2026 08:57:04 +0000</pubDate>
      <link>https://dev.to/oracledevs/oracle-ai-agent-memory-a-governed-unified-memory-core-for-enterprise-ai-agents-4ml8</link>
      <guid>https://dev.to/oracledevs/oracle-ai-agent-memory-a-governed-unified-memory-core-for-enterprise-ai-agents-4ml8</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article is syndicated from the original post on &lt;a href="https://blogs.oracle.com/developers/oracle-ai-agent-memory-a-governed-unified-memory-core-for-enterprise-ai-agents" rel="noopener noreferrer"&gt;blogs.oracle.com&lt;/a&gt;. Read the canonical version there for the latest updates.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Recently, Oracle introduced the &lt;a href="https://www.oracle.com/database/ai-agent-memory/" rel="noopener noreferrer"&gt;Oracle AI Agent Memory&lt;/a&gt; Python package, a model and framework-agnostic memory solution that gives enterprise AI teams a governed memory core on Oracle AI Database: short-term threads with summaries and context cards, long-term durable memories with vector search, automatic LLM-based memory extraction, and the governance and isolation production agents require.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagram-1-transparent-1024x698.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagram-1-transparent-1024x698.png" title="Oracle Agent Memory: a unified agent memory layer built on Oracle AI Database." alt="Diagram titled “Oracle Agent Memory” showing a layered architecture. At the top, a framework layer includes LangGraph, Claude Agent SDK, OpenAI Agent SDK, WayFlow, and custom integrations feeding into a unified Oracle Agent Memory client. The client manages working, semantic, episodic, and procedural memory types. Below, the system is powered by Oracle AI Database, which provides governed, isolated, audited, encrypted, and highly available infrastructure with vector search, graph traversal, and relational query capabilities." width="800" height="545"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Oracle Agent Memory: a unified agent memory layer built on Oracle AI Database.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Oracle AI Agent Memory is available now via &lt;a href="https://pypi.org/project/oracleagentmemory/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt; as &lt;code&gt;oracleagentmemory&lt;/code&gt; and documented in the &lt;a href="https://docs.oracle.com/en/database/oracle/agent-memory/26.4/" rel="noopener noreferrer"&gt;Oracle Help Center&lt;/a&gt;. It is designed to replace the patchwork memory stack most production agents inherit with a single governed memory substrate built on Oracle AI Database.&lt;/p&gt;

&lt;p&gt;This is the difference between an agent that is memory-augmented, given a vector store to consult, and one that is memory-aware, responsible for reading from and writing to its own governed, durable state with one enterprise-grade backend.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"&lt;em&gt;Agent memory has shifted from a research curiosity to a production requirement in under two years. Teams shipping serious agentic systems need a backend that handles vectors, structured data, and transactional consistency in one place, not three stitched together. Oracle AI Database is one of the few platforms that delivers all of that natively, which is why we built Hindsight to run on it as a first-class backend.&lt;/em&gt;"&lt;br&gt;
&lt;em&gt;—&lt;/em&gt; &lt;em&gt;Chris Latimer, Co-Founder &amp;amp; CEO of Vectorize&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why Agent Memory, Why Now
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagram-2-transparent-1024x914.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagram-2-transparent-1024x914.png" title="The four types of agent memory: working, semantic, episodic, and procedural." alt="Diagram titled “The four types of agent memory” showing an Oracle Agent Memory taxonomy. A central “Agent memory” layer branches into four categories: working memory (active state like current conversation and in-flight tasks), semantic memory (durable facts such as user preferences and entity data), episodic memory (specific past experiences like prior sessions and resolved tasks), and procedural memory (behavioral rules and tool preferences). Each category includes a short description and example elements." width="800" height="714"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The four types of agent memory: working, semantic, episodic, and procedural.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most agent implementations treat memory as a bolt-on. A vector store consulted at retrieval time, a chat history table glued on beside it, and whatever hand-written extraction logic the team can maintain.&lt;/p&gt;

&lt;p&gt;That stack holds together for a demo. It falls apart from the moment an enterprise AI team asks questions that actually matter. Who owns the memory? Where is it governed? How do we isolate tenants? How do we audit what the agent learned, and how do we forget it on request?&lt;/p&gt;

&lt;p&gt;Context windows have grown over the years, but no context window is large enough to hold the full state of a long-running agent: weeks of user preferences, accumulated domain knowledge, prior tool outcomes, evolving task state, and the reasoning history that makes each decision defensible.&lt;/p&gt;

&lt;p&gt;Agents need memory for the same reasons people do: to hold an active state while working on a problem, to retain facts learned over time, to recall specific past experiences, and to encode behavioral rules and procedures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A practical taxonomy for agent memory commonly used in agent design covers four types:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Working memory is the active state the agent is reasoning over right now, the running conversation and the scratchpad the model sees at inference time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Semantic memory is the durable facts and knowledge the agent accumulates about users, entities, and the world: preferences, canonical definitions, structured reference data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Episodic memory is specific past experiences the agent can recall, what happened on a prior session, what the user asked three weeks ago, how a similar task resolved last time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Procedural memory is the behavioral rules, guidelines, and learned procedures that shape how the agent acts, how to handle customers, which tools to prefer, what not to do.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These are not four different systems. They are four access patterns over the same underlying state, which is what makes a unified memory core the right architectural answer rather than four bolted-together services.&lt;/p&gt;




&lt;h2&gt;
  
  
  Oracle AI Database as the Memory Core
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagmar-3-transparent-775x1024.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagmar-3-transparent-775x1024.png" title="Reference architecture for Oracle AI Agent Memory on Oracle AI Database." alt="Reference architecture diagram titled “Oracle AI Agent Memory on Oracle AI Database.” It shows the flow from a customer-owned application tier (end users interacting via natural language with an AI agent built using frameworks like LangGraph or OpenAI Agent SDK) to the Oracle-owned memory SDK and Oracle AI Database. The Oracle AI Agent Memory layer provides APIs for search, message handling, and memory extraction with tenant isolation and governance. It connects to Oracle AI Database, which supports vector search, relational queries, graph traversal, and JSON storage. The diagram also highlights enterprise capabilities like backup, replication, high availability, encryption, access control, and auditing, with arrows indicating request and response flow." width="775" height="1024"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Reference architecture for Oracle AI Agent Memory on Oracle AI Database.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Oracle AI Database combines vector similarity search, relational querying, and graph-aware data access in one governed engine, enabling semantic recall alongside precise transactional and relationship-centric retrieval. Combined with Oracle's operational story, backups, replication, high availability, encryption, fine-grained access control, and audit, teams get a path from notebook to regulated production without swapping storage layers, rewriting compliance reviews, or stitching together bespoke isolation logic along the way.&lt;/p&gt;

&lt;p&gt;Memory engineering, as a discipline, demands substrate choices that hold up under the access patterns a real enterprise agent actually has: concurrent writes, per-user and per-tenant scoping, full audit, and semantic retrieval at scale.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Enterprise agents need an agent memory solution with robust security guarantees, strong governance controls, sophisticated workload isolation, as well as deep integration within the enterprise data platform. Oracle AI Agent Memory greatly simplifies building agent memory solutions by consolidating what are usually multiple separate and fragmented services, within the converged database architecture that customers already trust for their most critical data.”&lt;/em&gt;&lt;br&gt;
&lt;em&gt;— Tirthankar Lahiri, SVP, Mission-Critical Data and AI Engines, Oracle Database&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Production agent memory carries two loads. Developers wire the stack together: vector store, chat log, extraction scripts, isolation logic, governance per piece. Agents reason over the fragments, deciding what to retrieve from where and fitting the relevant world into a finite context window each turn.&lt;/p&gt;

&lt;p&gt;Oracle AI Agent Memory lifts both. One governed client replaces the four-service stack, with one set of credentials, one compliance review, and one backup story. Working, semantic, episodic, and procedural memory share one substrate and one retrieval surface, so the model reasons over a coherent view of its state. Summarization and scoped retrieval put the right subset into context at the right moment, freeing the model to spend its reasoning budget on the task rather than memory bookkeeping.&lt;/p&gt;

&lt;p&gt;Automatic LLM-based extraction turns conversation into durable memories without hand-rolled prompt chains. Multi-tenant isolation is enforced at the store layer, so a single schema can host multiple deployments without cross-tenant leakage. And because the SDK is framework-agnostic, integrating with &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/notebooks/agent_memory" rel="noopener noreferrer"&gt;LangGraph, Claude Agent SDK, OpenAI Agent SDK, WayFlow, and custom harnesses&lt;/a&gt;, teams aren't locked into a single runtime to get the substrate.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Benefits For AI Workloads
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagram-4-transparent-1024x584.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagram-4-transparent-1024x584.png" title="Configuration: gpt-5.5, reasoning effort xhigh, nomic-embed-v1.5 embeddings, local HNSW index, top-K = 200. X-axis truncated; all categories scored above 88%." alt="Bar chart showing LongMemEval results with 93.8% overall accuracy (469/500). Per-category scores: single-session assistant 100%, temporal reasoning 96.2%, knowledge update 94.9%, single-session user 94.3%, single-session preference 93.3%, and multi-session 88.0%. Configuration notes include GPT-5.5, nomic-embed-v1.5 embeddings, and HNSW index." width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Configuration: gpt-5.5, reasoning effort xhigh, nomic-embed-v1.5 embeddings, local HNSW index, top-K = 200. X-axis truncated; all categories scored above 88%.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Oracle AI Agent Memory is built for the operational realities of running AI agents in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production-grade recall on long-horizon memory benchmarks.&lt;/strong&gt; On &lt;a href="https://arxiv.org/abs/2410.10813" rel="noopener noreferrer"&gt;LongMemEval&lt;/a&gt;, the standard academic benchmark for long-context agent memory, Oracle AI Agent Memory scores &lt;strong&gt;93.8%&lt;/strong&gt; (469 of 500), with the strongest results on the categories that matter most for production agents: 100% on single-session assistant recall, 96% on temporal reasoning, and 95% on knowledge-update tasks. Multi-session recall, the hardest category in the benchmark, lands at 88%. Configuration: OpenAI gpt-5.5 (reasoning effort xhigh), nomic-embed-text-v1.5 embeddings, local HNSW index, top-K = 200.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bounded per-turn cost as sessions extend.&lt;/strong&gt; Periodic thread summarization, durable memory extraction, and prompt-time message compaction keep the working context bounded as conversations grow. In an 80-turn scripted conversation, Oracle AI Agent Memory held per-request input around 1,300 tokens for the full run while a flat-history baseline grew linearly past 13,900 — roughly 9.5× more tokens per request by the final turn, and a much steeper bill across the full conversation. Teams shipping long-running agents trade a linear-in-history cost curve for a flat one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagram-5-transparent-1024x552.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagram-5-transparent-1024x552.png" title="80-turn ChromAtlas-ND scripted conversation · gpt-5.4 (raw OpenAI client, no framework). Token estimate: chars / 4 (notebook convention)" alt="Line chart comparing tokens per request over 80 conversation turns. A gray line (no memory management) rises steadily to ~13,900 tokens, while a red line (Oracle AI Agent Memory) stays flat around ~1,300 tokens. The chart highlights ~9.5× lower token usage with memory, showing stable context size as conversations grow." width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;80-turn ChromAtlas-ND scripted conversation · gpt-5.4 (raw OpenAI client, no framework). Token estimate: chars / 4 (notebook convention)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Better answers than a flat-history baseline.&lt;/strong&gt; A flat-history agent has the entire verbatim conversation in its prompt — every fact ever mentioned, in order. By rights it should be hard to beat on recall. Across the same 80-turn conversation, evaluated by an impartial gpt-5.4 judge on accuracy, completeness, relevance, and coherence, Oracle AI Agent Memory won &lt;strong&gt;48 turns to flat history's 13&lt;/strong&gt;, with 19 ties: &lt;strong&gt;3.7× more wins despite the baseline's information advantage&lt;/strong&gt;. A retrieved context card focuses the model on what matters; a sprawling transcript dilutes attention across noise.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagram-6-transparent-1024x563.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagram-6-transparent-1024x563.png" title="80-turn ChromAtlas-ND scripted conversation; judge: gpt-5.4; scored on accuracy, completeness, relevance, coherence" alt="Bar chart showing agent performance over 80 turns. Oracle AI Agent Memory wins 48 turns (60%), compared to 13 wins (16%) for a naive flat history" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;80-turn ChromAtlas-ND scripted conversation; judge: gpt-5.4; scored on accuracy, completeness, relevance, coherence&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost is a tunable knob, not a fixed value.&lt;/strong&gt; The summarization trigger controls how aggressively the package compacts thread context, and it moves the cost-fidelity trade-off directly. In an 8-query demo conversation (five runs per threshold), a 10,000-token trigger landed at a mean of 121,268 total tokens, about 60% under the 306,823-token flat-history baseline. As the trigger rises, the package compacts less often and preserves more raw context per turn; by a 50–70k trigger, mean total tokens approach or exceed the baseline, and run-to-run variance widens. Teams pick the threshold that matches their answer-quality requirements and lock in the cost envelope they want, rather than accepting whatever curve a fragmented stack produces.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagram-7-transparent-scaled.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F05%2FDiagram-7-transparent-scaled.png" title="Memory Agent Efficiency vs Summarization Threshold on Demo Conversation. Num queries = 8; num runs per threshold = 5." alt="Line chart titled “Oracle Agent Memory Threshold Sweep” showing total tokens vs. summarization trigger (10k–70k). Mean tokens rise as the threshold increases, with shaded min–max and standard deviation bands. The lowest mean (~121k tokens) occurs at 10k, while higher thresholds approach or exceed a dashed naive baseline (~306k tokens)." width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Memory Agent Efficiency vs Summarization Threshold on Demo Conversation. Num queries = 8; num runs per threshold = 5.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One backend, every Python runtime.&lt;/strong&gt; &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/notebooks/agent_memory" rel="noopener noreferrer"&gt;LangGraph, the Claude Agent SDK, the OpenAI Agents SDK, WayFlow, and custom Python harnesses&lt;/a&gt; all instantiate the same OracleAgentMemory client and read and write the same Oracle Database store. Teams running more than one framework no longer rebuild memory per runtime, and migrations between frameworks no longer mean migrating memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Primitives for audit and erasure on a single substrate.&lt;/strong&gt; Every record carries user, agent, thread, and timestamp scoping fields, and the SDK exposes search, list, and per-record delete operations across memories, threads, and messages, so callers can locate records for a subject and remove them on request. Oracle Database's native auditing covers the storage layer underneath. Compliance reviews land on a single substrate (one database with audit, retention, and access controls already in the data plane) rather than four services with four reviews.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One vendor relationship for production agent memory.&lt;/strong&gt; A single Oracle AI Database instance carries vector search, structured state, JSON document retrieval, transactional consistency, and database-native audit. No second vector database to license, no third service to monitor and scale, no fourth backup pipeline to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Oracle AI Agent Memory Is For
&lt;/h2&gt;

&lt;p&gt;Oracle AI Agent Memory is designed for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI Developers and engineers&lt;/strong&gt; building production agents who need durable short-term and long-term memory in one place, with enterprise security and isolation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Teams already running Oracle AI Database&lt;/strong&gt; who want their agents to write to the same governed backend as the rest of the business&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical leaders&lt;/strong&gt; evaluating Oracle AI Database for agent memory infrastructure at scale, with compliance and audit requirements&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Install the Oracle AI Agent Memory package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;oracleagentmemory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A minimal end-to-end loop in Python looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;oracleagentmemory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentMemory&lt;/span&gt;

&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AgentMemory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;connection_string&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add conversation turns to a short-term thread
&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I prefer vegan meals.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Noted.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Extract durable long-term memories from the thread
&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Scoped search over long-term memory, enforced per-user
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dietary preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Code samples are illustrative; the final API surface is documented in the &lt;a href="https://docs.oracle.com/en/database/oracle/agent-memory/26.4/" rel="noopener noreferrer"&gt;Oracle Help Center&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/notebooks/agent_memory" rel="noopener noreferrer"&gt;quickstart notebook and framework how-to guides&lt;/a&gt; are available in the Oracle AI Developer Hub, and the full API reference is available in the &lt;a href="https://docs.oracle.com/en/database/oracle/agent-memory/26.4/" rel="noopener noreferrer"&gt;Oracle Help Center&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Oracle AI Agent Memory is the first release of a broader commitment to a governed memory substrate enterprise agents need. Memory engineering is still an emerging discipline. The infrastructure behind it should not be.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>database</category>
      <category>oracle</category>
    </item>
    <item>
      <title>16 Ways to Make a Small Language Model Think Bigger</title>
      <dc:creator>Wojtek Pluta</dc:creator>
      <pubDate>Tue, 21 Apr 2026 07:56:58 +0000</pubDate>
      <link>https://dev.to/oracledevs/16-ways-to-make-a-small-language-model-think-bigger-2lbo</link>
      <guid>https://dev.to/oracledevs/16-ways-to-make-a-small-language-model-think-bigger-2lbo</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article is syndicated from the original post on &lt;a href="https://blogs.oracle.com/developers/16-ways-to-make-a-small-language-model-think-bigger" rel="noopener noreferrer"&gt;blogs.oracle.com&lt;/a&gt;. Read the canonical version there for the latest updates.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;All of the code in this article is available in the &lt;a href="http://github.com/oracle-devrel/oracle-ai-developer-hub" rel="noopener noreferrer"&gt;Oracle AI Developer Hub&lt;/a&gt;. The repository is part of Oracle’s open-source AI collection and serves as the reference implementation for everything covered here.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You can install it with &lt;code&gt;pip install agent-reasoning&lt;/code&gt;, browse the 16 agent classes, run the TUI, or integrate it directly into an existing Ollama pipeline as a zero-change replacement client. If you find it useful, a GitHub star goes a long way.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Small language models struggle with complex reasoning on their own, but agent-based architectures (like Tree of Thoughts or Self-Consistency) can significantly improve their performance.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;agent-reasoning&lt;/code&gt; framework adds 16 research-backed reasoning strategies to any Ollama model using a simple &lt;code&gt;+strategy&lt;/code&gt; tag—no code changes required.&lt;/li&gt;
&lt;li&gt;Different strategies suit different tasks: CoT works well overall, ReAct excels with external data, and branching methods improve accuracy at the cost of speed.&lt;/li&gt;
&lt;li&gt;Much of modern AI progress comes from orchestration (prompting, search, control flow), not just larger models.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Generally, a 270M parameter LLM (as of today, April 2026) struggles with even basic multi-step reasoning. Ask a model like &lt;code&gt;gemma3:270m&lt;/code&gt; to solve the classic water jug problem, and it will often return a confidently incorrect answer—much like other small language models (SLMs) of similar size and training.&lt;/p&gt;

&lt;p&gt;However, take that same model and wrap it inside a Tree of Thoughts (ToT) agent, running a breadth-first search (BFS) with three levels and weighted branches, and it can reliably solve the puzzle. The improvement comes from the architecture: the agent distributes the reasoning process across structured exploration steps, compensating for the limitations of a single LLM call.&lt;/p&gt;

&lt;p&gt;This is where things get interesting. Much of the progress in applied AI isn't coming from bigger models alone, but from engineers rethinking how to orchestrate them—layering search, memory, and control flow on top of a standard LLM call to unlock new capabilities.&lt;/p&gt;

&lt;p&gt;This is the fundamental idea behind &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/agent-reasoning" rel="noopener noreferrer"&gt;agent-reasoning&lt;/a&gt;: sixteen cognitive architectures—each backed by peer-reviewed research—can be applied to any Ollama-served model via a simple &lt;code&gt;+Strategy&lt;/code&gt; tag appended to the model name. Call &lt;code&gt;gemma3:270m+tot&lt;/code&gt; instead of &lt;code&gt;gemma3:270m&lt;/code&gt;, and the interceptor handles everything else.&lt;/p&gt;

&lt;p&gt;We’ll talk about the different ways to invoke these reasoning strategies through the project.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You’ll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;How the &lt;code&gt;ReasoningInterceptor&lt;/code&gt; intercepts model names, removes the &lt;code&gt;+Strategy&lt;/code&gt; tag, and directs traffic to one of 16 agent classes&lt;/li&gt;
&lt;li&gt;How 16 strategies divide into four families: sequential, branching, reflective, and meta —each representing a different reasoning approach and set of trade-offs&lt;/li&gt;
&lt;li&gt;What each major strategy accomplishes in practice, focusing on implementation rather than theory&lt;/li&gt;
&lt;li&gt;Which type of problem each strategy is best suited for, based on benchmark results from March 2026&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Interception Layer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; The &lt;code&gt;ReasoningInterceptor&lt;/code&gt; is an interchangeable drop-in client for Ollama that analyzes the model name for a &lt;code&gt;+Strategy&lt;/code&gt; tag and directs traffic to one of 16 cognitive agent classes while making no modifications to your pre-existing code.&lt;/p&gt;

&lt;p&gt;Everything relies on a single template: add &lt;code&gt;+Strategy&lt;/code&gt; to any Ollama model name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2APLi2WumhUe2et_POG0V_Og.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2APLi2WumhUe2et_POG0V_Og.png" title="Using ReasoningInterceptor as a drop-in replacement client" alt="Using ReasoningInterceptor as a drop-in replacement client; strategy routing can be enabled via model name tags (e.g., +tot)." width="800" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Using ReasoningInterceptor as a drop-in replacement client&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The image below illustrates the entire routing process from start to finish. The interceptor acts as a middleman between your code and Ollama, removes the &lt;code&gt;+Strategy tag&lt;/code&gt;, and sends traffic to the correct agent class.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2A5MwkQVsNUA1pqBEzsV4ACA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2A5MwkQVsNUA1pqBEzsV4ACA.png" title="Illustrating how the interceptor separates the base model from the Strategy tag" alt="Diagram illustrating how the interceptor separates the base model from the Strategy tag and directs traffic to the corresponding agent class." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Illustrating how the interceptor separates the base model from the Strategy tag&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;agent_map&lt;/code&gt; contains over fifty-five aliases mapped to sixteen agent classes. For example, &lt;code&gt;cot&lt;/code&gt;, &lt;code&gt;chain_of_thought&lt;/code&gt;, and &lt;code&gt;CoT&lt;/code&gt; all map to &lt;code&gt;CotAgent&lt;/code&gt;, while &lt;code&gt;mcts&lt;/code&gt; and &lt;code&gt;monte_carlo&lt;/code&gt; map to &lt;code&gt;MCTSAgent&lt;/code&gt;. Because the interceptor is a drop-in client for Ollama—supporting the same &lt;code&gt;.generate()&lt;/code&gt; and &lt;code&gt;.chat()&lt;/code&gt; APIs— existing LangChain pipelines, web UIs, and scripts can automatically gain reasoning capabilities by changing a single string in the model name.&lt;/p&gt;

&lt;p&gt;Additionally, the interceptor can be used as a network proxy. Instead of pointing an Ollama compatible application at &lt;code&gt;http://localhost:11434&lt;/code&gt;, direct it to &lt;code&gt;http://localhost:8080&lt;/code&gt; instead. Using a model name like &lt;code&gt;gemma3:270m+CoT&lt;/code&gt;, the gateway will apply reasoning transparently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Family 1: Sequential Strategies
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Sequential Strategies process problems in a linear chain, where each step feeds into the next. In benchmarks, CoT achieved 88.7% average accuracy, compared to 81.3% for standard generation on the same model and weights.&lt;/p&gt;

&lt;p&gt;Each of the sixteen strategies fall into one of four families. The diagram below illustrates how they are grouped.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AqIVVyTPUDA2luQCNkzWgKw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AqIVVyTPUDA2luQCNkzWgKw.png" title="Categorization of the four strategy families" alt="Categorization of the four Strategy families: sequential, branching, reflective, and meta. Each route leads to a specific type of reasoning agent. The fastest Sequential Strategies occupy the top-left quadrant while slower Branching strategies sacrifice speed for increased accuracy." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Categorization of the four strategy families&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Sequential strategies are designed for high-speed processing with minimal latency. They are ideal for problems with discrete, sequential steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chain of Thought (CoT)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; Wei et al. (2022), &lt;a href="https://arxiv.org/abs/2201.11903" rel="noopener noreferrer"&gt;“Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Chain of Thought (CoT) is a prompting strategy in which the model generates intermediate reasoning steps before producing a final response. As noted in the original paper: prompting a model to produce these intermediate steps can significantly improve accuracy.&lt;/p&gt;

&lt;p&gt;For example, standard prompting on GSM8K achieves 66.7% accuracy. With CoT prompting, this increases to 73.3%— a 10% relative improvement achieved through simple prompt design alone.&lt;/p&gt;

&lt;p&gt;The following graphic illustrates how CoT chains appear in practice: a sequence of numbered steps, each building on the previous one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2ANwSyAs818bWZ3mCEDW2lOg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2ANwSyAs818bWZ3mCEDW2lOg.png" title="CoT in operation" alt="Visual representation of CoT in operation: the model sequentially progresses through numbered steps (step 1…step n). Each subsequent step depends on previously generated steps. The numbering in the prompt is the only special instruction provided." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;CoT in operation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In terms of implementation within &lt;code&gt;CotAgent&lt;/code&gt;, the query is wrapped in a structured prompt:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AolcatRJAj5naE6svAHQbOA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AolcatRJAj5naE6svAHQbOA.png" title="Structured prompting enforces step-by-step reasoning in CoTAgent" alt="Structured prompting enforces step-by-step reasoning in CoTAgent" width="800" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Structured prompting enforces step-by-step reasoning in CoTAgent&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Benchmark result for qwen3.5:9b (9.7B): CoT achieves &lt;strong&gt;88.7% average accuracy&lt;/strong&gt;, across GSM8K (math), MMLU (logic), and ARC-Challenge (reasoning), compared to 81.3% for standard generation. This seven-point gain in performance is attributable solely to structural prompts. Identical weights and temperatures were applied to both models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage:&lt;/strong&gt; Math word problems; logic puzzles; any multi-step reasoning task where the individual steps are sequential and do not have branches.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decomposed Prompting
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; Khot et al. (2022), &lt;a href="https://arxiv.org/abs/2210.02406" rel="noopener noreferrer"&gt;“Decomposed Prompting: A Modular Approach for Solving Complex Tasks”&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Decomposed prompting is an architectural module that splits large problems into smaller sub-problems. Each sub-problem is handled independently while carrying forward accumulated context from earlier steps. Once all sub-problems are processed, their outputs are synthesized into a final result. &lt;code&gt;DecomposedAgent&lt;/code&gt; follows a three-phase process—decomposition, execution and synthesis—and propagating context throughout so that each step can build on prior results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage:&lt;/strong&gt; Planning problems; trip itinerary generation; any problem where the ultimate answer consists of multiple distinguishable parts that may be individually addressed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Decomposed prompting achieved only 38.5% average accuracy in benchmark testing. This result requires context. GSM8K primarily evaluates arithmetic reasoning, where decomposing a problem like “what is 47 × 13 + 9?” introduces overhead without improving the model's ability to compute the answer.&lt;/p&gt;

&lt;p&gt;Decomposition is more effective for problems with genuinely separable components (trip planning, multi-section reports etc.), where each part benefits from focused attention. These strengths are not captured by the benchmark, and the results reflect that mismatch.&lt;/p&gt;

&lt;h3&gt;
  
  
  Least-to-Most Prompting
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; Zhou et al. (2022), &lt;a href="https://arxiv.org/abs/2205.10625" rel="noopener noreferrer"&gt;“Least-to-Most Prompting Enables Complex Reasoning in Large Language Models”&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Least-to-most prompting is a strategy that orders sub-questions from simplest to most complex, establishing prerequisite knowledge before tackling harder steps. Unlike decomposed prompting which generates arbitrary sub-problems, it enforces a deliberate progression where each step builds on the last. Knowledge is accumulated iteratively until the model reaches the final question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage:&lt;/strong&gt; Questions with genuine prerequisites — e.g., “what is x?” before determining “how does x relate to y?”; educational style explanation sequences (“concept ladder”); tasks that require establishing foundational concepts before addressing more complex components.&lt;/p&gt;

&lt;h2&gt;
  
  
  Family 2: Branching Strategies
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Branching strategies explore multiple reasoning paths simultaneously and choose the best path. ToT scored 76.7% on GSM8K math, compared to 66.7% on GSM8K math with standard generation.&lt;/p&gt;

&lt;p&gt;More LLM calls mean higher latency— but often better answers on hard problems. Take this into consideration when running all branching strategies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tree of Thoughts (ToT)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; Yao et al. (2023), &lt;a href="https://arxiv.org/abs/2305.10601" rel="noopener noreferrer"&gt;“Tree of Thoughts: Deliberate Problem Solving with Large Language Models”&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ToT is a search-based methodology that evaluates numerous possible reasoning paths concurrently, selecting the best performing path as determined by evaluation metrics such as distance traveled or quality of intermediate solutions etc.&lt;/p&gt;

&lt;p&gt;Similar to chess engines, ToT applies BFS through an expanding tree of possible solutions. The core idea is straightforward: generate multiple partial solutions, evaluate them, prune weaker candidates, and continue exploring the most promising branches.&lt;/p&gt;

&lt;p&gt;Below is an illustration of how ToT generates and eliminates branches: green nodes represent surviving branches, while red nodes indicate those that have been eliminated. The final answer is derived from the highest scoring leaf node.&lt;/p&gt;

&lt;p&gt;A key design decision is how branches are evaluated. Should the same model handle both generation and scoring, or should a stronger model be introduced as a judge? In these benchmarks, the same model was used for both roles, but this is an area worth experimenting with, depending on your accuracy and latency constraints.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AQHJPySSkNpDOji9BCKz-Ng.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AQHJPySSkNpDOji9BCKz-Ng.png" title="Generating candidate branches at each level" alt="Illustration of how to generate candidate branches at each level; score candidate branches between 0 &amp;amp; 1; prune low-scored candidates; continue exploring surviving high-scored candidates until all levels are exhausted and then generate final answer from most promising leaf node." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Generating candidate branches at each level&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ToTAgent&lt;/code&gt; implements this as configurable by &lt;code&gt;depth&lt;/code&gt; (default=3) and &lt;code&gt;width&lt;/code&gt; (default=2 branches). At every level, the agent generates a set of candidate next steps, evaluates them using a scoring function, prunes low-scoring options, and expands the remaining candidates into the next level.&lt;/p&gt;

&lt;p&gt;Tot achieved &lt;strong&gt;76.7% accuracy&lt;/strong&gt;—a 10% percent improvement over standard generation on GSM8K math problems. This performance comes at a cost: additional LLM calls are required at each step to evaluate candidate paths and their intermediate result, making it roughly 5-8x slower than CoT equivalent queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage:&lt;/strong&gt; Logic puzzles with multiple solution paths; strategic decision problems; tasks where multiple approaches can be explored and compared.&lt;/p&gt;

&lt;h3&gt;
  
  
  Self-Consistency (Majority Voting)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; Wang et al. (2022), &lt;a href="https://arxiv.org/abs/2203.11171" rel="noopener noreferrer"&gt;“Self-Consistency Improves Chain of Thought Reasoning in Language Models”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Self-Consistency is a sampling method that generates multiple independent reasoning traces and selects a final answer through majority voting. Unlike standard prompting, it relies on sampling k diverse traces at a higher temperature to encourage variation. Each trace produces a candidate answer, and the most frequently occurring answer is selected as the final output.&lt;/p&gt;

&lt;p&gt;The image below illustrates how both Self-Consistency and Monte Carlo Tree Search (MCTS) sample multiple reasoning paths, but differ fundamentally in how those paths are evaluated—majority voting versus UCB1-based exploration-exploitation balancing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AUKyufmNfjpFnSizTxD1M2w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AUKyufmNfjpFnSizTxD1M2w.png" title="Self-Consistency vs MCTS comparison" alt="Left: Self-Consistency flowchart — sampling k independent traces &amp;amp; selecting most commonly occurring final answer via majority vote. Right: Monte Carlo Tree Search (MCTS) flowchart — sampling new paths through UCB1-based exploration/exploitation tradeoff balancing — both generate multiple possible answers — selection methodology differ significantly." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Self-Consistency vs MCTS comparison&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ConsistencyAgent&lt;/code&gt; uses &lt;code&gt;k=5&lt;/code&gt; samples at temperature of &lt;code&gt;0.7&lt;/code&gt; by default. It extracts final answers using regex-based pattern matching and selects the most frequent result via &lt;code&gt;counter.most_common()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Self-Consistency matches CoT on both MMLU (96.7%) and GSM8K (76.7%). Its advantage lies in reliability rather than raw accuracy: majority voting across independent reasoning traces reduces the risk of single-trace errors propagating to the final answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage:&lt;/strong&gt; Factual question answering; multiple-choice style questions; problems where arriving at the correct answer via diverse reasoning paths is more important than inspecting a single reasoning trace.&lt;/p&gt;

&lt;h2&gt;
  
  
  Family 3: Reflective Strategies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Self-Reflection
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; Shinn et al. (2023), “Reflexion: Language Agents with Verbal Reinforcement Learning” — &lt;a href="https://arxiv.org/abs/2303.11366" rel="noopener noreferrer"&gt;arXiv:2303.11366&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Self-Reflection is a draft-critique-refine loop in which the model generates an initial answer, critiques it for errors, and then revises it. The Reflexion paper showed that this iterative process can meaningfully improve output quality, even without any gradient updates.&lt;/p&gt;

&lt;p&gt;The image below shows all 3 reflective strategies side by side: Self-Reflection, Debate, and Refinement Loop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AGyy_CHbQa01wEnpRxsWMcA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AGyy_CHbQa01wEnpRxsWMcA.png" title="Reflective strategies comparison" alt="Left: Self-Reflection drafts, critiques, and refines until the critique says “CORRECT.” Right: Debate puts PRO and CON agents against each other with a Judge scoring each round. Bottom: Refinement Loop uses a numeric quality gate (0.0–1.0) to decide when to stop iterating." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Reflective strategies comparison&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SelfReflectionAgent&lt;/code&gt; runs a draft-critique-refine loop for up to 5 iterations, with early termination when the critique returns “CORRECT” in under 20 characters. If the critique is satisfied on an early pass, subsequent iterations are skipped. This approach helps keeps latency low for queries the model answers correctly on the initial pass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage:&lt;/strong&gt; Creative writing, high-stakes technical explanations, anything where “good enough on the first try” is insufficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adversarial Debate
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; Irving et al. (2018), &lt;a href="https://arxiv.org/abs/1805.00899" rel="noopener noreferrer"&gt;“AI Safety via Debate”&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Irving proposed debate as a mechanism for improving AI safety. Two agents present opposing arguments, and a judge (either a human or another LLM) evaluates their merits. The underlying premise is that that identifying flaw in weak arguments is often easier than constructing strong ones.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;DebateAgent&lt;/code&gt; conducts multiple rounds of PRO and CON arguments, with a judge evaluating each exchange. Following all rounds, the strongest arguments from both sides are synthesized into a final answer that balances competing perspectives. Context is carried forward between rounds, enabling incremental refinement rather than redundant arguments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage:&lt;/strong&gt; Controversial or ambiguous subjects; policy analysis; ethics and any subject matter requiring a balanced perspective.&lt;/p&gt;

&lt;h3&gt;
  
  
  Refinement Loop
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; Madaan et al. (2023), &lt;a href="https://arxiv.org/abs/2303.17651" rel="noopener noreferrer"&gt;“Self-Refine: Iterative Refinement with Self-Feedback”&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This paper describes a refinement loop similar to self-reflection, but instead of relying on a human-style critique to guide revisions, it uses a machine-based evaluation system with quantifiable quality metrics. These metrics determine whether further refinement is necessary. The loop terminates when a predefined quality metric is reached (&amp;gt; 0.9 by default) or when the maximum number of iterations is exceeded.&lt;/p&gt;

&lt;p&gt;The five-stage complex refinement pipeline consists of sequential stages, each focused on a distinct type of critique: technical accuracy, structure, depth, examples, and polish.&lt;/p&gt;

&lt;p&gt;Each stage targets a distinct aspect of quality, ensuring the model focuses exclusively on improving that dimension rather than attempting to optimize everything at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage:&lt;/strong&gt; Highly technical writing; documentation; blog posts, a scenario where production-quality output is required rather than simply a first draft.&lt;/p&gt;

&lt;h2&gt;
  
  
  Family 4: Cross-Domain and Meta Strategies
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Cross-domain strategies enable sharing knowledge among disciplines, while meta-strategies automatically route queries to the most appropriate reasoning technique without requiring manual selection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analogy-Based Reasoning
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; Gentner (1983), &lt;a href="https://doi.org/10.1111/j.1551-6708.1983.tb00497.x" rel="noopener noreferrer"&gt;“Structure Mapping: A Theoretical Framework for Analogy”&lt;/a&gt;, Cognitive Science&lt;/p&gt;

&lt;p&gt;Gentner's structure-mapping theory proposes that analogical reasoning operates by identifying structural correspondences across domains, rather than relying on surface-level similarity. The &lt;code&gt;AnalogicalAgent&lt;/code&gt; builds on this idea through three phases: (1) identify the underlying structure independent of domain specifics, (2) generate analogous solutions from different domains that share that structure, (3) select the most effective analogy and apply its solution approach.&lt;/p&gt;

&lt;p&gt;This process reduces reliance on memorized patterns. By focusing on underlying structure, the model learns &lt;em&gt;why&lt;/em&gt; a solution works, rather than simply recalling &lt;em&gt;what&lt;/em&gt; worked before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage&lt;/strong&gt;: Solving problems that are structurally similar to prior ones, even if they differ superficially; transferring knowledge across domains; explaining complex concepts through analogy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Socratic Questioning
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; Paul &amp;amp; Elder (2007), &lt;a href="https://www.criticalthinking.org/" rel="noopener noreferrer"&gt;“The Art of Socratic Questioning”&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Socratic Method:&lt;/strong&gt; Do not answer the question directly. Instead, ask follow-up questions that reduce ambiguity in the solution space.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SocraticAgent&lt;/code&gt; repeatedly asks questions and receives model responses, continuing until it reaches a limit of five question-response exchanges. It then synthesizes the collected information into a final answer. A deduplication or normalization step helps prevent repeated queries that differ only in wording.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended for:&lt;/strong&gt; Philosophy; ethics; deep technical knowledge; any field requiring the model to “know” something as opposed to merely answering it.&lt;/p&gt;

&lt;h3&gt;
  
  
  ReAct (Reason + Act)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; Yao et al. (2022), &lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;“ReAct: Synergizing Reasoning and Acting in Language Models”&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ReAct is a conceptual framework that interweaves reasoning steps with tool invocations, allowing the model to ground its thinking in external information. In practice, the model decides what action to take, calls a tool such as a web search engine, examines the result, updates its reasoning, and repeats the cycle until it reaches a satisfactory answer. Current tools include web scraping, accessing Wikipedia via an API call, and a calculator interface, with mock-ups available for off-line execution scenarios.&lt;/p&gt;

&lt;p&gt;Using ReAct achieved 70.0% accuracy on ARC-Challenge (Science Reasoning). While not the highest on this particular benchmark, it enabled tool use for the LLM and allowed it to search for required information on the Internet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage&lt;/strong&gt;: Fact-checking; current events queries; mathematical calculations; tasks where access to grounded, external information is important.&lt;/p&gt;

&lt;h2&gt;
  
  
  Auto Router: MetaReasoningAgent
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; A single LLM invocation allows &lt;code&gt;MetaReasoningAgent&lt;/code&gt; to classify each input into one of eleven categories and route it to the most appropriate strategy, without human intervention.&lt;/p&gt;

&lt;p&gt;All sixteen strategies depend on selecting the appropriate strategy for a given task. By removing this requirement, &lt;code&gt;MetaReasoningAgent&lt;/code&gt; eliminates the need for manual selection.&lt;/p&gt;

&lt;p&gt;The diagram below shows how each category maps to its corresponding strategy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2ASSObpiuAEGr1s3E7oVbKGA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2ASSObpiuAEGr1s3E7oVbKGA.png" title="MetaReasoningAgent classification diagram" alt="Classification occurs using a single LLM invocation returning CATEGORY, CONFIDENCE, and REASON." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;MetaReasoningAgent classification diagram&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MetaReasoningAgent&lt;/code&gt; instantiates the selected strategy class and passes control to it, along with all event objects for visualization.&lt;/p&gt;

&lt;p&gt;To use this capability, specify a model such as &lt;code&gt;gemma3:270m+meta&lt;/code&gt; or &lt;code&gt;gemma3:270m+auto&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In practice, routing is generally intuitive: math problems are directed to CoT, logic puzzles to ToT, philosophical questions to Socratic Questioning, and controversial topics to Adversarial Debate.&lt;/p&gt;

&lt;p&gt;The trade-off is reduced control over strategy-specific hyperparameters in exchange for automatic routing aligned with the problem type.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Strategy Should You Pick? Benchmark Results (March 2026)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; CoT performs best on average (88.7%) across diverse tasks. ReAct excels when tool use is available (70.0% on ARC-Challenge). ToT and Self-Consistency tie on GSM8K math at 76.7%.&lt;/p&gt;

&lt;p&gt;These results are based on 4,200 evaluations across 11 strategies using &lt;code&gt;qwen3.5:9b&lt;/code&gt;, collected as of March 2026. All 16 strategies are implemented and production-ready. However, the benchmarks shown below focus on the 11 that produce a single extractable answer. The remaining five are generation-focused and not suited to multiple-choice evaluation.&lt;/p&gt;

&lt;p&gt;The heat map and bar chart below provide a complete view of the results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AlkHAnyNpsABYEqnoueCr9g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AlkHAnyNpsABYEqnoueCr9g.png" title="Benchmark results heatmap and bar chart" alt="Left: accuracy heatmap across GSM8K, MMLU, and ARC-Challenge for each strategy. Right: average accuracy bar chart. CoT wins overall at 88.7%." width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Benchmark results heatmap and bar chart&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The short version:&lt;/strong&gt; CoT wins on average across diverse tasks. Self-Consistency and ToT beat it on specific math benchmarks. ReAct dominates on factual/science tasks. Self-Reflection and Refinement Loop are not well captured by these benchmarks, as they primarily improve generation quality rather than multiple-choice accuracy.&lt;/p&gt;

&lt;p&gt;For most queries, start with &lt;code&gt;+cot&lt;/code&gt;. If you’re solving logic puzzles or planning problems, try &lt;code&gt;+tot&lt;/code&gt;. If you need factually grounded responses, use &lt;code&gt;+react&lt;/code&gt;. If you need polished, high-quality output rather than a quick answer, use &lt;code&gt;+refinement&lt;/code&gt;. When in doubt, &lt;code&gt;+meta&lt;/code&gt; will route the query automatically.&lt;/p&gt;

&lt;p&gt;In my experience building agent-reasoning, the most surprising finding is how much prompt structure alone can improve performance. For example, &lt;code&gt;qwen3.5:9b&lt;/code&gt; improves from 81.3% to 88.7% average accuracy simply by prompting it to produce numbered reasoning steps.&lt;/p&gt;

&lt;p&gt;As of March 2026, all 16 strategies are production-ready and have been evaluated across 4,200 benchmark runs.&lt;/p&gt;

&lt;p&gt;You can &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/agent-reasoning" rel="noopener noreferrer"&gt;find the repository here&lt;/a&gt;. Install with &lt;code&gt;pip install agent-reasoning&lt;/code&gt; or &lt;code&gt;uv add agent-reasoning&lt;/code&gt;. The commands to get started:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AXo6o2jGEUekHQjIkVWUI_A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F800%2F1%2AXo6o2jGEUekHQjIkVWUI_A.png" title="Getting started commands" alt="Getting started commandsInstallation and launching agent-reasoning in seconds to access a TUI with 16 reasoning agents." width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Getting started commands&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The TUI provides a 16-agent sidebar, live streaming, and a step-through debugger. Arena mode runs all 16 agents simultaneously on the same query in a 4×4 grid.&lt;/p&gt;

&lt;p&gt;If this is useful, a GitHub star is always appreciated.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Do I need to modify my existing code to use agent-reasoning?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. The interceptor is a drop-in replacement for the Ollama client. Just change the model name string by appending &lt;code&gt;+strategy&lt;/code&gt; (e.g., &lt;code&gt;gemma3:270m+cot&lt;/code&gt;) and the interceptor handles everything else. Existing LangChain pipelines, web UIs, and scripts work without any other changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which strategy should I start with?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start with &lt;code&gt;+cot&lt;/code&gt; (Chain of Thought). It scored the highest average accuracy (88.7%) across our benchmarks and adds minimal latency. If you are unsure, use &lt;code&gt;+meta&lt;/code&gt; and let the auto-router pick the best strategy for you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why were only 11 of the 16 strategies benchmarked?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The benchmarks (GSM8K, MMLU, ARC-Challenge) measure multiple-choice accuracy, which works well for strategies that produce a single extractable answer. The remaining five strategies are generation-focused (e.g., Refinement Loop, MCTS) and their strengths in output quality are not captured by multiple-choice evaluations. All 16 strategies are fully implemented and production-ready.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use this with models other than Ollama-served models?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Currently the interceptor targets the Ollama API. Since it exposes the same &lt;code&gt;.generate()&lt;/code&gt; and &lt;code&gt;.chat()&lt;/code&gt; endpoints, any Ollama-compatible client works out of the box. Support for additional inference backends is on the roadmap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How much slower are branching strategies compared to CoT?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ToT is roughly 5-8x slower than CoT because it generates and evaluates multiple candidate branches at each level. Self-Consistency (k=5 samples) adds similar overhead. For latency-sensitive applications, stick with sequential strategies (CoT, Least-to-Most) and reserve branching strategies for problems where accuracy matters more than speed.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Created by Nacho Martinez, Data Scientist at Oracle. Find Nacho on &lt;a href="https://github.com/jasperan" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and &lt;a href="https://linkedin.com/in/jasperan" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/em&gt;, or visit the &lt;a href="https://www.oracle.com/developer/resources/" rel="noopener noreferrer"&gt;Oracle AI Developer page&lt;/a&gt; for more resources.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>Agent Memory: Why Your AI Has Amnesia and How to Fix It</title>
      <dc:creator>Wojtek Pluta</dc:creator>
      <pubDate>Fri, 17 Apr 2026 09:08:10 +0000</pubDate>
      <link>https://dev.to/oracledevs/agent-memory-why-your-ai-has-amnesia-and-how-to-fix-it-475e</link>
      <guid>https://dev.to/oracledevs/agent-memory-why-your-ai-has-amnesia-and-how-to-fix-it-475e</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Today's AI agents forget everything between conversations. Every interaction starts from zero, with no recall of who you are or what you've discussed before.&lt;/li&gt;
&lt;li&gt;Agent memory isn't about bigger context windows. It's about a persistent, evolving state that works across sessions.&lt;/li&gt;
&lt;li&gt;The field has converged on four memory types (working, procedural, semantic, episodic) that map directly to how human memory works.&lt;/li&gt;
&lt;li&gt;Building agent memory at enterprise scale is fundamentally a database problem. You need vectors, graphs, relational data, and ACID transactions working together.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Is Agent Memory and Why Does Your AI Agent Need It?
&lt;/h2&gt;

&lt;p&gt;You've spent weeks building an AI customer service agent. It handles complaints, processes refunds, even cracks the occasional joke when the moment's right. A customer calls back the next day, and your agent has no idea who they are. The conversation from yesterday? Gone. The preference they mentioned twice last week? Never happened. Every single interaction starts from scratch.&lt;/p&gt;

&lt;p&gt;This isn't a bug in your code. It's a fundamental design problem in how we build AI agents today.&lt;/p&gt;

&lt;p&gt;LangChain put it well: '&lt;em&gt;Imagine if you had a coworker who never remembered what you told them, forcing you to keep repeating that information&lt;/em&gt;'. In the coworker scenario, that’s frustrating, and for AI applications, forgetfulness, that’s a dealbreaker.&lt;/p&gt;

&lt;p&gt;At Oracle, we've been deep in this problem as we continue to provide support to customers building AI applications. And here's what we've found: the solution isn't bigger context windows or more verbose prompts. It's a proper memory infrastructure. The kind that databases have been providing for decades.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent memory&lt;/strong&gt; is the composition of system components and infrastructure layer that gives AI agents a persistent, evolving state across conversations and sessions. It enables agents to store, retrieve, update, and forget information over time: learning user preferences, retaining context from past interactions, and adapting behavior based on accumulated experience. Without it, every interaction starts from zero.&lt;/p&gt;

&lt;p&gt;This article breaks down what agent memory actually is, how it works under the hood, the frameworks shaping the field, and guidance on how to build it for production. Whether you're prototyping your first agent or scaling one to thousands of users, this is the foundation you need to get right.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Bigger Context Windows Aren't the Answer
&lt;/h2&gt;

&lt;p&gt;The rapid expansion of context windows, now ranging from hundreds of thousands to millions of tokens, has created a convincing illusion across the industry: that with this much capacity available, the memory problem is effectively solved and retrieval-based mechanisms are behind us. That assumption is wrong.&lt;/p&gt;

&lt;p&gt;The industry calls it '&lt;a href="https://mem0.ai/blog/memory-in-agents-what-why-and-how" rel="noopener noreferrer"&gt;the illusion of memory&lt;/a&gt;'. Stuffing more tokens into a prompt isn't memory. It's a bigger Post-it note: more space to scribble on, but it still goes in the bin when the conversation ends. Memory means the notes survive. Here's why that distinction matters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context windows degrade before they fill up.&lt;/strong&gt; Most models break well before their advertised limits. A model claiming 200K tokens typically becomes unreliable around 130K, with sudden performance drops rather than gradual degradation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There's no sense of importance.&lt;/strong&gt; Context windows treat every token equally. Your name gets the same weight as a throwaway comment from three weeks ago. There's no prioritisation, no salience, no relevance filtering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nothing persists.&lt;/strong&gt; Close the session and it's all gone. Every conversation starts from zero.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cost scales linearly.&lt;/strong&gt; Maintaining full context across a long agent lifetime gets expensive fast. You're paying per token, and most of those tokens are irrelevant noise.&lt;/p&gt;

&lt;p&gt;Memory is not only about storing chat history or passing more tokens into the context window. It's about building a persistent state stored in an external system, that evolves and informs every interaction the agent has, even weeks or months apart.&lt;/p&gt;

&lt;p&gt;Another misconception to address early on is that RAG (retrieval augmented generation) is agent memory. &lt;strong&gt;RAG brings external knowledge into the prompt at inference time&lt;/strong&gt;. It's great for grounding responses with facts from documents. But RAG is fundamentally stateless. It has no awareness of previous interactions, user identity, or how the current query relates to past conversations. Memory brings continuity. Put simply: RAG helps an agent answer better. Memory helps it learn and adapt. You need both.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Concept: A Mental Model for Agent Memory
&lt;/h2&gt;

&lt;p&gt;Let me give you a framework that makes all of this click. It maps directly to how your own brain works.&lt;/p&gt;

&lt;p&gt;In 2023, researchers at Princeton published the &lt;a href="https://arxiv.org/pdf/2309.02427" rel="noopener noreferrer"&gt;CoALA framework&lt;/a&gt; (Cognitive Architectures for Language Agents). It defines four types of memory, drawn from cognitive science and the &lt;a href="https://arxiv.org/pdf/2205.03854" rel="noopener noreferrer"&gt;SOAR architecture&lt;/a&gt; of the 1980s. Every major framework in the field builds on this taxonomy, and it answers a fundamental question: what options are available for adding memory to an AI agent?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory Type&lt;/th&gt;
&lt;th&gt;Human Equivalent&lt;/th&gt;
&lt;th&gt;What It Does in an Agent&lt;/th&gt;
&lt;th&gt;Example Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Working Memory&lt;/td&gt;
&lt;td&gt;Your brain's scratch pad: holding what you're actively thinking about&lt;/td&gt;
&lt;td&gt;Current conversation context, retrieved data, intermediate reasoning&lt;/td&gt;
&lt;td&gt;Conversation buffers, sliding windows, rolling summaries, scratchpads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Procedural Memory&lt;/td&gt;
&lt;td&gt;Muscle memory: knowing how to ride a bike without thinking&lt;/td&gt;
&lt;td&gt;System prompts, agent code, decision logic&lt;/td&gt;
&lt;td&gt;Prompt templates, tool definitions, agent configs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic Memory&lt;/td&gt;
&lt;td&gt;General knowledge: facts and concepts accumulated over your lifetime&lt;/td&gt;
&lt;td&gt;User preferences, extracted facts, knowledge bases&lt;/td&gt;
&lt;td&gt;Vector stores with similarity search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Episodic Memory&lt;/td&gt;
&lt;td&gt;Autobiographical memory: recalling specific experiences from your past&lt;/td&gt;
&lt;td&gt;Past action sequences, conversation logs, few-shot examples&lt;/td&gt;
&lt;td&gt;Timestamped logs with metadata filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Think of it this way. When you're in a meeting, your working memory holds what's being discussed right now. Your procedural memory knows how to take notes and when to speak up. Your semantic memory reminds you that Sarah's team prefers Slack over email. Your episodic memory recalls that the last time you proposed this feature, the VP shut it down because of budget constraints.&lt;/p&gt;

&lt;p&gt;An agent needs all four types working together. Most agents today only have working memory: whatever fits in the current context window. That's like trying to do your job using nothing but a whiteboard that gets wiped clean every evening.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lilianweng.github.io/posts/2023-06-23-agent/" rel="noopener noreferrer"&gt;Lilian Weng's influential formula&lt;/a&gt; captures the big picture simply:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent = LLM + Memory + Planning + Tool Use.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Her short-term memory maps to CoALA's working memory. Her long-term memory encompasses the other three types.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.langchain.com/oss/python/concepts/memory" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; adds a practical layer with two approaches to memory updates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hot path memory&lt;/strong&gt;: the agent explicitly decides to remember something before responding. This is what ChatGPT does. It adds latency but ensures immediate memory updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background memory&lt;/strong&gt;: a separate process extracts and stores memories during or after the conversation. No latency hit, but memories aren't available straight away.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight: memory is application-specific. What a coding agent remembers about a user is very different from what a research agent might store.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/pdf/2310.08560" rel="noopener noreferrer"&gt;Letta&lt;/a&gt; (formerly MemGPT) takes a different angle entirely, borrowing from operating systems. Treat the context window like RAM and external storage like a disk. The agent pages data between these tiers, creating a 'virtual context' that feels unlimited. The agent manages its own memory using tools: it decides what to remember, what to update, and what to archive.&lt;/p&gt;

&lt;p&gt;The distinction between programmatic memory (developer decides what to store) and agentic memory (the agent itself decides) matters. The field is moving towards the latter. Agents that manage their own memory adapt to individual users without requiring developer intervention for each new use case. The decision as to which memory operations are programmatic and agent triggered isn’t always as clear cut, and we’ve seen various approaches work well in certain use cases and domains. In a future post, we will go into the common patterns and design principles of memory engineering.&lt;/p&gt;

&lt;p&gt;Referring back to the customer service agent from the start of this article. Customer service is the most common use case for agents in production (26.5% of deployments, per &lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;LangChain's 2025 industry survey&lt;/a&gt;), and it demands all four memory types working together. Episodic memory recalls past tickets and interactions. Semantic memory stores customer preferences and account details. Working memory tracks the live conversation. Procedural memory encodes resolution workflows and escalation rules. All four memory types enable the chatbot to perform well on continuous tasks and adapt to new information.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Landscape: Frameworks and Open-Source Libraries
&lt;/h2&gt;

&lt;p&gt;What are the commonly used libraries and open-source projects for agent memory? The ecosystem has matured quickly. Here are the projects shaping how developers build agent memory today.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Open Source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LangChain / LangMem / LangGraph&lt;/td&gt;
&lt;td&gt;Agent orchestration with built-in memory abstractions. Hot path and background memory. LangMem SDK handles extraction and consolidation.&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Letta (MemGPT)&lt;/td&gt;
&lt;td&gt;Stateful agent platform with OS-inspired memory hierarchy. Agents self-edit their own memory via tool calls.&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zep / Graphiti&lt;/td&gt;
&lt;td&gt;Temporal knowledge graphs for relationship-aware memory. Bi-temporal modelling with sub-200ms retrieval.&lt;/td&gt;
&lt;td&gt;Yes (Graphiti)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mem0&lt;/td&gt;
&lt;td&gt;Self-improving memory layer with vector and graph architecture. Automatic memory extraction and conflict resolution.&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;langchain-oracledb&lt;/td&gt;
&lt;td&gt;Official LangChain integration for Oracle Database. Vector stores, hybrid search, and embeddings with enterprise-grade security.&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The orchestration library matters, but at scale, the storage backend matters more. Most of these frameworks are database-agnostic by design. The question isn't which framework to use. It's what database sits underneath it.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2FAgent-Memory_-Why-Your-AI-Has-Amnesia-and-How-to-Fix-It-visual-selection-5-1-edited.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2FAgent-Memory_-Why-Your-AI-Has-Amnesia-and-How-to-Fix-It-visual-selection-5-1-edited.png" alt="Illustration related to agent memory architectures" width="800" height="644"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Deep Dive: How Agent Memory Actually Works
&lt;/h2&gt;

&lt;p&gt;What are the common storage options for agent memory? Production systems today use three paradigms working together. You need to understand all three.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vector stores for semantic memory
&lt;/h3&gt;

&lt;p&gt;This is the most common approach. You take text, convert it to embeddings (typically 128 to 2,048 dimensions depending on embedding model utilised), and store them in a vector database. Retrieval works through vector search, against vectors that are indexed using HNSW (hierarchical navigable small world); typically we find the memories (embeddings in database) that are semantically closest to the current query.&lt;/p&gt;

&lt;p&gt;It's fast and simple but limited. Vector search captures semantic similarity well, yet misses structural relationships.&lt;/p&gt;

&lt;h3&gt;
  
  
  Knowledge graphs for relationship memory
&lt;/h3&gt;

&lt;p&gt;Vector search can tell you that a user mentioned coffee. But it can't tell you that they prefer a specific shop, ordered last Tuesday, and always get oat milk. That chain of connections (person, preference, place, time, detail) is a graph problem.&lt;/p&gt;

&lt;p&gt;Knowledge graphs store facts as entities and relationships, with edges capturing how they connect. Add bi-temporal modelling (tracking both when events happened and when the system learned about them) and you can ask not just 'what do we know?' but 'what did we know at any point in time?'&lt;/p&gt;

&lt;p&gt;Frameworks like Zep's Graphiti implement this pattern, &lt;a href="https://arxiv.org/html/2501.13956v1" rel="noopener noreferrer"&gt;achieving 94.8% accuracy&lt;/a&gt; on the Deep Memory Retrieval benchmark. Oracle Database supports property graphs natively through SQL/PGQ, so graph queries run inside the same engine as your vector search and relational data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structured databases for factual memory
&lt;/h3&gt;

&lt;p&gt;Relational databases store the structured&amp;nbsp;data: user profiles, access controls, session metadata, audit logs. As &lt;a href="https://www.cognee.ai/blog/fundamentals/vectors-and-graphs-in-practice" rel="noopener noreferrer"&gt;Cognee&lt;/a&gt; puts it: 'Vectors deliver high-recall semantic candidates (what feels similar), while graphs provide the structure to trace relationships across entities and time (how things relate).' Relational tables anchor both with the transactional guarantees that production systems demand.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why does a converged database change the equation?
&lt;/h3&gt;

&lt;p&gt;Most teams stitch this together with separate databases: Pinecone for vectors, Neo4j for graphs, Postgres for relational data. Three security models, three failure modes, no shared transaction boundaries. If one write fails, your agent's memory is in an inconsistent state.&lt;/p&gt;

&lt;p&gt;Oracle's converged database runs all three paradigms natively inside a single engine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI Vector Search&lt;/strong&gt; for embedding storage and similarity retrieval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL/PGQ&lt;/strong&gt; for property graph queries across entity relationships&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relational tables&lt;/strong&gt; for structured data, metadata, and audit trails&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON Document Store&lt;/strong&gt; for flexible, schema-free memory objects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All four share the same ACID transaction boundary and the same security model. Row-level security, encryption, and access controls apply uniformly across every data type. One engine, one transaction, one security policy: the three paradigms above become three views of the same underlying data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four Memory Operations
&lt;/h2&gt;

&lt;p&gt;Every memory system runs on four core operations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;ADD&lt;/strong&gt;: Store a completely new fact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UPDATE&lt;/strong&gt;: Modify an existing memory when new information complements or corrects it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DELETE&lt;/strong&gt;: Remove a memory when new information contradicts it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SKIP&lt;/strong&gt;: Do nothing when information is a repeat or irrelevant&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Modern memory systems delegate these decisions to the LLM itself rather than using brittle if/else logic. The extraction phase ingests context sources (the latest exchange, a rolling summary, recent messages) and uses the LLM to extract candidate memories. The update phase compares each new fact against the most similar entries in the vector database, using conflict detection to determine whether to add, merge, update, or skip.&lt;/p&gt;

&lt;h3&gt;
  
  
  Retrieval: how agents recall
&lt;/h3&gt;

&lt;p&gt;Due to the heterogenous nature of data that agents encounter, production systems combine multiple retrieval approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic search&lt;/strong&gt;: vector similarity (cosine distance) for meaning-based matching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal search&lt;/strong&gt;: bi-temporal models enable point-in-time queries ('What did the user prefer last March?')&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph traversal&lt;/strong&gt;: multi-hop queries across knowledge graph edges for complex reasoning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid retrieval&lt;/strong&gt;: combining keyword (full-text) and semantic (vector) search in a single query, which is critical for retrieving specific facts like names, dates, or project codes alongside conceptually related memories&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Forgetting: the underrated operation
&lt;/h3&gt;

&lt;p&gt;Effective forgetting can be implemented with decay functions applied to vector relevance scores: by analysing the results of vector search, old and unreferenced embeddings naturally fade from the agent's attention, imitating biological human memory decay patterns. In a database, this is straightforward. A recency-weighted scoring function multiplies semantic similarity by an exponential decay factor based on time since last access. The result: memories that haven't been recalled recently lose salience gradually, just like human recall.&lt;/p&gt;

&lt;p&gt;Some systems take a different approach entirely. Old facts are invalidated but never discarded, preserving historical accuracy for audit trails. The right strategy depends on your use case, but both are fundamentally database operations.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Enterprise Reality: What Changes at Scale
&lt;/h2&gt;

&lt;p&gt;Here's where the gap between demo and production becomes a chasm.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://view.ceros.com/kpmg-design/kpmg-genai-study/p/1" rel="noopener noreferrer"&gt;KPMG's Pulse Survey&lt;/a&gt; of 130 C-suite leaders (all at companies with over $1B revenue) found that 65% cite agentic system complexity as the top barrier for two consecutive quarters. Agent deployment has more than doubled, from 11% in Q1 2025 to 26% in Q4 2025, but that still means three quarters of large enterprises haven't deployed. &lt;a href="https://www.mckinsey.com/capabilities/tech-and-ai/our-insights/superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-at-work" rel="noopener noreferrer"&gt;McKinsey&lt;/a&gt; puts it even more starkly: only 1% of leaders describe their companies as 'mature' in AI deployment.&lt;/p&gt;

&lt;p&gt;The problems that surface at scale are database problems. They've been database problems all along.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2FGemini_Generated_Image_9flpfr9flpfr9flp-1024x559.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2FGemini_Generated_Image_9flpfr9flpfr9flp-1024x559.png" alt="Illustration related to enterprise agent memory at scale" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security and isolation.&lt;/strong&gt; Memory must be scoped per user, per team, per organisation. Memory poisoning is a real attack vector: adversaries can inject malicious information into an agent's memory to corrupt future decision-making. You need row-level security, not just namespace-level isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-tenancy.&lt;/strong&gt; Agents serving multiple organisations need complete data isolation. Most vector-only databases offer namespace-level separation. That's not the same as the row-level security that regulated industries require. Oracle's native PDB/CDB architecture provides inherent multi-tenant isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compliance is getting complex.&lt;/strong&gt; GDPR's right to be forgotten applies to explicit agent memory stores. But the EU AI Act (fully applicable from August 2026) requires 10-year audit trails for high-risk AI systems. Think about that tension: you need to delete personal data on request while maintaining a decade of audit history. That requires architectural sophistication that most startups are only beginning to address.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ACID transactions matter.&lt;/strong&gt; Agent memory operations often touch multiple data types simultaneously. Updating a vector embedding, modifying a graph relationship, and changing relational metadata must all succeed or all fail. Without atomicity, partial memory updates leave your agent in an inconsistent state.&lt;/p&gt;

&lt;p&gt;These aren't theoretical concerns. They're the reasons three quarters of enterprises are still stuck at the pilot stage.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Implementation: Building Agent Memory with LangChain and Oracle
&lt;/h2&gt;

&lt;p&gt;Let's get practical. We'll use LangChain as our orchestration framework and Oracle Database as the memory backend, using the langchain-oracledb package. Here's how quickly you can go from zero to a working memory system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain-oracledb oracledb langchain-core
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Connect and create a vector store
&lt;/h3&gt;

&lt;p&gt;This is all it takes to set up a production-ready vector store backed by Oracle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;oracledb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_oracledb.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OracleVS&lt;/span&gt;

&lt;span class="c1"&gt;# Create a connection pool (production-ready)
&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;oracledb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_pool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hostname:port/service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;increment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Initialise vector store for semantic memory
&lt;/span&gt;&lt;span class="n"&gt;semantic_memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OracleVS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;acquire&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
 &lt;span class="n"&gt;embedding_function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# any LangChain-compatible embeddings
&lt;/span&gt; &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AGENT_SEMANTIC_MEMORY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;distance_strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DistanceStrategy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's your semantic memory store. Oracle handles the vector indexing, ACID transactions, and security natively. No separate vector database needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Store and retrieve a memory
&lt;/h3&gt;

&lt;p&gt;The core pattern is simple: write memories with metadata, retrieve them with similarity search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Store a memory
&lt;/span&gt;&lt;span class="n"&gt;semantic_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User prefers dark mode and concise responses.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="n"&gt;metadatas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Retrieve relevant memories
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;semantic_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are this user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s preferences?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From here, you can create separate vector stores for each memory type (semantic, episodic, procedural) under the same Oracle instance, all sharing the same security policies and transaction guarantees.&lt;/p&gt;

&lt;h3&gt;
  
  
  Go deeper: the full memory engineering notebook
&lt;/h3&gt;

&lt;p&gt;The snippets above show the building blocks, but a production agent memory system needs considerably more. We've published a complete, runnable notebook in the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/memory_context_engineering_agents.ipynb" rel="noopener noreferrer"&gt;Oracle AI Developer Hub&lt;/a&gt; that implements the full architecture discussed in this post. This notebook builds a complete Memory Manager with &lt;strong&gt;six distinct memory types&lt;/strong&gt;, each backed by Oracle:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory Type&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Storage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Conversational&lt;/td&gt;
&lt;td&gt;Chat history per thread&lt;/td&gt;
&lt;td&gt;SQL Table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge Base&lt;/td&gt;
&lt;td&gt;Searchable documents and facts&lt;/td&gt;
&lt;td&gt;SQL Table + Vector Enabled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workflow&lt;/td&gt;
&lt;td&gt;Learned action patterns&lt;/td&gt;
&lt;td&gt;SQL Table + Vector Enabled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Toolbox&lt;/td&gt;
&lt;td&gt;Dynamic tool definitions with semantic retrieval&lt;/td&gt;
&lt;td&gt;SQL Table + Vector Enabled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Entity&lt;/td&gt;
&lt;td&gt;People, places, systems extracted from context&lt;/td&gt;
&lt;td&gt;SQL Table + Vector Enabled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summary&lt;/td&gt;
&lt;td&gt;Compressed context for long conversations&lt;/td&gt;
&lt;td&gt;SQL Table + Vector Enabled&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;It also covers &lt;strong&gt;context engineering&lt;/strong&gt; (monitoring context window usage, auto-summarisation at thresholds, just-in-time retrieval), &lt;strong&gt;semantic tool discovery&lt;/strong&gt; (scaling to hundreds of tools while only passing the relevant ones to the LLM), and a &lt;strong&gt;complete agent loop&lt;/strong&gt; that ties everything together.&lt;/p&gt;

&lt;p&gt;Run the notebook: &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/memory_context_engineering_agents.ipynb" rel="noopener noreferrer"&gt;oracle-devrel/oracle-ai-developer-hub&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Perspective: Where This Is Heading
&lt;/h2&gt;

&lt;p&gt;Here's what I think is coming, and where I'm still working things out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sleep-time computation will change the game.&lt;/strong&gt; The idea is simple: agents that 'think' during idle time (reorganising, consolidating, refining their memories) perform better and cost less at query time. &lt;a href="https://openai.com/index/inside-our-in-house-data-agent/" rel="noopener noreferrer"&gt;OpenAI's internal data&lt;/a&gt; agent already runs this pattern in production. Their engineering team describes a daily offline pipeline that aggregates table usage, human annotations, and code-derived enrichment into a single normalised representation, then converts it into embeddings for retrieval. At query time, the agent pulls only the most relevant context rather than scanning raw metadata.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.letta.com/blog/sleep-time-compute" rel="noopener noreferrer"&gt;Letta's&lt;/a&gt; research puts numbers to it: agents using this approach achieve 18% accuracy gains and 2.5x cost reduction per query. We're going to see a clear separation between 'thinking agents' that run in the background and 'serving agents' that handle real-time interactions. That's a pattern databases have supported forever: batch processing alongside real-time queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory will extend naive RAG implementations.&lt;/strong&gt; The spectrum is already shifting: traditional RAG to agentic RAG to full memory systems. VentureBeat predicts that contextual memory will surpass RAG for agentic AI in 2026. I think that's right. RAG retrieves documents. Memory understands context. The agents that win will do both, but memory will be the differentiator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The convergent database will become non-negotiable.&lt;/strong&gt; Agent memory needs vectors, graphs, relational data, and temporal context working together. Stitching together separate databases for each type creates brittle systems with security gaps and consistency problems. I'm still figuring out exactly how fast this consolidation will happen, but the direction is clear.&lt;/p&gt;

&lt;p&gt;One open question remains, and that is the pace at which enterprises will transition from pilot to production deployment. At the moment the technology is at a clear stage of maturity and architectural design patterns are proven and battle tested. On the other hand, organisational readiness, encompassing governance, infrastructure modernisation, and cross-functional alignment, is a fundamentally different challenge.&lt;/p&gt;

&lt;p&gt;What is clear: agent memory is, at its foundation, a database problem. And building databases for mission-critical workloads is what Oracle has been doing for nearly five decades.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What are the main types of agent memory used in AI systems?
&lt;/h3&gt;

&lt;p&gt;The field has converged on four types, drawn from cognitive science: &lt;strong&gt;working memory&lt;/strong&gt; (current conversation context), &lt;strong&gt;procedural memory&lt;/strong&gt; (system prompts and decision logic), &lt;strong&gt;semantic memory&lt;/strong&gt; (accumulated facts and user preferences), and &lt;strong&gt;episodic memory&lt;/strong&gt; (past interaction logs and experiences). Every major framework builds on this taxonomy, first formalised in the CoALA framework from Princeton in 2023.&lt;/p&gt;

&lt;h3&gt;
  
  
  What options are available for adding memory to an AI agent?
&lt;/h3&gt;

&lt;p&gt;Two broad approaches exist. &lt;strong&gt;Programmatic memory&lt;/strong&gt; is where the developer defines what gets stored and retrieved. &lt;strong&gt;Agentic memory&lt;/strong&gt; is where the agent itself decides what to remember, update, and forget using tool calls. Frameworks like Letta (formerly MemGPT) and LangChain's LangMem SDK support both patterns. The field is moving towards agentic memory, where agents manage their own state without developer intervention for each new use case.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are common agent memory storage options?
&lt;/h3&gt;

&lt;p&gt;Production systems typically combine three paradigms: &lt;strong&gt;vector stores&lt;/strong&gt; for meaning-based retrieval (storing embeddings and querying by cosine similarity), &lt;strong&gt;knowledge graphs&lt;/strong&gt; for relationship-aware retrieval (entities, edges, and bi-temporal modelling), and &lt;strong&gt;structured relational databases&lt;/strong&gt; for transactional data like user profiles, access controls, and audit logs. Most teams stitch these together with separate databases, though converged databases like Oracle can run all three natively in a single engine.&lt;/p&gt;

&lt;h3&gt;
  
  
  What techniques allow AI agents to forget or selectively erase memory?
&lt;/h3&gt;

&lt;p&gt;The most common approach uses &lt;strong&gt;decay functions&lt;/strong&gt; applied to vector relevance scores: a recency-weighted scoring function multiplies semantic similarity by an exponential decay factor based on time since last access. Memories that haven't been recalled recently lose salience gradually, mimicking biological memory decay. An alternative approach &lt;strong&gt;invalidates&lt;/strong&gt; old facts without discarding them, preserving historical accuracy for audit trails while removing them from active retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the differences between short-term and long-term agent memory?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Short-term memory&lt;/strong&gt; (also called working memory) is the current context window: whatever the agent is actively reasoning about in this conversation. It's fast but volatile; close the session and it's gone. &lt;strong&gt;Long-term memory&lt;/strong&gt; encompasses everything that persists across sessions: semantic memory (facts and preferences), episodic memory (past interactions), and procedural memory (learned behaviours and decision logic). Long-term memory requires external storage and retrieval infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are commonly used libraries for agent memory?
&lt;/h3&gt;

&lt;p&gt;The ecosystem includes &lt;strong&gt;LangChain/LangMem&lt;/strong&gt; (hot path and background memory with extraction and consolidation), &lt;strong&gt;Letta/MemGPT&lt;/strong&gt; (OS-inspired memory hierarchy where agents self-edit memory via tool calls), &lt;strong&gt;Zep/Graphiti&lt;/strong&gt; (temporal knowledge graphs with sub-200ms retrieval), &lt;strong&gt;Mem0&lt;/strong&gt; (self-improving memory with automatic conflict resolution), and &lt;strong&gt;langchain-oracledb&lt;/strong&gt; (Oracle Database integration for vector stores, hybrid search, and embeddings with enterprise-grade security).&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I store and query vector embeddings?
&lt;/h3&gt;

&lt;p&gt;The core pattern is straightforward: convert text into embeddings (typically 128 to 2,048 dimensions), store them in a vector-capable database, and retrieve them using cosine similarity search. With langchain-oracledb and Oracle Database, you initialise a vector store, add texts with metadata (such as user ID and memory type), then query with similarity_search() filtered by metadata. Oracle handles vector indexing, ACID transactions, and security natively.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which databases offer vector search capabilities for enterprises?
&lt;/h3&gt;

&lt;p&gt;Several databases now support vector search, but enterprise requirements go beyond basic similarity queries. You need ACID transactions, row-level security, multi-tenancy, and compliance features alongside your vector operations. Oracle Database provides native &lt;strong&gt;AI Vector Search&lt;/strong&gt; within its converged architecture, meaning vector queries run in the same engine as relational tables, property graphs (SQL/PGQ), and JSON document stores, all sharing a single transaction boundary and security model.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>database</category>
      <category>oracle</category>
    </item>
    <item>
      <title>What Is the AI Agent Loop? The Core Architecture Behind Autonomous AI Systems</title>
      <dc:creator>Wojtek Pluta</dc:creator>
      <pubDate>Fri, 17 Apr 2026 09:05:55 +0000</pubDate>
      <link>https://dev.to/oracledevs/what-is-the-ai-agent-loop-the-core-architecture-behind-autonomous-ai-systems-51b7</link>
      <guid>https://dev.to/oracledevs/what-is-the-ai-agent-loop-the-core-architecture-behind-autonomous-ai-systems-51b7</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Key Takeaways&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The architectural difference between a chatbot and an AI agent is one pattern: the agent loop. It’s an LLM invoking tools inside an iterative cycle, repeating until the task is complete or a stopping condition is reached.&lt;/li&gt;
&lt;li&gt;A chatbot responds in a single pass. An agent persists, adapts, and acts across multiple steps: perceiving its environment, reasoning over available options, executing an action, and observing the result before deciding what comes next.&lt;/li&gt;
&lt;li&gt;Every major AI company (OpenAI, Anthropic, Google, Microsoft, Meta) has converged on this same core pattern, despite building very different products around it.&lt;/li&gt;
&lt;li&gt;Building agent loops for production requires engineering for two constraints: cost, where agents consume approximately 4x more tokens than standard chat interactions and up to 15x in multi-agent systems, and observability, the ability to trace every reasoning step, tool call, and decision across an iterative execution cycle.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What Is the AI Agent Loop and Why Should You Care?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F03%2FAgent-Loop-%25E2%2580%2594-Linear-Flow-with-Loop-back-1024x431.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F03%2FAgent-Loop-%25E2%2580%2594-Linear-Flow-with-Loop-back-1024x431.png" title="The five-stage agent loop: Perceive, Reason, Plan, Act, Observe" alt="The five-stage agent loop: Perceive, Reason, Plan, Act, Observe" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The five-stage agent loop: Perceive, Reason, Plan, Act, Observe&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You have built a chatbot. It works. Users ask a question, it generates a response, and the interaction is complete. Then someone asks it to do something that requires more than one step.&lt;/p&gt;

&lt;p&gt;‘Find me the three cheapest flights to Tokyo next month, check if my loyalty points cover any of them, and book the best option’. The chatbot has no mechanism to proceed. It generates a response and stops. It can answer questions about flights. It can explain how loyalty points work. It cannot execute the workflow. The interaction is stateless. Each prompt is processed in isolation, with no persistent context, no access to intermediate results, and no ability to chain decisions across steps.&lt;/p&gt;

&lt;p&gt;This is not a limitation of the model. Chat-GPT, Claude, and Gemini are all capable of reasoning through multi-step problems. The limitation is architectural. A chatbot is built to respond. An agent is built to act.&lt;/p&gt;

&lt;p&gt;The difference is one while loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the Agent Loop?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The AI agent loop is the iterative execution cycle at the core of every agentic AI system. At each iteration, the agent assembles context from available inputs, invokes an LLM to reason and select an action, executes that action, observes the outcome, and feeds the observation back into the next iteration. This process repeats until the task is complete or a defined stopping condition is reached.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Across the engineering teams Oracle works with building AI applications, one architectural pattern consistently separates working prototypes from production-grade systems: the agent loop. It’s the architecture that transforms a language model from a text generation system into one that can take actions, adapt to results, and complete multi-step tasks autonomously.&lt;/p&gt;

&lt;p&gt;This article examines the agent loop architecture: what it is, how it works, why every major AI company has converged on the same core pattern, and what is required to build one that holds up in production.&lt;/p&gt;

&lt;p&gt;All code in this article is available as a runnable companion notebook in the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/agent_loop_foundations.ipynb" rel="noopener noreferrer"&gt;Oracle AI Developer Hub on GitHub&lt;/a&gt;. Follow along step by step or execute the full implementation end to end.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Why Single-Pass Responses Hit a Wall&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F03%2FChatbot-vs-Agent-%25E2%2580%2594-Horizontal-Stacked-Blog-Ready-1024x408.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F03%2FChatbot-vs-Agent-%25E2%2580%2594-Horizontal-Stacked-Blog-Ready-1024x408.png" title="Single-pass chatbot vs. iterative agent loop: one response versus continuous execution until task completion" alt="Single-pass chatbot vs. iterative agent loop: one response versus continuous execution until task completion" width="800" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Single-pass chatbot vs. iterative agent loop: one response versus continuous execution until task completion&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The standard chatbot interaction follows a simple pattern: user sends message, model generates response, done. One input, one output, no state between turns. It works brilliantly for question-answering, summarisation, and creative writing. It falls apart from the moment you need the model to &lt;em&gt;do&lt;/em&gt; something in the real world.&lt;/p&gt;

&lt;p&gt;A single-pass response has three fundamental constraints:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;It cannot iterate on results.&lt;/strong&gt; A single-pass system can execute a tool call within a turn, but it has no mechanism to evaluate whether that action succeeded, adapt based on the outcome, or chain a subsequent decision from the result. There is no feedback loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It cannot recover from failure.&lt;/strong&gt; Without iterative execution, a failed tool call, an empty result set, or an ambiguous API response cannot trigger a revised strategy. The model has no visibility into downstream outcomes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It cannot decompose dependent tasks.&lt;/strong&gt; Real-world workflows require gathering information, making decisions based on that information, executing actions, and handling the consequences of those actions. Each step depends on the result of the previous one. That is a loop, not a straight line.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="http://lib.ysu.am/disciplines_bk/efdd4d1d4c2087fe1cbe03d9ced67f34.pdf" rel="noopener noreferrer"&gt;Russell and Norvig&lt;/a&gt; defined an agent back in 1995 as 'anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.' That definition is 30 years old and it still holds. The key word is &lt;em&gt;acting&lt;/em&gt;. Not responding. Acting.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://arxiv.org/pdf/2210.03629" rel="noopener noreferrer"&gt;ReAct framework&lt;/a&gt; from Princeton and Google Research (Yao et al., 2022) made this practical for LLMs by interleaving reasoning with action in a single prompt-driven loop. The results demonstrated that models perform significantly better when they can reason, act, observe, and reason again: a 34% improvement on &lt;a href="https://arxiv.org/abs/2010.03768" rel="noopener noreferrer"&gt;ALFWorld&lt;/a&gt; and 10% on &lt;a href="https://arxiv.org/abs/2207.01206" rel="noopener noreferrer"&gt;WebShop&lt;/a&gt;. Single-pass responses are not just architecturally limiting. They leave measurable performance on the table.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Agent Loop: A Mental Model&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F03%2FAgent-Loop-%25E2%2580%2594-Linear-Flow-with-Loop-back-1-1024x431.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F03%2FAgent-Loop-%25E2%2580%2594-Linear-Flow-with-Loop-back-1-1024x431.png" title="The five-stage agent loop: Perceive, Reason, Plan, Act, Observe" alt="The five-stage agent loop: Perceive, Reason, Plan, Act, Observe" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The five-stage agent loop: Perceive, Reason, Plan, Act, Observe&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The agent loop operates across five stages that repeat until the task is complete or a stopping condition is met:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Perceive:&lt;/strong&gt; The agent receives input. This could be a user message, an API response, an error, or the result of its last action.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reason:&lt;/strong&gt; The LLM processes everything in context and decides what to do next.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan:&lt;/strong&gt; For complex tasks, the agent decomposes the objective into discrete subtasks before execution. Simpler workflows proceed directly to the Act stage without a dedicated planning step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Act:&lt;/strong&gt; The agent executes something: a tool call, an API request, a database query, a code execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observe:&lt;/strong&gt; The agent examines the result. Did it work? Is the task complete? Does the plan need adjusting?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then, it loops back to step 1.&lt;/p&gt;

&lt;p&gt;In pseudocode, the complete pattern reduces to six lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;while not done:
   response = call_llm(messages)
   if response has tool_calls:
      results = execute_tools(response.tool_calls)
      messages.append(results)
   else:
      done = True
      return response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This execution pattern underpins every autonomous AI system currently in production. It is the foundation on which every major AI organisation has built its agentic architecture. &lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic's engineering&lt;/a&gt; guidance describes the pattern plainly: agents are often just LLMs using tools based on environmental feedback in a loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  When the Agent Loop Is Not the Right Architecture
&lt;/h3&gt;

&lt;p&gt;The agent loop is not the appropriate architecture for every use case. Before building an agentic system, validate that the workflow requires iterative execution.&lt;/p&gt;

&lt;p&gt;Agent loops are well-suited to tasks where the number of required steps cannot be predicted in advance, where the agent must adapt based on intermediate results, and where the cost of latency is acceptable relative to the value of task completion.&lt;/p&gt;

&lt;p&gt;Workflows that follow a fixed, predictable sequence of steps are better served by deterministic pipelines. Single-step tasks that require one LLM call and one tool invocation do not benefit from the overhead of an agent loop. Tasks where latency is the primary constraint should be evaluated carefully, as each loop iteration adds LLM call latency.&lt;/p&gt;

&lt;p&gt;The principle from both &lt;a href="https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt; and &lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic's&lt;/a&gt; published guidance is consistent: start with the simplest architecture that solves the problem. Introduce the agent loop only when iterative reasoning and adaptive tool use are required.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;How Every Major AI Company Converged on the Same Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F03%2FCompany-Convergence-%25E2%2580%2594-Headed-Cards-v2-1024x426.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F03%2FCompany-Convergence-%25E2%2580%2594-Headed-Cards-v2-1024x426.png" title="Six major AI organisations, one underlying architecture: LLM plus tools in a loop" alt="Six major AI organisations, one underlying architecture: LLM plus tools in a loop" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Six major AI organisations, one underlying architecture: LLM plus tools in a loop&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Despite differences in SDK design, nomenclature, and architectural philosophy, every major AI organisation has converged on the same underlying execution pattern. The table below maps each implementation against the five stages of the core loop:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Company&lt;/th&gt;
&lt;th&gt;What they call it&lt;/th&gt;
&lt;th&gt;Core pattern&lt;/th&gt;
&lt;th&gt;Key contribution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Agent Loop&lt;/td&gt;
&lt;td&gt;Tool-calling loop via Codex SDK&lt;/td&gt;
&lt;td&gt;Code-first approach; anti-declarative-graph philosophy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;Agent loop&lt;/td&gt;
&lt;td&gt;Augmented LLM + tools in loop&lt;/td&gt;
&lt;td&gt;Simplicity-first design; workflows vs. agents distinction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;Orchestration layer&lt;/td&gt;
&lt;td&gt;ReAct (Thought-Action-Observation)&lt;/td&gt;
&lt;td&gt;Invented Chain-of-Thought and co-created ReAct&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft&lt;/td&gt;
&lt;td&gt;Think-Act-Learn&lt;/td&gt;
&lt;td&gt;Conversation-driven loop&lt;/td&gt;
&lt;td&gt;Dual-loop ledger planning (Magentic-One)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;td&gt;Agent loop&lt;/td&gt;
&lt;td&gt;ReAct via Llama Stack&lt;/td&gt;
&lt;td&gt;Open-source building blocks; security-first ('Rule of Two')&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangChain&lt;/td&gt;
&lt;td&gt;Agent executor / StateGraph&lt;/td&gt;
&lt;td&gt;Tool-calling state machine&lt;/td&gt;
&lt;td&gt;Graph-based orchestration; middleware hooks for control&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The implementations differ in naming conventions, SDK design, and architectural philosophy. The execution pattern is identical.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lilianweng.github.io/posts/2023-06-23-agent/" rel="noopener noreferrer"&gt;Lilian Weng's formula&lt;/a&gt; captures it simply: &lt;strong&gt;&lt;em&gt;Agent = LLM + Memory + Planning + Tool Use&lt;/em&gt;&lt;/strong&gt;. The agent loop is the runtime that ties those four components together.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;How the Agent Loop Actually Works&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F03%2FHow-the-Agent-Loop-Works-%25E2%2580%2594-Iteration-Sequence-1024x849.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F03%2FHow-the-Agent-Loop-Works-%25E2%2580%2594-Iteration-Sequence-1024x849.png" title="Three iterations, three tool calls, one complete response." alt="Three iterations, three tool calls, one complete response." width="800" height="663"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Three iterations, three tool calls, one complete response.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The canonical pattern is &lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;ReAct&lt;/a&gt;: reasoning interleaved with acting. The model does not simply select a tool. It reasons about why that tool is appropriate, executes the call, processes the result, and reasons again.&lt;/p&gt;

&lt;p&gt;To illustrate how the loop executes in practice, consider the following task: identify the most cited paper on agent memory published in 2026 and summarise its key findings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Iteration 1 (Reason → Act → Observe):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent reasons that it needs to search for papers on agent memory from 2026 and selects the search tool. It calls the search API with relevant keywords. The result returns 15 papers with citation counts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Iteration 2:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent identifies the top result with 340 citations and calls a document retrieval tool to access the full abstract and key sections.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Iteration 3:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent determines that sufficient information has been gathered, generates the summary, and exits the loop.&lt;/p&gt;

&lt;p&gt;Three iterations. Three tool calls. One complete answer that no single-pass chatbot could have produced.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Tool integration: the universal pattern&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Across every provider, tool integration follows the same structure. Tools are defined with a name, description, and JSON Schema parameters. The model decides whether to call a tool and with what arguments. The system executes the function and returns results as a tool message. The model processes results and decides whether to continue looping or return a final response.&lt;/p&gt;

&lt;p&gt;Tools in an agent loop can be classified into three categories. Data tools retrieve context, such as database queries, vector search, or document retrieval. Action tools perform operations with side effects, such as writing records, calling external APIs, or executing code. Orchestration tools invoke other agents as callable sub-modules, enabling multi-agent coordination within a single workflow. Clear classification of tools at design time reduces ambiguous model behaviour at runtime.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.anthropic.com/news/model-context-protocol" rel="noopener noreferrer"&gt;Anthropic's Model Context Protocol&lt;/a&gt; (MCP) has emerged as a leading open standard for how agents discover and connect to external tools, with adoption across OpenAI, Google, Microsoft, and the broader ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Beyond the basic loop&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The core &lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;ReAct&lt;/a&gt; loop handles most use cases, but the pattern extends in two important directions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plan-and-execute separates planning from execution.&lt;/strong&gt; Instead of invoking the LLM at every step, a planner generates a full task breakdown upfront, an executor works through each subtask, and a re-planner adjusts when execution diverges from the plan. &lt;a href="https://arxiv.org/abs/2312.04511" rel="noopener noreferrer"&gt;LangChain's LLMCompiler&lt;/a&gt; implementation streams a directed acyclic graph of tasks with explicit dependency tracking, enabling parallel execution. The original paper (Kim et al., ICML 2024) reports a 3.6x speedup over sequential ReAct-style execution. At production scale, where each LLM call carries a direct cost, the architectural decision to plan upfront rather than reason at every step has measurable financial implications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-agent orchestration&lt;/strong&gt; distributes work across specialised agents. &lt;a href="https://www.anthropic.com/engineering/multi-agent-research-system" rel="noopener noreferrer"&gt;Anthropic's Claude Research&lt;/a&gt; system uses an orchestrator-worker pattern where a lead agent spawns sub-agents to explore different threads in parallel. Their multi-agent system outperformed a single-agent setup by 90.2% on internal research evaluations. &lt;a href="https://arxiv.org/abs/2411.04468" rel="noopener noreferrer"&gt;Microsoft's Magentic-One&lt;/a&gt; takes it further with a dual-loop system: an outer loop for strategic planning and an inner loop for step-by-step execution, with the ability to reset the entire strategy when progress stalls.&lt;/p&gt;

&lt;p&gt;These are powerful extensions, but the advice from every company is the same: start with the simplest loop that works. Only add complexity when you can measure the improvement.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Enterprise Reality: Cost and Observability&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F03%2FEnterprise-Reality-%25E2%2580%2594-Cost-and-Observability-1024x416.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F03%2FEnterprise-Reality-%25E2%2580%2594-Cost-and-Observability-1024x416.png" title="Token cost scaling from standard chat (1x) to single agent (4x) to multi-agent (15x), with corresponding production requirements" alt="Token cost scaling from standard chat (1x) to single agent (4x) to multi-agent (15x), with corresponding production requirements" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Token cost scaling from standard chat (1x) to single agent (4x) to multi-agent (15x), with corresponding production requirements&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Agent loops that perform well in controlled environments frequently expose new failure modes at production scale. The two constraints that dominate enterprise deployments are cost and observability.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Cost scales with iteration&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Every loop iteration is an LLM call. &lt;a href="https://www.anthropic.com/engineering/multi-agent-research-system" rel="noopener noreferrer"&gt;Anthropic's internal data&lt;/a&gt; shows that agents consume roughly 4x more tokens than standard chat. Multi-agent systems push that to approximately 15x. At thousands of agent sessions per day, token costs compound with every loop iteration. Without cost controls embedded at the architecture level, this becomes a significant operational constraint.&lt;/p&gt;

&lt;p&gt;The mitigation strategies are architectural. Plan-and-execute patterns reduce the number of LLM calls by planning upfront rather than reasoning at every step. Caching commonly retrieved tool results avoids redundant work. Setting token and cost budgets per agent run prevents runaway spending. These controls must be designed into the system from the start, not added retroactively.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Observability: knowing what your agent did and why&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A standard chat interaction produces a single response from a single LLM call. An agent running 15 iterations, calling 8 different tools, and branching across multiple reasoning paths produces a complex execution trace. When a failure occurs, diagnosing it requires structured visibility into every stage of that trace: what the model reasoned, which tool it invoked, what arguments it passed, what the result was, and how the model interpreted that result before the next iteration.&lt;/p&gt;

&lt;p&gt;Production agent systems need structured logging at every stage of the loop: what the model reasoned, which tool it called, what arguments it passed, what came back, and how it interpreted the result. &lt;a href="https://www.microsoft.com/en-us/research/blog/autogen-v0-4-reimagining-the-foundation-of-agentic-ai-for-scale-extensibility-and-robustness/" rel="noopener noreferrer"&gt;Microsoft's AutoGen 0.4 builds on OpenTelemetry&lt;/a&gt; for this. LangChain's middleware hooks (before_model, after_model, modify_model_request) let you intercept and inspect every iteration.&lt;/p&gt;

&lt;p&gt;Stopping conditions are the other critical piece. Without them, agents can loop indefinitely, burning tokens and producing increasingly incoherent results. Every production system needs maximum iteration limits, no-progress detection (exiting when repeated iterations produce no new information), and token/cost budgets as hard guardrails.&lt;/p&gt;

&lt;p&gt;The following scenario illustrates the consequence of deploying an agent loop without hard stopping conditions:&lt;/p&gt;

&lt;p&gt;An agent is deployed to scrape a website and summarise the data. The target website updates its structure, causing the scraping tool to return an empty result. The agent lacks a hard stopping condition, and its prompt instructs it to retry until data is retrieved. It enters a runaway loop, calling the broken tool 400 times in five minutes and consuming thousands of tokens before hitting a platform rate limit. A maximum iteration limit of three cycles would have prevented the failure entirely.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Building an Agent Loop with LangChain and Oracle&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before selecting a framework or writing code, address the following implementation requirements. These apply regardless of which orchestration library is used:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identify tools and schema:&lt;/strong&gt; What actions can the agent take, and what exact parameters do those tools need?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose state representation:&lt;/strong&gt; How will you store the conversation history and intermediate tool results?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define stopping criteria:&lt;/strong&gt; What are the hard limits (iterations, tokens, budget) that will force the loop to terminate?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Establish logging and telemetry:&lt;/strong&gt; How will you track each reasoning step, tool call, and result?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Select a memory layer:&lt;/strong&gt; Where will you store persistent knowledge (like vector embeddings or user preferences) across sessions?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here is one concrete way to implement that checklist using LangChain and Oracle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;oracledb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolMessage&lt;/span&gt;

&lt;span class="c1"&gt;# Connect to Oracle AI Database
&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;oracledb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_pool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hostname:port/service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;increment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define tools the agent can use
&lt;/span&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Evaluate a mathematical expression and return the numeric result.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;convert_units&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;from_unit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_unit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Convert a numeric value from one unit to another.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;timezone_convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time_str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;from_city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Convert a local time from one city&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s timezone to another.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="c1"&gt;# Create the agent -- returns a compiled StateGraph that runs the loop
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;calculate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;convert_units&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone_convert&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a precise reasoning assistant. Use tools for all calculations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;QUESTION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A flight from London to New York JFK covers 5,570 km. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The aircraft cruises at 900 km/h. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The flight departs London at 14:00 local time. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How long is the flight in hours and minutes, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;and what local time does it arrive in New York?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Stream the loop live -- each chunk shows one stage of the agent's reasoning
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;QUESTION&lt;/span&gt;&lt;span class="p"&gt;)]},&lt;/span&gt;
    &lt;span class="n"&gt;stream_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;last_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;last_msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;last_msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[ACT] &lt;/span&gt;&lt;span class="se"&gt;\u2192&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[OBSERVE] &lt;/span&gt;&lt;span class="se"&gt;\u2190&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;last_msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;last_msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Answer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;last_msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The implementation above is a working agent loop. The compiled agent graph manages the while loop internally, invoking the LLM, evaluating tool calls, executing them, appending results to the message state, and repeating until the model returns a final response without further tool calls or the recursion limit is reached.&lt;/p&gt;

&lt;p&gt;Oracle AI Database provides the storage backend for the tools the agent calls. Vector search for semantic retrieval, relational tables for structured data, and ACID transactions ensuring that every tool call either fully succeeds or fully rolls back. No partial state. No corrupted memory.&lt;/p&gt;

&lt;p&gt;We've published a complete, runnable notebook that implements a full agent loop architecture with LangChain and Oracle AI Database in the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub" rel="noopener noreferrer"&gt;Oracle AI Developer Hub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/agent_loop_foundations.ipynb" rel="noopener noreferrer"&gt;&lt;strong&gt;Run the notebook →&lt;/strong&gt;&lt;/a&gt; &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/agent_loop_foundations.ipynb" rel="noopener noreferrer"&gt;oracle-devrel/oracle-ai-developer-hub&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Where This Is Heading&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Three structural shifts are emerging in how production agent systems are designed and operated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The core loop architecture is stable.&lt;/strong&gt; The active area of development is the infrastructure built around it: context management, multi-loop coordination, and decision auditability. The while loop itself is not changing. What is evolving is how context is managed within it, how multiple loops are coordinated together, and how the loop's decisions are made auditable and controllable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent middleware is emerging as the standard abstraction layer for production systems.&lt;/strong&gt; LangChain's recent work on middleware hooks (intercepting the loop at before_model, after_model, and modify_model_request) suggests a future where developers don't modify the loop itself but layer behaviour on top of it: summarisation, PII redaction, human-in-the-loop approval, dynamic model switching. It's the same pattern that made web frameworks powerful: don't change the request-response cycle, add middleware to it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost-per-task will replace cost-per-token as the primary efficiency metric.&lt;/strong&gt; Token usage is an input measure. The metric that reflects actual business value is the total cost required to complete a task end to end, including LLM calls, tool executions, and any human escalations triggered by agent failures.&lt;/p&gt;

&lt;p&gt;An agent that consumes 15x more tokens but resolves a customer issue without human escalation is cheaper than a chatbot that consumes fewer tokens but requires human intervention to complete the task.&lt;/p&gt;

&lt;p&gt;The primary open question in production agent deployment is the pace at which observability tooling will mature. Debugging a 20-iteration agent run currently requires piecing together structured logs, tool call traces, and LLM reasoning outputs across multiple systems. The industry needs better tooling for tracing, replaying, and interpreting agent decisions. The building blocks exist in OpenTelemetry, structured logging, and middleware hooks. The developer experience remains the unsolved problem.&lt;/p&gt;

&lt;p&gt;The agent loop is the foundational pattern for any AI system that needs to do more than generate a response. It is the architectural starting point for production-grade autonomous AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Frequently Asked Questions&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is an AI agent loop?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The AI agent loop is an iterative architecture where a large language model repeatedly reasons about a task, takes an action (typically a tool call), observes the result, and decides what to do next. The cycle continues until the task is complete or a stopping condition is met. In its simplest form, it's an LLM calling tools inside a while loop. This pattern, formalised in the &lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;ReAct&lt;/a&gt; framework (2022), is the core architecture behind every major autonomous AI system shipping today.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the architectural difference between an AI agent and a chatbot?
&lt;/h3&gt;

&lt;p&gt;A chatbot generates a single response to a single input. It answers questions but cannot execute multi-step actions or adapt based on intermediate results. An AI agent uses the agent loop to iteratively reason, act, and observe, handling complex tasks that require multiple steps, tool interactions, and course corrections. The architectural difference is simple: a chatbot is one LLM call; an agent is an LLM calling tools in a loop until the job is done.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How does the ReAct framework work?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;ReAct&lt;/a&gt; (Reasoning + Acting) interleaves reasoning traces with tool actions in a prompt-driven loop. At each step, the model generates a 'thought' explaining its reasoning, takes an 'action' by calling a tool, and receives an 'observation' with the result. This cycle repeats until the task is complete. The key innovation is that reasoning and acting reinforce each other: the model reasons about what to do (reason to act) and uses action results to inform further reasoning (act to reason).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What are common patterns for multi-agent orchestration?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Three patterns dominate. The &lt;strong&gt;manager pattern&lt;/strong&gt; uses a central agent that delegates subtasks to specialised sub-agents via tool calls (used by &lt;a href="https://openai.github.io/openai-agents-python/" rel="noopener noreferrer"&gt;OpenAI's Agents SDK&lt;/a&gt;). The &lt;strong&gt;orchestrator-worker pattern&lt;/strong&gt; has a lead agent spawning workers for parallel exploration (used by Anthropic's Claude Research). The &lt;strong&gt;handoff pattern&lt;/strong&gt; treats agents as peers that transfer control to one another based on specialisation. Most production systems start with a single agent loop and only move to multi-agent orchestration when task complexity genuinely demands it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do you prevent an AI agent from running forever?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Production agent loops use multiple stopping conditions layered together. &lt;strong&gt;Maximum iteration limits&lt;/strong&gt; cap the number of loop cycles (for example, max_iterations=10). &lt;strong&gt;Token and cost budgets&lt;/strong&gt; set hard spending limits per agent run. &lt;strong&gt;No-progress detection&lt;/strong&gt; exits the loop when repeated iterations produce no new information. &lt;strong&gt;Goal-achievement checks&lt;/strong&gt; evaluate whether the task objective has been met. Microsoft's Magentic-One adds a dual-loop approach where the outer loop can reset the entire strategy when the inner loop stalls, preventing the agent from spinning on a failed approach.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Building ONNX Embedding Workflows in Oracle AI Database with Python</title>
      <dc:creator>Wojtek Pluta</dc:creator>
      <pubDate>Fri, 17 Apr 2026 08:44:35 +0000</pubDate>
      <link>https://dev.to/oracledevs/a-practical-guide-to-importing-an-onnx-embedding-model-generating-embeddings-and-running-semantic-4e1m</link>
      <guid>https://dev.to/oracledevs/a-practical-guide-to-importing-an-onnx-embedding-model-generating-embeddings-and-running-semantic-4e1m</guid>
      <description>&lt;h2&gt;
  
  
  A practical guide to importing an ONNX embedding model, generating embeddings, and running semantic search in Oracle AI Database
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Companion notebook:&lt;/strong&gt; &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/onnx_embeddings_oracle_ai_database.ipynb" rel="noopener noreferrer"&gt;ONNX In-Database Embeddings with Oracle AI Database 26ai&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Oracle AI Database can load and register an augmented ONNX embedding model with &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/arpls/dbms_vector1.html" rel="noopener noreferrer"&gt;DBMS_VECTOR.LOAD_ONNX_MODEL()&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;VECTOR_EMBEDDING()&lt;/code&gt; lets SQL generate embeddings directly inside Oracle AI Database.&lt;/li&gt;
&lt;li&gt;Embeddings can be stored natively in &lt;code&gt;VECTOR&lt;/code&gt; columns.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; enables semantic search directly in SQL.&lt;/li&gt;
&lt;li&gt;LangChain can build on the same Oracle-native workflow without moving embeddings or retrieval outside the database (&lt;a href="https://docs.langchain.com/oss/python/integrations/vectorstores/oracle" rel="noopener noreferrer"&gt;LangChain Oracle vector store integration&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;In many embedding pipelines, source data resides in a relational database, the model runs somewhere else as an external service, and the vectors are stored in a separate vector database. While this architecture can work well, it introduces additional data movement, infrastructure, and operational complexity.&lt;/p&gt;

&lt;p&gt;Oracle AI Database supports a more consolidated approach. You can load an &lt;a href="https://onnx.ai/" rel="noopener noreferrer"&gt;ONNX&lt;/a&gt; embedding model directly into the database, invoke it, store the generated embeddings in native &lt;code&gt;VECTOR&lt;/code&gt; columns, and perform semantic search in the same database.&lt;/p&gt;

&lt;p&gt;This article walks through that end-to-end workflow using an ONNX model: loading it into Oracle AI Database, validating that it is registered correctly, generating embeddings with SQL, storing them in a native vector column, and querying them using semantic similarity. It also demonstrates how the same architecture can be used with LangChain, without changing where embedding and retrieval occur.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;How to load an augmented ONNX model with Oracle AI Database.&lt;/li&gt;
&lt;li&gt;How to generate embeddings directly in SQL with &lt;code&gt;VECTOR_EMBEDDING()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;How to run semantic search with &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; in Oracle AI Database and through LangChain.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;This workflow keeps model execution, vector storage, and semantic retrieval inside Oracle AI Database. An augmented ONNX model is exposed through an Oracle directory object, loaded with &lt;code&gt;DBMS_VECTOR.LOAD_ONNX_MODEL()&lt;/code&gt;, invoked with &lt;code&gt;VECTOR_EMBEDDING()&lt;/code&gt;, and queried with &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt;. The model artifact can come either from a local or container-mounted path or directly from Oracle Cloud Object Storage using &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/load_onnx_model_cloud.html#GUID-82A8D291-8096-4A7C-8882-9B6AC4A7FCCB" rel="noopener noreferrer"&gt;&lt;code&gt;DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD()&lt;/code&gt;&lt;/a&gt;. LangChain can build on the same Oracle-native execution path through &lt;code&gt;OracleEmbeddings&lt;/code&gt; and &lt;code&gt;OracleVS&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10+&lt;/li&gt;
&lt;li&gt;Oracle AI Database 26ai running in a container&lt;/li&gt;
&lt;li&gt;Dependencies such as &lt;code&gt;oracledb&lt;/code&gt;, &lt;code&gt;python-dotenv&lt;/code&gt;, &lt;code&gt;pandas&lt;/code&gt;, &lt;code&gt;numpy&lt;/code&gt;, &lt;code&gt;langchain&lt;/code&gt;, &lt;code&gt;langchain-community&lt;/code&gt;, and &lt;code&gt;langchain-oracledb&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;For cloud loading: an Oracle Cloud Object Storage bucket and model URI, or a PAR URL&lt;/li&gt;
&lt;li&gt;If not using a PAR URL, an Object Storage credential created with &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/arpls/dbms_cloud.html" rel="noopener noreferrer"&gt;&lt;code&gt;DBMS_CLOUD.CREATE_CREDENTIAL&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the notebook, those packages are installed up front:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;executable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;install&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-q&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;oracledb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python-dotenv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pandas&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;numpy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langchain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langchain-core&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langchain-community&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langchain-oracledb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Packages installed.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;returncode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Install failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The example also assumes Oracle AI Database 26ai is running in a container, with a mounted directory for ONNX model files. That mounted directory becomes important later, because Oracle accesses the model through a database directory object rather than through ad hoc file access.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step-by-Step Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Understand why Oracle requires an augmented ONNX model
&lt;/h3&gt;

&lt;p&gt;One of the most important details in this workflow is that Oracle needs an &lt;strong&gt;augmented ONNX model&lt;/strong&gt;, not just a standard transformer export.&lt;/p&gt;

&lt;p&gt;For &lt;code&gt;VECTOR_EMBEDDING()&lt;/code&gt; to accept raw text directly, tokenization and related preprocessing need to be included inside the ONNX graph itself. That is what allows Oracle to take a normal text string and produce an embedding without relying on external preprocessing in Python.&lt;/p&gt;

&lt;p&gt;In the notebook, the model used is an augmented version of &lt;code&gt;all-MiniLM-L12-v2&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all_MiniLM_L12_v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;ONNX_FILE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all_MiniLM_L12_v2.onnx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without that augmented packaging, the flow would no longer be fully Oracle-native, because preprocessing would have to happen outside the database first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Prepare an ONNX model for Oracle AI Database
&lt;/h3&gt;

&lt;p&gt;Before the model can be used in SQL, Oracle needs controlled access to the ONNX file through a database directory object. This is a database-managed reference to a filesystem location, which means access to the model artifact is handled through Oracle privileges rather than through direct filesystem assumptions.&lt;/p&gt;

&lt;p&gt;The notebook includes a one-time admin setup that creates the user, grants privileges, and registers the ONNX model directory. At runtime, the important pieces are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a database user with the required privileges&lt;/li&gt;
&lt;li&gt;permission to load mining models&lt;/li&gt;
&lt;li&gt;a registered Oracle directory such as &lt;code&gt;ONNX_DIR&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;access to the ONNX file from inside the container&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simplified version of the directory setup looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="n"&gt;DIRECTORY&lt;/span&gt; &lt;span class="n"&gt;ONNX_DIR&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="s1"&gt;'/opt/oracle/onnx_models'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;READ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;WRITE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;DIRECTORY&lt;/span&gt; &lt;span class="n"&gt;ONNX_DIR&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;my_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters because the model import is not treated as an ad hoc file operation. The file is exposed to Oracle through a controlled database object, which is much more aligned with enterprise governance expectations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Figure 1.&lt;/strong&gt; An augmented ONNX model is exposed through an Oracle directory object, loaded with &lt;code&gt;DBMS_VECTOR.LOAD_ONNX_MODEL()&lt;/code&gt;, registered in Oracle, and invoked from SQL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F04%2Foracle_onnx_flow_reworked-2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F04%2Foracle_onnx_flow_reworked-2.png" alt="Diagram showing the workflow for loading and using an ONNX model in Oracle Database. An ONNX model file is stored in an Oracle directory object (ONNX_DIR), then loaded using the DBMS_VECTOR.LOAD_ONNX_MODEL() procedure. The model is registered inside the database and can then be invoked directly from SQL." width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2b: Cloud option - load ONNX from Oracle Object Storage
&lt;/h3&gt;

&lt;p&gt;Oracle also supports loading ONNX models from Oracle Cloud Object Storage with &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/load_onnx_model_cloud.html#GUID-82A8D291-8096-4A7C-8882-9B6AC4A7FCCB" rel="noopener noreferrer"&gt;&lt;code&gt;DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD()&lt;/code&gt;&lt;/a&gt;. This is a documented alternative to the local directory workflow used in the companion notebook.&lt;/p&gt;

&lt;p&gt;Per Oracle documentation, use a credential for standard Object Storage URIs, and pass &lt;code&gt;credential =&amp;gt; NULL&lt;/code&gt; for pre-authenticated request (PAR) URLs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Option A: regular Object Storage URI (credential required)&lt;/span&gt;
&lt;span class="k"&gt;EXECUTE&lt;/span&gt; &lt;span class="n"&gt;DBMS_VECTOR&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LOAD_ONNX_MODEL_CLOUD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'ALL_MINILM_L12_V2'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;credential&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'OBJ_STORE_CRED'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;uri&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'https://objectstorage.&amp;lt;region&amp;gt;.oraclecloud.com/n/&amp;lt;namespace&amp;gt;/b/&amp;lt;bucket&amp;gt;/o/all_MiniLM_L12_v2.onnx'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;metadata&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'{
    "function":"embedding",
    "embeddingOutput":"embedding",
    "input":{"input":["DATA"]}
  }'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Option B: PAR URL (credential must be NULL)&lt;/span&gt;
&lt;span class="k"&gt;EXECUTE&lt;/span&gt; &lt;span class="n"&gt;DBMS_VECTOR&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LOAD_ONNX_MODEL_CLOUD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'ALL_MINILM_L12_V2'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;credential&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;uri&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'https://objectstorage.&amp;lt;region&amp;gt;.oraclecloud.com/p/&amp;lt;par-token&amp;gt;/n/&amp;lt;namespace&amp;gt;/b/&amp;lt;bucket&amp;gt;/o/all_MiniLM_L12_v2.onnx'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; According to Oracle documentation, &lt;code&gt;metadata&lt;/code&gt; is optional for models prepared with Oracle's Python utility defaults, model names must follow Oracle naming rules, and the ONNX file size limit for cloud loading is 2 GB.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2c: Multi-cloud note (AWS/GCP/Google Drive)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD()&lt;/code&gt; is documented for Oracle Cloud Object Storage. If your model artifact is hosted in AWS S3, Google Cloud Storage, or Google Drive, use a portable two-step pattern: download the ONNX file to a database-accessible local path, then load it with &lt;code&gt;DBMS_VECTOR.LOAD_ONNX_MODEL()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This keeps embedding generation and semantic retrieval Oracle-native while allowing model artifact hosting outside OCI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;model_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL_SIGNED_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# S3 pre-signed URL / GCS signed URL / Drive direct URL
&lt;/span&gt;&lt;span class="n"&gt;target_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/opt/oracle/onnx_models/all_MiniLM_L12_v2.onnx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;180&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model downloaded to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;target_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="n"&gt;DBMS_VECTOR&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LOAD_ONNX_MODEL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;directory&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'ONNX_DIR'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;file_name&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'all_MiniLM_L12_v2.onnx'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'ALL_MINILM_L12_V2'&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Connect to Oracle AI Database from Python
&lt;/h3&gt;

&lt;p&gt;The notebook connects to Oracle AI Database using &lt;code&gt;python-oracledb&lt;/code&gt; in Thin mode, so no Oracle Client libraries are required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;oracledb&lt;/span&gt;

&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;oracledb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Connected to Oracle AI Database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That same connection is then reused across the SQL examples and the LangChain integration later in the notebook.&lt;/p&gt;

&lt;p&gt;To keep the notebook readable, it defines a small helper function for executing SQL and optionally returning results as a pandas DataFrame:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fetch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;many&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute SQL against Oracle Database.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;many&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executemany&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cols&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The example also assumes Oracle AI Database 26ai is running in a container, with a mounted directory for ONNX model files. That mounted directory becomes important later, because Oracle accesses the model through a database directory object rather than through ad hoc file access.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Load an ONNX embedding model into Oracle AI Database
&lt;/h3&gt;

&lt;p&gt;The notebook does not assume the ONNX model is already present. If the file is missing, it downloads the official pre-built augmented model and places it in the model directory used by Oracle.&lt;/p&gt;

&lt;p&gt;Once the model file is available, either through an Oracle directory object or a cloud URI, it can be imported with &lt;code&gt;DBMS_VECTOR.LOAD_ONNX_MODEL()&lt;/code&gt; or &lt;code&gt;DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A simplified version of the local directory call looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="n"&gt;DBMS_VECTOR&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LOAD_ONNX_MODEL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;directory&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'ONNX_DIR'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;file_name&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'all_MiniLM_L12_v2.onnx'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'ALL_MINILM_L12_V2'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'{
      "function":"embedding",
      "embeddingOutput":"embedding",
      "input":{"input":["DATA"]}
    }'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the point where the model becomes more than a file. Oracle registers it, stores the associated metadata, and exposes it as a named object that SQL can invoke directly.&lt;/p&gt;

&lt;p&gt;The metadata is especially important. It defines how Oracle maps the SQL input text into the model graph and identifies which output node should be used as the embedding vector. In the notebook, the workflow also checks whether the model already exists before reloading it. This makes reruns safer and ensures the workflow remains idempotent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model_check&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT COUNT(*) AS cnt FROM USER_MINING_MODELS WHERE MODEL_NAME = UPPER(:model_name)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;fetch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Expected output:&lt;/strong&gt; the model check confirms whether the ONNX model is already registered, so reruns stay idempotent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Verify that Oracle registered the model correctly
&lt;/h3&gt;

&lt;p&gt;After the import, the next step is to validate that Oracle recognizes the model.&lt;/p&gt;

&lt;p&gt;The notebook queries the model catalog to verify that the ONNX model has been loaded successfully:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mining_function&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;algorithm&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_mining_models&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ALL_MINILM_L12_V2'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a small but important part of the workflow. It confirms that the model is visible to Oracle as a registered object and is ready to be used by the vector functions that come next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected output:&lt;/strong&gt; the query returns the registered ONNX model from &lt;code&gt;USER_MINING_MODELS&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Generate embeddings in SQL with VECTOR_EMBEDDING()
&lt;/h3&gt;

&lt;p&gt;Once the model is registered, Oracle can use it directly through &lt;code&gt;VECTOR_EMBEDDING()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The notebook first tests this with a simple text input to confirm that the model works and that the returned vector has the expected size.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
         &lt;span class="n"&gt;ALL_MINILM_L12_V2&lt;/span&gt;
         &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="s1"&gt;'Oracle Database supports vector search.'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;DATA&lt;/span&gt;
       &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dual&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is one of the most important parts of the article. Embedding generation is no longer a separate service call. It becomes a SQL operation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the application does not need to call an external embedding API&lt;/li&gt;
&lt;li&gt;the database can generate embeddings internally&lt;/li&gt;
&lt;li&gt;the semantic representation stays close to the data it describes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Expected output:&lt;/strong&gt; Oracle returns a 384-dimensional embedding for the supplied text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 7: Store embeddings in a native VECTOR column
&lt;/h3&gt;

&lt;p&gt;After validating embedding generation, the notebook creates a table where the source text and its embedding live together.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;onnx_docs&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;        &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;category&lt;/span&gt;  &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;doc_text&lt;/span&gt;  &lt;span class="k"&gt;CLOB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is an important design choice. The vector is not stored as an opaque blob or external payload. It is stored in Oracle's native &lt;code&gt;VECTOR&lt;/code&gt; type, which means it becomes part of the same database model as the relational data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vectors stay linked to the exact rows they describe&lt;/li&gt;
&lt;li&gt;access control applies consistently&lt;/li&gt;
&lt;li&gt;backups and retention policies stay unified&lt;/li&gt;
&lt;li&gt;the application does not need to coordinate data across multiple storage systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The notebook inserts demo content and generates the embedding directly in the same SQL statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;onnx_docs&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s1"&gt;'database'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'Oracle AI Database supports in-database vector search and semantic retrieval.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ALL_MINILM_L12_V2&lt;/span&gt;
    &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="s1"&gt;'Oracle AI Database supports in-database vector search and semantic retrieval.'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;DATA&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The semantic representation is created at the same time as the row is written, inside the same transactional boundary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Figure 2.&lt;/strong&gt; Embedding generation happens at insert time inside Oracle AI Database, where document text is embedded with &lt;code&gt;VECTOR_EMBEDDING()&lt;/code&gt; and stored together with the row in a &lt;code&gt;VECTOR&lt;/code&gt; column.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F04%2FFigure-2-v2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F04%2FFigure-2-v2.png" alt="Diagram showing embedding generation inside Oracle AI Database during data insertion. Document text is passed through a SQL INSERT statement, where the VECTOR_EMBEDDING() function generates a vector (for example, VECTOR(384)) within the same transactional boundary, and the resulting embedding is stored alongside the data as stored vectors." width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before moving into retrieval, the notebook inspects the inserted rows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DBMS_LOB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SUBSTR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;preview&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;onnx_docs&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 8: Run semantic search in SQL and LangChain
&lt;/h3&gt;

&lt;p&gt;Once embeddings are stored, semantic retrieval is handled entirely inside Oracle. The notebook uses &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; together with &lt;code&gt;VECTOR_EMBEDDING()&lt;/code&gt; so that the query text is embedded on the fly and compared against the stored vectors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;DBMS_LOB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SUBSTR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;doc_preview&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ALL_MINILM_L12_V2&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="s1"&gt;'How does Oracle support semantic search?'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;DATA&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;COSINE&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;onnx_docs&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user query is embedded directly within Oracle, where it is compared against stored document vectors. The results are then ranked by similarity, and the closest semantic matches are returned through SQL.&lt;/p&gt;

&lt;p&gt;The notebook explicitly explains how to interpret the output: the smaller the cosine distance, the more semantically similar the document is to the query.&lt;/p&gt;

&lt;p&gt;The notebook also runs several queries to validate that semantic ranking remains meaningful across different phrasings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;test_queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Which Oracle feature helps semantic retrieval?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Can I store embeddings in the database?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How does LangChain work with Oracle vectors?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Why are ONNX models useful here?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Figure 3.&lt;/strong&gt; At query time, Oracle embeds the input text, compares it with stored vectors using &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt;, and returns the nearest semantic matches directly through SQL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F04%2FFigure-3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F04%2FFigure-3.png" alt="Diagram showing semantic search in Oracle AI Database at query time. A user query is embedded into a query vector, which is then compared against stored vectors using a distance search with VECTOR_DISTANCE(). The system returns the closest semantic matches as ranked results directly through SQL." width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The notebook then adds an optional framework layer using LangChain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_oracledb.embeddings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OracleEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_oracledb.vectorstores.oraclevs&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OracleVS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;OracleEmbeddings&lt;/code&gt;, the application can use Oracle's registered in-database embedding model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;oracle_embedder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OracleEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The notebook also validates that the LangChain embedding call returns a vector of the expected size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;lc_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;oracle_embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Oracle AI Database performs semantic search using vectors.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Embedding dimension: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lc_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;First 5 values: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lc_embedding&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The notebook then uses &lt;code&gt;OracleVS&lt;/code&gt;, a LangChain-compatible vector store backed by Oracle AI Vector Search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.documents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_oracledb.vectorstores.oraclevs&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OracleVS&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DistanceStrategy&lt;/span&gt;

&lt;span class="n"&gt;langchain_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Oracle AI Database supports vector storage and semantic search.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An ONNX embedding model can be loaded directly into Oracle.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LangChain can use OracleVS to query Oracle AI Vector Search.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Using in-database embeddings can reduce architectural complexity.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;vector_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OracleVS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;langchain_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;oracle_embedder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LC_ONNX_DEMO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;distance_strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DistanceStrategy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The notebook also runs a similarity query through the LangChain abstraction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How can Oracle Database help with semantic retrieval?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Validation &amp;amp; Troubleshooting
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Validate that the model appears in &lt;code&gt;USER_MINING_MODELS&lt;/code&gt; after &lt;code&gt;DBMS_VECTOR.LOAD_ONNX_MODEL()&lt;/code&gt; or &lt;code&gt;DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Confirm that &lt;code&gt;VECTOR_EMBEDDING()&lt;/code&gt; returns a 384-dimensional embedding for the loaded model.&lt;/li&gt;
&lt;li&gt;If semantic ranking looks off, verify that the same model is used for both stored document embeddings and query embeddings.&lt;/li&gt;
&lt;li&gt;If using cloud loading, verify URI or PAR validity, bucket path, region, and credential privileges.&lt;/li&gt;
&lt;li&gt;When rerunning the notebook, check whether the model and demo tables already exist to avoid duplicate object errors.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why load the model into Oracle instead of calling an external API?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Because Oracle can generate embeddings directly in SQL, which reduces external dependencies and keeps data and inference inside the same system boundary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does the model need to be augmented?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Because Oracle must be able to accept raw text input directly. That requires tokenization and preprocessing logic to already be included in the ONNX graph.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does &lt;code&gt;VECTOR_EMBEDDING()&lt;/code&gt; do?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
It invokes the registered model inside Oracle and returns the embedding vector for the input text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does the &lt;code&gt;VECTOR&lt;/code&gt; column store?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
It stores the numeric embedding representation produced by the model. In this example, the vectors are 384-dimensional &lt;code&gt;FLOAT32&lt;/code&gt; values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is semantic similarity computed?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This workflow uses &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; with cosine distance to compare the stored document vectors with the embedded query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can the model be reused by multiple applications?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Once registered and granted appropriately, the model can be invoked by any application that has access to the Oracle environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I load the model from cloud storage instead of a local mounted directory?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Oracle AI Database supports &lt;code&gt;DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD()&lt;/code&gt; for models in Oracle Cloud Object Storage, with either a credential or a PAR URL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does LangChain move embeddings outside Oracle?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. LangChain provides a higher-level interface, but the model execution and vector search still run in Oracle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does this replace a separate vector database?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For many use cases, yes. Oracle provides native vector storage and vector search directly in the database.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related Documentation and Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Ela689/oracle-ai-developer-hub/blob/onnx-embeddings/notebooks/onnx_embeddings_oracle_ai_database.ipynb" rel="noopener noreferrer"&gt;Companion notebook on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/" rel="noopener noreferrer"&gt;Oracle Database 26ai documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/" rel="noopener noreferrer"&gt;Oracle AI Vector Search User's Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/load_onnx_model_cloud.html#GUID-82A8D291-8096-4A7C-8882-9B6AC4A7FCCB" rel="noopener noreferrer"&gt;LOAD_ONNX_MODEL_CLOUD&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/arpls/dbms_vector1.html" rel="noopener noreferrer"&gt;DBMS_VECTOR package reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/vector_embedding.html" rel="noopener noreferrer"&gt;VECTOR_EMBEDDING SQL reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/vector_distance.html" rel="noopener noreferrer"&gt;VECTOR_DISTANCE SQL reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/arpls/dbms_cloud.html" rel="noopener noreferrer"&gt;DBMS_CLOUD package reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/adjsn/" rel="noopener noreferrer"&gt;Oracle JSON Developer's Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/ccapp/" rel="noopener noreferrer"&gt;Oracle Text Application Developer's Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/spatl/" rel="noopener noreferrer"&gt;Oracle Spatial and Graph documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/dbseg/" rel="noopener noreferrer"&gt;Oracle Database Security Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.langchain.com/oss/python/integrations/vectorstores/oracle" rel="noopener noreferrer"&gt;LangChain Oracle vector store integration&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>oracle</category>
      <category>database</category>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>Vector Embeddings: How They Work, Where to Store Them, and Best Practices</title>
      <dc:creator>Wojtek Pluta</dc:creator>
      <pubDate>Thu, 16 Apr 2026 15:51:46 +0000</pubDate>
      <link>https://dev.to/oracledevs/vector-embeddings-how-they-work-where-to-store-them-and-best-practices-429g</link>
      <guid>https://dev.to/oracledevs/vector-embeddings-how-they-work-where-to-store-them-and-best-practices-429g</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Vector embeddings convert unstructured data into numeric representations that power semantic search, recommendations, and multimodal analytics beyond keywords.&lt;/li&gt;
&lt;li&gt;Embedding success isn’t just about the model—it also depends on a data platform that can meet requirements for scale, low latency, security, and governance, including vector indexing/ANN search, access controls, encryption, and monitoring.&lt;/li&gt;
&lt;li&gt;Oracle AI Database unifies native vector types and similarity search, enterprise-grade security, and integrated vector, structured, and unstructured data—so teams can build RAG, search, and analytics without piecing together multiple systems.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Fimage-3-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Fimage-3-1.png" title="Semantic similarity search over vector space - Oracle Help Center" alt="Semantic similarity search over vector space - Oracle Help Center" width="719" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Semantic similarity search over vector space - Oracle Help Center&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction to Vector Embeddings
&lt;/h2&gt;

&lt;p&gt;Vector embeddings&amp;nbsp;have changed&amp;nbsp;the way we interact with unstructured data such as text, images, audio, and code. By transforming this data into high-dimensional numeric vectors,&amp;nbsp;we can use embeddings to&amp;nbsp;process the semantic meaning and relationships within the data.&lt;/p&gt;

&lt;p&gt;We can look at embeddings as task or domain-specific representations of vectors. The geometric relationships among them represent meaningful similarities between concepts in semantic space. The efficient storage and querying of vector embeddings enables capabilities such as semantic search, recommendations, and advanced analytics; and bridges the gap between unstructured and structured information.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Vector Embeddings? A Definition and Their Role
&lt;/h2&gt;

&lt;p&gt;Vector embeddings are mathematical representations of objects—such as words, sentences, images, or audio—encoded as dense, high-dimensional vectors. Each vector encapsulates features that capture semantic meaning, context, or structure of the data. For example, similar words or images will have embeddings positioned closely in the vector space, enabling similarity-based operations.&amp;nbsp;This allows for similar “things” to be grouped together under a distance metric.&lt;/p&gt;

&lt;p&gt;The adoption of vector embeddings underpins many&amp;nbsp;cutting-edge&amp;nbsp;technologies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrieval-augmented generation (RAG):&lt;/strong&gt; Enhances large language models by retrieving relevant&amp;nbsp;context&amp;nbsp;using embedding similarity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Semantic search:&lt;/strong&gt; Finds documents with similar context, not just matching keywords.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Recommendations:&lt;/strong&gt; Suggests products or content by comparing user or item embeddings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deduplication and anomaly detection:&lt;/strong&gt; Identifies&amp;nbsp;near-duplicates or outliers based on embedding distances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multimodal analytics:&lt;/strong&gt; Links&amp;nbsp;information across text, image, audio, and other domains.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ability to bridge structured and unstructured data makes embeddings indispensable across modern data architectures.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Create Embeddings? Some Tools That Can Help
&lt;/h2&gt;

&lt;p&gt;A variety of tools can encode text, images, and code as vector embeddings, enabling similarity search, retrieval workflows (including RAG), and other ML tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI – provides hosted embedding APIs backed by task-optimized models, accessible with REST interfaces.&lt;/li&gt;
&lt;li&gt;Hugging Face – offers a large catalog of pre-trained multimodal embedding models and libraries (such as the Transformers library), plus community benchmarks.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.oracle.com/database/" rel="noopener noreferrer"&gt;Oracle AI Database&lt;/a&gt; – provides a native vector memory store in Oracle Database, enabling storage, indexing (e.g., IVF/flat/HNSW), and retrieval of vector embeddings alongside relational data with SQL and PL/SQL integration; supports hybrid search (vector + metadata filters), enterprise-grade security, and governance for RAG and semantic search workloads&lt;/li&gt;
&lt;li&gt;TensorFlow – enables building and serving custom embedding models using&amp;nbsp;Keras, enabling easy integration into training pipelines.&lt;/li&gt;
&lt;li&gt;PyTorch&amp;nbsp;– provides flexible primitives to fine-tune or implement embedding&amp;nbsp;models, and deploy them via&amp;nbsp;TorchScript.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Benefits of Working With Vector Embeddings
&lt;/h2&gt;

&lt;p&gt;The following are just a few of the benefits vector embeddings have brought to today's AI tech stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Vector embeddings are currently the best way to transform complex data into numerical units that reflect meaning, similarity and enable clustering and retrieval beyond keyword matching.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The limitations of keyword methods were particularly visible in areas such as synonym handling, typos, and paraphrasing, and are now absent in modern-day LLMs relying on vector embeddings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Embeddings support multilingual and cross-modal experiences by aligning meaning across languages and modalities.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other approaches, such as sparse lexical retrieval and symbolic/ontology-based methods, can be effective, but dense vector embeddings are often a better fit when you need semantic similarity matching (for example, paraphrases and synonyms) rather than exact keyword overlap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges in Working With Vector Embeddings
&lt;/h2&gt;

&lt;p&gt;The following are some of the potential challenges you may face in working with vector embeddings, and potential ways to mitigate them:&lt;/p&gt;

&lt;h3&gt;
  
  
  Storage Volume and High Dimensionality
&lt;/h3&gt;

&lt;p&gt;Storage challenges include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Large embedding volumes:&lt;/strong&gt; Billions of vectors require scalable storage and efficient indexing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High dimensionality:&lt;/strong&gt; Embeddings of 128, 512, or 1024+ dimensions need specialized data structures and optimized storage formats.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Performance and Latency Bottlenecks
&lt;/h3&gt;

&lt;p&gt;Performance factors include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Indexing and search speed:&lt;/strong&gt; ANN techniques improve latency, but&amp;nbsp;very large&amp;nbsp;datasets demand optimized infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Batch insertion and streaming:&lt;/strong&gt; Efficiently handling ongoing ingestion of new embeddings.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Distributed System Complexities and Operational Overhead
&lt;/h3&gt;

&lt;p&gt;At scale, sharding, replication, and consistency management become complex. Automated scaling, monitoring, and failover are desirable for production systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Factors
&lt;/h3&gt;

&lt;p&gt;Vector embeddings may affect operational cost:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compute and storage requirements:&lt;/strong&gt; High-dimensional data and fast search consume substantial resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Operational overhead:&lt;/strong&gt; Consider cost of infrastructure, team&amp;nbsp;expertise, and maintenance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Encryption at Rest and in Transit
&lt;/h3&gt;

&lt;p&gt;Securing embeddings is crucial as they can encode sensitive information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Encryption at rest:&lt;/strong&gt; Protects stored vectors using strong industry-standard algorithms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Encryption in transit:&lt;/strong&gt; Ensures vectors&amp;nbsp;remain&amp;nbsp;confidential when transmitted between systems or users.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Oracle AI Database enforces encryption by default and&amp;nbsp;integrates with&amp;nbsp;enterprise key management solutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Access Control and Authentication
&lt;/h3&gt;

&lt;p&gt;Control who can access,&amp;nbsp;modify, or query embeddings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Granular permissions:&lt;/strong&gt; Define user roles and table-level permissions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integration with SSO and identity providers:&lt;/strong&gt; Streamlines enterprise authentication.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audit trails:&lt;/strong&gt; Track access and changes for compliance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Sanitization and Monitoring
&lt;/h3&gt;

&lt;p&gt;Reduce risk by implementing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sanitization:&lt;/strong&gt; Remove or obfuscate sensitive or personal information in embeddings before storage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitoring and anomaly detection:&lt;/strong&gt; Detect unusual access patterns or potential misuse.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Advanced Cryptographic Techniques
&lt;/h3&gt;

&lt;p&gt;For&amp;nbsp;highly sensitive&amp;nbsp;embeddings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Homomorphic encryption or secure multi-party computation:&lt;/strong&gt; Enables computation and&amp;nbsp;search&amp;nbsp;on encrypted embeddings, minimizing exposure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common Vector Embedding Use Cases
&lt;/h2&gt;

&lt;p&gt;Embeddings&amp;nbsp;open up&amp;nbsp;a wide array of practical use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise search and information retrieval:&lt;/strong&gt; Improved accuracy and relevance in document and knowledge base searches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personalization and recommendation engines:&lt;/strong&gt; Enhanced user experiences by surfacing relevant content or products.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fraud and anomaly detection:&lt;/strong&gt; Early identification of unusual patterns using embedding distances.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data deduplication and clustering:&lt;/strong&gt; Streamlined datasets and improved analytics through intelligent grouping.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal retrieval and analytics:&lt;/strong&gt; Unified analysis over diverse data types, fostering deeper insights.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Storing Vector Embeddings and the Oracle Advantage
&lt;/h2&gt;

&lt;p&gt;The following are a few key points related to the storage of vector embeddings, and how Oracle AI Database's native vector store capabilities can streamline and strengthen your stack with its native vector store capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Specialized Vector Databases
&lt;/h3&gt;

&lt;p&gt;Dedicated vector databases are built for storing, indexing, and searching embeddings efficiently. These databases excel at large-scale similarity search with features such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High-dimensional indexing:&lt;/strong&gt; Specialized data structures to support billion-scale embeddings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Approximate search capabilities:&lt;/strong&gt; Fast, scalable similarity queries using Approximate Nearest Neighbor (ANN) techniques.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;RESTful APIs and SDKs:&lt;/strong&gt; Developer-friendly interfaces for ingestion and search.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Popular examples include Pinecone,&amp;nbsp;Weaviate, Milvus, and Vespa. Specialized databases are ideal for workloads with large volumes of embeddings and demanding&amp;nbsp;similarity&amp;nbsp;search requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQL/NoSQL Databases with Vector Support
&lt;/h3&gt;

&lt;p&gt;Traditional databases are evolving to meet AI's demands by adding native vector data types and search capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SQL databases:&lt;/strong&gt; PostgreSQL (with&amp;nbsp;pgvector), Oracle AI Database, and others support vector columns and similarity search via extensions or built-in features.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;NoSQL databases:&lt;/strong&gt; MongoDB and Redis now offer basic vector search features, often using plugins or modules.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This integration enables seamless blending of embeddings with structured business data, supporting hybrid query scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Oracle AI Database Approach
&lt;/h3&gt;

&lt;p&gt;From Oracle's viewpoint, AI databases must natively support vector data types, efficient similarity queries, and enterprise security for integrating embeddings across applications. Oracle AI Database is designed to address these needs at scale.&lt;/p&gt;

&lt;p&gt;The Oracle AI Database offers a unified approach allowing developers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Store embeddings alongside structured and unstructured data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run similarity queries directly using SQL and specialized vector search operators.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Integrate with Oracle's rich security, high availability, and scalability features.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Combine vector search, filtering, ranking, and analytical queries in a single stack.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example Procedures - Using Vector Embeddings in Oracle AI Database
&lt;/h2&gt;

&lt;p&gt;The following examples are intentionally minimal and illustrative. They&amp;nbsp;highlight&amp;nbsp;how Oracle AI Database supports native vector storage and SQL-based similarity search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;

 &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;CLOB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

 &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;

&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example shows a minimal&amp;nbsp;table&amp;nbsp;definition using Oracle AI Database’s native VECTOR data type. In practice, embeddings are stored alongside structured or unstructured application data in the same database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;

&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example illustrates&amp;nbsp;SQL-based similarity search in Oracle AI Database. The &lt;code&gt;:query_vector&lt;/code&gt;&amp;nbsp;placeholder&amp;nbsp;represents&amp;nbsp;the embedding generated from&amp;nbsp;user input by an embedding&amp;nbsp;model (inside or outside the database) and is used to rank the nearest matches.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hybrid query pattern (semantic + relational filtering)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;

&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;

&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This hybrid pattern combines standard SQL filtering with semantic ranking in a single query. It is useful when semantic search must also respect metadata&amp;nbsp;constraints, access controls, or business rules.&amp;nbsp;This streamlines workflows and&amp;nbsp;facilitates&amp;nbsp;embedding-driven applications without moving data across siloed systems.&lt;/p&gt;

&lt;p&gt;Using Oracle Autonomous AI Database in conjunction with&amp;nbsp;&lt;a href="https://docs.langchain.com/oss/python/integrations/vectorstores/oracle" rel="noopener noreferrer"&gt;langchain-oracledb&lt;/a&gt;, for example, we can simply generate embeddings, store, and interact with vectors directly from within&amp;nbsp;the&amp;nbsp;database – requiring no&amp;nbsp;additional&amp;nbsp;investment in another separate vector database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Querying and Searching for Stored Vector Embeddings
&lt;/h2&gt;

&lt;p&gt;The following are a few of the things you should keep in mind if your work involves querying and searching for stored vector embeddings:&lt;/p&gt;

&lt;h3&gt;
  
  
  Approximate Nearest Neighbor (ANN) Algorithms and Data Structures
&lt;/h3&gt;

&lt;p&gt;Searching for similar embeddings at scale requires efficient algorithms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ANN Techniques:&lt;/strong&gt; Rather than exact search, algorithms like HNSW (Hierarchical Navigable Small World), IVF (Inverted File Index), and PQ (Product Quantization) yield fast, near-accurate results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Structures:&lt;/strong&gt; Use trees (KD-Tree, Ball Tree), graphs (HNSW), or hash-based indices (LSH) to organize and retrieve vectors efficiently.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ANN can deliver millisecond-latency searches over millions or billions of embeddings, making it essential for operational AI applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  High-level retrieval workflow (generalized)
&lt;/h3&gt;

&lt;p&gt;At&amp;nbsp;a high level, semantic retrieval follows a simple and reusable pattern that applies across vector databases, frameworks, and application stacks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Convert user input into a query embedding.&lt;/li&gt;
&lt;li&gt;Compare it against stored embeddings.&lt;/li&gt;
&lt;li&gt;Rank results by similarity.&lt;/li&gt;
&lt;li&gt;Apply filters and business rules as needed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This high-level workflow is framework- and language-agnostic. While the underlying implementation differs across platforms and tools, the conceptual flow&amp;nbsp;remains&amp;nbsp;the same for&amp;nbsp;the most&amp;nbsp;vector search and RAG-style applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Popular Libraries
&lt;/h3&gt;

&lt;p&gt;Several tools make it easier to store, and search embeddings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vector search libraries:&lt;/strong&gt; FAISS (Facebook AI Similarity Search), Annoy (Spotify), NMSLIB,&amp;nbsp;ScaNN.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These libraries power both stand-alone vector stores and integrations within general-purpose databases.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Choose the Right Similarity Metrics
&lt;/h3&gt;

&lt;p&gt;Selecting the right similarity metric is critical for effective search:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cosine similarity:&lt;/strong&gt; Measures the angle between vectors; ideal for text and semantic similarity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Euclidean distance:&lt;/strong&gt; Useful for geometric or spatial data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dot product:&lt;/strong&gt; Common in deep learning models; efficient for high-dimensional comparisons.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your choice depends on the nature of your data and the specifics of your application.&lt;/p&gt;

&lt;h3&gt;
  
  
  Oracle AI Database Capabilities
&lt;/h3&gt;

&lt;p&gt;Oracle’s AI Database combines native vector capabilities, enterprise security, and proven scalability, making it a robust choice for organizations seeking a unified solution for traditional data and AI-enabled workloads.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Native vector data types and indexing:&lt;/strong&gt; Supports efficient storage and retrieval of high-dimensional vectors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integrated similarity search:&lt;/strong&gt; Enables querying and filtering based on vector proximity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enterprise-grade security:&lt;/strong&gt; Encryption at rest, robust access controls, and activity monitoring.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hybrid queries:&lt;/strong&gt; Seamless combination of structured, unstructured, and vector data in complex analytical tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High scalability:&lt;/strong&gt; Handles massive volumes of&amp;nbsp;embeddings&amp;nbsp;without performance degradation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best Practices for Working With Vector Embeddings
&lt;/h2&gt;

&lt;p&gt;The following are a few of the best practices for using vector embeddings to power semantic search, personalized recommendations, multimodal analytics (including anomaly detection), and domain-specific insights across enterprise applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Semantic Search and Information Retrieval
&lt;/h3&gt;

&lt;p&gt;Semantic search with embeddings offers better context and intent recognition than keyword search. Querying an embedding retrieves documents or objects with similar meanings—crucial for legal, healthcare, customer support, and research applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommendation Systems and Personalization
&lt;/h3&gt;

&lt;p&gt;Compare user and item embeddings to power personalized recommendations. This increases engagement, retention, and value in e-commerce, media, and B2B applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multimodal Search and Anomaly Detection
&lt;/h3&gt;

&lt;p&gt;Combine embeddings across text, image, and audio for multimodal analytics or use distance-based thresholds to flag anomalies and outliers in fraud prevention or system monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Domain-Specific Analytics
&lt;/h3&gt;

&lt;p&gt;Specialized embeddings can be trained for&amp;nbsp;particular industries—finance, healthcare, retail—and stored/retrieved for advanced analytics, predictions, or compliance monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Select Appropriate Tools and Architectures
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Match your use case to the data platform (dedicated vector database vs. extended relational/NoSQL).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you want both, Oracle AI Database is a good option.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Factor in scale, integration needs, security requirements, and budget.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Leverage proven libraries and frameworks to speed up development.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Security and Scalability Considerations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Encrypt embeddings, control access, and&amp;nbsp;monitor&amp;nbsp;usage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Choose solutions that scale with data growth and user demand.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Balance security, performance, and cost based on enterprise requirements.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Architectural Patterns
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hybrid architecture:&lt;/strong&gt; Combine vector storage/search with structured data in a unified database like Oracle AI Database.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Microservices:&lt;/strong&gt; Separate ingestion, search, and analytics as independently scaling components if needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cloud-native solutions:&lt;/strong&gt; Consider managed vector databases for elasticity and reduced operational burden.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tooling Reminders
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Use specialized libraries (FAISS, Annoy,&amp;nbsp;HNSWLib) for local development, prototyping, or custom solutions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For production or enterprise use, rely on databases with native vector support and robust security, such as Oracle AI Database.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions (FAQ)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What are vector embeddings and why do they matter?
&lt;/h3&gt;

&lt;p&gt;Vector embeddings are dense, high-dimensional numeric representations of objects like text, images, audio, or code. They place semantically&amp;nbsp;similar items&amp;nbsp;near each other in a continuous space, enabling tasks like semantic search, recommendations, RAG, deduplication, and anomaly detection. Compared with keyword or symbolic methods,&amp;nbsp;embeddings&amp;nbsp;better capture meaning, handle synonyms/paraphrases, and are robust across languages and modalities.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the main challenges in storing and querying embeddings at scale?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Volume and dimensionality: Billions of vectors, often 128–1024+ dimensions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performance: Fast indexing and low-latency search, efficient batch/stream ingestion&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Distributed ops: Sharding, replication, consistency, monitoring, and failover&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost: Compute, storage, and operational overhead&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Security: Encryption at rest/in transit, access control, auditing, data sanitization, and advanced cryptographic techniques for sensitive data&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Where should I store embeddings: a dedicated vector database or a database with vector support?
&lt;/h3&gt;

&lt;p&gt;Two common patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Specialized vector databases (e.g., Pinecone,&amp;nbsp;Weaviate, Milvus, Vespa) for high-scale, low-latency similarity search with ANN, SDKs, and REST APIs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SQL/NoSQL databases with vector support (e.g., Oracle AI Database, PostgreSQL with&amp;nbsp;pgvector, MongoDB, Redis) for blending vectors with structured data and enabling hybrid queries. Your choice should consider scale, integration with existing data, security, cost, and operational complexity.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What does Oracle AI Database provide for embeddings?
&lt;/h3&gt;

&lt;p&gt;Oracle AI Database offers native vector types and indexing, integrated similarity search in SQL, enterprise-grade security (encryption, granular access control, auditing), and high scalability. It supports hybrid analytical queries across structured, unstructured, and vector data. With Oracle Autonomous AI Database and libraries like&amp;nbsp;&lt;a href="https://docs.langchain.com/oss/python/integrations/vectorstores/oracle" rel="noopener noreferrer"&gt;langchain-oracledb&lt;/a&gt;, teams can generate, store, and query embeddings within one platform—avoiding data silos and extra operational overhead. Encrypt data, enforce access controls, and&amp;nbsp;monitor&amp;nbsp;usage to meet enterprise requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Storing and querying vector embeddings is a critical enabler for next-generation AI and data applications. By leveraging the right databases, libraries, and best practices, organizations and engineers can unlock new value from unstructured content, while maintaining performance, scalability, and security.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.langchain.com/oss/python/integrations/vectorstores/oracle" rel="noopener noreferrer"&gt;LangChain - Oracle AI Vector Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/oracle/langchain-oracle" rel="noopener noreferrer"&gt;GitHub - LangChain-Oracle&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.oracle.com/database/ai-vector-search/" rel="noopener noreferrer"&gt;Oracle AI Vector Search&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>oracle</category>
      <category>database</category>
      <category>ai</category>
      <category>vectorsearch</category>
    </item>
    <item>
      <title>Agent Memory: A Free Short Course on Building Memory-Aware Agents</title>
      <dc:creator>Wojtek Pluta</dc:creator>
      <pubDate>Thu, 16 Apr 2026 12:13:17 +0000</pubDate>
      <link>https://dev.to/oracledevs/agent-memory-a-free-short-course-on-building-memory-aware-agents-365k</link>
      <guid>https://dev.to/oracledevs/agent-memory-a-free-short-course-on-building-memory-aware-agents-365k</guid>
      <description>&lt;p&gt;Oracle and DeepLearning.AI have launched &lt;a href="https://www.deeplearning.ai/short-courses/agent-memory-building-memory-aware-agents/" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent Memory: Building Memory-Aware Agents&lt;/strong&gt;&lt;/a&gt;, a free short course on DeepLearning.AI that teaches developers how to architect memory systems that give agents persistence, continuity, and the ability to learn over time.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Memory turns a stateless LLM into an agent that learns over time. How to architect agentic memory is one of the most debated topics in AI right now. This course gives AI developers and engineers a comprehensive view of the most common memory patterns."&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Andrew Ng, Founder, DeepLearning.AI&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most agents forget. Each new session starts from zero, accumulated context from previous interactions is discarded, and the agent has no mechanism to learn from what it has already done. As a result, AI developers often rely on workarounds: cramming everything into the context window, reloading conversation logs, or bolting on ad-hoc retrieval.&lt;/p&gt;

&lt;p&gt;These approaches can work, but they don't provide a clear mental model for how information should live inside an agentic system boundary. This course treats memory as a first-class citizen in AI agents, and is built around that memory-first perspective.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"For the past few years, we have focused on prompt and context engineering to get the best results from a single LLM call. But engineering the right context for agents that need to work over days or weeks needs an effective memory system. This course takes that memory-first approach to building agents."&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Richmond Alake, AI Developer Experience Director, Oracle&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Beyond Prompt Engineering
&lt;/h2&gt;

&lt;p&gt;You’ve heard about prompt engineering. You've probably heard about context engineering. This course introduces the next layer: &lt;strong&gt;memory engineering&lt;/strong&gt;, treating long-term memory as first-class infrastructure that is external to the model, persistent, and structured.&lt;/p&gt;

&lt;p&gt;The course covers the full memory stack across five hands-on modules, built on LangChain, Tavily, and Oracle AI Database:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Why AI Agents Need Memory:&lt;/strong&gt; Explore failure modes of stateless agents and the memory-first architecture used throughout the course.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constructing the Memory Manager:&lt;/strong&gt; Design persistent memory stores across memory types, model memory data for efficient retrieval, and implement a manager that orchestrates read, write, and retrieval operations during agent execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling Agent Tool Use with Semantic Tool Memory:&lt;/strong&gt; Treat tools as procedural memory, index them in a vector store, and retrieve only contextually relevant tools at inference time using semantic search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Operations: Extraction, Consolidation, and Self-Updating Memory:&lt;/strong&gt; Build LLM-powered pipelines that extract structured facts from raw interactions, consolidate episodic memory into semantic memory, and implement write-back loops that let an agent autonomously update and resolve conflicts in its own knowledge base.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory-Aware Agent:&lt;/strong&gt; Assemble a stateful agent that initializes from long-term memory at startup, checkpoints intermediate reasoning states during execution, and persists learned context across sessions.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;"The patterns we cover here are not theoretical. AI developers and engineers will walk through real implementations: building memory stores, wiring up extraction pipelines, and handling contradictions in memory. You leave with working code you can adapt for your own production agents."&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Nacho Martinez, AI Developer Advocate, Oracle&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Oracle AI Database as the Agent Memory Core
&lt;/h2&gt;

&lt;p&gt;Oracle AI Database serves as the unified agent memory core throughout the course. Instead of treating a database as a passive store, the course demonstrates how Oracle AI Database functions as the active retrieval and persistence layer that makes each memory pattern work in production.&lt;/p&gt;

&lt;p&gt;Oracle AI Database brings key retrieval strategies into a single engine, including vector search for semantic similarity and unstructured knowledge retrieval, graph traversal for relationship-aware reasoning across connected entities, and relational queries for structured, transactional memory that demands precision and consistency. This helps reduce complexity by avoiding separate systems for different data types.&lt;/p&gt;

&lt;p&gt;The memory patterns taught in this course, such as semantic tool memory, self-updating memory, and memory consolidation, are the same patterns used to build production-grade agentic systems on Oracle AI Database. This course puts that architecture directly in the hands of AI developers and engineers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who This Course Is For
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agent Memory: Building Memory-Aware Agents&lt;/strong&gt; is designed for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI developers and engineers building or evaluating agentic systems who need production-grade memory architecture&lt;/li&gt;
&lt;li&gt;ML engineers integrating LLMs into multi-turn or multi-session workflows&lt;/li&gt;
&lt;li&gt;Developers working with LangChain, LangGraph, or Tavily who want durable, structured memory&lt;/li&gt;
&lt;li&gt;Technical leaders assessing Oracle AI Database for agent infrastructure at scale&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Availability
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agent Memory: Building Memory-Aware Agents&lt;/strong&gt; is available now on DeepLearning.AI. The course is free to access and requires no prior Oracle experience. Developers can &lt;a href="https://www.deeplearning.ai/short-courses/agent-memory-building-memory-aware-agents/" rel="noopener noreferrer"&gt;enroll in the course&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  About Oracle AI Database
&lt;/h2&gt;

&lt;p&gt;Oracle AI Database is a converged database platform built for AI workloads. It provides native vector search, graph traversal, relational retrieval, and the persistence infrastructure required for production agent memory systems in a single database engine. This removes the fragmented infrastructure that can become a bottleneck for AI innovation. Oracle AI Database is used by developers and enterprises as the unified memory core for AI agents to build and deploy intelligent, secure, memory-aware systems.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>oracle</category>
      <category>database</category>
      <category>agents</category>
    </item>
    <item>
      <title>A Practical Guide to Choosing the Right Memory Substrate for Your AI Agents</title>
      <dc:creator>Wojtek Pluta</dc:creator>
      <pubDate>Thu, 16 Apr 2026 12:11:25 +0000</pubDate>
      <link>https://dev.to/oracledevs/a-practical-guide-to-choosing-the-right-memory-substrate-for-your-ai-agents-33hj</link>
      <guid>https://dev.to/oracledevs/a-practical-guide-to-choosing-the-right-memory-substrate-for-your-ai-agents-33hj</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't conflate interface with substrate.&lt;/strong&gt; Filesystems win as an interface (LLMs already know how to use them); databases win as a substrate (concurrency, auditability, semantic search).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For prototypes, files are hard to beat.&lt;/strong&gt; Simple, transparent, debuggable—a folder of markdown gets you surprisingly far when iteration speed matters most.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared state demands a database.&lt;/strong&gt; Concurrent filesystem writes can silently corrupt data. If multiple agents or users touch the same memory, start with database guarantees.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic retrieval beats keyword search at scale.&lt;/strong&gt; Grep performance degrades on paraphrases and synonyms. Vector search finds content by meaning, this is critical once your knowledge base grows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid polyglot persistence.&lt;/strong&gt; Running separate systems for vectors, documents, and transactions means four failure modes. Oracle AI Database simplifies your memory architecture.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI developers are watching agent engineering evolve in real time, with leading teams openly sharing what works. One principle keeps showing up from the front lines: &lt;strong&gt;build within the LLM’s constraints&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In practice, two constraints dominate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LLMs are stateless across sessions&lt;/strong&gt; (no durable memory unless you bring it back in).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context windows are bounded&lt;/strong&gt; (and performance can degrade as you stuff more tokens in).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So “just add more context” isn’t a reliable strategy due to the quadratic cost of attention mechanisms and the degradation of reasoning capabilities as context fills up. The winning pattern is &lt;strong&gt;external memory + disciplined retrieval&lt;/strong&gt;: store state outside the prompt (artifacts, decisions, tool outputs), then pull back only what matters for the current loop.&lt;/p&gt;

&lt;p&gt;There’s also a useful upside: because models are trained on internet-era developer workflows, they’re unusually competent with &lt;strong&gt;developer-native interfaces&lt;/strong&gt;: repos, folders, markdown, logs, and CLI-style interactions. That’s why filesystems keep showing up in modern agent stacks.&lt;/p&gt;

&lt;p&gt;This is where the debate heats up: “files are all you need” for agent memory. Most arguments collapse because they treat &lt;strong&gt;interface&lt;/strong&gt;, &lt;strong&gt;storage&lt;/strong&gt;, and &lt;strong&gt;deployment&lt;/strong&gt; as the same decision. They aren’t.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filesystems are winning as an interface&lt;/strong&gt; because models already know how to list directories, grep for patterns, read ranges, and write artifacts. &lt;strong&gt;Databases are winning as a substrate&lt;/strong&gt; because once memory must be shared, audited, queried, and made reliable under concurrency, you either adopt database guarantees or painfully reinvent them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Ffs-db-FILEvsDB.drawio-4-scaled.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Ffs-db-FILEvsDB.drawio-4-scaled.png" alt="Filesystem interface versus database substrate for AI agent memory" width="800" height="755"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this piece, we give a systematic comparison of filesystems and databases for agent memory: where each approach shines, where it breaks down, and a decision framework for choosing the right foundation as you move from prototype to production.&lt;/p&gt;

&lt;p&gt;Our aim is to educate AI developers on various approaches to agent memory, backed by performance guidance and working code.&lt;/p&gt;

&lt;p&gt;All code presented in this article can be found &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/fs_vs_dbs.ipynb" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Agent Memory and Its Importance
&lt;/h2&gt;

&lt;p&gt;Let’s take the common use case of building a Research Assistant with Agentic capabilities.&lt;/p&gt;

&lt;p&gt;You build a Research Assistant agent that performs brilliantly in a demo; in the current execution, it can search arXiv, summarize papers, and draft a clean answer in a single run. Then you come back the next morning, start from a clean run, and then prompt the agent: &lt;em&gt;“Continue from where we left off, and also compare Paper A to Paper B.”&lt;/em&gt; The agent responds as if it has never met you because LLMs are inherently stateless. Unless you send prior context back in, the model has no durable awareness of what happened in previous turns or previous sessions.&lt;/p&gt;

&lt;p&gt;Once you move beyond single-turn Q&amp;amp;A into long-horizon tasks, deep research, multi-step workflows, and multi-agent coordination, you need a way to preserve continuity when the context window truncates, sessions restart, or multiple workers act on shared state. This takes us into the realm of leveraging systems of record for agents and introduces the concept of Agent Memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Stateless LLM Problem
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Ffs-db-2.drawio-7-scaled.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Ffs-db-2.drawio-7-scaled.png" title="Why your Research Assistant forgets everything between sessions?" alt="Why your Research Assistant forgets everything between sessions?" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Why your Research Assistant forgets everything between sessions?&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Agent Memory?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Agent memory is the set of system components and techniques that enable an AI agent to store, recall, and update information over time so it can adapt to new inputs and maintain continuity across long-horizon tasks.&lt;/strong&gt; Core components typically include the language and embedding model, information retrieval mechanisms, and a persistent storage layer such as a database.​​&lt;/p&gt;

&lt;h3&gt;
  
  
  Types of Agent Memory
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Ffs-db-Types-of-Agent-Memory.drawio-6-1024x764.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Ffs-db-Types-of-Agent-Memory.drawio-6-1024x764.png" title="Types of Agent Memory" alt="Types of Agent Memory" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Types of Agent Memory&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In practical systems, agent memory is usually classified into two distinct forms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Short-term memory (working memory):&lt;/strong&gt; whatever is currently inside the context window.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-term memory:&lt;/strong&gt; a persistent state that survives beyond a single call or session (facts, artifacts, plans, prior decisions, tool outputs).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Concepts and techniques associated with agent memory all come together within the agent loop and the agent harness, as demonstrated in this &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/fs_vs_dbs.ipynb" rel="noopener noreferrer"&gt;notebook&lt;/a&gt; and explained later in this article.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent Loop and Agent Harness
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The agent loop is the iterative execution cycle in which an LLM receives instructions from the environment and decides whether to generate a response or make a tool call based on its internal reasoning about the input provided in the current loop.&lt;/strong&gt; This process repeats until the LLM produces a final output or an exit criterion is met. At a high level, the following operations are present within the agent loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Assemble context&lt;/strong&gt; (user request + relevant memory + tool json schemas).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Call the model&lt;/strong&gt; (plan, decide next action).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Take actions&lt;/strong&gt; (tools, search, code execution, database queries).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observe results&lt;/strong&gt; (tool outputs, errors, intermediate artifacts).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update memory&lt;/strong&gt; (write transcripts, store artifacts, summarize, index).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeat&lt;/strong&gt; until the task completes or hands control back to the user.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Anthropic’s &lt;a href="https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents" rel="noopener noreferrer"&gt;guidance&lt;/a&gt; on long-running agents directly points to this: they describe harness practices that help agents quickly re-understand the state of work when starting with a fresh context window, including maintaining explicit progress artifacts.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;agent harness&lt;/strong&gt; is the surrounding runtime and rules that make the loop reliable: how you wire tools, where you write artifacts, how you log/trace behavior, how you manage memory, and how you prevent the agent from drowning in context.&lt;/p&gt;

&lt;p&gt;To complete the picture, the discipline of context engineering is heavily involved in the agent loop and aspect of the agent harness itself. &lt;strong&gt;Context engineering is the systematic design and curation of the content placed in an LLM’s context window so that the model receives high-signal tokens and produces the intended, reliable output within a fixed budget&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this piece, we implement context engineering as a set of repeatable techniques inside the agent harness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context retrieval and selection:&lt;/strong&gt; Pull only what is relevant (via grep for filesystem memory, via vector similarity and SQL filters for database memory).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progressive disclosure:&lt;/strong&gt; Start small (snippets, tails, line ranges) and expand only when needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context offloading:&lt;/strong&gt; Write large tool outputs and artifacts outside the prompt, then reload selectively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context reduction:&lt;/strong&gt; Summarize or compact information when you approach a degradation threshold, then store the summary in durable memory so you can rehydrate later.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The concepts and explanations above set us up for the rest of the comparison we introduce in this piece. Now that we have the “why” and the moving parts (stateless models, the agent loop, the agent harness, and memory), we can evaluate the two dominant substrates teams are using today to make memory real: the filesystem and the database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Filesystem-first Agentic Research Assistant
&lt;/h2&gt;

&lt;p&gt;A filesystem-based memory architecture is not “the agent remembers everything forever”. It is the agent that can persist state and artifacts outside the context window and then pull them back selectively when needed. This aligns with two of the earlier-mentioned LLM constraints: a limited context window and statelessness.&lt;/p&gt;

&lt;p&gt;In our Research Assistant, the filesystem becomes the memory substrate. Rather than injecting a large number of tools and extensive documentation into the LLM's context window (which would inflate the token count and trigger early summarization), we store them on disk and let the agent search and selectively read what it needs. This matches with what the Applied AI team at Cursor calls “&lt;a href="https://cursor.com/blog/dynamic-context-discovery" rel="noopener noreferrer"&gt;Dynamic Context Discovery&lt;/a&gt;”: write large output to files, then let the agent &lt;code&gt;tail&lt;/code&gt; and read ranges as required.&lt;/p&gt;

&lt;p&gt;Our FSAgent and &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/fs_vs_dbs.ipynb" rel="noopener noreferrer"&gt;demo&lt;/a&gt; is using valid filesystem-OS related operations (such as tail and cat to read the contents of files; but that this is a very "simplified" approach, with a limited number of operations for demonstration purposes, and the capabilities offered in the file system can be optimized (with other commands and implementations).&lt;/p&gt;

&lt;p&gt;On the other hand, it's a great start for people to get familiarized with tool access and how file system memory is achieved.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Fimage-10.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Fimage-10.png" alt="Filesystem-first agent memory architecture with semantic, episodic, and procedural memory layers" width="610" height="545"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Semantic memory (durable knowledge):&lt;/strong&gt; papers and reference docs saved as markdown.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Episodic memory (experience):&lt;/strong&gt; conversation transcripts + tool outputs per session/run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Procedural memory (how to work):&lt;/strong&gt; “rules” / instructions files (e.g., CLAUDE.md / AGENTS.md) that shape behavior across sessions.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  What does this look like in tooling?
&lt;/h3&gt;

&lt;p&gt;Before we jump into the code, here’s the minimal tool surface we provide to the agent in the table below. Notice the pattern: instead of inventing specialized “memory APIs,” we expose a small set of filesystem primitives and let the agent compose them (very Unix).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;arxiv_search_candidates(query, k=5)&lt;/td&gt;
&lt;td&gt;Searches arXiv and returns a JSON list of candidate papers with IDs, titles, authors, and abstracts.&lt;/td&gt;
&lt;td&gt;JSON string of paper candidates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fetch_and_save_paper(arxiv_id)&lt;/td&gt;
&lt;td&gt;Fetches full paper text (PDF → text) and saves to &lt;code&gt;semantic/knowledge_base/&amp;lt;id&amp;gt;.md&lt;/code&gt;. Avoids routing full content through the LLM.&lt;/td&gt;
&lt;td&gt;File path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;read_file(path)&lt;/td&gt;
&lt;td&gt;Reads a file from disk and returns its contents in full (use sparingly).&lt;/td&gt;
&lt;td&gt;Full file contents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tail_file(path, n_lines=80)&lt;/td&gt;
&lt;td&gt;Reads the last N lines of a file (first step for large files).&lt;/td&gt;
&lt;td&gt;Last N lines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;read_file_range(path, start_line, end_line)&lt;/td&gt;
&lt;td&gt;Reads a line range to “zoom in” without loading everything.&lt;/td&gt;
&lt;td&gt;Selected line range&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;grep_files(pattern, root_dir, file_glob)&lt;/td&gt;
&lt;td&gt;Grep-like search across files to find relevant passages quickly.&lt;/td&gt;
&lt;td&gt;Matches with file path + line number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;list_papers()&lt;/td&gt;
&lt;td&gt;Lists all locally saved papers in &lt;code&gt;semantic/knowledge_base/&lt;/code&gt;.&lt;/td&gt;
&lt;td&gt;List of filenames&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;conversation_to_file(run_id, messages)&lt;/td&gt;
&lt;td&gt;Appends conversation entries to one transcript file per run in &lt;code&gt;episodic/conversations/&lt;/code&gt;.&lt;/td&gt;
&lt;td&gt;File path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;summarise_conversation_to_file(run_id, messages)&lt;/td&gt;
&lt;td&gt;Saves full transcript, then writes a compact summary to &lt;code&gt;episodic/summaries/&lt;/code&gt;.&lt;/td&gt;
&lt;td&gt;Dict with transcript + summary paths&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;monitor_context_window(messages)&lt;/td&gt;
&lt;td&gt;Estimates current context usage (tokens used/remaining).&lt;/td&gt;
&lt;td&gt;Dict with token stats&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This design directly reflects what the AI ecosystem is converging on: a filesystem and a handful of core tools, rather than an explosion of bespoke tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Progressive reading (read, tail, range)
&lt;/h3&gt;

&lt;p&gt;The first memory principle implementation is simple: &lt;strong&gt;don’t load large files unless you must&lt;/strong&gt;. Filesystems are excellent at sequential read/write and work naturally with tools like &lt;code&gt;grep&lt;/code&gt; and log-style access. This makes them a strong fit for append-only transcript and artifact storage.&lt;/p&gt;

&lt;p&gt;That’s why we implement three reading tools:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read everything (rare),&lt;/li&gt;
&lt;li&gt;Read the end (common for logs/transcripts)&lt;/li&gt;
&lt;li&gt;Read a slice (common for zooming into a match)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The tools below were implemented in Python and converted into objects callable by a langchain agent using the &lt;code&gt;@tool&lt;/code&gt; decorator from the langchain agent module.&lt;/p&gt;

&lt;p&gt;First is the &lt;code&gt;read_file&lt;/code&gt; tool, the “load it all” option. This tool is useful when the file is small, or you truly need the full artifact, but it’s intentionally not the default because it can expand the context window.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File not found: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;tail_file&lt;/code&gt; function is the first step for large files. It grabs the end of a log/transcript to quickly see the latest or most relevant portion before deciding whether to read more.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tail_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File not found: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_lines&lt;/span&gt;&lt;span class="p"&gt;):])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;read_file_range&lt;/code&gt; function is seen as the surgical tool that is used once you’ve located the right region (often via &lt;code&gt;grep&lt;/code&gt; or after a &lt;code&gt;tail&lt;/code&gt;), pulls in just the exact line span you need, so the agent stays token-efficient and grounded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_file_range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File not found: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
 &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;end_line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Empty range: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;start_line&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;end_line&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (file has &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; lines)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, this is essentially dynamic context discovery in a microcosm: load a small view first, then expand only when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grep-style search (find first, read second)
&lt;/h3&gt;

&lt;p&gt;A filesystem-based agent should quickly find relevant material and pull only the exact slices it needs. This is why &lt;code&gt;grep&lt;/code&gt; is such a recurring theme in the agent tooling conversation: it gives the model a fast way to locate relevant regions before spending tokens to pull content.&lt;/p&gt;

&lt;p&gt;Here’s a simple grep-like tool that returns line-numbered hits so the agent can immediately jump to &lt;code&gt;read_file_range&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;grep_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;root_dir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;file_glob&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**/*.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;max_matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;ignore_case&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="n"&gt;root&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Directory not found: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;root_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;flags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ignore_case&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="n"&gt;rx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid regex pattern: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_glob&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_file&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;span class="k"&gt;continue&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ignore&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_posix&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;max_matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;[TRUNCATED: max_matches reached]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="k"&gt;continue&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No matches found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One subtle but important detail in our grep_files implementation is how we read files. Rather than loading entire files into memory with &lt;code&gt;read_text().splitlines()&lt;/code&gt;, we iterate lazily with for line in open(fp), which streams one line at a time and keeps memory usage constant regardless of file size.&lt;/p&gt;

&lt;p&gt;This aligns with the "find first, read second" philosophy: locate what you need without loading everything upfront. For readers interested in maximum performance, the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/fs_vs_dbs.ipynb" rel="noopener noreferrer"&gt;full notebook&lt;/a&gt; also includes a grep_files_os_based variant that shells out to ripgrep or grep, leveraging OS-level optimizations like memory-mapped I/O and SIMD instructions. In practice, this pattern (“search first, then read a range”) is one reason filesystem agents can feel surprisingly strong on focused corpora: the agent iteratively narrows the context instead of relying on a single-shot retrieval query.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tool outputs as files: keeping big JSON out of the prompt
&lt;/h3&gt;

&lt;p&gt;One of the fastest ways to blow up your context window is to return large JSON payloads from tools. &lt;a href="https://cursor.com/blog/dynamic-context-discovery" rel="noopener noreferrer"&gt;Cursor’s approach&lt;/a&gt; is to write these results to files and let the agent inspect them on demand (often starting with tail).&lt;/p&gt;

&lt;p&gt;That’s exactly why our folder structure includes a &lt;code&gt;tool_outputs/&amp;lt;session_id&amp;gt;/&lt;/code&gt; directory: it acts like an “evidence locker” for everything the agent did, without forcing those payloads into the current context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"ts_utc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-27T12:41:12.135396+00:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arxiv_search_candidates"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{'query': 'memgpt'}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"content='[&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;n {&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;n &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;arxiv_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;2310.08560v2&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;n &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;entry_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;http://arxiv.org/abs/2310.08560v2&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;n &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;title&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;MemGPT: Towards LLMs as Operating Systems&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;n &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;authors&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;n &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;published&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;2024-02-12&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;n &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;abstract&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: ...msPnaMxOl8Pa'"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Putting it together: the agent toolset
&lt;/h3&gt;

&lt;p&gt;Before we create the agent, we bundle the tools into a small, composable toolbox. This matches the broader trend: agents often perform better with a smaller tool surface, less choice paralysis (aka context confusion), fewer weird and overlapping tool schemas, and more reliance on proven filesystem workflows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;FS_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
 &lt;span class="n"&gt;arxiv_search_candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# search arXiv for relevant research papers
&lt;/span&gt; &lt;span class="n"&gt;fetch_and_save_paper&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# fetch paper text (PDF-&amp;gt;text) and save to semantic/knowledge_base/&amp;lt;id&amp;gt;.md
&lt;/span&gt; &lt;span class="n"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# read a file in full (use sparingly)
&lt;/span&gt; &lt;span class="n"&gt;tail_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# read end of file first
&lt;/span&gt; &lt;span class="n"&gt;read_file_range&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# read a specific line range
&lt;/span&gt; &lt;span class="n"&gt;conversation_to_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# append conversation entries to episodic memory
&lt;/span&gt; &lt;span class="n"&gt;summarise_conversation_to_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# save transcript + compact summary
&lt;/span&gt; &lt;span class="n"&gt;monitor_context_window&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# estimate token usage
&lt;/span&gt; &lt;span class="n"&gt;list_papers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# list saved papers
&lt;/span&gt; &lt;span class="n"&gt;grep_files&lt;/span&gt; &lt;span class="c1"&gt;# grep-like search over files
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The “filesystem-first” system prompt: policy beats cleverness
&lt;/h3&gt;

&lt;p&gt;Filesystem tools alone aren’t enough, you also need &lt;strong&gt;a reading policy&lt;/strong&gt; that keeps the agent's token usage efficient and grounded. This is the same reason &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;AGENTS.md&lt;/code&gt;, and &lt;code&gt;SKILLS.md&lt;/code&gt; matter: they’re procedural memory that is applied consistently across sessions.&lt;/p&gt;

&lt;p&gt;Key policies we encode below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store big artifacts on disk (papers, tool outputs, transcripts).&lt;/li&gt;
&lt;li&gt;Prefer grep + range reads over full reads.&lt;/li&gt;
&lt;li&gt;Use tail first for large files and logs.&lt;/li&gt;
&lt;li&gt;Be explicit about what you actually read (grounding).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below is the implementation of an agent using the langchain framework.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;fs_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OPENAI_MODEL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FS_TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a conversational research ingestion agent.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Core behavior:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- When asked to find a paper: use arxiv_search_candidates, pick the best arxiv_id, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;then call fetch_and_save_paper to store the full text in semantic/knowledge_base/.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Papers/knowledge base live in semantic/knowledge_base/.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Conversations (transcripts) live in episodic/conversations/ (one file per run).&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Summaries live in episodic/summaries/.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Conversation may be summarised externally; respect summary + transcript references.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What the memory footprint looks like on disk
&lt;/h3&gt;

&lt;p&gt;After running the agent, you end up with a directory layout that makes the agent’s “memory” tangible and inspectable. In your example, the agent produces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;episodic/conversations/fsagent_session_0010.md&lt;/code&gt; — the session transcript (episodic memory)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;episodic/tool_outputs/fsagent_session_0010/*.json&lt;/code&gt; — tool results saved as files (evidence + replay)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;semantic/knowledge_base/*.md&lt;/code&gt; — saved papers (semantic memory)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is &lt;em&gt;exactly&lt;/em&gt; the point of filesystem-first memory: the model doesn’t “remember” by magically retaining state; it “remembers” because it can re-open, search, and selectively read its prior artifacts.&lt;/p&gt;

&lt;p&gt;This is also why so many teams keep rediscovering the same pattern: files are a simple abstraction, and agents are surprisingly good at using them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advantages of File Systems In AI Agents
&lt;/h2&gt;

&lt;p&gt;In the previous section, we showed what a filesystem‑first memory harness looks like in practice: the agent writes durable artifacts (papers, tool outputs, transcripts) to disk, then “remembers” by searching and selectively reading only the parts it needs.&lt;/p&gt;

&lt;p&gt;This approach works because it directly addresses two core constraints of LLMs: limited context windows and inherent statelessness. Once those constraints are handled, it becomes clear why file systems so often become the default interface for early agent systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pretraining‑native interface:&lt;/strong&gt; LLMs have ingested massive amounts of repos, docs, logs, and README‑driven workflows, so folders and files are a familiar operating surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple primitives, strong composition:&lt;/strong&gt; A small action set (list/read/write/search) composes into sophisticated behavior without needing schemas, migrations, or query planning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token efficiency via progressive disclosure:&lt;/strong&gt; Retrieve via search, then load a small slice (snippets, line ranges) instead of dumping entire documents into the prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural home for artifacts and evidence:&lt;/strong&gt; Transcripts, intermediate results, cached documents, and tool outputs fit cleanly as files and remain human‑inspectable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debuggable by default:&lt;/strong&gt; You can open the directory and see exactly what the agent saved, what tools returned, and what the agent could have referenced.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portability:&lt;/strong&gt; A folder is easy to copy, zip, diff, version, and replay elsewhere, great for demos, reproducibility, and handoffs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low operational overhead:&lt;/strong&gt; For PoCs and MVPs, you get persistence and structure without provisioning extra infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, filesystem memory excels when the workload is artifact‑heavy (research notes, paper dumps, transcripts), when you want a clear audit trail, and when iteration speed matters more than sophisticated retrieval. It also encourages good agent hygiene: write outputs down, cite sources, and load only what you need.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disadvantages of Filesystems In AI Agents
&lt;/h2&gt;

&lt;p&gt;But, unfortunately, it doesn’t end there. The same strengths that make files attractive, simplicity, relatively low cost, and fast implementation, can quickly become bottlenecks once you promote these systems into production, where they are expected to behave like a shared, reliable memory platform.&lt;/p&gt;

&lt;p&gt;As soon as an agent moves beyond single-user prototypes into real-world scenarios, where concurrent reads and writes are the norm and robustness under load is non-negotiable, filesystems start to show their limits.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Weak concurrency guarantees by default:&lt;/strong&gt; Multiple processes can overwrite or interleave writes unless you implement locking correctly. Even then, locking semantics vary across platforms and network filesystems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No ACID transactions:&lt;/strong&gt; You don’t get atomic multi-step updates, isolation between writers, or durable commit semantics without building them. Partial writes and mid-operation failures can leave memory in inconsistent states.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search quality is usually brittle:&lt;/strong&gt; Keyword/grep-style retrieval misses meaning, synonyms, and paraphrases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling becomes “death by a thousand files”:&lt;/strong&gt; Directory bloat, fragmented artifacts, and expensive scans make performance degrade as memory grows, especially if you rely on repeated full-folder searches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexing is DIY:&lt;/strong&gt; The moment you want fast retrieval, deduplication, ranking, or recency weighting, you end up maintaining your own indexes and metadata stores (which, being honest here…is basically a database).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata and schema drift:&lt;/strong&gt; Agents inevitably accumulate extra fields (source URLs, timestamps, embeddings, tags). Keeping those consistent across files is harder than enforcing constraints in tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Poor multi-user / multi-agent coordination:&lt;/strong&gt; Shared memory across agents means shared state. Without a central coordinator, you’ll hit race conditions, inconsistent views, and an unclear “source of truth.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Harder auditing at scale:&lt;/strong&gt; Files are human-readable, but reconstructing “what happened” across many runs and threads becomes messy without structured logs, timestamps, and queryable history.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security and access control are coarse:&lt;/strong&gt; Permissions are filesystem-level, not row-level. It’s hard to enforce “agent A can read X but not Y” without duplicating data or adding an auth layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core pattern is that filesystem memory stays attractive until you need correctness under concurrency, semantic retrieval, or structured guarantees. At that point, you either accept the limitations (and keep the agent single-user/single-process) or you adopt a database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Database For Agent Memory
&lt;/h2&gt;

&lt;p&gt;By this point, most AI developers can see why filesystem first agent implementations are having a moment. It is a familiar interface, easy to prototype with, and our agents can “remember” by writing artifacts to disk and reloading them later via search plus selective reads. For a single developer on a laptop, that is often enough. But once we move beyond “it works on my laptop” and start supporting developers who ship to thousands or millions of users, memory stops being a folder of helpful files and becomes a shared system that has to behave predictably under load.&lt;/p&gt;

&lt;p&gt;Databases were created for the exact moment when “a pile of files” stops being good enough because too many people and processes are touching the same data. One of the &lt;a href="https://www.ibm.com/docs/en/zos-basic-skills?topic=now-history-ims-beginnings-nasa" rel="noopener noreferrer"&gt;most-cited&lt;/a&gt; origin stories of the database dates to the Apollo era. IBM, alongside partners, built what became IMS to manage complex operational data for the program, and early versions were installed in 1968 at the Rockwell Space Division, supporting NASA. The point was not simply storage. It was coordination, correctness, and the ability to trust shared data while many activities were happening simultaneously.&lt;/p&gt;

&lt;p&gt;That same production reality is what pushes agent memory toward databases today.&lt;/p&gt;

&lt;p&gt;When agent memory must handle concurrent reads and writes, preserve an auditable history of what happened, support fast retrieval across many sessions, and enforce consistent updates, we want database guarantees rather than best-effort file conventions.&lt;/p&gt;

&lt;p&gt;Oracle has been solving these exact problems since 1979, when we shipped the first commercial SQL database. The goal then was the same as now: make shared state reliable, portable, and trustworthy under load.&lt;/p&gt;

&lt;p&gt;On that note, allow us to show how this can work in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Database-first Research Assistant
&lt;/h2&gt;

&lt;p&gt;In the filesystem first section, our Research Assistant “remembered” by writing artifacts to disk and reloading them later using cheap search plus selective reads. That is a great starting point. But when we want memory that is shared, queryable, and reliable under concurrent use, we need a different foundation.&lt;/p&gt;

&lt;p&gt;In this iteration of our agent, we keep the same user experience and the same high-level job. Search arXiv, ingest papers, answer follow-up questions, and maintain continuity across sessions. The difference is that memory now lives in the Oracle AI Database, where we can make it durable, indexed, filterable, and safe for concurrent reads and writes. We also achieve a clean separation between two memory surfaces: structured history in SQL tables and semantic recall via vector search.&lt;/p&gt;

&lt;p&gt;The result is what we call a MemAgent, an agent whose memory is not a folder of artifacts, but a queryable system. It is designed to support multi-threaded sessions, store full conversational history, store tool logs for debugging and auditing, and store a semantic knowledge base that can be searched by meaning rather than keywords.&lt;/p&gt;

&lt;h3&gt;
  
  
  Available tools for MemAgent
&lt;/h3&gt;

&lt;p&gt;Before we wire up the agent loop, we need to define the tool surface that MemAgent can use to reason, retrieve, and persist knowledge. The design goal here is similar to the filesystem-first approach: keep the toolset small and composable, but shift the memory substrate from files to the database. Instead of grepping folders and reading line ranges, MemAgent uses vector similarity search to retrieve semantically relevant context, and it persists what it learns in a way that is queryable and reliable across sessions.&lt;/p&gt;

&lt;p&gt;In practice, that means two things.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First, ingestion tools do not just “fetch” content; they also chunk and embed it so it becomes searchable later.&lt;/li&gt;
&lt;li&gt;Second, retrieval tools are meaning-based rather than keyword-based, so the agent can find relevant passages even when the user paraphrases, uses synonyms, or asks higher-level conceptual questions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The table below summarizes the minimal set of tools we expose to MemAgent and where each tool stores its outputs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;arxiv_search_candidates(query, k)&lt;/td&gt;
&lt;td&gt;Searches arXiv for candidate papers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fetch_and_save_paper_to_kb_db(arxiv_id)&lt;/td&gt;
&lt;td&gt;Fetches paper, chunks text, stores embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;search_knowledge_base(query, k)&lt;/td&gt;
&lt;td&gt;Semantic search over stored papers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;store_to_knowledge_base(text, metadata)&lt;/td&gt;
&lt;td&gt;Manually store text with metadata&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;FSAgent and MemAgent can look similar from the outside because both can ingest papers, answer questions, and maintain continuity. The difference is what powers that continuity and how retrieval works when the system grows.&lt;/p&gt;

&lt;p&gt;FSAgent relies on the operating system as its memory surface, which is great for iteration speed and human inspectability, but it typically relies on keyword-style discovery and file traversal. MemAgent treats memory as a database concern, which adds setup overhead, but unlocks indexed retrieval, stronger guarantees under concurrency, and richer ways to query and filter what the agent has learned.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;FSAgent (Filesystem)&lt;/th&gt;
&lt;th&gt;MemAgent (Database)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Search&lt;/td&gt;
&lt;td&gt;Keyword and grep&lt;/td&gt;
&lt;td&gt;Semantic similarity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistence&lt;/td&gt;
&lt;td&gt;Markdown files&lt;/td&gt;
&lt;td&gt;SQL tables + vector indexes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scalability&lt;/td&gt;
&lt;td&gt;Directory traversal&lt;/td&gt;
&lt;td&gt;Indexed queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query Language&lt;/td&gt;
&lt;td&gt;Paths and regex&lt;/td&gt;
&lt;td&gt;SQL + vector similarity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup Complexity&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;Requires database runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Creating data stores with LangChain and Oracle AI Database
&lt;/h3&gt;

&lt;p&gt;Before we start defining tables and vector stores, it is worth being explicit about the stack we are using and why. In this implementation, we are not building a bespoke agent framework from scratch.&lt;/p&gt;

&lt;p&gt;We use LangChain as the LLM framework to abstract the agent loop, tool calling, and message handling, then pair it with a model provider for reasoning and generation, and with Oracle AI Database as the unified memory core that stores both structured history and semantic embeddings.&lt;/p&gt;

&lt;p&gt;This separation is important because it mirrors how production agent systems are typically built. The agent logic evolves quickly, the model can be swapped, and the memory layer must remain reliable and queryable.&lt;/p&gt;

&lt;p&gt;Think of this as the agent stack. Each layer has a clear job, and together they create an agent that is both practical to build and robust enough to scale.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model provider (OpenAI):&lt;/strong&gt; generates reasoning, responses, and tool decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM framework (LangChain):&lt;/strong&gt; provides the agent abstraction, tool wiring, and runtime orchestration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified memory core (Oracle AI Database):&lt;/strong&gt; stores durable conversational memory in SQL and semantic memory in vector indexes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With that stack in place, the first step is simply to connect to the Oracle Database and initialize an embedding model. The database connection serves as the foundation for all memory operations, and the embedding model enables us to store and retrieve knowledge semantically through the vector store layer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;connect_oracle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1:1521/FREEPDB1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langchain_oracledb_demo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;oracledb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;database_connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;connect_oracle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VECTOR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VectorPwd_2025&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;dsn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1:1521/FREEPDB1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;devrel.content.filesystem_vs_dbs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Using user:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;database_connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence-transformers/paraphrase-mpnet-base-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we define the database schema to store our agent’s memory and prepare a clean slate for the demo. We separate memory into distinct tables so each type can be managed, indexed, and queried appropriately.&lt;/p&gt;

&lt;p&gt;Installing the Oracle Database integration in the LangChain ecosystem is straightforward. You can add it to your environment with a single pip command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install -U langchain-oracledb&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Conversational history and logs are naturally tabular, while semantic and summary memory are stored in vector-backed tables through &lt;a href="https://docs.langchain.com/oss/python/integrations/vectorstores/oracle" rel="noopener noreferrer"&gt;OracleVS&lt;/a&gt;. For reproducibility, we drop any existing tables from previous runs, making the notebook deterministic and avoiding confusing results when you re-run the walkthrough.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_oracledb.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OracleVS&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_oracledb.vectorstores.oraclevs&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_index&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DistanceStrategy&lt;/span&gt;

&lt;span class="n"&gt;CONVERSATIONAL_TABLE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONVERSATIONAL_MEMORY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;KNOWLEDGE_BASE_TABLE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEMANTIC_MEMORY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;LOGS_TABLE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LOGS_MEMORY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;SUMMARY_TABLE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUMMARY_MEMORY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;ALL_TABLES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
 &lt;span class="n"&gt;CONVERSATIONAL_TABLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;KNOWLEDGE_BASE_TABLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;LOGS_TABLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;SUMMARY_TABLE&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ALL_TABLES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;database_connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DROP TABLE &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; PURGE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ORA-00942&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
 &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (not exists)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; [FAIL] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;database_connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create the vector stores and HNSW indexes
&lt;/h3&gt;

&lt;p&gt;For this section, it is worth explaining what a “vector store” actually is in the context of agents. A vector store is a storage system that persists embeddings alongside metadata and supports similarity search, so the agent can retrieve items by meaning rather than keywords.&lt;/p&gt;

&lt;p&gt;Instead of asking “which file contains this exact phrase”, the agent asks “which chunks are semantically closest to my question” and pulls back the best matches.&lt;/p&gt;

&lt;p&gt;Under the hood, that usually means an approximate nearest neighbor index, because scanning every vector becomes prohibitively expensive as your knowledge base grows. HNSW is one of the most common indexing approaches for this style of retrieval.&lt;/p&gt;

&lt;p&gt;In the code below, we create two vector stores using the langchain_oracledb module OracleVS, one for the knowledge base and one for summaries, both using cosine distance.&lt;/p&gt;

&lt;p&gt;Second, it builds HNSW indexes so similarity search stays fast as memory grows, which is exactly what you want once your Research Assistant starts ingesting many papers and running over long-lived threads.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;knowledge_base_vs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OracleVS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;database_connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;embedding_function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;KNOWLEDGE_BASE_TABLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;distance_strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DistanceStrategy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;summary_vs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OracleVS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;database_connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;embedding_function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SUMMARY_TABLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;distance_strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DistanceStrategy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;safe_create_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
 &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idx_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;idx_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idx_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HNSW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; Created index: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;idx_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ORA-00955&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
 &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; [SKIP] Index already exists: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;idx_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="k"&gt;raise&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Creating vector indexes...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;safe_create_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;database_connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;knowledge_base_vs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kb_hnsw_cosine_idx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;safe_create_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;database_connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary_vs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary_hnsw_cosine_idx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;All indexes created!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Memory Manager
&lt;/h3&gt;

&lt;p&gt;In the code below, we create a custom Memory manager. The Memory manager is the abstraction layer that turns raw database operations into “agent memory behaviours”. This is the part that makes the database-first agent easy to reason about.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL methods store and load conversational history by &lt;code&gt;thread_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Vector methods store and retrieve semantic memory by similarity search&lt;/li&gt;
&lt;li&gt;Summary methods store compressed context and let us rotate the working set when we approach context limits
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
 A simplified memory manager for AI agents using Oracle AI Database.
 &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

 &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conversation_table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;knowledge_base_vs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary_vs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_log_table&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
 &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;
 &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conversation_table&lt;/span&gt;
 &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;knowledge_base_vs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;knowledge_base_vs&lt;/span&gt;
 &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary_vs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;summary_vs&lt;/span&gt;
 &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_log_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_log_table&lt;/span&gt;

 &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_conversational_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;thread_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;id_var&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
 INSERT INTO &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_table&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (thread_id, role, content, metadata, timestamp)
 VALUES (:thread_id, :role, :content, :metadata, CURRENT_TIMESTAMP)
 RETURNING id INTO :id
 &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;id_var&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
 &lt;span class="n"&gt;record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;id_var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getvalue&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;id_var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getvalue&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
 &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;record_id&lt;/span&gt;

 &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_conversational_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
 &lt;span class="n"&gt;thread_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
 SELECT role, content FROM &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_table&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
 WHERE thread_id = :thread_id AND summary_id IS NULL
 ORDER BY timestamp ASC
 FETCH FIRST :limit ROWS ONLY
 &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
 &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;read&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

 &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mark_as_summarized&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
 &lt;span class="n"&gt;thread_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
 UPDATE &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_table&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
 SET summary_id = :summary_id
 WHERE thread_id = :thread_id AND summary_id IS NULL
 &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
 &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
 &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; Marked messages as summarized (summary_id: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;summary_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata_json&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
 &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata_json&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;knowledge_base_vs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_texts&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

 &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;knowledge_base_vs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;## Knowledge Base Memory: This are general information that is relevant to the question
### How to use: Use the knowledge base as background information that can help answer the question

&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

 &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;full_content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
 &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary_vs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;summary_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;full_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;full_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;summary_id&lt;/span&gt;

 &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_summary_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary_vs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;summary_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summary &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;summary_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;No summary content.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_summary_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary_vs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;## Summary Memory&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;No summaries available.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

 &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;## Summary Memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use expand_summary(id) to get full content:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
 &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;sid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;desc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;No description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; - [ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we instantiate it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memory_manager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;database_connection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;conversation_table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CONVERSATION_HISTORY_TABLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;knowledge_base_vs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;knowledge_base_vs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;tool_log_table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOOL_LOG_TABLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;summary_vs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;summary_vs&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating the tools and agent
&lt;/h3&gt;

&lt;p&gt;The database-first agent follows a simple, production-friendly pattern.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Persists every conversation turn as structured rows, including user and assistant messages with thread or run IDs and timestamps, so sessions are recoverable, traceable, and consistent across restarts.&lt;/li&gt;
&lt;li&gt;Persists long-term knowledge in a vector-enabled store by chunking documents, generating embeddings, and storing them with metadata, so retrieval is semantic, ranked, and fast as the corpus grows.&lt;/li&gt;
&lt;li&gt;Persists tool activity as first-class records that capture the tool name, inputs, outputs, status, errors, and key metadata, so agent behavior is inspectable, reproducible, and auditable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On top of that, the agent actively manages context: it tracks token usage and periodically rolls older dialogue and intermediate state into durable summaries (and/or “memory” tables), so the working prompt stays small while the full history remains available on demand.&lt;/p&gt;

&lt;h4&gt;
  
  
  Ingest papers into the knowledge base vector store
&lt;/h4&gt;

&lt;p&gt;This is the database-first equivalent of “fetch and save paper”. Instead of writing markdown files, we do three steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load paper text from arXiv&lt;/li&gt;
&lt;li&gt;Chunk it to respect the embedding model limits&lt;/li&gt;
&lt;li&gt;Store chunks with metadata in the vector store, which gives us fast semantic search later
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ArxivLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_text_splitters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_and_save_paper_to_kb_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;arxiv_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ArxivLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arxiv_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;load_max_docs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;doc_content_chars_max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No documents found for arXiv id: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;arxiv_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

 &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

 &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arXiv &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;arxiv_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="n"&gt;entry_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Entry ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entry_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
 &lt;span class="n"&gt;published&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Published&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;published&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
 &lt;span class="n"&gt;authors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

 &lt;span class="n"&gt;full_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;full_text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loaded arXiv &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;arxiv_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; but extracted empty text (PDF parsing issue).&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

 &lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="n"&gt;ts_utc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
 &lt;span class="n"&gt;metadatas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
 &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
 &lt;span class="n"&gt;metadatas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arxiv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arxiv_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;arxiv_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entry_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;entry_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;published&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;published&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;authors&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_chunks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ingested_ts_utc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ts_utc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="p"&gt;}&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="n"&gt;knowledge_base_vs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadatas&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Saved arXiv &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;arxiv_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;KNOWLEDGE_BASE_TABLE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; chunks (title: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;).&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
 &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We create two more tools below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;search_knowledge_base(query, k=5):&lt;/strong&gt; Runs a semantic similarity search over the database-backed knowledge base and returns the top &lt;em&gt;k&lt;/em&gt; most relevant chunks, so the agent can retrieve context by meaning rather than exact keywords.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;store_to_knowledge_base(text, metadata_json="{}"):&lt;/strong&gt; Stores a new piece of text into the knowledge base and attaches metadata (as JSON), which gets embedded and indexed so it becomes searchable in future queries.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;store_to_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata_json&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata_json&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Successfully stored text to knowledge base.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we build the LangChain agent using the database-first tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_agent&lt;/span&gt;

&lt;span class="n"&gt;MEM_AGENT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OPENAI_MODEL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store_to_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arxiv_search_candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fetch_and_save_paper_to_kb_db&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Result Comparison: FSAgent vs MemAgent: End-to-End Benchmark (Latency + Quality)
&lt;/h2&gt;

&lt;p&gt;At this point, the difference between a filesystem agent and a database-backed agent should feel less like a philosophical debate and more like an engineering trade-off. Both approaches can “remember” in the sense that they can persist state, retrieve context, and answer follow-up questions. The real test is what happens when you leave the tidy laptop demo and hit production realities: &lt;strong&gt;larger corpora, fuzzier queries, and concurrent workloads&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To make that concrete, we ran an end-to-end benchmark and measured the full agent loop per query—retrieval, context assembly, tool calls, model invocations, and the final answer—across three scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Small-corpus retrieval:&lt;/strong&gt; a tight, keyword-friendly dataset to validate baseline retrieval and answer synthesis with minimal context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large-corpus retrieval:&lt;/strong&gt; a larger dataset with more paraphrase variability to stress retrieval quality and context efficiency at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrent write integrity:&lt;/strong&gt; a multi-worker stress test to evaluate correctness under simultaneous reads/writes (integrity, race conditions, throughput).&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  FSAgent vs MemAgent: End-to-End Benchmark (Latency + Quality)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Fimage-7-1024x703.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Fimage-7-1024x703.png" alt="Benchmark chart comparing FSAgent and MemAgent on end-to-end latency and answer quality" width="800" height="549"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the result shown in the image above, two conclusions immediately stand out: &lt;strong&gt;latency&lt;/strong&gt; and &lt;strong&gt;answer quality&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In our run, MemAgent generally finished faster end-to-end than FSAgent. That might sound counterintuitive if you assume “database equals overhead,” and sometimes it does.&lt;/p&gt;

&lt;p&gt;But the agent loop is not dominated by raw storage primitives. It is dominated by how quickly you can find the right information and how little unnecessary context you force into the model, also known as context engineering. Semantic retrieval tends to return fewer, more relevant chunks (subject to tuning of the retrieval pipelines), which means less scanning, less paging through files, and fewer tokens burned on irrelevant text.&lt;/p&gt;

&lt;p&gt;In this particular run, both agents produced similar-quality answers. That is not surprising. When the questions are retrieval-friendly and the corpus is small enough, both approaches can find the right passages. FSAgent gets there through keyword search and careful reading. MemAgent gets there through similarity search over embedded chunks. Different roads, similar destination.&lt;/p&gt;

&lt;p&gt;And I think it’s worth zooming in on one nuance here. When the information to traverse is minimal in terms of character length and the query is keyword-friendly, the retrieval quality of both agents tends to converge. At that scale, “search” is barely a problem, so the dominant factor becomes the model’s ability to read and synthesise, not the retrieval substrate. The gap only starts to widen when the corpus grows, the wording becomes fuzzier, and the system must retrieve reliably under real-world constraints such as noise, paraphrases, and concurrency. Which it eventually does.&lt;/p&gt;

&lt;h3&gt;
  
  
  About the “LLM-as-a-Judge” metric
&lt;/h3&gt;

&lt;p&gt;We also scored answers using an LLM-as-a-judge prompt. It is a pragmatic way to get directional feedback when you do not have labeled ground truth, but it is not a silver bullet. Judges can be sensitive to prompt phrasing, can over-reward fluency, and can miss subtle grounding failures.&lt;/p&gt;

&lt;p&gt;If you are building this for production, treat LLM judging as a starting signal, not the finish line. The more reliable approach is a mix of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reference-based evaluation&lt;/strong&gt; when you have ground truth, such as rubric grading, exact match, or F1-style scoring.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval-aware evaluation&lt;/strong&gt; when context matters, such as context precision and recall, answer faithfulness, and groundedness. &lt;strong&gt;Tracing plus evaluation tooling&lt;/strong&gt; so you can connect failures to the specific retrievals, tool calls, and context assembly decisions that caused them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even with a lightweight judge, the directional story remains consistent. As retrieval becomes more difficult and the system becomes busier, database-backed memory tends to perform better.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large Corpus Benchmark: Why the gap widens as data grows
&lt;/h3&gt;

&lt;p&gt;The large-corpus test is designed to stress the exact weakness of keyword-first memory. We intentionally made the search problem harder by growing the corpus and making the queries less “exact match.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FSAgent with a concatenated corpus:&lt;/strong&gt;&lt;br&gt;
When you merge many papers into large markdown files, FSAgent becomes dependent on grep-style discovery followed by paging the right sections into the context window. It can work, but it gets brittle as the corpus grows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the user paraphrases or uses synonyms, exact keyword matches can fail.&lt;/li&gt;
&lt;li&gt;If the keyword is too common, you get too many hits, and the agent has to sift through them manually.&lt;/li&gt;
&lt;li&gt;When uncertain, the agent often loads larger slices “just in case,” which increases token count, latency, and the risk of context dilution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MemAgent with chunked, embedded memory:&lt;/strong&gt;&lt;br&gt;
Chunking plus embeddings makes retrieval more forgiving and more stable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The user does not need to match the source phrasing exactly.&lt;/li&gt;
&lt;li&gt;The agent can fetch a small set of high-similarity chunks, keeping context tight.&lt;/li&gt;
&lt;li&gt;Indexed retrieval remains predictable as memory grows, rather than requiring repeated scans of files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The narrative takeaway is simple. Filesystems feel great when the corpus is small and the queries are keyword-friendly. As the corpus grows and the questions get fuzzier, semantic retrieval becomes the differentiator, and database-backed memory becomes the more dependable default.&lt;/p&gt;

&lt;p&gt;The quality gap widens with scale. On a handful of documents, grep can brute-force its way to a reasonable answer: the agent finds a keyword match, pulls surrounding context, and responds.&lt;/p&gt;

&lt;p&gt;But scatter the same information across hundreds of files, and keyword search starts missing the forest for the trees. It returns too many shallow hits or none when the user's phrasing doesn't match the source text verbatim. Semantic search, by contrast, surfaces conceptually relevant chunks even when the vocabulary differs. The result isn't just faster retrieval, it's more coherent answers with fewer hallucinated gaps. This is evident in our LLM judge evaluation on the large corpus benchmark, where FSAgent achieved a score of 29.7% while MemAgent reached 87.1%.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Fimage-5-1024x727.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Fimage-5-1024x727.png" alt="Large-corpus benchmark showing the widening quality gap between FSAgent and MemAgent" width="800" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Concurrency Test: What production teaches you very quickly
&lt;/h3&gt;

&lt;p&gt;We find that the real breaking point for filesystem memory is rarely retrieval. It is concurrency.&lt;/p&gt;

&lt;p&gt;We ran three versions of the same workload under concurrent writes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Filesystem without locking,&lt;/strong&gt; where multiple workers append to the same file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filesystem with locking,&lt;/strong&gt; where writes are guarded by file locks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Oracle AI Database with transactions,&lt;/strong&gt; where multiple workers write rows under ACID guarantees.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then we measured two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Integrity,&lt;/strong&gt; meaning, did we get the expected number of entries with no corruption?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution time,&lt;/strong&gt; meaning how long the batch took end-to-end.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Fimage-6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F02%2Fimage-6.jpg" alt="Concurrent write integrity comparison across filesystem and database memory backends" width="800" height="271"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What we observed maps to what many teams discover the hard way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Naive filesystem writes can be fast and still be wrong.&lt;/strong&gt; Without locking, concurrent writes conflict with each other. You might get good throughput and still lose memory entries. If your agent’s “memory” is used for downstream reasoning, silent loss is not a performance issue. It is a correctness failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Locking fixes integrity, but now correctness is your job.&lt;/strong&gt; With explicit locking, you can make filesystem writes safe. But you inherit the complexity. Lock scope, lock contention, platform differences, network filesystem behavior, and failure recovery all become part of your agent engineering work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Databases make correctness the default.&lt;/strong&gt; Transactions and isolation are exactly what databases were designed for. Yes, there is overhead. But the key difference is that you are not bolting correctness on after a production incident. You start with a system whose job is to protect the shared state.&lt;/p&gt;

&lt;p&gt;And of course, you can take the file-locking approach, add atomic writes, build a write-ahead log, introduce retry and recovery logic, maintain indexes for fast lookups, and standardise metadata so you can query it reliably.&lt;/p&gt;

&lt;p&gt;Eventually, though, you will realise you have not “avoided” a database at all.&lt;/p&gt;

&lt;p&gt;You have just rebuilt one, only with fewer guarantees and more edge cases to own.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Is there a happy medium for AI Developers?
&lt;/h2&gt;

&lt;p&gt;This isn’t a religious war between “files” and “databases.” It’s a question of what you’re optimizing for—and which failure modes you’re willing to own. If you’re building single-user or single-writer prototypes, filesystem memory is a great default. It’s simple, transparent, and fast to iterate on. You can open a folder and see exactly what the agent saved, diff it, version it, and replay it with nothing more than a text editor.&lt;/p&gt;

&lt;p&gt;If you’re building multi-user agents, background workers, or anything you plan to ship at scale, a database-backed memory store is a safer foundation at that stage. At that stage, concurrency, integrity, governance, access control, and auditability matter more than raw simplicity. A practical compromise is a hybrid design: keep file-like ergonomics for artifacts and developer workflows, but store durable memory in a database that can enforce correctness.&lt;/p&gt;

&lt;p&gt;And if you insist on filesystem-only memory in production, treat &lt;strong&gt;locking, atomic writes, recovery, indexing, and metadata discipline&lt;/strong&gt; as first-class engineering work. Because the moment you do that seriously, you’re no longer “just using files”—you’re rebuilding a database.&lt;/p&gt;

&lt;p&gt;One last trap worth calling out: &lt;strong&gt;polyglot persistence&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Many AI stacks drift into an anti-pattern: a vector DB for embeddings, a NoSQL DB for JSON, a graph DB for relationships, and a relational DB for transactions. Each product is “best at its one thing,” until you realize you’re operating four databases, four security models, four backup strategies, four scaling profiles, and four cascading failure points.&lt;/p&gt;

&lt;p&gt;Coordination becomes the tax. You end up building glue code and sync pipelines just to make the system feel unified to the agent. This is why converged approaches matter in agent systems: production memory isn’t only about storing vectors—it’s about storing &lt;strong&gt;operational history, artifacts, metadata, and semantics&lt;/strong&gt; under one consistent set of guarantees.&lt;/p&gt;

&lt;p&gt;For AI Developers, your application acts as an integration layer for multiple storage engines, each with different access patterns and operational semantics. You end up building glue code, sync pipelines, and reconciliation logic just to make the system feel unified to the agent.&lt;/p&gt;

&lt;p&gt;Of course, production data is inherently heterogeneous. You will inevitably deal with structured, semi-structured, unstructured text, embeddings, JSON documents, and relationship-heavy data.&lt;/p&gt;

&lt;p&gt;The point is not that “one model wins”.&lt;/p&gt;

&lt;p&gt;The point is that when you understand the fundamentals of data management, reliability, indexing, governance, and queryability, you want a platform that can store and retrieve these forms without turning your AI infrastructure into a collection of loosely coordinated subsystems.&lt;/p&gt;

&lt;p&gt;This is the philosophy behind Oracle’s &lt;a href="https://www.oracle.com/uk/database/" rel="noopener noreferrer"&gt;converged database approach&lt;/a&gt;, which is designed to support multiple data types and workloads natively within a single engine. In the world of agents, that becomes a practical advantage because we can use Oracle as the unified memory core for both operational memory (SQL tables for history and logs) and semantic memory (vector search for retrieval).&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;What is AI Agent memory?&lt;/strong&gt; AI agent memory is the set of system components and techniques that enable an AI agent to store, recall, and update information over time. Because LLMs are inherently stateless—they have no built-in ability to remember previous sessions—agent memory provides the persistence layer that allows agents to maintain continuity across conversations, learn from past interactions, and adapt to user preferences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Should I use a filesystem or a database for an AI agent's memory?&lt;/strong&gt; It depends on your use case. Filesystems excel at single-user prototypes, artifact-heavy workflows, and rapid iteration—they're simple, transparent, and align with how LLMs naturally operate. Databases become essential when you need concurrent access, ACID transactions, semantic retrieval, or shared state across multiple agents or users. Many production systems use a hybrid approach: file-like interfaces for agent interaction, with database guarantees underneath.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How do I build an AI agent with long-term memory?&lt;/strong&gt; Start by separating memory types: working memory (current context), semantic memory (knowledge base), episodic memory (interaction history), and procedural memory (behavioral rules). Implement storage: a filesystem for prototypes and a database for production. Add retrieval tools that the agent can call. Build a summarization to compress the old context. Test with multi-session scenarios where the agent must recall information from previous conversations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What are semantic, episodic, and procedural memory in AI agents?&lt;/strong&gt; These terms, borrowed from cognitive science, describe different types of agent memory. Semantic memory stores durable knowledge and facts (like saved documents or reference materials). Episodic memory captures experiences and interaction history (conversation transcripts, tool outputs). Procedural memory encodes how the agent should behave—instructions, rules, files like CLAUDE.md, and learned workflows that shape behavior across sessions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What is the best database for AI applications?&lt;/strong&gt; The best database depends on your requirements. For AI agent memory specifically, you need: vector search capability for semantic retrieval, SQL or structured queries for history and metadata, ACID transactions if multiple agents share state, and scalability as your memory corpus grows. Converged databases that combine these capabilities—like Oracle AI Database—reduce operational complexity versus running separate specialized systems.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>database</category>
      <category>oracle</category>
    </item>
    <item>
      <title>How I Added Memory to an AI Agent Using Spring AI and Oracle AI Database</title>
      <dc:creator>Wojtek Pluta</dc:creator>
      <pubDate>Tue, 14 Apr 2026 18:17:32 +0000</pubDate>
      <link>https://dev.to/oracledevs/how-i-added-memory-to-an-ai-agent-using-spring-ai-and-oracle-ai-database-2e55</link>
      <guid>https://dev.to/oracledevs/how-i-added-memory-to-an-ai-agent-using-spring-ai-and-oracle-ai-database-2e55</guid>
      <description>&lt;h2&gt;&lt;strong&gt;Practical guide with a sample app for adding episodic, semantic, and procedural memory to an AI agent using Spring AI and a single Oracle AI Database instance.&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;This post shows how to build three types of persistent memory — episodic (chat history), semantic (domain knowledge via hybrid search), and procedural (tool calls) — using Spring AI and a single Oracle AI Database instance. Here's the code: &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/oracle-database-java-agent-memory" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Key Takeaways&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LLMs forget everything between sessions.&lt;/strong&gt; Episodic, semantic, and procedural memory fix that — chat history, domain knowledge retrieval, and actionable tool calls, all persisted in the database.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;One database handles it all.&lt;/strong&gt; Oracle AI Database stores chat history, runs hybrid vector search, and hosts the application tables — no need to bolt on a separate vector database or search engine.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Hybrid search beats pure vector search.&lt;/strong&gt; Combining dense embeddings with keyword matching (fused via Reciprocal Rank Fusion) means the agent finds documents by meaning &lt;em&gt;and&lt;/em&gt; by exact terms like order IDs.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Embeddings stay in the database.&lt;/strong&gt; A loaded ONNX model computes embeddings on insert — no external embedding API calls, no extra infrastructure.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Agent memory doesn't have to be complicated.&lt;/strong&gt; Two advisors, six tools backed by real database tables, one database, and the LLM stops forgetting.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why This Matters&lt;/h2&gt;

&lt;p&gt;Every LLM has the same problem: it forgets everything the moment the conversation ends, sometimes even during long conversations. Spend twenty minutes explaining your project setup, your constraints, your preferences — and it nails the answer. Close the tab, open a new session, and it greets you like a stranger. All that context, gone.&lt;/p&gt;

&lt;p&gt;If you want to build an AI &lt;em&gt;agent&lt;/em&gt; — one that remembers context, understands your domain, and can take action — you need to give it memory. Practical memory: capturing what users say, retrieving learned facts and executing real workflows backed by database queries.&lt;/p&gt;

&lt;p&gt;This post walks through a proof of concept that does exactly that. Three types of memory, one database, and minimal code.&lt;/p&gt;

&lt;h2&gt;What You'll Learn&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;How to implement episodic, semantic, and procedural memory for an AI agent using &lt;a href="https://docs.spring.io/spring-ai/reference/api/vectordbs/oracle.html" rel="noopener noreferrer"&gt;Spring AI&lt;/a&gt; advisors and &lt;code&gt;@Tool&lt;/code&gt; methods&lt;/li&gt;



&lt;li&gt;How to use Oracle AI Database &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/create-vector-indexes-and-hybrid-vector-indexes.html" rel="noopener noreferrer"&gt;Hybrid Vector Indexes&lt;/a&gt; (vector and keyword search fused with Reciprocal Rank Fusion) for semantic retrieval&lt;/li&gt;



&lt;li&gt;How to &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/load_onnx_model-procedure.html" rel="noopener noreferrer"&gt;compute embeddings in-database with a loaded ONNX model&lt;/a&gt; — no external embedding API calls&lt;/li&gt;



&lt;li&gt;How to wire it all together with one database, one connection pool, and minimal configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Architecture Overview&lt;/h2&gt;

&lt;p&gt;&lt;/p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F04%2FScreenshot-2026-04-14-at-11.23.03-1024x550.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F04%2FScreenshot-2026-04-14-at-11.23.03-1024x550.png" alt="Architecture diagram showing a Streamlit UI connecting to a Spring Boot service, which then routes to Oracle AI Database 26ai for chat memory and vector search, to Ollama for LLM chat, and @tool methods for procedural memory." width="800" height="430"&gt;&lt;/a&gt;System architecture for a memory-enabled AI assistant using Streamlit, Spring Boot, Oracle AI Database 26ai, Ollama, and @Tool methods.&lt;p&gt;&lt;/p&gt;

&lt;p&gt;The agent runs on Spring Boot with Spring AI, with Ollama handling local chat inference (&lt;a href="https://ollama.com/library/qwen2.5" rel="noopener noreferrer"&gt;qwen2.5&lt;/a&gt;). Oracle AI Database 26ai stores all three memory types: a relational table for chat history (episodic), a hybrid vector index for domain knowledge retrieval (semantic), and application tables queried by &lt;code&gt;@Tool&lt;/code&gt; methods (procedural). Embeddings are computed in-database by a loaded ONNX model (&lt;a href="https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2" rel="noopener noreferrer"&gt;all-MiniLM-L12-v2&lt;/a&gt;), eliminating the need for external embedding API calls. A Streamlit frontend provides a simple web UI.&lt;/p&gt;

&lt;p&gt;Both advisors and all six tools run on every request. The agent simultaneously remembers what you said, retrieves relevant knowledge, and executes tasks — all from a single Oracle Database instance. No second database. One connection pool, one set of credentials, one system to monitor.&lt;/p&gt;

&lt;h2&gt;Prerequisites&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Java 21&lt;/li&gt;



&lt;li&gt;Gradle 8.14&lt;/li&gt;



&lt;li&gt;Oracle AI Database 26ai (container or instance)&lt;/li&gt;



&lt;li&gt;Ollama with the &lt;code&gt;qwen2.5&lt;/code&gt; model pulled&lt;/li&gt;



&lt;li&gt;Python 3.x with Streamlit (optional, for the web UI)&lt;/li&gt;



&lt;li&gt;The ONNX model file (&lt;code&gt;all_MiniLM_L12_v2.onnx&lt;/code&gt;) for in-database embeddings&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Step-by-Step Guide&lt;/h2&gt;

&lt;h3&gt;Step 1: Set Up the Oracle AI Database and Hybrid Vector Index&lt;/h3&gt;

&lt;p&gt;Start an Oracle AI Database instance, then run the one-time setup script to load the ONNX embedding model and create the hybrid vector index. This enables in-database embeddings and combined vector and keyword search.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Load the ONNX model for in-database embeddings
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  =&amp;gt; 'DM_DUMP',
    file_name  =&amp;gt; 'all_MiniLM_L12_v2.onnx',
    model_name =&amp;gt; 'ALL_MINILM_L12_V2'
  );
END;
/

-- Create a hybrid index: vector similarity + Oracle Text keyword search
CREATE HYBRID VECTOR INDEX POLICY_HYBRID_IDX
ON POLICY_DOCS(content)
PARAMETERS('MODEL ALL_MINILM_L12_V2 VECTOR_IDXTYPE HNSW');
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Once the index is created, embeddings are computed automatically on insert — no external embedding API calls required.&lt;/p&gt;

&lt;h3&gt;Step 2: Define Procedural Memory with @Tool Methods&lt;/h3&gt;

&lt;p&gt;Procedural memory is implemented as &lt;code&gt;@Tool&lt;/code&gt;-annotated methods in a Spring component. These methods execute real database queries via JPA, which the LLM can call when it decides a task requires action, not just an answer. The &lt;code&gt;@Tool&lt;/code&gt; description tells the LLM &lt;em&gt;when&lt;/em&gt; to use each method, and &lt;code&gt;@ToolParam&lt;/code&gt; defines the inputs.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@Tool(description = "Look up the status of a customer order by its order ID. " +
        "Returns the current status including shipping information.")
public String lookupOrderStatus(
        @ToolParam(description = "The order ID to look up, e.g. ORD-1001") String orderId) {
    // Fetches order from DB via JPA, returns formatted status string
}

@Tool(description = "Initiate a product return for a given order. " +
        "Validates the order exists, checks that it is in DELIVERED status, " +
        "and verifies the return is within the 30-day return window.")
public String initiateReturn(
        @ToolParam(description = "The order ID to return") String orderId,
        @ToolParam(description = "The reason for the return") String reason) {
    // Validates order exists, checks DELIVERED status and 30-day window, updates status via JPA
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The full class has six tools: &lt;code&gt;getCurrentDateTime&lt;/code&gt;, &lt;code&gt;listOrders&lt;/code&gt;, &lt;code&gt;lookupOrderStatus&lt;/code&gt;, &lt;code&gt;initiateReturn&lt;/code&gt;, &lt;code&gt;escalateToSupport&lt;/code&gt;, and &lt;code&gt;listSupportTickets&lt;/code&gt;. The LLM decides &lt;em&gt;when&lt;/em&gt; to act; the Java methods define &lt;em&gt;how&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;Step 3: Wire the Controller with Advisors and Tools&lt;/h3&gt;

&lt;p&gt;The controller builds a single &lt;code&gt;ChatClient&lt;/code&gt; with two advisors and six tools. &lt;code&gt;MessageChatMemoryAdvisor&lt;/code&gt; handles episodic memory by loading the last 100 messages for the current conversation from a relational table and persisting each new exchange. &lt;code&gt;RetrievalAugmentationAdvisor&lt;/code&gt;, with a custom &lt;code&gt;OracleHybridDocumentRetriever&lt;/code&gt;, handles semantic memory by calling &lt;code&gt;DBMS_HYBRID_VECTOR.SEARCH&lt;/code&gt; to run vector and keyword search in parallel, fused with Reciprocal Rank Fusion (RRF). The tools are registered via &lt;code&gt;.defaultTools(agentTools)&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@RestController
@RequestMapping("/api/v1/agent")
public class AgentController {

    public AgentController(ChatClient.Builder builder,
                           JdbcChatMemoryRepository chatMemoryRepository,
                           JdbcTemplate jdbcTemplate,
                           AgentTools agentTools) {
        // Builds a ChatClient with:
        //   - MessageChatMemoryAdvisor (episodic: last 100 messages per conversation)
        //   - RetrievalAugmentationAdvisor + OracleHybridDocumentRetriever (semantic: hybrid search)
        //   - AgentTools via .defaultTools() (procedural: 6 @Tool methods)
        //   - System prompt defining the agent persona and tool usage rules
    }

    @PostMapping("/chat")
    public ResponseEntity&amp;lt;String&amp;gt; chat(
            @RequestBody String message,
            @RequestHeader("X-Conversation-Id") String conversationId) {
        // Sends message to ChatClient with conversation ID, returns LLM response
    }

    @PostMapping("/knowledge")
    public ResponseEntity&amp;lt;String&amp;gt; addKnowledge(@RequestBody String content) {
        // Inserts text into POLICY_DOCS table via JDBC (hybrid index handles embedding)
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;All three memory types run on every request. The agent simultaneously remembers what you said, retrieves relevant knowledge, and executes tasks.&lt;/p&gt;

&lt;h3&gt;Step 4: Implement the Hybrid Document Retriever&lt;/h3&gt;

&lt;p&gt;The custom &lt;code&gt;OracleHybridDocumentRetriever&lt;/code&gt; implements Spring AI's &lt;code&gt;DocumentRetriever&lt;/code&gt; interface and calls &lt;code&gt;DBMS_HYBRID_VECTOR.SEARCH&lt;/code&gt; via JDBC. It passes a JSON parameter specifying the hybrid index, the RRF scorer, and a keyword match clause, bypassing &lt;code&gt;OracleVectorStore&lt;/code&gt; entirely for retrieval.&lt;/p&gt;

&lt;p&gt;Why hybrid instead of pure vector search? Dense embeddings capture meaning — a query about "return policy" can match documents about refunds and exchanges. But they're weaker on exact terms: a query for "ORD-1001" performs poorly because embeddings encode semantics, not keywords. Hybrid search addresses both: the vector side captures meaning, the keyword side handles exact matches, and RRF merges the result sets by rank position.&lt;/p&gt;

&lt;h3&gt;Step 5: Run the Application&lt;/h3&gt;

&lt;p&gt;Start the Oracle DB container, install Ollama, pull the chat model, run the Spring Boot backend with the &lt;code&gt;local&lt;/code&gt; profile, and optionally start the Streamlit UI.&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F04%2Fepisodic-memory-1024x644.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogs.oracle.com%2Fdevelopers%2Fwp-content%2Fuploads%2Fsites%2F129%2F2026%2F04%2Fepisodic-memory-1024x644.png" alt="Dark-mode chatbot interface showing a user asking, “Do you remember my name?” and the assistant replying that it remembers Victor from a previous conversation and can create a support ticket for an ergonomic mouse connection issue tied to order ORD-1007." width="800" height="503"&gt;&lt;/a&gt;The assistant recalls the customer’s name, prior issue, and order details to continue support without repeating context.&lt;p&gt;&lt;/p&gt;

&lt;p&gt;Optionally, &lt;strong&gt;quick test with cURL:&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;curl -X POST http://localhost:8080/api/v1/agent/chat \&lt;br&gt;  -H "Content-Type: text/plain" \&lt;br&gt;  -H "X-Conversation-Id: test-1" \&lt;br&gt;  -d "What orders do I have?"&lt;br&gt;&lt;br&gt;&lt;/pre&gt;

&lt;p&gt;The agent will remember your name and details, or use procedural memory (the &lt;code&gt;listOrders&lt;/code&gt; tool) to query the database and return the demo orders. Try "What is your return policy?" to see semantic memory (hybrid search over policy documents) in action. Then type "My name is Victor" followed later by "What's my name?" to test episodic memory.&lt;/p&gt;

&lt;h2&gt;Frequently Asked Questions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why does the agent need three types of memory instead of just chat history?&lt;/strong&gt;&lt;br&gt;Chat history (episodic memory) only covers what was said in the conversation. Semantic memory lets the agent retrieve domain knowledge — like return policies or shipping rules — that was never mentioned in chat. Procedural memory lets it take actions, such as looking up an order or initiating a return, by calling tool methods backed by real database queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why use hybrid search instead of plain vector similarity?&lt;/strong&gt;&lt;br&gt;Pure vector search matches by meaning, which works well for natural-language questions but struggles with exact terms like product codes or order IDs. Hybrid search runs vector and keyword search in parallel and merges the results by rank position (Reciprocal Rank Fusion), so the agent finds relevant documents whether the match is semantic, lexical, or both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need a separate vector database to build this?&lt;/strong&gt;&lt;br&gt;No. Oracle AI Database 26ai supports relational tables, hybrid vector indexes, and full-text search in a single instance. The POC uses one connection pool and one set of credentials for chat history, vector retrieval, and all application data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How are the embeddings generated?&lt;/strong&gt;&lt;br&gt;An ONNX model (all-MiniLM-L12-v2) is loaded directly into Oracle AI Database. Embeddings are computed automatically whenever a row is inserted into the indexed table — no external API calls and no separate embedding service required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are the limitations?&lt;/strong&gt;&lt;br&gt;This is a proof of concept. There's no authentication, no rate limiting, and no streaming responses. It demonstrates the architecture and approach — production use would require hardening those areas.&lt;/p&gt;

&lt;h2&gt;Next Steps&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/oracle-database-java-agent-memory" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;&lt;/li&gt;



&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/" rel="noopener noreferrer"&gt;Oracle AI Vector Search documentation&lt;/a&gt;&lt;/li&gt;



&lt;li&gt;&lt;a href="https://docs.spring.io/spring-ai/reference/" rel="noopener noreferrer"&gt;Spring AI documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Author&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Victor Martin Alvarez&lt;/strong&gt; – Senior Principal Product Manager, Oracle AI DatabaseBuilding AI-powered applications with Oracle AI Database and Spring AI.&lt;a href="https://www.linkedin.com/in/victormartindeveloper/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;



</description>
      <category>oracle</category>
      <category>ai</category>
      <category>database</category>
      <category>springai</category>
    </item>
    <item>
      <title>Build an Ultra-Lightweight, Local-First AI Assistant with Persistent Memory</title>
      <dc:creator>Wojtek Pluta</dc:creator>
      <pubDate>Tue, 14 Apr 2026 14:16:59 +0000</pubDate>
      <link>https://dev.to/oracledevs/build-an-ultra-lightweight-local-first-ai-assistant-with-persistent-memory-11i0</link>
      <guid>https://dev.to/oracledevs/build-an-ultra-lightweight-local-first-ai-assistant-with-persistent-memory-11i0</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/picooraclaw" rel="noopener noreferrer"&gt;PicoOraClaw&lt;/a&gt; is a lightweight, offline AI assistant with local inference via &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.oracle.com/database/" rel="noopener noreferrer"&gt;Oracle AI Database&lt;/a&gt; stores sessions, memories, transcripts, prompts, and state with durable ACID-backed persistence.&lt;/li&gt;
&lt;li&gt;Semantic recall happens in the database using &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/onnx-pipeline-models-text-embedding.html" rel="noopener noreferrer"&gt;ONNX embeddings&lt;/a&gt; and &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/overview-ai-vector-search.html" rel="noopener noreferrer"&gt;vector search&lt;/a&gt;, removing the need for an external embedding API.&lt;/li&gt;
&lt;li&gt;The same project runs locally for development and can move to &lt;a href="https://www.oracle.com/cloud/" rel="noopener noreferrer"&gt;Oracle Cloud Infrastructure&lt;/a&gt; (OCI) when you need a managed deployment.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Local-First AI Assistant with Built-In Memory
&lt;/h2&gt;

&lt;p&gt;If you want to build an AI assistant that runs locally, retains &lt;strong&gt;meaningful&lt;/strong&gt; context, and can move to the cloud &lt;strong&gt;without rearchitecting the stack&lt;/strong&gt;, &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/picooraclaw" rel="noopener noreferrer"&gt;PicoOraClaw&lt;/a&gt; is a &lt;strong&gt;strong&lt;/strong&gt; starting point. It pairs a lightweight &lt;a href="https://go.dev/" rel="noopener noreferrer"&gt;Go&lt;/a&gt; runtime with local inference via Ollama and uses Oracle AI Database as the &lt;strong&gt;persistent memory layer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This matters for developers building edge AI systems, private assistants, or local-first prototypes. Instead of stitching together separate services for storage, embeddings, and retrieval, you can keep memory, state, and semantic recall within Oracle AI Database while still running a lightweight local runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is PicoOraClaw?
&lt;/h2&gt;

&lt;p&gt;PicoOraClaw is a fork of &lt;strong&gt;&lt;a href="https://github.com/sipeed/picoclaw?tab=readme-ov-file" rel="noopener noreferrer"&gt;PicoClaw&lt;/a&gt;&lt;/strong&gt; that keeps the runtime lightweight, uses Ollama as the default inference backend, and adds Oracle AI Database for &lt;strong&gt;persistent memory and state&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;PicoClaw&lt;/strong&gt; is an independent open-source project initiated by &lt;a href="https://sipeed.com/" rel="noopener noreferrer"&gt;Sipeed&lt;/a&gt;, written entirely in &lt;strong&gt;Go&lt;/strong&gt; from scratch - not a fork of &lt;a href="https://openclawd.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;, &lt;a href="https://github.com/HKUDS/nanobot" rel="noopener noreferrer"&gt;NanoBot&lt;/a&gt;, or any other project.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The result is a developer-friendly architecture for assistants that retain meaningful context and retrieve it semantically, rather than relying on keyword matching. &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/picooraclaw" rel="noopener noreferrer"&gt;PicoOraClaw&lt;/a&gt; targets use cases such as edge AI, IoT, private assistants, and local-first developer workflows, where a small footprint and persistent context matter more than a cloud-only approach. See the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/picooraclaw" rel="noopener noreferrer"&gt;PicoOraClaw repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight&lt;/strong&gt; Go runtime for local and edge-friendly assistant workflows&lt;/li&gt;
&lt;li&gt;Oracle AI Database-backed &lt;strong&gt;memory, state, and semantic recall&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Ollama as the default local inference backend&lt;/li&gt;
&lt;li&gt;Support for &lt;strong&gt;multiple LLM providers&lt;/strong&gt; including &lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;, &lt;a href="https://www.anthropic.com/" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;, &lt;a href="https://openai.com/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;, &lt;a href="https://gemini.google.com/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, &lt;a href="https://www.deepseek.com/" rel="noopener noreferrer"&gt;DeepSeek&lt;/a&gt;, &lt;a href="https://groq.com/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;, and &lt;a href="https://chat.z.ai/" rel="noopener noreferrer"&gt;Zhipu&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default:&lt;/strong&gt; &lt;a href="https://www.oracle.com/database/free/" rel="noopener noreferrer"&gt;&lt;strong&gt;Oracle AI Database Free&lt;/strong&gt;&lt;/a&gt; with Oracle AI Vector Search for semantic memory&lt;/li&gt;
&lt;li&gt;Optional &lt;a href="https://www.oracle.com/autonomous-database/" rel="noopener noreferrer"&gt;Autonomous AI Database&lt;/a&gt; path for managed cloud deployment&lt;/li&gt;
&lt;li&gt;Graceful file-based fallback when Oracle is unavailable&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Choose PicoOraClaw vs. Standard PicoClaw?
&lt;/h2&gt;

&lt;p&gt;If you're already familiar with PicoClaw, PicoOraClaw adds a more complete memory layer for developers who need durable context and semantic recall.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Oracle AI Database as the persistent backend for &lt;strong&gt;memories&lt;/strong&gt;, &lt;strong&gt;sessions, transcripts, state, notes, prompts, and configuration&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-database ONNX embeddings&lt;/strong&gt; and &lt;strong&gt;vector search for semantic memory using&lt;/strong&gt; &lt;code&gt;VECTOR_EMBEDDING()&lt;/code&gt; and &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Ollama as the default &lt;strong&gt;local LLM backend&lt;/strong&gt; with no cloud dependency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-click OCI deployment&lt;/strong&gt; with Oracle AI Database Free, Ollama, and the PicoOraClaw gateway&lt;/li&gt;
&lt;li&gt;Optional OCI Generative AI integration through the included &lt;code&gt;oci-genai&lt;/code&gt; proxy&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;oracle-inspect&lt;/code&gt; CLI support for inspecting what the assistant stores without writing SQL&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Features of PicoOraClaw
&lt;/h2&gt;

&lt;p&gt;What PicoOraClaw enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unified Memory Core&lt;/strong&gt; - PicoOraClaw uses Oracle AI Database to store sessions, transcripts, notes, prompts, configuration, and long-term memories in a single persistent system. The database is the memory substrate for long-running, context-aware assistant behavior.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build Fast with Modern APIs&lt;/strong&gt; - Get started locally with a lightweight runtime, Ollama for local inference, and Oracle AI Database Free for semantic memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Robust Scaling Path&lt;/strong&gt; - Start locally, keep the same overall architecture, and move to OCI later when you need a managed environment.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Installation - Quick Start (in 5 minutes!)
&lt;/h2&gt;

&lt;p&gt;For the fastest path to a working setup, use the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/picooraclaw" rel="noopener noreferrer"&gt;PicoOraClaw&lt;/a&gt; one-command installer. It clones, configures, and runs the application in a single step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/oracle-devrel/oracle-ai-developer-hub/refs/heads/main/apps/picooraclaw/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To control the workspace path, clone the Oracle DevRel repository directly and build from the PicoOraClaw app directory.&lt;/p&gt;

&lt;p&gt;Follow the steps below:&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Go 1.24+&lt;/li&gt;
&lt;li&gt;Ollama&lt;/li&gt;
&lt;li&gt;Docker (for &lt;a href="https://www.oracle.com/database/free/" rel="noopener noreferrer"&gt;Oracle Database Free&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Build
&lt;/h3&gt;

&lt;p&gt;Clone the Oracle DevRel repository, navigate to the PicoOraClaw app folder, and build the binary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/oracle-devrel/oracle-ai-developer-hub.git
&lt;span class="nb"&gt;cd &lt;/span&gt;oracle-ai-developer-hub/apps/picooraclaw
make build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Initialize
&lt;/h3&gt;

&lt;p&gt;Initialize the application so it creates the local configuration and working directories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./build/picooraclaw onboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Start Ollama and pull a model
&lt;/h3&gt;

&lt;p&gt;Ollama is the default and recommended LLM backend for private local inference with no API keys and no cloud dependency.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama if needed: https://ollama.com/download&lt;/span&gt;
ollama pull qwen3:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Configure for Ollama
&lt;/h3&gt;

&lt;p&gt;Edit &lt;code&gt;~/.picooraclaw/config.json&lt;/code&gt; so PicoOraClaw points at your Ollama instance and model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen3:latest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"max_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"api_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"api_base"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:11434/v1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Test semantic memory
&lt;/h3&gt;

&lt;p&gt;Once the binary, config, and model are ready, start the assistant and test local conversations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# One-shot&lt;/span&gt;
./build/picooraclaw agent &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Hello!"&lt;/span&gt;

&lt;span class="c"&gt;# Interactive mode&lt;/span&gt;
./build/picooraclaw agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this stage, you have a working local AI assistant with no cloud dependency.&lt;/p&gt;

&lt;p&gt;The default LLM backend is &lt;strong&gt;Ollama&lt;/strong&gt;, with an optional alternative for using OCI-hosted models. See &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/apps/picooraclaw/oci-genai/README.md" rel="noopener noreferrer"&gt;oci-genai/README.md&lt;/a&gt; for related documentation.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;oci-genai&lt;/code&gt; module provides &lt;strong&gt;OCI Generative AI&lt;/strong&gt; as an optional backend for PicoOraClaw. It runs a local OpenAI-compatible proxy that authenticates with OCI using your &lt;code&gt;~/.oci/config&lt;/code&gt; credentials and forwards requests to the OCI GenAI inference endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying to Oracle Cloud (one-click procedure)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cloud.oracle.com/resourcemanager/stacks/create?zipUrl=https://github.com/jasperan/picooraclaw/raw/main/deploy/oci/orm/picooraclaw-orm.zip" rel="noopener noreferrer"&gt;Click here to deploy to Oracle Cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This deployment provisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an OCI Compute instance&lt;/li&gt;
&lt;li&gt;Ollama with a model preloaded for CPU inference&lt;/li&gt;
&lt;li&gt;Oracle AI Database Free by default, with an optional Autonomous AI Database path&lt;/li&gt;
&lt;li&gt;the PicoOraClaw gateway as a &lt;code&gt;systemd&lt;/code&gt; service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can start locally, keep the same overall architecture, and move to OCI when you need a managed environment.&lt;/p&gt;

&lt;p&gt;After deployment, use these commands to verify setup, start chatting, and check gateway health:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check setup progress&lt;/span&gt;
ssh opc@&amp;lt;public_ip&amp;gt; &lt;span class="nt"&gt;-t&lt;/span&gt; &lt;span class="s1"&gt;'tail -f /var/log/picooraclaw-setup.log'&lt;/span&gt;

&lt;span class="c"&gt;# Start chatting&lt;/span&gt;
ssh opc@&amp;lt;public_ip&amp;gt; &lt;span class="nt"&gt;-t&lt;/span&gt; picooraclaw agent

&lt;span class="c"&gt;# Check gateway health&lt;/span&gt;
curl http://&amp;lt;public_ip&amp;gt;:18790/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Adding Oracle AI Vector Search
&lt;/h2&gt;

&lt;p&gt;Oracle AI Database provides &lt;strong&gt;persistent storage&lt;/strong&gt;, &lt;strong&gt;semantic memory&lt;/strong&gt; and recall, and crash-safe &lt;strong&gt;ACID transactions&lt;/strong&gt;, with an optional file-based storage mode.&lt;/p&gt;

&lt;p&gt;Simply run the setup script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./scripts/setup-oracle.sh &lt;span class="o"&gt;[&lt;/span&gt;optional-password]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script performs the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pulls and starts the Oracle AI Database Free container&lt;/li&gt;
&lt;li&gt;Waits for the database to be ready&lt;/li&gt;
&lt;li&gt;Creates the &lt;code&gt;picooraclaw&lt;/code&gt; database user with the required grants&lt;/li&gt;
&lt;li&gt;Patches &lt;code&gt;~/.picooraclaw/config.json&lt;/code&gt; with the Oracle connection settings&lt;/li&gt;
&lt;li&gt;Runs &lt;code&gt;picooraclaw setup-oracle&lt;/code&gt; to initialize the schema and load the ONNX embedding model&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This step gives the assistant durable semantic memory. Instead of relying on local files or ephemeral process state, PicoOraClaw persists and retrieves meaning-based context directly through Oracle AI Vector Search.&lt;/p&gt;

&lt;p&gt;Expected output when setup is complete:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;── Step 4/4: Schema + ONNX model ─────────────────────────────────────────
  Running picooraclaw setup-oracle...
✓ Connected to Oracle AI Database
✓ Schema initialized (8 tables with PICO_ prefix)
✓ ONNX model 'ALL_MINILM_L12_V2' already loaded
✓ VECTOR_EMBEDDING() test passed
✓ Prompts seeded from workspace

════════════════════════════════════════════════════════
  Oracle AI Database setup complete!
  Test with:
    ./build/picooraclaw agent -m "Remember that I love Go"
    ./build/picooraclaw agent -m "What language do I like?"
    ./build/picooraclaw oracle-inspect
════════════════════════════════════════════════════════
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test semantic memory
&lt;/h3&gt;

&lt;p&gt;Use the following commands to test semantic memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Store a fact&lt;/span&gt;
./build/picooraclaw agent &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Remember that my favorite language is Go"&lt;/span&gt;

&lt;span class="c"&gt;# Recall by meaning (not keywords)&lt;/span&gt;
./build/picooraclaw agent &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"What programming language do I prefer?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second command finds the stored memory via cosine similarity on 384-dimensional vectors rather than exact keyword matching.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inspecting Oracle Data with oracle-inspect
&lt;/h2&gt;

&lt;p&gt;A useful operational feature is &lt;code&gt;oracle-inspect&lt;/code&gt;, a CLI tool that lets you inspect stored data without writing SQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;picooraclaw oracle-inspect &lt;span class="o"&gt;[&lt;/span&gt;table] &lt;span class="o"&gt;[&lt;/span&gt;options]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are the tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;memories, sessions, transcripts, state, notes, prompts, config, meta
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are the options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-n &amp;lt;limit&amp;gt; max rows (default 20), -s &amp;lt;text&amp;gt; semantic search (memories only)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To list all memories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./build/picooraclaw oracle-inspect memories
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also perform semantic search over memories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./build/picooraclaw oracle-inspect memories &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"what does the user like to program in"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a meaningful developer benefit. Oracle-backed memory is inspectable, debuggable, and operationally visible. You can understand what the assistant stores without building a separate admin layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Overview dashboard
&lt;/h3&gt;

&lt;p&gt;Run the following command to view an overview dashboard of stored data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./build/picooraclaw oracle-inspect
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the command with no arguments gives you a summary view across tables, recent memory entries, transcripts, sessions, state, notes, prompts, and schema metadata.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=============================================================
  PicoOraClaw Oracle AI Database Inspector
=============================================================

  Table                  Rows
  ─────────────────────  ────
  Memories                  20  ████████████████████
  Sessions                   4  ████
  Transcripts                6  ██████
  State                      8  ████████
  Daily Notes                3  ███
  Prompts                    4  ████
  Config                     2  ██
  Meta                       1  █
  ─────────────────────  ────
  Total                     48

  Tip: Run 'picooraclaw oracle-inspect &amp;lt;table&amp;gt;' for details
       Run 'picooraclaw oracle-inspect memories -s "query"' for semantic search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  List all memories
&lt;/h3&gt;

&lt;p&gt;Run the following command to list all stored memories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./build/picooraclaw oracle-inspect memories
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;All Memories
─────────────────────────────────────────────────────────

ID: faffd019  Vector: yes
Created: 2026-02-19 04:13  Importance: 0.9  Category: preference  Accessed: 0x
Content: User prefers Oracle Database as the primary database. They work at Oracle
and prefer Oracle AI Vector Search for embeddings.

ID: 0e39036f  Vector: yes
Created: 2026-02-19 04:13  Importance: 0.8  Category: preference  Accessed: 0x
Content: Go is the user's primary programming language. They use Go 1.24 and target
embedded Linux devices (RISC-V, ARM64, x86_64).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Semantic search over memories
&lt;/h3&gt;

&lt;p&gt;The following example shows how to perform semantic search over stored memories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./build/picooraclaw oracle-inspect memories &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"what does the user like to program in"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Semantic Search: "what does the user like to program in"
─────────────────────────────────────────────────────────

[ 61.3% match]  ID: 383ff5d3
Created: 2026-02-16 06:13  Importance: 0.7  Category: preference  Accessed: 0x
Content: I prefer Python and Go for programming

[ 60.7% match]  ID: 0e74a94c
Created: 2026-02-18 02:20  Importance: 0.7  Category: preference  Accessed: 0x
Content: my favorite programming language is Go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For deeper inspection of sessions, transcripts, notes, config, prompts, and schema metadata, see the PicoOraClaw app in the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/picooraclaw" rel="noopener noreferrer"&gt;Oracle DevRel repository&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inspect sessions
&lt;/h3&gt;

&lt;p&gt;You can inspect stored chat sessions using the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./build/picooraclaw oracle-inspect sessions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Chat Sessions
─────────────────────────────────────────────────────────

Session: discord:dev-channel
Created: 2026-02-19 04:13  Updated: 2026-02-19 04:13  Messages size: 673 bytes

Session: cli:default
Created: 2026-02-16 06:12  Updated: 2026-02-18 06:07  Messages size: 2848 bytes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Inspect agent state
&lt;/h3&gt;

&lt;p&gt;Inspect the agent's stored state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./build/picooraclaw oracle-inspect state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent State (Key-Value)
─────────────────────────────────────────────────────────
agent_mode                     = interactive
last_channel                   = cli
last_model                     = gpt-4o-mini
total_conversations            = 42
user_name                      = jasperan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How Oracle Storage Works
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;remember&lt;/code&gt; tool stores text along with a vector embedding using &lt;code&gt;VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING :text AS DATA)&lt;/code&gt;. The &lt;code&gt;recall&lt;/code&gt; tool then uses &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; for cosine similarity search.&lt;/p&gt;

&lt;p&gt;With Oracle-backed storage in place, PicoOraClaw supports the following LLM providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama&lt;/li&gt;
&lt;li&gt;OpenRouter&lt;/li&gt;
&lt;li&gt;Zhipu&lt;/li&gt;
&lt;li&gt;Anthropic&lt;/li&gt;
&lt;li&gt;OpenAI&lt;/li&gt;
&lt;li&gt;Gemini&lt;/li&gt;
&lt;li&gt;DeepSeek&lt;/li&gt;
&lt;li&gt;Groq&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PicoOraClaw also supports &lt;strong&gt;OCI Generative AI&lt;/strong&gt; as an optional LLM backend for enterprise models via the included &lt;code&gt;oci-genai&lt;/code&gt; proxy.&lt;/p&gt;

&lt;h2&gt;
  
  
  CLI Reference
&lt;/h2&gt;

&lt;p&gt;The following commands cover the core PicoOraClaw workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;picooraclaw onboard&lt;/code&gt; - initialize config and workspace&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;picooraclaw agent -m "..."&lt;/code&gt; - one-shot chat&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;picooraclaw agent&lt;/code&gt; - interactive chat mode&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;picooraclaw setup-oracle&lt;/code&gt; - initialize Oracle schema and ONNX model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;picooraclaw oracle-inspect&lt;/code&gt; - inspect data stored in Oracle AI Database&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;picooraclaw oracle-inspect memories -s "query"&lt;/code&gt; - semantic search over stored memories&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;picooraclaw gateway&lt;/code&gt; - start the long-running service with channels enabled&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;PicoOraClaw is more than a lightweight assistant runtime. Combined with Oracle AI Database, it becomes a practical pattern for building assistants that retain context, retrieve facts semantically, and scale from local development to OCI without rearchitecting.&lt;/p&gt;

&lt;p&gt;Start small, stay local, add durable semantic memory with Oracle AI Vector Search, and keep a clear path to a managed deployment model when you need it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/picooraclaw" rel="noopener noreferrer"&gt;PicoOraClaw GitHub repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions (FAQs)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What hardware do I need to run PicoOraClaw?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
PicoOraClaw runs on resource-constrained environments including x86_64, ARM64, and RISC-V platforms with a very small footprint. See the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/picooraclaw" rel="noopener noreferrer"&gt;project repository&lt;/a&gt; for exact requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does PicoOraClaw remember information?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
PicoOraClaw stores memories, sessions, and related state in Oracle AI Database. It uses in-database ONNX embeddings and vector search to retrieve memory by meaning rather than exact keyword matches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need an external embedding API?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No, the Oracle-backed memory flow uses in-database embeddings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I run PicoOraClaw fully offline?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. Ollama as the default backend enables fully local inference, making PicoOraClaw suitable for offline or privacy-sensitive workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I deploy PicoOraClaw to Oracle Cloud?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes. The OCI deployment path provisions compute, Oracle AI Database Free, Ollama, and the PicoOraClaw gateway as a &lt;code&gt;systemd&lt;/code&gt; service, with an optional Autonomous AI Database path. &lt;a href="https://cloud.oracle.com/resourcemanager/stacks/create?zipUrl=https://github.com/jasperan/picooraclaw/raw/main/deploy/oci/orm/picooraclaw-orm.zip" rel="noopener noreferrer"&gt;Deploy here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which LLM providers are supported?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Ollama (default), OpenRouter, Zhipu, Anthropic, OpenAI, Gemini, DeepSeek, Groq, and optional OCI Generative AI integration through the included proxy. See the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/picooraclaw" rel="noopener noreferrer"&gt;PicoOraClaw repository&lt;/a&gt; for details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/apps/picooraclaw" rel="noopener noreferrer"&gt;See the Oracle AI Developer Hub PicoOraClaw page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.oracle.com/database/free/" rel="noopener noreferrer"&gt;Try Oracle Database Free&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.oracle.com/autonomous-database/" rel="noopener noreferrer"&gt;Learn more about Autonomous AI Database&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>oracle</category>
      <category>ai</category>
      <category>database</category>
      <category>picoclaw</category>
    </item>
  </channel>
</rss>
