Wojtek Pluta

Agent Memory Storage: A Practical Guide

Why this matters

Agentic AI promises autonomous systems that reason and act, but without persistent memory they fall short in real-world use: agents forget conversation context after a few turns and repeat the same mistakes because nothing survives between sessions. This tutorial shows how to build a hybrid memory system using Oracle AI Database, combining SQL for exact recall and vectors for semantic search, enabling production-ready agents that learn and remember across sessions.

What we’ll build

We'll create a Hybrid Memory Manager for AI agents that stores conversational history in SQL tables for ACID compliance and semantic knowledge in vector indexes for similarity search. Key features include automatic summarization to manage context windows, persistent tool retrieval, and integration with web search results via Tavily. This converged approach in Oracle AI Database eliminates data silos.

  • Hybrid Storage: SQL for episodic memory (chats), vectors for semantic memory (facts/tools).
  • Lifecycle Management: Summarize long threads to keep prompts efficient.
  • Learning Loop: Cache web search results in the knowledge base.
  • Requirements:
    • Docker for Oracle Database 23ai Free.
    • Python 3.10+ with libraries: oracledb, langchain-oracledb, langchain, sentence-transformers, tavily-python.

Setup

Install and configure

Start the Oracle Database 23ai Free container using Docker. This provides the converged engine for SQL and vector operations.

docker run -d \
  --name oracle-free \
  -p 1521:1521 -p 5500:5500 \
  -e ORACLE_PWD=OraclePwd_2025 \
  -v "$HOME/oracle/full_data:/opt/oracle/oradata" \
  container-registry.oracle.com/database/free:latest

Wait for the container to start (check with docker logs oracle-free). The database is ready when you see "DATABASE IS READY TO USE!" in logs.
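If you prefer to automate that wait from Python instead of watching logs, here is a minimal sketch. The `wait_for_port` helper is illustrative, not part of any Oracle tooling, and only confirms the listener accepts TCP connections; the "DATABASE IS READY TO USE!" log line remains the authoritative signal.

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 300.0,
                  poll_interval: float = 2.0) -> bool:
    """Poll until a TCP listener accepts connections or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True  # listener is accepting connections
        except OSError:
            time.sleep(poll_interval)  # not up yet; try again shortly
    return False

# Example: block until the Oracle listener on 1521 responds
# wait_for_port("127.0.0.1", 1521)
```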

Install Python dependencies for database connection, embeddings, and agent tools.

python -m pip install -U oracledb langchain-oracledb langchain langchain-community \
  sentence-transformers tavily-python

Output:

Successfully installed oracledb-2.3.0 langchain-oracledb-0.1.0 ...

Prerequisites:

  • Product: Oracle Database 23ai Free.
  • Tools: Docker, Python 3.10+, VS Code or similar.
  • Skills: Basic SQL, Python, Docker.
  • Access: Run locally; no cloud signup needed for this guide.

Connect and verify

Connect to the database and verify the Oracle version. The script below retries on startup delays; it assumes a VECTOR user already exists in FREEPDB1, so create one first or substitute your own credentials.

import oracledb
import time

# Return CLOB columns as plain Python strings instead of LOB locators
oracledb.defaults.fetch_lobs = False

def connect_to_oracle(user="VECTOR", password="VectorPwd_2025", dsn="127.0.0.1:1521/FREEPDB1", max_retries=3):
    for attempt in range(1, max_retries + 1):
        try:
            conn = oracledb.connect(user=user, password=password, dsn=dsn)
            with conn.cursor() as cur:
                cur.execute("SELECT banner FROM v$version WHERE banner LIKE 'Oracle%'")
                print(f"Connected to: {cur.fetchone()[0]}")
            return conn
        except oracledb.Error as e:
            if attempt == max_retries:
                raise
            print(f"Retry {attempt}...")
            time.sleep(5)

conn = connect_to_oracle()

Output:

Connected to: Oracle Database 23ai Free - Version

Core steps

Step 1: Design hybrid memory architecture

Match storage types to use cases: Use SQL for structured, exact-match episodic memory (e.g., chat history) to ensure auditability and transaction safety. Use vector embeddings for semantic search in knowledge bases and tools, retrieving similar content without exact keywords. In Oracle AI Database, store both in one schema to avoid sync issues and simplify queries.

This design prevents data drift and enables unified queries, like joining chat metadata with similar knowledge vectors.
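The routing rule can be expressed in a few lines of plain Python. A hedged sketch, not part of any Oracle API: the function name and dispatch rules are illustrative.

```python
def route_memory_query(query: dict) -> str:
    # Exact filters (thread, role, time window) belong to the SQL side:
    # auditable, transactional episodic recall.
    exact_keys = {"thread_id", "role", "since", "until"}
    if exact_keys & query.keys():
        return "sql"
    # Free text with no exact filter goes to the vector index for
    # similarity search over embedded content.
    if "text" in query:
        return "vector"
    raise ValueError("query needs either exact filters or free text")

print(route_memory_query({"thread_id": "thread123"}))        # sql
print(route_memory_query({"text": "database maintenance"}))  # vector
```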

Step 2: Implement episodic memory with SQL

Store conversation turns in a SQL table for reliable, queryable history. This supports filtering by thread_id and timestamp, essential for multi-turn agents.

Execute these SQL commands via SQLcl or the connection script (connect as SYS or VECTOR user with privileges).

-- Create episodic memory table
CREATE TABLE CONVERSATIONAL_MEMORY (
  id         VARCHAR2(100) DEFAULT SYS_GUID() PRIMARY KEY,
  thread_id  VARCHAR2(100) NOT NULL,
  role       VARCHAR2(50)  NOT NULL,
  content    CLOB          NOT NULL,
  timestamp  TIMESTAMP     DEFAULT CURRENT_TIMESTAMP,
  metadata   CLOB,
  summary_id VARCHAR2(100)
);

CREATE INDEX idx_conv_thread ON CONVERSATIONAL_MEMORY(thread_id);

Output:

Table created.
Index created.

Now, add a Python function to insert messages.

def write_conversation(conn, thread_id: str, role: str, content: str) -> str:
    with conn.cursor() as cur:
        out_id = cur.var(str)
        cur.execute(
            """INSERT INTO CONVERSATIONAL_MEMORY(thread_id, role, content, metadata)
               VALUES(:t, :r, :c, '{}') RETURNING id INTO :id""",
            {"t": thread_id, "r": role, "c": content, "id": out_id}
        )
    conn.commit()
    return out_id.getvalue()[0]

# Example usage
msg_id = write_conversation(conn, "thread123", "user", "Hello, agent!")
print(f"Inserted message ID: {msg_id}")

Output:

Inserted message ID: 123e4567-e89b-12d3-a456-426614174000

Pitfall: Always commit after inserts to persist data; use metadata CLOB for JSON extras like user ID.
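For the metadata CLOB, serializing a dict with the standard json module beats hand-building JSON strings. A small sketch; the helper name is made up for illustration:

```python
import json

def build_metadata(**fields) -> str:
    # Serialize extras (user IDs, tags, timestamps) for the metadata CLOB;
    # json.dumps handles quoting, so values can't corrupt the stored JSON
    return json.dumps(fields, default=str, ensure_ascii=False)

print(build_metadata(user_id="u42", channel="web"))
# {"user_id": "u42", "channel": "web"}
```

Pass the result as the metadata bind value instead of the literal '{}' used above.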

Step 3: Build semantic memory with vectors

Use vector search for fuzzy retrieval of knowledge and tools. Oracle AI Database's vector index enables efficient similarity searches on embedded text.

First, create the vector table (via SQL).

-- Create semantic memory table with vector column
CREATE TABLE SEMANTIC_MEMORY (
  id         VARCHAR2(100) DEFAULT SYS_GUID() PRIMARY KEY,
  content    CLOB NOT NULL,
  embedding  VECTOR(768, FLOAT32),  -- Matches the 768-dim paraphrase-mpnet-base-v2 model
  metadata   CLOB,
  source     VARCHAR2(100)
);

CREATE VECTOR INDEX idx_sem_vec ON SEMANTIC_MEMORY(embedding)
  ORGANIZATION INMEMORY NEIGHBOR GRAPH DISTANCE COSINE;

Output:

Table created.
Index created.

Now, integrate with LangChain for easy ingestion and search.

from langchain_oracledb.vectorstores import OracleVS
from langchain_community.embeddings import HuggingFaceEmbeddings

embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/paraphrase-mpnet-base-v2")

kb_vs = OracleVS(client=conn, embedding_function=embedding,
                 table_name="SEMANTIC_MEMORY")

# Ingest sample knowledge
kb_vs.add_texts(
  texts=["A tablespace can be online or offline for maintenance.", "LOB segments are indexed implicitly for performance."],
  metadatas=[{"source": "admin_guide"}, {"source": "dev_guide"}]
)

# Search example
results = kb_vs.similarity_search("database maintenance", k=1)
print(results[0].page_content)

Output:

A tablespace can be online or offline for maintenance.

Pitfall: Match embedding dimensions to your model (768 for paraphrase-mpnet-base-v2; 384 for smaller models like all-MiniLM-L6-v2); re-embed existing rows if you switch models to avoid drift.
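A cheap guard against dimension mismatches is to validate vectors before insert. A hedged sketch: the helper is illustrative, and you pass whatever dimension your VECTOR column declares.

```python
def check_dimension(vector, expected: int) -> list:
    # Reject vectors whose length differs from the VECTOR column's
    # declared size; a mismatch would otherwise fail at insert time
    # with a less obvious database error.
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dims, but the column expects {expected}")
    return list(vector)

# Example: validate against a 768-dim column before calling add_texts
vec = check_dimension([0.0] * 768, expected=768)
print(len(vec))  # 768
```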

Step 4: Unify with MemoryManager class

Encapsulate SQL and vector ops in a single class for clean agent integration. This abstraction simplifies the main loop, allowing agents to query memory without knowing the backend.

class MemoryManager:
    def __init__(self, conn, convo_table, kb_vs):
        self.conn = conn
        self.convo_table = convo_table
        self.kb_vs = kb_vs

    def read_convo(self, thread_id: str):
        with self.conn.cursor() as cur:
            cur.execute(
                f"SELECT role, content FROM {self.convo_table} WHERE thread_id=:t AND summary_id IS NULL ORDER BY timestamp",
                {"t": thread_id}
            )
            return cur.fetchall()

    def search_knowledge(self, query: str, k=3):
        return self.kb_vs.similarity_search(query, k=k)

    def write_convo(self, thread_id: str, role: str, content: str):
        return write_conversation(self.conn, thread_id, role, content)

# Example usage
mm = MemoryManager(conn, "CONVERSATIONAL_MEMORY", kb_vs)
history = mm.read_convo("thread123")
print(f"History length: {len(history)}")

knowledge = mm.search_knowledge("database tablespace")
print(knowledge[0].page_content)

Output:

History length: 1
A tablespace can be online or offline for maintenance.

This keeps the agent code focused on reasoning, not storage details.
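To show how an agent loop might consume the manager, here is a minimal prompt-assembly sketch. The template is illustrative, not a prescribed format; in practice `history` would come from `mm.read_convo(...)` and `knowledge` from `mm.search_knowledge(...)`.

```python
def build_prompt(history, knowledge, question: str) -> str:
    # history: list of (role, content) tuples from episodic memory
    # knowledge: list of retrieved fact strings from semantic memory
    turns = "\n".join(f"{role}: {content}" for role, content in history)
    facts = "\n".join(f"- {fact}" for fact in knowledge)
    return (f"Relevant knowledge:\n{facts}\n\n"
            f"Conversation so far:\n{turns}\n\n"
            f"User: {question}")

# Example with in-memory stand-ins for MemoryManager results:
prompt = build_prompt(
    [("user", "Hello, agent!")],
    ["A tablespace can be online or offline for maintenance."],
    "What is a tablespace?")
print(prompt)
```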

Step 5: Add summarization for context management

Long conversations exceed LLM context limits and increase costs. Automatically summarize threads when they exceed a threshold (e.g., 10 turns), store the summary as a vectorized entry, and link originals via summary_id.
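The trigger condition itself is simple enough to test in isolation. A sketch assuming two illustrative thresholds, turn count and a rough character budget; tune both for your model's context window.

```python
def should_summarize(history, max_turns: int = 10, max_chars: int = 8000) -> bool:
    # history is a list of (role, content) rows, as returned by read_convo.
    # Summarize when the thread is long by turn count...
    if len(history) >= max_turns:
        return True
    # ...or by raw size, so a few very long turns also trigger it.
    return sum(len(content) for _, content in history) >= max_chars

print(should_summarize([("user", "hi")] * 3))   # False
print(should_summarize([("user", "hi")] * 12))  # True
```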

Extend MemoryManager with summarization (requires OpenAI or local LLM).

from langchain_openai import OpenAI  # Or use a local model
from langchain_core.documents import Document
from langchain.chains.summarize import load_summarize_chain

def summarize_thread(mm, thread_id: str, llm):
    history = mm.read_convo(thread_id)
    if len(history) < 10:
        return None

    # Wrap each turn as a Document, as the summarize chain expects
    docs = [Document(page_content=f"{role}: {content}") for role, content in history]
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    summary = chain.run(docs)

    # Store the summary in semantic memory; add_texts returns the new row IDs
    summary_id = mm.kb_vs.add_texts(
        [summary], metadatas=[{"type": "summary", "thread_id": thread_id}]
    )[0]

    # Mark originals
    def mark_summarized(conn, thread_id: str, summary_id: str):
        with conn.cursor() as cur:
            cur.execute("UPDATE CONVERSATIONAL_MEMORY SET summary_id=:s WHERE thread_id=:t AND summary_id IS NULL",
                        {"s": summary_id, "t": thread_id})
        conn.commit()

    mark_summarized(mm.conn, thread_id, summary_id)
    return summary

# Example (set OPENAI_API_KEY env)
llm = OpenAI(temperature=0)
summary = summarize_thread(mm, "thread123", llm)
print(f"Summary: {summary}")

Output:

Summary: The user greeted the agent, which responded helpfully about database topics.

Pitfall: Choose chain_type="stuff" for short threads; with map_reduce, monitor token usage, since it issues multiple LLM calls per summary.

Step 6: Integrate learning with Tavily search

Agents should persist external knowledge to reduce API calls and improve speed. Use Tavily for web search, then embed and store results in semantic memory.

Set TAVILY_API_KEY environment variable.

import os
from tavily import TavilyClient

def learn_from_web(mm, query: str, max_results=3):
    client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
    response = client.search(query, max_results=max_results)

    for result in response["results"]:
        content = result["content"]
        # Embed and store
        mm.kb_vs.add_texts([content], metadatas=[{"source": "tavily", "query": query, "url": result["url"]}])

    # Return for immediate use
    return mm.search_knowledge(query)

# Example
web_knowledge = learn_from_web(mm, "best practices for AI agent memory")
print(web_knowledge[0].page_content[:100] + "...")

Output:

Hybrid memory systems combine relational and vector databases for robust AI agents...

This creates a feedback loop: Search once, reuse forever. Security note: Sanitize web content before storage to avoid injection.
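One way to act on that note is a small sanitizer run before add_texts. A sketch using only the standard library; the rules here are illustrative hygiene, not an exhaustive defense against prompt injection.

```python
import html
import re

def sanitize_web_content(text: str, max_len: int = 2000) -> str:
    # Drop HTML tags that may survive in scraped results
    text = re.sub(r"<[^>]+>", " ", text)
    # Decode entities like &amp; into plain characters
    text = html.unescape(text)
    # Remove control characters that could confuse downstream parsing
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", text)
    # Collapse whitespace and cap length before embedding/storage
    text = re.sub(r"\s+", " ", text).strip()
    return text[:max_len]

print(sanitize_web_content("<p>Hybrid memory &amp; vectors</p>"))
# Hybrid memory & vectors
```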

Pitfalls and patterns

  • Unbounded Growth: Implement summarization thresholds (e.g., every 10 turns) to prune history; monitor table sizes with SQL queries like SELECT COUNT(*) FROM CONVERSATIONAL_MEMORY.
  • Security: Encrypt PII in metadata CLOBs using Oracle Transparent Data Encryption. Use row-level security for multi-tenant agents. Audit access with Oracle Data Safe.
  • Embedding Drift: If updating the embedding model, add a version column and re-embed incrementally with a migration script.
  • Performance: Use an HNSW (in-memory neighbor graph) vector index for low-latency similarity search; batch inserts for high-volume ingestion. Test query latency with EXPLAIN PLAN.
  • Error Handling: Wrap DB ops in try-except; retry on transient errors like connection timeouts.
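The error-handling bullet can be packaged as a reusable decorator. A sketch: the transient exception tuple is illustrative, and for python-oracledb you would typically also catch oracledb.OperationalError.

```python
import functools
import time

def with_retry(max_attempts: int = 3, base_delay: float = 0.5,
               transient=(ConnectionError, TimeoutError)):
    """Retry transient failures with exponential backoff; re-raise the rest."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except transient:
                    if attempt == max_attempts:
                        raise  # retry budget exhausted; surface the error
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

# Example: wrap a flaky DB call
# @with_retry(max_attempts=3)
# def fetch_history(...): ...
```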

Next steps

Explore the full implementation in the tutorial notebook. Star the repo. Share your agent builds in the comments!
