DEV Community: Anya Summers

LangGraph persistence with Oracle AI Database

Anya Summers — Wed, 08 Jul 2026 15:58:59 +0000

Durable LangGraph workflows with Oracle AI Database

Key Takeaways

Oracle AI Database can be used as a persistence layer for LangGraph using langchain-oracle libraries. This persistence layer powers agent workflows with retries, audit trails, and repeatability.
OracleSaver preserves (checkpoints) graph state at a point in time. This state may be resumed, inspected, or replayed using the run’s unique thread_id.
OracleStore persists long-term, cross-thread memory, including user preferences, facts, and shared knowledge.

A human-in-the-loop graph is easy to demonstrate when the entire workflow stays in memory. The more realistic challenge starts when the graph pauses for a reviewer, waits for a business decision, and later resumes with the same state intact.

The langgraph_persistence sample shows how to build that pattern with LangGraph and Oracle AI Database. It evaluates a travel request, uses OCI Generative AI to draft a concise reviewer brief, checkpoints the graph state with langgraph-oracledb, interrupts for approval, resumes the same thread_id with the reviewer’s decision, and stores the approved record separately with OracleStore.

This sample keeps the workflow intentionally small so the persistence pattern is easy to follow: graph state is durable, human approval happens outside the running process, and the final approved request is stored as application data.

LangGraph persistence architecture using Oracle AI Database for checkpoints and durable application state.

Sample Description

This sample demonstrates a LangGraph travel approval workflow that persists graph state in Oracle AI Database using langgraph-oracledb.

The workflow evaluates a travel request, generates a reviewer brief with OCI Generative AI, and then pauses for human approval. While the graph is interrupted, LangGraph checkpoints the current state in Oracle AI Database. After a reviewer provides a decision, the workflow resumes using the same thread_id and continues from the saved state.
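The key idea is that state saved under a thread_id is all you need to pick a paused run back up. Below is a minimal, stdlib-only sketch of that idea. `InMemoryCheckpoints`, `run_until_review`, and `resume` are hypothetical names. An in-memory dict won't survive a process restart. In the sample, OracleSaver stores the snapshots in Oracle AI Database, and LangGraph's interrupt() and Command(resume=...) handle the pause and resume.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryCheckpoints:
    """Hypothetical stand-in for a checkpointer: thread_id -> state snapshots."""

    threads: dict[str, list[dict]] = field(default_factory=dict)

    def save(self, thread_id: str, state: dict) -> None:
        # Store a copy so later mutations don't rewrite earlier snapshots
        self.threads.setdefault(thread_id, []).append(dict(state))

    def latest(self, thread_id: str) -> dict:
        return dict(self.threads[thread_id][-1])


def run_until_review(cp: InMemoryCheckpoints, thread_id: str, request: dict) -> dict:
    """Run the automated steps, checkpoint, then stop and wait for a reviewer."""
    state = {"request": request, "status": "pending_review"}
    cp.save(thread_id, state)
    return state


def resume(cp: InMemoryCheckpoints, thread_id: str, decision: str) -> dict:
    """Reload the latest state for thread_id and apply the reviewer's decision."""
    state = cp.latest(thread_id)
    state["status"] = "approved" if decision == "approve" else "rejected"
    cp.save(thread_id, state)
    return state


checkpoints = InMemoryCheckpoints()
run_until_review(checkpoints, "trip-42", {"traveler": "sam", "amount": 4200})
# With a database-backed checkpointer, the process could exit here and a later
# process would continue from the saved snapshot for the same thread_id.
final_state = resume(checkpoints, "trip-42", "approve")
```

Each thread keeps its full snapshot history. This is what makes inspection and replay possible in addition to resume.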

LangGraph approval workflow with human review, checkpointing, and Oracle AI Database persistence.

When a request is approved, the sample also writes a separate approved record through OracleStore, including the original request, approval decision, policy reason, and generated brief.

The accompanying diagrams show how the Python sample, LangGraph, OracleSaver, OracleStore, Testcontainers, OCI Generative AI, and Oracle AI Database fit together.

Prerequisites

Python 3.13+
Poetry
Docker-compatible environment to run Oracle AI Database Free
Local OCI configuration for OCI Generative AI on-demand chat

Install dependencies from the python-oracle/ directory:

poetry install

Set the OCI compartment before running the command-line sample:

export OCI_COMPARTMENT_ID=<your-compartment-ocid>

The sample assumes an OCI Generative AI on-demand model and defaults to the model alias cohere.command-latest. It builds the regional service endpoint from the region in your DEFAULT OCI config profile, so no dedicated AI cluster endpoint is required.

Run the Sample

LangGraph approval workflow with automated routing, human review, and Oracle AI Database persistence.

From the python-oracle/ directory:

poetry run python src/python_oracle/langgraph_persistence/travel_approval_graph.py

The script starts Oracle AI Database Free with Testcontainers, creates the LangGraph checkpoint and store tables, runs the request, and prints the final outcome. For the default over-limit request, it also drafts an OCI-generated approval brief, pauses graph execution with interrupt(), prints a checkpoint summary from OracleSaver, resumes with Command(resume=...), and reads the approved business record back from OracleStore.

To run the rejection branch:

poetry run python src/python_oracle/langgraph_persistence/travel_approval_graph.py --reject

LangGraph bits

thread_id identifies the LangGraph run and is used for both the checkpoint summary and the resume command.
OracleSaver is the persistence checkpointer. It persists graph state so the human approval interrupt can survive outside the process.
OracleStore is the persistence store. The last graph node writes approved business records and the CLI reads the record back with store.get(...).
Runtime[ApprovalContext] carries the live ChatOCIGenAI model into the graph. The model is runtime context, not checkpointed graph state — it can be easily reconstructed on a new run.
OCI Generative AI drafts reviewer context only after policy says approval is required. Auto-approved requests skip the model call and go straight to finalization.

Building the graph with OracleStore and OracleSaver

The OracleStore and OracleSaver objects are easily constructed with a database connection string:

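# conn_string is an Oracle AI Database connection string defined elsewhere in the sample
with (
    OracleSaver.from_conn_string(conn_string) as checkpointer,
    OracleStore.from_conn_string(conn_string) as store,
):
    # Create the LangGraph checkpoint and store tables
    checkpointer.setup()
    store.setup()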

Then, the store and checkpointer can be used to build our graph:

def build_graph(checkpointer: OracleSaver, store: OracleStore):
    builder = StateGraph(ApprovalState, context_schema=ApprovalContext)
    builder.add_node("evaluate_policy", evaluate_policy)
    builder.add_node("draft_approval_brief", draft_approval_brief)
    builder.add_node("request_approval", request_approval)
    builder.add_node("finalize_request", finalize_request)
    builder.add_edge(START, "evaluate_policy")
    builder.add_conditional_edges(
        "evaluate_policy",
        route_after_policy,
        {"approval": "draft_approval_brief", "finalize": "finalize_request"},
    )
    builder.add_edge("draft_approval_brief", "request_approval")
    builder.add_edge("request_approval", "finalize_request")
    builder.add_edge("finalize_request", END)
    return builder.compile(
        checkpointer=checkpointer,
        store=store,
    )
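The conditional edge above depends on `route_after_policy` returning one of the path-map keys, "approval" or "finalize". The sample's implementation isn't shown here. The following is a hypothetical sketch: the `*_sketch` functions, the `PolicyState` fields, and the simple amount-versus-limit rule are all assumptions. It shows the LangGraph convention of a node returning a partial state update and a router returning a path label.

```python
from typing import Literal, TypedDict


class PolicyState(TypedDict, total=False):
    # Hypothetical fields; the sample's ApprovalState may differ
    amount: float
    limit: float
    requires_approval: bool
    policy_reason: str


def evaluate_policy_sketch(state: PolicyState) -> PolicyState:
    """Return a partial state update, as LangGraph nodes do."""
    over_limit = state["amount"] > state["limit"]
    reason = "amount exceeds policy limit" if over_limit else "within policy limit"
    return {"requires_approval": over_limit, "policy_reason": reason}


def route_after_policy_sketch(state: PolicyState) -> Literal["approval", "finalize"]:
    # Returned labels must match the keys of the add_conditional_edges path map
    return "approval" if state.get("requires_approval") else "finalize"


state: PolicyState = {"amount": 4200.0, "limit": 3000.0}
state.update(evaluate_policy_sketch(state))
next_step = route_after_policy_sketch(state)
```

An over-limit request routes to "approval", which leads to the OCI-drafted brief and the interrupt. An in-policy request goes straight to finalization without a model call.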

FAQ

Why use OracleSaver? OracleSaver is used as the LangGraph checkpointer. It persists graph state under a thread_id so the workflow can survive a pause and resume later.
Why use OracleStore? OracleStore stores the final approved business record. This is separate from checkpointing because application records and graph execution state have different purposes.
What is stored as the approved record? The approved record includes the travel request, approval decision, final status, policy reason, and generated approval brief.
What’s the difference between OracleSaver and OracleStore? The Saver (checkpointer) holds checkpoints of in-progress graph state for a single thread. The Store (durable memory) holds long-term data that outlives a thread, such as the approved records in this sample.
What data is checkpointed? Checkpointers like OracleSaver save a snapshot of the graph state at each step, organized by thread_id. Checkpoints enable human-in-the-loop workflows, time travel debugging, fault-tolerant execution, and conversational memory.
Why does thread_id matter? The thread_id identifies the series of checkpoints saved for one run of the graph. Given a thread_id, you can inspect, replay, or resume that run's state.
Why use Oracle AI Database persistence? Use persistence when you want to keep information beyond a single graph run. Persistence helps when you want to continue a conversation, resume after an interruption, recover from a failure, or remember information across interactions.

References

sample app -> travel_approval_graph.py
Oracle AI Database testcontainers integration -> oracle_database_container.py
Designing agent memory with Oracle AI Database
langchain-oracle

A tour of LangChain Oracle ingestion and retrieval

Anya Summers — Wed, 08 Jul 2026 15:57:47 +0000

A practical walkthrough for building a compact LangChain retrieval pipeline on Oracle AI Database, from document ingestion and vector storage to hybrid search, semantic caching, and chat history.

Key takeaways

langchain-oracledb connects LangChain workflows directly to Oracle AI Database. The article shows how Oracle-specific LangChain classes can load, split, store, search, cache, and persist chat-related data in one workflow.
OracleVS supports both storage and semantic retrieval. After embeddings are saved in Oracle AI Database, those vectors are used to run similarity search and return the nearest matching documents for a user question.
Hybrid retrieval improves answer selection. Reciprocal Rank Fusion (RRF) is a method for combining several ranked result lists into a single list. The sample combines semantic search from OracleVS with keyword/full-text search from OracleTextSearchRetriever, then fuses the results into a single best match.
Caching and chat history are built into the workflow. OracleSemanticCache avoids regenerating answers for similar questions, while OracleChatMessageHistory stores human and AI messages for conversation persistence.
The sample is designed to run locally. It uses Testcontainers with Oracle AI Database Free, a local embedding model, Python 3.13+, Poetry, and a Docker-compatible environment.

The langchain-oracledb package makes it easy to integrate LangChain concepts with your Oracle AI Database instance.

In this article, we’ll explore the langchain_retrieval sample, which uses langchain-oracledb for content retrieval. The sample composes several langchain-oracledb classes to load, split, save, and retrieve content. This post is a companion to my prior post, LangGraph persistence with Oracle AI Database.

LangChain and Oracle AI Database integration for hybrid retrieval, persistence, and answer generation.

These are the langchain-oracledb classes we’re going to use:

# langchain-oracledb imports
from langchain_oracledb.cache import OracleSemanticCache
from langchain_oracledb.chat_message_histories import OracleChatMessageHistory
from langchain_oracledb.document_loaders import OracleDocLoader, OracleTextSplitter
from langchain_oracledb.retrievers import OracleTextSearchRetriever
from langchain_oracledb.vectorstores import DistanceStrategy, OracleVS

This sample uses Testcontainers with Oracle AI Database Free and a local embedding model. If you want to jump to the code, start here: langchain_retrieval (GitHub)

Here’s what all the Oracle pieces look like together:

LangChain state management in Oracle AI Database, including retrieval, memory, and semantic caching.

Create a vector store: load, split, and embed content

On startup, our sample populates a vector store with documents from a database table. It does this in three steps: an OracleDocLoader loads the documents, OracleTextSplitter splits them into chunks, and OracleVS stores the chunks in a vector table. These utility classes replace hand-tuned logic, so you can build retrieval pipelines with very little code. Let’s break this down step by step.

Load data

The OracleDocLoader class can ingest documents from various sources. Here we use it to load documents from a database table that’s populated at app startup:

loader = OracleDocLoader(
    conn=conn,
    params={
        "owner": conn.username,
        "tablename": SOURCE_TABLE,
        "colname": "BODY",
        "mdata_cols": ["RUNBOOK_ID", "TITLE", "PRODUCT"],
    },
)

Using a database connection, the loader is provisioned with a table name and any relevant columns to load. From this information, it returns a list of LangChain Document objects that are usable in any LangChain workflow.
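Conceptually, the loader maps each row to a document. The document's text is the BODY column and its metadata holds the selected columns. Below is a stdlib sketch of that mapping. `Doc` is a hypothetical stand-in for LangChain's Document. The lowercasing of metadata keys is an assumption, inferred from later code that reads `metadata["runbook_id"]` after `_normalize_document` runs.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical stand-in for langchain_core.documents.Document
    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_docs(rows: list[dict], text_col: str, mdata_cols: list[str]) -> list[Doc]:
    """Map table rows to documents: one text column plus selected metadata columns."""
    return [
        Doc(
            page_content=row[text_col],
            # Lowercased keys are an assumption about the sample's normalization step
            metadata={col.lower(): row[col] for col in mdata_cols},
        )
        for row in rows
    ]


rows = [{
    "RUNBOOK_ID": 1,
    "TITLE": "Stabilize VPN over Wi-Fi",
    "PRODUCT": "network",
    "BODY": "Use this runbook when a VPN client disconnects every few minutes on Wi-Fi.",
}]
docs = rows_to_docs(rows, "BODY", ["RUNBOOK_ID", "TITLE", "PRODUCT"])
```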

Split text

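OracleTextSplitter breaks documents into chunks for embedding. Here we measure chunk size in words ("by": "words") and cap each chunk at 30 words ("max": 30). Splits are made at sentence boundaries ("split": "sentence"), and full normalization is applied ("normalize": "all"). The ids list keys each source document by its runbook ID.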

splitter = OracleTextSplitter(
    conn=conn,
    params={"by": "words", "max": 30, "split": "sentence", "normalize": "all"},
)
ids = [str(document.metadata["runbook_id"]) for document in source_documents]
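The word-capped, sentence-based splitting described above can be approximated in plain Python. This is a simplified sketch only: `chunk_by_words` is a hypothetical helper, not Oracle's chunking algorithm. Oracle's chunker applies its own rules, such as normalization. Here, whole sentences are packed into chunks until adding another would exceed the word cap.

```python
import re


def chunk_by_words(text: str, max_words: int = 30) -> list[str]:
    """Pack whole sentences into chunks of at most max_words words.

    A rough approximation of {"by": "words", "max": 30, "split": "sentence"}.
    A single sentence longer than max_words is hard-split on word boundaries.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        # Hard-split a sentence that alone exceeds the cap
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this sentence would overflow the current one
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


example = chunk_by_words("One two three. Four five. Six seven eight nine.", max_words=5)
```

With a cap of five words, the first two sentences share a chunk and the third starts a new one.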

Save and embed

With our loaded and split content, it’s time to save them into a vector store. The OracleVS class provides a nifty interface to do this for us, without much code:

embedding_model = embeddings or AllMiniLMEmbeddings()
return OracleVS(
    conn,
    embedding_model,
    table_name=VECTOR_TABLE,
    distance_strategy=DistanceStrategy.COSINE,
    mutate_on_duplicate=True,
)

Note that you should provide an embedding model when creating an OracleVS object. We’re using the popular local AllMiniLMEmbeddings model, which you can find on HuggingFace. We also supply a table name, a vector distance strategy, and an update policy.

Putting it all together

We now have all the pieces of an ingestion pipeline, using LangChain interfaces! Let’s see what it looks like, in one piece: load documents, split, and embed into OracleVS.

def load_source_documents(conn: oracledb.Connection) -> list[Document]:
    loader = OracleDocLoader(
        conn=conn,
        params={
            "owner": conn.username,
            "tablename": SOURCE_TABLE,
            "colname": "BODY",
            "mdata_cols": ["RUNBOOK_ID", "TITLE", "PRODUCT"],
        },
    )
    return [_normalize_document(document) for document in loader.load()]

def build_vector_store(
    conn: oracledb.Connection,
    source_documents: list[Document],
    embeddings: Embeddings | None = None,
) -> tuple[OracleVS, list[str]]:
    vector_store = create_vector_store(conn, embeddings)
    chunk_ids = add_documents_to_vector_store(conn, vector_store, source_documents)
    return vector_store, chunk_ids

def create_vector_store(
    conn: oracledb.Connection,
    embeddings: Embeddings | None = None,
) -> OracleVS:
    embedding_model = embeddings or AllMiniLMEmbeddings()
    return OracleVS(
        conn,
        embedding_model,
        table_name=VECTOR_TABLE,
        distance_strategy=DistanceStrategy.COSINE,
        mutate_on_duplicate=True,
    )

def add_documents_to_vector_store(
    conn: oracledb.Connection,
    vector_store: OracleVS,
    source_documents: list[Document],
) -> list[str]:
    splitter = OracleTextSplitter(
        conn=conn,
        params={"by": "words", "max": 30, "split": "sentence", "normalize": "all"},
    )
    ids = [str(document.metadata["runbook_id"]) for document in source_documents]
    return vector_store.add_documents(
        source_documents,
        text_splitter=splitter,
        ids=ids,
        add_chunk_metadata=True,
    )

The great thing about LangChain is that this is fairly compact. If I were to write this code without the library classes, it’d be several hundred lines of Python!

Now, let’s try retrieval and question answering

Retrieval is where things get a bit more interesting. langchain-oracledb offers a few nice helpers in this area:

OracleVS for semantic search
OracleSemanticCache to cache answers for semantically similar questions, with the cache's embeddings stored in Oracle AI Database
OracleTextSearchRetriever for keyword-based full-text search, which complements semantic search
OracleChatMessageHistory to persist chat histories

Let’s see how these fit together.

Semantic search

OracleVS, which helped us store embeddings, of course also allows us to retrieve them. This is quite easy using the Python interface:

def semantic_search(
    vector_store: OracleVS,
    question: str,
    *,
    product_filter: str | None = None,
    k: int = 4,
) -> list[RunbookHit]:
    metadata_filter = {"product": {"$eq": product_filter}} if product_filter else None
    documents_with_scores = vector_store.similarity_search_with_score(
        question,
        k=k,
        filter=metadata_filter,
    )
    return [
        _document_hit(document, score, "semantic", rank)
        for rank, (document, score) in enumerate(documents_with_scores, start=1)
    ]

We use an optional metadata filter to further refine our results, and return the nearest K results to the input question. Note that the input question is embedded using the same embedding model we initialized the vector store with.

Keyword search

We can supplement semantic search with text search using OracleTextSearchRetriever, which returns a separate score. This is done using text operators in the database to find relevant content:

def keyword_search(vector_store: OracleVS, question: str, k: int = 4) -> list[RunbookHit]:
    retriever = OracleTextSearchRetriever(
        vector_store=vector_store,
        k=k,
        return_scores=True,
    )
    return [
        _document_hit(document, float(document.metadata.get("score", 0)), "keyword", rank)
        for rank, document in enumerate(retriever.invoke(question), start=1)
    ]
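Oracle Text uses its own relevance scoring, which this does not reproduce. The naive term-overlap scorer below is a stand-in that shows why lexical matching complements embeddings: exact tokens such as "Wi-Fi" or "VPN" count directly. `keyword_rank` and its tokenizer are hypothetical.

```python
import re

TOKEN = re.compile(r"[a-z0-9-]+")


def keyword_rank(question: str, docs: dict[str, str], k: int = 4) -> list[tuple[str, int]]:
    """Rank docs by how many distinct question terms they contain (a naive lexical score)."""
    terms = set(TOKEN.findall(question.lower()))
    scored = []
    for doc_id, text in docs.items():
        overlap = len(terms & set(TOKEN.findall(text.lower())))
        if overlap:  # documents sharing no terms are not returned
            scored.append((doc_id, overlap))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


docs = {
    "vpn": "Stabilize VPN over Wi-Fi when the client disconnects",
    "printer": "Clear a printer queue",
}
ranked = keyword_rank("VPN disconnects on Wi-Fi", docs)
```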

Fusing results

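Taking results from both similarity and full-text search, we accumulate them into one result set. Each hit adds 1 / (rank + 1) to its runbook's score, which is Reciprocal Rank Fusion with a constant of 1. As a result, a runbook ranked highly by both methods comes out on top. Each hit also records which search method produced it, which helps the consumer judge relevancy: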

def fuse_hits(semantic_hits: list[RunbookHit], keyword_hits: list[RunbookHit]) -> RunbookHit:
    if not semantic_hits and not keyword_hits:
        raise RuntimeError("No runbook matched the question.")
    by_runbook: dict[int, RankAccumulator] = {}
    for hit in [*semantic_hits, *keyword_hits]:
        accumulator = by_runbook.setdefault(hit.runbook_id, RankAccumulator(hit))
        accumulator.score += 1.0 / (hit.rank + 1)
    return max(by_runbook.values(), key=lambda entry: entry.score).hit
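The general form of Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) across the ranked lists that contain it. The constant k is often set to 60. Below is a small, self-contained sketch of that formula. `rrf` is a hypothetical helper that returns the full fused ordering, not just the single best hit. Setting k=1 reproduces the weights used by `fuse_hits`.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank) over every list containing d."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic = ["vpn", "dns", "proxy"]
keyword = ["vpn", "proxy"]
fused_like_sample = rrf([semantic, keyword], k=1)  # same weights as 1 / (rank + 1)
fused_default = rrf([semantic, keyword])
```

With either constant, "vpn" wins because both methods rank it first. A larger k shrinks the score gap between adjacent ranks, so agreement across lists matters more than a single top position.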

Putting retrieval together to answer a question

We now have all the components needed to scaffold a basic question answering function. Our answer_question method combines OracleVS semantic search, OracleTextSearchRetriever full-text search, and OracleSemanticCache for answer caching:

def answer_question(
    conn: oracledb.Connection,
    vector_store: OracleVS,
    question: str,
    *,
    embeddings: Embeddings | None = None,
    product_filter: str | None = None,
) -> QuestionResult:
    embedding_model = embeddings or AllMiniLMEmbeddings()
    cache = OracleSemanticCache(
        conn,
        embedding_model,
        table_name=CACHE_TABLE,
        score_threshold=0.001,
    )
    cached_generations = cache.lookup(question, LLM_CACHE_KEY) or []
    cache_hit = bool(cached_generations)
    semantic_hits = semantic_search(vector_store, question, product_filter=product_filter)
    keyword_hits = keyword_search(vector_store, question)
    fused_hit = fuse_hits(semantic_hits, keyword_hits)
    answer = cached_generations[0].text if cached_generations else build_answer(question, fused_hit)
    if not cached_generations:
        cache.update(question, LLM_CACHE_KEY, [Generation(text=answer)])
    history = OracleChatMessageHistory(
        SESSION_ID,
        client=conn,
        table_name=HISTORY_TABLE,
    )
    if not cache_hit:
        history.add_messages([HumanMessage(content=question), AIMessage(content=answer)])
    return QuestionResult(
        question=question,
        answer=answer,
        cache_hit=cache_hit,
        semantic_hits=semantic_hits,
        keyword_hits=keyword_hits,
        fused_hit=fused_hit,
        history_count=len(history.messages),
    )

In a semantic cache, the score_threshold determines how similar an entry must be to count as a cache hit. Because every lookup embeds the incoming question first, expect some additional latency on each cache lookup.

Time to take the sample for a spin

Let’s run it locally. You’ll need the following prerequisites, as the example spins up a disposable Oracle AI Database Free container:

Python 3.13+
Poetry
Docker-compatible environment

From the python-oracle/ directory, install dependencies:

poetry install

Then run the sample:

poetry run python src/python_oracle/langchain_retrieval/runbook_retrieval.py

The script starts the full Oracle AI Database Free image. Expected output is similar to:

#### Loaded runbooks into Oracle AI Database ####
Source runbooks: 4
Vector chunks:   12

#### Retrieval ####
Question:      My VPN disconnects every few minutes on Wi-Fi, but it stays connected on Ethernet. What should I try?
Semantic top:  Stabilize VPN over Wi-Fi
Keyword top:   Stabilize VPN over Wi-Fi
Fused top:     Stabilize VPN over Wi-Fi

#### Response Persistence ####
For: My VPN disconnects every few minutes on Wi-Fi, but it stays connected on Ethernet. What should I try?
Use runbook: Stabilize VPN over Wi-Fi.
Why: it matches the network product area and says to Use this runbook when a VPN client disconnects every few minutes on Wi-Fi but stays connected on Ethernet.

Chat history messages: 2
Second lookup used OracleSemanticCache: True

The chunk count may vary if Oracle AI Database chunking behavior changes, but it should be greater than the four source runbooks.


FAQs

What problem does this article solve? It demonstrates how to build a LangChain-based retrieval workflow using Oracle AI Database for ingestion, vector storage, search, caching, and chat history.
What is langchain-oracledb used for? It provides Oracle Database integrations for LangChain concepts, including document loaders, text splitters, vector stores, retrievers, semantic cache, and chat message history.
How does the sample ingest documents? It reads documents from a database table using OracleDocLoader, includes selected metadata columns, normalizes the documents, splits them into chunks, and stores the chunks in a vector table.
Why does the article split text before embedding it? Splitting turns larger documents into smaller chunks that are better suited for embedding and retrieval; the sample uses word-based splitting with sentence boundaries and normalization.
What embedding model does the sample use? The sample uses a local AllMiniLMEmbeddings model by default, though the vector store creation function can accept another embedding model.
What is the difference between semantic search and keyword search here? Semantic search finds meaning-based matches through vector similarity, while keyword search uses Oracle text search capabilities to find lexical matches; the sample combines both approaches.
How are final retrieval results chosen? The sample ranks semantic and keyword hits, accumulates scores by runbook, and selects the strongest fused result as the best match for the question.
What does OracleSemanticCache add? It stores and retrieves prior answers for semantically similar questions, so repeated or near-repeated queries can reuse cached responses instead of rebuilding the answer.
What does OracleChatMessageHistory add? It persists the user question and AI response in Oracle Database, giving the workflow a durable chat history.
Who is this article most useful for? It is most useful for developers building RAG-style applications with LangChain who want Oracle AI Database to handle document ingestion, vector search, hybrid retrieval, caching, and conversation persistence.

Is Oracle AI Database the best choice for small to mid size shops?

Anya Summers — Wed, 08 Jul 2026 15:55:17 +0000

Developers, startups, and small-to-medium-sized businesses (SMBs) all benefit from a converged data platform that scales from free development to global production.

Key Takeaways

Oracle AI Database is worth a serious look for small and midsize teams when one application needs relational data, JSON, search, vectors, spatial queries, REST endpoints, and internal app tooling without operating a stack of separate services.
The free path lets you verify it: local containers, FreeSQL, SQLcl, SQL Developer, and Always Free Autonomous AI Database let you prove out a workload before paying.
The free path isn’t everything: Resource limits, Always Free cloud constraints, backups, networking, and operational skills still matter.
Oracle AI Database scales from local development and free tiers to enterprise deployments without changing database platforms.
Oracle’s multicloud story is stronger than you might think: Oracle AI Database can be deployed across AWS, Azure, and Google Cloud, though deployment options differ somewhat between providers.
I’m not saying “Oracle for everything,” but consider Oracle when its capabilities meet your needs: fewer databases, fewer synchronization layers, and less operational overhead. Try it with a sample app.

Comparing multi-service architectures with a consolidated Oracle AI Database approach.

Oracle AI Database isn’t an automatic “best database” choice for every app, side project, or business system. If you need the absolute simplest embedded database, use SQLite. If your team already knows Postgres, that might be the shortest path.

However, if your team needs one database that can cover normal relational data, JSON documents, search, spatial data, graph relationships, events, REST endpoints, APEX apps, local testing, and a managed cloud path, Oracle AI Database Free deserves a look.

Everyone knows Oracle can run giant enterprise systems, but people generally assume this comes with complexity. People miss that a hobbyist or small-to-medium-size business can start small, paying little-to-nothing, and scale massively on a single, converged data platform.

Where Oracle May Surprise You

Your app may start with customers and invoices. Then someone asks for geofenced service regions. Then support tickets need full-text search. Then a feature asks for semantic search over notes. Then an internal team wants a quick admin app. Then you need REST endpoints for a partner integration. Then the owner wants reporting.

Each of those requests could nudge you to another service to deploy, operate, and patch. This results in ever-increasing operational overhead and architectural complexity.

With Oracle AI Database, more of that can stay close to the data:

Relational tables for core records
JSON for flexible payloads
Full-text and hybrid search, i.e., combining search methods like vector and full-text
Spatial for location-aware queries
Property graph features for relationship-heavy questions
Events and database-backed messaging patterns, i.e., combining messaging with database transactions for atomic workflows.
Vector search for embedding-driven retrieval
ORDS for REST and SQL Developer Web workflows
APEX for browser apps and internal tools

I’m not saying you have to use every feature: the options are there, and should be considered before adding another moving part from a third-party vendor.

Guidance on when Oracle AI Database Free is appropriate versus when production deployments are needed.

It’s the difference between “we can ship this in the database we already run” vs. “we need to run and sync another service forever”.

The Free Tooling Is Excellent

In 5 Free Oracle AI Database Dev Tools I’d Put in a Starter Kit, I laid out the starter kit I use day-to-day. You can also use these tools to prove out your workload. Start small, then scale to cloud:

Local:

Oracle AI Database Free container images: local or small development, POCs, and testing.
SQLcl MCP Server: MCP integrations with your database.
SQL Developer: When you need a developer UI for your database.

Cloud:

FreeSQL: free online labs, SQL testing, and disposable database connections for learning.
Always Free Autonomous AI Database: managed-cloud free tier for development, POCs, and testing.

Use the container when you need repeatable local development or disposable integration tests. Use FreeSQL when the right setup is no setup: a browser-based SQL environment with a personal schema for learning, examples, or quick query checks. Use SQLcl when you want scripts, automation, setup validation, exports, loads, and command-line repeatability.

SQLcl also matters for AI-assisted development because it includes an MCP server. That gives tools like Codex or Claude Code a structured way to inspect schema metadata and run database operations through saved SQLcl connections instead of guessing from stale prompt context.

Use SQL Developer when you need to browse schemas visually, inspect rows, debug SQL or PL/SQL, or explain something on a screen share. Use Always Free Autonomous AI Database when the app needs a persistent managed database in OCI for demos, APEX, ORDS, wallet-based connectivity, or cloud deployment validation.

Run on any major hyperscaler

One practical change from the old Oracle mental model is that running Oracle AI Database no longer means running only on Oracle's own infrastructure.

Oracle can run on-premises, on Oracle Cloud Infrastructure (OCI), and on other cloud providers like AWS, Azure, and Google Cloud. If you’re already bought into one of the “big three” cloud providers, that doesn’t preclude you from using Oracle.


My Practical Recommendation

For hobbyists, I’d start with Oracle AI Database Free when you need something more than basic relational storage.

It’s a good fit if you want to learn serious database development, build a portfolio app, test SQL beyond toy examples, try vector search, expose REST endpoints, build APEX screens, or run realistic integration tests.

For small-to-mid-size businesses, I would evaluate it when the app is likely to need multiple data patterns but the team does not want to operate multiple databases and sync layers.

A good first POC is small:

Run the local container.
Create the first schema.
Connect with SQLcl.
Browse it with SQL Developer.
Add one feature that would otherwise require another service.
Rebuild the environment from scripts.
Try the same app against Always Free Autonomous AI Database.

Final Thoughts

Oracle is an exceptional database when you want one data platform that can work with any kind of data, starts free, scales globally, and can absorb almost any app's needs without adding another service.

Start free. Prove one workflow. Consolidate services and data onto one platform. Scale globally.

FAQ

Is Oracle AI Database really a good choice for a hobby project?

Definitely, if you want to test against a real database, try vector search, build APEX screens, expose REST endpoints, or keep multiple data patterns in one place. If you only need a local file-backed store, SQLite is usually simpler.

Is this just for Oracle shops?

No. A neutral team should evaluate it with containers, SQLcl, SQL Developer, FreeSQL, and Always Free Autonomous AI Database before committing.

Does multicloud support mean I can ignore cloud choice?

No. Oracle Database offerings across AWS, Azure, and Google Cloud make the deployment story more flexible, but they are not identical checkboxes. Regions, service availability, preview status, networking, support, pricing, and operational ownership still need review.

Can a small business run production on the free tier?

Free tiers are great for learning, demos, and small hobby workloads, but not as a reliable production resource. Production data needs backup and restore expectations, monitoring, access control, scaling guarantees, security patching, recovery testing, support, and a clear owner.

When would I avoid Oracle AI Database?

Avoid it when the app only needs the simplest persistence layer, when the team already has a database workflow that fits, or when the Oracle learning curve would slow delivery more than consolidation helps. A good database choice (and architecture in general) makes your life simpler.

What are the limits of Oracle AI Database Free and Always Free Autonomous AI Database?

Refer to the Free FAQ

How does Oracle AI Database compare with PostgreSQL for SMB workloads?

Choose Oracle AI Database if regulated data, strong built-in security, or mixed operational and analytical workloads are a priority, or if you want native AI, vector, and agent workflows governed inside the database.

Do I need Oracle-specific skills to get started?

No. It’s best to get started with the free tier, experiment, and see how your business can benefit.


What Is a Converged Database? Definition, Five Tests, and AI Use Cases

Anya Summers — Mon, 06 Jul 2026 16:09:42 +0000

A converged database is a single database engine that natively supports multiple data models - relational, document/JSON, graph, vector, spatial, and text - under one optimizer, one transaction boundary, one consistency model, and one security and governance domain, exposed through the access surfaces developers expect, including SQL, document APIs, and REST.

In the Oracle AI Database context, this matters because vector search, JSON/document access, graph patterns, and relational joins can be treated as one data architecture rather than a chain of specialized services.

Many databases can store several of these models. The qualifier that matters isn't the list of models. It's everything after the list (one optimizer, one transaction boundary, one consistency model, one security and governance domain), because that is where multi-store architectures incur their costs.

Answer box - the short version
What it is: one database engine in which relational tables, JSON documents, graphs, vectors, spatial data, and text share the same transactions, the same query optimizer, the same consistency guarantees, and the same security model.
Why it matters: AI and modern operational workloads need retrieval that is simultaneously fresh, governed, and joined across data models - properties that are difficult to assemble from separate specialized stores connected by synchronization pipelines.
How it differs from multi-model: multi-model describes what a product can store; converged describes which guarantees span the models. Storing several models is now common; one transaction boundary, one optimizer, and one governance domain across them is not.

Key takeaways

A converged database is defined by cross-model guarantees, not just by storage support.
The five tests are one transaction boundary, one optimizer, one consistency model, one governance domain, and shared access surfaces.
For RAG and AI agent workloads, convergence matters because retrieval must be fresh, governed, and joined with operational data.
Multi-model storage is common; cross-model guarantees are the differentiator.

Converged database vs multi-model database vs vector database

Database pattern	What it means	AI workload implication
Converged database	One engine supports multiple data models under shared transaction, optimizer, consistency, and governance guarantees.	Useful when RAG or agents need retrieval that is fresh, governed, and joined with operational context.
Multi-model database	One product can store several models, but the guarantees may stop at model boundaries.	Useful for consolidation, but not sufficient if cross-model queries, rollback, or access control must behave as one system.
Vector database	A specialized system optimized for embedding similarity search.	Useful for standalone similarity serving; less complete when answers also need live predicates, transactions, and relational joins.

One disambiguation before we start: this is about converged databases, not converged infrastructure. Hyperconverged infrastructure is a hardware story about collapsing compute, storage, and networking. This is a data architecture story about collapsing several database engines into one.

Everything in this article runs. Each claim maps to a numbered assertion in a public companion repository, converged-database-lab, executed by CI against Oracle AI Database 26ai Free - a freely available container (See Oracle Database API for MongoDB, overview (includes beta-stage notes for $vectorSearch/$search/$changeStream)). The methodology section at the end describes how to reproduce every result.

Where did the term converged database come from?

Oracle introduced the term "converged database" around 2020. Maria Colgan's original definition - native support for all modern data types and development paradigms in a single engine (See M. Colgan, "What is a Converged Database?," March 2020) - described convergence primarily as consolidation: one system instead of five, fewer licenses, fewer backups, less integration plumbing.

Three developments since then changed convergence from a convenience into a structural property.

First, the SQL standard absorbed the models. SQL:2016 brought JSON operators into the language, and SQL:2023 added a native JSON type and an entire new part - ISO/IEC 9075-16, Property Graph Queries (SQL/PGQ) - bringing graph pattern matching into standard SQL (See ISO/IEC 9075:2023, SQL, including Part 16, Property Graph Queries (SQL/PGQ), June 2023; SQL/JSON operators in SQL:2016; native JSON type (T801) in SQL:2023. Summary: P. Eisentraut, "SQL:2023 is finished: Here is what's new.") Graph traversal is no longer a separate database category's exclusive capability; it is a clause in the FROM list.

Second, AI workloads arrived with a requirement that multi-store architectures must engineer around: retrieval that is simultaneously fresh, governed, and joined. We return to this below.

Third, a rigorous academic argument for the same convergence pattern arrived independently. In "What Goes Around Comes Around... And Around..." (SIGMOD Record, June 2024; see the full version here), Michael Stonebraker and Andrew Pavlo - two of the field's most credentialed relational researchers - surveyed twenty years of data-model alternatives and concluded that document databases are "on a collision course with RDBMSs," whose differences "have diminished over time and should become nearly indistinguishable in the future." Vector databases, in their analysis, "are essentially document-oriented DBMSs with specialized ANN indexes" - indexes being "a feature, not the foundation of a new system architecture." On text search engines: "It would be valuable if RDBMSs had a better story for search so these would not have to be a separate product."

The term began as Oracle vocabulary. The architectural trajectory it names is now argued, on independent evidence, in the field's own literature.

What are the five tests for a converged database?

"Supports multiple models" is a property of a product's storage layer. Convergence is a property of its guarantees. Five testable criteria separate the two - each one demonstrated by a runnable, asserted proof in the companion repository.

1. One transaction boundary. A single ACID transaction can span a relational insert, a document write, a vector update, and the indexes that serve them - and a rollback reverts all of it atomically. This is the test most multi-model systems do not attempt: Lu and Holubová's survey of multi-model databases (ACM Computing Surveys, 2019) examined some twenty systems and reported finding no "explicit information about existence of a special type of transaction management" across data models (J. Lu and I. Holubová, "Multi-model Databases: A New Journey to Handle the Variety of Data," ACM Computing Surveys 52(3), Article 55, 2019.).

2. One optimizer. A cost-based planner produces a single costed plan for a statement that touches several models. This claim is checkable: the companion repository runs EXPLAIN PLAN over a statement combining a graph pattern, a JSON predicate, a vector distance ranking, and relational joins, and asserts that one plan tree covers all four (proof 5 below). If a "graph query" is an application-side loop over a service API, there is no such plan - there is a distributed system whose join order is hard-coded in application logic.

3. One consistency model. Read-your-writes holds across every model and every API, because no replication pipeline sits between the models - no change streams feeding a sidecar process, no oplog window, no reindex delay.

4. One security and governance domain. The same grants, the same row-level policies, the same audit stream cover the document API, the SQL interface, the vector search path, and the graph traversal, because each is a projection of the same engine over the same rows.

5. Shared access surfaces. SQL, a MongoDB-compatible document API, and REST operate on the same data as projections of one engine - rather than different engines behind one gateway.

Converged = multi-model + the guarantees: many data models pass through five gates - one transaction boundary, one optimizer, one consistency model, one governance domain, shared access surfaces - to qualify as converged; multi-model stops at the first gate.

The proof matrix (every script runs in CI; assertion counts per script):

Test	Proof	Assertions
One transaction boundary	`01-one-transaction-every-model.sql`	6
One engine under two APIs	`02-duality-roundtrip.js`	4
Cross-model statement	`03-one-optimizer.sql`	2
One consistency model	`04-read-your-writes.js`	2
One optimizer, one plan	`05-one-plan.sql`	6

The first proof is the one to internalize. Four writes - a relational order, its line item, a JSON document into a collection, and a vector embedding update - in one uncommitted transaction:

INSERT INTO orders (customer_id, store_id, status, total_amount)
VALUES (1, 1, 'placed', 99.99);

INSERT INTO order_items (order_id, line_no, product_id, qty, unit_price)
VALUES ((SELECT MAX(order_id) FROM orders), 1, 1, 1, 99.99);

INSERT INTO events (data) VALUES (JSON('{"type":"order_placed","channel":"lab","note":"document write, same txn"}'));

UPDATE support_tickets
   SET status = 'pending',
       embedding = TO_VECTOR('[0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5]', 8, FLOAT32)
 WHERE ticket_id = 1;
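To see the property in isolation, here is a minimal sketch that replays the same four writes against SQLite through Python's built-in sqlite3 module. SQLite is a swapped-in stand-in, not Oracle. The JSON document and the embedding are stored as plain text, so the sketch shows only the atomicity half of the first test (one BEGIN, four writes, one ROLLBACK), not native JSON or vector support. The table layouts are simplified assumptions.

```python
import json
import sqlite3

# SQLite is a stand-in engine here, not Oracle. isolation_level=None turns off
# the sqlite3 module's implicit transactions so BEGIN/ROLLBACK are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         store_id INTEGER, status TEXT, total_amount REAL);
    CREATE TABLE order_items (order_id INTEGER, line_no INTEGER, product_id INTEGER,
                              qty INTEGER, unit_price REAL);
    CREATE TABLE events (data TEXT);  -- JSON document kept as text
    CREATE TABLE support_tickets (ticket_id INTEGER PRIMARY KEY, status TEXT,
                                  embedding TEXT);  -- vector kept as a JSON array
    INSERT INTO support_tickets VALUES (1, 'open', NULL);
""")


def snapshot():
    """Row counts of the three inserted-into tables, plus ticket 1's (status, embedding)."""
    return (
        conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0],
        conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0],
        conn.execute("SELECT COUNT(*) FROM events").fetchone()[0],
        conn.execute("SELECT status, embedding FROM support_tickets WHERE ticket_id = 1").fetchone(),
    )


before = snapshot()
conn.execute("BEGIN")
conn.execute("INSERT INTO orders (customer_id, store_id, status, total_amount) "
             "VALUES (1, 1, 'placed', 99.99)")
order_id = conn.execute("SELECT MAX(order_id) FROM orders").fetchone()[0]
conn.execute("INSERT INTO order_items VALUES (?, 1, 1, 1, 99.99)", (order_id,))
conn.execute("INSERT INTO events (data) VALUES (?)",
             (json.dumps({"type": "order_placed", "channel": "lab"}),))
conn.execute("UPDATE support_tickets SET status = 'pending', embedding = ? WHERE ticket_id = 1",
             (json.dumps([0.5] * 8),))
during = snapshot()       # this connection sees all four uncommitted writes
conn.execute("ROLLBACK")  # one statement undoes all four
after = snapshot()

print(before)           # (0, 0, 0, ('open', None))
print(after == before)  # True
```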

Now consider the same four writes spread across a typical specialized stack - DynamoDB for the order, OpenSearch for the searchable event, Pinecone for the embedding. Each system's atomicity stops at its own boundary: DynamoDB's TransactWriteItems groups up to 100 actions within DynamoDB (See Amazon DynamoDB Developer Guide, "Amazon DynamoDB transactions."), and Pinecone documents that its index updates are eventually consistent, providing a freshness-check mechanism precisely because written data is not immediately visible (See Pinecone documentation, "Check data freshness."). No transaction API spans the three systems. Across separate systems, rollback becomes an application-level compensation problem - code that must be written, tested, and maintained for every failure mode.
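For contrast, here is a minimal sketch of what that application-level compensation looks like. FakeStore and write_all_or_compensate are hypothetical names, not any vendor's client API. Each store commits its own write immediately, so undoing a partial failure means restoring every earlier write by hand, in reverse order.

```python
# Hypothetical stand-ins for three independent systems; not real client APIs.
_MISSING = object()


class FakeStore:
    """One system that commits each write on its own; no shared transaction."""

    def __init__(self, name, fail_on_write=False):
        self.name = name
        self.data = {}
        self.fail_on_write = fail_on_write

    def put(self, key, value):
        if self.fail_on_write:
            raise RuntimeError(f"{self.name} unavailable")
        previous = self.data.get(key, _MISSING)
        self.data[key] = value  # visible to other readers immediately
        return previous

    def restore(self, key, previous):
        # Compensation: delete an insert, or put back the overwritten value.
        if previous is _MISSING:
            self.data.pop(key, None)
        else:
            self.data[key] = previous


def write_all_or_compensate(stores, key, values):
    """Apply one write per store; if any fails, undo the earlier ones in reverse."""
    done = []  # (store, previous value) for each write that succeeded
    try:
        for store, value in zip(stores, values):
            done.append((store, store.put(key, value)))
    except RuntimeError:
        for store, previous in reversed(done):
            store.restore(key, previous)  # in practice this must also be retried
        return False
    return True


orders = FakeStore("orders")
search = FakeStore("search")
search.data["order-1"] = {"note": "pre-existing"}
vectors = FakeStore("vectors", fail_on_write=True)

ok = write_all_or_compensate(
    [orders, search, vectors], "order-1",
    [{"status": "placed"}, {"type": "order_placed"}, [0.5] * 8],
)
print(ok, orders.data, search.data)  # False {} {'order-1': {'note': 'pre-existing'}}
```

Even this compensation path ignores the hard cases: a crash between the failed write and the undo, an undo that itself fails, and readers that saw the orders write before it was reverted. Each of those needs more code.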

The optimizer proof pairs a cross-model statement with its execution evidence. The statement traverses a referral graph, ranks the reachable customers' support tickets by vector similarity, and joins relational context:

WITH ring AS (
  SELECT DISTINCT cid FROM GRAPH_TABLE (customer_graph
    MATCH (a IS customers) -[IS referrals]->{1,4} (b IS customers)
    WHERE a.customer_id = 10
    COLUMNS (b.customer_id AS cid))
)
SELECT 'ASSERT:converged-query-returns:' ||
       CASE WHEN COUNT(*) > 0 THEN 'PASS' ELSE 'FAIL' END
FROM (
  SELECT c.customer_id
  FROM ring r
  JOIN customers c        ON c.customer_id = r.cid
  JOIN support_tickets st ON st.customer_id = c.customer_id
  WHERE st.status IN ('open','pending')
  ORDER BY VECTOR_DISTANCE(st.embedding,
           TO_VECTOR('[0.35,-0.35,0.35,-0.35,0.35,-0.35,0.35,-0.35]', 8, FLOAT32), COSINE)
  FETCH FIRST 10 ROWS ONLY
);

Note that the claim is the assertion itself. That is the contract of the companion repository: the article quotes tests, not aspirations.
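It helps to spell out what that one statement replaces. The following sketch computes the same kind of result in application code. It expands the referral graph one to four hops from customer 10, keeps open or pending tickets of reachable customers, ranks them by cosine distance, and takes the top 10. The toy data, names, and helper functions are assumptions for illustration. The point is that every step and the join order now live in application logic, which is the situation the "one optimizer" test describes.

```python
import math


def reachable_within(edges, start, min_hops=1, max_hops=4):
    """End nodes of referral walks of min_hops..max_hops edges from `start`,
    as a set (mirrors SELECT DISTINCT over MATCH (a)-[...]->{1,4}(b))."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
    frontier, result = {start}, set()
    for hop in range(1, max_hops + 1):
        # Nodes reachable by a walk of exactly `hop` edges.
        frontier = {dst for node in frontier for dst in adjacency.get(node, ())}
        if hop >= min_hops:
            result |= frontier
    return result


def cosine_distance(u, v):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def ring_tickets(edges, tickets, start, query, k=10):
    """Application-side version of the statement: graph hops, filter, rank, top k."""
    ring = reachable_within(edges, start)                           # graph step
    candidates = [t for t in tickets                                 # join + relational filter
                  if t["customer_id"] in ring and t["status"] in ("open", "pending")]
    candidates.sort(key=lambda t: cosine_distance(t["embedding"], query))  # vector ranking
    return candidates[:k]                                            # FETCH FIRST k ROWS ONLY


query = [0.35, -0.35] * 4
edges = [(10, 11), (11, 12), (12, 13), (13, 14), (14, 15), (20, 21)]
tickets = [
    {"ticket_id": 1, "customer_id": 11, "status": "open", "embedding": [0.35, -0.35] * 4},
    {"ticket_id": 2, "customer_id": 13, "status": "pending", "embedding": [0.5] * 8},
    {"ticket_id": 3, "customer_id": 15, "status": "open", "embedding": [0.35, -0.35] * 4},    # 5 hops away
    {"ticket_id": 4, "customer_id": 12, "status": "closed", "embedding": [0.35, -0.35] * 4},  # wrong status
]
print([t["ticket_id"] for t in ring_tickets(edges, tickets, start=10, query=query)])  # [1, 2]
```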

What does the engine do with such a statement? Oracle's documentation states that the GRAPH_TABLE operator "is internally translated into equivalent SQL" (See Oracle Database Property Graph Developer's Guide - GRAPH_TABLE operator and SQL translation.) - the graph pattern becomes relational algebra and is costed by the same optimizer as everything else. Proof 5 captures the evidence: EXPLAIN PLAN over a four-model statement (graph + JSON + vector + relational), with assertions that the plan references the graph's edge table, the JSON collection, the vector column's table, and the relational tables - in one plan tree:

| Id  | Operation                         | Name            |
|   0 | SELECT STATEMENT                  |                 |
|   1 |  COUNT STOPKEY                    |                 |
|   2 |   VIEW                            |                 |
|   3 |    SORT ORDER BY STOPKEY          |                 |  ← vector-distance ranking
|   4 |     HASH JOIN                     |                 |
|   5 |      HASH JOIN ANTI               |                 |  ← JSON NOT EXISTS
|   6 |       HASH JOIN                   |                 |
|   7 |        VIEW                       |                 |
|   8 |         HASH UNIQUE               |                 |
|   9 |          VIEW                     | CUSTOMER_GRAPH  |  ← the graph, as a row source
|  10 |           UNION-ALL               |                 |  ← {1,4} hops, unrolled
|  11 |            INDEX RANGE SCAN       | SYS_C008779     |
|  ...|            ... 1–4 hop joins over the referral edge index ...      |
|  28 |        TABLE ACCESS FULL          | CUSTOMERS       |  ← relational
|  29 |       TABLE ACCESS FULL           | EVENTS          |  ← JSON collection
|  30 |      TABLE ACCESS FULL            | SUPPORT_TICKETS |  ← vector column's table

(Illustrative output from DBMS_XPLAN.DISPLAY, abridged; plan hash 4056235962 on the lab container. System-generated index names such as SYS_C008779 vary per build; the proof's assertions therefore resolve index names through user_indexes rather than hard-coding them.)

Read what the plan shows: the graph quantifier {1,4} unrolls into a UNION-ALL of one- to four-hop joins over the referral edge index, appearing as an ordinary view row source named for the graph; the JSON predicate becomes a hash anti-join against the collection table; the vector ranking is a sort over the tickets table - one tree, one cost model. There is no federation seam in that plan, no per-model planner boundary, and no statistics boundary. That is the concrete meaning of "one optimizer."
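The unrolling the plan describes can be reproduced in miniature. This sketch writes the {1,4} quantifier out by hand as four UNION ALL branches of one- to four-way self-joins over an edge table, with DISTINCT standing in for the plan's HASH UNIQUE step. It runs on SQLite via Python's sqlite3 module as a stand-in, not Oracle, and the referrals(src, dst) layout is a hypothetical simplification. It illustrates the shape of the rewrite, not Oracle's actual translation.

```python
import sqlite3

# SQLite stand-in (not Oracle); table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE referrals (src INTEGER, dst INTEGER)")
conn.executemany("INSERT INTO referrals VALUES (?, ?)",
                 [(10, 11), (10, 12), (11, 12), (12, 13), (13, 14), (14, 15)])

# {1,4} unrolled by hand: one UNION ALL branch per hop count, where a k-hop
# path is a k-way self-join of the edge table. DISTINCT plays the role of the
# plan's HASH UNIQUE step.
UNROLLED = """
SELECT DISTINCT cid FROM (
    SELECT e1.dst AS cid FROM referrals e1
     WHERE e1.src = :start
    UNION ALL
    SELECT e2.dst FROM referrals e1
      JOIN referrals e2 ON e2.src = e1.dst
     WHERE e1.src = :start
    UNION ALL
    SELECT e3.dst FROM referrals e1
      JOIN referrals e2 ON e2.src = e1.dst
      JOIN referrals e3 ON e3.src = e2.dst
     WHERE e1.src = :start
    UNION ALL
    SELECT e4.dst FROM referrals e1
      JOIN referrals e2 ON e2.src = e1.dst
      JOIN referrals e3 ON e3.src = e2.dst
      JOIN referrals e4 ON e4.src = e3.dst
     WHERE e1.src = :start
)
"""
ring = sorted(row[0] for row in conn.execute(UNROLLED, {"start": 10}))
raw_rows = conn.execute(UNROLLED.replace("SELECT DISTINCT cid FROM (", "SELECT cid FROM (", 1),
                        {"start": 10}).fetchall()
print(ring)           # [11, 12, 13, 14, 15]
print(len(raw_rows))  # 8
```

Customers 12, 13, and 14 are each reached by two paths of different lengths. That is why the UNION ALL yields eight rows but only five distinct customers.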

How is a converged database different from a multi-model database?

It is not "multi-model." Multi-model means several data models are storable. Converged means the five tests pass. The distinction has academic prior art: the same 2019 survey that documented the transaction gap also ruled that an RDBMS storing another model's data without a cross-model query language and "optimization of query evaluation" is not meaningfully multi-model (J. Lu and I. Holubová, "Multi-model Databases: A New Journey to Handle the Variety of Data," ACM Computing Surveys 52(3), Article 55, 2019.). Storage is necessary, not sufficient. A later article in this series treats this distinction in full.

It is not a vector index with a database attached. Similarity search is one capability of an AI data architecture, not the architecture itself. The deeper requirements - filtered search against live relational predicates, access control enforced inside retrieval, embeddings updated in the same transaction as the facts they encode - are the qualifiers in the definition above. That argument, with the vendor-documented consistency behaviors of the specialized stores, is developed later in this series.

It is not five engines behind one API gateway. A unified API over separate engines unifies syntax and nothing else; the transaction boundary, the optimizer, the consistency model, and the governance domain remain fragmented.

To be precise about what the alternatives genuinely provide (this series does not argue against capabilities its subjects do not claim), here is what each vendor's own documentation states, as of June 2026:

MongoDB's multi-document ACID transactions are real, including across shards, with documented operational parameters (See MongoDB documentation, "Production Considerations" (transactions)). Its search and vector search, however, run in mongot, a separate Lucene-based process fed from the database by change streams; MongoDB's architecture documentation describes indexes "built from the data continuously sourced from the database," and its search documentation describes eventual consistency without read-after-write guarantees (See MongoDB documentation, "mongot Architecture" (search process, change-stream sourcing) and Atlas Search index performance (consistency)).
ArangoDB's single-server deployments offer genuine cross-model ACID transactions; its own documentation states the qualifiers that apply to sharded clusters (See ArangoDB documentation, "Transactions - Limitations.").
PostgreSQL's extension ecosystem (pgvector, PostGIS, and others) shares one transaction manager, one planner, and one security model - a real architectural achievement. The seams documented by the projects themselves concern optimization depth: pgvector's README notes that with HNSW indexes, filtering "is applied after the index is scanned," with iterative scan modes added as mitigation (See pgvector README (filtering and iterative scans)).
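The post-filtering caveat is easiest to see with numbers. This simulation (not pgvector code) models an approximate index that returns only its best 40 candidates before a selective WHERE predicate runs. The value 40 matches the default of pgvector's hnsw.ef_search setting. With 1 row in 10 matching, a LIMIT 10 query gets back only 4 rows. The data and function names are hypothetical.

```python
def post_filtered_search(rows_by_distance, predicate, k, candidate_list_size):
    """Simulated approximate search: the index yields its best
    `candidate_list_size` rows, and the WHERE predicate runs afterwards."""
    scanned = rows_by_distance[:candidate_list_size]
    return [row for row in scanned if predicate(row)][:k]


def exact_filtered_search(rows_by_distance, predicate, k):
    """Filter first, then take the k nearest: k rows whenever k rows qualify."""
    return [row for row in rows_by_distance if predicate(row)][:k]


def in_eu(row):
    return row["region"] == "eu"


# 100 rows already ordered by distance to the query; 1 in 10 is in region "eu".
rows = [{"id": i, "region": "eu" if i % 10 == 0 else "us"} for i in range(100)]

post = post_filtered_search(rows, in_eu, k=10, candidate_list_size=40)
exact = exact_filtered_search(rows, in_eu, k=10)
print(len(post), len(exact))  # 4 10
```

pgvector's iterative index scans address this by continuing the scan until enough rows pass the filter.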

Each of these systems passes some of the five tests. Converged means passing all five at once.

One further distinction: a converged database is not what results from storing JSON in an unindexed text or BLOB column. Storage without first-class semantics - native indexing, optimizer statistics, partial updates, path expressions in the query language - is storage, not support. A native JSON type with a binary representation designed for the engine (OSON, in Oracle's case), multivalue indexes, and full SQL/JSON integration is what "native" means here (See Oracle JSON-Relational Duality Developer's Guide (duality views, etags, _id requirement, documented restrictions); Oracle AI Vector Search overview).

Why did document databases diverge from relational databases?

A credible definition of convergence has to account for why divergence happened in the first place. I can speak to this directly: I was part of it.

The common relational account holds that NoSQL was a misunderstanding - developers drawn in by marketing ("SQL is slow," "ACID is optional") who needed twenty years to rediscover transactions. Even Stonebraker and Pavlo, in an otherwise rigorous paper, characterize the document movement as impedance-mismatch complaints plus marketing, and close the denormalization question with "the problems with denormalization/prejoining is an old topic that dates back to the 1970s" (See the source here).

The published record tells a more specific story. Werner Vogels documented the workload analysis behind Amazon's move: roughly 70 percent of Amazon's relational operations were single-row, key-value accesses, and another 20 percent returned rows from a single table (See W. Vogels, "A Decade of Dynamo," October 2017). At that scale, on workloads with known, fixed access patterns, distributed join cost was a measurable per-request tax. Denormalizing around the access pattern - the discipline that became single-table design, which I spent years building and teaching at AWS - was not a rejection of relational theory. It was engineering against the physics of the read path.

The citation in that SIGMOD passage repays a closer read. The reference for "settled in the 1970s" is E. F. Codd - the 1971 normalization paper, RJ909 (E. F. Codd, "Further Normalization of the Data Base Relational Model," IBM Research Report RJ909, 1971.). But Codd's contemporaneous writing frames stored redundancy as a workload-dependent tradeoff, not a prohibition. In his 1969 IBM research report: "Only in an environment with a heavy load of queries relative to the other kinds of interaction with the data bank would strong redundancy be justified in the stored set of relations" (See E. F. Codd, "Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks," IBM Research Report RJ599, August 1969 (reprinted in SIGMOD Record 38(1), 2009), §5.). In the 1970 CACM paper, he priced it: stored redundancy consumes "extra storage space and update time" in exchange for "a potential drop in query time" (See E. F. Codd, "A Relational Model of Data for Large Shared Data Banks," CACM 13(6), June 1970, §2.2.1.). That is a cost model - reads versus writes - and it is the same dial document data modelers have been turning for fifteen years. (A corollary, which is ours rather than Codd's: immutable data is the limiting case in which the update side of the tradeoff goes to zero and redundancy becomes nearly free.)
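Codd's framing reduces to a small cost model. Here is a toy, linear version: a redundant copy pays off when the read time it saves exceeds the update time and storage it costs. The linear form and the numbers are illustrative assumptions, not Codd's. The last call shows the immutable-data corollary, where the update term is zero.

```python
def redundancy_net_benefit(reads, writes, read_saving, update_cost, storage_cost=0.0):
    """Toy model of Codd's tradeoff for storing a redundant (pre-joined) copy.

    Gain: each read saves `read_saving` (the "potential drop in query time").
    Cost: each write pays `update_cost` to keep the copy consistent, plus a
    fixed `storage_cost` for the extra space. Units are arbitrary.
    """
    return reads * read_saving - writes * update_cost - storage_cost


print(redundancy_net_benefit(900, 100, 1.0, 5.0))  # 400.0   read-heavy: the copy pays off
print(redundancy_net_benefit(500, 500, 1.0, 5.0))  # -2000.0 balanced: the copy costs more
print(redundancy_net_benefit(10, 0, 0.5, 1000.0))  # 5.0     immutable data: no update cost
```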

Codd drew one more distinction in 1969 that the document era set aside and the converged era restores: the named set of relations - the logical model - versus the stored set, the physical representation (See E. F. Codd, "Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks," IBM Research Report RJ599, August 1969 (reprinted in SIGMOD Record 38(1), 2009), §5.). Keep the logical model normalized; let the stored representation serve the workload. Document databases collapsed that distinction: they won read locality at the price of data independence, because the schema was the access pattern.

Convergence, implemented carefully, rebuilds Codd's separation with modern machinery. In Oracle AI Database 26ai, a JSON Relational Duality View is a document that is its underlying rows - readable and writable as a document through the MongoDB-compatible API, fully normalized underneath, with lock-free optimistic concurrency via etags (See Oracle Database API for MongoDB, overview (includes beta-stage notes for $vectorSearch/$search/$changeStream); Oracle JSON-Relational Duality Developer's Guide (duality views, etags, _id requirement, documented restrictions); Oracle AI Vector Search overview). The proof script updates a customer's segment through the document API and reads the change back through SQL in the same engine:

// `col` is the duality-view collection handle, obtained earlier in the proof script.
// Update segment THROUGH THE DOCUMENT API...
col.updateOne({ _id: 42 }, { $set: { segment: 'vip' } });
const after = col.findOne({ _id: 42 });
print('ASSERT:dv-doc-updated:' + (after.segment === 'vip' ? 'PASS' : 'FAIL'));

// ...and read it back through SQL (the $sql stage) over the SAME connection (one engine underneath):
const rows = db.aggregate([{ $sql: 'SELECT segment AS "segment" FROM customers WHERE customer_id = 42' }]).toArray();
print('ASSERT:dv-sql-sees-doc-write:' + (rows.length === 1 && rows[0].segment === 'vip' ? 'PASS' : 'FAIL'));
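The mechanism behind that round trip can be sketched in a few lines of Python. This is a toy in-memory model of the idea, not how Oracle implements duality views. The rows are the only stored copy, the document is built from them on every read, and a document-level $set is decomposed into an update on the underlying row. The dictionaries and function names are hypothetical.

```python
# Rows are the only stored copy (hypothetical table layouts).
customers = {42: {"email": "customer42@example.com", "segment": "standard"}}
orders = {90: {"customer_id": 42, "status": "delivered", "total": 273.96}}


def customer_document(customer_id):
    """Build the document from rows on every read; nothing document-shaped is stored."""
    row = customers[customer_id]
    return {
        "_id": customer_id,
        "email": row["email"],
        "segment": row["segment"],
        "orders": [
            {"orderId": order_id, "status": o["status"], "total": o["total"]}
            for order_id, o in sorted(orders.items())
            if o["customer_id"] == customer_id
        ],
    }


def set_fields(customer_id, changes):
    """A document-level $set, decomposed into updates on the customers row.
    Only top-level customer fields are updatable in this sketch."""
    for field, value in changes.items():
        if field not in ("email", "segment"):
            raise ValueError(f"field {field!r} is not updatable here")
        customers[customer_id][field] = value


set_fields(42, {"segment": "vip"})       # write through the "document API"
print(customer_document(42)["segment"])  # vip  (document view)
print(customers[42]["segment"])          # vip  (row view, same single copy)
```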

Two qualifications, from the product documentation: duality views require an _id field as the document identifier, and they carry documented restrictions (among them, no Virtual Private Database policies on the view itself) (See Oracle JSON-Relational Duality Developer's Guide (duality views, etags, _id requirement, documented restrictions); Oracle AI Vector Search overview); and the MongoDB-compatible API documents $vectorSearch, $search, and $changeStream as beta features (See Oracle Database API for MongoDB, overview (includes beta-stage notes for $vectorSearch/$search/$changeStream)). Neither qualification affects the scripts above, which use general-availability surfaces only.

Model the domain. Project the access. The document patterns still matter: embedding, referencing, and bucketing are still bets on read/write ratios, and the dials Codd priced in 1970 still exist. What changed is the price of the joins that a pure document database forces you to design around. When joins are cheap, denormalization more often becomes a projection you declare instead of a copy you maintain.

Why does a converged database matter for RAG and AI agents?

AI did not create the multi-store consistency problem. It removed the tolerance for it.

A retrieval-augmented pipeline - and, more acutely, an agent that takes actions - needs context that is fresh, governed, and joined. Fresh: the embedding must reflect the row as it is now, not as of the last synchronization. Governed: the user's (or agent's) permissions must be enforced inside retrieval, not in application code that every access path is assumed to traverse. Joined: "similar documents" is rarely the production question; "similar documents for this customer, in this region, with an open ticket" is.

In a multi-store architecture, each of those properties is a pipeline. The vector index trails the operational store by design: Pinecone's documentation provides a data-freshness checking mechanism because updates are not immediately visible to queries (See Pinecone documentation, "Check data freshness."), and MongoDB's documentation describes search indexes continuously sourced from the database by a separate process, with eventual consistency and no read-after-write guarantee (See MongoDB documentation, "mongot Architecture" (search process, change-stream sourcing) and Atlas Search index performance (consistency)). When the index disagrees with the table, a language model does not become uncertain; it becomes confidently wrong about stale facts. I call this State Vector Dissonance - otherwise known as hallucinating with confidence. When the consumer is an agent with the authority to act, the staleness window converts directly into business risk.
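The staleness window is a property of the architecture, not of any one product, so it can be shown with a simulation. PipelineFedIndex and SameWriteIndex are hypothetical classes. The first feeds its index from a change queue that a background process drains later. The second updates table and index in the same write. Between the write and the sync, the pipeline-fed index answers with the old fact.

```python
from collections import deque


class PipelineFedIndex:
    """Table plus a separately maintained search index fed from a change queue,
    in the style of a CDC/change-stream pipeline. A simulation only."""

    def __init__(self):
        self.table = {}
        self.index = {}
        self.changes = deque()

    def write(self, key, value):
        self.table[key] = value
        self.changes.append((key, value))  # the index hears about it later

    def sync(self):
        # Stands in for the background process that applies queued changes.
        while self.changes:
            key, value = self.changes.popleft()
            self.index[key] = value

    def search(self, key):
        return self.index.get(key)


class SameWriteIndex:
    """Index maintained as part of the same write as the table: nothing to lag."""

    def __init__(self):
        self.table = {}
        self.index = {}

    def write(self, key, value):
        self.table[key] = value
        self.index[key] = value

    def search(self, key):
        return self.index.get(key)


piped = PipelineFedIndex()
piped.write("ticket-1", "open")
piped.sync()
piped.write("ticket-1", "resolved")  # the table now says resolved...
stale = piped.search("ticket-1")      # ...but retrieval still returns "open"
piped.sync()
fresh = piped.search("ticket-1")

same = SameWriteIndex()
same.write("ticket-1", "open")
same.write("ticket-1", "resolved")
print(stale, fresh, same.search("ticket-1"))  # open resolved resolved
```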

One request, two architectures: a polyglot stack assembles every answer in the application across an operational store, search, graph, vector, and cache services connected by query and CDC/ETL sync buses - every edge adds a network hop, serialization, sync lag, and stale reads. A converged engine answers the same request in one transaction with one optimizer.

The converged response to this problem is not a faster pipeline; it is the absence of one:

const evts = db.getCollection('events');
// Unique marker so the probe finds exactly the document it wrote.
const marker = 'rww-' + Math.floor(Math.random() * 1e9);
// Write through the document (MongoDB-compatible) API...
evts.insertOne({ type: 'consistency_probe', marker: marker });

// ...then immediately read through SQL, with no sync step or wait in between.
const viaSql = db.aggregate([
  { $sql: 'SELECT COUNT(*) AS "n" FROM events e WHERE e.data.marker.string() = \'' + marker + '\'' }
]).toArray();
print('ASSERT:read-your-writes-sql:' + (viaSql.length === 1 && Number(viaSql[0].n) === 1 ? 'PASS' : 'FAIL'));

A write through the document API is visible to the very next SQL statement, because nothing sits between them to lag. How agents should consume enterprise data (memory, retrieval, permissions, audit) is the subject of a later article in this series.

Example: one commerce domain across relational, JSON, graph, vector, and spatial data

Every article in this series runs against the same small domain in the companion repository: 200 customers, 1,000 orders, 300 support tickets with embeddings, a referral-and-device graph, store locations. It is a deliberately ordinary commerce domain that requires every model naturally: orders are relational, profiles are documents, fraud rings are graphs, ticket similarity is vectors, store proximity is spatial.

Here is customer 42 as the document API sees it - a duality view document, abridged from the live container:

{
  "_id": 42,
  "_metadata": { "etag": "E9BA8572B721D85E653B49930B83D911", "asof": "000000000022B79D" },
  "email": "customer42@example.com",
  "fullName": "Customer 42",
  "segment": "standard",
  "orders": [
    { "orderId": 90, "status": "delivered", "orderTs": "2026-05-22T00:00:00",
      "total": 273.96,
      "items": [ { "line": 1, "productId": 32, "qty": 3, "unitPrice": 273.96 } ] }
  ]
}

This document and the customers, orders, and order_items rows beneath it are the same logical data managed by one engine - no second persisted copy, no synchronization pipeline. The etag gives the document world lock-free optimistic concurrency; the rows give the relational world its constraints, its statistics, its joins (See Oracle JSON-Relational Duality Developer's Guide (duality views, etags, _id requirement, documented restrictions); Oracle AI Vector Search overview).

One truth, two shapes: customer 42's duality view document on the left and the customers, orders, and order_items rows it is built from on the right - the same data, one copy, no sync pipeline, with the etag providing lock-free optimistic concurrency.
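The etag's role can be sketched as optimistic concurrency in general. In this minimal, single-threaded sketch, every read returns a version token, and a write succeeds only if the token still matches the current document. No lock is held in between. The token here is a truncated SHA-256 of canonical JSON, chosen only to resemble the 32-character sample above. Oracle's actual etag computation is not described in this article, and all names are hypothetical.

```python
import hashlib
import json


class EtagMismatch(Exception):
    """Raised when a write carries an etag for a version that is no longer current."""


def etag_of(document):
    # Hypothetical etag: truncated SHA-256 of canonical JSON, upper-cased hex.
    # NOT Oracle's algorithm; it only needs to change whenever the content changes.
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32].upper()


class DocumentStore:
    def __init__(self):
        self.docs = {}

    def read(self, doc_id):
        doc = self.docs[doc_id]
        return dict(doc), etag_of(doc)  # a copy plus its version token

    def replace(self, doc_id, new_doc, expected_etag):
        # No lock is held between read() and replace(). Single-threaded sketch:
        # a real engine performs this check and the write atomically.
        if etag_of(self.docs[doc_id]) != expected_etag:
            raise EtagMismatch(doc_id)
        self.docs[doc_id] = new_doc


store = DocumentStore()
store.docs[42] = {"_id": 42, "segment": "standard"}

first, first_tag = store.read(42)
second, second_tag = store.read(42)  # a concurrent reader sees the same version

first["segment"] = "vip"
store.replace(42, first, first_tag)  # succeeds: unchanged since it was read

second["segment"] = "gold"
try:
    store.replace(42, second, second_tag)  # stale etag
    conflict = False
except EtagMismatch:
    conflict = True

print(store.docs[42]["segment"], conflict)  # vip True
```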

The lab's stated limits, for the record: Oracle AI Database 26ai Free is capped at 2 CPUs, 2 GB of database memory, and 12 GB of user data - sufficient for correctness proofs, deliberately unsuitable for benchmarks, which is why this article contains no performance numbers. The demonstration embeddings are 8-dimensional and deterministic so that CI results are reproducible; engine behavior is dimension-independent, and a real-model flow with in-database ONNX embedding generation is planned for a later module.
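One way to get deterministic, reproducible embeddings without a model is to derive them from a cryptographic hash. The following is a hypothetical generator, not the companion repository's. It turns SHA-256 output into an 8-dimensional unit vector and formats it as a bracketed literal like the ones passed to TO_VECTOR in the scripts above. The vectors carry no semantic meaning; they only guarantee that CI sees identical inputs on every run.

```python
import hashlib
import json
import math
import struct


def deterministic_embedding(text, dims=8):
    """Hypothetical generator: a reproducible unit vector derived from SHA-256.
    Same text -> same vector on every machine and every CI run; no model involved."""
    raw = b""
    block = 0
    while len(raw) < 4 * dims:  # 4 bytes per dimension
        raw += hashlib.sha256(f"{block}:{text}".encode("utf-8")).digest()
        block += 1
    ints = struct.unpack(f">{dims}I", raw[: 4 * dims])
    vec = [i / 2**32 * 2.0 - 1.0 for i in ints]      # map to [-1, 1)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard an all-zero vector
    return [round(x / norm, 6) for x in vec]


def to_vector_literal(vec):
    # Compact text form such as "[0.5,-0.25]".
    return json.dumps(vec, separators=(",", ":"))


v1 = deterministic_embedding("ticket 1")
print(len(v1), v1 == deterministic_embedding("ticket 1"))  # 8 True
```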

FAQ

What is a converged database? A single database engine that natively supports relational, document, graph, vector, spatial, and text data under one optimizer, one transaction boundary, one consistency model, and one security domain - exposed through the major access surfaces: SQL, document APIs, and REST. The defining property is not storing many models; it is that the guarantees span them.

What's the difference between a converged database and a multi-model database? Multi-model means multiple data models can be stored in one product. Converged means the architectural guarantees - transactions, optimization, consistency, governance - apply across those models. The 2019 ACM survey of multi-model systems reported no evidence of cross-model transaction management in the products it examined (J. Lu and I. Holubová, "Multi-model Databases: A New Journey to Handle the Variety of Data," ACM Computing Surveys 52(3), Article 55, 2019.); that gap is the line. Full comparison: later article in this series.

Do converged databases replace vector databases? For enterprise RAG and agent workloads, similarity search increasingly belongs beside the data it describes - filtered by live predicates, governed by the database's access controls, updated in the same transaction as the source rows. Specialized vector stores remain a defensible choice for standalone similarity serving without relational context. The architectural comparison appears later in this series.

Isn't "converged database" just an Oracle marketing term? The term originated at Oracle in 2020 (See M. Colgan, "What is a Converged Database?," March 2020). The architecture it names has a strong academic argument behind it: Stonebraker and Pavlo (SIGMOD Record, 2024) describe document databases converging with relational systems and vector search as a feature of existing engines (See the source here), and the SQL standard itself absorbed JSON (2016, native type 2023) and property graphs (See ISO/IEC 9075:2023, SQL, including Part 16, Property Graph Queries (SQL/PGQ), June 2023; SQL/JSON operators in SQL:2016; native JSON type (T801) in SQL:2023. Summary: P. Eisentraut, "SQL:2023 is finished: Here is what's new."). The vocabulary is a vendor's; the trajectory is documented in the field's literature.

Does convergence make document data modeling obsolete? No. Embedding, referencing, bucketing, and computing remain bets on read/write ratios and access patterns - the tradeoff Codd priced in 1970 (See E. F. Codd, "A Relational Model of Data for Large Shared Data Banks," CACM 13(6), June 1970, §2.2.1.) did not disappear. What convergence changes is the cost of being wrong: when documents are projections of normalized rows (duality views), changing the access pattern means changing the projection, not migrating the data. The patterns still matter; there are simply more tools now.

Why does a converged database matter for RAG? RAG systems need retrieved context that is current, permission-aware, and connected to operational facts. A converged database can reduce the synchronization gap between source rows, embeddings, and the queries that use them.

What is Oracle AI Database in this article? Oracle AI Database (specifically Oracle AI Database 26ai Free) is the database platform that the proof scripts and examples in this article run against, accessed through SQL, document/JSON, graph, vector, and related surfaces.

What is vector search in a converged database? Vector search ranks rows or documents by embedding similarity while remaining close to relational filters, access controls, transactions, and other database context.

How fresh are the methodology and proofs?

Every code sample in this article is a verbatim excerpt of a script in converged-database-lab, executed by GitHub Actions CI - on every change and on a nightly schedule - against Oracle AI Database 26ai Free (the gvenzl/oracle-free 23.26.x container line; year.quarter version tags correspond to the 26ai release). Current status: 5 proof scripts, 20 assertions, passing as of June 12, 2026 (including the day's scheduled nightly run). Reproduce it in three commands:

docker compose up -d --build oracle
pip install -r validator/requirements.txt
python validator/run.py
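The scripts report results by printing lines of the form ASSERT:<name>:PASS or ASSERT:<name>:FAIL. The internals of validator/run.py are not shown in this article, so the following is only a hypothetical sketch of how such output could be collected and counted. The regular expression and function name are assumptions.

```python
import re

# Assumed line shape, taken from the scripts quoted above: ASSERT:<name>:PASS|FAIL
ASSERT_LINE = re.compile(r"^ASSERT:(?P<name>[A-Za-z0-9_.-]+):(?P<result>PASS|FAIL)$")


def summarize(output):
    """Collect assertion results from a script's console output.
    Returns (passed, total, names_of_failures). Non-matching lines are ignored."""
    results = {}
    for line in output.splitlines():
        match = ASSERT_LINE.match(line.strip())
        if match:
            results[match.group("name")] = match.group("result")
    failures = [name for name, result in results.items() if result == "FAIL"]
    return len(results) - len(failures), len(results), failures


sample = """
ASSERT:dv-doc-updated:PASS
some unrelated driver output
ASSERT:dv-sql-sees-doc-write:PASS
ASSERT:read-your-writes-sql:FAIL
"""
print(summarize(sample))  # (2, 3, ['read-your-writes-sql'])
```

A runner built this way would exit with a nonzero status whenever the failure list is non-empty, which is what turns a failed assertion into a failed CI job.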

The Agent Communication Matrix: When MCP, A2A, and Plain REST Each Win

Anya Summers — Mon, 29 Jun 2026 16:04:28 +0000

Key Takeaways

Agent communication has three problems, not just one. Tool access, peer coordination, and system integration each need a different solution. Many production failures start when one protocol is asked to cover all three.
MCP and A2A are complements, not rivals. The Model Context Protocol (MCP; Anthropic, 2024) defines how models find and use tools. The Agent-to-Agent protocol (A2A; Google, 2025) defines how agents cooperate. In practice, agents use A2A for coordination and MCP for tool access.
Simple infrastructure still works well for many tasks. Message queues provide at-least-once delivery, dead-letter queues, and back-pressure out of the box. Getting the same behavior over MCP or A2A means hand-coding retry coordination, idempotency keys, and ordering. When the LLM acts as a worker, use a queue.
Three reference patterns cover most production needs. MCP-Centric Tool Access (one orchestrator, multiple tools), A2A Mesh with Oracle Memory (peer agents coordinating via task envelopes), and Queue-Backed Backoffice Agents (RabbitMQ workers writing to Oracle, without agent protocol). Each includes runnable Python in the companion repo.
The protocol layer can change; the memory layer should stay stable. The Oracle AI Database remains consistent across all three patterns (vector-indexed, transactional, audit-friendly). This consistency allows the protocol above to evolve while keeping the system of record intact.

The protocol you picked is doing three jobs at once

Three communication patterns for AI agents: tool access, peer coordination, and system integration.

Imagine your team created a multi-step research agent. It has three specialist sub-agents: a retriever, a synthesizer, and a reviewer. They connect over plain REST. It worked well in staging. But in production, p99 latency hit 14 seconds at the third hop. Retries piled up. A failed downstream call left the orchestrator with a half-written database row. The rollback logic, designed for a different failure mode, made things worse.

Then they implemented RabbitMQ. Latency stabilized, and throughput increased. Now, retry issues were someone else’s concern, which was the goal. But two weeks later, the security team filed a ticket. They asked which agent had touched which row during a specific time, and nobody could answer. Request-scoped tracing had disappeared into the queue.

The LLM-facing tool interface had splintered into six unique queue-message schemas, one for each specialist. None were introspectable by the model. The synthesizer agent started calling tools it didn’t know about and failed silently if they didn’t respond.

The team hadn’t chosen a bad protocol. They had picked the same protocol for three different jobs, twice in a row.

Agent communication isn’t just one problem. It involves tool access, peer coordination, and system integration. Each area needs its own solution.

Tool access occurs when a model needs to use a capability it lacks, like a SQL query or memory write. The Model Context Protocol (MCP) addresses this need. It’s often the first task for production agent systems.

Peer coordination happens when one agent assigns work to another. This isn’t just a function call; it’s a task with its own state and lifecycle. The second agent may work independently on this task. The Agent-to-Agent protocol (A2A) supports this, solving problems that the tool-call model can’t handle well.

System integration involves agents interacting with your broader infrastructure—databases, queues, services, scheduled jobs, and audit pipelines. For two decades, REST, message queues, and event buses have managed this. Often, the simplest solution is the best.

This article offers a framework to help you choose the right protocol for your needs. It includes three reference patterns for building with these protocols, each with runnable Python examples using Oracle AI Database. One constant remains true as the protocol layer evolves: the governed memory core.

The Agent Communication Matrix

Key insight: Protocols aren’t ranked on a single axis. They differ on five concrete attributes (interaction shape, streaming, reliability semantics, governance surface, and primary job), and the right choice is the one whose attribute profile matches the job.

Protocol	Primary job	Interaction shape	Streaming	Reliability semantics	Governance surface
MCP	Expose tools and resources to a model	Typed request/response: model calls a discoverable tool, server returns structured output	Native: supports streaming responses and progress notifications	At-most-once over HTTP/JSON-RPC; retries are the client’s job	Strong: tools self-describe via JSON Schema; capabilities are discoverable at connection time
A2A	Coordinate work between peer agents	Task-oriented: one agent submits a task, another reports state changes (submitted, working, completed, failed)	Native: status and artifact updates stream as the task progresses	At-most-once with task-level retry; tasks are addressable and resumable	Medium: agent cards declare capabilities, but task semantics are author-defined
REST	Service-to-service integration	Synchronous request/response: caller blocks until server returns	None native; long-poll or upgrade to SSE/WebSocket if needed	Best-effort; retry and idempotency are the application’s problem	Weak by default: OpenAPI helps, but it’s convention, not contract
Message queue	Hand work to a worker asynchronously	Fire-and-forget: producer drops a message, worker consumes when ready	None: queues deliver discrete messages, not streams	At-least-once with ack/nack; dead-letter queues catch poison messages	Medium: per-queue ACLs and DLQs give operational control, but no schema layer
Event bus	Broadcast facts to many consumers	Publish-subscribe: one producer, N consumers, decoupled in time	Stream-native: consumers replay from offsets	At-least-once, often with ordering guarantees per partition; replayable history	Medium: topic-level governance, schemas via registry (Avro, Protobuf), but consumer behavior is author-defined

Caption: The Agent Communication Matrix. Use these five attributes to decide which protocol fits which job. WebSockets, SSE, and gRPC streaming appear in this discussion as transports, not as peers; they carry messages for several of the protocols above.

Three cells in this matrix do most of the real work, and they’re worth examining.

MCP and A2A look similar on the wire but interact differently. Both use HTTP, JSON-RPC, and streaming. MCP treats interaction as atomic: the model makes a call, and the server returns a structured response. A2A treats interaction as stateful: an agent submits a task, which follows a lifecycle (submitted, working, completed, failed) that both sides monitor.

This has clear implications for engineers. If your “agent” acts like a stateless function, MCP is ideal, and A2A adds overhead. If your “agent” has a lifecycle (it can pause, resume, check status, or cancel), A2A provides functionality that MCP lacks. This highlights the real differences between Pattern 1 and Pattern 2.
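The difference becomes clear once you write the lifecycle down. This minimal sketch assumes only the four states named above; the A2A specification defines more, such as canceled. A stateless MCP call has nothing like this. An A2A-style task, by contrast, rejects illegal transitions and keeps a history that either side can inspect. The `Task` class here is hypothetical and is not an SDK type.

```python
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed transitions; terminal states have no outgoing edges.
TRANSITIONS = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.WORKING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


class Task:
    """Hypothetical task record that both agents can observe."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.state = TaskState.SUBMITTED
        self.history = [self.state]

    def advance(self, new_state: TaskState) -> None:
        # Refuse transitions the lifecycle does not allow (e.g. completed -> working).
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state
        self.history.append(new_state)


task = Task("t-1")
task.advance(TaskState.WORKING)
task.advance(TaskState.COMPLETED)
print([s.value for s in task.history])  # ['submitted', 'working', 'completed']
```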

The reliability column is the most important trade-off in the matrix. Teams often misjudge it.

Application versus infrastructure responsibility for reliability, retries, and message delivery.

At-most-once delivery, common in HTTP protocols like MCP and REST, means that a failed request might have been completed or not. This leaves the client uncertain.

At-least-once delivery, typical for queues and event buses, ensures a message is processed at least once. However, it might be processed more than once if a worker crashes. Here, idempotency becomes the app’s responsibility.

Neither approach is better than the other. The key question is where you want retry logic: in your application code (HTTP) or in your infrastructure (queues).
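The at-most-once ambiguity, and the usual application-side fix, fit in a few lines. In this simulation, the server applies a write and then the response is lost. The client cannot tell whether the write happened, so it retries with the same idempotency key, and the server deduplicates. Every name here is hypothetical; this is a sketch of the technique, not MCP or REST library code.

```python
import uuid


class FlakyServer:
    """Applies writes, but loses the response to the first request it handles."""

    def __init__(self) -> None:
        self.balance = 0
        self.seen: dict[str, int] = {}   # idempotency key -> stored result
        self.drop_next_response = True

    def deposit(self, key: str, amount: int) -> int:
        if key not in self.seen:          # apply each key at most once
            self.balance += amount
            self.seen[key] = self.balance
        if self.drop_next_response:
            self.drop_next_response = False
            raise ConnectionError("response lost after the write was applied")
        return self.seen[key]


def deposit_with_retry(server: FlakyServer, amount: int, attempts: int = 3) -> int:
    key = str(uuid.uuid4())  # one key per logical operation, reused on every retry
    for attempt in range(attempts):
        try:
            return server.deposit(key, amount)
        except ConnectionError:
            if attempt == attempts - 1:
                raise
    raise ValueError("attempts must be at least 1")


server = FlakyServer()
print(deposit_with_retry(server, 100))  # 100, not 200: the retry was deduplicated
```

The retry logic lives entirely in application code here. That is the trade the matrix describes for HTTP-based protocols.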

Pattern 3 suggests that for certain agent tasks, placing retry logic in the infrastructure is better.

MCP’s “strong” governance and REST’s “weak by default” rating tackle the same issue that created the OpenAPI ecosystem, but they do it differently. MCP servers self-describe when a connection happens. For example, a client requests tools/list and receives a complete schema of capabilities. This includes types, descriptions, and parameter constraints.

In contrast, REST provides an OpenAPI description only if the producing team publishes it and keeps it current, and only if the consuming team trusts it. Those are three "if"s the agent runtime can’t resolve on its own. MCP makes discoverability a requirement, not just a convention. The resulting governance model includes auditable tool inventories, type-checked invocations, and per-session capability negotiation, and many teams underestimate its value before adopting it. This is the edge that Pattern 1 uses.
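Here is roughly what that contract looks like on the wire, followed by the kind of check a client can run before forwarding a model’s arguments. The JSON-RPC `tools/list` request and the `name`/`description`/`inputSchema` fields follow the MCP specification. The validator is a hand-rolled stand-in for a real JSON Schema library and only checks required keys and primitive types.

```python
import json

# Request a client sends after connecting (JSON-RPC 2.0).
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Shape of a server's reply: each tool self-describes with a JSON Schema.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [{
            "name": "vector_search",
            "description": "Semantic search over the knowledge base.",
            "inputSchema": {
                "type": "object",
                "properties": {"query": {"type": "string"}, "k": {"type": "integer"}},
                "required": ["query"],
            },
        }]
    },
}

JSON_TYPES = {"string": str, "integer": int, "object": dict}


def check_arguments(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = [f"missing required '{k}'" for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unknown argument '{key}'")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(value, JSON_TYPES[spec["type"]]) or isinstance(value, bool):
            problems.append(f"'{key}' should be {spec['type']}")
    return problems


tools = {t["name"]: t for t in response["result"]["tools"]}
schema = tools["vector_search"]["inputSchema"]
print(json.dumps(request))
print(check_arguments(schema, {"query": "refund policy", "k": 3}))  # []
print(check_arguments(schema, {"k": "three"}))
```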

The matrix doesn’t choose a protocol for you. It shows what you’re trading.

Decision tree for choosing between MCP, A2A, and message-queue architectures.

Pattern 1: MCP-Centric Tool Access

Single-agent orchestration pattern using Oracle AI Database for retrieval, transactions, and memory.

Spec: Model Context Protocol specification, Anthropic (2024)

This is the usual starting pattern for production agent systems: a single LLM orchestrator, one or more MCP servers, and a typed contract between them. The model doesn’t call your database directly. Instead, it uses a tool that interacts with the database, and this difference is important.

MCP excels at the discovery layer. When an MCP client connects to a server, it sends a tools/list request. In return, it receives the full schema of available capabilities: names, parameters, descriptions, and return types. The model sees this inventory before acting. Selecting tools becomes a reasoning step rather than a hardcoded choice. This is a significant change. A model with discoverable tools handles “I don’t know how to do that” better than one with fixed function calls. The lack of a tool provides useful information for the model to reason about.

Oracle AI Database serves well as the MCP server. The capabilities you want to expose to an agent—like vector search over embedded content, parameterized SQL against business tables, and structured memory reads and writes—fit perfectly with MCP’s tool model. A typical Oracle-backed MCP server offers four or five tools: vector_search, run_sql, read_memory, write_memory, and summarize_thread. Each is a small, focused function with a typed schema. The model chooses among them based on the task.

The code below shows the minimal version: an MCP server registering one tool that performs vector search against Oracle AI Database. Note the typed array.array("f", ... ) bind for the vector column; a plain Python list will not work. The full server, with authentication, retries, and the other four tools, is in the companion repo.

import array
import os

import oracledb
from mcp.server import Server
from mcp.types import Tool, TextContent

server = Server("oracle-tools")

# Small connection pool shared by all tool calls.
pool = oracledb.create_pool(user=os.environ["DB_USER"],
                            password=os.environ["DB_PASS"],
                            dsn=os.environ["DB_DSN"], min=1, max=4)

@server.list_tools()
async def list_tools() -> list[Tool]:
    # Returned to clients in response to tools/list.
    return [Tool(
        name="vector_search",
        description="Semantic search over the knowledge base. Returns top-k passages.",
        inputSchema={
            "type": "object",
            "properties": {"query": {"type": "string"}, "k": {"type": "integer", "default": 5}},
            "required": ["query"],
        },
    )]

@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name != "vector_search":
        raise ValueError(f"Unknown tool: {name}")
    # embed() is omitted for clarity (it calls a local Ollama model in the repo).
    # VECTOR binds need a typed array.array("f", ...), not a plain Python list.
    vec = array.array("f", await embed(arguments["query"]))
    # Embed the query, then run an Oracle AI Vector Search against the corpus.
    with pool.acquire() as conn, conn.cursor() as cur:
        cur.execute("""
            SELECT chunk_text FROM kb_chunks
            ORDER BY VECTOR_DISTANCE(embedding, :q, COSINE)
            FETCH FIRST :k ROWS ONLY
        """, q=vec, k=arguments.get("k", 5))
        return [TextContent(type="text", text=row[0]) for row in cur]

A few points about what’s happening here. The tool schema acts as a contract. The model sees vector_search as a typed capability that requires a query string and accepts an optional integer k. That information helps the model decide when and how to use the tool.

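The vector search runs as a single SQL statement against a VECTOR column in Oracle AI Database. Oracle uses a vector index, such as a cosine-distance HNSW index, when the query requests approximate search with FETCH APPROX FIRST; the FETCH FIRST form above performs an exact search. Either way, there’s no separate vector store, no sync job, and no eventual-consistency window.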

The embed() call is left out here for clarity. In the repo, it calls a local Ollama model, so the demo runs without paid API keys.

In tests with the demo corpus (1,000 chunks using 768-dimensional embeddings via Ollama’s nomic-embed-text on a GPU workstation), median tool-call round-trip latency is 15.3ms. That includes 14.1ms for embedding inference and 0.9ms for Oracle vector search. On CPU-only hardware, the embedding step is usually 5 to 20 times slower, but the database side stays the same. The takeaway: an MCP tool call to an Oracle-backed server is fast like a database, not like an LLM. Latency during an agent turn depends mainly on the model’s own inference, not on the tool.
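Here is the arithmetic behind that claim, using only the figures above. The database share stays at 0.9 ms while the embedding share scales with hardware. The 5x and 20x multipliers are the quoted CPU-only range. Treating the remaining 0.3 ms as fixed overhead is an assumption.

```python
TOTAL_GPU_MS = 15.3
EMBED_GPU_MS = 14.1
VECTOR_SEARCH_MS = 0.9
# Whatever is left over (0.3 ms); assumed not to change with hardware.
OTHER_MS = round(TOTAL_GPU_MS - EMBED_GPU_MS - VECTOR_SEARCH_MS, 1)


def turn_latency_ms(embedding_slowdown: float) -> float:
    """Tool-call latency when only the embedding step gets slower."""
    return EMBED_GPU_MS * embedding_slowdown + VECTOR_SEARCH_MS + OTHER_MS


for slowdown in (1, 5, 20):
    total = turn_latency_ms(slowdown)
    print(f"embedding x{slowdown:>2}: {total:6.1f} ms total, database {VECTOR_SEARCH_MS / total:.1%}")
```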

Recommended usage: Use this pattern when one orchestrator agent works with several specialized tools, or when teams need to standardize tool access across agent frameworks or model providers. It works best when the typed schema improves model behavior, which is often true. MCP is also a good fit if you plan to add tools later: the discovery layer lets new capabilities reach the model without client-side changes.

When not to use this: Skip it for a single agent with a single tool. Don’t add a protocol where a function call is enough. If your "agent" is simply one model with one clear capability, an MCP server adds complexity without benefit. The discovery layer helps when there’s something to find; with only one tool, there’s nothing to discover.

Pattern 2: A2A Mesh with Oracle Memory

Multi-agent collaboration pattern with shared state stored in Oracle AI Database.

Spec: Agent-to-Agent Protocol specification, Google (2025)

A2A solves a problem MCP doesn’t: what happens when an agent isn’t calling a tool but handing work to another agent. The distinction sounds semantic until you try to express “the Researcher has finished gathering sources; the Writer should now draft a response using them” as a tool call. It doesn’t fit. The Writer isn’t a function the Researcher invokes. It’s a peer with its own model, its own prompt, its own lifecycle. A2A models that relationship as a task with state, addressable identity, and a status machine that both sides observe.

Consider a two-agent research workflow. A Researcher agent gathers context from external sources, checks relevance, and produces findings. A Writer agent then uses those findings to draft a response in the desired tone and format. A simple setup would have the Researcher return findings directly as a response to a tool call. This works for two agents but fails with three. When a third agent, like a Reviewer, needs the same findings, you end up duplicating data in message history instead of having a central record.

The A2A pattern changes this. Findings are stored in Oracle AI Database as durable, vector-indexed rows. The Researcher writes them and sends the Writer a task message that carries a reference to the findings, not the data itself. The Writer reads from the same table. The protocol handles coordination while the database holds the state, so multiple agents can read the same information without the protocol layer ever carrying its contents.
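This is a minimal sketch of the payload-by-reference move. A plain dictionary stands in for the agent_memory table, and JSON strings stand in for A2A task messages. The envelope fields (`task`, `memory_id`, `tone`) and the helper names are hypothetical. The point is that the message size stays fixed as findings grow, while every reader resolves the same row.

```python
import json
import uuid

agent_memory: dict[str, dict] = {}  # stand-in for the agent_memory table


def researcher_publish(findings: str, sources: list[str]) -> str:
    """Store findings once and return a small task envelope that references them."""
    memory_id = str(uuid.uuid4())
    agent_memory[memory_id] = {
        "findings": findings,
        "source_refs": sources,
        "agent_id": "researcher-v1",  # who wrote the row, for auditing
    }
    return json.dumps({"task": "draft_response", "memory_id": memory_id, "tone": "concise"})


def read_context(envelope: str) -> dict:
    """Any downstream agent (Writer, Reviewer, ...) resolves the same reference."""
    return agent_memory[json.loads(envelope)["memory_id"]]


small = researcher_publish("x" * 500, ["doc-1"])
large = researcher_publish("x" * 8000, ["doc-1", "doc-2"])
print(len(small) == len(large))              # True: envelope size ignores findings size
print(len(read_context(large)["findings"]))  # 8000
```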

The code below is one half of the mesh: the Writer agent’s task handler, listening for task.created events from the Researcher and producing a draft. The Researcher side, plus the full A2A envelope with retries and status updates, lives in the companion repo.

import os

import oracledb
# A2AServer, on_task, and update_status are the server API this sample uses;
# other A2A SDKs expose the same ideas under different names.
from a2a.server import A2AServer
from a2a.types import Task, TaskStatus, Message

pool = oracledb.create_pool(user=os.environ["DB_USER"],
                            password=os.environ["DB_PASS"],
                            dsn=os.environ["DB_DSN"], min=1, max=4)

writer = A2AServer(agent_id="writer-v1")

@writer.on_task("draft_response")
async def handle_draft(task: Task) -> Message:
    # The Researcher passed a memory_id, not the findings themselves.
    memory_id = task.payload["memory_id"]
    with pool.acquire() as conn, conn.cursor() as cur:
        cur.execute("SELECT findings, source_refs FROM agent_memory WHERE id = :id",
                    id=memory_id)
        findings, sources = cur.fetchone()

    # Tell the Researcher (and any other subscriber) that work has started.
    await writer.update_status(task.id, TaskStatus.WORKING)
    # llm_draft() is omitted here; see the companion repo.
    draft = await llm_draft(findings, sources, tone=task.payload["tone"])

    # The first cursor closed when its with-block ended, so acquire a fresh
    # connection for the write and commit it explicitly.
    with pool.acquire() as conn, conn.cursor() as cur:
        cur.execute("UPDATE agent_memory SET draft = :d WHERE id = :id",
                    d=draft, id=memory_id)
        conn.commit()

    return Message(role="agent", content=draft, refs={"memory_id": memory_id})

A few things in that snippet do the load-bearing work. The Writer never receives the findings in the message. It receives a memory_id and reads the actual content from Oracle. That’s the payload-by-reference pattern, and it’s the core architectural move of this pattern. The update_status call tells the Researcher (and any observer subscribed to the task) that work has begun; A2A’s status machine handles the streaming update without the Writer having to manage its own connection lifecycle. The final Message returns the draft inline because it’s the artifact of the task, but it also includes the memory_id ref, so a third agent picking this up next reads the same memory rather than re-deserializing a payload.

The trade-off is clear in token counts. In the demo, using serialized findings in the message costs 1,394 tokens per Writer turn for 3KB of research. This size is typical for a research agent creating a synthesized summary with source references.

In contrast, the payload-by-reference version only costs 61 tokens, no matter the findings’ size. This means a 22.9 times reduction at 3KB. The difference grows with findings size: at 500 characters, the reduction is 5.6 times; at 8KB, it reaches 58.9 times. The ratio isn’t fixed; it depends on how much data is in the message versus in the database. (Tokens counted using OpenAI’s cl100k_base tokenizer; Anthropic and Google tokenizers yield similar counts for English text.)

The compounding effect matters more than any single hop. A three-agent mesh that carries the same research context over three hops costs about 4,200 tokens (3 × 1,394) in the naive version and only 183 tokens (3 × 61) in the payload-by-reference version. At five hops, the difference exceeds 6,500 tokens per request. All of this is before any agent has done actual reasoning work. The cost of "just put it in the message" grows linearly with mesh depth, and most mesh topologies grow over time, not the other way around.
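The ratios and hop totals above follow directly from the two per-turn figures: 1,394 tokens inline at 3KB and 61 tokens by reference. The sketch assumes that each hop re-sends the same context, which is the linear model the paragraph describes.

```python
INLINE_TOKENS = 1_394   # serialized 3KB findings, per Writer turn
REFERENCE_TOKENS = 61   # memory_id envelope, independent of findings size


def tokens(hops: int, per_hop: int) -> int:
    # Linear model: every hop carries the same context again.
    return hops * per_hop


print(round(INLINE_TOKENS / REFERENCE_TOKENS, 1))               # 22.9
print(tokens(3, INLINE_TOKENS), tokens(3, REFERENCE_TOKENS))    # 4182 183
print(tokens(5, INLINE_TOKENS) - tokens(5, REFERENCE_TOKENS))   # 6665
```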

Oracle AI Database plays a key role here. The agent_memory table serves as a single source of truth. It is vector-indexed for semantic recall and transactional for consistency between reads and writes. Each row includes the agent ID, making it audit-friendly. The protocol layer can be A2A today and something else tomorrow. However, the memory layer stays the same.

Recommended usage: Use this pattern for multi-agent workflows that need peer coordination, such as planner-and-specialist patterns or multi-step research pipelines, especially when several agents need consistent access to the same conversational or task state. A2A is ideal for long-running tasks where a synchronous request/response model would hold connections open unnecessarily.

When not to use this: Avoid A2A for one agent with a set of tools. This is an MCP problem, not an A2A issue. Using A2A with a single orchestrator adds unnecessary task lifecycle management. A good test: if you can name a second agent and explain its independent decisions, A2A works. If the “second agent” is just a different prompt using the same model, it’s a tool call.

Pattern 3: Queue-Backed Backoffice Agents

Message-driven AI workflow using RabbitMQ and Oracle AI Database.

Imagine a document-processing pipeline. PDFs arrive in a queue. A worker agent picks them up, extracts text, embeds chunks, and writes them to Oracle AI Database with vector indexing. Results are then exposed through a simple FastAPI endpoint. No MCP. No A2A. This is key: adding either would increase the surface area without adding capability.

This pattern resists the pull of new protocols. The urge to add MCP just because there’s an LLM involved somewhere is strong but should be resisted. A worker using the same embedding model and prompt for each message doesn’t need tool discovery. It needs at-least-once delivery, a dead-letter queue, and back-pressure for when the embedding service slows down. These are queue issues, not agent-protocol issues.

The architectural shape predates the agent era, and that maturity is why it works. Producers send messages to a queue. Workers process them at their own pace. Failed messages are retried, typically with exponential backoff, and go to a dead-letter queue if they keep failing. The LLM acts as a worker in the pipeline, not as its orchestrator, so the protocol layer above the LLM is as straightforward as the rest of the system. That simplicity is a benefit.
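The retry shape described above is easy to sketch, with one caveat: RabbitMQ does not apply exponential backoff by itself. The delays are usually implemented with delayed retry queues or in the consumer. This is a pure-Python simulation with hypothetical names and parameters (`base_s`, `cap_s`, `max_attempts`). It is not broker code.

```python
from collections import deque


def backoff_delays(max_attempts: int, base_s: float = 1.0, cap_s: float = 60.0) -> list[float]:
    """Delay before each retry: base, 2x base, 4x base, ... capped at cap_s."""
    return [min(cap_s, base_s * 2 ** n) for n in range(max_attempts - 1)]


def process(messages, handler, max_attempts: int = 4):
    """Deliver each message up to max_attempts times; park failures in a dead-letter list."""
    queue = deque((m, 1) for m in messages)
    done, dead_letter = [], []
    delays = backoff_delays(max_attempts)
    while queue:
        msg, attempt = queue.popleft()
        try:
            handler(msg)
            done.append(msg)
        except Exception:
            if attempt < max_attempts:
                # A real system would wait delays[attempt - 1] seconds before redelivery.
                queue.append((msg, attempt + 1))
            else:
                dead_letter.append(msg)
    return done, dead_letter, delays


def handler(msg: str) -> None:
    if msg == "poison.pdf":
        raise ValueError("cannot parse")


done, dlq, delays = process(["a.pdf", "poison.pdf", "b.pdf"], handler)
print(done, dlq, delays)  # ['a.pdf', 'b.pdf'] ['poison.pdf'] [1.0, 2.0, 4.0]
```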

The code below is the worker’s core loop: consume a message, embed the chunks, write to Oracle AI Database, acknowledge. Producer, dead-letter handling, and the FastAPI edge live in the companion repo.

import array, json, os

import oracledb
import pika

from embed import embed_chunks  # local Ollama call, see repo

pool = oracledb.create_pool(user=os.environ["DB_USER"],
                            password=os.environ["DB_PASS"],
                            dsn=os.environ["DB_DSN"], min=1, max=4)

conn = pika.BlockingConnection(pika.URLParameters(os.environ["AMQP_URL"]))
ch = conn.channel()
# Quorum queue with a delivery limit (5 is an assumed value): once a message has
# been redelivered that many times, RabbitMQ routes it to the dead-letter exchange.
ch.queue_declare(queue="documents", durable=True,
                 arguments={"x-queue-type": "quorum",
                            "x-delivery-limit": 5,
                            "x-dead-letter-exchange": "documents.dlx"})
ch.basic_qos(prefetch_count=4)  # back-pressure: at most 4 in-flight per worker

def handle(ch, method, _props, body):
    try:
        doc = json.loads(body)
        chunks = doc["chunks"]  # already segmented upstream
        vectors = [array.array("f", v) for v in embed_chunks(chunks)]
        with pool.acquire() as db, db.cursor() as cur:
            cur.executemany("""
                INSERT INTO kb_chunks (doc_id, chunk_text, embedding)
                VALUES (:doc, :txt, :vec)
            """, [(doc["id"], t, v) for t, v in zip(chunks, vectors)])
            db.commit()
    except Exception:
        # Hand the message back; the delivery limit eventually dead-letters it.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
        return
    # Ack only after the commit, so a crash before this line means redelivery.
    ch.basic_ack(delivery_tag=method.delivery_tag)

ch.basic_consume(queue="documents", on_message_callback=handle)
ch.start_consuming()

The interesting parts are what isn’t there. There’s no tool schema, no agent identity, and no status state machine. The worker doesn’t need to show its capabilities because nothing is looking for it. The contract is the queue’s message schema, enforced by the producer’s chosen validation. The prefetch_count=4 setting tells the whole back-pressure story. If the embedding service slows down or Oracle’s connection pool fills up, messages stay on the queue instead of piling up in worker memory. The DLX (dead-letter exchange) on the queue means any message that fails repeatedly goes to a place where a human can check it, without the producer or any other agent needing to know.

Reliability semantics play a crucial role here. RabbitMQ provides at-least-once delivery with ack/nack semantics: if a worker crashes before acknowledging a message, RabbitMQ redelivers it to this or another worker. Redelivery, retry coordination, and back-pressure come from the broker rather than from application code. What you still own is idempotency, because at-least-once delivery means a message can be processed twice. Reaching the same reliability over an MCP server means writing idempotency keys, retry coordination, and ordering logic by hand. RabbitMQ has been refining these semantics since 2007; you’re not going to outdo that on a side project.
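Because at-least-once delivery can hand the same message to a worker twice, the write itself has to tolerate replays. This minimal sketch assumes each chunk gets a deterministic key of (doc_id, chunk position). The dictionary stands in for kb_chunks, and the chunk position is a hypothetical column that the worker excerpt above does not include. In Oracle, the same idea would usually be expressed as a unique constraint on that key or as a MERGE statement.

```python
kb_chunks: dict[tuple[str, int], str] = {}  # (doc_id, chunk_no) -> chunk_text


def handle_document(doc: dict) -> int:
    """Upsert every chunk under a deterministic key; return the number of new rows."""
    inserted = 0
    for chunk_no, text in enumerate(doc["chunks"]):
        key = (doc["id"], chunk_no)
        if key not in kb_chunks:
            inserted += 1
        kb_chunks[key] = text  # rewriting identical data on replay is harmless
    return inserted


message = {"id": "doc-42", "chunks": ["intro", "body", "summary"]}
print(handle_document(message))  # 3
print(handle_document(message))  # 0: redelivery adds no duplicate rows
print(len(kb_chunks))            # 3
```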

The Oracle integration mirrors Patterns 1 and 2: it’s durable, vector-indexed, and transactional. The worker writes embedded chunks into the same kb_chunks table that Pattern 1’s MCP vector_search tool reads from. Teams using Oracle Database can merge the queue and memory layer into one component with Oracle Advanced Queuing. The trade-off is one less service to manage, but with slightly less portable demo code. This is the architectural benefit of the three-pattern arc: while the protocol layer changes (MCP, A2A, none), the memory layer remains constant. A document processed by Pattern 3’s queue worker is instantly searchable by Pattern 1’s MCP tool and can be referenced by Pattern 2’s A2A peers. This isn’t a coincidence; it reflects the efficiency gained when each protocol performs its best role, with Oracle AI Database maintaining the shared state for all three patterns.

Recommended usage: Use this for asynchronous, idempotent, throughput-sensitive tasks such as document processing, batch embedding, ETL pipelines, scheduled report generation, and back-office automation, where the LLM acts as a worker, not an orchestrator. This pattern is ideal when downstream services are slow or temporarily unavailable: the queue absorbs those problems without the application noticing.

When not to use this: Avoid it for real-time, conversational, or human-in-the-loop tasks. Don’t place a queue between a user and a chatbot. It’s not suitable when latency is more critical than throughput, especially when users expect quick answers. The conversational loop fits in Patterns 1 or 2, while Pattern 3 works behind them, tackling non-interactive tasks.

The Enterprise Reality

Cost scales with protocol surface area. Adding each protocol to an agent system creates another layer. This layer must be monitored, secured, and fixed if something goes wrong at 3 a.m. The patterns above outline the architecture, while what follows shows how that architecture shifts when faced with real users at scale.

Auditability across async boundaries. When a request crosses from MCP to a queue to A2A, the question regulators and security teams actually ask is which agent touched which row, and when? The answer almost never lives in any single protocol. According to LangChain’s State of Agent Engineering 2026 report, 89% of organizations have implemented some form of agent observability, and among teams already running agents in production that figure rises to 94%, with 71.5% reporting full tracing across individual agent steps and tool calls.

The teams ahead of the curve are not the ones with the most sophisticated protocols; they are the ones who decided early that the system of record sits in the database, not in protocol message history. Oracle AI Database earns its place here as that record. Every memory write, every tool invocation, every agent identity is durable in a single governed store that does not care which protocol delivered the message.

Cost scales with iteration, not just calls. The LangChain report shows that 32% of respondents see quality as their main blocker, while latency follows at 20%. Interestingly, cost concerns have dropped over the year. Teams aren’t just paying for tokens; they’re paying for hops. Each protocol boundary adds latency, retries, and overhead. A multi-agent system crossing four boundaries per request multiplies the engineering effort. The key lesson from Pattern 3 is clear: Avoid adding a coordination protocol when the work is async, idempotent, and doesn’t need a model in the orchestration loop.

Multi-tenancy and isolation. A report shows that among enterprises with 2,000 or more employees, security is now the second-largest barrier to production, noted by 24.9% of respondents. This is more significant than latency, and it affects protocol choice. MCP servers can be deployed for each tenant or shared with tenant-scoped tools. A2A meshes follow the trust boundaries of their network. Queues can isolate by virtual host or topic. None of these options are wrong, but they differ. A tenancy model that works for one protocol often doesn’t fit all three. The constant factor is the database tenancy model. Row-level security, schema-per-tenant, and Oracle’s audit infrastructure remain relevant, regardless of which protocol is in vogue for the next roadmap.
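One way to read "shared with tenant-scoped tools": the tenant comes from the authenticated session, never from arguments the model supplies, and every query the tool builds carries it as a bind variable. This is a minimal sketch. The `Session` type, the `tenant_id` column, and `build_search` are hypothetical. In Oracle, the same guarantee can instead be enforced server-side with row-level security (Virtual Private Database policies).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    user: str
    tenant_id: str  # set during authentication, not by the model


def build_search(session: Session, arguments: dict, query_vec: list[float]) -> tuple[str, dict]:
    """Return SQL text and binds for a tenant-scoped vector search tool."""
    if "tenant_id" in arguments:
        raise PermissionError("tenant_id may not be supplied by the model")
    sql = (
        "SELECT chunk_text FROM kb_chunks "
        "WHERE tenant_id = :tenant "
        "ORDER BY VECTOR_DISTANCE(embedding, :q, COSINE) "
        "FETCH FIRST :k ROWS ONLY"
    )
    binds = {"tenant": session.tenant_id, "q": query_vec, "k": arguments.get("k", 5)}
    return sql, binds


sql, binds = build_search(Session("ana", "acme"), {"query": "refund policy"}, [0.1, 0.2])
print(binds["tenant"], binds["k"])  # acme 5
```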

The protocol layer can change. The governed memory layer should not.

Where This Is Heading

Three things are visibly changing in the agent communication layer right now, and one of them is not yet resolved.

Protocols are converging on capability cards. Both MCP’s tool schemas and A2A’s agent cards share a key idea: discoverable, typed descriptions of capabilities. These can be fetched at connection time instead of being hard-coded in client code. The two specifications came to this idea independently, suggesting it’s a fundamental concept. In the next two years, we can expect to see shared schema conventions across protocols. This may include cross-walks between MCP’s tools/list and A2A’s agent cards, or even a new specification that combines both. Teams using either spec now are not going against this convergence; they are ready for it.

Database-resident memory is becoming the default. In this article’s three patterns, the key constant is the memory layer, not the protocol. We see vector-typed columns, consistent transactions between agent writes and reads, and audit trails that endure even after framework updates. This marks a significant shift from the architecture of 2023 and early 2024. Back then, vector stores operated as separate sidecars, while agent memory was just a Python dictionary. Oracle AI Database showcases this trend. The larger pattern shows that durable agent state should exist in the same governed system that manages your data, not in a separate stack needing constant syncing.

The tool/agent boundary is dissolving, and the taxonomy in this article will eventually need to be rewritten.

An MCP server can wrap an LLM-powered backend, in which case calling it is functionally an agent invocation. An A2A agent can expose itself as an MCP tool, in which case it is being addressed as a capability rather than a peer. Both moves are legitimate, both are happening in production today, and the protocols themselves do not yet have an opinion on which framing is correct.

This is the open question. The matrix in this article tells you what each protocol is good at today, and the three patterns work today. But the line between "here is a tool, call it" and "here is a peer, coordinate with it" is genuinely blurring, and I do not think the industry has agreed yet on where it settles. The people I trust most on this question are the ones building both patterns in production and treating the distinction as an engineering choice rather than a protocol mandate. That is the right posture for the next eighteen months. The taxonomy will catch up to the practice, or it will not, and the architectural decisions you make this quarter should be robust to either outcome.

Frequently Asked Questions

Should I pick MCP or A2A for my first agent project? Almost certainly MCP. A2A addresses peer coordination, which many initial projects lack. Start with one model and a set of tools. Introduce A2A when you have a second agent that needs to work with the first on a task that lasts beyond a single request. Using A2A too early creates extra coordination without a clear need.

Do I need both MCP and A2A in the same system? Yes, often. The typical production shape is A2A between agents and MCP from each agent to its tools. This is because the two protocols operate at different layers and handle different tasks. A system requires both when it has real peer coordination and actual tool access. If you don’t have one of these, you don’t need the related protocol yet.

Can I migrate from REST to MCP without rewriting my services? Yes, that’s usually the cleanest adoption path. An MCP server acts as a thin wrapper over existing REST endpoints. It adds a typed tool schema and a discovery layer without altering your service code. The migration cost lies in the wrapper, not in the services. The services continue to serve their non-agent clients as before.

Does Oracle AI Database require Oracle-specific tooling for any of these patterns? No. All three patterns in this article use standard open-source Python libraries (oracledb, the mcp SDK, pika for RabbitMQ, FastAPI). Oracle AI Database participates through a connection string and a vector-typed column, not through framework lock-in. Teams already running Oracle gain the option of collapsing the queue and the memory layer into a single component using Oracle Advanced Queuing, but it is an option, not a requirement.

What is the cheapest way to try this end-to-end? The companion repository ships a docker-compose.yml that stands up Oracle AI Database Free, RabbitMQ, and Ollama for local model inference. No paid API keys, no cloud accounts, no proprietary SDKs. The entire three-pattern demo runs on a developer laptop with roughly 16GB of RAM. The Oracle AI Database Free edition is limited to 2 CPUs for foreground processes, 2GB of RAM (SGA and PGA combined), and 12GB of user data on disk (irrespective of compression factor).

When is “just use a queue” the right answer? Use a queue when the work is asynchronous, idempotent, and throughput-sensitive, and the LLM acts as a worker rather than an orchestrator. That covers most back-office work, such as automation, batch embedding, document processing, and scheduled reporting. The key test is whether a human needs the result in real time. If not, a queue is usually the best choice. MCP or REST should only be used at the edges, where the system interacts with a human or a synchronous external service.
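To make the “LLM as a worker” shape concrete, here is a minimal sketch that uses only Python’s standard library. Two pieces are assumptions for illustration: `queue.Queue` stands in for RabbitMQ or Oracle Advanced Queuing, and `summarize` is a hypothetical placeholder for the model call. The worker records the IDs of jobs it has already handled. A redelivered job is therefore acknowledged without being processed again, and that idempotency is what makes at-least-once delivery safe.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: str   # stable ID assigned by the producer; used for deduplication
    payload: str


def summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM call (for example, a local model behind Ollama)."""
    return text[:20]


def run_worker(jobs: "queue.Queue[Job]", results: dict) -> int:
    """Drain the queue once and return how many jobs were actually processed.

    `results` doubles as the idempotency record: a job whose ID is already
    present is skipped, so a duplicate delivery triggers no extra model call.
    """
    processed = 0
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return processed
        try:
            if job.job_id in results:
                continue  # redelivery: acknowledge without reprocessing
            results[job.job_id] = summarize(job.payload)
            processed += 1
        finally:
            jobs.task_done()  # acknowledge the message either way
```

For example, putting `Job("a", ...)`, `Job("b", ...)` and a duplicate `Job("a", ...)` on the queue and calling `run_worker(q, {})` processes only two jobs. A real deployment would keep the idempotency record somewhere durable, such as a database table, rather than in a dict.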

An Agent Skill that uses Kafka Java APIs for Oracle AI Database

Anya Summers — Mon, 29 Jun 2026 15:36:49 +0000

Use skills to build OKafka apps with Oracle AI Database

Key Takeaways

OKafka is a Kafka Java API for Oracle AI Database Transactional Event Queues. OKafka implements standard Kafka Java interfaces to create topics, produce, and consume messages directly in the database.
This agent skill helps you write Kafka Java code for Oracle AI Database Transactional Event Queues using the OKafka library.
The skill encodes Oracle-specific additions to the Kafka Java API: authentication, using transactions, serialization, and database-specific topic behavior.
Good agent skills raise the team baseline: better first-pass code, fewer manual corrections, and improved integrations with Oracle AI Database.

Skill-driven generation of OKafka applications with validated transaction patterns.

In my own work, I found most coding agents weren’t generating high-quality code for Oracle AI Database’s Kafka Java API (OKafka). You can get results, but they aren’t idiomatic and they miss subtleties. That’s why I created the okafka-java-code Oracle agent skill, based on my hand-written Kafka Java API examples.

Agent skills can greatly enhance code generation for Oracle AI Database apps, and this skill encapsulates solutions to the problems I kept hand-coding: how to authenticate with OKafka, how to use transactions, how to create topics, and how to use Oracle-specific serialization.

To install the skill, point your agents at this GitHub link:

https://github.com/anders-swanson/oracle-database-code-samples/tree/main/skills/okafka-java-code

What’s in the skill

This is a standard agent skill, with markdown references to code snippets implemented by my samples:

skills/okafka-java-code
├── agent-skill-okafka-java-api.md
├── agents
│   └── openai.yaml
├── references
│   ├── authentication-and-properties.md
│   ├── dependencies.md
│   ├── oson-serialization.md
│   ├── producer-consumer.md
│   ├── testing-and-troubleshooting.md
│   ├── topics-and-admin.md
│   └── transactions.md
└── SKILL.md

Each reference markdown file covers specific areas of OKafka Java code: initializing OKafka classes, serialization, authentication, testing, and transactional workloads.

Let’s try using the skill to generate an app

Start by installing the OKafka Java Code skill and see what you can generate.

I used the Oracle agent skill to generate an app with a transactional producer and consumer, and a Testcontainers test. The app was generated in one shot with Codex and GPT 5.5-high and is almost identical to code I’d write myself. Transactional workflows are handled by calling getDBConnection() on the producer and consumer. As a result, messages are produced and consumed in the same database transaction as the inserts and updates.

The generated app creates a transactional event flow around Oracle AI Database Transactional Event Queues:

TopicAdmin creates the topic through Kafka Admin with OKafka’s AdminClient.
OkafkaProperties builds base properties and adds producer or consumer settings in separate methods.
TransactionalEventProducer sends a record and writes to produced_events through producer.getDBConnection() before commitTransaction().
TransactionalEventConsumer writes consumed records through consumer.getDBConnection() and calls commitSync() only after the database work succeeds.
TransactionalEventsIT starts an Oracle AI Database Free container with Testcontainers, applies the OKafka grants, creates a topic, and verifies producer commit, producer abort, and consumer rollback behavior.

This producer method is the kind of output I wanted to nudge agents toward:

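private void publish(BusinessEvent event, boolean failAfterDatabaseWrite) throws Exception {
    producer.beginTransaction();
    try {
        // Send the record inside the producer's database transaction.
        producer.send(new ProducerRecord<>(topic, event.id(), event.payload())).get();
        // Write the business row on the producer's own connection so it joins the same transaction.
        insertProducedEvent(producer.getDBConnection(), event);
        if (failAfterDatabaseWrite) {
            throw new IllegalStateException("Simulated failure before producer commit");
        }
        // Commit the message and the row together.
        producer.commitTransaction();
    } catch (InterruptedException exception) {
        Thread.currentThread().interrupt(); // restore the interrupt flag before aborting
        abortAndRethrow(exception);
    } catch (Exception exception) {
        // Abort the transaction (helper not shown) and propagate the failure.
        abortAndRethrow(exception);
    }
}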

You can see the transaction boundary, the Kafka send, the database write, and the abort path in one place.
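Why can one transaction cover both the Kafka send and the table write? In Transactional Event Queues the topic itself lives in the database, so the produced message and the produced_events row commit or roll back together. The sketch below models that idea with Python’s built-in sqlite3. It assumes an ordinary `topic_messages` table as a stand-in for the TxEventQ. It is an analogy under that assumption, not the OKafka API.

```python
import sqlite3


def setup_producer_tables(conn: sqlite3.Connection) -> None:
    # topic_messages is a stand-in for the TxEventQ; produced_events mirrors the sample's table.
    conn.execute("CREATE TABLE topic_messages ("
                 "msg_offset INTEGER PRIMARY KEY AUTOINCREMENT, msg_key TEXT, payload TEXT)")
    conn.execute("CREATE TABLE produced_events (event_id TEXT PRIMARY KEY, payload TEXT)")
    conn.commit()


def publish(conn: sqlite3.Connection, event_id: str, payload: str,
            fail_after_database_write: bool = False) -> None:
    """Enqueue a message and record the event in one local transaction."""
    try:
        # The "send": append to the queue table, uncommitted until commit().
        conn.execute("INSERT INTO topic_messages (msg_key, payload) VALUES (?, ?)",
                     (event_id, payload))
        # The business write on the same connection, hence the same transaction.
        conn.execute("INSERT INTO produced_events (event_id, payload) VALUES (?, ?)",
                     (event_id, payload))
        if fail_after_database_write:
            raise RuntimeError("Simulated failure before commit")
        conn.commit()    # plays the role of producer.commitTransaction()
    except Exception:
        conn.rollback()  # plays the role of an abort: neither message nor row survives
        raise


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    # Table names here are trusted constants, so f-string interpolation is acceptable.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

After one successful publish and one simulated failure, each table holds exactly one row. This mirrors the “aborted producer transaction leaves no produced row and no consumable record” check in the generated test.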

Transactional OKafka pattern coordinating Kafka messages and database changes.

The consumer side follows the same idea:

private void persistAndCommit(ConsumerRecords<String, String> records, boolean failAfterDatabaseWrite)
        throws Exception {
    // Use the consumer's own database connection so the inserts share its transaction.
    Connection connection = consumer.getDBConnection();
    try {
        for (ConsumerRecord<String, String> record : records) {
            insertConsumedEvent(connection, record);
        }
        if (failAfterDatabaseWrite) {
            throw new IllegalStateException("Simulated failure before consumer commit");
        }
        // Commit only after the database work has succeeded.
        consumer.commitSync();
    } catch (Exception exception) {
        // Undo the inserts; the records remain available for a later poll.
        connection.rollback();
        throw exception;
    }
}

The generated code preserves the important bits: database work happens on the consumer’s OKafka connection, and the offset is committed only after that work succeeds.
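The consumer rule, which commits the position only after the database work succeeds, can be sketched the same way. Again, sqlite3 stands in for Oracle AI Database. The consumer’s committed position lives in a `consumer_offsets` table and is updated in the same transaction as the `consumed_events` inserts. A failed batch therefore leaves both tables untouched, and the records come back on the next poll. The table names, the seeded group "g1", and the functions are illustrative assumptions, not OKafka internals.

```python
import sqlite3


def setup_consumer_tables(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE topic_messages (msg_offset INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE consumer_offsets (grp TEXT PRIMARY KEY, next_offset INTEGER)")
    conn.execute("CREATE TABLE consumed_events (msg_offset INTEGER PRIMARY KEY, payload TEXT)")
    # Seed three messages and a consumer group "g1" positioned at offset 0.
    conn.executemany("INSERT INTO topic_messages VALUES (?, ?)", [(0, "a"), (1, "b"), (2, "c")])
    conn.execute("INSERT INTO consumer_offsets VALUES ('g1', 0)")
    conn.commit()


def poll(conn: sqlite3.Connection, group: str, max_records: int = 10):
    """Return records at or after the group's committed position."""
    (start,) = conn.execute("SELECT next_offset FROM consumer_offsets WHERE grp = ?",
                            (group,)).fetchone()
    return conn.execute("SELECT msg_offset, payload FROM topic_messages "
                        "WHERE msg_offset >= ? ORDER BY msg_offset LIMIT ?",
                        (start, max_records)).fetchall()


def persist_and_commit(conn: sqlite3.Connection, group: str, records,
                       fail_after_database_write: bool = False) -> None:
    try:
        for msg_offset, payload in records:
            conn.execute("INSERT INTO consumed_events VALUES (?, ?)", (msg_offset, payload))
        if fail_after_database_write:
            raise RuntimeError("Simulated failure before consumer commit")
        if records:
            # Advance the position in the SAME transaction as the business rows.
            conn.execute("UPDATE consumer_offsets SET next_offset = ? WHERE grp = ?",
                         (records[-1][0] + 1, group))
        conn.commit()    # plays the role of consumer.commitSync()
    except Exception:
        conn.rollback()  # plays the role of connection.rollback(): position unchanged
        raise
```

Polling, failing, and polling again returns the same batch. Only a successful persist_and_commit() advances the position.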

Testing is part of the skill

Runnable OKafka test topology validating commit, abort, and rollback behavior.

This skill includes guidance to validate with an integration test or smoke path that creates the topic, produces records, consumes records, and queries the TxEventQ backing table or a related database side effect.

The generated app follows that direction. Its integration test starts gvenzl/oracle-free:23.26.2-slim-faststart, writes an ojdbc.properties file for local PLAINTEXT OKafka access, and then checks three paths:

a committed producer transaction creates the database row and can be consumed;
an aborted producer transaction leaves no produced row and no consumable record;
a failed consumer batch rolls back the database write and leaves the record available for a later successful consume.

You can run the generated app’s tests with mvn verify.

The test includes grants and setup for the local container:

alter session set container=freepdb1;

grant aq_user_role to TESTUSER;
grant execute on dbms_aq to TESTUSER;
grant execute on dbms_aqadm to TESTUSER;
grant select on gv_$session to TESTUSER;
grant select on v_$session to TESTUSER;
grant select on gv_$instance to TESTUSER;
grant select on gv_$listener_network to TESTUSER;
grant select on SYS.DBA_RSRC_PLAN_DIRECTIVES to TESTUSER;
grant select on gv_$pdbs to TESTUSER;
grant select on user_queue_partition_assignment_table to TESTUSER;
exec dbms_aqadm.GRANT_PRIV_FOR_RM_PLAN('TESTUSER');
commit;

This is loaded and run on the local container at test startup:

@Container
private static final OracleContainer ORACLE = new OracleContainer(ORACLE_IMAGE)
        .withStartupTimeout(Duration.ofMinutes(4))
        .withUsername(TEST_USER)
        .withPassword(TEST_PASSWORD);

private static OracleDataSource dataSource;
private static Path okafkaConfigDirectory;

@BeforeAll
static void configureDatabase() throws Exception {
    ORACLE.copyFileToContainer(MountableFile.forClasspathResource("okafka.sql"), "/tmp/okafka.sql");
    org.testcontainers.containers.Container.ExecResult result =
            ORACLE.execInContainer("sqlplus", "sys / as sysdba", "@/tmp/okafka.sql");
    if (result.getExitCode() != 0) {
        throw new IllegalStateException("Unable to apply OKafka grants: " + result.getStderr());
    }

    dataSource = new OracleDataSource();
    dataSource.setURL(ORACLE.getJdbcUrl());
    dataSource.setUser(TEST_USER);
    dataSource.setPassword(TEST_PASSWORD);

    okafkaConfigDirectory = Files.createTempDirectory("okafka-tns-admin-");
    Files.writeString(okafkaConfigDirectory.resolve("ojdbc.properties"), """
            user = testuser
            password = Welcome123#
            """);

    try (Connection connection = dataSource.getConnection()) {
        EventSchema.createTables(connection);
    }
}

Final Thoughts

Agent skill design for reusable OKafka coding patterns and validation workflows.

The real leverage here is developing and sharing agent skills that capture the Oracle AI Database patterns your team needs. Do you have common database workflows? Common development patterns? Encapsulate them in a skill, iterate on it, and share it.

Once details are packaged, agents can operate at a higher level. You spend less time correcting boilerplate and more time designing stronger examples, testing real behavior, and building more powerful Oracle AI Database applications from a better starting point.

To summarize

Any Java developer working with Oracle AI Database can use this skill to write pub/sub code with Kafka APIs that target the database.
OKafka adds database connection APIs to standard Kafka Java APIs; otherwise, the same interfaces are used.
The getDBConnection() method in OKafka KafkaProducer and KafkaConsumer classes allows developers to add database logic to produce and consume operations in a single transaction.
To validate generated code yourself, refer to concrete OKafka code examples.
The skill leverages hand-written, tested OKafka code to generate new code specific to your application. You can find additional samples here.


Single OpenAI-compatible endpoint for OCI Generative AI models with LiteLLM

Anya Summers — Mon, 29 Jun 2026 14:30:28 +0000

This post stands up a LiteLLM gateway on an OCI Compute instance that authenticates to OCI Generative AI using an instance principal, the identity OCI already gives every VM. There are no signing keys to generate, mount, or rotate. Supported OCI Generative AI models such as Grok, Gemini, Llama, and Cohere models can be reached through the gateway, subject to region and model availability. And because routing is pure passthrough, newly supported on-demand models can be discovered without maintaining a hardcoded model list.

If you saw the announcement that LiteLLM now natively supports Oracle Generative AI, this is the hands-on companion: the exact resources, the IAM that makes instance principal work, and the networking detail that ties it together — start to finish.

Why this shape

LiteLLM gives you a single OpenAI-compatible surface (/v1/chat/completions, /v1/embeddings, /v1/models) in front of Grok, Llama, Gemini, Cohere Command/Embed, and OpenAI gpt-oss — all hosted on OCI Generative AI, with OCI Signature v1 signing handled inside LiteLLM. Running it inside your tenancy on a Compute instance buys a simpler OCI credential story: the instance authenticates as itself, governed by an IAM policy, and you never handle a private key.

LiteLLM with OCI GenAI Architecture

The gateway runs inside your tenancy. The client hits Caddy on :443 (NSG-gated, SSH scoped to your IP); Caddy terminates TLS and reverse-proxies to the shim on localhost:4000. The shim signs each call with the VM’s own instance-principal identity — token fetched from 169.254.169.254 — and reaches OCI Generative AI via the Internet Gateway. One IAM policy authorizes it all; no OCI API signing keys on disk. The federated token is short-lived, so the shim re-federates automatically on an OCI 401 INVALID_AUTHENTICATION_INFO — token expiry self-heals rather than surfacing as an error.
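The Step 4 listing keeps the shim minimal and does not include this retry. The re-federate-on-401 behavior comes down to “refresh the credential once, then retry once.” Below is a library-agnostic sketch of that pattern. `AuthError`, `call`, and `refresh` are hypothetical names. In a real shim, `refresh` might call the instance-principal signer’s `refresh_security_token()`, and the auth check would inspect the OCI error’s status and code. Treat that mapping as an assumption to verify against your OCI Python SDK version.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Hypothetical error for a 401 INVALID_AUTHENTICATION_INFO response."""


def call_with_refederation(call: Callable[[], T], refresh: Callable[[], None]) -> T:
    """Run `call`; on an auth failure, refresh credentials and retry exactly once.

    A second consecutive auth failure is re-raised, so a broken policy or
    revoked access still surfaces as an error instead of looping forever.
    """
    try:
        return call()
    except AuthError:
        refresh()      # for example, obtain a fresh federated token
        return call()  # one retry only
```

Capping the retry at one attempt matters here. An expired token heals itself, while a genuine authorization problem still fails fast.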

The one caveat worth reading first

LiteLLM exposes OCI two ways, and they are not interchangeable:

The LiteLLM Proxy (litellm --config config.yaml) supports OCI via manual API-key credentials only — oci_user, oci_fingerprint, oci_tenancy, oci_key/oci_key_file, oci_compartment_id. There is no way to hand the proxy an instance-principal signer object through YAML.
The LiteLLM SDK (litellm.completion(...)) accepts an oci_signer= object, which is the door to instance principal, resource principal, and OKE workload identity.

So if you want instance principal without OCI API signing keys, you call the SDK and put a thin OpenAI-compatible HTTP layer in front of it. That’s the path this implementation takes. You trade away the proxy’s management UI (virtual keys, budgets, logs); in return, you avoid storing OCI API signing credentials.

Step 1 — IAM: one policy, no keys, no users

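With instance principal, the VM authenticates as itself at request time, so the only thing between your VM and OCI Generative AI is an IAM policy. Broad version (any-user matches every principal in the tenancy, not just this instance, so keep it for quick experiments):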

allow any-user to manage generative-ai-family in compartment <YourCompartment>

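Least-privilege version, scoped via a dynamic group. The matching rule below covers every instance in the compartment; use instance.id = '<instance-ocid>' instead to pin it to a single VM: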

# Dynamic group (Identity & Security > Domains > Dynamic groups)
ALL {instance.compartment.id = '<compartment-ocid>'}
# Policy
allow dynamic-group <litellm-dg> to use generative-ai-family in compartment <YourCompartment>

“use” is enough for inference. That’s the entire identity story: no OCI API signing key stored on the instance.

Step 2 — Networking

A dedicated VCN keeps the gateway self-contained and trivially removable:

COMP=<compartment-ocid>; REGION=us-ashburn-1
VCN=$(oci network vcn create -c $COMP --region $REGION --cidr-blocks '["10.20.0.0/16"]' \
  --display-name litellm-vcn --dns-label litellmvcn --query data.id --raw-output --wait-for-state AVAILABLE)
IGW=$(oci network internet-gateway create -c $COMP --region $REGION --vcn-id $VCN --is-enabled true \
  --display-name litellm-igw --query data.id --raw-output --wait-for-state AVAILABLE)
RT=$(oci network route-table create -c $COMP --region $REGION --vcn-id $VCN --display-name litellm-rt \
  --route-rules '[{"destination":"0.0.0.0/0","destinationType":"CIDR_BLOCK","networkEntityId":"'$IGW'"}]' \
  --query data.id --raw-output --wait-for-state AVAILABLE)
SUBNET=$(oci network subnet create -c $COMP --region $REGION --vcn-id $VCN --cidr-block 10.20.1.0/24 \
  --display-name litellm-subnet --dns-label litellmsub --route-table-id $RT \
  --prohibit-public-ip-on-vnic false --query data.id --raw-output --wait-for-state AVAILABLE)
NSG=$(oci network nsg create -c $COMP --region $REGION --vcn-id $VCN --display-name litellm-nsg \
  --query data.id --raw-output --wait-for-state AVAILABLE)
oci network nsg rules add --nsg-id $NSG --region $REGION --security-rules \
  '[{"direction":"INGRESS","protocol":"6","source":"0.0.0.0/0","sourceType":"CIDR_BLOCK","isStateless":false,"tcpOptions":{"destinationPortRange":{"min":4000,"max":4000}}}]'

Keep SSH (22) on the VCN default security list but scope it to your own IP. Port 4000 lives on the NSG, so you open it as wide as your clients need.

Step 3 — A baked image, not install-on-boot

The LiteLLM image runs from a uv-managed venv at /app/.venv that ships without pip, so python3 -m pip install oci fails with “No module named pip”. Bootstrap it once at build time and bake the result into an image, so every container start is fast and offline:

FROM ghcr.io/berriai/litellm:main-stable
RUN /app/.venv/bin/python3 -m ensurepip --upgrade \
 && /app/.venv/bin/python3 -m pip install --no-cache-dir oci fastapi uvicorn
COPY server.py /app/server.py
ENV PORT=4000
ENTRYPOINT ["/app/.venv/bin/python3", "/app/server.py"]

podman build -t oci-litellm-gateway:latest -f Containerfile .
podman run -d --name litellm --restart=always --network=host \
  -e OCI_REGION=us-ashburn-1 -e OCI_COMPARTMENT_ID=<compartment-ocid> \
  -e LITELLM_MASTER_KEY=<your-bearer-key> -e PORT=4000 \
  oci-litellm-gateway:latest

Two details that can save you a lot of debugging time:

--network=host is mandatory. Instance principal fetches its leaf certificate and token from the metadata service at 169.254.169.254 (link-local address). A container on the default bridge network can’t route to that link-local address; host networking fixes it (and binds :4000 on the host directly).
Use the venv’s python3. litellm lives in /app/.venv; a system python won’t see it. The ENTRYPOINT above pins it.
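If instance-principal calls fail from inside the container, probe the metadata service from the same network namespace to confirm the networking cause. The sketch below uses only the standard library. It assumes OCI’s IMDSv2 convention: the /opc/v2/ path and an “Authorization: Bearer Oracle” header. Any HTTP response counts as reachable, even an error status. A timeout or connection error counts as unreachable.

```python
import urllib.error
import urllib.request

METADATA_URL = "http://169.254.169.254/opc/v2/instance/"  # OCI IMDSv2 endpoint (assumed path)


def metadata_reachable(timeout: float = 2.0, url: str = METADATA_URL) -> bool:
    """Return True if the instance metadata service answers at all."""
    request = urllib.request.Request(url, headers={"Authorization": "Bearer Oracle"})
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # an HTTP status came back, so the link-local address is routable
    except (urllib.error.URLError, OSError):
        return False  # no route, connection refused, or timed out
```

One way to use it is to run it with the image’s /app/.venv/bin/python3 in the container. False on the default bridge network and True with --network=host would confirm the diagnosis.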

Step 4 — The shim: LiteLLM SDK behind an OpenAI-compatible API

This is the whole gateway. It builds the instance-principal signer once, exposes the OpenAI routes, forwards every model name straight through as oci/<model>, and discovers /v1/models live from OCI, so there is no list to maintain. Those routes are the Chat Completions–era API (/v1/chat/completions, /v1/embeddings, /v1/models), deliberately not OpenAI’s newer Responses API (/v1/responses). LiteLLM’s completion() and embedding() map to Chat Completions and Embeddings, which most mainstream chat clients still speak. Configuration is entirely environment-driven.

import os, json, datetime, litellm
from oci.auth.signers import InstancePrincipalsSecurityTokenSigner
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import StreamingResponse

REGION = os.environ.get("OCI_REGION", "us-ashburn-1")
COMP   = os.environ.get("OCI_COMPARTMENT_ID", "")
MKEY   = os.environ.get("LITELLM_MASTER_KEY", "")
PORT   = int(os.environ.get("PORT", "4000"))

SIGNER = InstancePrincipalsSecurityTokenSigner()
OCI = dict(oci_signer=SIGNER, oci_region=REGION, oci_compartment_id=COMP)
app = FastAPI()

def resolve(name):                       # pure passthrough: any model -> oci/<model>
    if not name: raise HTTPException(400, "missing 'model'")
    return name if name.startswith("oci/") else f"oci/{name}"

def auth(r):
    if MKEY and r.headers.get("authorization", "") != f"Bearer {MKEY}":
        raise HTTPException(401, "unauthorized")

def discover_models():                   # live, best-effort; never fatal
    try:
        import oci
        c = oci.generative_ai.GenerativeAiClient(config={}, signer=SIGNER)
        now = datetime.datetime.now(datetime.timezone.utc); out = []
        for m in c.list_models(compartment_id=COMP).data.items:
            caps = set(m.capabilities or [])
            if m.lifecycle_state != "ACTIVE" or not ({"CHAT","TEXT_EMBEDDINGS"} & caps): continue
            r = m.time_on_demand_retired
            if r is not None and r.year > 1971 and r <= now: continue
            out.append(m.display_name)
        return sorted(set(out))
    except Exception:
        return []

@app.get("/health/readiness")
def ready(): return {"status": "connected"}

@app.get("/v1/models")
def models():
    return {"object": "list", "data": [{"id": m, "object": "model", "owned_by": "oci"} for m in discover_models()]}

@app.post("/v1/chat/completions")
async def chat(req: Request):
    auth(req); b = await req.json()
    common = dict(model=resolve(b.get("model")), messages=b["messages"], **OCI)
    if b.get("stream"):
        def gen():
            for c in litellm.completion(stream=True, **common):
                yield f"data: {json.dumps(c.model_dump())}\n\n"
            yield "data: [DONE]\n\n"
        return StreamingResponse(gen(), media_type="text/event-stream")
    return litellm.completion(**common).model_dump()

@app.post("/v1/embeddings")
async def embeddings(req: Request):
    auth(req); b = await req.json()
    inp = b["input"]; inp = [inp] if isinstance(inp, str) else inp
    return litellm.embedding(model=resolve(b.get("model")), input=inp, **OCI).model_dump()

if __name__ == "__main__":
    import uvicorn; uvicorn.run(app, host="0.0.0.0", port=PORT)

Because routing is passthrough, supported on-demand OCI Generative AI models can be reached through the gateway, subject to region, tenancy access, model availability, and LiteLLM compatibility. The live /v1/models discovery means you do not need to maintain a hardcoded model list, and supported new models can become available through the endpoint as OCI exposes them.

Step 5 — Test it

IP=<public-ip>; KEY=<your-bearer-key>
curl -s http://$IP:4000/health/readiness
curl -s http://$IP:4000/v1/chat/completions \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{"model":"xai.grok-4.3","messages":[{"role":"user","content":"In one sentence, what is Oracle Generative AI?"}]}'

It’s a drop-in OpenAI base URL, so the OpenAI SDK works unchanged:

from openai import OpenAI
c = OpenAI(base_url="http://<public-ip>:4000/v1", api_key="<your-bearer-key>")
print(c.chat.completions.create(model="xai.grok-4.3",
    messages=[{"role": "user", "content": "Hello from OCI"}]).choices[0].message.content)
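When a client sets "stream": true, the shim emits Server-Sent Events: one data: <json> line per chunk, followed by a blank line, and a final data: [DONE]. To consume that stream without the OpenAI SDK, a small stdlib parser is enough. This sketch assumes each chunk has the Chat Completions chunk shape (choices[0].delta.content), which is what LiteLLM’s model_dump() is expected to produce. Verify it against your actual output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE lines emitted by /v1/chat/completions."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return    # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                yield piece
```

With urllib, you could feed it `(line.decode("utf-8") for line in response)` and join the fragments as they arrive.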

Step 6 — Harden

Scope SSH to your IP in the VCN default security list; leave port 4000 (on the NSG) as open as you need.
Front it with a name and TLS (Caddy). Point a DNS-only A record at the instance, open 80 and 443 in the NSG, then run Caddy alongside the gateway with a minimal Caddyfile:

chat.example.com {
    reverse_proxy localhost:4000
}

podman run -d --name caddy --restart=always --network=host \
  -v /opt/caddy/Caddyfile:/etc/caddy/Caddyfile:Z \
  -v caddy_data:/data -v caddy_config:/config \
  docker.io/library/caddy:latest

Caddy obtains and renews a Let’s Encrypt certificate automatically (TLS-ALPN-01 on 443, HTTP-01 on 80) and reverse-proxies to the shim on localhost:4000, passing the Authorization header through. Now https://chat.example.com/v1 works with no port (substitute your own domain). This also unblocks browser-hosted chat UIs that refuse to call plain-HTTP endpoints (mixed content). Note that most DNS proxies won’t forward arbitrary ports, so keep the record DNS-only (or let Caddy own 443).
Let browser UIs in (CORS). Server-side clients (curl, the SDK, Open WebUI, LobeChat on Vercel) work as-is, but browser apps that call the endpoint straight from the page need CORS headers or the browser blocks the preflight. Set ENABLE_CORS=true and scope CORS_ORIGINS; the master key stays the gate, and since auth is a bearer header rather than a cookie, credentials mode stays off.
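Rotate the key by changing LITELLM_MASTER_KEY and recreating the container: run podman rm -f litellm, then rerun the podman run command from Step 3 with the new value. A plain podman restart keeps the old environment.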
For real multi-tenant key management, budgets, and request logs, switch to the LiteLLM Proxy with a manual signing key.
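The shim in Step 4 does not read ENABLE_CORS or CORS_ORIGINS on its own, so those variables need a few lines of wiring. One option is FastAPI’s standard CORSMiddleware, sketched below. The variable names come from this post. The comma-separated format for CORS_ORIGINS and the helper names are assumptions.

```python
import os
from typing import Mapping, Optional


def cors_settings(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Build CORSMiddleware keyword arguments from the environment, or None if disabled."""
    if env.get("ENABLE_CORS", "").strip().lower() not in ("1", "true", "yes"):
        return None
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    return {
        "allow_origins": origins,                  # explicit list rather than "*"
        "allow_methods": ["GET", "POST", "OPTIONS"],
        "allow_headers": ["Authorization", "Content-Type"],
        "allow_credentials": False,                # bearer header, not cookies
    }


def apply_cors(app, env: Mapping[str, str] = os.environ) -> None:
    """Attach CORS handling to the FastAPI app only when it is enabled."""
    settings = cors_settings(env)
    if settings is None:
        return
    # Imported lazily; FastAPI is already installed in the image from Step 3.
    from fastapi.middleware.cors import CORSMiddleware
    app.add_middleware(CORSMiddleware, **settings)
```

Calling apply_cors(app) right after app = FastAPI() in server.py would enable it. The bearer-key check in auth() remains the actual gate.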

Conclusion

This is a single OpenAI-compatible endpoint for supported on-demand OCI Generative AI models, authenticated with the instance’s own identity, without storing an OCI API signing key. That last part is the real win. Revoking access is editing one IAM policy.

And it’s small enough to trust: one VCN, one subnet, one NSG, one VM, one policy — a surface you can hand to a security reviewer on a single page or stamp out per environment from the cloud-init here.

In return, any OpenAI-compatible client (desktop apps, browser UIs, or your own code) can access Grok, Gemini, Llama, and Cohere without OCI SDKs or per-application OCI credentials. And because models are discovered dynamically, newly supported OCI Generative AI models can become available through the endpoint without configuration changes, subject to region and model availability.

The only implementation details worth remembering are the non-obvious ones: --network=host for metadata access, bootstrapping pip into the image’s venv at build time, and remembering that instance principal authentication lives on the SDK path, not in the proxy’s YAML configuration.

The Agent Loop Decoded

Anya Summers — Mon, 29 Jun 2026 14:28:14 +0000

This article was originally written and published by Richmond Alake on blogs.oracle on 11 June.

Three Levels Every Agent Engineer Must Know

Chances are you have already run an agent loop today without naming it.

Every session with a coding companion such as Claude Code, Codex, or Cursor is one: the model reads a request, inspects the repository, edits a file, runs the tests, observes the failures, and edits again until the build passes.

That cycle of reasoning, acting, and observing the result is the agent loop at work, and it now sits at the centre of nearly every production agent system. The agent loop is the repeating cycle a harness runs within a single agent turn: assemble context, invoke the model to reason, act on its decision, and go again until a stop condition ends the run.

This piece unpacks that loop across three levels of understanding.

Level 1 is the minimal loop most developers meet first: an LLM, a handful of tools, and a response.
Level 2 introduces a lifecycle inside the loop, where memory operations turn a stateless process into a reasoning engine with state.
Level 3 pushes operations both inside and outside the loop, where the agent harness becomes a system in its own right.

By the end, you will know which level your system sits at, what breaks when the level and the task are mismatched, and what engineering work moves you up. Every pattern discussed is implemented in the companion notebook, built on Oracle AI Database, so you can run the loop rather than just read about it.

What Is an Agent?

Figure 1: An agent perceives its environment, reasons with an LLM, acts, and remembers

An agent is a computational system that perceives its environment, reasons about what it perceives, takes actions to achieve a goal, and has some form of memory. That description applies to many things: a thermostat, a chess engine, a human professional. What makes an AI agent distinct is that the reasoning step is handled by a large language model, and the range of possible actions extends well beyond a binary output.

An agent’s architecture consists of two separable layers. The first is the model: the inference engine that does the reasoning. The second is the harness: the code that prepares context, executes tool calls, enforces operational constraints, and persists state. Most agent engineering work happens in the harness, not the model. Understanding that boundary clarifies where failures originate and where interventions are effective.

Figure 2: The two layers of an agent’s architecture: the model and the harness

An agent needs at minimum four things to be useful:

Instructions: a system prompt or goal that tells it what it is trying to accomplish.
Memory: access to information beyond the current message, including prior context, retrieved knowledge, and learned patterns.
The ability to take actions: tool calls, API requests, database writes, or any operation with an external effect.
A reasoning engine: an LLM that looks at context and decides what to do next.

What Is a Loop?

A loop is a control structure that repeats a block of execution until a condition is met. In programming you encounter this everywhere: iterating over a collection, running until a flag is set, calling recursively until a base case is reached.

The agent loop applies that same structure to an LLM-powered system. Rather than processing a user message once and returning a static response, the agent feeds its output back into itself, reasoning, acting, observing the result, and reasoning again, until it determines the task is complete.

Figure 3: The agent loop: assemble context, reason, act, and repeat until a stop condition ends the run

The necessity for loops in agent execution can be derived from the nature of the use cases and tasks agents are applied to. These common use cases can be referred to as application modes: the expected interaction patterns between a user and an agent. There are three:

Assistant
Deep Research
Coding

Take the deep research mode. An agent tasked with finding relevant sources, identifying contradictions across them, and producing a structured summary is not running a single-shot task. It requires the agent to:

Search for relevant sources.
Read and evaluate what it finds.
Identify gaps and contradictions.
Search again to fill in those gaps.
Synthesise everything into a coherent output.

Figure 4: The deep research cycle: search, evaluate, identify gaps, and search again until coverage is sufficient

No single LLM call can do all of that. What is required is the mechanism and scaffolding that allows the model to reason, act, observe the result, reason again, and continue until the task is complete. That mechanism is the agent loop.
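The research cycle above can be sketched as a toy loop with the LLM removed. `CORPUS` and `search` are hypothetical: a tiny in-memory lookup in which each answer may raise follow-up questions. “Gaps” are the open questions not yet answered. Each round searches for every open question and records what it finds. It then queues the follow-ups as new gaps, and it stops once no gaps remain or the round budget runs out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical corpus: query -> (finding, follow-up questions the finding raises)
CORPUS: Dict[str, Tuple[str, List[str]]] = {
    "main question": ("finding 1", ["gap 1"]),
    "gap 1": ("finding 2", []),
}


def search(query: str) -> Optional[Tuple[str, List[str]]]:
    return CORPUS.get(query)


def deep_research(question: str, max_rounds: int = 5) -> Dict[str, str]:
    findings: Dict[str, str] = {}
    gaps = [question]                  # open questions still to cover
    for _ in range(max_rounds):
        if not gaps:
            break                      # coverage is sufficient
        next_gaps: List[str] = []
        for q in gaps:                 # search and evaluate
            hit = search(q)
            if hit is None:
                continue               # nothing found; a real agent might rephrase
            finding, follow_ups = hit
            findings[q] = finding
            next_gaps += [f for f in follow_ups if f not in findings]  # identify gaps
        gaps = next_gaps               # search again to fill them
    return findings                    # a synthesis step would turn these into prose
```

In a real agent, the model would decide what to search, judge the results, and name the gaps. The loop structure itself stays the same.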

Notably, agent frameworks and harnesses, however opinionated, share one thing: convergence on a minimal agent loop design. That convergence is arguably less a design choice than a logical consequence of the task itself.

The agent loop exists because long-horizon tasks cannot be completed in a single forward pass.

The loop emerging as a design pattern draws a parallel to how humans operate in most organisations: structured cycles of work, review, and feedback that repeat until the objective is met.

Stop Conditions

Loops have to be exited eventually. The programmatic loops taught in computer science classes usually exit in one of two ways: the iteration count for the loop is reached, or a break statement inside the loop triggers an exit.

A well-designed agent loop defines explicit exit criteria. Common examples:

The model produces a final response with no pending tool calls.
A goal-completion check returns true: an objective-specific predicate, not merely the absence of tool calls.
A maximum number of iterations is reached.
A wall-clock timeout expires.
An error occurs that the agent cannot recover from.
The harness identifies a failure mode, such as the agent repeating the same action without progress.
The agent explicitly invokes an exit action or sets a completion flag.

In the notebook accompanying this article, the stop conditions are implemented directly inside the harness:

import time

def call_agent(query, thread_id='1', max_iterations=10,
               max_execution_time_s=60.0):
    # Abridged: building messages and tools, and executing tool calls,
    # are elided here; see the companion notebook for the full harness.
    start_time = time.time()
    iteration = 0
    while iteration < max_iterations:
        if time.time() - start_time > max_execution_time_s:
            break  # Wall-clock timeout
        response = call_openai_chat(messages, tools)
        if not response.tool_calls:
            return response.content  # Terminal message: the agent's turn ends here
        # Execute tools, append outputs, continue
        iteration += 1
    # Fallback if the iteration cap or the time limit was reached
    return 'Max iterations reached; please refine the request.'

max_iterations defaults to 10. This guards against the loop running indefinitely, which can drive up operational cost, since every iteration is another inference call that consumes tokens. The max_execution_time_s parameter adds a wall-clock guard to the loop’s execution.
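Cost compounds with the iteration count because each iteration resends the growing conversation. As a rough worked example, assume a 2,000-token starting context and 500 new tokens per iteration (illustrative figures only, ignoring prompt caching). Iteration i (counting from 0) then sends 2,000 + 500·i input tokens. Ten iterations send 10 × 2,000 + 500 × (0 + 1 + … + 9) = 42,500 input tokens in total, and the total grows quadratically with the number of iterations.

```python
def cumulative_input_tokens(base: int, per_iteration: int, iterations: int) -> int:
    """Total input tokens when iteration i (0-based) sends base + i * per_iteration tokens.

    sum_{i=0}^{n-1} (base + i*k) = n*base + k*n*(n-1)/2, which is quadratic in n.
    """
    n = iterations
    return n * base + per_iteration * n * (n - 1) // 2
```

Under these assumptions, doubling the cap from 10 to 20 iterations raises the total from 42,500 to 135,000 input tokens, more than three times as many.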

It is worth noting that a terminal message from the model, one with no further tool calls, ends the agent’s turn. It does not mean the user’s goal has been satisfied. The model may return a clarifying question, a partial result, or a response that requires follow-up. The agent harness is responsible for checking whether the goal is actually complete, not simply whether the model has stopped emitting tool calls. This distinction becomes more consequential as tasks grow in length and complexity, and it is where domain expertise becomes paramount in agent harness engineering.

Failure mode identification deserves its own mention as an exit path. A loop should break not only when work completes but when work stops progressing.

The clearest example is tool call repetition: the agent invokes the same tool with identical arguments for a third consecutive iteration, a strong signal that it is stuck rather than working. A well-instrumented harness keeps a window of recent tool calls, detects the repetition, and exits with a diagnostic instead of spending the remaining iterations on a stalled run. Oscillation between two states belongs to the same family of detectable failures.
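A minimal sketch of that kind of progress monitor follows. The `ProgressMonitor` class, its window size of four and the three-repeat limit are illustrative assumptions rather than the notebook's code; arguments are serialized with sorted keys so that argument order cannot hide a repeat.

```python
import json
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple

Call = Tuple[str, str]  # (tool name, canonical JSON of the arguments)


class ProgressMonitor:
    """Keeps a window of recent tool calls and flags stalled runs."""

    def __init__(self, window: int = 4, repeat_limit: int = 3) -> None:
        self.repeat_limit = repeat_limit
        self.recent: Deque[Call] = deque(maxlen=window)

    def record(self, tool_name: str, args: Dict[str, Any]) -> Optional[str]:
        # Canonicalise arguments so {"a": 1, "b": 2} and {"b": 2, "a": 1} match
        call = (tool_name, json.dumps(args, sort_keys=True))
        self.recent.append(call)
        calls = list(self.recent)

        # Same tool with identical arguments for repeat_limit consecutive iterations
        tail = calls[-self.repeat_limit:]
        if len(tail) == self.repeat_limit and len(set(tail)) == 1:
            return f"stalled: {tool_name} repeated {self.repeat_limit}x with identical args"

        # Oscillation: a full window alternating between exactly two calls (A, B, A, B)
        if len(calls) == self.recent.maxlen and len(set(calls)) == 2:
            if all(calls[i] == calls[i % 2] for i in range(len(calls))):
                return "stalled: oscillating between two tool calls"
        return None  # still making progress as far as this monitor can tell
```

The harness calls `record` after each tool call and exits with the returned diagnostic instead of spending the remaining iterations.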

Defining the Agent Loop

With the components and the exit criteria established, the definition can now be stated with precision:

The Agent Loop

A cyclical, iterative execution pattern inside a single agent run where the harness repeatedly:

Assembles execution context: system instructions, conversation state, retrieved memory, tool outputs, and any relevant external data.
Invokes a reasoning model to decide what to do next.
Acts: responds to the user, calls tools, writes memory or state, or updates its plan.

Each cycle appends its trace (assistant messages, tool outputs, state updates) to the context and repeats until a termination check ends the run. Context-window pressure and operational safety (timeouts, iteration caps, budget guards) are first-class concerns, not afterthoughts.

Three Levels of the Agent Loop

The agent loop is not a fixed pattern. The simple design presented above evolves as memory, tooling, and opinionated scaffolding are added. The three levels below provide a framework for judging where a system currently sits and what engineering work lies ahead. Most production failures (agents that repeat themselves, lose context, or produce inconsistent results across sessions) trace back to a mismatch between task complexity and agent level.

Figure 5: The three levels of the agent loop

Level 1: LLM + Tools + Response

At its simplest, the agent loop is an LLM that can call tools and return a response. There is no persistent memory, no external state, and no scaffolding beyond the loop itself. The loop iterates because tool results must be fed back to the model before it can produce a final answer.

The code below demonstrates the pattern most developers encounter when building simple tool-calling agents:

messages = [system_prompt, user_message]
while True:
    response = llm.chat(messages, tools=available_tools)
    if response.tool_calls:
        messages.append(response)  # Keep the assistant's tool-call turn in history
        for call in response.tool_calls:
            result = execute_tool(call.name, call.args)
            messages.append(tool_result(call, result))  # Tie each result to its call
    else:
        return response.content  # Terminal message; exit

Figure 6: Level 1: the minimal tool-calling loop

LangChain’s ReAct agent provides this pattern out of the box. The agent receives an input query, selects a tool, calls it, observes the output, and reasons again, all within a single run:

from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI

# search_tool and prompt (a ReAct prompt template) are assumed to be defined earlier
llm = ChatOpenAI(model='gpt-4o')
agent = create_react_agent(llm, tools=[search_tool], prompt=prompt)
executor = AgentExecutor(agent=agent, tools=[search_tool], max_iterations=10)
executor.invoke({'input': 'What are the latest AI papers on agent memory?'})

Level 1 is where most developers start, and it is genuinely useful for self-contained tasks. Its limitation is structural: the agent has no recollection of previous conversations. Every run starts cold, the context window is the only memory it has, and it resets completely when the run ends. On any multi-turn or long-horizon task, it will repeat work it already did, lose track of decisions made earlier in the session, and produce output that contradicts its own prior responses.

Level 2: Lifecycle Inside the Loop

At Level 2, memory operations begin to appear inside the agent loop. Memory is read before the LLM is called, and memory is written after the agent acts. The loop now has a lifecycle. At Level 1, the loop can be seen as a transport mechanism for tool calls. At Level 2, the loop becomes a reasoning engine with state. This is also where the distinction between a memory-augmented agent and a memory-aware agent becomes consequential.

Memory-augmented agents retrieve and inject information into context. They read from memory, but they do not actively manage it. Memory is something that happens to them.
Memory-aware agents treat memory as a first-class engineering concern. They encode, store, retrieve, inject, and forget, actively managing their cognitive state within each run and across sessions. Level 2 is where you begin building memory-aware agents.

For a full treatment of this distinction and the engineering it implies, see the DeepLearning.AI short course Agent Memory: Building Memory-Aware Agents, built with Oracle.

Figure 7: Memory-augmented agents read from memory; memory-aware agents manage it

Level 2 makes context assembly trade-offs immediately visible. Adding more memory types (conversation history, retrieved documents, entity records, workflow patterns) improves grounding and action selection. On the other hand, it also introduces cost: more tokens, higher latency, and a greater risk of injecting irrelevant or stale content that misleads the model rather than informing it.

There are a few failure modes worth mentioning:

Noisy retrieval: semantically similar documents that are not actually relevant to the current query. Mitigate with relevance thresholds and precision-oriented retrieval strategies in the retrieval pipeline, such as hybrid search and pre-, post-, and in-filtering.
Stale memory: cached facts, entity records, or summaries that are no longer accurate, which happens quickly in fast-moving problem domains. Mitigate with TTL policies and update-on-write patterns.
Tool schema overload: context bloat is a common problem, and it is most prevalent in tool-calling agents with too many tool definitions passed to the model at once, degrading tool selection accuracy. Mitigate with semantic tool retrieval rather than exhaustive enumeration; this is shown in the companion notebook for this piece.
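The first two mitigations can be applied as a final filter over retrieved items before they enter the context. The sketch below assumes a hypothetical `Retrieved` record carrying a similarity score and a last-write timestamp; the 0.75 threshold and 24-hour TTL are placeholder values to tune per domain, and update-on-write here simply means refreshing `written_at` whenever the item changes.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Retrieved:
    text: str
    score: float       # similarity score from the retriever; higher is closer
    written_at: float  # Unix timestamp of the last write to this memory item


def filter_context(items: List[Retrieved], min_score: float = 0.75,
                   ttl_s: float = 86_400.0, now: Optional[float] = None) -> List[Retrieved]:
    """Drop low-relevance hits (noisy retrieval) and expired items (stale memory)."""
    now = time.time() if now is None else now
    kept = [
        item for item in items
        if item.score >= min_score and (now - item.written_at) <= ttl_s
    ]
    # Highest-relevance survivors first, so truncation drops the weakest items
    return sorted(kept, key=lambda item: item.score, reverse=True)
```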

There are more failure modes, and in production these are not edge cases. They are predictable failures that any Level 2 agent will encounter as memory stores grow. Designing mitigation strategies at the start is cheaper than retrofitting fixes later.

Memory operations are common in Level 2 agent loops, mainly because agents at this level are designed for continuity and adaptation. Memory operations are the programmatic methods that read and write information inside the agent’s system boundary and in other system components, such as databases and external stores.

Operation	When It Runs	Purpose
Read conversational memory	Before LLM call	Load prior chat history into context
Read knowledge base	Before LLM call	Inject relevant documents and facts
Read workflow memory	Before LLM call	Surface known action patterns
Read entity memory	Before LLM call	Resolve named references in the query
Write conversational memory	After user message received	Persist the user turn
Write knowledge base	After tool search	Store retrieved results for future runs
Write entity memory	After LLM response	Extract and persist people, places, systems
Write conversational memory	After final response	Persist the assistant turn

In the accompanying notebook, these operations are centralised in a MemoryManager class backed by Oracle AI Database. Before each run, the harness calls all read operations to assemble context. After each run, write operations persist the new information:

# -- Reads: all run BEFORE the tool-call loop ------------------------
conv_mem = memory_manager.read_conversational_memory(thread_id)
knowledge = memory_manager.read_knowledge_base(query)
workflows = memory_manager.read_workflow(query)
entities = memory_manager.read_entity(query)
summaries = memory_manager.read_summary_context(thread_id)
context = build_context(conv_mem, knowledge, workflows, entities, summaries)
# -- Inner tool-call loop --------------------------------------------
response = run_tool_call_loop(context, tools)
# -- Writes: all run AFTER the loop exits ----------------------------
memory_manager.write_conversational_memory(thread_id, 'assistant', response)
memory_manager.write_entity(extract_entities(query, response))

The notebook uses six distinct memory types, each stored in Oracle AI Database and each serving a specific cognitive function:

Conversational memory: episodic chat history retrieved by thread ID via a standard SQL table. Exact lookup, no similarity search required.
Knowledge base memory: semantic memory backed by a vector-enabled SQL table with HNSW indexing for similarity search.
Workflow memory: procedural memory storing learned action patterns and tool sequences.
Toolbox memory: a vector-indexed registry of tool definitions enabling semantic discovery rather than exhaustive schema enumeration.
Entity memory: LLM-extracted people, places, and systems, persisted across sessions.
Summary memory: compressed context for long conversations, with just-in-time expansion when the agent needs the full content.
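The difference between exact lookup and similarity search is easiest to see in code. This sketch uses Python's built-in `sqlite3` as a stand-in for Oracle AI Database; the `conversational_memory` table layout is hypothetical, and the function names mirror the `MemoryManager` methods in the excerpt above without claiming to match their implementation. Reading history is a plain indexed equality query on `thread_id`, with no embeddings involved.

```python
import sqlite3

# sqlite3 stands in for Oracle AI Database; the table layout is a hypothetical example
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE conversational_memory (
           id        INTEGER PRIMARY KEY AUTOINCREMENT,
           thread_id TEXT NOT NULL,
           role      TEXT NOT NULL,
           content   TEXT NOT NULL
       )"""
)
conn.execute("CREATE INDEX idx_conv_thread ON conversational_memory (thread_id, id)")


def write_conversational_memory(thread_id: str, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO conversational_memory (thread_id, role, content) VALUES (?, ?, ?)",
        (thread_id, role, content),
    )


def read_conversational_memory(thread_id: str, limit: int = 20) -> list:
    # Exact match on thread_id: an indexed equality lookup, no similarity search
    rows = conn.execute(
        "SELECT role, content FROM conversational_memory "
        "WHERE thread_id = ? ORDER BY id DESC LIMIT ?",
        (thread_id, limit),
    ).fetchall()
    return list(reversed(rows))  # oldest first, ready to place in the prompt
```

A knowledge-base read, by contrast, would rank rows by vector distance to the query embedding rather than filter on an exact key.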

At Level 2, the loop is no longer just executing tools. It is actively managing its own cognitive state.

Level 3: Operations Inside and Outside the Loop

At Level 3, developers know which operations they require inside the loop, and more opinionated scaffolding and harness logic begins to form around the agent loop itself.

Operations now exist both within the loop and outside it, and there are deliberate architectural choices about which side of the boundary each operation belongs on. This is where agent engineering becomes opinionated, and where context engineering and memory engineering become distinct disciplines with separate concerns.

In a Level 3 agent loop, some operations should be automatic. The agent should never have to decide whether to load its own conversation history. Others should be agent-triggered: the agent decides when to search the web, not the harness.

Getting this boundary wrong produces either context bloat, when too much is loaded automatically, or missed context, when content that should always be present is left to the model’s discretion.

Operation	Programmatic	Agent Triggered	Why
Read conversational memory	Yes	No	The agent always needs its history
Read knowledge base	Yes	No	Relevant documents always loaded at run start
Read workflow memory	Yes	No	Known patterns always surfaced before reasoning
Read entity memory	Yes	No	Named references always resolved upfront
Read summary context	Yes	No	Summary IDs always loaded; full content expanded on demand
Expand a summary	No	Yes	Agent decides when it needs the full content
Search the web (Tavily)	No	Yes	Agent decides when stored knowledge is insufficient
Summarise conversation	No	Yes	Agent decides when context needs compaction
Write tool log (offload)	Yes	No	Automatic after every tool call; keeps context lean
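The boundary in the table can be expressed directly in harness code: programmatic operations run unconditionally during context assembly, while agent-triggered operations are only reachable as tools the model may call. The registries, lambda bodies, and `TOOL_LOG` list below are hypothetical placeholders standing in for real memory reads, tool implementations, and the tool log table.

```python
from typing import Callable, Dict, List

# Programmatic reads: the harness runs every one of these before reasoning.
PROGRAMMATIC_READS: Dict[str, Callable[[str, str], str]] = {
    "conversation": lambda thread_id, query: f"history for {thread_id}",
    "knowledge": lambda thread_id, query: f"documents about {query!r}",
    "summaries": lambda thread_id, query: "summary IDs only",
}

# Agent-triggered operations: exposed as tools; the model decides when to call them.
AGENT_TOOLS: Dict[str, Callable[..., str]] = {
    "expand_summary": lambda summary_id: f"full text of {summary_id}",
    "search_web": lambda q: f"web results for {q!r}",
    "summarise_conversation": lambda thread_id: f"compacted {thread_id}",
}

TOOL_LOG: List[str] = []  # stand-in for the tool log table


def assemble_context(thread_id: str, query: str) -> List[str]:
    # No model decision involved: every programmatic read runs on every turn
    return [read(thread_id, query) for read in PROGRAMMATIC_READS.values()]


def tool_schemas() -> List[str]:
    # Only agent-triggered operations are offered to the model as tools
    return sorted(AGENT_TOOLS)


def call_tool(name: str, *args: str) -> str:
    output = AGENT_TOOLS[name](*args)
    TOOL_LOG.append(f"{name}: {output}")  # programmatic write after every tool call
    return output
```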

Context engineering at Level 3

Three techniques only become necessary at Level 3. Below Level 3, your context is manageable by construction. At Level 3, with memory reads, multiple tool calls, and iterated reasoning, it is not.

Context window monitoring: tracking token usage across iterations to detect when compaction is needed before the window fills and performance degrades.
Conversation compaction: replacing verbose chat history with compressed summaries while preserving originals in the database. The notebook marks messages with a summary_id rather than deleting them, keeping the full record available for audit and on-demand expansion.
Tool output offloading: persisting full tool outputs to a tool log table and replacing them in context with a compact one-line reference.
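The first two techniques can be combined in one step, sketched below under several assumptions: a crude four-characters-per-token estimate instead of the model's tokenizer, a caller-supplied `summarise` function in place of an LLM call, and a hypothetical `Message` record whose `summary_id` field plays the role of the notebook's marker. Compacted messages are marked, never deleted, so the full record stays available.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    id: int
    role: str
    content: str
    summary_id: Optional[str] = None  # set when folded into a summary; never deleted


def estimate_tokens(text: str) -> int:
    # Rough heuristic of ~4 characters per token; use the model's tokenizer in practice
    return max(1, len(text) // 4)


def build_context(store: List[Message], summaries: Dict[str, str]) -> List[str]:
    live = [m.content for m in store if m.summary_id is None]
    return [f"[Summary {sid}] {text}" for sid, text in summaries.items()] + live


def compact_if_needed(store: List[Message], summaries: Dict[str, str], budget: int,
                      keep_recent: int, summarise: Callable[[List[Message]], str]) -> List[str]:
    """Monitor the token count and compact older turns once it exceeds the budget."""
    context = build_context(store, summaries)
    live = [m for m in store if m.summary_id is None]
    if sum(estimate_tokens(c) for c in context) <= budget or len(live) <= keep_recent:
        return context  # within budget: nothing to do
    old = live[: len(live) - keep_recent]  # everything except the most recent turns
    summary_id = f"sum-{old[0].id}-{old[-1].id}"
    for m in old:
        m.summary_id = summary_id  # mark rather than delete; originals remain for audit
    summaries[summary_id] = summarise(old)
    return build_context(store, summaries)
```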

The tool log pattern is worth examining in detail. A single web search can return three to four thousand tokens of raw results. Without offloading, every subsequent iteration in the same run carries those tokens. With offloading, the context receives only a reference:

def execute_tool(tool_name, tool_args, thread_id):
    raw_output = run_tool(tool_name, tool_args)
    # Full output persisted to the database
    log_id = memory_manager.write_tool_log(
        thread_id=thread_id,
        tool_name=tool_name,
        tool_output=raw_output
    )
    # Context receives only the compact reference
    return f'[Tool Log ID: {log_id}] Results stored. Call read_tool_log to retrieve.'
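For a self-contained view of both halves of the pattern, writing the full output and reading it back on demand, here is a sketch with an in-memory `ToolLog` class standing in for the database table. The class, the `offload` helper, and the `max_chars` parameter of `read_tool_log` are illustrative assumptions; only the reference string format is taken from the excerpt above.

```python
import itertools
from typing import Dict, Optional


class ToolLog:
    """In-memory stand-in for a tool log table (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[int, dict] = {}
        self._ids = itertools.count(1)

    def write(self, thread_id: str, tool_name: str, tool_output: str) -> int:
        log_id = next(self._ids)
        self._rows[log_id] = {"thread_id": thread_id, "tool_name": tool_name,
                              "tool_output": tool_output}
        return log_id

    def read(self, log_id: int, max_chars: Optional[int] = None) -> str:
        output = self._rows[log_id]["tool_output"]
        return output if max_chars is None else output[:max_chars]


tool_log = ToolLog()


def offload(thread_id: str, tool_name: str, raw_output: str) -> str:
    # Full output goes to storage; the context only ever sees this short reference
    log_id = tool_log.write(thread_id, tool_name, raw_output)
    return f"[Tool Log ID: {log_id}] Results stored. Call read_tool_log to retrieve."


def read_tool_log(log_id: int, max_chars: int = 2000) -> str:
    """Agent-callable tool: pull back (part of) an offloaded output when needed."""
    return tool_log.read(log_id, max_chars)
```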

Semantic tool discovery

At Level 3, the number of available tools is unlikely to stay small. Passing every tool schema to the model on every iteration is a known failure mode: tool selection accuracy drops as the schema list grows, and token costs climb regardless of how many tools are actually relevant.

The notebook addresses this with a Toolbox: a vector-indexed registry of tool definitions where only semantically relevant tools are retrieved and passed to the model for each query. Tools are registered with LLM-augmented metadata so that embeddings capture intent and use case, not just function signatures:

@toolbox.register_tool(augment=True)  # LLM enriches description for retrieval
def search_tavily(query: str, max_results: int = 5):
    """Search the web and persist results in the knowledge base."""
    ...

# At runtime: only semantically relevant tools passed to the model
relevant_tools = memory_manager.read_toolbox(current_query)
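The retrieval step behind a toolbox can be sketched without a vector database. Below, a toy bag-of-words vector and cosine similarity stand in for the embedding model and vector index; the three tool descriptions, `k`, and `min_score` are hypothetical, and `read_toolbox` here is an illustrative function, not the notebook's method.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real toolbox would call an embedding model
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # missing keys count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


TOOL_DESCRIPTIONS: Dict[str, str] = {
    "search_tavily": "search the web for recent papers news and articles",
    "get_weather": "get the current weather forecast for a city",
    "send_email": "send an email message to a recipient",
}
# Descriptions are embedded once, at registration time
TOOL_VECTORS = {name: embed(desc) for name, desc in TOOL_DESCRIPTIONS.items()}


def read_toolbox(query: str, k: int = 1, min_score: float = 0.1) -> List[str]:
    """Return at most k tool names whose descriptions are closest to the query."""
    q = embed(query)
    scored = sorted(((cosine(q, v), name) for name, v in TOOL_VECTORS.items()), reverse=True)
    return [name for score, name in scored[:k] if score >= min_score]
```

Only the returned names' schemas would then be passed to the model, so the prompt grows with relevance rather than with the size of the registry.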

Idempotency and tool reliability

Tool call failures are a production reality. Network errors, rate limits, and transient service issues occur regularly. If the harness retries a failed tool call naively, it risks executing a side-effecting operation twice: writing a record, sending a message, or triggering a payment more than once.

The mitigation is idempotency: assigning each tool call a stable key before execution so that a retry of a call that has already executed is recognized and not executed a second time. This is harness-level engineering, not model-level reasoning, and it belongs in the Level 3 design.
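A minimal sketch of idempotent execution follows, assuming the model-assigned tool call ID is reused when the harness retries. The key is derived before execution from the thread, call ID, tool name, and arguments, and a completed result is replayed instead of re-executed. The `IdempotentExecutor` class and its in-memory dictionary are hypothetical; a production harness would keep completed keys in durable storage.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def idempotency_key(thread_id: str, call_id: str, tool_name: str, args: Dict[str, Any]) -> str:
    # Built from values that stay identical across retries of the same call
    payload = json.dumps(
        {"thread": thread_id, "call": call_id, "tool": tool_name, "args": args},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


class IdempotentExecutor:
    def __init__(self) -> None:
        self.completed: Dict[str, Any] = {}  # in production: a durable table keyed by idempotency key

    def run(self, key: str, action: Callable[[], Any]) -> Any:
        if key in self.completed:
            return self.completed[key]  # retry of a call that already succeeded: no second side effect
        result = action()  # if this raises, nothing is recorded and a retry runs it again
        self.completed[key] = result
        return result
```

A crash between the side effect and recording the key can still cause a repeat, so downstream services that accept an idempotency key should receive the same key.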

Prompt caching and message ordering

At Level 3, the harness also starts to affect inference economics through prompt caching. Most LLM providers implement prefix-based caching: if the beginning of a prompt is identical to a recent request, the cached computation can be reused, reducing latency and cost.

The implication for agent design is concrete. Rewriting earlier messages mid-conversation, to clean up history, reorder context, or inject new system instructions inline, breaks prefix stability and degrades cache hit rates. The correct pattern is to append new instructions rather than modifying existing message history. The Codex implementation established this explicitly: old prompts are preserved as exact prefixes of new prompts specifically to maintain caching benefits across long multi-step runs.
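The effect on caching can be approximated by counting how many leading messages two consecutive requests share. Providers cache at the token level rather than per message, so this is only an illustration; the message strings are hypothetical.

```python
from typing import List


def shared_prefix_len(previous: List[str], current: List[str]) -> int:
    """Number of leading messages identical between two consecutive requests."""
    n = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        n += 1
    return n


history = ["SYSTEM: you are a research agent", "USER: find papers", "TOOL: 3 results"]

# Append-only: the previous request is an exact prefix of the next one
appended = history + ["SYSTEM: cite every source"]

# Rewriting in place: editing the first message changes everything after it
rewritten = ["SYSTEM: you are a research agent. Cite every source"] + history[1:]
```

Appending keeps all three earlier messages reusable, while the in-place edit leaves no shared prefix at all.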

Level 3 is where the agent harness becomes a system in its own right. The inner loop, assembling context, invoking the model, and acting, has not changed. What has changed is everything around it: the scaffolding that feeds it, the operational constraints that govern it, and the persistence layer that gives it continuity across time and sessions.

Other Loops the Agent Engineer Should Know

The agent loop does not run in isolation. It sits inside a wider system of loops, and the engineering decisions made inside the agent loop are shaped by what happens in the loops around it.

Three matter most to agent engineers and memory engineers: the training loop that produced the model, the feedback loop that signals whether the system is working, and the human loop that bounds its authority.

Figure 8: The loops interconnected: the training loop produces the model, the agent loop generates experience, and the memory layer routes that experience back as training signal

The training loop

The training loop is the cycle that produced the model in the first place: data collection, gradient updates, evaluation, and release. It operates offline, at a timescale of days or weeks, on curated datasets. The agent loop operates online, in real time, on live interactions.

Today these two loops are largely decoupled. Training happens, weights are frozen, and the agent loop runs on top of those fixed weights. The apparent learning you observe within a session, an agent recalling prior context or adapting to corrections, is not weight updating. It is retrieval. The agent is not learning; it is reading from memory.

This separation defines the boundary of what the agent loop can and cannot accomplish on its own. It can accumulate experience through memory operations. It cannot change the underlying model without a training cycle. Understanding this boundary tells you which problems belong to memory engineering and which require retraining.

The feedback loop

Every action the agent takes produces feedback. Tool results are feedback. User corrections are feedback. Evaluation metrics (hallucination rate, task completion, citation accuracy) are feedback at a system level.

At Level 3, the agent harness begins to make the feedback loop explicit and instrumentable. The notebook’s context window growth chart is a primitive example: watching whether token counts stabilize across runs tells you whether your context engineering is actually working. More sophisticated systems route evaluation signals back into memory stores, marking retrieved content as reliable or unreliable based on downstream outcomes, and gradually improving retrieval quality without retraining.
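One simple way to route downstream outcomes back into retrieval is a per-document reliability score, sketched below as a success ratio with a Beta(1, 1)-style prior, multiplied into the retriever's similarity. The `ReliabilityTracker` class and this scoring rule are assumptions for illustration, not the notebook's method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ReliabilityTracker:
    """Per-document outcome counts used to re-rank retrieval without retraining."""

    def __init__(self) -> None:
        # Start both counts at 1 so every document begins at a neutral 0.5
        self.good: Dict[str, int] = defaultdict(lambda: 1)
        self.bad: Dict[str, int] = defaultdict(lambda: 1)

    def record(self, doc_id: str, outcome_ok: bool) -> None:
        if outcome_ok:
            self.good[doc_id] += 1
        else:
            self.bad[doc_id] += 1

    def reliability(self, doc_id: str) -> float:
        g, b = self.good[doc_id], self.bad[doc_id]
        return g / (g + b)

    def rerank(self, hits: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
        # Blend the retriever's similarity with observed downstream reliability
        return sorted(hits, key=lambda h: h[1] * self.reliability(h[0]), reverse=True)
```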

The feedback loop is what turns an agent into a system that improves over time. Without it, every invocation starts from the same baseline regardless of what the agent has done before.

Human in the loop

Long-horizon tasks regularly reach decision points where the agent lacks the information, authority, or confidence to proceed without human input. The human-in-the-loop pattern introduces a pause condition: the agent surfaces a question or proposed action, waits for review or correction, and then continues.

This is a stop condition of a different kind. Rather than halting because the task is finished, the loop pauses because it has reached the boundary of its autonomous authority. Designing this well involves two things: knowing in advance where those boundaries should sit for a given workflow, and ensuring the agent communicates specifically when it reaches one. A generic request for help is insufficient. The agent must surface a precise description of what information or decision is blocking progress.
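The sketch below shows one way to make a pause carry a precise request rather than a generic plea for help. The `PauseRequest` and `Run` types, and the rule that a blank question is rejected, are hypothetical; in a real system the paused run would be checkpointed to durable storage before waiting for a reviewer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PauseRequest:
    """What the agent surfaces when it reaches the edge of its authority."""
    blocking_decision: str  # the precise question a human must answer
    proposed_action: str    # what the agent intends to do if approved
    context: Dict[str, str] = field(default_factory=dict)


@dataclass
class Run:
    thread_id: str
    state: Dict[str, str] = field(default_factory=dict)
    pending: Optional[PauseRequest] = None


def request_approval(run: Run, request: PauseRequest) -> Run:
    if not request.blocking_decision.strip():
        raise ValueError("a pause must state exactly what decision is blocking progress")
    run.pending = request  # a real harness would checkpoint the run here
    return run


def resume(run: Run, decision: str) -> Run:
    if run.pending is None:
        raise RuntimeError("run is not waiting for a decision")
    run.state["human_decision"] = decision
    run.state["approved_action"] = run.pending.proposed_action if decision == "approve" else ""
    run.pending = None
    return run
```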

Human-in-the-loop is not a safety net for when the agent fails. It is a deliberate architectural decision about where human judgment adds the most value in a system. The agent loop handles what can be reasoned about autonomously. The human loop handles what requires authority, context, or accountability that the agent does not have.

Where This Is Going

The agent loop, the training loop, and the feedback loop are currently operated as separate engineering concerns. That separation is practical, not fundamental. As agents accumulate experience across millions of runs, the information they generate (episodic memories, entity graphs, workflow patterns, evaluation signals, context growth traces) becomes a training signal. The training loop will eventually consume the output of the agent loop, closing the circle.

When that happens, the quality of the memory layer becomes the quality of the training data. Agents with well-engineered memory (clean episodic records, accurately extracted entities, reliable retrieval signals) produce better training signals than agents that let context accumulate without structure.

This convergence has a name. **Continual learning is the ability of a model to acquire new knowledge and capabilities from a stream of incoming data over time, without retraining from scratch and without catastrophically forgetting what it has already learned.** It is a formal machine learning discipline, not a metaphor, and it is the bridge between the two loops: the agent loop generates the experience, and continual learning is the process by which the training loop absorbs that experience into model weights.

Continual learning in agentic systems is the capacity of an agent to improve over time through the accumulation of high-signal memory units, with the extracted signal applied across three optimization surfaces: token space, weight space, and latent space.

The Union of the Agent Loop and the Training Loop

What connects them is the memory layer.

Oracle AI Database serves as the agent memory core, providing vector search, relational storage, and graph capabilities in a single engine. Memory operations that run inside the agent loop (encoding, storing, retrieving, injecting, and forgetting) produce a durable record of agent experience.

Oracle OCI provides the platform for continual learning: the infrastructure to retrain models on that accumulated experience at scale, closing the loop from runtime behaviour back into model weights.

The agent loop and the training loop are converging. The memory layer is where they meet.

For engineers building agents today, this means the decisions made about memory architecture are not just operational decisions. They are decisions about what the system will be able to learn from tomorrow. A database that can serve low-latency semantic search at runtime can also serve as the data source for a continuous training pipeline.

Design your memory layer accordingly.

FAQ

1. What is the agent loop?

The agent loop is the repeating cycle a harness runs within a single agent turn: assemble context, invoke the model to reason, act on its decision, and repeat until a stop condition ends the run. It exists because long-horizon tasks cannot be completed in a single LLM call.

2. How do you stop an agent loop from running forever?

Define explicit stop conditions in the harness: a terminal message with no pending tool calls, a goal-completion check, an iteration cap, a wall-clock timeout, unrecoverable errors, and failure mode detection such as the agent repeating the same tool call with identical arguments.

3. What is the difference between a memory-augmented agent and a memory-aware agent?

A memory-augmented agent retrieves and injects information into context but does not manage it; memory is something that happens to the agent. A memory-aware agent encodes, stores, retrieves, injects, and forgets, actively managing its cognitive state within each run and across sessions.

4. How do I know which level my agent system sits at?

If there is no persistence beyond the context window, it is Level 1. If memory is read before the model call and written after the agent acts, it is Level 2. If there is a deliberate boundary between programmatic and agent-triggered operations, with techniques such as compaction, tool output offloading, and semantic tool discovery, it is Level 3.

5. What connects the agent loop to the training loop?

The memory layer. Agent runs generate experience: episodic records, entities, workflows, and evaluation signals. With continual learning, that experience becomes training signal. Oracle AI Database stores and serves it inside the agent loop; Oracle OCI provides the platform to retrain models on it. The patterns are implemented in the companion notebook.

Database-Enforced Authorization for Agentic AI .NET Applications

Anya Summers — Thu, 25 Jun 2026 20:07:31 +0000

Protect .NET applications from over-broad agent access, prompt injection, and tool misuse with Oracle Deep Data Security and ODP.NET

Key Takeaways

Agentic AI can perform complex tasks, but it often requires broad and dynamic data access, which increases security and compliance risk.
Enforcing authorization in the database reduces duplicated application logic and keeps access rules consistent across applications, agents, and tools.
Oracle Deep Data Security provides database-native policy enforcement. ODP.NET 23.26.2 adds support for passing end-user security context from .NET applications.
.NET applications can adopt this model by integrating end-user security context into the data access layer.

Agentic AI changes how applications access data. Instead of following fixed application flows, agents can choose tools and generate actions at runtime.

When given relevant data, agents can further optimize workflows to meet objectives. However, broader access also increases the risk of unauthorized use and data exfiltration.

As organizations deploy agentic AI, keeping enterprise data protected and auditable becomes harder with the wrong security model. When agents access the database directly or through Model Context Protocol (MCP) server tools, authorization must still be enforced before data is returned. If the agent’s database access is broader than the end user’s authorization, it can expose sensitive data or modify protected records.

The key design question is where authorization should be enforced: in every application, or in the database. At scale, maintaining separate authorization logic in every application becomes difficult to validate and easy to get wrong. When requirements, queries, or schemas change, teams must update authorization logic across every affected application. This becomes unmanageable as more AI applications are deployed across the enterprise.

As attackers adopt AI-driven penetration testing tools that find application vulnerabilities faster than before, securing access control at every application-level entry point becomes even more critical.

On the other hand, database-layer enforcement centralizes authorization and applies policies consistently before data is returned. Instead of relying on every developer to secure their part of the app perimeter, the same database policies can apply whether access comes from an application, an AI agent, or an MCP-based tool. Oracle AI Database 26ai (23.26.2) enables this capability with Deep Data Security.

Why Agentic AI Apps Need Deep Data Security

Oracle Deep Data Security is a database-native authorization model that extends traditional system and object privileges by using end-user, agent, role, and attribute context in authorization decisions. It is designed for workloads where users, applications, agents, and tools may access the same data through different paths.

Deep Data Security securely propagates end-user and agent identities, roles, and attributes to the database at runtime using an end-user security context. End users need not be database users: they can be Microsoft Entra ID users, web application users, or other identity types. The database uses this context to enforce policies that define what users and agents can do, and when, and to generate audit records that capture activity. For example, a policy can allow a sales manager to see only customer rows for their assigned region, even if an agent generates a broader query.

The Deep Data Security authorization model enforces fine-grained security at the row, column, and cell levels, enabling least-privilege access so end users and agents see only authorized data. The database returns only authorized rows and masks sensitive column values when the end user or agent lacks the required entitlement. Because the database enforces these policies during SQL execution, authorization remains consistent even when different applications or agents access the same data.
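To illustrate the observable behavior only, the toy Python model below filters rows by an end user's assigned regions and masks a sensitive column as `None` (Python's stand-in for NULL) when the entitlement is missing. It is not how Deep Data Security policies are defined or evaluated: in the real system the database applies policy during SQL execution, and the `EndUserContext` type, `apply_policy` function, and sample rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical sample data standing in for a customers table
ROWS: List[Dict[str, object]] = [
    {"customer": "Acme", "region": "WEST", "credit_limit": 50_000},
    {"customer": "Globex", "region": "EAST", "credit_limit": 75_000},
]


@dataclass
class EndUserContext:
    user: str
    regions: Set[str]     # row-level: regions this end user may see
    can_see_credit: bool  # column-level: entitlement for a sensitive column


def apply_policy(rows: List[Dict[str, object]], ctx: EndUserContext) -> List[Dict[str, object]]:
    # Row filtering: unauthorized rows are never returned, whatever the query asked for
    visible = [r for r in rows if r["region"] in ctx.regions]
    # Column masking: the sensitive value becomes None (NULL) without the entitlement
    return [
        {**r, "credit_limit": r["credit_limit"] if ctx.can_see_credit else None}
        for r in visible
    ]
```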

Since policies are enforced in the database, developers do not have to duplicate the same authorization rules in every application or agent workflow. When requirements change, teams can update the database policy instead of rewriting authorization logic across multiple applications.

For sensitive workflows, access can be granted only for the duration and scope of that workflow, instead of giving the application broad standing privileges. This reduces reliance on shared high-privilege service accounts that can read or write more data than the end user should have access to.

Access boundaries must stay manageable, enforceable, and auditable as workflows change. Deep Data Security enforces least-privilege access for users and agents while preserving user identity in audit records to support safer, compliant AI adoption. .NET applications should incorporate Deep Data Security and pass end-user context to the database in a way they can manage consistently.

Develop .NET Apps with Deep Data Security

.NET applications can pass end-user identity, claims, roles, and application context to the database, where Deep Data Security evaluates policies during SQL execution before unauthorized rows, columns, or values can be returned. Managed ODP.NET and ODP.NET Core 23.26.2 add extension methods to use this context payload.

With minimal code changes, existing ODP.NET applications can use Deep Data Security’s protection with agentic AI. Applications do not need to map each end user to a separate database user. The database evaluates authorization using the supplied end-user security context. The database manages session lifecycles automatically based on OAuth2 tokens, which include user authorization claims for resources and applications.

To do this, you will set the end-user security context on the ODP.NET connection using OracleConnection.SetEndUserSecurityContext. The connection then executes commands on behalf of an end user identified by a token. The application supplies the end-user context separately from the database access token used by the mid-tier. Deep Data Security then evaluates policies using the end-user claims, roles, and attributes. Data roles and attributes allow Oracle AI Database to evaluate role mappings and token claims during authorization. This enables Deep Data Security to deliver fine-grained, end-user-aware access control in .NET without database user credentials.

Deep Data Security evaluates policies during SQL execution, before unauthorized rows, columns, or values are returned. By default, unauthorized data is masked as NULL, though SQL functions can apply other formats.

ODP.NET uses the OracleEndUserSecurityContext class to represent the security identity for an application end user’s database operations.

Putting it all together, the following .NET code sample shows how to set a connection’s end-user security context and clear it after use.

using (OracleConnection conn = new OracleConnection(connStr))
{
    conn.Open();
    string userToken = GetUserToken();
    string midTierToken = GetMidtierToken();

    // Create security context using tokens
    OracleEndUserSecurityContext securityContext =
        OracleEndUserSecurityContext.CreateWithTokens(midTierToken, userToken);

    // Set security context on the connection
    conn.SetEndUserSecurityContext(securityContext);
    try
    {
        // Execute database operations
    }
    finally
    {
        // Clear security context from connection, even if an operation throws
        conn.ClearEndUserSecurityContext();
    }
} // Disposing the connection closes it

Start Developing with ODP.NET Deep Data Security

With ODP.NET and Oracle Deep Data Security, you can build end-to-end agentic AI .NET applications while protecting data from current and emerging threats. Data protection rules can evolve with simple changes, and access can be centrally managed and audited.

Get started by downloading managed ODP.NET or ODP.NET Core 23.26.2 with Deep Data Security and reviewing the Oracle Deep Data Security web page and ODP.NET Developer’s Guide Deep Data Security section.

FAQ

What is agentic AI?

It’s AI that can plan, reason, and execute multi-step tasks independently, often without human supervision.

Why is managing data security at the database-level preferred?

It centralizes data access control, making it easier to manage, update, and audit compared to securing each application individually.

How does Deep Data Security protect data?

It enforces policies at row, column, and cell levels, ensuring users and AI agents only access authorized data.

How do .NET apps use Deep Data Security?

They pass user and app identity via tokens into an ODP.NET connection security context, allowing the database to enforce access rules without exposing credentials.

5 Oracle AI Database Dev Tools I’d Put in a Starter Kit

Anya Summers — Thu, 25 Jun 2026 19:56:31 +0000

A practical toolkit to quickly build, test, and validate Oracle AI Database workflows from local to cloud

Key Takeaways

Start fast with containers or FreeSQL to reduce setup time and quickly validate ideas or queries.
Use SQLcl and SQL Developer together for both automation (CLI) and visual inspection (GUI).
Enable AI-assisted workflows with SQLcl’s MCP Server while enforcing security at the data layer.
Move seamlessly from local experiments to Always Free Autonomous AI Database for realistic cloud testing.

Developers need the shortest path from claim to proof:

“Can I start a database locally?”
“Can I connect from my app?”
“Can I run my tests against it?”
“Can I inspect the schema without guessing?”
“Can I use it with scripts, agents, and CI?”
“Can I easily move from a laptop to a managed cloud database?”

In this article, we’ll look at tools that shorten the feedback loop between asking questions like these and proving the answers.

Here are the five I would put in a practical starter kit. These are tools I use every day.

1. Oracle AI Database Free Container Images

Start local when you can. While the database container images are around 4–5 GB, they are multi-arch and start quickly for easy dev workflows on your laptop.

The Oracle AI Database Docker Compose sample spins up a disposable database on localhost:1521. It’s enough for most app development: point your app at the container database and fire away. When you’re done, throw the container away.
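Containers start quickly, but the database inside may still be initializing when `docker compose up` returns, which can make the first integration test flaky. A minimal sketch, assuming the Compose sample publishes the listener on localhost:1521 as described above: poll the TCP port before the test suite connects. The function name `wait_for_port` is hypothetical, and a successful TCP connect only shows that a listener is up, not that the database is open for logins, so a real harness would follow it with a connection check.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 1521,
                  timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll a TCP port until it accepts connections or the timeout expires.

    Returns True once a TCP connection succeeds, False on timeout.
    This proves a listener is up; it does not prove the database accepts logins.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Raises OSError (e.g. ConnectionRefusedError) if nothing is listening yet
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Call `wait_for_port()` from a test fixture or setup script right after starting the container, and fail fast with a clear message if it returns False.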

I like containers a lot and use them constantly for development work. Here are some more Oracle-specific container resources:

For API work, try Oracle REST Data Services (ORDS) with Docker Compose
For Testcontainers developers, try both Oracle AI Database Free and ORDS in your test suites.

Use containers when you:

Need repeatable local development
Are running integration tests that create and destroy their own database
Are testing a feature before moving it into shared infrastructure
Need ORDS locally for REST, JSON, or SQL Developer Web workflows

Containers help you prove out your code, schema, and assumptions in a clean database environment.

2. FreeSQL

Sometimes the right local setup is no local setup.

Oracle FreeSQL gives you a browser-based SQL environment for learning, testing queries, and sharing examples without installing a database first. It is a good tool when the goal is to remove setup friction.

With a free account, you get a personal schema and can connect from tools such as SQLcl, VS Code, and application code. I covered that workflow in Use Oracle FreeSQL as a remote test database.

Use FreeSQL when you:

Are learning SQL or teaching someone else
Need a remote schema without provisioning cloud infrastructure
Want to test a query from a browser
Are looking for a simple database target for examples, demos, or agents

FreeSQL is a low-friction place to start proving small things.

3. SQLcl MCP Server

SQLcl is one of the first tools I install for Oracle AI Database work.

It is fast, scriptable, and useful for normal database development. You can run SQL, execute setup scripts, inspect objects, export data, load data, and automate validation without opening a full IDE.

Now SQLcl also matters for AI-assisted development. Oracle describes SQLcl as a free command-line interface with an integrated MCP Server, and the SQLcl MCP Server documentation explains how AI clients can use saved SQLcl connections to discover database context and run database operations through a structured MCP interface. The MCP server is something you can plug into Codex or Claude Code to assist with database operations.

Use SQLcl when you want:

A reliable command-line SQL tool
Repeatable scripts for setup, validation, or data loading
An MCP bridge between an AI assistant and Oracle AI Database
Agents to inspect real schema metadata instead of guessing

If you’re using MCP, I also recommend reading up on Oracle Deep Data Security, which is aimed at solving problems around authorization for agentic AI. The practical idea of Deep Data Security is simple: enforce authorization at the data layer, not only in the app or the prompt.

4. SQL Developer

SQL Developer complements SQLcl, providing additional features beyond the capabilities of the command line.

Most database developers eventually need a visual tool for browsing schemas, inspecting rows, reviewing objects, writing SQL, debugging PL/SQL, or explaining something on a screen share.

SQL Developer is Oracle’s tool family for that job. If you need a dedicated database IDE, use standalone SQL Developer. If your day already lives in VS Code, use SQL Developer for VS Code and keep database work closer to your application code.

Use SQL Developer when you:

Want to browse schemas and database objects visually
Are writing or debugging SQL and PL/SQL
Need to inspect data quickly
Want Oracle AI Database tooling inside VS Code or as a standalone app

5. Always Free Autonomous AI Database

Some work needs a managed cloud database.

Always Free Autonomous AI Database is what I use when I need something closer to a real cloud deployment.

It’s a strong fit for personal projects, demos, APEX and ORDS work, cloud-native experiments, and validation that needs real cloud connectivity. You can test wallets, network rules, deployment behavior, and managed database operations in a realistic environment.

The tradeoff is that it’s still managed cloud infrastructure. You need an Oracle Cloud Infrastructure (OCI) account, and you need to understand wallets, networking, and free-tier quotas. Always Free is useful for learning and validation, but it is not production capacity. Treating it like production will lead to bad assumptions.

Use Always Free Autonomous AI Database when you:

Need a persistent, managed Oracle AI Database environment
Are building demos or personal projects with Oracle AI Database
Want to test wallet-based connectivity
Need to validate cloud deployment behavior before using paid resources

The Always Free tier includes two Autonomous AI Database instances, not just one. Beyond that limit, Database For Developers offers fixed-size database instances for ~$30/month on OCI.

Bonus: LiveLabs Training and Tutorials

Tools are easier to adopt when there is a guided path.

Oracle LiveLabs gives you hands-on labs and workshops across Oracle technologies. It is useful when you need more than documentation but less than a full course.

Use Oracle LiveLabs when you:

Are learning a feature for the first time
Want a guided workshop before building your own version
Need training material for a team
Want examples that connect product features to real tasks

Start Small and Prove One Thing
The goal isn’t to collect Oracle tools.

The goal is to keep the development loop short: write code, run SQL, inspect results, automate the boring parts, and move from local to cloud without changing the way you think about the database.

FAQs

Q: What’s the fastest way to start using Oracle AI Database locally?
Use container images with Docker Compose to spin up a disposable database for development and testing.

Q: When should I use FreeSQL instead of a local database?
When you want zero setup — ideal for learning SQL, quick demos, or testing queries in a browser.

Q: Why use both SQLcl and SQL Developer?
SQLcl is great for scripting and automation, while SQL Developer helps with visual tasks like browsing schemas and debugging.

Q: When do I move to a cloud database?
Use Always Free Autonomous AI Database when you need persistent storage, cloud connectivity testing, or a more production-like environment.

When to use Claude memory, Oracle AI Agent Memory, and LangChain together

Anya Summers — Thu, 25 Jun 2026 19:44:11 +0000

Build a controlled Claude MCP workflow with Oracle SQLcl, Oracle AI Database, Oracle AI Agent Memory, and LangChain.

Companion notebook: Claude MCP Oracle AI Database: When to use Claude memory, Oracle AI Agent Memory, and LangChain together

Key Takeaways

MCP turns AI-to-database access into an explicit tool contract instead of implicit system access. 
Oracle SQLcl in MCP mode (sql -mcp) is a direct, documented way to connect Claude Desktop to Oracle AI Database through an MCP server. 
Oracle AI Database provides the persistent storage and vector search layer for memory workloads, while Oracle AI Agent Memory gives teams a Python API for threads, durable memory records, scoped retrieval, and context assembly on top of it.
LangChain plus langchain-oracledb is useful for structured retrieval pipelines once the memory layer is in place.
A hybrid model is a strong default for many teams: Claude + MCP for operational interaction, Oracle AI Database + LangChain for durable memory records and retrieval.

Here is how those components connect in this pattern. Claude talks to Oracle through SQLcl MCP, using the tools and database permissions you expose. Oracle AI Agent Memory is the Python package your app uses to manage durable memory records and context assembly. LangChain is an optional wrapper at the end of the retrieval path. The knowledge base is data in Oracle tables, with retrieval and access governed by your application and database design.

Production success depends less on “prompt quality” and more on boundaries, privileges, logging, and repeatable runbooks. 

Many AI assistant demos fail in the same place: not in the first interaction, but in week two. The assistant can generate SQL and explain concepts, but the workflow often lacks durable context across sessions. Teams also struggle to answer basic operational questions, like who executed what, where, and with which permissions. 

That is why this topic matters for developer teams right now. If you are integrating AI into workflows that query, analyze, or modify data in Oracle (running reports, inspecting schemas, retrieving context, or writing results back), you need two things at once: controlled execution and durable memory.

MCP defines the execution boundary. Oracle AI Database provides durable storage, vector search, and database controls for application memory records. You can build that layer directly with tables and retrieval logic, but the Oracle AI Agent Memory Python package makes the integration easier once memory workflows start getting more complex. LangChain comes in later when you need structured retrieval and orchestration on top of that.  

By the end of this guide you will know how to connect Claude to Oracle AI Database through a controlled MCP boundary, when Claude’s built-in memory is sufficient and when your application needs Oracle AI Agent Memory to manage durable memory records in Oracle AI Database, and how to build a retrieval pipeline you can query, audit, and grow over time.

The developer path through this guide is simple:

Start with one approved Oracle connection and a read-only validation query. 
Put SQLcl MCP in front of that connection so Claude sees tools, not raw database credentials. 
Check the audit and activity trail before adding more tool access. 
Add Oracle AI Agent Memory when the workflow needs durable thread context, scoped recall, or reusable context cards. 
Add LangChain only when you need application-side retrieval orchestration beyond the MCP interaction loop.

Claude Memory vs Oracle AI Agent Memory

Claude’s built-in memory has improved significantly, with support for chat history and project-level context. It works well for assistant continuity, but it is still scoped to the assistant experience. 

Before getting into the memory categories, it is worth introducing Oracle AI Agent Memory properly. It is a Python package that sits on top of Oracle AI Database and provides the application-facing API for conversation threads, durable memory records, scoped retrieval, and context cards you can pass back to an assistant. You can build the same tables and retrieval logic yourself, and the companion notebook shows exactly how that works at the table level. But once memory workflows grow to include multiple users, cross-session context, and retrieval at scale, this package saves a lot of repeated work. Think of Oracle AI Agent Memory as the API your application talks to, and Oracle AI Database as the storage and enforcement layer underneath it.

In practice, “memory” means different things depending on the layer you are talking about. Claude memory and Oracle AI Agent Memory solve different problems.

As of writing, Claude’s memory makes conversations smoother, but it’s still scoped to the assistant experience. It’s not built for querying application history, sharing context across users, or enforcing database-level audit and access controls. That’s where Oracle AI Agent Memory comes in. It gives you a persistent application memory layer you can query and manage across sessions and teams. Important decisions should still be grounded in systems of record, application authorization, and human or workflow review where required. 

A simple way to think about it: Claude remembers for the conversation. Oracle AI Agent Memory remembers for the system.

Because memory records live in Oracle AI Database and not on one local machine, they can become portable across approved clients. Point a new machine at the same database with the right credentials and policies, and the application can retrieve the same memory records.

Even with Claude memory, teams often need an application-level memory layer. Claude memory is not designed for querying history across users, storing tool logs, or applying database access controls. Oracle AI Database can help fill that gap by providing durable, shared, and queryable memory records for application workflows. 

Use the layers this way:

Use Claude memory for assistant continuity: preferences, project context, and conversational convenience inside the assistant experience. 
Use SQLcl MCP when Claude needs to inspect or query Oracle through an explicit tool boundary. 
Use Oracle AI Agent Memory when your application needs durable threads, searchable memory records, scoped retrieval, or context cards across users, agents, and sessions. 
Use LangChain when your app needs reusable retrieval chains, routing logic, or orchestration around the memory and vector search layer.

Why this architecture is useful for developers 

The developer value is practical: each layer gives you something concrete to test before you trust the whole workflow. You can validate the MCP server, the saved SQLcl connection, the database role, the durable application-memory write path, and the retrieval query separately.

That matters after the demo. When an answer looks wrong, a developer can inspect whether the tool call ran, which database user executed it, what SQL or retrieval path was used, which durable memory records or tool traces were returned, and whether the application assembled the right context. The failure stops being “the model was wrong” and becomes a narrower engineering problem.

The responsibilities break down into testable layers:

The assistant translates user intent into a plan your app or MCP client can inspect. 
MCP exposes a declared tool surface instead of broad implicit system access. 
SQLcl MCP gives developers a reproducible bridge from Claude Desktop to approved Oracle connections. 
Oracle AI Database keeps roles, privileges, memory records, tool logs, and vector retrieval close to the data layer. 
Oracle AI Agent Memory gives Python developers a package API for threads, durable memory records, scoped search, and context cards. This is application memory, not just chat history. 
LangChain handles retrieval workflows and tool coordination where application logic is needed, without becoming the permission boundary.

The payoff is a workflow that is easier to review, easier to debug, and easier to grow. You can start read-only, prove the connection and logging path, add durable memory when the application needs continuity across sessions or workflows, and keep each new capability attached to a named layer instead of burying everything in prompts or a custom agent framework. 

Understanding the two execution loops 

Building on the separation of responsibilities above, the system naturally forms two execution loops: 

Loop A, an operational loop for real-time interaction (Claude + MCP): Claude works with MCP to run queries, inspect data, and respond immediately.
Loop B, a persistence loop for cross-session memory (Oracle AI Database + LangChain): Oracle AI Database and LangChain handle durable memory records, tool logs, and context retrieval across sessions.

One loop handles real-time interaction, the other handles durable memory records and retrieval. 

SQLcl MCP is for Claude operating interactively: real-time queries during a conversation, routed through a declared tool contract. Oracle AI Agent Memory is for your application code: storing turns, retrieving history, and assembling context before Claude sees a prompt. They serve different loops at different times. You can drop either one depending on your use case, but many production setups benefit from both.

Setup Guide: Reproducing the Oracle SQLcl MCP and Claude Workflow 

The SQLcl MCP setup is documented by Oracle and reproducible in the way that matters for developers: you can install it, test it, validate the saved connection, and inspect activity before Claude runs a real query.

Prerequisites before you connect Claude

Oracle SQLcl 25.2.0 or higher. 
Oracle JRE 17 or 21. 
Claude Desktop or another MCP-capable client you are explicitly configuring and testing. 
At least one saved SQLcl connection profile under ~/.dbtools, created with password persistence for MCP use. 
A database user with the minimum permissions required for the workflow. Start with read-only access and a sanitized development or replica environment where possible.

The core idea is simple. SQLcl runs in MCP mode with sql -mcp. Claude Desktop launches it as an MCP server and talks to the database through declared tools and the permissions attached to the saved connection. Connections come from saved profiles in the SQLcl connection store under ~/.dbtools. Claude does not invent them at runtime; it reuses ones you have already created and validated.
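To make “declared tools” concrete: MCP is built on JSON-RPC 2.0, and over the stdio transport that Claude Desktop uses for a server launched from a command, the client discovers tools with a `tools/list` request and invokes one with `tools/call`. A minimal sketch of those two messages follows. The tool name `run-sql` and its `sql` argument are assumptions for illustration only; use the names and input schema your client actually discovers from `tools/list`.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC request ids must be unique within a session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    # The stdio transport is newline-delimited; json.dumps without indent
    # emits one line (newlines inside strings are escaped).
    return json.dumps(message) + "\n"


# A real session begins with an "initialize" handshake, omitted here.
list_tools = jsonrpc_request("tools/list")

# Hypothetical tool name and argument: take the real ones from the tools/list result.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "run-sql", "arguments": {"sql": "SELECT sysdate FROM dual"}},
)
```

The point is that every database action arrives as a named tool call with explicit arguments, which is what makes it loggable and reviewable on the server side.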

One setup detail that catches people out: MCP-compatible saved connections need the password persisted. That is what the -savepwd flag does when you create the connection. Treat that saved profile as a credentialed application path: use a purpose-specific database user, keep the grant surface small, and avoid pointing first experiments at production data.

Once that is done, you configure Claude Desktop to point at the SQLcl executable and pass -mcp as the argument. Claude Desktop manages the server lifecycle from there, and SQLcl translates tool calls into database operations. Oracle recommends granting the minimum permissions required, considering sanitized copies or read-only replicas for AI access, and auditing LLM activity. SQLcl MCP activity can be inspected through database-side traces such as DBTOOLS$MCP_LOG and session views such as V$SESSION. (docs.oracle.com) 

SQLcl MCP also supports restrict levels. The documented default is restrict level 4, which disables sensitive commands such as unrestricted file system access and host execution. Treat changes to the restrict level as an explicit security decision, not as a convenience toggle. (docs.oracle.com)

A minimal configuration looks like this:

 

{
  "mcpServers": {
    "sqlcl": {
      "command": "PATH/bin/sql",
      "args": ["-mcp"]
    }
  }
}

That small JSON block defines the connection between Claude and SQLcl MCP Server. Claude interacts with the database through the tools and permissions exposed by the MCP server, using the saved SQLcl connection profile you created and tested first. 
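Before restarting Claude Desktop, it can help to check the config mechanically, since a bad path or a missing `-mcp` argument is the most common reason tools do not appear. A minimal sketch, assuming the config shape shown above: it parses the JSON, looks for the `sqlcl` entry under `mcpServers`, confirms `-mcp` is in `args`, and checks that `command` points to an existing file. The function name `check_sqlcl_mcp_config` is hypothetical, and the config file’s location varies by operating system, so the sketch takes the file contents as a string.

```python
import json
import os


def check_sqlcl_mcp_config(config_text: str, server_key: str = "sqlcl") -> list[str]:
    """Return a list of problems in a Claude Desktop MCP config (empty list means OK)."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        return [f"config is not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["config must be a JSON object"]

    server = config.get("mcpServers", {}).get(server_key)
    if server is None:
        return [f"no '{server_key}' entry under 'mcpServers'"]

    problems = []
    command = server.get("command", "")
    if not command:
        problems.append("'command' is missing")
    elif not os.path.isfile(command):
        # A placeholder such as PATH/bin/sql fails here until replaced with the real path
        problems.append(f"command not found: {command}")
    if "-mcp" not in server.get("args", []):
        problems.append("'-mcp' is missing from 'args'")
    return problems
```

An empty list is a prerequisite for the first item on the checklist, not a substitute for it: still run sql -mcp in a terminal to catch Java or environment problems.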

Validation checklist before expanding access

Run sql -mcp locally and confirm the server starts. 
Restart Claude Desktop and confirm the SQLcl tools are discoverable. 
Run one read-only query against an approved schema. 
Check database-side MCP activity logs and session metadata. 
Document the connection alias, database user, grant scope, restrict level, and troubleshooting owner.

Good first proof looks like this:

The MCP server starts without a Java or path error. 
Claude lists the SQLcl MCP tools after restart. 
A read-only query succeeds against the expected schema. 
The database-side activity trail shows the MCP interaction. 
A denied query fails because of the database role, not because a prompt asked nicely.

Why put application memory records in Oracle AI Database, not just outputs

 
Once your first tool calls work, the next challenge is continuity. If memory lives only in chat context, the system is fragile. If memory is scattered across files without structure, retrieval and auditing become expensive over time. 

It’s worth calling out the difference here. At this point, the challenge shifts from conversation persistence to system-level memory. 

A model that uses Oracle AI Agent Memory is often cleaner and easier to operate as the workflow grows. 

The companion notebook builds this memory layer from scratch so the mechanics are visible, and then shows how Oracle AI Agent Memory slots on top of it once the substrate is working.

Memory categories that matter in practice 

Conversational memory: Stores user and assistant turns, thread IDs, timestamps, and metadata.
Operational memory: Stores tool inputs, outputs, status, and error classes for troubleshooting and audit.
Semantic memory: Stores chunks and embeddings for meaning-based retrieval when exact keywords are absent.

Why this matters technically

SQL tables give deterministic filtering and ordering. 
Transactions improve integrity under concurrent writes. 
Vector retrieval helps with paraphrases and conceptual matches. 
Keeping memory on one platform makes it easier to manage, audit, and keep consistent over time.

This works well with Oracle AI Database because structured records and semantic retrieval data can stay in one place.
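Why vector retrieval helps with paraphrases can be shown with a toy example. A minimal sketch in plain Python: it ranks stored chunks by cosine similarity to a query embedding and returns the top k. The three-dimensional vectors are made up for illustration; real embeddings come from an embedding model, and in Oracle AI Database the similarity search would run inside the database (for example with `VECTOR_DISTANCE`) rather than in application code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Made-up embeddings: the first two chunks point in similar directions.
chunks = [
    ("The user prefers morning deployment reviews.", [0.9, 0.1, 0.0]),
    ("Deployment reviews should happen before noon.", [0.8, 0.2, 0.1]),
    ("The billing report runs monthly.", [0.0, 0.1, 0.9]),
]
query = [0.85, 0.15, 0.05]  # stands in for "When does this user like to review releases?"
results = top_k(query, chunks, k=2)
```

Neither matching chunk shares the word “releases” with the query; they are retrieved because their vectors point the same way, which is the behavior keyword filters cannot provide.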

Where LangChain adds value (and where it should not be overused)

 
LangChain is useful as orchestration glue, especially when teams want a documented path for tool definitions and retrieval calls. One thing worth stating clearly: in the architecture shown here, Claude Desktop does not call LangChain directly. LangChain runs in your application layer to format context before it reaches Claude’s prompt. With langchain-oracledb, teams can wire vector retrieval in Oracle AI Database while keeping control in database roles and runtime policies. 

Good uses of LangChain in this architecture 

Declaring retrieval and memory tools in a consistent format. 
Running retrieval-first answer pipelines. 
Standardizing how context is assembled before generation. 
Building reusable agent patterns across teams.
 
Poor uses of LangChain in this architecture 
Assuming LangChain automatically makes database access safe.
Relying only on prompts to limit what the assistant is allowed to do.
Adding too many tools before your team knows how to manage and troubleshoot them.

A good rule is to enforce permissions in the database and infrastructure layer, not only in framework code or prompts.

Practical Implementation Snippets

The snippets below show the minimum useful shape of the implementation: the MCP boundary, the memory substrate, a package-level memory API, and the retrieval policy that keeps generated answers grounded. 

1) MCP boundary snippet

 

{
  "mcpServers": {
    "sqlcl": {
      "command": "C:\\tools\\sqlcl\\bin\\sql.exe",
      "args": ["-mcp"]
    }
  }
}

2) Memory schema concept snippet

 

-- CONVERSATIONAL_MEMORY  
THREAD_ID, ROLE, CONTENT, METADATA_JSON, CREATED_AT  
  
-- TOOL_LOGS  
THREAD_ID, TOOL_NAME, TOOL_INPUT, TOOL_OUTPUT, STATUS, ERROR_MESSAGE, CREATED_AT  
  
-- KB_CHUNKS (used for vector retrieval via langchain-oracledb)  
TEXT_CHUNK, METADATA_JSON, EMBEDDING
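To make the concept snippet concrete without a database server, here is a minimal sketch that creates the three tables in SQLite (Python’s built-in `sqlite3`), used purely as a stand-in for Oracle AI Database. The column names follow the concept snippet; the column types, storing `EMBEDDING` as JSON text, and the helper functions `add_turn` and `thread_history` are assumptions for illustration. In Oracle AI Database you would use native types instead, such as a `VECTOR` column for embeddings.

```python
import json
import sqlite3
from datetime import datetime, timezone

# SQLite stands in for Oracle AI Database here; the types are simplified assumptions.
DDL = """
CREATE TABLE conversational_memory (
    thread_id     TEXT NOT NULL,
    role          TEXT NOT NULL CHECK (role IN ('user', 'assistant', 'system')),
    content       TEXT NOT NULL,
    metadata_json TEXT,
    created_at    TEXT NOT NULL
);
CREATE TABLE tool_logs (
    thread_id     TEXT NOT NULL,
    tool_name     TEXT NOT NULL,
    tool_input    TEXT,
    tool_output   TEXT,
    status        TEXT NOT NULL,
    error_message TEXT,
    created_at    TEXT NOT NULL
);
CREATE TABLE kb_chunks (
    text_chunk    TEXT NOT NULL,
    metadata_json TEXT,
    embedding     TEXT  -- JSON array of floats; Oracle would use a VECTOR column
);
"""


def add_turn(conn, thread_id, role, content, metadata=None):
    """Append one conversational turn with a UTC timestamp."""
    conn.execute(
        "INSERT INTO conversational_memory VALUES (?, ?, ?, ?, ?)",
        (thread_id, role, content, json.dumps(metadata or {}),
         datetime.now(timezone.utc).isoformat(timespec="microseconds")),
    )


def thread_history(conn, thread_id, limit=20):
    """Deterministic, bounded retrieval: one thread, oldest first, at most `limit` rows."""
    rows = conn.execute(
        "SELECT role, content FROM conversational_memory "
        "WHERE thread_id = ? ORDER BY created_at, rowid LIMIT ?",
        (thread_id, limit),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
with conn:  # commits both inserts as one transaction
    add_turn(conn, "t1", "user", "Remember that I prefer morning deployment reviews.")
    add_turn(conn, "t1", "assistant", "Got it.")
```

The ordered, filtered query in `thread_history` is the “deterministic filtering and ordering” that SQL tables give you; the `with conn:` block is the transactional write.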

3) Oracle AI Agent Memory package path

pip install "oracleagentmemory==26.4.0"

The package path expects Python 3.10 or later, Oracle AI Database 26ai or later, an Oracle AI Database connection or connection pool, an embedding model for retrieval, and an optional LLM for memory extraction, summaries, and context cards. The exact adapters depend on your application, but the API shape is intentionally small:

from oracleagentmemory.apis.searchscope import SearchScope
from oracleagentmemory.core.oracleagentmemory import OracleAgentMemory
from oracleagentmemory.core.embedders.embedder import Embedder
from oracleagentmemory.core.llms.llm import Llm

embedder = Embedder(model="YOUR_EMBEDDING_MODEL")
llm = Llm(model="YOUR_LLM")
db_pool = ...  # your oracledb connection or connection pool

memory = OracleAgentMemory(connection=db_pool, embedder=embedder, llm=llm)

thread = memory.create_thread(user_id="user_123")
thread.add_messages([
    {"role": "user", "content": "Remember that I prefer morning deployment reviews."},
    {"role": "assistant", "content": "Got it. I will keep that preference in mind."},
])

thread.add_memory("The user prefers morning deployment reviews.")

results = memory.search(
    query="When does this user prefer deployment reviews?",
    scope=SearchScope(user_id="user_123"),
)

context = thread.get_context_card()

Use oracleagentmemory from your application layer when you need package-managed users, agents, memories, threads, scoped retrieval, and context assembly. Keep systems of record separate from memory records: memory helps provide context, but application logic and authoritative data sources should still decide what is true, allowed, and final. (docs.oracle.com)

4) Retrieval-first policy snippet (pseudo-policy) 

Retrieve relevant memory before synthesizing the final answer.
If retrieval is empty, say that the context is insufficient.
Keep answers evidence-first and concise. 
Log tool calls with status and timestamp.
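The pseudo-policy above can be expressed as a small control-flow sketch. It assumes a `retrieve` callable that returns memory records as strings and a `generate` callable that calls the model; both are hypothetical placeholders, not Oracle AI Agent Memory or LangChain APIs. What matters is the order of operations: retrieve first, refuse to synthesize on empty context, and log every tool call with a status and timestamp.

```python
import logging
from datetime import datetime, timezone
from typing import Callable

log = logging.getLogger("memory_tools")

INSUFFICIENT = "I don't have enough stored context to answer that."


def answer(question: str,
           retrieve: Callable[[str], list[str]],
           generate: Callable[[str, list[str]], str],
           tool_log: list[dict]) -> str:
    """Retrieve first, generate only when evidence exists, and log the retrieval call."""
    entry = {"tool": "retrieve", "timestamp": datetime.now(timezone.utc).isoformat()}
    try:
        evidence = retrieve(question)
        entry["status"] = "ok" if evidence else "empty"
    except Exception as exc:  # record the failure, then re-raise for the caller
        entry["status"] = "error"
        entry["error"] = type(exc).__name__
        raise
    finally:
        tool_log.append(entry)
        log.info("tool call %s", entry)

    if not evidence:
        return INSUFFICIENT
    return generate(question, evidence)
```

Because the empty-context branch returns before `generate` is called, the model never gets the chance to answer fluently from nothing.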

Engineering guidance for production teams

 
The difference between demo success and production success is disciplined operations. Most failures at this stage come from integration gaps, not model behavior. 

Access and privilege model 

Separate accounts per environment (dev, test, prod). 
Start read-only wherever possible. 
Use least privilege grants and schema allowlists. 
Gate write operations with explicit confirmation workflows.
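For the last point, a minimal sketch of an application-side confirmation gate: statements whose first keyword is not `SELECT` or `WITH` are held until a human confirms. The helpers `needs_confirmation` and `run_with_gate` are hypothetical, and the check is a coarse heuristic (it does not parse SQL, and a query can still call a function with side effects), so it supplements, rather than replaces, a read-only database role.

```python
import re
from typing import Callable

# Leading keywords treated as read-only; anything else needs explicit confirmation.
READ_ONLY_LEADING = {"SELECT", "WITH"}


def needs_confirmation(sql: str) -> bool:
    """Return True unless the statement starts with a read-only keyword.

    Fails safe: input that starts with a comment or anything unrecognized
    is treated as needing confirmation.
    """
    match = re.match(r"\s*([A-Za-z]+)", sql)
    if match is None:
        return True
    return match.group(1).upper() not in READ_ONLY_LEADING


def run_with_gate(sql: str,
                  execute: Callable[[str], object],
                  confirm: Callable[[str], bool]) -> object:
    """Execute read-only statements directly; ask `confirm` before anything else."""
    if needs_confirmation(sql) and not confirm(sql):
        raise PermissionError("write operation not confirmed")
    return execute(sql)
```

If the gate is bypassed or wrong, the database role should still refuse the write; the gate only adds a human checkpoint in front of it.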

Observability model 

Log tool name, thread ID, timestamp, status, and sanitized inputs. 
Classify failures into runtime, connection, privilege, query, and retrieval. 
Keep a troubleshooting playbook in your repo. 
Check whether retrieval results become less accurate as more data is added.
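The classification step can stay deliberately small. A minimal sketch, assuming tool failures surface Oracle error messages containing an `ORA-` code: it maps a handful of well-known codes to the five buckets above and builds a log record with a redacted, truncated input. The mapping and the helpers `classify_failure` and `log_record` are illustrative, not exhaustive, and the password redaction is a simple example rather than complete sanitization. ORA-00942 is mapped to privilege because a missing grant often surfaces as “table or view does not exist”, but it can also be a genuine query error.

```python
import re
from datetime import datetime, timezone

# Illustrative mapping from Oracle error codes to failure classes.
ORA_CLASSES = {
    "ORA-01017": "connection",  # invalid username/password; logon denied
    "ORA-12154": "connection",  # could not resolve the connect identifier
    "ORA-12514": "connection",  # listener does not know of the requested service
    "ORA-12541": "connection",  # no listener
    "ORA-01031": "privilege",   # insufficient privileges
    "ORA-00942": "privilege",   # table or view does not exist (often a missing grant)
    "ORA-00904": "query",       # invalid identifier
    "ORA-00933": "query",       # SQL command not properly ended
}


def classify_failure(error: BaseException, stage: str = "tool") -> str:
    """Map an exception to runtime, connection, privilege, query, or retrieval."""
    if stage == "retrieval":
        return "retrieval"
    match = re.search(r"ORA-\d{5}", str(error))
    if match:
        # Assumption: unmapped database errors default to "query"
        return ORA_CLASSES.get(match.group(0), "query")
    # No Oracle code: e.g. SQLcl or Java not found when launching sql -mcp
    return "runtime"


def log_record(tool_name, thread_id, tool_input, status, error=None,
               stage="tool", max_chars=200):
    """Build a structured log entry with a redacted, truncated input."""
    sanitized = re.sub(r"(?i)(password|pwd)\s*=\s*\S+", r"\1=***", str(tool_input))
    record = {
        "tool_name": tool_name,
        "thread_id": thread_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": status,
        "input": sanitized[:max_chars],
    }
    if error is not None:
        record["failure_class"] = classify_failure(error, stage)
        record["error"] = str(error)[:max_chars]
    return record
```

Extend the mapping from the errors your own logs actually show; the value is in having a fixed set of buckets that the troubleshooting playbook can reference.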

Reliability model 

Prefer deterministic SQL patterns with bounded result sets. 
Use retrieval-first context assembly for memory-heavy tasks. 
Avoid giant context stuffing as a substitute for memory design. 
Review and prune tool surfaces periodically.

This is also where teams should align with platform and security teams early. Governance should be designed into the architecture, not bolted on after incidents. 

Typical failure modes and how to diagnose them fast 

Most teams hit a predictable set of issues.  

Runtime failure: sql -mcp does not start 

Check the absolute SQLcl path, confirm Java is available, and run sql -mcp outside Claude first. Resolve runtime issues before checking assistant behavior.

Discovery failure: Claude does not see tools 

Check the Claude Desktop JSON, confirm the configured command points to the SQLcl executable, and restart Claude Desktop after edits. If the server starts in a terminal but not from Claude, treat it as a config or environment-path problem. 

Connection failure: tools are present but queries fail immediately 

Check the saved SQLcl connection alias, confirm the profile lives under the expected SQLcl connection store, and verify password persistence for the MCP workflow. Then test the same connection outside Claude. 

Permission failure: queries execute selectively and fail on specific objects 

Check the database role first. A selective failure can be the right outcome when least privilege is working. Add grants intentionally, prefer schema allowlists, and keep read-write access separate from the initial validation path. 

Retrieval quality failure: answers are fluent but weakly grounded 

Inspect the retrieved records before blaming the model. Check chunk size, metadata filters, embedding choice, top-k settings, and whether the query is asking for exact history, semantic similarity, or operational logs. 
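Chunk size and overlap are the first retrieval knobs worth making explicit. A minimal sketch of fixed-size character chunking with overlap; the function name `chunk_text` and the default sizes are assumptions for illustration, and production pipelines often split on sentence or token boundaries instead (LangChain ships text splitters for this). Changing `size` and `overlap` and rerunning representative queries is a cheap way to see whether chunking is the problem.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `size` characters.

    Consecutive chunks share `overlap` characters, which reduces the chance
    that text cut at a chunk boundary loses its surrounding context.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

Smaller chunks give more precise matches but less context per hit; larger chunks do the opposite, which is why chunk size is worth checking before blaming the embedding model.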

Why the hybrid model is a strong long-term default

 
By this point, you’ve probably noticed a pattern: no single layer handles both execution and memory well. 

Trying to force everything into the assistant gets messy fast. You either lose control over execution, or you end up stuffing too much context into prompts just to keep things working. On the other side, if you only build backend memory systems, you lose the speed and usability that makes assistants useful in the first place. 

The hybrid approach works because it doesn’t try to solve everything in one place: 

Execution stays controlled through MCP. 
Memory stays durable and queryable in the database. 
The two are connected where needed, not tightly coupled.

In real teams, this usually evolves over time. It starts simple: Claude with SQLcl MCP, read-only access, and basic workflows. Once people start relying on it, the gaps show up: we lose context, we can’t trace what happened, or we are repeating work. 

That’s when it makes sense to introduce Oracle AI Agent Memory and retrieval. Not earlier. 

The goal isn’t to build perfect architecture upfront. It’s to add structure where the system starts to break. 

Conclusion

 
Setting up Claude with SQLcl MCP works well when treated as an architectural pattern, not just a series of setup steps. Each layer has one job: Claude handles intent, MCP enforces the execution boundary, Oracle AI Database stores durable memory records and audit data, and LangChain handles retrieval orchestration where needed.

With clear execution boundaries and durable memory, you can trace what happened, understand failures, and evolve the workflow without introducing hidden behavior.

That shift, from implicit access and ad hoc context to explicit boundaries and durable memory, is what moves AI-assisted workflows from experiments to operational systems.

Frequently Asked Questions

 
What is MCP in this context? 

MCP is a protocol that lets Claude call explicit tools exposed by a server, rather than accessing systems implicitly. 

Why use SQLcl for Oracle MCP? 

SQLcl already understands Oracle workflows and can run as an MCP server with sql -mcp, making integration practical and direct.

Is this setup only for Claude Desktop? 

No. The same MCP and memory architecture concepts can be reused with other MCP-capable clients and backend services. 

Why include Oracle AI Database if MCP already works? 

MCP handles execution boundaries. Oracle AI Database handles durable memory records, retrieval, concurrency, and database access controls. Claude’s own memory helps within a session, but it is not designed as an application memory layer. 

What versions are required for the SQLcl MCP setup?

Oracle documents that the SQLcl MCP Server requires Oracle SQLcl 25.2.0 or higher, Oracle JRE 17 or 21, Claude Desktop, and at least one saved SQLcl connection profile with password persistence enabled via -savepwd. Teams should verify the latest compatibility guidance in the official Oracle documentation as MCP support evolves.

Where does Oracle AI Agent Memory fit?

Oracle AI Agent Memory sits between your application code and Oracle AI Database. The package manages threads, durable memories, scoped retrieval, and context cards, while Oracle AI Database remains the storage and enforcement layer underneath.

Where does LangChain fit? 

LangChain is an orchestration layer for tools and retrieval. It can help assemble context and retrieval pipelines, but permissions still belong in the database, infrastructure, and application runtime. 

Do I need vector search for every use case? 

No. Start with structured memory. Add vector retrieval when paraphrase-heavy or concept-level retrieval becomes important. 

How do I prevent risky SQL operations? 

Use least-privilege roles, schema allowlists, read-only access where possible, SQLcl MCP restrict levels, and explicit confirmation workflows for high-impact actions. 

Can this support audit or compliance needs? 

It can support audit-oriented workflows if tool traces, SQL-level controls, retention policies, and review processes are implemented consistently. Do not treat memory records as the sole authoritative record for regulated or high-impact decisions. 

Companion Troubleshooting Appendix

Minimum viable setup: SQLcl MCP configured in Claude, one approved Oracle connection, read-only validation, and database-side activity logging. 
First checks: confirm sql -mcp starts, Claude sees the tools after restart, and the saved SQLcl connection alias resolves. 
Environment model: use separate credentials and policies for dev, test, and prod, with stricter controls as capability expands. 
Logging model: capture tool name, timestamp, thread ID, status, sanitized input/output summaries, and relevant SQLcl MCP log records. 
Retrieval quality: tune chunk size, enrich metadata, review embedding choice, and evaluate retrieval against representative queries. 
Common anti-pattern: expanding tool surfaces before ownership, logging standards, and runbooks are in place. 
Rollout path: pilot in dev with read-only access and strong logging, then expand capabilities in controlled phases.
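The logging model above can be sketched as a small trace record. This is a minimal sketch, not a SQLcl MCP log format. The field names, the 200-character summary limit, the regex-based redaction rule, and the "run-sql" tool name in the usage line are all illustrative assumptions. Real deployments would need redaction rules matched to their own data.

```python
import hashlib
import json
import re
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

MAX_SUMMARY_CHARS = 200  # assumption: keep logged summaries short
# Hypothetical redaction rule: mask values that follow common secret-like keys.
SECRET_PATTERN = re.compile(r"(?i)(password|pwd|secret|token)\s*[=:]\s*\S+")


@dataclass
class ToolTrace:
    tool_name: str
    thread_id: str
    status: str
    timestamp: str
    input_summary: str
    output_summary: str


def summarize(payload: str) -> str:
    """Redact secrets, truncate, and append a short digest of the redacted text."""
    redacted = SECRET_PATTERN.sub(r"\1=***", payload)
    digest = hashlib.sha256(redacted.encode("utf-8")).hexdigest()[:12]
    suffix = "..." if len(redacted) > MAX_SUMMARY_CHARS else ""
    return f"{redacted[:MAX_SUMMARY_CHARS]}{suffix} [sha256:{digest}]"


def make_trace(tool_name, thread_id, status, tool_input, tool_output):
    # Only sanitized summaries are stored, never the raw tool input or output.
    return ToolTrace(
        tool_name=tool_name,
        thread_id=thread_id,
        status=status,
        timestamp=datetime.now(timezone.utc).isoformat(),
        input_summary=summarize(tool_input),
        output_summary=summarize(tool_output),
    )


trace = make_trace("run-sql", "thread-42", "ok", "password=hunter2 SELECT 1", "1 row")
print(json.dumps(asdict(trace)))
```

The digest lets a reviewer match a log line to a stored payload without the log itself carrying the full text.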

Which Agent Memory Approach Is Best for Long Conversations?

Anya Summers — Tue, 23 Jun 2026 17:01:51 +0000

How sliding windows, summaries, vector retrieval, structured memory, episodic memory, and memory managers work together to support long AI agent conversations.

Companion notebook: Agent Memory for Long Conversations with Oracle AI Database

Key Takeaways

Long conversations are continuity problems.

The best practical pattern is hybrid layered memory: recent context, summaries, vector retrieval, structured memory, episodic memory, and a memory manager.
Sliding window memory keeps recent turns available, but older context still falls out.
Summarization compresses older dialogue, but it can lose details or drift.
Vector retrieval finds semantically related context, but similarity is not the same as relevance.
Structured memory stores stable facts, preferences, entities, decisions, and state.
Episodic memory preserves important events, outcomes, and prior attempts.
A memory manager decides what gets stored, updated, retrieved, summarized, and passed into the model.
Oracle AI Database becomes useful when long-conversation memory needs durable storage, relational precision, vector retrieval, JSON metadata, and governed access patterns.

The Practical Pattern

For long AI agent conversations, the most reliable pattern is hybrid layered memory. In practice, that means each memory layer has a specific job:

Keep the latest turns available as recent context.
Summarize older dialogue so the model does not need the full transcript every time.
Use vector retrieval when the user refers back to older context with different wording.
Store stable facts, preferences, decisions, and state in structured memory.
Preserve important events, outcomes, and prior attempts as episodic memory.

The memory manager sits above those layers and decides what gets written, updated, retrieved, summarized, and passed into the model for the current turn. The companion notebook implements this pattern with Oracle AI Database, Oracle AI Agent Memory, and LangChain, but the core idea is vendor-neutral: long conversation memory needs architecture, not just a larger prompt.

Why Long Conversations Break Simple Chat History

A short chat can usually survive with raw conversation history. The model sees the latest turns, understands what the user is asking, and continues naturally. Long conversations are different because they contain many kinds of information at once:

temporary details that only matter for the next response;
durable decisions that should be remembered later;
user preferences, project facts, and task state;
tool results, failed attempts, successful outcomes, and follow-up actions.

Treating all of that as one long transcript does not scale well. The model either receives too much irrelevant context, misses older details, or depends on a compressed summary that may have lost something important. Long conversation memory needs structure because not every part of a conversation has the same value, lifetime, or retrieval pattern.

Why Bigger Context Windows Are Not Enough

A bigger context window can delay the problem, but it does not solve it. More context means the model can see more text at once, which is useful for long documents and extended sessions. But it does not answer the harder engineering questions:

Which facts should survive across sessions?
Which older details are still relevant?
Which decisions are authoritative?
Which prior attempts should not be repeated?
Which memory belongs to this user, this project, or this task?

A bigger context window gives you more room. It does not give you a memory policy. That policy has to come from the application architecture: what to store, what to summarize, what to retrieve, what to trust, and what to pass into the model for a specific turn.

The Memory Approaches That Actually Help

Different memory approaches solve different parts of the long conversation problem. The useful framing is not to ask which one is universally best, but which layer should handle which kind of continuity.

Memory approach	Best for	Weakness
Sliding window memory	Recent turns and immediate continuity	Older context falls out
Conversation summary memory	Compressing older dialogue	Can lose detail or drift
Vector memory	Semantic recall across older context	Similarity is not the same as relevance
Structured memory	Facts, preferences, entities, decisions, and state	Requires extraction and update rules
Episodic memory	Events, outcomes, prior attempts, and task resumption	Needs importance and retention rules
Memory manager	Coordinating what to store, retrieve, summarize, update, and pass forward	Adds application logic that must be tested

The important point is that none of these approaches is enough by itself. A useful long-conversation system combines them, then lets a memory manager decide which pieces are relevant for the current turn.

Sliding Window and Summarization for Short-Term Continuity

The first layer is sliding window memory. It keeps the latest turns close to the model so the current exchange remains coherent. If a developer just asked a follow-up question, the model needs the most recent messages to understand the current task and avoid asking for context that was already provided.

But a sliding window is temporary by design. Once the conversation gets long enough, older context falls out. Summarization helps by compressing older dialogue into a smaller representation, preserving continuity without passing the entire transcript into every request. The tradeoff is that summaries are not perfect memory. They can omit details, merge separate ideas, or drift over time. In practice, summaries work best when they are supported by more precise layers, especially structured memory and episodic memory.
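The interaction between a sliding window and a rolling summary can be sketched in a few lines. This is a minimal sketch under stated assumptions. The `WindowedMemory` class is hypothetical. `naive_summarize` is a stand-in for what would normally be an LLM summarization call. It simply truncates each evicted turn, which is exactly how detail loss creeps into summaries.

```python
from collections import deque


class WindowedMemory:
    """Keep the last `window` turns verbatim; fold evicted turns into a rolling summary."""

    def __init__(self, window, summarize):
        self.recent = deque()
        self.window = window
        self.summary = ""
        self._summarize = summarize  # hypothetical hook; in practice an LLM call

    def add_turn(self, role, content):
        self.recent.append((role, content))
        while len(self.recent) > self.window:
            evicted = self.recent.popleft()
            # Fold the evicted turn into the summary instead of silently dropping it.
            self.summary = self._summarize(self.summary, evicted)

    def context(self):
        return {"rolling_summary": self.summary, "recent_context": list(self.recent)}


def naive_summarize(summary, turn):
    # Stand-in summarizer: keeps only the first 40 characters of each turn.
    role, content = turn
    line = f"{role}: {content[:40]}"
    return f"{summary}\n{line}" if summary else line


mem = WindowedMemory(window=2, summarize=naive_summarize)
for i in range(4):
    mem.add_turn("user", f"message {i}")
ctx = mem.context()
```

A plain `deque(maxlen=...)` would discard old turns without giving the application a chance to summarize them. For that reason, the sketch evicts them explicitly.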

Vector Retrieval for Long-Term Semantic Recall

Oracle AI Vector Search helps when the user refers to older context with different wording. For example, the user might ask, “Earlier we debugged this issue. What did we decide, and what should I try next?” That question does not repeat every detail from the earlier debugging work. A vector memory layer can still retrieve related chunks about the root cause, the decision, the failed patch, and the rollout plan.

Vector retrieval is especially useful for recall across sessions, paraphrased follow-up questions, large conversation histories, and knowledge that is easier to find by meaning than by exact keyword. But it should not be the only memory layer. Semantic similarity is not the same as correctness. A retrieved chunk can be related but outdated, incomplete, or less authoritative than a structured decision record.
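The gap between similarity and relevance can be illustrated with a small retrieval function. It scopes candidates to one thread before ranking and applies a similarity threshold. This is a pure-Python sketch for illustration, not Oracle AI Vector Search. The three-dimensional embeddings, the 0.75 threshold, and the `Chunk` and `recall` names are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    thread_id: str
    embedding: list


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(query_embedding, chunks, thread_id, min_similarity=0.75, k=3):
    """Return up to k (score, text) pairs from this thread that clear the threshold."""
    scored = [
        (cosine_similarity(query_embedding, c.embedding), c)
        for c in chunks
        if c.thread_id == thread_id  # scope first: other threads are never candidates
    ]
    scored = [(s, c) for s, c in scored if s >= min_similarity]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(s, 3), c.text) for s, c in scored[:k]]


# Toy embeddings, purely illustrative.
chunks = [
    Chunk("Root cause: EU auth latency exceeded lock timeout", "t1", [0.9, 0.1, 0.0]),
    Chunk("Global patch rejected in review", "t1", [0.7, 0.3, 0.1]),
    Chunk("Lunch order for the team", "t1", [0.0, 0.1, 0.9]),
    Chunk("Unrelated thread about EU timeouts", "t2", [0.9, 0.1, 0.0]),
]
hits = recall([1.0, 0.2, 0.0], chunks, thread_id="t1")
```

Even with scoping and a threshold, the second hit is merely related. Deciding whether it is authoritative is still the memory manager's job.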

Structured Memory for Facts, Preferences, and State

Structured memory stores information that should be precise. This includes user preferences, project facts, entities, decisions, task state, configuration choices, and metrics to monitor. These are not just pieces of text; they are records the application may need to query, update, validate, and govern.

In the companion notebook, structured memory includes project state, decisions, metrics, and preferences. For example, it stores the decision to use a region-specific inventory lock timeout, the project state that EU payment authorization latency exceeded the existing timeout, and the metric to monitor expired inventory locks by region. This kind of memory helps the memory manager prefer authoritative facts over loosely related retrieved chunks.
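Unlike appended text, structured memory is keyed, so a later decision replaces an earlier one. A scoped record can also take precedence over a generic one. The sketch below shows both behaviors with a hypothetical in-memory `StructuredMemory` class. The keys, scopes, and timeout values are illustrative, not the notebook's data.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredMemory:
    """Facts keyed by (scope, key); writes overwrite instead of appending."""

    records: dict = field(default_factory=dict)

    def upsert(self, key, value, scope="global"):
        self.records[(scope, key)] = value

    def get(self, key, scope="global"):
        # Prefer the scoped record (e.g. region-specific), then fall back to global.
        if (scope, key) in self.records:
            return self.records[(scope, key)]
        return self.records.get(("global", key))


mem = StructuredMemory()
mem.upsert("inventory_lock_timeout_s", 5)                     # generic default
mem.upsert("inventory_lock_timeout_s", 12, scope="region:EU")  # region-specific decision
eu_before = mem.get("inventory_lock_timeout_s", scope="region:EU")
us_value = mem.get("inventory_lock_timeout_s", scope="region:US")
mem.upsert("inventory_lock_timeout_s", 15, scope="region:EU")  # a later decision replaces it
eu_after = mem.get("inventory_lock_timeout_s", scope="region:EU")
```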

Episodic Memory for What Happened and Why It Mattered

Episodic memory stores important events and outcomes. It matters for long conversations because agents often need to resume work, explain prior decisions, or avoid repeating failed attempts. A fact says what is true. An episode says what happened, what changed, and why it mattered.

In the notebook, episodic memory stores events such as a rejected global patch, an EU-only patch that passed staging, and an agreed rollout plan. If the developer later asks what to try next, the agent should know that the global patch already failed and that the EU-only patch passed staging. That is the difference between remembering text and remembering progress.
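The difference between remembering text and remembering progress can be sketched as a small episodic log. From that log, the agent derives what not to retry and what to build on. This is a minimal sketch. The `Episode` fields, outcome labels, and dates are hypothetical stand-ins for the notebook's events.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Episode:
    occurred_at: datetime
    action: str
    outcome: str  # e.g. "rejected", "passed", "agreed"
    note: str


def actions_with_outcome(episodes, outcome):
    """Return the actions that ended with `outcome`, oldest first."""
    ordered = sorted(episodes, key=lambda e: e.occurred_at)
    return [e.action for e in ordered if e.outcome == outcome]


def next_step_hint(episodes):
    # Rejected attempts become "do not repeat"; passed ones are a base to build on.
    return {
        "do_not_retry": actions_with_outcome(episodes, "rejected"),
        "build_on": actions_with_outcome(episodes, "passed"),
    }


log = [
    Episode(datetime(2026, 6, 2, 9), "EU-only patch", "passed", "passed staging"),
    Episode(datetime(2026, 6, 1, 10), "global patch", "rejected", "rejected after review"),
    Episode(datetime(2026, 6, 2, 15), "rollout plan", "agreed", "agreed staged rollout"),
]
hint = next_step_hint(log)
```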

The Best Pattern: Hybrid Layered Memory

The best pattern for long conversation memory is a layered architecture. Recent context keeps the current exchange coherent. Summaries compress older dialogue. Vector retrieval brings back semantically related information. Structured memory preserves stable facts and decisions. Episodic memory records what happened and what was tried.

The memory manager coordinates the layers. That coordination is what turns memory from a pile of stored text into a usable system.

How a Memory Manager Assembles Context for Each Turn

A memory manager should not blindly stuff every stored item into the prompt. For each turn, it should decide:

which recent turns to include;
whether the rolling summary is needed;
which structured facts and episodic events matter;
which retrieved chunks are useful;
what should be stored or updated after the response.

Example context package:

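context_package = {
    "question": question,                    # the current user message
    "recent_context": recent_turns,          # bounded window of the latest turns
    "rolling_summary": summary,              # compressed older dialogue for this thread
    "structured_memory": structured_memory,  # active facts, decisions, preferences, state
    "episodic_memory": episodic_memory,      # relevant events, outcomes, prior attempts
    "retrieved_memory": retrieved_memory,    # semantically related chunks from vector search
}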

This shape is easier to inspect than a giant prompt. If the answer is wrong, developers can debug the context package: was the summary stale, did retrieval miss the right memory, was the structured decision missing, or did the episodic log omit a failed attempt?
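The debugging questions above can be partly automated by giving the package a typed shape and a diagnostics helper. This is a hypothetical sketch. The `ContextPackage` class, the `summary_updated_at` field (not part of the package shown above), and the 24-hour staleness cutoff are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ContextPackage:
    question: str
    recent_context: list = field(default_factory=list)
    rolling_summary: str = ""
    summary_updated_at: Optional[datetime] = None  # hypothetical freshness field
    structured_memory: list = field(default_factory=list)
    episodic_memory: list = field(default_factory=list)
    retrieved_memory: list = field(default_factory=list)

    def diagnostics(self, now, max_summary_age=timedelta(hours=24)):
        """List likely problems: empty layers and a possibly stale summary."""
        issues = []
        for name in ("recent_context", "structured_memory", "episodic_memory", "retrieved_memory"):
            if not getattr(self, name):
                issues.append(f"{name} is empty")
        if (
            self.rolling_summary
            and self.summary_updated_at is not None
            and now - self.summary_updated_at > max_summary_age
        ):
            issues.append("rolling_summary may be stale")
        return issues


pkg = ContextPackage(
    question="What did we decide, and what should I try next?",
    recent_context=[("user", "Earlier we debugged this issue.")],
    rolling_summary="Investigated EU checkout failures.",
    summary_updated_at=datetime(2026, 6, 1),
    structured_memory=["decision: region-specific lock timeout"],
    retrieved_memory=["chunk about root cause"],
)
issues = pkg.diagnostics(now=datetime(2026, 6, 3))
```

An empty layer is not always a bug. However, surfacing it next to a wrong answer narrows the search quickly.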

Handling Memory Conflicts and Freshness

Layered memory introduces a new engineering question: what happens when memory layers disagree?

For example, a rolling summary might preserve an older plan, while structured memory contains the final decision. A vector search result might retrieve a semantically related note that is no longer current. An episodic memory entry might show that a previous attempt failed, even if the latest summary does not mention it.

A reliable memory manager should treat memory as evidence, not as a flat transcript. Useful conflict and freshness rules include:

prefer structured decisions over summaries when both refer to the same fact;
prefer newer memory when two records have the same authority;
prefer scoped memory over generic memory, such as project-specific or region-specific records;
downgrade retrieved chunks that are old, superseded, or weakly related to the current task;
keep source, timestamp, scope, and memory type metadata with each memory record;
mark important records as active, superseded, rejected, or archived instead of deleting context too early.

This makes long-conversation memory easier to inspect. If the agent gives the wrong answer, developers can check which memory layer supplied the evidence and why that evidence was selected.
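The conflict and freshness rules can be sketched as one resolution function. It filters to active, in-scope records and then picks a winner. This is a minimal sketch under stated assumptions. The authority ranks, the record fields, and the tie-break ordering (authority, then scoped over global, then newest) are illustrative choices, since the rules above do not fix which one dominates.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical authority ranking: higher wins when two records describe the same fact.
AUTHORITY = {"structured": 3, "episodic": 2, "summary": 1, "vector": 0}


@dataclass(frozen=True)
class MemoryRecord:
    fact_key: str
    value: str
    memory_type: str  # "structured", "episodic", "summary", or "vector"
    scope: str        # "global" or a narrower scope such as "region:EU"
    status: str       # "active", "superseded", "rejected", or "archived"
    updated_at: datetime


def resolve(records, fact_key, scope):
    """Pick the winning active record for one fact, or None if nothing applies."""
    candidates = [
        r for r in records
        if r.fact_key == fact_key
        and r.status == "active"          # superseded/rejected records never win
        and r.scope in (scope, "global")  # ignore records scoped elsewhere
    ]
    if not candidates:
        return None
    # Ordering (an assumption): authority first, then scoped over global, then newest.
    return max(candidates, key=lambda r: (AUTHORITY[r.memory_type], r.scope == scope, r.updated_at))


records = [
    MemoryRecord("eu_lock_timeout", "8s", "summary", "region:EU", "active", datetime(2026, 6, 3)),
    MemoryRecord("eu_lock_timeout", "12s", "structured", "region:EU", "active", datetime(2026, 6, 2)),
    MemoryRecord("eu_lock_timeout", "10s", "structured", "region:EU", "superseded", datetime(2026, 6, 4)),
]
winner = resolve(records, "eu_lock_timeout", "region:EU")
```

In this example the structured decision beats a newer summary, and the superseded record is excluded even though it is the most recent.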

Making the Memory Manager Concrete

A memory manager is not just a helper that collects context. It is the policy layer for memory.

For each turn, the memory manager can rank candidate memories using simple rules:

recent turns explain the current exchange;
structured decisions are usually more precise than summaries;
episodic memory is useful when the user asks about prior attempts, outcomes, or what to try next;
vector results are useful when they pass a similarity threshold and match the current thread or task scope;
stale or superseded memories should be excluded unless they explain why a previous path should not be repeated.

A simple priority order could look like this:

Current user message
Recent conversation turns
Active structured decisions and project state
Relevant episodic events
Rolling summary
Vector-retrieved chunks
Archived or superseded memory only when needed for explanation

The exact policy depends on the application, but the principle is consistent: the memory manager should assemble the smallest useful context package that is current, scoped, and explainable.
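The priority order above can be turned into a simple assembly loop that fills a fixed budget from the top of the list down. This is a sketch, not the notebook's implementation. The layer names mirror the list above but are hypothetical identifiers. The character budget is a crude stand-in for a token budget.

```python
PRIORITY = [
    "current_message",
    "recent_turns",
    "structured_decisions",
    "episodic_events",
    "rolling_summary",
    "vector_chunks",
    "archived",
]


def assemble(candidates, budget_chars, include_archived=False):
    """Fill the context in priority order until the character budget is spent."""
    package, used = [], 0
    for layer in PRIORITY:
        if layer == "archived" and not include_archived:
            continue  # archived memory only when needed for explanation
        for item in candidates.get(layer, []):
            if used + len(item) > budget_chars:
                continue  # skip items that do not fit; a smaller later item still might
            package.append((layer, item))
            used += len(item)
    return package


candidates = {
    "current_message": ["What should I try next?"],
    "recent_turns": ["user: EU patch passed staging"],
    "structured_decisions": ["decision: region-specific lock timeout"],
    "vector_chunks": ["x" * 500],  # too large for the budget
    "archived": ["old plan: global patch"],
}
pkg = assemble(candidates, budget_chars=200)
pkg_with_archive = assemble(candidates, budget_chars=200, include_archived=True)
```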

Where a Database-Backed Memory Layer Fits

The first half of this architecture is intentionally vendor-neutral. Any serious long-conversation agent needs memory layers and a memory manager. Once memory needs to survive beyond a single session, a database-backed layer becomes useful because the system needs:

durable storage and queryable history;
structured facts and state;
vector retrieval and JSON metadata;
timestamps, status fields, and policy metadata for freshness and conflict handling;
user, thread, and task scoping;
access controls and auditability.

That is where Oracle AI Database fits naturally. It can store relational memory, JSON metadata, episodic logs, and vector-searchable chunks in one governed layer. The point is not that every application needs the same table names. The point is the separation of responsibilities.

What the Companion Notebook Demonstrates

The companion notebook implements the layered pattern end to end. It demonstrates:

every message stored in conversational memory;
a rolling summary per thread;
project state and decisions stored as structured memory;
important events stored with timestamps and outcomes;
retrieval chunks and Oracle vector search when available;
a context package assembled for a follow-up question from older conversation history;
a package-level validation path for oracleagentmemory, including creating a thread, writing memories, and searching them back.

The example follow-up question is:

Earlier we debugged this issue.
What did we decide, and what should I try next?

The notebook stores enough memory to answer that question without relying only on the latest chat turns. It also shows Oracle AI Agent Memory as a higher-level package workflow and LangChain as an interoperability layer.

Building the Oracle-Backed Memory Workflow

The notebook stores each memory layer in Oracle AI Database. Recent context is retrieved with a bounded query so the model receives the latest turns without carrying the full transcript.

Example recent-context query:

SELECT turn_id, role, content
FROM lcam_conversation_memory
WHERE thread_id = :thread_id
ORDER BY turn_id DESC
FETCH FIRST 6 ROWS ONLY
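Because the query orders by turn_id DESC, it returns the newest turn first. The application usually reverses the rows before passing them to the model. The sketch below runs the same bounded query against an in-memory SQLite database so it works without Oracle. SQLite uses LIMIT where Oracle uses FETCH FIRST n ROWS ONLY. The table's column types and the `recent_context` helper are assumptions. python-oracledb also accepts named binds such as :thread_id.

```python
import sqlite3


def recent_context(conn, thread_id, max_turns=6):
    rows = conn.execute(
        """
        SELECT turn_id, role, content
        FROM lcam_conversation_memory
        WHERE thread_id = :thread_id
        ORDER BY turn_id DESC
        LIMIT :max_turns
        """,
        {"thread_id": thread_id, "max_turns": max_turns},
    ).fetchall()
    # Newest first from the query; reverse so the model reads turns in order.
    return list(reversed(rows))


# Hypothetical schema, created only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lcam_conversation_memory "
    "(thread_id TEXT, turn_id INTEGER, role TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO lcam_conversation_memory VALUES (?, ?, ?, ?)",
    [("t1", i, "user" if i % 2 else "assistant", f"turn {i}") for i in range(1, 11)]
    + [("t2", 99, "user", "other thread")],
)
turns = recent_context(conn, "t1")
```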

Structured memory is stored separately from raw messages so facts, decisions, preferences, and project state can be updated and queried directly.

Example structured-memory insert:

INSERT INTO lcam_structured_memory
(thread_id, memory_type, memory_key, memory_value, scope_json)
VALUES (:1, :2, :3, :4, :5)
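Keeping structured memory precise usually means a repeated write for the same key updates the existing row instead of adding a new one. The sketch below shows that pattern with SQLite's UPSERT (ON CONFLICT ... DO UPDATE, available in SQLite 3.24 and later) so it runs without Oracle. In Oracle, a MERGE statement would play the same role. The primary key on (thread_id, memory_key), the column types, and the `write_fact` helper are assumptions. The scope dict is serialized with `json.dumps` before binding.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE lcam_structured_memory (
        thread_id TEXT,
        memory_type TEXT,
        memory_key TEXT,
        memory_value TEXT,
        scope_json TEXT,
        PRIMARY KEY (thread_id, memory_key)
    )
    """
)


def write_fact(conn, thread_id, memory_type, key, value, scope):
    # Insert a new fact, or update the value and scope if the key already exists.
    conn.execute(
        """
        INSERT INTO lcam_structured_memory
        (thread_id, memory_type, memory_key, memory_value, scope_json)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (thread_id, memory_key)
        DO UPDATE SET memory_value = excluded.memory_value,
                      scope_json = excluded.scope_json
        """,
        (thread_id, memory_type, key, value, json.dumps(scope)),
    )


write_fact(conn, "t1", "decision", "eu_inventory_lock_timeout", "8s", {"region": "EU"})
write_fact(conn, "t1", "decision", "eu_inventory_lock_timeout", "12s", {"region": "EU"})
rows = conn.execute("SELECT memory_value, scope_json FROM lcam_structured_memory").fetchall()
```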

Vector retrieval can use Oracle vector search when the database supports it.

Example vector retrieval query:

SELECT chunk_id, source, text
FROM lcam_vector_memory
WHERE thread_id = :thread_id
ORDER BY VECTOR_DISTANCE(embedding, :query_embedding, COSINE)
FETCH FIRST 6 ROWS ONLY

The notebook first stores retrieval chunks as inspectable memory records, then creates vector-searchable memory when Oracle VECTOR support is available. The query uses VECTOR_DISTANCE to rank candidate chunks by distance from the query embedding. The snippets are intentionally small so the architecture stays visible. The notebook carries the full executable workflow and the real database results.

Oracle AI Agent Memory as a Higher-Level Memory API

The custom tables in the notebook make the memory mechanics visible. Oracle AI Agent Memory provides a higher-level package interface for working with threads, memory records, and retrieval on top of Oracle AI Database. That is useful when a team wants the benefits of persistent memory without rebuilding every memory component from scratch.

The companion notebook also validates the oracleagentmemory package path by creating a thread, writing durable memories, and searching those memories back. That package-level proof is important because the table-level walkthrough explains the architecture, while Oracle AI Agent Memory shows the application-facing API path developers can use.

Example Oracle AI Agent Memory workflow:

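# agent_memory is assumed to be an initialized oracleagentmemory client (setup omitted)
agent_memory.create_thread(thread_id=thread_id)

# Store a durable memory on this thread
agent_memory.add_memory(
    "EU checkout timeout decision: use 12 seconds for EU and 5 seconds for US.",
    thread_id=thread_id,
)

# Search memories, limited to this thread
results = agent_memory.search(
    "What timeout did we choose for EU?",
    thread_id=thread_id,
    exact_thread_match=True,
)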

This higher-level API belongs after the architecture is understood. It should not hide the core design question: which memory should be stored, updated, retrieved, and trusted for the current turn?

Where LangChain Fits

LangChain can help once the memory layer is working. It is useful for orchestration, document wrapping, retriever interfaces, and repeatable application flows. It should not replace database privileges, memory policy, or observability.

In the notebook, retrieved Oracle-backed memory is converted into LangChain Document objects so the same memory layer can participate in LangChain-style application flows.

Example LangChain document wrapping:

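from langchain_core.documents import Document

# retrieved_memory is assumed to be a pandas DataFrame with text, source, and score columns
documents = [
    Document(
        page_content=row.text,
        metadata={"source": row.source, "score": float(row.score)},
    )
    for row in retrieved_memory.itertuples()
]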

For Oracle-backed retrieval pipelines, Oracle AI Vector Search integration with LangChain gives developers a bridge between LangChain and Oracle AI Database.

Practical Recommendation for Developers

Use the simplest memory layer that solves the problem, but do not pretend one layer solves everything. Short chats may only need a sliding window. Long linear chats usually need a sliding window plus summaries. Recall across sessions needs vector retrieval. Correct preferences and profile facts need structured memory. Task resumption needs episodic memory. Production-grade continuity needs hybrid layered memory with a memory manager.

Scenario	Recommended approach
Short chats	Sliding window memory
Long linear chats	Sliding window plus summaries
Recall across sessions	Vector retrieval
Correct preferences and profile facts	Structured memory
Task resumption	Episodic memory
Reliable long-term continuity	Hybrid layered memory with a memory manager

A practical rollout is straightforward:

Store every message so the raw conversation can be inspected.
Keep a bounded recent context window and add a rolling summary for older dialogue.
Extract structured memory for facts, preferences, decisions, and state.
Store episodic memory for important events and prior attempts.
Add vector retrieval for semantic recall.
Use a memory manager to assemble context for each turn.
Move to a database-backed memory layer when memory needs to be durable, queryable, shared, and governed.

Conclusion

Long conversations are not solved by one memory technique. A bigger context window, raw chat history, summaries, vector retrieval, structured facts, and episodic logs each solve part of the problem. The best pattern is hybrid layered memory coordinated by a memory manager.

Oracle AI Database provides a durable implementation layer for that pattern when teams need relational precision, vector retrieval, JSON metadata, and governed access. Oracle AI Agent Memory and LangChain can then sit above that layer when developers need higher-level APIs or orchestration. The goal is not to keep making prompts larger. The goal is to make memory inspectable, retrievable, updatable, and reliable.

Run the companion notebook to see the pattern stored, retrieved, scoped, and validated in Oracle AI Database, including the oracleagentmemory package workflow.

Frequently Asked Questions

What is the best memory approach for long conversations?

Hybrid layered memory: recent context, summaries, vector retrieval, structured memory, episodic memory, and a memory manager.

Is a larger context window enough?

No. It gives the model more room, but it does not define what should be stored, retrieved, updated, or trusted.

What is conversation summary memory good for?

It compresses older dialogue so the model can keep continuity without receiving the full transcript.

What is vector memory good for?

Vector memory helps retrieve semantically related context, especially when users ask follow-up questions with different wording.

What is structured memory good for?

Structured memory stores stable facts, preferences, entities, decisions, and state.

What is episodic memory good for?

Episodic memory stores important events, outcomes, and prior attempts, which helps with task resumption.

What does a memory manager do?

It decides what gets stored, updated, retrieved, summarized, and passed into the model for each turn.

Where does Oracle AI Database fit?

It provides the durable memory layer for relational memory, JSON metadata, episodic logs, and vector-searchable chunks.

Where does Oracle AI Agent Memory fit?

It provides a higher-level package API for memory records, threads, and retrieval on top of Oracle AI Database.

Where does LangChain fit?

LangChain can help with orchestration and retriever interfaces after the memory layer is working.

DEV Community: Anya Summers

LangGraph persistence with Oracle AI Database

Key Takeaways

Sample Description

Prerequisites

Run the Sample

LangGraph bits

Building the graph with OracleStore and OracleSaver

FAQ

References

A tour of LangChain Oracle ingestion and retrieval

Key takeaways

Create a vector store: load, split, and embed content

Load data

Split text

Save and embed

Putting it all together

Now, let’s try retrieval and question answering

Semantic search

Keyword search

Fusing results

Putting retrieval together to answer a question

Time to take the sample for a spin

References

FAQs

Is Oracle AI Database the best choice for small to mid size shops?

Key Takeaways

Where Oracle May Surprise You

The Free Tooling Is Excellent

Local:

Cloud:

Run on any major hyperscaler

My Practical Recommendation

Final Thoughts

FAQ

References

What Is a Converged Database? Definition, Five Tests, and AI Use Cases

Key takeaways

Converged database vs multi-model database vs vector database

Where did the term converged database come from?

What are the five tests for a converged database?

How is a converged database different from a multi-model database?

Why did document databases diverge from relational databases?

Why does a converged database matter for RAG and AI agents?

Example: one commerce domain across relational, JSON, graph, vector, and spatial data

FAQ

Related Oracle resources and next reads

How fresh is the methodology and proof?

The Agent Communication Matrix: When MCP, A2A, and Plain REST Each Win

Key Takeaways

The protocol you picked is doing three jobs at once

The Agent Communication Matrix

Pattern 1: MCP-Centric Tool Access

Pattern 2: A2A Mesh with Oracle Memory

Pattern 3: Queue-Backed Backoffice Agents

The Enterprise Reality

Where This Is Heading

Frequently Asked Questions

An Agent Skill that uses Kafka Java APIs for Oracle AI Database

Key Takeaways

What’s in the skill

Let’s try using the skill to generate an app

Testing is part of the skill

Final Thoughts

To summarize

References

Single OpenAI-compatible endpoint for OCI Generative AI models with LiteLLM

Why this shape

The one caveat worth reading first

Step 1 — IAM: one policy, no keys, no users

Step 2 — Networking

Step 3 — A baked image, not install-on-boot

Step 4 — The shim: LiteLLM SDK behind an OpenAI-compatible API

Step 5 — Test it

Step 6 — Harden

Conclusion

The Agent Loop Decoded

Three Levels Every Agent Engineer Must Know

What is an Agent

What Is a Loop?

Why this architecture is useful for developers 

Understanding the two execution loops 

Setup Guide: Reproducing the Oracle SQLcl MCP and Claude Workflow 

Where LangChain adds value (and where it should not be overused)

Typical failure modes and how to diagnose them fast