<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Oracle Developers</title>
    <description>The latest articles on DEV Community by Oracle Developers (oracledevs).</description>
    <link>https://dev.to/oracledevs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F11587%2F7c934ee0-6aa6-42f9-b43f-91e6fa82ef41.png</url>
      <title>DEV Community: Oracle Developers</title>
      <link>https://dev.to/oracledevs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/oracledevs"/>
    <language>en</language>
    <item>
      <title>The Agent Communication Matrix: When MCP, A2A, and Plain REST Each Win</title>
      <dc:creator>Anya Summers</dc:creator>
      <pubDate>Mon, 29 Jun 2026 16:04:28 +0000</pubDate>
      <link>https://dev.to/oracledevs/the-agent-communication-matrix-when-mcp-a2a-and-plain-rest-each-win-4omo</link>
      <guid>https://dev.to/oracledevs/the-agent-communication-matrix-when-mcp-a2a-and-plain-rest-each-win-4omo</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent communication has three problems, not just one.&lt;/strong&gt; Tool access, peer coordination, and system integration each need a different solution. Most production failures occur when one protocol tries to cover all three.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP and A2A are complements, not rivals.&lt;/strong&gt; The &lt;a href="https://modelcontextprotocol.io/specification" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; (Anthropic, 2024) defines how models find and use tools. The &lt;a href="https://google.github.io/A2A/" rel="noopener noreferrer"&gt;Agent-to-Agent protocol&lt;/a&gt; (Google, 2025) defines how agents cooperate. Generally, agents use A2A for coordination and MCP for tool access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple infrastructure still works well for many tasks.&lt;/strong&gt; Message queues provide at-least-once delivery, dead-letter queues, and automatic back-pressure. Adding these features to MCP or A2A requires coding idempotency, retry coordination, and ordering manually. When the LLM acts as a worker, use a queue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three reference patterns cover most production needs.&lt;/strong&gt; MCP-Centric Tool Access (one orchestrator, multiple tools), A2A Mesh with Oracle Memory (peer agents coordinating via task envelopes), and Queue-Backed Backoffice Agents (RabbitMQ workers writing to Oracle, with no agent protocol). Each includes runnable Python in the companion repo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The protocol layer can change; the memory layer should stay stable.&lt;/strong&gt; The Oracle AI Database remains consistent across all three patterns (vector-indexed, transactional, audit-friendly). This consistency allows the protocol above to evolve while keeping the system of record intact.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The protocol you picked is doing three jobs at once
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Flncpccd66iw3hhemsvax.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Flncpccd66iw3hhemsvax.png" alt="Diagram describing three communication patterns for AI agents. Tool Access enables models to call external capabilities such as SQL queries and vector search. Peer Coordination allows agents to hand work to other agents with their own state and lifecycle. System Integration connects agents to enterprise systems such as databases, queues, and audit services." width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Three communication patterns for AI agents: tool access, peer coordination, and system integration.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Imagine your team created a multi-step research agent. It has three specialist sub-agents: a retriever, a synthesizer, and a reviewer. They connect over plain REST. It worked well in staging. But in production, p99 latency hit 14 seconds at the third hop. Retries piled up. A failed downstream call left the orchestrator with a half-written database row. The rollback logic, designed for a different failure mode, made things worse.&lt;/p&gt;

&lt;p&gt;Then they implemented RabbitMQ. Latency stabilized, and throughput increased. Now, retry issues were someone else’s concern, which was the goal. But two weeks later, the security team filed a ticket. They asked which agent had touched which row during a specific time, and nobody could answer. Request-scoped tracing had disappeared into the queue.&lt;/p&gt;

&lt;p&gt;The LLM-facing tool interface had splintered into six unique queue-message schemas, one for each specialist. None were introspectable by the model. The synthesizer agent started calling tools it didn’t know about and failed silently if they didn’t respond.&lt;/p&gt;

&lt;p&gt;The team hadn’t chosen a bad protocol. They had picked the same protocol for three different jobs, twice in a row.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent communication isn’t just one problem. It involves tool access, peer coordination, and system integration. Each area needs its own solution.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool access&lt;/strong&gt; occurs when a model needs to use a capability it lacks, like a SQL query or memory write. The Model Context Protocol (MCP) addresses this need. It’s often the first task for production agent systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Peer coordination&lt;/strong&gt; happens when one agent assigns work to another. This isn’t just a function call; it’s a task with its own state and lifecycle. The second agent may work independently on this task. The Agent-to-Agent protocol (A2A) supports this, solving problems that the tool-call model can’t handle well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System integration&lt;/strong&gt; involves agents interacting with your broader infrastructure—databases, queues, services, scheduled jobs, and audit pipelines. For two decades, REST, message queues, and event buses have managed this. Often, the simplest solution is the best.&lt;/p&gt;

&lt;p&gt;This article offers a framework to help you choose the right protocol for your needs. It includes three reference patterns for building with these protocols, each with runnable Python examples using Oracle AI Database. One constant remains true as the protocol layer evolves: the governed memory core.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Agent Communication Matrix
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Protocols aren’t ranked on a single axis. They differ on five concrete attributes (interaction shape, streaming, reliability semantics, governance surface, and primary job), and the right choice is the one whose attribute profile matches the job.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;th&gt;Primary job&lt;/th&gt;
&lt;th&gt;Interaction shape&lt;/th&gt;
&lt;th&gt;Streaming&lt;/th&gt;
&lt;th&gt;Reliability semantics&lt;/th&gt;
&lt;th&gt;Governance surface&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP&lt;/td&gt;
&lt;td&gt;Expose tools and resources to a model&lt;/td&gt;
&lt;td&gt;Typed request/response: model calls a discoverable tool, server returns structured output&lt;/td&gt;
&lt;td&gt;Native: supports streaming responses and progress notifications&lt;/td&gt;
&lt;td&gt;At-most-once over HTTP/JSON-RPC; retries are the client’s job&lt;/td&gt;
&lt;td&gt;Strong: tools self-describe via JSON Schema; capabilities are discoverable at connection time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A2A&lt;/td&gt;
&lt;td&gt;Coordinate work between peer agents&lt;/td&gt;
&lt;td&gt;Task-oriented: one agent submits a task, another reports state changes (submitted, working, completed, failed)&lt;/td&gt;
&lt;td&gt;Native: status and artifact updates stream as the task progresses&lt;/td&gt;
&lt;td&gt;At-most-once with task-level retry; tasks are addressable and resumable&lt;/td&gt;
&lt;td&gt;Medium: agent cards declare capabilities, but task semantics are author-defined&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REST&lt;/td&gt;
&lt;td&gt;Service-to-service integration&lt;/td&gt;
&lt;td&gt;Synchronous request/response: caller blocks until server returns&lt;/td&gt;
&lt;td&gt;None native; long-poll or upgrade to SSE/WebSocket if needed&lt;/td&gt;
&lt;td&gt;Best-effort; retry and idempotency are the application’s problem&lt;/td&gt;
&lt;td&gt;Weak by default: OpenAPI helps, but it’s convention, not contract&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Message queue&lt;/td&gt;
&lt;td&gt;Hand work to a worker asynchronously&lt;/td&gt;
&lt;td&gt;Fire-and-forget: producer drops a message, worker consumes when ready&lt;/td&gt;
&lt;td&gt;None: queues deliver discrete messages, not streams&lt;/td&gt;
&lt;td&gt;At-least-once with ack/nack; dead-letter queues catch poison messages&lt;/td&gt;
&lt;td&gt;Medium: per-queue ACLs and DLQs give operational control, but no schema layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event bus&lt;/td&gt;
&lt;td&gt;Broadcast facts to many consumers&lt;/td&gt;
&lt;td&gt;Publish-subscribe: one producer, N consumers, decoupled in time&lt;/td&gt;
&lt;td&gt;Stream-native: consumers replay from offsets&lt;/td&gt;
&lt;td&gt;At-least-once, often with ordering guarantees per partition; replayable history&lt;/td&gt;
&lt;td&gt;Medium: topic-level governance, schemas via registry (Avro, Protobuf), but consumer behavior is author-defined&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;The Agent Communication Matrix. Use these five attributes to decide which protocol fits which job. WebSockets, SSE, and gRPC streaming appear in this discussion as transports, not as peers; they carry messages for several of the protocols above.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Three cells in this matrix do most of the real work, and they’re worth examining.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP and A2A look similar on the wire but interact differently.&lt;/strong&gt; Both use HTTP, JSON-RPC, and streaming. MCP treats interaction as atomic: the model makes a call, and the server returns a structured response. A2A treats interaction as stateful: an agent submits a task, which follows a lifecycle (submitted, working, completed, failed) that both sides monitor.&lt;/p&gt;

&lt;p&gt;This has clear implications for engineers. If your “agent” acts like a stateless function, MCP is ideal, and A2A adds overhead. If your “agent” has a lifecycle (it can pause, resume, check status, or cancel), A2A provides functionality that MCP lacks. This is the real difference between Pattern 1 and Pattern 2.&lt;/p&gt;
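&lt;p&gt;The lifecycle idea can be sketched as a small state machine. This is an illustration only, not the A2A SDK: the class names and the transition table are hypothetical, and it uses only the four states named above, whereas the real protocol defines its own state set.&lt;/p&gt;

```python
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    COMPLETED = "completed"
    FAILED = "failed"


# Hypothetical transition table: which states may follow which.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.WORKING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


class TrackedTask:
    """A task with an addressable id and a history both sides can inspect."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.state = TaskState.SUBMITTED
        self.history = [TaskState.SUBMITTED]

    def advance(self, new_state):
        # Reject transitions the lifecycle does not allow.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} to {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)

    def is_terminal(self):
        # A state with no outgoing transitions is terminal.
        return not ALLOWED[self.state]


task = TrackedTask("task-001")
task.advance(TaskState.WORKING)
task.advance(TaskState.COMPLETED)
```

&lt;p&gt;A stateless tool call has nothing comparable to track: it either returns or it doesn’t. The history list is what makes a task observable by both the submitting and the receiving agent.&lt;/p&gt;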

&lt;p&gt;The reliability column is the most important trade-off in the matrix. Teams often misjudge it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F3jnlfej33khs3vg7o5pq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F3jnlfej33khs3vg7o5pq.png" alt="Comparison of reliability models. Application-layer protocols such as MCP and REST provide at-most-once delivery and require retry logic in application code. Infrastructure services such as message queues and event buses provide at-least-once delivery with built-in retries, acknowledgments, back-pressure handling, and dead-letter queues. The diagram advises keeping infrastructure reliability concerns in the platform layer." width="800" height="709"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Application versus infrastructure responsibility for reliability, retries, and message delivery.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;At-most-once delivery, common in HTTP-based protocols like MCP and REST, means the transport never redelivers on its own. A request that times out or fails mid-flight may or may not have executed, and the client can’t tell which.&lt;/p&gt;

&lt;p&gt;At-least-once delivery, typical for queues and event buses, ensures a message is delivered until a worker acknowledges it. If a worker crashes after doing the work but before acknowledging, the message is delivered again and may be processed twice. Here, idempotency becomes the app’s responsibility.&lt;/p&gt;

&lt;p&gt;Neither approach is better than the other. The key question is where you want retry logic: in your application code (HTTP) or in your infrastructure (queues).&lt;/p&gt;
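&lt;p&gt;To make the idempotency requirement concrete, here is a minimal sketch of a handler that tolerates redelivery. The idempotency_key field and the in-memory set are assumptions for illustration; a real worker would keep the keys in a durable store and record them in the same transaction as the side effect.&lt;/p&gt;

```python
class IdempotentHandler:
    """Wraps a side-effecting function so redelivered messages apply once."""

    def __init__(self, apply):
        self.apply = apply
        self.seen = set()  # stand-in for a durable table of processed keys

    def handle(self, message):
        key = message["idempotency_key"]  # hypothetical field name
        if key in self.seen:
            return False  # duplicate delivery: safe to acknowledge and skip
        self.apply(message["body"])
        self.seen.add(key)
        return True


ledger = []
handler = IdempotentHandler(ledger.append)
msg = {"idempotency_key": "doc-42", "body": "embed doc 42"}

first = handler.handle(msg)   # original delivery
second = handler.handle(msg)  # simulated redelivery after a worker crash
```

&lt;p&gt;With at-most-once protocols the same key is useful in the other direction: the client can retry a timed-out request without risking a double write.&lt;/p&gt;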

&lt;p&gt;Pattern 3 suggests that for certain agent tasks, placing retry logic in the infrastructure is better.&lt;/p&gt;

&lt;p&gt;MCP’s “strong” governance and REST’s “weak by default” rating tackle the same issue that created the OpenAPI ecosystem, but they do it differently. MCP servers self-describe when a connection happens. For example, a client requests tools/list and receives a complete schema of capabilities. This includes types, descriptions, and parameter constraints.&lt;/p&gt;

&lt;p&gt;In contrast, REST gives you an OpenAPI description only if the producing team publishes it, if they keep it current, and if the consuming team trusts it. That’s three “if”s the agent runtime can’t resolve at runtime. MCP makes discoverability a requirement, not just a convention. This governance model includes auditable tool inventories, type-checked invocations, and capability negotiation for each session. Many teams underestimate its value before adopting it. This is the edge that Pattern 1 uses.&lt;/p&gt;
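&lt;p&gt;As a rough illustration of what self-description looks like on the wire, the sketch below builds a tools/list request and a sample response as plain dictionaries, then derives an auditable inventory from the response. The response content is invented for illustration (it mirrors the vector_search tool used in Pattern 1), and tool_inventory is a hypothetical helper, not part of any SDK.&lt;/p&gt;

```python
import json

# JSON-RPC request a client sends once the session is initialized.
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Illustrative response: one tool described by a JSON Schema.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "vector_search",
                "description": "Semantic search over the knowledge base.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"},
                        "k": {"type": "integer", "default": 5},
                    },
                    "required": ["query"],
                },
            }
        ]
    },
}


def tool_inventory(resp):
    """Return {tool name: sorted required parameters} for an audit log."""
    return {
        tool["name"]: sorted(tool["inputSchema"].get("required", []))
        for tool in resp["result"]["tools"]
    }


# Round-trip through JSON to mimic what arrives over the transport.
inventory = tool_inventory(json.loads(json.dumps(response)))
```

&lt;p&gt;Because the inventory comes from the live server rather than from a separately maintained document, it can’t drift from what the server actually exposes.&lt;/p&gt;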

&lt;p&gt;The matrix doesn’t choose a protocol for you. It shows what you’re trading.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6jh165h9dp4yqgad1kd5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6jh165h9dp4yqgad1kd5.png" alt="Decision tree for selecting an agent communication pattern. If a workload does not require real-time or human-in-the-loop interaction, use message queues. If it does, determine whether multiple agents coordinate shared state. Coordinated agents use an A2A mesh, while a single orchestrator uses MCP. A note recommends starting with MCP unless peer coordination is required." width="800" height="844"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Decision tree for choosing between MCP, A2A, and message-queue architectures.&lt;/em&gt;&lt;/p&gt;
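&lt;p&gt;The decision tree in the figure can be written down as a small function. This is a sketch of the figure’s logic only; the function and parameter names are made up here, and real workloads may combine patterns.&lt;/p&gt;

```python
def choose_pattern(needs_realtime_or_human, agents_share_state):
    """Mirror the decision tree: queue first, then A2A versus MCP."""
    if not needs_realtime_or_human:
        # No real-time or human-in-the-loop requirement: use a queue.
        return "message queue"
    if agents_share_state:
        # Multiple agents coordinating shared state: use an A2A mesh.
        return "A2A mesh"
    # A single orchestrator calling tools: start with MCP.
    return "MCP"


batch_job = choose_pattern(needs_realtime_or_human=False, agents_share_state=False)
research_mesh = choose_pattern(needs_realtime_or_human=True, agents_share_state=True)
single_agent = choose_pattern(needs_realtime_or_human=True, agents_share_state=False)
```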




&lt;h2&gt;
  
  
  Pattern 1: MCP-Centric Tool Access
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fojiz8ay5tol33ercotmh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fojiz8ay5tol33ercotmh.png" alt="Diagram showing a single-agent architecture. An Orchestrator LLM accesses Oracle AI Database through four tools: vector search, SQL queries, memory reads, and memory writes. Oracle AI Database serves as the shared backend for vector-indexed, transactional, and audited data access." width="800" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Single-agent orchestration pattern using Oracle AI Database for retrieval, transactions, and memory.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spec:&lt;/strong&gt; &lt;a href="https://modelcontextprotocol.io/specification" rel="noopener noreferrer"&gt;Model Context Protocol specification, Anthropic (2024)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the starting pattern for most production agent systems: a single LLM orchestrator, one or more MCP servers, and a typed contract between them. The model doesn’t call your database directly. Instead, it calls a tool that queries the database on its behalf, and this difference is important.&lt;/p&gt;

&lt;p&gt;MCP excels at the discovery layer. When an MCP client connects to a server, it sends a tools/list request. In return, it receives the full schema of available capabilities: names, parameters, descriptions, and return types. The model sees this inventory before acting. Selecting tools becomes a reasoning step rather than a hardcoded choice. This is a significant change. A model with discoverable tools handles “I don’t know how to do that” better than one with fixed function calls. The lack of a tool provides useful information for the model to reason about.&lt;/p&gt;

&lt;p&gt;Oracle AI Database serves well as the MCP server. The capabilities you want to expose to an agent—like vector search over embedded content, parameterized SQL against business tables, and structured memory reads and writes—fit perfectly with MCP’s tool model. A typical Oracle-backed MCP server offers four or five tools: vector_search, run_sql, read_memory, write_memory, and summarize_thread. Each is a small, focused function with a typed schema. The model chooses among them based on the task.&lt;/p&gt;

&lt;p&gt;The code below shows the minimal version: an MCP server registering one tool that performs vector search against Oracle AI Database. Note the typed array.array("f", ...) bind for the vector column; a plain Python list will not work. The full server, with authentication, retries, and the other four tools, is in the &lt;a href="https://github.com/JeremyMorgan/oracle-ai-developer-hub/tree/adding-agent-communication-matrix" rel="noopener noreferrer"&gt;companion repo&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import array
import os

import oracledb
from mcp.server import Server
from mcp.types import Tool, TextContent

server = Server("oracle-tools")

pool = oracledb.create_pool(user=os.environ["DB_USER"],
                            password=os.environ["DB_PASS"],
                            dsn=os.environ["DB_DSN"], min=1, max=4)

@server.list_tools()
async def list_tools() -&amp;gt; list[Tool]:
    # Advertise one tool; its JSON Schema is the contract the model sees.
    return [Tool(
        name="vector_search",
        description="Semantic search over the knowledge base. Returns top-k passages.",
        inputSchema={
            "type": "object",
            "properties": {"query": {"type": "string"}, "k": {"type": "integer", "default": 5}},
            "required": ["query"],
        },
    )]

@server.call_tool()
async def call_tool(name: str, arguments: dict) -&amp;gt; list[TextContent]:
    # Only one tool is registered here, so reject anything else.
    if name != "vector_search":
        raise ValueError(f"Unknown tool: {name}")
    # Embed the query, then run an Oracle AI Vector Search against the indexed corpus.
    # embed() is omitted here; see the companion repo.
    vec = array.array("f", await embed(arguments["query"]))
    with pool.acquire() as conn, conn.cursor() as cur:
        cur.execute("""
            SELECT chunk_text FROM kb_chunks
            ORDER BY VECTOR_DISTANCE(embedding, :q, COSINE)
            FETCH FIRST :k ROWS ONLY
        """, q=vec, k=arguments.get("k", 5))
        return [TextContent(type="text", text=row[0]) for row in cur]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few points about this snippet. The tool schema acts as a contract. The model sees vector_search as a typed capability that requires a query string and accepts an optional integer k. That information helps the model decide when and how to use it.&lt;/p&gt;
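&lt;p&gt;To show what “schema as contract” can mean on the server side, here is a minimal hand-rolled check of incoming arguments against the vector_search schema. prepare_arguments is a hypothetical helper that handles only the required, type, and default keywords used in this schema; a full JSON Schema validator would be the more robust choice in practice.&lt;/p&gt;

```python
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "k": {"type": "integer", "default": 5},
    },
    "required": ["query"],
}

# Only the two JSON Schema types this tool uses.
PYTHON_TYPES = {"string": str, "integer": int}


def prepare_arguments(schema, arguments):
    """Check required fields and types, then fill in declared defaults."""
    missing = [name for name in schema.get("required", []) if name not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    prepared = {}
    for name, spec in schema["properties"].items():
        if name in arguments:
            value = arguments[name]
            expected = PYTHON_TYPES[spec["type"]]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                raise TypeError(f"{name} must be of type {spec['type']}")
            prepared[name] = value
        elif "default" in spec:
            prepared[name] = spec["default"]
    return prepared


args = prepare_arguments(INPUT_SCHEMA, {"query": "hnsw index tuning"})
```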

&lt;p&gt;The Oracle AI Database vector search executes as a single SQL statement on a VECTOR column. It uses a cosine-distance HNSW index. There’s no separate vector store, no sync job, and no eventual-consistency window.&lt;/p&gt;

&lt;p&gt;The embed() call is left out here for clarity. In the repo, it connects to a local Ollama model. (This setup allows the demo to run without needing paid API keys.)&lt;/p&gt;

&lt;p&gt;In tests with the demo corpus (1,000 chunks using 768-dimensional embeddings via Ollama’s nomic-embed-text on a GPU workstation), median tool-call round-trip latency is 15.3ms. This includes 14.1ms for embedding inference and 0.9ms for Oracle vector search. On CPU-only hardware, the embedding step is usually 5 to 20 times slower, but the database side remains constant. Remember this key point: an &lt;strong&gt;MCP tool call to an Oracle-backed server is fast like a database, not like an LLM&lt;/strong&gt;. The latency during an agent turn mainly relies on the model’s own inference, not the tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage:&lt;/strong&gt; Use this pattern when one orchestrator agent needs several specialized tools, or when teams need to standardize tool access across different agent frameworks or model providers. It works best when the typed schema improves model behavior, which is often true. MCP is ideal if you plan to add tools later. The discovery layer allows new capabilities to integrate with the model without changes on the client side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When not to use this:&lt;/strong&gt; Avoid this pattern for a single agent with just one tool. Don’t add a protocol if a function call is enough. If your “agent” is simply one model with a clear capability, an MCP server adds extra complexity. The discovery layer is helpful when there’s something to find; with only one tool, there’s nothing to discover.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 2: A2A Mesh with Oracle Memory
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fyj23gtzjsnys91iznvv9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fyj23gtzjsnys91iznvv9.png" alt="Diagram showing a multi-agent workflow with Researcher, Writer, and Reviewer agents. The Researcher gathers sources, the Writer drafts content, and the Reviewer validates and revises it. Agent state and payloads are stored in Oracle AI Database, which provides vector-indexed, transactional, and audited persistence across the workflow." width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Multi-agent collaboration pattern with shared state stored in Oracle AI Database.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spec:&lt;/strong&gt; &lt;a href="https://google.github.io/A2A/" rel="noopener noreferrer"&gt;Agent-to-Agent Protocol specification, Google (2025)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A2A solves a problem MCP doesn’t: what happens when an agent isn’t calling a tool but handing work to another agent. The distinction sounds semantic until you try to express “&lt;em&gt;the Researcher has finished gathering sources; the Writer should now draft a response using them&lt;/em&gt;” as a tool call. It doesn’t fit. The Writer isn’t a function the Researcher invokes. It’s a peer with its own model, its own prompt, its own lifecycle. A2A models that relationship as a task with state, addressable identity, and a status machine that both sides observe.&lt;/p&gt;

&lt;p&gt;Consider a two-agent research workflow. A Researcher agent gathers context from external sources, checks relevance, and produces findings. A Writer agent then uses those findings to draft a response in the desired tone and format. A simple setup would have the Researcher return findings directly as a response to a tool call. This works for two agents but fails with three. When a third agent, like a Reviewer, needs the same findings, you end up duplicating data in message history instead of having a central record.&lt;/p&gt;

&lt;p&gt;The A2A pattern changes this. Findings are stored in the Oracle AI Database as durable, vector-indexed rows. The Researcher writes them and sends a task message to the Writer with a reference to the findings, not the data itself. The Writer reads from the same table. The protocol handles coordination, while the database holds the state. Multiple agents can access the same information without the protocol layer needing to see the contents.&lt;/p&gt;

&lt;p&gt;The code below is one half of the mesh: the Writer agent’s task handler, listening for task.created events from the Researcher and producing a draft. The Researcher side, plus the full A2A envelope with retries and status updates, lives in the &lt;a href="https://github.com/JeremyMorgan/oracle-ai-developer-hub/tree/adding-agent-communication-matrix" rel="noopener noreferrer"&gt;companion repo&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import asyncio, oracledb, os
from a2a.server import A2AServer
from a2a.types import Task, TaskStatus, Message

pool = oracledb.create_pool(user=os.environ["DB_USER"],
                            password=os.environ["DB_PASS"],
                            dsn=os.environ["DB_DSN"], min=1, max=4)

writer = A2AServer(agent_id="writer-v1")

@writer.on_task("draft_response")
async def handle_draft(task: Task) -&amp;gt; Message:
    # The Researcher passed a memory_id, not the findings themselves.
    memory_id = task.payload["memory_id"]
    with pool.acquire() as conn, conn.cursor() as cur:
        cur.execute("SELECT findings, source_refs FROM agent_memory WHERE id = :id",
                    id=memory_id)
        row = cur.fetchone()
    if row is None:
        raise LookupError(f"No agent_memory row with id {memory_id}")
    findings, sources = row

    await writer.update_status(task.id, TaskStatus.WORKING)
    draft = await llm_draft(findings, sources, tone=task.payload["tone"])

    # The first cursor was closed when its with-block exited, so acquire a
    # fresh connection for the write and commit so other agents see the draft.
    with pool.acquire() as conn, conn.cursor() as cur:
        cur.execute("UPDATE agent_memory SET draft = :d WHERE id = :id", d=draft, id=memory_id)
        conn.commit()
    return Message(role="agent", content=draft, refs={"memory_id": memory_id})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things in that snippet do the load-bearing work. The Writer never receives the findings in the message. It receives a memory_id and reads the actual content from Oracle. That’s the payload-by-reference pattern, and it’s the core architectural move of this pattern. The update_status call tells the Researcher (and any observer subscribed to the task) that work has begun; A2A’s status machine handles the streaming update without the Writer having to manage its own connection lifecycle. The final Message returns the draft inline because it’s the artifact of the task, but it also includes the memory_id ref, so a third agent picking this up next reads the same memory rather than re-deserializing a payload.&lt;/p&gt;

&lt;p&gt;The trade-off is clear in token counts. In the demo, using serialized findings in the message costs 1,394 tokens per Writer turn for 3KB of research. This size is typical for a research agent creating a synthesized summary with source references.&lt;/p&gt;

&lt;p&gt;In contrast, the payload-by-reference version only costs 61 tokens, no matter the findings’ size. This means a 22.9 times reduction at 3KB. The difference grows with findings size: at 500 characters, the reduction is 5.6 times; at 8KB, it reaches 58.9 times. The ratio isn’t fixed; it depends on how much data is in the message versus in the database. (Tokens counted using OpenAI’s cl100k_base tokenizer; Anthropic and Google tokenizers yield similar counts for English text.)&lt;/p&gt;

&lt;p&gt;The compounding effect is more important than any single hop. A three-agent mesh sharing the same research context across two handoffs costs about 4,000 tokens in the naive version. In the payload-by-reference version, it costs only 183 tokens. At five hops, the difference exceeds 6,500 tokens per request. This is before any agent has done actual reasoning work. The cost of “just put it in the message” increases linearly with mesh depth. Most mesh topologies grow over time, not the other way around.&lt;/p&gt;
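&lt;p&gt;The arithmetic behind these figures can be reproduced in a few lines, assuming the per-turn costs measured above (1,394 and 61 tokens) simply multiply by the number of agent turns that carry the context. That linear model is an assumption of this sketch; under it, three turns give 183 tokens by reference and roughly 4,200 inline, and five turns give a gap of 6,665 tokens.&lt;/p&gt;

```python
NAIVE_TOKENS_PER_TURN = 1394     # findings serialized into the message (3KB case)
REFERENCE_TOKENS_PER_TURN = 61   # message carries only a memory_id reference


def mesh_cost(turns):
    """Return (naive, by_reference, difference) in tokens for a number of turns."""
    naive = NAIVE_TOKENS_PER_TURN * turns
    by_reference = REFERENCE_TOKENS_PER_TURN * turns
    return naive, by_reference, naive - by_reference


three_turns = mesh_cost(3)
five_turns = mesh_cost(5)

# Per-turn reduction factor at 3KB of findings.
reduction = round(NAIVE_TOKENS_PER_TURN / REFERENCE_TOKENS_PER_TURN, 1)
```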

&lt;p&gt;Oracle AI Database plays a key role here. The agent_memory table serves as a single source of truth. It is vector-indexed for semantic recall and transactional for consistency between reads and writes. Each row includes the agent ID, making it audit-friendly. The protocol layer can be A2A today and something else tomorrow. However, the memory layer stays the same.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage:&lt;/strong&gt; Use this pattern for multi-agent workflows that need peer coordination, like planner-and-specialist patterns or multi-step research pipelines. It applies when multiple agents require consistent access to the same conversational or task state. A2A is ideal for long-running tasks where a synchronous request/response model would keep connections open unnecessarily.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When not to use this:&lt;/strong&gt; Avoid A2A for one agent with a set of tools. This is an MCP problem, not an A2A issue. Using A2A with a single orchestrator adds unnecessary task lifecycle management. A good test: if you can name a second agent and explain its independent decisions, A2A works. If the “second agent” is just a different prompt using the same model, it’s a tool call.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 3: Queue-Backed Backoffice Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Flkeyszc3rfr3vcvteexo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Flkeyszc3rfr3vcvteexo.png" alt="Diagram showing an asynchronous processing pattern. A producer publishes messages to a RabbitMQ exchange, which delivers work to an LLM worker. Failed messages are routed to a dead-letter exchange for inspection. The worker writes processed results into Oracle AI Database and uses queue features such as acknowledgments, retries, and back-pressure management." width="800" height="745"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Message-driven AI workflow using RabbitMQ and Oracle AI Database.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Imagine a document-processing pipeline. PDFs arrive in a queue. A worker agent picks them up, extracts text, embeds chunks, and writes them to the Oracle AI Database with vector indexing. It then exposes results through a simple FastAPI endpoint. No MCP. No A2A. That is the point: adding either would increase the surface area without adding capability.&lt;/p&gt;

&lt;p&gt;This pattern resists the pull of new protocols. The urge to add MCP just because there’s an LLM involved somewhere is strong but should be resisted. A worker using the same embedding model and prompt for each message doesn’t need tool discovery. It needs at-least-once delivery, a dead-letter queue, and back-pressure for when the embedding service slows down. These are queue issues, not agent-protocol issues.&lt;/p&gt;

&lt;p&gt;The architectural shape predates the agent era, which is key to its function. Producers send messages to a queue. Workers process them at their own pace. If messages fail, they retry with exponential backoff and go to a dead-letter queue if they keep failing. The LLM acts as a worker in the pipeline, not as its orchestrator. This means the protocol layer above the LLM is as straightforward as the rest of the system, and that’s a benefit.&lt;/p&gt;
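&lt;p&gt;The retry-then-dead-letter policy described above can be sketched as two small functions. The names and numbers are illustrative assumptions, and how the delay is actually enforced (for example with per-retry delay queues) depends on the broker setup and is not shown.&lt;/p&gt;

```python
def backoff_delays(base_seconds, max_retries, cap_seconds):
    """Exponential backoff schedule: base, 2x base, 4x base, up to a cap."""
    return [
        min(cap_seconds, base_seconds * 2 ** attempt)
        for attempt in range(max_retries)
    ]


def next_action(failed_attempts, delays):
    """Decide what happens after a message has failed `failed_attempts` times."""
    if failed_attempts in range(1, len(delays) + 1):
        # Still within the retry budget: wait, then redeliver.
        return ("retry", delays[failed_attempts - 1])
    # Retry budget exhausted: park the message for human inspection.
    return ("dead-letter", None)


delays = backoff_delays(base_seconds=1, max_retries=4, cap_seconds=5)
after_first_failure = next_action(1, delays)
after_fifth_failure = next_action(5, delays)
```

&lt;p&gt;The cap keeps a long outage from pushing delays out indefinitely, and the dead-letter branch is what keeps one poison message from blocking the rest of the queue.&lt;/p&gt;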

&lt;p&gt;The code below is the worker’s core loop: consume a message, embed the chunks, write to Oracle AI Database, acknowledge. Producer, dead-letter handling, and the FastAPI edge live in the &lt;a href="https://github.com/JeremyMorgan/oracle-ai-developer-hub/tree/adding-agent-communication-matrix" rel="noopener noreferrer"&gt;companion repo&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import array
import json
import os

import oracledb
import pika

from embed import embed_chunks  # local Ollama call, see repo

# Small connection pool shared by every message this worker handles.
pool = oracledb.create_pool(
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASS"],
    dsn=os.environ["DB_DSN"],
    min=1,
    max=4,
)

conn = pika.BlockingConnection(pika.URLParameters(os.environ["AMQP_URL"]))
ch = conn.channel()

# Durable queue; rejected or expired messages are routed to the dead-letter exchange.
ch.queue_declare(
    queue="documents",
    durable=True,
    arguments={"x-dead-letter-exchange": "documents.dlx"},
)
ch.basic_qos(prefetch_count=4)  # back-pressure: at most 4 in-flight per worker


def handle(ch, method, _props, body):
    doc = json.loads(body)
    chunks = doc["chunks"]  # already segmented upstream
    # python-oracledb binds VECTOR columns from array.array; "f" means float32.
    vectors = [array.array("f", v) for v in embed_chunks(chunks)]
    with pool.acquire() as db, db.cursor() as cur:
        cur.executemany(
            """
            INSERT INTO kb_chunks (doc_id, chunk_text, embedding)
            VALUES (:doc, :txt, :vec)
            """,
            [(doc["id"], t, v) for t, v in zip(chunks, vectors)],
        )
        db.commit()
    # Acknowledge only after the commit, so a crash before this point triggers redelivery.
    ch.basic_ack(delivery_tag=method.delivery_tag)


ch.basic_consume(queue="documents", on_message_callback=handle)
ch.start_consuming()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The interesting parts are what isn’t there. There’s no tool schema, no agent identity, and no status state machine. The worker doesn’t need to show its capabilities because nothing is looking for it. The contract is the queue’s message schema, enforced by the producer’s chosen validation. The prefetch_count=4 setting tells the whole back-pressure story. If the embedding service slows down or Oracle’s connection pool fills up, messages stay on the queue instead of piling up in worker memory. The DLX (dead-letter exchange) on the queue means any message that fails repeatedly goes to a place where a human can check it, without the producer or any other agent needing to know.&lt;/p&gt;
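
&lt;p&gt;As a rough illustration of the message contract, here is one way a producer might validate a document before publishing it. The field names id and chunks come from the worker code above; the function name encode_document and the specific checks are assumptions, not the companion repo’s validation.&lt;/p&gt;

```python
import json


def encode_document(doc_id, chunks):
    """Validate a document and serialize it as the queue message body."""
    if not isinstance(doc_id, str) or not doc_id:
        raise ValueError("id must be a non-empty string")
    if not isinstance(chunks, list) or not chunks:
        raise ValueError("chunks must be a non-empty list")
    if not all(isinstance(c, str) and c.strip() for c in chunks):
        raise ValueError("every chunk must be non-empty text")
    # The worker reads exactly these two keys, so nothing else is sent.
    return json.dumps({"id": doc_id, "chunks": chunks}).encode("utf-8")
```

&lt;p&gt;Rejecting bad input at the producer keeps malformed messages out of the queue, so the dead-letter exchange is reserved for failures that actually need a human.&lt;/p&gt;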

&lt;p&gt;Reliability semantics play a crucial role here. RabbitMQ provides at-least-once delivery with ack/nack semantics. If a worker crashes before acknowledging a message, the broker requeues it and delivers it to another worker, so you don’t need application-level redelivery logic. In contrast, achieving the same reliability with an MCP server involves manually writing idempotency keys, retry coordination, and ordering logic. The queue handles this for you, effectively for free. RabbitMQ has been improving these semantics since 2007. You’re not going to outdo that on a side project.&lt;/p&gt;
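
&lt;p&gt;At-least-once delivery means the same message can occasionally arrive twice, which is why the handler itself should be idempotent. The sketch below shows one way to get there with a deterministic key per chunk. It is a simplified stand-in: ChunkStore is an in-memory substitute for a table with a unique key, and the names chunk_key, ChunkStore, and handle are hypothetical.&lt;/p&gt;

```python
import hashlib


def chunk_key(doc_id, index, text):
    """Derive a stable key so a redelivered chunk maps to the same row."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return f"{doc_id}:{index}:{digest}"


class ChunkStore:
    """In-memory stand-in for a table with a unique constraint on the key."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, text):
        is_new = key not in self.rows
        self.rows[key] = text  # writing the same key again changes nothing
        return is_new


def handle(store, doc):
    """Process one message; return how many chunks were newly written."""
    written = 0
    for index, text in enumerate(doc["chunks"]):
        if store.upsert(chunk_key(doc["id"], index, text), text):
            written += 1
    return written
```

&lt;p&gt;Processing the same message twice leaves the store unchanged the second time, which is the property a redelivery after a worker crash relies on.&lt;/p&gt;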

&lt;p&gt;The Oracle integration mirrors Patterns 1 and 2: it’s durable, vector-indexed, and transactional. The worker writes embedded chunks into the same kb_chunks table that Pattern 1’s MCP vector_search tool reads from. Teams using Oracle Database can merge the queue and memory layer into one component with Oracle Advanced Queuing. The trade-off is one less service to manage, but with slightly less portable demo code. This is the architectural benefit of the three-pattern arc: while the protocol layer changes (MCP, A2A, none), the memory layer remains constant. A document processed by Pattern 3’s queue worker is instantly searchable by Pattern 1’s MCP tool and can be referenced by Pattern 2’s A2A peers. This isn’t a coincidence; it reflects the efficiency gained when each protocol performs its best role, with Oracle AI Database maintaining the shared state for all three patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended usage:&lt;/strong&gt; Use this for asynchronous, idempotent, and throughput-sensitive tasks. Examples include document processing, batch embedding, ETL pipelines, scheduled report generation, and back-office automation. Here, the LLM acts as a worker, not an orchestrator. This pattern is ideal when downstream services may be slow or temporarily unavailable. The queue absorbs those problems without the application noticing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When not to use this:&lt;/strong&gt; Avoid it for real-time, conversational, or human-in-the-loop tasks. Don’t place a queue between a user and a chatbot. It’s not suitable when latency is more critical than throughput, especially when users expect quick answers. The conversational loop fits in Patterns 1 or 2, while Pattern 3 works behind them, tackling non-interactive tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Enterprise Reality
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost scales with protocol surface area.&lt;/strong&gt; Adding each protocol to an agent system creates another layer. This layer must be monitored, secured, and fixed if something goes wrong at 3 a.m. The patterns above outline the architecture, while what follows shows how that architecture shifts when faced with real users at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auditability across async boundaries.&lt;/strong&gt; When a request crosses from MCP to a queue to A2A, the question regulators and security teams actually ask is &lt;em&gt;which agent touched which row, and when?&lt;/em&gt; The answer almost never lives in any single protocol. According to &lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;LangChain’s State of Agent Engineering 2026 report&lt;/a&gt;, 89% of organizations have implemented some form of agent observability, and among teams already running agents in production that figure rises to 94%, with 71.5% reporting full tracing across individual agent steps and tool calls.&lt;/p&gt;

&lt;p&gt;The teams ahead of the curve are not the ones with the most sophisticated protocols; they are the ones who decided early that the system of record sits in the database, not in protocol message history. Oracle AI Database earns its place here as that record. Every memory write, every tool invocation, every agent identity is durable in a single governed store that does not care which protocol delivered the message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost scales with iteration, not just calls.&lt;/strong&gt; The LangChain report shows that 32% of respondents see quality as their main blocker, while latency follows at 20%. Interestingly, cost concerns have dropped over the year. Teams aren’t just paying for tokens; they’re paying for hops. Each protocol boundary adds latency, retries, and overhead. A multi-agent system crossing four boundaries per request multiplies the engineering effort. The key lesson from Pattern 3 is clear: Avoid adding a coordination protocol when the work is async, idempotent, and doesn’t need a model in the orchestration loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-tenancy and isolation.&lt;/strong&gt; The same report shows that among enterprises with 2,000 or more employees, security is now the second-largest barrier to production, noted by 24.9% of respondents. This is more significant than latency, and it affects protocol choice. MCP servers can be deployed for each tenant or shared with tenant-scoped tools. A2A meshes follow the trust boundaries of their network. Queues can isolate by virtual host or topic. None of these options is wrong, but they differ. A tenancy model that works for one protocol often doesn’t fit all three. The constant factor is the database tenancy model. Row-level security, schema-per-tenant, and Oracle’s audit infrastructure remain relevant, regardless of which protocol is in vogue for the next roadmap.&lt;/p&gt;
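
&lt;p&gt;To make the queue option concrete, here is a small sketch of isolating tenants by RabbitMQ virtual host: each tenant gets its own AMQP URL. The naming scheme tenant-{id}, the function name tenant_amqp_url, and the host in the usage note are assumptions for illustration only.&lt;/p&gt;

```python
from urllib.parse import quote


def tenant_amqp_url(host, tenant_id, user, password, port=5672):
    """Build an AMQP URL whose virtual host is scoped to a single tenant."""
    # Percent-encode every component so characters such as "/" or "@" are safe.
    vhost = quote(f"tenant-{tenant_id}", safe="")
    user_part = quote(user, safe="")
    password_part = quote(password, safe="")
    return f"amqp://{user_part}:{password_part}@{host}:{port}/{vhost}"
```

&lt;p&gt;For example, tenant_amqp_url("mq.internal", "acme", "svc", "p@ss") yields amqp://svc:p%40ss@mq.internal:5672/tenant-acme, which can be passed to pika.URLParameters in the same way as the AMQP_URL value in the worker.&lt;/p&gt;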

&lt;p&gt;The protocol layer can change. The governed memory layer should not.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Is Heading
&lt;/h2&gt;

&lt;p&gt;Three things are visibly changing in the agent communication layer right now, and one of them is not yet resolved.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Protocols are converging on capability cards.&lt;/strong&gt; Both MCP’s tool schemas and A2A’s agent cards share a key idea: discoverable, typed descriptions of capabilities. These can be fetched at connection time instead of being hard-coded in client code. The two specifications came to this idea independently, suggesting it’s a fundamental concept. In the next two years, we can expect to see shared schema conventions across protocols. This may include cross-walks between MCP’s tools/list and A2A’s agent cards, or even a new specification that combines both. Teams using either spec now are not going against this convergence; they are ready for it.&lt;/p&gt;
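
&lt;p&gt;A cross-walk of this kind might look like the sketch below, which maps the entries of an MCP tools/list result onto skill entries in the style of an A2A agent card. Neither specification defines this mapping; the function names, the mcp-tool tag, and the decision to drop the input schema are assumptions made for the example.&lt;/p&gt;

```python
def tool_to_skill(tool):
    """Map one MCP tool description onto an A2A-style skill entry."""
    return {
        "id": tool["name"],
        "name": tool["name"].replace("_", " ").title(),
        "description": tool.get("description", ""),
        # Hypothetical tag marking where the skill came from.
        "tags": ["mcp-tool"],
    }


def tools_to_card_skills(tools_list_result):
    """Convert a tools/list style result into a list of skill entries."""
    return [tool_to_skill(tool) for tool in tools_list_result.get("tools", [])]
```

&lt;p&gt;The tool’s JSON input schema has no direct counterpart in this mapping and is dropped. A gap like that is exactly what a shared schema convention would need to settle.&lt;/p&gt;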

&lt;p&gt;&lt;strong&gt;Database-resident memory is becoming the default.&lt;/strong&gt; In this article’s three patterns, the key constant is the memory layer, not the protocol. We see vector-typed columns, consistent transactions between agent writes and reads, and audit trails that endure even after framework updates. This marks a significant shift from the architecture of 2023 and early 2024. Back then, vector stores operated as separate sidecars, while agent memory was just a Python dictionary. Oracle AI Database showcases this trend. The larger pattern shows that durable agent state should exist in the same governed system that manages your data, not in a separate stack needing constant syncing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The tool/agent boundary is dissolving, and the taxonomy in this article will eventually need to be rewritten.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An MCP server can wrap an LLM-powered backend, in which case calling it is functionally an agent invocation. An A2A agent can expose itself as an MCP tool, in which case it is being addressed as a capability rather than a peer. Both moves are legitimate, both are happening in production today, and the protocols themselves do not yet have an opinion on which framing is correct.&lt;/p&gt;

&lt;p&gt;This is the open question. The matrix in this article tells you what each protocol is good at &lt;em&gt;today&lt;/em&gt;, and the three patterns work today. But the line between &lt;em&gt;here is a tool, call it&lt;/em&gt; and &lt;em&gt;here is a peer, coordinate with it&lt;/em&gt; is genuinely blurring, and I do not think the industry has agreed yet on where it settles. The people I trust most on this question are the ones building both patterns in production and treating the distinction as an engineering choice rather than a protocol mandate. That is the right posture for the next eighteen months. The taxonomy will catch up to the practice, or it will not, and the architectural decisions you make this quarter should be robust to either outcome.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Should I pick MCP or A2A for my first agent project?&lt;/strong&gt; Almost certainly MCP. A2A addresses peer coordination, which many initial projects lack. Start with one model and a set of tools. Introduce A2A when you have a second agent that needs to work with the first on a task that lasts beyond a single request. Using A2A too early creates extra coordination without a clear need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need both MCP and A2A in the same system?&lt;/strong&gt; Yes, often. The typical production shape is A2A between agents and MCP from each agent to its tools. This is because the two protocols operate at different layers and handle different tasks. A system requires both when it has real peer coordination and actual tool access. If you don’t have one of these, you don’t need the related protocol yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I migrate from REST to MCP without rewriting my services?&lt;/strong&gt; Yes, that’s usually the cleanest adoption path. An MCP server acts as a thin wrapper over existing REST endpoints. It adds a typed tool schema and a discovery layer without altering your service code. The migration cost lies in the wrapper, not in the services. The services continue to serve their non-agent clients as before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Oracle AI Database require Oracle-specific tooling for any of these patterns?&lt;/strong&gt; No. All three patterns in this article use standard open-source Python libraries (oracledb, the mcp SDK, pika for RabbitMQ, FastAPI). Oracle AI Database participates through a connection string and a vector-typed column, not through a framework lock-in. Teams already running Oracle gain the option of collapsing the queue and the memory layer into a single component using Oracle Advanced Queuing, but it is an option, not a requirement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the cheapest way to try this end-to-end?&lt;/strong&gt; The companion repository ships a docker-compose.yml that stands up Oracle AI Database Free, RabbitMQ, and Ollama for local model inference. No paid API keys, no cloud accounts, no proprietary SDKs. The entire three-pattern demo runs on a developer laptop with roughly 16GB of RAM. The Oracle AI Database Free edition supports up to 2 CPUs for foreground processes, 2GB of RAM (SGA and PGA combined), and 12GB of user data on disk (irrespective of compression factor).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When is “just use a queue” the right answer?&lt;/strong&gt; When the work is asynchronous, idempotent, and throughput-sensitive, and the LLM acts as a worker rather than an orchestrator. This applies to most back-office tasks, such as automation, batch embedding, document processing, and scheduled reporting. The key test is whether a human needs the result in real time. If not, a queue is usually the best choice. MCP or REST should only be used at the edges, where the system interacts with a human or a synchronous external service.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>agents</category>
      <category>a2a</category>
    </item>
    <item>
      <title>An Agent Skill that uses Kafka Java APIs for Oracle AI Database</title>
      <dc:creator>Anya Summers</dc:creator>
      <pubDate>Mon, 29 Jun 2026 15:36:49 +0000</pubDate>
      <link>https://dev.to/oracledevs/an-agent-skill-that-uses-kafka-java-apis-for-oracle-ai-database-47j3</link>
      <guid>https://dev.to/oracledevs/an-agent-skill-that-uses-kafka-java-apis-for-oracle-ai-database-47j3</guid>
      <description>&lt;p&gt;&lt;strong&gt;Use skills to build OKafka apps with Oracle AI Database&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;OKafka is a Kafka Java API for Oracle AI Database Transactional Event Queues. OKafka implements standard Kafka Java interfaces to create topics, produce, and consume messages directly in the database.&lt;/li&gt;
&lt;li&gt;This agent skill helps you write Kafka Java code for Oracle AI Database Transactional Event Queues using the &lt;a href="https://github.com/oracle/okafka" rel="noopener noreferrer"&gt;OKafka library&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The skill encodes Oracle-specific additions to the Kafka Java API: authentication, transactions, serialization, and database-specific topic behavior.&lt;/li&gt;
&lt;li&gt;Good agent skills raise the team baseline: better first-pass code, fewer manual corrections, and improved integrations with Oracle AI Database.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fh936arajxhmzntp7mh6j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fh936arajxhmzntp7mh6j.png" alt="Diagram showing how hand-written examples feed an agent skill containing OKafka administration, transaction handling, database connections, and Testcontainers patterns. The skill generates an OKafka application and tests. Review effort shifts from setup corrections to validating transaction behavior, commit paths, rollback handling, and runnable proofs." width="800" height="785"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Skill-driven generation of OKafka applications with validated transaction patterns.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In my own work, I found most coding agents weren’t generating high-quality code for &lt;a href="https://andersswanson.dev/2025/07/09/authenticate-to-your-oracle-database-like-its-a-kafka-cluster/" rel="noopener noreferrer"&gt;Oracle AI Database’s Kafka Java API&lt;/a&gt; (OKafka). You can get results, but they’re not idiomatic, and they miss subtleties. This is why I created the &lt;a href="https://github.com/anders-swanson/oracle-database-code-samples/tree/main/skills/okafka-java-code" rel="noopener noreferrer"&gt;okafka-java-code oracle agent skill&lt;/a&gt;, based on my hand-written &lt;a href="https://anders-swanson.github.io/oracle-database-code-samples/features/kafka/" rel="noopener noreferrer"&gt;Kafka Java API examples&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Agent skills can greatly enhance code generation for Oracle AI Database apps, and this skill encapsulates solutions to the problems I kept hand-coding: how to authenticate with OKafka, how to use transactions, how to create topics, and how to use Oracle-specific serialization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To install the skill, point your agents at this GitHub link:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/anders-swanson/oracle-database-code-samples/tree/main/skills/okafka-java-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What’s in &lt;a href="https://github.com/anders-swanson/oracle-database-code-samples/tree/main/skills/okafka-java-code" rel="noopener noreferrer"&gt;the skill&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;This is a standard agent skill, with markdown references to code snippets &lt;a href="https://anders-swanson.github.io/oracle-database-code-samples/features/kafka/" rel="noopener noreferrer"&gt;implemented by my samples&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;skills/okafka-java-code
├── agent-skill-okafka-java-api.md
├── agents
│   └── openai.yaml
├── references
│   ├── authentication-and-properties.md
│   ├── dependencies.md
│   ├── oson-serialization.md
│   ├── producer-consumer.md
│   ├── testing-and-troubleshooting.md
│   ├── topics-and-admin.md
│   └── transactions.md
└── SKILL.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each reference markdown file covers specific areas of OKafka Java code: initializing OKafka classes, serialization, authentication, testing, and transactional workloads.&lt;/p&gt;




&lt;h2&gt;
  
  
  Let’s try using the skill to generate an app
&lt;/h2&gt;

&lt;p&gt;Start by installing the &lt;a href="https://github.com/anders-swanson/oracle-database-code-samples/blob/main/skills/okafka-java-code/SKILL.md" rel="noopener noreferrer"&gt;OKafka Java Code skill&lt;/a&gt; and see what you can generate.&lt;/p&gt;

&lt;p&gt;I used the Oracle agent skill to &lt;a href="https://github.com/anders-swanson/generated-okafka-app" rel="noopener noreferrer"&gt;generate an app&lt;/a&gt; with a transactional producer and consumer, and a Testcontainers test. The app was generated in one shot with Codex and GPT 5.5-high and is &lt;strong&gt;almost identical to code I’d write myself&lt;/strong&gt;. Transactional workflows are handled by calling getDBConnection on the producer and consumer, so messages are produced and consumed in the &lt;strong&gt;same database transaction&lt;/strong&gt; as inserts and updates.&lt;/p&gt;

&lt;p&gt;The generated app creates a transactional event flow around Oracle AI Database Transactional Event Queues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/anders-swanson/generated-okafka-app/blob/main/src/main/java/com/example/okafka/transactions/TopicAdmin.java" rel="noopener noreferrer"&gt;TopicAdmin&lt;/a&gt; creates the topic through Kafka Admin with OKafka’s AdminClient.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/anders-swanson/generated-okafka-app/blob/main/src/main/java/com/example/okafka/transactions/OkafkaProperties.java" rel="noopener noreferrer"&gt;OkafkaProperties&lt;/a&gt; builds base properties and adds producer or consumer settings in separate methods.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/anders-swanson/generated-okafka-app/blob/main/src/main/java/com/example/okafka/transactions/TransactionalEventProducer.java" rel="noopener noreferrer"&gt;TransactionalEventProducer&lt;/a&gt; sends a record and writes to produced_events through producer.getDBConnection() before commitTransaction().&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/anders-swanson/generated-okafka-app/blob/main/src/main/java/com/example/okafka/transactions/TransactionalEventConsumer.java" rel="noopener noreferrer"&gt;TransactionalEventConsumer&lt;/a&gt; writes consumed records through consumer.getDBConnection() and calls commitSync() only after the database work succeeds.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/anders-swanson/generated-okafka-app/blob/main/src/test/java/com/example/okafka/transactions/TransactionalEventsIT.java" rel="noopener noreferrer"&gt;TransactionalEventsIT&lt;/a&gt; starts an Oracle AI Database Free container with Testcontainers, applies the OKafka grants, creates a topic, and verifies producer commit, producer abort, and consumer rollback behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This producer method is the kind of output I wanted to nudge agents toward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;private void publish(BusinessEvent event, boolean failAfterDatabaseWrite) throws Exception {
    producer.beginTransaction();
    try {
        producer.send(new ProducerRecord&amp;lt;&amp;gt;(topic, event.id(), event.payload())).get();
        insertProducedEvent(producer.getDBConnection(), event);
        if (failAfterDatabaseWrite) {
            throw new IllegalStateException("Simulated failure before producer commit");
        }
        producer.commitTransaction();
    } catch (InterruptedException exception) {
        Thread.currentThread().interrupt();
        abortAndRethrow(exception);
    } catch (Exception exception) {
        abortAndRethrow(exception);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see the transaction boundary, the Kafka send, the database write, and the abort path in one place.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Femv50qh579kwtkr889y8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Femv50qh579kwtkr889y8.png" alt="Diagram showing a transactional OKafka workflow. A producer begins a transaction, sends a Kafka record, inserts database rows, and either commits or aborts. A consumer processes records, applies side effects, commits offsets, and rolls back on failure. Kafka records and SQL state share the same Oracle Database transaction boundary." width="800" height="1244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Transactional OKafka pattern coordinating Kafka messages and database changes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The consumer side follows the same idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;private void persistAndCommit(ConsumerRecords&amp;lt;String, String&amp;gt; records, boolean failAfterDatabaseWrite)
        throws Exception {
    Connection connection = consumer.getDBConnection();
    try {
        for (ConsumerRecord&amp;lt;String, String&amp;gt; record : records) {
            insertConsumedEvent(connection, record);
        }
        if (failAfterDatabaseWrite) {
            throw new IllegalStateException("Simulated failure before consumer commit");
        }
        consumer.commitSync();
    } catch (Exception exception) {
        connection.rollback();
        throw exception;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated code preserves the important bits: database work happens on the consumer’s OKafka connection, and the offset is committed only after that work succeeds.&lt;/p&gt;




&lt;h2&gt;
  
  
  Testing is part of the skill
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Flg9y0bt74nd4psu143wk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Flg9y0bt74nd4psu143wk.png" alt="Diagram titled “The test is the claim.” A runnable OKafka demo uses Testcontainers to provision Oracle AI Database Free, bootstraps grants and configuration, and creates a Kafka topic. Three test outcomes are validated: successful commit with visible rows and records, producer abort with no persisted data, and consumer rollback where messages remain available for retry." width="800" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Runnable OKafka test topology validating commit, abort, and rollback behavior.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This skill includes guidance to validate with an integration test or smoke path that creates the topic, produces records, consumes records, and queries the TxEventQ backing table or related database side effect.&lt;/p&gt;

&lt;p&gt;The generated app follows that direction. Its &lt;a href="https://github.com/anders-swanson/generated-okafka-app/blob/main/src/test/java/com/example/okafka/transactions/TransactionalEventsIT.java" rel="noopener noreferrer"&gt;integration test&lt;/a&gt; starts gvenzl/oracle-free:23.26.2-slim-faststart, writes an ojdbc.properties file for local PLAINTEXT OKafka access, and then checks three paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a committed producer transaction creates the database row and can be consumed;&lt;/li&gt;
&lt;li&gt;an aborted producer transaction leaves no produced row and no consumable record;&lt;/li&gt;
&lt;li&gt;a failed consumer batch rolls back the database write and leaves the record available for a later successful consume.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can run the generated app tests with &lt;strong&gt;mvn verify&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The test includes grants and setup for the local container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;alter session set container=freepdb1;

grant aq_user_role to TESTUSER;
grant execute on dbms_aq to TESTUSER;
grant execute on dbms_aqadm to TESTUSER;
grant select on gv_$session to TESTUSER;
grant select on v_$session to TESTUSER;
grant select on gv_$instance to TESTUSER;
grant select on gv_$listener_network to TESTUSER;
grant select on SYS.DBA_RSRC_PLAN_DIRECTIVES to TESTUSER;
grant select on gv_$pdbs to TESTUSER;
grant select on user_queue_partition_assignment_table to TESTUSER;
exec dbms_aqadm.GRANT_PRIV_FOR_RM_PLAN('TESTUSER');
commit;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is loaded and run on the local container at test startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Container
private static final OracleContainer ORACLE = new OracleContainer(ORACLE_IMAGE)
        .withStartupTimeout(Duration.ofMinutes(4))
        .withUsername(TEST_USER)
        .withPassword(TEST_PASSWORD);

private static OracleDataSource dataSource;
private static Path okafkaConfigDirectory;

@BeforeAll
static void configureDatabase() throws Exception {
    ORACLE.copyFileToContainer(MountableFile.forClasspathResource("okafka.sql"), "/tmp/okafka.sql");
    org.testcontainers.containers.Container.ExecResult result =
            ORACLE.execInContainer("sqlplus", "sys / as sysdba", "@/tmp/okafka.sql");
    if (result.getExitCode() != 0) {
        throw new IllegalStateException("Unable to apply OKafka grants: " + result.getStderr());
    }

    dataSource = new OracleDataSource();
    dataSource.setURL(ORACLE.getJdbcUrl());
    dataSource.setUser(TEST_USER);
    dataSource.setPassword(TEST_PASSWORD);

    okafkaConfigDirectory = Files.createTempDirectory("okafka-tns-admin-");
    Files.writeString(okafkaConfigDirectory.resolve("ojdbc.properties"), """
            user = testuser
            password = Welcome123#
            """);

    try (Connection connection = dataSource.getConnection()) {
        EventSchema.createTables(connection);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F85v4v97duua5vjfeinru.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F85v4v97duua5vjfeinru.png" alt="Diagram titled “Package the corrections.” Workflow rules and review guidance are packaged into an OKafka Java coding skill covering topics, transactions, and testing. The skill generates reusable artifacts such as topic administration, configuration properties, producer/consumer code, and integration tests. The goal is to turn recurring review comments into reusable implementation guidance." width="800" height="1232"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Agent skill design for reusable OKafka coding patterns and validation workflows.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The real leverage here is developing and sharing agent skills that capture the Oracle AI Database patterns your team needs. Do you have common database workflows? Common development patterns? Encapsulate them in a skill, iterate on it, and share it.&lt;/p&gt;

&lt;p&gt;Once details are packaged, agents can operate at a higher level. You spend less time correcting boilerplate and more time designing stronger examples, testing real behavior, and building more powerful Oracle AI Database applications from a better starting point.&lt;/p&gt;




&lt;h2&gt;
  
  
  To summarize
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Any Java developer working with Oracle AI Database can use this skill to write pub/sub code with Kafka APIs that target the database.&lt;/li&gt;
&lt;li&gt;OKafka adds database connection APIs to standard Kafka Java APIs; otherwise, the same interfaces are used.&lt;/li&gt;
&lt;li&gt;The getDBConnection() method in OKafka KafkaProducer and KafkaConsumer classes allows developers to add database logic to produce and consume operations in a single transaction.&lt;/li&gt;
&lt;li&gt;To validate generated code yourself, refer to &lt;a href="https://andersswanson.dev/2025/07/09/authenticate-to-your-oracle-database-like-its-a-kafka-cluster/" rel="noopener noreferrer"&gt;concrete OKafka code examples&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The skill leverages hand-written, tested OKafka code to generate new code specific to your application. You can find &lt;a href="https://anders-swanson.github.io/oracle-database-code-samples/patterns/event-streaming/" rel="noopener noreferrer"&gt;additional samples here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://andersswanson.dev/2026/04/21/using-agent-skills-to-develop-with-oracle-ai-database/" rel="noopener noreferrer"&gt;Using Agent Skills to develop with Oracle AI Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://andersswanson.dev/2025/07/09/authenticate-to-your-oracle-database-like-its-a-kafka-cluster/" rel="noopener noreferrer"&gt;Authenticate to your Oracle AI Database like it’s a Kafka cluster&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://andersswanson.dev/2025/09/18/pub-sub-in-your-db-oracle-database-txeventq/" rel="noopener noreferrer"&gt;Pub/Sub in your DB? Oracle AI Database TxEventQ&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://andersswanson.dev/2025/05/28/migrate-apache-kafka-applications-to-oracle-database/" rel="noopener noreferrer"&gt;Migrate Apache Kafka applications to Oracle AI Database: Part I&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://andersswanson.dev/2026/01/14/propagating-cross-database-events-with-oracle-ai-database/" rel="noopener noreferrer"&gt;Propagating Cross-Database Events with Oracle AI Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://andersswanson.dev/2025/05/29/easily-test-oracle-database-applications-with-testcontainers/" rel="noopener noreferrer"&gt;Easily test Oracle AI Database applications with Testcontainers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://andersswanson.dev/2025/05/22/oracle-database-for-free/" rel="noopener noreferrer"&gt;Oracle AI Database for Free?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agentskills</category>
      <category>kafka</category>
      <category>javaapi</category>
      <category>oracle</category>
    </item>
    <item>
      <title>Single OpenAI-compatible endpoint for OCI Generative AI models with LiteLLM</title>
      <dc:creator>Anya Summers</dc:creator>
      <pubDate>Mon, 29 Jun 2026 14:30:28 +0000</pubDate>
      <link>https://dev.to/oracledevs/single-openai-compatible-endpoint-for-oci-generative-ai-models-with-litellm-43e3</link>
      <guid>https://dev.to/oracledevs/single-openai-compatible-endpoint-for-oci-generative-ai-models-with-litellm-43e3</guid>
      <description>&lt;p&gt;This post stands up a LiteLLM gateway on an OCI Compute instance that authenticates to OCI Generative AI using an &lt;strong&gt;instance principal&lt;/strong&gt;, the identity OCI already gives every VM, so there are no signing keys to generate, mount, or rotate. Supported OCI Generative AI models such as Grok, Gemini, Llama, and Cohere can be reached through the gateway, subject to region and model availability. And because routing is pure passthrough, newly supported on-demand models can be discovered without maintaining a hardcoded model list.&lt;/p&gt;

&lt;p&gt;If you saw the &lt;a href="https://blogs.oracle.com/ai-and-datascience/litellm-natively-supports-generative-ai" rel="noopener noreferrer"&gt;announcement&lt;/a&gt; that LiteLLM now natively supports Oracle Generative AI, this is the hands-on companion: the exact resources, the IAM that makes instance principal work, and the networking detail that ties it together — start to finish.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this shape
&lt;/h2&gt;

&lt;p&gt;LiteLLM gives you a single OpenAI-compatible surface (/v1/chat/completions, /v1/embeddings, /v1/models) in front of Grok, Llama, Gemini, Cohere Command/Embed, and OpenAI gpt-oss — all hosted on OCI Generative AI, with OCI Signature v1 signing handled inside LiteLLM. Running it inside your tenancy on a Compute instance buys a simpler OCI credential story: the instance authenticates as itself, governed by an IAM policy, and you never handle a private key.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F061gekqqcjp6dq1exacd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F061gekqqcjp6dq1exacd.png" alt="Architecture: a client calls the gateway over HTTPS on port 443; inside the OCI tenancy a Compute VM runs Caddy (automatic TLS) reverse-proxying to a podman container with the LiteLLM SDK + FastAPI shim and the instance-principal signer, which pulls a token from the instance metadata service and makes Signature-v1-signed calls out through the Internet Gateway to OCI Generative AI; an IAM policy authorizes generative-ai-family in the compartment." width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;LiteLLM with OCI GenAI Architecture&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The gateway runs inside your tenancy. The client hits Caddy on :443 (NSG-gated, SSH scoped to your IP); Caddy terminates TLS and reverse-proxies to the shim on localhost:4000. The shim signs each call with the VM’s own instance-principal identity — token fetched from 169.254.169.254 — and reaches OCI Generative AI via the Internet Gateway. One IAM policy authorizes it all; no OCI API signing keys on disk. The federated token is short-lived, so the shim re-federates automatically on an OCI 401 INVALID_AUTHENTICATION_INFO — token expiry self-heals rather than surfacing as an error.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The one caveat worth reading first
&lt;/h2&gt;

&lt;p&gt;LiteLLM exposes OCI two ways, and they are not interchangeable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The LiteLLM Proxy&lt;/strong&gt; (litellm --config config.yaml) supports OCI via &lt;strong&gt;manual API-key&lt;/strong&gt; credentials only — oci_user, oci_fingerprint, oci_tenancy, oci_key/oci_key_file, oci_compartment_id. There is no way to hand the proxy an instance-principal signer object through YAML.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The LiteLLM SDK&lt;/strong&gt; (litellm.completion(...)) accepts an oci_signer= object, which is the door to instance principal, resource principal, and OKE workload identity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if you want instance principal without OCI API signing keys, you call the &lt;strong&gt;SDK&lt;/strong&gt; and put a thin OpenAI-compatible HTTP layer in front of it. That’s the path of this implementation. You trade away the proxy’s management UI (virtual keys, budgets, logs); you avoid storing OCI API signing credentials.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1 — IAM: one policy, no keys, no users
&lt;/h2&gt;

&lt;p&gt;An instance principal has no user or API key behind it; the only thing between your VM and OCI Generative AI is an IAM policy. Broad version, which applies to every principal in the tenancy and is best kept to sandboxes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;allow any-user to manage generative-ai-family in compartment &amp;lt;YourCompartment&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Least-privilege version, scoped to the instance via a dynamic group:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Dynamic group (Identity &amp;amp; Security &amp;gt; Domains &amp;gt; Dynamic groups)
ALL {instance.compartment.id = '&amp;lt;compartment-ocid&amp;gt;'}
# Policy
allow dynamic-group &amp;lt;litellm-dg&amp;gt; to use generative-ai-family in compartment &amp;lt;YourCompartment&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ‘use’ verb is enough for inference. That’s the entire identity story: no OCI API signing key stored on the instance.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2 — Networking
&lt;/h2&gt;

&lt;p&gt;A dedicated VCN keeps the gateway self-contained and trivially removable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;COMP=&amp;lt;compartment-ocid&amp;gt;; REGION=us-ashburn-1
VCN=$(oci network vcn create -c $COMP --region $REGION --cidr-blocks '["10.20.0.0/16"]' \
  --display-name litellm-vcn --dns-label litellmvcn --query data.id --raw-output --wait-for-state AVAILABLE)
IGW=$(oci network internet-gateway create -c $COMP --region $REGION --vcn-id $VCN --is-enabled true \
  --display-name litellm-igw --query data.id --raw-output --wait-for-state AVAILABLE)
RT=$(oci network route-table create -c $COMP --region $REGION --vcn-id $VCN --display-name litellm-rt \
  --route-rules '[{"destination":"0.0.0.0/0","destinationType":"CIDR_BLOCK","networkEntityId":"'$IGW'"}]' \
  --query data.id --raw-output --wait-for-state AVAILABLE)
SUBNET=$(oci network subnet create -c $COMP --region $REGION --vcn-id $VCN --cidr-block 10.20.1.0/24 \
  --display-name litellm-subnet --dns-label litellmsub --route-table-id $RT \
  --prohibit-public-ip-on-vnic false --query data.id --raw-output --wait-for-state AVAILABLE)
NSG=$(oci network nsg create -c $COMP --region $REGION --vcn-id $VCN --display-name litellm-nsg \
  --query data.id --raw-output --wait-for-state AVAILABLE)
oci network nsg rules add --nsg-id $NSG --region $REGION --security-rules \
  '[{"direction":"INGRESS","protocol":"6","source":"0.0.0.0/0","sourceType":"CIDR_BLOCK","isStateless":false,"tcpOptions":{"destinationPortRange":{"min":4000,"max":4000}}}]'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep SSH (22) on the VCN default security list but scope it to your own IP. Port 4000 lives on the NSG, so you open it as wide as your clients need.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3 — A baked image, not install-on-boot
&lt;/h2&gt;

&lt;p&gt;The LiteLLM image runs from a uv-managed venv at /app/.venv that ships &lt;strong&gt;without pip&lt;/strong&gt;, so python3 -m pip install oci fails with “No module named pip”. Bootstrap it once at build time and bake the result into an image, so every container start is fast and offline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM ghcr.io/berriai/litellm:main-stable
RUN /app/.venv/bin/python3 -m ensurepip --upgrade \
 &amp;amp;&amp;amp; /app/.venv/bin/python3 -m pip install --no-cache-dir oci fastapi uvicorn
COPY server.py /app/server.py
ENV PORT=4000
ENTRYPOINT ["/app/.venv/bin/python3", "/app/server.py"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;podman build -t oci-litellm-gateway:latest -f Containerfile .
podman run -d --name litellm --restart=always --network=host \
  -e OCI_REGION=us-ashburn-1 -e OCI_COMPARTMENT_ID=&amp;lt;compartment-ocid&amp;gt; \
  -e LITELLM_MASTER_KEY=&amp;lt;your-bearer-key&amp;gt; -e PORT=4000 \
  oci-litellm-gateway:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two details that can save you a lot of debugging time:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;--network=host is mandatory.&lt;/strong&gt; Instance principal fetches its leaf certificate and token from the metadata service at the link-local address 169.254.169.254. A container on the default bridge network can’t route to that address; host networking fixes it (and binds :4000 on the host directly).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use the venv’s python3.&lt;/strong&gt; litellm lives in /app/.venv; a system python won’t see it. The ENTRYPOINT above pins it.&lt;/li&gt;
&lt;/ol&gt;
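
&lt;p&gt;To check the first point before debugging the signer itself, a small reachability probe can be run inside the container. This is a minimal sketch using only the standard library; the function name imds_reachable is hypothetical, and it assumes the IMDSv2 instance endpoint with its fixed "Bearer Oracle" header.&lt;/p&gt;

```python
import urllib.error
import urllib.request

# OCI instance metadata service (IMDSv2); v2 requests carry this fixed header.
IMDS_URL = "http://169.254.169.254/opc/v2/instance/"
IMDS_HEADERS = {"Authorization": "Bearer Oracle"}


def imds_reachable(url=IMDS_URL, timeout_s=2.0):
    """Return True if the metadata service answers with HTTP 200, else False."""
    request = urllib.request.Request(url, headers=IMDS_HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Unroutable address, refused connection, timeout, or HTTP error status.
        return False


if __name__ == "__main__":
    print("metadata service reachable:", imds_reachable())
```

&lt;p&gt;If this prints False inside the container but True on the host, the container network mode is the likely cause rather than IAM.&lt;/p&gt;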




&lt;h2&gt;
  
  
  Step 4 — The shim: LiteLLM SDK behind an OpenAI-compatible API
&lt;/h2&gt;

&lt;p&gt;This is the whole gateway. It builds the instance-principal signer once, exposes the OpenAI routes, forwards every model name straight through as oci/&amp;lt;model&amp;gt;, and discovers /v1/models live from OCI so there is no list to maintain. Those routes are the &lt;strong&gt;Chat Completions–era&lt;/strong&gt; API (/v1/chat/completions, /v1/embeddings, /v1/models), deliberately not OpenAI’s newer &lt;strong&gt;Responses API&lt;/strong&gt; (/v1/responses); LiteLLM’s completion() and embedding() map to Chat Completions and Embeddings, which is still what most mainstream chat clients speak. Configuration is entirely environment-driven.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os, json, datetime, litellm
from oci.auth.signers import InstancePrincipalsSecurityTokenSigner
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import StreamingResponse

REGION = os.environ.get("OCI_REGION", "us-ashburn-1")
COMP   = os.environ.get("OCI_COMPARTMENT_ID", "")
MKEY   = os.environ.get("LITELLM_MASTER_KEY", "")
PORT   = int(os.environ.get("PORT", "4000"))

SIGNER = InstancePrincipalsSecurityTokenSigner()
OCI = dict(oci_signer=SIGNER, oci_region=REGION, oci_compartment_id=COMP)
app = FastAPI()

def resolve(name):                       # pure passthrough: any model -&amp;gt; oci/&amp;lt;model&amp;gt;
    if not name: raise HTTPException(400, "missing 'model'")
    return name if name.startswith("oci/") else f"oci/{name}"

def auth(r):
    if MKEY and r.headers.get("authorization", "") != f"Bearer {MKEY}":
        raise HTTPException(401, "unauthorized")

def discover_models():                   # live, best-effort; never fatal
    try:
        import oci
        c = oci.generative_ai.GenerativeAiClient(config={}, signer=SIGNER)
        now = datetime.datetime.now(datetime.timezone.utc); out = []
        for m in c.list_models(compartment_id=COMP).data.items:
            caps = set(m.capabilities or [])
            if m.lifecycle_state != "ACTIVE" or not ({"CHAT","TEXT_EMBEDDINGS"} &amp;amp; caps): continue
            r = m.time_on_demand_retired
            if r is not None and r.year &amp;gt; 1971 and r &amp;lt;= now: continue
            out.append(m.display_name)
        return sorted(set(out))
    except Exception:
        return []

@app.get("/health/readiness")
def ready(): return {"status": "connected"}

@app.get("/v1/models")
def models():
    return {"object": "list", "data": [{"id": m, "object": "model", "owned_by": "oci"} for m in discover_models()]}

@app.post("/v1/chat/completions")
async def chat(req: Request):
    auth(req); b = await req.json()
    common = dict(model=resolve(b.get("model")), messages=b["messages"], **OCI)
    if b.get("stream"):
        def gen():
            for c in litellm.completion(stream=True, **common):
                yield f"data: {json.dumps(c.model_dump())}\n\n"
            yield "data: [DONE]\n\n"
        return StreamingResponse(gen(), media_type="text/event-stream")
    return litellm.completion(**common).model_dump()

@app.post("/v1/embeddings")
async def embeddings(req: Request):
    auth(req); b = await req.json()
    inp = b["input"]; inp = [inp] if isinstance(inp, str) else inp
    return litellm.embedding(model=resolve(b.get("model")), input=inp, **OCI).model_dump()

if __name__ == "__main__":
    import uvicorn; uvicorn.run(app, host="0.0.0.0", port=PORT)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because routing is passthrough, supported on-demand OCI Generative AI models can be reached through the gateway, subject to region, tenancy access, model availability, and LiteLLM compatibility. The live /v1/models discovery means you do not need to maintain a hardcoded model list, and supported new models can become available through the endpoint as OCI exposes them.&lt;/p&gt;
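
&lt;p&gt;The architecture caption mentions that the shim re-federates when OCI returns a 401 INVALID_AUTHENTICATION_INFO, a step the listing above does not show. One way that retry could look is sketched below. The helper names are hypothetical, and the string match on the error text is an assumption about how the failure surfaces, not documented LiteLLM behavior.&lt;/p&gt;

```python
def call_with_refederation(call, refresh, is_auth_error, max_attempts=2):
    """Run call(); on an auth failure, refresh credentials and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt, or on errors unrelated to auth.
            if attempt == max_attempts or not is_auth_error(exc):
                raise
            refresh()


def looks_like_expired_token(exc):
    """Heuristic (assumption): an expired token shows up as a 401 in the error text."""
    text = str(exc)
    return "401" in text or "INVALID_AUTHENTICATION_INFO" in text


# Possible wiring inside the shim (names from the listing above; the refresh
# method on the signer is assumed to be refresh_security_token):
#   call_with_refederation(
#       lambda: litellm.completion(**common),
#       SIGNER.refresh_security_token,
#       looks_like_expired_token,
#   )
```

&lt;p&gt;Capping the attempts at two keeps a genuinely misconfigured policy from turning into a retry loop: one refresh, one more try, then the error surfaces.&lt;/p&gt;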




&lt;h2&gt;
  
  
  Step 5 — Test it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IP=&amp;lt;public-ip&amp;gt;; KEY=&amp;lt;your-bearer-key&amp;gt;
curl -s http://$IP:4000/health/readiness
curl -s http://$IP:4000/v1/chat/completions \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{"model":"xai.grok-4.3","messages":[{"role":"user","content":"In one sentence, what is Oracle Generative AI?"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s a drop-in OpenAI base URL, so the OpenAI SDK works unchanged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from openai import OpenAI
c = OpenAI(base_url="http://&amp;lt;public-ip&amp;gt;:4000/v1", api_key="&amp;lt;your-bearer-key&amp;gt;")
print(c.chat.completions.create(model="xai.grok-4.3",
    messages=[{"role": "user", "content": "Hello from OCI"}]).choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 6 — Harden
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scope SSH to your IP&lt;/strong&gt; in the VCN default security list; leave port 4000 (on the NSG) as open as you need.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Front it with a name and TLS (Caddy).&lt;/strong&gt; Point a DNS-only A record at the instance, open 80 + 443 in the NSG, then run Caddy alongside the gateway with a minimal Caddyfile:

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chat.example.com {
  reverse_proxy localhost:4000
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;podman run -d --name caddy --restart=always --network=host \
  -v /opt/caddy/Caddyfile:/etc/caddy/Caddyfile:Z \
  -v caddy_data:/data -v caddy_config:/config \
  docker.io/library/caddy:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

Caddy obtains and renews a Let’s Encrypt certificate automatically (TLS-ALPN-01 on 443, HTTP-01 on 80) and reverse-proxies to the shim on localhost:4000, passing the Authorization header through. Now &lt;a href="https://chat.example.com/v1" rel="noopener noreferrer"&gt;https://chat.example.com/v1&lt;/a&gt; (substitute your own domain) works with no port, which also unblocks browser-hosted chat UIs that refuse to call plain-HTTP endpoints (mixed content). Note most DNS proxies won’t forward arbitrary ports, so keep the record DNS-only (or let Caddy own 443).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let browser UIs in (CORS).&lt;/strong&gt; Server-side clients (curl, the SDK, Open WebUI, LobeChat on Vercel) work as-is, but browser apps that call the endpoint straight from the page need CORS headers or the browser blocks the preflight. Set ENABLE_CORS=true and scope CORS_ORIGINS; the master key stays the gate, and since auth is a bearer header rather than a cookie, credentials mode stays off.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rotate the key&lt;/strong&gt; by changing LITELLM_MASTER_KEY and restarting the container.&lt;/li&gt;
&lt;li&gt;For real multi-tenant key management, budgets, and request logs, switch to the LiteLLM Proxy with a manual signing key.&lt;/li&gt;
&lt;/ul&gt;
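
&lt;p&gt;The shim listing earlier does not show how ENABLE_CORS and CORS_ORIGINS are read, so the following is only a sketch of one way to do it. It assumes CORS_ORIGINS is a comma-separated list; the helper name cors_settings is hypothetical, and the returned keys match the keyword arguments of FastAPI's CORSMiddleware.&lt;/p&gt;

```python
import os


def cors_settings(env=None):
    """Build CORSMiddleware keyword arguments from the environment, or None if disabled."""
    env = os.environ if env is None else env
    if env.get("ENABLE_CORS", "").strip().lower() != "true":
        return None
    # Assumption: origins arrive as a comma-separated list.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    return {
        "allow_origins": origins,
        "allow_methods": ["GET", "POST", "OPTIONS"],
        "allow_headers": ["Authorization", "Content-Type"],
        "allow_credentials": False,  # auth is a bearer header, not a cookie
    }


# Wiring into the FastAPI app (requires fastapi to be installed):
#   from fastapi.middleware.cors import CORSMiddleware
#   settings = cors_settings()
#   if settings is not None:
#       app.add_middleware(CORSMiddleware, **settings)
```

&lt;p&gt;Leaving CORS_ORIGINS empty while ENABLE_CORS is true yields an empty allow list, which fails closed rather than allowing every origin.&lt;/p&gt;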




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This is a single OpenAI-compatible endpoint for supported on-demand OCI Generative AI models, authenticated with the instance’s own identity, without storing an OCI API signing key. That last part is the real win. Revoking access is editing one IAM policy.&lt;/p&gt;

&lt;p&gt;And it’s small enough to trust: one VCN, one subnet, one NSG, one VM, one policy — a surface you can hand to a security reviewer on a single page or stamp out per environment from the cloud-init here.&lt;/p&gt;

&lt;p&gt;In return, any OpenAI-compatible client (desktop apps, browser UIs, or your own code) can access Grok, Gemini, Llama, and Cohere without OCI-specific SDKs or per-application OCI credentials. And because models are discovered dynamically, supported new OCI Generative AI models can become available through the endpoint as OCI exposes them.&lt;/p&gt;

&lt;p&gt;The only implementation details worth remembering are the non-obvious ones: --network=host for metadata access, bootstrapping pip into the image’s venv at build time, and the fact that instance principal authentication lives on the SDK path, not in the proxy’s YAML configuration.&lt;/p&gt;

</description>
      <category>oracle</category>
      <category>ai</category>
      <category>litellm</category>
      <category>oci</category>
    </item>
    <item>
      <title>The Agent Loop Decoded</title>
      <dc:creator>Anya Summers</dc:creator>
      <pubDate>Mon, 29 Jun 2026 14:28:14 +0000</pubDate>
      <link>https://dev.to/oracledevs/the-agent-loop-decoded-2ak2</link>
      <guid>https://dev.to/oracledevs/the-agent-loop-decoded-2ak2</guid>
      <description>&lt;p&gt;This article was originally written and published by Richmond Alake on &lt;a href="https://blogs.oracle.com/developers/the-agent-loop-decoded-three-levels-every-agent-engineer-must-know" rel="noopener noreferrer"&gt;blogs.oracle&lt;/a&gt; on 11 June.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Levels Every Agent Engineer Must Know
&lt;/h2&gt;

&lt;p&gt;Chances are you have already run an agent loop today without naming it.&lt;/p&gt;

&lt;p&gt;Every session with a coding companion such as Claude Code, Codex, or Cursor is one: the model reads a  request, inspects the repository, edits a file, runs the tests, observes the failures, and edits  again until the build passes.&lt;/p&gt;

&lt;p&gt;That cycle of reasoning, acting, and observing the result is the  agent loop at work, and it now sits at the centre of nearly every production agent system. &lt;strong&gt;The agent loop is the repeating cycle a harness runs within a single agent turn: assemble context, invoke the model to reason, act on its decision, and go again until a stop condition ends the run.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This piece unpacks that loop across three levels of understanding.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Level 1 is the minimal  loop most developers meet first: an LLM, a handful of tools, and a response.&lt;/li&gt;
&lt;li&gt;Level 2  introduces a lifecycle inside the loop, where memory operations turn a stateless process into a reasoning engine with state.&lt;/li&gt;
&lt;li&gt;Level 3 pushes operations both inside and outside the loop,  where the agent harness becomes a system in its own right.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you will know which level your system sits at, what breaks when the level and the task are mismatched, and what engineering work moves you up. Every pattern discussed is implemented in the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/agent_memory.ipynb" rel="noopener noreferrer"&gt;companion notebook&lt;/a&gt;, built on Oracle AI Database, so you can run the loop rather than just read about it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is an Agent?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fsx60rb5dufdka42vbpha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fsx60rb5dufdka42vbpha.png" alt="Diagram showing a basic AI agent architecture. The agent perceives an environment containing users, tools, and data, reasons using a large language model, and takes actions. The agent also reads from and writes to a memory system that stores state beyond the current message, enabling persistence across interactions." width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1: An agent perceives its environment, reasons with an LLM, acts, and remembers&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An agent is a computational system that perceives its environment, reasons about  what it perceives, takes actions to achieve a goal, and has some form of memory.&lt;/strong&gt; That description applies to many things: a thermostat, a chess engine, a human professional.  What makes an AI agent distinct is that the reasoning step is handled by a large language  model, and the range of possible actions extends well beyond a binary output.&lt;/p&gt;

&lt;p&gt;An agent’s architecture consists of two separable layers. The first is the model: the inference engine that does the reasoning. The second is the harness: the code that prepares context,  executes tool calls, enforces operational constraints, and persists state. Most agent  engineering work happens in the harness, not the model. Understanding that boundary  clarifies where failures originate and where interventions are effective.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fbk9dx5n77vlt8lmuaw4i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fbk9dx5n77vlt8lmuaw4i.png" alt="Diagram showing an agent architecture with two layers. A model handles reasoning and decision-making inside a larger agent harness. The harness provides context assembly, tool execution, operational constraints, and state persistence around the model. A note emphasizes that most agent engineering work occurs in the harness rather than the model itself." width="800" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2: The two layers of an agent’s architecture: the model and the harness&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An agent needs at minimum four things to be useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instructions:&lt;/strong&gt; a system prompt or goal that tells it what it is trying to accomplish.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory:&lt;/strong&gt; access to information beyond the current message, including prior context,  retrieved knowledge, and learned patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The ability to take actions:&lt;/strong&gt; tool calls, API requests, database writes, or any operation with an external effect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A reasoning engine:&lt;/strong&gt; an LLM that looks at context and decides what to do next.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Is a Loop?
&lt;/h2&gt;

&lt;p&gt;A loop is a control structure that repeats a block of execution until a condition is met. In  programming you encounter this everywhere: iterating over a collection, running until a flag  is set, calling recursively until a base case is reached.&lt;/p&gt;

&lt;p&gt;The agent loop applies that same structure to an LLM-powered system. Rather than  processing a user message once and returning a static response, the agent feeds its output  back into itself, reasoning, acting, observing the result, and reasoning again, until it  determines the task is complete.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fuhq5i1fg3fhiwrz36ty3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fuhq5i1fg3fhiwrz36ty3.png" alt="Flow diagram showing the agent loop. Context is assembled from instructions, memory, and tool outputs, then passed to a reasoning step. The agent acts by responding, calling tools, or writing state. The cycle repeats until a stop condition is met, producing a final response." width="800" height="827"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 3: The agent loop: assemble context, reason, act, and repeat until a stop condition ends the run&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Agents need loops because of the tasks they are applied to. These common use cases can be referred to as &lt;strong&gt;application modes&lt;/strong&gt;: the expected interaction patterns between a user and an agent. There are three:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Assistant&lt;/li&gt;
&lt;li&gt;Deep Research&lt;/li&gt;
&lt;li&gt;Coding&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Take the deep research mode. An agent tasked with finding relevant sources, identifying contradictions across them, and producing a structured summary is not running a single-shot task. The task requires the agent to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for relevant sources.&lt;/li&gt;
&lt;li&gt;Read and evaluate what it finds.&lt;/li&gt;
&lt;li&gt;Identify gaps and contradictions.&lt;/li&gt;
&lt;li&gt;Search again to fill in those gaps.&lt;/li&gt;
&lt;li&gt;Synthesise everything into a coherent output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdtndyipnfa3y8bb2bvsh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdtndyipnfa3y8bb2bvsh.png" alt="Diagram showing an agentic research workflow. The process repeatedly searches for sources, reads and evaluates information, identifies gaps or contradictions, and performs additional searches until coverage is sufficient. The collected information is then synthesized into a structured summary." width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 4: The deep research cycle: search, evaluate, identify gaps, and search again until coverage is sufficient&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;No single LLM call can do all of that. What is required is the mechanism and scaffolding that  allows the model to reason, act, observe the result, reason again, and continue until the task is complete. That mechanism is the agent loop.&lt;/p&gt;

&lt;p&gt;Notably, agent frameworks and harnesses, however opinionated, have converged on the same minimal agent loop design. That convergence is arguably less a design choice than a logical consequence of the task itself.&lt;/p&gt;

&lt;p&gt;The agent loop exists because long-horizon tasks cannot be  completed in a single forward pass.&lt;/p&gt;

&lt;p&gt;The loop emerging as a design pattern draws a parallel to how humans operate in most  organisations: structured cycles of work, review, and feedback that repeat until the objective  is met.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stop Conditions
&lt;/h2&gt;

&lt;p&gt;Loops have to be exited eventually. The programmatic loops taught in computer science  classes usually exit in one of two ways: the iteration count for the loop is reached, or a break statement inside the loop triggers an exit.&lt;/p&gt;

&lt;p&gt;A well-designed agent loop defines explicit exit criteria. Common examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model produces a final response with no pending tool calls.&lt;/li&gt;
&lt;li&gt;A goal-completion check returns true: an objective-specific predicate, not merely the  absence of tool calls.&lt;/li&gt;
&lt;li&gt;A maximum number of iterations is reached.&lt;/li&gt;
&lt;li&gt;A wall-clock timeout expires.&lt;/li&gt;
&lt;li&gt;An error occurs that the agent cannot recover from.&lt;/li&gt;
&lt;li&gt;The harness identifies a failure mode, such as the agent repeating the same action  without progress.&lt;/li&gt;
&lt;li&gt;The agent explicitly invokes an exit action or sets a completion flag.&lt;/li&gt;
&lt;/ul&gt;
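
&lt;p&gt;Of the criteria above, the repeated-action check is the least obvious to write down. A minimal sketch follows; the function name is_stuck and the shape of the action history (one hashable entry per tool call, such as a name and argument pair) are assumptions for illustration and are not taken from the notebook.&lt;/p&gt;

```python
def is_stuck(action_history, window=3):
    """Return True when the last `window` recorded actions are identical."""
    window = max(window, 2)  # a single action can never count as a repeat
    recent = action_history[-window:]
    if len(recent) != window:
        return False  # not enough history yet
    return all(action == recent[0] for action in recent)


if __name__ == "__main__":
    history = [("search", "agent loop"), ("search", "agent loop"), ("search", "agent loop")]
    print(is_stuck(history))
```

&lt;p&gt;A harness would call a check like this once per iteration and exit the loop, or inject a corrective message, when it returns True.&lt;/p&gt;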

&lt;p&gt;In the notebook accompanying this article, the stop conditions are implemented directly  inside the harness:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
&lt;span class="n"&gt;max_execution_time_s&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;60.0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; 
 &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
 &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; 
 &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_execution_time_s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
 &lt;span class="k"&gt;break&lt;/span&gt; &lt;span class="c1"&gt;# Wall-clock timeout 
&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_openai_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
 &lt;span class="k"&gt;break&lt;/span&gt; &lt;span class="c1"&gt;# Model produced a terminal message; exit the loop 
&lt;/span&gt; &lt;span class="c1"&gt;# Execute tools, append outputs, continue 
&lt;/span&gt; &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; 
 &lt;span class="c1"&gt;# Fallback if max iterations reached 
&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Max iterations reached; please refine the request.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The maximum number of loop iterations defaults to 10. This guards against the loop running indefinitely, which drives up operational cost as token consumption accumulates across inference calls. The max_execution_time_s parameter adds a temporal guard by capping the wall-clock time of the agent loop’s execution.&lt;/p&gt;

&lt;p&gt;It is worth noting that a terminal message from the model, one with no further tool calls, ends the agent’s turn. It does not mean the user’s goal has been satisfied. The model may return  a clarifying question, a partial result, or a response that requires follow-up. The agent  harness is responsible for checking whether the goal is actually complete, not simply  whether the model has stopped emitting tool calls. This distinction becomes more  consequential as tasks grow in length and complexity, and it is where domain expertise  becomes paramount in agent harness engineering.&lt;/p&gt;
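&lt;p&gt;One way to make this distinction concrete is to have the harness report why it stopped. The sketch below is illustrative only and is not the notebook’s code: StopReason, run_loop, step, and goal_met are hypothetical names, and step stands in for one model call plus any tool execution.&lt;/p&gt;

```python
import time
from enum import Enum


class StopReason(Enum):
    GOAL_MET = "goal_met"
    TERMINAL_MESSAGE = "terminal_message"
    MAX_ITERATIONS = "max_iterations"
    TIMEOUT = "timeout"


def run_loop(step, goal_met, max_iterations=10, max_execution_time_s=60.0):
    """Run step() until a stop condition fires; return (reason, last_output).

    step() is a hypothetical callable that performs one model call plus any
    tool execution and returns (output, has_pending_tool_calls).
    goal_met(output) is a hypothetical objective-specific predicate.
    """
    start = time.monotonic()
    output = None
    for _ in range(max_iterations):
        if time.monotonic() - start > max_execution_time_s:
            return StopReason.TIMEOUT, output
        output, pending = step()
        if goal_met(output):
            return StopReason.GOAL_MET, output
        if not pending:
            # The model stopped calling tools, but the goal check did not pass.
            return StopReason.TERMINAL_MESSAGE, output
    return StopReason.MAX_ITERATIONS, output


# Usage with a stub step that finishes on its third call.
outputs = iter([("searching", True), ("drafting", True), ("report ready", False)])
reason, result = run_loop(lambda: next(outputs), lambda out: out == "report ready")
print(reason.name, result)  # GOAL_MET report ready
```

&lt;p&gt;Returning the reason alongside the output lets the caller treat a terminal message without goal completion (for example, a clarifying question) differently from a finished task.&lt;/p&gt;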

&lt;p&gt;Failure mode identification deserves its own mention as an exit path. A loop should break  not only when work completes but when work stops progressing.&lt;/p&gt;

&lt;p&gt;The clearest example is tool call repetition: the agent invokes the same tool with identical arguments for a third consecutive iteration, a strong signal that it is stuck rather than working. A well-instrumented harness keeps a window of recent tool calls, detects the repetition, and exits with a diagnostic instead of spending the remaining iterations on a stalled run. Oscillation between two states belongs to the same family of detectable failures.&lt;/p&gt;
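&lt;p&gt;A minimal sketch of the repetition check described above, assuming tool arguments are JSON-serialisable. RepetitionDetector and its defaults are hypothetical and not taken from the notebook; oscillation between two states would need a separate check.&lt;/p&gt;

```python
import json
from collections import deque


class RepetitionDetector:
    """Flags a stalled run when the same tool call repeats consecutively."""

    def __init__(self, max_repeats=3, window=6):
        # Hypothetical defaults: three identical calls in a row counts as stuck.
        self.max_repeats = max_repeats
        self.recent = deque(maxlen=window)

    def record(self, tool_name, tool_args):
        """Store the call; return True if the run looks stuck."""
        # Serialise arguments deterministically so equal dicts compare equal.
        signature = (tool_name, json.dumps(tool_args, sort_keys=True))
        self.recent.append(signature)
        tail = list(self.recent)[-self.max_repeats:]
        return len(tail) == self.max_repeats and len(set(tail)) == 1


detector = RepetitionDetector()
calls = [("search", {"q": "agent memory"})] * 3
flags = [detector.record(name, args) for name, args in calls]
print(flags)  # [False, False, True]
```

&lt;p&gt;The harness would call record after each tool invocation and exit with a diagnostic as soon as it returns True.&lt;/p&gt;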




&lt;h2&gt;
  
  
  Defining the Agent Loop
&lt;/h2&gt;

&lt;p&gt;With the components and the exit criteria established, the definition can now be stated with precision:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Agent Loop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A cyclical, iterative execution pattern inside a single agent run where the harness  repeatedly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Assembles execution context:&lt;/strong&gt; system instructions, conversation state, retrieved  memory, tool outputs, and any relevant external data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invokes a reasoning model&lt;/strong&gt; to decide what to do next.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acts:&lt;/strong&gt; responds to the user, calls tools, writes memory or state, or updates its plan.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each cycle appends its trace (assistant messages, tool outputs, state updates) to the  context and repeats until a termination check ends the run. Context-window pressure  and operational safety (timeouts, iteration caps, budget guards) are first-class concerns, not afterthoughts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Levels of the Agent Loop
&lt;/h2&gt;

&lt;p&gt;The agent loop is not a fixed pattern. The simple design presented above evolves as  memory, tooling, and opinionated scaffolding are added. The three levels below provide a  framework for where a system currently sits and what engineering work lies ahead. Most  production failures (agents that repeat themselves, lose context, or produce inconsistent  results across sessions) trace back to a mismatch between task complexity and agent level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdej38hkepjf75hl5wh9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdej38hkepjf75hl5wh9n.png" alt="Diagram titled “Memory Maturity” showing three levels of AI memory architecture. Level 1 uses only an LLM, tools, and responses, with no persistence beyond the context window. Level 2 adds memory lifecycle management, including reading memory before actions and writing memory afterward across multiple memory types. Level 3 extends memory inside and outside the agent loop with compaction, offloading, tool discovery, idempotency, and prompt caching." width="648" height="1232"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 5: The three levels of the agent loop&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 1: LLM + Tools + Response
&lt;/h2&gt;

&lt;p&gt;At its simplest, the agent loop is an LLM that can call tools and return a response. There is  no persistent memory, no external state, and no scaffolding beyond the loop itself. The loop  iterates because tool results must be fed back to the model before it can produce a final  answer.&lt;/p&gt;

&lt;p&gt;The code below demonstrates the pattern most developers encounter when building simple  tool-calling agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; 
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
 &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;available_tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
 &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
 &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
 &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
 &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="c1"&gt;# Terminal message; exit
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fzxmwlfkbzowmqk2pe48j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fzxmwlfkbzowmqk2pe48j.png" alt="Diagram showing a Level 1 agent architecture. A user interacts with an agent loop containing a model and tools. The model issues tool calls, receives results, and repeats until the task is complete, after which a response is returned. No persistent memory is included." width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 6: Level 1: the minimal tool-calling loop&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;LangChain’s ReAct agent provides this pattern out of the box. The agent receives an input  query, selects a tool, calls it, observes the output, and reasons again, all within a single run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentExecutor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt; &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt; 
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  
&lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What are the latest AI papers on agent  memory?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Level 1 is where most developers start, and it is genuinely useful for self-contained tasks. Its  limitation is structural: the agent has no recollection of previous conversations. Every run  starts cold, the context window is the only memory it has, and it resets completely when the  run ends. On any multi-turn or long-horizon task, it will repeat work it already did, lose track  of decisions made earlier in the session, and produce output that contradicts its own prior  responses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 2: Lifecycle Inside the Loop
&lt;/h2&gt;

&lt;p&gt;At Level 2, memory operations begin to appear inside the agent loop. Memory is read before the LLM is called, and memory is written after the agent acts. The loop now has a lifecycle. At Level 1, the loop can be seen as a transport mechanism for tool calls. At Level 2, the loop becomes a reasoning engine with state. This is also where the distinction between a memory-augmented agent and a memory-aware agent becomes consequential.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory-augmented agents&lt;/strong&gt; retrieve and inject information into context. They read  from memory, but they do not actively manage it. Memory is something that happens  to them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory-aware agents&lt;/strong&gt; treat memory as a first-class engineering concern. They  encode, store, retrieve, inject, and forget, actively managing their cognitive state within  each run and across sessions. Level 2 is where you begin building memory-aware  agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This distinction, and the engineering it implies, is the subject of the DeepLearning.AI short  course &lt;a href="https://www.deeplearning.ai/courses/agent-memory-building-memory-aware-agents" rel="noopener noreferrer"&gt;Agent Memory: Building Memory-Aware Agents&lt;/a&gt;, built with Oracle, if you want the full  overview.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fgts166k1h4mduci779si.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fgts166k1h4mduci779si.png" alt="Comparison of memory-augmented and memory-aware agents. In the memory-augmented approach, memory is retrieved and injected into the agent externally. In the memory-aware approach, the agent actively retrieves, stores, updates, and forgets information, directly managing its own memory state." width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 7: Memory-augmented agents read from memory; memory-aware agents manage it&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Level 2 makes context assembly trade-offs immediately visible. Adding more memory types  (conversation history, retrieved documents, entity records, workflow patterns) improves  grounding and action selection. On the other hand, it also introduces cost: more tokens,  higher latency, and a greater risk of injecting irrelevant or stale content that misleads the  model rather than informing it.&lt;/p&gt;

&lt;p&gt;There are a few failure modes worth mentioning:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Noisy retrieval:&lt;/strong&gt; semantically similar documents that are not actually relevant to the current query. Mitigate with relevance thresholds and precision-oriented retrieval strategies such as hybrid search and pre-, post-, and in-filtering in the retrieval pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stale memory:&lt;/strong&gt; cached facts, entity records, or summaries that are no longer accurate, which happens quickly in a fast-moving problem domain. Mitigate with TTL policies and update-on-write patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool schema overload:&lt;/strong&gt; too many tool definitions passed to the model at once, a common form of context bloat in tool-calling agents that degrades tool selection accuracy. Mitigate with semantic tool retrieval rather than exhaustive enumeration; this is shown in the companion notebook for this piece.&lt;/li&gt;
&lt;/ol&gt;
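&lt;p&gt;The first two mitigations can be sketched as a single filter applied to retrieved records before they are injected into context. This is an illustrative sketch, not the notebook’s implementation: MemoryRecord, filter_memories, and the default threshold and TTL values are assumptions, and scores are assumed to be similarities in the range 0 to 1.&lt;/p&gt;

```python
import time
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    text: str
    score: float        # similarity score from retrieval, assumed in [0, 1]
    written_at: float   # Unix timestamp of the last write


def filter_memories(records, min_score=0.75, ttl_s=3600.0, now=None):
    """Keep only records that are both relevant enough and fresh enough."""
    now = time.time() if now is None else now
    kept = [
        r for r in records
        if r.score >= min_score and ttl_s >= now - r.written_at
    ]
    # Highest-scoring records first, so later truncation drops the weakest.
    return sorted(kept, key=lambda r: r.score, reverse=True)


records = [
    MemoryRecord("fresh and relevant", 0.91, written_at=1000.0),
    MemoryRecord("relevant but stale", 0.88, written_at=0.0),
    MemoryRecord("fresh but off-topic", 0.40, written_at=1000.0),
]
kept_texts = [r.text for r in filter_memories(records, ttl_s=600.0, now=1200.0)]
print(kept_texts)  # ['fresh and relevant']
```

&lt;p&gt;In practice the threshold and TTL would be tuned per memory type; entity records, for instance, may tolerate a much longer TTL than cached search results.&lt;/p&gt;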

&lt;p&gt;This list is not exhaustive, and in production these are not edge cases: they are predictable failures that any Level 2 agent will encounter as its memory stores grow. Designing mitigation strategies at the start is cheaper than retrofitting fixes later.&lt;/p&gt;

&lt;p&gt;Memory operations are common in Level 2 agent loops, mainly because agents at this level are designed for continuity and adaptation. &lt;strong&gt;Memory operations are programmatic methods designed to read or modify data and information within the agent’s system boundary and across other system components such as databases and external stores.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;When It Runs&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Read conversational memory&lt;/td&gt;
&lt;td&gt;Before LLM call&lt;/td&gt;
&lt;td&gt;Load prior chat history into  context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read knowledge base&lt;/td&gt;
&lt;td&gt;Before LLM call&lt;/td&gt;
&lt;td&gt;Inject relevant documents&amp;nbsp;and facts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read workflow memory&lt;/td&gt;
&lt;td&gt;Before LLM call&lt;/td&gt;
&lt;td&gt;Surface known action patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read entity memory&lt;/td&gt;
&lt;td&gt;Before LLM call&lt;/td&gt;
&lt;td&gt;Resolve named references in the query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write conversational memory&lt;/td&gt;
&lt;td&gt;After user message received&lt;/td&gt;
&lt;td&gt;Persist the user turn&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write knowledge base&lt;/td&gt;
&lt;td&gt;After tool search&lt;/td&gt;
&lt;td&gt;Store retrieved results for future runs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write entity memory&lt;/td&gt;
&lt;td&gt;After LLM response&lt;/td&gt;
&lt;td&gt;Extract and persist people, places, systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write conversational memory&lt;/td&gt;
&lt;td&gt;After final response&lt;/td&gt;
&lt;td&gt;Persist the assistant turn&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In the accompanying notebook, these operations are centralised in a MemoryManager class  backed by Oracle AI Database. Before each run, the harness calls all read operations to  assemble context. After each run, write operations persist the new information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# -- Reads: all run BEFORE the tool-call loop ------------------------ conv_mem = memory_manager.read_conversational_memory(thread_id) knowledge = memory_manager.read_knowledge_base(query) 
&lt;/span&gt;&lt;span class="n"&gt;workflows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;entities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_entity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;summaries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_summary_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conv_mem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;knowledge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workflows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="n"&gt;summaries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="c1"&gt;# -- Inner tool-call loop -------------------------------------------- response = run_tool_call_loop(context, tools) 
# -- Writes: all run AFTER the loop exits ---------------------------- memory_manager.write_conversational_memory(thread_id, 'assistant',  response) 
&lt;/span&gt;&lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_entity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;extract_entities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The notebook uses six distinct memory types, each stored in Oracle AI Database and each  serving a specific cognitive function:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Conversational memory:&lt;/strong&gt; episodic chat history retrieved by thread ID via a standard  SQL table. Exact lookup, no similarity search required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge base memory:&lt;/strong&gt; semantic memory backed by a vector-enabled SQL table  with HNSW indexing for similarity search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow memory:&lt;/strong&gt; procedural memory storing learned action patterns and tool  sequences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Toolbox memory:&lt;/strong&gt; a vector-indexed registry of tool definitions enabling semantic  discovery rather than exhaustive schema enumeration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entity memory:&lt;/strong&gt; LLM-extracted people, places, and systems, persisted across  sessions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summary memory:&lt;/strong&gt; compressed context for long conversations, with just-in-time  expansion when the agent needs the full content.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At Level 2, the loop is no longer just executing tools. It is actively managing its own cognitive state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 3: Operations Inside and Outside the Loop
&lt;/h2&gt;

&lt;p&gt;At Level 3, developers not only understand which operations they require inside the loop; more opinionated scaffolding and harness logic also begin to form around the agent loop itself.&lt;/p&gt;

&lt;p&gt;Operations now exist both within the loop and outside it, and there are deliberate  architectural choices about which side of the boundary each operation belongs on. This is  where agent engineering becomes opinionated, and where context engineering and memory engineering become distinct disciplines with separate concerns.&lt;/p&gt;

&lt;p&gt;In a Level 3 agent loop, some operations should be automatic. The agent should never have to decide whether to load its own conversation history. Others should be agent-triggered: the agent decides when to search the web, not the harness.&lt;/p&gt;

&lt;p&gt;Getting this boundary wrong produces either context bloat, when too much is loaded automatically, or missed context,  when content that should always be present is left to the model’s discretion.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Programmatic&lt;/th&gt;
&lt;th&gt;Agent Triggered&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Read conversational memory&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;The agent always needs its history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read knowledge base&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Relevant documents always  loaded at run start&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read workflow memory&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Known patterns always surfaced before reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read entity memory&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Named references always resolved upfront&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read summary context&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Summary IDs always loaded; full content expanded on demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expand a summary&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Agent decides when it needs the full content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search the web (Tavily)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Agent decides when stored knowledge is insufficient&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summarise conversation&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Agent decides when context needs compaction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write tool log (offload)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Automatic after every tool call; keeps context lean&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
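&lt;p&gt;One way to keep this boundary explicit in code is to declare each operation’s trigger in a single registry, so the harness knows what to run itself and what to expose to the model as a tool. The sketch below is an assumption-laden illustration rather than the notebook’s design: Trigger, Operation, and split_by_trigger are hypothetical, as are the names expand_summary, search_web, and summarise_conversation.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    PROGRAMMATIC = "programmatic"   # the harness always runs it
    AGENT = "agent"                 # exposed to the model as a tool


@dataclass(frozen=True)
class Operation:
    name: str
    trigger: Trigger


OPERATIONS = [
    Operation("read_conversational_memory", Trigger.PROGRAMMATIC),
    Operation("read_knowledge_base", Trigger.PROGRAMMATIC),
    Operation("read_summary_context", Trigger.PROGRAMMATIC),
    Operation("expand_summary", Trigger.AGENT),
    Operation("search_web", Trigger.AGENT),
    Operation("summarise_conversation", Trigger.AGENT),
    Operation("write_tool_log", Trigger.PROGRAMMATIC),
]


def split_by_trigger(operations):
    """Return (names the harness runs itself, names offered as tools)."""
    automatic = [op.name for op in operations if op.trigger is Trigger.PROGRAMMATIC]
    tools = [op.name for op in operations if op.trigger is Trigger.AGENT]
    return automatic, tools


automatic, tools = split_by_trigger(OPERATIONS)
print(tools)  # ['expand_summary', 'search_web', 'summarise_conversation']
```

&lt;p&gt;Moving an operation across the boundary then becomes a one-line change that is easy to review, rather than a behaviour buried in the loop body.&lt;/p&gt;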

&lt;p&gt;&lt;strong&gt;Context engineering at Level 3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three techniques only become necessary at Level 3. Below Level 3, your context is  manageable by construction. At Level 3, with memory reads, multiple tool calls, and iterated  reasoning, it is not.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context window monitoring:&lt;/strong&gt; tracking token usage across iterations to detect when  compaction is needed before the window fills and performance degrades.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation compaction:&lt;/strong&gt; replacing verbose chat history with compressed  summaries while preserving originals in the database. The notebook marks messages  with a summary_id rather than deleting them, keeping the full record available for audit  and on-demand expansion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool output offloading:&lt;/strong&gt; persisting full tool outputs to a tool log table and replacing  them in context with a compact one-line reference.&lt;/li&gt;
&lt;/ul&gt;
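&lt;p&gt;Context window monitoring can be as simple as a budget check run at the top of each iteration. The sketch below assumes a rough heuristic of about four characters per token, which is only an approximation; a real harness would use the model’s tokenizer. The names estimate_tokens and needs_compaction and the default limits are hypothetical.&lt;/p&gt;

```python
def estimate_tokens(text):
    """Rough token estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def needs_compaction(messages, context_limit=128_000, threshold=0.8):
    """Return (flag, used_tokens); flag is True once usage crosses the threshold."""
    used = sum(estimate_tokens(m["content"]) for m in messages)
    return used >= context_limit * threshold, used


messages = [
    {"role": "system", "content": "You are a research assistant."},
    {"role": "tool", "content": "x" * 4000},
]
flag, used = needs_compaction(messages, context_limit=1200, threshold=0.8)
print(flag, used)
```

&lt;p&gt;Triggering at a fraction of the limit, rather than at the limit itself, leaves headroom for the compaction call and for the next model response.&lt;/p&gt;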

&lt;p&gt;The tool log pattern is worth examining in detail. A single web search can return three to four thousand tokens of raw results. Without offloading, every subsequent iteration in the same  run carries those tokens. With offloading, the context receives only a reference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; 
 &lt;span class="n"&gt;raw_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
 &lt;span class="c1"&gt;# Full output persisted to the database 
&lt;/span&gt; &lt;span class="n"&gt;log_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_tool_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
 &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
 &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
 &lt;span class="n"&gt;tool_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;raw_output&lt;/span&gt; 
 &lt;span class="p"&gt;)&lt;/span&gt; 
 &lt;span class="c1"&gt;# Context receives only the compact reference 
&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[Tool Log ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;log_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] Results stored. Call read_tool_log to  retrieve.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Semantic tool discovery&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At Level 3, the number of available tools is unlikely to stay small. Passing every tool schema  to the model on every iteration is a known failure mode: tool selection accuracy drops as the  schema list grows, and token costs climb regardless of how many tools are actually relevant.&lt;/p&gt;

&lt;p&gt;The notebook addresses this with a &lt;strong&gt;Toolbox&lt;/strong&gt;: a vector-indexed registry of tool definitions  where only semantically relevant tools are retrieved and passed to the model for each query. Tools are registered with LLM-augmented metadata so that embeddings capture intent and  use case, not just function signatures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@toolbox.register_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;augment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# LLM enriches description for  retrieval 
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_tavily&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; 
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the web and persist results in the knowledge base.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="c1"&gt;# At runtime: only semantically relevant tools passed to the model
&lt;/span&gt;&lt;span class="n"&gt;relevant_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_toolbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Idempotency and tool reliability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tool call failures are a production reality. Network errors, rate limits, and transient service  issues occur regularly. If the harness retries a failed tool call naively, it risks executing a  side-effecting operation twice: writing a record, sending a message, or triggering a payment  more than once.&lt;/p&gt;

&lt;p&gt;The mitigation is idempotency: assigning each tool call a stable key before execution so that a retry is recognized as the same operation and is not executed a second time. This is harness-level engineering, not model-level reasoning, and it belongs in the Level 3 design.&lt;/p&gt;
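
&lt;p&gt;A minimal sketch of that idea, not the notebook's implementation: the key is derived from the logical call (thread, step, tool, arguments), and a completed call is replayed from a store and never re-executed. The names &lt;code&gt;idempotency_key&lt;/code&gt; and &lt;code&gt;IdempotentExecutor&lt;/code&gt; are hypothetical, the store is an in-memory dictionary standing in for a durable table, and only &lt;code&gt;ConnectionError&lt;/code&gt; is treated as transient.&lt;/p&gt;

```python
import hashlib
import json


def idempotency_key(thread_id, step, tool_name, arguments):
    """Derive a stable key from the logical call, so every retry reuses it."""
    payload = json.dumps(
        {"thread": thread_id, "step": step, "tool": tool_name, "args": arguments},
        sort_keys=True,  # argument order must not change the key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class IdempotentExecutor:
    """Hypothetical harness helper: runs a tool at most once per key."""

    def __init__(self):
        # In-memory stand-in for a durable store (key: first successful result).
        self._completed = {}

    def call(self, key, tool, arguments, max_attempts=3):
        if key in self._completed:
            # Retry or duplicate of finished work: replay the stored result.
            return self._completed[key]
        last_error = None
        for _ in range(max_attempts):
            try:
                result = tool(**arguments)
            except ConnectionError as exc:  # assumed transient in this sketch
                last_error = exc
                continue
            self._completed[key] = result
            return result
        raise RuntimeError(f"tool failed after {max_attempts} attempts") from last_error
```

&lt;p&gt;This covers only harness-side deduplication. If a side effect succeeds but the response is lost, the harness cannot tell, so where the downstream service accepts an idempotency key it is safer to forward the same key with every attempt.&lt;/p&gt;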

&lt;p&gt;&lt;strong&gt;Prompt caching and message ordering&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At Level 3, the harness also starts to affect inference economics through prompt caching.  Most LLM providers implement prefix-based caching: if the beginning of a prompt is identical to a recent request, the cached computation can be reused, reducing latency and cost.&lt;/p&gt;

&lt;p&gt;The implication for agent design is concrete. Rewriting earlier messages mid-conversation,  to clean up history, reorder context, or inject new system instructions inline, breaks prefix  stability and degrades cache hit rates. The correct pattern is to append new instructions  rather than modifying existing message history. The &lt;a href="https://openai.com/index/unrolling-the-codex-agent-loop/" rel="noopener noreferrer"&gt;Codex implementation&lt;/a&gt; established this  explicitly: old prompts are preserved as exact prefixes of new prompts specifically to  maintain caching benefits across long multi-step runs.&lt;/p&gt;
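
&lt;p&gt;The difference between appending and rewriting can be checked mechanically. The sketch below is illustrative only: &lt;code&gt;append_instruction&lt;/code&gt; and &lt;code&gt;shared_prefix_length&lt;/code&gt; are hypothetical helpers, the message dictionaries follow a generic role/content shape, and message-level equality is used as a rough proxy for the token-level prefix match that providers actually apply. Which role a late instruction should carry is provider-specific and assumed here.&lt;/p&gt;

```python
def append_instruction(messages, text):
    """Return a new history with the instruction appended; earlier messages are untouched."""
    return messages + [{"role": "user", "content": text}]


def shared_prefix_length(previous, current):
    """Count the leading messages that are identical in both histories."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


history = [
    {"role": "system", "content": "You are a research agent."},
    {"role": "user", "content": "Summarise the latest filings."},
    {"role": "assistant", "content": "Calling the search tool."},
]

# Cache-friendly: the old prompt is an exact prefix of the new one.
appended = append_instruction(history, "From now on, cite every source.")

# Cache-hostile: editing the system message changes the very first message.
rewritten = [dict(history[0], content=history[0]["content"] + " Cite every source.")] + history[1:]

print(shared_prefix_length(history, appended))   # 3
print(shared_prefix_length(history, rewritten))  # 0
```

&lt;p&gt;A harness could run a check like this before each model call and log a warning whenever the shared prefix shrinks between iterations.&lt;/p&gt;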

&lt;p&gt;Level 3 is where the agent harness becomes a system in its own right. The inner loop,  assembling context, invoking the model, and acting, has not changed. What has changed is  everything around it: the scaffolding that feeds it, the operational constraints that govern it,  and the persistence layer that gives it continuity across time and sessions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Other Loops the Agent Engineer Should Know
&lt;/h2&gt;

&lt;p&gt;The agent loop does not run in isolation. It sits inside a wider system of loops, and the  engineering decisions made inside the agent loop are shaped by what happens in the loops  around it.&lt;/p&gt;

&lt;p&gt;Three matter most to agent engineers and memory engineers: the training loop that produced the model, the feedback loop that signals whether the system is working, and  the human loop that bounds its authority.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fr02zt6qj6mfbylorrfu8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fr02zt6qj6mfbylorrfu8.png" alt="Diagram showing an agent loop connected to an Oracle AI Database memory layer. The loop assembles context, reasons, and acts while reading and writing episodic, semantic, procedural, entity, summary, and tool-log memories. Human review and feedback loops provide corrections and evaluation signals, while accumulated experience can feed future model training." width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 8: The loops interconnected: the training loop produces the model, the agent loop generates experience, and the memory layer routes that experience back as training signal&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The training loop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The training loop is the cycle that produced the model in the first place: data  collection, gradient updates, evaluation, and release.&lt;/strong&gt; It operates offline, at a timescale of days or weeks, on curated datasets. The agent loop operates online, in real time, on live  interactions.&lt;/p&gt;

&lt;p&gt;Today these two loops are largely decoupled. Training happens, weights are frozen, and the  agent loop runs on top of those fixed weights. The apparent learning you observe within a  session, an agent recalling prior context or adapting to corrections, is not weight updating. It  is retrieval. The agent is not learning; it is reading from memory.&lt;/p&gt;

&lt;p&gt;This separation defines the boundary of what the agent loop can and cannot accomplish on  its own. It can accumulate experience through memory operations. It cannot change the  underlying model without a training cycle. Understanding this boundary tells you which  problems belong to memory engineering and which require retraining.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The feedback loop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every action the agent takes produces feedback. Tool results are feedback. User corrections are feedback. Evaluation metrics (hallucination rate, task completion, citation accuracy) are  feedback at a system level.&lt;/p&gt;

&lt;p&gt;At Level 3, the agent harness begins to make the feedback loop explicit and instrumentable.  The notebook’s context window growth chart is a primitive example: watching whether token  counts stabilize across runs tells you whether your context engineering is actually working.  More sophisticated systems route evaluation signals back into memory stores, marking  retrieved content as reliable or unreliable based on downstream outcomes, and gradually  improving retrieval quality without retraining.&lt;/p&gt;

&lt;p&gt;The feedback loop is what turns an agent into a system that improves over time. Without it,  every invocation starts from the same baseline regardless of what the agent has done  before.&lt;/p&gt;
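
&lt;p&gt;One way to route outcomes back into memory is to keep a reliability score per stored item and use it when ranking retrieval candidates. This is a sketch under stated assumptions, not the notebook's code: &lt;code&gt;ScoredMemory&lt;/code&gt; is a hypothetical in-memory stand-in for a real memory store, success is reduced to a boolean, and the score is a Laplace-smoothed success rate.&lt;/p&gt;

```python
class ScoredMemory:
    """Hypothetical in-memory stand-in for a store that tracks per-item reliability."""

    def __init__(self):
        self._items = {}  # item_id: {"text": str, "helped": int, "used": int}

    def add(self, item_id, text):
        self._items[item_id] = {"text": text, "helped": 0, "used": 0}

    def record_outcome(self, item_ids, success):
        """Feed a downstream outcome back to every item that was in context."""
        for item_id in item_ids:
            item = self._items[item_id]
            item["used"] += 1
            if success:
                item["helped"] += 1

    def reliability(self, item_id):
        """Laplace-smoothed success rate; items with no history score 0.5."""
        item = self._items[item_id]
        return (item["helped"] + 1) / (item["used"] + 2)

    def rank(self, candidate_ids):
        """Order retrieval candidates by reliability, most reliable first."""
        return sorted(candidate_ids, key=self.reliability, reverse=True)
```

&lt;p&gt;In practice the reliability score would be combined with semantic similarity instead of replacing it, and the counters would live alongside the memory records in the database.&lt;/p&gt;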

&lt;p&gt;&lt;strong&gt;Human in the loop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Long-horizon tasks regularly reach decision points where the agent lacks the information,  authority, or confidence to proceed without human input. The human-in-the-loop pattern  introduces a pause condition: the agent surfaces a question or proposed action, waits for  review or correction, and then continues.&lt;/p&gt;

&lt;p&gt;This is a stop condition of a different kind. Rather than halting because the task is finished,  the loop pauses because it has reached the boundary of its autonomous authority.  Designing this well involves two things: knowing in advance where those boundaries should  sit for a given workflow, and ensuring the agent communicates specifically when it reaches  one. A generic request for help is insufficient. The agent must surface a precise description  of what information or decision is blocking progress.&lt;/p&gt;

&lt;p&gt;Human-in-the-loop is not a safety net for when the agent fails. It is a deliberate architectural  decision about where human judgment adds the most value in a system. The agent loop  handles what can be reasoned about autonomously. The human loop handles what requires authority, context, or accountability that the agent does not have.&lt;/p&gt;
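
&lt;p&gt;The pause condition described above can be sketched as a structured request that the harness returns in place of executing the action. Everything named here is an assumption for illustration: &lt;code&gt;HumanInputRequest&lt;/code&gt;, the &lt;code&gt;REQUIRES_APPROVAL&lt;/code&gt; policy set, and the example action names are hypothetical, and a real harness would persist the paused state between the request and the decision.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class HumanInputRequest:
    """A precise description of what is blocking the run."""
    blocked_step: str
    reason: str  # "missing_information", "needs_authority", or "low_confidence"
    question: str
    options: list = field(default_factory=list)


# Hypothetical policy: actions the agent may not take on its own authority.
REQUIRES_APPROVAL = {"issue_refund", "delete_record"}


def check_authority(action, arguments):
    """Return a HumanInputRequest if the action crosses a boundary, else None."""
    if action in REQUIRES_APPROVAL:
        return HumanInputRequest(
            blocked_step=action,
            reason="needs_authority",
            question=f"Approve {action} with arguments {arguments}?",
            options=["approve", "reject"],
        )
    return None


def run_step(action, arguments, tools, human_decision=None):
    """Execute one action, pausing when human review is required."""
    request = check_authority(action, arguments)
    if request is not None and human_decision is None:
        return {"status": "paused", "request": request}
    if request is not None and human_decision != "approve":
        return {"status": "rejected", "request": request}
    return {"status": "done", "result": tools[action](**arguments)}
```

&lt;p&gt;The boundary is declared in advance as data, and the request names the blocked step, the reason, and the available choices, so the reviewer is never handed a generic request for help.&lt;/p&gt;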

&lt;h2&gt;
  
  
  Where This Is Going
&lt;/h2&gt;

&lt;p&gt;The agent loop, the training loop, and the feedback loop are currently operated as separate engineering concerns. That separation is practical, not fundamental. As agents accumulate experience across millions of runs, the information they generate (episodic memories, entity graphs, workflow patterns, evaluation signals, context growth traces) becomes a training signal. The training loop will eventually consume the output of the agent loop, closing the circle.&lt;/p&gt;

&lt;p&gt;When that happens, the quality of the memory layer becomes the quality of the training data. Agents with well-engineered memory (clean episodic records, accurately extracted entities,  reliable retrieval signals) produce better training signals than agents that let context  accumulate without structure.&lt;/p&gt;

&lt;p&gt;This convergence has a name. &lt;strong&gt;Continual learning is the ability of a model to acquire new knowledge and capabilities from a stream of incoming data over time, without retraining from scratch and without catastrophically forgetting what it has already learned.&lt;/strong&gt; It is a formal machine learning discipline, not a metaphor, and it is the bridge between the two loops: the agent loop generates the experience, and continual learning is the process by which the training loop absorbs that experience into model weights.&lt;/p&gt;

&lt;p&gt;Continual learning in agentic systems is the capacity of an agent to improve over time through the accumulation of high-signal memory units, with the extracted signal applied across three optimization surfaces: token space, weight space, and latent space.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Union of the Agent Loop and the Training Loop
&lt;/h2&gt;

&lt;p&gt;What connects them is the memory layer.&lt;/p&gt;

&lt;p&gt;Oracle AI Database serves as the agent memory core, providing vector search,  relational storage, and graph capabilities in a single engine. Memory operations that run inside the agent loop (encoding, storing, retrieving, injecting, and forgetting) produce a  durable record of agent experience.&lt;/p&gt;

&lt;p&gt;Oracle Cloud Infrastructure (OCI) provides the platform for continual learning: the infrastructure to retrain models on that accumulated experience at scale, closing the loop from runtime behaviour back into model weights.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent loop and the training loop are converging. The memory layer is where  they meet.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For engineers building agents today, this means the decisions made about memory  architecture are not just operational decisions. They are decisions about what the system will be able to learn from tomorrow. A database that can serve low-latency semantic search at  runtime can also serve as the data source for a continuous training pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design your memory layer accordingly.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. What is the agent loop?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent loop is the repeating cycle a harness runs within a single agent turn:  assemble context, invoke the model to reason, act on its decision, and repeat until a  stop condition ends the run. It exists because long-horizon tasks cannot be completed  in a single LLM call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. How do you stop an agent loop from running forever?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Define explicit stop conditions in the harness: a terminal message with no pending tool  calls, a goal-completion check, an iteration cap, a wall-clock timeout, unrecoverable  errors, and failure mode detection such as the agent repeating the same tool call with  identical arguments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. What is the difference between a memory-augmented agent and a memory-aware agent?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A memory-augmented agent retrieves and injects information into context but does not  manage it; memory is something that happens to the agent. A memory-aware agent  encodes, stores, retrieves, injects, and forgets, actively managing its cognitive state  within each run and across sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. How do I know which level my agent system sits at?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If there is no persistence beyond the context window, it is Level 1. If memory is read  before the model call and written after the agent acts, it is Level 2. If there is a  deliberate boundary between programmatic and agent-triggered operations, with  techniques such as compaction, tool output offloading, and semantic tool discovery, it  is Level 3.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. What connects the agent loop to the training loop?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The memory layer. Agent runs generate experience: episodic records, entities, workflows, and evaluation signals. With continual learning, that experience becomes training signal. Oracle AI Database stores and serves it inside the agent loop; Oracle Cloud Infrastructure (OCI) provides the platform to retrain models on it. The patterns are implemented in the companion notebook.&lt;/p&gt;

</description>
      <category>agentloop</category>
      <category>agents</category>
      <category>memory</category>
      <category>ai</category>
    </item>
    <item>
      <title>Oracle Backend for Microservices and AI: The Business Value of a Microservices Backend Platform</title>
      <dc:creator>Mark Nelson</dc:creator>
      <pubDate>Fri, 26 Jun 2026 12:12:34 +0000</pubDate>
      <link>https://dev.to/oracledevs/oracle-backend-for-microservices-and-ai-the-business-value-of-a-microservices-backend-platform-26b7</link>
      <guid>https://dev.to/oracledevs/oracle-backend-for-microservices-and-ai-the-business-value-of-a-microservices-backend-platform-26b7</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Oracle Backend for Microservices and AI, also referred to as OBaaS, gives teams a shared backend foundation for microservices and AI-enabled applications.&lt;/strong&gt; It brings Oracle AI Database together with cloud-native platform patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The main business value is repeatability.&lt;/strong&gt; Platform teams can standardize common backend concerns, while application teams still own service design, security, data modeling, testing, deployment planning, and operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OBaaS uses backend-as-a-service thinking for enterprise platforms.&lt;/strong&gt; It helps teams handle recurring needs such as gateways, configuration, observability, messaging, Oracle AI Database integration, workflow coordination, and platform operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment choices still need architecture review.&lt;/strong&gt; OCI Magic Button is useful for development and test environments on Oracle Cloud Infrastructure, while Helm is the production-oriented installation path.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What is Oracle Backend for Microservices and AI?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.oracle.com/database/microservices-backend/" rel="noopener noreferrer"&gt;Oracle Backend for Microservices and AI&lt;/a&gt;, also referred to as "&lt;strong&gt;OBaaS&lt;/strong&gt;", is a platform for building, deploying, and scaling microservices and AI-enabled applications. It is built around Oracle AI Database and cloud-native infrastructure. This series focuses on Oracle Backend for Microservices and AI 2.1.0.&lt;/p&gt;

&lt;p&gt;A microservice is rarely just a container with business logic. It needs an entry point. It needs configuration. It needs telemetry. It often needs messaging, database access, and a way to coordinate work across service boundaries. It also needs an operating model after the first demo works.&lt;/p&gt;

&lt;p&gt;AI-enabled applications add more pressure. They often bring more data movement, more review, and more governance into the same system. The service code may be small, but the platform around it is not.&lt;/p&gt;

&lt;p&gt;That is the problem OBaaS is designed to address. It gives Oracle-centered teams a shared foundation for common backend concerns. Instead of asking every team to assemble its own gateway, configuration, telemetry, messaging, database, and operations patterns, the organization can start from a common platform baseline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why microservices and AI applications need a shared backend foundation
&lt;/h2&gt;

&lt;p&gt;Many microservice programs start with a good goal: smaller services, clearer ownership, and faster delivery. The hard part comes later.&lt;/p&gt;

&lt;p&gt;One team chooses a gateway pattern. Another team handles configuration a different way. A third team emits telemetry in a different format. A fourth team builds local conventions for messaging and database access. Each choice may be reasonable. Together, they can become hard to secure, monitor, upgrade, and explain.&lt;/p&gt;

&lt;p&gt;AI-enabled applications make this more visible. Teams may be building recommendation flows, knowledge assistants, decision-support features, or data-rich operational services. The business logic changes from project to project. The platform concerns repeat.&lt;/p&gt;

&lt;p&gt;Services still need reliable access patterns. They still need externalized configuration. They still need observability, durable data, Oracle AI Database integration, and asynchronous communication where it fits. They also need operational practices that make the system reviewable beyond a prototype.&lt;/p&gt;

&lt;p&gt;OBaaS helps by giving platform teams a more consistent foundation to govern. Application teams get a better starting point, instead of rebuilding the same platform plumbing for every service.&lt;/p&gt;

&lt;p&gt;That does not remove engineering judgment. Teams still need to design service boundaries, model data, review security, test carefully, plan for production, and operate what they build. OBaaS helps with the repeated backend foundation around that work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Backend-as-a-service thinking for enterprise platform teams
&lt;/h2&gt;

&lt;p&gt;“Backend as a service” can mean different things. In mobile or consumer app development, it often means a hosted backend that provides features such as identity, storage, APIs, and push notifications.&lt;/p&gt;

&lt;p&gt;OBaaS is different. It is not a consumer mobile backend product. It is better understood as backend-as-a-service thinking applied to an enterprise platform.&lt;/p&gt;

&lt;p&gt;In that model, common backend capabilities are provided as a governed foundation. Platform teams define standard patterns. Application teams use those patterns across services and projects.&lt;/p&gt;

&lt;p&gt;The infrastructure still exists. Architecture still matters. DBAs, security teams, platform engineers, and application owners still have work to do.&lt;/p&gt;

&lt;p&gt;The value is not magic. The value is repeatability. A backend platform helps when it turns recurring setup and integration work into reusable patterns. Teams can then spend more time on the business behavior, data model, security posture, and operational model their applications require.&lt;/p&gt;

&lt;h2&gt;
  
  
  How OBaaS relates to microservices chassis concerns
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://microservices.io/patterns/microservice-chassis.html" rel="noopener noreferrer"&gt;microservices chassis&lt;/a&gt; pattern is a useful comparison. A traditional chassis is usually a code-level framework or library pattern. It gives individual services common features such as configuration, logging, metrics, health checks, service discovery, or tracing.&lt;/p&gt;

&lt;p&gt;That pattern helps developers avoid rebuilding the same service plumbing again and again.&lt;/p&gt;

&lt;p&gt;OBaaS applies similar thinking at the platform level. It is not a replacement for a service framework. It is not a code-level chassis that dictates how every microservice must be written. Teams can still choose the languages and frameworks that fit their services.&lt;/p&gt;

&lt;p&gt;The difference is the layer of concern. OBaaS focuses on the shared backend foundation around services: access, configuration, observability, messaging, Oracle AI Database integration, workflow coordination, and operations.&lt;/p&gt;

&lt;p&gt;This matters because teams can go too far in either direction. If every team solves platform concerns alone, the organization gets local speed but long-term inconsistency. If the platform team centralizes too much, application teams lose the flexibility that makes microservices useful.&lt;/p&gt;

&lt;p&gt;OBaaS fits between those extremes. It gives teams a shared foundation for repeated backend concerns, while application teams keep ownership of service design and business logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  What OBaaS helps standardize
&lt;/h2&gt;

&lt;p&gt;OBaaS helps platform teams provide a common foundation for backend capabilities that show up across many microservices and AI-enabled applications.&lt;/p&gt;

&lt;p&gt;Service access is usually one of the first concerns. Services need controlled entry points, routing patterns, and a clear way to expose capabilities. Without a common approach, each team can end up inventing its own access model.&lt;/p&gt;

&lt;p&gt;Configuration is another repeated need. Teams need a consistent way to manage settings across development, test, and production review. Ad hoc configuration works for a small prototype. It becomes harder to manage as services multiply.&lt;/p&gt;

&lt;p&gt;Observability is essential in a distributed system. A service can work well by itself and still fail as part of a larger application. Teams need operational signals that show how requests, dependencies, and failures behave across the system.&lt;/p&gt;

&lt;p&gt;Messaging helps when asynchronous communication is a better fit than direct calls. Not every service needs the same messaging pattern. But platform teams can still provide consistent conventions for teams that do.&lt;/p&gt;

&lt;p&gt;Oracle AI Database integration matters because many enterprise applications are built around durable business data. OBaaS is positioned around Oracle data-platform integration, not around a database-neutral abstraction.&lt;/p&gt;

&lt;p&gt;Workflow and transaction-related coordination help teams reason about multi-step processes that cross service boundaries. Platform operations bring the foundation together so it can be installed, managed, upgraded, and reviewed as part of the organization’s operating model.&lt;/p&gt;

&lt;p&gt;The point is not that every service uses every capability in the same way. The point is that teams start from a documented platform baseline instead of a blank page.&lt;/p&gt;

&lt;h2&gt;
  
  
  Business value for platform and application teams
&lt;/h2&gt;

&lt;p&gt;The business case for OBaaS starts with duplication.&lt;/p&gt;

&lt;p&gt;When every application team assembles gateways, configuration, observability, messaging, database connectivity, and operational integration on its own, the organization pays for the same work many times. It also pays later when those choices must be secured, monitored, upgraded, debugged, and explained across teams.&lt;/p&gt;

&lt;p&gt;OBaaS helps move that work into a shared platform foundation. A platform team can define gateway access, telemetry conventions, database connectivity, and messaging patterns once. Application teams can then focus on service boundaries and business behavior.&lt;/p&gt;

&lt;p&gt;Over time, the organization can improve the common patterns instead of rediscovering them project by project.&lt;/p&gt;

&lt;p&gt;For architects, the value is better alignment between application design and platform capabilities. For engineering managers, it is a more predictable starting point for teams. For DBAs and data platform leaders, it is a clearer connection between microservices and Oracle AI Database. For business leaders, it is a better path from prototype to enterprise review without treating every application as a one-off infrastructure effort.&lt;/p&gt;

&lt;p&gt;The careful word is “helps.” OBaaS helps standardize. It helps reduce repeated assembly. It provides a foundation.&lt;/p&gt;

&lt;p&gt;It can make lower cost, faster delivery, compliance, performance, availability, and production readiness easier to achieve, but it does not guarantee them. Those outcomes also depend on architecture, implementation, governance, and operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment flexibility without removing architecture choices
&lt;/h2&gt;

&lt;p&gt;OBaaS supports more than one adoption path. That flexibility is useful, but it should not be confused with automatic portability or production readiness.&lt;/p&gt;

&lt;p&gt;For development and test use on Oracle Cloud Infrastructure (OCI), &lt;a href="https://oracle.github.io/microservices-backend/obaas/docs/setup/" rel="noopener noreferrer"&gt;OCI Magic Button&lt;/a&gt; provides a way to provision complete development and test infrastructure. This path is useful when teams want to explore, prototype, or evaluate OBaaS on OCI. It should not be treated as the production deployment path.&lt;/p&gt;

&lt;p&gt;For existing Kubernetes clusters, &lt;a href="https://oracle.github.io/microservices-backend/obaas/docs/setup/" rel="noopener noreferrer"&gt;Helm&lt;/a&gt; is the documented production-oriented installation method. Helm gives platform teams a controlled installation path for Kubernetes environments that meet the required prerequisites.&lt;/p&gt;

&lt;p&gt;Helm does not make an environment production-ready by itself. Production still requires review of the target Kubernetes environment, network model, database access, security controls, observability, scaling behavior, backup and recovery expectations, upgrade process, and operational ownership.&lt;/p&gt;

&lt;p&gt;OBaaS deployment planning can include OCI, other public cloud providers, and hybrid environments where the Kubernetes environment meets documented prerequisites and support boundaries. That qualification is important. Deployment flexibility does not mean universal compatibility with every Kubernetes distribution, cloud service, or operating model.&lt;/p&gt;

&lt;p&gt;The useful question is not “Can we avoid architecture decisions?” The useful question is “Can we standardize the backend foundation so those decisions start from a better place?” OBaaS is aimed at that second question.&lt;/p&gt;

&lt;h2&gt;
  
  
  What comes next in OBaaS 2.1.0
&lt;/h2&gt;

&lt;p&gt;This article focused on why Oracle Backend for Microservices and AI matters as a shared backend platform. The core value is repeatability: a governed foundation for recurring backend concerns across microservices and AI-enabled applications, especially for teams building around Oracle AI Database and cloud-native infrastructure.&lt;/p&gt;

&lt;p&gt;The next article looks at what changed in Oracle Backend for Microservices and AI 2.1.0 across the platform areas that matter to application and platform teams. That includes gateway, observability, configuration, messaging, workflow, installation, and upgrade concerns.&lt;/p&gt;

&lt;p&gt;The goal is not to repeat release notes line by line. The goal is to explain how the 2.1.0 update strengthens the platform foundation that OBaaS provides.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Database-Enforced Authorization for Agentic AI .NET Applications</title>
      <dc:creator>Anya Summers</dc:creator>
      <pubDate>Thu, 25 Jun 2026 20:07:31 +0000</pubDate>
      <link>https://dev.to/oracledevs/database-enforced-authorization-for-agentic-ai-net-applications-75g</link>
      <guid>https://dev.to/oracledevs/database-enforced-authorization-for-agentic-ai-net-applications-75g</guid>
      <description>&lt;p&gt;Protect .NET applications from over-broad agent access, prompt injection, and tool misuse with Oracle Deep Data Security and ODP.NET&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fy40s1qsxts8kt28pu04k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fy40s1qsxts8kt28pu04k.png" alt=" " width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Agentic AI can perform complex tasks, but it often requires broad and dynamic data access, which increases security and compliance risk.&lt;/li&gt;
&lt;li&gt;Enforcing authorization in the database reduces duplicated application logic and keeps access rules consistent across applications, agents, and tools.&lt;/li&gt;
&lt;li&gt;Oracle Deep Data Security provides database-native policy enforcement. ODP.NET 23.26.2 adds support for passing end-user security context from .NET applications.&lt;/li&gt;
&lt;li&gt;.NET applications can adopt this model by integrating end-user security context into the data access layer.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Agentic AI changes how applications access data. Instead of following fixed application flows, agents can choose tools and generate actions at runtime.&lt;/p&gt;

&lt;p&gt;When given relevant data, agents can further optimize workflows to meet objectives. However, broader access also increases the risk of unauthorized use and data exfiltration.&lt;/p&gt;

&lt;p&gt;As organizations deploy agentic AI, keeping enterprise data protected and auditable becomes harder with the wrong security model. When agents access the database directly or through Model Context Protocol (MCP) server tools, authorization must still be enforced before data is returned. If the agent’s database access is broader than the end user’s authorization, it can expose sensitive data or modify protected records.&lt;/p&gt;

&lt;p&gt;The key design question is where authorization should be enforced: in every application, or in the database. At scale, maintaining separate authorization logic in every application becomes difficult to validate and easy to get wrong. When requirements, queries, or schemas change, teams must update authorization logic across every affected application. This becomes unmanageable as more AI applications are deployed across the enterprise.&lt;/p&gt;

&lt;p&gt;As attackers adopt AI-driven penetration testing tools to find application vulnerabilities faster than before, securing access control at every app-level entry point becomes even more critical.&lt;/p&gt;

&lt;p&gt;On the other hand, database-layer enforcement centralizes authorization and applies policies consistently before data is returned. Instead of relying on every developer to secure their part of the app perimeter, the same database policies can apply whether access comes from an application, an AI agent, or an MCP-based tool. Oracle AI Database 26ai (23.26.2) enables this capability with Deep Data Security.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Agentic AI Apps Need Deep Data Security
&lt;/h2&gt;

&lt;p&gt;Oracle Deep Data Security is a database-native authorization model that extends traditional system and object privileges by using end-user, agent, role, and attribute context in authorization decisions. It is designed for workloads where users, applications, agents, and tools may access the same data through different paths.&lt;/p&gt;

&lt;p&gt;Deep Data Security securely propagates end-user and agent identities, roles, and attributes to the database at runtime using an end-user security context. End users do not need to be database users; they can be any user type, such as Microsoft Entra ID or web application users. The database uses this context to enforce policies that define what users and agents can do, and when, and to generate audit records that capture activity. For example, a policy can allow a sales manager to see only customer rows for their assigned region, even if an agent generates a broader query.&lt;/p&gt;

&lt;p&gt;The Deep Data Security authorization model enforces fine-grained security at the row, column, and cell levels, enabling least-privilege access so end users and agents see only authorized data. The database returns only authorized rows and masks sensitive column values when the end user or agent lacks the required entitlement. Because the database enforces these policies during SQL execution, authorization remains consistent even when different applications or agents access the same data.&lt;/p&gt;
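
&lt;p&gt;To make the row and column behavior concrete, here is a minimal plain-Python model of the idea. It is not Deep Data Security itself: the sample data, the context keys, and the two rules are hypothetical, and in the real feature the database applies its own policies during SQL execution. None stands in for a masked NULL value.&lt;/p&gt;

```python
# Illustrative only: a plain-Python model of row filtering and column masking.
# The data, context keys, and policy rules are hypothetical; in Deep Data
# Security the database applies its own policies during SQL execution.

CUSTOMERS = [
    {"name": "Acme", "region": "EMEA", "credit_limit": 50000},
    {"name": "Globex", "region": "APAC", "credit_limit": 75000},
    {"name": "Initech", "region": "EMEA", "credit_limit": 20000},
]


def authorized_view(rows, context):
    """Return only rows in the caller's region, masking columns they may not see."""
    visible = []
    for row in rows:
        # Row-level rule: callers see only their assigned region.
        if row["region"] != context["region"]:
            continue
        result = dict(row)  # copy, so the stored row is never modified
        # Column-level rule: mask the value as None without the entitlement.
        if "view_credit" not in context["entitlements"]:
            result["credit_limit"] = None
        visible.append(result)
    return visible


manager = {"region": "EMEA", "entitlements": {"view_credit"}}
agent = {"region": "EMEA", "entitlements": set()}

manager_rows = authorized_view(CUSTOMERS, manager)
agent_rows = authorized_view(CUSTOMERS, agent)
print(manager_rows)
print(agent_rows)
```

&lt;p&gt;Both callers issue the same broad request, yet each receives a different result: the manager sees credit limits for EMEA customers, while the agent sees the same rows with that column masked.&lt;/p&gt;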

&lt;p&gt;Since policies are enforced in the database, developers do not have to duplicate the same authorization rules in every application or agent workflow. When requirements change, teams can update the database policy instead of rewriting authorization logic across multiple applications.&lt;/p&gt;

&lt;p&gt;For sensitive workflows, access can be granted only for the duration and scope of that workflow, instead of giving the application broad standing privileges. This reduces reliance on shared high-privilege service accounts that can read or write more data than the end user should have access to.&lt;/p&gt;

&lt;p&gt;Access boundaries must stay manageable, enforceable, and auditable as workflows change. Deep Data Security enforces least-privilege access for users and agents while preserving user identity in audit records to support safer, compliant AI adoption. .NET applications should incorporate Deep Data Security and pass end-user context to the database in a way they can manage consistently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Develop .NET Apps with Deep Data Security
&lt;/h2&gt;

&lt;p&gt;.NET applications can pass end-user identity, claims, roles, and application context to the database, where Deep Data Security evaluates policies during SQL execution before unauthorized rows, columns, or values can be returned. Managed ODP.NET and ODP.NET Core 23.26.2 add extension methods to use this context payload.&lt;/p&gt;

&lt;p&gt;With minimal code changes, existing ODP.NET applications can use Deep Data Security’s protection with agentic AI. Applications do not need to map each end user to a separate database user. The database evaluates authorization using the supplied end-user security context. The database manages session lifecycles automatically based on OAuth2 tokens, which include user authorization claims for resources and applications.&lt;/p&gt;

&lt;p&gt;To do this, set the end-user security context on the ODP.NET connection using OracleConnection.SetEndUserSecurityContext. The connection then executes commands on behalf of an end user identified by a token. The application supplies the end-user context separately from the database access token used by the mid-tier. Deep Data Security then evaluates policies using the end-user claims, roles, and attributes. Data roles and attributes allow Oracle AI Database to evaluate role mappings and token claims during authorization. This enables Deep Data Security to deliver fine-grained, end-user-aware access control in .NET without database user credentials.&lt;/p&gt;

&lt;p&gt;Deep Data Security evaluates policies during SQL execution, before unauthorized rows, columns, or values are returned. By default, unauthorized data is masked as NULL, though SQL functions can apply other formats.&lt;/p&gt;

&lt;p&gt;ODP.NET uses the OracleEndUserSecurityContext class to represent the security identity for an application end user’s database operations.&lt;/p&gt;

&lt;p&gt;Putting it all together, the following .NET code sample shows how to set a connection’s end-user security context and clear it after use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Obtain the end-user token and the mid-tier's database access token
string userToken = GetUserToken();
string midTierToken = GetMidtierToken();

// The using block closes the connection even if an operation throws
using (OracleConnection conn = new OracleConnection(connStr))
{
    conn.Open();

    // Create security context using tokens
    OracleEndUserSecurityContext securityContext = OracleEndUserSecurityContext.CreateWithTokens(midTierToken, userToken);

    // Set security context on the connection
    conn.SetEndUserSecurityContext(securityContext);
    try
    {
        // Execute database operations on behalf of the end user
    }
    finally
    {
        // Clear security context from connection, even after a failure
        conn.ClearEndUserSecurityContext();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Start Developing with ODP.NET Deep Data Security
&lt;/h2&gt;

&lt;p&gt;With ODP.NET and Oracle Deep Data Security, you can build end-to-end agentic AI .NET applications while protecting data from current and emerging threats. Data protection rules can evolve with simple changes, and access can be centrally managed and audited.&lt;/p&gt;

&lt;p&gt;Get started by downloading &lt;a href="https://www.nuget.org/packages/Oracle.ManagedDataAccess" rel="noopener noreferrer"&gt;managed ODP.NET&lt;/a&gt; or &lt;a href="https://www.nuget.org/packages/Oracle.ManagedDataAccess.Core" rel="noopener noreferrer"&gt;ODP.NET Core&lt;/a&gt; 23.26.2 with Deep Data Security and reviewing the &lt;a href="https://www.oracle.com/security/database-security/features/deep-data-security/" rel="noopener noreferrer"&gt;Oracle Deep Data Security web page&lt;/a&gt; and &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/odpnt/featConnecting.html#GUID-5A35E30D-8252-4ED1-9903-80C28E9DE011" rel="noopener noreferrer"&gt;ODP.NET Developer’s Guide Deep Data Security section&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is agentic AI?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s AI that can plan, reason, and execute multi-step tasks independently, often without human supervision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is managing data security at the database level preferred?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It centralizes data access control, making it easier to manage, update, and audit compared to securing each application individually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Deep Data Security protect data?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It enforces policies at row, column, and cell levels, ensuring users and AI agents only access authorized data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do .NET apps use Deep Data Security?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They pass user and app identity via tokens into an ODP.NET connection security context, allowing the database to enforce access rules without exposing credentials.&lt;/p&gt;

</description>
      <category>agenticai</category>
      <category>ai</category>
      <category>dotnet</category>
      <category>deepdatasecurity</category>
    </item>
    <item>
      <title>5 Oracle AI Database Dev Tools I’d Put in a Starter Kit</title>
      <dc:creator>Anya Summers</dc:creator>
      <pubDate>Thu, 25 Jun 2026 19:56:31 +0000</pubDate>
      <link>https://dev.to/oracledevs/5-oracle-ai-database-dev-tools-id-put-in-a-starter-kit-89d</link>
      <guid>https://dev.to/oracledevs/5-oracle-ai-database-dev-tools-id-put-in-a-starter-kit-89d</guid>
      <description>&lt;p&gt;A practical toolkit to quickly build, test, and validate Oracle AI Database workflows from local to cloud&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Start fast with &lt;strong&gt;containers or FreeSQL&lt;/strong&gt; to reduce setup time and quickly validate ideas or queries.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;SQLcl and SQL Developer together&lt;/strong&gt; for both automation (CLI) and visual inspection (GUI).&lt;/li&gt;
&lt;li&gt;Enable &lt;strong&gt;AI-assisted workflows&lt;/strong&gt; with SQLcl’s MCP Server while enforcing security at the data layer.&lt;/li&gt;
&lt;li&gt;Move seamlessly from local experiments to &lt;strong&gt;Always Free Autonomous AI Database&lt;/strong&gt; for realistic cloud testing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F7e5yw6y4ie9pce9ex9g0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F7e5yw6y4ie9pce9ex9g0.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Developers need the &lt;strong&gt;shortest path from claim to proof&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Can I start a database locally?”&lt;/li&gt;
&lt;li&gt;“Can I connect from my app?”&lt;/li&gt;
&lt;li&gt;“Can I run my tests against it?”&lt;/li&gt;
&lt;li&gt;“Can I inspect the schema without guessing?”&lt;/li&gt;
&lt;li&gt;“Can I use it with scripts, agents, and CI?”&lt;/li&gt;
&lt;li&gt;“Can I easily move from a laptop to a managed cloud database?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, we’ll look at tools that help you shorten the feedback loop for the development process you’re trying to prove.&lt;/p&gt;

&lt;p&gt;Here are the five I would put in a practical starter kit. &lt;strong&gt;These are tools I use every day.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Oracle AI Database Free Container Images
&lt;/h2&gt;

&lt;p&gt;Start local when you can. While the database container images are around 4–5 GB, they are multi-arch and start quickly for easy dev workflows on your laptop.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/anders-swanson/oracle-database-code-samples/blob/main/oracle-ai-database-docker-compose/README.md" rel="noopener noreferrer"&gt;Oracle AI Database Docker Compose sample&lt;/a&gt; spins up a disposable database on localhost:1521. It’s enough for most app development: point your app at the container database and fire away. When you’re done, throw the container away.&lt;/p&gt;
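
&lt;p&gt;As a rough sketch of pointing an app at the container, the Python snippet below builds an Easy Connect string for localhost:1521 and defines a smoke test that uses the python-oracledb driver. The service name FREEPDB1, the user name, and the password are assumptions; take the real values from your own Compose configuration.&lt;/p&gt;

```python
# Sketch only: FREEPDB1, the user name, and the password are assumptions.
# Use the values from your own container or Compose configuration.


def easy_connect(host="localhost", port=1521, service="FREEPDB1"):
    """Build an Easy Connect string of the form host:port/service."""
    return f"{host}:{port}/{service}"


def smoke_test(user, password, dsn):
    """Open a connection and run a trivial query to confirm the database is up."""
    # python-oracledb is imported lazily so easy_connect works without it.
    import oracledb

    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("select 1 from dual")
            return cur.fetchone()[0]


dsn = easy_connect()
print(dsn)
# With the container running, a call might look like:
# smoke_test("app_user", "app_password", dsn)
```

&lt;p&gt;Keeping the connect string in one helper makes it easy to swap the local container for a remote database later without touching the rest of the app.&lt;/p&gt;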

&lt;p&gt;I like containers a lot and use them constantly for development work. Here are some more Oracle-specific container resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For API work, try &lt;a href="https://github.com/anders-swanson/oracle-database-code-samples/blob/main/ords-docker-compose/README.md" rel="noopener noreferrer"&gt;Oracle REST Data Services (ORDS) with Docker Compose&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For Testcontainers developers, try both &lt;a href="https://andersswanson.dev/2025/09/11/learn-testcontainers-java-with-oracle-database-free/" rel="noopener noreferrer"&gt;Oracle AI Database Free&lt;/a&gt; and &lt;a href="https://andersswanson.dev/2026/04/07/test-ords-locally-with-testcontainers-oracle-ai-database-free-and-mongodb/" rel="noopener noreferrer"&gt;ORDS&lt;/a&gt; in your test suites.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use containers when you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Need repeatable local development&lt;/li&gt;
&lt;li&gt;Are running integration tests that create and destroy their own database&lt;/li&gt;
&lt;li&gt;Are testing a feature before moving it into shared infrastructure&lt;/li&gt;
&lt;li&gt;Need ORDS locally for REST, JSON, or SQL Developer Web workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Containers help you prove your code, schema, and assumptions in a clean database environment.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. FreeSQL
&lt;/h2&gt;

&lt;p&gt;Sometimes the right local setup is no local setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.oracle.com/database/technologies/oracle-free-sql/" rel="noopener noreferrer"&gt;Oracle FreeSQL&lt;/a&gt; gives you a browser-based SQL environment for learning, testing queries, and sharing examples without installing a database first. It is a good tool when the goal is to remove setup friction.&lt;/p&gt;

&lt;p&gt;With a free account, you get a personal schema and can connect from tools such as SQLcl, VS Code, and application code. I covered that workflow in &lt;a href="https://andersswanson.dev/2026/02/04/use-oracle-freesql-com-as-a-remote-test-database/" rel="noopener noreferrer"&gt;Use Oracle FreeSQL as a remote test database&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Use FreeSQL when you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are learning SQL or teaching someone else&lt;/li&gt;
&lt;li&gt;Need a remote schema without provisioning cloud infrastructure&lt;/li&gt;
&lt;li&gt;Want to test a query from a browser&lt;/li&gt;
&lt;li&gt;Are looking for a simple database target for examples, demos, or agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;FreeSQL is a low-friction place to start proving small things.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. SQLcl MCP Server
&lt;/h2&gt;

&lt;p&gt;SQLcl is one of the first tools I install for Oracle AI Database work.&lt;/p&gt;

&lt;p&gt;It is fast, scriptable, and useful for normal database development. You can run SQL, execute setup scripts, inspect objects, export data, load data, and automate validation without opening a full IDE.&lt;/p&gt;

&lt;p&gt;Now SQLcl also matters for AI-assisted development. Oracle describes &lt;a href="https://www.oracle.com/database/sqldeveloper/technologies/sqlcl/" rel="noopener noreferrer"&gt;SQLcl&lt;/a&gt; as a free command-line interface with an integrated MCP Server, and the &lt;a href="https://docs.oracle.com/en/database/oracle/sql-developer-command-line/26.1/sqcug/sqlcl-mcp-server.html" rel="noopener noreferrer"&gt;SQLcl MCP Server documentation&lt;/a&gt; explains how AI clients can use saved SQLcl connections to discover database context and run database operations through a structured MCP interface. The MCP server is something you can plug into Codex or Claude Code to assist with database operations.&lt;/p&gt;

&lt;p&gt;Use SQLcl when you want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A reliable command-line SQL tool&lt;/li&gt;
&lt;li&gt;Repeatable scripts for setup, validation, or data loading&lt;/li&gt;
&lt;li&gt;An MCP bridge between an AI assistant and Oracle AI Database&lt;/li&gt;
&lt;li&gt;Agents to inspect real schema metadata instead of guessing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re using MCP, I also recommend reading up on &lt;a href="https://blogs.oracle.com/database/oracle-deep-data-security-is-now-available-in-oracle-ai-database-26ai" rel="noopener noreferrer"&gt;Oracle Deep Data Security&lt;/a&gt;, which is aimed at solving problems around authorization for agentic AI. The practical idea of Deep Data Security is simple: enforce authorization at the data layer, not only in the app or the prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. SQL Developer
&lt;/h2&gt;

&lt;p&gt;SQL Developer complements SQLcl, providing additional features beyond the capabilities of the command line.&lt;/p&gt;

&lt;p&gt;Most database developers eventually need a visual tool for browsing schemas, inspecting rows, reviewing objects, writing SQL, debugging PL/SQL, or explaining something on a screen share.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.oracle.com/database/sqldeveloper/" rel="noopener noreferrer"&gt;SQL Developer&lt;/a&gt; is Oracle’s tool family for that job. If you need a dedicated database IDE, use standalone SQL Developer. If your day already lives in VS Code, use &lt;a href="https://www.oracle.com/database/sqldeveloper/vscode/" rel="noopener noreferrer"&gt;SQL Developer for VS Code&lt;/a&gt; and keep database work closer to your application code.&lt;/p&gt;

&lt;p&gt;Use SQL Developer when you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Want to browse schemas and database objects visually&lt;/li&gt;
&lt;li&gt;Are writing or debugging SQL and PL/SQL&lt;/li&gt;
&lt;li&gt;Need to inspect data quickly&lt;/li&gt;
&lt;li&gt;Want Oracle AI Database tooling inside VS Code or as a standalone app&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  5. Always Free Autonomous AI Database
&lt;/h2&gt;

&lt;p&gt;Some work needs a managed cloud database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.oracle.com/en/cloud/paas/autonomous-database/serverless/adbsb/autonomous-always-free.html" rel="noopener noreferrer"&gt;Always Free Autonomous AI Database&lt;/a&gt; is what I use when I need something closer to a real cloud deployment.&lt;/p&gt;

&lt;p&gt;It’s a strong fit for personal projects, demos, APEX and ORDS work, cloud-native experiments, and validation that needs real cloud connectivity. You can test wallets, network rules, deployment behavior, and managed database operations in a realistic environment.&lt;/p&gt;

&lt;p&gt;The tradeoff is that it’s still managed cloud infrastructure. You need an Oracle Cloud Infrastructure (OCI) account, and you need to understand wallets, networking, and free-tier quotas. Always Free is useful for learning and validation, but it is not production capacity. Treating it like production will lead to bad assumptions.&lt;/p&gt;

&lt;p&gt;Use Always Free Autonomous AI Database when you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Need a persistent, managed Oracle AI Database environment&lt;/li&gt;
&lt;li&gt;Are building demos or personal projects with Oracle AI Database&lt;/li&gt;
&lt;li&gt;Want to test wallet-based connectivity&lt;/li&gt;
&lt;li&gt;Need to validate cloud deployment behavior before using paid resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Always Free tier includes two free Autonomous AI Database instances. Beyond that limit, &lt;a href="https://docs.oracle.com/en-us/iaas/autonomous-database-serverless/doc/autonomous-database-for-developers.html" rel="noopener noreferrer"&gt;Database For Developers&lt;/a&gt; offers fixed-size database instances for ~$30/month on OCI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: LiveLabs Training and Tutorials
&lt;/h2&gt;

&lt;p&gt;Tools are easier to adopt when there is a guided path.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://livelabs.oracle.com/ords/r/dbpm/livelabs/home" rel="noopener noreferrer"&gt;Oracle LiveLabs&lt;/a&gt; gives you hands-on labs and workshops across Oracle technologies. It is useful when you need more than documentation but less than a full course.&lt;/p&gt;

&lt;p&gt;Use Oracle LiveLabs when you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are learning a feature for the first time&lt;/li&gt;
&lt;li&gt;Want a guided workshop before building your own version&lt;/li&gt;
&lt;li&gt;Need training material for a team&lt;/li&gt;
&lt;li&gt;Want examples that connect product features to real tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Start Small and Prove One Thing&lt;/strong&gt;&lt;br&gt;
The goal isn’t to collect Oracle tools.&lt;/p&gt;

&lt;p&gt;The goal is to keep the development loop short: write code, run SQL, inspect results, automate the boring parts, and move from local to cloud without changing the way you think about the database.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: What’s the fastest way to start using Oracle AI Database locally?&lt;/strong&gt;&lt;br&gt;
 Use container images with Docker Compose to spin up a disposable database for development and testing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: When should I use FreeSQL instead of a local database?&lt;/strong&gt;&lt;br&gt;
 When you want zero setup — ideal for learning SQL, quick demos, or testing queries in a browser.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Why use both SQLcl and SQL Developer?&lt;/strong&gt;&lt;br&gt;
 SQLcl is great for scripting and automation, while SQL Developer helps with visual tasks like browsing schemas and debugging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: When do I move to a cloud database?&lt;/strong&gt;&lt;br&gt;
 Use Always Free Autonomous AI Database when you need persistent storage, cloud connectivity testing, or a more production-like environment.&lt;/p&gt;

</description>
      <category>oracle</category>
      <category>webdev</category>
      <category>sqlserver</category>
      <category>database</category>
    </item>
    <item>
      <title>When to use Claude memory, Oracle AI Agent Memory, and LangChain together</title>
      <dc:creator>Anya Summers</dc:creator>
      <pubDate>Thu, 25 Jun 2026 19:44:11 +0000</pubDate>
      <link>https://dev.to/oracledevs/when-to-use-claude-memory-oracle-ai-agent-memory-and-langchain-together-1g03</link>
      <guid>https://dev.to/oracledevs/when-to-use-claude-memory-oracle-ai-agent-memory-and-langchain-together-1g03</guid>
      <description>&lt;p&gt;&lt;strong&gt;Build a controlled Claude MCP workflow with Oracle SQLcl, Oracle AI Database, Oracle AI Agent Memory, and LangChain.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Companion notebook:&lt;/strong&gt; &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/claude_mcp_oracle_ai_database_memory_langchain.ipynb" rel="noopener noreferrer"&gt;Claude MCP Oracle AI Database: When to use Claude memory, Oracle AI Agent Memory, and LangChain together&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; turns AI-to-database access into an explicit tool contract instead of implicit system access.  &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.oracle.com/en/database/oracle/sql-developer-command-line/25.4/sqcug/using-oracle-sqlcl-mcp-server.html" rel="noopener noreferrer"&gt;Oracle SQLcl in MCP mode&lt;/a&gt; (sql -mcp) is a direct, documented way to connect Claude Desktop to Oracle AI Database through an MCP server.  &lt;/li&gt;
&lt;li&gt;Oracle AI Database provides the persistent storage and &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/overview-ai-vector-search.html" rel="noopener noreferrer"&gt;vector search&lt;/a&gt; layer for memory workloads, while &lt;a href="https://docs.oracle.com/en/database/oracle/agent-memory/26.4/agmea/get-started.html" rel="noopener noreferrer"&gt;Oracle AI Agent Memory&lt;/a&gt; gives teams a Python API for threads, durable memory records, scoped retrieval, and context assembly on top of it. &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.langchain.com/oss/python/integrations/vectorstores/oracle" rel="noopener noreferrer"&gt;LangChain plus langchain-oracledb&lt;/a&gt; is useful for structured retrieval pipelines once the memory layer is in place. &lt;/li&gt;
&lt;li&gt;A hybrid model is a strong default for many teams: Claude + MCP for operational interaction, Oracle AI Database + LangChain for durable memory records and retrieval.
  &lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Here is how those components connect in this pattern. Claude talks to Oracle through SQLcl MCP, using the tools and database permissions you expose. Oracle AI Agent Memory is the Python package your app uses to manage durable memory records and context assembly. LangChain is an optional wrapper at the end of the retrieval path. The knowledge base is data in Oracle tables, with retrieval and access governed by your application and database design. &lt;/p&gt;

&lt;p&gt;Production success depends less on “prompt quality” and more on boundaries, privileges, logging, and repeatable runbooks.  &lt;/p&gt;

&lt;p&gt;Many AI assistant demos fail in the same place: not in the first interaction, but in week two. The assistant can generate SQL and explain concepts, but the workflow often lacks durable context across sessions. Teams also struggle to answer basic operational questions, like who executed what, where, and with which permissions.  &lt;/p&gt;

&lt;p&gt;That is why this topic matters for developer teams right now. If you are integrating AI into workflows that query, analyze, or modify data in Oracle (running reports, inspecting schemas, retrieving context, or writing results back), you need two things at once: controlled execution and durable memory.&lt;/p&gt;

&lt;p&gt;MCP defines the execution boundary. Oracle AI Database provides durable storage, vector search, and database controls for application memory records. You can build that layer directly with tables and retrieval logic, but the Oracle AI Agent Memory Python package makes the integration easier once memory workflows start getting more complex. LangChain comes in later when you need structured retrieval and orchestration on top of that.  &lt;/p&gt;

&lt;p&gt;By the end of this guide you will know how to connect Claude to Oracle AI Database through a controlled MCP boundary, when Claude’s built-in memory is sufficient and when your application needs Oracle AI Agent Memory to manage durable memory records in Oracle AI Database, and how to build a retrieval pipeline you can query, audit, and grow over time. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The developer path through this guide is simple:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with one approved Oracle connection and a read-only validation query.  &lt;/li&gt;
&lt;li&gt;Put SQLcl MCP in front of that connection so Claude sees tools, not raw database credentials.  &lt;/li&gt;
&lt;li&gt;Check the audit and activity trail before adding more tool access.  &lt;/li&gt;
&lt;li&gt;Add Oracle AI Agent Memory when the workflow needs durable thread context, scoped recall, or reusable context cards.  &lt;/li&gt;
&lt;li&gt;Add LangChain only when you need application-side retrieval orchestration beyond the MCP interaction loop.  &lt;/li&gt;
&lt;/ul&gt;
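
&lt;p&gt;For the second step, one way to register SQLcl with Claude Desktop is an entry under the mcpServers key of its claude_desktop_config.json file. The Python sketch below only generates that JSON. The server label and the path to the SQLcl binary are hypothetical, so check the SQLcl MCP Server documentation for the exact setup on your platform.&lt;/p&gt;

```python
import json

# Hypothetical values: the server label and the path to the SQLcl binary
# depend on your installation.
SQLCL_PATH = "/opt/sqlcl/bin/sql"


def build_mcp_config(sqlcl_path):
    """Return a Claude Desktop style MCP entry that starts SQLcl in MCP mode."""
    return {
        "mcpServers": {
            "sqlcl": {
                "command": sqlcl_path,
                "args": ["-mcp"],
            }
        }
    }


config = build_mcp_config(SQLCL_PATH)
print(json.dumps(config, indent=2))
```

&lt;p&gt;Note what the entry leaves out: it holds no database credentials. The connection details stay in SQLcl's saved connections, so Claude is handed a tool to call rather than a password.&lt;/p&gt;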

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fgndzargo0v2g7the8flx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fgndzargo0v2g7the8flx.png" alt=" " width="799" height="305"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Memory vs Oracle AI Agent Memory
&lt;/h2&gt;

&lt;p&gt;Claude’s built-in memory has improved significantly, with support for chat history and project-level context. It works well for assistant continuity, but it is still scoped to the assistant experience.  &lt;/p&gt;

&lt;p&gt;Before getting into the memory categories, it is worth introducing Oracle AI Agent Memory properly. It is a Python package that sits on top of Oracle AI Database and provides the application-facing API for conversation threads, durable memory records, scoped retrieval, and context cards you can pass back to an assistant. You can build the same tables and retrieval logic yourself, and the companion notebook shows exactly how that works at the table level. But once memory workflows grow (multiple users, cross-session context, retrieval at scale), this package saves a lot of repeated work. Think of Oracle AI Agent Memory as the API your application talks to, and Oracle AI Database as the storage and enforcement layer underneath it.&lt;/p&gt;
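
&lt;p&gt;To give a feel for what building it yourself at the table level involves, here is a deliberately small stand-in. It uses Python's built-in sqlite3 module instead of Oracle AI Database, a keyword match instead of vector search, and a hypothetical table layout. It does not use the Oracle AI Agent Memory API or its schema.&lt;/p&gt;

```python
import sqlite3

# Stand-in sketch: sqlite3 replaces Oracle AI Database here, and the table
# and column names are hypothetical, not the package's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table memory_records ("
    " id integer primary key,"
    " user_id text not null,"
    " thread_id text not null,"
    " content text not null)"
)


def remember(user_id, thread_id, content):
    """Persist one memory record for a user and thread."""
    conn.execute(
        "insert into memory_records (user_id, thread_id, content) values (?, ?, ?)",
        (user_id, thread_id, content),
    )
    conn.commit()


def recall(user_id, keyword):
    """Scoped retrieval: search only the records that belong to user_id."""
    rows = conn.execute(
        "select content from memory_records"
        " where user_id = ? and content like ? order by id",
        (user_id, f"%{keyword}%"),
    )
    return [row[0] for row in rows]


remember("alice", "t1", "Quarterly report uses the SALES_EMEA view")
remember("bob", "t2", "Quarterly report is due Friday")
print(recall("alice", "report"))
```

&lt;p&gt;Even this toy version needs a schema, a write path, and a scoping rule on every read. Those are the pieces that multiply as users, threads, and retrieval strategies grow.&lt;/p&gt;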

&lt;p&gt;In practice, “memory” means different things depending on the layer you are talking about. Claude Memory and Oracle AI Agent Memory solve different problems:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F8wmq44okx3udlnzu7rxz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F8wmq44okx3udlnzu7rxz.png" alt=" " width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As of writing, Claude’s memory makes conversations smoother, but it’s still scoped to the assistant experience. It’s not built for querying application history, sharing context across users, or enforcing database-level audit and access controls. That’s where Oracle AI Agent Memory comes in. It gives you a persistent application memory layer you can query and manage across sessions and teams. Important decisions should still be grounded in systems of record, application authorization, and human or workflow review where required.  &lt;/p&gt;

&lt;p&gt;A simple way to think about it: Claude remembers for the conversation. Oracle AI Agent Memory remembers for the system. &lt;/p&gt;

&lt;p&gt;Because memory records live in Oracle AI Database and not on one local machine, they can become portable across approved clients. Point a new machine at the same database with the right credentials and policies, and the application can retrieve the same memory records. &lt;/p&gt;

&lt;p&gt;Even with Claude memory, teams often need an application-level memory layer. Claude memory is not designed for querying history across users, storing tool logs, or applying database access controls. Oracle AI Database can help fill that gap by providing durable, shared, and queryable memory records for application workflows.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use the layers this way:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Claude memory for assistant continuity: preferences, project context, and conversational convenience inside the assistant experience.  &lt;/li&gt;
&lt;li&gt;Use SQLcl MCP when Claude needs to inspect or query Oracle through an explicit tool boundary.  &lt;/li&gt;
&lt;li&gt;Use Oracle AI Agent Memory when your application needs durable threads, searchable memory records, scoped retrieval, or context cards across users, agents, and sessions.  &lt;/li&gt;
&lt;li&gt;Use LangChain when your app needs reusable retrieval chains, routing logic, or orchestration around the memory and vector search layer.  &lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why this architecture is useful for developers 
&lt;/h2&gt;

&lt;p&gt;The developer value is practical: each layer gives you something concrete to test before you trust the whole workflow. You can validate the MCP server, the saved SQLcl connection, the database role, the durable application-memory write path, and the retrieval query separately. &lt;/p&gt;

&lt;p&gt;That matters after the demo. When an answer looks wrong, a developer can inspect whether the tool call ran, which database user executed it, what SQL or retrieval path was used, which durable memory records or tool traces were returned, and whether the application assembled the right context. The failure stops being “the model was wrong” and becomes a narrower engineering problem. &lt;/p&gt;

&lt;p&gt;The responsibilities break down into testable layers: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The assistant translates user intent into a plan your app or MCP client can inspect.  &lt;/li&gt;
&lt;li&gt;MCP exposes a declared tool surface instead of broad implicit system access.  &lt;/li&gt;
&lt;li&gt;SQLcl MCP gives developers a reproducible bridge from Claude Desktop to approved Oracle connections.  &lt;/li&gt;
&lt;li&gt;Oracle AI Database keeps roles, privileges, memory records, tool logs, and vector retrieval close to the data layer.  &lt;/li&gt;
&lt;li&gt;Oracle AI Agent Memory gives Python developers a package API for threads, durable memory records, scoped search, and context cards. This is application memory, not just chat history.  &lt;/li&gt;
&lt;li&gt;LangChain handles retrieval workflows and tool coordination where application logic is needed, without becoming the permission boundary. &lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;The payoff is a workflow that is easier to review, easier to debug, and easier to grow. You can start read-only, prove the connection and logging path, add durable memory when the application needs continuity across sessions or workflows, and keep each new capability attached to a named layer instead of burying everything in prompts or a custom agent framework.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the two execution loops 
&lt;/h2&gt;

&lt;p&gt;Building on the separation of responsibilities above, the system naturally forms two execution loops:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loop A, the operational loop (Claude + MCP): Claude works with MCP to run queries, inspect data, and respond in real time.  &lt;/li&gt;
&lt;li&gt;Loop B, the persistence loop (Oracle AI Database + LangChain): Oracle AI Database and LangChain handle durable memory records, tool logs, and context retrieval across sessions. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One loop handles real-time interaction; the other handles durable memory records and retrieval.  &lt;/p&gt;

&lt;p&gt;SQLcl MCP is for Claude operating interactively: real-time queries during a conversation, routed through a declared tool contract. Oracle AI Agent Memory is for your application code: storing turns, retrieving history, and assembling context before Claude sees a prompt. They serve different loops at different times. You can drop either one depending on your use case, but many production setups benefit from both. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcny0lwz4qw0pio1vyzp6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcny0lwz4qw0pio1vyzp6.png" alt=" " width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Setup Guide: Reproducing the Oracle SQLcl MCP and Claude Workflow 
&lt;/h2&gt;

&lt;p&gt;The SQLcl MCP setup is documented by Oracle and reproducible in the way that matters for developers: you can install it, test it, validate the saved connection, and inspect activity before Claude runs a real query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites before you connect Claude&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Oracle SQLcl 25.2.0 or higher.  &lt;/li&gt;
&lt;li&gt;Oracle JRE 17 or 21.  &lt;/li&gt;
&lt;li&gt;Claude Desktop or another MCP-capable client you are explicitly configuring and testing.  &lt;/li&gt;
&lt;li&gt;At least one saved SQLcl connection profile under ~/.dbtools, created with password persistence for MCP use.  &lt;/li&gt;
&lt;li&gt;A database user with the minimum permissions required for the workflow. Start with read-only access and a sanitized development or replica environment where possible.
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core idea is simple. SQLcl runs in MCP mode with sql -mcp. Claude Desktop launches it as an MCP server and talks to the database through declared tools and the permissions attached to the saved connection. Connections come from saved profiles in the SQLcl connection store under ~/.dbtools. Claude does not invent them at runtime; it reuses the ones you have already created and validated. &lt;/p&gt;

&lt;p&gt;One setup detail that catches people out: MCP-compatible saved connections need the password persisted. That is what the -savepwd flag does when you create the connection. Treat that saved profile as a credentialed application path: use a purpose-specific database user, keep the grant surface small, and avoid pointing first experiments at production data.&lt;/p&gt;

&lt;p&gt;Once that is done, you configure Claude Desktop to point at the SQLcl executable and pass -mcp as the argument. Claude Desktop manages the server lifecycle from there, and SQLcl translates tool calls into database operations. Oracle recommends granting the minimum permissions required, considering sanitized copies or read-only replicas for AI access, and auditing LLM activity. SQLcl MCP activity can be inspected through database-side traces such as DBTOOLS$MCP_LOG and session views such as V$SESSION. (&lt;a href="https://docs.oracle.com/en/database/oracle/sql-developer-command-line/25.4/sqcug/using-oracle-sqlcl-mcp-server.html" rel="noopener noreferrer"&gt;docs.oracle.com&lt;/a&gt;)  &lt;/p&gt;

&lt;p&gt;SQLcl MCP also supports restrict levels. The documented default is restrict level 4, which disables sensitive commands such as unrestricted file system access and host execution. Treat changes to the restrict level as an explicit security decision, not as a convenience toggle. (&lt;a href="https://docs.oracle.com/en/database/oracle/sql-developer-command-line/25.4/sqcug/configuring-restrict-levels-sqlcl-mcp-server.html" rel="noopener noreferrer"&gt;docs.oracle.com&lt;/a&gt;) &lt;/p&gt;

&lt;p&gt;A minimal configuration looks like this:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sqlcl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PATH/bin/sql"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That small JSON block defines the connection between Claude and SQLcl MCP Server. Claude interacts with the database through the tools and permissions exposed by the MCP server, using the saved SQLcl connection profile you created and tested first.  &lt;/p&gt;
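&lt;p&gt;If you would rather generate that file than hand-edit it, a minimal Python sketch could look like the following. The function name build_mcp_config and the default server name are illustrative assumptions, not part of SQLcl or Claude Desktop; the only facts taken from the configuration above are the mcpServers key, the command entry, and the -mcp argument.&lt;/p&gt;

```python
import json


def build_mcp_config(sqlcl_executable, server_name="sqlcl"):
    """Return MCP client config text with one SQLcl server entry.

    build_mcp_config and server_name are hypothetical names for this sketch.
    """
    if not sqlcl_executable:
        raise ValueError("sqlcl_executable must be a non-empty path")
    config = {
        "mcpServers": {
            server_name: {
                "command": str(sqlcl_executable),
                "args": ["-mcp"],
            }
        }
    }
    # json.dumps escapes backslashes, so a Windows path is written as C:\\tools\\...
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Example path only; point this at your own SQLcl install.
    print(build_mcp_config(r"C:\tools\sqlcl\bin\sql.exe"))
```

&lt;p&gt;Letting the JSON encoder handle the escaping avoids the most common hand-editing mistake on Windows: a single backslash in the command path.&lt;/p&gt;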
&lt;h2&gt;
  
  
  Validation checklist before expanding access
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Run sql -mcp locally and confirm the server starts.  &lt;/li&gt;
&lt;li&gt;Restart Claude Desktop and confirm the SQLcl tools are discoverable.  &lt;/li&gt;
&lt;li&gt;Run one read-only query against an approved schema.  &lt;/li&gt;
&lt;li&gt;Check database-side MCP activity logs and session metadata.  &lt;/li&gt;
&lt;li&gt;Document the connection alias, database user, grant scope, restrict level, and troubleshooting owner.  &lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  What a good first proof looks like
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The MCP server starts without a Java or path error.  &lt;/li&gt;
&lt;li&gt;Claude lists the SQLcl MCP tools after restart.  &lt;/li&gt;
&lt;li&gt;A read-only query succeeds against the expected schema.  &lt;/li&gt;
&lt;li&gt;The database-side activity trail shows the MCP interaction.  &lt;/li&gt;
&lt;li&gt;A denied query fails because of the database role, not because a prompt asked nicely.  &lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Why put application memory records in Oracle AI Database, not just outputs
&lt;/h2&gt;

&lt;p&gt;Once your first tool calls work, the next challenge is continuity. If memory lives only in chat context, the system is fragile. If memory is scattered across files without structure, retrieval and auditing become expensive over time.  &lt;/p&gt;

&lt;p&gt;It’s worth calling out the difference here. At this point, the challenge shifts from conversation persistence to system-level memory.  &lt;/p&gt;

&lt;p&gt;A design that uses Oracle AI Agent Memory is often cleaner and easier to operate as the workflow grows. &lt;/p&gt;

&lt;p&gt;The companion notebook builds this memory layer from scratch so the mechanics are visible, and then shows how Oracle AI Agent Memory slots on top of it once the substrate is working. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory categories that matter in practice&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Conversational memory: stores user and assistant turns, thread IDs, timestamps, and metadata.  &lt;/li&gt;
&lt;li&gt;Operational memory: stores tool inputs, outputs, status, and error classes for troubleshooting and audit.  &lt;/li&gt;
&lt;li&gt;Semantic memory: stores chunks and embeddings for meaning-based retrieval when exact keywords are absent.  &lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Why this matters technically
&lt;/h2&gt;


&lt;ul&gt;
&lt;li&gt;SQL tables give deterministic filtering and ordering.  &lt;/li&gt;
&lt;li&gt;Transactions improve integrity under concurrent writes.  &lt;/li&gt;
&lt;li&gt;Vector retrieval helps with paraphrases and conceptual matches.  &lt;/li&gt;
&lt;li&gt;Keeping memory on one platform makes it easier to manage, audit, and keep consistent over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works well with Oracle AI Database because structured records and semantic retrieval data can stay in one place. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjhqqd9i574gj7ch02x7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjhqqd9i574gj7ch02x7m.png" alt=" " width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Where LangChain adds value (and where it should not be overused)
&lt;/h2&gt;

&lt;p&gt;LangChain is useful as orchestration glue, especially when teams want a documented path for tool definitions and retrieval calls. One thing worth stating clearly: in the architecture shown here, Claude Desktop does not call LangChain directly. LangChain runs in your application layer to format context before it reaches Claude’s prompt. With langchain-oracledb, teams can wire vector retrieval in Oracle AI Database while keeping control in database roles and runtime policies.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good uses of LangChain in this architecture&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Declaring retrieval and memory tools in a consistent format.  &lt;/li&gt;
&lt;li&gt;Running retrieval-first answer pipelines.  &lt;/li&gt;
&lt;li&gt;Standardizing how context is assembled before generation.  &lt;/li&gt;
&lt;li&gt;Building reusable agent patterns across teams.  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Poor uses of LangChain in this architecture&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assuming LangChain automatically makes database access safe.  &lt;/li&gt;
&lt;li&gt;Relying only on prompts to limit what the assistant is allowed to do.  &lt;/li&gt;
&lt;li&gt;Adding too many tools before your team knows how to manage and troubleshoot them.  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A good rule is to enforce permissions in the database and infrastructure layer, not only in framework code or prompts.&lt;/p&gt;


&lt;h2&gt;
  
  
  Practical Implementation Snippets
&lt;/h2&gt;

&lt;p&gt;The snippets below show the minimum useful shape of the implementation: the MCP boundary, the memory substrate, a package-level memory API, and the retrieval policy that keeps generated answers grounded.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1) MCP boundary snippet&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sqlcl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"C:&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;tools&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;sqlcl&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;bin&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;sql.exe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;2) Memory schema concept snippet&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- CONVERSATIONAL_MEMORY  &lt;/span&gt;
&lt;span class="n"&gt;THREAD_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CONTENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;METADATA_JSON&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CREATED_AT&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt; 
&lt;span class="err"&gt; &lt;/span&gt; 
&lt;span class="c1"&gt;-- TOOL_LOGS  &lt;/span&gt;
&lt;span class="n"&gt;THREAD_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TOOL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TOOL_INPUT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TOOL_OUTPUT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;STATUS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ERROR_MESSAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CREATED_AT&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt; 
&lt;span class="err"&gt; &lt;/span&gt; 
&lt;span class="c1"&gt;-- KB_CHUNKS (used for vector retrieval via langchain-oracledb)  &lt;/span&gt;
&lt;span class="n"&gt;TEXT_CHUNK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;METADATA_JSON&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EMBEDDING&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
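&lt;p&gt;To make the column lists above concrete on the application side, here is a hedged Python sketch of one CONVERSATIONAL_MEMORY row. The class name ConversationTurn, the binds helper, and the INSERT_TURN statement are assumptions for illustration; only the table and column names come from the concept snippet. The role check mirrors the user and assistant roles used elsewhere in this article.&lt;/p&gt;

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical insert statement; column names follow the concept snippet above.
INSERT_TURN = (
    "INSERT INTO CONVERSATIONAL_MEMORY "
    "(THREAD_ID, ROLE, CONTENT, METADATA_JSON, CREATED_AT) "
    "VALUES (:thread_id, :role, :content, :metadata_json, :created_at)"
)


@dataclass
class ConversationTurn:
    """One conversational-memory row as the application sees it."""
    thread_id: str
    role: str
    content: str
    metadata: dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def binds(self):
        """Return named bind values matching the placeholders in INSERT_TURN."""
        if self.role not in ("user", "assistant"):
            raise ValueError("role must be 'user' or 'assistant'")
        return {
            "thread_id": self.thread_id,
            "role": self.role,
            "content": self.content,
            # Metadata is serialized once here so the row stores plain JSON text.
            "metadata_json": json.dumps(self.metadata, sort_keys=True),
            "created_at": self.created_at,
        }


if __name__ == "__main__":
    turn = ConversationTurn("thread-1", "user", "Hello", {"source": "demo"})
    print(INSERT_TURN)
    print(turn.binds())
```

&lt;p&gt;With a driver that accepts named binds, such as python-oracledb, the dictionary from binds() can be passed alongside INSERT_TURN to cursor.execute. Keeping values in binds rather than formatting them into the SQL string also keeps user content out of the statement text.&lt;/p&gt;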


&lt;p&gt;&lt;strong&gt;3) Oracle AI Agent Memory package path&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"oracleagentmemory==26.4.0"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The package path expects Python 3.10 or later; Oracle AI Database 26ai or later; an Oracle AI Database connection or connection pool; an embedding model for retrieval; and, optionally, an LLM for memory extraction, summaries, and context cards. The exact adapters depend on your application, but the API shape is intentionally small:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;oracleagentmemory.apis.searchscope&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SearchScope&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;oracleagentmemory.core.oracleagentmemory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OracleAgentMemory&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;oracleagentmemory.core.embedders.embedder&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Embedder&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;oracleagentmemory.core.llms.llm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Llm&lt;/span&gt;

&lt;span class="n"&gt;embedder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Embedder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_EMBEDDING_MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_LLM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;db_pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;  &lt;span class="c1"&gt;# your oracledb connection or connection pool
&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OracleAgentMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;db_pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Remember that I prefer morning deployment reviews.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Got it. I will keep that preference in mind.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The user prefers morning deployment reviews.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;When does this user prefer deployment reviews?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SearchScope&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_context_card&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use oracleagentmemory from your application layer when you need package-managed users, agents, memories, threads, scoped retrieval, and context assembly. Keep systems of record separate from memory records: memory helps provide context, but application logic and authoritative data sources should still decide what is true, allowed, and final. (&lt;a href="https://docs.oracle.com/en/database/oracle/agent-memory/26.4/agmea/get-started.html" rel="noopener noreferrer"&gt;docs.oracle.com&lt;/a&gt;) &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4) Retrieval-first policy snippet (pseudo-policy)&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieve relevant memory before synthesizing the final answer.  &lt;/li&gt;
&lt;li&gt;If retrieval is empty, say context is insufficient.  &lt;/li&gt;
&lt;li&gt;Keep answers evidence-first and concise.  &lt;/li&gt;
&lt;li&gt;Log tool calls with status and timestamp.
  &lt;/li&gt;
&lt;/ul&gt;
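&lt;p&gt;The policy above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a LangChain or Oracle AI Agent Memory API: answer_with_retrieval, retrieve, and synthesize are hypothetical names, and the two callables stand in for whatever retrieval and generation code your application already has.&lt;/p&gt;

```python
from datetime import datetime, timezone


def answer_with_retrieval(question, retrieve, synthesize, tool_log):
    """Apply the retrieval-first policy to one question.

    retrieve(question) returns a list of memory records;
    synthesize(question, records) returns the answer text.
    Both are hypothetical callables supplied by the application.
    """
    entry = {
        "tool_name": "memory_retrieval",
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    try:
        records = retrieve(question)
    except Exception as exc:
        # Log the failed call with status and timestamp, then stop.
        entry.update(status="error", error_message=str(exc))
        tool_log.append(entry)
        return "I could not retrieve context for this question."

    entry.update(status="ok", record_count=len(records))
    tool_log.append(entry)

    if not records:
        # Empty retrieval: say context is insufficient instead of guessing.
        return "I do not have enough stored context to answer that."
    return synthesize(question, records)


if __name__ == "__main__":
    log = []
    print(answer_with_retrieval("What do I prefer?", lambda q: [], lambda q, r: "", log))
    print(log)
```

&lt;p&gt;The point of the shape is ordering: retrieval and logging happen before generation, so an empty or failed retrieval never reaches the synthesis step.&lt;/p&gt;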




&lt;h2&gt;
  
  
  Engineering guidance for production teams
&lt;/h2&gt;

&lt;p&gt;The difference between demo success and production success is disciplined operations. Most failures at this stage come from integration gaps, not model behavior.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Access and privilege model&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separate accounts per environment (dev, test, prod).  &lt;/li&gt;
&lt;li&gt;Start read-only wherever possible.  &lt;/li&gt;
&lt;li&gt;Use least privilege grants and schema allowlists.  &lt;/li&gt;
&lt;li&gt;Gate write operations with explicit confirmation workflows.  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Observability model&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log tool name, thread ID, timestamp, status, and sanitized inputs.  &lt;/li&gt;
&lt;li&gt;Classify failures into runtime, connection, privilege, query, and retrieval.  &lt;/li&gt;
&lt;li&gt;Keep a troubleshooting playbook in your repo.  &lt;/li&gt;
&lt;li&gt;Check whether retrieval results become less accurate as more data is added.&lt;/li&gt;
&lt;/ul&gt;
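&lt;p&gt;As a rough sketch of the first two observability points, the Python below builds a log record with sanitized input and assigns a failure class. Everything named here (ERROR_CLASSES, classify_failure, sanitize, build_log_record) is a hypothetical helper for illustration. The mapping covers only a handful of well-known Oracle error codes and is an assumption about how a team might bucket them, so extend it from the errors you actually see.&lt;/p&gt;

```python
import re
from datetime import datetime, timezone

# Illustrative mapping from a few Oracle error codes to the failure classes above.
ERROR_CLASSES = {
    "ORA-01017": "connection",  # invalid username/password
    "ORA-12154": "connection",  # connect identifier could not be resolved
    "ORA-12541": "connection",  # no listener
    "ORA-01031": "privilege",   # insufficient privileges
    "ORA-00904": "query",       # invalid identifier
    "ORA-00933": "query",       # SQL command not properly ended
}

# Very small redaction rule: key=value pairs whose key looks like a secret.
SECRET_PATTERN = re.compile(r"(?i)\b(password|token|secret)\s*=\s*\S+")


def classify_failure(error_message, stage=None):
    """Return runtime, connection, privilege, query, retrieval, or unclassified."""
    if stage == "retrieval":
        return "retrieval"
    match = re.search(r"ORA-\d{5}", error_message or "")
    if match:
        return ERROR_CLASSES.get(match.group(0), "unclassified")
    # No database error code: treat it as a runtime problem (path, Java, process).
    return "runtime"


def sanitize(text):
    """Mask values of secret-looking key=value pairs before logging."""
    return SECRET_PATTERN.sub(lambda m: m.group(1) + "=***", text)


def build_log_record(tool_name, thread_id, status, tool_input,
                     error_message=None, stage=None):
    """Assemble one log record with the fields listed above."""
    record = {
        "tool_name": tool_name,
        "thread_id": thread_id,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "status": status,
        "tool_input": sanitize(tool_input),
    }
    if status != "ok":
        record["error_class"] = classify_failure(error_message, stage)
    return record


if __name__ == "__main__":
    print(build_log_record("example_tool", "thread-1", "error",
                           "connect password=hunter2",
                           "ORA-01017: invalid username/password"))
```

&lt;p&gt;A single redaction pattern is not a complete sanitizer. Treat it as a placeholder for whatever masking rules your security team requires.&lt;/p&gt;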

&lt;p&gt;&lt;strong&gt;Reliability model&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prefer deterministic SQL patterns with bounded result sets.  &lt;/li&gt;
&lt;li&gt;Use retrieval-first context assembly for memory-heavy tasks.  &lt;/li&gt;
&lt;li&gt;Avoid giant context stuffing as a substitute for memory design.  &lt;/li&gt;
&lt;li&gt;Review and prune tool surfaces periodically. &lt;/li&gt;
&lt;/ul&gt;
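&lt;p&gt;For the first reliability point, one way to keep result sets bounded is to wrap every generated SELECT in a row-limited outer query before it runs. The sketch below uses Oracle's FETCH FIRST n ROWS ONLY row-limiting clause. The helper name bounded, the default of 100 rows, the 10000-row ceiling, and the orders table in the example are all assumptions for illustration.&lt;/p&gt;

```python
def bounded(select_sql, max_rows=100):
    """Wrap a SELECT so it returns at most max_rows rows.

    Hypothetical helper: the limits chosen here are illustrative defaults.
    """
    if (not isinstance(max_rows, int) or isinstance(max_rows, bool)
            or max_rows not in range(1, 10001)):
        raise ValueError("max_rows must be an integer between 1 and 10000")
    inner = select_sql.strip().rstrip(";").strip()
    # Convenience check only; real enforcement belongs in database grants.
    if not inner.lower().startswith(("select", "with")):
        raise ValueError("only SELECT statements can be bounded")
    return f"SELECT * FROM ({inner}) FETCH FIRST {max_rows} ROWS ONLY"


if __name__ == "__main__":
    print(bounded("select id, status from orders order by id;", max_rows=20))
```

&lt;p&gt;The prefix check is not a security control. It only catches accidental misuse; whether a statement is allowed to run is still decided by the database role behind the connection.&lt;/p&gt;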

&lt;p&gt;This is also where teams should align with platform and security teams early. Governance should be designed into the architecture, not bolted on after incidents.  &lt;/p&gt;




&lt;h2&gt;
  
  
  Typical failure modes and how to diagnose them fast 
&lt;/h2&gt;

&lt;p&gt;Most teams hit a predictable set of issues.   &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Runtime failure: sql -mcp does not start&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Check the absolute SQLcl path, confirm Java is available, and run sql -mcp outside Claude first. Resolve runtime problems before checking assistant behavior.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discovery failure: Claude does not see tools&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Check the Claude Desktop JSON, confirm the configured command points to the SQLcl executable, and restart Claude Desktop after edits. If the server starts in a terminal but not from Claude, treat it as a config or environment-path problem.  &lt;/p&gt;
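&lt;p&gt;Those config checks are easy to script. The sketch below assumes the mcpServers layout shown earlier in this article and a server entry named sqlcl. The function check_sqlcl_entry and the file name in the usage example are hypothetical; the actual location of the Claude Desktop config file depends on your operating system and install.&lt;/p&gt;

```python
import json
import os


def check_sqlcl_entry(config, server_name="sqlcl"):
    """Return a list of problems in a parsed MCP config; empty means none found."""
    if not isinstance(config, dict):
        return ["config root is not a JSON object"]
    entry = config.get("mcpServers", {}).get(server_name)
    if entry is None:
        return [f"no '{server_name}' entry under mcpServers"]

    problems = []
    command = entry.get("command", "")
    if not os.path.isabs(command):
        problems.append("command is not an absolute path")
    if not os.path.isfile(command):
        problems.append("command does not point to an existing file")
    if "-mcp" not in entry.get("args", []):
        problems.append("args does not include -mcp")
    return problems


if __name__ == "__main__":
    # Hypothetical file name; use the path of your own Claude Desktop config.
    with open("claude_desktop_config.json", encoding="utf-8") as handle:
        for problem in check_sqlcl_entry(json.load(handle)):
            print("config problem:", problem)
```

&lt;p&gt;A clean result here only rules out the config file. If the tools are still missing after a restart, the next suspect is the environment Claude Desktop launches SQLcl in, for example a missing Java on its path.&lt;/p&gt;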

&lt;p&gt;&lt;strong&gt;Connection failure: tools are present but queries fail immediately&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Check the saved SQLcl connection alias, confirm the profile lives under the expected SQLcl connection store, and verify password persistence for the MCP workflow. Then test the same connection outside Claude.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Permission failure: queries execute selectively and fail on specific objects&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Check the database role first. A selective failure can be the right outcome when least privilege is working. Add grants intentionally, prefer schema allowlists, and keep read-write access separate from the initial validation path.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retrieval quality failure: answers are fluent but weakly grounded&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Inspect the retrieved records before blaming the model. Check chunk size, metadata filters, embedding choice, top-k settings, and whether the query is asking for exact history, semantic similarity, or operational logs.  &lt;/p&gt;




&lt;h2&gt;
  
  
  Why the hybrid model is a strong long-term default
&lt;/h2&gt;

&lt;p&gt;By this point, you’ve probably noticed a pattern: no single layer handles both execution and memory well.  &lt;/p&gt;

&lt;p&gt;Trying to force everything into the assistant gets messy fast. You either lose control over execution, or you end up stuffing too much context into prompts just to keep things working. On the other side, if you only build backend memory systems, you lose the speed and usability that makes assistants useful in the first place.  &lt;/p&gt;

&lt;p&gt;The hybrid approach works because it doesn’t try to solve everything in one place:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Execution stays controlled through MCP.  &lt;/li&gt;
&lt;li&gt;Memory stays durable and queryable in the database.  &lt;/li&gt;
&lt;li&gt;The two are connected where needed, not tightly coupled.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In real teams, this usually evolves over time. It starts simple: Claude with SQLcl MCP, read-only access, and basic workflows. Once people start relying on it, the gaps show up: we lose context, we can’t trace what happened, or we are repeating work.  &lt;/p&gt;

&lt;p&gt;That’s when it makes sense to introduce Oracle AI Agent Memory and retrieval. Not earlier. &lt;/p&gt;

&lt;p&gt;The goal isn’t to build perfect architecture upfront. It’s to add structure where the system starts to break. &lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Setting up Claude with SQLcl MCP works well when treated as an architectural pattern, not just a series of setup steps. Each layer has one job: Claude handles intent, MCP enforces the execution boundary, Oracle AI Database stores durable memory records and audit data, and LangChain handles retrieval orchestration where needed. &lt;/p&gt;

&lt;p&gt;With clear execution boundaries and durable memory, you can trace what happened, understand failures, and evolve the workflow without introducing hidden behavior. &lt;/p&gt;

&lt;p&gt;That shift, from implicit access and ad hoc context to explicit boundaries and durable memory, is what moves AI-assisted workflows from experiments to operational systems. &lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is MCP in this context?&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;MCP is a protocol that lets Claude call explicit tools exposed by a server, rather than accessing systems implicitly.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why use SQLcl for Oracle MCP?&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;SQLcl already understands Oracle workflows and can run as an MCP server with sql -mcp, making integration practical and direct.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is this setup only for Claude Desktop?&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;No. The same MCP and memory architecture concepts can be reused with other MCP-capable clients and backend services.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why include Oracle AI Database if MCP already works?&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;MCP handles execution boundaries. Oracle AI Database handles durable memory records, retrieval, concurrency, and database access controls. Claude’s own memory helps within a session, but it is not designed as an application memory layer.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What versions are required for the SQLcl MCP setup?&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Oracle documents that the SQLcl MCP Server requires Oracle SQLcl 25.2.0 or higher, Oracle JRE 17 or 21, Claude Desktop, and at least one saved SQLcl connection profile with password persistence enabled via -savepwd. Teams should verify the latest compatibility guidance in the official Oracle documentation as MCP support evolves. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where does Oracle AI Agent Memory fit?&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Oracle AI Agent Memory sits between your application code and Oracle AI Database. The package manages threads, durable memories, scoped retrieval, and context cards, while Oracle AI Database remains the storage and enforcement layer underneath. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where does LangChain fit?&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;LangChain is an orchestration layer for tools and retrieval. It can help assemble context and retrieval pipelines, but permissions still belong in the database, infrastructure, and application runtime.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need vector search for every use case?&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;No. Start with structured memory. Add vector retrieval when paraphrase-heavy or concept-level retrieval becomes important.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I prevent risky SQL operations?&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Use least privilege roles, schema allowlists, read-only access where possible, SQLcl MCP restrict levels, and explicit confirmation workflows for high-impact actions.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can this support audit or compliance needs?&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;It can support audit-oriented workflows if tool traces, SQL-level controls, retention policies, and review processes are implemented consistently. Do not treat memory records as the sole authoritative record for regulated or high-impact decisions.  &lt;/p&gt;




&lt;h2&gt;
  
  
  Companion Troubleshooting Appendix
&lt;/h2&gt;


&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Minimum viable setup:&lt;/strong&gt; SQLcl MCP configured in Claude, one approved Oracle connection, read-only validation, and database-side activity logging.  &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;First checks:&lt;/strong&gt; confirm &lt;code&gt;sql -mcp&lt;/code&gt; starts, Claude sees the tools after restart, and the saved SQLcl connection alias resolves.  &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment model:&lt;/strong&gt; use separate credentials and policies for dev, test, and prod, with stricter controls as capability expands.  &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging model:&lt;/strong&gt; capture tool name, timestamp, thread ID, status, sanitized input/output summaries, and relevant SQLcl MCP log records.  &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval quality:&lt;/strong&gt; tune chunk size, enrich metadata, review embedding choice, and evaluate retrieval against representative queries.  &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Common anti-pattern:&lt;/strong&gt; expanding tool surfaces before ownership, logging standards, and runbooks are in place.  &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rollout path:&lt;/strong&gt; pilot in dev with read-only access and strong logging, then expand capabilities in controlled phases.  &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>claude</category>
      <category>oracle</category>
      <category>langchain</category>
      <category>agentmemory</category>
    </item>
    <item>
      <title>Measuring Semantic Cache Quality, Latency, and Provider-Call Avoidance with Oracle AI Database 26ai</title>
      <dc:creator>Mark Nelson</dc:creator>
      <pubDate>Thu, 25 Jun 2026 18:19:53 +0000</pubDate>
      <link>https://dev.to/oracledevs/measuring-semantic-cache-quality-latency-and-provider-call-avoidance-with-oracle-ai-database-26ai-5do6</link>
      <guid>https://dev.to/oracledevs/measuring-semantic-cache-quality-latency-and-provider-call-avoidance-with-oracle-ai-database-26ai-5do6</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Measure answer quality before latency.&lt;/strong&gt; A vector match is only a cache candidate until threshold checks and policy rules approve reuse.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Count provider-call avoidance only after approved reuse.&lt;/strong&gt; An avoided provider call means the application returned a cached answer and skipped generation, not merely that a nearby vector was found.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compare request paths separately.&lt;/strong&gt; No cache, exact cache, semantic cache on the primary database, and semantic cache through Oracle True Cache all do different work. Blending them into one latency average hides the result.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use Oracle True Cache as a read-path option.&lt;/strong&gt; Oracle True Cache can help with eligible read-heavy lookup traffic, while semantic matching, threshold checks, policy approval, write routing, and invalidation policy remain separate responsibilities.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If your semantic cache is returning hits, congratulations: you have reached the dangerous part!&lt;/p&gt;

&lt;p&gt;A hit counter can make a demo look better than it is. Maybe the cached answer was safe to reuse. Maybe the application found a nearby vector match that the policy needed to reject. Maybe the hit rate improved, but every miss now pays for embedding generation, database lookup, provider generation, and write-back. Maybe Oracle True Cache is configured, but the lookup you care about is still using the primary database route.&lt;/p&gt;

&lt;p&gt;That is why measurement discipline matters. In article 1, we drew the architecture boundary: semantic caching is governed answer reuse, not just vector search. In article 2, we implemented the pattern with Spring Boot, a provider abstraction, an Oracle semantic-cache schema, and Oracle True Cache for eligible read-only lookup traffic. Now the developer question is simple: &lt;strong&gt;How do I know whether this cache is helping without fooling myself?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer starts with quality. Then it moves to provider-call accounting, latency by path, and the read-path role of Oracle True Cache.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;This article uses the same demo application from article 2, in the public repository at &lt;code&gt;https://github.com/markxnelson/semantic-cache-oracle-demo&lt;/code&gt;. There is no second demo codebase here; article 3 focuses on how to read the validation and benchmark-lite output from that application.&lt;/p&gt;

&lt;p&gt;You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Linux/Bash environment.&lt;/li&gt;
&lt;li&gt;A checkout of the demo repository.&lt;/li&gt;
&lt;li&gt;The article 2 database and Oracle True Cache stack, started by following the repository setup instructions.&lt;/li&gt;
&lt;li&gt;The Maven wrapper and demo scripts, available and executable.&lt;/li&gt;
&lt;li&gt;Access to the generated reports under &lt;code&gt;reports/generated/&lt;/code&gt; after the validation and benchmark-lite scripts run.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The validation and benchmark-lite commands use the article 2 demo configuration. Before interpreting the results, confirm whether that configuration uses deterministic fixtures, local services, or live provider-backed calls. Provider-call counts from deterministic or stubbed runs are useful for checking the measurement path, but they are not token or billing evidence unless the report records token accounting from a live provider run.&lt;/p&gt;

&lt;p&gt;The demo repository includes the Maven wrapper and the two scripts used in this article:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;semantic-cache-oracle-demo
./scripts/run-validation.sh
./scripts/run-benchmark-lite.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;run-validation.sh&lt;/code&gt; rebuilds the Spring Boot demo, runs the deterministic scenario set, and writes the validation reports. &lt;code&gt;run-benchmark-lite.sh&lt;/code&gt; reuses that generated event data to summarize decision counts, provider-call accounting, and route latency fields.&lt;/p&gt;

&lt;h2&gt;
  
  
  Name the semantic-cache request paths before comparing numbers
&lt;/h2&gt;

&lt;p&gt;A semantic-cache measurement run does not begin with one blended average. It begins by naming the path each request took.&lt;/p&gt;

&lt;p&gt;For this series, the important paths are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No cache&lt;/strong&gt; means the request does not reuse a cache entry. The application calls the provider and, in the demo pattern, writes a new cache entry and event through the primary database path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exact cache&lt;/strong&gt; means a scoped deterministic lookup succeeds. This is usually based on a normalized prompt hash plus tenant, model, prompt template, source, policy, status, and time-to-live predicates. Exact hits can avoid both embedding generation and provider calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic cache on the primary database&lt;/strong&gt; means the application embeds the new prompt, performs vector-aware candidate lookup against the primary Oracle AI Database 26ai route, applies threshold and policy checks, and either returns a cached answer or calls the provider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic cache through Oracle True Cache&lt;/strong&gt; means the read-only candidate lookup is routed through the Oracle True Cache read path when eligible. The application still applies the same threshold and policy checks. In this demo architecture, cache inserts, invalidations, and event records are routed to the primary database path.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvkz3fjpo8micic8wyd05.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvkz3fjpo8micic8wyd05.png" alt="Semantic-cache measurement paths to separate" width="605" height="1024"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Separate no cache, exact cache, semantic cache on the primary database, and semantic cache through the Oracle True Cache read path before interpreting provider calls, avoided calls, or latency.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This separation matters because the paths do different work. An exact hit may avoid embeddings entirely. A semantic hit pays for embedding and lookup but may avoid a provider call. A semantic miss can be slower than no cache because it adds embedding and lookup before the provider call. An Oracle True Cache route can affect the eligible database read portion, but it does not remove embedding generation, application policy evaluation, provider calls on misses, or primary-database write-back.&lt;/p&gt;

&lt;p&gt;The first rule is practical: &lt;strong&gt;compare paths, not vibes.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Validate cache decisions before measuring speed
&lt;/h2&gt;

&lt;p&gt;Before you look at latency, prove the harness can distinguish safe reuse from unsafe reuse.&lt;/p&gt;

&lt;p&gt;From the demo repository directory, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;semantic-cache-oracle-demo

./scripts/start-databases.sh
./scripts/wait-for-oracle.sh

./scripts/run-validation.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The startup script brings up the primary Oracle AI Database 26ai Free container, aligns the password file needed by Oracle True Cache, starts the True Cache service, and registers the primary and PDB-level True Cache services used by the demo. The wait script then checks both database routes before the application validation runs. The validation script builds the application and runs the scenario harness; it is the executable check for this demo.&lt;/p&gt;

&lt;p&gt;After the script completes, open &lt;code&gt;reports/generated/validation-summary.md&lt;/code&gt;. In this demo run, the validation summary separates the scenarios that matter to semantic-cache correctness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A seed miss that calls the provider.&lt;/li&gt;
&lt;li&gt;An exact hit that returns a cached answer without calling the provider.&lt;/li&gt;
&lt;li&gt;A semantic hit that passes threshold and policy checks before reuse.&lt;/li&gt;
&lt;li&gt;A near miss that finds a nearby candidate but rejects it.&lt;/li&gt;
&lt;li&gt;A tenant-isolation case that does not reuse another tenant’s cached answer.&lt;/li&gt;
&lt;li&gt;A model-mismatch case that does not reuse an answer scoped to a different model.&lt;/li&gt;
&lt;li&gt;A source-fingerprint mismatch that does not reuse an answer from a different source version.&lt;/li&gt;
&lt;li&gt;An expired-entry case that proves TTL filters prevent stale reuse.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current generated summary is intentionally short:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Semantic Cache Validation Summary

Status: passed

- `seed-miss`: miss via `primary`, provider calls `1`
- `exact-hit`: exact-hit via `true-cache`, provider calls `0`
- `semantic-hit`: semantic-hit via `true-cache`, provider calls `0`
- `near-miss`: near-miss via `true-cache`, provider calls `1`
- `tenant-isolation`: miss via `primary`, provider calls `1`
- `model-mismatch`: miss via `primary`, provider calls `1`
- `source-fingerprint-mismatch`: miss via `primary`, provider calls `1`
- `expired-entry`: miss via `primary`, provider calls `1`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For measurement work, &lt;code&gt;reports/generated/validation-events.csv&lt;/code&gt; is more useful because it keeps the decision, route, distance, threshold, provider-call count, and latency together:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;th&gt;Route&lt;/th&gt;
&lt;th&gt;Distance&lt;/th&gt;
&lt;th&gt;Threshold&lt;/th&gt;
&lt;th&gt;Provider calls&lt;/th&gt;
&lt;th&gt;Latency ms&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;seed-miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;203&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;exact-hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;exact-hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true-cache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;semantic-hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;semantic-hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true-cache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.000016&lt;/td&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;near-miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;near-miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true-cache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.679840&lt;/td&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tenant-isolation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model-mismatch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;107&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;source-fingerprint-mismatch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;113&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;expired-entry&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;0.1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;160&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is more useful than a single hit-rate number. A useful validation report proves that the harness can distinguish approved reuse from rejected candidates and scoped mismatches. It also makes provider-call accounting visible per scenario.&lt;/p&gt;
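
&lt;p&gt;One way to turn that table into an executable check is to compare each observed row with the decision and provider-call count you expect for the fixture. The sketch below is a minimal Python illustration, not code from the demo repository: &lt;code&gt;OBSERVED&lt;/code&gt;, &lt;code&gt;EXPECTED&lt;/code&gt;, and &lt;code&gt;find_mismatches&lt;/code&gt; are hypothetical names, and the rows are copied by hand from the table rather than read from the CSV file, because the file’s exact column names are not shown here.&lt;/p&gt;

```python
# Rows copied by hand from the validation table:
# scenario -> (decision, route, provider_calls).
OBSERVED = {
    "seed-miss": ("miss", "primary", 1),
    "exact-hit": ("exact-hit", "true-cache", 0),
    "semantic-hit": ("semantic-hit", "true-cache", 0),
    "near-miss": ("near-miss", "true-cache", 1),
    "tenant-isolation": ("miss", "primary", 1),
    "model-mismatch": ("miss", "primary", 1),
    "source-fingerprint-mismatch": ("miss", "primary", 1),
    "expired-entry": ("miss", "primary", 1),
}

# Hypothetical expectations for this fixture:
# scenario -> (expected decision, expected provider calls).
EXPECTED = {
    "seed-miss": ("miss", 1),
    "exact-hit": ("exact-hit", 0),
    "semantic-hit": ("semantic-hit", 0),
    "near-miss": ("near-miss", 1),
    "tenant-isolation": ("miss", 1),
    "model-mismatch": ("miss", 1),
    "source-fingerprint-mismatch": ("miss", 1),
    "expired-entry": ("miss", 1),
}


def find_mismatches(observed, expected):
    """Return readable problems; an empty list means every scenario matched."""
    problems = []
    for scenario, (want_decision, want_calls) in expected.items():
        if scenario not in observed:
            problems.append(f"{scenario}: missing from the report")
            continue
        decision, _route, calls = observed[scenario]
        if decision != want_decision:
            problems.append(f"{scenario}: decision {decision}, expected {want_decision}")
        if calls != want_calls:
            problems.append(f"{scenario}: provider calls {calls}, expected {want_calls}")
    return problems


if __name__ == "__main__":
    problems = find_mismatches(OBSERVED, EXPECTED)
    print("passed" if not problems else "\n".join(problems))
```

&lt;p&gt;Keeping the expectations next to the observations means a quiet change in cache behavior, such as a near miss that starts returning a cached answer, shows up as a named failure instead of a slightly better hit rate.&lt;/p&gt;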

&lt;p&gt;The deterministic fixture behaved as expected: exact and semantic approved hits avoided provider calls in the harness, while near misses and scoped mismatches still called the provider. That does not make the threshold universally safe or turn fixture latency into production performance. It gives you a repeatable way to inspect the cache decision before you start tuning for speed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Separate vector candidates from approved semantic-cache hits
&lt;/h2&gt;

&lt;p&gt;A fast false positive is worse than a miss. If the cache returns a wrong answer quickly, the latency improvement is not a win. The first measurement job is to separate “we found something nearby” from “we safely reused the answer.”&lt;/p&gt;

&lt;p&gt;A useful semantic-cache report separates at least four ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Candidate found:&lt;/strong&gt; the vector lookup found a nearby stored prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Threshold passed:&lt;/strong&gt; the candidate was close enough under the configured metric.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy approved:&lt;/strong&gt; deterministic scope checks allowed reuse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider call avoided:&lt;/strong&gt; the application returned the cached answer instead of calling the provider.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only the last one is an avoided provider call.&lt;/p&gt;

&lt;p&gt;Your validation evidence needs both a positive semantic-hit case and a rejected near-miss case. The semantic-hit case shows a candidate that passed the configured threshold and policy checks. The near-miss case shows that a similar-looking request can still be rejected when it fails the threshold or policy. Those two cases are the heart of semantic-cache quality measurement: the harness proves that a paraphrase can hit, and it also proves that unsafe or insufficiently similar requests can miss.&lt;/p&gt;

&lt;p&gt;Treat threshold values as application settings, not universal tuning advice. A useful threshold depends on the fixture, embedding model, distance metric, prompt domain, and risk tolerance. In your own application, the important question is not “what threshold did the demo use?” The important question is “which labeled prompts pass, which fail, and are those decisions safe?”&lt;/p&gt;
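
&lt;p&gt;The “which labeled prompts pass, which fail” question can be checked with a small labeled set. The following Python sketch is illustrative only. It assumes a candidate is accepted when its distance is at most the threshold, which may not match the demo’s exact comparison. Apart from the two distances taken from the validation table, the labeled pairs are invented for the example, and &lt;code&gt;LabeledPair&lt;/code&gt; and &lt;code&gt;evaluate_threshold&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledPair:
    distance: float       # vector distance between the new prompt and the cached prompt
    safe_to_reuse: bool   # human label: would the cached answer be acceptable?


# The first two distances come from the validation table; the rest are invented.
LABELED_PAIRS = [
    LabeledPair(0.000016, True),
    LabeledPair(0.679840, False),
    LabeledPair(0.08, True),
    LabeledPair(0.09, False),
    LabeledPair(0.15, True),
]


def evaluate_threshold(pairs, threshold):
    """Count outcomes, assuming a candidate is accepted when distance is at most threshold."""
    counts = {"true_accept": 0, "false_accept": 0, "true_reject": 0, "false_reject": 0}
    for pair in pairs:
        accepted = threshold >= pair.distance
        if accepted and pair.safe_to_reuse:
            counts["true_accept"] += 1
        elif accepted:
            counts["false_accept"] += 1   # unsafe reuse: the costly error
        elif pair.safe_to_reuse:
            counts["false_reject"] += 1   # missed saving, but still a correct answer
        else:
            counts["true_reject"] += 1
    return counts


if __name__ == "__main__":
    for threshold in (0.05, 0.1, 0.2):
        print(threshold, evaluate_threshold(LABELED_PAIRS, threshold))
```

&lt;p&gt;False accepts and false rejects are counted separately because they are not symmetric: a false reject costs one provider call, while a false accept returns a wrong answer.&lt;/p&gt;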

&lt;h2&gt;
  
  
  Use Oracle AI Vector Search with relational policy data
&lt;/h2&gt;

&lt;p&gt;Oracle AI Database 26ai includes Oracle AI Vector Search capabilities for storing vectors, calculating vector distance, and using vector indexes for similarity-search workloads. For semantic caching, the useful part is not vector search by itself. It is vector search next to relational policy data.&lt;/p&gt;

&lt;p&gt;Conceptually, the lookup can combine vector distance with deterministic policy filters. The following SQL is an illustrative teaching shape, not a copy-paste replacement for the article 2 repository query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;cached_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cached_answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cached_prompt_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;policy_version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expires_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;your_semantic_cache_table&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;chat_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;chat_model&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;embedding_model&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding_dimension&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;embedding_dimension&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;prompt_template_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;prompt_template_version&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;source_fingerprint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;source_fingerprint&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;policy_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;policy_version&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACTIVE'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;expires_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SYSTIMESTAMP&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The actual bind syntax, vector conversion, table names, column names, and index strategy depend on your driver, framework, and schema. For example, if your application passes vectors as text rather than a native vector bind, use the documented conversion approach for that driver or framework.&lt;/p&gt;

&lt;p&gt;The important point is the combination. &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; calculates a distance value that can be used to order nearest candidates under the configured metric. SQL predicates narrow the search to policy-compatible rows. The application applies the threshold and any additional reuse rules. If no candidate is approved, the application calls the provider and writes the new answer through the primary database path.&lt;/p&gt;

&lt;p&gt;This is why a semantic-cache entry is not just a vector. Cache rows need deterministic scope and lifecycle data, such as tenant, chat model, embedding model, embedding dimension, prompt-template version, source fingerprint, policy version, status, and expiration. Those fields are part of correctness. When they are part of the lookup, hit counts are easier to trust because reuse is constrained to the right tenant, model, source, and policy boundary.&lt;/p&gt;
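
&lt;p&gt;The same scope and lifecycle fields can be mirrored in application code as a defensive check on whatever row the lookup returns. This is a minimal Python sketch under stated assumptions: the field names follow the columns in the illustrative SQL, while &lt;code&gt;CacheScope&lt;/code&gt;, &lt;code&gt;CacheRow&lt;/code&gt;, and &lt;code&gt;is_reusable&lt;/code&gt; are hypothetical and do not come from the demo repository. It does not replace the SQL predicates; it only restates them.&lt;/p&gt;

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CacheScope:
    """Deterministic scope fields; two scopes match only if every field is equal."""
    tenant_id: str
    chat_model: str
    embedding_model: str
    embedding_dimension: int
    prompt_template_version: str
    source_fingerprint: str
    policy_version: str


@dataclass(frozen=True)
class CacheRow:
    scope: CacheScope
    status: str
    expires_at: datetime
    cached_answer: str


def is_reusable(row, request_scope, now):
    """True only when scope, status, and expiration all allow reuse."""
    return (
        row.scope == request_scope
        and row.status == "ACTIVE"
        and row.expires_at > now
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # All values below are placeholders for illustration.
    scope = CacheScope("tenant-a", "chat-model-x", "embed-model-y", 384, "v1", "abc123", "p1")
    row = CacheRow(scope, "ACTIVE", now + timedelta(hours=1), "cached answer")
    print(is_reusable(row, scope, now))
    print(is_reusable(row, replace(scope, tenant_id="tenant-b"), now))
```

&lt;p&gt;Comparing the whole scope as one value makes it harder to forget a field when a new scope dimension is added later.&lt;/p&gt;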

&lt;h2&gt;
  
  
  Count provider-call avoidance only after approved reuse
&lt;/h2&gt;

&lt;p&gt;Developers often want one number from a semantic cache: “How many provider calls did we avoid?” That is a good number, but it needs a strict accounting rule. An avoided provider call means the application returned a cached answer and did not call the generation provider for that request.&lt;/p&gt;

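&lt;p&gt;Use this rule for semantic hits; for an exact hit, the scoped deterministic match takes the place of the candidate and threshold terms:&lt;br&gt;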
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;avoided_provider_call =
  cache_candidate_found
  AND threshold_passed
  AND policy_approved
  AND cached_answer_returned
  AND provider_call_not_made
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Java-like pseudocode, the metric belongs after the decision, not after candidate lookup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isPresent&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;candidate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;passesThreshold&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;approves&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;returnedFromCache&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;incrementProviderCallsAvoided&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;incrementProviderCallsMade&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Near misses are not savings. Tenant mismatches are not savings. Model mismatches are not savings. Oracle True Cache reads that still lead to provider calls are not savings.&lt;/p&gt;
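
&lt;p&gt;Applied to a list of recorded events, the accounting rule can be sketched in a few lines of Python. This is an illustration, not the demo’s implementation: the &lt;code&gt;Event&lt;/code&gt; class, the &lt;code&gt;returned_from_cache&lt;/code&gt; field, and the helper names are assumptions, and the event values are copied by hand from the validation table. Exact hits are treated as approved reuse alongside semantic hits.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    scenario: str
    decision: str
    provider_calls: int
    returned_from_cache: bool  # assumed field; the validation table does not show it


# Decisions that count as approved reuse in this sketch.
APPROVED_DECISIONS = frozenset({"exact-hit", "semantic-hit"})


def is_avoided_provider_call(event):
    """Count a saving only for approved reuse that returned a cached answer with no provider call."""
    return (
        event.decision in APPROVED_DECISIONS
        and event.returned_from_cache
        and event.provider_calls == 0
    )


def summarize(events):
    avoided = sum(1 for event in events if is_avoided_provider_call(event))
    made = sum(event.provider_calls for event in events)
    hit_rate = avoided / len(events) if events else 0.0
    return {"made": made, "avoided": avoided, "approved_hit_rate": hit_rate}


EVENTS = [
    Event("seed-miss", "miss", 1, False),
    Event("exact-hit", "exact-hit", 0, True),
    Event("semantic-hit", "semantic-hit", 0, True),
    Event("near-miss", "near-miss", 1, False),
    Event("tenant-isolation", "miss", 1, False),
    Event("model-mismatch", "miss", 1, False),
    Event("source-fingerprint-mismatch", "miss", 1, False),
    Event("expired-entry", "miss", 1, False),
]

if __name__ == "__main__":
    print(summarize(EVENTS))  # {'made': 6, 'avoided': 2, 'approved_hit_rate': 0.25}
```

&lt;p&gt;The near-miss event found a candidate through the True Cache route, yet it contributes to calls made, not calls avoided.&lt;/p&gt;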

&lt;p&gt;Also be precise about the provider mode. The benchmark-lite report for this demo is intended to confirm measurement wiring for the configured workload. If that workload uses deterministic or stubbed provider behavior, it is useful for repeatability but is not a live billing report. Unless your generated artifacts include token accounting from a provider-backed run, say “provider calls avoided,” not “tokens saved.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Read latency by semantic-cache path
&lt;/h2&gt;

&lt;p&gt;Once quality behavior is visible, latency becomes useful. Your validation and benchmark-lite reports should record latency with enough context to explain what happened. At minimum, inspect latency by scenario, decision, route, and provider-call count.&lt;/p&gt;

&lt;p&gt;Read those values as observations from the workload you ran. They are useful because they prove the harness captures latency by scenario and route. They are not evidence that one path will always be faster than another in your environment. The demo’s benchmark-lite report stays a measurement-wiring check: its route-level figures come from a handful of deterministic samples, so treat them as wiring evidence until you run a representative workload and report path-level latency summaries with units, sample counts, and route labels.&lt;/p&gt;
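
&lt;p&gt;A path-level summary that carries units, sample counts, and route labels can be sketched as follows. The Python below is illustrative: the latency samples are copied by hand from the validation table, the helper names are hypothetical, and the nearest-rank percentile method is an assumption, not necessarily the method the demo script uses.&lt;/p&gt;

```python
import math
from collections import defaultdict

# (route, latency in milliseconds), copied by hand from the validation table.
SAMPLES = [
    ("primary", 203), ("true-cache", 49), ("true-cache", 58), ("true-cache", 64),
    ("primary", 100), ("primary", 107), ("primary", 113), ("primary", 160),
]


def nearest_rank(sorted_values, percentile):
    """Nearest-rank percentile of a non-empty list sorted in ascending order."""
    rank = max(1, math.ceil(percentile / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_by_route(samples):
    """Group latencies by route and report sample count, p50, and p95 in milliseconds."""
    by_route = defaultdict(list)
    for route, latency_ms in samples:
        by_route[route].append(latency_ms)
    summary = {}
    for route, values in sorted(by_route.items()):
        values.sort()
        summary[route] = {
            "samples": len(values),
            "p50_ms": nearest_rank(values, 50),
            "p95_ms": nearest_rank(values, 95),
        }
    return summary


if __name__ == "__main__":
    for route, stats in summarize_by_route(SAMPLES).items():
        print(route, stats)
```

&lt;p&gt;With three to five samples per route, p95 is simply the slowest observation, which is one more reason to read these figures as wiring checks and not as performance results.&lt;/p&gt;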

&lt;p&gt;For your own measurements, break latency into the components you can observe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedding time&lt;/strong&gt; matters for semantic paths. Exact-cache lookup can often skip it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database lookup time&lt;/strong&gt; matters for exact and semantic paths. This is the portion where primary versus Oracle True Cache routing may be relevant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider time&lt;/strong&gt; may dominate misses in provider-backed applications, but deterministic mode does not represent live provider behavior unless you configure and measure a live provider run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write-back time&lt;/strong&gt; appears on misses and on any synchronous event recording path. In this demo pattern, writes are routed through the primary database path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Returned-answer time&lt;/strong&gt; is the end-to-end value your application users experience.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A semantic hit can still be worthwhile even if the cache path adds embedding and database lookup, because it may avoid a provider call that is slower, more expensive, or operationally constrained in your application. A semantic miss can be a latency regression if repeated paraphrases are rare or if the threshold rejects most candidates after doing extra work. That is why misses and near misses belong in the measurement set. Hits alone make a cache look cleaner than it is.&lt;/p&gt;
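
&lt;p&gt;To make the component view concrete, the Python sketch below adds up per-component timings for each path. Every number in it is invented for illustration and is not a measurement from the demo; &lt;code&gt;PathTiming&lt;/code&gt; is a hypothetical name. It only shows why a semantic miss can cost more than the no-cache path: it pays for embedding and lookup and then still calls the provider.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathTiming:
    """Per-request component timings in milliseconds; unused components stay at 0."""
    embedding_ms: int = 0
    lookup_ms: int = 0
    provider_ms: int = 0
    write_back_ms: int = 0

    @property
    def returned_answer_ms(self):
        # End-to-end time, assuming the components run one after another.
        return self.embedding_ms + self.lookup_ms + self.provider_ms + self.write_back_ms


# Invented example values, NOT measurements from the demo.
PATHS = {
    "no-cache": PathTiming(provider_ms=800, write_back_ms=20),
    "exact-hit": PathTiming(lookup_ms=5),
    "semantic-hit": PathTiming(embedding_ms=30, lookup_ms=10),
    "semantic-miss": PathTiming(embedding_ms=30, lookup_ms=10, provider_ms=800, write_back_ms=20),
}

if __name__ == "__main__":
    for name, timing in PATHS.items():
        print(f"{name}: {timing.returned_answer_ms} ms")
```

&lt;p&gt;Recording the components separately lets you see which part of a path changed when the end-to-end number moves.&lt;/p&gt;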

&lt;h2&gt;
  
  
  Put Oracle True Cache in the eligible read path
&lt;/h2&gt;

&lt;p&gt;Oracle AI Database stores and queries the semantic-cache records. Oracle True Cache can participate in eligible read-only SQL lookup traffic. The application still owns threshold evaluation, policy approval, and the final reuse decision.&lt;/p&gt;

&lt;p&gt;That component boundary matters. Oracle True Cache can be part of the route used to read candidate rows. The semantic-cache decision still comes from vector-aware SQL, deterministic filters, threshold logic, and application policy.&lt;/p&gt;

&lt;p&gt;Before interpreting route-level measurements, confirm that the generated validation evidence shows the Oracle True Cache route is queryable. Useful checks include the True Cache database role or open mode, read-only state, application object visibility, and a simple &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; query through the True Cache service. That evidence proves the route is available for validation checks. It does not prove that every semantic-cache query in every workload belongs on Oracle True Cache, and it does not prove a production performance result.&lt;/p&gt;

&lt;p&gt;The boundary is easier to see as a flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request
  -&amp;gt; embedding generation, if semantic lookup is needed
  -&amp;gt; read-only semantic-cache lookup
       -&amp;gt; primary Oracle AI Database 26ai path
       OR eligible Oracle True Cache read path
  -&amp;gt; application threshold and policy checks
  -&amp;gt; cached answer returned, or provider called
  -&amp;gt; cache insert/update routed through the primary database path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The application decision is the same whether the read came from the primary route or the Oracle True Cache route. In this demo architecture, misses, cache inserts, invalidations, and event writes are routed to the primary database path. Oracle True Cache itself is read-only; DML redirection is a separate database capability and is outside the semantic-cache write path described here.&lt;/p&gt;
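
&lt;p&gt;The routing boundary can be expressed as a small decision function. This Python sketch is a simplified illustration of the pattern, not the demo’s routing code: the operation names, the &lt;code&gt;Route&lt;/code&gt; enum, and &lt;code&gt;choose_route&lt;/code&gt; are hypothetical, and “eligible” is reduced here to a fixed set of read-only lookups plus an availability flag.&lt;/p&gt;

```python
from enum import Enum


class Route(Enum):
    PRIMARY = "primary"
    TRUE_CACHE = "true-cache"


# Hypothetical operation names; only read-only lookups are eligible for True Cache here.
READ_ONLY_LOOKUPS = frozenset({"exact-lookup", "semantic-lookup"})


def choose_route(operation, true_cache_available):
    """Send eligible read-only lookups to True Cache; everything else goes to the primary."""
    if operation in READ_ONLY_LOOKUPS and true_cache_available:
        return Route.TRUE_CACHE
    # Inserts, invalidations, and event writes always use the primary database path.
    return Route.PRIMARY


if __name__ == "__main__":
    for operation in ("semantic-lookup", "cache-insert", "invalidate", "event-write"):
        print(operation, choose_route(operation, true_cache_available=True).value)
```

&lt;p&gt;Defaulting to the primary route keeps any operation that is not explicitly listed as a read-only lookup away from the read-only path.&lt;/p&gt;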

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F31l675g9u6fuatvxxqiy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F31l675g9u6fuatvxxqiy.png" alt="Oracle True Cache read-path boundary" width="738" height="1024"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In this demo pattern, Oracle True Cache belongs on eligible read-heavy lookup traffic. Semantic matching still comes from Oracle vector SQL and application policy, while write-back is routed to the primary database path.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There is also a freshness boundary. Oracle True Cache is read-only and consistent, but at any given moment its data might lag the primary database. In a semantic cache, freshness is part of correctness. If an invalidation or source-policy update must take effect immediately, route that check through the primary database path or require a primary-confirmed policy version before reuse.&lt;/p&gt;
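
&lt;p&gt;The freshness rule can be sketched as a small routing helper. This is an illustration only, not code from the demo: the &lt;code&gt;FreshnessRouting&lt;/code&gt; class, the &lt;code&gt;ReadRoute&lt;/code&gt; enum, and the &lt;code&gt;mustBeCurrent&lt;/code&gt; and &lt;code&gt;trueCacheAvailable&lt;/code&gt; flags are all hypothetical names.&lt;/p&gt;

```java
// Hypothetical sketch: pick a read route for one semantic-cache check.
// Not demo code; the enum, method names, and flags are illustrative assumptions.
class FreshnessRouting {

    enum ReadRoute { PRIMARY, TRUE_CACHE }

    // mustBeCurrent: an invalidation or policy update has to be visible immediately.
    // trueCacheAvailable: the True Cache service passed its readiness check.
    static ReadRoute chooseReadRoute(boolean mustBeCurrent, boolean trueCacheAvailable) {
        if (mustBeCurrent) {
            return ReadRoute.PRIMARY; // freshness-critical checks go to the primary
        }
        return trueCacheAvailable ? ReadRoute.TRUE_CACHE : ReadRoute.PRIMARY;
    }

    // Alternative guard: allow reuse only when the entry's policy version
    // equals the version most recently confirmed on the primary.
    static boolean policyVersionConfirmed(String entryVersion, String primaryVersion) {
        if (entryVersion == null) {
            return false;
        }
        return entryVersion.equals(primaryVersion);
    }

    public static void main(String[] args) {
        System.out.println(chooseReadRoute(true, true));
        System.out.println(chooseReadRoute(false, true));
        System.out.println(policyVersionConfirmed("policy-v1", "policy-v2"));
    }
}
```

&lt;p&gt;Either guard keeps the decision explicit: a lookup that cannot tolerate lag never reaches the True Cache route, and a cached answer is reused only when its policy version has been confirmed against the primary.&lt;/p&gt;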

&lt;h2&gt;
  
  
  Use benchmark-lite as a measurement-wiring check
&lt;/h2&gt;

&lt;p&gt;Now run the benchmark-lite script and inspect &lt;code&gt;reports/generated/benchmark-lite-summary.md&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;semantic-cache-oracle-demo

./scripts/run-benchmark-lite.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The benchmark-lite report is a measurement-wiring check, not a production benchmark. The demo report summarizes the deterministic validation workload with scenario count, provider mode, decision counts, provider-call accounting, approved cache-hit rate, and p50/p95 latency by route.&lt;/p&gt;

&lt;p&gt;The script also prints the summary to the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;== Benchmark lite summary ==
This lite report reuses the deterministic validation workload to confirm measurement wiring. It is not a production performance benchmark.
Scenarios: 8
Provider mode: deterministic mock
Embedding mode: deterministic fixture vectors
Decision counts:
- exact-hit: 1
- miss: 5
- near-miss: 1
- semantic-hit: 1
Provider calls made: 6
Provider calls avoided by approved reuse: 2
Cache hit rate for approved exact or semantic reuse: 25.00%
Latency by route:
- primary: samples=5 p50_ms=113 p95_ms=203
- true-cache: samples=3 p50_ms=58 p95_ms=64
Route by scenario:
- seed-miss: decision=miss route=primary provider_calls=1 distance=n/a
- exact-hit: decision=exact-hit route=true-cache provider_calls=0 distance=n/a
- semantic-hit: decision=semantic-hit route=true-cache provider_calls=0 distance=1.622126787093059E-5
- near-miss: decision=near-miss route=true-cache provider_calls=1 distance=0.6798398782881692
- tenant-isolation: decision=miss route=primary provider_calls=1 distance=n/a
- model-mismatch: decision=miss route=primary provider_calls=1 distance=n/a
- source-fingerprint-mismatch: decision=miss route=primary provider_calls=1 distance=n/a
- expired-entry: decision=miss route=primary provider_calls=1 distance=n/a
Generated reports:
- reports/generated/benchmark-lite-events.csv
- reports/generated/benchmark-lite-summary.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That route-by-scenario block is the part to look at when you want to know whether each request was satisfied from the True Cache read route or had to go back through the primary path:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;What happened&lt;/th&gt;
&lt;th&gt;Route&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;seed-miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No reusable entry existed, so the provider was called and the answer was written through the primary path.&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;exact-hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The scoped prompt hash matched and the provider was skipped.&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true-cache&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;semantic-hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The scoped vector candidate passed the threshold and the provider was skipped.&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true-cache&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;near-miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The nearest candidate failed the threshold, so the provider was called.&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;true-cache&lt;/code&gt; lookup, then primary write path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tenant-isolation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scope rejected reuse for a different tenant.&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model-mismatch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scope rejected reuse for a different embedding model.&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;source-fingerprint-mismatch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scope rejected reuse for a different source fingerprint.&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;expired-entry&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;TTL filtering rejected the stale entry.&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here is the important excerpt from the generated summary:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Report field&lt;/th&gt;
&lt;th&gt;Value from the demo run&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scenarios&lt;/td&gt;
&lt;td&gt;&lt;code&gt;8&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Provider mode&lt;/td&gt;
&lt;td&gt;&lt;code&gt;deterministic mock&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding mode&lt;/td&gt;
&lt;td&gt;&lt;code&gt;deterministic fixture vectors&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency units&lt;/td&gt;
&lt;td&gt;&lt;code&gt;milliseconds&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decision counts&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;exact-hit: 1&lt;/code&gt;, &lt;code&gt;semantic-hit: 1&lt;/code&gt;, &lt;code&gt;near-miss: 1&lt;/code&gt;, &lt;code&gt;miss: 5&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Provider calls made&lt;/td&gt;
&lt;td&gt;&lt;code&gt;6&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Provider calls avoided by approved reuse&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Approved exact-or-semantic cache-hit rate&lt;/td&gt;
&lt;td&gt;&lt;code&gt;25.00%&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Primary route latency&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;5 samples&lt;/code&gt;, &lt;code&gt;p50: 113 ms&lt;/code&gt;, &lt;code&gt;p95: 203 ms&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;True Cache route latency&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;3 samples&lt;/code&gt;, &lt;code&gt;p50: 58 ms&lt;/code&gt;, &lt;code&gt;p95: 64 ms&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fu4zc7xwru1stf2e8xxyy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fu4zc7xwru1stf2e8xxyy.png" alt="Benchmark-lite run results graph" width="800" height="832"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The deterministic benchmark-lite run records provider-call accounting and route latency for the validation workload. It shows measurement wiring, not production performance.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Read that table carefully. It proves that the harness records useful categories and route labels for this deterministic run. It does not prove that True Cache is faster for your workload, that the route samples are comparable, or that the demo represents live provider latency, concurrency, warm-up behavior, token billing, or production traffic mix.&lt;/p&gt;
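
&lt;p&gt;Sample size is one reason the route latencies are not comparable. The sketch below uses a nearest-rank percentile over three invented latency values to show that, with so few samples, p95 is simply the slowest observation. The class name, the sample values, and the choice of the nearest-rank method are assumptions; the article does not show how the demo computes its percentiles.&lt;/p&gt;

```java
import java.util.Arrays;

// Hypothetical sketch: nearest-rank percentiles over a very small latency sample.
// The sample values are invented; the demo's own percentile method is not shown in the article.
class SmallSamplePercentiles {

    // Nearest-rank percentile: the value at rank ceil(p / 100 * n) in the sorted sample.
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] threeSamples = {51, 58, 64}; // invented values, in milliseconds
        System.out.println("p50=" + percentile(threeSamples, 50));
        System.out.println("p95=" + percentile(threeSamples, 95));
        System.out.println("max=" + percentile(threeSamples, 100));
    }
}
```

&lt;p&gt;With three samples, p95 and the maximum are the same value, so a single slow or fast request moves the reported tail. That is useful for checking that latency fields are recorded, but not for comparing routes.&lt;/p&gt;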

&lt;p&gt;If you extend the harness into a fuller path comparison, use stable labels such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;no-cache&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;exact-cache&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;semantic-cache-primary&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;semantic-cache-true-cache&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The labels in the article, report, scripts, and visuals must match the code path each request actually used.&lt;/p&gt;
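
&lt;p&gt;One way to keep those labels from drifting is to define them once and resolve every report value against that definition. The following is a minimal sketch under that assumption; the &lt;code&gt;CachePath&lt;/code&gt; enum is hypothetical and is not part of the demo repository.&lt;/p&gt;

```java
// Hypothetical sketch: one enum as the single source of path labels.
// The label strings come from the list in this article; the enum itself is an assumption.
enum CachePath {
    NO_CACHE("no-cache"),
    EXACT_CACHE("exact-cache"),
    SEMANTIC_CACHE_PRIMARY("semantic-cache-primary"),
    SEMANTIC_CACHE_TRUE_CACHE("semantic-cache-true-cache");

    private final String label;

    CachePath(String label) {
        this.label = label;
    }

    String label() {
        return label;
    }

    // Resolves a label read from a report and rejects anything that is not a known path.
    static CachePath fromLabel(String label) {
        for (CachePath path : values()) {
            if (path.label.equals(label)) {
                return path;
            }
        }
        throw new IllegalArgumentException("Unknown path label: " + label);
    }

    public static void main(String[] args) {
        System.out.println(fromLabel("semantic-cache-true-cache"));
    }
}
```

&lt;p&gt;Failing fast on an unknown label surfaces a renamed or misspelled path when the report is parsed, rather than later as a silently missing series in a chart.&lt;/p&gt;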

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fkggz8umyu1drmb3z5fo6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fkggz8umyu1drmb3z5fo6.png" alt="Reading the benchmark-lite report" width="800" height="998"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Read benchmark-lite from evidence to interpretation: quality signals, provider-call accounting, route notes, latency context, and limitations all matter before drawing conclusions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A production-ready measurement report answers the questions that the demo report only starts to address: what workload ran, whether the provider and embeddings were deterministic or live, which path each scenario used, what units and sample sizes were recorded, whether warm-up and run order were controlled, whether provider calls were made or avoided, whether Oracle True Cache was available and actually used, and which limitations stay visible when someone reads the report later.&lt;/p&gt;

&lt;p&gt;If the report does not answer those questions, improve the report before improving the graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adapt the harness to your own prompts
&lt;/h2&gt;

&lt;p&gt;The next step is not to tune for the highest hit rate. The next step is to add representative prompts and inspect the rejected cases.&lt;/p&gt;

&lt;p&gt;Start with a small labeled set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;safe paraphrase:
  "How long do I have to return unopened shoes?"
  "What is the return window for shoes I have not worn?"

near miss:
  "Can I return worn shoes after 90 days?"

scope rejection:
  same wording, different tenant
  same wording, different chat model
  same wording, different source fingerprint or policy version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then change one thing at a time: adjust a single threshold or policy setting, rerun validation, compare the generated validation summary, and rerun benchmark-lite only after the quality behavior still looks safe.&lt;/p&gt;

&lt;p&gt;Use the same scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;semantic-cache-oracle-demo

./scripts/run-validation.sh
./scripts/run-benchmark-lite.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the same fixture, embedding model, and distance metric, a stricter threshold usually reduces semantic approvals and increases misses or near misses. A looser threshold may increase approvals, but it can also admit unsafe reuse. Policy filters reject candidates that are semantically close but unsafe across tenant, model, prompt-template, source, policy, status, or time-to-live boundaries.&lt;/p&gt;
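
&lt;p&gt;The threshold effect is easy to see with a fixed set of candidate distances. In this sketch the first two distances are rounded from the demo output (the semantic hit and the near miss), the third is invented, and approval is assumed to mean a cosine distance at or below the threshold. The &lt;code&gt;ThresholdSweep&lt;/code&gt; class is hypothetical, and scope checks are assumed to have already passed.&lt;/p&gt;

```java
// Hypothetical sketch: how the threshold changes approvals for fixed candidate distances.
// 0.000016 and 0.68 are rounded from the demo output; 0.15 is invented for illustration.
class ThresholdSweep {

    // Counts candidates whose distance is at or below the threshold (an assumed comparison rule).
    static int approvals(double[] distances, double threshold) {
        int approved = 0;
        for (double distance : distances) {
            if (distance <= threshold) {
                approved++;
            }
        }
        return approved;
    }

    public static void main(String[] args) {
        double[] distances = {0.000016, 0.68, 0.15};
        double[] thresholds = {0.05, 0.1, 0.2, 0.7};
        for (double threshold : thresholds) {
            System.out.println("threshold=" + threshold
                + " approvals=" + approvals(distances, threshold));
        }
    }
}
```

&lt;p&gt;Thresholds of 0.05 and 0.1 approve one candidate, 0.2 approves two, and 0.7 approves all three, including the candidate the demo labels a near miss. A higher approval count at the loosest setting is exactly the kind of hit-rate gain that needs a labeled near-miss set to catch.&lt;/p&gt;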

&lt;p&gt;Tune quality first. A threshold that improves hit rate while returning unsafe answers is a regression.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical interpretation rules for semantic-cache reports
&lt;/h2&gt;

&lt;p&gt;When you review your generated reports, require labeled near misses. If the workload only contains obvious hits, it cannot tell you whether the cache is safe.&lt;/p&gt;

&lt;p&gt;Separate exact hits from semantic hits. Exact hits are usually cheaper because they can avoid embedding and vector lookup. Semantic hits are valuable when paraphrased repetition is common enough to justify the extra work.&lt;/p&gt;

&lt;p&gt;Count provider-call avoidance only after approved reuse. A rejected vector candidate is not a hit. An Oracle True Cache lookup that still leads to a provider call is not a saved provider call.&lt;/p&gt;
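
&lt;p&gt;That accounting rule can be written down in a few lines. The decision labels below match the demo output; the &lt;code&gt;ProviderCallAccounting&lt;/code&gt; class and its methods are a hypothetical sketch rather than the demo's implementation, and the sketch assumes every request that is not approved reuse makes exactly one provider call.&lt;/p&gt;

```java
import java.util.Locale;

// Hypothetical sketch: provider-call accounting from per-scenario decisions.
// Decision labels match the demo output; the counting code is an assumption.
class ProviderCallAccounting {

    // Only approved exact or semantic reuse counts as an avoided provider call.
    static boolean isApprovedReuse(String decision) {
        return "exact-hit".equals(decision) || "semantic-hit".equals(decision);
    }

    static int callsAvoided(String[] decisions) {
        int avoided = 0;
        for (String decision : decisions) {
            if (isApprovedReuse(decision)) {
                avoided++;
            }
        }
        return avoided;
    }

    // Approved-reuse rate as a percentage of all scenarios.
    static double hitRatePercent(String[] decisions) {
        if (decisions.length == 0) {
            return 0.0;
        }
        return 100.0 * callsAvoided(decisions) / decisions.length;
    }

    public static void main(String[] args) {
        // Decisions in scenario order, as printed by the demo run.
        String[] decisions = {
            "miss", "exact-hit", "semantic-hit", "near-miss", "miss", "miss", "miss", "miss"
        };
        int avoided = callsAvoided(decisions);
        int made = decisions.length - avoided; // assumes one provider call per non-reuse request
        System.out.println(String.format(Locale.ROOT,
            "made=%d avoided=%d hitRate=%.2f%%", made, avoided, hitRatePercent(decisions)));
    }
}
```

&lt;p&gt;For the eight demo decisions this gives 6 calls made, 2 avoided, and a 25.00% approved-reuse rate, which matches the generated summary. The near miss went through the True Cache route but still counts as a provider call.&lt;/p&gt;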

&lt;p&gt;Keep route evidence visible. If you plan to use Oracle True Cache, the report should show which requests used the True Cache read route and which write operations were routed to the primary database path.&lt;/p&gt;

&lt;p&gt;Keep benchmark-lite limitations in the report itself. A deterministic local workload is useful for wiring and behavior. Production benchmarking needs representative prompts, realistic provider mode, concurrency, warm-up, isolation, run order, sample sizes, latency distribution, route validation, and operational monitoring.&lt;/p&gt;

&lt;p&gt;These rules may sound conservative, but they make the cache easier to trust. The goal is not to make the demo look fast. The goal is to make the reuse decision observable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: make semantic caching measurable before making it faster
&lt;/h2&gt;

&lt;p&gt;A semantic cache helps when it safely avoids work the application would otherwise repeat. That means correctness and cost have to be measured together.&lt;/p&gt;

&lt;p&gt;Start with validation. Prove that exact hits, semantic hits, misses, near misses, tenant isolation, model mismatch, source-fingerprint mismatch, and expired-entry rejection behave differently. Then inspect provider-call accounting. Count avoided calls only when approved reuse returns the cached answer. Then read latency by path, not as one blended number. Finally, decide whether Oracle True Cache belongs in the eligible read-heavy lookup path for your workload.&lt;/p&gt;

&lt;p&gt;The demo gives you a compact starting point:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;semantic-cache-oracle-demo

./scripts/run-validation.sh
./scripts/run-benchmark-lite.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, replace the fixture prompts with representative traffic from your application. Keep the generated reports under review, and tune the threshold and policy rules before chasing latency improvements.&lt;/p&gt;

&lt;p&gt;A semantic cache is only useful when it is both safe and cheaper than generation. Measure those two things together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources and further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/" rel="noopener noreferrer"&gt;Oracle AI Vector Search documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/vector_distance.html" rel="noopener noreferrer"&gt;Oracle &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; SQL function&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/vector.html" rel="noopener noreferrer"&gt;Oracle &lt;code&gt;VECTOR&lt;/code&gt; data type&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/create-vector-index.html" rel="noopener noreferrer"&gt;Oracle &lt;code&gt;CREATE VECTOR INDEX&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/overview-oracle-true-cache.html" rel="noopener noreferrer"&gt;Oracle True Cache overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/methods-connecting-true-cache.html" rel="noopener noreferrer"&gt;Connecting applications to Oracle True Cache&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.spring.io/spring-ai/reference/api/vectordbs.html" rel="noopener noreferrer"&gt;Spring AI vector databases reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.spring.io/spring-ai/reference/api/vectordbs/oracle.html" rel="noopener noreferrer"&gt;Spring AI Oracle Vector Store reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Implement an Oracle-Backed Semantic Cache for Spring Applications</title>
      <dc:creator>Mark Nelson</dc:creator>
      <pubDate>Wed, 24 Jun 2026 15:01:23 +0000</pubDate>
      <link>https://dev.to/oracledevs/implement-an-oracle-backed-semantic-cache-for-spring-applications-44nc</link>
      <guid>https://dev.to/oracledevs/implement-an-oracle-backed-semantic-cache-for-spring-applications-44nc</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The demo keeps semantic caching small enough to inspect.&lt;/strong&gt; It uses Spring Boot, JDBC, Oracle AI Database 26ai Free, Oracle True Cache, deterministic fixture vectors, and a small Java cache service so the database behavior is visible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Vector search proposes one candidate; policy decides whether it can be reused.&lt;/strong&gt; The demo checks tenant, chat model, embedding model, embedding dimension, prompt template, source fingerprint, policy version, status, TTL, and a cosine-distance threshold before returning a cached answer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Oracle True Cache is validated as a read route, not as the semantic engine.&lt;/strong&gt; Exact and semantic lookup SQL can run through the True Cache service, while cache inserts and event writes go to the primary Oracle AI Database route.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The benchmark-lite report is wiring evidence, not a performance benchmark.&lt;/strong&gt; It confirms that the validation workload records decisions, routes, provider calls, distances, thresholds, and latency fields; it does not claim production latency, cost, or scalability results.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Article 1 defined semantic caching as governed answer reuse. This article turns that architecture into a runnable demo.&lt;/p&gt;

&lt;p&gt;The point of the demo is not to hide the policy behind a framework abstraction. It is to make the important boundaries visible: what gets stored, what SQL runs, which route handles reads, which route handles writes, when a candidate becomes a hit, and when the provider must still be called.&lt;/p&gt;

&lt;p&gt;That means the implementation is deliberately more direct than a production Spring AI application. The sample app is a Spring Boot command-line application. It creates two Oracle JDBC data sources in Java, constructs a &lt;code&gt;SemanticCacheService&lt;/code&gt;, and runs eight deterministic ecommerce-returns scenarios. The vectors are fixed fixture vectors supplied by the scenarios, not embeddings generated from prompt text by OpenAI, OCI Generative AI, or a Spring AI embedding model.&lt;/p&gt;

&lt;p&gt;That is a useful tradeoff for this article. We can validate Oracle AI Database vector storage, &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt;, policy predicates, Oracle True Cache routing, miss behavior, near-miss rejection, and event reporting without adding provider variability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F0ox75bt2cuzgrxqyqhcr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F0ox75bt2cuzgrxqyqhcr.png" alt="Implementation topology for the semantic-cache demo" width="800" height="752"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1. The demo keeps the cache service, Oracle AI Database primary route, Oracle True Cache read route, cache tables, event table, deterministic provider, and generated reports visible.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Demo Contains
&lt;/h2&gt;

&lt;p&gt;The public demo code is available at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/markxnelson/semantic-cache-oracle-demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important files are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;semantic-cache-oracle-demo/
  docker-compose.yml
  .env.example
  app/
    src/main/java/com/example/semcache/app/SemanticCacheDemoApplication.java
  oracle-semantic-cache/
    src/main/java/com/example/semcache/oracle/AnswerProvider.java
    src/main/java/com/example/semcache/oracle/SemanticCacheRequest.java
    src/main/java/com/example/semcache/oracle/SemanticCacheResponse.java
    src/main/java/com/example/semcache/oracle/SemanticCacheService.java
  db/
    init/
      001-create-app-user.sql
      010-semantic-cache-schema.sql
  scripts/
    start-databases.sh
    wait-for-oracle.sh
    run-validation.sh
    run-benchmark-lite.sh
  reports/
    generated/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This article keeps the Java application intentionally narrow. The Spring Boot app is a command-line validation harness around the Oracle semantic-cache service, not a full chatbot server. It does not include Spring AI starter dependencies, provider profiles, REST controllers, or a separate RAG schema.&lt;/p&gt;

&lt;p&gt;The demo is still useful because it validates the Oracle-backed semantic-cache core that a Spring AI application could call from its own prompt flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understand the Java Application
&lt;/h2&gt;

&lt;p&gt;The Java application has two small layers.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;app/&lt;/code&gt; contains &lt;code&gt;SemanticCacheDemoApplication&lt;/code&gt;, the Spring Boot entry point. It implements &lt;code&gt;CommandLineRunner&lt;/code&gt;, so running the jar executes the demo once and exits. That runner reads environment variables, creates the Oracle JDBC connections, constructs the cache service, runs the fixture scenarios, writes reports, and fails the process if any scenario returns the wrong decision.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;oracle-semantic-cache/&lt;/code&gt; contains the reusable cache code. &lt;code&gt;SemanticCacheService&lt;/code&gt; owns the lookup and write logic. &lt;code&gt;SemanticCacheRequest&lt;/code&gt; carries the prompt, scope fields, fixture embedding, and distance threshold. &lt;code&gt;SemanticCacheResponse&lt;/code&gt; carries the decision that ends up in the reports.&lt;/p&gt;

&lt;p&gt;The entry point is deliberately easy to follow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;DataSource&lt;/span&gt; &lt;span class="n"&gt;primary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;oracleDataSource&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PRIMARY_JDBC_URL"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"jdbc:oracle:thin:@//localhost:1521/FREEPDB1"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;DataSource&lt;/span&gt; &lt;span class="n"&gt;read&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;oracleDataSource&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"TRUE_CACHE_JDBC_URL"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PRIMARY_JDBC_URL"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"jdbc:oracle:thin:@//localhost:1521/FREEPDB1"&lt;/span&gt;&lt;span class="o"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;SemanticCacheService&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SemanticCacheService&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;primary&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;readRouteName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deterministicProvider&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
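
&lt;p&gt;The nested &lt;code&gt;env(...)&lt;/code&gt; call for the read data source implies a fallback chain: use &lt;code&gt;TRUE_CACHE_JDBC_URL&lt;/code&gt; if it is set, otherwise &lt;code&gt;PRIMARY_JDBC_URL&lt;/code&gt;, otherwise the local default. The article does not show the helper itself, so the version below is a hypothetical sketch. It takes the environment as a &lt;code&gt;Map&lt;/code&gt; so the behavior can be checked without real environment variables, and it treats a blank value as unset, which is an assumption.&lt;/p&gt;

```java
import java.util.Map;

// Hypothetical sketch of an env(name, fallback) helper and the nested read-route fallback.
// The demo's real helper is not shown in the article; treating blank values as unset is an assumption.
class EnvFallback {

    static final String DEFAULT_URL = "jdbc:oracle:thin:@//localhost:1521/FREEPDB1";

    static String env(Map<String, String> environment, String name, String fallback) {
        String value = environment.get(name);
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        return value;
    }

    // Mirrors the nested call: True Cache URL, else primary URL, else the local default.
    static String readRouteUrl(Map<String, String> environment) {
        String primaryUrl = env(environment, "PRIMARY_JDBC_URL", DEFAULT_URL);
        return env(environment, "TRUE_CACHE_JDBC_URL", primaryUrl);
    }

    public static void main(String[] args) {
        System.out.println(readRouteUrl(System.getenv()));
    }
}
```

&lt;p&gt;With a fallback like this, an unset &lt;code&gt;TRUE_CACHE_JDBC_URL&lt;/code&gt; sends reads to the primary URL without any error, so the route name recorded in the reports has to reflect the route that was actually configured.&lt;/p&gt;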



&lt;p&gt;The runner then resets the demo tables, seeds one expired entry, and executes eight requests. The first request creates the initial cache entry. The next two prove exact and semantic reuse. The remaining scenarios prove that near misses, tenant differences, model differences, source changes, and expired entries do not reuse the old answer.&lt;/p&gt;
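
&lt;p&gt;The rejection scenarios share one idea: scope and expiry are checked before vector distance matters. The sketch below illustrates that ordering in plain Java. It is not the demo's implementation; the &lt;code&gt;ScopeCheck&lt;/code&gt; class, its field names, the &lt;code&gt;store-b&lt;/code&gt; tenant, and the use of scenario names as rejection reasons are assumptions, and only four of the scope fields are modeled.&lt;/p&gt;

```java
import java.time.Instant;
import java.util.Objects;

// Hypothetical sketch: scope and TTL checks that run before any distance threshold.
// Field names and reason strings are assumptions; the demo's own checks are not reproduced here.
class ScopeCheck {

    static final class Scope {
        final String tenant;
        final String chatModel;
        final String embeddingModel;
        final String sourceFingerprint;

        Scope(String tenant, String chatModel, String embeddingModel, String sourceFingerprint) {
            this.tenant = tenant;
            this.chatModel = chatModel;
            this.embeddingModel = embeddingModel;
            this.sourceFingerprint = sourceFingerprint;
        }
    }

    // Returns the first reason a cached entry cannot be reused, or "none" if it is in scope and not expired.
    static String rejectionReason(Scope request, Scope entry, Instant entryExpiresAt, Instant now) {
        if (!Objects.equals(request.tenant, entry.tenant)) {
            return "tenant-isolation";
        }
        if (!Objects.equals(request.chatModel, entry.chatModel)) {
            return "model-mismatch";
        }
        if (!Objects.equals(request.embeddingModel, entry.embeddingModel)) {
            return "model-mismatch";
        }
        if (!Objects.equals(request.sourceFingerprint, entry.sourceFingerprint)) {
            return "source-fingerprint-mismatch";
        }
        if (!now.isBefore(entryExpiresAt)) {
            return "expired-entry";
        }
        return "none";
    }

    public static void main(String[] args) {
        Scope seed = new Scope("store-a", "gpt-4o-mini", "text-embedding-3-small", "returns-policy-2026-01");
        Scope otherTenant = new Scope("store-b", "gpt-4o-mini", "text-embedding-3-small", "returns-policy-2026-01");
        Instant june = Instant.parse("2026-06-01T00:00:00Z");
        Instant july = Instant.parse("2026-07-01T00:00:00Z");
        System.out.println(rejectionReason(otherTenant, seed, july, june));
        System.out.println(rejectionReason(seed, seed, june, july));
        System.out.println(rejectionReason(seed, seed, july, june));
    }
}
```

&lt;p&gt;Returning a reason rather than a boolean keeps each rejection visible in a report, which is the same property the validation scenarios are designed to prove.&lt;/p&gt;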

&lt;p&gt;Each request is explicit. For example, the semantic-hit request uses the same tenant and policy scope as the seed request, but a slightly different prompt and a nearby fixture vector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"semantic-hit"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"store-a"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"What is the return window for shoes I have not worn?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.101&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.199&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.302&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.398&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;)));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The demo does not ask a model to create that embedding. The vector is part of the fixture so the result is repeatable. That makes the database behavior easier to inspect: if a scenario fails, the problem is in the cache policy, route setup, SQL, or report generation.&lt;/p&gt;
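
&lt;p&gt;To see why a nearby fixture vector produces such a small distance, the cosine distance can be computed by hand. In the demo the distance comes from Oracle's &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt;, not from Java code, so this is only an illustrative sketch. The seed vector &lt;code&gt;(0.1, 0.2, 0.3, 0.4)&lt;/code&gt; is an assumption, because the article shows only the semantic-hit vector, and the &lt;code&gt;CosineDistanceSketch&lt;/code&gt; class is hypothetical.&lt;/p&gt;

```java
// Hypothetical sketch: cosine distance between two fixture vectors in plain Java.
// The seed vector is an assumption; the demo computes distance in the database with VECTOR_DISTANCE().
class CosineDistanceSketch {

    // Cosine distance = 1 - (a . b) / (|a| * |b|). Zero-length vectors are not handled here.
    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] assumedSeed = {0.1, 0.2, 0.3, 0.4};           // assumed, not shown in the article
        double[] semanticHit = {0.101, 0.199, 0.302, 0.398};   // from the semantic-hit request
        double distance = cosineDistance(assumedSeed, semanticHit);
        System.out.println("distance=" + distance + " withinThreshold=" + (distance <= 0.1));
    }
}
```

&lt;p&gt;Under that assumption the distance is on the order of 1e-5, far below the 0.1 threshold used in the validation run. Exact digits can differ from the database result because of how vector elements are stored and rounded.&lt;/p&gt;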

&lt;h2&gt;
  
  
  Start with the Database Boundary
&lt;/h2&gt;

&lt;p&gt;The Docker Compose topology starts a primary Oracle AI Database 26ai Free container and an Oracle True Cache container. Bring that stack up before reading the database evidence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/markxnelson/semantic-cache-oracle-demo.git
&lt;span class="nb"&gt;cd &lt;/span&gt;semantic-cache-oracle-demo

&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Review .env and update any ports, passwords, or image names needed for your machine.&lt;/span&gt;

./scripts/start-databases.sh
./scripts/wait-for-oracle.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;start-databases.sh&lt;/code&gt; starts the primary database first, copies the generated primary password file needed by True Cache, starts the True Cache container, and registers the True Cache services from the primary database. &lt;code&gt;wait-for-oracle.sh&lt;/code&gt; then waits until both the primary PDB service and the registered True Cache PDB service can answer a simple SQL query as the application user.&lt;/p&gt;

&lt;p&gt;When the services are ready, the script prints output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;primary database: ready
true cache: ready
app schema: SEMCACHE_APP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now run the validation script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./scripts/run-validation.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That command checks that both the primary database and the registered True Cache PDB service can accept the application login, builds the sample application, runs the deterministic validation workload, and writes the evidence files under &lt;code&gt;reports/generated/&lt;/code&gt;. If the readiness check cannot see &lt;code&gt;semcache_pdb_tc&lt;/code&gt;, rerun &lt;code&gt;./scripts/start-databases.sh&lt;/code&gt; so the script can repair and register the demo True Cache services before validation.&lt;/p&gt;

&lt;p&gt;The validation script prints what it is doing as it goes. The start of a healthy run looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;== Semantic cache validation wrapper ==
This validates Oracle primary plus True Cache readiness, then runs deterministic semantic-cache scenarios.
The Java app prints scenario-level decision, route, provider calls, distance, and threshold details.
Generated reports are written under reports/generated/.

Checking primary database and registered True Cache application login readiness...
Waiting for primary Oracle AI Database 26ai Free service...
primary database: ready
Waiting for Oracle True Cache lookup service...
true cache: ready
app schema: SEMCACHE_APP
Building the Spring Boot validation app...
Running deterministic semantic-cache scenarios...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the Java app prints each scenario. Here are three representative entries from the current validation run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scenario 1/8: seed-miss
Prompt: How long do I have to return unopened shoes?
Scope: tenant=store-a chat_model=gpt-4o-mini embedding_model=text-embedding-3-small source=returns-policy-2026-01
Result: decision=miss route=primary provider_calls=1 distance=n/a threshold=0.1
Why it matters: First request seeds the cache through the provider because no reusable entry exists.

Scenario 2/8: exact-hit
Prompt: How long do I have to return unopened shoes?
Scope: tenant=store-a chat_model=gpt-4o-mini embedding_model=text-embedding-3-small source=returns-policy-2026-01
Result: decision=exact-hit route=true-cache provider_calls=0 distance=n/a threshold=0.1
Why it matters: Exact reuse returns the cached answer without a provider call.

Scenario 3/8: semantic-hit
Prompt: What is the return window for shoes I have not worn?
Scope: tenant=store-a chat_model=gpt-4o-mini embedding_model=text-embedding-3-small source=returns-policy-2026-01
Result: decision=semantic-hit route=true-cache provider_calls=0 distance=0.000016 threshold=0.1
Why it matters: Safe paraphrase reuse stays within the distance threshold and avoids a provider call.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the end, the script prints the validation result and the generated report paths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;== Validation summary ==
Status: passed | scenarios=8 | failures=0
Provider calls made: 6
Provider calls avoided by approved exact/semantic reuse: 2
Generated reports:
- reports/generated/validation-events.csv
- reports/generated/validation-summary.json
- reports/generated/validation-summary.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The database-boundary checks in this section come from &lt;code&gt;reports/generated/validation-evidence.md&lt;/code&gt;, which the validation script creates every time it runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;reports/generated/validation-evidence.md
reports/generated/validation-events.csv
reports/generated/validation-summary.json
reports/generated/validation-summary.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The evidence file records the service names that the rest of the article uses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Primary PDB service: FREEPDB1
Registered True Cache PDB service: semcache_pdb_tc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also records that the primary database is running with archive logging and force logging enabled. That output comes from this query against the primary database service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;log_mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;force_logging&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;database&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOG_MODE     FORCE_LOGGING
------------ ---------------------------------------
ARCHIVELOG   YES
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The True Cache check uses a similar &lt;code&gt;v$database&lt;/code&gt; query, run against the True Cache container after service registration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;open_mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;database_role&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;database&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That query returns the expected read-only True Cache role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPEN_MODE            DATABASE_ROLE
-------------------- ----------------
READ ONLY WITH APPLY TRUE CACHE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That matters because True Cache is only useful in this pattern if the demo proves two separate facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The application schema and cache tables exist through the primary database route.&lt;/li&gt;
&lt;li&gt;Eligible read-only lookup SQL can run through the registered True Cache service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The validation run covers both. It confirms that &lt;code&gt;SEM_CACHE_ENTRY&lt;/code&gt; and &lt;code&gt;SEM_CACHE_EVENT&lt;/code&gt; are visible through both the primary database and the registered True Cache service. It also runs a &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; expression through the True Cache route and gets a distance of &lt;code&gt;0&lt;/code&gt; for identical vectors.&lt;/p&gt;
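
&lt;p&gt;To make the zero-distance check concrete, here is a minimal sketch of cosine distance in plain Java, the metric that the candidate query in this article requests with &lt;code&gt;COSINE&lt;/code&gt;. The &lt;code&gt;CosineDistance&lt;/code&gt; class is hypothetical and is not part of the demo repository; the database computes the real value.&lt;/p&gt;

```java
// Hypothetical helper, not part of the demo repository.
final class CosineDistance {
    private CosineDistance() {}

    // Cosine distance = 1 - cosine similarity.
    // Assumes equal-length, non-zero vectors.
    static double between(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i != a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

&lt;p&gt;Calling &lt;code&gt;CosineDistance.between(v, v)&lt;/code&gt; on any non-zero vector returns a value at or within floating-point rounding of zero, which is the property the True Cache route check relies on.&lt;/p&gt;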

&lt;h2&gt;
  
  
  Store Cache Entries as Governed Rows
&lt;/h2&gt;

&lt;p&gt;The schema creates a dedicated application user and two semantic-cache tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SEM_CACHE_ENTRY
SEM_CACHE_EVENT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SEM_CACHE_ENTRY&lt;/code&gt; is the answer store. It contains prompt text, a prompt hash, a native vector column, the generated answer, scope metadata, policy metadata, status, and expiration time. The vector column has a fixed dimension and element type for this demo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt_embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That fixed dimension is intentional. The scenario code supplies four-dimensional fixture vectors so the validation path is deterministic. A production implementation that uses real embedding models would choose a dimension compatible with the selected embedding model and would need a migration and validation strategy when that model changes.&lt;/p&gt;

&lt;p&gt;The table also stores the fields that make reuse safe:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tenant_id
chat_model
embedding_model
embedding_dimension
prompt_template_version
source_fingerprint
policy_version
status
expires_at
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
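
&lt;p&gt;One way to picture how those fields gate reuse is a small value type plus a guard method. This is only a sketch: &lt;code&gt;CacheScope&lt;/code&gt; and &lt;code&gt;ReuseGuard&lt;/code&gt; are hypothetical names that mirror the column list above, and the demo itself enforces these conditions in SQL predicates rather than in Java.&lt;/p&gt;

```java
import java.time.Instant;

// Hypothetical scope record; components mirror the columns listed above.
record CacheScope(
        String tenantId,
        String chatModel,
        String embeddingModel,
        int embeddingDimension,
        String promptTemplateVersion,
        String sourceFingerprint,
        String policyVersion) {}

// Hypothetical guard, not part of the demo repository.
final class ReuseGuard {
    private ReuseGuard() {}

    // Returns true only when every scope field matches, the entry is ACTIVE,
    // and the entry has not expired at the given instant.
    static boolean reusable(CacheScope request, CacheScope entry,
                            String status, Instant expiresAt, Instant now) {
        if (!request.equals(entry)) {
            return false; // any scope difference blocks reuse
        }
        if (!"ACTIVE".equals(status)) {
            return false; // inactive entries are never reused
        }
        return expiresAt.isAfter(now); // entry must still be inside its TTL
    }
}
```

&lt;p&gt;Because a Java record compares all of its components in &lt;code&gt;equals()&lt;/code&gt;, a difference in any single scope field is enough to block reuse in this sketch.&lt;/p&gt;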



&lt;p&gt;&lt;code&gt;SEM_CACHE_EVENT&lt;/code&gt; records the outcome for each scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scenario_name
route_name
decision
reason
distance
threshold
provider_calls
latency_ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That event table is what lets the demo show why a request became an exact hit, semantic hit, near miss, or miss.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build the Cache Service Around Two Routes
&lt;/h2&gt;

&lt;p&gt;The two data sources passed into &lt;code&gt;SemanticCacheService&lt;/code&gt; give the demo its read/write boundary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;SemanticCacheService&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SemanticCacheService&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;primary&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;readRouteName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deterministicProvider&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;primary&lt;/code&gt; data source is used for reset, inserts, and event writes. The &lt;code&gt;read&lt;/code&gt; data source is used for exact and semantic lookup. In a normal validation run, &lt;code&gt;TRUE_CACHE_JDBC_URL&lt;/code&gt; points at the registered True Cache service. For debugging Java code, you can temporarily point the read URL at the primary service, but that does not validate the True Cache path.&lt;/p&gt;

&lt;p&gt;This direct wiring is less abstract than a production Spring configuration, and that is the point. The article can show the route behavior without implying Spring bean definitions that are not in the repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run Exact Lookup Before Vector Lookup
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;SemanticCacheService.answer()&lt;/code&gt; starts by hashing the prompt and running an exact lookup on the read connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;answer_text&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sem_cache_entry&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;prompt_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;chat_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding_dimension&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;prompt_template_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;source_fingerprint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;policy_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACTIVE'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;expires_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SYSTIMESTAMP&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A matching prompt hash alone is not enough. The SQL also requires the same tenant, chat model, embedding model, embedding dimension, prompt template version, source fingerprint, and policy version, plus an active status and an entry that has not expired.&lt;/p&gt;

&lt;p&gt;When the exact row exists, the service returns an &lt;code&gt;exact-hit&lt;/code&gt;, records a hit event through the primary route, and avoids the provider call.&lt;/p&gt;

&lt;p&gt;That is the safest reuse path. In most semantic-cache designs, it comes before embedding or vector lookup.&lt;/p&gt;
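
&lt;p&gt;The article does not show how the demo computes &lt;code&gt;prompt_hash&lt;/code&gt;. As an illustration only, the sketch below assumes SHA-256 over the trimmed UTF-8 prompt text, hex-encoded. The &lt;code&gt;PromptHash&lt;/code&gt; class, the choice of algorithm, and the trimming rule are all assumptions; it needs Java 17 or later for &lt;code&gt;HexFormat&lt;/code&gt;.&lt;/p&gt;

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper, not part of the demo repository.
final class PromptHash {
    private PromptHash() {}

    // Assumption: SHA-256 over the trimmed UTF-8 prompt. The demo's real hash
    // function and normalization rules are not shown in the article.
    static String of(String prompt) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(prompt.trim().getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash); // 64 lowercase hex characters
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}
```

&lt;p&gt;A hash like this only matches byte-identical prompts after trimming, so a paraphrase such as the semantic-hit prompt produces a different hash and falls through to the vector lookup.&lt;/p&gt;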

&lt;h2&gt;
  
  
  Use Vector Distance for a Candidate, Not a Decision
&lt;/h2&gt;

&lt;p&gt;If exact lookup misses, the service runs one semantic candidate query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;answer_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TO_VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sem_cache_entry&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;chat_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding_dimension&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;prompt_template_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;source_fingerprint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;policy_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ACTIVE'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;expires_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SYSTIMESTAMP&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is top-1 retrieval. The current demo does not implement configurable top-k retrieval or reranking. It finds the nearest scoped active candidate, then the Java policy checks whether the distance is under the configured threshold.&lt;/p&gt;
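
&lt;p&gt;The threshold step can be sketched as a small pure function. &lt;code&gt;SemanticPolicy&lt;/code&gt; is a hypothetical name, and the sketch assumes that a missing candidate is a plain miss and that a distance equal to the threshold is rejected, since the policy is described as checking for a distance under the threshold. The repository's actual policy class may differ.&lt;/p&gt;

```java
// Hypothetical policy sketch, not the demo's actual class.
final class SemanticPolicy {
    private SemanticPolicy() {}

    // candidateDistance is null when the scoped query returned no row.
    static String decide(Double candidateDistance, double threshold) {
        if (candidateDistance == null) {
            return "miss"; // nothing in scope to compare against
        }
        if (candidateDistance >= threshold) {
            return "near-miss"; // nearest candidate exists but is too far away
        }
        return "semantic-hit"; // strictly under the threshold
    }
}
```

&lt;p&gt;With the threshold of 0.10 used in this demo, the distances reported by the validation run, 0.000016 and 0.679840, map to &lt;code&gt;semantic-hit&lt;/code&gt; and &lt;code&gt;near-miss&lt;/code&gt; respectively.&lt;/p&gt;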

&lt;p&gt;The default threshold is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SEM_CACHE_THRESHOLD=0.10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The semantic-hit scenario uses a prompt with a fixture vector close to the seeded entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What is the return window for shoes I have not worn?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The near-miss scenario uses a prompt in the same broad domain but with a different policy meaning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Can I return worn shoes after 90 days?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is to prove both sides of the policy. A useful semantic cache reuses safe paraphrases and rejects unsafe near misses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keep the Provider Deterministic
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;deterministicProvider()&lt;/code&gt; passed into &lt;code&gt;SemanticCacheService&lt;/code&gt; is the replacement for a live LLM call in this validation harness. The cache code only needs an &lt;code&gt;AnswerProvider&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;AnswerProvider&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SemanticCacheRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SemanticCacheDemoApplication&lt;/code&gt; supplies a deterministic provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;AnswerProvider&lt;/span&gt; &lt;span class="nf"&gt;deterministicProvider&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;scenarioName&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"near-miss"&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="s"&gt;"Worn shoes follow the used-item policy and are not accepted after 90 days."&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="s"&gt;"Unopened shoes can be returned within 30 days when the tenant return policy is returns-v1."&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;};&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the fixture vectors are already part of each &lt;code&gt;SemanticCacheRequest&lt;/code&gt;, this provider only supplies answer text on misses and near misses. The demo is not trying to prove OpenAI or OCI Generative AI behavior. Those providers are examples of how a production application might generate answers or embeddings, not part of the deterministic validation path.&lt;/p&gt;

&lt;p&gt;This keeps the scenario run stable. If a scenario fails, the failure is in the cache policy, SQL, route configuration, or report generation, not in a live model response.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run the Validation Scenarios
&lt;/h2&gt;

&lt;p&gt;The same &lt;code&gt;./scripts/run-validation.sh&lt;/code&gt; command runs the scenario workload and writes the decision reports. If you skipped the database-boundary check earlier, run it now with the Docker Compose environment running.&lt;/p&gt;

&lt;p&gt;The validation workload runs these scenarios in order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;seed-miss
exact-hit
semantic-hit
near-miss
tenant-isolation
model-mismatch
source-fingerprint-mismatch
expired-entry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated validation events show the expected decision pattern:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;th&gt;Route&lt;/th&gt;
&lt;th&gt;Provider calls&lt;/th&gt;
&lt;th&gt;Distance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;seed-miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;exact-hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;exact-hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true-cache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;semantic-hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;semantic-hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true-cache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0.000016&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;near-miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;near-miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true-cache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.679840&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tenant-isolation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model-mismatch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;source-fingerprint-mismatch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;expired-entry&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The exact and semantic hits avoid provider calls. The near miss calls the provider and writes a new answer. Requests that differ in tenant, model, or source fingerprint, or that match only an expired entry, are misses because reuse would be unsafe.&lt;/p&gt;

&lt;p&gt;This is the most important output of the demo. It shows that the cache is not returning the nearest answer blindly. It is enforcing scope and freshness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Treat the Lite Report as Measurement Wiring
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;run-benchmark-lite.sh&lt;/code&gt; reuses the deterministic validation workload and summarizes the events. It records decision counts, provider-call accounting, and latency fields by route.&lt;/p&gt;

&lt;p&gt;That is useful, but it is not a production benchmark.&lt;/p&gt;

&lt;p&gt;The report has only eight deterministic scenarios. It uses fixture vectors and a deterministic provider. It does not run a statistically meaningful workload, does not isolate every mode as an independent benchmark path, and does not justify claims about production latency, cost, throughput, or scalability.&lt;/p&gt;

&lt;p&gt;What it does prove is simpler: the app can generate CSV and Markdown artifacts with the fields a later benchmark would need.&lt;/p&gt;

&lt;p&gt;Run it after validation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./scripts/run-benchmark-lite.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The current run prints this summary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;== Benchmark lite summary ==
This lite report reuses the deterministic validation workload to confirm measurement wiring. It is not a production performance benchmark.
Scenarios: 8
Provider mode: deterministic mock
Embedding mode: deterministic fixture vectors
Decision counts:
- exact-hit: 1
- miss: 5
- near-miss: 1
- semantic-hit: 1
Provider calls made: 6
Provider calls avoided by approved reuse: 2
Cache hit rate for approved exact or semantic reuse: 25.00%
Latency by route:
- primary: samples=5 p50_ms=113 p95_ms=203
- true-cache: samples=3 p50_ms=58 p95_ms=64
Generated reports:
- reports/generated/benchmark-lite-events.csv
- reports/generated/benchmark-lite-summary.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Markdown report renders the same data in table form:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;exact-hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;near-miss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;semantic-hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Route&lt;/th&gt;
&lt;th&gt;Samples&lt;/th&gt;
&lt;th&gt;p50 latency ms&lt;/th&gt;
&lt;th&gt;p95 latency ms&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;primary&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;113&lt;/td&gt;
&lt;td&gt;203&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;true-cache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Use those numbers as validation evidence for this fixture workload only. They show that the measurement fields are wired correctly; they are not a general semantic-caching ROI claim.&lt;/p&gt;
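
&lt;p&gt;The provider-call accounting in the summary follows directly from the decision column. A minimal sketch, using a hypothetical &lt;code&gt;ReuseAccounting&lt;/code&gt; class rather than the repository's report code, and assuming that only &lt;code&gt;exact-hit&lt;/code&gt; and &lt;code&gt;semantic-hit&lt;/code&gt; count as approved reuse:&lt;/p&gt;

```java
// Hypothetical accounting sketch, not the demo's report generator.
final class ReuseAccounting {
    private ReuseAccounting() {}

    // Counts decisions treated as approved reuse (exact or semantic hits).
    static int approvedReuse(String[] decisions) {
        int hits = 0;
        for (String decision : decisions) {
            if (decision.equals("exact-hit") || decision.equals("semantic-hit")) {
                hits++;
            }
        }
        return hits;
    }

    // Hit rate as a percentage of all recorded decisions.
    static double hitRatePercent(String[] decisions) {
        if (decisions.length == 0) {
            return 0.0; // avoid dividing by zero on an empty run
        }
        return 100.0 * approvedReuse(decisions) / decisions.length;
    }
}
```

&lt;p&gt;For the eight decisions in this run (five misses, one near miss, one exact hit, one semantic hit), the sketch counts 2 approved reuses and a hit rate of 25%, matching the lite summary.&lt;/p&gt;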

&lt;h2&gt;
  
  
  Production Hardening Checklist
&lt;/h2&gt;

&lt;p&gt;Before adapting the pattern, keep the core policy explicit.&lt;/p&gt;

&lt;p&gt;Store enough scope to decide whether reuse is safe: tenant, security scope if applicable, chat model, embedding model, embedding dimension, prompt template, source fingerprint, policy version, status, and expiration. Keep exact lookup ahead of semantic lookup. Treat vector distance as candidate selection, not approval. Record misses and near misses, not only hits.&lt;/p&gt;

&lt;p&gt;Route writes, invalidations, and event recording to the primary database. Route only eligible read-only lookup SQL through Oracle True Cache. Decide how your application handles read-after-write visibility before putting cache lookup on a latency-sensitive route.&lt;/p&gt;

&lt;p&gt;If you add real embedding providers, make embedding provenance part of the cache key. If you add RAG, keep source documents separate from generated answer reuse. If you add a real benchmark, separate behavior validation from performance claims.&lt;/p&gt;

&lt;p&gt;For production work, turn those principles into a short release checklist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define the cache scope fields before writing the first lookup query.&lt;/li&gt;
&lt;li&gt;Version prompt templates, source fingerprints, and policy rules.&lt;/li&gt;
&lt;li&gt;Keep model names and embedding dimensions in the reuse predicate.&lt;/li&gt;
&lt;li&gt;Choose a TTL that matches the business policy behind the answer.&lt;/li&gt;
&lt;li&gt;Route writes and invalidations to the primary database.&lt;/li&gt;
&lt;li&gt;Send only eligible read-only lookups to Oracle True Cache.&lt;/li&gt;
&lt;li&gt;Measure exact hits, semantic hits, near misses, misses, provider calls, and latency by route.&lt;/li&gt;
&lt;li&gt;Review retention, encryption, masking, audit, and deletion requirements before storing prompts or answers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;A semantic cache is safest when it behaves like governed application state, not like an unqualified nearest-neighbor shortcut.&lt;/p&gt;

&lt;p&gt;This demo shows that pattern with Oracle AI Database 26ai Free tables, a fixed native vector column, &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; candidate lookup, exact-hit and semantic-hit reuse, near-miss rejection, scope-based misses, Oracle True Cache read routing, and primary-route writes.&lt;/p&gt;

&lt;p&gt;The next step is to preserve those boundaries while adding the production pieces your application needs: real embeddings, stronger route instrumentation, operational invalidation, privacy controls, and a benchmark that is large enough to support performance conclusions.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Which Agent Memory Approach Is Best for Long Conversations?</title>
      <dc:creator>Anya Summers</dc:creator>
      <pubDate>Tue, 23 Jun 2026 17:01:51 +0000</pubDate>
      <link>https://dev.to/oracledevs/which-agent-memory-approach-is-best-for-long-conversations-1me4</link>
      <guid>https://dev.to/oracledevs/which-agent-memory-approach-is-best-for-long-conversations-1me4</guid>
      <description>&lt;p&gt;&lt;strong&gt;How sliding windows, summaries, vector retrieval, structured memory, episodic memory, and memory managers work together to support long AI agent conversations.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Companion notebook: &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/oracle_agent_memory_long_conversations.ipynb" rel="noopener noreferrer"&gt;Agent Memory for Long Conversations with Oracle AI Database&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Long conversations are continuity problems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The best practical pattern is hybrid layered memory: recent context, summaries, vector retrieval, structured memory, episodic memory, and a memory manager.&lt;/li&gt;
&lt;li&gt;Sliding window memory keeps recent turns available, but older context still falls out.&lt;/li&gt;
&lt;li&gt;Summarization compresses older dialogue, but it can lose details or drift.&lt;/li&gt;
&lt;li&gt;Vector retrieval finds semantically related context, but similarity is not the same as relevance.&lt;/li&gt;
&lt;li&gt;Structured memory stores stable facts, preferences, entities, decisions, and state.&lt;/li&gt;
&lt;li&gt;Episodic memory preserves important events, outcomes, and prior attempts.&lt;/li&gt;
&lt;li&gt;A memory manager decides what gets stored, updated, retrieved, summarized, and passed into the model.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/" rel="noopener noreferrer"&gt;Oracle AI Database&lt;/a&gt; becomes useful when long-conversation memory needs durable storage, relational precision, vector retrieval, JSON metadata, and governed access patterns.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Practical Pattern
&lt;/h2&gt;

&lt;p&gt;For long AI agent conversations, the most reliable pattern is hybrid layered memory. In practice, that means each memory layer has a specific job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep the latest turns available as recent context.&lt;/li&gt;
&lt;li&gt;Summarize older dialogue so the model does not need the full transcript every time.&lt;/li&gt;
&lt;li&gt;Use vector retrieval when the user refers back to older context with different wording.&lt;/li&gt;
&lt;li&gt;Store stable facts, preferences, decisions, and state in structured memory.&lt;/li&gt;
&lt;li&gt;Preserve important events, outcomes, and prior attempts as episodic memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The memory manager sits above those layers and decides what gets written, updated, retrieved, summarized, and passed into the model for the current turn. &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/oracle_agent_memory_long_conversations.ipynb" rel="noopener noreferrer"&gt;The companion notebook&lt;/a&gt; implements this pattern with Oracle AI Database, &lt;a href="https://docs.oracle.com/en/database/oracle/agent-memory/" rel="noopener noreferrer"&gt;Oracle AI Agent Memory&lt;/a&gt;, and LangChain, but the core idea is vendor-neutral: long conversation memory needs architecture, not just a larger prompt.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Long Conversations Break Simple Chat History
&lt;/h2&gt;

&lt;p&gt;A short chat can usually survive with raw conversation history. The model sees the latest turns, understands what the user is asking, and continues naturally. Long conversations are different because they contain many kinds of information at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;temporary details that only matter for the next response;&lt;/li&gt;
&lt;li&gt;durable decisions that should be remembered later;&lt;/li&gt;
&lt;li&gt;user preferences, project facts, and task state;&lt;/li&gt;
&lt;li&gt;tool results, failed attempts, successful outcomes, and follow-up actions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Treating all of that as one long transcript does not scale well. The model receives too much irrelevant context, misses older details, or depends on a compressed summary that may have lost something important. Long conversation memory needs structure because not every part of a conversation has the same value, lifetime, or retrieval pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Bigger Context Windows Are Not Enough
&lt;/h2&gt;

&lt;p&gt;A bigger context window can delay the problem, but it does not solve it. More context means the model can see more text at once, which is useful for long documents and extended sessions. But it does not answer the harder engineering questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which facts should survive across sessions?&lt;/li&gt;
&lt;li&gt;Which older details are still relevant?&lt;/li&gt;
&lt;li&gt;Which decisions are authoritative?&lt;/li&gt;
&lt;li&gt;Which prior attempts should not be repeated?&lt;/li&gt;
&lt;li&gt;Which memory belongs to this user, this project, or this task?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A bigger context window gives you more room. It does not give you a memory policy. That policy has to come from the application architecture: what to store, what to summarize, what to retrieve, what to trust, and what to pass into the model for a specific turn.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Memory Approaches That Actually Help
&lt;/h2&gt;

&lt;p&gt;Different memory approaches solve different parts of the long conversation problem. The useful framing is not to ask which one is universally best, but which layer should handle which kind of continuity.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory approach&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Weakness&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sliding window memory&lt;/td&gt;
&lt;td&gt;Recent turns and immediate continuity&lt;/td&gt;
&lt;td&gt;Older context falls out&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conversation summary memory&lt;/td&gt;
&lt;td&gt;Compressing older dialogue&lt;/td&gt;
&lt;td&gt;Can lose detail or drift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector memory&lt;/td&gt;
&lt;td&gt;Semantic recall across older context&lt;/td&gt;
&lt;td&gt;Similarity is not the same as relevance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured memory&lt;/td&gt;
&lt;td&gt;Facts, preferences, entities, decisions, and state&lt;/td&gt;
&lt;td&gt;Requires extraction and update rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Episodic memory&lt;/td&gt;
&lt;td&gt;Events, outcomes, prior attempts, and task resumption&lt;/td&gt;
&lt;td&gt;Needs importance and retention rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory manager&lt;/td&gt;
&lt;td&gt;Coordinating what to store, retrieve, summarize, update, and pass forward&lt;/td&gt;
&lt;td&gt;Adds application logic that must be tested&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The important point is that none of these approaches is enough by itself. A useful long-conversation system combines them, then lets a memory manager decide which pieces are relevant for the current turn.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sliding Window and Summarization for Short-Term Continuity
&lt;/h2&gt;

&lt;p&gt;The first layer is sliding window memory. It keeps the latest turns close to the model so the current exchange remains coherent. If a developer just asked a follow-up question, the model needs the most recent messages to understand the current task and avoid asking for context that was already provided.&lt;/p&gt;

&lt;p&gt;But a sliding window is temporary by design. Once the conversation gets long enough, older context falls out. Summarization helps by compressing older dialogue into a smaller representation, preserving continuity without passing the entire transcript into every request. The tradeoff is that summaries are not perfect memory. They can omit details, merge separate ideas, or drift over time. In practice, summaries work best when they are supported by more precise layers, especially structured memory and episodic memory.&lt;/p&gt;
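&lt;p&gt;A minimal sketch of how a sliding window and a rolling summary can work together. The class name, the window size, and the placeholder summarizer are assumptions for illustration, not the notebook's implementation; in practice the summarizer would usually be an LLM call.&lt;/p&gt;

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ShortTermMemory:
    """Keeps the last `window_size` turns and queues evicted turns for summarization."""

    window_size: int = 6
    recent: deque = field(default_factory=deque)
    pending_summary: list = field(default_factory=list)
    summary: str = ""

    def add_turn(self, role: str, content: str) -> None:
        self.recent.append((role, content))
        # Evict the oldest turns once the window is full.
        while len(self.recent) > self.window_size:
            self.pending_summary.append(self.recent.popleft())

    def refresh_summary(self, summarize) -> str:
        # `summarize` is any callable that takes the previous summary and the
        # evicted turns and returns a new summary string.
        if self.pending_summary:
            self.summary = summarize(self.summary, self.pending_summary)
            self.pending_summary = []
        return self.summary


def naive_summarize(previous: str, turns: list) -> str:
    # Placeholder only: joins truncated turns. A real summarizer would compress meaning.
    lines = [f"{role}: {content[:40]}" for role, content in turns]
    return " | ".join(filter(None, [previous, *lines]))


memory = ShortTermMemory(window_size=2)
for i in range(4):
    memory.add_turn("user", f"message {i}")
rolling_summary = memory.refresh_summary(naive_summarize)
```

&lt;p&gt;After four turns with a window of two, the two newest turns stay in the window and the two oldest are folded into the summary. Evicted turns are never dropped silently, which keeps the summary layer auditable.&lt;/p&gt;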




&lt;h2&gt;
  
  
  Vector Retrieval for Long-Term Semantic Recall
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/ai.html" rel="noopener noreferrer"&gt;Oracle AI Vector Search&lt;/a&gt; helps when the user refers to older context with different wording. For example, the user might ask, “Earlier we debugged this issue. What did we decide, and what should I try next?” That question does not repeat every detail from the earlier debugging work. A vector memory layer can still retrieve related chunks about the root cause, the decision, the failed patch, and the rollout plan.&lt;/p&gt;

&lt;p&gt;Vector retrieval is especially useful for recall across sessions, paraphrased follow-up questions, large conversation histories, and knowledge that is easier to find by meaning than by exact keyword. But it should not be the only memory layer. Semantic similarity is not the same as correctness. A retrieved chunk can be related but outdated, incomplete, or less authoritative than a structured decision record.&lt;/p&gt;
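&lt;p&gt;The following sketch shows one way to rank stored chunks by cosine similarity while filtering by thread and dropping weak matches. It is plain Python with hand-made two-dimensional embeddings; the function names, the 0.5 threshold, and the record fields are assumptions for illustration. A real system would use model-generated embeddings and a vector index.&lt;/p&gt;

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding, chunks, thread_id, min_similarity=0.5, top_k=3):
    """Rank chunks from one thread by similarity and drop weak matches."""
    scored = [
        (cosine_similarity(query_embedding, chunk["embedding"]), chunk)
        for chunk in chunks
        if chunk["thread_id"] == thread_id
    ]
    strong = [(score, chunk) for score, chunk in scored if score >= min_similarity]
    strong.sort(key=lambda pair: pair[0], reverse=True)
    # Attach the score so a memory manager can weigh it against other evidence.
    return [chunk | {"score": round(score, 3)} for score, chunk in strong[:top_k]]


# Toy data: embeddings are hand-written, not produced by an embedding model.
chunks = [
    {"chunk_id": 1, "thread_id": "t1", "embedding": [1.0, 0.0],
     "text": "Root cause: EU payment authorization latency exceeded the timeout."},
    {"chunk_id": 2, "thread_id": "t1", "embedding": [0.0, 1.0],
     "text": "Unrelated note from the same thread."},
    {"chunk_id": 3, "thread_id": "t2", "embedding": [1.0, 0.0],
     "text": "Similar note that belongs to another thread."},
]
results = retrieve([0.9, 0.1], chunks, thread_id="t1")
```

&lt;p&gt;Only the first chunk survives: the second is too dissimilar and the third is out of scope. The score says how close a chunk is, not whether it is still current, which is why the other layers are needed.&lt;/p&gt;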




&lt;h2&gt;
  
  
  Structured Memory for Facts, Preferences, and State
&lt;/h2&gt;

&lt;p&gt;Structured memory stores information that should be precise. This includes user preferences, project facts, entities, decisions, task state, configuration choices, and metrics to monitor. These are not just pieces of text; they are records the application may need to query, update, validate, and govern.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/oracle_agent_memory_long_conversations.ipynb" rel="noopener noreferrer"&gt;companion notebook&lt;/a&gt;, structured memory includes project state, decisions, metrics, and preferences. For example, it stores the decision to use a region-specific inventory lock timeout, the project state that EU payment authorization latency exceeded the existing timeout, and the metric to monitor expired inventory locks by region. This kind of memory helps the memory manager prefer authoritative facts over loosely related retrieved chunks.&lt;/p&gt;
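&lt;p&gt;As a rough illustration of why structured memory is stored as records instead of free text, here is an in-memory sketch keyed by memory type and key. The class names, field names, and the "checkout" project scope are hypothetical; the notebook keeps this data in database tables.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StructuredRecord:
    memory_type: str   # e.g. "decision", "project_state", "metric", "preference"
    memory_key: str
    memory_value: str
    scope: dict
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class StructuredMemory:
    """In-memory stand-in for a structured memory table, keyed by (type, key)."""

    def __init__(self) -> None:
        self._records = {}

    def upsert(self, record: StructuredRecord) -> None:
        # A later write for the same (type, key) replaces the earlier value,
        # so readers see one authoritative record per fact.
        self._records[(record.memory_type, record.memory_key)] = record

    def get(self, memory_type: str, memory_key: str):
        return self._records.get((memory_type, memory_key))

    def by_type(self, memory_type: str) -> list:
        return [r for r in self._records.values() if r.memory_type == memory_type]


scope = {"project": "checkout"}  # hypothetical scope value
memory = StructuredMemory()
memory.upsert(StructuredRecord("decision", "inventory_lock_timeout", "undecided", scope))
memory.upsert(StructuredRecord("decision", "inventory_lock_timeout", "region-specific timeout", scope))
memory.upsert(StructuredRecord("metric", "expired_inventory_locks", "monitor by region", scope))
decision = memory.get("decision", "inventory_lock_timeout")
```

&lt;p&gt;Because the decision is addressed by key, the second write updates it in place and a lookup returns exactly one current value. That is the precision a summary or a retrieved chunk cannot guarantee.&lt;/p&gt;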




&lt;h2&gt;
  
  
  Episodic Memory for What Happened and Why It Mattered
&lt;/h2&gt;

&lt;p&gt;Episodic memory stores important events and outcomes. It matters for long conversations because agents often need to resume work, explain prior decisions, or avoid repeating failed attempts. A fact says what is true. An episode says what happened, what changed, and why it mattered.&lt;/p&gt;

&lt;p&gt;In the notebook, episodic memory stores events such as a rejected global patch, an EU-only patch that passed staging, and an agreed rollout plan. If the developer later asks what to try next, the agent should know that the global patch already failed and that the EU-only patch passed staging. That is the difference between remembering text and remembering progress.&lt;/p&gt;
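&lt;p&gt;A small sketch of how episodes like these could be queried when the user asks what to try next. The Episode fields, the outcome labels, and the dates are assumptions made for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Episode:
    occurred_at: datetime
    event: str
    outcome: str  # hypothetical labels: "rejected", "passed", "agreed"


def attempts_to_avoid(episodes: list) -> list:
    """Return events whose outcome means the path should not be retried."""
    return [e.event for e in episodes if e.outcome == "rejected"]


def latest_success(episodes: list):
    """Return the most recent episode that passed, or None if there is none."""
    passed = [e for e in episodes if e.outcome == "passed"]
    return max(passed, key=lambda e: e.occurred_at, default=None)


# Illustrative dates; the events mirror the ones described above.
episodes = [
    Episode(datetime(2026, 1, 5), "global patch", "rejected"),
    Episode(datetime(2026, 1, 6), "EU-only patch in staging", "passed"),
    Episode(datetime(2026, 1, 7), "rollout plan", "agreed"),
]
avoid = attempts_to_avoid(episodes)
best = latest_success(episodes)
```

&lt;p&gt;With this shape, the agent can answer from outcomes instead of re-reading the transcript: the global patch is on the avoid list and the EU-only patch is the latest success.&lt;/p&gt;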




&lt;h2&gt;
  
  
  The Best Pattern: Hybrid Layered Memory
&lt;/h2&gt;

&lt;p&gt;The best pattern for long conversation memory is a layered architecture. Recent context keeps the current exchange coherent. Summaries compress older dialogue. Vector retrieval brings back semantically related information. Structured memory preserves stable facts and decisions. Episodic memory records what happened and what was tried.&lt;/p&gt;

&lt;p&gt;The memory manager coordinates the layers. That coordination is what turns memory from a pile of stored text into a usable system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F0qzpy0lk8n2kjour8tch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F0qzpy0lk8n2kjour8tch.png" alt=" " width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How a Memory Manager Assembles Context for Each Turn
&lt;/h2&gt;

&lt;p&gt;A memory manager should not blindly stuff every stored item into the prompt. For each turn, it should decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which recent turns to include;&lt;/li&gt;
&lt;li&gt;whether the rolling summary is needed;&lt;/li&gt;
&lt;li&gt;which structured facts and episodic events matter;&lt;/li&gt;
&lt;li&gt;which retrieved chunks are useful;&lt;/li&gt;
&lt;li&gt;what should be stored or updated after the response.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example context package:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;context_package&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recent_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;recent_turns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rolling_summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;structured_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;structured_memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episodic_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;episodic_memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieved_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;retrieved_memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shape is easier to inspect than a giant prompt. If the answer is wrong, developers can debug the context package: was the summary stale, did retrieval miss the right memory, was the structured decision missing, or did the episodic log omit a failed attempt?&lt;/p&gt;




&lt;h2&gt;
  
  
  Handling Memory Conflicts and Freshness
&lt;/h2&gt;

&lt;p&gt;Layered memory introduces a new engineering question: what happens when memory layers disagree?&lt;/p&gt;

&lt;p&gt;For example, a rolling summary might preserve an older plan, while structured memory contains the final decision. A vector search result might retrieve a semantically related note that is no longer current. An episodic memory entry might show that a previous attempt failed, even if the latest summary does not mention it.&lt;/p&gt;

&lt;p&gt;A reliable memory manager should treat memory as evidence, not as a flat transcript. Useful conflict and freshness rules include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prefer structured decisions over summaries when both refer to the same fact;&lt;/li&gt;
&lt;li&gt;prefer newer memory when two records have the same authority;&lt;/li&gt;
&lt;li&gt;prefer scoped memory over generic memory, such as project-specific or region-specific records;&lt;/li&gt;
&lt;li&gt;downgrade retrieved chunks that are old, superseded, or weakly related to the current task;&lt;/li&gt;
&lt;li&gt;keep source, timestamp, scope, and memory type metadata with each memory record;&lt;/li&gt;
&lt;li&gt;mark important records as active, superseded, rejected, or archived instead of deleting context too early.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes long-conversation memory easier to inspect. If the agent gives the wrong answer, developers can check which memory layer supplied the evidence and why that evidence was selected.&lt;/p&gt;
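&lt;p&gt;The first two rules and the status rule can be sketched as a small resolver. The numeric authority ranking, the record fields, and the sample values are assumptions for illustration; the source only states that structured decisions outrank summaries and that newer records win at equal authority.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical ranking: a higher number wins when two records describe the same fact.
AUTHORITY = {"structured": 3, "episodic": 2, "summary": 1, "vector": 0}


@dataclass(frozen=True)
class MemoryRecord:
    fact_key: str
    value: str
    memory_type: str   # one of the AUTHORITY keys
    status: str        # "active", "superseded", "rejected", or "archived"
    updated_at: datetime


def resolve(records: list) -> dict:
    """Pick one winning record per fact: active only, authority first, then recency."""
    winners = {}
    for record in records:
        if record.status != "active":
            continue
        rank = (AUTHORITY[record.memory_type], record.updated_at)
        current = winners.get(record.fact_key)
        if current is None or rank > (AUTHORITY[current.memory_type], current.updated_at):
            winners[record.fact_key] = record
    return winners


# Illustrative records: the summary is newer but carries an older plan.
records = [
    MemoryRecord("eu_timeout", "apply one global timeout",
                 "summary", "active", datetime(2026, 1, 8)),
    MemoryRecord("eu_timeout", "use 12 seconds for EU and 5 seconds for US",
                 "structured", "active", datetime(2026, 1, 6)),
    MemoryRecord("eu_timeout", "use 8 seconds everywhere",
                 "structured", "superseded", datetime(2026, 1, 4)),
]
winners = resolve(records)
```

&lt;p&gt;Here the structured decision wins even though the summary was written later, and the superseded record is ignored without being deleted. Comparing authority before timestamp is the design choice that keeps a stale summary from overriding a recorded decision.&lt;/p&gt;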




&lt;h2&gt;
  
  
  Making the Memory Manager Concrete
&lt;/h2&gt;

&lt;p&gt;A memory manager is not just a helper that collects context. It is the policy layer for memory.&lt;/p&gt;

&lt;p&gt;For each turn, the memory manager can rank candidate memories using simple rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;recent turns explain the current exchange;&lt;/li&gt;
&lt;li&gt;structured decisions are usually more precise than summaries;&lt;/li&gt;
&lt;li&gt;episodic memory is useful when the user asks about prior attempts, outcomes, or what to try next;&lt;/li&gt;
&lt;li&gt;vector results are useful when they pass a similarity threshold and match the current thread or task scope;&lt;/li&gt;
&lt;li&gt;stale or superseded memories should be excluded unless they explain why a previous path should not be repeated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simple priority order could look like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Current user message&lt;/li&gt;
&lt;li&gt;Recent conversation turns&lt;/li&gt;
&lt;li&gt;Active structured decisions and project state&lt;/li&gt;
&lt;li&gt;Relevant episodic events&lt;/li&gt;
&lt;li&gt;Rolling summary&lt;/li&gt;
&lt;li&gt;Vector-retrieved chunks&lt;/li&gt;
&lt;li&gt;Archived or superseded memory only when needed for explanation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The exact policy depends on the application, but the principle is consistent: the memory manager should assemble the smallest useful context package that is current, scoped, and explainable.&lt;/p&gt;
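&lt;p&gt;One possible way to apply the priority order above is to sort candidates by layer and add them until a budget is reached. This is a sketch under stated assumptions: the layer names, the character budget (a stand-in for a token budget), and the explain_history flag are all hypothetical.&lt;/p&gt;

```python
# Lower number means higher priority, following the ordered list above.
PRIORITY = {
    "current_message": 1,
    "recent_turn": 2,
    "structured": 3,
    "episodic": 4,
    "summary": 5,
    "vector": 6,
    "archived": 7,
}


def assemble_context(candidates: list, budget_chars: int, explain_history: bool = False) -> list:
    """Select candidates in priority order until the character budget is used up."""
    selected, used = [], 0
    for item in sorted(candidates, key=lambda c: PRIORITY[c["layer"]]):
        if item["layer"] == "archived" and not explain_history:
            continue  # superseded memory is only included when explanation is needed
        cost = len(item["text"])
        if used + cost > budget_chars:
            continue  # skip items that do not fit; smaller ones may still fit
        selected.append(item)
        used += cost
    return selected


candidates = [
    {"layer": "vector", "text": "Loosely related chunk. " * 6},
    {"layer": "current_message", "text": "What should I try next?"},
    {"layer": "structured", "text": "Decision: region-specific inventory lock timeout."},
    {"layer": "archived", "text": "Rejected: global patch."},
]
context = assemble_context(candidates, budget_chars=120)
context_with_history = assemble_context(candidates, budget_chars=120, explain_history=True)
```

&lt;p&gt;With a tight budget, the current message and the structured decision are kept while the long, loosely related chunk is dropped. The archived record only appears when the caller asks for history, which matches the last item in the priority list.&lt;/p&gt;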




&lt;h2&gt;
  
  
  Where a Database-Backed Memory Layer Fits
&lt;/h2&gt;

&lt;p&gt;The first half of this architecture is intentionally vendor-neutral. Any serious long-conversation agent needs memory layers and a memory manager. Once memory needs to survive beyond a single session, a database-backed layer becomes useful because the system needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;durable storage and queryable history;&lt;/li&gt;
&lt;li&gt;structured facts and state;&lt;/li&gt;
&lt;li&gt;vector retrieval and JSON metadata;&lt;/li&gt;
&lt;li&gt;timestamps, status fields, and policy metadata for freshness and conflict handling;&lt;/li&gt;
&lt;li&gt;user, thread, and task scoping;&lt;/li&gt;
&lt;li&gt;access controls and auditability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is where Oracle AI Database fits naturally. It can store relational memory, JSON metadata, episodic logs, and vector-searchable chunks in one governed layer. The point is not that every application needs the same table names. The point is the separation of responsibilities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fkxzfqjfrtb81czljfqyu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fkxzfqjfrtb81czljfqyu.png" alt=" " width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Companion Notebook Demonstrates
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/oracle_agent_memory_long_conversations.ipynb" rel="noopener noreferrer"&gt;companion notebook&lt;/a&gt; implements the layered pattern end to end. It demonstrates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;every message stored in conversational memory;&lt;/li&gt;
&lt;li&gt;a rolling summary per thread;&lt;/li&gt;
&lt;li&gt;project state and decisions stored as structured memory;&lt;/li&gt;
&lt;li&gt;important events stored with timestamps and outcomes;&lt;/li&gt;
&lt;li&gt;retrieval chunks and Oracle vector search when available;&lt;/li&gt;
&lt;li&gt;a context package assembled for a follow-up question from older conversation history;&lt;/li&gt;
&lt;li&gt;a package-level validation path for &lt;a href="https://docs.oracle.com/en/database/oracle/agent-memory/26.4/agmea/api/agentmemory.html" rel="noopener noreferrer"&gt;oracleagentmemory&lt;/a&gt;, including creating a thread, writing memories, and searching them back.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The example follow-up question is:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Earlier we debugged this issue.
What did we decide, and what should I try next?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The notebook stores enough memory to answer that question without relying only on the latest chat turns. It also shows Oracle AI Agent Memory as a higher-level package workflow and LangChain as an interoperability layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building the Oracle-Backed Memory Workflow
&lt;/h2&gt;

&lt;p&gt;The notebook stores each memory layer in Oracle AI Database. Recent context is retrieved with a bounded query so the model receives the latest turns without carrying the full transcript.&lt;/p&gt;

&lt;p&gt;Example recent-context query:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;turn_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;lcam_conversation_memory&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;turn_id&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Structured memory is stored separately from raw messages so facts, decisions, preferences, and project state can be updated and queried directly.&lt;/p&gt;

&lt;p&gt;Example structured-memory insert:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;lcam_structured_memory&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scope_json&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vector retrieval can use Oracle vector search when the database supports it.&lt;/p&gt;

&lt;p&gt;Example vector retrieval query:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;lcam_vector_memory&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The notebook first stores retrieval chunks as inspectable memory records, then creates vector-searchable memory when Oracle VECTOR support is available. The query uses &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/vector_distance.html" rel="noopener noreferrer"&gt;VECTOR_DISTANCE&lt;/a&gt; to rank candidate chunks by distance from the query embedding. The snippets are intentionally small so the architecture stays visible. The notebook carries the full executable workflow and the real database results.&lt;/p&gt;




&lt;h2&gt;
  
  
  Oracle AI Agent Memory as a Higher-Level Memory API
&lt;/h2&gt;

&lt;p&gt;The custom tables in the notebook make the memory mechanics visible. Oracle AI Agent Memory provides a higher-level package interface for working with threads, memory records, and retrieval on top of Oracle AI Database. That is useful when a team wants the benefits of persistent memory without rebuilding every memory component from scratch.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/oracle_agent_memory_long_conversations.ipynb" rel="noopener noreferrer"&gt;companion notebook&lt;/a&gt; also validates the oracleagentmemory package path by creating a thread, writing durable memories, and searching those memories back. That package-level proof is important because the table-level walkthrough explains the architecture, while Oracle AI Agent Memory shows the application-facing API path developers can use.&lt;/p&gt;

&lt;p&gt;Example Oracle AI Agent Memory workflow:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EU checkout timeout decision: use 12 seconds for EU and 5 seconds for US.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What timeout did we choose for EU?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;exact_thread_match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This higher-level API belongs after the architecture is understood. It should not hide the core design question: which memory should be stored, updated, retrieved, and trusted for the current turn?&lt;/p&gt;




&lt;h2&gt;
  
  
  Where LangChain Fits
&lt;/h2&gt;

&lt;p&gt;LangChain can help once the memory layer is working. It is useful for orchestration, document wrapping, retriever interfaces, and repeatable application flows. It should not replace database privileges, memory policy, or observability.&lt;/p&gt;

&lt;p&gt;In the notebook, retrieved Oracle-backed memory is converted into LangChain Document objects so the same memory layer can participate in LangChain-style application flows.&lt;/p&gt;

&lt;p&gt;Example LangChain document wrapping:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;retrieved_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;itertuples&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Oracle-backed retrieval pipelines, &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/oracle-ai-vector-search-integration-langchain.html" rel="noopener noreferrer"&gt;Oracle AI Vector Search integration with LangChain&lt;/a&gt; gives developers a bridge between LangChain and Oracle AI Database.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Recommendation for Developers
&lt;/h2&gt;

&lt;p&gt;Use the simplest memory layer that solves the problem, but do not pretend one layer solves everything. Short chats may only need a sliding window. Long linear chats usually need a sliding window plus summaries. Recall across sessions needs vector retrieval. Correct preferences and profile facts need structured memory. Task resumption needs episodic memory. Production-grade continuity needs hybrid layered memory with a memory manager.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Recommended approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Short chats&lt;/td&gt;
&lt;td&gt;Sliding window memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long linear chats&lt;/td&gt;
&lt;td&gt;Sliding window plus summaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recall across sessions&lt;/td&gt;
&lt;td&gt;Vector retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Correct preferences and profile facts&lt;/td&gt;
&lt;td&gt;Structured memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task resumption&lt;/td&gt;
&lt;td&gt;Episodic memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliable long-term continuity&lt;/td&gt;
&lt;td&gt;Hybrid layered memory with a memory manager&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A practical rollout is straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store every message so the raw conversation can be inspected.&lt;/li&gt;
&lt;li&gt;Keep a bounded recent context window and add a rolling summary for older dialogue.&lt;/li&gt;
&lt;li&gt;Extract structured memory for facts, preferences, decisions, and state.&lt;/li&gt;
&lt;li&gt;Store episodic memory for important events and prior attempts.&lt;/li&gt;
&lt;li&gt;Add vector retrieval for semantic recall.&lt;/li&gt;
&lt;li&gt;Use a memory manager to assemble context for each turn.&lt;/li&gt;
&lt;li&gt;Move to a database-backed memory layer when memory needs to be durable, queryable, shared, and governed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Long conversations are not solved by one memory technique. A bigger context window, raw chat history, summaries, vector retrieval, structured facts, and episodic logs each solve part of the problem. The best pattern is hybrid layered memory coordinated by a memory manager.&lt;/p&gt;

&lt;p&gt;Oracle AI Database provides a durable implementation layer for that pattern when teams need relational precision, vector retrieval, JSON metadata, and governed access. Oracle AI Agent Memory and LangChain can then sit above that layer when developers need higher-level APIs or orchestration. The goal is not to keep making prompts larger. The goal is to make memory inspectable, retrievable, updateable, and reliable.&lt;/p&gt;

&lt;p&gt;Run the &lt;a href="https://github.com/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/oracle_agent_memory_long_conversations.ipynb" rel="noopener noreferrer"&gt;companion notebook&lt;/a&gt; to see the pattern stored, retrieved, scoped, and validated in Oracle AI Database, including the oracleagentmemory package workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is the best memory approach for long conversations?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hybrid layered memory: recent context, summaries, vector retrieval, structured memory, episodic memory, and a memory manager.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is a larger context window enough?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. It gives the model more room, but it does not define what should be stored, retrieved, updated, or trusted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is conversation summary memory good for?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It compresses older dialogue so the model can keep continuity without receiving the full transcript.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is vector memory good for?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vector memory helps retrieve semantically related context, especially when users ask follow-up questions with different wording.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is structured memory good for?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Structured memory stores stable facts, preferences, entities, decisions, and state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is episodic memory good for?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Episodic memory stores important events, outcomes, and prior attempts, which helps with task resumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does a memory manager do?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It decides what gets stored, updated, retrieved, summarized, and passed into the model for each turn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where does Oracle AI Database fit?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It provides the durable memory layer for relational memory, JSON metadata, episodic logs, and vector-searchable chunks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where does Oracle AI Agent Memory fit?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It provides a higher-level package API for memory records, threads, and retrieval on top of Oracle AI Database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where does LangChain fit?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LangChain can help with orchestration and retriever interfaces after the memory layer is working.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agentmemory</category>
      <category>agents</category>
      <category>oracle</category>
    </item>
    <item>
      <title>Semantic Caching with Spring AI, Oracle AI Database 26ai, and Oracle True Cache: The Architecture Before the Code</title>
      <dc:creator>Mark Nelson</dc:creator>
      <pubDate>Mon, 22 Jun 2026 14:53:45 +0000</pubDate>
      <link>https://dev.to/oracledevs/semantic-caching-with-spring-ai-oracle-ai-database-26ai-and-oracle-true-cache-the-architecture-llb</link>
      <guid>https://dev.to/oracledevs/semantic-caching-with-spring-ai-oracle-ai-database-26ai-and-oracle-true-cache-the-architecture-llb</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Semantic caching is policy-controlled answer reuse, not just vector search. A nearest-neighbor match is only a candidate until tenant, security, model, prompt-template, data-domain, threshold, freshness, and reuse policy approve it.&lt;/li&gt;
&lt;li&gt;Keep semantic-cache answers separate from retrieval-augmented generation (RAG) documents. RAG retrieves source material for a new answer; semantic caching retrieves a prior generated answer only when reuse is safe.&lt;/li&gt;
&lt;li&gt;Oracle AI Database 26ai is a strong fit when the cache lookup needs both vector ranking and SQL predicates. Native &lt;code&gt;VECTOR&lt;/code&gt; columns, &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt;, and vector indexes sit alongside relational columns and transactions, so the embedding, metadata, provenance, and invalidation state can live in one SQL-backed record.&lt;/li&gt;
&lt;li&gt;Oracle True Cache belongs on the eligible read-only lookup path. In this series, we use it for lookup-heavy semantic-cache SQL traffic where routing and freshness rules fit; it does not compute embeddings, judge semantic equivalence, or approve cached answers.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Your Spring AI application has probably seen traffic like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt A: "How do I reset my password?"
Prompt B: "I forgot my login password. How do I reset it?"
Prompt C: "Can you help me recover account access?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you cache only by exact prompt text, those are three different strings. Unless your application normalizes them into the same scoped key, you get three misses, three large language model (LLM) calls, and three chances to spend latency and tokens on essentially the same answer.&lt;/p&gt;

&lt;p&gt;That is the problem semantic caching is meant to solve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic response caching reuses a previously generated answer when a new prompt is semantically similar and the application policy says reuse is safe.&lt;/strong&gt; Instead of asking, “Have I seen this exact string before?”, the application asks a more useful question: “Have I already answered a sufficiently similar question, and is that answer still safe to reuse for this request?”&lt;/p&gt;

&lt;p&gt;The second half of that question is what keeps the architecture honest. A semantic cache is not a “vector search equals cache hit” button. Vector search proposes candidates. The application and database policy decide whether reuse is allowed.&lt;/p&gt;

&lt;p&gt;That is where Oracle AI Database 26ai becomes interesting for Spring AI developers. Semantic-cache entries are not just disposable cache keys. In many applications, they are governed records: scoped by tenant, security, chat model, embedding model, prompt template, and data domain, and carrying freshness rules, provenance, invalidation state, feedback, and operational metrics. With Oracle AI Database 26ai, the prompt embedding and policy metadata can live in the same transactional database record and be queried together.&lt;/p&gt;

&lt;p&gt;Oracle True Cache fits one layer below that decision. When semantic-cache lookups become read-heavy, True Cache can support eligible read-only lookup traffic without changing what a semantic-cache hit means.&lt;/p&gt;

&lt;p&gt;This is the first article in the series, so we will stay at the architecture level. We will define the moving parts, draw the boundaries, and set the decision rules that the implementation and benchmark in later articles preserve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Different cache layers solve different LLM application problems
&lt;/h2&gt;

&lt;p&gt;Caching discussions around LLM applications get confusing because several mechanisms reduce repeated work, but they operate at different layers. The simplest way to keep them straight is to ask what each layer stores and who decides whether reuse is safe.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;exact response cache&lt;/strong&gt; stores an answer under a deterministic key. That key usually includes normalized prompt text plus scope such as tenant, chat model, prompt template, application, and data domain. Exact caching is simple and often the safest place to start. If the same scoped request arrives again, return the same answer. If the wording changes, the exact key usually changes too.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;semantic response cache&lt;/strong&gt; stores a prior prompt embedding plus the generated answer and policy metadata. When a new request arrives, the application embeds the new prompt and searches for nearby prior prompts. A close match can avoid another LLM call, but only after policy approves reuse.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;retrieval-augmented generation (RAG) store&lt;/strong&gt; is different. RAG retrieves source material: documentation chunks, policy text, product manuals, support articles, tickets, or other records used to construct a new answer. RAG retrieval does not mean “return this old model answer.” It means “bring relevant source content into the generation step.”&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;database result cache&lt;/strong&gt; or &lt;strong&gt;HTTP cache&lt;/strong&gt; usually caches deterministic outputs for exact queries or resources. It does not understand paraphrases. It is useful, but it is not semantic matching.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;LLM provider prompt cache&lt;/strong&gt; is also adjacent, not equivalent. Provider prompt caching, where available, can reduce provider-side processing for repeated prompt prefixes or context blocks. The application still sends the request, and the provider still generates the response. Semantic response caching is an application-controlled decision to skip generation and reuse a prior answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Oracle True Cache&lt;/strong&gt; is another layer. True Cache is an in-memory, read-only cache in front of Oracle AI Database. In this architecture, it helps with eligible database reads during semantic-cache candidate lookup. It is not the semantic cache itself, and it does not decide whether two prompts mean the same thing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fwahvyo7038z6e9mmase8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fwahvyo7038z6e9mmase8.png" alt="Semantic cache lookup with Oracle AI Database 26ai and Oracle True Cache" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Oracle AI Database 26ai is the semantic-cache system of record. Vector similarity proposes candidates, SQL predicates narrow the eligible set, Oracle True Cache supports eligible read-only lookup traffic, and the app calls the LLM only when no candidate is approved for reuse.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The distinction looks small in a diagram. In production code, it is the difference between safe reuse and a false-positive machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  A semantic-cache candidate is not a hit
&lt;/h2&gt;

&lt;p&gt;Let’s replay the password example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;exact-cache lookup:
A -&amp;gt; MISS -&amp;gt; call LLM -&amp;gt; store answer
B -&amp;gt; MISS -&amp;gt; call LLM again
C -&amp;gt; MISS -&amp;gt; call LLM again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now compare that with a semantic-cache path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;semantic-cache lookup:
A -&amp;gt; MISS -&amp;gt; call LLM -&amp;gt; store prompt embedding + answer + policy metadata
B -&amp;gt; CANDIDATE -&amp;gt; passes policy -&amp;gt; reuse answer
C -&amp;gt; CANDIDATE -&amp;gt; may pass or fail depending on threshold and policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the word &lt;strong&gt;candidate&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Prompt B is probably a safe paraphrase of Prompt A in a simple account-help FAQ application. Prompt C is broader. “Recover account access” could mean resetting a password, unlocking an account, recovering a username, passing multi-factor authentication, or talking to support. Whether Prompt C can reuse the same answer depends on your domain and policy.&lt;/p&gt;

&lt;p&gt;That is the mental model to keep: vector similarity proposes candidates; policy approves hits.&lt;/p&gt;

&lt;p&gt;A semantic-cache candidate needs to pass checks like these before the application returns the cached answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;same tenant or authorized sharing scope&lt;/li&gt;
&lt;li&gt;same security scope&lt;/li&gt;
&lt;li&gt;same application and data domain&lt;/li&gt;
&lt;li&gt;compatible chat model or model family, depending on your reuse policy&lt;/li&gt;
&lt;li&gt;same embedding model and embedding dimension&lt;/li&gt;
&lt;li&gt;same prompt template and prompt-template version&lt;/li&gt;
&lt;li&gt;unexpired and not invalidated&lt;/li&gt;
&lt;li&gt;acceptable vector distance or similarity threshold&lt;/li&gt;
&lt;li&gt;acceptable provenance and source-policy version&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some of those checks can live in vector-store metadata filters. In an Oracle-backed design, you can also encode them directly as SQL predicates alongside vector ranking.&lt;/p&gt;

&lt;h2&gt;
  
  
  Spring AI handles the AI flow; the semantic-cache service owns reuse policy
&lt;/h2&gt;

&lt;p&gt;Spring AI provides the application-level pieces for this architecture, including chat clients, embedding models, vector-store abstractions, &lt;code&gt;SearchRequest&lt;/code&gt;, metadata filters, provider integrations, and advisor-style request interception. The Spring AI vector database reference describes the core &lt;code&gt;VectorStore&lt;/code&gt; shape, including similarity search with top-k, threshold, and filter expressions. Spring AI also documents an &lt;code&gt;OracleVectorStore&lt;/code&gt; integration for Oracle Database AI Vector Search.&lt;/p&gt;

&lt;p&gt;For an Oracle implementation, the important design choice is to keep the semantic-cache store dedicated. Even if your application already has a RAG vector store, avoid quietly reusing it for cached answers. Give the cache its own Oracle table, schema, or hard metadata scope.&lt;/p&gt;

&lt;p&gt;Depending on the Spring AI version you target, the implementation can take one of two paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use a Spring AI cache component if the selected version exposes one that fits the required policy and storage model&lt;/li&gt;
&lt;li&gt;implement a small Oracle-native semantic-cache service beside Spring AI, using Spring AI for embeddings and chat while Oracle AI Database 26ai handles vector-plus-policy lookup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That second option is not a workaround. It can be the cleaner production shape when the cache decision needs strict relational predicates, transactional hit logging, invalidation, provenance, and reporting. Spring AI remains the Java AI framework. Oracle AI Database 26ai serves as the governed semantic-cache backend.&lt;/p&gt;

&lt;p&gt;One practical caveat: do not assume that every Spring AI cache API is backend-agnostic or that every vector-store abstraction is enough for strict tenant, security, model, template, freshness, and invalidation policy. For the Oracle path, pin the Spring AI version and the Oracle AI Database target, then decide whether &lt;code&gt;OracleVectorStore&lt;/code&gt; metadata filters are sufficient or whether direct Oracle SQL is the clearer implementation.&lt;/p&gt;

&lt;p&gt;For this architecture article, the important boundary is simple: Spring AI handles the application AI flow, and the semantic-cache service owns the answer-reuse decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  The semantic-cache lookup architecture
&lt;/h2&gt;

&lt;p&gt;The main lookup path has a simple rhythm.&lt;/p&gt;

&lt;p&gt;The application receives a prompt and builds a scoped exact-cache key first. If there is no exact hit, it creates an embedding for the prompt. Then it queries a dedicated Oracle semantic-cache table for the nearest policy-eligible candidates. If a candidate passes threshold and reuse policy, the application returns the cached answer. If not, it calls the LLM, stores the new answer and metadata in Oracle AI Database 26ai, and returns the new answer.&lt;/p&gt;

&lt;p&gt;The key point is not only that the database stores vectors, although it does. Oracle AI Database 26ai includes a native &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/create-tables-using-vector-data-type.html" rel="noopener noreferrer"&gt;&lt;code&gt;VECTOR&lt;/code&gt; data type&lt;/a&gt;, SQL vector distance functions such as &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/vector_distance.html" rel="noopener noreferrer"&gt;&lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt;&lt;/a&gt;, and vector indexes such as HNSW and IVF through &lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/create-vector-index.html" rel="noopener noreferrer"&gt;&lt;code&gt;CREATE VECTOR INDEX&lt;/code&gt;&lt;/a&gt;. Those features matter because they let the cache lookup live in SQL with the same metadata that determines whether reuse is safe.&lt;/p&gt;

&lt;p&gt;A semantic-cache row is closer to an operational record than a document chunk. It might include fields such as tenant scope, security scope, application identity, model identity, prompt-template version, data domain, the original question, the question embedding, the generated answer, source-policy version, timestamps, invalidation state, provenance, hit metadata, and feedback signals.&lt;/p&gt;

&lt;p&gt;You do not need every field on day one. A first implementation can start narrower: tenant, domain, model identity, template version, embedding model, expiration, invalidation state, the prompt embedding, and the answer. The point is to make the reuse boundary explicit: who may reuse the answer, which model and prompt template produced it, which domain it belongs to, and when it becomes unsafe to serve.&lt;/p&gt;
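
&lt;p&gt;As a minimal sketch of that narrower starting point, the table below reuses the column names from this article's lookup query. The 1024-dimension &lt;code&gt;FLOAT32&lt;/code&gt; vector, the column sizes, and the identity key are illustrative assumptions, not a schema prescribed by this series; match the dimension to your embedding model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical starter schema: sizes and the 1024 dimension are assumptions
CREATE TABLE semantic_cache (
    cache_id                NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id               VARCHAR2(64)  NOT NULL,
    security_scope          VARCHAR2(64)  NOT NULL,
    application_id          VARCHAR2(64)  NOT NULL,
    chat_model_id           VARCHAR2(128) NOT NULL,
    embedding_model_id      VARCHAR2(128) NOT NULL,
    embedding_dimension     NUMBER(5)     NOT NULL,
    prompt_template_id      VARCHAR2(128) NOT NULL,
    prompt_template_version VARCHAR2(32)  NOT NULL,
    data_domain             VARCHAR2(64)  NOT NULL,
    source_policy_version   VARCHAR2(32)  NOT NULL,
    question_text           CLOB          NOT NULL,
    question_embedding      VECTOR(1024, FLOAT32) NOT NULL,
    answer_text             CLOB          NOT NULL,
    created_at              TIMESTAMP WITH TIME ZONE DEFAULT SYSTIMESTAMP NOT NULL,
    -- NULL means "never expires" and "not invalidated" respectively
    expires_at              TIMESTAMP WITH TIME ZONE,
    invalidated_at          TIMESTAMP WITH TIME ZONE
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Leaving &lt;code&gt;expires_at&lt;/code&gt; and &lt;code&gt;invalidated_at&lt;/code&gt; nullable matches the lookup predicates: a row stays eligible while &lt;code&gt;invalidated_at&lt;/code&gt; is null and &lt;code&gt;expires_at&lt;/code&gt; is either null or in the future.&lt;/p&gt;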

&lt;h2&gt;
  
  
  Use Oracle SQL to rank by vector distance and filter by reuse policy
&lt;/h2&gt;

&lt;p&gt;The semantic-cache lookup is not just this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A representative lookup query looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;cache_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;answer_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;semantic_cache&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;security_scope&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;security_scope&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;application_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;application_id&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;chat_model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;chat_model_id&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding_model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;embedding_model_id&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding_dimension&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;embedding_dimension&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;prompt_template_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;prompt_template_id&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;prompt_template_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;prompt_template_version&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;data_domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;data_domain&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;source_policy_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;source_policy_version&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;invalidated_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expires_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;expires_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SYSTIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The shape is the important part. The query combines vector ranking with policy predicates. The database does not return “the answer is safe.” It returns the nearest candidates that are eligible under the SQL predicates. The application still applies the threshold and any application-specific reuse rules before serving the cached answer.&lt;/p&gt;

&lt;p&gt;That distinction also avoids a common threshold mistake. &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; returns a distance value; lower is closer for a distance metric such as cosine distance. Spring AI’s &lt;code&gt;SearchRequest&lt;/code&gt; exposes a similarity-threshold concept at the abstraction layer, where values closer to &lt;code&gt;1&lt;/code&gt; represent higher similarity. If your implementation reports both, make the direction explicit. A distance threshold and a similarity threshold are not the same number with a different label.&lt;/p&gt;
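
&lt;p&gt;One way to keep the direction explicit is to report both values and filter on only one of them. The sketch below assumes the &lt;code&gt;semantic_cache&lt;/code&gt; table from this article and a hypothetical &lt;code&gt;:max_distance&lt;/code&gt; bind, and it trims the policy predicates for brevity. For the cosine metric, cosine distance is one minus cosine similarity, so the derived column is labeled that way. How a given Spring AI version maps its similarity threshold onto a store's distance is something to verify rather than assume.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- :max_distance is a hypothetical bind; pick its value from measurement
SELECT cache_id,
       answer_text,
       distance,
       1 - distance AS cosine_similarity
FROM (
    SELECT cache_id,
           answer_text,
           VECTOR_DISTANCE(question_embedding, :query_embedding, COSINE) AS distance
    FROM semantic_cache
    WHERE tenant_id = :tenant_id
      AND invalidated_at IS NULL
)
WHERE distance &amp;lt;= :max_distance   -- lower distance means closer
ORDER BY distance
FETCH FIRST 5 ROWS ONLY;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Applying the threshold in SQL is optional. The application can equally fetch the top candidates and compare &lt;code&gt;distance&lt;/code&gt; itself, as long as it never compares a distance against a similarity threshold.&lt;/p&gt;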

&lt;p&gt;For small demos, exact vector search can be easier to reason about. For larger cache tables, Oracle vector indexes such as HNSW and IVF become tuning tools. Approximate indexes trade some recall for lookup speed, so they belong in the measurement discussion after the correctness rules are stable.&lt;/p&gt;
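
&lt;p&gt;When the table grows large enough to justify an approximate index, an HNSW index and an approximate top-k query might look like the sketch below. The index name and the target accuracy of 95 are assumptions chosen for illustration, and HNSW indexes need vector pool memory configured on the database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical index name; 95 is an illustrative accuracy target
CREATE VECTOR INDEX semantic_cache_hnsw_idx
    ON semantic_cache (question_embedding)
    ORGANIZATION INMEMORY NEIGHBOR GRAPH
    DISTANCE COSINE
    WITH TARGET ACCURACY 95;

-- APPROX allows the optimizer to use the vector index for top-k ranking
SELECT cache_id, answer_text
FROM semantic_cache
WHERE tenant_id = :tenant_id
  AND invalidated_at IS NULL
ORDER BY VECTOR_DISTANCE(question_embedding, :query_embedding, COSINE)
FETCH APPROX FIRST 5 ROWS ONLY;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An IVF index uses &lt;code&gt;ORGANIZATION NEIGHBOR PARTITIONS&lt;/code&gt; instead. Either way, compare the approximate results against an exact search on the same data before trusting the index on the reuse path.&lt;/p&gt;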

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcdu3adcxi0iyo2jztf4e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcdu3adcxi0iyo2jztf4e.png" alt="A semantic-cache candidate is not automatically a hit" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The nearest vector result is a candidate. Tenant, chat model, embedding model, prompt-template version, domain, freshness, invalidation, and threshold rules determine whether the application can reuse the answer.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Keep RAG documents and cached answers in separate vector spaces
&lt;/h2&gt;

&lt;p&gt;RAG and semantic caching both use embeddings, but they store different things for different purposes. A RAG vector store retrieves source material for generation. A semantic-cache store retrieves a prior final answer for possible reuse.&lt;/p&gt;

&lt;p&gt;That difference is important enough to show in the data model.&lt;/p&gt;

&lt;p&gt;A RAG store contains source material:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RAG store:
- record_type: RAG_DOCUMENT
- content: "Password reset links expire after 15 minutes."
- purpose: source material for generating a new answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A semantic-cache store contains prior generated answers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Semantic-cache store:
- record_type: SEMANTIC_CACHE
- question: "How do I reset my password?"
- answer: "Go to Account Settings, choose Security, then Reset Password..."
- purpose: previously generated final answer that may be reused
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A source document can help generate a new answer. A cached answer is prior model output. Do not let one silently stand in for the other.&lt;/p&gt;

&lt;p&gt;The safest default is separate Oracle tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Source chunks for RAG retrieval&lt;/span&gt;
&lt;span class="n"&gt;rag_documents&lt;/span&gt;

&lt;span class="c1"&gt;-- Prior generated answers for semantic-cache reuse&lt;/span&gt;
&lt;span class="n"&gt;semantic_cache&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you intentionally use a shared table or shared vector-store infrastructure, every query needs a hard predicate such as &lt;code&gt;record_type = 'SEMANTIC_CACHE'&lt;/code&gt; or &lt;code&gt;record_type = 'RAG_DOCUMENT'&lt;/code&gt;, plus tenant and domain scope. Separate storage spaces are easier to inspect, test, audit, and explain than a shared table that relies on every caller passing the right filter every time.&lt;/p&gt;
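
&lt;p&gt;If a shared table is unavoidable, one way to reduce the risk of a missing filter is to expose each record type through its own view. The sketch assumes a hypothetical shared table named &lt;code&gt;shared_vector_records&lt;/code&gt; with a &lt;code&gt;record_type&lt;/code&gt; column; neither the table name nor the view name comes from the series code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical shared table; the CHECK restricts record_type to known values
ALTER TABLE shared_vector_records
    ADD CONSTRAINT shared_vector_records_type_chk
    CHECK (record_type IN ('RAG_DOCUMENT', 'SEMANTIC_CACHE'));

-- Cache lookups read this view, so the record-type predicate is always applied
CREATE OR REPLACE VIEW semantic_cache_entries AS
SELECT *
FROM shared_vector_records
WHERE record_type = 'SEMANTIC_CACHE';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The view only fixes the record type. Tenant and domain scope still have to be supplied as predicates on every query against it.&lt;/p&gt;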

&lt;p&gt;This separation also helps Spring applications as they grow. Today you may have only a semantic cache. Tomorrow you may add RAG, memory, tools, or safety advisors. Explicitly named stores and beans keep those paths from bleeding into each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  Oracle True Cache offloads eligible lookup reads, not semantic decisions
&lt;/h2&gt;

&lt;p&gt;True Cache is useful after the semantic-cache boundaries are clear.&lt;/p&gt;

&lt;p&gt;In this architecture, the Oracle AI Database 26ai primary remains the authoritative store for semantic-cache entries, RAG documents, policy metadata, invalidation state, hit logging, feedback, and new cache writes. Oracle True Cache is the read-path component we use for eligible, read-only semantic-cache candidate lookup traffic in the &lt;code&gt;semantic-true-cache&lt;/code&gt; mode.&lt;/p&gt;

&lt;p&gt;The read/write boundary matters. Semantic-cache writes, invalidation updates, feedback, and hit metadata belong on the primary write path. Candidate lookup can use the configured read path when the query is eligible and the freshness behavior matches the application’s correctness rules.&lt;/p&gt;

&lt;p&gt;The intended separation is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;semantic-primary:
lookup path -&amp;gt; Oracle AI Database 26ai primary service
write path  -&amp;gt; Oracle AI Database 26ai primary service

semantic-true-cache:
lookup path -&amp;gt; Oracle True Cache read service
write path  -&amp;gt; Oracle AI Database 26ai primary service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The request flow then follows the read/write boundary shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F0vll3a8a5qzz3rlb0l8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F0vll3a8a5qzz3rlb0l8w.png" alt="Semantic-cache request flow with Oracle True Cache" width="800" height="555"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;True Cache supports eligible read-only lookup SQL. Semantic approval remains an application responsibility, and all writes, invalidations, events, and feedback stay on the primary database path.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Oracle True Cache does not embed prompts. It does not calculate semantic meaning by itself. It does not decide whether a cached answer is safe. It supports the Oracle AI Database read path.&lt;/p&gt;

&lt;p&gt;There is an important freshness caveat. True Cache is automatically maintained from the primary database, and reads return committed, consistent data. Like any cache, though, it may not show the latest primary write at every instant. That matters for semantic caching because invalidation and expiration are correctness rules, not only performance details. For checks that are sensitive to the latest primary write, use the primary service for that check, require a primary-confirmed policy version before reuse, or measure refresh behavior for the workload before routing that path through True Cache.&lt;/p&gt;
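
&lt;p&gt;A primary-confirmed check can be as small as re-reading the policy columns for the chosen candidate. This sketch assumes the column names used in this article's lookup query and is meant to run on a connection to the primary service, not the True Cache service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Run on the primary service: returns a row only if the candidate is still reusable
SELECT cache_id
FROM semantic_cache
WHERE cache_id = :cache_id
  AND source_policy_version = :source_policy_version
  AND invalidated_at IS NULL
  AND (expires_at IS NULL OR expires_at &amp;gt; SYSTIMESTAMP);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The check adds a round trip to the primary on every hit, so it is worth measuring whether the workload needs it on every request or only for domains where stale reuse is unacceptable.&lt;/p&gt;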

&lt;p&gt;That read/write separation also makes measurement honest. If every cache hit synchronously updates &lt;code&gt;hit_count&lt;/code&gt;, &lt;code&gt;last_hit_at&lt;/code&gt;, and detailed metrics in the same request, the workload may stop being read-heavy. The implementation can still record hit metadata, but those writes belong on the primary database path. The benchmark separates read-only lookup latency from full request latency with write-back.&lt;/p&gt;
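
&lt;p&gt;For illustration, a hit-metadata write and an invalidation might look like the statements below, both sent to the primary service. Treating &lt;code&gt;hit_count&lt;/code&gt; and &lt;code&gt;last_hit_at&lt;/code&gt; as columns on &lt;code&gt;semantic_cache&lt;/code&gt;, and invalidating by tenant and data domain, are assumptions; the series implementation may log hits in a separate table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Record a hit on the primary write path; NVL handles a NULL starting count
UPDATE semantic_cache
SET hit_count   = NVL(hit_count, 0) + 1,
    last_hit_at = SYSTIMESTAMP
WHERE cache_id = :cache_id;

-- Invalidate every live entry for one tenant and data domain
UPDATE semantic_cache
SET invalidated_at = SYSTIMESTAMP
WHERE tenant_id = :tenant_id
  AND data_domain = :data_domain
  AND invalidated_at IS NULL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deferring or batching the hit update outside the request is one way to keep the lookup path read-only while still collecting the metrics.&lt;/p&gt;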

&lt;h2&gt;
  
  
  Why Oracle AI Database 26ai is an excellent semantic-cache backend
&lt;/h2&gt;

&lt;p&gt;Oracle AI Database 26ai is a strong fit when cached LLM answers are governed application records, not just short-lived cache values. The database is especially useful when the reuse decision needs vector similarity and relational policy checks in the same lookup.&lt;/p&gt;

&lt;p&gt;That combination is the center of this architecture. A cache row can store the question embedding, generated answer, tenant scope, security scope, model identity, prompt-template version, data domain, provenance, expiration, invalidation state, and feedback signals together. A single SQL query can rank candidates by vector distance while filtering by the policy fields that decide whether reuse is even eligible.&lt;/p&gt;

&lt;p&gt;Oracle AI Database 26ai is attractive when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tenant isolation and security scope are mandatory&lt;/li&gt;
&lt;li&gt;answers depend on chat model, embedding model, prompt template, data domain, or policy version&lt;/li&gt;
&lt;li&gt;invalidation must be auditable&lt;/li&gt;
&lt;li&gt;provenance and source fingerprints matter&lt;/li&gt;
&lt;li&gt;hit/miss behavior needs SQL reporting&lt;/li&gt;
&lt;li&gt;feedback or quality signals are stored with the cache entry&lt;/li&gt;
&lt;li&gt;application data and policy state already live in Oracle AI Database&lt;/li&gt;
&lt;li&gt;DBAs and platform teams want backup, access controls, lifecycle management, and operational views in the same database estate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The design is strongest when the cache decision is semantic and relational at the same time. If repeated traffic is mostly exact repeats, start with an exact scoped cache. If answers are disposable, short-lived, and governed only by simple TTL rules, a lightweight cache service may be enough. If vector retrieval is a standalone platform shared across many independent applications, a dedicated vector database can also be a reasonable fit.&lt;/p&gt;

&lt;p&gt;For this series, the interesting case is the governed one: cached answers that must carry tenant, security, model, prompt, domain, freshness, invalidation, and provenance policy alongside the vector used to find them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the series benchmark measures
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Articles 1 through 3 use a single-machine Docker Compose environment as a functional test bed. That setup is ideal for validating the schema, policy rules, route selection, exact hits, semantic hits, near misses, invalidation behavior, and the basic Oracle True Cache read-path integration. Article 4 then takes the same application pattern into a more realistic deployment scenario, with the application and Oracle True Cache kept together and the primary database moved to a remote OCI deployment so the read-path comparison reflects a real network hop.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The series demo is designed to test the same architecture described here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fpzuv9rx5zqscdlrl8bcn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fpzuv9rx5zqscdlrl8bcn.png" alt="Benchmark measurement plan for the semantic-cache series" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The benchmark keeps the same application, schema boundaries, provider pinning, and policy rules while comparing &lt;code&gt;none&lt;/code&gt;, &lt;code&gt;exact&lt;/code&gt;, &lt;code&gt;semantic-primary&lt;/code&gt;, and &lt;code&gt;semantic-true-cache&lt;/code&gt; modes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Treat &lt;code&gt;semantic-true-cache&lt;/code&gt; as the mode that exercises Oracle True Cache for eligible read-only semantic-cache lookup SQL. True Cache can offload eligible read-only database lookups from the primary database and may improve lookup latency or scalability for read-heavy workloads, but that benefit has to be measured for the workload, not assumed.&lt;/p&gt;

&lt;p&gt;The report tracks total requests, LLM calls avoided, exact hits, semantic candidates, accepted semantic hits, rejected near misses, latency percentiles, database lookup time, embedding time, token usage where available, expiration and invalidation behavior, and the Oracle True Cache read-path comparison.&lt;/p&gt;
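
&lt;p&gt;Two of those figures are simple to derive from raw counters and samples. The sketch below assumes a hypothetical &lt;code&gt;CacheReport&lt;/code&gt; helper that is not part of the series demo. It computes the share of requests that avoided an LLM call and a nearest-rank latency percentile. Nearest-rank is one common percentile definition, chosen here only for illustration.&lt;/p&gt;

```java
import java.util.Arrays;

/** Hypothetical report helper; names and percentile method are illustrative assumptions. */
final class CacheReport {

    /** Share of requests served from the cache, counting exact and accepted semantic hits. */
    static double avoidedRatio(long totalRequests, long exactHits, long acceptedSemanticHits) {
        if (totalRequests == 0) {
            return 0.0;
        }
        return (double) (exactHits + acceptedSemanticHits) / totalRequests;
    }

    /** Nearest-rank percentile over latency samples in milliseconds, with p in 0..100. */
    static double percentile(double[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        // Rank is 1-based: the smallest rank whose share of samples reaches p percent.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        int clamped = Math.min(Math.max(rank, 1), sorted.length);
        return sorted[clamped - 1];
    }
}
```

&lt;p&gt;Rejected near misses are intentionally excluded from &lt;code&gt;avoidedRatio&lt;/code&gt;, because a rejected candidate still results in a fresh LLM call.&lt;/p&gt;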

&lt;p&gt;The goal is not to publish a universal “semantic caching saves X percent” claim. The useful result is a repeatable way to answer narrower questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For this workload, how many repeated LLM calls did the cache safely avoid?&lt;/li&gt;
&lt;li&gt;Which near misses did the policy reject?&lt;/li&gt;
&lt;li&gt;How sensitive were results to threshold and freshness settings?&lt;/li&gt;
&lt;li&gt;How much time did embedding and database lookup add?&lt;/li&gt;
&lt;li&gt;Did the Oracle True Cache read path help, remain neutral, or add overhead for this lookup workload?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those answers have to come from measurement, not assumptions or a single happy-path demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical decision rule for Spring AI semantic caching
&lt;/h2&gt;

&lt;p&gt;Use semantic caching when your application repeatedly answers semantically equivalent questions and answer reuse is safe within the same tenant, security scope, chat model or approved model family, embedding model, prompt template, data domain, and freshness window.&lt;/p&gt;

&lt;p&gt;Use exact caching first when exact reuse is available. Add semantic caching when paraphrased repetition is common enough to justify embedding and vector lookup. Keep RAG documents and cached answers separate. Treat vector results as candidates. Make freshness and invalidation part of the schema, not an afterthought. Route eligible read-only semantic-cache lookups through Oracle True Cache when the query path and freshness rules fit the workload, and measure the effect rather than assuming it.&lt;/p&gt;
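
&lt;p&gt;The ordering in that rule can be sketched as a small in-memory routine: exact reuse first, then vector results treated strictly as candidates that must pass scope, freshness, and threshold checks. Everything here is a hypothetical illustration, including the &lt;code&gt;CacheDecision&lt;/code&gt; class, the record fields, and the &lt;code&gt;maxDistance&lt;/code&gt; parameter. It assumes the exact lookup and the vector query have already run elsewhere.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical routing sketch; not the series demo's implementation. */
final class CacheDecision {

    enum Route { EXACT_HIT, SEMANTIC_HIT, MISS }

    /** Policy fields that must match exactly before reuse is considered. */
    record Scope(String tenant, String securityScope, String chatModel,
                 String embeddingModel, String promptTemplateVersion, String dataDomain) {}

    /** One row from the vector lookup: a candidate, not yet an answer. */
    record Candidate(Scope scope, String answer, double distance, boolean fresh) {}

    record Decision(Route route, String answer) {}

    static Decision decide(Scope request, String exactAnswer,
                           Candidate[] candidates, double maxDistance) {
        // Exact scoped reuse wins whenever it is available.
        if (exactAnswer != null) {
            return new Decision(Route.EXACT_HIT, exactAnswer);
        }
        Candidate[] ranked = candidates.clone();
        Arrays.sort(ranked, Comparator.comparingDouble(Candidate::distance));
        for (Candidate c : ranked) {
            boolean closeEnough = maxDistance >= c.distance();
            if (!closeEnough) {
                break; // sorted ascending, so every later candidate is farther away
            }
            if (!request.equals(c.scope())) {
                continue; // near in vector space but outside the caller's scope
            }
            if (!c.fresh()) {
                continue; // expired or invalidated
            }
            return new Decision(Route.SEMANTIC_HIT, c.answer());
        }
        // No eligible candidate: generate a fresh answer instead of risking a wrong one.
        return new Decision(Route.MISS, null);
    }
}
```

&lt;p&gt;With this shape, a candidate from another tenant is skipped even when it is the closest vector, and a request with no candidate inside &lt;code&gt;maxDistance&lt;/code&gt; falls through to &lt;code&gt;MISS&lt;/code&gt;.&lt;/p&gt;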

&lt;p&gt;Semantic caching is a poor fit when each answer depends on rapidly changing user-specific state, when the prompt is high-risk, or when a near miss could cause material harm. In those cases, a cache miss and a fresh generation are cheaper than a wrong answer.&lt;/p&gt;

&lt;p&gt;That is the architecture to build on: Spring AI at the application layer, Oracle AI Database 26ai as the vector-plus-policy semantic-cache backend, and Oracle True Cache as the eligible read-path component for lookup-heavy semantic-cache SQL traffic.&lt;/p&gt;

&lt;p&gt;In the next article, we will turn this architecture into an inspectable implementation: a Spring Boot command-line demo, a dedicated Oracle semantic-cache table, exact and semantic lookup paths, scoped rejection cases, fixture vectors, and validation output that shows what the database returned. After that, the measurement article will use the same demo reports to separate correctness checks from benchmark claims.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources and further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.spring.io/spring-ai/reference/api/vectordbs.html" rel="noopener noreferrer"&gt;Spring AI vector database support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.spring.io/spring-ai/reference/api/vectordbs/oracle.html" rel="noopener noreferrer"&gt;Spring AI Oracle Vector Store&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/" rel="noopener noreferrer"&gt;Oracle AI Vector Search documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/create-tables-using-vector-data-type.html" rel="noopener noreferrer"&gt;Oracle &lt;code&gt;VECTOR&lt;/code&gt; data type&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/vector_distance.html" rel="noopener noreferrer"&gt;Oracle &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; SQL function&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/create-vector-index.html" rel="noopener noreferrer"&gt;Oracle vector indexes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/overview-oracle-true-cache.html" rel="noopener noreferrer"&gt;Oracle True Cache overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/methods-connecting-true-cache.html" rel="noopener noreferrer"&gt;Connecting applications to Oracle True Cache&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>oracle</category>
      <category>ai</category>
      <category>semantic</category>
      <category>spring</category>
    </item>
  </channel>
</rss>
