Wojtek Pluta for Oracle Developers

Build a Scalable Multi-Agent RAG System with A2A Protocol, Oracle AI Database and LangChain

Why this matters

Multi-agent systems for retrieval-augmented generation (RAG) promise collaborative AI reasoning but often fail at scale due to resource contention and tight coupling. This tutorial shows how to build a distributed system using the Agent2Agent (A2A) Protocol, enabling independent agent scaling while integrating Oracle AI Database for vector storage and search through the langchain-oracledb LangChain package. You'll end up with a flexible RAG pipeline that handles document queries efficiently, suitable for AI developers facing production bottlenecks.

The outcome: A loosely coupled architecture where agents like planners, researchers, and synthesizers communicate via A2A, reducing latency and improving fault isolation in high-load scenarios.

What we’ll build

We'll create Agentic RAG, an intelligent RAG system with multi-agent CoT reasoning:

  • Planner, Researcher, Reasoner, Synthesizer agents communicating via A2A Protocol.
  • PDF/web/repo processing with Docling/Trafilatura/Gitingest.
  • Persistent vector storage in Oracle AI Database 26ai.
  • FastAPI API and Gradio UI for uploads/queries.
  • Local LLMs via Ollama (gemma3:270m default).

Architecture showing PDF/web processing, vector store, RAG agent, and A2A multi-agent CoT.

Requirements: Python with pip, Ollama for local LLMs, and access to an Oracle AI Database 26ai instance (Oracle AI Database Free works).

Repo: Oracle AI Developer Hub.

Setup

Install and configure

Clone the repo and install dependencies:

git clone https://github.com/oracle-devrel/oracle-ai-developer-hub.git
cd oracle-ai-developer-hub/apps/agentic_rag
pip install -r requirements.txt  # Includes docling, trafilatura, oracledb, fastapi, gradio, ollama and langchain-oracledb

Set up Ollama for local LLMs (default: gemma3:270m):

ollama pull gemma3:270m
ollama serve

Configure Oracle AI Database 26ai in config.yaml:

  • ORACLE_DB_USERNAME, ORACLE_DB_PASSWORD, ORACLE_DB_DSN. Use Oracle AI Database Free to store and retrieve vector embeddings.
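
A minimal config.yaml sketch, assuming these three keys map one-to-one to YAML entries (verify against the repo's sample config):

# config.yaml (sketch; values and the service name are placeholders)
ORACLE_DB_USERNAME: ADMIN
ORACLE_DB_PASSWORD: "<your-password>"
ORACLE_DB_DSN: "localhost:1521/FREEPDB1"  # replace with your DSN or wallet-based connect string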

Connect and verify

Test Oracle connection (via tests/test_oradb.py):

python tests/test_oradb.py --stats-only

Or in Python (using oracledb):

import oracledb
connection = oracledb.connect(user="ADMIN", password="<pass>", dsn="<dsn>")
cursor = connection.cursor()
cursor.execute("SELECT * FROM v$version WHERE banner LIKE '%26ai%'")
print(cursor.fetchall())
connection.close()

Output:

[('Oracle Database 26ai ...',)]

Verify Ollama: curl http://localhost:11434/api/tags (should list gemma3:270m).

Core steps

Step 1: Process and ingest data into Oracle AI Database 26ai

Use built-in processors for PDFs (Docling), websites (Trafilatura), and repos (Gitingest) to chunk text and generate vector embeddings, then store them in vector collections (PDFCOLLECTION, WEBCOLLECTION, REPOCOLLECTION, GENERALCOLLECTION).

Why vector embeddings: Embeddings capture semantic meaning, enabling efficient similarity search via Oracle AI Database's VECTOR_DISTANCE function. This supports intelligent query routing across diverse sources like PDFs and web content.
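
For intuition, a similarity search at the SQL level looks roughly like the following (the table and column names are illustrative; langchain-oracledb manages the actual schema for each collection):

-- Return the 5 chunks closest to a query vector using cosine distance
SELECT text, metadata
FROM PDFCOLLECTION
ORDER BY VECTOR_DISTANCE(embedding, :query_vector, COSINE)
FETCH FIRST 5 ROWS ONLY;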

Focus on LangChain for embeddings: Integrate LangChain's embedding models (e.g., OllamaEmbeddings for local LLMs) to generate vectors before storing in Oracle Database. This allows seamless switching between embedding providers while leveraging Oracle's scalable vector storage.
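
For example, switching providers is a one-line change (the HuggingFace model below is purely illustrative and needs sentence-transformers installed):

from langchain.embeddings import OllamaEmbeddings, HuggingFaceEmbeddings

embeddings = OllamaEmbeddings(model="gemma3:270m")  # local embeddings via Ollama
# embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")  # drop-in alternative provider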

Process a PDF:

python -m src.pdf_processor --input https://arxiv.org/pdf/2203.06605 --output chunks.json
python -m src.store --add chunks.json  # Adds to PDFCOLLECTION

For websites:

python -m src.web_processor --input https://example.com --output web_content.json
python -m src.store --add-web web_content.json  # Adds to WEBCOLLECTION

In code (src/store.py equivalent, with LangChain embeddings):

from langchain.embeddings import OllamaEmbeddings
from oracle_ai_vector_search import OracleVectorStore  # Compatible with langchain-oracledb

# Initialize embeddings with LangChain
embeddings = OllamaEmbeddings(model="gemma3:270m")

# From src/store.py - initialize OracleVS
store = OracleVectorStore(
    connection=connection,
    collection="PDFCOLLECTION",
    embedding=embeddings  # Pass LangChain embeddings
)
store.add_texts(chunks, metadatas=metadata)
results = store.similarity_search("query", k=5)  # embeddings are supplied via the store's configured embedding function

Expected output:

Processed 10 chunks from PDF.
Generated embeddings with OllamaEmbeddings.
Added to vector store: PDFCOLLECTION (total chunks: 15).

Pitfall: Configure config.yaml with your DB credentials; large PDFs may need a chunk_size adjustment in LangChain's text splitter (see the sketch below). Ensure Ollama is running for local embeddings.
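
A minimal sketch of where to tune chunking, assuming the processors use LangChain's RecursiveCharacterTextSplitter (parameter values are illustrative):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Smaller chunks keep each embedding focused; overlap preserves context across chunk boundaries.
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_text(document_text)  # document_text: raw text extracted by Docling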

LangChain Integration for RAG Orchestration

LangChain simplifies building RAG pipelines by providing chains for retrieval, question-answering, and conversational memory. In this system:

  • Use RetrievalQA or ConversationalRetrievalChain to query the Oracle vector store.
  • Integrate with A2A agents for multi-step reasoning: LangChain's tool-calling agents can invoke A2A endpoints as custom tools (see the sketch at the end of this section).
  • Example: Wire OracleVectorStore to a RetrievalQA chain for hybrid search (vector + keyword).

Code snippet (in src/local_rag_agent.py):

from langchain.chains import RetrievalQA
from langchain.embeddings import OllamaEmbeddings
from langchain.llms import Ollama
from oracle_ai_vector_search import OracleVectorStore

embeddings = OllamaEmbeddings(model="gemma3:270m")
llm = Ollama(model="gemma3:270m")  # local LLM served by Ollama
store = OracleVectorStore(connection=connection, embedding=embeddings, collection="PDFCOLLECTION")

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,  # Ollama LLM
    chain_type="stuff",
    retriever=store.as_retriever(search_kwargs={"k": 5})
)

result = qa_chain.run("Explain DaGAN in Depth-Aware GAN")
print(result)

Why it fits: LangChain's modular design complements A2A's distributed agents, enabling scalable CoT while offloading vector ops to Oracle AI Database.
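
To make the tool-calling integration mentioned above concrete, here is a hedged sketch of exposing the A2A planner as a LangChain tool (the A2AHandler usage mirrors Step 3 below; the Tool wiring itself is illustrative, not the repo's exact code):

from langchain.agents import Tool
from a2a_handler import A2AHandler  # project module, used in Step 3

handler = A2AHandler()

def plan_query(query: str) -> str:
    # Delegate planning to the remote planner agent over A2A (JSON-RPC under the hood)
    return str(handler.call_agent("planner_agent_v1", {"query": query}))

planner_tool = Tool(
    name="a2a_planner",
    func=plan_query,
    description="Breaks a user query into actionable research steps via the A2A planner agent."
)
# planner_tool can be passed to any LangChain tool-calling agent alongside the Oracle retriever.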

Step 2: Implement A2A agent cards and discovery

A2A Protocol enables JSON-RPC communication for agent discovery, task management, and distributed CoT.

Why: Supports interoperable, scalable multi-agent workflows with capability advertisement.

Agent card example (from agent_card.py):

{
  "agent_id": "planner_agent_v1",
  "name": "Strategic Planner",
  "version": "1.0.0",
  "description": "Breaks queries into actionable steps",
  "capabilities": ["agent.query"],
  "inputs": {"query": "string", "step": "optional"},
  "outputs": {"plan": "array"},
  "endpoint": "http://localhost:8000/a2a"
}

Discovery via curl:

curl -X POST http://localhost:8000/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "agent.discover",
    "params": {"capability": "agent.query"},
    "id": "1"
  }'

Output:

{"jsonrpc":"2.0","result":{"agents":[{"agent_id":"planner_agent_v1","url":"http://localhost:8000/a2a"}]},"id":"1"}

Deploy: For distributed deployments, update config.yaml with AGENT_ENDPOINTS (e.g., planner_url: http://server1:8001).
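
A sketch of what that might look like in config.yaml (only planner_url comes from the example above; the other keys are assumptions that illustrate the pattern):

AGENT_ENDPOINTS:
  planner_url: http://server1:8001
  researcher_url: http://server2:8002
  reasoner_url: http://server3:8003
  synthesizer_url: http://server4:8004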

Pitfall: Ensure the A2A server is running (python -m src.main).

Step 3: Build the multi-agent pipeline with A2A and CoT

Use local_rag_agent for RAG queries; enable --use-cot for distributed multi-agent reasoning (Planner → Researcher → Reasoner → Synthesizer via A2A).

Why: Provides structured CoT for complex queries, with transparent steps and sources.

CLI example:

python -m src.local_rag_agent --query "Explain DaGAN in Depth-Aware GAN" --use-cot --model gemma3:270m

In code (from src/local_rag_agent.py):

# Simplified orchestrator
from a2a_handler import A2AHandler
handler = A2AHandler()

def run_cot_pipeline(query: str):
    # Step 1: Planner via A2A
    plan = handler.call_agent("planner_agent_v1", {"query": query})
    # Step 2-4: Delegate to researcher/reasoner/synthesizer
    research = handler.call_agent("researcher_agent_v1", {"query": query, "plan": plan})
    reasoning = handler.call_agent("reasoner_agent_v1", {"context": research})
    final = handler.call_agent("synthesizer_agent_v1", {"steps": [plan, research, reasoning]})
    return final

result = run_cot_pipeline("What is A2A?")
print(result)

Output:

Step 1: Planning - Break down query...
Step 2: Research - Gathered from PDFCOLLECTION...
...
Final Answer: A2A is an open protocol for agent communication...
Sources: document.pdf (pages 1-3)

Pitfall: CoT increases latency (2-5x); use it for complex queries only, and ensure all agents are registered.

Step 4: Launch Gradio UI and API for interaction

Run Gradio for UI (includes model management, document processing, chat with CoT/A2A tabs).

Why: Provides user-friendly interface for uploads, queries, and A2A testing.

python gradio_app.py  # Starts at http://localhost:7860; auto-starts A2A server

API endpoints (FastAPI at http://localhost:8000):

# Upload PDF
POST /upload/pdf
Content-Type: multipart/form-data
file: <pdf-file>

# Query with CoT
POST /query
Content-Type: application/json
{
    "query": "Your question",
    "use_cot": true
}

In code:

import requests
response = requests.post("http://localhost:8000/query", json={"query": "Test RAG", "use_cot": True})
print(response.json()["answer"])

Output:

{
  "answer": "Response with CoT steps...",
  "sources": ["PDFCOLLECTION"]
}

Pitfall: Watch for port conflicts; use the --port flag. The Gradio UI requires the gradio package to be installed.

Practical Benefits of using A2A

  • It's Free: all LLMs are open source, so you simply deploy them and start querying at no licensing cost.
  • Operational Clarity: With Agent Cards and discovery, your ops team knows exactly what agents are available, what they can do, and how loaded they are. Monitoring becomes straightforward: track task completion rates per agent type, identify real bottlenecks, and scale intelligently.

  • Fault Isolation: When one researcher agent crashes, others continue working. When a planner agent goes down, you can quickly discover an alternative or restart it without disrupting the entire pipeline.

  • Flexibility: Need better document analysis? Swap your researcher agent for one using a different model or provider. A2A doesn't lock you into a specific implementation.

  • Enterprise Compliance: Each agent can enforce its own security policies, authentication schemes, and audit logging. A2A supports JWT, OIDC, and custom authentication at the agent level.

Next steps for the project

There are a few things I'd like to add to this project, and we're looking for contributors to get involved! Give us a star on our GitHub repository.

A couple of items on our roadmap:

  • The ability to create custom agents, not only the pre-defined pipeline I created (planner -> researcher -> reasoner -> synthesizer)

  • Fully decoupling the LLMs in the current pipeline: I'd like to test an architecture where agents work independently on parts of the answer, instead of the mostly sequential mechanism we have today, where the synthesizer agent has to wait for the other agents to finish their tasks first

Conclusions

The evolution from monolithic Agentic RAG to A2A-based distributed systems is well underway, moving away from "deploy the whole pipeline more times" toward deploying the right number of the right agents.
The beauty of A2A adoption is that it's open source and standardized (and it's always nice to have it developed and maintained by Google). For organizations building serious agentic systems, now is the time to get ahead of the rest and start building with Oracle AI Database, the A2A Protocol, and the LangChain Oracle AI Vector Search integration!

Additional Links
