Why this matters
Multi-agent systems for retrieval-augmented generation (RAG) promise collaborative AI reasoning but often fail at scale due to resource contention and tight coupling. This tutorial shows how to build a distributed system using the Agent2Agent (A2A) Protocol, enabling independent agent scaling while integrating Oracle AI Database for vector storage and search through the langchain-oracledb package. You'll end up with a flexible RAG pipeline that handles document queries efficiently, suitable for AI developers facing production bottlenecks.
The outcome: A loosely coupled architecture where agents like planners, researchers, and synthesizers communicate via A2A, reducing latency and improving fault isolation in high-load scenarios.
What we’ll build
We'll create Agentic RAG, an intelligent RAG system with multi-agent chain-of-thought (CoT) reasoning:
- Planner, Researcher, Reasoner, Synthesizer agents communicating via A2A Protocol.
- PDF/web/repo processing with Docling/Trafilatura/Gitingest.
- Persistent vector storage in Oracle AI Database 26ai.
- FastAPI API and Gradio UI for uploads/queries.
- Local LLMs via Ollama (gemma3:270m default).

[Architecture diagram: PDF/web processing, vector store, RAG agent, and A2A multi-agent CoT.]
Requirements:
- Oracle AI Database instance (Autonomous Database).
- LangChain integration for Oracle AI Vector Search (the langchain-oracledb vector store).
- Python 3.10+, dependencies from requirements.txt.
- Ollama installed and running.
- Docling
- Trafilatura
- Gitingest
Repo: Oracle AI Developer Hub.
Setup
Install and configure
Clone the repo and install dependencies:
git clone https://github.com/oracle-devrel/oracle-ai-developer-hub.git
cd oracle-ai-developer-hub/apps/agentic_rag
pip install -r requirements.txt # Includes docling, trafilatura, oracledb, fastapi, gradio, ollama and langchain-oracledb
Set up Ollama for local LLMs (default: gemma3:270m):
ollama pull gemma3:270m
ollama serve
Configure Oracle AI Database 26ai in config.yaml:
- ORACLE_DB_USERNAME, ORACLE_DB_PASSWORD, ORACLE_DB_DSN. Use Oracle AI Database Free to store and retrieve vector embeddings.
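As a quick sanity check on the configuration, here's a minimal sketch of loading those values and opening a connection (the key names and flat layout of config.yaml are assumptions; adjust them to the repo's actual file):
import oracledb
import yaml  # PyYAML

# Assumes config.yaml holds ORACLE_DB_USERNAME, ORACLE_DB_PASSWORD, ORACLE_DB_DSN
# at the top level; adapt the keys to match the repo's actual config layout.
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

connection = oracledb.connect(
    user=cfg["ORACLE_DB_USERNAME"],
    password=cfg["ORACLE_DB_PASSWORD"],
    dsn=cfg["ORACLE_DB_DSN"],
)
print("Connected to Oracle Database version:", connection.version)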
Connect and verify
Test Oracle connection (via tests/test_oradb.py):
python tests/test_oradb.py --stats-only
Or in Python (using oracledb):
import oracledb

# Connect with the ADMIN credentials and DSN of your Autonomous Database
connection = oracledb.connect(user="ADMIN", password="<pass>", dsn="<dsn>")
cursor = connection.cursor()
# Confirm the database reports a 26ai banner
cursor.execute("SELECT * FROM v$version WHERE banner LIKE '%26ai%'")
print(cursor.fetchall())
connection.close()
Output:
[('Oracle Database 26ai ...',)]
Verify Ollama: curl http://localhost:11434/api/tags (should list gemma3:270m).
Core steps
Step 1: Process and ingest data into Oracle AI Database 26ai
Use built-in processors for PDFs (Docling), websites (Trafilatura), repos (Gitingest) to chunk text and generate vector embeddings, then store them in vector collections (PDFCOLLECTION, WEBCOLLECTION, REPOCOLLECTION, GENERALCOLLECTION).
Why vector embeddings: Embeddings capture semantic meaning, enabling efficient similarity search via Oracle AI Database's VECTOR_DISTANCE function. This supports intelligent query routing across diverse sources like PDFs and web content.
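For reference, here's roughly what a similarity search looks like at the SQL level. This is a sketch that assumes the default langchain-oracledb table layout (a TEXT column and an EMBEDDING vector column in PDFCOLLECTION); the repo's vector store wraps this for you:
import oracledb
from langchain.embeddings import OllamaEmbeddings

oracledb.defaults.fetch_lobs = False  # return CLOB text columns as plain strings

# Embed the query locally with Ollama, then pass it to SQL as a vector literal
embeddings = OllamaEmbeddings(model="gemma3:270m")
query_vector = embeddings.embed_query("What is DaGAN?")
vector_literal = "[" + ",".join(str(v) for v in query_vector) + "]"

connection = oracledb.connect(user="ADMIN", password="<pass>", dsn="<dsn>")
with connection.cursor() as cursor:
    cursor.execute(
        """
        SELECT text, VECTOR_DISTANCE(embedding, TO_VECTOR(:qv), COSINE) AS dist
        FROM PDFCOLLECTION
        ORDER BY dist
        FETCH FIRST 5 ROWS ONLY
        """,
        qv=vector_literal,
    )
    for text, dist in cursor:
        print(f"{dist:.4f}  {text[:80]}")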
Focus on LangChain for embeddings: Integrate LangChain's embedding models (e.g., OllamaEmbeddings for local LLMs) to generate vectors before storing in Oracle Database. This allows seamless switching between embedding providers while leveraging Oracle's scalable vector storage.
Process a PDF:
python -m src.pdf_processor --input https://arxiv.org/pdf/2203.06605 --output chunks.json
python -m src.store --add chunks.json # Adds to PDFCOLLECTION
For websites:
python -m src.web_processor --input https://example.com --output web_content.json
python -m src.store --add-web web_content.json # Adds to WEBCOLLECTION
In code (src/store.py equivalent, with LangChain embeddings):
from langchain.embeddings import OllamaEmbeddings
from oracle_ai_vector_search import OracleVectorStore  # Compatible with langchain-oracledb

# Initialize embeddings with LangChain
embeddings = OllamaEmbeddings(model="gemma3:270m")

# From src/store.py - initialize OracleVS
store = OracleVectorStore(
    connection=connection,
    collection="PDFCOLLECTION",
    embedding=embeddings,  # Pass LangChain embeddings
)
store.add_texts(chunks, metadatas=metadata)
results = store.similarity_search("query", k=5, embedding=embeddings)
Expected output:
Processed 10 chunks from PDF.
Generated embeddings with OllamaEmbeddings.
Added to vector store: PDFCOLLECTION (total chunks: 15).
Pitfall: Set the DB credentials in config.yaml; large PDFs may need a chunk_size adjustment in LangChain's text splitter. Ensure Ollama is running for local embeddings.
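If chunking needs tuning, LangChain's text splitter is the place to do it; a minimal sketch with illustrative values (not the repo's defaults):
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Smaller chunks with some overlap tend to work better for dense technical PDFs;
# tune these numbers for your documents and the embedding model's context size.
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_text(document_text)  # document_text: full text extracted by Docling
print(f"Split into {len(chunks)} chunks")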
LangChain Integration for RAG Orchestration
LangChain simplifies building RAG pipelines by providing chains for retrieval, question-answering, and conversational memory. In this system:
- Use RetrievalQA or ConversationalRetrievalChain to query the Oracle vector store.
- Integrate with A2A agents for multi-step reasoning: LangChain's tool-calling agents can invoke A2A endpoints as custom tools (see the sketch after the code below).
- Example: Wire OracleVectorStore to a RetrievalQA chain for hybrid search (vector + keyword).
Code snippet (in src/local_rag_agent.py):
from langchain.chains import RetrievalQA
from langchain.embeddings import OllamaEmbeddings
from langchain.llms import Ollama
from oracle_ai_vector_search import OracleVectorStore

embeddings = OllamaEmbeddings(model="gemma3:270m")
llm = Ollama(model="gemma3:270m")  # local LLM served by Ollama
store = OracleVectorStore(connection=connection, embedding=embeddings, collection="PDFCOLLECTION")

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=store.as_retriever(search_kwargs={"k": 5}),
)
result = qa_chain.run("Explain DaGAN in Depth-Aware GAN")
print(result)
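To let a tool-calling LangChain agent invoke the A2A layer, an A2A endpoint can be wrapped as a plain tool. A minimal sketch, assuming a JSON-RPC agent.query method served at http://localhost:8000/a2a (see the agent card in the next step); the exact params and result shape depend on the repo's handler:
import requests
from langchain.agents import Tool

A2A_URL = "http://localhost:8000/a2a"  # assumed local A2A server from this repo

def ask_planner(query: str) -> str:
    """Call the planner agent over JSON-RPC; the payload shape is an assumption."""
    payload = {
        "jsonrpc": "2.0",
        "method": "agent.query",
        "params": {"agent_id": "planner_agent_v1", "query": query},
        "id": "1",
    }
    response = requests.post(A2A_URL, json=payload, timeout=60)
    return str(response.json().get("result"))

planner_tool = Tool(
    name="planner_agent",
    func=ask_planner,
    description="Breaks a user query into actionable research steps via A2A.",
)
# planner_tool can now be handed to a LangChain tool-calling agent alongside the retriever.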
Why it fits: LangChain's modular design complements A2A's distributed agents, enabling scalable CoT while offloading vector ops to Oracle AI Database.
Step 2: Implement A2A agent cards and discovery
A2A Protocol enables JSON-RPC communication for agent discovery, task management, and distributed CoT.
Why: Supports interoperable, scalable multi-agent workflows with capability advertisement.
Agent card example (from agent_card.py):
{
  "agent_id": "planner_agent_v1",
  "name": "Strategic Planner",
  "version": "1.0.0",
  "description": "Breaks queries into actionable steps",
  "capabilities": ["agent.query"],
  "inputs": {"query": "string", "step": "optional"},
  "outputs": {"plan": "array"},
  "endpoint": "http://localhost:8000/a2a"
}
Discovery via curl:
curl -X POST http://localhost:8000/a2a \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "agent.discover",
"params": {"capability": "agent.query"},
"id": "1"
}'
Output:
{"jsonrpc":"2.0","result":{"agents":[{"agent_id":"planner_agent_v1","url":"http://localhost:8000/a2a"}]},"id":"1"}
Deploy: Update config.yaml with AGENT_ENDPOINTS for distributed deployment (e.g., planner_url: http://server1:8001).
Pitfall: Ensure A2A server runs (python -m src.main).
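The same discovery call can be made programmatically; a small sketch whose response shape follows the sample output above (everything else is an assumption):
import requests

def discover_agents(a2a_url: str, capability: str) -> list:
    """JSON-RPC agent.discover call; returns the list of matching agent entries."""
    payload = {
        "jsonrpc": "2.0",
        "method": "agent.discover",
        "params": {"capability": capability},
        "id": "1",
    }
    response = requests.post(a2a_url, json=payload, timeout=10)
    return response.json()["result"]["agents"]

agents = discover_agents("http://localhost:8000/a2a", "agent.query")
print(agents)  # e.g. [{"agent_id": "planner_agent_v1", "url": "http://localhost:8000/a2a"}]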
Step 3: Build the multi-agent pipeline with A2A and CoT
Use local_rag_agent for RAG queries; enable --use-cot for distributed multi-agent reasoning (Planner → Researcher → Reasoner → Synthesizer via A2A).
Why: Provides structured CoT for complex queries, with transparent steps and sources.
CLI example:
python -m src.local_rag_agent --query "Explain DaGAN in Depth-Aware GAN" --use-cot --model gemma3:270m
In code (from src/local_rag_agent.py):
# Simplified orchestrator
from a2a_handler import A2AHandler

handler = A2AHandler()

def run_cot_pipeline(query: str):
    # Step 1: Planner via A2A
    plan = handler.call_agent("planner_agent_v1", {"query": query})
    # Steps 2-4: Delegate to researcher/reasoner/synthesizer
    research = handler.call_agent("researcher_agent_v1", {"query": query, "plan": plan})
    reasoning = handler.call_agent("reasoner_agent_v1", {"context": research})
    final = handler.call_agent("synthesizer_agent_v1", {"steps": [plan, research, reasoning]})
    return final

result = run_cot_pipeline("What is A2A?")
print(result)
Output:
Step 1: Planning - Break down query...
Step 2: Research - Gathered from PDFCOLLECTION...
...
Final Answer: A2A is an open protocol for agent communication...
Sources: document.pdf (pages 1-3)
Pitfall: CoT increases latency (2-5x); use it for complex queries only. Ensure all agents are registered.
Step 4: Launch Gradio UI and API for interaction
Run Gradio for UI (includes model management, document processing, chat with CoT/A2A tabs).
Why: Provides user-friendly interface for uploads, queries, and A2A testing.
python gradio_app.py # Starts at http://localhost:7860; auto-starts A2A server
API endpoints (FastAPI at http://localhost:8000):
# Upload PDF
POST /upload/pdf
Content-Type: multipart/form-data
file: <pdf-file>
# Query with CoT
POST /query
Content-Type: application/json
{
"query": "Your question",
"use_cot": true
}
In code:
import requests
response = requests.post("http://localhost:8000/query", json={"query": "Test RAG", "use_cot": True})
print(response.json())
Output:
{
"answer": "Response with CoT steps...",
"sources": ["PDFCOLLECTION"]
}
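Uploading a document programmatically works the same way; a short sketch using the endpoint above (the local file name is illustrative, and the response body shape may differ):
import requests

# Send a local PDF to be chunked, embedded, and stored; the "file" field matches the endpoint above
with open("paper.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/upload/pdf",
        files={"file": ("paper.pdf", f, "application/pdf")},
    )
print(response.status_code, response.json())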
Pitfall: Watch for port conflicts; use the --port flag. The UI requires the gradio package to be installed.
Practical Benefits of using A2A
- It's Free: all LLMs are open source, so you only have to deploy them and can start talking to them free of charge.
- Operational Clarity: With Agent Cards and discovery, your ops team knows exactly which agents are available, what they can do, and how loaded they are. Monitoring becomes straightforward: track task completion rates per agent type, identify real bottlenecks, and scale intelligently.
- Fault Isolation: When one researcher agent crashes, the others continue working. When a planner agent goes down, you can quickly discover an alternative or restart it without disrupting the entire pipeline.
- Flexibility: Need better document analysis? Swap your researcher agent for one using a different model or provider. A2A doesn't lock you into a specific implementation.
- Enterprise Compliance: Each agent can enforce its own security policies, authentication schemes, and audit logging. A2A supports JWT, OIDC, and custom authentication at the agent level.
Next steps for the project
I'd like to add a few things to this project, and we're looking for contributors to get involved! Give us a star on our GitHub repository.
A couple of things on our roadmap are:
- The ability to create custom agents, not only the pre-defined pipeline I created (planner -> researcher -> reasoner -> synthesizer).
- Fully decoupling the LLMs in the current pipeline: I'd like to test another architecture where agents work independently on parts of the answer instead of having a cascading or sequential mechanism (which is more or less what we have right now, as the synthesizer agent has to wait for the other agents to finish their tasks first).
Conclusions
The evolution from monolithic Agentic RAG to A2A-based distributed systems is well underway, moving away from ‘deploy the whole pipeline more times’ toward deploying the right number of the right agents.
The beauty of A2A adoption is that it's open source and standardized (and it's always nice to have it developed and maintained by Google). For organizations building serious agentic systems, now is the time to get ahead of the rest and start building with Oracle AI Database, the A2A Protocol, and the LangChain Oracle AI Vector Search integration!