Harish Kotra (he/him)

Building Observable, Secure, and Resilient AI Agents with Oracle MCP, OpenTelemetry, and LangGraph

Why Agentic AI Needs Observability

AI systems have changed fundamentally.

We are no longer building single-prompt chatbots. We are building agents: systems that plan, call tools, query databases, and make decisions over multiple steps.

The hard problem is no longer “Can the model answer?”

The hard problem is:

“What happened when the agent failed?”

Without observability:

  • Tool failures look like hallucinations
  • Latency spikes are invisible
  • SQL errors are buried
  • Partial failures silently corrupt results

This post introduces TalentScout AI, a reference implementation of an observable, secure, and resilient agent system. The use case is simple, but the architecture is realistic and production-oriented.

Table of Contents

  1. What Observability Means for AI Agents
  2. Architecture
  3. Local Development Prerequisites
  4. Installing Docker
  5. Running Oracle AI Database Locally
  6. Installing Oracle SQLcl (MCP Server)
  7. Installing Ollama (Local LLM Runtime)
  8. Setting Up the Python Environment
  9. Database Schema and Seed Data
  10. Secure Database Access with MCP
  11. Agent Architecture with LangGraph
  12. Implementing the Agents
  13. Adding OpenTelemetry Observability
  14. Running the Agents
  15. Viewing Traces in Phoenix
  16. Troubleshooting and Failure Modes
  17. Resources

What Observability Means for AI Agents

Observability is not logging.

In agentic systems, observability means being able to reconstruct an execution after the fact, without rerunning it.

An observable agent must capture each signal, and each signal serves a purpose:

  • Execution order → understand control flow
  • Tool calls → identify external dependencies
  • LLM prompts → debug reasoning errors
  • Generated SQL → catch unsafe or invalid queries
  • Latency → find bottlenecks
  • Errors and warnings → diagnose partial failures
If you cannot answer “what happened inside the agent?”, you cannot operate it safely.

Architecture

TalentScout AI is built as a graph of agents, not a monolith.

(Diagram: the TalentScout AI graph of agents)

Each agent has one responsibility:

  • Web Agent → gathers external context
  • DB Agent → queries enterprise data
  • Orchestrator → makes the final decision

This separation makes failures visible and traceable.

Local Development Prerequisites

Hardware

  • macOS, Linux, or Windows (WSL2 recommended)
  • Minimum 16GB RAM
  • Docker-capable CPU

Software

  • Docker
  • Python 3.12+
  • Git

Installing Docker

macOS / Windows

Download Docker Desktop: https://www.docker.com/products/docker-desktop

Verify installation:

docker --version
docker compose version

Running Oracle AI Database Locally

Oracle provides a free container image suitable for local development.

Pull the Image

docker pull container-registry.oracle.com/database/free:latest

Run the Container

docker run -d \
  --name oracle-ai \
  -p 1521:1521 \
  -e ORACLE_PWD=oracle \
  container-registry.oracle.com/database/free:latest


Verify Startup

docker logs oracle-ai

Wait until you see:

DATABASE IS READY TO USE!

Installing Oracle SQLcl (MCP Server)

SQLcl acts as the Model Context Protocol server.

Download SQLcl

Download the latest SQLcl archive from Oracle's website.

Install

unzip sqlcl-25.x.x.zip
export PATH=$PATH:/path/to/sqlcl/bin


Verify:

sql -v


Installing Ollama (Local LLM Runtime)

Ollama allows you to run LLMs locally without cloud APIs.

Install Ollama

https://ollama.com/download

Verify:

ollama --version


Pull a Model

ollama pull gemma3:12b


Test:

ollama run gemma3:12b "Hello"
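The agent code later in this post talks to the model through langchain-ollama. As a grounding sketch (the module-level variable name llm is an assumption; the repo may organize this differently), the LLM handle the agents rely on could look like:

from langchain_ollama import ChatOllama

# Hypothetical module-level model handle, pointing at the locally
# pulled gemma3:12b model served by Ollama.
llm = ChatOllama(model="gemma3:12b", temperature=0)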

Setting Up the Python Environment

Clone the Repository

git clone https://github.com/harishkotra/talentscoutai/
cd talentscoutai

Create a Virtual Environment

python3.12 -m venv .venv
source .venv/bin/activate

Install Dependencies

pip install -r requirements.txt

Example requirements.txt:

langchain
langchain-community
langchain-ollama
langchain-tavily
langchain-mcp-adapters
langgraph
oracledb
arize-phoenix
openinference-instrumentation-langchain
opentelemetry-sdk
opentelemetry-exporter-otlp
rich
python-dotenv
mcp

Database Schema and Seed Data

Create the Table

Connect to the database (for example, with SQLcl as system against the default FREEPDB1 pluggable database) and run:

CREATE TABLE talent_roster (
    id NUMBER GENERATED BY DEFAULT AS IDENTITY,
    actor_name VARCHAR2(100),
    availability_status VARCHAR2(20)
);

Insert Sample Data

INSERT INTO talent_roster VALUES (DEFAULT, 'Pedro Pascal', 'AVAILABLE');
INSERT INTO talent_roster VALUES (DEFAULT, 'Cillian Murphy', 'AVAILABLE');
COMMIT;
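The agents themselves never connect to the database directly; they go through MCP. For a one-off sanity check of the seed data during setup, though, a short python-oracledb query works. This is a sketch under assumptions from the container setup above (user system, password oracle, default PDB FREEPDB1, table created in the system schema):

import oracledb

# Thin-mode connection to the local container; no Oracle client install needed.
conn = oracledb.connect(user="system", password="oracle", dsn="localhost:1521/FREEPDB1")
with conn.cursor() as cur:
    cur.execute("SELECT actor_name, availability_status FROM talent_roster")
    for name, status in cur:
        print(name, status)
conn.close()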

Secure Database Access with MCP

Why MCP Exists

Direct database access from agents:

  • Leaks credentials
  • Breaks auditability
  • Grants too much authority to LLMs

MCP solves this by separating reasoning from execution.

SQLcl MCP

The agent requests execution; SQLcl owns execution.
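The DB Agent code later in this post assumes an mcp_client with an "oracle" server registered. Here is a minimal sketch of that wiring with langchain-mcp-adapters, assuming SQLcl's MCP mode is launched via sql -mcp (check your SQLcl release notes for the exact flag):

from langchain_mcp_adapters.client import MultiServerMCPClient

# Launch SQLcl as a stdio MCP server; the agent process never sees
# connection strings or credentials, only the named server.
mcp_client = MultiServerMCPClient({
    "oracle": {
        "command": "sql",
        "args": ["-mcp"],
        "transport": "stdio",
    }
})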

Agent Architecture with LangGraph

Agent state is explicit and typed:

from typing import TypedDict

class AgentState(TypedDict):
    request: str        # the user's casting request
    research_data: str  # web context gathered by the Web Agent
    db_data: str        # roster data returned via MCP
    final_report: str   # the Orchestrator's recommendation

This makes transitions observable and debuggable.
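To make the flow concrete, here is a minimal sketch of how the three agents might be wired into a LangGraph StateGraph. The node functions appear in the next section; db_query_node is a hypothetical name for the MCP-backed database agent, whose body is shown there as a fragment:

from langgraph.graph import StateGraph, END

# A linear graph over the shared AgentState: research, then DB, then decide.
workflow = StateGraph(AgentState)
workflow.add_node("web_agent", web_search_node)
workflow.add_node("db_agent", db_query_node)
workflow.add_node("orchestrator", orchestrator_node)
workflow.set_entry_point("web_agent")
workflow.add_edge("web_agent", "db_agent")
workflow.add_edge("db_agent", "orchestrator")
workflow.add_edge("orchestrator", END)
app = workflow.compile()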

Implementing the Agents

Web Research Agent

from langchain_core.output_parsers import StrOutputParser

# `llm` (the Ollama chat model) and `tavily` (a Tavily search tool) are module-level.
async def web_search_node(state: AgentState) -> dict:
    # Ask the LLM to turn the raw request into a focused search query.
    query_prompt = f"Provide actors suitable for: {state['request']}"
    search_query = await (llm | StrOutputParser()).ainvoke(query_prompt)
    result = tavily.invoke({"query": search_query})
    return {"research_data": result}

This agent collects context only; it does not decide.

Database Agent (MCP)

# Open a session against the "oracle" MCP server defined in the client config.
async with mcp_client.session("oracle") as session:
    await session.initialize()
    # SQLcl executes the script; the Python process never holds credentials.
    result = await session.call_tool(
        "run-sqlcl",
        arguments={"sqlcl": sql_script}
    )

Key properties:

  • No JDBC in Python
  • No credentials in prompts
  • SQL execution is mediated

Orchestrator Agent

async def orchestrator_node(state):
    prompt = f"""
    Context: {state['research_data']}
    DB Results: {state['db_data']}
    Recommend an available actor.
    """
    report = await (llm | StrOutputParser()).ainvoke(prompt)
    return {"final_report": report}

Adding OpenTelemetry Observability

Instrument LangChain

LangChainInstrumentor().instrument(
    tracer_provider=tracer_provider
)

Export Traces

OTLPSpanExporter(
    endpoint="http://localhost:6006/v1/traces"
)

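Putting the two pieces together, a minimal end-to-end setup might look like the following. The BatchSpanProcessor wiring is standard OpenTelemetry SDK usage rather than anything specific to this repo, and it assumes Phoenix is listening on its default port 6006:

from openinference.instrumentation.langchain import LangChainInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Batch spans and ship them to the local Phoenix collector over OTLP/HTTP.
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:6006/v1/traces"))
)

# Every LangChain/LangGraph chain, LLM call, and tool call becomes a span,
# with no changes to the agent code itself.
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)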

This captures:

  • Agent execution order
  • Prompts
  • SQL
  • Errors
  • Latency

Running the Agents

python main.py

You should see:

  • Web research
  • SQL execution
  • Final recommendation
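If you are wiring this up yourself rather than using the repo's main.py, a minimal entry point over the compiled graph from earlier might look like this (the request string is just an example):

import asyncio

async def main():
    # Seed the graph; LangGraph threads AgentState through web research,
    # the MCP-mediated DB lookup, and the orchestrator's final decision.
    final_state = await app.ainvoke({"request": "Lead actor for a sci-fi thriller"})
    print(final_state["final_report"])

asyncio.run(main())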

(Screenshot: the agents working together)

Viewing Traces in Phoenix

Open:

http://localhost:6006

You can inspect:

  • Each agent span
  • Generated SQL
  • MCP warnings
  • Latency breakdowns

(Screenshot: observability with OpenTelemetry in Phoenix)

This is the agent’s internal state, preserved.

Troubleshooting and Failure Modes

SQLcl Banner Noise

SQLcl's startup banner and login messages pollute tool output. Always launch it with:

-nolog -silent

MCP Protocol Mismatch

If the client and the SQLcl MCP server disagree on protocol details, catch the error and recover instead of crashing, as in the sketch below.
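A minimal recovery wrapper, assuming the session object from the DB Agent above (safe_call_tool is a hypothetical helper, not part of the repo's API):

async def safe_call_tool(session, name: str, arguments: dict):
    # Degrade gracefully: return the failure as data so the orchestrator
    # can see it and the trace records exactly what went wrong.
    try:
        return await session.call_tool(name, arguments=arguments)
    except Exception as exc:
        return f"MCP call failed: {exc}"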

Silent SQL Output

SQLcl may report errors inline in its output instead of raising them. Check the returned text for these Oracle error prefixes:

ORA-
SP2-
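A sketch of that check in Python (check_sql_output is a hypothetical helper; wrap it around whatever text the MCP tool call returns):

ERROR_PREFIXES = ("ORA-", "SP2-")

def check_sql_output(output: str) -> str:
    # Raise on inline Oracle/SQLcl errors so partial failures
    # cannot silently corrupt downstream results.
    for line in output.splitlines():
        if line.strip().startswith(ERROR_PREFIXES):
            raise RuntimeError(f"SQLcl reported an error: {line.strip()}")
    return output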

Resources

👉 GitHub: https://github.com/harishkotra/talentscoutai
👉 YouTube Video

Agentic AI does not fail because models are weak.

It fails because:

  • Systems are opaque
  • Failures are invisible
  • Security is bolted on too late

By combining:

  • Structured agents
  • Secure tool boundaries
  • End-to-end observability

We move from impressive demos to operable systems.

That is how agentic AI becomes production-ready.
