Why Agentic AI Needs Observability
AI systems have changed fundamentally.
We are no longer building single-prompt chatbots. We are building agents: systems that plan, call tools, query databases, and make decisions over multiple steps.
The hard problem is no longer “Can the model answer?”
The hard problem is:
“What happened when the agent failed?”
Without observability:
- Tool failures look like hallucinations
- Latency spikes are invisible
- SQL errors are buried
- Partial failures silently corrupt results
This post introduces TalentScout AI, a reference implementation of an observable, secure, and resilient agent system. The use case is simple, but the architecture is realistic and production-oriented.
Table of Contents
- What Observability Means for AI Agents
- Architecture
- Local Development Prerequisites
- Installing Docker
- Running Oracle AI Database Locally
- Installing Oracle SQLcl (MCP Server)
- Installing Ollama (Local LLM Runtime)
- Setting Up the Python Environment
- Database Schema and Seed Data
- Secure Database Access with MCP
- Agent Architecture with LangGraph
- Implementing the Agents
- Adding OpenTelemetry Observability
- Running the Agents
- Viewing Traces in Phoenix
- Troubleshooting and Failure Modes
- Resources
What Observability Means for AI Agents
Observability is not logging.
In agentic systems, observability means being able to reconstruct an execution after the fact, without rerunning it.
An observable agent must capture:
| Signal | Purpose |
| --- | --- |
| Execution order | Understand control flow |
| Tool calls | Identify external dependencies |
| LLM prompts | Debug reasoning errors |
| Generated SQL | Catch unsafe or invalid queries |
| Latency | Find bottlenecks |
| Errors and warnings | Diagnose partial failures |
If you cannot answer “what happened inside the agent?”, you cannot operate it safely.
Architecture
TalentScout AI is built as a graph of agents, not a monolith.
Each agent has one responsibility:
- Web Agent → gathers external context
- DB Agent → queries enterprise data
- Orchestrator → makes the final decision
This separation makes failures visible and traceable.
Local Development Prerequisites
Hardware
- macOS, Linux, or Windows (WSL2 recommended)
- Minimum 16GB RAM
- Docker-capable CPU
Software
- Docker
- Python 3.12+
- Git
Installing Docker
macOS / Windows
Download Docker Desktop: https://www.docker.com/products/docker-desktop
Verify installation:
docker --version
docker compose version
Running Oracle AI Database Locally
Oracle provides a free container image suitable for local development.
Pull the Image
docker pull container-registry.oracle.com/database/free:latest
Run the Container
docker run -d \
--name oracle-ai \
-p 1521:1521 \
-e ORACLE_PWD=oracle \
container-registry.oracle.com/database/free:latest
Verify Startup
docker logs oracle-ai
Wait until you see:
DATABASE IS READY TO USE!
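Optionally, confirm you can connect from inside the container. In the free image, the password set via ORACLE_PWD applies to SYS, SYSTEM, and PDBADMIN, and the default pluggable database service is FREEPDB1 (adjust if your setup differs):
docker exec -it oracle-ai sqlplus system/oracle@//localhost:1521/FREEPDB1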
Installing Oracle SQLcl (MCP Server)
SQLcl acts as the Model Context Protocol server.
Download SQLcl
Install
unzip sqlcl-25.x.x.zip
export PATH=$PATH:/path/to/sqlcl/bin
Verify:
sql -v
Installing Ollama (Local LLM Runtime)
Ollama allows you to run LLMs locally without cloud APIs.
Install Ollama
https://ollama.com/download
Verify:
ollama --version
Pull a Model
ollama pull gemma3:12b
Test:
ollama run gemma3:12b "Hello"
Setting Up the Python Environment
Clone the Repository
git clone https://github.com/harishkotra/talentscoutai/
cd talentscoutai
Create a Virtual Environment
python3.12 -m venv .venv
source .venv/bin/activate
Install Dependencies
pip install -r requirements.txt
Example requirements.txt:
langchain
langchain-community
langchain-ollama
langchain-tavily
langchain-mcp-adapters
langgraph
oracledb
arize-phoenix
openinference-instrumentation-langchain
opentelemetry-sdk
opentelemetry-exporter-otlp
rich
python-dotenv
mcp
Database Schema and Seed Data
Create the Table
CREATE TABLE talent_roster (
id NUMBER GENERATED BY DEFAULT AS IDENTITY,
actor_name VARCHAR2(100),
availability_status VARCHAR2(20)
);
Insert Sample Data
INSERT INTO talent_roster VALUES (DEFAULT, 'Pedro Pascal', 'AVAILABLE');
INSERT INTO talent_roster VALUES (DEFAULT, 'Cillian Murphy', 'AVAILABLE');
COMMIT;
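To confirm the seed data is in place before wiring up the agents, run a quick check in the same session:
SELECT actor_name, availability_status FROM talent_roster;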
Secure Database Access with MCP
Why MCP Exists
Direct database access from agents:
- Leaks credentials
- Breaks auditability
- Grants too much authority to LLMs
MCP solves this by separating reasoning from execution.
The agent requests execution; SQLcl owns execution.
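As a concrete sketch of that boundary: recent SQLcl releases (25.2+) can run as an MCP server via sql -mcp, and langchain-mcp-adapters can spawn it over stdio. The server name and arguments below are illustrative; check the repository's configuration for the exact values.
from langchain_mcp_adapters.client import MultiServerMCPClient

mcp_client = MultiServerMCPClient(
    {
        "oracle": {
            "command": "sql",      # SQLcl binary on PATH
            "args": ["-mcp"],      # start SQLcl in MCP server mode (SQLcl 25.2+)
            "transport": "stdio",  # the agent talks to SQLcl, never to the database directly
        }
    }
)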
Agent Architecture with LangGraph
Agent state is explicit and typed:
from typing import TypedDict

class AgentState(TypedDict):
    request: str
    research_data: str
    db_data: str
    final_report: str
This makes transitions observable and debuggable.
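For orientation, here is a minimal sketch of how the nodes from the next section could be wired into a LangGraph graph; the node names and the db_lookup_node wrapper around the MCP call are illustrative, and the repository may wire things differently:
from langgraph.graph import StateGraph, END

graph = StateGraph(AgentState)
graph.add_node("web_search", web_search_node)      # defined below
graph.add_node("db_lookup", db_lookup_node)        # hypothetical wrapper around the MCP call below
graph.add_node("orchestrator", orchestrator_node)  # defined below

graph.set_entry_point("web_search")
graph.add_edge("web_search", "db_lookup")
graph.add_edge("db_lookup", "orchestrator")
graph.add_edge("orchestrator", END)

app = graph.compile()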
Implementing the Agents
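The node functions below assume a shared local model handle and a Tavily search tool. A minimal setup sketch using the packages from requirements.txt (the model name and result count are assumptions, not the repository's exact values):
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama
from langchain_tavily import TavilySearch

llm = ChatOllama(model="gemma3:12b")   # the model pulled via Ollama earlier
tavily = TavilySearch(max_results=3)   # reads TAVILY_API_KEY from the environment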
Web Research Agent
async def web_search_node(state):
    # llm and tavily are the shared objects initialized above
    query_prompt = f"Provide actors suitable for: {state['request']}"
    search_query = await (llm | StrOutputParser()).ainvoke(query_prompt)
    result = tavily.invoke({"query": search_query})
    return {"research_data": result}
This agent only collects context; it does not decide.
Database Agent (MCP)
# mcp_client is the configured MCP client (see the MCP section above)
async with mcp_client.session("oracle") as session:
    await session.initialize()
    result = await session.call_tool(
        "run-sqlcl",
        arguments={"sqlcl": sql_script}
    )
Key properties:
- No JDBC in Python
- No credentials in prompts
- SQL execution is mediated
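For context, sql_script in the call above is plain SQLcl input. A minimal illustrative script (the repository's actual script and its connection handling may differ):
sql_script = """
SELECT actor_name, availability_status
FROM talent_roster
WHERE availability_status = 'AVAILABLE';
"""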
Orchestrator Agent
async def orchestrator_node(state):
    prompt = f"""
    Context: {state['research_data']}
    DB Results: {state['db_data']}
    Recommend an available actor.
    """
    report = await (llm | StrOutputParser()).ainvoke(prompt)
    return {"final_report": report}
Adding OpenTelemetry Observability
Instrument LangChain
LangChainInstrumentor().instrument(
tracer_provider=tracer_provider
)
Export Traces
OTLPSpanExporter(
endpoint="http://localhost:6006/v1/traces"
)
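Putting the two snippets together, a minimal end-to-end setup looks roughly like this; the module paths follow the standard OpenTelemetry and OpenInference packages from requirements.txt, and the repository's wiring may differ slightly:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.langchain import LangChainInstrumentor

# Send spans to the local Phoenix collector over OTLP/HTTP
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:6006/v1/traces"))
)

# Instrument every LangChain / LangGraph call with that provider
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)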
This captures:
- Agent execution order
- Prompts
- SQL
- Errors
- Latency
Running the Agents
python main.py
You should see:
- Web research
- SQL execution
- Final recommendation
Viewing Traces in Phoenix (OpenTelemetry)
Open:
http://localhost:6006
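If the page does not load, make sure Phoenix is actually running; the arize-phoenix package installed earlier ships a standalone server you can launch in a separate terminal (command as commonly documented; adjust if your version differs):
phoenix serve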
You can inspect:
- Each agent span
- Generated SQL
- MCP warnings
- Latency breakdowns
This is the agent’s internal state, preserved.
Troubleshooting and Failure Modes
SQLcl Banner Noise
SQLcl's startup banner and login messages otherwise end up in the tool output the agent parses. Always start it with:
-nolog -silent
MCP Protocol Mismatch
If the MCP client and the SQLcl server disagree on protocol versions, catch the error and recover (for example, by recording the failure in agent state) instead of crashing the whole run.
Silent SQL Output
SQLcl reports errors as plain text in its output rather than raising exceptions, so scan the returned text for Oracle error prefixes:
ORA-
SP2-
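A minimal sketch of such a check (function name and truncation length are illustrative, not from the repository):
def check_sqlcl_output(output: str) -> str:
    # Treat Oracle (ORA-) and SQL*Plus-style (SP2-) error markers as hard failures
    # instead of passing them downstream as "data".
    for marker in ("ORA-", "SP2-"):
        if marker in output:
            raise RuntimeError(f"SQLcl reported an error: {output.strip()[:500]}")
    return output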
Resources
👉 GitHub: https://github.com/harishkotra/talentscoutai
👉 YouTube Video
Agentic AI does not fail because models are weak.
It fails because:
- Systems are opaque
- Failures are invisible
- Security is bolted on too late
By combining:
- Structured agents
- Secure tool boundaries
- End-to-end observability
we move from impressive demos to operable systems.
That is how agentic AI becomes production-ready.

