Martin Arva
We tested structured ontology vs Markdown+RAG for AI agents — "why?" recall was 0% vs 100%

Our AI agent knew the company uses Provider A for identity verification. It could name the provider, list the integration specs, recite the timeline.

Then we asked why Provider A was chosen over Provider B.

The agent couldn't answer. Not once across 24 attempts. Zero percent recall on reasoning questions.

So we built the layer that was missing — and ran 48 controlled experiments to measure the difference.

The problem: AI agents can't answer "why?"

If you give an AI agent a folder of Markdown docs and let it use RAG to find answers, it handles factual questions well. What modules exist? Who owns this component? When was this decision made?

But "why?" is different.

Reasoning is rarely stored as a discrete fact. It's spread across meeting notes, scattered through Slack threads, buried in the third paragraph of a design doc written six months ago. The connection between a strategic goal and an operational decision almost never appears as a single retrievable chunk.

This means vector search finds the documents that mention the decision, but not the reasoning chain that justifies it. The agent knows what happened. It doesn't know why.

This matters more than it sounds. An agent that doesn't understand why a decision was made will make follow-up decisions that are technically correct but institutionally wrong — optimizing for the wrong goal, violating an unwritten constraint, repeating a mistake that was already analyzed and rejected.

Our approach: structured ontology as a navigation layer

We didn't replace the Markdown docs. We added a structured layer on top — a four-level ontology that maps business reasoning into queryable relationships:

LORE       (foundational beliefs, worldview)
  ↓ interpreted_into
VISION     (goals, priorities, boundaries)
  ↓ operationalized_into
RULES      (policies, decision rules, constraints)
  ↓ applied_to
OPERATIONS (initiatives, decisions, tasks)

Every connection between layers carries an assertion — an explicit explanation of why that relationship exists. This means an agent can trace from any operational decision back to the foundational beliefs that justify it.
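As a minimal sketch (not the project's actual schema — the field and class names here are invented), an assertion-carrying link between two layers could be modeled like this:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    """A typed edge between two ontology objects, carrying an explicit assertion."""
    from_id: str   # lower-layer object (e.g. an OPERATIONS decision)
    to_id: str     # higher-layer object that justifies it
    relation: str  # applied_to / operationalized_into / interpreted_into
    assertion: str # why this relationship exists

# Hypothetical edge: an operational decision justified by a rule
edge = Link(
    from_id="ex_ops_02",
    to_id="rule_affordable_idp",
    relation="applied_to",
    assertion="Provider A satisfies the 'start affordable, migrate later' rule",
)
```

Because the justification lives on the edge itself, answering "why?" becomes graph traversal rather than retrieval.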

Here's what that looks like in practice. Ask: "Why did we choose Provider A for identity verification?"

The agent traces:

OPERATIONS → Chose Provider A (affordable, OIDC-compatible)
  ← applied_to
RULES → Start with affordable identity provider, plan migration later
  ← operationalized_into
VISION → Build self-service tools for micro-entrepreneurs
  ← interpreted_into
LORE → Small business owners want to handle accounting themselves

No vector search. No probabilistic retrieval. SQL queries over a versioned database.
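To make that concrete, here is a hedged sketch of the deterministic trace as a recursive SQL query. The table and column names are invented, and an in-memory SQLite database stands in for Dolt, but the traversal logic is the same idea:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (id TEXT PRIMARY KEY, layer TEXT, title TEXT);
CREATE TABLE links (from_id TEXT, to_id TEXT, relation TEXT, assertion TEXT);
INSERT INTO objects VALUES
  ('ex_ops_02', 'OPERATIONS', 'Chose Provider A (affordable, OIDC-compatible)'),
  ('rule_01',   'RULES',      'Start with affordable identity provider'),
  ('vision_01', 'VISION',     'Self-service tools for micro-entrepreneurs'),
  ('lore_01',   'LORE',       'Small business owners want to handle accounting themselves');
INSERT INTO links VALUES
  ('ex_ops_02', 'rule_01',   'applied_to',           'Provider A satisfies the cost rule'),
  ('rule_01',   'vision_01', 'operationalized_into', 'Low-cost stack keeps tools self-service'),
  ('vision_01', 'lore_01',   'interpreted_into',     'Goal follows from the belief about owners');
""")

# Recursive CTE: walk upward from the decision to its foundational belief
rows = conn.execute("""
WITH RECURSIVE chain(id, layer, title) AS (
  SELECT id, layer, title FROM objects WHERE id = 'ex_ops_02'
  UNION ALL
  SELECT o.id, o.layer, o.title
  FROM chain c
  JOIN links l   ON l.from_id = c.id
  JOIN objects o ON o.id = l.to_id
)
SELECT layer, title FROM chain;
""").fetchall()

for layer, title in rows:
    print(layer, "->", title)
```

The same query returns the same four-row chain every time: no embedding similarity, no retrieval ranking.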

The backend is Dolt — a database with Git semantics. Branch, commit, diff, merge, pull request. Every change to the ontology goes through human review before it becomes canonical.

The interface is MCP (Model Context Protocol) — the de facto standard for connecting AI agents to external tools. Our server exposes 18 tools: 9 for querying, 4 for proposing changes, 3 for generating reasoning envelopes, and 2 for Dolt version control.

The experiment

We tested this on a real business domain — a SaaS company's market expansion project. Same knowledge base, same questions, two modes:

  • Mode A: Agent gets Markdown documentation + file search tools
  • Mode B: Agent gets the same knowledge as a structured ontology + Dolt MCP tools

48 sessions. 8 task types. 3 runs per task per mode. Two independent LLM judges (GPT-5.4 and Claude Opus 4.5) evaluated every answer against ground truth.

Results

| Metric | Markdown + RAG | Right Reasons | Delta |
| --- | --- | --- | --- |
| Entity recall | 0.514 | 0.976 | +90% |
| "Why?" question recall | 0.000 | 1.000 | 0% → 100% |
| Reasoning quality (1-5) | 1.96 | 4.33 | +121% |
| Stability (variance) | 1.457 | 0.472 | 3× more stable |
| Latency | 284.6 s | 183.8 s | 35% faster |
| Pairwise wins | 0 | 20 (4 ties) | |

The "why?" result is the headline: Mode A scored 0.0 entity recall across all 6 runs on reasoning questions. Not low — zero. Mode B scored 1.0 across all 6 runs. This isn't statistical noise. It's a deterministic gap.

The conventional assumption is that structured retrieval is a tradeoff — better recall but more overhead and higher latency. This experiment showed the opposite: the structured approach was simultaneously more accurate, faster, more stable, and more compact in its answers.

Judge agreement was 83.3%. Average judge confidence was 0.927. The only disagreements were on impact analysis tasks where multiple valid reasoning paths existed.

What we didn't prove (honestly)

  • Ingest: Getting business knowledge into the ontology was manual. This is the hardest unsolved problem.
  • Write path: We only tested reading. Agents proposing ontology changes is designed but not yet benchmarked.
  • Generalization: Tested on one domain (dev planning). Other domains are next.

How knowledge enters the ontology: EPICAL

We're not expecting anyone to manually populate SQL tables. The designed ingest pipeline is called EPICAL:

Source docs → EXTRACT → PONDER → INTERROGATE → CALIBRATE → AUTHENTICATE → LOAD

The first two stages (Extract and Ponder) are agent-driven — the AI proposes candidate objects and relationships from source documents. Interrogate and Calibrate refine confidence. Authenticate is the human gate — a Dolt diff review, just like a code PR. Only after human approval does knowledge become canonical.
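The stages above can be sketched as a simple pipeline over candidate facts. Only the stage names come from the post; the behavior of each function here is invented for illustration:

```python
# Hypothetical EPICAL pipeline: each stage transforms a list of candidates.

def extract(docs):
    # Agent proposes candidate claims from source documents
    return [{"claim": d, "confidence": 0.5, "status": "candidate"} for d in docs]

def ponder(cands):
    # Agent would enrich candidates with proposed layers/relations (stubbed here)
    return cands

def interrogate(cands):
    # Probe each candidate; raise confidence for ones that survive questioning
    for c in cands:
        c["confidence"] = min(1.0, c["confidence"] + 0.3)
    return cands

def calibrate(cands):
    # Drop low-confidence candidates before human review
    return [c for c in cands if c["confidence"] >= 0.7]

def authenticate(cands, approved_claims):
    # Human gate: only reviewer-approved candidates become loadable
    for c in cands:
        if c["claim"] in approved_claims:
            c["status"] = "authenticated"
    return [c for c in cands if c["status"] == "authenticated"]

def load(cands):
    # Only authenticated knowledge becomes canonical
    return {c["claim"]: c for c in cands}

docs = ["Provider A chosen for identity", "Office plants watered weekly"]
canonical = load(authenticate(
    calibrate(interrogate(ponder(extract(docs)))),
    approved_claims={"Provider A chosen for identity"},
))
```

Note that the unapproved claim is filtered out at the human gate, not by the model's confidence alone.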

The epistemic boundary is strict: an agent cannot bypass human validation. The promote_candidate tool requires authenticated status.
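A hedged sketch of that gate (the signature and error type are invented; the real tool is the MCP server's `promote_candidate`):

```python
class EpistemicBoundaryError(Exception):
    """Raised when an agent tries to promote unvalidated knowledge."""

def promote_candidate(candidate: dict) -> dict:
    # Hard gate: only human-authenticated candidates can become canonical.
    if candidate.get("status") != "authenticated":
        raise EpistemicBoundaryError(
            f"candidate {candidate.get('id')!r} has not passed human review"
        )
    return {**candidate, "status": "canonical"}
```

The check is structural, not prompt-based: an agent cannot talk its way past it.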

OPS Contracts: reasoning envelopes for external work

One more concept worth mentioning. When work happens in external systems (Jira, GitHub, CI/CD), the agent can generate an OPS Contract — a reasoning envelope that attaches institutional context to a work item:

generate_ops_contract(
    external_work_ref="jira://TASK-123",
    description="Prepare annual report for submission",
    contract_kind="annual_reporting"
)

The contract tells the executing agent why this task matters, what rules apply, and which boundaries must not be crossed — without the agent needing to query the full ontology itself.
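As a rough sketch of what such an envelope might contain (fields and lookup logic are invented; in the real system the context would come from the ontology, not a hard-coded table):

```python
from dataclasses import dataclass, field

@dataclass
class OpsContract:
    """Hypothetical reasoning envelope attached to an external work item."""
    external_work_ref: str
    description: str
    contract_kind: str
    why: str = ""                                    # institutional rationale
    rules: list = field(default_factory=list)        # policies that apply
    boundaries: list = field(default_factory=list)   # lines that must not be crossed

def generate_ops_contract(external_work_ref, description, contract_kind):
    # Stand-in lookup; the real server would resolve this from the ontology
    context = {
        "annual_reporting": {
            "why": "Supports the self-service vision for micro-entrepreneurs",
            "rules": ["Use the approved reporting template"],
            "boundaries": ["Do not submit before human sign-off"],
        },
    }[contract_kind]
    return OpsContract(external_work_ref, description, contract_kind, **context)

contract = generate_ops_contract(
    external_work_ref="jira://TASK-123",
    description="Prepare annual report for submission",
    contract_kind="annual_reporting",
)
```

The executing agent receives the rationale and boundaries as data, so it can stay aligned without ontology access of its own.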

Try it

The full repo is open source:

git clone https://github.com/Right-Reasons/right-reasons
cd right-reasons
docker compose up -d
cd mcp-server && pip install -e .

Connect your agent, then ask:

"Why did we choose Provider A over Provider B for identity? Use the get_explanation_packet tool with object ID ex_ops_02."

The agent will trace the full reasoning chain across all four layers.


Right Reasons is built by MindWorks Industries. We're looking for early users who want to give their AI agents actual institutional reasoning. Reach out at hello@rightreasons.ai.
