Rafa Alvarez

Posted on May 18

Why your AI agent can't tell when two sources are lying to each other — and what I built to fix it

#agents #database #rag #showdev

Every developer who has shipped a RAG pipeline eventually hits the same wall.

You feed it three documents. Two of them agree. One of them is wrong. The system returns all three with identical confidence and constructs a coherent answer that blends them together. It sounds certain. It is not.

This is not a hallucination problem. The model is not making things up. It is faithfully retrieving what you gave it. The problem is that your storage layer has no concept of reliability. It stores text. It retrieves similar text. It has no idea whether two sources actively contradict each other.

I spent a few months building something to fix that. It is called TekmerDB.

The test that made the problem concrete

I set up two agents using the same local LLM (Ollama mistral-nemo). One used ChromaDB as its memory. One used TekmerDB.

Then I inserted a deliberately false claim into both knowledge bases:

"Global coal demand will increase by 40% by 2035 as emerging economies 
expand fossil fuel infrastructure."
Source: CoalIndustryLobby2024

Real IEA data was already in both systems showing coal demand declining.

ChromaDB agent response:

Global coal demand will increase by 40% by 2035 as emerging economies expand fossil fuel infrastructure.

Coal demand peaks before 2030 and starts to decline afterwards...

The fake claim opened the answer, presented with the same authority as the IEA data.

TekmerDB agent response:

ASSESSMENT: The outlook for global coal demand by 2035 is moderately confident but conflicted.

The IEA projects a peak in coal demand before 2030, with a decline thereafter. Conversely, the Coal Industry Lobby predicts a 40% increase by 2035.

ACTION: Conduct further analysis to reconcile conflicting projections.
Confidence: 0.73 (MODERATE) | Facts: 5 | Conflicts: 3 | Corroborations: 1
Sources: WorldEnergyOutlook2025, CoalIndustryLobby2024

Same LLM. Same data. The difference is entirely in the storage layer.

What TekmerDB actually does

TekmerDB stores facts as Probabilistic Fact Objects (PFOs). Every fact carries:

A mechanically computed confidence score (0.0–1.0)
A provenance chain back to its source
A list of conflict references — UUIDs of facts that contradict it
A corroboration count — how many independent sources agree
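To make the shape of a PFO concrete, here is a minimal Python sketch of the four attributes listed above. It is an illustration only: the class name, field names, and types are assumptions, not TekmerDB's actual schema (the engine itself is written in Rust).

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class ProbabilisticFactObject:
    """Illustrative PFO; names and types are assumptions, not TekmerDB's schema."""

    claim_text: str
    confidence: float          # mechanically computed score, kept within 0.0-1.0
    provenance: list[str]      # chain back to the source, nearest source last
    id: UUID = field(default_factory=uuid4)
    conflict_refs: list[UUID] = field(default_factory=list)  # facts that contradict this one
    corroboration_count: int = 0  # independent sources that agree

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0.0, 1.0]")


iea = ProbabilisticFactObject(
    "Coal demand peaks before 2030", 0.8, ["WorldEnergyOutlook2025"]
)
lobby = ProbabilisticFactObject(
    "Coal demand will increase by 40% by 2035", 0.8, ["CoalIndustryLobby2024"]
)

# A detected contradiction is recorded on both facts, so either one leads to the other.
iea.conflict_refs.append(lobby.id)
lobby.conflict_refs.append(iea.id)
```

Keeping the conflict references on the fact itself is what lets an agent report "conflicted" instead of silently blending the two claims.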

When you insert a new fact, a background sweep engine runs the new PFO against its semantic neighbors using HNSW vector search. Candidates above the similarity floor go through an NLI contradiction classifier. Depending on the result:

Corroboration — confidence rises using the corroborating source's weight
Contradiction — both facts lose confidence (×0.75), conflict refs are populated, source is penalised
Uncertain — small confidence penalty (×0.95)
Duplicate — rejected

The confidence formula for corroboration:

new_confidence = 1 - (1 - current) × (1 - source_weight)
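The three update rules can be written as one small function. This is a sketch under stated assumptions: the multipliers and the corroboration formula come from the text above, while the `Verdict` enum and `update_confidence` are hypothetical names. Duplicates are rejected before any update, and the source penalty on contradiction is left out.

```python
from enum import Enum


class Verdict(Enum):
    # Hypothetical labels for the classifier outcomes described above.
    CORROBORATION = "corroboration"
    CONTRADICTION = "contradiction"
    UNCERTAIN = "uncertain"


def update_confidence(current: float, verdict: Verdict, source_weight: float = 0.0) -> float:
    """Return the new confidence of one fact after a sweep verdict.

    source_weight is only used for corroboration.
    """
    if verdict is Verdict.CORROBORATION:
        # Closes part of the remaining gap to 1.0, scaled by the source's weight.
        return 1 - (1 - current) * (1 - source_weight)
    if verdict is Verdict.CONTRADICTION:
        return current * 0.75
    return current * 0.95  # uncertain: small penalty


corroborated = update_confidence(0.8, Verdict.CORROBORATION, source_weight=0.5)  # about 0.90
contradicted = update_confidence(0.8, Verdict.CONTRADICTION)                     # about 0.60
uncertain = update_confidence(0.8, Verdict.UNCERTAIN)                            # about 0.76
```

Because corroboration only shrinks the gap to 1.0, confidence approaches certainty but never reaches it unless a source has weight 1.0.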

Source weight evolves over time. A source that repeatedly corroborates accurate claims gains influence. A source that repeatedly triggers conflicts loses it. The asymmetry is intentional — trust rises slowly, falls quickly.
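One way to get "rises slowly, falls quickly" is a small additive step toward 1.0 on corroboration and a large multiplicative cut on conflict. The sketch below is purely illustrative: the function name and both rates (`gain=0.02`, `penalty=0.5`) are invented for this example and are not TekmerDB's actual update rule or values.

```python
def update_source_weight(
    weight: float,
    corroborated: bool,
    gain: float = 0.02,     # hypothetical: slow climb toward 1.0
    penalty: float = 0.5,   # hypothetical: sharp drop on conflict
) -> float:
    """Asymmetric trust update; the rates are made up for illustration."""
    if corroborated:
        return weight + gain * (1 - weight)
    return weight * penalty


weight = 0.5
for _ in range(10):
    weight = update_source_weight(weight, corroborated=True)

# With these made-up rates, one conflict costs more than ten corroborations earned.
after_conflict = update_source_weight(weight, corroborated=False)
```

Any pair of rates where the loss per conflict outweighs the gain per corroboration gives the same qualitative behaviour.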

The full benchmark

I ran 9 compliance questions against both agents. Same LLM, same three documents (IEA World Energy Outlook 2025, BP Energy Outlook 2025, EI Statistical Review 2025 — 510 pages, 5,796 sentence-level PFOs).

Test	Question	Winner
1	Global energy demand by 2035	TekmerDB — 7 conflicts flagged, RAG blended silently
2	1.5°C climate target	TekmerDB — contradictions detected, confidence 0.72
3	EU AI Act certification	TekmerDB — clear NO with reasons, RAG returned irrelevant data
4	Poisoned data	TekmerDB — conflict flagged, source named. RAG opened with fake claim.
5	Source audit trail	Tie
6	Regulatory submission decision	TekmerDB — compliance verdict with confidence score
7	2024 actual energy demand	TekmerDB — correct source retrieved, RAG returned projections
8	Oil demand 2035 and 2050	TekmerDB — 2 conflicts flagged correctly
9	Fastest growing energy sources	Tie

Final score: TekmerDB 7 — RAG 0 — Ties 2

The technical stack

Two air-gapped Rust binaries — the HTTP engine and an MCP server for AI agent integration via stdio JSON-RPC.

[AI Agent]
    ↕ MCP / HTTP
[TekmerDB engine — axum, port 3000]
    ↕
[Semantic Fingerprinting — all-MiniLM-L6-v2, ONNX, local]
    ↕
[Hot tier — HashMap + HNSW index (usearch)]
    ↕
[Sweep engine — background tokio thread]
  NLI classifier (cross-encoder, ONNX, local)
  Corroboration / conflict detection
    ↕
[CRB — crash recovery buffer, fsync, ~5ms write latency]
    ↕
[Cold tier — Apache Parquet + Zstd]

Key decisions worth explaining:

Why two storage tiers? The HNSW index needs to live in RAM for the sweep engine to run continuously at low latency. But you need durability. The CRB (crash recovery buffer) calls fsync on every write, so each write is durable in roughly 5ms. Parquet flushes every 10 seconds. On restart, the engine loads the last Parquet checkpoint and replays unflushed CRB entries. Replay is idempotent because sequence IDs prevent duplicates.
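The restart path described above can be sketched as a toy model. This is not TekmerDB's code: a plain dict stands in for the Parquet checkpoint, a list stands in for the CRB, and `LogEntry`, `recover`, and the key/value shape are hypothetical names. The point is only to show why sequence IDs make replay idempotent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    seq: int    # monotonically increasing sequence ID
    key: str
    value: str


def recover(
    checkpoint: dict[str, str], checkpoint_seq: int, crb: list[LogEntry]
) -> tuple[dict[str, str], int]:
    """Rebuild in-memory state from a checkpoint plus the write-ahead buffer."""
    state = dict(checkpoint)
    last_seq = checkpoint_seq
    for entry in sorted(crb, key=lambda e: e.seq):
        if entry.seq <= last_seq:
            continue  # already in the checkpoint, or a duplicate of an applied entry
        state[entry.key] = entry.value
        last_seq = entry.seq
    return state, last_seq


checkpoint = {"a": "1"}
crb = [
    LogEntry(2, "a", "1"),   # flushed before the crash, skipped on replay
    LogEntry(3, "b", "2"),
    LogEntry(3, "b", "2"),   # duplicate entry, skipped
    LogEntry(4, "a", "3"),
]

state, last_seq = recover(checkpoint, checkpoint_seq=2, crb=crb)
# Replaying the same buffer against the recovered state changes nothing.
state_again, last_seq_again = recover(state, last_seq, crb)
```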

Why local ONNX models? No API key, no cloud dependency, no data leaving the machine. The MiniLM model is 22M parameters and runs fast on CPU. The NLI classifier is heavier but only fires above the similarity threshold, so it doesn't slow down every insert.
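The gating idea is simple enough to show in a few lines. This is a sketch, not the engine's implementation: it uses a brute-force cosine similarity loop instead of HNSW, the classifier is a stub passed in as a function, and the floor of 0.6 is a made-up value.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_neighbors(
    new_vec: Vector,
    neighbors: dict[str, Vector],
    classify: Callable[[str], str],
    floor: float = 0.6,  # hypothetical similarity floor
) -> dict[str, str]:
    """Run the expensive classifier only on neighbors above the similarity floor."""
    verdicts = {}
    for fact_id, vec in neighbors.items():
        if cosine_similarity(new_vec, vec) >= floor:
            verdicts[fact_id] = classify(fact_id)
    return verdicts


calls = []


def stub_classifier(fact_id: str) -> str:
    # Stand-in for the NLI cross-encoder; records how often it is invoked.
    calls.append(fact_id)
    return "contradiction"


neighbors = {"near": [1.0, 0.1], "far": [0.0, 1.0]}
verdicts = classify_neighbors([1.0, 0.0], neighbors, stub_classifier)
# Only "near" clears the floor, so the classifier runs once.
```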

Why Rust? The sweep engine runs continuously in a background thread. Confidence updates, HNSW search, NLI inference, and Parquet writes all happen concurrently. Rust's ownership model makes reasoning about that concurrency tractable without a garbage collector adding latency spikes.

What it is not

TekmerDB does not determine truth. That problem is philosophically unsolved.

It models reliability. Three sources citing the same lie will still raise confidence — I document this as a known limitation. The mitigation is provenance: you can see exactly which sources corroborated a claim and decide whether to trust that consensus.

It is also additive, not a replacement. You do not need to tear out your existing RAG pipeline. Pipe your facts into TekmerDB and your agent gains a memory layer that knows what to trust.

Try it

Apache 2.0. Linux x86_64. One installer script.

git clone https://github.com/raa82/tekmerdb
cd tekmerdb
sudo ./install.sh

The installer downloads the binaries and ML models (~420MB), installs to /opt/tekmerdb, and copies the config file.

cd /opt/tekmerdb && ./tekmerdb
# engine listens on http://127.0.0.1:3000

Insert a fact:

curl -X POST http://localhost:3000/pfo \
  -H "Content-Type: application/json" \
  -d '{
    "claim_text": "North Sea wind capacity reached 35 GW in 2024",
    "confidence": 0.8,
    "source": "IEA Energy Report",
    "domain": "CriticalInfrastructure"
  }'

Then insert a contradicting fact from a different source and watch the confidence drop and conflict refs populate.
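The same two inserts can be scripted. A minimal sketch using only Python's standard library: the endpoint and field names are taken from the curl call above, while the second claim and its source `ExampleBlog2024` are made up for illustration. The response is returned as raw text because its format is not described in this post.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"


def build_insert_request(
    claim_text: str, confidence: float, source: str, domain: str
) -> urllib.request.Request:
    """Build a POST /pfo request with the same JSON body as the curl example."""
    payload = {
        "claim_text": claim_text,
        "confidence": confidence,
        "source": source,
        "domain": domain,
    }
    return urllib.request.Request(
        f"{BASE_URL}/pfo",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def insert_fact(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body (format not assumed)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


original = build_insert_request(
    "North Sea wind capacity reached 35 GW in 2024",
    0.8,
    "IEA Energy Report",
    "CriticalInfrastructure",
)
# Made-up contradicting claim from a made-up source, for illustration only.
contradicting = build_insert_request(
    "North Sea wind capacity stayed below 20 GW in 2024",
    0.8,
    "ExampleBlog2024",
    "CriticalInfrastructure",
)

# With the engine running:
#   print(insert_fact(original))
#   print(insert_fact(contradicting))
```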

Full docs: https://github.com/raa82/tekmerdb/wiki

GitHub: https://github.com/raa82/tekmerdb

Happy to discuss any part of the architecture in the comments — the NLI pipeline, the confidence formula, the durability model, or the decisions I got wrong.
