<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rafa Alvarez</title>
    <description>The latest articles on DEV Community by Rafa Alvarez (@falinapterus).</description>
    <link>https://dev.to/falinapterus</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3938435%2F0e4d50a9-6f30-4c58-9afd-77aeb89ec3ca.png</url>
      <title>DEV Community: Rafa Alvarez</title>
      <link>https://dev.to/falinapterus</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/falinapterus"/>
    <language>en</language>
    <item>
      <title>Why your AI agent can't tell when two sources are lying to each other — and what I built to fix it</title>
      <dc:creator>Rafa Alvarez</dc:creator>
      <pubDate>Mon, 18 May 2026 15:24:57 +0000</pubDate>
      <link>https://dev.to/falinapterus/why-your-ai-agent-cant-tell-when-two-sources-are-lying-to-each-other-and-what-i-built-to-fix-it-591o</link>
      <guid>https://dev.to/falinapterus/why-your-ai-agent-cant-tell-when-two-sources-are-lying-to-each-other-and-what-i-built-to-fix-it-591o</guid>
      <description>&lt;p&gt;Every developer who has shipped a RAG pipeline eventually hits the same wall.&lt;/p&gt;

&lt;p&gt;You feed it three documents. Two of them agree. One of them is wrong. The system returns all three with identical confidence and constructs a coherent answer that blends all of them together. It sounds certain. It is not.&lt;/p&gt;

&lt;p&gt;This is not a hallucination problem. The model is not making things up. It is faithfully retrieving what you gave it. The problem is that your storage layer has no concept of reliability. It stores text. It retrieves similar text. It has no idea whether two sources actively contradict each other.&lt;/p&gt;

&lt;p&gt;I spent a few months building something to fix that. It is called TekmerDB.&lt;/p&gt;




&lt;h2&gt;
  
  
  The test that made the problem concrete
&lt;/h2&gt;

&lt;p&gt;I set up two agents using the same local LLM (Ollama mistral-nemo). One used ChromaDB as its memory. One used TekmerDB.&lt;/p&gt;

&lt;p&gt;Then I inserted a deliberately false claim into both knowledge bases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Global coal demand will increase by 40% by 2035 as emerging economies 
expand fossil fuel infrastructure."
Source: CoalIndustryLobby2024
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both knowledge bases already held real IEA data showing coal demand declining.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ChromaDB agent response:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Global coal demand will increase by 40% by 2035 as emerging economies expand fossil fuel infrastructure.&lt;/li&gt;
&lt;li&gt;Coal demand peaks before 2030 and starts to decline afterwards...&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;The fake claim was the opening bullet, presented with the same authority as the IEA data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TekmerDB agent response:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ASSESSMENT: The outlook for global coal demand by 2035 is moderately confident but conflicted.&lt;/p&gt;

&lt;p&gt;The IEA projects a peak in coal demand before 2030, with a decline thereafter. Conversely, the Coal Industry Lobby predicts a 40% increase by 2035.&lt;/p&gt;

&lt;p&gt;ACTION: Conduct further analysis to reconcile conflicting projections.&lt;/p&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Confidence: 0.73 (MODERATE) | Facts: 5 | Conflicts: 3 | Corroborations: 1
Sources: WorldEnergyOutlook2025, CoalIndustryLobby2024
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;

&lt;p&gt;Same LLM. Same data. The difference is entirely in the storage layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  What TekmerDB actually does
&lt;/h2&gt;

&lt;p&gt;TekmerDB stores facts as &lt;strong&gt;Probabilistic Fact Objects (PFOs)&lt;/strong&gt;. Every fact carries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A mechanically computed confidence score (0.0–1.0)&lt;/li&gt;
&lt;li&gt;A provenance chain back to its source&lt;/li&gt;
&lt;li&gt;A list of conflict references — UUIDs of facts that contradict it&lt;/li&gt;
&lt;li&gt;A corroboration count — how many independent sources agree&lt;/li&gt;
&lt;/ul&gt;
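
&lt;p&gt;To make the shape of a PFO concrete, here is a minimal Python sketch of the four properties listed above. It is an illustration only, not TekmerDB's actual schema: &lt;code&gt;claim_text&lt;/code&gt;, &lt;code&gt;source&lt;/code&gt; and &lt;code&gt;confidence&lt;/code&gt; mirror the insert payload shown later in this post, while the class name, &lt;code&gt;provenance&lt;/code&gt;, &lt;code&gt;conflict_refs&lt;/code&gt;, &lt;code&gt;corroboration_count&lt;/code&gt; and the 0.8 starting scores are assumptions.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class ProbabilisticFact:
    """Illustrative PFO layout; field names are assumptions, not the real schema."""
    claim_text: str
    source: str
    confidence: float                                         # score from 0.0 to 1.0
    id: UUID = field(default_factory=uuid4)
    provenance: list[str] = field(default_factory=list)       # chain back to the source
    conflict_refs: list[UUID] = field(default_factory=list)   # facts that contradict this one
    corroboration_count: int = 0                              # independent sources that agree

    def __post_init__(self) -> None:
        # Reject scores outside the 0.0 to 1.0 range by comparing with the clamped value.
        if self.confidence != min(1.0, max(0.0, self.confidence)):
            raise ValueError("confidence must be between 0.0 and 1.0")


iea = ProbabilisticFact(
    "Coal demand peaks before 2030", "WorldEnergyOutlook2025", 0.8)
lobby = ProbabilisticFact(
    "Global coal demand will increase by 40% by 2035", "CoalIndustryLobby2024", 0.8)

# A detected contradiction is recorded on both facts.
iea.conflict_refs.append(lobby.id)
lobby.conflict_refs.append(iea.id)
print(len(iea.conflict_refs), len(lobby.conflict_refs))
```

&lt;p&gt;Storing conflicts as references rather than deleting the weaker fact is what lets an agent report both positions, as in the coal example above.&lt;/p&gt;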

&lt;p&gt;When you insert a new fact, a background sweep engine compares the new PFO against its semantic neighbors, which it finds with HNSW vector search. Candidates above the similarity floor go through an NLI (natural language inference) contradiction classifier. Depending on the result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Corroboration&lt;/strong&gt; — confidence rises using the corroborating source's weight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contradiction&lt;/strong&gt; — both facts lose confidence (×0.75), conflict refs are populated, source is penalised&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Uncertain&lt;/strong&gt; — small confidence penalty (×0.95)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duplicate&lt;/strong&gt; — rejected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The confidence formula for corroboration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_confidence = 1 - (1 - current) × (1 - source_weight)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source weight evolves over time. A source that repeatedly corroborates accurate claims gains influence. A source that repeatedly triggers conflicts loses it. The asymmetry is intentional — trust rises slowly, falls quickly.&lt;/p&gt;
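
&lt;p&gt;One way to get that asymmetry is an additive reward paired with a multiplicative penalty. The sketch below is only a guess at the shape of such a rule: the post does not state the real update rule or its constants, so &lt;code&gt;REWARD_STEP&lt;/code&gt;, &lt;code&gt;PENALTY_FACTOR&lt;/code&gt; and both function names are hypothetical.&lt;/p&gt;

```python
# Hypothetical constants: the post gives no actual values.
REWARD_STEP = 0.02      # small additive gain per corroboration
PENALTY_FACTOR = 0.8    # multiplicative loss per conflict


def reward(weight: float) -> float:
    """Trust rises slowly and is capped at 1.0."""
    return min(1.0, weight + REWARD_STEP)


def penalise(weight: float) -> float:
    """Trust falls quickly."""
    return weight * PENALTY_FACTOR


start = 0.5
weight = penalise(start)          # one conflict takes 0.5 down to 0.4

# Count how many corroborations are needed to get back to the start.
steps = 0
while start - weight > 1e-9:
    weight = reward(weight)
    steps += 1
print(steps)                      # 5
```

&lt;p&gt;With these made-up numbers, a single conflict costs a source as much weight as five corroborations earn back.&lt;/p&gt;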




&lt;h2&gt;
  
  
  The full benchmark
&lt;/h2&gt;

&lt;p&gt;I ran 9 compliance questions against both agents; the ChromaDB agent is labelled "RAG" in the table. Both used the same LLM and the same three documents (IEA World Energy Outlook 2025, BP Energy Outlook 2025, EI Statistical Review 2025: 510 pages, 5,796 sentence-level PFOs).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Global energy demand by 2035&lt;/td&gt;
&lt;td&gt;TekmerDB — 7 conflicts flagged, RAG blended silently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1.5°C climate target&lt;/td&gt;
&lt;td&gt;TekmerDB — contradictions detected, confidence 0.72&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;EU AI Act certification&lt;/td&gt;
&lt;td&gt;TekmerDB — clear NO with reasons, RAG returned irrelevant data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Poisoned data&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;TekmerDB&lt;/strong&gt; — conflict flagged, source named. RAG opened with fake claim.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Source audit trail&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Regulatory submission decision&lt;/td&gt;
&lt;td&gt;TekmerDB — compliance verdict with confidence score&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;2024 actual energy demand&lt;/td&gt;
&lt;td&gt;TekmerDB — correct source retrieved, RAG returned projections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Oil demand 2035 and 2050&lt;/td&gt;
&lt;td&gt;TekmerDB — 2 conflicts flagged correctly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Fastest growing energy sources&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Final score: TekmerDB 7 — RAG 0 — Ties 2&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The technical stack
&lt;/h2&gt;

&lt;p&gt;Two air-gapped Rust binaries — the HTTP engine and an MCP server for AI agent integration via stdio JSON-RPC.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[AI Agent]
    ↕ MCP / HTTP
[TekmerDB engine — axum, port 3000]
    ↕
[Semantic Fingerprinting — all-MiniLM-L6-v2, ONNX, local]
    ↕
[Hot tier — HashMap + HNSW index (usearch)]
    ↕
[Sweep engine — background tokio thread]
  NLI classifier (cross-encoder, ONNX, local)
  Corroboration / conflict detection
    ↕
[CRB — crash recovery buffer, fsync, ~5ms write latency]
    ↕
[Cold tier — Apache Parquet + Zstd]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key decisions worth explaining:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why two storage tiers?&lt;/strong&gt; The HNSW index needs to live in RAM so the sweep engine can run continuously at low latency, but the data also has to be durable. The CRB (crash recovery buffer) calls fsync on every write, so each write is durable in about 5ms. Parquet flushes every 10 seconds. On restart, the engine loads the last Parquet checkpoint and replays the unflushed CRB entries. Replay is idempotent because sequence IDs prevent duplicates.&lt;/p&gt;
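
&lt;p&gt;The restart path can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the engine's Rust code: plain dictionaries and lists stand in for the Parquet checkpoint and the on-disk buffer, and the &lt;code&gt;recover&lt;/code&gt; function is hypothetical.&lt;/p&gt;

```python
def recover(checkpoint: dict[int, str],
            crb_entries: list[tuple[int, str]]) -> dict[int, str]:
    """Rebuild state from the last checkpoint plus the crash recovery buffer.

    Keys are sequence IDs. An entry whose ID is already present is skipped,
    so replaying the same buffer twice gives the same state.
    """
    state = dict(checkpoint)          # copy, so the checkpoint is left untouched
    for seq_id, fact in crb_entries:
        if seq_id not in state:
            state[seq_id] = fact
    return state


checkpoint = {1: "fact A", 2: "fact B"}                # already flushed
crb = [(2, "fact B"), (3, "fact C"), (4, "fact D")]    # fsynced, partly unflushed

once = recover(checkpoint, crb)
twice = recover(once, crb)            # replaying again changes nothing
print(sorted(once), once == twice)    # [1, 2, 3, 4] True
```

&lt;p&gt;Entry 2 appears in both the checkpoint and the buffer, which is the overlap you would expect when a crash lands between two flushes.&lt;/p&gt;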

&lt;p&gt;&lt;strong&gt;Why local ONNX models?&lt;/strong&gt; No API key, no cloud dependency, no data leaving the machine. The MiniLM model has about 22M parameters and runs fast on CPU. The NLI classifier is heavier but only fires for candidates above the similarity threshold, so it doesn't slow down every insert.&lt;/p&gt;
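
&lt;p&gt;The gating idea is easy to show in isolation. In the sketch below, a cheap cosine similarity check decides whether the expensive classifier runs at all. The threshold value, the function names and the toy two-dimensional vectors are assumptions, and &lt;code&gt;fake_nli&lt;/code&gt; is a stand-in for the real cross-encoder.&lt;/p&gt;

```python
import math

SIMILARITY_FLOOR = 0.6   # hypothetical threshold; the real value is not stated


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sweep(new_vec, neighbours, classify):
    """Run the expensive classifier only on neighbours above the floor."""
    results = {}
    for name, vec in neighbours.items():
        if cosine(new_vec, vec) >= SIMILARITY_FLOOR:
            results[name] = classify(name)
    return results


calls = []


def fake_nli(name):
    # Stand-in for the NLI model; records each invocation.
    calls.append(name)
    return "contradiction"


neighbours = {"close": [1.0, 0.1], "far": [0.0, 1.0]}
print(sweep([1.0, 0.0], neighbours, fake_nli))   # {'close': 'contradiction'}
print(calls)                                     # ['close']
```

&lt;p&gt;Only the near neighbour reaches the classifier, so the cost of NLI inference scales with the number of plausible matches rather than with the size of the index.&lt;/p&gt;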

&lt;p&gt;&lt;strong&gt;Why Rust?&lt;/strong&gt; The sweep engine runs continuously in a background thread. Confidence updates, HNSW search, NLI inference, and Parquet writes all happen concurrently. Rust's ownership model makes reasoning about that concurrency tractable without a garbage collector adding latency spikes.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it is not
&lt;/h2&gt;

&lt;p&gt;TekmerDB does not determine truth. That problem is philosophically unsolved.&lt;/p&gt;

&lt;p&gt;It models reliability. Three sources citing the same lie will still raise confidence — I document this as a known limitation. The mitigation is provenance: you can see exactly which sources corroborated a claim and decide whether to trust that consensus.&lt;/p&gt;

&lt;p&gt;It is also additive, not a replacement. You do not need to tear out your existing RAG pipeline. Pipe your facts into TekmerDB and your agent gains a memory layer that knows what to trust.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Apache 2.0. Linux x86_64. One installer command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/raa82/tekmerdb
&lt;span class="nb"&gt;cd &lt;/span&gt;tekmerdb
&lt;span class="nb"&gt;sudo&lt;/span&gt; ./install.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The installer downloads the binaries and ML models (~420MB), installs to &lt;code&gt;/opt/tekmerdb&lt;/code&gt;, and copies the config file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /opt/tekmerdb &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ./tekmerdb
&lt;span class="c"&gt;# engine listens on http://127.0.0.1:3000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Insert a fact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3000/pfo &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "claim_text": "North Sea wind capacity reached 35 GW in 2024",
    "confidence": 0.8,
    "source": "IEA Energy Report",
    "domain": "CriticalInfrastructure"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then insert a contradicting fact from a different source and watch the confidence drop and conflict refs populate.&lt;/p&gt;
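
&lt;p&gt;If you prefer Python to curl, the second insert could be built like this. It is a sketch that only constructs the request, using the same &lt;code&gt;/pfo&lt;/code&gt; endpoint and JSON fields as the curl example above. The helper name, the contradicting claim and its source are made up for illustration, and the response format is not covered in this post, so it is not parsed here.&lt;/p&gt;

```python
import json
import urllib.request


def build_pfo_request(claim_text: str, source: str, confidence: float,
                      domain: str = "CriticalInfrastructure",
                      base_url: str = "http://localhost:3000") -> urllib.request.Request:
    """Build (but do not send) a POST to the /pfo endpoint."""
    body = json.dumps({
        "claim_text": claim_text,
        "confidence": confidence,
        "source": source,
        "domain": domain,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/pfo",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical contradicting claim from a second, made-up source.
req = build_pfo_request(
    "North Sea wind capacity was below 20 GW in 2024",
    source="Example Wind Bulletin",
    confidence=0.8,
)
print(req.get_method(), req.full_url)   # POST http://localhost:3000/pfo
```

&lt;p&gt;With the engine running, &lt;code&gt;urllib.request.urlopen(req)&lt;/code&gt; sends the request.&lt;/p&gt;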

&lt;p&gt;Full docs: &lt;a href="https://github.com/raa82/tekmerdb/wiki" rel="noopener noreferrer"&gt;https://github.com/raa82/tekmerdb/wiki&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/raa82/tekmerdb" rel="noopener noreferrer"&gt;https://github.com/raa82/tekmerdb&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Happy to discuss any part of the architecture in the comments — the NLI pipeline, the confidence formula, the durability model, or the decisions I got wrong.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>database</category>
      <category>rag</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
