<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 2 dogs and a nerd</title>
    <description>The latest articles on DEV Community by 2 dogs and a nerd (@2_dogsandanerd_e4f473e).</description>
    <link>https://dev.to/2_dogsandanerd_e4f473e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3752043%2F5663f9f2-33c4-4960-a610-3e5142461bb6.png</url>
      <title>DEV Community: 2 dogs and a nerd</title>
      <link>https://dev.to/2_dogsandanerd_e4f473e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/2_dogsandanerd_e4f473e"/>
    <language>en</language>
    <item>
      <title>My Sister Hates My New Openclaw (Because I'm Finally Right)</title>
      <dc:creator>2 dogs and a nerd</dc:creator>
      <pubDate>Wed, 04 Feb 2026 08:43:32 +0000</pubDate>
      <link>https://dev.to/2_dogsandanerd_e4f473e/my-sister-hates-my-new-openclaw-because-im-finally-right-4d54</link>
      <guid>https://dev.to/2_dogsandanerd_e4f473e/my-sister-hates-my-new-openclaw-because-im-finally-right-4d54</guid>
      <description>&lt;p&gt;I built &lt;strong&gt;ClawRAG&lt;/strong&gt; because I was tired of losing arguments to my sister, who is always right ;) Not just any arguments: the ones where she is convinced she's right about some old receipt, warranty, or contract clause.&lt;/p&gt;

&lt;p&gt;I needed my entire paper trail with me at all times. More importantly, I needed a way to query it from WhatsApp while on the move.&lt;/p&gt;

&lt;p&gt;"Standard" RAG Fails&lt;br&gt;
Most RAG setups work fine for simple text. But real-world documents are messy. They have legacy tables, complex layouts, and exact terms that get lost in vector-only search.&lt;/p&gt;

&lt;p&gt;If I'm arguing about a warranty, I need the exact page and section, not a paraphrase.&lt;/p&gt;
&lt;h3&gt;
  
  
  Pillar 1: Structure-First Ingestion with Docling
&lt;/h3&gt;

&lt;p&gt;Most PDF parsers treat a page like a flat bag of words. I chose &lt;strong&gt;Docling&lt;/strong&gt; because it understands document structure. If there's a table on page 12, Docling extracts it as Markdown, preserving the rows and columns that a standard parser would turn into junk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How we configure the pipeline:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Extract from backend/src/core/docling_loader.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;docling.document_converter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DocumentConverter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PdfFormatOption&lt;/span&gt;
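&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;docling.datamodel.pipeline_options&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PdfPipelineOptions&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;docling.datamodel.base_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InputFormat&lt;/span&gt;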

&lt;span class="c1"&gt;# Keep rendered images of pictures and tables alongside the parsed structure
&lt;/span&gt;&lt;span class="n"&gt;pipeline_options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PdfPipelineOptions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;pipeline_options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_picture_images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="n"&gt;pipeline_options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_table_images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="n"&gt;converter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DocumentConverter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;format_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;InputFormat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PDF&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;PdfFormatOption&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pipeline_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pipeline_options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;converter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;markdown_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export_to_markdown&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pillar 2: Hybrid Search Accuracy
&lt;/h3&gt;

&lt;p&gt;Vector search (semantic) is the hype, but keyword search (BM25) is the truth. If I search for "Article 4.2", a vector model might give me "something similar". BM25 finds exactly "Article 4.2".&lt;/p&gt;
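
&lt;p&gt;To make the exact-match point concrete, here is a minimal sketch of one common BM25 variant in plain Python. It is not ClawRAG's retriever: the function names, the sample clauses, and the parameters k1=1.5 and b=0.75 are assumptions chosen for illustration.&lt;/p&gt;

```python
import math
from collections import Counter

PUNCT = ".,;:()\"'"


def tokenize(text):
    # Lowercase, split on whitespace, trim surrounding punctuation.
    # Inner dots survive, so "4.2" stays a single token.
    tokens = (tok.strip(PUNCT).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Score every document against the query with a BM25-style formula.
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = term_freq[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical contract snippets, not taken from a real document.
docs = [
    "Article 4.2 covers the extended warranty period.",
    "Section 4.1 describes the standard warranty terms.",
    "Delivery and installation terms for kitchen appliances.",
]
scores = bm25_scores("Article 4.2", docs)
print(scores)
```

&lt;p&gt;Only the first snippet contains the tokens "article" and "4.2", so it is the only one with a non-zero score. A semantic model could just as easily rank the "Section 4.1" clause near the top because both talk about warranties.&lt;/p&gt;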

&lt;p&gt;ClawRAG uses &lt;strong&gt;Reciprocal Rank Fusion (RRF)&lt;/strong&gt; to combine both worlds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fusion Logic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Extract from backend/src/core/retrievers/hybrid_retriever.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_fuse_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results_per_retriever&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;fused_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results_per_retriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_with_score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;node_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node_with_score&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id_&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;node_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fused_scores&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;fused_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;node&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;node_with_score&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="c1"&gt;# Reciprocal Rank Fusion formula (default k=60)
&lt;/span&gt;            &lt;span class="n"&gt;fused_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Sort candidates by combined score
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fused_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
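
&lt;p&gt;A small worked example may help show what the fusion does to the final ranking. The sketch below is a standalone version of the same idea that works on plain lists of ids instead of retriever nodes. The function name and the document ids are made up, and the rank is 0-based to mirror the enumerate() call in the extract above.&lt;/p&gt;

```python
def rrf_fuse(rankings, k=60):
    # Sum 1 / (rank + k) for every list a document appears in (rank is 0-based).
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (rank + k)
    # Highest combined score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two retrievers.
vector_hits = ["warranty_terms", "article_4_2", "delivery_notes"]
bm25_hits = ["article_4_2", "invoice_2023"]

fused = rrf_fuse([vector_hits, bm25_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

&lt;p&gt;Here "article_4_2" collects 1/61 from the vector list and 1/60 from the BM25 list, so it ends up first even though the vector retriever alone ranked it second. Documents found by only one retriever keep a single, smaller contribution.&lt;/p&gt;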



&lt;h3&gt;
  
  
  Pillar 3: Remote Access via MCP
&lt;/h3&gt;

&lt;p&gt;The reason my sister hates it? I have it on my phone. ClawRAG implements the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;, which lets me connect it to &lt;strong&gt;OpenClaw (ClawBot)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;My agent "sees" the knowledge base as a tool. When I ask a question on WhatsApp, the agent calls ClawRAG, retrieves the context, and answers me in seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Conversation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Me:&lt;/strong&gt; "Hey, is the fridge still covered? Sis says warranty is over."&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;ClawRAG:&lt;/strong&gt; "Actually, checking &lt;code&gt;Invoice_Kitchen_2023.pdf&lt;/code&gt;: You have a 3-year 'Premium' extension. It expires June 2026. Your sister is likely thinking of the standard 12-month manufacturer warranty."&lt;/li&gt;
&lt;/ul&gt;
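
&lt;p&gt;Under the hood, a question like the one above reaches the knowledge base as an MCP tool call, which is a JSON-RPC 2.0 request with the method "tools/call". The sketch below only builds such a request. The tool name "query_knowledge_base" and its arguments are hypothetical and are not ClawRAG's actual tool schema.&lt;/p&gt;

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    # Build a JSON-RPC 2.0 request for the MCP "tools/call" method.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(
    1,
    "query_knowledge_base",
    {"query": "Is the fridge still covered by warranty?", "top_k": 3},
)
print(json.dumps(request, indent=2))
```

&lt;p&gt;The agent sends a request of this shape to the server, and the retrieved passages come back in the tool result for the model to answer from.&lt;/p&gt;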

&lt;h3&gt;
  
  
  Why I Open Sourced It
&lt;/h3&gt;

&lt;p&gt;ClawRAG isn't just a toy; it's the core extracted from our professional RAG platform. It's built for developers who need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Production Readiness&lt;/strong&gt;: No mocks, no fakes&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Privacy&lt;/strong&gt;: Runs 100% locally with Ollama and ChromaDB&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Extensibility&lt;/strong&gt;: MIT licensed and easy to hook into existing agents&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Check it out, star it, and tell me where it breaks:&lt;br&gt;
&lt;a href="https://github.com/2dogsandanerd/ClawRag" rel="noopener noreferrer"&gt;github.com/2dogsandanerd/ClawRag&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P.S. If you're reading this, Sis: I'm still waiting for that new flatscreen you bet me.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>mcp</category>
      <category>ai</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
