<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Enrique Gordoncillo</title>
    <description>The latest articles on DEV Community by Enrique Gordoncillo (@enrique_gordoncillo_0e27b).</description>
    <link>https://dev.to/enrique_gordoncillo_0e27b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3945764%2F12e070bc-3831-4f89-a9e7-f707d2e4f11e.png</url>
      <title>DEV Community: Enrique Gordoncillo</title>
      <link>https://dev.to/enrique_gordoncillo_0e27b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/enrique_gordoncillo_0e27b"/>
    <language>en</language>
    <item>
      <title>Building a P2P 'Wikipedia for Machines': Verifiable RAG with the Holepunch Stack</title>
      <dc:creator>Enrique Gordoncillo</dc:creator>
      <pubDate>Fri, 22 May 2026 09:43:45 +0000</pubDate>
      <link>https://dev.to/enrique_gordoncillo_0e27b/building-a-p2p-wikipedia-for-machines-verifiable-rag-with-the-holepunch-stack-1kge</link>
      <guid>https://dev.to/enrique_gordoncillo_0e27b/building-a-p2p-wikipedia-for-machines-verifiable-rag-with-the-holepunch-stack-1kge</guid>
      <description>&lt;p&gt;Current LLMs hallucinate when they lack context. Retrieval-Augmented Generation (RAG) helps, but existing pipelines force a tough choice: trust centralized search APIs, scrape the live web (which is slow, fragile, and bloated with SEO spam), or maintain heavy, complex crawling infrastructure yourself.&lt;/p&gt;

&lt;p&gt;More importantly, none of these methods give you cryptographic proof that the content your LLM is citing actually came from the source you claim.&lt;/p&gt;

&lt;p&gt;To solve this, I've been building HIVE—a decentralized, peer-to-peer knowledge base designed to be consumed by LLMs, not humans. Think of it as a "Wikipedia for machines" that is completely P2P, cryptographically verifiable, and has no central authority.&lt;/p&gt;

&lt;p&gt;The Architecture: BEEs and Queens&lt;br&gt;
Instead of building a monolithic crawler, HIVE splits the workload between two distinct peer roles on a pure Holepunch stack (Hypercore + Hyperswarm). There is no shared coordination ledger and no Autobase; every node relies entirely on single-writer logs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ Wikipedia / arXiv / RSS ]
     │
     ▼ (Autonomous Extraction &amp;amp; Signing)
┌─────────┐
│   BEE   │ (Producer Node - Low Power)
└────┬────┘
     │
     ▼ (Hyperswarm DHT / P2P Replication)
┌─────────┐
│  QUEEN  │ (Consumer Node - Heavy Indexer)
└────┬────┘
     │
     ▼ (Semantic Vector Search)
[ Qdrant ] ──► [ Local LLM / API Synthesis ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;BEEs (Producer Nodes)&lt;br&gt;
BEEs are lightweight, Raspberry Pi-friendly nodes. They autonomously crawl and extract knowledge from structured sources (currently mostly Wikipedia, with initial arXiv and RSS support).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every extracted text fragment is ed25519-signed by the BEE.&lt;/p&gt;

&lt;p&gt;The signed fragments are appended to the node's local, single-writer Hypercore feed.&lt;/p&gt;

&lt;p&gt;Other nodes can replicate this feed, but only as a read-only copy.&lt;/p&gt;
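
&lt;p&gt;To make the signing step concrete, here is a minimal sketch using only Node's built-in crypto module. It is not HIVE's actual code: the plain array stands in for the single-writer Hypercore feed, and the appendFragment helper and the entry fields (source, text, signature) are names I made up for illustration.&lt;/p&gt;

```javascript
const crypto = require('node:crypto');

// Hypothetical BEE identity: an ed25519 key pair generated at startup.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');

// Stand-in for the single-writer Hypercore feed: a plain append-only array.
const feed = [];

// Sign a fragment and append it to the local feed.
// The field names (source, text, signature) are illustrative assumptions.
function appendFragment(source, text) {
  const payload = Buffer.from(JSON.stringify({ source, text }));
  // For ed25519 keys, Node expects the digest algorithm argument to be null.
  const signature = crypto.sign(null, payload, privateKey);
  const entry = { source, text, signature: signature.toString('base64') };
  feed.push(entry);
  return entry;
}

const entry = appendFragment(
  'https://en.wikipedia.org/wiki/Distributed_hash_table',
  'A distributed hash table provides a lookup service similar to a hash table.'
);

// An ed25519 signature is always 64 bytes long.
console.log(feed.length, Buffer.from(entry.signature, 'base64').length);
```

&lt;p&gt;Only the holder of the private key can produce entries that verify, which is the property the single-writer feed relies on.&lt;/p&gt;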

&lt;ol start="2"&gt;
&lt;li&gt;Queens (Consumer Nodes)&lt;br&gt;
Queens are the heavy indexer and query nodes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;They discover active BEEs via the Hyperswarm DHT.&lt;/p&gt;

&lt;p&gt;They replicate every discovered feed entirely over P2P; there is zero HTTP traffic between nodes.&lt;/p&gt;

&lt;p&gt;As they ingest the data, they verify the ed25519 signatures to guarantee data integrity.&lt;/p&gt;

&lt;p&gt;The fragments are then indexed into a local Qdrant vector database to serve semantic queries.&lt;/p&gt;

&lt;p&gt;When an LLM queries a Queen, it receives signed, source-traceable fragments whose signatures prove which BEE produced them.&lt;/p&gt;
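
&lt;p&gt;The verify-on-ingest step can be sketched the same way with Node's built-in crypto module. This is an assumption-laden illustration, not the real Queen: signFragment, verifyFragment, and ingest are hypothetical helpers, and the key pair simulates a single BEE so the example runs on its own.&lt;/p&gt;

```javascript
const crypto = require('node:crypto');

// Simulate one BEE so the example is self-contained.
const bee = crypto.generateKeyPairSync('ed25519');

// Hypothetical producer-side helper: sign the JSON form of a fragment.
function signFragment(source, text, privateKey) {
  const payload = Buffer.from(JSON.stringify({ source, text }));
  const signature = crypto.sign(null, payload, privateKey).toString('base64');
  return { source, text, signature };
}

// True only if the signature matches the fragment under the BEE's public key.
function verifyFragment(fragment, publicKey) {
  const payload = Buffer.from(
    JSON.stringify({ source: fragment.source, text: fragment.text })
  );
  const signature = Buffer.from(fragment.signature, 'base64');
  return crypto.verify(null, payload, publicKey, signature);
}

// Keep only fragments that verify; these would go on to the vector index.
function ingest(fragments, publicKey) {
  return fragments.filter((fragment) => verifyFragment(fragment, publicKey));
}

const genuine = signFragment(
  'https://en.wikipedia.org/wiki/EdDSA',
  'EdDSA is a digital signature scheme.',
  bee.privateKey
);
// Same signature, different text: this must be rejected.
const tampered = { ...genuine, text: 'Altered after signing.' };

const accepted = ingest([genuine, tampered], bee.publicKey);
console.log(accepted.length); // 1
```

&lt;p&gt;Note what this does and does not establish: a valid signature ties a fragment to the BEE key that signed it and shows it was not modified afterwards. Whether the BEE extracted the source faithfully is a separate question.&lt;/p&gt;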

&lt;p&gt;Try It In 3 Minutes (Docker Compose)&lt;br&gt;
If you want to spin up the full stack locally—including a BEE node, a Queen indexer, Qdrant, and a reverse proxy—you can run it via Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/capybarist/hive.git &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;hive
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
nano .env  &lt;span class="c"&gt;# Paste your LLM API Key (e.g., a free Gemini key from aistudio.google.com)&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Queen Query Endpoint: Open &lt;a href="http://localhost" rel="noopener noreferrer"&gt;http://localhost&lt;/a&gt; to test the AI synthesis.&lt;/p&gt;

&lt;p&gt;BEE Dashboard: Open &lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;http://localhost:8080&lt;/a&gt; to monitor autonomous extraction activity.&lt;/p&gt;

&lt;p&gt;Minimal Path (~150MB)&lt;br&gt;
If you just want to run a lightweight BEE node to contribute to the network without a heavy vector indexing endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/capybarist/hive.git &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;hive
npm install
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"LLM_PROVIDER=gemini"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .env
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"LLM_API_KEY=your_key"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .env
bash hive.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Current State &amp;amp; Honest Limitations (v0.7.0)&lt;br&gt;
HIVE is functional, but it is not yet battle-tested at scale. Here is exactly where the project stands today:&lt;/p&gt;

&lt;p&gt;What Works: Full BEE/Queen role splitting, native Hypercore replication over the Hyperswarm DHT, automated signature verification on block receipt, and a working Wikipedia forager with a persistent BFS crawl queue.&lt;/p&gt;

&lt;p&gt;The Bottlenecks: The extraction rate depends heavily on your local hardware or your LLM provider's rate limits (e.g., the Groq free tier yields ~6k fragments/day, while an Ollama CPU setup manages ~600/day). Additionally, the Hyperswarm DHT can still be finicky behind aggressive corporate firewalls.&lt;/p&gt;
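
&lt;p&gt;For readers unfamiliar with the idea, a persistent BFS crawl queue like the one listed under What Works can be sketched in a few lines. This is a hypothetical illustration, not the forager's real implementation: the CrawlQueue class and its JSON file layout are assumptions.&lt;/p&gt;

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical persistent BFS queue; the JSON file layout is an assumption.
class CrawlQueue {
  constructor(file) {
    this.file = file;
    const state = fs.existsSync(file)
      ? JSON.parse(fs.readFileSync(file, 'utf8'))
      : { pending: [], seen: [] };
    this.pending = state.pending; // FIFO order gives breadth-first traversal
    this.seen = new Set(state.seen); // every title ever enqueued
  }

  // Add a page unless it was already queued or crawled.
  enqueue(title) {
    if (this.seen.has(title)) return false;
    this.seen.add(title);
    this.pending.push(title);
    this.save();
    return true;
  }

  // Take the oldest pending page, or undefined when the queue is empty.
  dequeue() {
    const title = this.pending.shift();
    this.save();
    return title;
  }

  // Write the whole state to disk after every change.
  save() {
    const state = { pending: this.pending, seen: [...this.seen] };
    fs.writeFileSync(this.file, JSON.stringify(state));
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hive-queue-'));
const file = path.join(dir, 'queue.json');

const queue = new CrawlQueue(file);
queue.enqueue('Distributed hash table');
queue.enqueue('Kademlia');
queue.enqueue('Kademlia'); // duplicate, ignored
const first = queue.dequeue();

// Simulate a restart: a new instance resumes from the file on disk.
const resumed = new CrawlQueue(file);
console.log(first, resumed.pending);
```

&lt;p&gt;Rewriting the whole file on every change is fine for a sketch, but at thousands of fragments per day an append-only log or an embedded database would be the more realistic choice.&lt;/p&gt;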

&lt;p&gt;Roadmap: What's Next (v0.7.x)&lt;br&gt;
Source-Driven Extraction: Transitioning from a static topic tree to per-BEE source declarations (Wikipedia, arXiv, Common Crawl).&lt;/p&gt;

&lt;p&gt;Unified ForagerSource Interface: Making it so adding a new data source only requires implementing a single interface file.&lt;/p&gt;

&lt;p&gt;Common Crawl Integration: Indexing the open web via reproducible, immutable snapshots instead of live scraping.&lt;/p&gt;

&lt;p&gt;Score-by-Corroboration: Implementing an algorithm to boost content confidence when multiple independent BEEs extract identical facts from different sources.&lt;/p&gt;
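
&lt;p&gt;Since Score-by-Corroboration is still on the roadmap, the following is only one possible shape for it, not a committed design. The sketch assumes each fragment carries a beeId, a source, and a text field (all hypothetical names), treats two fragments as the same fact when their normalized text hashes match, and scores a fact by the smaller of its distinct-BEE and distinct-source counts.&lt;/p&gt;

```javascript
const crypto = require('node:crypto');

// Normalize text so trivially different extractions hash to the same key.
function factKey(text) {
  const normalized = text.toLowerCase().replace(/\s+/g, ' ').trim();
  return crypto.createHash('sha256').update(normalized).digest('hex');
}

// Group fragments by fact, tracking distinct BEEs and distinct sources.
function corroborationScores(fragments) {
  const groups = new Map();
  for (const { beeId, source, text } of fragments) {
    const key = factKey(text);
    if (!groups.has(key)) {
      groups.set(key, { bees: new Set(), sources: new Set() });
    }
    groups.get(key).bees.add(beeId);
    groups.get(key).sources.add(source);
  }

  const scores = new Map();
  for (const [key, group] of groups) {
    // Hypothetical rule: a fact needs both independent BEEs and independent
    // sources, so the score is the smaller of the two counts.
    scores.set(key, Math.min(group.bees.size, group.sources.size));
  }
  return scores;
}

const scores = corroborationScores([
  { beeId: 'bee-a', source: 'wikipedia', text: 'Water boils at 100 degrees Celsius at sea level.' },
  { beeId: 'bee-b', source: 'arxiv', text: 'water boils at 100 degrees Celsius  at sea level.' },
  { beeId: 'bee-a', source: 'wikipedia', text: 'Water boils at 100 degrees Celsius at sea level.' },
  { beeId: 'bee-c', source: 'rss', text: 'Kademlia uses an XOR distance metric.' },
]);

console.log(scores.get(factKey('Water boils at 100 degrees Celsius at sea level.'))); // 2
console.log(scores.get(factKey('Kademlia uses an XOR distance metric.'))); // 1
```

&lt;p&gt;Exact matching on normalized text is the weakest part of this sketch. Two BEEs will rarely phrase a fact identically, so a real version would probably need embedding similarity or claim extraction to decide what counts as the same fact.&lt;/p&gt;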

&lt;p&gt;Looking for Feedback&lt;br&gt;
HIVE's full source is public under the Business Source License (BUSL), which is source-available rather than OSI-approved open source, and it automatically converts to MIT in 4 years. There is no startup behind this, no VC funding, and no token economy or cryptocurrency attached. I built it because I needed verifiable local RAG and couldn't find an existing solution.&lt;/p&gt;

&lt;p&gt;I would love to hear from developers working on P2P architectures, distributed systems, or self-hosted RAG:&lt;/p&gt;

&lt;p&gt;The Topology Split: Does the BEE/Queen separation feel sound to you, or do you see architectural edge cases that could lead to unexpected bottlenecks?&lt;/p&gt;

&lt;p&gt;Open Web Ingestion: Is leveraging Common Crawl snapshots a solid approach for reproducible web ingestion, or are the local storage overheads too punishing for self-hosted nodes?&lt;/p&gt;

&lt;p&gt;The P2P Commons: Would you dedicate spare home-lab compute or bandwidth to seed/produce signed knowledge fragments for a shared network?&lt;/p&gt;

&lt;p&gt;Let me know your thoughts or hit me with some tough architectural critique in the comments!&lt;/p&gt;

&lt;p&gt;📂 Source Code: &lt;a href="https://github.com/capybarist/hive" rel="noopener noreferrer"&gt;https://github.com/capybarist/hive&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📖 Manifesto: &lt;a href="https://github.com/capybarist/hive/blob/main/MANIFESTO.md" rel="noopener noreferrer"&gt;https://github.com/capybarist/hive/blob/main/MANIFESTO.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🏗️ Technical Deep-Dive: &lt;a href="https://github.com/capybarist/hive/blob/main/CLAUDE.md" rel="noopener noreferrer"&gt;https://github.com/capybarist/hive/blob/main/CLAUDE.md&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>node</category>
      <category>p2p</category>
    </item>
  </channel>
</rss>
