<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AVINASH S KARANTH</title>
    <description>The latest articles on DEV Community by AVINASH S KARANTH (@avinash_s_karanth).</description>
    <link>https://dev.to/avinash_s_karanth</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3910531%2F0296ea82-9c25-4535-9d9c-006f91e74e77.jpg</url>
      <title>DEV Community: AVINASH S KARANTH</title>
      <link>https://dev.to/avinash_s_karanth</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/avinash_s_karanth"/>
    <language>en</language>
    <item>
      <title>p2p-ai — Distributed Peer-to-Peer AI Inference Network</title>
      <dc:creator>AVINASH S KARANTH</dc:creator>
      <pubDate>Fri, 22 May 2026 08:16:33 +0000</pubDate>
      <link>https://dev.to/avinash_s_karanth/p2p-ai-distributed-peer-to-peer-ai-inference-network-f38</link>
      <guid>https://dev.to/avinash_s_karanth/p2p-ai-distributed-peer-to-peer-ai-inference-network-f38</guid>
      <description>&lt;h1&gt;
  
  
  p2p-ai — Distributed Peer-to-Peer AI Inference Network
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;A submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;p2p-ai&lt;/strong&gt; is a distributed peer-to-peer AI inference network. Instead of relying on a centralized cloud API, every client that has the model can act as an inference provider for others. When your device doesn't have the model (or is already busy), it can offload generation tasks to available peers in the network — with zero task content ever touching the server.&lt;/p&gt;

&lt;p&gt;The architecture is fully end-to-end encrypted: a Node.js server acts only as a signaling layer and peer registry, while all prompts, tokens, and outputs flow directly between peers over WebRTC data channels. The server never sees task content. It receives only metadata, such as token counts and job duration, which it uses for performance analytics.&lt;/p&gt;
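&lt;p&gt;The post describes per-task hybrid encryption: RSA-OAEP plus AES-256-GCM, with a fresh AES key for every request. The sketch below shows one way to do this with the Web Crypto API. It follows the usual hybrid pattern, where the AES key is wrapped with the provider's RSA-OAEP public key. It assumes the requester already holds that public key (for example, from the peer registry). &lt;code&gt;generateProviderKeys&lt;/code&gt;, &lt;code&gt;encryptTask&lt;/code&gt;, and &lt;code&gt;decryptTask&lt;/code&gt; are hypothetical names, not functions from the p2p-ai repository.&lt;/p&gt;

```javascript
// Sketch of per-task hybrid encryption with the Web Crypto API (hypothetical helpers).
// Assumption: the requester already has the provider's RSA-OAEP public key;
// how keys are distributed is not shown here.
// Browsers and Node 19+ expose crypto globally; older Node falls back to node:crypto.
const webcrypto = globalThis.crypto || require("node:crypto").webcrypto;
const subtle = webcrypto.subtle;

// Provider: a long-lived RSA-OAEP key pair used only to wrap and unwrap AES keys.
function generateProviderKeys() {
  return subtle.generateKey(
    {
      name: "RSA-OAEP",
      modulusLength: 2048,
      publicExponent: new Uint8Array([1, 0, 1]), // 65537
      hash: "SHA-256",
    },
    true,
    ["wrapKey", "unwrapKey"]
  );
}

// Requester: a fresh AES-256-GCM key per task, wrapped with the provider's public key.
async function encryptTask(prompt, providerPublicKey) {
  const aesKey = await subtle.generateKey({ name: "AES-GCM", length: 256 }, true, ["encrypt", "decrypt"]);
  const iv = webcrypto.getRandomValues(new Uint8Array(12)); // 96-bit IV, unique per encryption
  const ciphertext = await subtle.encrypt({ name: "AES-GCM", iv }, aesKey, new TextEncoder().encode(prompt));
  const wrappedKey = await subtle.wrapKey("raw", aesKey, providerPublicKey, { name: "RSA-OAEP" });
  return { wrappedKey, iv, ciphertext };
}

// Provider: unwrap the task key with the private key, then decrypt the prompt.
// AES-GCM is authenticated, so decrypt() rejects if the ciphertext was altered.
async function decryptTask({ wrappedKey, iv, ciphertext }, providerPrivateKey) {
  const aesKey = await subtle.unwrapKey(
    "raw", wrappedKey, providerPrivateKey, { name: "RSA-OAEP" },
    { name: "AES-GCM" }, false, ["encrypt", "decrypt"]
  );
  const plaintext = await subtle.decrypt({ name: "AES-GCM", iv }, aesKey, ciphertext);
  return new TextDecoder().decode(plaintext);
}
```

&lt;p&gt;Wrapping the AES key instead of encrypting the prompt directly with RSA keeps RSA working on a 32-byte payload. AES-GCM then encrypts and authenticates prompts of any length.&lt;/p&gt;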

&lt;p&gt;&lt;strong&gt;Key features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On-device inference&lt;/strong&gt; via the MediaPipe LLM Inference API in the browser&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time token streaming&lt;/strong&gt; from provider to requester over PeerJS DataConnections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid RSA-OAEP + AES-256-GCM encryption&lt;/strong&gt; per task — a fresh AES key is generated for every request&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fair provider cycling&lt;/strong&gt; so load is distributed evenly across the pool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web and mobile clients&lt;/strong&gt; (the web client is fully functional; the React Native client is scaffolded for LiteRT-LM)&lt;/li&gt;
&lt;/ul&gt;
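&lt;p&gt;The post does not say how fair provider cycling is implemented. One plausible approach is round-robin over the registered providers that skips any peer currently running a job. The sketch below shows that approach. &lt;code&gt;ProviderCycler&lt;/code&gt; and its methods are hypothetical names, and placing this logic in the peer registry is an assumption.&lt;/p&gt;

```javascript
// Hypothetical round-robin picker: cycles through registered providers and
// skips busy ones, so repeated requests spread across the pool.
class ProviderCycler {
  constructor() {
    this.peers = [];       // registered provider peer IDs, in registration order
    this.busy = new Set(); // peer IDs currently running a job
    this.cursor = 0;       // index where the next search starts
  }

  add(peerId) {
    if (!this.peers.includes(peerId)) this.peers.push(peerId);
  }

  remove(peerId) {
    this.peers = this.peers.filter((p) => p !== peerId);
    this.busy.delete(peerId);
    if (this.cursor >= this.peers.length) this.cursor = 0;
  }

  markBusy(peerId) {
    this.busy.add(peerId);
  }

  markIdle(peerId) {
    this.busy.delete(peerId);
  }

  // Returns the first idle peer at or after the cursor, or null if none is idle.
  next() {
    const n = this.peers.length;
    for (let i = 0; n > i; i++) {
      const idx = (this.cursor + i) % n;
      const peer = this.peers[idx];
      if (!this.busy.has(peer)) {
        this.cursor = (idx + 1) % n; // resume just past the chosen peer next time
        return peer;
      }
    }
    return null;
  }
}
```

&lt;p&gt;Each search starts just past the last peer chosen. This is what spreads load evenly, instead of always favoring the first registered provider.&lt;/p&gt;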




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Live deployment: &lt;a href="https://p2p-ai.nash.dpdns.org/" rel="noopener noreferrer"&gt;https://p2p-ai.nash.dpdns.org/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The project is self-hosted: the server runs Node.js + MariaDB, and the web client is served statically. You can spin it up locally with Docker or &lt;code&gt;pnpm start&lt;/code&gt; + &lt;code&gt;pnpm web&lt;/code&gt;. A short video walkthrough is coming soon.&lt;/p&gt;




&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repositories:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web: &lt;a href="https://github.com/AvinashSKaranth/p2p-ai" rel="noopener noreferrer"&gt;https://github.com/AvinashSKaranth/p2p-ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Mobile (WIP): &lt;a href="https://github.com/AvinashSKaranth/p2p-ai-react-native" rel="noopener noreferrer"&gt;https://github.com/AvinashSKaranth/p2p-ai-react-native&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Node.js, Express, MariaDB, PeerServer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web client&lt;/td&gt;
&lt;td&gt;Vanilla JS ES modules, PeerJS, MediaPipe Tasks GenAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mobile client&lt;/td&gt;
&lt;td&gt;Expo + React Native (LiteRT-LM bridge planned)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;I chose &lt;strong&gt;Gemma 4 E4B IT&lt;/strong&gt; (&lt;code&gt;gemma-4-E4B-it&lt;/code&gt;) as the inference engine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why E4B was the right fit:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;It runs in the browser.&lt;/strong&gt; The E4B variant is small enough to download, cache in the Origin Private File System, and execute with the MediaPipe LLM Inference API via WebGPU/WASM — no server-side GPU farm needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quality at the edge.&lt;/strong&gt; The instruction-tuned version gives good chat completions for summarization, translation, and creative writing directly on user hardware, which is exactly what a P2P network needs: capable providers without cloud costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mobile symmetry.&lt;/strong&gt; The same E4B checkpoint is available for LiteRT-LM, so the upcoming React Native client can share the same model weights and behavior as the web client. A single model family keeps the network homogeneous — any provider can serve any requester.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;IT (instruction-tuned) for generic tasks.&lt;/strong&gt; The network is task-agnostic at the transport layer; requesters construct their own prompts. Using an instruction-tuned model means the provider can handle arbitrary prompts without custom fine-tuning per task type.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The web client downloads the model from Hugging Face on first use, stores it locally via OPFS, and then registers itself as a provider. If the model isn't present, the client operates purely as a requester and offloads to available peers. This "compute when you can, borrow when you can't" design is what makes Gemma 4 the perfect backbone for a privacy-first, decentralized inference mesh.&lt;/p&gt;
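&lt;p&gt;The sketch below shows how a client might apply the "compute when you can, borrow when you can't" rule. It uses the real OPFS calls &lt;code&gt;navigator.storage.getDirectory()&lt;/code&gt; and &lt;code&gt;getFileHandle()&lt;/code&gt;, which rejects with &lt;code&gt;NotFoundError&lt;/code&gt; when the file is missing. Several parts are assumptions: the &lt;code&gt;MODEL_FILE&lt;/code&gt; name, the &lt;code&gt;hasCachedModel&lt;/code&gt; and &lt;code&gt;planRequest&lt;/code&gt; helpers, and the injectable &lt;code&gt;getRoot&lt;/code&gt; parameter (added only so the logic can be tested outside a browser). Loading the model into MediaPipe is not shown.&lt;/p&gt;

```javascript
// Hypothetical cache check and routing decision for a web client.
// MODEL_FILE is an assumed OPFS file name, not taken from the project.
const MODEL_FILE = "gemma-4-E4B-it.task";

// Resolves true if the model file exists in the Origin Private File System.
async function hasCachedModel(getRoot = () => navigator.storage.getDirectory()) {
  const root = await getRoot();
  try {
    await root.getFileHandle(MODEL_FILE); // rejects with NotFoundError when absent
    return true;
  } catch (err) {
    if (err.name === "NotFoundError") return false;
    throw err; // other failures (e.g. a directory with that name) are real errors
  }
}

// A cached model lets the client register as a provider. A request runs locally
// only if the model is cached and the local engine is idle; otherwise it is offloaded.
async function planRequest({ busy }, getRoot) {
  const canProvide = await hasCachedModel(getRoot);
  const route = canProvide ? (busy ? "offload" : "local") : "offload";
  return { canProvide, route };
}
```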




&lt;h2&gt;
  
  
  Evolution: Scaling Toward Large-Scale Distributed Intelligence
&lt;/h2&gt;

&lt;p&gt;The current architecture proves the concept of privacy-preserving peer inference. The sections below outline how this foundation could grow into a large-scale system for distributed summarization, translation, index building, and beyond.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large-Scale Task Pipelines
&lt;/h3&gt;

&lt;p&gt;The transport layer is already task-agnostic — providers execute whatever prompt they receive. Scaling to fleet-level workloads means introducing a &lt;strong&gt;task queue and coordinator layer&lt;/strong&gt; on top of the existing peer registry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Summarization at scale:&lt;/strong&gt; Long documents can be chunked client-side, dispatched to multiple peers in parallel (map), and reassembled by the requesting node (reduce). The server coordinates chunk routing without ever seeing content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Translation pipelines:&lt;/strong&gt; Language pairs can be broadcast as metadata so providers with multilingual model variants (or fine-tuned checkpoints) self-select for relevant tasks, improving quality without centralized routing logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index building:&lt;/strong&gt; Each peer can emit structured summaries or entity extractions from processed documents. These fragments can be aggregated incrementally into a distributed inverted index, with the server maintaining only routing tables and shard assignments.&lt;/li&gt;
&lt;/ul&gt;
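&lt;p&gt;The summarization-at-scale bullet can be sketched as a map-reduce. The document is split into chunks and each chunk is sent to a peer (map). The requester then joins the partial summaries in order (reduce). This is a sketch under several assumptions. Chunking is on paragraph boundaries with a character budget. Peers are assigned round-robin. &lt;code&gt;runOnPeer(peerId, prompt)&lt;/code&gt; is a hypothetical stand-in for sending an encrypted job to a peer and collecting its streamed reply.&lt;/p&gt;

```javascript
// Hypothetical paragraph-based chunker: packs paragraphs into chunks of at most
// maxChars characters; a single oversized paragraph becomes its own chunk.
function chunkText(text, maxChars) {
  const chunks = [];
  let current = "";
  for (const p of text.split(/\n\s*\n/)) {
    const candidate = current ? current + "\n\n" + p : p;
    if (candidate.length > maxChars) {
      if (current) chunks.push(current);
      current = p;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Map: dispatch chunks to peers round-robin, in parallel.
// Reduce: join the partial summaries on the requesting node.
async function mapReduceSummarize(text, peers, runOnPeer, maxChars = 4000) {
  if (peers.length === 0) throw new Error("no providers available");
  const chunks = chunkText(text, maxChars);
  const partials = await Promise.all(
    chunks.map((chunk, i) =>
      runOnPeer(peers[i % peers.length], "Summarize the following text:\n\n" + chunk)
    )
  );
  return partials.join("\n\n");
}
```

&lt;p&gt;&lt;code&gt;Promise.all&lt;/code&gt; resolves to results in input order. As a result, the partial summaries stay in document order even when peers finish at different times. The requester could optionally run one more local pass over the joined text.&lt;/p&gt;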

&lt;h3&gt;
  
  
  Embedded Browser Agent
&lt;/h3&gt;

&lt;p&gt;A natural evolution is building a &lt;strong&gt;browser agent directly into the p2p-ai client&lt;/strong&gt;. Rather than just processing prompts, the agent would:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open a URL and extract the full readable content from the page (article text, headings, metadata).&lt;/li&gt;
&lt;li&gt;Run a local summarization or entity-extraction pass using the on-device Gemma model.&lt;/li&gt;
&lt;li&gt;Package the result (title, URL, summary, key terms) into a structured document ready for indexing.&lt;/li&gt;
&lt;/ol&gt;
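&lt;p&gt;The three agent steps above can be sketched as one small pipeline. Page extraction and summarization are injected as hypothetical functions: &lt;code&gt;extractReadable&lt;/code&gt; might be a Readability-style extractor, and &lt;code&gt;summarize&lt;/code&gt; might call the on-device model. The naive word-frequency &lt;code&gt;keyTerms&lt;/code&gt; function is a simple stand-in for model-based entity extraction. None of these names come from the project.&lt;/p&gt;

```javascript
// Hypothetical browser-agent harvest step producing a document ready for indexing.
const STOPWORDS = new Set(["that", "with", "this", "from", "have", "were", "been", "their", "which", "about", "into"]);

// Naive key-term extraction by word frequency (placeholder for model-based extraction).
function keyTerms(text, limit = 5) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z][a-z0-9-]+/g) || []) {
    if (word.length > 3) {
      if (!STOPWORDS.has(word)) counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // most frequent first, ties alphabetical
    .slice(0, limit)
    .map(([word]) => word);
}

async function harvest(url, { extractReadable, summarize }) {
  const page = await extractReadable(url);    // step 1: readable content, e.g. { title, text }
  const summary = await summarize(page.text); // step 2: local summarization pass
  return {                                    // step 3: structured document for indexing
    title: page.title,
    url,
    summary,
    keyTerms: keyTerms(page.text),
  };
}
```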

&lt;p&gt;This transforms every participating device from a passive inference provider into an &lt;strong&gt;active knowledge harvester&lt;/strong&gt; — contributing structured intelligence back to the network as a byproduct of normal browsing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Federated Open Search Index
&lt;/h3&gt;

&lt;p&gt;The output of the browser agent would feed directly into a &lt;strong&gt;federated open search layer&lt;/strong&gt;. Instead of going into a single centralized index, results would be pushed to one or more open-source search backends:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://mwmbl.org/" rel="noopener noreferrer"&gt;Mwmbl&lt;/a&gt;&lt;/strong&gt; — a community-maintained, open-source search engine explicitly designed for crowdsourced crawling. p2p-ai peers become crawlers contributing to the public index.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture is pluggable: the peer emits a structured indexing event (title, URL, excerpt, embeddings) and a thin adapter routes it to whichever backend is configured. Privacy is preserved because the &lt;em&gt;content processing&lt;/em&gt; happens on-device; only the structured summary is transmitted to the index.&lt;/p&gt;
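&lt;p&gt;The pluggable adapter idea can be illustrated with a thin router and one example backend. The event fields (title, URL, excerpt, embeddings) come from the paragraph above. Everything else is an assumption: the &lt;code&gt;submit&lt;/code&gt; interface, the &lt;code&gt;publishIndexEvent&lt;/code&gt; router, and the in-memory inverted index standing in for a real backend. In particular, this is not Mwmbl's actual API.&lt;/p&gt;

```javascript
// Hypothetical backend adapter: a tiny in-memory inverted index.
class InMemoryIndexAdapter {
  constructor() {
    this.postings = new Map(); // term -> Set of URLs containing it
    this.docs = new Map();     // url -> indexed event
  }

  async submit(event) {
    this.docs.set(event.url, event);
    const terms = (event.title + " " + event.excerpt).toLowerCase().match(/[a-z0-9]+/g) || [];
    for (const term of terms) {
      if (!this.postings.has(term)) this.postings.set(term, new Set());
      this.postings.get(term).add(event.url);
    }
  }

  // Returns URLs that contain every query term (AND semantics).
  search(query) {
    const terms = query.toLowerCase().match(/[a-z0-9]+/g) || [];
    if (terms.length === 0) return [];
    let result = null;
    for (const term of terms) {
      const urls = this.postings.get(term) || new Set();
      result = result === null ? new Set(urls) : new Set([...result].filter((u) => urls.has(u)));
    }
    return [...result];
  }
}

// Thin router: forwards only the structured fields to every configured adapter.
async function publishIndexEvent(event, adapters) {
  const { title, url, excerpt, embeddings } = event;
  await Promise.all(adapters.map((a) => a.submit({ title, url, excerpt, embeddings })));
}
```

&lt;p&gt;The router destructures each event down to the four structured fields. Any extra fields on the local object, such as raw page text, are therefore never passed to a backend.&lt;/p&gt;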

&lt;h3&gt;
  
  
  The Full Vision: Device-Native Intelligence Mesh
&lt;/h3&gt;

&lt;p&gt;Combining all three layers — &lt;strong&gt;P2P inference → browser agent → open search index&lt;/strong&gt; — produces a system where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every device in the network contributes both compute (running inference for peers) and knowledge (harvesting and indexing web content).&lt;/li&gt;
&lt;li&gt;No single entity controls the index or the inference capacity.&lt;/li&gt;
&lt;li&gt;Users retain full ownership of their browsing data; only opt-in summaries are shared.&lt;/li&gt;
&lt;li&gt;The collective result is a continuously growing, crowd-sourced, AI-enriched knowledge base — built from the aggregate idle capacity of thousands of devices, without any centralized cloud infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the long-term direction for p2p-ai: not just a demo of edge inference, but a blueprint for &lt;strong&gt;decentralized, device-native AI at internet scale&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
