<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nir Strulovitz</title>
    <description>The latest articles on DEV Community by Nir Strulovitz (@ludongbin).</description>
    <link>https://dev.to/ludongbin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3847706%2F2d5a0b8d-1755-4713-b9e6-4d90c5161a48.jpg</url>
      <title>DEV Community: Nir Strulovitz</title>
      <link>https://dev.to/ludongbin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ludongbin"/>
    <language>en</language>
    <item>
      <title>Distributed AI platform — task parallelism instead of model splitting, and why every other approach has it backwards</title>
      <dc:creator>Nir Strulovitz</dc:creator>
      <pubDate>Sat, 28 Mar 2026 13:24:19 +0000</pubDate>
      <link>https://dev.to/ludongbin/distributed-ai-platform-task-parallelism-instead-of-model-splitting-and-why-every-other-approach-29i1</link>
      <guid>https://dev.to/ludongbin/distributed-ai-platform-task-parallelism-instead-of-model-splitting-and-why-every-other-approach-29i1</guid>
      <description>&lt;p&gt;The problem&lt;/p&gt;

&lt;p&gt;You have multiple computers at home, each capable of running a local LLM. How do you make them work together?&lt;/p&gt;

&lt;p&gt;Every existing project (exo, distributed-llama, llama.cpp RPC) tries to split the model itself across machines. In theory this works. In practice, inter-node network latency kills performance, especially on home networks.&lt;/p&gt;

&lt;p&gt;A different approach&lt;/p&gt;

&lt;p&gt;I took the opposite path: split the task, not the model.&lt;/p&gt;

&lt;p&gt;One machine (the "Queen") receives a complex job and uses its local LLM to decompose it into independent subtasks. Other machines ("Workers"), each running their own complete local LLM, pick up one subtask each and process them in parallel. The Queen collects all results and combines them into the final answer.&lt;/p&gt;

&lt;p&gt;The key insight: if the subtasks are independent, the workers never need to communicate with each other. Zero inter-node communication during inference. Each worker is fully self-contained.&lt;/p&gt;
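&lt;p&gt;A minimal Python sketch of that idea (names here are illustrative, not the project's actual API): because the subtasks are independent, the "workers" can be simulated with a plain thread pool and never touch shared state.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def worker_solve(subtask):
    # Stand-in for a Worker running its own complete local LLM.
    return f"answer({subtask})"

def queen_run(job):
    # Queen decomposes the job into independent subtasks...
    subtasks = [f"{job}:part{i}" for i in range(3)]
    # ...Workers process them in parallel, with zero inter-worker traffic...
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(worker_solve, subtasks))
    # ...and the Queen combines the results into the final answer.
    return " | ".join(results)

print(queen_run("summarize"))
```

&lt;p&gt;Swapping the thread pool for machines on a network changes the transport, not the shape of the computation: decompose, fan out, collect.&lt;/p&gt;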

&lt;p&gt;Why this matters&lt;/p&gt;

&lt;p&gt;For individuals: You probably have more than one computer at home. Your desktop, your laptop, maybe an old machine sitting in a closet. Each one can contribute its processing power.&lt;/p&gt;

&lt;p&gt;For companies: Instead of paying OpenAI or Google for cloud AI, you can run a cluster of ordinary machines with local models. The cost reduction is massive.&lt;/p&gt;

&lt;p&gt;For the open-source community: This is like BitTorrent, but for AI computation. Instead of one powerful server, a swarm of ordinary machines collaborates.&lt;/p&gt;

&lt;p&gt;How it works&lt;/p&gt;

&lt;pre&gt;
  User submits a complex task
            |
            v
  Queen machine splits it into subtasks
       /    |     \
      v     v      v
   Worker  Worker  Worker   (each processes independently)
       \    |     /
        v   v    v
  Queen combines all results
            |
            v
  User receives the final answer
&lt;/pre&gt;

&lt;p&gt;Workers can drop in and out at any time without breaking anything. If a worker disappears, its subtask times out and becomes available for another worker to pick up. True fault tolerance.&lt;/p&gt;
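&lt;p&gt;The timeout-and-requeue mechanism can be sketched in a few lines. This is a simplified model under assumed names (the real coordination logic lives in the Flask backend): a claimed subtask carries a deadline, and a reaper pass returns expired subtasks to the pending pool.&lt;/p&gt;

```python
def claim(task, now, timeout=30.0):
    # A Worker claims a subtask; the Queen records a deadline for it.
    task["state"] = "running"
    task["deadline"] = now + timeout

def reap(tasks, now):
    # Any running subtask past its deadline goes back to "pending",
    # so another Worker can pick it up. No Worker is irreplaceable.
    for task in tasks:
        if task["state"] == "running" and now > task["deadline"]:
            task["state"] = "pending"
            del task["deadline"]

tasks = [{"id": 1, "state": "pending"}]
claim(tasks[0], now=0.0)
reap(tasks, now=31.0)
print(tasks[0]["state"])   # the vanished Worker's subtask is available again
```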

&lt;p&gt;The tech stack&lt;/p&gt;

&lt;p&gt;Platform (BeehiveOfAI):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flask 3.1.1 backend&lt;/li&gt;
&lt;li&gt;SQLAlchemy with SQLite&lt;/li&gt;
&lt;li&gt;Flask-Login authentication&lt;/li&gt;
&lt;li&gt;PayPal REST API for payments&lt;/li&gt;
&lt;li&gt;Cloudflare Tunnel deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Desktop Client (HoneycombOfAI):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PyQt6 native desktop GUI&lt;/li&gt;
&lt;li&gt;CLI mode with Rich formatting&lt;/li&gt;
&lt;li&gt;5 AI backends: Ollama, LM Studio, llama.cpp (server), llama-cpp-python (in-process), vLLM&lt;/li&gt;
&lt;li&gt;All backends behind an abstract interface with factory pattern&lt;/li&gt;
&lt;/ul&gt;
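&lt;p&gt;The "abstract interface with factory pattern" looks roughly like this. Class and method names below are illustrative guesses, not HoneycombOfAI's actual API; the point is that the rest of the client only ever talks to the abstract interface, so adding a sixth backend means adding one class and one registry entry.&lt;/p&gt;

```python
class Backend:
    # Abstract interface every AI backend implements.
    def generate(self, prompt):
        raise NotImplementedError

class OllamaBackend(Backend):
    def generate(self, prompt):
        return f"ollama:{prompt}"      # real code would call the Ollama HTTP API

class LlamaCppBackend(Backend):
    def generate(self, prompt):
        return f"llama.cpp:{prompt}"   # real code would call a llama.cpp server

# Registry maps a config string to a concrete class.
BACKENDS = {"ollama": OllamaBackend, "llama.cpp": LlamaCppBackend}

def make_backend(name):
    # Factory: callers get a Backend and never see the concrete type.
    return BACKENDS[name]()

print(make_backend("ollama").generate("hi"))
```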

&lt;p&gt;Payment system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workers earn money for their compute&lt;/li&gt;
&lt;li&gt;65% goes to Workers, 30% to the Queen, 5% platform fee&lt;/li&gt;
&lt;li&gt;PayPal integration built in, architecture ready for Stripe&lt;/li&gt;
&lt;/ul&gt;
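&lt;p&gt;To make the split concrete, here is the 65/30/5 arithmetic in integer cents. The rounding policy (remainder to the platform, equal shares per Worker) is my assumption for illustration, not necessarily what the platform does.&lt;/p&gt;

```python
def split_payment(total_cents, n_workers):
    # Revenue split stated above: 65% Workers, 30% Queen, 5% platform.
    worker_pool = total_cents * 65 // 100
    queen_cut = total_cents * 30 // 100
    platform_fee = total_cents - worker_pool - queen_cut  # remainder absorbs rounding
    per_worker = worker_pool // n_workers
    return per_worker, queen_cut, platform_fee

print(split_payment(10_000, 3))   # a $100 job split across 3 Workers
# -> (2166, 3000, 500)
```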

&lt;p&gt;Real test results&lt;/p&gt;

&lt;p&gt;Two Linux machines on my home network:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Machine 1: Linux Mint 22.2, RTX 4070 Ti&lt;/li&gt;
&lt;li&gt;Machine 2: Debian 13, RTX 5090&lt;/li&gt;
&lt;li&gt;LAN test: 64 seconds for a full distributed task&lt;/li&gt;
&lt;li&gt;Internet test (via Cloudflare): 29 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The code&lt;/p&gt;

&lt;p&gt;Everything is open source under MIT license. Three repos:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BeehiveOfAI — the platform (Flask backend, coordination logic)&lt;/li&gt;
&lt;li&gt;HoneycombOfAI — the desktop client (PyQt6 GUI, CLI, 5 backend integrations)&lt;/li&gt;
&lt;li&gt;TheDistributedAIRevolution — a non-technical book explaining the concept&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All under one GitHub account: &lt;a href="https://github.com/strulovitz" rel="noopener noreferrer"&gt;https://github.com/strulovitz&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Built in 7 days by one developer. I want people to fork this, build on it, take it in their own direction. The more machines running distributed local AI, the better for everyone.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>python</category>
      <category>distributedcomputing</category>
    </item>
  </channel>
</rss>
