Nir Strulovitz
Distributed AI platform — task parallelism instead of model splitting, and why every other approach has it backwards

The problem

You have multiple computers at home, each capable of running a local LLM. How do you make them work together?

Every existing project I found (exo, distributed-llama, llama.cpp RPC) tries to split a single model across machines. In theory this works. In practice, splitting a model means shuttling activations between nodes on every forward pass, and that inter-node network latency kills performance, especially on home networks.

A different approach

I took the opposite path: split the task, not the model.

One machine (the "Queen") receives a complex job and uses its local LLM to decompose it into independent subtasks. Other machines ("Workers"), each running their own complete local LLM, pick up one subtask each and process them in parallel. The Queen collects all results and combines them into the final answer.

The key insight: if the subtasks are independent, the workers never need to communicate with each other. Zero inter-node communication during inference. Each worker is fully self-contained.
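The decompose / fan-out / combine flow can be sketched in a few lines. This is a minimal illustration, not BeehiveOfAI's actual code: `decompose`, `worker_process`, and `combine` are hypothetical stand-ins for what the real platform delegates to local LLMs.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(job: str) -> list[str]:
    # Stand-in for the Queen's LLM: split a job into independent subtasks
    # (here, a naive split on semicolons).
    return [part.strip() for part in job.split(";") if part.strip()]

def worker_process(subtask: str) -> str:
    # Stand-in for a Worker's complete local LLM handling one subtask.
    return f"result({subtask})"

def combine(results: list[str]) -> str:
    # Stand-in for the Queen's LLM merging partial results.
    return " + ".join(results)

def run_job(job: str, max_workers: int = 3) -> str:
    subtasks = decompose(job)
    # Because subtasks are independent, the only communication is
    # Queen -> Worker and Worker -> Queen; workers never talk to each other.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker_process, subtasks))
    return combine(results)
```

Note that `pool.map` preserves subtask order, so the Queen can combine results deterministically even though workers finish at different times.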

Why this matters

For individuals: You probably have more than one computer at home. Your desktop, your laptop, maybe an old machine sitting in a closet. Each one can contribute its processing power.

For companies: Instead of paying OpenAI or Google per token, you can run a cluster of ordinary machines with local models. Hardware you already own replaces a recurring API bill, so the cost reduction can be massive.

For the open-source community: This is like torrent technology but for AI computation. Instead of one powerful server, a swarm of ordinary machines collaborates.

How it works

User submits a complex task
              |
              v
Queen machine splits it into subtasks
           /  |  \
          v   v   v
     Worker Worker Worker   (each processes independently)
          \   |   /
           v  v  v
Queen combines all results
              |
              v
User receives the final answer

Workers can drop in and out at any time without breaking anything. If a worker disappears, its subtask times out and becomes available for another worker to pick up. True fault tolerance.

The tech stack

Platform (BeehiveOfAI):

  • Flask 3.1.1 backend
  • SQLAlchemy with SQLite
  • Flask-Login authentication
  • PayPal REST API for payments
  • Cloudflare Tunnel deployment

Desktop Client (HoneycombOfAI):

  • PyQt6 native desktop GUI
  • CLI mode with Rich formatting
  • 5 AI backends: Ollama, LM Studio, llama.cpp (server), llama-cpp-python (in-process), vLLM
  • All backends behind an abstract interface with factory pattern
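The abstract-interface-plus-factory shape looks roughly like this. The class and method names below are illustrative, not HoneycombOfAI's actual API, and the backend bodies are stubs where real code would call each engine's local HTTP API.

```python
from abc import ABC, abstractmethod

class LLMBackend(ABC):
    """The one interface the rest of the client ever sees."""
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class OllamaBackend(LLMBackend):
    def generate(self, prompt: str) -> str:
        # Real code would POST to Ollama's local HTTP endpoint.
        return f"[ollama] {prompt}"

class LMStudioBackend(LLMBackend):
    def generate(self, prompt: str) -> str:
        # Real code would use LM Studio's OpenAI-compatible server.
        return f"[lmstudio] {prompt}"

_BACKENDS: dict[str, type[LLMBackend]] = {
    "ollama": OllamaBackend,
    "lmstudio": LMStudioBackend,
}

def create_backend(name: str) -> LLMBackend:
    # Factory: swapping engines is a config change, not a code change.
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name}") from None
```

Adding a sixth engine means one new subclass and one registry entry; nothing upstream changes.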

Payment system:

  • Workers earn money for their compute
  • 65% goes to Workers, 30% to the Queen, 5% platform fee
  • PayPal integration built in, architecture ready for Stripe
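The 65/30/5 split is simple arithmetic, but doing it in integer cents avoids floating-point drift. The rounding policy here (remainder absorbed by the platform fee) is my assumption, not necessarily what BeehiveOfAI does.

```python
def split_payment(total_cents: int) -> dict[str, int]:
    """Split a payment 65/30/5 between workers, queen, and platform."""
    workers = total_cents * 65 // 100
    queen = total_cents * 30 // 100
    # The platform takes whatever remains, so the parts always sum exactly.
    platform = total_cents - workers - queen
    return {"workers": workers, "queen": queen, "platform": platform}
```

For a $10.00 task this yields $6.50 / $3.00 / $0.50, and the three parts always add back up to the original amount even when the percentages don't divide evenly.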

Real test results

Two Linux machines on my home network:

  • Machine 1: Linux Mint 22.2, RTX 4070 Ti
  • Machine 2: Debian 13, RTX 5090
  • LAN test: 64 seconds for a full distributed task
  • Internet test (via Cloudflare): 29 seconds

The code

Everything is open source under MIT license. Three repos:

  • BeehiveOfAI — the platform (Flask backend, coordination logic)
  • HoneycombOfAI — the desktop client (PyQt6 GUI, CLI, 5 backend integrations)
  • TheDistributedAIRevolution — a non-technical book explaining the concept

All under one GitHub account: https://github.com/strulovitz

Built in 7 days by one developer. I want people to fork this, build on it, take it in their own direction. The more machines running distributed local AI, the better for everyone.
