The problem
You have multiple computers at home, each capable of running a local LLM. How do you make them work together?
Every existing project I've seen (exo, distributed-llama, llama.cpp RPC) tries to split the model itself across machines. In theory this works. In practice, inter-node network latency kills performance, especially on home networks.
A different approach
I took the opposite path: split the task, not the model.
One machine (the "Queen") receives a complex job and uses its local LLM to decompose it into independent subtasks. Other machines ("Workers"), each running their own complete local LLM, pick up one subtask each and process them in parallel. The Queen collects all results and combines them into the final answer.
The key insight: if the subtasks are independent, the workers never need to communicate with each other. Zero inter-node communication during inference. Each worker is fully self-contained.
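The Queen's control flow can be sketched in a few lines of Python. This is an illustrative sketch, not the actual BeehiveOfAI code: `decompose`, `run_local_llm`, and `combine` are hypothetical names standing in for the real components, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(task: str) -> list[str]:
    # In the real system the Queen's local LLM produces this split;
    # here we fake it with a fixed list of independent subtasks.
    return [f"{task} -- part {i}" for i in range(3)]

def run_local_llm(subtask: str) -> str:
    # Stand-in for a Worker's complete local model answering one subtask.
    return f"answer({subtask})"

def combine(results: list[str]) -> str:
    # The Queen merges the independent answers into one final result.
    return "\n".join(results)

def queen(task: str) -> str:
    subtasks = decompose(task)
    # One worker per subtask; no worker ever talks to another worker.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        results = list(pool.map(run_local_llm, subtasks))
    return combine(results)
```

The point of the sketch is the shape of the data flow: fan out once, process independently, fan in once.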
Why this matters
For individuals: You probably have more than one computer at home. Your desktop, your laptop, maybe an old machine sitting in a closet. Each one can contribute its processing power.
For companies: Instead of paying OpenAI or Google for cloud AI, you can run a cluster of ordinary machines with local models, paying only for hardware and electricity instead of per-token API fees.
For the open-source community: This is BitTorrent, but for AI computation. Instead of one powerful server, a swarm of ordinary machines collaborates.
How it works
User submits a complex task
              |
              v
Queen machine splits it into subtasks
            / | \
           v  v  v
     Worker  Worker  Worker   (each processes independently)
           \  |  /
            v  v  v
Queen combines all results
              |
              v
User receives the final answer
Workers can drop in and out at any time without breaking anything. If a worker disappears, its subtask times out and becomes available for another worker to pick up. True fault tolerance.
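The timeout-and-reclaim idea above can be sketched as a lease on each subtask. The field names and the 60-second lease are my assumptions for illustration, not the actual BeehiveOfAI schema.

```python
TIMEOUT = 60.0  # assumed lease length in seconds, for illustration

def claim(subtasks: list[dict], worker_id: str, now: float):
    """Hand the first available subtask to a Worker, recording a lease.

    A subtask is available if it was never claimed, or if its previous
    claimant's lease has expired (the Worker presumably vanished).
    """
    for st in subtasks:
        if st["done"]:
            continue
        expired = st["claimed_at"] is not None and now - st["claimed_at"] > TIMEOUT
        if st["claimed_at"] is None or expired:
            st["claimed_at"] = now
            st["worker"] = worker_id
            return st
    return None  # nothing available right now
```

Because the lease is just a timestamp check, a vanished Worker needs no explicit deregistration: its subtask simply becomes claimable again once the lease runs out.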
The tech stack
Platform (BeehiveOfAI):
- Flask 3.1.1 backend
- SQLAlchemy with SQLite
- Flask-Login authentication
- PayPal REST API for payments
- Cloudflare Tunnel deployment
Desktop Client (HoneycombOfAI):
- PyQt6 native desktop GUI
- CLI mode with Rich formatting
- 5 AI backends: Ollama, LM Studio, llama.cpp (server), llama-cpp-python (in-process), vLLM
- All backends behind an abstract interface with factory pattern
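The "abstract interface with factory pattern" design can be sketched like this. The class and method names here are illustrative, not HoneycombOfAI's actual API, and the concrete backends are stubs rather than real HTTP calls.

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """Common interface every AI backend must implement."""

    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class OllamaBackend(Backend):
    def generate(self, prompt: str) -> str:
        # Real code would POST to the local Ollama server here.
        return f"[ollama] {prompt}"

class LlamaCppServerBackend(Backend):
    def generate(self, prompt: str) -> str:
        # Real code would call llama.cpp's HTTP server here.
        return f"[llama.cpp] {prompt}"

# Registry used by the factory; adding a backend is one new entry.
_BACKENDS: dict[str, type[Backend]] = {
    "ollama": OllamaBackend,
    "llama.cpp": LlamaCppServerBackend,
}

def make_backend(name: str) -> Backend:
    """Factory: callers ask for a backend by name and never import concrete classes."""
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name}")
```

The payoff of this shape is that the rest of the client only ever sees `Backend.generate`, so swapping Ollama for vLLM is a one-string configuration change.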
Payment system:
- Workers earn money for their compute
- 65% goes to Workers, 30% to the Queen, 5% platform fee
- PayPal integration built in, architecture ready for Stripe
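The 65/30/5 split is simple enough to show in a few lines. This is a sketch under one assumption of mine: amounts are kept in integer cents, with any rounding remainder absorbed by the platform share.

```python
def split_payment(total_cents: int) -> dict[str, int]:
    """Split a job's payment per the 65/30/5 rule, in integer cents."""
    workers = total_cents * 65 // 100
    queen = total_cents * 30 // 100
    # Platform takes the rest, so the three shares always sum to the total.
    platform = total_cents - workers - queen
    return {"workers": workers, "queen": queen, "platform": platform}
```

For example, a $10.00 job (1000 cents) pays Workers 650, the Queen 300, and the platform 50.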
Real test results
Two Linux machines on my home network:
- Machine 1: Linux Mint 22.2, RTX 4070 Ti
- Machine 2: Debian 13, RTX 5090
- LAN test: 64 seconds for a full distributed task
- Internet test (via Cloudflare): 29 seconds
The code
Everything is open source under MIT license. Three repos:
- BeehiveOfAI — the platform (Flask backend, coordination logic)
- HoneycombOfAI — the desktop client (PyQt6 GUI, CLI, 5 backend integrations)
- TheDistributedAIRevolution — a non-technical book explaining the concept
All under one GitHub account: https://github.com/strulovitz
Built in 7 days by one developer. I want people to fork this, build on it, take it in their own direction. The more machines running distributed local AI, the better for everyone.