I wanted to answer a simple question: how hard is it to split a Python workload across multiple machines on the same network?
Not with a cloud cluster, not Kubernetes, just a few laptops on the same WiFi, sharing the work.
So I built distributed-compute-locally to find out. The goal was maximum simplicity: if it takes more than a few lines of code to set up, I've failed. Then I benchmarked it against industry-standard tools to see how it holds up.
The API
```python
from distributed_compute import Coordinator

coordinator = Coordinator()
coordinator.start_server()
results = coordinator.map(my_func, data)
```
On any other machine on the network:
```shell
pip install distributed-compute-locally
distcompute worker 192.168.1.100
```
That's all there is to it. The coordinator distributes tasks over TCP, workers execute them via cloudpickle, and results come back in input order, just like Python's built-in map().
```
┌─────────────┐     TCP/5555     ┌──────────┐
│ Coordinator │◄────────────────►│ Worker 1 │
│ (your PC)   │◄────────────────►│ Worker 2 │
│             │◄────────────────►│ Worker 3 │
└─────────────┘                  └──────────┘
```
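To make the ordering guarantee concrete, here is a minimal stdlib-only sketch of the idea: each task is tagged with its index, workers may finish in any order, and the coordinator reassembles results to match the input. This is my illustration, not the library's actual code, and it uses plain pickle where the real tool uses cloudpickle (which also handles lambdas and closures):

```python
import pickle

def make_tasks(func, data):
    # Tag each task with its position in the input.
    return [pickle.dumps((i, func, x)) for i, x in enumerate(data)]

def run_task(task_bytes):
    # What a worker does: deserialize, execute, return (index, result).
    i, func, x = pickle.loads(task_bytes)
    return pickle.dumps((i, func(x)))

def collect_in_order(result_blobs, n):
    # Results may arrive in any order; the index restores input order.
    ordered = [None] * n
    for blob in result_blobs:
        i, value = pickle.loads(blob)
        ordered[i] = value
    return ordered
```

Even if the result blobs arrive reversed, `collect_in_order` puts them back in the order the inputs were submitted.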
Benchmarks
I ran three standard parallel computing benchmarks on an Apple M2 MacBook (8 cores) with 4 workers:
| Benchmark | Sequential | 4 Workers | Speedup |
|---|---|---|---|
| NAS EP — NASA Embarrassingly Parallel | 5.0s | 1.4s | 3.57x |
| Mandelbrot Set — 2048×2048, 256 iterations | 12.0s | 3.7s | 3.27x |
| SHA-256 Search — brute-force hash prefix | 6.1s | 1.6s | 3.72x |
| Average | | | 3.52x |
3.52x on 4 workers (theoretical max 4.0x) — 88% parallel efficiency.
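The efficiency figure is just the average speedup over the worker count; re-deriving it from the table's speedup column:

```python
# Parallel efficiency = average speedup / number of workers,
# using the speedups reported in the table above.
speedups = {"NAS EP": 3.57, "Mandelbrot": 3.27, "SHA-256": 3.72}
workers = 4

avg_speedup = sum(speedups.values()) / len(speedups)  # 3.52
efficiency = avg_speedup / workers                    # 0.88 -> 88%
```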
How Does It Compare?
I was curious to see how this stacks up against Dask and Ray, so I ran the exact same workloads with the same setup (same machine, 4 workers, identical task code):
| Benchmark | distributed-compute-locally | Dask | Ray |
|---|---|---|---|
| NAS EP | 3.57x | 3.52x | 3.06x |
| Mandelbrot | 3.27x | 3.44x | 3.52x |
| SHA-256 | 3.72x | 3.59x | 3.90x |
| Average | 3.52x | 3.52x | 3.49x |
All three land within the same range of each other. For embarrassingly parallel workloads, the CPU is the bottleneck — not the framework. The task distribution overhead is negligible across all three.
Scaling Curve (N-Body Stress Test)
I also ran a heavier workload, an O(n²) pairwise gravity simulation (500 particles × 100 timesteps), and scaled from 1 to 8 workers:
| Workers | Time | Speedup |
|---|---|---|
| 1 | 179.1s | 1.00x |
| 2 | 145.7s | 1.23x |
| 4 | 102.7s | 1.74x |
| 6 | 81.6s | 2.20x |
| 8 | 78.8s | 2.27x |
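For intuition on how an O(n²) workload like this splits into independent tasks, here is a hedged sketch: each worker handles a slice of particles and computes interactions against all the others. This is my illustration of the benchmark's shape, not its actual code, with gravity replaced by a trivial pairwise kernel:

```python
def pairwise_chunk(positions, start, stop):
    # Accumulate a force-like pairwise sum for particles[start:stop]
    # against every other particle (the O(n^2) inner loop).
    out = []
    for i in range(start, stop):
        total = 0.0
        for j, q in enumerate(positions):
            if i != j:
                total += positions[i] - q  # stand-in for a force term
        out.append(total)
    return out

def pairwise_all(positions, workers=4):
    # Split the particle index range into one chunk per worker. Each
    # (start, stop) pair is an independent task suitable for a parallel
    # map; here the chunks run sequentially for clarity.
    n = len(positions)
    bounds = [(k * n // workers, (k + 1) * n // workers)
              for k in range(workers)]
    results = [pairwise_chunk(positions, a, b) for a, b in bounds]
    return [v for chunk in results for v in chunk]
```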
Diminishing returns set in after 6 workers, which is the M2's asymmetric cores (4 performance + 4 efficiency) showing through: the slower efficiency cores become the bottleneck on heavy tasks. On machines with identical cores, or across multiple machines, scaling should be closer to linear.
Findings and Learning
Building this tool reinforced something I suspected: for simple parallel workloads, the framework doesn't matter that much. Dask, Ray, and a minimal TCP-based tool all deliver roughly the same speedup. The difference is what they offer.
Dask and Ray give you a lot of features such as task graphs, dashboards, DataFrame integration, cloud deployment, and a massive ecosystem. They are the right choice for complex pipelines and production infrastructure.
This tool gives you none of that on purpose. It's for the cases where you just want to map() a function across a few machines.
| | distributed-compute-locally | Dask / Ray |
|---|---|---|
| Multi-machine setup | `distcompute worker <ip>` | Scheduler + worker CLI + networking |
| API surface | `coordinator.map()` | Client, delayed, futures, DataFrame, ... |
| Dashboard | No | Yes |
| Task graphs | No | Yes |
| Best for | Quick `map()` across LAN | Complex pipelines, cloud clusters |
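For calibration, the same "map a function over data" model exists in the standard library for a single machine; the multi-machine tool extends this shape across a LAN. A stdlib analogue (my illustration, not part of any of the libraries above):

```python
from multiprocessing import Pool

def heavy(x):
    # Stand-in for a CPU-bound task.
    return x * x

def parallel_map(func, data, workers=4):
    # Like coordinator.map(): fan out, then return results in input order.
    with Pool(workers) as pool:
        return pool.map(func, data)

if __name__ == "__main__":
    # Guard required for multiprocessing on spawn-based platforms.
    print(parallel_map(heavy, range(8)))
```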
Other Features
- Task retry — `coordinator.map(func, data, max_retries=3)`
- Password auth — `distcompute coordinator --password secret`
- Interactive CLI — REPL with status monitoring and task submission
- Large payload handling — automatic chunking and compression
- cloudpickle — send lambdas, closures, and local functions
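The retry feature amounts to re-running a task up to `max_retries` extra times before giving up. A minimal sketch of how that might look internally (hypothetical, not the library's actual code):

```python
def run_with_retries(func, arg, max_retries=3):
    # Attempt the task up to max_retries + 1 times; re-raise the last
    # error only when every attempt has failed (e.g. a worker dropped
    # mid-task).
    last_error = None
    for _attempt in range(max_retries + 1):
        try:
            return func(arg)
        except Exception as exc:
            last_error = exc
    raise last_error
```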
Try It
```shell
pip install distributed-compute-locally
```
Run the benchmarks yourself:
```shell
git clone https://github.com/Nebu0528/distributor.git
cd distributor
python3 benchmark/benchmark.py 4
```
GitHub: github.com/Nebu0528/distributor
If you find it useful or have feedback, I'd love to hear it. Open an issue or drop a star.