Nebu0528

I Built a Tool to Distribute Python Tasks Across Local Machines. Here's How It Performed

I wanted to answer a simple question: how hard is it to split a Python workload across multiple machines on the same network?

Not with a cloud cluster, not Kubernetes, just a few laptops on the same WiFi, sharing the work.

So I built distributed-compute-locally to find out. The goal was maximum simplicity: if it takes more than a few lines of code to set up, I've failed. Then I benchmarked it against industry-standard tools to see how it holds up.

The API

from distributed_compute import Coordinator

coordinator = Coordinator()
coordinator.start_server()
results = coordinator.map(my_func, data)

On any other machine on the network:

pip install distributed-compute-locally
distcompute worker 192.168.1.100

That's all you have to do. A coordinator distributes tasks over TCP, workers execute them with cloudpickle, and results come back in order, same as Python's built-in map().
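The ordering guarantee is easy to picture with a single-machine stand-in: the standard library's `ThreadPoolExecutor.map()` has the same contract, returning results in input order even when tasks finish out of order. (This sketch only illustrates the semantics; the library itself ships work to remote workers over TCP.)

```python
from concurrent.futures import ThreadPoolExecutor

def my_func(x):
    return x * x

data = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:
    # Like coordinator.map(), pool.map() yields results in input order,
    # regardless of which task completes first.
    results = list(pool.map(my_func, data))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```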

┌─────────────┐     TCP/5555     ┌──────────┐
│ Coordinator │◄────────────────►│ Worker 1 │
│ (your PC)   │◄────────────────►│ Worker 2 │
│             │◄────────────────►│ Worker 3 │
└─────────────┘                  └──────────┘

Benchmarks

I ran three standard parallel computing benchmarks on an Apple M2 MacBook (8 cores) with 4 workers:

| Benchmark | Sequential | 4 Workers | Speedup |
| --- | --- | --- | --- |
| NAS EP — NASA Embarrassingly Parallel | 5.0s | 1.4s | 3.57x |
| Mandelbrot Set — 2048×2048, 256 iterations | 12.0s | 3.7s | 3.27x |
| SHA-256 Search — brute-force hash prefix | 6.1s | 1.6s | 3.72x |
| **Average** | | | **3.52x** |

3.52x on 4 workers (theoretical max 4.0x) — 88% parallel efficiency.
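The SHA-256 search splits cleanly because each worker scans a disjoint nonce range. A minimal sketch of that kind of task function (my reconstruction of the workload shape, not the benchmark's actual code):

```python
import hashlib

def search_range(prefix: str, start: int, count: int):
    """Return the first nonce in [start, start + count) whose SHA-256
    hex digest starts with `prefix`, or None if the range has no hit."""
    for nonce in range(start, start + count):
        if hashlib.sha256(str(nonce).encode()).hexdigest().startswith(prefix):
            return nonce
    return None
```

Because ranges don't overlap, the work is embarrassingly parallel: something like `coordinator.map(lambda args: search_range(*args), [("0000", i * 100_000, 100_000) for i in range(4)])` would fan the search out across workers.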

How Does It Compare?

I was curious to see how this stacks up against Dask and Ray, so I ran the exact same workloads with the same setup (same machine, 4 workers, identical task code):

| Benchmark | distributed-compute-locally | Dask | Ray |
| --- | --- | --- | --- |
| NAS EP | 3.57x | 3.52x | 3.06x |
| Mandelbrot | 3.27x | 3.44x | 3.52x |
| SHA-256 | 3.72x | 3.59x | 3.90x |
| **Average** | **3.52x** | **3.52x** | **3.49x** |

All three land within a few percent of each other. For embarrassingly parallel workloads, the CPU is the bottleneck — not the framework. The task-distribution overhead is negligible across all three.

Scaling Curve (N-Body Stress Test)

I also ran a heavier workload: an O(n²) pairwise gravity simulation (500 particles × 100 timesteps), scaled from 1 to 8 workers:

| Workers | Time | Speedup |
| --- | --- | --- |
| 1 | 179.1s | 1.00x |
| 2 | 145.7s | 1.23x |
| 4 | 102.7s | 1.74x |
| 6 | 81.6s | 2.20x |
| 8 | 78.8s | 2.27x |

Diminishing returns set in after 6 workers; that's the M2's asymmetric cores (4 performance + 4 efficiency). The slower efficiency cores become the bottleneck on heavy tasks. On machines with identical cores, or across multiple machines, scaling would be closer to linear.
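The natural parallel split here is by particle: each worker computes accelerations for its own slice of bodies against all the others. A simplified pure-Python sketch of that per-worker unit (an assumed structure for illustration; the repo's benchmark may differ):

```python
def chunk_accelerations(positions, masses, lo, hi, G=1.0):
    """O(n^2) pairwise gravity in 2D: accelerations for particles
    lo..hi-1 against every other particle. One worker handles one
    [lo, hi) slice; the coordinator concatenates the slices."""
    acc = []
    for i in range(lo, hi):
        ax = ay = 0.0
        xi, yi = positions[i]
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            r2 = dx * dx + dy * dy
            inv_r3 = (r2 + 1e-9) ** -1.5  # softening avoids divide-by-zero
            ax += G * masses[j] * dx * inv_r3
            ay += G * masses[j] * dy * inv_r3
        acc.append((ax, ay))
    return acc
```

Every slice needs the full `positions` array, so each task ships O(n) data but does O(n²/workers) compute, which is why the framework overhead stays small relative to the work.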

Findings and Lessons Learned

Building this tool reinforced something I suspected: for simple parallel workloads, the framework doesn't matter that much. Dask, Ray, and a minimal TCP-based tool all deliver roughly the same speedup. The difference is what they offer.

Dask and Ray give you a lot of features such as task graphs, dashboards, DataFrame integration, cloud deployment, and a massive ecosystem. They are the right choice for complex pipelines and production infrastructure.

This tool gives you none of that on purpose. It's for the cases where you just want to map() a function across a few machines.

| | distributed-compute-locally | Dask / Ray |
| --- | --- | --- |
| Multi-machine setup | `distcompute worker <ip>` | Scheduler + worker CLI + networking |
| API surface | `coordinator.map()` | Client, delayed, futures, DataFrame, ... |
| Dashboard | No | Yes |
| Task graphs | No | Yes |
| Best for | Quick `map()` across LAN | Complex pipelines, cloud clusters |

Other Features

  • Task retry — coordinator.map(func, data, max_retries=3)
  • Password auth — distcompute coordinator --password secret
  • Interactive CLI — REPL with status monitoring and task submission
  • Large payload handling — automatic chunking and compression
  • cloudpickle — send lambdas, closures, and local functions
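Chunking and compression are simple to reason about. Here's a minimal sketch of the general technique using zlib and fixed-size chunks (illustrative only, not the library's actual wire format; `pack`/`unpack` and the chunk size are my own names and choices):

```python
import pickle
import zlib

CHUNK_SIZE = 64 * 1024  # 64 KiB per frame (an assumed size for illustration)

def pack(obj):
    """Serialize and compress an object, then split into fixed-size chunks."""
    blob = zlib.compress(pickle.dumps(obj))
    return [blob[i:i + CHUNK_SIZE] for i in range(0, len(blob), CHUNK_SIZE)]

def unpack(chunks):
    """Reassemble chunks, decompress, and deserialize."""
    return pickle.loads(zlib.decompress(b"".join(chunks)))

payload = {"data": list(range(100_000))}
assert unpack(pack(payload)) == payload  # round-trips exactly
```

Sending many small frames instead of one giant blob keeps each socket write bounded and lets the receiver report progress; compression helps most on repetitive payloads like the one above.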

Try It

pip install distributed-compute-locally

Run the benchmarks yourself:

git clone https://github.com/Nebu0528/distributor.git
cd distributor
python3 benchmark/benchmark.py 4

GitHub: github.com/Nebu0528/distributor

If you find it useful or have feedback, I'd love to hear it. Open an issue or drop a star.
