Nebu0528

I Built a Tool to Distribute Python Tasks Across Local Machines. Here's How It Performed

I wanted to answer a simple question: how hard is it to split a Python workload across multiple machines on the same network?

Not with a cloud cluster, not Kubernetes, just a few laptops on the same WiFi, sharing the work.

So I built distributed-compute-locally to find out. The goal was maximum simplicity: if it takes more than a few lines of code to set up, I've failed. Then I benchmarked it against industry-standard tools to see how it holds up.

The API

from distributed_compute import Coordinator

coordinator = Coordinator()
coordinator.start_server()
results = coordinator.map(my_func, data)

On any other machine on the network:

pip install distributed-compute-locally
distcompute worker 192.168.1.100

That's all you have to do. A coordinator distributes tasks over TCP, workers execute them with cloudpickle, and results come back in order, same as Python's built-in map().
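The ordering guarantee is easy to picture with a single-machine stand-in: the standard library's `ThreadPoolExecutor.map()` has the same contract, returning results in input order even when tasks finish out of order. (This sketch only illustrates the semantics; the library itself ships work to remote workers over TCP.)

```python
from concurrent.futures import ThreadPoolExecutor

def my_func(x):
    return x * x

data = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:
    # Like coordinator.map(), pool.map() yields results in input order,
    # regardless of which task completes first.
    results = list(pool.map(my_func, data))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```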

┌─────────────┐     TCP/5555     ┌──────────┐
│ Coordinator │◄────────────────►│ Worker 1 │
│ (your PC)   │◄────────────────►│ Worker 2 │
│             │◄────────────────►│ Worker 3 │
└─────────────┘                  └──────────┘

Benchmarks

I ran three standard parallel computing benchmarks on an Apple M2 MacBook (8 cores) with 4 workers:

| Benchmark | Sequential | 4 Workers | Speedup |
| --- | --- | --- | --- |
| NAS EP — NASA Embarrassingly Parallel | 5.0s | 1.4s | 3.57x |
| Mandelbrot Set — 2048×2048, 256 iterations | 12.0s | 3.7s | 3.27x |
| SHA-256 Search — brute-force hash prefix | 6.1s | 1.6s | 3.72x |
| **Average** | | | **3.52x** |

3.52x on 4 workers (theoretical max 4.0x) — 88% parallel efficiency.
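The SHA-256 search splits cleanly because each worker scans a disjoint nonce range. A minimal sketch of that kind of task function (my reconstruction of the workload shape, not the benchmark's actual code):

```python
import hashlib

def search_range(prefix: str, start: int, count: int):
    """Return the first nonce in [start, start + count) whose SHA-256
    hex digest starts with `prefix`, or None if the range has no hit."""
    for nonce in range(start, start + count):
        if hashlib.sha256(str(nonce).encode()).hexdigest().startswith(prefix):
            return nonce
    return None
```

Because ranges don't overlap, the work is embarrassingly parallel: something like `coordinator.map(lambda args: search_range(*args), [("0000", i * 100_000, 100_000) for i in range(4)])` would fan the search out across workers.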

How Does It Compare?

I was curious to see how this stacks up against Dask and Ray, so I ran the exact same workloads with the same setup (same machine, 4 workers, identical task code):

| Benchmark | distributed-compute-locally | Dask | Ray |
| --- | --- | --- | --- |
| NAS EP | 3.57x | 3.52x | 3.06x |
| Mandelbrot | 3.27x | 3.44x | 3.52x |
| SHA-256 | 3.72x | 3.59x | 3.90x |
| **Average** | **3.52x** | **3.52x** | **3.49x** |

All three land within a few percent of each other. For embarrassingly parallel workloads, the CPU is the bottleneck — not the framework. The task-distribution overhead is negligible across all three.

Scaling Curve (N-Body Stress Test)

I also ran a heavier workload: an O(n²) pairwise gravity simulation (500 particles × 100 timesteps), scaled from 1 to 8 workers:

| Workers | Time | Speedup |
| --- | --- | --- |
| 1 | 179.1s | 1.00x |
| 2 | 145.7s | 1.23x |
| 4 | 102.7s | 1.74x |
| 6 | 81.6s | 2.20x |
| 8 | 78.8s | 2.27x |

Diminishing returns set in after 6 workers; that's the M2's asymmetric cores (4 performance + 4 efficiency). The slower efficiency cores become the bottleneck on heavy tasks. On machines with identical cores, or across multiple machines, scaling would be closer to linear.
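The natural parallel split here is by particle: each worker computes accelerations for its own slice of bodies against all the others. A simplified pure-Python sketch of that per-worker unit (an assumed structure for illustration; the repo's benchmark may differ):

```python
def chunk_accelerations(positions, masses, lo, hi, G=1.0):
    """O(n^2) pairwise gravity in 2D: accelerations for particles
    lo..hi-1 against every other particle. One worker handles one
    [lo, hi) slice; the coordinator concatenates the slices."""
    acc = []
    for i in range(lo, hi):
        ax = ay = 0.0
        xi, yi = positions[i]
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            r2 = dx * dx + dy * dy
            inv_r3 = (r2 + 1e-9) ** -1.5  # softening avoids divide-by-zero
            ax += G * masses[j] * dx * inv_r3
            ay += G * masses[j] * dy * inv_r3
        acc.append((ax, ay))
    return acc
```

Every slice needs the full `positions` array, so each task ships O(n) data but does O(n²/workers) compute, which is why the framework overhead stays small relative to the work.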

Findings and Lessons Learned

Building this tool reinforced something I suspected: for simple parallel workloads, the framework doesn't matter that much. Dask, Ray, and a minimal TCP-based tool all deliver roughly the same speedup. The difference is what they offer.

Dask and Ray give you a lot of features such as task graphs, dashboards, DataFrame integration, cloud deployment, and a massive ecosystem. They are the right choice for complex pipelines and production infrastructure.

This tool gives you none of that on purpose. It's for the cases where you just want to map() a function across a few machines.

| | distributed-compute-locally | Dask / Ray |
| --- | --- | --- |
| Multi-machine setup | `distcompute worker <ip>` | Scheduler + worker CLI + networking |
| API surface | `coordinator.map()` | Client, delayed, futures, DataFrame, ... |
| Dashboard | No | Yes |
| Task graphs | No | Yes |
| Best for | Quick `map()` across LAN | Complex pipelines, cloud clusters |

Other Features

  • Task retry — coordinator.map(func, data, max_retries=3)
  • Password auth — distcompute coordinator --password secret
  • Interactive CLI — REPL with status monitoring and task submission
  • Large payload handling — automatic chunking and compression
  • cloudpickle — send lambdas, closures, and local functions
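Chunking and compression are simple to reason about. Here's a minimal sketch of the general technique using zlib and fixed-size chunks (illustrative only, not the library's actual wire format; `pack`/`unpack` and the chunk size are my own names and choices):

```python
import pickle
import zlib

CHUNK_SIZE = 64 * 1024  # 64 KiB per frame (an assumed size for illustration)

def pack(obj):
    """Serialize and compress an object, then split into fixed-size chunks."""
    blob = zlib.compress(pickle.dumps(obj))
    return [blob[i:i + CHUNK_SIZE] for i in range(0, len(blob), CHUNK_SIZE)]

def unpack(chunks):
    """Reassemble chunks, decompress, and deserialize."""
    return pickle.loads(zlib.decompress(b"".join(chunks)))

payload = {"data": list(range(100_000))}
assert unpack(pack(payload)) == payload  # round-trips exactly
```

Sending many small frames instead of one giant blob keeps each socket write bounded and lets the receiver report progress; compression helps most on repetitive payloads like the one above.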

Try It

pip install distributed-compute-locally

Run the benchmarks yourself:

git clone https://github.com/Nebu0528/distributor.git
cd distributor
python3 benchmark/benchmark.py 4

GitHub: github.com/Nebu0528/distributor

If you find it useful or have feedback, I'd love to hear it. Open an issue or drop a star.
