DEV Community

Nico Reyes

Python multiprocessing crashed on my custom class. Took forever to figure out why.

So I had this script that processed a bunch of images. Image processing, right, perfect use case for multiprocessing. Split the work across 8 cores, should be way faster.

First attempt:

from multiprocessing import Pool

def process_image(image_path):
    # do stuff
    return processed

with Pool(8) as pool:
    results = pool.map(process_image, image_paths)

Looked fine. Ran it.

Crashed immediately. Pickle error.

Cannot pickle <class 'MyImageProcessor'>

My what now?

So here's the thing I forgot. Multiprocessing in Python uses pickle to send data between processes. Functions get pickled, arguments get pickled, results get pickled. Everything gets pickled.

And my image processor class? Not pickleable.
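If you want to see the failure in isolation, here's a minimal repro. The `lock` attribute is an assumption on my part, a stand-in for whatever unpickleable state your class holds (locks, open file handles, database connections, C-backed objects all trigger this):

```python
import pickle
import threading

class MyImageProcessor:
    """Hypothetical stand-in: holds a lock, which pickle can't serialize."""
    def __init__(self):
        self.lock = threading.Lock()  # unpickleable attribute

processor = MyImageProcessor()

try:
    pickle.dumps(processor)  # this is roughly what Pool.map does under the hood
except TypeError as e:
    print(f"pickle failed: {e}")
```

Same class of error, just without eight worker processes in the way.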

Tried a few things:

  • Made the class simpler
  • Tried with just a function
  • Switched to threading instead

Threading worked but was slow as hell because of the GIL. Image processing is CPU intensive, threading doesn't help there.

Finally figured out the fix. The class held some state that couldn't be pickled. Two ways out: define `__getstate__` and `__setstate__` methods that drop the unpickleable stuff on the way out and rebuild it on the way in, or just initialize the object inside the worker process instead of passing the whole thing across.
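Here's a sketch of the `__getstate__`/`__setstate__` route. The `config` dict and the lock are made-up attributes, just to show the shape of it:

```python
import pickle
import threading

class MyImageProcessor:
    def __init__(self):
        self.config = {"size": (256, 256)}  # picklable state, travels fine
        self.lock = threading.Lock()        # unpickleable, must be rebuilt

    def __getstate__(self):
        # Copy the instance dict and drop the lock before pickling
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the picklable state, then recreate the lock fresh
        self.__dict__.update(state)
        self.lock = threading.Lock()

# Round-trips cleanly now
clone = pickle.loads(pickle.dumps(MyImageProcessor()))
print(clone.config)  # {'size': (256, 256)}
```

The tradeoff: anything you drop in `__getstate__` gets recreated from scratch in every worker, so this only makes sense for state that's cheap to rebuild.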

Working version:

from multiprocessing import Pool

def init_worker():
    # Initialize any unpickleable objects here
    global processor
    processor = MyImageProcessor()

def process_image(image_path):
    global processor
    return processor.process(image_path)

if __name__ == "__main__":
    with Pool(8, initializer=init_worker) as pool:
        results = pool.map(process_image, image_paths)

Using the initializer pattern means each worker process sets up its own processor. No pickling needed, no crashes.

Alternatively, if your objects are simpler, just use functools.partial:

from functools import partial
from multiprocessing import Pool

def process_image(processor, image_path):
    return processor.process(image_path)

processor = MyImageProcessor()
bounded_func = partial(process_image, processor)

if __name__ == "__main__":
    with Pool(8) as pool:
        results = pool.map(bounded_func, image_paths)

This only works if the processor itself is pickleable of course.
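A cheap way to check that up front, before multiprocessing does it for you at the worst possible moment. `is_picklable` is a helper I'm making up here, not a stdlib function:

```python
import pickle

def is_picklable(obj):
    """Sanity check: can this object survive the trip to a worker process?"""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

print(is_picklable({"size": (256, 256)}))  # True: plain dicts are fine
print(is_picklable(lambda p: p))           # False: lambdas can't be pickled by name
```

Run it on your processor before wiring up the Pool and you'll know which pattern you need.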

Still not sure why Python's multiprocessing defaults to pickle instead of something more forgiving. The error message is garbage too, just points at the class without explaining what's actually blocking serialization.
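One trick that would have saved me time: try pickling each attribute individually instead of the whole object, and the actual culprit falls right out. The class and its attributes here are hypothetical again:

```python
import pickle
import threading

class MyImageProcessor:  # hypothetical repro
    def __init__(self):
        self.config = {"size": (256, 256)}
        self.lock = threading.Lock()

processor = MyImageProcessor()

# Pickle each attribute on its own to find what's actually blocking serialization
for name, value in vars(processor).items():
    try:
        pickle.dumps(value)
    except Exception as e:
        print(f"{name!r} blocks pickling: {e}")
```

This prints the offending attribute name instead of just the class, which is what the built-in error message should have told me in the first place.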

Made the script work in the end. Still annoyed about losing a few hours to this.
