Sushant Gaurav

Posted on Sep 15 • Edited on Oct 13

Python Multiprocessing: Start Methods, Pools, and Communication

#python #tutorial #programming #beginners

Processes vs Threads

Memory model and isolation

Threads live inside a single process and share the same address space. Any mutable object (lists, dicts, class instances) is visible to every thread unless it is protected with synchronization primitives. Threads make data sharing easy, but that also makes it easy to corrupt shared state through race conditions.
Processes have separate address spaces. A child process cannot directly see the parent’s Python objects. Data must be transferred via inter-process communication (IPC), which involves serialization (pickle) unless using specialized shared memory. It provides safer isolation and robustness (a crash in one process usually does not corrupt others), but passing data has overhead.

Example (updating a global counter from threads vs. from processes):

from threading import Thread
from multiprocessing import Process

counter_thread = 0
counter_process = 0

def bump_thread(n):
  global counter_thread
  for _ in range(n):
    counter_thread += 1
  print("Thread's counter value:", counter_thread)

def bump_process(n):
  global counter_process
  for _ in range(n):
    counter_process += 1
  print("Process's counter value:", counter_process)

if __name__ == "__main__":
  t1 = Thread(target=bump_thread, args=(100_000,))
  t2 = Thread(target=bump_thread, args=(100_000,))
  t1.start(); t2.start()
  t1.join(); t2.join()

  p1 = Process(target=bump_process, args=(100_000,))
  p2 = Process(target=bump_process, args=(100_000,))
  p1.start(); p2.start()
  p1.join(); p2.join()

  print("Process does not see counter in same memory:", counter_process)  
  print("Threads see counter in same memory:", counter_thread)

Output:

Thread's counter value: 100000
Thread's counter value: 200000
Process's counter value: 100000
Process's counter value: 100000
Process does not see counter in same memory: 0
Threads see counter in same memory: 200000

Overhead and scheduling

Threads are lightweight to create and switch between, but in CPython only one thread executes Python bytecode at a time because of the Global Interpreter Lock (GIL). That makes threads ideal for I/O-bound work, but not for CPU-bound parallel work.
Processes are heavier in terms of start-up time, separate memory, and IPC costs. But each process has its own interpreter and GIL, so CPU-bound work can run truly in parallel across cores.

Communication

Threads

All threads within a process share the same memory space (global variables, heap, and so on). Communication between threads is implicit: if one thread modifies a variable, the other threads can see the change.
Pros:
- Very fast, no serialization needed.
- Easy to share complex Python objects.
Cons:
- Risk of race conditions if multiple threads read/write the same variable at the same time.
- Need explicit synchronization (Lock, RLock, Semaphore, Condition, etc.) to avoid data corruption.
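As a minimal sketch of that explicit synchronization, here is a thread counter similar to the first example but guarded by a threading.Lock, so the read-modify-write behind += cannot interleave between threads. Without a lock, the 200000 in the first example's output is typical but not guaranteed, because += on a shared global is not atomic. The names bump_safely and run are illustrative.

```python
from threading import Lock, Thread

counter = 0
counter_lock = Lock()  # guards every read-modify-write of `counter`

def bump_safely(n):
  global counter
  for _ in range(n):
    with counter_lock:  # only one thread at a time runs the += below
      counter += 1

def run(num_threads=4, n=100_000):
  global counter
  counter = 0  # start from zero on every call
  threads = [Thread(target=bump_safely, args=(n,)) for _ in range(num_threads)]
  for t in threads:
    t.start()
  for t in threads:
    t.join()
  return counter

if __name__ == "__main__":
  print("Final counter:", run())  # Final counter: 400000
```

The lock makes the result deterministic at the cost of some speed, since every increment now acquires and releases it.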

Processes

Each process has its own isolated memory space, so communication requires Inter-Process Communication (IPC), usually through one of:
- multiprocessing.Queue (process-safe FIFO queue, built on a pipe).
- multiprocessing.Pipe (a pair of connection objects; two-way by default).
- multiprocessing.Manager (a server process that hosts shared dicts, lists, and other objects behind proxies).
- multiprocessing.shared_memory (shared blocks of raw memory; fast for large arrays because nothing is pickled).
Pros:
- No accidental race conditions on ordinary Python objects, since memory is isolated.
- Fault isolation (a crash in one process does not corrupt the others).
Cons:
- Communication through queues, pipes, and managers is slower because data is serialized (pickled and unpickled) on every transfer.
- Extra overhead for large or frequent data transfers.
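A minimal sketch of the first two channels, using only the standard library: one child sends a result back over a Queue, and a second child echoes messages over a Pipe. Everything that crosses either channel is pickled. The helper names producer, echo, and main are illustrative.

```python
from multiprocessing import Pipe, Process, Queue

def producer(q, numbers):
  # Runs in the child: compute, then send the result back through the queue.
  q.put(sum(n * n for n in numbers))

def echo(conn):
  # Runs in the child: reply to each message until the parent sends None.
  while True:
    msg = conn.recv()
    if msg is None:
      break
    conn.send(msg.upper())
  conn.close()

def main():
  q = Queue()
  p = Process(target=producer, args=(q, [1, 2, 3]))
  p.start()
  total = q.get()  # blocks until the child puts a value
  p.join()         # get() before join(): joining first can deadlock while the queue holds data

  parent_conn, child_conn = Pipe()  # duplex by default
  e = Process(target=echo, args=(child_conn,))
  e.start()
  replies = []
  for word in ["ping", "pong"]:
    parent_conn.send(word)
    replies.append(parent_conn.recv())
  parent_conn.send(None)  # tell the child to stop
  e.join()
  return total, replies

if __name__ == "__main__":
  print(main())  # (14, ['PING', 'PONG'])
```

A Queue suits many producers and consumers; a Pipe is a lighter point-to-point link between exactly two endpoints.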

Creating Processes and Managing Their Lifecycle

Note: Always protect the entry point with if __name__ == "__main__":. It is essential with the spawn and forkserver start methods (spawn is the default on Windows and macOS), because the child re-imports the main module and, without the guard, would try to create child processes of its own.

`multiprocessing.Process`

Example:

from multiprocessing import Process, current_process
import os, time

def work(n):
  print(f"[{current_process().name}] PID={os.getpid()} start with n={n}")
  time.sleep(1)
  print(f"[{current_process().name}] done")

if __name__ == "__main__":
  p1 = Process(target=work, name="Worker-1", args=(10,))
  p2 = Process(target=work, name="Worker-2", args=(20,))
  p1.start(); p2.start()
  print("Alive?", p1.is_alive(), p2.is_alive())
  p1.join(); p2.join()
  print("Exit codes:", p1.exitcode, p2.exitcode)

Output:

Alive? True True
[Worker-2] PID=56571 start with n=20
[Worker-1] PID=56570 start with n=10
[Worker-2] done
[Worker-1] done
Exit codes: 0 0

Important functions

p.start(): Starts the process; the target runs in a new child process (how that child is created depends on the start method).
p.join(timeout=None): Waits for the process to finish, or for at most timeout seconds.
p.terminate(): Forcefully stops the process (SIGTERM on POSIX); exit handlers and finally clauses are not run.
p.is_alive(): A method that returns True while the process is running and False otherwise.
p.pid: The OS process ID of the child process (None before start()).
p.name: Name of the process (default: Process-N, customizable).
p.exitcode: Exit status: 0 means success, a positive value means an error, a negative value -N means the process was killed by signal N, and None means it is still running.
p.daemon: A boolean flag that must be set before start(); a daemon process is terminated automatically when its parent exits.
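A short sketch that exercises several of these together: join with a timeout, is_alive, terminate, and the resulting exitcode. The sleep durations are arbitrary. On POSIX systems a process killed by signal N reports exitcode -N, so after terminate() you would typically see -15 (SIGTERM) on Linux.

```python
import time
from multiprocessing import Process

def sleeper(seconds):
  time.sleep(seconds)

def main():
  quick = Process(target=sleeper, args=(0.1,), name="Quick")
  # daemon must be set before start(); here it is passed to the constructor
  slow = Process(target=sleeper, args=(30,), name="Slow", daemon=True)
  quick.start()
  slow.start()

  quick.join()            # waits until Quick exits normally
  slow.join(timeout=0.5)  # gives up after 0.5 s; Slow is still sleeping
  still_running = slow.is_alive()

  slow.terminate()        # no finally clauses or exit handlers run in the child
  slow.join()             # reap the terminated process so exitcode is set
  return quick.exitcode, still_running, slow.exitcode, slow.is_alive()

if __name__ == "__main__":
  print(main())  # on Linux, typically: (0, True, -15, False)
```

Calling join() after terminate() matters: until the parent joins, the child is not reaped and exitcode may still be None.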

Start Methods - `spawn`, `fork`, `forkserver`

Python provides three start methods for processes: spawn, fork, and forkserver.

`spawn`

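It is the default on Windows and macOS. It starts a fresh Python interpreter process, which re-imports the main module and receives the target function and its arguments via pickle. Because nothing is inherited by accident, it is safer and more predictable, but it is slower to start than fork since everything is loaded fresh.

Use case: Cross-platform code, or when safety is more important than startup speed.

Note: With spawn, only picklable objects can be passed to child processes, and the target function must be importable, i.e., defined at the top level of a module (not a lambda or a nested function).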

Example:

from multiprocessing import Process, set_start_method

def worker(x):
  print(f"Worker received {x}")

if __name__ == "__main__":
  set_start_method("spawn")
  p = Process(target=worker, args=(42,))
  p.start()
  p.join()

Output:

Worker received 42
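Because spawn pickles the target and its arguments, a quick way to check whether an object can make the trip is to try pickling it yourself. This is a minimal sketch: is_picklable is a hypothetical helper, not part of multiprocessing, and multiprocessing actually uses a pickle subclass with a few extra reducers, so the check is an approximation. The exact exception type varies by object and Python version, so it catches the common ones.

```python
import pickle
import threading

def is_picklable(obj):
  # Approximates the serialization spawn applies to the target and its arguments.
  try:
    pickle.dumps(obj)
    return True
  except (pickle.PickleError, AttributeError, TypeError):
    return False

if __name__ == "__main__":
  print(is_picklable((42, "text", {"k": [1, 2]})))  # True: plain data
  print(is_picklable(lambda x: x * 2))              # False: a lambda has no importable name
  print(is_picklable(threading.Lock()))             # False: thread locks cannot be pickled
  print(is_picklable(n for n in range(3)))          # False: generators cannot be pickled
```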

`fork`

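It is the default on Linux and other Unix systems (except macOS) up to Python 3.13; Python 3.14 changed that default to forkserver. The child process is created by forking the parent (copy-on-write memory), which makes startup very fast because the child inherits the parent’s memory as-is. It is unsafe when the parent has started threads, and with some C extensions or resources (for example, threaded numerical libraries, TensorFlow, or open database connections), because they may not expect to be copied.

Use case: High-performance parallel tasks on Linux, where the parent process does not run threads or rely on complex C extensions.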

Example:

from multiprocessing import Process, set_start_method

def worker(x):
  print(f"Worker received {x}")

if __name__ == "__main__":
  set_start_method("fork")
  p = Process(target=worker, args=(99,))
  p.start()
  p.join()

Output:

Worker received 99
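The practical difference between fork and the other two methods shows up with state the parent creates at runtime. In this minimal sketch (CONFIG, report, and child_sees are illustrative names), the parent changes a module-level dict inside the __main__ guard: a forked child inherits the changed copy, while spawn and forkserver children re-import the module and only see the import-time value. It tries whichever methods get_all_start_methods() reports for the current platform, using get_context() so the global start method is left untouched.

```python
import multiprocessing as mp

CONFIG = {"mode": "default"}  # module-level state, set at import time

def report(q):
  # Runs in the child: send back whatever CONFIG looks like there.
  q.put(CONFIG["mode"])

def child_sees(method):
  ctx = mp.get_context(method)  # a context object does not change the global start method
  q = ctx.Queue()
  p = ctx.Process(target=report, args=(q,))
  p.start()
  value = q.get()
  p.join()
  return value

if __name__ == "__main__":
  CONFIG["mode"] = "changed-in-parent"  # runs only in the parent, never on re-import
  for method in mp.get_all_start_methods():
    print(f"{method:>10}: child sees {child_sees(method)!r}")
```

On Linux, fork should report 'changed-in-parent' while spawn and forkserver report 'default', which is exactly the inherited state that makes fork both fast and risky.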

`forkserver`

It starts a dedicated fork server process. Every new process is forked from that clean, single-threaded server rather than from the potentially complex parent, which avoids most of fork’s unsafe state inheritance. It is therefore safer than fork, while usually still faster than spawn.

Use case: When you want most of fork’s speed with more safety, in complex applications (web servers, ML frameworks).

Example:

from multiprocessing import Process, set_start_method

def worker(x):
  print(f"Worker received {x}")

if __name__ == "__main__":
  set_start_method("forkserver")
  p = Process(target=worker, args=(123,))
  p.start()
  p.join()

Output:

Worker received 123

Summary table

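Method	Default on	Startup speed	Safety	Restrictions
`spawn`	Windows, macOS	Slow	Very safe	Target and arguments must be picklable
`fork`	Linux/Unix (up to Python 3.13)	Fast	Risky	Unsafe with threads/some C extensions; not available on Windows
`forkserver`	Linux/Unix (Python 3.14+); elsewhere must be set	Medium	Safer than fork	Unix only; picklable arguments; starts a fork server process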

A good rule of thumb

Use spawn for portable, predictable code.
Use fork on Linux when you need fast startup and the parent process is not running threads.
Use forkserver when you want a balance between performance and safety.
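set_start_method() fixes the start method for the whole program and raises RuntimeError if it has already been set (unless force=True is passed), which makes it awkward inside libraries. A minimal sketch of the rule of thumb above using get_context() instead; make_context and double are hypothetical helpers, and preferring forkserver over spawn is just one reasonable choice.

```python
import multiprocessing as mp

def make_context():
  # Prefer forkserver where the platform offers it, otherwise fall back to spawn.
  methods = mp.get_all_start_methods()
  return mp.get_context("forkserver" if "forkserver" in methods else "spawn")

def double(x):
  return 2 * x

if __name__ == "__main__":
  ctx = make_context()
  # The context exposes the same API as the module: Pool, Process, Queue, ...
  with ctx.Pool(processes=2) as pool:
    print(ctx.get_start_method(), pool.map(double, [1, 2, 3]))
```

Because each context is independent, different parts of a program can use different start methods without fighting over the global setting.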

Process Pools (`Pool`, `map`, `imap`, `imap_unordered`, `apply`, and `apply_async`)

Pool

When there are many tasks that can run in parallel, you can use a process pool instead of manually creating dozens of Process objects.

A Pool manages a fixed number of worker processes:

Submit tasks (functions + arguments).
The pool distributes tasks to workers.
Get results back (synchronously or asynchronously).

Note: This is perfect for CPU-bound work like prime testing, image processing, or simulations.

Example (Basic Pool Setup):

from multiprocessing import Pool, cpu_count

def square(x):
  return x * x

if __name__ == "__main__":
  # create pool with N workers
  with Pool(processes=cpu_count()) as pool:  
    numbers = [1, 2, 3, 4, 5]
    results = pool.map(square, numbers)
  print(results)

Output:

[1, 4, 9, 16, 25]

In the above example,

cpu_count() returns the number of CPUs on the machine.
The statement with Pool(processes=cpu_count()) as pool:
- Creates a process pool with as many worker processes as there are CPU cores (Pool() with no argument also sizes itself from the CPU count).
- For example, on a machine with 8 cores, 8 worker processes are started.
- Each worker is a separate Python process that can run tasks in parallel (true parallelism, not blocked by the GIL).
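One detail of the with block is easy to miss: on exit, Pool calls terminate(), not close() and join(), so tasks still pending at that moment are discarded. The examples in this post collect their results inside the block, so they are safe. For fire-and-forget submissions such as apply_async with callbacks, a minimal sketch of the explicit shutdown sequence might look like this (square and main are illustrative):

```python
from multiprocessing import Pool

def square(x):
  return x * x

def main():
  results = []
  pool = Pool(processes=2)
  for i in range(5):
    # The callback runs in the parent, in the pool's result-handling thread.
    pool.apply_async(square, (i,), callback=results.append)
  pool.close()  # no more tasks may be submitted
  pool.join()   # wait for every submitted task (and its callback) to finish
  return sorted(results)  # completion order is not guaranteed, so sort

if __name__ == "__main__":
  print(main())  # [0, 1, 4, 9, 16]
```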

`Pool.map` (Blocking Batch Processing)

It works like a parallel version of the built-in map(): it takes a function and an iterable of inputs, and it waits until all results are ready.

It is ideal for:

- Small to medium lists of tasks.
- Cases where all results are needed together.

It has some downsides:

- It blocks until the whole batch is done.
- It turns the whole input into a list and builds the whole result list in memory, so large iterables mean high memory use.

Example:

from multiprocessing import Pool, cpu_count
import math, time

def is_prime(n: int) -> bool:
  # Check if n is prime (naive method).
  if n < 2: return False
  if n % 2 == 0: return n == 2
  r = math.isqrt(n)  # isqrt already returns an int
  for f in range(3, r+1, 2):
    if n % f == 0:
      return False
  return True

if __name__ == "__main__":
  numbers = [10_000_019, 10_000_079, 10_000_091, 10_000_099]

  with Pool(cpu_count()) as pool:
    start = time.perf_counter()
    # waits for all results
    results = pool.map(is_prime, numbers)   
    finish = time.perf_counter()
    total = finish - start

  print(list(zip(numbers, results)))
  print(f"Time elapsed: {total:.2f}s with {cpu_count()} workers")

Output:

[(10000019, True), (10000079, True), (10000091, False), (10000099, False)]
Time elapsed: 0.06s with 8 workers
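With only four numbers, the cost of starting workers and pickling data can rival the work itself, so it is worth checking that the pool actually helps. A minimal sketch that times the same check with the built-in map and with Pool.map; compare is an illustrative helper, the input range is arbitrary, and the timings depend entirely on the machine, so none are predicted here.

```python
import math
import time
from multiprocessing import Pool

def is_prime(n: int) -> bool:
  # Same naive trial-division check as above.
  if n < 2:
    return False
  if n % 2 == 0:
    return n == 2
  for f in range(3, math.isqrt(n) + 1, 2):
    if n % f == 0:
      return False
  return True

def compare(numbers):
  start = time.perf_counter()
  sequential = list(map(is_prime, numbers))
  t_seq = time.perf_counter() - start

  start = time.perf_counter()
  with Pool() as pool:  # Pool() sizes itself from the CPU count
    parallel = pool.map(is_prime, numbers)
  t_par = time.perf_counter() - start  # includes pool start-up and shutdown

  print(f"sequential: {t_seq:.3f}s  pool: {t_par:.3f}s")
  return sequential == parallel, sum(sequential)

if __name__ == "__main__":
  same, primes = compare(range(10_000_000, 10_000_200))
  print("Results match:", same, "| primes found:", primes)
```

The pool tends to win only once the per-item work is large enough to outweigh process start-up and IPC costs.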

`Pool.imap` and `Pool.imap_unordered` (Streaming Results)

Unlike map, results are streamed back one at a time. It is useful when tasks are long-running or numerous.

`.imap`

imap returns results in the same order as the input. It returns a lazy iterator, so results are streamed one by one instead of all at once (unlike map). Because order is preserved, one slow early task holds back the results after it. It is useful when order matters (for example, when later logic depends on processing results in sequence).

`.imap_unordered`

imap_unordered, as the name suggests, yields results as soon as they are ready, so order is not guaranteed. It also returns a lazy iterator that streams results as they complete. It is useful when order does not matter and you want faster throughput or earlier feedback.

Example:

from multiprocessing import Pool
import time, random

def slow_square(n):
  # Simulate a task with random delay.
  delay = random.uniform(0.5, 2.0)  # 0.5–2 seconds
  time.sleep(delay)
  result = n * n
  print(f"Task {n} finished in {delay:.2f}s → {result}")
  return result

def demo_imap():
  print("--- Using imap (ordered results) ---")
  items = [1, 2, 3, 4, 5]
  with Pool(3) as pool:  # 3 workers
    for res in pool.imap(slow_square, items, chunksize=1):
      print("Got result (ordered):", res)

def demo_imap_unordered():
  print("--- Using imap_unordered (unordered results) ---")
  items = [1, 2, 3, 4, 5]
  with Pool(3) as pool:
    for res in pool.imap_unordered(slow_square, items, chunksize=1):
      print("Got result (unordered):", res)

if __name__ == "__main__":
  demo_imap()
  demo_imap_unordered()

Output:

--- Using imap (ordered results) ---
Task 1 finished in 1.53s → 1
Got result (ordered): 1
Task 2 finished in 1.61s → 4
Got result (ordered): 4
Task 3 finished in 1.89s → 9
Got result (ordered): 9
Task 4 finished in 1.41s → 16
Got result (ordered): 16
Task 5 finished in 1.57s → 25
Got result (ordered): 25
--- Using imap_unordered (unordered results) ---
Task 2 finished in 1.29s → 4
Got result (unordered): 4
Task 3 finished in 1.35s → 9
Got result (unordered): 9
Task 1 finished in 1.49s → 1
Got result (unordered): 1
Task 5 finished in 0.60s → 25
Got result (unordered): 25
Task 4 finished in 1.07s → 16
Got result (unordered): 16

Note: chunksize in multiprocessing.Pool controls how many tasks are sent to a worker process at once (imap and imap_unordered default to 1; map picks a chunk size automatically).

chunksize=1 sends tasks one by one: more responsive, but more communication overhead.

A larger chunksize sends tasks in batches: less communication overhead, and faster for many small, uniform tasks.
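A small sketch of that trade-off: the same cheap function over many inputs, once with chunksize=1 and once with a larger chunksize. The value 250, the worker count, and the helper names are arbitrary choices; timings depend on the machine, so the sketch only prints them and checks that both runs produce the same results.

```python
import time
from multiprocessing import Pool

def square(x):
  return x * x

def timed(pool, chunksize, items):
  start = time.perf_counter()
  # imap_unordered yields in completion order; sort so the runs can be compared
  results = sorted(pool.imap_unordered(square, items, chunksize=chunksize))
  return results, time.perf_counter() - start

def main(n=10_000):
  items = range(n)
  with Pool(processes=4) as pool:
    one_by_one, t1 = timed(pool, 1, items)
    batched, t250 = timed(pool, 250, items)
  print(f"chunksize=1: {t1:.3f}s  chunksize=250: {t250:.3f}s")
  return one_by_one == batched == [i * i for i in items]

if __name__ == "__main__":
  print("Same results:", main())
```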

`Pool.apply` (Blocking, one task at a time)

It blocks the main process until the task finishes and returns the result directly, just like calling the function normally.

Execution flow:

Main process → give task to worker → wait until done → print result → continue

Useful when:
- You want to test multiprocessing with a single task.
- You need the result immediately before moving on.

Example:

from multiprocessing import Pool

def cube(x):
  return x**3

if __name__ == "__main__":
  with Pool(3) as pool:
    print("Sending task...")
    result = pool.apply(cube, (5,))          # BLOCKS until done
    print("Got result:", result)             # Prints 125
  print("Main continues AFTER task is done.")

Output:

Sending task...
Got result: 125
Main continues AFTER task is done.

`Pool.apply_async` (Non-blocking, many tasks at once)

It does not block the main process, which can keep doing other work while workers run. Instead of the result itself, it returns an AsyncResult object (similar to a "future").

The AsyncResult object provides:

.get(): Fetches the result (blocking). If the worker has finished, it returns the value. If it is still running, the program will wait until it is done.
Example:

  from multiprocessing import Pool
  import time

  def square(x):
    time.sleep(2)
    return x * x

  if __name__ == "__main__":
    with Pool(2) as pool:
      async_res = pool.apply_async(square, (5,))
      print("Main keeps running...")
      result = async_res.get()   # Waits here until worker finishes
      print("Result:", result)

  # OUTPUT:
  # Main keeps running...
  # Result: 25

.ready(): Checks whether the task has finished, without blocking. It returns True or False immediately, which makes it handy for polling while the main process does other work.
Example:

  from multiprocessing import Pool
  import time

  def square(x):
    time.sleep(2)
    return x * x

  if __name__ == "__main__":
    with Pool(2) as pool:
      async_res = pool.apply_async(square, (5,))
      while not async_res.ready():
        print("Still working...")
        time.sleep(0.5)
      print("Done:", async_res.get())

  # OUTPUT (the number of "Still working..." lines varies):
  # Still working...
  # Still working...
  # Still working...
  # Still working...
  # Done: 25

.wait(timeout=None): Blocks until the task finishes (or the optional timeout expires) but returns nothing. It just pauses until the worker is done; call .get() afterwards to fetch the result.
Example:

  from multiprocessing import Pool
  import time

  def square(x):
    time.sleep(2)
    return x * x

  if __name__ == "__main__":
    with Pool(2) as pool:
      async_res = pool.apply_async(square, (5,))
      async_res.wait()   # Just wait until it’s finished
      print("Now safe to get:", async_res.get())

  # OUTPUT:
  # Now safe to get: 25

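Supports callbacks:
- callback: runs with the return value when the task succeeds.
- error_callback: runs with the exception when the task fails.
Both run in a result-handling thread of the main process, so they should return quickly to avoid delaying other results.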

Execution flow:

Main process → fire tasks → workers crunch numbers → results/errors handled via callbacks → main waits at the end (if needed)

Example (with callbacks):

from multiprocessing import Pool
import time

def work(x):
  time.sleep(1)  # pretend it's slow
  if x == 5:
    raise ValueError("boom")
  return x * x

def on_ok(result):
  print("Result ready:", result)

def on_err(err):
  print("Error:", err)

if __name__ == "__main__":
  with Pool(3) as pool:
    futures = [
        pool.apply_async(work, (i,), callback=on_ok, error_callback=on_err)
        for i in range(8)
    ]
    print("Main can do other stuff while workers run...")

    # Wait for all tasks
    for f in futures:
      f.wait()
    print("All tasks completed")

Output:

Main can do other stuff while workers run...
Result ready: 1
Result ready: 0
Result ready: 4
Result ready: 9
Result ready: 16
Error: boom
Result ready: 49
Result ready: 36
All tasks completed
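Without an error_callback, a worker's exception is not lost: the AsyncResult stores it and .get() re-raises it in the main process. .get() also accepts a timeout and raises multiprocessing.TimeoutError if the result is not ready in time. A minimal sketch (fail_on_five, main, and the timeout values are illustrative):

```python
import multiprocessing
import time
from multiprocessing import Pool

def fail_on_five(x):
  time.sleep(0.2)
  if x == 5:
    raise ValueError("boom")
  return x * x

def main():
  outcomes = {}
  with Pool(2) as pool:
    ok = pool.apply_async(fail_on_five, (3,))
    bad = pool.apply_async(fail_on_five, (5,))
    slow = pool.apply_async(time.sleep, (2,))  # a task that takes about 2 s

    outcomes["ok"] = ok.get()
    try:
      bad.get()
    except ValueError as exc:  # the worker's exception, re-raised in the main process
      outcomes["bad"] = f"ValueError: {exc}"
    try:
      slow.get(timeout=0.1)
    except multiprocessing.TimeoutError:  # result not ready within 0.1 s
      outcomes["slow"] = "timed out"
  # Leaving the with block terminates the pool, including the still-sleeping task.
  return outcomes

if __name__ == "__main__":
  print(main())  # {'ok': 9, 'bad': 'ValueError: boom', 'slow': 'timed out'}
```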

Quick comparison between `apply` and `apply_async`

Feature	`apply` (Blocking)	`apply_async` (Non-blocking)
Blocking?	Yes (main waits)	No (main continues)
Return value	Direct result	`AsyncResult` object
Suitable for	One task, need result now	Many tasks, async workflow
Callbacks	Not supported	Supported (success & error)
Typical use	Simple testing/debugging	Real-world workloads, batch jobs

Comparison

Method	Blocking?	Input size suitability	Order preserved?	Best for
`map`	Yes (whole batch)	Small–medium lists	Yes	Simple batch jobs
`imap`	Per result (iterator)	Large streams	Yes	Streaming results in order
`imap_unordered`	Per result (iterator)	Large streams	No	Faster feedback
`apply`	Yes	Single task	N/A	One-off process run
`apply_async`	No	Many tasks	Yes if you keep the AsyncResult list; callbacks fire in completion order	Async workflows

Rule of thumb

map: easiest for small datasets.
imap/imap_unordered: best for big datasets.
apply/apply_async: for one-off or async job scheduling.
