DEV Community

Posted on • Originally published at pynerds.com

Multiprocessing in Python

By default, the instructions in a program are executed sequentially: one task has to complete before the next one can start. For example, if a program calls two functions, the first one called runs to completion before the second one starts.

Sequential execution is not always desirable. Some tasks may take too long to complete, blocking the execution of other important tasks. We may also want some tasks to run in the background without interfering with the execution of other tasks. This is where concurrent programming comes in handy.

Concurrent programming aims at managing multiple tasks in a program at the same time, so that some instructions can run in parallel rather than sequentially. There are various ways to achieve concurrency in Python; one of these ways is multiprocessing.

Multiprocessing is normally used to improve the performance of a program by leveraging the multiple processors or cores available in a computer. The primary goal is to parallelize the execution of tasks so that they run simultaneously, which generally leads to a shorter overall execution time, especially for CPU-bound work.

Before we look into more details on how multiprocessing works, we need to first understand what a process is.

Processes in Python

A process is an instance of a program that is running. When you run a program, you are effectively starting a process.

The operating system manages how processes are created and how they run in the background. Multiple processes can be running simultaneously, but each process is allocated its own chunk of memory where it stores and manipulates its data. A process cannot directly read from or write into the space allocated to another process.

In Python, new processes are created and started through instances of the Process class, which is defined in the multiprocessing module of the standard library. The main process itself is not a Process instance that you create: the operating system starts it when you launch the Python interpreter.

Every Python program starts with a single process, called the main process. We can then create other processes from inside that process; the new processes are referred to as child processes.
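As a small illustration of the main process, the sketch below asks the multiprocessing module which process is currently running the code. The helper name describe_current_process is made up for this example; current_process() and os.getpid() are standard library calls. When the script is run directly, the reported name is normally MainProcess.

```python
import os
from multiprocessing import current_process


def describe_current_process():
    """Return the name and process ID of the process running this code.

    Hypothetical helper, used only for illustration.
    """
    proc = current_process()
    return proc.name, proc.pid


if __name__ == "__main__":
    name, pid = describe_current_process()
    print(f"name = {name}, pid = {pid}")
    # The pid reported by multiprocessing matches the one the OS reports.
    print("same pid as os.getpid():", pid == os.getpid())
```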

Multiprocessing happens when we start instances of the multiprocessing.Process class from inside the main process. This leads to multiple processes running in parallel alongside the main process.
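The memory isolation between processes can be observed directly. The following is a minimal sketch that uses the Process class, which the next section explains step by step. The names counter, increment and run_demo are made up for this example. The child process changes its own copy of counter, and the copy held by the main process stays as it was.

```python
from multiprocessing import Process

counter = 0  # module-level variable living in the main process


def increment():
    # Runs in the child process and changes only the child's copy.
    global counter
    counter += 1
    print("child sees counter =", counter)


def run_demo():
    """Run increment() in a child process, then return the parent's counter."""
    p = Process(target=increment)
    p.start()
    p.join()
    # The parent's copy is untouched by what the child did.
    return counter


if __name__ == "__main__":
    print("parent sees counter =", run_demo())
```

To get data back from a child process, some form of inter-process communication is needed, which is touched on in the comparison with multithreading near the end of this article.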

Creating and running processes

To create and run a process, follow the steps shown below:

  1. Begin by importing the Process class from the multiprocessing module.
  2. Create an instance of the Process class with the necessary arguments.
  3. Start the created process by calling its start() method.
  4. Make the main process pause and wait until the started process is done executing by calling the join() method of the created process.

#import the Process class
from multiprocessing import Process

def squares(num):
    for i in range(num):
        print(i ** 2)

if __name__ == "__main__":
    #create a process
    p = Process(target = squares, args = (5, ) )

    #start the process
    p.start()

    #the main process waits for this process
    p.join()

Output:

0
1
4
9
16

In the above example, we created a process, p, and used it to run the squares() function. Let us now look deeper into Process objects and how they work.

The Process class has the following syntax:

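Process(group = None, target = None, name = None, args = (), kwargs = {}, *, daemon = None)

group - Should always be None. It exists only for compatibility with threading.Thread.
target - The function to be run when the process is started.
name - The name used to identify the process. Multiple processes may share a common name.
args - A tuple with positional arguments that will be passed to the target function when it is called.
kwargs - A dictionary with keyword arguments that will be passed to the target function when it is called.
daemon - If set to True, the process runs as a daemon: it works in the background and is terminated automatically when the process that created it exits. If left as None, the value is inherited from the creating process.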
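As a short sketch of how these arguments fit together, the example below passes one positional argument through args and one keyword argument through kwargs, and gives the process a name. The greet function, the name "greeter" and the run_demo helper are made up for this illustration.

```python
from multiprocessing import Process


def greet(greeting, name="world"):
    # Runs in the child process.
    print(f"{greeting}, {name}!")


def run_demo():
    """Start a named process with positional and keyword arguments."""
    p = Process(
        target=greet,
        name="greeter",             # label for identifying the process
        args=("Hello",),            # passed as greet("Hello", ...)
        kwargs={"name": "Python"},  # passed as greet(..., name="Python")
    )
    p.start()
    p.join()
    # exitcode is 0 when the target function finished without errors.
    return p.name, p.daemon, p.exitcode


if __name__ == "__main__":
    print(run_demo())
```

Note the trailing comma in args=("Hello",): without it, the parentheses would not create a tuple.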

Multiprocessing example

Our previous example was not really an example of multiprocessing, because only one child process was running and the main process simply waited for it. In the following example, we create two processes to demonstrate how multiprocessing works.

#import the Process class
from multiprocessing import Process
import time

#The tasks to be executed
def squares(n):
    for i in range(n):
        print(f"square({i}) = {i ** 2}")
        time.sleep(0.1)

def cubes(n):
    for i in range(n):
        print(f"cube({i}) = {i ** 3}")
        time.sleep(0.1)

if __name__ == "__main__":
    #Create the processes
    psquare = Process(target = squares, name = "psquare", args = (5, ) )
    pcube = Process(target = cubes, name = "pcubes", args = (5, ) )

    #start the processes
    psquare.start()
    pcube.start()

    #the main process waits for these processes
    psquare.join()
    pcube.join()

Output:

square(0) = 0
cube(0) = 0
square(1) = 1
cube(1) = 1
square(2) = 4
cube(2) = 8
square(3) = 9
cube(3) = 27
square(4) = 16
cube(4) = 64

If you observe the above output, you can see that the output from the two functions, squares() and cubes(), is emitted alternately. This is because the two functions are actually being run in parallel rather than one after the other. The exact interleaving is decided by the operating system's scheduler, so it may vary between runs.
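One rough way to see the effect of running tasks side by side is to time them. The sketch below is only an illustration: the work function uses time.sleep() as a stand-in for a real task, and the helper names run_sequentially and run_in_parallel are made up for this example. Exact timings depend on the machine and on how long it takes to start a process.

```python
from multiprocessing import Process
import time


def work(seconds):
    # Stand-in for a real task; it simply waits.
    time.sleep(seconds)


def run_sequentially(seconds):
    """Run the task twice, one call after the other, and return the elapsed time."""
    start = time.perf_counter()
    work(seconds)
    work(seconds)
    return time.perf_counter() - start


def run_in_parallel(seconds):
    """Run the task in two processes at once and return the elapsed time."""
    start = time.perf_counter()
    procs = [Process(target=work, args=(seconds,)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequentially(0.5):.2f}s")
    print(f"parallel:   {run_in_parallel(0.5):.2f}s")
```

The sequential run should take roughly twice as long as a single task, while the parallel run should take about as long as one task plus the cost of starting the processes.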

Daemon Processes

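A daemon process is one that runs in the background and is terminated automatically when the main process exits. Such processes can be used to perform background tasks that should not keep the program alive on their own.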

We create a daemon process by setting the daemon argument to True when creating the process.

#import the Process class
from multiprocessing import Process
import time

def task():
    for i in range(20):
        print(i)
        time.sleep(0.1)

#a background task that runs at regular intervals
def background_task():
    for i in range(0, 21, 5):
        print("daemon task")
        time.sleep(0.5)

if __name__ == "__main__":
    #regular Process
    P1 = Process(target = task)
    #daemon Process
    P2 = Process(target = background_task, daemon = True)

    #start the Processes
    P1.start()
    P2.start()

    #wait for both processes to finish
    P1.join()
    P2.join()

Output:

daemon task
0
1
2
3
4
daemon task
5
6
7
8
9
daemon task
10
11
12
13
14
daemon task
15
16
17
18
19
daemon task

Terminating a process

Sometimes we may want to terminate a process prematurely. We can achieve this using either the terminate() or the kill() method. On Unix, terminate() sends the SIGTERM signal while kill() sends SIGKILL. These methods should be used with caution and only when absolutely necessary, because they make the process stop abruptly without performing the necessary clean-up. Exit handlers and finally clauses are not run, so resources such as files and sockets may not be released cleanly.

#terminate
p.terminate()

#kill
p.kill()
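The snippet above assumes that p is an already started process. A fuller sketch could look like the following, where long_task and run_demo are made-up names and time.sleep() stands in for work that would otherwise keep running. Calling join() after terminate() waits until the operating system has actually stopped the process.

```python
from multiprocessing import Process
import time


def long_task():
    # Stand-in for work that would otherwise run for a long time.
    time.sleep(60)


def run_demo():
    """Start a long-running process, stop it early, and report its final state."""
    p = Process(target=long_task)
    p.start()
    time.sleep(0.2)   # give the child a moment to start running
    p.terminate()     # ask the operating system to stop the process
    p.join()          # wait until it has actually stopped
    return p.is_alive(), p.exitcode


if __name__ == "__main__":
    alive, exitcode = run_demo()
    print("alive:", alive)
    print("exit code:", exitcode)
```

A process that was stopped this way reports a non-zero exit code. On Unix, the exit code is the negative of the signal number that ended the process.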

States of a process

A process can be in one of four states:

  1. New - The process has just been instantiated but not yet started.
  2. Running - The process is being executed.
  3. Paused - The process has been paused for some reason, for example it may be waiting for another process to complete its execution.
  4. Dead - The process is not runnable. It may have reached this state naturally after completing its task, or it may have been terminated using terminate() or killed using kill().

Process objects have an is_alive() method, which tells whether a process is currently alive. It returns True from the moment start() returns until the child process terminates, and False otherwise. A process that has not been started yet, or that is already dead, is therefore not alive, while a paused process still is.

p.is_alive()
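As a minimal sketch, the example below records what is_alive() returns before the process is started, while it is running, and after it has finished. The names nap and run_demo are made up for this illustration.

```python
from multiprocessing import Process
import time


def nap():
    # Keeps the child process busy for a short while.
    time.sleep(0.5)


def run_demo():
    """Return what is_alive() reports at three points in a process's life."""
    p = Process(target=nap)
    states = [p.is_alive()]      # new: created but not started
    p.start()
    states.append(p.is_alive())  # started and still sleeping
    p.join()
    states.append(p.is_alive())  # dead: the task has completed
    return states


if __name__ == "__main__":
    print(run_demo())  # [False, True, False]
```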

Multiprocessing vs multithreading

Multithreading is another way that we can achieve concurrency in Python.

At first glance, multithreading looks very similar to multiprocessing. Their outer interface and usage look almost identical; however, the internal workings of the two approaches are very different.

In a nutshell, threads can be thought of as smaller units of execution inside a process, which is why they are sometimes referred to as lightweight processes. A single process can contain multiple threads running concurrently. All threads in a single process share the same resources and data, which generally makes communication between threads quicker.

Multithreading offers a more lightweight alternative to multiprocessing. This is partly because threads, unlike processes, share resources, and thus communication between two threads is easier. Processes, on the other hand, do not share resources and data. This means that for multiple processes to work together, inter-process communication has to be established.
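One common way to establish inter-process communication is the Queue class from the multiprocessing module. The following is a minimal sketch rather than the only approach: the child process puts its result on a queue, and the main process reads it from there. The function names squares and run_demo are chosen for this example.

```python
from multiprocessing import Process, Queue


def squares(n, results):
    # Runs in the child process; sends the result back through the queue.
    results.put([i ** 2 for i in range(n)])


def run_demo():
    """Compute squares in a child process and receive them in the parent."""
    results = Queue()
    p = Process(target=squares, args=(5, results))
    p.start()
    values = results.get()  # blocks until the child has put its result
    p.join()
    return values


if __name__ == "__main__":
    print(run_demo())  # [0, 1, 4, 9, 16]
```

Reading from the queue before calling join() is deliberate: a child process that still has unread items on a queue may not exit until they are consumed.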

However, multiple threads sharing resources also comes with its own limitations, because a mechanism has to be established to ensure data safety. Otherwise, conflicts may arise, such as one thread deleting a resource that is being used by another thread.
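One such mechanism is a lock. The sketch below uses threading.Lock from the standard library to make sure that only one thread at a time updates a shared counter. The names run_demo and add, as well as the thread and increment counts, are arbitrary choices for this illustration.

```python
import threading


def run_demo(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def add():
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may run the guarded block.
            with lock:
                counter += 1

    threads = [threading.Thread(target=add) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_demo())  # 40000
```

Without the lock, two threads could read the same old value of the counter and one of the updates could be lost.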

To prevent such conflicts inside the interpreter itself, CPython (the standard Python implementation) by default ensures that only a single thread at a time can execute Python bytecode. This is achieved through a mechanism known as the Global Interpreter Lock (GIL). As a consequence, threads do not achieve true parallelism for CPU-bound Python code the way processes do. Concurrency in multithreading is instead achieved by switching very quickly between the threads. The switching is fast enough to prevent most noticeable delays for the end user.
