Ali
Decoding the Multitasking Matrix: Async vs Threads vs Processes

Introduction

Starting a new software project in Python these days often means planning for scale from the start. With cloud platforms and growing user bases, failing to leverage multitasking can quickly lead to degraded performance and a bottleneck that restricts future growth. Python offers several powerful options for introducing concurrency and parallelism into applications, but the ins and outs of these approaches still puzzle many developers.

Building highly concurrent and parallel systems requires understanding the sweet spots and limitations of each technique available to us within Python's flexible yet sometimes quirky set of multitasking tools. Should you reach for async? Multiple threads? Or is it time to leverage multiple processes? What factors impact these architectural decisions?

We'll explore Python's wide range of inbuilt multitasking abilities spanning several key approaches - asynchronous programming, threading, and multiprocessing. Through comprehensive comparisons, we'll cut through the confusion surrounding this critical matrix of parallelization options available to Python developers. You'll walk away with clarity of when to utilize async, when to thread for responsiveness, and when to multiprocess for true scaling.

So whether you are a scientist trying to speed up CPU-intensive data pipelines, a web developer looking to improve I/O response times, or an SRE planning for future scale, join us as we explore the purpose-built strengths each option brings to different workloads. Let's master the multitasking matrix built into Python for both fast I/O and faster compute!

Asynchronous Programming

Asynchronous programming allows a program to run without waiting for an earlier operation to complete. This allows long-running I/O-bound operations like network requests, file I/O, etc. to execute out of sequence without blocking the main thread.

In Python, we use the async/await keywords to define asynchronous functions, or coroutines. The async keyword marks a function as a coroutine, while await suspends the coroutine until the awaited operation completes, without blocking other work. Under the hood, Python runs coroutines on a single-threaded event loop.

A key concept that enables asyncio's non-blocking behavior is the event loop. This is a single-threaded loop that asynchronously schedules and manages tasks and I/O operations by leveraging cooperative multitasking. It monitors network connections and request statuses, allowing it to switch between tasks efficiently without utilizing threads or processes.
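As a minimal sketch of this cooperative multitasking (the task names and delays here are arbitrary), consider two coroutines that each yield control back to the event loop while waiting:

```python
import asyncio

async def task(name, delay):
    # Each await hands control back to the event loop,
    # letting other tasks run while this one waits.
    await asyncio.sleep(delay)
    return name

async def main():
    # Schedule both tasks; the loop interleaves them cooperatively.
    return await asyncio.gather(task("a", 0.2), task("b", 0.1))

results = asyncio.run(main())
print(results)
```

Because both tasks wait concurrently, the total runtime is roughly the longest single delay rather than the sum of the delays.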

Popular async frameworks like asyncio (in the standard library) and Tornado provide helpful utility functions and classes for asynchronous programming. The benefits are high concurrency and improved throughput for I/O-intensive workloads. The downside is that CPU-bound operations gain little from running asynchronously, since a coroutine doing heavy computation still blocks the single-threaded event loop.

Asynchronous Programming with aiohttp

aiohttp is an asynchronous HTTP client/server framework that allows you to perform asynchronous HTTP requests. It is particularly well-suited for scenarios where you need to make multiple HTTP requests concurrently without waiting for each one to complete.

Example

Let's explore a simple example using aiohttp to retrieve public IP addresses concurrently. In this example, we'll define an asynchronous function to make HTTP requests to an IP information API.

pip install aiohttp
import asyncio
import aiohttp

async def get_ip():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://api64.ipify.org?format=json") as response:
            data = await response.json()
            print(f"IP Address: {data['ip']}")

async def main(num_tasks):
    tasks = [get_ip() for _ in range(num_tasks)]
    await asyncio.gather(*tasks)

# Specify the number of tasks you want
num_tasks = 3

# Run the main function
asyncio.run(main(num_tasks))

In this example, we define the get_ip() coroutine to make an asynchronous HTTP request to the ipify API, retrieving information about the public IP address. The main function creates a specified number of tasks to run the get_ip() coroutine concurrently using asyncio.gather().

By utilizing aiohttp in this manner, you can achieve high concurrency and improved speed for I/O-intensive operations, making it a valuable tool for scenarios like web scraping, API interactions, and more.

Threads

Threads allow a Python program to run multiple execution flows concurrently within a single process. Each thread has its own call stack and local state, but all threads share the process's global memory. This makes it simple to coordinate and share data between threads.

However, Python's Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, regardless of how many CPU cores are available. So while threads are easier to program with than separate processes, purely computational work won't see a speedup.

Threads shine for I/O-bound tasks like serving web requests, handling user interactions, performing parallel read/write operations, and anything else that releases the GIL while waiting. The threading module and the concurrent.futures library provide helpful primitives for working with threads in Python.

When utilizing threads, developers need to be aware of potential race conditions or deadlocks. Since threads share state, careful coordination and locking of resources is required - otherwise issues can come up with threads competing to access mutable data.
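As a small illustration of guarding shared state (the counter and thread count are arbitrary), a threading.Lock can serialize a read-modify-write section so concurrent updates are never lost:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # Without the lock, the read-modify-write below could interleave
        # across threads and lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 with the lock held around each update
```

The with-lock block guarantees only one thread mutates the counter at a time, at the cost of serializing that critical section.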

Example

Here is an example worker thread pool that performs HTTP requests in parallel:

from concurrent.futures import ThreadPoolExecutor
import requests

def get_ip():
    response = requests.get("https://api64.ipify.org?format=json")
    data = response.json()
    print(f"IP Address: {data['ip']}")

def main(num_threads):
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        # Submit one task per thread; the with-block waits for all to finish
        future_tasks = [executor.submit(get_ip) for _ in range(num_threads)]

# Specify the maximum number of threads you want to be used concurrently
num_threads = 3  

main(num_threads)

In this code, we use the ThreadPoolExecutor from the concurrent.futures module to create a pool of worker threads. The purpose is to concurrently execute multiple instances of the get_ip function, which makes a web request to retrieve information about the public IP address.
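The example above discards the futures it creates. As a sketch of collecting results as they finish (using a hypothetical square function for illustration), as_completed yields each future when it is done:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=3) as executor:
    # Map each future back to the input it was submitted with.
    futures = {executor.submit(square, n): n for n in (2, 3, 4)}
    results = {}
    for future in as_completed(futures):
        # future.result() returns the value or re-raises any worker exception.
        results[futures[future]] = future.result()

print(results)  # {2: 4, 3: 9, 4: 16} (insertion order may vary)
```

Calling result() is also where exceptions raised inside a worker surface, so it doubles as error handling for the pool.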

Multiprocessing

Multiprocessing allows Python code to leverage multiple CPUs and cores by running separate Python interpreter processes to achieve true parallelization. The multiprocessing module provides an API for spawning and interacting with processes similar to threading.

Because each process gets its own Python interpreter instance, the Global Interpreter Lock issue is avoided. This makes multiprocessing ideal for CPU-bound jobs like numerical computations, machine learning, scientific workloads etc. Processes have higher overhead than threads, but fully utilize modern multi-core hardware.

Inter-process communication mechanisms like queues and pipes allow processes to coordinate. Shared memory can also be used for highly efficient data interchange.

Multiprocessing avoids the GIL and allows true parallelism, but has a higher overhead than threads. There are also challenges like ensuring processes are properly joined and cleaned up to avoid orphaned processes.

Example

Here is an example using a Process pool to distribute a CPU-intensive operation:

import multiprocessing 

def fibonacci(n):
    if n <= 1:
       return n
    else:
       return fibonacci(n-1) + fibonacci(n-2)

if __name__ == "__main__":
    nums = [28, 30, 32]   
    with multiprocessing.Pool() as pool:
        result = pool.map(fibonacci, nums)

    print(result)

The code defines a recursive fibonacci function. In the main section, a multiprocessing Pool distributes the computation across multiple worker processes. The pool.map() method calculates the Fibonacci numbers for the given inputs in parallel, and the results are collected and printed once all processes complete.

Comparison

Parallelism

  • Asyncio uses a single thread and an event loop, so there is no parallelism. Threads are limited by the GIL for CPU work. Multiprocessing fully utilizes multiple CPUs for true parallelism.

Overhead

  • Asyncio and threads have low overhead: coroutines are just lightweight function objects, and threads are cheap compared to processes. Processes carry higher overhead because each runs a separate interpreter.

Speed

  • Asyncio is fast for I/O because it never blocks on waiting, but it does not speed up computation. Threads help with I/O and with operations that release the GIL. Multiprocessing is fastest for CPU-intensive work.

Scaling

  • Asyncio scales well for I/O throughput. Threads scale for I/O but not for CPU. Multiprocessing scales CPU capacity best.

State

  • Threads (and async tasks, which run in one thread) can share state directly. Processes have isolated memory and must communicate through mechanisms like queues or pipes.

In summary, for I/O heavy apps, async and threads work well. For CPU intensive programs, multiprocessing parallelizes better. Combining async networking with process pools or threads coordinating async allows optimized applications leveraging the strengths of each.

Best Practices for Hybrid Architectures

There are many effective patterns for combining these approaches to build optimized hybrid architectures:

  • Multiprocessing Pools and Asyncio - Use multiprocessing to parallelize CPU-bound tasks while also leveraging asyncio to manage non-blocking I/O operations.
  • Thread Pools and Asyncio - When I/O parallelization is more important than CPU parallelization, asyncio + thread pools are very effective.
  • Process Pools and Threads - When both fast I/O and maximizing CPU utilization are critical, process pools can parallelize computations while threads handle coordination, networking, and I/O.

When looking to optimize speed and scalability, thoughtfully combining asynchronous programming, multithreading, and multiprocessing often yields the best results. A high-performance approach is to leverage asyncio's event loop for non-blocking I/O, thread pools for parallel I/O and GIL-releasing operations, and process pools for CPU-bound parallel work. This lets the application issue non-blocking I/O requests, coordinate pools of computational processes from threads, and achieve full multi-core parallelism all at once.

Careful design is required to ensure proper coordination and resource management, but the speed and scalability gains make hybrid architectures well worthwhile. Following Python best practices - avoiding shared mutable state, using queues for communication, and cleaning up resources - helps manage the complexity inherent in these hybrids. The end result is very high throughput from combining asynchrony, I/O parallelism, and multi-core computation in a holistically designed architecture.
