In a synchronous program, each task must finish before the next one starts. An asynchronous program can switch between tasks, so tasks run concurrently without blocking each other.
In Python, let's first understand the concepts of a coroutine and a subroutine. We are all familiar with functions, also known as subroutines, procedures, and so on. A subroutine is just a normal Python function defined with def.
On the other hand, a coroutine is defined with async def and can be paused and resumed at await (or yield) points. A normal Python function becomes a coroutine function when it is defined with the async def syntax.
Using await, a coroutine can temporarily pause its execution while the respective operation (I/O, a network request, etc.) is in progress.
It is important to note that coroutines do not make the code multi-threaded; rather, coroutines run in an event loop that executes in a single thread. Calling a function defined with async def does not run its body immediately; it returns a coroutine object. When the await keyword is encountered, the current coroutine is paused, and control is passed back to the event loop.
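As a minimal sketch of this behaviour (the coroutine name greet is made up for illustration), calling a coroutine function only creates a coroutine object; nothing runs until the event loop drives it:

import asyncio

async def greet():
    await asyncio.sleep(1)       # control goes back to the event loop here
    return "hello"

coro = greet()                   # creates a coroutine object; nothing has run yet
print(type(coro))                # <class 'coroutine'>
print(asyncio.run(coro))         # the event loop runs the coroutine to completion: prints "hello"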
The CPU is less utilized (or might be free) when I/O (or similar operations) are in progress. For instance, copying data to an external hard drive is an I/O operation where the CPU only initiates and accepts the I/O requests. The CPU can be better utilized in such cases for performing other tasks! The event loop continuously monitors the awaitable (coroutine, Task, or Future) until it is completed. Once the execution of the awaitable or the newly picked-up task is complete, the event loop restores the execution of the paused coroutine.
See the code below:
import asyncio
import sys
import time
from datetime import datetime
async def task1():
    print("Enter " + sys._getframe().f_code.co_name + " " + str(datetime.now().time()))  # Get function name
    await asyncio.sleep(2)  # Could be an I/O operation, network request, database operation, and more
    ret_info = await task2()
    print("After sleep " + sys._getframe().f_code.co_name + " " + str(datetime.now().time()))
    return "task1"

async def task2():
    print("Enter " + sys._getframe().f_code.co_name + " " + str(datetime.now().time()))
    await asyncio.sleep(2)
    print("After sleep " + sys._getframe().f_code.co_name + " " + str(datetime.now().time()))
    return "task2"

async def main():
    print("Enter main")
    start_time = time.perf_counter()
    ret_info = await task1()
    print(f"Data received from the task1: {ret_info}" + " " + str(datetime.now().time()))
    ret_info = await task2()
    print(f"Data received from the task2: {ret_info}" + " " + str(datetime.now().time()))
    end_time = time.perf_counter()
    print("Exit main")
    print(f'It took {round(end_time - start_time,0)} second(s) to complete.')

if __name__ == '__main__':
    # main coroutine
    asyncio.run(main())
Here is the output:
Enter main
Enter task1 10:27:58.983919
Enter task2 10:28:00.985972
After sleep task2 10:28:02.988581
After sleep task1 10:28:02.988645
Data received from the task1: task1 10:28:02.988658
Enter task2 10:28:02.988674
After sleep task2 10:28:04.991250
Data received from the task2: task2 10:28:04.991315
Exit main
It took 6.0 seconds to complete.
Here's the explanation:
- asyncio.run() starts an event loop and runs the coroutine main(), so execution enters the main() coroutine.
- main() prints "Enter main", and start_time records the precise start time for measuring the total duration.
- main() calls and awaits task1():
  - Control passes to task1().
  - Because of await, main() is paused until task1() completes.
- Inside task1(): it prints "Enter task1" along with the current time, and asyncio.sleep(2) pauses task1 for 2 seconds asynchronously.
- During this time, other async tasks could run, but since main() is just waiting, nothing else does. After 2 seconds, task1 resumes and calls and awaits task2().
- Inside task2() (called from task1):
  - Prints the entry message.
  - Sleeps for another 2 seconds.
  - Prints the after-sleep message and returns "task2".
- Back in task1: once task2() finishes, task1() prints its after-sleep message and returns "task1".
- Back in main(): prints "Data received from the task1: task1", then calls and awaits task2() again (same as before: a 2-second delay).
- After task2() returns:
  - Prints the result of task2.
  - Ends the program and prints the total duration.
task1() and task2() are defined as asynchronous functions (coroutines). During the execution of task1(), the await keyword is encountered with an async sleep of 2 seconds; this pauses the coroutine and yields control back to the event loop.
task1() then awaits task2(), so its execution is paused until the task2() coroutine completes. Once task2() finishes, task1() resumes, and its return value is printed in main().
To sum up, asyncio is built on coroutines managed by the event loop, with tasks scheduling them and futures holding results.
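In the example above, the awaits run one after another, which is why the total is 6 seconds. To illustrate the scheduling side, here is a separate minimal sketch (not part of the article's example; the coroutine names fetch_a and fetch_b are made up) in which asyncio.gather() wraps coroutines in Tasks so the event loop can overlap their sleeps:

import asyncio
import time

async def fetch_a():
    await asyncio.sleep(2)   # simulated I/O
    return "a"

async def fetch_b():
    await asyncio.sleep(2)   # simulated I/O
    return "b"

async def main():
    start = time.perf_counter()
    # gather() schedules both coroutines as Tasks; their sleeps overlap,
    # so the total time is about 2 seconds instead of about 4
    results = await asyncio.gather(fetch_a(), fetch_b())
    print(results, f"{time.perf_counter() - start:.1f}s")

if __name__ == '__main__':
    asyncio.run(main())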
Multithreading
With threads, Python achieves concurrency via context switching: the interpreter frequently switches between threads, giving the illusion of parallel execution or multitasking.
Multiple threads facilitate the execution of background tasks without blocking the main program.
Consider the diagram below to understand how multiple threads exist in memory:
Before moving ahead, it’s important to note that Python threading is different from threading in other languages like Java, C#, or Go, where threads can execute simultaneously on a multicore processor. Due to the design of the Python interpreter (CPython), only one thread can run Python code at a time.
Thread execution is governed by the Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time, even on multi-core systems.
What’s really interesting is that threads in Python can coexist (run concurrently), with the interpreter switching between them rapidly (context switching). For example: Thread A runs a bit, then Thread B runs a bit, then back to A. But they cannot run at the exact same time on multiple cores, because the GIL allows only one thread to execute Python bytecode at once. So, for CPU-bound tasks, threads are concurrent, not parallel.
To fully use multiple CPU cores in Python, use the multiprocessing module, which creates separate processes, each with its own interpreter and GIL. Each process then runs independently, giving true parallelism.
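A minimal sketch of this idea, assuming a CPU-bound function (the name cpu_bound is made up for illustration), using a process pool to spread the work across cores:

from multiprocessing import Pool

def cpu_bound(n):
    # A CPU-heavy loop; each call runs in its own process with its own GIL
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # The four calls run in parallel across separate processes/cores
        results = pool.map(cpu_bound, [10_000_000] * 4)
    print(results)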
As sketched below, we can create three threads that independently download three webpages concurrently, each taking a varying amount of time depending on network conditions.
The main thread will wait until all three threads have finished downloading, since join() is called on each thread from the main thread.
Once all three downloads are finished, the data is passed to the create_single_webpage() function to create a single webpage.
If join() is not used, the main thread will run create_single_webpage() immediately, before the downloads are finished, resulting in an error.
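The original code for this example is not shown in the article, so the following is only a rough sketch of the described approach; the URLs, download_page(), and the create_single_webpage() body are hypothetical placeholders:

import threading
import urllib.request

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]
pages = {}

def download_page(url):
    # Each thread downloads one page; the GIL is released while waiting on the network
    with urllib.request.urlopen(url) as resp:
        pages[url] = resp.read()

def create_single_webpage(downloaded):
    # Hypothetical helper that stitches the downloaded pages together
    return b"".join(downloaded[url] for url in urls)

threads = [threading.Thread(target=download_page, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()          # the main thread waits here until all downloads complete

combined = create_single_webpage(pages)
print(len(combined), "bytes combined")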
Advantages of Threading
Threaded programs can run faster on computer systems that have multiple CPUs, although in CPython this benefit mainly applies to I/O-bound work because of the GIL.
Threads of a process share memory, including global variables: if a variable's value is changed in one thread, the change is visible to all threads.
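As a minimal sketch of this shared state (the counter variable and worker function are made up for illustration), two threads increment the same global, with a Lock to keep the updates consistent:

import threading

counter = 0                 # global state shared by all threads
lock = threading.Lock()

def worker():
    global counter
    for _ in range(100_000):
        with lock:          # without the lock, increments from different threads could interleave
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)              # 200000: the change made by each thread is visible to the others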
In the next part of this series, I’ll dive deeper into threads and how they differ from asyncio tasks.

