A post or so ago, as a follow-up to A Dive into Python Implementation, we discussed the differences between multi-threading and multi-processing, prompted by IronPython's ability to let multi-threaded code use multi-core processors. So in this article, I thought we'd open up the boxes for a while just to see what these two mean. Sit back, grab a coffee, and let's fire up those neurons.
Now, to understand the concepts, we should first differentiate between a process and a thread.
Say I write a program, sample_prog.py. Once I run it and let it do what it does best, I have created an instance of the program. By definition, this is a process.
To break this down even further, if my program sample_prog.py was, say, taking multiple files, editing the title of each, and moving them to a different directory once processed, I could break the process down into threads. In this case, I have two threads:
- Read the file and change the title within it.
- Copy this file into a different directory of processed files.
Don't get it wrong: this is the same program instance (also called a process) doing different things.
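To make the two threads above a little more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the function names, the use of a queue to hand files from one thread to the other, and the idea that a file's "title" is simply its first line.

```python
import queue
import shutil
import threading
from pathlib import Path

_DONE = object()  # sentinel that tells the mover thread there is nothing left


def retitle_files(paths, new_title, handoff):
    """Thread 1: replace the first line (our assumed 'title') of each file."""
    for path in paths:
        lines = Path(path).read_text().splitlines()
        body = lines[1:]  # everything after the old title
        Path(path).write_text("\n".join([new_title] + body) + "\n")
        handoff.put(path)  # hand the edited file over to the mover
    handoff.put(_DONE)


def move_files(handoff, processed_dir):
    """Thread 2: move each edited file into the processed directory."""
    Path(processed_dir).mkdir(parents=True, exist_ok=True)
    while True:
        path = handoff.get()  # blocks until thread 1 hands something over
        if path is _DONE:
            break
        shutil.move(str(path), str(Path(processed_dir) / Path(path).name))


def process_files(paths, new_title, processed_dir):
    """One process, two threads: an editor and a mover."""
    handoff = queue.Queue()
    editor = threading.Thread(target=retitle_files, args=(paths, new_title, handoff))
    mover = threading.Thread(target=move_files, args=(handoff, processed_dir))
    editor.start()
    mover.start()
    editor.join()
    mover.join()
```

Calling process_files(["a.txt", "b.txt"], "New title", "processed") would retitle both files and move them into processed/, all within a single program instance.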
If we were to come back to something closer to home, you have opened a tab on TheGreenCodes right now, while at the same time, I assume, you have another tab open on your favourite site.
In this case, you have one program running, your Firefox browser (of course it's not Internet Explorer, is it?); this is the instance, that is, a process. Within this process, you have two tabs: in essence, two threads of the same process.
Say you decide to download a file from the page while you continue reading the rest of the installation instructions. You have two threads running here, based on the same browser instance. Do they execute at the same time? On a single core, not at all: at any given time, a single thread is running. By now you may be wondering what this means, as you can clearly see the download progress bar moving. What is actually happening is context switching. The threads do not run at the same time, but rather keep switching, based on the time allocated to each thread or on a thread finishing altogether (the download, in this case). To you, it will look like it is all happening at once.
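Here is a rough way to watch that switching happen. It is only a sketch: the download and the reading are both faked with time.sleep, and the function name browse_and_download is made up for this example. Each sleep is a point where a thread gives up its turn, so the two activities end up taking turns in the shared log.

```python
import threading
import time


def browse_and_download(chunks=3, paragraphs=3):
    """Run a fake download and a fake reading session as two threads."""
    events = []  # shared log, filled in by both threads

    def download():
        for i in range(chunks):
            time.sleep(0.05)  # waiting on the "network"; the other thread can run
            events.append(f"downloaded chunk {i + 1}")

    def read_page():
        for i in range(paragraphs):
            time.sleep(0.05)  # "reading"; the other thread can run
            events.append(f"read paragraph {i + 1}")

    downloader = threading.Thread(target=download)
    reader = threading.Thread(target=read_page)
    downloader.start()
    reader.start()
    downloader.join()
    reader.join()
    return events


for line in browse_and_download():
    print(line)
```

The exact order of the printed lines can change from run to run, which is the point: the scheduler decides who goes next, not you.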
Back to cores for a moment. Your underlying hardware, unless it comes from a time-travelling adventure, almost certainly has more than one core. Take a dual-core machine, for instance: multithreading can now run separate threads on dedicated cores. That's why hardware with more cores seems faster: it simply is. It takes advantage of the multiple cores to run the threads in parallel. It's a win-win: no core is overburdened, as they share the load, and you get a faster experience.
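If you are curious how many cores your own machine reports, the standard library can tell you. A tiny sketch; the helper name available_cores is made up here, and note that os.cpu_count() counts logical cores, which may be more than the physical ones.

```python
import os


def available_cores():
    """Number of logical cores reported by the OS (falls back to 1 if unknown)."""
    return os.cpu_count() or 1


print(f"This machine reports {available_cores()} logical core(s).")
```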
Now, to see it in action, let's do something simple to illustrate just how threading works:
```python
import threading


def my_amazing_function(user_input):
    print("Duplicating user input for thread 1:")
    print(user_input * 10)
    print("Do many other things for my_first_thread: ...")


def my_other_function(user_input):
    new_output = user_input + "*"
    print(new_output * 10)
    print("Do many other things for my_second_thread: ...")


if __name__ == "__main__":
    my_input = "7 "

    # create a thread for each function
    thread1 = threading.Thread(target=my_amazing_function, args=(my_input,))
    thread2 = threading.Thread(target=my_other_function, args=(my_input,))

    # start both threads. Go forth and do many things
    thread1.start()
    thread2.start()

    # wait here (in the parent process) until both threads have finished
    thread1.join()
    thread2.join()
```
Consider, in this instance, that we have a program with two functions. Feel free to add more and adjust accordingly.
These two functions each take a parameter. Leaving the function definitions aside, let's take a look at the threads.
```python
thread1 = threading.Thread(target=my_amazing_function, args=(my_input,))
thread2 = threading.Thread(target=my_other_function, args=(my_input,))
```
With thread1, we said 'create a thread that runs the function my_amazing_function'. Now, from our definition of my_amazing_function, it should take an argument, hence the extra parameter: args=(my_input,). The same principle applies to thread2. The target can be a function, a class, or any other callable; the power is in your hands.
We then set off to start each of these threads. You can relate this to your OOP principles of creating an object, then calling a method of the object.
Once the threads are started, we call join() on each of them so that the parent process (our program instance) waits for them to finish instead of leaving them hanging in the system. Simple as that. We created two threads from one process.
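To see what joining actually buys you, here is a small sketch with a made-up slow_task. The names finished and alive_after_join are only there for illustration. The call to join() blocks the parent until the thread is done, so anything the thread produced is ready to use afterwards.

```python
import threading
import time

finished = []  # filled in by the worker thread


def slow_task():
    time.sleep(0.1)  # pretend to do some work
    finished.append("done")


worker = threading.Thread(target=slow_task)
worker.start()

worker.join()  # the parent waits here until slow_task returns
alive_after_join = worker.is_alive()

print(finished, alive_after_join)
```

Without the join(), the print could run before the thread had appended anything.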
The result of this, in most programs, is simple:
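- Shorter execution time, especially when the threads spend much of their time waiting (on a download or a file, for example).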
- Application responsiveness.
Take, for instance, not having to wait for your browser to complete the download before scrolling down the page: that's responsiveness. This is the fundamental idea of threading.
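A quick way to feel the difference in execution time is to time the same waiting-heavy work done one after another and then in threads. This is a sketch under assumptions: fake_download stands in for a real download using time.sleep, and the helper names are invented for this example.

```python
import threading
import time


def fake_download(seconds):
    time.sleep(seconds)  # stands in for waiting on the network


def run_sequentially(delays):
    """Do every download one after another and return the elapsed time."""
    start = time.perf_counter()
    for delay in delays:
        fake_download(delay)
    return time.perf_counter() - start


def run_in_threads(delays):
    """Give every download its own thread and return the elapsed time."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_download, args=(d,)) for d in delays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


delays = [0.1, 0.1, 0.1]
print(f"sequential: {run_sequentially(delays):.2f}s")
print(f"threaded:   {run_in_threads(delays):.2f}s")
```

The threaded run should take roughly as long as the slowest single download, because the threads do their waiting together.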
For safety reasons, we cannot have threads modify the same point of memory (RAM) at the same time. Hence, in CPython, we have the GIL (Global Interpreter Lock), which lets the interpreter run only one thread at a time. Even so, without proper management we might have a scenario where, during context switching, each thread holds a resource that the other one needs, so neither can proceed: a deadlock.
If both threads use a particular resource, access to it is guarded by a lock: a thread acquires the lock, uses the resource, and releases it again. This means that no single thread holds on to the resource indefinitely, leaving the others starved during the context switch.
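In your own code, the usual tool for this is threading.Lock. Below is a minimal sketch with a shared counter as the "resource"; the function name count_with_lock and its parameters are made up for illustration. Each thread takes the lock only for the one update and releases it straight away, so the others get their turn.

```python
import threading


def count_with_lock(n_threads=4, increments=10_000):
    """Let several threads bump one shared counter, one at a time."""
    total = 0
    lock = threading.Lock()

    def add_many():
        nonlocal total
        for _ in range(increments):
            with lock:  # acquire, update, release
                total += 1

    threads = [threading.Thread(target=add_many) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


print(count_with_lock())
```

With the lock in place, the final count is always n_threads * increments, no matter how the threads were switched.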
The concept behind the GIL is memory management or, as some might call it, memory safety: in particular, garbage collection, which in this case means reference counting. Other interpreters that rely on reference counting include PHP and Perl.
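You can peek at reference counting yourself in CPython with sys.getrefcount. A small sketch; the function name refcount_demo is invented here, and the absolute numbers it returns are a CPython implementation detail, so only the changes between them matter.

```python
import sys


def refcount_demo():
    """Show the reference count going up and down as names come and go."""
    item = object()
    before = sys.getrefcount(item)

    alias = item  # a second name for the same object
    with_alias = sys.getrefcount(item)

    del alias  # the second name goes away again
    after_del = sys.getrefcount(item)

    return before, with_alias, after_del


before, with_alias, after_del = refcount_demo()
print(with_alias - before, after_del - before)
```

When the count of an object drops to zero, CPython frees it. Keeping those counts correct while several threads run is a big part of what the GIL protects.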
How do other languages handle this?
For one, we have our very own C and C++, which rely on manual memory management. These trust the developer with the power to allocate and free memory themselves, hence the functions malloc, realloc, calloc, and free. Don't get me started on pointers right now; you'll need another brew of coffee. Among others, we also have automatic reference counting, or ARC, for Swift and Objective-C.
You can have multiple threads, as many as you want! It just gets a little tricky handling all of them at once if you are not careful about what exactly is started, what is stopped, what is working as a daemon, and so on. Get to know which programs benefit from threads and use them accordingly.
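As a parting sketch, here is one way to keep track of a background thread: mark it as a daemon so it never keeps the program alive on its own, and use a threading.Event to ask it to stop. The heartbeat function and the names stop_requested, ticks, and monitor are all made up for this example.

```python
import threading
import time

stop_requested = threading.Event()
ticks = []  # timestamps recorded by the background thread


def heartbeat():
    """Record a tick every 20 ms until someone asks us to stop."""
    while not stop_requested.is_set():
        ticks.append(time.monotonic())
        stop_requested.wait(0.02)  # sleep, but wake up early if stop is requested


# daemon=True: this thread will not block the program from exiting
monitor = threading.Thread(target=heartbeat, daemon=True)
monitor.start()

time.sleep(0.1)  # the main thread does its own work here
stop_requested.set()  # what is started...
monitor.join(timeout=1)  # ...should also be stopped

print(f"recorded {len(ticks)} ticks, still running: {monitor.is_alive()}")
```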