Discussion on: ⚡ ️Blazing Python 🐍 Scripts with Concurrency ⚡️️

View post

Hi CED, glad you wrote this article. There are a few definitions that contain imprecisions or incorrect statements that I think should be addressed, mostly because they trip a lot of developers. Concurrency is hard, there's no going around that

In the beginning you say:

Concurrency simply means having multiple computations happening at the same time.

This is strictly incorrect, as it's a possible definition of parallelism, not concurrency. Unfortunately this topic gets tricky really quickly because there is not one way to define both, let's use "looking at the system from the outside" as a lens.

Concurrency means that in, let's say, 10 seconds, more than one unit of work makes progress, but if you observe the system the system might be executing only one of those computations at any given time. This is why single thread async programs can be concurrent, because every independent unit of work makes progress in the lifetime of the program (basically using a mechanism similar to that of a basic operating system: rotating among unit of works and giving them attention for a small period of time).

If, by using concurrency and by observing the system, more than one computation is executed at the same time (literally), as my body can breathe and type without alternating between the two, then you achieved parallelism. So in this scenario the system has to be concurrent already (both breathing and typing advance while I'm writing this comment) but it has to have the resources to be parallel (my typing motor skills and the breathing apparatus do not communicate and work independently).

All the ThreadPoolExecutor subclass is doing is simply creating a Pool of Threads that are executed at the same time

Same as above, they could be, maybe not. You hint at the "not" scenario when the threads are not executing at the same time when you talk about the GIL later on and boundness.

Multiprocessing involves the use of two or more CPUs within a single computer system.

I guess you mean cores here.

By default, multiprocessing is not possible with the Python 🐍 programming language due to the GIL or Global Interpreter Lock hindrance.

This is a typo. The GIL limits multithreading, not multiprocessing. I wouldn't call it a hindrance but more a design choice, but that's entirely a personal point of view, the objective part is that I'm sure you meant multithreading here, not multiprocessing.

Hope this helps, concurrency is tricky :-)

CED • May 28 '19

Thanks for the input @ryhmes. I knew concurrency was really trick that why I was a bit skeptical about writing it. Changes have been made

rhymes • May 28 '19

Eheh don't worry, it's just a complicated topic and the fact that a lot of tutorials around still confuse the two doesn't help. It took me a while as well to have the picture clear in my head.

Thanks for the corrections!

Ian Kirker • May 29 '19

If your use of these terms is accurate to current use, I feel like they’ve been misappropriated a little.

Concurrent literally means running simultaneously, while parallelism expresses multiplicity but also similarity, so it seems like parallelism should be reserved for code where the same block or instruction is being executed concurrently (whether via SIMD instructions or SIMD code) while concurrency would refer to e.g. execution of multithreaded code where one thread handles IO, and one handles computation, but both still run simultaneously.

By extension, code written to be able to execute multiple threads simultaneously should be incapable of actual concurrency if executing on a single (non-pipelined) processor core because in any sufficiently small slice of time, only one thing is happening at once. That is to say, context-switching shouldn’t be counted as concurrency.

rhymes • May 29 '19

Thanks for the comment Ian. It's interesting because there's no universal consensus on the definition of the two. It's not the first time that I hear slightly different definitions of the two terms. I believe it's also because we tend to create definition in abstract and not relative to a context.

Taking into account your premises: how would you define code that runs in a single thread but that's able to advance multiple units of computation? The age old c10k problem implemented using select/poll/epoll/kqueue/iocp for example.

Ian Kirker • May 30 '19

Hm, yes, this almost seems analogous to the question of "how many things are you juggling at a time" vs "how many things do you have in your hand when you're juggling N things".

As to the c10k question: it's not a question I'd thought about -- it's still a form of multithreading, isn't it? Only one thing happens at once, but context-switching between threads, even if they're not formally labelled as threads, say, as green threads, 'coroutines', or even just handler contexts, can occur. It seems like "multiplexing" might even be an appropriate term to use, analogously to the use in signal transmission.