I was refactoring a program and discovered that my concurrency was not working well: I had submitted a number of tasks to a concurrent.futures.ThreadPoolExecutor, but some of them were waiting until others had finished.
This was a problem because the program is launched once every hour and it was not finishing in time for the next run. I ended up with several instances of the program running at the same time (not a concurrency problem, since the program is mostly well behaved, but definitely not a desirable way of working).
The problem? Well, these executors (ThreadPoolExecutor, ProcessPoolExecutor, ...) have a max_workers limit. For ThreadPoolExecutor, the documentation defines the default as follows:
Changed in version 3.8: Default value of max_workers is changed to min(32, os.cpu_count() + 4). This default value preserves at least 5 workers for I/O bound tasks. It utilizes at most 32 CPU cores for CPU bound tasks which release the GIL. And it avoids using very large resources implicitly on many-core machines.
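The machine where this program runs is a cheap 1 vCPU machine, so (assuming os.cpu_count() reports 1) the default works out to min(32, 1 + 4) = 5 workers, and this is a problem: any task beyond the fifth has to wait for a free worker. However, my tasks are very light: they just do some input, wait some time, do some output, and that's all.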
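To make the arithmetic concrete, here is a minimal sketch that reproduces the documented 3.8 formula. The helper name default_max_workers is my own, not part of the standard library, and newer Python versions may compute the default differently, so treat it as an illustration of the quoted text only.

```python
import os


def default_max_workers(cpu_count=None):
    """Reproduce the default quoted above: min(32, cpu_count + 4).

    Hypothetical helper for illustration; it is not part of concurrent.futures.
    """
    if cpu_count is None:
        # os.cpu_count() can return None, so fall back to 1 in that case.
        cpu_count = os.cpu_count() or 1
    return min(32, cpu_count + 4)


if __name__ == "__main__":
    print("1 vCPU machine:", default_max_workers(1))
    print("64 core machine:", default_max_workers(64))
    print("this machine:", default_max_workers())
```

On a 1 vCPU machine this gives 5, which matches the "at least 5 workers" mentioned in the documentation.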
The solution? You can count the threads you need (or at least get a rough idea of the number) and set an adequate value for max_workers.
In my case:
with concurrent.futures.ThreadPoolExecutor(max_workers=75) as pool:
That is, 75 workers. As stated previously, my tasks are light enough that this causes no problems, and it allows the program to run all of them concurrently.
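If you want to see the effect for yourself, the following sketch compares a small pool with a pool that has one worker per task. The names light_task and run_batch are hypothetical, and time.sleep is only a stand-in for my real tasks (some input, a wait, some output), so the timings are illustrative rather than a measurement of the actual program.

```python
import concurrent.futures
import time


def light_task(task_id):
    """Hypothetical light task: it mostly waits, like an I/O bound job."""
    time.sleep(0.1)  # sleeping releases the GIL, so other threads can run
    return task_id


def run_batch(n_tasks, max_workers):
    """Run n_tasks light tasks and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the order the tasks were submitted.
        results = list(pool.map(light_task, range(n_tasks)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, few = run_batch(75, max_workers=5)    # the default on a 1 vCPU machine
    _, many = run_batch(75, max_workers=75)  # one worker per task
    print(f"5 workers: {few:.2f}s, 75 workers: {many:.2f}s")
```

With 5 workers, the 75 tasks are processed in batches, so the run should take many times longer than with 75 workers, where all the waits overlap. Keep in mind that this only pays off for tasks that spend most of their time waiting; CPU-heavy tasks would not benefit from this many threads.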