Tiexin Guo

Posted on • Originally published at Medium

DevOps with Python: Python "concurrent.futures" Concurrency Tutorial - A Real-World Example

Author's note: this blog post ISN'T a beginner's guide to Python or DevOps.

Basic knowledge of Python, DevOps, Kubernetes, and Helm is assumed.

0 Background

0.1 Why Python

Programming languages rise and fall over time.

TIOBE, a Dutch software quality assurance company, has been tracking the popularity of programming languages. According to its programming community index (and its CEO, Paul Jansen, for that matter), Python ranks No.1 now: "for the first time in more than 20 years we have a new leader of the pack: the Python programming language. The long-standing hegemony of Java and C is over."

0.2 Why DevOps with Python

To quote Real Python:

Python is one of the primary technologies used by teams practicing DevOps. Its flexibility and accessibility make Python a great fit for this job, enabling the whole team to build web applications, data visualizations, and to improve their workflow with custom utilities.

On top of that, Ansible and other popular DevOps tools are written in Python or can be controlled via Python.

Plus, I'm a big fan of Python's easy-to-read, no-bracket code style.

One might wonder why easy-to-read is so essential. The computer has no problem executing code with ambiguous variable names, lengthy functions, a single file with thousands of lines of code, or all of the above. It will still run properly, right?

Well, yes. But to quote Knuth:

Programs are meant to be read by humans and only incidentally for computers to execute.

All the methodologies and ideas, like refactoring, clean code, naming conventions, code smells, etc., were invented so that we humans can read the code better, not so that computers can run it better.

0.3 Why Concurrency

OK, this one is easy:

Because we can.

Jokes aside, the reason is, of course, performance:

Concurrent code is (usually) faster.

For instance, if you have multiple Helm charts installed in one namespace of a Kubernetes cluster and you want to purge all the Helm releases, of course, you can uninstall them one by one, waiting for the first release to be uninstalled, then start uninstalling the second, etc.

For some applications, the Helm uninstall part can be slow.

Even for a few simple charts, uninstalling them concurrently can still drastically save time.

Based on a local test, uninstalling three Helm charts (Nginx, Redis, and MySQL) one by one takes about 0.8 seconds, while doing it concurrently takes 0.48s: a whopping 40% reduction.

If the scale of the problem goes up, say you have tens of charts to uninstall across multiple namespaces, the time saved becomes significant.

Next, let's deal with this particular example using Python.


1 The Task

You have multiple teams and developers who share the same Kubernetes cluster as the dev environment.

To achieve resource segregation, one namespace is assigned to each developer. Each developer needs to run a few helm install commands to get their apps and dependencies up and running so that they can develop and test against them.

Now, since there are many namespaces, many Helm releases in each namespace, and many pods taking up many nodes, you might want to optimize cost by deleting all those pods at the end of working hours so that the cluster can scale down and save some VM dollars.

You want some form of automation that uninstalls all releases in some namespaces.

Let's tackle this issue in Python. For demonstration purposes, we will install nginx, redis and mysql into the default namespace, then write some automagic stuff to delete 'em.

Let's go.


2 Preparing the Environment for Testing Our Automation

Not that we are doing test-driven development, but before writing any code, let's create a local environment as a mock of this issue at hand so that we have something to test our automation script.

Here, we use Docker, minikube, and Helm. If you haven't installed them yet, check out their official websites first.

Start your local Kubernetes cluster by running:

minikube start

Then, install some Helm charts in the default namespace, which we will use automation to delete later:

# make sure we select the default namespace
kubectl config set-context --current --namespace=default

# add some helm repos and update
helm repo add nginx-stable https://helm.nginx.com/stable
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# install three applications that we will use automation to delete
helm install nginx nginx-stable/nginx-ingress
helm install redis bitnami/redis
helm install mysql bitnami/mysql

Local testing mock done.


3 Non-Concurrent Version

First, let's write some single-thread, non-concurrent code to solve this issue.

We will use the subprocess module to run helm list to get all the releases, loop over them, and helm uninstall each. Nothing fancy here; we're only using Python to run some CLI commands.

Talk is cheap; show me the code:

https://gist.github.com/IronCore864/ca1e74a65f4a97937d93c63c094e9d32
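For readers who don't want to follow the gist link, here's a minimal sketch of that non-concurrent approach. It assumes helm is on your PATH, and the function names are illustrative, not necessarily the exact ones from the gist:

```python
import json
import subprocess


def parse_release_names(helm_list_json: str) -> list[str]:
    """Extract release names from `helm list -o json` output."""
    return [release["name"] for release in json.loads(helm_list_json)]


def get_releases(namespace: str) -> list[str]:
    """Run `helm list` and return all release names in the namespace."""
    result = subprocess.run(
        ["helm", "list", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return parse_release_names(result.stdout)


def uninstall(release: str, namespace: str) -> None:
    """Uninstall a single Helm release, blocking until it's done."""
    subprocess.run(["helm", "uninstall", release, "-n", namespace], check=True)


if __name__ == "__main__":
    # one release at a time: each uninstall blocks the next
    for release in get_releases("default"):
        uninstall(release, "default")
```

The total runtime here is the sum of all the individual uninstall times, which is exactly what we'll improve next.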


4 Introducing concurrent.futures

4.1 multiprocessing and threading

Before jumping right into concurrent.futures (as advertised in the title of this blog), let's talk multiprocessing and threading for a bit:

  • The threading module lets you work with multiple threads (also called lightweight processes or tasks) — multiple threads of control sharing their global data space.
  • multiprocessing is a package that supports spawning processes. It sidesteps the Global Interpreter Lock by using subprocesses instead of threads.

When choosing between the two, simply put (might not be 100% precise, but that's the gist):

  • If your task is CPU-intensive, go for multiprocessing (which bypasses the GIL issue by utilizing multiple processes instead of threads).
  • If your task is I/O-intensive, the threading module should work.
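As a tiny illustration of the lower-level threading module that concurrent.futures wraps, here's a hand-rolled version of "run a function in several threads and collect results" (purely illustrative):

```python
import threading

results = []
lock = threading.Lock()


def worker(n: int) -> None:
    # pretend this is some I/O-bound work; guard shared state with a lock
    with lock:
        results.append(n * n)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [0, 1, 4, 9]
```

Managing threads, locks, and result collection by hand is exactly the boilerplate that concurrent.futures takes off your plate.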

4.2 What is concurrent.futures, anyway?

Now that we've got these two modules out of the way, what's concurrent.futures?

It is a higher-level interface to start async tasks and an abstraction layer on top of threading and multiprocessing. It's the preferred tool when you just want to run a piece of code concurrently and don't need the extra functionalities provided by the threading or multiprocessing module's API.

4.3 Learn with an Example

OK, enough theory; let's get our hands dirty and learn from an example, taken from the official documentation:

import concurrent.futures
import urllib.request

URLS = ['http://www.cnn.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        data = future.result()
        print('%r page is %d bytes' % (url, len(data)))

Some observations:

  • An executor, whatever it is, is required. (Which can either be ThreadPoolExecutor or ProcessPoolExecutor, according to the doc.)
  • The Executor.submit() method "submits" (or, in plain English, "schedules") the function calls (with parameters) and returns a Future object.
  • The concurrent.futures.as_completed method returns an iterator over the Future instances, yielding each one as it completes.
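Those three pieces can be demonstrated without the network dependency of the URL example; this toy version is my own, not from the docs:

```python
import concurrent.futures


def square(x: int) -> int:
    return x * x


results = {}
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # submit() schedules the call and immediately returns a Future
    future_to_arg = {executor.submit(square, n): n for n in (2, 3, 4)}
    # as_completed() yields each Future as soon as it finishes
    for future in concurrent.futures.as_completed(future_to_arg):
        results[future_to_arg[future]] = future.result()

print(sorted(results.items()))  # [(2, 4), (3, 9), (4, 16)]
```

Note that as_completed yields futures in completion order, not submission order, which is why we keep the future-to-argument mapping.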

5 The Concurrent Code to Solve the Task

Once we understand the syntax and how it actually works, with a bit of copy-pasting from the example and a little creativity, it's easy to convert our non-concurrent version from the previous section into something concurrent. To put it all together:

See the code below:

https://gist.github.com/IronCore864/ad21130aa796d407624805c5342201db
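Again, for those skipping the gist, the concurrent version looks roughly like this (helper names are illustrative; assumes helm is on your PATH):

```python
import concurrent.futures
import json
import subprocess


def get_releases(namespace: str) -> list[str]:
    """Return all Helm release names in the given namespace."""
    result = subprocess.run(
        ["helm", "list", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return [release["name"] for release in json.loads(result.stdout)]


def uninstall(release: str, namespace: str) -> str:
    """Uninstall one release; returns its name for progress reporting."""
    subprocess.run(["helm", "uninstall", release, "-n", namespace], check=True)
    return release


def purge(namespace: str) -> None:
    """Uninstall all releases in the namespace concurrently."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future_to_release = {
            executor.submit(uninstall, name, namespace): name
            for name in get_releases(namespace)
        }
        for future in concurrent.futures.as_completed(future_to_release):
            print(f"uninstalled {future.result()}")


if __name__ == "__main__":
    purge("default")
```

Since each uninstall is mostly waiting on the Kubernetes API (I/O-bound), a thread pool is the right executor here.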

Voila!

Note that the concurrent.futures part has precisely the same structure as the official concurrent.futures example.


Summary

concurrent.futures cheat sheet:

from concurrent import futures

# or futures.ProcessPoolExecutor() for CPU-bound work
with futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_objects = {
        executor.submit(some_func, param): param for param in params
    }

    for f in futures.as_completed(future_objects):
        param = future_objects[f]
        do_something(param, f.result())

Rule of thumb: use ThreadPoolExecutor for I/O-intensive workloads and ProcessPoolExecutor for CPU-intensive workloads.
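To illustrate the CPU-intensive side of that rule of thumb, swapping in ProcessPoolExecutor is nearly a one-line change (the workload here is a toy stand-in):

```python
import concurrent.futures


def busy_sum(n: int) -> int:
    # CPU-bound toy workload: sum of squares below n
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    # each call runs in its own process, sidestepping the GIL;
    # the __main__ guard is required by multiprocessing's spawn start method
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        print(list(executor.map(busy_sum, [10, 100, 1000])))
```

For helm uninstall, which spends its time waiting on the Kubernetes API rather than burning CPU, the thread pool remains the better choice.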

If you enjoyed this article, please like, comment, subscribe. See you in the next piece.
