DEV Community

Cover image for DeepCode's Top Findings #9: Deadlocks
cu_0xff 🇪🇺 for DeepCode.AI

Posted on • Originally published at Medium

DeepCode's Top Findings #9: Deadlocks

DeepCode offers an AI-based Static Program Analysis for Java, JavaScript and TypeScript, and Python. As you might know, DeepCode uses thousands of open source repos to train our engine. We asked the engine team to provide some stats on the findings. On the top suggestions from our engine, we want to introduce and give some background in this series of blog articles.

Language: Python
Defect: Deadlock (Category Defect)
Diagnose: Using wait() can deadlock if the child process prints larger output. Use communicate().

This finding is sponsored by an ansible extension for spacewalk. As usual, you can open it in (deepcode.ai)[https://deepcode.ai]. The repo is ansible/ansible.

def spacewalk_report(name):
    """Yield a dictionary form of each CSV output produced by the specified
    spacewalk-report
    """
    cache_filename = os.path.join(CACHE_DIR, name)
    if not os.path.exists(cache_filename) or \
            (time.time() - os.stat(cache_filename).st_mtime) > CACHE_AGE:
        # Update the cache
        fh = open(cache_filename, 'w')
        p = subprocess.Popen([SW_REPORT, name], stdout=fh)
        p.wait()
        fh.close()

    ...

Let us follow along with the code above: First, it generates a filename and checks if either the file does not exist or is outdated. If so, it generates the file. Then it asks subprocess to spawn a command and write the results into the file as the output pipe. Then it waits until this is done and closes the file. The rest is not of importance to us for now.

Background:

It made me think about pure functions. Let me explain a bit. A pure function is defined by two aspects: First, it returns the same value whenever it is called with the same parameters. So, it does not depend on the state of the program or environment. Secondly, it has no side-effects, so no changes to non-local variables or I/O streams. One nice thing on these pure functions is that we can easily test them, right? We can write unit tests easily for example.

I guess the problem is that we are actually paid for the not pure parts of our applications. Changing the external state by writing data or switching off the oven is what our customers actually want. We hardly can stay in a pure world only.

The unpure world though is a jungle, my friends. Because unpure means physics kicks in. As an example: In the theory of relational databases no indexes are needed. Because in theory, the access time is zero. In reality, hard drive heads need to move and that takes time. So, indexes speed up access time.

In our little example, we are calling an external command and wait for it to finish. I mean what could go wrong?

A deadlock means that two processes are waiting on different resources for each other. The child waits for the parent to do something, while the parent waits for the child. Both of them are in a stalemate. As you can see, DeepCode warns us that wait can deadlock the child process. In between child and parent, there is a limited pipe I/O buffer. Given the child would fill this buffer with data, the system would force the child to wait until the parent process reads off significant amounts of data from the buffer. With typical mutexes, the developer is normally aware of deadlock situations but with these I/O buffers, developers tend to forget about it.

It was found to be an issue that was corrected by 41 projects out of our training set. Actually, if you click on more info, it brings you here which is the official Python documentation telling you the same. It states the same as DeepCode: To use Popen.communicate() instead. Also, see the examples DeepCode shows to see how to use the command. The communicate method takes responsibility for avoiding deadlock; it only sends the next chunk of the stdin string when the subprocess is ready to read it, and it only tries to read the next chunk of stdout or stderr when the subprocess is ready to provide it.

Don't get me wrong: communicate is not the silver bullet. You should think about the size of your data and choose the proper means to communicate between the sub-processes.

Yet, there is even more to this story: Obviously, we have no error management here. No time out, no way to find out that the hard-drive just filled up (yeah, physics with these inconvenient issues :-) ). That is simply a bad style.

Whenever interacting with the external, unpure, world, make sure to ask yourself these questions:
(1) Do I call something that might not come back in time or maybe never?
(2) Do I receive data from an external, untrusted source?
(3) What error conditions might arise resulting from the size or makeup of the data, timing issues?

With the list in hand, decide what aspects to address in code by handling them and whatnot (as it is too unlikely or pain-versus-gain does not favour a handler). But make sure, you write them down as not handled error cases.

Next, maybe you want to add some testing on the ones you addressed.

Top comments (0)