DEV Community

Cover image for DeepCode's Top Findings #11: Synchronizing Strings
cu_0xff 🇪🇺 for DeepCode.AI

Posted on • Originally published at Medium

DeepCode's Top Findings #11: Synchronizing Strings

DeepCode offers an AI-based Static Program Analysis for Java, JavaScript and TypeScript, and Python. As you might know, DeepCode uses thousands of open source repos to train our engine. We asked the engine team to provide some stats on the findings. On the top suggestions from our engine, we want to introduce and give some background in this series of blog articles.

Language: Java
Defect: Do not use Strings for synchronization.
Diagnose: Do not use Strings for synchronization, this could lead to unwanted deadlocks and race-conditions. Strings with the same value might be represented as the same object in the JVM.

I found this one in the JDK 14 code. I copied parts of it out to this repo here https://github.com/CU-0xff/jdk14_parts to make things manageable. So, if you want to follow along, open that repo in DeepCode.ai. It is a really nasty one, as it forces you to know some of the implementation details of the JVM to understand the problem. Here is the code of interest:

class WindowsPath implements Path {
...
    // normalized path
    private final String path;
...
        // cache the resolved path (except drive relative paths as the working
        // directory on removal media devices can change during the lifetime
        // of the VM)
        if (type != WindowsPathType.DRIVE_RELATIVE) {
            synchronized (path) {
                pathForWin32Calls = new WeakReference<String>(resolved);
            }
        }
...

Background:

Synchronized

Let us introduce the idea of synchronized() first. In a multi-threaded environment, we have the issue to ensure that access to variables is managed properly. Java provides synchronized as functions or blocks to manage concurrent access. Below is a simple example:

class ListOfNumbers {
    int number;

    void printListOfNumbers(final int n) {
        this.number = 0;
        for (int i = 1; i <= 10; i++) {
            try {
                // Simulate some substantial work
                Thread.sleep(400);
            } catch (final Exception e) {
                System.out.println(e);
            }
            this.number += n;
            System.out.print(String.format("%d ",this.number));
        }
    }
}

class MyThread extends Thread {
    ListOfNumbers l;
    int flag;

    MyThread(final ListOfNumbers l, int flag) {
        this.l = l;
        this.flag = flag;
    }

    public void run() {
        l.printListOfNumbers(flag);
    }
}


class TestSynchronization1 {
    public static void main(final String args[]) {
        final ListOfNumbers obj = new ListOfNumbers();// only one object
        final MyThread t1 = new MyThread(obj, 1);
        final MyThread t2 = new MyThread(obj, 7);
        t1.start();
        t2.start();
    }
}

Here we have a class ListOfNumbers, that stores an integer and prints a series of it by looping and adding it. Now, we have two threads that are working on one object instance of ListOfNumbers and it happens as you would expect it. We would like to see 1 2 3 4 5 6 7 8 9 10 7 14 21 28 35 42 49 56 63 70on the console, but we get 8 8 9 16 24 24 25 32 39 39 46 47 55 55 62 62 69 69 76 77, then 1 1 9 9 17 10 24 25 32 32 40 40 48 48 55 56 57 57 65 65, then...

This teaches us two things: (1) Obviously, the two threads are overwriting each other as they are pointing to the same object. (2) The outcome is unpredictable as it depends on the sequence of how the threads are called.

With a simple change, we can sort the issue:

class MyThread extends Thread {
...
    public void run() {
        synchronized(l) {
            l.printListOfNumbers(flag);
        }
    }
...

With this, a thread asks for control over the instance before changing its state by calling the method. So far, so good.

Why not using synchronized() with strings?

In Java, Strings are immutable; they cannot be changed. If you overwrite an existing one, you will generate a new instance. There are several reasons why the language designers decided to do so, amongst them is that Strings are thread-safe (as they do not change, several threads can use them concurrently). But there is something that is called String Pool. Under the hood, the JVM optimizes the data usage. If the program asks for a string to be created and one with the same content already exists in the pool, the JVM returns the already existing instance instead of creating a new one.

Now, you get the larger issue of using a String in synchronized. You could end up locking your application because two strings have the same content (aka point to the same object) but within the code, they are different variables. Good luck debugging this one. By the way, see the DeepCode-provided explanation in this regard. Spot on I would say. In our example above, the path variable is a string and it is not guaranteed to be unique.

If you are using synchronized(), make sure to use an object that is not a string. One way for our example could be to use synchronized(this) as it would use an instance of a complex class.

This example also shows the difference between dynamic and static analysis. In the dynamic case, we would have to have the case of having two strings with the same content as a test case or by running the system, falling over it. In the static case, we test if the case is within all possible cases and ring the bell.

As you see from the text, DeepCode found this issue fixed in 57 projects, so it is not that uncommon. Have a look at your own code on deepcode.ai.

Discussion (0)