DeepCode offers an AI-based Static Program Analysis for Java, JavaScript and TypeScript, and Python. As you might know, DeepCode uses thousands of open source repos to train our engine. We asked the engine team to provide some stats on the findings. On the top suggestions from our engine, we want to introduce and give some background in this series of blog articles.
Language: Java
Defect: Unsanitized Input (Category Security 1)
Diagnose: Unsanitized input flows from a $Source$ and is used as a path to write files in java.io.File. This may result in a Path Traversal vulnerability.
Today's example was sponsored by openjdk/jdk14. If you want to follow along, go to deepcode.ai, on your dashboard, load the project and follow along.
Background:
As an external attacker, what vectors can you use to comprise a system? An obvious one is the input data you present to it. This is the basic idea of a whole family of attacks, ranging from SQL injection, cross-side-scripting, to path traversal and more. Some of them, like SQL injection or XSS, are so prominent these days, I guess, we are all aware and sensible. Still, we see them happen (and not only preserved in old code). Yet, it is the tip of the iceberg. It is really hard to comprehend (1) the different ways external data creeps in and (2) the sometimes complex paths the data takes. In Top Finding #5, we talked about an attack called ReDoS. Here we have a path traversal. So, let's have a look:
private boolean extract() throws IOException {
Path dir = options.extractDir != null ? options.extractDir : CWD;
try (JmodFile jf = new JmodFile(options.jmodFile)) {
jf.stream().forEach(e -> {
try {
ZipEntry entry = e.zipEntry();
String name = entry.getName();
int index = name.lastIndexOf("/");
if (index != -1) {
Path p = dir.resolve(name.substring(0, index));
if (Files.notExists(p))
Files.createDirectories(p);
}
try (OutputStream os = Files.newOutputStream(dir.resolve(name))) {
jf.getInputStream(e).transferTo(os);
}
} catch (IOException x) {
throw new UncheckedIOException(x);
}
});
return true;
}
}
The code we look at is part of the Jmod handler. Jmod are files very similar to jar-files as they are ZIP files. Jmod are not executable directly but rather used during compilation or linking.
Let us follow along with the flow of the above code: In the first line, an instance of a Path
class which is read from a global variable. Next, it opens a Jmod file and starts reading from its entries. As mentioned above, a Jmod represents actually a ZIP archive of files and subdirectories. So, our routine takes one after the other, checks if the path exists or directories have to be generated and generates if necessary. Next, resolve()
is used to generate the final file name and off we go copy the data from source to sink.
Now, think what happens if the name of the ZIP file includes relative paths? Well, with this we can actually traverse out of the given directory and literally overwrite whatever we have the right to. The attack gained the name ZIP slip for obvious reasons. OK, now you can argue "Jmods are pretty close to our heart. Why shouldn't we trust them?" and the answer should be - at least in my mind - why should we?
The best practice, in this case, is to use getCanonicalPath()
as this function resolves all relative elements in a path. Then check if the path leads into a subdirectory that is clean to use.
String canonicalDirAllowed = ...
File os = new File(destinationDir, e.getName());
String canonicalOSPath = os.getCanonicalPath();
if (!canonicalOSPath.startsWith(canonicalDirAllowed + File.separator)) {
throw new UncheckedIOException("Entry is outside of the target dir: " + e.getName());
}
DeepCode learned this from other repositories. The example drawn is using the normalize()
function which does the same thing. Let us inspect the suggestion of DeepCode.
What you can see is in the two lower highlighted red how the original code did the exact same thing as our Jmod code. The file path was taken from the ZIP file and used directly. If you look at the corresponding green parts, you can see how the code was changed to normalize, check the path, and to log an error accordingly. It provides a nice recipe to follow and do it right.
For me, this is an amazing example of how learning can help to understand and find the right solution.
Top comments (0)