DEV Community

Java is Unsound: The Industry Perspective

Ross Tate on January 24, 2017

Recently Nada Amin and I discovered that Java and Scala are unsound. We submitted the finding and related discussion to OOPSLA, an academic confere...

Read full post

Andrea Giammarchi • Jan 24 '17

This is a great post but did you really have to ruin it underlying some poor JavaScript knowledge?

const increment = i => i + 1;
increment("98765432");

That's a "987654321" because types in JS matters too.

Accordingly, that sentence fired back already:

"Sure, in JavaScript this might make sense, but Java is implemented very differently"

No, JS developers aren't that stupid or anti-academic, so thanks for dropping any reference to a programming language that has nothing to do with this report, and/or your findings, about Java.

From a JS, Python, PHP, C, C#, and Java developer.

Ross Tate • Jan 24 '17

Hi Andrea!

I know that's what it does in JavaScript. That's the point; I'm contrasting with JavaScript. JavaScript's implementation is designed to be safe without static types; consequently it heavily uses run-time type information to determine if an operation is safe and how it should execute. Java, however, relies on static type information to make run-time lookups more efficient (e.g. virtual-method tables and interface-method tables). Consequently, while this is safe in JavaScript and has a sensible semantics, in Java it would result in nonsense and even mangle memory (well, in more complex cases, not this one).

So sorry this came off as deriding JavaScript. It was meant as a positive: JavaScript handles this sensibly, whereas Java fails miserably (and for good reason in both cases).

Andrea Giammarchi • Jan 24 '17

Hi Ross,
we're not quite on the same line here.

"while this is safe in JavaScript and has a sensible semantics, in Java it would result in nonsense"

That's the thing, it's about the time the industry understand JavaScript is not just a toy driven by kids.

Nobody in the JS developers world I know would ever write that side effecting monstrosity that clearly denote whoever wrote that function had no idea what she/he was doing.

So yes, that example is nonsense in JS too.

JIT also would deopt from a function invoked with randomly typed arguments, so that everything that can be optimised upfront from VM side is vane .... because Java chaps still think JS is all about not caring about types?

This is yet another misunderstanding of what JavaScript is, how it is used, and what are developers common-sense practices to not abuse its coercion-based type system that can produce side effects.

As summary: is it really necessary to bring in poor, and unrealistic, JavaScript usage in this post? Do you really need JS to prove your point in here? I don't think so.

People bought domains to drop the useless comparison and the misunderstanding, that reading in 2017 finally somebody demonstrated Java ain't sound so it's actually not even superior to other PLs, but it has to underline a non-real-world deriding case of another language, feels a bit lame.

Use Python, PHP, use whatever else you want, not its misunderstood rival for the last 25 years please.

Thank you!

javascript.crockford.com/javascrip...
javascriptisnotjava.io

Seamus Connor • Jan 24 '17

But the behavior of the above is well defined for JavaScript. But it shouldn't happen in Java, and creates behavior that contradicts the intent of the language designers.

None of the code in this article is something you would expect in production.

Andrea Giammarchi • Jan 24 '17

The intent of the Java code is to sum num + 1 and there is an allusion that's what Javascript magically is going to do since the intent, expecting a number as input and a number as output, is clear to anyone.

Saying that passing a string is fine on JS side is insulting all JavaScript developers that perfectly understand the intent of that function and would never pass a bloody string to it.

Is it really that hard to understand?

Ross Tate • Jan 24 '17

Ooohhh, I think I get what you're saying. Sorry, I was so focused on the example that I didn't carefully consider the words around it. I changed the phrasing to make my intent clearer. Can you let me know if the new version addresses the concern? If so I'll update the other renditions of the article.

Andrea Giammarchi • Jan 24 '17

It's better ... but ... TypeScript, as well as Flowtypes, allow types in JS via tooling that would never let you write the following code:

function increment(i:Number):Number {
  return i + 1;
}
// double error, the inpout is wrong, expected result too
const countdown:String = increment("98765432");

The JS env and surrounding tooling got better since 90's so once again,
is it really necessary to include at all in this document, an outdated, misleading, inappropriate, abuse of a scripting programming language that already has been historically confused, or wrongly compared, to Java ?

What I am saying is that the correct replacemente for this sentence:

"Sure, JavaScript can successfully execute this program despite its nonsensical nature,"

would probably be like:

"Sure, a version of the code that wouldn't have types specified in it, used by a rookie developer that wouldn't care about JavaScript primitive types and their coercion possible side effects, could successfully execute this program despite its nonsensical nature and wrong expectations in terms of returned value ..."

Do you see this is not going to really add anything to this document?

Do you see there's no point at all to even name JS in a scenario that nobody would be since year 2000 ?

Anyway, thanks for, at least, trying to understand what I meant.

Best Regards

Ross Tate • Jan 25 '17

If I felt like this example wasn't necessary, I would remove it. But the problem is that it's in there because people were getting confused about specifically the issues I discuss there, namely that soundness is about the interplay between the type system and the semantics. Covariant arrays, raw types, and JavaScript were each explicitly mentioned by various people, so I discussed each one.

Moving forward, it sounds like, although not perfect, I've at least managed to make things a little better. Ideally I could provide the fully clarified explanation, but attention spans are limited, so I'll leave it at this compromise.

Thanks for helping me understand your perspective better, even though it took me a while to at least get a little close!

W. Brian Gourlie • Jan 25 '17

The juxtaposition seemed appropriate and certainly didn't appear to be an affront to javascript or javascript developers. I'm thoroughly confused by the strong reaction to it.

Ross Tate • Jan 25 '17

Oh, Andrea, I forgot to say, that website you linked to is amazing! Java is to JavaScript as ham is to hamster. Fantastic! I'll keep that site in mind next time a friend gets confused.

Shon Vella • Jan 25 '17

Minor nitpick: "Similarly, whenever get is called on a List, the runtime checks if the returned value is actually a String in case a raw type was misused, throwing a ClassCastException otherwise." Yes, in the code given there will be a ClassCastException, but not because the runtime system checks the return value of the get(), but rather because the runtime system checks the value of the assignment against the type of the variable to which it is being assigned. Because of type erasure, the runtime system doesn't know that the list is only supposed to hold strings, but it does know that the variable "one" can only hold a string and sees that the value being assigned is not one.

Ross Tate • Jan 25 '17

Sorry, I was being imprecise. What happens is actually somewhere in between. The bytecode has to be well-typed; you can't assign an object to a String variable (i.e. slot on the stack). So the runtime will never even execute the code if such an ill-typed assignment is in there. Consequently, the compiler inserts casts into the bytecode to make it well-typed. The runtime doesn't know why those casts are there, just like it doesn't know the list is supposed to be a List<String>, it just executes them because it was told to. This seemed like too much detail to go into, so I just sloppily called everything the runtime, heheh.

Cole Markham • Jan 24 '17

First of all, I certainly appreciate your work. It's never safe to assume that something works without having some proof to back that up.

Regarding the correctness of the compiler on the first example, I will point you to the Eclipse bug report I filed where Stephan Herrmann explains it why this should not compile according to the JLS: bugs.eclipse.org/bugs/show_bug.cgi... It seems like there are multiple versions of the JLS in use by the different compilers, and that is the one of the reasons for the discrepancies.

I like the last example with the use of an Interface to hide the broken implementation, very clever. Just goes to show that you need to verify that your 3rd party libraries don't have bugs.

Ross Tate • Jan 24 '17

I'm glad you liked it! And note that even a proof doesn't ensure you're safe. There are actually multiple proofs that Java's type system is sound. The problem is they don't handle all of Java (can you imagine how many years that would take to do?), and for historical reasons they happen to not handle null. Consequently, all the proofs were correct, but they failed to model Java correctly and missed this bug. Validation vs. Verification, if you know the software-engineering terms.

As for the bug, you made me realize I wrote this version of the paper a little incorrectly. In the original paper, I made sure to emphasize that the type-checker was not to blame. The reason is that, even though the type-checker is rejecting some valid Java code, that's necessarily going to happen. Type-checking Java is undecidable, and so it's impossible for a compiler to be always right. Our examples happen to fit into the space where some compilers succeed and some compilers fail. So in fact the program should type check, but it seems harsh to blame a compiler for occasionally failing to do the impossible, especially when the JLS's suggested algorithm fails to do so as you have found!

Michael Ostapenko • Jan 25 '17

Nice catch. Please put a comma before 'and T' or even better use 'while' or the like in the following sentence, because it is very confusing.

Many people out the type Constrain and say it should be invalid because Constrain requires its second type argument to be a subtype of its first and T, and every supertype of T, is not a subtype of U.

Ross Tate • Jan 25 '17

Yuck, that sentence is messy! Thanks for letting me know. I fixed it, or at least I fixed it a bit.

Michael E. Allen • Jan 26 '17

As mentioned by many already, this is a great post and really helps clarify the issue both technically and practically. I actually tried the code you posted and found that Java7 (1.7.0_67) compiled the program and ran to the expected runtime exception. When I tried Java8 (1.8.0_77), however, the program failed to compile and flagged the "return bind.upcast(constrain, t);" line with this error:
The method upcast(Unsound.Constrain, B) in the type Unsound.Bind is not applicable for the arguments
(Unsound.Constrain, T)

Does that mean that Java8 is more sound, or that the compiler is broken?

Ross Tate • Jan 26 '17

The compiler is broken. Also note that the paper has other examples that you should try. So far we've been able to get every compiler to (correctly) compile unsound code. Even the future Java compiler!

Paul Gier • Jan 27 '17

Similar to an earlier comment, I noticed the statement "whenever get is called on a List, the runtime checks if the returned value is actually a String" seems to be incorrect. The compiler and runtime will happily let you do something like this:

Integer myInt = (Integer)(Object)strs.get(0);

So the runtime did not enforce that the object returned from "get" was actually a string. The ClassCastException only occurs if you try to assign the value to a String.

Ross Tate • Jan 30 '17

Right. I simplified things for pedagogical purposes. The compiler inserts casts in order to make the resulting bytecode well typed. Since typically the result of strs.get would be used as a String, typically the cast is inserted. But examples such as yours show that this is not always done.

W. Brian Gourlie • Jan 24 '17

Amazing article. I wish every research paper was accompanied by something similar.

jlongo62 • Jan 25 '17

Does c# suffer these kinds of security related issues? Does it have it's own kind of different issues ? If so can you elaborate on them?

Ross Tate • Jan 25 '17

I don't think it does. The subtitle of the paper is "The Existential Crisis of Null Pointers" because there seem to be two key requirements for a language to exhibit this problem:

Something that represents existential quantification strong enough that it can introduce new evidence. So for Java, the type Constrain<U,? super T> is essentially the type "there exists some type X that is a supertype of T and a subtype of U", which in turn implies T is a subtype of U if it's inhabitable.
Implicit null pointers, meaning a null value that inhabits most types, or at least the ones used to form the existential quantification.

C# satisfies #2 but not #1. It doesn't really have any form existential quantification, and certainly not anything strong enough to introduce new evidence. Thus I don't think there's a way to use null to trick the type-checker into thinking two unrelated types are related.