Unicorn Developer

Posted on Mar 26

Closed-world assumption in Java

#java #jvm #graalvm #softwaredevelopment

Building a Native Image for a Java application requires configuring reflection, proxies, and other dynamic Java mechanisms. But why is this necessary if the JVM handles all of this automatically? To answer that, we need to look at the differences between static and dynamic compilation in Java.

Issue

At PVS-Studio, we are currently designing new static analyzers for JavaScript/TypeScript and Go to complement our existing tools. The Java team was tasked with developing the first version of the JavaScript/TypeScript analyzer. To avoid distributing a JRE and to improve performance, we decided to build the JavaScript/TypeScript analyzer as a Native Image, i.e., to turn the Java application into a native program.

However, everything has its price, even performance. GraalVM immediately laid out its requirements: we had to explicitly specify in the configuration which classes would be accessed via reflection, which proxies would be created, which resources would be included in the binary, and which functions would be called via the Foreign Function & Memory API.

This raised a natural question: "Why does the JVM handle all dynamic behavior automatically, while a native build requires configuration files?"

The intuitive response: "That's just how GraalVM works." But that explanation feels unsatisfying. It gives the impression that Native Image might be an immature technology or a questionable compromise. Of course, that's not the case. To understand why, we need to step back from Java for a moment and look at a more fundamental question: what does it actually mean to execute a program?

Recap: What is a program, and how is it executed?

When we write code in Java, Kotlin, or another object-oriented language, we work with classes, methods, and packages. However, from the processor's perspective, none of this exists. All code ultimately gets converted into binary data: sets of instructions, constants, and data.

To pass control to a section of code or to access data, the processor needs an exact address. These addresses can be obtained in several ways:

fixed (hard-coded);
relative (calculated based on the current position);
dynamic (using tables, pointers, and late binding).

In statically compiled languages such as C, C++, Go, or Rust, address resolution happens during compilation via the compiler and linker.

A linker is a program that combines object files (output by the compiler) into a single executable file or library. It matches function calls with their definitions.

During linking, all symbols and call sites are known, so the linker has no trouble building the binary.

A symbol is a named entity in the code (for example, a function or a variable).

Dynamic Java

Unlike statically compiled languages, where all addresses are resolved by the time the program runs and no additional loading occurs, Java was originally designed as a dynamic runtime environment. Its model is fundamentally different: the program is not compiled into a finished binary with fixed addresses, but remains a set of bytecode that is interpreted and optimized at runtime.

Several fundamental language properties help the virtual machine implement this behavior:

classes are not all loaded at startup, but as needed via ClassLoader;
in bytecode, method calls are represented by symbolic references rather than addresses;
the actual method that will be called is determined only at runtime;
class metadata (fields, methods, annotations) is preserved and accessible at runtime.
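
The first property is easy to observe on a regular JVM. Below is a minimal sketch, assuming a hypothetical nested class Heavy. Class loading itself is not directly visible from Java code, so the sketch uses the static initializer as a stand-in: it runs on the first active use of the class, not when the program starts.

```java
import java.util.ArrayList;
import java.util.List;

class LazyLoadingDemo {
    // Records the order in which things happen.
    static final List<String> EVENTS = new ArrayList<>();

    // Hypothetical class: its static initializer runs when the JVM initializes it.
    static class Heavy {
        static {
            EVENTS.add("Heavy initialized");
        }

        static String greet() {
            return "hello";
        }
    }

    static List<String> run() {
        EVENTS.add("run started");
        String greeting = Heavy.greet(); // first active use of Heavy
        EVENTS.add("greet returned " + greeting);
        return EVENTS;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The "Heavy initialized" event appears only after "run started", even though both classes live in the same source file.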

A symbolic reference identifies a class, field, or method by its name rather than by a direct memory address. For example, bytecode refers to PrintWriter#println by name instead of jumping to the 0x954A address.
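
To make this more tangible, here is a small sketch that asks the JVM for a method the same way bytecode does: by owner class, name, and descriptor. It uses the standard java.lang.invoke API; the class and method names of the demo itself are made up for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class SymbolicReferenceDemo {
    // The descriptor string is the form in which bytecode records a signature.
    static String descriptor() {
        return MethodType.methodType(void.class, String.class).toMethodDescriptorString();
    }

    // Looks up PrintWriter#println(String) by name and type, then calls it.
    static String printlnByName(String text) {
        try {
            MethodType type = MethodType.methodType(void.class, String.class);
            MethodHandle println =
                MethodHandles.lookup().findVirtual(PrintWriter.class, "println", type);
            StringWriter out = new StringWriter();
            PrintWriter writer = new PrintWriter(out);
            println.invoke(writer, text); // no address appears anywhere in this code
            writer.flush();
            return out.toString();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Nothing in this code knows where println lives in memory. The JVM turns the name and descriptor into something callable at runtime.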

This allows Java to have such goodies as reflection, dynamic proxies, ServiceLoader, and DI frameworks like Spring, Micronaut, or Quarkus.
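
Two of these goodies fit in a few lines. The sketch below, with a hypothetical Greeter interface, invokes a method whose name is only a string at runtime and then asks the JVM to generate a class implementing the interface on the fly.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class DynamicFeaturesDemo {
    // Hypothetical interface used only for this illustration.
    interface Greeter {
        String greet(String name);
    }

    // Reflection: the method is found through metadata kept at runtime.
    static Object invokeByName(Object target, String methodName) {
        try {
            return target.getClass().getMethod(methodName).invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Dynamic proxy: the implementing class does not exist until this call.
    static Greeter newGreeterProxy() {
        InvocationHandler handler = (proxy, method, args) -> {
            if ("greet".equals(method.getName())) {
                return "Hello, " + args[0];
            }
            throw new UnsupportedOperationException(method.getName());
        };
        return (Greeter) Proxy.newProxyInstance(
            Greeter.class.getClassLoader(), new Class<?>[] {Greeter.class}, handler);
    }

    public static void main(String[] args) {
        System.out.println(invokeByName("abc", "length"));
        System.out.println(newGreeterProxy().greet("GraalVM"));
    }
}
```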

At the core of this dynamism is an architecture in which the virtual machine serves as the center of the Java "universe." The program "revolves" around the JVM: it retrieves addresses from it, calls methods, and entrusts the entire execution process to it.

Essentially, the JVM serves as the central hub of execution:

all classes revolve around it;
all calls go through it;
all decisions about what exactly will be called are made by it at runtime.

In other words, a Java application does not contain fixed addresses—the JVM determines them.

Next, I'll try to explain how the JVM works using the example of a BMW axle.

The JVM is like the BMW X6 axle

Imagine the JVM not as a virtual machine, but as the axle of a BMW X6 powered by a 4.4-liter S63B44T4 V8 engine producing 625 horsepower.

The whole system can look impressive: the engine delivers hundreds of horsepower, the electronics are sophisticated, and it has a multi-link rear suspension. But it is the axle that connects rotation, load, and motion into a single, coherent system.

The JVM plays that very role. It's the structural element through which all aspects of program execution are coordinated and brought together.

To be more specific, the JVM:

controls class loading;
resolves symbolic references;
stores and interprets metadata;
determines which method implementation to invoke;
supports reflection, proxies, and dynamic behavior substitution.

Let's get even more specific. The Java program says, "I want to call a method named equals with the (Object)boolean signature." The JVM responds: "Okay, let's find its address and call it."
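
This dialogue can be reproduced in code. In the sketch below (the demo class is hypothetical, the reflection calls are standard), the request is always the same: the equals method declared on Object, taking an Object and returning a boolean. Which implementation actually runs depends on the receiver's class at runtime.

```java
import java.lang.reflect.Method;

class DispatchDemo {
    // The symbolic request: "equals" taking Object, as declared on Object.
    static Method equalsMethod() {
        try {
            return Object.class.getMethod("equals", Object.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // The JVM picks the implementation from the receiver's runtime class.
    static boolean callEquals(Object receiver, Object argument) {
        try {
            return (boolean) equalsMethod().invoke(receiver, argument);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // String overrides equals and compares contents.
        System.out.println(callEquals("a", new String("a")));
        // Object's own equals compares identity.
        System.out.println(callEquals(new Object(), new Object()));
    }
}
```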

As long as this "axle" exists, everything works as it should. But Native Image removes it, leaving nothing to coordinate the components with.

Leap of faith: Native Java build

A native build attempts to turn a program designed to run within a complex dynamic environment into a self-contained binary that runs directly on the operating system.

This means that:

the JVM, as the central hub, disappears;
decisions previously made at runtime must now be made ahead of time.

The linker requires a complete and closed call graph, and this is where a fundamental contradiction arises.

In Java:

classes can be loaded dynamically;
methods can be called via reflection;
proxies can be generated at runtime.

To a static linker, all of this is a black box. In general, it is impossible to determine in advance exactly what will be called. What should we do?
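
Here is a minimal sketch of the problem. The system property name demo.impl is an assumption made up for this example. The class to instantiate is just a string that arrives at runtime, so no amount of staring at the code tells an ahead-of-time tool which class that will be.

```java
class HiddenDependencyDemo {
    // Instantiates whatever class the string names; the code itself never
    // mentions a concrete class.
    static Object create(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical property: the value is only known when the program runs.
        String className = System.getProperty("demo.impl", "java.util.ArrayList");
        System.out.println(create(className).getClass().getName());
    }
}
```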

Why not just enable everything?

At first glance, the solution seems obvious: if we can't determine what will be used, enable everything. In practice, however, this approach does not work.

First, there is binary size. Including all classes, metadata, and JVM infrastructure would increase the binary size several times over, and one of the key advantages of Native Image would be lost.

Second, there is build performance. Analyzing and compiling all of that code requires significantly more time and memory.

But even if we're ready to sacrifice size and build time, we'd still face a third problem: reflection wouldn't work automatically anyway. What matters is not just the existence of classes, but the preservation of specific metadata: constructors, methods, and signatures. Without knowing in advance what will be accessed via reflection, it's impossible to retain all required metadata correctly.

So, the include-everything strategy fails both in theory and in practice.

Graal Native Image Configuration

Native Image is based on the closed-world assumption: all possible execution paths must be known at compile time. The build process relies on reachability analysis:

a call graph is constructed;
everything not included in this graph is considered unreachable and removed.

Java's dynamic mechanisms break this rule: reflection, proxies, and resources introduce hidden entry points—code paths that may be invoked at runtime but are invisible during analysis. Even code that seems static can generate such implicit calls, which means it requires configuration for native compilation.
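
The idea can be shown with a toy model. This is not how GraalVM implements its analysis, only a sketch of the principle over a made-up call graph: everything reachable from the entry points is kept, and a method that is only ever invoked via reflection survives only if it is explicitly registered as an extra root.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilityDemo {
    // Hypothetical call graph: each method maps to the methods it calls directly.
    static final Map<String, List<String>> CALLS = Map.of(
        "main", List.of("parseArgs", "loadPlugin"),
        "parseArgs", List.of(),
        "loadPlugin", List.of(), // uses Class.forName(name): the target is invisible
        "Plugin.run", List.of("Plugin.report"),
        "Plugin.report", List.of(),
        "unusedHelper", List.of());

    // Collects every method reachable from the given roots.
    static Set<String> reachable(Set<String> roots) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String method = work.pop();
            if (seen.add(method)) {
                work.addAll(CALLS.getOrDefault(method, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Without configuration, Plugin.run and Plugin.report are dropped.
        System.out.println(reachable(Set.of("main")));
        // Registering Plugin.run as an extra root plays the role of configuration.
        System.out.println(reachable(Set.of("main", "Plugin.run")));
    }
}
```

In this model, the configuration file is simply the list of additional roots that the analysis cannot discover by itself.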

We encountered this issue when we needed to read a value from the Windows Registry within the analyzer. We couldn't find a suitable library, so we decided to use the relatively new Foreign Function & Memory API, which allows native methods to be called directly from Java.

The actual registry access code would be interesting but too complex for an example, so let's simplify it to the standard "Hello, World" output in the console.

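import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

class Sandbox {
  void main() throws Throwable {
    Linker linker = Linker.nativeLinker(); // (1)
    MethodHandle printf = linker.downcallHandle( // (2)
        linker.defaultLookup().findOrThrow("printf"), // (3)
        FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS)
    );

    try (Arena arena = Arena.ofConfined()) { // (4)
      MemorySegment cString = arena.allocateFrom("Hello, world!\n"); // (5)
      printf.invoke(cString); // (6)
    }
  }
}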

Here, we obtain a linker for native calls (1); create a MethodHandle for the printf function (2); resolve the function's address by name at runtime (3); open an Arena, a scope for managing off-heap memory (4); allocate a block within it and write a C-style string into it (5); and call the function via the MethodHandle (6).

MethodHandle is a reference to executable code. You can learn more about this technology here.

FYI, the equivalent code in C looks like this:

#include <stdio.h>

int main() {
    printf("Hello, world!\n");
    return 0;
}

On the JVM, this code works correctly, but when the Native Image runs without the appropriate configuration, an error occurs. The stack trace shows that the Native Image can't set up the mechanism for calling the native function: it doesn't know how to pass arguments or receive the result because the call signature wasn't known at compile time. Graal is asking us to add information about the call site:

{
  "foreign": {
    "downcalls": [
      {
        "returnType": "jint", // (1)
        "parameterTypes": [ "void*" ] // (2)
      }
    ]
  }
}

This configuration doesn't describe a specific method, but rather the form of a native call that may occur at runtime. From this, Graal learns the size of the return value, how it should be received and interpreted after control is returned (1), as well as the argument size, how it's passed to native code, and which passing rules should be followed (2).

In other words, the configuration replaces decisions that the JVM would normally make at runtime.
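
To see that the entry describes a shape rather than a name, here is a sketch that reuses one descriptor for a different C function. It assumes Java 22 or newer and a platform where the default lookup can find the standard C function atoi, which also takes a pointer and returns an int. The demo class and method names are made up for illustration.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

class DowncallShapeDemo {
    // One call shape: takes a pointer, returns a C int.
    static final FunctionDescriptor INT_OF_POINTER =
        FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS);

    // Calls any native function of that shape, passing a C string.
    static int call(String functionName, String argument) {
        Linker linker = Linker.nativeLinker();
        MemorySegment address = linker.defaultLookup().find(functionName).orElseThrow();
        MethodHandle handle = linker.downcallHandle(address, INT_OF_POINTER);
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(argument);
            return (int) handle.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException("native call failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call("atoi", "42"));
    }
}
```

The function name is an ordinary runtime string here, while the descriptor is the part that determines how arguments and the result cross the boundary. That is why the configuration lists types and not names.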

Results

What can we take away from this?

The JVM can handle everything on its own because it is the runtime decision-making center. Native Image can't rely on such improvisation because it makes all decisions ahead of time.

Configuration in GraalVM Native Image is not a quirk, a limitation, or a design flaw. It is a direct consequence of Java's dynamic nature and the attempt to bring it into the world of static compilation.

If the JVM is the axle around which a Java application revolves, the Native Image is a snapshot of that system frozen at a specific moment. And to ensure that this snapshot is correct, we need to explicitly specify in the configuration exactly what should be included.