I Assumed Java Streams Had Minimal Overhead. They Didn’t


Many of us have heard the same mantra: idiomatic code comes at little to no cost, and the gain in legibility vastly outweighs whatever performance is lost.

I recently finished evaluating some performance trade-offs that I had previously mentioned only in passing. Digging into the JIT output, I was surprised by how much additional assembly was generated to support the abstractions I use.

Until now, I mostly accepted the conventional wisdom, on the assumption that any difference was within the margin of error. On the rare occasions I tested it, I did so in a hand-wavy, pseudo-scientific manner. This time, I wanted to measure it more rigorously.

The Experiment

The prior test I mentioned was a bit too involved, so I decided to create a very simple one: pit the tried-and-true for loop against the newer Streams API. The goal was just to find the cost of summing 1,000,000 double values with each mechanism.

double retVal = 0;
for (double val : values) {
    retVal += val;
}

versus

double result = Arrays.stream(values).sum();
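To make the two snippets easy to run side by side, here is a minimal sketch that wraps each in a method. The class name `SumComparison`, the method names, and the seeded random input are my own assumptions for illustration, not the code that was actually benchmarked.

```java
import java.util.Arrays;
import java.util.Random;

public class SumComparison {

    // Plain enhanced for loop: one running total, added left to right.
    static double sumWithLoop(double[] values) {
        double total = 0;
        for (double val : values) {
            total += val;
        }
        return total;
    }

    // Streams API: Arrays.stream(double[]) yields a DoubleStream, which has sum().
    static double sumWithStream(double[] values) {
        return Arrays.stream(values).sum();
    }

    public static void main(String[] args) {
        // Assumed input: 1,000,000 pseudo-random doubles; the seed is arbitrary.
        double[] values = new Random(42).doubles(1_000_000).toArray();

        double loop = sumWithLoop(values);
        double stream = sumWithStream(values);

        System.out.println("loop       = " + loop);
        System.out.println("stream     = " + stream);
        System.out.println("difference = " + Math.abs(loop - stream));
    }
}
```

One thing worth knowing when comparing the two: the Javadoc for `DoubleStream.sum()` notes that it may be implemented with compensated summation to reduce rounding error, so the two totals can differ in the last few bits, and the stream may be doing slightly more arithmetic per element than the loop.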

Benchmark Methodology

JMH was my go-to for the benchmark. It lets you compare different implementations with rigour, running each one enough times for the JIT to compile and optimize the hot code.

For this test, I ran on OpenJDK 17 (HotSpot) with a target architecture of ARM64. JMH was configured with 5 warmup iterations and 5 forks. Results were consumed to avoid dead-code elimination, and the input array was pre-generated in a setup step.
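JMH needs a build dependency, so the sketch below uses only the JDK to illustrate three of the precautions described above: input generated before timing starts, warm-up rounds whose results are thrown away, and a sink that consumes every result. Everything here (`CrudeHarness`, `timeNanos`, `sink`, the call counts) is a hypothetical stand-in; it does not fork fresh JVMs, and its numbers are far less trustworthy than JMH output.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleFunction;

public class CrudeHarness {

    // Every result is written here so the JIT cannot discard the summation as dead code.
    static volatile double sink;

    static double loopSum(double[] values) {
        double total = 0;
        for (double val : values) {
            total += val;
        }
        return total;
    }

    static double streamSum(double[] values) {
        return Arrays.stream(values).sum();
    }

    // Calls `task` on `input` `calls` times and returns the elapsed nanoseconds.
    static long timeNanos(ToDoubleFunction<double[]> task, double[] input, int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink = task.applyAsDouble(input);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Setup step: the array is built once, outside any timed region.
        double[] values = new Random(42).doubles(1_000_000).toArray();
        int calls = 200; // assumed count, chosen only to keep the run short

        // Warm-up: five untimed rounds so the JIT has a chance to compile both paths.
        for (int round = 0; round < 5; round++) {
            timeNanos(CrudeHarness::loopSum, values, calls);
            timeNanos(CrudeHarness::streamSum, values, calls);
        }

        long loopNanos = timeNanos(CrudeHarness::loopSum, values, calls);
        long streamNanos = timeNanos(CrudeHarness::streamSum, values, calls);

        System.out.printf("loop:   %.3f ms per call%n", loopNanos / 1e6 / calls);
        System.out.printf("stream: %.3f ms per call%n", streamNanos / 1e6 / calls);
    }
}
```

In JMH itself, these concerns map onto a `@Setup` method, the warmup configuration, and returning the result from the benchmark method or passing it to a `Blackhole`.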

Results

I expected some performance difference between the two approaches, but not one this large. On my machine, the for loop achieved roughly five times the operations per second of the Streams implementation.

What The JIT Produced

This didn't sit well with me, so I generated the HotSpot log file to see the disassembly for this program. In this particular compilation, the Streams version produced hundreds more instructions and significantly more memory loads/stores.

Put side by side, the extra memory access instructions in the Streams version line up with its lower operations per second.

Is The Cost Justified?

Many will be quick to point out that this is a toy benchmark and not representative of real-world usage. On that point, you would be right. However, what it does do is boil the problem down to something simple and measurable. With all the other complexity boiled away, we can see the cost of the abstraction by itself.

This isn't to say that all code needs to be optimized. In fact, I have quite enjoyed the legibility of streams. However, that legibility does come at a cost. The next time you are staring down some critical section of code, remember that something as simple as changing your iteration strategy can make a measurable difference.
