DEV Community

paxel

A Java myth busted (or is it?)

tl;dr: for-loops are not necessarily way more performant than stream()

When stream() was introduced to the Java Collections in JDK 8, the immediate reaction was: "they are sooooo much slower than a normal for-loop".
I tested it and indeed it was slower. By a lot. (I remember that at some point Oracle admitted that it is slower, but said the focus was on functionality and the performance would probably improve in the future. Alas, I cannot find a source for that memory, so let's assume it is wrong.)

What definitely exists are lots of articles about how slow streams are compared to for-loops, e.g. by nipafx, who demonstrated it with JMH, and by Angelika, with the compelling argument that the compiler optimization for loops is too good to be beaten by streams.

Some developers took this fact and kept it stored in their brains forever. But streams were introduced in 2014, and 8 years have passed. How does it look today? Is it really as slow as some repeatedly declare? Let's find out.

I wrote a set of benchmarks that (to my knowledge) use the correct procedure in JMH.

  • create the data in a @State object
  • hand the result to a Blackhole, so the JIT cannot optimize the work away

I let it process lists of 10, 10_000 and 10_000_000 entries. These are the results:

10 entries

Benchmark                                               Mode  Cnt       Score       Error  Units
JmhStreamPerformanceMeasurement.collectFilteredFor     thrpt   25  691000.985 ±  5338.170  ops/s
JmhStreamPerformanceMeasurement.collectFilteredForGet  thrpt   25  687244.094 ±  2287.375  ops/s
JmhStreamPerformanceMeasurement.collectFilteredStream  thrpt   25  620127.959 ± 11149.611  ops/s

JmhStreamPerformanceMeasurement.collectFor             thrpt   25  601047.148 ±  6901.828  ops/s
JmhStreamPerformanceMeasurement.collectForGet          thrpt   25  593137.918 ±  7027.976  ops/s
JmhStreamPerformanceMeasurement.collectStream          thrpt   25  583345.516 ±  2706.945  ops/s

JmhStreamPerformanceMeasurement.easyTaskFor            thrpt   25  752205.384 ±  3155.479  ops/s
JmhStreamPerformanceMeasurement.easyTaskForGet         thrpt   25  753751.877 ±  2618.748  ops/s
JmhStreamPerformanceMeasurement.easyTaskStream         thrpt   25  732847.868 ±  1268.374  ops/s

JmhStreamPerformanceMeasurement.heavyTaskFor           thrpt   25  725538.827 ±   859.032  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet        thrpt   25  725200.238 ±   825.300  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream        thrpt   25  723650.793 ±  1007.079  ops/s

10_000 entries

Benchmark                                               Mode  Cnt      Score     Error  Units
JmhStreamPerformanceMeasurement.collectFilteredFor     thrpt   25   4700.019 ±  13.206  ops/s
JmhStreamPerformanceMeasurement.collectFilteredForGet  thrpt   25   4613.177 ±  52.664  ops/s
JmhStreamPerformanceMeasurement.collectFilteredStream  thrpt   25   4718.937 ± 232.897  ops/s

JmhStreamPerformanceMeasurement.collectFor             thrpt   25   1369.088 ±  10.711  ops/s
JmhStreamPerformanceMeasurement.collectForGet          thrpt   25   1337.578 ±  10.015  ops/s
JmhStreamPerformanceMeasurement.collectStream          thrpt   25   1383.158 ±  49.265  ops/s

JmhStreamPerformanceMeasurement.easyTaskFor            thrpt   25  39043.233 ± 708.907  ops/s
JmhStreamPerformanceMeasurement.easyTaskForGet         thrpt   25  42027.702 ±  91.457  ops/s
JmhStreamPerformanceMeasurement.easyTaskStream         thrpt   25  40108.355 ± 123.484  ops/s

JmhStreamPerformanceMeasurement.heavyTaskFor           thrpt   25   9309.883 ±  13.252  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet        thrpt   25  14033.988 ±  13.011  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream        thrpt   25  13440.062 ±  98.916  ops/s

10_000_000 entries

Benchmark                                               Mode  Cnt   Score   Error  Units
JmhStreamPerformanceMeasurement.collectFilteredFor     thrpt   25   1.256 ± 0.044  ops/s
JmhStreamPerformanceMeasurement.collectFilteredForGet  thrpt   25   1.240 ± 0.038  ops/s
JmhStreamPerformanceMeasurement.collectFilteredStream  thrpt   25   1.182 ± 0.052  ops/s

JmhStreamPerformanceMeasurement.collectFor             thrpt   25   0.321 ± 0.006  ops/s
JmhStreamPerformanceMeasurement.collectForGet          thrpt   25   0.324 ± 0.005  ops/s
JmhStreamPerformanceMeasurement.collectStream          thrpt   25   0.322 ± 0.006  ops/s

JmhStreamPerformanceMeasurement.easyTaskFor            thrpt   25  39.874 ± 0.326  ops/s
JmhStreamPerformanceMeasurement.easyTaskForGet         thrpt   25  40.546 ± 0.356  ops/s
JmhStreamPerformanceMeasurement.easyTaskStream         thrpt   25  40.263 ± 0.374  ops/s

JmhStreamPerformanceMeasurement.heavyTaskFor           thrpt   25  14.993 ± 0.083  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet        thrpt   25  14.795 ± 0.091  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream        thrpt   25  14.746 ± 0.076  ops/s

Conclusion

The benchmarks ran for 6 hours, which should have ironed out most of the peaks.
The result is in operations per second, so bigger is better.

In case you're too lazy to look up what the benchmarks mean:

  • For is a normal modern for-loop, for (X x : xs), that uses an Iterator to run over the entries.
  • ForGet is an old-school for (int i = 0; i < xs.size(); i++) that then calls get(i) on an ArrayList.
  • Stream is the modern stream() variant.
  • Collect adds all entries of the list to a set.
  • CollectFiltered adds only selected values to the set.
  • EasyTask sums up all the entries.
  • HeavyTask does a bit more if/else and math stuff with the entries.
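
To make the three loop styles concrete, here is a minimal sketch of what the EasyTask and CollectFiltered shapes might look like. This is not the actual benchmark code: the class name LoopVariants, the method names and the "keep even values" filter are assumptions made up for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of the three loop styles; not the benchmark code itself.
class LoopVariants {

    // "For": enhanced for-loop, which walks the list through an Iterator.
    static long sumFor(List<Integer> xs) {
        long sum = 0;
        for (Integer x : xs) {
            sum += x;
        }
        return sum;
    }

    // "ForGet": indexed loop that calls get(i) on the list.
    static long sumForGet(List<Integer> xs) {
        long sum = 0;
        for (int i = 0; i < xs.size(); i++) {
            sum += xs.get(i);
        }
        return sum;
    }

    // "Stream": the stream() variant of the same sum.
    static long sumStream(List<Integer> xs) {
        return xs.stream().mapToLong(Integer::longValue).sum();
    }

    // CollectFiltered shape with a for-loop; "even values only" is an assumed filter.
    static Set<Integer> collectFilteredFor(List<Integer> xs) {
        Set<Integer> result = new HashSet<>();
        for (Integer x : xs) {
            if (x % 2 == 0) {
                result.add(x);
            }
        }
        return result;
    }

    // The same filter-and-collect expressed as a stream pipeline.
    static Set<Integer> collectFilteredStream(List<Integer> xs) {
        return xs.stream()
                .filter(x -> x % 2 == 0)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(sumFor(xs));    // 21
        System.out.println(sumForGet(xs)); // 21
        System.out.println(sumStream(xs)); // 21
        System.out.println(collectFilteredFor(xs).equals(collectFilteredStream(xs))); // true
    }
}
```

All three variants compute the same result; what the benchmarks compare is only how the iteration itself is expressed.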

My expectations

I would guess that the ForGet benchmarks will be the fastest of the three, because no Iterator is created and get(i) on an ArrayList is basically just a wrapped array access.

I would also assume that For is faster than Stream, because it only creates one extra Iterator instance, while stream() creates a bunch of instances to process the data.

I also assume that this overhead becomes irrelevant with longer loops: one instance vs. 10 instances is negligible over 10 million iterations.

The result

The data looks almost as expected, except that stream() is by no means always the slowest. Feel free to check my benchmark code; maybe I made a mistake.

It looks like with short loops (few entries) the stream is up to roughly 10% slower than a for-loop, but it depends very much on what you execute. With 10 entries, collectFiltered shows the largest gap (about 620,000 vs. 691,000 ops/s), and easyTask is also a few percent behind.
But this already changes with 10_000 entries: there collectFiltered with a stream has the highest score, although the difference is within its error margin.

SOOOO I think the difference between the three measured loops is irrelevant.
I don't think that any of them is "way faster".
All three work very differently; some have more overhead but might be smarter, and none of them is likely to be your bottleneck.

Some numbers seemed odd, so I ran the benchmark twice to rule out background processes interfering with the result.

Rule of thumb maybe:

  • short, simple tasks: probably better with a for-loop
  • long, complex tasks: probably better with a stream()

As soon as you have to handle exceptions, the for-loop is the better choice anyway, because exception handling in stream() is terrible.
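
A small sketch of why exception handling is awkward in streams: the lambdas passed to map() cannot throw checked exceptions, so they have to be caught and wrapped inside the lambda, while a for-loop simply lets them propagate. The class ExceptionHandlingSketch and the parse helper are hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example; names are made up for illustration.
class ExceptionHandlingSketch {

    // Assumed helper that signals bad input with a checked exception.
    static int parse(String s) throws IOException {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            throw new IOException("not a number: " + s, e);
        }
    }

    // for-loop: the checked exception simply propagates to the caller.
    static List<Integer> parseAllFor(List<String> inputs) throws IOException {
        List<Integer> result = new ArrayList<>();
        for (String s : inputs) {
            result.add(parse(s));
        }
        return result;
    }

    // stream: the lambda must catch the checked exception and rethrow it unchecked.
    static List<Integer> parseAllStream(List<String> inputs) {
        return inputs.stream()
                .map(s -> {
                    try {
                        return parse(s);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        List<String> inputs = List.of("1", "2", "3");
        System.out.println(parseAllFor(inputs));    // [1, 2, 3]
        System.out.println(parseAllStream(inputs)); // [1, 2, 3]
    }
}
```

Wrapping in an unchecked exception is only one common workaround; it also changes the exception type the caller has to deal with.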

There is one benchmark that looks suspicious: heavyTaskFor with 10_000 entries. I will repeat it and comment on the result. I assume my machine did something weird at that time. Please ignore it for now.

cheers, thanks for reading.

Top comments (1)

paxel • Edited

Interestingly enough, the result was reproduced:

JmhStreamPerformanceMeasurement.heavyTaskFor           thrpt   25   9350.889 ±  16.505  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet        thrpt   25  14035.277 ±  14.330  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream        thrpt   25  13519.908 ± 122.445  ops/s

Does anyone have an idea why the for approach does so badly at 10_000 entries?

Edit: I assume the JIT somehow fails to optimize the loop?

Third run, this time with all benchmarks parameterized:

Benchmark                                              (entries)   Mode  Cnt       Score       Error  Units
JmhStreamPerformanceMeasurement.collectFilteredFor            10  thrpt   25  683731.274 ±  4940.686  ops/s
JmhStreamPerformanceMeasurement.collectFilteredFor         10000  thrpt   25    4715.825 ±    37.622  ops/s
JmhStreamPerformanceMeasurement.collectFilteredFor      10000000  thrpt   25       1.272 ±     0.041  ops/s
JmhStreamPerformanceMeasurement.collectFilteredForGet         10  thrpt   25  679803.035 ±  3960.936  ops/s
JmhStreamPerformanceMeasurement.collectFilteredForGet      10000  thrpt   25    4650.183 ±    34.151  ops/s
JmhStreamPerformanceMeasurement.collectFilteredForGet   10000000  thrpt   25       1.244 ±     0.039  ops/s
JmhStreamPerformanceMeasurement.collectFilteredStream         10  thrpt   25  633705.579 ± 10922.524  ops/s
JmhStreamPerformanceMeasurement.collectFilteredStream      10000  thrpt   25    4408.448 ±   263.090  ops/s
JmhStreamPerformanceMeasurement.collectFilteredStream   10000000  thrpt   25       1.122 ±     0.036  ops/s
JmhStreamPerformanceMeasurement.collectFor                    10  thrpt   25  589769.815 ±   704.271  ops/s
JmhStreamPerformanceMeasurement.collectFor                 10000  thrpt   25    1379.825 ±    11.198  ops/s
JmhStreamPerformanceMeasurement.collectFor              10000000  thrpt   25       0.321 ±     0.007  ops/s
JmhStreamPerformanceMeasurement.collectForGet                 10  thrpt   25  596967.646 ±  7079.486  ops/s
JmhStreamPerformanceMeasurement.collectForGet              10000  thrpt   25    1351.077 ±     7.104  ops/s
JmhStreamPerformanceMeasurement.collectForGet           10000000  thrpt   25       0.323 ±     0.007  ops/s
JmhStreamPerformanceMeasurement.collectStream                 10  thrpt   25  581813.319 ±  4143.488  ops/s
JmhStreamPerformanceMeasurement.collectStream              10000  thrpt   25    1368.152 ±    40.238  ops/s
JmhStreamPerformanceMeasurement.collectStream           10000000  thrpt   25       0.317 ±     0.008  ops/s
JmhStreamPerformanceMeasurement.easyTaskFor                   10  thrpt   25  761951.821 ±  1284.488  ops/s
JmhStreamPerformanceMeasurement.easyTaskFor                10000  thrpt   25   39482.277 ±    59.233  ops/s
JmhStreamPerformanceMeasurement.easyTaskFor             10000000  thrpt   25      40.127 ±     0.341  ops/s
JmhStreamPerformanceMeasurement.easyTaskForGet                10  thrpt   25  765037.977 ±  1172.411  ops/s
JmhStreamPerformanceMeasurement.easyTaskForGet             10000  thrpt   25   42575.020 ±   102.708  ops/s
JmhStreamPerformanceMeasurement.easyTaskForGet          10000000  thrpt   25      40.428 ±     0.229  ops/s
JmhStreamPerformanceMeasurement.easyTaskStream                10  thrpt   25  729735.395 ±  1595.568  ops/s
JmhStreamPerformanceMeasurement.easyTaskStream             10000  thrpt   25   40602.726 ±    92.506  ops/s
JmhStreamPerformanceMeasurement.easyTaskStream          10000000  thrpt   25      40.162 ±     0.429  ops/s
JmhStreamPerformanceMeasurement.heavyTaskFor                  10  thrpt   25  724110.762 ±   482.638  ops/s
JmhStreamPerformanceMeasurement.heavyTaskFor               10000  thrpt   25    9371.744 ±    11.235  ops/s
JmhStreamPerformanceMeasurement.heavyTaskFor            10000000  thrpt   25      15.364 ±     0.081  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet               10  thrpt   25  725171.042 ±   294.252  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet            10000  thrpt   25   14014.992 ±    11.633  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet         10000000  thrpt   25      14.778 ±     0.081  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream               10  thrpt   25  721638.858 ±   758.221  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream            10000  thrpt   25   13649.282 ±    95.292  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream         10000000  thrpt   25      14.734 ±     0.083  ops/s