<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alexander Reelsen</title>
    <description>The latest articles on DEV Community by Alexander Reelsen (@spinscale).</description>
    <link>https://dev.to/spinscale</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F485382%2F77f713e3-e0e0-461f-8fc9-14ad62c4fc79.jpeg</url>
      <title>DEV Community: Alexander Reelsen</title>
      <link>https://dev.to/spinscale</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/spinscale"/>
    <language>en</language>
    <item>
      <title>Handling JDK &amp; GC options dynamically in Elasticsearch</title>
      <dc:creator>Alexander Reelsen</dc:creator>
      <pubDate>Thu, 03 Dec 2020 10:11:16 +0000</pubDate>
      <link>https://dev.to/spinscale/handling-jdk-gc-options-dynamically-in-elasticsearch-57ll</link>
      <guid>https://dev.to/spinscale/handling-jdk-gc-options-dynamically-in-elasticsearch-57ll</guid>
      <description>&lt;p&gt;TL;DR: Today we will dive into the startup of Elasticsearch: how it parses the configurable JVM options and how it can ergonomically switch between JVM options on startup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.elastic.co/elasticsearch/"&gt;Elasticsearch&lt;/a&gt; is a distributed search &amp;amp; analytics engine. Elasticsearch's full text search capabilities are based on &lt;a href="https://lucene.apache.org/"&gt;Apache Lucene&lt;/a&gt;. It's the heart of the Elastic Stack and powers its solutions Enterprise Search, Observability and Security as well as many well known internet websites like Wikipedia, GitHub or Stack Overflow.&lt;/p&gt;

&lt;p&gt;Elasticsearch tries to be a good JVM ecosystem citizen and ships with a recent distribution of the JVM. Elasticsearch 7.9.3 ships with a recent OpenJDK 15 distribution. One of the core principles of Elasticsearch is to make getting up and running as simple as possible. This is the reason why Elasticsearch ships a JDK, so that the user does not have the trouble of installing one. Not everyone is a Java expert after all! At some point, however, you need to become at least something of an expert, as you need to configure some JDK options like the heap size.&lt;/p&gt;

&lt;p&gt;In order to be able to configure JDK options for Elasticsearch before startup, these options need to be parsed and evaluated. When the user runs &lt;a href="https://github.com/elastic/elasticsearch/blob/7.9/distribution/src/bin/elasticsearch"&gt;./bin/elasticsearch&lt;/a&gt; or &lt;a href="https://github.com/elastic/elasticsearch/blob/7.9/distribution/src/bin/elasticsearch.bat"&gt;./bin/elasticsearch.bat&lt;/a&gt;, a few more Java programs are started &lt;strong&gt;before&lt;/strong&gt; the actual Elasticsearch process is fired up. First, a program to &lt;a href="https://github.com/elastic/elasticsearch/blob/7.9/distribution/tools/launchers/src/main/java/org/elasticsearch/tools/launchers/TempDirectory.java"&gt;create a temporary directory&lt;/a&gt; is launched, which acts differently on Windows than on other operating systems. Second, the &lt;a href="https://github.com/elastic/elasticsearch/blob/7.9/distribution/tools/launchers/src/main/java/org/elasticsearch/tools/launchers/JvmOptionsParser.java"&gt;JvmOptionsParser&lt;/a&gt; class is used to determine the Java options. Only after this is done is the output of the parser used to start the main Elasticsearch process. This also allows running the other Java programs with small heaps, simply by using the JDK defaults, so that they start fast.&lt;/p&gt;

&lt;p&gt;Let's dive into the mechanism to configure JVM options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring JVM options with Elasticsearch
&lt;/h2&gt;

&lt;p&gt;The most commonly used JVM option that requires configuration before the Elasticsearch Java process is started is the heap size. To set it, Elasticsearch makes use of &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.9/jvm-options.html"&gt;a mechanism&lt;/a&gt; that not only reads the &lt;code&gt;config/jvm.options&lt;/code&gt; file but also reads the &lt;code&gt;config/jvm.options.d&lt;/code&gt; directory and appends the contents of all files to create one big list of JVM options. You could create a file named &lt;code&gt;config/jvm.options.d/heap.options&lt;/code&gt; like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# make sure we configure 2gb of heap
-Xms2g
-Xmx2g
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This would configure the heap on startup. However, the configuration and parsing mechanism is more powerful. Not only can you configure options, you can also configure different options for different JDK major versions.&lt;/p&gt;

&lt;p&gt;Side note: In case you are asking yourself why there is a &lt;code&gt;jvm.options.d&lt;/code&gt; directory and not just a file: this caters properly for package upgrades of RPM or Debian packages, so that the original &lt;code&gt;jvm.options&lt;/code&gt; can be replaced and does not need to be edited.&lt;/p&gt;
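
&lt;p&gt;To illustrate how the main file and the drop-in directory can be merged, here is a minimal sketch. It is not the actual &lt;code&gt;JvmOptionsParser&lt;/code&gt; code: the class name &lt;code&gt;JvmOptionsFiles&lt;/code&gt;, the method &lt;code&gt;readAll&lt;/code&gt;, the &lt;code&gt;.options&lt;/code&gt; suffix filter and the alphabetical ordering of the drop-in files are assumptions made for this example.&lt;/p&gt;

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.StringJoiner;

// Hypothetical helper, not part of Elasticsearch.
final class JvmOptionsFiles {

    // Reads config/jvm.options first, then every *.options file below
    // config/jvm.options.d, and joins all options into one space separated string.
    static String readAll(Path configDir) throws IOException {
        StringJoiner options = new StringJoiner(" ");
        appendOptions(configDir.resolve("jvm.options"), options);

        Path dropInDir = configDir.resolve("jvm.options.d");
        String[] names = dropInDir.toFile().list();
        // list() returns null when the directory does not exist
        if (names != null) {
            // assumption of this sketch: process drop-in files in alphabetical order
            Arrays.sort(names);
            for (String name : names) {
                if (name.endsWith(".options")) {
                    appendOptions(dropInDir.resolve(name), options);
                }
            }
        }
        return options.toString();
    }

    // Adds every line that is neither blank nor a comment to the joiner.
    private static void appendOptions(Path file, StringJoiner options) throws IOException {
        if (Files.isRegularFile(file) == false) {
            return;
        }
        for (String line : Files.readAllLines(file)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            options.add(trimmed);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the config directory as first argument, defaults to ./config
        Path configDir = Path.of(args.length == 0 ? "config" : args[0]);
        System.out.println(readAll(configDir));
    }
}
```

&lt;p&gt;Because the drop-in files are read after the main file, an upgraded package can replace &lt;code&gt;jvm.options&lt;/code&gt; without touching anything the user put into &lt;code&gt;jvm.options.d&lt;/code&gt;.&lt;/p&gt;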

&lt;p&gt;So why is this useful, you might ask? Well, sometimes a new Java release deprecates features, and sometimes features get removed. One of those features was the CMS garbage collector, which got deprecated in &lt;a href="https://openjdk.java.net/jeps/291"&gt;Java 9&lt;/a&gt; and finally removed more than two years later in &lt;a href="https://openjdk.java.net/jeps/363"&gt;Java 14&lt;/a&gt;. Elasticsearch has been a happy user of CMS for years, but with the removal there had to be a mechanism to start with another garbage collector from Java 14 onwards. To support this, the JVM options parser can apply certain options only to certain Java versions, like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
# 10-13:-XX:-UseConcMarkSweepGC
# 10-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same applies to the different GC logging options for Java 8 and for Java 9 onwards:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
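
&lt;p&gt;To make the version prefixes seen above concrete, here is a minimal sketch of how a single line could be matched against the running Java major version. It is not the code of the real &lt;code&gt;JvmOptionsParser&lt;/code&gt;. The class &lt;code&gt;VersionedJvmOption&lt;/code&gt; and its &lt;code&gt;select&lt;/code&gt; method are hypothetical names, and the sketch only assumes the three prefix forms shown in the snippets: &lt;code&gt;8:&lt;/code&gt;, &lt;code&gt;9-:&lt;/code&gt; and &lt;code&gt;8-13:&lt;/code&gt;.&lt;/p&gt;

```java
// Hypothetical helper, not part of Elasticsearch.
final class VersionedJvmOption {

    // Returns the JVM option carried by the line if it applies to the given
    // major Java version, or null for comments, blank lines and other versions.
    // A prefix that is not numeric raises a NumberFormatException.
    static String select(String line, int javaMajorVersion) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        // A line starting with a dash has no version prefix and always applies.
        if (trimmed.startsWith("-")) {
            return trimmed;
        }
        int colon = trimmed.indexOf(':');
        if (colon == -1) {
            return null; // assumption of this sketch: skip lines without a prefix separator
        }
        String range = trimmed.substring(0, colon);
        String option = trimmed.substring(colon + 1);

        int dash = range.indexOf('-');
        int lower;
        int upper;
        if (dash == -1) {
            // "8:" applies to exactly one version
            lower = Integer.parseInt(range);
            upper = lower;
        } else if (dash == range.length() - 1) {
            // "9-:" applies to that version and all later ones
            lower = Integer.parseInt(range.substring(0, dash));
            upper = Integer.MAX_VALUE;
        } else {
            // "8-13:" applies to an inclusive range of versions
            lower = Integer.parseInt(range.substring(0, dash));
            upper = Integer.parseInt(range.substring(dash + 1));
        }
        // Clamping the version into the range leaves it unchanged exactly when it
        // already lies inside the range (assuming lower is not above upper).
        boolean applies = Math.max(lower, Math.min(javaMajorVersion, upper)) == javaMajorVersion;
        return applies ? option : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "8-13:-XX:+UseConcMarkSweepGC",
            "14-:-XX:+UseG1GC",
            "8:-XX:+PrintGCDetails",
            "-Xms2g"
        };
        // Print only the options that apply to Java 15.
        for (String line : lines) {
            String option = select(line, 15);
            if (option != null) {
                System.out.println(option);
            }
        }
    }
}
```

&lt;p&gt;With Java 15, only the &lt;code&gt;14-:&lt;/code&gt; line and the line without a prefix are selected, so the CMS flags never reach a JVM that no longer knows them.&lt;/p&gt;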



&lt;p&gt;You can read more about &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/jvm-options.html"&gt;setting JVM options&lt;/a&gt; in the official Elastic docs.&lt;/p&gt;

&lt;p&gt;There is another &lt;a href="https://github.com/elastic/elasticsearch/blob/7.9/distribution/tools/launchers/src/main/java/org/elasticsearch/tools/launchers/JvmErgonomics.java#L95-L130"&gt;safeguard&lt;/a&gt;: all configured and dynamically created JVM flags are appended and a JVM is started with them to check that those options are compatible, before Elasticsearch itself is started, in order to fail fast.&lt;/p&gt;

&lt;p&gt;Elasticsearch also logs all JVM options on startup, so that they can easily be compared with what the user assumes. Those options are not only logged, but can also be retrieved using the &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.9/cluster-nodes-info.html"&gt;nodes info API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ergonomic defaults
&lt;/h2&gt;

&lt;p&gt;So, with an infrastructure in place like that, can we do more fancy things than just parsing JVM options? Of course we can! Ideas anyone?&lt;/p&gt;

&lt;p&gt;One of the advantages is that some useful standard JVM options can be supplied when starting Elasticsearch. There is a &lt;a href="https://github.com/elastic/elasticsearch/blob/7.10/distribution/tools/launchers/src/main/java/org/elasticsearch/tools/launchers/SystemJvmOptions.java"&gt;SystemJvmOptions&lt;/a&gt; class that lists a couple of interesting options, like setting the default encoding to UTF-8 or configuring the DNS TTL caching, which is important as Elasticsearch always enables the Java Security Manager.&lt;/p&gt;

&lt;p&gt;Also, we can enable some options only when a certain JDK version is in use. The following enables helpful &lt;code&gt;NullPointerException&lt;/code&gt; messages, which describe what exactly was null when it was dereferenced, in Java 14 and above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;maybeShowCodeDetailsInExceptionMessages&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;JavaVersion&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;majorVersion&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;JavaVersion&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;CURRENT&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"-XX:+ShowCodeDetailsInExceptionMessages"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But this infrastructure can go even further, and become smarter over time. How about providing different JVM options depending on configuration settings like the heap?&lt;/p&gt;

&lt;p&gt;This is exactly what has been worked on in a &lt;a href="https://github.com/elastic/elasticsearch/pull/59667"&gt;recent addition&lt;/a&gt; to Elasticsearch.&lt;/p&gt;

&lt;p&gt;If a small heap is configured in combination with the G1 garbage collector, some additional options are configured.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;tuneG1GCForSmallHeap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tuneG1GCForSmallHeap&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;heapSize&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;tuneG1GCHeapRegion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; 
    &lt;span class="n"&gt;tuneG1GCHeapRegion&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;finalJvmOptions&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tuneG1GCForSmallHeap&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;tuneG1GCInitiatingHeapOccupancyPercent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="n"&gt;tuneG1GCInitiatingHeapOccupancyPercent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;finalJvmOptions&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;tuneG1GCReservePercent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="n"&gt;tuneG1GCReservePercent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;finalJvmOptions&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tuneG1GCForSmallHeap&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, what happens here and why? If less than 8GB of heap is configured, three additional options are set. This happens more often than you might think, as many users are also running smaller instances of Elasticsearch and there is an ongoing effort to use less heap and offload work to other parts of the system. Of course everything can be manually overridden.&lt;/p&gt;

&lt;p&gt;First, the size of a G1 heap region is set to 4 MB, using &lt;code&gt;-XX:G1HeapRegionSize=4m&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Second, the heap occupancy threshold that triggers a marking cycle is set to &lt;code&gt;-XX:InitiatingHeapOccupancyPercent=30&lt;/code&gt;, somewhat earlier than the default of &lt;code&gt;45&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Third, the &lt;code&gt;G1ReservePercent&lt;/code&gt; option is set to 15 instead of 25 percent in the small heap case, in both cases deviating from the default of 10 percent.&lt;/p&gt;
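
&lt;p&gt;The three rules above can be condensed into a small sketch. This is not the code from &lt;code&gt;JvmErgonomics&lt;/code&gt;: the class &lt;code&gt;SmallHeapG1Tuning&lt;/code&gt; and its methods are hypothetical, the handling of options already set by the user is left out, and for larger heaps the sketch simply assumes the G1 values from the &lt;code&gt;jvm.options&lt;/code&gt; excerpt shown earlier.&lt;/p&gt;

```java
// Hypothetical helper, not part of Elasticsearch.
final class SmallHeapG1Tuning {

    // 8 GB in bytes; heaps below this size count as small in this sketch.
    static final long SMALL_HEAP_THRESHOLD = 8L * 1024 * 1024 * 1024;

    static boolean isSmallHeap(long heapSizeInBytes) {
        // signum is -1 exactly when the heap size is below the threshold
        return Integer.signum(Long.compare(heapSizeInBytes, SMALL_HEAP_THRESHOLD)) == -1;
    }

    // Returns the G1 related options this sketch would add for the given heap size.
    static String[] ergonomicOptions(long heapSizeInBytes) {
        if (isSmallHeap(heapSizeInBytes)) {
            return new String[] {
                "-XX:G1HeapRegionSize=4m",
                "-XX:InitiatingHeapOccupancyPercent=30",
                "-XX:G1ReservePercent=15"
            };
        }
        // assumption: larger heaps keep the values from the jvm.options excerpt
        return new String[] {
            "-XX:InitiatingHeapOccupancyPercent=30",
            "-XX:G1ReservePercent=25"
        };
    }

    public static void main(String[] args) {
        long twoGigabytes = 2L * 1024 * 1024 * 1024;
        // Print the options chosen for a 2 GB heap, one per line.
        for (String option : ergonomicOptions(twoGigabytes)) {
            System.out.println(option);
        }
    }
}
```

&lt;p&gt;Keeping the decision in one small function makes it easy to benchmark a different threshold or a different set of flags later on.&lt;/p&gt;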

&lt;p&gt;It took months of benchmarking and testing to arrive at these numbers. If you are interested in the discussion, there is a lengthy &lt;a href="https://github.com/elastic/night-rally/issues/246"&gt;GitHub issue&lt;/a&gt;. In case you are wondering how these kinds of issues surface while testing Elasticsearch: Elasticsearch runs nightly benchmarks on bare metal hardware to easily spot and investigate regressions. You can check out those &lt;a href="https://benchmarks.elastic.co/index.html"&gt;benchmarks here&lt;/a&gt;. The tool used for this is called &lt;a href="https://github.com/elastic/rally"&gt;rally&lt;/a&gt;, a macrobenchmarking framework for Elasticsearch. One of the great features of rally is that you can use your own data and queries to test and benchmark, so having your own nightly benchmarks is possible.&lt;/p&gt;

&lt;p&gt;So, why have those options been picked, you may ask yourself. The benchmark infrastructure made testing easy, but it was not the reason for testing. After switching from CMS to G1, a few benchmark results got worse and required investigation. One of the approaches was to test the ParallelGC for really small heaps instead of G1, but this was abandoned.&lt;/p&gt;

&lt;p&gt;We even managed to find a bug in our G1 configuration options. In order to understand the issue, let's explain some Elasticsearch functionality. Elasticsearch utilizes circuit breakers to prevent overloading of a single node by accounting for memory, for example when creating an aggregation response or receiving requests over the network. Once a certain limit is reached, Elasticsearch's circuit breaker will trip and return an exception. The idea here is to prevent the famous &lt;code&gt;OutOfMemoryError&lt;/code&gt;, tell the user that the request cannot be processed and also indicate whether that is a temporary or a permanent issue. Since Elasticsearch 7.0, a &lt;a href="https://www.elastic.co/blog/improving-node-resiliency-with-the-real-memory-circuit-breaker"&gt;real memory circuit breaker&lt;/a&gt; has been added that takes the total heap into account instead of only the currently accounted data, which is more exact.&lt;/p&gt;

&lt;p&gt;However, this circuit breaker did not work in combination with the shipped G1 settings, as the configured settings assumed &lt;a href="https://github.com/elastic/elasticsearch/pull/46169"&gt;a heap bigger than 100%&lt;/a&gt; of what was configured, and so the circuit breaker tripped before the garbage collector started collecting per the supplied configuration. Also, the memory circuit breaker was enhanced with some G1-specific code to &lt;a href="https://github.com/elastic/elasticsearch/pull/58674"&gt;nudge G1 to do a young GC&lt;/a&gt; at some point.&lt;/p&gt;
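
&lt;p&gt;The basic accounting idea behind a circuit breaker, as described above, can be sketched in a few lines. This is a deliberately simplified illustration and not the Elasticsearch implementation: the class &lt;code&gt;SimpleCircuitBreaker&lt;/code&gt;, its methods and the use of &lt;code&gt;IllegalStateException&lt;/code&gt; are assumptions of this example, and it only tracks explicitly reserved bytes instead of looking at the real heap usage.&lt;/p&gt;

```java
// Hypothetical, simplified circuit breaker, not part of Elasticsearch.
final class SimpleCircuitBreaker {

    private final long limitBytes;
    private long usedBytes;

    SimpleCircuitBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Accounts for the requested bytes, or trips with an exception if the
    // reservation would exceed the limit. Nothing is reserved when it trips.
    synchronized void reserve(long bytes, String label) {
        long newUsed = usedBytes + bytes;
        // signum is 1 exactly when newUsed is larger than the limit
        if (Integer.signum(Long.compare(newUsed, limitBytes)) == 1) {
            throw new IllegalStateException(
                "[" + label + "] would use " + newUsed + " bytes, limit is " + limitBytes + " bytes");
        }
        usedBytes = newUsed;
    }

    // Gives back bytes once the request that reserved them has finished.
    synchronized void release(long bytes) {
        usedBytes = Math.max(0, usedBytes - bytes);
    }

    synchronized long used() {
        return usedBytes;
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(100);
        breaker.reserve(60, "aggregation");
        try {
            breaker.reserve(50, "request");
        } catch (IllegalStateException e) {
            // The second reservation is rejected instead of risking an OutOfMemoryError.
            System.out.println("tripped: " + e.getMessage());
        }
    }
}
```

&lt;p&gt;A breaker like this only sees what has been reserved explicitly, which is why a variant that looks at the real heap usage is more exact, and also why it has to agree with the point at which the garbage collector starts its work.&lt;/p&gt;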

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;As you can see, properly handling and parsing JDK options, as well as choosing good defaults like switching from one garbage collector to another, involves quite a few steps: infrastructure, testing, running in production &amp;amp; verification. The same probably applies to your own applications as well.&lt;/p&gt;

&lt;p&gt;The same applies to all the new generation garbage collectors like &lt;a href="https://wiki.openjdk.java.net/display/zgc/Main"&gt;ZGC&lt;/a&gt; and &lt;a href="https://wiki.openjdk.java.net/display/shenandoah/Main"&gt;Shenandoah&lt;/a&gt;. Those will require extensive testing, proper CI integration and maybe even a few changes in the code. Although those GCs promise huge improvements, make sure you are testing properly with your own workloads before jumping on those.&lt;/p&gt;

&lt;p&gt;Also, never forget that a small portion of your users will want to set their own options, so cater for that properly, including upgrades.&lt;/p&gt;

</description>
      <category>elasticsearch</category>
      <category>java</category>
      <category>jvm</category>
      <category>elasticstack</category>
    </item>
  </channel>
</rss>
