Yousef Zook

Java Performance - 3 - A java Performance Toolbox

Recap

This article is part 4 of the Java Performance series, which summarizes the Java Performance book by Scott Oaks.

In the previous chapter we discussed performance testing methods. We covered the difference between microbenchmarks, macrobenchmarks and mesobenchmarks, and we also talked about response time, throughput and variability.

In this chapter we are going to discuss some interesting measurement tools for CPU, network and disk. We will look at the differences between the kinds of Java profilers and talk a bit about Java Flight Recorder (JFR).

Great, let's start the third chapter...
Intro

Chapter Title:

A java Performance Toolbox

Performance analysis is all about visibility: knowing what is going on inside an application and in the application's environment. Visibility is all about tools. And so performance tuning is all about tools.

1) Operating System Tools and Analysis

The starting point for program analysis is not Java-specific at all: it is the basic set of monitoring tools that come with the operating system.
Here is a quick look at the operating system tools for monitoring the usage of:

  • CPU --> vmstat
  • Disk --> iostat
  • Network --> nicstat

A- CPU Usage

CPU usage is typically divided into two categories: user time and system time (Windows refers to this as privileged time).

  • User time is the percentage of time the CPU is executing application code.
  • System Time is the percentage of time the CPU is executing kernel code.

The goal is to maximize CPU utilization: when the CPU is busy doing useful work, the program finishes sooner.

If you run vmstat 1 on your Linux desktop, you will get a series of lines (one every second) that look like this:
[Image: vmstat output]
As you can see in the output:

  • In each second, system time is about 3% and user time is about 42%.
  • Total CPU time [aka utilization] is about 45%. This means that the CPU is idle for about 55% of the time.
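To make the arithmetic concrete, here is a minimal sketch that reads the user, system and idle columns from one line of vmstat output. The column positions assume the classic Linux layout (r b swpd free buff cache si so bi bo in cs us sy id wa st), and the sample line is made up to match the figures above; it is not taken from a real run.

```java
import java.util.Arrays;

/** Sketch: derive CPU utilization from one data line of `vmstat 1` output. */
final class VmstatCpu {

    // Column positions in the assumed layout:
    // r b swpd free buff cache si so bi bo in cs us sy id wa st
    private static final int USER = 12;
    private static final int SYSTEM = 13;
    private static final int IDLE = 14;

    /** Splits a vmstat data line into its numeric columns. */
    static long[] parse(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                     .mapToLong(Long::parseLong)
                     .toArray();
    }

    /** Utilization is user time plus system time, as a percentage. */
    static long utilization(String line) {
        long[] cols = parse(line);
        return cols[USER] + cols[SYSTEM];
    }

    /** Idle percentage as reported by vmstat itself. */
    static long idle(String line) {
        return parse(line)[IDLE];
    }

    public static void main(String[] args) {
        // Hypothetical sample line with 42% user and 3% system time.
        String sample = " 2  0      0 1797836 1229068 1508276    0    0     0     9 2250 3634 42  3 55  0  0";
        System.out.println("utilization = " + utilization(sample) + "%");
        System.out.println("idle        = " + idle(sample) + "%");
    }
}
```

On a real system you would feed this the lines printed by vmstat rather than a hard-coded string.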

The CPU can be idle for multiple reasons:

  • The application might be blocked on a synchronization primitive and unable to execute until that lock is released.
  • The application might be waiting for something, such as a response to come back from a call to the database.
  • The application might have nothing to do.

The first two situations always indicate something that can be addressed. If contention on the lock can be reduced or the database can be tuned so that it sends the answer back more quickly, then the program will run faster, and the average CPU use of the application will go up (assuming, of course, that there isn't another such issue that will continue to block the application).

Java and Single CPU:
If the code is a batch-style application, the CPU should not be idle, because it always has work to do [if one job is blocked on I/O or something else, another job can use the CPU, etc.].
...
Java and multi CPU:
The general idea is the same as with a single CPU; however, making sure individual threads are not blocked will drive CPU usage higher.

CPU Run Queue

You can also monitor the number of threads that are able to run [aka not blocked]. Those threads are said to be in the CPU run queue. You can find the length of the run queue in the first column of the vmstat output shown earlier (procs r).
[Image: vmstat output showing the run queue length]

Note

  • In Linux: the number counts the threads that are currently running [those using the processors] plus those waiting for a processor.
  • In Windows: the number does NOT count the currently running threads.

So the goal in Linux is to keep the queue length no greater than the number of machine processors, and in Windows to keep it at 0.
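The difference between the two conventions can be sketched as a small check. The class and method names here are hypothetical, and the queue lengths in main are made-up readings rather than measured values.

```java
/** Sketch: interpret a run-queue length under the Linux and Windows conventions. */
final class RunQueueCheck {

    enum Os { LINUX, WINDOWS }

    /**
     * Returns true when more threads want a CPU than there are CPUs.
     * Linux counts running plus waiting threads; Windows counts only waiting ones.
     */
    static boolean isOversubscribed(Os os, int queueLength, int processors) {
        switch (os) {
            case LINUX:
                return queueLength > processors;
            case WINDOWS:
                return queueLength > 0;
            default:
                throw new IllegalArgumentException("unknown OS: " + os);
        }
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs: " + cpus);
        // Made-up readings for illustration only.
        System.out.println("Linux, r = " + (cpus + 2) + ": oversubscribed = "
                + isOversubscribed(Os.LINUX, cpus + 2, cpus));
        System.out.println("Windows, queue = 0: oversubscribed = "
                + isOversubscribed(Os.WINDOWS, 0, cpus));
    }
}
```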

B- Disk Usage

Monitoring disk usage has two important goals.

  • The first pertains to the application itself: if the application is doing a lot of disk I/O, that I/O can easily become a bottleneck.
  • The second reason to monitor disk usage, even if the application is not expected to perform a significant amount of I/O, is to help detect whether the system is swapping.

You can use the iostat command to monitor the disk. Let's see an example:
[Image: iostat output]

  • This application is writing data to disk sda.
  • w_await: the time to service each I/O write
  • util: the disk utilization

Applications that write to disk can be bottlenecked both because they are writing data inefficiently (too little throughput) or because they are writing too much data (too much throughput).
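One common way to end up with "too little throughput" is issuing many tiny writes. The sketch below, with hypothetical class and file names, writes the same bytes once directly and once through a BufferedOutputStream so you can compare the timings on your own machine. This is a rough illustration, not a proper benchmark.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: compare unbuffered and buffered single-byte writes to a file. */
final class WriteThroughput {

    static Path tempFile() {
        try {
            return Files.createTempFile("write-demo", ".dat");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes count single bytes to file and returns the elapsed time in nanoseconds. */
    static long writeAndMeasure(Path file, int count, boolean buffered) {
        long start = System.nanoTime();
        try (OutputStream raw = Files.newOutputStream(file);
             OutputStream out = buffered ? new BufferedOutputStream(raw) : raw) {
            for (int i = 0; i < count; i++) {
                out.write('x');   // one tiny write per call
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int count = 200_000;
        Path file = tempFile();
        try {
            long rawNs = writeAndMeasure(file, count, false);
            long bufNs = writeAndMeasure(file, count, true);
            System.out.printf("unbuffered: %d ms%n", rawNs / 1_000_000);
            System.out.printf("buffered:   %d ms%n", bufNs / 1_000_000);
        } finally {
            file.toFile().delete();
        }
    }
}
```

While this runs, watching iostat in another terminal shows how the write pattern looks from the operating system side.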

C- Network Usage

If you are running an application that uses the network, for example a REST server, you must monitor the network traffic as well.
You can use nicstat to monitor the network. It is not installed by default on most systems, but it is open source and offers more features than the standard tools.
[Image: nicstat output]

Applications that write to the network can be bottlenecked because they are writing data inefficiently (too little throughput) or because they are writing too much data (too much throughput).


2) Java Monitoring Tools

To gain insight into the JVM itself, Java monitoring tools are required. These tools come with the JDK:

A- JVM Commands

  • jcmd: Prints basic class, thread, and JVM information for a Java process.
  • jconsole: Provides a graphical view of JVM activities, including thread usage, class usage, and GC activities
  • jmap: Provides heap dumps and other information about JVM memory usage. Suitable for scripting, though the heap dumps must be used in a postprocessing tool.
  • jinfo: Provides visibility into the system properties of the JVM, and allows some system properties to be set dynamically. Suitable for scripting.
  • jstack: Dumps the stacks of a Java process. Suitable for scripting.
  • jstat: Provides information about GC and class-loading activities. Suitable for scripting.
  • jvisualvm: A GUI tool to monitor a JVM, profile a running application, and analyze JVM heap dumps (which is a postprocessing activity, though jvisualvm can also take the heap dump from a live program).

If you are using Docker, you can run these tools inside the container using docker exec, except for the GUI tools jconsole and jvisualvm.

These tools fit into these broad areas:

  • Basic VM information
  • Thread information
  • Class information
  • Live GC analysis
  • Heap dump postprocessing
  • Profiling a JVM
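To get a feel for the "thread information" area, here is a rough in-process sketch of what a stack dump contains. It uses only Thread.getAllStackTraces() from the standard library; the output format is my own and only loosely resembles what jstack prints.

```java
import java.util.Map;

/** Sketch: print the stacks of all live threads in the current JVM. */
final class MiniStackDump {

    /** Formats every live thread with its state and stack frames. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Unlike jstack, this only sees the JVM it runs in, so it is a teaching aid rather than a replacement for the real tool.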

B- Basic VM Information

  • Uptime: the length of time the JVM has been up can be found via this command:

% jcmd process_id VM.uptime

  • System properties

% jcmd process_id VM.system_properties

or

% jinfo -sysprops process_id

  • JVM version: the version of the JVM is obtained like this:

% jcmd process_id VM.version

  • JVM tuning flags: the tuning flags in effect for an application can be obtained like this:

% jcmd process_id VM.flags [-all]
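Some of the same basic information is also available from inside the application through the standard RuntimeMXBean. This is a minimal sketch, not an equivalent of jcmd: in particular, getInputArguments() only lists the arguments passed on the command line, not every flag in effect.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

/** Sketch: read basic VM information from within the running JVM. */
final class BasicVmInfo {

    static RuntimeMXBean runtime() {
        return ManagementFactory.getRuntimeMXBean();
    }

    public static void main(String[] args) {
        RuntimeMXBean rt = runtime();
        System.out.println("uptime (ms):  " + rt.getUptime());
        System.out.println("VM:           " + rt.getVmName() + " " + rt.getVmVersion());
        System.out.println("java.version: " + rt.getSystemProperties().get("java.version"));
        // Only the arguments given on the command line, not all flags in effect.
        System.out.println("JVM args:     " + rt.getInputArguments());
    }
}
```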

Note: you can change some tuning flags dynamically at runtime using the jinfo command, for example:

% jinfo -flag -PrintGCDetails process_id # turns off PrintGCDetails
% jinfo -flag PrintGCDetails process_id # prints the current value of the flag
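On HotSpot-based JVMs, flags that can be changed at runtime are also exposed through com.sun.management.HotSpotDiagnosticMXBean. The sketch below lists them and flips one; it assumes a HotSpot JVM, and I picked HeapDumpOnOutOfMemoryError only because it is a flag that can be changed at runtime there, not because the book uses it.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Sketch: list and change runtime-writeable flags (assumes a HotSpot JVM). */
final class ManageableFlags {

    static HotSpotDiagnosticMXBean bean() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    /** Sets a flag at runtime and returns the value the JVM reports afterwards. */
    static String setAndRead(String flag, String value) {
        HotSpotDiagnosticMXBean hs = bean();
        hs.setVMOption(flag, value);
        return hs.getVMOption(flag).getValue();
    }

    public static void main(String[] args) {
        // Flags the JVM allows to be changed while it is running.
        for (VMOption opt : bean().getDiagnosticOptions()) {
            System.out.println(opt.getName() + " = " + opt.getValue());
        }
        System.out.println("HeapDumpOnOutOfMemoryError -> "
                + setAndRead("HeapDumpOnOutOfMemoryError", "true"));
    }
}
```

Trying to set a flag that is not writeable throws an exception, which mirrors the fact that jinfo can only change a subset of flags.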


3) Profiling Tools

Profilers are the most important tool in a performance analyst's toolbox. Many profilers are available for Java, each with its own advantages and disadvantages.

Many common Java profiling tools are themselves written in Java and work by "attaching" themselves to the application to be profiled. This attachment is via a socket or via a native Java interface called the JVM Tool Interface (JVMTI).
This means you must pay attention to tuning the profiling tool just as you would tune any other Java application. In particular, if the application being profiled is large, it can transfer quite a lot of data to the profiling tool, so the profiling tool must have a sufficiently large heap to handle the data.

Profiling happens in one of two modes:

  • sampling mode
  • instrumented mode

A- Sampling Profilers

Pros: Sampling is the basic mode of profiling and carries the least amount of overhead.

Cons: Sampling profilers can be subject to all sorts of errors. The most common sampling error is shown in the figure below:
[Image: a thread alternating between methodA (shaded bars) and methodB (clear bars)]
The thread here is alternating between executing methodA (shown in the shaded bars) and methodB (shown in the clear bars). If the timer fires only when the thread happens to be in methodB, the profile will report that the thread spent all its time executing methodB; in reality, more time was actually spent in methodA.

Reason: this is due to safepoint bias, which means that the profiler can get the stack trace of a thread only when the thread is at a safepoint. Threads are automatically at a safepoint when they are:

  • Blocked on a synchronized lock
  • Blocked waiting for I/O
  • Blocked waiting for a monitor
  • Parked
  • Executing Java Native Interface (JNI) code (unless they perform a GC locking function)
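The methodA/methodB sampling error is easy to reproduce with a toy simulation. This is not a real profiler: the timeline, the 7 ms / 3 ms split and the sampling offsets are all made up, and the only point is to show how a timer that stays aligned with the workload sees a distorted picture.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: simulate how an aligned sampling timer can misreport where time is spent. */
final class SamplingBias {

    /** The method "on CPU" at time t: methodA for the first aMillis of each cycle, then methodB. */
    static String methodAt(int t, int aMillis, int cycleMillis) {
        return (t % cycleMillis) < aMillis ? "methodA" : "methodB";
    }

    /** Samples the timeline every periodMillis, starting at offsetMillis, and counts what was seen. */
    static Map<String, Integer> sample(int totalMillis, int periodMillis, int offsetMillis,
                                       int aMillis, int cycleMillis) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int t = offsetMillis; t < totalMillis; t += periodMillis) {
            counts.merge(methodAt(t, aMillis, cycleMillis), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up workload: 7 ms in methodA, then 3 ms in methodB, repeated for one second.
        // A timer aligned with the cycle lands in methodB every single time.
        System.out.println("period 10 ms: " + sample(1000, 10, 8, 7, 10));
        // A timer that drifts relative to the cycle sees both methods.
        System.out.println("period 3 ms:  " + sample(1000, 3, 8, 7, 10));
    }
}
```

With the aligned timer the profile blames methodB for everything, even though methodA accounts for 70% of the simulated time.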

B- Instrumented Profilers

Pros: Instrumented profilers are much more intrusive than sampling profilers, but they can also give more beneficial information about what's happening inside a program.

Cons: They are much more likely to introduce performance differences into the application than are sampling profilers.

Instrumented profilers work by altering the bytecode sequence of classes as they are loaded (inserting code to count the invocations, and so on).
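Real instrumented profilers rewrite bytecode, which is too much for a short example. As a stand-in, this sketch uses a java.lang.reflect.Proxy to add the same kind of invocation counting around an interface. The Greeter interface and all names are hypothetical, and a dynamic proxy is a substitute technique here, not how profilers actually do it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Sketch: count method invocations with a dynamic proxy (a stand-in for bytecode instrumentation). */
final class CountingInstrumentation {

    /** Hypothetical application interface used only for this demo. */
    interface Greeter {
        String greet(String name);
    }

    static final Map<String, LongAdder> COUNTS = new ConcurrentHashMap<>();

    /** Wraps target so that every interface call bumps a per-method counter first. */
    @SuppressWarnings("unchecked")
    static <T> T instrument(Class<T> type, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            COUNTS.computeIfAbsent(method.getName(), k -> new LongAdder()).increment();
            return method.invoke(target, args);
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        Greeter greeter = instrument(Greeter.class, name -> "hello " + name);
        for (int i = 0; i < 5; i++) {
            greeter.greet("jvm");
        }
        System.out.println(COUNTS);
    }
}
```

Even this tiny wrapper adds a map lookup and a reflective call to every invocation, which hints at why instrumentation can change the performance of the code being measured.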

Note

Is this a better profile than the sampled version? It depends; there is no way to know in a given situation which is the more accurate profile. The invocation count of an instrumented profile is certainly accurate, and that additional information is often helpful in determining where the code is spending more time and which things are more fruitful to optimize.

C- Native Profilers

Tools like async-profiler and Oracle Developer Studio have the capability to profile native code in addition to Java code. This has two advantages:

  • Significant operations occur in native code, including within native libraries and native memory allocation.
  • We typically profile to find bottlenecks in application code, but sometimes native code unexpectedly dominates performance. We would prefer to find out that our code is spending too much time in GC by examining GC logs, but if we overlook that, a profiler that understands native code will quickly show it.

4) Java Flight Recorder (JFR)

Java Flight Recorder (JFR) is a feature of the JVM that performs lightweight performance analysis of applications while they are running. As its name suggests, JFR data is a history of events in the JVM that can be used to diagnose the past performance and operations of the JVM.

The basic operation of JFR is that a set of events is enabled (for example, one event is that a thread is blocked waiting for a lock), and each time a selected event occurs, data about that event is saved (either in memory or to a file).
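The enable-then-record cycle can be tried from code with the jdk.jfr API. This sketch assumes JDK 11 or later, where that API ships with the JDK; the OrderProcessed event and its field are invented for the example and stand in for whatever event you care about.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

/** Sketch: enable one event, record a few occurrences, then read them back. */
final class JfrEventDemo {

    /** Hypothetical application-defined event; name and field are made up. */
    @Name("demo.OrderProcessed")
    @Label("Order Processed")
    static class OrderProcessed extends Event {
        @Label("Order Id")
        long orderId;
    }

    /** Records the given number of events and returns how many were found in the recording. */
    static long recordAndCount(int events) {
        try {
            Path file = Files.createTempFile("jfr-demo", ".jfr");
            try (Recording recording = new Recording()) {
                recording.enable(OrderProcessed.class);  // select the event to capture
                recording.start();
                for (int i = 0; i < events; i++) {
                    OrderProcessed e = new OrderProcessed();
                    e.orderId = i;
                    e.commit();                          // save data about this occurrence
                }
                recording.stop();
                recording.dump(file);                    // write the recorded data to a file
                List<RecordedEvent> recorded = RecordingFile.readAllEvents(file);
                return recorded.stream()
                        .filter(ev -> ev.getEventType().getName().equals("demo.OrderProcessed"))
                        .count();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("events found in recording: " + recordAndCount(3));
    }
}
```

The built-in JVM events (locks, GC, I/O and so on) are enabled the same way, by name instead of by class.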

The more events are enabled, the more JFR affects the performance of the application.

A- Java Mission Control

The usual tool to examine JFR recordings is Java Mission Control (jmc), though other tools exist, and you can use toolkits to write your own analysis tools.
The Java Mission Control program (jmc) starts a window that displays the JVM processes on the machine and lets you select one or more processes to monitor. Figure 3-9 shows the Java Management Extensions (JMX) console of Java Mission Control monitoring our example REST server.
[Image: Java Mission Control JMX console]

B- JFR features

The following table shows what other tools can collect and what JFR collects for each event:

| Event | Other tools | JFR |
| --- | --- | --- |
| Classloading | Number of classes loaded and unloaded | Which classloader loaded the class; time required to load an individual class |
| Thread statistics | Number of threads created and destroyed; thread dumps | Which threads are blocked on locks (and the specific lock they are blocked on) |
| Throwables | Throwable classes used by the application | Number of exceptions and errors thrown and the stack trace of their creation |
| TLAB allocation | Number of allocations in the heap and size of thread-local allocation buffers (TLABs) | Specific objects allocated in the heap and the stack trace where they are allocated |
| File and socket I/O | Time spent performing I/O | Time spent per read/write call, the specific file or socket taking a long time to read or write |
| Monitor blocked | Threads waiting for a monitor | Specific threads blocked on specific monitors and the length of time they are blocked |
| Code cache | Size of code cache and how much it contains | Methods removed from the code cache; code cache configuration |
| Code compilation | Which methods are compiled, on-stack replacement (OSR) compilation, and length of time to compile | Nothing specific to JFR, but unifies information from several sources |
| Garbage collection | Times for GC, including individual phases; sizes of generations | Nothing specific to JFR, but unifies the information from several tools |
| Profiling | Instrumenting and sampling profiles | Not as much as you'd get from a true profiler, but the JFR profile provides a good high-order overview |

C- Enabling JFR

JFR is initially disabled. To enable it, add the flag -XX:+FlightRecorder to the command line of the application. This enables JFR as a feature, but no recordings will be made until the recording process itself is enabled. That can occur either through a GUI or via the command line.

In Oracle's JDK 8, you must also specify this flag (prior to the FlightRecorder flag): -XX:+UnlockCommercialFeatures (default: false).
If you forget to include these flags, remember that you can use jinfo to change their values and enable JFR. If you use jmc to start a recording, it will automatically change these values in the target JVM if necessary.

To start a recording from the command line, use this flag (it takes a string value, so there is no + after -XX:):
-XX:FlightRecorderOptions=string
The string in that parameter is a list of comma-separated name-value pairs taken from these options:

name=name
-->The name used to identify the recording.
defaultrecording=<true|false>
-->Whether to start the recording initially. The default value is false; for reactive analysis, this should be set to true.
settings=path
-->Name of the file containing the JFR settings (see the next section).
delay=time
-->The amount of time (e.g., 30s, 1h) before the recording should start.
duration=time
-->The amount of time to make the recording.
filename=path
-->Name of the file to write the recording to.
compress=<true|false>
-->Whether to compress (with gzip) the recording; the default is false.
maxage=time
-->Maximum time to keep recorded data in the circular buffer.
maxsize=size
-->Maximum size (e.g., 1024K, 1M) of the recording's circular buffer.
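To make the "comma-separated name-value pairs" format concrete, here is a tiny sketch that assembles the option string from a map. The helper class is hypothetical, the values are arbitrary examples, and the flag is written without a + because it takes a string value rather than a boolean.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: build a FlightRecorderOptions argument from name-value pairs. */
final class FlightRecorderOptionsBuilder {

    /** Joins name=value pairs with commas, in insertion order, behind the flag prefix. */
    static String build(Map<String, String> options) {
        return options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(",", "-XX:FlightRecorderOptions=", ""));
    }

    public static void main(String[] args) {
        // Example values only; pick what suits your application.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("defaultrecording", "true");
        options.put("maxage", "1h");
        options.put("maxsize", "1M");
        System.out.println(build(options));
    }
}
```

The printed string is what you would append to the java command line, after the flags that enable JFR itself.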

🏃 See you in chapter 4 ...


Take a tip

Never trust your code. 👮

Suspect your code
