This article is part 4 of the series Java Performance, which summarizes the book Java Performance by Scott Oaks.
In the previous chapter we discussed performance testing methods. We covered the difference between Microbenchmarks, Macrobenchmarks, and Mesobenchmarks, and we also talked about response time, throughput, and variability.
In this chapter we are going to discuss some interesting measurement tools for CPU, network, and disk. We will also look at the differences between the various Java profilers and talk a bit about JFR (Java Flight Recorder).
A Java Performance Toolbox
Performance analysis is all about visibility—knowing what is going on inside an application and in the application’s environment. Visibility is all about tools. And so performance tuning is all about tools.
The starting point for program analysis is not Java-specific at all: it is the basic set of monitoring tools that come with the operating system.
Let's take a quick look at the operating system tools for monitoring the usage of:
- CPU --> vmstat
- Disk --> iostat
- Network --> nicstat
CPU usage is typically divided into two categories: user time and system time (Windows refers to this as privileged time).
- User time is the percentage of time the CPU is executing application code.
- System time is the percentage of time the CPU is executing kernel code.
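The same user/system split also exists per thread inside the JVM. As a minimal sketch (the class name ThreadCpuSplit is hypothetical), the standard ThreadMXBean reports a thread's total CPU time and its user-mode CPU time in nanoseconds, so system time is approximately the difference:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class ThreadCpuSplit {
    /** Returns {userNanos, systemNanos} consumed so far by the calling thread, or null if unsupported. */
    static long[] currentThreadUserAndSystemNanos() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            return null;
        }
        if (!threads.isThreadCpuTimeEnabled()) {
            threads.setThreadCpuTimeEnabled(true); // usually already on by default
        }
        long total = threads.getCurrentThreadCpuTime(); // user + system time, in ns
        long user = threads.getCurrentThreadUserTime(); // user-mode time only, in ns
        // The two reads are not atomic, so clamp the approximate difference at zero.
        return new long[] { user, Math.max(0, total - user) };
    }

    public static void main(String[] args) {
        double x = 0;
        for (int i = 0; i < 50_000_000; i++) { // pure computation: almost all user time
            x += Math.sqrt(i);
        }
        long[] split = currentThreadUserAndSystemNanos();
        if (split == null) {
            System.out.println("thread CPU time is not supported on this JVM");
        } else {
            System.out.printf("user=%d ms, system=%d ms (x=%.0f)%n",
                    split[0] / 1_000_000, split[1] / 1_000_000, x);
        }
    }
}
```

A pure computation loop like the one in main should show nearly all of its time as user time; code that makes many system calls (I/O, locking) shifts the balance toward system time.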
The goal is usually to drive CPU utilization as high as possible, for as short a time as possible.
In the sample vmstat output:
- Each second shows roughly 3% system time and 42% user time.
- The total CPU time (aka utilization) is therefore about 45%, which means the CPU is idle 55% of the time.
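vmstat derives these percentages from kernel CPU tick counters sampled at each interval. The arithmetic can be sketched as follows, assuming only three counters (user, system, idle) and ignoring the iowait and steal columns that real vmstat output also reports; the class and method names are hypothetical:

```java
class CpuUtilization {
    /**
     * Converts the CPU tick deltas for one interval into percentages.
     * Returns {user%, system%, utilization%, idle%}.
     * Simplification: real vmstat also tracks iowait (wa) and steal (st).
     */
    static double[] percentages(long userTicks, long systemTicks, long idleTicks) {
        long total = userTicks + systemTicks + idleTicks;
        if (total <= 0) {
            throw new IllegalArgumentException("no CPU ticks elapsed in the interval");
        }
        double user = 100.0 * userTicks / total;
        double system = 100.0 * systemTicks / total;
        double idle = 100.0 * idleTicks / total;
        return new double[] { user, system, user + system, idle };
    }

    public static void main(String[] args) {
        // The example from the text: 42 user, 3 system, and 55 idle ticks out of 100.
        double[] p = percentages(42, 3, 55);
        System.out.printf("user=%.0f%% system=%.0f%% utilization=%.0f%% idle=%.0f%%%n",
                p[0], p[1], p[2], p[3]);
    }
}
```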
The CPU can be idle for multiple reasons:
- The application might be blocked on a synchronization primitive and unable to execute until that lock is released.
- The application might be waiting for something, such as a response to come back from a call to the database.
- The application might have nothing to do.
These first two situations are always indicative of something that can be addressed. If contention on the lock can be reduced or the database can be tuned so that it sends the answer back more quickly, then the program will run faster, and the average CPU use of the application will go up (assuming, of course, that there isn’t another such issue that will continue to block the application).
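The first case can be observed from inside Java. In this minimal sketch (the class and thread names are hypothetical), a second thread tries to enter a monitor that the main thread already holds. It reports the BLOCKED state and, once parked on the monitor, uses no CPU while it waits, which is exactly the kind of idle time that shows up in vmstat:

```java
class LockContentionDemo {
    private static final Object LOCK = new Object();

    /** Returns the state observed for a thread that contends for a monitor we already hold. */
    static Thread.State observeContendingThread() throws InterruptedException {
        Thread.State observed;
        Thread waiter = new Thread(() -> {
            synchronized (LOCK) {
                // Reached only after the main thread releases LOCK.
            }
        }, "waiter");
        synchronized (LOCK) {
            waiter.start();
            // Poll (bounded, so the demo can never hang) until the waiter blocks on LOCK.
            // Thread.sleep does not release the monitor, so the waiter stays blocked.
            for (int i = 0; i < 1_000 && waiter.getState() != Thread.State.BLOCKED; i++) {
                Thread.sleep(1);
            }
            observed = waiter.getState();
        } // releasing LOCK here lets the waiter proceed
        waiter.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter state while contending: " + observeContendingThread());
    }
}
```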
Java and Single CPU:
In a batch-style application, the CPU should not be idle, because there is always work to do: if one job is blocked on I/O (or something else), another job can use the CPU.
Java and Multi-CPU:
The general idea is the same as with a single CPU; however, making sure that individual threads are not blocked will drive the CPU usage higher.
You can monitor the number of threads that are able to run (that is, threads that are not blocked). These threads are said to be in the CPU run queue. In the vmstat output shown earlier, the run queue length is the first column (labeled r).
- On Linux, the number counts both the threads currently running (using a processor) and those waiting for a processor.
- On Windows, the number does NOT include the threads currently running.
So the goal on Linux is to keep the run queue length equal to the number of processors on the machine, and on Windows to keep it at 0.
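A JVM can make a rough version of this check itself. The sketch below (class name hypothetical) compares the one-minute system load average from OperatingSystemMXBean with the processor count. This is only an approximation of vmstat's r column: the load average is a damped average over time, on Linux it also counts tasks in uninterruptible (typically disk) sleep, and getSystemLoadAverage() returns a negative value on platforms that do not provide it (typically Windows). Treating a ratio above 1.0 as overload is an assumption of this sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

class RunQueueCheck {
    /** One-minute load average divided by processor count, or NaN if the OS does not report it. */
    static double loadPerProcessor() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative when unavailable
        int cpus = os.getAvailableProcessors();
        return load < 0 ? Double.NaN : load / cpus;
    }

    public static void main(String[] args) {
        double ratio = loadPerProcessor();
        if (Double.isNaN(ratio)) {
            System.out.println("load average not available on this platform");
        } else if (ratio > 1.0) {
            // Assumption: sustained values above 1.0 mean more runnable work than processors.
            System.out.printf("load/cpu = %.2f: threads are waiting for a processor%n", ratio);
        } else {
            System.out.printf("load/cpu = %.2f: processors can keep up%n", ratio);
        }
    }
}
```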
Monitoring disk usage has two important goals.
- The first pertains to the application itself: if the application is doing a lot of disk I/O, that I/O can easily become a bottleneck.
- The second reason to monitor disk usage, even if the application is not expected to perform a significant amount of I/O, is to help detect whether the system is swapping.
In the sample iostat output, the application is writing data to disk sda. The columns to watch are:
- w_average time (in milliseconds) to service each I/O write
- %util: the disk utilization
Applications that write to disk can be bottlenecked either because they are writing data inefficiently (too little throughput) or because they are writing too much data (too much throughput).
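Writing data inefficiently is often a matter of issuing many tiny write calls. As a sketch (class and method names are hypothetical), the two methods below write the same bytes: one issues a system call per byte, and the other batches writes through a 64 KB BufferedOutputStream. Neither forces data to the physical disk (there is no fsync), so the difference they show is mostly per-call overhead, which appears as system time in vmstat:

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class WriteEfficiency {
    /** Writes n bytes with one write() call, and therefore one system call, per byte. */
    static long writeUnbuffered(Path file, int n) throws IOException {
        long start = System.nanoTime();
        try (OutputStream out = new FileOutputStream(file.toFile())) {
            for (int i = 0; i < n; i++) {
                out.write(i & 0xFF);
            }
        }
        return System.nanoTime() - start;
    }

    /** Writes the same n bytes, but the buffer turns them into a few large write() calls. */
    static long writeBuffered(Path file, int n) throws IOException {
        long start = System.nanoTime();
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file.toFile()), 64 * 1024)) {
            for (int i = 0; i < n; i++) {
                out.write(i & 0xFF);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        int n = 1_000_000;
        Path a = Files.createTempFile("unbuffered", ".bin");
        Path b = Files.createTempFile("buffered", ".bin");
        try {
            System.out.printf("unbuffered: %d ms%n", writeUnbuffered(a, n) / 1_000_000);
            System.out.printf("buffered:   %d ms%n", writeBuffered(b, n) / 1_000_000);
        } finally {
            Files.deleteIfExists(a);
            Files.deleteIfExists(b);
        }
    }
}
```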
If you are running an application that uses the network—for example, a REST server—you must monitor the network traffic as well.
You can use nicstat to monitor the network. It is not installed by default on most systems, but it is an open source tool with more features than the network tools that ship with the operating system.
Applications that write to the network can likewise be bottlenecked either because they are writing data inefficiently (too little throughput) or because they are writing too much data (too much throughput).
To gain insight into the JVM itself, Java monitoring tools are required. These tools come with the JDK:
- jcmd: Prints basic class, thread, and JVM information for a Java process.
- jconsole: Provides a graphical view of JVM activities, including thread usage, class usage, and GC activities
- jmap: Provides heap dumps and other information about JVM memory usage. Suitable for scripting, though the heap dumps must be used in a postprocessing tool.
- jinfo: Provides visibility into the system properties of the JVM, and allows some system properties to be set dynamically. Suitable for scripting.
- jstack: Dumps the stacks of a Java process. Suitable for scripting.
- jstat: Provides information about GC and class-loading activities. Suitable for scripting.
- jvisualvm: A GUI tool to monitor a JVM, profile a running application, and analyze JVM heap dumps (which is a postprocessing activity, though jvisualvm can also take the heap dump from a live program).
If the JVM runs inside a Docker container, you can run these tools inside the container with docker exec, with some exceptions.
These tools fit into these broad areas:
• Basic VM information
• Thread information
• Class information
• Live GC analysis
• Heap dump postprocessing
• Profiling a JVM
- Uptime: The length of time the JVM has been up can be found via this command:
% jcmd process_id VM.uptime
- System properties: The system properties of the JVM can be listed with either of these commands:
% jcmd process_id VM.system_properties
% jinfo -sysprops process_id
- JVM version: The version of the JVM is obtained like this:
% jcmd process_id VM.version
- JVM tuning flags: The tuning flags in effect for an application can be obtained like this (add -all to list every flag, not only those that were set):
% jcmd process_id VM.flags [-all]
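Much of the same information is available to code running inside the JVM through the standard RuntimeMXBean. This sketch (class and method names hypothetical) is not a replacement for jcmd: in particular, getInputArguments() returns only the options passed on the command line, not the full set of flags that VM.flags -all reports.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;
import java.util.Map;

class VmInfo {
    private static final RuntimeMXBean RUNTIME = ManagementFactory.getRuntimeMXBean();

    /** Counterpart of "jcmd process_id VM.uptime", in milliseconds. */
    static long uptimeMillis() {
        return RUNTIME.getUptime();
    }

    /** Counterpart of "jcmd process_id VM.version". */
    static String vmVersion() {
        return RUNTIME.getVmName() + " " + RUNTIME.getVmVersion();
    }

    /** Counterpart of "jcmd process_id VM.system_properties" / "jinfo -sysprops process_id". */
    static Map<String, String> systemProperties() {
        return RUNTIME.getSystemProperties();
    }

    /** Only the JVM options given on the command line (a subset of "VM.flags -all"). */
    static List<String> commandLineFlags() {
        return RUNTIME.getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("uptime (ms): " + uptimeMillis());
        System.out.println("version:     " + vmVersion());
        System.out.println("java.home:   " + systemProperties().get("java.home"));
        System.out.println("flags:       " + commandLineFlags());
    }
}
```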
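Note that you can change some tuning flags dynamically at runtime with the jinfo command, but only flags that are marked as manageable. For example:
% jinfo -flag -PrintGCDetails process_id   # turns off PrintGCDetails
% jinfo -flag PrintGCDetails process_id    # prints the current value of PrintGCDetails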
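On HotSpot, the in-process counterpart of jinfo -flag is the HotSpotDiagnosticMXBean. The sketch below (class name hypothetical) reads a flag, checks whether it is writeable (manageable), and changes it. It uses HeapDumpOnOutOfMemoryError as the example because that flag is manageable in HotSpot; PrintGCDetails was superseded by unified logging (-Xlog:gc*) in JDK 9 and later, so it may not be writeable there.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

class FlagToggle {
    private static final HotSpotDiagnosticMXBean DIAG =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

    /** Current value of a flag, like "jinfo -flag Name process_id". */
    static String value(String flag) {
        return DIAG.getVMOption(flag).getValue();
    }

    /** True if the flag is manageable, i.e. it may be changed while the JVM runs. */
    static boolean isManageable(String flag) {
        return DIAG.getVMOption(flag).isWriteable();
    }

    /** Like "jinfo -flag +Name" / "-Name"; throws IllegalArgumentException if not writeable. */
    static void set(String flag, boolean on) {
        DIAG.setVMOption(flag, Boolean.toString(on));
    }

    public static void main(String[] args) {
        String flag = "HeapDumpOnOutOfMemoryError";
        VMOption option = DIAG.getVMOption(flag);
        System.out.println(flag + " = " + option.getValue() + " (origin " + option.getOrigin() + ")");
        if (isManageable(flag)) {
            set(flag, true);
            System.out.println(flag + " now = " + value(flag));
        }
    }
}
```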
Profilers are the most important tool in a performance analyst's toolbox. Many profilers are available for Java, each with its own advantages and disadvantages.
Many common Java profiling tools are themselves written in Java and work by “attaching” themselves to the application to be profiled. This attachment is via a socket or via a native Java interface called the JVM Tool Interface (JVMTI).
This means you must pay attention to tuning the profiling tool just as you would tune any other Java application. In particular, if the application being profiled is large, it can transfer quite a lot of data to the profiling tool, so the profiling tool must have a sufficiently large heap to handle the data.
Profiling happens in one of two modes:
- sampling mode
- instrumented mode
Sampling profilers:
Pros: Sampling is the basic mode of profiling and carries the least amount of overhead.
Cons: However, sampling profilers can be subject to all sorts of errors. The most common sampling error is shown in the figure below:
The thread here is alternating between executing methodA (shown in the shaded bars) and methodB (shown in the clear bars). If the timer fires only when the thread happens to be in methodB, the profile will report that the thread spent all its time executing methodB; in reality, more time was actually spent in methodA.
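Reason: this is due to safepoint bias: the profiler can get the stack trace of a thread only when the thread is at a safepoint. Threads are automatically at a safepoint when they are:
• Blocked on a synchronized lock
• Blocked waiting for I/O
• Blocked waiting for a monitor
• Executing Java Native Interface (JNI) code (unless they perform a GC locking function)
A thread that is running Java code reaches a safepoint only at specific points chosen by the JVM, so samples cluster at those points instead of being spread evenly over the code.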
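To see where the bias comes from, here is a deliberately naive sampling profiler (class and method names are hypothetical). It repeatedly calls Thread.getStackTrace() on a worker thread and counts the top frame. Because the JVM captures another thread's stack only when that thread is stopped at a safepoint, every sample this sketch takes is subject to the bias described above; profilers such as async-profiler use other mechanisms to avoid it.

```java
import java.util.HashMap;
import java.util.Map;

class TinySampler {
    /** Samples the target's top stack frame every intervalMs until it exits; returns hits per method. */
    static Map<String, Integer> sample(Thread target, long intervalMs) throws InterruptedException {
        Map<String, Integer> hits = new HashMap<>();
        while (target.isAlive()) {
            StackTraceElement[] stack = target.getStackTrace(); // captured only at a safepoint
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            Thread.sleep(intervalMs);
        }
        return hits;
    }

    // Hypothetical workload: the thread alternates between two methods, as in the figure.
    static double methodA() {
        double x = 0;
        for (int i = 0; i < 200_000; i++) {
            x += Math.sin(i);
        }
        return x;
    }

    static double methodB() {
        double x = 0;
        for (int i = 0; i < 100_000; i++) {
            x += Math.cos(i);
        }
        return x;
    }

    /** Runs the workload for workMillis on a worker thread and returns the sampled profile. */
    static Map<String, Integer> profile(long workMillis) throws InterruptedException {
        Thread worker = new Thread(() -> {
            long deadline = System.nanoTime() + workMillis * 1_000_000L;
            double sink = 0;
            while (System.nanoTime() < deadline) {
                sink += methodA() + methodB();
            }
            if (sink == 42) {
                System.out.println(); // keeps the JIT from discarding the work
            }
        }, "worker");
        worker.start();
        return sample(worker, 1);
    }

    public static void main(String[] args) throws InterruptedException {
        profile(500).forEach((method, count) -> System.out.println(count + "\t" + method));
    }
}
```

The exact counts vary between runs and JVM versions, and the top frame may even be a JDK method called from methodA or methodB; comparing several runs is a quick way to see how unstable a naive sampled profile can be.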
Instrumented profilers:
Pros: Instrumented profilers can give more detailed information about what's happening inside a program, such as exact invocation counts.
Cons: They are much more intrusive than sampling profilers and much more likely to introduce performance differences into the application.
Instrumented profilers work by altering the bytecode sequence of classes as they are loaded (inserting code to count the invocations, and so on).
Is this a better profile than the sampled version? It depends; there is no way to know in a given situation which is the more accurate profile. The invocation count of an instrumented profile is certainly accurate, and that additional information is often helpful in determining where the code is spending more time and which things are more fruitful to optimize.
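Real instrumenting profilers rewrite bytecode as classes are loaded. As a hand-written sketch of the kind of bookkeeping they insert (all names are hypothetical), the wrapper below counts invocations exactly and accumulates elapsed time. The timing calls themselves add overhead to every invocation, which is why instrumentation distorts short, frequently called methods the most.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

class HandInstrumented {
    /** Per-method counters, like those an instrumenting profiler keeps. */
    static final class Stats {
        final LongAdder calls = new LongAdder();
        final LongAdder nanos = new LongAdder();
    }

    static final Map<String, Stats> STATS = new ConcurrentHashMap<>();

    /** Runs body, recording one invocation and its elapsed wall-clock time under name. */
    static <T> T timed(String name, Supplier<T> body) {
        Stats stats = STATS.computeIfAbsent(name, k -> new Stats());
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            stats.calls.increment();
            stats.nanos.add(System.nanoTime() - start);
        }
    }

    static long calls(String name) {
        Stats stats = STATS.get(name);
        return stats == null ? 0 : stats.calls.sum();
    }

    // The "instrumented" version of a tiny method: the bookkeeping dwarfs the real work.
    static long square(int x) {
        return timed("square", () -> (long) x * x);
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += square(i);
        }
        Stats s = STATS.get("square");
        System.out.printf("square: %d calls, %d us total (sum=%d)%n",
                s.calls.sum(), s.nanos.sum() / 1_000, sum);
    }
}
```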
Tools like async-profiler and Oracle Developer Studio can profile native code in addition to Java code. This has two advantages:
- Significant operations occur in native code, including within native libraries and in native memory allocation.
- We typically profile to find bottlenecks in application code, but sometimes native code (for example, the garbage collector) unexpectedly dominates performance. We would prefer to discover that our code is spending too much time in GC by examining GC logs, but a profiler that understands native code will reveal it as well.
Java Flight Recorder (JFR) is a feature of the JVM that performs lightweight performance analysis of applications while they are running. As its name suggests, JFR data is a history of events in the JVM that can be used to diagnose the past performance and operations of the JVM.
The basic operation of JFR is that a set of events is enabled (for example, one event is that a thread is blocked waiting for a lock), and each time a selected event occurs, data about that event is saved (either in memory or to a file).
The more events JFR records, the greater its impact on the application's performance.
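Since JDK 11, JFR's event model is also exposed as a public API in the jdk.jfr module. As a minimal sketch (the event name demo.BatchProcessed and the class names are hypothetical), the code below enables one custom event type, commits one event per batch, dumps the recording to a file, and reads the events back. Only enabled events are saved, which is the knob that controls JFR's overhead.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

class JfrEventDemo {
    // A custom event type; the name "demo.BatchProcessed" is made up for this sketch.
    @Name("demo.BatchProcessed")
    @Label("Batch Processed")
    static class BatchEvent extends Event {
        @Label("Items")
        int items;
    }

    /** Commits one event per batch size and returns the item counts read back from the recording. */
    static List<Integer> recordBatches(int... sizes) throws Exception {
        Path dir = Files.createTempDirectory("jfr-demo");
        Path file = dir.resolve("batches.jfr");
        try (Recording recording = new Recording()) {
            recording.enable(BatchEvent.class); // only enabled events are recorded
            recording.start();
            for (int size : sizes) {
                BatchEvent event = new BatchEvent();
                event.begin();                  // start timing this event
                // ... the real work for the batch would happen here ...
                event.items = size;
                event.commit();                 // saved into the recording's buffers
            }
            recording.stop();
            recording.dump(file);               // write the recorded data to a file
        }
        List<Integer> items = new ArrayList<>();
        for (RecordedEvent e : RecordingFile.readAllEvents(file)) {
            if (e.getEventType().getName().equals("demo.BatchProcessed")) {
                items.add(e.getInt("items"));
            }
        }
        Files.deleteIfExists(file);
        Files.deleteIfExists(dir);
        return items;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("items per batch: " + recordBatches(10, 25, 7));
    }
}
```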
The usual tool to examine JFR recordings is Java Mission Control (jmc), though other tools exist, and you can use toolkits to write your own analysis tools.
The Java Mission Control program (jmc) starts a window that displays the JVM processes on the machine and lets you select one or more processes to monitor. Figure 3-9 shows the Java Management Extensions (JMX) console of Java Mission Control monitoring our example REST server.
The following table shows, for each event, what other tools can collect and what JFR collects:
| Event | Data collected by other tools | Data collected by JFR |
|---|---|---|
| Classloading | Number of classes loaded and unloaded | Which classloader loaded the class; time required to load an individual class |
| Thread statistics | Number of threads created and destroyed; thread dumps | Which threads are blocked on locks (and the specific lock they are blocked on) |
| Throwables | Throwable classes used by the application | Number of exceptions and errors thrown and the stack trace of their creation |
| TLAB allocation | Number of allocations in the heap and size of thread-local allocation buffers (TLABs) | Specific objects allocated in the heap and the stack trace where they are allocated |
| File and socket I/O | Time spent performing I/O | Time spent per read/write call, the specific file or socket taking a long time to read or write |
| Monitor blocked | Threads waiting for a monitor | Specific threads blocked on specific monitors and the length of time they are blocked |
| Code cache | Size of code cache and how much it contains | Methods removed from the code cache; code cache configuration |
| Code compilation | Which methods are compiled, on-stack replacement (OSR) compilation, and length of time to compile | Nothing specific to JFR, but unifies information from several sources |
| Garbage collection | Times for GC, including individual phases; sizes of generations | Nothing specific to JFR, but unifies the information from several tools |
| Profiling | Instrumenting and sampling profiles | Not as much as you'd get from a true profiler, but the JFR profile provides a good high-order overview |
JFR is initially disabled. To enable it, add the flag -XX:+FlightRecorder to the command line of the application. This enables JFR as a feature, but no recordings will be made until the recording process itself is enabled. That can occur either through a GUI or via the command line.
In Oracle's JDK 8, you must also specify this flag (prior to the FlightRecorder flag): -XX:+UnlockCommercialFeatures (default: false).
If you forget to include these flags, remember that you can use jinfo to change their values and enable JFR. If you use jmc to start a recording, it will automatically change these values in the target JVM if necessary.
To start a recording from the command line, use the -XX:StartFlightRecording=string flag. The string in that parameter is a list of comma-separated name-value pairs taken from these options:
- name=name --> The name used to identify the recording.
- defaultrecording=<true|false> --> Whether to start the recording initially. The default value is false; for reactive analysis, this should be set to true.
- settings=path --> Name of the file containing the JFR settings (see the next section).
- delay=time --> The amount of time (e.g., 30s, 1h) before the recording should start.
- duration=time --> The amount of time to make the recording.
- filename=path --> Name of the file to write the recording to.
- compress=<true|false> --> Whether to compress (with gzip) the recording; the default is false.
- maxage=time --> Maximum time to keep recorded data in the circular buffer.
- maxsize=size --> Maximum size (e.g., 1024K, 1M) of the recording's circular buffer.
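On JDK 11 and later, a recording with similar options can also be started programmatically through jdk.jfr.Recording. This is a sketch, not a full equivalent: the mapping in the comments is approximate, the class name, recording name, and sizes are hypothetical, and only some of the command-line options are covered.

```java
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

class ConfiguredRecording {
    /**
     * Starts a recording roughly like
     * -XX:StartFlightRecording=name=demo,settings=default,duration=5m,maxage=10m,maxsize=64M,filename=...
     */
    static Recording start(Path filename) throws Exception {
        Configuration settings = Configuration.getConfiguration("default"); // settings=default
        Recording recording = new Recording(settings);
        recording.setName("demo");                    // name=demo
        recording.setToDisk(true);                    // maxage/maxsize apply to the on-disk buffer
        recording.setDuration(Duration.ofMinutes(5)); // duration=5m (stops automatically)
        recording.setMaxAge(Duration.ofMinutes(10));  // maxage=10m
        recording.setMaxSize(64L * 1024 * 1024);      // maxsize=64M
        recording.setDestination(filename);           // filename=..., written when the recording stops
        recording.start();                            // delay=... would be scheduleStart(delay) instead
        return recording;
    }

    public static void main(String[] args) throws Exception {
        Path file = Path.of(args.length > 0 ? args[0] : "demo.jfr");
        try (Recording recording = start(file)) {
            Thread.sleep(2_000); // the application's real work would run here
            recording.stop();    // writes the data to the destination file
        }
        System.out.println("recording written to " + file.toAbsolutePath());
    }
}
```

The resulting file can then be opened in Java Mission Control like any other recording.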
Never trust your code. 👮