
Evan Lin

Originally published at evanlin.com

TWJUG@LINE Conference Notes: September 5, 2019

Preface

Hello everyone, I am Evan Lin, a Technical Evangelist at LINE Taiwan. On the evening of 2019/09/05, we were very happy to host the TWJUG community at LINE's Taipei office for another community gathering. The speakers this time were Shinya Yoshida from the LINE Tokyo office and Yuto Kawamura, who also spoke at Kafka Summit 2017. Their topics were "ZGC for Future LINE HBase" and "Kafka Broker performance degradation by mysterious JVM pause".

Event URL: KKTIX: https://twjug.kktix.cc/events/twjug201909

ZGC for Future LINE HBase / LINE Shinya Yoshida

Slides

First up was Shinya Yoshida from LINE, who works on HBase, sharing how HBase is applied at LINE. HBase is a widely used NoSQL database written in Java, known for fast response times and high availability. Because the JVM relies on STW (Stop The World, i.e. pausing the application to perform GC) during garbage collection, this is a real problem for services that handle a large number of connections and need consistently high performance. This talk shared the related tuning work and observations. The speaker began with an overview of the categories and types of garbage collection.

The speaker explained that GC has two main stages:

  • Finding the garbage:
    • Explanation: the memory that can be reclaimed must first be identified and marked as garbage.
    • Algorithms: there are two main approaches. One is reference counting, which keeps a count of the references to each object and treats a count of zero as the signal that the object can be reclaimed. The other is marking (tracing), which walks the object graph starting from the GC roots; any object that cannot be reached from a live reference is garbage (a minimal sketch of this idea follows the list).
  • Collecting the garbage and defragmenting:
    • Explanation: next, the dead objects are reclaimed and the fragmented free space is consolidated into contiguous usable memory.
    • Algorithms:
      • Sweep/Compaction: first reclaim the dead objects, then compact the surviving objects so that the remaining free memory becomes contiguous.
      • Copy: compared with sweep/compaction, copying moves the live objects into a whole new region of memory, so they end up contiguous. This consumes more memory space, but is generally faster.
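
To make the "mark" idea concrete, here is a minimal, illustrative Java sketch (my own toy types, not code from the talk or from the JVM): live objects are found by traversing the object graph from the GC roots, and anything left unmarked is a candidate for collection.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy object graph node: each "object" just holds its outgoing references.
class ObjectNode {
    final List<ObjectNode> references;
    ObjectNode(List<ObjectNode> references) { this.references = references; }
}

public class MarkPhaseSketch {
    // Returns the set of reachable ("live") objects; everything not in this set
    // is garbage that a sweep/compaction or copying phase could then reclaim.
    static Set<ObjectNode> mark(List<ObjectNode> gcRoots) {
        Set<ObjectNode> marked = new HashSet<>();
        Deque<ObjectNode> worklist = new ArrayDeque<>(gcRoots);
        while (!worklist.isEmpty()) {
            ObjectNode current = worklist.pop();
            if (marked.add(current)) {                // first visit: mark it
                worklist.addAll(current.references);  // and follow its references
            }
        }
        return marked;
    }
}
```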

The slide compares many of the recent GC algorithms. For defragmentation they use the copy approach: it consumes more memory, but it is comparatively fast. As you can see, there are plenty of collectors to choose from, including G1GC, ZGC, Shenandoah, and the older collectors. When choosing a GC, the speaker recommended the following guidelines for reference:

  • Understand the advantages and disadvantages of each GC.
  • Select a GC based on the hardware your application (service) runs on.

The slides contain many more usage references worth a closer look; here I will just share the results. For the performance comparison, the speaker ultimately benchmarked ZGC against G1GC, and the results are as follows:

From these results you can see that G1GC and ZGC perform differently on different hardware (memory) configurations. With a larger heap (128G), ZGC shows better update and read performance, and that configuration matches LINE's service workloads. However, because ZGC is still experimental on Java 11, it is only being used for performance testing inside LINE for now; the team will share further results once more experiments are done and it goes into production.
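
For readers who want to try a similar comparison, here is a minimal sketch of how the two collectors are selected on JDK 11 (my own illustration, not a command from the slides; the jar name is hypothetical). Note that ZGC still has to be unlocked as an experimental option at that version:

```
# G1GC (the default collector on JDK 11), with a large heap like the test setup
java -XX:+UseG1GC -Xmx128g -jar hbase-workload.jar

# ZGC is experimental on JDK 11, so it must be unlocked explicitly
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx128g -jar hbase-workload.jar
```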

Kafka Broker performance degradation by mysterious JVM pause / LINE - Yuto Kawamura

Slides

This talk, by Yuto Kawamura, a senior engineer at LINE, walked through the debugging of a problem that occurred in a live service. Kafka plays a very important role in LINE's messaging backend, and more than sixty services use it (you can refer to this slide). The speaker shared a problem that hit Kafka at the time and explained the entire debugging process.

Phenomenon/Problem

Originally every Kafka produce request was handled smoothly, but suddenly, for a period of time, the 99th-percentile produce response time degraded, which in turn caused ZooKeeper session timeouts.

The team then analyzed what was happening while the problem occurred:

  • The utilization of every running thread spiked.
  • GC time (STW) increased, and JVM-level analysis showed that the stop-the-world pauses triggered by garbage collection were also getting longer (some JVM options for observing this are sketched after this list).
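
For reference, these are the kinds of JVM options that expose GC pause and safepoint timing; this is an illustrative, JDK 8-style list on my part, as the talk does not say exactly which flags were used:

```
# Print GC details plus the total time application threads were stopped (STW)
-XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime

# Print per-safepoint statistics, including how long the "sync" phase took
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
```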

Narrowing down the scope

Based on these observations, the speaker shared his debugging approach. First he made an assumption: he suspected that some JVM- or system-level event was slowing everything down, and then tried to reproduce the problem in an equivalent environment.

A quick note on how STW works: before a stop-the-world GC can run, two things happen. First, the JVM requests a safepoint, telling every application thread that it has to pause; then the JVM waits for all running threads to actually reach that safepoint, a phase called safepoint sync. Only after every thread has stopped does the GC itself begin.

The next step was to test whether the JVM's safepoint sync was where the time was being wasted. The method was to write a very long nested loop so that the thread running it cannot reach the safepoint quickly, deliberately stretching out safepoint sync, and then observe whether this reproduces the same symptoms and confirms the assumption.
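
Here is a minimal sketch of that idea (my own reconstruction, not the speaker's actual test tool): once JIT-compiled, a counted loop with an int index typically contains no safepoint poll, so the thread running it cannot stop at the requested safepoint, and every other thread has to wait out the safepoint sync phase with it.

```java
public class SlowSafepointSync {
    static volatile long blackhole;   // keep the JIT from eliminating the loop

    public static void main(String[] args) throws Exception {
        Thread busy = new Thread(() -> {
            long sum = 0;
            // A very long nested counted loop: int-indexed loops usually carry no
            // safepoint poll once JIT-compiled, so this thread cannot pause until
            // the loops finish.
            for (int i = 0; i < Integer.MAX_VALUE; i++) {
                for (int j = 0; j < 20; j++) {
                    sum += (long) i * j;
                }
            }
            blackhole = sum;
        });
        busy.start();

        Thread.sleep(2000);   // give the JIT time to compile the loop
        long start = System.nanoTime();
        System.gc();          // any safepoint operation would do: GC, jstack, etc.
        long pauseMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("System.gc() returned after ~" + pauseMs + " ms");
        busy.join();
    }
}
```

Running this with safepoint logging enabled should show the safepoint sync time ballooning while the loop is still spinning.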

This part is quite exciting, and I encourage everyone to go through the slides. The whole process is the speaker repeatedly forming a hypothesis, writing test tools to try to reproduce the problem, and finally using low-level observation tools to confirm whether the hypothesis holds.

What was the root cause in the end? I will let the speaker keep that a secret; please read the slides to learn the truth behind this low-level Kafka performance problem.

Event Summary

The content of this gathering was excellent: an in-depth look at the JVM's internals together with hands-on debugging experience. I believe it was a great knowledge feast for everyone who attended, and you are welcome to dig further into the slides and discuss them together.

Join the "LINE Developer Official Community" official account immediately, and you can receive the first-hand Meetup activities, or push notifications of the latest news related to the developer program. ▼

"LINE Developer Official Community" official account ID: @line_tw_dev

About "LINE Developer Community Program"

LINE launched the "LINE Developer Community Program" in Taiwan at the beginning of this year and will invest long-term manpower and resources in Taiwan to hold internal and external, online and offline developer community gatherings, recruitment days, developer conferences, and more, with over 30 events expected throughout the year. Readers are welcome to keep checking back for the latest updates. For details, see the 2019 LINE Developer Community Program Event Schedule (continuously updated): https://engineering.linecorp.com/zh-hant/blog/line-taiwan-developer-relations-2019-plan/
