DEV Community

Apache Doris


How to solve C++ memory problems efficiently

Overview

Apache Doris uses C++ to develop its execution engine. One of the most important factors affecting the efficiency of C++ development is pointer misuse, including illegal accesses, leaks, and forced type conversions. This article introduces Sanitizer and Core Dump analysis tools and shows how we use them to quickly locate C++ problems in Apache Doris. The goal is to help developers work more efficiently and master more effective development techniques.

Background

Apache Doris is a high-performance MPP analytical database. For performance reasons, Apache Doris implements its execution engine in C++. In C++ development, one of the most important factors affecting development efficiency is pointer misuse, including illegal accesses, leaks, and forced type conversions. Google Sanitizer is a dynamic code analysis tool designed by Google. When Apache Doris development runs into memory problems caused by pointer misuse, Sanitizer makes them much faster to solve. In addition, when an out-of-bounds or illegal memory access crashes a BE process, the Core Dump file is a very effective way to locate and reproduce the problem, so an efficient Core Dump analysis tool helps locate the problem even faster.

In this article, we introduce Sanitizer and several Core Dump analysis tools and show how to quickly locate C++ problems in Apache Doris with them, helping developers improve their efficiency and acquire more effective development skills.

Introduction to Sanitizer

There are two tools commonly used to locate memory problems in C++ programs: Valgrind and Sanitizer.

A comparison of the two can be found at https://developers.redhat.com/blog/2021/05/05/memory-error-checking-in-c-and-c-comparing-sanitizers-and-valgrind

Valgrind obtains its information by translating and instrumenting the program's binary instructions at runtime. As a result, it slows the program down very significantly, which makes it inefficient for locating memory problems in large projects such as Apache Doris.

Sanitizer, on the other hand, captures the relevant information through code inserted at compile time, so it slows the program down much less than Valgrind. For this reason, Sanitizer is used by default in Apache Doris unit tests and other test environments.

The algorithm behind AddressSanitizer is described at https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm

During Apache Doris development, we usually use Sanitizer to locate memory problems. LLVM and GNU C++ provide several Sanitizers:

  • AddressSanitizer (ASan) finds memory errors such as use after free, heap buffer overflow, stack buffer overflow, global buffer overflow, use after return, use after scope, memory leaks, and oversized memory allocations;
  • LeakSanitizer (LSan), which is integrated into AddressSanitizer, finds memory leaks;
  • MemorySanitizer (MSan) finds uses of uninitialized memory;
  • UndefinedBehaviorSanitizer (UBSan) finds undefined behavior, such as out-of-bounds array accesses and numeric overflows;
  • ThreadSanitizer (TSan) finds data races between threads.

Among them, AddressSanitizer, LeakSanitizer, and UndefinedBehaviorSanitizer are the most effective for solving pointer-related problems.

Sanitizer not only finds errors but also reports where they originate and the code locations involved, which makes problems very efficient to solve. The following examples show how easy Sanitizer is to use.

To see how Apache Doris enables Sanitizer in its build, refer to https://github.com/apache/doris/blob/master/be/CMakeLists.txt

Sanitizer and Core Dump files work together to locate memory problems very efficiently. By default, Sanitizer does not generate Core Dump files. Use the following environment variable to enable them; we recommend enabling this by default.

For reference, see https://github.com/apache/doris/blob/master/bin/start_be.sh

export ASAN_OPTIONS=symbolize=1:abort_on_error=1:disable_coredump=0:unmap_shadow_on_exit=1

UBSan does not print the stack by default. Use the following environment variable to make it print the stack:

export UBSAN_OPTIONS=print_stacktrace=1

Sometimes you need to specify the location of the symbolizer binary explicitly so that Sanitizer can output a readable, symbolized stack directly.

export ASAN_SYMBOLIZER_PATH=/path/to/llvm-symbolizer
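As an alternative to exporting the environment variables above, the ASan and UBSan runtimes look for the optional hook functions `__asan_default_options` and `__ubsan_default_options`. If the program defines them, the returned string is used as the default options, and `ASAN_OPTIONS` / `UBSAN_OPTIONS` from the environment are parsed afterwards and still override them. The sketch below bakes in the same settings as above. Whether this suits your deployment is an assumption; Apache Doris itself sets the variables in start_be.sh.

```cpp
#include <cassert>
#include <cstring>

// Optional hooks read by the sanitizer runtimes at startup. They only take
// effect in builds using -fsanitize=address / -fsanitize=undefined; in other
// builds they are ordinary functions that nothing calls.
extern "C" const char* __asan_default_options() {
    // Same settings as the ASAN_OPTIONS export above.
    return "symbolize=1:abort_on_error=1:disable_coredump=0:unmap_shadow_on_exit=1";
}

extern "C" const char* __ubsan_default_options() {
    // Print a stack trace for every UBSan report.
    return "print_stacktrace=1";
}
```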

Examples of Sanitizer usage

Use after free

Use after free refers to accessing memory that has already been freed. For a use-after-free error, AddressSanitizer reports three stacks: the stack of the access to the freed address, the stack where the address was allocated, and the stack where it was freed. For example, in https://github.com/apache/doris/issues/9525, the stack of the access to the freed address is as follows:

==82849==ERROR: AddressSanitizer: heap-use-after-free on address 0x60300074c420 at pc 0x56510f61a4f0 bp 0x7f48079d89a0 sp 0x7f48079d8990
READ of size 1 at 0x60300074c420 thread T94 (MemTableFlushTh)
    #0 0x56510f61a4ef in doris::faststring::append(void const*, unsigned long) /mnt/ssd01/tjp/incubator-doris/be/src/util/faststring.h:120
// For the full stack, see https://github.com/apache/doris/issues/9525

The stack where this address was allocated is as follows:

previously allocated by thread T94 (MemTableFlushTh) here:
    #0 0x56510e9b74b7 in __interceptor_malloc (/mnt/ssd01/tjp/regression_test/be/lib/palo_be+0x536a4b7)
    #1 0x56510ee77745 in Allocator<false, false>::alloc_no_track(unsigned long, unsigned long) /mnt/ssd01/tjp/incubator-doris/be/src/vec/common/allocator.h:223
    #2 0x56510ee68520 in Allocator<false, false>::alloc(unsigned long, unsigned long) /mnt/ssd01/tjp/incubator-doris/be/src/vec/common/allocator.h:104

The stack where this address was freed is as follows:

0x60300074c420 is located 16 bytes inside of 32-byte region [0x60300074c410,0x60300074c430)
freed by thread T94 (MemTableFlushTh) here:
    #0 0x56510e9b7868 in realloc (/mnt/ssd01/tjp/regression_test/be/lib/palo_be+0x536a868)
    #1 0x56510ee8b913 in Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) /mnt/ssd01/tjp/incubator-doris/be/src/vec/common/allocator.h:125
    #2 0x56510ee814bb in void doris::vectorized::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 15ul, 16ul>::realloc<>(unsigned long) /mnt/ssd01/tjp/incubator-doris/be/src/vec/common/pod_array.h:147

With the stacks of the illegal access, the allocation, and the release of the address all available, the problem is easy to locate.

Note: For reasons of length, the stacks in these examples are not shown in full.
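The "freed by" stack above points at a realloc inside PODArray, which suggests the classic pattern of a raw pointer outliving a buffer that grew. A minimal sketch of that bug class, assuming a std::vector<char> stands in for Doris's PODArray and faststring (both function names are hypothetical): the first function keeps a raw pointer across a reallocation, which AddressSanitizer reports as heap-use-after-free; the second copies the bytes out before the buffer can move.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// INTENTIONALLY BUGGY (hypothetical reproducer): `src` points into the current
// allocation, reserve() is guaranteed to reallocate, and insert() then reads the
// freed block. Only call this in an ASan build; elsewhere it is undefined behavior.
void append_prefix_buggy(std::vector<char>& buf, std::size_t n) {
    const char* src = buf.data();
    buf.reserve(buf.capacity() * 2 + n);  // new capacity > old: the block moves
    buf.insert(buf.end(), src, src + n);  // heap-use-after-free
}

// Fixed version: copy the prefix into its own storage before the buffer can grow.
void append_prefix_fixed(std::vector<char>& buf, std::size_t n) {
    assert(n <= buf.size());
    std::vector<char> prefix(buf.begin(), buf.begin() + static_cast<std::ptrdiff_t>(n));
    buf.insert(buf.end(), prefix.begin(), prefix.end());
}
```

To see the report, build with `-fsanitize=address -g` and call `append_prefix_buggy` on a non-empty vector with `n > 0`.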

Heap buffer overflow

AddressSanitizer reports the stack of a heap buffer overflow.

For example, for https://github.com/apache/doris/issues/5951, the following report combined with the Core Dump file generated at runtime lets you locate the problem quickly.

==3930==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60c000000878 at pc 0x000000ae00ce bp 0x7ffeb16aa660 sp 0x7ffeb16aa658
READ of size 8 at 0x60c000000878 thread T0
    #0 0xae00cd in doris::StringFunctions::substring(doris_udf::FunctionContext*, doris_udf::StringVal const&, doris_udf::IntVal const&, doris_udf::IntVal const&) ../src/exprs/string_functions.cpp:98
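The report above is a read past the end of a string buffer inside a substring function. The sketch below shows the kind of bounds checks whose absence leads to such a report. It assumes SQL-style 1-based positions and handles only positive positions; this `substring` is a hypothetical, much simpler function than Doris's `StringFunctions::substring`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical SUBSTRING(str, pos, len) with a 1-based `pos`.
std::string substring(const char* data, std::size_t size, long pos, long len) {
    if (pos <= 0 || len <= 0 || static_cast<std::size_t>(pos) > size) {
        return std::string();  // out of range: empty result, no read at all
    }
    std::size_t start = static_cast<std::size_t>(pos) - 1;
    // Clamp so that start + count never exceeds size. Copying `len` bytes
    // unconditionally would read past the end of the buffer, which ASan
    // reports as heap-buffer-overflow when the buffer is on the heap.
    std::size_t count = std::min(static_cast<std::size_t>(len), size - start);
    return std::string(data + start, count);
}
```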

Memory leak

AddressSanitizer reports where memory that was never freed was allocated, so the cause of a leak can be analyzed quickly.

==1504733==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 688128 byte(s) in 168 object(s) allocated from:
#0 0x560d5db51aac in __interceptor_posix_memalign (/mnt/ssd01/doris-master/VEC_ASAN/be/lib/doris_be+0x9227aac)
#1 0x560d5fbb3813 in doris::CoreDataBlock::operator new(unsigned long) /home/zcp/repo_center/doris_master/be/src/util/core_local.cpp:35
#2 0x560d5fbb65ed in doris::CoreDataAllocatorImpl<8ul>::get_or_create(unsigned long) /home/zcp/repo_center/doris_master/be/src/util/core_local.cpp:58
#3 0x560d5e71a28d in doris::CoreLocalValue::CoreLocalValue(long)

Issue: https://github.com/apache/doris/issues/10926

PR: https://github.com/apache/doris/pull/3326
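A minimal sketch of how such a leak arises and one common way to avoid it. `CounterBlock` and both factory functions are hypothetical; they are only loosely modeled on the object named in the report above and are not Doris's `CoreDataBlock`. If a caller drops the raw pointer without `delete`, LeakSanitizer reports a "Direct leak" with this allocation's stack at exit; returning `std::unique_ptr` makes the release automatic.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-core counter block.
struct CounterBlock {
    std::vector<long> slots;
    explicit CounterBlock(std::size_t n) : slots(n, 0) {}
};

// Leak-prone: the caller must remember to delete the result. Losing the
// pointer produces a "Direct leak" report that points at this `new`.
CounterBlock* make_block_raw(std::size_t n) {
    return new CounterBlock(n);
}

// Ownership is explicit: the block is freed when the unique_ptr goes away.
std::unique_ptr<CounterBlock> make_block(std::size_t n) {
    return std::make_unique<CounterBlock>(n);
}
```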

Oversized memory allocation

AddressSanitizer reports an out-of-memory (OOM) error when a program tries to allocate too much memory, and the stack together with the Core Dump file shows where the excessive allocation was made. An example of the stack is as follows:

Fix PR: https://github.com/apache/doris/pull/10289
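One common source of oversized requests is unsigned arithmetic that wraps around. The sketch below illustrates that bug class under this assumption; it does not describe the fix in the PR above. `remaining_bytes` and `make_buffer` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// With unsigned arithmetic, `end - begin` for end < begin wraps to a value
// close to SIZE_MAX. Passed on to an allocator, such a size is the kind of
// request that AddressSanitizer reports as an oversized allocation.
std::size_t remaining_bytes(std::size_t begin, std::size_t end) {
    if (end < begin) {
        // Reject the bad range where it is created instead of letting it
        // surface later as a huge allocation.
        throw std::invalid_argument("end offset precedes begin offset");
    }
    return end - begin;
}

std::vector<char> make_buffer(std::size_t begin, std::size_t end) {
    return std::vector<char>(remaining_bytes(begin, end));
}
```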

UBSan can efficiently find errors in forced type conversions, as in the issue linked below. It precisely identifies the code that performs the faulty conversion. If such an error is not caught at that point, the pointer misuse it causes later is much harder to locate.

Issue: https://github.com/apache/doris/issues/9105
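A minimal sketch of the kind of cast involved, using a hypothetical column hierarchy (not Doris's classes). The unchecked `static_cast` downcast is the pattern that a UBSan build with Clang's `-fsanitize=vptr` check (part of `-fsanitize=undefined` when RTTI is enabled) reports at the cast itself; the checked version verifies the dynamic type first.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical polymorphic column types.
struct Column { virtual ~Column() = default; };
struct Int32Column : Column { int value = 0; };
struct StringColumn : Column { const char* value = ""; };

// Unchecked (not called below): if `c` is really a StringColumn, using the
// result is undefined behavior that may only crash much later. UBSan's vptr
// check reports it right here, at the faulty downcast.
inline Int32Column& as_int32_unchecked(Column& c) {
    return static_cast<Int32Column&>(c);
}

// Checked downcast: verify the dynamic type before handing out the reference.
Int32Column& as_int32(Column& c) {
    if (auto* p = dynamic_cast<Int32Column*>(&c)) {
        return *p;
    }
    throw std::bad_cast();
}
```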

UndefinedBehaviorSanitizer also finds deadlocks more easily than AddressSanitizer and the other sanitizers, for example in https://github.com/apache/doris/issues/10309

Using AddressSanitizer when the program maintains its own memory pool

AddressSanitizer relies on code that the compiler generates around memory allocation, release, and access to analyze memory problems. If a program maintains its own memory pool, AddressSanitizer cannot detect illegal accesses to memory inside the pool, because from its point of view that memory is still allocated. In this case, extra work is needed to make AddressSanitizer work as well as possible, mainly using ASAN_POISON_MEMORY_REGION and ASAN_UNPOISON_MEMORY_REGION to manage memory accessibility. These macros are tricky to use because AddressSanitizer handles address alignment and other details internally. For performance and memory-release reasons, Apache Doris also maintains a memory allocation pool, and this approach cannot guarantee that AddressSanitizer will find all problems.

For reference, see https://github.com/apache/doris/pull/8148
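A minimal sketch of the poison/unpoison approach, assuming a hypothetical fixed-size block pool (`BlockPool` is not Doris's allocator). Free blocks are poisoned, so touching them in an ASan build produces a use-after-poison report. Block sizes are rounded up to 8 bytes so that every block starts on an ASan shadow-granule boundary, which sidesteps the alignment pitfalls mentioned above. When the sanitizer header is unavailable, the macros fall back to no-ops.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// The real macros expand to no-ops unless compiled with -fsanitize=address.
#if defined(__has_include)
#if __has_include(<sanitizer/asan_interface.h>)
#include <sanitizer/asan_interface.h>
#endif
#endif
#ifndef ASAN_POISON_MEMORY_REGION
#define ASAN_POISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#define ASAN_UNPOISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#endif

class BlockPool {
public:
    BlockPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up_to_8(block_size)), arena_(block_size_ * block_count) {
        for (std::size_t i = 0; i < block_count; ++i) {
            free_list_.push_back(arena_.data() + i * block_size_);
        }
        // Nothing is handed out yet, so the whole arena is off limits.
        ASAN_POISON_MEMORY_REGION(arena_.data(), arena_.size());
    }
    BlockPool(const BlockPool&) = delete;
    BlockPool& operator=(const BlockPool&) = delete;
    ~BlockPool() {
        // Lift the poison before std::vector returns the arena to the allocator.
        ASAN_UNPOISON_MEMORY_REGION(arena_.data(), arena_.size());
    }

    void* allocate() {
        if (free_list_.empty()) throw std::bad_alloc();
        char* block = free_list_.back();
        free_list_.pop_back();
        ASAN_UNPOISON_MEMORY_REGION(block, block_size_);  // now accessible
        return block;
    }

    void deallocate(void* p) {
        // Later accesses to this block are reported as use-after-poison.
        ASAN_POISON_MEMORY_REGION(p, block_size_);
        free_list_.push_back(static_cast<char*>(p));
    }

private:
    // ASan tracks memory in 8-byte granules; with an 8-byte block size and an
    // arena from operator new, poisoning one block never touches a neighbor.
    static std::size_t round_up_to_8(std::size_t n) { return (n + 7) / 8 * 8; }

    std::size_t block_size_;
    std::vector<char> arena_;
    std::vector<char*> free_list_;
};
```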

When an application maintains its own memory pool and follows the approach in https://github.com/apache/doris/pull/8148, a use-after-free error is reported as use-after-poison instead. However, a use-after-poison report does not include the stack where the address became invalid (see https://github.com/google/sanitizers/issues/191), so the problem is still difficult to locate and analyze.

Therefore, we recommend providing an option to turn off the program's own memory pool, so that AddressSanitizer can efficiently locate memory problems in the test environment.
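A minimal sketch of such an option, assuming a hypothetical fixed-size allocator whose free-list cache can be switched off. The option defaults to off when the compiler reports an AddressSanitizer build: GCC defines `__SANITIZE_ADDRESS__`, and Clang exposes `__has_feature(address_sanitizer)`. `BUILT_WITH_ASAN`, `default_enable_pool`, and `FixedSizeAllocator` are illustrative names, not Doris APIs. With the pool off, every block goes straight to `free`, so ASan sees the release and reports later accesses as heap-use-after-free, with full stacks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

#if defined(__SANITIZE_ADDRESS__)
#define BUILT_WITH_ASAN 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define BUILT_WITH_ASAN 1
#endif
#endif
#ifndef BUILT_WITH_ASAN
#define BUILT_WITH_ASAN 0
#endif

// Default for a hypothetical "enable memory pool" option: off under ASan.
constexpr bool default_enable_pool() { return BUILT_WITH_ASAN == 0; }

class FixedSizeAllocator {
public:
    FixedSizeAllocator(std::size_t size, bool enable_pool)
        : size_(size), enable_pool_(enable_pool) {}
    FixedSizeAllocator(const FixedSizeAllocator&) = delete;
    FixedSizeAllocator& operator=(const FixedSizeAllocator&) = delete;
    ~FixedSizeAllocator() {
        for (void* p : cache_) std::free(p);
    }

    void* allocate() {
        if (enable_pool_ && !cache_.empty()) {
            void* p = cache_.back();  // reused block: ASan never saw it freed
            cache_.pop_back();
            return p;
        }
        void* p = std::malloc(size_);
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }

    void deallocate(void* p) {
        if (enable_pool_) {
            cache_.push_back(p);  // stays "live" to ASan: misuse goes unnoticed
        } else {
            std::free(p);  // ASan marks it freed and reports later accesses
        }
    }

    std::size_t cached_blocks() const { return cache_.size(); }

private:
    std::size_t size_;
    bool enable_pool_;
    std::vector<void*> cache_;
};
```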

Core dump analysis tool

A common problem when analyzing Core Dump files generated by C++ programs is how to print the values of STL containers and Boost containers. The following three tools make it efficient to view them.

STL-View

To use STL-View, place the contents of https://github.com/dataroaring/tools/blob/main/gdb/dbinit_stl_views-1.03.txt in ~/.gdbinit. STL-View's output is very friendly, and it supports pvector, plist, plist_member, pmap, pmap_member, pset, pdequeue, pstack, pqueue, ppqueue, pbitset, pstring, and pwstring. For example, in Apache Doris, pvector prints all the elements of a vector:

(gdb) pvector block.data
elem[0]: $5 = {
  column = {
    <COW<doris::vectorized::IColumn>::intrusive_ptr<doris::vectorized::IColumn const>> = {
      t = 0x606000fdc820
    }, <No data fields>},
  type = {
    <std::__shared_ptr<doris::vectorized::IDataType const, (__gnu_cxx::_Lock_policy)2>> = {
      <std::__shared_ptr_access<doris::vectorized::IDataType const, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>},
      members of std::__shared_ptr<doris::vectorized::IDataType const, (__gnu_cxx::_Lock_policy)2>:
      _M_ptr = 0x6030069e9780,
      _M_refcount = {
        _M_pi = 0x6030069e9770
      }
    }, <No data fields>},
  name = {
    static npos = 18446744073709551615,
    _M_dataplus = {
      <std::allocator<char>> = {
        <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
      members of std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
      _M_p = 0x61400006e068 "n_nationkey"
    },
    _M_string_length = 11,
    {
      _M_local_buf = "n_nationkey\000\276\276\276\276",
      _M_allocated_capacity = 7957695015158701934
    }
  }
}
elem[1]: $6 = {
  column = {
    <COW<doris::vectorized::IColumn>::intrusive_ptr<doris::vectorized::IColumn const>> = {
      t = 0x6080001ec220
    }, <No data fields>},
  type = {
  ...

Pretty-Printer

GCC (for example, GCC 7.0) supports Pretty-Printers for printing STL containers. Place the following code in ~/.gdbinit to enable them.

Note: Replace /usr/share/gcc/python with the corresponding local path.

python
import sys
sys.path.insert(0, '/usr/share/gcc/python')
from libstdcxx.v6.printers import register_libstdcxx_printers
register_libstdcxx_printers (None)
end

For example, Pretty-Printer prints the details of a vector:

(gdb) p block.data
$1 = std::vector of length 7, capacity 8 = {{
    column = {
      <COW<doris::vectorized::IColumn>::intrusive_ptr<doris::vectorized::IColumn const>> = {
        t = 0x606000fdc820
      }, <No data fields>},
    type = std::shared_ptr<const doris::vectorized::IDataType> (use count 1, weak count 0) = {
      get() = 0x6030069e9780
    },
    name = "n_nationkey"
  }, {
    column = {
      <COW<doris::vectorized::IColumn>::intrusive_ptr<doris::vectorized::IColumn const>> = {
        t = 0x6080001ec220
      }, <No data fields>},
    type = std::shared_ptr<const doris::vectorized::IDataType> (use count 1, weak count 0) = {
      get() = 0x6030069e9750
    },
    name = "n_name"
  }, {
    column = {
      <COW<doris::vectorized::IColumn>::intrusive_ptr<doris::vectorized::IColumn const>> = {
        t = 0x606000fd52c0
      }, <No data fields>},
    type = std::shared_ptr<const doris::vectorized::IDataType> (use count 1, weak count 0) = {
      get() = 0x6030069e9720
    },
    name = "n_regionkey"
  }, {
    column = {
      <COW<doris::vectorized::IColumn>::intrusive_ptr<doris::vectorized::IColumn const>> = {
        t = 0x6030069e96b0
      }, <No data fields>},
    type = std::shared_ptr<const doris::vectorized::IDataType> (use count 1, weak count 0) = {
      get() = 0x604000a66160
    },
    name = "n_comment"

Boost Pretty Printer

Because Apache Doris uses Boost very little, we do not give an example here.

For details, see https://github.com/ruediger/Boost-Pretty-Printer

Summary

With Sanitizer, problems can be found promptly in unit test, functional, integration, and stress test environments. Most importantly, Sanitizer usually reports the context of a problem, such as the call stacks of the memory allocation, the memory release, and the illegal memory access. Combined with a Core Dump file, which lets you inspect the program state at the point of failure, this turns solving C++ memory problems from guesswork into evidence-based analysis.

Links

Apache Doris Official Website

http://doris.apache.org

Apache Doris Github

https://github.com/apache/doris

Apache Doris Mailing List

dev@doris.apache.org
