Time is of the essence: EBR in High-Performance Databases

#database #architecture #performance #opensource

"Can I `free` you?"

When writing high-performance engines that are multi-threaded and highly concurrent, one of the interesting problems to solve is memory reclamation. This is something I had to address when building orrp, a high-performance events/analytics database.

The nature of the problem is this: Memory gets allocated for some type of structure. In my case, it's a data structure containing a roaring bitmap, which is an implementation of bitmaps that are highly optimized and great for doing set-based calculations (AND/OR etc).

static eval_bitmap_t *_and(eval_bitmap_t *left, eval_bitmap_t *right,
                           eval_ctx_t *ctx, eng_eval_result_t *result) {
  (void)result;
  if (left->own) {
    // We own Left, so we can mutate it in-place
    bitmap_and_inplace(left->bm, right->bm);
    return left;
  }
  if (right->own) {
    // We own Right, so we can mutate it in-place
    bitmap_and_inplace(right->bm, left->bm);
    return right;
  }
  bitmap_t *res_bm = bitmap_and(left->bm, right->bm);
  if (!res_bm)
    return NULL;

  return _store_intermediate_bitmap(ctx, res_bm, true);
}

Bitmap evaluation in orrp.

Multiple threads are reading this cached data, and at some point we need to deallocate the memory, but we can only do so when we're sure that it's not being used. We need to be sure because if one thread deallocates the memory while another thread is using it for some type of operation, we will quickly run into the classic use-after-free error. A.K.A. You will crash hard and fast.

Atomic Ref Counting (ARC)

A very popular and valid approach to solve this problem is atomic reference counting. This approach uses atomic operations and an atomic integer called a reference counter that is incremented or decremented based on who is using the shared data. If the memory needs to later be evicted, a thread can see if the reference count is zero before evicting, making sure no one is using it. This prevents the use-after-free issue.

You also need to use a concurrency primitive, like a mutex or read-write lock, in order to make this work with a shared global cache.

  uv_rwlock_rdlock(&g_container_state.cache_rwlock);
  container_cache_node_t *node =
      container_cache_get(g_container_state.cache, name);

  if (node) {
    atomic_fetch_add(&node->reference_count, 1);
    result.success = true;
    result.container = node->container;

    if (g_container_state.cache->head == node) {
      uv_rwlock_rdunlock(&g_container_state.cache_rwlock);
      return result;
    }
    uv_rwlock_rdunlock(&g_container_state.cache_rwlock);
    uv_rwlock_wrlock(&g_container_state.cache_rwlock);
    container_cache_move_to_front(g_container_state.cache, node);
    uv_rwlock_wrunlock(&g_container_state.cache_rwlock);
    return result;
  }

ARC example.

However, there's another approach that can work even better in highly concurrent and hot parts of an application, without locks. Enter EBR!

Epoch-Based Reclamation (EBR)

I got introduced to EBR (Epoch-Based Reclamation) while studying the source code for Concurrency Kit. Concurrency Kit, or ck, is an awesome open source library of concurrency primitives that I use heavily in orrp. For example, it has lock-free ring buffers, hash tables, and implements primitives like memory barriers and fences. Shoutout to Samy Al Bahra for giving us an incredible OS library.

#### Safe Memory Reclamation

##### ck_epoch

A scalable safe memory reclamation mechanism with support idle threads and various optimizations that make it better than or competitive with many state-of-the-art solutions.

Excerpt from Concurrency Kit README.

At its core, the idea behind EBR is that each thread owns a ticket/record that is stored in its' thread-local storage, and there is a global epoch object that the threads register their tickets with. This global epoch record is a monotonic logical counter that is used as a reference for when threads enter and leave critical sections.

In other words, if I'm a reader thread and I want to read some data object, I register with the global epoch using my thread's ticket that I'm about to enter a critical section, so do not free any memory that I will be using within this critical section (in my case a roaring bitmap). Then, when I'm done, I let the epoch know that I'm done and that I've left the critical section.

  ck_epoch_section_t section;
  ebr_begin(&section);

  eng_eval_result_t eval_result =
      eng_eval_resolve_exp_to_events(cmd_ctx->where_tag_value, ctx);

  ebr_end(&section);

Example EBR critical section.

Later, when another thread wants to clean up memory, it uses the global epoch time-based object to know exactly which memory is safe to be freed/deallocated. The thread then calls a disposal callback function that does the actual memory deallocation.

What this means is that Epoch-Based Reclamation is a form of deferred free, similar to a Garbage Collector. Essentially, we defer freeing an object until we know that nobody is utilizing it. We don't have to use locks, and it's highly performant.

Going Deeper

If you're interested in learning more, Trevor Brown wrote an excellent research paper on this subject. In this paper, he also discusses similar techniques, such as hazard pointers, that are very interesting to learn about. This paper is definitely worth a read and I highly recommend it.

`orrp`

If you're interested in this type of advanced memory management, check out orrp. It's an open-source database that is built for use cases like event tracking, analytics, IoT workloads, network traffic, etc. If you're interested in contributing or have an idea, feel free to head over to the documentation and the GitHub repo, and you can either open a discussion post or an issue.

If you have questions, please leave them in the comments.

Thanks for reading!