Cables profiling returns: GC.compact & jemalloc

#ruby #profiling #benchmarks

This post introduces a new series: Unfinished plays. I have many drafts and incomplete posts that I do not plan to finish in the near future. Since they could still be interesting or helpful to the community, I decided to release them almost as-is. This is the first one.

In the first part, we compared Action Cable memory usage under different memory allocator settings: MALLOC_ARENA_MAX and malloc_trim. I promised to continue researching in this direction and to evaluate GC.compact and jemalloc (via Fullstaq Ruby) as well. So, here we are.

The Benchmark

This time I decided to analyze a slightly different scenario: "A sudden attack under constant pressure".

"Constant pressure" means a uniform load: clients are connecting, communicating, and disconnecting during the benchmark, but the total number of concurrent users stays about the same.

I have a dedicated tool for writing such scenarios called wsdirector. It allows you to define a scenario in YAML format and run it with different scale factors.

"A sudden attack" emulates a situation in which the number of concurrent connections unexpectedly spikes and then returns to normal. We do that by running the "WebSocket shootout" scenario from the first part.

The source code of the benchmark is available here: https://github.com/anycable/simple-cable-app (see simulate.yml and simulate.rb).

The server runs within a Docker container. During the benchmark, we capture the container's memory usage and generate a chart at the end (with the help of the awesome unicode_plot library, see monitor_docker.rb).
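monitor_docker.rb itself isn't reproduced in this post. A minimal sketch of the same idea, assuming the `docker` CLI is on the PATH: poll `docker stats --no-stream` with the `{{.MemUsage}}` template and parse the "used" half of its output into bytes. The helper names (`parse_mem`, `sample_container_memory`, `monitor`) are hypothetical, and the chart rendering is left out; the resulting `[seconds, bytes]` pairs could be handed to a plotting library such as unicode_plot.

```ruby
require "open3"

# Unit suffixes that `docker stats` may print (binary ones are the usual case).
UNITS = {
  "B" => 1,
  "KiB" => 1024, "MiB" => 1024**2, "GiB" => 1024**3,
  "kB" => 1000, "MB" => 1000**2, "GB" => 1000**3
}.freeze

# Hypothetical helper: converts a value like "123.4MiB" into bytes.
def parse_mem(str)
  m = str.strip.match(/\A([\d.]+)\s*([A-Za-z]+)\z/)
  raise ArgumentError, "unexpected memory value: #{str.inspect}" unless m

  factor = UNITS.fetch(m[2]) { raise ArgumentError, "unknown unit: #{m[2]}" }
  (m[1].to_f * factor).round
end

# Takes one sample of a container's memory usage.
# `{{.MemUsage}}` prints "used / limit", e.g. "120MiB / 15.5GiB".
def sample_container_memory(container)
  out, status = Open3.capture2(
    "docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", container
  )
  raise "docker stats failed for #{container}" unless status.success?

  used, _limit = out.split("/", 2)
  parse_mem(used)
end

# Polls every `interval` seconds for `duration` seconds; returns [seconds, bytes] pairs.
def monitor(container, duration:, interval: 1.0)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  samples = []
  loop do
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    break if elapsed > duration

    samples << [elapsed.round(1), sample_container_memory(container)]
    sleep interval
  end
  samples
end

# Usage: ruby monitor_sketch.rb <container> [seconds]
if $PROGRAM_NAME == __FILE__ && ARGV[0]
  monitor(ARGV[0], duration: Integer(ARGV[1] || 60)).each do |t, bytes|
    puts format("%6.1fs %8.1f MiB", t, bytes / 1024.0**2)
  end
end
```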

The exact commands I used for the benchmarks below are as follows:

# first, start the app
dip up rails

# or for Fullstaq
dip up rails-fs

# or for Fullstaq with malloc_trim
dip up rails-fs-trim

# pressure
TOTAL=20 SAMPLE=50 N=4 SCALE=200 ruby benchmarks/simulate.rb

# spike (runs twice during the pressure)
cat benchmarks/broadcast.opts | xargs websocket-bench broadcast
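The rails-fs-trim setup runs Fullstaq Ruby with malloc_trim enabled; how that is wired up isn't shown in this post. As a hedged illustration of what malloc_trim does, here is one way to call glibc's `malloc_trim(0)` from Ruby through Fiddle. `trim_malloc` is a hypothetical helper, and the call only makes sense on glibc-based systems.

```ruby
require "fiddle"

# Hypothetical helper: calls glibc's malloc_trim(0) to hand free heap memory back
# to the OS. Returns true if some memory was released, false if none was, and nil
# when the symbol can't be resolved (e.g. on non-glibc platforms such as macOS).
def trim_malloc
  malloc_trim = Fiddle::Function.new(
    Fiddle::Handle::DEFAULT["malloc_trim"],
    [Fiddle::TYPE_SIZE_T], # pad: free bytes to leave untrimmed at the top of the heap
    Fiddle::TYPE_INT       # 1 if memory was released, 0 otherwise
  )
  malloc_trim.call(0) == 1
rescue Fiddle::DLError
  nil
end

# Run a full GC first so memory freed by Ruby objects can actually be trimmed.
GC.start
p trim_malloc
```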

Below are the results I got on my machine (Windows 10 + WSL2, AMD Ryzen 3200G 3.6GHz, 16GB RAM). But first, let's talk about heap compaction for a bit.

`GC.compact`

GC compaction was finally released in Ruby 2.7 (thanks to Aaron Patterson). To learn more about this feature, watch one of Aaron's recent talks, for example, his talk from RubyConf 2019.

First, I tried to visualize the effect of GC.compact after running a simulation with 1k connected clients:

Awesome! Compaction works! Or does it? 🤔
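One way to put numbers on the effect of compaction is to read a few `GC.stat` counters before and after calling it. A minimal sketch, assuming Ruby 2.7+ on a platform where `GC.compact` is supported; the helper names are hypothetical, and since the set of `GC.stat` keys varies between Ruby versions, the code reads only the counters that exist.

```ruby
# Counters to compare; any that this Ruby doesn't report are skipped.
KEYS = %i[heap_live_slots heap_free_slots heap_allocated_pages].freeze

def heap_snapshot
  stat = GC.stat
  KEYS.select { |k| stat.key?(k) }.to_h { |k| [k, stat[k]] }
end

# Runs a full GC, then compacts, and reports before/after values per counter.
def compaction_effect
  raise NotImplementedError, "GC.compact is not supported here" unless GC.respond_to?(:compact)

  GC.start
  before = heap_snapshot
  GC.compact
  after = heap_snapshot
  before.to_h { |k, v| [k, { before: v, after: after[k], delta: after[k] - v }] }
end

# Fragment the heap a little: allocate strings, then keep only every other one.
survivors = Array.new(200_000) { |i| "object-#{i}" }.each_slice(2).map(&:first)

if GC.respond_to?(:compact)
  compaction_effect.each do |key, v|
    puts format("%-22s before=%-8d after=%-8d delta=%+d", key, v[:before], v[:after], v[:delta])
  end
  puts "survivors kept alive: #{survivors.size}"
end
```

Expect `heap_live_slots` to stay roughly the same, since compaction moves objects rather than freeing them; what changes is how those live slots are packed into pages.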

I tried running the pressure scenario with GC.compact calls added after each wave (while other waves are still active, i.e., we continue accepting connections, broadcasting messages, etc.) and, unfortunately, ran into a segmentation fault. The problem is likely in C extensions (we have at least two of them: Puma and nio4r).
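To check whether the C extensions an app loads tolerate compaction before trying it under load, MRI ships a debugging helper, `GC.verify_compaction_references` (Ruby 2.7+). Below is a hedged sketch of a standalone smoke test; the script name and library list are illustrative. Passing it doesn't prove an extension is compaction-safe: merely loading a library exercises only part of its code, so the check just makes certain bugs more likely to surface early. The helper also accepts version-specific keyword options for moving more objects; see the GC docs for your Ruby version.

```ruby
# Library names come from the command line, e.g. `ruby compaction_check.rb puma nio4r`
# (illustrative); without arguments it only loads two stdlib libraries.
LIBS = ARGV.empty? ? %w[json set] : ARGV.dup

LIBS.each { |lib| require lib }

# Allocate some objects so compaction has something to move.
objects = Array.new(10_000) { |i| { id: i, name: "item-#{i}" } }

if GC.respond_to?(:verify_compaction_references)
  # MRI debugging helper: compacts the heap, updates references, then runs a full GC;
  # a stale reference to a moved object is expected to crash the process here.
  GC.verify_compaction_references
  puts "compaction check passed (#{objects.size} objects, libs: #{LIBS.join(', ')})"
else
  puts "GC.verify_compaction_references is not available on this Ruby"
end
```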

Calling GC.compact while there are no active Action Cable clients works fine. So, I had to update the scenario a bit and add a "stop-the-world"-like feature to perform compaction in isolation.
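The post doesn't show how the "stop-the-world"-like step is implemented, and it lives in the benchmark scenario rather than the server. One possible server-side shape for the same idea, sketched under the assumption that you can hook client connects and disconnects: count active clients, run `GC.compact` only when the count is zero, and make new connections wait until compaction finishes. `IdleCompactor` and its methods are hypothetical names, not part of Action Cable or the benchmark app.

```ruby
class IdleCompactor
  def initialize
    @mutex = Mutex.new
    @resumed = ConditionVariable.new
    @active = 0
    @compacting = false
  end

  # Call when a client connects; waits while a compaction is in progress.
  def connected
    @mutex.synchronize do
      @resumed.wait(@mutex) while @compacting
      @active += 1
    end
  end

  # Call when a client disconnects.
  def disconnected
    @mutex.synchronize { @active -= 1 }
  end

  # Runs GC.compact only when no clients are connected.
  # Returns true if compaction was attempted, false if it was skipped.
  def compact_if_idle
    @mutex.synchronize do
      return false unless @active.zero?

      @compacting = true
    end

    begin
      GC.compact if GC.respond_to?(:compact)
    ensure
      @mutex.synchronize do
        @compacting = false
        @resumed.broadcast # let connections that arrived meanwhile proceed
      end
    end
    true
  end
end

compactor = IdleCompactor.new
compactor.connected
p compactor.compact_if_idle # => false (one client is still connected)
compactor.disconnected
p compactor.compact_if_idle # => true (no active clients)
```

The compaction itself runs outside the mutex, so disconnects are never blocked; only new connections wait on the condition variable.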

Fullstaq Ruby & jemalloc

Jemalloc is an alternative memory allocator that can be used instead of the system malloc (which MRI uses by default).

Here are a couple of articles to learn more about jemalloc for Rails applications: one and two.

We're going to use a Fullstaq Ruby distribution with jemalloc built-in (via Docker images provided by Evil Martians).
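To double-check which allocator a given Ruby actually uses, two heuristics can help, though neither is conclusive on its own. The sketch below (`jemalloc_hints` is a hypothetical helper) looks for jemalloc in the build's `MAINLIBS`, which typically lists it when Ruby is configured with `--with-jemalloc`, and, on Linux, in the process's memory maps, which catches a dynamically linked or `LD_PRELOAD`ed libjemalloc.

```ruby
require "rbconfig"

# Hypothetical helper: heuristics for whether this Ruby process uses jemalloc.
def jemalloc_hints
  {
    # Set at build time; usually contains -ljemalloc for --with-jemalloc builds.
    linked_at_build: RbConfig::CONFIG.fetch("MAINLIBS", "").include?("jemalloc"),
    # Linux only: shared libraries mapped into the current process.
    mapped_in_process: File.readable?("/proc/self/maps") &&
      File.foreach("/proc/self/maps").any? { |line| line.include?("jemalloc") }
  }
end

p jemalloc_hints
```

Another commonly used check is `MALLOC_CONF=stats_print:true ruby -e 0`: a jemalloc-backed process typically prints allocator statistics on exit, while glibc's malloc ignores the variable.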

Benchmark results

NOTE: Since this is an unfinished play, I'm not providing any explanations/bikeshedding here—just plain results. Feel free to start a discussion in the comments!

We have 6 different setups: