This post introduces a new series—Unfinished plays. I have many drafts and incomplete posts that I do not plan to finish in the near future. Since they could still be interesting or helpful to the community, I decided to release them almost as-is. This is the first one.
In the first part, we compared Action Cable memory usage under different VM configurations, including malloc_trim. I promised to continue the research in this direction and evaluate GC.compact and jemalloc (via Fullstaq Ruby) as well. So, here we are.
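As a reminder, `malloc_trim` is a glibc function that asks the allocator to return free heap pages to the OS. Here is a minimal sketch (my own, Linux/glibc only) of invoking it from Ruby via Fiddle—Fullstaq Ruby's trim variant does this for you, so this snippet is purely illustrative:

```ruby
require "fiddle"

# Open the current process's symbols and look up glibc's malloc_trim
# (Linux/glibc only; this will raise on other libcs).
libc = Fiddle.dlopen(nil)
malloc_trim = Fiddle::Function.new(
  libc["malloc_trim"],   # int malloc_trim(size_t pad)
  [Fiddle::TYPE_SIZE_T],
  Fiddle::TYPE_INT
)

# Ask the allocator to release free heap pages back to the OS.
result = malloc_trim.call(0)
puts "malloc_trim returned #{result}"
```

`malloc_trim` returns 1 if some memory was actually released and 0 otherwise.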
This time, I decided to analyze a slightly different scenario: "A sudden attack during constant pressure".
"Constant pressure" means a uniform load: clients are connecting, communicating, and disconnecting during the benchmark, but the total number of concurrent users stays about the same.
I have a dedicated tool for writing such scenarios called wsdirector. It allows you to define a scenario in YAML format and run it with different scale factors.
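As a rough illustration (a hypothetical scenario of my own, not the one from the repo—check the wsdirector README for the exact schema), a scenario might look like this:

```yml
# Hypothetical wsdirector scenario: a group of clients connects,
# exchanges a few messages, and disconnects.
# The client count is multiplied by the scale factor at run time.
- client:
    multiplier: ":scale"
    actions:
      - receive:
          data: "welcome"
      - send:
          data: "ping"
      - receive:
          data: "pong"
```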
"A sudden attack" emulates a situation when the number of concurrent connections unexpectedly spikes and then returns to normal. We do that by performing the "WebSocket shootout" scenario from the first part.
The source code of the benchmark is available here: https://github.com/anycable/simple-cable-app (see simulate.yml and simulate.rb).
The server runs within a Docker container. During the benchmark, we capture the container's memory usage and generate a chart at the end (with the help of the awesome unicode_plot library; see monitor_docker.rb).
The exact commands I used for the benchmarks below are as follows:

```sh
# first, start the app
dip up rails
# or for Fullstaq
dip up rails-fs
# or for Fullstaq with malloc_trim
dip up rails-fs-trim

# pressure
TOTAL=20 SAMPLE=50 N=4 SCALE=200 ruby benchmarks/simulate.rb

# spike (runs twice during the pressure)
cat benchmarks/broadcast.opts | xargs websocket-bench broadcast
```
Below you can find the results I got on my machine (Windows 10 + WSL2, AMD Ryzen 3200G 3.6GHz, 16GB RAM). But first, let's talk about the heap compaction for a bit.
GC compaction was finally released in Ruby 2.7 (thanks to Aaron Patterson). To learn more about this feature, watch one of Aaron's recent talks, for example, this one from RubyConf 2019:
First, I tried to visualize the effect of calling GC.compact after running a simulation with 1k connected clients:
Awesome! Compaction works! Or does it? 🤔
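The measurement itself can be sketched like this (a minimal, Linux-only sketch of my own; the actual benchmark uses monitor_docker.rb and tracks the container's memory instead):

```ruby
# Read the resident set size (in MB) of the current process
# from /proc (Linux-only).
def rss_mb
  File.read("/proc/self/status")[/VmRSS:\s+(\d+)/, 1].to_i / 1024.0
end

before = rss_mb
GC.compact # Ruby 2.7+: runs a full GC and defragments the heap
after = rss_mb
puts format("RSS before: %.1f MB, after: %.1f MB", before, after)
```

Note that compaction defragments Ruby's own heap pages; whether the RSS actually drops also depends on the allocator returning freed pages to the OS (which is exactly where malloc_trim and jemalloc come into play).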
I tried to run the pressure scenario with GC.compact calls added after each wave (while other waves are still active, i.e., we continue accepting connections, broadcasting messages, etc.) and, unfortunately, ran into segmentation faults. The problem likely lies in C extensions (we have at least Puma and nio4r).
Calling GC.compact while there are no active Action Cable clients works fine, though. So, I had to update the scenario a bit and add a "stop-the-world"-like feature to perform compaction in isolation.
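Such a barrier can be sketched as follows (hypothetical names of my own; the actual simulate.rb may implement this differently):

```ruby
# Guard compaction with a mutex so that no scenario wave mutates the heap
# while objects are being moved; waves would synchronize on the same mutex.
COMPACT_BARRIER = Mutex.new

def compact_in_isolation
  COMPACT_BARRIER.synchronize do
    GC.start   # run a full GC first, so there are fewer live objects to move
    GC.compact # Ruby 2.7+
  end
end

compact_in_isolation
```

In the real benchmark, each wave would also need to acquire (or at least check) the same barrier before touching connections, so compaction truly runs in isolation.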
Fullstaq Ruby & jemalloc
Jemalloc is an alternative memory allocator that can be used instead of malloc (the default in MRI).
Here are a couple of articles to learn more about jemalloc for Rails applications: one and two.
We're going to use a Fullstaq Ruby distribution with jemalloc built-in (via Docker images provided by Evil Martians).
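If you want to verify which allocator your Ruby uses, here is a quick heuristic (my own sketch; a Ruby built against jemalloc usually carries `-ljemalloc` in its link flags, while an `LD_PRELOAD`-injected jemalloc won't show up here):

```ruby
require "rbconfig"

# Inspect the flags this Ruby was linked with at build time.
linked_libs = [RbConfig::CONFIG["LIBS"], RbConfig::CONFIG["MAINLIBS"]]
  .compact.join(" ")

if linked_libs.include?("jemalloc")
  puts "jemalloc is linked in"
else
  puts "no jemalloc in the link flags (it may still be injected via LD_PRELOAD)"
end
```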
NOTE: Since this is an unfinished play, I'm not providing any explanations/bikeshedding here—just plain results. Feel free to start a discussion in the comments!
We have 6 different setups:
- baseline: MRI 2.7.1
- compact: MRI 2.7.1 w/ compaction
- jemalloc: Fullstaq Ruby 2.7.1 w/ jemalloc
- jemalloc_compact: Fullstaq Ruby 2.7.1 w/ jemalloc and compaction
- trim: Fullstaq Ruby 2.7.1 w/ malloc_trim
- trim_compact: Fullstaq Ruby 2.7.1 w/ malloc_trim and compaction.
Below you can find the results of the benchmarks in different combinations (to better demonstrate the difference).