DEV Community

Illia Zub for SerpApi

Posted on • Updated on • Originally published at serpapi.com

When counting lines in Ruby randomly failed SerpApi deployments

Recently, we observed the occasional failing deployments only on two of our servers. The failed servers even were closing my regular SSH connection. In this story, you'll learn how we reduced memory usage and made one piece of SerpApi code 1.5x faster.

TL;DR

str.count($/) was 1.5x faster compared to str.lines.count and didn't allocate additional memory.

Investigation

Only two servers faced the random failed deployments.

#<Thread:0x000055a170560e70 digital_ocean.rb:80 run> terminated with exception (report_on_exception is true):
SerpApi/vendor/bundle/ruby/2.7.0/gems/net-ssh-5.2.0/lib/net/ssh/transport/server_version.rb:54:in `readpartial': Connection reset by peer (Errno::ECONNRESET)
Enter fullscreen mode Exit fullscreen mode

These servers also randomly closed my SSH connections.

$ ssh server-2
Last login: Fri Feb 24 14:23:29 2023 from {remote_ip}
client_loop: send disconnect: Broken pipe
Enter fullscreen mode Exit fullscreen mode

DigitalOcean server's graphs shown that memory usage was near to 95% percent on these two servers. Load average was occasionally peaking at 12 compared to 2 on other servers.

We checked the Puma server flamegraph. Most of the wall time on production were the SearchSplitter#do_one_request and Puma thread pool.

image

We used rbspy to generate the flamegraph:

$ rbspy record -p $PID_OF_PUMA_PROCESS
Enter fullscreen mode Exit fullscreen mode

The flamegraph didn't reveal anything actionable and we moved to memory profiling.

Memory profiling

Here's the script we used:

require 'memory_profiler'

user = User.find_by(email: "me")

report = MemoryProfiler.report do
  threads = []

  (1..5).map do |i|
    (1..5).map do |j|
      threads << Thread.new {
        search = Search.new(engine: "google", q: "#{i} * #{j}", user: user)
        SearchProcessor.process(search)
      }
    end
  end

  threads.each(&:join)
end

report.pretty_print
Enter fullscreen mode Exit fullscreen mode

It turned out that the top allocator was line counting in the response validator.

elsif response[:html].lines.count < 50 && response[:html].match?(/<div id=\"main\"><div id=\"cnt\"><div class=\"dodTBe\" id=\"sfcnt\"><\/div><H1>.+<\/H1>.+<p>.+<\/p>$/)
Enter fullscreen mode Exit fullscreen mode

Why the String#lines and Array#count were the top allocators of the entire app?

String#lines

The HTML file size varies from 180 KB (regular organic results) to 1.3 MB (Google Shopping with num=100). The String#lines allocated an array multiple time per each search because we send multiple requests concurrently per each search.

Thanks to @guilhermesimoes's gist, we found that str.each_line.count should be faster. But it was not optimal and we found a way to improve the solution.

Solution

The solution was super simple — str.count($/). Here's the diff:

- elsif response[:html].lines.count < 50 && response[:html].match?(/<div id=\"main\"><div id=\"cnt\"><div class=\"dodTBe\" id=\"sfcnt\"><\/div><H1>.+<\/H1>.+<p>.+<\/p>$/)
+ elsif response[:html].count($/) < 50 && response[:html].match?(/<div id=\"main\"><div id=\"cnt\"><div class=\"dodTBe\" id=\"sfcnt\"><\/div><H1>.+<\/H1>.+<p>.+<\/p>$/)
Enter fullscreen mode Exit fullscreen mode

To make sure the problem was solved, we benchmarked multiple ways of counting string lines in Ruby. We reused and adopted the gist above to exclude File#read from the benchmark and added String#count to the benchmark.

Benchmark

String#count was 1.5 times faster that other options:

Warming up --------------------------------------
                size    31.000  i/100ms
              length    75.000  i/100ms
               count    77.000  i/100ms
   each_line + count    81.000  i/100ms
           count($/)   196.000  i/100ms
Calculating -------------------------------------
                size      1.529k (±33.9%) i/s -      4.774k in   5.015361s
              length      1.434k (±38.8%) i/s -      5.025k in   5.139834s
               count      1.335k (±40.7%) i/s -      4.697k in   5.079353s
   each_line + count      1.411k (±39.5%) i/s -      5.022k in   5.110146s
           count($/)      2.231k (± 2.6%) i/s -     11.172k in   5.012323s

Comparison:
           count($/):     2230.5 i/s
                size:     1529.0 i/s - 1.46x  (± 0.00) slower
              length:     1434.2 i/s - 1.56x  (± 0.00) slower
   each_line + count:     1411.0 i/s - 1.58x  (± 0.00) slower
               count:     1334.9 i/s - 1.67x  (± 0.00) slower
Enter fullscreen mode Exit fullscreen mode

Here's the script:

require "benchmark/ips"

html = File.read(Rails.root.join("spec/data/google/superhero-movies-mobile-63f582a0defa1345501c6b50-2023-02-23.html"))

Benchmark.ips do |x|
  x.report("size") { html.lines.size < 50 }
  x.report("length") { html.lines.length < 50 }
  x.report("count") { html.lines.count < 50 }

  x.report("each_line + count") { html.each_line.count < 50 }

  x.report("count($/)") { html.count($/) < 50 }

  x.compare!
end
Enter fullscreen mode Exit fullscreen mode

Memory usage

count($/) doesn't allocate a new array compared to lines/each_line/etc.

We used the awesome heap-profiler and heapy Ruby gems to profile memory usage.

lines/readlines/each_line/etc.

html.lines.count allocated the new array and referenced the original string for each iteration in the benchmark.

$ bundle exec heapy read ./tmp/lines_count/allocated.heap 49 --lines=1

Analyzing Heap (Generation: 49)
-------------------------------

allocated by memory (204879705) (in bytes)
==============================
  204872652  tmp/html_length_vs_count_vs_size_bench.rb:6

object count (5406)
==============================
  5301  tmp/html_length_vs_count_vs_size_bench.rb:6

High Ref Counts
==============================
  5300  tmp/html_length_vs_count_vs_size_bench.rb:6
Enter fullscreen mode Exit fullscreen mode

We also used the predefined $/ line separator to allocate even less memory.

count($/)

Most of these memory allocations and all of the object reference counts were gone when we used the String#count($/).

$ bundle exec heapy read ./tmp/count_nl/allocated.heap 48 --lines=1

Analyzing Heap (Generation: 48)
-------------------------------

allocated by memory (2547465) (in bytes)
==============================
  2540804  tmp/html_length_vs_count_vs_size_bench.rb:4

object count (105)
==============================
  27  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/proxy_wrappers.rb:172

High Ref Counts
==============================
  73  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/proxy_wrappers.rb:172
Enter fullscreen mode Exit fullscreen mode

Code

require "heap-profiler"

HeapProfiler.report(Rails.root.join('tmp/lines_count')) do
  html = File.read(Rails.root.join("spec/data/google/superhero-movies-mobile-63f582a0defa1345501c6b50-2023-02-23.html"))

  100.times { html.lines.count < 50 }
  # 100.times { html.count($/) < 50 }
end
Enter fullscreen mode Exit fullscreen mode

Comparison process

The heap diff comparison was a bit manual because the heapy diff did't provide a diff. We commented / uncommented 100.times { html.lines.count < 50 } and replaced paths in the command above.

## Profile heap
$ bundle exec rails r tmp/html_length_vs_count_vs_size_bench.rb

## Read summary of heap allocations
$ bundle exec heapy read ./tmp/count_nl/allocated.heap

## Read a specific generation (48) limiting number of lines to output (1)
$ bundle exec heapy read ./tmp/count_nl/allocated.heap 48 --lines=1
Enter fullscreen mode Exit fullscreen mode

Results

Immediately after the fix was deployed, memory usage on the affected servers decreased and stabilized. Then memory usage fluctuated again, but deployments and SSH connections stabilized.

image

Observations and final thoughts

The initial assumption was the Puma graceful restart. During the phased-restart, Puma spawned additional workers to switch to the new code version (which was expected). It wasn't clear why SSH connections were dropping on two DigitalOcean droplets only.

Doubling the amount of RAM would also solve the problem, but it wouldn't be as efficient at this point. The fix was deployed half a year ago and the issue is definitely solved.

Update Sep 20th, 2023

Thanks to @Freaky from Reddit for a wonderful feedback and cooperation:

I love it when you get this ripple effect from something that initially seems pretty innocuous.


If you enjoy working on such challenges, come work here with us: https://serpapi.com/careers#open-roles

Top comments (0)