Stanislav Kozlovski

Posted on Sep 26, 2017

Working with Multithreaded Ruby Part I

#ruby #concurrency #multithreaded

Introduction

Multithreaded Ruby is a niche topic in our community, and that is no surprise. Most Ruby applications are web applications built on Rails or Sinatra. Application code in those frameworks is usually written as if it were single-threaded, so developers on such projects rarely need to know about threads: the framework usually has your back.

Even if you never use threads directly, a basic understanding of multithreading in an interpreted language like Ruby will come in handy throughout your career.

I assume you know about the GIL (Global Interpreter Lock). In case you don't know what it is, you can read my article, Ruby's GIL in a nutshell.

GIL != the end of the world

Even though it limits parallelism, Ruby's GIL does not completely stop it. As we know, it exists to guard the interpreter's internal state, so it only needs to be held while Ruby code is actually being executed. In our normal day-to-day code there are a lot of operations that are not the interpreter's job to handle.

A good example is I/O operations. While waiting for an external service to load something, there is no need to hold the GIL, as this external service cannot harm our internal state.
Ruby's PostgreSQL library is written in C and its method call for a DB query releases the GIL. The following example shows that:

require 'thwait'  # in the standard library up to Ruby 2.6; from 2.7 on, install the thwait gem
require 'pg'

start = Time.now

first_sleep = Thread.new do
  puts 'Starting sleep 1'
  conn = PG::Connection.open(dbname: 'test')
  conn.exec('SELECT pg_sleep(1);')
  puts 'Finished sleep 1'
end

second_sleep = Thread.new do
  puts 'Starting sleep 2'
  conn = PG::Connection.open(dbname: 'test2')
  conn.exec('SELECT pg_sleep(1);')
  puts 'Finished sleep 2'
end

random = Thread.new do
  puts 'In a random thread'
end

ThWait.all_waits(first_sleep, second_sleep, random)

puts "Time it took: #{Time.now - start}"

Here we spin up two threads, each of which opens a connection to a different database and runs a query that sleeps for one second. A third thread just prints a line. Without parallelism, this would take at least 2 seconds.

> enether$ ruby async_pg.rb
> Starting sleep 2
> Starting sleep 1
> In a random thread
> Finished sleep 2
> Finished sleep 1
> Time it took: 1.074824

But it runs in 1 second!
This shows that the PostgreSQL query does not hold the GIL while it waits, which lets the other threads take control. Not only is the interpreter not locked, but the two queries actually run in parallel. That is the only way two one-second sleep queries can finish in about one second.
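The same effect can be seen without a database. The sketch below is not from the original program: `timed_threads` is a hypothetical helper, and `sleep` stands in for any blocking call that releases the GIL while it waits.

```ruby
# Hypothetical helper: runs the given block in `count` threads and returns
# the wall-clock time (in seconds) until all of them have finished.
def timed_threads(count, &work)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = Array.new(count) { Thread.new(&work) }
  threads.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# sleep releases the GIL while it waits, so the two waits overlap
# instead of running one after the other.
elapsed = timed_threads(2) { sleep 0.5 }
puts "Two 0.5s waits in two threads took #{elapsed.round(2)}s"
```

If the waits were serialized, the elapsed time would be about one second; since they overlap, it should stay close to half a second.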

Reminder: The GIL does not protect you

A problem can occur when two or more threads access shared data and try to change it. This is called a race condition.
Because Ruby's thread scheduler can switch between threads at any time, you don't know the order in which the threads will access the shared data. The result of the change therefore depends on the scheduling and is seemingly out of your control.
It is possible for two threads to modify data in a sequence that produces an unexpected outcome.
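A minimal sketch of such an unexpected outcome is a lost update. Everything here is an illustrative assumption rather than code from the article: `unsafe_increments` is a hypothetical helper, and the `sleep` only exists to make the unlucky interleaving very likely instead of rare.

```ruby
# Hypothetical helper: every thread reads a shared counter, pauses, and then
# writes back the value it read plus one. No lock is used on purpose.
def unsafe_increments(thread_count)
  counter = 0
  threads = Array.new(thread_count) do
    Thread.new do
      seen = counter      # read the shared value
      sleep 0.05          # pause, so the other threads read before we write
      counter = seen + 1  # write, based on a read that is probably stale
    end
  end
  threads.each(&:join)
  counter
end

puts "Expected 5 increments, counter ended at #{unsafe_increments(5)}"
```

Because the threads tend to read the counter before any of them writes it back, most of the increments overwrite each other and the final value usually ends up well below 5.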

Here is an example of the so-called check-then-act race condition, where you check a variable's value and then act based on it.

require 'thwait'

def send_money(amount)
  puts "Sending $#{amount}"
  sleep 1  # Simulate the network call that sends the money. Like real I/O, sleep releases the GIL
end


threads = []
money_is_sent = false

2.times do
  th = Thread.new do
    unless money_is_sent
      send_money 10
      money_is_sent = true
    end
  end
  threads << th
end


ThWait.all_waits(*threads)

We obviously want to send the money only once, but running the code shows that this is not the case:

> enether$ ruby balling.rb
> Sending $10
> Sending $10

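This is what happens here: the first thread checks money_is_sent, sees false and calls send_money, which sleeps and lets the second thread run. The second thread performs the same check before the first one has set the flag to true, so it sends the money as well.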

As you saw, what looks like straightforward code can end up producing a huge problem (losing us money!) when executed concurrently. It is up to you to make your code thread-safe.

How to protect yourself

So how could we avoid such race conditions?
Simple: you can take the same approach as the Ruby core team and introduce your own lock (kind of like the GIL), one that is local to a block of code.
This is called a Mutex (mutual exclusion). It helps you synchronize access to blocks of code, acting like a gatekeeper.

require 'thwait'

def send_money(amount)
  puts "Sending $#{amount}"
  sleep 1  # Simulate network call sending of money
end

lock = Mutex.new
threads = []
money_is_sent = false

2.times do
  th = Thread.new do
    lock.synchronize {
      unless money_is_sent
        send_money 10
        money_is_sent = true
      end
    }
  end
  threads << th
end


ThWait.all_waits(*threads)

We define a Mutex and call its synchronize method. When a thread enters the block passed to synchronize, the mutex gets locked. If another thread then calls lock.synchronize, it sees that the mutex is locked and pauses until it is unlocked. By that time money_is_sent is already true, so the second thread skips the send.

> enether$ ruby balling_on_a_budget.rb
> Sending $10
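To make the waiting behavior described above visible, here is a small sketch that logs when each thread enters and leaves the guarded block. The `guarded_log` helper and the thread names are hypothetical and not part of the program above.

```ruby
# Hypothetical helper: two threads write to a shared log while holding the
# same mutex. Returns the log once both threads are done.
def guarded_log
  lock = Mutex.new
  log = []
  threads = %w[a b].map do |name|
    Thread.new do
      lock.synchronize do
        log << "#{name} enters"
        sleep 0.1  # gives the other thread a chance to run
        log << "#{name} leaves"
      end
    end
  end
  threads.each(&:join)
  log
end

puts guarded_log
```

Whichever thread gets the lock first, its "enters" line is always followed directly by its own "leaves" line. The other thread cannot get in between, even though the sleep gives it every opportunity to run.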

Be sure to note that lock.synchronize only keeps out other threads that want to execute code guarded by the same Mutex instance!
Creating a separate lock for each thread will obviously not work:

2.times do
  Thread.new do
    # Every thread creates its own Mutex, so no lock is shared between them
    Mutex.new.synchronize {
      unless money_is_sent
        send_money 10
        money_is_sent = true
      end
    }
  end
end

> enether$ ruby lock_city.rb
> Sending $10
> Sending $10


Mutexes are not perfect

Now that we know about these locks, we need to pay attention to how we use them. They offer protection, but they can also backfire if used incorrectly.
It is possible to end up in a so-called deadlock (sounds scary, doesn't it?). A deadlock is a situation where one thread holds mutex A and waits for mutex B to be released, while the thread that holds mutex B is waiting for mutex A.

require 'thread'
require 'thwait'

first_lock = Mutex.new
second_lock = Mutex.new

a = Thread.new {
  first_lock.synchronize {
    sleep 1  # essentially forces a context switch
    second_lock.synchronize {
      puts 'Locked #1 then #2'
    }
  }
}

b = Thread.new {
  second_lock.synchronize {
    sleep 1  # essentially forces a context switch
    first_lock.synchronize {
      puts 'Locked #2 then #1'
    }
  }
}

ThWait.all_waits(a, b)

> enether$ ruby dead_lock.rb
> /Users/enether/.rvm/rubies/ruby-2.4.1/lib/ruby/2.4.0/thwait.rb:112:in `pop': No live threads left. Deadlock? (fatal)

They are both holding what the other thread wants and waiting for what the other thread has.
Of course, this is a pretty specific example and there are not many cases in which you might use two mutexes in such a way, but it is essential to know about this pitfall.
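One common way around this pitfall, which the example above does not show, is to make every thread acquire the locks in the same order. The sketch below is a hypothetical rewrite of the deadlock program under that assumption; `ordered_locking` is an illustrative name.

```ruby
# Hypothetical rewrite of the deadlock example: both threads take first_lock
# before second_lock, so neither can hold one lock while waiting for the other.
def ordered_locking
  first_lock = Mutex.new
  second_lock = Mutex.new
  results = []

  threads = %w[a b].map do |name|
    Thread.new do
      first_lock.synchronize do
        sleep 0.1  # still allows a context switch while the lock is held
        second_lock.synchronize do
          results << "#{name}: locked #1 then #2"  # guarded by both locks
        end
      end
    end
  end

  threads.each(&:join)
  results
end

puts ordered_locking
```

The second thread simply waits at first_lock until the first thread has released both locks, so the program finishes instead of raising a fatal deadlock error.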

Summary

We saw that, regardless of the GIL, you can still run some tasks in parallel (I/O and native libraries), and confirmed that the GIL won't save you from your own thread-unsafe code.
You learned about the most common pitfall, the check-then-act race condition. We introduced a way of handling it through our own little GIL-esque lock (a Mutex), and we saw that even that can backfire.

I hope I've managed to showcase how tricky multithreaded programming can be and how it can introduce problems you would never have to consider when programming synchronously.
