Raphael Jambalos

Posted on May 20, 2018 • Edited on Sep 29, 2019

Coding Sidekiq Workers the Right Way

#rails #sidekiq #ruby #refactoring

As with Rails controllers, it's easy to get comfortable dumping logic into Sidekiq workers. It's all in one place, like good ol' imperative style. But over time it gets messy, and you find yourself writing new Sidekiq workers instead of fixing or reusing old ones.

So, before your background tasks in Sidekiq become more chaotic, here are a few tips I have learned in our team.

1. Don't place logic in your worker.

You start with 10 lines and think, hey, maybe it's alright for this code to be here. Six months later, you find out you need to add more checks to that code, so you add 20 more lines. Then your teammates discover a bug and add ten more.

Then a newbie decides that your logic is great and wants to reuse it. But hey, the logic is fitted to the worker's use case. Instead of risking a fuss in her first deployment, she just copy-pastes your code into her own worker. Yuck. Now you have two copies of the same code evolving differently.

Point is, logic almost always grows. Be responsible. Take the time to think about where the logic belongs: in an interactor, a service object, or a model.
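As a minimal sketch of what this could look like, the worker below only loads the record and hands it to a service object. The `InvoiceExpirer` class, the in-memory `Invoice` class, and the `Sidekiq::Worker` stand-in are all hypothetical, there only so the example runs on its own; in a real app you'd `require "sidekiq"` and use your own model.

```ruby
# Stand-in so this sketch runs without the sidekiq gem (assumption:
# in a real app, `require "sidekiq"` provides Sidekiq::Worker).
unless defined?(Sidekiq)
  module Sidekiq
    module Worker; end
  end
end

# Hypothetical in-memory stand-in for an ActiveRecord model.
class Invoice
  @records = {}

  class << self
    attr_reader :records

    def find(id)
      records.fetch(id)
    end
  end

  attr_reader :id

  def initialize(id, expirable:)
    @id = id
    @expirable = expirable
    @expired = false
    self.class.records[id] = self
  end

  def expirable?
    @expirable && !@expired
  end

  def expired?
    @expired
  end

  def expire!
    @expired = true
  end
end

# Hypothetical service object: owns the expiry rules, knows nothing about Sidekiq,
# so a controller, a rake task, or another worker can reuse it.
class InvoiceExpirer
  def initialize(invoice)
    @invoice = invoice
  end

  # Returns true if the invoice was expired, false if it was skipped.
  def call
    return false unless @invoice.expirable?

    @invoice.expire!
    true
  end
end

# The worker stays thin: load the record, delegate, done.
class InvoiceExpirerWorker
  include Sidekiq::Worker

  def perform(invoice_id)
    InvoiceExpirer.new(Invoice.find(invoice_id)).call
  end
end
```

When the newbie from the story shows up, she can call `InvoiceExpirer.new(invoice).call` from her own worker instead of copying yours.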

2. Don't make your workers too big.

Sidekiq is made for small, lightweight tasks. It isn't designed for long-running workers.

So, how do you know your workers are "too big"?

One telltale sign: there's a loop. For example, we have an Invoice model attached to an Order. Every time we issue an Order, we issue an Invoice and reserve the stock for the items ordered. Invoices expire 5 days after being issued. So, it brings us to this worker...

class InvoiceExpirerWorker
  include Sidekiq::Worker
  include Sidetiq::Schedulable

  def perform
    # A single job loads every expirable invoice and expires them one by one.
    expired_invoices = Invoice.expirable

    expired_invoices.each do |invoice|
      invoice.expire!
    end
  end
end

Now, this code snippet may look harmless to you, but when you look at the Invoice model, there are 100 lines of code involved in expiring an invoice: cancelling the order, returning the stock of the items, emailing the customer, updating the records, etc.

Imagine doing that for 10,000 invoices expiring today. Imagine each invoice taking 20 seconds to fully expire, multiplied by 10,000 invoices.

Point is, the work is too much for a single worker.

In our team, we had around a dozen workers that never finished executing. Why? Precisely because each one did WAY more than a single Sidekiq worker is designed for. The workers would run out of memory, Linux would kill the process, and we were left with hundreds of invoices that should have been expired but weren't. Great.

So, what do we do now?

Use a master worker to spawn smaller workers. The master worker is tasked with constructing a list of invoices to be expired. It goes through them one by one, and calls a separate worker on each one. If there are 10,000 invoices in the list, then 10,000 lightweight workers would be created.

That way, the whole batch has a greater chance of being completed, because no single worker takes too long.

Here's a better pattern:

class BatchInvoiceExpirerWorker
  include Sidekiq::Worker
  include Sidetiq::Schedulable

  def perform
    expired_invoices = Invoice.expirable

    expired_invoices.each do |invoice|
      InvoiceExpirerWorker.perform_async(invoice.id)
    end
  end
end

class InvoiceExpirerWorker
  include Sidekiq::Worker

  def perform(invoice_id)
    invoice = Invoice.find(invoice_id)
    invoice.expire!
  end
end

Instead of one big worker taking 200,000 seconds to finish (that's around 55 hours, every day), we now have 10,000 small workers queued up to run for 20 seconds each. Better.
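The arithmetic is easy to check. The snippet below also gives a rough wall-clock estimate once the small jobs run in parallel. The concurrency of 10 is an assumed figure for illustration, not a number from our setup, and the estimate ignores queueing overhead and contention on the database.

```ruby
# Rough estimate of how long a batch takes, in hours.
# `concurrency` is the number of jobs that can run at the same time.
def batch_hours(jobs, seconds_per_job, concurrency = 1)
  (jobs * seconds_per_job) / 3600.0 / concurrency
end

one_big_worker = batch_hours(10_000, 20)      # everything runs serially
fanned_out     = batch_hours(10_000, 20, 10)  # assumed: 10 Sidekiq threads

puts format("one big worker: %.1f hours", one_big_worker)        # one big worker: 55.6 hours
puts format("fanned out over 10 threads: %.1f hours", fanned_out) # fanned out over 10 threads: 5.6 hours
```

The total amount of work is the same either way. What changes is that each unit is short, can be retried on its own, and can run alongside the others.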

3. Organize your code into directories.

We started with just 10 workers. Then it became 20, then 40, then 50. Now we have 100 workers in app/workers.

Do yourself a favor. Organize your code into directories, please.
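One way to do this, assuming the usual Rails autoloading convention where a file path maps to a constant name, is one subdirectory (and one namespace) per domain. The `Invoices` namespace below is a made-up example, and the `Sidekiq::Worker` stand-in is only there so the snippet runs without the gem.

```ruby
# Stand-in so this sketch runs without the sidekiq gem (assumption:
# in a real app, `require "sidekiq"` provides Sidekiq::Worker).
unless defined?(Sidekiq)
  module Sidekiq
    module Worker; end
  end
end

# Would live in app/workers/invoices/expirer_worker.rb
module Invoices
  class ExpirerWorker
    include Sidekiq::Worker

    def perform(invoice_id)
      # Delegate to your model or service object here.
    end
  end
end

# Would live in app/workers/invoices/batch_expirer_worker.rb
module Invoices
  class BatchExpirerWorker
    include Sidekiq::Worker

    def perform
      # Enqueue one Invoices::ExpirerWorker job per invoice here.
    end
  end
end
```

With a hundred workers, opening `app/workers/invoices/` and seeing only the invoice-related ones is already a big improvement.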

4. Plan your Sidekiq execution schedules and prioritization schemes.

If you think 3 AM is a safe time to add workers… Well, think again. Maybe three other developers thought the same thing and added their own workers at 3 AM.

Let’s go back to the 10,000 small invoice expirer workers from #2, and refer to them as Worker 1. Say you have decided to schedule Worker 1 at 3 AM. Then another worker, let’s call it Worker 2, is queued around the same time. Yikes. If Worker 2 is time-critical, you may have to wait more than an hour for it to run.

To avoid this, create a tracker. It could be as simple as a Google spreadsheet, or better yet, an automated system so you don't have to clean up after every developer who adds or changes Sidekiq schedules. The tracker gives you an updated view of which workers are scheduled when. This way, you can avoid Worker 1 getting in the way of the more time-critical Worker 2.
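As a rough sketch of what an automated tracker could check, the snippet below takes a list of schedules and reports the ones whose time windows overlap. The `Schedule` struct, the worker names other than `BatchInvoiceExpirerWorker`, and the duration estimates are all hypothetical; a real tracker would read them from wherever your schedules are actually defined.

```ruby
# One scheduled run: worker name, start time as "HH:MM", estimated duration.
# Simplification: runs that cross midnight are not handled.
Schedule = Struct.new(:worker, :starts_at, :est_minutes) do
  # Start of the run, in minutes since midnight.
  def start_min
    hours, minutes = starts_at.split(":").map(&:to_i)
    hours * 60 + minutes
  end

  # Estimated end of the run, in minutes since midnight.
  def end_min
    start_min + est_minutes
  end

  def overlaps?(other)
    start_min < other.end_min && other.start_min < end_min
  end
end

# Returns every pair of worker names whose estimated windows overlap.
def overlapping_pairs(schedules)
  schedules.combination(2)
           .select { |a, b| a.overlaps?(b) }
           .map { |a, b| [a.worker, b.worker] }
end

SCHEDULES = [
  Schedule.new("BatchInvoiceExpirerWorker", "03:00", 60), # Worker 1
  Schedule.new("PaymentReminderWorker", "03:05", 5),      # Worker 2
  Schedule.new("NightlyReportWorker", "05:00", 30)
]

overlapping_pairs(SCHEDULES).each do |a, b|
  puts "#{a} overlaps with #{b}"
end
```

Run as part of CI, a check like this could flag a new 3 AM schedule before it is merged rather than after Worker 2 shows up late.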

Now, don’t stop there. There will be cases where one worker has to be prioritized over another, and you must take that into account as well.

Let’s say you plan Worker 2 to run on a higher priority. That way, when it runs at around 3:05 AM, it will run ahead of Worker 1.

However, what if, in the future, someone else adds another worker (e.g. Worker 3, Worker 4, etc.) of higher or equal priority on top of what is currently scheduled? This would add some delay before your worker gets done.

Prioritization in Sidekiq is done by organizing your workers into queues and giving each queue a priority in the Sidekiq configuration. Read more about it here: https://github.com/mperham/sidekiq/wiki/Advanced-Options
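To make that concrete: a worker picks its queue with `sidekiq_options`, and the queue weights live in the Sidekiq configuration file. The queue names, the weights, and `PaymentReminderWorker` below are assumptions for illustration. The `Sidekiq::Worker` stand-in is a simplified imitation that exists only so the snippet runs without the gem.

```ruby
# Simplified stand-in so this sketch runs without the sidekiq gem (assumption:
# in a real app, `require "sidekiq"` provides Sidekiq::Worker and sidekiq_options).
unless defined?(Sidekiq)
  module Sidekiq
    module Worker
      def self.included(base)
        base.extend(ClassMethods)
      end

      module ClassMethods
        def sidekiq_options(opts = {})
          @sidekiq_options = get_sidekiq_options.merge(opts.transform_keys(&:to_s))
        end

        def get_sidekiq_options
          @sidekiq_options || { "queue" => "default" }
        end
      end
    end
  end
end

# Worker 2: time-critical, so it goes on the high-priority queue.
class PaymentReminderWorker
  include Sidekiq::Worker
  sidekiq_options queue: :critical

  def perform(order_id); end
end

# Worker 1: bulk work that can wait.
class InvoiceExpirerWorker
  include Sidekiq::Worker
  sidekiq_options queue: :low

  def perform(invoice_id); end
end

# config/sidekiq.yml (illustrative weights): with these, "critical" is
# checked roughly three times as often as "low".
#   :queues:
#     - [critical, 3]
#     - [low, 1]
```

With a setup like this, the 10,000 invoice jobs sitting in `low` no longer block a payment reminder that arrives at 3:05 AM.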

Hence, it’s better to plan both prioritization and scheduled worker timings together in order to have a better chance of faster Sidekiq execution times, and lower under-utilization of your Sidekiq instance.

5. Periodically review your Sidekiq tasks.

Maybe a task used to take just a few seconds. Now that there are 10x more orders, the task may be taking 5 minutes. Fast forward to this year, and it may not even finish executing. That is the time to refactor the worker based on the principles above.
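One low-effort way to spot workers that are drifting is to log how long each job takes. The sketch below is a plain Ruby class shaped like a Sidekiq server middleware, meaning a `call(worker, job, queue)` method that yields to run the job. The class name `JobTimer`, the 60-second threshold, and the log format are assumptions.

```ruby
require "logger"

# Logs the duration of every job and warns when one exceeds `slow_after` seconds.
class JobTimer
  def initialize(logger: Logger.new($stdout), slow_after: 60)
    @logger = logger
    @slow_after = slow_after
  end

  def call(worker, job, queue)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield # runs the actual job
  ensure
    # Runs whether the job succeeded or raised, so failures are timed too.
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    level = elapsed > @slow_after ? :warn : :info
    @logger.public_send(level, format("%s on %s took %.2fs", worker.class.name, queue, elapsed))
  end
end

# In a real app this would be registered in an initializer:
#   Sidekiq.configure_server do |config|
#     config.server_middleware { |chain| chain.add JobTimer }
#   end
```

Grepping the logs for the warnings every few months gives you a shortlist of workers to revisit.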

Conclusion

I hope this blog post saves you the time and effort of sorting through Sidekiq workers. By having a standard practice in place, you save yourself (and everyone!) the hassle of trying to review your code, or simply finding out why a worker failed.

Special thanks to Allen, my editor, for helping make this post more coherent.

Top comments (10)

Robert Fletcher • Jun 7 '18

One thing we've been doing for a little while with our logic is to encapsulate it in "callable poros", basically poros that have a single call method and no other public interface. The side effect of this is that we don't need to add new jobs, we can simply have a generic CallablePoroJob and pass the name of the poro along with arguments. It makes it trivial to background some logic on the fly without needing to create new worker classes. Which can be especially valuable when you do zero downtime deploys, which would typically necessitate multiple deploys to make sure the job class is available before code that makes use of it is deployed.

Amit Patel • Dec 11 '18

I need to process large CSVs, both in size and in number (a couple of MBs each, with more than 0.1 million records). Each record in the CSV has to be processed sequentially.

A lot of relations come into the picture, so I have to use memoization a lot to reduce DB calls.

So in such cases, I cannot use the technique mentioned in point no 2.

Michael VanZant • May 29 '19

Load raw-ish data into a work table and then process in the database?

Dylan Pierce • May 30 '18

I agree with spawning concurrent workers, but how do you keep those 10,000 jobs from exhausting the db connection pool? I started wrapping my job logic with a .with_connection block to keep the job from using 10,000 connections to the database.

How do you execute both concurrency and database connection handling?

Raphael Jambalos • May 31 '18

For our team, we make the connection pool size = concurrency so we don't have to think about the concurrency exhausting the DB connection pool. With this, we have to trade-off having high levels of concurrency for being sure that our DB connection pool won't get exhausted.

There's a whole debate on the Sidekiq repository about this: github.com/mperham/sidekiq/issues/...

Henrik Nyh • Oct 26 '18

I used to agree with #1, but I've since come to realise it was (at least in my case) an overreaction.

Now, my feeling is this. If you need to run some logic in a worker, just write it in the worker until your needs change. The one valid reason for extracting it is that it improves something – makes tests easier to write or faster to run, de-duplicates code avoiding bugs etc.

Also consider that the cost of extracting it to another class today (when you don't know how or if you will reuse it) is probably about the same as extracting it later if the need arises. And then you will only do it if you need to, and you will be able to do a better job since you know how it's reused.

It's always the case that it depends, of course. There are nuances and I'll extract some stuff, e.g. things that are very clearly tightly coupled to a model and have the same reasons to change as other parts of that model.

Raphael Jambalos • Nov 1 '18

For small, straightforward workers, I agree with you that it could stay in the worker since we don't know how it would grow.

For workers with complex actions though, I think that #1 still applies since we already know more or less how the logic would grow, and enforcing that now could save other developers working on it in the future from having to figure out how it's supposed to be organized.