Soo Yang, Tan

Posted on Jan 31, 2019

Power of ActiveRecord and when to stop using it

#ruby #rails #activerecord #sql

In Rails, ActiveRecord makes it super easy to query for data from the database--but how well does it perform for your use case?

Scenario

Let's assume I have a Project model on my rails app. I would like to display the total number of project that are active and pending at a status against the total number of active projects on a dashboard.

Which would look something like this:

  Team Lead Action (25/100) | Manager Action (32/100)

Implementation

Thanks to ActiveRecord, we can easily achieve get the total count:

  active_projects = Project.where(status: 'active')
  @total_number_of_active_projects = active_projects.size
  @projects_pending_team_lead_approval = active_projects.reject(&:team_lead_action_time).count
  @projects_pending_manager_approval = active_projects.reject(&:manager_action_time).count

Voila! We can easily call these variables to display them on the dashboard. Problem solved!

However, I'm a little concerned if the amount of projects scales to a large amount. For example, just try imagining if we now need to display 5 different values, our queries will scale like this:

active_projects = Project.where(status: 'active')
  @total_number_of_active_projects = active_projects.size
  @projects_pending_team_lead_approval = active_projects.reject(&:team_lead_action_time).count
  @projects_pending_manager_approval = active_projects.reject(&:manager_action_time).count
  @projects_pending_manager_2_approval = active_projects.reject(&:manager_2_action_time).count
  @projects_pending_manager_3_approval = active_projects.reject(&:manager_3_action_time).count
  @projects_pending_manager_4_approval = active_projects.reject(&:manager_4_action_time).count
  @projects_pending_manager_5_approval = active_projects.reject(&:manager_5_action_time).count

That don't look very scalable. For such simple data value, it should be retrieved efficiently.

Let's verify things using data with the help of Benchmark with N = 1000:


                        user     system      total        real
count_active_project  0.066184   0.108178   0.174362 (  0.182557)

It takes about 0.18s. This time let's just change the query:

  active_projects = Project.where(status: 'active')
  @total_number_of_active_projects = active_projects.size
  @projects_pending_team_lead_approval = active_projects.where(team_lead_action_time: nil).size
  @projects_pending_manager_approval = active_projects.where(manager_action_time: nil).size

And benchmark it:


                              user     system      total        real
count_active_projects_ar    0.001460   0.000166   0.001626 (  0.002965)

This query which does the same thing takes only 0.002s - significantly faster. The difference is this time we are not using .reject - ( array method ) but instead relying on ActiveRecord. But why? Let's take a look at the logs.

Using .reject:

irb(main):036:0> Project.where(status: 'active').reject(&:team_lead_action_time).count
  Project Load (2.2ms)  SELECT "projects".* FROM "projects" WHERE "projects"."status" = $1  [["status", 0]]

Using ActiveRecord:

irb(main):040:0> Project.where(status: 'active', team_lead_action_time: nil).size
   (0.9ms)  SELECT COUNT(*) FROM "projects" WHERE "projects"."status" = $1 AND "projects"."team_lead_action_time" IS NULL  [["status", 0]]

As we can see from the output, .reject load the projects into the memory before processing while ActiveRecord will let the database handle the query. Database is fast and was designed to handle such process hence the speed improvement.

Annnd there's raw SQL

Taking a step further, we can write raw SQL in rails.

  query = <<~SQL
    SELECT
      SUM(CASE WHEN status = 0 THEN 1 ELSE 0 END),
      SUM(CASE WHEN status = 0 AND team_lead_action_time IS null THEN 1 ELSE 0 END),
      SUM(CASE WHEN status = 0 AND manager_action_time IS null THEN 1 ELSE 0 END)
    FROM
      PROJECTS
  SQL

  ActiveRecord::Base.connection.execute(query)

And the result is:


                              user     system      total        real
count_active_projects_sql    0.000250   0.000047   0.000297 (  0.000966)

Raw SQL only takes 0.0009s which is about 20 times faster.

Taking a look at the log:

irb(main):044:0> Project.count_active_projects_sql
   (0.8ms)  SELECT
  SUM(CASE WHEN status = 0 AND team_lead_action_time IS null THEN 1 ELSE 0 END)
FROM
  PROJECTS
=> #<PG::Result:0x00007fa3cdc1c460 status=PGRES_TUPLES_OK ntuples=1 nfields=1 cmd_tuples=1>

Raw SQL is significantly faster because it does not need to instantiate a new ActiveRecord objects which is slow but instead it returns a PG::Result object.

If we increase N = 100000, the benchmark result will be:

                              user     system      total        real
count_active_projects      4.428081   7.626659  12.054740 ( 12.177786)
count_active_projects_ar   0.002107   0.000303   0.002410 (  0.062542)
count_active_projects_sql  0.000380   0.000057   0.000437 (  0.026585)

We can clearly see that using raw SQL is the fastest 🏆.

Takeaway

1) In the end, I decided to go with the raw SQL method. Putting performance in mind, raw SQL is suited in handling such scenario.
2) Between raw SQL and ActiveRecord, in most cases ActiveRecord would be preferable for developers productivity.
3) Be careful with the methods used. Ruby .reject - (array methods) are great but use them with caution.

Depending on the scenario, solutions may vary. We can write our code creatively and test it out to fit our needs. We need to stay curious and be creative in improving our solutions.

** The example code can be found here https://github.com/sooyang/query_speed_example

Top comments (2)

MetaDave 🇪🇺 • Feb 1 '19

Good stuff.

I would have a look at the MiniSQL gem also, which we're using to generate dropdown lists, for example, and which is very efficient indeed.

One technique we've also used very effectively is to embed logic in views and place an ActiveRecord model on top of them. The scenic gem is useful for that, and it hides the SQL from your Ruby developers – they just get really fast processing!

Soo Yang, Tan • Feb 4 '19

Wow. I've checked them out and they are really interesting solutions.

MiniSQL gem looks really good due to its simplicity and fast! Certainly something I will consider using in the future ;)

Thanks for you input!

DEV Community

Power of ActiveRecord and when to stop using it

Scenario

Implementation

Annnd there's raw SQL

Takeaway

Top comments (2)

Read next

True Alphanumeric / natural sorting in MySQL - why is the answer always recursion?

Populating a Vertica Database with Random Data

Best Practices for Using GROUP BY in MySQL for Converting Vertical Data to JSON

Alphanumeric / Natural sort in MySQL - 30 years and we still can't do this? 😤