In Rails, ActiveRecord
makes it super easy to query for data from the database--but how well does it perform for your use case?
Scenario
Let's assume I have a Project
model on my rails app. I would like to display the total number of project that are active and pending at a status against the total number of active projects on a dashboard.
Which would look something like this:
Team Lead Action (25/100) | Manager Action (32/100)
Implementation
Thanks to ActiveRecord
, we can easily achieve get the total count:
active_projects = Project.where(status: 'active')
@total_number_of_active_projects = active_projects.size
@projects_pending_team_lead_approval = active_projects.reject(&:team_lead_action_time).count
@projects_pending_manager_approval = active_projects.reject(&:manager_action_time).count
Voila! We can easily call these variables to display them on the dashboard. Problem solved!
However, I'm a little concerned if the amount of projects scales to a large amount. For example, just try imagining if we now need to display 5 different values, our queries will scale like this:
active_projects = Project.where(status: 'active')
@total_number_of_active_projects = active_projects.size
@projects_pending_team_lead_approval = active_projects.reject(&:team_lead_action_time).count
@projects_pending_manager_approval = active_projects.reject(&:manager_action_time).count
@projects_pending_manager_2_approval = active_projects.reject(&:manager_2_action_time).count
@projects_pending_manager_3_approval = active_projects.reject(&:manager_3_action_time).count
@projects_pending_manager_4_approval = active_projects.reject(&:manager_4_action_time).count
@projects_pending_manager_5_approval = active_projects.reject(&:manager_5_action_time).count
That don't look very scalable. For such simple data value, it should be retrieved efficiently.
Let's verify things using data with the help of Benchmark
with N = 1000
:
user system total real
count_active_project 0.066184 0.108178 0.174362 ( 0.182557)
It takes about 0.18s
. This time let's just change the query:
active_projects = Project.where(status: 'active')
@total_number_of_active_projects = active_projects.size
@projects_pending_team_lead_approval = active_projects.where(team_lead_action_time: nil).size
@projects_pending_manager_approval = active_projects.where(manager_action_time: nil).size
And benchmark it:
user system total real
count_active_projects_ar 0.001460 0.000166 0.001626 ( 0.002965)
This query which does the same thing takes only 0.002s
- significantly faster. The difference is this time we are not using .reject
- ( array method ) but instead relying on ActiveRecord
. But why? Let's take a look at the logs.
Using .reject
:
irb(main):036:0> Project.where(status: 'active').reject(&:team_lead_action_time).count
Project Load (2.2ms) SELECT "projects".* FROM "projects" WHERE "projects"."status" = $1 [["status", 0]]
Using ActiveRecord
:
irb(main):040:0> Project.where(status: 'active', team_lead_action_time: nil).size
(0.9ms) SELECT COUNT(*) FROM "projects" WHERE "projects"."status" = $1 AND "projects"."team_lead_action_time" IS NULL [["status", 0]]
As we can see from the output, .reject
load the projects
into the memory before processing while ActiveRecord
will let the database handle the query. Database is fast and was designed to handle such process hence the speed improvement.
Annnd there's raw SQL
Taking a step further, we can write raw SQL in rails.
query = <<~SQL
SELECT
SUM(CASE WHEN status = 0 THEN 1 ELSE 0 END),
SUM(CASE WHEN status = 0 AND team_lead_action_time IS null THEN 1 ELSE 0 END),
SUM(CASE WHEN status = 0 AND manager_action_time IS null THEN 1 ELSE 0 END)
FROM
PROJECTS
SQL
ActiveRecord::Base.connection.execute(query)
And the result is:
user system total real
count_active_projects_sql 0.000250 0.000047 0.000297 ( 0.000966)
Raw SQL only takes 0.0009s
which is about 20 times faster.
Taking a look at the log:
irb(main):044:0> Project.count_active_projects_sql
(0.8ms) SELECT
SUM(CASE WHEN status = 0 AND team_lead_action_time IS null THEN 1 ELSE 0 END)
FROM
PROJECTS
=> #<PG::Result:0x00007fa3cdc1c460 status=PGRES_TUPLES_OK ntuples=1 nfields=1 cmd_tuples=1>
Raw SQL is significantly faster because it does not need to instantiate a new ActiveRecord objects which is slow but instead it returns a PG::Result
object.
If we increase N = 100000
, the benchmark result will be:
user system total real
count_active_projects 4.428081 7.626659 12.054740 ( 12.177786)
count_active_projects_ar 0.002107 0.000303 0.002410 ( 0.062542)
count_active_projects_sql 0.000380 0.000057 0.000437 ( 0.026585)
We can clearly see that using raw SQL is the fastest π.
Takeaway
1) In the end, I decided to go with the raw SQL method. Putting performance in mind, raw SQL is suited in handling such scenario.
2) Between raw SQL and ActiveRecord
, in most cases ActiveRecord
would be preferable for developers productivity.
3) Be careful with the methods used. Ruby .reject
- (array methods) are great but use them with caution.
Depending on the scenario, solutions may vary. We can write our code creatively and test it out to fit our needs. We need to stay curious and be creative in improving our solutions.
** The example code can be found here https://github.com/sooyang/query_speed_example
Top comments (2)
Good stuff.
I would have a look at the MiniSQL gem also, which we're using to generate dropdown lists, for example, and which is very efficient indeed.
One technique we've also used very effectively is to embed logic in views and place an ActiveRecord model on top of them. The scenic gem is useful for that, and it hides the SQL from your Ruby developers β they just get really fast processing!
Wow. I've checked them out and they are really interesting solutions.
MiniSQL gem looks really good due to its simplicity and fast! Certainly something I will consider using in the future ;)
Thanks for you input!