DEV Community

Pimp My Ruby

Performance Guide to create 100k records in less than 3s using Ruby on Rails

When working on large-scale projects, quickly creating test data or dummy data can be crucial. In this article, we will explore different methods to efficiently create 100,000 records in Ruby on Rails.

Data Set Overview

For today's benchmark, we will start with the Postgres database used in my previous article:

# db/schema.rb
create_table "accounts", force: :cascade do |t|
  t.string "first_name"
  t.string "last_name"
  t.string "phone"
  t.string "email"
  t.string "role"
end

By the way, if you haven't read my previous article, I invite you to do so.

To thoroughly test the efficiency of the methods we will discuss, I will generate two variables upfront that we will use:

accounts = FactoryBot.build_list(:account, 100_000)
accounts_attributes = accounts.map(&:attributes)

Thanks to FactoryBot, we have 100,000 unpersisted ActiveRecord objects in the accounts variable and their attributes in the form of hashes in the accounts_attributes variable.
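For readers without the factory at hand, here is a rough sketch of what accounts_attributes looks like. The helper name, the role values, and the generated strings are all assumptions for illustration; a real attributes hash also carries an "id" key, which is nil for unpersisted records and is left out here.

```ruby
# Hypothetical stand-in for the :account factory: builds plain attribute
# hashes with the same columns as the accounts table.
def build_account_attributes(count, roles: %w[admin member guest])
  Array.new(count) do |i|
    {
      "first_name" => "First#{i}",
      "last_name"  => "Last#{i}",
      "phone"      => format("+1555%07d", i),
      "email"      => "user#{i}@example.com",
      "role"       => roles[i % roles.size] # role values are assumed
    }
  end
end

accounts_attributes = build_account_attributes(100_000)
puts accounts_attributes.size
puts accounts_attributes.first.inspect
```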

Finally, for each method, we will test the basic call and a few variations to push the performance of Ruby on Rails as far as possible.

1. Using .save

One of the simplest ways to create a record is to call .save on it. To create many records, you can iterate over the ActiveRecord objects and call .save on each instance.

accounts.each do |account|
  account.save
end

This approach is slow because every .save call runs validations and sends its own INSERT query to the database.

The first variant of the .save method that I would like to test is .save!. In theory, the performance of .save! should be equivalent to .save. The two methods do the same work; the only difference is that .save! raises an exception when the record cannot be saved, where .save returns false.

accounts.each do |account|
  account.save!
end

The last variant I would like to test for .save involves a single SQL transaction. In our two previous examples, ActiveRecord wraps each .save in its own transaction. So 100,000 times, it sends a BEGIN, the INSERT, and a COMMIT to the database. Each commit has a fixed cost, and over 100,000 records this takes a lot of time!

Account.transaction do
  accounts.each do |account|
    account.save
  end
end

In this code example, all 100,000 INSERT statements run inside a single transaction, so the database only commits once. This is much faster!
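To make the difference concrete, here is a small plain-Ruby simulation that only counts statements. FakeConnection and the two helper methods are hypothetical stand-ins, not ActiveRecord classes, and the INSERT strings are placeholders.

```ruby
# Toy connection that only records the statements it is asked to run.
# Hypothetical stand-in, not an ActiveRecord class.
class FakeConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end

  # Wraps the block in BEGIN / COMMIT, like a database transaction.
  def transaction
    execute("BEGIN")
    yield
    execute("COMMIT")
  end
end

# Each record gets its own transaction, as with a bare .save in a loop.
def save_each(conn, count)
  count.times { |i| conn.transaction { conn.execute("INSERT row #{i}") } }
end

# All records share one outer transaction.
def save_in_one_transaction(conn, count)
  conn.transaction do
    count.times { |i| conn.execute("INSERT row #{i}") }
  end
end

per_record = FakeConnection.new
save_each(per_record, 1_000)

single = FakeConnection.new
save_in_one_transaction(single, 1_000)

puts per_record.statements.size # three statements per record
puts single.statements.size     # one INSERT per record, plus BEGIN and COMMIT
```

The single-transaction version still sends one INSERT per record; what it saves is the BEGIN and COMMIT around each of them.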

2. Using .create

The .create method is very similar to the .save method. The difference is that .create is a class method called on the model itself (in our case, Account), and it builds and saves the record in one step. The .save method is an instance method, called on an object you have already built, such as Account.new.

accounts_attributes.each do |account_attributes|
  Account.create(account_attributes)
end

Normally, .create and .save should have the same performance.

The second variant I would like to test passes the whole array of hashes to .create in a single call. According to the documentation, if you pass an array of hashes to .create, you can create multiple records at once. Let's see together if this is faster!

Account.create(accounts_attributes)

To be honest, I think this variant will be as slow as .create on its own. According to the source code of .create, when you pass it an array of hashes, .create will simply iterate over the hashes and call itself to persist the data.
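The behaviour described above can be mimicked with a toy class. ToyModel is hypothetical and is not the Rails source; it only reproduces the shape of the logic, where an array argument leads to one create call, and therefore one save, per hash.

```ruby
# Toy model, not ActiveRecord: it only mimics how .create treats an array.
class ToyModel
  @saved = []

  class << self
    attr_reader :saved

    def create(attributes)
      if attributes.is_a?(Array)
        # An array is handled by calling create once per element.
        attributes.map { |attrs| create(attrs) }
      else
        record = new(attributes)
        record.save
        record
      end
    end
  end

  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  # Stands in for one INSERT round trip to the database.
  def save
    self.class.saved << attributes
    true
  end
end

records = ToyModel.create([{ "email" => "a@example.com" }, { "email" => "b@example.com" }])
puts records.size
puts ToyModel.saved.size
```

Two hashes lead to two save calls, so passing an array saves typing but no database work.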

While we're at it, let's test the efficiency of .create! as we did for .save and .save!.

accounts_attributes.each do |account_attributes|
  Account.create!(account_attributes)
end

The last variant I'd like to test involves a single transaction. The same discussion as for .save applies here; we will test what happens when we perform only a single SQL transaction.

Account.transaction do
  accounts_attributes.each do |account_attributes|
    Account.create!(account_attributes)
  end
end

3. Using .insert_all

Ruby on Rails provides a method called .insert_all that allows you to insert multiple records in a single SQL query.

Account.insert_all(accounts_attributes)

As the benchmark will show, .insert_all is very fast.

This speed comes from sending all the rows in one query and from skipping ActiveRecord validations and callbacks.
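As an illustration of what "a single SQL query" means here, the sketch below builds one multi-row INSERT from an array of hashes. The quote and multi_row_insert helpers are hypothetical, the quoting is deliberately simplified, and this is not the SQL builder Rails uses internally.

```ruby
# Simplified value quoting, for illustration only.
def quote(value)
  return "NULL" if value.nil?

  "'#{value.to_s.gsub("'", "''")}'"
end

# Builds one INSERT statement carrying every row.
def multi_row_insert(table, rows)
  columns = rows.first.keys
  values = rows.map do |row|
    "(#{columns.map { |column| quote(row[column]) }.join(', ')})"
  end
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
end

rows = [
  { "first_name" => "Ada", "email" => "ada@example.com" },
  { "first_name" => "Tim", "email" => "tim@example.com" }
]
sql = multi_row_insert("accounts", rows)
puts sql
```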

4. Using .upsert_all

In the same vein as .insert_all, Ruby on Rails provides a method called .upsert_all, which inserts each record or, if it already exists in the database, updates it.

.upsert_all is very convenient for bulk updates.

Account.upsert_all(accounts_attributes)

This method follows the same logic as insert_all; it does not invoke ActiveRecord validations and callbacks, greatly enhancing performance.
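The insert-or-update behaviour can be sketched in memory. Here the "table" is a plain Hash and toy_upsert_all is a hypothetical helper; treating email as the unique key is an assumption made for the example, not something the schema above defines.

```ruby
# In-memory illustration of upsert semantics: the table is a Hash keyed by
# a unique column. Not the Rails implementation.
def toy_upsert_all(table, rows, unique_by:)
  rows.each do |row|
    key = row.fetch(unique_by)
    # Merge into the existing row if the key is taken, otherwise insert.
    table[key] = (table[key] || {}).merge(row)
  end
  table
end

table = {
  "ada@example.com" => { "email" => "ada@example.com", "role" => "member" }
}

toy_upsert_all(
  table,
  [
    { "email" => "ada@example.com", "role" => "admin" },  # existing key: updated
    { "email" => "tim@example.com", "role" => "member" }  # new key: inserted
  ],
  unique_by: "email"
)

puts table.size
puts table["ada@example.com"]["role"]
```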

5. Using ActiveRecord-Import

For optimal performance, there is a gem called activerecord-import that adds a magical function, .import.

bundle add activerecord-import

Once the gem is installed, you can use it as follows:

Account.import(accounts_attributes)

.import is, in my opinion, the best approach because activerecord-import minimizes network overhead by sending rows in bulk and supports batch processing. Unlike .insert_all and .upsert_all, it also runs ActiveRecord validations. It works with the databases supported by Rails, so it integrates well whatever your setup.
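The batching idea can be sketched in plain Ruby with each_slice. The in_batches helper and the batch size of 1,000 are assumptions for illustration; each yielded batch stands in for one bulk INSERT, and nothing here calls the gem.

```ruby
# Splits rows into fixed-size batches; each batch stands in for one bulk INSERT.
# Returns the size of every batch that was processed.
def in_batches(rows, batch_size)
  rows.each_slice(batch_size).map do |batch|
    yield batch
    batch.size
  end
end

rows = Array.new(100_000) { |i| { "email" => "user#{i}@example.com" } }

statements = 0
batch_sizes = in_batches(rows, 1_000) { |_batch| statements += 1 }

puts statements      # number of simulated bulk statements
puts batch_sizes.sum # total rows covered by the batches
```

Batching keeps each statement to a bounded size while still avoiding one round trip per row.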

Benchmark

Now that we know all these methods, let's find out which one is the fastest!

You can find the benchmark here.
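For readers unfamiliar with Ruby's standard Benchmark module, here is a minimal sketch of how timings with user, system, total, and real columns are typically collected. The two workloads are stand-ins that only build arrays in memory; they are not the author's benchmark, and the labels and row count are assumptions.

```ruby
require "benchmark"

# Stand-in workload: append rows one by one.
def rows_with_append(count)
  rows = []
  count.times { |i| rows << { "email" => "user#{i}@example.com" } }
  rows
end

# Stand-in workload: build the array in one call.
def rows_with_array_new(count)
  Array.new(count) { |i| { "email" => "user#{i}@example.com" } }
end

row_count = 50_000 # arbitrary size for the stand-in workloads

# Benchmark.bm prints a user / system / total / real table and
# returns one Benchmark::Tms object per report.
results = Benchmark.bm(12) do |x|
  x.report("append")    { rows_with_append(row_count) }
  x.report("Array.new") { rows_with_array_new(row_count) }
end

results.each { |tms| puts format("%-12s real=%.6fs", tms.label, tms.real) }
```

The "real" column is wall-clock time, which is the figure that matters when the work is dominated by waiting on the database.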

Here is the time it takes to create 100,000 records. Place your bets!

Performance Benchmark

| Label | User (s) | System (s) | Total (s) | Real (s) |
| --- | --- | --- | --- | --- |
| .save | 98.482027 | 7.339702 | 105.821729 | 174.001099 |
| .save! | 82.036858 | 7.221731 | 89.258589 | 145.422204 |
| .save! with transaction | 38.410147 | 2.444573 | 40.854720 | 68.257510 |
| .create | 105.934837 | 7.278972 | 113.213809 | 185.792927 |
| .create with hashes | 118.748599 | 8.459169 | 127.207768 | 204.100991 |
| .create! | 121.161595 | 7.396354 | 128.557949 | 203.611114 |
| .create! with transaction | 48.788214 | 2.510467 | 51.298681 | 79.932584 |
| .insert_all | 1.450411 | 0.143563 | 1.593974 | 3.064136 |
| .upsert_all | 1.442954 | 0.116461 | 1.559415 | 2.935700 |
| activerecord-import | 3.511353 | 0.082371 | 3.593724 | 4.778761 |

🥇 Upsert All (69 times faster than .create with hashes)

🥈 Insert All (66 times faster than .create with hashes)

🥉 ActiveRecord-Import (42 times faster than .create with hashes)
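The podium figures come from dividing the real time of .create with hashes by the real time of each method. A short sketch of that arithmetic, using the real (wall-clock) values from the benchmark table; the variable names are mine:

```ruby
# Real (wall-clock) seconds copied from the benchmark table.
real_seconds = {
  ".create with hashes" => 204.100991,
  ".upsert_all"         => 2.935700,
  ".insert_all"         => 3.064136,
  "activerecord-import" => 4.778761
}

baseline_label = ".create with hashes"
baseline = real_seconds.fetch(baseline_label)

# Speedup = baseline time / method time.
speedups = real_seconds
  .reject { |label, _seconds| label == baseline_label }
  .transform_values { |seconds| baseline / seconds }

speedups.each do |label, ratio|
  puts format("%-20s %.1fx", label, ratio)
end
```

Rounded down, the ratios match the 69, 66, and 42 quoted above.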

I find it very interesting that, overall, .save is slightly faster than .create.

Interpretation

  1. .save & .save! : The .save and .save! methods are very slow, taking about 2.5 to 3 minutes to process 100,000 records. This is mainly because they validate and save records one by one, each in its own transaction, resulting in frequent database calls.
  2. .create & .create! : The .create and .create! methods are slightly slower than .save, taking a little over 3 minutes.
  3. .save! with transaction & .create! with transaction : Using a single transaction makes these methods more than twice as fast as plain .save and .create, taking about 68 to 80 seconds to process the same amount of data. So, it is important to use a single transaction when dealing with a large amount of data.
  4. .create with hashes : This method is about 10% slower than .create, taking about 3 minutes and 24 seconds. It is the slowest of all in the benchmark. Avoid it in all situations!
  5. .insert_all & .upsert_all : The .insert_all and .upsert_all methods are much faster than the previous methods. They take about 3 seconds to process 100,000 records. These methods use batch SQL queries to insert or update data. It is crucial to note that validations and callbacks are not invoked with these methods.
  6. ActiveRecord-Import : The ActiveRecord-Import method is also very performant, taking about 4.8 seconds to process 100,000 records. It is essential to note that validations are taken into account when inserting with activerecord-import.

Recommendations

  • For creating a large number of records, I prefer .insert_all and .upsert_all if validations are not necessary.
  • If validations are necessary, I go for activerecord-import.
  • The use of .save and .create should be avoided for a large dataset. However, for a small dataset, and if it is enclosed in a transaction block, the performance loss can be minimized.

Conclusion

Creating 100,000 records in Ruby on Rails can be a challenge, but with the right methods, you can do it in a matter of seconds.

For quickly creating a large number of records, I strongly recommend using methods such as .insert_all, .upsert_all, or ActiveRecord-Import. Using transactions can also improve the performance of methods like .save! and .create!. For optimal performance, it is crucial to choose the method that suits your needs and avoid slower, sequential methods such as .save and .create.


Top comments (5)

Ben

If you really want fast imports into PostgreSQL then copy is your friend. There is a gem called activerecord-copy that should be useful. It again does not go through validations or callbacks but could be up to 3-4 times as fast as .insert_all / .upsert_all.

Pimp My Ruby

Hi Ben! Thanks for your feedback.

You're completely right. I have already used this alternative, but I wanted to write a separate article about it, as its performance is far above the other alternatives.

Maprangsoft

thank you.

Harry Wood

Interpretation point 6 is missing some words

Pimp My Ruby

Hey!

Thanks, I have updated the article with the missing content!