DEV Community

Cover image for What is the fastest way to remove data using Rails ?
Pimp My Ruby
Pimp My Ruby

Posted on

What is the fastest way to remove data using Rails ?

Efficient data record deletion is a crucial consideration for high-traffic Rails applications. In this article, we will explore various deletion methods using ActiveRecord.

Our mission today is to delete an entire section of our database. Our work will impact 3 tables: Accounts, Missions, and Tasks.

An Account has several Missions, and a Mission has several Tasks.

Our goal is to delete all Accounts, Missions, and Tasks as quickly as possible!

Table of Contents

 1. Dataset 💽
 2. Deletion Methods 🗑️
       1. #destroy_all
       2. #in_batches(of: 1,000).destroy_all
       3. #delete_all
       4. #truncate
       5. Add on_delete: :cascade on the foreign key
 3. Benchmark 🥊
       Interpretation of Results
       Recommendations

Dataset 💽

For today's benchmark, we will start with a classic scenario of a two-level has_many relationship.

Here is the database schema we will be using:



# db/schema.rb
create_table "accounts", force: :cascade do |t|
  t.string "first_name"
  t.string "last_name"
  t.string "phone"
  t.string "email"
  t.string "role"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
  t.index ["email"], name: "index_accounts_on_email", unique: true
end

create_table "missions", force: :cascade do |t|
  t.bigint "account_id", null: false
  t.string "name"
  t.date "due_date"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
  t.index ["account_id"], name: "index_missions_on_account_id"
end

create_table "tasks", force: :cascade do |t|
  t.bigint "mission_id", null: false
  t.string "title"
  t.text "description"
  t.string "status"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
  t.index ["mission_id"], name: "index_tasks_on_mission_id"
end


Enter fullscreen mode Exit fullscreen mode

To see how the data is generated, you can access my repository.

In my Models, I default to the following configuration, with dependent: :destroy

(We will discuss the different possibilities later)



# app/models/account.rb
class Account < ApplicationRecord
  has_many :missions, dependent: :destroy
end

# app/models/mission.rb
class Mission < ApplicationRecord
  belongs_to :account
  has_many :tasks, dependent: :destroy
end

# app/models/task.rb
class Task < ApplicationRecord
  belongs_to :mission
end


Enter fullscreen mode Exit fullscreen mode

Deletion Methods 🗑️

Let's look at all the methods we will compare today

1. #destroy_all

The destroy_all method is a common approach for deleting records using ActiveRecord. It instantiates each record and calls destroy individually, triggering associated callbacks. It's generally the slowest approach, but also the safest to use.



Account.destroy_all


Enter fullscreen mode Exit fullscreen mode

Since all Models have their dependant: :destroy relationship, destroy_all will trigger by callback the deletion of Missions associated with the Account we want to delete. But, as we delete a mission, we will also delete all the associated Tasks.

It's hassle-free, but very slow!

2. #in_batches(of: 1,000).destroy_all

I see a lot of ideas that deleting records in batches increases performance. In fact, it can reduce memory load and improve execution time.



Account.in_batches(of: 1,000).destroy_all


Enter fullscreen mode Exit fullscreen mode

3. #delete_all

The delete_all method deletes records from the database without going through ActiveRecord validation. So, we have no validation or callback. This means that data deletion will be refused at the database level, as integrity will be lost.

Therefore, we must conscientiously order the deletion as follows:



Task.delete_all
Mission.delete_all
Account.delete_all


Enter fullscreen mode Exit fullscreen mode

In theory, this method is much more efficient than any use of destroy

4. #truncate

The truncate method is a radical approach that deletes all table data in one operation.



Account.connection.truncate_tables('accounts', 'missions', 'tasks')


Enter fullscreen mode Exit fullscreen mode

If you are sure you want to delete all of your tables, this will always be the fastest way to delete your data. However, it is impossible to scope the data; it will really reset your table.

5. Add on_delete: :cascade on the foreign key

There is a way to delete your tables much more quickly by adding a cascade parameter to your foreign key.

For this, we need a migration:



# db/migrate/20231227130700_add_delete_cascade.rb
class AddDeleteCascade < ActiveRecord::Migration[7.0]
  def change
    remove_foreign_key :missions, :accounts
    add_foreign_key :missions, :accounts, on_delete: :cascade
    remove_foreign_key :tasks, :missions
    add_foreign_key :tasks, :missions, on_delete: :cascade
  end
end


Enter fullscreen mode Exit fullscreen mode

This will transform the declaration of our foreign key in the schema as follows:



# db/schema.rb
add_foreign_key "missions", "accounts", on_delete: :cascade
add_foreign_key "tasks", "missions", on_delete: :cascade


Enter fullscreen mode Exit fullscreen mode

What does this change?

Now, when we delete an Account with #delete, all records that depend on this Account will also be deleted. It's like a callback, but entirely managed by the database. This technique is very efficient! The only problem is that records deleted by the cascade are not logged.


Benchmark 🥊

For the benchmark, we will delete the same set of data each time, which contains:

  • 100,000 Accounts
  • 500,000 Missions (5 per Account)
  • 1,500,000 Tasks (3 per Mission)

All this makes us a nice total of about 2M records. Let's see together which method is the most efficient!

Method Average (s) Min (s) Max (s)
#destroy_all 412 386 443
#destroy_all with batches 515 492 551
#delete_all 7 5 10
#delete_all on top of a database cascade 4.7 2.9 7.7
#truncate 1.8 1.2 2.5

Well, the first thing to say is that clearly, #destroy_all and #destroy_all with batches are clearly behind the others!

Here is the graphical representation of the 10 trials that were conducted for the 3 fastest methods:

Image description

So here is the ranking, very predictably:

🥇#truncate

🥈#destroy_all on top of the cascade

🥉#destroy_all

truncate is therefore much more efficient than the other competitors. This result is not surprising, as, as mentioned earlier, truncate will delete the table all at once.


Interpretation of Results

The benchmark results reveal significant differences between data deletion methods in a Rails application. Let's take a closer look at each method:

  1. #destroy_all

    With an average of 412 seconds, #destroy_all proves to be the slowest method. This slowness is due to the instantiation of each record and the execution of associated callbacks, which creates an additional burden.

  2. #destroy_all with batches

    Surprisingly, this method, with an average of 515 seconds, is even slower than the classic #destroy_all. Although it is supposed to be more efficient in theory thanks to batch management, the results indicate that batch processing can, in some cases, worsen performance.

  3. #delete_all

    Significantly faster with an average of only 7 seconds, #delete_all avoids the costs associated with ActiveRecord callbacks and validations. However, it requires careful management of the deletion order to maintain data integrity.

  4. #delete_all on top of a database cascade

    With an average of 4.7 seconds, this method proves even more effective. It takes advantage of cascade deletion triggers at the database level, offering a quick and direct way to manage dependencies.

  5. #truncate

    By far the fastest method with an average of 1.8 seconds. As it resets tables without regard to individual records, #truncate offers unmatched performance, particularly suited to scenarios where the entirety of a table's data needs to be erased.

The graphical representation clearly confirms the superiority of #truncate in terms of speed, followed by methods based on the database cascade, and finally the #destroy_all approaches.


Recommendations

Considering the results, here are the recommendations for choosing the appropriate deletion method depending on specific needs:

  • For a secure and controlled deletion: Use #destroy_all. Ideal when callbacks and validations are crucial for data integrity and business logic.
  • For total deletions without integrity constraints: Prefer #truncate. This method is recommended in test scenarios or when you need to completely reset a table.
  • For optimized performance with integrity constraints: Consider #delete_all with a cascade parameter on foreign keys. This approach is suitable when data integrity is managed at the database level and ActiveRecord callbacks are not necessary.
  • In any case: Avoid #destroy_all with batches.

In conclusion, data deletion in Rails applications requires a balance between efficiency, security, and data integrity. This benchmark reveals that each method has its distinct advantages: #destroy_all ensures secure deletion despite its slowness, #delete_all optimizes performance while managing data integrity at the database level, and #truncate stands out for its unmatched speed in complete resets.

In the end, the choice of the appropriate method should be guided by the specific needs of your application and the technical compromises it implies. A deep understanding of these options is crucial to maintaining a balance between performance and reliability in managing your Rails application's data.

Top comments (0)