Efficient data record deletion is a crucial consideration for high-traffic Rails applications. In this article, we will explore various deletion methods using ActiveRecord.
Our mission today is to delete an entire section of our database. Our work will impact 3 tables: Accounts, Missions, and Tasks.
An Account has several Missions, and a Mission has several Tasks.
Our goal is to delete all Accounts, Missions, and Tasks as quickly as possible!
Table of Contents
1. Dataset 💽
2. Deletion Methods 🗑️
1. #destroy_all
2. #in_batches(of: 1,000).destroy_all
3. #delete_all
4. #truncate
5. Add on_delete: :cascade on the foreign key
3. Benchmark 🥊
Interpretation of Results
Recommendations
Dataset 💽
For today's benchmark, we will start with a classic scenario of a two-level has_many relationship.
Here is the database schema we will be using:
# db/schema.rb
create_table "accounts", force: :cascade do |t|
t.string "first_name"
t.string "last_name"
t.string "phone"
t.string "email"
t.string "role"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.index ["email"], name: "index_accounts_on_email", unique: true
end
create_table "missions", force: :cascade do |t|
t.bigint "account_id", null: false
t.string "name"
t.date "due_date"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.index ["account_id"], name: "index_missions_on_account_id"
end
create_table "tasks", force: :cascade do |t|
t.bigint "mission_id", null: false
t.string "title"
t.text "description"
t.string "status"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.index ["mission_id"], name: "index_tasks_on_mission_id"
end
To see how the data is generated, you can access my repository.
In my Models, I default to the following configuration, with dependent: :destroy
(We will discuss the different possibilities later)
# app/models/account.rb
class Account < ApplicationRecord
has_many :missions, dependent: :destroy
end
# app/models/mission.rb
class Mission < ApplicationRecord
belongs_to :account
has_many :tasks, dependent: :destroy
end
# app/models/task.rb
class Task < ApplicationRecord
belongs_to :mission
end
Deletion Methods 🗑️
Let's look at all the methods we will compare today
1. #destroy_all
The destroy_all
method is a common approach for deleting records using ActiveRecord. It instantiates each record and calls destroy
individually, triggering associated callbacks. It's generally the slowest approach, but also the safest to use.
Account.destroy_all
Since all Models have their dependant: :destroy
relationship, destroy_all
will trigger by callback the deletion of Missions associated with the Account we want to delete. But, as we delete a mission, we will also delete all the associated Tasks.
It's hassle-free, but very slow!
2. #in_batches(of: 1,000).destroy_all
I see a lot of ideas that deleting records in batches increases performance. In fact, it can reduce memory load and improve execution time.
Account.in_batches(of: 1,000).destroy_all
3. #delete_all
The delete_all
method deletes records from the database without going through ActiveRecord validation. So, we have no validation or callback. This means that data deletion will be refused at the database level, as integrity will be lost.
Therefore, we must conscientiously order the deletion as follows:
Task.delete_all
Mission.delete_all
Account.delete_all
In theory, this method is much more efficient than any use of destroy
4. #truncate
The truncate
method is a radical approach that deletes all table data in one operation.
Account.connection.truncate_tables('accounts', 'missions', 'tasks')
If you are sure you want to delete all of your tables, this will always be the fastest way to delete your data. However, it is impossible to scope the data; it will really reset your table.
5. Add on_delete: :cascade on the foreign key
There is a way to delete your tables much more quickly by adding a cascade parameter to your foreign key.
For this, we need a migration:
# db/migrate/20231227130700_add_delete_cascade.rb
class AddDeleteCascade < ActiveRecord::Migration[7.0]
def change
remove_foreign_key :missions, :accounts
add_foreign_key :missions, :accounts, on_delete: :cascade
remove_foreign_key :tasks, :missions
add_foreign_key :tasks, :missions, on_delete: :cascade
end
end
This will transform the declaration of our foreign key in the schema as follows:
# db/schema.rb
add_foreign_key "missions", "accounts", on_delete: :cascade
add_foreign_key "tasks", "missions", on_delete: :cascade
What does this change?
Now, when we delete an Account with #delete
, all records that depend on this Account will also be deleted. It's like a callback, but entirely managed by the database. This technique is very efficient! The only problem is that records deleted by the cascade are not logged.
Benchmark 🥊
For the benchmark, we will delete the same set of data each time, which contains:
- 100,000 Accounts
- 500,000 Missions (5 per Account)
- 1,500,000 Tasks (3 per Mission)
All this makes us a nice total of about 2M records. Let's see together which method is the most efficient!
Method | Average (s) | Min (s) | Max (s) |
---|---|---|---|
#destroy_all | 412 | 386 | 443 |
#destroy_all with batches | 515 | 492 | 551 |
#delete_all | 7 | 5 | 10 |
#delete_all on top of a database cascade | 4.7 | 2.9 | 7.7 |
#truncate | 1.8 | 1.2 | 2.5 |
Well, the first thing to say is that clearly, #destroy_all and #destroy_all with batches are clearly behind the others!
Here is the graphical representation of the 10 trials that were conducted for the 3 fastest methods:
So here is the ranking, very predictably:
🥇#truncate
🥈#destroy_all on top of the cascade
🥉#destroy_all
truncate is therefore much more efficient than the other competitors. This result is not surprising, as, as mentioned earlier, truncate will delete the table all at once.
Interpretation of Results
The benchmark results reveal significant differences between data deletion methods in a Rails application. Let's take a closer look at each method:
-
#destroy_all
With an average of 412 seconds,
#destroy_all
proves to be the slowest method. This slowness is due to the instantiation of each record and the execution of associated callbacks, which creates an additional burden. -
#destroy_all with batches
Surprisingly, this method, with an average of 515 seconds, is even slower than the classic
#destroy_all
. Although it is supposed to be more efficient in theory thanks to batch management, the results indicate that batch processing can, in some cases, worsen performance. -
#delete_all
Significantly faster with an average of only 7 seconds,
#delete_all
avoids the costs associated with ActiveRecord callbacks and validations. However, it requires careful management of the deletion order to maintain data integrity. -
#delete_all on top of a database cascade
With an average of 4.7 seconds, this method proves even more effective. It takes advantage of cascade deletion triggers at the database level, offering a quick and direct way to manage dependencies.
-
#truncate
By far the fastest method with an average of 1.8 seconds. As it resets tables without regard to individual records,
#truncate
offers unmatched performance, particularly suited to scenarios where the entirety of a table's data needs to be erased.
The graphical representation clearly confirms the superiority of #truncate
in terms of speed, followed by methods based on the database cascade, and finally the #destroy_all
approaches.
Recommendations
Considering the results, here are the recommendations for choosing the appropriate deletion method depending on specific needs:
-
For a secure and controlled deletion: Use
#destroy_all
. Ideal when callbacks and validations are crucial for data integrity and business logic. -
For total deletions without integrity constraints: Prefer
#truncate
. This method is recommended in test scenarios or when you need to completely reset a table. -
For optimized performance with integrity constraints: Consider
#delete_all
with a cascade parameter on foreign keys. This approach is suitable when data integrity is managed at the database level and ActiveRecord callbacks are not necessary. - In any case: Avoid
#destroy_all with batches
.
In conclusion, data deletion in Rails applications requires a balance between efficiency, security, and data integrity. This benchmark reveals that each method has its distinct advantages: #destroy_all
ensures secure deletion despite its slowness, #delete_all
optimizes performance while managing data integrity at the database level, and #truncate
stands out for its unmatched speed in complete resets.
In the end, the choice of the appropriate method should be guided by the specific needs of your application and the technical compromises it implies. A deep understanding of these options is crucial to maintaining a balance between performance and reliability in managing your Rails application's data.
Top comments (0)