As a rule of thumb, we should never run database queries inside a for-loop!
A database query is an expensive operation: each one is a round trip to the database server, so one query per iteration multiplies that cost by the loop count.
For example, let’s say we built inventory software that has been running in production for a year and has accumulated 1,000,000 transactions.
Suddenly, we learn that we never added VAT to our transactions. Future transactions are easy to handle, for example with a mutator:
class Transaction extends Model
{
    public $vat = 0.20;

    // Store the incoming price with VAT added on top.
    public function setPriceAttribute($value)
    {
        $this->attributes['price'] = $value + $value * $this->vat;
    }
}
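With the mutator in place, any price we set through the model is stored VAT-inclusive. A quick illustration, assuming the Transaction model above:

$transaction = new Transaction();
$transaction->price = 100; // the mutator stores 100 + 100 * 0.20
echo $transaction->price;  // 120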
So future records are covered. But how are we going to fix the 1 million records from the past?
For editing data from the past, I prefer to create a Seeder.
php artisan make:seeder AddVatToTransactions
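Later, we can run just this seeder with:

php artisan db:seed --class=AddVatToTransactions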
How not to do it?
class AddVatToTransactions extends Seeder
{
    public function run()
    {
        $vat = 0.20;
        $transactions = Transaction::get();
        foreach ($transactions as $transaction) {
            $transaction->price += $transaction->price * $vat;
            $transaction->save();
        }
    }
}
However, running a loop over 1 million records and issuing a database query in each iteration is not a good idea: that’s 1 million round trips to the database, each with its own overhead. (Spoiler alert: it’ll freeze your system 😀)
Then, how to do it?
Again, we work in our AddVatToTransactions Seeder. The idea, on the MySQL side, is a CASE statement:
UPDATE transactions
SET price = CASE
    WHEN id = 3 THEN 500
    WHEN id = 4 THEN 300
END
WHERE id IN (3, 4);
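An aside: when every row gets the same formula, like a flat 20% VAT, MySQL can do this in one statement with no PHP at all; the CASE pattern earns its keep when each row needs its own value, so we’ll build it anyway:

UPDATE transactions
SET price = price + price * 0.20;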
Now, let’s do it in Laravel:
$vat = 0.20;
$transactions = Transaction::get();

$cases = [];
$ids = [];
$params = [];

foreach ($transactions as $transaction) {
    $cases[] = "WHEN {$transaction->id} THEN ?";
    $params[] = $transaction->price + $transaction->price * $vat; // new VAT-inclusive price
    $ids[] = $transaction->id;
}

$ids = implode(',', $ids);
$cases = implode(' ', $cases);

if (!empty($ids)) {
    \DB::update("UPDATE transactions SET `price` = CASE `id` {$cases} END WHERE `id` IN ({$ids})", $params);
}
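For two transactions (ids 3 and 4, mirroring the SQL example above), the statement this code assembles would look like this, with the computed prices bound to the ? placeholders:

UPDATE transactions SET `price` = CASE `id` WHEN 3 THEN ? WHEN 4 THEN ? END WHERE `id` IN (3,4)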
This makes one database query to write all your updates. ⚡️ And because it’s raw SQL, it bypasses the Eloquent mutator we added earlier, so the VAT isn’t applied a second time.
🗣 I can hear some of you saying: “It’s still FREEZING!”
So, let’s optimize it even further:
#1: “Select” only the data you need from the database, to consume less RAM.
In our example, we only use the id and price columns, so let’s select just those:
$transactions = Transaction::select('id', 'price')->get();
#2: “Chunk” your collection to split the work across multiple database queries.
In Laravel, you can chunk a collection like this:
Transaction::get()->chunk(5000);
Let’s apply both in our example. First we split our $transactions collection into chunks of 5,000 records, then we run one database query per chunk:
$vat = 0.20;
$transactions = Transaction::select('id', 'price')->get();

foreach ($transactions->chunk(5000) as $chunk) {
    $cases = [];
    $ids = [];
    $params = [];

    foreach ($chunk as $transaction) {
        $cases[] = "WHEN {$transaction->id} THEN ?";
        $params[] = $transaction->price + $transaction->price * $vat;
        $ids[] = $transaction->id;
    }

    $ids = implode(',', $ids);
    $cases = implode(' ', $cases);

    if (!empty($ids)) {
        \DB::update("UPDATE transactions SET `price` = CASE `id` {$cases} END WHERE `id` IN ({$ids})", $params);
    }
}
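One caveat: Transaction::select('id', 'price')->get() still loads all 1,000,000 rows into memory before the collection is chunked. If memory is still tight, Laravel’s chunkById() can fetch the rows from the database in batches instead; here’s a sketch reusing the same CASE-building logic:

$vat = 0.20;
Transaction::select('id', 'price')->chunkById(5000, function ($chunk) use ($vat) {
    $cases = [];
    $ids = [];
    $params = [];
    foreach ($chunk as $transaction) {
        $cases[] = "WHEN {$transaction->id} THEN ?";
        $params[] = $transaction->price + $transaction->price * $vat;
        $ids[] = $transaction->id;
    }
    \DB::update(
        'UPDATE transactions SET `price` = CASE `id` ' . implode(' ', $cases)
            . ' END WHERE `id` IN (' . implode(',', $ids) . ')',
        $params
    );
});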
Hope you like this trick!
Please let me know what you think in the comments 💬
Happy coding! 😊
Top comments (4)
Using CASE statements is a clever way to do this! You just need to make sure that you pay attention to the length of your query so that you don't go over the max query length/packet size. In MySQL you can check the max by looking at the max_allowed_packet variable:

show variables like 'max_allowed_packet';
When I have a one-off update to run against a large number of records, I typically create an Artisan command (or a job) that updates a small subset of the records, and then I'll schedule it to run every minute (or a longer period, if needed). It spreads the load over time and is easy to keep track of the progress. Once it's done, I remove the call from the scheduler and delete the command/job class.
Spreading it to several jobs is indeed a great strategy! I guess it's all about picking the right strategy for the job.
If the task is not time-sensitive, spreading it over time like you said beats the bulk update; but if it's time-sensitive, I go with the bulk update strategy.
Thank you, I used this in my “Upload multiple images and rename them using drag & drop UI using dropzone.js” article.
I remember how this saved me a few years back!