loading...

Faster Eloquent Chunking

ryancco profile image Ryan ・3 min read

As your applications scale, processing large amounts of database records with Laravel Eloquent can become increasingly difficult. Resulting in out of memory exceptions and overall slowing down your application. Why is that?

When fetching results from the database you are in turn pulling that data into memory. Take this snippet of code for instance

Post::all()->each(function ($post) {
    // ...
});

Which results in the following query, loading all records in the posts table into memory

select * from posts;

Typically for tables with a small number of records, this is absolutely acceptable. However, as you accumulate tens of thousands of posts you will eventually begin hitting memory resource constraints of your webserver.

Chunking

A common approach in Laravel is to use Eloquent's (via BuildsQuery) chunk() method which fetches a fixed amount of records breaking a larger set into more consumable chunks.

Post::chunk(1000, function ($post) {
    // ...
});

While this might seem fine there are both improvements and gotchas to be aware of.

First, imagine the following scenario: you are fetching Post records from the database to update an attribute that you are also using in a where clause

Post::where('published_at', '<', now())->chunk(1000, function ($post) {
    $post->update('published_at', now());
});

Although contrived, this exemplifies a very real problem where such a query would result in an endless loop as the published_at attribute will always be less than now() at the time of the queries next execution (assuming accuracy to the second as with MySQL's timestamp column type or similar).

Second, there is the matter of the queries performance and its impact on the database server. The above code would result in a query similar to the following

select * from posts order by posts.id asc limit 1000 offset 9000

MySQL is unable to go directly to the offset due to deleted records and additional query constraints and, therefore, this query has to effectively select the first 10,000 records to return only the last 1,000 selected. As you can imagine this will not scale well into many-thousands or even millions of rows. This will cause your database server to use unnecessary resources slowing down all other queries from your application.

Chunking... but better!

In order to both prevent the unforeseen gotcha and improve database server performance we can use Eloquent's chunkById method

Post::where('published_at', '<', now())->chunkById(1000, function ($post) {
    $post->update('published_at', now());
});

The above code snippet will result in a query similar to the following

select * from posts where published_at < '2019-09-11 12:00:00' and id > 9000 order by id asc limit 1000

Why is this approach considered "better"?

a) it allows MySQL to completely skip the first 9000 (assumingly sequential) records
b) we will no longer be re-selecting records which we have already updated due to the id constraint in our where clause

Bonus - How?! 🤔

Diving into the chunkyById method of the BuildsQueries trait we see that the id of the last record fetched (remember, we're ordering by id in ascending order) is stored and used as a parameter in the next query to be run.

Posted on by:

ryancco profile

Ryan

@ryancco

I am a web developer interested in TDD, mentoring/giving back where possible, developer-first-DevOps, and Laravel.

Discussion

markdown guide
 

Sure interesting post, but I'm against using eloquent when mass updating records as it updates one by one. I prefer to use update() on query builder, something like :

DB::table('posts')->where('published_at', '<', now())->update(['published_at' => now()]);

It will result query like this :

update `posts` set `published_at` = 2019-09-25 07:09:55 where `published_at` < 2019-09-25 07:09:55
 

Very nice Ryan! Hope things are going well for you.