DEV Community

Thai Nguyen Hung

Laravel "Chunk" ... be careful

Laravel's "chunk" method is widely used to split a large query into smaller batches when processing a large table. But there is a pitfall when the attribute used in the query's where clause is changed inside the callback. It is easy to overlook, and the method is therefore often misused.

Problem

Post::where('processed', 0)->chunk(100, function($posts) {
    foreach($posts as $post) {
       $post->processed = 1;
       $post->save();
    }
});

Assumption: there are 400 unprocessed posts in total (ids 1-400).
The problem: only half of the posts end up processed.

Analysis

Each chunk is fetched via a single query with limit and offset. Let's look at how chunk works step by step.

In the beginning, there are 400 unprocessed posts (ids 1-400). The 1st chunk query is select * from `posts` where `processed` = 0 limit 100 offset 0, so ids 1-100 are fetched.

After the 1st chunk, 300 posts remain unprocessed (ids 101-400). The 2nd chunk query is select * from `posts` where `processed` = 0 limit 100 offset 100, so ids 201-300 are fetched: the offset of 100 skips ids 101-200, which now sit at the front of the shrunken result set.

After the 2nd chunk, 200 posts remain unprocessed (ids 101-200 and 301-400). The 3rd chunk query is select * from `posts` where `processed` = 0 limit 100 offset 200, and since only 200 rows match, nothing is fetched and chunking stops.

The result: 200 posts are processed, and 200 posts are left unprocessed.
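The walkthrough above can be reproduced outside Laravel. Below is a minimal Python sketch (not Laravel code) that simulates limit/offset pagination against an in-memory list, re-evaluating the `processed = 0` filter on each page exactly as a fresh SQL query would. The function and variable names are illustrative, not part of any real API.

```python
def chunk_with_offset(rows, size, callback):
    """Simulate chunk(): page through matching rows with LIMIT/OFFSET."""
    offset = 0
    while True:
        # The filter is re-evaluated on every page, like a fresh SQL query,
        # so rows marked processed=1 drop out of the result set mid-loop.
        matching = [r for r in rows if r["processed"] == 0]
        page = matching[offset:offset + size]
        if not page:
            break
        callback(page)
        offset += size

posts = [{"id": i, "processed": 0} for i in range(1, 401)]

def mark_processed(page):
    for post in page:
        post["processed"] = 1

chunk_with_offset(posts, 100, mark_processed)

print(sum(p["processed"] for p in posts))  # prints 200, not 400
```

Running it shows exactly the behavior described: ids 1-100 and 201-300 are processed, while 101-200 and 301-400 are silently skipped.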

Solution

"chunkById" is the recommended replacement for "chunk" in this situation. "chunkById" works much like "chunk"; the only difference is how the raw SQL is constructed: "chunkById" paginates with an id condition and limit, while "chunk" uses limit and offset.
The modified code looks like this:

Post::where('processed', 0)->chunkById(100, function($posts) {
    foreach($posts as $post) {
       $post->processed = 1;
       $post->save();
    }
});

When using "chunkById", each chunk query would look like this:

1st chunk query: select * from `posts` where `processed` = 0 and `id` > 0 order by id asc limit 100

2nd chunk query: select * from `posts` where `processed` = 0 and `id` > 100 order by id asc limit 100

3rd chunk query: select * from `posts` where `processed` = 0 and `id` > 200 order by id asc limit 100

4th chunk query: select * from `posts` where `processed` = 0 and `id` > 300 order by id asc limit 100

Therefore the fetched ids are 1-100, 101-200, 201-300, and 301-400 in sequence, and all of the posts are processed. Because each query resumes from the last seen id rather than an offset, rows disappearing from the filtered set can no longer shift the next page.
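The same in-memory simulation, switched to keyset pagination, confirms this. The sketch below (again illustrative Python, not Laravel code) tracks the last seen id and filters on `id > last_id`, mirroring the queries shown above:

```python
def chunk_by_id(rows, size, callback):
    """Simulate chunkById(): WHERE id > last_id ORDER BY id LIMIT size."""
    last_id = 0
    while True:
        matching = sorted(
            (r for r in rows if r["processed"] == 0 and r["id"] > last_id),
            key=lambda r: r["id"],
        )
        page = matching[:size]
        if not page:
            break
        callback(page)
        # Resume after the last id we saw; marking rows processed
        # cannot shift this cursor the way an offset shifts.
        last_id = page[-1]["id"]

posts = [{"id": i, "processed": 0} for i in range(1, 401)]

def mark_processed(page):
    for post in page:
        post["processed"] = 1

chunk_by_id(posts, 100, mark_processed)

print(sum(p["processed"] for p in posts))  # prints 400
```

One caveat worth knowing: because "chunkById" orders by id, it requires an integer primary key (or a column you pass explicitly) and ignores any order clause on the original query.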

Top comments (1)

Jonas:
Good insight, thank you